RAG System for Internal Documentation

Status: Implementation guide with checklist. Tasks are unchecked by design - check them off as you implement.

Project Goal: Build a production-ready RAG (Retrieval-Augmented Generation) system for Minnova's internal documentation to serve as a proof-of-concept and learning exercise before taking on client RAG projects.

Target Corpus:

  • Minnova knowledge-base (services, accounting, ops, faq)
  • Minnova-site content (blog posts, service pages)
  • AGENTS.md and other technical documentation

Success Criteria:

  • Query: "How do we handle accounting?" -> Returns relevant content from accounting/README.md with citations
  • Query: "What services do we offer?" -> Returns services list with proper source links
  • Response time < 2 seconds
  • Citation accuracy: answers link back to exact source documents
  • Can be self-hosted or use managed services


Phase 1: Setup & Planning (2-3 hours)

Tasks

  • Set up PostgreSQL with pgvector extension locally
  • Create new Elixir/Phoenix project or add to existing
  • Add dependencies: req, req_llm, ecto, pgvector
  • Get OpenAI API key (or Anthropic)
  • Set up migrations for vector storage table

Deliverables

  • Working dev environment with pgvector
  • Database schema for storing document chunks + embeddings
  • Config for LLM API credentials

Elixir Dependencies

# mix.exs
defp deps do
  [
    {:phoenix, "~> 1.7"},
    {:ecto_sql, "~> 3.10"},
    {:postgrex, ">= 0.0.0"},
    {:pgvector, "~> 0.2"},  # PostgreSQL vector extension
    {:req, "~> 0.4"},        # HTTP client
    {:req_llm, "~> 0.1"},    # LLM provider wrapper
    {:jason, "~> 1.4"}       # JSON parsing
  ]
end

Database Schema

defmodule Minnova.Repo.Migrations.CreateDocumentChunks do
  use Ecto.Migration

  def up do
    execute "CREATE EXTENSION IF NOT EXISTS vector"

    create table(:document_chunks) do
      add :content, :text, null: false
      add :source_file, :string, null: false
      add :heading, :string
      add :line_start, :integer
      add :line_end, :integer
      add :embedding, :vector, size: 1536  # OpenAI embedding size

      timestamps()
    end

    # Ecto's index/3 has no :opclass option; pgvector passes the operator class in the column string
    create index(:document_chunks, ["embedding vector_cosine_ops"], using: :ivfflat)
  end

  def down do
    drop table(:document_chunks)
    execute "DROP EXTENSION IF EXISTS vector"
  end
end
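
Ecto Schema

The Phase 2 ingestion example writes through a Minnova.RAG.DocumentChunk schema. A minimal sketch of that schema, matching the migration above (the changeset validations are illustrative and can be tightened later):

defmodule Minnova.RAG.DocumentChunk do
  use Ecto.Schema
  import Ecto.Changeset

  schema "document_chunks" do
    field :content, :string
    field :source_file, :string
    field :heading, :string
    field :line_start, :integer
    field :line_end, :integer
    field :embedding, Pgvector.Ecto.Vector  # Ecto type provided by the pgvector package

    timestamps()
  end

  def changeset(chunk, attrs) do
    chunk
    |> cast(attrs, [:content, :source_file, :heading, :line_start, :line_end, :embedding])
    |> validate_required([:content, :source_file, :embedding])
  end
end

Note: the pgvector README also covers registering its Postgrex extension types for the Repo; that one-time setup step is omitted here.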

Phase 2: Document Ingestion Pipeline (4-6 hours)

Tasks

  • Write Mix task to parse all markdown files from knowledge-base
  • Implement chunking strategy (start simple: chunk by heading or fixed size ~500 tokens)
  • Generate embeddings for each chunk via OpenAI API
  • Store chunks + embeddings in PostgreSQL using Ecto
  • Test: verify all docs are indexed and searchable

Deliverables

  • Mix task: mix rag.ingest /path/to/knowledge-base
  • PostgreSQL populated with embedded chunks
  • Metadata tracking (source file, line numbers, headings)

Technical Decisions

  • Chunking: Start with heading-based (## sections in markdown)
  • Embeddings: OpenAI text-embedding-3-small ($0.02 per 1M tokens)
  • Storage: PostgreSQL with pgvector (cosine similarity search)

Example Implementation

defmodule Minnova.RAG.Ingest do
  @moduledoc "Ingests markdown files and generates embeddings"

  alias Minnova.Repo
  alias Minnova.RAG.DocumentChunk

  def ingest_directory(path) do
    path
    |> Path.join("**/*.md")
    |> Path.wildcard()
    |> Enum.each(&ingest_file/1)
  end

  defp ingest_file(file_path) do
    file_path
    |> File.read!()
    |> chunk_by_heading()
    |> Enum.each(fn chunk ->
      embedding = generate_embedding(chunk.content)

      %DocumentChunk{}
      |> DocumentChunk.changeset(%{
        content: chunk.content,
        source_file: file_path,
        heading: chunk.heading,
        embedding: embedding
      })
      |> Repo.insert!()
    end)
  end

  defp generate_embedding(text) do
    # Use Req + the OpenAI embeddings API (expects OPENAI_API_KEY in the environment)
    # Returns a list of 1536 floats for text-embedding-3-small
    response =
      Req.post!("https://api.openai.com/v1/embeddings",
        auth: {:bearer, System.fetch_env!("OPENAI_API_KEY")},
        json: %{model: "text-embedding-3-small", input: text}
      )

    response.body["data"] |> hd() |> Map.fetch!("embedding")
  end

  defp chunk_by_heading(markdown) do
    # Naive split on ## headings; anything before the first heading becomes its own chunk
    # Returns a list of %{heading: "...", content: "..."}
    markdown
    |> String.split(~r/^## /m, trim: true)
    |> Enum.map(fn section ->
      case String.split(section, "\n", parts: 2) do
        [heading, content] -> %{heading: String.trim(heading), content: String.trim(content)}
        [heading] -> %{heading: String.trim(heading), content: ""}
      end
    end)
  end
end
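
Mix Task Wrapper

A thin Mix task can wrap the module above to provide the mix rag.ingest deliverable. A sketch, assuming the Ingest module as written and an OPENAI_API_KEY in the environment:

defmodule Mix.Tasks.Rag.Ingest do
  use Mix.Task

  @shortdoc "Ingests a directory of markdown files into the RAG index"

  def run([path]) do
    # Start the application so the Repo and its connection pool are available
    Mix.Task.run("app.start")

    Minnova.RAG.Ingest.ingest_directory(path)
    Mix.shell().info("Ingested markdown files from #{path}")
  end
end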

Phase 3: Query & Retrieval (3-4 hours)

Tasks

  • Implement query embedding (same model as document embeddings)
  • Vector similarity search (retrieve top-k chunks, start with k=5); see the example at the end of this phase
  • Reranking logic (optional: use Cohere rerank or simple keyword scoring)
  • Test retrieval quality on sample queries
  • Tune k parameter and similarity threshold

Deliverables

  • Query function that returns top-k relevant chunks
  • Evaluation of retrieval quality (manual testing with 10-15 queries)
  • Documentation of optimal k value and similarity threshold

Sample Queries for Testing

  • "How do we handle accounting?"
  • "What services do we offer?"
  • "Explain trunk-based development"
  • "What is our tech stack?"
  • "How do AI agents work?"
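
Example Retrieval Query

A minimal retrieval sketch for running these test queries. It assumes the DocumentChunk schema from Phase 1; the module name, top_chunks/2, and the embed/1 helper (the same Req call as in the ingestion example) are illustrative:

defmodule Minnova.RAG.Retrieve do
  @moduledoc "Embeds a query and returns the top-k most similar document chunks"

  import Ecto.Query
  import Pgvector.Ecto.Query

  alias Minnova.Repo
  alias Minnova.RAG.DocumentChunk

  def top_chunks(query_text, k \\ 5) do
    embedding = embed(query_text)

    DocumentChunk
    |> order_by([c], cosine_distance(c.embedding, ^Pgvector.new(embedding)))
    |> limit(^k)
    |> Repo.all()
  end

  # Same model as ingestion, so query and document vectors live in the same space
  defp embed(text) do
    response =
      Req.post!("https://api.openai.com/v1/embeddings",
        auth: {:bearer, System.fetch_env!("OPENAI_API_KEY")},
        json: %{model: "text-embedding-3-small", input: text}
      )

    response.body["data"] |> hd() |> Map.fetch!("embedding")
  end
end

Cosine distance matches the vector_cosine_ops index from Phase 1; smaller distance means more similar, so the default ascending order returns the closest chunks first.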

Phase 4: Answer Generation with Citations (3-4 hours)

Tasks

  • Build prompt template that includes retrieved chunks
  • Instruct LLM to cite sources in answers
  • Parse LLM response and format citations as links
  • Implement fallback for low-confidence answers ("I don't have enough information")
  • Test citation accuracy manually

Deliverables

  • Prompt template with clear instructions for citations
  • Answer generation function
  • Citation formatter (markdown links back to source files)

Prompt Structure (Initial)

You are a helpful assistant for Minnova's internal documentation.

Context:
{retrieved_chunks_with_sources}

Question: {user_question}

Instructions:
- Answer based ONLY on the provided context
- Include inline citations [1], [2], etc.
- If the context doesn't contain enough information, say so
- Keep answers concise and practical

Answer:
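
Example Answer Generation

A sketch of the generation step that glues the prompt above to the Phase 3 retrieval sketch. It calls the OpenAI chat completions endpoint directly with Req; req_llm or LangChainEx could replace that call, but their exact APIs are not assumed here. Module and function names are illustrative:

defmodule Minnova.RAG.Answer do
  @moduledoc "Builds the citation prompt from retrieved chunks and asks the LLM for an answer"

  alias Minnova.RAG.Retrieve

  def answer(question) do
    chunks = Retrieve.top_chunks(question)

    # Number the chunks so the model can cite them as [1], [2], ...
    context =
      chunks
      |> Enum.with_index(1)
      |> Enum.map_join("\n\n", fn {chunk, i} ->
        "[#{i}] #{chunk.source_file} (#{chunk.heading})\n#{chunk.content}"
      end)

    prompt = """
    You are a helpful assistant for Minnova's internal documentation.

    Context:
    #{context}

    Question: #{question}

    Instructions:
    - Answer based ONLY on the provided context
    - Include inline citations [1], [2], etc.
    - If the context doesn't contain enough information, say so
    - Keep answers concise and practical

    Answer:
    """

    response =
      Req.post!("https://api.openai.com/v1/chat/completions",
        auth: {:bearer, System.fetch_env!("OPENAI_API_KEY")},
        json: %{model: "gpt-4o-mini", messages: [%{role: "user", content: prompt}]}
      )

    answer = response.body["choices"] |> hd() |> get_in(["message", "content"])

    # Return the chunks too so the caller can render [n] citations as links to source files
    {answer, chunks}
  end
end

The citation formatter deliverable can then map each [n] marker back to Enum.at(chunks, n - 1).source_file and emit a markdown link.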

Phase 5: Basic UI/Interface (2-3 hours)

Tasks

  • Build simple CLI interface (start here)
  • Or: Build simple web UI (Phoenix LiveView or Next.js)
  • Display query, answer, and citations
  • Add loading states and error handling

Deliverables

  • Working interface for testing
  • README with usage instructions

CLI Option (Faster)

$ mix rag.query "How do we handle accounting?"
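
A sketch of that task, assuming the Answer module from Phase 4 (names are illustrative):

defmodule Mix.Tasks.Rag.Query do
  use Mix.Task

  @shortdoc "Asks the RAG system a question from the command line"

  def run([question]) do
    Mix.Task.run("app.start")

    {answer, chunks} = Minnova.RAG.Answer.answer(question)

    Mix.shell().info(answer)
    Mix.shell().info("\nSources:")
    Enum.each(chunks, fn chunk -> Mix.shell().info("  - #{chunk.source_file}") end)
  end
end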

Web UI Option (Better UX)

  • Simple Phoenix LiveView page
  • Input box for query
  • Display answer with citations
  • Show retrieved chunks (for debugging)

Phase 6: Edge Cases & Refinement (4-6 hours)

Tasks

  • Test query rewriting (e.g., "KYC flow" vs "customer onboarding")
  • Implement hybrid search (vector + keyword) if needed; see the sketch at the end of this phase
  • Handle multi-document answers (citations from multiple sources)
  • Add observability (log queries, latency, retrieval quality)
  • Test with diverse queries and tune chunking strategy

Deliverables

  • Refined chunking strategy (based on actual retrieval quality)
  • Query rewriting logic (if needed)
  • Observability dashboard or logs

Known Edge Cases to Test

  • Synonyms: "KYC" vs "customer onboarding"
  • Multi-step questions: "How do we handle accounting and what services do we offer?"
  • Missing information: "What is our revenue?" (should say "I don't know")
  • Ambiguous queries: "How do we work?" (process vs services vs tech stack)
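
Hybrid Search Sketch

If plain vector search misses exact terms like "KYC", one minimal way to combine keyword and vector signals is to filter on a keyword match and keep the vector ordering, as below. This is the simplest possible blend; a fuller hybrid would run keyword (e.g. Postgres full-text) and vector searches separately and merge the ranked lists. hybrid_chunks/3 is an illustrative addition to the Retrieve module sketched in Phase 3:

# Inside Minnova.RAG.Retrieve; reuses its imports, aliases, and embed/1 helper
def hybrid_chunks(query_text, keyword, k \\ 5) do
  embedding = embed(query_text)

  DocumentChunk
  |> where([c], ilike(c.content, ^"%#{keyword}%"))
  |> order_by([c], cosine_distance(c.embedding, ^Pgvector.new(embedding)))
  |> limit(^k)
  |> Repo.all()
end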

Phase 7: Production Hardening (Optional, 3-4 hours)

Tasks

  • Add access control (if hosting publicly)
  • Implement PII redaction (if needed)
  • Add rate limiting
  • Set up monitoring and alerts
  • Write runbook for common issues

Deliverables

  • Production-ready deployment
  • Monitoring dashboard
  • Runbook for operations

Phase 8: Documentation & Blog Post (2-3 hours)

Tasks

  • Document architecture and design decisions
  • Write developer guide for maintaining the system
  • Update blog post with learnings from building the system
  • Create demo video or screenshots

Deliverables

  • Architecture documentation
  • Developer guide
  • Updated blog post (experience report instead of how-to)
  • Demo assets

Timeline Estimate

  • MVP (Phases 1-5): 14-20 hours (1-2 weekends)
  • Production-ready (Phases 1-7): 25-35 hours (2-3 weekends)
  • Full project (Phases 1-8): 30-40 hours


Tech Stack Recommendation

Elixir-first approach (recommended):

  • Language: Elixir (fits your stack, plays to your strengths)
  • Embeddings: OpenAI text-embedding-3-small (via HTTPoison or Req)
  • Vector DB: PostgreSQL + pgvector extension (you already know Postgres)
  • LLM: OpenAI GPT-4o-mini or Anthropic Claude (via req_llm)
  • Framework: LangChainEx or custom with req_llm

Why this stack:

  • PostgreSQL + pgvector means no separate vector DB to manage
  • Ecto for migrations and queries (familiar territory)
  • req_llm handles LLM API calls cleanly
  • Can deploy alongside existing Phoenix apps
  • Simpler ops than Python + separate services

Alternative (if you want something working faster):

  • Python + LangChain + Pinecone (2-3 hours to MVP)
  • Then port to Elixir once you understand the flow


Success Metrics

  • Retrieval accuracy: 80%+ of queries return relevant chunks
  • Citation accuracy: 90%+ of answers cite correct sources
  • Response time: < 2 seconds end-to-end
  • User satisfaction: Manual testing feels "good enough" for internal use

Next Actions

  1. Pick tech stack (Python MVP or Elixir from start)
  2. Set up development environment
  3. Start Phase 1 (setup & planning)
  4. Build ingestion pipeline first (Phase 2)
  5. Iterate quickly, learn as you go

Resources

Elixir Libraries:

  • pgvector-elixir - PostgreSQL vector extension for Elixir
  • req_llm - LLM API wrapper using Req
  • LangChainEx - Elixir LangChain implementation
  • Req - HTTP client for Elixir

AI/ML APIs:

  • OpenAI Embeddings Guide
  • OpenAI API Reference
  • Anthropic Claude API

PostgreSQL:

  • pgvector Documentation
  • pgvector Performance Tuning


Open Questions

  • Should we self-host or use managed services?
  • Do we need query rewriting from day 1?
  • Should we build CLI or web UI first?
  • What's the budget for API costs (embeddings + LLM)?
  • Do we want to open-source this after building it?