# RAG System for Internal Documentation

Status: Implementation guide with checklist. Tasks are unchecked by design; check them off as you implement.

Project Goal: Build a production-ready RAG (Retrieval-Augmented Generation) system for Minnova's internal documentation to serve as a proof of concept and learning exercise before taking on client RAG projects.

Target Corpus:

- Minnova knowledge-base (services, accounting, ops, faq)
- Minnova-site content (blog posts, service pages)
- AGENTS.md and other technical documentation

Success Criteria:

- Query: "How do we handle accounting?" -> returns relevant content from accounting/README.md with citations
- Query: "What services do we offer?" -> returns the services list with proper source links
- Response time < 2 seconds
- Citation accuracy: answers link back to exact source documents
- Can be self-hosted or use managed services
## Phase 1: Setup & Planning (2-3 hours)

### Tasks

- [ ] Set up PostgreSQL with pgvector extension locally
- [ ] Create new Elixir/Phoenix project or add to existing
- [ ] Add dependencies: req, req_llm, ecto, pgvector
- [ ] Get OpenAI API key (or Anthropic)
- [ ] Set up migrations for vector storage table

### Deliverables

- Working dev environment with pgvector
- Database schema for storing document chunks + embeddings
- Config for LLM API credentials (see the snippet below)

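A minimal sketch of the credentials config, assuming the key is supplied via an `OPENAI_API_KEY` environment variable at runtime (the `:openai_api_key` config name is an arbitrary choice):

```elixir
# config/runtime.exs
import Config

# Raises at boot if the key is missing, so later API calls can assume it is set.
config :minnova, :openai_api_key, System.fetch_env!("OPENAI_API_KEY")
```
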
### Elixir Dependencies

```elixir
# mix.exs
defp deps do
  [
    {:phoenix, "~> 1.7"},
    {:ecto_sql, "~> 3.10"},
    {:postgrex, ">= 0.0.0"},
    {:pgvector, "~> 0.2"},  # Elixir client for the pgvector extension
    {:req, "~> 0.4"},       # HTTP client
    {:req_llm, "~> 0.1"},   # LLM provider wrapper
    {:jason, "~> 1.4"}      # JSON parsing
  ]
end
```

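pgvector-elixir also needs its Postgrex extension registered so vector columns can be encoded and decoded through Ecto; a minimal sketch of that wiring, with placeholder module names, along the lines of the pgvector-elixir README:

```elixir
# lib/minnova/postgrex_types.ex
# Registers the pgvector type alongside the default Postgres extensions.
Postgrex.Types.define(
  Minnova.PostgrexTypes,
  [Pgvector.Extensions.Vector] ++ Ecto.Adapters.Postgres.extensions(),
  []
)
```

```elixir
# config/config.exs
# Point the Repo at the custom type module defined above.
config :minnova, Minnova.Repo, types: Minnova.PostgrexTypes
```
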
### Database Schema

```elixir
defmodule Minnova.Repo.Migrations.CreateDocumentChunks do
  use Ecto.Migration

  def up do
    execute "CREATE EXTENSION IF NOT EXISTS vector"

    create table(:document_chunks) do
      add :content, :text, null: false
      add :source_file, :string, null: false
      add :heading, :string
      add :line_start, :integer
      add :line_end, :integer
      add :embedding, :vector, size: 1536  # OpenAI text-embedding-3-small dimension
      timestamps()
    end

    # Approximate-nearest-neighbour index for cosine similarity; the opclass goes in
    # the column string (Ecto's index/3 has no :opclass option).
    create index(:document_chunks, ["embedding vector_cosine_ops"],
             using: :ivfflat,
             options: "lists = 100"
           )
  end

  def down do
    drop table(:document_chunks)
    execute "DROP EXTENSION IF EXISTS vector"
  end
end
```

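The ingest example below references a `Minnova.RAG.DocumentChunk` schema; a minimal sketch, assuming the field names from the migration above:

```elixir
defmodule Minnova.RAG.DocumentChunk do
  use Ecto.Schema
  import Ecto.Changeset

  schema "document_chunks" do
    field :content, :string
    field :source_file, :string
    field :heading, :string
    field :line_start, :integer
    field :line_end, :integer
    # Pgvector.Ecto.Vector maps the vector column to/from a list of floats.
    field :embedding, Pgvector.Ecto.Vector
    timestamps()
  end

  def changeset(chunk, attrs) do
    chunk
    |> cast(attrs, [:content, :source_file, :heading, :line_start, :line_end, :embedding])
    |> validate_required([:content, :source_file, :embedding])
  end
end
```
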
## Phase 2: Document Ingestion Pipeline (4-6 hours)

### Tasks

- [ ] Write Mix task to parse all markdown files from knowledge-base
- [ ] Implement chunking strategy (start simple: chunk by heading or fixed size ~500 tokens)
- [ ] Generate embeddings for each chunk via OpenAI API
- [ ] Store chunks + embeddings in PostgreSQL using Ecto
- [ ] Test: verify all docs are indexed and searchable

### Deliverables

- Mix task: `mix rag.ingest /path/to/knowledge-base`
- PostgreSQL populated with embedded chunks
- Metadata tracking (source file, line numbers, headings)

### Technical Decisions

- Chunking: Start with heading-based (## sections in markdown)
- Embeddings: OpenAI text-embedding-3-small ($0.02 per 1M tokens)
- Storage: PostgreSQL with pgvector (cosine similarity search)
### Example Implementation

```elixir
defmodule Minnova.RAG.Ingest do
  @moduledoc "Ingests markdown files and generates embeddings"

  alias Minnova.Repo
  alias Minnova.RAG.DocumentChunk

  def ingest_directory(path) do
    path
    |> Path.join("**/*.md")
    |> Path.wildcard()
    |> Enum.each(&ingest_file/1)
  end

  defp ingest_file(file_path) do
    file_path
    |> File.read!()
    |> chunk_by_heading()
    |> Enum.each(fn chunk ->
      embedding = generate_embedding(chunk.content)

      %DocumentChunk{}
      |> DocumentChunk.changeset(%{
        content: chunk.content,
        source_file: file_path,
        heading: chunk.heading,
        embedding: embedding
      })
      |> Repo.insert!()
    end)
  end

  # Calls the OpenAI embeddings endpoint via Req; returns a list of 1536 floats.
  # Reads the API key configured in config/runtime.exs (see Phase 1).
  defp generate_embedding(text) do
    response =
      Req.post!("https://api.openai.com/v1/embeddings",
        auth: {:bearer, Application.fetch_env!(:minnova, :openai_api_key)},
        json: %{model: "text-embedding-3-small", input: text}
      )

    get_in(response.body, ["data", Access.at(0), "embedding"])
  end

  # Splits markdown on ## headings and keeps the heading text with each chunk.
  # Simple first pass: content before the first ## heading is dropped.
  defp chunk_by_heading(markdown) do
    markdown
    |> String.split(~r/^##\s+/m)
    |> Enum.drop(1)
    |> Enum.map(fn section ->
      {heading, content} =
        case String.split(section, "\n", parts: 2) do
          [heading, content] -> {heading, content}
          [heading] -> {heading, ""}
        end

      %{heading: String.trim(heading), content: String.trim(content)}
    end)
    |> Enum.reject(&(&1.content == ""))
  end
end
```

## Phase 3: Query & Retrieval (3-4 hours)

### Tasks

- [ ] Implement query embedding (same model as document embeddings)
- [ ] Vector similarity search (retrieve top-k chunks, start with k=5)
- [ ] Reranking logic (optional: use Cohere rerank or simple keyword scoring)
- [ ] Test retrieval quality on sample queries
- [ ] Tune k parameter and similarity threshold

### Deliverables

- Query function that returns top-k relevant chunks (see the sketch below)
- Evaluation of retrieval quality (manual testing with 10-15 queries)
- Documentation of optimal k value and similarity threshold

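A minimal retrieval sketch using Ecto and pgvector's `<=>` cosine-distance operator. Module names are assumptions, and it presumes the embedding call from the Phase 2 example is extracted into a shared (hypothetical) `Minnova.RAG.Embeddings.embed/1` helper:

```elixir
defmodule Minnova.RAG.Retrieve do
  import Ecto.Query

  alias Minnova.Repo
  alias Minnova.RAG.DocumentChunk

  @doc "Returns the k chunks closest to the query by cosine distance."
  def top_chunks(query_text, k \\ 5) do
    # Hypothetical shared helper; same OpenAI embeddings call as in the ingest example.
    query_embedding = Pgvector.new(Minnova.RAG.Embeddings.embed(query_text))

    DocumentChunk
    |> order_by([c], fragment("? <=> ?", c.embedding, ^query_embedding))
    |> limit(^k)
    |> Repo.all()
  end
end
```

A similarity threshold can be layered on top later, either by selecting the distance and filtering in Elixir or with an extra `where: fragment(...)` clause.
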
### Sample Queries for Testing

- "How do we handle accounting?"
- "What services do we offer?"
- "Explain trunk-based development"
- "What is our tech stack?"
- "How do AI agents work?"
## Phase 4: Answer Generation with Citations (3-4 hours)

### Tasks

- [ ] Build prompt template that includes retrieved chunks
- [ ] Instruct LLM to cite sources in answers
- [ ] Parse LLM response and format citations as links
- [ ] Implement fallback for low-confidence answers ("I don't have enough information")
- [ ] Test citation accuracy manually

### Deliverables

- Prompt template with clear instructions for citations
- Answer generation function (see the sketch after the prompt below)
- Citation formatter (markdown links back to source files)

### Prompt Structure (Initial)

```text
You are a helpful assistant for Minnova's internal documentation.

Context:
{retrieved_chunks_with_sources}

Question: {user_question}

Instructions:
- Answer based ONLY on the provided context
- Include inline citations [1], [2], etc.
- If the context doesn't contain enough information, say so
- Keep answers concise and practical

Answer:
```

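A rough sketch of the answer generation step plus a simple citation formatter. It fills the template above and calls the OpenAI chat completions endpoint directly via Req (rather than assuming a particular req_llm API); module names and the return shape are illustrative:

```elixir
defmodule Minnova.RAG.Answer do
  @moduledoc "Builds the prompt from retrieved chunks and asks the LLM for a cited answer."

  def answer(question, chunks) do
    # Number the chunks so the model can cite them as [1], [2], etc.
    context =
      chunks
      |> Enum.with_index(1)
      |> Enum.map_join("\n\n", fn {chunk, i} ->
        "[#{i}] #{chunk.source_file} (#{chunk.heading})\n#{chunk.content}"
      end)

    prompt = """
    You are a helpful assistant for Minnova's internal documentation.

    Context:
    #{context}

    Question: #{question}

    Instructions:
    - Answer based ONLY on the provided context
    - Include inline citations [1], [2], etc.
    - If the context doesn't contain enough information, say so
    - Keep answers concise and practical

    Answer:
    """

    response =
      Req.post!("https://api.openai.com/v1/chat/completions",
        auth: {:bearer, Application.fetch_env!(:minnova, :openai_api_key)},
        json: %{model: "gpt-4o-mini", messages: [%{role: "user", content: prompt}]}
      )

    answer_text = get_in(response.body, ["choices", Access.at(0), "message", "content"])
    {answer_text, format_citations(chunks)}
  end

  # Markdown reference links so each [n] in the answer points back to its source file.
  defp format_citations(chunks) do
    chunks
    |> Enum.with_index(1)
    |> Enum.map_join("\n", fn {chunk, i} ->
      "[#{i}]: [#{chunk.source_file}](#{chunk.source_file})"
    end)
  end
end
```
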
## Phase 5: Basic UI/Interface (2-3 hours)

### Tasks

- [ ] Build simple CLI interface (start here)
- [ ] Or: build simple web UI (Phoenix LiveView or Next.js)
- [ ] Display query, answer, and citations
- [ ] Add loading states and error handling

### Deliverables

- Working interface for testing
- README with usage instructions

### CLI Option (Faster)

```sh
$ mix rag.query "How do we handle accounting?"
```

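A minimal Mix task sketch that wires the retrieval and answer modules sketched in Phases 3 and 4 together (all module names are assumptions):

```elixir
defmodule Mix.Tasks.Rag.Query do
  use Mix.Task

  @shortdoc "Asks a question against the indexed documentation"

  @impl Mix.Task
  def run([question]) do
    # Start the application (and the Repo) before querying.
    Mix.Task.run("app.start")

    chunks = Minnova.RAG.Retrieve.top_chunks(question)
    {answer, citations} = Minnova.RAG.Answer.answer(question, chunks)

    IO.puts(answer)
    IO.puts("\nSources:\n" <> citations)
  end
end
```
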
### Web UI Option (Better UX)

- Simple Phoenix LiveView page
- Input box for query
- Display answer with citations
- Show retrieved chunks (for debugging)
## Phase 6: Edge Cases & Refinement (4-6 hours)

### Tasks

- [ ] Test query rewriting (e.g., "KYC flow" vs "customer onboarding")
- [ ] Implement hybrid search (vector + keyword) if needed (see the sketch after this list)
- [ ] Handle multi-document answers (citations from multiple sources)
- [ ] Add observability (log queries, latency, retrieval quality)
- [ ] Test with diverse queries and tune chunking strategy

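If plain vector search misses keyword-heavy queries, one option is to blend pgvector's cosine distance with Postgres full-text rank in a single ORDER BY. A rough sketch under that assumption; the 0.2 weight and module name are arbitrary starting points, and the embedding helper is the same hypothetical one used in Phase 3:

```elixir
defmodule Minnova.RAG.HybridRetrieve do
  import Ecto.Query

  alias Minnova.Repo
  alias Minnova.RAG.DocumentChunk

  def top_chunks(query_text, k \\ 5) do
    embedding = Pgvector.new(Minnova.RAG.Embeddings.embed(query_text))

    DocumentChunk
    |> order_by(
      [c],
      # Lower cosine distance is better; subtracting the keyword rank rewards
      # chunks that also match the query terms lexically.
      fragment(
        "(? <=> ?) - 0.2 * ts_rank(to_tsvector('english', ?), websearch_to_tsquery('english', ?))",
        c.embedding,
        ^embedding,
        c.content,
        ^query_text
      )
    )
    |> limit(^k)
    |> Repo.all()
  end
end
```
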
### Deliverables

- Refined chunking strategy (based on actual retrieval quality)
- Query rewriting logic (if needed)
- Observability dashboard or logs

### Known Edge Cases to Test

- Synonyms: "KYC" vs "customer onboarding"
- Multi-step questions: "How do we handle accounting and what services do we offer?"
- Missing information: "What is our revenue?" (should say "I don't know")
- Ambiguous queries: "How do we work?" (process vs services vs tech stack)
## Phase 7: Production Hardening (Optional, 3-4 hours)

### Tasks

- [ ] Add access control (if hosting publicly)
- [ ] Implement PII redaction (if needed)
- [ ] Add rate limiting
- [ ] Set up monitoring and alerts
- [ ] Write runbook for common issues

### Deliverables

- Production-ready deployment
- Monitoring dashboard
- Runbook for operations

## Phase 8: Documentation & Blog Post (2-3 hours)

### Tasks

- [ ] Document architecture and design decisions
- [ ] Write developer guide for maintaining the system
- [ ] Update blog post with learnings from building the system
- [ ] Create demo video or screenshots

### Deliverables

- Architecture documentation
- Developer guide
- Updated blog post (experience report instead of how-to)
- Demo assets

## Timeline Estimate

- MVP (Phases 1-5): 14-20 hours (1-2 weekends)
- Production-ready (Phases 1-7): 25-35 hours (2-3 weekends)
- Full project (Phases 1-8): 30-40 hours

## Tech Stack Recommendation

Elixir-first approach (recommended):

- Language: Elixir (fits your stack, plays to your strengths)
- Embeddings: OpenAI text-embedding-3-small (via Req)
- Vector DB: PostgreSQL + pgvector extension (you already know Postgres)
- LLM: OpenAI GPT-4o-mini or Anthropic Claude (via req_llm)
- Framework: LangChainEx or custom with req_llm

Why this stack:

- PostgreSQL + pgvector means no separate vector DB to manage
- Ecto for migrations and queries (familiar territory)
- req_llm handles LLM API calls cleanly
- Can deploy alongside existing Phoenix apps
- Simpler ops than Python + separate services

Alternative (if you want something working faster):

- Python + LangChain + Pinecone (2-3 hours to MVP)
- Then port to Elixir once you understand the flow

## Success Metrics

- Retrieval accuracy: 80%+ of queries return relevant chunks
- Citation accuracy: 90%+ of answers cite correct sources
- Response time: < 2 seconds end-to-end
- User satisfaction: Manual testing feels "good enough" for internal use

## Next Actions

- Pick tech stack (Python MVP or Elixir from start)
- Set up development environment
- Start Phase 1 (setup & planning)
- Build ingestion pipeline first (Phase 2)
- Iterate quickly, learn as you go
## Resources

Elixir Libraries:

- pgvector-elixir - Elixir client for the pgvector extension
- req_llm - LLM API wrapper built on Req
- LangChainEx - Elixir LangChain implementation
- Req - HTTP client for Elixir

AI/ML APIs:

- OpenAI Embeddings Guide
- OpenAI API Reference
- Anthropic Claude API

PostgreSQL:

- pgvector Documentation
- pgvector Performance Tuning

## Open Questions

- Should we self-host or use managed services?
- Do we need query rewriting from day 1?
- Should we build CLI or web UI first?
- What's the budget for API costs (embeddings + LLM)?
- Do we want to open-source this after building it?