# RAG System for Internal Documentation

Status: Implementation guide with checklist. Tasks are unchecked by design; check them off as you implement.

Project Goal: Build a production-ready RAG (Retrieval-Augmented Generation) system for Minnova's internal documentation to serve as a proof of concept and learning exercise before taking on client RAG projects.

Target Corpus:

- Minnova knowledge-base (services, accounting, ops, faq)
- Minnova-site content (blog posts, service pages)
- AGENTS.md and other technical documentation

Success Criteria:

- Query: "How do we handle accounting?" -> returns relevant content from accounting/README.md with citations
- Query: "What services do we offer?" -> returns the services list with proper source links
- Response time < 2 seconds
- Citation accuracy: answers link back to exact source documents
- Can be self-hosted or use managed services
## Phase 1: Setup & Planning (2-3 hours)

### Tasks

- [ ] Set up PostgreSQL with pgvector extension locally
- [ ] Create new Elixir/Phoenix project or add to existing
- [ ] Add dependencies: req, req_llm, ecto, pgvector
- [ ] Get OpenAI API key (or Anthropic)
- [ ] Set up migrations for vector storage table

### Deliverables

- Working dev environment with pgvector
- Database schema for storing document chunks + embeddings
- Config for LLM API credentials (see the snippet below)

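A minimal sketch of the credentials config, assuming the key is supplied via an `OPENAI_API_KEY` environment variable at runtime (the `:openai_api_key` config name is an arbitrary choice):

```elixir
# config/runtime.exs
import Config

# Raises at boot if the key is missing, so later API calls can assume it is set.
config :minnova, :openai_api_key, System.fetch_env!("OPENAI_API_KEY")
```
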
### Elixir Dependencies

```elixir
# mix.exs
defp deps do
  [
    {:phoenix, "~> 1.7"},
    {:ecto_sql, "~> 3.10"},
    {:postgrex, ">= 0.0.0"},
    {:pgvector, "~> 0.2"},  # Elixir client for the pgvector extension
    {:req, "~> 0.4"},       # HTTP client
    {:req_llm, "~> 0.1"},   # LLM provider wrapper
    {:jason, "~> 1.4"}      # JSON parsing
  ]
end
```

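pgvector-elixir also needs its Postgrex extension registered so vector columns can be encoded and decoded through Ecto; a minimal sketch of that wiring, with placeholder module names, along the lines of the pgvector-elixir README:

```elixir
# lib/minnova/postgrex_types.ex
# Registers the pgvector type alongside the default Postgres extensions.
Postgrex.Types.define(
  Minnova.PostgrexTypes,
  [Pgvector.Extensions.Vector] ++ Ecto.Adapters.Postgres.extensions(),
  []
)
```

```elixir
# config/config.exs
# Point the Repo at the custom type module defined above.
config :minnova, Minnova.Repo, types: Minnova.PostgrexTypes
```
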
### Database Schema

```elixir
defmodule Minnova.Repo.Migrations.CreateDocumentChunks do
  use Ecto.Migration

  def up do
    execute "CREATE EXTENSION IF NOT EXISTS vector"

    create table(:document_chunks) do
      add :content, :text, null: false
      add :source_file, :string, null: false
      add :heading, :string
      add :line_start, :integer
      add :line_end, :integer
      add :embedding, :vector, size: 1536  # OpenAI text-embedding-3-small dimension
      timestamps()
    end

    # Approximate-nearest-neighbour index for cosine similarity; the opclass goes in
    # the column string (Ecto's index/3 has no :opclass option).
    create index(:document_chunks, ["embedding vector_cosine_ops"],
             using: :ivfflat,
             options: "lists = 100"
           )
  end

  def down do
    drop table(:document_chunks)
    execute "DROP EXTENSION IF EXISTS vector"
  end
end
```

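The ingest example below references a `Minnova.RAG.DocumentChunk` schema; a minimal sketch, assuming the field names from the migration above:

```elixir
defmodule Minnova.RAG.DocumentChunk do
  use Ecto.Schema
  import Ecto.Changeset

  schema "document_chunks" do
    field :content, :string
    field :source_file, :string
    field :heading, :string
    field :line_start, :integer
    field :line_end, :integer
    # Pgvector.Ecto.Vector maps the vector column to/from a list of floats.
    field :embedding, Pgvector.Ecto.Vector
    timestamps()
  end

  def changeset(chunk, attrs) do
    chunk
    |> cast(attrs, [:content, :source_file, :heading, :line_start, :line_end, :embedding])
    |> validate_required([:content, :source_file, :embedding])
  end
end
```
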
## Phase 2: Document Ingestion Pipeline (4-6 hours)

### Tasks

- [ ] Write Mix task to parse all markdown files from knowledge-base
- [ ] Implement chunking strategy (start simple: chunk by heading or fixed size ~500 tokens)
- [ ] Generate embeddings for each chunk via OpenAI API
- [ ] Store chunks + embeddings in PostgreSQL using Ecto
- [ ] Test: verify all docs are indexed and searchable

### Deliverables

- Mix task: `mix rag.ingest /path/to/knowledge-base`
- PostgreSQL populated with embedded chunks
- Metadata tracking (source file, line numbers, headings)

### Technical Decisions

- Chunking: Start with heading-based (## sections in markdown)
- Embeddings: OpenAI text-embedding-3-small ($0.02 per 1M tokens)
- Storage: PostgreSQL with pgvector (cosine similarity search)
### Example Implementation

```elixir
defmodule Minnova.RAG.Ingest do
  @moduledoc "Ingests markdown files and generates embeddings"

  alias Minnova.Repo
  alias Minnova.RAG.DocumentChunk

  def ingest_directory(path) do
    path
    |> Path.join("**/*.md")
    |> Path.wildcard()
    |> Enum.each(&ingest_file/1)
  end

  defp ingest_file(file_path) do
    file_path
    |> File.read!()
    |> chunk_by_heading()
    |> Enum.each(fn chunk ->
      embedding = generate_embedding(chunk.content)

      %DocumentChunk{}
      |> DocumentChunk.changeset(%{
        content: chunk.content,
        source_file: file_path,
        heading: chunk.heading,
        embedding: embedding
      })
      |> Repo.insert!()
    end)
  end

  # Calls the OpenAI embeddings endpoint via Req; returns a list of 1536 floats.
  # Reads the API key configured in config/runtime.exs (see Phase 1).
  defp generate_embedding(text) do
    response =
      Req.post!("https://api.openai.com/v1/embeddings",
        auth: {:bearer, Application.fetch_env!(:minnova, :openai_api_key)},
        json: %{model: "text-embedding-3-small", input: text}
      )

    get_in(response.body, ["data", Access.at(0), "embedding"])
  end

  # Splits markdown on ## headings and keeps the heading text with each chunk.
  # Simple first pass: content before the first ## heading is dropped.
  defp chunk_by_heading(markdown) do
    markdown
    |> String.split(~r/^##\s+/m)
    |> Enum.drop(1)
    |> Enum.map(fn section ->
      {heading, content} =
        case String.split(section, "\n", parts: 2) do
          [heading, content] -> {heading, content}
          [heading] -> {heading, ""}
        end

      %{heading: String.trim(heading), content: String.trim(content)}
    end)
    |> Enum.reject(&(&1.content == ""))
  end
end
```

## Phase 3: Query & Retrieval (3-4 hours)

### Tasks

- [ ] Implement query embedding (same model as document embeddings)
- [ ] Vector similarity search (retrieve top-k chunks, start with k=5)
- [ ] Reranking logic (optional: use Cohere rerank or simple keyword scoring)
- [ ] Test retrieval quality on sample queries
- [ ] Tune k parameter and similarity threshold

### Deliverables

- Query function that returns top-k relevant chunks (see the sketch below)
- Evaluation of retrieval quality (manual testing with 10-15 queries)
- Documentation of optimal k value and similarity threshold

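A minimal retrieval sketch using Ecto and pgvector's `<=>` cosine-distance operator. Module names are assumptions, and it presumes the embedding call from the Phase 2 example is extracted into a shared (hypothetical) `Minnova.RAG.Embeddings.embed/1` helper:

```elixir
defmodule Minnova.RAG.Retrieve do
  import Ecto.Query

  alias Minnova.Repo
  alias Minnova.RAG.DocumentChunk

  @doc "Returns the k chunks closest to the query by cosine distance."
  def top_chunks(query_text, k \\ 5) do
    # Hypothetical shared helper; same OpenAI embeddings call as in the ingest example.
    query_embedding = Pgvector.new(Minnova.RAG.Embeddings.embed(query_text))

    DocumentChunk
    |> order_by([c], fragment("? <=> ?", c.embedding, ^query_embedding))
    |> limit(^k)
    |> Repo.all()
  end
end
```

A similarity threshold can be layered on top later, either by selecting the distance and filtering in Elixir or with an extra `where: fragment(...)` clause.
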
### Sample Queries for Testing

- "How do we handle accounting?"
- "What services do we offer?"
- "Explain trunk-based development"
- "What is our tech stack?"
- "How do AI agents work?"
## Phase 4: Answer Generation with Citations (3-4 hours)

### Tasks

- [ ] Build prompt template that includes retrieved chunks
- [ ] Instruct LLM to cite sources in answers
- [ ] Parse LLM response and format citations as links
- [ ] Implement fallback for low-confidence answers ("I don't have enough information")
- [ ] Test citation accuracy manually

### Deliverables

- Prompt template with clear instructions for citations
- Answer generation function (see the sketch after the prompt below)
- Citation formatter (markdown links back to source files)

### Prompt Structure (Initial)

```text
You are a helpful assistant for Minnova's internal documentation.

Context:
{retrieved_chunks_with_sources}

Question: {user_question}

Instructions:
- Answer based ONLY on the provided context
- Include inline citations [1], [2], etc.
- If the context doesn't contain enough information, say so
- Keep answers concise and practical

Answer:
```

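A rough sketch of the answer generation step plus a simple citation formatter. It fills the template above and calls the OpenAI chat completions endpoint directly via Req (rather than assuming a particular req_llm API); module names and the return shape are illustrative:

```elixir
defmodule Minnova.RAG.Answer do
  @moduledoc "Builds the prompt from retrieved chunks and asks the LLM for a cited answer."

  def answer(question, chunks) do
    # Number the chunks so the model can cite them as [1], [2], etc.
    context =
      chunks
      |> Enum.with_index(1)
      |> Enum.map_join("\n\n", fn {chunk, i} ->
        "[#{i}] #{chunk.source_file} (#{chunk.heading})\n#{chunk.content}"
      end)

    prompt = """
    You are a helpful assistant for Minnova's internal documentation.

    Context:
    #{context}

    Question: #{question}

    Instructions:
    - Answer based ONLY on the provided context
    - Include inline citations [1], [2], etc.
    - If the context doesn't contain enough information, say so
    - Keep answers concise and practical

    Answer:
    """

    response =
      Req.post!("https://api.openai.com/v1/chat/completions",
        auth: {:bearer, Application.fetch_env!(:minnova, :openai_api_key)},
        json: %{model: "gpt-4o-mini", messages: [%{role: "user", content: prompt}]}
      )

    answer_text = get_in(response.body, ["choices", Access.at(0), "message", "content"])
    {answer_text, format_citations(chunks)}
  end

  # Markdown reference links so each [n] in the answer points back to its source file.
  defp format_citations(chunks) do
    chunks
    |> Enum.with_index(1)
    |> Enum.map_join("\n", fn {chunk, i} ->
      "[#{i}]: [#{chunk.source_file}](#{chunk.source_file})"
    end)
  end
end
```
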
## Phase 5: Basic UI/Interface (2-3 hours)

### Tasks

- [ ] Build simple CLI interface (start here)
- [ ] Or: build simple web UI (Phoenix LiveView or Next.js)
- [ ] Display query, answer, and citations
- [ ] Add loading states and error handling

### Deliverables

- Working interface for testing
- README with usage instructions

### CLI Option (Faster)

```sh
$ mix rag.query "How do we handle accounting?"
```

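A minimal Mix task sketch that wires the retrieval and answer modules sketched in Phases 3 and 4 together (all module names are assumptions):

```elixir
defmodule Mix.Tasks.Rag.Query do
  use Mix.Task

  @shortdoc "Asks a question against the indexed documentation"

  @impl Mix.Task
  def run([question]) do
    # Start the application (and the Repo) before querying.
    Mix.Task.run("app.start")

    chunks = Minnova.RAG.Retrieve.top_chunks(question)
    {answer, citations} = Minnova.RAG.Answer.answer(question, chunks)

    IO.puts(answer)
    IO.puts("\nSources:\n" <> citations)
  end
end
```
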
### Web UI Option (Better UX)

- Simple Phoenix LiveView page
- Input box for query
- Display answer with citations
- Show retrieved chunks (for debugging)
## Phase 6: Edge Cases & Refinement (4-6 hours)

### Tasks

- [ ] Test query rewriting (e.g., "KYC flow" vs "customer onboarding")
- [ ] Implement hybrid search (vector + keyword) if needed (see the sketch after this list)
- [ ] Handle multi-document answers (citations from multiple sources)
- [ ] Add observability (log queries, latency, retrieval quality)
- [ ] Test with diverse queries and tune chunking strategy

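If plain vector search misses keyword-heavy queries, one option is to blend pgvector's cosine distance with Postgres full-text rank in a single ORDER BY. A rough sketch under that assumption; the 0.2 weight and module name are arbitrary starting points, and the embedding helper is the same hypothetical one used in Phase 3:

```elixir
defmodule Minnova.RAG.HybridRetrieve do
  import Ecto.Query

  alias Minnova.Repo
  alias Minnova.RAG.DocumentChunk

  def top_chunks(query_text, k \\ 5) do
    embedding = Pgvector.new(Minnova.RAG.Embeddings.embed(query_text))

    DocumentChunk
    |> order_by(
      [c],
      # Lower cosine distance is better; subtracting the keyword rank rewards
      # chunks that also match the query terms lexically.
      fragment(
        "(? <=> ?) - 0.2 * ts_rank(to_tsvector('english', ?), websearch_to_tsquery('english', ?))",
        c.embedding,
        ^embedding,
        c.content,
        ^query_text
      )
    )
    |> limit(^k)
    |> Repo.all()
  end
end
```
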
### Deliverables

- Refined chunking strategy (based on actual retrieval quality)
- Query rewriting logic (if needed)
- Observability dashboard or logs

### Known Edge Cases to Test

- Synonyms: "KYC" vs "customer onboarding"
- Multi-step questions: "How do we handle accounting and what services do we offer?"
- Missing information: "What is our revenue?" (should say "I don't know")
- Ambiguous queries: "How do we work?" (process vs services vs tech stack)
## Phase 7: Production Hardening (Optional, 3-4 hours)

### Tasks

- [ ] Add access control (if hosting publicly)
- [ ] Implement PII redaction (if needed)
- [ ] Add rate limiting
- [ ] Set up monitoring and alerts
- [ ] Write runbook for common issues

### Deliverables

- Production-ready deployment
- Monitoring dashboard
- Runbook for operations

## Phase 8: Documentation & Blog Post (2-3 hours)

### Tasks

- [ ] Document architecture and design decisions
- [ ] Write developer guide for maintaining the system
- [ ] Update blog post with learnings from building the system
- [ ] Create demo video or screenshots

### Deliverables

- Architecture documentation
- Developer guide
- Updated blog post (experience report instead of how-to)
- Demo assets

## Timeline Estimate

- MVP (Phases 1-5): 14-20 hours (1-2 weekends)
- Production-ready (Phases 1-7): 25-35 hours (2-3 weekends)
- Full project (Phases 1-8): 30-40 hours

## Tech Stack Recommendation

Elixir-first approach (recommended):

- Language: Elixir (fits your stack, plays to your strengths)
- Embeddings: OpenAI text-embedding-3-small (via Req)
- Vector DB: PostgreSQL + pgvector extension (you already know Postgres)
- LLM: OpenAI GPT-4o-mini or Anthropic Claude (via req_llm)
- Framework: LangChainEx or custom with req_llm

Why this stack:

- PostgreSQL + pgvector means no separate vector DB to manage
- Ecto for migrations and queries (familiar territory)
- req_llm handles LLM API calls cleanly
- Can deploy alongside existing Phoenix apps
- Simpler ops than Python + separate services

Alternative (if you want something working faster):

- Python + LangChain + Pinecone (2-3 hours to MVP)
- Then port to Elixir once you understand the flow

## Success Metrics

- Retrieval accuracy: 80%+ of queries return relevant chunks
- Citation accuracy: 90%+ of answers cite correct sources
- Response time: < 2 seconds end-to-end
- User satisfaction: Manual testing feels "good enough" for internal use

## Next Actions

- Pick tech stack (Python MVP or Elixir from start)
- Set up development environment
- Start Phase 1 (setup & planning)
- Build ingestion pipeline first (Phase 2)
- Iterate quickly, learn as you go
## Resources

Elixir Libraries:

- pgvector-elixir - Elixir client for the pgvector extension
- req_llm - LLM API wrapper built on Req
- LangChainEx - Elixir LangChain implementation
- Req - HTTP client for Elixir

AI/ML APIs:

- OpenAI Embeddings Guide
- OpenAI API Reference
- Anthropic Claude API

PostgreSQL:

- pgvector Documentation
- pgvector Performance Tuning

## Open Questions

- Should we self-host or use managed services?
- Do we need query rewriting from day 1?
- Should we build CLI or web UI first?
- What's the budget for API costs (embeddings + LLM)?
- Do we want to open-source this after building it?