Why RAG is Essential
Large Language Models have static, pre-trained knowledge that becomes obsolete quickly. Immigration law is highly volatile:
- Executive orders change enforcement priorities
- Agency memoranda update procedures
- Case law creates new precedents
- Administrative policies shift rapidly
RAG (Retrieval Augmented Generation) solves this by:
- Intercepting user queries
- Searching your curated legal content
- Injecting relevant text into the LLM's context
- Constraining responses to provided sources
RAG Pipeline Architecture
┌─────────────────────────────────────────────────────────┐
│ User Query │
│ "What are my rights at a checkpoint?" │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Query Embedding │
│ nomic-embed-text / bge-large │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Vector Database Search │
│ ChromaDB / Qdrant / Weaviate │
│ Return: Top 5 relevant document chunks │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Context Injection │
│ System: "Answer using ONLY the following sources:" │
│ [Chunk 1: checkpoints.md, lines 45-89] │
│ [Chunk 2: 100-mile-zone.md, lines 12-56] │
│ [Chunk 3: traffic-stops.md, lines 78-102] │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ LLM Generation │
│ "At immigration checkpoints, you have the right │
│ to remain silent... [Source: checkpoints.md]" │
└─────────────────────────────────────────────────────────┘
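The flow above can be condensed into a few lines of orchestration code. This is a sketch with stub helpers: `embed`, `vector_search`, and `llm_generate` are placeholders for the embedding model, vector database, and LLM covered in the sections that follow.

```python
# End-to-end sketch of the pipeline above; the three helpers are stand-ins
# for the real embedding model, vector store, and LLM described below.
def embed(text: str) -> list[float]:
    return [float(len(text))]  # placeholder embedding

def vector_search(vector: list[float], top_k: int = 5) -> list[dict]:
    # placeholder: a real implementation queries ChromaDB or Qdrant
    return [{"source": "checkpoints.md", "text": "You may remain silent."}][:top_k]

def llm_generate(system_prompt: str, query: str) -> str:
    # placeholder: a real implementation calls a locally hosted LLM
    return "grounded answer citing the sources in the prompt"

def answer_query(query: str) -> str:
    query_vector = embed(query)                    # 1. embed the query
    chunks = vector_search(query_vector, top_k=5)  # 2. retrieve relevant chunks
    context = "\n\n".join(f"[{c['source']}]\n{c['text']}" for c in chunks)
    prompt = f"Answer using ONLY the following sources:\n{context}"
    return llm_generate(prompt, query)             # 3-4. inject context, generate
```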
Vector Database Selection
ChromaDB (Recommended for Legal Aid)
Best for: Small to mid-sized deployments, local hosting
import chromadb
from chromadb.config import Settings

# Initialize with persistence
client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,  # Disable telemetry
        allow_reset=False            # Prevent accidental deletion
    )
)

# Create collection for KYR content
collection = client.create_collection(
    name="know_your_rights",
    metadata={"hnsw:space": "cosine"}
)
Advantages:
- Open-source, self-hosted
- Minimal infrastructure overhead
- Python-native, easy integration
- No external dependencies
Qdrant (Enterprise Scale)
Best for: Large deployments, high query volume
from qdrant_client import QdrantClient

client = QdrantClient(
    host="localhost",
    port=6333,
    prefer_grpc=True  # gRPC outperforms REST for bulk operations
)
Advantages:
- Rust-based, high performance
- Self-hosted option
- Advanced filtering capabilities
- Scales to millions of documents
Avoid Cloud-Only Solutions
Pinecone and similar cloud-only services are unsuitable because:
- Data leaves your infrastructure
- Creates subpoena-able records
- Violates air-gapped requirements
- Privacy cannot be guaranteed
Embedding Models
Recommended Models
| Model | Dimensions | Quality | Speed | License |
|---|---|---|---|---|
| nomic-embed-text | 768 | Excellent | Fast | Apache 2.0 |
| bge-large-en-v1.5 | 1024 | Excellent | Medium | MIT |
| all-MiniLM-L6-v2 | 384 | Good | Very Fast | Apache 2.0 |
| e5-large-v2 | 1024 | Excellent | Slow | MIT |
Legal-Optimized Embeddings
For legal text, consider fine-tuned models that capture statutory relationships:
from sentence_transformers import SentenceTransformer

# Load embedding model (nomic models require trust_remote_code)
embed_model = SentenceTransformer(
    'nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True
)

# Generate embeddings
def embed_document(text: str) -> list[float]:
    return embed_model.encode(text, convert_to_numpy=True).tolist()
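Retrieval then ranks chunks by cosine similarity between the query embedding and each stored chunk embedding. The scoring itself is simple enough to show directly (a self-contained illustration, independent of any particular model or database):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0; orthogonal vectors score 0.0
assert abs(cosine_similarity([1.0, 2.0], [2.0, 4.0]) - 1.0) < 1e-9
```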
Document Chunking Strategy
Why Chunking Matters
Legal documents cannot be arbitrarily sliced. Cutting a statute mid-sentence destroys conditional logic:
Bad chunk:
"...shall be subject to removal if the alien (1) was not admitted or paroled into the United States, and (2)..."
Good chunk:
"INA § 212(a)(6)(A)(i): An alien present in the United States without being
admitted or paroled, or who arrives in the United States at any time or place
other than as designated by the Attorney General, is inadmissible."
Hierarchical Chunking
Break documents by natural structure:
import re
from typing import List, Dict

def chunk_legal_document(content: str, metadata: Dict) -> List[Dict]:
    chunks = []
    # Split by headers (##, ###); the capture group keeps each header
    sections = re.split(r'\n(#{2,3}\s+[^\n]+)\n', content)
    current_header = metadata.get('title', 'Unknown')
    for section in sections:
        if section.startswith('#'):
            current_header = section.lstrip('#').strip()
        elif len(section.strip()) > 100:  # Minimum chunk size
            chunks.append({
                'text': section.strip(),
                'header': current_header,
                'source': metadata.get('source'),
                'jurisdiction': metadata.get('jurisdiction'),
                'last_updated': metadata.get('last_updated')
            })
    return chunks
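The capture group in the split pattern is what keeps each header in the result list, letting the loop track the current header. A small demonstration on hypothetical content:

```python
import re

# The capturing group makes re.split return the headers as separate items
content = "Intro text here.\n## Checkpoints\nYour rights at checkpoints...\n"
sections = re.split(r'\n(#{2,3}\s+[^\n]+)\n', content)
```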
Overlap for Context Preservation
Include 10-20% overlap between chunks:
def chunk_with_overlap(text: str, chunk_size: int = 500, overlap: int = 100):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk) > 50:  # Minimum viable chunk
            chunks.append(chunk)
    return chunks
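Scaled down to 10-word chunks with a 3-word overlap, the repetition between consecutive chunks becomes easy to see (same logic as above, smaller numbers):

```python
# Same sliding-window logic as chunk_with_overlap above, shown at a small
# scale (10-word chunks, 3-word overlap) so the overlap is visible.
def chunk_with_overlap(text: str, chunk_size: int = 10, overlap: int = 3):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk) > 5:  # scaled-down minimum chunk size
            chunks.append(chunk)
    return chunks

text = ' '.join(f"w{n}" for n in range(20))
chunks = chunk_with_overlap(text)
# Each chunk repeats the last 3 words of its predecessor
```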
11ty Integration
Ingesting Markdown Content
Your 11ty site's Markdown files are the authoritative source. Parse them for RAG:
import frontmatter
from pathlib import Path

def ingest_11ty_content(content_dir: str):
    """Ingest all Markdown files from the 11ty source."""
    documents = []
    for md_file in Path(content_dir).rglob('*.md'):
        post = frontmatter.load(md_file)

        # Extract metadata from YAML frontmatter
        metadata = {
            'source': str(md_file),
            'title': post.get('title'),
            'jurisdiction': post.get('jurisdiction'),
            'last_updated': post.get('last_updated'),
            'schema_type': post.get('schema_type'),
            'tags': post.get('tags', [])
        }

        # Chunk the content
        chunks = chunk_legal_document(post.content, metadata)
        documents.extend(chunks)
    return documents
Hybrid Search with Metadata Filtering
Use frontmatter metadata for precise filtering:
def search_with_jurisdiction(
    query: str,
    jurisdiction: str = None,
    content_type: str = None
) -> List[Dict]:
    """Search with optional metadata filters."""
    # Generate query embedding
    query_embedding = embed_document(query)

    # Build filter (recent ChromaDB versions require $and to combine fields)
    conditions = []
    if jurisdiction:
        conditions.append({'jurisdiction': jurisdiction})
    if content_type:
        conditions.append({'schema_type': content_type})
    if not conditions:
        where_filter = None
    elif len(conditions) == 1:
        where_filter = conditions[0]
    else:
        where_filter = {'$and': conditions}

    # Query ChromaDB
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=5,
        where=where_filter
    )
    return results
Incremental Indexing
Don't rebuild the entire database on every update:
import subprocess

def get_changed_files(since_date: str) -> List[str]:
    """Get Markdown files changed since the last index."""
    result = subprocess.run(
        ['git', 'log', '--since', since_date, '--name-only', '--pretty=format:'],
        capture_output=True, text=True
    )
    files = [f for f in result.stdout.split('\n') if f.endswith('.md')]
    return list(set(files))

def incremental_index(content_dir: str, last_index_date: str):
    """Only re-embed changed files."""
    for file_path in get_changed_files(last_index_date):
        # Delete old chunks for this file
        collection.delete(where={'source': file_path})

        # Re-ingest, rebuilding the same metadata fields used at ingestion
        post = frontmatter.load(file_path)
        metadata = {'source': file_path, 'title': post.get('title')}
        chunks = chunk_legal_document(post.content, metadata)

        # Add new chunks
        for i, chunk in enumerate(chunks):
            collection.add(
                documents=[chunk['text']],
                metadatas=[chunk],
                ids=[f"{file_path}_{i}"]
            )
System Prompt for RAG
Strict Grounding Instructions
SYSTEM_PROMPT = """You are an educational assistant providing general information
about immigration rights. You must:
1. ONLY use information from the provided source documents
2. ALWAYS cite the source document for each factual claim
3. If the sources don't contain the answer, say "I don't have information about that"
4. NEVER provide legal advice or apply law to specific situations
5. ALWAYS remind users to consult a licensed immigration attorney
CRITICAL: You are NOT a lawyer. You provide educational information only.
---
SOURCES:
{context}
---
Answer the user's question using ONLY the sources above. Cite sources like this:
[Source: document_name.md]"""
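The `{context}` slot is filled by a formatting helper. A minimal sketch of `format_context`, used during generation; the chunk fields match the metadata stored at ingestion:

```python
def format_context(chunks: list[dict]) -> str:
    """Render retrieved chunks into the SOURCES block of the system prompt."""
    parts = []
    for i, chunk in enumerate(chunks, 1):
        parts.append(f"[Chunk {i}: {chunk['source']}]\n{chunk['text']}")
    return "\n\n".join(parts)

context = format_context([
    {"source": "checkpoints.md", "text": "You may remain silent."},
])
```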
Handling Low-Confidence Queries
When vector similarity scores are low, refuse gracefully:
def generate_response(query: str, retrieved_docs: List[Dict]):
    # Check retrieval confidence: if even the best match is weak, refuse
    best_similarity = max(doc['score'] for doc in retrieved_docs)

    if best_similarity < 0.65:  # Low-confidence threshold
        return """I don't have specific information about that topic in my
knowledge base. For accurate information, please:

1. Visit our Know Your Rights guides at /know-your-rights/
2. Contact a licensed immigration attorney
3. Call the United We Dream hotline: 1-844-363-1423

This ensures you receive accurate, up-to-date information."""

    # Proceed with normal generation
    context = format_context(retrieved_docs)
    return llm.generate(SYSTEM_PROMPT.format(context=context), query)
Citation Injection
Automatic Source Attribution
Force citations in responses:
def post_process_response(response: str, sources: List[Dict]) -> str:
    """Ensure citations are present in the response."""
    if '[Source:' not in response:
        # Append a source list when the model omitted inline citations
        source_list = "\n\nSources:\n"
        for src in sources:
            source_list += f"- {src['title']} ({src['source']})\n"
        response += source_list
    return response
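The `[Source: …]` format is regular enough to extract with a small regex, which is also what the evaluation framework below assumes for `extract_citations`:

```python
import re

def extract_citations(response: str) -> list[str]:
    """Return the document names cited as [Source: name] in a response."""
    return re.findall(r'\[Source:\s*([^\]]+)\]', response)

cites = extract_citations(
    "You may remain silent. [Source: checkpoints.md] "
    "Agents need a warrant to enter. [Source: warrants.md]"
)
```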
Quality Assurance
Evaluation Metrics
Test RAG quality with legal-specific benchmarks:
| Metric | Target | Description |
|---|---|---|
| Retrieval Precision | >85% | Relevant chunks in top 5 |
| Citation Accuracy | 100% | Every claim has valid source |
| Hallucination Rate | <5% | Claims not in sources |
| Refusal Rate | >90% | Correct refusal on OOD queries |
Testing Framework
def evaluate_rag_response(
    question: str,
    response: str,
    expected_sources: List[str],
    retrieved_context: str,
    ground_truth: str
) -> Dict:
    """Evaluate RAG response quality."""
    # Check citations
    cited_sources = extract_citations(response)
    citation_accuracy = (
        len(set(cited_sources) & set(expected_sources)) / len(expected_sources)
        if expected_sources else 0.0
    )

    # Check for hallucinations against the context actually retrieved
    hallucinations = detect_hallucinations(response, retrieved_context)

    return {
        'citation_accuracy': citation_accuracy,
        'hallucination_count': len(hallucinations),
        'response_length': len(response),
        'contains_disclaimer': 'legal advice' in response.lower()
    }
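`detect_hallucinations` is left undefined above; one naive approximation flags response sentences with little lexical overlap with the retrieved context. This is a sketch only; real hallucination checks for legal content need an NLI model or human review:

```python
def detect_hallucinations(response: str, context: str, threshold: float = 0.3) -> list[str]:
    """Naive lexical check: flag response sentences sharing few words with the context."""
    context_words = set(context.lower().split())
    flagged = []
    for sentence in response.split('.'):
        words = set(sentence.lower().split())
        if not words:
            continue
        overlap = len(words & context_words) / len(words)
        if overlap < threshold:
            flagged.append(sentence.strip())
    return flagged

context = "you have the right to remain silent at immigration checkpoints"
response = "You have the right to remain silent. Agents issue fines by mail."
flagged = detect_hallucinations(response, context)
```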
Keeping Content Current
Update Workflow
1. Legal team updates Markdown in 11ty repo
2. CI/CD triggers incremental re-indexing
3. Changed documents are re-embedded
4. Vector database updates atomically
5. Users immediately get current information
Version Control for Legal Accuracy
# Store version info with each chunk
metadata = {
    'source': 'checkpoints.md',
    'git_commit': 'abc123',
    'last_updated': '2026-03-24',
    'reviewed_by': 'legal_team',
    'version': '2.1'
}
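The `git_commit` value can be captured automatically at index time rather than filled in by hand. A sketch using `git rev-parse` (assumes the indexing job runs inside the repository checkout):

```python
import subprocess

def current_commit() -> str:
    """Short hash of the checked-out commit, for chunk provenance."""
    result = subprocess.run(
        ['git', 'rev-parse', '--short', 'HEAD'],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```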
Next Steps
- Implement safety guardrails - UPL compliance is required
- Configure privacy architecture - Zero-retention logging
- Review multilingual support - Spanish-first design