Why RAG is Essential
Large Language Models have static, pre-trained knowledge that becomes obsolete quickly. Immigration law is highly volatile:
- Executive orders change enforcement priorities
- Agency memoranda update procedures
- Case law creates new precedents
- Administrative policies shift rapidly
RAG (Retrieval Augmented Generation) solves this by:
- Intercepting user queries
- Searching your curated legal content
- Injecting relevant text into the LLM's context
- Constraining responses to provided sources
RAG Pipeline Architecture
┌─────────────────────────────────────────────────────────┐
│ User Query │
│ "What are my rights at a checkpoint?" │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Query Embedding │
│ nomic-embed-text / bge-large │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Vector Database Search │
│ ChromaDB / Qdrant / Weaviate │
│ Return: Top 5 relevant document chunks │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Context Injection │
│ System: "Answer using ONLY the following sources:" │
│ [Chunk 1: checkpoints.md, lines 45-89] │
│ [Chunk 2: 100-mile-zone.md, lines 12-56] │
│ [Chunk 3: traffic-stops.md, lines 78-102] │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ LLM Generation │
│ "At immigration checkpoints, you have the right │
│ to remain silent... [Source: checkpoints.md]" │
└─────────────────────────────────────────────────────────┘
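The flow above can be condensed into a few lines of orchestration code. This is a sketch with stub helpers: `embed`, `vector_search`, and `llm_generate` are placeholders for the embedding model, vector database, and LLM covered in the sections that follow.

```python
# End-to-end sketch of the pipeline above; the three helpers are stand-ins
# for the real embedding model, vector store, and LLM described below.
def embed(text: str) -> list[float]:
    return [float(len(text))]  # placeholder embedding

def vector_search(vector: list[float], top_k: int = 5) -> list[dict]:
    # placeholder: a real implementation queries ChromaDB or Qdrant
    return [{"source": "checkpoints.md", "text": "You may remain silent."}][:top_k]

def llm_generate(system_prompt: str, query: str) -> str:
    # placeholder: a real implementation calls a locally hosted LLM
    return "grounded answer citing the sources in the prompt"

def answer_query(query: str) -> str:
    query_vector = embed(query)                    # 1. embed the query
    chunks = vector_search(query_vector, top_k=5)  # 2. retrieve relevant chunks
    context = "\n\n".join(f"[{c['source']}]\n{c['text']}" for c in chunks)
    prompt = f"Answer using ONLY the following sources:\n{context}"
    return llm_generate(prompt, query)             # 3-4. inject context, generate
```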
Vector Database Selection
ChromaDB (Recommended for Legal Aid)
Best for: Small to mid-sized deployments, local hosting
import chromadb
from chromadb.config import Settings

# Initialize with persistence
client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,  # Disable telemetry
        allow_reset=False            # Prevent accidental deletion
    )
)

# Create collection for KYR content
collection = client.create_collection(
    name="know_your_rights",
    metadata={"hnsw:space": "cosine"}
)
Advantages:
- Open-source, self-hosted
- Minimal infrastructure overhead
- Python-native, easy integration
- No external dependencies
Qdrant (Enterprise Scale)
Best for: Large deployments, high query volume
from qdrant_client import QdrantClient

client = QdrantClient(
    host="localhost",
    port=6333,
    prefer_grpc=True  # gRPC outperforms REST for bulk operations
)
Advantages:
- Rust-based, high performance
- Self-hosted option
- Advanced filtering capabilities
- Scales to millions of documents
Avoid Cloud-Only Solutions
Pinecone and similar cloud-only services are unsuitable because:
- Data leaves your infrastructure
- Creates subpoena-able records
- Violates air-gapped requirements
- Privacy cannot be guaranteed
Embedding Models
Recommended Models
| Model | Dimensions | Quality | Speed | License |
|---|---|---|---|---|
| nomic-embed-text | 768 | Excellent | Fast | Apache 2.0 |
| bge-large-en-v1.5 | 1024 | Excellent | Medium | MIT |
| all-MiniLM-L6-v2 | 384 | Good | Very Fast | Apache 2.0 |
| e5-large-v2 | 1024 | Excellent | Slow | MIT |
Legal-Optimized Embeddings
For legal text, consider fine-tuned models that capture statutory relationships:
from sentence_transformers import SentenceTransformer

# Load embedding model (nomic models require trust_remote_code)
embed_model = SentenceTransformer(
    'nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True
)

# Generate embeddings
def embed_document(text: str) -> list[float]:
    return embed_model.encode(text, convert_to_numpy=True).tolist()
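Retrieval then ranks chunks by cosine similarity between the query embedding and each stored chunk embedding. The scoring itself is simple enough to show directly (a self-contained illustration, independent of any particular model or database):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0; orthogonal vectors score 0.0
assert abs(cosine_similarity([1.0, 2.0], [2.0, 4.0]) - 1.0) < 1e-9
```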
Document Chunking Strategy
Why Chunking Matters
Legal documents cannot be arbitrarily sliced. Cutting a statute mid-sentence destroys conditional logic:
Bad chunk:
"...shall be subject to removal if the alien (1) was not admitted or paroled into the United States, and (2)..."
Good chunk:
"INA § 212(a)(6)(A)(i): An alien present in the United States without being
admitted or paroled, or who arrives in the United States at any time or place
other than as designated by the Attorney General, is inadmissible."
Hierarchical Chunking
Break documents by natural structure:
import re
from typing import List, Dict

def chunk_legal_document(content: str, metadata: Dict) -> List[Dict]:
    chunks = []
    # Split by headers (##, ###); the capture group keeps each header
    sections = re.split(r'\n(#{2,3}\s+[^\n]+)\n', content)
    current_header = metadata.get('title', 'Unknown')
    for section in sections:
        if section.startswith('#'):
            current_header = section.lstrip('#').strip()
        elif len(section.strip()) > 100:  # Minimum chunk size
            chunks.append({
                'text': section.strip(),
                'header': current_header,
                'source': metadata.get('source'),
                'jurisdiction': metadata.get('jurisdiction'),
                'last_updated': metadata.get('last_updated')
            })
    return chunks
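The capture group in the split pattern is what keeps each header in the result list, letting the loop track the current header. A small demonstration on hypothetical content:

```python
import re

# The capturing group makes re.split return the headers as separate items
content = "Intro text here.\n## Checkpoints\nYour rights at checkpoints...\n"
sections = re.split(r'\n(#{2,3}\s+[^\n]+)\n', content)
```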
Overlap for Context Preservation
Include 10-20% overlap between chunks:
def chunk_with_overlap(text: str, chunk_size: int = 500, overlap: int = 100):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk) > 50:  # Minimum viable chunk
            chunks.append(chunk)
    return chunks
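Scaled down to 10-word chunks with a 3-word overlap, the repetition between consecutive chunks becomes easy to see (same logic as above, smaller numbers):

```python
# Same sliding-window logic as chunk_with_overlap above, shown at a small
# scale (10-word chunks, 3-word overlap) so the overlap is visible.
def chunk_with_overlap(text: str, chunk_size: int = 10, overlap: int = 3):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk) > 5:  # scaled-down minimum chunk size
            chunks.append(chunk)
    return chunks

text = ' '.join(f"w{n}" for n in range(20))
chunks = chunk_with_overlap(text)
# Each chunk repeats the last 3 words of its predecessor
```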
11ty Integration
Ingesting Markdown Content
Your 11ty site's Markdown files are the authoritative source. Parse them for RAG:
import frontmatter
from pathlib import Path

def ingest_11ty_content(content_dir: str):
    """Ingest all Markdown files from the 11ty source."""
    documents = []
    for md_file in Path(content_dir).rglob('*.md'):
        post = frontmatter.load(md_file)

        # Extract metadata from YAML frontmatter
        metadata = {
            'source': str(md_file),
            'title': post.get('title'),
            'jurisdiction': post.get('jurisdiction'),
            'last_updated': post.get('last_updated'),
            'schema_type': post.get('schema_type'),
            'tags': post.get('tags', [])
        }

        # Chunk the content
        chunks = chunk_legal_document(post.content, metadata)
        documents.extend(chunks)
    return documents
Hybrid Search with Metadata Filtering
Use frontmatter metadata for precise filtering:
def search_with_jurisdiction(
    query: str,
    jurisdiction: str = None,
    content_type: str = None
) -> List[Dict]:
    """Search with optional metadata filters."""
    # Generate query embedding
    query_embedding = embed_document(query)

    # Build filter (recent ChromaDB versions require $and to combine fields)
    conditions = []
    if jurisdiction:
        conditions.append({'jurisdiction': jurisdiction})
    if content_type:
        conditions.append({'schema_type': content_type})
    if not conditions:
        where_filter = None
    elif len(conditions) == 1:
        where_filter = conditions[0]
    else:
        where_filter = {'$and': conditions}

    # Query ChromaDB
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=5,
        where=where_filter
    )
    return results
Incremental Indexing
Don't rebuild the entire database on every update:
import subprocess

def get_changed_files(since_date: str) -> List[str]:
    """Get Markdown files changed since the last index."""
    result = subprocess.run(
        ['git', 'log', '--since', since_date, '--name-only', '--pretty=format:'],
        capture_output=True, text=True
    )
    files = [f for f in result.stdout.split('\n') if f.endswith('.md')]
    return list(set(files))

def incremental_index(content_dir: str, last_index_date: str):
    """Only re-embed changed files."""
    for file_path in get_changed_files(last_index_date):
        # Delete old chunks for this file
        collection.delete(where={'source': file_path})

        # Re-ingest, rebuilding the same metadata fields used at ingestion
        post = frontmatter.load(file_path)
        metadata = {'source': file_path, 'title': post.get('title')}
        chunks = chunk_legal_document(post.content, metadata)

        # Add new chunks
        for i, chunk in enumerate(chunks):
            collection.add(
                documents=[chunk['text']],
                metadatas=[chunk],
                ids=[f"{file_path}_{i}"]
            )
System Prompt for RAG
Strict Grounding Instructions
SYSTEM_PROMPT = """You are an educational assistant providing general information
about immigration rights. You must:
1. ONLY use information from the provided source documents
2. ALWAYS cite the source document for each factual claim
3. If the sources don't contain the answer, say "I don't have information about that"
4. NEVER provide legal advice or apply law to specific situations
5. ALWAYS remind users to consult a licensed immigration attorney
CRITICAL: You are NOT a lawyer. You provide educational information only.
---
SOURCES:
{context}
---
Answer the user's question using ONLY the sources above. Cite sources like this:
[Source: document_name.md]"""
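The `{context}` slot is filled by a formatting helper. A minimal sketch of `format_context`, used during generation; the chunk fields match the metadata stored at ingestion:

```python
def format_context(chunks: list[dict]) -> str:
    """Render retrieved chunks into the SOURCES block of the system prompt."""
    parts = []
    for i, chunk in enumerate(chunks, 1):
        parts.append(f"[Chunk {i}: {chunk['source']}]\n{chunk['text']}")
    return "\n\n".join(parts)

context = format_context([
    {"source": "checkpoints.md", "text": "You may remain silent."},
])
```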
Handling Low-Confidence Queries
When vector similarity scores are low, refuse gracefully:
def generate_response(query: str, retrieved_docs: List[Dict]):
    # Check retrieval confidence: if even the best match is weak, refuse
    best_similarity = max(doc['score'] for doc in retrieved_docs)

    if best_similarity < 0.65:  # Low-confidence threshold
        return """I don't have specific information about that topic in my
knowledge base. For accurate information, please:

1. Visit our Know Your Rights guides at /know-your-rights/
2. Contact a licensed immigration attorney
3. Call the United We Dream hotline: 1-844-363-1423

This ensures you receive accurate, up-to-date information."""

    # Proceed with normal generation
    context = format_context(retrieved_docs)
    return llm.generate(SYSTEM_PROMPT.format(context=context), query)
Citation Injection
Automatic Source Attribution
Force citations in responses:
def post_process_response(response: str, sources: List[Dict]) -> str:
    """Ensure citations are present in the response."""
    if '[Source:' not in response:
        # Append a source list when the model omitted inline citations
        source_list = "\n\nSources:\n"
        for src in sources:
            source_list += f"- {src['title']} ({src['source']})\n"
        response += source_list
    return response
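The `[Source: …]` format is regular enough to extract with a small regex, which is also what the evaluation framework below assumes for `extract_citations`:

```python
import re

def extract_citations(response: str) -> list[str]:
    """Return the document names cited as [Source: name] in a response."""
    return re.findall(r'\[Source:\s*([^\]]+)\]', response)

cites = extract_citations(
    "You may remain silent. [Source: checkpoints.md] "
    "Agents need a warrant to enter. [Source: warrants.md]"
)
```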
Quality Assurance
Evaluation Metrics
Test RAG quality with legal-specific benchmarks:
| Metric | Target | Description |
|---|---|---|
| Retrieval Precision | >85% | Relevant chunks in top 5 |
| Citation Accuracy | 100% | Every claim has valid source |
| Hallucination Rate | <5% | Claims not in sources |
| Refusal Rate | >90% | Correct refusal on OOD queries |
Testing Framework
def evaluate_rag_response(
    question: str,
    response: str,
    expected_sources: List[str],
    retrieved_context: str,
    ground_truth: str
) -> Dict:
    """Evaluate RAG response quality."""
    # Check citations
    cited_sources = extract_citations(response)
    citation_accuracy = (
        len(set(cited_sources) & set(expected_sources)) / len(expected_sources)
        if expected_sources else 0.0
    )

    # Check for hallucinations against the context actually retrieved
    hallucinations = detect_hallucinations(response, retrieved_context)

    return {
        'citation_accuracy': citation_accuracy,
        'hallucination_count': len(hallucinations),
        'response_length': len(response),
        'contains_disclaimer': 'legal advice' in response.lower()
    }
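`detect_hallucinations` is left undefined above; one naive approximation flags response sentences with little lexical overlap with the retrieved context. This is a sketch only; real hallucination checks for legal content need an NLI model or human review:

```python
def detect_hallucinations(response: str, context: str, threshold: float = 0.3) -> list[str]:
    """Naive lexical check: flag response sentences sharing few words with the context."""
    context_words = set(context.lower().split())
    flagged = []
    for sentence in response.split('.'):
        words = set(sentence.lower().split())
        if not words:
            continue
        overlap = len(words & context_words) / len(words)
        if overlap < threshold:
            flagged.append(sentence.strip())
    return flagged

context = "you have the right to remain silent at immigration checkpoints"
response = "You have the right to remain silent. Agents issue fines by mail."
flagged = detect_hallucinations(response, context)
```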
Keeping Content Current
Update Workflow
1. Legal team updates Markdown in 11ty repo
2. CI/CD triggers incremental re-indexing
3. Changed documents are re-embedded
4. Vector database updates atomically
5. Users immediately get current information
Version Control for Legal Accuracy
# Store version info with each chunk
metadata = {
    'source': 'checkpoints.md',
    'git_commit': 'abc123',
    'last_updated': '2026-03-24',
    'reviewed_by': 'legal_team',
    'version': '2.1'
}
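The `git_commit` value can be captured automatically at index time rather than filled in by hand. A sketch using `git rev-parse` (assumes the indexing job runs inside the repository checkout):

```python
import subprocess

def current_commit() -> str:
    """Short hash of the checked-out commit, for chunk provenance."""
    result = subprocess.run(
        ['git', 'rev-parse', '--short', 'HEAD'],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```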
Next Steps
- Implement safety guardrails - UPL compliance is required
- Configure privacy architecture - Zero-retention logging
- Review multilingual support - Spanish-first design