
Why RAG is Essential

Large Language Models have static, pre-trained knowledge that becomes obsolete quickly. Immigration law is highly volatile:

  • Executive orders change enforcement priorities
  • Agency memoranda update procedures
  • Case law creates new precedents
  • Administrative policies shift rapidly

RAG (Retrieval-Augmented Generation) solves this by:

  1. Intercepting user queries
  2. Searching your curated legal content
  3. Injecting relevant text into the LLM's context
  4. Constraining responses to provided sources
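
These four steps can be sketched end-to-end in miniature. Everything here is an illustrative stand-in, not the real components: `toy_embed` is a bag-of-words counter in place of a neural embedding model, `CORPUS` is a two-entry in-memory list in place of a vector database, and "generation" is stubbed by echoing the top source.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Stand-in embedder: bag-of-words counts (a real system uses a neural model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

CORPUS = [
    {"source": "checkpoints.md", "text": "at immigration checkpoints you may remain silent"},
    {"source": "traffic-stops.md", "text": "during a traffic stop ask if you are free to go"},
]

def rag_answer(query: str, top_k: int = 1) -> str:
    query_vec = toy_embed(query)                                # 1. intercept + embed query
    ranked = sorted(CORPUS,
                    key=lambda d: cosine(query_vec, toy_embed(d["text"])),
                    reverse=True)[:top_k]                       # 2. search curated content
    context = "; ".join(d["text"] for d in ranked)              # 3. inject into LLM context
    # 4. constrained generation -- stubbed by echoing the top source with a citation
    return f"{context} [Source: {ranked[0]['source']}]"
```

A query about checkpoints retrieves and cites `checkpoints.md`; a query about traffic stops retrieves `traffic-stops.md`.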

RAG Pipeline Architecture

┌─────────────────────────────────────────────────────────┐
│                    User Query                           │
│        "What are my rights at a checkpoint?"            │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                  Query Embedding                        │
│           nomic-embed-text / bge-large                  │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│               Vector Database Search                    │
│            ChromaDB / Qdrant / Weaviate                 │
│    Return: Top 5 relevant document chunks               │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                Context Injection                        │
│   System: "Answer using ONLY the following sources:"    │
│   [Chunk 1: checkpoints.md, lines 45-89]                │
│   [Chunk 2: 100-mile-zone.md, lines 12-56]              │
│   [Chunk 3: traffic-stops.md, lines 78-102]             │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                 LLM Generation                          │
│    "At immigration checkpoints, you have the right      │
│     to remain silent... [Source: checkpoints.md]"       │
└─────────────────────────────────────────────────────────┘

Vector Database Selection

ChromaDB (Recommended for Legal Aid)

Best for: Small to mid-sized deployments, local hosting

import chromadb
from chromadb.config import Settings

# Initialize with persistence
client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,  # Disable telemetry
        allow_reset=False            # Prevent accidental deletion
    )
)

# Create collection for KYR content
collection = client.create_collection(
    name="know_your_rights",
    metadata={"hnsw:space": "cosine"}
)

Advantages:

  • Open-source, self-hosted
  • Minimal infrastructure overhead
  • Python-native, easy integration
  • No external dependencies

Qdrant (Enterprise Scale)

Best for: Large deployments, high query volume

from qdrant_client import QdrantClient

client = QdrantClient(
    host="localhost",
    port=6333,
    prefer_grpc=True  # Better performance
)

Advantages:

  • Rust-based, high performance
  • Self-hosted option
  • Advanced filtering capabilities
  • Scales to millions of documents

Avoid Cloud-Only Solutions

Pinecone and similar cloud-only services are unsuitable because:

  • Data leaves your infrastructure
  • Creates subpoena-able records
  • Violates air-gapped requirements
  • Privacy cannot be guaranteed

Embedding Models

Recommended Models

Model              Dimensions  Quality    Speed      License
nomic-embed-text   768         Excellent  Fast       Apache 2.0
bge-large-en-v1.5  1024        Excellent  Medium     MIT
all-MiniLM-L6-v2   384         Good       Very Fast  Apache 2.0
e5-large-v2        1024        Excellent  Slow       MIT

Legal-Optimized Embeddings

For legal text, consider fine-tuned models that capture statutory relationships:

from sentence_transformers import SentenceTransformer

# Load embedding model; nomic models require trust_remote_code=True
embed_model = SentenceTransformer(
    'nomic-ai/nomic-embed-text-v1.5',
    trust_remote_code=True
)

# Generate embeddings. Note: nomic models perform best with task prefixes,
# e.g. 'search_document: ' for passages and 'search_query: ' for queries.
def embed_document(text: str) -> list[float]:
    return embed_model.encode(text, convert_to_numpy=True).tolist()

Document Chunking Strategy

Why Chunking Matters

Legal documents cannot be arbitrarily sliced. Cutting a statute mid-sentence destroys conditional logic:

Bad chunk:

"...shall be subject to removal if the alien (1) was not admitted or paroled into the United States, and (2)..."

Good chunk:

"INA § 212(a)(6)(A)(i): An alien present in the United States without being
admitted or paroled, or who arrives in the United States at any time or place
other than as designated by the Attorney General, is inadmissible."

Hierarchical Chunking

Break documents by natural structure:

import re
from typing import List, Dict

def chunk_legal_document(content: str, metadata: Dict) -> List[Dict]:
    chunks = []

    # Split by headers (##, ###)
    sections = re.split(r'\n(#{2,3}\s+[^\n]+)\n', content)

    current_header = metadata.get('title', 'Unknown')

    for section in sections:
        if section.startswith('#'):
            current_header = section.strip('# ')
        else:
            if len(section.strip()) > 100:  # Minimum chunk size
                chunks.append({
                    'text': section.strip(),
                    'header': current_header,
                    'source': metadata.get('source'),
                    'jurisdiction': metadata.get('jurisdiction'),
                    'last_updated': metadata.get('last_updated')
                })

    return chunks

Overlap for Context Preservation

Include 10-20% overlap between chunks:

def chunk_with_overlap(text: str, chunk_size: int = 500, overlap: int = 100):
    words = text.split()
    chunks = []

    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk) > 50:  # Minimum viable chunk
            chunks.append(chunk)

    return chunks
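
To see the overlap concretely, here is the same splitting logic with small, illustrative sizes (`chunk_size=10`, `overlap=3`) so the shared boundary words are easy to spot:

```python
def chunk_with_overlap(text, chunk_size=10, overlap=3):
    # Same splitting logic as above, scaled down so the overlap is visible
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk) > 5:  # minimum viable chunk
            chunks.append(chunk)
    return chunks

text = ' '.join(f"w{n}" for n in range(25))  # 25 dummy words: w0 .. w24
chunks = chunk_with_overlap(text)
# The last 3 words of each chunk reappear as the first 3 words of the next,
# so no sentence fragment is stranded at a chunk boundary.
```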

11ty Integration

Ingesting Markdown Content

Your 11ty site's Markdown files are the authoritative source. Parse them for RAG:

import frontmatter
from pathlib import Path

def ingest_11ty_content(content_dir: str):
    """Ingest all Markdown files from 11ty source."""
    documents = []

    for md_file in Path(content_dir).rglob('*.md'):
        post = frontmatter.load(md_file)

        # Extract metadata from YAML frontmatter
        metadata = {
            'source': str(md_file),
            'title': post.get('title'),
            'jurisdiction': post.get('jurisdiction'),
            'last_updated': post.get('last_updated'),
            'schema_type': post.get('schema_type'),
            'tags': post.get('tags', [])
        }

        # Chunk the content
        chunks = chunk_legal_document(post.content, metadata)
        documents.extend(chunks)

    return documents

Hybrid Search with Metadata Filtering

Use frontmatter metadata for precise filtering:

from typing import Dict, List, Optional

def search_with_jurisdiction(
    query: str,
    jurisdiction: Optional[str] = None,
    content_type: Optional[str] = None
) -> List[Dict]:
    """Search with optional metadata filters."""

    # Generate query embedding
    query_embedding = embed_document(query)

    # Build filter; recent ChromaDB versions require an explicit $and
    # operator when combining more than one condition
    conditions = []
    if jurisdiction:
        conditions.append({'jurisdiction': jurisdiction})
    if content_type:
        conditions.append({'schema_type': content_type})

    if len(conditions) > 1:
        where_filter = {'$and': conditions}
    elif conditions:
        where_filter = conditions[0]
    else:
        where_filter = None

    # Query ChromaDB
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=5,
        where=where_filter
    )

    return results

Incremental Indexing

Don't rebuild the entire database on every update:

import subprocess
from datetime import datetime

def get_changed_files(since_date: str) -> List[str]:
    """Get files changed since last index."""
    result = subprocess.run(
        ['git', 'log', '--since', since_date, '--name-only', '--pretty=format:'],
        capture_output=True, text=True
    )

    files = [f for f in result.stdout.split('\n') if f.endswith('.md')]
    return list(set(files))

def incremental_index(content_dir: str, last_index_date: str):
    """Only re-embed changed files."""
    changed = get_changed_files(last_index_date)

    for file_path in changed:
        # Delete old chunks for this file
        collection.delete(where={'source': file_path})

        # Re-ingest with metadata rebuilt from the file's frontmatter
        post = frontmatter.load(file_path)
        metadata = {
            'source': file_path,
            'title': post.get('title'),
            'jurisdiction': post.get('jurisdiction'),
            'last_updated': post.get('last_updated')
        }
        chunks = chunk_legal_document(post.content, metadata)

        # Add new chunks
        for i, chunk in enumerate(chunks):
            collection.add(
                documents=[chunk['text']],
                metadatas=[chunk],
                ids=[f"{file_path}_{i}"]
            )
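
Incremental indexing needs to remember when it last ran. A minimal sketch of that state tracking, assuming a hypothetical `.rag_index_state.json` file in the working directory:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path('.rag_index_state.json')  # hypothetical state-file name

def load_last_index_date(default: str = '1970-01-01T00:00:00') -> str:
    """Return the ISO timestamp of the last successful index run."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())['last_index_date']
    return default

def save_index_timestamp() -> str:
    """Record 'now' after a successful incremental index run."""
    now = datetime.now(timezone.utc).isoformat()
    STATE_FILE.write_text(json.dumps({'last_index_date': now}))
    return now
```

A cron job or CI step would then call `incremental_index(content_dir, load_last_index_date())` followed by `save_index_timestamp()`.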

System Prompt for RAG

Strict Grounding Instructions

SYSTEM_PROMPT = """You are an educational assistant providing general information
about immigration rights. You must:

1. ONLY use information from the provided source documents
2. ALWAYS cite the source document for each factual claim
3. If the sources don't contain the answer, say "I don't have information about that"
4. NEVER provide legal advice or apply law to specific situations
5. ALWAYS remind users to consult a licensed immigration attorney

CRITICAL: You are NOT a lawyer. You provide educational information only.

---
SOURCES:
{context}
---

Answer the user's question using ONLY the sources above. Cite sources like this:
[Source: document_name.md]"""

Handling Low-Confidence Queries

When vector similarity scores are low, refuse gracefully:

def generate_response(query: str, retrieved_docs: List[Dict]):
    # Check retrieval confidence: refuse if even the best match is weak
    # (default of 0.0 also refuses when nothing was retrieved)
    top_similarity = max(
        (doc['score'] for doc in retrieved_docs), default=0.0
    )

    if top_similarity < 0.65:  # Low confidence threshold
        return """I don't have specific information about that topic in my
        knowledge base. For accurate information, please:

        1. Visit our Know Your Rights guides at /know-your-rights/
        2. Contact a licensed immigration attorney
        3. Call the United We Dream hotline: 1-844-363-1423

        This ensures you receive accurate, up-to-date information."""

    # Proceed with normal generation
    context = format_context(retrieved_docs)
    return llm.generate(SYSTEM_PROMPT.format(context=context), query)
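
The `format_context` helper called above is not defined elsewhere in this guide. One plausible shape, assuming each retrieved doc carries `text`, `source`, and `header` keys (reproducing the line ranges shown in the pipeline diagram would additionally require tracking chunk offsets, omitted here):

```python
def format_context(retrieved_docs):
    """Render retrieved chunks into the SOURCES block the system prompt expects."""
    parts = []
    for i, doc in enumerate(retrieved_docs, start=1):
        parts.append(
            f"[Chunk {i}: {doc['source']} -- {doc.get('header', 'untitled')}]\n"
            f"{doc['text']}"
        )
    return '\n\n'.join(parts)
```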

Citation Injection

Automatic Source Attribution

Force citations in responses:

def post_process_response(response: str, sources: List[Dict]) -> str:
    """Ensure citations are present in response."""

    # Check if citations exist
    if '[Source:' not in response:
        # Append source list
        source_list = "\n\nSources:\n"
        for src in sources:
            source_list += f"- {src['title']} ({src['source']})\n"
        response += source_list

    return response

Quality Assurance

Evaluation Metrics

Test RAG quality with legal-specific benchmarks:

Metric               Target  Description
Retrieval Precision  >85%    Relevant chunks in top 5
Citation Accuracy    100%    Every claim has valid source
Hallucination Rate   <5%     Claims not in sources
Refusal Rate         >90%    Correct refusal on out-of-distribution queries
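
Rates like these can be computed from a labeled test set. A minimal sketch of the refusal-rate metric, where `is_refusal` is a hypothetical predicate keyed to this system's refusal template:

```python
def is_refusal(response: str) -> bool:
    # Hypothetical check matching the refusal message used by generate_response
    return "I don't have" in response

def refusal_rate(ood_responses: list[str]) -> float:
    """Fraction of out-of-distribution queries the system correctly refused."""
    if not ood_responses:
        return 0.0
    return sum(is_refusal(r) for r in ood_responses) / len(ood_responses)
```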

Testing Framework

def evaluate_rag_response(
    question: str,
    response: str,
    expected_sources: List[str],
    retrieved_context: str,
    ground_truth: str
) -> Dict:
    """Evaluate RAG response quality."""

    # Check citations (extract_citations is a project-specific helper)
    cited_sources = extract_citations(response)
    citation_accuracy = (
        len(set(cited_sources) & set(expected_sources)) / len(expected_sources)
        if expected_sources else 0.0
    )

    # Check for claims unsupported by the retrieved context
    # (ground_truth is reserved for answer-quality scoring, not shown here)
    hallucinations = detect_hallucinations(response, retrieved_context)

    return {
        'citation_accuracy': citation_accuracy,
        'hallucination_count': len(hallucinations),
        'response_length': len(response),
        'contains_disclaimer': 'legal advice' in response.lower()
    }
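
The `extract_citations` helper used in the evaluation above is not defined in this guide. Given the `[Source: document_name.md]` citation format, a regex sketch:

```python
import re

def extract_citations(response: str) -> list[str]:
    """Pull cited filenames out of '[Source: name.md]' markers."""
    return re.findall(r'\[Source:\s*([^\]]+?)\s*\]', response)
```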

Keeping Content Current

Update Workflow

1. Legal team updates Markdown in 11ty repo
2. CI/CD triggers incremental re-indexing
3. Changed documents are re-embedded
4. Vector database updates atomically
5. Users immediately get current information

Version Control for Legal Accuracy

# Store version info with each chunk
metadata = {
    'source': 'checkpoints.md',
    'git_commit': 'abc123',
    'last_updated': '2026-03-24',
    'reviewed_by': 'legal_team',
    'version': '2.1'
}
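
The `last_updated` field can also drive a staleness check at query or index time, flagging chunks overdue for legal review. A sketch, where the 180-day window is an arbitrary example:

```python
from datetime import date, timedelta

def is_stale(last_updated: str, max_age_days: int = 180) -> bool:
    """Flag chunks whose ISO-format review date is older than the allowed window."""
    age = date.today() - date.fromisoformat(last_updated)
    return age > timedelta(days=max_age_days)
```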

Next Steps

  1. Implement safety guardrails - UPL (unauthorized practice of law) compliance is required
  2. Configure privacy architecture - Zero-retention logging
  3. Review multilingual support - Spanish-first design