Multilingual Support for Immigration AI Chatbots | ICE Encounter

Why Multilingual Support is Critical

Effective access to justice requires communicating with immigrants in their primary language. Key considerations:

Many users have limited English proficiency
Legal terminology is complex even in a user's native language
Trust is built through language accessibility
Crisis situations require immediate comprehension

Language Performance by Model

High-Resource Languages

Leading open-weight LLMs generally perform well in Spanish, Chinese, and Vietnamese, though quality varies by model:

Model	Spanish	Chinese	Vietnamese	Notes
Qwen 2.5/3	Excellent	Excellent	Excellent	29+ languages supported natively
Llama 3.3 70B	Excellent	Good	Good	Strong FLORES+ scores
Mistral 7B	Excellent	Moderate	Moderate	European language focus
Gemma 3	Good	Good	Good	Solid all-around

Model Selection for Multilingual

Recommended: Qwen 2.5 32B or 72B for maximum language coverage

# Primary model for multilingual deployment
PRIMARY_MODEL = "Qwen/Qwen2.5-32B-Instruct"

# Fallback for English-heavy queries
FALLBACK_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"

Spanish-First Design

Regional Dialect Considerations

Spanish terminology varies significantly:

Concept	Mexico	Caribbean	South America
Attorney	Abogado	Abogado	Abogado/Letrado
Deportation	Deportación	Deportación	Expulsión
Checkpoint	Retén	Punto de control	Puesto de control
Green Card	Tarjeta verde/Mica	Green card	Residencia permanente

Best Practice: Neutral Legal Vocabulary

SPANISH_SYSTEM_PROMPT = """
Responda en español utilizando vocabulario legal estándar y neutral.
Evite regionalismos que puedan confundir a usuarios de diferentes países.
Use términos oficiales de inmigración de EE.UU. cuando sea posible.

Ejemplos de términos preferidos:
- "tarjeta de residencia permanente" (no "mica" ni "green card")
- "audiencia de inmigración" (no "corte")
- "orden de deportación" (no "removal order" sin traducción)
"""

Code-Switching (Spanglish) Support

Many users blend English and Spanish. The model should understand:

User: "Mi esposo tiene un appointment con ICE mañana, ¿qué rights tenemos?"

Expected understanding:
- Recognize mixed language input
- Respond in user's apparent preferred language (Spanish)
- Use consistent terminology
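
The behavior above can be approximated with a simple heuristic. The following is a minimal sketch, not a production language identifier: it assumes two small, hand-picked marker-word lists (hypothetical, for illustration only) and picks the language whose common words dominate the sentence. The function name `choose_response_language` is also an assumption.

```python
import re

# Hypothetical, deliberately tiny marker-word lists (illustration only).
SPANISH_MARKERS = {"mi", "tiene", "un", "una", "con", "qué", "que", "el", "la",
                   "los", "las", "mañana", "tenemos", "es", "y", "para"}
ENGLISH_MARKERS = {"my", "has", "an", "with", "what", "the", "tomorrow", "we",
                   "have", "is", "and", "for", "do"}

def choose_response_language(text: str):
    """Return 'es' or 'en' based on marker-word counts, or None on a tie."""
    # Split into alphabetic tokens; punctuation such as '¿' is discarded.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    es_hits = sum(1 for t in tokens if t in SPANISH_MARKERS)
    en_hits = sum(1 for t in tokens if t in ENGLISH_MARKERS)
    if es_hits > en_hits:
        return "es"
    if en_hits > es_hits:
        return "en"
    return None  # no clear signal: ask the user which language they prefer

print(choose_response_language(
    "Mi esposo tiene un appointment con ICE mañana, ¿qué rights tenemos?"
))
```

English nouns such as "appointment" and "rights" do not count toward either list, so the Spanish sentence frame decides the response language. A tie returns None so the chatbot can ask rather than guess.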

Chinese Language Support

Simplified vs Traditional

Community	Script	Typical Users
Mainland China	Simplified (简体)	Most mainland immigrants
Taiwan, Hong Kong	Traditional (繁體)	Taiwan, HK, older diaspora

Implementation

def detect_chinese_variant(text: str) -> str:
    """Detect simplified vs traditional Chinese."""
    # Count characters whose simplified and traditional forms differ.
    # These sets are a small sample; a production list should be much larger.
    simplified_chars = set('国说这时会对经')
    traditional_chars = set('國說這時會對經')

    simp_count = sum(1 for c in text if c in simplified_chars)
    trad_count = sum(1 for c in text if c in traditional_chars)

    # On a tie (including no distinguishing characters), default to simplified.
    return 'zh-TW' if trad_count > simp_count else 'zh-CN'

Chinese Legal Terminology

Immigration terms require careful translation:

English	Simplified Chinese	Notes
Asylum	政治庇护 (zhèngzhì bìhù)	Literally "political shelter"
Green Card	绿卡 (lǜkǎ)	Literal translation ("green card"), widely used
Deportation	驱逐出境 (qūzhú chūjìng)	Formal legal term
ICE	移民海关执法局	Full agency name needed

Vietnamese Language Support

Considerations

Vietnamese is tonal, and diacritical marks are essential because they change word meaning
Large Vietnamese immigrant population (1.3M+)
Strong community organization networks
Models: Qwen performs best, followed by Llama
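
Because diacritics carry meaning, user input and stored terms should be compared in a single Unicode normalization form and never with the marks removed. The sketch below uses only the standard library; the function names `normalize_vietnamese` and `strip_diacritics` are illustrative assumptions, and the second function exists only to show what is lost.

```python
import unicodedata

def normalize_vietnamese(text: str) -> str:
    """Return text in NFC so precomposed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", text)

def strip_diacritics(text: str) -> str:
    """Remove combining marks. Shown only to illustrate the information lost."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# The same word typed on two keyboards may arrive in different Unicode forms.
composed = normalize_vietnamese("Luật sư")
decomposed = unicodedata.normalize("NFD", composed)
print(composed == decomposed)                        # different code points
print(composed == normalize_vietnamese(decomposed))  # equal after NFC

# Six distinct Vietnamese words collapse into one string without diacritics.
words = ["ma", "má", "mà", "mả", "mã", "mạ"]
print({strip_diacritics(w) for w in words})
```

Normalizing to NFC before glossary lookups or string comparison avoids false mismatches, while stripping marks would merge unrelated words.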

Vietnamese Legal Terms

English	Vietnamese	Notes
Immigration	Di trú
Asylum	Tị nạn chính trị	Political asylum
Deportation	Trục xuất
Attorney	Luật sư
Right to remain silent	Quyền giữ im lặng

Indigenous Languages: Critical Limitations

The Hard Truth

Current LLMs perform close to random chance on the Indigenous languages below, which makes generated text unsafe for legal information:

Language	Speakers in US	LLM Performance
K'iche'	~100,000	Unusable
Mam	~50,000	Unusable
Q'anjob'al	~40,000	Unusable
Mixtec	~30,000	Unusable

Why AI Translation Fails

Minimal training data - These languages have almost no internet presence
Morphological complexity - Agglutinative structures unlike European languages
Text literacy - Many speakers rely primarily on oral tradition
Dialectal variation - Village-level differences

The Safe Approach

DO NOT attempt AI text generation in Indigenous languages.

Instead:

INDIGENOUS_LANGUAGE_ROUTING = {
    'mam': {
        'action': 'route_to_audio',
        'resources': [
            'International Mayan League audio KYR',
            'ProBAR visual guides',
        ],
        'hotline': '1-800-354-0365'  # ProBAR
    },
    'kiche': {
        'action': 'route_to_audio',
        'resources': [
            'International Mayan League recordings',
            'CLINIC Mayan language materials',
        ]
    }
}
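
The routing table is keyed by short slugs such as 'kiche', while other parts of a system may use ISO 639-3 codes (K'iche' is quc, Mam is mam). The following is a minimal sketch of a lookup that accepts either form. The alias table and this implementation of `get_indigenous_resources` are assumptions, and `ROUTING` is an abbreviated stand-in for the table above.

```python
# Abbreviated stand-in for INDIGENOUS_LANGUAGE_ROUTING.
ROUTING = {
    "mam": {"action": "route_to_audio"},
    "kiche": {"action": "route_to_audio"},
}

# Hypothetical alias table: ISO 639-3 codes and UI slugs -> routing keys.
CODE_ALIASES = {"mam": "mam", "quc": "kiche", "kiche": "kiche"}

def get_indigenous_resources(lang_code: str, routing: dict = ROUTING):
    """Return the routing entry for a language code, or None if unsupported."""
    key = CODE_ALIASES.get(lang_code.strip().lower())
    if key is None:
        return None
    return routing.get(key)

print(get_indigenous_resources("quc"))  # same entry as 'kiche'
print(get_indigenous_resources("kjb"))  # no entry defined for Q'anjob'al yet
```

Returning None for an unmapped code lets the caller fall back to a human referral instead of silently routing to the wrong language.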

UI Implementation

<div class="language-selector">
  <h3>Select Your Language / Seleccione su idioma</h3>

  <!-- Text-based languages -->
  <button data-lang="en">English</button>
  <button data-lang="es">Español</button>
  <button data-lang="zh">中文</button>
  <button data-lang="vi">Tiếng Việt</button>

  <!-- Indigenous languages - route to audio/visual -->
  <div class="indigenous-section">
    <p>For these languages, we provide audio and visual guides:</p>
    <button data-lang="mam" data-type="audio">
      <img src="/assets/icons/speaker.svg" alt="">
      Mam
    </button>
    <button data-lang="kiche" data-type="audio">
      <img src="/assets/icons/speaker.svg" alt="">
      K'iche'
    </button>
  </div>
</div>

Language Detection and Routing

Automatic Detection

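from langdetect import DetectorFactory, detect_langs
from langdetect.lang_detect_exception import LangDetectException

# langdetect is randomized by default; fix the seed for repeatable results.
DetectorFactory.seed = 0

def detect_user_language(text: str) -> dict:
    """Detect language with confidence score."""
    try:
        detected = detect_langs(text)
        primary = detected[0]
        return {
            'language': primary.lang,
            'confidence': primary.prob,
            'alternatives': [
                {'lang': d.lang, 'prob': d.prob}
                for d in detected[1:3]
            ]
        }
    except LangDetectException:
        # Raised for empty or feature-less input (e.g. digits only).
        # Confidence 0.0 marks the English default as a guess, not a detection.
        return {'language': 'en', 'confidence': 0.0, 'alternatives': []}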
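
Detection on short or mixed-language messages is unreliable, so the confidence score is best treated as a gate. The sketch below assumes the result dictionary shape returned by `detect_user_language`; the function name, the 0.80 threshold, and the 20-character minimum are assumptions to be tuned on real traffic.

```python
def needs_language_confirmation(
    detection: dict,
    text: str,
    threshold: float = 0.80,  # assumed cutoff
    min_chars: int = 20,      # assumed minimum length for a reliable guess
) -> bool:
    """Return True when the chatbot should ask the user to confirm a language."""
    # Very short inputs ("ok", "hola") give detectors too little evidence.
    if len(text.strip()) < min_chars:
        return True
    return detection.get("confidence", 0.0) < threshold

result = {"language": "es", "confidence": 0.55}
print(needs_language_confirmation(result, "Necesito ayuda con mi caso de inmigración"))
```

When the function returns True, showing the language selector is safer than answering in a guessed language.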

Language Routing Logic

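def route_by_language(query: str, detected_lang: str):
    """Route query based on detected language."""
    # langdetect reports Chinese as 'zh-cn' or 'zh-tw'; compare on the base code.
    base_lang = detected_lang.split('-')[0].lower()

    # High-resource languages → LLM
    if base_lang in ['en', 'es', 'zh', 'vi']:
        return {
            'handler': 'llm',
            'model': select_model_for_language(base_lang),
            'response_language': detected_lang
        }

    # Indigenous languages → pre-recorded resources
    # Codes are ISO 639-3: Mam (mam), K'iche' (quc), Q'anjob'al (kjb).
    # langdetect cannot identify these, so they must come from the language
    # selector, whose data-lang values (e.g. 'kiche') need mapping to these codes.
    if base_lang in ['mam', 'quc', 'kjb']:
        return {
            'handler': 'static_resources',
            'media_type': 'audio_video',
            'resources': get_indigenous_resources(base_lang)
        }

    # Unknown → default to English with offer to switch
    return {
        'handler': 'llm',
        'model': select_model_for_language('en'),
        'response_language': 'en',
        'offer_alternatives': True
    }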
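
The routing logic calls `select_model_for_language`, which is not defined in this guide. One possible shape, assuming the primary and fallback model identifiers from the model selection section and a simple rule that only English goes to the fallback:

```python
PRIMARY_MODEL = "Qwen/Qwen2.5-32B-Instruct"
FALLBACK_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"

def select_model_for_language(lang: str) -> str:
    """Pick a model ID for a language code such as 'es', 'zh-cn' or 'en_US'."""
    # Reduce regional variants to the base language code.
    base = lang.replace("_", "-").split("-")[0].lower()
    # Assumed policy: English goes to the smaller fallback model,
    # every other language goes to the multilingual primary model.
    return FALLBACK_MODEL if base == "en" else PRIMARY_MODEL

print(select_model_for_language("zh-cn"))
print(select_model_for_language("en"))
```

Keeping the policy in one function means a later change, such as routing Vietnamese to a different model, touches a single place.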

Translation Verification

Human-in-the-Loop Required

AI translations of legal content MUST be verified by:

Native-speaking legal professionals
Community members familiar with immigration terminology
Regular audit cycles

Common Translation Errors

English	Bad Translation	Issue
"Parole"	"Libertad condicional"	Criminal vs immigration parole confusion
"Asylum"	"Asilo" (without context)	Can mean nursing home in some regions
"Removal"	"Eliminación"	Sounds like extermination
"Bond"	"Bono"	Financial bond vs bail bond confusion

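Some of the errors in the table can be caught automatically before a response reaches a human reviewer. The sketch below flags the unambiguous Spanish mistranslations from the table; the dictionary and function names are assumptions. "Asilo" is left out because it is only a problem when used without context, which a simple string match cannot judge.

```python
import re

# Discouraged Spanish renderings from the table above, mapped to the
# English legal concept they mistranslate.
DISCOURAGED_ES = {
    "libertad condicional": "parole",
    "eliminación": "removal",
    "bono": "bond",
}

def flag_discouraged_terms(response_es: str) -> list:
    """Return (term, concept) pairs for each discouraged term in the response."""
    lowered = response_es.lower()
    hits = []
    for term, concept in DISCOURAGED_ES.items():
        # Word boundaries keep 'bono' from matching inside 'abono'.
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            hits.append((term, concept))
    return hits

print(flag_discouraged_terms("Usted puede pedir un bono."))
print(flag_discouraged_terms("El abono fue recibido."))
```

A non-empty result is a signal to regenerate the response or send it for review. It does not replace verification by a native-speaking legal professional.
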
Verified Translation Database

Maintain approved translations:

# translations/legal-terms.yaml
asylum:
  en: "asylum"
  es: "asilo político"
  zh-CN: "政治庇护"
  vi: "tị nạn chính trị"
  verified_by: "CLINIC Legal Team"
  verified_date: "2026-03-01"

parole:
  en: "immigration parole"
  es: "permiso de permanencia temporal (parole)"  # NOT "libertad condicional" (criminal parole)
  zh-CN: "移民假释"
  vi: "tạm tha nhập cư"
  note: "Distinguish from criminal parole"
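
A lookup over this database can enforce the human-in-the-loop rule by refusing to return any term whose entry lacks verification metadata. The following is a sketch under stated assumptions: the data is held in a plain dictionary rather than loaded from the YAML file, the parole entry is abbreviated, and `approved_term` is a hypothetical helper name.

```python
# In-memory stand-in for translations/legal-terms.yaml.
APPROVED_TERMS = {
    "asylum": {
        "en": "asylum",
        "es": "asilo político",
        "zh-CN": "政治庇护",
        "vi": "tị nạn chính trị",
        "verified_by": "CLINIC Legal Team",
        "verified_date": "2026-03-01",
    },
    # No verified_by field yet, so lookups must not return this entry.
    "parole": {"en": "immigration parole"},
}

def approved_term(concept: str, lang: str, terms: dict = APPROVED_TERMS):
    """Return the verified translation, or None if missing or unverified."""
    entry = terms.get(concept)
    if not entry or not entry.get("verified_by"):
        return None
    return entry.get(lang)

print(approved_term("asylum", "es"))
print(approved_term("parole", "en"))  # None: entry has not been verified
```

Treating unverified entries as missing keeps draft translations out of user-facing responses until a reviewer signs off.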

Bilingual Disclaimers

Session Start (English/Spanish)

┌─────────────────────────────────────────────────────┐
│  IMPORTANT NOTICE / AVISO IMPORTANTE               │
│                                                     │
│  [EN] This chatbot provides educational            │
│       information only—NOT legal advice.           │
│                                                     │
│  [ES] Este chatbot proporciona información         │
│       educativa solamente—NO es asesoría legal.    │
│                                                     │
│  For your specific situation, consult an attorney. │
│  Para su situación específica, consulte un abogado.│
│                                                     │
│  Emergency / Emergencia: 1-844-363-1423            │
└─────────────────────────────────────────────────────┘

Per-Response Disclaimer (Auto-Language)

DISCLAIMERS = {
    'en': "This is general information, not legal advice. Consult an attorney for your situation.",
    'es': "Esta es información general, no asesoría legal. Consulte a un abogado para su situación.",
    'zh': "这是一般信息，不是法律建议。请咨询律师了解您的具体情况。",
    'vi': "Đây là thông tin chung, không phải tư vấn pháp lý. Hãy tham khảo luật sư cho trường hợp của bạn."
}

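def append_disclaimer(response: str, language: str) -> str:
    """Append the disclaimer for the response language, defaulting to English."""
    # Reduce regional codes such as 'zh-CN' or 'zh-tw' to the base key ('zh').
    base = language.split('-')[0].lower()
    disclaimer = DISCLAIMERS.get(base, DISCLAIMERS['en'])
    return f"{response}\n\n---\n{disclaimer}"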

Culturally Appropriate Design

Tone Considerations

Culture	Communication Style	Chatbot Adaptation
Latin American	Warm, personal	Use respectful "usted", express empathy
Chinese	Formal, hierarchical	Professional tone, clear structure
Vietnamese	Respectful, indirect	Avoid blunt statements, offer options

Cultural Context in Responses

CULTURAL_ADAPTATIONS = {
    'es': {
        'greeting': "Hola, estoy aquí para ayudarle con información sobre sus derechos.",
        'empathy': "Entiendo que esta situación puede ser difícil.",
        'formal_you': True  # Use "usted" not "tú"
    },
    'zh': {
        'greeting': "您好，我可以为您提供有关移民权利的信息。",
        'empathy': "我理解这种情况可能很困难。",
        'formal_structure': True
    }
}

Testing Multilingual Quality

Evaluation Framework

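from langdetect import detect

def evaluate_multilingual_response(
    query: str,
    response: str,
    target_language: str
) -> dict:
    """Evaluate multilingual response quality."""
    # verify_legal_terms, has_disclaimer, check_cultural_markers and
    # rate_fluency are project-specific helpers that must be defined elsewhere.
    # langdetect returns 'zh-cn'/'zh-tw', so compare base language codes.
    detected_base = detect(response).split('-')[0]
    target_base = target_language.split('-')[0].lower()

    return {
        'language_correct': detected_base == target_base,
        'terminology_accurate': verify_legal_terms(response, target_language),
        'disclaimer_present': has_disclaimer(response, target_language),
        'culturally_appropriate': check_cultural_markers(response, target_language),
        'fluency_score': rate_fluency(response, target_language)
    }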
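
The helpers used by the framework are not defined in this guide. As one example, `has_disclaimer` could be a strict check that the response ends with the exact disclaimer text for its language. This sketch assumes the disclaimer is always appended last, and it reproduces only the English and Spanish entries of the disclaimer table.

```python
DISCLAIMERS = {
    "en": "This is general information, not legal advice. Consult an attorney for your situation.",
    "es": "Esta es información general, no asesoría legal. Consulte a un abogado para su situación.",
}

def has_disclaimer(response: str, language: str, disclaimers: dict = DISCLAIMERS) -> bool:
    """Return True if the response ends with the disclaimer for its language."""
    # Reduce regional codes such as 'es-MX' to the base key.
    expected = disclaimers.get(language.split("-")[0].lower())
    if expected is None:
        return False  # no disclaimer text defined for this language
    return response.rstrip().endswith(expected)

reply = "Tiene derecho a guardar silencio.\n\n---\n" + DISCLAIMERS["es"]
print(has_disclaimer(reply, "es"))
print(has_disclaimer(reply, "en"))
```

An exact match is deliberately strict: a paraphrased or truncated disclaimer fails the check and is flagged for review.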

Native Speaker Review

Before deployment:

[ ] Spanish responses reviewed by native speaker with legal background
[ ] Chinese (Simplified + Traditional) reviewed separately
[ ] Vietnamese reviewed by community organization partner
[ ] Indigenous language resources verified by community leaders

Next Steps

Review UX design - Accessibility requirements
Configure attorney integration - Handoff protocols
Review implementation roadmap - Deployment phases