Why Multilingual Support is Critical
Effective access to justice requires communicating with immigrants in their primary language. Key considerations:
- Many users have limited English proficiency
- Legal terminology is complex even in native language
- Trust is built through language accessibility
- Crisis situations require immediate comprehension
Language Performance by Model
High-Resource Languages
Leading open-weight LLMs handle Spanish, Chinese, and Vietnamese well, though quality varies by model (ratings below are qualitative):
| Model | Spanish | Chinese | Vietnamese | Notes |
|---|---|---|---|---|
| Qwen 2.5/3 | Excellent | Excellent | Excellent | Qwen 2.5 supports 29+ languages; Qwen3 lists 119 |
| Llama 3.3 70B | Excellent | Good | Good | Strong FLORES+ scores; Spanish is officially supported, Chinese and Vietnamese are not |
| Mistral 7B | Excellent | Moderate | Moderate | European language focus |
| Gemma 3 | Good | Good | Good | Solid all-around |
Model Selection for Multilingual
Recommended: Qwen 2.5 32B or 72B for maximum language coverage
```python
# Primary model for multilingual deployment
PRIMARY_MODEL = "Qwen/Qwen2.5-32B-Instruct"
# Fallback for English-heavy queries
FALLBACK_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"
```
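The routing logic later on this page calls `select_model_for_language`, which the page does not define. A minimal sketch, assuming the `FALLBACK_MODEL` comment means English queries may be served by the smaller Mistral model while everything else goes to Qwen (the policy is a hypothetical reading, not a documented rule):

```python
PRIMARY_MODEL = "Qwen/Qwen2.5-32B-Instruct"
FALLBACK_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"


def select_model_for_language(lang: str) -> str:
    """Return a model ID for a language code (hypothetical policy).

    English goes to the smaller fallback model; Spanish, Chinese,
    Vietnamese and any unrecognized language go to the multilingual
    primary model, which has the broadest coverage.
    """
    base = lang.split("-")[0].lower()  # normalize detector codes: 'zh-cn' -> 'zh'
    if base == "en":
        return FALLBACK_MODEL
    return PRIMARY_MODEL
```

Sending unknown languages to the primary model errs toward coverage. A deployment that prefers the primary model for English too would simply drop the `en` branch.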
Spanish-First Design
Regional Dialect Considerations
Spanish terminology varies significantly:
| Concept | Mexico | Caribbean | South America |
|---|---|---|---|
| Attorney | Abogado | Abogado | Abogado/Letrado |
| Deportation | Deportación | Deportación | Expulsión |
| Checkpoint | Retén | Punto de control | Puesto de control |
| Green Card | Tarjeta verde/Mica | Green card | Residencia permanente |
Best Practice: Neutral Legal Vocabulary
```python
# Spanish system prompt (gist in English): answer in standard, neutral legal
# Spanish; avoid regionalisms; prefer official US immigration terms.
SPANISH_SYSTEM_PROMPT = """
Responda en español utilizando vocabulario legal estándar y neutral.
Evite regionalismos que puedan confundir a usuarios de diferentes países.
Use términos oficiales de inmigración de EE.UU. cuando sea posible.
Ejemplos de términos preferidos:
- "tarjeta de residencia permanente" (no "mica" ni "green card")
- "audiencia de inmigración" (no "corte")
- "orden de deportación" (no "removal order" sin traducción)
"""
```
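A system prompt does not guarantee compliance, so it can help to scan responses for the terms the prompt discourages. A minimal sketch, assuming a hypothetical `DISCOURAGED_TERMS` map built from the examples in the prompt above; "corte" is left out because it also appears in legitimate phrases such as "corte de inmigración":

```python
import re

# Hypothetical map: discouraged term -> preferred neutral term (from the prompt above).
DISCOURAGED_TERMS = {
    "mica": "tarjeta de residencia permanente",
    "green card": "tarjeta de residencia permanente",
    "removal order": "orden de deportación",
}


def find_discouraged_terms(response: str) -> list[tuple[str, str]]:
    """Return (found_term, preferred_term) pairs present in a Spanish response."""
    lowered = response.lower()
    hits = []
    for term, preferred in DISCOURAGED_TERMS.items():
        # Unicode word boundaries keep "mica" from matching inside "química".
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            hits.append((term, preferred))
    return hits
```

Flagged responses can be regenerated or routed to review rather than rewritten automatically, since a regional term is sometimes what the user needs to recognize.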
Code-Switching (Spanglish) Support
Many users blend English and Spanish. The model should understand:
User: "Mi esposo tiene un appointment con ICE mañana, ¿qué rights tenemos?"
Expected understanding:
- Recognize mixed-language input
- Respond in the user's apparent preferred language (here, Spanish)
- Use consistent terminology
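One way to pick the response language for a code-switched message is to count function words from each language and go with the majority. A toy sketch, assuming small hand-picked (hypothetical) word lists; a production system would use a proper language-identification model instead:

```python
import re

# Illustrative word lists only; not a real lexicon.
SPANISH_MARKERS = {"mi", "un", "una", "el", "la", "con", "qué", "que", "tiene",
                   "tenemos", "de", "mañana", "mis", "para", "por", "esposo"}
ENGLISH_MARKERS = {"the", "my", "we", "have", "with", "what", "is", "are",
                   "appointment", "rights", "court", "lawyer", "tomorrow"}


def preferred_language(text: str) -> dict:
    """Guess the dominant language of a possibly code-switched message."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so "mañana" stays whole
    es = sum(t in SPANISH_MARKERS for t in tokens)
    en = sum(t in ENGLISH_MARKERS for t in tokens)
    return {
        # Ties go to Spanish, following the Spanish-first design.
        "language": "es" if es >= en else "en",
        "mixed": es > 0 and en > 0,
    }
```

For the example message above, eight tokens match the Spanish list and two ("appointment", "rights") match the English list, so the sketch answers in Spanish and flags the input as mixed.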
Chinese Language Support
Simplified vs Traditional
| Community | Script | Primary Regions |
|---|---|---|
| Mainland China | Simplified (简体) | Most mainland immigrants |
| Taiwan, Hong Kong | Traditional (繁體) | Taiwan, HK, older diaspora |
Implementation
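```python
# Characters whose simplified and traditional forms differ.
SIMPLIFIED_CHARS = set('国说这时会对经们个来为没还请问师证')
TRADITIONAL_CHARS = set('國說這時會對經們個來為沒還請問師證')

def detect_chinese_variant(text: str) -> str:
    """Detect simplified vs traditional Chinese."""
    # Count characters that exist in only one of the two scripts
    simp_count = sum(1 for c in text if c in SIMPLIFIED_CHARS)
    trad_count = sum(1 for c in text if c in TRADITIONAL_CHARS)
    # Ties (including text with no marker characters) default to simplified,
    # the script used by most mainland immigrants
    return 'zh-TW' if trad_count > simp_count else 'zh-CN'
```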
Chinese Legal Terminology
Immigration terms require careful translation:
| English | Simplified Chinese | Notes |
|---|---|---|
| Asylum | 政治庇护 (zhèngzhì bìhù) | Literally "political shelter" |
| Green Card | 绿卡 (lǜkǎ) | Literal translation of "green card"; widely used |
| Deportation | 驱逐出境 (qūzhú chūjìng) | Formal legal term |
| ICE | 移民海关执法局 (yímín hǎiguān zhífǎ jú) | Full agency name needed |
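Because users may read either script, a glossary can store both forms of each term and be keyed by the variant returned by `detect_chinese_variant`. A minimal sketch with a hypothetical `LEGAL_TERMS_ZH` table; the traditional forms are character-level conversions of the simplified terms above and still need native review, since Taiwan and Hong Kong usage can differ in wording, not only in script:

```python
# Hypothetical glossary; traditional entries are character conversions only.
LEGAL_TERMS_ZH = {
    "asylum": {"zh-CN": "政治庇护", "zh-TW": "政治庇護"},
    "green card": {"zh-CN": "绿卡", "zh-TW": "綠卡"},
    "deportation": {"zh-CN": "驱逐出境", "zh-TW": "驅逐出境"},
    "ice": {"zh-CN": "移民海关执法局", "zh-TW": "移民海關執法局"},
}


def term_for(term: str, variant: str) -> str:
    """Look up a legal term for 'zh-CN' or 'zh-TW'; other variants get simplified."""
    entry = LEGAL_TERMS_ZH[term.lower()]
    return entry.get(variant, entry["zh-CN"])
```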
Vietnamese Language Support
Considerations
- Tonal language: diacritical marks are essential, since dropping them changes word meaning
- Large Vietnamese immigrant population in the US (1.3M+)
- Strong community organization networks
- Models: Qwen performs best, followed by Llama
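Since Vietnamese without diacritics is ambiguous, a cheap output check is to measure how many words carry any non-ASCII letter. A heuristic sketch; the 0.2 threshold is an assumption to tune against real reviewed responses, not a validated value:

```python
import re
import unicodedata


def diacritic_ratio(text: str) -> float:
    """Fraction of words that contain at least one non-ASCII letter."""
    # NFC first, so decomposed input (base letter + combining mark) counts as one letter.
    text = unicodedata.normalize("NFC", text)
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    if not words:
        return 0.0
    marked = sum(any(ord(ch) > 127 for ch in w) for w in words)
    return marked / len(words)


def looks_stripped(text: str, threshold: float = 0.2) -> bool:
    """Heuristic: Vietnamese prose with almost no marked words has likely lost its diacritics."""
    return diacritic_ratio(text) < threshold
```

For "Quyền giữ im lặng", three of four words carry marks (ratio 0.75), while the stripped "Quyen giu im lang" scores 0.0 and would be flagged for regeneration.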
Vietnamese Legal Terms
| English | Vietnamese | Notes |
|---|---|---|
| Immigration | Di trú | |
| Asylum | Tị nạn chính trị | Political asylum |
| Deportation | Trục xuất | |
| Attorney | Luật sư | |
| Right to remain silent | Quyền giữ im lặng | |
Indigenous Languages: Critical Limitations
The Hard Truth
Current LLMs perform near random chance on Indigenous languages such as these:
| Language | Est. speakers in US | LLM Performance |
|---|---|---|
| K'iche' | ~100,000 | Unusable |
| Mam | ~50,000 | Unusable |
| Q'anjob'al | ~40,000 | Unusable |
| Mixtec | ~30,000 | Unusable |
Why AI Translation Fails
- Minimal training data - These languages have very little written text online
- Morphological complexity - Agglutinative structures unlike those of European languages
- Low text literacy - Many speakers use the language mainly in oral form
- Dialectal variation - Vocabulary and pronunciation can differ from village to village
The Safe Approach
DO NOT attempt AI text generation in Indigenous languages.
Instead:
```python
INDIGENOUS_LANGUAGE_ROUTING = {
    'mam': {
        'action': 'route_to_audio',
        'resources': [
            'International Mayan League audio KYR',
            'ProBAR visual guides',
        ],
        'hotline': '1-800-354-0365'  # ProBAR
    },
    'kiche': {  # K'iche' (ISO 639-3 code: quc)
        'action': 'route_to_audio',
        'resources': [
            'International Mayan League recordings',
            'CLINIC Mayan language materials',
        ]
    }
}
```
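The routing logic below calls `get_indigenous_resources` with ISO 639-3 codes (`'quc'` for K'iche'), while the table above is keyed `'kiche'`. A minimal sketch that bridges the two with a hypothetical alias map and, for languages with no entry yet (such as Q'anjob'al, `'kjb'`), falls back to a hypothetical `route_to_human` action instead of AI text:

```python
# Abbreviated copy of the routing table above (hotline omitted).
INDIGENOUS_LANGUAGE_ROUTING = {
    "mam": {"action": "route_to_audio",
            "resources": ["International Mayan League audio KYR", "ProBAR visual guides"]},
    "kiche": {"action": "route_to_audio",
              "resources": ["International Mayan League recordings",
                            "CLINIC Mayan language materials"]},
}

# Hypothetical alias map: ISO 639-3 code -> key used in the table.
ISO_ALIASES = {"quc": "kiche"}


def get_indigenous_resources(lang_code: str) -> dict:
    """Return audio/visual resources for an Indigenous language code."""
    key = ISO_ALIASES.get(lang_code, lang_code)
    entry = INDIGENOUS_LANGUAGE_ROUTING.get(key)
    if entry is None:
        # No curated materials yet: hand off to a person, never to generated text.
        return {"action": "route_to_human", "resources": []}
    return entry
```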
UI Implementation
```html
<div class="language-selector">
  <h3>Select Your Language / Seleccione su idioma</h3>
  <!-- lang attributes let screen readers pronounce each language name correctly -->
  <!-- Text-based languages -->
  <button data-lang="en">English</button>
  <button data-lang="es" lang="es">Español</button>
  <button data-lang="zh" lang="zh">中文</button>
  <button data-lang="vi" lang="vi">Tiếng Việt</button>
  <!-- Indigenous languages - route to audio/visual -->
  <div class="indigenous-section">
    <p>For these languages, we provide audio and visual guides:</p>
    <button data-lang="mam" data-type="audio" lang="mam">
      <img src="/assets/icons/speaker.svg" alt="">
      Mam
    </button>
    <button data-lang="kiche" data-type="audio" lang="quc">
      <img src="/assets/icons/speaker.svg" alt="">
      K'iche'
    </button>
  </div>
</div>
```
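Automatic detectors such as `langdetect` cannot recognize Mam or K'iche', so an explicit button choice has to take priority over detection. A minimal server-side sketch; the `resolve_language` helper and its mapping from `data-lang` values to ISO 639-3 codes are assumptions for illustration:

```python
from typing import Optional

TEXT_LANGS = {"en", "es", "zh", "vi"}
# Hypothetical map: data-lang value from the selector -> ISO 639-3 code.
AUDIO_LANGS = {"mam": "mam", "kiche": "quc"}


def resolve_language(selected: Optional[str], detected: Optional[str]) -> str:
    """Choose the language code to route on; an explicit selection always wins."""
    if selected in AUDIO_LANGS:
        return AUDIO_LANGS[selected]
    if selected in TEXT_LANGS:
        return selected
    if detected:
        return detected.split("-")[0].lower()  # 'zh-cn' -> 'zh'
    return "en"
```

Letting the selection win even for text languages is a design choice: a user who clicked "Español" but typed in English still gets Spanish, which matches the Spanish-first goal but may need an easy way to switch.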
Language Detection and Routing
Automatic Detection
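```python
from langdetect import DetectorFactory, detect_langs
from langdetect.lang_detect_exception import LangDetectException

# langdetect is non-deterministic by default; fix the seed for repeatable results
DetectorFactory.seed = 0

def detect_user_language(text: str) -> dict:
    """Detect language with confidence score.

    Note: langdetect reports Chinese as 'zh-cn' / 'zh-tw' and cannot
    identify Mayan languages such as Mam or K'iche'.
    """
    try:
        detected = detect_langs(text)
    except LangDetectException:
        # Raised for empty or feature-less input (e.g. only digits or emoji)
        return {'language': 'en', 'confidence': 0.0, 'alternatives': []}
    primary = detected[0]
    return {
        'language': primary.lang,
        'confidence': primary.prob,
        'alternatives': [
            {'lang': d.lang, 'prob': d.prob}
            for d in detected[1:3]
        ]
    }
```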
Language Routing Logic
```python
def route_by_language(query: str, detected_lang: str):
    """Route query based on detected language."""
    # langdetect reports Chinese as 'zh-cn'/'zh-tw'; compare on the base code
    base_lang = detected_lang.split('-')[0].lower()
    # High-resource languages → LLM
    if base_lang in ['en', 'es', 'zh', 'vi']:
        return {
            'handler': 'llm',
            'model': select_model_for_language(base_lang),
            'response_language': detected_lang
        }
    # Indigenous languages → pre-recorded resources.
    # These ISO 639-3 codes (Mam, K'iche', Q'anjob'al) must come from the
    # user's explicit selection: langdetect cannot detect them.
    if base_lang in ['mam', 'quc', 'kjb']:
        return {
            'handler': 'static_resources',
            'media_type': 'audio_video',
            'resources': get_indigenous_resources(base_lang)
        }
    # Unknown → default to English with offer to switch
    return {
        'handler': 'llm',
        'response_language': 'en',
        'offer_alternatives': True
    }
```
Translation Verification
Human-in-the-Loop Required
AI translations of legal content MUST be verified by the following reviewers, and re-verified in regular audit cycles:
- Native-speaking legal professionals
- Community members familiar with immigration terminology
Common Translation Errors
| English | Bad Translation | Issue |
|---|---|---|
| "Parole" | "Libertad condicional" | Criminal vs immigration parole confusion |
| "Asylum" | "Asilo" (without context) | Can mean nursing home in some regions |
| "Removal" | "Eliminación" | Sounds like extermination |
| "Bond" | "Bono" | "Bono" means a voucher or financial bond; immigration bail is "fianza" |
Verified Translation Database
Maintain approved translations:
```yaml
# translations/legal-terms.yaml
asylum:
  en: "asylum"
  es: "asilo político"
  zh-CN: "政治庇护"
  vi: "tị nạn chính trị"
  verified_by: "CLINIC Legal Team"
  verified_date: "2026-03-01"
parole:
  en: "immigration parole"
  es: "libertad condicional de inmigración"  # NOT criminal parole
  zh-CN: "移民假释"
  vi: "tạm tha nhập cư"
  note: "Distinguish from criminal parole"
```
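The verification fields only help if the chatbot refuses to serve terms without them. A minimal sketch of such a gate, inlining the YAML above as a Python dict so it runs without a YAML parser; the 365-day `max_age_days` audit window is an assumption. Note that the `parole` entry above has no `verified_by` or `verified_date`, so this gate withholds it until it is reviewed:

```python
from datetime import date
from typing import Optional

# Same shape as translations/legal-terms.yaml (Spanish only, for brevity).
LEGAL_TERMS = {
    "asylum": {"es": "asilo político", "verified_by": "CLINIC Legal Team",
               "verified_date": "2026-03-01"},
    "parole": {"es": "libertad condicional de inmigración",
               "note": "Distinguish from criminal parole"},
}


def approved_term(key: str, lang: str, today: date,
                  max_age_days: int = 365) -> Optional[str]:
    """Return a translation only if it is verified and its audit is current."""
    entry = LEGAL_TERMS.get(key, {})
    verified = entry.get("verified_date")
    if lang not in entry or not entry.get("verified_by") or not verified:
        return None  # unverified: send to human review instead of serving
    age_days = (today - date.fromisoformat(verified)).days
    return entry[lang] if age_days <= max_age_days else None
```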
Bilingual Disclaimers
Session Start (English/Spanish)
```text
┌─────────────────────────────────────────────────────┐
│ IMPORTANT NOTICE / AVISO IMPORTANTE                 │
│                                                     │
│ [EN] This chatbot provides educational              │
│ information only—NOT legal advice.                  │
│                                                     │
│ [ES] Este chatbot proporciona información           │
│ educativa solamente—NO es asesoría legal.           │
│                                                     │
│ For your specific situation, consult an attorney.   │
│ Para su situación específica, consulte un abogado.  │
│                                                     │
│ Emergency / Emergencia: 1-844-363-1423              │
└─────────────────────────────────────────────────────┘
```
Per-Response Disclaimer (Auto-Language)
```python
DISCLAIMERS = {
    'en': "This is general information, not legal advice. Consult an attorney for your situation.",
    'es': "Esta es información general, no asesoría legal. Consulte a un abogado para su situación.",
    'zh': "这是一般信息，不是法律建议。请咨询律师了解您的具体情况。",
    'vi': "Đây là thông tin chung, không phải tư vấn pháp lý. Hãy tham khảo luật sư cho trường hợp của bạn."
}

def append_disclaimer(response: str, language: str) -> str:
    # Accept detector codes such as 'zh-cn' by matching on the base language
    disclaimer = DISCLAIMERS.get(language.split('-')[0].lower(), DISCLAIMERS['en'])
    return f"{response}\n\n---\n{disclaimer}"
```
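The evaluation code below calls `has_disclaimer`, which is not defined on this page. A minimal sketch, assuming an exact-substring check is acceptable, paired with a hypothetical `append_disclaimer_once` that avoids stacking duplicate disclaimers when a response passes through the pipeline twice:

```python
DISCLAIMERS = {
    "en": "This is general information, not legal advice. Consult an attorney for your situation.",
    "es": "Esta es información general, no asesoría legal. Consulte a un abogado para su situación.",
    "zh": "这是一般信息，不是法律建议。请咨询律师了解您的具体情况。",
    "vi": "Đây là thông tin chung, không phải tư vấn pháp lý. Hãy tham khảo luật sư cho trường hợp của bạn.",
}


def _disclaimer_for(language: str) -> str:
    # 'zh-cn' / 'zh-TW' -> 'zh'; unknown languages fall back to English
    return DISCLAIMERS.get(language.split("-")[0].lower(), DISCLAIMERS["en"])


def has_disclaimer(response: str, language: str) -> bool:
    """True if the response already contains the disclaimer for its language."""
    return _disclaimer_for(language) in response


def append_disclaimer_once(response: str, language: str) -> str:
    """Append the disclaimer unless it is already present."""
    if has_disclaimer(response, language):
        return response
    return f"{response}\n\n---\n{_disclaimer_for(language)}"
```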
Culturally Appropriate Design
Tone Considerations
| Culture | Communication Style | Chatbot Adaptation |
|---|---|---|
| Latin American | Warm, personal | Use respectful "usted", express empathy |
| Chinese | Formal, hierarchical | Professional tone, clear structure |
| Vietnamese | Respectful, indirect | Avoid blunt statements, offer options |
Cultural Context in Responses
```python
CULTURAL_ADAPTATIONS = {
    'es': {
        'greeting': "Hola, estoy aquí para ayudarle con información sobre sus derechos.",
        'empathy': "Entiendo que esta situación puede ser difícil.",
        'formal_you': True  # Use "usted" not "tú"
    },
    'zh': {
        'greeting': "您好，我可以为您提供有关移民权利的信息。",
        'empathy': "我理解这种情况可能很困难。",
        'formal_structure': True
    }
}
```
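The `formal_you` flag can be enforced after generation, and the evaluation code below calls a `check_cultural_markers` helper that this page does not define. A minimal sketch for Spanish only, assuming a small hypothetical list of informal second-person forms; other languages pass through unchecked:

```python
import re

# Hypothetical list of informal "tú" forms; extend after native-speaker review.
INFORMAL_ES = {"tú", "tu", "tienes", "puedes", "debes", "eres", "estás", "quieres"}


def uses_informal_you(response: str) -> set[str]:
    """Return the informal second-person forms found in a Spanish response."""
    tokens = re.findall(r"\w+", response.lower())
    return {t for t in tokens if t in INFORMAL_ES}


def check_cultural_markers(response: str, language: str) -> bool:
    """Spanish responses must use 'usted'; no automated check for other languages here."""
    if language.split("-")[0].lower() != "es":
        return True
    return not uses_informal_you(response)
```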
Testing Multilingual Quality
Evaluation Framework
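```python
from langdetect import detect

def evaluate_multilingual_response(
    query: str,
    response: str,
    target_language: str
) -> dict:
    """Evaluate multilingual response quality.

    verify_legal_terms, has_disclaimer, check_cultural_markers and
    rate_fluency are project-specific helpers assumed to be defined elsewhere.
    """
    # langdetect reports Chinese as 'zh-cn'/'zh-tw'; compare base codes
    detected = detect(response).split('-')[0]
    return {
        'language_correct': detected == target_language.split('-')[0].lower(),
        'terminology_accurate': verify_legal_terms(response, target_language),
        'disclaimer_present': has_disclaimer(response, target_language),
        'culturally_appropriate': check_cultural_markers(response, target_language),
        'fluency_score': rate_fluency(response, target_language)
    }
```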
Native Speaker Review
Before deployment:
- [ ] Spanish responses reviewed by native speaker with legal background
- [ ] Chinese (Simplified + Traditional) reviewed separately
- [ ] Vietnamese reviewed by community organization partner
- [ ] Indigenous language resources verified by community leaders
Next Steps
- Review UX design - Accessibility requirements
- Configure attorney integration - Handoff protocols
- Review implementation roadmap - Deployment phases