Emergency Hotline: Call 1-844-363-1423 (United We Dream Hotline)
ICE Encounter

Why Build a Database?

Translating massive volumes of disparate FOIA releases, redacted contract PDFs, and dense inspection reports into actionable intelligence requires robust, relational databases.

The methodology pioneered by TRAC (Transactional Records Access Clearinghouse) at Syracuse University and the Deportation Data Project at UC Berkeley serves as the gold standard.


Core Principles

Event-Based Architecture

A functional detention monitoring database must move beyond static facility lists to track chronological movement of individuals through the enforcement pipeline.


Anonymous Unique Identifiers

Challenge: Government redacts actual Alien Registration Numbers (A-Numbers) to protect privacy.

Solution: Cryptographic hash creates unique identifier allowing individual tracking across datasets without exposing identity.

Original A-Number → SHA-256 Hash → Anonymous Unique ID
A123456789        → [hash]       → AUI_a7b3c9d2e4f1

Database Schema

Core Tables

-- Arrest/Encounter Event
CREATE TABLE arrests (
    arrest_id         UUID PRIMARY KEY,
    anonymous_uid     VARCHAR(32) NOT NULL,
    arrest_date       DATE NOT NULL,
    geographic_loc    VARCHAR(100),
    zip_code          VARCHAR(10),
    arresting_agency  VARCHAR(50),
    agency_type       VARCHAR(20), -- ICE, CBP, 287g
    landmark_text     TEXT,
    latitude          DECIMAL(10,8),
    longitude         DECIMAL(11,8),
    created_at        TIMESTAMP DEFAULT NOW()
);

-- Detainer Requests
CREATE TABLE detainers (
    detainer_id       UUID PRIMARY KEY,
    anonymous_uid     VARCHAR(32) NOT NULL,
    issue_date        DATE NOT NULL,
    jail_facility     VARCHAR(100),
    jail_state        VARCHAR(2),
    response_type     VARCHAR(20), -- honored, ignored, released
    response_date     DATE,
    FOREIGN KEY (anonymous_uid) REFERENCES arrests(anonymous_uid)
);

-- Detention Stints
CREATE TABLE detention (
    detention_id      UUID PRIMARY KEY,
    anonymous_uid     VARCHAR(32) NOT NULL,
    facility_code     VARCHAR(20) NOT NULL,
    book_in_datetime  TIMESTAMP NOT NULL,
    book_out_datetime TIMESTAMP,
    transfer_flag     BOOLEAN DEFAULT FALSE,
    stint_number      INTEGER, -- nth detention for this individual
    FOREIGN KEY (facility_code) REFERENCES facilities(facility_code)
);

-- Removal/Release Events
CREATE TABLE removals (
    removal_id        UUID PRIMARY KEY,
    anonymous_uid     VARCHAR(32) NOT NULL,
    event_date        DATE NOT NULL,
    event_type        VARCHAR(20), -- removal, release, voluntary
    destination       VARCHAR(100),
    flight_id         VARCHAR(50),
    FOREIGN KEY (anonymous_uid) REFERENCES arrests(anonymous_uid)
);

-- Facility Reference
CREATE TABLE facilities (
    facility_code     VARCHAR(20) PRIMARY KEY,
    facility_name     VARCHAR(200) NOT NULL,
    facility_type     VARCHAR(20), -- SPC, CDF, IGSA, FRC
    operator          VARCHAR(100),
    standards_applied VARCHAR(20), -- PBNDS2011, NDS2019
    aor               VARCHAR(20), -- Area of Responsibility
    state             VARCHAR(2),
    city              VARCHAR(100),
    latitude          DECIMAL(10,8),
    longitude         DECIMAL(11,8),
    capacity          INTEGER,
    guaranteed_beds   INTEGER,
    per_diem_rate     DECIMAL(8,2),
    active            BOOLEAN DEFAULT TRUE
);

Data Integration Challenges

Inconsistent Nomenclature

ICE data is plagued by inconsistencies:

Challenge Example
Facility name variations "Adelanto" vs "Adelanto ICE Processing Center"
AOR code changes Historical reorganizations
Typos "Lumpkin" vs "Lumpkn"
Format shifts Different field offices use different conventions

Data Cleaning Pipeline

import pandas as pd
from fuzzywuzzy import fuzz

def standardize_facility_name(raw_name, facility_lookup):
    """
    Match raw facility name to standardized facility_code
    using fuzzy matching with manual review threshold
    """
    best_match = None
    best_score = 0

    for code, standard_name in facility_lookup.items():
        score = fuzz.ratio(raw_name.lower(), standard_name.lower())
        if score > best_score:
            best_score = score
            best_match = code

    if best_score >= 90:
        return best_match
    elif best_score >= 70:
        # Flag for manual review
        return f"REVIEW:{best_match}:{best_score}"
    else:
        return "UNMATCHED"

Geocoding Raw Data

ICE often provides only landmark text strings instead of coordinates.

import googlemaps

def geocode_landmark(landmark_text, state=None):
    """
    Convert landmark description to coordinates
    """
    gmaps = googlemaps.Client(key='YOUR_API_KEY')

    query = landmark_text
    if state:
        query += f", {state}"

    result = gmaps.geocode(query)

    if result:
        location = result[0]['geometry']['location']
        return {
            'latitude': location['lat'],
            'longitude': location['lng'],
            'formatted_address': result[0]['formatted_address']
        }
    return None

Calculating Key Metrics

Average Daily Population (ADP)

Method: Aggregate at facility-day level by counting distinct individuals present at midnight.

-- Calculate ADP for a facility in a given month
WITH daily_counts AS (
    SELECT
        facility_code,
        date_trunc('day', gs.day) AS count_date,
        COUNT(DISTINCT anonymous_uid) AS daily_pop
    FROM
        generate_series(
            '2026-01-01'::date,
            '2026-01-31'::date,
            '1 day'::interval
        ) AS gs(day)
    JOIN detention d ON
        gs.day >= d.book_in_datetime::date
        AND (d.book_out_datetime IS NULL
             OR gs.day < d.book_out_datetime::date)
    WHERE facility_code = 'ADELANTO'
    GROUP BY facility_code, date_trunc('day', gs.day)
)
SELECT
    facility_code,
    AVG(daily_pop) AS average_daily_population,
    MAX(daily_pop) AS peak_population,
    MIN(daily_pop) AS minimum_population
FROM daily_counts
GROUP BY facility_code;

Length of Stay Calculations

-- Calculate average length of stay by facility
SELECT
    facility_code,
    AVG(
        EXTRACT(EPOCH FROM (
            COALESCE(book_out_datetime, NOW()) - book_in_datetime
        )) / 86400
    ) AS avg_days,
    PERCENTILE_CONT(0.5) WITHIN GROUP (
        ORDER BY EXTRACT(EPOCH FROM (
            COALESCE(book_out_datetime, NOW()) - book_in_datetime
        )) / 86400
    ) AS median_days
FROM detention
WHERE book_in_datetime >= '2025-01-01'
GROUP BY facility_code
ORDER BY avg_days DESC;

Transfer Frequency Analysis

-- Identify individuals with high transfer counts
SELECT
    anonymous_uid,
    COUNT(*) AS transfer_count,
    COUNT(DISTINCT facility_code) AS facilities_visited,
    MIN(book_in_datetime) AS first_detention,
    MAX(COALESCE(book_out_datetime, NOW())) AS last_activity
FROM detention
GROUP BY anonymous_uid
HAVING COUNT(*) > 3
ORDER BY transfer_count DESC;

Data Visualization

Interactive Filtering

Design public interfaces allowing filtering by:

Filter Options
Fiscal Year FY2020-FY2026
Area of Responsibility 24 AORs
Facility Type SPC, CDF, IGSA, FRC
Subpopulation Family, juvenile, non-criminal
Operator GEO, CoreCivic, IHSC, etc.

Map Visualizations

Effective Approaches:

  • Choropleth maps by state/AOR
  • Graduated symbols for facility size
  • Flow maps for transfer patterns
  • Heat maps for enforcement concentration

Time Series Dashboards

Track over time:

  • ADP by facility/AOR
  • Book-in volumes
  • Length of stay trends
  • Transfer frequencies
  • Death rates

Public API Design

Enabling External Research

GET /api/facilities
GET /api/facilities/{facility_code}
GET /api/facilities/{facility_code}/population?start=2025-01-01&end=2025-12-31
GET /api/statistics/adp?aor=SNA&fiscal_year=2026
GET /api/statistics/transfers?facility_code=ADELANTO

Response Format:

{
    "facility_code": "ADELANTO",
    "facility_name": "Adelanto ICE Processing Center",
    "facility_type": "CDF",
    "operator": "GEO Group",
    "statistics": {
        "period": "2026-01",
        "average_daily_population": 1802,
        "book_ins": 423,
        "book_outs": 398,
        "avg_length_of_stay_days": 45.2
    }
}

Data Quality Assurance

Validation Rules

def validate_detention_record(record):
    """
    Validate detention record integrity
    """
    errors = []

    # Required fields
    if not record.get('anonymous_uid'):
        errors.append("Missing anonymous_uid")
    if not record.get('facility_code'):
        errors.append("Missing facility_code")
    if not record.get('book_in_datetime'):
        errors.append("Missing book_in_datetime")

    # Logical checks
    if record.get('book_out_datetime'):
        if record['book_out_datetime'] < record['book_in_datetime']:
            errors.append("Book out before book in")

    # Referential integrity
    if record.get('facility_code'):
        if not facility_exists(record['facility_code']):
            errors.append(f"Unknown facility: {record['facility_code']}")

    return errors

Handling Data Gaps

Gap Type Strategy
Missing book_out Flag as "currently detained"
Unknown facility Create placeholder, research later
Date inconsistencies Flag for manual review
Duplicate records Deduplication pipeline

Update Frequency

Data Sources and Cadence

Source Update Frequency
FOIA bulk data As received (irregular)
Public statistics Weekly/monthly ICE releases
Facility inventory Quarterly refresh
Contract data Annual from USASpending
Inspection reports As released

Version Control

Maintain historical snapshots:

  • Monthly database dumps
  • Change logs for facility updates
  • Data source provenance tracking

Infrastructure Requirements

Minimum Setup

Component Specification
Database PostgreSQL 14+
Storage 100GB+ for multi-year data
API Server Node.js or Python Flask
Visualization D3.js, Leaflet, or Tableau

Scaling Considerations

For large-scale analysis:

  • Read replicas for query load
  • Time-series partitioning for detention table
  • Caching layer for common queries
  • CDN for map tiles

Related Resources