Importing Medicaid Data Processed by the Legacy Pipeline

Status of this Document

This document describes an abandoned attempt to load already-processed Medicaid data into the NSAPH Data Platform PostgreSQL database. That approach was dropped in favor of a reproducible pipeline that ingests raw CMS data from the packages delivered by ResDAC.

See the documentation for the new pipeline.

The examples below show how the processed data was ingested:

All paths are on nsaph-sandbox01.rc.fas.harvard.edu.

Ingest demographics:

python -u -m nsaph.model2 /data/incoming/rce/ci3_d_medicaid/processed_data/cms_medicaid-max/csv/maxdata_demographics.csv.gz
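Before ingesting, it can help to confirm the column headers of the gzipped CSV against the target table definition. A minimal sketch, assuming the file is comma-delimited:

# Print the header row of the gzipped CSV, one column name per line
zcat /data/incoming/rce/ci3_d_medicaid/processed_data/cms_medicaid-max/csv/maxdata_demographics.csv.gz \
    | head -n 1 \
    | tr ',' '\n'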

Ingest enrollments (yearly) and eligibility (monthly):

nohup python -u -m nsaph.data_model.model2 --data /data/incoming/rce/ci3_d_medicaid/processed_data/cms_medicaid-max/data_cms_medicaid-max-ps_patient-year/medicaid_mortality_2005.fst -t enrollments_year --threads 4 --page 5000 &
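The command above loads a single year and, because it runs under nohup, writes its output to nohup.out (follow it with tail -f nohup.out). If the remaining years follow the same file naming pattern (an assumption; check the directory listing first), a loop such as this sketch could ingest them all in the foreground, reusing the same flags:

for f in /data/incoming/rce/ci3_d_medicaid/processed_data/cms_medicaid-max/data_cms_medicaid-max-ps_patient-year/medicaid_mortality_*.fst ; do
    date
    echo "$f"
    python -u -m nsaph.data_model.model2 --data "$f" -t enrollments_year --threads 4 --page 5000
done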

Ingest admissions:

# ${year} must be set before running the loop, e.g.: year=2005
for f in /data/incoming/rce/ci3_d_medicaid/processed_data/cms_medicaid-max/data_cms_medicaid-max-ip_patient-admission-date/maxdata_*_ip_${year}.fst ; do
    date
    echo "$f"
    python -u -m nsaph.data_model.model2 --data "$f" -t admissions --page 5000 --log 10000 --threads 2
done
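After a load completes, a quick row count in PostgreSQL can confirm that data landed in the target tables. The table names enrollments_year and admissions come from the -t flags above; the demographics table name is not shown in the commands and would need to be checked. A sketch, assuming psql picks up the database connection from its defaults or environment variables:

# Spot-check row counts for the tables named by the -t flags above
for t in enrollments_year admissions ; do
    echo "$t"
    psql -c "SELECT count(*) FROM $t;"
done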