Health Data

Pipelines to process CMS data: Medicaid and Medicare

Overview of health data (Medicare and Medicaid)
Project Structure
Documentation Indices

Overview of health data (Medicare and Medicaid)

We use health data provided by the Centers for Medicare & Medicaid Services (CMS).

The data processing pipelines included in this package create a data warehouse with health data (Medicare and Medicaid). They ingest raw data into the database, cleanse and, when possible, deduplicate it, analyze data quality, and optimize the tables for efficient queries.

Please see the following documents for details:

Data model and processing of Medicaid data
Data model and processing of Medicare data
Tips on querying of Medicaid data

Medicare processing now includes a pipeline that automatically creates QC tables. These tables are used by an Apache Superset dashboard that visualizes the QC results.

Project Structure 

The top-level directories are:

- doc
- src

The doc directory contains documentation.

The src directory contains the software source code. The directories under src are:

- cwl
- python

CWL 

The CWL folder contains reusable workflows, packaged as tools, that can and should be used by all NSAPH pipelines.

Each processing step of CMS data is packaged as a standalone tool that can be run individually. Each tool is individually documented. The tools are combined into workflows represented by the medicaid.cwl and medicare.cwl files.

Python 

Python packages and modules are described in the Python Package Description document.

Included are utilities to:

- Parse FTS format and generate a database schema

Data Model for health data 

The data model, in YAML format, is used to generate the database schema and the processing code that ingests data into the database. Read more about the modeling in the Data Modeling document.

The model for raw data is automatically generated by parsing FTS files or analyzing SAS data.
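To make the idea of generating a schema from a data model more concrete, here is a minimal sketch in Python. It is only an illustration: the model layout (`tables`, `columns`, `primary_key`), the column names, and the functions `table_ddl` and `model_ddl` are hypothetical, and a plain dictionary stands in for a parsed YAML file. The actual model format and generator used by this package may differ.

```python
# Hypothetical, simplified model: a dictionary standing in for a parsed YAML file.
# The real model format used by the package may differ.
model = {
    "medicaid": {
        "tables": {
            "beneficiaries": {
                "columns": {"bene_id": "VARCHAR", "dob": "DATE"},
                "primary_key": ["bene_id"],
            }
        }
    }
}


def table_ddl(schema, table, definition):
    """Render one CREATE TABLE statement from a table definition."""
    lines = [f"    {name} {sql_type}" for name, sql_type in definition["columns"].items()]
    primary_key = definition.get("primary_key")
    if primary_key:
        lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    body = ",\n".join(lines)
    return f"CREATE TABLE {schema}.{table} (\n{body}\n);"


def model_ddl(model):
    """Render DDL statements for every table of every schema in the model."""
    return [
        table_ddl(schema, table, definition)
        for schema, content in model.items()
        for table, definition in content["tables"].items()
    ]


ddl = model_ddl(model)
print(ddl[0])
```

Keeping the model declarative means that the same description can drive both schema creation and ingestion code, which is the motivation stated above.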

The following models are defined here:

Medicaid processed data. See also Handling Medicaid data
- Tables
  - medicaid.beneficiaries details
  - medicaid.enrollments details
  - medicaid.eligibility details
  - medicaid.admissions details
- SQL Views, used internally for data processing
  - medicaid.monthly
  - medicaid._eligibility
Medicare processed data. See also Medicare Files Handling
- Tables
  - medicare.beneficiaries details
  - medicare.enrollments details
  - medicare.admissions details
- SQL Views, used internally for data processing
  - medicare.ps Combined raw data for patient summaries
  - medicare. Combined raw data for inpatient admissions
  - medicare._ps
  - medicare._beneficiaries
  - medicare._enrollments

SQL 

The procedures file addresses the problem that creating the Medicaid eligibility table in a single transaction requires too much time and memory. The stored procedures in this file split the population of this table either by beneficiary or by year and state. Splitting by beneficiary (i.e., using one database transaction per beneficiary) works best.
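The one-transaction-per-beneficiary approach can be sketched as follows. This is not the package's stored procedure: it uses Python's built-in `sqlite3` module as a stand-in for the actual database, and the table names (`raw_enrollments`, `eligibility`), the columns, and the function `populate_by_beneficiary` are assumptions made for illustration.

```python
import sqlite3


def populate_by_beneficiary(conn):
    """Fill `eligibility` from `raw_enrollments`, one transaction per beneficiary.

    Table and column names are hypothetical. Returns the number of transactions.
    """
    # Materialize the list of beneficiaries before starting to write.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT bene_id FROM raw_enrollments ORDER BY bene_id"
        )
    ]
    for bene_id in ids:
        # `with conn` commits on success and rolls back on an exception,
        # so each beneficiary is handled in its own small transaction.
        with conn:
            conn.execute(
                "INSERT INTO eligibility (bene_id, year, state) "
                "SELECT DISTINCT bene_id, year, state "
                "FROM raw_enrollments WHERE bene_id = ?",
                (bene_id,),
            )
    return len(ids)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_enrollments (bene_id TEXT, year INTEGER, state TEXT);
    CREATE TABLE eligibility (bene_id TEXT, year INTEGER, state TEXT);
    INSERT INTO raw_enrollments VALUES
        ('A', 2010, 'MA'), ('A', 2010, 'MA'), ('A', 2011, 'MA'), ('B', 2010, 'NY');
    """
)
transactions = populate_by_beneficiary(conn)
row_count = conn.execute("SELECT COUNT(*) FROM eligibility").fetchone()[0]
```

Because each transaction touches only one beneficiary's rows, a failure affects a single beneficiary and the work can be resumed without redoing what was already committed.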

The functions file contains helper functions to parse dates in the non-standard formats encountered in the raw Medicare files.
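As a rough illustration of what such date helpers do, the sketch below tries several candidate formats in turn and returns nothing when none of them matches. It is written in Python rather than SQL, and the list of formats and the name `parse_raw_date` are assumptions; the formats actually found in the raw files, and the way the package's SQL functions handle them, may be different.

```python
from datetime import date, datetime
from typing import Optional

# Assumed example formats; the formats present in the raw files may differ.
CANDIDATE_FORMATS = ("%Y%m%d", "%m/%d/%Y", "%d%b%Y", "%Y-%m-%d")


def parse_raw_date(value: Optional[str]) -> Optional[date]:
    """Try each candidate format in order; return None if nothing matches."""
    if value is None:
        return None
    text = value.strip()
    if not text:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # This format did not match; try the next one.
    return None
```

Returning `None` for unparseable values, instead of raising, lets ingestion continue and leaves the decision about bad dates to a later data quality step. Note that the order of the formats matters when a value could be read in more than one way.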
