Querying Medicaid Data
Querying diagnoses
Querying by diagnoses might be expensive. Use EXPLAIN to understand and optimize your queries See SampleQuery.
Listing patients
The following sample lists all patient hospitalized with specified diagnosis codes. The code can be either primary or secondary diagnosis.
SELECT
state,
zip,
YEAR,
age(admission_date::timestamp, dob::timestamp) as bene_age,
admission_date,
discharge_date,
diagnosis,
file,
bene_id
FROM
medicaid.admissions
natural join medicaid.enrollments
natural join medicaid.beneficiaries
WHERE
diagnosis && ARRAY['29620'::varchar, '29633'::varchar]
AND EXTRACT (YEAR FROM age(admission_date::timestamp, dob::timestamp))
BETWEEN 10 and 18
This query uses array operator &&
.
See documentation
to understand how it works.
It also uses two conditions in WHERE clause:
Diagnosis has to be one of those listed (ICD9 29620 or 29633)
Age at a date of admission has to be between 10 and 18 years.
Problem: no wildcarding
Here we encounter to one of the problems.
The syntax above (based on array operators) does not support wildcards.
You must write all the ICD codes explicitly,
something like this ‘2962*’ or ‘2962%’ will not fly.
You technically can get around this with UNNEST function,
but then the index built on
diagnosis column will not be used and query performance might become horrible.
One solution can be to write a Python program that will generate the query explicitly listing all required ICD codes, i.e. ‘2962’, ‘29620’,’29621’, etc. You would need to download ICD9 data for it, e.g. from UMLS website.
Calculating numbers
Instead of listing patients we can just calculate the number of those patients that are of interest to us, without seeing individual records. This is done through aggregate queries.
If we want to calculate hospitalizations in every zip code for every year separately, we can use the following query:
SELECT
zip,
year,
count(*) as enrollees,
COUNT(*) FILTER (
WHERE bene_id IN (
SELECT bene_id
FROM medicaid.admissions natural join medicaid.beneficiaries
WHERE
diagnosis && ARRAY['2962'::varchar, '2963'::varchar]
AND EXTRACT (
YEAR FROM age(admission_date::timestamp, dob::timestamp)
) BETWEEN 10 and 18
)
) AS hospitalized
FROM
medicaid.enrollments
GROUP BY
year,
zip
ORDER BY
year,
zip
This query uses an array operator and FILTER clause.
This query calculates the number of all people enrolled in Medicaid in a certain zip code. Instead you might want to calculate just those whos age is between 10 and 18 years. To do this you should use FILTER clause to calculate enrollees similar to how it is used to calculate hospitalized patients.
Now, to calculate hospitalizations in every Zip code over all years:
SELECT
zip,
count(*) as enrollees,
COUNT(*) FILTER (
WHERE bene_id IN (
SELECT bene_id
FROM medicaid.admissions natural join medicaid.beneficiaries
WHERE
diagnosis && ARRAY['2962'::varchar, '2963'::varchar]
AND EXTRACT (
YEAR FROM age(admission_date::timestamp, dob::timestamp)
) BETWEEN 10 and 18
)
) AS hospitalized
FROM
medicaid.enrollments
GROUP BY
zip
ORDER BY
3 desc,
zip