NSAPH EPA Package Description
Toolkit for downloading and preprocessing of data provided by EPA
Handles the following types of data:
EPA AQS Data hosted at https://www.epa.gov/aqs and EPA AirNow data from https://docs.airnowapi.org/webservices
AirNow contains real-time up-to-date pollution data but is less reliable than AQS
Package Contents
- EPA Toolkit Usage
- Python Components
- Main Class for EPA AirNow Package
- Downloader for AirNow
- Configuration Options for EPA AirNow Package
- The airnow_setup Module
- The airnow_shapes Module
- Main Class for EPA AQS Package
- Configuration Options for EPA AQS Package
- The aqs_expand Module
- Internal scripts used for download tasks
- The registry Module
- CWL Tools and Common Workflows
- Full EPA AirNow Processing Pipeline (including downloading shapefiles)
- Full AirNowProcessing Pipeline (with shapefiles on local file system)
- airnow_local_shapes.cwl
- airnow.cwl
- Full EPA AQS Processing Pipeline
- aqs.cwl
- Downloader of AirNow Data
- download_airnow.cwl
- Downloader of AQS Data
- download_aqs.cwl
- Resource Loader
- ensure_resource.cwl
- Downloader of AQS Data
- expand_aqs.cwl
- Downloader of AirNow Data
- get_shapes.cwl
- Index Builder
- index.cwl
- Universal uploader of the tabular data to the database
- ingest.cwl
- Database initializer
- initcoredb.cwl
- Introspector for downloaded data file
- introspect.cwl
- Runs an SQL Test Script, presumably generated by DBT Utility
- run_test.cwl
- Sample arguments to AirNow pipeline
- Sample arguments to AQS pipeline
- Prepare environment for AirNow
- setup_airnow.cwl
- Test harness for airnow.cwl
- test_airnow.cwl
- Test harness for aqs.cwl
- test_aqs.cwl
- Table tuner tool (running VACUUM)
- vacuum.cwl
Querying Data
AirNow
Note: Negative values indicate missing data, therefore we need to specify a condition (value > 0)
Columns
COLUMN_NAME |
TYPE_NAME |
IS_NULLABLE |
IS_AUTOINCREMENT |
IS_GENERATEDCOLUMN |
---|---|---|---|---|
fullaqscode |
text |
YES |
NO |
NO |
latitude |
numeric |
YES |
NO |
NO |
longitude |
numeric |
YES |
NO |
NO |
utc |
timestamp |
YES |
NO |
NO |
parameter |
varchar |
YES |
NO |
NO |
unit |
varchar |
YES |
NO |
NO |
value |
numeric |
YES |
NO |
NO |
aqi |
numeric |
YES |
NO |
NO |
category |
int4 |
YES |
NO |
NO |
sitename |
varchar |
YES |
NO |
NO |
agencyname |
varchar |
YES |
NO |
NO |
intlaqscode |
text |
YES |
NO |
NO |
zcta |
int4 |
YES |
NO |
NO |
state |
int4 |
YES |
NO |
NO |
fips5 |
int4 |
YES |
NO |
NO |
statefp |
int4 |
YES |
NO |
NO |
countyfp |
int4 |
YES |
NO |
NO |
county |
int4 |
YES |
NO |
NO |
stusps |
varchar |
YES |
NO |
NO |
record |
varchar |
NO |
NO |
NO |
monitor |
varchar |
YES |
NO |
NO |
Sample AirNow query by date
SELECT
utc::date,
AVG(value) AS mean_value,
MIN(value) AS minimum,
MAX(value) AS maximum,
COUNT(*)
FROM
epa.airnow_pm25_2022
WHERE value > 0
GROUP BY 1
ORDER BY 1
Sample AirNow query, aggregating monthly
SELECT
EXTRACT(MONTH FROM utc),
AVG(value) AS mean_value,
MIN(value) AS minimum,
MAX(value) AS maximum,
COUNT(*)
FROM
epa.airnow_pm25_2022
WHERE value > 0
GROUP BY 1
ORDER BY 1
Sample AirNow query, aggregating by State
SELECT
stusps,
AVG(value) AS mean_value,
MIN(value) AS minimum,
MAX(value) AS maximum,
COUNT(*)
FROM
epa.airnow_pm25_2022
WHERE value > 0
GROUP BY 1
ORDER BY 2 desc
AQS
See AQS Data Dictionary for the description of the columns
Annual Columns
COLUMN_NAME |
TYPE_NAME |
IS_NULLABLE |
IS_AUTOINCREMENT |
IS_GENERATEDCOLUMN |
---|---|---|---|---|
state_code |
varchar |
YES |
NO |
NO |
county_code |
varchar |
YES |
NO |
NO |
site_num |
varchar |
YES |
NO |
NO |
parameter_code |
varchar |
YES |
NO |
NO |
poc |
numeric |
YES |
NO |
NO |
latitude |
numeric |
YES |
NO |
NO |
longitude |
numeric |
YES |
NO |
NO |
datum |
varchar |
YES |
NO |
NO |
parameter_name |
varchar |
YES |
NO |
NO |
sample_duration |
varchar |
YES |
NO |
NO |
pollutant_standard |
varchar |
YES |
NO |
NO |
metric_used |
varchar |
YES |
NO |
NO |
method_name |
varchar |
YES |
NO |
NO |
year |
numeric |
YES |
NO |
NO |
units_of_measure |
varchar |
YES |
NO |
NO |
event_type |
varchar |
YES |
NO |
NO |
observation_count |
numeric |
YES |
NO |
NO |
observation_percent |
numeric |
YES |
NO |
NO |
completeness_indicator |
varchar |
YES |
NO |
NO |
valid_day_count |
numeric |
YES |
NO |
NO |
required_day_count |
numeric |
YES |
NO |
NO |
exceptional_data_count |
numeric |
YES |
NO |
NO |
null_data_count |
numeric |
YES |
NO |
NO |
primary_exceedance_count |
varchar |
YES |
NO |
NO |
secondary_exceedance_count |
varchar |
YES |
NO |
NO |
certification_indicator |
varchar |
YES |
NO |
NO |
num_obs_below_mdl |
numeric |
YES |
NO |
NO |
arithmetic_mean |
numeric |
YES |
NO |
NO |
arithmetic_standard_dev |
numeric |
YES |
NO |
NO |
c_1st_max_value |
numeric |
YES |
NO |
NO |
c_1st_max_datetime |
varchar |
YES |
NO |
NO |
c_2nd_max_value |
varchar |
YES |
NO |
NO |
c_2nd_max_datetime |
varchar |
YES |
NO |
NO |
c_3rd_max_value |
varchar |
YES |
NO |
NO |
c_3rd_max_datetime |
varchar |
YES |
NO |
NO |
c_4th_max_value |
varchar |
YES |
NO |
NO |
c_4th_max_datetime |
varchar |
YES |
NO |
NO |
c_1st_max_non_overlapping_value |
varchar |
YES |
NO |
NO |
c_1st_no_max_datetime |
varchar |
YES |
NO |
NO |
c_2nd_max_non_overlapping_value |
varchar |
YES |
NO |
NO |
c_2nd_no_max_datetime |
varchar |
YES |
NO |
NO |
c_99th_percentile |
numeric |
YES |
NO |
NO |
c_98th_percentile |
numeric |
YES |
NO |
NO |
c_95th_percentile |
numeric |
YES |
NO |
NO |
c_90th_percentile |
numeric |
YES |
NO |
NO |
c_75th_percentile |
numeric |
YES |
NO |
NO |
c_50th_percentile |
numeric |
YES |
NO |
NO |
c_10th_percentile |
numeric |
YES |
NO |
NO |
local_site_name |
varchar |
YES |
NO |
NO |
address |
varchar |
YES |
NO |
NO |
state_name |
varchar |
YES |
NO |
NO |
county_name |
varchar |
YES |
NO |
NO |
city_name |
varchar |
YES |
NO |
NO |
cbsa_name |
varchar |
YES |
NO |
NO |
date_of_last_change |
date |
YES |
NO |
NO |
monitor |
varchar |
YES |
NO |
NO |
record |
varchar |
NO |
NO |
NO |
Daily columns
COLUMN_NAME |
TYPE_NAME |
IS_NULLABLE |
SCOPE_TABLE |
IS_AUTOINCREMENT |
IS_GENERATEDCOLUMN |
---|---|---|---|---|---|
state_code |
varchar |
YES |
(null) |
NO |
NO |
county_code |
varchar |
YES |
(null) |
NO |
NO |
site_num |
varchar |
YES |
(null) |
NO |
NO |
parameter_code |
varchar |
YES |
(null) |
NO |
NO |
poc |
numeric |
YES |
(null) |
NO |
NO |
latitude |
numeric |
YES |
(null) |
NO |
NO |
longitude |
numeric |
YES |
(null) |
NO |
NO |
datum |
varchar |
YES |
(null) |
NO |
NO |
parameter_name |
varchar |
YES |
(null) |
NO |
NO |
sample_duration |
varchar |
YES |
(null) |
NO |
NO |
pollutant_standard |
varchar |
YES |
(null) |
NO |
NO |
date_local |
date |
YES |
(null) |
NO |
NO |
units_of_measure |
varchar |
YES |
(null) |
NO |
NO |
event_type |
varchar |
YES |
(null) |
NO |
NO |
observation_count |
numeric |
YES |
(null) |
NO |
NO |
observation_percent |
numeric |
YES |
(null) |
NO |
NO |
arithmetic_mean |
numeric |
YES |
(null) |
NO |
NO |
c_1st_max_value |
numeric |
YES |
(null) |
NO |
NO |
c_1st_max_hour |
numeric |
YES |
(null) |
NO |
NO |
aqi |
numeric |
YES |
(null) |
NO |
NO |
method_code |
varchar |
YES |
(null) |
NO |
NO |
method_name |
varchar |
YES |
(null) |
NO |
NO |
local_site_name |
varchar |
YES |
(null) |
NO |
NO |
address |
varchar |
YES |
(null) |
NO |
NO |
state_name |
varchar |
YES |
(null) |
NO |
NO |
county_name |
varchar |
YES |
(null) |
NO |
NO |
city_name |
varchar |
YES |
(null) |
NO |
NO |
cbsa_name |
varchar |
YES |
(null) |
NO |
NO |
date_of_last_change |
date |
YES |
(null) |
NO |
NO |
monitor |
varchar |
YES |
(null) |
NO |
NO |
record |
varchar |
NO |
(null) |
NO |
NO |