Gridmet Computational Utilities
What is gridMET?
gridMET is a dataset of daily, high-spatial-resolution (~4 km, 1/24th degree) surface meteorological data covering the contiguous US from 1979 to yesterday. The data are also known and cited as METDATA.
Executing pipelines from this package requires a collection of shapefiles corresponding to the geographies over which the data is aggregated (for example, zip code areas or counties).

The shapefiles must be placed in the following directory structure:

```
${year}/${geo_type: zip|county|etc.}/${shape: point|polygon}/
```

The geography to use is selected by the geography argument, which defaults to "zip". Shapefiles are required only for the geographies and years that are actually used.
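As an illustration, the expected shapefile location for a given year and geography can be resolved like this (a minimal sketch of the documented layout; the helper name and arguments are assumptions, not part of the package API):

```python
import os

def shape_file_path(shapes_dir: str, year: int, geo_type: str = "zip",
                    shape: str = "polygon") -> str:
    """Build the expected path ${shapes_dir}/${year}/${geo_type}/${shape}/.

    Hypothetical helper illustrating the documented directory structure;
    it is not part of the gridmet package itself.
    """
    return os.path.join(shapes_dir, str(year), geo_type, shape)

# For example, the 2001 zip-code polygons would be expected under:
print(shape_file_path("shapes", 2001))  # shapes/2001/zip/polygon on POSIX systems
```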
Using the command-line gridMET utility
```
usage: gridmet.py [-h] --variables
                  {bi,erc,etr,fm100,fm1000,pet,pr,rmax,rmin,sph,srad,th,tmmn,tmmx,vpd,vs}
                  [{bi,erc,etr,fm100,fm1000,pet,pr,rmax,rmin,sph,srad,th,tmmn,tmmx,vpd,vs} ...]
                  [--strategy {default,all_touched,combined,downscale}]
                  [--destination DESTINATION] [--downloads DOWNLOADS]
                  [--geography GEOGRAPHY] [--shapes_dir SHAPES_DIR]
                  [--shapes [SHAPES [SHAPES ...]]]

optional arguments:
  -h, --help            show this help message and exit
  --years [YEARS [YEARS ...]], -y [YEARS [YEARS ...]]
                        Year or list of years to download. For example, the
                        argument `-y 1992:1995 1998 1999 2011 2015:2017` will
                        produce the following list:
                        [1992,1993,1994,1995,1998,1999,2011,2015,2016,2017],
                        default: 1990:2020
  --compress, -c        Use gzip compression for the result, default: True
  --variables {bi,erc,etr,fm100,fm1000,pet,pr,rmax,rmin,sph,srad,th,tmmn,tmmx,vpd,vs} [{bi,erc,etr,fm100,fm1000,pet,pr,rmax,rmin,sph,srad,th,tmmn,tmmx,vpd,vs} ...], --var {bi,erc,etr,fm100,fm1000,pet,pr,rmax,rmin,sph,srad,th,tmmn,tmmx,vpd,vs} [{bi,erc,etr,fm100,fm1000,pet,pr,rmax,rmin,sph,srad,th,tmmn,tmmx,vpd,vs} ...]
                        gridMET bands or variables
  --strategy {default,all_touched,combined,downscale}, -s {default,all_touched,combined,downscale}
                        Rasterization strategy, default: default
  --destination DESTINATION, --dest DESTINATION, -d DESTINATION
                        Destination directory for the processed files,
                        default: data/processed
  --raw_downloads RAW_DOWNLOADS
                        Directory for downloaded raw files, default:
                        data/downloads
  --geography {zip,county,custom}
                        The type of geographic area over which the data is
                        aggregated, default: zip
  --shapes_dir SHAPES_DIR
                        Directory containing shapefiles for geographies.
                        Directory structure is expected to be:
                        .../${year}/${geo_type}/{point|polygon}/, default:
                        shapes
  --shapes [{point,polygon} [{point,polygon} ...]]
                        Type of shapes to aggregate over, default: ['polygon']
  --points POINTS       Path to CSV file containing points, default:
  --coordinates COORDINATES [COORDINATES ...], --xy COORDINATES [COORDINATES ...], --coord COORDINATES [COORDINATES ...]
                        Column names for coordinates, default:
  --metadata METADATA [METADATA ...], -m METADATA [METADATA ...], --meta METADATA [METADATA ...]
                        Column names for metadata, default:
```
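The colon syntax accepted by --years can be illustrated with a small parser (a sketch of the documented behavior; the function name is an assumption, not the package's actual implementation):

```python
def parse_years(args: list) -> list:
    """Expand arguments like '1992:1995' (an inclusive range) or '1998'
    (a single year) into a flat list of years, mirroring the -y syntax
    described in the help text."""
    years = []
    for arg in args:
        if ":" in arg:
            start, end = arg.split(":")
            years.extend(range(int(start), int(end) + 1))
        else:
            years.append(int(arg))
    return years

# The example from the help text above:
parse_years(["1992:1995", "1998", "1999", "2011", "2015:2017"])
# → [1992, 1993, 1994, 1995, 1998, 1999, 2011, 2015, 2016, 2017]
```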
Example
One can try it on nsaph-sandbox01.rc.fas.harvard.edu by changing to the folder /data/projects/gridmet/ and running the following command (do not forget the -u option, or you will not be able to see the progress):

```shell
source /home/nsaph/projects/tools/gridmet/.gridmet/bin/activate && \
    PYTHONPATH=/home/nsaph/projects/tools/gridmet/src/python \
    python -u -m gridmet --var tmmx -y 2001 \
    --shapes_dir shapes/zip_shape_files --strategy downscale
```

The results can then be found in the data/processed folder.
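Since --compress defaults to True, the processed files are gzip-compressed. Assuming the output is tabular CSV (an assumption about the format, not stated above; the file name below is hypothetical), a result can be previewed with the standard library:

```python
import csv
import gzip

def preview(path: str, n: int = 5) -> list:
    """Read the first n rows of a gzip-compressed CSV file as dicts."""
    with gzip.open(path, mode="rt", newline="") as f:
        reader = csv.DictReader(f)
        return [row for _, row in zip(range(n), reader)]

# Hypothetical path: the actual file name depends on the variable,
# year and geography that were processed.
# preview("data/processed/tmmx_2001.csv.gz")
```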
CWL pipelines and tools
- add_daily_data.cwl: Downloader of gridMET Data
- add_data.cwl: Uploader of the gridMET Data to the database
- aggregate_daily.cwl: Tool aggregating a NetCDF grid file over shapes
- aggregate_one_file.cwl: Workflow to aggregate and ingest NetCDF files for one year
- aggregate_wustl.cwl: Aggregates data in a NetCDF file over given geographies
- download.cwl: Downloader of gridMET Data
- get_shapes.cwl: Downloader of AirNow Data
- gridmet.cwl: Pipeline to aggregate data from Climatology Lab
  - tmpd5z66z6qgridmet.cwl: Sub-workflow init_tables from gridmet.cwl
- gridmet_dwnl_only.cwl: gridMET Pipeline
  - tmp9pmq12ragridmet_dwnl_only.cwl: Sub-workflow process from gridmet_dwnl_only.cwl
- gridmet_local_shapes.cwl: gridMET Pipeline
  - tmprm194zk3gridmet_local_shapes.cwl: Sub-workflow process from gridmet_local_shapes.cwl
- gridmet_one_file.cwl: Workflow to aggregate and ingest one gridMET file in NetCDF format
- index.cwl: Index Builder
- ingest.cwl: Universal uploader of tabular data to the database
- initdb.cwl: Database initializer
- pm25_yearly_download.cwl: Pipeline to aggregate data in NetCDF format over given geographies
- registry.cwl: Model YAML Writer
- reset.cwl: Generic Table (View/Materialized View) Initializer
- run_test.cwl: Runs an SQL test script, presumably generated by the DBT utility
- test_gridmet.cwl: Test harness for gridmet.cwl
- test_pm25_yearly_download.cwl: Test harness for pm25_yearly_download.cwl
- vacuum.cwl: Table tuner tool (running VACUUM)
- wustl.cwl: Pipeline to ingest Pollution downloaded from WashU Box
- wustl_consolidate_components.cwl
- wustl_file_pattern.cwl: Expression evaluator to format a file name for pollution files downloaded from WashU
- wustl_one_file.cwl: Workflow to aggregate and ingest one file in NetCDF format
- wustl_one_year.cwl: Workflow to aggregate and ingest NetCDF files for one year