Internal scripts used for download tasks
Python module to download EPA AQS Data hosted at https://www.epa.gov/aqs
The module can be used as a library of functions to be called from other Python scripts.
The data is downloaded from https://aqs.epa.gov/aqsweb/airdata/download_files.html
The tool adds a column containing a uniquely generated Monitor Key
Probably the only method useful to an external user is download_aqs_data()
- transfer(reader: DictReader, writer: DictWriter, flt=None, header: bool = True)[source]
Specific for EPA AQS Data
Rewrites the CSV content, adding a Monitor Key and optionally filtering rows by a provided list of parameter codes
- Parameters
reader – Input data as an instance of csv.DictReader
writer – Output destination as an instance of csv.DictWriter
flt – Optionally, a callable returning True for rows that should be written to the output and False for those that should be omitted
header – whether to write a header row first
- Returns
Nothing
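Because transfer() operates on standard csv readers and writers, its row-filtering behavior can be sketched with the standard library alone. The Monitor Key logic is omitted here, and the flt callable shown is a hypothetical example, not part of the library:

```python
import csv
import io

# In-memory stand-ins for the AQS CSV input and output.
src = io.StringIO(
    "Parameter Code,Arithmetic Mean\n"
    "88101,10.5\n"
    "44201,0.03\n"
)
reader = csv.DictReader(src)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)

# A filter callable in the shape transfer() accepts:
# keep only PM2.5 rows (parameter code 88101).
flt = lambda row: row["Parameter Code"] == "88101"

writer.writeheader()               # corresponds to header=True
for row in reader:
    if flt is None or flt(row):
        writer.writerow(row)
```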
- add_monitor_key(row: Dict)[source]
Internal method to generate and add a unique Monitor Key
- Parameters
row – a row of an AQS CSV file
- Returns
Nothing, modifies the given row in place
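The exact key format is internal to the library; the following is only a hypothetical sketch, assuming the key is composed from the standard AQS monitor-identifying columns (State Code, County Code, Site Num, Parameter Code, POC):

```python
# Hypothetical sketch: the real key composition may differ.
def add_monitor_key(row):
    # AQS monitors are conventionally identified by
    # state-county-site-parameter-POC.
    key = "-".join(
        row[c] for c in
        ("State Code", "County Code", "Site Num", "Parameter Code", "POC")
    )
    row["Monitor Key"] = key   # modifies the row in place, returns nothing

row = {"State Code": "06", "County Code": "001", "Site Num": "0007",
       "Parameter Code": "88101", "POC": "1", "Arithmetic Mean": "10.5"}
add_monitor_key(row)
```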
- download_data(task: DownloadTask)[source]
A utility method to download the content of a given URL to a given file
- destination_path(destination: str, path: str) str [source]
A utility method to construct a destination file path
- collect_annual_downloads(destination: str, path: str, contiguous_year_segment: List, parameters: List) DownloadTask [source]
A utility method to collect all URLs that should be downloaded for a given list of years and EPA AQS parameters
- Parameters
destination – Destination directory for downloads
path – a path to use in the destination
contiguous_year_segment – a list of contiguous years that can be saved in the same file
parameters – List of EPA AQS Parameter codes
- Returns
downloads list
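The annual summary files on the airdata site are published one per year. A hedged sketch of the URL construction, assuming the annual_conc_by_monitor file family (one of several on the site; the tool may select others):

```python
BASE = "https://aqs.epa.gov/aqsweb/airdata"

def annual_urls(years):
    # One annual summary archive per year.
    return [f"{BASE}/annual_conc_by_monitor_{y}.zip" for y in years]

urls = annual_urls([2019, 2020])
```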
- collect_daily_downloads(destination: str, ylabel: str, contiguous_year_segment: List, parameter) DownloadTask [source]
A utility method to collect all URLs that should be downloaded for a given list of years and EPA AQS parameters
- Parameters
destination – Destination directory for downloads
ylabel – a label to use for years in the destination path
contiguous_year_segment – a list of contiguous years that can be saved in the same file
parameters – List of EPA AQS Parameter codes
downloads – The resulting collection of downloads that have to be performed
- Returns
downloads list
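Unlike the annual files, the daily files on the airdata site are split by parameter code as well as by year. A hedged sketch of the naming scheme (the daily_<code>_<year>.zip pattern is an assumption based on the site's published file names):

```python
BASE = "https://aqs.epa.gov/aqsweb/airdata"

def daily_urls(years, parameters):
    # One archive per (parameter, year) pair,
    # e.g. daily_88101_2020.zip for PM2.5 in 2020.
    return [f"{BASE}/daily_{p}_{y}.zip" for p in parameters for y in years]

urls = daily_urls([2020], [88101, 44201])
```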
- collect_aqs_download_tasks(context: AQSContext)[source]
Main entry into the library
- Parameters
aggregation – Type of time aggregation: annual or daily
years – a list of years to include; if None, all years are included
destination – Destination directory
parameters – List of EPA AQS Parameter codes. For annual aggregation this can be empty, in which case all data is downloaded. Required for daily aggregation. Can contain integer codes, mnemonic instances of the Parameter enum, or both.
merge_years –
- Returns
- as_stream(url: str, extension: str = '.csv', params=None, mode=None)[source]
Returns the content of the URL as a stream. If the content is in zip format (but not gzip), a temporary file is created
- Parameters
- Returns
Content of the URL or a zip entry
- as_csv_reader(url: str, mode=None) DictReader [source]
A utility method to return the CSV content of the URL as a csv.DictReader
- Parameters
url – URL
- Returns
an instance of csv.DictReader
- file_as_stream(filename: str, extension: str = '.csv', mode=None)[source]
Returns the content of the file as a stream. If the content is in zip format (but not gzip), a temporary file is created
- file_as_csv_reader(filename: str)[source]
A utility method to return the CSV content of the file as a csv.DictReader
- Parameters
filename – path to file
- Returns
an instance of csv.DictReader
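A minimal sketch of what such a helper might look like, assuming only that gzipped files are recognized by their .gz suffix (the library's actual format detection may be richer):

```python
import csv
import gzip
import io
import os
import tempfile

def file_as_csv_reader(filename):
    # Open a gzipped or plain CSV file and wrap it in a DictReader.
    if filename.endswith(".gz"):
        stream = io.TextIOWrapper(gzip.open(filename))
    else:
        stream = open(filename, newline="")
    return csv.DictReader(stream)

# Usage with a throwaway file:
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.csv")
    with open(path, "w") as f:
        f.write("a,b\n1,2\n")
    rows = list(file_as_csv_reader(path))
```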
- check_http_response(r: Response)[source]
An internal method that raises an exception if the HTTP response is not OK
- Parameters
r – Response
- Returns
nothing; raises an exception if the response is not OK
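The check can be sketched against any object exposing the ok and status_code attributes of a requests.Response; the HTTPError class and FakeResponse stand-in below are illustrative, not part of the library:

```python
class HTTPError(Exception):
    pass

def check_http_response(r):
    # r is expected to look like requests.Response (.ok, .status_code).
    if not r.ok:
        raise HTTPError(f"HTTP request failed with status {r.status_code}")

class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code
        self.ok = 200 <= status_code < 300

check_http_response(FakeResponse(200))   # OK: returns silently
try:
    check_http_response(FakeResponse(404))
    caught = False
except HTTPError:
    caught = True
```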
- download(url: str, to: IO)[source]
A utility method to download large binary data to a file-like object
- is_downloaded(url: str, target: str, check_size: int = 0) bool [source]
Checks if the same data has already been downloaded
- Parameters
url – URL with data
target – Destination of the download
check_size – Use the default value (0) if the target size should equal the source size. If several URLs are combined when downloaded, specify a positive integer to check that the destination file size is greater than the specified value. A negative value disables the size check.
- Returns
True if the destination file exists and is newer than URL content
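The size-checking rules can be sketched as follows. This simplified version takes a known source size instead of a URL (the real method would compare against the remote Content-Length and modification time):

```python
import os
import tempfile

def is_downloaded(source_size, target, check_size=0):
    # Simplified stand-in: no network access, no timestamp check.
    if not os.path.isfile(target):
        return False
    actual = os.path.getsize(target)
    if check_size == 0:
        return actual == source_size     # default: exact size match
    if check_size < 0:
        return True                      # size check disabled
    return actual > check_size           # combined downloads: lower bound

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "x.csv")
    with open(path, "w") as f:
        f.write("12345")                          # 5 bytes
    exact = is_downloaded(5, path)                # sizes match
    mismatch = is_downloaded(4, path)             # sizes differ
    combined = is_downloaded(0, path, check_size=3)  # 5 > 3
```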
- write_csv(reader: DictReader, writer: DictWriter, transformer=None, filter=None, write_header: bool = True)[source]
Rewrites the CSV content optionally transforming and filtering rows
- Parameters
reader – Input data as an instance of csv.DictReader
writer – Output destination as an instance of csv.DictWriter
transformer – An optional callable that transforms a row in place
filter – Optionally, a callable returning True for rows that should be written to the output and False for those that should be omitted
write_header – whether to write a header row first
- Returns
Nothing
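A minimal sketch consistent with the documented signature, shown with a hypothetical transformer and filter (the library's internal implementation may differ in details):

```python
import csv
import io

def write_csv(reader, writer, transformer=None, filter=None, write_header=True):
    if write_header:
        writer.writeheader()
    for row in reader:
        if transformer is not None:
            transformer(row)              # transforms the row in place
        if filter is None or filter(row):
            writer.writerow(row)

src = io.StringIO("state,value\nca,1\nny,2\n")
out = io.StringIO()
reader = csv.DictReader(src)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)

# Uppercase the state in place, then keep only California rows.
write_csv(reader, writer,
          transformer=lambda r: r.update(state=r["state"].upper()),
          filter=lambda r: r["state"] == "CA")
```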
- basename(path)[source]
Returns the name of a file or an archive entry without its extension
- Parameters
path – a path to a file or archive entry
- Returns
base name without full path or extension
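Such a helper can be sketched with os.path alone (note that this variant strips only the last extension, so "file.csv.gz" would become "file.csv"; the library's handling of double extensions is not specified here):

```python
import os

def basename(path):
    # Drop the directory part, then the (last) extension.
    return os.path.splitext(os.path.basename(path))[0]

name = basename("/data/aqs/annual_conc_by_monitor_2020.csv")
```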
- is_readme(name: str) bool [source]
Checks whether a file is a documentation file. This method is used to extract metadata from documentation provided as Markdown files
- Parameters
name – name of the file or archive entry
- Returns
True if the file appears to be a README (documentation) file
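A plausible sketch, assuming the check is a case-insensitive match on a "readme" prefix of the base file name (the library's actual rule may be broader):

```python
import os

def is_readme(name):
    # Treat any file whose base name starts with "readme"
    # (case-insensitive) as documentation.
    base = os.path.basename(name).lower()
    return base.startswith("readme")

hit = is_readme("archive/README.md")
miss = is_readme("data.csv")
```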
- get_entries(path: str) Tuple[List, Callable] [source]
Returns a list of entries in an archive or files in a directory
- Parameters
path – path to a directory or an archive
- Returns
Tuple with the list of entry names and a method to open these entries for reading
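The "entries plus opener" shape can be sketched for the zip and plain-directory cases (other archive formats the library may support are omitted):

```python
import io
import os
import tempfile
import zipfile

def get_entries(path):
    # Return entry names and a callable that opens an entry for reading.
    if zipfile.is_zipfile(path):
        zf = zipfile.ZipFile(path)
        return zf.namelist(), lambda name: io.TextIOWrapper(zf.open(name))
    entries = os.listdir(path)
    return entries, lambda name: open(os.path.join(path, name))

# Usage with a throwaway zip archive:
with tempfile.TemporaryDirectory() as d:
    archive = os.path.join(d, "data.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.csv", "x,y\n1,2\n")
    names, opener = get_entries(archive)
    content = opener(names[0]).read()
```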
- get_readme(path: str)[source]
Looks for a README file in the specified path :param _sphinx_paramlinks_nsaph_utils.utils.io_utils.get_readme.path: a path to a folder or an archive :return: a file that is possibly a README file