Modules

This section lists the modules used in this project.

query.py

Functions for Requesting Data from the Census API

get_census_data(year: int, variables: list, geography: str, dataset: str, sum_file: Optional[str] = None, key: Optional[str] = None, state: Optional[str] = None, county: Optional[str] = None)

Parameters

year – Year of data to query
variables – List of strings containing the census variable names to request
geography – Geographic resolution to query at (zcta, county, state)
dataset – The census data set you want (dec, acs1, acs5, pums)
sum_file – For the 2000 census, the summary file to use: sf1 or sf3
key – Your census API key. We recommend not passing it here and instead either setting the “CENSUS_API_KEY” environment variable or using the set_api_key function.
state – 2-digit FIPS code of the state you want to limit the query to (e.g. “06” for CA)
county – 3-digit FIPS code of the county you want to include. Requires state to be specified.

Returns

a pandas DataFrame

api_geography(geo: str)

Convert the function shorthand for a geography into the name the census API expects

Parameters: geo – shorthand for a given geography type
Returns: corrected geography name
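
The sketch below illustrates how the parameters documented above could map onto a Census API request URL. It is not the package's implementation: the GEO_NAMES table and the build_query_url helper are hypothetical names introduced for illustration, and the endpoint string is passed in rather than derived. Nothing is sent over the network.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Hypothetical shorthand table; the package's own mapping may differ.
GEO_NAMES = {
    "zcta": "zip code tabulation area",
    "county": "county",
    "state": "state",
}


def build_query_url(endpoint: str, variables: List[str], geography: str,
                    state: Optional[str] = None,
                    county: Optional[str] = None) -> str:
    """Assemble a Census API query URL without sending it."""
    if county is not None and state is None:
        raise ValueError("county requires state to be specified")

    geo = GEO_NAMES[geography]
    params = {"get": ",".join(variables)}
    if geography == "state":
        # A state-level query names the state directly in the `for` clause.
        params["for"] = f"{geo}:{state or '*'}"
    else:
        # Finer geographies go in `for` and are limited to a state with `in`.
        target = county if geography == "county" and county else "*"
        params["for"] = f"{geo}:{target}"
        if state is not None:
            params["in"] = f"state:{state}"
    # Keep ':', '*' and ',' readable instead of percent-encoding them.
    return endpoint + "?" + urlencode(params, safe=":*,")


url = build_query_url(
    "https://api.census.gov/data/2019/acs/acs5",
    ["NAME", "B01001_001E"],
    "county",
    state="06",
)
print(url)
```

Passing state and county as zero-padded strings rather than integers keeps the leading zeros that FIPS codes rely on.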

assemble_data.py

Core module for assembling a census plan

class DataPlan(yaml_path, geometry, years=[2000, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], state=None, county=None)

A class containing information on how to create a desired set of census data.

Members:

geometry: Which census geography this plan is for
years: The list of years that the data should be queried for
state: 2-digit FIPS code of the state you want to limit the query to (e.g. “06” for CA)
county: 3-digit FIPS code of the county you want to include. Requires state to be specified.
plan: A dict keyed by year, storing lists of VariableDef objects that define the variables to be calculated for that year. Created from a YAML file whose structure is defined in Census Variable File Structure.
data: A pandas data frame created from the defined data plan. Only exists after the DataPlan.assemble_data() method is called.

Initialize a DataPlan object from a census YAML document

Parameters

yaml_path – Path to a YAML file. Structure defined in Census Variable File Structure
geometry – Which census geography this plan is for
years – The list of years to query data from. The census_years() function can calculate which years in your timeframe of interest can be queried for the decennial and 5-year ACS data. This may not apply to ACS1 or other data sets. That function may be updated in the future; for now, lists of years other than the defaults have to be built by hand.
state – 2-digit FIPS code of the state you want to limit the query to (e.g. “06” for CA)
county – 3-digit FIPS code of the county you want to include. Requires state to be specified.

assemble_data()

Create a data frame with a row for each geoid and year, and a column for each variable defined in the data plan

Returns: Assembled data frame stored in self.data

get_var_names()

Return a list containing all the variable names that are created in the data plan

Returns: List of strings

add_geoid()

Add a single column named ‘geoid’ to self.data, combining all portions of a data set’s geographical identifiers

Returns: None
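
As a rough illustration of combining identifier portions into one geoid, the sketch below joins zero-padded FIPS parts from coarsest to finest. It works on plain dictionaries rather than self.data, and the GEO_PARTS ordering and the add_geoid_sketch name are assumptions made for this example, not the package's code.

```python
from typing import Dict, List

# Assumed ordering of identifier parts, from coarsest to finest.
GEO_PARTS = ["state", "county", "tract"]


def add_geoid_sketch(rows: List[Dict[str, str]]) -> None:
    """Add a 'geoid' key to each row by joining whichever parts are present."""
    for row in rows:
        row["geoid"] = "".join(row[part] for part in GEO_PARTS if part in row)


# Two county-level rows: state FIPS "06" plus a 3-digit county FIPS code.
rows = [
    {"state": "06", "county": "037"},
    {"state": "06", "county": "075"},
]
add_geoid_sketch(rows)
print([row["geoid"] for row in rows])
```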

adjust_geo_fields()

Add geo columns so that the data has a standardized set of them

Returns: None

create_missingness(min_year=None, max_year=None)

Create a row for all combinations of geospatial ID and year
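
One way to picture "a row for all combinations of geospatial ID and year" is a full grid in which unobserved combinations are present but empty. The sketch below is an assumption about the idea, not the method's actual code; fill_missing_rows and the dictionary layout are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]  # (geoid, year)


def fill_missing_rows(records: Dict[Key, float],
                      min_year: int, max_year: int) -> Dict[Key, Optional[float]]:
    """Return one entry per (geoid, year) pair, using None where no value exists."""
    geoids = sorted({geoid for geoid, _ in records})
    years = range(min_year, max_year + 1)
    return {(g, y): records.get((g, y)) for g, y in product(geoids, years)}


# Illustrative values only.
observed = {("06037", 2010): 1.0, ("06075", 2012): 2.0}
full = fill_missing_rows(observed, 2010, 2012)
print(len(full))
```

Making the gaps explicit in this way gives a later interpolation step a fixed set of rows to fill.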

write_data(path, file_type='csv')

Write data out to a file. The default method writes to csv. New methods can be implemented in the future.

Parameters

path – Path to write the data to
file_type – Method to output data; currently only implemented for csv files

Returns

None, writes data to disk.

calculate_densities(variables=['population'], sq_mi=True)

Divide specified variables by area

Parameters

variables – List of variables to calculate densities for
sq_mi – Should densities be calculated per square mile? If false, calculated per square meter

Returns

None
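
The arithmetic behind a density calculation can be sketched as follows, assuming areas are stored in square meters (which the sq_mi description suggests but does not state outright). The density helper is a hypothetical name, not part of the package.

```python
# One international mile is exactly 1609.344 meters.
SQ_M_PER_SQ_MI = 1609.344 ** 2


def density(value: float, area_sq_m: float, sq_mi: bool = True) -> float:
    """Divide a value by its area, per square mile or per square meter."""
    area = area_sq_m / SQ_M_PER_SQ_MI if sq_mi else area_sq_m
    return value / area


# 1000 people spread over exactly two square miles.
print(density(1000, 2 * SQ_M_PER_SQ_MI))
```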

interpolate(method='ma', min_year=None, max_year=None)

Fill in missing values by interpolation

Parameters

method – Interpolation method to use
min_year – Minimum year to interpolate
max_year – Maximum year to interpolate
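
The documentation does not say what the default method 'ma' stands for. Assuming it means a moving average, a minimal sketch of filling gaps in a yearly series might look like this; fill_moving_average and its window parameter are hypothetical and may not match the package's behavior.

```python
from typing import List, Optional


def fill_moving_average(values: List[Optional[float]],
                        window: int = 1) -> List[Optional[float]]:
    """Replace each None with the mean of observed values within `window` positions."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        # Only originally observed values are averaged, never filled ones.
        neighbours = [x for x in values[lo:hi] if x is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Illustrative yearly series with gaps.
series = [10.0, None, 14.0, None, None, 20.0]
print(fill_moving_average(series))
```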

quality_check(test_file: str)

Test self.data for the checks defined in the test file

Parameters: test_file – path to a yaml file defining tests per the quality check paradigm in nsaph_utils.qc
Returns: None

write_schema(filename: Optional[str] = None, table_name: Optional[str] = None)

Write out a yaml file describing the data schema

Parameters

filename – path to write to
table_name – Name of the table for the schema

Returns

True

class VariableDef(name: str, var_dict: dict, log: Optional[Logger] = None)

Structured way of representing what we need to know for a variable.

Members:

dataset: a string. The data set used to calculate a variable; should be dec, acs1, acs5, or pums
num: a list, the names of the variables that make up the numerator
den: a list, the names of the variables that make up the denominator. Can be missing
has_den: a boolean, indicates whether or not there is a denominator

get_vars()

Returns: a union of all census variables needed for this variable

do_query(year, geometry, state=None, county=None)

Run the query defined by the contained variables

Parameters

year – year of data to query
geometry – census geometry to query
state – 2-digit FIPS code of the state to limit the query to
county – 3-digit county code to limit the query to; must be used with state

Returns

data frame of all census variables specified by the query

calculate_var(year, geometry, state=None, county=None)

Query the required data from the census, then calculate the variable defined

Parameters

year – year of data to query
geometry – census geometry to query
state – 2-digit FIPS code of the state to limit the query to
county – 3-digit county code to limit the query to; must be used with state

Returns

a data frame with one column of the calculated variable and the census geography columns
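
To make the numerator and denominator members concrete, the sketch below assumes a variable is computed as the sum of its numerator variables divided by the sum of its denominator variables, or just the numerator sum when no denominator is given. That rule, the VariableSketch class, and the VAR_* names are assumptions for illustration; the real class also queries the census, which is omitted here.

```python
from typing import Dict, List, Optional


class VariableSketch:
    """Stand-in for a variable definition with num, den, and has_den members."""

    def __init__(self, dataset: str, num: List[str],
                 den: Optional[List[str]] = None) -> None:
        self.dataset = dataset
        self.num = num
        self.den = den or []
        self.has_den = bool(self.den)

    def get_vars(self) -> List[str]:
        """Union of all census variables needed, without duplicates."""
        return list(dict.fromkeys(self.num + self.den))

    def calculate(self, row: Dict[str, float]) -> Optional[float]:
        """Sum the numerator and divide by the summed denominator, if any."""
        total = sum(row[name] for name in self.num)
        if not self.has_den:
            return total
        denominator = sum(row[name] for name in self.den)
        return total / denominator if denominator else None


# Placeholder variable names and values, not real census codes or data.
share = VariableSketch("acs5", ["VAR_A", "VAR_B"], ["VAR_TOTAL"])
row = {"VAR_A": 30, "VAR_B": 20, "VAR_TOTAL": 200}
print(share.get_vars(), share.calculate(row))
```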

census_info.py

Core module for handling census metadata

get_endpoint(year: int, dataset: str, sum_file: Optional[str] = None)

Returns a string containing the URL to the census API endpoint

Parameters

year – The year for which you want data
dataset – The census data set you want (dec, acs1, acs5, pums)
sum_file – For the 2000 census, sf1 or sf3

Returns

a string containing the URL to the census API endpoint

get_varlist(year: int, dataset: str, sum_file: Optional[str] = None)

Parameters

year – Year of data
dataset – The census data set you want (dec, acs1, acs5, pums)
sum_file – For the 2000 census, sf1 or sf3

Returns

Dataframe of available variables in a given data set

set_api_key(key: str)

Sets an environment variable to contain your census API key. To avoid needing to run this every session you can also permanently set CENSUS_API_KEY to your key in your environment.

Parameters: key – Your Census API key as a string
Returns: nothing
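
The environment-variable approach can be sketched with the standard library alone. The CENSUS_API_KEY name comes from this documentation; set_key and resolve_key are hypothetical helpers, and the key value is a placeholder.

```python
import os
from typing import Optional


def set_key(key: str) -> None:
    """Store the key in the current process environment."""
    os.environ["CENSUS_API_KEY"] = key


def resolve_key(key: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly passed key, else fall back to the environment."""
    return key if key is not None else os.environ.get("CENSUS_API_KEY")


set_key("example-key-not-real")
print(resolve_key() is not None)
```

A value set this way lasts only for the current process, which is why setting CENSUS_API_KEY permanently in your shell or system environment avoids repeating the call.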

census_years(min_year: int = 2000, max_year: int = 2019)

Constructs a list of years for which census data is available in the range provided. Currently assumes that the decennial census and ACS5 are the data sets of interest. Future versions may allow this to vary.

Parameters

min_year – minimum year we want data for
max_year – maximum year we want data for (inclusive)

Returns

list of all years in specified range for which data is available
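
A sketch of the year selection, assuming decennial data exists for years ending in 0 and 5-year ACS data from 2009 onward. These assumptions reproduce the default years list of DataPlan shown earlier, but the available_years function itself is hypothetical and not the package's code.

```python
from typing import List

FIRST_ACS5_YEAR = 2009  # assumed first year with 5-year ACS estimates


def available_years(min_year: int = 2000, max_year: int = 2019) -> List[int]:
    """Years in [min_year, max_year] with decennial or ACS5 data."""
    return [
        year
        for year in range(min_year, max_year + 1)
        if year % 10 == 0 or year >= FIRST_ACS5_YEAR
    ]


print(available_years())
```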

tigerweb.py

Code for interacting with the Census TIGERweb API: querying areas and downloading shape files.

get_area(geometry, sq_mi=True)

Create a data frame of Census GEOIDs and areas. Because the TIGERweb API limits each query to 100,000 features, block groups aren’t currently supported through this wrapper.

Parameters

geometry – type of census geometry to use
sq_mi – Should areas be converted to square miles?

Returns

pandas data frame

download_geometry(geometry, year=2019, out_dir='.')

Get spatial information for a census geometry in GeoJSON format and save it to disk

Parameters

geometry – type of census geometry to use
year – Year to get geometry for
out_dir – Directory to save downloaded files in. Because they require multiple downloads, tract and block group downloads will create a directory if no out_dir is defined.

Returns

None, downloads files only

cli.py

Command Line Interface for the census python package

class CensusContext(doc=None)

Context object supporting the CLI functionality of this package

Creates a new object

Parameters

subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with one ‘_’ character and whose values are instances of the Argument class.
description – Optional text to use as description. If not specified, it is extracted from the subclass documentation.

validate(attr, value)

Subclasses can override this method to implement custom handling of command line arguments

Parameters

attr – Command line argument name
value – Value returned by argparse

Returns

value to use

exceptions.py

census exceptions

exception CensusException

load_county_codes()

Read in data file listing all counties in the US, with their state and county FIPS codes

Returns: pandas data frame

load_state_codes()

Read in data file listing all states in the US, with their FIPS codes

Returns: pandas data frame

utils.py

Census utility functions

show_api_keys()

Prints out API keys.