Census Variable File Structure

The variables used to create the census data using the get_census package must be formally described in a yaml file. Each variable is defined by a top level entry in this file. The format for a variable is as follows:

<var_name>:
    <year>:
        <source>:
            num:
                 - <numerator_variable_1>
                 - <numerator_variable_2>
            [den]:
                 - <denominator_variable_1>
                 - <denominator_variable_2>

These descriptions provide instructions to the code on which variables to ask for from which api and how to calculate the desired variable from the information available in the API.

Field Definitions

  • var_name: The user chosen name for a variable

  • year: Which year the specification should be used for. The code uses the specification for the closest year to the input year that is current with or after the input year.

  • source: This should be either “census” or “acs”. If it is census, the code knows to make the call for this variable from the decennial census, and if acs the code uses the 5 year ACS data. As a reminder, decennial census data is available in 2000 and 2010, and ACS data is available 2009-2018. Some variables present in the 2000 census are not available in the 2010 decennial census.

  • num: This field should always be named “num”. All variables listed in this field are summed and divided by the sum of the den variables. If no den field is present, the sum of these fields will be the value reported for the variable

  • den: An optional field. This must always be named den. The variables listed here are summed, and then the sum of the numerator variables is divided by this to produce the final reported value.

Examples

  • % of the population that is Black,

    blk_pct:
       2000:
           census:
               num: P007003
               den: P007001
       2009:
           acs:
               num: B02001_003
               den: B02001_001
       2010:
           census:
               num: P006003
               den: P006001
       2018:
           acs:
               num: B02001_003
               den: B02001_001
    

Here a variable named blk_pct will be created. In 2000, the census data will be used, and the variable will be calculated as P007003/P007001. In 2009, ACS data will be used, and the variable will be calculated as B02001_003/B02001_001. In 2010, back the the census data and the variable will be calculated as P007003/P007001. Then finally, from 2011-2018, ACS data will be used and the variable will be calculated as B02001_003/B02001_001.

  • Median Household Income:

    median_household_income:
        2000:
            census:
                num: P053001
        2018:
            acs:
                num: B19013_001
    

Here a variable named median_household_income will be created. Note the lack of a denominator. In 2000, the census variable P053001 is reported. This value was missing from the 2010 census, so from 2009-2018 the acs variable B19013_001 will be used instead.

Skipping years for a Variable

Sometimes, you may want to skip getting data for a variable for a given year. The most common reason to do this is when working with ZCTAs. ZCTA level data is not available from the 5 year ACS (acs5) for 2009 and 2010, despite the acs5 being available in those years. To further complicate things, ZCTAs are available for the decennial census in 2010. If you do not provide instructions on how to handles those years and you try to query the ACS for ZCTAs, you query will hit an error and fail.

To handle this, instead of specifying a data set in your variable for a given year, you can instead write “skip” for that year.

For example, a file trying to get ZCTA level poverty rates and population might look something like this:

population:
    2000:
        census:
            num: P001001
    2009: skip
    2010:
        census:
            num: P001001
    2018:
        acs:
            num: B01001_001
poverty:
    2000:
        census:
            num:
                - P087002
            den:
                - P087001
    2010: skip
    2018:
        acs:
            num:
                - B17001_002
            den:
                - B17001_001

For population, only 2009 will be skipped, but for poverty, 2009 and 2010 will be skipped.

Census Variable Documentation

Assembling these files will require reading through the census variable documentation, which can be quite tedious, especially when trying to figure out how to assemble more complex estimates. Links to this documentation can be found here: