Quick Start

Set up (or activate) your environment as described in the Setting Up Environment section.

Census data comes from several sources, and the format of each source may differ from year to year. The currently supported data sources are:

acs American Community Survey

dec Data collected by the decennial census

pums Public Use Microdata Sample (not yet implemented)

Users need to become familiar with these data sources and with their variable and table names for each year. The census package lets you list the variables available for a given year and data source.

For example, the following code lists all available variables for the acs1 dataset in 2011.

from census.census_info import get_varlist

varlist = get_varlist(2011, "acs1")
print(f"Number of downloaded variables: {len(varlist)}")
print("List of first 10 variables: ")
varlist[1:10]

The output looks like this:

Number of downloaded variables: 34450
List of first 10 variables:
['B19001B_014E',
 'C02014_002E',
 'B23023_070E',
 'B07007PR_019E',
 'B19101A_004E',
 'B24022_061E',
 'B19001B_013E',
 'C02014_003E',
 'B07007PR_018E']
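
With tens of thousands of names, it often helps to narrow the list to a single table. The sketch below assumes that get_varlist returns a plain list of strings, as the output above suggests. The sample list is copied from that output, and the helper name vars_in_table is hypothetical, not part of the census package.

```python
# Sample values copied from the output above; in practice this would be
# the list returned by get_varlist(2011, "acs1").
sample_varlist = [
    "B19001B_014E",
    "C02014_002E",
    "B23023_070E",
    "B07007PR_019E",
    "B19101A_004E",
    "B24022_061E",
    "B19001B_013E",
    "C02014_003E",
    "B07007PR_018E",
]


def vars_in_table(varlist, table_id):
    """Return the sorted variable names that belong to one table.

    Hypothetical helper: assumes names look like '<table>_<suffix>'.
    """
    prefix = table_id + "_"
    return sorted(name for name in varlist if name.startswith(prefix))


print(vars_in_table(sample_varlist, "B19001B"))
print(vars_in_table(sample_varlist, "C02014"))
```

Matching on the table ID plus the underscore keeps table B19001 from also picking up variables of table B19001B.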

There are 34450 different variables for the acs1 dataset in 2011. To learn more about a variable, you can search the web for “census acs1 2011 variable_name”, or use the following link pattern to see its metadata:

https://api.census.gov/data/2011/acs/acs1/variables/B23023_070E.json

As the metadata at that link shows, this variable is

the estimate of total females with a disability who worked in the past 12 months, usually worked 1 to 14 hours per week, for 48 and 49 weeks.
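
The link pattern above can also be built in code. This is a minimal sketch that only assembles the URL string. The function name variable_url and its dataset_path parameter are hypothetical, and the default path "acs/acs1" is taken from the example link; other datasets may use a different path.

```python
def variable_url(year, variable, dataset_path="acs/acs1"):
    """Build the metadata URL for one variable, following the link pattern above.

    dataset_path is an assumption based on the single example link;
    check the Census API for the path used by other datasets.
    """
    return (
        f"https://api.census.gov/data/{year}/{dataset_path}"
        f"/variables/{variable}.json"
    )


url = variable_url(2011, "B23023_070E")
print(url)
```

The endpoint returns JSON, so the resulting URL can be opened in a browser or fetched with any HTTP client.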

Note

Please note that selecting the right variables for your research is beyond the scope of this package and its documentation. The package works best when you already know what you need to download.

After deciding which variables to download, create a census_vars.yml file (see Census Variable File Structure for details). Here is an example:

hispanic_count:
    2000:
        census:
            num: P004002

Let’s say we want to download and review hispanic_count for each state in 2000. The following code retrieves the results.

import yaml
import census

with open("census_vars.yml") as f:
    yaml_dict = yaml.safe_load(f)

my_var = census.VariableDev("hispanic_count", yaml_dict["hispanic_count"][2000])
data = my_var.do_query(2000, "state")
print(data)
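
To make the indexing in yaml_dict["hispanic_count"][2000] concrete, the sketch below writes out by hand the nested dictionary that a YAML loader would produce for the example file (YAML reads the unquoted year as an integer key). The function name list_requests and the label "section" for the census level are hypothetical and only illustrate how such a structure can be walked; the actual meaning of each level is described in Census Variable File Structure.

```python
# Hand-written equivalent of the parsed example file above.
yaml_dict = {
    "hispanic_count": {
        2000: {
            "census": {"num": "P004002"},
        },
    },
}


def list_requests(config):
    """Flatten the nested config into (name, year, section, role, code) tuples."""
    rows = []
    for name, years in config.items():
        for year, sections in years.items():
            for section, codes in sections.items():
                for role, code in codes.items():
                    rows.append((name, year, section, role, code))
    return rows


for row in list_requests(yaml_dict):
    print(row)
```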

Note

Please note that the recommended way to download data is the DataPlan class. The example above uses internal classes and is shown only to help you become familiar with the package.

Now, let’s download the same data with the DataPlan class. DataPlan has numerous methods for analyzing and filtering the data.

import census

plan = census.DataPlan("census_myvar_test.yml", geometry="state", years=census.census_years(2000,2000))
plan.assemble_data()
plan.data.head()

If you look at the results, you will see that the column names follow the names in your request rather than the original table names in the Census API.

state  year  hispanic_count
   2000  75830
   2000  25852
   2000  1295617
   2000  86866
   2000  10966556

Now, if we want to compute the Hispanic share of the population (as a fraction), we can change the YAML file as follows, adding a denominator (den) alongside the numerator (num):

hispanic_pct:
    2000:
        census:
            num: P004002
            den: P001001

And rerun the plan:

import census

plan = census.DataPlan("census_myvar_test.yml", geometry="state", years=census.census_years(2000,2000))
plan.assemble_data()
plan.data.head()

This time we get the following results:

state  year  hispanic_pct
   2000  0.017052
   2000  0.041236
   2000  0.252526
   2000  0.032493
   2000  0.323768
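
The values above are fractions between 0 and 1, which suggests that the package divides the num variable by the den variable. The sketch below shows that arithmetic in plain Python. The function names share and as_percent and the counts 250 and 1000 are hypothetical; they are not taken from the Census API or the census package.

```python
def share(num, den):
    """Return num / den as a fraction, or None when the denominator is zero."""
    if den == 0:
        return None
    return num / den


def as_percent(fraction, digits=1):
    """Convert a fraction such as 0.25 to a rounded percentage such as 25.0."""
    return round(fraction * 100, digits)


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
example = share(250, 1000)
print(example)              # 0.25
print(as_percent(example))  # 25.0

# The first value in the table above, expressed as a percentage.
print(as_percent(0.017052))  # 1.7
```

Multiplying by 100 is only needed for display; keeping the stored value as a fraction avoids ambiguity when the column is used in later calculations.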