The tester Module

Generic object for testing data for quality issues.

The Tester class holds a list of tests to run on data. Each test consists of a variable name, a condition, and a severity.

class Condition(value)

An enumeration.

less_than = 'lt'
greater_than = 'gt'
data_type = 'dtype'
no_missing = 'no_nan'
count_missing = 'count_nan'
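
The string values above suggest that a condition can be identified by a short code such as 'lt'. The following is a minimal sketch, assuming Condition is a standard Python Enum (the "An enumeration." docstring points that way). It redeclares a stand-in class rather than importing the real module, and shows the lookup-by-value behaviour every Enum provides.

```python
from enum import Enum


class Condition(Enum):
    """Stand-in that mirrors the members documented above."""
    less_than = 'lt'
    greater_than = 'gt'
    data_type = 'dtype'
    no_missing = 'no_nan'
    count_missing = 'count_nan'


# Look a member up by its string value, e.g. one read from a config file.
cond = Condition('lt')
print(cond.name)   # less_than
print(cond.value)  # lt

# An unknown code raises ValueError instead of being silently accepted.
try:
    Condition('between')
except ValueError as exc:
    print(f"rejected: {exc}")
```

Whether the real module resolves conditions this way is not stated in this reference; the sketch only illustrates what the enumeration makes possible.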

class Severity(value)

An enumeration.

debug = 10
info = 20
warning = 30
error = 40
critical = 50
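
The numeric values above coincide with the level constants of Python's standard logging module (DEBUG = 10 through CRITICAL = 50). The sketch below assumes, without confirmation from this reference, that a severity can therefore be passed to a logger as its level. The Severity class is a stand-in and log_result is a hypothetical helper, not part of the module.

```python
import logging
from enum import Enum


class Severity(Enum):
    """Stand-in that mirrors the members documented above."""
    debug = 10
    info = 20
    warning = 30
    error = 40
    critical = 50


def log_result(logger, severity, message):
    """Hypothetical helper: emit `message` at the level matching `severity`."""
    logger.log(severity.value, message)


# Each member lines up with the standard logging level of the same name.
for sev in Severity:
    print(sev.name, logging.getLevelName(sev.value))

logging.basicConfig(level=logging.DEBUG)
log_result(logging.getLogger("tester_sketch"), Severity.warning,
           "column 'age' failed a less_than test")
```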

exception ExpectationError

Raised when the expected value supplied for a condition cannot be valid.

class Test(variable, condition, severity, val=None, name=None, logger=None)
val

Value to compare against. Can be omitted for a no_missing check.

check(df: DataFrame)

Check whether the variable in the input DataFrame meets the test's condition.

Parameters

df – pandas DataFrame

Returns

Boolean indicating whether the data passed the test
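
To make the idea of a per-column check concrete, here is a minimal standalone sketch of how each documented condition could be evaluated against a pandas DataFrame. It is not the module's implementation. The function check_column is hypothetical, and the semantics are assumptions: less_than and greater_than are taken to require every value to satisfy the comparison, data_type compares the column's dtype name, and count_missing is taken to tolerate at most val missing values.

```python
import pandas as pd


def check_column(df, variable, condition, val=None):
    """Hypothetical check: True if column `variable` satisfies `condition`.

    `condition` is one of the string values listed for Condition.
    """
    col = df[variable]
    if condition == 'lt':
        # Assumed: every value must be strictly below `val`.
        return bool((col < val).all())
    if condition == 'gt':
        # Assumed: every value must be strictly above `val`.
        return bool((col > val).all())
    if condition == 'dtype':
        return str(col.dtype) == str(val)
    if condition == 'no_nan':
        return not bool(col.isna().any())
    if condition == 'count_nan':
        # Assumed: at most `val` missing values are tolerated.
        return int(col.isna().sum()) <= val
    raise ValueError(f"unknown condition: {condition!r}")


df = pd.DataFrame({"age": [34, 51, 27], "score": [0.5, None, 0.9]})
print(check_column(df, "age", "lt", 120))          # True
print(check_column(df, "score", "no_nan"))         # False
print(check_column(df, "score", "count_nan", 1))   # True
```

The no_nan branch ignores val, which matches the note above that val can be omitted for a no_missing check.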

class Tester(name, yaml_file=None)
add(t: Test)
load_yaml(yaml_file)
check(df: DataFrame)