The tester Module
Generic objects for testing data for quality issues.
The Tester class holds a list of tests to run on the data. Each test consists of a variable name, a condition, and a severity.
-
class Condition(value)
An enumeration of the conditions a test can check.
-
less_than = 'lt'
-
greater_than = 'gt'
-
data_type = 'dtype'
-
no_missing = 'no_nan'
-
count_missing = 'count_nan'
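The listing above gives only the member names and values. As a minimal sketch, the enumeration is rebuilt here as a stand-alone class (a stand-in, not imported from the module) to show how a member can be looked up by its string value or by its name using the standard `enum` behaviour.

```python
from enum import Enum


class Condition(Enum):
    # Member names and values copied from the listing above.
    less_than = "lt"
    greater_than = "gt"
    data_type = "dtype"
    no_missing = "no_nan"
    count_missing = "count_nan"


# Look a member up by its string value, e.g. a string read from a config file.
by_value = Condition("lt")
# Look a member up by its name.
by_name = Condition["less_than"]

print(by_value is by_name)  # True
print(by_value.value)       # lt
```

Calling the class with an unknown value, such as `Condition("eq")`, raises `ValueError`, which is the standard `Enum` behaviour.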
-
class Severity(value)
An enumeration of test severity levels. The values equal the numeric levels of Python's logging module.
-
debug = 10
-
info = 20
-
warning = 30
-
error = 40
-
critical = 50
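The severity values coincide with the numeric levels of the standard `logging` module. The sketch below rebuilds the enumeration as a stand-in and checks that correspondence. Passing a severity value to `Logger.log` is an assumption about how it might be used, not documented behaviour of this module, and the logger name and message are made up for illustration.

```python
import logging
from enum import Enum


class Severity(Enum):
    # Member names and values copied from the listing above.
    debug = 10
    info = 20
    warning = 30
    error = 40
    critical = 50


# Each value equals the standard-library logging level of the same name.
matches = all(sev.value == getattr(logging, sev.name.upper()) for sev in Severity)
print(matches)  # True

# Hypothetical use: report a failed test at its severity.
logger = logging.getLogger("tester_sketch")
logger.log(Severity.warning.value, "variable %s failed %s", "age", "no_nan")
```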
-
exception ExpectationError
Raised when the expected value supplied for a condition is not valid for that condition.
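The documentation does not say which expected values are rejected. As an illustration only, the sketch below assumes that the threshold conditions `lt` and `gt` need a numeric value. The `validate_expected` helper is hypothetical, and the exception class is a local stand-in for the module's own.

```python
class ExpectationError(Exception):
    """Local stand-in for the module's exception of the same name."""


def validate_expected(condition: str, val):
    """Hypothetical helper: reject expected values that cannot work for a condition."""
    if condition in ("lt", "gt"):
        # Assumption: a threshold comparison needs a number (bool is excluded on purpose).
        if isinstance(val, bool) or not isinstance(val, (int, float)):
            raise ExpectationError(f"{condition!r} needs a numeric val, got {val!r}")
    return val


try:
    validate_expected("lt", None)
except ExpectationError as exc:
    print(f"rejected: {exc}")
```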
-
class Test(variable, condition, severity, val=None, name=None, logger=None)
-
val
Value to compare against; can be omitted for a no_missing check.
-
check(df: DataFrame)
Check whether the variable in the input DataFrame meets the condition.
- Parameters
df – pandas DataFrame
- Returns
bool indicating whether the data passed the test
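How each condition is evaluated is not spelled out above. The following is a minimal sketch of one plausible reading, written as a stand-alone function rather than the module's own code. `check_column` is a hypothetical name, and the semantics of each branch are assumptions, in particular that `count_nan` passes when the number of missing values does not exceed `val`.

```python
import pandas as pd


def check_column(df: pd.DataFrame, variable: str, condition: str, val=None) -> bool:
    """Return True if column `variable` satisfies `condition` (assumed semantics)."""
    col = df[variable]
    if condition == "lt":
        # Every value must be below val; missing values compare as False.
        return bool((col < val).all())
    if condition == "gt":
        return bool((col > val).all())
    if condition == "dtype":
        return bool(col.dtype == val)
    if condition == "no_nan":
        return not col.isna().any()
    if condition == "count_nan":
        # Assumption: pass when the number of missing values does not exceed val.
        return int(col.isna().sum()) <= val
    raise ValueError(f"unknown condition: {condition!r}")


df = pd.DataFrame({"age": [21, 35, 48], "income": [40000.0, None, 52000.0]})

print(check_column(df, "age", "lt", 120))          # True
print(check_column(df, "income", "no_nan"))        # False
print(check_column(df, "income", "count_nan", 1))  # True
```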
-
class Tester(name, yaml_file=None)
-
add(t: Test)
-
load_yaml(yaml_file)
-
check(df: DataFrame)
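The Tester methods are listed without descriptions, so the following only sketches the general pattern of collecting tests with `add` and running them all with `check`. `SketchTest` and `SketchTester` are hypothetical stand-ins, the predicate-based test is a simplification of the documented `Test` signature, and the dictionary returned by `check` is an assumption, since the real return value is not documented.

```python
import pandas as pd


class SketchTest:
    """Stand-in for Test: a named predicate on one column (hypothetical)."""

    def __init__(self, variable, predicate, name=None):
        self.variable = variable
        self.predicate = predicate
        self.name = name or variable

    def check(self, df: pd.DataFrame) -> bool:
        return bool(self.predicate(df[self.variable]))


class SketchTester:
    """Stand-in for Tester: holds tests and runs them all against one frame."""

    def __init__(self, name):
        self.name = name
        self.tests = []

    def add(self, t: SketchTest) -> None:
        self.tests.append(t)

    def check(self, df: pd.DataFrame) -> dict:
        # Assumption: report one pass/fail result per test, keyed by test name.
        return {t.name: t.check(df) for t in self.tests}


tester = SketchTester("survey")
tester.add(SketchTest("age", lambda col: (col > 0).all(), name="age_positive"))
tester.add(SketchTest("income", lambda col: not col.isna().any(), name="income_complete"))

df = pd.DataFrame({"age": [21, 35, 48], "income": [40000.0, None, 52000.0]})
print(tester.check(df))  # {'age_positive': True, 'income_complete': False}
```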