The data_loader Module
Implements parallel loading of data into a PostgreSQL database. It is also responsible for loading DDL and for creating views, both virtual and materialized.
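This section does not describe how DataLoader parallelizes its work. The sketch below shows the general technique: one reader groups rows into pages, and several writer threads consume those pages. The helpers `parallel_load`, `batched` and `write_batch` are hypothetical. `write_batch` stands in for a real PostgreSQL write. The names `threads` and `page` echo the LoaderConfig options only for readability. This is not the module's actual implementation.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List, Optional, Sequence

Row = Sequence[object]


def batched(rows: Iterable[Row], page: int) -> Iterator[List[Row]]:
    """Group rows into lists of at most `page` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == page:
            yield batch
            batch = []
    if batch:
        yield batch


def parallel_load(rows: Iterable[Row],
                  write_batch: Callable[[List[Row]], None],
                  threads: int = 4,
                  page: int = 1000) -> int:
    """One producer reads rows; `threads` workers write batches concurrently.

    `write_batch` is a stand-in for a real database write (e.g. an INSERT or
    COPY over a worker-owned connection) and must be thread-safe.
    Returns the number of rows written.
    """
    # Bounded queue: the producer blocks when the writers fall behind.
    work: "queue.Queue[Optional[List[Row]]]" = queue.Queue(maxsize=threads * 2)
    lock = threading.Lock()
    written = 0
    errors: List[Exception] = []

    def worker() -> None:
        nonlocal written
        while True:
            batch = work.get()
            if batch is None:  # sentinel: no more work for this worker
                return
            if errors:
                continue  # after a failure, keep draining so put() never hangs
            try:
                write_batch(batch)
            except Exception as exc:
                with lock:
                    errors.append(exc)
                continue
            with lock:
                written += len(batch)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    try:
        for batch in batched(rows, page):
            work.put(batch)
    finally:
        # Always release the workers, even if reading the input failed.
        for _ in pool:
            work.put(None)  # one sentinel per worker
        for thread in pool:
            thread.join()
    if errors:
        raise errors[0]
    return written
```

In a real loader, each worker would typically hold its own database connection so that its transactions stay independent. The bounded queue keeps memory use roughly proportional to `threads * page` rows.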
Usage
API
Domain Data Loader
Provides a command-line interface for loading data from a single column-formatted file, or a set of such files, into the NSAPH PostgreSQL database.
Input (aka source) files can be either in FST or in CSV format.
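The helper below is a minimal sketch showing how CSV and gzipped CSV sources can be told apart by file name and read record by record with the standard library. `source_format` and `read_csv_records` are hypothetical helpers, not part of the module. FST is a binary data-frame format from the R `fst` package. Reading it requires an additional library, so the sketch only recognizes it.

```python
import csv
import gzip
from typing import Dict, Iterator


def source_format(path: str) -> str:
    """Guess the format of a source file from its name."""
    name = path.lower()
    if name.endswith(".csv.gz"):
        return "csv.gz"
    if name.endswith(".csv"):
        return "csv"
    if name.endswith(".fst"):
        return "fst"
    raise ValueError(f"Unsupported source file: {path}")


def read_csv_records(path: str) -> Iterator[Dict[str, str]]:
    """Yield records from a plain or gzipped CSV file as dicts keyed by header."""
    fmt = source_format(path)
    if fmt == "fst":
        # FST needs a dedicated reader; it is outside the scope of this sketch.
        raise NotImplementedError("FST input is not handled in this sketch")
    opener = gzip.open if fmt == "csv.gz" else open
    with opener(path, "rt", newline="") as stream:
        yield from csv.DictReader(stream)
```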
- class DataLoader(context: Optional[LoaderConfig] = None)
 Class for the data loader
Configuration
Common options for data manipulation
- class DBConnectionConfig(subclass, doc)
 Configuration class for connecting to a database
Creates a new object
- Parameters:
 subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of class Argument.
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation.
- autocommit
 Use autocommit
- db
 Path to a database connection parameters file
- connection
 Section in the database connection parameters file
- verbose
 Generate verbose output
- dryrun
 Dry run: make no database modifications
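The db and connection options imply a parameters file divided into named sections. The sketch below assumes an INI-style file, such as the commented example at the top of the block, and reads one section with the standard configparser module. The file layout, the section name and keys, and the `read_connection_params` helper are assumptions for illustration only.

```python
import configparser
from typing import Dict

# Assumed (illustrative) layout of the connection parameters file:
#
#   [local]
#   host = localhost
#   port = 5432
#   database = demo
#   user = loader
#   password = secret


def read_connection_params(db_file: str, connection: str) -> Dict[str, str]:
    """Return the parameters stored in section `connection` of `db_file`."""
    # interpolation=None keeps values such as passwords containing '%' verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(db_file):
        raise FileNotFoundError(db_file)
    if not parser.has_section(connection):
        raise KeyError(f"section [{connection}] not found in {db_file}")
    return dict(parser.items(connection))
```

The returned dict could then be passed to a PostgreSQL driver, for example `psycopg2.connect(**params)`.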
- class DBTableConfig(subclass, doc)
 Creates a new object
- Parameters:
 subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of class Argument.
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation.
- table
 Name of the table to manipulate
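Each configuration class above is built from a subclass whose options are class members. These members have names starting with a single ‘_’ and values that are Argument instances. The sketch below shows one way such a convention can be turned into a command-line parser with the standard argparse module. `Argument`, `ConfigBase` and `TableOptions` are hypothetical stand-ins written for illustration. They are not the library's own classes, and their behavior may differ.

```python
import argparse
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Type


@dataclass
class Argument:
    """Hypothetical option descriptor (not the library's Argument class)."""
    name: str
    help: str = ""
    kind: Callable[[str], Any] = str  # bool means "on/off flag"
    default: Any = None
    required: bool = False


class ConfigBase:
    """Collects the '_'-prefixed Argument members of `subclass` into a parser."""

    def __init__(self, subclass: Type, description: Optional[str] = None):
        # Fall back to the subclass docstring when no description is given.
        self.description = description or (subclass.__doc__ or "").strip()
        found: Dict[str, Argument] = {}
        # Walk base classes first so that a subclass can override an option.
        for cls in reversed(subclass.__mro__):
            for attr, value in vars(cls).items():
                if (attr.startswith("_") and not attr.startswith("__")
                        and isinstance(value, Argument)):
                    found[attr] = value
        self.arguments: List[Argument] = list(found.values())

    def parse(self, argv: List[str]) -> Dict[str, Any]:
        """Parse command-line arguments into a {name: value} dict."""
        parser = argparse.ArgumentParser(description=self.description)
        for arg in self.arguments:
            if arg.kind is bool:
                parser.add_argument(f"--{arg.name}", action="store_true",
                                    help=arg.help)
            else:
                parser.add_argument(f"--{arg.name}", type=arg.kind,
                                    default=arg.default,
                                    required=arg.required, help=arg.help)
        return vars(parser.parse_args(argv))


class TableOptions:
    """Options for a single database table"""
    _table = Argument("table", help="Name of the table to manipulate",
                      required=True)
    _verbose = Argument("verbose", help="Generate verbose output", kind=bool)
```

For example, `ConfigBase(TableOptions).parse(["--table", "my_schema.my_table"])` returns `{"table": "my_schema.my_table", "verbose": False}`.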
- class CommonConfig(subclass, doc)
 Abstract base class for configurators used for data loading
Creates a new object
- Parameters:
 subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of class Argument.
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation.
- domain
 Name of the domain
- registry
 Path to the domain registry. The registry is a directory or an archive containing YAML files with domain definitions. By default, the built-in registry is used.
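A registry can be either a directory or an archive. The sketch below collects the raw text of every YAML file from a directory tree or from a zip archive. `read_registry` is a hypothetical helper. It handles only zip archives, because this section does not list the supported archive types. It also stops short of parsing YAML, which would require a third-party library such as PyYAML.

```python
import zipfile
from pathlib import Path
from typing import Dict

YAML_SUFFIXES = (".yaml", ".yml")


def read_registry(registry: str) -> Dict[str, str]:
    """Return {relative file name: YAML text} for a directory or a zip archive."""
    path = Path(registry)
    if path.is_dir():
        # Search the directory recursively for YAML files.
        return {
            p.relative_to(path).as_posix(): p.read_text(encoding="utf-8")
            for p in sorted(path.rglob("*"))
            if p.is_file() and p.suffix.lower() in YAML_SUFFIXES
        }
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return {
                name: archive.read(name).decode("utf-8")
                for name in sorted(archive.namelist())
                if name.lower().endswith(YAML_SUFFIXES)
            }
    raise ValueError(f"Registry must be a directory or a zip archive: {registry}")
```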
Domain Loader Configurator
Intended to configure loading of a single column-formatted file, or a set of such files, into the NSAPH PostgreSQL database. Input (aka source) files can be either in FST or in CSV format.
The configurator assumes that the database schema is defined in a YAML or JSON file. A separate tool is available to introspect source files and infer a possible database schema.
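The sketch below shows how a schema file can drive DDL generation, including the virtual and materialized views mentioned at the top of this page. It assumes a simplified, hypothetical JSON layout with the keys `schema`, `tables`, `columns`, `primary_key`, `views` and `materialized`. The real registry format is not described in this section. The functions are illustrative only. Identifiers are not quoted, whereas production code should quote them, for example with `psycopg2.sql.Identifier`.

```python
import json
from typing import Any, Dict, List

# Hypothetical domain definition used only for this illustration.
DEMO_DOMAIN: Dict[str, Any] = json.loads("""
{
  "schema": "demo",
  "tables": {
    "visits": {
      "columns": [
        {"name": "patient_id", "type": "VARCHAR(16)"},
        {"name": "visit_date", "type": "DATE"}
      ],
      "primary_key": ["patient_id", "visit_date"]
    }
  },
  "views": {
    "visit_counts": {
      "query": "SELECT patient_id, COUNT(*) AS n FROM demo.visits GROUP BY patient_id",
      "materialized": true
    }
  }
}
""")


def create_table_ddl(domain: Dict[str, Any], table: str) -> str:
    """Render a CREATE TABLE statement for one table of a domain definition."""
    spec = domain["tables"][table]
    parts: List[str] = [f'{col["name"]} {col["type"]}' for col in spec["columns"]]
    if spec.get("primary_key"):
        parts.append("PRIMARY KEY (" + ", ".join(spec["primary_key"]) + ")")
    body = ",\n    ".join(parts)
    return f'CREATE TABLE IF NOT EXISTS {domain["schema"]}.{table} (\n    {body}\n);'


def create_view_ddl(domain: Dict[str, Any], view: str) -> str:
    """Render CREATE VIEW or CREATE MATERIALIZED VIEW, depending on the spec."""
    spec = domain["views"][view]
    kind = "MATERIALIZED VIEW" if spec.get("materialized") else "VIEW"
    return f'CREATE {kind} {domain["schema"]}.{view} AS {spec["query"]};'


print(create_table_ddl(DEMO_DOMAIN, "visits"))
print(create_view_ddl(DEMO_DOMAIN, "visit_counts"))
```

A virtual view is evaluated on every query. A materialized view stores its result and must be refreshed explicitly in PostgreSQL with REFRESH MATERIALIZED VIEW.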
- class LoaderConfig(doc)
 Configurator class for data loader
Creates a new object
- Parameters:
 subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of class Argument.
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation.
- action: Optional[DataLoaderAction]
 If this option is given, then the whole domain schema will be dropped
- data
 Path to a data file or directory. Can be a single CSV, gzipped CSV or FST file, or a directory recursively containing CSV files. Can also be a tar, tar.gz (or tgz) or zip archive containing CSV files.
- reset
 Force recreation of the table(s) if they already exist
- page
 Explicit page size for the database
- log
 Explicit interval for logging
- limit
 Load at most specified number of records
- buffer
 Buffer size for converting FST files
- threads
 Number of threads writing into the database
- parallelization
 Type of parallelization, if any
- pattern
 Pattern for files in a directory or an archive, e.g., “**/maxdata_*_ps_*.csv”
- incremental
 Commit every file and skip over files that have already been ingested
- sloppy
 Do not update existing tables and views
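The data and pattern options together determine which files are read. The sketch below approximates that selection for a single file, a directory, or a tar or zip archive. `list_sources` and `_matches` are hypothetical helpers. The sketch assumes matching with the standard fnmatch module, in which `*` can also match `/`. A leading `**/` is additionally allowed to match top-level files. The loader's actual matching rules may differ.

```python
import fnmatch
import tarfile
import zipfile
from pathlib import Path
from typing import List

TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz")


def _matches(rel_path: str, pattern: str) -> bool:
    """fnmatch-based test; a leading "**/" also matches top-level names."""
    if fnmatch.fnmatchcase(rel_path, pattern):
        return True
    return pattern.startswith("**/") and fnmatch.fnmatchcase(rel_path, pattern[3:])


def list_sources(data: str, pattern: str = "*.csv") -> List[str]:
    """List matching entries under `data` (a file, directory or tar/zip archive)."""
    path = Path(data)
    name = path.name.lower()
    if path.is_dir():
        # Recurse into subdirectories; compare POSIX-style relative paths.
        rels = [p.relative_to(path).as_posix()
                for p in path.rglob("*") if p.is_file()]
    elif name.endswith(TAR_SUFFIXES):
        with tarfile.open(path) as archive:  # default mode detects compression
            rels = [m.name for m in archive.getmembers() if m.isfile()]
    elif name.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            rels = [n for n in archive.namelist() if not n.endswith("/")]
    else:
        return [str(path)]  # a single CSV, gzipped CSV or FST file
    return sorted(r for r in rels if _matches(r, pattern))
```

With the example pattern “**/maxdata_*_ps_*.csv”, the sketch selects both `2010/maxdata_ca_ps_2010.csv` and a top-level `maxdata_ny_ps_2011.csv`, and it skips unrelated files.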