The data_loader Module
Implements parallel loading of data into a PostgreSQL database. It is also responsible for loading DDL and creating views, both virtual and materialized.
API
Domain Data Loader
Provides a command-line interface for loading data from one or more column-formatted files into the NSAPH PostgreSQL database.
Input (aka source) files can be in either FST or CSV format.
- class DataLoader(context: Optional[LoaderConfig] = None)
Class for data loader
Configuration
Common options for data manipulation
- class DBConnectionConfig(subclass, doc)
Configuration class for connection to a database
Creates a new object
- Parameters
subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of the Argument class.
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation.
- autocommit
Use autocommit
- db
Path to a database connection parameters file
- connection
Section in the database connection parameters file
- verbose
Generate verbose output
- dryrun
Dry run: do not modify the database
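The db and connection options above pair a parameters file with a named section inside it. The following is a minimal sketch of that lookup, assuming the file uses an INI-style layout readable by Python's standard configparser. The file format, the section name mydb, and every key shown are assumptions made for illustration, not the documented format of the module.

```python
import configparser

# Hypothetical file contents; section and key names are illustrative only.
SAMPLE_INI = """
[mydb]
host = localhost
port = 5432
database = example
user = loader
"""


def read_connection_params(text: str, section: str) -> dict:
    """Return the key/value pairs of one section of an INI-style document."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError(f"No section '{section}' in connection parameters")
    # All values come back as strings; callers convert types as needed.
    return dict(parser.items(section))


params = read_connection_params(SAMPLE_INI, "mydb")
print(params["host"], params["port"])
```

In this sketch the text passed in stands for the contents of the file named by db, and the section argument stands for the value of connection.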
- class DBTableConfig(subclass, doc)
Creates a new object
- Parameters
subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of the Argument class.
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation.
- table
Name of the table to manipulate
- class CommonConfig(subclass, doc)
Abstract base class for configurators used for data loading
Creates a new object
- Parameters
subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of the Argument class.
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation.
- domain
Name of the domain
- registry
Path to the domain registry. The registry is a directory or an archive containing YAML files with domain definitions. By default, the built-in registry is used.
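The naming convention stated in the Parameters entries of the configuration classes (options are class members whose names start with a single ‘_’ and whose values are Argument instances) can be illustrated with a small self-contained sketch. The Argument dataclass below is a hypothetical stand-in with made-up fields, and ExampleOptions and collect_options are names invented for this illustration; none of them reproduce the real classes or their signatures.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Argument:
    """Hypothetical stand-in for the real Argument class; fields are illustrative."""
    name: str
    help: str
    default: Any = None


class ExampleOptions:
    """Hypothetical configuration class following the naming convention."""
    _table = Argument("table", help="Name of the table to manipulate")
    _dryrun = Argument("dryrun", help="Dry run", default=False)
    helper = "not an option"  # no leading underscore, so it is ignored below


def collect_options(cls: type) -> dict:
    """Collect members named with exactly one leading '_' that hold an Argument."""
    return {
        attr[1:]: value
        for attr, value in vars(cls).items()
        if attr.startswith("_")
        and not attr.startswith("__")
        and isinstance(value, Argument)
    }


options = collect_options(ExampleOptions)
print(sorted(options))
```

The filter skips dunder attributes such as __doc__ and any member whose value is not an Argument, so only the two declared options are collected.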
Domain Loader Configurator
Intended to configure loading of one or more column-formatted files into the NSAPH PostgreSQL database. Input (aka source) files can be in either FST or CSV format.
The configurator assumes that the database schema is defined in a YAML or JSON file. A separate tool is available to introspect source files and infer a possible database schema.
- class LoaderConfig(doc)
Configurator class for data loader
Creates a new object
- Parameters
subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of the Argument class.
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation.
- action: Optional[DataLoaderAction]
If this option is given, then the whole domain schema will be dropped
- data
Path to a data file or directory. Can be a single CSV, gzipped CSV, or FST file, or a directory recursively containing CSV files. Can also be a tar, tar.gz (or tgz), or zip archive containing CSV files.
- reset
Force recreating the table(s) if they already exist
- page
Explicit page size for the database
- log
Explicit interval for logging
- limit
Load at most the specified number of records
- buffer
Buffer size for converting FST files
- threads
Number of threads writing into the database
- parallelization
Type of parallelization, if any
- incremental
Commit every file and skip over files that have already been ingested
- sloppy
Do not update existing tables and views
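The data option above accepts several kinds of input: plain CSV, gzipped CSV, FST, and tar or zip archives. As a rough sketch of how such a path might be classified by file name alone, the function below maps a name to one of those kinds. The function name classify_source and the returned labels are hypothetical, and the loader's actual detection logic is not documented here. The directory case needs a filesystem check and is left out.

```python
from pathlib import PurePath


def classify_source(path: str) -> str:
    """Classify a data path by its file name suffix (illustrative labels only)."""
    name = PurePath(path).name.lower()
    # Check archive suffixes first so "data.tar.gz" is not mistaken for gzip alone.
    if name.endswith((".tar.gz", ".tgz", ".tar")):
        return "tar archive"
    if name.endswith(".zip"):
        return "zip archive"
    if name.endswith(".csv.gz"):
        return "gzipped csv"
    if name.endswith(".csv"):
        return "csv"
    if name.endswith(".fst"):
        return "fst"
    return "unknown"


for example in ("in/part1.csv", "in/part2.csv.gz", "in/all.tgz", "in/table.fst"):
    print(example, "->", classify_source(example))
```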