How to add data to the database

What data are you adding?
Data modelling vs data introspection
Adding new data domain
Adding data to existing table
Creating new single table
Automatically ingesting multiple files from a file system

What data are you adding?

There are many ways to add data to the database. We review the following options:

- Creating a new data domain with its own pipelines and, optionally, software tools written in a programming language such as Python, Java, R, or PL/pgSQL
- Adding a new table:
  - from a file on the file system
  - from a remote data source
- Adding data to an existing table
- Bulk ingesting multiple CSV-like files (many formats are supported) from the local file system to create a lightweight data domain

When creating new tables in the database, one can either manually create a data model, together with the required data conversions and transformations, or automatically infer the data structure by sampling the data.

Data modelling vs data introspection 

Tools for data modelling are discussed in Data Modelling for NSAPH Data Platform.

Examples of manually created data models are the data models for the Medicare and Medicaid domains. The actual models are defined in Medicare.yaml and Medicaid.yaml, respectively.

To automatically infer the data structure by analyzing sample data and to generate a data model corresponding to the existing structure, one can use the Introspector tool. It can be run as a standalone command-line tool or used via its Python API.
Examples of using the Introspector via the API can be found in the EPA pipeline.
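
To illustrate the general idea of inferring a data structure from a sample, here is a minimal sketch using only the Python standard library. It is not the Introspector's code or API: the function names `infer_type` and `introspect_csv`, the `sample_size` parameter, the type names, and the sample data are all hypothetical and chosen for illustration only.

```python
import csv
import io
from itertools import islice


def infer_type(values):
    """Return the narrowest SQL-like type that fits every non-empty sample value.

    Hypothetical helper: the real Introspector may use different rules and types.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"  # nothing to learn from, fall back to text
    for sql_type, cast in (("INT", int), ("FLOAT", float)):
        try:
            for v in non_empty:
                cast(v)
            return sql_type
        except ValueError:
            continue  # at least one value does not fit, try a wider type
    return "VARCHAR"


def introspect_csv(stream, sample_size=1000):
    """Read up to sample_size rows and map each column name to an inferred type."""
    reader = csv.DictReader(stream)
    sample = list(islice(reader, sample_size))
    return {
        name: infer_type([(row[name] or "") for row in sample])
        for name in (reader.fieldnames or [])
    }


# Made-up sample data, used only to show the shape of the result.
sample_data = io.StringIO("site_id,pm25,state\n1,7.5,MA\n2,,NY\n3,12,CA\n")
model = introspect_csv(sample_data)
print(model)
```

Because only a sample is inspected, a type inferred this way can be too narrow for the full file, which is one reason a manually created data model may be preferable for complex domains.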

The Project Loader Tool also uses the Introspector.

Adding new data domain 

To add a new data domain, create a new repository on GitHub or another source control system.

Adding data to existing table 

The process of adding data to an existing table is described in NSAPH Data Loader.

Creating new single table 

In many cases, creating a new single table means running a pipeline that first introspects the data in a file (CSV, JSON, FST, and some other formats) and then runs the Data Loader. For simple cases, however, one can use the Project Loader Tool either to ingest the data or only to introspect it (introspection alone is done with the --dryrun argument).
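
The two-step flow described above (introspect first, then load) can be sketched as follows. This is an illustration under stated assumptions, not the Data Loader or the Project Loader Tool: it uses an in-memory SQLite database as a stand-in for the platform database, and the names `guess_type`, `introspect`, `create_and_load`, the table name `mortality`, and the sample data are hypothetical. The `dry_run` parameter only mimics the idea of introspecting without ingesting.

```python
import csv
import io
import sqlite3


def guess_type(values):
    """Pick INTEGER, REAL, or TEXT based on the non-empty sample values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"
    for sql_type, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in non_empty:
                cast(v)
            return sql_type
        except ValueError:
            pass  # try the next, wider type
    return "TEXT"


def introspect(rows, columns):
    """Step 1: build a column-to-type mapping from the rows."""
    return {c: guess_type([r[c] for r in rows]) for c in columns}


def create_and_load(conn, table, rows, model, dry_run=False):
    """Step 2: create the table from the model and insert the rows.

    With dry_run=True, only the CREATE TABLE statement is returned
    and the database is left untouched.
    """
    cols = ", ".join(f'"{c}" {t}' for c, t in model.items())
    ddl = f'CREATE TABLE "{table}" ({cols})'
    if dry_run:
        return ddl
    conn.execute(ddl)
    placeholders = ", ".join("?" for _ in model)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        # Empty strings are stored as NULL.
        [tuple(r[c] if r[c] != "" else None for c in model) for r in rows],
    )
    conn.commit()
    return ddl


# Made-up sample file contents.
data = io.StringIO("county,year,deaths\nSuffolk,2015,120\nKings,2016,\n")
reader = csv.DictReader(data)
rows = list(reader)
model = introspect(rows, reader.fieldnames)

conn = sqlite3.connect(":memory:")
ddl = create_and_load(conn, "mortality", rows, model, dry_run=True)  # inspect only
create_and_load(conn, "mortality", rows, model)                      # actually ingest
count = conn.execute('SELECT COUNT(*) FROM "mortality"').fetchone()[0]
print(ddl)
print(count)
```

Reviewing the generated table definition before loading makes it possible to catch a wrongly inferred column type without having to drop and recreate the table.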

Automatically ingesting multiple files from a file system 

See Project Loader Tool for details.
