Dorieh Core Data Platform

Core platform overview

The Core Platform provides generic functionality for the Dorieh Data Platform through APIs and command line utilities that depend on the infrastructure and the environment. For instance, its components assume the presence of a PostgreSQL DBMS (version 13 or later) and a CWL runtime environment.
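The PostgreSQL version requirement can be verified before any component is run. The following is a minimal sketch and not part of the Dorieh API: the function name `is_supported_postgres` is hypothetical. It relies on the PostgreSQL `server_version_num` setting, which reports the server version as an integer (for example, 130004 for version 13.4).

```python
MIN_MAJOR_VERSION = 13  # stated requirement: PostgreSQL 13 or later


def is_supported_postgres(server_version_num: int) -> bool:
    """Return True if the reported server version is 13 or later.

    `server_version_num` is the integer returned by `SHOW server_version_num`.
    For PostgreSQL 10 and newer it equals major * 10000 + minor.
    """
    return server_version_num // 10000 >= MIN_MAJOR_VERSION
```

The input value can be obtained by running `SHOW server_version_num;` in any SQL client connected to the target database.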

Some mapping (or crosswalk) tables are also included in the Core Platform module. These tables map between different territorial codes, such as USPS ZIP codes, Census ZCTA codes, FIPS codes for US states and counties, and SSA codes for US states and counties. See more information in Mapping between different territorial codes.

See also: Managing database connections.

Tool Examples

Examples of tools included in this package are:

Project Structure

The package is under intensive development, and the project structure is in flux.

Top level directories are:

- doc
- resources
- src
- examples
- docker

The doc directory contains documentation.

The resources directory contains resources that must be loaded into the data platform for it to function normally. For example, it contains mappings between US states, counties, FIPS codes, and ZIP codes. See details in the Resources section.

The src directory contains software source code. See details in the Software Sources section.

Software Sources

The directories under src are:

- cwl
- python
- sql

They are described in more detail in the corresponding sections. Here is a brief overview:

  • cwl contains reusable workflows, packaged as tools that can and should be used by Dorieh pipelines. Examples of such tools are: introspection of CSV files, indexing tables, linking tables with GIS information for easy mapping, creation of a Superset datasource.

  • sql contains PostgreSQL procedures and functions implemented in the PostgreSQL dialect of SQL/DDL and the PL/pgSQL language.

  • python contains Python code. See more details.

Python packages

Modules and subpackages included in the dorieh.platform package are described here.

Resources

Resources are organized in the following way:

- ${database schema}/
    - ddl file for ${resource1}
    - content of ${resource1} in JSON Lines format (*.json.gz)
    - ddl file for ${resource2}
    - content of ${resource2} in JSON Lines format (*.json.gz)
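The layout above can be inspected with a few lines of standard-library Python. This is an illustrative sketch only: `list_resource_data_files` is a hypothetical helper, not a function of the resources module, and it looks only at the `*.json.gz` content files because the naming of the DDL files is not specified here.

```python
from pathlib import Path
from typing import Dict, List


def list_resource_data_files(root: str) -> Dict[str, List[str]]:
    """Map each schema directory under `root` to its *.json.gz file names."""
    result: Dict[str, List[str]] = {}
    for schema_dir in sorted(Path(root).iterdir()):
        # Each first-level directory is assumed to correspond to a database schema.
        if schema_dir.is_dir():
            result[schema_dir.name] = sorted(
                p.name for p in schema_dir.glob("*.json.gz")
            )
    return result
```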

Resources can be packaged when a wheel is built. Support for packaging resources during development and after a package is deployed is provided by the resources module.

Another module, pg_json_dump, provides support for packaging tables as resources in JSON Lines format. This format is used natively by some DBMSs.
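To make the storage format concrete, here is a minimal sketch of writing and reading a gzip-compressed JSON Lines file (one JSON object per line) using only the Python standard library. It does not reproduce the pg_json_dump implementation; the function names `write_jsonl_gz` and `read_jsonl_gz` are hypothetical.

```python
import gzip
import json
from typing import Any, Dict, Iterable, Iterator


def write_jsonl_gz(path: str, rows: Iterable[Dict[str, Any]]) -> None:
    """Write each row as one JSON object per line into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl_gz(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one dictionary per non-empty line of a *.json.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Because rows are read lazily, one line at a time, a large table can be processed without loading the whole file into memory.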

SQL Utilities

Utilities, implementing the following:

Territorial Codes Mappings

An important part of the data platform is the set of mappings between different territorial codes, such as USPS ZIP codes, Census ZCTA codes, FIPS codes for US states and counties, and SSA codes for US states and counties. See more information in the Mapping between different territorial codes page.
