Managing database connections

One of the most important functionalities of Dorieh data platform is support for creating a data warehouse for storing an analysis of large volumes of data. Moreover, Dorieh supports a rich set of in-database data transformations (see Modelling data in Dorieh). Dorieh assumes that the DBMS used for data warehousing is an instance of
PostgreSQL version 13 or higher.

A preferred way to manage connections to a database is through a .ini file.

Such a file contains URIs and credentials for database connections and can store multiple entries. These entries can be either in a clear text or a reference to a Vault or a Secret Manager.

Dorieh supports connection that require creating an ssh tunnel to connect to a host on the same network as the DBMS.

Below is an example of database connection file storing 3 different ways to connect to a database.

[mimic]
host=localhost
database=mimicii
user=postgres
password=*****

[nsaph2]
host=dorieh.platform.cluster.uni.edu
database=nsaph
user=dbuser
password=*********
ssh_user=johndoe

[dorieh]
database=dorieh
secret=aws:region=us-east-1:name=nsaph/public/dorieh/

The first connection uses a local instance of PostgreSQL containing a copy of MIMIC-III Clinical Database.

The second connects to a database that is not accessible from a local machine.
It is using an ssh tunnel to connect to a remote host that has direct access. The tunnel is defined by adding an ssh_user parameter. In this example johndoe is a username for ssh while dbuser is a username for the database.

These options are appropriate for development and debugging, but are not secure enough for any production. The third connection uses AWS Secrets Manager to retrieve connection credentials. In this case security is provided by AWS.

Note: in some cases it might be required to specify an AWS profile to be able to access the Secrets Manager. This is done by setting an environment variable. See the following example:

 export AWS_PROFILE=dorieh

defines that an AWS profile named dorieh will be used to retrieve credentials from the Secrets Manager.