System Architecture

Overview

GENTRAIN is a browser-based web application. Its architecture consists of the key components described below.

Figure: System Components

Components

Frontend

User Interface

Technologies

React, Zustand

Since only sequence analyses are performed on the server side, the majority of the business logic is located on the frontend. The application state at runtime is managed using the Zustand library for React. Data is persisted in a local browser database managed by IndexedDB.

The React application is divided into separate domains to create a well-structured project. These domains correspond to the application's pages.

Domain	Description
Core	Basic functionalities like application bootstrapping, routing and browser database management.
Dashboard	Functionalities and components associated with the dashboard page of the application, including the dashboard graph canvas, additional information charts, the case information table and the distance matrix.
Outbreak Analysis	Functionalities and components associated with the outbreak analysis module, including the analysis overview page, the outbreak analysis graph canvas, the settings panel and PDF export.
Data Management	Functionalities and components associated with the data management of the application, including the import section, the cases overview table, outbreak management and the automatic deletion configuration.
Help	Functionalities and components associated with the help page of the application.
Tutorial	Functionalities and components associated with the tutorial module of the application.

The application uses IndexedDB to manage the local browser database. As the tutorial relies on a static dataset, two IndexedDB instances have been integrated: gentrain and gentrain_example. The former is active during productive usage of the application, while the latter is only active during the tutorial. Additionally, the state-management library Zustand handles data at runtime. Zustand organises the application state in so-called stores. In GENTRAIN, these stores mirror the domains of the React application, except for the Help domain, which does not have a corresponding store. The following graph illustrates the concrete data sources of the React application.

Figure: User Interface Data Sources

IndexedDB

IndexedDB is a low-level browser API that offers powerful and efficient database functionality for client-side storage. Personal data is stored solely within the web browser's local database. As IndexedDB does not support automatic data deletion, the application provides two options: data can be deleted either after the 24-hour TTL has expired (only possible during productive usage or when the application is loaded again), or when the tab or browser is reloaded or closed (beforeunload event). The TTL option is enabled by default; both options can also be disabled.
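The TTL rule described above can be sketched as follows. This is a minimal, language-neutral illustration written in Python, not the application's browser code. The record layout and the names `TTL`, `is_expired` and `purge_expired` are assumptions made for this sketch, as is measuring the TTL from `created_at`; only the 24-hour value comes from the text.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)  # 24-hour TTL, as stated in the text


def is_expired(created_at: datetime, now: datetime, ttl: timedelta = TTL) -> bool:
    """Return True once a record is at least `ttl` old."""
    return now - created_at >= ttl


def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Keep only the records that are still within the TTL.

    Assumption: every record carries a `created_at` timestamp and the TTL
    is measured from that timestamp.
    """
    return [r for r in records if not is_expired(r["created_at"], now)]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"case_id": "A", "created_at": now - timedelta(hours=25)},  # too old
    {"case_id": "B", "created_at": now - timedelta(hours=1)},   # still fresh
]
remaining = purge_expired(records, now)
```

Because nothing runs while the application is closed, a check like this can only be triggered while the application is in use or when it is loaded again, which matches the restriction noted above.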

On page load, the pathogens in the client-side database are compared with the persisted pathogens in the Postgres database and synchronised. The data model of the client-side database is shown below.
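The text does not specify how the synchronisation resolves differences, so the following is only one plausible sketch. It assumes that the server is the source of truth, that records are keyed by `id`, and that the client-only `activated` flag is preserved across updates and defaults to `False` for new pathogens. The function name `sync_pathogens` and the sample values are hypothetical.

```python
def sync_pathogens(client: dict[int, dict], server: dict[int, dict]) -> dict[int, dict]:
    """Return the client-side pathogen table after synchronisation.

    Assumptions (not confirmed by the documentation): server fields win,
    the local `activated` flag is kept, and pathogens that no longer
    exist on the server are dropped from the client.
    """
    synced: dict[int, dict] = {}
    for pathogen_id, remote in server.items():
        local = client.get(pathogen_id, {})
        # Copy the server record and carry over the local activation state.
        synced[pathogen_id] = {**remote, "activated": local.get("activated", False)}
    # Client records without a server counterpart are simply not copied.
    return synced


client_db = {
    1: {"id": 1, "name": "pathogen-a", "genetic_distance_threshold": 2, "activated": True},
    3: {"id": 3, "name": "pathogen-c", "genetic_distance_threshold": 4, "activated": True},
}
server_db = {
    1: {"id": 1, "name": "pathogen-a", "genetic_distance_threshold": 5},
    2: {"id": 2, "name": "pathogen-b", "genetic_distance_threshold": 10},
}
result = sync_pathogens(client_db, server_db)
```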

Figure: Entity Relationship Model (IndexedDB)

Details

Pathogen

Type	Name
int	id
string	name
int	genetic_distance_threshold
int	pathogen_type_id
boolean	activated
datetime	created_at
datetime	updated_at

Pathogen Type

Type	Name
int	id
string	name
datetime	initialized_at
datetime	created_at
datetime	updated_at

Case

Type	Name
int	id
string	case_id
string	fasta_id
datetime	registered_at
string	city
string	zip_code
string	street
string	last_name
string	first_name
datetime	created_at
datetime	updated_at

Outbreak

Type	Name
int	id
string	name
datetime	created_at
datetime	updated_at

Group

Type	Name
int	id
string	name
datetime	created_at
datetime	updated_at

Sequence Analysis

Type	Name
int	id
string	hash
object	result
string	schema
string	version
datetime	created_at
datetime	updated_at

Distance Matrix

Type	Name
int	id
datetime	created_at
datetime	updated_at

Distance

Type	Name
int	id
int	case_id_1
int	case_id_2
int	value
datetime	created_at
datetime	updated_at
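The Distance entity stores one value per pair of cases. As an illustration of how such pairwise records relate to a distance matrix, the sketch below expands them into a symmetric matrix. It assumes that each pair is stored once, that distances are symmetric, and that a case has distance 0 to itself; the function name and the `-1` marker for missing pairs are choices made for this example only.

```python
def build_distance_matrix(case_ids: list[int], distances: list[dict]) -> list[list[int]]:
    """Expand pairwise Distance records into a symmetric matrix.

    Rows and columns follow the order of `case_ids`. Pairs without a
    stored distance are marked with -1.
    """
    index = {case_id: position for position, case_id in enumerate(case_ids)}
    size = len(case_ids)
    matrix = [[0 if row == col else -1 for col in range(size)] for row in range(size)]
    for record in distances:
        i = index[record["case_id_1"]]
        j = index[record["case_id_2"]]
        # Write both directions so the matrix stays symmetric.
        matrix[i][j] = matrix[j][i] = record["value"]
    return matrix


matrix = build_distance_matrix(
    [10, 11, 12],
    [
        {"case_id_1": 10, "case_id_2": 11, "value": 3},
        {"case_id_1": 11, "case_id_2": 12, "value": 7},
    ],
)
```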

Contact

Type	Name
int	id
int	case_id_1
int	case_id_2
string	type
type	context
datetime	created_at
datetime	updated_at

Analysis

Type	Name
int	id
string	name
object	settings
datetime	created_at
datetime	updated_at

Backend

API

Technologies

Python (Flask), Flask-SocketIO, RQ, Prisma

This API facilitates communication with the backend and primarily offers pathogen resources and file downloads, including example data and pathogen schemes. Long-running tasks are processed in Redis queues so that they are handled reliably, even under heavy load. Because these tasks run asynchronously, a WebSocket server (SocketIO) is integrated into the API to provide bidirectional, event-driven communication during sequence analysis. The Redis queues for viral and bacterial sequence analyses are orchestrated using Supervisord and can be configured in the file /backend/api/workers/supervisor.conf according to current requirements and server specifications.
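The interplay between queues, workers and event-driven updates can be sketched with the standard library alone. This is not the RQ or Flask-SocketIO API: the in-memory `deque` objects stand in for the Redis queues, the `emit` callback stands in for a WebSocket push, and the queue names, event names and job layout are all hypothetical.

```python
from collections import deque
from typing import Callable

Emit = Callable[[str, dict], None]

# Hypothetical queue names; the text only states that viral and bacterial
# analyses are queued separately.
queues: dict[str, deque] = {"viral": deque(), "bacterial": deque()}


def enqueue_analysis(pathogen_type: str, job: dict) -> None:
    """Put a job on the queue that matches its pathogen type."""
    if pathogen_type not in queues:
        raise ValueError(f"no queue for pathogen type {pathogen_type!r}")
    queues[pathogen_type].append(job)


def run_worker(pathogen_type: str, emit: Emit) -> int:
    """Process all waiting jobs in FIFO order and report each one via `emit`."""
    processed = 0
    queue = queues[pathogen_type]
    while queue:
        job = queue.popleft()
        emit("analysis_started", {"job_id": job["id"]})
        # Placeholder for the real sequence analysis.
        result = {"job_id": job["id"], "sequences": len(job["sequences"])}
        emit("analysis_finished", result)
        processed += 1
    return processed


events: list[tuple[str, dict]] = []
enqueue_analysis("viral", {"id": 1, "sequences": ["ACGT", "ACGA"]})
count = run_worker("viral", lambda name, payload: events.append((name, payload)))
```

Separating the queues by pathogen type lets each queue be given its own number of workers, which is the kind of tuning the Supervisord configuration mentioned above allows.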

API Structure

The Python application is divided into two domains, as shown in the following table.

Domain	Description
Pathogen Registry	Operations related to pathogens. This includes retrieving information about persisted pathogens and downloading pathogen schemes and example data.
Sequence Analysis	Sequence analysis orchestration. This involves managing WebSocket events, executing sequence analysis jobs, and preparing sequence analysis responses.

Admin Panel

Technologies

NodeJS + React (AdminJS), Prisma

The Admin Panel provides a user interface for managing pathogen data, particularly schemes.

Redis Cache

Long-running tasks, such as sequence analyses, are managed via Redis job queues and handled by workers. The Redis cache also supports chunked WebSocket messages by temporarily storing sequence chunks. Furthermore, sequence analysis results are kept for 30 minutes to prevent data loss if users leave the dashboard during processing.
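Both uses of the cache can be illustrated with a small in-memory sketch. It does not use the Redis API: plain dictionaries stand in for Redis keys, and the class names, method names and the idea of addressing chunks by an upload ID and index are assumptions. Only the 30-minute retention comes from the text.

```python
import time
from typing import Callable, Optional

RESULT_TTL_SECONDS = 30 * 60  # 30 minutes, as stated in the text


class ChunkBuffer:
    """Collects sequence chunks per upload until all of them have arrived."""

    def __init__(self) -> None:
        self._chunks: dict[str, dict[int, str]] = {}

    def add(self, upload_id: str, index: int, data: str) -> None:
        self._chunks.setdefault(upload_id, {})[index] = data

    def assemble(self, upload_id: str, total: int) -> Optional[str]:
        """Return the joined payload once chunks 0..total-1 exist, else None."""
        parts = self._chunks.get(upload_id, {})
        if not all(i in parts for i in range(total)):
            return None
        self._chunks.pop(upload_id, None)  # temporary storage: free it after use
        return "".join(parts[i] for i in range(total))


class ResultCache:
    """Keeps analysis results for a limited time."""

    def __init__(self, ttl: float = RESULT_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._items: dict[str, tuple[float, object]] = {}

    def put(self, key: str, result: object) -> None:
        self._items[key] = (self._clock() + self._ttl, result)

    def get(self, key: str) -> Optional[object]:
        entry = self._items.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._items.pop(key, None)  # expired or unknown
            return None
        return entry[1]
```

The injectable `clock` exists only so that expiry can be exercised without waiting; a real Redis deployment would rely on key expiry instead.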

PostgreSQL

Pathogens, admin users and roles are stored in a server-side PostgreSQL database, which can be accessed and managed via the Admin Panel.

Figure: Entity Relationship Model (Postgres DB)

Details

Pathogen

Type	Name
int	id
string	name
int	genetic_distance_threshold
int	pathogen_type_id
string	example_cases_key
string	example_cases_size
string	example_cases_bucket
string	example_sequences_key
string	example_sequences_size
string	example_sequences_bucket
string	example_contacts_key
string	example_contacts_size
string	example_contacts_bucket
datetime	created_at
datetime	updated_at

User

Type	Name
int	id
string	username
string	password
datetime	confirmed_at
datetime	created_at
datetime	updated_at
id	roleId

Role

Type	Name
int	id
string	name
string	description

Logs

Type	Name
int	id
int	recordId
string	recordTitle
json	difference
string	action
string	resource
string	userId
datetime	createdAt
datetime	updatedAt

Session

Type	Name
int	sid
json	sess
datetime	expire

Docker Deployment

GENTRAIN is deployed in Docker containers. Some containers are used to deploy application components directly, while others are used to build production code or perform configuration tasks. The following graph illustrates the dependencies between the containers for a production environment. Dotted arrows indicate dependencies at system start, while solid arrows represent dependencies at runtime.

Figure: Docker Deployment

Not all containers are used for local development, as can be seen in the table below. The table also describes the purpose of each container.

Container	Purpose	Used locally?
`gentrain-init`	A one-time initialization container to configure volume access permissions.
`gentrain-frontend`	Builds and provides the compiled assets of the user interface for `gentrain-caddy` to serve.
`gentrain-api`	Runs a Flask server with API endpoints and WebSocket event handlers.	✅
`gentrain-worker`	Runs the background job workers that process the Redis queues, using the same codebase as `gentrain-api`.	✅
`gentrain-db`	Hosts a Postgres database for centralized server-side data persistence (pathogens, users, roles).	✅
`gentrain-redis`	Provides the Redis in-memory cache used by `gentrain-api` and `gentrain-worker` to implement job queueing, as well as to persist and retrieve sequence analysis results.	✅
`gentrain-admin`	Runs an AdminJS Node.js application to manage pathogen information and files.
`gentrain-documentation`	Builds the Docusaurus documentation for `gentrain-caddy` to serve as static files.
`gentrain-caddy`	Serves as the production reverse proxy and static file server for the `gentrain-api`, `gentrain-frontend`, and `gentrain-documentation`.
