
Security and Data Privacy

Processed Data

Gentrain processes personal data only on the client side. Personal data is added via case imports and stored in the browser's IndexedDB. This data does not describe the application's users but the persons associated with cases registered with the health authorities.

| Name | Description | Source |
|---|---|---|
| Case Id | Unique identifier of a case registered at the relevant public health department. Used to identify cases and to map sequences to the associated cases. | SurvNet, Octoware, ISGA |
| First Name | First name of a person associated with a case. Used to support case association in outbreak analyses. | SurvNet, Octoware, ISGA |
| Last Name | Last name of a person associated with a case. Used to support case association in outbreak analyses. | SurvNet, Octoware, ISGA |
| Address | Address of a person associated with a case. Used to support case association in outbreak analyses. | SurvNet, Octoware, ISGA |
| Registration Date | Date on which a case was registered at the relevant public health department. Used to support case association in outbreak analyses and to filter the cases presented in graphs. | SurvNet, Octoware, ISGA |
| Outbreak Name | Outbreak associated with an imported case. Used to display outbreaks in graphs and for filtering. | SurvNet, Octoware, ISGA |
| Contact Information | Mapping between two cases and their type of contact. Used to support outbreak analyses by providing further understanding of the incidence of infection. | SurvNet, Octoware, ISGA |
| Fasta Id | Identifier of a sequence that is linked to an imported case. Used to identify sequences and to assign cases to imported sequences. | Sequencing labs |
| Genetic sequences | Sequences associated with imported cases. Used to calculate the genetic distances between cases, which in turn support outbreak analyses. | Sequencing labs |
| Sequence analysis result | Sequences are analysed for mutations relative to the corresponding reference genome. The results are stored on the server side for a maximum of 30 minutes in case users close the WebSocket connection (e.g. by closing the browser tab) and therefore cannot retrieve them immediately. | Internal (temporary) |

Genetic data, such as viral and bacterial genomes, is transferred to the server for analysis, and the analysis results are stored there temporarily for a maximum of 30 minutes. These results contain information on mutations relative to the corresponding reference genome. Whenever the client communicates with the server, sequences are aggregated and anonymised, using only a hash as a reference. The procedure is illustrated in the figure below.

[Figure: Sequence Aggregation]
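The aggregation and anonymisation step described above can be sketched as follows. This is a minimal illustration, not the Gentrain implementation: it assumes SHA-256 as the hash function and simple upper-case normalisation, and the function name `aggregate_sequences` is hypothetical. Identical sequences collapse into a single entry keyed by their hash, so only the hash and the sequence leave the client, while the mapping back to local identifiers such as Fasta Ids stays in the browser.

```python
import hashlib
from collections import defaultdict


def aggregate_sequences(sequences: dict[str, str]) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Deduplicate sequences and key them by a hash (hypothetical helper).

    `sequences` maps a local identifier (e.g. a Fasta Id) to a raw sequence.
    Returns (payload, index):
      payload: hash -> sequence; the only data that would be sent to the server
      index:   hash -> local identifiers; kept client-side to map results back
    """
    payload: dict[str, str] = {}
    index: dict[str, list[str]] = defaultdict(list)
    for local_id, seq in sequences.items():
        # Assumed normalisation so that identical sequences hash identically.
        normalized = seq.strip().upper()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        payload[digest] = normalized
        index[digest].append(local_id)
    return payload, dict(index)
```

For example, `aggregate_sequences({"fasta-1": "acgt", "fasta-2": "ACGT", "fasta-3": "GGCC"})` yields a payload with two entries, because the first two sequences normalise to the same string and therefore share one hash.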

Case imports can also include additional, user-defined data columns. This data is used only for filtering graphs and is persisted and processed exclusively on the client side.
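Because these columns never leave the browser, the actual filtering presumably runs in the client application; the Python sketch below only illustrates the idea of filtering cases by an arbitrary, user-defined column. The function `filter_cases` and the column name `district` are hypothetical.

```python
from typing import Any


def filter_cases(cases: list[dict[str, Any]], column: str, allowed: set[Any]) -> list[dict[str, Any]]:
    """Keep cases whose value in a user-defined column is one of `allowed`.

    Cases that do not have the column at all are excluded.
    """
    return [case for case in cases if column in case and case[column] in allowed]
```

For instance, filtering a list of imported cases by a custom `district` column keeps only the cases whose district is in the selected set, and the remaining cases are simply not shown in the graph.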

Data Processing Operations

| Operation | Description | Data | Purpose |
|---|---|---|---|
| Data import | Import of data from FASTA / CSV files | Cases, sequences, contacts | Data basis for outbreak analyses |
| Client-side persistence of imported data | Persistence of data from FASTA / CSV files | Cases, sequences, contacts | Data basis for outbreak analyses |
| Database export | Export of data stored in IndexedDB | Client-side application state | Persistent data management, state sharing, backups |
| Database import | Import of data stored in IndexedDB | Client-side application state | Persistent data management, state sharing, backups |
| Sequence analysis | WebSocket messaging between client and server | Anonymised sequence data, mutation information | Analysis for mutations to persist relevant sequence information in compressed form |
| Client-side persistence of sequence analysis results | Persistence in IndexedDB | Mutation information | Distance calculation between sequences |
| Server-side caching of sequence analysis results | Temporary persistence in Redis cache (max. 30 minutes) | Mutation information | Retaining the results if the user leaves the application during analysis |
| Server-side caching of sequence chunks | Temporary persistence in Redis cache (until all chunks have been transmitted, max. 30 minutes) | Anonymised sequences, anonymised assemblies | Reliable WebSocket transmission despite large amounts of data |
| Outbreak analysis report export | Export of an outbreak analysis containing case, sequence and contact tracing information as a PDF document | Cases, sequences, contacts | Collecting and exporting outbreak analysis information and conclusions |
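The server-side caching of sequence chunks in the table above can be modelled as a buffer that assembles chunks per key and drops entries either once all chunks have arrived or after 30 minutes. The sketch below is an in-memory stand-in for Redis, written under that assumption; in Redis itself the expiry would typically be a key TTL (e.g. `SET ... EX` or `EXPIRE`). The class `ChunkCache`, its methods and the chunk indexing scheme are hypothetical, and the injectable clock only exists to make the expiry testable.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_AGE_SECONDS = 30 * 60  # upper bound of 30 minutes stated in the table


@dataclass
class _Entry:
    created: float
    total: int
    chunks: dict[int, str] = field(default_factory=dict)


class ChunkCache:
    """Hypothetical in-memory model of a TTL-bounded chunk cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 max_age: float = MAX_AGE_SECONDS) -> None:
        self._clock = clock
        self._max_age = max_age
        self._entries: dict[str, _Entry] = {}

    def _evict_expired(self) -> None:
        # Drop every partial upload older than max_age, like a Redis TTL would.
        now = self._clock()
        expired = [k for k, e in self._entries.items() if now - e.created > self._max_age]
        for key in expired:
            del self._entries[key]

    def add_chunk(self, key: str, index: int, total: int, data: str) -> Optional[str]:
        """Store one chunk; return the assembled payload once all chunks arrived."""
        if not 0 <= index < total:
            raise ValueError(f"chunk index {index} out of range for {total} chunks")
        self._evict_expired()
        entry = self._entries.setdefault(key, _Entry(created=self._clock(), total=total))
        entry.chunks[index] = data
        if len(entry.chunks) < entry.total:
            return None
        # All chunks are present: remove the entry immediately and assemble in order.
        del self._entries[key]
        return "".join(entry.chunks[i] for i in range(entry.total))

    def __len__(self) -> int:
        self._evict_expired()
        return len(self._entries)
```

Chunks may arrive out of order; the payload is only returned once every index from `0` to `total - 1` is present, after which nothing about that upload remains in the cache.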

Securing the Postgres Database

The Docker container running the PostgreSQL instance is executed as a non-root user. Accessing the database requires authentication.

Securing Redis

The 'Redis security' guideline was followed conscientiously, with the exception of TLS encryption, which is omitted because the gentrain-redis container is only accessible from the local Docker network. The Docker container running the Redis instance is executed as a non-root user, and the legacy authentication method is enabled. In addition, Redis is bound to the local Docker IP address to prevent access from other origins. ACL rules were configured to restrict potentially dangerous commands:

  • all commands except dangerous ones (-@dangerous) are allowed
  • commands +client|list and +keys are explicitly allowed, as python-rq makes use of them
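Redis applies ACL rules from left to right, so a later rule overrides an earlier one for the commands it matches. Assuming the rules above correspond to the rule string `+@all -@dangerous +client|list +keys`, the simplified Python model below shows why `KEYS` and `CLIENT LIST` remain available even though they belong to the dangerous category. The `DANGEROUS` set is an illustrative subset only; the real category is defined by Redis and can be listed with `ACL CAT dangerous`. The function `is_allowed` is hypothetical.

```python
# Illustrative subset of Redis' @dangerous category (not the complete list).
DANGEROUS = {"flushall", "flushdb", "config", "keys", "shutdown", "debug",
             "client|list", "client|kill"}

# Assumed rule string matching the bullet points above.
RULES = ["+@all", "-@dangerous", "+client|list", "+keys"]


def is_allowed(command: str, rules: list[str] = RULES) -> bool:
    """Evaluate ACL-style rules left to right; the last matching rule wins."""
    command = command.lower()
    allowed = False
    for rule in rules:
        sign, target = rule[0], rule[1:]
        if target == "@all":
            matches = True
        elif target == "@dangerous":
            matches = command in DANGEROUS
        else:
            matches = command == target
        if matches:
            allowed = sign == "+"
    return allowed
```

With these rules, ordinary commands such as `GET` stay allowed, `FLUSHALL` is blocked by `-@dangerous`, and `KEYS` is re-enabled by the final `+keys` rule that python-rq depends on.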

Securing the Admin Panel

Authentication

To access the GENTRAIN Admin Panel, users must authenticate themselves. In addition, role-based authorisation restricts user administration to users who hold the corresponding role.
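A common way to enforce such role checks is a guard placed in front of each privileged operation. The Python sketch below illustrates the pattern with a decorator; it is not the Admin Panel's actual code, and the role names, `AdminUser`, `requires_role` and `create_admin_user` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from functools import wraps


class Role(Enum):
    # Hypothetical role names; the actual GENTRAIN roles are not listed here.
    USER_ADMIN = "user_admin"
    STANDARD = "standard"


@dataclass(frozen=True)
class AdminUser:
    username: str
    roles: frozenset[Role]


class PermissionDenied(Exception):
    pass


def requires_role(role: Role):
    """Reject the call unless the acting user holds `role`."""
    def decorator(func):
        @wraps(func)
        def wrapper(actor: AdminUser, *args, **kwargs):
            if role not in actor.roles:
                raise PermissionDenied(f"{actor.username} lacks role {role.value}")
            return func(actor, *args, **kwargs)
        return wrapper
    return decorator


@requires_role(Role.USER_ADMIN)
def create_admin_user(actor: AdminUser, username: str) -> str:
    # Placeholder for the real user-creation logic.
    return f"created {username}"
```

Keeping the check in a decorator means every user-administration function declares its required role in one visible place instead of repeating the comparison inside each function body.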

Admin passwords must be set on first login and must meet the following rules:

  • at least 8 characters
  • at least 1 number
  • at least 1 special character ($, #, @, !, *, .)
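The password rules above translate directly into a validator. The sketch below is an assumption about how such a check could look, not the Admin Panel's own implementation; it reports every violated rule rather than only the first, and the function name `password_violations` is hypothetical.

```python
import re

# Special characters accepted by the password policy above.
SPECIAL_CHARACTERS = set("$#@!*.")


def password_violations(password: str) -> list[str]:
    """Return a message for each violated password rule (empty if valid)."""
    problems = []
    if len(password) < 8:
        problems.append("must contain at least 8 characters")
    if not re.search(r"[0-9]", password):
        problems.append("must contain at least 1 number")
    if not any(ch in SPECIAL_CHARACTERS for ch in password):
        problems.append("must contain at least 1 special character ($, #, @, !, *, .)")
    return problems
```

For example, `password_violations("secret12!")` returns an empty list, while `"abc"` violates all three rules.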

File uploads

The Ubuntu user running the admin panel is not permitted to access any directories other than the specified upload locations. Uploaded files are also strictly checked against the expected file patterns. In addition, files (including individual files from zip archives) are scanned for malware using ClamAV. If the ClamAV scan reports a threat, the upload is rejected. These checks apply to both schema and sample data uploads.
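The upload checks described above (file-name patterns, per-entry checks inside zip archives, and a malware scan) can be combined roughly as in the following Python sketch. It assumes hypothetical file patterns for CSV and FASTA files and takes the scanner as a callback so that the example runs without ClamAV; in practice that callback would delegate to ClamAV (for example its `clamdscan` client). `validate_upload`, `UploadRejected` and `ALLOWED_PATTERNS` are hypothetical names.

```python
import io
import re
import zipfile
from pathlib import PurePosixPath
from typing import Callable

# Hypothetical expected file patterns; the real GENTRAIN patterns are not listed here.
ALLOWED_PATTERNS = [re.compile(r"[\w\-]+\.csv"), re.compile(r"[\w\-]+\.(fasta|fa)")]

Scanner = Callable[[bytes], bool]  # returns True if the content is clean


class UploadRejected(Exception):
    pass


def _check_name(name: str) -> None:
    path = PurePosixPath(name)
    # Refuse absolute paths and ".." segments that could escape the upload directory.
    if path.is_absolute() or ".." in path.parts:
        raise UploadRejected(f"path escapes upload directory: {name}")
    if not any(p.fullmatch(path.name) for p in ALLOWED_PATTERNS):
        raise UploadRejected(f"unexpected file name: {name}")


def validate_upload(filename: str, content: bytes, scan: Scanner) -> list[str]:
    """Return the accepted file names, or raise UploadRejected."""
    if not scan(content):
        raise UploadRejected(f"scan flagged {filename}")
    if not filename.endswith(".zip"):
        _check_name(filename)
        return [filename]
    accepted = []
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            _check_name(info.filename)
            # Scan each extracted file individually, not just the archive as a whole.
            if not scan(archive.read(info)):
                raise UploadRejected(f"scan flagged {info.filename}")
            accepted.append(info.filename)
    return accepted
```

Any single failing entry rejects the whole upload, which matches the behaviour described above of preventing the upload when a scan reports a threat.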

Securing the Websocket Connection

WebSocket communication is TLS encrypted.
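On the client side, TLS-encrypted WebSocket communication means connecting via `wss://` with certificate and host-name verification enabled. The Python sketch below shows one way to enforce this with the standard `ssl` module; the function name, the example URL and the TLS 1.2 minimum are assumptions, not settings taken from the GENTRAIN deployment. The resulting context could then be handed to a WebSocket client library.

```python
import ssl
from urllib.parse import urlparse


def make_websocket_tls_context(url: str) -> ssl.SSLContext:
    """Refuse plain ws:// URLs and build a verifying TLS context for wss://."""
    if urlparse(url).scheme != "wss":
        raise ValueError(f"refusing unencrypted WebSocket URL: {url}")
    # create_default_context() enables certificate verification and host-name checks.
    context = ssl.create_default_context()
    # Assumed minimum protocol version for this sketch.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Rejecting `ws://` before any connection is attempted ensures that a misconfigured URL fails loudly instead of silently falling back to an unencrypted channel.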