Skip to main content

Database entities

Below is the documentation for all database entities (a.k.a. "database tables" or "database relations"). In addition to information regarding their attributes, you will also find each of their unique constraints. These unique constraints are attributes (beside the primary key) that, together, have unique values in each row.

info

PostgreSQL automatically creates simple indexes for primary keys, as well as composite indexes for unique constraints. Only simple indexes are considered for the column "Indexed?" of the tables below.

Project management entities

metadata

Some metadata to keep track of analyzes status. Among other things, this table helps implement the parsomics analysis status command of the parsomics-cli. This table is a "singleton", meaning it only has one row (Primary key = 1).

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
created_atTIMESTAMPNoNoWhen the row was created
updated_atTIMESTAMPNoNoWhen the row was last updated
statusprogressstatusNoNoOne of "IN_PROGRESS" and "DONE"
SQL creation script
CREATE TABLE metadata (
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
status progressstatus NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key)
)

project

Tracks projects under which assemblies and analyses are organized.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
created_atTIMESTAMPNoNoWhen the project was created
updated_atTIMESTAMPNoNoWhen the project was updated
statusprogressstatusNoNoProgress status
nameVARCHARNoNoUnique project name

Attributes with unique values per row (unique constraint):

  • name
SQL creation script
CREATE TABLE project (
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
status progressstatus NOT NULL,
name VARCHAR NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (name)
)

tool

Represents a software tool used in analyses.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
nameVARCHARNoYesTool name

Attributes with unique values per row (unique constraint):

  • name
SQL creation script
CREATE TABLE tool (
name VARCHAR NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (name)
)

assembly

Represents a genome assembly, linked to a project.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
created_atTIMESTAMPNoNoCreation time
updated_atTIMESTAMPNoNoLast update time
statusprogressstatusNoNoProgress status
project_keyINTEGERNoNoForeign key to project.key
nameVARCHARNoNoUnique per project

Attributes with unique values per row (unique constraint):

  • name
  • project_key
SQL creation script
CREATE TABLE assembly (
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
status progressstatus NOT NULL,
project_key INTEGER NOT NULL,
name VARCHAR NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (name, project_key),
FOREIGN KEY(project_key) REFERENCES project (key)
)

source

Represents data source metadata used in annotations.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
nameVARCHARNoNoSource name
versionVARCHARYesNoSource version
reliabilityFLOATYesNoOptional reliability score
tool_keyINTEGERNoNoForeign key to tool.key

Attributes with unique values per row (unique constraint):

  • name
  • version
SQL creation script
CREATE TABLE source (
name VARCHAR NOT NULL,
version VARCHAR,
reliability FLOAT,
tool_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (name, version),
FOREIGN KEY(tool_key) REFERENCES tool (key)
)

run

A tool execution run, with links to its inputs and outputs.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
created_atTIMESTAMPNoNoWhen the run was created
updated_atTIMESTAMPNoNoWhen the run was last updated
statusprogressstatusNoNoStatus of the run
output_directoryVARCHARNoNoPath to run output
versionVARCHARYesNoVersion of the tool
dateDATEYesNoDate the run occurred
tool_keyINTEGERNoNoForeign key to tool.key
assembly_keyINTEGERNoNoForeign key to assembly.key

Attributes with unique values per row (unique constraint):

  • output_directory
SQL creation script
CREATE TABLE run (
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
status progressstatus NOT NULL,
output_directory VARCHAR NOT NULL,
version VARCHAR,
date DATE,
tool_key INTEGER NOT NULL,
assembly_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (output_directory),
FOREIGN KEY(tool_key) REFERENCES tool (key),
FOREIGN KEY(assembly_key) REFERENCES assembly (key)
)

File and entry entities

drepdirectory

Stores paths to directories produced by dRep runs.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
pathVARCHARNoYesPath to the dRep directory
run_keyINTEGERNoNoForeign key to run.key

Attributes with unique values per row (unique constraint):

  • path
SQL creation script
CREATE TABLE drepdirectory (
path VARCHAR NOT NULL,
run_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (path),
FOREIGN KEY(run_key) REFERENCES run (key)
)

drepentry

Represents a genome identified by dRep, part of a cluster.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
genome_nameVARCHARNoNoGenome name
genome_cluster_nameVARCHARNoNoCluster name as assigned by dRep
is_winnerBOOLEANNoNoWhether it's the representative genome
directory_keyINTEGERNoNoForeign key to drepdirectory.key

Attributes with unique values per row (unique constraint):

  • genome_name
  • directory_key
SQL creation script
CREATE TABLE drepentry (
genome_name VARCHAR NOT NULL,
genome_cluster_name VARCHAR NOT NULL,
is_winner BOOLEAN NOT NULL,
directory_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (genome_name, directory_key),
FOREIGN KEY(directory_key) REFERENCES drepdirectory (key)
)

gtdbtkfile

Stores output files from GTDB-Tk classifications.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
pathVARCHARNoYesPath to GTDB-Tk file
run_keyINTEGERNoNoForeign key to run.key

Attributes with unique values per row (unique constraint):

  • path
SQL creation script
CREATE TABLE gtdbtkfile (
path VARCHAR NOT NULL,
run_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (path),
FOREIGN KEY(run_key) REFERENCES run (key)
)

gtdbtkentry

Holds GTDB-Tk classification results for a genome.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
referenceVARCHARYesNoClosest GTDB reference genome
radiusFLOATYesNoClassification radius
aniFLOATYesNoAverage Nucleotide Identity
afFLOATYesNoAlignment Fraction
classification_methodVARCHARNoNoMethod used (e.g., FastANI, Mash)
noteVARCHARYesNoOptional notes
red_valueFLOATYesNoRelative Evolutionary Divergence
warningsVARCHARYesNoAny GTDB warnings
domainVARCHARNoNoTaxonomic domain
phylumVARCHARNoNoTaxonomic phylum
klassVARCHARNoNoTaxonomic class
orderVARCHARNoNoTaxonomic order
familyVARCHARNoNoTaxonomic family
genusVARCHARNoNoTaxonomic genus
speciesVARCHARNoNoTaxonomic species
taxonomic_noveltyBOOLEANNoNoIndicates a novel taxonomic classification
genome_keyINTEGERNoNoForeign key to genome.key
file_keyINTEGERNoNoForeign key to gtdbtkfile.key

Attributes with unique values per row (unique constraint):

  • file_key
  • genome_key
SQL creation script
CREATE TABLE gtdbtkentry (
reference VARCHAR,
radius FLOAT,
ani FLOAT,
af FLOAT,
classification_method VARCHAR NOT NULL,
note VARCHAR,
red_value FLOAT,
warnings VARCHAR,
domain VARCHAR NOT NULL,
phylum VARCHAR NOT NULL,
klass VARCHAR NOT NULL,
"order" VARCHAR NOT NULL,
family VARCHAR NOT NULL,
genus VARCHAR NOT NULL,
species VARCHAR NOT NULL,
taxonomic_novelty BOOLEAN NOT NULL,
genome_key INTEGER NOT NULL,
file_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (file_key, genome_key),
FOREIGN KEY(genome_key) REFERENCES genome (key),
FOREIGN KEY(file_key) REFERENCES gtdbtkfile (key)
)

fastafile

FASTA file containing genome sequences.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
pathVARCHARNoYesFile path
run_keyINTEGERNoNoForeign key to run.key
genome_keyINTEGERNoNoForeign key to genome.key
sequence_typesequencetypeNoNoType of sequences in file

Attributes with unique values per row (unique constraint):

  • path
SQL creation script
CREATE TABLE fastafile (
path VARCHAR NOT NULL,
run_key INTEGER NOT NULL,
genome_key INTEGER NOT NULL,
sequence_type sequencetype NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (path),
FOREIGN KEY(run_key) REFERENCES run (key),
FOREIGN KEY(genome_key) REFERENCES genome (key)
)

fastaentry

A sequence entry from a FASTA file.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
sequence_nameVARCHARNoYesName of the sequence
descriptionVARCHARYesNoDescription field
sequenceVARCHARNoNoRaw nucleotide or protein sequence
file_keyINTEGERNoNoForeign key to fastafile.key
source_keyINTEGERYesNoForeign key to source.key

Attributes with unique values per row (unique constraint):

  • file_key
  • sequence_name
SQL creation script
CREATE TABLE fastaentry (
sequence_name VARCHAR NOT NULL,
description VARCHAR,
sequence VARCHAR NOT NULL,
file_key INTEGER NOT NULL,
source_key INTEGER,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (file_key, sequence_name),
FOREIGN KEY(file_key) REFERENCES fastafile (key),
FOREIGN KEY(source_key) REFERENCES source (key)
)

gfffile

GFF annotation file for a genome.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
pathVARCHARNoYesFile path
run_keyINTEGERNoNoForeign key to run.key
genome_keyINTEGERNoNoForeign key to genome.key

Attributes with unique values per row (unique constraint):

  • path
SQL creation script
CREATE TABLE gfffile (
path VARCHAR NOT NULL,
run_key INTEGER NOT NULL,
genome_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (path),
FOREIGN KEY(run_key) REFERENCES run (key),
FOREIGN KEY(genome_key) REFERENCES genome (key)
)

gffentry

Gene or feature annotation from a GFF file.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
gene_nameVARCHARYesYesName of the gene
identifierVARCHARYesYesUnique feature identifier
contig_nameVARCHARNoYesContig name
fragment_typefragmenttypeNoNoType of feature (e.g., CDS)
coord_startINTEGERNoNoStart coordinate
coord_stopINTEGERNoNoStop coordinate
scoreFLOATYesNoFeature score
strandVARCHARYesNoStrand (+/-)
phaseINTEGERYesNoPhase of the CDS
attributesJSONBYesNoAdditional feature attributes
file_keyINTEGERNoNoForeign key to gfffile.key
source_keyINTEGERNoNoForeign key to source.key

Attributes with unique values per row (unique constraint):

  • file_key
  • identifier
SQL creation script
CREATE TABLE gffentry (
gene_name VARCHAR,
identifier VARCHAR,
contig_name VARCHAR NOT NULL,
fragment_type fragmenttype NOT NULL,
coord_start INTEGER NOT NULL,
coord_stop INTEGER NOT NULL,
score FLOAT,
strand VARCHAR,
phase INTEGER,
attributes JSONB,
file_key INTEGER NOT NULL,
source_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (file_key, identifier),
FOREIGN KEY(file_key) REFERENCES gfffile (key),
FOREIGN KEY(source_key) REFERENCES source (key)
)

proteinannotationfile

Stores the path to a file containing protein-level annotations.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
pathVARCHARNoYesFile path
run_keyINTEGERNoNoForeign key to run.key
genome_keyINTEGERNoNoForeign key to genome.key

Attributes with unique values per row (unique constraint):

  • path
SQL creation script
CREATE TABLE proteinannotationfile (
path VARCHAR NOT NULL,
run_key INTEGER NOT NULL,
genome_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (path),
FOREIGN KEY(run_key) REFERENCES run (key),
FOREIGN KEY(genome_key) REFERENCES genome (key)
)

proteinannotationentry

An annotation on a protein with optional source attribution.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
descriptionVARCHARYesNoAnnotation description
coord_startINTEGERYesNoStart coordinate
coord_stopINTEGERYesNoStop coordinate
accessionVARCHARYesNoAccession identifier
scoreFLOATYesNoAnnotation score
annotation_typeVARCHARYesNoType of annotation
detailsJSONBYesNoAdditional annotation details
protein_keyINTEGERNoNoForeign key to protein.key
file_keyINTEGERNoNoForeign key to proteinannotationfile.key
source_keyINTEGERYesNoForeign key to source.key

Attributes with unique values per row (unique constraint):

  • coord_start
  • coord_stop
  • description
  • protein_key
  • file_key
  • source_key
SQL creation script
CREATE TABLE proteinannotationentry (
description VARCHAR,
coord_start INTEGER,
coord_stop INTEGER,
accession VARCHAR,
score FLOAT,
annotation_type VARCHAR,
details JSONB,
protein_key INTEGER NOT NULL,
file_key INTEGER NOT NULL,
source_key INTEGER,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (coord_start, coord_stop, description, protein_key, file_key, source_key),
FOREIGN KEY(protein_key) REFERENCES protein (key),
FOREIGN KEY(file_key) REFERENCES proteinannotationfile (key),
FOREIGN KEY(source_key) REFERENCES source (key)
)

geneannotationfile

Represents a gene annotation output file from a run.

Important

The geneannotationfile table is currently unused. It will eventually be used to allow the creation of gene annotation plugins (in addition to the existing protein annotation plugins).

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
pathVARCHARNoYesPath to gene annotation file
run_keyINTEGERNoNoForeign key to run.key
SQL creation script

Attributes with unique values per row (unique constraint):

  • path
CREATE TABLE geneannotationfile (
path VARCHAR NOT NULL,
run_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (path),
FOREIGN KEY(run_key) REFERENCES run (key)
)

geneannotationentry

Annotations associated with a gene, providing coordinates, source, and other metadata.

Important

The geneannotationentry table is currently unused. It will eventually be used to allow the creation of gene annotation plugins (in addition to the existing protein annotation plugins).

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
descriptionVARCHARYesNoDescription of the annotation
coord_startINTEGERYesNoStart coordinate
coord_stopINTEGERYesNoStop coordinate
accessionVARCHARYesNoAccession identifier
scoreFLOATYesNoAnnotation score
annotation_typeVARCHARYesNoType of annotation
detailsJSONBYesNoExtra data in JSON format
gene_keyINTEGERNoNoForeign key to gene.key
file_keyINTEGERNoNoForeign key to geneannotationfile.key
source_keyINTEGERNoNoForeign key to source.key

Attributes with unique values per row (unique constraint):

  • coord_start
  • coord_stop
  • description
  • gene_key
  • file_key
  • source_key
SQL creation script
CREATE TABLE geneannotationentry (
description VARCHAR,
coord_start INTEGER,
coord_stop INTEGER,
accession VARCHAR,
score FLOAT,
annotation_type VARCHAR,
details JSONB,
gene_key INTEGER NOT NULL,
file_key INTEGER NOT NULL,
source_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (coord_start, coord_stop, description, gene_key, file_key, source_key),
FOREIGN KEY(gene_key) REFERENCES gene (key),
FOREIGN KEY(file_key) REFERENCES geneannotationfile (key),
FOREIGN KEY(source_key) REFERENCES source (key)
)

Biological entities

sample

Represents a sample associated with a dRep directory.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
nameVARCHARNoNoSample name
drep_directory_keyINTEGERNoNoForeign key to drepdirectory.key

Attributes with unique values per row (unique constraint):

  • name
  • drep_directory_key
SQL creation script
CREATE TABLE sample (
name VARCHAR NOT NULL,
drep_directory_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (name, drep_directory_key),
FOREIGN KEY(drep_directory_key) REFERENCES drepdirectory (key)
)

genomecluster

Represents a dRep genome cluster.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
nameVARCHARNoNoName of the cluster
drep_directory_keyINTEGERNoNoForeign key to drepdirectory.key

Attributes with unique values per row (unique constraint):

  • name
  • drep_directory_key
SQL creation script
CREATE TABLE genomecluster (
name VARCHAR NOT NULL,
drep_directory_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (name, drep_directory_key),
FOREIGN KEY(drep_directory_key) REFERENCES drepdirectory (key)
)

genome

Represents a genome linked to its dRep entry, cluster, and sample.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
drep_entry_keyINTEGERNoNoForeign key to drepentry.key
genome_cluster_keyINTEGERNoNoForeign key to genomecluster.key
sample_keyINTEGERNoNoForeign key to sample.key

Attributes with unique values per row (unique constraint):

  • drep_entry_key
SQL creation script
CREATE TABLE genome (
drep_entry_key INTEGER NOT NULL,
genome_cluster_key INTEGER NOT NULL,
sample_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (drep_entry_key),
FOREIGN KEY(drep_entry_key) REFERENCES drepentry (key),
FOREIGN KEY(genome_cluster_key) REFERENCES genomecluster (key),
FOREIGN KEY(sample_key) REFERENCES sample (key)
)

contig

A contiguous DNA sequence associated with a genome.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
genome_keyINTEGERNoNoForeign key to genome.key
fasta_entry_keyINTEGERNoNoForeign key to fastaentry.key

Attributes with unique values per row (unique constraint):

  • fasta_entry_key
SQL creation script
CREATE TABLE contig (
genome_key INTEGER NOT NULL,
fasta_entry_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (fasta_entry_key),
FOREIGN KEY(genome_key) REFERENCES genome (key),
FOREIGN KEY(fasta_entry_key) REFERENCES fastaentry (key)
)

repeatedregion

A repetitive region (CRISPR) on a contig, associated with a GFF entry.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
contig_keyINTEGERNoNoForeign key to contig.key
gff_entry_keyINTEGERNoNoForeign key to gffentry.key

Attributes with unique values per row (unique constraint):

  • gff_entry_key
SQL creation script
CREATE TABLE repeatedregion (
contig_key INTEGER NOT NULL,
gff_entry_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (gff_entry_key),
FOREIGN KEY(contig_key) REFERENCES contig (key),
FOREIGN KEY(gff_entry_key) REFERENCES gffentry (key)
)

gene

Represents a gene associated with a contig and derived from a FASTA entry.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
fasta_entry_keyINTEGERNoNoForeign key to fastaentry.key
contig_keyINTEGERNoNoForeign key to contig.key

Attributes with unique values per row (unique constraint):

  • fasta_entry_key
SQL creation script
CREATE TABLE gene (
fasta_entry_key INTEGER NOT NULL,
contig_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (fasta_entry_key),
FOREIGN KEY(fasta_entry_key) REFERENCES fastaentry (key),
FOREIGN KEY(contig_key) REFERENCES contig (key)
)

fragment

Represents a genomic fragment from a gene, tied to a GFF entry.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
gene_keyINTEGERNoNoForeign key to gene.key
gff_entry_keyINTEGERNoNoForeign key to gffentry.key

Attributes with unique values per row (unique constraint):

  • gff_entry_key
SQL creation script
CREATE TABLE fragment (
gene_key INTEGER NOT NULL,
gff_entry_key INTEGER NOT NULL,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (gff_entry_key),
FOREIGN KEY(gene_key) REFERENCES gene (key),
FOREIGN KEY(gff_entry_key) REFERENCES gffentry (key)
)

Links fragments to proteins, supporting many-to-many relationships.

AttributeTypeNullable?Indexed?Description
fragment_keyINTEGERNoYesForeign key to fragment.key (Primary key)
protein_keyINTEGERNoYesForeign key to protein.key (Primary key)
SQL creation script
CREATE TABLE fragmentproteinlink (
fragment_key INTEGER NOT NULL,
protein_key INTEGER NOT NULL,
PRIMARY KEY (fragment_key, protein_key),
FOREIGN KEY(fragment_key) REFERENCES fragment (key),
FOREIGN KEY(protein_key) REFERENCES protein (key)
)

protein

Represents a protein derived from a FASTA entry.

AttributeTypeNullable?Indexed?Description
keySERIALNoYesPrimary key
fasta_entry_keyINTEGERYesNoForeign key to fastaentry.key

Attributes with unique values per row (unique constraint):

  • fasta_entry_key
SQL creation script
CREATE TABLE protein (
fasta_entry_key INTEGER,
key SERIAL NOT NULL,
PRIMARY KEY (key),
UNIQUE (fasta_entry_key),
FOREIGN KEY(fasta_entry_key) REFERENCES fastaentry (key)
)