Database entities

Below is the documentation for all database entities (a.k.a. "database tables" or "database relations"). In addition to information regarding their attributes, you will also find each of their unique constraints. These unique constraints are attributes (beside the primary key) that, together, have unique values in each row.

info

PostgreSQL automatically creates simple indexes for primary keys, as well as composite indexes for unique constraints. Only simple indexes are considered for the column "Indexed?" of the tables below.

Project management entities

`metadata`

Some metadata to keep track of analyzes status. Among other things, this table helps implement the parsomics analysis status command of the parsomics-cli. This table is a "singleton", meaning it only has one row (Primary key = 1).

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`created_at`	`TIMESTAMP`	No	No	When the row was created
`updated_at`	`TIMESTAMP`	No	No	When the row was last updated
`status`	`progressstatus`	No	No	One of "IN_PROGRESS" and "DONE"

SQL creation script

CREATE TABLE metadata (
    created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    status progressstatus NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key)
)

`project`

Tracks projects under which assemblies and analyses are organized.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`created_at`	`TIMESTAMP`	No	No	When the project was created
`updated_at`	`TIMESTAMP`	No	No	When the project was updated
`status`	`progressstatus`	No	No	Progress status
`name`	`VARCHAR`	No	No	Unique project name

Attributes with unique values per row (unique constraint):

name

SQL creation script

CREATE TABLE project (
    created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    status progressstatus NOT NULL,
    name VARCHAR NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (name)
)

`tool`

Represents a software tool used in analyses.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`name`	`VARCHAR`	No	Yes	Tool name

Attributes with unique values per row (unique constraint):

name

SQL creation script

CREATE TABLE tool (
    name VARCHAR NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (name)
)

`assembly`

Represents a genome assembly, linked to a project.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`created_at`	`TIMESTAMP`	No	No	Creation time
`updated_at`	`TIMESTAMP`	No	No	Last update time
`status`	`progressstatus`	No	No	Progress status
`project_key`	`INTEGER`	No	No	Foreign key to `project.key`
`name`	`VARCHAR`	No	No	Unique per project

Attributes with unique values per row (unique constraint):

name

project_key

SQL creation script

CREATE TABLE assembly (
    created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    status progressstatus NOT NULL,
    project_key INTEGER NOT NULL,
    name VARCHAR NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (name, project_key),
    FOREIGN KEY(project_key) REFERENCES project (key)
)

`source`

Represents data source metadata used in annotations.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`name`	`VARCHAR`	No	No	Source name
`version`	`VARCHAR`	Yes	No	Source version
`reliability`	`FLOAT`	Yes	No	Optional reliability score
`tool_key`	`INTEGER`	No	No	Foreign key to `tool.key`

Attributes with unique values per row (unique constraint):

name

version

SQL creation script

CREATE TABLE source (
    name VARCHAR NOT NULL,
    version VARCHAR,
    reliability FLOAT,
    tool_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (name, version),
    FOREIGN KEY(tool_key) REFERENCES tool (key)
)

`run`

A tool execution run, with links to its inputs and outputs.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`created_at`	`TIMESTAMP`	No	No	When the run was created
`updated_at`	`TIMESTAMP`	No	No	When the run was last updated
`status`	`progressstatus`	No	No	Status of the run
`output_directory`	`VARCHAR`	No	No	Path to run output
`version`	`VARCHAR`	Yes	No	Version of the tool
`date`	`DATE`	Yes	No	Date the run occurred
`tool_key`	`INTEGER`	No	No	Foreign key to `tool.key`
`assembly_key`	`INTEGER`	No	No	Foreign key to `assembly.key`

Attributes with unique values per row (unique constraint):

output_directory

SQL creation script

CREATE TABLE run (
    created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    updated_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    status progressstatus NOT NULL,
    output_directory VARCHAR NOT NULL,
    version VARCHAR,
    date DATE,
    tool_key INTEGER NOT NULL,
    assembly_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (output_directory),
    FOREIGN KEY(tool_key) REFERENCES tool (key),
    FOREIGN KEY(assembly_key) REFERENCES assembly (key)
)

File and entry entities

`drepdirectory`

Stores paths to directories produced by dRep runs.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`path`	`VARCHAR`	No	Yes	Path to the dRep directory
`run_key`	`INTEGER`	No	No	Foreign key to `run.key`

Attributes with unique values per row (unique constraint):

path

SQL creation script

CREATE TABLE drepdirectory (
    path VARCHAR NOT NULL,
    run_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (path),
    FOREIGN KEY(run_key) REFERENCES run (key)
)

`drepentry`

Represents a genome identified by dRep, part of a cluster.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`genome_name`	`VARCHAR`	No	No	Genome name
`genome_cluster_name`	`VARCHAR`	No	No	Cluster name as assigned by dRep
`is_winner`	`BOOLEAN`	No	No	Whether it's the representative genome
`directory_key`	`INTEGER`	No	No	Foreign key to `drepdirectory.key`

Attributes with unique values per row (unique constraint):

genome_name

directory_key

SQL creation script

CREATE TABLE drepentry (
    genome_name VARCHAR NOT NULL,
    genome_cluster_name VARCHAR NOT NULL,
    is_winner BOOLEAN NOT NULL,
    directory_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (genome_name, directory_key),
    FOREIGN KEY(directory_key) REFERENCES drepdirectory (key)
)

`gtdbtkfile`

Stores output files from GTDB-Tk classifications.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`path`	`VARCHAR`	No	Yes	Path to GTDB-Tk file
`run_key`	`INTEGER`	No	No	Foreign key to `run.key`

Attributes with unique values per row (unique constraint):

path

SQL creation script

CREATE TABLE gtdbtkfile (
    path VARCHAR NOT NULL,
    run_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (path),
    FOREIGN KEY(run_key) REFERENCES run (key)
)

`gtdbtkentry`

Holds GTDB-Tk classification results for a genome.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`reference`	`VARCHAR`	Yes	No	Closest GTDB reference genome
`radius`	`FLOAT`	Yes	No	Classification radius
`ani`	`FLOAT`	Yes	No	Average Nucleotide Identity
`af`	`FLOAT`	Yes	No	Alignment Fraction
`classification_method`	`VARCHAR`	No	No	Method used (e.g., FastANI, Mash)
`note`	`VARCHAR`	Yes	No	Optional notes
`red_value`	`FLOAT`	Yes	No	Relative Evolutionary Divergence
`warnings`	`VARCHAR`	Yes	No	Any GTDB warnings
`domain`	`VARCHAR`	No	No	Taxonomic domain
`phylum`	`VARCHAR`	No	No	Taxonomic phylum
`klass`	`VARCHAR`	No	No	Taxonomic class
`order`	`VARCHAR`	No	No	Taxonomic order
`family`	`VARCHAR`	No	No	Taxonomic family
`genus`	`VARCHAR`	No	No	Taxonomic genus
`species`	`VARCHAR`	No	No	Taxonomic species
`taxonomic_novelty`	`BOOLEAN`	No	No	Indicates a novel taxonomic classification
`genome_key`	`INTEGER`	No	No	Foreign key to `genome.key`
`file_key`	`INTEGER`	No	No	Foreign key to `gtdbtkfile.key`

Attributes with unique values per row (unique constraint):

file_key

genome_key

SQL creation script

CREATE TABLE gtdbtkentry (
    reference VARCHAR,
    radius FLOAT,
    ani FLOAT,
    af FLOAT,
    classification_method VARCHAR NOT NULL,
    note VARCHAR,
    red_value FLOAT,
    warnings VARCHAR,
    domain VARCHAR NOT NULL,
    phylum VARCHAR NOT NULL,
    klass VARCHAR NOT NULL,
    "order" VARCHAR NOT NULL,
    family VARCHAR NOT NULL,
    genus VARCHAR NOT NULL,
    species VARCHAR NOT NULL,
    taxonomic_novelty BOOLEAN NOT NULL,
    genome_key INTEGER NOT NULL,
    file_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (file_key, genome_key),
    FOREIGN KEY(genome_key) REFERENCES genome (key),
    FOREIGN KEY(file_key) REFERENCES gtdbtkfile (key)
)

`fastafile`

FASTA file containing genome sequences.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`path`	`VARCHAR`	No	Yes	File path
`run_key`	`INTEGER`	No	No	Foreign key to `run.key`
`genome_key`	`INTEGER`	No	No	Foreign key to `genome.key`
`sequence_type`	`sequencetype`	No	No	Type of sequences in file

Attributes with unique values per row (unique constraint):

path

SQL creation script

CREATE TABLE fastafile (
    path VARCHAR NOT NULL,
    run_key INTEGER NOT NULL,
    genome_key INTEGER NOT NULL,
    sequence_type sequencetype NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (path),
    FOREIGN KEY(run_key) REFERENCES run (key),
    FOREIGN KEY(genome_key) REFERENCES genome (key)
)

`fastaentry`

A sequence entry from a FASTA file.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`sequence_name`	`VARCHAR`	No	Yes	Name of the sequence
`description`	`VARCHAR`	Yes	No	Description field
`sequence`	`VARCHAR`	No	No	Raw nucleotide or protein sequence
`file_key`	`INTEGER`	No	No	Foreign key to `fastafile.key`
`source_key`	`INTEGER`	Yes	No	Foreign key to `source.key`

Attributes with unique values per row (unique constraint):

file_key

sequence_name

SQL creation script

CREATE TABLE fastaentry (
    sequence_name VARCHAR NOT NULL,
    description VARCHAR,
    sequence VARCHAR NOT NULL,
    file_key INTEGER NOT NULL,
    source_key INTEGER,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (file_key, sequence_name),
    FOREIGN KEY(file_key) REFERENCES fastafile (key),
    FOREIGN KEY(source_key) REFERENCES source (key)
)

`gfffile`

GFF annotation file for a genome.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`path`	`VARCHAR`	No	Yes	File path
`run_key`	`INTEGER`	No	No	Foreign key to `run.key`
`genome_key`	`INTEGER`	No	No	Foreign key to `genome.key`

Attributes with unique values per row (unique constraint):

path

SQL creation script

CREATE TABLE gfffile (
    path VARCHAR NOT NULL,
    run_key INTEGER NOT NULL,
    genome_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (path),
    FOREIGN KEY(run_key) REFERENCES run (key),
    FOREIGN KEY(genome_key) REFERENCES genome (key)
)

`gffentry`

Gene or feature annotation from a GFF file.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`gene_name`	`VARCHAR`	Yes	Yes	Name of the gene
`identifier`	`VARCHAR`	Yes	Yes	Unique feature identifier
`contig_name`	`VARCHAR`	No	Yes	Contig name
`fragment_type`	`fragmenttype`	No	No	Type of feature (e.g., CDS)
`coord_start`	`INTEGER`	No	No	Start coordinate
`coord_stop`	`INTEGER`	No	No	Stop coordinate
`score`	`FLOAT`	Yes	No	Feature score
`strand`	`VARCHAR`	Yes	No	Strand (+/-)
`phase`	`INTEGER`	Yes	No	Phase of the CDS
`attributes`	`JSONB`	Yes	No	Additional feature attributes
`file_key`	`INTEGER`	No	No	Foreign key to `gfffile.key`
`source_key`	`INTEGER`	No	No	Foreign key to `source.key`

Attributes with unique values per row (unique constraint):

file_key

identifier

SQL creation script

CREATE TABLE gffentry (
    gene_name VARCHAR,
    identifier VARCHAR,
    contig_name VARCHAR NOT NULL,
    fragment_type fragmenttype NOT NULL,
    coord_start INTEGER NOT NULL,
    coord_stop INTEGER NOT NULL,
    score FLOAT,
    strand VARCHAR,
    phase INTEGER,
    attributes JSONB,
    file_key INTEGER NOT NULL,
    source_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (file_key, identifier),
    FOREIGN KEY(file_key) REFERENCES gfffile (key),
    FOREIGN KEY(source_key) REFERENCES source (key)
)

`proteinannotationfile`

Stores the path to a file containing protein-level annotations.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`path`	`VARCHAR`	No	Yes	File path
`run_key`	`INTEGER`	No	No	Foreign key to `run.key`
`genome_key`	`INTEGER`	No	No	Foreign key to `genome.key`

Attributes with unique values per row (unique constraint):

path

SQL creation script

CREATE TABLE proteinannotationfile (
    path VARCHAR NOT NULL,
    run_key INTEGER NOT NULL,
    genome_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (path),
    FOREIGN KEY(run_key) REFERENCES run (key),
    FOREIGN KEY(genome_key) REFERENCES genome (key)
)

`proteinannotationentry`

An annotation on a protein with optional source attribution.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`description`	`VARCHAR`	Yes	No	Annotation description
`coord_start`	`INTEGER`	Yes	No	Start coordinate
`coord_stop`	`INTEGER`	Yes	No	Stop coordinate
`accession`	`VARCHAR`	Yes	No	Accession identifier
`score`	`FLOAT`	Yes	No	Annotation score
`annotation_type`	`VARCHAR`	Yes	No	Type of annotation
`details`	`JSONB`	Yes	No	Additional annotation details
`protein_key`	`INTEGER`	No	No	Foreign key to `protein.key`
`file_key`	`INTEGER`	No	No	Foreign key to `proteinannotationfile.key`
`source_key`	`INTEGER`	Yes	No	Foreign key to `source.key`

Attributes with unique values per row (unique constraint):

coord_start

coord_stop

description

protein_key

file_key

source_key

SQL creation script

CREATE TABLE proteinannotationentry (
    description VARCHAR,
    coord_start INTEGER,
    coord_stop INTEGER,
    accession VARCHAR,
    score FLOAT,
    annotation_type VARCHAR,
    details JSONB,
    protein_key INTEGER NOT NULL,
    file_key INTEGER NOT NULL,
    source_key INTEGER,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (coord_start, coord_stop, description, protein_key, file_key, source_key),
    FOREIGN KEY(protein_key) REFERENCES protein (key),
    FOREIGN KEY(file_key) REFERENCES proteinannotationfile (key),
    FOREIGN KEY(source_key) REFERENCES source (key)
)

`geneannotationfile`

Represents a gene annotation output file from a run.

Important

The geneannotationfile table is currently unused. It will eventually be used to allow the creation of gene annotation plugins (in addition to the existing protein annotation plugins).

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`path`	`VARCHAR`	No	Yes	Path to gene annotation file
`run_key`	`INTEGER`	No	No	Foreign key to `run.key`

SQL creation script

Attributes with unique values per row (unique constraint):

path

CREATE TABLE geneannotationfile (
    path VARCHAR NOT NULL,
    run_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (path),
    FOREIGN KEY(run_key) REFERENCES run (key)
)

`geneannotationentry`

Annotations associated with a gene, providing coordinates, source, and other metadata.

Important

The geneannotationentry table is currently unused. It will eventually be used to allow the creation of gene annotation plugins (in addition to the existing protein annotation plugins).

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`description`	`VARCHAR`	Yes	No	Description of the annotation
`coord_start`	`INTEGER`	Yes	No	Start coordinate
`coord_stop`	`INTEGER`	Yes	No	Stop coordinate
`accession`	`VARCHAR`	Yes	No	Accession identifier
`score`	`FLOAT`	Yes	No	Annotation score
`annotation_type`	`VARCHAR`	Yes	No	Type of annotation
`details`	`JSONB`	Yes	No	Extra data in JSON format
`gene_key`	`INTEGER`	No	No	Foreign key to `gene.key`
`file_key`	`INTEGER`	No	No	Foreign key to `geneannotationfile.key`
`source_key`	`INTEGER`	No	No	Foreign key to `source.key`

Attributes with unique values per row (unique constraint):

coord_start

coord_stop

description

gene_key

file_key

source_key

SQL creation script

CREATE TABLE geneannotationentry (
    description VARCHAR,
    coord_start INTEGER,
    coord_stop INTEGER,
    accession VARCHAR,
    score FLOAT,
    annotation_type VARCHAR,
    details JSONB,
    gene_key INTEGER NOT NULL,
    file_key INTEGER NOT NULL,
    source_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (coord_start, coord_stop, description, gene_key, file_key, source_key),
    FOREIGN KEY(gene_key) REFERENCES gene (key),
    FOREIGN KEY(file_key) REFERENCES geneannotationfile (key),
    FOREIGN KEY(source_key) REFERENCES source (key)
)

Biological entities

`sample`

Represents a sample associated with a dRep directory.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`name`	`VARCHAR`	No	No	Sample name
`drep_directory_key`	`INTEGER`	No	No	Foreign key to `drepdirectory.key`

Attributes with unique values per row (unique constraint):

name

drep_directory_key

SQL creation script

CREATE TABLE sample (
    name VARCHAR NOT NULL,
    drep_directory_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (name, drep_directory_key),
    FOREIGN KEY(drep_directory_key) REFERENCES drepdirectory (key)
)

`genomecluster`

Represents a dRep genome cluster.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`name`	`VARCHAR`	No	No	Name of the cluster
`drep_directory_key`	`INTEGER`	No	No	Foreign key to `drepdirectory.key`

Attributes with unique values per row (unique constraint):

name

drep_directory_key

SQL creation script

CREATE TABLE genomecluster (
    name VARCHAR NOT NULL,
    drep_directory_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (name, drep_directory_key),
    FOREIGN KEY(drep_directory_key) REFERENCES drepdirectory (key)
)

`genome`

Represents a genome linked to its dRep entry, cluster, and sample.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`drep_entry_key`	`INTEGER`	No	No	Foreign key to `drepentry.key`
`genome_cluster_key`	`INTEGER`	No	No	Foreign key to `genomecluster.key`
`sample_key`	`INTEGER`	No	No	Foreign key to `sample.key`

Attributes with unique values per row (unique constraint):

drep_entry_key

SQL creation script

CREATE TABLE genome (
    drep_entry_key INTEGER NOT NULL,
    genome_cluster_key INTEGER NOT NULL,
    sample_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (drep_entry_key),
    FOREIGN KEY(drep_entry_key) REFERENCES drepentry (key),
    FOREIGN KEY(genome_cluster_key) REFERENCES genomecluster (key),
    FOREIGN KEY(sample_key) REFERENCES sample (key)
)

`contig`

A contiguous DNA sequence associated with a genome.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`genome_key`	`INTEGER`	No	No	Foreign key to `genome.key`
`fasta_entry_key`	`INTEGER`	No	No	Foreign key to `fastaentry.key`

Attributes with unique values per row (unique constraint):

fasta_entry_key

SQL creation script

CREATE TABLE contig (
    genome_key INTEGER NOT NULL,
    fasta_entry_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (fasta_entry_key),
    FOREIGN KEY(genome_key) REFERENCES genome (key),
    FOREIGN KEY(fasta_entry_key) REFERENCES fastaentry (key)
)

`repeatedregion`

A repetitive region (CRISPR) on a contig, associated with a GFF entry.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`contig_key`	`INTEGER`	No	No	Foreign key to `contig.key`
`gff_entry_key`	`INTEGER`	No	No	Foreign key to `gffentry.key`

Attributes with unique values per row (unique constraint):

gff_entry_key

SQL creation script

CREATE TABLE repeatedregion (
    contig_key INTEGER NOT NULL,
    gff_entry_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (gff_entry_key),
    FOREIGN KEY(contig_key) REFERENCES contig (key),
    FOREIGN KEY(gff_entry_key) REFERENCES gffentry (key)
)

`gene`

Represents a gene associated with a contig and derived from a FASTA entry.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`fasta_entry_key`	`INTEGER`	No	No	Foreign key to `fastaentry.key`
`contig_key`	`INTEGER`	No	No	Foreign key to `contig.key`

Attributes with unique values per row (unique constraint):

fasta_entry_key

SQL creation script

CREATE TABLE gene (
    fasta_entry_key INTEGER NOT NULL,
    contig_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (fasta_entry_key),
    FOREIGN KEY(fasta_entry_key) REFERENCES fastaentry (key),
    FOREIGN KEY(contig_key) REFERENCES contig (key)
)

`fragment`

Represents a genomic fragment from a gene, tied to a GFF entry.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`gene_key`	`INTEGER`	No	No	Foreign key to `gene.key`
`gff_entry_key`	`INTEGER`	No	No	Foreign key to `gffentry.key`

Attributes with unique values per row (unique constraint):

gff_entry_key

SQL creation script

CREATE TABLE fragment (
    gene_key INTEGER NOT NULL,
    gff_entry_key INTEGER NOT NULL,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (gff_entry_key),
    FOREIGN KEY(gene_key) REFERENCES gene (key),
    FOREIGN KEY(gff_entry_key) REFERENCES gffentry (key)
)

`fragmentproteinlink`

Links fragments to proteins, supporting many-to-many relationships.

Attribute	Type	Nullable?	Indexed?	Description
`fragment_key`	`INTEGER`	No	Yes	Foreign key to `fragment.key` (Primary key)
`protein_key`	`INTEGER`	No	Yes	Foreign key to `protein.key` (Primary key)

SQL creation script

CREATE TABLE fragmentproteinlink (
    fragment_key INTEGER NOT NULL,
    protein_key INTEGER NOT NULL,
    PRIMARY KEY (fragment_key, protein_key),
    FOREIGN KEY(fragment_key) REFERENCES fragment (key),
    FOREIGN KEY(protein_key) REFERENCES protein (key)
)

`protein`

Represents a protein derived from a FASTA entry.

Attribute	Type	Nullable?	Indexed?	Description
`key`	`SERIAL`	No	Yes	Primary key
`fasta_entry_key`	`INTEGER`	Yes	No	Foreign key to `fastaentry.key`

Attributes with unique values per row (unique constraint):

fasta_entry_key

SQL creation script

CREATE TABLE protein (
    fasta_entry_key INTEGER,
    key SERIAL NOT NULL,
    PRIMARY KEY (key),
    UNIQUE (fasta_entry_key),
    FOREIGN KEY(fasta_entry_key) REFERENCES fastaentry (key)
)

Project management entities​

metadata​

project​

tool​

assembly​

source​

run​

File and entry entities​

drepdirectory​

drepentry​

gtdbtkfile​

gtdbtkentry​

fastafile​

fastaentry​

gfffile​

gffentry​

proteinannotationfile​

proteinannotationentry​

geneannotationfile​

geneannotationentry​

Biological entities​

sample​

genomecluster​

genome​

contig​

repeatedregion​

gene​

fragment​

fragmentproteinlink​

protein​

Project management entities

`metadata`

`project`

`tool`

`assembly`

`source`

`run`

File and entry entities

`drepdirectory`

`drepentry`

`gtdbtkfile`

`gtdbtkentry`

`fastafile`

`fastaentry`

`gfffile`

`gffentry`

`proteinannotationfile`

`proteinannotationentry`

`geneannotationfile`

`geneannotationentry`

Biological entities

`sample`

`genomecluster`

`genome`

`contig`

`repeatedregion`

`gene`

`fragment`

`fragmentproteinlink`

`protein`