
GTDB-Tk

File naming

The file name must end with ".summary.tsv".
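The naming rule can be checked before any parsing is attempted. The sketch below is only an illustration: the helper name `has_valid_name` is hypothetical, and a case-sensitive comparison is an assumption.

```python
from pathlib import Path

REQUIRED_SUFFIX = ".summary.tsv"


def has_valid_name(path: str) -> bool:
    """Return True if the file name ends with the required suffix.

    Assumption: the comparison is case-sensitive.
    """
    return Path(path).name.endswith(REQUIRED_SUFFIX)
```

For example, `has_valid_name("gtdbtk.bac120.summary.tsv")` is true, while a file named `gtdbtk.bac120.summary.csv` is rejected.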

File format

tip

For more information on the GTDB-Tk output format, visit the GTDB-Tk documentation.

The file must include a header row (the column names on the first line). The column names themselves can be anything, but the columns must appear in exactly the following order:

| Column name | Column obligatoriness | Data type | Data nullability |
| --- | --- | --- | --- |
| user_genome | Mandatory | String | Not nullable |
| classification | Mandatory | String | Not nullable |
| closest_genome_reference | Mandatory | String | Nullable |
| closest_genome_reference_radius | Mandatory | Float | Nullable |
| closest_genome_taxonomy | Mandatory (ignored) | N/A | N/A |
| closest_genome_ani | Mandatory | Float | Nullable |
| closest_genome_af | Mandatory | Float | Nullable |
| closest_placement_reference | Mandatory | String | Nullable |
| closest_placement_radius | Mandatory | Float | Nullable |
| closest_placement_taxonomy | Mandatory (ignored) | N/A | N/A |
| closest_placement_ani | Mandatory | Float | Nullable |
| closest_placement_af | Mandatory | Float | Nullable |
| pplacer_taxonomy | Mandatory (ignored) | N/A | N/A |
| classification_method | Mandatory | String | Not nullable |
| note | Mandatory | String | Nullable |
| other_related_references | Mandatory (ignored) | N/A | N/A |
| msa_percent | Mandatory (ignored) | N/A | N/A |
| translation_table | Mandatory (ignored) | N/A | N/A |
| red_value | Mandatory | Float | Nullable |
| warnings | Mandatory | String | Nullable |

info

Why are there mandatory columns that are ignored?

This is a consequence of how the GTDB-Tk file parser is written. When the file is read, it must match a predefined schema (column order and types), even though some of these columns are dropped afterwards.
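The read-then-drop behaviour described above can be sketched as follows. This is a minimal illustration, not the actual parser: the names `SCHEMA`, `NULL_TOKENS`, `parse_row` and `parse_summary` are hypothetical, and treating an empty cell or `N/A` as a null value is an assumption about how missing values are written.

```python
import csv
import io

# (column name, type, nullable) in the required order.
# A type of None marks a column that must be present but is ignored.
SCHEMA = [
    ("user_genome", str, False),
    ("classification", str, False),
    ("closest_genome_reference", str, True),
    ("closest_genome_reference_radius", float, True),
    ("closest_genome_taxonomy", None, True),
    ("closest_genome_ani", float, True),
    ("closest_genome_af", float, True),
    ("closest_placement_reference", str, True),
    ("closest_placement_radius", float, True),
    ("closest_placement_taxonomy", None, True),
    ("closest_placement_ani", float, True),
    ("closest_placement_af", float, True),
    ("pplacer_taxonomy", None, True),
    ("classification_method", str, False),
    ("note", str, True),
    ("other_related_references", None, True),
    ("msa_percent", None, True),
    ("translation_table", None, True),
    ("red_value", float, True),
    ("warnings", str, True),
]

NULL_TOKENS = {"", "N/A"}  # assumption: how missing values are written


def parse_row(cells):
    """Check one row against SCHEMA, then drop the ignored columns."""
    if len(cells) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} columns, got {len(cells)}")
    record = {}
    for (name, dtype, nullable), cell in zip(SCHEMA, cells):
        if dtype is None:
            continue  # present in the file, dropped from the result
        if cell.strip() in NULL_TOKENS:
            if not nullable:
                raise ValueError(f"{name} must not be null")
            record[name] = None
        else:
            record[name] = float(cell) if dtype is float else cell
    return record


def parse_summary(text):
    """Parse the text of a summary file into a list of records."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None or len(header) != len(SCHEMA):
        raise ValueError("missing or incomplete header row")
    # Header names are not compared: only the column order matters.
    return [parse_row(row) for row in reader]
```

Because columns are matched by position, a file with fewer than 20 columns is rejected even if the missing ones are among those that would be ignored.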

Mapping to database

GTDBTkTsvFile

| Original data | GTDBTkTsvFile field | Notes |
| --- | --- | --- |
| GTDB-Tk file path | path | |

GTDBTkTsvEntry

| Original data | GTDBTkTsvEntry field | Notes |
| --- | --- | --- |
| user_genome | genome_key | The MAG name in the GFF file name is used to query the primary key of the corresponding genome in the database |
| classification | domain, phylum, klass, order, family, genus, species | The classification column is broken down into multiple fields for better readability |
| closest_genome_reference or closest_placement_reference column | Mandatory | |
| closest_genome_reference_radius or closest_placement_radius column | Mandatory | |
| closest_genome_ani or closest_placement_ani column | Mandatory | |
| closest_genome_af or closest_placement_af column | Mandatory | |
| classification_method | classification_method | |
| note | note | |
| red_value | red_value | |
| warnings | warnings | |
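As an illustration of how the classification column could be broken down into the seven rank fields, here is a minimal sketch. It assumes the usual GTDB taxonomy string layout, in which ranks are separated by `;` and carry prefixes such as `d__` and `p__`. The helper name `split_classification` is hypothetical, and storing an empty or missing rank as `None` is an assumption.

```python
# Map each GTDB rank prefix to the corresponding entry field.
RANK_FIELDS = {
    "d": "domain",
    "p": "phylum",
    "c": "klass",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def split_classification(classification: str) -> dict:
    """Split a GTDB taxonomy string into one value per rank field."""
    fields = dict.fromkeys(RANK_FIELDS.values())  # every rank starts as None
    for part in classification.split(";"):
        prefix, sep, name = part.strip().partition("__")
        if sep and prefix in RANK_FIELDS:
            # Assumption: an empty rank such as "s__" is stored as None.
            fields[RANK_FIELDS[prefix]] = name or None
    return fields
```

A string that contains no rank prefixes at all leaves every field set to `None` in this sketch.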