ProteInfer
Predicting the functional properties of protein sequences using deep neural networks.
Important
This file type requires the parsomics-plugin-proteinfer plugin.
File naming
The file names must adhere to one of the following patterns:
- `<MAG-name>_ProteInfer_out.tsv`
- `<MAG-name>.tsv`
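As a rough illustration of the two naming patterns, the sketch below recovers the MAG name from a file name. The helper `mag_name_from_filename` and its regular expression are hypothetical and are not taken from the plugin. It also assumes that the `_ProteInfer_out` suffix, when present, is not part of the MAG name.

```python
import re
from pathlib import Path

# Hypothetical pattern (not the plugin's own): an optional "_ProteInfer_out"
# suffix followed by the ".tsv" extension. The lazy ".+?" lets the optional
# suffix be stripped from the MAG name when it is present.
_FILENAME_PATTERN = re.compile(r"^(?P<mag>.+?)(?:_ProteInfer_out)?\.tsv$")


def mag_name_from_filename(path):
    """Return the MAG name encoded in a ProteInfer file name, or None."""
    match = _FILENAME_PATTERN.match(Path(path).name)
    return match.group("mag") if match else None
```

Under these assumptions, `results/bin1_ProteInfer_out.tsv` and `results/bin1.tsv` both yield the MAG name `bin1`, while a file with a different extension yields `None`.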
File format
The file must begin with a header row (i.e. column names on the first line). The following columns are recognized:
| Column name | Requirement | Data type | Nullability |
|---|---|---|---|
| sequence_name | Mandatory | String | Not nullable |
| predicted_label | Mandatory | String | Nullable |
| confidence | Mandatory | String | Nullable |
| description | Optional | String | Nullable |
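The column rules above can be checked before a file is processed. The following is a minimal sketch, not the plugin's implementation. The function `read_proteinfer_rows` is a hypothetical name, and treating an empty cell as a null value is an assumption, since the table does not say how nulls are encoded.

```python
import csv
import io

MANDATORY_COLUMNS = ("sequence_name", "predicted_label", "confidence")


def read_proteinfer_rows(handle):
    """Read a ProteInfer TSV from an open text handle and validate its columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in MANDATORY_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing mandatory column(s): {', '.join(missing)}")

    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        if not row["sequence_name"]:
            raise ValueError(f"line {line_number}: sequence_name must not be empty")
        rows.append({
            "sequence_name": row["sequence_name"],
            # Assumption: an empty cell stands for a null value.
            "predicted_label": row["predicted_label"] or None,
            "confidence": row["confidence"] or None,
            # description is optional, so the column may be absent entirely.
            "description": row.get("description") or None,
        })
    return rows
```

It can be tried on an in-memory sample (the row values here are made up for illustration):

```python
sample = "sequence_name\tpredicted_label\tconfidence\nprot_1\tPfam:CL0023\t0.98\n"
rows = read_proteinfer_rows(io.StringIO(sample))
```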
Mapping to database
ProteinAnnotationFile
| Original data | ProteinAnnotationFile field |
|---|---|
| ProteInfer TSV file path | path |
ProteinAnnotationEntry
| Original data | ProteinAnnotationEntry field |
|---|---|
| sequence_name | protein_key [1] |
| predicted_label | accession and annotation_type [2] |
| confidence | score |
| description | description |
Footnotes
1. The protein name in the ProteInfer TSV file is used to query the primary key of the corresponding protein in the database.
2. The `predicted_label` column in ProteInfer TSV files is formatted as `<Annotation-type>:<Accession>`. For example, given `Pfam:CL0023`, this plugin sets `annotation_type` to "PFAM" and `accession` to "CL0023".
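
The handling of `predicted_label` described in the footnote can be sketched as follows. The function `split_predicted_label` is a hypothetical helper, not the plugin's code. Splitting on the first colon and upper-casing the annotation type are assumptions inferred from the single `Pfam:CL0023` example.

```python
def split_predicted_label(predicted_label):
    """Split '<Annotation-type>:<Accession>' into (annotation_type, accession)."""
    # predicted_label is nullable, so pass a missing value through unchanged.
    if predicted_label is None:
        return None, None
    # Assumption: only the first colon separates the two parts.
    annotation_type, separator, accession = predicted_label.partition(":")
    if not separator or not annotation_type or not accession:
        raise ValueError(f"unexpected predicted_label format: {predicted_label!r}")
    # Assumption: the annotation type is upper-cased ("Pfam" becomes "PFAM").
    return annotation_type.upper(), accession
```

With these assumptions, `split_predicted_label("Pfam:CL0023")` returns `("PFAM", "CL0023")`.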