
Configuration

With the parsomics configuration file, you can set:

  • Which plugins to activate
  • The names of your projects
  • The paths of directories with files to parse
  • Etc.

Configuration file scope

You can have multiple configuration files, at different scopes.

Scope    Path
System   /etc/parsomics/config.toml
User     $HOME/.config/parsomics/config.toml

Users can also opt to place a configuration file at a custom location. This will be further explained in the next section.

Configuration file resolution

If both user-scoped and system-scoped configuration files exist for the currently logged-in user, then the system-scoped configuration file is ignored and the user-scoped configuration file is used.

Configuration file format

The parsomics configuration file uses the TOML format. Here's an example of a configuration file:

config.toml
# Environment settings -------------------------------------------------------

environment = "PRODUCTION"

# Plugins --------------------------------------------------------------------

plugins = [
"parsomics-plugin-dbcan",
"parsomics-plugin-interpro",
"parsomics-plugin-clean",
"parsomics-plugin-proteinfer",
]


# Projects -------------------------------------------------------------------

[[Project]]

name = "bagasse-10"

[[Project.Assembly]]

name = "PACBIO"

[Project.Assembly.drep]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/dRep"

[Project.Assembly.prokka]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/prokka"

[Project.Assembly.gtdbtk]
output_directory = "/ibira/scratch/Projects/bagasse-10/GTDB-Tk/pacbio"

[Project.Assembly.interpro]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/interpro"

[Project.Assembly.dbcan]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/dbCAN"

[Project.Assembly.clean]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/CLEAN"

[Project.Assembly.proteinfer]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/proteinfer"

[[Project]]

name = "manatee-4"

[[Project.Assembly]]

name = "PACBIO-ILLUMINA-HYBRID"

[Project.Assembly.drep]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/dRep/"

[Project.Assembly.prokka]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/prokka/"

[Project.Assembly.gtdbtk]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/GTDB-Tk/"

[Project.Assembly.interpro]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/interpro/"

[Project.Assembly.dbcan]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/dbCAN/"

[[Project.Assembly]]

name = "PACBIO"

[Project.Assembly.drep]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/dRep/"

[Project.Assembly.prokka]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/prokka/"

[Project.Assembly.gtdbtk]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/GTDB-Tk/"

[Project.Assembly.interpro]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/interpro/"

[Project.Assembly.dbcan]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/dbCAN/"

[Project.Assembly.clean]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/CLEAN/"

[[Project]]

name = "plastics-6"

[[Project.Assembly]]

name = "ILLUMINA"

[Project.Assembly.drep]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/dRep"

[Project.Assembly.prokka]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/prokka/"

[Project.Assembly.gtdbtk]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/GTDB-Tk/"

[Project.Assembly.interpro]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/interpro/"

[Project.Assembly.dbcan]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/dbCAN/"

Configuration file structure

The configuration file has a hierarchical structure. At the top, you set the "global" variables, which influence parsomics' behavior in general and apply to every project. These variables are environment and plugins, which are explained in the coming sections.

Below that, there are project-wide configurations. Within each project-wide configuration, you must set a project name, as well as one or more assembly-wide configurations for the assemblies of that project.

Within each assembly-wide configuration, you must set an assembly name, as well as one or more tool-wide configurations for the tools that were used in that assembly.

Finally, within each tool-wide configuration you must set output_directory, the path to the directory containing the files produced by that tool run. You can also optionally set date, the date when the tool was executed, in the format "%d/%m/%Y", and version, a string with the version of the tool used for that run.

Configuration file fields

Global

environment

This field is mandatory.

The purpose of this field is to set the logging level. Its value can be either "PRODUCTION" (sets the logging level to WARNING) or "DEVELOPMENT" (sets the logging level to INFO).
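A minimal sketch of the mapping described above, using Python's logging module. LOG_LEVELS and log_level_for are hypothetical names, and the sketch assumes the value must match exactly (uppercase); parsomics itself may handle other values differently.

```python
import logging

# Hypothetical mapping mirroring the description above.
LOG_LEVELS = {
    "PRODUCTION": logging.WARNING,
    "DEVELOPMENT": logging.INFO,
}


def log_level_for(environment: str) -> int:
    """Return the logging level for an environment value; reject anything else."""
    try:
        return LOG_LEVELS[environment]
    except KeyError:
        raise ValueError(
            f"environment must be one of {sorted(LOG_LEVELS)}, got {environment!r}"
        ) from None


# Example: configure the root logger for a production run.
logging.basicConfig(level=log_level_for("PRODUCTION"))
```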

plugins

This field is mandatory.

A list of the names of the plugin packages that parsomics may use. A plugin must be included in this list to be used; however, a plugin in the list does not have to be used. For example, you can have "parsomics-plugin-interpro" in the list without configuring interpro in any assembly of any project.

Leave the list empty (i.e. []) if you are not using any plugins.

[[Project]]

Starts a project-wide configuration. A single configuration file may have one or more project-wide configurations.

Project-wide

name

This field is mandatory.

The name of the project. This will be used to populate the name field in the project table of the relational database.

[[Project.Assembly]]

Starts an assembly-wide configuration for that project. A single project may have one or more assembly-wide configurations.

Assembly-wide

name

This field is mandatory.

The name of the assembly. This is typically something related to the sequencing method and/or the assembly method. It will be used to populate the name field in the assembly table of the relational database.

[Project.Assembly.<toolname>]

Starts a tool-wide configuration for that assembly. A single assembly may have one or more tool-wide configurations.

The possible values for <toolname> are:

<toolname>   Status      Requirements
drep         Mandatory   None (built-in support)
prokka       Mandatory   None (built-in support)
gtdbtk       Optional    None (built-in support)
interpro     Optional    parsomics-plugin-interpro
dbcan        Optional    parsomics-plugin-dbcan
clean        Optional    parsomics-plugin-clean
proteinfer   Optional    parsomics-plugin-proteinfer
Important

The values for <toolname> are case sensitive and always lowercase!
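The rules in the table can be checked mechanically. The sketch below is a hypothetical validator, not parsomics' own: it reads "Mandatory" as "must be configured in every assembly", uses gtdbtk as spelled in the example configuration, and treats every key of an assembly table other than name as a tool name, compared case-sensitively.

```python
# Tool names and plugin requirements taken from the table above (None = built-in).
MANDATORY_TOOLS = {"drep", "prokka"}
TOOL_PLUGINS = {
    "drep": None,
    "prokka": None,
    "gtdbtk": None,
    "interpro": "parsomics-plugin-interpro",
    "dbcan": "parsomics-plugin-dbcan",
    "clean": "parsomics-plugin-clean",
    "proteinfer": "parsomics-plugin-proteinfer",
}


def check_assembly(assembly: dict, plugins: list[str]) -> list[str]:
    """Return the problems found in one assembly-wide configuration (empty if none)."""
    problems = []
    tools = {key for key in assembly if key != "name"}
    for tool in sorted(tools):
        if tool not in TOOL_PLUGINS:
            # Matching is case sensitive, so "Prokka" or "DREP" are unknown names.
            problems.append(f"unknown tool {tool!r}")
        elif TOOL_PLUGINS[tool] is not None and TOOL_PLUGINS[tool] not in plugins:
            problems.append(f"tool {tool!r} requires {TOOL_PLUGINS[tool]!r} in plugins")
    for tool in sorted(MANDATORY_TOOLS - tools):
        problems.append(f"missing mandatory tool {tool!r}")
    return problems
```

For example, an assembly with the keys name, drep, Prokka and dbcan checked against an empty plugins list yields three problems: Prokka is unknown because of its capital letter, dbcan needs parsomics-plugin-dbcan, and prokka is therefore missing.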

Info

Each tool can be configured at most once for each assembly. This means you can't have multiple tool-wide configurations for the same tool in the same assembly. That is currently a known limitation.

Tool-wide

output_directory

This field is mandatory.

The directory in which parsomics looks for the files it should parse. Two conditions apply to this directory:

  • All valid files that were generated by that tool run must be in the directory.
  • All valid files in the directory must have been generated by that tool run.

The directory may also contain invalid files generated by the same run (e.g. .log files), since parsomics ignores those files.
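A sketch of the "ignore what isn't parsed" behavior. Which files count as valid depends on the tool and on parsomics' parsers, which this page does not specify, so HYPOTHETICAL_VALID_SUFFIXES and split_directory are illustrative assumptions only. The sketch also assumes that only regular files directly inside output_directory are considered.

```python
from pathlib import Path

# Hypothetical allow-list; the real notion of a "valid" file is tool-specific.
HYPOTHETICAL_VALID_SUFFIXES = {".gff", ".faa", ".tsv"}


def split_directory(output_directory: str) -> tuple[list[Path], list[Path]]:
    """Split a tool's output directory into (parsed, ignored) files, sorted by path."""
    parsed: list[Path] = []
    ignored: list[Path] = []
    for path in sorted(Path(output_directory).iterdir()):
        if not path.is_file():
            continue  # subdirectories are skipped in this sketch
        if path.suffix in HYPOTHETICAL_VALID_SUFFIXES:
            parsed.append(path)
        else:
            ignored.append(path)  # e.g. run.log: present, but never parsed
    return parsed, ignored
```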

date

This field is optional.

The date when the tool run was carried out. The date must be written in the format "%d/%m/%Y".
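The format uses strftime-style directives (%d day, %m month, %Y four-digit year), so a value can be checked with Python's datetime.strptime. parse_run_date is a hypothetical helper for illustration:

```python
from datetime import date, datetime

DATE_FORMAT = "%d/%m/%Y"  # day/month/four-digit year


def parse_run_date(value: str) -> date:
    """Parse a tool-wide date value; raise ValueError if it does not match %d/%m/%Y."""
    return datetime.strptime(value, DATE_FORMAT).date()


print(parse_run_date("25/12/2023"))  # 2023-12-25
```

A month-first value such as "12/25/2023" raises ValueError, because 25 is not a valid month, while a value such as "05/12/2023" is read as 5 December 2023, not 12 May.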

version

This field is optional.

The version of the tool that was used in that execution. This field is a string, not a number.