Configuration
With the parsomics configuration file, you can set:
- Which plugins to activate
- The names of your projects
- The paths of directories with files to parse
- Etc.
Configuration file scope
You can have multiple configuration files, at different scopes.
Scope | Path |
---|---|
System | /etc/parsomics/config.toml |
User | $HOME/.config/parsomics/config.toml |
Users can also opt to place a configuration file at a custom location. This will be further explained in the next section.
Configuration file resolution
If both user-scoped and system-scoped configuration files exist for the currently logged-in user, then the system-scoped configuration file is ignored and the user-scoped configuration file is used.
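The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the precedence, not the code parsomics itself runs: the function name `resolve_config` is hypothetical, and the user-scoped path is assumed to end in `config.toml`.

```python
from pathlib import Path
from typing import Optional

# Default locations from the scope table (user path assumed to be config.toml).
SYSTEM_CONFIG = Path("/etc/parsomics/config.toml")
USER_CONFIG = Path.home() / ".config" / "parsomics" / "config.toml"


def resolve_config(user_path: Path = USER_CONFIG,
                   system_path: Path = SYSTEM_CONFIG) -> Optional[Path]:
    """Return the configuration file that takes effect, or None if neither exists."""
    # The user-scoped file wins whenever it exists; the system one is then ignored.
    if user_path.is_file():
        return user_path
    if system_path.is_file():
        return system_path
    return None
```

Calling `resolve_config()` with no arguments checks the two default locations in that order.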
Configuration file format
The parsomics configuration file uses the TOML format. Here's an example of a configuration file:
config.toml
# Environment settings -------------------------------------------------------
environment = "PRODUCTION"
# Plugins --------------------------------------------------------------------
plugins = [
"parsomics-plugin-dbcan",
"parsomics-plugin-interpro",
"parsomics-plugin-clean",
"parsomics-plugin-proteinfer",
]
# Projects -------------------------------------------------------------------
[[Project]]
name = "bagasse-10"
[[Project.Assembly]]
name = "PACBIO"
[Project.Assembly.drep]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/dRep"
[Project.Assembly.prokka]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/prokka"
[Project.Assembly.gtdbtk]
output_directory = "/ibira/scratch/Projects/bagasse-10/GTDB-Tk/pacbio"
[Project.Assembly.interpro]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/interpro"
[Project.Assembly.dbcan]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/dbCAN"
[Project.Assembly.clean]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/CLEAN"
[Project.Assembly.proteinfer]
output_directory = "/ibira/scratch/Projects/bagasse-10/pacbio/proteinfer"
[[Project]]
name = "manatee-4"
[[Project.Assembly]]
name = "PACBIO-ILLUMINA-HYBRID"
[Project.Assembly.drep]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/dRep/"
[Project.Assembly.prokka]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/prokka/"
[Project.Assembly.gtdbtk]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/GTDB-Tk/"
[Project.Assembly.interpro]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/interpro/"
[Project.Assembly.dbcan]
output_directory = "/ibira/scratch/Projects/manatee-4/hybrid/dbCAN/"
[[Project.Assembly]]
name = "PACBIO"
[Project.Assembly.drep]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/dRep/"
[Project.Assembly.prokka]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/prokka/"
[Project.Assembly.gtdbtk]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/GTDB-Tk/"
[Project.Assembly.interpro]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/interpro/"
[Project.Assembly.dbcan]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/dbCAN/"
[Project.Assembly.clean]
output_directory = "/ibira/scratch/Projects/manatee-4/pacbio/CLEAN/"
[[Project]]
name = "plastics-6"
[[Project.Assembly]]
name = "ILLUMINA"
[Project.Assembly.drep]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/dRep"
[Project.Assembly.prokka]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/prokka/"
[Project.Assembly.gtdbtk]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/GTDB-Tk/"
[Project.Assembly.interpro]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/interpro/"
[Project.Assembly.dbcan]
output_directory = "/ibira/scratch/Projects/plastics-6/illumina/dbCAN/"
Configuration file structure
The configuration file has a hierarchical structure. At the top, you can set the "global" variables that influence parsomics' behavior in general and apply to every project. These variables are environment and plugins, which are explained in the sections below.
Below that, there are project-wide configurations. Within each project-wide configuration, you must set a project name, as well as one or more assembly-wide configurations for the assemblies of that project.
Within each assembly-wide configuration, you must set an assembly name, as well as one or more tool-wide configurations for the tools that were used in that assembly.
Finally, within each tool-wide configuration, you must set output_directory, the path to the files produced by that tool run. You can also optionally set date, the date when the tool was executed in the format "%d/%m/%Y", and version, a string for the version of the tool that was used for that run.
Configuration file fields
Global
environment
This field is mandatory.
The purpose of this field is to set the logging level. Its value can be either "PRODUCTION" (sets the logging level to WARNING) or "DEVELOPMENT" (sets the logging level to INFO).
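The mapping from environment to logging level could look like the sketch below. The function name `logging_level_for` is hypothetical, and rejecting any other spelling (including lowercase) is an assumption of this sketch rather than documented parsomics behavior.

```python
import logging

# The two documented values and the logging level each one selects.
LOG_LEVELS = {
    "PRODUCTION": logging.WARNING,
    "DEVELOPMENT": logging.INFO,
}


def logging_level_for(environment: str) -> int:
    """Translate the environment field into a logging level."""
    try:
        return LOG_LEVELS[environment]
    except KeyError:
        # Assumption: anything other than the two documented values is an error.
        raise ValueError(f"unsupported environment: {environment!r}") from None
```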
plugins
This field is mandatory.
A list of the plugin package names that can be used. To use a plugin, it must be included in this list. However, being included in the list doesn't mean it has to be used. For example, you can have "parsomics-plugin-interpro" in the list but not have interpro configured in any assembly of any project.
Leave the list empty (i.e. []) if you are not using any plugins.
[[Project]]
Starts a project-wide configuration. A single configuration file may have one or more project-wide configurations.
Project-wide
name
This field is mandatory.
The name of the project. This will be used to populate the name field in the project table of the relational database.
[[Project.Assembly]]
Starts an assembly-wide configuration for that project. A single project may have one or more assembly-wide configurations.
Assembly-wide
name
This field is mandatory.
The name of the assembly. This is typically something that relates to the sequencing method and/or the assembly method. This will be used to populate the name field in the assembly table of the relational database.
[Project.Assembly.<toolname>]
Starts a tool-wide configuration for that assembly. A single assembly may have one or more tool-wide configurations.
The possible values for <toolname> are:
<toolname> | Status | Requirements |
---|---|---|
drep | Mandatory | None (built-in support) |
prokka | Mandatory | None (built-in support) |
gtdbtk | Optional | None (built-in support) |
interpro | Optional | parsomics-plugin-interpro |
dbcan | Optional | parsomics-plugin-dbcan |
clean | Optional | parsomics-plugin-clean |
proteinfer | Optional | parsomics-plugin-proteinfer |
The values for <toolname> are case sensitive and always lowercase!
Each tool can be configured at most once for each assembly. This means you can't have multiple tool-wide configurations for the same tool in the same assembly. That is currently a known limitation.
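The rules in the table can be expressed as a small consistency check over one assembly-wide configuration. This is a sketch, not parsomics' own validation: `check_assembly` and the message wording are hypothetical, the assembly is assumed to be the dictionary a TOML parser would produce, and gtdbtk is spelled as in the example configuration.

```python
# Tool name -> plugin package it requires (None means built-in support).
REQUIRED_PLUGIN = {
    "drep": None,
    "prokka": None,
    "gtdbtk": None,
    "interpro": "parsomics-plugin-interpro",
    "dbcan": "parsomics-plugin-dbcan",
    "clean": "parsomics-plugin-clean",
    "proteinfer": "parsomics-plugin-proteinfer",
}
MANDATORY_TOOLS = {"drep", "prokka"}


def check_assembly(assembly: dict, plugins: list) -> list:
    """Return human-readable problems found in one assembly (empty if none)."""
    problems = []
    # Tool-wide configurations are the nested tables; "name" is a plain string.
    tools = {key for key, value in assembly.items() if isinstance(value, dict)}
    for tool in sorted(MANDATORY_TOOLS - tools):
        problems.append(f"missing mandatory tool: {tool}")
    for tool in sorted(tools):
        if tool not in REQUIRED_PLUGIN:
            # Names are case sensitive, so "Prokka" is not "prokka".
            problems.append(f"unknown tool: {tool}")
        elif REQUIRED_PLUGIN[tool] is not None and REQUIRED_PLUGIN[tool] not in plugins:
            problems.append(f"{tool} requires plugin {REQUIRED_PLUGIN[tool]}")
    return problems
```

The "at most once per assembly" limitation needs no explicit check here, since TOML itself does not allow the same table to be defined twice.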
Tool-wide
output_directory
This field is mandatory.
The directory in which parsomics finds the files that it should parse. Two conditions apply to this directory:
- All valid files that were generated by that tool run must be in the directory.
- All valid files in the directory must have been generated by that tool run.
Additionally, the directory may also contain invalid files that were generated by that same run (e.g. .log files), since these files will be ignored by parsomics either way.
date
This field is optional.
The date when the tool run was carried out. The date must be written in the format "%d/%m/%Y".
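As a quick way to check a value before putting it in the configuration file, the "%d/%m/%Y" format can be tested with Python's standard `datetime.strptime`. The helper name `parse_run_date` is hypothetical, and whether parsomics uses the same call internally is not stated here.

```python
from datetime import date, datetime


def parse_run_date(text: str) -> date:
    """Parse a day/month/year string such as "31/01/2024".

    Raises ValueError if the text does not match "%d/%m/%Y".
    """
    return datetime.strptime(text, "%d/%m/%Y").date()
```

For instance, `parse_run_date("31/01/2024")` yields 31 January 2024, while `"2024-01-31"` and the month-first `"01/31/2024"` are both rejected with a `ValueError`.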
version
This field is optional.
The version of the tool that was used in that execution. This field is a string, not a number.