Commit 3425f607 authored by mmassaviol

Update workflow

parent 49ec912b
FROM mbbteam/mbb_workflows_base:latest as alltools
RUN wget https://github.com/OpenGene/fastp/archive/v0.20.0.tar.gz \
&& tar -xvzf v0.20.0.tar.gz \
&& cd fastp-0.20.0 \
&& make \
&& mv fastp /opt/biotools/bin/fastp \
&& cd .. \
&& rm -r fastp-0.20.0 v0.20.0.tar.gz
RUN cd /opt/biotools/bin \
&& wget -O jellyfish https://github.com/gmarcais/Jellyfish/releases/download/v2.3.0/jellyfish-linux \
......
This workflow can be run autonomously on:

* BigMem-type machines
* an IFB-type cloud <https://biosphere.france-bioinformatique.fr/>
* the MBB cluster (not detailed here)

It can also be modified with a few light adaptations. In what follows, we will see how to deploy a workflow on a standalone machine, launch it in web mode, modify it, and then run it in command-line mode.
* Note the presence of the following files:
* install.sh: installs the required software (to be done only once, if needed!)
* deployBigMem.sh: deploys a container in web mode on a bigmem-type machine
* deployIFB.sh: deploys in web mode on the IFB cloud (<https://biosphere.france-bioinformatique.fr/cloudweb/login/>)
* deployLocalHost.sh: deploys on your own machine
* waw_workflow.qsub: submission script to run a workflow on the MBB cluster
* RunCmdLine.sh: deploys and runs a workflow from the command line (covered in a later section)
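For a first deployment, the typical sequence is to fetch the workflow sources and run install.sh once. A sketch, assuming the sources are taken from the MBB GitLab (the exact repository or subdirectory for your workflow may differ; see the links at the end):

```bash
# Hypothetical first-time setup -- adapt the repository URL to your workflow
git clone https://gitlab.mbb.univ-montp2.fr/mmassaviol/wapps.git
cd wapps
bash install.sh   # installs the prerequisites (to be done only once)
```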
## Deployment in web application mode
* Run ***bash deployLocalHost.sh*** to see the parameters it needs:
* dataDir: directory of the host machine containing the data
* resultsDir: directory of the host machine that will hold the analysis results
* the last (optional) parameter selects the source of the Docker image to use:
* dockerHub: the image will be downloaded from the Docker Hub registry (this option is only valid for workflows released by MBB)
* local: the image will be built locally from the source files of the Git repository (choose this option for workflows not available on Docker Hub)
* Make sure the data are available in a directory, e.g.: /home/$USER/datasets/rnaseq/
The read files must be named <sample_name><pair><extension>, where
pair = _R1 _R2, or _1 _2, or empty for single-end data
and the extension is free (fastq, fastq.gz, fq, fq.gz, ...); see the listing below
paired-end example: sample1_R1.fastq.gz sample1_R2.fastq.gz
single-end example: sample1.fastq.gz
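For instance, a valid paired-end dataset directory could look like this (the sample names are examples):

```bash
ls /home/$USER/datasets/rnaseq/
# sample1_R1.fastq.gz  sample1_R2.fastq.gz
# sample2_R1.fastq.gz  sample2_R2.fastq.gz
```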
* Create a directory for the results, e.g.: ***mkdir -p /home/$USER/result1***
* Then run:
***bash deployLocalHost.sh /home/$USER/datasets/rnaseq/ /home/$USER/result1 local***
* See further below for the correspondence between paths on the host system and paths inside the container
* Check the screen output to see how to:
* access the workflow from a web browser
* access the inside of the container system with *ssh*
* Note the identifier of your container!!!
* To stop the container:
* *** docker ps *** to list the containers
* *** docker kill ID ***
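For example (the container ID below is a made-up value; use the one reported by ***docker ps***):

```bash
docker ps                  # list running containers and note the CONTAINER ID
docker kill 3f2a9c1b4d5e   # hypothetical ID copied from the docker ps output
```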
## Modifying the workflow
### A/ Adding a parameter to a tool
The rules of the various workflow steps are assembled in the file files/Snakefile. They are written in the syntax of the Snakemake workflow manager (<https://github.com/deto/Snakemake_Tutorial>) (<https://www.youtube.com/watch?v=UOKxta3061g&feature=youtu.be>)
* Adding the --gcBias parameter to the SALMON quantification tool, to correct for GC bias in the reads:
* Open the Snakefile and go to the rule salmon_quant_PE
* Locate the shell section, which shows how SALMON is launched
* Insert the --gcBias parameter (see the sketch below)
* Restart the container with the 'local' option to rebuild the image with your changes:
***deployLocalHost.sh /home/$USER/datasets/rnaseq/ /home/$USER/result1 local***
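For orientation, a minimal sketch of what the edited rule could look like. Only the added --gcBias flag is the actual change; the input, output and index names below are illustrative, not the real contents of files/Snakefile:

```python
# Hypothetical Snakemake rule sketch (not the actual rule from files/Snakefile):
# the real inputs/outputs/paths differ; the point is where --gcBias is inserted.
rule salmon_quant_PE:
    input:
        r1 = "fastp_PE/{sample}_R1.fastq.gz",   # illustrative paths
        r2 = "fastp_PE/{sample}_R2.fastq.gz",
        index = "salmon_index"
    output:
        directory("salmon_quant_PE/{sample}")
    threads: 4
    shell:
        "salmon quant -i {input.index} -l A "
        "-1 {input.r1} -2 {input.r2} "
        "--gcBias "                              # <- the added parameter
        "-p {threads} -o {output}"
```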
### B/ Changing the version of a tool
The installation procedures of the various tools required by the workflow are gathered in a recipe file named Dockerfile.
* Open this file and locate the part concerning the installation of kallisto
* List of kallisto versions: <https://github.com/pachterlab/kallisto/releases>
* Change the version number to a more recent kallisto version (see the sketch below)
* Restart the container with the 'local' option to rebuild the image with your changes:
***deployLocalHost.sh /home/$USER/datasets/rnaseq/ /home/$USER/result1 local***
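The actual kallisto block in the Dockerfile may differ; a minimal sketch assuming a binary-release install, where upgrading means changing the version string everywhere it appears (here v0.46.1 would be bumped to v0.46.2):

```dockerfile
# Hypothetical kallisto install block: adjust every occurrence of the version
RUN wget https://github.com/pachterlab/kallisto/releases/download/v0.46.2/kallisto_linux-v0.46.2.tar.gz \
 && tar -xzf kallisto_linux-v0.46.2.tar.gz \
 && mv kallisto/kallisto /opt/biotools/bin/kallisto \
 && rm -r kallisto kallisto_linux-v0.46.2.tar.gz
```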
### C/ Adding a step
* Two possibilities:
* reload the .json file in the subwaw interface (<http://web.mbb.univ-montp2.fr/subwaw/workflowmanager.php>), insert the desired step, then download the new version of the workflow
* make a request via the ticketing system: <https://kimura.univ-montp2.fr/calcul/helpdesk_NewTicket.html>
## Using command-line mode
To reuse a workflow on different files or with different parameters, it is convenient to be able to run it from the command line.
This requires a text file containing all the workflow parameters.
This file can be:
* retrieved from the web interface of a deployment, as in [Deployment in web application mode], then adapted to your needs
* retrieved from the results directory of an analysis previously performed with this workflow
* built directly from the default template available in files/params.total.yml
* Modify one or more of the following parameters (a sketch of the resulting file is shown at the end of this section):
* results_dir:
* sample_dir:
* group_file:
* kallisto_index_input:
or
* salmon_index_input:
* edger_annotations:
* Save your changes as maconfig.yaml in, e.g., /home/$USER/results1/version2/; it will then be visible inside the container as /Results/maconfig.yaml
* From a console, run the command line (here the parameter 10 means that 10 cores will be used):
***bash RunCmdLine.sh /home/$USER/datasets/rnaseq/ /home/$USER/results1/version2/ /Results/maconfig.yaml 10***
* Follow the progress of the workflow
* At the end, check the contents of /home/$USER/results1/version2/
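A minimal sketch of the edited part of maconfig.yaml; the parameter names are those listed above, while the values are illustrative and must use container paths (/Data, /Results — see the mapping in the next section):

```yaml
# Hypothetical excerpt of maconfig.yaml -- values are examples only
results_dir: /Results
sample_dir: /Data
group_file: /Data/conditions/groups.tsv
salmon_index_input: /Data/reference/transcripts.fa
edger_annotations: /Data/reference/annotations.tsv
```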
## Correspondence between directories on your machine and directories in the container
Deployment example 1: ***bash deployBigMem.sh /home/votrelogin/data1/ /home/votrelogin/results1/***
Inside the container:
* /home/votrelogin/data1/ -> /Data/
* /home/votrelogin/results1/ -> /Results/
Deployment example 2: ***bash deployBigMem.sh /share/bio/datasets/rnaseq/ /home/votrelogin/results1/version1/***
Inside the container:
* /share/bio/datasets/rnaseq/ -> /Data/
* /share/bio/datasets/rnaseq/fastqs/ -> /Data/fastqs/
* /share/bio/datasets/rnaseq/reference/ -> /Data/reference/
* /share/bio/datasets/rnaseq/conditions/groups.tsv -> /Data/conditions/groups.tsv
* /home/votrelogin/results1/version1/ -> /Results/
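In practice, this mapping is what the parameter file must follow. A sketch, using the second deployment above and the group_file parameter from the command-line section:

```yaml
# the host file /share/bio/datasets/rnaseq/conditions/groups.tsv
# is seen by the workflow under its container path:
group_file: /Data/conditions/groups.tsv
```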
## Useful links
* docker commands: <https://devhints.io/docker>
* bash commands: <https://devhints.io/bash>
* MBB ticketing system: <https://kimura.univ-montp2.fr/calcul/helpdesk_NewTicket.html>
* Bigmem reservation system: <https://mbb.univ-montp2.fr/grr/login.php>
* IFB cloud: <https://biosphere.france-bioinformatique.fr/>
* MBB cluster: ssh login@cluster-mbb.mbb.univ-montp2.fr
* Git repositories of the MBB workflows: <https://gitlab.mbb.univ-montp2.fr/mmassaviol/wapps>
* Git repository of the MBB workflows design framework: <https://gitlab.mbb.univ-montp2.fr/mmassaviol/waw>
* Docker containers of the MBB workflows: <https://hub.docker.com/search?q=mbbteam&type=image>
@@ -53,5 +53,5 @@ then
echo Results were written to : $2
echo " "
else
echo Failed to run the docker container
fi
@@ -89,9 +89,9 @@ then
echo " "
echo Results will be written to : $2
echo " "
hostname -I | grep -E -o "162.38.181.[0-9]{1,3}" | awk -v port=$APP_PORT '{print "You can access the workflow interface at : http://"$1":"port}'
echo " "
echo To start a Bash session inside the container : docker exec -it $CONTAINER_ID /bin/bash
else
echo Failed to run the docker container
fi
@@ -114,5 +114,5 @@ then
echo " "
echo XX being the number of cores that will be used by the workflow.
else
echo Failed to run the docker container
fi
#!/bin/bash
# This script helps deploy a Docker image of a workflow on your local machine
if [ $# -lt 2 ]
then
echo usage : $0 dataDir resultsDir '[dockerHub|local]'
exit
fi
# Exit on error with message
exit_on_error() {
exit_code=$1
last_command=${@:2}
if [ $exit_code -ne 0 ]; then
>&2 echo "\"${last_command}\" command failed with exit code ${exit_code}."
exit $exit_code
fi
}
# enable !! command completion
set -o history -o histexpand
# try a range of ports between 8787 and 8800
#APP_PORT=$2
APP_PORT=8787
while [[ $(ss -tulw | grep $APP_PORT) != "" && $APP_PORT -lt 8800 ]]
do
APP_PORT=$(( $APP_PORT + 1))
done
if [[ $(ss -tulw | grep $APP_PORT) != "" ]]
then
echo "No tcp port available !!"
exit 1
fi
# Docker volumes
# MBB Workflows reads data from /Data and write results to /Results
if [ $SUDO_USER ]; then realUSER=$SUDO_USER; else realUSER=`whoami`; fi
Data=$1
Results=$2
mkdir -p $Data
mkdir -p $Results
DOCK_VOL+=" --mount type=bind,src=$Data,dst=/Data"
DOCK_VOL+=" --mount type=bind,src=$Results,dst=/Results"
if [ $# -lt 3 ]
then
APP_IMG="mbbteam/genomeprofiler:latest"
else
IMG_SRC=$3
case $IMG_SRC in
dockerHub )
APP_IMG="mbbteam/genomeprofiler:latest" ;;
local)
docker build . -t genomeprofiler:latest
APP_IMG="genomeprofiler:latest" ;;
mbb)
#APP_IMG="X.X.X.X:5000/genomeprofiler:latest"
;;
esac
fi
IMG_NAME=$(echo $APP_IMG"-"$APP_PORT | sed s/:/-/ )
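# Run the container detached; host port $APP_PORT is mapped to port 3838 (the Shiny app inside the container)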
CONTAINER_ID=$( docker run --rm -d --name $IMG_NAME -p $APP_PORT:3838 $DOCK_VOL $APP_IMG )
if [ $CONTAINER_ID ]
then
echo " "
echo You have to put your Data on : $1
echo " "
echo Results will be written to : $2
echo " "
echo localhost | awk -v port=$APP_PORT '{print "You can access the shiny workflow interface at : http://"$1":"port}'
echo " "
echo To start a Bash session inside the container : docker exec -it $IMG_NAME /bin/bash
else
echo Failed to run the docker container
fi
base_tools:
MBB_platform:
- Montpellier Bioinformatics Biodiversity platform supported by the LabEx CeMEB,
an ANR "Investissements d'avenir" program (ANR-10-LABX-04-01).
snakemake:
- "K\xF6ster, Johannes and Rahmann, Sven. Snakemake - A scalable bioinformatics\
\ workflow engine. Bioinformatics 2012."
multiqc:
- "Philip Ewels, M\xE5ns Magnusson, Sverker Lundin, Max K\xE4ller, MultiQC: summarize\
\ analysis results for multiple tools and samples in a single report, Bioinformatics,\
\ Volume 32, Issue 19, 1 October 2016, Pages 3047\u20133048, https://doi.org/10.1093/bioinformatics/btw354"
shiny:
- 'Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2019).
shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny'
Docker:
- 'Dirk Merkel. 2014. Docker: lightweight Linux containers for consistent development
and deployment. Linux J. 2014, 239, Article 2 (March 2014), 1 pages.'
fastp:
fastp:
- 'Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu, fastp: an ultra-fast all-in-one
FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages
i884-i890, https://doi.org/10.1093/bioinformatics/bty560'
jellyfish_count:
jellyfish:
- "Guillaume Mar\xE7ais, Carl Kingsford, A fast, lock-free approach for efficient\
\ parallel counting of occurrences of k-mers, Bioinformatics, Volume 27, Issue\
\ 6, 15 March 2011, Pages 764-770, https://doi.org/10.1093/bioinformatics/btr011"
jellyfish_histo:
jellyfish:
- "Guillaume Mar\xE7ais, Carl Kingsford, A fast, lock-free approach for efficient\
\ parallel counting of occurrences of k-mers, Bioinformatics, Volume 27, Issue\
\ 6, 15 March 2011, Pages 764-770, https://doi.org/10.1093/bioinformatics/btr011"
genomescope:
genomescope:
- 'Gregory W Vurture, Fritz J Sedlazeck, Maria Nattestad, Charles J Underwood, Han
Fang, James Gurtowski, Michael C Schatz, GenomeScope: fast reference-free genome
profiling from short reads, Bioinformatics, Volume 33, Issue 14, 15 July 2017,
Pages 2202-2204, https://doi.org/10.1093/bioinformatics/btx153'
import re
import sys
from tools import read_yaml

config = read_yaml(sys.argv[1])
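# Generate a MultiQC configuration for the workflow report:
# - module_order(): one MultiQC module per workflow step (skipping "custom" tools)
# - report_section_order(): fixed sections (rule graph, parameters, outputs,
#   tool versions, citations) followed by per-step sections
# - files_or_dirs_to_ignore(): data file patterns MultiQC should not scan
# The result is written to the file given as sys.argv[2].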
def files_or_dirs_to_ignore():
    # files
    res = "fn_ignore_files:\n"
    file_ignore = ["*.fa","*.fasta","*.fa.gz","*.fasta.gz","*.fq","*.fastq","*.fq.gz","*.fastq.gz","*.sam","*.bam","*.gtf","*.gtf.gz","*.vcf","*.vcf.gz"]
    for file in file_ignore:
        res += " - '" + file + "'\n"
    res += "\n"
    res += "fn_ignore_dirs:\n"
    dirs_ignore = ["workflow"]
    for dir in dirs_ignore:
        res += " - '" + dir + "'\n"
    res += "\n"
    return res
def module_order():
    res = ""
    for step in config["steps"]:
        tool = config["params"][step["name"]]
        print(tool)
        if (config["multiqc"][tool] != "custom"):
            res += " - " + config["multiqc"][tool] + ":\n"
            res += "     name: " + step["title"] + " (" + tool + ")\n"
            res += "     anchor: " + step["name"] + "__" + config["multiqc"][tool] + "\n"
            res += "     path_filters:\n"
            for rule in config["outputs"][step["name"] + "__" + tool].keys():
                res += "       - '*" + config["params"][rule + "_output_dir"] + "/*'\n" # limit search to tool output dir
                res += "       - '*/logs/" + config["params"][rule + "_output_dir"] + "/*'\n" # and tool logs
    # dont put module order if empty
    if res != "" : res = "module_order:\n" + res
    return res
def report_section_order():
    res = "report_section_order:\n"
    res += "  Rule_graph:\n"
    res += "    order: 990\n"
    res += "  params_tab:\n"
    res += "    order: 980\n"
    res += "  outputs:\n"
    res += "    order: 970\n"
    res += "  Tools_version:\n"
    res += "    order: 960\n"
    res += "  Citations:\n"
    res += "    order: -1000\n"
    cpt = 950
    for step in config["steps"]:
        tool = config["params"][step["name"]]
        if (config["multiqc"][tool] != "custom"):
            res += "  " + step["name"] + "__" + config["multiqc"][tool] + ":\n"
            res += "    order: " + str(cpt) + "\n"
            cpt += -10
        for rule in config["outputs"][step["name"] + "__" + tool]:
            if ("SeOrPe" not in config.keys() or (config["params"]["SeOrPe"] == "SE" and not("_PE" in rule)) or (config["params"]["SeOrPe"] == "PE" and not("_SE" in rule))):
                for output in config["outputs"][step["name"] + "__" + tool][rule]:
                    if("file" in output.keys() and "mqc" in output["file"] and '{' not in output["file"]): # case of dynamic files ({wildcard}_mqc.png) to deal with
                        section = re.sub('\_mqc.*$', '', output["file"])
                        res += "  " + section + ":\n"
                        res += "    order: " + str(cpt) + "\n"
                        cpt += -10
        if step["name"] + "__" + tool in config["prepare_report_outputs"]:
            if isinstance(config["prepare_report_outputs"][step["name"] + "__" + tool], list):
                for output in config["prepare_report_outputs"][step["name"] + "__" + tool]:
                    section = re.sub('\_mqc.*$', '', output)
                    res += "  " + section + ":\n"
                    res += "    order: " + str(cpt) + "\n"
                    cpt += -10
            else:
                section = re.sub('\_mqc.*$', '', config["prepare_report_outputs"][step["name"] + "__" + tool])
                res += "  " + step["name"] + "__" + section + ":\n"
                res += "    order: " + str(cpt) + "\n"
                cpt += -10
    return res
def main():
    res = "skip_generalstats: true\n\n"
    res += module_order() + "\n\n"
    res += report_section_order() + "\n\n"
    res += files_or_dirs_to_ignore()
    with open(sys.argv[2],"w") as out:
        out.write(res)
......
@@ -4,41 +4,55 @@ params:
  sample_dir: /Data
  SeOrPe: PE
  preprocessing: 'null'
  preprocessing__fastp_PE_output_dir: preprocessing/fastp_PE
  preprocessing__fastp_PE_command: fastp
  preprocessing__fastp_threads: 4
  preprocessing__fastp_complexity_threshold: 30
  preprocessing__fastp_report_title: fastp report
  preprocessing__fastp_adapter_sequence: ''
  preprocessing__fastp_adapter_sequence_R2_PE: ''
  preprocessing__fastp_P: 20
  preprocessing__fastp_correction_PE: true
  preprocessing__fastp_low_complexity_filter: true
  preprocessing__fastp_overrepresentation_analysis: true
  preprocessing__fastp_SE_output_dir: preprocessing/fastp_SE
  preprocessing__fastp_SE_command: fastp
  preprocessing__null_output_dir: preprocessing/
  preprocessing__null_command: ''
  kmer_counting: jellyfish_count
  kmer_counting__jellyfish_count_output_dir: kmer_counting/jellyfish
  kmer_counting__jellyfish_count_command: jellyfish count
  kmer_counting__jellyfish_count_threads: 4
  kmer_counting__jellyfish_count_canonical_kmer: true
  kmer_counting__jellyfish_count_kmer_len: 21
  kmer_counting__jellyfish_count_hash_size: 100000000
  kmer_histogram: jellyfish_histo
  kmer_histogram__jellyfish_histo_output_dir: kmer_histogram/jellyfish
  kmer_histogram__jellyfish_histo_command: jellyfish histo
  kmer_histogram__jellyfish_histo_threads: 4
  kmer_analysis: genomescope
  samples: []
  groups: []
  kmer_analysis__genomescope_output_dir: kmer_analysis/genomescope
  kmer_analysis__genomescope_command: Rscript /opt/biotools/genomescope.R
  kmer_analysis__genomescope_reads_len: 150
steps:
- name: preprocessing
  title: Preprocessing
  tools:
  - fastp
  - 'null'
  default: 'null'
- name: kmer_counting
  title: K-mer counting
  tools:
  - jellyfish_count
  default: jellyfish_count
- name: kmer_histogram
  title: K-mer histogram
  tools:
  - jellyfish_histo
  default: jellyfish_histo
- name: kmer_analysis
  title: K-mer analysis
  tools:
  - genomescope
  default: genomescope
@@ -49,130 +63,152 @@ params_info:
    type: input_dir
  SeOrPe:
    type: radio
  preprocessing__fastp_threads:
    tool: fastp
    rule: preprocessing_fastp_SE
    type: numeric
    label: Number of threads to use
  preprocessing__fastp_complexity_threshold:
    tool: fastp
    rule: preprocessing_fastp_SE
    type: numeric
    label: The threshold for low complexity filter (0~100)
  preprocessing__fastp_report_title:
    tool: fastp
    rule: preprocessing_fastp_SE
    type: text
    label: fastp report title
  preprocessing__fastp_adapter_sequence:
    tool: fastp
    rule: preprocessing_fastp_SE
    type: text
    label: The adapter for read1. For SE data, if not specified, the adapter will
      be auto-detected. For PE data, this is used if R1/R2 are found not overlapped.
  preprocessing__fastp_adapter_sequence_R2_PE:
    tool: fastp
    rule: preprocessing_fastp_PE
    type: text
    label: the adapter for read2 (PE data only). This is used if R1/R2 are found not
      overlapped. If not specified, it will be the same as <adapter_sequence>
  preprocessing__fastp_P:
    tool: fastp
    rule: preprocessing_fastp_SE
    type: numeric
    label: One in (--overrepresentation_sampling) reads will be computed for overrepresentation
      analysis (1~10000), smaller is slower.
  preprocessing__fastp_correction_PE:
    tool: fastp
    rule: preprocessing_fastp_PE
    type: checkbox
    label: Enable base correction in overlapped regions
  preprocessing__fastp_low_complexity_filter:
    tool: fastp
    rule: preprocessing_fastp_SE
    type: checkbox
    label: Enable low complexity filter. The complexity is defined as the percentage
      of base that is different from its next base (base[i] != base[i+1]).
  preprocessing__fastp_overrepresentation_analysis:
    tool: fastp
    rule: preprocessing_fastp_SE
    type: checkbox
    label: enable overrepresented sequence analysis.
  kmer_counting__jellyfish_count_threads:
    tool: jellyfish_count
    rule: kmer_counting_jellyfish_count
    type: numeric
    label: Number of threads to use
  kmer_counting__jellyfish_count_canonical_kmer:
    tool: jellyfish_count
    rule: kmer_counting_jellyfish_count
    type: checkbox
    label: '-C : Save in the hash only canonical k-mers, while the count is the number
      of occurrences of both a k-mer and its reverse complement'
  kmer_counting__jellyfish_count_kmer_len:
    tool: jellyfish_count
    rule: kmer_counting_jellyfish_count
    type: numeric
    label: K-mers length
  kmer_counting__jellyfish_count_hash_size:
    tool: jellyfish_count
    rule: kmer_counting_jellyfish_count
    type: numeric
    label: Kmer size estimation