... | @@ -259,27 +259,48 @@ The [assembly Snakefile](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidr |
* results/03_demultiplex/02_raw/`projet`/`marker`/`run`/`sample`.fasta: sequences which belong to a `sample` fasta file

### 4. Filtering

#### 4.1 Dereplicate sequences at `sample` level

[dereplicate_samples](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/04_filter_samples/rules/dereplicate_samples.smk): dereplicate reads into unique sequences

* input:
  * results/03_demultiplex/02_raw/`projet`/`marker`/`run`/`sample`.fasta: sequences which belong to a `sample` fasta file
* output:
  * results/04_filter_samples/01_dereplicated/`projet`/`marker`/`run`/`sample`.uniq.fasta: dereplicated sequences fasta file
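
As a rough illustration of what dereplication means here, the sketch below collapses identical reads into unique sequences and records how many copies of each were seen. It is not the pipeline's implementation: the function name `dereplicate` and the plain list-of-strings input are assumptions made for this example only.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, copy_count) pairs.

    Hypothetical helper for illustration; reads are compared as exact strings.
    """
    counts = Counter(reads)
    # most_common() orders by decreasing count; ties keep first-seen order.
    return counts.most_common()


reads = ["ACGT", "ACGT", "ACGA", "ACGT"]
unique = dereplicate(reads)
for seq, count in unique:
    print(f"{seq}\tcount={count}")
```

The copy count kept for each unique sequence is what the coverage filter of the next step relies on.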

#### 4.2 Remove sequences with wrong length or IUPAC ambiguity or low depth coverage

[goodlength_samples](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/04_filter_samples/rules/goodlength_samples.smk): keep only sequences at least `good_length_samples:seq_length` bp long, with no ambiguous IUPAC codes, and with a total coverage greater than `good_length_samples:seq_count` reads (for example, 20 bp and 10 reads)

* input:
  * results/04_filter_samples/01_dereplicated/`projet`/`marker`/`run`/`sample`.uniq.fasta: dereplicated sequences fasta file
* parameters:
  * `good_length_samples:seq_count`: selects sequences with a number of copies greater than seq_count
  * `good_length_samples:seq_length`: selects sequences with a length equal to or longer than seq_length
* output:
  * results/04_filter_samples/02_goodlength/`projet`/`marker`/`run`/`sample`.l.u.fasta: filtered sequences fasta file
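
The three criteria of this step can be sketched as a single predicate. This is a minimal sketch under stated assumptions, not the rule's actual command: the function name `keep_sequence` is hypothetical, "no ambiguity" is taken to mean that only A, C, G and T occur, and the comparisons follow the parameter descriptions above (length equal to or longer than `seq_length`, copies strictly greater than `seq_count`).

```python
import re

# Assumption: an unambiguous sequence contains only the four standard bases.
UNAMBIGUOUS = re.compile(r"[ACGT]+", re.IGNORECASE)


def keep_sequence(seq, count, seq_length, seq_count):
    """Return True if a dereplicated sequence passes all three filters."""
    long_enough = len(seq) >= seq_length
    unambiguous = UNAMBIGUOUS.fullmatch(seq) is not None
    covered = count > seq_count
    return long_enough and unambiguous and covered


# Example thresholds: 20 bp and 10 reads.
print(keep_sequence("ACGT" * 6, 11, seq_length=20, seq_count=10))
print(keep_sequence("ACGN" * 6, 11, seq_length=20, seq_count=10))
```

The first call passes every test; the second is rejected because `N` is an ambiguity code.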

#### 4.3 Detect PCR/sequencing error sequences

[clean_pcrerr_samples](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/04_filter_samples/rules/clean_pcrerr_samples.smk): clean the sequences for PCR/sequencing errors (sequence variants). See the [Explanation](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/wikis/Explanation) page for details of the obiclean algorithm.

* input:
  * results/04_filter_samples/02_goodlength/`projet`/`marker`/`run`/`sample`.l.u.fasta: filtered sequences fasta file
* parameters:
  * `clean_pcrerr_samples:r`: threshold ratio between the counts (rare/abundant counts) of two sequence records so that the less abundant one is considered a variant of the more abundant one.
* output:
  * results/04_filter_samples/03_clean_pcrerr/`projet`/`marker`/`run`/`sample`.r.l.u.fasta: PCR clone annotation sequences fasta file
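
To make the role of the `r` ratio concrete, here is a deliberately simplified sketch of the pairwise test. It is not obiclean itself: the helper names are hypothetical, only single substitutions are considered (no insertions or deletions), and treating a ratio equal to `r` as a variant is an assumption of this example.

```python
def differ_by_one_substitution(a, b):
    """True if two equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def is_probable_variant(rare, abundant, r):
    """Decide whether `rare` looks like an error-derived variant of `abundant`.

    Each record is a (sequence, count) tuple. The rare record is flagged when
    it is one substitution away from the abundant one and its count ratio
    (rare/abundant) does not exceed the threshold r.
    """
    rare_seq, rare_count = rare
    abundant_seq, abundant_count = abundant
    if rare_count >= abundant_count:
        return False
    ratio = rare_count / abundant_count
    return differ_by_one_substitution(rare_seq, abundant_seq) and ratio <= r


print(is_probable_variant(("ACGA", 2), ("ACGT", 100), r=0.05))
print(is_probable_variant(("ACGA", 50), ("ACGT", 100), r=0.05))
```

In the first call the rare sequence is 50 times less abundant than its neighbour and is flagged; in the second the ratio of 0.5 is above the threshold, so both sequences are kept as distinct.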

#### 4.4 Remove PCR/sequencing error sequences

[rm_internal_samples](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/04_filter_samples/rules/rm_internal_samples.smk): remove sequences which are classified as 'internal' by obiclean

* input:
  * results/04_filter_samples/03_clean_pcrerr/`projet`/`marker`/`run`/`sample`.r.l.u.fasta: PCR clone annotation sequences fasta file
* output:
  * results/04_filter_samples/04_filtered/`projet`/`marker`/`run`/`sample`.c.r.l.u.fasta: cleaned sequences fasta file
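
The removal step amounts to a filter on the annotation added in 4.3. The sketch below assumes, for illustration only, that each record has already been parsed into a dict with a `status` key holding the obiclean class as a single letter (`h` for head, `i` for internal, `s` for singleton); the record layout and the function name are hypothetical, not the pipeline's own data structure.

```python
def remove_internal(records):
    """Keep every record whose obiclean status is not 'internal'."""
    return [rec for rec in records if rec["status"] != "i"]


records = [
    {"id": "seq1", "sequence": "ACGT", "status": "h"},
    {"id": "seq2", "sequence": "ACGA", "status": "i"},
    {"id": "seq3", "sequence": "TTGC", "status": "s"},
]
cleaned = remove_internal(records)
print([rec["id"] for rec in cleaned])
```

Only `seq2`, the record flagged as a probable error variant, is dropped.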

### 5 Taxonomic assignment and format