@@ -31,7 +31,7 @@ For that, we will use the OBITools commands and swarm.
...
@@ -31,7 +31,7 @@ For that, we will use the OBITools commands and swarm.
- [OBITools](https://git.metabarcoding.org/obitools/obitools/wikis/home) are commands written in Python
- [OBITools](https://git.metabarcoding.org/obitools/obitools/wikis/home) are commands written in Python
- [swarm](https://github.com/torognes/swarm) is a command written in C++ that can be run from a Unix shell
- [swarm](https://github.com/torognes/swarm) is a command written in C++ that can be run from a Unix shell
In this example, 2 datasets are used, because the study analyzes the sequencing of 2 tiles.
In this example, two datasets are used because the study analyzes the result of a paired-end sequencing run (example: filtered eDNA from aquarium seawater).
First, unzip your data in your shell if needed :
First, unzip your data in your shell if needed :
```
```
...
@@ -99,95 +99,94 @@ Activate your environment in your shell :
...
@@ -99,95 +99,94 @@ Activate your environment in your shell :
conda activate obitools
conda activate obitools
```
```
Use the function _illuminapairedend_ to make the pair-end sequencing from the forward and reverse strands of the sequences you have in your data. In other words, the function aligns the complementary strands in order to get a longer sequence. In fact, during PCR, the last bases are rarely correctly sequenced. So having the forward and the reverse strands makes it possible to lengthen the sequence, thanks to the beginning of the reverse strand, which is usually correctly sequenced.
Use the command _illuminapairedend_ to merge the forward and reverse reads in your data. The command aligns the complementary strands in order to get a longer sequence. In practice, the last bases of a read are rarely sequenced correctly, so having both the forward and the reverse strands makes it possible to lengthen the sequence, thanks to the beginning of the reverse strand, which is usually sequenced correctly.
# a new .fastq file is created, it contains the sequences after the merging of forward and reverse strands
# a new .fastq file is created, it contains the sequences after the paired-end merging of forward and reverse sequences which have an alignment score higher than 40 (--score-min=40)
# alignments which have a score higher than 40 (--score-min=40) are merged and annotated "aligned", while alignments with a lower score are concatenated and annotated "joined"
```
```
To keep only the sequences which have been aligned, use _obigrep_ :
To keep only the sequences which have been merged, use _obigrep_ :
Now you have as many files as samples, containing paired-end and demultiplexed sequences.
Now you have as many files as samples, containing merged paired-end and demultiplexed sequences.
<a name="step3"></a>
<a name="step3"></a>
## STEP 3 : Dereplication (OBITools)
## STEP 3 : Dereplication
Now that you have the sequences corresponding to the barcode you want to study, dereplicate them to keep only the distinct amplicons, with their abundance stored in the header :
Now that you have the sequences corresponding to the barcode you want to study, dereplicate them with _obiuniq_ to keep only the distinct amplicons, with their abundance stored in the header :
```
```
obiuniq Aquarium_2.fastq > Aquarium_2.uniq.fasta
obiuniq Aquarium_2.fastq > Aquarium_2.uniq.fasta
```
```
This command also transforms _fastq_ files into fasta format.
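To make the idea of dereplication concrete, here is a minimal sketch that uses only `awk`, not _obiuniq_ itself. It assumes a hypothetical toy file `toy.fasta` in which every sequence sits on a single line, and it writes the abundance as a `count=N;` attribute, which only approximates the OBITools header style. It is an illustration of the principle, not a description of how _obiuniq_ is implemented.

```shell
# Toy input (hypothetical data): four reads, three of them identical.
cat > toy.fasta <<'EOF'
>read1
ACGTACGT
>read2
ACGTACGT
>read3
TTGCA
>read4
ACGTACGT
EOF

# Collapse identical sequences and record their abundance in the header.
# Assumption: each sequence is written on one line.
awk '
  /^>/ { next }                                   # ignore the original headers
  { if (!($0 in count)) order[++n] = $0; count[$0]++ }
  END {
    for (i = 1; i <= n; i++)
      printf ">seq%d count=%d;\n%s\n", i, count[order[i]], order[i]
  }
' toy.fasta > toy.uniq.fasta

cat toy.uniq.fasta
```

The three identical reads end up as a single record with `count=3;`, and the remaining read as a record with `count=1;`.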
<a name="step4"></a>
<a name="step4"></a>
## STEP 4 : Filtering (OBITools)
## STEP 4 : Filtering
The _obigrep_ command filters the sequences according to criteria that you can choose, such as the sequence length or the abundance of the amplicons :
The _obigrep_ command filters the sequences according to criteria that you can choose, such as the sequence length or the abundance of the amplicons :
# "-l 20" option filters sequences with a length shorter than 20 bp
# "-l 20" option eliminates sequences with a length shorter than 20 bp
# "-p "'count>=10'" option filters sequences with an abundance inferior to 10
# "-p 'count>=10'" option eliminates sequences with an abundance inferior to 10
```
```
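The effect of the two criteria can be illustrated without OBITools. The sketch below is an `awk` approximation, assuming a hypothetical file `toy.counts.fasta` with single-line sequences and a `count=N;` attribute in each header. The variable names `minlen` and `mincount` are chosen here for illustration and are not _obigrep_ options.

```shell
# Toy input (hypothetical data): one record passes both criteria,
# one is too rare, one is too short.
cat > toy.counts.fasta <<'EOF'
>seq1 count=25;
ACGTACGTACGTACGTACGTACGT
>seq2 count=3;
ACGTACGTACGTACGTACGTACGT
>seq3 count=40;
ACGTACGT
EOF

# Keep records with length >= minlen and abundance >= mincount.
awk -v minlen=20 -v mincount=10 '
  /^>/ {
    header = $0
    count = 0
    # extract the number that follows "count=" in the header
    if (match($0, /count=[0-9]+/))
      count = substr($0, RSTART + 6, RLENGTH - 6) + 0
    next
  }
  length($0) >= minlen && count >= mincount { print header; print $0 }
' toy.counts.fasta > toy.filtered.fasta

cat toy.filtered.fasta
```

Only `seq1` survives: `seq2` has an abundance below 10 and `seq3` is shorter than 20 bp.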
<a name="step5"></a>
<a name="step5"></a>
## STEP 5 : Elimination of PCR errors (OBITools)
## STEP 5 : Elimination of PCR errors
_obiclean_ is a command which eliminates point errors introduced during PCR. The algorithm makes pairwise alignments of all the amplicons. It counts the number of differences between the amplicons, and calculates the ratio between the abundance of the 2 amplicons. If there is only one difference (the default, which can be modified) and if the ratio is lower than a chosen threshold, the less abundant amplicon is considered a variant of the more abundant one.
_obiclean_ is a command which eliminates point errors introduced during PCR. The algorithm makes pairwise alignments of all the amplicons. It counts the number of differences between the amplicons, and calculates the ratio between the abundances of the two aligned amplicons. If there is only one difference (the default, which can be modified) and if the ratio is lower than a chosen threshold, the less abundant amplicon is considered a variant of the more abundant one.
Sequences which have variants without being variants themselves are tagged "head". The variants are tagged "internal". The other sequences are tagged "singleton".
Sequences which have variants without being variants themselves are tagged "head". The variants are tagged "internal". The other sequences are tagged "singleton".
# here, the command returns only the sequences tagged "head" by the algorithm, and the ratio retained is 0.05
# here, the command returns only the sequences tagged "head" by the algorithm, and the chosen ratio is 0.05
```
```
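The tagging rule described above can be sketched in a few lines of `awk`. This is a deliberate simplification and not the _obiclean_ implementation: it assumes a hypothetical tab-separated file `toy.clean.tsv` (identifier, abundance, sequence), it only compares sequences of equal length position by position instead of aligning them, and it fixes the number of allowed differences at one. The ratio `r` is taken as the abundance of the rarer amplicon divided by that of the more abundant one.

```shell
# Toy input (hypothetical data): id, abundance, sequence.
# B differs from A at one position and is 50 times less abundant.
printf 'A\t1000\tACGTACGT\nB\t20\tACGTACGA\nC\t500\tTTTTCCCC\n' > toy.clean.tsv

awk -F'\t' -v r=0.05 '
  { id[NR] = $1; n[NR] = $2; s[NR] = $3 }

  # number of mismatching positions; 99 when the lengths differ
  function dist(a, b,    k, d) {
    if (length(a) != length(b)) return 99
    d = 0
    for (k = 1; k <= length(a); k++)
      if (substr(a, k, 1) != substr(b, k, 1)) d++
    return d
  }

  END {
    # i is a variant of j when they differ once and i is rare enough
    for (i = 1; i <= NR; i++)
      for (j = 1; j <= NR; j++)
        if (i != j && dist(s[i], s[j]) == 1 && n[i] / n[j] < r) {
          variant[i] = 1
          parent[j] = 1
        }
    for (i = 1; i <= NR; i++) {
      tag = "singleton"
      if (i in variant) tag = "internal"
      else if (i in parent) tag = "head"
      print id[i] "\t" tag
    }
  }
' toy.clean.tsv > toy.clean.tags.tsv

cat toy.clean.tags.tsv
```

With this toy data, `A` is tagged "head", `B` is tagged "internal" (20/1000 = 0.02, below the 0.05 threshold) and `C` is tagged "singleton".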
<a name="step6"></a>
<a name="step6"></a>
## STEP 6 : Taxonomic assignment (OBITools)
## STEP 6 : Taxonomic assignment
_ecotag_ is a command which assigns each head amplicon to its corresponding taxon. It requires first running [ecoPCR](https://git.metabarcoding.org/obitools/ecopcr/wikis/home) with the primers used to amplify your sequences. That command produces a file containing the taxa which can potentially be amplified by the selected primers. _ecotag_ assigns your sequences to one of these taxa, with a minimum similarity score fixed at a chosen value, so that you can be confident your sequences come from the correct taxon. However, this step is optional if your primers are specific enough.
_ecotag_ is the command which assigns each head amplicon to its corresponding taxon. The algorithm compares the amplicons with the sequences from the reference database. If the similarity score is higher than the chosen threshold, the amplicon is assigned to its "taxid" using the taxonomy database.
# only the sequences with a similarity score higher than 0.5 are annotated
# only the sequences with a similarity score higher than 0.95 are annotated to their corresponding taxon
```
```
Then, after selecting the amplicons corresponding to your studied taxon, you can remove the attributes you do not need. Here, we keep only the amplicon abundance :
Then, after selecting the amplicons corresponding to your studied taxon, you can remove the attributes you do not need. Here, we keep only the amplicon abundance :