Here, we produce a valid reference database using **MKBDR** and test it with ecotag.

## Installation

See [Installing MKBDR](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Installing-MKBDR) for installation instructions.

## Example data

Download example data with:

```
curl -LJO https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/raw/master/tests/data/raw.fasta
```

* `raw.fasta`: a FASTA file containing 4 records, the representative sequences of 4 taxon groups. More details about input files [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#representative-sequences).
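
As a quick sanity check before validation, the number of records in the downloaded file can be counted from its header lines. This is a minimal sketch and not part of MKBDR: `count_fasta_records` is a hypothetical helper, and it assumes the usual FASTA convention that each record starts with a line beginning with `>`.

```shell
# Hypothetical helper (not part of MKBDR): count FASTA records
# by counting the header lines, which start with '>'.
count_fasta_records() {
    # grep -c prints 0 but exits with status 1 when nothing matches,
    # so the exit status is deliberately ignored.
    grep -c '^>' "$1" || true
}

# Only inspect the file if the download above succeeded.
if [ -f raw.fasta ]; then
    count_fasta_records raw.fasta
fi
```

For the example file described above, this should report 4 records.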

## 1. First validation

The **[validate](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate)** module writes valid records to one FASTA file and faulty records to separate FASTA files.

```
mkbdr validate --fasta tests/data/raw.fasta --output_prefix res_raw
```

This prints:

```
Checking arguments...done.
Validate records...
Loading local NCBI taxonomy...done.
4 processed records.
On these records, 2 are valid, 0 are faulty format and 2 are faulty taxon.
```

It also writes three FASTA files:

* `res_raw_faulty_format.fasta`: a FASTA file with faulty format records (empty in this example)
* `res_raw_faulty_taxon.fasta`: a FASTA file with faulty taxonomy records (2 faulty records in this example)
* `res_raw_valide.fasta`: a FASTA file with correct records that can be used as a reference database for taxonomic assignment (2 valid records in this example)

Read more details about output files [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#output-files).
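
To cross-check the summary line against the files on disk, one option is to count the header lines in each output file. This is a sketch under assumptions, not an MKBDR feature: `summarize_outputs` is a hypothetical helper, the file names are the ones listed above for the prefix `res_raw`, and each record is assumed to start with a `>` header line.

```shell
# Hypothetical helper (not part of MKBDR): print the record count of each
# validate output file for a given output prefix, one file per line.
summarize_outputs() {
    prefix=$1
    for suffix in valide faulty_format faulty_taxon; do
        file="${prefix}_${suffix}.fasta"
        if [ -f "$file" ]; then
            # grep -c exits with status 1 on zero matches; ignore that.
            n=$(grep -c '^>' "$file" || true)
            printf '%s\t%s\n' "$file" "$n"
        else
            printf '%s\tmissing\n' "$file"
        fi
    done
}

summarize_outputs res_raw
```

With the example run above, the counts should agree with the printed summary (2 valid, 0 faulty format, 2 faulty taxon).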

:seedling: Of the 4 records, `validate` flagged 2 as having a faulty taxonomy. A faulty taxonomy means that the species name is unknown in the NCBI taxonomy, so these records cannot be used in a reference database. We need to curate their taxonomy before they can pass validation.
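
To see which records need curation, the header lines of the faulty taxonomy file can be listed. This is only a sketch: `list_fasta_headers` is a hypothetical helper, and it makes no assumption about how MKBDR lays out its headers beyond the leading `>`, so it prints everything that follows that character.

```shell
# Hypothetical helper (not part of MKBDR): print each FASTA header
# without its leading '>' so the names to curate are easy to read.
list_fasta_headers() {
    # -n suppresses default output; the p flag prints only lines
    # where the leading '>' was actually removed (header lines).
    sed -n 's/^>//p' "$1"
}

# Only inspect the file if validate has already been run.
if [ -f res_raw_faulty_taxon.fasta ]; then
    list_fasta_headers res_raw_faulty_taxon.fasta
fi
```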

## 2. Curation

The **[curegen](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Curegen)** module produces a [curation CSV file](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition):

mkbdr curegen --database_globalnames 'Catalogue of Life' --output_prefix res_raw --fasta res_raw_faulty_taxon.fasta
|
... | ... | |