Here, we produce a valid reference database using **MKBDR** and test it with ecotag.

## Installation

See [Installing MKBDR](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Installing-MKBDR) for installation instructions.

## Example data

Download example data with:

```
curl -LJO https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/raw/master/tests/data/raw.fasta
```

* `raw.fasta`: a FASTA file of 4 records, each the representative sequence of one of 4 taxon groups. More details about input files [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#representative-sequences).

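As a quick sanity check on the download, the records in a FASTA file can be counted from its header lines. The snippet below is a minimal sketch and not part of MKBDR: `count_records` is a hypothetical helper name, and it assumes every record starts with a single `>` header line.

```shell
# Hypothetical helper (not an MKBDR command): count FASTA records
# by counting the lines that start with ">".
count_records() {
    grep -c '^>' "$1"
}
```

Running `count_records raw.fasta` on the example file should print `4` if the download matches the description above.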
## 1. First validation

The module **[validate](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate)** produces one FASTA file of valid records and two FASTA files of faulty records (faulty format and faulty taxon). The command below reads `tests/data/raw.fasta`, the file's path inside the repository; if you downloaded `raw.fasta` into the current directory with the `curl` command above, pass `--fasta raw.fasta` instead.

```
mkbdr validate --fasta tests/data/raw.fasta --output_prefix res_raw
```

This will output:

```
Checking arguments...done.
Validate records...
Loading local NCBI taxonomy...done.
4 processed records.
On these records, 2 are valid, 0 are faulty format and 2 are faulty taxon.
```

* `res_raw_faulty_format.fasta`: a FASTA file of records with a faulty format (empty in this example)
* `res_raw_faulty_taxon.fasta`: a FASTA file of records with a faulty taxonomy (2 faulty records in this example)
* `res_raw_valide.fasta`: a FASTA file of correct records that can be used as a reference database for taxonomic assignment (2 valid records in this example)

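The counts reported by `validate` can be cross-checked against the output files themselves. The following is a hedged sketch rather than an MKBDR feature: `total_records` is a hypothetical helper name, and it assumes each processed record is written to exactly one of the three output files, which the log suggests but this page does not state explicitly.

```shell
# Hypothetical helper (not an MKBDR command): sum the number of FASTA
# records (lines starting with ">") across all files given as arguments.
total_records() {
    total=0
    for f in "$@"; do
        # grep -c prints 0 but exits non-zero when nothing matches,
        # so "|| true" keeps the loop going for empty files.
        n=$(grep -c '^>' "$f" || true)
        total=$((total + ${n:-0}))
    done
    echo "$total"
}
```

In this example, `total_records res_raw_valide.fasta res_raw_faulty_format.fasta res_raw_faulty_taxon.fasta` would be expected to print `4` (2 + 0 + 2), matching the number of processed records.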
Read more details about output files [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#output-files).

:seedling: Of the 4 records, `validate` flagged 2 as having a faulty taxonomy. A faulty taxonomy means that the species name is unknown in the NCBI taxonomy, so these records cannot be used in a reference database as they are. We need to curate their taxonomy before they can be validated.

## 2. Curation

The module **[curegen](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Curegen)** produces a [curation CSV file](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition).

mkbdr curegen --database_globalnames 'Catalogue of Life' --output_prefix res_raw --fasta res_raw_faulty_taxon.fasta