* `raw.fasta`: a FASTA file of 4 records, the representative sequences of 4 taxon groups. More details about input files [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#representative-sequences).
## 1. First validation
The module **[validate](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate)** produces a FASTA file of valid records and FASTA files of faulty records.
Of the 4 records, 2 are valid, 0 have a faulty format and 2 have a faulty taxon.
* `res_raw_faulty_format.fasta`: a FASTA file with faulty format records (empty in this example)
* `res_raw_faulty_taxon.fasta`: a FASTA file with faulty taxonomy records (2 faulty records in this example)
* `res_raw_valide.fasta`: a FASTA file with correct records that can be used as a reference database for taxonomic assignment (2 valid records in this example)

Read more details about output files [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#output-files).
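
As a quick sanity check on the record counts reported for this example, the number of records in each output file can be counted from the shell. The sketch below is not part of `mkbdr`: `count_records` is a hypothetical helper, and it assumes that every record starts with exactly one header line beginning with `>`.

```shell
# Hypothetical helper (not an mkbdr command): count FASTA records in a file.
# Assumption: each record has exactly one header line starting with '>'.
count_records() {
    # grep -c prints the number of matching lines; it exits with status 1
    # when there is no match, so '|| true' keeps the function's status at 0.
    grep -c '^>' "$1" || true
}

# Print one "file<TAB>count" line per output file that exists.
for f in res_raw_valide.fasta res_raw_faulty_format.fasta res_raw_faulty_taxon.fasta; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_records "$f")"
    fi
done
```

For this example, the counts should match the figures given above: 2 valid records, 0 faulty format records and 2 faulty taxonomy records.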
:seedling: Out of 4 records, `validate` flagged 2 records with a faulty taxonomy. A faulty taxonomy means that the species name is unknown in the NCBI taxonomy, so these records cannot be used in a reference database as they are. We need to curate their taxonomy to validate them.
## 2. Curation
The module **[curegen](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Curegen)** produces a [curation CSV file](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition):

    mkbdr curegen --database_globalnames 'Catalogue of Life' --output_prefix res_raw --fasta res_raw_faulty_taxon.fasta