The **validate** module produces a FASTA file of valid records with taxid attributes. This FASTA file is compatible with the NCBI taxonomy, so it can be used for further analysis such as taxonomic assignment with ecotag.
### Usage
# Inputs
* FASTA file of species representative barcode sequences (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#representative-sequences))
* (optional) NCBI taxonomy files (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#ncbi-taxonomies-file))
* (optional) Curation CSV file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#curation-file))
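
The `--curate` description requires the curation file header to be `current_name;ncbi_name;genus;family;ncbi_rank`. A minimal Python sketch of a pre-flight check for that header follows. The function name `has_valid_curation_header` is hypothetical and not part of `mkbdr`, and the tolerance for surrounding whitespace is an assumption.

```python
import csv
import io

# Header required by the --curate description; the semicolon delimiter comes from it.
EXPECTED_HEADER = ["current_name", "ncbi_name", "genus", "family", "ncbi_rank"]


def has_valid_curation_header(handle):
    """Return True if the first row of a semicolon-separated file matches EXPECTED_HEADER."""
    reader = csv.reader(handle, delimiter=";")
    first_row = next(reader, None)  # None when the file is empty
    if first_row is None:
        return False
    # Assumption: surrounding whitespace in header fields is tolerated.
    return [field.strip() for field in first_row] == EXPECTED_HEADER


# Hypothetical file content, for illustration only.
sample = io.StringIO("current_name;ncbi_name;genus;family;ncbi_rank\n")
print(has_valid_curation_header(sample))
```

With a real file, pass `open(path, newline="")` instead of the in-memory sample.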
# Command-line Arguments
This table summarizes the command-line arguments used by `mkbdr validate`.
| Long flag | Short flag | Default value | Summary |
| --- | --- | --- | --- |
| `--fasta` | `-f` | NA | path to the input barcode sequences FASTA file |
| `--curate` | `-c` | NA | path to the input taxonomy curation CSV file. The header must be `current_name;ncbi_name;genus;family;ncbi_rank`. A curation CSV file can be generated with `mkbdr curegen` |
| `--ncbi_taxdump` | `-n` | NA | path to the NCBI taxonomy folder |
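
As a rough illustration of how the flags in the table combine, here is a minimal Python sketch that assembles a `mkbdr validate` invocation. The helper `build_validate_command` and the file paths are hypothetical. Treating `--fasta` as always present and the other two flags as optional is an assumption based on the inputs list.

```python
import shlex


def build_validate_command(fasta, curate=None, ncbi_taxdump=None):
    """Assemble the argument list for `mkbdr validate` from the flags in the table."""
    command = ["mkbdr", "validate", "--fasta", fasta]
    if curate is not None:
        command += ["--curate", curate]
    if ncbi_taxdump is not None:
        command += ["--ncbi_taxdump", ncbi_taxdump]
    return command


# Hypothetical paths, for illustration only.
command = build_validate_command(
    "barcodes.fasta", curate="curation.csv", ncbi_taxdump="taxdump"
)
print(shlex.join(command))
# prints: mkbdr validate --fasta barcodes.fasta --curate curation.csv --ncbi_taxdump taxdump
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where `mkbdr` is installed.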
# Outputs
* Valid FASTA file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#valid-fasta-file))
* Faulty taxonomy FASTA file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#faulty-taxonomy-fasta-file))
* Faulty format FASTA file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#faulty-format-fasta-file))
* Edited taxonomy files: with the `--curate`, `--ncbi_taxdump_edition` and `--ncbi_taxdump` options, the validate module edits the taxonomy files referenced by `--ncbi_taxdump` so that nodes are added to the tree of life as specified in the curation file.
To perform taxonomic assignment in further analysis, you need the valid FASTA file and the corresponding taxonomy files.
If the faulty format FASTA file is not empty, you have to manually correct the records in your input FASTA file and run `mkbdr validate` again.
If the faulty taxonomy FASTA file is not empty, you have to correct the species names of the records in the input FASTA file and run `mkbdr validate` again. You can correct species names manually, or use `mkbdr curegen` to generate a curation CSV file and pass it to `mkbdr validate --curate`, which applies the curation to your input FASTA file.
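
Whether a faulty FASTA file is empty can be checked by counting its records. The Python sketch below does this by counting header lines. The function name `count_fasta_records` is hypothetical, and the sample content stands in for a real faulty output file, whose name is not assumed here.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Hypothetical content standing in for a faulty FASTA output file.
faulty = io.StringIO(">record_1\nACGT\n>record_2\nTTGA\n")
remaining = count_fasta_records(faulty)
if remaining:
    print(f"{remaining} record(s) to correct before running mkbdr validate again")
else:
    print("nothing to correct")
```

With a real output file, pass `open(path)` instead of the in-memory sample.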