## Workflow
* inputs:
  * FASTA file
1. Check FASTA format
2. Check species name format
3. Check DNA sequence format
4. Check species name against NCBI taxonomy
5. Attribute NCBI taxid
6. Write `valid`, `faulty_taxon` and `faulty_format` FASTA files
7. Curate species name using `curation` CSV file
8. Write new nodes in the NCBI taxonomy for species that have no attributed taxid
9. Write `valid` FASTA files
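Steps 1 to 3 above can be illustrated with a minimal sketch. This is not MKBDR's actual implementation: the function names (`parse_fasta`, `check_record`) are hypothetical, and the header layout (`>ID Genus species`) is an assumption made for the example only.

```python
import re

# Assumption: the species name is the text after the record ID on the header
# line, written as a Latin binomial ("Genus species").
SPECIES_RE = re.compile(r"^[A-Z][a-z]+ [a-z]+$")
# IUPAC nucleotide codes, upper or lower case.
DNA_RE = re.compile(r"^[ACGTRYSWKMBDHVN]+$", re.IGNORECASE)


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA text.

    Raises ValueError if sequence data appears before the first '>' header.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_record(header, sequence):
    """Return 'valid' or 'faulty_format' for one record (hypothetical helper)."""
    _, _, species = header.partition(" ")
    if not SPECIES_RE.match(species.strip()):
        return "faulty_format"
    # An empty sequence also fails here, because the pattern needs one base.
    if not DNA_RE.match(sequence):
        return "faulty_format"
    return "valid"
```

Records that pass both checks would then go on to the taxonomy lookup (steps 4 and 5). The others would be written to the `faulty_format` file.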
* outputs:
  * formatted FASTA file
  * `.ldx` file of new taxonomy nodes for species with no taxid, each linked to an existing genus or family taxid

In practice, the workflow runs in four stages:

1. `raw fasta` --> validate --> `valid fasta` `faulty_format fasta` `faulty_taxon fasta`
2. `faulty_taxon fasta` --> curegen --> `curation csv`
3. Manually check and correct the `curation csv` table
4. `raw fasta` `curation csv` --> validate --> `valid fasta` and an updated taxonomy
## Credits
**MKBDR** was coded and written by Pierre-Edouard Guerin.
We thank the following people for their help in the development of this software: Virginie Marques, Alice Valentini, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.
## Contributions and Support
## Environment
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "truc" --ncbi_taxdump TAXO/testouille --ncbi_taxdump_edition
mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "laetinicer" --ncbi_taxdump_edition --ncbi_taxdump TAXO/testouille
tar zxvf taxdump.tar.gz
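
Once the archive is extracted, the directory contains `names.dmp`, the NCBI file that maps taxon names to taxids. The sketch below shows one way such a file could be read to attribute a taxid to a species name. It is an illustration under stated assumptions, not MKBDR's own code: `load_scientific_names` and `assign_taxid` are hypothetical helpers.

```python
import os


def load_scientific_names(taxdump_dir):
    """Map each scientific name found in names.dmp to its taxid.

    names.dmp rows have the fields: tax_id, name_txt, unique name, name class.
    Fields are separated by "\t|\t" and each row ends with "\t|".
    """
    name_to_taxid = {}
    path = os.path.join(taxdump_dir, "names.dmp")
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            row = line.rstrip("\n")
            if row.endswith("\t|"):
                row = row[:-2]
            fields = row.split("\t|\t")
            if len(fields) < 4:
                continue  # skip malformed rows
            taxid, name, _unique_name, name_class = fields[:4]
            # Keep scientific names only; synonyms and common names are ignored.
            # Simplification: if two taxa share a name, the last row wins.
            if name_class == "scientific name":
                name_to_taxid[name] = int(taxid)
    return name_to_taxid


def assign_taxid(species, name_to_taxid):
    """Return the taxid for a species name, or None if it is not in the taxonomy."""
    return name_to_taxid.get(species)
```

In a workflow like the one described here, a `None` result would correspond to a record written to the `faulty_taxon` file.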
