@@ -14,32 +14,29 @@ Scripts to convert FASTA files into reference database with NCBI taxonomy.
## Workflow
* inputs:
* FASTA file

1. Check FASTA format
2. Check species name format
3. Check DNA sequence format
4. Check species name against NCBI taxonomy
5. Attribute NCBI taxid
6. Write `valid`, `faulty_taxon` and `faulty_format` FASTA files
7. Curate species name using `curation` CSV file
8. Write new nodes in NCBI taxonomy for unattributed taxid species
9. Write `valid` FASTA files
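
The format checks in steps 1 to 3 and the split written in step 6 can be sketched as follows. This is a minimal illustration and not the project's actual code. The helper names `parse_fasta` and `classify` are hypothetical. It assumes that each FASTA header holds only a binomial species name (`Genus species`) and that sequences use IUPAC nucleotide codes. The NCBI taxonomy lookup (steps 4 and 5) is left out.

```python
import re

# Assumption: a header is exactly a binomial name such as "Mullus barbatus".
SPECIES_RE = re.compile(r"^[A-Z][a-z]+ [a-z]+$")
# Assumption: sequences contain only IUPAC nucleotide codes (case-insensitive).
DNA_RE = re.compile(r"^[ACGTURYSWKMBDHVN]+$", re.IGNORECASE)


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Sequence lines that appear before the first header are ignored.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classify(records):
    """Split records into (valid, faulty_format) lists."""
    valid, faulty_format = [], []
    for header, seq in records:
        # An empty sequence fails DNA_RE and lands in faulty_format.
        if SPECIES_RE.match(header) and DNA_RE.match(seq):
            valid.append((header, seq))
        else:
            faulty_format.append((header, seq))
    return valid, faulty_format
```

Records in `faulty_format` would go to the `faulty_format` FASTA file. Records in `valid` would still need the taxonomy check before being written as `valid` or `faulty_taxon`.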
## Credits
* outputs:
* formatted FASTA file
* `.ldx` file of new taxonomy nodes for species without a taxid, each linked to an existing genus/family taxid
**MKBDR** was coded and written by Pierre-Edouard Guerin.
We thank the following people for their help in the development of this software: Virginie Marques, Alice Valentini, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.
3. Manually check and correct the `curation csv` table
## Contributions and Support
:bug: If you are sure you have found a bug, please submit a bug report on GitLab [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/issues).
For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).
[](https://cefebev.slack.com/archives/C01MDQSS57F)
4. `raw fasta` + `curation csv` --> validate --> `valid fasta` and update of the taxonomy