Commit 042e0f56 authored by peguerin

update readme

parent ec69b87b
@@ -14,32 +14,29 @@ Scripts to convert FASTA files into a reference database with NCBI taxonomy.
## Workflow
* inputs:
  * FASTA file
![mkbdr](docs/mkbdr.png)
1. Check FASTA format
2. Check species name format
3. Check DNA sequence format
4. Check species names against the NCBI taxonomy
5. Assign NCBI taxids
6. Write `valid`, `faulty_taxon` and `faulty_format` FASTA files
7. Curate species names using the `curation` CSV file
8. Write new nodes into the NCBI taxonomy for species without an assigned taxid
9. Write `valid` FASTA files
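The format checks in steps 1 to 3, and the routing of records to `valid` or `faulty_format` in step 6, can be sketched in Python (the language MKBDR is run with). This is a minimal sketch under assumed conventions, not MKBDR's own code: the header layout `>SEQ_ID Genus species`, the strict binomial regex and the helper names `parse_fasta`, `check_record` and `split_records` are all hypothetical. The taxonomy checks of steps 4 and 5 are left out.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed conventions, for illustration only (not taken from MKBDR):
# headers look like ">SEQ_ID Genus species" and names must be plain binomials.
SPECIES_RE = re.compile(r"[A-Z][a-z]+ [a-z]+")
IUPAC_DNA = set("ACGTRYSWKMBDHVN")  # IUPAC nucleotide codes, gaps not allowed


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Step 1: yield (header, sequence) pairs; reject data before the first header."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_record(header: str, seq: str) -> bool:
    """Steps 2-3: binomial species name and a non-empty IUPAC DNA sequence."""
    parts = header.split(maxsplit=1)
    if len(parts) != 2 or SPECIES_RE.fullmatch(parts[1]) is None:
        return False
    return bool(seq) and set(seq.upper()) <= IUPAC_DNA


def split_records(
    lines: Iterable[str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Step 6 (format part only): route each record to 'valid' or 'faulty_format'."""
    valid: List[Tuple[str, str]] = []
    faulty_format: List[Tuple[str, str]] = []
    for header, seq in parse_fasta(lines):
        target = valid if check_record(header, seq) else faulty_format
        target.append((header, seq))
    return valid, faulty_format
```

The regex is deliberately strict: it would send subspecies trinomials or hyphenated epithets to `faulty_format`, so a real checker would likely need a looser rule.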
* outputs:
  * formatted FASTA file
  * `.ldx` file of new taxonomy nodes for species missing a taxid, linked to existing genus/family taxids
1. `raw fasta` --> validate --> `valid fasta` `faulty_format fasta` `faulty_taxon fasta`
2. `faulty_taxon fasta` --> curegen --> `curation csv`
3. Check and correct the `curation csv` table by hand
4. `raw fasta` `curation csv` --> validate --> `valid fasta` and an updated taxonomy
## Credits
**MKBDR** was coded and written by Pierre-Edouard Guerin.
We thank the following people for their help in the development of this software: Virginie Marques, Alice Valentini, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.
## Contributions and Support
:bug: If you are sure you have found a bug, please submit a bug report. You can submit your bug reports on GitLab [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/issues).
For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).
[![Get help on Slack](https://img.shields.io/badge/slack-cefebev%23basereference-4A154B?logo=slack)](https://cefebev.slack.com/archives/C01MDQSS57F)
## Environment
@@ -120,6 +117,8 @@ python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --outpu
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "truc" --ncbi_taxdump TAXO/testouille --ncbi_taxdump_edition
```
```
mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "laetinicer" --ncbi_taxdump_edition --ncbi_taxdump TAXO/testouille
```
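The `--curate` option takes the hand-corrected curation CSV (here `curated_taxon.csv`). How MKBDR reads that file is not shown in this README, so the following is only a sketch of one way to apply such a mapping. The two column names `original_name` and `curated_name`, the `>SEQ_ID Genus species` header layout, and the helpers `load_curation` and `curate` are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple


# Hypothetical CSV layout (the real curegen output may differ):
# original_name,curated_name
def load_curation(lines: Iterable[str]) -> Dict[str, str]:
    """Read the correction table; rows with an empty correction are skipped."""
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(lines):
        original = (row.get("original_name") or "").strip()
        curated = (row.get("curated_name") or "").strip()
        if original and curated:
            mapping[original] = curated
    return mapping


def curate(
    records: Iterable[Tuple[str, str]], mapping: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Swap the species part of each 'SEQ_ID Genus species' header if a correction exists."""
    out: List[Tuple[str, str]] = []
    for header, seq in records:
        seq_id, _, name = header.partition(" ")
        if name in mapping:
            header = f"{seq_id} {mapping[name]}"
        out.append((header, seq))
    return out
```

With a real file, `csv.DictReader` can read the open handle directly, e.g. `load_curation(open("curated_taxon.csv", newline=""))`. Names without a correction pass through unchanged, so they can still fail the taxonomy check afterwards.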
obitools
@@ -171,4 +170,5 @@ name class -- (synonym, common name, ...)
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
tar zxvf taxdump.tar.gz
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
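The extracted taxdump contains `names.dmp`, whose rows hold `tax_id`, `name_txt`, `unique name` and `name class` separated by `\t|\t` and terminated by `\t|`. The sketch below shows how steps 4, 5 and 8 of the workflow could use it: look up each species by its scientific name and, when it is missing, create a new node under the genus taxid. It is an illustration under stated assumptions, not MKBDR's code: the numbering of new taxids above the current maximum and the helpers `read_scientific_names` and `assign_taxids` are hypothetical, homonyms are not resolved, only the genus (not the family) fallback is handled, and the `.ldx` output layout is not reproduced.

```python
from typing import Dict, Iterable, List, Tuple


def read_scientific_names(lines: Iterable[str]) -> Dict[str, int]:
    """Map every 'scientific name' in names.dmp to its tax_id (last homonym wins)."""
    names: Dict[str, int] = {}
    for line in lines:
        # Rows look like "tax_id\t|\tname_txt\t|\tunique name\t|\tname class\t|\n".
        fields = line.rstrip("\r\n").rstrip("\t|").split("\t|\t")
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[fields[1]] = int(fields[0])
    return names


def assign_taxids(
    species: Iterable[str], sci_names: Dict[str, int]
) -> Tuple[Dict[str, int], List[Tuple[int, int, str]]]:
    """Return species->taxid plus new (taxid, parent_genus_taxid, name) nodes.

    New taxids are numbered above the largest existing one (a hypothetical
    scheme). Species whose genus is unknown get no taxid (faulty_taxon).
    """
    assigned: Dict[str, int] = {}
    new_nodes: List[Tuple[int, int, str]] = []
    next_id = max(sci_names.values(), default=0) + 1
    for name in species:
        if name in assigned:
            continue  # duplicate name already handled
        if name in sci_names:
            assigned[name] = sci_names[name]
            continue
        genus_id = sci_names.get(name.split(" ", 1)[0])
        if genus_id is None:
            continue  # neither species nor genus found
        assigned[name] = next_id
        new_nodes.append((next_id, genus_id, name))
        next_id += 1
    return assigned, new_nodes
```

The full `names.dmp` is large, so loading only the scientific names into a dictionary, as here, keeps memory use down compared with keeping every name class.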