Commit b7831f8f authored by peguerin

readme update

parent d8163619
......@@ -4,11 +4,11 @@
Scripts to convert FASTA files into reference database linked to NCBI taxonomy.
Scripts to convert FASTA files into reference database with NCBI taxonomy.
## Introduction
scripts to create our own reference database with our own sequences only and using the NCBI taxonomy
**mkbdr** is a Python program designed to create a reference database from a FASTA file using the NCBI taxonomy. It also provides tools to assist with and perform taxonomy curation on the input FASTA file.
......@@ -17,28 +17,29 @@ scripts to create our own reference database with our own sequences only and usi
* inputs:
* FASTA file
0. get raw fasta files of new sequences with species-names
1. Extract sequence name
2. Check sequence name format
3. Check sequence format (IUPAC ambiguity codes, gaps)
4. Correct NCBI-taxonomy species name (this is semi-automatic)
5. Attribute NCBI-taxonomy taxid
6. Extract names with missing taxid
1. Attribute NCBI-taxonomy taxid of genus
2. Run obitaxonomy command for species without an attributed taxid
7. Write fasta file of sequences with their taxid and complete genus-species name
1. Check FASTA format
2. Check species name format
3. Check DNA sequence format
4. Check species name against NCBI taxonomy
5. Attribute NCBI taxid
6. Write `valid`, `faulty_taxon` and `faulty_format` FASTA files
7. Curate species name using `curation` CSV file
8. Write new nodes in NCBI taxonomy for unattributed taxid species
9. Write `valid` FASTA files
* outputs:
* formatted FASTA file
* .ldx new nodes for missing taxid into the taxonomy to link to existing genus/family taxid
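
The validation steps listed above (FASTA format, species name format, DNA sequence format, taxonomy lookup) can be illustrated with a minimal Python sketch. This is not mkbdr's actual implementation: the function names `parse_fasta` and `classify`, the binomial-name pattern, the rule that gaps are rejected, and the `known_taxids` dictionary standing in for the NCBI taxonomy are all assumptions made for illustration.

```python
import re

# Assumption: sequences may use IUPAC nucleotide codes; gaps ("-") are rejected.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")
# Assumption: a species name is a binomial "Genus species".
SPECIES_RE = re.compile(r"^[A-Z][a-z]+ [a-z]+$")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classify(header, sequence, known_taxids):
    """Return 'valid', 'faulty_format' or 'faulty_taxon' for one record.

    known_taxids is a hypothetical {species name: taxid} mapping that
    stands in for a lookup against the NCBI taxonomy.
    """
    if not SPECIES_RE.match(header):
        return "faulty_format"
    if not sequence or not set(sequence.upper()) <= IUPAC_DNA:
        return "faulty_format"
    if header not in known_taxids:
        return "faulty_taxon"
    return "valid"
```

Each record falls into exactly one of the three categories, which mirrors the `valid`, `faulty_taxon` and `faulty_format` FASTA files written in step 6.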
1. `raw fasta` --> validate --> `valide fasta` `faulty_format fasta` `faulty_taxon fasta`
2. `faulty_taxon fasta` --> curate (currently done by Laetitia) --> `curated_taxon csv`
1. `raw fasta` --> validate --> `valid fasta` `faulty_format fasta` `faulty_taxon fasta`
2. `faulty_taxon fasta` --> curegen --> `curation csv`
3. manually check and correct the `curated_taxon csv` table
3. manually check and correct the `curation csv` table
4. `raw fasta` `curated_taxon csv` --> validate --> `valide fasta` and update of the taxonomy
4. `raw fasta` `curation csv` --> validate --> `valid fasta` and update of the taxonomy
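
As a rough illustration of how a manually checked `curation csv` could feed the second validation pass, the sketch below maps faulty species names to their curated replacements. The column names `original_name` and `curated_name` and both function names are hypothetical; the real layout of the curation file is not described here.

```python
import csv
import io


def load_curation(csv_text):
    """Build a {faulty name: curated name} mapping from CSV text.

    Assumes a header row with the hypothetical columns
    'original_name' and 'curated_name'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["original_name"]: row["curated_name"] for row in reader}


def curate_name(name, curation):
    """Return the curated name if one exists, otherwise the name unchanged."""
    return curation.get(name, name)
```

Names missing from the table pass through unchanged, so records that were already valid are unaffected when the raw FASTA file is validated again.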
## Environment
......