Commit 5510a9ca authored by peguerin

readme update

parent b6393f1a
@@ -8,18 +8,18 @@ Scripts to convert FASTA files into reference database with NCBI taxonomy.
## Introduction
**mkbdr** is a Python program designed to create a reference database from a FASTA file using the NCBI taxonomy. It also provides tools to assist with and perform taxonomic curation of the input FASTA file.
**MKBDR** is a Python program designed to create a reference database from a FASTA file using the NCBI taxonomy. It also provides tools to assist with and perform taxonomic curation of the input FASTA file.
## Workflow
## Method
![mkbdr](docs/mkbdr.png)
## Credits
**MKBDR** was coded and written by Pierre-Edouard Guerin.
**MKBDR** was coded and written by Pierre-Edouard Guerin, Laetitia Mathon and Virginie Marques.
We thank the following people for their help in the development of this software: Virginie Marques, Alice Valentini, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.
@@ -127,43 +127,6 @@ conda activate obitools
ecotag -t TAXO/testouille -R truc_valide.fasta -m 0.95 -r nimp.fasta
```
## Taxonomy
(Thanks to the work of [Guy Leonard](https://github.com/guyleonard/taxdump_edit).)
### Structure of `*.dmp` files
As described in NCBI's `taxdump_readme.txt`, each file stores one record per line, and each record is terminated by `\t|\n` (tab, vertical bar, and newline). Each record consists of one or more fields separated by `\t|\t` (tab, vertical bar, and tab). A brief description of the position and meaning of each field in each file follows.
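Given those two delimiters, a single record can be split into its fields with plain string operations. The sketch below is illustrative only: `parse_dmp_line` is a hypothetical helper, not part of MKBDR, and it assumes each line still carries its `\t|\n` terminator as read from the file.

```python
def parse_dmp_line(line: str) -> list[str]:
    r"""Split one record of an NCBI taxdump *.dmp file into its fields.

    Hypothetical helper for illustration; not part of MKBDR.
    Records end with "\t|\n" and fields are separated by "\t|\t".
    """
    # Drop the trailing newline, then the trailing "\t|" of the terminator.
    record = line.rstrip("\n")
    if record.endswith("\t|"):
        record = record[:-2]
    # Split on the field separator; empty fields are kept as "".
    return record.split("\t|\t")


example = "9606\t|\t9605\t|\tspecies\t|\n"
print(parse_dmp_line(example))  # ['9606', '9605', 'species']
```

Empty fields (common in the `unique name` column of `names.dmp`) come back as empty strings, so field positions stay aligned with the tables below.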
### nodes.dmp
This file represents taxonomy nodes. The description for each node includes the following fields:
```
tax_id -- node id in GenBank taxonomy database
parent tax_id -- parent node id in GenBank taxonomy database
rank -- rank of this node (superkingdom, kingdom, ...)
embl code -- locus-name prefix; not unique
division id -- see division.dmp file
inherited div flag (1 or 0) -- 1 if node inherits division from parent
genetic code id -- see gencode.dmp file
inherited GC flag (1 or 0) -- 1 if node inherits genetic code from parent
mitochondrial genetic code id -- see gencode.dmp file
inherited MGC flag (1 or 0) -- 1 if node inherits mitochondrial gencode from parent
GenBank hidden flag (1 or 0) -- 1 if name is suppressed in GenBank entry lineage
hidden subtree root flag (1 or 0) -- 1 if this subtree has no sequence data yet
comments -- free-text comments and citations
```
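The `parent tax_id` field is what links the nodes into a tree. A minimal sketch, assuming only the first three fields are needed, builds a `tax_id -> (parent tax_id, rank)` map and walks it up to the root, which in NCBI's taxonomy is its own parent. The helpers `split_dmp`, `load_nodes` and `lineage` are hypothetical illustrations rather than MKBDR functions, and the toy records use invented IDs truncated to three fields for brevity.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers for illustration; not part of MKBDR.


def split_dmp(line: str) -> List[str]:
    # Fields are separated by "\t|\t"; each record ends with "\t|\n".
    record = line.rstrip("\n")
    if record.endswith("\t|"):
        record = record[:-2]
    return record.split("\t|\t")


def load_nodes(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map tax_id -> (parent tax_id, rank) from nodes.dmp records."""
    nodes = {}
    for line in lines:
        fields = split_dmp(line)
        tax_id, parent_id, rank = fields[0], fields[1], fields[2]
        nodes[tax_id] = (parent_id, rank)
    return nodes


def lineage(tax_id: str, nodes: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Follow parent links from tax_id up to the root (a node that is its own parent)."""
    path = []
    while True:
        parent_id, rank = nodes[tax_id]
        path.append((tax_id, rank))
        if parent_id == tax_id:
            break
        tax_id = parent_id
    return path


# Toy records with invented IDs, truncated to the first three fields.
toy_nodes = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tgenus\t|\n",
    "100\t|\t10\t|\tspecies\t|\n",
]
nodes = load_nodes(toy_nodes)
print(lineage("100", nodes))  # [('100', 'species'), ('10', 'genus'), ('1', 'no rank')]
```

To load the real file, open `nodes.dmp` in text mode and pass the file handle to `load_nodes`, since iterating over a file object yields one line at a time.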
### names.dmp
The taxonomy names file has the following fields:
```
tax_id -- the id of node associated with this name
name_txt -- name itself
unique name -- the unique variant of this name if name not unique
name class -- (synonym, common name, ...)
```
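Since a single `tax_id` can appear on several rows (synonyms, common names, ...), a typical step is to keep only the row whose `name class` is `scientific name`. The sketch below assumes exactly that filter; `load_scientific_names` is a hypothetical helper, not an MKBDR function, and the two sample rows are illustrative.

```python
from typing import Dict, Iterable, List

# Hypothetical helpers for illustration; not part of MKBDR.


def split_dmp(line: str) -> List[str]:
    # Fields are separated by "\t|\t"; each record ends with "\t|\n".
    record = line.rstrip("\n")
    if record.endswith("\t|"):
        record = record[:-2]
    return record.split("\t|\t")


def load_scientific_names(lines: Iterable[str]) -> Dict[str, str]:
    """Map tax_id -> name_txt, keeping only rows whose name class is 'scientific name'."""
    names = {}
    for line in lines:
        tax_id, name_txt, _unique_name, name_class = split_dmp(line)[:4]
        if name_class == "scientific name":
            names[tax_id] = name_txt
    return names


# Illustrative rows: one scientific name and one common name for the same tax_id.
sample = [
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
]
print(load_scientific_names(sample))  # {'9606': 'Homo sapiens'}
```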
### Taxdump Files
......