... | ... | @@ -8,7 +8,7 @@ The representative sequences must be stored as a FASTA file. See the definition |
|
|
|
|
|
The FASTA file is a set of records of representative sequences of the taxa you want to put into your custom reference database.
|
|
|
|
|
|
The following format for the description line is required (otherwise MKDIR will consider records as faulty format):
|
|
|
The description line must follow this format (otherwise MKBDR treats the record as malformed):
|
|
|
|
|
|
```
|
|
|
> ID; species_name=Mullus_surmuletus
|
... | ... | @@ -28,7 +28,7 @@ ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC |
|
|
* The `ID` is the unique identifier of the sequence.
|
|
|
* `;` is the delimiter between identifier and species name
|
|
|
* `species_name=` is mandatory and must be the prefix of the species name
|
|
|
* `Mullus surmuletus` is the species name in NCBI taxonom. It have to be exactly the same than the name in NCBI taxonomy otherwise MKBDR will result a taxonomy fault. The name of the species is composed of 2 words _Genus_ and _species_ separated by a delimiter. The delimiter can be `_` or ` `. Otherwise MKBDR will result a format fault.
|
|
|
* `Mullus surmuletus` is the species name in the NCBI taxonomy. It must match the name in the NCBI taxonomy exactly; otherwise MKBDR reports a taxonomy fault. The species name consists of two words, _Genus_ and _species_, separated by a delimiter. The delimiter can be `_` or ` `; otherwise MKBDR reports a format fault.
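
The description-line rules listed above can be checked before running MKBDR with a small script. The following is a minimal sketch in Python, not MKBDR's own validation code: the function name `parse_description_line` and the regular expression are assumptions made for illustration, and the check covers only the format (identifier, `;`, `species_name=`, two words joined by `_` or a space), not whether the name exists in the NCBI taxonomy.

```python
import re

# Hypothetical pattern (an assumption, not taken from MKBDR):
#   ">" ID ";" "species_name=" Genus ("_" or " ") species
# Optional whitespace is tolerated after ">" and around ";".
HEADER_RE = re.compile(
    r"^>\s*(?P<id>[^;\s]+)\s*;\s*species_name="
    r"(?P<genus>[^\s_;]+)[_ ](?P<species>[^\s_;]+)\s*$"
)


def parse_description_line(line):
    """Return (identifier, "Genus species") or None if the format is wrong."""
    match = HEADER_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    # Normalise the delimiter to a single space.
    name = f"{match.group('genus')} {match.group('species')}"
    return match.group("id"), name


if __name__ == "__main__":
    print(parse_description_line("> ID; species_name=Mullus_surmuletus"))
    print(parse_description_line("> ID; species_name=Mullus-surmuletus"))
```

A line that returns `None` here would most likely be rejected with a format fault; a line that passes can still fail the taxonomy check if the name differs from the NCBI entry.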
|
|
|
|
|
|
#### DNA sequence line:
|
|
|
|
... | ... | @@ -44,18 +44,15 @@ You can manually download the NCBI taxonomies file: |
|
|
|
|
|
```
|
|
|
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
|
|
|
tar zxvf taxdump.tar.gz
|
|
|
```
|
|
|
|
|
|
Alternatively MKBDR can download and untar the NCBI taxonomies file at the target path:
|
|
|
Alternatively, MKBDR can download the NCBI taxonomies file to the target path:
|
|
|
|
|
|
```
|
|
|
mkbdr init_ncbi_taxdump --folder_path /target_path/ncbi_tax/
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
#### Structure of *.dmp files
|
|
|
|
|
|
As per NCBI's taxdump_readme.txt: Each of the files store one record in the single line that are delimited by "\t|\n" (tab, vertical bar, and newline) characters. Each record consists of one or more fields delimited by "\t|\t" (tab, vertical bar, and tab) characters. The brief description of field position and meaning for each file follows.
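
To make the delimiters concrete, here is a minimal Python sketch of reading a `*.dmp` file according to the rule quoted above: strip the `"\t|\n"` record terminator, then split on `"\t|\t"`. The helper names `parse_dmp_line` and `read_dmp` are hypothetical and are not part of MKBDR; the sample line in the demo is illustrative only.

```python
def parse_dmp_line(line):
    """Split one taxdump record into its fields."""
    # Remove the record terminator "\t|\n" in two steps so that a final
    # line without a trailing newline is handled as well.
    if line.endswith("\n"):
        line = line[:-1]
    if line.endswith("\t|"):
        line = line[:-2]
    # Fields are separated by tab, vertical bar, tab.
    return line.split("\t|\t")


def read_dmp(path):
    """Yield the list of fields for every record in a *.dmp file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield parse_dmp_line(line)


if __name__ == "__main__":
    # Illustrative record with an empty third field.
    sample = "1\t|\tall\t|\t\t|\tsynonym\t|\n"
    print(parse_dmp_line(sample))
```

Splitting on the full three-character separator rather than on `|` alone keeps empty fields in place, so field positions stay aligned with the per-file descriptions.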
|
... | ... | |