name_txt -- name itself
unique name -- the unique variant of this name if name not unique
name class -- (synonym, common name, ...)
```
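
Each `names.dmp` row is plain text. Fields are separated by a tab, a pipe and a tab, and each row ends with a tab and a pipe. The following is a minimal sketch in Python, not MKBDR's own code, that parses these rows and indexes them by name. `parse_names_dmp`, `index_names` and `NameRecord` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple, Set

FIELD_SEP = "\t|\t"  # names.dmp separates fields with tab, pipe, tab
LINE_END = "\t|"     # ...and terminates each row with tab, pipe


class NameRecord(NamedTuple):
    tax_id: int
    name_txt: str
    unique_name: str  # often empty in names.dmp
    name_class: str   # e.g. "scientific name", "synonym"


def parse_names_dmp(lines: Iterable[str]) -> Iterator[NameRecord]:
    """Yield one NameRecord per non-empty row of a names.dmp file."""
    for line in lines:
        row = line.rstrip("\r\n")
        if row.endswith(LINE_END):
            row = row[: -len(LINE_END)]
        if not row:
            continue
        tax_id, name_txt, unique_name, name_class = row.split(FIELD_SEP)
        yield NameRecord(int(tax_id), name_txt, unique_name, name_class)


def index_names(records: Iterable[NameRecord]) -> Dict[str, Set[int]]:
    """Map each name (any name class) to the set of taxids that carry it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rec in records:
        index[rec.name_txt].add(rec.tax_id)
    return dict(index)


if __name__ == "__main__":
    # Illustrative rows built from taxids shown in this README.
    sample = [
        "36214\t|\tAbudefduf concolor\t|\t\t|\tscientific name\t|\n",
        "36213\t|\tAbudefduf\t|\t\t|\tscientific name\t|\n",
    ]
    names = index_names(parse_names_dmp(sample))
    print(names["Abudefduf concolor"])  # {36214}
```

The index keeps every name class, so synonyms map to their taxid too. Filter on `name_class == "scientific name"` if only accepted names should count.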

# Output files

#### Valid FASTA file

This is the "good" output of MKBDR: a FASTA file containing only valid records (correct format and taxonomy), annotated with NCBI taxid information.

This FASTA file can be used as input to perform taxonomic assignment with software such as [ecotag](https://pythonhosted.org/OBITools/scripts/ecotag.html).

In addition to the `species_name` attribute, MKBDR adds the following attributes:

* `taxid`: the NCBI taxonomy identifier of the species
* `genus_taxid`: the NCBI taxonomy identifier of the genus
* `genus_name`: the NCBI genus name
* `family_taxid`: the NCBI taxonomy identifier of the family
* `family_name`: the NCBI family name
* `scientific_name`: the NCBI scientific name
* `rank`: the known rank for this record (species, genus, family, etc.)
* the record description, which can be `valid`, `curated` or `new taxon node`:
    * `valid` means that the record taxonomy and format are correct.
    * `curated` means that the record taxonomy was wrong and MKBDR performed a curation (`--curate` option) to replace the wrong species name with the NCBI one.
    * `new taxon node` means that the record taxonomy was wrong and no synonym was found in NCBI, so MKBDR created a new node in your local NCBI taxonomy. For further analysis, taxonomic assignment with ecotag will only work with these records if you also use the edited NCBI taxonomy generated by MKBDR.

* Example of a valid FASTA file output:

```
>ID1 taxid=36214; species_name=Abudefduf concolor; genus_taxid=36213; genus_name=Abudefduf; family_taxid=30863; family_name=Pomacentridae; scientific_name=Abudefduf concolor; rank=species; valid
CCCCGAGCTAACATGAATGTATTCTTAATAACCAACACCTGCAAAGGGGAGGCAAGTCGT
>ID2; taxid=531982; species_name=Albula argentea; genus_taxid=54908; genus_name=Albula; family_taxid=54907; family_name=Albulidae; scientific_name=Albula argentea; rank=species; curated
CCTCGAATTACATGAGTAACAAGTATATAAGCTTTAAGGTAGCTATAAGAGGAGGTAAGT
>ID3; taxid=10000015; species_name=Rhinobatos sainsburyi; genus_taxid=7861; genus_name=Rhinobatos; family_taxid=7860; family_name=Rhinobatidae; scientific_name=Rhinobatos sainsburyi; rank=species; new taxon node
CCTCAACACAAAAAAATCACTACATAAACAAACTTAACCAACAAGAGGAGGCAAGTCGTA
```
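
To post-process these records, you may want to read the attributes and the trailing status back out of each header. The following is a minimal sketch. It assumes headers follow the `key=value;` layout of the example above, where the sequence id is followed by a space or `;` and the status is the remaining free text. `parse_header` and `needs_edited_taxonomy` are hypothetical helpers, not part of MKBDR.

```python
import re
from typing import Dict, NamedTuple

# Sequence identifier: everything up to the first space or ';'.
_ID_RE = re.compile(r"([^\s;]+)[\s;]*")


class Header(NamedTuple):
    seq_id: str
    attributes: Dict[str, str]  # key=value pairs, e.g. taxid, rank
    description: str            # remaining free text, e.g. "valid"


def parse_header(line: str) -> Header:
    """Split an MKBDR-style FASTA header into id, key=value attributes and description."""
    text = line.strip()
    if not text.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    match = _ID_RE.match(text, 1)  # start right after '>'
    if match is None:
        raise ValueError(f"missing sequence id: {line!r}")
    attributes: Dict[str, str] = {}
    free_text = []
    for chunk in text[match.end():].split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, sep, value = chunk.partition("=")
        key = key.strip()
        if sep and key and " " not in key:
            attributes[key] = value.strip()
        else:
            free_text.append(chunk)  # e.g. "valid", "curated", "new taxon node"
    return Header(match.group(1), attributes, "; ".join(free_text))


def needs_edited_taxonomy(header: Header) -> bool:
    """True for records that ecotag can only use with MKBDR's edited taxonomy."""
    return header.description == "new taxon node"


if __name__ == "__main__":
    h = parse_header(">ID3; taxid=10000015; species_name=Rhinobatos sainsburyi; rank=species; new taxon node")
    print(h.seq_id, h.attributes["taxid"], needs_edited_taxonomy(h))  # ID3 10000015 True
```

Splitting on `;` rather than on whitespace keeps two-word values such as `Abudefduf concolor` intact.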

#### Faulty taxonomy FASTA file

This file has the same format as the input FASTA file but contains only the records with a faulty taxonomy: their format is correct, but their species name was not found in NCBI.

You can use this file as input to `mkbdr curegen` to generate a curation table.

* Example of a faulty taxonomy FASTA file output:

```
>ID1; species_name=Mullus_wrongus; faulty taxonomy: species name Mullus wrongus not found in NCBI
CCTCAAACATTTATATACATATATCCATAAAAAGAAATACTGAACAAGAGGAGGCAAGTC
```
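
"Not found in NCBI" can be pictured as a name lookup. The following is a minimal sketch, assuming the check is an exact match of the species name, with `_` treated as a space, against a set of NCBI names. MKBDR's actual matching rules, such as case handling or whether synonyms count, may differ. `normalize_species_name` and `split_by_taxonomy` are hypothetical helpers.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str]  # (sequence id, species_name attribute)


def normalize_species_name(name: str) -> str:
    """Treat '_' and spaces as the same word separator, collapsing repeats."""
    return " ".join(name.replace("_", " ").split())


def split_by_taxonomy(
    records: Iterable[Record], known_names: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Return (found, faulty): faulty records have a name absent from known_names."""
    found: List[Record] = []
    faulty: List[Record] = []
    for seq_id, species_name in records:
        if normalize_species_name(species_name) in known_names:
            found.append((seq_id, species_name))
        else:
            faulty.append((seq_id, species_name))
    return found, faulty


if __name__ == "__main__":
    # Stand-in for the NCBI names; a real run would load them from names.dmp.
    ncbi_names = {"Abudefduf concolor", "Albula argentea"}
    found, faulty = split_by_taxonomy(
        [("ID0", "Abudefduf_concolor"), ("ID1", "Mullus_wrongus")], ncbi_names
    )
    print(faulty)  # [('ID1', 'Mullus_wrongus')]
```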

#### Faulty format FASTA file

This file has the same format as the input FASTA file but contains only the records with a faulty format. Each format fault is described in the record's description line. You will have to manually edit your input FASTA file to correct the format.

* Example of a faulty format FASTA file output:

```
>ID07 ; species_name=Mullus sp. surmetuls ; faulty species name format Mullus sp. surmetuls
CCTCAAACATTTATATACATATATCCATAAAAAGAAATACTGAACAAGAGGAGGCAAGTC
>ID19 ; species_name=Mullus surmueltus ; faulty DNA sequence
CNNNCTCAACACAAAAAAATCACTACATAAACAAACTT--CCAACAAGAGGAGGCAAGTC
```

* The ID07 species name has a faulty format (expected: two words, _Genus_ _species_, separated by a space or an underscore `_`).
* The ID19 DNA sequence has a faulty format (it contains IUPAC ambiguity codes `NNN` and gaps `-`).
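
These two format rules can be written as simple pattern checks, which helps when fixing the input file by hand. The following is a minimal sketch. It assumes a species name must be exactly two non-empty words and that sequences may only contain A, C, G and T. Accepting lowercase bases is an extra assumption. MKBDR's own validation may apply further rules, such as capitalization. The function names are hypothetical.

```python
import re
from typing import List

# Two words (Genus species) separated by exactly one space or underscore.
_SPECIES_NAME_RE = re.compile(r"[^\s_]+[ _][^\s_]+")
# Unambiguous nucleotides only (lowercase accepted: an assumption);
# IUPAC ambiguity codes such as N and gaps '-' fail the check.
_DNA_RE = re.compile(r"[ACGTacgt]+")


def is_valid_species_name(name: str) -> bool:
    return _SPECIES_NAME_RE.fullmatch(name.strip()) is not None


def is_valid_sequence(sequence: str) -> bool:
    return _DNA_RE.fullmatch(sequence.strip()) is not None


def format_faults(species_name: str, sequence: str) -> List[str]:
    """List the format problems of one record (empty list if none)."""
    faults = []
    if not is_valid_species_name(species_name):
        faults.append("species name")
    if not is_valid_sequence(sequence):
        faults.append("DNA sequence")
    return faults


if __name__ == "__main__":
    print(format_faults("Mullus sp. surmetuls", "CCTCAAACATTTATATAC"))  # ['species name']
    print(format_faults("Mullus surmueltus", "CNNNCTCAAC--CCAAC"))  # ['DNA sequence']
```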