... | @@ -18,4 +18,68 @@ This table summarizes the command-line arguments which are using by `mkbdr cureg |
| `--ncbi_taxdump_load` | `-l` | FALSE | load NCBI taxonomy from NCBI taxonomy folder path |
# Example commands
### The simplest case

This command reads `res_faulty_taxon.fasta`, previously generated by `mkbdr validate`, and generates a curation CSV file called `mycure.csv`.

```
mkbdr curegen --fasta res_faulty_taxon.fasta --output_prefix "mycure"
```
### Select a preferred database for Global Names

By default, MKBDR looks up synonyms of genus and family names in the 'FishBase' database through Global Names. Another database, such as 'Catalogue of Life', can be selected with the `--database_globalnames` argument.

```
mkbdr curegen --fasta test_raw_faulty_taxon.fasta \
    --output_prefix "cure" \
    --database_globalnames 'Catalogue of Life'
```
### Using local NCBI taxonomy files

Local NCBI taxonomy files can also be loaded. Here, the folder `path/to/ncbi_taxo` is loaded with the `--ncbi_taxdump` and `--ncbi_taxdump_load` arguments.

```
mkbdr curegen --fasta test_raw_faulty_taxon.fasta \
    --output_prefix "cure" \
    --ncbi_taxdump path/to/ncbi_taxo \
    --ncbi_taxdump_load
```
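Before pointing `--ncbi_taxdump` at a folder, it can help to confirm that the folder looks like an unpacked NCBI taxdump. The sketch below is a hypothetical helper and not part of MKBDR. It assumes the folder should contain `names.dmp` and `nodes.dmp`, the two core files of the NCBI taxdump archive. Which files MKBDR actually reads is not stated here, so adjust `REQUIRED_FILES` if needed.

```python
from pathlib import Path

# Assumption: the two core files of an unpacked NCBI taxdump archive.
# The files MKBDR actually reads are not documented in this section.
REQUIRED_FILES = ("names.dmp", "nodes.dmp")


def missing_taxdump_files(folder):
    """Return the required file names that are absent from `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Same placeholder path as in the command above.
    missing = missing_taxdump_files("path/to/ncbi_taxo")
    if missing:
        print("Missing from taxdump folder:", ", ".join(missing))
    else:
        print("Taxdump folder looks complete.")
```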
## Outputs

The **curegen** module generates a curation CSV file (see the Files definition section for more information). The curation file is a `;`-separated CSV file. It is a matrix with one taxonomy record per row and the following columns:

* `current_name` is the species name of the record in the input FASTA file
* `ncbi_name` is the curated species name of the record
* `genus` is the curated genus name of the record
* `family` is the curated family name of the record
* `ncbi_rank` is the taxonomic level known to NCBI for the curated names (species, genus or family)
* `method` gives optional information about the source of the curation of the record (e.g. Global Names or NCBI synonym search)

Here is an example of a result returned by curegen:

| current_name | ncbi_name | genus | family | ncbi_rank | method |
|---|---|---|---|---|---|
| Albula forsteri | Albula argentea | Albula | Albulidae | species | NCBI synonym score=1.0 |
| Hyporhamphus melanopterus | NA | Hyporhamphus | Hemiramphidae | genus | Catalogue of Life |

* An NCBI synonym has been found for the species name _Albula forsteri_: _Albula argentea_.
* A Catalogue of Life match with NCBI has been found for _Hyporhamphus_ at the genus level.

The curation CSV file can be used with the `mkbdr validate --curate` command to curate faulty taxonomy records.

:warning: This file is a raw product and needs to be checked manually before it is used as input to `mkbdr validate --curate` to curate species names and the local taxonomy.
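To help prioritise the manual check, the curation file can be pre-screened with a short script. The following is a minimal Python sketch, not part of MKBDR. It assumes that the file starts with a header row holding the six column names listed above and that a missing value is written as `NA`, as in the example table. The helper names `read_curation` and `needs_review` are hypothetical. It flags the records for which no species-level NCBI name was found. It does not replace checking the whole file.

```python
import csv
import io

# Assumption: the header row uses exactly the column names listed above.
EXPECTED_COLUMNS = ["current_name", "ncbi_name", "genus",
                    "family", "ncbi_rank", "method"]


def read_curation(handle):
    """Parse a `;`-separated curation file into a list of dicts."""
    reader = csv.DictReader(handle, delimiter=";")
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def needs_review(row):
    """True when no species-level NCBI name was found for the record."""
    # Assumption: a missing value is written as "NA" (or left empty).
    return row["ncbi_name"] in ("", "NA") or row["ncbi_rank"] != "species"


# The two example rows shown above, written as `;`-separated text.
SAMPLE = """current_name;ncbi_name;genus;family;ncbi_rank;method
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Hyporhamphus melanopterus;NA;Hyporhamphus;Hemiramphidae;genus;Catalogue of Life
"""

rows = read_curation(io.StringIO(SAMPLE))
flagged = [r["current_name"] for r in rows if needs_review(r)]
print(flagged)  # ['Hyporhamphus melanopterus']
```

To screen a real file, replace the `io.StringIO(SAMPLE)` handle with one opened on the file, for example `open("mycure.csv", newline="")`.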