The **validate** module produces a FASTA file of valid records annotated with taxid attributes. This FASTA file is compatible with the NCBI taxonomy, so it can be used in further analyses such as taxonomic assignment with ecotag.
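ecotag is part of OBITools, whose FASTA headers conventionally store attributes as `key=value;` pairs after the sequence identifier. The exact header layout written by `validate` is specified on the Files-definition wiki page rather than here. The sketch below therefore *assumes* that OBITools-style layout and only extracts the `taxid` attribute from a header line. `parse_taxid` is a hypothetical helper, not part of mkbdr.

```python
import re
from typing import Optional

# Assumption: OBITools-style header, e.g. ">seq1 taxid=9606; species_name=Homo sapiens;"
# The \b word boundary keeps keys such as "species_taxid" or "ncbi_taxid" from matching.
_TAXID_RE = re.compile(r"\btaxid=(\d+);")


def parse_taxid(header: str) -> Optional[int]:
    """Return the integer taxid attribute of a FASTA header line, or None if absent."""
    match = _TAXID_RE.search(header)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_taxid(">seq1 taxid=9606; species_name=Homo sapiens;"))  # 9606
    print(parse_taxid(">seq2 species_name=Unknown sp.;"))  # None
```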
### Usage
# Inputs

* Species-representative barcode sequences FASTA file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#representative-sequences))
* (optional) NCBI taxonomy files (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#ncbi-taxonomies-file))
* (optional) Curation CSV file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#curation-file))
# Command-line Arguments
This table summarizes the command-line arguments used by COAT.

| Long flag | Short flag | Default value | Summary |
| --- | --- | --- | --- |
| `--fasta` | `-f` | NA | Path to the input barcode sequences FASTA file |
| `--output_prefix` | `-o` | NA | Prefix for the output file names |
| `--curate` | `-c` | NA | Path to the input taxonomy curation CSV file. The header must be `current_name;ncbi_name;genus;family;ncbi_rank`. A curation CSV file can be generated with `mkbdr curegen`. |
| `--ncbi_taxdump` | `-n` | NA | Path to the NCBI taxonomy folder |
| `--ncbi_taxdump_load` | `-l` | FALSE | Load the NCBI taxonomy from the NCBI taxonomy folder path |
| `--ncbi_taxdump_edition` | `-e` | FALSE | Allow curation to edit the NCBI taxonomy files in order to add new taxonomy nodes |
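`--curate` expects an exact, semicolon-separated header. It can therefore be worth checking a hand-edited curation file before running `validate`. Below is a minimal sketch that uses only Python's standard `csv` module. `curation_header_error` is a hypothetical helper, and it checks the header row only, not the content of the rows.

```python
import csv
import io
from typing import Optional, TextIO

# Header required by --curate (see the argument table): semicolon-separated, in this order.
EXPECTED_HEADER = ["current_name", "ncbi_name", "genus", "family", "ncbi_rank"]


def curation_header_error(handle: TextIO) -> Optional[str]:
    """Return a description of what is wrong with the header row, or None if it is valid.

    Surrounding whitespace around each column name is ignored.
    """
    reader = csv.reader(handle, delimiter=";")
    header = next(reader, None)
    if header is None:
        return "file is empty"
    header = [column.strip() for column in header]
    if header != EXPECTED_HEADER:
        return f"expected header {';'.join(EXPECTED_HEADER)!r}, found {';'.join(header)!r}"
    return None


if __name__ == "__main__":
    good = io.StringIO("current_name;ncbi_name;genus;family;ncbi_rank\n")
    comma_separated = io.StringIO("current_name,ncbi_name,genus,family,ncbi_rank\n")
    print(curation_header_error(good))  # None
    print(curation_header_error(comma_separated))
```

For a real file, open it with `open("curation.csv", newline="", encoding="utf-8-sig")`. This avoids a spurious mismatch caused by the byte-order mark that some spreadsheet programs add at the start of CSV files.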
# Example of commands
#### The simplest case:
```
mkbdr validate --fasta raw.fasta --output_prefix res --ncbi_taxdump path/to/an/o
...
```
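When `validate` runs inside a larger pipeline, the command line can be assembled programmatically from the long flags in the argument table. This sketch assumes that `--ncbi_taxdump_load` and `--ncbi_taxdump_edition` are switches that take no value (their default is FALSE) and that `mkbdr` is on the `PATH`. `build_validate_command`, `run_validate` and the example paths are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def build_validate_command(
    fasta: str,
    output_prefix: str,
    ncbi_taxdump: Optional[str] = None,
    curate: Optional[str] = None,
    load_taxdump: bool = False,
    edit_taxdump: bool = False,
) -> List[str]:
    """Assemble an argument list for `mkbdr validate` from the documented long flags."""
    cmd = ["mkbdr", "validate", "--fasta", fasta, "--output_prefix", output_prefix]
    if ncbi_taxdump is not None:
        cmd += ["--ncbi_taxdump", ncbi_taxdump]
    if curate is not None:
        cmd += ["--curate", curate]
    # Assumption: both flags below are switches without a value (default FALSE).
    if load_taxdump:
        cmd.append("--ncbi_taxdump_load")
    if edit_taxdump:
        cmd.append("--ncbi_taxdump_edition")
    return cmd


def run_validate(**kwargs) -> int:
    """Run the command (requires `mkbdr` on PATH) and return its exit status."""
    cmd = build_validate_command(**kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Only prints the command line; nothing is executed here.
    cmd = build_validate_command(
        "raw.fasta",
        "res",
        ncbi_taxdump="path/to/taxdump",
        curate="curation.csv",
        edit_taxdump=True,
    )
    print(shlex.join(cmd))
    # mkbdr validate --fasta raw.fasta --output_prefix res --ncbi_taxdump path/to/taxdump --curate curation.csv --ncbi_taxdump_edition
```

According to the Outputs section, the taxonomy files are edited only when `--curate`, `--ncbi_taxdump_edition` and `--ncbi_taxdump` are all given. A caller setting `edit_taxdump=True` would therefore normally pass the other two arguments as well.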
# Outputs
* Valid FASTA file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#valid-fasta-file))
* Faulty taxonomy FASTA file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#faulty-taxonomy-fasta-file))
* Faulty format FASTA file (see description [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#faulty-format-fasta-file))
* Edited taxonomy files: when `--curate`, `--ncbi_taxdump_edition` and `--ncbi_taxdump` are all given, the validate module edits the taxonomy files in the folder pointed to by `--ncbi_taxdump`, adding nodes to the tree of life as specified in the curation file.
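To confirm that nodes added through curation are attached to the tree of life, you can walk their parent links up to the root. The sketch below assumes the edited folder keeps NCBI's standard `nodes.dmp` layout, with fields separated by `\t|\t` and starting with taxid, parent taxid and rank. Whether `validate` writes its new nodes to that file is also an assumption. `load_nodes` and `lineage` are hypothetical helpers.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_nodes(handle: TextIO) -> Dict[int, Tuple[int, str]]:
    """Map each taxid to (parent_taxid, rank), read from an NCBI nodes.dmp stream."""
    nodes: Dict[int, Tuple[int, str]] = {}
    for line in handle:
        # nodes.dmp rows look like "taxid\t|\tparent\t|\trank\t|\t...\t|"
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def lineage(nodes: Dict[int, Tuple[int, str]], taxid: int) -> List[int]:
    """Return the taxids from `taxid` up to the root.

    Raises KeyError if a node on the way is missing (not attached to the tree)
    and ValueError if the parent links form a cycle.
    """
    path = [taxid]
    while True:
        parent, _rank = nodes[taxid]
        if parent == taxid:  # in NCBI nodes.dmp the root (taxid 1) is its own parent
            return path
        if parent in path:
            raise ValueError(f"cycle detected at taxid {parent}")
        path.append(parent)
        taxid = parent


if __name__ == "__main__":
    # Toy taxonomy with made-up taxids, for illustration only.
    toy = io.StringIO(
        "1\t|\t1\t|\tno rank\t|\n"
        "10\t|\t1\t|\tgenus\t|\n"
        "20\t|\t10\t|\tspecies\t|\n"
    )
    print(lineage(load_nodes(toy), 20))  # [20, 10, 1]
```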

To perform taxonomic assignment in further analyses, you need the valid FASTA file and the corresponding taxonomy files.

If the faulty format FASTA file is not empty, you must manually correct those records in your input FASTA file and run `mkbdr validate` again.

If the faulty taxonomy FASTA file is not empty, you must correct the species names of those records in the input FASTA file and run `mkbdr validate` again. You can correct the species names manually. Alternatively, use `mkbdr curegen` to generate a curation CSV file; passing that file to `mkbdr validate --curate` applies the curation to your input FASTA file.
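Both cases above start by checking whether a faulty FASTA file contains any records. The exact file names derived from `--output_prefix` are described on the Files-definition wiki page rather than here, so this sketch takes the paths as command-line arguments. The script and its `count_fasta_records` helper are hypothetical. A record is counted for every line starting with `>`, since that is how each FASTA record begins.

```python
import sys
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records, i.e. lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def main(paths: Iterable[str]) -> int:
    """Report each file; return 1 if any file still contains records, else 0."""
    needs_fixing = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            n_records = count_fasta_records(handle)
        if n_records:
            needs_fixing = True
            print(f"{path}: {n_records} record(s) to correct before re-running mkbdr validate")
        else:
            print(f"{path}: empty, nothing to correct")
    return 1 if needs_fixing else 0


if __name__ == "__main__":
    # Non-zero exit status lets a calling shell script stop before the next step.
    if main(sys.argv[1:]):
        sys.exit(1)
```

Usage (placeholder paths): `python check_faulty.py <faulty format FASTA> <faulty taxonomy FASTA>`.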