... | @@ -175,9 +175,54 @@ The local taxonomy located on `customtaxonomy/` has been edited and all the faul |
* `customtaxonomy`: custom NCBI taxonomy

## 6. Taxonomic assignment with our custom reference database

We use our custom reference database to assign a taxon to each query sequence with [ecotag](https://pythonhosted.org/OBITools/scripts/ecotag.html).
* Install ecotag in a conda environment:

```
# Download the raw YAML file (a /-/blob/ URL returns an HTML page, not the file)
curl -L --create-dirs -o envs/obitools_envs.yaml https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/raw/master/envs/obitools_envs.yaml
conda env create -f envs/obitools_envs.yaml
```
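A GitLab `/-/blob/` URL serves an HTML page rather than the file itself, so it can be worth checking that the downloaded environment file is really YAML before running `conda env create`. The following is a minimal sketch, assuming the file was saved as `envs/obitools_envs.yaml`; the helper name `looks_like_html` is hypothetical and not part of conda or OBITools:

```shell
# Hypothetical helper: succeeds if the start of the file looks like an HTML page.
looks_like_html() {
    head -c 512 "$1" | grep -qiE '<!DOCTYPE html|<html'
}

# Assumed location of the environment file (the path passed to `conda env create -f`).
env_file="envs/obitools_envs.yaml"

if [ ! -s "$env_file" ]; then
    echo "missing or empty: $env_file"
elif looks_like_html "$env_file"; then
    echo "HTML page downloaded instead of YAML: $env_file"
else
    echo "looks fine: $env_file"
fi
```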
Input files are:

* `res_taxo_curated_valide.fasta`: FASTA file containing the reference sequences
* `customtaxonomy/`: directory holding the custom NCBI taxonomy
* `raw.fasta`: FASTA file of query records
```
conda activate obitools
ecotag -t customtaxonomy \
-R res_taxo_curated_valide.fasta \
raw.fasta
```
This outputs:

```
>YCA_R0001; count=1; id_status={'res_taxo_curated_valide': True}; family=30840; species_name=Haemulon flavolineatum; best_match={'res_taxo_curated_valide': 'YCA_R0001'}; taxid_by_db={'res_taxo_curated_valide': 236585}; rank_by_db={'res_taxo_curated_valide': 'species'}; scientific_name=Haemulon flavolineatum; match_count={'res_taxo_curated_valide': 1}; rank=species; taxid=236585; species=236585; order_name=Lutjaniformes; best_identity={'res_taxo_curated_valide': 1.0}; scientific_name_by_db={'res_taxo_curated_valide': 'Haemulon flavolineatum'}; species_list={'res_taxo_curated_valide': ['Haemulon flavolineatum']}; genus_name=Haemulon; family_name=Haemulidae; genus=119374; order=2024539; species_name=Haemulon flavolineatum
ccccaagctcaactagcaacatgcctaaaacacaaaaaatgcaaaggggaggcaagtcgt
aa
>YCA_R0002; count=1; id_status={'res_taxo_curated_valide': True}; family=30828; species_name=Chaetodon ocellatus; best_match={'res_taxo_curated_valide': 'YCA_R0002'}; taxid_by_db={'res_taxo_curated_valide': 466120}; rank_by_db={'res_taxo_curated_valide': 'species'}; scientific_name=Chaetodon ocellatus; match_count={'res_taxo_curated_valide': 1}; rank=species; taxid=466120; species=466120; order_name=Chaetodontiformes; best_identity={'res_taxo_curated_valide': 1.0}; scientific_name_by_db={'res_taxo_curated_valide': 'Chaetodon ocellatus'}; species_list={'res_taxo_curated_valide': ['Chaetodon ocellatus']}; genus_name=Chaetodon; family_name=Chaetodontidae; genus=37948; order=1545895; species_name=Chaetodon ocellatus
ctccaagcctaaaactttttactttactaatgtgctgcaattgtagaggagaggcaagtc
gtaa
>REF-17-0642; count=1; id_status={'res_taxo_curated_valide': True}; family=54907; species_name=Albula argentea; best_match={'res_taxo_curated_valide': 'REF-17-0642;'}; taxid_by_db={'res_taxo_curated_valide': 531982}; rank_by_db={'res_taxo_curated_valide': 'species'}; scientific_name=Albula argentea; match_count={'res_taxo_curated_valide': 1}; rank=species; taxid=531982; species=531982; order_name=Albuliformes; best_identity={'res_taxo_curated_valide': 1.0}; scientific_name_by_db={'res_taxo_curated_valide': 'Albula argentea'}; species_list={'res_taxo_curated_valide': ['Albula argentea']}; genus_name=Albula; family_name=Albulidae; genus=54908; order=54906; species_name=Albula forsteri
cctcgaattacatgagtaacaagtatataagctttaaggtagctataagaggaggtaagt
cgtaa
>YCA_R0449; count=1; id_status={'res_taxo_curated_valide': True}; family=30863; species_name=Stegastes xanthurus; best_match={'res_taxo_curated_valide': 'YCA_R0449;'}; taxid_by_db={'res_taxo_curated_valide': 10000000}; rank_by_db={'res_taxo_curated_valide': 'species'}; scientific_name=Stegastes xanthurus; match_count={'res_taxo_curated_valide': 1}; rank=species; taxid=10000000; species=10000000; order_name=None; best_identity={'res_taxo_curated_valide': 1.0}; scientific_name_by_db={'res_taxo_curated_valide': 'Stegastes xanthurus'}; species_list={'res_taxo_curated_valide': ['Stegastes xanthurus']}; genus_name=Stegastes; family_name=Pomacentridae; genus=80992; order=None; species_name=Stegastes xanthurus
cccctaaatttgacatttaacaatatttaaaaccttgcacaagaaagaggggagaaaagt
cgtaa
```
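The record headers above are long, so a compact table of identifier, final `taxid` and `scientific_name` can make them easier to check. The following is a minimal sketch with `awk`, assuming header fields are separated by `; ` as in the output shown; the function name `summarize_ecotag` is hypothetical, and the sample headers are shortened copies of two records above. To run it on real results, pass the file in which the ecotag output was saved as an argument.

```shell
# Hypothetical helper: print "id<TAB>taxid<TAB>scientific_name" for each FASTA header.
# Reads the files given as arguments, or standard input when none are given.
summarize_ecotag() {
    awk -F'; ' '
        /^>/ {
            id = substr($1, 2); sub(/;$/, "", id)      # drop ">" and any trailing ";"
            taxid = ""; name = ""
            for (i = 2; i <= NF; i++) {
                # Anchored patterns skip taxid_by_db= and scientific_name_by_db=
                if ($i ~ /^taxid=/)           { taxid = substr($i, 7) }
                if ($i ~ /^scientific_name=/) { name = substr($i, 17) }
            }
            sub(/;$/, "", taxid); sub(/;$/, "", name)
            printf "%s\t%s\t%s\n", id, taxid, name
        }' "$@"
}

# Demonstration on shortened headers (sequence lines are ignored).
summarize_ecotag <<'EOF'
>YCA_R0001; count=1; taxid_by_db={'res_taxo_curated_valide': 236585}; scientific_name=Haemulon flavolineatum; rank=species; taxid=236585; species=236585
ccccaagctcaactagcaacatgcctaaaacacaaaaaatgcaaaggggaggcaagtcgt
aa
>YCA_R0449; count=1; taxid_by_db={'res_taxo_curated_valide': 10000000}; scientific_name=Stegastes xanthurus; rank=species; taxid=10000000; species=10000000
cccctaaatttgacatttaacaatatttaaaaccttgcacaagaaagaggggagaaaagt
cgtaa
EOF
```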
In the ecotag output above, all the query records were found in our custom database, including curated records such as `REF-17-0642`, which carries the NCBI synonym `species_name=Albula argentea`, and `YCA_R0449`, which has the custom `taxid=10000000`.

:seedling: We successfully made a custom reference database and tested it.

We will use