* `customtaxonomy`: custom NCBI taxonomy
## 6. Taxonomic assignment with our custom reference database
We use our custom reference database to assign taxa to query sequences with [ecotag](https://pythonhosted.org/OBITools/scripts/ecotag.html).

* Install ecotag in a conda environment:
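```
# Download the environment file (/-/raw/ serves the file itself; /-/blob/ serves the GitLab HTML page)
curl -LJO https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/raw/master/envs/obitools_envs.yaml
# curl saved the file in the current directory, so create the environment from there
conda env create -f obitools_envs.yaml
```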
Input files are:

* `res_taxo_curated_valide.fasta`: FASTA file containing the reference sequences
* `customtaxonomy/`: directory holding the custom NCBI taxonomy dump
* `raw.fasta`: FASTA file containing the query records
```
conda activate obitools
# -t: custom NCBI taxonomy dump directory; -R: reference sequences; last argument: query sequences
ecotag -t customtaxonomy \
  -R res_taxo_curated_valide.fasta \
  raw.fasta
```
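If `ecotag` fails to start, check first that the three inputs listed above are where the command expects them. The shell function below is a minimal sketch for that check and is not part of OBITools. The name `check_ecotag_inputs` is hypothetical. The function assumes that the custom taxonomy directory still contains the standard NCBI dump files `nodes.dmp` and `names.dmp`. It also prints the number of FASTA records in each file, which gives a cheap comparison against the expected number of references and queries.

```shell
# Hypothetical pre-flight check (not part of OBITools): confirm that the
# taxonomy dump directory and both FASTA files exist before running ecotag.
check_ecotag_inputs() {
  local taxdir=$1 ref=$2 query=$3 f
  # Assumption: an NCBI taxonomy dump includes at least nodes.dmp and names.dmp
  for f in nodes.dmp names.dmp; do
    if [ ! -s "$taxdir/$f" ]; then
      echo "missing or empty: $taxdir/$f" >&2
      return 1
    fi
  done
  # Print each FASTA file name with its number of records ('>' header lines)
  for f in "$ref" "$query"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
    printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
  done
}

# Usage:
#   check_ecotag_inputs customtaxonomy res_taxo_curated_valide.fasta raw.fasta
```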
The `ecotag` command outputs the query records annotated with their taxonomic assignment:
```
>YCA_R0001; count=1; id_status={'res_taxo_curated_valide': True}; family=30840; species_name=Haemulon flavolineatum; best_match={'res_taxo_curated_valide': 'YCA_R0001'}; taxid_by_db={'res_taxo_curated_valide': 236585}; rank_by_db={'res_taxo_curated_valide': 'species'}; scientific_name=Haemulon flavolineatum; match_count={'res_taxo_curated_valide': 1}; rank=species; taxid=236585; species=236585; order_name=Lutjaniformes; best_identity={'res_taxo_curated_valide': 1.0}; scientific_name_by_db={'res_taxo_curated_valide': 'Haemulon flavolineatum'}; species_list={'res_taxo_curated_valide': ['Haemulon flavolineatum']}; genus_name=Haemulon; family_name=Haemulidae; genus=119374; order=2024539; species_name=Haemulon flavolineatum
ccccaagctcaactagcaacatgcctaaaacacaaaaaatgcaaaggggaggcaagtcgt
aa
>YCA_R0002; count=1; id_status={'res_taxo_curated_valide': True}; family=30828; species_name=Chaetodon ocellatus; best_match={'res_taxo_curated_valide': 'YCA_R0002'}; taxid_by_db={'res_taxo_curated_valide': 466120}; rank_by_db={'res_taxo_curated_valide': 'species'}; scientific_name=Chaetodon ocellatus; match_count={'res_taxo_curated_valide': 1}; rank=species; taxid=466120; species=466120; order_name=Chaetodontiformes; best_identity={'res_taxo_curated_valide': 1.0}; scientific_name_by_db={'res_taxo_curated_valide': 'Chaetodon ocellatus'}; species_list={'res_taxo_curated_valide': ['Chaetodon ocellatus']}; genus_name=Chaetodon; family_name=Chaetodontidae; genus=37948; order=1545895; species_name=Chaetodon ocellatus
ctccaagcctaaaactttttactttactaatgtgctgcaattgtagaggagaggcaagtc
gtaa
>REF-17-0642; count=1; id_status={'res_taxo_curated_valide': True}; family=54907; species_name=Albula argentea; best_match={'res_taxo_curated_valide': 'REF-17-0642;'}; taxid_by_db={'res_taxo_curated_valide': 531982}; rank_by_db={'res_taxo_curated_valide': 'species'}; scientific_name=Albula argentea; match_count={'res_taxo_curated_valide': 1}; rank=species; taxid=531982; species=531982; order_name=Albuliformes; best_identity={'res_taxo_curated_valide': 1.0}; scientific_name_by_db={'res_taxo_curated_valide': 'Albula argentea'}; species_list={'res_taxo_curated_valide': ['Albula argentea']}; genus_name=Albula; family_name=Albulidae; genus=54908; order=54906; species_name=Albula forsteri
cctcgaattacatgagtaacaagtatataagctttaaggtagctataagaggaggtaagt
cgtaa
>YCA_R0449; count=1; id_status={'res_taxo_curated_valide': True}; family=30863; species_name=Stegastes xanthurus; best_match={'res_taxo_curated_valide': 'YCA_R0449;'}; taxid_by_db={'res_taxo_curated_valide': 10000000}; rank_by_db={'res_taxo_curated_valide': 'species'}; scientific_name=Stegastes xanthurus; match_count={'res_taxo_curated_valide': 1}; rank=species; taxid=10000000; species=10000000; order_name=None; best_identity={'res_taxo_curated_valide': 1.0}; scientific_name_by_db={'res_taxo_curated_valide': 'Stegastes xanthurus'}; species_list={'res_taxo_curated_valide': ['Stegastes xanthurus']}; genus_name=Stegastes; family_name=Pomacentridae; genus=80992; order=None; species_name=Stegastes xanthurus
cccctaaatttgacatttaacaatatttaaaaccttgcacaagaaagaggggagaaaagt
cgtaa
```
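Reading long OBITools headers by eye stops being practical beyond a handful of records. The function below is a minimal sketch for summarizing them and is not part of OBITools. The function name `summarize_ecotag` and the output file name `raw_ecotag.fasta` in the usage comment are hypothetical. The sketch assumes the `>ID; key=value; key=value; ...` header layout shown above. It prints one tab-separated line per record with the record's `rank`, `taxid` and `scientific_name`, so a second `awk` call can filter out records that were not assigned to a species.

```shell
# Hypothetical helper (not part of OBITools): summarize ecotag output.
# Assumes OBITools-style headers ">ID; key=value; key=value; ..." and prints
# one tab-separated line per record: id, rank, taxid, scientific_name.
summarize_ecotag() {
  awk '
    /^>/ {
      n = split(substr($0, 2), fields, "; ")
      id = fields[1]
      rank = "NA"; taxid = "NA"; name = "NA"
      for (i = 2; i <= n; i++) {
        eq = index(fields[i], "=")
        if (eq == 0) continue            # skip fields that are not key=value
        key = substr(fields[i], 1, eq - 1)
        val = substr(fields[i], eq + 1)
        sub(/;[ \t]*$/, "", val)         # drop a trailing ";" on the last field
        if (key == "rank") rank = val
        else if (key == "taxid") taxid = val
        else if (key == "scientific_name") name = val
      }
      printf "%s\t%s\t%s\t%s\n", id, rank, taxid, name
    }
  ' "$@"
}

# Usage (hypothetical file name; ecotag writes to standard output by default):
#   ecotag -t customtaxonomy -R res_taxo_curated_valide.fasta raw.fasta > raw_ecotag.fasta
#   summarize_ecotag raw_ecotag.fasta
#   summarize_ecotag raw_ecotag.fasta | awk -F '\t' '$2 != "species"'   # records not assigned to a species
```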
As the `ecotag` output shows, every query record was assigned at the species rank with a best identity of 1.0 against our custom database. This includes curated records such as `REF-17-0642`, assigned to the NCBI synonym `species_name=Albula argentea`, and `YCA_R0449`, assigned to the custom `taxid=10000000`.

:seedling: We successfully built a custom reference database and tested it.