Modification: 20220829_DB_Ref_CEFE_teleo_curation.csv
Ophidion rochei;NA;Ophidion;Ophidiidae;genus;Catalogue of Life Checklist
The species Ophidion rochei is present in the NCBI taxonomy database: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?name=Ophidion+rochei
current_name | ncbi_name | genus | family | ncbi_rank | method |
---|---|---|---|---|---|
Ophidion rochei | Ophidion rochei | Ophidion | Orchidaceae | species | NCBI synonym score=1.0 |
terminal
error Ophidion rochei
error Ophidion rochei
20220829_DB_Ref_CEFE_teleo_curated_faulty_taxon.fasta
>Sample_ID51; species_name=Ophidion_rochei; faulty taxonomy: species name Ophidion rochei not found in NCBI; faulty taxonomy: species name Ophidion rochei not found in NCBI
CTCCTAAAATACCGGCTATATAACTTAATACATACACACGTTAAAGGGGAGGAAAGTCGT
AA
>Sample_ID52; species_name=Ophidion_rochei; faulty taxonomy: species name Ophidion rochei not found in NCBI; faulty taxonomy: species name Ophidion rochei not found in NCBI
CTCCTAAAATACCGGCTATATAACTTAATACATACACACGTTAAAGGGGAGGAAAGTCGT
AA
peguerin (ffaed86b) at 04 Sep 13:25
update version
peguerin (9843a6a6) at 04 Sep 13:25
add ncbi taxon root id argument
peguerin (5962c60a) at 04 Sep 13:24
add argument to set rootaxon value
Bug - when adding a new node, the program appends ";" to the sequence name. This causes ecotag to behave unexpectedly later on.
mbruno (567a6959) at 13 Dec 14:56
fix(curate.py): remove semicolon after sequence id in curated_valid...
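A minimal sketch of the kind of clean-up this fix describes, assuming the OBITools-style header layout seen in the faulty FASTA output (`>ID; key=value; ...`). The helper name `strip_id_semicolon` is hypothetical and is not taken from `curate.py`.

```python
def strip_id_semicolon(header: str) -> str:
    """Remove a ';' glued to the sequence id of a FASTA header line.

    Hypothetical helper: only the first whitespace-delimited token
    (the sequence id) is touched; the key=value annotations are kept as-is.
    """
    if not header.startswith(">"):
        return header  # sequence line, leave untouched
    seq_id, sep, rest = header.partition(" ")
    return seq_id.rstrip(";") + sep + rest


print(strip_id_semicolon(">Sample_ID51; species_name=Ophidion_rochei;"))
# prints: >Sample_ID51 species_name=Ophidion_rochei;
```

Only the id token is modified, so the semicolons that separate annotations remain in place.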
mkbdr validate --fasta data/teleo_ok_global+med.fasta --curate res_raw_curation.csv --ncbi_taxonomy_edition customtaxonomy/ --output_prefix res_taxo_curated
Checking arguments...done.
Validate records...
Loading local NCBI taxonomy...done.
Curating records with faulty taxonomy...
Traceback (most recent call last):
File "/home/mbruno/.local/bin/mkbdr", line 8, in <module>
sys.exit(main())
File "/home/mbruno/.local/lib/python3.9/site-packages/mkbdr/__main__.py", line 55, in main
results = curation(args.curate, rawResults, taxDic, ncbi, args.ncbi_taxonomy_edition)
File "/home/mbruno/.local/lib/python3.9/site-packages/mkbdr/curate.py", line 171, in curation
cureRankNCBI = convert_ncbirank_literal_to_integer(cureRecord.ncbi_rank.values[0])
AttributeError: 'bool' object has no attribute 'ncbi_rank'
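The traceback shows that `cureRecord` was a `bool` at the point where `.ncbi_rank` was read, which suggests (this is a guess, not something confirmed from `curate.py`) that the lookup returned `False` for a name missing from the curation CSV. The sketch below illustrates one way to make that case explicit. The functions `find_cure_record` and `cure_rank` are hypothetical, and the CSV rows are copied from the curation file shown in this thread.

```python
import csv
import io

# Two rows copied from the curation CSV, same ';' separated layout.
CURATION_CSV = """current_name;ncbi_name;genus;family;ncbi_rank;method
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Ophidion rochei;NA;Ophidion;Ophidiidae;genus;NA
"""

rows = list(csv.DictReader(io.StringIO(CURATION_CSV), delimiter=";"))


def find_cure_record(rows, current_name):
    """Return the first curation row for current_name, or None if absent."""
    for row in rows:
        if row["current_name"] == current_name:
            return row
    return None


def cure_rank(rows, current_name):
    """Return the ncbi_rank for a name, failing with a readable message."""
    record = find_cure_record(rows, current_name)
    if record is None:
        raise KeyError(f"no curation entry for {current_name!r}")
    return record["ncbi_rank"]


print(cure_rank(rows, "Ophidion rochei"))
```

With a check like this, a name that is absent from the CSV is reported by name instead of surfacing as an `AttributeError` on a boolean.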
mkbdr curegen --fasta res_raw_faulty_taxon.fasta \
--database_globalnames 'Catalogue of Life' \
--output_prefix res_raw
Results - 10/12/2021
current_name;ncbi_name;genus;family;ncbi_rank;method
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Amphiprion fuscocaudatus;NA;NA;NA;NA;FAILURE: Catalogue of Life database not found in globalNames query
Atherinomorus lineatus;NA;NA;NA;NA;FAILURE: Catalogue of Life database not found in globalNames query
...
Results - 15/05/2021
current_name;ncbi_name;genus;family;ncbi_rank;method
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Amphiprion fuscocaudatus;NA;Amphiprion;Pomacentridae;genus;Catalogue of Life
Atherinomorus lineatus;NA;Atherinomorus;Atherinidae;genus;Catalogue of Life
...
Test: option --database_globalnames 'Catalogue of Life Checklist'
current_name;ncbi_name;genus;family;ncbi_rank;method
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Amphiprion fuscocaudatus;NA;Amphiprion;Pomacentridae;genus;Catalogue of Life Checklist
Atherinomorus lineatus;NA;Atherinomorus;Atherinidae;genus;Catalogue of Life Checklist
Test: Database found for Amphiprion fuscocaudatus
import pytaxize
pytaxize.__version__  # '0.7.0'
query = pytaxize.gn.resolve('Amphiprion fuscocaudatus')
# Print the title of every data source that returned a match
for q in query[0]:
    print(q['data_source_title'])
Encyclopedia of Life
CU*STAR
Index to Organism Names
uBio NameBank
Arctos
FishBase Cache
Open Tree of Life Reference Taxonomy
Catalogue of Life Checklist
Integrated Taxonomic Information SystemITIS
Union 4
The Interim Register of Marine and Nonmarine Genera
World Register of Marine Species
GBIF Backbone Taxonomy
Catalog of Fishes
FishBase
Bishop Museum
Bishop Museum
BioLib.cz
nlbif
uBio NameBank
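The list above contains 'Catalogue of Life Checklist' but no entry titled exactly 'Catalogue of Life', which would explain the FAILURE rows if the database name is compared as an exact string. The sketch below shows that difference on a few titles copied from the list. The function `match_data_source` is hypothetical, and exact-then-prefix matching is an assumption for illustration, not how `mkbdr` is known to behave.

```python
def match_data_source(requested, titles):
    """Return the exact title if present, else all titles starting with it."""
    if requested in titles:
        return [requested]
    return sorted({t for t in titles if t.startswith(requested)})


# A few data_source_title values copied from the pytaxize output.
titles = [
    "Encyclopedia of Life",
    "FishBase Cache",
    "Catalogue of Life Checklist",
    "GBIF Backbone Taxonomy",
    "FishBase",
]

print("Catalogue of Life" in titles)                    # exact match only
print(match_data_source("Catalogue of Life", titles))   # prefix fallback
```

An exact comparison finds nothing for 'Catalogue of Life', while the prefix fallback returns 'Catalogue of Life Checklist'. Returning a list leaves room to report ambiguous requests rather than silently picking one.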
Bug - when adding a new node, the program appends ";" to the sequence name. This causes ecotag to behave unexpectedly later on.
names.dmp:3416120:2839645 | ANK:collector:H.Duman:10209 | | isotype |
names.dmp:3416121:2839645 | GAZI:collector:H.Duman:10209 | | holotype |
names.dmp:3416122:2839645 | HUB:collector:H.Duman:10209 | | isotype |
These entries cause ecotag to fail.
Some of my species names are flagged as having a faulty format when they contain more than Genus_species, for example Genus_species_subspecies (or even Genus_sp_cf_species, used when the specimen may be a new, undescribed species).
>RBM2_194; species_name=Syngnathus_typhle_rondeleti ; faulty species name format Syngnathus_typhle_rondeleti
CCCCTAATATCTCATAAATTTAAGTAAAACACCTGAAAAATTAAGGGGAGGCAAGTCGTA
A
The format check needs to be corrected so that such names are accepted.
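As an illustration of what a more permissive check could look like, here is a small sketch. The pattern and the name `is_valid_species_name` are assumptions made for this example; they are not the rule `mkbdr` currently applies.

```python
import re

# Hypothetical pattern: a capitalised genus followed by one or more
# lowercase epithets or qualifiers (sp, cf, ...), joined by underscores.
SPECIES_NAME = re.compile(r"[A-Z][a-z]+(?:_[a-z]+)+")


def is_valid_species_name(name: str) -> bool:
    """Accept Genus_species, Genus_species_subspecies, Genus_sp_cf_species."""
    return SPECIES_NAME.fullmatch(name) is not None


for name in ("Ophidion_rochei",
             "Syngnathus_typhle_rondeleti",
             "Genus_sp_cf_species",
             "Ophidion"):
    print(name, is_valid_species_name(name))
```

A bare genus is still rejected, so names without any epithet keep being reported as faulty.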
Example with the CSV used for curation:
current_name;ncbi_name;genus;family;ncbi_rank;method
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Amphiprion fuscocaudatus;NA;Amphiprion;Pomacentridae;genus;Catalogue of Life
Atherinomorus lineatus;NA;Atherinomorus;Atherinidae;genus;Catalogue of Life
Haemulon chrysargyreum;Brachygenys chrysargyreum;Brachygenys;Haemulidae;species;NCBI synonym score=1.0
Haemulon chrysargyreum;Brachygenys chrysargyreum;Brachygenys;Haemulidae;species;NCBI synonym score=1.0
Canthigaster epilampra;NA;Canthigaster;Tetraodontidae;genus;Catalogue of Life
Distichodus perspicillatus;NA;Distichodus;Distichodontidae;genus;Catalogue of Life
Distichodus perspicillatus;NA;Distichodus;Distichodontidae;genus;Catalogue of Life
Hirundichthys rondeleti;Hirundichthys rondeletii;Hirundichthys;Exocoetidae;species;NCBI synonym score=0.9565217391304348
Haemulon chrysargyreum;Brachygenys chrysargyreum;Brachygenys;Haemulidae;species;NCBI synonym score=1.0
Hyporhamphus melanopterus;NA;Hyporhamphus;Hemiramphidae;genus;Catalogue of Life
Haemulopsis corvinaeformis;Pomadasys corvinaeformis;Pomadasys;Haemulidae;species;NCBI synonym score=1.0
Neoglyphidodon crossi;NA;Neoglyphidodon;Pomacentridae;genus;Catalogue of Life
Neoglyphidodon crossi;NA;Neoglyphidodon;Pomacentridae;genus;Catalogue of Life
Neoploactis tridorsalis;NA;NA;Aploactinidae;family;Catalogue of Life
Ophidion barbatum;NA;Ophidion;Ophidiidae;genus;Catalogue of Life
Ostorhinchus monospilus;NA;Ostorhinchus;Apogonidae;genus;Catalogue of Life
Ostorhinchus monospilus;NA;Ostorhinchus;Apogonidae;genus;Catalogue of Life
Cynoponticus savanna;NA;Cynoponticus;Muraenesocidae;genus;Catalogue of Life
Pseudanthias randali;Pseudanthias randalli;Pseudanthias;Serranidae;species;NCBI synonym score=0.95
Pseudanthias randali;Pseudanthias randalli;Pseudanthias;Serranidae;species;NCBI synonym score=0.95
Pseudanthias randali;Pseudanthias randalli;Pseudanthias;Serranidae;species;NCBI synonym score=0.95
Pseudanthias randali;Pseudanthias randalli;Pseudanthias;Serranidae;species;NCBI synonym score=0.95
Pseudanthias randali;Pseudanthias randalli;Pseudanthias;Serranidae;species;NCBI synonym score=0.95
Pseudanthias randali;Pseudanthias randalli;Pseudanthias;Serranidae;species;NCBI synonym score=0.95
Rhinobatos sainsburyi;NA;Rhinobatos;Rhinobatidae;genus;Catalogue of Life
Aspitrigla cuculus;Chelidonichthys cuculus;Chelidonichthys;Triglidae;species;NCBI synonym score=1.0
Aspitrigla cuculus;Chelidonichthys cuculus;Chelidonichthys;Triglidae;species;NCBI synonym score=1.0
Carcharhinus taurus;Carcharhinus cautus;Carcharhinus;Carcharhinidae;species;NCBI synonym score=0.8947368421052632
Glaucostegus cemicullus;Glaucostegus cemiculus;Glaucostegus;Glaucostegidae;species;NCBI synonym score=0.9565217391304348
Glaucostegus cemicullus;Glaucostegus cemiculus;Glaucostegus;Glaucostegidae;species;NCBI synonym score=0.9565217391304348
Glaucostegus cemicullus;Glaucostegus cemiculus;Glaucostegus;Glaucostegidae;species;NCBI synonym score=0.9565217391304348
Glaucostegus cemicullus;Glaucostegus cemiculus;Glaucostegus;Glaucostegidae;species;NCBI synonym score=0.9565217391304348
Gobius ater;NA;Gobius;Gobiidae;genus;Catalogue of Life
Ophidion rochei;NA;Ophidion;Ophidiidae;genus;NA
Ophidion rochei;NA;Ophidion;Ophidiidae;genus;NA
and the customtaxonomy names.dmp
10000000 | Amphiprion fuscocaudatus | | scientific name |
10000001 | Atherinomorus lineatus | | scientific name |
10000002 | Canthigaster epilampra | | scientific name |
10000003 | Distichodus perspicillatus | | scientific name |
10000004 | Distichodus perspicillatus | | scientific name |
10000005 | Hyporhamphus melanopterus | | scientific name |
10000006 | Neoglyphidodon crossi | | scientific name |
10000007 | Neoglyphidodon crossi | | scientific name |
10000008 | Ophidion barbatum | | scientific name |
10000009 | Ostorhinchus monospilus | | scientific name |
10000010 | Ostorhinchus monospilus | | scientific name |
10000011 | Cynoponticus savanna | | scientific name |
10000012 | Rhinobatos sainsburyi | | scientific name |
10000013 | Gobius ater | | scientific name |
10000014 | Ophidion rochei | | scientific name |
10000015 | Ophidion rochei | | scientific name |
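In the listing above, names that occur more than once in the curation CSV (for example Distichodus perspicillatus or Ophidion rochei) appear under two different ids. If one id per name is the intended outcome, which is an assumption here, the assignment could be sketched as follows. The function `assign_new_taxids` is hypothetical; the starting id 10000000 and the line layout are taken from the listing.

```python
import csv
import io

# A few rows copied from the curation CSV, duplicates included.
CURATION_CSV = """current_name;ncbi_name;genus;family;ncbi_rank;method
Albula forsteri;Albula argentea;Albula;Albulidae;species;NCBI synonym score=1.0
Amphiprion fuscocaudatus;NA;Amphiprion;Pomacentridae;genus;Catalogue of Life
Distichodus perspicillatus;NA;Distichodus;Distichodontidae;genus;Catalogue of Life
Distichodus perspicillatus;NA;Distichodus;Distichodontidae;genus;Catalogue of Life
Ophidion rochei;NA;Ophidion;Ophidiidae;genus;NA
Ophidion rochei;NA;Ophidion;Ophidiidae;genus;NA
"""


def assign_new_taxids(rows, first_id=10000000):
    """Give each name without an NCBI match one new id, in order of first use."""
    taxids = {}
    for row in rows:
        name = row["current_name"]
        if row["ncbi_name"] == "NA" and name not in taxids:
            taxids[name] = first_id + len(taxids)
    return taxids


rows = list(csv.DictReader(io.StringIO(CURATION_CSV), delimiter=";"))
taxids = assign_new_taxids(rows)
for name, taxid in taxids.items():
    print(f"{taxid} | {name} | | scientific name |")
```

Keying the ids on the name means repeated CSV rows map to the same node instead of creating a second one.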