reference_database issues
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues
Updated: 2022-06-17T08:10:19Z

download_mitochrondrion_ncbi.sh: some incomplete records (server MARBEC)
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/9
Updated: 2022-06-17T08:10:19Z
Author: mbruno

- the two lines that separate records, `//` + `[empty line]`, are missing between some records
```
16561 atcacgatg
//
LOCUS MW172448 16569 bp DNA circular PRI 09-DEC-2020
```
- some records are incomplete (e.g. MW172448)
- some sequences are cut (e.g. NC_057214, MN531849)

Assignee: mbruno

The EMBL database is no longer updated
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/8
Updated: 2022-11-18T12:44:41Z
Author: mbruno

- [x] replace `wget ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/snapshot_latest/std/*` with the GenBank API
- [x] convert the GenBank file into EMBL format
- [x] compare `ecoPCR` results

Assignee: mbruno

Error ecoPCR
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/7
Updated: 2022-04-11T15:13:22Z
Author: mbruno

![image](/uploads/619f2020a85a0900953f3fe6167f4c05/image.png)

Assignee: mbruno

Cleaning NCBI
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/6
Updated: 2021-10-29T11:51:27Z
Author: vmarques

At the moment we use an uncurated version of NCBI.
Check for algorithms/methods to automatically clean sequences identified as erroneous.

ENA releases are no more - update needed
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/5
Updated: 2021-10-29T11:50:07Z
Author: vmarques

In the classic download of the reference database, we download the ENA release.
The last one was release 143, and it appears to be the final one.
For further updates, a new protocol will be required.
Here is the notice:
```
This was the last ENA quaterly release using the old paradigm. For new data use the snapshot_latest folder in the parent folder.
```
Source:
`http://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/README_ENA_RELEASE_IS_NO_MORE`

Assignee: peguerin (pierre-edouard.guerin@cefe.cnrs.fr)

Update
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/4
Updated: 2021-02-18T10:12:44Z
Author: vmarques

This repo needs updating to remove Singularity usage and replace it all with conda.

update EMBL version
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/3
Updated: 2021-02-18T10:12:16Z
Author: vmarques

Download the latest EMBL version.

Problem with the new reference database
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/2
Updated: 2020-01-29T10:04:31Z
Author: vmarques

There is a problem with the new reference database compiled for the rapid run.
My assignments come back with 0 MOTUs assigned at species level and only implausible results.
I looked into where the problem might come from. Take the example of a fish that is sequenced in the reference database and present in my dataset:
- Name: Lutjanus erythropterus
- Sequence (teleo): ccccaagcttataacactaagtacctaaaaccttaaaactgcaaaggggaggcaagtcgtaa

If I search the old database:
```
grep 'Lutjanus erythropterus' /media/superdisk/edna/donnees/reference_database/reference_database_teleo/v_embl_std_clean.fasta -A2
>KP939271 family_name=Lutjanidae; species_name=Lutjanus erythropterus; family=30850; reverse_match=CTTCCGGTACACTTACCATG; taxid=211835; rank=species; forward_error=0; forward_tm=60.26; genus_name=Lutjanus; seq_length_ori=16509; forward_match=ACACCGCCCGTCACTCT; reverse_tm=54.79; genus=40493; reverse_error=0; species=211835; strand=D; Lutjanus erythropterus mitochondrion, complete genome
ccccaagcttataacactaagtacctaaaaccttaaaactgcaaaggggaggcaagtcgt
aa
```
Now I do the same on the new database:
```
grep 'Lutjanus erythropterus' /media/superdisk/edna/donnees/reference_database/ref141/reference_database_teleo/v_embl_std_clean.fasta -A2
>KP939271 family_name=Lutjanidae; species_name=Lutjanus erythropterus; family=30850; reverse_match=ACACCGCCCGTCACTCT; taxid=211835; rank=species; forward_error=0; forward_tm=50.96; genus_name=Lutjanus; seq_length_ori=16509; forward_match=CTTCCGGTACACTTACCATG; reverse_tm=nan; genus=40493; reverse_error=0; species=211835; strand=R; Lutjanus erythropterus mitochondrion, complete genome
ttacgacttgcctcccctttgcagttttaaggttttaggtacttagtgttataagcttgg
gg
--
```
The species exists, but its sequence is completely different, which should not happen with the same primer.
Note the following difference in the headers:
`strand=R` instead of `strand=D`.
In fact, all the sequences are reversed: they are the reverse complements of what we are supposed to obtain.
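The reverse-complement observation can be checked directly on the two sequences returned by the `grep` commands for KP939271. The following is a minimal sketch in plain Python; the function name `reverse_complement` and the variable names are illustrative and are not part of the repository.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (a, c, g, t only)."""
    complement = str.maketrans("acgtACGT", "tgcaTGCA")
    # Complement each base, then reverse the whole string.
    return seq.translate(complement)[::-1]


# Sequence of KP939271 in the old database (strand=D).
old_seq = "ccccaagcttataacactaagtacctaaaaccttaaaactgcaaaggggaggcaagtcgtaa"

# Sequence of KP939271 in the new database (strand=R), with the two
# FASTA lines joined.
new_seq = (
    "ttacgacttgcctcccctttgcagttttaaggttttaggtacttagtgttataagcttgg"
    "gg"
)

# Compare the old sequence with the reverse complement of the new one.
print(reverse_complement(new_seq) == old_seq)
```

The same comparison also fits the swapped `forward_match` and `reverse_match` values in the two headers.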
Do you have any idea why we get this?

Assignee: peguerin (pierre-edouard.guerin@cefe.cnrs.fr)

species on GenBank 12S Tele01 missing in db_embl_std.fasta
https://gitlab.mbb.univ-montp2.fr/edna/reference_database/-/issues/1
Updated: 2019-08-06T13:07:16Z
Author: eboulanger

Some species present on GenBank for the Tele01 12S primer do not appear in the extracted reference database.
e.g. Syngnathus typhle, accession n° KU925872.1: whole mitogenome on GenBank. But it is not in the reference database (searched in db_embl_std.fasta)
list of Mediterranean species for which this is the case:
[1] "Merlangius merlangus" "Helicolenus dactylopterus" "Istiophorus albicans" "Cetorhinus maximus"
[5] "Alosa alosa" "Thunnus albacares" "Makaira nigricans" "Istiompax indica"
[9] "Kajikia albida" "Tetrapturus georgii" "Coregonus lavaretus" "Salmo trutta"
[13] "Pagrus major" "Pampus argenteus" "Lampanyctus crocodilus" "Etmopterus spinax"
[17] "Squalus megalops" "Carcharhinus altimus" "Carcharhinus plumbeus" "Sardinella aurita"
[21] "Sparus aurata" "Beryx splendens" "Syngnathus typhle" "Trachurus trachurus"
[25] "Symphodus roissali" "Syngnathus abaster" "Symphodus cinereus" "Tetrapturus belone"
[29] "Raja clavata" "Acipenser sturio" "Echeneis naucrates" "Acipenser naccarii"
[33] "Amblyraja radiata" "Holacanthus ciliaris" "Terapon jarbua" "Symphodus mediterraneus"
[37] "Labrus merula" "Plotosus lineatus" "Pagrus auriga" "Epinephelus malabaricus"
[41] "Epinephelus coioides" "Abudefduf vaigiensis" "Dipturus oxyrinchus" "Mobula mobular"
[45] "Lagocephalus spadiceus" "Facciolella oxyrhyncha" "Netuma thalassina" "Labrus viridis"
[49] "Scomber colias"

Assignee: peguerin (pierre-edouard.guerin@cefe.cnrs.fr)
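The check described in this issue (species expected from GenBank but absent from the extracted FASTA) could be scripted roughly as follows. This is only a sketch: it assumes the FASTA headers carry a `species_name=...;` annotation like the `v_embl_std_clean.fasta` headers quoted in issue 2, which may not hold for `db_embl_std.fasta`. The helper names `species_in_fasta` and `missing_species` are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Assumed header annotation, e.g. "species_name=Lutjanus erythropterus;"
SPECIES_RE = re.compile(r"species_name=([^;]+);")


def species_in_fasta(lines: Iterable[str]) -> Set[str]:
    """Collect the species names annotated in the FASTA header lines."""
    found: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            match = SPECIES_RE.search(line)
            if match:
                found.add(match.group(1).strip())
    return found


def missing_species(expected: Iterable[str], lines: Iterable[str]) -> List[str]:
    """Return the expected species that have no header in the FASTA, sorted."""
    present = species_in_fasta(lines)
    return sorted(name for name in expected if name not in present)


# Tiny in-memory example; the header is shortened from the one in issue 2.
example_fasta = [
    ">KP939271 family_name=Lutjanidae; species_name=Lutjanus erythropterus; taxid=211835;",
    "ccccaagcttataacactaagtacctaaaaccttaaaactgcaaaggggaggcaagtcgtaa",
]
print(missing_species(["Lutjanus erythropterus", "Syngnathus typhle"], example_fasta))
```

For a real file, an open file handle can be passed in place of the list, for example `missing_species(names, open("db_embl_std.fasta"))`, since the function only iterates over lines.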