edna / reference_database
Commit 805ad789, authored Feb 12, 2020 by peguerin
add authorship
parent 8a105b16
Changes: 7
README.md (view file @ 805ad789)
# reference_database
# REFERENCE DATABASE
Collection of scripts to build a reference database.
Collection of scripts to build a reference database for metabarcoding analysis.
# reference database built from EMBL taxonomy and sequences
# Reference database built from EMBL taxonomy and sequences
This method is based on [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools)'s reference database.
...
...
@@ -42,7 +42,7 @@ You will also need to have the following programs installed on your computer.
- clone the project (see [Installation](#installation) section)
- fill in [config.sh](config.sh) and read [ecoPCR](https://pythonhosted.org/OBITools/scripts/ecoPCR.html?highlight=ecopcr) documentation
## Build a reference database
## Build a reference database (MISEQ)
* Overview of the steps
1. Download the sequences
...
...
@@ -86,4 +86,15 @@ Type the following command to convert your reference database into STAMPA format
python3 scripts/formatDB_obifasta_to_stampa.py -f /path/to/db_{prefix}.fasta -o /path/to/reference_database_stampa.fasta
```
This command generates a file `/path/to/reference_database_stampa.fasta` which you can use as a reference database for the pipeline:
* [bash_swarm](https://gitlab.mbb.univ-montp2.fr/edna/bash_swarm)
\ No newline at end of file
* [bash_swarm](https://gitlab.mbb.univ-montp2.fr/edna/bash_swarm)
# Authors
* Pierre-Edouard GUERIN, CNRS, CEFE
* Virginie Marques, CNRS, MARBEC
# License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details
\ No newline at end of file
build_bdr.sh (view file @ 805ad789)
# Build a reference database
##########################################################################
## Codes for scientific papers related to metabarcoding studies
##
## AUTHORS
## =======
## * Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
## * Virginie Marques | virginie.marques@etu.umontpellier.fr
## * CNRS/CEFE, CNRS/MARBEC | Montpellier, France
## * 2018-2020
##
##
## Inspired by the Eric Coissac et al. obitools
## Molecular Ecology Resources 2015
##
## DESCRIPTION
## ===========
##
## Build a reference database for "miseq" sequences data.
## We recommend running the commands one line at a time in a Linux terminal
## to check the result of each step
##
##
##
## USAGE
## =====
## bash build_bdr.sh
##
##
##########################################################################
## load an environment with obitools
SINGULARITY_SIMG="/media/superdisk/utils/conteneurs/obitools.simg"
...
...
@@ -7,21 +34,22 @@ singularity shell --bind /media/superdisk:/media/superdisk $SINGULARITY_SIMG
# configure arguments value
## configure arguments value
source ./config.sh
# download the sequences
## download the sequences
mkdir EMBL
cd EMBL
wget ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/std/*
gzip -d *
cd ..
# download taxonomy
## download taxonomy
mkdir TAXO
cd TAXO
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz
cd ..
## add mitofish sequences (by default we skip this step)
if [ "$MITOFISH" == 'y' ]
then
  echo "adding sequences from mitofish..."
...
...
@@ -31,12 +59,12 @@ else
  echo "skip adding sequences from mitofish"
fi
# format the data
## format the data
obiconvert --skip-on-error --embl -t ./TAXO --ecopcrdb-output="${rd_prefix}" EMBL/rel_std_*.dat
# ecoPCR to simulate an in silico PCR
# 50 :: change to 20 : loss of lamprey
## ecoPCR to simulate an in silico PCR
#### 50 :: change to 20 : loss of lamprey
ecoPCR -d "${rd_prefix}" -e "${ecoPCR_e}" -l "${ecoPCR_l}" -L "${ecoPCR_L}" "${primer5}" "${primer3}" > v_"${rd_prefix}".ecopcr
# clean the database
## clean the database
## filter sequences so that they have a good taxonomic description at the species, genus, and family levels
obigrep -d "${rd_prefix}" --require-rank=species --require-rank=genus --require-rank=family v_"${rd_prefix}".ecopcr > v_"${rd_prefix}"_clean.fasta
## remove redundant sequences
...
...
@@ -45,7 +73,7 @@ obiuniq -d "${rd_prefix}" v_"${rd_prefix}"_clean.fasta > v_"${rd_prefix}"_clean_
obigrep -d "${rd_prefix}" --require-rank=family v_"${rd_prefix}"_clean_uniq.fasta > v_"${rd_prefix}"_clean_uniq_clean.fasta
## ensure that each sequence has a unique identifier
obiannotate --uniq-id v_"${rd_prefix}"_clean_uniq_clean.fasta > db_"${rd_prefix}".fasta
# your reference database is built !
## your reference database is built !
...
...
@@ -54,7 +82,7 @@ obiannotate --uniq-id v_"${rd_prefix}"_clean_uniq_clean.fasta > db_"${rd_prefix}
#obitaxonomy -d "${rd_prefix}" -a 'Cidella_Hmolitrix':'species':10000088
obiconvert --skip-on-error --fasta -t ./TAXO --ecopcrdb-output=mitofish/"${rd_prefix}" mitofish/mitogene_12S.fasta
obiconvert --skip-on-error --embl -t ./TAXO --ecopcrdb-output="${rd_prefix}" EMBL/rel_std_*.dat
#obiconvert --skip-on-error --fasta -t ./TAXO --ecopcrdb-output=mitofish/"${rd_prefix}" mitofish/mitogene_12S.fasta
#obiconvert --skip-on-error --embl -t ./TAXO --ecopcrdb-output="${rd_prefix}" EMBL/rel_std_*.dat
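The ecoPCR call in build_bdr.sh takes its database prefix, numeric options and primers from the variables defined in config.sh. As an illustration only, the Python sketch below assembles the same argument list from a dictionary and stops early when a value is missing, which can be handy before launching a long run. The helper name `build_ecopcr_command` and every example value (prefix, numbers, primer sequences) are hypothetical; only the variable names and the flag order (`-d`, `-e`, `-l`, `-L`, then the two primers) are taken from the script.

```python
import shlex

# Variable names as they appear in build_bdr.sh / config.sh.
REQUIRED_KEYS = ("rd_prefix", "ecoPCR_e", "ecoPCR_l", "ecoPCR_L", "primer5", "primer3")


def build_ecopcr_command(config):
    """Return the ecoPCR argument list, mirroring the call in build_bdr.sh."""
    # A value of 0 is legitimate, so only None and the empty string count as missing.
    missing = [key for key in REQUIRED_KEYS if config.get(key) in (None, "")]
    if missing:
        raise ValueError("missing config values: " + ", ".join(missing))
    return [
        "ecoPCR",
        "-d", str(config["rd_prefix"]),
        "-e", str(config["ecoPCR_e"]),
        "-l", str(config["ecoPCR_l"]),
        "-L", str(config["ecoPCR_L"]),
        str(config["primer5"]),
        str(config["primer3"]),
    ]


# Hypothetical values for illustration; the real ones belong in config.sh.
example_config = {
    "rd_prefix": "embl_std",
    "ecoPCR_e": 3,
    "ecoPCR_l": 20,
    "ecoPCR_L": 120,
    "primer5": "ACGTACGTACGTACGT",
    "primer3": "TGCATGCATGCATGCA",
}

command = build_ecopcr_command(example_config)
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
```

The sketch only builds and prints the command; redirecting the output to `v_"${rd_prefix}".ecopcr`, as the script does, is left to the caller.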
build_bdr_rapidrun.sh (view file @ 805ad789)
###############################################################################
##########################################################################
## Codes for scientific papers related to metabarcoding studies
##
## AUTHORS
## =======
## * Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
## * Virginie Marques | virginie.marques@etu.umontpellier.fr
## * CNRS/CEFE, CNRS/MARBEC | Montpellier, France
## * 2018-2020
##
##
## Inspired by the Eric Coissac et al. obitools
## Molecular Ecology Resources 2015
##
## DESCRIPTION
## ===========
##
## Build a reference database for "RAPIDRUN" sequences data.
## We recommend running the commands one line at a time in a Linux terminal
## to check the result of each step
##
##
##
## USAGE
## =====
## bash build_bdr_rapidrun.sh
##
##
##########################################################################
##########################################################################
## RAPIDRUN
source ./config.sh
...
...
config.sh (view file @ 805ad789)
# argument values for building reference database
##########################################################################
## Codes for scientific papers related to metabarcoding studies
##
## AUTHORS
## =======
## * Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
## * Virginie Marques | virginie.marques@etu.umontpellier.fr
## * CNRS/CEFE, CNRS/MARBEC | Montpellier, France
## * 2018-2020
##
## DESCRIPTION
## ===========
##
## Information to fill in order to execute build_bdr.sh and other scripts
##
##
##
##
##
########################################################################### argument values for building reference database
## "y" add sequences from mitofish to the reference database
## "n" don't add sequences from mitofish
...
...
scripts/add_sequences_from_mitofish.sh (view file @ 805ad789)
##########################################################################
## Codes for scientific papers related to metabarcoding studies
##
## AUTHORS
## =======
## * Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
## * Virginie Marques | virginie.marques@etu.umontpellier.fr
## * CNRS/CEFE, CNRS/MARBEC | Montpellier, France
## * 2018-2020
##
##
## Inspired by the Eric Coissac et al. obitools
## Molecular Ecology Resources 2015
##
## DESCRIPTION
## ===========
##
## add sequences of gene from mitofish database
##
##
##########################################################################
##
mkdir mitofish
...
...
scripts/extract_gene_from_mito.py (view file @ 805ad789)
#===============================================================================
#HEADER
#===============================================================================
__author__ = "Pierre-Edouard Guerin"
__credits__ = ["Pierre-Edouard Guerin", "Virginie Marques"]
__license__ = "MIT"
__version__ = "1.0.1"
__maintainer__ = "Pierre-Edouard Guerin"
__email__ = "pierre-edouard.guerin@cefe.cnrs.fr"
__status__ = "Production"
"""
## Codes for scientific papers related to metabarcoding studies
AUTHORS
=======
* Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
* CNRS/CEFE, CNRS/MARBEC | Montpellier, France
* 2018-2020
DESCRIPTION
===========
extract gene sequences (12S for instance) from mitofish database
"""
#===============================================================================
#MODULES
#===============================================================================
import Bio
from Bio import SeqIO
from Bio.Alphabet import IUPAC
...
...
scripts/formatDB_obifasta_to_stampa.py (view file @ 805ad789)
#===============================================================================
#INFORMATIONS
#===============================================================================
"""
CEFE - EPHE - YERSIN 2018
guerin pierre-edouard
from reference database obitools fasta files,
it creates a stampa-format reference database
"""
#===============================================================================
#USAGES
#HEADER
#===============================================================================
__author__ = "Pierre-Edouard Guerin"
__credits__ = ["Pierre-Edouard Guerin", "Virginie Marques"]
__license__ = "MIT"
__version__ = "1.0.1"
__maintainer__ = "Pierre-Edouard Guerin"
__email__ = "pierre-edouard.guerin@cefe.cnrs.fr"
__status__ = "Production"
"""
## Codes for scientific papers related to metabarcoding studies
AUTHORS
=======
* Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
* CNRS/CEFE, CNRS/MARBEC | Montpellier, France
* 2018-2020
DESCRIPTION
===========
Format obifasta file into STAMPA database FASTA file
USAGE
=====
input :
path to FASTA obitools file with "species=" information field available
output :
path to write the STAMPA-format FASTA reference database file
command :
python3 formatDB_obifasta_to_stampa.py -f input -o output
"""
#===============================================================================
#MODULES
...
...
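The docstring above states that the input FASTA must carry a "species=" information field. The following is a minimal sketch of how one might check for that field before running the conversion. It is not the code of formatDB_obifasta_to_stampa.py, whose body is not shown here. It assumes headers of the form `>identifier key=value; key=value; definition`, and the function names, the example header and its values are all hypothetical.

```python
import re

# Assumed layout: ">identifier key=value; key=value; free-text definition".
ATTRIBUTE_PATTERN = re.compile(r"(\w+)=([^;]*);")


def parse_obifasta_header(header):
    """Split a header line into its identifier and a dict of key=value attributes."""
    header = header.lstrip(">").strip()
    seq_id, _, rest = header.partition(" ")
    attributes = {key: value.strip() for key, value in ATTRIBUTE_PATTERN.findall(rest)}
    return seq_id, attributes


def has_species_field(header):
    """Return True when the header carries a 'species=' attribute."""
    _, attributes = parse_obifasta_header(header)
    return "species" in attributes


# Hypothetical header; identifiers and values are made up for illustration.
example_header = ">seq_0001 family=111; species=222; species_name=Salmo salar; hypothetical definition text"

seq_id, attributes = parse_obifasta_header(example_header)
print(seq_id, attributes.get("species"), has_species_field(example_header))
```

Sequences for which `has_species_field` returns False could be reported or skipped before conversion; how the actual script handles them is not visible in this diff.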