Commit 805ad789 authored by peguerin

add authorship

parent 8a105b16
# reference_database
# REFERENCE DATABASE
Collection of scripts to build a reference database.
Collection of scripts to build a reference database for metabarcoding analysis.
# reference database built from EMBL taxonomy and sequences
# Reference database built from EMBL taxonomy and sequences
This method is based on [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools)'s reference database.
@@ -42,7 +42,7 @@ You will also need to have the following programs installed on your computer.
- clone the project (see [Installation](#installation) section)
- fill in [config.sh](config.sh) and read [ecoPCR](https://pythonhosted.org/OBITools/scripts/ecoPCR.html?highlight=ecopcr) documentation
## Build a reference database
## Build a reference database (MISEQ)
* Overview of the steps
1. Download the sequences
@@ -86,4 +86,15 @@ Type the following command to convert your reference database into STAMPA format
python3 scripts/formatDB_obifasta_to_stampa.py -f /path/to/db_{prefix}.fasta -o /path/to/reference_database_stampa.fasta
```
This command generates a file `/path/to/reference_database_stampa.fasta` which you can use as a reference database for the pipeline:
* [bash_swarm](https://gitlab.mbb.univ-montp2.fr/edna/bash_swarm)
\ No newline at end of file
* [bash_swarm](https://gitlab.mbb.univ-montp2.fr/edna/bash_swarm)
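The STAMPA conversion relies on the `species=` annotation that OBITools writes into each FASTA header. A minimal sketch of that parsing step follows; it is not the code of `formatDB_obifasta_to_stampa.py`, and it assumes the OBITools extended header layout `>seq_id key=value; key=value; definition`. The function names and the sample header are made up for illustration.

```python
import re
from typing import Dict, Tuple

# Assumed OBITools header layout: ">seq_id key1=value1; key2=value2; definition".
# Each annotation is matched as "key=value;".
ANNOTATION = re.compile(r"(\w+)=([^;]*);")


def parse_obitools_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split an OBITools FASTA header into its sequence id and annotations."""
    header = header.lstrip(">").strip()
    seq_id, _, rest = header.partition(" ")
    annotations = {key: value.strip() for key, value in ANNOTATION.findall(rest)}
    return seq_id, annotations


def species_of(header: str) -> str:
    """Return the species= annotation, failing loudly when it is missing."""
    _, annotations = parse_obitools_header(header)
    if "species" not in annotations:
        raise ValueError(f"no species= annotation in header: {header!r}")
    return annotations["species"]


# Usage with a made-up header:
seq_id, notes = parse_obitools_header(
    ">AB000001_1 count=1; species=8032; species_name=Salmo trutta; 12S fragment"
)
print(seq_id, notes["species_name"])  # AB000001_1 Salmo trutta
```

Raising on a missing `species=` field, rather than silently skipping the record, makes an incomplete input file visible before it reaches the pipeline.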
# Authors
* Pierre-Edouard GUERIN, CNRS, CEFE
* Virginie Marques, CNRS, MARBEC
# License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details
\ No newline at end of file
# Build a reference database
##########################################################################
## Codes for scientific papers related to metabarcoding studies
##
## AUTHORS
## =======
## * Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
## * Virginie Marques | virginie.marques@etu.umontpellier.fr
## * CNRS/CEFE, CNRS/MARBEC | Montpellier, France
## * 2018-2020
##
##
## Inspired by OBITools (Eric Coissac et al.)
## Molecular Ecology Resources 2015
##
## DESCRIPTION
## ===========
##
## Build a reference database for "miseq" sequence data.
## We recommend running the commands one at a time in a Linux terminal
## to check the results of each step.
##
##
##
## USAGE
## =====
## bash build_bdr.sh
##
##
##########################################################################
## load an environment with obitools
SINGULARITY_SIMG="/media/superdisk/utils/conteneurs/obitools.simg"
@@ -7,21 +34,22 @@ singularity shell --bind /media/superdisk:/media/superdisk $SINGULARITY_SIMG
# configure arguments value
## configure arguments value
source ./config.sh
# download the sequences
## download the sequences
mkdir EMBL
cd EMBL
wget ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/std/*
gzip -d *
cd ..
# download taxonomy
## download taxonomy
mkdir TAXO
cd TAXO
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz
cd ..
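The two download steps end with `gzip -d *` and `tar -zxvf`. If the shell tools are not at hand (for instance in a pure Python workflow), a rough equivalent can be written with the standard `gzip`, `shutil` and `tarfile` modules. This is a sketch, not part of the repository: the function names are ours, and the `rel_std_*.dat.gz` file naming is an assumption inferred from the `EMBL/rel_std_*.dat` pattern used later by `obiconvert`.

```python
import gzip
import shutil
import tarfile
from pathlib import Path
from typing import List


def gunzip_all(directory: Path) -> List[Path]:
    """Decompress every *.gz file in `directory` and delete it, like `gzip -d *`."""
    outputs = []
    for gz_path in sorted(directory.glob("*.gz")):
        out_path = gz_path.with_suffix("")  # e.g. rel_std_x.dat.gz -> rel_std_x.dat
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams, so large EMBL files never sit in memory
        gz_path.unlink()
        outputs.append(out_path)
    return outputs


def extract_taxdump(archive: Path, destination: Path) -> List[str]:
    """Unpack taxdump.tar.gz into `destination`, like `tar -zxvf`; return member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(destination)  # archive is assumed to come from a trusted source (NCBI)
    return names
```

For example, `gunzip_all(Path("EMBL"))` followed by `extract_taxdump(Path("TAXO/taxdump.tar.gz"), Path("TAXO"))` mirrors the two shell blocks. Unlike `gzip -d *`, `gunzip_all` only touches files ending in `.gz`.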
## add mitofish sequences (by default we skip this step)
if [ "$MITOFISH" == 'y' ]
then
echo "adding sequences from mitofish..."
@@ -31,12 +59,12 @@ else
echo "skip adding sequences from mitofish"
fi
# format the data
## format the data
obiconvert --skip-on-error --embl -t ./TAXO --ecopcrdb-output="${rd_prefix}" EMBL/rel_std_*.dat
# ecoPCR to simulate an in silico PCR
# 50 :: change to 20: loss of lamprey
## ecoPCR to simulate an in silico PCR
#### 50 :: change to 20: loss of lamprey
ecoPCR -d "${rd_prefix}" -e "${ecoPCR_e}" -l "${ecoPCR_l}" -L "${ecoPCR_L}" "${primer5}" "${primer3}" > v_"${rd_prefix}".ecopcr
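ecoPCR keeps the fragments flanked by `primer5` and `primer3`, allowing at most `ecoPCR_e` mismatches per primer and keeping lengths between `ecoPCR_l` and `ecoPCR_L` (as we read the ecoPCR documentation, these lengths exclude the primers). The toy version below only illustrates that idea. It is a brute-force sketch, not ecoPCR's algorithm: it scans a single strand of a single sequence, and the function names and test sequence are hypothetical.

```python
from typing import List

# Standard IUPAC nucleotide codes -> bases each code matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer: str, site: str) -> int:
    """Count positions where the template base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def in_silico_pcr(template: str, primer5: str, primer3: str,
                  max_errors: int, min_len: int, max_len: int) -> List[str]:
    """Return fragments between the two primers (primers excluded), one strand only."""
    template, primer5 = template.upper(), primer5.upper()
    site3 = reverse_complement(primer3.upper())  # primer3 binds the opposite strand
    amplicons = []
    for i in range(len(template) - len(primer5) + 1):
        if mismatches(primer5, template[i:i + len(primer5)]) > max_errors:
            continue
        start = i + len(primer5)
        for j in range(start, len(template) - len(site3) + 1):
            length = j - start  # fragment length without the primers
            if length > max_len:
                break
            if length >= min_len and mismatches(site3, template[j:j + len(site3)]) <= max_errors:
                amplicons.append(template[start:j])
    return amplicons


print(in_silico_pcr("CCACGTGATTACATCAAGG", "ACGT", "TTGA",
                    max_errors=0, min_len=5, max_len=10))  # ['GATTACA']
```

Raising `max_errors` or widening `[min_len, max_len]` lets more fragments through. This is the trade-off the `ecoPCR_e`, `ecoPCR_l` and `ecoPCR_L` settings in `config.sh` control.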
# clean the database
## clean the database
## keep only sequences with a taxonomic description at the species, genus and family levels
obigrep -d "${rd_prefix}" --require-rank=species --require-rank=genus --require-rank=family v_"${rd_prefix}".ecopcr > v_"${rd_prefix}"_clean.fasta
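As we understand it, `obigrep --require-rank` keeps a record only if its taxon, or one of that taxon's ancestors, sits at each requested rank. The sketch below shows that check against a taxonomy held in memory as `taxid -> (parent_taxid, rank)`, which mirrors the structure of NCBI's `nodes.dmp`. The representation and function names are assumptions for illustration, not obigrep internals.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory taxonomy: taxid -> (parent_taxid, rank), as in nodes.dmp.
Taxonomy = Dict[int, Tuple[int, str]]


def ranks_on_path(taxid: int, taxonomy: Taxonomy) -> Set[str]:
    """Collect the ranks of a taxon and all its ancestors up to the root."""
    ranks = set()
    while True:
        parent, rank = taxonomy[taxid]
        ranks.add(rank)
        if parent == taxid:  # in nodes.dmp the root (taxid 1) is its own parent
            return ranks
        taxid = parent


def has_required_ranks(taxid: int, taxonomy: Taxonomy,
                       required: Tuple[str, ...] = ("species", "genus", "family")) -> bool:
    return set(required) <= ranks_on_path(taxid, taxonomy)


# Toy taxonomy: root 1 -> family 10 -> genus 20 -> species 30; genus 40 has no species.
toy = {1: (1, "no rank"), 10: (1, "family"), 20: (10, "genus"),
       30: (20, "species"), 40: (10, "genus")}
print(has_required_ranks(30, toy), has_required_ranks(40, toy))  # True False
```

A record identified only to genus level (taxid 40 here) fails the species requirement, which is the kind of record this filter step removes.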
## remove redundant sequences
@@ -45,7 +73,7 @@ obiuniq -d "${rd_prefix}" v_"${rd_prefix}"_clean.fasta > v_"${rd_prefix}"_clean_
obigrep -d "${rd_prefix}" --require-rank=family v_"${rd_prefix}"_clean_uniq.fasta > v_"${rd_prefix}"_clean_uniq_clean.fasta
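`obiuniq` merges records that carry exactly the same sequence. With `-d`, as we understand it, the merged record is assigned the lowest common ancestor (LCA) of the merged taxa. This would explain why the script filters on `--require-rank=family` a second time: a merged record whose LCA lies above family level no longer passes. A minimal sketch of dereplication with an LCA follows. The taxonomy is held as a hypothetical `taxid -> (parent_taxid, rank)` mapping shaped like NCBI's `nodes.dmp`, records are `(sequence, taxid)` pairs, and none of this is obiuniq's actual code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical taxonomy layout: taxid -> (parent_taxid, rank); the root is its own parent.
Taxonomy = Dict[int, Tuple[int, str]]


def path_to_root(taxid: int, taxonomy: Taxonomy) -> List[int]:
    """Taxids from `taxid` up to the root, both included."""
    path = [taxid]
    while taxonomy[taxid][0] != taxid:
        taxid = taxonomy[taxid][0]
        path.append(taxid)
    return path


def lowest_common_ancestor(taxids: List[int], taxonomy: Taxonomy) -> int:
    paths = [path_to_root(t, taxonomy) for t in taxids]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first taxon, the first shared taxid is the deepest one.
    return next(t for t in paths[0] if t in shared)


def dereplicate(records: Iterable[Tuple[str, int]], taxonomy: Taxonomy) -> Dict[str, int]:
    """Collapse identical sequences; each keeps the LCA of the taxa it came from."""
    taxa_by_sequence: Dict[str, List[int]] = {}
    for sequence, taxid in records:
        taxa_by_sequence.setdefault(sequence, []).append(taxid)
    return {seq: lowest_common_ancestor(taxa, taxonomy)
            for seq, taxa in taxa_by_sequence.items()}


toy = {1: (1, "no rank"), 10: (1, "family"), 20: (10, "genus"),
       30: (20, "species"), 31: (20, "species"), 40: (10, "genus")}
print(dereplicate([("ACGT", 30), ("ACGT", 31), ("GGCC", 30), ("GGCC", 40)], toy))
# {'ACGT': 20, 'GGCC': 10}
```

Here `GGCC` is shared by a species and a different genus, so it collapses to the family (taxid 10). It would still pass a family-level filter but no longer has a species.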
## ensure that sequences each have a unique identification
obiannotate --uniq-id v_"${rd_prefix}"_clean_uniq_clean.fasta > db_"${rd_prefix}".fasta
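`obiannotate --uniq-id` ensures that no two records share an identifier. We do not reproduce its exact naming scheme here. The sketch below shows one way to deduplicate identifiers, using a `_N` suffix convention of our own; the function name is hypothetical.

```python
from typing import List


def make_unique_ids(ids: List[str]) -> List[str]:
    """Keep the first occurrence of an id and suffix later ones with _1, _2, ...

    The suffix convention is an assumption, not obiannotate's scheme. Each
    candidate is checked against every id emitted so far, so the output never
    contains duplicates, even when the input already holds an id such as AB1_1.
    """
    used = set()
    unique = []
    for seq_id in ids:
        candidate, n = seq_id, 0
        while candidate in used:
            n += 1
            candidate = f"{seq_id}_{n}"
        used.add(candidate)
        unique.append(candidate)
    return unique


print(make_unique_ids(["AB1", "AB2", "AB1", "AB1"]))  # ['AB1', 'AB2', 'AB1_1', 'AB1_2']
```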
# your reference database is built!
## your reference database is built!
@@ -54,7 +82,7 @@ obiannotate --uniq-id v_"${rd_prefix}"_clean_uniq_clean.fasta > db_"${rd_prefix}
#obitaxonomy -d "${rd_prefix}" -a 'Cidella_Hmolitrix':'species':10000088
obiconvert --skip-on-error --fasta -t ./TAXO --ecopcrdb-output=mitofish/"${rd_prefix}" mitofish/mitogene_12S.fasta
obiconvert --skip-on-error --embl -t ./TAXO --ecopcrdb-output="${rd_prefix}" EMBL/rel_std_*.dat
#obiconvert --skip-on-error --fasta -t ./TAXO --ecopcrdb-output=mitofish/"${rd_prefix}" mitofish/mitogene_12S.fasta
#obiconvert --skip-on-error --embl -t ./TAXO --ecopcrdb-output="${rd_prefix}" EMBL/rel_std_*.dat
###############################################################################
##########################################################################
## Codes for scientific papers related to metabarcoding studies
##
## AUTHORS
## =======
## * Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
## * Virginie Marques | virginie.marques@etu.umontpellier.fr
## * CNRS/CEFE, CNRS/MARBEC | Montpellier, France
## * 2018-2020
##
##
## Inspired by OBITools (Eric Coissac et al.)
## Molecular Ecology Resources 2015
##
## DESCRIPTION
## ===========
##
## Build a reference database for "RAPIDRUN" sequence data.
## We recommend running the commands one at a time in a Linux terminal
## to check the results of each step.
##
##
##
## USAGE
## =====
## bash build_bdr_rapidrun.sh
##
##
##########################################################################
##########################################################################
## RAPIDRUN
source ./config.sh
......
# argument values for building reference database
##########################################################################
## Codes for scientific papers related to metabarcoding studies
##
## AUTHORS
## =======
## * Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
## * Virginie Marques | virginie.marques@etu.umontpellier.fr
## * CNRS/CEFE, CNRS/MARBEC | Montpellier, France
## * 2018-2020
##
## DESCRIPTION
## ===========
##
## Information to fill in before running build_bdr.sh and the other scripts
##
##
##
##
##
##########################################################################
# argument values for building reference database
## "y" add sequences from mitofish to the reference database
## "n" don't add sequences from mitofish
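`build_bdr.sh` loads its settings with `source ./config.sh` and expands `MITOFISH`, `rd_prefix`, `ecoPCR_e`, `ecoPCR_l`, `ecoPCR_L`, `primer5` and `primer3`. A typo in `config.sh` only surfaces once ecoPCR or obigrep fails, so a small pre-flight check can save a long run. The helper below is a hypothetical sketch, not part of the repository. It only understands plain `KEY=value` lines, and the values in the usage example are placeholders, not real primers or recommended settings.

```python
import shlex
from typing import Dict

# Variables that build_bdr.sh expands after `source ./config.sh`.
REQUIRED_KEYS = ("MITOFISH", "rd_prefix", "ecoPCR_e", "ecoPCR_l", "ecoPCR_L", "primer5", "primer3")


def read_config(text: str) -> Dict[str, str]:
    """Parse plain KEY=value lines; anything else (if, export, ...) is ignored."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.isidentifier():
            continue
        words = shlex.split(value, comments=True)  # drops shell quotes and "# comments"
        config[key] = words[0] if words else ""
    return config


def check_config(config: Dict[str, str]) -> None:
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.sh is missing: {', '.join(missing)}")
    if config["MITOFISH"] not in ("y", "n"):
        raise ValueError("MITOFISH must be 'y' or 'n'")
    if int(config["ecoPCR_e"]) < 0:
        raise ValueError("ecoPCR_e must be a non-negative integer")
    if int(config["ecoPCR_l"]) > int(config["ecoPCR_L"]):
        raise ValueError("ecoPCR_l (min length) is larger than ecoPCR_L (max length)")


example = """
MITOFISH='n'
rd_prefix="std_12S"
ecoPCR_e=3
ecoPCR_l=20
ecoPCR_L=150
primer5="ACGTACGTAC"   # placeholder, not a real primer
primer3="TTGGCCAATT"
"""
cfg = read_config(example)
check_config(cfg)
print(cfg["primer5"], cfg["ecoPCR_L"])  # ACGTACGTAC 150
```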
......
##########################################################################
## Codes for scientific papers related to metabarcoding studies
##
## AUTHORS
## =======
## * Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
## * Virginie Marques | virginie.marques@etu.umontpellier.fr
## * CNRS/CEFE, CNRS/MARBEC | Montpellier, France
## * 2018-2020
##
##
## Inspired by OBITools (Eric Coissac et al.)
## Molecular Ecology Resources 2015
##
## DESCRIPTION
## ===========
##
## add gene sequences from the mitofish database
##
##
##########################################################################
##
mkdir mitofish
......
#===============================================================================
#HEADER
#===============================================================================
__author__ = "Pierre-Edouard Guerin"
__credits__ = ["Pierre-Edouard Guerin", "Virginie Marques"]
__license__ = "MIT"
__version__ = "1.0.1"
__maintainer__ = "Pierre-Edouard Guerin"
__email__ = "pierre-edouard.guerin@cefe.cnrs.fr"
__status__ = "Production"
"""
#
# Codes for scientific papers related to metabarcoding studies
AUTHORS
=======
* Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
* CNRS/CEFE, CNRS/MARBEC | Montpellier, France
* 2018-2020
DESCRIPTION
===========
extract gene sequences (12S for instance) from the mitofish database
"""
#===============================================================================
#MODULES
#===============================================================================
import Bio
from Bio import SeqIO
from Bio.Alphabet import IUPAC
......
#===============================================================================
#INFORMATIONS
#===============================================================================
"""
CEFE - EPHE - YERSIN 2018
guerin pierre-edouard
from OBITools FASTA reference database files,
it creates a STAMPA-format reference database
"""
#===============================================================================
#USAGES
#HEADER
#===============================================================================
__author__ = "Pierre-Edouard Guerin"
__credits__ = ["Pierre-Edouard Guerin", "Virginie Marques"]
__license__ = "MIT"
__version__ = "1.0.1"
__maintainer__ = "Pierre-Edouard Guerin"
__email__ = "pierre-edouard.guerin@cefe.cnrs.fr"
__status__ = "Production"
"""
#
# Codes for scientific papers related to metabarcoding studies
AUTHORS
=======
* Pierre-Edouard Guerin | pierre-edouard.guerin@cefe.cnrs.fr
* CNRS/CEFE, CNRS/MARBEC | Montpellier, France
* 2018-2020
DESCRIPTION
===========
Format obifasta file into STAMPA database FASTA file
USAGE
=====
input :
path to FASTA obitools file with "species=" information field available
output :
path to write the STAMPA-format FASTA reference database file
command :
python3 ncbi_to_stampa.py -f input -o output
"""
#===============================================================================
#MODULES
......