# reference_database

Collection of scripts to build a reference database.

# reference database built from EMBL taxonomy and sequences

This method is based on the [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools) reference database.

## Installation

To build this reference database, you will need a local copy of this repository.

* open a shell
* make a working folder with a name of your choice (`workdir` in this example)

```
mkdir workdir
cd workdir
```

* clone the project and move into its main folder, which is your working directory

```
git clone http://gitlab.mbb.univ-montp2.fr/edna/reference_database.git
cd reference_database
```

## Dependencies

You will also need the following programs installed on your computer:

- [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools)
- [ecoPCR](https://git.metabarcoding.org/obitools/ecopcr/wikis/home)

## Preparation

- install the dependencies
- clone the project (see the [Installation](#installation) section)
- fill in [config.sh](config.sh) and read the [ecoPCR](https://pythonhosted.org/OBITools/scripts/ecoPCR.html?highlight=ecopcr) documentation

## Build a reference database

* Overview of the steps

1. Download the sequences
2. Download the taxonomy
3. Use [ecoPCR](https://git.metabarcoding.org/obitools/ecopcr/wikis/home) to simulate an *in silico* PCR
4. Clean the database

* Run the script

Once you have filled in [config.sh](config.sh) with the right parameters, run the following command from the current folder `reference_database`:

```
bash build_bdr.sh
```

Downloading the sequences and running the *in silico* PCR and filtering steps takes a very long time. Make sure the script can run without interruption for several days. I recommend opening [build_bdr.sh](build_bdr.sh) and running it step by step.

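
Because the build can run for several days, one possible way to keep it alive after the terminal closes is to start it with `nohup` and send its output to a log file. This is only a sketch: the helper name `run_detached` and the log file name are hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): run a script in the
# background so that it keeps running after the terminal is closed.
# $1: script to run, $2: log file that receives stdout and stderr.
run_detached() {
    script="$1"
    log="$2"
    # nohup makes the job ignore the hangup signal sent when the shell exits.
    nohup bash "$script" > "$log" 2>&1 &
    # Print the PID of the background job so it can be monitored or stopped.
    echo "$!"
}
```

For example, `run_detached build_bdr.sh build_bdr.log` starts the build and prints its process ID; `tail -f build_bdr.log` then follows the progress.
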
For more information, this script is based on the [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools) tutorial available [here](https://pythonhosted.org/OBITools/wolves.html).

## Results

Let `{prefix}` be the prefix of the reference database file names, as defined in [config.sh](config.sh).

The project folder will contain `{prefix}_*.sdx` files and a `db_{prefix}.fasta` file. It also includes an `EMBL` folder, which contains all the sequences, and a `TAXO` folder, which contains the taxonomic information files.

## Use the reference database

Your reference database can now be used for taxonomic assignment in our pipelines, which derive species presence in the environment from raw eDNA data. Use the absolute path of your reference database folder as the `/path/to/reference_database` argument in the following pipelines:

* [only_obitools](http://gitlab.mbb.univ-montp2.fr/edna/only_obitools)
* [nextflow_obitools](https://gitlab.mbb.univ-montp2.fr/edna/nextflow_obitools)
* [snakemake_only_obitools](http://gitlab.mbb.univ-montp2.fr/edna/snakemake_only_obitools)
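
Before passing the folder to one of these pipelines, it may help to check that the outputs listed in the Results section are all present. The function below is a minimal sketch; the name `check_refdb` is hypothetical, and it only tests that the files and folders exist, not that their content is valid.

```shell
# Hypothetical helper (not part of this repository): check that a reference
# database folder contains the outputs listed in the Results section.
# $1: path to the reference database folder, $2: prefix defined in config.sh.
check_refdb() {
    dir="$1"
    prefix="$2"
    status=0
    # The db_{prefix}.fasta file.
    if [ ! -f "$dir/db_${prefix}.fasta" ]; then
        echo "missing: db_${prefix}.fasta" >&2
        status=1
    fi
    # At least one {prefix}_*.sdx file.
    if ! ls "$dir/${prefix}_"*.sdx > /dev/null 2>&1; then
        echo "missing: ${prefix}_*.sdx" >&2
        status=1
    fi
    # The folders holding the downloaded sequences and the taxonomy.
    for sub in EMBL TAXO; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $sub/" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_refdb /path/to/reference_database myprefix` (with `myprefix` standing in for your own prefix) returns 0 when everything is found and prints the missing items otherwise.
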