Commit 43cbdd91 authored by peguerin

prefix reference database

parent 4818a4ba
@@ -25,6 +25,7 @@ programs and libraries you absolutely need are:
- [GNU Parallel](https://www.gnu.org/software/parallel/)
+ In addition, you will need a reference database for taxonomic assignment. You can build a reference database by following the instructions [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
# 3. Reporting bugs
@@ -51,11 +52,11 @@ git clone http://gitlab.mbb.univ-montp2.fr/edna/only_obitools.git
cd only_obitools
```
* define 2 folders :
- - folder which contains reference database files. You can built a reference database by following the instructions [here](projet_builtdatabase).
+ - folder which contains reference database files. You can build a reference database by following the instructions [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
- folder which contains paired-end raw reads `.fastq.gz` files and the sample description `.dat` files. Raw reads files from the same pair must be named `*_R1.fastq.gz` and `*_R2.fastq.gz`, where the wildcard `*` is the name of the sequencing run. The alphanumeric order of the names of the sample description `.dat` files must be the same as that of the paired-end raw reads `.fastq.gz` files. The sample description file is a text file where each line describes one sample. Columns are separated by space or tab characters. The sample description file format is described [here](https://pythonhosted.org/OBITools/scripts/ngsfilter.html).
* run the pipeline :
```
- bash pipeline.sh /path/to/data /path/to/baseofreference
+ bash pipeline.sh /path/to/data /path/to/reference_database
```
order of arguments is important :
......
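The naming convention that the README hunk above requires for the data folder (each `*_R1.fastq.gz` has a matching `*_R2.fastq.gz`) can be checked before launching the pipeline. The following is only a minimal sketch and is not part of this commit: the function name `check_pairs` and the demo file names `runA` and `runB` are hypothetical, and the check covers read pairing only, not the ordering of the `.dat` files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): report every *_R1.fastq.gz
# file in a data folder that has no matching *_R2.fastq.gz file.
check_pairs() {
    local dir="$1"
    local r1 r2
    local missing=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        # Skip the unexpanded glob pattern when the folder has no R1 file
        [ -e "$r1" ] || continue
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $(basename "$r1")" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demo on a throwaway folder: runA is complete, runB lacks its R2 file
tmp=$(mktemp -d)
touch "$tmp/runA_R1.fastq.gz" "$tmp/runA_R2.fastq.gz" "$tmp/runB_R1.fastq.gz"
check_pairs "$tmp" || echo "data folder is incomplete"
rm -rf "$tmp"
```

The function returns a non-zero status as soon as one mate is missing, so it could be used as a guard in front of `bash pipeline.sh`.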
@@ -12,9 +12,9 @@ sample_description_file=$2
### absolute path to the folder which contains reference database files
base_dir=$4
### [very important and tricky !!! ]
- ### prefix of the reference database files after "embl_"
+ ### prefix of the reference database files
### the prefix must contain no "." or "_" characters
- base_pref=`ls $base_dir/embl_* | sed 's/embl_/|/g' | cut -d "|" -f 2 | cut -d "." -f 1 | cut -d "_" -f 1 | uniq`
+ base_pref=`ls $base_dir/*sdx | sed 's/_[0-9][0-9][0-9].sdx//'g | awk -F/ '{print $NF}' | uniq`
### path to the folder which stores intermediate and temporary results
main_dir=$(pwd)/main
### path to the folder which contains final results tables for this run
@@ -57,7 +57,7 @@ all_sample_sequences_uniq="${all_sample_sequences_clean/.fasta/.uniq.fasta}"
obiuniq -m sample $all_sample_sequences_clean > $all_sample_sequences_uniq
##Assign each sequence to a taxon
all_sample_sequences_tag="${all_sample_sequences_uniq/.fasta/.tag.fasta}"
- ecotag -d "$base_dir"/embl_"$base_pref" -R $base_dir/db_"$base_pref".fasta $all_sample_sequences_uniq > $all_sample_sequences_tag
+ ecotag -d "$base_dir"/"${base_pref}" -R $base_dir/db_"${base_pref}".fasta $all_sample_sequences_uniq > $all_sample_sequences_tag
##Some unuseful attributes can be removed at this stage
all_sample_sequences_ann="${all_sample_sequences_tag/.fasta/.ann.fasta}"
obiannotate --delete-tag=scientific_name_by_db --delete-tag=obiclean_samplecount \
......
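The new `base_pref` line in the hunks above derives the database prefix by stripping the `_NNN.sdx` suffix from the `.sdx` file names and dropping the directory part. The same idea can be sketched in isolation as follows. This is an illustration under stated assumptions, not the committed code: the function name `derive_prefix` and the prefix `myrefdb` are hypothetical, and the dot in the `sed` pattern is escaped here, which differs slightly from the committed expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the unique prefix(es)
# of the "<prefix>_NNN.sdx" files found in a reference database folder.
derive_prefix() {
    local dir="$1"
    local f
    for f in "$dir"/*.sdx; do
        # Skip the unexpanded glob pattern when the folder has no .sdx file
        [ -e "$f" ] || continue
        # Drop the directory part, then the "_NNN.sdx" suffix
        basename "$f" | sed 's/_[0-9][0-9][0-9]\.sdx$//'
    done | sort -u
}

# Demo on a throwaway folder holding two index files of one database
tmp=$(mktemp -d)
touch "$tmp/myrefdb_001.sdx" "$tmp/myrefdb_002.sdx" "$tmp/db_myrefdb.fasta"
base_pref=$(derive_prefix "$tmp")
echo "$base_pref"
rm -rf "$tmp"
```

If the folder held index files for more than one database, the function would print several lines, so a caller may want to stop when the result is not exactly one prefix.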