Commit 8db81907 authored by peguerin's avatar peguerin

stampa

parent eb3d0c37
......@@ -31,6 +31,10 @@ You will also need to have the following programs installed on your computer.
- [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools)
- [ecoPCR](https://git.metabarcoding.org/obitools/ecopcr/wikis/home)
- [python 3.6.6](https://www.python.org/downloads/release/python-366/)
- `python 3 library` [biopython](https://biopython.org/)
- `python 3 library` [etetoolkit](http://etetoolkit.org/)
## Preparation
......@@ -41,7 +45,7 @@ You will also need to have the following programs installed on your computer.
## Build a reference database
* Overview of the steps
1. Download the sequences
1. Download the sequences
2. Download the taxonomy
3. Use [ecoPCR](https://git.metabarcoding.org/obitools/ecopcr/wikis/home)
to simulate an *in silico* PCR
......@@ -73,3 +77,11 @@ You can use the absolute path of the folder of your reference database as the `/
* [nextflow_obitools](https://gitlab.mbb.univ-montp2.fr/edna/nextflow_obitools)
* [snakemake_only_obitools](http://gitlab.mbb.univ-montp2.fr/edna/snakemake_only_obitools)
## Reference database in STAMPA format
Once you generated you're reference database, you need the `db_{prefix}.fasta` file (see [results](Results) section).
To convert your reference database in STAMPA format type :
```
python3 scripts/formatDB_obifasta_to_stampa.py -f db_{prefix}.fasta -o /path/to/reference_database_stampa.fasta
```
It will generate a file `/path/to/reference_database_stampa.fasta` which you can use as a reference database into the pipeline [bash_swarm](https://gitlab.mbb.univ-montp2.fr/edna/bash_swarm)
\ No newline at end of file
#===============================================================================
#INFORMATIONS
#===============================================================================
"""
CEFE - EPHE - YERSIN 2018
guerin pierre-edouard
from reference database obitools fasta files,
it creates a stampa-format reference database
"""
#===============================================================================
#USAGES
#===============================================================================
"""
input :
path to FASTA obitools file with "species=" information field available
output :
path to write the STAMPA-format FASTA reference databasae file
command :
python3 ncbi_to_stampa.py -f input -o output
"""
#===============================================================================
#MODULES
#===============================================================================
import Bio
from Bio import SeqIO
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import argparse
from ete3 import NCBITaxa
#===============================================================================
#ARGUMENTS
#===============================================================================
parser = argparse.ArgumentParser(description='convert fasta obitools to stampa format')
parser.add_argument("-o","--output", type=str)
parser.add_argument("-f","--fasta",type=str)
#===============================================================================
#MAIN
#===============================================================================
args = parser.parse_args()
outputFile = args.output
fastaFile = args.fasta
ncbi = NCBITaxa()
mes_records=[]
for seq_record in SeqIO.parse(fastaFile, "fasta",alphabet=IUPAC.unambiguous_dna):
local_id=seq_record.id
sr_description=seq_record.description.split(';')
#print(local_id)
#print(sr_description)
for elem in sr_description:
#print(elem.split('=')[0])
if len(elem) == 0:
continue
if elem.split('=')[0] == " species":
cmd_get_dico="id_species="+str(elem.split('=')[1])
exec(cmd_get_dico)
if id_species != -1:
lineage=ncbi.get_lineage(id_species)
names = ncbi.get_taxid_translator(lineage)
all_tax_species=""
for taxid in lineage[2:]:
all_tax_species+=names[taxid]
all_tax_species+="|"
all_tax_species=all_tax_species.replace(" ","+")
else:
continue
local_seq=Seq(str(repr(str(seq_record.seq.lower()))).replace("'",""),IUPAC.unambiguous_dna)
local_record=SeqRecord(local_seq,id=seq_record.id,description=all_tax_species[:-1])
mes_records.append(local_record)
SeqIO.write(mes_records, outputFile, "fasta")
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment