Skip to content
GitLab
Menu
Projects
Groups
Snippets
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
edna
reference_database
Commits
8db81907
Commit
8db81907
authored
Jan 07, 2019
by
peguerin
Browse files
stampa
parent
eb3d0c37
Changes
2
Hide whitespace changes
Inline
Side-by-side
README.md
View file @
8db81907
...
...
@@ -31,6 +31,10 @@ You will also need to have the following programs installed on your computer.
-
[
OBItools
](
https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools
)
-
[
ecoPCR
](
https://git.metabarcoding.org/obitools/ecopcr/wikis/home
)
-
[
python 3.6.6
](
https://www.python.org/downloads/release/python-366/
)
-
`python 3 library`
[
biopython
](
https://biopython.org/
)
-
`python 3 library`
[
etetoolkit
](
http://etetoolkit.org/
)
## Preparation
...
...
@@ -41,7 +45,7 @@ You will also need to have the following programs installed on your computer.
## Build a reference database
*
Overview of the steps
1.
Download the sequences
¶
1.
Download the sequences
2.
Download the taxonomy
3.
Use
[
ecoPCR
](
https://git.metabarcoding.org/obitools/ecopcr/wikis/home
)
to simulate an
*in silico*
PCR
...
...
@@ -73,3 +77,11 @@ You can use the absolute path of the folder of your reference database as the `/
*
[
nextflow_obitools
](
https://gitlab.mbb.univ-montp2.fr/edna/nextflow_obitools
)
*
[
snakemake_only_obitools
](
http://gitlab.mbb.univ-montp2.fr/edna/snakemake_only_obitools
)
## Reference database in STAMPA format
Once you generated you're reference database, you need the
`db_{prefix}.fasta`
file (see
[
results
](
Results
)
section).
To convert your reference database in STAMPA format type :
```
python3 scripts/formatDB_obifasta_to_stampa.py -f db_{prefix}.fasta -o /path/to/reference_database_stampa.fasta
```
It will generate a file
`/path/to/reference_database_stampa.fasta`
which you can use as a reference database into the pipeline
[
bash_swarm
](
https://gitlab.mbb.univ-montp2.fr/edna/bash_swarm
)
\ No newline at end of file
scripts/formatDB_obifasta_to_stampa.py
0 → 100644
View file @
8db81907
#===============================================================================
#INFORMATIONS
#===============================================================================
"""
CEFE - EPHE - YERSIN 2018
guerin pierre-edouard
from reference database obitools fasta files,
it creates a stampa-format reference database
"""
#===============================================================================
#USAGES
#===============================================================================
"""
input :
path to FASTA obitools file with "species=" information field available
output :
path to write the STAMPA-format FASTA reference databasae file
command :
python3 ncbi_to_stampa.py -f input -o output
"""
#===============================================================================
#MODULES
#===============================================================================
import
Bio
from
Bio
import
SeqIO
from
Bio.Alphabet
import
IUPAC
from
Bio.Seq
import
Seq
from
Bio.SeqRecord
import
SeqRecord
import
argparse
from
ete3
import
NCBITaxa
#===============================================================================
#ARGUMENTS
#===============================================================================
parser
=
argparse
.
ArgumentParser
(
description
=
'convert fasta obitools to stampa format'
)
parser
.
add_argument
(
"-o"
,
"--output"
,
type
=
str
)
parser
.
add_argument
(
"-f"
,
"--fasta"
,
type
=
str
)
#===============================================================================
#MAIN
#===============================================================================
args
=
parser
.
parse_args
()
outputFile
=
args
.
output
fastaFile
=
args
.
fasta
ncbi
=
NCBITaxa
()
mes_records
=
[]
for
seq_record
in
SeqIO
.
parse
(
fastaFile
,
"fasta"
,
alphabet
=
IUPAC
.
unambiguous_dna
):
local_id
=
seq_record
.
id
sr_description
=
seq_record
.
description
.
split
(
';'
)
#print(local_id)
#print(sr_description)
for
elem
in
sr_description
:
#print(elem.split('=')[0])
if
len
(
elem
)
==
0
:
continue
if
elem
.
split
(
'='
)[
0
]
==
" species"
:
cmd_get_dico
=
"id_species="
+
str
(
elem
.
split
(
'='
)[
1
])
exec
(
cmd_get_dico
)
if
id_species
!=
-
1
:
lineage
=
ncbi
.
get_lineage
(
id_species
)
names
=
ncbi
.
get_taxid_translator
(
lineage
)
all_tax_species
=
""
for
taxid
in
lineage
[
2
:]:
all_tax_species
+=
names
[
taxid
]
all_tax_species
+=
"|"
all_tax_species
=
all_tax_species
.
replace
(
" "
,
"+"
)
else
:
continue
local_seq
=
Seq
(
str
(
repr
(
str
(
seq_record
.
seq
.
lower
()))).
replace
(
"'"
,
""
),
IUPAC
.
unambiguous_dna
)
local_record
=
SeqRecord
(
local_seq
,
id
=
seq_record
.
id
,
description
=
all_tax_species
[:
-
1
])
mes_records
.
append
(
local_record
)
SeqIO
.
write
(
mes_records
,
outputFile
,
"fasta"
)
Write
Preview
Supports
Markdown
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment