# Custom Metabarcoding Reference Database

[![Twitter Follow](https://img.shields.io/twitter/follow/ephe_bev?style=social)](https://twitter.com/ephe_bev)


Scripts to convert FASTA files into a reference database with NCBI taxonomy.

## Introduction

**MKBDR** is a Python program designed to create a reference database from a FASTA file using the NCBI taxonomy. It also provides tools to assist with and perform taxonomy curation on the input FASTA file.



## Method

![mkbdr](docs/mkbdr.png)

## Credits

**MKBDR** was coded and written by Pierre-Edouard Guerin, Laetitia Mathon and Virginie Marques.

We thank the following people for their help in the development of this software: Virginie Marques, Alice Valentini, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.

## Contributions and Support

:bug: If you are sure you have found a bug, please submit a bug report on GitLab [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/issues).




For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

[![Get help on Slack](https://img.shields.io/badge/slack-cefebev%23basereference-4A154B?logo=slack)](https://cefebev.slack.com/archives/C01MDQSS57F)

## Environment

To create the conda environments with the required software:

```
conda env create -f envs/obitools_envs.yaml
conda env create -f envs/pylib_cbdr.yaml
```

* Obitools

```
conda activate obitools
```

* Required python libraries to build custom reference database

```
conda activate pylib_cbdr
```






## Usage

First time loading the taxdump

```
mkbdr validate --fasta resources/test/raw.fasta \
--ncbi_taxdump "TAXO/taxdump_2021.tar.gz" \
--output_prefix "test_raw"
```

With a previously loaded taxdump (faster)

```
mkbdr validate --fasta resources/test/raw.fasta \
--output_prefix "test_raw"
```
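
For orientation, the NCBI taxdump archive passed to `--ncbi_taxdump` contains a `names.dmp` file listing every taxon name with its taxid. The sketch below is not MKBDR's own code; it is a minimal illustration, assuming a gzipped taxdump with `names.dmp` at the archive root, of how scientific names can be mapped to taxids. The function name `scientific_names` is hypothetical.

```python
import io
import tarfile


def scientific_names(taxdump_path):
    """Map each scientific name found in names.dmp to its NCBI taxid.

    Illustrative helper only (not part of MKBDR).
    """
    names = {}
    with tarfile.open(taxdump_path, "r:gz") as archive:
        member = archive.extractfile("names.dmp")
        for line in io.TextIOWrapper(member, encoding="utf-8"):
            # Fields are separated by "\t|\t"; each row ends with "\t|".
            fields = line.rstrip("\t|\n").split("\t|\t")
            taxid, name, _unique_name, name_class = fields
            # Keep only the accepted scientific name of each taxon.
            if name_class == "scientific name":
                names[name] = int(taxid)
    return names
```

A call such as `scientific_names("TAXO/taxdump_2021.tar.gz").get("Danio rerio")` would return the taxid if the name is present, or `None` otherwise.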

Apply curation

```
mkbdr validate --fasta resources/test/raw.fasta \
--curate curated_taxon.csv \
--output_prefix "test_curated"
```

Generate a curation csv file

```
mkbdr curegen --fasta test_raw_faulty_taxon.fasta \
--output_prefix "test"
```

Specify the globalnames database to query

```
mkbdr curegen --fasta test_raw_faulty_taxon.fasta \
--output_prefix "test" \
--database_globalnames 'Catalogue of Life'
```



_______________________________________________________________________________


Crash test

```
python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "truc"
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "truc" --ncbi_taxdump TAXO/testouille --ncbi_taxdump_edition
mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "laetinicer" --ncbi_taxdump_edition --ncbi_taxdump TAXO/testouille
```


obitools

```
conda activate obitools
ecotag -t TAXO/testouille -R truc_valide.fasta -m 0.95 -r nimp.fasta
```

### Taxdump Files

```
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
tar zxvf taxdump.tar.gz
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
```