# Custom Metabarcoding Reference Database

[![Twitter Follow](https://img.shields.io/twitter/follow/ephe_bev?style=social)](https://twitter.com/ephe_bev)


Scripts to convert FASTA files into a reference database with NCBI taxonomy.

## Introduction

**MKBDR** is a Python program designed to create a reference database from a FASTA file using the NCBI taxonomy. It also provides tools to assist with and perform taxonomy curation on the input FASTA file.
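
To make the input concrete, here is a minimal sketch of reading FASTA records in plain Python. It assumes the standard FASTA layout (a `>` header line followed by one or more sequence lines). `read_fasta` is a hypothetical helper written for illustration, not part of MKBDR, and it says nothing about how MKBDR itself parses headers.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    Hypothetical helper: assumes plain FASTA, with the header on a line
    starting with '>' and the sequence possibly wrapped over several lines.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Usage on an in-memory example (made-up records).
example = io.StringIO(">seq1 Salmo trutta\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

Each record is returned with its full header line, so any taxonomic information carried in the header stays available for later checks.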



## Method

1. [Installation](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Installing-MKBDR)
2. Input Files
    * [species representative records FASTA file](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#representative-sequences)
    * [NCBI taxonomy files](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#ncbi-taxonomies-file)
    * [Curation table file](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#curation-file)
3. [Quick start]()
4. Running MKBDR
    * [Module validate - Check taxonomy and format validity](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate)
    * [Module cureGen - Curation Generation](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Curegen)
    * [Module init_ncbi_taxdump - Download NCBI taxonomy](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/init_ncbi_taxdump)
    * [Apply taxonomy curation](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate#basic-curation)
    * [Add new species to NCBI taxonomy](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate#using-a-local-ncbi-taxonomy-performs-a-curation-which-add-new-species-to-your-local-taxonomy)
5. [Output results](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#output-files)
6. [How-to guide]()
7. [Reference](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/home#credits)
8. [Metabarcoding context - discussion to go further]()


![mkbdr](docs/mkbdr.png)

## Credits

**MKBDR** was coded and written by Pierre-Edouard Guerin, Laetitia Mathon and Virginie Marques.

We thank the following people for their help in the development of this software: Virginie Marques, Alice Valentini, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.

## Contributions and Support

:bug: If you are sure you have found a bug, please submit a bug report on GitLab [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/issues).




For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

[![Get help on Slack](https://img.shields.io/badge/slack-cefebev%23basereference-4A154B?logo=slack)](https://cefebev.slack.com/archives/C01MDQSS57F)

## Environment

To create the conda environments with the required software:

```
conda env create -f envs/obitools_envs.yaml
conda env create -f envs/pylib_cbdr.yaml
```

* Obitools

```
conda activate obitools
```

* Required Python libraries to build a custom reference database

```
conda activate pylib_cbdr
```






## Usage

First time loading the taxdump

```
mkbdr validate --fasta resources/test/raw.fasta \
--ncbi_taxdump "TAXO/taxdump_2021.tar.gz" \
--output_prefix "test_raw"
```

With the taxdump previously loaded (faster)

```
mkbdr validate --fasta resources/test/raw.fasta \
--output_prefix "test_raw"
```
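
For readers curious about what the taxdump archive holds, the sketch below reads `names.dmp` from an NCBI `taxdump.tar.gz` and maps scientific names to taxids. It relies on the NCBI dump layout (fields separated by `\t|\t`, rows ending with `\t|`). The function names are hypothetical, and this is an illustration rather than a description of how MKBDR loads the taxonomy.

```python
import io
import tarfile
from typing import Dict, Iterable


def parse_names_dmp(lines: Iterable[str]) -> Dict[str, int]:
    """Map scientific names to taxids from the rows of names.dmp.

    Row layout: tax_id | name_txt | unique name | name class
    Only rows whose name class is 'scientific name' are kept. If two taxa
    share a scientific name (homonyms), the later row overwrites the earlier.
    """
    name_to_taxid: Dict[str, int] = {}
    for line in lines:
        row = line.rstrip("\n")
        if row.endswith("\t|"):
            row = row[:-2]  # drop the row terminator
        fields = row.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed or empty rows
        taxid, name, _unique_name, name_class = fields[:4]
        if name_class == "scientific name":
            name_to_taxid[name] = int(taxid)
    return name_to_taxid


def load_names_from_taxdump(archive_path: str) -> Dict[str, int]:
    """Read names.dmp directly from a gzipped taxdump archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        member = archive.extractfile("names.dmp")
        if member is None:
            raise ValueError("names.dmp is not a regular file in the archive")
        return parse_names_dmp(io.TextIOWrapper(member, encoding="utf-8"))
```

A call such as `load_names_from_taxdump("TAXO/taxdump_2021.tar.gz")` would then return a dictionary that can be used to look up the species names found in a FASTA file.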

Apply curation

```
mkbdr validate --fasta resources/test/raw.fasta \
--curate curated_taxon.csv \
--output_prefix "test_curated"
```

Generate a curation CSV file

```
mkbdr curegen --fasta test_raw_faulty_taxon.fasta \
--output_prefix "test"
```

Specify the globalnames database to query

```
mkbdr curegen --fasta test_raw_faulty_taxon.fasta \
--output_prefix "test" \
--database_globalnames 'Catalogue of Life'
```
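
When these commands are driven from a script, building the argument list programmatically avoids line-continuation mistakes in long shell commands. The following is a minimal sketch, assuming `mkbdr` is available on the `PATH`. It uses only the `validate` options shown in this README, and `build_validate_command` is a hypothetical helper, not part of MKBDR.

```python
import shlex
from typing import List, Optional


def build_validate_command(
    fasta: str,
    output_prefix: str,
    ncbi_taxdump: Optional[str] = None,
    curate: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `mkbdr validate` (hypothetical helper).

    Only options that appear in this README are used; optional ones are
    added when a value is given.
    """
    command = ["mkbdr", "validate", "--fasta", fasta]
    if ncbi_taxdump is not None:
        command += ["--ncbi_taxdump", ncbi_taxdump]
    if curate is not None:
        command += ["--curate", curate]
    command += ["--output_prefix", output_prefix]
    return command


# Example: the "Apply curation" call, as an argument list and as a string.
command = build_validate_command(
    "resources/test/raw.fasta", "test_curated", curate="curated_taxon.csv"
)
printable = shlex.join(command)  # shell-quoted form, handy for logging
```

The resulting list can be passed to `subprocess.run(command, check=True)`, which raises an error if the command exits with a non-zero status.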



_______________________________________________________________________________


crash test
```
python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "truc"
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "truc" --ncbi_taxdump TAXO/testouille --ncbi_taxdump_edition
mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "laetinicer" --ncbi_taxdump_edition --ncbi_taxdump TAXO/testouille
```


obitools

```
conda activate obitools
ecotag -t TAXO/testouille -R truc_valide.fasta -m 0.95 -r nimp.fasta
```

### Taxdump Files

```
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
tar zxvf taxdump.tar.gz
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
```