# Custom Metabarcoding Reference Database

[![Twitter Follow](https://img.shields.io/twitter/follow/ephe_bev?style=social)](https://twitter.com/ephe_bev)


Scripts to convert FASTA files into a reference database with NCBI taxonomy.

## Introduction

**MKBDR** is a Python program that builds a reference database from a FASTA file using the NCBI taxonomy. It also provides tools to assist with and perform taxonomy curation on the input FASTA file.



## Method

1. [Installation](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Installing-MKBDR)
2. Input Files
    * [species representative records FASTA file](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#representative-sequences)
    * [NCBI taxonomy files](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#ncbi-taxonomies-file)
    * [Curation table file](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#curation-file)
3. [Quick start]()
4. Running MKBDR
    * [Module validate - Check taxonomy and format validity](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate)
    * [Module cureGen - Curation Generation](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Curegen)
    * [Module init_ncbi_taxdump - Download NCBI taxonomy](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/init_ncbi_taxdump)
    * [Apply taxonomy curation](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate#basic-curation)
    * [Add new species to NCBI taxonomy](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Validate#using-a-local-ncbi-taxonomy-performs-a-curation-which-add-new-species-to-your-local-taxonomy)
5. [Output results](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/Files-definition#output-files)
6. [How-to guide]()
7. [Reference](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/wikis/home#credits)
8. [Metabarcoding context - discussion to go further]()

## Workflow

![mkbdr](docs/mkbdr.png)

## Credits

**MKBDR** was coded and written by Pierre-Edouard Guerin, Laetitia Mathon and Virginie Marques.

We thank the following people for their help in the development of this software: Virginie Marques, Alice Valentini, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.

## Contributions and Support

:bug: If you have found a bug, please submit a bug report on GitLab [here](https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/issues).




For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

[![Get help on Slack](https://img.shields.io/badge/slack-cefebev%23basereference-4A154B?logo=slack)](https://cefebev.slack.com/archives/C01MDQSS57F)

## Environment

To create the conda environments with the required software:

```
conda env create -f envs/obitools_envs.yaml
conda env create -f envs/pylib_cbdr.yaml
```

* Obitools

```
conda activate obitools
```

* Python libraries required to build a custom reference database

```
conda activate pylib_cbdr
```






## Usage

First time loading the taxdump

```
mkbdr validate --fasta resources/test/raw.fasta \
--ncbi_taxdump "TAXO/taxdump_2021.tar.gz" \
--output_prefix "test_raw"
```
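
Before the first run, it can help to confirm that the taxdump archive is readable and holds the core NCBI tables. The following is a minimal sketch, not part of MKBDR: the function name `check_taxdump` is hypothetical, and it assumes a standard NCBI taxdump archive containing `names.dmp` and `nodes.dmp`.

```shell
# Hypothetical helper (not an MKBDR command): sanity-check a taxdump archive.
check_taxdump() {
    local archive="$1" listing required
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    # List the archive members without extracting them.
    listing=$(tar -tzf "$archive") || return 1
    # Assumption: a standard NCBI taxdump ships names.dmp and nodes.dmp.
    for required in names.dmp nodes.dmp; do
        if ! printf '%s\n' "$listing" | grep -qxF -e "$required" -e "./$required"; then
            echo "$required not found in $archive" >&2
            return 1
        fi
    done
}
```

For example, `check_taxdump TAXO/taxdump_2021.tar.gz` returns a non-zero status if the archive is missing or incomplete, so it can guard the `mkbdr validate` call.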

Taxdump previously loaded (faster)

```
mkbdr validate --fasta resources/test/raw.fasta \
--output_prefix "test_raw"
```

Apply curation

```
mkbdr validate --fasta resources/test/raw.fasta \
--curate curated_taxon.csv \
--output_prefix "test_curated"
```

Generate a curation CSV file

```
mkbdr curegen --fasta test_raw_faulty_taxon.fasta \
--output_prefix "test"
```

Specify the globalnames database to query

```
mkbdr curegen --fasta test_raw_faulty_taxon.fasta \
--output_prefix "test" \
--database_globalnames 'Catalogue of Life'
```
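
The commands above can be chained into one script. The sketch below is an illustration under assumptions, not an official MKBDR workflow: it uses only the flags shown in this README, the helper names `run` and `mkbdr_workflow` are hypothetical, and it assumes that `validate` writes a `<prefix>_faulty_taxon.fasta` file (inferred from the file names in the examples above). The curation table is passed in explicitly because the name of the file written by `curegen` is not stated here.

```shell
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper chaining the documented steps.
# Arguments: raw FASTA, taxdump archive, reviewed curation table, output prefix.
mkbdr_workflow() {
    local fasta="$1" taxdump="$2" curation="$3" prefix="$4"
    # 1. Validate the raw FASTA against the NCBI taxonomy.
    run mkbdr validate --fasta "$fasta" --ncbi_taxdump "$taxdump" \
        --output_prefix "${prefix}_raw"
    # 2. Generate a curation table from the records that failed validation.
    #    Assumption: step 1 wrote ${prefix}_raw_faulty_taxon.fasta.
    run mkbdr curegen --fasta "${prefix}_raw_faulty_taxon.fasta" \
        --output_prefix "$prefix"
    # 3. Apply the curation table to the raw FASTA.
    run mkbdr validate --fasta "$fasta" --curate "$curation" \
        --output_prefix "${prefix}_curated"
}
```

Setting `DRY_RUN=1` before calling `mkbdr_workflow` prints the three commands instead of running them, which is a cheap way to check paths and prefixes. In practice the curation table should be reviewed by hand between steps 2 and 3.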



_______________________________________________________________________________


crash test
```
python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "truc"
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
python3 mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "truc" --ncbi_taxdump TAXO/testouille --ncbi_taxdump_edition
mkbdr validate --fasta teleo_ok.fasta --curate curated_taxon.csv --output_prefix "laetinicer" --ncbi_taxdump_edition --ncbi_taxdump TAXO/testouille
```


OBITools (taxonomic assignment with `ecotag`)

```
conda activate obitools
ecotag -t TAXO/testouille -R truc_valide.fasta -m 0.95 -r nimp.fasta
```

### Taxdump Files

```
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
tar zxvf taxdump.tar.gz
cd TAXO/testouille; tar zxvf taxdump_2021.tar.gz ; cd ../../
```