# Custom Metabarcoding Reference Database

![Twitter URL](https://img.shields.io/twitter/url?style=social&url=https%3A%2F%2Fgitlab.mbb.univ-montp2.fr%2Fedna%2Fcustom_reference_database)

Scripts to convert FASTA files into a reference database linked to the NCBI taxonomy.

## Introduction

These scripts build a custom reference database that contains only our own sequences and is linked to the NCBI taxonomy.



## Workflow

* inputs:
    * FASTA file

0. Get raw FASTA files of new sequences with species names
1. Extract sequence names
2. Check sequence name format
3. Check sequence format (IUPAC ambiguity codes, gaps)
4. Correct species names against the NCBI taxonomy (this step is semi-automatic)
5. Assign NCBI taxonomy taxids
6. Extract names with a missing taxid
    1. Assign the NCBI taxonomy taxid of the genus
    2. Run the `obitaxonomy` command for species that still have no taxid
7. Write a FASTA file of the sequences with their taxid and complete genus-species name
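
Steps 1 to 3 can be illustrated with a minimal Python sketch. It is not the code shipped in this repository: the header convention (`>Genus species`), the pattern `NAME_PATTERN` and the function names `parse_fasta` and `check_record` are assumptions made for the example, and the real scripts may apply different rules.

```python
import re

# Assumption: each header reads ">Genus species" (a binomial name).
# The naming convention actually used by this repository may differ.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+ [a-z]+$")
# IUPAC nucleotide codes: the four bases plus the ambiguity codes.
IUPAC_CODES = set("ACGTRYSWKMBDHVN")
GAP_CHARS = set("-.")


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_record(name, sequence):
    """Return the list of problems found in one record (empty if none)."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("bad name format")
    seq = sequence.upper()
    if any(c in GAP_CHARS for c in seq):
        problems.append("contains gaps")
    if any(c in IUPAC_CODES - set("ACGT") for c in seq):
        problems.append("contains ambiguous bases")
    if any(c not in IUPAC_CODES | GAP_CHARS for c in seq):
        problems.append("contains invalid characters")
    return problems
```

Calling `check_record(name, sequence)` on every pair yielded by `parse_fasta` gives a per-record report that can be reviewed before the taxonomy steps.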

* outputs:
    * formatted FASTA file
    * `.ldx` file that adds new taxonomy nodes for species with a missing taxid and links them to an existing genus or family taxid
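
As a rough illustration of steps 5 to 7, the sketch below separates records whose species name has a known taxid from those that still need one, and formats the resolved records with `key=value;` header attributes in the OBITools style. The function names, the `species_name` attribute key and the name-to-taxid dictionary are hypothetical; in the real workflow the taxids come from the NCBI taxonomy.

```python
def format_record(seq_id, species, taxid, sequence, width=60):
    """Format one record with an OBITools-style annotated header."""
    header = f">{seq_id} taxid={taxid}; species_name={species};"
    # Wrap the sequence to a fixed line width.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines)


def split_by_taxid(records, taxids):
    """Separate records with a known taxid from species still missing one.

    records: iterable of (seq_id, species, sequence) tuples
    taxids:  dict mapping species name to taxid (hypothetical lookup table)
    """
    resolved, missing = [], []
    for seq_id, species, sequence in records:
        if species in taxids:
            resolved.append(
                format_record(seq_id, species, taxids[species], sequence)
            )
        else:
            missing.append(species)
    return resolved, missing
```

The `missing` list corresponds to the names handled in step 6, while `"\n".join(resolved)` gives the content of the formatted FASTA file.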

## Environment

To create the conda environments with the required software:

```
conda env create -f envs/obitools_envs.yaml
conda env create -f envs/pylib_cbdr.yaml
```

* OBITools

```
conda activate obitools
```

* Python libraries required to build the custom reference database

```
conda activate pylib_cbdr
```