# custom_reference_database


## Introduction

Scripts to build a custom reference database from our own sequences only, using the NCBI taxonomy.



## Workflow

* inputs:
    * FASTA file

0. Get raw FASTA files of the new sequences, with species names
1. Extract sequence names
2. Check sequence name format
3. Check sequence format (IUPAC ambiguity codes, gaps)
4. Correct species names against the NCBI taxonomy (this step is semi-automatic)
5. Assign NCBI taxonomy taxids
6. Extract names with a missing taxid
    1. Assign the NCBI taxonomy taxid of the genus
    2. Run the `obitaxonomy` command for species without an assigned taxid
7. Write a FASTA file of sequences with their taxid and complete genus-species name
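
The name and sequence checks early in the workflow can be illustrated with a minimal Python sketch. It is not the code shipped in this repository: the `Genus_species` header pattern, the function names `read_fasta` and `check_record`, and the example records are all assumptions made for illustration.

```python
import re

# Assumption: headers look like ">Genus_species"; the real scripts may expect another format.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+_[a-z]+$")
UNAMBIGUOUS = set("ACGT")
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")
GAPS = set("-.")


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_record(name, sequence):
    """Return a list of problems found in one record (empty if the record is clean)."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("bad name format")
    symbols = set(sequence.upper())
    if symbols & IUPAC_AMBIGUOUS:
        problems.append("IUPAC ambiguity codes")
    if symbols & GAPS:
        problems.append("gaps")
    if symbols - UNAMBIGUOUS - IUPAC_AMBIGUOUS - GAPS:
        problems.append("invalid characters")
    return problems


# Hypothetical input: one clean record and one with a bad name, an N and a gap.
example = """>Mullus_surmuletus
ACGTACGT
>mullus barbatus
ACGNAC-T
""".splitlines()

for name, seq in read_fasta(example):
    print(name, check_record(name, seq))
```

Returning a list of problems per record, instead of failing on the first one, suits the semi-automatic name correction that follows, since flagged records can be reviewed by hand.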

* outputs:
    * formatted FASTA file
    * `.ldx` file: new taxonomy nodes for species with a missing taxid, linked to an existing genus/family taxid
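
The taxid assignment with genus fallback and the final FASTA formatting might be sketched as below. This is an illustration under stated assumptions, not the repository's implementation: the lookup tables are hypothetical and their taxids are placeholders, the helper names are invented, and the `species_name=...; taxid=...;` header layout is assumed to follow the OBITools key=value convention.

```python
# Hypothetical lookup tables with placeholder taxids (not real NCBI values).
# In practice these would be built from an NCBI taxonomy dump.
SPECIES_TAXID = {"Mullus surmuletus": 1001}
GENUS_TAXID = {"Mullus": 1000}


def assign_taxid(species_name, species_taxid, genus_taxid):
    """Return (taxid, rank): the species taxid, else the genus taxid, else (None, None)."""
    if species_name in species_taxid:
        return species_taxid[species_name], "species"
    genus = species_name.split()[0]
    if genus in genus_taxid:
        return genus_taxid[genus], "genus"
    return None, None


def format_record(seq_id, species_name, taxid, sequence):
    """Format one FASTA record with an assumed OBITools-style annotated header."""
    return f">{seq_id} species_name={species_name}; taxid={taxid};\n{sequence}\n"


# Hypothetical records: (sequence id, species name, sequence).
records = [
    ("seq1", "Mullus surmuletus", "ACGTACGT"),
    ("seq2", "Mullus barbatus", "ACGTTCGT"),
]

missing_species = []  # names that would need a new node in the taxonomy
for seq_id, species_name, sequence in records:
    taxid, rank = assign_taxid(species_name, SPECIES_TAXID, GENUS_TAXID)
    if rank != "species":
        missing_species.append(species_name)
    if taxid is not None:
        print(format_record(seq_id, species_name, taxid, sequence), end="")

print("species without their own taxid:", missing_species)
```

Species that only resolve at genus level are collected separately, which mirrors the idea of adding new nodes for them under an existing genus or family taxid.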