Commit d8fb5aa9 authored by peguerin

readme update

parent ded7c5b8
# custom_reference_database
## Introduction
Scripts to create our own reference database, containing only our own sequences and annotated with the NCBI taxonomy.
## Workflow
* inputs:
    * FASTA file
0. Get the raw FASTA files of the new sequences, labelled with species names
1. Extract the sequence names
2. Check the format of the sequence names
3. Check the format of the sequences (IUPAC ambiguity codes, gaps)
4. Correct the species names to match the NCBI taxonomy (semi-automatic)
5. Assign the NCBI taxonomy taxid of each species
6. Extract the names with a missing taxid
   1. Assign the NCBI taxonomy taxid of the genus
   2. Run the `obitaxonomy` command to add the species that still have no taxid
7. Write a FASTA file of the sequences with their taxid and complete genus-species name
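Steps 1 to 3 above can be sketched in Python as follows. This is a minimal illustration, not the repository's own scripts, and it rests on assumptions: each FASTA header holds only the species name (words separated by spaces or underscores), a valid name is "Genus species" optionally followed by one more lower-case epithet, and the sequence check reports gaps (`-`, `.`), IUPAC ambiguity codes and any non-IUPAC character. The helper names (`read_fasta`, `extract_name`, `check_name`, `check_sequence`) and the `NAME_PATTERN` rule are hypothetical.

```python
import re
from collections.abc import Iterable, Iterator

# IUPAC nucleotide codes: A, C, G, T/U plus the ambiguity codes (R, Y, N, ...).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
AMBIGUITY_CODES = IUPAC_NUCLEOTIDES - set("ACGTU")
GAP_CHARS = set("-.")

# Hypothetical naming rule: "Genus species", optionally followed by one more
# lower-case epithet (e.g. a subspecies). Adjust to the project's convention.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+ [a-z]+( [a-z]+)?$")


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_name(header: str) -> str:
    """Step 1: assumes the whole header is the species name."""
    # Treat underscores as spaces and collapse repeated whitespace.
    return " ".join(header.replace("_", " ").split())


def check_name(name: str) -> bool:
    """Step 2: True if the name follows the assumed NAME_PATTERN."""
    return NAME_PATTERN.match(name) is not None


def check_sequence(seq: str) -> list[str]:
    """Step 3: list the problems found in a sequence (empty list = clean)."""
    chars = set(seq.upper())
    problems = []
    if chars & GAP_CHARS:
        problems.append("contains gaps")
    invalid = sorted(chars - IUPAC_NUCLEOTIDES - GAP_CHARS)
    if invalid:
        problems.append("invalid characters: " + "".join(invalid))
    ambiguous = sorted(chars & AMBIGUITY_CODES)
    if ambiguous:
        problems.append("IUPAC ambiguity codes: " + "".join(ambiguous))
    return problems


if __name__ == "__main__":
    example = [">Salmo_trutta", "ACGTRACGT", ">salmo trutta", "ACG-TNX"]
    for header, seq in read_fasta(example):
        name = extract_name(header)
        print(name, check_name(name), check_sequence(seq))
```

Returning a list of problems rather than a single boolean keeps the semi-automatic spirit of the workflow: flagged records can be reviewed by hand before the taxonomy steps.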
* outputs:
    * formatted FASTA file
    * `.ldx` file with the new nodes added to the taxonomy for species with a missing taxid, linked to the existing genus or family taxid
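The taxid assignment steps (species taxid, then genus taxid as a fallback) and the final FASTA writing step can be sketched as below. This is a hedged illustration under stated assumptions, not the project's implementation: taxids are looked up by exact match against the scientific names in the NCBI taxdump file `names.dmp`; names shared by several taxa are skipped; the genus is taken as the first word of the name and its rank is not checked against `nodes.dmp`; and the output header uses OBITools-style `key=value;` annotations (`taxid`, `species_name`). Species returned in `genus_fallback` are the ones that would need new nodes via `obitaxonomy`, which this sketch does not call. All function names are hypothetical, and the demo data are made up.

```python
from collections import defaultdict
from collections.abc import Iterable


def load_scientific_names(lines: Iterable[str]) -> dict[str, int]:
    r"""Map scientific names to taxids, from lines of NCBI taxdump names.dmp.

    Fields are separated by "\t|\t": taxid, name, unique name, name class.
    Names shared by several taxa (homonyms) are dropped, so they end up
    reported as missing instead of being attached to the wrong taxon.
    """
    candidates: dict[str, set[int]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\t|\r\n").split("\t|\t")
        if len(fields) >= 4 and fields[3] == "scientific name":
            candidates[fields[1]].add(int(fields[0]))
    return {name: next(iter(ids)) for name, ids in candidates.items() if len(ids) == 1}


def assign_taxids(
    species: Iterable[str], name_to_taxid: dict[str, int]
) -> tuple[dict[str, int], dict[str, int], list[str]]:
    """Exact species match first, then the genus taxid as a fallback.

    Returns (assigned, genus_fallback, unresolved):
      assigned        species name -> species taxid
      genus_fallback  species name -> genus taxid (candidates for new nodes)
      unresolved      species whose genus is not found either
    """
    assigned, genus_fallback, unresolved = {}, {}, []
    for name in species:
        if name in name_to_taxid:
            assigned[name] = name_to_taxid[name]
            continue
        genus = name.split(" ", 1)[0]  # assumes "Genus species" names
        if genus in name_to_taxid:
            genus_fallback[name] = name_to_taxid[genus]
        else:
            unresolved.append(name)
    return assigned, genus_fallback, unresolved


def format_record(seq_id: str, name: str, taxid: int, seq: str, width: int = 60) -> str:
    """FASTA record with OBITools-style key=value; annotations (assumed format)."""
    header = f">{seq_id} taxid={taxid}; species_name={name};"
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, *body])


if __name__ == "__main__":
    # Made-up rows and taxids, for illustration only (not real NCBI records).
    toy_dmp = [
        "100\t|\tGenusa\t|\t\t|\tscientific name\t|\n",
        "101\t|\tGenusa alba\t|\t\t|\tscientific name\t|\n",
    ]
    names = load_scientific_names(toy_dmp)
    assigned, fallback, unresolved = assign_taxids(
        ["Genusa alba", "Genusa nova", "Otheria rara"], names
    )
    for name, taxid in assigned.items():
        print(format_record(name.replace(" ", "_"), name, taxid, "ACGTACGT"))
    print("needs a new node under genus taxid:", fallback)
    print("no taxid found:", unresolved)
```

Dropping homonyms is a deliberately conservative choice: a genus name such as one shared between a plant and an animal would otherwise silently attach a sequence to the wrong branch of the taxonomy.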