# Custom Metabarcoding Reference Database

[![Twitter Follow](https://img.shields.io/twitter/follow/ephe_bev?style=social)](https://twitter.com/ephe_bev)

Scripts to convert FASTA files into a reference database linked to the NCBI taxonomy.

## Introduction

These scripts build a custom reference database from our own sequences only, using the NCBI taxonomy.

## Workflow

* inputs:
    * FASTA file

0. Get raw FASTA files of new sequences with species names
1. Extract sequence names
2. Check sequence name format
3. Check sequence format (IUPAC ambiguity codes, gaps)
4. Correct species names against the NCBI taxonomy (this step is semi-automatic)
5. Assign NCBI taxonomy taxids
6. Extract names with a missing taxid
    1. Assign the NCBI taxonomy taxid of the genus
    2. Run the `obitaxonomy` command for species without a taxid
7. Write a FASTA file of sequences with their taxid and complete genus-species name

* outputs:
    * formatted FASTA file
    * `.ldx` file: new taxonomy nodes for species with a missing taxid, linked to an existing genus or family taxid

## Environment

To create the environments with the required software:

```
conda env create -f envs/obitools_envs.yaml
conda env create -f envs/pylib_cbdr.yaml
```

* Obitools

```
conda activate obitools
```

* Required Python libraries to build the custom reference database

```
conda activate pylib_cbdr
```
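
## Example: checking names and sequences

Steps 1 to 3 of the workflow (extract sequence names, check the name format, check the sequence format) can be sketched in plain Python as follows. This is a minimal illustration, not the code shipped in this repository: the function names `read_fasta` and `check_record` are hypothetical, and the header convention `>Genus_species` is an assumption made for the example.

```python
import re

# IUPAC nucleotide alphabet: plain bases, ambiguity codes, gap symbols.
UNAMBIGUOUS = set("ACGT")
AMBIGUOUS = set("RYSWKMBDHVN")
GAPS = set("-.")

# Assumed header convention (hypothetical): first token is "Genus_species".
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+_[a-z]+$")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_record(header, sequence):
    """Return the list of problems found in one record (empty if none)."""
    problems = []
    tokens = header.split()
    name = tokens[0] if tokens else ""
    if not NAME_PATTERN.match(name):
        problems.append("name format")
    symbols = set(sequence.upper())
    if symbols & AMBIGUOUS:
        problems.append("ambiguous bases")
    if symbols & GAPS:
        problems.append("gaps")
    if symbols - UNAMBIGUOUS - AMBIGUOUS - GAPS:
        problems.append("invalid characters")
    return problems


if __name__ == "__main__":
    example = [">Salmo_trutta", "ACGTN-ACGT", ">bad name", "ACGT"]
    for header, sequence in read_fasta(example):
        print(header, check_record(header, sequence))
```

Records that come back with a non-empty problem list would be the ones to fix by hand before the taxid assignment steps.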