MKBDR: Make a custom reference database for metabarcoding
MKBDR processes FASTA files to produce a reference database in the correct format, with a taxonomy consistent with the NCBI taxonomy. If new species must be added, it can produce a custom NCBI taxonomy.
The custom reference database generated by MKBDR can be used in downstream analyses to perform taxonomic assignment with software such as ecotag.
See Installing MKBDR for installation instructions.
Download example data with:
curl -LJO https://gitlab.mbb.univ-montp2.fr/edna/custom_reference_database/-/raw/master/tests/data/raw.fasta
raw.fasta: a FASTA file of 4 records, each a representative sequence of one of 4 taxonomic groups. More details about input files here.
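To confirm the download, you can count the records in raw.fasta. The sketch below is plain shell with grep, not part of MKBDR. It assumes the usual FASTA convention that every record starts with one header line beginning with ">". The function name count_records is hypothetical.

```shell
# Hypothetical helper (not an MKBDR command): print the number of FASTA records
# in a file, counting header lines that start with '>'.
count_records() {
  # grep -c prints 0 and exits 1 when nothing matches; treat that as success.
  # Exit status 2 (e.g. unreadable or missing file) makes the function fail.
  grep -c '^>' "$1" || [ "$?" -eq 1 ]
}
```

For example, count_records raw.fasta should print 4 for the example file.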
Once MKBDR is installed, you can run it on the example data with:
mkbdr validate --fasta raw.fasta --output_prefix res_raw
This will output:
Checking arguments...done.
Validate records...
Loading local NCBI taxonomy...done.
4 processed records.
On these records, 2 are valid, 0 are faulty format and 2 are faulty taxon.
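When scripting several runs, it can help to extract these counts from a saved log. The sketch below assumes the summary wording is exactly as in the example output above ("4 processed records", "2 are valid", and so on). parse_validate_log is a hypothetical helper, not an MKBDR command, and it relies on grep -o, which GNU and BSD grep support although POSIX does not require it.

```shell
# Hypothetical helper: read an mkbdr validate log on stdin and print
# processed=N, valid=N, faulty_format=N and faulty_taxon=N (empty if absent).
# Assumes the summary sentences use the wording shown in the example output.
parse_validate_log() {
  log=$(cat)
  for key in processed valid faulty_format faulty_taxon; do
    case $key in
      processed)     phrase='processed records' ;;
      valid)         phrase='are valid' ;;
      faulty_format) phrase='are faulty format' ;;
      faulty_taxon)  phrase='are faulty taxon' ;;
    esac
    # Keep the number written just before the phrase (first match only).
    n=$(printf '%s\n' "$log" | grep -oE "[0-9]+ $phrase" | head -n 1 | cut -d ' ' -f 1)
    printf '%s=%s\n' "$key" "$n"
  done
}
```

Usage, saving the log first:
mkbdr validate --fasta raw.fasta --output_prefix res_raw 2>&1 | tee res_raw.log
parse_validate_log < res_raw.log
The 2>&1 is there because this page does not say whether the log goes to standard output or standard error. On the example log this prints processed=4, valid=2, faulty_format=0 and faulty_taxon=2.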
The validate module checks whether the format and the taxonomy of each record are valid, then writes 3 files:
res_raw_faulty_format.fasta: a FASTA file with faulty format records (empty in this example)
res_raw_faulty_taxon.fasta: a FASTA file with faulty taxonomy records (2 faulty records in this example)
res_raw_valide.fasta: a FASTA file with the valid records, which can be used as a reference database for taxonomic assignment (2 valid records in this example)
Read more details about output files here.
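To cross-check the output files against the summary, you can count the records written to each one. This sketch assumes the file names listed above (<prefix>_valide.fasta, <prefix>_faulty_format.fasta, <prefix>_faulty_taxon.fasta) and that every record starts with a ">" header line. report_outputs is a hypothetical helper name.

```shell
# Hypothetical helper: for a given output prefix, print "<file><TAB><count>"
# for each mkbdr validate output file, then "total<TAB><sum>".
report_outputs() {
  prefix=$1
  total=0
  for suffix in valide faulty_format faulty_taxon; do
    file="${prefix}_${suffix}.fasta"
    if [ ! -f "$file" ]; then
      printf '%s\tmissing\n' "$file"
      continue
    fi
    # grep -c prints 0 (exit status 1) for an empty file; '|| true' keeps that 0.
    n=$(grep -c '^>' "$file" || true)
    printf '%s\t%s\n' "$file" "$n"
    total=$((total + n))
  done
  printf 'total\t%s\n' "$total"
}
```

For the example, run report_outputs res_raw. If every processed record is written to exactly one file, as the example summary suggests (2 + 0 + 2 = 4), the total should equal the number of processed records.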
Now that the example works, explore the more detailed descriptions and instructions to run MKBDR on your own data, starting with the Running MKBDR section.
MKBDR requires:
- Python 3 with the following modules: argparse, numpy, biopython, pandas, PyQt5, ete3, pytaxize, pathlib
- the NCBI taxonomy files
- a formatted FASTA file of the sequences, with their NCBI species names
- (optional) a curation CSV table file
See the example above for the provided example inputs. These required inputs are described in more detail here.
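Before running MKBDR, you can check that the required Python modules import correctly. This is a sketch that assumes python3 is the interpreter MKBDR runs under; check_python_modules is a hypothetical helper. Note that biopython is imported as Bio, while the other packages are imported under their listed names. The NCBI taxonomy files and the input files are not checked here.

```shell
# Hypothetical helper: try to import each named module with python3,
# print "ok" or "MISSING" per module, and return 1 if any import fails.
check_python_modules() {
  missing=0
  for module in "$@"; do
    if python3 -c "import $module" 2>/dev/null; then
      printf 'ok\t%s\n' "$module"
    else
      printf 'MISSING\t%s\n' "$module"
      missing=1
    fi
  done
  return "$missing"
}
```

For the modules listed above:
check_python_modules argparse numpy Bio pandas PyQt5 ete3 pytaxize pathlib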
- Input Files
- Quick start
- Running MKBDR
- Output results
- How-to guide
- Metabarcoding context - discussion to go further
MKBDR was written by Pierre-Edouard Guerin, Laetitia Mathon and Virginie Marques.
We thank the following people for their help in the development of this software: Virginie Marques, Alice Valentini, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.
Questions, comments, etc.? Contact us.
For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).