Commit 5485925f authored by peguerin

readme method supernova

parent 2596f2db
......@@ -51,22 +51,22 @@ The three draft genomes were sequenced within the NGS technology.
## Set the initial directory structure
- pe-dir
- pe_dir
- 180802_NB501473_A_L1-4_ANIZ-1_R1.RD30.NotEmpty.LinkerTrimmed-50bp-PR.fastq
- 180802_NB501473_A_L1-4_ANIZ-1_R2.RD30.NotEmpty.LinkerTrimmed-50bp-PR.fastq
- 180802_NB501473_A_L1-4_ANIZ-2_R1.RD30.NotEmpty.LinkerTrimmed-50bp-PR.fastq
- 180802_NB501473_A_L1-4_ANIZ-2_R2.RD30.NotEmpty.LinkerTrimmed-50bp-PR.fastq
- me-dir
- me_dir
- 180806_NB501850_A_L1-4_ANIZ-3_R1.fastq
- 180806_NB501850_A_L1-4_ANIZ-3_R2.fastq
- 180807_NB501850_A_L1-4_ANIZ-4_R1.fastq
- 180807_NB501850_A_L1-4_ANIZ-4_R2.fastq
- x-dir
- x_dir
- Lib_10_S1_L002_I1_001.fastq
- Lib_10_S1_L002_R1_001.fastq
- Lib_10_S1_L002_R2_001.fastq
with `pe-dir` as a folder of paired-end sequencing results, `me-dir` as mate-pair and `x-dir` as linked-reads.
with `pe_dir` as a folder of paired-end sequencing results, `me_dir` as mate-pair and `x_dir` as linked-reads.
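As a minimal sketch, the layout above could be created and checked as follows. It assumes the FASTQ files are copied into the folders separately; the `check_layout` helper is hypothetical and not part of this repository.

```shell
# Create the three input folders named in this README.
mkdir -p pe_dir me_dir x_dir

# Hypothetical helper: print how many .fastq files each folder contains.
check_layout() {
    for d in pe_dir me_dir x_dir; do
        count=0
        for f in "$d"/*.fastq; do
            # The glob stays literal when nothing matches, so test existence.
            if [ -e "$f" ]; then
                count=$((count + 1))
            fi
        done
        echo "$d: $count fastq file(s)"
    done
}

check_layout
```

An empty folder is reported with a count of 0, which makes a missing library easy to spot before launching an assembly.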
......@@ -93,10 +93,38 @@ fff
## Supernova
Supernova should be run using 38-56x coverage of the genome.
- Somewhat higher coverage is sometimes advantageous.
- Supernova will exit if it finds that coverage is far from the recommended range.
- Note that at most 2.14 billion reads are allowed.
- Note that Supernova has not been extensively tested on genomes larger than human; any genome above approximately 4 Gb should be considered experimental and is not supported.
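The coverage guidance above can be turned into a read count with simple arithmetic: reads = genome size × coverage / read length. The sketch below is illustrative only. The `max_reads` function is hypothetical, and the genome size (800 Mb) and read length (150 bp) are assumed values that are not stated in this README.

```shell
# Hypothetical helper: number of reads needed for a target coverage.
# Usage: max_reads GENOME_SIZE_BP COVERAGE READ_LENGTH_BP
max_reads() {
    genome_size=$1
    coverage=$2
    read_length=$3
    # Integer division rounds down, so coverage never exceeds the target.
    echo $(( genome_size * coverage / read_length ))
}

# Assumed example values: 800 Mb genome, 56x coverage, 150 bp reads.
max_reads 800000000 56 150
```

With these assumed inputs the function prints 298666666, a value that could then be passed to `--maxreads`.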
### De novo assembly
Generate a whole-genome _de novo_ assembly for serran:
```
supernova run --id=serran --fastqs=x_dir/ --localmem=470 --maxreads=298666666
```
### Generating phased genome sequences
Once serran's assembly has completed, we generate FASTA files representing the assembly:
```
supernova mkoutput --style=pseudohap2 --asmdir=serran/outs/assembly --outprefix=serranus
```
The `pseudohap2` style, identified in FASTA records as style=4, generates a single record per scaffold, except that for each scaffold two "parallel" pseudohaplotypes are created and placed in separate FASTA files. Records in these files are parallel to each other. Megabubble arms are chosen arbitrarily, so many records will mix maternal and paternal alleles.
## Arcs
ARCS scaffolds genome sequence assemblies using 10x Genomics Chromium data. In other words, we use linked-read information to improve a genome assembly built from paired-end/mate-pair libraries.
See [arcs_pipeline.sh](arcs/pipeline.sh) for details.
## Measuring the assembly
......