The three draft genomes were sequenced using NGS (next-generation sequencing) technology.
## Set the initial directory structure
- pe_dir
    - 180802_NB501473_A_L1-4_ANIZ-1_R1.RD30.NotEmpty.LinkerTrimmed-50bp-PR.fastq
    - 180802_NB501473_A_L1-4_ANIZ-1_R2.RD30.NotEmpty.LinkerTrimmed-50bp-PR.fastq
    - 180802_NB501473_A_L1-4_ANIZ-2_R1.RD30.NotEmpty.LinkerTrimmed-50bp-PR.fastq
    - 180802_NB501473_A_L1-4_ANIZ-2_R2.RD30.NotEmpty.LinkerTrimmed-50bp-PR.fastq
    - lib350bp_R1.fastq
    - lib350bp_R2.fastq
    - lib550bp_R1.fastq
    - lib550bp_R2.fastq
- me_dir
    - 180806_NB501850_A_L1-4_ANIZ-3_R1.fastq
    - 180806_NB501850_A_L1-4_ANIZ-3_R2.fastq
    - 180807_NB501850_A_L1-4_ANIZ-4_R1.fastq
    - 180807_NB501850_A_L1-4_ANIZ-4_R2.fastq
    - lib3kbp_R1.fastq
    - lib3kbp_R2.fastq
    - lib5kbp_R1.fastq
    - lib5kbp_R2.fastq
- x_dir
    - Lib_10_S1_L002_I1_001.fastq
    - Lib_10_S1_L002_R1_001.fastq
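In `pe_dir` and `me_dir`, every R1 file is expected to have an R2 mate. The following minimal Python sketch reports missing mates before the assemblers are launched. It assumes that mates differ only in an `_R1`/`_R2` token followed by `.` or `_`, which holds for the names listed above; the function names `missing_mates` and `missing_mates_in` are hypothetical helpers, not part of any tool used here.

```python
import re
from pathlib import Path

# "_R1"/"_R2" read-direction token, followed by "." or "_" as in the names above
MATE_TOKEN = re.compile(r"_R([12])(?=[._])")

def missing_mates(filenames):
    """Return the sorted names of R1/R2 mates absent from `filenames`."""
    seen = {}  # pairing key (token replaced by "_R#") -> mate numbers present
    for name in filenames:
        match = MATE_TOKEN.search(name)
        if match is None:
            continue  # not an R1/R2 file, e.g. an I1 index read
        key = name[:match.start()] + "_R#" + name[match.end():]
        seen.setdefault(key, set()).add(match.group(1))
    missing = []
    for key, mates in seen.items():
        for mate in ("1", "2"):
            if mate not in mates:
                missing.append(key.replace("_R#", "_R" + mate, 1))
    return sorted(missing)

def missing_mates_in(directory):
    """Apply missing_mates to the *.fastq files directly inside `directory`."""
    return missing_mates(p.name for p in Path(directory).glob("*.fastq"))
```

For example, `missing_mates_in("pe_dir")` run from the initial directory should return an empty list when the layout is complete.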
Platanus is a novel de novo sequence assembler that can reconstruct genomic sequences …
### 1. Contig assembly
```bash
platanus assemble -tmp temp/ -m 256 -t 64 -o serran_assemble -f pe_dir/*.fastq 2> assemble.log
```
### 2. Scaffolding
```bash
platanus scaffold -t 64 -tmp temp/ -c serran_assemble_contig.fa -b serran_assemble_contigBubble.fa -IP1 pe_dir/lib350bp_R*.fastq -IP2 pe_dir/lib550bp_R*.fastq -OP3 me_dir/lib3kbp_R*.fastq -OP4 me_dir/lib5kbp_R*.fastq 2> scaffold.log
```
### 3. Gap closing
```bash
platanus gap_close -t 64 -tmp temp/ -o serran_hpc_gapclose -c out_scaffold.fa -IP1 pe_dir/lib350bp_R*.fastq -IP2 pe_dir/lib550bp_R*.fastq -OP3 me_dir/lib3kbp_R*.fastq -OP4 me_dir/lib5kbp_R*.fastq 2> gapclose.log
```
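The three Platanus steps depend on each other: scaffolding reads the contig files from step 1, and gap closing reads `out_scaffold.fa` from step 2. The following Python sketch chains the steps and stops at the first failure. It reuses the flags and file names of the commands in this section, takes the 5 kbp mate-pair files from `me_dir` (where the directory layout places them), and sorts the glob results so that R1 precedes R2, as the shell would. The names `shell_glob`, `platanus_commands` and `run_pipeline` are hypothetical; this is a convenience wrapper, not part of Platanus.

```python
import glob
import subprocess

def shell_glob(pattern):
    """Expand a glob in sorted order, as the shell does, and fail if nothing matches."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return matches

def platanus_commands(expand=shell_glob, threads=64, mem_gb=256):
    """Return (argv, log file) pairs for the assemble, scaffold and gap_close steps."""
    t = str(threads)
    # Paired-end libraries as inward pairs (-IP), mate-pair libraries as outward pairs (-OP)
    libs = ["-IP1", *expand("pe_dir/lib350bp_R*.fastq"),
            "-IP2", *expand("pe_dir/lib550bp_R*.fastq"),
            "-OP3", *expand("me_dir/lib3kbp_R*.fastq"),
            "-OP4", *expand("me_dir/lib5kbp_R*.fastq")]
    return [
        (["platanus", "assemble", "-tmp", "temp/", "-m", str(mem_gb), "-t", t,
          "-o", "serran_assemble", "-f", *expand("pe_dir/*.fastq")], "assemble.log"),
        (["platanus", "scaffold", "-t", t, "-tmp", "temp/",
          "-c", "serran_assemble_contig.fa", "-b", "serran_assemble_contigBubble.fa",
          *libs], "scaffold.log"),
        (["platanus", "gap_close", "-t", t, "-tmp", "temp/", "-o", "serran_hpc_gapclose",
          "-c", "out_scaffold.fa", *libs], "gapclose.log"),
    ]

def run_pipeline(**kwargs):
    """Run the steps in order; check=True stops at the first non-zero exit status."""
    for argv, log in platanus_commands(**kwargs):
        with open(log, "w") as err:  # same effect as `2> log` in the shell
            subprocess.run(argv, stderr=err, check=True)
```

Call `run_pipeline()` from the initial directory. All globs are expanded before the first step starts, so a missing library file is reported immediately instead of after a long contig assembly.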
## Supernova
Supernova should be run using 38-56x coverage of the genome.
- Note that at most 2.14 billion reads are allowed.
- Please note that we have not extensively tested genomes larger than human, and any genome above approximately 4 Gb (gigabases) should be considered experimental and is not supported.
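Before launching Supernova, the planned read count can be checked against these limits (38-56x coverage, at most 2.14 billion reads). The following is a minimal Python sketch. It assumes 150 bp reads and defines coverage as reads × read length / genome size; both are assumptions to verify against the 10x documentation for your chemistry. The function names are hypothetical.

```python
# Limits quoted in the Supernova notes above (2.14 billion is the rounded value given there)
MIN_COVERAGE, MAX_COVERAGE = 38, 56
MAX_READS = 2_140_000_000

def coverage(n_reads, genome_size, read_len=150):
    """Raw coverage: total sequenced bases divided by genome size (150 bp reads assumed)."""
    return n_reads * read_len / genome_size

def check_supernova_input(n_reads, genome_size, read_len=150):
    """Return a list of warnings; the list is empty when the input fits the limits."""
    warnings = []
    if n_reads > MAX_READS:
        warnings.append(f"{n_reads} reads exceeds the 2.14 billion read limit")
    cov = coverage(n_reads, genome_size, read_len)
    if not MIN_COVERAGE <= cov <= MAX_COVERAGE:
        warnings.append(f"coverage {cov:.1f}x is outside {MIN_COVERAGE}-{MAX_COVERAGE}x")
    return warnings
```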
### 1. De novo assembly
Generate a whole-genome _de novo_ assembly for serran:
```bash
supernova run --id=serran --fastqs=x_dir/ --localmem=470 --maxreads=298666666
```
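The `--maxreads=298666666` value is consistent with capping the input at the upper end of the recommended coverage (56x). Here $G$ is the genome size and $L$ the read length. Assuming 150 bp reads and a genome size of about 800 Mb, the arithmetic below reproduces the value exactly. The 800 Mb figure is an inference from the number itself and is not stated in this README.

```math
\text{maxreads} = \left\lfloor \frac{56 \times G}{L} \right\rfloor
= \left\lfloor \frac{56 \times 8.0 \times 10^{8}}{150} \right\rfloor
= \left\lfloor 298\,666\,666.\overline{6} \right\rfloor
= 298\,666\,666
```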
### 2. Generating phased genome sequences
Once serran's assembly has completed, we generate a FASTA file representing the assembly:
```bash
supernova mkoutput --style=pseudohap2 --asmdir=serran/outs/assembly --outprefix=
```
The `pseudohap2` style, identified in FASTA records as style=4, generates a single record per scaffold, except that for each scaffold two 'parallel' pseudohaplotypes are created and placed in separate FASTA files. Records in these two files are parallel to each other. Megabubble arms are chosen arbitrarily, so many records will mix maternal and paternal alleles.
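Because the two pseudohaplotype files are parallel, record *i* of one file corresponds to record *i* of the other. The following Python sketch pairs the records by position and reports each header with its sequence length. It fails if the record counts differ. The function names are hypothetical, and the sketch reads plain or gzip-compressed FASTA.

```python
import gzip
from itertools import zip_longest

def parse_fasta(lines):
    """Yield (header, sequence length) for each FASTA record in an iterable of lines."""
    header, length = None, 0
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)
    if header is not None:
        yield header, length

def fasta_records(path):
    """Read a plain or gzip-compressed FASTA file lazily."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fasta(handle)

def parallel_records(records1, records2):
    """Pair record i of pseudohaplotype 1 with record i of pseudohaplotype 2."""
    pairs = []
    for rec1, rec2 in zip_longest(records1, records2):
        if rec1 is None or rec2 is None:
            raise ValueError("pseudohaplotype files have different record counts")
        pairs.append((rec1, rec2))
    return pairs
```

Usage, with placeholder file names for the two output files: `parallel_records(fasta_records("hap1.fasta.gz"), fasta_records("hap2.fasta.gz"))`. Comparing the paired lengths gives a quick view of how much the two pseudohaplotypes differ per scaffold.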
## Arcs
Scaffolding genome sequence assemblies using 10x Genomics Chromium data. In other words, we use linked-read information to improve the genome assembly built from the paired-end/mate-pair libraries.
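The linked-read idea is that reads sharing a 10x barcode come from the same long DNA molecule. Contigs hit by many of the same barcodes are therefore likely to be neighbours in the genome. The following toy Python sketch shows only that counting step. It is a simplification: Arcs also takes contig ends and orientation into account and hands its graph to a separate scaffolder. The `contig_links` function and the `min_shared` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def contig_links(barcode_hits, min_shared=2):
    """Count barcodes shared by each contig pair and keep pairs with >= min_shared.

    barcode_hits maps a 10x barcode to the set of contigs its reads aligned to.
    """
    shared = Counter()
    for contigs in barcode_hits.values():
        # sorted() gives each unordered contig pair one canonical key
        for a, b in combinations(sorted(contigs), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}
```

For instance, two barcodes that both hit `ctg1` and `ctg2` yield the link `("ctg1", "ctg2")` with support 2. A pair seen under only one barcode is dropped at the default threshold.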