Commit 59b98475 authored by peguerin

readme update

parent 44a3e14e
......@@ -32,78 +32,84 @@ pip3 install gitdb2
You have to set 2 files:
* [01_infos/all_samples.tsv](01_infos/all_samples.tsv)
* [01_infos/config.yaml](01_infos/config_test.yaml)
* `RAPIDRUN_METADATA` *e.g.* [all_samples.tsv](resources/test/all_samples.tsv)
* `CONFIG_FILE` *e.g.* [config/config.yaml]
# The Workflow
## 1. Set rapidrun
Indicate the path of the [01_infos/all_samples.tsv](01_infos/all_samples.tsv) file in the `fichiers` `rapidrun` field of [01_infos/config.yaml](01_infos/config_test.yaml)
Indicate the absolute path of the rapidrun metadata table [all_samples.tsv](resources/test/all_samples.tsv) in the `fichiers` `rapidrun` field of [config/config.yaml](config/)
```
CONFIGFILE=01_infos/config_test.yaml
snakemake --configfile $CONFIGFILE -s readwrite_rapidrun_demultiplexing.py
cd 01_settings
snakemake --configfile "../"$CONFIGFILE -s readwrite_rapidrun_demultiplexing.py --cores $CORES
cd ..
```
This command reads [01_infos/all_samples.tsv](01_infos/all_samples.tsv) and writes the file [01_infos/all_demultiplex.csv](01_infos/all_demultiplex.csv), which is used to process rapidrun data in the next steps of the workflow.
This command reads `CONFIGFILE` and `RAPIDRUN_METADATA`, then writes the file [results/01_settings/all_demultiplex.csv](results/01_settings/), which is used to process rapidrun data in the next steps of the workflow.
[01_infos/all_demultiplex.csv](01_infos/all_demultiplex.csv) has 14 fields: *demultiplex,projet,marker,run,plaque,sample,barcode5,barcode3,primer5,primer3,min_f,min_r,lenBarcode5,lenBarcode3*. Each row is a unique sample that belongs to a unique marker and a project.
[results/01_settings/all_demultiplex.csv](results/01_settings/) has 14 fields: *demultiplex,projet,marker,run,plaque,sample,barcode5,barcode3,primer5,primer3,min_f,min_r,lenBarcode5,lenBarcode3*. Each row is a unique sample that belongs to a unique marker and a project.
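As a quick sanity check before moving on, the field list above can be compared with the first line of the generated table. This is a minimal sketch, not part of the workflow: `check_demultiplex_header` is a hypothetical helper, and it assumes the file is comma-separated with the field names on its first line.

```shell
#!/usr/bin/env bash
# The 14 field names listed in the README, in order.
EXPECTED_HEADER="demultiplex,projet,marker,run,plaque,sample,barcode5,barcode3,primer5,primer3,min_f,min_r,lenBarcode5,lenBarcode3"

# Hypothetical helper: succeed only if the first line of the CSV matches
# the expected header (assumption: the table starts with a header row).
check_demultiplex_header() {
    local csv_file="$1"
    local header
    # Read the first line and drop a trailing carriage return, if any
    header="$(head -n 1 "$csv_file" | tr -d '\r')"
    if [ "$header" = "$EXPECTED_HEADER" ]; then
        echo "OK: header matches the 14 expected fields"
    else
        echo "ERROR: unexpected header in $csv_file" >&2
        return 1
    fi
}
```

Usage, from the repository root: `check_demultiplex_header results/01_settings/all_demultiplex.csv`.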
## 2 Assembly
```
CORES=16
CONFIGFILE=01_infos/config_test.yaml
cd 02_assembly
snakemake --configfile "../"$CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
snakemake --configfile "../"$CONFIGFILE -s Snakefile --cores $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk --home $HOME" --latency-wait 20
cd ..
```
Merge paired-end sequences and remove unaligned sequence records.
Merge paired-end sequences and remove unaligned sequence records. Results are stored in [results/02_assembly](results/02_assembly)
## 3 Demultiplexing
```
cd 03_demultiplexing
snakemake --configfile "../"$CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd 03_demultiplex
snakemake --configfile "../"$CONFIGFILE -s Snakefile --cores $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk --home $HOME" --latency-wait 20
cd ..
```
The tags correspond to short and specific sequences added on the 5’ end of each primer to distinguish the different samples
The tags correspond to short and specific sequences added on the 5’ end of each primer to distinguish the different samples.
Check the demultiplexing results in [results/03_demultiplex](results/03_demultiplex)
## 4 Filter samples
```
cd 04_filter_samples
snakemake --configfile "../"$CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
snakemake --configfile "../"$CONFIGFILE -s Snakefile --cores $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk --home $HOME" --latency-wait 20
cd ..
```
For each sample: dereplicate sequences, remove sequences with a given length, remove sequences with IUPAC ambiguity, remove PCR clones, and remove sequences which are classified as 'internal' by `obiclean`.
Check the results in [results/04_filter_samples](results/04_filter_samples)
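One way to spot-check the ambiguity filter is to count the records in a filtered file that still contain a base other than A, C, G or T. The sketch below is illustrative only: `count_ambiguous` is a hypothetical helper, and it assumes plain FASTA with Unix line endings (a stray carriage return would be counted as an ambiguous character).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of FASTA records whose sequence
# contains at least one character other than A, C, G or T (case-insensitive).
count_ambiguous() {
    awk '
        # A new header closes the previous record
        /^>/ { if (bad) n++; bad = 0; next }
        # Any sequence line with a non-ACGT character flags the record
        toupper($0) ~ /[^ACGT]/ { bad = 1 }
        # Close the last record and print the total (0 if none)
        END { if (bad) n++; print n + 0 }
    ' "$1"
}
```

If the filter behaved as described, running `count_ambiguous` on a `*.c.r.l.u.fasta` file would be expected to print `0`.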
## 5 Concatenate samples into runs
```
for projet in `ls 04_filter_samples/04_filtered/`;
for projet in `ls results/04_filter_samples/04_filtered/`;
do
for marker in `ls 04_filter_samples/04_filtered/${projet}/`;
for marker in `ls results/04_filter_samples/04_filtered/${projet}/`;
do
for run in `ls 04_filter_samples/04_filtered/${projet}/${marker}/`;
for run in `ls results/04_filter_samples/04_filtered/${projet}/${marker}/`;
do
echo 04_filter_samples/04_filtered/${projet}/${marker}/${run};
mkdir -p 05_assignment/01_runs/${projet}/${marker}/
cat 04_filter_samples/04_filtered/${projet}/${marker}/${run}/*.c.r.l.u.fasta > 05_assignment/01_runs/${projet}/${marker}/${run}.fasta
echo results/04_filter_samples/04_filtered/${projet}/${marker}/${run};
mkdir -p results/05_assignment/01_runs/${projet}/${marker}/
cat results/04_filter_samples/04_filtered/${projet}/${marker}/${run}/*.c.r.l.u.fasta > results/05_assignment/01_runs/${projet}/${marker}/${run}.fasta
done
done
done
```
Sample files are concatenated by run.
Sample files are concatenated by run into [results/05_assignment/01_runs](results/05_assignment/01_runs)
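The nested loop above can also be written with shell globs instead of parsing `ls` output, which tolerates unusual directory names and skips runs that contain no sample file. This is a sketch under the same directory layout (`projet/marker/run`); `concat_runs` is a hypothetical helper name, not something the repository provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate filtered sample FASTA files into one
# file per run, mirroring the projet/marker/run layout of the source tree.
concat_runs() {
    local src="$1"   # e.g. results/04_filter_samples/04_filtered
    local dst="$2"   # e.g. results/05_assignment/01_runs
    local run_dir rel projet marker run
    local files
    for run_dir in "$src"/*/*/*/; do
        # Skip when the glob matched nothing
        [ -d "$run_dir" ] || continue
        run_dir="${run_dir%/}"
        rel="${run_dir#"$src"/}"      # projet/marker/run
        projet="${rel%%/*}"
        run="${rel##*/}"
        marker="${rel#*/}"
        marker="${marker%/*}"
        # Collect the sample files; skip runs without any
        files=("$run_dir"/*.c.r.l.u.fasta)
        [ -e "${files[0]}" ] || continue
        echo "$run_dir"
        mkdir -p "$dst/$projet/$marker"
        cat "${files[@]}" > "$dst/$projet/$marker/$run.fasta"
    done
}
```

Usage, from the repository root: `concat_runs results/04_filter_samples/04_filtered results/05_assignment/01_runs`. Unlike the original loop, a run directory without any `*.c.r.l.u.fasta` file produces no output file rather than an empty one.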
## 6 Assignment
```
cd 05_assignment
snakemake --configfile "../"$CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
snakemake --configfile "../"$CONFIGFILE -s Snakefile --cores $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk --home $HOME" --latency-wait 20
cd ..
```
Assign each sequence to a taxon. Format the output as CSV tables in the folder [06_final_tables](06_final_tables)
\ No newline at end of file
Assign each sequence to a taxon. Format the output as CSV tables in the folder [results/06_final_tables](results/06_final_tables)
\ No newline at end of file