# snakemake_rapidrun_swarm

OTU clustering of RAPIDRUN data with SWARM, wrapped in a Snakemake workflow.


# Prerequisites


* python3

* snakemake

* singularity

python3 dependencies:

```
pip3 install pandas
pip3 install biopython
```

python3 dependencies to run `snakemake`:

```
pip3 install datrie
pip3 install ConfigArgParse
pip3 install appdirs
pip3 install gitdb2
```
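
Before installing anything, it can help to see which of the tools listed above are already available. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only tests that each executable is on `PATH`, not that its version is suitable.

```shell
# Report which of the given executables are missing from PATH.
# Prints one "missing: NAME" line per absent tool and returns
# non-zero if at least one is absent.
check_prerequisites() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in the Prerequisites section, plus pip3 for the installs.
check_prerequisites python3 pip3 snakemake singularity \
    || echo "install the missing tools before running the workflow"
```

The python3 modules themselves are not covered by this check; they are installed with the `pip3` commands above.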

# Configuration / input files

Two files must be set up before running the workflow:

* [01_infos/all_samples.tsv](01_infos/all_samples.tsv)
* [01_infos/config_test.yaml](01_infos/config_test.yaml)


# Run the workflow

```
CORES=32
CONFIGFILE="01_infos/config_test.yaml"
bash main.sh $CORES $CONFIGFILE
```
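
Since a full run can take a long time, it may be worth checking the two arguments before calling `main.sh`. This is a sketch under assumptions: `check_run_args` is a hypothetical helper, not something shipped with the repository, and it only verifies that the number of cores is a positive integer and that the configuration file exists.

```shell
# Check the two arguments passed to main.sh before starting a long run.
# Returns 0 when both look valid, 1 otherwise (messages go to stderr).
check_run_args() {
    cores=$1
    configfile=$2
    case "$cores" in
        ''|*[!0-9]*)
            echo "CORES must be a positive integer, got '$cores'" >&2
            return 1 ;;
    esac
    if [ "$cores" -lt 1 ]; then
        echo "CORES must be at least 1" >&2
        return 1
    fi
    if [ ! -f "$configfile" ]; then
        echo "config file not found: $configfile" >&2
        return 1
    fi
}

# Usage, with the variables defined above:
# check_run_args "$CORES" "$CONFIGFILE" && bash main.sh "$CORES" "$CONFIGFILE"
```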


# Run from scratch


## clone the repository

```
git clone git@gitlab.mbb.univ-montp2.fr:edna/snakemake_rapidrun_swarm.git
cd snakemake_rapidrun_swarm
```

## write demultiplexing table

From
- `01_infos/config_test.yaml` 
	* `fichiers:` `rapidrun`
	* `fichiers:` `dat`

* :warning: the `marker` column of the `fichiers:` `rapidrun` file must use the same names as the marker keys of `fichiers:` `dat` in `01_infos/config_test.yaml`

* This generates the file `01_infos/all_demultiplex.csv`, in which each line holds the arguments of one command to run in parallel.

```
snakemake --configfile 01_infos/config_test.yaml -s readwrite_rapidrun_demultiplexing.py
```
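
The warning above about marker names can be checked before running this step. The sketch below is hypothetical and deliberately narrow: it assumes the marker names have already been extracted into two plain-text files, one name per line (one from the rapidrun table, one from the `dat` keys). How to extract them depends on file layouts that are not described here.

```shell
# Print every marker that appears in the first list but not in the second.
# Both arguments are plain-text files with one marker name per line
# (hypothetical inputs, see the assumptions stated above).
markers_without_dat() {
    # -v: keep non-matching lines, -x: match whole lines only,
    # -F: treat patterns as fixed strings, -f: read patterns from a file.
    # grep exits with 1 when nothing is printed, hence the "|| true".
    grep -vxF -f "$2" "$1" || true
}

# Usage (file names are placeholders):
# markers_without_dat rapidrun_markers.txt dat_markers.txt
```

An empty output means every marker used in the rapidrun table has a matching `dat` entry.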


## merge fastq

From
- `01_infos/config_test.yaml` 
	* `fichiers:` `rapidrun`

* Deduces, for each `{run}`, the paired-end fastq files to merge

* :warning: guadeloupe has been removed by hand in [02_assembly/Snakefile](02_assembly/Snakefile)!


```
cd 02_assembly
snakemake --configfile $CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd ..
```

## demultiplexing

From
- `01_infos/config_test.yaml` 
	* `fichiers:` `rapidrun`

* Generates a demultiplexed .fasta file for each `{projet}`/`{marker}`/`{sample}` in `03_demultiplexing`


```
cd 03_demultiplexing
snakemake --configfile $CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd ..
```

## cat qualities

From
- `01_infos/config_test.yaml` 
	* `fichiers:` `rapidrun`

* Concatenates and formats the .qual files by `{projet}`/`{marker}`

```
cd 04_cat_quality
snakemake --configfile $CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd ..
```

## clustering

...

```
cd 05_clustering
snakemake --configfile $CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd ..

```
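
The four step directories above are all run with the same `snakemake` command, so they can be chained in a loop. The following is a sketch under assumptions, not the content of `main.sh`: the functions `run_step` and `run_all_steps` and the `SNAKEMAKE` override variable are hypothetical names introduced for illustration. The configuration path is made absolute so that it still resolves after `cd` into a step directory.

```shell
# Run one workflow step: enter its directory and call snakemake with the
# same options as the commands above. SNAKEMAKE may be overridden, e.g.
# SNAKEMAKE=echo to print the command instead of running it (this
# variable is an assumption of the sketch).
run_step() {
    step_dir=$1
    cores=$2
    configfile=$3
    (
        cd "$step_dir" || exit 1
        "${SNAKEMAKE:-snakemake}" --configfile "$configfile" -s Snakefile \
            -j "$cores" --use-singularity \
            --singularity-args "--bind /media/superdisk:/media/superdisk" \
            --latency-wait 20
    )
}

# Run every step in order, stopping at the first failure.
run_all_steps() {
    cores=$1
    # Make the config path absolute so it still resolves after `cd`.
    case "$2" in
        /*) configfile=$2 ;;
        *)  configfile=$PWD/$2 ;;
    esac
    for step_dir in 02_assembly 03_demultiplexing 04_cat_quality 05_clustering; do
        echo "== $step_dir =="
        run_step "$step_dir" "$cores" "$configfile" || return 1
    done
}

# Usage, from the repository root:
# run_all_steps 32 01_infos/config_test.yaml
```

Stopping at the first failing step avoids running later steps on incomplete outputs from an earlier one.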