# snakemake_rapidrun_swarm

OTU clustering with SWARM on RAPIDRUN data, wrapped in a Snakemake workflow.


# Prerequisites


* python3

* snakemake

* singularity

python3 dependencies:

```
pip3 install pandas
pip3 install biopython
```

python3 dependencies to run `snakemake`:

```
pip3 install datrie
pip3 install ConfigArgParse
pip3 install appdirs
pip3 install gitdb2
```
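
A quick way to confirm that the three tools listed above are on the `PATH` before launching anything is sketched below. The `missing_tools` helper is hypothetical (it is not part of this repository), and it only tests for presence, not for versions or installed Python packages.

```shell
# Hypothetical helper, not part of this repository:
# print the name of each given tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report anything missing among the prerequisites listed above.
missing=$(missing_tools python3 snakemake singularity)
if [ -n "$missing" ]; then
    printf 'missing prerequisites:\n%s\n' "$missing" >&2
fi
```

An empty result means all three commands were found.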

# Configuration / input files

Two files must be set up before running the workflow:

* [01_infos/all_samples.tsv](01_infos/all_samples.tsv)
* [01_infos/config_test.yaml](01_infos/config_test.yaml)
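
Before launching, it can help to verify that both files exist and are not empty. The `check_inputs` function below is a hypothetical helper, not something shipped with the workflow; it checks presence only and does not validate the contents of either file.

```shell
# Hypothetical helper: return non-zero if any given file is missing or empty.
check_inputs() {
    rc=0
    for f in "$@"; do
        # -s is true only for an existing file with a size greater than zero
        if [ ! -s "$f" ]; then
            printf 'missing or empty: %s\n' "$f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

It could be called as `check_inputs 01_infos/all_samples.tsv 01_infos/config_test.yaml` from the repository root, ahead of the run command.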


# Run the workflow

```
CORES=32
CONFIGFILE="01_infos/config_test.yaml"
bash main.sh $CORES $CONFIGFILE
```


# Run from scratch


## clone the repository

```
git clone git@gitlab.mbb.univ-montp2.fr:edna/snakemake_rapidrun_swarm.git
cd snakemake_rapidrun_swarm
```

## write demultiplexing table

From
- `01_infos/config_test.yaml` 
	* `fichiers:` `rapidrun`
	* `fichiers:` `dat`
	* `blacklist:`

* :warning: the "marker" column of the `fichiers:` `rapidrun` file must use the same names as the marker keys of `fichiers:` `dat` in `01_infos/config_test.yaml`

* This step generates the file `01_infos/all_demultiplex.csv`, in which each line holds the arguments of one command to run in parallel.

* `blacklist:` `projet:` lists the projects you do not want to process


* `blacklist:` `run:` lists the runs you do not want to process



```
snakemake --configfile 01_infos/config_test.yaml -s readwrite_rapidrun_demultiplexing.py
```
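
The effect of the blacklist can be pictured as dropping rows from a table before any command is generated. The sketch below illustrates that idea only; it is not the code in `readwrite_rapidrun_demultiplexing.py`. The `drop_blacklisted` function, the tab-separated layout, and the `run` and `projet` column names are assumptions made for the example.

```shell
# Illustrative sketch, not the workflow's own code.
# Drop rows of a tab-separated table (header on the first line) whose value
# in the named column appears in a comma-separated blacklist.
# usage: drop_blacklisted COLUMN "value1,value2" < table.tsv
drop_blacklisted() {
    awk -F '\t' -v col="$1" -v list="$2" '
        BEGIN {
            # Turn the comma-separated blacklist into a lookup table.
            n = split(list, items, ",")
            for (i = 1; i <= n; i++) skip[items[i]] = 1
        }
        NR == 1 {
            # Locate the column by name, then keep the header line.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            print
            next
        }
        # Keep a row unless the column exists and its value is blacklisted.
        !(idx && ($idx in skip))
    '
}
```

If the named column is not found in the header, every row is kept.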


## merge fastq

From
- `01_infos/config_test.yaml` 
	* `fichiers:` `rapidrun`
	* `blacklist:`

* Deduces which paired-end fastq files to merge for each `{run}`

* `blacklist:` `projet:` lists the projects you do not want to process


* `blacklist:` `run:` lists the runs you do not want to process



```
cd 02_assembly
snakemake --configfile $CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd ..
```

## demultiplexing

From
- `01_infos/config_test.yaml` 
	* `fichiers:` `rapidrun`

* Generates a demultiplexed .fasta file for each `{projet}`/`{marker}`/`{sample}` in `03_demultiplexing`


```
cd 03_demultiplexing
snakemake --configfile $CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd ..
```

## cat qualities

From
- `01_infos/config_test.yaml` 
	* `fichiers:` `rapidrun`

* Concatenates and formats the .qual files for each `{projet}`/`{marker}`

```
cd 04_cat_quality
snakemake --configfile $CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd ..
```

## clustering

...

```
cd 05_clustering
snakemake --configfile $CONFIGFILE -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 20
cd ..
```
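
The four per-directory commands above share the same options, so they can be wrapped in a small loop. This is a sketch under stated assumptions, not the contents of `main.sh`: `run_step`, `run_all_steps`, and the `SNAKEMAKE` override variable are hypothetical names introduced here. The config file path is passed through unchanged, so, as in the commands above, it is resolved from inside each step directory.

```shell
# Hypothetical helpers, not the contents of main.sh.

# Run one step directory with the options used throughout this README.
# $1: step directory, $2: number of cores, $3: config file path.
# The body runs in a subshell, so the cd does not leak to the caller.
# Setting SNAKEMAKE (e.g. SNAKEMAKE=echo) swaps the command for a preview.
run_step() (
    cd "$1" || exit 1
    "${SNAKEMAKE:-snakemake}" --configfile "$3" -s Snakefile -j "$2" \
        --use-singularity \
        --singularity-args "--bind /media/superdisk:/media/superdisk" \
        --latency-wait 20
)

# Run the four numbered steps in order and stop at the first failure.
# $1: number of cores, $2: config file path.
run_all_steps() {
    for step_dir in 02_assembly 03_demultiplexing 04_cat_quality 05_clustering; do
        run_step "$step_dir" "$1" "$2" || return 1
    done
}
```

From the repository root this would be called as `run_all_steps "$CORES" "$CONFIGFILE"`. With `SNAKEMAKE=echo` set beforehand, the same call only prints the arguments each step would receive.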