# Changelog
All notable changes to this project will be documented in this file.

## [unreleased]
### Added
- custom reference database
- rules for taxonomic assignment with a custom reference database
- `scripts/prepare_spygen_data.py`: from SPYGEN Alice's file ({marker}, {run} and {projet}), the program looks up the tag column in the corresponding {marker} `.dat` file to get the plate position, so that the right {sample} is assigned at the demultiplexing step

### Changed
- configfile handles custom reference database

### Removed

- unnecessary sequence quality steps



## [1.1.4]
### Added
- fastqc rule
- fastqc conda env
- documentation quickstart step
- resources test folder

### Changed
- name of results subfolders
- add fastqc item to configfiles

### Removed
- deprecated config files

## [1.1.3]
### Added
- "spygen origine" format (one single `.dat` file for one single `.fastq` file) can be processed
- UNIT TEST `01_settings/readwrite_rapidrun_demultiplexing.py`: `barcode_tags_duplicated.csv` has exactly 2 columns (original, duplicated)
- TEST dataset to check in a few seconds whether the whole workflow works on the RAPIDRUN case
- TEST dataset to check in a few seconds whether the whole workflow works on the CLASSIC case
- copy and rename files generated from 19_otu_table and 24_table_assigned_sequences to results folder
- resources job: the number of parallelized jobs is limited by the resource "job"
- `scripts/prepare_spygen_data.py` generates the all_samples.csv file from SPYGEN "standard" information file and {marker}.dat files
- TEST integrity of `.dat` files

### Changed
- all workflows are merged into a single one
- results subfolders are generated automatically
- unique `scripts`, `rules` and `envs` folders
- Demultiplex: use the cutadapt no_indel option to prevent insertions between tags and primers
- Alternative workflow to perform taxonomic assignment without `ecotag`
- Fix wildcards in CLASSIC mode
- convert `scripts/scripts/OTU_contingency_table.py` code from Python 2 to Python 3
- update cutadapt from version 2 to version 3.2
- Fix pandas `SettingWithCopyWarning` on dataframes
- factorize the generation of results folders
- Fix data export

### Removed
- old folders and scripts 01_settings, 02_assembly, 03_demultiplex, etc...
- old preexisting subfolders in the results folder
- `clean.sh` script which is now useless
- `rename.sh` script which is replaced by a rule in Snakefile

## [1.1.2] - 21st sep 2020
### Added
- can handle 2 different types of input format CLASSIC and RAPIDRUN
- tutorial to process data in CLASSIC format based on a subset of Rhone project data


## [1.1.1] - 8th sep 2020
### Added
- data test
- tutorial

### Changed
- The demultiplexing method is more efficient. First, it demultiplexes linked barcode tags: for each run, each linked barcode is searched for only once. Second, it trims the 5' and 3' primers from each sample's fastq file. Primers and barcode tags are processed in both forward and reverse-complement orientation.
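
The two-pass idea described in this entry can be illustrated with a minimal Python sketch. It is not the pipeline's code (the workflow relies on cutadapt for this step). The function names, the exact-match rule and the example sequences are assumptions made for illustration only.

```python
from typing import Optional

# Translation table used to complement DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sample(read: str, linked_tags: dict) -> Optional[str]:
    """Pass 1: return the sample whose linked (start, end) tag pair frames the read.

    The read is checked as given and as its reverse complement.
    `linked_tags` maps hypothetical (start_tag, end_tag) pairs to sample names.
    """
    for oriented in (read, reverse_complement(read)):
        for (start_tag, end_tag), sample in linked_tags.items():
            if oriented.startswith(start_tag) and oriented.endswith(end_tag):
                return sample
    return None


def trim_primers(read: str, primer5: str, primer3: str) -> Optional[str]:
    """Pass 2: strip an exact 5' primer and 3' primer, or return None if absent."""
    long_enough = len(read) >= len(primer5) + len(primer3)
    if long_enough and read.startswith(primer5) and read.endswith(primer3):
        return read[len(primer5):len(read) - len(primer3)]
    return None


# Hypothetical tags and read, for illustration only.
tags = {("ACGT", "TTGA"): "sample_1"}
read = "ACGT" + "GGGCCC" + "TTGA"
sample_forward = find_sample(read, tags)
sample_reverse = find_sample(reverse_complement(read), tags)
```

Real reads contain sequencing errors, so an actual demultiplexer tolerates mismatches rather than requiring exact prefixes and suffixes as this sketch does.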


## [1.1.0] - 1st sep 2020
### Added
- Conda v4.8.2 envs
- Authorship
- license MIT

### Changed
- new structure for distribution based on snakemake recommendations
- all_samples.tsv rapidrun input is now in a `csv` format with `;` as separator
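
As a rough illustration of the `;`-separated format, such a file can be read with Python's standard `csv` module. The column names and values below are placeholders, not the real header of the rapidrun sample sheet.

```python
import csv
import io


def read_samples(handle):
    """Parse a `;`-separated sample sheet into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter=";"))


# Hypothetical content: the column names are placeholders for illustration.
example = io.StringIO("run;sample;marker\nrunA;sample_1;markerX\n")
rows = read_samples(example)
```

With a file on disk, the same function would take an open file handle, for example `open(path, newline="")`, in place of the in-memory buffer.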


## [1.0.1] - 31st aug 2020
### Added
- blacklist runs or projects from 'rapidrun' .tsv file
- convert otu.table into otu.fasta at step 05_clustering
- step 06_assignment taxonomic assignment

### Changed
- configfile has blacklist keys
- configfile has reference database information
- final output is `.csv` instead of `.table`
- script `rename.sh` to automatically rename output files with a prefix

### Removed
- old rapidrun tables `.rrr`

## [1.0.0] - 6th feb 2020
### Added
- Complete workflow 04_cat_quality dedicated to generating .qual files

### Changed
- main.sh argument CONFIGFILE is constant

### Removed
- .qual files are no longer generated at the demultiplexing step