Skip to content
GitLab
Menu
Projects
Groups
Snippets
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
seaconnect
dataPrep
Commits
ede530d6
Commit
ede530d6
authored
Jul 15, 2019
by
eboulanger
Browse files
scripts and results update
parent
4f618e30
Changes
4
Hide whitespace changes
Inline
Side-by-side
01-SNPfilters/filtering_mul.sh
View file @
ede530d6
...
...
@@ -7,7 +7,7 @@ mkdir 02-Mullus
cd
02-Mullus
FILE
=
../../00-rawData/02-Mullus/mullus.vcf
#
PATH=$PATH:~/programmes/vcflib/bin/
PATH
=
$PATH
:~/programmes/vcflib/bin/
# step 1: filter genotypes called below 50% (across all individuals),
# filter SNPs that have a minor allele count less than 3
...
...
@@ -52,7 +52,7 @@ vcftools --vcf mullus.g5mac3dp3.recode.vcf --remove lowDP.indv --recode --recode
# step 4 : restrict data to variants called in a high percentage of individuals and filter by mean depth of genotypes
vcftools
--vcf
mullus.g5mac3dplm.recode.vcf
--max-missing
0.95
--maf
0.05
--recode
--recode-INFO-all
--out
DP3g95maf05
--min-meanDP
0
vcftools
--vcf
mullus.g5mac3dplm.recode.vcf
--max-missing
0.95
--maf
0.05
--recode
--recode-INFO-all
--out
DP3g95maf05
--min-meanDP
5
###
# FreeBayes - specific filters
...
...
@@ -63,7 +63,7 @@ awk '/#/' DP3g95maf05.recode.vcf
# step 5 : filter for allele balance
# this requires vcflib
vcffilter
-s
-f
"AB > 0.25 & AB < 0.75 | AB < 0.01"
DP3g95p5maf05.recode.vcf
>
DP3g95p5maf05.fil1.vcf
# population-level:
vcffilter -s -f "AB > 0.25 & AB < 0.75 | AB < 0.01" DP3g95p5maf05.recode.vcf > DP3g95p5maf05.fil1.vcf
vcffilter
-s
-f
"AB > 0.25 & AB < 0.75 | AB < 0.01"
DP3g95maf05.recode.vcf
>
DP3g95maf05.fil1.vcf
# check how many were removed
awk
'!/#/'
DP3g95maf05.recode.vcf |
wc
-l
...
...
@@ -73,8 +73,9 @@ awk '!/#/' DP3g95maf05.fil1.vcf | wc -l
vcffilter
-f
"SAF / SAR > 100 & SRF / SRR > 100 | SAR / SAF > 100 & SRR / SRF > 100"
-s
DP3g95maf05.fil1.vcf
>
DP3g95maf05.fil2.vcf
awk
'!/#/'
DP3g95maf05.fil2.vcf |
wc
-l
# SKIPPED STEP 6 FOR NOW
# step 7 : look at ration of mapping qualities between reference and alternate alleles
vcffilter
-f
"MQM / MQMR > 0.9 & MQM / MQMR < 1.05"
DP3g95maf05.fil
2
.vcf
>
DP3g95maf05.fil3.vcf
vcffilter
-f
"MQM / MQMR > 0.9 & MQM / MQMR < 1.05"
DP3g95maf05.fil
1
.vcf
>
DP3g95maf05.fil3.vcf
awk
'!/#/'
DP3g95maf05.fil3.vcf |
wc
-l
# step 8 : check whether there is discrepancy in the properly paired status (SNAP IK NIET MAAR MAAKT BLIJKBAAR NIET UIT WANT GEEN SNPs WEGGEFILTERD)
...
...
@@ -101,7 +102,7 @@ paste DP3g95maf05.fil5.vcf.loci.qual DP3g95maf05.fil5.DEPTH | awk -v x=$MEAN_DP_
## 6 : remove those sites and recalculate depth across loci
vcftools
--vcf
DP3g95maf05.fil5.vcf
--site-depth
--exclude-positions
DP3g95maf05.fil5.lowQDloci
--out
DP3g95maf05.fil5
## 7 : cut depth scores, calculate average depth and plot
NINDIV
=
297
NINDIV
=
424
cut
-f3
DP3g95maf05.fil5.ldepth
>
DP3g95maf05.fil5.site.depth
awk
'!/D/'
DP3g95maf05.fil5.site.depth |
awk
-v
x
=
$NINDIV
'{print $1/x}'
>
meandepthpersite
gnuplot
<<
\
EOF
...
...
@@ -139,10 +140,13 @@ TO DO
TO DO
# step 3 : outliers
# step
1
3 : outliers
TO DO
# step 14 : rename final filtered dataset
cp
DP3g95maf05.FIL.hwe.recode.vcf mullus_all_filtered.vcf
01-SNPfilters/filtering_results_dip.txt
View file @
ede530d6
SNP filtering results for Diplodus sargus
ddocent pipeline + additions
filtering step filter for individuals retained SNPs retained run time
step 0 ddocent output data 297 13362
step 1 call below 50%, mac < 3, min quality score = 30 297 13362 14.00
step 2 min mean depth genotype call = 3 297 13362 14.00
step 3 remove individuals > 50% missing data 297 13362 16.00
step 4 remove sites > 5% missing data, maf 0.05, min meanDP = 15 297 10342 11.00
step 5 filter for allele balance 297 10155
step 6 filter out sites with reads from both strands 297 9899
step 7 ration of mapping qualities reference vs alternate alleles 297 9399
step 8 paired status 297 9399
step 9 remove sites quality score < 1/4 depth 297 9398
step 10 depth x quality score cutoff 297 8077 11.00
step 11 HWE 0.000001 297 7761 7.00
step 12 He
step 13 Fis
\ No newline at end of file
SNP filtering results for Diplodus sargus
ddocent pipeline + additions
filtering step filter for individuals retained SNPs retained run time (sec) output
step 0 ddocent output data 297 13362
step 1 call below 50%, mac < 3, min quality score = 30 297 13362 14.00
step 2 min mean depth genotype call = 3 297 13362 14.00
step 3 remove individuals > 50% missing data 297 13362 16.00
step 4 remove sites > 5% missing data, maf 0.05, min meanDP = 15 297 10342 11.00
step 5 filter for allele balance 297 10155
step 6 filter out sites with reads from both strands 297 9899
step 7 ration of mapping qualities reference vs alternate alleles 297 9399
step 8 paired status 297 9399
step 9 remove sites quality score < 1/4 depth 297 9398
step 10 depth x quality score cutoff 297 8077 11.00
step 11 HWE 0.000001 297 7761 7.00
step 12 He
step 13 Fis
\ No newline at end of file
01-SNPfilters/filtering_results_mul.rmd
0 → 100644
View file @
ede530d6
SNP filtering results for Mullus surmuletus
ddocent pipeline + additions
| filtering step | filter for | individuals retained | SNPs retained | run time (sec) | output |
| -------------- | ------------------------------------------------- | :------------------: | :-----------: | :------------: | ------ |
| step 0 | ddocent output data | 431 | 49027 | | mullus.vcf
| step 1 | call below 50%, mac < 3, min quality score = 30 | 431 | 49027 | 71.00 | mullus.g5mac3.recode.vcf
| step 2 | min mean depth genotype call = 3 | 431 | 49027 | 74.00 | mullus.g5mac3dp3.recode.vcf
| step 3 | individuals > 50% missing data | 424 | 49027 | 70.00 | mullus.g5mac3dplm.recode.vcf
| step 4 | sites > 5% missing data, maf 0.05, min meanDP = 5 | 424 | 18727 | 14.00 | DP3g95maf05.recode.vcf
| step 5 | filter for allele balance | | 17965 | | DP3g95maf05.fil1.vcf
| step 6 | filter out sites with reads from both strands | | **SKIP** | |
| step 7 | ration mapping qualities ref vs alternate alleles | | 17546 | | DP3g95maf05.fil3.vcf
| step 8 | paired status | | 17546 | | DP3g95maf05.fil4.vcf
| step 9 | remove sites quality score < 1/4 depth | | 17546 | | DP3g95maf05.fil5.vcf
| step 10 | depth x quality score cutoff | 424 | 15466 | |
| step 11 | HWE 0.000001 | 424 | 14532 | 26.00 | DP3g95maf05.FIL.hwe.recode.vcf
| step 12 | He | | | |
| step 13 | Fis | | | |
| step 14 | rename | | | | mullus_all_filtered.vcf
\ No newline at end of file
01-SNPfilters/filtering_results_mul.txt
View file @
ede530d6
...
...
@@ -6,12 +6,12 @@ step 0 ddocent output data 431 49027 mullus.vcf
step 1 call below 50%, mac < 3, min quality score = 30 431 49027 71.00 mullus.g5mac3.recode.vcf
step 2 min mean depth genotype call = 3 431 49027 74.00 mullus.g5mac3dp3.recode.vcf
step 3 remove individuals > 50% missing data 424 49027 70.00 mullus.g5mac3dplm.recode.vcf
step 4 remove sites > 5% missing data, maf 0.05, min mean DP =
0
424 18727 14.00
step 5 filter for allele balance
step 6 filter out sites with reads from both strands
step 7 ration of mapping qualities reference vs alternate alleles
step 8 paired status
step 9 remove sites quality score < 1/4 depth
step 4 remove sites > 5% missing data, maf 0.05, min mean DP =
5
424 18727 14.00
DP3g95maf05.recode.vcf
step 5 filter for allele balance
17965 DP3g95maf05.fil1.vcf
step 6 filter out sites with reads from both strands
SKIP
step 7 ration of mapping qualities reference vs alternate alleles
17546 DP3g95maf05.fil3.vcf
step 8 paired status
17546 DP3g95maf05.fil4.vcf
step 9 remove sites quality score < 1/4 depth
DP3g95maf05.fil5.vcf
step 10 depth x quality score cutoff
step 11 HWE 0.000001
step 12 He
...
...
Write
Preview
Supports
Markdown
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment