# 1- calculate individual homozygosity with vcftools
# 2- calculate individual heterozygosity in R
# 3- inspect values and keep only individuals with non-extreme values
# extreme values are defined as those that fall outside of 9 times the interquartile range. (typical outliers are defined as 1.5IQR) this was set arbitrarily after visual inspection, and to allow the same criteria to be applied to both species
# extreme values are defined as those that fall outside of 6 times the interquartile range. (typical outliers are defined as 1.5IQR)
# this was set arbitrarily after visual inspection, and to allow the same criteria to be applied to both species
@@ -39,44 +39,6 @@ bash filtering.sh ../00-rawData/02-Mullus/mullus.vcf 02-Mullus mul
The final step consists of removing individuals with (extreme) outlier heterozygosity values. Here we define outliers as values falling outside 6 times the interquartile range (IQR).
This threshold was chosen to suit our data; we advise inspecting the output figures to validate this choice for your own data.
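
The outlier rule can be sketched in R as below. This is a minimal illustration and not the code in the filtering script: it assumes the fences are placed 6 * IQR below the first quartile and 6 * IQR above the third quartile (the usual Tukey-style convention), and the data frame, its column names (`O_HOM`, `N_SITES`, `HET`) and its values are made up for the example.

```r
# Toy data standing in for per-individual homozygosity counts
# (hypothetical values and column names, for illustration only).
het <- data.frame(
  INDV    = c("ind1", "ind2", "ind3", "ind4", "ind5", "ind6"),
  O_HOM   = c(800, 810, 790, 805, 795, 100),
  N_SITES = c(1000, 1000, 1000, 1000, 1000, 1000),
  stringsAsFactors = FALSE
)

# Observed heterozygosity: share of sites that are not homozygous.
het$HET <- (het$N_SITES - het$O_HOM) / het$N_SITES

# Multiplier for the interquartile range (1.5 is the typical outlier rule).
k <- 6

# First and third quartiles, and the fences around them
# (assumption: fences are measured from Q1 and Q3).
q     <- quantile(het$HET, probs = c(0.25, 0.75), names = FALSE)
iqr   <- q[2] - q[1]
lower <- q[1] - k * iqr
upper <- q[2] + k * iqr

# Keep only individuals whose heterozygosity lies within the fences.
het$keep <- het$HET >= lower & het$HET <= upper
kept     <- het$INDV[het$keep]

print(het[, c("INDV", "HET", "keep")])
```

With these toy values only `ind6` falls outside the fences. Lowering `k` makes the filter stricter, so it is worth re-plotting the heterozygosity distribution whenever the multiplier is changed.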
### SNP filtering results for Mullus surmuletus
| filtering step | filter for | individuals retained | SNPs retained | run time (sec) | output |