# Sample quality control
In this notebook, we perform quality control on samples, removing those with very low sequencing depth, elevated heterozygosity, or outlying positions in a preliminary PCA.
## Coverage data
How many samples fall below the threshold for total reads?
Removing 70 samples due to low total depth
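The depth threshold and the code behind this step are not shown here. A minimal sketch follows, assuming read counts are held in a samples × targets array: it sums reads per sample and flags any sample below a cut-off. The function name `low_depth_samples` and the 100-read threshold are hypothetical and chosen only for illustration.

```python
import numpy as np

def low_depth_samples(sample_ids, reads, min_total_reads):
    """Return IDs of samples whose total reads fall below `min_total_reads`.

    reads: (n_samples, n_targets) array of read counts (assumed layout).
    """
    reads = np.asarray(reads)
    totals = reads.sum(axis=1)          # total reads per sample, over all targets
    below = totals < min_total_reads    # True where a sample misses the threshold
    return [s for s, low in zip(sample_ids, below) if low]

# Toy data with a hypothetical threshold of 100 reads
ids = ["s1", "s2", "s3"]
counts = [[50, 60], [10, 20], [0, 5]]
low = low_depth_samples(ids, counts, min_total_reads=100)
print(f"Removing {len(low)} samples due to low total depth")
```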
## Total reads per target SNP
Which target SNPs have lower total depth than the amplicon threshold?
Removing 2 target SNPs due to low total depth
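The same idea applies per target, summing across samples instead of across targets. This is again a sketch. `low_depth_targets`, the `target_A` name and the 10-read cut-off are hypothetical, and the notebook's actual amplicon threshold is not shown.

```python
import numpy as np

def low_depth_targets(target_names, reads, min_target_reads):
    """Return names of targets whose total reads (over all samples) fall below the cut-off.

    reads: (n_samples, n_targets) array of read counts (assumed layout).
    """
    reads = np.asarray(reads)
    totals = reads.sum(axis=0)  # total reads per target, summed over samples
    return [t for t, total in zip(target_names, totals) if total < min_target_reads]

targets = ["target_A", "DSX1", "DSX2"]   # "target_A" is a made-up placeholder
counts = [[40, 1, 0], [55, 2, 1]]        # rows = samples, columns = targets
low = low_depth_targets(targets, counts, min_target_reads=10)
print(f"Removing {len(low)} target SNPs due to low total depth")
```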
| | mutation |
|---|---|
| 56 | DSX1 |
| 57 | DSX2 |
## Number of missing calls
14 samples have more than 80 missing calls across all target SNPs.
All 14 of these are already among the low-depth samples to be excluded.
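Below is a sketch of the missing-call check and the overlap test. It assumes a samples × targets matrix of calls in which -1 marks a missing call. That is the convention scikit-allel uses, but whether this notebook uses it is an assumption. `high_missing_samples` and the toy data are hypothetical.

```python
import numpy as np

def high_missing_samples(sample_ids, calls, max_missing):
    """Return the set of sample IDs with more than `max_missing` missing calls.

    calls: (n_samples, n_targets) array; -1 marks a missing call (assumed encoding).
    """
    calls = np.asarray(calls)
    n_missing = (calls == -1).sum(axis=1)  # missing calls per sample
    return {s for s, n in zip(sample_ids, n_missing) if n > max_missing}

# Toy data; the notebook's cut-off of 80 missing calls is scaled down to 1 here
ids = ["s1", "s2", "s3"]
calls = [[0, 1, 2], [-1, -1, 0], [-1, -1, -1]]
missing = high_missing_samples(ids, calls, max_missing=1)
low_depth = {"s2", "s3", "s4"}  # hypothetical low-depth exclusion set
overlap = missing & low_depth   # samples already excluded for low depth
print(f"{len(overlap)}/{len(missing)} of these are also low-depth samples")
```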
## Autosome / sex chromosome coverage ratios (ag-vampir only)
Females are expected to have a lower autosome:X coverage ratio than males, because they carry two copies of the X chromosome. It is not yet clear whether this ratio can be used to sex samples.
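Suppose read depth scales with copy number, which is an idealised assumption. Then an XX female should give an autosome:X ratio near 1 and an XY male a ratio near 2. Uneven amplification across amplicons may pull real values away from these. The sketch below computes the ratio, assuming a samples × targets read-count array and a boolean flag marking X-linked targets. The function name `autosome_x_ratio` is hypothetical.

```python
import numpy as np

def autosome_x_ratio(reads, is_x):
    """Mean autosomal read depth divided by mean X-linked read depth, per sample.

    reads: (n_samples, n_targets) read counts; is_x: one boolean per target.
    Samples with no X-linked reads give inf or nan (NumPy warns on division by zero).
    """
    reads = np.asarray(reads, dtype=float)
    is_x = np.asarray(is_x, dtype=bool)
    auto_mean = reads[:, ~is_x].mean(axis=1)  # mean depth over autosomal targets
    x_mean = reads[:, is_x].mean(axis=1)      # mean depth over X-linked targets
    return auto_mean / x_mean

# Toy example: an idealised "female" (ratio 1) and "male" (ratio 2)
counts = [[100, 100, 100], [100, 100, 50]]
is_x = [False, False, True]
ratios = autosome_x_ratio(counts, is_x)
```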
## Sample heterozygosity
### Locate heterozygosity outliers
Within each cohort, we then flag samples whose heterozygosity exceeds the 75th percentile (Q3) by more than 2.5 times the interquartile range (IQR), i.e. heterozygosity > Q3 + 2.5 × IQR. This excludes samples with unusually high heterozygosity for their cohort.
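The sketch below covers both steps, per-sample heterozygosity and the per-cohort threshold. It assumes genotypes are coded as alternate-allele counts (0/1/2) with -1 for missing, and that heterozygosity is the fraction of called target SNPs that are heterozygous. The notebook's own heterozygosity measure is not shown, so that definition is an assumption, and both function names are hypothetical. `np.percentile` uses linear interpolation by default, as does pandas' `quantile`, so the thresholds should match a pandas-based version on the same values.

```python
import numpy as np

def sample_heterozygosity(genotypes):
    """Fraction of non-missing calls that are heterozygous, per sample.

    Assumes 0/1/2 alt-allele counts with -1 = missing; a sample with no
    called sites would divide by zero and give nan.
    """
    g = np.asarray(genotypes)
    called = g != -1
    het = g == 1
    return het.sum(axis=1) / called.sum(axis=1)

def het_outliers(sample_ids, cohorts, het, k=2.5):
    """Per cohort, return (threshold, flagged IDs) where threshold = Q3 + k * IQR."""
    sample_ids = np.asarray(sample_ids)
    cohorts = np.asarray(cohorts)
    het = np.asarray(het, dtype=float)
    results = {}
    for cohort in np.unique(cohorts):
        in_cohort = cohorts == cohort
        q1, q3 = np.percentile(het[in_cohort], [25, 75])
        threshold = q3 + k * (q3 - q1)
        flagged = sample_ids[in_cohort & (het > threshold)]
        results[str(cohort)] = (threshold, flagged.tolist())
    return results

h = sample_heterozygosity([[0, 1, 2, -1], [1, 1, -1, -1]])
ids = ["a1", "a2", "a3", "a4", "a5", "b1", "b2"]
cohorts = ["A"] * 5 + ["B"] * 2
het_values = [0.10, 0.11, 0.12, 0.13, 0.30, 0.20, 0.20]
results = het_outliers(ids, cohorts, het_values)
```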
For Obuasi the heterozygosity threshold is 0.145; out of 96 samples, 1 is an outlier
For Gambia_URR the heterozygosity threshold is 0.125, out of 96 samples, 5 are outliers
For VK7 the heterozygosity threshold is 0.118; out of 96 samples, 1 is an outlier
For Siaya the heterozygosity threshold is 0.375, out of 264 samples, 0 are outliers
Removing 7 samples in total due to high heterozygosity
## Preliminary PCA: remove outliers
removing any invariant and highly missing sites
Obuasi - Found 3 PCA outliers in 96 samples using Z-scores
removing any invariant and highly missing sites
Gambia_URR - Found 2 PCA outliers in 96 samples using Z-scores
removing any invariant and highly missing sites
removing any invariant and highly missing sites
Siaya - Found 11 PCA outliers in 264 samples using Z-scores
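The log lines above suggest two steps per cohort. First, invariant and highly missing sites are dropped; then samples with extreme principal-component scores are flagged. Below is a minimal NumPy sketch of that procedure. The 20% missingness cut-off, mean imputation, two components and the |z| > 3 threshold are all assumptions, not the notebook's settings, and `pca_outliers` is a hypothetical name.

```python
import numpy as np

def pca_outliers(genotypes, n_components=2, z_threshold=3.0, max_missing_frac=0.2):
    """Flag samples whose score on any of the first PCs lies > z_threshold SDs from the mean.

    genotypes: (n_samples, n_sites) alt-allele counts 0/1/2, with -1 = missing.
    """
    g = np.array(genotypes, dtype=float)   # copy, so the caller's array is untouched
    g[g == -1] = np.nan
    # Drop highly missing sites
    g = g[:, np.isnan(g).mean(axis=0) <= max_missing_frac]
    # Mean-impute the remaining missing calls, site by site
    g = np.where(np.isnan(g), np.nanmean(g, axis=0), g)
    # Drop invariant sites
    g = g[:, g.std(axis=0) > 0]
    # Centre each site and compute PC scores via SVD
    centred = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Z-score each component across samples; flag any extreme component
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return np.any(np.abs(z) > z_threshold, axis=1)

# Toy cohort: 19 similar samples plus one very different sample
g = np.zeros((20, 10), dtype=int)
for i in range(19):
    g[i, i % 10] = 1
g[19, :] = 2
flags = pca_outliers(g, n_components=1)
print(f"Found {flags.sum()} PCA outliers in {len(flags)} samples using Z-scores")
```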
## Summary of samples to exclude
| location | samples excluded |
|---|---|
| Gambia_URR | 29 |
| Obuasi | 22 |
| VK7 | 22 |
| Siaya | 12 |
| total | 85 |
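The sketch below shows one way the per-step exclusion lists might be merged into this summary. Taking the union means a sample that fails several checks is counted once. That would explain why the total (85) is below the sum of the per-step counts reported above (70 + 7 + 16 = 93). `exclusion_summary` and the toy data are hypothetical. The `kept` list is what a filtered metadata file would contain.

```python
from collections import Counter

def exclusion_summary(sample_location, *exclusion_sets):
    """Count unique excluded samples per location, plus a total row.

    sample_location: dict mapping sample ID -> location.
    exclusion_sets: iterables of sample IDs flagged by individual QC steps.
    """
    excluded = set().union(*exclusion_sets)  # a sample flagged twice counts once
    counts = Counter(sample_location[s] for s in excluded)
    rows = counts.most_common()              # locations sorted by count, descending
    rows.append(("total", len(excluded)))
    return rows, excluded

locs = {"s1": "Obuasi", "s2": "Obuasi", "s3": "Siaya", "s4": "Siaya", "s5": "VK7"}
low_depth, high_het, pca = {"s1", "s3"}, {"s1"}, {"s2"}
rows, excluded = exclusion_summary(locs, low_depth, high_het, pca)
kept = [s for s in locs if s not in excluded]  # samples retained in the new metadata
```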
## Sample QC complete!
A new metadata file with the low-quality samples removed has been written to results/config/.