Sample quality control#

In this notebook, we perform quality control on samples, removing samples with very low depth, elevated heterozygosity, or outlying positions in a preliminary PCA.

Coverage data#

How many samples fall below the threshold for total reads?

Removing 70 samples due to low total depth
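
As an illustration of the depth filter, here is a minimal sketch in pandas. It is not the notebook's own code: the function name `low_depth_samples`, the `min_reads` value and the toy read counts are all hypothetical, and the real threshold comes from the workflow configuration.

```python
import pandas as pd


def low_depth_samples(total_reads: pd.Series, min_reads: int) -> pd.Index:
    """Return the IDs of samples whose total read count is below min_reads."""
    return total_reads.index[total_reads < min_reads]


# Toy data: total reads per sample (hypothetical values, not from this run)
total_reads = pd.Series({"s1": 12000, "s2": 150, "s3": 8000, "s4": 40})

# min_reads=250 is an assumed threshold for illustration only
to_remove = low_depth_samples(total_reads, min_reads=250)
print(f"Removing {len(to_remove)} samples due to low total depth")
```

The same comparison can be applied per target SNP by summing reads across samples rather than across targets.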

Total reads per target SNP#

Which target SNPs have lower total depth than the amplicon threshold?

Removing 2 target SNPs due to low total depth
mutation
56 DSX1
57 DSX2

Number of missing calls#

14 samples have more than 80 missing calls overall out of all possible target SNPs
14/14 of these are also present in the low depth samples to be excluded
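
The missing-call count and its overlap with the low-depth set could be computed along these lines. This is a sketch under assumptions: genotypes are held in an array of shape (SNPs, samples, ploidy) with missing alleles coded as -1, and the names `samples_with_many_missing` and `max_missing` are hypothetical.

```python
import numpy as np


def samples_with_many_missing(genotypes, sample_ids, max_missing):
    """Return the set of sample IDs with more than max_missing missing calls."""
    # A call is treated as missing if any allele is negative (assumed -1 coding)
    missing = (np.asarray(genotypes) < 0).any(axis=2)
    n_missing = missing.sum(axis=0)  # missing calls per sample
    return {s for s, n in zip(sample_ids, n_missing) if n > max_missing}


# Toy genotypes: 3 SNPs x 3 samples x 2 alleles (hypothetical values)
genotypes = np.array([
    [[0, 0], [-1, -1], [0, 1]],
    [[0, 1], [-1, -1], [-1, -1]],
    [[1, 1], [0, 0], [0, 0]],
])
sample_ids = ["s1", "s2", "s3"]

high_missing = samples_with_many_missing(genotypes, sample_ids, max_missing=1)
low_depth = {"s2", "s4"}  # hypothetical set of low-depth samples
overlap = high_missing & low_depth
print(f"{len(overlap)}/{len(high_missing)} of these are also in the low depth samples")
```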

Autosome / Sex chromosome coverage ratios (ag-vampir only)#

Females will have a lower autosome:X coverage ratio, and males will have a higher ratio. It's not yet clear whether we can use this to sex samples.
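
To make the ratio concrete, the sketch below computes mean autosomal depth divided by mean X depth for each sample. The table layout (targets as rows, samples as columns), the contig labels and the function name `autosome_x_ratio` are assumptions for illustration, not the notebook's implementation.

```python
import pandas as pd


def autosome_x_ratio(depth: pd.DataFrame, contig: pd.Series, x_contig: str = "X") -> pd.Series:
    """Mean autosomal depth divided by mean X depth, per sample.

    depth: rows are targets, columns are samples.
    contig: contig name for each target, sharing depth's index.
    """
    is_x = contig == x_contig
    mean_autosome = depth.loc[~is_x].mean(axis=0)
    mean_x = depth.loc[is_x].mean(axis=0)
    return mean_autosome / mean_x


# Toy data (hypothetical): two autosomal targets and two X-linked targets
targets = ["t1", "t2", "t3", "t4"]
contig = pd.Series(["2L", "3R", "X", "X"], index=targets)
depth = pd.DataFrame(
    {"sampleA": [100, 100, 100, 100], "sampleB": [100, 100, 50, 50]},
    index=targets,
)

ratios = autosome_x_ratio(depth, contig)
print(ratios)
```

In this toy example `sampleB` has half the X depth of `sampleA`, so its ratio is twice as large, which is the pattern expected for a male relative to a female.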

Sample heterozygosity#

Locate heterozygosity outliers#

Within each cohort, we then flag samples whose heterozygosity is more than 2.5 × IQR above the cohort's 75th percentile, so that samples with very high heterozygosity for their cohort are excluded.

For Obuasi the heterozygosity threshold is 0.145, out of 96 samples, 1 are outliers
For Gambia_URR the heterozygosity threshold is 0.125, out of 96 samples, 5 are outliers
For VK7 the heterozygosity threshold is 0.118, out of 96 samples, 1 are outliers
For Siaya the heterozygosity threshold is 0.375, out of 264 samples, 0 are outliers

Removing 7 samples in total due to high heterozygosity
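
The per-cohort outlier rule described above can be sketched as follows. This is one reading of the rule (threshold = Q3 + 2.5 × IQR); the column names `location` and `heterozygosity`, the function name and the toy values are hypothetical.

```python
import pandas as pd


def heterozygosity_outliers(df, cohort_col="location", het_col="heterozygosity", k=2.5):
    """Return IDs of samples whose heterozygosity exceeds Q3 + k * IQR within their cohort."""
    flagged = []
    for cohort, grp in df.groupby(cohort_col):
        q1, q3 = grp[het_col].quantile([0.25, 0.75])
        threshold = q3 + k * (q3 - q1)
        outliers = grp.index[grp[het_col] > threshold]
        print(
            f"For {cohort} the heterozygosity threshold is {threshold:.3f}, "
            f"out of {len(grp)} samples, {len(outliers)} are outliers"
        )
        flagged.extend(outliers)
    return flagged


# Toy data (hypothetical): one clear outlier in cohort A, none in cohort B
df = pd.DataFrame(
    {
        "location": ["A"] * 5 + ["B"] * 5,
        "heterozygosity": [0.10, 0.11, 0.12, 0.13, 0.50, 0.30, 0.31, 0.32, 0.33, 0.34],
    },
    index=["a1", "a2", "a3", "a4", "a5", "b1", "b2", "b3", "b4", "b5"],
)

flagged = heterozygosity_outliers(df)
```

Computing the threshold within each cohort, rather than across all samples, keeps a cohort with high baseline heterozygosity from being flagged wholesale.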

Preliminary PCA - remove outliers#

removing any invariant and highly missing sites
Obuasi - Found 3 PCA outliers in 96 samples using Z-scores
removing any invariant and highly missing sites
Gambia_URR - Found 2 PCA outliers in 96 samples using Z-scores
removing any invariant and highly missing sites
removing any invariant and highly missing sites
Siaya - Found 11 PCA outliers in 264 samples using Z-scores
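
For readers unfamiliar with Z-score outlier detection on principal components, a minimal NumPy sketch follows. It assumes an alternate-allele count matrix (samples × sites) with missing calls as NaN. The number of components, the Z-score cutoff of 3 and the missingness cutoff are illustrative assumptions, and the notebook's actual implementation may differ.

```python
import numpy as np


def pca_zscore_outliers(alt_counts, n_components=2, z_threshold=3.0, max_missing=0.2):
    """Return indices of samples with |Z| > z_threshold on any leading principal component."""
    x = np.asarray(alt_counts, dtype=float)
    # Remove highly missing sites (more than max_missing of calls missing)
    x = x[:, np.isnan(x).mean(axis=0) <= max_missing]
    # Fill remaining missing calls with the site mean
    x = np.where(np.isnan(x), np.nanmean(x, axis=0), x)
    # Remove invariant sites
    x = x[:, x.std(axis=0) > 0]
    # Centre each site and run PCA via singular value decomposition
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]
    # Z-score each component across samples
    z = (pcs - pcs.mean(axis=0)) / pcs.std(axis=0)
    return np.where((np.abs(z) > z_threshold).any(axis=1))[0]


# Toy data (simulated): 50 samples x 200 sites, with sample 0 made very divergent
rng = np.random.default_rng(0)
alt_counts = rng.binomial(2, 0.05, size=(50, 200)).astype(float)
alt_counts[0, :] = 2

outliers = pca_zscore_outliers(alt_counts)
print(f"Found {len(outliers)} PCA outliers in {alt_counts.shape[0]} samples using Z-scores")
```

Running the check separately for each cohort, as the output above suggests, avoids flagging samples simply because they come from a genetically distinct population.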

Summary of samples to exclude#

location count
0 Gambia_URR 29
1 Obuasi 22
2 VK7 22
3 Siaya 12
4 total 85

Sample QC complete!#

A new metadata file with low-quality samples removed has been written to results/config/ :)