Population structure


In this notebook, we run a principal component analysis (PCA) and build a neighbour-joining tree (NJT) from the amplicon sequencing variant data. For the PCA, we plot PC1 vs PC2 and PC3 vs PC4, together with the variance explained by each principal component.

Variance explained

The variance explained is the proportion of the total variance in the dataset that is captured by each principal component (PC). Components with higher values capture more of the structure in the data. As a rule of thumb, once the variance explained per PC starts to level off, the remaining PCs add little further information.
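A minimal sketch of the calculation, assuming the data are held as a NumPy array of samples by variants (for example alternate allele counts 0, 1, 2) with no missing calls. The function name `variance_explained` and the random input are illustrative only; the notebook's own pipeline may filter or scale variants differently.

```python
import numpy as np

def variance_explained(X):
    """Fraction of total variance captured by each PC (hypothetical helper).

    X is assumed to be a samples-by-variants matrix with no missing values.
    """
    X = np.asarray(X, dtype=float)
    # Centre each variant (column) so PCA describes variation around the mean.
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centred matrix are proportional to the
    # variance captured by each PC; SVD returns them in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

# Illustrative random genotypes: 20 samples x 50 variants.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(20, 50))
ratio = variance_explained(X)
cumulative = np.cumsum(ratio)
print(np.round(ratio[:5], 3))
```

Plotting `ratio` against the component index gives the curve described above; the component at which it flattens is a reasonable point to stop interpreting further PCs.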

PCA

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional genetic data into a smaller set of uncorrelated variables, the principal components, ordered by how much of the total variance they capture (Reich et al., 2008). Plotting samples on the leading components helps visualize population structure and genetic relationships between samples.
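As an illustration, PCA can be computed from a centred genotype matrix with a singular value decomposition. This is a sketch under stated assumptions: `pca` is a hypothetical helper, genotypes are coded as alternate allele counts with no missing data, and the two groups are simulated, not amplicon data from this notebook. Dedicated population-genetics tools usually also scale each variant and handle missing calls.

```python
import numpy as np

def pca(X, n_components=4):
    """Return sample coordinates on the top principal components (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Centre each variant so the PCs describe deviations from the mean genotype.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(s) @ Vt; the rows of Vt are the PC loadings.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Scores (sample coordinates) are the left singular vectors scaled by s.
    return U[:, :n_components] * s[:n_components]

# Simulate two groups of 15 diploid samples with diverged allele frequencies.
rng = np.random.default_rng(1)
n_var = 200
freq_a = rng.uniform(0.1, 0.9, n_var)
freq_b = np.clip(freq_a + rng.normal(0, 0.3, n_var), 0.01, 0.99)
X = np.vstack([
    rng.binomial(2, freq_a, size=(15, n_var)),
    rng.binomial(2, freq_b, size=(15, n_var)),
])

scores = pca(X)
pc1_pc2 = scores[:, :2]   # coordinates for a PC1 vs PC2 scatter plot
pc3_pc4 = scores[:, 2:4]  # coordinates for a PC3 vs PC4 scatter plot
```

With this strongly diverged simulation, the two groups fall on opposite sides of zero along PC1. The sign of each PC is arbitrary, so only the relative positions of samples are meaningful.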

NJT

A Neighbor-Joining Tree (NJT) is built with a distance-based method that reconstructs evolutionary relationships between samples from a matrix of pairwise genetic distances (Saitou & Nei, 1987). The resulting tree places genetically similar samples close together.
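The core of the method can be sketched in a few lines. This minimal version of the Saitou & Nei (1987) algorithm assumes the pairwise genetic distances have already been computed as a symmetric NumPy matrix covering at least three samples; `neighbour_joining` is a hypothetical helper and the five-taxon matrix is a toy example, not data from this notebook. At each step it joins the pair of nodes that minimises the Q-criterion rather than simply the closest pair, which accounts for how divergent each node is from all the others.

```python
import numpy as np

def neighbour_joining(D, labels):
    """Build an unrooted neighbour-joining tree; return it as a Newick string."""
    D = np.asarray(D, dtype=float).copy()
    nodes = list(labels)
    if D.shape != (len(nodes), len(nodes)) or len(nodes) < 3:
        raise ValueError("need a square distance matrix for at least 3 samples")
    while len(nodes) > 3:
        n = len(nodes)
        r = D.sum(axis=1)  # net divergence of each node from all others
        # Q-criterion: the pair (i, j) with the smallest Q is joined next.
        Q = (n - 2) * D - r[:, None] - r[None, :]
        np.fill_diagonal(Q, np.inf)
        i, j = np.unravel_index(np.argmin(Q), Q.shape)
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * D[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i, j] - li
        joined = f"({nodes[i]}:{li:.6g},{nodes[j]}:{lj:.6g})"
        # Distance from the new internal node to every other node.
        d_new = 0.5 * (D[i] + D[j] - D[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        # Replace rows/columns i and j with a single row/column for the new node.
        D = np.vstack([
            np.hstack([D[np.ix_(keep, keep)], d_new[keep][:, None]]),
            np.append(d_new[keep], 0.0),
        ])
        nodes = [nodes[k] for k in keep] + [joined]
    # Join the final three nodes at a single central node.
    l0 = 0.5 * (D[0, 1] + D[0, 2] - D[1, 2])
    l1 = 0.5 * (D[0, 1] + D[1, 2] - D[0, 2])
    l2 = 0.5 * (D[0, 2] + D[1, 2] - D[0, 1])
    return f"({nodes[0]}:{l0:.6g},{nodes[1]}:{l1:.6g},{nodes[2]}:{l2:.6g});"

labels = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbour_joining(D, labels)
print(tree)  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

The `.6g` formatting only keeps the Newick string readable; an analysis of real data would normally keep branch lengths at full precision and use an optimised library implementation for large sample sets.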

Excluding extreme outliers from the NJT