# Clustering

## Exercises

Load the `seu_int` dataset you created earlier today:

```r
seu_int <- readRDS("seu_int_day2_part1.rds")
```

The method implemented in Seurat first constructs a k-nearest neighbor (KNN) graph based on the Euclidean distance in PCA space, and then refines the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity), which yields the shared nearest neighbor (SNN) graph. This step is performed with the `FindNeighbors()` function, which takes as input the previously defined dimensionality of the dataset (here, the first 25 principal components).

Note

We use the integrated object (`seu_int`) and the assay `integrated`. If you are unsure which assay is active, check it with `DefaultAssay(seu_int)` and set it with `DefaultAssay(seu_int) <- "integrated"`.

```r
seu_int <- Seurat::FindNeighbors(seu_int, dims = 1:25)
```
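To make the idea of neighborhood overlap concrete, here is a minimal base R sketch on hypothetical toy coordinates (six made-up cells in two dimensions, `k = 3`). It is a simplification for illustration only: the names `pcs`, `knn`, `jaccard` and `snn` are invented here, and `FindNeighbors()` differs in details such as the number of neighbors and the pruning of weak edges.

```r
# Hypothetical toy "PCA" coordinates: two well-separated groups of 3 cells
pcs <- rbind(
  c(0, 0), c(0.1, 0), c(0, 0.1),
  c(5, 5), c(5.1, 5), c(5, 5.1)
)
rownames(pcs) <- paste0("cell", 1:6)

k <- 3
d <- as.matrix(dist(pcs))  # Euclidean distances between all pairs of cells

# k nearest neighbours of each cell (each cell counts as its own neighbour)
knn <- lapply(seq_len(nrow(d)), function(i) names(sort(d[i, ]))[1:k])

# Jaccard similarity: size of the shared neighbourhood over size of the union
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

idx <- seq_along(knn)
snn <- outer(idx, idx, Vectorize(function(i, j) jaccard(knn[[i]], knn[[j]])))
dimnames(snn) <- list(rownames(pcs), rownames(pcs))

round(snn, 2)
```

Cells within the same group share all of their neighbors and get a high edge weight, while cells from different groups share none.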

To cluster the cells, Seurat then applies a modularity optimization technique, such as the Louvain algorithm (default) or the smart local moving (SLM) algorithm [SLM, Blondel et al., Journal of Statistical Mechanics], to iteratively group cells together, with the goal of optimizing the standard modularity function. The `FindClusters()` function implements this procedure. Its `resolution` parameter sets the 'granularity' of the downstream clustering: higher values lead to a greater number of clusters.

```r
seu_int <- Seurat::FindClusters(seu_int, resolution = seq(0.1, 0.8, by = 0.1))
```
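Why a higher resolution favors more clusters can be sketched with a toy graph in base R. The example below assumes a commonly used resolution-scaled form of modularity, Q = (1/2m) Σ (A_ij − γ k_i k_j / 2m) δ(c_i, c_j); whether Seurat's optimizer uses exactly this form is an assumption, and the graph and the function `toy_modularity()` are hypothetical.

```r
# Toy graph: two triangles (cells 1-3 and 4-6) joined by a single edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1  # mirror the edges so the adjacency matrix is symmetric

# Resolution-scaled modularity (assumed form, for illustration only)
toy_modularity <- function(A, clusters, resolution = 1) {
  two_m <- sum(A)                          # twice the number of edges
  k <- rowSums(A)                          # degree of each cell
  same <- outer(clusters, clusters, "==")  # TRUE if two cells share a cluster
  sum((A - resolution * outer(k, k) / two_m) * same) / two_m
}

one_cluster  <- rep(1, 6)
two_clusters <- c(1, 1, 1, 2, 2, 2)

# Score both partitions at a low and at a high resolution
scores <- sapply(c(low = 0.1, high = 1), function(res) {
  c(one = toy_modularity(A, one_cluster, res),
    two = toy_modularity(A, two_clusters, res))
})
round(scores, 3)
```

In this toy setting the single-cluster partition scores higher at resolution 0.1, while splitting into the two triangles scores higher at resolution 1.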

The cluster ID of each cell is added to the metadata, as a new column for each resolution tested:

```r
head(seu_int@meta.data)
```
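A quick way to summarise these columns is to count the clusters found at each resolution. The sketch below uses a small hypothetical data frame, `meta`, as a stand-in for `seu_int@meta.data` so that it runs on its own; the cluster assignments are made up.

```r
# Hypothetical stand-in for seu_int@meta.data with two resolution columns
meta <- data.frame(
  integrated_snn_res.0.1 = factor(c(0, 0, 0, 1, 1, 1)),
  integrated_snn_res.0.8 = factor(c(0, 0, 1, 2, 2, 3))
)

# Select the clustering columns by their common prefix
res_cols <- grep("^integrated_snn_res", colnames(meta), value = TRUE)

# Number of distinct clusters at each resolution
n_clusters <- sapply(meta[, res_cols, drop = FALSE],
                     function(x) length(unique(x)))
n_clusters

# Cross-tabulate two resolutions to see how clusters split
table(meta$integrated_snn_res.0.1, meta$integrated_snn_res.0.8)
```

Replacing `meta` with `seu_int@meta.data` should give the same summary for the real object, assuming the columns carry the `integrated_snn_res` prefix as above.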

To view how clusters sub-divide at increasing resolution:

```r
library(clustree)
clustree::clustree(seu_int@meta.data[, grep("integrated_snn_res", colnames(seu_int@meta.data))],
                   prefix = "integrated_snn_res.")
```

To view the UMAP with each cell colored by its cluster ID at a given resolution:

```r
Seurat::DimPlot(seu_int, group.by = "integrated_snn_res.0.1")
```

Exercise: Visualise the clustering at a few more resolutions. Taking the clustering and the UMAP plots into account, what do you consider a good resolution for the clustering?

```r
Seurat::DimPlot(seu_int, group.by = "integrated_snn_res.0.3")
```

Exercise: When does the number of neighbors need to be changed? How does changing the clustering method in `FindClusters` affect the output? Which parameter should be changed?