Exercise 2

Enriched Heatmaps and identification of regions with different activities

Learning Objectives

  • By the end of this section, you will be able to:
  • Distinguish between when to use Heatmap() and HeatmapAnnotation()
  • Generate row split annotations for enriched heatmaps
  • Add multiple categorical annotations with separate color legends
  • Avoid common data handling errors in heatmap plotting

Load Libraries

ATAC SE object

Here we load the SummarizedExperiment for the ATAC-seq data

atac <- readRDS("data/atac_se.rds")

ATAC peaks info

Each peak region represent the activity in a genomic region. These peaks are annotated for various features, including their genomic annotations and the results for differential accessibility analysis is also added there.

rowRanges(atac)
GRanges object with 3656 ranges and 18 metadata columns:
         seqnames              ranges strand |             annotation   geneChr
            <Rle>           <IRanges>  <Rle> |            <character> <integer>
     [1]     chr1     3670547-3672665      * |       Promoter (<=1kb)         1
     [2]     chr1     4332510-4332710      * | Intron (ENSMUST00000..         1
     [3]     chr1     4491755-4492573      * |       Promoter (1-2kb)         1
     [4]     chr1     4571186-4572423      * |      Distal Intergenic         1
     [5]     chr1     4785062-4786325      * |       Promoter (<=1kb)         1
     ...      ...                 ...    ... .                    ...       ...
  [3652]     chr2 181764664-181764891      * |       Promoter (1-2kb)         2
  [3653]     chr2 181766881-181767875      * |       Promoter (<=1kb)         2
  [3654]     chr2 181837244-181838655      * |       Promoter (<=1kb)         2
  [3655]     chr2 181863853-181865066      * |       Promoter (<=1kb)         2
  [3656]     chr2 181918525-181918853      * |      Distal Intergenic         2
         geneStart   geneEnd geneLength geneStrand      geneId
         <integer> <integer>  <integer>  <integer> <character>
     [1]   3214482   3671498     457017          2      497097
     [2]   4344146   4360314      16169          2       19888
     [3]   4492465   4493735       1271          2       20671
     [4]   4491390   4497354       5965          2       20671
     [5]   4773206   4785710      12505          2       27395
     ...       ...       ...        ...        ...         ...
  [3652] 181763332 181827797      64466          1       17932
  [3653] 181767029 181795892      28864          1       17932
  [3654] 181837854 181857461      19608          1      245867
  [3655] 181864337 181870830       6494          1       67005
  [3656] 181864360 181866209       1850          1       67005
                  transcriptId distanceToTSS            ENSEMBL      SYMBOL
                   <character>     <numeric>        <character> <character>
     [1]  ENSMUST00000070533.4             0 ENSMUSG00000051951        Xkr4
     [2]  ENSMUST00000027032.5         27604 ENSMUSG00000025900         Rp1
     [3]  ENSMUST00000191939.1          1162 ENSMUSG00000025902       Sox17
     [4]  ENSMUST00000192650.5        -73832 ENSMUSG00000025902       Sox17
     [5]  ENSMUST00000130201.7             0 ENSMUSG00000033845      Mrpl15
     ...                   ...           ...                ...         ...
  [3652] ENSMUST00000081125.10          1332 ENSMUSG00000010505        Myt1
  [3653]  ENSMUST00000156190.7             0 ENSMUSG00000010505        Myt1
  [3654] ENSMUST00000029116.13             0 ENSMUSG00000027589      Pcmtd2
  [3655]  ENSMUST00000039551.8             0 ENSMUSG00000038628      Polr3k
  [3656]  ENSMUST00000153214.1         54165 ENSMUSG00000038628      Polr3k
                       GENENAME        anno     logFC    logCPM         F
                    <character> <character> <numeric> <numeric> <numeric>
     [1] X-linked Kx blood gr..    Promoter  0.602984  10.45378  41.78446
     [2] retinitis pigmentosa..      Intron  0.855748   5.18424   3.68371
     [3] SRY (sex determining..    Promoter -0.620926   7.13469   9.14522
     [4] SRY (sex determining..  Intergenic -0.511046   8.90885  15.15688
     [5] mitochondrial riboso..    Promoter  0.540908   8.63469  15.08892
     ...                    ...         ...       ...       ...       ...
  [3652] myelin transcription..    Promoter -1.275275   5.54135  10.44866
  [3653] myelin transcription..    Promoter  0.477658   7.56969   6.03877
  [3654] protein-L-isoasparta..    Promoter  0.689247   7.92175  16.67881
  [3655] polymerase (RNA) III..    Promoter  0.241693   9.05746   4.06676
  [3656] polymerase (RNA) III..  Intergenic  0.394577   6.84745   2.73331
              PValue      qvalue
           <numeric>   <numeric>
     [1] 6.58453e-08 4.81983e-07
     [2] 6.13517e-02 9.28785e-02
     [3] 4.12368e-03 8.38540e-03
     [4] 3.27158e-04 8.67115e-04
     [5] 3.36058e-04 8.82634e-04
     ...         ...         ...
  [3652] 0.002309221  0.00497558
  [3653] 0.017945118  0.03116850
  [3654] 0.000181039  0.00051468
  [3655] 0.049773014  0.07677364
  [3656] 0.105289776  0.15077925
  -------
  seqinfo: 21 sequences from an unspecified genome; no seqlengths
rd_atac <- rowRanges(atac)
rd_atac <- rd_atac[abs(rd_atac$logFC) >= 0.5 & rd_atac$qvalue <= 0.1]
colnames(elementMetadata(rd_atac)) <- paste("ATAC", colnames(elementMetadata(rd_atac)), sep = "_")

Taking 1000 bp around the mid of ATAC-peaks

Most of the peaks are around 1000 bp wide. We can check that

median(lengths(rowRanges(atac)))
[1] 1031

For plotting the data, we can hence consider 1000 bp around the peak-mid

# Finding mid of all peaks
mid_peaks <- gr.mid(rd_atac)

# adding names to peaks to give them identity
names(mid_peaks) <- paste("peak", 1:length(mid_peaks), sep = "_")

Normalizing data for plotting

bigWig files are compressed, indexed, binary format used for efficiently displaying continuous data, like genomic signal data, in genome browsers. Here, we read in ATAC-seq bigWig files, filters the data to specific chromosomes, normalizes signal intensity around genomic regions of interest (peak centers), and saves the resulting matrices for downstream visualization.

atac_files <- list.files("data", pattern = "ATAC", full.names = TRUE)
names(atac_files) <- gsub(pattern = "\\.bw", replacement = "", x = basename(atac_files))
atac_bw <- lapply(atac_files, function(x){
  a <- rtracklayer::import(x)
  a <- a[seqnames(a) %in% c("chr1", "chr2")]
  a
})

bigWig files are represented as GRanges.

atac_bw
$ATAC_11half
GRanges object with 6354084 ranges and 1 metadata column:
            seqnames              ranges strand |     score
               <Rle>           <IRanges>  <Rle> | <numeric>
        [1]     chr1           1-3000000      * | 0.0000000
        [2]     chr1     3000001-3000050      * | 0.0225005
        [3]     chr1     3000051-3000100      * | 0.0600013
        [4]     chr1     3000101-3000150      * | 0.0337507
        [5]     chr1     3000151-3000200      * | 0.0375008
        ...      ...                 ...    ... .       ...
  [6354080]     chr2 182013051-182013100      * |   45.3122
  [6354081]     chr2 182013101-182013150      * |   49.6173
  [6354082]     chr2 182013151-182013200      * |   79.4379
  [6354083]     chr2 182013201-182013250      * |   50.2623
  [6354084]     chr2 182013251-182113224      * |    0.0000
  -------
  seqinfo: 66 sequences from an unspecified genome

$ATAC_15half
GRanges object with 5356375 ranges and 1 metadata column:
            seqnames              ranges strand |      score
               <Rle>           <IRanges>  <Rle> |  <numeric>
        [1]     chr1           1-3000400      * | 0.00000000
        [2]     chr1     3000401-3000500      * | 0.00844927
        [3]     chr1     3000501-3000550      * | 0.00000000
        [4]     chr1     3000551-3000650      * | 0.00844927
        [5]     chr1     3000651-3003300      * | 0.00000000
        ...      ...                 ...    ... .        ...
  [5356371]     chr2 182012501-182012600      * | 0.01689850
  [5356372]     chr2 182012601-182012650      * | 0.00844927
  [5356373]     chr2 182012651-182012900      * | 0.00000000
  [5356374]     chr2 182012901-182013000      * | 0.01689850
  [5356375]     chr2 182013001-182113224      * | 0.00000000
  -------
  seqinfo: 66 sequences from an unspecified genome

Next, we calculate the normalized signals into the area of our interest. Please check ?normalizeToMatrix for details of this function

mat_AS <- lapply(atac_bw, FUN = function(x) {
  normalizeToMatrix(x, mid_peaks,
    extend = 1000,
    value_column = "score",
    include_target = TRUE,
    mean_mode = "w0",
    w = 20,
    smooth = T,
    background = 0
  )
})
Warning: Width of `target` are all 1, `include_target` is set to `FALSE`.
Warning: Width of `target` are all 1, `include_target` is set to `FALSE`.
mat_AS
$ATAC_11half
Normalize x to mid_peaks:
  Upstream 1000 bp (50 windows)
  Downstream 1000 bp (50 windows)
  Include target regions (width = 1)
  1933 target regions

$ATAC_15half
Normalize x to mid_peaks:
  Upstream 1000 bp (50 windows)
  Downstream 1000 bp (50 windows)
  Include target regions (width = 1)
  1933 target regions

Enriched heatmap

Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals over specific target regions.

EnrichedHeatmap(mat = mat_AS$ATAC_11half, name = "E11.5") + 
  EnrichedHeatmap(mat = mat_AS$ATAC_15half, name = "E15.5")
The automatically generated colors map from the 1^st and 99^th of the
values in the matrix. There are outliers in the matrix whose patterns
might be hidden by this color mapping. You can manually set the color
to `col` argument.

Use `suppressMessages()` to turn off this message.
The automatically generated colors map from the 1^st and 99^th of the
values in the matrix. There are outliers in the matrix whose patterns
might be hidden by this color mapping. You can manually set the color
to `col` argument.

Use `suppressMessages()` to turn off this message.

Joining 2 EnrichedHeatmaps is very easy with a + sign.

Changing aesthectics of Enriched heatmap

Let’s work on one data for now.

Changind color and size

EnrichedHeatmap(
  mat = mat_AS$ATAC_11half,    # normalized matrix
  name = "E11.5",              # Name for the plot
  col = c("white", "red"),     # We change the colors for low to high values
  width = unit(4, "cm"),       # Width of the heatmap
  height = unit(8, "cm")      # Height of the heatmap
)

You may wonder why the color looks so light. The reason is in coverage values in ATAC, there exist some extreme values, which results in extreme value in normalizedMatrix.

Color based on quantile

# Taking data between 1 and 99 percentile
col_fun <- colorRamp2(quantile(mat_AS$ATAC_11half, c(0.01, 0.99)), c("white", "red"))

EnrichedHeatmap(
  mat = mat_AS$ATAC_11half,
  name = "E11.5",
  col = col_fun,
  width = unit(4, "cm"),
  height = unit(8, "cm")
)

Changing some other aesthetics

# We first change the color legent of the plot to show only 3 values
vmin <- as.numeric(quantile(mat_AS$ATAC_11half, c(0.01)))
vmax <- as.numeric(quantile(mat_AS$ATAC_11half, c(0.99)))
vmid <- (vmin + vmax) / 2
legend_ticks <- c(vmin, vmid, vmax)

EnrichedHeatmap(
  mat = mat_AS$ATAC_11half,
  name = "E11.5",
  col = col_fun,
  width = unit(4, "cm"),
  height = unit(8, "cm"),
  column_title = "E11.5",
  column_title_gp = gpar(fontsize = 10, fill = "#ffcccc"),
  axis_name = c("-1kb", "mid", "1kb"),  # We changed the axis names here
 heatmap_legend_param = list(
    at = legend_ticks,
    labels = round(legend_ticks, digits = 1),
    title_gp = gpar(fontsize = 8),
    labels_gp = gpar(fontsize = 7)
  ),
  top_annotation = HeatmapAnnotation(
    lines = anno_enriched(
      height = unit(2, "cm"),
      gp = gpar(
        lwd = 0.7,
        fontsize = 5
      ),
      axis_param = list(
        side = "right",
        facing = "inside",
        gp = gpar(
          fontsize = 7,
          col = "black",
          lwd = 0.4
        )
      )
    )
  )
)

Split the Enriched heatmap based on logFC values

Although we see some signal here, but it might be a good idea to split the heatmap into the regions which gained and lost accessibility.

split_change <- ifelse(mid_peaks$ATAC_logFC > 0, yes = "Increased accessibility", no = "Decreased accessibility")
names(split_change) <- names(mid_peaks)

head(split_change)
                   peak_1                    peak_2                    peak_3 
"Increased accessibility" "Increased accessibility" "Decreased accessibility" 
                   peak_4                    peak_5                    peak_6 
"Decreased accessibility" "Increased accessibility" "Decreased accessibility" 
# Define cluster colors
cluster_colors <- c("Increased accessibility" = "red", "Decreased accessibility" = "blue")

# Make sure split_change has levels matching the color names
split_change <- factor(split_change, levels = names(cluster_colors))

EnrichedHeatmap(
  mat = mat_AS$ATAC_11half,
  name = "E11.5",
  row_split = split_change,
  col = col_fun,
  width = unit(4, "cm"),
  height = unit(8, "cm"),
  column_title = "E11.5",
  column_title_gp = gpar(fontsize = 10, fill = "#ffcccc"),
  axis_name = c("-1kb", "mid", "1kb"),
  heatmap_legend_param = list(
    at = legend_ticks,
    labels = round(legend_ticks, digits = 1),
    title_gp = gpar(fontsize = 8),
    labels_gp = gpar(fontsize = 7)
  ),
  top_annotation = HeatmapAnnotation(
    lines = anno_enriched(
      gp = gpar(col = cluster_colors),
      height = unit(2, "cm"),
      axis_param = list(
        side = "right",
        facing = "inside",
        gp = gpar(
          fontsize = 7,
          lwd = 0.4
        )
      )
    )
  )
)

Make a function to make this heatmap

As you know we have at least 2 samples as of now. It will be a good idea to create a function to make this heatmap.

make_EH <- function(norm_mat, heatmap_cols = c("white", "red"), split_rows = NULL, hm_name, col_fill = "#ffcccc"){
  col_fun <- colorRamp2(quantile(norm_mat, c(0.01, 0.99)), heatmap_cols)  
  
  vmin <- as.numeric(quantile(norm_mat, c(0.01)))
  vmax <- as.numeric(quantile(norm_mat, c(0.99)))
  vmid <- (vmin + vmax) / 2
  legend_ticks <- c(vmin, vmid, vmax)

EnrichedHeatmap(
  mat = norm_mat,
  name = hm_name,
  row_split = split_rows,
  col = col_fun,
  width = unit(4, "cm"),
  height = unit(8, "cm"),
  column_title = hm_name,
  column_title_gp = gpar(fontsize = 10, fill = col_fill),
  axis_name = c("-1kb", "mid", "1kb"),
  heatmap_legend_param = list(
    at = legend_ticks,
    labels = round(legend_ticks, digits = 1),
    title_gp = gpar(fontsize = 8),
    labels_gp = gpar(fontsize = 7)
  ),
  top_annotation = HeatmapAnnotation(
    lines = anno_enriched(
      gp = gpar(col = cluster_colors),
      height = unit(2, "cm"),
      axis_param = list(
        side = "right",
        facing = "inside",
        gp = gpar(
          fontsize = 7,
          lwd = 0.4
        )
      )
    )
  )
)
}

Make Enriched Heatmaps for both ATAC samples

eh_11h <- make_EH(norm_mat = mat_AS$ATAC_11half, split_rows = split_change, hm_name = "E11.5")
eh_15h <- make_EH(norm_mat = mat_AS$ATAC_15half, split_rows = split_change, hm_name = "E15.5", col_fill = "#e6fff2")

draw(eh_11h + eh_15h, merge_legend = TRUE)

Another way to make annotations for split

It is probably a good idea to represent clusters with colors, instead of text

eh_11h <- make_EH(norm_mat = mat_AS$ATAC_11half, hm_name = "E11.5")
eh_15h <- make_EH(norm_mat = mat_AS$ATAC_15half, hm_name = "E15.5", col_fill = "#e6fff2")

row_order_eh <- row_order(eh_11h)
Warning: The heatmap has not been initialized. You might have different results
if you repeatedly execute this function, e.g. when row_km/column_km was
set. It is more suggested to do as `ht = draw(ht); row_order(ht)`.
anno_hm <- Heatmap(
  split_change,
  col = c("red", "blue"), 
  name = "Change",
  show_row_names = FALSE, 
  show_column_names = FALSE, 
  width = unit(3, "mm"),
  height = unit(8, "cm"),
  row_order = row_order_eh,
  row_title_gp = gpar(fontsize = 0)
)

draw(anno_hm + eh_11h + eh_15h, split = split_change, merge_legend = TRUE)

Question 1

Can you make split the Enriched Heatmap based on the annotations?

Hint: mid_peaks$ATAC_anno contain annotations for the regions.

split_anno <- mid_peaks$ATAC_anno
names(split_anno) <- names(mid_peaks)

head(split_anno)
      peak_1       peak_2       peak_3       peak_4       peak_5       peak_6 
  "Promoter"     "Intron"   "Promoter" "Intergenic"   "Promoter" "Intergenic" 
cols_an <- RColorBrewer::brewer.pal(n = length(unique(split_anno)), name = "Set1")

anno_an <- Heatmap(
  split_anno,
  col = cols_an, 
  name = "anno",
  show_row_names = FALSE, 
  show_column_names = FALSE, 
  width = unit(3, "mm"),
  height = unit(8, "cm"),
  row_order = row_order_eh,
  row_title_gp = gpar(fontsize = 0)
)

draw(anno_an + eh_11h + eh_15h, split = split_anno, merge_legend = TRUE)

Question 2

Can you make split the Enriched Heatmap based on the annotations and change in direction?

Hint: mid_peaks$ATAC_anno contain annotations for the regions. mid_peaks$ATAC_logFC contain sign of change.

split_anno_dir <- paste(mid_peaks$ATAC_anno, ifelse(mid_peaks$ATAC_logFC > 0, yes = "Inc", no = "Dec"))
names(split_anno_dir) <- names(mid_peaks)

head(split_anno_dir)
          peak_1           peak_2           peak_3           peak_4 
  "Promoter Inc"     "Intron Inc"   "Promoter Dec" "Intergenic Dec" 
          peak_5           peak_6 
  "Promoter Inc" "Intergenic Dec" 
cols_an <- RColorBrewer::brewer.pal(n = length(unique(split_anno_dir)), name = "Paired")

anno_an_dir <- Heatmap(
  split_anno_dir,
  col = cols_an, 
  name = "anno",
  show_row_names = FALSE, 
  show_column_names = FALSE, 
  width = unit(3, "mm"),
  height = unit(8, "cm"),
  row_order = row_order_eh,
  row_title_gp = gpar(fontsize = 0)
)
There are 10 unique colors in the vector `col` and 10 unique values in
`matrix`. `Heatmap()` will treat it as an exact discrete one-to-one
mapping. If this is not what you want, slightly change the number of
colors, e.g. by adding one more color or removing a color.
draw(anno_an_dir + eh_11h + eh_15h, split = split_anno_dir, merge_legend = TRUE)

Question 3

Can you make split the Enriched Heatmap based on the annotations and change in direction with separate color bars for annotation and direction of change?

Hint: mid_peaks$ATAC_anno contain annotations for the regions. mid_peaks$ATAC_logFC contain sign of change.

split_anno_df <- data.frame(
  Annotation = mid_peaks$ATAC_anno,
  Direction = ifelse(mid_peaks$ATAC_logFC > 0, yes = "Inc", no = "Dec")
)

head(split_anno_df)
  Annotation Direction
1   Promoter       Inc
2     Intron       Inc
3   Promoter       Dec
4 Intergenic       Dec
5   Promoter       Inc
6 Intergenic       Dec
cols_an <- c("red", "blue", 
             RColorBrewer::brewer.pal(n = length(unique(split_anno_df$Annotation)), name = "Set1")
)

names(cols_an) <- c(unique(split_anno_df$Direction), unique(split_anno_df$Annotation))

anno_an_df <- Heatmap(
  split_anno_df,
  name = "anno",
  col = cols_an,
  show_row_names = FALSE, 
  show_column_names = FALSE, 
  width = unit(3, "mm"),
  height = unit(8, "cm"),
  row_order = row_order_eh,
  row_title_gp = gpar(fontsize = 0)
)
Warning: The input is a data frame-like object, convert it to a matrix.
Warning: Note: not all columns in the data frame are numeric. The data frame
will be converted into a character matrix.
draw(anno_an_df + eh_11h + eh_15h, split = split_anno_dir, merge_legend = TRUE)

Important

Heatmap and EnrichedHeatmap serve different functions, however the plots can be combined effortlessly. This makes the visualization of complex data easy.