Making Heat Maps In R

Amanda Birmingham (abirmingham at ucsd.edu)

Heat maps are a staple of data visualization for numerous tasks, including differential expression analyses on microarray and RNA-Seq data. Many people have already written heat-map-plotting packages for R, so it takes a little effort to decide which to use; here I investigate the performance of the six that I found referenced most frequently online.

My main goals (YMMV) beyond basic plotting were to be able to (a) annotate rows and columns with metadata information, (b) include scales and labels in the figure itself (since often figures are reused in presentations without caption information), and (c) do as much label customization as possible with the shallowest learning curve. I also want automatic dendrogram creation, so using ggplot2 or another graphics-only package was out. Note that throughout I have accepted the default colors for every heat map tool, as these are pretty easy to change after the fact if you care.

TL;DR: I recommend using heatmap3 (NB: not “heatmap.3”). It mimics the easy-to-use interface of heatmap.2 but can be extended into more complex settings if you later find you need fine-grained control.

Table of Contents

Table of Contents

Set-Up

This blog post is adapted from a Jupyter Notebook and supporting files that you can download from GitHub and run yourself if you have Jupyter Notebook server with an R kernel installed.

All test “data” used here are just random numbers 🙂

In [1]:
# This line prevents SVG output, which does not play well with export to HTML
options(jupyter.plot_mimetypes = c("text/plain", "image/png" ))
In [2]:
# Load the example "data"
gLogCpmData = as.matrix(read.table("heatmap_test_matrix.txt"))
gLogCpmData
Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9 Sample10 ⋯ Sample47 Sample48 Sample49 Sample50 Sample51 Sample52 Sample53 Sample54 Sample55 Sample56
GeneA 30 67 34 98 32 3 79 15 6 18 97 49 12 6 87 58 55 13 48 28
GeneB 80 70 28 51 74 76 85 98 7 64 45 69 1 9 8 49 97 5 83 66
GeneC 43 36 41 24 71 76 91 50 81 57 21 10 75 35 77 92 85 73 97 12
GeneD 88 66 57 8 11 91 71 84 89 63 12 16 61 42 48 96 26 84 75 78
GeneE 90 57 73 51 86 32 22 78 84 31 37 40 1 37 74 79 89 79 68 42
GeneF 31 87 65 36 64 15 28 89 94 58 58 32 11 18 81 47 27 60 79 91
GeneG 1 93 70 64 98 28 65 96 83 2 51 83 99 3 37 54 6 22 56 19
GeneH 27 18 37 85 59 61 85 3 16 7 38 14 22 66 100 14 99 37 39 51
GeneI 85 35 24 73 21 25 45 80 20 94 34 10 59 40 100 32 40 11 32 4
GeneK 16 55 76 60 40 48 85 90 24 44 66 53 91 23 90 48 61 56 47 74
In [3]:
# Load the example annotation/metadata
gAnnotationData = read.table("heatmap_test_annotation.txt")
gAnnotationData
sample_name subject_drug
1 Sample1 MiracleDrugA
2 Sample2 MiracleDrugA
3 Sample3 MiracleDrugA
4 Sample4 MiracleDrugA
5 Sample5 MiracleDrugB
6 Sample6 MiracleDrugB
7 Sample7 MiracleDrugB
8 Sample8 MiracleDrugB
9 Sample9 MiracleDrugB
10 Sample10 MiracleDrugB
11 Sample11 MiracleDrugB
12 Sample12 MiracleDrugC
13 Sample13 MiracleDrugC
14 Sample14 MiracleDrugC
15 Sample15 MiracleDrugC
16 Sample16 MiracleDrugC
17 Sample17 MiracleDrugC
18 Sample18 MiracleDrugC
19 Sample19 MiracleDrugC
20 Sample20 MiracleDrugA
21 Sample21 MiracleDrugA
22 Sample22 MiracleDrugA
23 Sample23 MiracleDrugA
24 Sample24 MiracleDrugA
25 Sample25 MiracleDrugA
26 Sample26 MiracleDrugA
27 Sample27 MiracleDrugA
28 Sample28 MiracleDrugA
29 Sample29 MiracleDrugA
30 Sample30 MiracleDrugA
31 Sample31 MiracleDrugC
32 Sample32 MiracleDrugC
33 Sample33 MiracleDrugC
34 Sample34 MiracleDrugC
35 Sample35 MiracleDrugC
36 Sample36 MiracleDrugC
37 Sample37 MiracleDrugC
38 Sample38 MiracleDrugC
39 Sample39 MiracleDrugA
40 Sample40 MiracleDrugA
41 Sample41 MiracleDrugA
42 Sample42 MiracleDrugA
43 Sample43 MiracleDrugA
44 Sample44 MiracleDrugA
45 Sample45 MiracleDrugA
46 Sample46 MiracleDrugA
47 Sample47 MiracleDrugA
48 Sample48 MiracleDrugA
49 Sample49 MiracleDrugA
50 Sample50 MiracleDrugA
51 Sample51 MiracleDrugA
52 Sample52 MiracleDrugA
53 Sample53 MiracleDrugA
54 Sample54 MiracleDrugA
55 Sample55 MiracleDrugA
56 Sample56 MiracleDrugA
In [4]:
# Make helper function to map metadata category to color
mapDrugToColor<-function(annotations){
    colorsVector = ifelse(annotations["subject_drug"]=="MiracleDrugA", 
        "blue", ifelse(annotations["subject_drug"]=="MiracleDrugB", 
        "green", "red"))
    return(colorsVector)
}

Table of Contents

heatmap

heatmap is the built-in option for heat maps in R:

In [5]:
# Test heatmap with column annotations
testHeatmap<-function(logCPM, annotations) {    
    sampleColors = mapDrugToColor(annotations)
    heatmap(logCPM, margins=c(5,8), ColSideColors=sampleColors)
}

testHeatmap(gLogCpmData, gAnnotationData)

Not bad, but there are no legends for either the main values or the annotation information.

Table of Contents

heatmap.2

heatmap.2 is an “enhanced” heat map function from the add-on package gplots (not to be confused with ggplot!):

In [6]:
install.packages("gplots")
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
In [7]:
library(gplots)

# Test heatmap.2 with column annotations and custom legend text
testHeatmap2<-function(logCPM, annotations) {    
    sampleColors = mapDrugToColor(annotations)
    heatmap.2(logCPM, margins=c(5,8), ColSideColors=sampleColors,
        key.xlab="log CPM",
        key=TRUE, symkey=FALSE, density.info="none", trace="none")
}

testHeatmap2(gLogCpmData, gAnnotationData)
Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

I turned off a few of the default options (density.info, trace) to make the graphic a bit less busy. The default main legend is nice, but I don’t see an option to include a legend for the annotation information.

Table of Contents

aheatmap

aheatmap, which stands for “annotated heatmap”, is a heat map plotting function from the NMF package:

In [8]:
install.packages("NMF")
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
In [9]:
library(NMF)

# Test aheatmap with column annotations
testAheatmap<-function(logCPM, annotations) {    
    aheatmap(logCPM, annCol=annotations[
        "subject_drug"])
}

testAheatmap(gLogCpmData, gAnnotationData)
Loading required package: pkgmaker
Loading required package: registry

Attaching package: 'pkgmaker'

The following object is masked from 'package:base':

    isNamespaceLoaded

Loading required package: rngtools
Loading required package: cluster
NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 3/4
  To enable shared memory capabilities, try: install.extras('
NMF
')

Yay, legends for both the main data and the annotations! However, note that something weird is going on here: The dendrograms aren’t showing up right. It appears that somehow the body of the heatmap is overlapping with the finer levels of the dendrograms at both top and left. There may be a way to fix this by digging further into the settings of aheatmap, but since I’m looking for something easy to use out-of-the-box, I consider this a disqualifier for my usage.

Table of Contents

pheatmap

pheatmap, where the “p” stands for “pretty”, is the sole function of the package pheatmap:

In [10]:
install.packages("pheatmap")
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
In [11]:
library(pheatmap)

# Test pheatmap with two annotation options
testPheatmap<-function(logCPM, annotations) {    
    drug_info = data.frame(annotations[,"subject_drug"])
    rownames(drug_info) = annotations[["sample_name"]]
    
    # Assign the column annotation straight from 
    # the input annotation dataframe
    pheatmap(logCPM, annotation_col=drug_info, 
        annotation_names_row=FALSE,
        annotation_names_col=FALSE,
        fontsize_col=5)
    
    # Assign the column annotation to an intermediate
    # variable first in order to change the name 
    # pheatmap uses for its legend
    subject_drug = annotations[["subject_drug"]]
    drug_df = data.frame(subject_drug)
    rownames(drug_df) = annotations[["sample_name"]]
    
    pheatmap(logCPM, annotation_col=drug_df, 
        annotation_names_row=FALSE,
        annotation_names_col=FALSE,
        fontsize_col=5)
}
testPheatmap(gLogCpmData, gAnnotationData)

Again, nice to have legends for both main and annotation information. Note that:

  1. You control the annotation legend title through the variable name, which I consider suboptimal as variable names often do not read nicely as English text.
  2. The function uses row names to match annotations to data, so all data and annotations must be contained in dataframes (not matrices).

Table of Contents

heatmap3

heatmap3 is the central function of the heatmap3 package. Beware that this is different from “heatmap.3”, of which there are numerous versions (e.g., here, here, and here–apparently a lot of people felt heatmap.2 needed an upgrade! I don’t investigate these others here because I haven’t seen them discussed online by users very often.)

In [12]:
install.packages("heatmap3")
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
In [13]:
library(heatmap3)

# Test heatmap3 with several annotation options
testHeatmap3<-function(logCPM, annotations) {    
    sampleColors = mapDrugToColor(annotations)
    
    # Assign just column annotations
    heatmap3(logCPM, margins=c(5,8), ColSideColors=sampleColors) 
    
    # Assign column annotations and make a custom legend for them
    heatmap3(logCPM, margins=c(5,8), ColSideColors=sampleColors, 
        legendfun=function()showLegend(legend=c("MiracleDrugA", 
        "MiracleDrugB", "?"), col=c("blue", "green", "red"), cex=1.5))
    
    # Assign column annotations as a mini-graph instead of colors,
    # and use the built-in labeling for them
    ColSideAnn<-data.frame(Drug=annotations[["subject_drug"]])
    heatmap3(logCPM,ColSideAnn=ColSideAnn,
        ColSideFun=function(x)showAnn(x),
        ColSideWidth=0.8)
}
             
testHeatmap3(gLogCpmData, gAnnotationData)

This one follows the syntax of heatmap.2, which is good if you already know the latter. However, its added functionality is quite complicated … definitely complicated enough to get me into trouble (e.g., in the second option above, my annotation legend runs into my heat map and I’ve lost the main legend). It may also be complicated enough to get me out of trouble again (e.g., via explicit setting of the legend and/or heat map placement) but it would clearly take more digging.

Table of Contents

annHeatmap2

annHeatmap2 is the core function of the Heatplus package. Unlike other packages discussed in this evaluation, Heatplus is available through the bioconductor bioinformatics software project rather than through CRAN.

In [14]:
# Source bioconductor
source("http://bioconductor.org/biocLite.R")
biocLite("Heatplus")
Bioconductor version 3.3 (BiocInstaller 1.22.3), ?biocLite for help
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.3 (BiocInstaller 1.22.3), R 3.3.0 (2016-05-03).
Installing package(s) 'Heatplus'
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
Old packages: 'AnnotationDbi', 'DBI', 'IRanges', 'IRdisplay', 'Matrix', 'R6',
  'Rcpp', 'S4Vectors', 'curl', 'devtools', 'digest', 'httr', 'irlba',
  'jsonlite', 'limma', 'manipulate', 'mgcv', 'mime', 'plyr', 'repr',
  'rstudioapi', 'statmod', 'stringi', 'stringr', 'survival', 'withr'
In [15]:
library(Heatplus)

# Test annHeatmap2 with column annotations
testAnnHeatmap2<-function(logCPM, annotations){
    ann.dat = data.frame(annotations[,"subject_drug"])

    plot(annHeatmap2(logCPM, legend=2,
        ann = list(Col = list(data = ann.dat))))
}

testAnnHeatmap2(gLogCpmData, gAnnotationData)

Like heatmap3, annHeatmap2 does metadata annotations as a mini-graph; apparently it doesn’t do such annotations as color bars (?!) It requires that all annotation (and dendrogram, etc) options be passed in as lists, which clearly offers a lot of powerful abilities but is sort of heavy-weight.

Table of Contents

Summary

Below I summarize the features I assessed for these tools:

feature heatmap heatmap.2 aheatmap pheatmap heatmap3 annHeatmap2
source built-in cran cran cran cran bioconductor
can add main legend x x x x x
can control main legend text x x
can add row/col annotations x x x x x x
can specify annotations as table column x x x
can add annotation legend x x x ~
can control annotation legend text ~ x ~
notes no clear advantages pretty nice results with little tinkering Appears UNUSABLE as dendrograms show up wrong, at least in notebook pretty nice results with little tinkering powerful but complicated powerful but complicated; doesn’t support ColSideColors?

Table of Contents

Recommendation

I recommend using heatmap3 (NB: not “heatmap.3”). It mimics the easy-to-use interface of heatmap.2 but can be extended into more complex settings if you later find you need fine-grained control.

Table of Contents

« Return

See How CCBB Can Help With Your Bioinformatics Data

Request Free Consult 858-822-6258