Clustering and Heatmap Analysis

In order for clustering to be successful, the rows of a data table have to be normalized so that they share a similar data distribution. This is especially true for gene expression data, where raw intensity values differ widely between genes and are not directly meaningful unless compared to a control (see the tutorial on Data import and processing). Start with the normalized data from the previous tutorial.
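To illustrate why the comparison to a control matters, the Python sketch below computes log2 ratios for single rows. The function name `log2_ratios` and the column layout (control intensity in the first column) are hypothetical choices for illustration; the processing in the previous tutorial may be organized differently. Two genes whose raw intensities differ tenfold end up with identical log2 ratios, which is what makes rows comparable for clustering.

```python
import math


def log2_ratios(values, control_index=0):
    """Return log2(value / control) for every non-control column of one row.

    Hypothetical helper: assumes all intensities are positive and that the
    control intensity sits at position `control_index` in the row.
    """
    control = values[control_index]
    return [math.log2(v / control) for i, v in enumerate(values) if i != control_index]


# Hypothetical rows: control, low dose, high dose intensities.
bright_gene = [200.0, 400.0, 50.0]
dim_gene = [20.0, 40.0, 5.0]  # ten times dimmer, same relative response

print(log2_ratios(bright_gene))  # [1.0, -2.0]
print(log2_ratios(dim_gene))     # [1.0, -2.0]
```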

Clustering is usually performed on a selected subset of the dataset, because clustering a whole dataset of thousands of rows can be computationally intensive. First, we select the rows in which the relative gene expression (log2 ratio values) varies the most, using the standard deviation of each row as the measure of variability.
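The row filter described above can be sketched in Python as follows. This is an illustrative sketch, not the TableView implementation: the helper `top_variable_rows`, the dictionary layout (gene name mapped to its list of log2 ratios) and the use of the sample standard deviation (`statistics.stdev`) rather than the population formula are all assumptions. Because the standard deviation is the square root of the variance, ranking rows by either gives the same order.

```python
import statistics


def top_variable_rows(rows, threshold=0.8):
    """Keep rows whose log2-ratio standard deviation exceeds `threshold`.

    Hypothetical helper. `rows` maps gene name -> list of log2 ratios
    (at least two values per row). Uses the sample standard deviation;
    which formula TableView uses is an assumption here.
    Returns (name, values, sd) tuples sorted by descending sd.
    """
    scored = [(name, values, statistics.stdev(values)) for name, values in rows.items()]
    # Sort descending, like clicking the column header twice.
    scored.sort(key=lambda item: item[2], reverse=True)
    # Keep everything above the cutoff.
    return [item for item in scored if item[2] > threshold]


# Hypothetical log2 ratios for three genes.
data = {
    "geneA": [0.1, 0.2, -0.1, 0.0],
    "geneB": [2.0, -1.0, 1.5, -2.0],
    "geneC": [1.0, 1.0, -1.0, -1.0],
}
selected = top_variable_rows(data)
print([name for name, _, _ in selected])  # ['geneB', 'geneC']
```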

  1. Select the log2-transformed columns and compute the standard deviation of each row by clicking the corresponding button on the TableView toolbar.
  2. Click the header of the new standard deviation column, then click it again to sort the rows in descending order of standard deviation.
    The sorted table should look like this, with the standard deviations in the right-most column.
  3. Click the “row selection” button to enable row-wise selection in the table. Select all rows with a standard deviation greater than 0.8 (start from the top and continue until you reach 0.8 in the standard deviation column).
  4. Copy the selection to a new table using the corresponding toolbar button. This opens a new TableView window containing a new dataset made of the selected rows. In this new table, remove all columns except the log2-transformed data and the Names column. All subsequent operations are performed on this new table.
  5. Select all columns except the Names column (the only data columns left should now be the log2-transformed ones). Click the heatmap button and select green-red coloring. Here you can manually enter the data range for color mapping (by default [-2,2], meaning that all values below -2 are colored brightest green, all values above 2 are colored brightest red, and intermediate values are colored according to the gradient). You can adjust the column widths with the “resize” button to make the view easier to read.
  6. Next, click the “cluster” button, select “Re-order rows”, and click OK.

    The clustered heatmap view should look like this.
  7. You now have a clustered heatmap of the genes whose expression varies most in response to low- and high-dose doxorubicin. If you resize the rows so that the whole heatmap is visible, you will notice four main clusters. From top to bottom: one where gene expression goes up at the high dose but not at the low dose; one where expression goes up at the high dose but down at the low dose; one where expression goes up at the low dose but down at the high dose; and a large cluster where expression goes strongly down at the high dose but not at the low dose.
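Steps 5 and 6 can be approximated outside the application with the Python sketch below. It is an illustration under stated assumptions, not necessarily what the “cluster” button does: the tutorial does not say which distance metric or linkage method is used, so the hypothetical `cluster_order` uses naive average-linkage hierarchical clustering with Euclidean distance, and the hypothetical `green_red` assumes a black midpoint at 0 on the default [-2,2] scale.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(rows):
    """Return row indices reordered by average-linkage hierarchical clustering.

    Naive O(n^3) agglomerative clustering (assumed method). Each cluster is
    kept as its list of leaf indices in dendrogram order, so merging two
    clusters concatenates their lists.
    """
    clusters = [[i] for i in range(len(rows))]
    dist = [[euclidean(a, b) for b in rows] for a in rows]  # pairwise row distances

    def average_distance(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is still valid afterwards
        clusters[p] = merged
    return clusters[0] if clusters else []


def green_red(value, low=-2.0, high=2.0):
    """Map a log2 ratio to (R, G, B): <= low brightest green, >= high brightest red.

    The black midpoint at 0 is an assumption about the tool's gradient.
    """
    clipped = max(low, min(high, value))
    if clipped >= 0:
        return (round(255 * clipped / high), 0, 0)
    return (0, round(255 * clipped / low), 0)


# Hypothetical log2 ratios (low dose, high dose) for four genes.
profiles = [[2.0, 2.1], [-1.0, -1.2], [2.1, 1.9], [-0.9, -1.1]]
order = cluster_order(profiles)
print(order)  # [0, 2, 1, 3]
for i in order:
    print([green_red(v) for v in profiles[i]])
```

Rows with similar log2-ratio profiles end up adjacent in the returned order, which is what produces contiguous color blocks like the clusters described in step 7. Dendrogram leaf order is not unique (each merge could be flipped), so another tool may show the same clusters in a different vertical order.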

At this point, you can obtain an image of the clustered heatmap for use as a figure by printing the TableView (Print button on the application toolbar) to a PDF, which can then be opened in any image-processing software.