My problem may not be well suited to SO, but I am looking for a solution (in R or Python, preferably R) for creating heatmaps of data that has two extreme ends. Consider the following data.

+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| …  |     X1      |     X2      |     X3      |     X4      |     X5      |     X6      |     X7      |     X8      |     X9      |     X10     |     X11     |     X12     |
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|  1 | 0.960023745 | 0.006412462 | 0.002413886 | 1.75E-06    | 1.33E-07    | 6.53E-07    | 0.000789362 | 1.56E-07    | 0.027248026 | 2.54E-05    | 0.000108822 | 0.002949816 |
|  2 | 0.013783554 | 0.960582857 | 0.010711838 | 0.003933983 | 0.002573642 | 0.001472307 | 0.000319789 | 0.000195265 | 1.87E-05    | 1.29E-06    | 0.004194081 | 0.002209041 |
|  3 | 0.000839561 | 0.005466858 | 0.944159921 | 0.023892784 | 0.001752099 | 0.000828122 | 0.000493376 | 1.84E-06    | 0.011739846 | 0.000879784 | 9.53E-05    | 0.00980562  |
|  4 | 2.26E-08    | 0.004108291 | 0.010781282 | 0.966410413 | 0.010459999 | 3.04E-05    | 1.64E-06    | 0.001983494 | 0           | 0.000225223 | 0.002846474 | 0.0031448   |
|  5 | 0           | 0.003175902 | 0.002023363 | 0.010022482 | 0.919020424 | 0.032083951 | 0.001814906 | 0.030203657 | 2.02E-06    | 7.07E-05    | 0.001165208 | 0.000413012 |
|  6 | 7.34E-08    | 0.002817014 | 0.000931738 | 7.01E-05    | 0.026999736 | 0.947850807 | 0.003017895 | 0.017994113 | 0           | 0.00011791  | 0.000194055 | 0           |
|  7 | 0.001857195 | 0.000220267 | 0.001523402 | 1.23E-05    | 0.001915852 | 0.010193007 | 0.960227998 | 0.012040256 | 0.007093175 | 0.001441301 | 0.002149965 | 0.001306157 |
|  8 | 0           | 0.000337953 | 0           | 0.00536237  | 0.030409165 | 0.01670267  | 0.009929247 | 0.936720524 | 0           | 0           | 0.000503316 | 3.12E-05    |
|  9 | 0.00350741  | 2.38E-06    | 0.002294787 | 1.17E-06    | 9.38E-08    | 8.74E-08    | 0.000252812 | 4.25E-10    | 0.984092182 | 0.003173648 | 2.42E-05    | 0.006649569 |
| 10 | 0.000126558 | 4.85E-05    | 0.001686418 | 0.000202837 | 3.87E-05    | 9.82E-05    | 0.000425687 | 0           | 0.013116146 | 0.983428814 | 5.28E-05    | 0.000776452 |
| 11 | 0.000170592 | 0.002728779 | 0.000117028 | 0.002794149 | 0.000621607 | 0.000224662 | 0.000969203 | 0.000299963 | 0.000629235 | 4.68E-05    | 0.991344498 | 5.02E-05    |
| 12 | 0.004371355 | 0.001246307 | 0.02523568  | 0.007498292 | 0.000186287 | 6.00E-07    | 0.000956249 | 2.93E-05    | 0.0590514   | 0.001253133 | 8.40E-05    | 0.900059314 |
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+

Consider the first row. The X1 entry is very large compared to the other entries in that row, and the same holds for the diagonal entry of every row. The heatmap this data generates looks like the following:

[image: heatmap of the data above]

As you can see, the diagonal is very strong compared to the other cells (this is visible in the data and is actually expected). I am just trying to find a way to "darken" the other colors. I'm mainly looking for a ggplot solution. Nothing I've tried works.

My current R code is:

heatmap(data.matrix(result_matrix), Rowv=NA, Colv=NA, col = rev(heat.colors(256)), margins=c(5,10))

1 Answer

The basic idea is to put the fill colors on a logarithmic scale. Here is a ggplot solution.

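library(ggplot2)
library(reshape2)
# df is assumed to hold the table above as a data frame,
# e.g. df <- as.data.frame(result_matrix)
df$id <- rownames(df)          # keep the row labels as a column
gg <- melt(df,id="id")         # long format: id, variable, value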
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="log10",na.value="white")+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))

The key here is trans="log10" in the call to scale_fill_gradientn(...). One problem with logs is that you have zeros in your data: log10(0) is -Inf, which cannot be mapped to a color and is treated as missing. Using na.value="white" deals with that (you could make it another color if that was appropriate in your use case).
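To see the issue with zeros concretely, here is a small base-R sketch using a few values from the first row of the table plus an exact zero. The flooring step at the end is only one possible workaround and is an assumption of mine, not something the answer uses; the name eps and the choice of floor value are hypothetical.

```r
# A few values from row 1 of the question's table, plus an exact zero
x <- c(0.960023745, 0.006412462, 1.75e-06, 0)

# On a log scale the zero becomes -Inf, so it has no place on the color scale
log_x <- log10(x)
print(log_x)
print(is.finite(log_x))

# Hypothetical workaround (assumption): floor zeros at the smallest
# positive value so that every cell gets a finite log
eps <- min(x[x > 0])
x_floored <- pmax(x, eps)
log_floored <- log10(x_floored)
print(log_floored)
```

Flooring changes the data that is displayed, so na.value is the more honest choice when a zero really means "nothing here".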

The calls to scale_x... and scale_y... are just to compress the axes so the tiles cover the whole plot (ggplot adds a bit of empty space by default which is distracting in heatmaps).

EDIT: Response to OP's comment.

This business of "making the diagonal pop out more" is an aesthetic choice which has almost nothing to do with the data, and will probably lead to a misleading graphic. I do not recommend it. Having said that, you can always choose a different transformation.

# reorder the y-axis: the ids are character strings, so by default
# they are sorted alphabetically ("1", "10", "11", "12", "2", ...)
gg$id <- factor(gg$id,levels=unique(gg$id))

# square root scale
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="sqrt",na.value="white")+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))

#logit scale; need to set breaks=... to avoid labels overlapping
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="logit",na.value="white",breaks=5*10^-(0:8))+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))
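On the y-axis order: the ids come from rownames(df), so they are character strings, and a factor built from strings gets alphabetically sorted levels unless told otherwise. A minimal base-R sketch of what factor(..., levels=unique(...)) changes (the vector id below is a stand-in for gg$id, not the actual data):

```r
# Stand-in for gg$id: row names are character strings, not numbers
id <- as.character(1:12)

# Default factor levels are sorted alphabetically, so "10" comes before "2"
default_levels <- levels(factor(id))
print(default_levels)

# Taking the levels from unique(id) keeps the order of first appearance
ordered_id <- factor(id, levels = unique(id))
print(levels(ordered_id))
```

ggplot places discrete values on an axis in level order, which is why setting the levels explicitly fixes the plot.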

4 Comments

I see a strong diagonal there, but the other colors do overpower it. Is there a way to get the diagonal to pop out more? Also, on my machine the y axis is not ordered.
Thanks. I have no idea why my y axis is not ordered. I'll try to debug it myself. Here is a screenshot: imgur.com/sR2l00i. I've taken your advice not to do the "pop up diagonal". However, I am now looking for a way to print the actual values in each box (maybe *100 to make them into integers).
Sorry - I misread your comment, thought the x-axis was in the wrong order. Using gg$id <- factor(gg$id,levels=unique(gg$id)) should fix it. I've edited the answer.
Okay, thanks, that works. But could you explain what you did? I can't seem to figure it out, and that way I can learn too. Also, any idea how to include the actual numbers in each cell?
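
On the question of showing the numbers in each cell: one option is a geom_text() layer drawn on top of the tiles. This is a minimal sketch and not part of the original answer; the 3x3 data frame is a hypothetical stand-in for the real table, and the round(value * 100) labels follow the "*100" idea from the comment above.

```r
library(ggplot2)
library(reshape2)

# Hypothetical stand-in for the question's data (assumption, not the real table)
df <- data.frame(X1 = c(0.96, 0.01, 0),
                 X2 = c(0.03, 0.96, 0.02),
                 X3 = c(0.01, 0.03, 0.98))
df$id <- rownames(df)
gg <- melt(df, id = "id")
gg$id <- factor(gg$id, levels = unique(gg$id))  # keep rows in original order

p <- ggplot(gg, aes(x = variable, y = id, fill = value)) +
  geom_tile() +
  # print each value as a rounded percentage inside its tile
  geom_text(aes(label = round(value * 100))) +
  scale_fill_gradientn(colours = rev(heat.colors(10)),
                       trans = "log10", na.value = "white") +
  coord_fixed() +
  scale_x_discrete(expand = c(0, 0)) + scale_y_discrete(expand = c(0, 0))
print(p)
```

With many small values most labels round to 0, so a finer format (for example one decimal place) may be more useful on the real data.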
