how to group based on category in ggtree

Question

I have a ggtree plot of samples from three plant varieties (plus an 'unk' category for unknowns). Since I plan to add additional info in the circos around it, I was wondering whether there is a simple way to ensure that samples of the same variety sit close to each other in the plot.

Below is what the tree looks like. As you can see, the varieties are all mixed together, whereas I would like them grouped into the four distinct categories. I tried group_by and group_split but couldn't get the desired output.

CODE

library(dplyr)
library(tibble)
library(ggtree)
library(ggplot2)
library(phangorn)
library(ggtreeExtra)
library(RColorBrewer)


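# Neighbor-joining tree from the pairwise IBS distance matrix
tree_NJ <- NJ(ibs_matrix_small)

### GGTREE PERSONALIZATION
# Fix the legend order of the variety categories
meta_df_small$variety <- factor(meta_df_small$variety, levels=c('wt', 'lr', 'cv', 'unk'))
# Circular cladogram; %<+% attaches the metadata, matching its first column to the tip labels
t1 <- ggtree(tree_NJ, branch.length='none', layout='circular') %<+% meta_df_small +
  geom_tippoint(aes(color=variety), size=1.5) +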
  scale_color_manual(values=c(brewer.pal(12, "Set3")[c(5, 1, 4)], "black"), 
                     guide=guide_legend(keywidth=1, keyheight=0.8, ncol=2, order=1)) + 
  theme(legend.title=element_text(hjust=.5,face='italic'))
t1

ibs_matrix_small — 10 samples

structure(c(0, 0.0505857, 0.0440299, 0.0456033, 0.0467799, 0.0469243, 
0.0499126, 0.0480139, 0.0474165, 0.0476661, 0.0505857, 0, 0.0491273, 
0.0475119, 0.0481583, 0.0459553, 0.0483575, 0.0467083, 0.046498, 
0.0486043, 0.0440299, 0.0491273, 0, 0.0435812, 0.0454869, 0.0480994, 
0.0488441, 0.0472959, 0.0474109, 0.0485103, 0.0456033, 0.0475119, 
0.0435812, 0, 0.0423738, 0.0449022, 0.0490488, 0.0436106, 0.0452121, 
0.0444688, 0.0467799, 0.0481583, 0.0454869, 0.0423738, 0, 0.0405592, 
0.0519895, 0.0447788, 0.0463662, 0.0478919, 0.0469243, 0.0459553, 
0.0480994, 0.0449022, 0.0405592, 0, 0.0450354, 0.0442473, 0.0403124, 
0.0465415, 0.0499126, 0.0483575, 0.0488441, 0.0490488, 0.0519895, 
0.0450354, 0, 0.0481892, 0.0427342, 0.0488455, 0.0480139, 0.0467083, 
0.0472959, 0.0436106, 0.0447788, 0.0442473, 0.0481892, 0, 0.0462259, 
0.0492101, 0.0474165, 0.046498, 0.0474109, 0.0452121, 0.0463662, 
0.0403124, 0.0427342, 0.0462259, 0, 0.0426052, 0.0476661, 0.0486043, 
0.0485103, 0.0444688, 0.0478919, 0.0465415, 0.0488455, 0.0492101, 
0.0426052, 0), dim = c(10L, 10L), dimnames = list(c("INLUP00130", 
"INLUP00131", "INLUP00132", "INLUP00133", "INLUP00134", "INLUP00135", 
"INLUP00136", "INLUP00137", "INLUP00138", "INLUP00139"), NULL))

meta_df_small — 10 samples

structure(list(id = c("INLUP00130", "INLUP00131", "INLUP00132", 
"INLUP00133", "INLUP00134", "INLUP00135", "INLUP00136", "INLUP00137", 
"INLUP00138", "INLUP00139"), variety = c("wt", "lr", "wt", "cv", 
"lr", "lr", "cv", "cv", "wt", "wt"), location = c("ESP", "ESP", 
"ESP", "ESP", "ESP", "ESP", "PRT", "ESP", "ESP", "ESP")), row.names = c(NA, 
10L), class = "data.frame")

EDIT Current result

Intended result

What you are asking for isn't logically possible unless the branches of the tree are allowed to cross each other. But this would be incredibly messy, and with the sheer number of items you have, the relationships would be indecipherable to the viewer. If you want to view the relationships in a coherent tree structure, the groups can't be clustered together, and if you want the groups to be together you can't have a coherent tree structure. Or do I misunderstand what you are asking?
– Allan Cameron, Commented May 24 at 16:40
@AllanCameron Indeed, I agree with you. I had the same feeling about it; however, I wanted to ask just in case I was missing something. In fact, I already have a version of this tree with a box plot for each of the ~300 samples expressing their diversity in terms of SNPs, but I realized the visual is not appealing. Hence, I thought of coloring them by location and generating four big boxplots with the variation within each category, hoping that will make more sense. If it helps, I can add the final figure and what I want to attain; granted, I was trying to proceed by small steps to get there.
– Matteo, Commented May 24 at 16:46
@AllanCameron I added the current result and the intended one as an edit to the main thread.
– Matteo, Commented May 24 at 17:25
I suspected that's what you meant. There's no way to achieve that layout unless the branches cross each other, which would ruin the tree structure.
– Allan Cameron, Commented May 24 at 18:01
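
To make the point in the comments above concrete: a variety is only guaranteed to occupy one unbroken arc of the tree when its samples form a single clade. Below is a minimal sketch of that check using `ape::is.monophyletic()`. The tree, sample names and variety assignment are hypothetical toy values, not the asker's data.

```r
library(ape)

# Toy rooted tree with six tips (hypothetical, for illustration only)
toy_tree <- read.tree(text = "((s1,s2),((s3,s4),(s5,s6)));")

# Made-up variety assignment for the six tips
toy_meta <- data.frame(
    id = c("s1", "s2", "s3", "s4", "s5", "s6"),
    variety = c("wt", "wt", "lr", "cv", "cv", "lr"),
    stringsAsFactors = FALSE
)

# Collect the tip labels belonging to each variety
tips_by_variety <- split(toy_meta$id, toy_meta$variety)

# TRUE when the tips of a variety form exactly one clade
is_clade <- vapply(
    tips_by_variety,
    function(tips) is.monophyletic(toy_tree, tips),
    logical(1)
)
is_clade
```

In this toy case only `wt` forms a clade; `lr` and `cv` are interleaved, so no layout of this tree can draw either of them as one block without crossing branches. With real data, `tree_NJ` and `meta_df_small` would replace the toy objects. Note that `is.monophyletic()` reroots an unrooted tree by default, so results for a neighbor-joining tree should be read with some care.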

miken32 · Accepted Answer · 2025-11-04 00:15:57Z

ggalign can handle this situation. It is an Integrative Composable Visualization Framework for ggplot2. In the development version, I added group support for phylogenetic trees.

ggalign treats each plot as a single circle track, so it can work with all ggplot2 extensions.

This is hard to show in a minimal example, so I've provided code written for this specific issue, with an explanation of each line (you can find more examples on the official site).

Example 1: Groups exist before adding the phylogenetic tree

In this case, the phylogenetic tree will be split into multiple subtrees.

# Initialize a circle layout
circle_discrete(
    ibs_matrix_small,
    radial = coord_radial(inner.radius = 0.2, rotate.angle = TRUE)
) +
    # split the layout into groups
    align_group(meta_df_small$variety) +
    # With existing groups, align_phylo splits the tree into multiple subtrees
    align_phylo(tree_NJ, mapping = aes(color = clade), split = TRUE) +
    # Customize theme for the phylogenetic tree
    theme(
        axis.text.theta = element_text(),
        # Add a panel border for clarity
        panel.background = element_rect(color = "black", fill = NA)
    ) +
    # Set the circular axis guide
    guides(theta = guide_axis_theta(angle = 0)) +

    # Initialize a new plot
    # when no data is supplied, it automatically inherits the layout data,
    # which is `ibs_matrix_small`
    ggalign() +
    # add boxplot layer
    geom_boxplot(aes(x = .discrete_x, y = value, fill = .panel))
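
One thing to watch in Example 1: `meta_df_small$variety` is passed as a plain vector, so it is presumably matched to the rows of the matrix by position (an assumption about `align_group()`, worth checking in its documentation). A minimal base R sketch of putting the metadata in matrix row order first, using toy stand-ins with hypothetical values:

```r
# Toy stand-ins for ibs_matrix_small and meta_df_small (hypothetical values)
toy_matrix <- matrix(
    0, nrow = 3, ncol = 3,
    dimnames = list(c("s1", "s2", "s3"), NULL)
)
toy_meta <- data.frame(
    id = c("s3", "s1", "s2"),
    variety = c("cv", "wt", "lr"),
    stringsAsFactors = FALSE
)

# Find the metadata row that belongs to each matrix row
idx <- match(rownames(toy_matrix), toy_meta$id)

# Fail early if any sample in the matrix has no metadata
stopifnot(!anyNA(idx))

# Reorder so that the i-th variety describes the i-th matrix row
toy_meta_ordered <- toy_meta[idx, ]
toy_meta_ordered$variety
```

In the sample data from the question the two objects already share the same order, so this only matters when the metadata was sorted or filtered separately.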

Example 2: No groups before adding phylo, but groups added after

Groups must follow the ordering defined by the phylogenetic tree; you can use `order2()` to determine the actual ordering:

ordering <- order2(tree_NJ)
group <- sort(sample(c("g1", "g2", "g3"), length(ordering), replace = TRUE))[
    order(ordering)
]
circle_discrete(
    ibs_matrix_small,
    radial = coord_radial(inner.radius = 0.2, rotate.angle = TRUE)
) +
    # No initial groups, align_phylo adds a single phylogenetic tree
    align_phylo(tree_NJ, mapping = aes(color = clade), split = TRUE) +
    # Customize theme
    theme(
        axis.text.theta = element_text(),
        # Add a panel border for clarity
        panel.background = element_rect(color = "black", fill = NA)
    ) +
    # Set the circular axis guide
    guides(theta = guide_axis_theta(angle = 0)) +

    # Add groups after the tree; ordering must follow the phylogenetic tree
    align_group(group) +

    # Initialize a new plot
    # when no data is supplied, it automatically inherits the layout data,
    # which is `ibs_matrix_small`
    ggalign() +
    # add boxplot layer
    geom_boxplot(aes(x = .discrete_x, y = value, fill = .panel))
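
The `sort(...)[order(ordering)]` step in Example 2 can look opaque, so here is a small base R sketch of what it does. It assumes `order2()` returns a permutation like `order()` does, i.e. the index of the observation drawn at each position. The `ordering` vector below is a hypothetical stand-in, not real `order2()` output.

```r
# Hypothetical ordering: position 1 shows observation 3,
# position 2 shows observation 1, and so on
ordering <- c(3L, 1L, 4L, 2L)

# Groups laid out as contiguous blocks along the plotted positions
group_by_position <- c("g1", "g1", "g2", "g2")

# order(ordering) is the inverse permutation: it gives the plotted
# position of each observation, so this yields one group per observation
group <- group_by_position[order(ordering)]
group

# Reading the groups back in plotted order recovers the contiguous blocks
group[ordering]
```

In other words, the groups are defined along the tree order first and then mapped back to the original observation order, which is the form `align_group()` is given in Example 2.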

That's fantastic! I really love it. The thing is, my dataset has 304 samples, and with that many the output is not ideal because the plot is dominated by dots. Also, the boxplots seem to disappear in my viz... I'm happy to share the exact command I'm using to produce the plot, the dataset, and the current output based on Example 1; let me know!
It would be even better if you could share your exact code and dataset so I can take a closer look at why the output behaves that way. If you’re okay with it, I can also include your dataset in my example gallery: yunuuuu.github.io/ggalign-gallery. I recently added a phylogenetic trees example here: yunuuuu.github.io/ggalign-gallery/advanced/….
Absolutely, I think it might be of use to the community that way! Let me know how best to share this data, since on SO there are a few limitations in terms of character count, type of attachments, etc.
Please note that all examples are generated using the development version, since ggplot2 4.0.0 hasn’t been released yet. It will be available on CRAN once ggplot2 itself is updated there, which is expected later this month.
I see; indeed, I'm using the development version based on your suggestion.
@Matteo Feel free to send it over via GitHub, email, or Dropbox, OneDrive, or another netdisk link, whichever works best for you.
I opened an issue on GitHub with output, explanation, and data. Please, feel free to look at it in your own time since I'll be on vacation till the end of August, so no rush :) Here is the link: github.com/Yunuuuu/ggalign/issues/78, get back to me if anything isn't working!
