
Code:

library(plyr)
library(datasets)
data("iris")

iris$Sepal.Length

size <- c()
for (s in iris$Sepal.Length){
  if (s < 5.8){
    size <- c(size, "SMALL")
  } else if(s >= 5.8){
    size <- c(size, "LARGE")
  }
}
iris$Size <- size
plot(table(iris$Species, iris$Size))

Plot:

[image: mosaic plot produced by plot(table(iris$Species, iris$Size))]


I'm wondering how to plot this kind of thing in ggplot.

I have this (it depends on the previous code):

library(ggplot2)
library(ggthemes)  # provides theme_fivethirtyeight()

ggplot(as.data.frame(table(iris$Species, iris$Size)),
       aes(x = Var1, y = Freq, fill = Var2)) +
  geom_bar(stat = "identity", position = "fill") +
  theme_fivethirtyeight() +
  theme(axis.text.x = element_text(size = 15),
        text = element_text(size = 15),
        axis.title = element_text()) +  # theme_fivethirtyeight() hides axis titles
  scale_x_discrete(labels = c("S1", "S2", "S3")) +
  scale_y_continuous(labels = scales::percent) +  # position = "fill" gives proportions
  labs(x = "", y = "Percentage") +
  ggtitle("Something about iris stuff") +
  scale_fill_discrete(name = "Size")

[image: stacked proportional bar chart produced by the ggplot code above]

Which communicates similar information, but it's not the same.

So how can I plot a table in ggplot the way plot(table(a, b)) does? I don't need it to be stylistically identical (otherwise I'd just use base graphics), but I prefer the way the proportions are displayed in that plot to the bars I get with ggplot.

Being able to pass a table object is useful here, because I'm generating the plots with base graphics inside a for loop.

Edit: code that plots within a loop

I'll clean up this post once the question is resolved; I didn't want to delete anything that people might currently be referencing.

Here's some code that plots within a loop, by plotting a table object. I'm not sure how I would go about doing this in ggplot.

rm(list=ls())

library(plyr)
library(datasets)
data("iris")
set.seed(1234)

iris$Sep.Size <- c("SMALL", "LARGE")[(iris$Sepal.Length >= 5.8) + 1]
# create an additional categorical variable, purely for the sake of plotting it
iris$Data.2 <- cut(
  rnorm(150, 10, 2), 
  c(-Inf, 8, 10, 11, Inf),
  labels = c('a', 'b', 'c', 'd'),
  include.lowest = TRUE)


iris.2 <- data.frame(data = iris$Data.2, sepsize = iris$Sep.Size, species = iris$Species)

# plotting tables using a loop: one of them (species vs. species) will be nonsense, but the others are usable.
for (i in seq_along(iris.2)) {
  tbl <- table(iris.2$species, iris.2[, i])  # "tbl" avoids masking base::t()
  plot(tbl)
}

This creates 3 plots.

  • I don't understand exactly what you don't like about the 2nd plot - the absence of spacing? Commented Dec 24, 2018 at 18:29
  • I wanted to be able to plot table objects within a for loop. I thought this example was clear, but thinking about it now perhaps I should have included an example of such plotting. Sorry about that. I'll edit this later to include that Commented Dec 24, 2018 at 18:31
  • Tip: size <- c("SMALL", "LARGE")[(iris$Sepal.Length >= 5.8) + 1L]. A one-liner. No loops at all. R is a vectorized language. Commented Dec 24, 2018 at 19:07
  • @RuiBarradas thanks, how would that be extended to 3 or 4 levels though? Commented Dec 24, 2018 at 19:16
  • You can use findInterval or cut. In the latter case, you would set the labels to the character values you want. In the former, use the return value of findInterval to index a vector of values for size. See the edit to the answer. Commented Dec 24, 2018 at 19:19
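The comments above suggest cut() or findInterval() for more than two levels. Here is a minimal sketch of the three-level case. It assumes hypothetical breakpoints 5.1 and 6.4 and a hypothetical "MEDIUM" label, neither of which comes from the original thread:

```r
data("iris")

# Hypothetical breakpoints and labels, chosen only for illustration
breaks <- c(-Inf, 5.1, 6.4, Inf)
size_labels <- c("SMALL", "MEDIUM", "LARGE")

# cut(): returns a factor; right = FALSE makes the intervals [a, b)
size_cut <- cut(iris$Sepal.Length, breaks, labels = size_labels, right = FALSE)

# findInterval(): returns the interval index, which then indexes the labels
size_fi <- size_labels[findInterval(iris$Sepal.Length, breaks)]

identical(as.character(size_cut), size_fi)
#[1] TRUE
```

Adding a fourth level only means adding one more breakpoint and one more label.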

2 Answers


The following is not exactly what you are asking for but it solves your problem in a somewhat natural way.

It uses the dplyr package to pipe the table to as.data.frame() and then to ggplot().

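library(ggplot2)
library(dplyr)
library(datasets)

data("iris")

size <- c("SMALL", "LARGE")[(iris$Sepal.Length >= 5.8) + 1L]
iris$Size <- size          # table(iris[5:6]) needs column 6 to be Size
tbl <- table(iris[5:6])    # two-way table of Species by Size

tbl %>%
  as.data.frame() %>%      # columns Species, Size, Freq
  ggplot(aes(Species, Freq, fill = Size)) +
  geom_bar(stat = "identity", position = "fill")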

On the downside, you have to load an extra package.
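Because the question asks to pass a table object directly, the same conversion can be wrapped in a small function. This is a sketch only: gg_table() is a hypothetical helper, not part of ggplot2. It relies on as.data.frame() returning one column per table dimension plus a Freq column:

```r
library(ggplot2)

# Hypothetical helper: draw any two-way table as proportional stacked bars
gg_table <- function(tbl) {
  counts <- as.data.frame(tbl)   # columns: dimension 1, dimension 2, Freq
  dims <- names(counts)[1:2]     # Var1/Var2 unless the dimnames are named
  ggplot(counts, aes(x = .data[[dims[1]]], y = Freq, fill = .data[[dims[2]]])) +
    geom_col(position = "fill") +
    labs(x = dims[1], fill = dims[2], y = "Proportion")
}

data("iris")
iris$Size <- c("SMALL", "LARGE")[(iris$Sepal.Length >= 5.8) + 1L]
p <- gg_table(table(iris$Species, iris$Size))
print(p)
```

Since gg_table() only needs a table, it can be dropped into the same for loop that currently calls plot(tbl).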

As for the new vector size, it can also be created with findInterval or with cut.

With findInterval:

Size_Values <- c("SMALL", "LARGE")
i <- findInterval(iris$Sepal.Length, c(0, 5.8, Inf))
size2 <- Size_Values[i]

identical(size, size2)
#[1] TRUE
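To see what findInterval returns, here is a worked example on three sample values. findInterval(x, vec) gives the index i such that vec[i] <= x < vec[i + 1], so a value equal to 5.8 falls in the second interval:

```r
breaks <- c(0, 5.8, Inf)
idx <- findInterval(c(4.3, 5.8, 7.9), breaks)
idx
#[1] 1 2 2
c("SMALL", "LARGE")[idx]
#[1] "SMALL" "LARGE" "LARGE"
```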

With cut, in which case the output is an object of class "factor":

size3 <- cut(iris$Sepal.Length, c(0, 5.8, Inf), labels = Size_Values,
    include.lowest = TRUE, right = FALSE)

identical(size, as.character(size3))
#[1] TRUE
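The right = FALSE argument is what makes cut agree with the >= 5.8 test. It closes intervals on the left, so the boundary value 5.8 goes to "LARGE". With the default right = TRUE, it would go to "SMALL":

```r
breaks <- c(0, 5.8, Inf)
size_labels <- c("SMALL", "LARGE")

# right = TRUE (default): intervals (0, 5.8] and (5.8, Inf]
as.character(cut(5.8, breaks, labels = size_labels))
#[1] "SMALL"

# right = FALSE: intervals [0, 5.8) and [5.8, Inf)
as.character(cut(5.8, breaks, labels = size_labels, right = FALSE))
#[1] "LARGE"
```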

EDIT.

To address the change in the question, the following code plots the two usable tables side by side in the same graphics window. The dataset iris.2 is created reproducibly in the question, because the RNG seed is set with set.seed() before rnorm() is called.

# plotting tables using a loop
# the columns to plot are determined by 
# these 2 instructions
ref <- "species"
others <- names(iris.2)[names(iris.2) != ref]

old_par <- par(mfrow = c(1, 2))
for(i in others){
  tbl <- table(iris.2[[ref]], iris.2[[i]])
  plot(tbl)
}
par(old_par)
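If ggplot2 output is wanted instead of base mosaic plots, the same loop can be sketched by converting each table to a data frame, as in the first part of this answer. This is a minimal sketch: the proportional-bar styling and the plots list are choices made here, not part of the original answer. The first lines rebuild iris.2 exactly as in the question so the block runs on its own:

```r
library(ggplot2)
library(datasets)
data("iris")
set.seed(1234)

# Rebuild iris.2 as in the question
iris$Sep.Size <- c("SMALL", "LARGE")[(iris$Sepal.Length >= 5.8) + 1]
iris$Data.2 <- cut(rnorm(150, 10, 2), c(-Inf, 8, 10, 11, Inf),
                   labels = c("a", "b", "c", "d"), include.lowest = TRUE)
iris.2 <- data.frame(data = iris$Data.2, sepsize = iris$Sep.Size,
                     species = iris$Species)

ref <- "species"
others <- names(iris.2)[names(iris.2) != ref]

plots <- list()
for (i in others) {
  # as.data.frame() on the table gives columns Var1 (ref), Var2 (i) and Freq
  counts <- as.data.frame(table(iris.2[[ref]], iris.2[[i]]))
  plots[[i]] <- ggplot(counts, aes(x = Var1, y = Freq, fill = Var2)) +
    geom_col(position = "fill") +
    labs(x = ref, fill = i, y = "Proportion")
  print(plots[[i]])  # explicit print() is needed inside a loop
}
```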

1 Comment

  • I've added an example with a for loop; I'm unsure how that would be done with ggplot (or whether it can be).

Perhaps using the ggmosaic package?

library(ggplot2)
library(ggmosaic)
library(datasets)

data("iris")
# geom_mosaic() needs Size to be a column of the data frame
iris$Size <- c("SMALL", "LARGE")[(iris$Sepal.Length >= 5.8) + 1L]

ggplot(data = iris) +
  geom_mosaic(aes(x = product(Size, Species), fill = Size), na.rm = TRUE)

You can then format the chart as you like.

EDIT:

To address the requested loop (based on the iris.2 data frame created in the original question, and reusing code from Rui Barradas's answer above), you can use:

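ref <- "species"
others <- names(iris.2)[names(iris.2) != ref]

for (i in others) {
  tmp <- iris.2[, c(ref, i)]   # keep only the two variables being plotted
  # i is a string, so convert it to a symbol before injecting it with !!
  p <- ggplot(data = tmp) +
         geom_mosaic(aes(x = product(species, !!rlang::sym(i)), fill = !!rlang::sym(i)), na.rm = TRUE)
  print(p)  # explicit print() is needed inside a loop
}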

This should create a different mosaic ggplot for each variable against species. Of course, you can format the charts accordingly inside the loop, or even save each plot in a list and then plot them on the same page if required.
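A minimal sketch of the "save each plot in a list and plot them on the same page" option. It assumes the gridExtra package for the page layout (any layout package would do) and rebuilds iris.2 so the block is self-contained:

```r
library(ggplot2)
library(ggmosaic)
library(gridExtra)   # assumption: used only to arrange the plots on one page
library(datasets)
data("iris")
set.seed(1234)

# Rebuild iris.2 as in the question
iris$Sep.Size <- c("SMALL", "LARGE")[(iris$Sepal.Length >= 5.8) + 1]
iris$Data.2 <- cut(rnorm(150, 10, 2), c(-Inf, 8, 10, 11, Inf),
                   labels = c("a", "b", "c", "d"), include.lowest = TRUE)
iris.2 <- data.frame(data = iris$Data.2, sepsize = iris$Sep.Size,
                     species = iris$Species)

ref <- "species"
others <- names(iris.2)[names(iris.2) != ref]

# Build one mosaic plot per variable and keep them in a list
plots <- lapply(others, function(i) {
  ggplot(data = iris.2[, c(ref, i)]) +
    geom_mosaic(aes(x = product(species, !!rlang::sym(i)), fill = !!rlang::sym(i)),
                na.rm = TRUE) +
    labs(title = paste("species vs", i))
})

# Draw all plots on a single page, two per row
grid.arrange(grobs = plots, ncol = 2)
```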

3 Comments

  • Great! Didn't know about the ggmosaic package.
  • Thanks, how would this be done within a for loop, though? I've edited the post to make this part clearer (hopefully).
  • I'll add the code for a loop later today. The idea is simply to use only the relevant variables in the data frame used for plotting.
