Reshape the data in R

Question

I have dataset with the following format. I was trying to do that with reshape2 package in R, but it gives an inappropriate format(binary variables for all pages). Is there any method available that can reshape the data set in the required format as below.

Input format:
User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html

Output format:
User    page1       page2         page3         page4         page5
1       index.html  search.html   help.html     contact.html  NA
2       help.html   contact.html  NA            NA            NA
3       index.html  search.html   feedback.html NA            NA

@Ananda Mahto I am interesting in knowing the visited pages of every users for website from Google Analytics data. You may know Google analytics provides the data in vertical manner so need to restructure the data. — Vignesh
– Vignesh, Commented Apr 27, 2013 at 13:18
The data are in that format because it is usually easier to do analysis on "long" data rather than on "wide" data. That said, you may also be interested in exploring the aggregate function, with which you can condense your "Pages" variable into a list by user. Something like aggregate(as.character(Pages) ~ User, mydf, I, simplify = FALSE), where "mydf" is the name of your dataset. — A5C1D2H2I1M1N2O1R2T1
– A5C1D2H2I1M1N2O1R2T1, Commented Apr 27, 2013 at 15:51

zelite · Accepted Answer · 2013-04-26 12:13:43Z

9

Use the function dcast from the reshape2 package:

library(reshape2)

txt <- "User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html"

mydf <- read.table(text=txt, header=TRUE)

#creating a new column to count the page number:
mydf$page <- paste("Page", unlist((sapply(table(mydf$User), seq))))  

new.df <- dcast( mydf, User ~ page, value.var="Pages") #here the magic happens. 

> print(new.df)
   User     Page 1       Page 2        Page 3       Page 4
1    1 index.html  search.html     help.html contact.html
2    2  help.html contact.html          <NA>         <NA>
3    3 index.html  search.html feedback.html         <NA>

answered Apr 26, 2013 at 12:13

zelite

1,54318 silver badges43 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Vignesh Over a year ago

Hey zelite, Thanks for code this. Its working on this sample dataset. But when I am applying this on large dataset it gives me the binary values(0 or 1) instead of the page information(like index.html, search.html). Is there something need to be updated while applying on another dataset with the same format (User, Pages).

zelite Over a year ago

Can you check the str of your original dataframe? is there something in the wrong type? an example of code where the output becomes 0 or 1 would be helpful to understand were the code fails.

Anthony Damico · Accepted Answer · 2013-04-26 12:19:39Z

2

incorporating @zelite's amazing unlist trick

x <- read.table( text = "User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html", h=T)

library(reshape2)

x$tv <- unlist((sapply(table(x$User), seq)))

reshape( x , idvar = 'User' , timevar = 'tv' , direction = 'wide' )

edited Apr 26, 2013 at 12:19

answered Apr 26, 2013 at 12:11

Anthony Damico

5,6847 gold badges52 silver badges84 bronze badges

4 Comments

Arun Over a year ago

Another possibility using plyr's rbind.fill with unstack: > rbind.fill(lapply(unstack(df, Pages ~ User), function(x) as.data.frame(matrix(x, nrow=1))))

Vignesh Over a year ago

Hi Anthony, Thanks for optimized code. This is generating the structure but I need all the pages information to be in left side of the NA values (As you already have provided for previous code but it takes too much time to complete on my large dataset (21,000 rows). Can we have function for doing that?)

A5C1D2H2I1M1N2O1R2T1 Over a year ago

Provided that "Pages" is a character variable, a more direct way to create x$tv is: x$tv <- with(x, ave(Pages, User, FUN = seq_along)).

Anthony Damico Over a year ago

@AnandaMahto so long as you wrap Pages in as.numeric :)

Collectives™ on Stack Overflow

Reshape the data in R

2 Answers 2

2 Comments

4 Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

4 Comments

Related