automate repeating models with different data forloop in R

Question

I need to run a lot of replicates on the same model but cycle different data into it on each iteration.

e.g.

db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

for(i in 1:db) {
  # keep model structure but alternate the data
  lm(mpg ~ wt, data = db[i])
}

I need to create a for-loop or a function that can run the model on db1, then swap in db2 and run the same model. I also need them to be stored as separate objects in my R environment e.g. lm1 (for db1) and lm2 (for db2)

Can someone please help me automate this?

thanks

Joel Kandiah · Accepted Answer · 2021-04-06 15:07:32Z

The approach I would use is to map a function over a list of data frames. My preferred method is a nested data frame (a tibble) with one column identifying each data frame, one list-column holding the data frames themselves, and an added list-column holding the fitted linear models.

I have coded a version of this below using purrr's map function, which takes our list of data frames and applies lm to each element.

library(tidyverse)

db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

# Place the data frames in a list (note: do not use c(), which flattens them into one long list of columns)
a <- list(db1, db2 , db3)

# Construct our dataframe
df <- tibble(entry = 1:3, dataframes = a)

df %>% 
  # Map the lm function to all of the dataframes
  mutate(lm = map(dataframes, ~lm(mpg~wt, data = .x)))
#> # A tibble: 3 x 3
#>   entry dataframes          lm    
#>   <int> <list>              <list>
#> 1     1 <df[,11] [32 x 11]> <lm>  
#> 2     2 <df[,11] [32 x 11]> <lm>  
#> 3     3 <df[,11] [32 x 11]> <lm>

Created on 2021-04-06 by the reprex package (v2.0.0)
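Once the models sit in a list-column, the same mapping idea can be used to pull values out of them. The following is a minimal sketch, not part of the original answer: the column names `model`, `intercept` and `slope` and the object names `dfs`, `models` and `fit2` are my own choices for illustration.

```r
library(dplyr)
library(purrr)

# Three copies of mtcars stand in for the real data frames
dfs <- list(mtcars, mtcars, mtcars)

models <- tibble(entry = seq_along(dfs), dataframes = dfs) %>%
  mutate(
    # Fit the same model to every data frame in the list-column
    model     = map(dataframes, ~ lm(mpg ~ wt, data = .x)),
    # Extract one number per model into ordinary numeric columns
    intercept = map_dbl(model, ~ coef(.x)[["(Intercept)"]]),
    slope     = map_dbl(model, ~ coef(.x)[["wt"]])
  )

# A single fitted model can be retrieved by position
fit2 <- models$model[[2]]
```

Keeping the data, the model and the extracted coefficients in one row per data set makes it harder to mix up which fit belongs to which data frame.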

A slightly more intuitive method using only lists could be as follows:

(Note: assign with b[[i]] rather than b[i]. With single brackets only the first component of each fit, its coefficients, is stored, the call to lm is lost, and R warns that the number of items to replace is not a multiple of the replacement length.)

db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

a <- list(db1, db2, db3)

# Pre-allocate a list with one slot per model
b <- vector("list", length(a))

for (i in seq_along(a)) {
  # [[i]] stores the complete lm object
  b[[i]] <- lm(mpg ~ wt, data = a[[i]])
}

# Coefficients of each stored model
lapply(b, coef)
#> [[1]]
#> (Intercept)          wt 
#>   37.285126   -5.344472 
#> 
#> [[2]]
#> (Intercept)          wt 
#>   37.285126   -5.344472 
#> 
#> [[3]]
#> (Intercept)          wt 
#>   37.285126   -5.344472

Created on 2021-04-06 by the reprex package (v2.0.0)
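The question also asks for the models to end up as separate objects such as lm1 and lm2. One possible way to get there from a list is sketched below in base R. The names `dfs` and `fits` are illustrative, and `environment()` is used so the objects land in whatever environment the code runs in (the global environment at top level).

```r
# Named list of data frames; copies of mtcars stand in for the real data
dfs <- list(db1 = mtcars, db2 = mtcars, db3 = mtcars)

# Fit the same model to every data frame; lapply() keeps the list names
fits <- lapply(dfs, function(d) lm(mpg ~ wt, data = d))

# Rename db1 -> lm1, db2 -> lm2, ...
names(fits) <- sub("^db", "lm", names(fits))

# Copy each list element into its own object in the current environment
list2env(fits, envir = environment())

# lm1, lm2 and lm3 now exist as standalone model objects
coef(lm1)
```

In many workflows it is simpler to leave the models in the named list and refer to them as fits$lm1, since the list can still be looped over later.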

PKumar · Accepted Answer · 2021-04-06 18:22:06Z

Put the data frames in a list instead of keeping them as separate objects: db1, db2 and db3 are awkward to loop over individually, whereas a list is easy to iterate over. In the example below, dfs is a list of data frames, and a model is fitted to each element. I built dfs from random samples of mtcars. If your data are already saved as db1, db2 and db3, you can do either of the following:

a) dfs <- list(db1, db2, db3), then use dfs with lapply: mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))

b) dfs <- mget(ls(pattern = '^db\\d+'), envir = globalenv()), where pattern is a regular expression matching the names of your data objects (here, names that start with db followed by a number), then use lapply as above: mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))
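Option b) can be tried out without touching your workspace. The sketch below runs it inside a separate environment; `env`, `db_names` and the `notes` object are hypothetical and exist only for this illustration. It also adds a `$` anchor to the pattern, which is my own tightening so that only names consisting of db plus digits match.

```r
# Hypothetical environment standing in for the workspace that holds db1, db2, ...
env <- new.env()
assign("db1", mtcars, envir = env)
assign("db2", head(mtcars, 20), envir = env)
assign("notes", "not a data frame", envir = env)  # should be ignored

# Collect only the objects whose names are "db" followed by digits
db_names <- ls(env, pattern = "^db\\d+$")
dfs <- mget(db_names, envir = env)

# mget() returns a named list, so the models inherit the names db1, db2
mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))

names(mymodels)
coef(mymodels$db1)
```

Because ls() returns names in alphabetical order by default, an object called db10 would be listed before db2, which matters if you later rely on position rather than name.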

The example below illustrates this with the mtcars data, using randomly selected rows to build each data frame.

# Create a list of data frames by random sampling:
# draw 80% of the rows of mtcars n (= 3) times, with seed 1 for reproducibility

set.seed(1)
n <- 3
prop <- 0.8

# replicate() repeats the sample() call n times and returns a matrix of row
# indices with one column per sample; data.frame() turns each column into a
# list element, and lapply() builds one subset of mtcars per column
dfs <- lapply(data.frame(replicate(n, sample(1:nrow(mtcars), prop*nrow(mtcars)))), function(x) mtcars[x, ])

mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))  # mymodels is your output

Output:

$X1

Call:
lm(formula = mpg ~ wt, data = x)

Coefficients:
(Intercept)           wt  
  38.912167    -5.874795  


$X2

Call:
lm(formula = mpg ~ wt, data = x)

Coefficients:
(Intercept)           wt  
  37.740419    -5.519547  


$X3

Call:
lm(formula = mpg ~ wt, data = x)

Coefficients:
(Intercept)           wt  
  39.463332    -6.051852
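With the models in a list, comparing the replicates is a matter of applying an extractor to each element. The following sketch is an assumption-laden variant rather than a reproduction of the run above: it draws the samples with replicate(..., simplify = FALSE), so the numbers will not match the output shown, and the names `size`, `coef_table` and `r_squared` are mine.

```r
set.seed(1)
n <- 3
size <- floor(0.8 * nrow(mtcars))  # 25 of the 32 rows

# One random subset of mtcars per replicate, stored in a named list
dfs <- replicate(n, mtcars[sample(nrow(mtcars), size), ], simplify = FALSE)
names(dfs) <- paste0("sample", seq_len(n))

mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))

# One row per model, one column per coefficient
coef_table <- t(sapply(mymodels, coef))

# R-squared of each fit as a named numeric vector
r_squared <- sapply(mymodels, function(m) summary(m)$r.squared)

coef_table
r_squared
```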

Both really helpful solutions! Thank you very much. Very efficient. — Kilian Murphy, Commented Apr 7, 2021 at 9:47
