R code optimization

Question

I have about 977 obs in top500Stocks which contains name of 977 stocks.

head(top500Stocks,10)
    ï..Symbol
1    RELIANCE
2         TCS
3    HDFCBANK
4        INFY
5  HINDUNILVR
6        HDFC
7   ICICIBANK
8   KOTAKBANK
9        SBIN
10 BAJFINANCE

and I have Date, OHLC and Adj.Close, Vol and Ret of each stocks from the top500Stocks in stocksRetData

  head(stocksRetData[[1]],3)
          Date     Open     High      Low    Close Adj.Close    Volume   Ret
    1 20000103 28.18423 29.86935 28.18423 38.94457  29.86935  28802010 0.000
    2 20000104 30.66445 32.26056 29.82188 42.06230  32.26056  61320457 0.080
    3 20000105 30.45677 34.16522 30.45677 43.71014  33.52440 173426953 0.039

Now for a given lookbackPeriod and holdPeriod I am trying to run the below function but it takes about 1 minute. How can I make it faster? Because I have to run for multiple lookbackPeriod and holdPeriod it will take forever to complete.

CalC.MOD_MScore.Ret.High <- function(lookbackPeriod, holdPeriod, fnoStocks, 
                                     stocksRetData, totalTestPeriod) {
  
  #We go through each stock and calculate Modified mscores where we give more importance to recent data
  
  WeeklyData <- list()
  wmean <- function(x, k) mean(seq(k)/k * x)
  
  for (i in 1:nrow(fnoStocks)){
    
    out <- stocksRetData[[i]]
    out <- tail(out,totalTestPeriod)
    
    if (nrow(out)==totalTestPeriod){
      
      tempDF <- transform(out, wtMean = rollapply(Ret, lookbackPeriod, wmean, 
                                                  k = lookbackPeriod, align = "right", 
                                                  fill = NA))
      
      tempDF <- transform(tempDF, ExitVal = rollapply(lead(High, holdPeriod), 
                                                      holdPeriod, max, 
                                                      align = "right", 
                                                        fill = NA))
      
      tempDF$NWeekRet <- (tempDF$ExitVal - tempDF$Adj.Close ) / tempDF$Adj.Close
      
      tempDF <- tempDF[!is.na(tempDF$wtMean),]
      tempDF <- tempDF[!is.na(tempDF$ExitVal),]
      tempDF$StockName = fnoStocks[i,1]
      tempDF$WeekNum = c((lookbackPeriod):(nrow(tempDF)+lookbackPeriod-1))
      
      WeeklyData[[i]] <- data.frame(
        StockName = tempDF$StockName,
        WeekNum = tempDF$WeekNum,
        M_Score = tempDF$wtMean,
        NWeekRet = tempDF$NWeekRet,
        stringsAsFactors =  FALSE
      )
      
    }
  }# i ends here
  
  return(bind_rows(WeeklyData))
}

This takes more than a minute to complete.

 a <- CalC.MOD_MScore.Ret.High(4,14,fnoStocks = top500Stocks, stocksRetData = stocksRetData, 2000)

@minem top500Stocks contains 977 unique strings in column 1. And stocksRetData contains list of 977 dataframes with data in above format. Other than that no other data is used. — Stupid_Intern
– Stupid_Intern, Commented Jun 1, 2021 at 8:31
The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. The site standard is for the title to simply state the task accomplished by the code. Please see How do I ask a good question?. — BCdotWEB
– BCdotWEB, Commented Jun 1, 2021 at 12:35

minem · Accepted Answer · 2021-06-02 10:00:21Z

Fake data:

set.seed(1)
n1 <- 977
top500Stocks <- data.frame(a = sample(letters, n1, replace = T))
n <- 2000
df <- data.frame(High = rnorm(n),
                 Adj.Close = rnorm(n),
                 Ret = rnorm(n))
stocksRetData <- lapply(1:n1, function(x) df)

new function:

require(data.table)
minem <- function(lookbackPeriod, holdPeriod, fnoStocks, stocksRetData, totalTestPeriod) {
  
  WeeklyData <- list()
  k <- lookbackPeriod
  y <- seq(k)/k/k
  wmean2 <- function(x) sum(y * x)
  
  for (i in 1:nrow(fnoStocks)) {
    
    out <- tail(stocksRetData[[i]], totalTestPeriod)
    
    if (nrow(out) == totalTestPeriod) {
      
      wtMean <- frollapply(out$Ret, k, wmean2, align = "right", fill = NA)
      ExitVal <- frollapply(lead(out$High, holdPeriod), 
                            holdPeriod, max, align = "right", fill = NA)
      tempDF <- out
      tempDF$wtMean <- wtMean
      tempDF$ExitVal <- ExitVal
      tempDF$NWeekRet <- (ExitVal / tempDF$Adj.Close - 1)
      
      tempDF <- tempDF[!is.na(wtMean),]
      tempDF <- tempDF[!is.na(tempDF$ExitVal),]
      tempDF$StockName = fnoStocks[i, 1]
      tempDF$WeekNum = c(k:(nrow(tempDF) + k - 1))
      
      WeeklyData[[i]] <- tempDF[, c('StockName', 'WeekNum', 'wtMean', 'NWeekRet')]
      colnames(WeeklyData[[i]])[3] <- 'M_Score'
    }
  }# i ends here
  return(bind_rows(WeeklyData))
}

speed:

system.time(a2 <- minem(4, 14, top500Stocks, stocksRetData, 2000))
# user  system elapsed 
# 2.36    0.00    2.36

(versus 31.75 seconds for original on my machine)

some comments:

from code profiling we see that wmean was the slowest part(called thousands of times). As we know k(lookbackPeriod) beforehand we can simplify/speedup this function.
rollapply was quite slow. replacing it with data.tables frollapply we gain performance
creating new data.frame in loop isn't efficient, we can reuse tempDF (but this isn't as significant as first two points)
other minor changes

update

we can go faster:

minem2 <- function(lookbackPeriod, holdPeriod, fnoStocks, 
                  stocksRetData, totalTestPeriod) {
  
  k <- lookbackPeriod
  y <- seq(k)/k/k
  
  # create stock name data.table
  sn <- data.table(sname = fnoStocks[[1]], id = 1:nrow(fnoStocks), key = 'id')
  dt <- rbindlist(stocksRetData[1:nrow(fnoStocks)], idcol = 'id') # bind together data
  setkey(dt, id)
  dt <- dt[, tail(.SD, totalTestPeriod), by = id] # select last n rows by group
  dt[sn, StockName := sname] # merge stock names
  
  # calculate wtMean using tcrossprod:
  v <- shift(dt$Ret, (k:1) - 1L) # create list of lagged values
  v <- do.call(cbind, v) # bind together
  dt[, wtMean := tcrossprod(y, v)[1, ]]
  dt[1:(k - 1), wtMean := NA, by = id] # delete first (k-1) as they aren't correct
  # maybe not needed
  
  # the same approach for ExitVal as it was sufficiently fast
  dt[, ExitVal := frollapply(lead(High, holdPeriod), holdPeriod, max,
                             align = "right", fill = NA), by = id]
  
  dt[, NWeekRet := (ExitVal / Adj.Close - 1)]
  dt <- dt[!is.na(wtMean) & !is.na(ExitVal)]
  dt[, WeekNum := c(k:(.N + k - 1)), by = id]
  res <- dt[, .(StockName, WeekNum, M_Score = wtMean, NWeekRet)]
  return(res[])
}

system.time(a3 <- minem2(4, 14, top500Stocks, stocksRetData, 2000))
# user  system elapsed 
# 0.91    0.01    0.78

Stack Exchange Network

R code optimization

1 Answer 1

update

You must log in to answer this question.

Hot Network Questions

R code optimization

1 Answer 1

update

You must log in to answer this question.

Related

Hot Network Questions