I have 12 separate data frames: 3 subjects, with 4 activities for each subject. Using the following code, I have created a named list of integer vectors containing the row indices ("markers") from the 12 data frames:
df_list <- lapply(file_paths, \(f) read.csv(f, header = FALSE)) |>
  setNames(variable_name)
results <- list()
for (i in seq_along(df_list)) {
  current_df <- df_list[[i]]
  # Row numbers where the first column equals "Sample"
  markers <- which(current_df[[1]] == "Sample")
  results[[names(df_list)[i]]] <- markers
}
Resulting list:
> dput(results)
list(subj1_act1 = c(5L, 169L, 1119L), subj1_act2 = c(2L, 156L,
1310L, 1815L), subj1_act3 = c(484L, 504L, 1685L), subj1_act4 = c(2L,
67L, 1234L), subj2_act1 = c(2L, 172L, 1346L), subj2_act2 = c(3L,
132L, 1311L), subj2_act3 = c(2L, 62L, 1206L), subj2_act4 = c(2L,
97L, 1266L), subj3_act1 = c(5L, 1219L), subj3_act2 = c(2L, 459L,
1563L), subj3_act3 = c(2L, 443L, 1592L), subj3_act4 = c(2L, 558L,
1675L))
I am trying to find the "starter marker" and "end marker" for each subject and activity. These indices will then be used to slice each data frame. The pair can be found by checking, for each pair of consecutive markers, whether the difference between the next marker and the current marker is greater than 700.
For example, for subj1_act1 the values of the indices are 5, 169 and 1119.
169 - 5 > 700 is FALSE
1119 - 169 > 700 is TRUE, so 169 is the starter marker and 1119 is the end marker.
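To make the rule concrete, here is a minimal base R sketch for a single vector of markers. The helper name find_span and its threshold argument are hypothetical, and it assumes that the first gap larger than 700 is the one wanted (every vector in the dput() output has exactly one such gap).

```r
# Hypothetical helper: returns the first pair of consecutive markers
# whose gap exceeds `threshold`, or NA for both if no gap qualifies.
find_span <- function(markers, threshold = 700) {
  gaps <- diff(markers)             # length(markers) - 1 differences
  k <- which(gaps > threshold)[1]   # position of the first large gap, or NA
  c(start = markers[k], end = markers[k + 1])
}

find_span(c(5L, 169L, 1119L))
# start = 169, end = 1119
```

Because diff() always returns one fewer element than its input, the same call works for vectors of length 2, 3 or 4 without any explicit bounds check.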
I have tried using a for loop to step through the values of each vector separately, but I am getting the error "subscript out of bounds". I am unsure how to refer to the length of each vector, as the length varies (2, 3 or 4).
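For reference, a loop-based sketch of what I am aiming for is below. One plausible cause of the error (an assumption, since it depends on how the loop is written) is reading element j + 1 with [[ ]] when j is already the last position. Bounding the inner loop by the length of each vector minus one avoids that. The names spans and span_table are illustrative only, and the three vectors are copied from the dput() output above.

```r
# Three entries copied from the dput() output above (lengths 3, 4 and 2).
results <- list(
  subj1_act1 = c(5L, 169L, 1119L),
  subj1_act2 = c(2L, 156L, 1310L, 1815L),
  subj3_act1 = c(5L, 1219L)
)

spans <- list()
for (nm in names(results)) {
  m <- results[[nm]]
  # Default when no gap larger than 700 is found.
  spans[[nm]] <- c(start = NA_integer_, end = NA_integer_)
  # Stop one short of the end so that m[[j + 1]] always exists;
  # max() guards against an empty vector.
  for (j in seq_len(max(length(m) - 1L, 0L))) {
    if (m[[j + 1]] - m[[j]] > 700) {
      spans[[nm]] <- c(start = m[[j]], end = m[[j + 1]])
      break  # assumption: only the first qualifying gap is wanted
    }
  }
}

# One row per data frame, with columns "start" and "end".
span_table <- do.call(rbind, spans)
```

Each row of span_table could then be used to slice the matching data frame, for example with df_list[[nm]][spans[[nm]]["start"]:spans[[nm]]["end"], ], although whether the marker rows themselves belong in the slice is something I have not decided.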
Any insight would be greatly appreciated.
Please note, only base R functions can be used.