Multiply columns by columns using substrings

I'm relatively new to R and am struggling with what is probably a very simple problem.

I have data with multiple columns named in a similar way. Here is some sample data:

    df = data.frame(
      PPID = 1:50,
      time1 = sample(c(0,1), 50, replace = TRUE),
      time2 = sample(c(0,1), 50, replace = TRUE),
      time3 = sample(c(0,1), 50, replace = TRUE),
      condition1 = sample(c(0:3), 50, replace = TRUE),
      condition2 = sample(c(0:3), 50, replace = TRUE)
    )

In my actual data, I have many more columns: approximately 50 for time and 10 for condition.

I want to multiply every time column by every condition column. In the sample data that should give me 6 extra columns: time1_condition1, time1_condition2, time2_condition1, time2_condition2, time3_condition1, time3_condition2.

I tried the solutions suggested in this thread, but they did not work (presumably because I didn't understand how mapply/apply work and did not make the appropriate changes): I got an error message saying that the longer argument is not a multiple of the length of the shorter.

Any help would be greatly appreciated!

3 Answers

    # Get all the "time" columns
    time_cols <- grep("^time", names(df))
    # Get all the "condition" columns
    condition_cols <- grep("^condition", names(df))
    # Multiply each "time" column by all the "condition" columns,
    # creating a new data frame
    new_df <- do.call("cbind", lapply(df[time_cols], function(x) x * df[condition_cols]))
    # Combine both data frames
    complete_df <- cbind(df, new_df)

We can also generate the column names using expand.grid:

    new_names <- do.call("paste0", expand.grid(names(df)[condition_cols], names(df)[time_cols]))
    colnames(complete_df)[7:12] <- new_names
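Note that the expand.grid call above pastes the condition name first and uses no separator, so the names come out like condition1time1 rather than the time1_condition1 form the question asks for. Here is a minimal sketch of one way to get that form without hard-coding the 7:12 positions. The small data frame and the helper names (time_names, condition_names, pairs, products) are made up for illustration and are not part of the answer above.

```r
# Small reproducible stand-in for the question's data (values are arbitrary).
set.seed(1)
df <- data.frame(
  PPID = 1:5,
  time1 = sample(c(0, 1), 5, replace = TRUE),
  time2 = sample(c(0, 1), 5, replace = TRUE),
  time3 = sample(c(0, 1), 5, replace = TRUE),
  condition1 = sample(0:3, 5, replace = TRUE),
  condition2 = sample(0:3, 5, replace = TRUE)
)

time_names <- grep("^time", names(df), value = TRUE)
condition_names <- grep("^condition", names(df), value = TRUE)

# One row per (time, condition) pair. expand.grid varies its first
# argument fastest, so passing the conditions first yields the order
# time1/condition1, time1/condition2, time2/condition1, ...
pairs <- expand.grid(condition = condition_names, time = time_names,
                     stringsAsFactors = FALSE)

# Multiply each pair of columns and name the result "<time>_<condition>".
products <- Map(function(t, cnd) df[[t]] * df[[cnd]],
                pairs$time, pairs$condition)
names(products) <- paste(pairs$time, pairs$condition, sep = "_")

# Append the product columns to the original data.
complete_df <- cbind(df, as.data.frame(products))
names(complete_df)
```

Because the names are built from the same pairs table that drives the multiplication, the labels and the values cannot drift apart when the number of time or condition columns changes.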

this is awesome! thank you @Ronak Shah!
– yjpark
1 hour ago

++ve for simple and nice code.
– RavinderSingh13
1 hour ago

Here is a tidyverse alternative:

    library(tidyverse)
    idx.time <- grep("time", names(df), value = T)
    idx.cond <- grep("condition", names(df), value = T)
    bind_cols(
        df,
        map_dfc(
            transpose(expand.grid(idx.time, idx.cond, stringsAsFactors = F)),
            ~setNames(
                data.frame(df[, .x$Var1] * df[, .x$Var2]),
                paste(.x$Var1, .x$Var2, sep = "_"))))
    #  PPID time1 time2 time3 condition1 condition2 time1_condition1
    #1    1     1     0     1          3          0                3
    #2    2     0     1     1          0          1                0
    #3    3     0     1     1          0          2                0
    #4    4     0     0     1          0          3                0
    #5    5     0     0     0          0          3                0
    #...

@RonakShah I did not know that! That's great. Thanks!
– Maurits Evers
1 hour ago

I closed this question but your solution also worked out perfectly well! thank you! @Maurits Evers
– yjpark
1 hour ago

No worries and you're very welcome @yjpark:-)
– Maurits Evers
1 hour ago

Using

    library(tidyverse)
    a = df[grep("time", names(df))]
    b = df[grep("condition", names(df))]

we can do:

    map(a, ~.x * b) %>%
      bind_cols() %>%
      set_names(paste(rep(names(a), each = ncol(b)), names(b), sep = "_"))

or we can use cross2:

    # cross2() varies its first argument fastest, so the time names
    # must cycle fastest in the labels as well
    cross2(a, b) %>%
      map(lift(`*`)) %>%
      set_names(paste(names(a), rep(names(b), each = ncol(a)), sep = "_")) %>%
      data.frame()
    #   time1_condition1 time2_condition1 time3_condition1 time1_condition2 time2_condition2 time3_condition2
    # 1                3                0                3                2                0                2
    # 2                3                3                0                1                1                0
    # 3                0                0                0                0                0                0
    # 4                3                3                0                0                0                0
    # 5                0                0                2                0                0                1
    # 6                0                0                1                0                0                1
    # 7                2                2                0                0                0                0
