Multiply columns by columns using substrings




I'm relatively new to R and am struggling with what is probably a very simple problem.



I have data with multiple columns that are named in a similar way. Here is some sample data:


df = data.frame(PPID = 1:50,
                time1 = sample(c(0,1), 50, replace = TRUE),
                time2 = sample(c(0,1), 50, replace = TRUE),
                time3 = sample(c(0,1), 50, replace = TRUE),
                condition1 = sample(c(0:3), 50, replace = TRUE),
                condition2 = sample(c(0:3), 50, replace = TRUE))



In my actual data, I have many more columns: approximately 50 for time and 10 for condition.



I want to multiply each time column by each condition column. For the sample data, this should give 6 extra columns: time1_condition1, time1_condition2, time2_condition1, time2_condition2, time3_condition1, time3_condition2.



I tried the solutions suggested in this thread, but they did not work (presumably because I don't understand how mapply/apply work and did not adapt them correctly). They gave me an error message saying that the longer argument is not a multiple of the length of the shorter.



Any help would be greatly appreciated!




3 Answers


# Get the indices of all "time" columns
time_cols <- grep("^time", names(df))

# Get the indices of all "condition" columns
condition_cols <- grep("^condition", names(df))

# Multiply each "time" column by all the "condition" columns
# and bind the results into a new data frame
new_df <- do.call("cbind", lapply(df[time_cols], function(x) x *
                                    df[condition_cols]))

# Combine both data frames
complete_df <- cbind(df, new_df)



We can also generate column names using expand.grid




new_names <- do.call("paste0",
                     expand.grid(names(df)[condition_cols], names(df)[time_cols]))
# Columns 7 to 12 are the six new product columns in this sample data
colnames(complete_df)[7:12] <- new_names
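
The names generated above read condition1time1 and are assigned by position (7:12), which only fits the sample data. Here is a minimal sketch, assuming the same df layout as in the question, that builds names in the requested time1_condition1 form without hard-coded indices. The names prod_list, time_names and cond_names are illustrative, and the seed is added only for reproducibility.

```r
# Sample data as in the question (seed is an addition, for reproducibility only)
set.seed(42)
df <- data.frame(PPID = 1:50,
                 time1 = sample(c(0, 1), 50, replace = TRUE),
                 time2 = sample(c(0, 1), 50, replace = TRUE),
                 time3 = sample(c(0, 1), 50, replace = TRUE),
                 condition1 = sample(0:3, 50, replace = TRUE),
                 condition2 = sample(0:3, 50, replace = TRUE))

time_names <- grep("^time", names(df), value = TRUE)
cond_names <- grep("^condition", names(df), value = TRUE)

# One data frame of products per time column
prod_list <- lapply(df[time_names], function(x) x * df[cond_names])

# Bind them, then name the columns as <time>_<condition>.
# The column order is time1 x all conditions, time2 x all conditions, ...
new_df <- do.call(cbind, unname(prod_list))
names(new_df) <- paste(rep(time_names, each = length(cond_names)),
                       cond_names, sep = "_")

complete_df <- cbind(df, new_df)
```

Because the names are derived from the matched columns, the same code should carry over to the roughly 50 time and 10 condition columns mentioned in the question.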





this is awesome! thank you @Ronak Shah!
– yjpark
1 hour ago





++ve for simple and nice code.
– RavinderSingh13
1 hour ago



Here is a tidyverse alternative




library(tidyverse)
# Names of the "time" and "condition" columns
idx.time <- grep("^time", names(df), value = TRUE)
idx.cond <- grep("^condition", names(df), value = TRUE)
bind_cols(
    df,
    # One single-column data frame per (time, condition) pair
    map_dfc(transpose(expand.grid(idx.time, idx.cond, stringsAsFactors = FALSE)),
            ~ setNames(data.frame(df[, .x$Var1] * df[, .x$Var2]),
                       paste(.x$Var1, .x$Var2, sep = "_"))))
#  PPID time1 time2 time3 condition1 condition2 time1_condition1
#1    1     1     0     1          3          0                3
#2    2     0     1     1          0          1                0
#3    3     0     1     1          0          2                0
#4    4     0     0     1          0          3                0
#5    5     0     0     0          0          3                0
#...





@RonakShah I did not know that! That's great. Thanks!
– Maurits Evers
1 hour ago






I closed this question but your solution also worked out perfectly well! thank you! @Maurits Evers
– yjpark
1 hour ago






No worries and you're very welcome @yjpark:-)
– Maurits Evers
1 hour ago



Using


library(tidyverse)

a <- df[grep("time", names(df))]
b <- df[grep("condition", names(df))]



we can do:


map(a, ~ .x * b) %>%
  bind_cols() %>%
  set_names(paste(rep(names(a), each = ncol(b)), names(b), sep = "_"))



or we can


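# cross2() varies its first argument fastest, so the products come out as
# time1*condition1, time2*condition1, time3*condition1, time1*condition2, ...
# and the names must be built in that same order.
# Note: cross2() and lift() are deprecated as of purrr 1.0.0.
cross2(a, b) %>%
  map(lift(`*`)) %>%
  set_names(paste(names(a), rep(names(b), each = ncol(a)), sep = "_")) %>%
  data.frame()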

  time1_condition1 time2_condition1 time3_condition1 time1_condition2 time2_condition2 time3_condition2
1                3                0                3                2                0                2
2                3                3                0                1                1                0
3                0                0                0                0                0                0
4                3                3                0                0                0                0
5                0                0                2                0                0                1
6                0                0                1                0                0                1
7                2                2                0                0                0                0
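
The pairing order differs between the two approaches: map(a, ~ .x * b) cycles through the conditions fastest, while cross2() cycles through its first argument fastest. That makes it easy to attach names in the wrong order. Below is a small base R sanity check. The helper check_products() is hypothetical and not part of any package; it recomputes each product from its column name and compares it with the stored column.

```r
# Hypothetical helper: for every column named "<time>_<condition>",
# recompute df[[time]] * df[[condition]] and compare with the stored values.
check_products <- function(res, df, sep = "_") {
  parts <- strsplit(names(res), sep, fixed = TRUE)
  ok <- vapply(seq_along(parts), function(i) {
    expected <- df[[parts[[i]][1]]] * df[[parts[[i]][2]]]
    isTRUE(all.equal(res[[i]], expected))
  }, logical(1))
  setNames(ok, names(res))
}

# Tiny toy data (made up for illustration)
toy <- data.frame(time1 = c(0, 1, 1), time2 = c(1, 0, 1),
                  condition1 = c(3, 2, 1), condition2 = c(0, 1, 2))

# Correctly labelled products
good <- data.frame(time1_condition1 = toy$time1 * toy$condition1,
                   time2_condition1 = toy$time2 * toy$condition1)

# Same values, but with the two labels swapped
bad <- setNames(good, rev(names(good)))

check_products(good, toy)  # TRUE for both columns
check_products(bad, toy)   # FALSE for both columns
```

Running check_products() on the result of either pipeline, with the original df as the second argument, should return TRUE for every column when the labels match the contents.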





