While using %in% is perhaps the appropriate solution here, what you may have been looking for is grepl, which accepts regex patterns containing the | (alternation) operator. I'd prefer using NA for non-matches; of course, it's up to you to encode the remaining categories differently.
> within(G14, {
+ Behavior_cat <- NA
+ Behavior_cat[
+ grepl("slow approach|fin raise|fast approach|tail beat|ram|bite", Behavior)
+ ] <- "aggressive"
+ Behavior_cat[
+ grepl("flee|avoid|tail quiver", Behavior)
+ ] <- "submissive"
+ Behavior_cat[
+ grepl("bump|join", Behavior)
+ ] <- 'affiliative'
+ })
Behavior Behavior_cat
1 slow approach aggressive
2 fin raise aggressive
3 fast approach aggressive
4 tail beat aggressive
5 ram aggressive
6 bite aggressive
7 flee submissive
8 avoid submissive
9 tail quiver submissive
10 bump affiliative
11 join affiliative
12 random behavior <NA>
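For comparison, here is a minimal sketch of the %in% route mentioned above. %in% tests for exact, whole-string membership, so it cannot accidentally match part of a word. The lookup vectors aggressive, submissive and affiliative are hypothetical names chosen for illustration, and the data frame is rebuilt from the Behavior column of the dput() output below.

```r
# Rebuild the Behavior column from the dput() data
G14 <- data.frame(Behavior = c("slow approach", "fin raise", "fast approach",
                               "tail beat", "ram", "bite", "flee", "avoid",
                               "tail quiver", "bump", "join", "random behavior"))

# Hypothetical lookup vectors: one per category, listing exact behavior names
aggressive  <- c("slow approach", "fin raise", "fast approach",
                 "tail beat", "ram", "bite")
submissive  <- c("flee", "avoid", "tail quiver")
affiliative <- c("bump", "join")

G14 <- within(G14, {
  # Start with NA for every row, so non-matches stay NA
  Behavior_cat <- rep(NA_character_, length(Behavior))
  # %in% compares whole strings, so "ram" never matches e.g. "program"
  Behavior_cat[Behavior %in% aggressive]  <- "aggressive"
  Behavior_cat[Behavior %in% submissive]  <- "submissive"
  Behavior_cat[Behavior %in% affiliative] <- "affiliative"
})
G14
```

The trade-off is that every behavior has to be spelled out exactly, whereas the grepl patterns also catch variants containing those words.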
Here's an alternative solution using stringi::stri_replace_all_regex:
> G14 |>
+ transform(
+ Behavior_cat=stringi::stri_replace_all_regex(
+ Behavior,
+ list(c('slow approach|fin raise|fast approach|tail beat|ram|bite'),
+ c('flee|avoid|tail quiver'),
+ c('bump|join'), c('random behavior')),
+ list('aggressive', 'submissive', 'affiliative', NA_character_),
+ vectorize_all=FALSE)
+ )
Behavior Behavior_cat
1 slow approach aggressive
2 fin raise aggressive
3 fast approach aggressive
4 tail beat aggressive
5 ram aggressive
6 bite aggressive
7 flee submissive
8 avoid submissive
9 tail quiver submissive
10 bump affiliative
11 join affiliative
12 random behavior <NA>
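As far as I understand the stringi documentation, vectorize_all = FALSE applies the pattern/replacement pairs one after another to every string, and each pair replaces only the matched substring, not the whole value. The base-R sketch below imitates that with a plain loop over gsub() (a stand-in I chose so it runs without extra packages, not stringi itself); the value "repeated bite" is a made-up input added to show that only the matched part is rewritten.

```r
behavior <- c("slow approach", "flee", "bump", "random behavior", "repeated bite")

patterns <- c("slow approach|fin raise|fast approach|tail beat|ram|bite",
              "flee|avoid|tail quiver",
              "bump|join",
              "random behavior")
replacements <- c("aggressive", "submissive", "affiliative", NA_character_)

# Apply each pattern/replacement pair in turn; each step sees the previous output
out <- behavior
for (i in seq_along(patterns)) {
  # gsub() rewrites only the matched substring; an NA replacement sets matched elements to NA
  out <- gsub(patterns[i], replacements[i], out)
}
out
```

Because each step works on the output of the previous one, a later pattern could in principle match a category label produced earlier. With these patterns that doesn't happen, but it's worth checking when you add categories.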
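Note that, so far, these patterns also match parts of words; e.g., "ram" would also match "program". To match whole words only, add word-boundary metacharacters (\\b), or use ^ and $ to anchor the start and end of the string. When anchoring an alternation, group it, as in ^(flee|avoid)$, because otherwise ^ applies only to the first alternative and $ only to the last. See, e.g., this answer.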
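A small illustration of the difference, using made-up strings. perl = TRUE is used for the \\b pattern only to be on the safe side with the regex engine.

```r
x <- c("ram", "ramp", "program", "tail beat")

# Plain pattern: "ram" matches anywhere, including inside other words
partial <- grepl("ram", x)
# Word boundaries: only "ram" as a standalone word
bounded <- grepl("\\bram\\b", x, perl = TRUE)
# Pitfall: ^ binds only to "ram" and $ only to "tail beat", so "ramp" matches
ungrouped <- grepl("^ram|tail beat$", x)
# Grouping makes both anchors apply to every alternative (whole-string match)
grouped <- grepl("^(ram|tail beat)$", x)

rbind(partial, bounded, ungrouped, grouped)
```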
Data:
> dput(G14)
structure(list(Behavior = c("slow approach", "fin raise", "fast approach",
"tail beat", "ram", "bite", "flee", "avoid", "tail quiver", "bump",
"join", "random behavior"), Behavior_cat = c("aggressive", "aggressive",
"aggressive", "aggressive", "aggressive", "aggressive", "aggressive",
"aggressive", "aggressive", "aggressive", "aggressive", "aggressive"
)), row.names = c(NA, -12L), class = "data.frame")