I am struggling to extract words from a string column of a data frame. I am working with language data. In my data frame, the 1st column holds a verb stem and the 2nd column holds a full sentence of several words, one of which is the conjugated verb. I would like to create a 3rd column containing only the conjugated verb, i.e. the word in the sentence that contains the verb stem from column 1 of the same row, with all other words removed. I cannot simply match against a list of all verb stems, because some sentences contain 2 verbs and I only want the one whose stem is in column 1 of that row.
This is what my data looks like now:
    Verb_stem  Full_sentence
1.  copt       to coptu to
2.  puns       punse kanchina
3.  khag       basana na lo khagunse nan
And this is the output that I would like:
    Verb_stem  Full_sentence              Conjugated_verb
1.  copt       to coptu to                coptu
2.  puns       punse kanchina             punse
3.  khag       basana na lo khagunse nan  khagunse
After doing some research, I tried the following formula:
Df$Conjugated_verb <- lapply(strsplit(Df$Full_sentence, " "), grep, pattern = Df$Verb_stem, value = TRUE)
The problem is that this code seems to search every sentence for the verb stem from the 1st row only, instead of switching to the matching verb stem at each row. Here is the output that I get:
    Verb_stem  Full_sentence              Conjugated_verb
1.  copt       to coptu to                coptu
2.  puns       punse kanchina             character(0)
3.  khag       basana na lo khagunse nan  character(0)
I tried many things, and I have been looking for a solution for days, but I really cannot figure out how to do it. If someone had an idea, I would be super grateful! Thanks in advance!
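For reference, the output above is consistent with how grep() treats a vector passed as pattern: it uses only the first element (R warns about this), so every sentence is searched for "copt". One possible row-wise approach is sketched below. It rebuilds the example data from the tables above and uses mapply() to pair each stem with its own sentence. Matching the stem as a literal substring (fixed = TRUE) and keeping only the first matching word, or NA when nothing matches, are assumptions about the desired behaviour rather than anything stated in the question.

```r
# Example data rebuilt from the tables in the question
Df <- data.frame(
  Verb_stem = c("copt", "puns", "khag"),
  Full_sentence = c("to coptu to", "punse kanchina", "basana na lo khagunse nan"),
  stringsAsFactors = FALSE
)

# mapply() walks both columns in parallel, so grep() receives
# exactly one stem (pattern) for each sentence.
Df$Conjugated_verb <- mapply(
  function(stem, sentence) {
    # Split the sentence into words on single spaces
    words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
    # Keep the words that contain the stem as a literal substring
    hits <- grep(stem, words, value = TRUE, fixed = TRUE)
    # Assumption: return the first match, or NA if no word contains the stem
    if (length(hits) > 0) hits[1] else NA_character_
  },
  Df$Verb_stem, Df$Full_sentence,
  USE.NAMES = FALSE
)

print(Df)
```

With the three example rows, Conjugated_verb comes out as "coptu", "punse" and "khagunse". If a sentence could contain several words with the same stem, the function could return all of hits instead, at the cost of getting a list column.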