I have data from a survey with variables containing strings that I would like to convert to a numeric value for analysis. They also contain some blanks. I use Stata 15.1 to perform the analysis. Here is an example of a variable:
input str50 workplace
"1"
""
"1"
"Consultant"
"1"
"3"
""
"3"
"1"
""
"3"
"1"
""
"2"
"1"
"Resident"
"Physician"
end
It should look like this:
input workplace
1
.
1
5
1
3
.
3
1
.
3
1
.
2
1
5
5
end
Unfortunately, I cannot destring the variable without losing information.
I tried the following:
gen workplace_cleaned = workplace
replace workplace_cleaned = 5 if real(workplace) == .
However, this did not work: it returns the error "type mismatch", r(109). If I did my research correctly, real() does not recognize strings that are not valid numbers.
I also tried destring with the force option, but this mixes up the missing data with the longer strings (e.g. "Consultant"), since both end up as missing. Is there a way to keep the blanks as . and convert the longer strings to 5?