I have data from a survey with variables containing strings that I would like to convert to numeric values for analysis. They also contain some blanks. I use Stata 15.1 to perform the analysis. Here is an example of a variable:

clear
input str50 workplace
"1"
""
"1"
"Consultant"
"1"
"3"
""
"3"
"1"
""
"3"
"1"
""
"2"
"1"
"Resident"
"Physician"
end

It should look like this:

input workplace
1
.
1
5
1
3
.
3
1
.
3
1
.
2
1
5
5
end

Unfortunately, I have not managed to destring the variable without losing information.

I tried the following:

gen workplace_cleaned = workplace
replace workplace_cleaned = 5 if real(workplace) == .

However, this did not work: it returns the error type mismatch r(109). If I did my research correctly, real() does not recognize strings that are not valid numbers.

I tried the force option, but that mixes up the missing data with the text entries (e.g. "Consultant"): both end up as missing. Is there a way to keep the blanks as . and still convert the text entries?

Answer

destring can't find a numeric equivalent where none exists. It is a smart wrapper for real(), but (e.g.) real("Consultant") can't be mapped to a number: it returns missing (.), just as real("") does.
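For the coding the question actually asks for (every text entry becomes 5, blanks stay missing), a minimal sketch follows, assuming that all text values should share the single code 5 as in the desired output. Because real() returns missing for text instead of raising an error, the r(109) in the question comes from elsewhere: gen workplace_cleaned = workplace creates a string variable, so replace ... = 5 tries to store a number in it. Starting from real() avoids that, and missing() applied to the original string tells blanks apart from text.

```stata
clear
input str50 workplace
"1"
""
"1"
"Consultant"
"1"
"3"
""
"3"
"1"
""
"3"
"1"
""
"2"
"1"
"Resident"
"Physician"
end

* real() gives the number for "1", "2", "3" and system missing (.)
* for both "" and text such as "Consultant"; it does not raise an error.
* The result is numeric, so assigning 5 below is not a type mismatch.
generate workplace_cleaned = real(workplace)

* Recode only the text entries: missing after real() but not blank
* in the original string. Blanks stay as system missing (.).
replace workplace_cleaned = 5 if missing(workplace_cleaned) & !missing(workplace)

list, sep(0)
```

If the different text values should get different codes, the value-label routes below are the better fit.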

Backing up: you have mixed string and numeric input, all read as string. The bridge between the two is value labels. There is no one-step solution to your problem, but the following code may be instructive. I identify two routes: go further toward string and get where you want with encode, or go further toward numeric and get where you want using value labels.
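Whichever route you choose, it can help first to list the entries that destring cannot handle, since those are the ones that need a code or a label. A short sketch, assuming the example data below; the indicator name needs_code is hypothetical.

```stata
clear
input str50 workplace
""
"1"
"Consultant"
"3"
"2"
"Resident"
"Physician"
end

* needs_code is 1 for entries that are neither blank nor valid numbers,
* i.e. the values that need a mapping before conversion.
generate byte needs_code = missing(real(workplace)) & !missing(workplace)

* Show the distinct problem values and how often each occurs.
tabulate workplace if needs_code
```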

See also, e.g.,

Cox, N. J., and C. B. Schechter. 2018. Speaking Stata: Seven steps for vexatious string variables. Stata Journal 18(4): 981-994 (dm0098). A step-by-step guide explaining how to convert string variables or, as the case may merit, to leave them as they are.

https://www.stata-journal.com/article.html?article=dm0098

clear
input str50 workplace
""
"1"
"Consultant"
"3"
"2"
"Resident"
"Physician"
end

* keep the original for comparison and a copy to work on in route 2
clonevar original=workplace
order original
clonevar copyvar=workplace

* route 1: turn the numeric strings into text, then encode with a predefined value label
replace workplace = "One" if workplace == "1"
replace workplace = "Two" if workplace == "2"
replace workplace = "Three" if workplace == "3"

label def work 1 "One" 2 "Two" 3 "Three" 4 "Resident" 5 "Physician" 6 "Consultant"

encode workplace, gen(work) label(work)

* route 2: turn the text into numeric strings, destring, then attach the same value label
replace copyvar = "6" if copyvar == "Consultant"
replace copyvar = "5" if copyvar == "Physician"
replace copyvar = "4" if copyvar == "Resident"

destring copyvar, gen(work2)

label val work2 work

list, sep(0)

list, sep(0) nolabel

Results

. list, sep(0)

     +-------------------------------------------------------------+
     |   original    workplace   copyvar        work2         work |
     |-------------------------------------------------------------|
  1. |                                              .            . |
  2. |          1          One         1          One          One |
  3. | Consultant   Consultant         6   Consultant   Consultant |
  4. |          3        Three         3        Three        Three |
  5. |          2          Two         2          Two          Two |
  6. |   Resident     Resident         4     Resident     Resident |
  7. |  Physician    Physician         5    Physician    Physician |
     +-------------------------------------------------------------+

. 
. list, sep(0) nolabel 

     +--------------------------------------------------+
     |   original    workplace   copyvar   work2   work |
     |--------------------------------------------------|
  1. |                                         .      . |
  2. |          1          One         1       1      1 |
  3. | Consultant   Consultant         6       6      6 |
  4. |          3        Three         3       3      3 |
  5. |          2          Two         2       2      2 |
  6. |   Resident     Resident         4       4      4 |
  7. |  Physician    Physician         5       5      5 |
     +--------------------------------------------------+