SAS: Organizing data through Proc Format

Proc Format can be used to group categorical variables or numeric values for reporting.  The following examples show the coversion of sex, age, and id variables into desired format. Method 1 uses put and input statements and creates new variables, while method 2 use format statement to just format the existing variables.

  • Need $ in the format name for the existing character variable. eg. $gender
  • Need to put . after the format name when applying the format. eg. gender = put(sex, $gender.); format age agerange. sex $gender.;

/**** Proc Format ****/
proc format;
value
$gender
‘M’ = ‘Male’
‘F’ = ‘Female’
;
value
agerange
1 – 30 = ‘le 30’
30<-40 = '31 to 40'
40<-50 = '41 to 50'
50<-60 = '51 to 60'
60<-100 = 'gt 60'
other = 'Missing'
;
run;
options fmtsearch = (work.formats);

Method1:

data admit;
set sasuser.admit;
agecatgory = put(age, agerange.);
gender = put (sex, $gender.);
numericid = input(id, 4.);
run;

Method2:

data admit;
set sasuser.admit;
format age agerange. sex $gender.;
run;

Character to Numeric, using INVALUE.
Example: convert character gpa to numeric gpa.

/*** calculate the ESL grade points that needs to be backed out from the GPA ***/
/** format grade to index **/
proc format ; 
invalue   /* UES INVALUE, NOT VALUE */
$grade
'A+' = 9
'A'= 8
'B+'= 7
'B'= 6
'C+' = 5
'C' = 4
'D+' = 3
'D' = 2
'E' = 1 
'F' = 0
'P' = 0
;
run;
proc options  option=fmtsearch;run;
data backgpa;
set pes;
array crs{3} esl1000 esl1010 esl1015  ;
array ncr{3} esl1000ncr esl1010ncr esl1015ncr  ;
array cr{3} esl1000cr esl1010cr esl1015cr  ;
array back{3} esl1000bo esl1010bo esl1015bo;
do i =1 to dim(crs);
if crs{i} ne '' and ncr{i} ne 'NCR' then  back{i} =  input(crs{i}, $Grade.) * input(cr{i}, 8.);
end;
drop course1-course20 gr1-gr20 credit1-credit20 ncr1-ncr20 i;
run;

SAS: Convert Variable between Numeric and Character Format

A. Use Put or Input
Convert between numeric and character variable.

  • character to numeric (input)
old_char = "2018";
new_num = input(old_char, 8.);
new_num = 2018;
  • character to numeric (input) to character (put)
old_char = "2018";
new_char = put (input (substr(old_char , 1, 4 ), 8.) -1 , 4.);
new_char = "2017";

/* or */
new_char = put (old_char*1-1 , 4.);
new_char = "2017";
  • numeric to character (put), and with leading Zero.
old_num = 2018;
new_char = put(old_num, 4.); *new_char = "2018";
new_char1 = put(old_num, z8.); *new_char1 = "00002018";
  • numeric to character (put) to numeric (input)
old_num = 2018;
new_num = input(substr(put(old_num, 4.),3, 2) , 8.);
new_num = 18;

Use the following functions to check whether any digit or alphabetic character is in the character variable. The results will be the position of the first digit/alphabetic character in the field.

check1 = anydigit(var); *return position of first digit, 0 if not found in the string;
check2 = anyalpha(var); *return position of first alphabetic character, 0 if not found in the string;
check3 = notalpha(var); *return position of first non digit, 0 if not found in the string;
check4 = notdigit(var); *return position of first non alphabetic character, 0 if not found in the string;
  • Please note there should be no space in between % and statement eg. macro, let, mend, put, eval etc.
  • To apply the macro variable, use &varname. format.

CODE:

%macro setyr;
%let yr = 14;
%let nextyr =%eval(&yr. +1);  /* 15 */
%let next2yr=%eval(&yr. +2);  /* 16 */
%let fisyr = “20%substr(&yr., 1,2)/%substr(&nextyr., 1,2)”;   /* “2014/2015” */
%let fisnextyr = “20%substr(&nextyr., 1,2)/%substr(&next2yr., 1, 2)”;   /* “2015/2016” */
%let year = “20&yr.”; /* “2014” */
%put &yr;
%put &nextyr;
%put &next2yr;
%put &fisyr;
%put &fisnextyr;
%put &year;
%mend setyr;
%setyr;

OUTPUT:
14
15
16
“2014/15”
“2015/16”
“2014”

B. Use Vvalue fuction
Vvalue() returns the formatted value that is associate with the variable.

data want;
data have;
new_charvar1 = vvalue(formatted_numvar1);  *formatted means the variable has been applied with format;
new_numvar2= vvalue(formated_charvar2);
run;