SAS Proc Tabulate: Order Class Variable

The following block of code provides a cross-tabulation by student’s home faculty and responsible faculty who students took course from.  The special treatment in the code is the two class statements used for setting order for the home faculty class variable based on format of the home faculty.  The second class statement where the responsible class variable is defined doesn’t have the “/order” option, therefore the order for responsible variable will be based on internal value.

proc tabulate data=count;
class home /order= formatted ;
class responsible;
var ffte;
table (home="Home Faculty" all), (responsible="Responsible Faculty" all)*(count="")*(sum="");
run;

SAS: Data Set Compilation through Remote Submit and Download

Sometime it is better to submit the block of codes to the server for execution when the data sets are large and multiple years of data needed to be  compiled.  The remote calculation can also free up the calculation capacity on your own PC, so you can work on something else while waiting for the server.

The following codes are used to:

  • loop through 5 reporting periods each year;
  • retrieve specific variables that meet the conditions;
  • compile data set for each year;
  • stack data sets for multiple years and generate final data set with all years records for download;
  • delete data set for each year in the workspace on the server

The caveats for remote submit are:

  • codes need to be thoroughly validated locally before submit to the server
  • use “nodetails” and “nolist” to omit output report

To remote submit the code, you first select the code block,  then right click and then select “Remote Submit…”.

%let varlist1 = var1 var2 var3 var4 var5 var6 var7 ;
proc datasets library = work nodetails nolist;
delete opsis;
run;
%macro opsis (strtyear, stopyear);
%do yr= &strtyear %to &stopyear;
%put &yr.;
%let nextyr = %eval(&yr.+1);
%let charyr = %substr(&yr.,3,2);
%let charnextyr = %substr(&nextyr.,3,2);
%put &charyr.;
DATA opsis_&charyr.;
SET opsis.uecJun&charyr. (keep=&varlist1.)
opsis.uecJul&charyr. (keep=&varlist1.)
opsis.uecNov&charyr. (keep=&varlist1.)
opsis.uecFeb&charnextyr. (keep=&varlist1.)
opsis.uecMar&charnextyr. (keep=&varlist1.);
where var1 ne 'XX' and var2 ne '' and var3= 'XXX';
proc append base=opsis data =opsis_&charyr.;
run;
%end;
%mend opsis;
%opsis (2009, 2017);
proc download data = opsis out=opsis;run;
proc datasets library=work nodetails;
delete opsis_09 opsis_10-opsis_17;
run;

SAS: Proc Freq for all Character Variables

Before processing the large data set, it is a good practice to check the levels of the categorical varaible and make sure records are assigned to the right category and missing categories are treated.

Use _character_ in the freqency procedure to apply the procedure to all character variables.  Be careful not to include any character variables that are numeric in nature, eg id variable in the data set.  Using “nlevels” option to provide a summary table of number of the levels, missing levels, non-missing levels for each categorical variable.  Then check the frequency table for the specific categorical variable for the values of the levels.

proc freq data= table1 nlevels;
tables _character_ / nocum nopercent out=freqcnt;
run;
Number of Variable Levels
Variable Label Levels Missing Levels Nonmissing Levels
gender Gender Code 4 0 4
Var2 label2 12 1 11
Gender Code
gender Frequency
F 495320
M 402399
U 3475
{NUL} 1

SAS: Sample Data Sets

SAShelp and SASuser data library contain many data sets for the base SAS usages.  But for components other than base SAS, such as SAS/EST and SAS/STAT, the examples used in the user manual are not directly in the SAShelp and SASuser libraries.  Need to use the “Help” menu in SAS to access the datasets that are referred in the these user manuals.

  1. Help Menu
  2. SAS Help and Documentation
  3. Contents Tab
  4. Learning to Use SAS
  5. Sample SAS Programs
  6. Relevant Compontent (eg: SAS/STAT, SAS ETS)
  7. Samples

SAS Array: check non zero variables and calculate accumulative counts

Course1 contains the counts information for students who have taken crs1000 to crs 2000.   If the student haven’t taken the specific course, the value for the course of the student is 0. If the student have taken the specific course for multiple times, the value for the course is the total number of times the student took the course.

In following code, the array was set up to group the course of interest; count variable is used to capture number of distinct course taken from the selected courses; ifretake variable is used to track if student retakes  any of the course from the selected courses.

data course2;
set course1;
array crs{*} crs1000 crs1010 crs1015 crs1200 crs1450 crs2000;
count=0;
ifretake =0;
do i =1 to dim(crs);
if crs{i}>0 then count = count+1;
if crs{i}>1 then ifretake = 1;
end;
run;

SAS: Read files in folder and compile list of filenames

In a prvious post, “Filename: Create SAS dataset of folder hierarchical structure and file list“, I have described a way to extract filenames in a folder or subfolder to a file. Here is a different way of doing it.

Reference: Reading and Processing the Contents of a Directory by Ben Cochran, The Bedford Group, Raleigh, NC. https://www.mwsug.org/proceedings/2012/S1/MWSUG-2012-S128.pdf

The following SAS codes get the filenames from L:\DaraFiles directory and pass them to a macro variable called varlist.

%let varlist =;
data null;
rc =filename("mydir", "L:\DataFiles");
did = dopen("mydir");
if did > 0 then do;
num =dnum(did);
do i = 1 to num;
fname =dread (did, i );
put fname = ;
call symput('varlist', trim(resolve('&varlist'))||' '||trim(fname));
end;
end;
run;
%put &varlist.;

Key Functions:

  • filename: assign the directory to a alias.
  • dopen: open the directory and get the directory identifier;
  • dnum: get the number of member in the directory;
  • dread: get the filenname of the directory member;

SAS: Concatenate Values in Multiple Rows by Group

For example, data table contains student grades  for the term. Each grade is one record and the term, program, student id information are the same  for the student.  The following data step can be used to order the data table and then concatenate the grade records into something like ‘A/A/B/B+/A’ for each student.

proc sort data= one;
by groupvar1 groupvar2 groupvar3;
run;
data two;
set one;
by groupvar1 groupvar2 groupvar3;
firstgroup = first.groupvar2;
lastgroup = last.groupvar2;
retain catvar;
if firstgroup =1 then catvar=trim(targetvar);
else catvar=catx('/',catvar, targetvar);
if lastgroup ne 1 then delete;
keep groupvar1 groupvar2 catvar;
run;

SAS: Proc Tabulate and Percentage Statistics

The following proc tabulate procedure will generate at a report on sum of undergraduate teaching and graduate teaching and FTE for each Faculty by departments (row header) and by year (column header).

proc tabulate data=data;
class year faculty dept;
var undergrad_load graduate_load fte;
table faculty, (dept all), year*(undergrad_load="UG" graduate_load="GR" fte)*(sum="");
run;

Data table contains grade, program, and incoming year of students. “Total” variable equals to 1 if the student has a grade; “ge75” is 1 if the grade is great than and equal to 75; and “ge80” is 1 if the grade is great than and equal to 80.

The following proc tabulate procedure gives a summary table of total number of counts of students with grade, students with grade great than and equal to 75 and 80.

proc tabulate missing f=6. noseps data=data;
class program progname category year level2;
var total ge75 ge80;
tables category,program*progname*level2,year*(total ge75 ge80) / rts=40 indent=3;
run;

PERCENTAGE STATISTICS

  • reppctn: report percentage (all rows and all columns add up to 100)
  • colpctn: column percentage (every column adds up to 100)
  • rowpctn: row percentage (every row adds up to 100)
  • A*pctsum: construct a custom denomination, both A and B are analysis variables, and B as in the is the denominator for calculating the percentage.

The following tabulate procedure calculates the percentage of greater and equal to 75 and 80. The outomatic SAS naming with ‘1111111’ of the percentage output reflects the number of class variables in the procedure.

proc tabulate missing f=6. noseps data=data out=dataout (drop=_type_ _table_ _page_ rename=(ge75_pctsum_1111111=ge75 ge80_pctsum_1111111=ge80));
class level1 faculty program progname category level2 year;
var total ge75 ge80;
tables category*level1*faculty*program*progname*level2*year, ge75*pctsum='ge75%'*f=8.2 ge80*pctsum='ge80%'*f=8.2 / rts=40 indent=3;
run;

Examples: with multiple class variables in row expression

  • Reppctn/Reppctsum
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*reppctsum="" );
run;
  • Colpctn/Colpctsum — best for % sum for multiple class variables
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*colpctsum="" );
run;
  • Rowpctn/Rowpctsum
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*rowpctsum="" );
run;
  • pctsum – not working for the All column
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all="All Year" )*(heads="Heads"*SUM="" heads="% Total"*pctsum < agegroup all gender all > ="" );
run;

Show % sign in the table

Reference: http://support.sas.com/kb/38/001.htmlhttp://support.sas.com/kb/38/001.html

proc format;                           
   picture fmtpct (round) low-high='009.99%';   
run;
/* use: f=fmtpct. */

SAS Proc Sgplot: Assign colors by group in Statistics Plot

Reference: https://blogs.sas.com/content/iml/2012/10/17/specify-the-colors-of-groups-in-sas-statistical-graphics.html; https://blogs.sas.com/content/graphicallyspeaking/2012/02/27/roses-are-red-violets-are-blue/

 

Original code use gplot

%macro plots;
%do i = 0 %to 1;
%if &i=0 %then %do;
title height=12pt "Regular Salary Group";
%end;
%else %do;
title height=12pt "High Salary Group";
%end;
%do j = 2 %to 4;
%if &j=2 %then %do;
title2 height=14pt "&curr_fiscal_year  Assistant Professors";
%end;
%else %if &j=3 %then %do;
title2 height=14pt "&curr_fiscal_year  Associate Professors";
%end;
%else %do;
title2 height=14pt "&curr_fiscal_year Full Professors";
%end;
proc gplot data=anno_&i(where=(rank_cd=&j)) anno=anno_&i(where=(rank_cd=&j));
plot y*x / haxis=axis1 vaxis=axis2 noframe;
symbol1 v=dot h=.6 w=.6 color='Black';
format basesal comma7.;
run;
quit;
%end;
%end;
%mend plots;
%plots;

New code with the following improvements:

  1. produce regression line with color and transparency attributes
  2. produce 95% confidence limit with color and transparency attributes
  3. produce scatter plot by gender group and use grouporder attribute to make sure fixed color is assigned to Male and Female.
  4. output the plots to pdf

 

ods pdf file="X:regression_with_gender_label.pdf";
goptions reset = global;
%macro plots;
%do i = 0 %to 1;
%if &i=0 %then %do;
title height=12pt "Regular Salary Group";
%end;
%else %do;
title height=12pt "High Salary Group";
%end;
%do j = 2 %to 4;
%if &j=2 %then %do;
title2 height=14pt "&curr_fiscal_year Assistant Professors";
%end;
%else %if &j=3 %then %do;
title2 height=14pt "&curr_fiscal_year Associate Professors";
%end;
%else %do;
title2 height=14pt "&curr_fiscal_yearFull Professors ";
%end;
proc sgplot data=log_glm (where=(rank_cd=&j. and area = "&i.")) ;
scatter x=exper y=log_sal / group=gender grouporder=ascending name='plot' markerattrs=(symbol=circlefilled);
series x=exper y=yhat / name='predict' legendlabel='ln(Predicted Log Sal)' lineattrs=(color=blue ) transparency = 0.5 ;
series x=exper y=ucl / name='upper' legendlabel='Upper Confidence Level' lineattrs=(color = lightblue) transparency = 0.5;
series x=exper y=lcl / name='lower' legendlabel='Lower Confidence Level' lineattrs=(color = lightblue) transparency = 0.5;
run;
quit;
%end;
%end;
%mend plots;
%plots
Ods pdf close;

Problem: myattrmap not reconginzed by the scatter statement though ‘attrid= myid’ doesn’t generate error.

data myattrmap;
retain id value linecolor fillcolor;
length linecolor $ 9 fillcolor $ 9;
input ID $ value $ linecolor $ fillcolor $;
datalines;
myid F blue blue
myid M red red;
run;