SAS: Change Variable Length in one data step

Need to put Length statement before the Set statement to change the length of existing variable. Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of variable in the Set dataset.
thekey and program variables in the mapping dataset are both $7 in length.  The following data step change the length of these two variables to $10.

data mapping;
length thekey $10 program $10;
set mapping;
run;

SAS: Proc Boxplot and Proc Sgplot

It is easier to use proc sgplot than proc boxplot to compare distibution by classification variable.  “Drive Train” and “Type” are both categorical variables.

proc sgplot data=sashelp.cars;
title "Price distribution by Drive Train and Type";
vbox invoice / category =type group = drivetrain;
run;
  • side by side comparison
  • group became legend
  • applied legend color by the group
  • inset statement for sgplot doesn’t have statistics output (n/min/max/mean/stddev)

SGPlot22

proc sort data=sashelp.cars out=cars;
by drivetrain type;
run;
proc boxplot data=cars;
title "Price distribution by Drive Train and Type";
plot invoice*type;
by DriveTrain;
inset min mean max stddev/ header = "Overall Statistics";
insetgroup min max / header = "Cheap and Expensive by Type";
run;
  • need to sort the data first according to by statement and plot categorical variable;
  • plot in light blue; want other color, need extra code
  • not able to show 2 categorical variable plot side by side;
  • use by statement use produce plot separately.
  • inset and insetgroup are nice to have to produce stats as part of the plot.
    • inset: data, min, max, mean, nmax, nmin, dobs, stddev;
    • insetgroup: max, mean, min, n, nhigh, nlow, nout, q1, q2, q3, range, stddev;

Boxplot22

Boxplot24

SAS: Use Filename to Create data set of hierarchical folder structure and file list

FILENAME contents pipe 'TREE "x:\project" /F /A' LRECL = 2000;
DATA project;
infile contents TRUNCOVER;
INPUT content_entry $char2000.;
content_entry = left(compress(content_entry, "|"));
run;

TREE: disply each directory within the listed directory. /F: display the names of the files within each directory.

proc sql noprint;
select count (*) into : file_exist
from project
where content_entry = "table1.sas7bdat";
quit;
%put file_exist = &file_exist.;

SAS: Compile varlist and run report for each variable

Recently, I found Proc Sql can allow you to create macro variable with long string values separated by either space or other delimiters directly without extra step for table generation and null data step. Here is the modify codes.

proc sql;
select distinct varname into :varlist separated by ' ' from table;
quit;
%put &varlist.;

Sometime the macro variable is created within a macro, then the macro variable will be just local macro variable and it will not be available beyond the macro statement. To force the macro variable to be available in the global environment, using %Global.

%macro new;
proc sql;
select distinct varname into :varlist separated by ' ' from table;
quit;
%global varlist;
%mend new;
%put &varlist.;

If you would like the varlist items to be wrapped with quotation and separated with comma, then use the following code.

%macro new;
proc sql;
select distinct '"'||varname||'"' as temp into :varlist separated by ', ' from table;
quit;
%global varlist;
%mend new;
%put &varlist.;

If the same macro variable is defined or used in multiple macros, making sure the macro variable is set at the global level, because macro variables are local at default. You might get error message like below if the macro variable is not forced to be global variable.

ERROR: Attempt to %GLOBAL a name (NAME) which exists in a local environment.

Here is an example of use previously created macro variable in another macro.

%macro del;
%global varlist;
proc datasets nolist;
delete &varlist;
run;
%mend del;

previous reference: http://www2.sas.com/proceedings/sugi30/028-30.pdf

proc sql;
create table varlist as
select distinct varname from table;
quit;

%let varlist =;

data _null_;
set varlist;
call symput('varlist', trim(resolve('&varlist'))||' '||trim(varname));
run;

null is a special data set name, although sas doesn’t create a data set called “_null”. It allows sas to carry out commands in the data steps.  There is no output data set.

Omitting value in the %let syntax will produce a null value for the macro variable.

resolve function: resolve the value of macro variable in data step.

CALL SYMPUT  assigns value produced in a DATA step to macro variable(S).

SAS: Assign Order Number by group and Calculating Accumulative Total in dataset using Retain Statement

Assigning group order for a set sequence of records.

data temp2;
retain order;
set temp1;
if _n_ =1 then order=1; /* _n_=1 identifies the first record.
else if var1 ne lag(var1) then order= sum(order,1);
else order = order ; 
run;
  • Lag(var) the value of previous record in the var column.
  • retain statement needs to be before the set statement.

Calculating the running total of weight and height for each record. The Retain statement retains the running total from the previous iteration to the next.

data one;
length AccWeight AccHeight 8.;
retain AccWeight 0 AccHeight 0;
set sashelp.class end=eof;
AccWeight = sum(AccWeight, weight);
AccHeight = sum(AccHeight, Height);
run;

The following codes deal with cumulative sum by the group where the variable for calculation contains null values. The variable regs is either 1 or 0 or null value. Sorting by the regs variable will force the the record with a regs value of 1 to be last record by the desired group.

Proc sort data= apps out = sortapp;
by id app_num regs;
run;
data sortapp1;
set sortapp;
by id app_num;
retain sumreg;
firstnum = first.a_num;
lastnum = last.a_num;
if first.a_num then sumreg = sum(regs);
else sumreg = sum(sumreg, regs);
run;

SAS: Identify first and last record by group

Staff are in two wage category “H” and “S” . Sort the table by wage category and wage rate.  Firstrate will equal to 1 if it is the first record in the wagecategory, and the lastrate will equal to 1 if it is the last record in the wagecategory.

proc sort data=sasuser.staff out =sortstaff;
by wagecategory wagerate;
run;
data sortstaff;
set sortstaff;
by wagecategory;
firstrate = first.wagecategory;
lastrate = last.wagecategory;
run;

SAS: Proc Transpose long to wide

  • Need to sort by the variables in the by statement before using the proc transpose.
  • variables in the by statement are row labels;
  • contents in the id variable become the column labels;
  • var variable is the value of the table.
proc transpose data=sum out=sumwide;
by program academicyear yearlevel;
id type;
var count;
run;

SAS: Proc Summary and Statistics Output

SUM ONE VARIABLE

%MACRO summ_prg (data=, out=);
proc summary nway missing data=&data ;
class year program level1 level2;
id progname faculty category ;
var value;
output out=&out(drop=_freq_ _type_)  sum(value)=;
run;

%mend summ_prg;

SUM MULTIPLE VARIABLES

%MACRO summ_prg (data=, out=);
proc summary nway missing data=&data ;
class year program level1 level2;
id progname faculty category ;
var value1 value2 value3;
output out=&out(drop=_freq_ _type_)  sum(value1 value2)=;
run;

%mend summ_prg;

To sum all the numerica variables, use var

_numeric_

and sum=.

CONDITIONAL SUM VARIABLES

%MACRO summ_prg (data=, out=);
proc summary nway missing data=&data (WHERE = (CON_VAR = "XXX ")) ;
class year program level1 level2;
id progname faculty category ;
var value1 value2 value3;
output out=&out(drop=_freq_ _type_)  sum(value1 value2)=;
run;

%mend summ_prg;

OUTPUT DESCRIPTIVE STATISTICS

%MACRO summ_prg (data=, out=);
proc summary nway missing data=&data ;
class year program level1 level2;
id progname faculty category ;
var value;
output out=&out(drop=_freq_ _type_) n(value) = N mean(value)=mean q1(value)=q1 q3(value) =q3;
run;
%mend summ_prg;

SAS: Pros and Cons of Methods to Output SAS Results to Excel

  1. ODS tagset.excelxp
    • Con:can only output to .xml file, not xls or xlsx files;
    • Pro: use Proc Report by statement and output to separate spreadsheets;
  2. ODS html
    • Pro: can output to xls files;
    • Pro : use Proc Report can control label, column width and text wrapping;
    • Con: can’t use Proc Report by statement to output to separate spreadsheets;
ODS HTML BODY = 'C:\cars.xls' style=minimal;
PROC FREQ DATA=sashelp.cars ;
TABLES make*type type origin driveTrain enginesize cylinders horsepower mpg_city ;
ODS OUTPUT OneWayFreqs(match_all)=freqs;
RUN;
ODS HTML CLOSE;
DS HTML BODY = 'C:\cars.xls' style=minimal;
PROC MEANS DATA=sashelp.cars ;
VAR MSRP horsepower mpg_city;
CLASS type origin driveTrain ;
TYPES () type typeorigin typeorigin*drivetrain;
RUN;
ODS HTML CLOSE;

Use PROC Univariate to get histogram by group

proc univariate data =sample.college NOPRINT;
class region type;
histogram tuition / normal (color=red)
cfill =itgray
ctext =blue;
inset N= ‘Number of colleges’ Median (8.2) mean (8.2) std = ‘Standard Deviation’ (8.3) / position = ne;
run;

The following histograms illustrate the distribution of tuition by region and type of college.  You can get a general understanding that private colleges are much more expensive than the public colleges in certain region.

Histogram5