SAS: Plot Regression Line with 2 Standard Deviations

Save the GLM output to a data set

proc glm data=data noprint;
class rank_cd area;
model salary= Experience rank area / predicted cli; * regress SALARY against the 3 predictor variables ;
output out = glm_out Predicted = yhat R=resid lcl=lcl lclm=lclm ucl=ucl uclm=uclm rstudent=rstd student=stu dffits = infl stdr =error;
run;
quit; * PROC GLM is interactive; QUIT ends it;

Calculate the sample standard deviation of the residuals from the GLM output data set

proc univariate data=glm_out;
var resid;
output out = univar_out STD = sample_std_devn;
run;

Assign the sample standard deviation to a macro variable

data _null_; * no output data set is needed;
set univar_out;
call symputx('sstd',sample_std_devn); * SYMPUTX converts the number without padding blanks;
run;
%put &sstd.;
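The PROC UNIVARIATE step and the DATA step above can also be collapsed into one PROC SQL query. The sketch below is self-contained, so it fits a hypothetical stand-in model on SASHELP.CLASS (Weight on Height) instead of the salary data; substitute your own GLM_OUT. The data set name FIT_DEMO is illustrative only. The SQL STD() summary function, like STD= in PROC UNIVARIATE, uses the n-1 denominator.

```sas
/* Hypothetical stand-in model: Weight on Height in SASHELP.CLASS.   */
/* Replace FIT_DEMO with the GLM_OUT data set from the steps above.   */
proc glm data=sashelp.class noprint;
  model weight = height;
  output out=fit_demo predicted=yhat r=resid;
run;
quit;

/* STD() returns the sample standard deviation (n-1 denominator).    */
/* FORMAT=BEST16. keeps more digits than the default BEST8.           */
/* conversion, and TRIMMED removes leading and trailing blanks.       */
proc sql noprint;
  select std(resid) format=best16.
    into :sstd trimmed
    from fit_demo;
quit;

%put NOTE: residual standard deviation = &sstd.;
```

One query replaces two steps and leaves no intermediate data set behind.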

Add the ±2 standard deviation limits to the GLM output data set

data glm_out;
set glm_out;
ustd_2 = yhat + 2*&sstd.;
lstd_2 = yhat - 2*&sstd.;
run;

Plot the regression line with upper and lower 2 standard deviation lines

* SERIES joins points in data order, so sort by the X variable first;
proc sort data=glm_out;
by experience;
run;

proc sgplot data=glm_out (where = (rank = 1 and area = 1));
scatter x=experience y=salary / group=gender grouporder=ascending name='plot' markerattrs=(symbol=circlefilled) ;
series x=experience y=yhat / name='predict' legendlabel='ln(Predicted Sal)' lineattrs=(color=blue ) transparency = 0.5 ;
series x=experience y=ustd_2 / name='upper' legendlabel='2 Standard Deviations' lineattrs=(color = lightblue) transparency = 0.5;
series x=experience y=lstd_2 / name='lower' legendlabel='2 Standard Deviations' lineattrs=(color = lightblue) transparency = 0.5;
keylegend 'plot' 'predict' 'upper'; * list the 2 SD lines only once in the legend;
run;
quit;
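As an alternative to two separate SERIES lines, the ±2 SD region can be drawn as one shaded BAND. This is a sketch, not the original method: it uses SASHELP.CLASS (Weight on Height, grouped by Sex) as a hypothetical stand-in for the salary data so that it runs on its own, and the names FIT_DEMO and SD_DEMO are illustrative. The BAND statement is listed first so it is drawn behind the points and the fitted line.

```sas
/* Stand-in fit: regress Weight on Height */
proc glm data=sashelp.class noprint;
  model weight = height;
  output out=fit_demo predicted=yhat r=resid;
run;
quit;

/* Residual standard deviation into a macro variable */
proc means data=fit_demo noprint;
  var resid;
  output out=sd_demo std=sd;
run;

data _null_;
  set sd_demo;
  call symputx('sstd', sd);
run;

/* Add the +/- 2 SD limits */
data fit_demo;
  set fit_demo;
  ustd_2 = yhat + 2*&sstd.;
  lstd_2 = yhat - 2*&sstd.;
run;

/* Sort by X so BAND and SERIES are drawn left to right */
proc sort data=fit_demo;
  by height;
run;

proc sgplot data=fit_demo;
  band x=height lower=lstd_2 upper=ustd_2 / name='band'
       legendlabel='+/- 2 Std. Dev.' transparency=0.7;
  scatter x=height y=weight / group=sex grouporder=ascending name='obs'
          markerattrs=(symbol=circlefilled);
  series x=height y=yhat / name='fit' legendlabel='Predicted'
         lineattrs=(color=blue);
  keylegend 'obs' 'fit' 'band';
run;
```

A single band gives one legend entry for the limits and avoids the duplicate-label issue of two lines.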

SAS Proc Sgplot: Assign colors by group in statistical plots

Reference: https://blogs.sas.com/content/iml/2012/10/17/specify-the-colors-of-groups-in-sas-statistical-graphics.html; https://blogs.sas.com/content/graphicallyspeaking/2012/02/27/roses-are-red-violets-are-blue/


Original code using PROC GPLOT

%macro plots;
%do i = 0 %to 1;
%if &i=0 %then %do;
title height=12pt "Regular Salary Group";
%end;
%else %do;
title height=12pt "High Salary Group";
%end;
%do j = 2 %to 4;
%if &j=2 %then %do;
title2 height=14pt "&curr_fiscal_year  Assistant Professors";
%end;
%else %if &j=3 %then %do;
title2 height=14pt "&curr_fiscal_year  Associate Professors";
%end;
%else %do;
title2 height=14pt "&curr_fiscal_year Full Professors";
%end;
proc gplot data=anno_&i(where=(rank_cd=&j)) anno=anno_&i(where=(rank_cd=&j));
plot y*x / haxis=axis1 vaxis=axis2 noframe;
symbol1 v=dot h=.6 w=.6 color='Black';
format basesal comma7.;
run;
quit;
%end;
%end;
%mend plots;
%plots;

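New code using PROC SGPLOT, with the following improvements:

  1. produce the regression line with color and transparency attributes
  2. produce 95% confidence limits (UCL/LCL) with color and transparency attributes; in PROC GLM's OUTPUT statement, UCL/LCL are limits for an individual prediction, while UCLM/LCLM are limits for the mean
  3. produce a scatter plot grouped by gender, using GROUPORDER=ASCENDING so that Female and Male get the same colors in every plot (as long as both genders appear in each subset)
  4. output the plots to PDF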


ods pdf file="X:regression_with_gender_label.pdf";
goptions reset = global;
%macro plots;
%do i = 0 %to 1;
%if &i=0 %then %do;
title height=12pt "Regular Salary Group";
%end;
%else %do;
title height=12pt "High Salary Group";
%end;
%do j = 2 %to 4;
%if &j=2 %then %do;
title2 height=14pt "&curr_fiscal_year Assistant Professors";
%end;
%else %if &j=3 %then %do;
title2 height=14pt "&curr_fiscal_year Associate Professors";
%end;
%else %do;
title2 height=14pt "&curr_fiscal_year Full Professors";
%end;
proc sgplot data=log_glm (where=(rank_cd=&j. and area = "&i.")) ;
scatter x=exper y=log_sal / group=gender grouporder=ascending name='plot' markerattrs=(symbol=circlefilled);
series x=exper y=yhat / name='predict' legendlabel='Predicted Log Salary' lineattrs=(color=blue ) transparency = 0.5 ;
series x=exper y=ucl / name='upper' legendlabel='Upper Confidence Limit' lineattrs=(color = lightblue) transparency = 0.5;
series x=exper y=lcl / name='lower' legendlabel='Lower Confidence Limit' lineattrs=(color = lightblue) transparency = 0.5;
run;
quit;
%end;
%end;
%mend plots;
%plots
Ods pdf close;
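GROUPORDER=ASCENDING fixes the order of the groups, not the colors themselves. In SAS 9.4 and later, a STYLEATTRS statement lets you choose the colors that are handed out in that order. A minimal sketch, using SASHELP.CLASS (SEX = F/M) as a hypothetical stand-in for LOG_GLM; red and blue are arbitrary choices. Colors are still assigned by position, so a subset containing only M would get the first color. Binding a color to a specific value needs a discrete attribute map, which is what the next section attempts.

```sas
/* SASHELP.CLASS stands in for LOG_GLM; SEX takes the values F and M */
proc sgplot data=sashelp.class;
  /* Marker colors come from this list in group order:            */
  /* with GROUPORDER=ASCENDING, F gets red and M gets blue         */
  styleattrs datacontrastcolors=(red blue);
  scatter x=height y=weight / group=sex grouporder=ascending
          markerattrs=(symbol=circlefilled);
  reg x=height y=weight / nomarkers lineattrs=(color=gray);
run;
```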

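Problem: myattrmap is not recognized by the scatter statement, even though ATTRID=myid doesn't generate an error. Likely causes: scatter markers take their color from a MARKERCOLOR column (LINECOLOR and FILLCOLOR apply to lines and fills), the map must also be named with DATTRMAP=myattrmap on the PROC SGPLOT statement, and the VALUE column must match the formatted GENDER values exactly. In addition, the last data line contained the closing semicolon; SAS treats a line with a semicolon as the end of in-stream data, so the M row was not read. A corrected map data set:

data myattrmap;
retain id value linecolor fillcolor markercolor;
length linecolor $ 9 fillcolor $ 9 markercolor $ 9;
input ID $ value $ linecolor $ fillcolor $ markercolor $;
datalines;
myid F blue blue blue
myid M red red red
;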
run;
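A hedged sketch of how a discrete attribute map is usually wired up end to end: DATTRMAP= on the PROC SGPLOT statement names the map data set, and ATTRID= on the plot statement picks the ID inside it. SASHELP.CLASS, whose SEX values are F and M, stands in for LOG_GLM, and MYATTRMAP_DEMO is a hypothetical name. Whether this resolves the original problem depends on the actual GENDER values in LOG_GLM, which is an assumption here.

```sas
/* Hypothetical map; VALUE must match the formatted group values exactly */
data myattrmap_demo;
  length id $ 8 value $ 8 markercolor $ 9 linecolor $ 9;
  input id $ value $ markercolor $ linecolor $;
  datalines;
myid F blue blue
myid M red red
;
run;

/* DATTRMAP= names the map data set; ATTRID= picks the ID within it.  */
/* The WHERE keeps only M, yet the markers stay red because the color */
/* is bound to the value M, not to its position among the groups.     */
proc sgplot data=sashelp.class dattrmap=myattrmap_demo;
  where sex = 'M';
  scatter x=height y=weight / group=sex attrid=myid
          markerattrs=(symbol=circlefilled);
run;
```

Inside the BY-group macro, the same DATTRMAP=/ATTRID= pair would keep Female and Male colors stable even in subsets where one gender is missing.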

SAS: Proc Boxplot and Proc Sgplot

It is easier to use PROC SGPLOT than PROC BOXPLOT to compare distributions across the levels of classification variables. “Drive Train” and “Type” are both categorical variables.

proc sgplot data=sashelp.cars;
title "Price distribution by Drive Train and Type";
vbox invoice / category =type group = drivetrain;
run;
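  • side-by-side boxes for each Type, grouped by Drive Train
  • the GROUP= variable becomes the legend
  • each group gets its own color, matched in the legend
  • the SGPLOT INSET statement only displays text you supply; it does not compute statistics such as n/min/max/mean/stddev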
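Because INSET only prints text, one workaround is to compute the statistics first and pass them in through macro variables. A minimal sketch on SASHELP.CARS; the macro variable names (inv_n, inv_min, and so on) are hypothetical, and the statistics are overall values, similar to PROC BOXPLOT's INSET rather than per box.

```sas
/* Compute overall statistics into macro variables, formatted for display */
proc sql noprint;
  select count(invoice),
         min(invoice)  format=dollar10.,
         mean(invoice) format=dollar10.,
         max(invoice)  format=dollar10.,
         std(invoice)  format=dollar10.
    into :inv_n trimmed, :inv_min trimmed, :inv_mean trimmed,
         :inv_max trimmed, :inv_sd trimmed
    from sashelp.cars;
quit;

proc sgplot data=sashelp.cars;
  title "Price distribution by Drive Train and Type";
  vbox invoice / category=type group=drivetrain;
  /* INSET prints the precomputed values as text lines */
  inset "N = &inv_n" "Min = &inv_min" "Mean = &inv_mean"
        "Max = &inv_max" "Std Dev = &inv_sd" / position=topleft border;
run;
```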

[Figure: PROC SGPLOT box plot of invoice by Type, grouped by Drive Train]

proc sort data=sashelp.cars out=cars;
by drivetrain type;
run;
proc boxplot data=cars;
title "Price distribution by Drive Train and Type";
plot invoice*type;
by DriveTrain;
inset min mean max stddev/ header = "Overall Statistics";
insetgroup min max / header = "Cheap and Expensive by Type";
run;
  • the data must first be sorted by the BY variable and then by the categorical variable in the PLOT request;
  • boxes are drawn in light blue; other colors need extra code;
  • two categorical variables cannot be shown side by side in one plot;
  • the BY statement produces a separate plot for each Drive Train value;
  • INSET and INSETGROUP are nice to have for showing statistics as part of the plot:
    • inset: data, min, max, mean, nmax, nmin, dobs, stddev;
    • insetgroup: max, mean, min, n, nhigh, nlow, nout, q1, q2, q3, range, stddev;
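If you want PROC BOXPLOT's one-plot-per-Drive-Train layout without the PROC SORT step, PROC SGPANEL can put each Drive Train value in its own cell of a single graph. A minimal sketch on SASHELP.CARS; ROWS=1 and NOVARNAME are layout choices, not requirements.

```sas
proc sgpanel data=sashelp.cars;
  title "Price distribution by Drive Train and Type";
  /* One cell per DriveTrain value, side by side; no sorting needed */
  panelby drivetrain / rows=1 novarname;
  vbox invoice / category=type;
run;
```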

[Figures: PROC BOXPLOT output, one plot per Drive Train value]