SAS Installation: ” Connect: Class not registered “

Issue: Installed SAS 9.4 on the machine where SAS 9.3 has already been installed. Not able to import Excel file in SAS 9.4.
Error message:
ERROR: Connect: Class not registered
ERROR: Error in the LIBNAME statement
Configuration:
– office 365 (32 bit)
– office 2010 (64 bit)
– Windows 7 (64 bit)
– SAS 9.3
– SAS 9.4 (64 bit)
– odbc 64bit: C:\Windows\System32\odbcad32.exe
– odbc 32bit: C:\Windows\SysWOW64\odbcad32.exe

Possible Reason: SAS PC Files Server not installed or not working properly. Need Excel ODBC driver in the same bitness as in SAS.
reference: http://support.sas.com/kb/60/356.html
Attempt 1: Stop SAS PC Files Server in Service (administrative tools); uninstal SAS PC Files Server; download SAS PC Files Server from (pcfilesrv__94200__wx6__en__web__1.zip) from https://support.sas.com/downloads/package.htm?pid=2167; unzip, install from setup.exe; get Java private environment error when installing PC Files Server.
Results 1: not working

Possible Reason: SAS 9.4 is not connected to the right ODBC. ODBC 32bit has Excel in the driver list but ODBC 64bit doesn’t have Excel in the driver list.
reference: https://support.microsoft.com/en-ca/help/942976/odbc-administrator-tool-displays-both-the-32-bit-and-the-64-bit-user-d
Attempt 2: run as administrator on both ODBC files; add excel driver as the system data source in the “System DSN” tab; change target from “\system32\odbcad32.exe” to \syswow64\odbcad32.exe” in the property of “Data Sources (ODBC) in Administrative Tools
Results 2: not working

Possible Reason: office 365 doesn’t have the match Access Engine for program external to Microsoft Office to access the Office 365 application data
Attempt 3:
— download Access Database engine 2016 distribution (32bit)
— install to cmd, cd to where the installation file is saved, run AccessDatabaseEngine.exe /quiet /passive
— open registry editor on window 7: windows key + r -> run regedit -> yes
— navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\16.0\Common\FilesPaths, delete mso.dll
reference: http://support.sas.com/kb/43/933.html
https://knowledge.autodesk.com/support/autocad-map-3d/learn-explore/caas/screencast/Main/Details/7132d430-dc09-40bc-9148-94ff9db41c24.html
Results 3: not sure why but WORKED, but doesn’t work on the Office 365 32bit installation.

Attempt 4:
— uninstall Microsoft Office 365 (32bits)
— install Microsoft Office 365 (64bits)
— uninstall Access Database engine 2016 distribution (32bits)
— install Access Database engine 2016 distribution (64bits)
— install to cmd, cd to where the installation file is saved, run AccessDatabaseEngine.exe /quiet /passive
— open registry editor on window 7: windows key + r -> run regedit -> yes
— navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\16.0\Common\FilesPaths, delete mso.dll
Results 4: worked. matched SAS 64bits with Office 365 64bits

SAS: Find String Characters within String Variable

Find “Mike” in the Name variable. Use scan function when you know the position of the word within the variable and the delimiter. Use contains for general search. Use substr function when you know the starting position of the word within the variable.

  • Scan
data a;
set b;
where scan(name, 1, ' ') = "Mike";
run;
/* nested Scan example */
/* data string is like 'XX XX ABCD>XXXX' */ 
/* need to get the 'ABCD' portion of the string, but it can be different lengths */
data temp;
set set1;
temp =scan(scan(var, 3, ' '),1, '>');
keep temp;
run;
  • Contains
data a;
set b;
where upcase(name) contains "MIKE";
run;
  • Substr
data a;
set b;
where substr(name, 1, 4) = "Mike";
run;

SAS STAT: getting output statistics (2)

This post focus on the ODS Output for regression, including ANOVA, least square means, parameter estimates etc.

Each output from a SAS procedure has an associated name and label.  Each name is part of a name path, and each label is part of a label path.  In order to select, exclude, or modify an output table, you must first know its name.  Each level in the name path corresponds to a part of the procedure’s hierarchy of output that you can view in the Results window.

ods output 
parameterestimates = outputtable1 
fitstatistics =  outputtable2
overallanova = outputtable3
lsmeans = outputtable4
nobs = outputtable5
means = outputtable6;
ODS table name  Statement / Option Key Stats
ParameterEstimates Model / Solution Estimated linear model coefficients, t Value, Probability T, standard error
FitStatistics Default r-square, Coeff Var, Root MSE, dependent mean
OverallAnova Default DF, sum of squares, mean square, F value, Probability F
LsMeans LsMeans Least squares means are requested for each effect listed in the LSMEANS statement. LSMEANS can perform on classification variables and multiple comparison on interactions. LSMEANS are predicted population margins; that is, they estimate the marginal means over a balanced population.
Nobs Default Number of Observations
Means Means When variables in the Means statement is the same as in the Model statement, Means statement will provide the count of observations for each model effect or combination of effects.

Principal Component Analysis and Factor Analysis

Reference: A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling

Options:

  • flag: output to flag factor loading with absolute values greater than the specified size. eg flag = .40
  • method: extracting method
  • prin: principal factors for a principal component analysis
  • mineigen: critical eigenvalue to retain and rotate any component whose eigenvalue is specified or lager.
  • nfact: number of components to be retained and rotated

SAS: Standardize Variables

Standardized variables can be referred as Z score with a mean of 0 and standard deviation of 1. When multi-scale variables are enter into Regression analysis, the variables with larger variances will have more importance and influence on the results than the variables with small variances. It is important to standardized variables in the preprocessing step for regression analysis, cluster analysis, and neural network.

To calculate the standardized variable, use the non-standardized variable minus the mean and then divided by standard deviation.

Get means and standard deviation before standardizing the variables.

proc univariate data = data1 out=stat;
var var1 var2 var3;
run;

In the data set, the variables are not measured in the same units and cannot be assumed to have equal variance. We use PROC STDIZE to standardize the variables.

proc stdize data= data1 out=Stand method =std;
var var1 var2 var3;
run;

Get means and standard deviation after standardizing the variables. Means of the variables should equal 0 and standard deviations equal to 1.

proc univariate data = Stand out=stat;
var var1 var2 var3;
run;

SAS: Missing Value Imputation Procedures

Unfortunately that some of my earlier posted pages are missing from the WordPress site and I am not sure how it happened.

reference: https://www.youtube.com/watch?v=JQHAA_AGxrE
reference: https://www.youtube.com/watch?v=V2wBBB0wNRc

It is very common when conducting survey, the results are incomplete or having missing values. Most of the statistics procedures will drop the observation with missing values which will some time render the survey unable to draw conclusion due to not enough sample size. SAS provides some procedures to impute those missing values so the observation can be used for analysis.

/* --- Identify Missing Value --- */
proc format;
 value $missfmt ' ' = 'Missing' other = 'Not Missing';
 value missfmt .  = 'Missing' other = 'Not Missing';
run;

proc freq ;
format _char_ $missfmt.;
format _numeric_ missfmt.;
tables _char_ / missing nocum nopercent;
tables _numeric_ /missing nocum nopercent;
run;
/* --- Multiple Impuatation Method --- */
/* --- Step 1. Generate Imputed Samples --- */
proc mi data = data nimpute=20 seed = 135782 
    out = ImputedSample;
var v1-v3;
run;
quit;

/* --- Step 2. Fit the Generated Imputed Samples --- */
proc reg data=  ImputedSample outest=Estimates covout;
model v1 = v2-v3;
by _Imputation_;
run;
quit;

/* --- Step 3: Use Proc Mianalyze   --- */
proc mianalyze data = Estimates;
    modeleffects Intercept v2 v3;
run;
quit;
/* --- Direct maximum Likelyhood --- */
/* --- Step 1. Proc Calis --- */
proc calis data = d2 method = fiml;
path v1 <--- v2 v3;
run;
quit;

Numeric missing value in the SAS table some time shows as . (dot)and other times as _ (underscore). When the value of the variable are used to create categorical variable, please taking the null value into consideration. Sometimes the condition ‘ne .’ would not exclude null values but ‘gt .’ can. The following macro is used to identify the significance of the p value. There are some null value in the p-value variable

%macro sig (data =,stat=, var =);
%let i=1;
%do %while (%scan(&data., &i, ' ') ne );
%let dsn=%scan(&data., &i, ' ');
%put &dsn.;
data &dsn._&stat.;
set &dsn._&stat. ;
format sig&var. $4. prob&var.  pvalue6.4;
if Prob&var.=0.01  then sig&var.="*";
if Prob&var.=0.001   then sig&var.="**";
if Prob&var.=0.0001  then sig&var.="***";
if Prob&var.<0.0001 and Prob&var. gt .  then sig&var.="****"; /* exclude null value */
if Prob&var.<0.0001 and Prob&var. ne .  then sig&var.="****"; /* ne . doesn't work. **** shows for null p-value/
run;
%let i = %eval (&i +1);
%end;
%mend sig;
%sig (data =group1 group2, stat= anova, var= f);

SAS: Batch Macro Variable Assignment by Class and Application in Dynamic Labeling

Set up “Class” table for Proc Tabulation.  “Class” table can force missing value in the table so the layout of the combination of class variables will not change due to 0 counts of the frequency of the variables.

data class;
do rank= 1, 2, 3;
do area = '0', '1';
do gender = 'F', 'M';
output;
end;
end;
end;
run;

Output Proc Tabulate summary to data set using “Class” layout. “Layout” data set contains frequency counts by rank by area and by gender.

proc tabulate  data = data out = layout missing classdata = class;
class rank area gender ;
var count;
table rank*(area="" all), (gender all)*count*(sum="");
run;

Add customary titles by rank by area in the “Layout” data set. Assign counts and titles to macro variables.

data layout;
set layout;
length varname $11 sum $2 title1 $60 title2 $60 vartitle1 $12 vartitle2 $12;
if count_sum = . then count_sum = 0;
sum = put(count_sum, 2.);
drop _TYPE_ _PAGE_ _TABLE_;
if gender = '' then gender = 'N';
if rank = 1 then title1 = "2017 Assistant Professors/Lecturers";
if rank=2 then title1 = "2017 Associate Professors/Lecturers";
if rank =3 then title1 = "2017 Fall Professors/Senior Lecturers";
if area = 0 then title2 = "General Arts"; 
if area = 1 then title2 = "Engineering";
if area = '' then do;
title2 = "All Areas of Specialization";
varname ="rank"||trim(left(rank_cd))||gender;
vartitle1 = "rank"||trim(left(rank_cd))||'t1';
vartitle2 = "rank"||trim(left(rank_cd))||'t2';
end;
else do;
varname = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||gender;
vartitle1 = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||'t1';
vartitle2 = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||'t2';
end;
call symput(varname, sum);
call symput (vartitle1, title1);
call symput (vartitle2, title2);
run;
options symbolgen;

Application
Print the regression line by rank by area with corresponding titles and insert counts information of total observation and by gender on the respective chart. The “Inset” statement will translate the macro variables into format like N=75(46F/29M) and put into a text box inside of the axes of the plot.

ods pdf file="c:\test.pdf";
%macro plots;
%do j = 1 %to 3;
%do i = 0 %to 1;
title1 height=14pt "&&rank&j.area&i.t1";
title2 height=12pt "&&rank&j.area&i.t2";
proc sgplot data=glm_out (where=(rank=&j. and area = "&i.")) ;
scatter x=experience y=salary / group=gender grouporder=ascending  name='plot' markerattrs=(symbol=circlefilled) ;
series x=experience y=yhat / name='predict' legendlabel='ln(Predicted Sal)' lineattrs=(color=blue ) transparency = 0.5 ;
inset "N=&&rank&j.area&i.N (&&rank&j.area&i.f.F/&&rank&j.area&i.m.M)"  / position = bottomright  BORDER;
run;
quit;
%end;
%end;
%mend plots;
%plots
Ods pdf close;

SAS: Plot Regression Line with 2 Standard Deviation

Get GLM output in data set

proc glm data=data noprint;
class rank_cd area;
model salary= Experience rank area / predicted cli; * regress SALARY against the 3 predictor variables ;
output out = glm_out Predicted = yhat R=resid lcl=lcl lclm=lclm ucl=ucl uclm=uclm rstudent=rstd student=stu dffits = infl stdr =error;
run;

Calcuate sample standard deviation using GLM output data set

proc univariate data=glm_out;
var resid;
output out = univar_out STD = sample_std_devn;
run;

Assign sample standard deviation to macro variable

data std;
set univar_out;
call symput('sstd',sample_std_devn);
run;
%put &sstd.;

Update GLM output data set with 2 std information

data glm_out;
set glm_out;
ustd_2 = yhat + 2*&sstd.;
lstd_2 = yhat - 2*&sstd.;
run;

Plot regression line with 2 standard deviation lines (upper/lower)

proc sgplot data=glm_out (where = (rank = 1 and area =1);
scatter x=experience y=salary / group=gender grouporder=ascending name='plot' markerattrs=(symbol=circlefilled) ;
series x=experience y=yhat / name='predict' legendlabel='ln(Predicted Sal)' lineattrs=(color=blue ) transparency = 0.5 ;
series x=experience y=ustd_2 / name='upper' legendlabel='2 Standard Deviation' lineattrs=(color = lightblue) transparency = 0.5;
series x=experience y=lstd_2 / name='lower' legendlabel='2 Standard Deviation' lineattrs=(color = lightblue) transparency = 0.5;
run;
quit;

SAS: Configuration

SAS configuration file sasv9.cfg can be edited to change the SAS system options at SAS initialization. Certain system options can only be configured through the .cfg file, for example, -memsize and -altlog. -memsize defines the ram size used for SAS session; -altlog defines the file directory for copy of the sas log. In the configuration file, the syntax for system option is ‘-option’ which is different from when they are used in sas programs.

The Default Configuration file
!SASROOT = C:\Program Files\SASHome\x86\SASFoundation\9.3
Default folder (where all plots and results are saved) = C:\Program Files\SASHome\x86\SASFoundation\9.3

Sometimes there are two sasv9.cfg files in separate SAS folders, but one file will be pointing to the other sasv9.cfg which has the configuration setup.
Configuration location = C:\Program Files\SASHome\x86\SASFoundation\9.3\sasv9.cfg
-config “C:\Program Files\SASHome\x86\SASFoundation\9.3\nls\en\sasv9.cfg”
Default configuration location = C:\Program Files\SASHome\x86\SASFoundation\9.3\nls\en\sasv9.cfg

Example 1: change memsize

-MEMSIZE 4G

Example 2: save a copy of the SAS log from the session to user-defined directory

-ALTLOG  "C:\users\ussername\documents\log\%Y-%m-%d_%H-%M.saslog"

SAS: Output Table to Excel Using Tagsets.excelxp

Frequently we need to output SAS results to Excel.  With tagsets.excelxp, we will have the flexibility to create multiple sheets; define worksheet name(s) and column width; contrl titles and footnotes on each sheet.

Here is the basic tageset.excel for the Excel workbook.

ods listing close;
ods tagsets.excelxp path = "" file = "" style =
options (embedded_titles='' embedded_footnotes='' orientation='' sheet_name ='' pages_fitwidth='' pages_fitheight='' autofit_height=''
width_fudge="" absolute_column_width= "");
ods tagsets.excelxp close;

If the individual worksheet has different setting, put the following  statement before the corresponding SAS procedure that generates the content for the worksheet.

ods tagsets.excelxp options (sheet_name='RespHome');

Use sheet_name and sheet_interval to generate multiple sheets by groups defined in the procedure in the same Excel workbook.

... sheet_name='#byval1' sheet_interval="bygroup" ...

sample code:

ods listing close;
ods tagsets.excelxp path = "C:\SAS\" file = "sum.xls" style = journal
options (embedded_titles='yes' embedded_footnotes='yes' orientation='portrait' pages_fitwidth='1' pages_fitheight='1' autofit_height='yes'
width_fudge=".0625" absolute_column_width= "100");
title1 'Title1';
title2 "Title2";
footnote "Prepared by First Last @dept on %sysfunc(date(),worddate.)";
ods tagsets.excelxp options (sheet_name='homeresp');
proc tabulate data=count;
class home /order= formatted ;
class reponsible;
var count;
table (home="" all="Total"), (responsible="Responsible Faculty" all="Total")*(count="")*(sum="" )*f=10.3/ box = "Home Faculty" misstext = '0.000';
run;
ods tagsets.excelxp options (sheet_name='RespHome');
proc tabulate data=count;
class responsible /order= formatted ;
class home ;
var count;
table (responsible="" all="Total"), (home="Home Faculty" all="Total")*(count="")*(sum="" )*f=10.3/ box = "Resonsible Faculty" misstext = '0.000';
run;
ods tagsets.excelxp close;