SAS 9.4 ODS RTF file corrupt

Recently, I updated from SAS 9.3 to 9.4. The code was working fine on 9.3. It contains a macro running do loops, producing graphs for each loop, and outputting all the graphs into a rtf file. When I ran the code in 9.4, I can’t open the file in Word. Google search recommends update the sasv9.cfg file to increase the -memsize and -memmaxsz to 6G (https://communities.sas.com/t5/ODS-and-Base-Reporting/SAS-ODS-RTF-file-corrupt/td-p/282779). Another alternative I find is to just change ODS RTF to ODS PDF without changing any code. When running the ODS PDF statements, I got the following warning, but the PDF file was generated without a problem.

Warning: Unsupported device "ACTIVEX" for LISTING destination.  Using device "ACTXIMG".

To get rid of the warning, in SAS EG, navigating to Tools/Options…/Results/Graph. In the “Graph Format” dropdown list, changing “ActiveX” to “ActiveX image”.

SAS Installation: ” Connect: Class not registered “

Issue: Installed SAS 9.4 on the machine where SAS 9.3 has already been installed. Not able to import Excel file in SAS 9.4.
Error message:
ERROR: Connect: Class not registered
ERROR: Error in the LIBNAME statement
Configuration:
– office 365 (32 bit)
– office 2010 (64 bit)
– Windows 7 (64 bit)
– SAS 9.3
– SAS 9.4 (64 bit)
– odbc 64bit: C:\Windows\System32\odbcad32.exe
– odbc 32bit: C:\Windows\SysWOW64\odbcad32.exe

Possible Reason: SAS PC Files Server not installed or not working properly. Need Excel ODBC driver in the same bitness as in SAS.
reference: http://support.sas.com/kb/60/356.html
Attempt 1: Stop SAS PC Files Server in Service (administrative tools); uninstal SAS PC Files Server; download SAS PC Files Server from (pcfilesrv__94200__wx6__en__web__1.zip) from https://support.sas.com/downloads/package.htm?pid=2167; unzip, install from setup.exe; get Java private environment error when installing PC Files Server.
Results 1: not working

Possible Reason: SAS 9.4 is not connected to the right ODBC. ODBC 32bit has Excel in the driver list but ODBC 64bit doesn’t have Excel in the driver list.
reference: https://support.microsoft.com/en-ca/help/942976/odbc-administrator-tool-displays-both-the-32-bit-and-the-64-bit-user-d
Attempt 2: run as administrator on both ODBC files; add excel driver as the system data source in the “System DSN” tab; change target from “\system32\odbcad32.exe” to \syswow64\odbcad32.exe” in the property of “Data Sources (ODBC) in Administrative Tools
Results 2: not working

Possible Reason: office 365 doesn’t have the match Access Engine for program external to Microsoft Office to access the Office 365 application data
Attempt 3:
— download Access Database engine 2016 distribution (32bit)
— install to cmd, cd to where the installation file is saved, run AccessDatabaseEngine.exe /quiet /passive
— open registry editor on window 7: windows key + r -> run regedit -> yes
— navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\16.0\Common\FilesPaths, delete mso.dll
reference: http://support.sas.com/kb/43/933.html
https://knowledge.autodesk.com/support/autocad-map-3d/learn-explore/caas/screencast/Main/Details/7132d430-dc09-40bc-9148-94ff9db41c24.html
Results 3: not sure why but WORKED, but doesn’t work on the Office 365 32bit installation.

Attempt 4:
— uninstall Microsoft Office 365 (32bits)
— install Microsoft Office 365 (64bits)
— uninstall Access Database engine 2016 distribution (32bits)
— install Access Database engine 2016 distribution (64bits)
— install to cmd, cd to where the installation file is saved, run AccessDatabaseEngine.exe /quiet /passive
— open registry editor on window 7: windows key + r -> run regedit -> yes
— navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\16.0\Common\FilesPaths, delete mso.dll
Results 4: worked. matched SAS 64bits with Office 365 64bits

SAS: Find String Characters within String Variable

Find “Mike” in the Name variable. Use scan function when you know the position of the word within the variable and the delimiter. Use contains for general search. Use substr function when you know the starting position of the word within the variable.

  • Scan
data a;
set b;
where scan(name, 1, ' ') = "Mike";
run;
/* nested Scan example */
/* data string is like 'XX XX ABCD>XXXX' */ 
/* need to get the 'ABCD' portion of the string, but it can be different lengths */
data temp;
set set1;
temp =scan(scan(var, 3, ' '),1, '>');
keep temp;
run;
  • Contains
data a;
set b;
where upcase(name) contains "MIKE";
run;
  • Substr
data a;
set b;
where substr(name, 1, 4) = "Mike";
run;

GitHub: Download a file from github

  1. Using Jupyter Notebook
    • Open the file in the github and copy the url
    • Open Jupyter Notebook
    • Click “upload” button and paste the url in the “File Name:” field in the “File Upload” window. Click “Open”.
    • Click “upload” button where the file entry is.

SAS STAT: getting output statistics (2)

This post focus on the ODS Output for regression, including ANOVA, least square means, parameter estimates etc.

Each output from a SAS procedure has an associated name and label.  Each name is part of a name path, and each label is part of a label path.  In order to select, exclude, or modify an output table, you must first know its name.  Each level in the name path corresponds to a part of the procedure’s hierarchy of output that you can view in the Results window.

ods output 
parameterestimates = outputtable1 
fitstatistics =  outputtable2
overallanova = outputtable3
lsmeans = outputtable4
nobs = outputtable5
means = outputtable6;
ODS table name  Statement / Option Key Stats
ParameterEstimates Model / Solution Estimated linear model coefficients, t Value, Probability T, standard error
FitStatistics Default r-square, Coeff Var, Root MSE, dependent mean
OverallAnova Default DF, sum of squares, mean square, F value, Probability F
LsMeans LsMeans Least squares means are requested for each effect listed in the LSMEANS statement. LSMEANS can perform on classification variables and multiple comparison on interactions. LSMEANS are predicted population margins; that is, they estimate the marginal means over a balanced population.
Nobs Default Number of Observations
Means Means When variables in the Means statement is the same as in the Model statement, Means statement will provide the count of observations for each model effect or combination of effects.

Principal Component Analysis and Factor Analysis

Reference: A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling

Options:

  • flag: output to flag factor loading with absolute values greater than the specified size. eg flag = .40
  • method: extracting method
  • prin: principal factors for a principal component analysis
  • mineigen: critical eigenvalue to retain and rotate any component whose eigenvalue is specified or lager.
  • nfact: number of components to be retained and rotated

SAS: Standardize Variables

Standardized variables can be referred as Z score with a mean of 0 and standard deviation of 1. When multi-scale variables are enter into Regression analysis, the variables with larger variances will have more importance and influence on the results than the variables with small variances. It is important to standardized variables in the preprocessing step for regression analysis, cluster analysis, and neural network.

To calculate the standardized variable, use the non-standardized variable minus the mean and then divided by standard deviation.

Get means and standard deviation before standardizing the variables.

proc univariate data = data1 out=stat;
var var1 var2 var3;
run;

In the data set, the variables are not measured in the same units and cannot be assumed to have equal variance. We use PROC STDIZE to standardize the variables.

proc stdize data= data1 out=Stand method =std;
var var1 var2 var3;
run;

Get means and standard deviation after standardizing the variables. Means of the variables should equal 0 and standard deviations equal to 1.

proc univariate data = Stand out=stat;
var var1 var2 var3;
run;

Eigenvalue and Eigenvector

Google Definition: 1) each of a set of values of parameter of which a differential equation has a nonzero solution (an eigenfuction) under given conditions. 2) any number such that a given matrix minus that number times the identity matrix has a zero determinant.

reference: https://www.khanacademy.org/math/linear-algebra/alternate-bases/eigen-everything/v/linear-algebra-introduction-to-eigenvalues-and-eigenvectors

Kaiser-Guttman Criterion: ‘Eigenvalues greater than one’ (Guttan, 1954; Kaiser, 1960, 1970) is commonly used to determine number of factors to retain. The thinking behind the criterion is “that a factor must account for at least as much variance as an individual variable” per Nunnally and Bernstein (1994)

SAS: Missing Value Imputation Procedures

Unfortunately that some of my earlier posted pages are missing from the WordPress site and I am not sure how it happened.

reference: https://www.youtube.com/watch?v=JQHAA_AGxrE
reference: https://www.youtube.com/watch?v=V2wBBB0wNRc

It is very common when conducting survey, the results are incomplete or having missing values. Most of the statistics procedures will drop the observation with missing values which will some time render the survey unable to draw conclusion due to not enough sample size. SAS provides some procedures to impute those missing values so the observation can be used for analysis.

/* --- Identify Missing Value --- */
proc format;
 value $missfmt ' ' = 'Missing' other = 'Not Missing';
 value missfmt .  = 'Missing' other = 'Not Missing';
run;

proc freq ;
format _char_ $missfmt.;
format _numeric_ missfmt.;
tables _char_ / missing nocum nopercent;
tables _numeric_ /missing nocum nopercent;
run;
/* --- Multiple Impuatation Method --- */
/* --- Step 1. Generate Imputed Samples --- */
proc mi data = data nimpute=20 seed = 135782 
    out = ImputedSample;
var v1-v3;
run;
quit;

/* --- Step 2. Fit the Generated Imputed Samples --- */
proc reg data=  ImputedSample outest=Estimates covout;
model v1 = v2-v3;
by _Imputation_;
run;
quit;

/* --- Step 3: Use Proc Mianalyze   --- */
proc mianalyze data = Estimates;
    modeleffects Intercept v2 v3;
run;
quit;
/* --- Direct maximum Likelyhood --- */
/* --- Step 1. Proc Calis --- */
proc calis data = d2 method = fiml;
path v1 <--- v2 v3;
run;
quit;

Numeric missing value in the SAS table some time shows as . (dot)and other times as _ (underscore). When the value of the variable are used to create categorical variable, please taking the null value into consideration. Sometimes the condition ‘ne .’ would not exclude null values but ‘gt .’ can. The following macro is used to identify the significance of the p value. There are some null value in the p-value variable

%macro sig (data =,stat=, var =);
%let i=1;
%do %while (%scan(&data., &i, ' ') ne );
%let dsn=%scan(&data., &i, ' ');
%put &dsn.;
data &dsn._&stat.;
set &dsn._&stat. ;
format sig&var. $4. prob&var.  pvalue6.4;
if Prob&var.=0.01  then sig&var.="*";
if Prob&var.=0.001   then sig&var.="**";
if Prob&var.=0.0001  then sig&var.="***";
if Prob&var.<0.0001 and Prob&var. gt .  then sig&var.="****"; /* exclude null value */
if Prob&var.<0.0001 and Prob&var. ne .  then sig&var.="****"; /* ne . doesn't work. **** shows for null p-value/
run;
%let i = %eval (&i +1);
%end;
%mend sig;
%sig (data =group1 group2, stat= anova, var= f);

SAS: Batch Macro Variable Assignment by Class and Application in Dynamic Labeling

Set up “Class” table for Proc Tabulation.  “Class” table can force missing value in the table so the layout of the combination of class variables will not change due to 0 counts of the frequency of the variables.

data class;
do rank= 1, 2, 3;
do area = '0', '1';
do gender = 'F', 'M';
output;
end;
end;
end;
run;

Output Proc Tabulate summary to data set using “Class” layout. “Layout” data set contains frequency counts by rank by area and by gender.

proc tabulate  data = data out = layout missing classdata = class;
class rank area gender ;
var count;
table rank*(area="" all), (gender all)*count*(sum="");
run;

Add customary titles by rank by area in the “Layout” data set. Assign counts and titles to macro variables.

data layout;
set layout;
length varname $11 sum $2 title1 $60 title2 $60 vartitle1 $12 vartitle2 $12;
if count_sum = . then count_sum = 0;
sum = put(count_sum, 2.);
drop _TYPE_ _PAGE_ _TABLE_;
if gender = '' then gender = 'N';
if rank = 1 then title1 = "2017 Assistant Professors/Lecturers";
if rank=2 then title1 = "2017 Associate Professors/Lecturers";
if rank =3 then title1 = "2017 Fall Professors/Senior Lecturers";
if area = 0 then title2 = "General Arts"; 
if area = 1 then title2 = "Engineering";
if area = '' then do;
title2 = "All Areas of Specialization";
varname ="rank"||trim(left(rank_cd))||gender;
vartitle1 = "rank"||trim(left(rank_cd))||'t1';
vartitle2 = "rank"||trim(left(rank_cd))||'t2';
end;
else do;
varname = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||gender;
vartitle1 = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||'t1';
vartitle2 = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||'t2';
end;
call symput(varname, sum);
call symput (vartitle1, title1);
call symput (vartitle2, title2);
run;
options symbolgen;

Application
Print the regression line by rank by area with corresponding titles and insert counts information of total observation and by gender on the respective chart. The “Inset” statement will translate the macro variables into format like N=75(46F/29M) and put into a text box inside of the axes of the plot.

ods pdf file="c:\test.pdf";
%macro plots;
%do j = 1 %to 3;
%do i = 0 %to 1;
title1 height=14pt "&&rank&j.area&i.t1";
title2 height=12pt "&&rank&j.area&i.t2";
proc sgplot data=glm_out (where=(rank=&j. and area = "&i.")) ;
scatter x=experience y=salary / group=gender grouporder=ascending  name='plot' markerattrs=(symbol=circlefilled) ;
series x=experience y=yhat / name='predict' legendlabel='ln(Predicted Sal)' lineattrs=(color=blue ) transparency = 0.5 ;
inset "N=&&rank&j.area&i.N (&&rank&j.area&i.f.F/&&rank&j.area&i.m.M)"  / position = bottomright  BORDER;
run;
quit;
%end;
%end;
%mend plots;
%plots
Ods pdf close;