Survey basic frequency macro

This coding can be used to run

Case 1: Count frequency of distribution for one question and exclude a category from the counting results (eg. exclude records from the question with 88 as the answer in the frequency distribution).

%macro count  (field=);

proc freq data=indata (where = (&field ne 88));
tables var1*(&field) ;
run;
%mend count;

%count (field=q1);

Case 2: Count frequency of distribution for one question conditional on the result of a different question. In the example, the where statement will delete records with q1 equals to 88 and then delete records with q10 not equal to 2 or 3 or 4.  For the rest of the records, proc freq will run the distribution of q1.

%macro count (field1=, field2=);

proc freq data=indata (where = (&field1 ne 88 and &field2 in (2,3,4)));
tables var1*(&field1) ;
run;
%mend count;

%count (field1=q1, field2=q10);

Merge data with Data step

Because usually the information for the subject of interest are scattered in different datasets, it is common to join the tables to gather all the variables of the subject.  There needs to be a common id field with the same data format in all the datasets that need to be merged.

Both dataset A and B have a variable id. Merge statement join the variables from A and B by the id variable in A.  All the records and fields from A will be in the merged table and those records with equal id field from B will be added to the records in the merged table.

Caution:

  1. Make sure id in A is unique
  2. id field in both A and B should be the same type of field (character or numeric)
  3. After merging the table, need to check if there is any duplicated information and why.

data merged;

merge A (in =a) B;

by id;

if a;

run;

 

SAS: data extraction and merge routine (2)

In the following data step, table 1 and table 2 have the same data structure. Setting multiple tables in the data step will stack the tables. First, the where statement will make the program more efficient by applying the conditions to all the datasets in the Set statement. If statements create subsets and apply action(s) to specific subsets.
Data temp;
set table1 table2;
where  var1 = ‘condition1’ and var2 = ‘condition2’ and var3 NE ‘condition3’;
if var4  = a and var5 = ‘3’ then do; output; end;
if var4 = b and var5 IN (‘1′,’2’) then do; output;end;
run;

SAS: Export to Excel with Label Option and 9.3/9.4 Difference

It is common to export SAS data tables or outputs to Excel spreadsheets.  The default Excel output will only contain field names in the first row of the spreadsheet.  The ‘label’ option will allow the SAS label names to be the column names on the first row of the exported table.

Method 1:

PROC EXPORT DATA=table1.
OUTFILE= “C:\Work\SAS\table1.xlsx”
label DBMS=EXCEL REPLACE;
SHEET=”table1″;
NEWFILE=YES;
RUN;

method 2:

libname excelout “X:\Work\excelfile.xlsx”;

data excelout.sheet1 (dblabel=yes) ;
set tablename;

run;

libname excelout clear;

9.4 SAS Difference

PROC EXPORT

PROC EXPORT DATA= SASHELP.CARS 
OUTFILE= "C:\Excel\car.xlsx" 
            DBMS=EXCELCS REPLACE;
     SHEET="car"; 
RUN;
  • Bug: a blank tab (_SAS_empty_) with A1 = “_empty_(CHECK_OTHER_SHEET) was generated in the xlsx file.

Solution: https://communities.sas.com/t5/SAS-Programming/Proc-Export-creating-a-blank-SAS-empty-sheet-in-the-exported/td-p/483623

  • Change DBMS=EXCEL to DBMS=XLSX and the blank tab will disappear.
PROC EXPORT DATA= SASHELP.CARS 
OUTFILE= "C:\Excel\car.xlsx" 
            DBMS=XLSX REPLACE;
     SHEET="car"; 
RUN;

Alternative 1: LIBNAME method

  • Add xlsx in the libname statement;
  • dblabel = option not work for the xlsx engine.
Libname exl xlsx "C:\Excel\car.xlsx";
data exl.cars;
set SASHELP.CARS ;
run;
libname exl clear;

Alternative 2: ODS EXCEL method

  • ods excel; ods excel close;
  • file =” “;
  • ods excel options (sheet_name = ” “);
  • proc print noobs label data=;run;
ods excel file= "C:\Excel\&fac._research.xlsx";
ods excel options (sheet_name = "&fac.");
proc print noobs label data =research; 
where faculty = "&fac.";
run; 
ods excel close;

SAS: Identfy and Remove Duplicate Records

Step 1: Check duplicate records by one field or a combination of fields.

proc freq data=temp noprint;
table var1 /out =dupl (keep =var1 count where = (count >1));
run;

proc freq data=temp noprint;
table var1*var2 /out =dupl (keep =var1 var2 count where = (count >1));
run;

Examine the output to see what circumstances can cause duplicate records.

Step 2: Remove duplicate records.

proc sort data=temp NODUPKEY;
by var1 ;
run;

proc sort data=temp NODUPKEY;
by var1 var2 ;
run;

Alternative method to output unique records and duplicated records in two separate datasets.

proc sort data =temp out=temp1;
by var1 var2;
run;
data unique dup;
set temp1;
by var1 var2;
if first.var2 and last.var2 then output unique;
else output dup;
run;

 

SAS:Output SAS Results to Excel Pivot Table

Purpose:

  • To create a pivot table from sas data sets. Apply specific configuration to the pivot table.

Download the tagset from SAS support website.

  • “http://support.sas.com/rnd/base/ods/odsmarkup/tableeditor/tableeditor.tpl”

Save the tagset to local directory

  • “C:\Users\xxxxxx\Documents\My SAS Files(32)\9.3\TableEditor\tableeditor.tpl”

Code Example:

filename temp  “C:\Users\xxxxxx\Documents\My SAS Files(32)\9.3\TableEditor\tableeditor.tpl”;
%include temp;
ods tagsets.Tableeditor file=”C:\Users\xxxxxx\Documents\My SAS Files(32)\9.3\AAPR\Excel\temp\temp.html”
options(BUTTON_TEXT=”Create PivotTable”
AUTO_EXCEL=”yes”
UPDATE_TARGET=”C:\\EXAMPLE.XLSX”
OPEN_EXCEL=”no”
PIVOT_SHEET_NAME= “Academic Program Report_UG”
PIVOTPAGE=”faculty”
PIVOTROW=”progname,category, level2, level3″
PIVOTCOL=”year”
PIVOTDATA=”value”
PIVOT_GRANDTOTAL=”no”
PIVOT_SUBTOTAL=”no”
PIVOTDATA_FMT=”#,###”
PIVOT_FORMAT=”medium14″
DELETE_SHEETS=”Sheet1, Sheet2, Sheet3, Table_1″
QUIT=”yes”
);

Proc print data = gr;
var faculty progname level2 level3 year value category;
run;
ods tagsets.tableeditor close;

Code Structure:

  1. set up tableedit targetset directory
  2. call ods statement
  3. config pivot table in options
  4. run proc print statement
  5. close ods

Notes for Options

  • Auto_Excel = “yes” (open Excel automatically)
  • Update_target=”C:\\example.xlsx” ( need to create the excelfile example in the directory first. Notice C:\\  in the directory)
  • Open_Excel= “no” (running excel at the backgroup)
  • Pivotpage= “faculty” (identify field for report filter)
  • Pivot_grandtotal=”no” (hide the grandtotal in the pivot table default view)
  • Pivot_subtotal= “no” (hide the sub-total in the pivot table default view)
  • Quit= “yes” (exit excel)

Problems Identified

  • format in the option setting not working
  • print procedure doesn’t take long (22 seconds) for 20,000 records table, but ods process take longer time to generate the pivot table.  There is no stop signal to show that the process has been completed.
  • The excel file doesn’t seem to be updated until you open the file and the prompt would ask if you would like to save the change and if you click yes the updated the pivot table will show.

 

SAS: Set start and stop for macro variables

Sometimes I need to repeat the same exercise for multiple times, for example faculty want retention rate not just for one year but multiple year, it is easy to use %macro macroname (parameter1, parameter2) in conjunction with %do varname = &value1 &to &value2 to customize the year range in the do loop statement per clients request.

eg;

%macro step (strtyear, stopyear);
%do yr= &strtyear %to &stopyear;

….continue coding

%end;

%mend step;

%step (11, 14);

 

SAS: Convert Variable between Numeric and Character Format

A. Use Put or Input
Convert between numeric and character variable.

  • character to numeric (input)
old_char = "2018";
new_num = input(old_char, 8.);
new_num = 2018;
  • character to numeric (input) to character (put)
old_char = "2018";
new_char = put (input (substr(old_char , 1, 4 ), 8.) -1 , 4.);
new_char = "2017";

/* or */
new_char = put (old_char*1-1 , 4.);
new_char = "2017";
  • numeric to character (put), and with leading Zero.
old_num = 2018;
new_char = put(old_num, 4.); *new_char = "2018";
new_char1 = put(old_num, z8.); *new_char1 = "00002018";
  • numeric to character (put) to numeric (input)
old_num = 2018;
new_num = input(substr(put(old_num, 4.),3, 2) , 8.);
new_num = 18;

Use the following functions to check whether any digit or alphabetic character is in the character variable. The results will be the position of the first digit/alphabetic character in the field.

check1 = anydigit(var); *return position of first digit, 0 if not found in the string;
check2 = anyalpha(var); *return position of first alphabetic character, 0 if not found in the string;
check3 = notalpha(var); *return position of first non digit, 0 if not found in the string;
check4 = notdigit(var); *return position of first non alphabetic character, 0 if not found in the string;
  • Please note there should be no space in between % and statement eg. macro, let, mend, put, eval etc.
  • To apply the macro variable, use &varname. format.

CODE:

%macro setyr;
%let yr = 14;
%let nextyr =%eval(&yr. +1);  /* 15 */
%let next2yr=%eval(&yr. +2);  /* 16 */
%let fisyr = “20%substr(&yr., 1,2)/%substr(&nextyr., 1,2)”;   /* “2014/2015” */
%let fisnextyr = “20%substr(&nextyr., 1,2)/%substr(&next2yr., 1, 2)”;   /* “2015/2016” */
%let year = “20&yr.”; /* “2014” */
%put &yr;
%put &nextyr;
%put &next2yr;
%put &fisyr;
%put &fisnextyr;
%put &year;
%mend setyr;
%setyr;

OUTPUT:
14
15
16
“2014/15”
“2015/16”
“2014”

B. Use Vvalue fuction
Vvalue() returns the formatted value that is associate with the variable.

data want;
data have;
new_charvar1 = vvalue(formatted_numvar1);  *formatted means the variable has been applied with format;
new_numvar2= vvalue(formated_charvar2);
run;

SAS: Variables/Fields with same Prefix

I have a SAS file that stores students course selection (up to 20 fields) and grades for each course (up to 20 fields) for a certain semester. To set the sas working file, I use the follow data step. Both course variables and grades variables have the common prefix “crs” and “grade”.  It is more efficient to list the 20 course variables using “crs1-crs20” than list each and every one of them.

%let varlist = crs1-crs20 grade1-grade20 studentid major;

Data tem1;

Set  coursefile_fw16 (keep = &varlist.);

run;

 

SAS: Acquire new database(1)

Prelude

I always find interesting when I have a business case or project where the business requirements involves technical development even in the smallest way.

I’m working at a reporting unit of the institution. We used to have an application maintained for this function unit for their reporting and query needs. As the business grows, the function unit decided to purchase a new third party software to capture its business transactions and reporting needs.  My department was not involved in the business decision.

The third party application is web based. Because the users want to analyze the data, the vendor provided a simple query interface through the application so the users can extract the data that meets certain criteria.  First of all, the query interface was poorly designed where tables can be viewed awkwardly and the relationship between the tables were not presented in an explicit way. Secondly, the query grid has many limitations regarding the type of the criteria setting and has no capacity to incorporate calculated field to the query criteria. The vendor also provide a sql query tool through the interface, but since the users are not technical savvy and the query interface assumes users have comprehensive knowledge of the underlying database structure and relationships between the tables, the sql query interface wasn’t utilized as intended.

Later my department was asked to produce the statistic report of that function unit, we couldn’t do it because the old database is not maintained and does not have the updated data. We also found out that the third party software can not be easily integrate to meet the institution’s reporting needs. In order to re-establish the reporting function for the new databases, I asked the function unit to contact the vendor to provide me the access to the software as well as establish the back-end access to the data tables through the help of our IT people.  Later, I got access to the back-end tables through Mysql ODBC.

I was using MS Access as an intermediate tool to view the tables because I wasn’t provide with any information on the third party software, not on the  table and fields structure, not on the business process and flow or the relationship between the tables. Basically, it is self-direct investigation.  From the more than 200 tables, I narrow down about 30 tables I would need to produce reports based on an iterative discussion with the function unit on what kind of report and data they want. As a temporary solution, I used Access to understand the database and develop the queries and reports for users to verify.  The next task will be set up the database and develop reports on SAS platform.