SAS: Change Variable Length in one data step

Need to put Length statement before the Set statement to change the length of existing variable. Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of variable in the Set dataset.
thekey and program variables in the mapping dataset are both $7 in length.  The following data step change the length of these two variables to $10.

data mapping;
length thekey $10 program $10;
set mapping;
run;

SAS: Proc Boxplot and Proc Sgplot

It is easier to use proc sgplot than proc boxplot to compare distibution by classification variable.  “Drive Train” and “Type” are both categorical variables.

proc sgplot data=sashelp.cars;
title "Price distribution by Drive Train and Type";
vbox invoice / category =type group = drivetrain;
run;
  • side by side comparison
  • group became legend
  • applied legend color by the group
  • inset statement for sgplot doesn’t have statistics output (n/min/max/mean/stddev)

SGPlot22

proc sort data=sashelp.cars out=cars;
by drivetrain type;
run;
proc boxplot data=cars;
title "Price distribution by Drive Train and Type";
plot invoice*type;
by DriveTrain;
inset min mean max stddev/ header = "Overall Statistics";
insetgroup min max / header = "Cheap and Expensive by Type";
run;
  • need to sort the data first according to by statement and plot categorical variable;
  • plot in light blue; want other color, need extra code
  • not able to show 2 categorical variable plot side by side;
  • use by statement use produce plot separately.
  • inset and insetgroup are nice to have to produce stats as part of the plot.
    • inset: data, min, max, mean, nmax, nmin, dobs, stddev;
    • insetgroup: max, mean, min, n, nhigh, nlow, nout, q1, q2, q3, range, stddev;

Boxplot22

Boxplot24

VBA: Tricks in Access VBA and VBscripting(1)

  1.  Naming of the procedures: Don’t name the procedure name the same as the module names.

module name

 

modulename

procedure name

procedurename

The following VBscripting code is trying to run a sub within Access. The sub is in a module called “RunQueries”.  The sub was named “RunQueries” initially because it is the only procedure in the module. The Run method in the VBcripting couldn’t find the sub in the Access because it has the same name as the module.  I changed the sub name to “OpenQueries” and the VPscript works fine now.

Dim objaccess
Set objaccess =CreateObject("Access.Application")
objAccess.OpenCurrentDatabase "D:\accessdb.mdb", False
objAccess.Run ("OpenQueries")
ObjAccess.Quit
set objAccess = Nothing

2. Name of the Object: If the object has event procedure attached to it, the name of the object needs to be consistent with the object name reference in the procedure name.

Object Name

objectname

Procedure Name

objectprocedurename

When you change object name in the Property window, the corresponding procedure name in the code window doesn’t update automatically and the event procedure following the old procedure name will not run until you manually change the procedure name prefix to match the new object name.

3. Compact Access database in VPscript

objAccess.Application.SetOption "Auto compact", True

SAS: Use Filename to Create data set of hierarchical folder structure and file list

FILENAME contents pipe 'TREE "x:\project" /F /A' LRECL = 2000;
DATA project;
infile contents TRUNCOVER;
INPUT content_entry $char2000.;
content_entry = left(compress(content_entry, "|"));
run;

TREE: disply each directory within the listed directory. /F: display the names of the files within each directory.

proc sql noprint;
select count (*) into : file_exist
from project
where content_entry = "table1.sas7bdat";
quit;
%put file_exist = &file_exist.;

General: Popular Words in BI Job Description

PoC: Proof of Concept

https://en.wikipedia.org/wiki/Proof_of_concept

“Proof of concept is a realization of a certain method or idea to demonstrate its feasibility, or a demonstration in principle, whose purpose is to verify that some concept or theory has the potential of being used. A proof of concept is usually small and may or may not be complete.”

SDS: Solution Design Specification

DDD: Detailed Design Document

Essbase is a multidimensional database management system (MDBMS) that provides a multidimensional database platform upon which to build analytic applications. … Oracle Corporation acquired Hyperion Solutions Corporation in 2007, as of 2009 it markets Essbase as “Oracle Essbase”. (https://en.wikipedia.org/wiki/Essbase)

SaaS/PaaS/IaaS

Comparison: https://www.computenext.com/blog/when-to-use-saas-paas-and-iaas/
SaaS: Software as a service (SaaS; pronounced /sæs/) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. It is sometimes referred to as “on-demand software”. SaaS is typically accessed by users using a thin client via a web browser. (https://en.wikipedia.org/wiki/Software_as_a_service)
PaaS:  Platform as a service (PaaS) is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app. (https://en.wikipedia.org/wiki/Platform_as_a_service)
IaaS:  Infrastructure as a Service (IaaS) is a form of cloud computing that provides virtualized computing resources over the Internet. IaaS is one of three main categories of cloud computing services, alongside Software as a Service (SaaS) and Platform as a Service (PaaS). (http://searchcloudcomputing.techtarget.com/definition/Infrastructure-as-a-Service-IaaS)

AWS: Amazon Web Services

Rack Space: A managed cloud computing company based in US.

Metrics in Banking Industry: ARPU/COA/COR

ARPU:  Average revenue per user (sometimes known as average revenue per unit), usually abbreviated to ARPU, is a measure used primarily by consumer communications and networking companies, defined as the total revenue divided by the number of subscribers. … Most telecommunications carriers operate by the month. (https://en.wikipedia.org/wiki/Average_revenue_per_user)
COA: Cost of Acquisition. A business sales term referring to the expense required to attain a customer or a sale. In setting a marketing and sales strategy, a company must decide what the maximum cost of acquisition will be, which effectively determines the highest amount the company is willing to spend to attain each customer. http://www.investopedia.com/terms/c/costofacquisition.asp)
COR:The cost of revenue is the total cost of manufacturing and delivering a product or service. Cost of revenue information is found in a company’s income statement, and is designed to represent the direct costs associated with the goods and services the company provides. Cost of revenue is different from cost of goods sold (COGS) because it includes costs outside of production, such as distribution and marketing. (http://www.investopedia.com/terms/c/cost-of-revenue.asp)

SCCM: System Center Configuration Manager. System Center Configuration Manager (SCCM, also known as ConfigMgr),formerly Systems Management Server (SMS) is a systems management software product developed by Microsoft for managing large groups of computers running Windows NT, Windows Embedded, OS X, Linux or UNIX, as well as Windows Phone, Symbian, iOS and Android mobile operating systems. Configuration Manager provides remote control, patch management, software distribution, operating system deployment, network access protection and hardware and software inventory. (https://en.wikipedia.org/wiki/System_Center_Configuration_Manager)

PERL: Practical Extraction and Reporting Language. Perl is a family of high-level, general-purpose, interpreted, dynamic programming languages. The languages in this family include Perl 5 and Perl 6.  Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier.[9] Since then, it has undergone many changes and revisions. Perl 6, which began as a redesign of Perl 5 in 2000, eventually evolved into a separate language. Both languages continue to be developed independently by different development teams and liberally borrow ideas from one another. (https://en.wikipedia.org/wiki/Perl)

Graph Database: Nodes/Edges/Properties

graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store. The relationships allow data in the store to be linked together directly, and in most cases retrieved with a single operation.
Graph databases are based on graph theory. Graph databases employ nodes, edges and properties.
Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of. They are roughly the equivalent of the record, relation or row in a relational database, or the document in a document database.
Edges, also known as graphs or relationships, are the lines that connect nodes to other nodes; they represent the relationship between them. Meaningful patterns emerge when examining the connections and interconnections of nodes, properties, and edges. Edges are the key concept in graph databases, representing an abstraction that is not directly implemented in other systems.
Properties are pertinent information that relate to nodes. For instance, if Wikipedia were one of the nodes, one might have it tied to properties such as website, reference material, or word that starts with the letter w, depending on which aspects of Wikipedia are pertinent to the particular database.
(https://en.wikipedia.org/wiki/Graph_database)

 

 

VBA: VBScripting Reference

Microsoft WSH and VBScript Programming for the Absolute Beginner, Fourth Edition by  Jerry Lee Ford

Terminology
VBScript: Visaual Basic Scripting

VBA: Visual Basic for Application

HTAs: HTML Applications

GUI: Graphical User Interface

WSH: Windows Script Host

Script Execution Host: CScript.exe and WScript.exe

object/property/method

Windows command prompt: C:\>

TRICK:

Run commands with access privilege.

cmd

File Extension:

  • VBScript: .vbs
  • JScript: .js

DLL: C:\WINDOWS\SYSTEM32\VBSCRIPT.DLL

 

 

 

SAS: Compile varlist and run report for each variable

Recently, I found Proc Sql can allow you to create macro variable with long string values separated by either space or other delimiters directly without extra step for table generation and null data step. Here is the modify codes.

proc sql;
select distinct varname into :varlist separated by ' ' from table;
quit;
%put &varlist.;

Sometime the macro variable is created within a macro, then the macro variable will be just local macro variable and it will not be available beyond the macro statement. To force the macro variable to be available in the global environment, using %Global.

%macro new;
proc sql;
select distinct varname into :varlist separated by ' ' from table;
quit;
%global varlist;
%mend new;
%put &varlist.;

If you would like the varlist items to be wrapped with quotation and separated with comma, then use the following code.

%macro new;
proc sql;
select distinct '"'||varname||'"' as temp into :varlist separated by ', ' from table;
quit;
%global varlist;
%mend new;
%put &varlist.;

If the same macro variable is defined or used in multiple macros, making sure the macro variable is set at the global level, because macro variables are local at default. You might get error message like below if the macro variable is not forced to be global variable.

ERROR: Attempt to %GLOBAL a name (NAME) which exists in a local environment.

Here is an example of use previously created macro variable in another macro.

%macro del;
%global varlist;
proc datasets nolist;
delete &varlist;
run;
%mend del;

previous reference: http://www2.sas.com/proceedings/sugi30/028-30.pdf

proc sql;
create table varlist as
select distinct varname from table;
quit;

%let varlist =;

data _null_;
set varlist;
call symput('varlist', trim(resolve('&varlist'))||' '||trim(varname));
run;

null is a special data set name, although sas doesn’t create a data set called “_null”. It allows sas to carry out commands in the data steps.  There is no output data set.

Omitting value in the %let syntax will produce a null value for the macro variable.

resolve function: resolve the value of macro variable in data step.

CALL SYMPUT  assigns value produced in a DATA step to macro variable(S).

SAS: Assign Order Number by group and Calculating Accumulative Total in dataset using Retain Statement

Assigning group order for a set sequence of records.

data temp2;
retain order;
set temp1;
if _n_ =1 then order=1; /* _n_=1 identifies the first record.
else if var1 ne lag(var1) then order= sum(order,1);
else order = order ; 
run;
  • Lag(var) the value of previous record in the var column.
  • retain statement needs to be before the set statement.

Calculating the running total of weight and height for each record. The Retain statement retains the running total from the previous iteration to the next.

data one;
length AccWeight AccHeight 8.;
retain AccWeight 0 AccHeight 0;
set sashelp.class end=eof;
AccWeight = sum(AccWeight, weight);
AccHeight = sum(AccHeight, Height);
run;

The following codes deal with cumulative sum by the group where the variable for calculation contains null values. The variable regs is either 1 or 0 or null value. Sorting by the regs variable will force the the record with a regs value of 1 to be last record by the desired group.

Proc sort data= apps out = sortapp;
by id app_num regs;
run;
data sortapp1;
set sortapp;
by id app_num;
retain sumreg;
firstnum = first.a_num;
lastnum = last.a_num;
if first.a_num then sumreg = sum(regs);
else sumreg = sum(sumreg, regs);
run;

SAS: Identify first and last record by group

Staff are in two wage category “H” and “S” . Sort the table by wage category and wage rate.  Firstrate will equal to 1 if it is the first record in the wagecategory, and the lastrate will equal to 1 if it is the last record in the wagecategory.

proc sort data=sasuser.staff out =sortstaff;
by wagecategory wagerate;
run;
data sortstaff;
set sortstaff;
by wagecategory;
firstrate = first.wagecategory;
lastrate = last.wagecategory;
run;

SAS: Proc Transpose long to wide

  • Need to sort by the variables in the by statement before using the proc transpose.
  • variables in the by statement are row labels;
  • contents in the id variable become the column labels;
  • var variable is the value of the table.
proc transpose data=sum out=sumwide;
by program academicyear yearlevel;
id type;
var count;
run;