General: Popular Words in BI Job Description

PoC: Proof of Concept

https://en.wikipedia.org/wiki/Proof_of_concept

“Proof of concept is a realization of a certain method or idea to demonstrate its feasibility, or a demonstration in principle, whose purpose is to verify that some concept or theory has the potential of being used. A proof of concept is usually small and may or may not be complete.”

SDS: Solution Design Specification

DDD: Detailed Design Document

Essbase is a multidimensional database management system (MDBMS) that provides a multidimensional database platform upon which to build analytic applications. … Oracle Corporation acquired Hyperion Solutions Corporation in 2007, as of 2009 it markets Essbase as “Oracle Essbase”. (https://en.wikipedia.org/wiki/Essbase)

SaaS/PaaS/IaaS

Comparison: https://www.computenext.com/blog/when-to-use-saas-paas-and-iaas/
SaaS: Software as a service (SaaS; pronounced /sæs/) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. It is sometimes referred to as “on-demand software”. SaaS is typically accessed by users using a thin client via a web browser. (https://en.wikipedia.org/wiki/Software_as_a_service)
PaaS:  Platform as a service (PaaS) is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app. (https://en.wikipedia.org/wiki/Platform_as_a_service)
IaaS:  Infrastructure as a Service (IaaS) is a form of cloud computing that provides virtualized computing resources over the Internet. IaaS is one of three main categories of cloud computing services, alongside Software as a Service (SaaS) and Platform as a Service (PaaS). (http://searchcloudcomputing.techtarget.com/definition/Infrastructure-as-a-Service-IaaS)

AWS: Amazon Web Services

Rack Space: A managed cloud computing company based in US.

Metrics in Banking Industry: ARPU/COA/COR

ARPU:  Average revenue per user (sometimes known as average revenue per unit), usually abbreviated to ARPU, is a measure used primarily by consumer communications and networking companies, defined as the total revenue divided by the number of subscribers. … Most telecommunications carriers operate by the month. (https://en.wikipedia.org/wiki/Average_revenue_per_user)
COA: Cost of Acquisition. A business sales term referring to the expense required to attain a customer or a sale. In setting a marketing and sales strategy, a company must decide what the maximum cost of acquisition will be, which effectively determines the highest amount the company is willing to spend to attain each customer. http://www.investopedia.com/terms/c/costofacquisition.asp)
COR:The cost of revenue is the total cost of manufacturing and delivering a product or service. Cost of revenue information is found in a company’s income statement, and is designed to represent the direct costs associated with the goods and services the company provides. Cost of revenue is different from cost of goods sold (COGS) because it includes costs outside of production, such as distribution and marketing. (http://www.investopedia.com/terms/c/cost-of-revenue.asp)

SCCM: System Center Configuration Manager. System Center Configuration Manager (SCCM, also known as ConfigMgr),formerly Systems Management Server (SMS) is a systems management software product developed by Microsoft for managing large groups of computers running Windows NT, Windows Embedded, OS X, Linux or UNIX, as well as Windows Phone, Symbian, iOS and Android mobile operating systems. Configuration Manager provides remote control, patch management, software distribution, operating system deployment, network access protection and hardware and software inventory. (https://en.wikipedia.org/wiki/System_Center_Configuration_Manager)

PERL: Practical Extraction and Reporting Language. Perl is a family of high-level, general-purpose, interpreted, dynamic programming languages. The languages in this family include Perl 5 and Perl 6.  Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier.[9] Since then, it has undergone many changes and revisions. Perl 6, which began as a redesign of Perl 5 in 2000, eventually evolved into a separate language. Both languages continue to be developed independently by different development teams and liberally borrow ideas from one another. (https://en.wikipedia.org/wiki/Perl)

Graph Database: Nodes/Edges/Properties

graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store. The relationships allow data in the store to be linked together directly, and in most cases retrieved with a single operation.
Graph databases are based on graph theory. Graph databases employ nodes, edges and properties.
Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of. They are roughly the equivalent of the record, relation or row in a relational database, or the document in a document database.
Edges, also known as graphs or relationships, are the lines that connect nodes to other nodes; they represent the relationship between them. Meaningful patterns emerge when examining the connections and interconnections of nodes, properties, and edges. Edges are the key concept in graph databases, representing an abstraction that is not directly implemented in other systems.
Properties are pertinent information that relate to nodes. For instance, if Wikipedia were one of the nodes, one might have it tied to properties such as website, reference material, or word that starts with the letter w, depending on which aspects of Wikipedia are pertinent to the particular database.
(https://en.wikipedia.org/wiki/Graph_database)

 

 

VBA: VBScripting Reference

Microsoft WSH and VBScript Programming for the Absolute Beginner, Fourth Edition by  Jerry Lee Ford

Terminology
VBScript: Visaual Basic Scripting

VBA: Visual Basic for Application

HTAs: HTML Applications

GUI: Graphical User Interface

WSH: Windows Script Host

Script Execution Host: CScript.exe and WScript.exe

object/property/method

Windows command prompt: C:\>

TRICK:

Run commands with access privilege.

cmd

File Extension:

  • VBScript: .vbs
  • JScript: .js

DLL: C:\WINDOWS\SYSTEM32\VBSCRIPT.DLL

 

 

 

SAS: Compile varlist and run report for each variable

Recently, I found Proc Sql can allow you to create macro variable with long string values separated by either space or other delimiters directly without extra step for table generation and null data step. Here is the modify codes.

proc sql;
select distinct varname into :varlist separated by ' ' from table;
quit;
%put &varlist.;

Sometime the macro variable is created within a macro, then the macro variable will be just local macro variable and it will not be available beyond the macro statement. To force the macro variable to be available in the global environment, using %Global.

%macro new;
proc sql;
select distinct varname into :varlist separated by ' ' from table;
quit;
%global varlist;
%mend new;
%put &varlist.;

If you would like the varlist items to be wrapped with quotation and separated with comma, then use the following code.

%macro new;
proc sql;
select distinct '"'||varname||'"' as temp into :varlist separated by ', ' from table;
quit;
%global varlist;
%mend new;
%put &varlist.;

If the same macro variable is defined or used in multiple macros, making sure the macro variable is set at the global level, because macro variables are local at default. You might get error message like below if the macro variable is not forced to be global variable.

ERROR: Attempt to %GLOBAL a name (NAME) which exists in a local environment.

Here is an example of use previously created macro variable in another macro.

%macro del;
%global varlist;
proc datasets nolist;
delete &varlist;
run;
%mend del;

previous reference: http://www2.sas.com/proceedings/sugi30/028-30.pdf

proc sql;
create table varlist as
select distinct varname from table;
quit;

%let varlist =;

data _null_;
set varlist;
call symput('varlist', trim(resolve('&varlist'))||' '||trim(varname));
run;

null is a special data set name, although sas doesn’t create a data set called “_null”. It allows sas to carry out commands in the data steps.  There is no output data set.

Omitting value in the %let syntax will produce a null value for the macro variable.

resolve function: resolve the value of macro variable in data step.

CALL SYMPUT  assigns value produced in a DATA step to macro variable(S).

SAS: Assign Order Number by group and Calculating Accumulative Total in dataset using Retain Statement

Assigning group order for a set sequence of records.

data temp2;
retain order;
set temp1;
if _n_ =1 then order=1; /* _n_=1 identifies the first record.
else if var1 ne lag(var1) then order= sum(order,1);
else order = order ; 
run;
  • Lag(var) the value of previous record in the var column.
  • retain statement needs to be before the set statement.

Calculating the running total of weight and height for each record. The Retain statement retains the running total from the previous iteration to the next.

data one;
length AccWeight AccHeight 8.;
retain AccWeight 0 AccHeight 0;
set sashelp.class end=eof;
AccWeight = sum(AccWeight, weight);
AccHeight = sum(AccHeight, Height);
run;

The following codes deal with cumulative sum by the group where the variable for calculation contains null values. The variable regs is either 1 or 0 or null value. Sorting by the regs variable will force the the record with a regs value of 1 to be last record by the desired group.

Proc sort data= apps out = sortapp;
by id app_num regs;
run;
data sortapp1;
set sortapp;
by id app_num;
retain sumreg;
firstnum = first.a_num;
lastnum = last.a_num;
if first.a_num then sumreg = sum(regs);
else sumreg = sum(sumreg, regs);
run;

SAS: Identify first and last record by group

Staff are in two wage category “H” and “S” . Sort the table by wage category and wage rate.  Firstrate will equal to 1 if it is the first record in the wagecategory, and the lastrate will equal to 1 if it is the last record in the wagecategory.

proc sort data=sasuser.staff out =sortstaff;
by wagecategory wagerate;
run;
data sortstaff;
set sortstaff;
by wagecategory;
firstrate = first.wagecategory;
lastrate = last.wagecategory;
run;

SAS: Proc Transpose long to wide

  • Need to sort by the variables in the by statement before using the proc transpose.
  • variables in the by statement are row labels;
  • contents in the id variable become the column labels;
  • var variable is the value of the table.
proc transpose data=sum out=sumwide;
by program academicyear yearlevel;
id type;
var count;
run;

SAS: Proc Summary and Statistics Output

SUM ONE VARIABLE

%MACRO summ_prg (data=, out=);
proc summary nway missing data=&data ;
class year program level1 level2;
id progname faculty category ;
var value;
output out=&out(drop=_freq_ _type_)  sum(value)=;
run;

%mend summ_prg;

SUM MULTIPLE VARIABLES

%MACRO summ_prg (data=, out=);
proc summary nway missing data=&data ;
class year program level1 level2;
id progname faculty category ;
var value1 value2 value3;
output out=&out(drop=_freq_ _type_)  sum(value1 value2)=;
run;

%mend summ_prg;

To sum all the numerica variables, use var

_numeric_

and sum=.

CONDITIONAL SUM VARIABLES

%MACRO summ_prg (data=, out=);
proc summary nway missing data=&data (WHERE = (CON_VAR = "XXX ")) ;
class year program level1 level2;
id progname faculty category ;
var value1 value2 value3;
output out=&out(drop=_freq_ _type_)  sum(value1 value2)=;
run;

%mend summ_prg;

OUTPUT DESCRIPTIVE STATISTICS

%MACRO summ_prg (data=, out=);
proc summary nway missing data=&data ;
class year program level1 level2;
id progname faculty category ;
var value;
output out=&out(drop=_freq_ _type_) n(value) = N mean(value)=mean q1(value)=q1 q3(value) =q3;
run;
%mend summ_prg;

SAS: Pros and Cons of Methods to Output SAS Results to Excel

  1. ODS tagset.excelxp
    • Con:can only output to .xml file, not xls or xlsx files;
    • Pro: use Proc Report by statement and output to separate spreadsheets;
  2. ODS html
    • Pro: can output to xls files;
    • Pro : use Proc Report can control label, column width and text wrapping;
    • Con: can’t use Proc Report by statement to output to separate spreadsheets;
ODS HTML BODY = 'C:\cars.xls' style=minimal;
PROC FREQ DATA=sashelp.cars ;
TABLES make*type type origin driveTrain enginesize cylinders horsepower mpg_city ;
ODS OUTPUT OneWayFreqs(match_all)=freqs;
RUN;
ODS HTML CLOSE;
DS HTML BODY = 'C:\cars.xls' style=minimal;
PROC MEANS DATA=sashelp.cars ;
VAR MSRP horsepower mpg_city;
CLASS type origin driveTrain ;
TYPES () type typeorigin typeorigin*drivetrain;
RUN;
ODS HTML CLOSE;

Use DOS Command in SAS

All DOS commands can be used in the following sytax to run in SAS, assuming XCMD is enabled in SAS.

Most common DOS command are copy, mkdir. It can be used to create new directories, files etc. The comand needs to be put in single quoatation ‘ ‘.  If there is space in the folder name or the file name, the directory of the file needs to be in double quotation “”. The folder name and the file name are not case sensitive.

eg.

copy one file

data a;
b= system( 'copy "X:\Work\CPR\1617\1617 ROTA.pdf" X:\Work\CPR\1617\1617rota.pdf' );
run;

 

copy certain file type

data a;
b= system (' copy "x:\sas\project\new data\*.csv" "x:\sas\project\processed data\" ');
run; 

move certain file type

data a;
b= system (' move "x:\sas\project\new data\*.csv" "x:\sas\project\processed data\" ');
run;