Course1 contains the counts information for students who have taken crs1000 to crs 2000. If the student haven’t taken the specific course, the value for the course of the student is 0. If the student have taken the specific course for multiple times, the value for the course is the total number of times the student took the course.
In following code, the array was set up to group the course of interest; count variable is used to capture number of distinct course taken from the selected courses; ifretake variable is used to track if student retakes any of the course from the selected courses.
data course2;
set course1;
array crs{*} crs1000 crs1010 crs1015 crs1200 crs1450 crs2000;
count=0;
ifretake =0;
do i =1 to dim(crs);
if crs{i}>0 then count = count+1;
if crs{i}>1 then ifretake = 1;
end;
run;
The following SAS codes get the filenames from L:\DaraFiles directory and pass them to a macro variable called varlist.
%let varlist =;
data null;
rc =filename("mydir", "L:\DataFiles");
did = dopen("mydir");
if did > 0 then do;
num =dnum(did);
do i = 1 to num;
fname =dread (did, i );
put fname = ;
call symput('varlist', trim(resolve('&varlist'))||' '||trim(fname));
end;
end;
run;
%put &varlist.;
Key Functions:
filename: assign the directory to a alias.
dopen: open the directory and get the directory identifier;
The National Collegiate Athletic Association (NCAA®) is a non-profit organization which regulates most of the United State’s college athletic programs, now supporting nearly half a million student athletes. They officially run the US Men’s and Women’s Division I college basketball tournament, March Madness®.
Google Cloud offers a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, like Google Search and YouTube. Google Cloud is now the official cloud of the NCAA®.
Kagglers who participate in this competition will use their computing power along with the NCAA®’s data to join the millions of fans who attempt to predict the outcomes of this year’s tournament.
The competition is split into 2 stages. In the first stage, Kagglers will rely on results of past tournaments to build and test models. In the second stage, competitors will forecast outcomes of all possible match-ups in the 2018 tournament. You don’t need to participate in the first stage to enter the second – the real competition is forecasting the 2018 results.
For example, data table contains student grades for the term. Each grade is one record and the term, program, student id information are the same for the student. The following data step can be used to order the data table and then concatenate the grade records into something like ‘A/A/B/B+/A’ for each student.
proc sort data= one;
by groupvar1 groupvar2 groupvar3;
run;
data two;
set one;
by groupvar1 groupvar2 groupvar3;
firstgroup = first.groupvar2;
lastgroup = last.groupvar2;
retain catvar;
if firstgroup =1 then catvar=trim(targetvar);
else catvar=catx('/',catvar, targetvar);
if lastgroup ne 1 then delete;
keep groupvar1 groupvar2 catvar;
run;
The following proc tabulate procedure will generate at a report on sum of undergraduate teaching and graduate teaching and FTE for each Faculty by departments (row header) and by year (column header).
proc tabulate data=data;
class year faculty dept;
var undergrad_load graduate_load fte;
table faculty, (dept all), year*(undergrad_load="UG" graduate_load="GR" fte)*(sum="");
run;
Data table contains grade, program, and incoming year of students. “Total” variable equals to 1 if the student has a grade; “ge75” is 1 if the grade is great than and equal to 75; and “ge80” is 1 if the grade is great than and equal to 80.
The following proc tabulate procedure gives a summary table of total number of counts of students with grade, students with grade great than and equal to 75 and 80.
proc tabulate missing f=6. noseps data=data;
class program progname category year level2;
var total ge75 ge80;
tables category,program*progname*level2,year*(total ge75 ge80) / rts=40 indent=3;
run;
PERCENTAGE STATISTICS
reppctn: report percentage (all rows and all columns add up to 100)
colpctn: column percentage (every column adds up to 100)
rowpctn: row percentage (every row adds up to 100)
A*pctsum: construct a custom denomination, both A and B are analysis variables, and B as in the is the denominator for calculating the percentage.
The following tabulate procedure calculates the percentage of greater and equal to 75 and 80. The outomatic SAS naming with ‘1111111’ of the percentage output reflects the number of class variables in the procedure.
proc tabulate missing f=6. noseps data=data out=dataout (drop=_type_ _table_ _page_ rename=(ge75_pctsum_1111111=ge75 ge80_pctsum_1111111=ge80));
class level1 faculty program progname category level2 year;
var total ge75 ge80;
tables category*level1*faculty*program*progname*level2*year, ge75*pctsum='ge75%'*f=8.2 ge80*pctsum='ge80%'*f=8.2 / rts=40 indent=3;
run;
Examples: with multiple class variables in row expression
Reppctn/Reppctsum
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*reppctsum="" );
run;
Colpctn/Colpctsum — best for % sum for multiple class variables
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*colpctsum="" );
run;
Rowpctn/Rowpctsum
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*rowpctsum="" );
run;
pctsum – not working for the All column
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all="All Year" )*(heads="Heads"*SUM="" heads="% Total"*pctsum < agegroup all gender all > ="" );
run;
Problem: myattrmap not reconginzed by the scatter statement though ‘attrid= myid’ doesn’t generate error.
data myattrmap;
retain id value linecolor fillcolor;
length linecolor $ 9 fillcolor $ 9;
input ID $ value $ linecolor $ fillcolor $;
datalines;
myid F blue blue
myid M red red;
run;