SAS: Proc SQL Group By Generates Duplicate Results, and the Fix

I needed the sum of award amounts by year, student ID, and award. The following code generated duplicate results and a note in the log.

Problem:

proc sql;
create table want as
select year, id, award, funding_type, type, sum(amount) as sum, 1 as count
from have
where (funding_type eq "A" and type = "Internal") or (funding_type eq "B")
group by year, funding_type, id, award;
quit;

NOTE: The query requires remerging summary statistics back with the original data.

Cause: “In SAS SQL, in a query with a group by clause that includes extraneous columns on the select statement (i.e. columns not part of the group by and not derived from an aggregating function), SAS “remerges” the summary statistics back to the original data (with a note to that effect).”
Reference:
https://stackoverflow.com/questions/25538392/sas-proc-sql-returning-duplicate-values-of-group-by-order-by-variables

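Because the type variable is in the SELECT clause but not in the GROUP BY clause, SAS has to remerge the summary statistics with the original data to get a value of type for each row. The result is one output row per input row, each carrying its group's sum, which is where the duplicates come from. In the following code, I added type to the GROUP BY clause, and the message was gone. Note that this also splits the funding type B sums by type; if you want one total per year, funding type, ID, and award instead, remove type from the SELECT list.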

proc sql;
create table want as
select year, id, award, funding_type, type, sum(amount) as sum, 1 as count
from have
where (funding_type eq "A" and type = "Internal") or (funding_type eq "B")
group by year, funding_type, type, id, award;
quit;
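To see why the first query returns duplicates, here is a small Python model of the two queries (not SAS code). The rows are hypothetical. remerged() mimics what SAS does when type is selected but not grouped: every input row is kept and its group sum is attached to it. grouped() mimics the fixed query, which returns one row per group.

```python
from collections import defaultdict

# Hypothetical rows: (year, id, award, funding_type, type, amount)
rows = [
    (2019, 1, "X", "B", "Internal", 100.0),
    (2019, 1, "X", "B", "External", 50.0),
    (2019, 1, "X", "B", "External", 25.0),
]

def remerged(rows):
    """Problem query: group by year, funding_type, id, award, but also select type.
    The group sum is merged back onto every input row."""
    sums = defaultdict(float)
    for year, sid, award, ftype, typ, amount in rows:
        sums[(year, ftype, sid, award)] += amount
    # One output row per input row, so rows with the same type become exact duplicates
    return [(year, sid, award, ftype, typ, sums[(year, ftype, sid, award)])
            for year, sid, award, ftype, typ, _ in rows]

def grouped(rows):
    """Fixed query: type is also a grouping column, so one row per distinct group."""
    sums = defaultdict(float)
    for year, sid, award, ftype, typ, amount in rows:
        sums[(year, ftype, typ, sid, award)] += amount
    return sorted((year, sid, award, ftype, typ, total)
                  for (year, ftype, typ, sid, award), total in sums.items())
```

With these sample rows, remerged() returns three rows that all carry 175.0 (two of them identical), while grouped() returns two rows: 75.0 for External and 100.0 for Internal.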

Proc SQL: Update a Table from Another Table

UPDATE statement in a DATA step:

Syntax:

data master;
/* non-missing transaction values replace master values for matching BY groups */
update master transaction;
by key1 key2;
run;

Limitation:

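  • The master data set must be sorted by the BY variables and must not contain duplicate BY values.
  • The transaction data set must also be sorted by the BY variables. It may contain several observations with the same BY values; they are applied one after another.
  • The transaction data set should contain only the BY variables and the columns that hold the updated information, because any other non-missing column would also overwrite the master values. By default, missing values in the transaction data set do not overwrite master values.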
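A rough Python model of the DATA step UPDATE behaviour described above (not SAS code; the update() helper and the sample data are hypothetical). Rows are dicts and None stands in for a SAS missing value. It follows the default behaviour: transaction rows are applied in order, missing values leave the master value unchanged, and BY values not found in the master become new rows. SAS only warns about duplicate master BY values; the model raises an error instead.

```python
def update(master, transactions, keys):
    """Apply transaction rows to master rows matched on the `keys` columns."""
    merged = {}
    for row in master:
        key = tuple(row[k] for k in keys)
        if key in merged:
            raise ValueError(f"duplicate BY value in master: {key}")
        merged[key] = dict(row)
    for trans in transactions:
        key = tuple(trans[k] for k in keys)
        # Unmatched BY values start a new row containing only the key columns
        target = merged.setdefault(key, {k: trans[k] for k in keys})
        for col, value in trans.items():
            if value is not None:  # missing values do not overwrite
                target[col] = value
    # Output sorted by the BY values, like the sorted master data set
    return [merged[key] for key in sorted(merged)]
```

For example, two transactions for the same student (one with a new major, one with a new GPA) both end up in that student's single master row.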

Update using the PROC SQL UPDATE statement:

Syntax:

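proc sql;
update table1 as t1
/* correlated subquery: the new value comes from the matching row in table2 */
set major=(select major from table2 as t2 where t1.id = t2.id and t1.year=t2.year)
/* update only rows that have a match; otherwise major would be set to missing */
where exists (select 1 from table2 as t2 where t1.id = t2.id and t1.year=t2.year);
quit;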

Limitation:

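  • The PROC SQL UPDATE statement cannot use a join; the new values must come from a nested (correlated) SELECT.
  • The code is harder to read, and the matching conditions have to be repeated in both the SET and WHERE clauses.
  • The subquery must return at most one row for each row being updated; otherwise PROC SQL stops with an error.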
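The role of the WHERE EXISTS clause can be checked with Python's built-in sqlite3 module, which accepts the same correlated-subquery pattern. The table contents are hypothetical, and the outer alias t1 is replaced by the table name to stay safe on older SQLite versions. One difference from SAS: when the subquery returns several rows, SQLite silently uses the first one instead of stopping with an error.

```python
import sqlite3

SETUP = """
CREATE TABLE table1 (id INTEGER, year INTEGER, major TEXT);
CREATE TABLE table2 (id INTEGER, year INTEGER, major TEXT);
INSERT INTO table1 VALUES (1, 2019, 'Math'), (2, 2019, 'Art');
INSERT INTO table2 VALUES (1, 2019, 'Physics');
"""

def run_update(with_exists):
    """Run the correlated UPDATE with or without WHERE EXISTS; return table1's rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    sql = """UPDATE table1
             SET major = (SELECT t2.major FROM table2 AS t2
                          WHERE t2.id = table1.id AND t2.year = table1.year)"""
    if with_exists:
        # Restrict the update to rows that have a match in table2
        sql += """
             WHERE EXISTS (SELECT 1 FROM table2 AS t2
                           WHERE t2.id = table1.id AND t2.year = table1.year)"""
    con.execute(sql)
    rows = con.execute("SELECT id, year, major FROM table1 ORDER BY id").fetchall()
    con.close()
    return rows
```

run_update(True) leaves student 2's major as 'Art'; run_update(False) sets it to NULL, which mirrors what the PROC SQL version would do without the WHERE EXISTS clause (major set to missing).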

SAS: Identify Lag and Lead Values by Row

For a character variable:

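data want;
set have;
by id;
/* look-ahead: read the same data set from row 2, so index1 holds the next row's index0 */
set  have (firstobs=2 keep=index0 rename=(index0=index1))
     have (obs=1 drop=_all_);  /* extra row keeps the step from stopping one row early */
/* IFC evaluates every argument, so LAG1 runs on every row and its queue stays in step */
indexlag1 = ifc(first.id, '', lag1(index0));
index1 = ifc(last.id, '', index1);
run;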

For a numeric variable:

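data want;
set have;
by id;
/* look-ahead: read the same data set from row 2, so index1 holds the next row's index0 */
set  have (firstobs=2 keep=index0 rename=(index0=index1))
     have (obs=1 drop=_all_);  /* extra row keeps the step from stopping one row early */
/* IFN evaluates every argument, so LAG1 runs on every row and its queue stays in step */
indexlag1 = ifn(first.id, ., lag1(index0));
index1 = ifn(last.id, ., index1);
run;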
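The same lag/lead logic, expressed as a Python sketch over rows already sorted by id. The field names follow the SAS code; the lag_lead_by_group() helper and the sample data are hypothetical. None plays the role of the blank or missing value that IFC/IFN assign at group boundaries.

```python
from itertools import groupby

def lag_lead_by_group(rows, key="id", field="index0"):
    """Attach the previous (indexlag1) and next (index1) value of `field`
    within each `key` group; rows must already be sorted by `key`."""
    out = []
    for _, grp in groupby(rows, key=lambda r: r[key]):
        grp = list(grp)
        for i, row in enumerate(grp):
            new = dict(row)
            # First row of a group has no lag; last row has no lead
            new["indexlag1"] = grp[i - 1][field] if i > 0 else None
            new["index1"] = grp[i + 1][field] if i < len(grp) - 1 else None
            out.append(new)
    return out
```

For ids 1, 1, 1, 2 with index0 values a, b, c, d, the middle row of id 1 gets lag a and lead c, and the single row of id 2 gets neither.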

SAS: Identify and Tag Row-Level Changes

Sort the data set first, then identify the change by comparing each row with the previous one using LAG(var).
In the example below, when a student's cumulative credits drop from the previous term, the degree count increases by 1.

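/* identify the rows where the change took place */
data data2;
set data1;
by sisid term;
/* call LAG on every row; inside IF/ELSE it would skip rows and return the wrong previous value */
prev_cr = lag1(cumm_cr);
if first.sisid then deg = 1;
else if cumm_cr < prev_cr then deg = 2;  /* not first.sisid, so same student */
drop prev_cr;
run;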
/* tagging the observations associated with the change */
data data3 ;
set data2;
by sisid;
retain temp;
if first.sisid then do; temp = 0; end;
if deg ne . then temp = deg;
else if deg eq . then deg= temp;
run;

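Updated version: with RETAIN, deg carries forward to every row of the student and increases by 1 at each drop, so a second drop gives deg = 3 and the separate tagging step (data3) is no longer needed: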

data data2;
set data1;
by sisid term;
retain deg;
/* call LAG on every row so prev_cr is always the previous row's value */
prev_cr = lag1(cumm_cr);
if first.sisid then deg = 1;
else if cumm_cr < prev_cr then deg = deg + 1;
drop prev_cr;
run;
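A Python sketch of the updated degree counter (the tag_degrees() helper and the sample data are hypothetical; terms are just labels, and the rows are assumed to be sorted by sisid and term already). The previous credit value is updated on every row, so each row is compared with the row directly before it.

```python
from itertools import groupby

def tag_degrees(rows):
    """rows: (sisid, term, cumm_cr) tuples sorted by sisid and term.
    Returns (sisid, term, cumm_cr, deg): deg starts at 1 for each student
    and increases by 1 whenever cumulative credits drop from the previous term."""
    out = []
    for _, grp in groupby(rows, key=lambda r: r[0]):
        deg, prev_cr = 1, None  # reset at the first row of each student
        for sisid, term, cumm_cr in grp:
            if prev_cr is not None and cumm_cr < prev_cr:
                deg += 1
            out.append((sisid, term, cumm_cr, deg))
            prev_cr = cumm_cr
    return out
```

A student whose credits go 30, 45, 6, 20, 3 gets deg values 1, 1, 2, 2, 3.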

Access: Using VBA to Customize a Report

In Microsoft Access, VBA code can be used to customize reports. Event procedures, declared with Private Sub, can be attached to the sections and controls of a report.
The following example uses the Format event of the CategoryFooter section, so the note (a label) can be customized, showing or hiding its contents for specific groups.
View of the report design:


The components of the report are objects that can be referenced in VBA. Their names can be checked in the Property Sheet, and they must match the object names used in the VBA code window.

View of the Project Explorer:

View of the object and procedure selector in the code window:

For the VBA code to run, each event procedure must be named object_event (for example, CategoryFooter_Format). When you move the cursor from procedure to procedure in the code window, the Object and Procedure boxes at the top of the window always show the object and event of the procedure containing the cursor.

Private Sub CategoryFooter_Format(Cancel As Integer, FormatCount As Integer)
    ' Show a group-specific note in the footer; hide the footer for other groups
    If Me.level1.Value = "A" Then
        Me.CategoryFooter.Visible = True
        Me.catAfootnote.Visible = True
        Me.catAfootnote.Caption = "*Note: For 18/19, the admission information is as of the reporting date."
    ElseIf Me.level1.Value = "C" Then
        Me.CategoryFooter.Visible = True
        Me.catAfootnote.Visible = True
        Me.catAfootnote.Caption = "*Note: For 18/19, the enrolment FTEs are as of the reporting date."
    Else
        Me.catAfootnote.Visible = False
        Me.CategoryFooter.Visible = False
    End If
End Sub

Tableau: Ratios

Below is a summary table of census topics and sub-topics by geographic area. I want to show percentages instead of sums.
Select Analysis from the menu, then choose Create Calculated Field… from the drop-down menu. Create “Percent of Total” with the following formula.
Drag “Percent of Total” from Measures onto the Text button on the Marks card.
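The arithmetic behind a percent-of-total field, sketched in Python. This assumes the percentage is taken within each geographic area (each column of the summary table); the actual Tableau formula and partitioning may differ, and the percent_of_total() helper and sample figures are hypothetical.

```python
def percent_of_total(table):
    """table: {area: {item: value}}. Returns each item's share of its area's
    total, as a percentage (0.0 if the area's total is zero)."""
    result = {}
    for area, items in table.items():
        total = sum(items.values())
        result[area] = {item: (100 * v / total if total else 0.0)
                        for item, v in items.items()}
    return result
```

For an area whose two items sum to 100, with values 30 and 70, the shares are 30.0 and 70.0.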

Tableau: Hierarchical Filter

  1. Organize the data into a hierarchy by dragging one dimension onto another. In this example, Topic is a hierarchy: L1Topic > L2Topic > Item.
  2. A hierarchy can be used as a conditional filter. In the following examples, when Mobility or Labour is selected as the L1Topic, the L2Topic filter automatically shows only the associated sub-topics.
  3. Sort a field by the value of another field. Before sorting, the L2Topic and Item rows are in alphabetical order by default.
    Select Sort for the field, choose to sort by Field, select Item ID as the field name, and choose Sum as the aggregation.
    L2Topic and Item are now ordered by Item ID, which does not need to appear in the result table.

  4. Hierarchical filter configuration
    • In the filter, select Only Relevant Values for the hierarchical filter to take effect.
    • When data sources are linked, the “Only Relevant Values”, “All Values in Hierarchy”, and “All Values in Database” options may disappear, and then you cannot use the hierarchical filter.
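Steps 2 and 3 can be modeled in plain Python to show what the filter and the sort compute (a sketch, not Tableau's implementation; the helper names, the ItemID field name, and the rows are hypothetical). relevant_values() mimics “Only Relevant Values”: the child filter lists only values that occur with the selected parent values. sort_by_other_field() orders a field by the sum of another field.

```python
def relevant_values(rows, parent_field, child_field, selected):
    """Distinct child values that occur in rows whose parent value is selected."""
    return sorted({r[child_field] for r in rows if r[parent_field] in selected})

def sort_by_other_field(rows, field, by_field):
    """Distinct values of `field`, ordered by the SUM of `by_field` for each value."""
    totals = {}
    for r in rows:
        totals[r[field]] = totals.get(r[field], 0) + r[by_field]
    return sorted(totals, key=totals.get)
```

Selecting only Mobility as the L1Topic leaves just the Mobility sub-topics in the L2Topic list, and sorting by the summed ItemID replaces the default alphabetical order.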

SAS: Reading Census Data

Census data is very large. The 2016 Census Profile file for Ontario is 4.5 GB and has 46,694,909 lines of records. To extract the data efficiently, StatsCan provides a CSV file that gives the starting row number of each geography. Using this file, you can compile the parameter lists for the geographic areas of interest at the selected geographic level, e.g., province, census division, or census subdivision.
The census file can be downloaded at link
Step 1: Compile parameter lists

%put &name.;
Canada Ontario Durham York Toronto Peel Halton
%put &start.;
2 2249 7287023 9513800 12198965 20521853 25289987
%put &end.;
2248 4495 7289269 9516046 12201211 20524099 25292233
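The lists in Step 1 can be derived from the starting-row file. A minimal Python sketch, assuming each geography occupies a contiguous block that ends on the row before the next geography's starting row (consistent with Canada ending at 2248 and Ontario starting at 2249). The compile_ranges() helper is hypothetical, and geo_starts stands in for the StatsCan file; its entries and last_line are illustrative.

```python
def compile_ranges(geo_starts, selected, last_line):
    """geo_starts: (geo_name, starting_row) pairs in file order.
    Each geography ends on the row before the next one starts; the last
    one ends at `last_line`. Returns space-separated name/start/end lists."""
    names, starts, ends = [], [], []
    for i, (name, start) in enumerate(geo_starts):
        end = geo_starts[i + 1][1] - 1 if i + 1 < len(geo_starts) else last_line
        if name in selected:
            names.append(name)
            starts.append(str(start))
            ends.append(str(end))
    return " ".join(names), " ".join(starts), " ".join(ends)
```

The three returned strings have the same shape as the &name., &start., and &end. values above, so they could be pasted into %let statements. Multi-word geography names would need a different delimiter, because the macro splits the lists on blanks.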

Step 2: Extract and Compile data

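%macro ext (namelst=, startlst=, endlst=);
/* start from an empty CENSUS data set */
proc datasets library=work noprint;
delete census;
quit;
%let i=1;
/* loop over geographies: parm1 = name, parm2 = first row, parm3 = last row */
%do %while (%scan(&namelst., &i, %str( )) ne );
%let parm1=%scan(&namelst., &i, %str( ));
%let parm2=%scan(&startlst., &i, %str( ));
%let parm3=%scan(&endlst., &i, %str( ));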
data census_&parm1.;
infile 'X:\Work\Stats Can\98-401-X2016044_ONTARIO_eng_CSV\98-401-X2016044_ONTARIO_English_CSV_data.csv'
delimiter = ',' firstobs=&parm2. obs=&parm3. TRUNCOVER  DSD LRECL=32767 ;
INFORMAT 
year 8.
geo_code  $13.
geo_level 8.
geo_name $80.
gnr 8.   /* plain 8.: a w.d informat divides values that lack a decimal point by 10**d */
gnr_lf 8.
quality_flag $5. 
alt_geo_code 8.
Item $50.
itemID 8.
Notes 8.
Total 8.
Male $8.
Female $8.
;
FORMAT   /* display formats; character widths match the informat lengths above */
year 8.
geo_code  $13.
geo_level 8.
geo_name $80.
gnr 8.1
gnr_lf 8.1
quality_flag $5. 
alt_geo_code 8.
Item $50.
itemID 8.
Notes 8.
Total 8.
Male $8.
Female $8.
;
input
year 
geo_code $
geo_level 
geo_name $
gnr
gnr_lf
quality_flag 
alt_geo_code 
Item $
itemID 
Notes 
Total
Male $
Female $
;
run;
proc append base = census data = census_&parm1.;
run;
%let i = %eval(&i + 1);
%end;
%mend ext;
%ext (namelst=&name., startlst=&start., endlst=&end. );

The resulting data set is only 4.1 MB and can be manipulated efficiently.
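For comparison, the FIRSTOBS=/OBS= approach can be sketched in Python with itertools.islice, reading only the requested line ranges and appending the parsed rows. This assumes no quoted field contains a line break; extract_ranges() is a hypothetical helper, and the encoding should be set to match the downloaded file.

```python
import csv
from itertools import islice

def extract_ranges(path, ranges, encoding="utf-8"):
    """For each (firstobs, obs) pair (1-based, inclusive, like SAS FIRSTOBS=/OBS=),
    parse only those lines as CSV and append the rows in order."""
    rows = []
    for firstobs, obs in ranges:
        with open(path, newline="", encoding=encoding) as f:
            lines = islice(f, firstobs - 1, obs)  # 0-based start, exclusive stop
            rows.extend(csv.reader(lines))
    return rows
```

As in the SAS macro, the file is scanned once per range, but only the selected lines are parsed and kept in memory.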

SAS: Convert Format Catalogs from 32-bit to 64-bit

Reference: http://support.sas.com/kb/44/047.html

Error message:
ERROR: File FORMATS.CATALOG was created for a different operating system.
Step 1: In 32-bit SAS for Windows, create a transport file (.cpt) with PROC CPORT and the FILE= option.

libname my32 'X:\Work\SAS\formats'; /* path where commonfmt.sas7bcat exists */
filename cat1 'X:\Work\SAS\formats\commonfmt.cpt';  /* transport file you are creating */

proc cport lib=my32 file=cat1 memtype=catalog;
   select commonfmt;
run;

The .cpt file contains the formats from the commonfmt.sas7bcat catalog.

Step 2: In 64-bit SAS for Windows, import the transport file (.cpt) with PROC CIMPORT and the INFILE= option.

libname my64 'X:\Work\SAS\format64';  /* folder for the new 64-bit commonfmt.sas7bcat catalog */
filename trans1 'X:\Work\SAS\formats\commonfmt.cpt';  /* same as in Step 1 above */

proc cimport infile=trans1 lib=my64;
run;

Step 3: Check the formats in 64-bit SAS for Windows.

libname sasfmt  'X:\Work\SAS\format64';

PROC CATALOG CATALOG = SASFMT.COMMONFMT;
CONTENTS;
QUIT;