A. Scenario
The data set contains personal response on attribute 1 to attribute 4. The response can be either ‘Y’ or ‘N’. Some of the records are blank. The possible combinations for attribute 1 to 4 are 2x2x2x2, which is 16 combinations.
I would like to know how many people belongs any one of the 16 combination of the attributes. The distribution of people among the combination of attributes indicates the prominent attributes of the population (where the distribution is dense) and the outlier attributes (where the distribution is sparse).
B. Code
- create table with all combination
data combination; format attr1 $1. attr2 $1. attr3 $1. attr4 $1.; *Important to add format statement. The variables need to be exactly the same format as the data table that you want to run summary statistics on; do attr1= 'Y','N'; do attr2 = 'Y','N'; do attr3 = 'Y','N'; do attr4 = 'Y','N'; output; end; end; end; end; run;
- run sum use Proc tabulate with classdata option
proc tabulate data=data out = sum missing classdata = combination; *classdata option use the combination table created in the last step; class attr1 attr2 attr3 attr4; var count; table attr1*attr2*attr3*attr4, count*(sum=""); run;
C. Output
D. Notes
- make sure that the variables used in the class statement of the Proc tabulate procedure are included in the classdata table;
- The output table sum automatically add the combination of null to all attr1 to attr4.
- make sure that the variables used in classdata table have exactly the same name and format as the variables in the table that you want to run statistics on. In the above example, the attr1-4 in classdata table combination need to match the variables attr1-4 in the data table. Use Proc contents procedure on both data sets to make sure these variables are matching. Merely check the format in column attributes is not enough. If you omit the format statement when creating the classdata table combination, you will get the following error.