PROC MEANS


In this post, we will look at PROC MEANS, a versatile procedure that prints descriptive statistics and can also create summary data sets for further analysis in later DATA or PROC steps. By default, PROC MEANS produces statistics on all the numeric variables in the input SAS data set. Let's see some examples.
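
To get a feel for the default behaviour, here is a minimal sketch. It uses SASHELP.CLASS, a sample data set that ships with SAS, instead of the College data set, so that it can be run on its own. That choice is mine for illustration and is not part of the problem below. With no VAR statement and no statistic keywords, PROC MEANS summarizes every numeric variable with its default statistics.

```sas
*Default PROC MEANS: no VAR statement and no statistic keywords;
*SASHELP.CLASS is used here only as a convenient sample data set;
title "Default PROC MEANS output";
proc means data=sashelp.class;
run;
*Clearing the title so it does not carry over to later steps;
title;
```

The report lists N, mean, standard deviation, minimum, and maximum for Age, Height, and Weight, the three numeric variables in that data set.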

#Problem:

Using the SAS data set College, report the mean GPA for the following categories of ClassRank: 0–50 = bottom half, 51–74 = third quartile, and 75–100 = top quarter.

#Dataset:

*Creating user defined format;
proc format library=A15028;
   value $yesno 'Y','1' = 'Yes'
                'N','0' = 'No'
                ' '     = 'Not Given';
   value $size 'S' = 'Small'
               'M' = 'Medium'
               'L' = 'Large'
                ' ' = 'Missing';
   value $gender 'F' = 'Female'
                 'M' = 'Male'
                 ' ' = 'Not Given';
run;
*Calling the format library;
option FMTSEARCH=(A15028);
*Creating college dataset;
data A15028.A28_college;
   length StudentID $ 5 Gender SchoolSize $ 1;
   do i = 1 to 100;
      StudentID = put(round(ranuni(123456)*10000),z5.);
      if ranuni(0) lt .4 then Gender = 'M';
      else Gender = 'F';
      if ranuni(0) lt .3 then SchoolSize = 'S';
      else if ranuni(0) lt .7 then SchoolSize = 'M';
      else SchoolSize = 'L';
      if ranuni(0) lt .2 then Scholarship = 'Y';
      else Scholarship = 'N';
      GPA = round(rannor(0)*.5 + 3.5,.01);
      if GPA gt 4 then GPA = 4;
      ClassRank = int(ranuni(0)*60 + 41);
      if ranuni(0) lt .1 then call missing(ClassRank);
      if ranuni(0) lt .05 then call missing(SchoolSize);
      if ranuni(0) lt .05 then call missing(GPA);
      output;
   end;
   format Gender $gender1. 
          SchoolSize $size. 
          Scholarship $yesno.;
   drop i;
run;

#Solution:


*Creating a format that groups ClassRank into ranges;
proc format;
   value rank 0-50   = 'Bottom Half'
              51-74  = 'Third Quartile'
              75-100 = 'Top Quarter';
run;
title "Statistics on the College Data Set";
title2 "Broken down by Class Rank";
*Reporting N and mean of GPA for each ClassRank group;
proc means data=A15028.A28_college
           n
           mean
           maxdec=2;
   class ClassRank;
   var GPA;
   format ClassRank rank.;
run;

#Output:


#Explanation:

The program above calculates a basic statistic, the mean, commonly called the average. The College data set has appeared in several earlier posts and is not the focus here, so we will concentrate on the solution. PROC MEANS computes the mean of GPA for each group of ClassRank, where the groups are the ranges defined with PROC FORMAT. Because the FORMAT statement associates the rank. format with ClassRank, PROC MEANS groups the observations by the formatted values and not by each individual rank. The N option reports the number of students with a nonmissing GPA in each group, MEAN reports the average GPA for that group, and MAXDEC=2 limits the printed statistics to two decimal places.
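
The same grouping can also be written to a summary data set instead of a printed report. The sketch below is one possible way to do it, assuming the rank. format from the solution has already been created in the session. The output data set name work.gpa_by_rank and the variable names N_GPA and Mean_GPA are names I made up for illustration.

```sas
*Writing the per-group statistics to a data set instead of printing them;
*NOPRINT suppresses the report, NWAY keeps only the rows for the ClassRank groups;
proc means data=A15028.A28_college noprint nway;
   class ClassRank;
   var GPA;
   format ClassRank rank.;
   *gpa_by_rank, N_GPA and Mean_GPA are hypothetical names;
   output out=work.gpa_by_rank n=N_GPA mean=Mean_GPA;
run;

*Looking at the summary data set;
proc print data=work.gpa_by_rank noobs;
   var ClassRank _FREQ_ N_GPA Mean_GPA;
   format Mean_GPA 5.2;
run;
```

Here _FREQ_ is the number of observations in each group, while N_GPA counts only those with a nonmissing GPA, so the two can differ. The resulting data set can then be used in later DATA or PROC steps like any other.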

PROC MEANS lets you use a CLASS statement in place of a BY statement. The CLASS statement performs a similar function to the BY statement, with some significant differences. If you are using PROC MEANS to print a report and are not creating a summary output data set, the differences in the printed output between a BY and a CLASS statement are basically cosmetic. The main difference, from a programmer's perspective, is that you do not have to sort your data set before using a CLASS statement, whereas a BY statement expects the data to be sorted by the BY variable. You can also control which variables are included in the report by supplying a VAR statement.
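
For comparison, here is a rough sketch of the same report written with a BY statement. It assumes the rank. format from the solution already exists, and work.college_sorted is a hypothetical name for the sorted copy. The extra PROC SORT step is exactly what the CLASS statement lets you skip.

```sas
*A BY statement needs the data sorted by the BY variable first;
*college_sorted is a hypothetical name for the sorted copy;
proc sort data=A15028.A28_college out=work.college_sorted;
   by ClassRank;
run;

title "Statistics on the College Data Set";
title2 "BY statement version";
proc means data=work.college_sorted n mean maxdec=2;
   *Students with a missing ClassRank form their own BY group here;
   by ClassRank;
   var GPA;
   format ClassRank rank.;
run;
```

With BY, each group is printed as its own small table and students with a missing ClassRank appear as a separate group, while the CLASS version prints a single table and leaves those students out by default. As far as I understand, procedures form BY groups from the formatted values, so the rank. format should produce the same three ranges here, but it is worth checking the output on your own data.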
