Thursday 15 December 2016

5 Mistakes to Avoid in SAS




Mistake # 1: When assigning the length of a variable in SAS       
What do you think the length of the variable TYPE is in the code below?

data customer.loan;
   set customer.homeloan;
   Total + emi;
   if code = '1' then Type = 'Fixed';
   else Type = 'Floating';
   length Type $ 10;
run;

Judging by the LENGTH statement just before run; the length should be 10 characters, right?
Remember! The length of a new variable is determined by its first reference in the DATA step. In this case, the first reference is the assignment of the value ‘Fixed’, so TYPE gets a length of 5 and the value ‘Floating’ is truncated to ‘Float’. The LENGTH statement is in the wrong place: it takes effect only if it comes before any other reference to the variable in the DATA step. A LENGTH statement cannot alter the length of a character variable once that length has been set.
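
A minimal sketch of the fix, assuming the same customer.homeloan dataset and the same variables (code, emi) as in the program above. Placing the LENGTH statement first means TYPE is declared with 10 characters before anything else refers to it:

```sas
data customer.loan;
   length Type $ 10;                    /* declared before any other reference to Type */
   set customer.homeloan;
   Total + emi;
   if code = '1' then Type = 'Fixed';
   else Type = 'Floating';              /* fits: 8 characters within a length of 10 */
run;
```

Putting LENGTH ahead of the SET statement is a defensive choice: it also covers the case where the input dataset already contains a variable with the same name.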

Mistake # 2: When using comparison operators in SAS
Can you predict the output of the following program?

data compare;
   input A;
   if A = 4 or 5 then Found = 'Yes';
   else Found = 'No';
datalines;
4
7
8
;
title "Listing of Compare";
proc print data=compare noobs;
run;

You might expect the above program to produce the following output:
Listing of Compare
A    Found
4    Yes
7    No
8    No
But the actual output is:
Listing of Compare
A    Found
4    Yes
7    Yes
8    Yes

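Surprised? SAS does not read A = 4 or 5 as “A equals 4 or A equals 5”. It evaluates two separate conditions joined by OR: the comparison A = 4 and the bare constant 5. In SAS, any numeric value other than 0 or missing is true. Therefore, 5 is always evaluated as true and the whole expression A = 4 or 5 is true for every observation.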
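
One way to write the intended test, sketched with the same data as above: give each side of the OR its own complete comparison, or use the IN operator.

```sas
data compare;
   input A;
   if A = 4 or A = 5 then Found = 'Yes';   /* each side of OR is a full comparison */
   else Found = 'No';
   /* equivalent alternative: if A in (4, 5) then Found = 'Yes'; */
datalines;
4
7
8
;
title "Listing of Compare";
proc print data=compare noobs;
run;
```

With this condition, only the observation where A is 4 is flagged Yes; 7 and 8 are flagged No.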

Mistake # 3: When stacking datasets in SAS

If the dataset NORTH had 3 observations and SOUTH had 6 observations, how many observations would the dataset EMP have in the following code?

data emp;
   set north(in=a) south(in=b);
   if a and b;
run;

This looks like a simple case of vertical stacking, so you would expect the dataset EMP to have 9 observations, right?
When SET stacks datasets, each observation comes from exactly one of them. While an observation is being read from NORTH, the IN= variable a is 1 and b is 0. While an observation is being read from SOUTH, b is 1 and a is 0. The two are never 1 at the same time, so the subsetting IF statement rejects every observation and the output dataset EMP has zero observations.
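
A short sketch of plain stacking, assuming the same NORTH and SOUTH datasets. Dropping the subsetting IF keeps all 9 observations, and the IN= variables can still be used to record where each row came from. The variable Region is hypothetical, added here only for illustration:

```sas
data emp;
   length Region $ 5;                   /* hypothetical variable, not in the original */
   set north(in=a) south(in=b);
   if a then Region = 'North';          /* a = 1 only for rows read from NORTH */
   else if b then Region = 'South';     /* b = 1 only for rows read from SOUTH */
run;
```

A condition such as if a and b; is only useful with MERGE and a BY statement, where one output observation can draw on both datasets at once.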





Mistake # 4: When using column input in SAS
You know that column input is appropriate only in the following situations:
  • When data is in standard character or numeric values.
  • When values for a variable are in the same location in all records.
Now what do you think would be the value of the variable BREADTH in the output dataset below?

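data test;
   infile cards;
   input @1 length 2. @4 breadth 2;
cards;
72 95
;
run;
proc print;
run;

You say “Breadth is 95, obviously!” Right?
The correct answer is 2. LENGTH is read with formatted input, because ‘2.’ (with a period) is an informat. BREADTH is read with column input: the ‘2’ after BREADTH has no period, so it specifies the column from which BREADTH is to be read and not an informat. The @4 pointer is overridden, and SAS reads BREADTH from column 2 of the record, which holds the digit 2. Remember, there is no ‘dot’ after that 2.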
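
A minimal sketch of the corrected INPUT statement, using the same record as above. Adding the period turns the second 2 into an informat, so BREADTH is read as two columns starting at the @4 pointer:

```sas
data test;
   infile cards;
   input @1 length 2. @4 breadth 2.;    /* formatted input: 2. is an informat */
   /* column-input alternative: input length 1-2 breadth 4-5; */
cards;
72 95
;
run;
proc print;
run;
```

Either form reads LENGTH as 72 and BREADTH as 95.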


Mistake # 5: When a DATA step reaches the end of a dataset in SAS

Look at the program below:
data short;
input x;
datalines;
1
2
;
run;

data long;
input x;
datalines;
3
4
5
6
;
run;

data new;
   set short;
   output;
   set long;
   output;
run;
proc print data = new;
run;

In the above program, an observation is first read from the SHORT dataset and written to the NEW dataset. Then an observation is read from the LONG dataset and written to the NEW dataset. You expect this to continue until all the observations from both datasets have been read. So, the dataset NEW would have 1, 3, 2, 4, 5, 6. Right?

The dataset SHORT has only 2 observations, 1 and 2. It is the shorter dataset. A DATA step stops as soon as any SET statement tries to read past the end of its dataset. On the third iteration, the SET statement for SHORT finds no more observations, so the step ends there and the remaining observations in LONG (5 and 6) are never read. As a result, the dataset NEW has only four observations, with values of x equal to 1, 3, 2, and 4.
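
If the goal really is to alternate between the two datasets until both are exhausted, one possible approach is to guard each SET statement with its own END= flag. This is a sketch, assuming the SHORT and LONG datasets created above; the flag names done_short and done_long are hypothetical:

```sas
data new;
   /* read from SHORT only while it still has unread observations */
   if not done_short then do;
      set short end=done_short;   /* done_short becomes 1 on the last observation */
      output;
   end;
   /* read from LONG only while it still has unread observations */
   if not done_long then do;
      set long end=done_long;     /* done_long becomes 1 on the last observation */
      output;
   end;
   /* no SET statement will run again, so end the step explicitly */
   if done_short and done_long then stop;
run;
proc print data=new;
run;
```

Because no SET statement is ever asked to read past the end of its dataset, the step is not cut short by SHORT, and the intent is that NEW receives all six observations. The explicit STOP matters: once both flags are set, no SET statement executes and nothing else would end the step. If simple concatenation is enough, a single SET statement naming both datasets (set short long;) is the simpler choice.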

So these are some of the common mistakes you can make while writing SAS code. Make sure to avoid these!


“Sometimes, little things make a big difference…”
– Nino Varsimashvili

Reference: http://analyticstraining.com/2016/5-mistakes-avoid-sas/
