5 Mistakes to Avoid in SAS
Mistake #1: When assigning the length of a variable in SAS
What do you think is the length of the variable TYPE in the code below?
data customer.loan;
    set customer.homeloan;
    Total+emi;
    if code='1' then Type='Fixed';
    else Type='Floating';
    length Type $ 10;
run;
Looking at the LENGTH statement just before run;, you might answer 10 characters. In fact, the length is 5.
Remember! The length of a new variable is determined by its first reference in the DATA step. Here the first reference is the assignment Type='Fixed', so TYPE gets a length of 5 and the value 'Floating' is truncated to 'Float'. The LENGTH statement is in the wrong place. It works only if it is placed before any other reference to the variable in the DATA step. Once the length of a character variable has been set, a later LENGTH statement cannot change it.
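As a minimal sketch of the fix, assuming TYPE does not already exist in customer.homeloan, the LENGTH statement can be moved to the top of the step:

```sas
data customer.loan;
    /* Declare the length before any other reference to Type */
    length Type $ 10;
    set customer.homeloan;
    Total + emi;
    if code = '1' then Type = 'Fixed';
    else Type = 'Floating';   /* no longer truncated to 'Float' */
run;
```

With the declaration first, both 'Fixed' and 'Floating' fit in the 10 characters reserved for Type.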
Mistake #2: When using comparison operators in SAS
Can you predict the output of the
following program?
data compare;
    input A;
    if A = 4 or 5 then Found = 'Yes';
    else Found = 'No';
datalines;
4
7
8
;
title "Listing of Compare";
proc print data=compare noobs;
run;
You might expect the above program to produce the following output:
Listing of Compare
A    Found
4    Yes
7    No
8    No
But the actual output is:
Listing of Compare
A    Found
4    Yes
7    Yes
8    Yes
Surprised? In SAS, any numeric value other than 0 or missing is treated as true. The condition A = 4 or 5 is evaluated as (A = 4) or (5). Since 5 is always true, the whole condition is always true, whatever the value of A.
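One way to write the intended test is to compare A against each value explicitly, or to use the IN operator. A short sketch, reusing the same data:

```sas
data compare;
    input A;
    /* Equivalent to: if A = 4 or A = 5 */
    if A in (4, 5) then Found = 'Yes';
    else Found = 'No';
datalines;
4
7
8
;
run;
```

With this condition, only the observation with A equal to 4 gets Found = 'Yes'; 7 and 8 get 'No'.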
Mistake #3: When stacking datasets in SAS
If the dataset NORTH had 3
observations and SOUTH had 6 observations, how many observations would the
dataset EMP have in the following code?
data emp;
    set north(in=a) south(in=b);
    if a and b;
run;
This looks like a simple case of vertical stacking, so you would expect the dataset EMP to have 9 observations, right?
While observations are being read from NORTH, the IN= variable a is 1 and b is 0. While observations are being read from SOUTH, b is 1 and a is 0. The two are never 1 at the same time, so the subsetting IF rejects every observation and the output dataset EMP has zero observations.
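If plain vertical stacking was the goal, the subsetting IF can simply be dropped. The sketch below also uses the IN= variables to record where each observation came from; the variable name Source is made up for this illustration.

```sas
data emp;
    set north(in = a) south(in = b);
    length Source $ 5;                /* hypothetical tracking variable */
    if a then Source = 'North';       /* observation read from NORTH */
    else if b then Source = 'South';  /* observation read from SOUTH */
run;
```

With 3 observations in NORTH and 6 in SOUTH, this version of EMP has 9 observations. A test such as if a and b; is normally useful only with MERGE and a BY statement, where a single output observation can draw on both datasets.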
Mistake #4: When using column input in SAS
You know that column input is appropriate only in the following situations:
- When data is in standard character or numeric values.
- When values for a variable are in the same location in all records.
Now what do you think would be the value of the variable BREADTH in the output dataset below?
data test;
    infile cards;
    input @1 length 2. @4 breadth 2;
cards;
72 95
;
run;
proc print;
run;
You might say "BREADTH is 95, obviously!" Right?
The correct answer is 2. Because there is no period after the 2, SAS treats breadth 2 as column input rather than as the informat 2. (with a period). The 2 names the column from which BREADTH is read, so SAS reads the single character in column 2 of the record, which is 2, regardless of the @4 pointer control in front of it.
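Two ways to read 95 into BREADTH are sketched below: either add the missing period so that 2. is an informat, or switch to column input with explicit column ranges. The column positions assume the record layout shown above (72 in columns 1-2, 95 in columns 4-5).

```sas
data test;
    /* Column input: each variable names its start and end columns */
    input length 1-2 breadth 4-5;
    /* Formatted-input alternative (note the period after each 2): */
    /* input @1 length 2. @4 breadth 2.; */
datalines;
72 95
;
run;

proc print data=test;
run;
```

Either form reads LENGTH as 72 and BREADTH as 95 for this record.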
Mistake #5: When the DATA step reaches the end of a dataset in SAS
Look at the program below:
data short;
    input x;
datalines;
1
2
;
run;
data long;
    input x;
datalines;
3
4
5
6
;
run;
data new;
    set short;
    output;
    set long;
    output;
run;
proc print data=new;
run;
In the above program, an observation is first read from the SHORT dataset and written out to the NEW dataset. Then an observation is read from the LONG dataset and written to NEW. You might expect this to continue until all the observations from both datasets have been read, so that NEW contains the values 1, 3, 2, 4, 5, 6. Right?
Not quite. SHORT has only 2 observations, 1 and 2. On the third iteration, the first SET statement tries to read past the end of SHORT, and that ends the DATA step immediately. As a result, NEW has only four observations, with values of x equal to 1, 3, 2, and 4.
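If the aim is to keep alternating until both datasets are exhausted, one common pattern is to guard each SET statement with an END= variable so that neither is executed past its last observation. This is only a sketch; the names eof_short and eof_long are chosen here for illustration.

```sas
data new;
    /* Read from SHORT only while it still has observations */
    if not eof_short then do;
        set short end = eof_short;
        output;
    end;
    /* Read from LONG only while it still has observations */
    if not eof_long then do;
        set long end = eof_long;
        output;
    end;
    /* Stop explicitly once both datasets have been read in full */
    if eof_short and eof_long then stop;
run;
```

If the order of the observations does not matter, a plain set short long; is simpler: it concatenates the two datasets and gives x equal to 1, 2, 3, 4, 5, 6.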
So these are some of the common mistakes you can make while writing SAS code. Make sure to avoid them!
“Sometimes, little things make a big difference…”
– Nino Varsimashvili
Reference: http://analyticstraining.com/2016/5-mistakes-avoid-sas/