Wednesday 14 December 2016

Arrays in SAS


SINGLE DIMENSION ARRAYS IN SAS
Almost every programming language has the concept of an array, but SAS arrays work a little differently. A SAS array is simply a group of variables of the same type: all the variables associated with an array must be either numeric or character. The elements do not need to be contiguous, of the same length, or even related to each other. An array is not a data structure, an array name is not a variable name, and an array normally does not define new variables (there is an exception to that, shown later).
Arrays let us carry out the same action on many variables of the same type in a data set with minimal coding.
The general syntax of array is as follows:
ARRAY array_name {dim} <$> <length> <array_elements> <(initial_values)>;
  • Array_name: The name given to the array. It must be a valid SAS name and must not be the name of an existing variable in the same DATA step. You can use a function name as an array name, but then SAS treats that name as the array within the step, so you cannot call the function there.
  • Dim: The number of elements (variables) grouped in the array.
  • $: The dollar sign indicates that the array elements are character. It matters when the ARRAY statement creates new variables; for existing variables, the type comes from the variables themselves.
  • Length: A number after the $ sign sets the length of elements that have not already been assigned a length. Existing variables keep their own lengths, so the grouped variables can have different lengths.
  • Array_elements: The names of the variables (separated by spaces) to be associated with the array.
  • Initial_values: The values assigned to the array elements before the first observation is processed. Elements that receive initial values are retained from one observation to the next.
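As a minimal sketch of how the pieces of the ARRAY statement fit together, the step below declares a character array with an explicit length and initial values. The data set name syntax_demo and the variables grade1-grade3 are made up for illustration; they do not come from any data set used in this paper.

```sas
data syntax_demo;
   /* array_name: grade, dim: 3, $: character, length: 2        */
   /* array_elements: grade1-grade3, initial_values: 'A' 'B' 'C' */
   array grade{3} $ 2 grade1-grade3 ('A' 'B' 'C');
   do i = 1 to dim(grade);
      put grade{i}=;      /* writes each element to the log */
   end;
   drop i;                /* keep the loop counter out of the output */
run;
```

Because the step reads no input, it runs once and writes a single observation containing grade1, grade2, and grade3.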
Now let's look at arrays in practice.
Single dimension arrays:
The SAS output below shows the SASHELP.CLASS data set. I will use it, along with a few other data sets, throughout this paper.
Obs    Name       Sex    Age    Height    Weight

 1    Alfred      M      14     69.0      112.5
 2    Alice       F      13     56.5       84.0
 3    Barbara     F      13     65.3       98.0
 4    Carol       F      14     62.8      102.5
 5    Henry       M      14     63.5      102.5
 6    James       M      12     57.3       83.0
 7    Jane        F      12     59.8       84.5
 8    Janet       F      15     62.5      112.5
 9    Jeffrey     M      13     62.5       84.0
10    John        M      12     59.0       99.5
11    Joyce       F      11     51.3       50.5
12    Judy        F      14     64.3       90.0
13    Louise      F      12     56.3       77.0
14    Mary        F      15     66.5      112.0
15    Philip      M      16     72.0      150.0
16    Robert      M      12     64.8      128.0
17    Ronald      M      15     67.0      133.0
18    Thomas      M      11     57.5       85.0
19    William     M      15     66.5      112.0
Now, if I want to add 10 to every numeric variable, a simple DATA step without arrays would look like the one below.
data abc;
set sashelp.class;
age=age+10;
weight=weight+10;
height=height+10;
run;

There are only three variables whose values need to increase by 10, so the above program works fine. But what if you have many more variables to change? In situations like that, SAS arrays become much more helpful.
data abc;
set sashelp.class;
array varn(3) age height weight;
do i=1 to 3;
varn(i)=varn(i)+10;
end;
run;
In the above program, the ARRAY statement defines an array named varn. The dimension is given explicitly as 3, and the variables associated with varn are age, height, and weight. All three are numeric, so varn is a numeric array (all the elements of an array must be of the same type).
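One side effect worth noting: the loop counter i is an ordinary variable, so it is written to abc along with the other variables. A sketch of the same step that keeps it out of the output, assuming you do not need i afterwards:

```sas
data abc;
   set sashelp.class;
   array varn(3) age height weight;
   do i = 1 to 3;
      varn(i) = varn(i) + 10;   /* same update as before */
   end;
   drop i;                      /* exclude the loop counter from abc */
run;
```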
data abc;
set sashelp.class;
array varc(2) name sex;
genname=cat(varc(1), varc(2));
run;

If you look closely at the above program, there is no dollar sign after the dimension in the ARRAY statement. So is this a numeric array? The answer is no. When the elements are existing variables, the array takes its type from them. Here both elements are character, so the array is a character array.
In SAS arrays, the lower bound is 1 unless you specify otherwise. You can set the lower and upper bounds explicitly, as shown in the following program.
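To see how the type is decided, the sketch below uses the VTYPE function, which returns C for a character variable and N for a numeric one. The array names newc and newn are hypothetical and exist only for this illustration.

```sas
data _null_;
   set sashelp.class (obs=1);
   array varc(2) name sex;   /* existing character variables, no $ needed    */
   array newc(2) $ 1;        /* creates newc1, newc2 as character, length 1  */
   array newn(2);            /* creates newn1, newn2 as numeric              */
   type_varc = vtype(varc(1));   /* C */
   type_newc = vtype(newc(1));   /* C */
   type_newn = vtype(newn(1));   /* N */
   put type_varc= type_newc= type_newn=;
run;
```

The dollar sign only makes a difference for newc, where the ARRAY statement itself creates the variables.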
data lbound;
set sashelp.class;
array varc(10:11) name sex;
genname=cat(varc(10), varc(11));
run;
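When the bounds do not start at 1, hard-coding them in a loop is easy to get wrong. A small sketch using the LBOUND, HBOUND, and DIM functions on the same array (the variable names lo, hi, and n_elem are illustrative):

```sas
data _null_;
   set sashelp.class (obs=1);
   array varc(10:11) name sex;
   lo     = lbound(varc);   /* 10 */
   hi     = hbound(varc);   /* 11 */
   n_elem = dim(varc);      /* 2, the number of elements */
   do i = lo to hi;
      put i= varc{i}=;      /* writes each index and element to the log */
   end;
run;
```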

When a data set has many variables of the same type and you want to put all of them into an array without typing every name, you can use the keywords _NUMERIC_, _CHAR_, and _ALL_.
data key;
set sashelp.class;
array varn(3) _numeric_;
array varc(2) _char_;
do i=1 to 3;
varn(i)=varn(i)+10;
end;
genname=cat(varc(1), varc(2));
run;

Remember that when you use the _ALL_ keyword, all the variables in the data set must be of the same type. In the program below, the data set all has 5 variables, 3 numeric and 2 character. The first two variables are associated with the array without any problem because they share a type (character), but SAS cannot add the remaining three because they are numeric. The log therefore shows the following error message once for each of those variables.
data all;
set sashelp.class;
array mix(5) _all_;
run;

ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
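One way to make _ALL_ usable here, sketched under the assumption that you only need the numeric variables, is to restrict the incoming variables with the KEEP= data set option so that everything left is of one type. The data set name allnum is made up for this example.

```sas
data allnum;
   set sashelp.class (keep=age height weight);
   array mix(3) _all_;      /* only numeric variables remain, so no error */
   do i = 1 to 3;
      mix(i) = mix(i) + 10;
   end;
   drop i;
run;
```

The array is defined before i is created, so the loop counter is not picked up by _ALL_.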

Let's take the example below and see how to create a dynamic array.
In the example below, a * takes the place of the dimension, which indicates that the array is a dynamic array. At the compilation stage, SAS counts the variables listed in the ARRAY statement and determines the dimension from them. You can also use the keywords _numeric_, _char_, and _all_ with dynamic arrays, as shown in the second example below.
data dyn;
set sashelp.class;
array varc(*) name sex;
array varn(*) age height;
run;

data dynkey;
set sashelp.class;
/* varn holds the numeric variables, varc the character ones */
array varn(*) _numeric_;
array varc(*) _char_;
run;
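With a dynamic array you usually do not know the dimension when writing the loop, so the DIM function is the natural upper bound. A minimal sketch that upper-cases every character variable (the data set name dynloop is illustrative):

```sas
data dynloop;
   set sashelp.class;
   array varc(*) _char_;            /* name and sex */
   do i = 1 to dim(varc);           /* dim(varc) is resolved by SAS, 2 here */
      varc(i) = upcase(varc(i));
   end;
   drop i;
run;
```

The same step keeps working unchanged if more character variables are added to the input data set.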

Now someone might ask what happens with a dynamic array if we specify neither a keyword nor the names of the elements. In this case, the program gives an error and stops, because SAS has to determine the dimension of the dynamic array at the compilation stage and there is nothing to count.
data noname;
set sashelp.class;
array varc(*);
run;

ERROR: The array varc has been defined with zero elements.
If you list more elements or fewer elements than the dimension of the array, as in the examples below, you will get the following errors and SAS will stop.
data less;
set sashelp.class;
array varc(2) name;
run;

ERROR: Too few variables defined for the dimension(s) specified for the array varc.

data more;
set sashelp.class;
array varc(2) age height weight;
run;
ERROR: Too many variables defined for the dimension(s) specified for the array varc.

Until now we have defined the array elements either by providing their names or by using the keywords. What happens when we provide neither? Let's check.
In the program below, an array is defined with a dimension of 2 and no elements in the ARRAY statement. In this case, SAS creates permanent variables named no1 and no2 (the number of variables created equals the dimension of the array) and assigns missing values to them. These variables are also the elements of the array. Since there is no $ sign in the ARRAY statement, the new variables are numeric, and "." represents the missing values of no1 and no2.
data noele;
set sashelp.class (obs=5);
array no(2);
run;
proc print;
run;

Obs     Name      Sex    Age    Height    Weight    no1    no2

1     Alfred      M      14     69.0      112.5     .      .
2     Alice       F      13     56.5       84.0     .      .
3     Barbara     F      13     65.3       98.0     .      .
4     Carol       F      14     62.8      102.5     .      .
5     Henry       M      14     63.5      102.5     .      .

A few functions that work with arrays:
There are two functions that you should be aware of when you are working with arrays: VNAME and DIM.
The VNAME function returns the name of the variable that a particular array element refers to.
The DIM function returns the dimension (number of elements) of the array.
In the program below, the DO loop runs only for the first observation. The DIM function returns the dimension of the ser array, which is 7. On each pass, the VNAME function returns the name of the variable that ser(i) refers to and stores it in name. If you check the log, you will find the variable names as shown below.

data fun;
input id a b c d e f ;
array ser(*) _numeric_;
if _n_=1 then do i=1 to dim(ser);
name=vname(ser(i));
put name @@;
end;
cards;
 1        3281       3413       3114          .          .       3500
 2        4042       3084       3108       3150       3800       3100
 3        6015       6123       6113       6100       6100       6200
;
run;
id a b c d e f

You can also pass a whole array to a DATA step function using the shortcut OF array_name(*). With this form you cannot specify individual indexes; every element of the array is passed to the function.

data fun;
input  a b c d e f ;
array num(*) _numeric_;
dim=dim(num);
sum=sum(of num(*));
cards;
     3281       3413       3114          .          .       3500
     4042       3084       3108       3150       3800       3100
     6015       6123       6113       6100       6100       6200
;
run;
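The OF shortcut is not limited to SUM. As a sketch, the step below applies NMISS, MEAN, and MAX to the same data; the data set name stats and the result variable names are chosen only for this example.

```sas
data stats;
   input a b c d e f;
   array num(*) a b c d e f;
   n_miss  = nmiss(of num(*));   /* count of missing elements, 2 for the first row */
   avg     = mean(of num(*));    /* mean of the nonmissing elements */
   largest = max(of num(*));     /* largest nonmissing element */
   cards;
3281 3413 3114 . . 3500
4042 3084 3108 3150 3800 3100
6015 6123 6113 6100 6100 6200
;
run;
```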

Temporary arrays:
In the following example, which we have already discussed, new variables are created because we gave no information about the elements. If you do not want SAS to create these variables in the output data set, you can use a temporary array. No new variables are created: the elements of a temporary array have no variable names, exist only in memory while the DATA step runs, and never reach the output data set. Their values are also retained automatically from one observation to the next.
data noele;
set sashelp.class (obs=5);
array no(2);
run;
proc print;
run;

Obs     Name      Sex    Age    Height    Weight    no1    no2

1     Alfred      M      14     69.0      112.5     .      .
2     Alice       F      13     56.5       84.0     .      .
3     Barbara     F      13     65.3       98.0     .      .
4     Carol       F      14     62.8      102.5     .      .
5     Henry       M      14     63.5      102.5     .      .

data noele;
set sashelp.class (obs=5);
array no(2) _temporary_;
run;
proc print;
run;

Obs     Name      Sex    Age    Height    Weight

1     Alfred      M      14     69.0      112.5
2     Alice       F      13     56.5       84.0
3     Barbara     F      13     65.3       98.0
4     Carol       F      14     62.8      102.5
5     Henry       M      14     63.5      102.5

Use put no(1) and put no(2) to check the values of these array elements. If you use put _all_ in the above data step, you will see only name, sex, age, height, weight, _error_, and _n_, because the temporary elements are not variables.
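A common use of a temporary array is to hold constants that are needed during the step but do not belong in the output. The sketch below compares each numeric variable with a reference value; the reference values 13, 60, and 100, the array names, and the above_ variables are all invented for illustration.

```sas
data agecheck;
   set sashelp.class (obs=5);
   /* hypothetical reference values, held in memory only */
   array cutoff(3) _temporary_ (13 60 100);
   array varn(3) age height weight;
   array above(3) above_age above_height above_weight;
   do i = 1 to 3;
      above(i) = (varn(i) > cutoff(i));   /* 1 if above the reference, else 0 */
   end;
   drop i;
run;
```

The output data set contains the three above_ flags but nothing from cutoff.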
Assigning initial values:
When you want to assign initial values to the elements of the array, you can do so in the following way.
data noele;
set sashelp.class (obs=5);
array no(2) (1 2);
run;
proc print;
run;

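Remember that initial values are useful only for variables that are not already in the data set, that is, variables you are creating through the ARRAY statement. If the elements are variables read by the SET statement, the values read from each observation overwrite the initial values, so the initial values have no visible effect.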
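Because elements with initial values are retained across observations, an initial value can seed an accumulator. A minimal sketch, with the data set name running and the variable total_weight made up for the example:

```sas
data running;
   set sashelp.class (obs=5);
   array tot(1) total_weight (0);   /* new variable, set to 0 once before the first observation */
   tot(1) = tot(1) + weight;        /* the value carries over to the next observation */
run;
proc print;
run;
```

Here total_weight holds the running total of weight; without the initial value it would start as missing, and the addition would produce missing values throughout.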
