Wednesday, 14 December 2016

Some basic Questions and Answers in SAS

Question 1:-
What is LRECL= option?
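SAS assumes a default record length for external files. In older releases this default is 256 (SAS 9.4 raised the default of the LRECL= system option to 32767). If your data lines are longer than the record length in effect, SAS truncates them and does not read the data correctly. To tell SAS the record length of the external file, use the LRECL= option in the INFILE statement.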
Ex: Infile 'c:\data\class.txt' LRECL=1500;
 
Question 2:-
What are the limitations of List Input Style (Free Formatted Input Style)?
The limitations are as follows:
1.  All the data in a record must be read; unwanted data cannot be skipped.
2.  The data can’t have embedded spaces.
3.  Data that requires special treatment (informats, lengths etc.) can’t be read.
4.  Character data longer than 8 characters can’t be read in full (it is truncated to the default length of 8).
5.  Missing values must be explicitly indicated with a period.
Question 3:-
What are the advantages of Column Input Style over List Input Style?
1.      The Character data can have embedded blanks.
2.      Character data of length greater than 8 characters can be read.
3.      Unwanted data can be skipped.
4.      Missing Values can be left blank.
Question 4:-
Observe the following program. What is wrong with the program? The output will be as below.
Obs    name                     code        year        amt
 1     Yellowstone              ID/MT/WY    1872    4065493
 2     Everglades               FL          1934         13
 3     Yosemite                 CA          1864          7
 4     Great Smoky Mountains    NC/TN       1926        520
 5     Wolf Trap Farm           VA          1966          .
data abc;
input name$ 1-21 code$ year amt comma10.;
cards;
Yellowstone           ID/MT/WY 1872  4,065,493
Everglades            FL 1934        1,398,800
Yosemite              CA 1864          760,917
Great Smoky Mountains NC/TN 1926       520,269
Wolf Trap Farm        VA 1966              130
;
run;
proc print;
run;
When SAS reads the data, it uses a pointer to mark its place, and it moves this pointer differently for each input style. For list input, SAS scans to the next nonblank character and reads from there. For column input, SAS reads exactly the columns you specify. For formatted input, SAS starts reading wherever the pointer currently is. In the above example amt is read with formatted input, so SAS starts reading right after the year value and reads exactly 10 columns, wherever the number actually sits. Hence the values of amt are wrong for every line except the first. To fix this, position the pointer with a column pointer control (@n) before amt, or use the colon modifier with the informat.
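One possible correction, as a sketch rather than the only fix: the colon modifier tells SAS to skip to the next nonblank column and then apply the informat up to the next blank, so the position of amt on each line no longer matters.

```sas
data abc;
    /* ":" makes SAS scan to the next nonblank column before applying comma10. */
    input name $ 1-21 code $ year amt : comma10.;
    cards;
Yellowstone           ID/MT/WY 1872  4,065,493
Everglades            FL 1934        1,398,800
Yosemite              CA 1864          760,917
Great Smoky Mountains NC/TN 1926       520,269
Wolf Trap Farm        VA 1966              130
;
run;

proc print data=abc;
run;
```

With this change amt should hold the full values (4065493, 1398800, 760917, 520269 and 130).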
 
Question 5:-
What are / and #n?
/ and #n are line pointer controls. / tells SAS to go to the next data line. #n tells SAS to go to the nth line within the observation, so it can move backwards as well as forwards.
 
Question 6:-
What is the output for the following program?
data abc;
input city$ State$
      // i1 i2
              #2 i3 i4;
cards;
Nome AK
55 44
88 29
Miami FL
90 75
97 65
;
proc print;
run;
SAS reads city and state from the first data line. The two slashes (//) move the pointer down two lines, so i1 and i2 are read from the 3rd data line. Because #n can move backwards as well as forwards, #2 then takes SAS back to the 2nd line within the observation, where it reads i3 and i4.
So the i1 values will be 88 and 97, the i2 values 29 and 65, the i3 values 55 and 90, and the i4 values 44 and 75.
  





Question 7:-
What is the difference between single trailing @ and double trailing @@?
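Both are line-hold specifiers. The difference is in how long they hold the line. A single trailing @ holds the line for the next INPUT statement in the same iteration of the data step; SAS releases it when it starts building a new observation or encounters an INPUT statement without a trailing @.
A double trailing @@ holds the line across iterations of the data step; SAS releases it when the pointer moves past the end of the data line or when it encounters an INPUT statement without a line-hold specifier.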
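A minimal sketch of the double trailing @@, using made-up names and scores: each data line carries several observations, and @@ keeps the line available across iterations of the data step.

```sas
data scores;
    /* @@ holds the current line so the next iteration continues reading from it */
    input name $ score @@;
    datalines;
Ann 90 Bob 85 Cy 77
Dee 88
;
run;

proc print data=scores;
run;
```

This should produce four observations from two data lines. With a single trailing @ instead, the line would be released at the end of each iteration and only the first pair on each line would be read.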
  


Question 8:-
What is the difference between Missover and Truncover?
Truncover is needed when you use column or formatted input and some data lines are shorter than others.
Both options assign missing values to variables if the data line ends before the variable’s field starts. But when the data line ends in the middle of a variable’s field, Truncover takes as much data as is available, whereas Missover assigns a missing value.
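The difference can be seen with a short record, sketched below. The fileref demo and the data set names are hypothetical; a temporary external file is used because in-stream data lines are padded with blanks, which would hide the effect.

```sas
filename demo temp;   /* hypothetical fileref pointing to a temporary file */

/* write two records: one shorter than the 5-column field, one full length */
data _null_;
    file demo;
    put 'AB';
    put 'ABCDE';
run;

data with_missover;
    infile demo missover;
    input code $5.;    /* first record ends mid-field: code is set to missing */
run;

data with_truncover;
    infile demo truncover;
    input code $5.;    /* first record ends mid-field: code keeps 'AB' */
run;

proc print data=with_missover;
run;
proc print data=with_truncover;
run;
```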
 
Question 9:-
What is the purpose of using DSD option?
Use the DSD option when the data values contain embedded delimiters (inside quotation marks) and when two consecutive delimiters indicate a missing value.
DSD does three things (it also makes the comma the default delimiter):
1.      It ignores delimiters that appear inside quotation marks.
2.      It does not read the quotation marks as part of the data.
3.      It treats two consecutive delimiters as a missing value.
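The three behaviours above can be sketched with a small comma-delimited example. The data set name, variable names and values are invented for illustration.

```sas
data dsd_demo;
    infile datalines dsd;               /* DSD: the comma becomes the default delimiter */
    input name : $20. city : $20. score;
    datalines;
"Smith, John",Boston,90
Lee,,85
;
run;

proc print data=dsd_demo;
run;
```

In the first record the comma inside the quotation marks stays part of name and the quotation marks themselves are dropped; in the second record the two consecutive commas leave city missing.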
Question 10:-
Where will the dataset get stored in the following program?
data 'tree';
x=10;
y=20;
run;
In directory-based operating environments (Windows, for example), a quoted name in the DATA statement is treated as a physical file name. If you leave out the path, SAS uses the current working directory (in the SAS windowing environment the path is shown in the status bar at the bottom of the window). In the given program, SAS stores the dataset tree as a permanent file (tree.sas7bdat) in the current working directory.
Question 11:-
Why is SAS called Self Documenting Software?
Whenever a data step is executed, SAS automatically stores a descriptor portion with the data set. The descriptor portion includes:
1.      Data set name
2.      Number of Observations
3.      Number of variables
4.      Date on which that data set is created
5.      Attributes for each variable (Type, Length, Format, Informat, Label).
Question 12:-
How can we see the descriptor portion of a dataset?
We can see the descriptor portion of a dataset by using proc datasets or proc contents.
proc datasets lib=sashelp;
            contents data=class;
run;
quit;
proc contents data=sashelp.class;
run;

The NODS option is meant to be used together with DATA=_ALL_: it lists just the names of the datasets in the library and suppresses the details of each individual dataset.
 
Question 13:-
What is the purpose of using VARNUM in Proc contents?
Proc Contents lists the variables in the alphabetical order. To tell SAS to list the variables in the order of their occurrence in the datasets, VARNUM option is used.
proc contents data=sashelp.class VARNUM;
run;
Question 14:-
What are the naming conventions of SAS names?
1.      The names should start with a letter or underscore.
2.      The names should contain only letters, numbers, and underscores.
3.      The names should be 32 characters long or fewer.
Question 15:-
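What are the results of the following expressions? Why do the results differ?
Assignment statement          Result
x = 10 * 4 + 3 ** 2;          49
x = 10 * (4 + 3 ** 2);        130
SAS follows the standard mathematical rules of precedence (BODMAS). It evaluates expressions in parentheses first, then exponentiation, then multiplication and division, then addition and subtraction. In the first statement 3 ** 2 = 9 and 10 * 4 = 40, giving 49. In the second statement the parentheses force 4 + 9 = 13 to be evaluated before the multiplication, giving 130.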
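A quick way to check the two expressions is to write them to the log, as in this sketch (the names x1 and x2 are used here only to keep both results):

```sas
data _null_;
    x1 = 10 * 4 + 3 ** 2;      /* 3 ** 2 = 9, 10 * 4 = 40, so 40 + 9 = 49 */
    x2 = 10 * (4 + 3 ** 2);    /* parentheses first: 4 + 9 = 13, then 10 * 13 = 130 */
    put x1= x2=;               /* writes x1=49 x2=130 to the log */
run;
```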
 
Question 16:-
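Is it possible in SAS to subset your data using an IF statement and a single trailing @?
Yes. The single trailing @ holds the data line after the first variable is read, a subsetting IF statement tests that value, and the remaining variables are read only for the lines that pass the test. In the program below only the four lines where a=1 become observations.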
data abc;
input a @;
if a=1;
input b c;
cards;
1 2 3
1 2 3
4 5 6
7 8 9
1 5 9
1 6 7
;
proc print;
run;
Question 17:-
What is the significance of January 1, 1960 in SAS?
SAS has only two data types, Numeric and Character. SAS stores a date value as a number: the number of days between January 1, 1960 and the given date. January 1, 1960 itself is stored as 0, and earlier dates are stored as negative numbers.
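As a small illustration, the sketch below writes the stored values of three date constants to the log (the variable names are arbitrary):

```sas
data _null_;
    d0 = '01JAN1960'd;    /* the reference date, stored as 0 */
    d1 = '14DEC2016'd;    /* 20802 days after January 1, 1960 */
    d2 = '31DEC1959'd;    /* one day before the reference date, stored as -1 */
    put d0= d1= d2=;      /* writes d0=0 d1=20802 d2=-1 */
    put d1= date9.;       /* the same number shown with a date format: d1=14DEC2016 */
run;
```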
 
Question 18:-
What is the significance of the year 1920 in SAS?
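SAS interprets a two-digit year according to the YEARCUTOFF= system option, which sets the first year of a 100-year window. The default was 1920 in older releases (later releases ship with a later default, so check your installation with PROC OPTIONS). With YEARCUTOFF=1920, SAS uses the 100-year block from 1920 to 2019, so a two-digit year of 20 is read as 1920 and 19 is read as 2019. The value can be changed: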
options yearcutoff=1950;
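The effect of the option can be sketched by reading two dates with two-digit years. The date strings are invented for illustration, and note that the OPTIONS statement changes the setting for the rest of the session.

```sas
options yearcutoff=1950;          /* the 100-year window is now 1950 to 2049 */

data _null_;
    d1 = input('01/01/49', mmddyy8.);   /* 49 falls at the end of the window: 2049 */
    d2 = input('01/01/50', mmddyy8.);   /* 50 is the first year of the window: 1950 */
    put d1= date9. d2= date9.;          /* writes d1=01JAN2049 d2=01JAN1950 */
run;
```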
 
Question 19:-
What is the result of the following data step?
data abc;
set sashelp.class;
array numeric(4);
run;
proc print;
run;
An ARRAY statement groups variables of the same type and can also create them. In the given program the array is named numeric and has four elements; because no such variables exist yet, SAS creates four numeric variables named numeric1, numeric2, numeric3 and numeric4, each with the default length of 8 bytes. They are never assigned a value, so they are missing in every observation, alongside the variables read from sashelp.class.


Question 20:-
What will the variable B contain after each of the following function calls?
B=index('I AM THERE AS THE KING', 'HER');
B=indexw('I AM THERE AS THE KING', 'HER');
INDEX returns 7, whereas INDEXW returns 0.
INDEX searches for the character string anywhere in the text, and HER starts at position 7, inside THERE. INDEXW matches whole words only; there is no word HER in the sentence, so it returns 0. Both functions are case-sensitive, so the search string must be in uppercase here: searching for 'Her' would return 0 from both.
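The sketch below compares the two functions on the same sentence. INDEX and INDEXW are both case-sensitive, so the case of the search string matters; the variable names are arbitrary.

```sas
data _null_;
    s  = 'I AM THERE AS THE KING';
    b1 = index(s, 'HER');     /* 7: HER is found inside THERE */
    b2 = indexw(s, 'HER');    /* 0: HER is not a whole word in the sentence */
    b3 = indexw(s, 'THE');    /* 15: the whole word THE starts at position 15 */
    b4 = index(s, 'Her');     /* 0: mixed case does not match the uppercase text */
    put b1= b2= b3= b4=;
run;
```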
Question 21:-
Write the result for the following statement.
B2=CATX('A','B','C','D');
The CATX function treats its first argument as the separator. It concatenates B, C and D with A in between, so the result will be BACAD.
 
Question 22:-
What is the result for the following program?
Data four;
x=200;
If x le 100 then grade='low';
else if x ge 200 then grade='high';
else grade='medium';
Run;
During compilation, SAS creates the variables and assigns their attributes. Since grade first appears with the value 'low', SAS assigns it a length of 3. Hence x will be 200 and grade will be 'hig', because 'high' is truncated to three characters.
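A common way to avoid the truncation, sketched here, is to declare the length of grade before its first assignment. The length of 6 is chosen to fit the longest value, 'medium'.

```sas
data four;
    length grade $ 6;     /* set the length before the first assignment */
    x = 200;
    if x le 100 then grade = 'low';
    else if x ge 200 then grade = 'high';
    else grade = 'medium';
    put x= grade=;        /* writes x=200 grade=high to the log */
run;
```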
Question 23:-
How many times “Better Code” will be printed with the following program?
Data five;
Do I=1 to 5;
            Do J=5 to 10;
            Put 'BETTER CODE';
            End;
End;
Run;
For each of the 5 values of I, the inner loop runs 6 times (J = 5 to 10), so “BETTER CODE” is written to the log 5 x 6 = 30 times.
Question 24:-

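What are the default values for the options Linesize and Pagesize?
The defaults depend on the operating environment and on how SAS is run, so check your own session with PROC OPTIONS. Commonly quoted values are:
Linesize=102
Pagesize=56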
Question 25:-
If you do not want to produce any dataset, how would you code the data statement to prevent SAS from producing one?
By writing data _null_; we can instruct SAS not to produce any dataset.
Question 26:-
What is the output for the following statement?
K=translate('(080)-888 544 54','[]','()');
Translate function in the given statement replaces all ( with [ and all ) with ]. Hence the output will be as follows:
[080]-888 544 54
Question 27:-
Can we use _N_ in a Where statement?
No. The Where statement selects observations before they are brought into the PDV, so it can refer only to variables that already exist in the input dataset. Automatic variables such as _N_, and new variables created in the data step with assignment statements, cannot be used in a Where statement. A Where statement also cannot be used when reading raw data, whether from an external file or from in-stream data lines; it works only with existing SAS datasets.
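As a sketch of the usual alternative, a subsetting IF statement can test _N_ because it runs after the observation has been read into the PDV, while WHERE is limited to variables of the input dataset. The output dataset names below are hypothetical.

```sas
/* subsetting IF: _N_ is available, so this keeps the first five observations */
data first_five;
    set sashelp.class;
    if _n_ le 5;
run;

/* WHERE: works with variables that exist in the input dataset, such as age */
data teens;
    set sashelp.class;
    where age ge 13;
run;
```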
Question 28:-
What are _N_ and _ERROR_?
_N_ and _ERROR_ are automatic variables that SAS creates for every data step. They exist only while the step executes and are not written to the output dataset. _N_ counts the number of times the data step has iterated, and _ERROR_ is set to 1 when a data error occurs while processing the current observation (otherwise it is 0).
Question 29:-
What is the difference between the following statements?
input C1-C4;
input C1 a b d C4;
put C1--C4;
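The first statement uses a numbered range list (single hyphen): SAS creates the variables C1, C2, C3 and C4.
The second statement names five variables individually, and the PUT statement that follows it uses a name range list (double hyphen): SAS prints all the variables from C1 to C4 in the order they were defined, i.e. C1, a, b, d and C4.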
Question 30:-
What is the difference between PUT function and PUT statement?
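The PUT statement writes values or text to a destination such as the log or an external file. The PUT function applies a format to a value and returns the result as a character string, so it is commonly used to convert Numeric data into Character data.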
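The two can be seen side by side in the sketch below: the PUT function builds character values from numeric ones, and the PUT statement writes them to the log. The variable names and the choice of formats are illustrative.

```sas
data _null_;
    n  = 1234.5;
    c  = put(n, comma8.1);     /* PUT function: numeric to character using comma8.1 */
    d  = '14DEC2016'd;
    dc = put(d, date9.);       /* PUT function: date value to the text 14DEC2016 */
    put c= dc=;                /* PUT statement: writes both values to the log */
run;
```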
