Friday 25 November 2016

Interview Questions

What SAS statements would you code to read an external raw data file to a DATA step?
INFILE statement.

· How do you read in the variables that you need?
Using Input statement with the column pointers like @5/12-17 etc.

· Are you familiar with special input delimiters? How are they used?
DLM and DSD are the delimiters that I’ve used. They should be included in the infile statement. Comma separated values files or CSV files are a common type of file that can be used to read with the DSD option. DSD option treats two delimiters in a row as MISSING value.

DSD also ignores the delimiters enclosed in quotation marks.

· If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
By using the option MISSOVER in the infile statement.If the input of some data lines are shorter than others then we use TRUNCOVER option in the infile statement.

· What is the difference between an informat and a format? Name three informats or formats.

Informats read the data. Format is to write the data.
Informats: comma. dollar. date.
Formats can be same as informatsInformats: MMDDYYw. DATEw. TIMEw. , PERCENTw,Formats: WORDIATE18., weekdatew.

· Name and describe three SAS functions that you have used, if any?

LENGTH: returns the length of an argument not counting the trailing blanks.(missing values have a length of
1)Ex:

a=’my cat’;
x=LENGTH(a);

Result: x=6…

SUBSTR: SUBSTR(arg,position,n) extracts a substring from an argument starting at ‘position’ for ‘n’ characters or until end if no ‘n’.
Ex:

data dsn;
A=’(916)734-6241’;
X=SUBSTR(a,2,3); 
RESULT: x=’916’ ;
run;
TRIM: removes trailing blanks from character expression.
Ex: a=’my ‘; b=’cat’;X= TRIM(a)(b); RESULT: x=’mycat’.

SUM: sum of non missing values.Ex: x=Sum(3,5,1); result: x=9.0

INT: Returns the integer portion of the argument.

· How would you code the criteria to restrict the output to be produced?
Use NOPRINT option.

· What is the purpose of the trailing @ and the @@? How would you use them?
@ holds the value past the data step.@@ holds the value till a input statement or end of the line.

Double trailing @@: When you have multiple observations per line of raw data, we should use double trailing signs (@@) at the end of the INPUT statement. The line hold specifies like a stop sign telling SAS, “stop, hold that line of raw data”.
ex: 
data dsn;

input sex $ days;
cards;
F 53
F 56
F 60
F 60
F 78
F 87
F 102
F 117
F 134
F 160
F 277
M 46
M 52
M 58
M 59
M 77
M 78
M 80
M 81
M 84
M 103
M 114
M 115
M 133
M 134
M 175
M 175
;
run;

The above program can be changed to make the program shorter using @@ .... 

data dsn;
input sex $ days @@;
cards;
F 53 F 56 F 60 F 60 F 78 F 87 F 102 F 117 F 134 F 160 F 277M 46 M 52 M 58 M 59 M 77 M 78 M 80 M 81 M 84 M 103 M 114M 115 M 133 M 134 M 175 M 175
;
run;

Trailing @: By using @ without specifying a column, it is as if you are telling SAS,” stay tuned for more information. Don’t touch that dial”. SAS will hold the line of data until it reaches either the end of the data step or an INPUT statement that does not end with the trailing.

· Under what circumstances would you code a SELECT construct instead of IF statements?
When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using IF-THEN or IF-THEN-ELSE statements because CPU time is reduced.

SELECT GROUP:
Select: begins with select group.When: identifies SAS statements that are executed when a particular condition is true.
Otherwise (optional): specifies a statement to be executed if no WHEN condition is met.
End: ends a SELECT group.

·What statement you code to tell SAS that it is to write to an external file?

.What statement do you code to write the record to the file?
PUT and FILE statements.

· If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?

· If you're not wanting any SAS output from a data step, how would you code the data statement to prevent SAS from producing a set?
Data _Null_

· What is the one statement to set the criteria of data that can be coded in any step?
Options statement: This a part of SAS program and effects all steps that follow it.

· Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself

.· How would you include common or reuse code to be processed along with your statements?
By using SAS Macros.

· When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?

SCAN.· If you have a data set that contains 100 variables, but you need only five of those,

.what is the code to force SAS to use only those variable?
Using KEEP option or statement.

· Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.

Proc sort data=one;
BY State District County ;
Run ;

· How would you delete duplicate observations?
NONUPLICATES

· How would you delete observations with duplicate keys?
NODUPKEY

· How would you code a merge that will keep only the observations that have matches from both sets.
Check the condition by using If statement in the Merge statement while merging datasets.

· How would you code a merge that will write the matches of both to one data set, the non-matches from the left-most data.

Step1: Define 3 datasets in DATA step
Step2: Assign values of IN statement to different variables for 2 datasets
Step3: Check for the condition using IF statement and output the matching to first dataset and no matches to different datasets

Ex:

data xxx;
merge yyy(in = inxxx) zzz (in = inzzz);
by aaa;
if inxxx = 1 and inyyy = 1;
run;



· What is the Program Data Vector (PDV)? What are its functions?
Function: To store the current obs;PDV (Program Data Vector) is a logical area in memory where SAS creates a dataset one observation at a time. When SAS processes a data step it has two phases. Compilation phase and execution phase. During the compilation phase the input buffer is created to hold a record from external file. After input buffer is created the PDV is created. The PDV is the area of memory where SAS builds dataset, one observation at a time. The PDV contains two automatic variables _N_ and _ERROR_.


The Logical Program Data Vector (PDV) is a set of buffers that includes all variables referenced either explicitly or implicitly in the DATA step. It is created at compile time, then used at execution time as the location where the working values of variables are stored as they are processed by the DATA step program(source: http://www2.sas.com/proceedings/sugi24/Posters/p235-24.pdf).



· Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
SAS compiles the code· At compile time when a SAS data set is read, what items are created?Automatic variables are created. Input Buffer, PDV and Descriptor Information

· Name statements that are recognized at compile time only?
PUT

· Name statements that are execution only.
INFILE, INPUT·

.Identify statements whose placement in the DATA step is critical.
DATA, INPUT, RUN.

· Name statements that function at both compile and execution time.
INPUT

· In the flow of DATA step processing, what is the first action in a typical DATA Step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

· What is _n_?
It is a Data counter variable in SAS.
Note: Both -N- and _ERROR_ variables are always available to you in the data step

.–N- indicates the number of times SAS has looped through the data step.This is not necessarily equal to the observation number, since a simple sub setting IF statement can change the relationship between Observation number and the number of iterations of the data step.

The –ERROR- variable ha a value of 1 if there is a error in the data for that observation and 0 if it is not. Ex: This is nothing but a implicit variable created by SAS during data processing. It gives the total number of records SAS has iterated in a dataset. It is Available only for data step and not for PROCS. Eg. If we want to find every third record in a Dataset thenwe can use the _n_ as follows

Data new-sas-data-set;
Set old;
if mod(_n_,3)= 1 then;
run;

Note: If we use a where clause to subset the _n_ will not yield the required result.


How do i convert a numeric variable to a character variable?
You must create a differently-named variable using the PUT function.

How do i convert a character variable to a numeric variable?
You must create a differently-named variable using the INPUT function.

How can I compute the age of something?
Given two sas date variables born and calc:

age = int(intck('month',born,calc) / 12); 
if month(born) = month(calc) then age = age - (day(born) > day(calc));

How can I compute the number of months between two dates? 
Given two sas date variables begin and end:

months = intck('month',begin,end) - (day(end) <>

How can I determine the position of the nth word within a character string?

Use a combination of the INDEXW and SCAN functions:pos = indexw(string,scan(string,n));



I need to reorder characters within a string...use SUBSTR?
You can do this using only one function call with TRANSLATE versus two functions calls with SUBSTR. The following lines each move the first character of a 4-character string to the last:

reorder = translate('2341',string,'1234');
reorder = substr(string,2,3) substr(string,1,1);
How can I put my sas date variable so that December 25, 1995 would appear as '19951225'? (with no separator) 

use a combination of the YEAR. and MMDDYY. formats to simply display the value: 
put sasdate year4. sasdate mmddyy4.; 

or use a combination of the PUT and COMPRESS functions to store the value: 
newvar = compress(put(sasdate,yymmdd10.),'/'); 

How can I put my sas time variable with a leading zero for hours 1-9? 
Use a combination of the Z. and MMSS. formats: 
hrprint = hour(sastime); 
put hrprint z2. ':' sastime mmss5.;

INFILE OPTIONS
Prepared by Sreeja E V(sreeja@kreara.comsource: kreara.blogspot.com.
Infile has a number of options available.



FLOWOVER
FLOWOVER is the default option on INFILE statement. Here, when the INPUT statement reaches the end of non-blank characters without having filled all variables, a new line is read into the Input Buffer and INPUT attempts to fill the rest of the variables starting from column one. The next time an INPUT statement is executed, a new line is brought into the Input Buffer.
Consider the following text file containing three variables id, type and amount.
11101 A
11102 A 100
11103 B 43
11104 C
11105 C 67

The following SAS code uses the flowover option which reads the next non missing values for missing variables.

data B;
infile "External fileflowover;
input id $ type $ amount;
run;

which creates the following dataset

MISSOVERWhen INPUT reads a short line, MISSOVER option on INFILE statement does not allow it to move to the next line. MISSOVER option sets all the variables without values to missing.
data B;
infile "External filemissover;
input id $ type $ amount;
run;

which creates the following dataset

TRUNCOVER
Causes the INPUT statement to read variable-length records where some records are shorter than the INPUT statement expects. Variables which are not assigned values are set to missing.
Difference between TRUNCOVER and MISSOVER

Both will assign missing values to variables if the data line ends before the variable’s field starts. But when the data line ends in the middle of a variable field, TRUNCOVER will take as much as is there, whereas MISSOVER will assign the variable a missing value.
Consider the text file below containing a character variable chr.
a
bb
ccc
dddd
eeeee
ffffff
Consider the following SAS code
data trun;
infile "External filetruncover;
input chr $3. ;
run;

When using truncover option we get the following dataset
data miss;
infile "External filemissover;
input chr $3. ;
run;
While using missover option we get the output

Thursday 24 November 2016

Extract Nth highest salary from SALARY table using Proc SQL:

Extract Nth highest salary from SALARY table using Proc SQL:


Data salary;
input sal;
cards;
23
12
.
45
0
54
-21
43
;
run;



%let N=3;
proc sql;
select a.sal from salary a where &N= (select count(distinct sal) from salary b where a.sal<=b.sal);
quit;

OUTPUT:
43

Wednesday 23 November 2016

How to get any kind of data in SAS? Is it possible to take data from notepad in SAS?

How to get any kind of data in SAS? Is it possible to take  data from notepad in SAS?


Sample Data
---------------------------------------------------------------------

DATA sample_accounts;
INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22
CreditLimit 25-29;
CARDS;
1234670 11-Sep-04 Z 2000
1234671 12-Sep-04 3000
1234672 13-Sep-04 Z 2500
1234673 14-Sep-04 T 3200
1234674 15-Sep-04 8000
run;


method1
--------------------------
Data sample_accounts;
INFILE "C:\Mydata\sample_accounts.txt";
INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22
CreditLimit 25-29;
RUN;


method 2
-----------------------------------
filename sacc "D:\Mydata\sample_accounts.txt";
Data sample_accounts;
INFILE sacc ;
INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22
CreditLimit 25-29;
RUN;

method 3;
-----------------------------------use proc import:


method 4:
--------------------------use proc cimport for xpt files into sas

SAS Update Statement - Combining Datasets

Update Statement : 
Many applications in real world require to update master datasets with transaction data set; 
For example when we need to update the original mailing list with changed addresses or one may need update list of master inventories.

This is very simple using SAS UPDATE statement. With update statement SAS reads observation from transaction dataset and finds out corresponding record in master dataset using key specified in by group variables. Then it updates master dataset with new values in transaction dataset

Important
1. Master datasets wont get updated if corresponding columns in transaction dataset are missing;
it gets update with non missing values only.
2. If there is new record in transaction dataset which don’t find corresponding records in master dataset in that case new record is added into the master dataset.
3.Master datset should not have duplicates with specified key group. If SAS finds duplicates then we get warning in the log and the changes get applied only to 1st obs in that by group.
4.Both datasets need to be sorted already. 

PROC DATASETS (copy, move, kill, save, modify, rename, delete datasets)

PROC DATASETS (copy, move, kill, save, modify, rename, delete datasets)


PROC DATASETS is one of the important procedures in SAS. 
It is very much useful in managing SAS datasets in bulk without actually changing the data .
Its usefulness lies in that unlike data step it wont iterate the observations instead will work directly on the metadata to do the changes.
Hence it is much more efficient and powerful than most of its counter parts. 

With the following discussion we will look at each of the important tasks which can be carried out using PROC DATASETS. 

Here we are going to learn following topics
  1. CONTENTS in PROC DATASETS to view the contents of the Library.
  2. MODIFY / CHANGE / RENAME /FORMAT in PROC DATASETS to modify the SAS Dataset properties.
  3. APEEND in PROC DATASETS for appending datasets to each other.
  4. COPY and MOVE in PROC DATASETS for copying or moving datasets between libraries
  5. DELETE in PROC DATASETS for deleting some datasets from the library
  6. KILL in PROC DATASETS  for deleting all the datasets from the library

SAS Interleave - Combining Datasets using BY statement

SAS Interleave - Combining Datasets using BY statement


1. INTERLEAVING :
Interleaving combines individual sorted datasets into one sorted data set by specified variable in the By Statement
Before we interleave any SAS datasets they must be sorted by the same variable If the datasets are already indexed there is no need to sort them 


SAS Merge Statement - Combining Datasets

2.   MERGING :
Merging is nothing but combining two or more SAS datasets horizontally
In one to one merging dataset are not required to be sorted
But for match merging sorted data sets are mandatory.
We can use MERGE statement for this.

Two types of merging can be done
a. One to One Merge
b. Match Merge

While merging data sets SAS writes one new observation for each observation in the data set with the largest number of observation in the by group.
In match merging if the relation is one to many we get different results than what we would get with sql join. This has been illustrated with following example.







SAS MERGE code requires that input datasets must be already sorted with variables listed in BY statement. However PROC SQL dont need this; and thus being advantageous.
 
But as we can see below there is significant difference between the outputs produced by these methods. So one must take precautions while using MERGE statement with many to many relationship.



Happy Coding......

ref URL: https://sites.google.com/site/sasbuddy/sas-merge-statement


SAS-Importing and Exporting from Tabs of Excel

SAS- Importing and Exporting from Tabs of Excel

TO AND FRO FROM TABS OF AN EXCEL SHEET
Importing/Exporting from/into Tabs of an Excel sheet can be very useful. There can be many situations where this becomes absolutely essential, think of a case when you have a SAS dataset which has more than 65,536 rows (limit of Excel 2003) and all you want to do is export the entire SAS dataset into one excel workbook then this becomes very useful.
Let us discuss the case described above, first we divide the Master Dataset into Datasets with each having Rows< 65,536

[35.jpg]



DATA EXPORT1 EXPORT2 EXPORT3;
SET SAMPLE;
IF _n_ <= 65,536 THEN OUTPUT EXPORT1;
ELSE IF 65,536< _n_ <= 100,000 THEN OUTPUT EXPORT2;
ELSE OUTPUT EXPORT3;
RUN;


Now that we have three datasets we export it into three tabs of an excel workbook.

1) PROC EXPORT DATA = EXPORT1
            OUTFILE = "C:\Documents and Settings\analyst\Desktop\Exported_From_SAS.xls"
            DBMS = EXCEL REPLACE;
            SHEET = "EXPORT1";
RUN;


2) PROC EXPORT DATA = EXPORT2
            OUTFILE = "C:\Documents and Settings\analyst\Desktop\Exported_From_SAS.xls"
            DBMS = EXCEL REPLACE;
            SHEET = "EXPORT2";
RUN;


3) PROC EXPORT DATA = EXPORT3
            OUTFILE = "C:\Documents and Settings\analyst\Desktop\Exported_From_SAS.xls"
            DBMS = EXCEL REPLACE;
            SHEET = "EXPORT3";
RUN;


This would Export the datasets into the tabs of the Excel workbook, and of course you can increase or decrease the number of Tabs or Datasets, or have a row distribution as per your choice.

IMPORTING FROM EXCEL TABS:

Next we import from the tabs of an Excel workbook. Here I have used a macro; I could have used similar looking multiple codes like above but then we also need to optimize stuff.
So the macro imports the tabs one by one and then appends it to one data set. The Proc Delete Statement is because if you run the step twice for some reason the new created datasets will get appended to the dataset created in the previous run. There is this one small issue with proc append, the dataset does not get replaced but it gets appended. 

So the Macro imports 5 tabs of the Excel workbook (you can change this)

%MACRO IMPORT_EXCEL();
%DO i=1 %TO 5 ;
PROC DELETE DATA =OUT.FINALSET;
RUN;
PROC IMPORT OUT = OUT.IMPORT_&i.
DATAFILE= "C:\Documents and Settings\analyst\Desktop\Imported_Into_SAS.xls"
DBMS=EXCEL REPLACE; 
SHEET="SHEET &i.$"
GETNAMES=YES; MIXED=NO; SCANTEXT=YES;USEDATE=YES; SCANTIME=YES;
RUN;
PROC APPEND BASE = OUT.IMPORTED_&QTR. DATA = OUT.&QTR._&&I. FORCE;
RUN;
%END;
%MEND;


URL Referred: http://www.analytics-tools.com/search/label/SAS-_N_