Friday 2 December 2016

Base SAS/SQL Tricky Questions

1) Question: Dataset calc is given as :
data calc;
input id marks;
cards;
1 97
2 95
3 45
4 54
5 45
6 48
7 47
8 47
9 45
10 58
;
run;
Find the sum of all marks from id 3 to id 8
1. Using Data Step
2. Using Proc SQL
3. Using proc means
4. Using Proc report

Answer:
Using data Step
data tot;
set calc;
if 3<=id<=8 then
total+marks;
if id=8 then put total=;
run;
Another method using data step:
data tot;
set calc(firstobs=3 obs=8);
total+marks;
run;
Using Proc SQL:
proc sql;
select sum(marks) from mydata where id ge 3 and id le 8;
run;
Using Proc means
option firstobs=3 obs=8;
proc means data = calc sum;
var number;
run;
Using Proc Report
option firstobs=3 obs=8;
proc report data = calc nowd;
column marks;
run;
2) Question:
Data set second highest is given as
data secondhig;
input Name $ Number;
cards;
sandeep 98
kuldeep 49
arvind 58
gajender 97
amit 69
ankit 97
abhishek 70
;
run;
Select those names having second highest numbers such that the output should be like:
Name Number
gajender 97
ankit 97
Answer:
proc sql;
select name, number from secondhig where number in (select max(Number) as second from secondhig where number<(select max(Number)from secondhig));
quit;
3) Question
data countobser;
input name $ grade $;
cards;
Ram A
Raj B
Rahim A
Manu C
Karan C
Praveen A
Pradeep A
Ravi B
;
run;
Create a data set that contains number of times A grade is repeated, B Grade is repeated and so on..
Answer:
proc freq data = countobser noprint;
tables grade/out=counter(drop=percent);
run;



4) Question
In the above data set, create a macro varible that counts the number of observations in the dataset
Using Data Step
Using SQL Query
Also print the result in SAS log
Answer:
data _null_;
set countobser end=eof;
count+1;
if eof then call symput(‘macvar’, count);
run;
%put &macvar;
proc sql;
select count(*) into :macvar1 from countobser;
quit;
5) Question:
Find the mode using proc freq
data a;
input age @@;
datalines;
22 32 32 32 43 23 24 56 45 54 28 29 27 26 25 28 30 20 18 37 36 47 46
56 19 20
;
run;
Answer:
proc freq data = a noprint;
tables age/out=datam(drop=percent);
run;
from the dataset datam, it can be seen that 32 is repeated maximum number of times(3), thus the mode age is 32.
6) Question: Dataset schoolx is given as :
data schoolx;
input Name $ Class $ Marks;
cards;
AAA X 20
AAA Y 30
AAA Z 40
AAA X 50
BBB X 60
BBB Y 70
BBB X 20
CCC X 10
CCC Y 40
CCC Z 50
CCC Z 30
;
run;
Generate the following output from the above dataset:
AAA X 70
AAA Y 30
AAA Z 40
BBB X 80
BBB Y 70
CCC X 10
CCC Y 40
CCC Z 80
Answer:
proc sort data = schoolx;
by Name class;
run;
data schoolx(drop=marks);
set mydat;
by name class;
if first.class then sum = 0;
sum+marks;
if last.class then output;
run;
7) Question:
Dataset sales is given as:
store dept quarter sales
101 10 1 110001.5
101 10 2 113101.2
101 10 3 111932.15
101 10 4 99901.1
101 20 1 110002.36
101 20 2 99922.39
101 20 3 98832.98
101 20 4 110101.7
121 20 1 121947.1
121 20 2 119964.69
121 20 3 122136.28
121 20 4 120111.11
121 10 1 127192.92
121 10 2 125280.13
121 10 3 128203.56
121 10 4 123632.29
109 10 1 120422.77
109 10 2 123984.32
109 10 3 121801.29
109 10 4 122125.66
109 30 1 98310.13
109 30 2 97331.25
109 30 3 96386.28
109 30 4 98511.9
109 20 1 115239.09
109 20 2 113001.98
109 20 3 114234.32
109 20 4 114122.65
Create a dataset having sum of sales by different stores.Also format the sales from different depts.
Answer
proc sort data = mydat;
by store;
run;
data sumsales;
set mydat;
by store;
if first.stote then sum=0;
sum+sales;
if last.store;
drop dept quarter sales;
format sum dollar13.2;
run;
8) Question:
Dataset student1 is given as:
DATA student1;
INPUT id $ quiz score;
CARDS;
001 1 6
002 1 5
003 1 7
003 2 4
005 1 8
004 1 .
004 2 .
001 2 8
001 3 6
005 2 .
002 2 9
003 3 3
005 3 7
002 3 6
004 3 .
;run;
Using the concept of first.var and last.var, find the quiz(out of the three) for which score of a particular student is maximum.
The final dataset should be like:
id score
001 8
002 9
003 7
and so on..
Answer:
PROC SORT DATA=student1;
BY id quiz;
RUN;
DATA student2;
SET student1;
BY id;
RETAIN highest;
IF FIRST.id THEN highest=.;
highest = MAX(highest, score);
drop quiz score;
if last.id;
RUN;
9) Question:
data work.test;
First = ‘lpswich, England’; /* length of var first is 16 */
City = substr(First,1,7);
run;
What is the length of variable City in output dataset?
A. 7
B. 8
C. 16
D. Missing Value
Answer: : C. 16
10) Question: I have a dataset misval. I want to replace all the missing values with the mean value of that variable.
data misval ;
input var1-var5;
datalines;
1 . 1 6 6
2 5 2 7 7
3 5 . 8 8
4 5 4 9 9
5 5 5 . 10
;
Answer: Use proc standard + replace option
proc standard data = misval out = misval2 replace;
run;
11) Question:
I want to count how many times day is repeated in the string “good good study, day day up”

Answer:
data a;
x=”good good study, day day up”;
y=count(x,’day’);
run;


12) Question: 
How to put leading zeros in front of numbers having len less then 9?
data mydat;
input prsnno;
cards;
1234
12345
123456
1234567
12345678
123456789
;
run;
Answer:
data mydat1;
set mydat;
length psn_char $9.;
psn_char = put(prsnno,z9.);
run;
13) Question:
How to put leading zeros in front of prsnno(char var) having len less then 9?
data mydat;
length prsnno $9;
input prsnno $ ;
cards;
1234
12345
123456
1234567
12345678
123456789
;
run;
Answer:
data mydat2;
set mydat;
format y z9.;
y=prsnno*1;
run;

No comments:

Post a Comment