Wednesday, February 9, 2011


1. What is the difference between Merge and Proc SQL ?
The resultant dataset depends on the input datasets. In case of one to one and one to many  both work
similarly,i.e. the resultant dataset is same. But differs in case of many to many and non matching datasets:
many to many:
ex:                                     (merge on x)         (proc sql)
X  Y              X    Z           X  Y  Z                X  Y  Z    
-----             ------            -------                 -------
1  A              1    F            1  A  F                 1  A  F
1  C              1    R --->      1  C  R                 1  A  R
2  B              2    G            2  B  G                1  C  F
                                                                   1  C  R
                                                                    2  B  G


NON MATCHING DATA:

ex:                                       (merge on x)    (proc sql)
X  Y              X    Z              X  Y  Z          X  Y  Z    
-----             ------                -------           -------
1  A              1    F            1  A  F           1  A  F
2  B              3    T    --->    2  B  .            3  C  T
3  C              4    G            3  C  T    
                                         4  .  G

2. Difference between Proc Means & Proc Summary
Ans:
a)  Proc Summary defaults to NOPRINT
     Proc Means defaults to PRINT

b) If you omit the VAR statement, then PROC SUMMARY produces a simple count of observations, whereas PROC MEANS tries to analyze all the numeric variables that are not listed in the other statements

c) If you specify statistics on the PROC SUMMARY statement and the VAR statement is omitted, then PROC SUMMARY stops processing and an error message is written to the SAS log.

d) If you omit the VAR statement, then PROC MEANS analyzes all numeric variables that are not listed in the other statements. When all variables are character variables, PROC MEANS produces a simple count of observations.

3. Check the below available options:

1) dash (-): Using a dash between two variable names will create an ordered list of variables.   
   So KEEP VAR1-VAR4;
   is equivalent to KEEP VAR1 VAR2 VAR3 VAR4; 

Note, if one of the variables expected in the ordered list does not exist a WARNING will be issued to the log.
However, if using the dash when defining an array, the missing variable will be created.

2) double-dash (--): Using a double dash between two variable names will create a list of variables as they are ordered in the dataset. 
    So if I have the following variables in a dataset in this order - INTN, SSN, NAME, ADDRESS, CITY
    then KEEP INTN -- ADDRESS;
    is equivalent to KEEP INTN SSN NAME ADDRESS;

3) colon (:): Using a colon after a string of letters will create a list of all variables beginning with those letter.
   That is var: will create a list of all variables in the dataset that begin with those 3 letters.
   If I have a dataset with variables named VARSSN VARINTN VARNAME VARDATE, then

   KEEP VAR:;
   is equivalent to KEEP VARSSN VARINTN VARNAME VARDATE;

4) _ALL_: Using the _ALL_ will create a list of all the variables in the dataset.
    So if I have a dataset with variables VAR1 VAR2 VAR3 and VAR4.  Then

    KEEP _ALL_;
    is equivalent to KEEP VAR1 VAR2 VAR3 VAR4;

    This one can be most useful with a proc sort nodupkey to remove records which are exact matchs on all variables.
    PROC SORT DATA=TEST NODUPKEY;
     BY _ALL_;
    RUN;

5) _CHARACTER_: Using _CHARACTER_ creates a list of all the character variables in the dataset. 
    If I have a dataset with two character variables CVAR1 and CVAR2 and two numeric variables NVAR3 NVAR4, then

    KEEP _CHARACTER_;
    is equivalent to KEEP CVAR1 CVAR2;

6) _NUMERIC_: Using _NUMERIC_ creates a list of all the numeric variables in the ataset.
    So in the same dateset from #5, then
  
    KEEP _NUMERIC_;
    is equivalent to KEEP NVAR1 NVAR2;

    Both _CHARACTER_ and _NUMERIC_ can be useful when creating arrays, since all the variables in the array must be the same type. 



4. what are system variables, automated variables and Dataset Variables.

5. How will you write a code to create a sas dataset with no data in it/ Or How to validate a dataset without actual creating it.
Ans:
1)Use OPTIONS OBS = 0

2) Creating a data set by using the like clause.
 ex: proc sql;
create table latha.emp like oracle.emp;
quit;
In this the like clause triggers the existing table structure to be copied to the new table.using this method 
result in the creation of an empty table.
3)  data check;
           set _null_;
     run;
4) data test;
        delete;
    run;
check the above answers.. I have not verified if it will work..


6. There is a date in the infile as 01-01-49 . We need to read the date as 2049 instead of 1949
Ans: Set  Options yearcuoff=2000

7. Difference between Merge and Set with example

8. Difference b/w left join,right join, full outer join

9. Do you use PROC REPORT or PROC TABULATE? Which do you prefer? Explain.
I prefer Proc report as it is higly customizable and flexible where I can define each column in what ever way I 
want to and even make use of SAS functions, logic processing, and assignment statements and create new 
variables for report making use of the compute block of proc report. I referred proc report because it is more efficient tool than tabulate becasuse with report we can do means frequency and tabulate also

10. proc Report and options available in Proc Report

11. ODS

12. how will you read an input file into sas
Ans: Use the statements INFILE to point to the file reference(should be defined using Filename) / to the file path. Use INPUT statement to read the data into the sas dataset

13. How would you remove a format that has been permanently associated with a variable? 
Ans: we can remove the format by using proc datasets:
Proc datasets;
modify <data set name>;
format <variable name>(which variable format needs to 
modify>;
run;
quit;

14. how to display duplicated observations in a data using base sas
There are two ways to display duplicate observations.
1)In data step, using first.var and last.var
2)Using proc sort with option dupout option


15. In the flow of DATA step processing, what is the first action in a typical DATA Step?
Ans:
1)Compilation Phase:
When you submit a DATA step,it reads the input statements it creates an input buffer and brings the variables and observations.it is a logical memory area and pdv brings the observations at a time from input buffer 
and checks the errors. PDV contains 2 automatic variables _n_ & _error_ ,these checks the errors in observations.
_n_ : indicates the no of obs.
_error_: 1 if error occured
             0 if no error
After that it assigns the datavalues to appropriate variable and builds a sas dataset.

2)Execution Phase

16.  Does SAS ?Translate? (compile) or does it ?Interpret?? Explain.
A typical SAS program could contain DATA steps, PROC steps and macros. Macros are preprocessed. DATA steps are just in time compiled. PROC steps are interpreted in the order they appear in program. So when we submit a SAS program consisting of all these three components, the macro is compiled and executed first. If a DATA step is encountered, then it is compiled and executed. Note that the DATA step 
will not be executed if there is an error in the compilation. If a PROC step is encountered, it is interpreted and executed line by line. However i am not certain on this PROC step behavior.
  
17. At compile time when a SAS data set is read, what items are created?
Ans: Only PDV is generated at compilation time, where as the automatic variables _N_ , _ERROR_ are generated at execution time only.

18. how do u validate sas program
Ans: When a sas code is submitted, SAS performs syntactical checks before executing the program/ code. In that case, one of the ways could be 
- at the beginning of the code, write OPTIONS OBS=0 in addition to other options and then RUN it. 
This way data will not be processed and the log shows error messages/ warnings, if any. 
- If you are executing the SAS code on PC SAS, the highlighted colors itself shows the syntactical errors, if any.

19. Name and describe three SAS functions that you have used, if any?
 1) Compress - sas function which is used to remove spaces in string value and concatenate two values 
with out spaces.
2) Input is another function of sas, it is one of conversion function in sas. It converts numeric into char.
3) Put is another conversion function in sas. It converts char to numeric.
4) SAS functions can be used to convert data and manipulate character variable values.
different types of functions:1)TRIM
                                        2)SUBSTR
                                        3)ABS
                                        4)SCAN

TRIM : Removing the trailing blanks from character expressions. 
               syntax=trim(argument)
SUBSTR : substr extracts the substring from an argument
               syntax=substr(argument,position<,n>)
abs : Returns the absolute of the argument
                syntax=abs(argument)
  
The most common functions that would be used are-
Conversion functions - Input / Put / int / ceil / floor
Character functions - Scan / substr / index / Left / trim / compress / cat / catx / upcase,lowcase
Arithmetic functions - Sum / abs /
Attribute info functions – Attrn / length
Dataset – open / close / exist
Directory  - dexist / dopen / dclose / dcreate / dinfo
File functions – fexist / fopen/ filename / fileref
SQL functions – coalesce / count / sum/ mean
Date functions – date / today / datdif / datepart / datetime / intck / mdy
Array functions – dim


20. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
INDEX: Searches a character expression for a string of characters, and returns the position of the string's first character for the first occurrence of the string.
INDEX (source, excerpt)
it returns the position where the 2nd field is in the source


Eg:  
     str1 = 'Hi i am fine here. how are u there ? .....';
     str2 = index(str1,'how'); --> str2 = 20

SCAN is to get a substring upto mentioned character. Scan function the best for locating the particular word specified in a argument,scan function default length is 200bytes.
data k;
r='ganesh kumar';
u=scan(r,2);
proc print;
run;

result:  kumar

INDEXC to locate only for mentioned one or more single character
        INDEXC(character-value, 'char1','char2','char3', ..)

FIND: Searches for a specific substring of characters within a character string
 FIND(string,substring<,modifiers><,startpos>)
it returns the position the substring is in

FINDC: To locate a character that appears or does not appear within a string.used to search for any one in a list of character 
values

INDEXW: Searches for the substring as a word that follows a space in the provided sentence
eg;string1= "there is a the here" ;
INDEXW(STRING1,"the")  

result: 12 (the word "the") 

21. If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variable?
Ans: Use KEEP option on a dataset to only select few variables from 100 variables. We can use KEEP option either on set statement or data step statement. If we use on SET statement then only the 
five variables are created on pdv and only these variables are sent to the output dataset. If we use KEEP option on data step statement then all the variables are copied into pdv and after any maniuplation only the selected variables on data step statement are processed and sent to output dataset.

data abc;
  set xyz (keep= ab cd ef gh);
run;

data abc(keep= ab cd ef gh ij);
  set xyz;
 ij=ab+jk;
run;

22. What is the purpose of the trailing and How would you use them?
If the data is continuosly in data set SAS would read the first words only from each line in the `datalines' block and it will ignore the rest of the line. if we use Trailing @@'it will read completly.and another type of trailing is using single @ this is a line hold specifier.
Trailing @ is used to hold the record in input buffer to execute another input statement on the same datalines.
Trailing @@ is used to hold the record in input buffer to execute same input statement on same datalines intil eof record
The trailing @ or more technically, line hold specifiers are used to hold the pointer in the same record for 
multiple iterations.  The two tyoes of line hold specifiers are single trailing(@) and double trailing(@@).
The single trailing hold the record until it encounters either another input statement or end of the datastep.  
They are used for the records such as 
001F38   H
002 F 40 G
To read these values to the datastep
Data example;
  input @10 type $ @;
  if type='H' then
    input @1 id 3. @4 gender $1. @5 age2.;
  else if type='G' then
    input @1 id3. @5 gender $1. @7 age 2.;
  end;
cards;
001F38   H
002 F 40 G
;
run;

The double trailing holds the until the end of the record.
Data example2;
  input id age @@;
cards;
001 23 002 43 003 65 004 32 005 54
;
run;
  

23. What is the significance of the ‘OF’ in X=SUM (OF a1-a4, a6, a9);? 
Ans: It is use to tell sas to consider a set of values to be processed. In the above example, SUM(OF a1-a4,a6,a9) resolves to SUM(A1,A2,A3,A4,A6,A9).
If we dont use 'OF' then it would be treated as a minus sign.. A1(Minus)-A4 and that is not what we are trying to accomplish. 

24. How do you debug and test your SAS programs? 
put statement and debug option is used for debug and test the programs.

25. How do you test for missing values? 

NMISS OPTION is used for missing values

26. How would you create multiple observations from a single observation? 
line pointer is used for multiple lines per observation
  @@ is used for multiple observations per line

27. What are some good SAS programming practices for processing very large data sets? 
ARRAYS IS USED FOR PROCESSING FOR LARGE DATA SET

28. Briefly describe 5 ways to do a "table lookup" in SAS. 

29. Why is SAS considered self-documenting? 
WHEN DATA SET IS CREATED SAS CREATE DISCRIPTOR PORTION 
AND DATA PORTION .THAT MEANS SAS STORES THE INFOMATION LIKE 
VARIABLE NAME ,LENGTH,TYPE etc.

30. How does SAS handle missing values in: assignment statements, functions, a merge, an update, sort order, formats, PROCs? 


When you check for ordinary missing numeric values, you can use code that is similar to the following: 
if numvar=. then do;
If your data contains special missing values, you can check for either an ordinary or special missing value with a statement that is similar to the following: 
if numvar<=.z then do;
To check for a missing character value, you can use a statement that is similar to the following: 
if charvar=' ' then do;
The MISSING function enables you to check for either a character or numeric missing value, as in: 
if missing(var) then do;

31. How do you generate random samples? ---ranuni
question will be asked like: I have a dataset that creates a value.. I need some random values to be generated each time i execute that code for creating that dataset, how will you generate the random values..

Use the function RANUNI

32. What is the Program Data Vector (PDV)? What are its functions?
When SAS processes a data step it has two phases. Compilation phase and execution phase.
During the compilation phase the input buffer is created to hold a record from external file. After input buffer is
created the PDV is created. The PDV is the area of memory where sas builds dataset, one observation at a time. The PDV contains two automatic variables _N_ and _ERROR_.
  
To the above answers I like to add a bit of information that PDV has even more automated variables such as
FIRST.<by-group-variable>, LAST.<by-group-variable>, _END_, ...etc, along with _ERROR_ and _N_ where each has got its own significance to have a control over the execution of a SAS Program.
  Source: SAS book
The PDV contains all the variables in the input data set, the variables created in DATA step statements, and the two variables, _N_ and _ERROR_, that are automatically generated for every DATA step. The _N_ variable represents the number of times the DATA step has iterated. The _ERROR_ variable acts like a binary switch whose value is 0 if no errors exist in the DATA step, or 1 if one or more errors exist

Variables that are created by the INPUT and the Sum statements  are set to missing initially. Note that in this representation, numeric variables are initialized with a period and character variables are initialized with blanks. The automatic variable _N_ is set to 1; the automatic variable _ERROR_ is set to 0. 

The variable TeamName is marked Drop in the PDV because of the DROP= data set option in the DATA statement. Dropped variables are not written to the SAS data set. The _N_ and _ERROR_ variables are dropped because automatic variables created by the DATA step are not written to a SAS data set

33. If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
A PUT Statement using a Variable _infile_  can be used to achieve this task. where we wont have to mention all the variable names for output using put statement.

_all_ - Write all var present in Dataset including computed variables also.
_input_- write only those variables read from input file
_infile_ - reads in all the content from an external input file as a single block. It can be used to write out if no changes are needed.
  
34. If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn?t have a value?
Use the MISSOVER keyword; it forces SAS to put either a blank or a period for the missing variable before continuing to the next record.

35. what is the diff b/w verification validation in sas
verification :
   is the value is correct ot not
validation :
   is the value is existable or not

EX:   0 <= p(x) <= 1 
validation:  value of p(x) must be lies between 0 and 1 other wise it's not validate result
verification: check again the calculation process to obtain the valid value for p(x).......
p(x) means probability of x(Success)

36. What is the one statement to set the criteria of data that can be coded in any step?
OPTIONS

37. What is the purpose of using the N=PS option?
The N=PS option creates a buffer in memory which is large enough to store PAGESIZE (PS) lines and enables a page to be formatted randomly prior to it being printed.

38. Which date advances a date, time or date/time value by a given interval? 
Here is the detailed usage of intnx function
date1=intnx('month','01jan95'd,5,'beginning');
put date1 / date1 date7.;

date2=intnx('month','01jan95'd,5,'middle');
put date2 / date2 date7.;

date3=intnx('month','01jan95'd,5,'end');
put date3 / date3 date7.;

date4=intnx('month','01jan95'd,5,'sameday');
put date4 / date4 date7.;

date5=intnx('month','15mar2000'd,5,'same');
put date5 / date5 date9.;

interval='month';
date='1sep2001'd;
align='m';
date4=intnx(interval,date,2,align)
;put date4 / date4 date7.;

39. If you need the value of a variable rather than the variable itself what would you use to load the value to a 
macro variable?
If we need a value of a macro variable then we must define it in such terms so that we can call them everywhere in the program. Define it as Global. There are different ways of assigning a global variable. Simplest method is %LET.
Ex: A, is macro variable. 
Use following statement to assign the value of a rather than the variable itself 
%Let A=xyz x="&A";
This will assign "xyz" to x, not the variable xyz to x.

40. How do i read multiple spaces in datasets?
1) While reading external data if there are any consecutive blanks we use DSD.
2) While reading free format extrenal data, if  two words have to be read for a sing variable we use &.

41. 
data abc;
input s w k g o t a m;
cards;
1 2 3 4 5 6 7 8 
2 3 4 5 6 7 8 9
;
run;

i want the output to be in the sorted order(only variables).observations should not be changed..

Use options formdlim='.';

**** create a dummy variable by assigning the values of any 
one of ur variable;
**** here I have taken the values of S variable of your ABC 
data set;
data setabc;
set abc;
l=s;
run;
proc print;run;
* transpose the variables into observations of a transposed 
data set by using the ID statement; 
proc transpose data=setabc out=T1_abc let;
id l;
run;
proc print;run;
**sort the transposed data set by using _name_ variable 
inorder to get varibales in a ascending order;
proc sort data=T1_abc;
by _name_;
run;
proc print;run;
** once again transpose the sorted data set with the _name_ 
variable in ID statement;
proc transpose data=t1_abc out=t2_abc(drop=_name_) let;
id _name_;
run;
proc print noobs;
title 'sorting the variables in Ascending order';
run;

42  WHAT IS DEBUGGING? HOW TO TEST THE DEBUGGING IN SAS?
Debugging is the process of testing the logic of the program. 
To debug the sas Datastep: use the Debug clause after '/' in the data statement and execute. If needed set watch points,and keep on pressing enter key while Observing  the values in the debugger log window.Finally exit from it.
To debug the Macro programs, use the system options like Symbolgen,Mlogic, Mprint.
Debugging is also supported by the log window by notes,warnings, errors and messeges of  and Error macro
variables.

43. How to import the Zip files into SAS? If it is possible in SAS? If it is posible write the code...

PROC IMPORT OUT= WORK.ALL 
   DATAFILE= "C:\Users\MyDocuments\xxxx\file1.zip" 
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2; 
RUN;

44. Difference Between Proc Means and Proc Summary:
proc means:It will give descreptive statitstics. By default it will give output in output window.
limited statistics will give like n,mean,median,min and max.If we need additional statistics we need to add options.
Proc summary:same thing but it will not give output as default.we need give an option print then only it will give the output.
  
1) Proc Means generate a default output but Proc Summary require Print option to generate output similar to proc means.
2) To get similar outputs Proc summary should compulsory have atleast one variable in VAR statement.
If you want a proof for the above please run the below code:
proc means data=sasuser.admit;
run;
proc summary data=sasuser.admit print;
run;

The outputs will be different.

45. SCANOVER, MISSOVER, TRUNCOVER, FLOWOVER, STOPOVER...

46. explain nway in Proc summary



25 comments:

  1. Hi Amulya,
    Good Work by you.

    Please find some corrections in your post.
    Q 19:
    2) Input is another function of sas, it is one of conversion function in sas. It converts Character values into Numeric Values. you have written it vice versa.
    Similarly 3) Put is another conversion function in sas. It converts numeric to character (not Char to Num)

    ReplyDelete
    Replies
    1. Aspiring to go ahead with Base SAS certification. Was looking for complete Base SAS questions and material to go ahead with A00-211 certification. I found your blog with very nice piece of information and go ahead to follow. I found other good websites too, that will too help me.!!! Later I am looking for online Base SAS exam practice questions and exam to boost my confident to crack SAS certification exam at first attempt

      Delete
  2. I’ve been browsing on-line greater than three hours today, but I never discovered any attention-grabbing article like yours. It is beautiful worth sufficient for me. Personally, if all webmasters and bloggers made good content material as you did, the net will be a lot more helpful than ever before.

    Base SAS Training in chennai

    ReplyDelete
  3. I simply couldn’t depart your site before suggesting that I really enjoyed the usual information an individual supply in your visitors? Is going to be again steadily to check out new posts.

    SAS Training in Chennai

    ReplyDelete
  4. Base sas certification is high in demand as job openings are mostly based on SAS base certification. I strongly recommend you to clear your Base SAS certification with analyticsexam as it is maintained by SAS experts and experienced professionals. I was able to manage 90% in my exam and questions was so close to practice exams from analyticsexam

    ReplyDelete
  5. This is an awesome post.Really very informative and creative contents. These concept is a good way to enhance the knowledge.I like it and help me to development very well.Thank you for this brief explanation and very nice information.Well, got a good knowledge.
    SAS Training in Chennai

    ReplyDelete
  6. It’s really amazing that we can record what our visitors do on our site. Thanks for sharing this awesome guide. I’m happy that I came across with your site this article is on point,thanks again and have a great day. Keep update more information..

    MSBI Training in Chennai

    Informatica Training in Chennai

    ReplyDelete
  7. This comment has been removed by the author.

    ReplyDelete
  8. Thanks for the informative article. This is one of the best resources I have found in quite some time. Once Again Thanks for Sharing this Valuable Information i like this i Can Share this with My Friend Circle.
    SAS Interview Questions and Answers

    ReplyDelete
  9. This comment has been removed by the author.

    ReplyDelete
  10. This comment has been removed by the author.

    ReplyDelete
  11. Wow it is really wonderful and awesome thus it is very much useful for me to understand many concepts and helped me a lot. it is really explainable very well and i got more information from your blog.

    rpa training in chennai | rpa training in chennai
    rpa training in pune | rpa online training | rpa training in bangalore

    ReplyDelete
  12. The site was so nice, I found out about a lot of great things. I like the way you make your blog posts. Keep up the good work and may you gain success in the long run.


    Data Science Training in Chennai | Data Science course in anna nagar
    Data Science course in chennai | Data science course in Bangalore
    Data Science course in marathahalli | Data Science course in btm

    ReplyDelete
  13. Thanks you for sharing this unique useful information content with us. Really awesome work. keep on blogging
    java training in jayanagar | java training in electronic city

    java training in chennai | java training in USA

    ReplyDelete
  14. Thank you for allowing me to read it, welcome to the next in a recent article. And thanks for sharing the nice article, keep posting or updating news article.
    python training Course in chennai | python training in Bangalore | Python training institute in kalyan nagar

    ReplyDelete
  15. I was recommended this web site by means of my cousin. I am now not certain whether this post is written through him as nobody else recognise such precise about my difficulty. You're amazing! Thank you!
    python course in pune
    python course in chennai
    python Training in Bangalore

    ReplyDelete
  16. Great Article… I love to read your articles because your writing style is too good, its is very very helpful for all of us and I never get bored while reading your article because, they are becomes a more and more interesting from the starting lines until the end.
    Data Science Interview questions and answers
    Data Science Tutorial

    ReplyDelete
  17. I read this post two times, I like it so much, please try to keep posting & Let me introduce other material that may be good for our community.
    Best Devops Training in pune
    Microsoft azure training in Bangalore
    Power bi training in Chennai

    ReplyDelete
  18. I found your blog while searching for the updates, I am happy to be here. Very useful content and also easily understandable providing.. Believe me I did wrote an post about tutorials for beginners with reference of your blog. 
    best rpa training in bangalore
    rpa training in pune | rpa course in bangalore
    RPA training in bangalore
    rpa training in chennai

    ReplyDelete
  19. Such a very useful article. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. After reading your article I was amazed. I know that you explain it very well.
    And I hope that other readers will also experience how I feel after reading your article
    java training in chennai

    java training in tambaram

    aws training in chennai

    aws training in tambaram

    python training in chennai

    python training in tambaram

    selenium training in chennai

    selenium training in tambaram

    ReplyDelete
  20. Nice site.... refer this site .
    if Our vision succes!Training are focused on perfect improvement of technical skills for Freshers and working professional.
    Our Training classes are sure to help the trainee with COMPLETE PRACTICAL TRAINING and Realtime methodologies.
    java training in chennai

    java training in tambaram

    aws training in chennai

    aws training in tambaram

    python training in chennai

    python training in tambaram

    selenium training in chennai

    selenium training in tambaram

    ReplyDelete