Saturday 15 June 2013

Complete SSAS - Questions and Answers

SSAS Interview Questions and Answers

1. Star Vs Snowflake schema and Dimensional modeling
Star Schema: One Fact Table surrounded by number of Dimension Tables. It is a De-Normalized form.
Dimension table will not have any parent table.
Hierarchies in the Dimension are stored in the Dimension itself.

Snowflake: Normalized form of star schema is a snow flake schema. Dimension tables can be further broken down into sub dimensions.

Dimension table will have one or more parent tables.
Hierarchies are broken into seperate tables in snow schema. These hierarchies helps to drilldown the data from Top hierarchy to lowermost hierarchy.
Increases the number of joins and poor performance in retrival of data.

2. Data storage modes - MOLAP, ROLAP, HOLAP

In ROLAP, the structure of aggregation along with the values are stored in the 2 dimensional relational formats at disc level.
   ROLAP offers low latency, but it requires large storage space as well as slower processing and query response times.

In MOLAP, the structure of aggregation along with the data values are stored in multi dimensional format, takes more space with less time for data analysis compared to ROLAP.
    MOLAPoffers faster query response and processing times, but offers a high latency and requires average amount of storage space. This storage mode leads to duplication of data as the detail data is present in both the relational as well as the multidimensional storage.

In HOLAP, stucture is stored in Relational model and data is stored in multi dimensional model  which provides optimal usage and space.
This storage mode offers optimal storage space, query response time, latency and fast processing times.

Default storage setting is MOLAP.


3. Types of Dimensions

Dimension typeDescription
RegularA dimension whose type has not been set to a special dimension type.
TimeA dimension whose attributes represent time periods, such as years, semesters, quarters, months, and days.
OrganizationA dimension whose attributes represent organizational information, such as employees or subsidiaries.
GeographyA dimension whose attributes represent geographic information, such as cities or postal codes.
BillOfMaterialsA dimension whose attributes represent inventory or manufacturing information, such as parts lists for products.
AccountsA dimension whose attributes represent a chart of accounts for financial reporting purposes.
CustomersA dimension whose attributes represent customer or contact information.
ProductsA dimension whose attributes represent product information.
ScenarioA dimension whose attributes represent planning or strategic analysis information.
QuantitativeA dimension whose attributes represent quantitative information.
UtilityA dimension whose attributes represent miscellaneous information.
CurrencyThis type of dimension contains currency data and metadata.
RatesA dimension whose attributes represent currency rate information.
ChannelA dimension whose attributes represent channel information.
PromotionA dimension whose attributes represent marketing promotion information.



4. Types of Measures
Fully Additive Facts: These are facts which can be added across all the associated dimensions. For example, sales amount is a fact which can be summed across different dimensions like customer, geography, date, product, and so on.
Semi-Additive Facts: These are facts which can be added across only few dimensions rather than all dimensions. For example, bank balance is a fact which can be summed across the customer dimension (i.e. the total balance of all the customers in a bank at the end of a particular quarter).  However, the same fact cannot be added across the date dimension (i.e. the total balance at the end of quarter 1 is $X million and $Y million at the end of quarter 2, so at the end of quarter 2, the total balance is only $Y million and not $X+$Y).
Non-Additive Facts: These are facts which cannot be added across any of the dimensions in the cube. For example, profit margin is a fact which cannot be added across any of the dimensions. For example, if product P1 has a 10% profit and product P2 has a 10% profit then your net profit is still 10% and not 20%.  We cannot add profit margins across product dimensions. Similarly, if your profit margin is 10% on Day1 and 10% on Day2, then your net Profit Margin at the end of Day2 is still 10% and not 20%.
Derived Facts: Derived facts are the facts which are calculated from one or more base facts, often by applying additional criteria. Often these are not stored in the cube and are calculated on the fly at the time of accessing them. For example, profit margin.
Factless Facts: A factless fact table is one which only has references (Foreign Keys) to the dimensions and it does not contain any measures. These types of fact tables are often used to capture events (valid transactions without a net change in a measure value). For example, a balance enquiry at an automated teller machine (ATM). Though there is no change in the account balance, this transaction is still important for analysis purposes.
Textual Facts: Textual facts refer to the textual data present in the fact table, which is not measurable (non-additive), but is important for analysis purposes. For example, codes (i.e. product codes), flags (i.e. status flag), etc.
5. Types of relationships between dimensions and measuregroups. 
No relationship: The dimension and measure group are not related.
Regular: The dimension table is joined directly to the fact table.
Referenced: The dimension table is joined to an intermediate table, which in turn,is joined to the fact table.
Many to many:The dimension table is to an intermediate fact table,the intermediate fact table is joined, in turn, to an intermediate dimension table to which the fact table is joined.
Data mining:The target dimension is based on a mining model built from the source dimension. The source dimension must also be included in the cube.
Fact table: The dimension table is the fact table.
6. Proactive caching
Proactive caching can be configured to refresh the cache (MOLAP cache) either on a pre-defined schedule or in response to an event (change in the data) from the underlying relational database. Proactive caching settings also determine whether the data is queried from the underlying relational database (ROLAP) or is read from the outdated MOLAP cache, while the MOLAP cache is rebuilt.
Proactive caching helps in minimizing latency and achieve high performance.

It enables a cube to reflect the most recent data present in the underlying database by automatically refreshing the cube based on the predefined settings.
Lazy aggregations:
When we reprocess SSAS cube then it actually bring new/changed relational data into SSAS cube by reprocessing dimensions and measures. Partition indexes and aggregations might be dropped due to changes in related dimensions data so aggregations and partition indexes need to be reprocessed. It might take more time to build aggregation and partition indexes.
If you want to bring cube online sooner without waiting rebuilding of partition indexes and aggregations then lazy processing option can be chosen. Lazy processing option bring SSAS cube online as soon as dimensions and measures get processed. Partition indexes and aggregations are triggered later as a background job.

Lazy processing option can be changed by server level property "OLAP\LazyProcessing\Enabled"

Advantage: Lazy processing saves processing time as it brings as soon as measures and dimension data is ready.
Disadvantage: User will see performance hit when aggregation are getting build in background.
7. Partition processing options
Process Default: SSAS dynamically chooses from one of the following process options.
Process Full: Drop all object stores, and rebuild the objects. This option is when a structural change has been made to an object, for example, when an 
attribute hierarchy is added, deleted, or renamed.
Process Update: Forces a re-read of data and an update of dimension attributes. Flexible aggregations and indexes on related partitions will be dropped.
Process Add: For dimensions, adds new members and updates dimension attribute captions and descriptions.
Process Data:Processes data only without building aggregations or indexes. If there is data is in the partitions, it will be dropped before re-populating the 
partition with source data.
Process Index:  Creates or rebuilds indexes and aggregations for all processed partitions. For unprocessed objects, this option generates an error.
Unprocess: Delete data from the object.
Process Structure: Drop the data and perform process default on all dimensions.
Process Clear: Drops the data in the object specified and any lower-level constituent objects. After the data is dropped, it is not reloaded.
Process Clear Structure: Removes all training data from a mining structure.

8. Difference between attirubte hierarchy and user hierarchy
An Attribute Hierarchy is created by SSAS for every Attribute in a Dimension by default. An Attribute by default contains only two levels - An "All" level and a 
"Detail" level which is nothing but the Dimension Members.

A User Defined Hierarchy is defined explicitly by the user/developer and often contains multiple levels. For example, a Calendar Hierarchy contains Year, 
Quarter, Month, and Date as its levels.

Some of the highlights/differences of Attribute and User Defined Hierarchies:
1. Attribute Hierarchies are always Two-Level (Unless All Level is suppressed) whereas User Defined Hierarchies are often Multi-Level.
2. By default, Every Attribute in a Dimension has an Attribute Hierarchy whereas User Defined Hierarchies have to be explicitly defined by the user/developer.
3. Every Dimension has at least one Attribute Hierarchy by default whereas every Dimension does not necessarily contain a User Defined Hierarchy. In essence, a Dimension can contain zero, one, or more User Defined Hierarchies.
4. Attribute Hierarchies can be enabled or disabled. Disable the Attribute Hierarchy for those attributes which are commonly not used to slice and dice the data during analysis, like Address, Phone Number, and Unit Price etc. Doing this will improve the cube processing performance and also reduces the size of the cube as those attributes are not considered for performing aggregations.
5. Attribute Hierarchies can be made visible or hidden. When an Attribute Hierarchy is hidden, it will not be visible to the client application while browsing the Dimension/Cube. Attribute Hierarchies for those attributes which are part of the User Defined Hierarchies, like Day, Month, Quarter, and Year, which are part of the Calendar Hierarchy, can be hidden, since the attribute is available to the end users through the User Defined Hierarchy and helps eliminate the confusion/redundancy for end users.
9. Dimension, Hierarchy, Level, and Members
Dimensions in Analysis Services contain attributes that correspond to columns in dimension tables. These attributes appear as attribute hierarchies and can 
be organized into user-defined hierarchies, or can be defined as parent-child hierarchies based on columns in the underlying dimension table. Hierarchies are used to organize measures that are contained in a cube. 
Hierarchy: is the relation between attributes in a dimension.
Level: refers to individual attribute within the Hierarchy.
10. Difference between database dimension and cube dimension
When you create a dimension using dimension wizard in BIDS, then you're creating a Database dimension in your AS database. Database dimensions is independent of cube and can be processed on their own. 

When you build a cube, and you add dimensions to that cube, you create cube dimensions: cube dimensions are instances of a database dimension inside a cube.

A database dimension can be used in multiple cubes, and multiple cube dimensions can be based on a single database dimension

The Database dimension has only Name and ID properties, whereas a Cube dimension has several more properties.
Database dimension is created one where as Cube dimension is referenced from database dimension.
Database dimension exists only once.where as Cube dimensions can be created more than one using ROLE PLAYING Dimensions concept.
11. Importance of CALCULATE keyword in MDX script, data pass and limiting cube space

12. Effect of materialize

When setting up a dimension with a Refence relationship type, we have the option of "materializing" the dimension.

   Select to store the attribute member in the intermediate dimension that links the attribute in the reference dimension to the fact table in the MOLAP 
structure.  This imporvies the qery performance, but increases the processing time and storage space.
  If the option is not selected, only the relationship between the fact records and the intermediate dimension is stored in the cube. This means that Anaylysis services has to derive the aggregated values for the members of the referenced dimension when a query is executed, resulting in slower query performance.
13. Partition processing and Aggregation Usage Wizard

14. Perspectives, Translations, Linked Object Wizard


15. Handling late arriving dimensions / early arriving facts


16. Role playing Dimensions, Junk Dimensions, Conformed Dimensions, SCD and other types of dimensions

Role playing Dimesnion:
A Role-Playing Dimension is a Dimension which is connected to the same Fact Table multiple times using different Foreign Keys. 
eg: Consider a Time Dimension which is joined to the same Fact Table (Say FactSales) multiple times, each time using a different Foreign Key in the Fact 
Table like Order Date, Due Date, Ship Date, Delivery Date, etc
Steps:
In Cube Designer, click the Dimension Usage tab.
Either click the 'Add Cube Dimension' button, or right-click anywhere on the work surface and then click Add Cube Dimension.
In the Add Cube Dimension dialog box, select the dimension that you want to add, and then click OK.
Conformed Dimension is a Dimension which connects to multiple Fact Tables across one or more Data Marts (cubes). Conformed Dimensions are exactly the same  structure, attributes, values (dimension members), meaning and definition.
Example: A Date Dimension has exactly the same set of attributes, same members and same meaning irrespective of which Fact Table it is connected to 

linked dimension is based on a dimension that is stored in a separate Analysis Services  Database which may or may not be on the same server. You can create and maintain a dimension in just one database and then reuse that dimension by creating linked  dimensions for use in multiple databases.

Linked Dimensions can be used when the exact same dimension can be used across multiple Cubes within an Organization like a Time Dimension, gography 
Dimension etc.

Here are some of the highlights of a Linked Dimension:
-More than one Linked Dimension can be created from a Single Database Dimension.
-These can be used to implement the concept of Conformed Dimensions.
-For an end user, a Linked Dimension appears like any other Dimension.

Degenerate Dimension is a Dimension which is derived out of a Fact Table and it appears to the end user as a separate/distinct Dimension, its data is 
actually stored in the Fact Table. It's a Dimension table which does not have an underlying physical table of its own.

Degenerate Dimensions are commonly used when the Fact Table contains/represents Transactional data like Order Details, etc. and each Order has an Order Number associated with it, which forms the unique value in the Degenerate Dimension.

Junk Dimension is often a collection of Unrelated Attributes like indicators, flags, codes, etc. It is also called as a Garbage Dimension.
Junk Dimensions are usually small in size.

One of the common scenarios is when a Fact Table contains a lot of Attributes which are like indicators, flags, etc. Using Junk Dimensions, such Attributes can be removed/cleaned up from a Fact Table.

SCD: The Slowly Changing Dimension (SCD) concept is basically about how the data modifications are absorbed and maintained in a Dimension Table. 

 The new (modified) record and the old record(s) are identified using some kind of a flag like say IsActive, IsDeleted etc. or using Start and End Date fields to indicate the validity of the record. 
17. Parent Child Hierarchy, NamingTemplate property, MemberWithLeafLevelData property

18. 
How will you keep measure in cube without showing it to user?

19. How to pass parameter in MDX

Here is an example, suppose this is your query:
select {[Measures].[Internet Order Quantity]} on 0,
[Product].[Category].[Category] on 1
from [Adventure Works]
where [Date].[Calendar Year].&[2002]

You can modify it like this:
select {[Measures].[Internet Order Quantity]} on 0,
[Product].[Category].[Category] on 1
from [Adventure Works]
where strtomember(@P1)

Now, if you pass the value [Date].[Calendar Year].&[2002] to the P1, then it will run just like:
where [Date].[Calendar Year].&[2002]

STRTOSET returns a set.
STRTOMEMBER returns a member.
20. SSAS 2008 vs SSAS 2012

21. Dimension security vs Cell security


22. SCOPE statement, THIS keyword, SUBCUBE


23. CASE (CASE, WHEN, THEN, ELSE, END) statement, IF THEN END IF, IS keyword, HAVING clause


24. CELL CALCULATION and CONDITION clause


25. RECURSION and FREEZE statement


26. Common types of errors encountered while processing a dimension / measure groups / cube


27. Logging and monitoring MDX scripts and cube performance



28. What do you understand by attribute relationship? what are the main advantages in using attribute relationship? 
An Attribute Relationship is a relationship between various attributes within a Dimension. By default, every Attribute in a Dimension is related to the Key 
Attribute.

There are basically 2 types of Attribute Relationships: Rigid, Flexible
29. What is natural hierarchy and how will you create it?
Natural hierarchies, where each attribute is related either directly or indirectly to all other attributes in the same hierarchy, as in product category - product 
subcategory - product name
30. What do you understand by rigid and flexible relationship? Which one is better from performance perspective?
Rigid: Attribute Relationship should be set to Rigid when the relationship between those attributes is not going to change over time. For example, 
relationship between a Month and a Date is Rigid since a particular Date always belongs to a particular Month like 1st Feb 2012 always belongs to Feb 
Month of 2012. Try to set the relationship to Rigid wherever possible.
Flexible: Attribute Relationship should be set to Flexible when the relationship between those attributes is going to change over time. For example, relationship between an Employee and a Manager is Flexible since a particular Employee might work under one manager during this year (time period) and under a different manager during next year (another time period).
31. In which scenario, you would like to go for materializing dimension? 
Reference dimensions let you create a relationship between a measure group and a dimension using an intermediate dimension to act as a bridge between 
them.
32. In dimension usage tab, how many types of joins are possible to form relationship between measure group and dimension?

33. What is deploy, process and build?
Bulid: Verifies the project files and create several local files.
Deploy: Deploy the structure of the cube(Skeleton) to the server.
Process: Read the data from the source and build the dimesions and cube structures
34. Can you create server time dimension in analysis services(Server time dimension)? 

35. How many types of dimension are possible in SSAS?
Account
Bill of Materials
Currency
Channel
Customer
Geography
Organizations
Products
promotion
Regular
Scenario
Time
Unary
36. What is time intelligence? How will you implement in SSAS?

37. What do you understand by linked cube or linked object feature in SSAS?

38. How will you write back to dimension using excel or any other client tool?

39. What do you understand by dynamic named set (SSAS 2008)? How is i different from static named set?

40. In Process Update, which relationship will be better(Rigid and Flexible relationship)?  


41. What is the difference between "ProcessingGroup" ByAttribute and ByTable? 

42. What do you understand by following properties of dimension attribute.
Default Member
AttributeHierarchyEnabled
AttributeHierarchyOptimizedState
DiscretizationMethod
OrderBy
OrderByAttribute

43.  What are different storage mode option in SQL server analysis services and which scenario, they will be useful?

44.  How will you implement data security for given scenario in analysis service data?

"I have 4 cubes and 20 dimension. I need to give access to CEO, Operation managers and Sales managers and employee. 
1) CEO can see all the data of all 4 cubes.
2) Operation Managers can see only data related to their cube. There are four operation managers.
3) Employees can see only certain dimension and measure groups data. (200 Employees) "

45. What are the options to deploy SSAS cube in production?
Right click on Project in Solution Explorer -> Properties
Build -> Select ' Output Path'
Deployment -> 
Processing Option - Default, Full, Do Not Process
Transactional Deployment - False, True
Deployment Mode - Deploy All, Deploy Changes only

1.BIDS
In BIDS from the build menu – select the build option (or right click on the project in the solution explorer).
The build process will create four xml files in the bin subfolder of the project folder 
.asdatabase -  is the main object definition file
.configsettings
.deploymentoptions
.deploymenttargets

2. Deploy
Deployment via BIDS will overwrite the destination database management settings – so is not recommended for production deployment.

A more controllable option is the Deployment wizard, available in interactive or command line mode.
Run the wizard from Start -> All Programs ->Microsoft Sql Server -> Analysis Services -> deployment wizard
1. Browse to the .asdatabase file created by the build
2. connect to the target server
3. Configure how partitions and roles should be deployed
4. specify how configuration settings are deployed
5. Specify Processing options:
Default processing allows SSAS to decide what needs to be done; Full processing can be used to process all objects.  You can also choose not to process at all. 
6. choose whether to deploy instantly or to create an XMLA command script for later deployment. The script will be created in the same location as the 
.asdatabase file.

46. What are the options available to incrementally load relational data into SSAS cube?
Use Slowly Changing Dimesnion
47. Why will you use aggregation at remote server?    

48.  What are different ways to create aggregations in SSAS? 

49. What do you understand by Usage based optimization?

50.  Can we use different aggregation scheme for different partitions?

51. Why will you use perspective in SSAS?

52. What are KPIs? How will you create KPIs in SSAS?

53. What are the main feature differences in SSAS 2005 and SSAS 2008 from developer point of view?

54.What are the aggregate functions available for measure in SSAS?
Sum, Min, Max, Count, and Distinct Count
55. What are the processing modes available for measure group? What do you understand by lazy aggregation? 

56. How can you improve dimension design?
1: Limit the Number of Dimensions Per Measure Group and Number of Attributes Per Dimension.
AttributeHierarchyOptimizedState: Determines the level of optimization applied to the attribute hierarchy. By default, an attribute hierarchy is FullyOptimized, which means that Analysis Services builds indexes for the attribute hierarchy to improve query performance. The other option, NotOptimized, means that no indexes are built for the attribute hierarchy. 
2: Use Dimension Properties Effectively
For large dimensions that expose millions of rows and have a large number of attributes, pay particular attention to the ProcessingGroup property. By 
default, this property is assigned a value of ByAttribute.
3: Use Regular Dimension Relationship Whenever Possible
4: Use Integer Data Type for Attribute Keys If at All Possible
5: Use Natural Hierarchies When Possible
57. What are the performance issues with parent child hierarchy? 
In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled.
58. What do you understand by formula engine and storage engine?
Formula Engine is single-threaded, Storage Engine (SE) is multi-threaded. 
The Query Processor Cache/Formula Engine Cache caches the calculation results whereas the Storage Engine Cache caches aggregated/fact data being 
queried.
59. How can you improve overall cube performance?
Partitioning the cube can help to reduce the processing time. The benefit of partitioning is that it allows  to process multiple partitions in parallel on a 
server that has multiple processors. 

Regarding the best possible processing strategy, the following steps:
1. Process Update all the dimensions that could have had data changed.  Depending on the nature of the changes in the dimension table, Process Update can affect dependent partitions.  If only new members were added, then the partitions are not affected.  But if members were deleted or if member relationships changed, then some of the aggregation data and bitmap indexes on the partitions are dropped. 
2. Process Data the partitions that have changed data (which are usually the most recent partitions).  Of course, the smaller the partition, the better, so try to use daily partitions instead of monthly or use monthly partitions instead of yearly.
3. Process Index for the rest of the partitions 

MDX

1. Explain the structure of MDX query?

2. Tell me your 5 mostly used MDX functions?


3. What is the difference between set and tuple?

Tuple: It is a collection of members from different Dimension
Set: collection of one or More tuples from same dimension
4. What do you understand by Named set? Is there any new feature added in SSAS 2012 related to named set?

5. How will you differentiate among level, member, attribute, hierarchy?


6. What are the differences among exists, existing and scope?


7. What will happen if we remove CALCULATE keyword in the script?


8. How will you pass parameter in MDX?


9. What is the difference between .MEMBERS and .CHILDREN?


10.What is the difference between NON EMPTY keyword and NONEMPTY() function?


11. Functions used commonly in MDX like Filter, Descendants, BAsc and others


12. Difference between NON EMPTY keyword and function, NON_EMPTY_BEHAVIOR, ParallelPeriod, AUTOEXISTS


13. Difference between static and dynamic set


14. Difference between natural and unnatural hierarchy, attribute relationships


15. Difference between rigid and flexible relationships


16. Write MDX for retrieving top 3 customers based on internet sales amount? 


17. Write MDX to find current month's start and end date?


18. Write MDX to compare current month's revenue with last year same month revenue? 


19. Write MDX to find MTD(month to date), QTD(quarter to date) and YTD(year to date) internet sales amount for top 5 products?


20. Write MDX to find count of regions for each country?


21. Write MDX to rank all the product category based on calendar year 2005 internet sales amount?


22. Write MDX to extract nth position tuple from specific set?


23. Write MDX to set default member for particular dimension?


24. What are the performance consideration for improving MDX queries?


25. Is Rank MDX function performance intensive? 


26.  Which one is better from performance point of view...NON Empty keyword or NONEMPTY function?


27. How will you find performance bottleneck in any given MDX?


28. What do you understand by storage engine and formula engine?

No comments:

Post a Comment