The choice between Tabular or Multidimensional models in SQL Server Analysis Services 2012

Introduction

In newly-released version of SQL Server, SQL Server 2012, a whole new approach to multi-dimensional analysis-like Business Intelligence is introduced. This is just one of the many new features. For a complete overview of new BI related functionality, please read the Insight "What’s new in SQL Server 2012? Highlights of new Business Intelligence functionality in the latest release.".
This insight will explain the two "models" that now exist for OLAP-like analysis, the Tabular and Multidimensional model, and their advantages and disadvantages.

Introducing the Tabular and Multidimensional model

Microsoft has been a market leader in OLAP technology for many years. Microsoft’s technology is very mature and scalable as proven by the Yahoo’s 12 TB Analysis Services cube. However, in the last five years, new players have entered the BI market with the promise to finally deliver "BI for the masses”. These new players have provided BI tools with highly interactive and graphical user interfaces built on in-memory architectures and associative paradigms in terms of selecting data in the analysis. They address business users' unmet ease-of-use and rapid deployment needs. These tools have become very popular and have started to become important competitors for Microsoft especially in the midsized business segment.

As a first reaction, Microsoft released PowerPivot in 2010, as a "personal” BI solution. In 2012, as part of SQL Server 2012, a new product called Tabular Model, a server version of PowerPivot is added to the SQL Server solution. With this new product and the new reporting tool "Power View” that comes with it, Microsoft aims to compete with the pure "in-memory” players.

This does not mean that Microsoft is abandoning disk-based OLAP technology. Microsoft communicated at the end of 2011 that it will keep on investing in OLAP technology to maintain its position as market leader. Therefore, in Analysis Services 2012, we find two different products:

The Tabular model (In-Memory Cube). Tabular models are in-memory databases in Analysis Services. Using state-of-the-art compression algorithms and multi-threaded query processing, the Xvelocity™ engine delivers fast access to tabular model objects and data through reporting client applications such as Microsoft Excel and Microsoft Power View.
The Multidimensional model (traditional OLAP Cube). This model is the OLAP cube that already exists for more than ten years. OLAP technology organizes summary data into multidimensional structures. Aggregations are stored in the multidimensional structure in cells at coordinates specified by the dimensions.

These two models have the same goal: to provide a semantic layer on top of your Data Warehouse with high performance capabilities that allows the end-users to dig into the data. The fact that these two models are part of Analysis Services could give the impression that both models are similar and that you can easily switch from one model to another. The reality is that they are two different products with two totally different design experiences and underlying data-architectures[B1]. They can co-exist on the same machine but they both have their own instances.

This insight will compare both models in terms of each layer (Data Source, Data Access, Performance, Data Languages, Data Model, Reporting and Management) to help you answer the first question that you will have to address when using Analysis Services 2012: Which model to choose for my project: a Tabular or Multidimensional model ?

Figure 1: FTP access via Internet ExplorerHigh Level Architecture of SQL Server Analysis Services 2012

click to enlarge

Data Source Layer

There are no differences between the Tabular and the Multi-Dimensional model : every data source that you can import into Multi-Dimensional can also be imported into Tabular. There is only one important restriction: if you use the Direct Query mode, you will be limited to SQL Server relational databases (see below).

Figure 2: Connect to a Data Source with Tabular Model

click to enlarge

Data Access Layer

Tabular models support data access through two modes: Cached mode and DirectQuery mode.

In Cached mode, all the data is loaded in memory and all queries are answered from there.
In DirectQuery mode, you bypass the in-memory model, allowing client applications to query data directly on the database source (only supported for SQL-Server relational engine).

Figure 3: Different Data Access modes in the Tabular Model

click to enlarge

The Multidimensional model supports two modes:

MOLAP. This is the default and most frequently used storage mode. In this mode, when you process the cube, the source data is pulled from the relational store, the required aggregation is then performed within Analysis Services and finally the data is stored in the Analysis Services server in a compressed and optimized multidimensional format.
ROLAP does not pull data from the underlying relational database source to the OLAP server but rather both the cube detail data and the aggregated data stay in the relational database source. In order to store the calculated aggregation the database server creates additional database objects (indexed views).

The DirectQuery Mode can be compared to the ROLAP mode; both do not store data and rely completely on the relational database to answer a query. ROLAP mode has very rarely been implemented because in many cases the performance is insufficient for end-users as soon as you get high volumes and in most scenarios you do not need real-time access to the data in the relational database. Pseudo real-time is sufficient and it can be achieved through the proactive caching mechanism or intermediate processes during the day. With the new column store index available in SQL Server 2012, a new possibility appears since the index column store can dramatically improve the performance of a Data Warehouse query on your relational database.

There are some limitations associated with the DirectQuery Mode:

Only DAX is supported which means that it can only be used from Power View (i.e. you cannot browse it from an Excel Pivot Table).
Time calculations are not supported.
Calculated columns are not supported.

One last remark: there is also the possibility to choose a mixed mode Cached/DirectQuery. In this case data exists in both storage modes. There is one storage mode used by default to answer the queries but in the connection string you can specify which storage mode you want to use.

The vast majority of projects will probably use either MOLAP or Cached mode; the main difference is that MOLAP mode will build pre-aggregated data but store both the aggregated and atomic data on disk, while Cached mode will load all the atomic data in-memory in a column store format without computing any pre-aggregation. If you go for the Tabular model with Cached mode, the amount of available RAM becomes a constraint that needs to be taken into account. Think twice about this especially if you want to have a large number of different cubes running in parallel on the same server.

There is no simple rule to compute in advance the amount of memory needed to be able to load your entire cube because it depends on the compression of your data when stored in columns instead of rows. The rationale can be defined as follows : the lower the cardinality of your column (cardinality means the number of distinct values), the higher the compression rate of your data. This means that you should avoid including data like transaction identifiers, timestamps, etc. in your model to reduce the amount of memory required.

Performance

In this section, we are not going to cover the difference between the ROLAP and Direct Query mode since in both cases most of the work is transferred back to the relational database and not to the model itself.

As explained above, the Tabular in memory engine is fundamentally different from the Multidimensional MOLAP engine. The Tabular model reads data directly from the cache and takes advantage of the query acceleration resulting from the column store indexes, while the Multidimensional model is reading pre-aggregated data or atomic data from disk depending on the existing aggregations defined during the design of the cube.

Consequences are:

By default, the Tabular engine will give great performance without any special tuning.
This does not mean that the Tabular model will be faster than the Multidimensional model. In fact, we could argue that for any existing aggregation the Multidimensional model is likely to give better performance especially if the aggregated data is already in cache. Contrary to the Tabular model, the Multidimensional model keeps the query results in cache and as such the more the cube is used, the better query performance you will get. The results of DAX queries are never saved within the cache which means that DAX queries will always take the same time to execute.
The relational concept of the Tabular model will also give good performance results when querying data at the lowest granularity level. This is much more of an issue for the Multi-Dimensional model especially when one needs to cross-join some very big dimensions.

The table below summarizes the different scenarios that we can encounter:

Scenarios	Multidimensional MOLAP	Tabular In-Memory	More Performant
Report on Low granularity data	Read atomic data from disk.	Read columnar data from RAM.	Tabular In –Memory.
Report on aggregated data with no predefined aggregation	Read atomic data from disk. Aggregate data in Memory	Read columnar data from RAM. Aggregate data in Memory	Tabular In –Memory.
Report on aggregated data with predefined aggregations on Cold Cache	Read aggregated data from disk.	Read columnar data from RAM. Aggregate data in Memory	Comparable.
Report on aggregated data with predefined aggregations on Warm Cache	Read aggregated data from RAM.	Read columnar data from RAM. Aggregate data in Memory	Multidimensional.

Table 1 : Performance comparison between Tabular and Multidimensional Model

In most cases, the Tabular model outperforms the Multidimensional model.

Data Language Layer

The Multidimensional model uses MDX (Multidimensional Expression) as its data language. MDX is a very powerful language but requires a thorough understanding of multidimensional concepts and, therefore, is often perceived as rather complex and difficult to learn.

The Tabular model uses a new language introduced by PowerPivot called DAX (Data Analysis Expression). The syntax of DAX is very similar to the one of Excel formulas and DAX concepts are similar to relational database concepts and therefore easier to grasp for a developer without any experience of Multidimensional concepts. Even though using DAX is rather simple for basic computations (margin, ratio, etc.), it becomes more complex for advanced computations. You can (as with any language) produce the same result with different syntaxes, but each will have a very different performance behavior. The message here is that there is no free lunch, if you want to create some advanced BI applications, you will have to invest some time in learning DAX, even if it is easier than MDX.

Figure 4: Editing a DAX formulas in a Tabular Model

click to enlarge

Also, there are some specificities of the MDX script like SCOPE assignments (it gives the possibility to overwrite values in the multi-dimensional space) or definition of Named Set (very convenient to filter a report with the TOP 10 Customers for example) that cannot be done in DAX.

The Data Model Layer

Both Tabular and Multidimensional models work with the same concept where you create links between the underlying data set to define your relationships. One important difference is that you can only use one column to establish relationships between tables in a Tabular project whereas in Multi-Dimensional projects you can use multiple columns. This reinforces the need to enable a solid Data Warehouse with appropriate surrogate keys before you start using a Tabular project. Otherwise you will have to build a lot of concatenated key fields which:

will be difficult to maintain,
will impact your performance in a negative way and
will require a lot of space (high cardinality character fields are not good candidates for high compression with the column store).

The Tabular project will seem easier for developers without Multidimensional experience because you do not need to define relationships between attributes. This step is mandatory in Multidimensional models in order to be able to define natural hierarchies and thereby optimize performance. Natural hierarchies are always processed into materialized hierarchies (i.e. the entire hierarchy is computed and optimized for querying). When faced with poor data quality, this can sometimes be very challenging.

Some limitations of Tabular compared to Multidimensional:

You cannot define role-playing dimensions (this was especially useful for the Date dimension)
You cannot define Many-to-Many Relationships (there are some workarounds to get the same results but you will have to use very complex DAX formulas)
You cannot define a Parent-Child hierarchy
Custom Rollups are not supported (especially useful when aggregating data following a chart of accounts)
You cannot define Actions (Drill trough, Reporting, etc.)
Write-back is not supported

Overall, a lot less properties are available in the Tabular model than in the Multidimensional model. Again this leads to the same conclusion, Tabular model is easier to define but it contains some limitations compared to Multidimensional model.

Another very important distinction between both models is that with the Tabular model you can import directly from an existing PowerPivot model. This means that you can easily transfer a personal BI application developed by a Business Analyst to the IT department. We understand the path proposed by Microsoft which is to create footbridges between Business and IT by allowing end-users to experiment with a new solution on their own via PowerPivot and to involve IT later on and only for solutions that have reached sufficient maturity. With this new paradigm, Microsoft presents its solution to BI’s current challenge: "How to get a controlled BI environment that is agile enough to react to continuous business changes?”

Reporting Layer

Every reporting solution (Microsoft Excel, SQL Server Reporting Services, etc.) that creates reports on top of the SSAS (SQL Server Analysis Services) Multidimensional model will be able to do the same using the Tabular model. They both use the same connection interface and they are both capable of interpreting MDX. So every reporting tool that can generate MDX will be able to use the Tabular model. The Tabular model engine is capable of translating MDX into DAX.

The Multidimensional model on the contrary is not capable of interpreting DAX languages. As a direct consequence, a reporting tool that would generate DAX queries will not be able to use a Multi-Dimensional project as a source. There is currently only one tool that generates DAX queries, which is Power View.

Power View provides intuitive ad-hoc reporting for end-users. They can easily create and interact with views of data from data models based on tabular models. Power View is a browser-based Silverlight application launched from SharePoint Server 2010. The simplicity of use and the capability of creating dynamic reports in a few minutes, which remain dynamical once exported to PowerPoint, will make Power View a very appealing tool for Business Analysts. It however requires Sharepoint 2010 to be deployed in the company.

Figure 5: Creating a report with Power View

click to enlarge

A last remark, native DAX queries can sometimes be faster than MDX queries, which explains why under some circumstances the same report will be faster when executed from Power View instead of another reporting tool (i.e. Microsoft Excel).

Management Layer

There are no noticeable differences between the Tabular model and the Multidimensional model regarding the administration & management aspects. Both models contain partitioning, perspectives, dynamic security (one will be written in DAX and the other will be written in MDX), Management Views, Backups and AMO/XMLA API.

One difference is worth mentioning: the Tabular model does not support parallel processing of partitions which can have significant impact on the processing time.

Conclusion

The Tabular model looks very promising. Its ease of use and the ability to import a PowerPivot model is definitely something interesting to increase even further the agility of the BI environment within your company.

We can expect Microsoft to work further on the merger between the two models in future releases, but that has not been fully reached yet. This leaves us with the following question: "Which model to choose with SQL Server 2012” ?

Below we have highlighted the differences between the two models in terms of each layer and we can see that both products have advantages and disadvantages. To facilitate your choice we have summarized what we think are the key differentiators between both models:

Tabular Model

Multidimensional Model

Pros

More performant than OLAP in majority of the case.
Easier to develop than Multidimensional model.
Technology for the future.
Integration with Power Pivot.

Mature Technology.
Scalable Technology able to handle very large volume of data.
Able to cope with advanced modeling/ computations requirements.

Cons

New product so still enhancements to come.
Limited to RAM available (= midsized project).
Missing some advanced computations available with MDX.

Cannot be used with Power View.
No major innovations to expect in this product in the future.
Higher complexity than Tabular.

Table 2: Pros and Cons Tabular Model vs. Multidimensional Model

And what is then our final recommendation? At this time, it is impossible to come up with a simple recommendation for one model regardless the type of project since on the one hand we have a very promising product which is still new and on the other hand we have a very mature technology for which no major innovations are to be expected in the future.

Consequently, we recommend the following:

Do not migrate your existing multi-Dimensional models if your end-users are happy with it. There is a high probability that you will not be able to reproduce some existing functionalities which might be frustrating for your end-users. The only consideration that could lead to a rewrite in a Tabular model would be to be able to use Power View.
For every new project, choose the Tabular model unless (1) you have a very high volume of data or (2) you have very complex business requirements (typically advanced financial reporting).

posted @ 2016-02-19 16:35 俊凯阅读(259) 评论(0) 收藏举报

刷新页面返回顶部

俊凯

The choice between Tabular or Multidimensional models in SQL Server Analysis Services 2012

Introduction

Introducing the Tabular and Multidimensional model

Data Source Layer

Data Access Layer

Performance

Data Language Layer

The Data Model Layer

Reporting Layer

Management Layer

Conclusion

公告