-
When presenting, make sure you use the AdventureWorks database and OLAP cube
For years we have been using the Northwind (for SQL Server) and FoodMart (for OLAP) samples. Well, no more... SQL Reporting Services includes a new sample database: AdventureWorks2012.
Download the AdventureWorksDW zip file, which contains the AdventureWorks2012DW database (a backup of a flattened/de-normalised version of AdventureWorks2012) and the AdventureWorks2012 Analysis Services database, which includes the Reseller Sales cube. (7.41 MB - .zip)
View the instructions on setting up the above resources.
-
Make a separate database for your Data Warehouse
You should always base your cubes on a separate data warehouse database. For larger organizations this will also reside on a separate machine. This will be used as the store for all the data collected as time rolls on. Why would you do this?
The good things:
- Only keep the data you want
- Allows you to purge old data from the OLTP database
- Allows you to record time-dependent info (Slowly Changing Dimensions)
- Performance - queries don't hit the OLTP database
The bad things:
- Redundant disk space
- Schema changes must be mirrored
People often ask: Why make a new database and not just use Views/StoredProcs? The answer is that when using multiple databases, cleaning data gets harder and messier without the data warehouse in the middle (where a DTS package may exist for each data source into the warehouse).
- Figure: A good naming convention for your Data Warehouse is to tack "DW" onto the end of the name
An exception:
If you are not worried about performance issues and your data is clean, then you don't need to make another database.
Just add the business logic using views (e.g. if you have a bit field but you want to add meaning, add a view with a CASE statement).
Note: In SQL Server 2005 you would add this business logic in a new layer: Data Source Views within the UDM (Unified Dimensional Model)
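As a minimal sketch, a view like this adds meaning to a bit field with a CASE statement (the table and column names here are hypothetical):

```sql
-- Hypothetical example: give a bit field a friendly, reportable meaning
CREATE VIEW dbo.vwOrdersWithPaymentStatus
AS
SELECT  OrderID,
        CustomerID,
        CASE IsPaid
            WHEN 1 THEN 'Paid'
            ELSE 'Outstanding'
        END AS PaymentStatus
FROM    dbo.Orders;
```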
-
Do you follow a naming convention for your Data Warehouse?
Use Dim<Name> for the dimension tables (e.g. DimAccount).
(Tip: think of dimensions as your filters/WHERE/GROUP BY)
Use Fact<Name> for the fact tables (e.g. FactFinance).
(Tip: fact tables contain the numbers/measures - the transaction tables/COUNT()/SUM())
- Figure: Bad Example - no standard naming conventions, it is unclear what some tables are used for
- Figure: Good Example - we can clearly see which tables are dimensions and which are facts
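The convention can be sketched in DDL like this (the tables and columns are hypothetical, loosely modelled on the AdventureWorks names):

```sql
-- Dimension table: prefixed "Dim" (used in your filters/WHERE/GROUP BY)
CREATE TABLE dbo.DimAccount (
    AccountKey  INT IDENTITY(1,1) PRIMARY KEY,
    AccountName NVARCHAR(100) NOT NULL
);

-- Fact table: prefixed "Fact" (holds the numbers/measures)
CREATE TABLE dbo.FactFinance (
    FinanceKey  INT IDENTITY(1,1) PRIMARY KEY,
    AccountKey  INT NOT NULL REFERENCES dbo.DimAccount (AccountKey),
    Amount      MONEY NOT NULL
);
```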
-
Do you have a DimTimeDay table in your Data Warehouse?
Your data warehouse must have a DimTimeDay table with a record for every day. This way, when you want to make reports based on time (which you inevitably will), you can view data for periods other than standard calendar periods (e.g. the fiscal quarter).
How do you create this?
Well, there are two ways that we recommend:
- Using the Dimension Wizard in SQL Server 2005 (see Creating a Time Dimension in 10 Easy Steps for more information - Preferred Method)
- Using BI Accelerator
While we prefer using the Dimension Wizard in SQL Server 2005, you may find using BI Accelerator to be more
customizable; however, you should be aware that BI Accelerator is an
Excel sheet that creates an entire data warehouse (including DB, Cubes and
DTS to populate it) and not just the Time Dimension table.
Figure: Time Dimension table after completing the Dimension Wizard in SQL Server 2005
Figure: This is what your Time Dimension will look like after using BI Accelerator
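If you prefer to build the table by hand, a minimal T-SQL sketch looks like this (the table name matches the rule; the column set and the 1 July fiscal-year start are assumptions - adjust them to your business):

```sql
-- Hypothetical sketch: create and populate a DimTimeDay table
CREATE TABLE dbo.DimTimeDay (
    TimeKey       INT PRIMARY KEY,      -- e.g. 20050701
    [Date]        DATETIME NOT NULL,
    CalendarYear  INT NOT NULL,
    CalendarMonth INT NOT NULL,
    FiscalYear    INT NOT NULL,         -- assumes fiscal year starts 1 July
    FiscalQuarter INT NOT NULL
);

DECLARE @d DATETIME;
SET @d = '20040101';
WHILE @d <= '20051231'
BEGIN
    INSERT INTO dbo.DimTimeDay
        (TimeKey, [Date], CalendarYear, CalendarMonth, FiscalYear, FiscalQuarter)
    VALUES
        (CONVERT(INT, CONVERT(CHAR(8), @d, 112)),   -- yyyymmdd as the key
         @d,
         YEAR(@d),
         MONTH(@d),
         CASE WHEN MONTH(@d) >= 7 THEN YEAR(@d) + 1 ELSE YEAR(@d) END,
         (((MONTH(@d) + 5) % 12) / 3) + 1);         -- Jul-Sep = Q1, etc.
    SET @d = DATEADD(day, 1, @d);
END;
```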
-
Do you know which Performance you care about?
There is a SQL Server 2008 White Paper: Analysis Services Performance Guide.
Sometimes it is a trade-off, e.g. setting lots of aggregations makes user queries faster, but makes the cube slower to build and larger.
-
Always make the Lowest Level of a Dimension the Key
You should always have a unique key for every element of data. This means that the lowest level of a given dimension should be the key, as it will be unique and is already based on an existing unique key.
For example:
Figure: What is wrong with this? ...
Figure: ... The bottom level (called the Leaf Level) should always be the key.
Note: Remember to set Member Keys Unique = True and Member Names Unique = True
The first reason is performance - in the Cube Editor you can then successfully run Tools > Optimize Schema.
Note: This improves Cube Building Performance (always) and Querying Performance (for some cases).
The second reason is a smaller cube size, but you need to apply the next rule for that.
-
Always use Virtual Dimensions instead of Real Dimensions where possible
Figure: What is wrong with this? ...
Well, we are using Real Dimensions for "Product Color" and "Product Size", when they should be Virtual Dimensions of the Product Dimension.
There are 3 steps to doing that:
1) Add the Member Properties (e.g. Color and Size)
2) Delete the original Shared Dimension
3) Add the new Virtual Dimension using the wizard
Note: This improves Cube Building Performance (always) and Querying Performance (for most cases as the cube is physically smaller).
-
When you are using Real Dimensions, specify the "Depends On Dimension" property where applicable
A property which is often overlooked is "Depends On Dimension". This indicates to the aggregation engine that the dimensions are related, so more efficient aggregates will be built.
- Figure: In this example the Depends on Dimension property should be set to "Product"
Note: This improves Cube Building Performance (sometimes) and Querying Performance (for most cases because you have more efficient aggregates in the Cube).
-
BI - Always make sure the dimension's All Caption = All - *FIXED IN SQL 2008*
When you are carrying out reporting (using Reporting Services) based on an OLAP cube, you should change the "All Caption" property of each dimension to have the value "All". By default the caption for the dimension will be "All <dimension_name>" (e.g. All Products). This default is distracting when used in reporting solutions and doesn't particularly clarify anything anyway.
- Figure: Set the "All Caption" to read "All" instead of the distracting default
When you use this dimension in the parameter toolbar of Reporting Services, the default value looks horrible; it should be changed to a consistent value to facilitate fast scanning of the parameter values.
- Figure: Having set the All caption nicely, you can more easily see which products aren't using the defaults.
Read our rule on Controls - Do you include an '-All-' option in your ComboBoxes?
-
Do you always use Shared Dimensions?
When you're building an Analysis Server database, you should always use Shared Dimensions. This is because:
- all of your cubes can re-use them, making future cubes faster to develop
- they are only processed once
- Figure: This is the only place you should be creating dimensions. Stay away from Private dimensions hidden in the cube.
-
Do you always use spaces in names?
Always use spaces for Dimension Names, Level Names and Measures - this is a business tool and you want users to see friendly names.
-
Always make time scales clear - if using "financial year" then make it obvious
2004
...
- Month 06
2005
- Month 07
- Month 08
- Month 09
- Month 10
Figure: What is wrong with this?
3 things:
- If you are at the year level, the user doesn't know if it is financial or calendar
- When you are at the month level, the user doesn't know what year they are in
- Month names should be used - there are no issues with sorting, because when you create the dimension you specify the Order By to be Key
Therefore it would look like this:
FY 2004
- Jun 2004
FY 2005
- Jul 2004
- Aug 2004
- Sep 2004
- Oct 2004
Figure: This is the type of data we want. Make names clear and clarify that you are referring to the financial year.
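The labels and sort key above can be derived in T-SQL along these lines (a sketch that assumes the DimTimeDay table from the earlier rule and a fiscal year starting 1 July):

```sql
-- Hypothetical sketch: derive clear financial-year and month labels
SELECT  [Date],
        'FY ' + CAST(CASE WHEN MONTH([Date]) >= 7
                          THEN YEAR([Date]) + 1
                          ELSE YEAR([Date]) END AS CHAR(4)) AS FiscalYearName,
        -- e.g. "Jul 2004": month name plus calendar year, so the year
        -- is always visible at the month level
        LEFT(DATENAME(month, [Date]), 3) + ' '
            + CAST(YEAR([Date]) AS CHAR(4))           AS MonthName,
        -- sort by this key so "Order By Key" gives chronological order
        YEAR([Date]) * 100 + MONTH([Date])            AS MonthKey
FROM    dbo.DimTimeDay;
```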
-
Move your OLAP MetaData from Access to SQL Server
By default, Analysis Server uses an Access database to store the meta data for each of its databases. Presumably this is because you don't necessarily have SQL Server on the same machine as Analysis Server. You should change this repository to point to your SQL Server, because it is more secure and robust, and allows you to back up the meta data with your normal SQL Server backups.
To migrate the repository of meta data:
- In Enterprise Manager, create a new database to use for the meta data. Call it "OLAPMetaData".
- In Analysis Manager, right-click the server, select "Migrate Repository" and point it to the database you created.
Figure: Move the MetaData repository away from the default Access database into SQL Server
At the end you can also double-check that it worked by checking the Repository Connection String in the context menu shown above.
-
When designing don't compute cube aggregations
When prompted about whether to process the aggregations for the cube while you are designing, click No to save time. However, when deploying or testing query performance, click Yes.
- Figure: Click No while designing, click Yes when deploying or testing performance.
-
Do you hide zero values when you are doing financial reports?
Financial reports can be very complicated, with a large number of figures showing, and we only want to focus on the significant figures. Zeros can be ignored, so we hide them to make the other figures stand out.
- Figure: Bad Example - a lot of zeros are showing in the grid, we cannot see the significant figures
- Figure: Good Example - zeros are hidden, easy to focus on the values
To do this in an Excel sheet, you can simply select the cells and format them as "Accounting"
- Figure: Excel - choose "Accounting" as the cell format
-
Do you remove decimal places from charts?
Graphs and charts are used to show the big picture or a trend in the data. Decimal places in such reports are not useful at all - they are a waste of space. So you should always remove decimal places from charts.
- Figure: Bad Example - decimal places on a chart are a waste of space
- Figure: Good Example - decimal places are removed from the chart
-
Do you use conditional formatting to visualize your data?
If you are using Excel 2007, there is a very exciting new feature that allows you to visualize your data in place. It lets you view your data visually, making it very easy to see differences at a glance.
- Figure: Bad Example - no visualization, differences in the data are hard to spot
- Figure: Good Example - visualized data, very easy to read and understand the differences
You can simply choose "Conditional Formatting" from the tool bar in Excel 2007 to do this.
- Figure: Choose "Conditional formatting - Data Bars" to enable data visualization
-
Can you use the super cool Microsoft Data Mining Addin?
Microsoft SQL Server 2005 Data Mining Add-ins for Microsoft Office 2007 (Data Mining Add-ins) allow you to take advantage of SQL Server 2005 predictive analytics in Office Excel 2007 and Office Visio 2007.
- Figure: Simply click "Highlight Exceptions" to see data that looks abnormal
- Figure: Click "Forecast" on this one
-
Do you avoid defaulting data?
In Business Intelligence it is not preferable to default data (e.g. Country=USA). So many web pages do this, and it influences the data - and ruins your Business Intelligence reports - you think all your users are from the USA, when really they have just left the default.
Check rule Do you avoid "Data Junk"? in Rules to Better Interfaces.
-
Do you always sort dates in Excel in descending order from right to left?
When you're working with spreadsheets in Excel, be sure to sort dates in descending order, from right to left.
- Figure: Bad Example - Dates are sorted from left to right.
- Figure: Good Example - Dates are sorted from right to left.
-
Do you check the cube has been processed?
Always check that the cube has been processed before you use it.
- Figure: proczsValidateSSWData2005Cube_ver.1-01.
-
Do you know where to put your KPIs? (Cube or PerformancePoint Server)
PerformancePoint Services 2012 gives you a nice GUI to build KPIs if you are not familiar with MDX. The bad news is that these KPIs can't be used by any other cube browsers (e.g. Excel).
So the question is... do you put the KPI in the cube (and link it in PerformancePoint Services 2012), or directly into PerformancePoint Services 2012?
- Figure: Bad Example - PerformancePoint Services - Dashboard Designer - Creating a KPI - Very easy to use
The cube gives you the flexibility to write whatever KPIs you want (including trends), but you need to know MDX.
- Figure: Good Example - BIDS - Creating a KPI with MDX statements - ultimate flexibility, steep learning curve
Ultimately, having the KPIs in the cube makes more sense. It keeps them centralized, lets you link them in PerformancePoint Services, and potentially allows other cube browsers (like Excel) to access them.
- Figure: The reason to put your KPIs in the cube
-
Do you build a layer of views in the data warehouse that has the business rules and calculations?
Having the cube work from views is the first step in ensuring the stability of the cube. As the cube matures and the business refines what it needs to report on, the cube will change and have facts and dimensions added or removed from it.
By using views developers can hide the change in schemas and always present a consistent interface for the cube to work with.
For example, you can put the calculation of the GST component as a calculated field in a view. However, our preference is to create a denormalized field to store this data. (See our rule Do you use triggers for denormalized fields?)
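As a sketch, such a view might look like this (the fact table, columns and the 10% GST rate are assumptions for illustration):

```sql
-- Hypothetical sketch: a view that applies the GST business rule
-- so the cube never needs to know the calculation
CREATE VIEW dbo.vwFactSales
AS
SELECT  SalesKey,
        ProductKey,
        SalesAmount,
        SalesAmount * 0.10 AS GSTComponent,      -- assumes 10% GST
        SalesAmount * 1.10 AS SalesAmountIncGST
FROM    dbo.FactSales;
```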
-
Do you use logging in your ETL and SSIS packages?
The data warehouse is the one source of truth (see Do you fix data integrity problems before it gets to the data warehouse?), so to prevent bad data from getting in there we log it using SSIS.
You can log any exceptions to a table (see http://msdn.microsoft.com/en-us/library/ms140246.aspx). This way you can allow the good data to be processed and filter out the bad data.
You can then report on the errors and gather metrics about them.
This will help improve the quality of the data in the warehouse as the cube matures.
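A minimal sketch of such a log table, as a landing spot for rows redirected by an SSIS error output (the table and column names are hypothetical):

```sql
-- Hypothetical sketch: a table to catch rows rejected during the ETL,
-- plus which package rejected them and when
CREATE TABLE dbo.ETLErrorLog (
    ErrorLogID  INT IDENTITY(1,1) PRIMARY KEY,
    PackageName NVARCHAR(100) NOT NULL,
    ErrorCode   INT           NOT NULL,
    ErrorColumn INT           NULL,
    RawRecord   NVARCHAR(MAX) NULL,     -- the rejected row, as text
    LoggedAt    DATETIME      NOT NULL DEFAULT GETDATE()
);

-- Report on the errors, e.g. as a data-quality metric per package
SELECT PackageName, COUNT(*) AS Errors
FROM   dbo.ETLErrorLog
GROUP BY PackageName;
```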
-
Do you fix data integrity problems before it gets to the data warehouse?
The data warehouse is the one source of truth. Data should have integrity and be verified before it enters the warehouse.
Your first line of defence is verifying that the data in your source databases is correct. This can be achieved using a procValidate to check the data. See our rule Do you validate each "Denormalized Field" with procValidate?
Your next line of defence is the ETL process, where you can check for data integrity issues (e.g. missing keys or values). At this point you can choose to fail the process or proceed with the valid records (logging the invalid ones). See our rule Do you use logging in your ETL and SSIS packages?
The last thing you can do, and at this point the data is in the cube already, is to hide these errors with "unknown members".
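A procValidate-style check might look like this (a sketch; the fact and dimension tables are hypothetical, and the naming follows the procValidate convention mentioned above):

```sql
-- Hypothetical sketch: count fact rows whose keys have no matching
-- dimension record - 0 means clean, anything else should fail the ETL
CREATE PROCEDURE dbo.procValidateFactSalesProductKeys
AS
BEGIN
    SELECT COUNT(*) AS MissingKeys
    FROM   dbo.FactSales f
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dbo.DimProduct d
                       WHERE  d.ProductKey = f.ProductKey);
END;
```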
-
Do you add a diagram for each fact table?
Adding a diagram helps users clearly see which dimensions are related to which fact tables.
- Figure: Bad Example - There are no diagrams at all to help the user see how facts are related to dimensions
- Figure: Good Example - There is a diagram for each fact table