I’ve explored Data Modules in depth on this blog over the last year with the hope of showing you how awesome data modeling in Cognos Analytics can be if you really embrace it. There is, however, an additional piece of the Cognos data puzzle that you need to understand to unlock the full potential of the platform – the Data Set. So let’s answer the question – just what are Cognos Data Sets?
The IBM Blueview Data Set Series
What is a Data Set in Cognos?
Simply put, a Data Set is data source type in Cognos Analytics that contains data extracted from one or more sources and stored within the Cognos system itself as an Apache parquet file. The parquet file is then loaded into application server memory at run-time on an as-needed basis. This (usually) greatly enhances interactive performance for end users while reducing load on source databases. When combined with Data Modules, Data Sets offer incredible out-of-the-box capabilities like automatic relative time, easy data prep and custom table creation.
Data Sets are also extremely easy to build from your existing Framework Manager or Transformer packages making them an excellent option for getting the most out of your legacy Cognos 10 models. In fact this is probably the #1 use case for the Data Set technology and is the absolute fastest way to modernize your environment and turn Cognos into a rapid-fire data prep and visualization machine.
I’m going to write a full blog post about the exact situations that suggest a Data Set solution, but in short you should consider using Data Sets whenever:
- Excellent interactive performance is a critical part of your deliverable
- You wish to limit extremely costly SQL queries by re-using results
- You must join multiple data sources together or accomplish other ETL tasks within Cognos rather than source systems
- Existing Framework Manager or Transformer models are too complex or too slow for self-service
- Someone tells you Cognos is slow but Tableau or Power BI are fast (those tools use Data Set-like technologies to enhance interactive performance)
- You just want to do something really cool
Which Features can use a Data Set
There is one small limitation to Data Sets – while they function as a data source for all Cognos Analytics features they cannot be used directly to author reports. The solution to this is simple – wrap them in a Data Module and import the Data Module to Report Authoring. You should be doing this anyway for all Data Sets as it provides maximum deployment flexibility and ease of upkeep. I will cover best practice topics like this in a future article.
How to Build a Data Set
Building a Data Set is simple, especially if you have existing Framework Manager or Transformer models available in Cognos. In fact Data Sets can only be built on top of existing models or Data Modules- not directly on data servers. IBM has helpfully hidden the ‘Create data set’ capability in the ‘more’ menu of model objects in the environment, so it’s surprisingly easy to miss.
Cognos Data Set Creation
Creating a Data Set is a straightforward process, especially for experienced Cognoids. The UI is actually a re-skinned version of Report Authoring and many of your favorite tricks will work here. Building a Data Set is as simple as dragging columns into the list object, saving and loading data. Of course there are additional options you can take advantage of.
- Source View: Browse the tables and fields in your data source exactly as you would in Report Authoring
- Data List: The data table shows a live view of the Data Set as you build it. It queries new data as you make changes
- On Demand Toolbar: The on demand toolbar appears when you click on a column, giving you the ability to filter and sort.
- Filtering: Filters help you focus the data in your Data Set to just what you need. Fewer rows = better performance.
- Sorting: Sorting by the columns most used in report or dashboard filters (for example, time data) can greatly improve performance
- Query Item Definition: The query item appears when you double click a column header. You have access to query item functionality from Report Authoring, which means you can really accomplish a lot from this popup.
- Preview: Unchecking the preview button switches the data table into preview mode which turns off automatic data query as you make adjustments to your Data Set.
- Summarize and Row Suppression: The summarize function rolls your data up to the highest level of granularity, for example rolling daily data up to the month.
Row suppression is honestly a mystery to meSpecial thanks to Jason Tavoularis at IBM for an explanation – row suppression in data sets only applies to dimensional data sources and does the same thing as using row suppression in Report Authoring.
Once you’ve imported your desired data, set your filters, sorts and summaries and maybe added a few calculations for good measure it’s time to save, load and deploy your Data Set.
Saving and Loading a Data Set
When you save a Data Set you will see the option to ‘Save and load data.’ This will allow you to select a directory in Cognos to house the Data Set object. It will also issue one or more queries to retrieve data and populate a parquet file. This file is stored in Cognos and loaded into memory upon request when users access the Data Set. Check out the ‘Flint’ section of this in depth article to understand what happens under the hood during Data Set creation and Query
Scheduling and Managing Data Sets
Data Sets only contain data from their last load; it is good practice to get in the habit of scheduling and monitoring Data Sets to ensure they contain relevant data and continue to perform well.
Data Set Scheduling Options
The easiest way to schedule Data Sets is via the ‘schedules’ tab in Data Set properties. Data Sets and Reports share all the same scheduling options, including the ‘by trigger’ option. Scheduling via a trigger makes it easy to ensure Data Sets only load after your ETLs complete. This works great for simple or one-off scheduling tasks.
For more complex schedules, Data Sets are available in the Job feature. Again, they function as if they were reports as far as building Jobs is concerned.
Data Set Management
The Data Set properties screen contains the info you need to effectively maintain fresh and performant data for your end users. At the top of window you can see the last load date of the Data Set, while expanding the ‘advanced’ exposes the following:
- Size: The compressed size of the parquet file on disk
- Number of rows: The number of rows in your data set. Keep this under ~8 million for best performance
- Number of columns: The number of columns in your data set. No hard limit here, just don’t include columns you don’t need
- Time to refresh: The time it takes for the Data Set to load
- Refreshed by: The name of the person who last refreshed the data set
I will write a longer post about Data Set tuning and troubleshooting. For now it’s key to keep in mind the row and column suggestions above. And while ‘Time to refresh’ is important, this represents the time it takes to load data and has no impact on the performance end users will experience. The beauty of Cognos Data Sets is that by front-loading the processing, you can create a complex result set that takes hours to load but offers sub-second response time to end users.
A Real World Example of Data Sets in Action
I have used Data Sets in many successful client engagements to greatly improve performance, simplify presentation or accomplish ETL tasks in an afternoon that their DW team had put off for years. Here is a simple example for you.
The Problem: Metrics, metrics everywhere!
This customer came to us with a very, very common problem. The sales support team had identified a need for some new advanced metrics and built out a prototype dashboard. However, the underlying data divided between two Microsoft SSAS cubes and a handful of tables in the EDW. The data warehouse had given an estimate of many months to create the necessary tables and cubes.
The Solution: Cognos Analytics Data Sets
The customer brought in PMsquare on a 40 hour contract to make this happen. If your initial reaction to that contract length is skepticism I don’t blame you. In Cognos 10 this would have been impossible. However thanks to Data Sets I was able to do the following:
- Extract the needed data from each SSAS cube and the EDW into a Data Set. There were 3 Data Sets total, one from each data source.
- Join the Data Sets together into a Data Module and add in all the Data Module goodies like relative time
- Create a new, final polished Data Set from that Data Module to simplify presentation and improve performance
- Build out the customer’s dashboard
The customer was extremely satisfied with the end result, which looked something like this:
Cognos Analytics Data Sets in Summary
As you can see, I really was able to accomplish months of work in a single week using Data Sets. Obviously this technology cannot replace all ETL tasks however Cognos Analytics is now an option for low to medium complexity transformations. And you now have a slam-dunk option for rapidly simplifying presentation or improving performance vs even the simplest database view.
Be sure to check back next Tuesday, 4/14/2020 for part two of this series: When To Use A Data Set!