I’ve explored Data Modules in depth on this blog over the last year with the hope of showing you how awesome data modeling in Cognos Analytics can be if you really embrace it. There is, however, an additional piece of the Cognos data puzzle that you need to understand to unlock the full potential of the platform – the Data Set. So let’s answer the question – just what are Cognos Data Sets?
The IBM Blueview Data Set Series
- What are Cognos Data Sets?
- When to Use Cognos Data Sets
What is a Data Set in Cognos?

Simply put, a Data Set is a data source type in Cognos Analytics that contains data extracted from one or more sources and stored within the Cognos system itself as an Apache Parquet file. The Parquet file is then loaded into application server memory at run-time on an as-needed basis. This (usually) greatly enhances interactive performance for end users while reducing load on source databases. When combined with Data Modules, Data Sets offer incredible out-of-the-box capabilities like automatic relative time, easy data prep and custom table creation.
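To make the idea concrete, here is a toy Python sketch of the extract-and-cache pattern Data Sets use: run the expensive source query once, keep the result column-wise in memory, and answer later queries without touching the source. This is purely illustrative – it is not Cognos internals or its actual Parquet handling, and the sample data is invented.

```python
# Toy illustration of the extract-and-cache pattern behind Data Sets:
# pull rows from a "source" once, keep them column-wise in memory,
# and serve later queries without hitting the source again.

def extract(source_rows, wanted_columns):
    """Run the expensive source query once; store the result by column."""
    return {col: [row[col] for row in source_rows] for col in wanted_columns}

def query(cache, column, predicate):
    """Answer a filter from the in-memory cache - the source is not hit."""
    keep = [i for i, v in enumerate(cache[column]) if predicate(v)]
    return {col: [vals[i] for i in keep] for col, vals in cache.items()}

# Hypothetical source data standing in for a database table.
source = [
    {"region": "East", "year": 2019, "revenue": 100},
    {"region": "West", "year": 2020, "revenue": 150},
    {"region": "East", "year": 2020, "revenue": 120},
]

cache = extract(source, ["region", "year", "revenue"])   # one-time load
recent = query(cache, "year", lambda y: y >= 2020)       # served from memory
print(recent["revenue"])  # [150, 120]
```

The point of the sketch: after the one-time extract, every interactive filter, sort or aggregation runs against memory, which is why the source database sees no further load.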
Data Sets are also extremely easy to build from your existing Framework Manager or Transformer packages, making them an excellent option for getting the most out of your legacy Cognos 10 models. In fact, this is probably the #1 use case for the Data Set technology and the absolute fastest way to modernize your environment and turn Cognos into a rapid-fire data prep and visualization machine.
I’m going to write a full blog post about the exact situations that suggest a Data Set solution, but in short you should consider using Data Sets whenever:
- Excellent interactive performance is a critical part of your deliverable
- You wish to limit extremely costly SQL queries by re-using results
- You must join multiple data sources together or accomplish other ETL tasks within Cognos rather than source systems
- Existing Framework Manager or Transformer models are too complex or too slow for self-service
- Someone tells you Cognos is slow but Tableau or Power BI are fast (those tools use Data Set-like technologies to enhance interactive performance)
- You just want to do something really cool
Which Features Can Use a Data Set?
There is one small limitation to Data Sets – while they function as a data source for all Cognos Analytics features they cannot be used directly to author reports. The solution to this is simple – wrap them in a Data Module and import the Data Module to Report Authoring. You should be doing this anyway for all Data Sets as it provides maximum deployment flexibility and ease of upkeep. I will cover best practice topics like this in a future article.
How to Build a Data Set

Building a Data Set is simple, especially if you have existing Framework Manager or Transformer models available in Cognos. In fact, Data Sets can only be built on top of existing models or Data Modules – not directly on data servers. IBM has helpfully hidden the ‘Create data set’ capability in the ‘more’ menu of model objects in the environment, so it’s surprisingly easy to miss.
Cognos Data Set Creation
Creating a Data Set is a straightforward process, especially for experienced Cognoids. The UI is actually a re-skinned version of Report Authoring and many of your favorite tricks will work here. Building a Data Set is as simple as dragging columns into the list object, saving and loading data. Of course there are additional options you can take advantage of.

- Source View: Browse the tables and fields in your data source exactly as you would in Report Authoring
- Data List: The data table shows a live view of the Data Set as you build it. It queries new data as you make changes
- On Demand Toolbar: The on demand toolbar appears when you click on a column, giving you the ability to filter and sort.
- Filtering: Filters help you focus the data in your Data Set to just what you need. Fewer rows = better performance.
- Sorting: Sorting by the columns most used in report or dashboard filters (for example, time data) can greatly improve performance
- Query Item Definition: The query item appears when you double click a column header. You have access to query item functionality from Report Authoring, which means you can really accomplish a lot from this popup.
- Preview: Unchecking the preview option turns off the automatic data query that normally runs as you make adjustments to your Data Set.
- Summarize and Row Suppression: The summarize function rolls your data up to the highest level of granularity, for example rolling daily data up to the month. Row suppression was honestly a mystery to me – special thanks to Jason Tavoularis at IBM for an explanation: row suppression in Data Sets only applies to dimensional data sources and does the same thing as using row suppression in Report Authoring.
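As an example of what the query item definition popup can do: it accepts the same expression language as Report Authoring, so a hypothetical margin calculation (the column names here are invented for illustration) could be defined as:

```
CASE
    WHEN [Revenue] = 0 THEN 0
    ELSE ([Revenue] - [Cost]) / [Revenue]
END
```

Baking a calculation like this into the Data Set means it is computed once at load time instead of on every report run.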
Once you’ve imported your desired data, set your filters, sorts and summaries, and maybe added a few calculations for good measure, it’s time to save, load and deploy your Data Set.
Saving and Loading a Data Set

When you save a Data Set you will see the option to ‘Save and load data.’ This will allow you to select a directory in Cognos to house the Data Set object. It will also issue one or more queries to retrieve data and populate a Parquet file. This file is stored in Cognos and loaded into memory upon request when users access the Data Set. Check out the ‘Flint’ section of this in-depth article to understand what happens under the hood during Data Set creation and query.
Scheduling and Managing Data Sets
Data Sets only contain data from their last load; get in the habit of scheduling and monitoring them to ensure they contain relevant data and continue to perform well.
Data Set Scheduling Options

The easiest way to schedule Data Sets is via the ‘schedules’ tab in Data Set properties. Data Sets and Reports share all the same scheduling options, including the ‘by trigger’ option. Scheduling via a trigger makes it easy to ensure Data Sets only load after your ETLs complete. This works great for simple or one-off scheduling tasks.
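For the trigger option, the last step of your ETL job fires a named trigger and Cognos runs every schedule listening for it. Cognos ships a small trigger utility for this; the path, dispatcher URL, credentials and trigger name below are all placeholders, so check the utility and arguments in your own install before using anything like it:

```shell
# Final step of an ETL job: fire the Cognos trigger "etl_complete"
# so that any Data Set schedules listening for it start loading.
# URL, credentials and trigger name are illustrative placeholders.
./trigger.sh http://cognos-server:9300/p2pd/servlet/dispatch etl_user etl_password etl_complete
```

This keeps Data Set loads in lockstep with your warehouse: the data refreshes as soon as the ETL finishes, never before.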
For more complex schedules, Data Sets are available in the Job feature. Again, they function as if they were reports as far as building Jobs is concerned.
Data Set Management

The Data Set properties screen contains the info you need to effectively maintain fresh, performant data for your end users. At the top of the window you can see the last load date of the Data Set, while expanding the ‘advanced’ section exposes the following:
- Size: The compressed size of the parquet file on disk
- Number of rows: The number of rows in your data set. Keep this under ~8 million for best performance
- Number of columns: The number of columns in your data set. No hard limit here, just don’t include columns you don’t need
- Time to refresh: The time it takes for the Data Set to load
- Refreshed by: The name of the person who last refreshed the data set
I will write a longer post about Data Set tuning and troubleshooting. For now it’s key to keep in mind the row and column suggestions above. And while ‘Time to refresh’ is important, this represents the time it takes to load data and has no impact on the performance end users will experience. The beauty of Cognos Data Sets is that by front-loading the processing, you can create a complex result set that takes hours to load but offers sub-second response time to end users.
A Real World Example of Data Sets in Action
I have used Data Sets in many successful client engagements to greatly improve performance, simplify presentation or accomplish ETL tasks in an afternoon that a client’s DW team had put off for years. Here is a simple example.
The Problem: Metrics, metrics everywhere!
This customer came to us with a very, very common problem. The sales support team had identified a need for some new advanced metrics and built out a prototype dashboard. However, the underlying data was divided between two Microsoft SSAS cubes and a handful of tables in the EDW. The data warehouse team had estimated many months of work to create the necessary tables and cubes.
The Solution: Cognos Analytics Data Sets
The customer brought in PMsquare on a 40-hour contract to make this happen. If your initial reaction to that contract length is skepticism, I don’t blame you – in Cognos 10 this would have been impossible. However, thanks to Data Sets, I was able to do the following:
- Extract the needed data from each SSAS cube and the EDW into a Data Set. There were 3 Data Sets total, one from each data source.
- Join the Data Sets together into a Data Module and add in all the Data Module goodies like relative time
- Create a new, final polished Data Set from that Data Module to simplify presentation and improve performance
- Build out the customer’s dashboard
The customer was extremely satisfied with the end result, which looked something like this:

Cognos Analytics Data Sets in Summary
As you can see, I really was able to accomplish months of work in a single week using Data Sets. Obviously this technology cannot replace all ETL tasks; however, Cognos Analytics is now an option for low- to medium-complexity transformations. And you now have a slam-dunk option for rapidly simplifying presentation or improving performance versus even the simplest database view.
Be sure to check back next Tuesday, 4/14/2020 for part two of this series: When To Use A Data Set!
Thank you for this great article, but I have a few comments.
Reading your article, one gets the impression that the Data Set is a miraculous solution for the performance problems you supposedly MUST have with FM and Transformer models, but this is not entirely true.
You write “Existing Framework Manager or Transformer models are too complex or too slow for self-service”. If the FM model is too complicated and too slow…
In addition, you say that this is a great option for getting the most out of your legacy Cognos 10 models. You can’t say it like that.
Legacy models need to be modernized to meet modern needs – DQ migration, performance optimization, use of columnar storage. This is the first step in migrating to CA 11.
The use of Framework Manager itself does not affect performance; the query execution load is shifted to the database engine, and response times depend on proper sizing of the database.
As for the Power Cubes, they give excellent response times, and extracting Data Sets from them will most often not bring any performance improvement.
If you build a Data Module on a database that does not offer good response times, performance will not be better.
The Data Set is a valuable tool, but – once again – it should not be set in opposition to models from previous versions of Cognos; instead, explain why it can be useful, as you do elsewhere.
Regarding your example, it was surely not possible to carry it out with Cognos 10 in this exact form, but with a Transformer cube it is quite possible: queries on each source in place of Data Sets, and the Transformer model in place of the Data Module. It’s the same solution.
The three sources you cite are BI sources, and I can hardly believe it takes months of development to create a datamart from three sources of this type. If it does, that is – in my opinion – more for procedural reasons than for development time. By moving development onto the user’s desktop, it is no longer a matter for IT but for the end user, and you can develop in an agile way.
Just don’t forget that Data Set management is a black box: files are stored in the content store and there is no way to optimize their behavior. IT management doesn’t like that. This is one of the reasons why Data Sets are not made to handle large volumes of data, either.
But it’s a great tool – mixed with DM – for data analysts and advanced users.
Jerzy – thank you for this thoughtful comment! I agree with most of your objections in principle. It is better to undergo a redesign that takes into account FM model structure, database type, sizing and structure, etc. when looking to improve performance. It was possible in Cognos 10/8 to merge multiple sources together in a Transformer cube, and Data Set management is not ideal (although there are more options for tuning Data Sets than most people realize: you can change where they are stored, and you can access the Spark console to at least see how they are processing).
The thing is, most of my clients can’t or won’t do those things. The example I cited was real: the sales organization had an outstanding request to the IT BI team to merge those data sources into a new data mart. You are absolutely correct – they failed to even start the work for something like 14 months due to procedural issues. But even once the work had started, the team had no ability to create a new data mart in 40 hours. It would probably have been 4-6 weeks of work given the resources available.
So my argument is not really that Data Sets are superior to Framework Manager or Transformer, but rather that they are so much easier to use that skilled BI practitioners can accomplish a much higher volume of work when using them compared to alternative – and technically superior – solutions.
The number one problem for my clients is under staffing and lack of technical skills. For these clients Data Sets offer a huge advantage for the BI team, not just end users. For clients with large and highly skilled BI departments they serve an important role in prototyping and agile delivery but you are right – they are not a replacement for high performance databases and well crafted models.
Please continue to read and comment on this blog – I appreciate your perspective very much!
The best use of Cognos is when the data is prepared by a competent IT department and properly modeled in Framework Manager or similar. While it is great that Cognos 11 offers modelling tools for the end user, I don’t want to waste time preparing data when there is an IT department with brilliant ETL capabilities. When data is professionally modeled, creating reports and querying data takes only a few minutes. Data Modules are great for a proof of concept or for a small business without proper IT support – though small businesses would probably be better off with Power BI or Tableau.
I’ve started to use Data Sets and I’m finding they could be a powerful solution. The main drawback is that the capability to define the contents of a Data Set is extremely limited: there are no options for complex filtering or joining data. Before 11.1.5 there were some hacks to use “Reporting” to define the Data Set contents.
From 11.1.5 I found that query calculations could be added by dragging an existing data item and changing its definition. This also allows for creating more complex data filtering structures. No solution yet for joining data (probably in 11.1.7).
Did you see that 11.1.7 does indeed solve this issue? Very cool!
I stumbled upon this article looking for ways to improve run times of reports pulling from complex relationships in framework objects. We have grown to have everything mapped in our framework from different data sources, but all our reporting that uses this cross-source data takes forever to run. All our queries are fresh pulls from the source (real-time data). This opens up a whole new world of improved performance for reports that don’t necessarily need real-time data. This is the best thing I’ve seen in years. Going to start playing with it soon!
You are going to love them, trust me! Especially if you are on 11.1.7 – the 11.1.7 release had a great little data set overhaul that added a ton of great data blending features.
hello, how can you solve the permissions on the dataset? can a given user see only part of the data?
Yes, but you need to ensure that the username or a group/role is included in the data itself. Once that’s done you can write a filter in a Data Module that pulls only the rows associated with the ID from the session. I’ll do a YouTube video on how to do this.
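A sketch of that pattern: assuming the data includes a security table keyed by username (the [Security] table and [UserName] column here are invented for illustration), a Data Module filter can embed the session user name with a Cognos macro:

```
[Security].[UserName] = #sq($account.personalInfo.userName)#
```

The `#sq(...)#` macro quotes the session parameter as a string at query time, so each user only sees the rows tagged with their own ID.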
Hi,
What are the Memory Limitations for Data Sets?
You’re going to want at least 32GB available on the application server, and you may want to increase the Compute service max Java heap size to accommodate Data Sets if performance indicates. The most important things: only include the data you need, sort the fields that will be used as filters in the Data Set builder, and keep it to around 8 million rows tops.
Thanks for the article, it was very useful! 🙂
Question about it, is there a way I can retrieve this data set metadata information, such as ‘Date Refreshed’ property to the report?
That is a good question… I don’t know, I’ll try to find out!
Row-level or object-level security – can you please post the YouTube video here if you have done it, Ryan?
How do you change the query for the dataset?
How do you replace the query data on the page? I have my query joins completed but the page is pointing at the 1st query and I need to change it.