IBM Blueview

Cognos Analytics and all things IBM

Data Sets

When To Use Cognos Data Sets

April 14, 2020 by Ryan Dolley

I introduced you to Cognos Data Sets in Part 1 of this series and you recognize some intriguing possibilities. You want the massively improved performance, simple presentation for end users and quick road to Cognos modernization that Data Sets offer, but you’re not sure how to start. Well I’m here to help you understand when to use Cognos Data Sets – how to recognize each situation and how Data Sets help.


Prepare Data for Advanced Analytics

Preparing data for Cognos Analytics Explore is a great example of when to use Cognos Data Sets
Advanced Analytics require high quality data to function properly

Advanced Analytics features like forecasting in Cognos often work much better with Data Sets than other data source types. A narrowly focused, in-memory source dramatically enhances the speed, interactivity, accuracy and usefulness of features like Explore or the AI Assistant. This is especially true compared to giant Framework Manager models.

Recognizing poorly prepared data

The need to prepare data is most apparent when the advanced features of Cognos Analytics fail to provide meaningful suggestions or build garbage output. This manifests in the following ways:

  • The AI Assistant cannot understand which instance of ‘customer’ you want and picks it from an incorrect namespace
  • The AI Assistant makes very poor suggestions
  • AI generated visualizations do not filter properly because they contain different versions of the same data item – ‘customer’ from 3 different tables
  • The ‘generate dashboard’ command creates a nonsense dashboard
  • The forecasting feature does not appear in line, bar or column charts
  • Explore takes a very long time to load or interact with

Using Data Sets for advanced analytics

Data Sets make it easy to simplify the data used for advanced analytics. Because they are quick to make and perform very well, I use them any time I want a great experience for my end users. The goals of using Data Sets for advanced analytics are:

  • Remove any duplicates in the data. Each field should occur only once
  • Identify a specific subject of analysis and include only measures and fields that help understand that subject. The Explore feature helps immensely with this
  • Help the AI Assistant shine by producing meaningful results
  • Improve performance across the board, especially in Explore

Improve Performance of Existing Models

Most long time Cognos customers have at least some models that perform slowly. Maybe it’s logic processing at run time. Maybe it’s the underlying database. Whatever the cause, you can’t let your end users watch a wheel spin for minutes on end whenever they make a slight change to a dashboard. Oftentimes customers solve this problem by locking Dashboards, Stories, Explore and anything else new and cool away from users. That’s a big mistake.

Recognizing poor performance in Cognos

This is fairly straightforward. You know performance is poor because Cognos is slow, right? Generally yes but there are some situations where poor performance manifests in surprising ways.

  • People call you and say ‘Cognos is slow’
  • You check Thrive and it tells you ‘Cognos is slow’
  • User adoption for self service features is lacking
  • Schedules are frequently late or are challenging to maintain
  • Source systems process dozens or hundreds of similar queries
  • You just keep staring at that damn spinning wheel

Improving Performance with Data Sets

This is an area where Data Sets shine because you’ve already got a model with all sorts of embedded business logic. It’s extremely easy to generate data sets as needed, and they automatically inherit all that Framework Manager logic. Very little data rework results in huge performance gains. Your goal is to:

  • Take advantage of in-memory processing and server RAM
  • Summarize detailed data to a higher grain to decrease row counts and better target analysis
  • Sort data by commonly filtered data items
  • Filter out unnecessary records
  • Decrease load on underlying databases
  • Banish the spinning wheel forever

Imagine a query that processes for 15 minutes and runs 100 times a day. You are spending 1,500 minutes processing that data. By moving to a data set, the query runs once for 15 minutes to load the data. All subsequent executions load in ~1 second as data pulls from memory, not the database. That is about 15 minutes for the load plus under 2 minutes for 100 reads, so you just saved roughly 1,483 minutes of processor time. And you saved the sanity of your end users.
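The arithmetic in this example can be checked with a short sketch. The 15 minute query, 100 daily runs and ~1 second in-memory read come from the example above; the function name and the choice to count the one-time load on top of 100 one-second reads are my own assumptions.

```python
def minutes_saved(query_minutes: float, runs_per_day: int, cached_seconds: float) -> float:
    """Daily processing minutes saved by loading a Data Set once and reading from memory."""
    # Without a Data Set, every execution pays the full query cost.
    direct = query_minutes * runs_per_day
    # With a Data Set, pay the query cost once at load time,
    # then roughly cached_seconds for each execution.
    with_data_set = query_minutes + runs_per_day * cached_seconds / 60
    return direct - with_data_set


saved = minutes_saved(query_minutes=15, runs_per_day=100, cached_seconds=1)
print(f"{saved:.0f} minutes saved per day")
```

Under those assumptions the saving comes to roughly 1,483 minutes a day. Swap in your own query time and run count to see whether a Data Set is worth the load time in your environment.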

A real world performance example

My friend Rory Cornelius gave me the following quote about Data Sets. Rory actually did this with one of our clients. It shows how these techniques work to solve all sorts of Cognos problems.

Not your typical use case, but my client has this huge set of scheduled jobs. There was one job that had 10 reports each that queried almost the same data. Each report took about 45 minutes to run and they wanted them done sequentially to limit load. I pointed the reports to a Data Set instead, and they took 2 minutes to run instead. The Data Set still takes quite a while to load, but even with that, the total time was cut by at least 5 hours with significantly less load on the database.

Rory Cornelius, Senior Solution Architect with PMsquare

Combine Data Sources in Cognos

Throughout my career the number one impediment to analytics delivery has been the struggle to combine data from multiple databases or applications. Data exists at different levels of detail with messy, mismatched keys and incompatible query languages. It’s just tough out there. However, Data Sets radically streamline this process, especially for data already in Cognos. They provide a form of lightweight ETL and query processing to supplement fully featured tools like Incorta or IBM ADP/Trifacta.

Data sets can be used to combine SSAS cubes with data warehouse tables.
A real world example of combining data sets from one of my clients

Recognizing data source mashup bottlenecks

Whether it’s a lack of clear requirements or an IT bottleneck for ETL, projects often wait for months or years at this stage. Faced with mounting delays, frustrated end users often choose to export data from Cognos and go it alone in Power BI. But you can learn to recognize the signs of data mashup bottlenecks:

  • The data warehouse request backlog grows to many multiples of the Cognos backlog
  • End users export tons of data to Excel
  • Advanced metrics are challenging to build because you are missing key calculation components
  • You often make model or data warehouse changes to add just a few columns or tables

Combining data with Cognos Data Sets

The process of combining data sources using Data Sets could hardly be easier as I outlined in Part 1 of this series. Instead the challenge lies in working through the logic of how best to combine two sources. The main things you will need to do are:

  • Identify the fields required for your analysis and locate them in your data sources
  • Create a data set for each source
  • Aggregate data at a compatible level of detail
  • Perform necessary data cleansing to make joins possible
  • Add filters, calculations or other logic at the Data Set level, not in Data Modules or Reports/Dashboards
  • Schedule data sets so that they build in the correct order
  • Combine them by joining together in a data module

This technique allows you to dramatically simplify some complex ETL tasks with large and complex databases by first boiling each source down to just the fields you need. The key thing is to embed as much logic into the Data Set load process as possible. This minimizes query cost at run time and makes building and maintaining your Data Modules as easy as possible.
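To make the logic of those steps concrete, here is a minimal sketch in plain Python rather than in Cognos itself. The two sources, their field names and the cleansing rule are all hypothetical; the point is only the order of operations: clean the keys, aggregate to a compatible grain, then join.

```python
from collections import defaultdict

# Hypothetical rows extracted from two sources at different grains.
# Source A: daily sales from a warehouse table, with messy customer keys.
daily_sales = [
    {"customer": " acme ", "date": "2020-03-01", "revenue": 100.0},
    {"customer": "ACME", "date": "2020-03-15", "revenue": 250.0},
    {"customer": "Globex", "date": "2020-03-02", "revenue": 75.0},
]
# Source B: monthly targets from a cube, already at the month grain.
monthly_targets = [
    {"customer": "ACME", "month": "2020-03", "target": 300.0},
    {"customer": "GLOBEX", "month": "2020-03", "target": 100.0},
]


def clean_key(value: str) -> str:
    """Normalize a customer key so both sources can join on it."""
    return value.strip().upper()


def summarize_to_month(rows):
    """Roll daily rows up to (customer, month), the grain of the targets."""
    totals = defaultdict(float)
    for row in rows:
        key = (clean_key(row["customer"]), row["date"][:7])
        totals[key] += row["revenue"]
    return dict(totals)


def join_on_customer_month(sales_by_month, targets):
    """Inner join summarized sales to targets on the cleaned (customer, month) key."""
    joined = []
    for target in targets:
        key = (clean_key(target["customer"]), target["month"])
        if key in sales_by_month:
            joined.append({
                "customer": key[0],
                "month": key[1],
                "revenue": sales_by_month[key],
                "target": target["target"],
            })
    return joined


combined = join_on_customer_month(summarize_to_month(daily_sales), monthly_targets)
for row in combined:
    print(row)
```

In Cognos terms, the cleansing and summarizing would live in each Data Set and the join would live in the Data Module, so none of that work is repeated at query time.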

Simplify Presentation for Self Service

Framework Manager models typically exist for IT and accumulate years or even decades worth of developer focused design decisions. As a consequence they often require crucial yet undocumented context to generate accurate and timely queries, with a host of conditional flags, hidden filters and inscrutable calculations. Self service becomes impossible when end users don’t understand the structure or context of data. This is the number one objection I hear to rolling out Dashboards or Explore in Cognos Analytics.

Recognizing overly complex models

An overly complex model stands between you and the evolution of your BI practice like an unbridgeable chasm. Its calling card is the list of things you cannot accomplish because ‘the data is too complex.’ You know it by:

  • End users cannot effectively use the model, or you have locked them out of it due to data quality concerns
  • Self service feature roll out met with limited success due to data complications
  • Use of your data is always accompanied by caveats, ‘You have to include flags x,y,z to get meaningful results’
  • Debugging data problems is extremely confusing or time consuming
  • New hires to the BI team require weeks or months to get up to speed with the data

Preparing Data for Self Service

Data Sets are the bridge to this chasm. Because they are so easy to make and inherit all the logic from your Framework Manager source, you embed and effectively hide the underlying complexity with a well designed Data Set. You will need to build multiple Data Sets from the same model to effectively simplify the presentation – this is a factor of your design. Remember, the goal for Data Sets is to break a ‘one size fits none’ model into smaller, usable components. Let your data sets multiply!

  • Break your large model into smaller, digestible subject areas based around the types of questions your users need to answer
  • Build a Data Set for each subject area
  • Don’t be shy about overlapping data in multiple Data Sets. The end goal is to make something easy to use for an individual subject area
  • Don’t be shy about building lots of Data Sets
  • Remember – the Data Set inherits your Framework Manager logic. You should have a high degree of data consistency across Data Sets as a result
  • Always be willing to alter, change, abandon and create new data sets based on evolving user needs.

Your instincts from Framework Manager probably tell you to come up with a grand, cross-Data Set design to ensure consistency and eliminate re-use of fields. Don’t do this. Remember, tailor each Data Set to the needs of its user community and be willing to adapt as those needs change. This is the key to modernizing Cognos to compete with Tableau or Power BI.

Modernize Your Cognos Practice

By following all these steps you will modernize much of your Cognos Analytics practice without intentionally doing so. A modern BI practice requires two modes of operation, often called ‘Mode 1’ and ‘Mode 2’. Mode 1 is the traditional enterprise BI way of doing things: ETL, ODS, EDW, monolithic Framework Manager models and IT authored reports. It remains a vital component of our work. However, Mode 2 is equally important: agile data mashup, in-memory processing, collaboration with self-service users and, above all, speed.

The techniques outlined above will get you to mode 2 rapidly, even if it seems daunting or impossible today. Because you’ve done so much great work building your Framework Manager models you have an incredible foundation for self service – you just haven’t realized it yet. Using Data Sets in combination with Data Modules and Dashboards will give you the performance, simple data presentation and agility you need. Try it! And as always if you need some help along the way reach out to me and PMsquare. The answer to ‘When to use Cognos Data Sets’ is ‘Now!’



What Are Cognos Data Sets?

April 7, 2020 by Ryan Dolley

I’ve explored Data Modules in depth on this blog over the last year with the hope of showing you how awesome data modeling in Cognos Analytics can be if you really embrace it. There is, however, an additional piece of the Cognos data puzzle that you need to understand to unlock the full potential of the platform – the Data Set. So let’s answer the question – just what are Cognos Data Sets?


This video introduction to Data Sets covers everything you need to know!

The IBM Blueview Data Set Series

What are Cognos Data Sets?
When to use Cognos Data Sets


What is a Data Set in Cognos?

The Cognos Data Set screen is easy to understand and use.
Data Sets offer an in-memory data processing option for Cognos Analytics

Simply put, a Data Set is a data source type in Cognos Analytics that contains data extracted from one or more sources and stored within the Cognos system itself as an Apache Parquet file. The Parquet file is then loaded into application server memory at run-time on an as-needed basis. This (usually) greatly enhances interactive performance for end users while reducing load on source databases. When combined with Data Modules, Data Sets offer incredible out-of-the-box capabilities like automatic relative time, easy data prep and custom table creation.

Data Sets are also extremely easy to build from your existing Framework Manager or Transformer packages, making them an excellent option for getting the most out of your legacy Cognos 10 models. In fact this is probably the #1 use case for the Data Set technology and is the absolute fastest way to modernize your environment and turn Cognos into a rapid-fire data prep and visualization machine.

I’m going to write a full blog post about the exact situations that suggest a Data Set solution, but in short you should consider using Data Sets whenever:

  • Excellent interactive performance is a critical part of your deliverable
  • You wish to limit extremely costly SQL queries by re-using results
  • You must join multiple data sources together or accomplish other ETL tasks within Cognos rather than source systems
  • Existing Framework Manager or Transformer models are too complex or too slow for self-service
  • Someone tells you Cognos is slow but Tableau or Power BI are fast (those tools use Data Set-like technologies to enhance interactive performance)
  • You just want to do something really cool

Which Features can use a Data Set

There is one small limitation to Data Sets – while they function as a data source for all Cognos Analytics features they cannot be used directly to author reports. The solution to this is simple – wrap them in a Data Module and import the Data Module to Report Authoring. You should be doing this anyway for all Data Sets as it provides maximum deployment flexibility and ease of upkeep. I will cover best practice topics like this in a future article.

How to Build a Data Set

The 'create data set' capability is pretty well hidden in Cognos Analytics
The ‘Create data set’ capability is hidden among model options

Building a Data Set is simple, especially if you have existing Framework Manager or Transformer models available in Cognos. In fact Data Sets can only be built on top of existing models or Data Modules, not directly on data servers. IBM has helpfully hidden the ‘Create data set’ capability in the ‘more’ menu of model objects in the environment, so it’s surprisingly easy to miss.

Cognos Data Set Creation

Creating a Data Set is a straightforward process, especially for experienced Cognoids. The UI is actually a re-skinned version of Report Authoring and many of your favorite tricks will work here. Building a Data Set is as simple as dragging columns into the list object, saving and loading data. Of course there are additional options you can take advantage of.

The Cognos Analytics Data Set creation screen shares many features with the Report Authoring interface
  1. Source View: Browse the tables and fields in your data source exactly as you would in Report Authoring
  2. Data List: The data table shows a live view of the Data Set as you build it. It queries new data as you make changes
  3. On Demand Toolbar: The on demand toolbar appears when you click on a column, giving you the ability to filter and sort.
    1. Filtering: Filters help you focus the data in your Data Set to just what you need. Fewer rows = better performance.
    2. Sorting: Sorting by the columns most used in report or dashboard filters (for example, time data) can greatly improve performance
  4. Query Item Definition: The query item appears when you double click a column header. You have access to query item functionality from Report Authoring, which means you can really accomplish a lot from this popup.
  5. Preview: Unchecking the preview button turns off the automatic data query, so the data table stops refreshing as you make adjustments to your Data Set.
  6. Summarize and Row Suppression: The summarize function rolls your data up to a higher grain, for example rolling daily data up to the month. Row suppression in data sets only applies to dimensional data sources and does the same thing as using row suppression in Report Authoring. Special thanks to Jason Tavoularis at IBM for the explanation.
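On the sorting tip in item 3: one plausible reason it helps is that columnar files such as Parquet keep minimum and maximum statistics for blocks of rows, so a reader can skip blocks that cannot match a filter. Whether and how Cognos uses those statistics is an assumption on my part, and the sketch below is a toy model of the idea with made-up data, not a description of the Cognos query engine.

```python
import random


def chunk_stats(values, chunk_size):
    """Split values into fixed-size chunks and record each chunk's (min, max)."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    return [(min(c), max(c)) for c in chunks]


def chunks_to_scan(stats, wanted):
    """Count chunks whose min/max range could contain the wanted value."""
    return sum(1 for low, high in stats if low <= wanted <= high)


# Hypothetical data: 12,000 rows, each tagged with a month number from 1 to 12.
random.seed(0)
months = [random.randint(1, 12) for _ in range(12_000)]

# Filter on month 6 against unsorted and sorted copies of the same column.
unsorted_scan = chunks_to_scan(chunk_stats(months, 1_000), wanted=6)
sorted_scan = chunks_to_scan(chunk_stats(sorted(months), 1_000), wanted=6)

print("chunks scanned, unsorted:", unsorted_scan)
print("chunks scanned, sorted:  ", sorted_scan)
```

With unsorted data nearly every chunk spans the full range of months, so none can be skipped. Once the column is sorted, month 6 sits in a small contiguous run of chunks and the rest can be ignored.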

Once you’ve imported your desired data, set your filters, sorts and summaries and maybe added a few calculations for good measure it’s time to save, load and deploy your Data Set.

Saving and Loading a Data Set

Data Set save options include Save, Save As and Save and Load Data
Data Sets must be saved and loaded to be available

When you save a Data Set you will see the option to ‘Save and load data.’ This will allow you to select a directory in Cognos to house the Data Set object. It will also issue one or more queries to retrieve data and populate a parquet file. This file is stored in Cognos and loaded into memory upon request when users access the Data Set. Check out the ‘Flint’ section of this in depth article to understand what happens under the hood during Data Set creation and query.

Scheduling and Managing Data Sets

Data Sets only contain data from their last load; it is good practice to get in the habit of scheduling and monitoring Data Sets to ensure they contain relevant data and continue to perform well.

Data Set Scheduling Options

Data Sets and Reports have all the same scheduling options
Data Sets have the same scheduling options as reports

The easiest way to schedule Data Sets is via the ‘schedules’ tab in Data Set properties. Data Sets and Reports share all the same scheduling options, including the ‘by trigger’ option. Scheduling via a trigger makes it easy to ensure Data Sets only load after your ETLs complete. This works great for simple or one-off scheduling tasks.

For more complex schedules, Data Sets are available in the Job feature. Again, they function as if they were reports as far as building Jobs is concerned.
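When several Data Sets feed one another, it helps to write down which loads depend on which before building the Job or triggers. The sketch below uses Python's standard `graphlib` module to work out a valid load order. Every object name in it is hypothetical, and it is only a planning aid for reasoning about order, not a Cognos API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical objects: each key may load only after everything in its set has finished.
depends_on = {
    "sales_cube_data_set": {"nightly_etl"},
    "finance_cube_data_set": {"nightly_etl"},
    "edw_data_set": {"nightly_etl"},
    "final_dashboard_data_set": {
        "sales_cube_data_set",
        "finance_cube_data_set",
        "edw_data_set",
    },
}

# static_order() yields each item after all of its dependencies.
load_order = list(TopologicalSorter(depends_on).static_order())
print(load_order)
```

The ETL always comes first and the final Data Set always comes last; the three source Data Sets in the middle can load in any order.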

Data Set Management

The Advanced Properties view contains the statistics you need to manage Data Set performance.
Manage Data Sets using their advanced properties

The Data Set properties screen contains the info you need to effectively maintain fresh and performant data for your end users. At the top of the window you can see the last load date of the Data Set, while expanding the ‘advanced’ section exposes the following:

  • Size: The compressed size of the parquet file on disk
  • Number of rows: The number of rows in your data set. Keep this under ~8 million for best performance
  • Number of columns: The number of columns in your data set. No hard limit here, just don’t include columns you don’t need
  • Time to refresh: The time it takes for the Data Set to load
  • Refreshed by: The name of the person who last refreshed the data set

I will write a longer post about Data Set tuning and troubleshooting. For now it’s key to keep in mind the row and column suggestions above. And while ‘Time to refresh’ is important, this represents the time it takes to load data and has no impact on the performance end users will experience. The beauty of Cognos Data Sets is that by front-loading the processing, you can create a complex result set that takes hours to load but offers sub-second response time to end users.
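If you track these properties for many Data Sets, a simple check against the suggestions above can flag the ones that need attention. This is a minimal sketch that assumes you have copied the values out of the properties screen by hand; the `DataSetStats` record, the `review` function and the one day freshness threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ROW_GUIDELINE = 8_000_000  # the ~8 million row suggestion above


@dataclass
class DataSetStats:
    """Hypothetical record of the values shown in the properties screen."""
    name: str
    rows: int
    columns: int
    last_loaded: datetime


def review(stats: DataSetStats, now: datetime, max_age: timedelta) -> list[str]:
    """Return human-readable warnings for a Data Set that needs attention."""
    warnings = []
    if stats.rows > ROW_GUIDELINE:
        warnings.append(f"{stats.name}: {stats.rows:,} rows exceeds the ~8 million guideline")
    if now - stats.last_loaded > max_age:
        warnings.append(f"{stats.name}: last load is older than {max_age}")
    return warnings


now = datetime(2020, 4, 7, 8, 0)
sales = DataSetStats("sales_summary", rows=9_500_000, columns=24,
                     last_loaded=datetime(2020, 4, 5, 2, 0))
issues = review(sales, now, max_age=timedelta(days=1))
for issue in issues:
    print(issue)
```

Note that ‘Time to refresh’ is deliberately left out of the check, since a long load does not affect what end users experience.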

A Real World Example of Data Sets in Action

I have used Data Sets in many successful client engagements to greatly improve performance, simplify presentation or accomplish ETL tasks in an afternoon that their DW team had put off for years. Here is a simple example for you.

The Problem: Metrics, metrics everywhere!

This customer came to us with a very, very common problem. The sales support team had identified a need for some new advanced metrics and built out a prototype dashboard. However, the underlying data was divided between two Microsoft SSAS cubes and a handful of tables in the EDW. The data warehouse team had given an estimate of many months to create the necessary tables and cubes.

The Solution: Cognos Analytics Data Sets

The customer brought in PMsquare on a 40 hour contract to make this happen. If your initial reaction to that contract length is skepticism I don’t blame you. In Cognos 10 this would have been impossible. However thanks to Data Sets I was able to do the following:

  • Extract the needed data from each SSAS cube and the EDW into a Data Set. There were 3 Data Sets total, one from each data source.
  • Join the Data Sets together into a Data Module and add in all the Data Module goodies like relative time
  • Create a new, final polished Data Set from that Data Module to simplify presentation and improve performance
  • Build out the customer’s dashboard

The customer was extremely satisfied with the end result, which looked something like this:

A real world data flow from a project I successfully completed.
A cavalcade of awesomeness awaits you with Data Sets

Cognos Analytics Data Sets in Summary

As you can see, I really was able to accomplish months of work in a single week using Data Sets. Obviously this technology cannot replace all ETL tasks. However, Cognos Analytics is now an option for low to medium complexity transformations. And you now have a slam-dunk option for rapidly simplifying presentation or improving performance vs even the simplest database view.

Be sure to check back next Tuesday, 4/14/2020 for part two of this series: When To Use A Data Set!

