IBM Blueview

Cognos Analytics and all things IBM

What Are Cognos Data Sets?

April 7, 2020 by Ryan Dolley 16 Comments

I’ve explored Data Modules in depth on this blog over the last year with the hope of showing you how awesome data modeling in Cognos Analytics can be if you really embrace it. There is, however, an additional piece of the Cognos data puzzle that you need to understand to unlock the full potential of the platform – the Data Set. So let’s answer the question – just what are Cognos Data Sets?


This video introduction to Data Sets covers everything you need to know!

The IBM Blueview Data Set Series

What are Cognos Data Sets?
When to use Cognos Data Sets


What is a Data Set in Cognos?

The Cognos Data Set screen is easy to understand and use.
Data Sets offer an in-memory data processing option for Cognos Analytics

Simply put, a Data Set is a data source type in Cognos Analytics that contains data extracted from one or more sources and stored within the Cognos system itself as an Apache Parquet file. The Parquet file is then loaded into application server memory at run-time on an as-needed basis. This (usually) greatly enhances interactive performance for end users while reducing load on source databases. When combined with Data Modules, Data Sets offer incredible out-of-the-box capabilities like automatic relative time, easy data prep and custom table creation.

Data Sets are also extremely easy to build from your existing Framework Manager or Transformer packages, making them an excellent option for getting the most out of your legacy Cognos 10 models. In fact, this is probably the #1 use case for the Data Set technology and is the absolute fastest way to modernize your environment and turn Cognos into a rapid-fire data prep and visualization machine.

I’m going to write a full blog post about the exact situations that suggest a Data Set solution, but in short you should consider using Data Sets whenever:

  • Excellent interactive performance is a critical part of your deliverable
  • You wish to limit extremely costly SQL queries by re-using results
  • You must join multiple data sources together or accomplish other ETL tasks within Cognos rather than source systems
  • Existing Framework Manager or Transformer models are too complex or too slow for self-service
  • Someone tells you Cognos is slow but Tableau or Power BI are fast (those tools use Data Set-like technologies to enhance interactive performance)
  • You just want to do something really cool

Which Features can use a Data Set

There is one small limitation to Data Sets: while they function as a data source across Cognos Analytics, they cannot be used directly to author reports. The solution is simple: wrap them in a Data Module and import the Data Module into Report Authoring. You should be doing this anyway for all Data Sets as it provides maximum deployment flexibility and ease of upkeep. I will cover best practice topics like this in a future article.

How to Build a Data Set

The ‘Create data set’ capability is hidden among model options

Building a Data Set is simple, especially if you have existing Framework Manager or Transformer models available in Cognos. In fact, Data Sets can only be built on top of existing models or Data Modules, not directly on data servers. IBM has helpfully hidden the ‘Create data set’ capability in the ‘more’ menu of model objects in the environment, so it’s surprisingly easy to miss.

Cognos Data Set Creation

Creating a Data Set is a straightforward process, especially for experienced Cognoids. The UI is actually a re-skinned version of Report Authoring and many of your favorite tricks will work here. Building a Data Set is as simple as dragging columns into the list object, saving and loading data. Of course there are additional options you can take advantage of.

The Cognos Analytics Data Set creation screen shares many features with the Report Authoring interface
  1. Source View: Browse the tables and fields in your data source exactly as you would in Report Authoring
  2. Data List: The data table shows a live view of the Data Set as you build it. It queries new data as you make changes
  3. On Demand Toolbar: The on demand toolbar appears when you click on a column, giving you the ability to filter and sort.
    1. Filtering: Filters help you focus the data in your Data Set to just what you need. Fewer rows = better performance.
    2. Sorting: Sorting by the columns most used in report or dashboard filters (for example, time data) can greatly improve performance
  4. Query Item Definition: The query item definition appears when you double-click a column header. You have access to query item functionality from Report Authoring, which means you can really accomplish a lot from this popup.
  5. Preview: Unchecking the preview button switches the data table out of preview mode, which turns off the automatic data query as you make adjustments to your Data Set.
  6. Summarize and Row Suppression: The summarize function rolls your data up to the grain of the columns you include, for example rolling daily data up to the month. Row suppression in data sets only applies to dimensional data sources and does the same thing as row suppression in Report Authoring (special thanks to Jason Tavoularis at IBM for the explanation).
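
If it helps to see the summarize option in code, here is a minimal Python sketch of the same idea: daily rows rolled up to one row per month and region. The sample rows and the `summarize_by_month` helper are made up for illustration; this is not how Cognos implements the feature, just the shape of the result you should expect.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales rows: (order_date, region, revenue)
daily_rows = [
    (date(2020, 1, 5), "East", 100.0),
    (date(2020, 1, 20), "East", 50.0),
    (date(2020, 1, 7), "West", 75.0),
    (date(2020, 2, 3), "East", 40.0),
    (date(2020, 2, 3), "West", 60.0),
    (date(2020, 2, 28), "West", 10.0),
]


def summarize_by_month(rows):
    """Group rows by (year-month, region) and sum revenue within each group."""
    totals = defaultdict(float)
    for order_date, region, revenue in rows:
        month_key = order_date.strftime("%Y-%m")
        totals[(month_key, region)] += revenue
    # Sort the keys so the output order is stable and easy to read
    return dict(sorted(totals.items()))


monthly_rows = summarize_by_month(daily_rows)
for (month, region), revenue in monthly_rows.items():
    print(month, region, revenue)
```

Six daily rows become four monthly rows here. On real data the reduction is usually far larger, which is where the performance benefit comes from.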

Once you’ve imported your desired data, set your filters, sorts and summaries, and maybe added a few calculations for good measure, it’s time to save, load and deploy your Data Set.
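
Why would sorting matter for performance? One plausible explanation, sketched below in Python, is that Parquet files keep min/max statistics for chunks of rows, and a reader can skip any chunk whose range cannot match a filter. I am assuming here that the Cognos query engine takes advantage of this the way Parquet readers commonly do; the chunk size, the column and the helper names are all hypothetical.

```python
import random

ROWS_PER_GROUP = 1000  # hypothetical chunk size, not a Cognos setting


def build_row_groups(values, rows_per_group=ROWS_PER_GROUP):
    """Split a column into chunks and record the min and max of each chunk."""
    groups = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        groups.append({"min": min(chunk), "max": max(chunk)})
    return groups


def groups_scanned(groups, low, high):
    """Count chunks whose [min, max] range overlaps the filter range [low, high]."""
    return sum(1 for g in groups if g["max"] >= low and g["min"] <= high)


random.seed(7)
# Hypothetical column: day of year for 100,000 rows, in arrival order
day_of_year = [random.randint(1, 365) for _ in range(100_000)]

unsorted_groups = build_row_groups(day_of_year)
sorted_groups = build_row_groups(sorted(day_of_year))

# Filter on roughly one month: days 1 through 31
print("unsorted:", groups_scanned(unsorted_groups, 1, 31), "of", len(unsorted_groups))
print("sorted:  ", groups_scanned(sorted_groups, 1, 31), "of", len(sorted_groups))
```

With unsorted data nearly every chunk contains at least one matching day, so nothing can be skipped. After sorting, the matching rows sit together in a handful of chunks and the rest can be ignored.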

Saving and Loading a Data Set

Data Set save options include Save, Save As and Save and Load Data
Data Sets must be saved and loaded to be available

When you save a Data Set you will see the option to ‘Save and load data.’ This will allow you to select a directory in Cognos to house the Data Set object. It will also issue one or more queries to retrieve data and populate a Parquet file. This file is stored in Cognos and loaded into memory upon request when users access the Data Set. Check out the ‘Flint’ section of this in-depth article to understand what happens under the hood during Data Set creation and query.

Scheduling and Managing Data Sets

Data Sets only contain data from their last load; it is good practice to get in the habit of scheduling and monitoring Data Sets to ensure they contain relevant data and continue to perform well.

Data Set Scheduling Options

Data Sets have the same scheduling options as reports

The easiest way to schedule Data Sets is via the ‘schedules’ tab in Data Set properties. Data Sets and Reports share all the same scheduling options, including the ‘by trigger’ option. Scheduling via a trigger makes it easy to ensure Data Sets only load after your ETLs complete. This works great for simple or one-off scheduling tasks.

For more complex schedules, Data Sets are available in the Job feature. Again, they function as if they were reports as far as building Jobs is concerned.

Data Set Management

The Advanced Properties view contains the statistics you need to manage Data Set performance.
Manage Data Sets using their advanced properties

The Data Set properties screen contains the info you need to effectively maintain fresh and performant data for your end users. At the top of the window you can see the last load date of the Data Set, while expanding the ‘advanced’ section exposes the following:

  • Size: The compressed size of the parquet file on disk
  • Number of rows: The number of rows in your data set. Keep this under ~8 million for best performance
  • Number of columns: The number of columns in your data set. No hard limit here, just don’t include columns you don’t need
  • Time to refresh: The time it takes for the Data Set to load
  • Refreshed by: The name of the person who last refreshed the data set

I will write a longer post about Data Set tuning and troubleshooting. For now it’s key to keep in mind the row and column suggestions above. And while ‘Time to refresh’ is important, this represents the time it takes to load data and has no impact on the performance end users will experience. The beauty of Cognos Data Sets is that by front-loading the processing, you can create a complex result set that takes hours to load but offers sub-second response time to end users.
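
To make the monitoring habit concrete, here is a small Python sketch that checks a Data Set's statistics against the guidelines above. It assumes you copy the numbers from the properties screen by hand; it does not call any Cognos API, and the `DataSetStats` class, the `review` function and the one-day freshness target are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_ROWS = 8_000_000         # the ~8 million row guideline from this post
MAX_AGE = timedelta(days=1)  # hypothetical freshness target, adjust to taste


@dataclass
class DataSetStats:
    """Values read off the Data Set properties screen."""
    name: str
    rows: int
    columns: int
    last_loaded: datetime


def review(stats, now):
    """Return a list of warnings for one Data Set (empty if all looks fine)."""
    warnings = []
    if stats.rows > MAX_ROWS:
        warnings.append(f"{stats.name}: {stats.rows:,} rows exceeds the ~8M guideline")
    if now - stats.last_loaded > MAX_AGE:
        warnings.append(
            f"{stats.name}: last loaded {stats.last_loaded:%Y-%m-%d %H:%M}, may be stale"
        )
    return warnings


now = datetime(2020, 4, 7, 9, 0)
orders = DataSetStats("Orders", 12_000_000, 40, datetime(2020, 4, 1, 2, 0))
for warning in review(orders, now):
    print(warning)
```

Note that ‘Time to refresh’ is deliberately left out of the checks, since a long load time does not affect what end users experience.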

A Real World Example of Data Sets in Action

I have used Data Sets in many successful client engagements to greatly improve performance, simplify presentation or accomplish ETL tasks in an afternoon that their DW team had put off for years. Here is a simple example for you.

The Problem: Metrics, metrics everywhere!

This customer came to us with a very, very common problem. The sales support team had identified a need for some new advanced metrics and built out a prototype dashboard. However, the underlying data was divided between two Microsoft SSAS cubes and a handful of tables in the EDW. The data warehouse team had given an estimate of many months to create the necessary tables and cubes.

The Solution: Cognos Analytics Data Sets

The customer brought in PMsquare on a 40-hour contract to make this happen. If your initial reaction to that contract length is skepticism, I don’t blame you. In Cognos 10 this would have been impossible. However, thanks to Data Sets I was able to do the following:

  • Extract the needed data from each SSAS cube and the EDW into a Data Set. There were 3 Data Sets total, one from each data source.
  • Join the Data Sets together into a Data Module and add in all the Data Module goodies like relative time
  • Create a new, final polished Data Set from that Data Module to simplify presentation and improve performance
  • Build out the customer’s dashboard
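
For readers who think in code, the flow above can be sketched in a few lines of Python: three extracts, one join, one polished result. Everything here is a stand-in. The column names, the sample rows and the `blend` helper are hypothetical, and in the real project the join happened in a Data Module rather than in code.

```python
# Hypothetical extracts standing in for the three Data Sets
sales_cube = [
    {"product": "P1", "month": "2020-01", "units": 10},
    {"product": "P2", "month": "2020-01", "units": 5},
]
returns_cube = [
    {"product": "P1", "month": "2020-01", "returns": 1},
]
edw_products = [
    {"product": "P1", "list_price": 20.0},
    {"product": "P2", "list_price": 30.0},
]


def index_by(rows, *keys):
    """Build a lookup from a tuple of key values to the matching row."""
    return {tuple(row[k] for k in keys): row for row in rows}


def blend(sales, returns, products):
    """Left-join returns and product attributes onto sales, then derive metrics."""
    returns_idx = index_by(returns, "product", "month")
    product_idx = index_by(products, "product")
    result = []
    for s in sales:
        r = returns_idx.get((s["product"], s["month"]), {})
        p = product_idx.get((s["product"],), {})
        net_units = s["units"] - r.get("returns", 0)
        result.append({
            "product": s["product"],
            "month": s["month"],
            "net_units": net_units,
            "net_revenue": net_units * p.get("list_price", 0.0),
        })
    return result


final_rows = blend(sales_cube, returns_cube, edw_products)
for row in final_rows:
    print(row)
```

The final list plays the role of the polished Data Set: one simple table with only the columns the dashboard needs.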

The customer was extremely satisfied with the end result, which looked something like this:

A real world data flow from a project I successfully completed.
A cavalcade of awesomeness awaits you with Data Sets

Cognos Analytics Data Sets in Summary

As you can see, I really was able to accomplish months of work in a single week using Data Sets. Obviously this technology cannot replace all ETL tasks; however, Cognos Analytics is now an option for low to medium complexity transformations. And you now have a slam-dunk option for rapidly simplifying presentation or improving performance vs even the simplest database view.

Be sure to check back next Tuesday, 4/14/2020 for part two of this series: When To Use A Data Set!



Comments

  1. Jerzy Konarski says

    April 14, 2020 at 11:52 am

    Thank you for this great article, but I have a few comments.
    By reading your article one has the impression that the DataSet is a miraculous solution for the performance problems that you MUST have with the FM and Transformer models, but this is not entirely true.
    You write “Existing Framework Manager or Transformer models are too complex or too slow for self-service”. If the FM model is too complicated and too slow
    In addition, you say that this is a great option for getting the most out of your legacy Cognos 10 models.
    You can’t say it like that.
    Legacy models need to be modernized to meet modern needs – DQ migration, performance optimization, use of columnar storage. This is the first step in migrating to CA 11.
    The use of Framework Manager itself does not affect performance, the query execution load is shifted to the database engine, response times depend on proper sizing of the database.
    As for the Power Cubes, they give excellent response times and extracting the DataSets from them most often will not bring anything in terms of performance improvements.
    If you do a DM on a basis that does not offer good response times the performance will not be better.
    The DataSet is a valuable tool but – once again – it should not be opposed to models from previous versions of Cognos but explain why it can be useful. What you do elsewhere.

    Regarding your example, it was surely not possible to carry out the example of which you speak with Cognos 10, in any case in this form. With a Transformer cube this is quite possible. Requests on each source in place of DataSets and the Transformer model in place of DM. It’s the same solution.
    The three sources that you cite are the BI sources and I can hardly believe that it takes months of development to create a datamart from 3 sources of this type. If this is the case it is – in my opinion – more for procedural reasons than for development times. By moving the development onto the user’s desktop, it is no longer a matter of IT, but of the end user and you can develop in AGILE.
    Just don’t forget that DataSets management is a black box, files are stored in the content store and there is no way to optimize its behavior. IT management doesn’t like that. This is one of the reasons why DataSets are not made to handle large volumes of data either.
    But it’s a great tool – mixed with DM – for data analysts and advanced users.

    • Ryan Dolley says

      April 14, 2020 at 12:55 pm

      Jerzy – thank you for this thoughtful comment! I agree with most of your objections in principle. It is better to undergo a redesign that takes into account FM model structure, database type, sizing and structure, etc. when looking to improve performance. It was possible in Cognos 10/8 to merge multiple sources together in a Transformer cube, and Data Set management is not ideal (although there are more options for tuning data sets than most people realize; you can change where data sets are stored and you can access the Spark console to at least see how data sets are processing).

      The thing is, most of my clients can’t or won’t do those things. The example I cited was a real example where the sales organization had an outstanding request to the IT BI team for merging those data sources together into a new data mart. You are absolutely correct, they failed to even start the work for something like 14 months due to procedural issues. However even once the work had started the team had no ability to create a new data mart in 40 hours. It would have probably been 4-6 weeks of work given the resources available.

      So my argument is not really that Data Sets are superior to Framework Manager or Transformer, but rather that they are so much easier to use that skilled BI practitioners can accomplish a much higher volume of work when using them compared to alternative – and technically superior – solutions.

      The number one problem for my clients is under staffing and lack of technical skills. For these clients Data Sets offer a huge advantage for the BI team, not just end users. For clients with large and highly skilled BI departments they serve an important role in prototyping and agile delivery but you are right – they are not a replacement for high performance databases and well crafted models.

      Please continue to read and comment on this blog – I appreciate your perspective very much!

  2. Gaston says

    April 15, 2020 at 11:05 am

    The best use of Cognos is when the data is prepared by a competent IT department and it’s properly modeled in Framework Manager or similar. While it is great that Cognos 11 is offering modelling tools for the final user, I don’t want to waste time preparing the data when there is an IT department with brilliant ETL capabilities. When data is professionally modeled, creating reports and querying data takes only a few minutes. Data models are great for a proof of concept or for a small business without proper IT support. Small businesses probably would be better with Power BI or Tableau.

  3. SirM says

    June 17, 2020 at 8:35 pm

    I’ve started to use data sets and am starting to find that they could be a powerful solution. The main drawback is that the capability to define the contents of the data sets is extremely limited. There are no options for complex filtering or joining data. Before 11.1.5 there were some hacks to use “Reporting” to define the data set contents.
    From 11.1.5 I found that query calculations could be added by dragging an existing data item and changing the definition. This also allows for creating more complex data filtering structures. No solution yet for joining data (Probably on 11.1.7)

    • Ryan Dolley says

      October 16, 2020 at 12:24 pm

      Did you see that 11.1.7 does indeed solve this issue? Very cool!

  4. Logan says

    October 15, 2020 at 8:40 pm

    I stumbled upon this article looking for ways to improve run times of reports pulling from complex relationships from framework objects. We have grown to have everything mapped in our framework from different data sources but all our reporting that uses this cross-source data takes forever to run. All our queries are fresh pulls from the source (real time data). This opens up a whole new world for improved performance of reports that don’t necessarily need real time data. This is the best thing I’ve seen for years. Going to start playing with it soon!

    • Ryan Dolley says

      October 16, 2020 at 12:23 pm

      You are going to love them, trust me! Especially if you are on 11.1.7 – the 11.1.7 release had a great little data set overhaul that added a ton of great data blending features.

  5. Łukasz says

    February 19, 2021 at 2:35 am

    hello, how can you solve the permissions on the dataset? can a given user see only part of the data?

    • Ryan Dolley says

      February 22, 2021 at 11:36 am

      Yes, but you need to ensure that the username or a group/role is included in the data itself. Once that’s done you can write a filter in a data module that pulls only the tables associated with the ID from the session. I’ll do a youtube video on how to do this.

  6. Akshay says

    February 19, 2021 at 2:43 am

    Hi,
    What are the Memory Limitations for Data Sets?

    • Ryan Dolley says

      February 22, 2021 at 11:32 am

      You’re going to want at least 32GB available on the application server, and you may want to increase the Compute Service max java heap size to accommodate data sets if performance indicates. The most important things to do are to only include the data you need as a best practice, sort the fields that will be used as filters in the data set builder, and keep it to around 8M rows tops.

  7. Bruno says

    February 24, 2021 at 10:38 am

    Thanks for the article, it was very useful! 🙂

    Question about it, is there a way I can retrieve this data set metadata information, such as ‘Date Refreshed’ property to the report?

    • Ryan Dolley says

      March 3, 2021 at 3:40 pm

      That is a good question… I don’t know, I’ll try to find out!

  8. MALLIKARJUNA BANDI says

    June 15, 2021 at 1:15 am

    Row level or Object Level Security: can you please post the YouTube video here if you have done it, Ryan?

  9. Clara Knox says

    October 27, 2021 at 3:52 pm

    How do you change the query for the dataset?

  10. Rick says

    November 8, 2021 at 1:29 pm

    How do you replace the query data on the page? I have my query joins completed but the page is pointing at the 1st query and I need to change it.

