
By Xavier Underwood
UX Researcher, IBM watsonx.data
Optimized Analytics
A study aimed at eliminating the biggest issues users face when accumulating data in a Data Lakehouse, and at clarifying how expectations and workloads have changed compared with 15 years ago.
After much rigor and iteration between watsonx.data UX Research and Product Management leadership over the past year, a mixed-methods study was designed. The purpose was to learn about the current market situation of 190 data management professionals: where they are, where they’re going, & what changes they would like to see in the Data Management space.
Market Findings
Data Lakes
- According to over half of the users surveyed, Hadoop is the biggest Data Lake today, followed by Spark & MongoDB.
- On cloud is the most popular version of a Lakehouse: over 50% of users reported having at least one data lake on cloud, while about 30% of users have no data lakes on prem.
- This suggests that users are moving to the cloud, mainly because of the cost of on-prem storage solutions.

Data Warehouses
IBM is only a small part of the Data Warehouse market
- IBM DB2 (28%) & IBM Netezza (22%) make up a small share of the Analytical Warehouse market in comparison to Amazon Redshift (39%) & MySQL/Synapse (49%).
- Users are utilizing SQL queries & HiveQL to extract and analyze data.
- Once again, users are signaling the move from on prem to cloud, with just shy of a quarter of all users not having a data warehouse on prem.

Storage & Clouds
Storage & Clouds Image
Based on 177 users surveyed, IBM is far from the top in storage, accounting for only 14% of the storage solutions in use.
- Only 3% of users are using IBM cloud storage regularly.
Data Movement
- The largest group of users (35%) is moving terabytes between regions; 23% of users are moving gigabytes.
- Security & Data Security were two of the top five pain points amongst users, with a combined 46 mentions from the 177 users surveyed.


- Our users most often move data daily, and then weekly, from Data Lakes to Data Warehouses.
- The same goes for Data Warehouses to Data Lakes: users most often move data daily, and then weekly.
- These are also not minor amounts of data. Our users most commonly move terabytes of data, and users who move terabytes also move gigabytes.
Cost Drivers

Compliance is our largest pain point, cited by 77% of Data Professionals surveyed, and it is driven by data security. It seems that our competitors AWS & SAP are not meeting users’ needs.
“Permissions in general are pervasively painful. Sequencing – we’re working around the clock to generate reports, for example, and have to factor in DST, and navigate the batch/streaming world with Kafka.”
—Data Engineer,
Hybrid User

Performance is the second highest pain point, cited by 45% of Data Management Professionals surveyed. Speed is hindered by scalability & is negatively impacted by cost.
“Another pain point is ensuring storage performance. As datasets grow and storage media evolve, it is important to maintain high performance.”
—Data Engineer,
Hybrid User

According to 37% of Data Management Professionals, scalability is another major problem due to capacity planning.
“As data volumes grow, it becomes crucial to ensure that storage systems can handle the increasing load and deliver high-performance access & processing capabilities. This includes efficient data indexing, partitioning, & optimizing storage & retrieval operations.”
—Data Engineer,
Hybrid User

Data Management Landscape
Data Lakes
The watsonx.data product management team wanted to learn which Data Lakes were being used, and how many users had on cloud versus on prem. Hadoop is the most popular data lake, used by 61% of users surveyed. On prem, the largest group was users with no data lakes at all: 51 users, or 28% of those surveyed. On cloud, meanwhile, the largest segment is users with six or more data lakes. The data suggest that users may be moving from on prem to cloud. Data Lakes Image
Data Warehouses
IBM has two products in the Data Warehouse market, DB2 & Netezza. DB2 was used by 28% of users surveyed, while Netezza was used by 22%. In comparison, our largest competitor in the Data Warehouse space, as indicated by the study results, is MySQL, which is used by 49% of users surveyed. There are signs of a move from on-prem Data Warehouses to cloud: 24% of users have no Data Warehouses on prem, while 39% have 2-3 Data Warehouses on cloud. Data Warehouse image
Data Movement
Our users most often move data daily, and then weekly, from Data Lakes to Data Warehouses. The same goes for Data Warehouses to Data Lakes. These are not minor amounts of data: 34% of users surveyed are moving terabytes between regions, and 23% are moving gigabytes. With performance being such a large pain point, it is reasonable to believe that users would move petabytes more often if data quality at that scale were on par with terabytes. Users are moving the amount of data they feel is safe in order to avoid the risk of data loss. Data Movement image 1, Data movement Image 2
Data Lake → Data Warehouse
- 29% of users are moving data daily.
- 19% of users are moving data weekly.
Data Warehouse → Data Lake
- 29% of users are moving data daily.
- 14% are moving data weekly.
Cost Drivers
What are your top 3 pain points when it comes to Data Management?
Compliance
Security compliance approvals are required before work can be done. Business drivers are lost in the process. There is not enough flexible/on-demand resources for non-cloud data. – Data Engineer, Hybrid User
Compliance was our number one pain point, cited by 77% of users. Regulations such as GDPR, CCPA, CCPR, etc. have made it a major issue, because the laws were enacted before Data Lakes & Data Warehouses had the infrastructure to handle their implementation.
- Compliance approvals are required before work can be done.
- Business drivers are lost in the permissioning process.
The compliance pain point consists of users’ problems across Governance, Accessibility, Security, Permissioning, & Retention.
Scalability
“It’s a constant challenge to ensure our storage systems can handle growing data volumes, protect sensitive data from unauthorized access, and effectively manage the lifecycle of data from creation to deletion.” – Data Engineer (Hybrid User), Riverbed Technology
Thirty-seven percent of all users surveyed found Scalability to be an issue, making it one of the top three pain points they face.
- Scaling storage also comes with having to scale accessibility & security.
- Scalability is a guessing game for users due to capacity planning.
- Users feel this process is extremely tedious & it takes time away from how their skills could be better spent.
The scalability score was made up of users indicating pain points with Scalability, Data Storage, the Cost of Storage, Data Cataloguing, Data Scalability, Size of Data, & Data Duplication.
Capacity planning, which involves accurately predicting future storage needs to avoid over-provisioning or under-provisioning of resources. – Data Engineer (Hybrid User), Micron
During this study, 15% of Data Management Professionals spoke about how they use capacity planning to determine the amount of data they can afford to store while maintaining performance levels. Although this is a manual task, getting it wrong can have monumental negative effects on the business. An over-provisioned storage solution is too pricey for companies to afford, while an under-provisioned one cannot perform at a level that meets business needs.
Performance
Another pain point is ensuring storage performance. As datasets grow and storage media evolve, it is important to maintain high performance. This includes efficient data indexing, partitioning, and optimizing storage and retrieval operations. – Data Engineer IDS, Hybrid User
Forty-five percent of users found performance to be a pain point. The performance score consists of Speed, Data Quality, Data Query, Data Inconsistency, Data Movement, Data Recovery, & Data Relevance. The data shows us that a scalable storage solution is a storage solution that can perform. Whenever a Data Lake or Data Warehouse is under-provisioned, security is at risk: sensitive information could be lost if data processes slow & there is no timely backup. Given the new data privacy laws going into effect, such a breach of compliance could cost companies millions in fines. The current climate of GDPR, CCPR, CCPA, etc. heightens the importance of performance to companies.
- Performance is driven by scalability.
- Subpar performance risks data security & cannot deliver Data Quality at the level businesses need.
Impact & Next Steps
Key insights and data from this study were used by IBM Digital, Marketing, and PM at customer-facing events, like the IDUG Conference and the Customer Advisory Board event at IDUG. This research led to follow-up questions for PM to tackle with UXR support:
- How might we implement automation for the compliance process while maintaining the flexibility that users need?
- How might we make the compliance process less time consuming?
- How might we make scalability more cost friendly to users?
- How might we use AI to make the scalability process less time consuming for users?
Methodologies
Participants: Optimized Analytics emerged as a generative study from the Lakehouse Listening Tour to answer Product Management questions regarding the Data Management market. The aim was to understand where users currently are in their Data Management, their past experiences with Data Lakes and Data Warehouses, and where they would like to see Data Management go in the future. For this mixed-methods study, Data Management Professionals were surveyed (N=177) and interviewed (N=13). Of all users surveyed, 91% were Hybrid users, meaning they currently work with both data lakes and data warehouses as part of their organization’s data management strategy.

By Xavier Underwood
UX Researcher, IBM watsonx Orchestrate
Catalog Redesign
Concept testing the new watsonx Orchestrate Catalog, the largest AI agent catalog in the industry. As we move towards a low-code environment, our users are becoming less technical & must be accommodated.
IBM watsonx Orchestrate introduced two new agent types, Orchestrator Agents and Expert Agents, into its catalog, requiring a redesign to ensure intuitive user comprehension. The challenge was determining the most effective way to structure these agents while helping users understand their hierarchical relationship (OAs → EAs → Tools) without prior onboarding. The goal was to validate whether users, especially non-technical ones, could navigate the catalog, grasp the new terminology, and confidently use the product, culminating in an on-prem release by Q1 2024.
Research Objectives & Methodology
The study aimed to assess: (1) how well users understood the OA/EA/Skill hierarchy, (2) whether they could infer functionality from the terms alone, and (3) how their comprehension improved after explanations. Nine participants—a mix of technical (engineers, architects) and non-technical (HR, product owners)—were recruited via User Interviews and Respondent. They interacted with a mid-fidelity Figma prototype, completing tasks such as navigating to an Orchestrator Agent, exploring Expert Agents, and selecting Skills. Sessions followed a moderated, think-aloud approach with two rounds: initial exploration and post-explanation feedback.
Key Findings
We are not teaching our users what we expect them to know
- Terminology Was a Barrier – Only 3 of 9 users (all technical) correctly inferred the meaning of “Orchestrator Agent” without guidance. Non-technical users found the term confusing, suggesting alternatives like “Operator Agent” or “Supervisor Agent.”
- Hierarchy Wasn’t Immediately Clear – While 5 of 9 users navigated the catalog intuitively, only one was non-technical. Most needed explicit explanations to understand how OAs managed EAs, which in turn utilized Skills.
- Users Wanted More Clarity & Customization – Both groups requested tooltips, workflow diagrams (similar to CrewAI’s flow visualization), and modular skill selection. Confidence scores were lowest for “Skills” (avg. 3/5), with users asking for editable templates and clearer dependencies.
Recommendations & Impact
To bridge these gaps, we proposed:
- Simpler terminology (e.g., renaming “Orchestrator Agent”) and embedded tooltips.
- Visual hierarchy aids, such as flow diagrams showing agent relationships.
- Guided onboarding for non-technical users, including demo sequences.
- Customizable skill selection within Expert Agents.
These insights directly influenced the Q1 2025 release, with tooltips and clearer descriptions added to the UI and the terms Orchestrator Agent & Expert Agent removed in favor of plainer language. Future work includes implementing a “Flow View” for better mental model alignment, to be released by THINK 2025.
Conclusion
This study revealed that IBM’s initial design overestimated users’ familiarity with AI orchestration concepts. By prioritizing plain language, progressive disclosure, and visual scaffolding, we improved usability for both technical and non-technical audiences—key to wXO’s transition toward a low-code platform. The research underscored the importance of validating assumptions early, especially when introducing complex terminology.
Want more details? Contact me for the full report or prototype walkthrough!

By Xavier Underwood
UX Researcher, IBM watsonx Orchestrate
User Testing
Evaluating Core Catalog Journey 7 of the watsonx Orchestrate Catalog.
Introduction
At IBM watsonx Orchestrate, our mission is to empower businesses with AI automation by providing an intuitive platform for managing agents, skills, and workflows. A critical part of this experience is the Core Catalog Journey 7, where administrators filter and select skills/collections to build AI solutions.
However, recent concept testing revealed that the current catalog experience falls short in two key areas:
- Relevancy – Administrators struggle to find the most applicable skills for their needs.
- Efficiency – The process of browsing, filtering, and selecting skills is slower than expected.
This article breaks down our research findings, insights, and proposed solutions—showcasing how UX research drives product improvements at IBM.
Research Goals & Methodology
Objective
Evaluate whether the Core Catalog Journey 7 meets administrators’ needs in terms of:
- Relevancy – Are the right skills surfacing at the right time?
- Efficiency – Can admins quickly find and deploy what they need?
Approach
We conducted moderated concept testing with:
- Participants: 12 enterprise administrators (mix of technical and non-technical roles).
- Methods:
  - Task-based testing (finding and applying skills in a simulated workflow).
  - Think-aloud protocols to capture real-time decision-making.
  - Follow-up interviews to dive deeper into pain points.
Key Findings: Why the Current Catalog Falls Short
1. Relevancy Challenges
- Skills are not contextually prioritized.
  - Admins waste time scrolling through irrelevant or low-priority skills.
  - Example: An HR admin saw engineering-focused skills before HR-related ones.
- Collections lack smart categorization.
  - Similar skills are not grouped effectively, leading to redundancy.
“I need to see HR skills first—not random coding tools. Right now, I have to dig.”
—HR Systems Administrator, Fortune 500 Company
2. Efficiency Pain Points
- Filtering is too manual.
  - Admins rely heavily on text search because preset filters don’t align with their needs.
- No “quick apply” for common workflows.
  - Repetitive tasks (e.g., onboarding automation) require re-selecting the same skills every time.
“If I’m setting up employee onboarding for the fifth time, why can’t I just load a template?”
—IT Operations Lead, Financial Services Firm
Proposed Solutions
1. Smart, Role-Based Skill Prioritization
- AI-driven personalization:
  - Detect admin roles (HR, IT, Finance) and surface relevant skills first.
- Dynamic collections:
  - Auto-group skills by use case (e.g., “Employee Onboarding Pack”).
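As a rough illustration of the role-based prioritization idea above, the sketch below reorders a catalog so that skills tagged for an admin's role appear first. It uses a simple tag match as a stand-in for any AI-driven role detection. The `Skill` class, its `roles` tags, and `prioritize_skills` are hypothetical names chosen for this example and are not part of watsonx Orchestrate.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    # Hypothetical role tags, e.g. {"HR"}; not a real catalog field.
    roles: set[str] = field(default_factory=set)


def prioritize_skills(skills: list[Skill], admin_role: str) -> list[Skill]:
    """Return skills with those tagged for admin_role first.

    sorted() is stable, so skills keep their original relative order
    within the matching and non-matching groups.
    """
    return sorted(skills, key=lambda s: admin_role not in s.roles)


catalog = [
    Skill("Run unit tests", {"IT"}),
    Skill("Create onboarding checklist", {"HR"}),
    Skill("Send offer letter", {"HR"}),
    Skill("Approve invoice", {"Finance"}),
]

# An HR admin sees HR-tagged skills before everything else.
ranked = [s.name for s in prioritize_skills(catalog, "HR")]
```

Keeping the sort stable means the catalog's existing order (for example, by popularity) is preserved inside each group, so the change only affects which group is shown first.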
2. Streamlined Filtering & Workflow Templates
- Enhanced filters:
  - Add “Most Used” and “Recently Added” quick-access tabs.
- Saved workflows:
  - Let admins bookmark or duplicate frequent skill combinations.
3. Performance Metrics for Continuous Improvement
- Track:
  - Time-to-task completion.
  - Skill selection accuracy.
- Iterate based on real-world usage data.
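One possible way to compute the two metrics named above from logged test sessions is sketched here. The `TaskSession` record and its fields are assumptions made for illustration, and "accuracy" is defined here as the share of sessions in which the admin selected exactly the expected set of skills; the study itself does not specify a definition.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskSession:
    # Hypothetical log record for one admin completing one task.
    seconds_to_complete: float
    selected_skills: set[str]
    expected_skills: set[str]


def median_time_to_task(sessions: list[TaskSession]) -> float:
    """Median time-to-task completion, in seconds."""
    return median(s.seconds_to_complete for s in sessions)


def selection_accuracy(sessions: list[TaskSession]) -> float:
    """Share of sessions where the selection matched the expected skills exactly."""
    correct = sum(s.selected_skills == s.expected_skills for s in sessions)
    return correct / len(sessions)


sessions = [
    TaskSession(40, {"offer_letter"}, {"offer_letter"}),
    TaskSession(90, {"offer_letter", "checklist"}, {"offer_letter", "checklist"}),
    TaskSession(60, {"unit_tests"}, {"checklist"}),
    TaskSession(120, {"checklist"}, {"checklist"}),
]

time_metric = median_time_to_task(sessions)      # 75.0 seconds
accuracy_metric = selection_accuracy(sessions)   # 0.75
```

The median is used rather than the mean so that a single very slow session does not dominate the time metric in a small sample.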
Impact & Next Steps
These findings directly influenced the watsonx Orchestrate 2025 Q2 roadmap, with:
✅ Role-based skill prioritization (piloting with HR/IT admins).
✅ Smart collections (entering beta testing).
✅ Template workflows (in development).
Future research will validate whether these changes reduce time-on-task and improve satisfaction.