SUNNYVALE, Calif.--(BUSINESS WIRE)--Rafay Systems, the leading Platform-as-a-Service (PaaS) provider for modern infrastructure and accelerated computing, today announced the availability of its inaugural survey, revealing that more than 9 in 10 platform teams (93%) face persistent challenges. Top challenges include managing Kubernetes complexity, keeping Kubernetes and cloud costs low, and boosting developer productivity. The study, titled “The Pulse of Enterprise Platform Teams: Cloud, Kubernetes and AI,” analyzes challenges faced by platform engineering teams across the enterprise segment. To address these hurdles, organizations are emphasizing environment standardization, cost control and improved developer experiences, with a growing focus on automation and self-service solutions. The self-service trend extends to AI adoption, where a majority of respondents believe pre-configured AI workspaces with built-in machine learning operations (MLOps) and large language model operations (LLMOps) tooling could unlock $1.4 million in productivity gains for an organization with 100 developers.
Kubernetes and Infrastructure Cost and Complexity are Pervasive Challenges
Despite widespread adoption of platform teams within IT organizations, survey respondents across the board confirmed that these teams are stretched to their limits managing complex multi-cluster Kubernetes and cloud environments. The top Kubernetes-related challenges that organizations have experienced or are currently experiencing include:
- Managing cost visibility and controlling Kubernetes and cloud infrastructure costs - 45%
- Complexity of keeping up with Kubernetes cluster lifecycle management with multiple, disparate tools - 38%
- Establishment and upkeep of enterprise-wide standardization - 38%
As usage of Kubernetes and cloud environments grows, organizations have also seen a significant uptick in the cost and resources required to manage Kubernetes clusters. Nearly one-third (31%) state that the total cost of ownership (including software/support licenses, salaries of resources) is higher than budgeted for or anticipated. Looking ahead, 60% report that reducing and optimizing costs associated with Kubernetes infrastructure remains a top management initiative in the next year.
AI and GenAI: A New Frontier Mirroring Kubernetes Adoption Challenges
Organizations investing in AI and generative AI (GenAI) capabilities face challenges similar to those encountered during their Kubernetes adoption journey. The vast majority of respondents recognize the critical importance of efficient development and deployment methods, with 96% emphasizing this need for AI applications and 94% for GenAI applications.
Despite this, the study reveals that fewer than a quarter of organizations are at a sufficient level of implementation for either MLOps (17%) or LLMOps (16%). This nascent stage is reflected in the widespread difficulties faced:
- 95% of teams with MLOps implementations report challenges in experimenting with and deploying AI apps
- 94% struggle with GenAI app experimentation and deployment
Organizations are prioritizing key capabilities for their AI initiatives to overcome these obstacles. The top five features include pre-configured environments for developing and testing generative AI applications; automatic allocation of AI workloads to appropriate GPU resources; pre-built MLOps pipelines; GPU virtualization and sharing; and dynamic GPU matchmaking. These capabilities aim to streamline development, optimize resource utilization and manage costs effectively.
As with their critical function in cloud and Kubernetes technologies, platform teams are poised to play a pivotal role in eliminating persistent challenges to advance adoption and implementation. Respondents identified the following top responsibilities for platform teams to assist in the development of AI and GenAI applications:
- Security for MLOps and LLMOps workflows - 50%
- Model deployment automation - 49%
- Data pipeline management - 45%
AI Surge and Kubernetes Expansion Drive Demand for Self-service and Automation in Data Scientist and Developer Workflows
Organizations are prioritizing the developer experience with a growing emphasis on automation and self-service, spanning both AI initiatives and Kubernetes deployments. Respondents identified the following priorities to enhance developer productivity in the expanding Kubernetes ecosystem:
- Automating cluster provisioning - 47%
- Standardizing and automating infrastructure - 44%
- Providing self-service experiences for developers - 44%
- Automating Kubernetes cluster lifecycle management (Day 2) - 44%
- Reducing cognitive load on developer team(s) - 37%
Substantial productivity gains are also expected through improved developer experiences in AI projects: 83% of those surveyed believe pre-configured AI workspaces with built-in MLOps and LLMOps tooling could save teams over 10% of their time each month. For example, in an organization with 100 developers earning an average salary of $140,000*, adopting self-service AI workspaces could unlock nearly 20,000 hours of developer time annually. This is equivalent to $1.4 million in salary costs or the productivity gain of nine additional full-time developers, without increasing headcount.
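The salary figure in the example above can be reproduced with simple arithmetic. The following is a minimal sketch, not the survey's own method, which the release does not describe. The function name `estimate_savings` is hypothetical, and the 2,000 working hours per developer per year is an assumed round figure that the release does not state; a different hours basis changes the hours result but not the salary value.

```python
def estimate_savings(developers, avg_salary_usd, pct_time_saved, hours_per_year):
    """Return (hours_saved, salary_value_usd) per year.

    Integer arithmetic is used so the results are exact whole numbers.
    """
    hours_saved = developers * hours_per_year * pct_time_saved // 100
    salary_value_usd = developers * avg_salary_usd * pct_time_saved // 100
    return hours_saved, salary_value_usd


# Figures from the release: 100 developers, $140,000 average salary, 10% time saved.
# ASSUMPTION: 2,000 working hours per developer per year (not stated in the release).
hours, value = estimate_savings(100, 140_000, 10, 2_000)
print(f"{hours:,} hours saved per year, worth ${value:,} in salary")
```

Because the survey response is "over 10%", these results are best read as a lower-bound illustration under the stated assumptions.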
“The survey’s findings confirm the platform engineering trend Team Rafay has highlighted previously: platform teams are now decidedly in the driver’s seat when it comes to major tooling and architectural decisions for compute consumption,” said Haseeb Budhani, CEO and co-founder of Rafay Systems. “It’s also clear from the survey that these teams are grappling with ever-increasing costs and complexity. Success for these teams will hinge on them being empowered with the right tools and strategies to optimize resources, standardize processes and drive innovation. Organizations that actively support their platform teams in addressing these challenges are best positioned to thrive in an increasingly competitive and technology-driven business landscape.”
Research Methodology
Demographic: More than 2,000 platform engineering, platform architecture, cloud architecture, cloud engineering, developer, DevOps, site reliability engineering (SRE) and operations professionals with roles ranging from C-level to team members at U.S. organizations with over 1,000 employees.
Process: The research was conducted in two parts — more than 1,000 professionals were surveyed to understand the intricate challenges organizations face with managing Kubernetes environments and cloud infrastructure. A second survey was conducted with 1,035 professionals to gather insights on AI and GenAI adoption in the enterprise.
Download a complimentary copy of Rafay’s full survey report.
Additional Resources
- Sign up for a demo of Rafay’s enterprise PaaS for modern infrastructure here
- Follow Rafay on X and LinkedIn
- Read the Rafay Blog: The Kubernetes Current
About Rafay Systems
Rafay’s mission is to liberate enterprises from the pains and complexities of consuming modern compute infrastructure, allowing them to channel 100% of their developers’ focus into innovation. Companies such as MoneyGram, Guardant Health and MassMutual entrust Rafay to be the cornerstone of their modern infrastructure strategy and AI architecture. With Rafay, platform teams at these companies enable developers and data scientists to access compute and AI resources in record time, complete with essential guardrails for security and governance. Gartner has recognized Rafay as a Cool Vendor in Container Management and GigaOm named Rafay as a Leader and Outperformer in the GigaOm Radar Report for Managed Kubernetes, acknowledging our commitment to driving innovation. To join the ranks of industry leaders who have unlocked the true potential of cloud-native computing with Rafay Systems, please visit www.rafay.co.
*“Software Engineer Salary in the US,” Built In, https://builtin.com/salaries/dev-engineer/software-engineer