At Cloud Next ’26, Google unveiled significant updates to Google Kubernetes Engine (GKE) focused on running AI workloads. The enhancements include GKE Agent Sandbox, for secure agent code execution, and GKE Hypercluster, which can manage up to a million accelerator chips from a single control plane. Drew Bradstock, senior director of orchestration and Kubernetes product management, and Gari Singh, GKE group product manager, highlighted the critical role Kubernetes plays in the AI era:
“Kubernetes has rapidly become the operating system for the AI era, with GKE now powering AI workloads for all of our top 50 customers on the platform, including the largest frontier model builders.”
This perspective aligns with broader industry trends: multi-agent AI workflows have reportedly surged 327% in recent months, and according to CNCF data, 66% of organizations now depend on Kubernetes to power their generative AI applications and agents, establishing it as a backbone for modern AI initiatives.
Introducing GKE Agent Sandbox
GKE Agent Sandbox addresses untrusted code execution. Built on gVisor, the kernel-level isolation technology that also secures Google’s Gemini, it is designed to create around 300 sandboxes per second with sub-second latency, and Google claims up to 30% better price-performance when running on its Axion Arm-based processors compared to other hyperscale clouds.
Agent Sandbox initially launched as a subproject under Kubernetes SIG Apps at KubeCon NA 2025. It introduces three Kubernetes primitives: Sandbox (the core workload resource), SandboxTemplate (the security blueprint), and SandboxClaim (used by higher-level frameworks such as ADK or LangChain to request execution environments). Warm pools of pre-provisioned pods cut cold-start latency to under one second.
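For a sense of what the primitives look like in practice, here is a minimal sketch that creates a Sandbox custom resource with the official Kubernetes Python client. The API group and version ("agents.x-k8s.io/v1alpha1") and the spec field names are assumptions for illustration; the actual schema lives in the upstream agent-sandbox project.

```python
# Minimal sketch: creating a Sandbox custom resource with the official
# Kubernetes Python client. The API group/version and field names below
# are assumptions for illustration; consult the agent-sandbox project
# for the real schema.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster
api = client.CustomObjectsApi()

sandbox = {
    "apiVersion": "agents.x-k8s.io/v1alpha1",  # assumed group/version
    "kind": "Sandbox",
    "metadata": {"name": "untrusted-tool-run", "namespace": "agents"},
    "spec": {
        # Reference a SandboxTemplate that pins the gVisor runtime and
        # resource limits (field name assumed for illustration).
        "templateRef": {"name": "gvisor-default"},
    },
}

api.create_namespaced_custom_object(
    group="agents.x-k8s.io",
    version="v1alpha1",
    namespace="agents",
    plural="sandboxes",
    body=sandbox,
)
```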
Companies like Lovable, which supports over 200,000 AI-generated projects daily, are already reaping the benefits of the Agent Sandbox. Co-founder Fabian Hedin noted:
“GKE’s cutting-edge sandboxing capabilities allow us to reliably scale to hundreds of secure sandboxes per second, ensuring we can seamlessly empower builders, even during massive, unpredictable demand.”
Competition in the Agent Sandbox Space
The emergence of GKE Agent Sandbox has intensified competition in the agent-sandbox space. Cloudflare recently brought its Sandboxes to general availability, using container-based isolation on its edge network alongside V8 isolate-based Dynamic Workers for lighter workloads, while E2B relies on Firecracker microVMs. Notably, GKE Agent Sandbox is the only native agent sandbox offering among the three major hyperscalers.
Google’s overarching strategy positions Kubernetes itself as the agent runtime, with gVisor providing open-source isolation rather than a proprietary feature. That openness is a key differentiator: any Kubernetes cluster can run Agent Sandbox, not just GKE.
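Because the isolation comes from gVisor rather than proprietary machinery, any cluster with the runsc runtime installed can apply it per pod via a RuntimeClass. The sketch below uses the Kubernetes Python client; on GKE Sandbox node pools the runtime class is named gvisor, while self-managed clusters may use a different name depending on how containerd was configured.

```python
# Sketch: isolate a pod under gVisor by setting its runtime class.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="isolated-agent"),
    spec=client.V1PodSpec(
        runtime_class_name="gvisor",  # routes the pod to the runsc runtime
        containers=[
            client.V1Container(
                name="agent",
                image="python:3.12-slim",
                command=["python", "-c", "print('running under gVisor')"],
            )
        ],
        restart_policy="Never",
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```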
Scaling with GKE Hypercluster
GKE Hypercluster, now in private GA, addresses a different scaling challenge. As AI training demands grow, organizations often end up managing fragmented infrastructure across many disconnected clusters, with substantial operational overhead. Hypercluster allows a single, conformant GKE control plane to manage up to one million chips across 256,000 nodes spanning multiple regions.
Security protocols leverage Google’s Titanium Intelligence Enclave, adopting a hardware-attested, no-admin-access model. This ensures proprietary model weights and prompts remain cryptographically sealed from platform administrators, addressing escalating security concerns in AI development.
As Alex Gkiouros, a Google Cloud Ambassador and staff architect, pointed out, managing a million chips across regions requires careful consideration of blast radius and change management.
Enhancements in Inference Performance
GKE is also shipping improvements aimed at inference performance. Predictive Latency Boost in the GKE Inference Gateway uses machine-learning-driven routing to cut time-to-first-token (TTFT) latency by up to 70%, replacing heuristic methods with real-time, capacity-aware scheduling. It is built on the llm-d framework, which recently became an official CNCF Sandbox project.
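Google has not published the details of the Predictive Latency Boost model, but the underlying idea, routing on a predicted time-to-first-token rather than a simple load heuristic, can be sketched in a few lines. The replica fields and coefficients below are illustrative assumptions, not the gateway's actual inputs.

```python
# Toy version of prediction-driven routing: score each model-server
# replica by predicted TTFT instead of least-connections.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    queue_depth: int        # requests waiting
    kv_cache_util: float    # 0.0-1.0, fraction of KV cache in use
    base_ttft_ms: float     # measured TTFT at idle

def predicted_ttft_ms(r: Replica) -> float:
    # Toy linear model: queueing delay plus a penalty as the KV cache
    # fills up. A real system would learn these coefficients from
    # live telemetry.
    return r.base_ttft_ms + 40.0 * r.queue_depth + 300.0 * r.kv_cache_util

def route(replicas: list[Replica]) -> Replica:
    return min(replicas, key=predicted_ttft_ms)

replicas = [
    Replica("vllm-0", queue_depth=3, kv_cache_util=0.9, base_ttft_ms=120),
    Replica("vllm-1", queue_depth=5, kv_cache_util=0.2, base_ttft_ms=120),
]
print(route(replicas).name)  # picks vllm-1: deeper queue, but free KV cache
```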
Google has also introduced automatic KV Cache storage tiering that spans RAM, Local SSD, and Google Cloud Storage. This addresses long-context memory bottlenecks and is reported to deliver up to a 50% throughput gain for 10K-token prompts offloaded to RAM and nearly 70% for 50K-token prompts routed through Local SSD.
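The tiering pattern itself is straightforward to sketch: consult the hottest tier first, fall back to colder tiers, and promote on a hit. The toy class below stands in plain dictionaries for Local SSD and a Cloud Storage bucket; it illustrates the idea, not GKE's implementation.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy three-tier KV cache: RAM -> local SSD -> object storage."""

    def __init__(self, ram_capacity: int):
        self.ram: OrderedDict[str, bytes] = OrderedDict()
        self.ram_capacity = ram_capacity
        self.ssd: dict[str, bytes] = {}   # stand-in for Local SSD files
        self.gcs: dict[str, bytes] = {}   # stand-in for a GCS bucket

    def get(self, prefix_hash: str) -> bytes | None:
        if prefix_hash in self.ram:            # hottest tier: hit in RAM
            self.ram.move_to_end(prefix_hash)  # refresh LRU position
            return self.ram[prefix_hash]
        for tier in (self.ssd, self.gcs):      # colder tiers, in order
            if prefix_hash in tier:
                self._promote(prefix_hash, tier.pop(prefix_hash))
                return self.ram[prefix_hash]
        return None                            # miss: recompute the prefix

    def _promote(self, key: str, blocks: bytes) -> None:
        # Promote to RAM, demoting the least-recently-used entry to SSD.
        if len(self.ram) >= self.ram_capacity:
            old_key, old_blocks = self.ram.popitem(last=False)
            self.ssd[old_key] = old_blocks
        self.ram[key] = blocks
```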
Additional Feature Enhancements
Among other updates, GKE has rolled out an RL Scheduler to optimize reinforcement learning workloads and an RL Sandbox for kernel-isolated reward evaluation. Perhaps most notably, intent-based autoscaling on custom metrics can reduce Horizontal Pod Autoscaler (HPA) reaction times from 25 seconds to 5 seconds by sourcing metrics directly from pods instead of relying on external monitoring stacks.
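GKE's exact pod-metrics plumbing is not spelled out in the announcement, but the pod-side half of the pattern, exposing a load signal directly from the workload so an autoscaler can scrape it without an external monitoring hop, might look like the following sketch using the prometheus_client library; the metric name and values are illustrative.

```python
# Pod-side sketch: publish a load signal straight from the workload.
import random
import time

from prometheus_client import Gauge, start_http_server

inflight = Gauge("agent_inflight_requests",
                 "Requests currently being served")  # illustrative metric

start_http_server(9090)  # scraped at http://<pod-ip>:9090/metrics
while True:
    inflight.set(random.randint(0, 50))  # stand-in for real bookkeeping
    time.sleep(1)
```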