Managed Agents in the Gemini API: What Google Launched, What It Changes, and How to Ship a Real Agent

By John Cronin

2026-05-20

At Google I/O 2025, Google rolled out a set of updates to Google AI Studio and the Gemini API that quietly changed what it takes to ship a production agent. The headline is managed agents in the Gemini API, with built-in connections into the Google ecosystem and an authoring path inside AI Studio that lets a working team move from sketch to deployed agent in a single afternoon. The launch matters because most of the heavy lifting that used to fall to an engineering team is now part of the platform, and the bar for getting a real agent into the hands of real users has dropped meaningfully.

This piece walks through what was launched, what it changes about the practical work of building an agent, which parts of the job still fall on the operator, and how a Google Gemini partner like ProvenROI fits in for companies that want to move quickly without spending six months learning the platform.

What Google Actually Launched

The announcement covered a lot of ground, and the relevant parts for teams building agents fall into a few buckets that are worth understanding on their own terms before getting into the implications.

The first bucket is the managed agent runtime inside the Gemini API. Until this launch, building an agent on top of Gemini meant orchestrating the model calls, the tool calls, the state, the retries, the streaming, the error handling, the safety guardrails, and the deployment yourself. The managed agent runtime handles most of that for you. You define the agent's goal, the tools it has access to, the policies it operates under, and the surfaces it serves, and the runtime handles the orchestration. The engineering team is freed to focus on the parts that actually differentiate the agent, which are the design, the tools, the data, and the user experience.

The second bucket is the Google ecosystem integrations. Gemini agents now have first-class connections into Google Workspace, including Gmail, Calendar, Drive, Docs, Sheets, and Meet, as well as into Google Search, Google Maps, YouTube, and the broader set of Google data surfaces that most companies are already using. The integrations are not a bag of API keys you stitch together yourself. They are part of the managed runtime, with authentication, scopes, and rate limits handled by the platform.

The third bucket is the authoring experience in Google AI Studio. The Studio now supports an agent project type that combines the model selection, the system instructions, the tool configuration, the integration setup, the test runs, the evaluation suite, and the deployment in a single workspace. A team can build, iterate, and ship an agent without ever leaving the Studio, and the same project produces both the AI Studio prototype and the production API endpoint.

The fourth bucket is the evaluation and observability layer. The managed agent runtime captures structured traces of every agent run, with the tool calls, the intermediate reasoning, the latency, the cost, and the outcome available for review. The evaluation suite supports both automated evaluation against a defined rubric and human evaluation with a built-in review queue. Observability is the part of the launch that operators with production experience tend to notice first, because it solves a problem that most teams used to handle with a tangle of custom logging.

The fifth bucket is the deployment and scaling story. A Gemini agent built in AI Studio can be deployed to a managed endpoint with a single click, served at the latency and throughput Google's infrastructure provides, and scaled without the engineering team having to think about the runtime. The agent can also be embedded into Google surfaces directly, including Gmail, Docs, and the broader Workspace, with the embedding handled by the platform.

The combination of those five buckets is the thing that matters. Each on its own would be a useful incremental improvement. Together they change the amount of effort it takes to ship a serious agent from a multi-quarter engineering investment to a few weeks of focused design and integration work.

What Managed Agents Change About the Practical Work

For teams that have built agents the hard way over the past two years, the shift to managed agents in the Gemini API is going to feel like a substantial change in what the work looks like day to day. A few of the changes are worth calling out specifically.

Orchestration is now the platform's job. The loop of model call, tool call, state update, error handling, retry, and response synthesis used to be the bulk of the engineering work. The managed runtime handles all of it. The team's attention shifts to designing the agent rather than building the runtime that runs it.
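To make the scale of that shift concrete, here is a minimal sketch of the kind of loop teams used to write by hand. It is an illustration under stated assumptions, not the Gemini API: `ModelTurn`, `run_agent`, `scripted_model`, and `lookup_meeting` are hypothetical names, and the model is a scripted stand-in so the example runs without any network call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

History = List[Tuple[str, str]]  # (role, content) pairs kept as agent state


@dataclass
class ModelTurn:
    """One model response: either a tool request or a final answer."""
    tool_name: Optional[str] = None
    tool_args: Dict[str, str] = field(default_factory=dict)
    final_text: Optional[str] = None


def run_agent(
    model: Callable[[History], ModelTurn],
    tools: Dict[str, Callable[..., str]],
    user_message: str,
    max_steps: int = 5,
    max_retries: int = 2,
) -> str:
    """Hand-rolled loop: model call, tool call, state update, retry, answer."""
    history: History = [("user", user_message)]
    for _ in range(max_steps):
        turn = model(history)                       # model call
        if turn.final_text is not None:
            return turn.final_text                  # response synthesis
        tool = tools.get(turn.tool_name or "")
        if tool is None:
            history.append(("tool_error", f"unknown tool: {turn.tool_name}"))
            continue
        for attempt in range(max_retries + 1):      # retry transient failures
            try:
                history.append(("tool", tool(**turn.tool_args)))  # state update
                break
            except Exception as exc:                # error handling
                if attempt == max_retries:
                    history.append(("tool_error", str(exc)))
    return "Stopped: step limit reached."


# Hypothetical stand-ins so the sketch runs on its own.
calls = {"count": 0}


def lookup_meeting(meeting_id: str) -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("transient failure")     # first call fails once
    return f"notes for {meeting_id}"


def scripted_model(history: History) -> ModelTurn:
    if not any(role == "tool" for role, _ in history):
        return ModelTurn(tool_name="lookup_meeting", tool_args={"meeting_id": "m-1"})
    return ModelTurn(final_text=f"Draft based on: {history[-1][1]}")


answer = run_agent(scripted_model, {"lookup_meeting": lookup_meeting}, "Write a follow up")
print(answer)
```

Even this toy version has to decide how many steps to allow, which failures to retry, and what to record in the history. Those are the decisions the article describes the managed runtime as taking over.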

Authentication and authorization for tool access are handled by the platform. The old pattern of stitching together OAuth flows, service accounts, and scope management for every tool the agent uses was one of the most fragile parts of building a Workspace-integrated agent. The new pattern is to declare the integrations you want and let the platform handle the auth, with the consent flow surfaced to the end user in a clean, Google-native way.

Evaluation is part of the platform rather than a separate build. The evaluation suite supports both rubric-based automated evaluation and a queue for human review, with the runs sampled in production and the results fed back into the iteration loop. Teams that used to spend weeks building an evaluation harness can now spend that time building the rubric and the test cases that actually matter.
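The rubric is the part the team still writes, so it helps to see how small one can be. The sketch below is a hypothetical illustration of rubric-based scoring with a human review fallback. `Criterion`, `score`, `triage`, the three checks, and the 0.75 threshold are all assumptions made for the example, not the platform's evaluation format.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Criterion:
    """One rubric line: a name, a pass/fail check, and a weight."""
    name: str
    check: Callable[[str], bool]
    weight: int


def score(output: str, rubric: List[Criterion]) -> float:
    """Weighted share of rubric criteria that the output satisfies."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total


def triage(
    outputs: List[str], rubric: List[Criterion], pass_threshold: float = 0.75
) -> Tuple[List[str], List[str]]:
    """Split outputs into auto-passed ones and ones routed to human review."""
    passed: List[str] = []
    review_queue: List[str] = []
    for text in outputs:
        if score(text, rubric) >= pass_threshold:
            passed.append(text)
        else:
            review_queue.append(text)
    return passed, review_queue


# Hypothetical rubric for a follow-up note.
RUBRIC = [
    Criterion("names a next step", lambda t: "next step" in t.lower(), 2),
    Criterion("opens with a greeting", lambda t: t.startswith("Hi "), 1),
    Criterion("stays under 120 words", lambda t: len(t.split()) <= 120, 1),
]

good = "Hi Dana, thanks for meeting today. Next step: I will send the pricing sheet by Friday."
weak = "Thanks for the call."

passed, review_queue = triage([good, weak], RUBRIC)
print(score(good, RUBRIC), score(weak, RUBRIC))
```

String checks like these are deliberately crude. In practice a rubric line might be graded by a model or a reviewer, but the shape stays the same: explicit criteria, explicit weights, and a threshold below which a human looks at the output.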

Observability is built in. Every run is captured with the full trace, the tool calls, the latency, the cost, and the outcome, and the trace is queryable from the Studio and the API. The teams that used to wire up custom logging, custom tracing, and custom cost attribution can now start with the platform layer and add the custom pieces only where they are genuinely needed.
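As a rough picture of what a queryable trace buys you, the sketch below models a run record with the fields named above and rolls a handful of runs up into the numbers an operator would watch. The `RunTrace` schema, the tool names, and the figures are invented for illustration. The platform's actual trace format is not described in this article.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunTrace:
    """Hypothetical per-run record: tool calls, latency, cost, outcome."""
    run_id: str
    tool_calls: List[str]
    latency_ms: int
    cost_usd: float
    outcome: str  # "success" or "failure"


def summarize(traces: List[RunTrace]) -> Dict[str, float]:
    """Roll runs up into success rate, average latency, and total cost."""
    n = len(traces)
    if n == 0:
        return {"runs": 0, "success_rate": 0.0, "avg_latency_ms": 0.0, "total_cost_usd": 0.0}
    return {
        "runs": n,
        "success_rate": sum(t.outcome == "success" for t in traces) / n,
        "avg_latency_ms": sum(t.latency_ms for t in traces) / n,
        "total_cost_usd": round(sum(t.cost_usd for t in traces), 4),
    }


def tool_usage(traces: List[RunTrace]) -> Dict[str, int]:
    """Count how often each tool was called across all runs."""
    return dict(Counter(name for t in traces for name in t.tool_calls))


traces = [
    RunTrace("r1", ["gmail.read", "calendar.read"], 800, 0.25, "success"),
    RunTrace("r2", ["gmail.read"], 1200, 0.5, "success"),
    RunTrace("r3", ["drive.read", "gmail.read"], 1000, 0.25, "success"),
    RunTrace("r4", ["calendar.read"], 3000, 1.0, "failure"),
]

summary = summarize(traces)
usage = tool_usage(traces)
print(summary)
print(usage)
```

This is the kind of aggregation teams used to assemble from custom logs. With structured traces available from the platform, the custom work can be limited to the metrics that are specific to the business.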

Deployment is a click rather than a project. The deploy pipeline that used to take a sprint to set up is now part of the Studio. The engineering team can focus on the agent design and the integration logic rather than on the deploy infrastructure that gets the agent in front of users.

The cumulative effect is that the time from idea to shipped agent for a team that knows the platform is now measured in weeks rather than quarters. For a team that does not yet know the platform, the learning curve is the main remaining cost, and the curve is steep enough that a credible Google Gemini partner can be the difference between a fast win and a long detour.

The Google Ecosystem Integrations in Practice

The integration story is one of the parts of the launch that benefits from a concrete walkthrough, because the abstract pitch of "deep Google ecosystem integration" can hide how much actual work the platform is doing on the team's behalf.

Take a sales follow-up agent as a worked example. The agent's job is to read a sales rep's email and calendar, identify the meetings that need a follow-up note, draft the note based on the meeting notes and any related Drive documents, surface the draft for review, and send the note once the rep approves it. Building that agent the hard way involves OAuth flows for Gmail, Calendar, and Drive, a state store to track which meetings have already been handled, a queue for drafts pending approval, a UI for the rep to review and edit drafts, and a deploy pipeline to keep the whole thing running. The team that builds it the hard way is looking at a few months of work even before the agent's actual prompt design is in good shape.

Building the same agent with the managed agent runtime is a different shape of work. The integrations for Gmail, Calendar, and Drive are declared in the agent configuration. The state and the approval queue are handled by the runtime. The review interface can be the Studio's built-in review queue or a Workspace embed surfaced inside Gmail itself, depending on where the rep wants to do the work. The deploy is a click. The engineering work that remains is the prompt design, the tool selection, the rubric for what makes a good follow-up note, and the integration with whatever proprietary systems the team uses that are not Google-native. The few months of plumbing become a few weeks of design and tuning.
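For a sense of the plumbing in that worked example, here is a minimal sketch of the two pieces of bookkeeping the hard way requires: a record of which meetings have been handled, and a queue that holds drafts until the rep approves them. `Draft` and `FollowUpQueue` are hypothetical names, everything lives in memory, and nothing here reflects how the managed runtime implements its own state or approval handling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Draft:
    meeting_id: str
    body: str
    status: str = "pending"  # pending -> approved -> sent


@dataclass
class FollowUpQueue:
    """In-memory stand-in for the state store and the approval queue."""
    handled: Set[str] = field(default_factory=set)
    drafts: Dict[str, Draft] = field(default_factory=dict)
    outbox: List[str] = field(default_factory=list)

    def propose(self, meeting_id: str, body: str) -> bool:
        """Queue a draft once per meeting; return False for repeats."""
        if meeting_id in self.handled:
            return False
        self.handled.add(meeting_id)
        self.drafts[meeting_id] = Draft(meeting_id, body)
        return True

    def approve(self, meeting_id: str, edited_body: Optional[str] = None) -> None:
        """Mark a draft approved, keeping any edits the rep made."""
        draft = self.drafts[meeting_id]
        if edited_body is not None:
            draft.body = edited_body
        draft.status = "approved"

    def send_approved(self) -> int:
        """Send only approved drafts; pending ones stay in the queue."""
        sent = 0
        for draft in self.drafts.values():
            if draft.status == "approved":
                self.outbox.append(draft.body)  # stand-in for the real send
                draft.status = "sent"
                sent += 1
        return sent


queue = FollowUpQueue()
first = queue.propose("m-1", "Thanks for your time today.")
duplicate = queue.propose("m-1", "Thanks again.")
second = queue.propose("m-2", "Great to meet the team.")
sent_before_approval = queue.send_approved()
queue.approve("m-1", edited_body="Thanks for your time today. Pricing sheet attached.")
sent_after_approval = queue.send_approved()
print(queue.outbox)
```

A production version would also need persistence, concurrency control, and a review UI, which is where the months go. The design choice worth keeping in either world is the gate itself: nothing leaves the outbox without an explicit approval.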

The pattern repeats across most of the agent use cases that companies actually want to ship. A meeting prep agent that reads the calendar, pulls the relevant Drive documents, summarizes the relevant prior emails, and surfaces a one-page brief before the meeting starts. A research assistant agent that runs a Google Search query, pulls the relevant results, synthesizes a structured answer, and saves the answer back into Drive for later reference. A customer success agent that watches for specific signals in Gmail and Calendar, pulls the related Sheets data, and surfaces a prioritized action list. Each of those agents used to be a quarter of engineering work. Each is now a few weeks if the design is sharp and the team knows the platform.

The Google-native integration also makes the agents more useful in the surfaces where the work actually happens. A follow-up agent that can surface its drafts inside Gmail is more useful than the same agent that requires the rep to log into a separate tool to review the drafts. A meeting prep agent that can land its brief in the Calendar event is more useful than the same brief delivered by email. The platform's ability to embed the agent into the Google surfaces is one of the parts of the launch that pays off in adoption, because it removes the friction of switching tools to get the value.

Where the Work Is Still on the Operator

The managed agent runtime is genuinely a leap forward, and it is also not the entire job. The design of the agent itself is still on the team, including what it does, what it does not do, the tools it has, the policies it operates under, and the way it handles ambiguity. The teams that ship the best agents spend more time on design than on implementation, and the managed runtime makes that ratio even more pronounced. The rubric for evaluating the agent is on the team too. The platform provides the evaluation infrastructure, but the team defines what good looks like, what failure modes to watch for, and what the threshold is for shipping.

The integration with the team's proprietary systems is on the team. The Google ecosystem integrations cover a lot of ground, but most companies have a meaningful set of internal systems that are not part of the Google stack, and the agent needs to reach for them to be useful. The change management for getting the agent adopted is on the team too. Shipping is half the work. The other half is getting the users to actually use it, which involves training, documentation, feedback loops, and the soft work of building trust. And the ongoing program of monitoring, evaluating, and improving the agent is on the team. The agents that compound value over time are the ones that are operated as a program rather than as a project.

The Bar for a Production-Ready Agent

The phrase "production ready" gets used loosely in the agent space, and it is worth being specific about what it means for a Gemini agent shipped into a company's day-to-day operations. A production-ready agent has a clearly defined goal that maps to a business outcome, with the success criteria explicit and the team able to answer what the agent is for in a single sentence. It has tools scoped to the minimum set it needs, with the temptation to grant broad access resisted because narrow scopes are easier to evaluate and leave less room for misbehavior. It has a tested set of policies for ambiguity, error, and edge cases written into the system instructions. It has an evaluation suite that runs continuously and a human review queue for the cases the rubric does not cover. And it has a named operator who watches the metrics, works the queue, and drives the iteration for the life of the agent.

The managed runtime makes hitting that bar more achievable than it used to be, but the bar itself has not changed. A managed agent that ships without a clear goal, scoped tools, tested policies, an evaluation suite, and an owner is no more production ready than the same agent built the hard way.
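The criteria in this section are concrete enough to write down as a pre-launch check. The sketch below is one hypothetical way to do that. `AgentSpec`, `readiness_gaps`, the cap of five tools, and the crude single-sentence test are assumptions chosen for illustration, not thresholds from Google or from any platform feature.

```python
from dataclasses import dataclass, field
from typing import List

# Policy topics named in the text above.
REQUIRED_POLICY_TOPICS = ("ambiguity", "error", "edge cases")


@dataclass
class AgentSpec:
    """Hypothetical summary of an agent, used only for this check."""
    goal: str = ""
    tools: List[str] = field(default_factory=list)
    policies: List[str] = field(default_factory=list)
    has_eval_suite: bool = False
    has_review_queue: bool = False
    operator: str = ""


def readiness_gaps(spec: AgentSpec, max_tools: int = 5) -> List[str]:
    """Return the list of unmet criteria; an empty list means no gaps found."""
    gaps: List[str] = []
    # Crude single-sentence test: exactly one non-empty chunk between periods.
    sentences = [s for s in spec.goal.split(".") if s.strip()]
    if len(sentences) != 1:
        gaps.append("goal: state it in a single sentence")
    if not spec.tools or len(spec.tools) > max_tools:  # max_tools is an assumed cap
        gaps.append("tools: scope to the minimum set")
    missing = [t for t in REQUIRED_POLICY_TOPICS if t not in spec.policies]
    if missing:
        gaps.append("policies: missing " + ", ".join(missing))
    if not spec.has_eval_suite:
        gaps.append("evaluation: no continuous evaluation suite")
    if not spec.has_review_queue:
        gaps.append("review: no human review queue")
    if not spec.operator.strip():
        gaps.append("operator: no named owner")
    return gaps


draft_spec = AgentSpec(
    goal="Help with sales.",
    tools=[f"tool_{i}" for i in range(8)],
    policies=["ambiguity"],
)
ready_spec = AgentSpec(
    goal="Draft follow-up notes for sales meetings within one business day.",
    tools=["gmail", "calendar", "drive"],
    policies=["ambiguity", "error", "edge cases"],
    has_eval_suite=True,
    has_review_queue=True,
    operator="Sales operations lead",
)

draft_gaps = readiness_gaps(draft_spec)
ready_gaps = readiness_gaps(ready_spec)
print(draft_gaps)
```

A check like this cannot tell whether the goal is the right one or whether the policies are any good. It only makes sure nobody ships with one of the items missing entirely, which is the failure the paragraph above warns about.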

Common Patterns and Anti-Patterns

Patterns that work tend to repeat across the teams that ship Gemini agents successfully. Start narrow with a single agent that has a single job and let it succeed before expanding. Wire the agent into the surfaces the user already uses, so it surfaces inside Gmail or Calendar rather than living in a separate app the user has to remember to visit. Invest in the evaluation rubric in the first week so the model and tool decisions are made against a clear picture of what good looks like. Run the evaluation suite from the first day rather than retrofitting it later. Treat the agent as a program with a named operator and a weekly cadence rather than as a project that ships and is forgotten.

Anti-patterns recur too. Giving the agent access to every tool because access is cheap pays the cost later in failure modes and evaluation work. Skipping the system instruction discipline because the model is smart enough to figure it out produces agents that surprise the operator. Shipping without a human review queue because the evaluation suite is in place misses the cases the rubric does not cover, which are usually the ones that matter most. Treating the launch as the finish line rather than the start of the operating program leaves the agent to decay as the world around it changes. Building in isolation from the people who will use the agent produces something that misses the workflow they actually have.

What to Look For in a Google Gemini Partner

The platform's improvements have lowered the bar enough that more teams can credibly build their own Gemini agents than could a year ago. For the teams that want to move faster than the in-house learning curve allows, or that want a partner who has already run the playbook on similar problems, picking the right Google Gemini partner is one of the higher-leverage decisions in the program.

A credible partner has built and shipped Gemini agents in production, ideally across more than one client and more than one use case. The ability to point at agents that are actually in production, with the business outcomes they have produced, is the most direct evidence of capability. A partner whose case studies are demos and pilots rather than shipped programs is a partner whose capability is still mostly theoretical.

A credible partner has a clear methodology that covers the design, the evaluation, the deployment, and the ongoing operation. The methodology is the muscle that makes the work repeatable across clients. A partner whose process is "we figure it out as we go" tends to take longer and produce more variable results than a partner whose process is the product of a few dozen prior engagements.

A credible partner has integrated experience across the Google stack. The Gemini API is one piece of the picture. The Google Workspace integrations, the Google Cloud infrastructure, and the broader Google ecosystem matter for any agent that is going to live inside a Google-native workflow. A partner who knows Gemini in isolation tends to deliver agents that work in isolation. A partner who knows the full stack tends to deliver agents that fit the company.

A credible partner has the design and product muscle to shape the agent, not just the engineering muscle to build it. The engineering work is now mostly the platform's job. The remaining work is the design of what the agent does, how it behaves, and where it fits in the user's day. A partner whose strength is purely engineering tends to deliver agents that are technically sound and operationally weak. A partner who can shape the agent end to end tends to deliver agents that get adopted.

A credible partner has the evaluation discipline that the platform now enables. The evaluation suite is part of the platform, and a partner that does not use it well is a partner whose agents will drift in production. The honest test is to ask the partner to walk through the evaluation rubric and the review process for a recent client engagement and to see whether the answer is detailed and specific or vague and aspirational.

A credible partner has the change management muscle to get the agent adopted by the end users. The launch of the agent is the start of the value, not the end. A partner that ships and disappears tends to leave the client with an agent that gathers dust. A partner that stays through the adoption work tends to leave the client with an agent that actually changes how the work gets done.

A credible partner has honest reporting. The cadence of weekly operating reviews, monthly leadership reporting, and quarterly business reviews against the original goal is the discipline that proves the work was real. A partner whose reporting is all highlight reel and no flat months is a partner whose data deserves scrutiny.

How ProvenROI Works as a Google Gemini Partner

ProvenROI's approach to Gemini agent work follows the same discipline the firm applies to its other engagements, with the specifics adapted to the shape of agent programs. The starting question is always what business outcome the agent is supposed to produce, with the answer baselined in metrics the leadership team uses to make investment decisions.

The engagement typically starts with a focused diagnostic. The team works with the client to identify the candidate agent use cases, scores them against the criteria of business value, technical feasibility, and adoption likelihood, and recommends a starting point that is narrow enough to ship in a quarter and meaningful enough to justify the investment. The diagnostic produces a small set of clear artifacts that the leadership team uses to commit to the build phase.
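A scoring pass like the one described can be as simple as a weighted sum. The sketch below is a generic illustration and not ProvenROI's actual scoring model. The 1 to 5 scale, the equal default weights, the candidate names, and the scores are all invented for the example.

```python
from typing import Dict, List, Optional, Tuple

# The three criteria named in the diagnostic description.
CRITERIA = ("business_value", "technical_feasibility", "adoption_likelihood")


def rank_use_cases(
    candidates: Dict[str, Dict[str, int]],
    weights: Optional[Dict[str, int]] = None,
) -> List[Tuple[str, int]]:
    """Rank candidates by weighted sum of 1-5 scores, highest first."""
    if weights is None:
        weights = {criterion: 1 for criterion in CRITERIA}  # assumed equal weights
    scored = [
        (name, sum(weights[c] * scores[c] for c in CRITERIA))
        for name, scores in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates and scores.
candidates = {
    "research_assistant": {"business_value": 4, "technical_feasibility": 3, "adoption_likelihood": 2},
    "sales_follow_up": {"business_value": 5, "technical_feasibility": 4, "adoption_likelihood": 4},
    "meeting_prep": {"business_value": 3, "technical_feasibility": 5, "adoption_likelihood": 4},
}

ranking = rank_use_cases(candidates)
value_weighted = rank_use_cases(
    candidates,
    weights={"business_value": 2, "technical_feasibility": 1, "adoption_likelihood": 1},
)
print(ranking)
```

The arithmetic is trivial on purpose. The value of the exercise is in forcing the scores to be written down and argued over before the build phase is committed.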

The build phase follows the managed agent runtime's natural shape, with the design work done in Google AI Studio, the integration work done against the Google ecosystem connectors and the client's proprietary systems, the evaluation rubric defined in the first week and run from the first day, and the deployment done through the Studio's managed endpoint or embedded into the Google surfaces where the users already work. The build phase typically runs a quarter and ends with the agent in production and the operating program standing up around it.

The operating phase is the part that compounds. The agent is monitored on the evaluation suite, the review queue is worked, the regressions are addressed, the new failure modes are folded into the rubric, and the agent is improved on the cadence the operating cycle calls for. The reporting to the leadership team is honest. A flat month is a flat month, with the diagnosis and the recommended adjustment. The agent's evaluation results, the user adoption metrics, the business outcome metrics, and the cost are reported alongside each other on the cadence the leadership team can use to make investment decisions, and the trust that compounds from that reporting is what makes the engagement durable across the year and beyond.

How to Decide Whether to Build, Partner, or Both

Building in-house tends to make sense for companies with a clear set of agent use cases on the roadmap, an engineering team with the bandwidth to learn the platform and run the operating program, and the patience to let the first agent take longer in exchange for the institutional knowledge that builds. The in-house path is slower at the start and cheaper at scale. Partnering tends to make sense for companies that want to move quickly on a high-value use case without waiting for the internal learning curve, that have a leadership commitment to ship the first agent in a quarter rather than a year, or that want a credible partner to run the operating program while the internal team focuses on other work.

The hybrid model is often the strongest answer. A partner ships the first agent and stands up the operating program. The internal team works alongside the partner through the build, absorbing the methodology and the platform expertise. The partner stays on the operating program through the first year or transitions to an advisory role once the internal team is ready to take it over. The next agent is built by the internal team with the partner as a sounding board. The model gives the company the speed of partnership on the first agent and the cost structure of in-house ownership on the agents that follow.

What the Launch Means for the Broader Agent Market

The bar for shipping a useful agent has dropped meaningfully, and the companies that were waiting for the platform layer to mature now have less reason to wait. The next 12 months are likely to see a wave of production agents shipped by teams that would not have attempted the work a year ago, and the companies that move first inside their categories will build advantages that compound. Differentiation is shifting from the engineering of the agent to the design of it, because the platforms are converging on a similar shape of managed runtime, ecosystem integration, evaluation, and deployment. The Google ecosystem advantage is real for companies already running on Google Workspace or Google Cloud, and the role of the partner is shifting from building the runtime to bringing the design discipline, evaluation rigor, change management, and operating program that the platform does not provide.

The Bottom Line

Managed agents in the Gemini API are a real step forward for the teams that build agents and for the companies that want to use them. The orchestration, the integrations, the evaluation, the observability, and the deployment are now part of the platform, and the time from idea to shipped agent has dropped from quarters to weeks for teams that know the platform.

The work that remains is the work that always mattered most. The design of what the agent does. The rubric for what good looks like. The integration with the company's proprietary systems. The change management that gets the agent adopted. The ongoing program that keeps it valuable over time. The platform has not eliminated that work. It has cleared the path so the teams can focus on it.

For companies that are evaluating whether to build, to partner, or to do both, the right answer depends on the roadmap, the internal capability, the value of the first agent, and the long-term role of agents in the business. A credible Google Gemini partner can shorten the time to a working agent and bring the design and evaluation discipline that the platform does not provide. A strong internal team can build the long-term capability that compounds. A hybrid model often captures the best of both.

ProvenROI works with companies on the design, build, and operating phases of Gemini agent programs, with the same discipline of measurement, integration, and honest reporting that defines its other engagements. If the launch of managed agents in the Gemini API has put an agent program on your roadmap and you want a partner who has already run the playbook, that is the conversation to have. If the path forward is in-house, the same principles still apply, and the discipline is the part that proves the work was real.