🟡 The Rise of Agent Bricks

How Databricks’ Agent Bricks is redefining AI product strategy through feedback loops, evals, and learning systems, and how you can use it.

Jul 09, 2025

In a recent Databricks demo I saw, a seemingly simple question was asked:

“Does our Spicy Mango formula exceed the sugar surge cap?”

The AI assistant paused, scanned the R&D PDFs, and responded poorly. It didn’t understand “sugar surge cap,” a domain-specific term used by the product development team to describe when sweetness overpowers flavor nuance.

It wasn’t a hallucination. It was a mismatch between context and comprehension. The agent had the data, but not the business domain understanding.

But then something happened.

The user updated the context and gave feedback in plain English, explaining what the sugar surge cap meant and where to look for it (grams per 12 oz can, compared to the archetype flavor index). On the next attempt, the agent got the answer right, citing source docs and offering a flavor profile suggestion. For me, this speaks to the power of RAG across business domains when the context is right.

It was interesting to see an AI system learn a real business rule in natural language and adapt live.

And it wasn’t a hack. It was baked into a new product called Agent Bricks.

Databricks just leveled up. Its new platform, Agent Bricks, introduces a powerful way to build multiple agents or intelligent systems: ones that reason over data, evaluate their own performance, optimize tradeoffs, and improve through natural language feedback.

It’s not just another AI tool. It’s an end-to-end intelligent product stack and a clear signal that the agent era is here.

If you’re building with AI, this isn’t a side note. It’s your blueprint for the future.

What is Agent Bricks?

At a high level, Agent Bricks is Databricks’ new framework for building production-ready agents with:

Declarative task descriptions (e.g., “answer R&D questions about our energy drink formulas”)
Attached proprietary data (documents, databases, vector search, APIs)
Auto-generated evals that judge the agent’s performance specific to your task (No need to be an eval expert)
Cost-performance optimization, so you don’t have to manually experiment with 5 models. You get visual trade-offs between different models and setups, letting you pick the optimal path based on your task, budget, and constraints.
Feedback loops that let you steer the agent using natural language, not code

If you’ve built agents manually, you know how much this simplifies.

If you haven’t, it’s what most RAG-based chatbots are missing: judgment, feedback, and learning.

Databricks’ new Agent Bricks system isn’t a single tool. It’s an orchestrated platform for building, evaluating, optimizing, and deploying production-ready agents. It abstracts away complexity but lets teams stay in control.

And most importantly, it’s not static: Agent Bricks supports learning over time.
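
To make the "declare the task, attach data, steer with feedback" idea concrete, here is a minimal Python sketch. It is not the Agent Bricks API: `AgentSpec`, `add_feedback`, `build_context`, and the data source names are all hypothetical, chosen only to show how a plain-English correction could be stored and folded back into an agent's context.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Hypothetical declarative agent description (not the Agent Bricks API)."""
    task: str
    data_sources: list[str]
    feedback_notes: list[str] = field(default_factory=list)

    def add_feedback(self, note: str) -> None:
        # Store a plain-English correction so later runs can use it.
        self.feedback_notes.append(note.strip())

    def build_context(self) -> str:
        # Fold the task, data sources, and accumulated feedback into one string.
        lines = [
            f"Task: {self.task}",
            "Data sources: " + ", ".join(self.data_sources),
        ]
        if self.feedback_notes:
            lines.append("Domain guidance from users:")
            lines.extend(f"- {note}" for note in self.feedback_notes)
        return "\n".join(lines)


spec = AgentSpec(
    task="Answer R&D questions about our energy drink formulas",
    data_sources=["rd_pdfs", "formula_database"],  # placeholder names
)
spec.add_feedback(
    "'Sugar surge cap' means grams of sugar per 12 oz can, "
    "compared to the archetype flavor index."
)
print(spec.build_context())
```

The point of the sketch is the shape, not the mechanics: the task and the feedback are both data, so the system can keep them, version them, and reuse them on the next attempt.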

The Intelligent Product Angle

Here’s what makes Agent Bricks exciting for intelligent product builders: it’s not just about building a bot. It’s about building systems, or groups of agents, that learn, reason independently or together, and align with your domain.

Databricks brings in a core principle of intelligent product strategy:

Evaluation + Data + Feedback = Learning System

That formula means:

You’re no longer hardcoding prompts or relying on hope
Your agents can improve through natural language guidance
RAG isn’t the ceiling; it’s a starting point

Agent Bricks makes the shift from retrieval-based QA to action-oriented supervised agents that can pull from internal tools, vector search, structured databases, and each other.

It’s RAG → Agents → Swarms.
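
As a rough illustration of the "agents that pull from each other" step, here is a toy supervisor in Python. Everything in it is an assumption made for illustration: the `Specialist` class, the keyword routing, and the canned answers are stand-ins. A real supervisor would more likely use an LLM to decide which agents to call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Specialist:
    name: str
    keywords: set[str]
    answer: Callable[[str], str]  # stand-in for a real agent call


def supervise(question: str, specialists: list[Specialist]) -> dict[str, str]:
    """Route to specialists whose keywords match; ask everyone if none match."""
    words = set(question.lower().replace("?", "").split())
    matched = [s for s in specialists if s.keywords & words]
    chosen = matched or specialists
    return {s.name: s.answer(question) for s in chosen}


specialists = [
    Specialist("rd", {"formula", "sugar"},
               lambda q: "R&D view: check the formula docs."),
    Specialist("finance", {"cost", "margin"},
               lambda q: "Finance view: check unit margins."),
    Specialist("marketing", {"launch", "campaign"},
               lambda q: "Marketing view: check demand signals."),
]

# Matches "margin" and "formula", so only R&D and finance are asked.
print(supervise("What is the margin on the new formula?", specialists))

# No keyword matches, so the supervisor falls back to asking every specialist.
print(supervise("What should we do next?", specialists))
```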

What makes this different?

The insight is that LLMs are better evaluators than we builders are.

We’ve spent months trying to improve outputs through prompt tweaking, chaining, and function calling. I remember early PoC projects where I spent weeks fine-tuning models, only to find that the output still felt generic, shaped by what the model had been trained on from the internet. The model wasn’t misunderstanding the question; it just didn’t grasp the context behind our data or how it was used. What Databricks realized is that you can get better agents by focusing not just on creation, but on how you evaluate and optimize outputs over time.

This changes things:

You describe the agent’s task.
Agent Bricks creates evals for that task (e.g., “accuracy of referencing product catalog”).
It tries multiple approaches, like prompt tuning, tool use, and agent composition.
It shows you a cost vs quality tradeoff curve. You choose.
You give it natural language feedback when it fails.
It adjusts everything from retrieval filters to tool configs automatically.

With this, we are moving from prompt engineering to systemic optimization.
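
The cost vs quality tradeoff step can be sketched as a Pareto frontier: keep only the configurations that no other configuration beats on both cost and quality, then pick the best one you can afford. The candidate names and numbers below are made up for illustration, and `pareto_frontier` and `best_within_budget` are hypothetical helpers, not something Databricks has published.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    cost_per_1k_queries: float  # illustrative dollars, not real pricing
    quality: float              # eval score in [0, 1], made up


def pareto_frontier(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate beats on both cost and quality."""
    frontier = []
    for c in candidates:
        dominated = any(
            o.cost_per_1k_queries <= c.cost_per_1k_queries
            and o.quality >= c.quality
            and (o.cost_per_1k_queries < c.cost_per_1k_queries
                 or o.quality > c.quality)
            for o in candidates
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.cost_per_1k_queries)


def best_within_budget(candidates: list[Candidate],
                       budget: float) -> Optional[Candidate]:
    """Highest-quality frontier candidate whose cost fits the budget."""
    affordable = [c for c in pareto_frontier(candidates)
                  if c.cost_per_1k_queries <= budget]
    return max(affordable, key=lambda c: c.quality) if affordable else None


candidates = [
    Candidate("small model + prompt tuning", 2.0, 0.71),
    Candidate("small model + tools", 3.5, 0.80),
    Candidate("large model + prompt tuning", 9.0, 0.78),
    Candidate("large model + tools", 12.0, 0.90),
]

for c in pareto_frontier(candidates):
    print(c)
print("Pick under a budget of 5.0:", best_within_budget(candidates, 5.0))
```

In this made-up data, the large model with prompt tuning drops off the curve because the small model with tools is both cheaper and better. That is the kind of decision the curve makes visible.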

Three Real Use Cases That Matter

Databricks didn’t just talk theory. It shared real enterprise examples that make the advancement of this unified platform clear.

MasterCard

Built an internal onboarding assistant that reduced product setup time by 30%
85% of escalations now handled at first touch
Used Q&A data and feedback loops to continuously improve

Energy Drink Company (Live Demo)

Created an R&D knowledge assistant from PDFs
Used automated evaluation to detect failures
Upgraded performance with natural language feedback
Then orchestrated agents from marketing, finance, and R&D into a multi-agent supervisor that answered “What product should we launch next?” with dev timelines and launch risks included

Internal Agent Fleets

Databricks and JP Morgan talked about how they now deploy agents across teams
Use cases include onboarding, analytics assistants, financial workflows, and system design agents

A Framework You Can Use Today

Even without Agent Bricks, here’s how you can bring this approach into your own products or systems.

Intelligent Agent Loop

Think like a product manager, not a prompt engineer.

Define the Task, Not the Output
- Don’t just say “summarize this document.”
- Say: “Identify contract risks that require legal approval under $50K clauses.”
Attach Domain-Specific Data
- Don’t rely on base model knowledge.
- Use internal docs, API responses, and customer logs to build a high-quality context window.
Create Evaluation Criteria
- Not “Did the model respond?”
- But: “Was the correct policy applied to this edge case?”
Run Cost-Quality Tests Across Approaches
- Try open-source + vector vs hosted LLM + metadata filter
- Benchmark across inputs, not just once
Add Feedback Loops That Learn
- Let users give natural language corrections
- Feed that into re-ranking, retrieval filters, or prompt updates
Track and Trace
- Use observability tools like LangSmith, Phoenix, or MLflow 3.0 to observe, not just debug
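
Here is one way the evaluation and feedback steps of this loop could look in plain Python, with no platform at all. It is a sketch under loud assumptions: `make_agent` is a toy stand-in for an LLM call that answers from a dictionary, and the eval cases are invented around the sugar surge cap example from the demo.

```python
from typing import Callable

# Each case pairs a question with a task-specific check on the answer.
EvalCase = tuple[str, Callable[[str], bool]]


def pass_rate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases whose answer satisfies its check."""
    if not cases:
        return 0.0
    passed = sum(1 for question, check in cases if check(agent(question)))
    return passed / len(cases)


def make_agent(guidance: dict[str, str]) -> Callable[[str], str]:
    """Toy stand-in for an LLM call: answers from a guidance dictionary."""
    def agent(question: str) -> str:
        for term, definition in guidance.items():
            if term in question.lower():
                return definition
        return "I don't know."
    return agent


# Criteria ask "was the right rule applied?", not "did the model respond?".
cases: list[EvalCase] = [
    ("What is the sugar surge cap?", lambda a: "12 oz" in a),
    ("How is the sugar surge cap measured?", lambda a: "grams" in a.lower()),
]

guidance: dict[str, str] = {}
before = pass_rate(make_agent(guidance), cases)

# A user's natural-language correction becomes new guidance.
guidance["sugar surge cap"] = (
    "Measured in grams of sugar per 12 oz can, "
    "compared to the archetype flavor index."
)
after = pass_rate(make_agent(guidance), cases)

print(f"pass rate before feedback: {before:.0%}, after: {after:.0%}")
```

Running the same cases before and after each change is what turns feedback into something you can measure instead of something you hope worked.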

Where It’s Headed

The term “agent” is starting to lose meaning. It is used to describe everything from a chat window to a multistep planning AI. But what Databricks did with Agent Bricks is clarify the operating system underneath.

Not just a wrapper for LLMs.

But a production framework that:

Treats learning as a product cycle
Embeds evaluation into deployment (AI Judge)
Accepts feedback like a teammate (Natural language)
Optimizes across cost, context, and confidence

It’s still early. But it’s the clearest expression we’ve seen of what intelligent products actually look like in the wild.

What This Means for Builders

This changes how we think about AI productization.

What used to take weeks of LLM ops and DevOps can now be declared in natural language and governed through traceable feedback.

And because it plugs into Unity Catalog, Vector Search, Genie, and the Mosaic LLMs, you can own the stack without leaving Databricks.

My Strategic POV

If you’re in product, AI, or data leadership and you’re asking:

“How do we get AI into production responsibly and scalably?”

Agent Bricks offers an important answer, not because it solves everything, but because it reframes the question.

It’s about building systems that learn the business.

And that’s what intelligent product strategy is all about.

“We’re witnessing a shift from AI features to AI teams (teams of agents).
If you’ve been wondering how to get AI into production responsibly, this is it, since it builds in Unity Catalog governance.”

I believe Agent Bricks is one of the first platforms that will let data-rich teams do that without reinventing everything. I can’t wait to have this fully rolled out.
