AI Agents in Practice: Insights, Challenges and What We've Learned

As AI evolves from static LLMs (Large Language Models) into more dynamic, reasoning-driven systems, we kept running into a familiar question in our own work: at what point does a language model stop being "just a model" and become something we can actually rely on in production?

Instead of merely generating content, AI agents can reason, use tools, take actions, remember context and even collaborate with other agents. This represents a significant shift from traditional chatbots and marks a new phase in how we design AI-powered systems.

In this article, “we” refers to our engineering team within the Digital Platform tribe, where we experiment with AI agents and multi-agent architectures to build practical, production-oriented solutions. The insights and trade-offs discussed are drawn directly from hands-on experiments and prototypes developed during this work.

What Exactly Is an AI Agent?

You can think of an AI agent as a smart assistant with hands, eyes and a plan. A regular LLM only talks. An agent can act, look things up, use APIs, follow workflows and decide what to do next.

In simple terms:

An AI agent is a system in which a language model interacts with its environment to accomplish a goal by reasoning, planning and taking actions.

Instead of just telling you what flight to book, imagine an agent that helps you prepare a weekend trip: It checks the weather, suggests outfits, plans transport and books activities, all without you orchestrating each step.

Key Components of an AI Agent

1. Model (the Brain)

The LLM interprets user input, reasons about the problem and decides the next step. It is the thinking part.

2. Tools (the Hands)

Tools let the agent reach the outside world through databases, APIs, search systems, computation and file access. Without tools, a model is like a brilliant person locked in a room with no internet.

Tools are essential for bridging the gap between the model and the real world.

3. Orchestration Layer (the Nervous System)

This layer manages the agent’s “thinking loop” and lifecycle:

  • Reading context
  • Deciding the next action
  • Remembering useful details
  • Calling tools
  • Deciding when the task is complete

It coordinates how the agent processes information and acts on it.
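The thinking loop above can be sketched in plain Python. This is a toy illustration, not a real SDK: `decide` stands in for the model, and the weather tool and city name are made up.

```python
# Minimal agent "thinking loop" sketch: no real LLM or SDK involved.
# `decide` stands in for the model; the tool and its output are hypothetical.

def decide(context: list[str]) -> dict:
    """Toy stand-in for the model: pick the next action from the context."""
    if not any(line.startswith("weather:") for line in context):
        return {"action": "call_tool", "tool": "get_weather", "arg": "Utrecht"}
    return {"action": "finish", "answer": context[-1]}

def get_weather(city: str) -> str:
    return f"weather: sunny in {city}"  # fake tool result

TOOLS = {"get_weather": get_weather}

def run_agent(goal: str, max_steps: int = 5) -> str:
    context = [f"goal: {goal}"]              # 1. read context
    for _ in range(max_steps):
        step = decide(context)               # 2. decide the next action
        if step["action"] == "call_tool":
            result = TOOLS[step["tool"]](step["arg"])  # 3. call the tool
            context.append(result)           # 4. remember useful details
        else:
            return step["answer"]            # 5. decide the task is complete
    return "gave up"

print(run_agent("plan a weekend trip"))
```

A real orchestration layer replaces `decide` with a model call and adds error handling, but the loop structure is the same.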

AI agent model

Many people confuse LLMs with agents, but the model alone is not an agent: an LLM by itself has no real-world access and cannot act beyond its training data. The combination of tools, memory and orchestration turns a model into an agent.

From One to Many: Multi-Agent Systems

Single-Agent Systems

A single agent, equipped with tools and instructions, handles workflows end-to-end. This works well for simple or narrow tasks.

But single agents often face challenges:

  • Too many tools -> confusion and wrong tool calls
  • Complex prompts -> fragile behavior
  • Limited context -> decreasing accuracy
  • One generalist agent -> no specialization
  • Hard-to-maintain if/else logic buried inside the prompt

Multi-Agent Systems

Instead of one overloaded agent trying to do everything, tasks are distributed across multiple specialized agents. Think of it as a well-coordinated team of experts rather than one overworked superhero. Each agent focuses on what it does best, and together they achieve complex goals more effectively.

Supervisor Agent model

Benefits:

  • Agents become specialists -> better accuracy and more reliable outcomes
  • Modular and easy to scale -> add more agents as the system grows
  • Natural division of labor -> tasks are distributed logically across agents
  • Efficient coordination -> communication frameworks enable effective agent collaboration
  • More robust overall system (when designed well) -> reduces the risk of system-wide failure

By orchestrating specialized agents, multi-agent systems are able to handle complex tasks that require diverse expertise, such as reasoning, creativity, data retrieval and compliance. This makes them more robust, adaptive and collaborative than single-agent architectures, enabling real-world problem-solving.
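The supervisor pattern can be illustrated with a few lines of Python. The specialists here are plain functions and the routing is a keyword lookup; in a real system each specialist is an LLM agent with its own prompt and tools, and the supervisor is itself a model. All names are illustrative.

```python
# Toy supervisor that routes a query to specialized agents by keyword.
# Agents are plain functions standing in for LLM-backed specialists.

def weather_agent(query: str) -> str:
    return "Forecast: mild and dry this weekend."

def booking_agent(query: str) -> str:
    return "Booked: two museum tickets for Saturday."

SPECIALISTS = {
    "weather": weather_agent,
    "book": booking_agent,
}

def supervisor(query: str) -> str:
    """Delegate to the first specialist whose keyword matches the query."""
    for keyword, agent in SPECIALISTS.items():
        if keyword in query.lower():
            return agent(query)
    return "No suitable specialist found."

print(supervisor("Can you book tickets for us?"))
```

The point of the sketch is the division of labor: the supervisor only routes, and each specialist owns one narrow task.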

Connected Agents in Azure AI Foundry

Azure AI Foundry is a unified platform to build and manage agents with built-in models, tools, frameworks and governance. We used its Agent service in our experiments because it removes infrastructure overhead and makes it easy to prototype and iterate on agent behavior quickly.

Connected agents in Azure AI Foundry allow you to build multi-agent setups without custom orchestrators or hand-coded routing logic.

A primary agent – also known as a supervisor, coordinator or main agent – orchestrates the workflow and delegates tasks to sub-agents using natural language.

Connected Agents in Foundry Portal

Key Features

  • Flexible setup: Build agents via no-code UI in Azure AI Foundry or programmatically using the Python SDK
  • Flexible model selection: Use Azure OpenAI models or others like Llama, Mistral and Cohere
  • No custom orchestration: The main agent acts as a coordinator and routes tasks based on natural language
  • Built-in tools: Includes retrieval tools, code interpreter and integrations with Bing Search, Azure AI Search, Azure Functions, etc.
  • Tool calling: The service automatically runs the model, invokes tools, tracks their execution and returns results
  • Secure state management: Conversations and data are stored in threads, so developers do not need to manage state manually

The model identifies user intent and selects tools dynamically based on context and availability. Tools come in two categories:

  • Knowledge tools: Provide relevant knowledge or enhanced context – examples include Azure AI Search, File Search, Grounding with Bing Search and SharePoint
  • Action tools: Execute tasks or perform actions – examples include Code Interpreter, Azure Functions and Azure Logic Apps

Knowledge Tools in Foundry Portal
Action Tools in Foundry Portal

Limitations

  • Only one instance per tool is supported
  • Some tools can only be integrated via SDK, which requires code-based setup
  • Tool-specific instructions cannot be set directly
  • Orchestration is implicit:
    • The developer cannot fully control the exact order of agent invocation
    • All orchestration decisions belong to the main agent -> works well but is not always aligned with our desired outcomes
  • Citation behavior inconsistent:
    • No guarantee citations are preserved across agents
    • Prompt engineering may improve it, but results can still vary
  • Limited observability into what is happening behind the scenes

Connected Agents are quick to set up, great for experimentation with multi-agents and provide an accessible no/low-code entry point. But the workflow is nondeterministic, and orchestration happens mostly behind the scenes.

For our use case, we needed more control, transparency and reliability, so we moved toward a code-based multi-agent workflow.

Microsoft Agent Framework

To overcome the limitations of Connected Agents, we moved to the Microsoft Agent Framework, which is an open-source and developer-friendly approach for building customizable multi-agent workflows.

Workflows enable agents to follow a predefined order of execution, allowing us to orchestrate agent collaboration explicitly. This gives more control and clarity than relying solely on a primary agent or prompt engineering.

Multi-agents: Group Chat Orchestration

Why This Framework Is Powerful

It gives developers explicit control over multi-agent execution:

  • Clear workflow definitions: Predefined execution order avoids unpredictable orchestration and makes reasoning flows fully transparent
  • Modularity: Workflows can be split into small, reusable components that are easier to update and maintain without touching the whole system
  • Built-in orchestration patterns: Choose from sequential, concurrent, group-chat, hand-off and Magentic multi-agent patterns
  • Full observability with OpenTelemetry: Every agent action, tool invocation and orchestration step is automatically logged and traceable
  • Developer UI: A ready-to-use web interface for demos and debugging with a single line of code – a huge accelerator for rapid iteration
  • Human-in-the-loop: Tasks can be configured to require human approval where needed
  • Secure execution: Agents run inside Azure AI Foundry with built-in RBAC, safety controls and private data handling
  • Open standards support: Integrates with MCP (Model Context Protocol) and A2A (Agent-to-Agent) protocols to discover and use external tools and agents

Limitations

  • Still in public preview, so occasional inconsistencies may appear
  • Agent-supported models are limited
  • Not yet recommended for running large-scale production GenAI solutions

Overall, early experimentation has been very promising. The Microsoft Agent Framework has a higher barrier to entry but offers full flexibility and explicit control. It enables developers to customize workflows, define agent behavior explicitly and build deterministic orchestration. While it requires more development effort, it supports scalability, precision and enterprise-grade reliability.

Challenges We Faced

Several recurring pain points surfaced during our experiments:

  • The supervisor agent sometimes dilutes or misinterprets sub-agent outputs
  • The supervisor agent does not reliably trigger sub-agents in low-code environments
  • Sub-agents occasionally outperform the supervisor agent
  • Tool calls may be skipped or misused
  • Low-code environments hide orchestration details
  • Delegation issues can lead to retries, errors or incorrect routing

Insights & Lessons Learned

Ask First: Do You Really Need an Agent?

Not every problem needs an agentic solution.

Sometimes an AI agent is simply not needed and may even perform worse than a straightforward approach. In such cases, using AI agents can introduce extra uncertainty, latency and cost.

Use an agent when:

  • Tasks are dynamic, ambiguous and multi-step
  • Tool access is essential
  • The exact sequence of steps is unknown
  • Decisions depend heavily on context

Avoid agents when:

  • Rules are fixed
  • Outputs are deterministic
  • Workflows are well structured
  • Standard LLM calls or functions are enough

Start Simple, Then Iterate

  • Begin with one agent, minimal tools, and clear instructions
  • Scale only once the basics work reliably

Single Agent vs. Multi-Agent

A single generalist agent can work for simple cases, but it becomes fragile when the use case grows in scope or requires diverse skills. Instead of relying on one agent with a large prompt and many tools, distribute the work across multiple specialized agents. Each agent focuses on a specific task, which leads to clearer roles, fewer errors and more stable performance.

Prompt Design Matters

System prompts define how an agent behaves throughout a conversation.

Effective prompts:

  • Define scope, role, task and boundaries
  • Include examples with types of input the agent will receive and the expected outputs
  • Remain consistent across all agents
  • Follow the single-responsibility principle
  • Allow agents to say “I do not know”, so that they do not hallucinate or act outside their scope

Good prompt design is essential because small wording changes can lead to major shifts in behavior, which means careful prompt engineering is required for consistent results.

For comprehensive guidance, refer to the official resource on prompt engineering techniques.
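A system prompt that follows the checklist above might look like the sketch below. The role, scope, example I/O and the exact "I do not know" wording are all illustrative, not a recommended template.

```python
# Sketch of a system prompt covering scope, role, task, boundaries,
# an input/output example and an explicit "I do not know" escape hatch.
# All wording is illustrative.

SYSTEM_PROMPT = """\
Role: You are a travel-booking specialist agent.
Scope: Only answer questions about booking flights and hotels.
Task: Turn the user's request into a structured booking proposal.
Boundaries: If the request is outside booking, reply exactly:
  "I do not know, please ask the supervisor agent."

Example input:  "Find me a hotel in Lisbon for two nights."
Example output: {"type": "hotel", "city": "Lisbon", "nights": 2}
"""

# The out-of-scope escape hatch keeps the agent from hallucinating answers.
assert "I do not know" in SYSTEM_PROMPT
print(SYSTEM_PROMPT.splitlines()[0])
```

Keeping every specialist's prompt in this shape also makes the single-responsibility boundary easy to review.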

Picking the Right Model

Do not automatically choose the strongest or newest model.

Choose based on:

  • Cost
  • Latency
  • Reasoning needs
  • Tool call reliability

Use the AI Foundry Model Catalog to guide model selection.

Model Catalog in Foundry Portal

Choose the Right Orchestration Pattern

Well-designed systems are modular, like LEGO blocks. This makes them easier to maintain, reason about and scale.

  • Choose an orchestration pattern that fits the use case (sequential, concurrent, handoff, hybrid)
  • Design agents as independent components that combine into a larger whole
  • Version orchestration changes to keep track of updates and simplify debugging
  • Ensure the orchestration layer handles coordination, task prioritization and conflict resolution
  • Avoid overly complex patterns when a simple sequential or concurrent flow is sufficient
  • Reading tasks are inherently more parallelizable than writing tasks
  • Creating multiple agents is easy; enabling them to work together reliably is the real challenge

Multi-agent Orchestration Patterns
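The difference between sequential and concurrent patterns can be shown with two toy agents (plain functions here, standing in for LLM-backed agents):

```python
# Two orchestration patterns over the same toy agents: sequential
# (each agent refines the previous output) and concurrent (agents run
# independently and results are merged). Agent names are illustrative.

from concurrent.futures import ThreadPoolExecutor

def draft_agent(text: str) -> str:
    return text + " -> draft"

def review_agent(text: str) -> str:
    return text + " -> reviewed"

def run_sequential(task: str, agents) -> str:
    result = task
    for agent in agents:                 # strict, predictable order
        result = agent(result)
    return result

def run_concurrent(task: str, agents) -> list[str]:
    with ThreadPoolExecutor() as pool:   # independent work in parallel
        return list(pool.map(lambda agent: agent(task), agents))

print(run_sequential("task", [draft_agent, review_agent]))
print(run_concurrent("task", [draft_agent, review_agent]))
```

Sequential fits pipelines where each step depends on the last; concurrent fits independent read-heavy tasks, which is why reading work parallelizes better than writing work.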

Tools

As the number of tools and data sources grows, predictable behavior becomes more difficult. Tool reliability depends on model behavior, prompting and system design.

  • Assign only the tools an agent truly needs
  • Provide explicit guidance on when and how tools should be used
  • Use tool invocation logs for interpretability and observability
  • Expect occasional tool-call failures and design around them
  • Model differences matter; some models follow tool-use instructions more reliably than others
  • Keep tool sets focused to avoid unnecessary routing complexity
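Two of the bullets above, per-tool guidance and invocation logging, can be combined in a small registry sketch. The decorator, tool name and guidance text are all hypothetical:

```python
# Sketch: a tool registry that attaches usage guidance to each tool and
# logs every invocation for observability. Names are illustrative.

import time

TOOL_LOG: list[dict] = []

def logged_tool(name: str, guidance: str):
    """Decorator: attach guidance and log each call with args and timing."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TOOL_LOG.append({
                "tool": name,
                "args": args,
                "seconds": time.perf_counter() - start,
            })
            return result
        inner.guidance = guidance  # surfaced to the model in the tool spec
        return inner
    return wrap

@logged_tool("search_docs", guidance="Use only for internal documentation questions.")
def search_docs(query: str) -> str:
    return f"results for {query!r}"

search_docs("onboarding checklist")
print(TOOL_LOG[0]["tool"])  # every call leaves an audit trail
```

The log gives the interpretability mentioned above: when a tool call is skipped or misused, the trace shows exactly what was invoked and when.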

Memory

  • Local memory often works better than global memory
  • Not all agents need full chat history; give only the amount of context the agent needs (full, summarized or none)
  • Reducing unnecessary context helps with latency, token usage and consistency
  • Controlled context flow keeps multi-agent systems more stable
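The "full, summarized or none" idea can be sketched as a context builder. The summarization here is a naive truncation; a real system would summarize with a model:

```python
# Sketch: give each agent only the context it needs. "summarized" is a
# crude last-two-turns truncation standing in for a real summarizer.

def build_context(history: list[str], mode: str) -> list[str]:
    if mode == "full":
        return history                      # complete chat history
    if mode == "summarized":
        return ["(earlier turns omitted)"] + history[-2:]
    return []                               # "none": stateless specialist

history = [f"turn {i}" for i in range(10)]
print(len(build_context(history, "full")))        # all ten turns
print(len(build_context(history, "summarized")))  # marker plus last two
print(build_context(history, "none"))             # empty context
```

Trimming context this way is where the latency, token-usage and consistency gains mentioned above come from.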

Continuous Evaluation

Agent systems require iterative testing and refinement.

  • Begin with a small evaluation set and expand gradually
  • Test with messy, multilingual, real-world inputs to simulate actual user behavior
  • Evaluate each agent independently, then test the full end-to-end flow to ensure routing and collaboration work correctly
  • Compare how a specialized agent and a supervisor agent handle the same query
  • Define metrics for routing accuracy, tool use, reasoning quality, and output correctness
  • Keep human-in-the-loop where appropriate
  • Collect user feedback and integrate it into improvements
  • Monitor and version all changes
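A small evaluation set for routing accuracy, one of the metrics listed above, can be as simple as the sketch below. The keyword router stands in for the real supervisor agent, and the queries are made up:

```python
# Sketch: a tiny labeled evaluation set for routing accuracy.
# `route` is a keyword stub standing in for the supervisor agent.

def route(query: str) -> str:
    return "weather" if "weather" in query.lower() else "booking"

EVAL_SET = [
    ("What is the weather in Utrecht?", "weather"),
    ("Please book a table for two", "booking"),
    ("Weather forecast for Sunday?", "weather"),
]

def routing_accuracy(cases) -> float:
    hits = sum(route(query) == expected for query, expected in cases)
    return hits / len(cases)

print(f"routing accuracy: {routing_accuracy(EVAL_SET):.0%}")
```

Starting with a handful of labeled cases like this and expanding gradually makes regressions visible every time the prompts or the model change.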

Latency

  • Reasoning, retrieval and tool calls substantially contribute to delays
  • Optimizing each step is essential for a smooth user experience
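To see which stage dominates, it helps to time each step of an agent turn separately. In this sketch the stages are simulated with sleeps, so only the timing structure is real:

```python
# Sketch: time each stage of an agent turn separately so it is clear
# whether retrieval, reasoning or tool calls dominate latency.

import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    start = time.perf_counter()
    yield
    timings[stage] = time.perf_counter() - start

with timed("retrieval"):
    time.sleep(0.01)   # stand-in for a vector search
with timed("reasoning"):
    time.sleep(0.05)   # stand-in for a model call

slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest}")
```

In production the same idea is better served by the OpenTelemetry tracing mentioned earlier, but a per-stage breakdown is the first step either way.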

Key Takeaways

  • Avoid over-engineering, and use agentic solutions only when they add real value
  • Understand both the capabilities and limitations of agents
  • Guardrails are essential to keep the agent on track
  • Always evaluate outputs critically
  • There is no single best approach; experiment with and compare multiple approaches
  • After experimentation, ask whether agents truly outperform existing solutions or standard LLM approaches
  • Safety, reliability and responsible design matter
  • Current agent workflows are promising but not yet stable and reliable enough for production

Guardrails minimize risks by keeping the system aligned with its original scope, vision and design. Think of them as safety barriers: They prevent your agent from going off course.

Conclusion

AI agents are rapidly becoming part of everyday digital experiences, supporting automation, decision-making and complex workflows across industries. To build them effectively, teams need a clear understanding of their strengths, limitations and design principles that guide responsible use.

While we are still at the first stages of this journey, one thing is already clear from our experiments: Agentic systems hold significant potential, but they only excel when supported by thoughtful engineering, solid guardrails and continuous evaluation.

About the author

  • Buse Yalcinkaya Demir, Data Scientist/Engineer
Buse started her career as a Data Scientist in the Big Data & Analytics team and later expanded her skills into data engineering, DevOps and (Gen)AI development. Over the past four years, she has contributed to the Digital Platform tribe, building and experimenting with ML and GenAI solutions that improve the developer lifecycle and engineers’ experience. Beyond her engineering role, Buse leads the GenAI community at Rabobank, supporting various initiatives and connecting people to foster collaboration, knowledge sharing, and responsible GenAI adoption. She is passionate about continuous learning, innovation, and empowering others through technology.