Extending the Backend of Business Central AI Agents with Azure AI Foundry

In this Areopa Academy webinar (#114), Artem Chernevskiy and Katerina Chernevskaya walk through how to extend the backend of Business Central AI Agents using Azure AI Foundry. The session covers the motivation for going beyond built-in capabilities, two live demos—one using the Azure AI Foundry Agents portal and one using Prompt Flow with RAG—plus a look at a custom Power Apps evaluation tool built for real-world supervisor use cases.

When Built-In AI Capabilities Aren’t Enough

Artem opens by framing the central problem: built-in Business Central AI agents handle the majority of common tasks, but they fall short when organizations need custom logic (such as complex pricing rules or discount engines), industry-specific compliance rules (legal wording, invoice processing), or data from outside Business Central such as CRM systems, Microsoft Fabric, or SharePoint. Security and compliance requirements, particularly relevant in the EU because of the AI Act, add further constraints that generic agents cannot address.

Artem categorizes AI agents into three tiers: simple information agents grounded on company data, task-based agents that can update databases and trigger business flows, and advanced autonomous agents that operate independently with memory across multiple steps. The more sophisticated the requirement, the more customization is needed.

Slide: Why Built-In Isn't Always Enough, listing custom logic, industry-specific scenarios, external data context, security/compliance needs, and multi-modal requests as reasons to extend beyond native BC AI capabilities

📖 Docs: What is Microsoft Foundry Agent Service? — Overview of the Azure AI Foundry Agent Service, including supported tools, model catalog access, and SDK/REST integration options.

Four Primary Considerations for Agent Development

Artem presents a framework for thinking about custom AI agents built on four pillars: Knowledge (providing the agent with the right context and data), Actions (defining what the agent can do, from database queries to computer-use), Security (controlling data access, audit logging, and compliance), and Evaluation (ensuring the agent doesn’t just work, but works correctly and accurately).

He emphasizes that evaluation is distinct from testing. Testing checks whether a system runs; evaluation measures the quality of outputs—accuracy, groundedness, coherence—which is the harder and more important problem in production AI deployments.

Slide: Successfully Developing AI Agents Requires 4 Primary Considerations (Knowledge, Actions, Security, and Evaluation), with Knowledge circled and annotated

Navigating the Microsoft Framework Landscape

With Microsoft offering multiple overlapping tools for agent development, the session includes a structured comparison. The Microsoft stack (Prompt Flow, Copilot Studio, Semantic Kernel, and the Agent Framework) excels in enterprise integration, governance, and Azure-native orchestration. Microsoft Research's AutoGen is suited to research and multi-agent experimentation but is not recommended for production systems at scale.

Third-party options like LangChain, Haystack, Flowise, and CrewAI each fill specific niches, from high-flexibility custom LLM pipelines to quick MVP prototyping. For this webinar, the focus is on the Azure AI Foundry Agent Framework (Artem’s demo) and Prompt Flow (Katerina’s demo).

Comparison table of AI frameworks including Azure AI Foundry/Prompt Flow, Copilot Studio, Semantic Kernel, LangChain, AutoGen, Haystack, Flowise, and CrewAI, comparing provider, focus, core features, architecture style, dev effort, and best use cases

Demo Part 1: Building an Agent in Azure AI Foundry (Artem)

Artem demonstrates the no-code path to creating an agent in the Azure AI Foundry portal. Starting from the Agents section, he creates a new agent, sets the system prompt (“Act as professional sales analytics”), and selects a model deployment from the catalog. The model catalog contains over 125 models from providers including OpenAI, xAI, DeepSeek, and Microsoft—making it straightforward to swap models or deploy fine-tuned variants.

Azure AI Foundry portal showing the model catalog with 125 available models including GPT-5, o3-pro, o1, Phi-4-mini-reasoning, grok-3-mini, DeepSeek, Sora, and others

For knowledge, he uploads a PDF containing sample sales pipeline data. He clarifies an important distinction: unstructured documents (Word, PDF) can be uploaded directly to the agent’s knowledge store, but tabular data (Excel, CSV) should instead be attached through the Code Interpreter action. Code Interpreter writes and executes Python code in a sandboxed Azure environment, enabling accurate numerical analysis rather than relying on the LLM to reason over raw tables.
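To make the Code Interpreter point concrete, here is a minimal sketch of the kind of Python such a tool might generate for a "top salesperson" question. The column names, sample rows, and function names are all hypothetical; the webinar's actual file contents and generated code are not reproduced here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for an uploaded CSV/Excel export.
SAMPLE_CSV = """salesperson,deal,amount
Alice,Deal A,12000
Bob,Deal B,8000
Alice,Deal C,5000
Carol,Deal D,15000
"""


def total_by_salesperson(csv_text: str) -> dict[str, float]:
    """Sum deal amounts per salesperson from CSV text."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["salesperson"]] += float(row["amount"])
    return dict(totals)


def best_salesperson(csv_text: str) -> tuple[str, float]:
    """Return the salesperson with the highest total and that total."""
    totals = total_by_salesperson(csv_text)
    name = max(totals, key=totals.get)
    return name, totals[name]


top_name, top_total = best_salesperson(SAMPLE_CSV)
print(f"{top_name}: {top_total:.2f}")
```

The totals come from executed arithmetic rather than from the model estimating over raw rows, which is the reason for routing tabular data through Code Interpreter.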

Azure AI Foundry Agents interface showing the file upload dialog for adding knowledge to an agent, with supported file types listed and a Sales PDF being uploaded to a new vector store

In the playground, Artem queries “Who is the best salesperson?” The agent uses Code Interpreter to analyze the uploaded sales data and returns Lena Powels as the top performer, with supporting evidence from the data. He then uses View Run Info to inspect the full thread log: what tools were called, which documents were retrieved, what code was executed, and how long each step took. This explainability layer is a key differentiator of the Agent Framework compared with plain chat completions.

Azure AI Foundry Agents playground showing the response to 'Who is the best sales person?', identifying Lena Powels based on code-interpreted analysis of uploaded sales data

To connect the agent to an external system like Business Central, Artem shows the Create Trigger feature, which provisions an Azure Logic App. The Logic App must be configured manually with the appropriate connector parameters and authentication, but it provides a standard integration path for triggering the agent from BC events or other workflows.

Demo Part 2: Prompt Flow with RAG (Katerina)

Katerina takes over to demonstrate Prompt Flow, the better fit when you need fine-grained control over the data retrieval and generation pipeline. She begins by describing the required Azure resource stack: an Azure Blob Storage account (to store and organize documentation in folders), Azure AI Search (to index and vectorize documents into chunks), Application Insights (for telemetry and observability), and an Azure OpenAI resource for model deployments. All of these connect into an Azure AI Foundry hub-based project.

An important technical distinction she highlights: the newer Foundry-based resource type does not include Prompt Flow. To use Prompt Flow, you must create an Azure AI Hub resource and work within an Azure AI Project nested under that hub.

📖 Docs: Prompt Flow in Microsoft Foundry portal — Covers the three flow types (Standard, Chat, Evaluation), the visual designer, compute sessions, and deployment to managed online endpoints.

Katerina walks through a pre-built Chat-type Prompt Flow designed for RAG. The flow graph includes nodes for question understanding, search query generation, Azure AI Search lookup (retrieving relevant document chunks), conditional execution (only fetch chunks if a valid search query was identified), and final answer generation via GPT-4.1. Inputs include chat_history, question, folder, and instructions; outputs include answer and search_queries.
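The control flow of such a graph can be sketched in plain Python. This is an illustrative outline only, not the flow shown in the webinar: the LLM and Azure AI Search nodes are replaced by trivial stand-in functions, and every function name, the in-memory index, and the sample texts are assumptions. Only the input and output names (chat_history, question, folder, instructions, answer, search_queries) come from the session.

```python
from dataclasses import dataclass, field


@dataclass
class FlowResult:
    answer: str
    search_queries: list[str] = field(default_factory=list)


def generate_search_queries(question: str, chat_history: list[dict]) -> list[str]:
    """Stand-in for the LLM node that turns a question into search queries."""
    cleaned = question.strip()
    # Trivial heuristic instead of an LLM call: one-word inputs yield no query.
    return [cleaned] if len(cleaned.split()) >= 2 else []


def get_chunks(queries: list[str], folder: str, index: dict[str, list[str]]) -> list[str]:
    """Stand-in for the search lookup, scoped to a single folder."""
    words = {w.lower() for q in queries for w in q.split()}
    return [c for c in index.get(folder, []) if words & set(c.lower().split())]


def generate_answer(question: str, chunks: list[str], instructions: str) -> str:
    """Stand-in for the answer-generation LLM node."""
    if not chunks:
        return "I could not find relevant documentation for that question."
    return f"{instructions} Based on {len(chunks)} chunk(s): {chunks[0]}"


def run_flow(question, chat_history, folder, instructions, index) -> FlowResult:
    queries = generate_search_queries(question, chat_history)
    # Conditional execution: only hit the search step when a query exists.
    chunks = get_chunks(queries, folder, index) if queries else []
    return FlowResult(generate_answer(question, chunks, instructions), queries)


# Hypothetical in-memory "index" keyed by folder name.
SAMPLE_INDEX = {
    "sales": ["To post a sales invoice, choose Post on the invoice page."],
    "finance": ["Close the fiscal year from the Accounting Periods page."],
}

result = run_flow("How do I post a sales invoice?", [], "sales", "Answer briefly.", SAMPLE_INDEX)
print(result.search_queries, result.answer)
```

Returning search_queries alongside the answer mirrors the flow's outputs and lets a caller inspect what was searched, not just what was answered.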

Azure AI Foundry Prompt Flow designer showing a chat-type RAG flow with nodes for question understanding, validate output, get search queries, has search queries, get chunks, get search results, answer generation, and meta telemetry log

📖 Docs: Retrieval Augmented Generation (RAG) and indexes in Microsoft Foundry — Explains how Azure AI Search integrates with Foundry to split documents into vectorized chunks and retrieve context-relevant passages for LLM grounding.

API Endpoint Deployment and Configuration

Once a Prompt Flow is ready, Katerina demonstrates deploying it to a managed online endpoint. The deployment wizard covers endpoint name, deployment name (supporting A/B testing across multiple deployments), virtual machine selection, instance count, data collection, identity, and Application Insights integration. She recommends enabling the Application Insights toggle to collect telemetry—latency, token usage, and generated responses—for ongoing monitoring.

After deployment, the Consume tab of the endpoint shows the REST API URI, primary key, and the required request format. API clients (including Business Central AL code using HttpClient) send a POST request with Content-Type: application/json, an Authorization header carrying the key as a bearer token, an azureml-model-deployment header (optional, needed only when routing to a specific deployment), and a JSON body containing all input variables defined in the flow.
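As a rough illustration of that request shape, the sketch below builds (but does not send) the POST request with Python's standard library. The URL, key, deployment name, and input values are placeholders; the real values come from the endpoint's Consume tab, and the body keys must match the inputs your own flow defines.

```python
import json
import urllib.request


def build_flow_request(endpoint_url, api_key, inputs, deployment=None):
    """Build a POST request for a deployed flow endpoint without sending it."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment:
        # Only needed when routing to one specific deployment.
        headers["azureml-model-deployment"] = deployment
    body = json.dumps(inputs).encode("utf-8")
    return urllib.request.Request(endpoint_url, data=body, headers=headers, method="POST")


request = build_flow_request(
    "https://example.invalid/score",  # placeholder URI, not a real endpoint
    "<primary-key>",                  # placeholder key
    {
        "question": "How do I post a sales invoice?",
        "chat_history": [],
        "folder": "sales",
        "instructions": "Answer briefly.",
    },
    deployment="blue",                # hypothetical deployment name
)

# To actually call the endpoint you would pass the request to
# urllib.request.urlopen(request) and parse the JSON response.
print(request.get_method(), request.full_url)
```

An AL implementation would set the same headers and JSON body through HttpClient; the sketch uses Python only to show the shape of the call.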

Evaluating Prompt Flow Performance

Katerina covers the built-in evaluation capability. To run an evaluation, you prepare a .jsonl file containing sample inputs and expected answers. Azure AI Foundry runs the flow against these inputs, compares generated answers to ground-truth answers using an LLM-as-judge approach, and scores each interaction on metrics including groundedness, coherence, fluency, and GPT similarity. Scores of 4–5 indicate high alignment; a score of 2 signals the response diverges significantly and warrants investigation of the prompt instructions or chunking strategy.
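A small sketch of the two manual steps around an evaluation run: preparing the .jsonl input and triaging the scores afterwards. The field names ("question", "ground_truth"), the sample rows, and the scores are made up for illustration; the column names you need depend on your flow's inputs and the evaluators you pick.

```python
import json

# Hypothetical evaluation rows; field names are assumptions.
EVAL_ROWS = [
    {"question": "How do I post a sales invoice?",
     "ground_truth": "Open the invoice and choose Post."},
    {"question": "Where do I close a fiscal year?",
     "ground_truth": "Use the Accounting Periods page."},
]


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def flag_low_scores(results: list[dict], threshold: int = 2) -> list[str]:
    """Return questions where any metric scored at or below the threshold."""
    return [
        r["question"]
        for r in results
        if any(score <= threshold for score in r["scores"].values())
    ]


# Made-up scores, only to exercise the helper.
SAMPLE_RESULTS = [
    {"question": "How do I post a sales invoice?",
     "scores": {"groundedness": 5, "coherence": 4, "fluency": 5}},
    {"question": "Where do I close a fiscal year?",
     "scores": {"groundedness": 2, "coherence": 4, "fluency": 4}},
]

jsonl_text = to_jsonl(EVAL_ROWS)
needs_review = flag_low_scores(SAMPLE_RESULTS)
print(needs_review)
```

Flagged questions are the ones to trace back to the prompt instructions or the chunking strategy, as described above.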

📖 Docs: Evaluation and monitoring metrics for generative AI in Azure AI Foundry — Describes built-in evaluators for quality (coherence, fluency), RAG-specific metrics (groundedness, relevance), and safety metrics (hate/unfairness, violence).

Real-World Use Case: AI Supervisor Portal in Power Apps

The final section presents a practical outcome from a real project: a customer contact center needed supervisors to evaluate AI agent responses without accessing the Azure AI Foundry portal. Artem and Katerina built a Power Apps Canvas app called the AI Supervisor Portal that lets supervisors select a predefined business case, send a question, and review the full context: the AI-generated answer, the search queries that were issued, the document chunks that were retrieved, and a human-readable version of the source material.

Supervisors can flag issues and submit feedback directly to the development team from within the app. Voice input and folder selection for scoping the RAG search are also supported.

AI Supervisor Portal built in Power Apps Canvas app, showing a chat playground on the left and a Query Explorer on the right with AI-generated search queries, related documents, and top-rated document chunks from Azure AI Search

The solution is being submitted to the Microsoft Adoption Sample Solution Gallery as a free, unmanaged template with documentation and prerequisites, so the community can use it as a starting point. Follow Katerina Chernevskaya and Artem Chernevskiy on LinkedIn to be notified when it is available.

Key Takeaways

Built-in Business Central AI agents cover most scenarios, but custom logic, compliance rules, and external data sources require extending via Azure AI Foundry.
The Azure AI Foundry Agent Framework is the recommended starting point for no-code and low-code agent development; it supports file search, Code Interpreter, connected agents, and Logic App triggers.
Use the Code Interpreter action for tabular data (Excel, CSV)—not direct file upload—to get accurate, code-verified numerical analysis.
Prompt Flow (hub-based project required) gives full pipeline control for RAG scenarios: chunking, indexing, conditional retrieval, and model selection per step.
Built-in evaluation metrics (groundedness, coherence, fluency, relevance) make it possible to measure—not just assume—that your agent is producing quality responses.
A custom Power Apps evaluation UI gives business users and supervisors the transparency they need to validate AI output without access to Azure tooling.

This post was drafted with AI assistance based on the webinar transcript and video content.