
Foundry vs Semantic Kernel vs AutoGen: What I Actually Use in Production

Solution Architect at IBM with 12+ years in enterprise software, cloud, and applied AI. I write about what production agentic AI actually looks like, including the parts that don't fit the marketing. Senior Member, IEEE.

There are three real options for building agentic AI on the Microsoft stack right now: Azure AI Foundry Agent Service, Semantic Kernel, and AutoGen. Pick the wrong one and you'll spend three months refactoring. Pick the right one and most of your problems disappear.

This isn't a feature comparison. The docs already do that, and they all sound the same. This is what I've actually seen work and not work after building agentic systems for enterprise customers over the last year.

The short version

If you're building one agent that does one job, use Foundry. If you're building a system where agents need to coordinate, use AutoGen. If you're embedding agentic logic inside an existing .NET or Python application, use Semantic Kernel.

Most teams pick the wrong one because they read the marketing.

Foundry Agent Service

Foundry is the closest thing Microsoft has to a managed agent platform. You define your agent, attach tools, set instructions, and it handles state, threads, and tool execution for you. Connected Agents lets you compose multiple Foundry agents that pass work between each other.

Where Foundry wins. Single-agent workloads where reliability matters more than flexibility. Customer-facing chat agents. Internal support bots. Document processing pipelines. Anything where you want Microsoft to handle the operational complexity and you want to focus on the prompt and the tools.

Where Foundry breaks. The moment you need fine-grained control over how agents coordinate, you hit walls. Connected Agents look like multi-agent orchestration but they're really sequential delegation. State is managed for you, which is great until you need to inspect it, modify it, or persist it differently. The thread model assumes a conversation. If you're not building a conversation, you're fighting the framework.
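
Stripped of the managed hosting, sequential delegation has roughly the shape below. This is not Foundry's SDK, and in Foundry the main agent decides which connected agent to call; the sketch only illustrates the one-directional handoff, using hypothetical stub agents (`triage_agent`, `lookup_agent`, `reply_agent`) in place of model calls.

```python
from typing import Callable

# Hypothetical stand-ins for model-backed agents. Each takes the
# accumulated work item and returns an updated copy.
Agent = Callable[[dict], dict]


def triage_agent(item: dict) -> dict:
    # Classify the request (a real agent would call a model here).
    category = "billing" if "invoice" in item["text"] else "general"
    return {**item, "category": category}


def lookup_agent(item: dict) -> dict:
    # Fetch context for the category (hard-coded for the sketch).
    return {**item, "context": f"policy docs for {item['category']}"}


def reply_agent(item: dict) -> dict:
    # Draft a reply from the gathered context.
    return {**item, "reply": f"Answer based on {item['context']}"}


def delegate_in_sequence(item: dict, agents: list[Agent]) -> dict:
    """Run each agent once, in a fixed order. No agent can send work
    back upstream or question a previous agent's output."""
    for agent in agents:
        item = agent(item)
    return item


result = delegate_in_sequence(
    {"text": "Why is my invoice wrong?"},
    [triage_agent, lookup_agent, reply_agent],
)
print(result["reply"])  # Answer based on policy docs for billing
```

Nothing in that loop lets `reply_agent` send work back to `triage_agent`. That missing return path is the difference between delegation and coordination.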

My take. Foundry is the right starting point for roughly 70% of the enterprise use cases I see. Most teams don't actually need multi-agent. They think they do because the demos look cool, but their actual workflow is one agent doing one job. Foundry is good at that and gets out of the way.

Semantic Kernel

Semantic Kernel is a library, not a platform. You import it into your existing application and use it to add LLM calls, function calling, and basic planning. It's available in C# and Python.

Where Semantic Kernel wins. You already have a .NET application and you want to add AI capabilities without rewriting it. You want full control over the host process. You want to embed agentic logic deep inside business logic rather than running agents as a separate service. Your team is already comfortable with the Microsoft stack and doesn't want to learn a new abstraction layer.
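
What "embedding agentic logic inside business logic" means in practice is that the model's function calls land on your existing code, in your own process. A minimal stdlib sketch of that dispatch pattern follows; it is not Semantic Kernel's API, and the `tool` registry, `get_order_total`, and the JSON shape of the tool call are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical registry of existing business functions exposed as tools.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register an existing function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_order_total(order_id: str) -> float:
    # Stand-in for a call into the application's own data layer.
    orders = {"A-100": 42.5, "A-101": 10.0}
    return orders[order_id]


def dispatch(tool_call_json: str) -> str:
    """Execute a model-issued tool call in-process and return a JSON result
    that would be sent back to the model on the next turn."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool {call['name']}"})
    return json.dumps({"result": fn(**call["arguments"])})


# A tool call as a model might emit it (shape assumed for the sketch).
print(dispatch('{"name": "get_order_total", "arguments": {"order_id": "A-100"}}'))
# {"result": 42.5}
```

The host process owns the registry, so permissions, transactions, and logging stay in the application rather than in a separate agent service.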

Where Semantic Kernel breaks. It's not really an agent framework. It's an LLM toolkit with planning bolted on. If you try to build a true multi-agent system in Semantic Kernel, you'll spend most of your time writing orchestration code that AutoGen would give you for free. The planning capabilities have evolved a lot, but they still feel optimized for planning function calls rather than for coordinating agents.

My take. Semantic Kernel is the right choice when AI is a feature inside a larger application, not the application itself. If you're a .NET shop adding a recommendation engine, use Semantic Kernel. If you're building an agent product from scratch, don't.

AutoGen

AutoGen is Microsoft Research's multi-agent framework. Agents are Python objects, they communicate via messages, and you compose them into conversation patterns that solve complex tasks. The latest version splits into Core (low-level messaging) and AgentChat (higher-level patterns).
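
The message-passing model at the Core level can be illustrated without the library. A minimal sketch, not AutoGen's API: the `Message` type, the `Shouter` and `Collector` agents, and the `run` loop are all hypothetical, and exist only to show agents that react to messages and emit new ones through a shared queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Message:
    sender: str
    recipient: str
    content: str


class Agent(Protocol):
    name: str

    def handle(self, msg: Message) -> list[Message]: ...


class Shouter:
    """Hypothetical agent: replies to the sender with uppercased content."""
    name = "upper"

    def handle(self, msg: Message) -> list[Message]:
        return [Message(self.name, msg.sender, msg.content.upper())]


class Collector:
    """Hypothetical agent: records what it receives and sends nothing back."""
    name = "collector"

    def __init__(self) -> None:
        self.received: list[str] = []

    def handle(self, msg: Message) -> list[Message]:
        self.received.append(msg.content)
        return []


def run(agents: dict[str, Agent], first: Message, max_steps: int = 10) -> None:
    """Deliver messages one at a time until the queue drains or the step
    cap is hit (the cap guards against agents replying to each other forever)."""
    queue = deque([first])
    steps = 0
    while queue and steps < max_steps:
        msg = queue.popleft()
        queue.extend(agents[msg.recipient].handle(msg))
        steps += 1


collector = Collector()
run({"upper": Shouter(), "collector": collector}, Message("collector", "upper", "hello"))
print(collector.received)  # ['HELLO']
```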

Where AutoGen wins. Multi-agent coordination is its native abstraction. If you actually have multiple specialized agents that need to negotiate, critique each other, or solve problems through structured dialogue, AutoGen does this better than anything else in the Microsoft stack. Good fits include research workflows, code generation pipelines with critic agents, and any system where the value comes from agents disagreeing with each other.
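
The critic pattern is easy to show without any framework. A minimal sketch with hypothetical stub `writer` and `critic` functions standing in for model-backed agents; AutoGen's AgentChat packages loops like this as teams with termination conditions, but the code below is plain Python, not AutoGen's API.

```python
from typing import Callable, Optional


def writer(task: str, feedback: Optional[str]) -> str:
    """Stub writer: a real one would call a model with the task and feedback."""
    if feedback and "docstring" in feedback:
        return f'def add(a, b):\n    """{task}"""\n    return a + b'
    return "def add(a, b):\n    return a + b"


def critic(draft: str) -> Optional[str]:
    """Stub critic: returns feedback, or None to approve the draft."""
    if '"""' not in draft:
        return "Add a docstring."
    return None


def critique_loop(task: str,
                  write: Callable[[str, Optional[str]], str],
                  review: Callable[[str], Optional[str]],
                  max_rounds: int = 3) -> tuple[str, bool, int]:
    """Alternate writer and critic until the critic approves or rounds run out.
    Returns (last draft, approved, rounds used)."""
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = write(task, feedback)
        feedback = review(draft)
        if feedback is None:
            return draft, True, round_no
    return draft, False, max_rounds


draft, approved, rounds = critique_loop("Return the sum of a and b.", writer, critic)
print(approved, rounds)  # True 2
```

The `max_rounds` cap matters in practice: without it, a critic that never approves keeps the loop running indefinitely.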

Where AutoGen breaks. Production maturity. The framework is still moving fast and the abstractions are still shifting. Deploying it as a managed service requires you to build the operational layer yourself. Error handling across agent conversations gets messy. Observability is what you make of it. The team that built it is brilliant but they're not optimizing for enterprise deployability yet.
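
Building the operational layer yourself usually starts with something like the wrapper below. A minimal sketch using only the standard library: it logs each agent turn with its duration and retries failures it assumes are transient (here, `TimeoutError`). The agent name, the `turn` callable, and the retry policy are assumptions to adapt to whatever model client you actually use.

```python
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agents")


def call_with_retries(agent_name: str, turn: Callable[[str], str], message: str,
                      attempts: int = 3, backoff_s: float = 0.1) -> str:
    """Run one agent turn with logging and exponential backoff.
    Re-raises the last error so the conversation fails loudly, not silently."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        start = time.perf_counter()
        try:
            reply = turn(message)
        except TimeoutError as exc:  # treated as transient in this sketch
            log.warning("agent=%s attempt=%d failed: %s", agent_name, attempt, exc)
            if attempt == attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))
            continue
        log.info("agent=%s attempt=%d ok in %.3fs", agent_name, attempt,
                 time.perf_counter() - start)
        return reply
    raise AssertionError("unreachable: the loop always returns or raises")


# Usage: a hypothetical turn that times out once, then succeeds.
calls = {"n": 0}


def flaky_turn(msg: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("model endpoint timed out")
    return f"handled: {msg}"


print(call_with_retries("summarizer", flaky_turn, "draft report"))  # handled: draft report
```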

My take. AutoGen is the right choice when multi-agent is the actual point, not a feature you've talked yourself into. If you can't articulate why your system needs more than one agent, don't use AutoGen. If you can, nothing else in the Microsoft stack will give you what AutoGen gives you.

The decision tree I actually use

Here's how I think about this in customer conversations.

First question: does your workflow need more than one agent? Most don't. If you're building a customer support bot, a document Q&A system, a data extraction pipeline, or an internal copilot, you almost certainly need one agent. Use Foundry. Move on.

Second question: is the agent the product, or is the agent inside a product? If the agent is your product, use Foundry or AutoGen depending on the answer to the first question. If the agent is a feature inside an existing application, use Semantic Kernel.

Third question: do you actually need agents to disagree, critique, or negotiate? This is the AutoGen question. If your "multi-agent" system is really one agent handing work to another in a fixed sequence, that's not multi-agent. That's a pipeline. Use Foundry Connected Agents or roll your own orchestration on top of a single agent platform. AutoGen is for when the back-and-forth between agents is doing real work.
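
Written as code, the three questions collapse into a few lines. This is a sketch, and the parameter names are illustrative. It checks the embedded-feature case first, which is one reading of the questions that keeps it consistent with the short version: agentic logic inside an existing application points to Semantic Kernel.

```python
def pick_framework(agent_is_product: bool, needs_multiple_agents: bool,
                   agents_must_critique: bool) -> str:
    """Encode the three questions as a function. The inputs are judgment
    calls about your workflow; nothing here can detect them for you."""
    # Second question: an agent embedded as a feature in an existing app.
    if not agent_is_product:
        return "Semantic Kernel"
    # First question: most workflows need exactly one agent.
    if not needs_multiple_agents:
        return "Foundry"
    # Third question: a fixed handoff sequence is a pipeline, not multi-agent.
    if not agents_must_critique:
        return "Foundry Connected Agents (or your own orchestration)"
    return "AutoGen"


print(pick_framework(agent_is_product=True, needs_multiple_agents=False,
                     agents_must_critique=False))  # Foundry
```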

The mistake everyone makes

Teams pick the framework they want to use, then build a workflow that justifies it.

I've seen teams adopt AutoGen because multi-agent sounds impressive, then spend months building infrastructure that Foundry would have given them for free. I've seen teams use Semantic Kernel because they're a .NET shop, then realize halfway through that what they actually built is an agent platform that should have been on Foundry.

The framework should follow the workflow. Not the other way around.

What I'd build today

If I were starting a new enterprise agentic AI project tomorrow, here's my default stack.

Foundry for the agents. Connected Agents for sequential composition where it fits. MCP (Model Context Protocol) for tool integration, because it's the protocol Microsoft has aligned with and you don't want to be writing custom connectors in two years when everyone else has standardized. AutoGen only for the parts of the system that genuinely need multi-agent coordination, and only when that need is concrete, not theoretical.
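
Part of why MCP removes the need for custom connectors is that every MCP server speaks the same JSON-RPC 2.0 messages, so discovering and invoking tools looks identical no matter who built the server. A minimal sketch that only builds the message bodies for the spec's `tools/list` and `tools/call` methods; the tool name `search_tickets` and its arguments are hypothetical, and a real client also performs the protocol's initialization handshake and runs over a transport such as stdio.

```python
import itertools
import json
from typing import Any, Optional

# JSON-RPC requires a unique id per request so responses can be matched.
_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[dict[str, Any]] = None) -> str:
    """Build a JSON-RPC 2.0 request body."""
    msg: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")
# Invoke one of them by name with structured arguments (hypothetical tool).
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_tickets", "arguments": {"query": "refund", "limit": 5}},
)
print(list_tools)
# {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
print(call_tool)
# {"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "search_tickets", "arguments": {"query": "refund", "limit": 5}}}
```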

This is not the most exciting stack you can build. It's the one that ships and stays alive in production.

Closing thought

The agent framework wars in 2026 look a lot like the JavaScript framework wars in 2016. Everyone is convinced their framework is the future. Most of them won't matter in three years. What will matter is whether your team can ship reliable systems on whatever they pick.

Foundry will probably still be here. Semantic Kernel will probably still be embedded in enterprise codebases. AutoGen will probably evolve into something even more research-focused or get absorbed into Foundry's roadmap.

Pick the one that matches your actual workflow today. Refactor later if you have to. The cost of refactoring is usually less than the cost of choosing the impressive framework over the right one.



AI Systems in Production

Part 2 of 6

This series covers what actually happens when AI systems move from demo to production: agent workflows, LLM behavior, failure modes, and the architectural decisions that make systems reliable at scale.
