9 minutes
My Agent Army Using the HMAS Architecture
I’ve been diving head-first into the world of AI-driven development lately, specifically playing around with OpenCode and its agent framework. Like many, my first attempts were with a single, monolithic agent. You ask it to do something, and it… well, it tries to do everything. It plans, it codes, it researches, all in one massive, chaotic context window.
It works, sort of. But it always felt a bit fragile. The agent would often get lost, forget the initial goal, or just pollute its own context window to the point of confusion. I’d heard that setting up subagents was a more context-efficient solution, and this got me thinking. If I were to build a real suite of agents, a “virtual team” to build the best software possible, what should that look like?
The Problem with “One Big Brain”
My initial hunch about efficiency turned out to be the core problem. The biggest challenge for any LLM is the finite context window. When one agent tries to do everything, it’s like a developer trying to code, read five API docs, and attend a planning meeting all at the same time. You can’t keep it all in your head.
Research from teams at Anthropic confirms this. The most successful agentic systems use subagents with their own “isolated context windows”1. A primary agent can delegate a heavy task (like sifting through a huge repository) to a specialist. That specialist does its work and then passes back just a “lightweight reference” or a simple result, not its entire messy thought process. This keeps the main agent’s context clean and focused, which is a massive win for performance and reliability.
This idea of specialised agents collaborating is a key finding in building scalable AI systems. It’s been shown to improve everything from reasoning to thorough validation.
The “Aha!” Moment: The Hierarchical Model
This led me down a rabbit hole of agent architectures. I wasn’t just building a collection of agents; I needed an architecture. I soon stumbled upon the concept of a Hierarchical Multi-Agent System (HMAS), and it just clicked.
The idea, as outlined in some fantastic research2, is to structure agents not as a flat “swarm” but as a company org chart, with clear layers of responsibility:
- Layer 1 (Strategy): This is the “leader or orchestrator.” It decides what matters most and in which order.
- Layer 2 (Planning): These are the “decomposers.” They re-express those priorities into consumable subtasks for the specialists.
- Layer 3 (Execution): These are the “workers.” They are the specialists that perform the actual work, like generating code or running tests.
This structure immediately solves the cognitive load problem. The Coder agent doesn’t need to know the entire business strategy; it just needs to know the technical spec for the function it’s writing.
The best part? This isn’t just theory. The OpenCode framework maps to this perfectly.
- OpenCode’s
primaryagents are the perfect fit for the L1 Orchestrator. - OpenCode’s
subagentsare the L2 Planners and L3 Execution specialists.
You can even define each of these subagents as a simple Markdown file in the .opencode/agent/ directory, each with its own prompt, model, and toolset. This makes your “virtual team” a version-controlled, modular part of your repository.
My Virtual Software Ensemble
So, I set out to design my “virtual team” based on this three-layer HMAS architecture. Here’s the breakdown of the agents I’m building and their responsibilities.
Layer 1 (Strategy): The Central Orchestrator
This layer has just one agent, the boss.
Orchestrator(The Project Manager): This is the mainprimaryagent and the only one I talk to. It’s the “Project Manager AI Agent.” Its job isn’t to code, but to manage the workflow. It interprets my goal, asks clarifying questions , and then delegates every single task to the specialists below.
Layer 2 (Planning): The Decomposers and Architects
This is the “middle management” that turns a vague idea into a concrete plan. This layer is the key to stopping “vibe coding” and enforcing “spec-driven development.”
Planner(The Product Manager): This agent takes my goal (e.g., “Add user auth”) and acts like a Product Manager. It decomposes the goal into detailed artifacts: user stories, testable acceptance criteria, and a structured list of sub-tasks for theCoder.Architect(The AI/Knowledge Architect): This agent acts as the team’s “Knowledge Architect.” It reads theCONTRIBUTING.mdand other architecture docs to define the technical constraints. It answers questions like, “What libraries should we use?” or “What’s our standard API pattern?”
Layer 3 (Execution): The Specialist Worker Agents
This is where the magic happens. These are the “doers” who get their tasks from the L2 agents.
Research(The Technical Researcher): A specialist agent that can be called by theArchitectorCoder. Its only job is to answer open-ended questions, like “Compare these two libraries” or “Find an example of how to implement this OAuth flow.”Coder(The Developer): The workhorse. This agent only acts on the detailed, unambiguous tasks from thePlannerandArchitect. It’s an execution agent, not a planner. Its job is to write the code to spec.Test(The QA Engineer): This is a “Verifier Agent.” It’s an autonomous QA engineer that reads thePlanner’s acceptance criteria and generates the unit and integration tests. Even better, it’s capable of “self-healing tests”—detecting when a UI change breaks a locator and fixing the test script automatically.Debugger(The Root Cause Analyst): This agent is my favourite. It’s an “AI Incident Investigator” that’s only invoked when theTestagent’s run fails. Its job isn’t just to report the failure, but to find the root cause by correlating logs, code changes, and test results.Security(The Security Auditor): Another “Verifier Agent.” This one is inspired by OpenAI’s “Aardvark” agent3. It scans new code for vulnerabilities (like the OWASP Top 10) and exposed secrets, acting as an autonomous, continuous security review.
The Magic: The Autonomous “Inner Loop”
This is where it all comes together. The big problem in AI development is the “speed vs. trust” gap. Agents are fast at generating code, but we can’t trust it. This architecture builds an autonomous, internal development loop that solves for trust.
Think about this flow:
- The
Coderagent finishes its task. - The
Orchestratorimmediately delegates to theTestagent. - The
Testagent runs and… it fails! - In a monolithic system, the process would stop and wait for me. But here, the
Orchestratorjust invokes theDebuggeragent. - The
Debuggeragent performs its root cause analysis and reports back: “The test failed becauseCoderforgot to inject theDatabaseService.” - The
Orchestratorthen passes this report back to theCoderagent with a new instruction: “Apply this fix.” - The
Coderapplies the patch, and the loop starts over.
This Coder -> Test -> Debugger -> Coder cycle can iterate multiple times without any human intervention. I only see the final, validated, tested, and secured result.
See this in practice
Here’s some snippets from a recent session:
And the outcome:
Summary: The Agentic Software Ensemble
Here’s a quick summary of the whole virtual team.
| Agent Name | HMAS Layer | Primary Responsibility | Key Collaborator(s) |
|---|---|---|---|
Orchestrator | L1 (Strategy) | The “Project Manager.” Interprets user goals, creates high-level plans, and delegates to all subagents. | User, @Planner, @Coder, @Test |
Planner | L2 (Planning) | The “Product Manager.” Decomposes goals into detailed specs, user stories, and acceptance criteria. | Orchestrator, @Coder, @Test |
Architect | L2 (Planning) | The “Knowledge Architect.” Defines technical constraints, selects technology, and ensures alignment with project standards. | Orchestrator, @Planner, @Coder |
Research | L3 (Execution) | The “Technical Researcher.” Investigates libraries, analyzes API documentation, and answers complex technical questions. | Architect, Coder |
Coder | L3 (Execution) | The “Developer.” Generates and modifies code based on the specific plan from the L2 agents. | Planner, Architect, @Test |
Test | L3 (Execution) | The “QA Engineer.” Generates and runs tests based on acceptance criteria. Capable of “self-healing” broken tests. | Planner, Coder, @Debugger |
Debugger | L3 (Execution) | The “Root Cause Analyst (SRE).” Invoked on test failure. Correlates logs and changes to find the root cause of bugs. | Test, Orchestrator, @Coder |
Security | L3 (Execution) | The “Security Auditor.” Scans code for vulnerabilities (OWASP, secrets) and suggests remediation patches. | Orchestrator, Coder |
Conclusion
When I wrote about my “Second Brain”, it was a journey to find a system for organising what I read. This feels very similar. It’s not just about getting an AI to write code; it’s about building a system I can trust.
This hierarchical, role-based approach transforms AI agents from being a clever, unpredictable “tool” into a reliable, autonomous “team.” It’s a system that’s not just fast, but trustworthy. And as an engineer, that’s the only thing that really matters.