Agentic AI Platform —
Digital Teammates That Deliver
Healthcare enterprises needed more than chatbots. They needed AI that could coordinate across systems, make decisions, and actually execute.
We built the multi-agent platform that made that possible. Not copilots. Not assistants. Agents that deliver.
Enterprises don't need AI that summarises or suggests. They need AI that does.
When Avataar decided to build an agentic AI capability, the brief was deliberately open-ended: build something that works like a digital employee. The use case was enterprise workflow automation — specifically, the multi-step, multi-system workflows that currently require a human to sit in the middle and stitch things together.
I was the PM on this from day one.
The LLM-as-chatbot paradigm had a ceiling. You could answer questions, summarise documents, draft emails. But the moment a workflow required taking action — writing to a database, calling an API, routing based on live data — the model needed hand-holding.
RPA handled some of this, but it was brittle: bots break when the UI changes, can't handle unstructured inputs, and have zero contextual awareness. A PDF invoice formatted differently than expected would fail silently.
What was needed was something in between: a system with the contextual reasoning of an LLM and the execution reliability of a well-engineered automation pipeline. That's what the Agentic AI Platform became.
The platform runs as a multi-agent orchestration loop. Each agent is specialised for a task type:
The Orchestrator agent reads the incoming request, breaks it into subtasks, and assigns them to specialist agents based on their capability registry.
Tool agents have access to specific APIs, databases, and services. They execute actions — fetching data, running queries, writing records — and return structured results.
A Validator agent checks each output against the task's success criteria before the Orchestrator decides whether to proceed or escalate.
A Memory layer stores context across the session — both short-term (within a workflow run) and long-term (user preferences, prior decisions, learned exceptions).
The human-in-the-loop path: when the Validator's confidence drops below threshold, the system packages the context and escalates to a human with a pre-filled decision form. One click from the human, system continues.
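The loop above can be sketched in miniature. This is a simplified sketch, not the platform's actual code: the class names, the trivial planner, and the 0.8 confidence threshold are all illustrative stand-ins for what were much richer components in production.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.8  # illustrative; the real threshold was tuned per workflow

@dataclass
class Task:
    id: str
    capability: str          # key into the capability registry
    payload: dict

@dataclass
class AgentResult:
    output: dict
    confidence: float        # 0.0-1.0, reported alongside every result

@dataclass
class Memory:
    short_term: dict = field(default_factory=dict)   # within one workflow run
    long_term: dict = field(default_factory=dict)    # preferences, learned exceptions

def plan(request: dict) -> list[Task]:
    """Orchestrator step: break the request into subtasks (trivial split here)."""
    return [Task(id=f"t{i}", capability=step["capability"], payload=step)
            for i, step in enumerate(request["steps"])]

class EchoAgent:
    """Stand-in tool agent; a real one would call an API or write to a database."""
    def execute(self, task: Task, memory: Memory) -> AgentResult:
        return AgentResult(output={"done": task.payload["capability"]},
                           confidence=task.payload.get("confidence", 1.0))

def validate(task: Task, result: AgentResult) -> AgentResult:
    """Validator step: check output against success criteria (pass-through here)."""
    return result

def escalate(task: Task, result: AgentResult, context: dict) -> AgentResult:
    """Human-in-the-loop: package context, get a one-click decision, continue."""
    return AgentResult(output={**result.output, "approved_by": "human"},
                       confidence=1.0)

def run_workflow(request: dict, registry: dict, memory: Memory) -> dict:
    for task in plan(request):
        agent = registry[task.capability]           # assign by capability registry
        result = validate(task, agent.execute(task, memory))
        if result.confidence < CONFIDENCE_THRESHOLD:
            result = escalate(task, result, memory.short_term)
        memory.short_term[task.id] = result.output  # context carried across subtasks
    return memory.short_term

registry = {"fetch": EchoAgent(), "write": EchoAgent()}
run = run_workflow(
    {"steps": [{"capability": "fetch"},
               {"capability": "write", "confidence": 0.4}]},  # low confidence -> escalates
    registry, Memory())
print(run)
# {'t0': {'done': 'fetch'}, 't1': {'done': 'write', 'approved_by': 'human'}}
```

The point of the shape, even this toy version of it, is that escalation is a normal branch of the loop rather than an exception handler bolted on afterwards.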
Four enterprise workflows were automated end-to-end:
Clinical documentation routing: incoming patient records classified, fields extracted, routed to the correct department queue, and logged in the EHR — without manual triage.
Supplier onboarding: new vendor applications ingested, documents validated, compliance checks run, and approval routed — reducing a 3-day process to under 4 hours.
Exception escalation: service tickets that exceeded SLA thresholds automatically prioritised, context compiled, and assigned to the right team with a summary — no human triage required.
Report generation: weekly operational reports compiled from live data sources, formatted to template, and distributed — replacing a recurring 2-hour manual task.
Across all four, task completion accuracy crossed 80% in production — our threshold for enterprise readiness.
Agent reliability at the edges. The system worked well on the 80% of tasks that were clean and predictable. The 20% that weren't — malformed inputs, ambiguous instructions, edge-case data — were where it could fail silently. We invested heavily in the validation layer and explicit fallback paths. Every agent had to answer three questions: did I complete the task, how confident am I, and what should happen if I'm wrong?
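That three-question contract can be made concrete as a return type every agent has to populate. The field and enum names below are my own shorthand, not the platform's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class Fallback(Enum):
    RETRY = "retry"            # transient failure: try again
    ESCALATE = "escalate"      # ambiguous: route to a human with context
    ABORT = "abort"            # unsafe to continue: stop the workflow

@dataclass
class TaskReport:
    completed: bool            # did I complete the task?
    confidence: float          # how confident am I? (0.0-1.0)
    on_failure: Fallback       # what should happen if I'm wrong?

def next_step(report: TaskReport, threshold: float = 0.8) -> str:
    """The orchestrator's decision collapses to one branch on the contract."""
    if report.completed and report.confidence >= threshold:
        return "proceed"
    return report.on_failure.value

print(next_step(TaskReport(True, 0.95, Fallback.ESCALATE)))   # proceed
print(next_step(TaskReport(True, 0.55, Fallback.ESCALATE)))   # escalate
```

Forcing every agent to declare its own fallback up front is what turns "fail silently" into "fail loudly, in a pre-agreed direction".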
Tool use and hallucination. Early versions of the tool agents would sometimes fabricate tool call results rather than surface an error. We solved this by making every tool call synchronous and validated — the agent couldn't proceed without a real response from the underlying system.
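One way to enforce that discipline, sketched under assumptions (the gateway function and its validation rules here are illustrative, not the platform's actual mechanism): route every tool call through a wrapper that blocks until the underlying system answers and raises instead of returning anything an agent could mistake for a result.

```python
class ToolCallError(Exception):
    """Raised instead of letting an agent improvise a result."""

def call_tool(tool_fn, args: dict, required_keys: set) -> dict:
    """Synchronous, validated tool call: no real response, no progress."""
    try:
        response = tool_fn(**args)   # blocks until the underlying system answers
    except Exception as exc:
        raise ToolCallError(f"{tool_fn.__name__} failed: {exc}") from exc
    if not isinstance(response, dict) or not required_keys <= response.keys():
        raise ToolCallError(
            f"{tool_fn.__name__} returned malformed result: {response!r}")
    return response

# Usage: a well-behaved tool passes; a broken one raises rather than being papered over.
def lookup_record(record_id: str) -> dict:
    return {"id": record_id, "status": "found"}

print(call_tool(lookup_record, {"record_id": "r42"}, {"id", "status"}))
# {'id': 'r42', 'status': 'found'}

try:
    call_tool(lambda: None, {}, {"id"})
except ToolCallError as e:
    print("blocked:", e)
```

The design choice is that an error is a first-class, inspectable outcome, so the orchestrator's fallback logic fires instead of a fabricated success propagating downstream.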
Enterprises want to see the work. Healthcare clients in particular needed audit trails — not just that the action was taken, but why, based on what data, with what confidence. The platform logged every agent decision with its inputs, outputs, and reasoning chain. Compliance wasn't an afterthought; it was a first-class feature.
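A decision log of that shape can be sketched as structured records serialised to JSON lines. The field names are my own approximation of "inputs, outputs, and reasoning chain", not the platform's actual schema:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class DecisionRecord:
    agent: str
    action: str
    inputs: dict           # the data the decision was based on
    outputs: dict          # the action that was taken
    confidence: float      # with what confidence
    reasoning: list        # why: the reasoning chain, step by step
    timestamp: float

class AuditLog:
    def __init__(self):
        self._records = []

    def record(self, **fields) -> None:
        self._records.append(DecisionRecord(timestamp=time.time(), **fields))

    def export(self) -> str:
        """One JSON line per decision: the 'show the work' artefact for auditors."""
        return "\n".join(json.dumps(asdict(r)) for r in self._records)

log = AuditLog()
log.record(agent="router", action="assign_queue",
           inputs={"doc_id": "d1", "type": "lab_result"},
           outputs={"queue": "pathology"},
           confidence=0.93,
           reasoning=["classified as lab result", "pathology owns lab result queue"])
print(log.export())
```

Append-only, machine-readable records like these are what let a compliance team answer "why did the system do this?" without reconstructing a run from scattered logs.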
End-to-end product definition: the agent architecture, the workflow templates, the human-in-the-loop UX, and the enterprise reporting interface.
Discovery: I ran 40+ interviews with healthcare enterprise buyers and operators to understand where their automation pain was sharpest and what it would take to earn trust.
The go/no-go framework: defining what 'production-ready' meant for an agentic system. We landed on: 80%+ accuracy across a representative test suite, explicit handling of all known failure modes, full audit logging, and a validated escalation path. No exceptions.
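A gate like that is worth expressing as code so it can't be argued with in a launch review. A minimal sketch, with parameter names of my choosing rather than the framework's actual artefacts:

```python
def production_ready(accuracy: float,
                     failure_modes: dict,       # known failure mode -> handler defined?
                     audit_logging: bool,
                     escalation_validated: bool) -> tuple[bool, list]:
    """Go/no-go checklist: every criterion must pass. No exceptions."""
    failures = []
    if accuracy < 0.80:
        failures.append(f"accuracy {accuracy:.0%} below 80% bar")
    unhandled = [mode for mode, handled in failure_modes.items() if not handled]
    if unhandled:
        failures.append(f"unhandled failure modes: {unhandled}")
    if not audit_logging:
        failures.append("audit logging incomplete")
    if not escalation_validated:
        failures.append("escalation path not validated")
    return (not failures, failures)

ok, reasons = production_ready(0.83, {"malformed_pdf": True, "timeout": True},
                               audit_logging=True, escalation_validated=True)
print(ok)        # True

ok, reasons = production_ready(0.76, {"malformed_pdf": False},
                               audit_logging=True, escalation_validated=False)
print(reasons)   # three specific blockers, not a vague "not ready"
```

Returning the list of blockers, not just a boolean, is what makes the gate actionable: the output of a failed review is the work plan.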
Enterprise sales support: building the demo environment, defining the benchmark methodology, and presenting the architecture to technical buyers who needed to understand what was running inside before they'd sign.
I'd narrow the scope earlier. Four workflows in parallel was ambitious — and it meant no single workflow was fully polished at launch. I'd have shipped one workflow end-to-end first, gotten it to 90%+ accuracy, earned a reference customer, and then expanded. Breadth is less convincing than depth when you're selling to enterprise.
I'd also have invested more in the observability tooling earlier. Understanding why an agent failed in production required digging through raw logs. A proper tracing layer — where you could replay a workflow run step by step and see exactly what each agent saw and decided — would have cut debugging time significantly and been a selling point in its own right.
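The tracing layer I wished for is conceptually simple, which is part of the regret. A sketch of the idea (hypothetical names; this is the tool that should have existed, not one that did):

```python
from dataclasses import dataclass

@dataclass
class TraceEvent:
    step: int
    agent: str
    saw: dict        # exactly what the agent was given
    decided: dict    # exactly what it returned

class Tracer:
    """Records each agent step so a workflow run can be replayed step by step."""
    def __init__(self):
        self.events = []

    def record(self, agent: str, saw: dict, decided: dict) -> None:
        self.events.append(TraceEvent(len(self.events), agent, saw, decided))

    def replay(self):
        for e in self.events:
            yield f"step {e.step}: {e.agent} saw {e.saw} -> decided {e.decided}"

t = Tracer()
t.record("orchestrator", {"request": "route record d1"}, {"subtasks": 2})
t.record("classifier", {"doc": "d1"}, {"label": "lab_result"})
for line in t.replay():
    print(line)
```

Capturing "what each agent saw" alongside "what it decided" is the whole trick: most production failures turned out to be bad inputs, not bad reasoning, and a replayable trace makes that distinction visible in seconds.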
The hardest part of agentic AI isn't the AI. It's defining what 'done' looks like for a task, and what 'failed' looks like, and what to do when the answer is somewhere in between.
LLMs are probabilistic. Enterprises need deterministic. The system's job is to bridge that gap — to be confident where it can, transparent where it isn't, and to escalate the right things to the right people at the right time.
Get that contract right and the technology becomes almost invisible. The user experience isn't 'I'm using an AI'. It's 'this just works'.
Building AI products since 2020 — from spatial computing to finance automation.
