The Prompt Is Dead. Here's What Replaced It

Prompts don't scale. Skills do. How Anthropic's new architecture solves the consistency problem that's been breaking AI products - and why Microsoft, OpenAI, and major partners have already adopted it.

A structured AI system where a central reasoning core selectively activates modular skills, illustrating how agent skills replace large prompts to deliver consistent and scalable AI behavior.

Your AI prompts are already obsolete. And the hilarious part? You're probably writing longer ones right now to compensate.

Three months ago, Anthropic dropped something that fundamentally changes how AI products work. Microsoft grabbed it. OpenAI grabbed it. The spec is open-source. And somehow, most builders are still out here writing essays in system prompts like it's 2023.

Stop prompting. Start architecting.

I've been experimenting with Skills at Neoflo. Let me show you what actually changes.

The Problem: When Prompts Become Prayer

Here's something you've definitely experienced.

You ask ChatGPT to write an email. First try? Generic corporate slop. You refine the prompt - add tone, context, examples. Second try? Better, but still off. Third try with the perfect 200-word prompt? Finally nails it.

Now imagine someone else asks for the same email. They write a vague prompt. Output? Back to corporate slop.

Same AI. Same task. Wildly different quality based on who's prompting.

You're one good prompt away from great results. And one bad prompt away from garbage.

This is fine when you're the only user. But when you're building a product? When you need consistent outputs for everyone? You hit a wall.

Hard.

At Neoflo, we're automating finance workflows - invoice processing, expense approvals, reconciliation. I spent weeks crafting the perfect system prompt. Tens of thousands of tokens of rules, examples, edge cases. Worked great when I tested it.

Then real users touched it. Finance managers who understand the nuances got perfect outputs. AP clerks new to AI? Complete chaos.

Your prompt engineering only scales as far as your worst user.

This is the dirty secret of AI products: output quality is directly proportional to prompt quality. And you can't control how users prompt.

I tried cramming more into the system prompt. Hit token limits. Slowed down inference. Still didn't solve consistency.

The problem wasn't the AI. It was the architecture.

That's when I started experimenting with Skills.

What Skills Actually Are (The Architecture I Wish I Had Earlier)

Think about how you'd train someone new.

You don't print the entire employee handbook and make them memorize it. That would be insane.

Instead: "Here's how we approve invoices over $5K." "Here's what to do when a vendor is on the exception list." "Here's how to escalate when amounts don't match."

Context-appropriate onboarding.

Skills do this for AI. Nothing magical - just better architecture.

A Skill is a folder containing:

  • Instructions (markdown with the knowledge)

  • Scripts the AI can execute

  • Templates and checklists

  • Reference documents

When a task needs a specific skill, the AI loads it. When the task is done, the skill unloads. Context stays clean.
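In the open spec, the entry point for a skill is a SKILL.md file: YAML frontmatter with a name and description (the part loaded upfront), followed by the full instructions. A minimal sketch, with a made-up skill name and made-up policy details:

```markdown
---
name: invoice-approval
description: Apply company policy when reviewing or approving vendor invoices.
---

# Invoice Approval

## Invoices over $5K
1. Confirm a matching, open purchase order exists.
2. Route to the budget owner for sign-off.

## Exception-list vendors
Check references/exceptions.md before approving anything.
```

Scripts, templates, and reference docs sit alongside SKILL.md in the same folder; only the frontmatter gets loaded until the skill is actually needed.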

How I'm Testing This at Neoflo (Early Results)

Here's an experiment I'm running with our most problematic workflow.

Three-way invoice matching - matching purchase order, invoice, and goods receipt. It has 14 different validation rules, 6 exception cases, and different behavior based on vendor category.

Old approach: Everything in the system prompt.

  • Result: Inconsistent accuracy

  • Users need to write detailed prompts to get it right

  • Quality varies wildly based on who's using it

Skills approach: Building a skill called "three_way_match_v1"

  • Package all 14 rules

  • All 6 exception flows

  • Vendor category logic

  • Examples for each case

Load it only when invoice matching is needed.
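To give a flavor of what one packaged rule looks like, here's a toy version of an amount-tolerance check that a matching skill might ship as an executable script. The threshold, field names, and function are my own illustration, not Neoflo's actual rules:

```python
# Hypothetical tolerance check for three-way matching.
# The 2% threshold and the structure are illustrative only.

def amounts_match(po_total: float, invoice_total: float,
                  receipt_total: float, tolerance_pct: float = 2.0) -> bool:
    """True if invoice and receipt totals are within tolerance of the PO total."""
    def within(a: float, b: float) -> bool:
        return abs(a - b) <= po_total * tolerance_pct / 100

    return within(po_total, invoice_total) and within(po_total, receipt_total)

# A mismatch outside tolerance gets escalated, not auto-approved.
print(amounts_match(1000.00, 1010.00, 1000.00))  # within 2% -> True
print(amounts_match(1000.00, 1100.00, 1000.00))  # 10% off -> False
```

Packaging the rule as a script means the AI runs deterministic code for the check instead of re-deriving the logic from prose every time.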

Early testing shows:

  • More consistent outputs across different prompt styles

  • Faster inference (smaller active context)

  • Easier to update rules without rewriting the entire prompt

Same AI model. Different architecture.

The consistency improvement is what matters. It's the difference between "works when I use it" and "works for everyone."

One skill. Minimal variance. That's the unlock.

Progressive Disclosure: The Part That Actually Unlocks Scale

Here's what makes this work.

At Memoria (my side project for personal AI memory), I started with the obvious approach - MongoDB + RAG. Capture everything, store it, use retrieval to pass relevant context to the LLM. V1 worked, but I kept hitting the same problem: how do you decide what context is "relevant" when the AI needs to act a specific way?

Skills solve a different piece of the puzzle.

RAG is great for "what do I know" - retrieving facts, past conversations, user preferences.

Skills handle "how do I act" - the procedures, workflows, and consistent behavior patterns.

RAG = memory. Skills = personality.

Progressive disclosure makes both work together.

Old way: Load all knowledge upfront. Hope the AI picks what's relevant.

Skills way:

  1. Load skill names and descriptions (10-20 tokens each)

  2. AI decides which skill it needs

  3. Load only that skill's full instructions

  4. If more detail needed, load supporting docs
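The four steps above can be sketched in a few lines. This is a rough illustration of the loading pattern, not the spec's actual API; `skill_index` and `load_skill` are hypothetical helpers that assume one folder per skill with a SKILL.md inside:

```python
from pathlib import Path

def skill_index(skills_dir: str) -> dict[str, str]:
    """Step 1: collect only name + description from each SKILL.md frontmatter."""
    index = {}
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        meta = {}
        for line in skill_md.read_text().splitlines()[1:]:  # skip opening '---'
            if line.strip() == "---":
                break  # end of frontmatter; the full body stays unloaded
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
        index[meta["name"]] = meta["description"]
    return index

def load_skill(skills_dir: str, name: str) -> str:
    """Step 3: only after the model picks a skill, load its full instructions."""
    return (Path(skills_dir) / name / "SKILL.md").read_text()
```

The index costs a handful of tokens per skill; the expensive part, the full instructions, is paid only for the one skill the task actually needs.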

For a production system, you might have dozens of skills:

  • Invoice matching

  • Expense categorization

  • Vendor verification

  • Cash application

  • Reconciliation rules

  • And many more

Each skill could average 5,000 tokens of detailed instructions. Load thirty of them into every conversation and that's 150,000 tokens before the user types a word. You'd blow through context limits fast.

With progressive disclosure? You only load what's needed for the current task.

The limit isn't context windows anymore. It's how well you organize knowledge.

That's the shift. From "how much can I cram in" to "how smart can I load."

Why This Changes Everything for Enterprise AI

Here's the question enterprise teams actually care about: "How do we make sure the AI follows our rules consistently?"

Before Skills, the honest answer was: "Write very detailed prompts and hope."

Now: "Package your rules into skills. Same behavior, every time, for every user."

That's the unlock for enterprise.

Central control. Provision skills per client. Their rules. Their workflows. Their exceptions. Update a skill, all users get the new behavior instantly. No retraining. No documentation updates. Just... works.

Zero prompt engineering required. AP clerks don't need to know how to prompt. The skill handles it. Consistency stops being a user skill problem.

Auditability. Trace which skill version was active for any transaction. Finance teams care about this more than you'd think. "Why did the AI approve this?" becomes "Skill v2.3 on line 47 says X." Compliance teams love it.

Composability. Chain skills: vendor_verification → three_way_match → approval_routing → payment_processing. Each skill handles one thing well.
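That chain is just sequential activation: each skill transforms the document and passes it on. A toy sketch of the idea (only the skill names come from the chain above; the orchestration code and stand-in functions are mine):

```python
from typing import Callable

Pipeline = list[Callable[[dict], dict]]

def run_pipeline(doc: dict, steps: Pipeline) -> dict:
    """Run each skill in order; any step can halt the chain."""
    for step in steps:
        doc = step(doc)               # each skill does one thing well
        if doc.get("halted"):         # e.g. vendor failed verification
            break
    return doc

# Toy stand-ins for the four skills named above.
def vendor_verification(doc): return {**doc, "vendor_ok": True}
def three_way_match(doc):     return {**doc, "matched": True}
def approval_routing(doc):    return {**doc, "approver": "budget_owner"}
def payment_processing(doc):  return {**doc, "status": "scheduled"}

result = run_pipeline({"invoice_id": "INV-1"},
                      [vendor_verification, three_way_match,
                       approval_routing, payment_processing])
print(result["status"])  # scheduled
```

Because each skill only sees the document it's handed, you can swap, reorder, or version one step without touching the others.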

This is the difference between "works when I prompt it perfectly" and "works consistently for everyone."

It's the difference between a demo and a product.

The Open Standard Play (And Why It Actually Matters)

December 2025. Anthropic releases Agent Skills as an open spec.

Not proprietary. Not locked to Claude. Just open.

I've seen this playbook before with MCP (Model Context Protocol). Release a spec. Make it easy to adopt. Become the standard.

Who's in:

  • Microsoft (VS Code, GitHub)

  • OpenAI (yes, the competition adopted it)

  • Cursor

  • Atlassian, Canva, Notion, Figma

The spec lives at agentskills.io. It's well-written and worth reading.

For builders, this means:

  • Skills you write for Claude work in ChatGPT

  • Skills you write for one client work everywhere

  • The ecosystem compounds - someone will build the skill you need

At Neoflo, we're betting on this standard. Every workflow we build is a skill. When a client switches AI providers (they will), the skills move with them. We're not locked to any model.

That portability is strategic insurance.

Build once. Run anywhere. Actually.

Skills vs Other Approaches (Quick Comparison)

If you're confused about where Skills fit, here's the short version:

n8n workflows: Great for connecting APIs and automating tasks. Doesn't help with AI consistency - it's orchestration, not knowledge management.

Claude's built-in memory: Stores user preferences and facts. Good for "remembering" but doesn't enforce behavior patterns or procedures.

.md files in Claude Projects: Better than nothing. You can upload documentation, but the AI loads all of it every time. No progressive disclosure, no modularity.

Skills: Modular, versioned, progressively loaded procedural knowledge. Built for consistent AI behavior at scale.

Think of it this way: Memory is the hard drive. Skills are the operating system. n8n is the API layer. You need all three.

(I'm writing a full breakdown of this - more coming soon.)

Do This Next (What I'd Do If Starting Today)

Here's what I'd recommend based on what actually worked:

  • [ ] Find your most inconsistent task. Don't start with what's easy - start with what produces wildly different outputs for different users. That's your biggest pain point and your best ROI.

  • [ ] Build one skill. Take that task. Extract every rule, example, and exception. Package it into a skill format. Start simple - you can always add complexity.

  • [ ] A/B test it for a week. Run 50 tasks through your old method. Run 50 through the skill. Measure consistency, not just accuracy. Track the variance between best and worst outputs.

  • [ ] Read the spec at agentskills.io. See how others structure skills. Steal good patterns. The tax preparation skill is particularly well done.

  • [ ] Version your skills like code. Use skill_name_v1, skill_name_v2 format. Makes rollback easy when something breaks (it will).

Start small. One skill. One task. Measure the difference. Then scale.

The companies moving on this now will have measurably better AI products in six months.

The rest will still be debugging prompts.

Key Takeaways

  • Prompts rely on user skill. Skills don't. Consistent behavior regardless of who's prompting is the unlock for enterprise AI.

  • Progressive disclosure breaks context limits. Load what you need when you need it. Skills make your knowledge base effectively unlimited.

  • RAG handles "what you know." Skills handle "how you act." They solve different problems. MongoDB + RAG for facts and retrieval. Skills for procedures and consistent behavior. Use both.

  • This is already the standard. Microsoft, OpenAI, major partners adopted it. The ecosystem is building momentum fast.

  • Skills are portable knowledge. Build once, run across providers. When (not if) you switch AI models, your skills move with you.

The takeaway: Prompts got you this far. Skills will get you to production.

Keep Reading

If this clicked, these connect directly:

More AI Product Insights

I write about what I'm building and learning at heyshubh.com.

Connect with me on LinkedIn - always up for talking about AI product challenges.


Next week:
Claude's new Chrome integration - is it actually better than Comet and Dia, or just more hype? I've been testing all three. The results aren't what I expected.