Microsoft Just Revealed the Biggest Problem With Agentic AI
Today, we’re diving into:
Hot Tea: Why enterprises cannot automate everything anymore
Open AI: Vibe-coding is turning non-technical employees into AI builders
Open AI: OpenClaw’s robotic breakthrough shows where the next AI race is going
The AI Boom Has a Cost Problem Nobody Properly Calculated
For two straight years, enterprise AI has been sold as a cost-cutting machine.
Automate workflows → Reduce headcount → Increase productivity → Scale output without scaling teams.
Now some of the biggest companies deploying AI internally are discovering something uncomfortable: inference costs at enterprise scale may rise faster than the labor costs they were supposed to replace.
Microsoft has reportedly started restricting internal access to Anthropic’s Claude Code after widespread employee adoption dramatically increased compute spending. The company is now steering teams toward GitHub Copilot CLI instead.
Around the same time, Uber CTO Praveen Neppalli Naga revealed that Uber burned through its entire 2026 AI coding assistant budget in just four months after internal usage exploded.
This is not a tooling problem. It is an architecture problem.

Most enterprise leaders still think about AI like SaaS software. Agentic systems do not behave like SaaS.
Traditional software executes deterministic requests. Agentic AI systems continuously reason, retry, chain prompts, retrieve enterprise context, evaluate outputs, call APIs, trigger secondary models, and orchestrate downstream systems simultaneously.
Every action consumes tokens.
And token consumption compounds extremely fast once multi-agent orchestration enters production environments.
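A rough way to see the compounding: if an agent re-sends its growing context on every step, total input tokens grow much faster than the step count. The sketch below is a toy cost model, not a description of any vendor's billing; the function name and all token figures are illustrative assumptions, and it ignores output tokens and prompt caching.

```python
def cumulative_input_tokens(steps: int, base_context: int, tokens_added_per_step: int) -> int:
    """Total input tokens sent across an agent loop that re-sends its
    whole (growing) context on every step. Hypothetical cost model."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                  # the full context goes in as input again
        context += tokens_added_per_step  # tool output and reasoning get appended
    return total


# Illustrative numbers only: a 2,000-token prompt that grows by 1,500 tokens per step.
single_call = cumulative_input_tokens(steps=1, base_context=2_000, tokens_added_per_step=1_500)
agent_run = cumulative_input_tokens(steps=20, base_context=2_000, tokens_added_per_step=1_500)

print(single_call)              # 2000
print(agent_run)                # 325000
print(agent_run / single_call)  # 162.5
```

Under these assumptions, twenty steps cost about 160 times one call, not 20 times, because every step pays again for everything that came before it.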
Goldman Sachs now projects that agentic AI could drive a 24x increase in token consumption by 2030, reaching nearly 120 quadrillion tokens per month globally.
Gartner has already warned enterprises that falling per-token prices will not automatically reduce AI bills because frontier reasoning systems consume dramatically more compute per task.
That distinction matters more than most leadership teams realize.

A single AI-generated summary inside Slack is cheap.
An autonomous procurement agent simultaneously parsing ERP records, validating contracts, querying supplier databases, triggering compliance checks, generating approvals, and escalating anomalies across multiple enterprise systems is not.
The computational profile is entirely different.

This is why NVIDIA VP Bryan Catanzaro recently admitted, “For my team, the cost of compute is far beyond the costs of the employees.”
The economics become even more dangerous when companies deploy AI for signaling instead of operational leverage.
Right now, many enterprises are incentivizing teams to maximize AI usage rather than measuring marginal productivity gains per inference dollar spent. Internal dashboards reward adoption velocity, not infrastructure efficiency.
That creates a hidden enterprise risk: organizations begin deploying autonomous agents into workflows where the operational value generated is smaller than the compute, governance, security, and orchestration cost required to sustain them.
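That risk can be framed as a simple break-even check per workflow. The sketch below is a minimal illustration: the `AgentWorkflow` class, its field names, and every dollar figure are assumptions made up for the example, not measurements from any company.

```python
from dataclasses import dataclass


@dataclass
class AgentWorkflow:
    """Monthly value and cost estimates for one agent deployment (all hypothetical)."""
    name: str
    monthly_value_usd: float   # estimated operational value generated
    compute_usd: float
    governance_usd: float
    security_usd: float
    orchestration_usd: float

    @property
    def monthly_cost_usd(self) -> float:
        return (self.compute_usd + self.governance_usd
                + self.security_usd + self.orchestration_usd)

    @property
    def net_value_usd(self) -> float:
        return self.monthly_value_usd - self.monthly_cost_usd

    @property
    def value_per_dollar(self) -> float:
        # Value generated per dollar spent to sustain the agent.
        cost = self.monthly_cost_usd
        return float("inf") if cost == 0 else self.monthly_value_usd / cost


# Illustrative figures only.
summary_bot = AgentWorkflow("slack_summaries", 4_000, 300, 100, 50, 50)
procurement_agent = AgentWorkflow("procurement_agent", 20_000, 18_000, 4_000, 2_000, 3_000)

for wf in (summary_bot, procurement_agent):
    verdict = "keep" if wf.net_value_usd > 0 else "rethink"
    print(f"{wf.name}: net {wf.net_value_usd:+,.0f} USD/month, "
          f"{wf.value_per_dollar:.2f} value per dollar -> {verdict}")
```

In this made-up case the larger agent generates five times the value of the summary bot and still loses money, which is the pattern an adoption dashboard would never show.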

And the scaling curve gets brutal very quickly.
As agents become more autonomous, enterprises also inherit:
• Higher inference and GPU utilization costs
• Persistent vector retrieval overhead
• Continuous memory synchronization across agents
• Real-time observability and tracing requirements
• Human-in-the-loop validation layers
• Regulatory logging and audit pipelines
• Multi-model fallback infrastructure for resiliency
This is why the next phase of enterprise AI will not be won by the companies deploying the most agents.
It will be won by the companies disciplined enough to deploy AI only where reasoning creates measurable economic asymmetry.
The future belongs to enterprises that understand a hard truth most executives still ignore:
Automation is not automatically profitable, especially when the agent costs more than the employee.
Your Next Internal Software Tool Might Be Built by HR, Not Developers
A firefighter in the UK built an AI-powered grocery route optimization app to reduce time spent walking back and forth inside supermarkets.
A hedge fund managing director created a childcare coordination platform in less than a week.
A Brooklyn entrepreneur vibe-coded an entire construction document management system to organize contracts, blueprints, drawings, and vendor communication for an 18–24 month home-building project.
None of them were traditional software engineers.
That is the real story enterprise leaders should be paying attention to right now.
The cost and complexity of building AI-powered applications are collapsing at a pace most organizations are structurally unprepared for. What previously required product teams, frontend developers, backend engineers, DevOps pipelines, database administrators, and months of sprint planning can now be prototyped through natural language prompts using AI coding systems like Claude, GPT-4o, Cursor, Lovable, Replit Agent, and GitHub Copilot.

GitHub already reports that over 50% of developer code on its platform is now AI-assisted.
Microsoft says developers using Copilot complete certain coding tasks up to 55% faster.
Gartner predicts that by 2027, more than 70% of professional developers will use AI coding assistants daily as part of software production workflows.
The implication is much larger than faster coding.
Business users themselves are becoming application builders.
Operations teams are creating internal workflow automations. Finance departments are prototyping reconciliation tools. HR teams are generating onboarding systems. Sales teams are building custom AI copilots trained on CRM activity. Customer support managers are orchestrating autonomous ticket-routing workflows across multiple systems.
The problem is that most enterprises are allowing this to happen without centralized orchestration, observability, governance, memory synchronization, or data-layer control.

That creates a dangerous form of shadow AI infrastructure.
Every disconnected AI workflow introduces additional vector retrieval pipelines, unmanaged APIs, fragmented embeddings, inconsistent permissions, duplicated enterprise context, and untracked autonomous actions across internal systems. Over time, these isolated agents become operational liabilities rather than productivity multipliers.
This is why enterprises will increasingly move away from allowing every department to independently “vibe-code” production AI systems from scratch.
The organizations scaling successfully are standardizing around reusable, enterprise-grade agents with shared infrastructure underneath.
DataManagement.AI helps companies do exactly that by providing deployment-ready AI agents that already operate with centralized governance, orchestration, enterprise memory, retrieval infrastructure, workflow tracing, and secure access controls built in.

Your teams do not need to independently build vector databases, RAG pipelines, multi-agent orchestration systems, observability tooling, or enterprise retrieval layers every time a new automation use case appears.
The agent foundation already exists.
That fundamentally changes deployment speed, operational reliability, and AI scaling economics.

The enterprises moving fastest right now are not the ones generating the highest number of AI experiments.
They are the ones reducing the distance between identifying a workflow bottleneck and deploying a production-grade AI agent capable of solving it securely at enterprise scale.
Your AI Strategy Is Already Outdated if It Ends at Chatbots and Dashboards
Most enterprise AI strategies today are still designed around digital productivity: copilots inside Slack, AI-generated summaries, automated customer support, workflow orchestration, and coding assistants.
The market is already moving somewhere much bigger.
WIRED recently demonstrated an OpenClaw agent operating a physical robotic arm through natural language reasoning. That sounds experimental until you connect it with what is simultaneously happening across NVIDIA, Figure AI, Tesla Optimus, Sanctuary AI, Boston Dynamics, and OpenAI-backed robotics research.
The underlying shift is that foundation models are no longer being optimized only for text generation. They are increasingly being trained for embodied intelligence: multimodal systems capable of combining computer vision, spatial mapping, reinforcement learning, memory retrieval, sensor fusion, and autonomous action execution in real-world environments.
That changes enterprise infrastructure requirements entirely.

Morgan Stanley estimates the humanoid robotics market could reach $5 trillion globally by 2050. Goldman Sachs projects the humanoid robot market could reach $38 billion by 2035.
NVIDIA CEO Jensen Huang recently said the next multitrillion-dollar industry may be “physical AI,” where models move beyond generating content and begin interacting with factories, warehouses, supply chains, hospitals, retail environments, and industrial operations directly.
The economics behind this transition are becoming increasingly viable because inference costs are falling while model capabilities continue improving. At the same time, labor shortages across logistics, manufacturing, elder care, and industrial operations are intensifying globally. The International Federation of Robotics reported industrial robot installations crossed 540,000 units annually worldwide, while enterprise investment in autonomous systems continues accelerating across Asia, Europe, and North America.
Most organizations are still preparing for AI as software assistance.
The leaders pulling ahead are redesigning infrastructure for autonomous operational systems.
The technical implications are enormous.
Once AI agents start interacting with physical systems, enterprises move from deterministic software environments into probabilistic operational environments where latency, inference reliability, edge compute optimization, retrieval synchronization, permission hierarchies, telemetry observability, failover orchestration, and real-time decision validation become mission-critical infrastructure layers.
A warehouse AI agent coordinating robotic picking systems cannot tolerate hallucinated instructions. A manufacturing AI controlling assembly-line adjustments cannot wait 12 seconds for cloud inference retrieval. A logistics orchestration agent cannot operate on fragmented enterprise memory distributed across disconnected databases and APIs.
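The latency point can be made concrete with a deadline check. This is a minimal sketch under stated assumptions: the function name, the three outcomes, the preference for cloud when it fits, and all latency figures other than the 12-second example are hypothetical.

```python
def pick_inference_target(deadline_ms: float, cloud_p99_ms: float, edge_p99_ms: float) -> str:
    """Choose where to run inference for a control decision with a hard deadline.

    Assumption: the cloud model is preferred when it fits the deadline,
    the edge model is the fallback, and if neither fits the system holds
    its last safe state and escalates to a human.
    """
    if cloud_p99_ms <= deadline_ms:
        return "cloud"
    if edge_p99_ms <= deadline_ms:
        return "edge"
    return "hold"


# An assembly-line adjustment that must land within 200 ms cannot use a
# cloud path whose worst-case latency is 12 seconds.
print(pick_inference_target(deadline_ms=200, cloud_p99_ms=12_000, edge_p99_ms=80))     # edge
# A nightly planning job with a 30-second budget can.
print(pick_inference_target(deadline_ms=30_000, cloud_p99_ms=12_000, edge_p99_ms=80))  # cloud
```

The design choice worth noting is the third branch: in a physical system, "no answer in time" has to map to a defined safe behavior rather than a late answer.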
This is why enterprise leaders need to stop viewing AI adoption as isolated experimentation inside departments.
The companies likely to dominate the next decade will build unified operational architectures where enterprise data systems, AI agents, robotics infrastructure, edge computing, and autonomous orchestration layers operate as one synchronized environment.
The strategic question is no longer whether AI can automate knowledge work.
The strategic question is whether your enterprise architecture is prepared for AI systems capable of acting on the physical world itself.
Your AI Costs Are Starting to Look Like Cloud Costs in 2012, Except Worse
One enterprise reportedly generated a $500 million Claude bill in a single month after giving employees unrestricted access to Anthropic’s platform without usage caps, governance controls, or workload restrictions.
According to Axios, employees heavily used autonomous coding agents, long-context reasoning workflows, and multi-agent orchestration systems that continuously generated token-intensive inference requests across the organization.
This is not an isolated budgeting mistake.

It is the clearest signal yet that enterprises still fundamentally misunderstand the economics of agentic AI infrastructure.
Goldman Sachs estimates enterprise token consumption could increase 24x by 2030, potentially surpassing 120 quadrillion tokens monthly as autonomous AI systems scale across enterprise operations.
Gartner estimates generative AI spending will exceed $644 billion globally in 2026, while inference costs are increasingly overtaking model training costs as the primary operational burden inside enterprise AI environments.
Most executives still evaluate AI spending like traditional SaaS licensing.

Modern AI systems do not behave like SaaS products.
They behave more like distributed compute infrastructure with variable inference economics, dynamic retrieval pipelines, vector database lookups, orchestration frameworks, tool-calling layers, memory synchronization systems, and continuous API consumption happening simultaneously.
Every autonomous workflow compounds cost multiplicatively.
A single AI coding agent may trigger model inference, repository retrieval, RAG pipeline execution, tool invocation, dependency analysis, recursive prompt chains, memory retrieval, code validation, and orchestration retries inside one task cycle.
Long-context reasoning models processing 200K-token windows can consume orders of magnitude more compute than standard chatbot interactions. Multi-agent architectures amplify this further because several models may run at once and hand work to each other across sequential tasks.
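For a sense of scale: in a standard transformer, the attention-score computation grows roughly with the square of context length, while the other layers grow roughly linearly. The sketch below compares a 200K-token window with a 4K-token chat turn. The 4K baseline and the function name are assumptions, and production systems use optimizations that change the constants, so treat the output as an order-of-magnitude guide only.

```python
def relative_prefill_cost(context_tokens: int, baseline_tokens: int) -> dict:
    """Rough compute ratio for processing a long context versus a short one
    in a standard transformer (simplified; ignores real-world optimizations)."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    ratio = context_tokens / baseline_tokens
    return {
        "linear_layers": ratio,          # projections and feed-forward: ~O(n)
        "attention_scores": ratio ** 2,  # pairwise token interactions: ~O(n^2)
    }


cost = relative_prefill_cost(context_tokens=200_000, baseline_tokens=4_000)
print(cost["linear_layers"])     # 50.0
print(cost["attention_scores"])  # 2500.0
```

A context 50 times longer costs about 50 times more in the linear layers and about 2,500 times more in the attention scores, before any retries or agent-to-agent handoffs are counted.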
Microsoft reportedly reduced internal Claude Code deployments after enterprise usage costs climbed between $500 and $2,000 per engineer monthly. Uber exhausted its 2026 AI tooling budget by April after aggressively incentivizing AI adoption internally. Amazon shut down internal AI usage leaderboards after employees began “tokenmaxxing,” assigning unnecessary workloads to agents simply to maximize token consumption metrics.
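The budget arithmetic behind stories like these is simple. The sketch below uses the reported $500 to $2,000 per-engineer range; the 1,000-engineer headcount, the budget figure, and the function name are hypothetical.

```python
def months_until_exhausted(annual_budget_usd: float, engineers: int,
                           cost_per_engineer_usd: float) -> float:
    """How many months an annual budget lasts at a given monthly spend per engineer."""
    monthly_spend = engineers * cost_per_engineer_usd
    return annual_budget_usd / monthly_spend


# Hypothetical: 1,000 engineers, budget planned at $500 per engineer per month.
budget = 1_000 * 500 * 12  # 6,000,000 USD for the year

print(months_until_exhausted(budget, 1_000, 500))    # 12.0
print(months_until_exhausted(budget, 1_000, 2_000))  # 3.0
```

A budget planned at the low end of the range is gone in three months if actual usage lands at the high end.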
This is exactly where enterprise AI strategy needs to mature.
The companies likely to scale AI profitably over the next decade will not deploy frontier models indiscriminately across every workflow. They will optimize inference routing, introduce orchestration governance layers, deploy smaller domain-specific models where possible, implement caching and retrieval optimization, and reserve frontier reasoning models only for high-value cognitive workloads.
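A minimal sketch of what inference routing plus caching could look like. The model names, prices, and task categories are placeholders rather than real products or price lists, and the model call itself is stubbed out.

```python
from functools import lru_cache

# Hypothetical model tiers and prices (USD per 1K tokens); not real price lists.
MODEL_PRICE_PER_1K = {
    "small-domain-model": 0.0005,
    "frontier-reasoning-model": 0.03,
}

# Narrow, repetitive task types assumed to be safe for the smaller model.
NARROW_TASKS = {"ticket_routing", "classification", "anomaly_detection",
                "structured_extraction", "semantic_retrieval"}


def route(task_type: str) -> str:
    """Send narrow tasks to the small model; reserve the frontier model for the rest."""
    return "small-domain-model" if task_type in NARROW_TASKS else "frontier-reasoning-model"


def estimated_cost_usd(task_type: str, tokens: int) -> float:
    return MODEL_PRICE_PER_1K[route(task_type)] * tokens / 1000


model_calls = {"count": 0}  # counts how often the (stubbed) model is actually invoked


@lru_cache(maxsize=1024)
def cached_answer(task_type: str, prompt: str) -> str:
    """Identical requests are served from the cache instead of re-running inference."""
    model_calls["count"] += 1
    return f"[{route(task_type)}] response to: {prompt}"  # stand-in for a real model call


cached_answer("classification", "Is this invoice a duplicate?")
cached_answer("classification", "Is this invoice a duplicate?")  # cache hit, no new call
print(route("classification"))        # small-domain-model
print(route("contract_negotiation"))  # frontier-reasoning-model
print(model_calls["count"])           # 1
```

Exact-match caching like this only helps with repeated identical requests. Real deployments would also need cache invalidation and a way to escalate when the small model is not good enough, both of which are left out here.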

In many enterprise environments, quantized open-source models running on optimized GPU clusters can already match or beat expensive frontier APIs on cost and latency for narrow tasks such as ticket routing, classification pipelines, anomaly detection, structured extraction, and internal semantic retrieval.
The infrastructure strategy matters more than the model itself.
Before adding another expensive AI platform to your enterprise stack, leaders should first understand how fragmented data systems, duplicated pipelines, poor orchestration architecture, and unmanaged retrieval layers silently amplify AI compute costs.
The blog “31 Master Data Management Tools: Best for Integrating Data” is a strong starting point for understanding which platforms help enterprises consolidate fragmented systems before AI workloads turn operational inefficiencies into runaway infrastructure bills.
The enterprises pulling ahead right now are not maximizing AI usage.
They are maximizing inference efficiency, orchestration quality, and business value per token consumed.
Journey Towards AGI
Research and advisory firm guiding on the journey to Artificial General Intelligence
Thank you for reading
-Shen & Towards AGI team