Key Takeaways
– Microsoft killed internal Claude Code licenses across its Experiences & Devices unit after five months of engineers actively picking it over Copilot CLI. Your own team’s preference data is worth more than any analyst report.
– GitHub Copilot Desktop dropped in technical preview May 15. Multi-agent support, auto-review-merge, unified inbox. Usage-based pricing kicks in this month.
– OpenAI Codex hit iOS and Android May 14 with 4 million weekly users, Hooks customization, Remote SSH GA, and HIPAA compliance for Enterprise. The most platform-flexible of the three.
– Claude Code /goals shipped May 14 with a split model. One to execute, one to evaluate. It keeps going until the work’s actually done, not just until the agent wraps up early.
– If your developers already picked Claude Code, Microsoft just handed you negotiating use. Use it.
—
The memo hit Slack on a Wednesday. Rajesh Jha, Microsoft EVP of Experiences & Devices, told his division to torch all internal Claude Code licenses by June 30.
Why? Engineers kept choosing Claude Code over Copilot CLI. And Microsoft’s leadership decided they’d rather own the worse tool than share the better one with Anthropic.
That’s the sentence that matters.
Not the launch announcements.
Not the feature matrices. A enterprise spending billions on developer tooling looked at what its own engineers actually used, saw Claude Code winning. And responded by making it unavailable.
“Something we can help shape directly.” That’s how they framed it internally. Vendor lock-in, dressed up as a product decision.
Let’s look at what actually dropped last week and what it means for the five-person dev shop trying to pick a winner.
Four Moves in Forty-Eight Hours
Between May 14 and May 16, the coding agent market basically had four new model unveilings at the same auto show.
Microsoft cancelled Claude Code internally. Not ‘given that it was broken’. As engineers were actively choosing it. Five months of usage data, clear preference pattern. The fix wasn’t making Copilot CLI better. It was making the better tool disappear and hoping people would settle.
GitHub launched Copilot Desktop in technical preview on May 15.
Built on the Copilot CLI that went GA back in February, it adds a unified inbox for issues and PRs, simultaneous multi-agent support, and Agent Merge. Which auto-responds to code review comments and fixes CI failures without a human touching anything. Positioned directly against Claude Code. Business and Enterprise plans get in now. Pro and Pro+ on a waitlist.
OpenAI brought Codex to iOS and Android on May 14. Four million people use it weekly. The mobile release adds Hooks. Custom scripts for scanning prompts and tailoring output. Plus Remote SSH in general availability and HIPAA compliance for Enterprise customers.
If you’re a consultancy working with health-tech clients, that’s the compliance gate you needed.
Anthropic shipped /goals for Claude Code on May 14.
Same day as the Microsoft and OpenAI moves. Honestly this is the most underrated release of the week. It splits the agent into two models: one that does the work, one that figures out whether the work is actually done. Default evaluator is Haiku. If the goal isn’t met, the agent keeps plugging away. No more premature exits burning tokens on half-finished refactors.
Four moves. Two days.
Here’s the read on who actually wins.
The Pricing Shift That Changes Everything
The billing model change is the story that matters more than any feature announcement.
GitHub’s May 12 changelog confirms usage-based billing is coming.
Copilot is moving away from flat subscription pricing toward per-token costs tied to which AI model you’re running. If you were on the $19/month Pro plan and running Claude 3.5 Sonnet as your backend, your actual cost structure is about to change.
Copilot CLI itself is still bundled with existing GitHub plans. But Copilot Desktop. The new app. That’s the bridge product.
Once usage-based billing kicks in, the math shifts depending on team size and usage patterns.
Claude Code runs $100-200/month per seat on Pro/Max tiers.
OpenAI Codex requires a ChatGPT paid plan. Starts at $20/month. But the per-seat cost flattens once you’re in the system.
Microsoft’s play here is obvious: eat the Copilot cost for your developers, recover it through Azure billing, keep you inside the Microsoft stack.
For a five-person dev team, the math looks like this: if your developers spend 60% of their day pair-programming with Copilot, the $100/seat Claude Code cost starts looking like real money against a tool your IDE already bundles.
But. And this is the part Microsoft’s own engineers apparently found — if Claude Code ships twice as fast on complex tasks, the per-seat math flips. Hard.
The actual question isn’t which tool costs less upfront.
It’s which tool lets your two senior developers do the work of four.
That’s the comparison that matters for small shops. Not the feature list. The headcount use.
Why Microsoft’s Natural Experiment Is the Most Credible A/B Test This Year
Here’s the part that should make every indie developer and small business owner sit up.
Microsoft ran a five-month natural experiment. They gave engineers access to both Claude Code and Copilot CLI. They watched what people actually used. Engineers chose Claude Code.
The tool Anthropic built, not the tool Microsoft controls.
And Microsoft cancelled it anyway.
That tells you something important about how enterprise procurement actually works versus how it’s advertised. The advertised story is “we pick the best tools for the job.” The actual story is “we pick the tools we can own.”
If you’re a small shop choosing a coding agent today, you don’t have a procurement department forcing a switch. You can listen to what developers actually prefer.
You can pick based on output quality, not vendor relationship.
The five-month Microsoft experiment is the most credible A/B test in the coding agent space this year.
Engineers chose Claude Code. Microsoft chose Copilot CLI. Only one of those choices was driven by code quality.
Your team gets to make the call that Microsoft’s engineers weren’t allowed to make.
What to Do This Week
Not the theory. The action.
If your developers already use Claude Code: your use just went up. Microsoft validated that tool choice by cancelling it under competitive pressure. Negotiate your seat pricing like you have options — because you do.
If you’re choosing a coding agent for a new project: try all three in real work this week. Set a two-hour timer, give each agent the same task. A small CRUD endpoint with tests — count what ships versus what needs editing. Don’t trust the marketing pages. Trust the output.
If you’re on Copilot and considering Codex mobile: the HIPAA compliance and Remote SSH GA are real additions if you’re in regulated industries. The Hooks feature is genuinely useful for tailoring output to your codebase conventions. Four million weekly users means the edge cases are mostly found.
If you’re running a team on Copilot Desktop: watch your token bills closely once usage-based billing activates. The flat subscription model masked actual cost. The new model shows you exactly what each model costs per task. That’s information you want, even if it stings.
If you want to understand what /goals means for your workflow: it’s the feature that prevents the “agent said it was done but nothing changed” problem. For long-running refactors and multi-file changes, a separate evaluator model checking completion before exiting is worth more than any chatbot improvement.
The coding agent war isn’t won in press releases.
It’s won in the commit logs. Developers who got something done versus the ones who spent the afternoon debugging agent output.
Microsoft’s own engineers already told us which tool works. The memo just reminded us what happens when the business side ignores that.
