# GitHub Copilot AI Credits: Breaking Down the Token Model That Replaced Premium Requests
#github-copilot
#ai-credits
#billing
#tokens
#llm
@stackdepth | 2026-06-02 06:03:50
On June 1, 2026, GitHub replaced its Premium Request Unit (PRU) model with GitHub AI Credits, a token-based billing system that exposes the actual compute cost behind every Copilot interaction. Some developers are seeing projected bills jump from $29/month to over $700. Others are barely touching their included quota. Here's what's actually happening.

## The Shift

The old PRU model was simple on the surface: every premium request counted as one unit regardless of complexity. A quick question and a multi-hour agentic coding session over an entire repository were metered identically. That's the problem GitHub is solving.

**GitHub AI Credits** replace PRUs with direct token accounting: every model invocation is now billed for its input, output, and cached tokens, each priced at the per-model API rate.

Plan prices didn't change. What changed is that the included usage allotment is now consumed at the actual inference rate instead of a flat request count:

| Plan | Monthly Price | Included Credits |
|------|---------------|------------------|
| Copilot Pro | $10 | $10 |
| Copilot Pro+ | $39 | $39 |
| Copilot Business | $19/user | $19 (promo: $30 through Aug) |
| Copilot Enterprise | $39/user | $39 (promo: $70 through Aug) |

One credit equals one dollar of API spend at published model rates.

## How Credits Work

Credits are consumed based on three token categories:

- **Input tokens**: everything you send, including the prompt, system instructions, file context, and conversation history
- **Output tokens**: the model's response, including code, explanations, and edits
- **Cached tokens**: previously seen context that gets reused at a discounted rate

Each model has a different rate multiplier. The exact multipliers are published in [GitHub's model pricing docs](https://docs.github.com/copilot/reference/copilot-billing/models-and-pricing#model-multipliers-for-annual-copilot-pro-and-copilot-pro-subscribers).
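
The three-category accounting above can be sketched as a small calculation. This is an illustration only: the `ModelRate` type, the `credits_used` helper, and every price below are hypothetical placeholders, not GitHub's published rates, and the sketch assumes cached tokens are counted separately from fresh input tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical USD prices per one million tokens (NOT real GitHub rates)."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


def credits_used(rate: ModelRate, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Return the cost of one invocation in credits (1 credit = $1)."""
    total = (
        input_tokens * rate.input_per_m
        + output_tokens * rate.output_per_m
        + cached_tokens * rate.cached_per_m
    )
    return total / 1_000_000


# Placeholder rates for two imaginary models, chosen only to show the spread.
CHEAP = ModelRate(input_per_m=2.0, output_per_m=8.0, cached_per_m=0.5)
PRICEY = ModelRate(input_per_m=15.0, output_per_m=60.0, cached_per_m=1.5)

# The same request priced on both models.
request = dict(input_tokens=40_000, output_tokens=2_000, cached_tokens=10_000)
cheap_cost = credits_used(CHEAP, **request)
pricey_cost = credits_used(PRICEY, **request)
print(f"cheap:  ${cheap_cost:.3f}")
print(f"pricey: ${pricey_cost:.3f}")
```

With these made-up rates the identical request costs roughly seven times more on the expensive model, which is the kind of gap the per-model multipliers create.
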
In practice: GPT-4o burns credits at a fraction of the rate that o1 or Claude Opus does for equivalent output.

Two important carve-outs:

- **Code completions and Next Edit Suggestions** do not consume AI Credits. These remain included in all plans without any quota.
- **Copilot code review** consumes both AI Credits and GitHub Actions minutes, with the minutes billed at the same per-minute rate as other Actions workflows.

## Where Credits Burn Fast

The PRU model trained users to think of Copilot as a flat-rate service. The token model exposes the actual cost structure that was always underneath.

**Agentic sessions are the main driver.** A Copilot agent running a multi-step coding task (reading files, writing code, running tests, iterating on failures) can spawn dozens of sub-agents, each with its own input context. The input token count compounds quickly when the agent re-reads large files on every iteration.

**Context window size matters more than output.** Input tokens typically cost less than output tokens per unit, but agentic loops pass large contexts on every turn. A 50K-token repository context passed to 20 agent iterations is 1M input tokens (50,000 × 20) before counting a single token of output.

**Model choice has a direct multiplier effect.** Running o1 or o3 on tasks that GPT-4o could handle adequately is the fastest path to credit exhaustion. The reasoning models are significantly more expensive per token.

**Code review is billed on two meters.** Because Copilot code review consumes GitHub Actions minutes on top of AI Credits, automated review workflows on active repositories accumulate cost in two billing dimensions at once.

## The Numbers

The community sticker shock is real. On Reddit, one developer reported their monthly cost projecting from ~$29 to ~$750. Another shared a screenshot showing a $50 baseline ballooning toward $3,000. The critical question is: what usage pattern produces a multiplier of 25x or more?

The honest answer from the community: **unbounded agentic iteration with expensive models**. Developers who treat Copilot as an autonomous agent that can churn indefinitely on complex tasks ("vibe coding") are hitting the ceiling hard. Developers who use Copilot as a precision tool, selecting targeted files and cheaper models for routine tasks, report barely touching their included credits.

This isn't a defense of GitHub's rollout. The criticism that Microsoft built and encouraged the high-burn usage patterns, then switched billing models, is legitimate. GitHub's own documentation made multi-step agentic sessions a selling point right up until the pricing change.

## Controlling Token Burn

The new model rewards intentional usage. These are the levers that actually matter:

**Model selection.** For chat Q&A and single-file edits, GPT-4o is sufficient and significantly cheaper than the reasoning models. Reserve o1/o3 for tasks that require deep planning: architecture decisions, complex refactors, debugging non-obvious failures.

**Context scoping.** Copilot in VS Code respects `.copilotignore`, a file placed at the repo root that uses the same syntax as `.gitignore`. Excluding `node_modules`, build artifacts, generated files, and test fixtures from context reduces input token volume substantially.

```text
# .copilotignore
node_modules/
dist/
build/
*.lock
coverage/
*.min.js
```

**Workspace instructions.** A `.github/copilot-instructions.md` file lets you define shared context once instead of injecting it per prompt. A concise, well-written instructions file cuts the context you would otherwise paste into every interaction as input tokens.

**Agent loop discipline.** If you're running agentic workflows, set explicit stopping conditions. Copilot agents that iterate without a clear success criterion will burn credits across the full retry path. Prefer targeted, verifiable tasks over open-ended exploration.
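
For agent loops you orchestrate yourself, explicit stopping conditions can be sketched as follows. Everything here is hypothetical: `run_with_budget`, the `Step` callback, and `stuck_agent` are illustrative names, not part of any Copilot API, and the token counts are assumed to be reported by whatever runs each turn. The sketch stops on success, on an iteration cap, or once cumulative input tokens reach a budget.

```python
from typing import Callable, Dict, Tuple

# One agent turn (hypothetical): returns (succeeded, input_tokens, output_tokens).
Step = Callable[[int], Tuple[bool, int, int]]


def run_with_budget(step: Step, max_iterations: int,
                    max_input_tokens: int) -> Dict[str, object]:
    """Run `step` until it succeeds, the iteration cap is hit, or the
    cumulative input-token budget is reached.

    The budget is checked after each turn, so the final turn can overshoot it.
    """
    total_in = 0
    total_out = 0
    iterations = 0
    status = "iteration_cap"
    for i in range(1, max_iterations + 1):
        done, used_in, used_out = step(i)
        iterations = i
        total_in += used_in
        total_out += used_out
        if done:
            status = "success"
            break
        if total_in >= max_input_tokens:
            status = "budget_exceeded"
            break
    return {
        "status": status,
        "iterations": iterations,
        "input_tokens": total_in,
        "output_tokens": total_out,
    }


def stuck_agent(iteration: int) -> Tuple[bool, int, int]:
    """Fake turn that re-sends a 50K-token context and never succeeds."""
    return (False, 50_000, 1_000)


# Without a meaningful token budget, 20 turns re-send 1M input tokens.
uncapped = run_with_budget(stuck_agent, max_iterations=20, max_input_tokens=10**9)
# With a 400K budget the same loop halts after 8 turns.
capped = run_with_budget(stuck_agent, max_iterations=20, max_input_tokens=400_000)
print(uncapped)
print(capped)
```

The uncapped run mirrors the 50K-context, 20-iteration case described earlier, while the capped run shows how a token ceiling bounds the retry path even when the task never converges.
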

**Monitor usage now.** GitHub rolled out a preview bill tool in early May, ahead of the June 1 switch. Billing Overview on github.com shows projected spend based on current usage. Business and Enterprise admins have budget controls at the enterprise, cost center, and user level. Set a hard cap before the first billing cycle under the new model.

## The Bigger Picture

The PRU model was a subsidy. GitHub absorbed the actual inference cost difference between a one-line question and a 48-hour autonomous coding session. The new model ends that.

This isn't unique to Copilot. Every flat-rate LLM wrapper eventually hits the same math: per-request pricing doesn't survive the transition from synchronous chat to long-running agents with large context windows. The compute cost scales with tokens, not requests. When agentic usage became the dominant pattern, the per-request model stopped making financial sense.

The disruption here is the speed of the transition. Developers who built workflows around the assumption of unlimited flat-rate agentic usage now need to re-examine those workflows against a token-based cost model. That's a real adjustment, regardless of whether the old pricing was ever sustainable.

The change forces clarity that was always missing: what does an hour of AI-assisted development actually cost, and is that cost proportional to the value it delivers?

---

Source: [TechCrunch](https://techcrunch.com/2026/05/30/what-a-joke-github-copilots-new-token-based-billing-spurs-consternation-among-devs/) · [GitHub Blog](https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/)