Tags: AI Engineering, AI Costs, LLM Pricing, AI Budget, GPT-5, Claude, Gemini

AI Costs Explained: How Much Does It Really Cost to Run AI Features? (December 2025)

Frank Atukunda
Software Engineer
December 7, 2025
12 min read

"How much will this AI feature cost?" is the question I get asked most. And honestly, the answer used to be "it depends" followed by hand-waving.

Not anymore. After running AI features in production across multiple projects, I can give you concrete numbers. By the end of this post, you'll know exactly how to estimate your AI costs—and how to cut them by 60-80%.

The Basics: How AI Pricing Works

All major providers charge per token. A token is roughly 3/4 of a word, or about 4 characters. A 1,000-word document is approximately 1,333 tokens.

Quick conversion:

  • 1,000 words ≈ 1,333 tokens
  • 1 page of text ≈ 500 tokens
  • Average email ≈ 200-400 tokens
  • Average chat message ≈ 50-150 tokens
  • Average AI response ≈ 200-500 tokens

Pricing is always quoted per million tokens (MTok), with separate rates for:

  • Input tokens: What you send to the model (prompts, context, documents)
  • Output tokens: What the model generates (responses, completions)

Output tokens typically cost 3-10x more than input tokens because generation is computationally harder than reading.
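
These ratios are fine for budgeting, but if you want exact counts, OpenAI's tiktoken library tokenizes text the same way the models do. A minimal sketch (o200k_base is the encoding for recent OpenAI models; Anthropic and Google tokenize a bit differently, but counts land in the same ballpark):

# pip install tiktoken
import tiktoken

# o200k_base is the encoding used by recent OpenAI models.
enc = tiktoken.get_encoding("o200k_base")

text = "How much will this AI feature cost? Let's find out."
token_count = len(enc.encode(text))
print(f"{len(text.split())} words -> {token_count} tokens")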


December 2025 Pricing at a Glance

OpenAI

  • GPT-5: $1.25 input / $10 output per MTok
  • GPT-5 mini: $0.25 input / $2 output per MTok
  • GPT-5 nano: $0.05 input / $0.40 output per MTok

Anthropic

  • Claude Opus 4.5: $5 input / $25 output per MTok
  • Claude Opus 4.1: $15 input / $75 output per MTok
  • Claude Sonnet 4.5: $3 input / $15 output per MTok
  • Claude Haiku 4.5: $1 input / $5 output per MTok

Google

  • Gemini 3 Pro (≤200K context): $2 input / $12 output per MTok
  • Gemini 3 Pro (>200K context): $4 input / $18 output per MTok
  • Gemini 2.5 Flash: $0.30 input / $2.50 output per MTok
  • Gemini 2.5 Flash-Lite: $0.10 input / $0.40 output per MTok


Real-World Cost Calculations

Let's work through actual scenarios with real numbers.
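
Every scenario repeats the same arithmetic, so here's a small helper you can adapt to your own numbers. Rates are in dollars per million tokens, matching the prices above:

def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float, days: int = 30) -> float:
    """Monthly API cost in dollars for one workload."""
    requests = requests_per_day * days
    return (requests * input_tokens / 1_000_000 * input_rate
            + requests * output_tokens / 1_000_000 * output_rate)

# Scenario 1 below, on GPT-5 nano: 1,000 conversations/day,
# 3,000 input and 1,500 output tokens per conversation.
print(monthly_cost(1_000, 3_000, 1_500, 0.05, 0.40))  # -> 22.5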

Scenario 1: Customer Support Chatbot

Assumptions:

  • 1,000 conversations per day
  • Average conversation: 5 back-and-forth exchanges
  • User message: 100 tokens average
  • System prompt: 500 tokens (sent with each request)
  • AI response: 300 tokens average

Per conversation:

  • Input: (500 system + 100 user) × 5 turns = 3,000 tokens
  • Output: 300 × 5 turns = 1,500 tokens

Monthly volume (30,000 conversations):

  • Input: 90M tokens
  • Output: 45M tokens

Monthly costs by model:

GPT-5 nano: (90 × $0.05) + (45 × $0.40) = $4.50 + $18 = $22.50/month

GPT-5 mini: (90 × $0.25) + (45 × $2) = $22.50 + $90 = $112.50/month

Claude Haiku 4.5: (90 × $1) + (45 × $5) = $90 + $225 = $315/month

Gemini 2.5 Flash-Lite: (90 × $0.10) + (45 × $0.40) = $9 + $18 = $27/month

Insight: For a standard support chatbot, GPT-5 nano or Gemini Flash-Lite costs under $30/month. Even at 10x the volume (10,000 conversations/day), you're looking at roughly $225-270/month with budget models.


Scenario 2: Code Review Pipeline

Assumptions:

  • 50 pull requests per day
  • Average PR diff: 2,000 tokens
  • Context (file content, guidelines): 10,000 tokens
  • System prompt: 1,000 tokens
  • AI review: 1,500 tokens

Per review:

  • Input: 1,000 + 10,000 + 2,000 = 13,000 tokens
  • Output: 1,500 tokens

Monthly volume (1,500 PRs):

  • Input: 19.5M tokens
  • Output: 2.25M tokens

Monthly costs by model:

GPT-5: (19.5 × $1.25) + (2.25 × $10) = $24.38 + $22.50 = $46.88/month

Claude Sonnet 4.5: (19.5 × $3) + (2.25 × $15) = $58.50 + $33.75 = $92.25/month

Claude Haiku 4.5: (19.5 × $1) + (2.25 × $5) = $19.50 + $11.25 = $30.75/month

Insight: Code review with frontier models costs under $100/month for a team of 10-20 developers. The quality difference between Sonnet and Haiku on code review is meaningful—Sonnet catches subtle issues Haiku misses. Worth the extra $60/month.


Scenario 3: Document Analysis Platform

Assumptions:

  • 100 documents processed per day
  • Average document: 15,000 tokens (about 20 pages)
  • System prompt: 500 tokens
  • Analysis output: 2,000 tokens per document

Per document:

  • Input: 15,500 tokens
  • Output: 2,000 tokens

Monthly volume (3,000 documents):

  • Input: 46.5M tokens
  • Output: 6M tokens

Monthly costs by model:

GPT-5: (46.5 × $1.25) + (6 × $10) = $58.13 + $60 = $118.13/month

Claude Sonnet 4.5: (46.5 × $3) + (6 × $15) = $139.50 + $90 = $229.50/month

Gemini 2.5 Flash: (46.5 × $0.30) + (6 × $2.50) = $13.95 + $15 = $28.95/month

Insight: Document analysis is input-heavy, making Google's pricing extremely competitive. For bulk document processing where speed matters more than maximum quality, Gemini Flash at $29/month beats GPT-5 at $118/month.


Scenario 4: AI Writing Assistant

Assumptions:

  • 500 users
  • 20 generations per user per day
  • User prompt: 200 tokens average
  • System prompt + context: 800 tokens
  • Generated content: 800 tokens average

Per generation:

  • Input: 1,000 tokens
  • Output: 800 tokens

Monthly volume (300,000 generations):

  • Input: 300M tokens
  • Output: 240M tokens

Monthly costs by model:

GPT-5: (300 × $1.25) + (240 × $10) = $375 + $2,400 = $2,775/month

GPT-5 mini: (300 × $0.25) + (240 × $2) = $75 + $480 = $555/month

Claude Sonnet 4.5: (300 × $3) + (240 × $15) = $900 + $3,600 = $4,500/month

Insight: Output-heavy applications get expensive fast. A writing assistant with 500 active users costs $555-4,500/month depending on model choice. GPT-5 mini is the sweet spot for most writing apps—good quality at reasonable cost.


Scenario 5: Enterprise Search (RAG)

Assumptions:

  • 5,000 searches per day
  • Retrieved context: 4,000 tokens (chunks from vector DB)
  • User query: 50 tokens
  • System prompt: 300 tokens
  • Response: 400 tokens

Per search:

  • Input: 4,350 tokens
  • Output: 400 tokens

Monthly volume (150,000 searches):

  • Input: 652.5M tokens
  • Output: 60M tokens

Monthly costs by model:

GPT-5 mini: (652.5 × $0.25) + (60 × $2) = $163.13 + $120 = $283.13/month

GPT-5 nano: (652.5 × $0.05) + (60 × $0.40) = $32.63 + $24 = $56.63/month

Claude Haiku 4.5: (652.5 × $1) + (60 × $5) = $652.50 + $300 = $952.50/month

Insight: RAG applications are input-heavy due to retrieved context. OpenAI's nano/mini tiers or Gemini Flash-Lite provide massive savings. Even enterprise-scale search (5,000 queries/day) costs under $300/month with the right model.


The Cost Reduction Playbook

Raw token costs are just the starting point. Smart architecture can cut your bill by 60-80%.

1. Prompt Caching (Save 90%)

All three providers offer prompt caching. When the same content appears at the start of multiple requests, you pay 10% of the normal input cost.

How it works:

  • System prompts (same for every request) → cached
  • Common context (shared documents, guidelines) → cached
  • User-specific content → not cached

Example impact:

Your chatbot has a 500-token system prompt sent with every request. At 1,000 requests/day:

Without caching: 500 × 1,000 × 30 = 15M tokens/month of system prompt
With caching: the first request pays full price; the rest pay 10%

Savings on the system prompt alone: 15M tokens × 90% ≈ 13.5M tokens' worth of input cost eliminated

Implementation:

  • OpenAI: Automatic for matching prefixes in recent requests
  • Anthropic: Use the prompt caching feature, 5-minute or 1-hour TTL (sketched below)
  • Google: Context caching with explicit API calls
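
To make the Anthropic flavor concrete, here's a minimal sketch using the anthropic Python SDK. Marking the system prompt with cache_control means repeat requests within the TTL bill it at roughly 10% of the normal input rate; the model ID and prompt text are placeholders:

# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "You are a support agent for Acme..."  # imagine ~500 tokens here

response = client.messages.create(
    model="claude-haiku-4-5",  # placeholder; use the current model ID
    max_tokens=512,
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        # Everything up to this marker is cached for the TTL window;
        # later requests pay ~10% of the normal input price for it.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Where is order #12345?"}],
)
print(response.content[0].text)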

2. Batch Processing (Save 50%)

If responses aren't needed immediately, batch APIs offer 50% discounts.

Good candidates:

  • Nightly report generation
  • Bulk document processing
  • Content moderation backlog
  • Data enrichment pipelines

Example:

Processing the 3,000 documents from Scenario 3 overnight with Claude Sonnet 4.5:

Standard: $229.50
Batch (50% off): $114.75
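
As an illustration, here's roughly what that looks like with OpenAI's Batch API: write one JSONL line per request, upload the file, and create a batch with a 24-hour completion window. The model ID and prompts are placeholders:

# pip install openai
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = ["First document text...", "Second document text..."]

# One request per line in the JSONL file.
with open("requests.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        f.write(json.dumps({
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-5-mini",  # placeholder model ID
                "messages": [{"role": "user", "content": f"Summarize:\n{doc}"}],
            },
        }) + "\n")

batch_input = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # results within 24 hours, billed at 50% off
)
print(batch.id, batch.status)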

3. Model Routing (Save 40-70%)

Use cheap models for simple tasks, expensive models only when needed.

Architecture:

User Request → Classifier (nano/Flash-Lite) → Route
                                              ├── Simple → Budget Model
                                              └── Complex → Frontier Model

Example routing rules:

  • FAQ-style questions → GPT-5 nano
  • Simple formatting requests → GPT-5 nano
  • Coding questions → Claude Sonnet 4.5
  • Complex analysis → GPT-5 or Claude Opus

Real impact:

If 70% of requests are simple and 30% are complex:

Without routing: 100% at $3/$15 (Sonnet) = baseline
With routing: 70% at $0.05/$0.40 (nano) + 30% at $3/$15 (Sonnet) = ~35% of baseline
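
A minimal version of this router, sketched with the OpenAI SDK (model IDs and the classifier prompt are illustrative; in production you'd want to log the classifier's decisions and spot-check them):

from openai import OpenAI

client = OpenAI()

CHEAP, FRONTIER = "gpt-5-nano", "gpt-5"  # placeholder model IDs

def route_and_answer(user_message: str) -> str:
    # Step 1: classify with the cheapest model, forcing a one-word verdict.
    verdict = client.chat.completions.create(
        model=CHEAP,
        messages=[
            {"role": "system", "content": (
                "Classify the request as SIMPLE (FAQ, formatting, lookup) or "
                "COMPLEX (coding, multi-step analysis). Reply with one word.")},
            {"role": "user", "content": user_message},
        ],
    ).choices[0].message.content.strip().upper()

    # Step 2: answer with whichever model the classifier picked.
    model = FRONTIER if "COMPLEX" in verdict else CHEAP
    answer = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_message}],
    )
    return answer.choices[0].message.content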

4. Response Length Optimization

Output tokens cost 3-10x more than input. Shorter responses = lower costs.

Techniques:

  • Explicit length limits in prompts: "Respond in under 100 words"
  • Structured output formats: JSON instead of prose
  • Multi-turn for details: Brief first response, expand on request

Example:

Reducing average response from 500 to 300 tokens across 100K monthly requests:

At GPT-5 ($10/MTok output): 20M fewer tokens = $200/month saved
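
The first two techniques combine naturally: a hard token ceiling as a backstop, plus an instruction the model can follow gracefully. A sketch with the OpenAI SDK (the model ID is a placeholder; newer OpenAI models take max_completion_tokens rather than max_tokens):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5-mini",          # placeholder model ID
    max_completion_tokens=150,   # hard ceiling on billed output tokens
    messages=[
        {"role": "system", "content": (
            "Respond in under 100 words. "
            "If the user wants more detail, offer to expand.")},
        {"role": "user", "content": "Explain prompt caching."},
    ],
)
print(response.choices[0].message.content)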

5. Context Compression

Don't send unnecessary context. Every token costs money.

Techniques:

  • Summarize long documents before including
  • Only retrieve relevant chunks in RAG (better embeddings = fewer chunks needed); see the sketch after this list
  • Remove redundant instructions
  • Use references instead of repetition
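
For the RAG bullet, the cheapest fix is usually tightening what you send. A sketch assuming your vector store returns (chunk, similarity) pairs sorted best-first; the threshold and budget are illustrative:

def select_context(scored_chunks, max_tokens=2_000, min_score=0.75):
    """Keep only high-similarity chunks, stopping at a token budget.

    Token counts use the ~4-characters-per-token rule of thumb from
    the start of this post.
    """
    kept, used = [], 0
    for text, score in scored_chunks:
        if score < min_score:
            break  # chunks are sorted, so the rest are even less relevant
        tokens = len(text) // 4
        if used + tokens > max_tokens:
            break
        kept.append(text)
        used += tokens
    return "\n\n".join(kept)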

Cost Comparison: The Same Feature, Different Providers

Let's price an identical feature across all providers: a coding assistant handling 200 code reviews per day.

Fixed assumptions:

  • Code context: 8,000 tokens
  • System prompt: 500 tokens
  • Review output: 1,000 tokens
  • Monthly: 6,000 reviews
  • Input: 51M tokens, Output: 6M tokens

Provider comparison:

GPT-5: (51 × $1.25) + (6 × $10) = $123.75/month base; ~$120/month with the system prompt cached

Claude Sonnet 4.5: (51 × $3) + (6 × $15) = $243/month base; ~$235/month with caching

Claude Haiku 4.5: (51 × $1) + (6 × $5) = $81/month base; ~$78/month with caching

Gemini 2.5 Flash: (51 × $0.30) + (6 × $2.50) = $30.30/month base; ~$28/month with caching

Decision framework:

  • Maximum code quality matters → Claude Sonnet 4.5 ($243)
  • Good quality, best price-performance → GPT-5 ($124) or Haiku ($81)
  • Cost-critical, quality acceptable → Gemini Flash ($30)

Hidden Costs to Budget For

Token costs aren't everything. Factor these in:

1. Development time

Each provider has quirks. Budget 20-40 hours to properly integrate and optimize for any provider. At $150/hour, that's $3,000-6,000 per provider.

2. Error handling and retries

APIs fail. Rate limits trigger. Budget 5-10% extra for retries and fallbacks.

3. Monitoring and observability

You need to track costs, latency, and quality. Tools like Helicone, LangSmith, or custom dashboards have their own costs.

4. Prompt engineering iteration

Your first prompt won't be optimal. Budget time for A/B testing and refinement.

5. Scaling surprises

Usage patterns change. A viral feature can 10x your costs overnight. Set billing alerts.


Building Your Cost Model

Here's a template for estimating your costs, followed by a runnable version after the steps:

Step 1: Estimate request volume

  • Daily active users: ___
  • Actions per user per day: ___
  • Total requests per day: ___
  • Monthly requests: ___ × 30

Step 2: Estimate tokens per request

  • System prompt: ___ tokens
  • User input (average): ___ tokens
  • Context/retrieval: ___ tokens
  • Total input per request: ___
  • Output (average): ___ tokens

Step 3: Calculate monthly tokens

  • Monthly input tokens: requests × input per request
  • Monthly output tokens: requests × output per request

Step 4: Apply pricing

  • Input cost: (monthly input ÷ 1,000,000) × input rate
  • Output cost: (monthly output ÷ 1,000,000) × output rate
  • Base monthly cost: input cost + output cost

Step 5: Apply optimizations

  • Caching savings: Base × 0.10-0.30 (10-30% of input is cacheable)
  • Batch savings: Base × 0.50 (if applicable)
  • Routing savings: Base × 0.30-0.60 (if applicable)

Step 6: Add buffer

  • Final estimate: Optimized cost × 1.15 (15% buffer for errors, growth)
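
And the template as a function, so you can plug in your own numbers. The optimization multipliers are the same rough heuristics as above, not guarantees:

def estimate_monthly_cost(
    daily_users: int, actions_per_user: float,       # step 1: volume
    system_tokens: int, user_tokens: int,
    context_tokens: int, output_tokens: int,         # step 2: tokens per request
    input_rate: float, output_rate: float,           # step 4: $ per MTok
    cacheable_fraction: float = 0.2,                 # step 5: 10-30% is typical
    batch_fraction: float = 0.0,                     # share of traffic batchable
    buffer: float = 1.15,                            # step 6: 15% buffer
) -> float:
    requests = daily_users * actions_per_user * 30   # steps 1 + 3
    input_cost = (requests * (system_tokens + user_tokens + context_tokens)
                  / 1e6 * input_rate)
    output_cost = requests * output_tokens / 1e6 * output_rate  # step 4
    input_cost *= 1 - 0.9 * cacheable_fraction   # cached input bills at ~10%
    total = (input_cost + output_cost) * (1 - 0.5 * batch_fraction)  # batch = 50% off
    return total * buffer                        # step 6

# Sanity check against Scenario 4 on GPT-5 mini (no optimizations, no buffer):
print(estimate_monthly_cost(500, 20, 800, 200, 0, 800, 0.25, 2.00,
                            cacheable_fraction=0.0, buffer=1.0))  # -> 555.0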

My Recommendations by Budget

Under $50/month (side projects, MVPs)

Use GPT-5 nano or Gemini Flash-Lite for everything. These models are surprisingly capable for simple features. Upgrade specific features to better models only after validating product-market fit.

$50-500/month (growing products)

Implement model routing. Use nano/Flash-Lite for simple tasks, GPT-5 mini or Claude Haiku for complex ones. Add caching for any repeated context. This range covers most early-stage products.

$500-5,000/month (established products)

You can afford frontier models for critical paths. Use Claude Sonnet 4.5 for coding features, GPT-5 for general features. Implement comprehensive caching, batch processing for non-real-time workloads, and consider reserved capacity if available.

$5,000+/month (scale)

At this level, negotiate directly with providers for volume discounts. Implement sophisticated routing with quality monitoring. Consider self-hosted open-source models for some workloads. Every optimization percentage point matters.


The Bottom Line

AI costs are predictable once you understand the math. Here's what most projects actually pay:

  • Simple chatbot (1K conversations/day): $20-100/month
  • Code review tool (50 PRs/day): $30-100/month
  • Document analysis (100 docs/day): $30-250/month
  • Writing assistant (500 users): $500-2,500/month
  • Enterprise search (5K queries/day): $50-300/month

These numbers assume smart model selection and basic optimization. Without optimization, multiply by 2-3x.


What's Next?

You now know how to estimate and optimize AI costs. But which model should you actually choose for your specific use case?

Previous post: How to Choose the Right AI Model for Your Project

A decision framework based on use case, context requirements, speed needs, and quality/cost tradeoffs.

