AI Costs Explained: How Much Does It Really Cost to Run AI Features? (December 2025)

"How much will this AI feature cost?" is the question I get asked most. And honestly, the answer used to be "it depends" followed by hand-waving.
Not anymore. After running AI features in production across multiple projects, I can give you concrete numbers. By the end of this post, you'll know exactly how to estimate your AI costs—and how to cut them by 60-80%.
The Basics: How AI Pricing Works
All major providers charge per token. A token is roughly 3/4 of a word, or about 4 characters. A 1,000-word document is approximately 1,333 tokens.
Quick conversion:
- 1,000 words ≈ 1,333 tokens
- 1 page of text ≈ 500 tokens
- Average email ≈ 200-400 tokens
- Average chat message ≈ 50-150 tokens
- Average AI response ≈ 200-500 tokens
Pricing is always quoted per million tokens (MTok), with separate rates for:
- Input tokens: What you send to the model (prompts, context, documents)
- Output tokens: What the model generates (responses, completions)
Output tokens typically cost 3-10x more than input tokens: the model reads your input in one parallel pass, but generates output one token at a time, which is far more expensive per token.
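That's the entire pricing model, so it fits in a few lines of code. Here's a back-of-the-envelope estimator I'll reuse throughout this post (a sketch of my own; the function name and signature are mine, not any provider's SDK):

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float, days: int = 30) -> float:
    """Monthly API cost in USD. Rates are quoted in $ per million tokens (MTok)."""
    monthly_input = requests_per_day * days * input_tokens
    monthly_output = requests_per_day * days * output_tokens
    return (monthly_input / 1e6) * input_rate + (monthly_output / 1e6) * output_rate
```

Every scenario below is just this function with different arguments.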
December 2025 Pricing at a Glance
OpenAI
- GPT-5: $1.25 input / $10 output per MTok
- GPT-5 mini: $0.25 input / $2 output per MTok
- GPT-5 nano: $0.05 input / $0.40 output per MTok
Anthropic
- Claude Opus 4.5: $5 input / $25 output per MTok
- Claude Opus 4.1: $15 input / $75 output per MTok
- Claude Sonnet 4.5: $3 input / $15 output per MTok
- Claude Haiku 4.5: $1 input / $5 output per MTok
Google
- Gemini 3 Pro (≤200K): $2 input / $12 output per MTok
- Gemini 3 Pro (>200K): $4 input / $18 output per MTok
- Gemini 2.5 Flash: $0.30 input / $2.50 output per MTok
- Gemini 2.5 Flash-Lite: $0.10 input / $0.40 output per MTok
Real-World Cost Calculations
Let's work through actual scenarios with real numbers.
Scenario 1: Customer Support Chatbot
Assumptions:
- 1,000 conversations per day
- Average conversation: 5 back-and-forth exchanges
- User message: 100 tokens average
- System prompt: 500 tokens (sent with each request)
- AI response: 300 tokens average
Per conversation:
- Input: (500 system + 100 user) × 5 turns = 3,000 tokens
- Output: 300 × 5 turns = 1,500 tokens
Monthly volume (30,000 conversations):
- Input: 90M tokens
- Output: 45M tokens
Monthly costs by model:
GPT-5 nano: (90 × $0.05) + (45 × $0.40) = $4.50 + $18 = $22.50/month
GPT-5 mini: (90 × $0.25) + (45 × $2) = $22.50 + $90 = $112.50/month
Claude Haiku 4.5: (90 × $1) + (45 × $5) = $90 + $225 = $315/month
Gemini 2.5 Flash-Lite: (90 × $0.10) + (45 × $0.40) = $9 + $18 = $27/month
Insight: For a standard support chatbot, GPT-5 nano or Gemini Flash-Lite costs under $30/month. Even at 10x the volume (10,000 conversations/day), you're looking at roughly $225-270/month with budget models.
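As a sanity check, the nano figure drops straight out of the monthly_cost sketch from earlier:

```python
# 1,000 conversations/day; 3,000 input and 1,500 output tokens per conversation;
# GPT-5 nano at $0.05 input / $0.40 output per MTok.
print(monthly_cost(1_000, 3_000, 1_500, 0.05, 0.40))  # 22.5
```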
Scenario 2: Code Review Pipeline
Assumptions:
- 50 pull requests per day
- Average PR diff: 2,000 tokens
- Context (file content, guidelines): 10,000 tokens
- System prompt: 1,000 tokens
- AI review: 1,500 tokens
Per review:
- Input: 1,000 + 10,000 + 2,000 = 13,000 tokens
- Output: 1,500 tokens
Monthly volume (1,500 PRs):
- Input: 19.5M tokens
- Output: 2.25M tokens
Monthly costs by model:
GPT-5: (19.5 × $1.25) + (2.25 × $10) = $24.38 + $22.50 = $46.88/month
Claude Sonnet 4.5: (19.5 × $3) + (2.25 × $15) = $58.50 + $33.75 = $92.25/month
Claude Haiku 4.5: (19.5 × $1) + (2.25 × $5) = $19.50 + $11.25 = $30.75/month
Insight: Code review with frontier models costs under $100/month for a team of 10-20 developers. The quality difference between Sonnet and Haiku on code review is meaningful—Sonnet catches subtle issues Haiku misses. Worth the extra $60/month.
Scenario 3: Document Analysis Platform
Assumptions:
- 100 documents processed per day
- Average document: 15,000 tokens (about 20 pages)
- System prompt: 500 tokens
- Analysis output: 2,000 tokens per document
Per document:
- Input: 15,500 tokens
- Output: 2,000 tokens
Monthly volume (3,000 documents):
- Input: 46.5M tokens
- Output: 6M tokens
Monthly costs by model:
GPT-5: (46.5 × $1.25) + (6 × $10) = $58.13 + $60 = $118.13/month
Claude Sonnet 4.5: (46.5 × $3) + (6 × $15) = $139.50 + $90 = $229.50/month
Gemini 2.5 Flash: (46.5 × $0.30) + (6 × $2.50) = $13.95 + $15 = $28.95/month
Insight: Document analysis is input-heavy, making Google's pricing extremely competitive. For bulk document processing where speed matters more than maximum quality, Gemini Flash at $29/month beats GPT-5 at $118/month.
Scenario 4: AI Writing Assistant
Assumptions:
- 500 users
- 20 generations per user per day
- User prompt: 200 tokens average
- System prompt + context: 800 tokens
- Generated content: 800 tokens average
Per generation:
- Input: 1,000 tokens
- Output: 800 tokens
Monthly volume (300,000 generations):
- Input: 300M tokens
- Output: 240M tokens
Monthly costs by model:
GPT-5: (300 × $1.25) + (240 × $10) = $375 + $2,400 = $2,775/month
GPT-5 mini: (300 × $0.25) + (240 × $2) = $75 + $480 = $555/month
Claude Sonnet 4.5: (300 × $3) + (240 × $15) = $900 + $3,600 = $4,500/month
Insight: Output-heavy applications get expensive fast. A writing assistant with 500 active users costs $555-4,500/month depending on model choice. GPT-5 mini is the sweet spot for most writing apps—good quality at reasonable cost.
Scenario 5: Enterprise Search (RAG)
Assumptions:
- 5,000 searches per day
- Retrieved context: 4,000 tokens (chunks from vector DB)
- User query: 50 tokens
- System prompt: 300 tokens
- Response: 400 tokens
Per search:
- Input: 4,350 tokens
- Output: 400 tokens
Monthly volume (150,000 searches):
- Input: 652.5M tokens
- Output: 60M tokens
Monthly costs by model:
GPT-5 mini: (652.5 × $0.25) + (60 × $2) = $163.13 + $120 = $283.13/month
GPT-5 nano: (652.5 × $0.05) + (60 × $0.40) = $32.63 + $24 = $56.63/month
Claude Haiku 4.5: (652.5 × $1) + (60 × $5) = $652.50 + $300 = $952.50/month
Insight: RAG applications are input-heavy due to retrieved context. OpenAI's nano/mini tiers or Gemini Flash-Lite provide massive savings. Even enterprise-scale search (5,000 queries/day) costs under $300/month with the right model.
The Cost Reduction Playbook
Raw token costs are just the starting point. Smart architecture can cut your bill by 60-80%.
1. Prompt Caching (Save 90%)
All three providers offer prompt caching. When the same content appears at the start of multiple requests, the cached portion is billed at roughly 10% of the normal input rate (exact discounts, cache-write costs, and cache lifetimes vary by provider).
How it works:
- System prompts (same for every request) → cached
- Common context (shared documents, guidelines) → cached
- User-specific content → not cached
Example impact:
Your chatbot has a 500-token system prompt sent with every request. At 1,000 requests/day:
Without caching: 500 × 1,000 × 30 = 15M tokens/month of system prompt, all billed at full price
With caching: the first request pays full price; repeats pay 10%
Savings on the system prompt alone: ~90% of those 15M tokens, i.e. ~13.5M tokens' worth of full-price input off your bill
Implementation:
- OpenAI: Automatic for matching prefixes in recent requests
- Anthropic: Use prompt caching feature, 5-minute or 1-hour TTL
- Google: Context caching with explicit API calls
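For concreteness, here's roughly what explicit caching looks like with Anthropic's Python SDK. Treat this as a hedged sketch: the cache_control field is Anthropic's documented prompt-caching mechanism, but the model ID and prompt are placeholders, and you should check the current docs for TTL and cache-write pricing.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."  # your ~500-token prompt

response = client.messages.create(
    model="claude-haiku-4-5",  # placeholder model ID
    max_tokens=512,
    # cache_control marks this block as a cacheable prefix; later requests
    # with an identical prefix bill these tokens at the cached-read rate.
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(response.content[0].text)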
2. Batch Processing (Save 50%)
If responses aren't needed immediately, batch APIs offer 50% discounts.
Good candidates:
- Nightly report generation
- Bulk document processing
- Content moderation backlog
- Data enrichment pipelines
Example:
Processing 10,000 documents overnight with Claude Sonnet 4.5:
Standard: $765 (155M input + 20M output tokens at Sonnet rates, using Scenario 3's per-document sizes)
Batch (50% off): $382.50
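The mechanics are straightforward. A sketch of OpenAI's batch flow, assuming you've already written one chat-completion request per line to requests.jsonl (the file name is mine; see the Batch API docs for the exact request schema):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL file of requests, then submit it as a batch job.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the 50% discount applies to the 24h window
)
print(batch.id, batch.status)  # poll until status is "completed", then fetch results
```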
3. Model Routing (Save 40-70%)
Use cheap models for simple tasks, expensive models only when needed.
Architecture:
User Request → Classifier (nano/Flash-Lite) → Route
├── Simple → Budget Model
└── Complex → Frontier Model
Example routing rules:
- FAQ-style questions → GPT-5 nano
- Simple formatting requests → GPT-5 nano
- Coding questions → Claude Sonnet 4.5
- Complex analysis → GPT-5 or Claude Opus
Real impact:
If 70% of requests are simple and 30% are complex:
Without routing: 100% at $3/$15 (Sonnet) = baseline
With routing: 70% at $0.05/$0.40 (nano) + 30% at $3/$15 (Sonnet) = ~35% of baseline
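A minimal routing sketch. The keyword classifier is a deliberate placeholder: in production you'd replace it with a call to a nano/Flash-Lite model that returns "simple" or "complex", and the model IDs here are assumptions:

```python
COMPLEX_HINTS = ("refactor", "debug", "architecture", "analyze", "regression")

def classify(message: str) -> str:
    """Placeholder. Swap in a cheap-model call (GPT-5 nano / Flash-Lite)
    that labels the request 'simple' or 'complex'."""
    return "complex" if any(w in message.lower() for w in COMPLEX_HINTS) else "simple"

def pick_model(message: str) -> str:
    # ~70% of traffic should land on the budget tier; escalate the rest.
    return "gpt-5-nano" if classify(message) == "simple" else "claude-sonnet-4-5"

print(pick_model("What are your support hours?"))         # gpt-5-nano
print(pick_model("Analyze why this refactor breaks CI"))  # claude-sonnet-4-5
```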
4. Response Length Optimization
Output tokens cost 3-10x more than input. Shorter responses = lower costs.
Techniques:
- Explicit length limits in prompts: "Respond in under 100 words"
- Structured output formats: JSON instead of prose
- Multi-turn for details: Brief first response, expand on request
Example:
Reducing average response from 500 to 300 tokens across 100K monthly requests:
At GPT-5 ($10/MTok output): 20M fewer tokens = $200/month saved
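Both levers fit in one request. Using Anthropic's SDK as the example (a sketch; the model ID is a placeholder): the system prompt asks for brevity, and max_tokens hard-caps what you can be billed for.

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-haiku-4-5",  # placeholder model ID
    max_tokens=150,            # hard ceiling on billable output tokens
    system="Answer in under 100 words. Use bullet points, not prose.",
    messages=[{"role": "user", "content": "How do refunds work?"}],
)
print(response.content[0].text)
```

Note that max_tokens truncates rather than summarizes, so treat it as a safety cap and do the real length control in the prompt.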
5. Context Compression
Don't send unnecessary context. Every token costs money.
Techniques:
- Summarize long documents before including
- Only retrieve relevant chunks in RAG (better embeddings = fewer chunks needed)
- Remove redundant instructions
- Use references instead of repetition
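One concrete version of these ideas for RAG: greedily pack the best-scoring chunks into a fixed token budget and drop everything else. A sketch (the names and budget are mine; the 4-characters-per-token estimate is the rule of thumb from the top of this post):

```python
def pack_context(chunks: list[tuple[float, str]], budget_tokens: int = 4_000) -> str:
    """chunks: (relevance_score, text) pairs from your vector DB.
    Tokens are estimated at ~4 characters each."""
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        est_tokens = len(text) // 4
        if used + est_tokens > budget_tokens:
            continue  # skip chunks that would blow the budget
        selected.append(text)
        used += est_tokens
    return "\n\n".join(selected)
```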
Cost Comparison: The Same Feature, Different Providers
Let's price an identical feature across all providers: a coding assistant handling 200 code reviews per day.
Fixed assumptions:
- Code context: 8,000 tokens
- System prompt: 500 tokens
- Review output: 1,000 tokens
- Monthly: 6,000 reviews
- Input: 51M tokens, Output: 6M tokens
Provider comparison:
GPT-5: (51 × $1.25) + (6 × $10) = $123.75/month; ~$120 with the system prompt cached
Claude Sonnet 4.5: (51 × $3) + (6 × $15) = $243/month; ~$235 with caching
Claude Haiku 4.5: (51 × $1) + (6 × $5) = $81/month; ~$78 with caching
Gemini 2.5 Flash: (51 × $0.30) + (6 × $2.50) = $30.30/month; ~$28 with caching
Decision framework:
- Maximum code quality matters → Claude Sonnet 4.5 ($243)
- Good quality, best price-performance → GPT-5 ($124) or Haiku ($81)
- Cost-critical, quality acceptable → Gemini Flash ($30)
Hidden Costs to Budget For
Token costs aren't everything. Factor these in:
1. Development time
Each provider has quirks. Budget 20-40 hours to properly integrate and optimize for any provider. At $150/hour, that's $3,000-6,000 per provider.
2. Error handling and retries
APIs fail. Rate limits trigger. Budget 5-10% extra for retries and fallbacks.
3. Monitoring and observability
You need to track costs, latency, and quality. Tools like Helicone, LangSmith, or custom dashboards have their own costs.
4. Prompt engineering iteration
Your first prompt won't be optimal. Budget time for A/B testing and refinement.
5. Scaling surprises
Usage patterns change. A viral feature can 10x your costs overnight. Set billing alerts.
Building Your Cost Model
Here's a template for estimating your costs:
Step 1: Estimate request volume
- Daily active users: ___
- Actions per user per day: ___
- Total requests per day: ___
- Monthly requests: ___ × 30
Step 2: Estimate tokens per request
- System prompt: ___ tokens
- User input (average): ___ tokens
- Context/retrieval: ___ tokens
- Total input per request: ___
- Output (average): ___ tokens
Step 3: Calculate monthly tokens
- Monthly input tokens: requests × input per request
- Monthly output tokens: requests × output per request
Step 4: Apply pricing
- Input cost: (monthly input ÷ 1,000,000) × input rate
- Output cost: (monthly output ÷ 1,000,000) × output rate
- Base monthly cost: input cost + output cost
Step 5: Apply optimizations
- Caching savings: input cost × 0.09-0.27 (10-30% of input is typically cacheable, and caching removes ~90% of its cost)
- Batch savings: Base × 0.50 (if applicable)
- Routing savings: Base × 0.30-0.60 (if applicable)
Step 6: Add buffer
- Final estimate: Optimized cost × 1.15 (15% buffer for errors, growth)
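Here's the same template as code, so you can plug in your own numbers (a sketch of my own; the defaults mirror the ranges above):

```python
def estimate_monthly_cost(
    daily_users: int, actions_per_user: int,                     # Step 1
    system_tokens: int, input_tokens: int, context_tokens: int,  # Step 2
    output_tokens: int,
    input_rate: float, output_rate: float,                       # Step 4 ($/MTok)
    cacheable_share: float = 0.20,  # Step 5: 10-30% of input is typical
    batch: bool = False,
    buffer: float = 1.15,           # Step 6: 15% for errors and growth
) -> float:
    requests = daily_users * actions_per_user * 30                       # Step 1
    in_tok = requests * (system_tokens + input_tokens + context_tokens)  # Step 3
    out_tok = requests * output_tokens
    input_cost = in_tok / 1e6 * input_rate                               # Step 4
    base = input_cost + out_tok / 1e6 * output_rate
    cost = base - input_cost * cacheable_share * 0.90                    # Step 5
    if batch:
        cost *= 0.50
    return cost * buffer                                                 # Step 6

# Example: the Scenario 1 chatbot on GPT-5 nano (5 turns per conversation).
print(f"${estimate_monthly_cost(1_000, 5, 500, 100, 0, 300, 0.05, 0.40):.2f}")
# ≈ $24.94 -- base $22.50, minus caching savings, plus the 15% buffer
```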
My Recommendations by Budget
Under $50/month (side projects, MVPs)
Use GPT-5 nano or Gemini Flash-Lite for everything. These models are surprisingly capable for simple features. Upgrade specific features to better models only after validating product-market fit.
$50-500/month (growing products)
Implement model routing. Use nano/Flash-Lite for simple tasks, GPT-5 mini or Claude Haiku for complex ones. Add caching for any repeated context. This range covers most early-stage products.
$500-5,000/month (established products)
You can afford frontier models for critical paths. Use Claude Sonnet 4.5 for coding features, GPT-5 for general features. Implement comprehensive caching, batch processing for non-real-time workloads, and consider reserved capacity if available.
$5,000+/month (scale)
At this level, negotiate directly with providers for volume discounts. Implement sophisticated routing with quality monitoring. Consider self-hosted open-source models for some workloads. Every optimization percentage point matters.
The Bottom Line
AI costs are predictable once you understand the math. Here's what most projects actually pay:
- Simple chatbot (1K conversations/day): $20-100/month
- Code review tool (50 PRs/day): $30-100/month
- Document analysis (100 docs/day): $30-250/month
- Writing assistant (500 users): $500-2,500/month
- Enterprise search (5K queries/day): $50-300/month
These numbers assume smart model selection and basic optimization. Without optimization, multiply by 2-3x.
What's Next?
You now know how to estimate and optimize AI costs. But which model should you actually choose for your specific use case?
Previous post: How to Choose the Right AI Model for Your Project
A decision framework based on use case, context requirements, speed needs, and quality/cost tradeoffs.

Frank Atukunda
Software Engineer documenting my transition to AI Engineering. Building 10x .dev to share what I learn along the way.