AI Costs Explained: How Much Does It Really Cost to Run AI Features? (December 2025)

"How much will this AI feature cost?" is the question I get asked most. And honestly, the answer used to be "it depends" followed by hand-waving.
Not anymore. After running AI features in production across multiple projects, I can give you concrete numbers. By the end of this post, you'll know exactly how to estimate your AI costs—and how to cut them by 60-80%.
The Basics: How AI Pricing Works
All major providers charge per token. A token is roughly 3/4 of a word, or about 4 characters. A 1,000-word document is approximately 1,333 tokens.
Quick conversion:
- 1,000 words ≈ 1,333 tokens
- 1 page of text ≈ 500 tokens
- Average email ≈ 200-400 tokens
- Average chat message ≈ 50-150 tokens
- Average AI response ≈ 200-500 tokens
Pricing is always quoted per million tokens (MTok), with separate rates for:
- Input tokens: What you send to the model (prompts, context, documents)
- Output tokens: What the model generates (responses, completions)
Output tokens typically cost 3-10x more than input tokens: the model reads your input in one parallel pass, but generates output one token at a time, which is far more expensive per token.
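That's the entire pricing model, so it fits in a few lines of code. Here's a back-of-the-envelope estimator I'll reuse throughout this post (a sketch of my own; the function name and signature are mine, not any provider's SDK):

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float, days: int = 30) -> float:
    """Monthly API cost in USD. Rates are quoted in $ per million tokens (MTok)."""
    monthly_input = requests_per_day * days * input_tokens
    monthly_output = requests_per_day * days * output_tokens
    return (monthly_input / 1e6) * input_rate + (monthly_output / 1e6) * output_rate
```

Every scenario below is just this function with different arguments.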
December 2025 Pricing at a Glance
OpenAI
- GPT-5: $1.25 input / $10 output per MTok
- GPT-5 mini: $0.25 input / $2 output per MTok
- GPT-5 nano: $0.05 input / $0.40 output per MTok
Anthropic
- Claude Opus 4.5: $5 input / $25 output per MTok
- Claude Opus 4.1: $15 input / $75 output per MTok
- Claude Sonnet 4.5: $3 input / $15 output per MTok
- Claude Haiku 4.5: $1 input / $5 output per MTok
Google
- Gemini 3 Pro (≤200K): $2 input / $12 output per MTok
- Gemini 3 Pro (>200K): $4 input / $18 output per MTok
- Gemini 2.5 Flash: $0.30 input / $2.50 output per MTok
- Gemini 2.5 Flash-Lite: $0.10 input / $0.40 output per MTok
Real-World Cost Calculations
Let's work through actual scenarios with real numbers.
Scenario 1: Customer Support Chatbot
Assumptions:
- 1,000 conversations per day
- Average conversation: 5 back-and-forth exchanges
- User message: 100 tokens average
- System prompt: 500 tokens (sent with each request)
- AI response: 300 tokens average
Per conversation:
- Input: (500 system + 100 user) × 5 turns = 3,000 tokens
- Output: 300 × 5 turns = 1,500 tokens
Monthly volume (30,000 conversations):
- Input: 90M tokens
- Output: 45M tokens
Monthly costs by model:
GPT-5 nano: (90 × $0.05) + (45 × $0.40) = $4.50 + $18 = $22.50/month
GPT-5 mini: (90 × $0.25) + (45 × $2) = $22.50 + $90 = $112.50/month
Claude Haiku 4.5: (90 × $1) + (45 × $5) = $90 + $225 = $315/month
Gemini 2.5 Flash-Lite: (90 × $0.10) + (45 × $0.40) = $9 + $18 = $27/month
Insight: For a standard support chatbot, GPT-5 nano or Gemini Flash-Lite costs under $30/month. Even at 10x the volume (10,000 conversations/day), you're looking at roughly $225-270/month with budget models.
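As a sanity check, the nano figure drops straight out of the monthly_cost sketch from earlier:

```python
# 1,000 conversations/day; 3,000 input and 1,500 output tokens per conversation;
# GPT-5 nano at $0.05 input / $0.40 output per MTok.
print(monthly_cost(1_000, 3_000, 1_500, 0.05, 0.40))  # 22.5
```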
Scenario 2: Code Review Pipeline
Assumptions:
- 50 pull requests per day
- Average PR diff: 2,000 tokens
- Context (file content, guidelines): 10,000 tokens
- System prompt: 1,000 tokens
- AI review: 1,500 tokens
Per review:
- Input: 1,000 + 10,000 + 2,000 = 13,000 tokens
- Output: 1,500 tokens
Monthly volume (1,500 PRs):
- Input: 19.5M tokens
- Output: 2.25M tokens
Monthly costs by model:
GPT-5: (19.5 × $1.25) + (2.25 × $10) = $24.38 + $22.50 = $46.88/month
Claude Sonnet 4.5: (19.5 × $3) + (2.25 × $15) = $58.50 + $33.75 = $92.25/month
Claude Haiku 4.5: (19.5 × $1) + (2.25 × $5) = $19.50 + $11.25 = $30.75/month
Insight: Code review with frontier models costs under $100/month for a team of 10-20 developers. The quality difference between Sonnet and Haiku on code review is meaningful—Sonnet catches subtle issues Haiku misses. Worth the extra $60/month.
Scenario 3: Document Analysis Platform
Assumptions:
- 100 documents processed per day
- Average document: 15,000 tokens (about 20 pages)
- System prompt: 500 tokens
- Analysis output: 2,000 tokens per document
Per document:
- Input: 15,500 tokens
- Output: 2,000 tokens
Monthly volume (3,000 documents):
- Input: 46.5M tokens
- Output: 6M tokens
Monthly costs by model:
GPT-5: (46.5 × $1.25) + (6 × $10) = $58.13 + $60 = $118.13/month
Claude Sonnet 4.5: (46.5 × $3) + (6 × $15) = $139.50 + $90 = $229.50/month
Gemini 2.5 Flash: (46.5 × $0.30) + (6 × $2.50) = $13.95 + $15 = $28.95/month
Insight: Document analysis is input-heavy, making Google's pricing extremely competitive. For bulk document processing where speed matters more than maximum quality, Gemini Flash at $29/month beats GPT-5 at $118/month.
Scenario 4: AI Writing Assistant
Assumptions:
- 500 users
- 20 generations per user per day
- User prompt: 200 tokens average
- System prompt + context: 800 tokens
- Generated content: 800 tokens average
Per generation:
- Input: 1,000 tokens
- Output: 800 tokens
Monthly volume (300,000 generations):
- Input: 300M tokens
- Output: 240M tokens
Monthly costs by model:
GPT-5: (300 × $1.25) + (240 × $10) = $375 + $2,400 = $2,775/month
GPT-5 mini: (300 × $0.25) + (240 × $2) = $75 + $480 = $555/month
Claude Sonnet 4.5: (300 × $3) + (240 × $15) = $900 + $3,600 = $4,500/month
Insight: Output-heavy applications get expensive fast. A writing assistant with 500 active users costs $555-4,500/month depending on model choice. GPT-5 mini is the sweet spot for most writing apps—good quality at reasonable cost.
Scenario 5: Enterprise Search (RAG)
Assumptions:
- 5,000 searches per day
- Retrieved context: 4,000 tokens (chunks from vector DB)
- User query: 50 tokens
- System prompt: 300 tokens
- Response: 400 tokens
Per search:
- Input: 4,350 tokens
- Output: 400 tokens
Monthly volume (150,000 searches):
- Input: 652.5M tokens
- Output: 60M tokens
Monthly costs by model:
GPT-5 mini: (652.5 × $0.25) + (60 × $2) = $163.13 + $120 = $283.13/month
GPT-5 nano: (652.5 × $0.05) + (60 × $0.40) = $32.63 + $24 = $56.63/month
Claude Haiku 4.5: (652.5 × $1) + (60 × $5) = $652.50 + $300 = $952.50/month
Insight: RAG applications are input-heavy due to retrieved context. OpenAI's nano/mini tiers or Gemini Flash-Lite provide massive savings. Even enterprise-scale search (5,000 queries/day) costs under $300/month with the right model.
The Cost Reduction Playbook
Raw token costs are just the starting point. Smart architecture can cut your bill by 60-80%.
1. Prompt Caching (Save 90%)
All three providers offer prompt caching. When the same content appears at the start of multiple requests, the cached portion is billed at roughly 10% of the normal input rate (exact discounts, cache-write costs, and cache lifetimes vary by provider).
How it works:
- System prompts (same for every request) → cached
- Common context (shared documents, guidelines) → cached
- User-specific content → not cached
Example impact:
Your chatbot has a 500-token system prompt sent with every request. At 1,000 requests/day:
Without caching: 500 × 1,000 × 30 = 15M tokens/month of system prompt, all billed at full price
With caching: the first request pays full price; repeats pay 10%
Savings on the system prompt alone: ~90% of those 15M tokens, i.e. ~13.5M tokens' worth of full-price input off your bill
Implementation:
- OpenAI: Automatic for matching prefixes in recent requests
- Anthropic: Use prompt caching feature, 5-minute or 1-hour TTL
- Google: Context caching with explicit API calls
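For concreteness, here's roughly what explicit caching looks like with Anthropic's Python SDK. Treat this as a hedged sketch: the cache_control field is Anthropic's documented prompt-caching mechanism, but the model ID and prompt are placeholders, and you should check the current docs for TTL and cache-write pricing.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."  # your ~500-token prompt

response = client.messages.create(
    model="claude-haiku-4-5",  # placeholder model ID
    max_tokens=512,
    # cache_control marks this block as a cacheable prefix; later requests
    # with an identical prefix bill these tokens at the cached-read rate.
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(response.content[0].text)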
2. Batch Processing (Save 50%)
If responses aren't needed immediately, batch APIs offer 50% discounts.
Good candidates:
- Nightly report generation
- Bulk document processing
- Content moderation backlog
- Data enrichment pipelines
Example:
Processing 10,000 documents overnight with Claude Sonnet 4.5:
Standard: $765 (155M input + 20M output tokens at Sonnet rates, using Scenario 3's per-document sizes)
Batch (50% off): $382.50
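The mechanics are straightforward. A sketch of OpenAI's batch flow, assuming you've already written one chat-completion request per line to requests.jsonl (the file name is mine; see the Batch API docs for the exact request schema):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL file of requests, then submit it as a batch job.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the 50% discount applies to the 24h window
)
print(batch.id, batch.status)  # poll until status is "completed", then fetch results
```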
3. Model Routing (Save 40-70%)
Use cheap models for simple tasks, expensive models only when needed.
Architecture:
User Request → Classifier (nano/Flash-Lite) → Route
├── Simple → Budget Model
└── Complex → Frontier Model
Example routing rules:
- FAQ-style questions → GPT-5 nano
- Simple formatting requests → GPT-5 nano
- Coding questions → Claude Sonnet 4.5
- Complex analysis → GPT-5 or Claude Opus
Real impact:
If 70% of requests are simple and 30% are complex:
Without routing: 100% at $3/$15 (Sonnet) = baseline
With routing: 70% at $0.05/$0.40 (nano) + 30% at $3/$15 (Sonnet) = ~35% of baseline
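A minimal routing sketch. The keyword classifier is a deliberate placeholder: in production you'd replace it with a call to a nano/Flash-Lite model that returns "simple" or "complex", and the model IDs here are assumptions:

```python
COMPLEX_HINTS = ("refactor", "debug", "architecture", "analyze", "regression")

def classify(message: str) -> str:
    """Placeholder. Swap in a cheap-model call (GPT-5 nano / Flash-Lite)
    that labels the request 'simple' or 'complex'."""
    return "complex" if any(w in message.lower() for w in COMPLEX_HINTS) else "simple"

def pick_model(message: str) -> str:
    # ~70% of traffic should land on the budget tier; escalate the rest.
    return "gpt-5-nano" if classify(message) == "simple" else "claude-sonnet-4-5"

print(pick_model("What are your support hours?"))         # gpt-5-nano
print(pick_model("Analyze why this refactor breaks CI"))  # claude-sonnet-4-5
```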
4. Response Length Optimization
Output tokens cost 3-10x more than input. Shorter responses = lower costs.
Techniques:
- Explicit length limits in prompts: "Respond in under 100 words"
- Structured output formats: JSON instead of prose
- Multi-turn for details: Brief first response, expand on request
Example:
Reducing average response from 500 to 300 tokens across 100K monthly requests:
At GPT-5 ($10/MTok output): 20M fewer tokens = $200/month saved
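Both levers fit in one request. Using Anthropic's SDK as the example (a sketch; the model ID is a placeholder): the system prompt asks for brevity, and max_tokens hard-caps what you can be billed for.

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-haiku-4-5",  # placeholder model ID
    max_tokens=150,            # hard ceiling on billable output tokens
    system="Answer in under 100 words. Use bullet points, not prose.",
    messages=[{"role": "user", "content": "How do refunds work?"}],
)
print(response.content[0].text)
```

Note that max_tokens truncates rather than summarizes, so treat it as a safety cap and do the real length control in the prompt.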
5. Context Compression
Don't send unnecessary context. Every token costs money.
Techniques:
- Summarize long documents before including
- Only retrieve relevant chunks in RAG (better embeddings = fewer chunks needed)
- Remove redundant instructions
- Use references instead of repetition
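One concrete version of these ideas for RAG: greedily pack the best-scoring chunks into a fixed token budget and drop everything else. A sketch (the names and budget are mine; the 4-characters-per-token estimate is the rule of thumb from the top of this post):

```python
def pack_context(chunks: list[tuple[float, str]], budget_tokens: int = 4_000) -> str:
    """chunks: (relevance_score, text) pairs from your vector DB.
    Tokens are estimated at ~4 characters each."""
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        est_tokens = len(text) // 4
        if used + est_tokens > budget_tokens:
            continue  # skip chunks that would blow the budget
        selected.append(text)
        used += est_tokens
    return "\n\n".join(selected)
```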
Cost Comparison: The Same Feature, Different Providers
Let's price an identical feature across all providers: a coding assistant handling 200 code reviews per day.
Fixed assumptions:
- Code context: 8,000 tokens
- System prompt: 500 tokens
- Review output: 1,000 tokens
- Monthly: 6,000 reviews
- Input: 51M tokens, Output: 6M tokens
Provider comparison:
GPT-5: (51 × $1.25) + (6 × $10) = $123.75/month; ~$120 with the system prompt cached
Claude Sonnet 4.5: (51 × $3) + (6 × $15) = $243/month; ~$235 with caching
Claude Haiku 4.5: (51 × $1) + (6 × $5) = $81/month; ~$78 with caching
Gemini 2.5 Flash: (51 × $0.30) + (6 × $2.50) = $30.30/month; ~$28 with caching
Decision framework:
- Maximum code quality matters → Claude Sonnet 4.5 ($243)
- Good quality, best price-performance → GPT-5 ($124) or Haiku ($81)
- Cost-critical, quality acceptable → Gemini Flash ($30)
Hidden Costs to Budget For
Token costs aren't everything. Factor these in:
1. Development time
Each provider has quirks. Budget 20-40 hours to properly integrate and optimize for any provider. At $150/hour, that's $3,000-6,000 per provider.
2. Error handling and retries
APIs fail. Rate limits trigger. Budget 5-10% extra for retries and fallbacks.
3. Monitoring and observability
You need to track costs, latency, and quality. Tools like Helicone, LangSmith, or custom dashboards have their own costs.
4. Prompt engineering iteration
Your first prompt won't be optimal. Budget time for A/B testing and refinement.
5. Scaling surprises
Usage patterns change. A viral feature can 10x your costs overnight. Set billing alerts.
Building Your Cost Model
Here's a template for estimating your costs:
Step 1: Estimate request volume
- Daily active users: ___
- Actions per user per day: ___
- Total requests per day: ___
- Monthly requests: ___ × 30
Step 2: Estimate tokens per request
- System prompt: ___ tokens
- User input (average): ___ tokens
- Context/retrieval: ___ tokens
- Total input per request: ___
- Output (average): ___ tokens
Step 3: Calculate monthly tokens
- Monthly input tokens: requests × input per request
- Monthly output tokens: requests × output per request
Step 4: Apply pricing
- Input cost: (monthly input ÷ 1,000,000) × input rate
- Output cost: (monthly output ÷ 1,000,000) × output rate
- Base monthly cost: input cost + output cost
Step 5: Apply optimizations
- Caching savings: input cost × 0.09-0.27 (10-30% of input is typically cacheable, and caching removes ~90% of its cost)
- Batch savings: Base × 0.50 (if applicable)
- Routing savings: Base × 0.30-0.60 (if applicable)
Step 6: Add buffer
- Final estimate: Optimized cost × 1.15 (15% buffer for errors, growth)
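Here's the same template as code, so you can plug in your own numbers (a sketch of my own; the defaults mirror the ranges above):

```python
def estimate_monthly_cost(
    daily_users: int, actions_per_user: int,                     # Step 1
    system_tokens: int, input_tokens: int, context_tokens: int,  # Step 2
    output_tokens: int,
    input_rate: float, output_rate: float,                       # Step 4 ($/MTok)
    cacheable_share: float = 0.20,  # Step 5: 10-30% of input is typical
    batch: bool = False,
    buffer: float = 1.15,           # Step 6: 15% for errors and growth
) -> float:
    requests = daily_users * actions_per_user * 30                       # Step 1
    in_tok = requests * (system_tokens + input_tokens + context_tokens)  # Step 3
    out_tok = requests * output_tokens
    input_cost = in_tok / 1e6 * input_rate                               # Step 4
    base = input_cost + out_tok / 1e6 * output_rate
    cost = base - input_cost * cacheable_share * 0.90                    # Step 5
    if batch:
        cost *= 0.50
    return cost * buffer                                                 # Step 6

# Example: the Scenario 1 chatbot on GPT-5 nano (5 turns per conversation).
print(f"${estimate_monthly_cost(1_000, 5, 500, 100, 0, 300, 0.05, 0.40):.2f}")
# ≈ $24.94 -- base $22.50, minus caching savings, plus the 15% buffer
```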
My Recommendations by Budget
Under $50/month (side projects, MVPs)
Use GPT-5 nano or Gemini Flash-Lite for everything. These models are surprisingly capable for simple features. Upgrade specific features to better models only after validating product-market fit.
$50-500/month (growing products)
Implement model routing. Use nano/Flash-Lite for simple tasks, GPT-5 mini or Claude Haiku for complex ones. Add caching for any repeated context. This range covers most early-stage products.
$500-5,000/month (established products)
You can afford frontier models for critical paths. Use Claude Sonnet 4.5 for coding features, GPT-5 for general features. Implement comprehensive caching, batch processing for non-real-time workloads, and consider reserved capacity if available.
$5,000+/month (scale)
At this level, negotiate directly with providers for volume discounts. Implement sophisticated routing with quality monitoring. Consider self-hosted open-source models for some workloads. Every optimization percentage point matters.
The Bottom Line
AI costs are predictable once you understand the math. Here's what most projects actually pay:
- Simple chatbot (1K conversations/day): $20-100/month
- Code review tool (50 PRs/day): $30-100/month
- Document analysis (100 docs/day): $30-250/month
- Writing assistant (500 users): $500-2,500/month
- Enterprise search (5K queries/day): $50-300/month
These numbers assume smart model selection and basic optimization. Without optimization, multiply by 2-3x.
What's Next?
You now know how to estimate and optimize AI costs. But which model should you actually choose for your specific use case?
Previous post: How to Choose the Right AI Model for Your Project
A decision framework based on use case, context requirements, speed needs, and quality/cost tradeoffs.

Frank Atukunda
Software Engineer documenting my transition to AI Engineering. Building 10x .dev to share what I learn along the way.