Tomorrow Prompt

The AI Capability Guide — What's Real, What's Hype, What's Next

A sober, technically grounded guide to what AI can actually do in 2026, what it can't, and what's arriving in the next 12 to 24 months.

Current Capabilities: March 2026

Text & Language

  • Conversation & Q&A: Indistinguishable from a knowledgeable human for most topics. Can sustain coherent dialogue over thousands of words.
  • Writing: Produces publishable-quality content for most contexts. Struggles with distinctive voice and lived experience.
  • Code Generation: Writes functional code in all major languages. Can implement features from descriptions. Still makes subtle logic errors ~15% of the time.
  • Translation: Near-professional quality for major language pairs. Minor cultural nuance gaps remain.
  • Summarisation: Excellent. Can reduce 100-page documents to 1-page briefs with remarkable accuracy.

Vision & Images

  • Image Generation: Photorealistic images on demand. Hands are finally correct (mostly). Text in images still imperfect.
  • Image Understanding: Can describe, analyse, and reason about photos, charts, screenshots, and diagrams.
  • OCR & Document Processing: Near-perfect for printed text. Handwriting recognition is good but not flawless.

Audio & Voice

  • Speech-to-Text: 97%+ accuracy in clean environments. Handles accents and multilingual switching.
  • Text-to-Speech: Indistinguishable from human voice in short segments. Longer form still detectable by careful listeners.
  • Real-Time Conversation: Sub-200ms latency. Natural turn-taking. Can handle interruptions.
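A figure like "97%+ accuracy" is easier to interpret with the metric in hand. The source does not say how accuracy is measured; the sketch below assumes the common convention of reporting it as 1 minus the word error rate (WER), where WER is the word-level edit distance divided by the number of reference words. The function name and the sample transcripts are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts: one substituted word out of eight.
ref = "please book a table for two at seven"
hyp = "please book a table for you at seven"
wer = word_error_rate(ref, hyp)
print(f"WER: {wer:.3f}, accuracy: {1 - wer:.1%}")  # WER: 0.125, accuracy: 87.5%
```

Under this convention, 97% accuracy means roughly three word errors per hundred reference words, which is worth checking against your own recordings rather than a clean benchmark set.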

Reasoning & Analysis

  • Mathematical Reasoning: Correct for most common problems. Unreliable on novel or multi-step problems without chain-of-thought prompting.
  • Logical Deduction: Strong on well-structured problems. Weak on problems requiring real-world common sense.
  • Data Analysis: Can process CSVs, generate charts, identify trends, and perform statistical analysis. Verify numbers — it occasionally fabricates plausible-looking statistics.
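The advice to verify numbers can be made concrete with a small check: recompute the statistic from the raw data and compare it to what the model reported. This is a minimal sketch using only the Python standard library; the CSV contents, the column name, and the "claimed" value are all hypothetical.

```python
import csv
import io
import math
import statistics


def check_claimed_mean(csv_text: str, column: str, claimed: float,
                       rel_tol: float = 0.01) -> tuple[bool, float]:
    """Recompute the mean of `column` and compare it to a claimed value.

    Returns (matches_within_tolerance, recomputed_mean).
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows if row[column].strip()]
    actual = statistics.mean(values)
    return math.isclose(actual, claimed, rel_tol=rel_tol), actual


# Hypothetical data and a hypothetical AI-reported figure.
SAMPLE = """month,revenue
Jan,1200
Feb,1350
Mar,1100
"""

ok, actual = check_claimed_mean(SAMPLE, "revenue", claimed=1300.0)
print(ok, round(actual, 2))  # False 1216.67
```

The same pattern extends to medians, counts, or percentages: treat every number in an AI-written summary as a claim to be recomputed, not a result.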

Agency & Action

  • Web Browsing: Can navigate websites, fill forms, extract information. Limited to approved sites in most agent frameworks.
  • Tool Use (via APIs): Robust. AI can call external services, chain API calls, and handle conditional logic.
  • Computer Control: Can operate desktop applications via screen reading and mouse/keyboard control. Still clumsy compared to a human.
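To illustrate what "chain API calls and handle conditional logic" means in practice, here is a minimal sketch of a tool dispatcher with an allow-list. It is not any particular agent framework: the tool names, the temperature readings, and the hard-coded plan in `heat_check` are all assumptions for illustration. In a real agent the model would choose which tool to call, and the tools would wrap actual external services.

```python
from typing import Callable


# Hypothetical tools with made-up data; stand-ins for real external services.
def get_temperature(city: str) -> float:
    readings = {"Oslo": -3.0, "Cairo": 38.5}
    return readings[city]


def send_alert(message: str) -> str:
    return f"ALERT SENT: {message}"


# Only tools registered here may be called (the "approved" list).
TOOLS: dict[str, Callable] = {
    "get_temperature": get_temperature,
    "send_alert": send_alert,
}


def call_tool(name: str, **kwargs):
    """Dispatch a tool call by name, refusing anything not on the allow-list."""
    if name not in TOOLS:
        raise PermissionError(f"tool {name!r} is not approved")
    return TOOLS[name](**kwargs)


def heat_check(city: str, threshold: float = 35.0) -> str:
    """Chain two calls: read a value, then conditionally trigger a second tool."""
    temp = call_tool("get_temperature", city=city)
    if temp > threshold:
        return call_tool("send_alert", message=f"{city} is at {temp} C")
    return "no action needed"


print(heat_check("Cairo"))  # ALERT SENT: Cairo is at 38.5 C
print(heat_check("Oslo"))   # no action needed
```

Routing every call through one dispatcher keeps the approval check in a single place, which is also where logging and rate limits would naturally go.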

The Limitation Matrix

What AI Struggles With | Why | Timeline to Improvement
Factual accuracy on niche topics | Training data gaps, hallucination tendency | Improving slowly; always verify
Real-time information | Knowledge cutoffs, delayed indexing | Largely solved by tool use + web search
Consistent long-form output | Attention drift in very long documents | 2027 (architecture improvements)
Physical world interaction | Robotics is hard | 2028-2030 for consumer applications
Understanding your specific context | Limited memory and personalisation | 2027 (persistent memory and user models)
Creative originality | Trained on existing work, remixes patterns | Unclear; may be a fundamental limit
Ethical judgment | No lived experience or moral intuition | Open research question

What's Arriving: 2026-2027

H2 2026: Expected Releases

  • Next flagship GPT model from OpenAI: likely a significant reasoning jump
  • Next Gemini flagship from Google, with expanded multimodal capabilities
  • Next Claude generation from Anthropic (speculative): likely focused on reliability and tool use
  • Apple Intelligence 2.0: deeper OS integration, more capable on-device models
  • Next Llama generation from Meta: open-weight frontier model

2027: High-Confidence Predictions

  • Autonomous agents become practical for routine office work
  • AI-generated video becomes indistinguishable from real footage for short clips
  • On-device models reach GPT-4-level performance
  • First mainstream AI-to-AI negotiation protocols
  • Regulatory frameworks emerge in EU, UK, and select US states

2027: Lower-Confidence Predictions

  • AI tutoring demonstrably improves student outcomes at scale
  • First credible claim of "artificial general intelligence" (definition TBD)
  • Major corporate restructuring driven by AI capability (10,000+ role shifts at a single company)

How to Stay Current

  1. Follow the benchmarks. MMLU, HumanEval, GPQA, ARC-AGI — these measure real capability, not marketing claims.
  2. Try things yourself. Benchmarks don't capture whether a tool is useful for your work. Spend 30 minutes/month testing new capabilities.
  3. Ignore the extremes. AI doomers and AI utopians are both wrong. The reality is in the middle — transformative but uneven.
  4. Watch the tools, not the models. The model is the engine. The tool (agent framework, integration layer, UI) is what makes it useful.
  5. Read this site. We track all of this so you don't have to.
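For point 1, it helps to know how a headline benchmark number is computed. HumanEval results are usually reported as pass@k: the probability that at least one of k sampled solutions passes the unit tests. A minimal sketch of the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples were generated and c of them passed; the sample counts in the usage lines are made up.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples passes), given c of n samples passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the ways to draw k samples that all fail;
    # it is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Made-up counts: 3 of 10 generated solutions passed the tests.
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
print(round(pass_at_k(10, 3, 5), 3))  # 0.917
```

The gap between pass@1 and pass@5 here shows why it matters which k a vendor quotes: the same model looks far stronger when it is allowed several attempts.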

Last updated: March 2026