The AI Capability Guide — What's Real, What's Hype, What's Next
A sober, technically grounded guide to what AI can actually do in 2026, what it can't, and what's arriving in the next 12-24 months.
Current Capabilities: March 2026
Text & Language
- Conversation & Q&A: Often hard to distinguish from a knowledgeable human on mainstream topics. Can sustain coherent dialogue over thousands of words.
- Writing: Produces publishable-quality content for most contexts. Struggles with distinctive voice and lived experience.
- Code Generation: Writes functional code in all major languages and can implement features from descriptions. Still introduces subtle logic errors in roughly 15% of outputs (an approximate figure that varies by task and model).
- Translation: Near-professional quality for major language pairs. Minor cultural nuance gaps remain.
- Summarisation: Excellent. Can reduce 100-page documents to 1-page briefs with remarkable accuracy.
Vision & Images
- Image Generation: Photorealistic images on demand. Hands are finally correct (mostly). Text in images still imperfect.
- Image Understanding: Can describe, analyse, and reason about photos, charts, screenshots, and diagrams.
- OCR & Document Processing: Near-perfect for printed text. Handwriting recognition is good but not flawless.
Audio & Voice
- Speech-to-Text: 97%+ accuracy in clean environments. Handles accents and multilingual switching.
- Text-to-Speech: Indistinguishable from human voice in short segments. Longer form still detectable by careful listeners.
- Real-Time Conversation: Sub-200ms latency. Natural turn-taking. Can handle interruptions.
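
Speech-to-text accuracy figures like the 97%+ above are usually read as word accuracy, that is, one minus the word error rate (WER). The guide does not say which metric it means, so the sketch below assumes that reading. The function name and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts: one substitution ("the" -> "a") and one deletion ("morning").
reference = "please schedule the meeting for nine thirty tomorrow morning"
hypothesis = "please schedule a meeting for nine thirty tomorrow"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}, word accuracy: {1 - wer:.2%}")
```

Running the same check on a few minutes of your own audio, with a hand-corrected transcript as the reference, shows how far a tool is from the headline figure in your environment.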
Reasoning & Analysis
- Mathematical Reasoning: Correct for most common problems. Unreliable on novel or multi-step problems without chain-of-thought prompting.
- Logical Deduction: Strong on well-structured problems. Weak on problems requiring real-world common sense.
- Data Analysis: Can process CSVs, generate charts, identify trends, and perform statistical analysis. Verify numbers — it occasionally fabricates plausible-looking statistics.
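
One way to act on the "verify numbers" advice is to recompute any reported statistic directly from the source data. The sketch below uses only the Python standard library. The CSV contents, the `revenue` column, and the two "claimed" values are invented for illustration.

```python
import csv
import io
import math
import statistics

# Hypothetical data standing in for the CSV that was given to the AI tool.
CSV_TEXT = """month,revenue
Jan,1200
Feb,1350
Mar,1280
Apr,1500
"""


def recompute_mean(csv_text: str, column: str) -> float:
    """Recompute the mean of one numeric column straight from the raw CSV."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return statistics.mean(float(row[column]) for row in rows)


def matches_claim(claimed: float, actual: float, rel_tol: float = 0.005) -> bool:
    """True if the claimed figure is within rel_tol (0.5% by default) of the recomputed one."""
    return math.isclose(claimed, actual, rel_tol=rel_tol)


actual_mean = recompute_mean(CSV_TEXT, "revenue")
# Two hypothetical figures an AI summary might report for mean monthly revenue.
for claimed in (1332.5, 1410.0):
    verdict = "matches" if matches_claim(claimed, actual_mean) else "DOES NOT match"
    print(f"claimed {claimed} {verdict} recomputed mean {actual_mean}")
```

The tolerance is a design choice: it allows for rounding in the reported figure while still catching numbers that were never in the data.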
Agency & Action
- Web Browsing: Can navigate websites, fill forms, extract information. Limited to approved sites in most agent frameworks.
- Tool Use (via APIs): Robust. AI can call external services, chain API calls, and handle conditional logic.
- Computer Control: Can operate desktop applications via screen reading and mouse/keyboard control. Still clumsy compared to a human.
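
To make "chain API calls and handle conditional logic" concrete, here is a minimal sketch of the dispatch pattern that sits underneath tool use. It does not reproduce any real agent framework's API. The tool names, the stubbed weather data, and `plan_commute` are all hypothetical, and the control flow a model would normally decide is hard-coded here.

```python
from typing import Callable

# Stubbed data; a real tool would call an external weather service instead.
RAIN_PROBABILITY = {"Leeds": 0.7, "Seville": 0.1}


def get_weather(city: str) -> dict:
    """Hypothetical tool: look up the rain probability for a city."""
    return {"city": city, "rain_probability": RAIN_PROBABILITY.get(city, 0.0)}


def send_reminder(message: str) -> dict:
    """Hypothetical tool: pretend to deliver a reminder."""
    return {"sent": True, "message": message}


# Allowlist of callable tools, similar in spirit to restricting agents to approved sites.
TOOLS: dict[str, Callable[..., dict]] = {
    "get_weather": get_weather,
    "send_reminder": send_reminder,
}


def call_tool(name: str, **kwargs) -> dict:
    """Dispatch a tool call by name, refusing anything not on the allowlist."""
    if name not in TOOLS:
        raise KeyError(f"tool not on the allowlist: {name}")
    return TOOLS[name](**kwargs)


def plan_commute(city: str) -> list[dict]:
    """Chain two calls; the second runs only if the first result warrants it."""
    results = [call_tool("get_weather", city=city)]
    if results[0]["rain_probability"] > 0.5:
        results.append(call_tool("send_reminder", message=f"Take an umbrella in {city}"))
    return results


print(plan_commute("Leeds"))    # weather lookup followed by a reminder
print(plan_commute("Seville"))  # weather lookup only
```

Routing every call through one dispatcher keeps the set of permitted actions explicit and gives a single place to log or block requests.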
The Limitation Matrix
| What AI Struggles With | Why | Timeline to Improvement |
|---|---|---|
| Factual accuracy on niche topics | Training data gaps, hallucination tendency | Improving slowly — always verify |
| Real-time information | Knowledge cutoffs, delayed indexing | Largely solved by tool use + web search |
| Consistent long-form output | Attention drift in very long documents | 2027 — architecture improvements |
| Physical world interaction | Robotics is hard | 2028-2030 for consumer applications |
| Understanding your specific context | Limited memory and personalisation | 2027 — persistent memory and user models |
| Creative originality | Trained on existing work, remixes patterns | Unclear — may be a fundamental limit |
| Ethical judgment | No lived experience or moral intuition | Open research question |
What's Arriving: 2026-2027
H2 2026: Expected Releases
- GPT-5 or equivalent from OpenAI: likely a significant reasoning jump
- Gemini 2.0 Ultra: Google's flagship with expanded multimodal capabilities
- Claude 4 (speculative): likely focused on reliability and tool use
- Apple Intelligence 2.0: deeper OS integration, more capable on-device models
- Llama 4 from Meta: open-weight frontier model
2027: High-Confidence Predictions
- Autonomous agents become practical for routine office work
- AI-generated video becomes indistinguishable from real footage for short clips
- On-device models reach GPT-4-level performance
- First mainstream AI-to-AI negotiation protocols
- Regulatory frameworks take fuller effect in the EU (where the AI Act is already phasing in), the UK, and select US states
2027: Lower-Confidence Predictions
- AI tutoring demonstrably improves student outcomes at scale
- First credible claim of "artificial general intelligence" (definition TBD)
- Major corporate restructuring driven by AI capability (10,000+ role shifts at a single company)
How to Stay Current
- Follow the benchmarks. MMLU, HumanEval, GPQA, and ARC-AGI give standardised, comparable measurements rather than marketing claims.
- Try things yourself. Benchmarks don't capture whether a tool is useful for your work. Spend 30 minutes/month testing new capabilities.
- Ignore the extremes. AI doomers and AI utopians are both wrong. The reality is in the middle — transformative but uneven.
- Watch the tools, not the models. The model is the engine. The tool (agent framework, integration layer, UI) is what makes it useful.
- Read this site. We track all of this so you don't have to.
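
When following the benchmarks listed above, it helps to know how a headline score is computed. HumanEval, for instance, reports pass@k: the chance that at least one of k generated solutions passes the unit tests. Below is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples were generated and c of them passed. The sample counts in the usage lines are made up.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes,
    given that c of the n generated samples passed the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical result: 20 samples generated for one problem, 5 of them passed.
print(f"pass@1  = {pass_at_k(20, 5, 1):.3f}")
print(f"pass@10 = {pass_at_k(20, 5, 10):.3f}")
```

The gap between pass@1 and pass@10 is worth keeping in mind when comparing published scores, since a model can look much stronger when it is allowed several attempts.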
Last updated: March 2026