The AI Capability Guide — What's Real, What's Hype, What's Next

A sober, technically-grounded guide to what AI can actually do in 2026, what it can't, and what's arriving in the next 12-24 months.

Conversation & Q&A: Indistinguishable from a knowledgeable human for most topics. Can sustain coherent dialogue over thousands of words.
Writing: Produces publishable-quality content for most contexts. Struggles with distinctive voice and lived experience.
Code Generation: Writes functional code in all major languages. Can implement features from descriptions. Still makes subtle logic errors ~15% of the time.
Translation: Near-professional quality for major language pairs. Minor cultural nuance gaps remain.
Summarisation: Excellent. Can reduce 100-page documents to 1-page briefs with remarkable accuracy.

Image Generation: Photorealistic images on demand. Hands are finally correct (mostly). Text in images still imperfect.
Image Understanding: Can describe, analyse, and reason about photos, charts, screenshots, and diagrams.
OCR & Document Processing: Near-perfect for printed text. Handwriting recognition is good but not flawless.

Speech-to-Text: 97%+ accuracy in clean environments. Handles accents and multilingual switching.
Text-to-Speech: Indistinguishable from human voice in short segments. Longer form still detectable by careful listeners.
Real-Time Conversation: Sub-200ms latency. Natural turn-taking. Can handle interruptions.
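
The guide does not say how the speech-to-text accuracy figure is measured. A common convention, assumed here, is word accuracy = 1 - WER, where WER (word error rate) is the word-level edit distance between a reference transcript and the system's output, divided by the number of reference words. The sketch below is a minimal illustration of that calculation; the two sentences are made-up examples, not benchmark data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word ("four" heard as "for").
reference = "please book a table for four at seven tonight"
hypothesis = "please book a table for for at seven tonight"

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}, word accuracy: {1 - wer:.3f}")
```

Under this assumption, 97% accuracy corresponds to roughly three word errors per hundred reference words, which is easy to check on your own recordings.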

Mathematical Reasoning: Correct for most common problems. Unreliable on novel or multi-step problems without chain-of-thought prompting.
Logical Deduction: Strong on well-structured problems. Weak on problems requiring real-world common sense.
Data Analysis: Can process CSVs, generate charts, identify trends, and perform statistical analysis. Verify numbers — it occasionally fabricates plausible-looking statistics.
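
One way to act on the "verify numbers" advice is to recompute any statistic the assistant reports directly from the raw file. The sketch below is only an illustration: the CSV contents, the `revenue` column, and the claimed figures are all hypothetical stand-ins for your own data and for whatever the AI summary says.

```python
import csv
import io
import math
import statistics

# Hypothetical data standing in for the CSV you gave the assistant.
CSV_TEXT = """month,revenue
Jan,1200
Feb,1350
Mar,1100
Apr,1500
"""

# Hypothetical figures copied from an AI-generated summary.
claimed = {"mean": 1287.5, "max": 1500.0, "median": 1300.0}


def recompute(csv_text: str, column: str) -> dict[str, float]:
    """Recalculate the summary statistics directly from the raw rows."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return {
        "mean": statistics.mean(values),
        "max": max(values),
        "median": statistics.median(values),
    }


def find_mismatches(claimed, actual, rel_tol=0.005):
    """Map each disputed statistic to (claimed value, recomputed value)."""
    return {
        name: (value, actual[name])
        for name, value in claimed.items()
        if not math.isclose(value, actual[name], rel_tol=rel_tol)
    }


actual = recompute(CSV_TEXT, "revenue")
for name, (said, real) in find_mismatches(claimed, actual).items():
    print(f"{name}: summary says {said}, data says {real}")
```

The tolerance allows for rounding in the summary; anything outside it is worth a second look before the number goes into a report.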

Web Browsing: Can navigate websites, fill forms, extract information. Limited to approved sites in most agent frameworks.
Tool Use (via APIs): Robust. AI can call external services, chain API calls, and handle conditional logic.
Computer Control: Can operate desktop applications via screen reading and mouse/keyboard control. Still clumsy compared to a human.
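
The tool-use pattern described above (calling services, chaining calls, applying conditional logic) can be sketched in a few lines. This is a simplified illustration, not any particular agent framework: the two tools are hypothetical local functions standing in for external APIs, and the control flow is written by hand here, whereas in a real agent the model decides which tool to call next.

```python
from typing import Callable

# Hypothetical tools standing in for external API calls.
def get_stock_level(sku: str) -> int:
    inventory = {"widget-a": 3, "widget-b": 40}
    return inventory.get(sku, 0)


def create_reorder(sku: str, quantity: int) -> str:
    return f"reorder:{sku}:{quantity}"


# Registry of the only tools the agent is allowed to call.
TOOLS: dict[str, Callable] = {
    "get_stock_level": get_stock_level,
    "create_reorder": create_reorder,
}


def call_tool(name: str, **arguments):
    """Dispatch one tool call, rejecting anything outside the registry."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


def restock_if_low(sku: str, threshold: int = 10, target: int = 50) -> str:
    """Chain two calls: read the stock level, then reorder only if it is low."""
    level = call_tool("get_stock_level", sku=sku)
    if level >= threshold:
        return f"{sku}: stock ok ({level})"
    return call_tool("create_reorder", sku=sku, quantity=target - level)


print(restock_if_low("widget-a"))
print(restock_if_low("widget-b"))
```

Routing every call through a registry is one simple way to keep an agent limited to approved actions, in the same spirit as the approved-sites restriction on web browsing.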

What AI Struggles With	Why	Timeline to Improvement
Factual accuracy on niche topics	Training data gaps, hallucination tendency	Improving slowly — always verify
Real-time information	Knowledge cutoffs, delayed indexing	Largely solved by tool use + web search
Consistent long-form output	Attention drift in very long documents	2027 — architecture improvements
Physical world interaction	Robotics is hard	2028-2030 for consumer applications
Understanding your specific context	Limited memory and personalisation	2027 — persistent memory and user models
Creative originality	Trained on existing work, remixes patterns	Unclear — may be a fundamental limit
Ethical judgment	No lived experience or moral intuition	Open research question

GPT-5 or equivalent from OpenAI — likely a significant reasoning jump
Gemini 2.0 Ultra — Google's flagship with expanded multimodal capabilities
Claude 4 (speculative) — likely focused on reliability and tool use
Apple Intelligence 2.0 — deeper OS integration, more capable on-device models
Llama 4 from Meta: an open-weight frontier model

AI tutoring demonstrably improves student outcomes at scale
First credible claim of "artificial general intelligence" (definition TBD)
Major corporate restructuring driven by AI capability (10,000+ role shifts at a single company)

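Follow the benchmarks. MMLU (broad academic knowledge), HumanEval (Python code generation), GPQA (graduate-level science questions), and ARC-AGI (abstract pattern reasoning) report comparable, published scores rather than marketing claims.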
Try things yourself. Benchmarks don't capture whether a tool is useful for your work. Spend 30 minutes/month testing new capabilities.
Ignore the extremes. AI doomers and AI utopians are both wrong. The reality is in the middle — transformative but uneven.
Watch the tools, not the models. The model is the engine. The tool (agent framework, integration layer, UI) is what makes it useful.
Read this site. We track all of this so you don't have to.
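
When reading benchmark results, it helps to know how a score is computed. HumanEval, for example, is usually reported as pass@k: the probability that at least one of k sampled solutions passes the unit tests. A minimal sketch of the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), follows; the sample counts in the usage lines are made-up numbers for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled solutions, of which c passed the tests."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) counts the ways to draw k samples that all fail;
    # it is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Made-up example: 10 samples for one problem, 3 of them correct.
print(f"pass@1 = {pass_at_k(10, 3, 1):.3f}")
print(f"pass@5 = {pass_at_k(10, 3, 5):.3f}")
```

The same model can look much stronger at pass@5 than at pass@1, so check which k a headline number refers to before comparing two models.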

Last updated: March 2026