Jason Wei gave a talk at Stanford AI Club that’s been stuck in my head. Wei co-created o1 and Deep Research at OpenAI, popularized chain-of-thought at Google Brain, and now works at Meta’s Superintelligence Labs. For a young researcher, he’s been prolific.
Wei describes the “jagged edge” of AI capability: superhuman at competition math, struggling with rare-language translation. This unevenness confuses people: how can a system that solves IMO problems fail at translating Tlingit?
He introduces asymmetry of verification: AI’s ability to master a task is proportional to how easily that task can be verified.
But these aren’t two insights. The second explains the first. The jagged edge is what verification asymmetry looks like in the wild.
When verification is cheap (objective answers, fast feedback, consistent signal), you can throw compute at the problem. Generate millions of candidates, grade them instantly, feed the best back in, repeat. This is how AlphaEvolve beat decades of hand-tuned sorting algorithms: not insight, but brute-force search over a verifiable landscape.
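The shape of that loop is simple enough to sketch. This is a toy (the candidates are lists of integers and the verifier is a trivial distance check, nothing like AlphaEvolve's actual machinery), but the structure is the point:

```python
import random

def generate(parent):
    """Propose a mutated candidate (stand-in for a model sampling variants)."""
    child = parent[:]
    i = random.randrange(len(child))
    child[i] += random.choice([-1, 1])
    return child

def verify(candidate, target):
    """Cheap, objective, instant grading: negative distance to a known answer."""
    return -sum(abs(c - t) for c, t in zip(candidate, target))

def search(target, iterations=100_000):
    """Generate, grade, keep the best, repeat. Compute does the rest."""
    best = [0] * len(target)
    best_score = verify(best, target)
    for _ in range(iterations):
        candidate = generate(best)
        score = verify(candidate, target)
        if score > best_score:  # the feedback loop closes here, instantly
            best, best_score = candidate, score
    return best

print(search([3, 1, 4, 1, 5]))  # converges because every guess is gradable for free
```

The entire trick lives in `verify()` being instant and objective. Replace it with "did this essay land?" or "did this diet work?" and each pass through the loop costs a human judgment or three months of waiting.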
AI’s peaks are domains where this loop works: math, code, games. The answer is either right or wrong. You can check it in milliseconds. You can run the loop a million times overnight.
Its valleys are domains where verification is slow, expensive, or subjective. Writing a good essay requires human judgment that doesn’t scale. Knowing if a diet works takes months. Evaluating whether a date went well is noisy and personal. These aren’t hard problems; they’re hard to verify, in one of three ways:
- Temporal: feedback takes months/years (diets, investments, career advice)
- Subjective: requires human judgment that doesn’t scale (essays, dates, taste)
- Data-scarce: not enough examples to verify against (Tlingit translation, niche domains)
Wei offers five heuristics for predicting where AI will improve fastest: Is it digital? Easy for humans? Can AI exceed human limitations? Is data abundant? Is there a single objective metric?
Translation to top-50 languages hits all five: digital, easy for humans, data abundant, clear metric, and AI can process at scale humans can’t. Solved. Tlingit translation fails on one: data scarcity. That single broken link kills the verification loop, even though the task is otherwise favorable.
Wei’s heuristics for predicting AI progress all reduce to this: digital tasks verify faster than physical ones; data-abundant domains have more signal; objective metrics enable self-generated training data via RL. Each heuristic is asking the same question: can you close the feedback loop?
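You can almost write the heuristics down as a checklist. The field names and the two example scorings below are my own shorthand, not Wei's:

```python
HEURISTICS = [
    "digital",               # verification doesn't require the physical world
    "easy_for_humans",       # humans can label or check the output
    "exceeds_human_limits",  # scale/speed advantages are available to AI
    "data_abundant",         # enough examples to verify against
    "objective_metric",      # a single number can grade the result
]

def feedback_loop_closes(task):
    """All five questions collapse into one: can you verify cheaply, at scale,
    against abundant signal? A single missing link breaks the loop."""
    return all(task[h] for h in HEURISTICS)

top_50_translation = dict.fromkeys(HEURISTICS, True)
tlingit_translation = {**top_50_translation, "data_abundant": False}

print(feedback_loop_closes(top_50_translation))   # True  -> solved
print(feedback_loop_closes(tlingit_translation))  # False -> stuck
```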
The useful reframe: don’t ask whether a task is hard. Ask whether it’s verifiable. Competition math is hard but verifiable: solved. Tlingit translation is easy for a native speaker but unverifiable at scale: stuck. The task difficulty is roughly constant; the verification infrastructure varies.
You hear that evals are the unlock for AI applications. But that framing is too loose. It’s not evals generically; it’s verification of the solution, and more specifically, whether you have a real edge in how verification gets done.
If verification is trivial, there’s nothing defensible. Anyone can check if code runs.
The interesting question is where verification is possible but non-trivial, where building the infrastructure to verify is itself hard, and doing it well creates a moat. What makes verification infrastructure defensible? I’m still working on this.
What I do know: the real opportunity is turning valleys into peaks. And that doesn’t necessarily mean waiting for better models. It often means building better harnesses around existing models.
There’s a temporal dimension here that Wei doesn’t explore.
All his examples are fast-feedback: code runs or it doesn’t, math is right or wrong, the sorting algorithm is faster or slower. Verification happens in milliseconds.
But some tasks are verifiable eventually. Diets take months. Career advice takes years. Investments often take a decade or more.
Venture capital is an interesting case. The verification question is: was this investment correct? But “correct” means something like “returned 100x to the fund.” You don’t get that signal for 10+ years. The feedback loop is so slow that even humans struggle to learn from it. There’s a reason most VCs underperform and pattern-matching dominates over rigorous iteration.
AI will be bad at these long-horizon tasks for a long time. Not because they’re hard, but because the training signal arrives too late to be useful. Time-to-verification might matter as much as verification difficulty.
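A back-of-the-envelope comparison makes the point. The latencies below are my own rough guesses, not numbers from the talk:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Roughly how long until you know whether the answer was right (illustrative guesses)
time_to_verify = {
    "unit test":       0.05,                   # milliseconds
    "essay review":    30 * 60,                # half an hour of human judgment
    "diet outcome":    90 * 24 * 3600,         # ~3 months
    "venture outcome": 10 * SECONDS_PER_YEAR,  # ~a decade
}

for task, seconds in time_to_verify.items():
    loops_per_year = SECONDS_PER_YEAR / seconds
    print(f"{task:>15}: {loops_per_year:>14,.2f} feedback loops per year")
```

At a unit test's latency you get hundreds of millions of learning signals a year; at venture's latency you get a tenth of one. No amount of model quality fixes a loop that closes once a decade.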
One more implication: labor markets.
The standard framing is that routine tasks get automated and creative tasks are safe. Asymmetry of verification suggests a different cut. It’s not routine vs. creative; it’s verifiable vs. unverifiable.
Competition math is creative and hard. Automated. Social judgment is routine and easy. Unsolved.
I keep hearing through the grapevine that people are losing jobs to AI, not in the future, now. My hypothesis is that job displacement is underrated even given how much press it’s gotten. I think it’s already here, and that in 2026 we’re going to see a wave of layoffs as enterprises clear the backlog they held through the holidays.
The jobs going first won’t be “routine” in the traditional sense. They’ll be the ones where output is easy to verify. Performance marketing. Bookkeeping. Insurance claims processing. Slide deck formatting (*cough* bankers and consultants *cough*). Copywriting with clear success metrics. Calendar and event coordinators.
The jobs that stick around will be the ones where verification is expensive, subjective, or slow, where you need a human in the loop not because the task is hard, but because checking the work is.
Asymmetry of verification is a lens. Once you see it, you start asking different questions.
Not “can AI do this?” but “can AI verify this?”
Not “is this task hard?” but “how do we know if it was done right?”
The peaks will keep getting higher. The valleys will stay valleys until someone figures out how to measure them. If I were building, I’d seek opportunity in the valleys.

William Henry Holmes, *View from Point Sublime, Looking South*, 1882.