A calmer model

There's a thing benchmarks never capture: how a model behaves when you hand it too much.

I fed Claude 2 a forty-page contract last week. The whole thing, no chunking, no clever splitting. It has a hundred thousand tokens of context to work with, and it just read it. Held the early clauses in mind when it got to the late ones. Pointed at a contradiction on page thirty-one that referred back to a definition on page four. I'd been building elaborate machinery for months to fake exactly that, feeding documents in slices and praying the seams held.

So that's gone now. Good.

But the part I keep turning over isn't the context window. It's the temperament. This model from Anthropic has a different register — steadier, less eager to perform, more willing to say it isn't sure. After months living inside one lab's voice, switching feels like talking to a colleague who measures words instead of one who finishes your sentences for you.

I didn't expect to care about that. I came for the hundred thousand tokens. I'm staying because reading its answers is calmer, and calm matters when you're trusting the thing with a real document and a real decision.

We're at the point where there's more than one serious model, and they have personalities. You pick partly on what they can do and partly on how it feels to work beside them all day. Today open weights landed too: Llama 2, out in the world for anyone to run. The field is widening fast.

A year ago there was one place to paste your text. Now there's a rotation, and I find myself choosing by feel. The number on the leaderboard tells me less than the hour I spend in the conversation.