---
title: "GPT-4 changes the math"
date: "2023-03-15"
summary: "I ran GPT-4 through the private tests every model had failed, and it cleared the ones I thought were five years off — so I'm raising what I dare to build"
tags: ["gpt-4","ai","building","judrobot"]
---

Yesterday I gave it the tests I keep in a private file. The ones I don't show anyone.

For years I've kept a short list of problems the models always failed. A tangled legal clause I lifted from JudRobot, the kind of nested conditional reasoning that broke every system I ever tried to build on top of. A messy data-extraction task from the mortgage work. A piece of code with a bug three layers deep that you only catch if you actually understand what it's supposed to do, not just what it says.

GPT-3.5 in November stumbled on most of them. Close enough to be exciting, wrong enough to keep me honest.

I ran GPT-4 through the whole list last night. It cleared them. Not all perfectly, but it cleared the ones I was sure no machine would touch for another five years. The legal clause it parsed cleanly. The buried bug it found and explained back to me like a patient colleague.

I sat there longer than I want to admit.

There's a paper going around now, the title has the word "sparks" in it, and people are arguing about whether that's overheated. Maybe it is. But the argument I care about isn't philosophical. It's practical. Four months ago I was sketching what might be possible. This morning I'm deleting half of that sketch because it's already trivial.

The math changes. When the cost of doing a hard thing drops this fast, you stop asking what the model can do and start asking what you'd dare to attempt if it kept improving at this pace.

So I'm raising my ambition. There are things I gave up on in 2018, not because the idea was wrong but because nothing underneath it could carry the weight. I'm pulling those drafts back out, reading them with new eyes.
