OpenAI published a study that made me uncomfortable.
They tested models on real engineering tasks from Expensify's codebase (the SWE-Lancer benchmark): actual bugs and features worth $50 to $32,000 each.
Claude 3.5 Sonnet solved 26% of them.
My first thought was:
This. Is. Terrifying.
But here’s the weirder part:
When I looked at the tasks AI solved and the ones it failed:
I couldn’t find a pattern.
Some looked simple but broke the moment you needed to understand how distant parts of the system connected.
Others looked complex but were just pattern-matching in disguise.
It reminded me of something Nassim Taleb wrote about experts:
“We’re terrible at knowing the boundaries of our own knowledge.”
And nowhere is that more obvious than in engineering.
We look at a problem and think:
“Obviously AI can’t handle this kind of complexity.”
Then it solves it in minutes.
Or we say:
“This should be easy.”
And it writes beautiful code that crashes in ways no human would ever think to crash it.
Here's the pattern I see in engineers who've actually gotten ahead with AI:
They don't ask it to write code.
They ask it to think with them.
So if “think with me” beats “code for me,” how do you use it day to day?
When you're stuck, don't ask for answers. Ask for better questions. "I think this is a database issue. What else could cause these symptoms?" Half the time, you discover you're fixing the wrong problem entirely.
When the plan looks obvious, switch hats. “As a tech lead, what will fail first at 10× load? As security, where does untrusted input sneak in?” Each role reveals blind spots you can't see when you're stuck in your own perspective.
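Here's a minimal sketch of that move, assuming the OpenAI Python SDK and an API key in your environment. The model name, the plan, and the three hats are placeholders I picked for illustration, not anything from the study:

```python
# Run the same plan past several skeptical personas and collect their objections.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()
plan = "We'll cache session tokens in Redis and fall back to the DB on a miss."

# Each hat asks the question you can't see from inside your own perspective.
hats = {
    "tech lead": "What will fail first at 10x load?",
    "security engineer": "Where does untrusted input sneak in?",
    "on-call engineer": "What will page me at 3 a.m.?",
}

for role, question in hats.items():
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you have access to
        messages=[
            {"role": "system", "content": f"You are a skeptical {role} reviewing an engineering plan."},
            {"role": "user", "content": f"Plan: {plan}\n\n{question}"},
        ],
    )
    print(f"--- {role} ---\n{resp.choices[0].message.content}\n")
```

Three cheap API calls, three blind spots you didn't have to discover in production.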
Before you code, write the test you’re afraid of. You have to define what “failure” looks like before you build. And if your solution can’t pass that test? Better to find out now, not 3 days and 500 lines later.
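Here's a minimal pytest sketch of what that looks like. split_expense is a hypothetical helper I made up for illustration; the point is that the test pins down the failure condition, "no cent ever appears or vanishes", before the real implementation exists:

```python
# "The test you're afraid of": state the scary property first, then build toward it.
import pytest

def split_expense(total_cents: int, people: int) -> list[int]:
    # Placeholder implementation: everyone gets the floor of the split,
    # then the leftover cents are handed out one at a time so the sum stays exact.
    base, remainder = divmod(total_cents, people)
    return [base + (1 if i < remainder else 0) for i in range(people)]

@pytest.mark.parametrize("total_cents", [0, 1, 99, 100, 31337])
@pytest.mark.parametrize("people", [1, 2, 3, 7])
def test_no_cent_is_ever_lost(total_cents, people):
    shares = split_expense(total_cents, people)
    # Failure, defined up front: the shares don't add back up to the total.
    assert sum(shares) == total_cents
    # And the split stays fair: no share is more than one cent above another.
    assert max(shares) - min(shares) <= 1
```

One `pytest` run checks the property across 20 combinations. If your approach can't keep the cents honest, you find out in seconds.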
AI solves a quarter of real-world engineering tasks, and no one can predict which ones.
That uncertainty makes people nervous. Rightfully so.
But it also reveals something useful.
It reminds us that the hard part isn’t writing code.
It’s thinking beyond it.
And the engineers who win won’t be the ones with the perfect prompts.
They’ll be the ones who ask better questions.
Who switch perspectives.
Who stay curious.
Read next: How to Learn Anything: Science-based Tools No One Teaches Us