The estimation problem nobody warned us about

Software estimation was always a bit of a guess. You’ve been there. A “quick fix” swallows three days whole. A supposedly complex feature wraps up before lunch. We call it agile. Scrum never promised precision anyway. It promised a framework for operating inside uncertainty, which is just a polite way of saying we have no idea, but let’s be structured about it. Then AI arrived and made things weirder.

So what exactly are we estimating?

A while back I was building a .NET feature: API integrations, validation, unit tests, Azure config. My gut said two, maybe three days. Out of habit I opened GitHub Copilot, described what I needed, and watched it start building before I finished typing.

Controllers. Services. Validation. Tests. Docs.

Not perfect. But not nothing either. My first thought was that I would have spent most of the week writing all of that by hand. My second thought was that I had no idea what half of it actually did. The task didn’t shrink to fifteen minutes. But it wasn’t three days either. It became something that doesn’t have a name yet.

The old logic was simple: bigger tasks take longer. That held up for decades because writing code was the expensive part. It isn’t anymore.

AI can produce Bicep templates, REST APIs, database scripts, and test suites faster than you can read them. Sometimes what comes out is solid. Sometimes it’s subtly broken in ways that won’t surface until you’re deep in a production incident wondering what went wrong.

The trouble is you usually can’t tell which one you have until you’re already inside it. The uncertainty didn’t leave. It just found a new address.

Writing got cheap. Understanding got expensive.

This is the real shift nobody puts on a slide deck. AI can generate a thousand lines of code in seconds. What it cannot do is tell you whether those lines belong in your architecture, respect your compliance constraints, or hold up six months from now when the product changes direction. That part is still yours. In a lot of cases, that’s where most of the actual work lives now.

Velocity is measuring something stranger

Teams used velocity as a rough compass. Last sprint was 40 points, so this sprint will probably land close to that. Imprecise, sure, but useful enough. AI scrambles that rhythm. One developer ships five stories because the AI did the heavy lifting. Another ships two because they spent the week untangling what the AI confidently got wrong. Who did more for the product? Genuinely unclear. Velocity is now picking up signals it was never designed to read: judgment calls, experience, risk tolerance, and how lucky the team got with the generated output that week.

Sprint planning has a new question

A few years ago nobody asked this. Now it comes up almost every time: can AI handle this one?

That question reshapes the whole conversation. Something intimidating might turn out to be a quick prompt away. Something that looks trivial might actually demand careful, painstaking review of code you didn’t write and can’t fully vouch for. The uncertainty moved from building it to trusting it.

Done means something different now

When you write every line yourself, you carry the context with you almost automatically. You know why things are shaped the way they are.

AI-generated code doesn’t come with that. It arrives confident, plausible, and context-free. So “does it work?” isn’t enough anymore. The real question is “does it make sense here?” Does it fit the architecture? Does it handle the edge cases your business actually cares about? Are those tests genuinely testing anything, or just performing the idea of tests? Treat AI like a talented contractor who showed up without attending a single design meeting. Capable. Useful. Still needs a real review.
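To make the “performing the idea of tests” point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the discount rule, the function names, and the deliberately wrong implementation are assumptions, not code from the project described above. It shows one check that passes for almost any implementation and one that pins down the behavior the business actually cares about.

```python
from typing import Callable

DiscountFn = Callable[[float, float], float]


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business rule: return the price after a 0-100% discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def broken_discount(price: float, percent: float) -> float:
    """Plausible-looking but wrong: returns the discount amount, not the new price."""
    return round(price * percent / 100, 2)


def performative_test(fn: DiscountFn) -> bool:
    # Only checks that a float comes back, so nearly any implementation passes.
    return isinstance(fn(100.0, 10.0), float)


def meaningful_test(fn: DiscountFn) -> bool:
    # Pins expected values, including the boundaries of the valid range.
    if fn(100.0, 10.0) != 90.0:
        return False
    if fn(100.0, 0.0) != 100.0:
        return False
    if fn(100.0, 100.0) != 0.0:
        return False
    # An out-of-range percentage must be rejected, not silently computed.
    try:
        fn(100.0, 150.0)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    for fn in (apply_discount, broken_discount):
        print(fn.__name__, performative_test(fn), meaningful_test(fn))
```

Both implementations sail through the first check. Only the second check notices that one of them is wrong, and that difference is exactly what a review of generated tests has to look for.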

Back to what agile was always supposed to be

Here’s the strange part: AI might actually push teams back toward the point of all this. Outcomes over output. Did the problem get solved? Is the product better? Nobody using your software cares about story points. They care whether the thing works. When effort stops being a reliable signal, that becomes the only metric that matters.

Scrum isn’t becoming obsolete. It was built for uncertainty, and uncertainty didn’t go anywhere. The teams that thrive won’t be the ones generating the most code. They’ll be the ones making sharper decisions about which code should exist at all. Story points aren’t getting less useful because Scrum is broken. They’re getting less useful because writing code stopped being the hard part.