Sitting in Departure Lounge C at Edinburgh right now, waiting for a flight to Barcelona, and I just scrolled through a new listing on here that made me think about how spectacularly wrong I got things eighteen months ago.
So context: I've been doing backend work in EdTech for about nine years. Universities, mostly. The standard gig is helping institutions manage student data, submission systems, that sort of thing. About two years back, one of my biggest clients (a Russell Group university, won't name them) came to me with a problem. They had roughly 4,000 essays per term across humanities departments. Professors were drowning. The marking load was genuinely unsustainable.
My immediate thought? AI grading. Automated essay assessment. It seemed obvious. We'd use Claude to analyse essays against rubrics, assign scores, maybe flag things for human review. I was genuinely excited about this. Spent about four months building it. Built it well, too. Proper prompt engineering, context windows for full essay text, integration with their submission system, feedback generation. The lot.
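If anyone's curious what that looked like under the hood, here's a minimal sketch of the kind of rubric-scoring call that sat at the centre of it. I'm assuming the Anthropic Python SDK here; the model name, rubric format, and JSON parsing are illustrative stand-ins, not the actual production code.

```python
# Hypothetical sketch of the grading pipeline's core call.
# Model name, rubric format, and response parsing are illustrative choices.
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

GRADING_PROMPT = """You are assisting a university marker.
Score the essay below against each rubric criterion from 0-10,
then suggest a final mark and two sentences of feedback.
Respond as JSON: {{"scores": {{...}}, "mark": int, "feedback": str}}

Rubric:
{rubric}

Essay:
{essay}"""


def grade_essay(essay_text: str, rubric: str) -> dict:
    """Return a suggested mark plus feedback for one essay."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": GRADING_PROMPT.format(rubric=rubric, essay=essay_text),
        }],
    )
    # Production code would validate this instead of trusting raw output.
    return json.loads(response.content[0].text)
```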
The system worked, technically. Essays came in, got scored, human teachers got a grade suggestion plus feedback. Accuracy was solid, maybe 85-90% alignment with what experienced markers would give.
But it absolutely tanked with the faculty.
The problem wasn't technical. It was human. Professors hated it. Not because the grades were bad, but because they felt like the system was doing their job. They weren't interested in "AI helps you mark faster." They were interested in actually teaching. One senior lecturer told me: "If I'm not reading my students' work carefully, what am I actually doing here?"
That hit different, honestly. I'd been so focused on solving the throughput problem that I'd completely missed the fact that the problem they actually cared about wasn't throughput at all.
So I went back to the drawing board. Spent another month talking to departments. Actually listening this time. And what came out was completely different.
They wanted something for academic integrity. They wanted to know where students might be using LLMs to write essays. Not to punish, necessarily, but to understand. To adjust how they teach. Some courses needed to change their assessment structure. Others just needed to know the baseline. One department actually decided that if students were going to use AI anyway, they'd rather teach critical engagement with it than pretend it didn't exist.
So I built that instead. AI detection, yeah, but wrapped in pedagogy. The tool helps instructors understand how their students are using language models, flags suspiciously fluent passages, but also suggests interventions. "This paragraph looks LLM-generated. Consider asking the student to explain their thinking in person." That sort of thing.
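To give a concrete sense of the shape: every flag came paired with a suggested next step rather than a bare verdict. Something like this toy sketch, where the thresholds, the wording, and the detector itself are all hypothetical:

```python
# Illustrative shape only: a flagged passage gets paired with a suggested
# intervention rather than a bare "AI-generated" label. The detector that
# produces the likelihood score is assumed and out of frame here.
from dataclasses import dataclass


@dataclass
class FlaggedPassage:
    text: str
    likelihood: float  # detector's 0-1 estimate that the passage is LLM-written


def suggest_intervention(passage: FlaggedPassage) -> str:
    """Turn a detection score into a pedagogical next step, not a verdict."""
    if passage.likelihood > 0.9:
        return ("Strong signal. Consider asking the student to walk you "
                "through their argument in person before drawing conclusions.")
    if passage.likelihood > 0.6:
        return ("Moderate signal. Worth comparing against the student's "
                "earlier writing, or setting a short in-class reflection.")
    return "Weak signal. Probably noise; no action needed."
```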
It's not sexy. It doesn't solve grading at scale. But it actually got used. Teachers liked it. Students liked it because it felt fair. And it sparked some genuinely interesting conversations about what assessment means when tools like this exist.
The revenue was never going to be huge from that one institution, but it led to conversations with other universities. Some wanted to build on it. Others wanted to teach AI literacy alongside it. One wanted to integrate it with their learning management system.
So here's what I'm carrying with me right now, sitting in this airport lounge: the best thing I built started as the wrong thing. I was so convinced I knew what the problem was that I built a solution without really listening. And it took an actual failure to make me stop and ask questions.
I've seen this pattern play out a few times now in EdTech. Someone spots an efficiency problem, builds automation, and watches it fail because they solved the wrong problem. Teachers don't want to be more efficient at grading. They want to be better teachers. Students don't want grades faster. They want grades that mean something.
Anyway. The reason I'm thinking about this is that I just saw a listing on here for an AI essay assessment tool. Proper slick implementation. Good documentation. I'm sure it works technically. But I'm also wondering if the builder talked to any actual professors before they shipped it.
If you're building in EdTech (or honestly any space where humans are the end users, not just customers), maybe the hardest part isn't the engineering. It's shutting up and listening to what people actually need instead of what you think they should want.
Right then, boarding call just popped up. Have a good one.