Grading Questions, not Answers, in the age of AI

Paul Chou
3 min read · May 4, 2024

Things shift in your educational adventure depending on how far you take it (grade school, college, grad school, postdoc, etc.).

At the earliest stages you are simply graded on your answers to questions a teacher poses to you. What is 1+1? You’d better answer 2.

As you go deeper into a topic, the answer isn’t the sole issue anymore; it’s knowing how to pose the right question, the one nobody has thought to ask before.

Why do I bring this up in the context of AI? Because all of these courses on “prompting” for tools such as GPT-4 are essentially training people to ask the right questions.

When you’re young the question is forced on you by your teacher. When you’re older, your job is to figure out the new questions.

What’s the practical point of this, so this post isn’t pointless? Using GPT-4 over time, I could tell whether I was having an above-average thinking day or a below-average one. How? By how long it took GPT-4 to “think” and respond to my prompts.

As with any computation, thinking is hard and calorie- and energy-expensive. This, I realized later, is why at MIT finals our instructors would pass around sugar cookies: quickly absorbed carbs to fuel three hours of overheating our heads. It’s no different from any CPU or GPU, and it’s why those chips need such dramatic cooling when they are doing hard work.

It takes a lot of energy to think and compute. If you are typing text on a computer, chances are it is not giving off much heat, because the CPU power required to do that is minimal. If you are playing an intense, high-resolution video game, the machine starts burning like a stove because the processor is working flat out.

This is the final idea. I can imagine a world where we “grade” students not just on their answers, but on the quality of their questions. The former is easy, since the answer is known ahead of time and objectively verifiable. The latter is harder and can be accused of being too subjective to judge.

But I suspect that if you analyzed a student’s questions (prompts) with an AI, along with how long the AI took to respond, you could actually get a sense of an important aspect of their intelligence. If I ask GPT-4 when George Washington was born, a quick response is assured. If I ask it how we would model the dynamics of a three-body system of black holes with various spins, now that gravitational waves have been directly detected, that’s a long wait, and likely a lot of energy being burnt in a data center somewhere.
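As a minimal sketch of what this “grading by latency” could look like: the hypothetical `ask_model` function below is a stand-in for a real GPT-4 API call (here it just simulates longer prompts taking longer), and `grade_question` times the round trip with a wall-clock timer. This is an illustration of the idea, not a real implementation.

```python
import time


def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real GPT-4 API call. To keep this
    # sketch self-contained, we simulate "harder questions take longer"
    # by sleeping proportionally to the prompt length.
    simulated_difficulty = len(prompt) / 1000.0
    time.sleep(simulated_difficulty)
    return "…"


def grade_question(prompt: str) -> float:
    """Return the seconds the model spent 'thinking' about a prompt."""
    start = time.perf_counter()
    ask_model(prompt)
    return time.perf_counter() - start


easy = grade_question("When was George Washington born?")
hard = grade_question(
    "How would we model the dynamics of a three-body system of black "
    "holes with various spins, given what we now know from the direct "
    "detection of gravitational waves?"
)
print(f"easy question: {easy:.3f}s, hard question: {hard:.3f}s")
```

In practice you would swap the stub for a real API client and the latency signal would be noisy (server load, network, token count), so any real grading scheme would need many samples per student, but the measurement loop itself is this simple.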

This would be a fundamentally new type of objective grading.

So now you have two orthogonal grades, like the old-fashioned SAT, which split evaluation into how good you are at verbal and how good you are at math.

Now you can add a dimension: how good are a student’s answers, and how good are their unique questions? Is this flipping the script for “objectively” grading the raw intellectual horsepower of a student?

…analyzing…analyzing…analyzing…
