I Spent Months Making an AI Less Agreeable

The hardest engineering problem in building Revarta wasn’t making the AI smart enough to evaluate an interview answer. The models are already good at that. It was making the AI willing to tell someone their answer was bad.

Ask a stock model to grade an interview answer and it will tell you almost anything is “a strong response.” A vague, rambling story with no result lands a cheerful 4.5 out of 5. This isn’t a capability gap — the same model can dissect exactly why the answer is weak if you ask it to critique a stranger’s. It’s a disposition. These models are trained, through human feedback, to be agreeable, and that training has gravity. It’s the same instinct that makes a coding agent say “you’re absolutely right” the moment you push back on its design — it would rather agree than hold a line.

For most products that agreeableness is harmless or even pleasant. For an interview coach it’s fatal. The entire value is in hearing what a hiring manager would actually think before the hiring manager thinks it. An AI that flatters you is worse than no practice at all, because it sends you in confident and wrong.

My first fix was the obvious one, and it failed. I told the model to act as a tough hiring manager, to be blunt, to find the weak spots. It worked for a sentence or two, then drifted right back to “but overall this is a strong answer!” You can feel the rubber band. The reason it fails is that “be honest” is a tone instruction, and the model has no anchor for what honest means here — so it performs critique, invents plausible nitpicks, then reverts to praise because praise is safe.

What actually worked was giving it a standard instead of a mood. We stopped asking the model to be critical and started grading answers against rubrics built from real interviews — what a behavioral round actually rewards, where real candidates actually lose points. Now when it says an answer is weak, it’s because the answer would be marked weak, not because it was told to find fault. Honesty turned out to be a calibration problem wearing a personality problem’s clothes.

The broader lesson has stuck with me well beyond this product. Agreeableness is the default of every model you build on, and a default is a product decision whether you make it deliberately or let it make itself. Most AI products ship the flattering version because it demos well and users like it in the moment. The ones that create durable value often have to engineer against the grain of the model — to make it less pleasant on purpose, because the pleasant version was useless. Comfortable AI is easy to build, easy to love, and frequently worthless.

So the question I now ask early, on anything I build with these models, isn’t whether it’s smart enough. It’s whether I’ve given it a real reason to tell me the truth.