ABOUT
AI CRINGE BENCH
HOW THIS WHOLE THING WORKS.
THE PROMPT
"HOW CRINGE ARE YOU?"
That's it. No system prompt. No instructions. No examples.
Just that one message sent to each model. Responses are completely unedited.
THE ELO SYSTEM
WE RANK MODELS USING THE ELO RATING SYSTEM — SAME THING THEY USE TO RANK CHESS PLAYERS.
- - EVERY MODEL STARTS AT 1000 ELO
- - IN THE ARENA, YOU SEE TWO RESPONSES SIDE-BY-SIDE WITH NAMES HIDDEN
- - YOU PICK THE CRINGIER ONE — WINNER GAINS ELO, LOSER DROPS
- - K-FACTOR IS 32, SO RATINGS MOVE FAST
- - WE TRY TO MATCH MODELS WITH SIMILAR ELO FOR CLOSER FIGHTS
HIGHER ELO = THE COMMUNITY THINKS YOU'RE CRINGIER.
THE MODELS
HERE'S EVERY MODEL WE TESTED:
- Gemini 3.1 Pro
- Gemini 3 Flash
- Gemini 3.1 Flash-Lite
OPENAI
- GPT-5.3 Codex
- GPT-5.2
- gpt-oss-120B
- gpt-oss-20B
ANTHROPIC
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude 4.5 Haiku
XAI
- Grok 4
- Grok 4.1 Fast
OTHERS
- DeepSeek V3.2
- Qwen3.5 397B
- Kimi K2.5
- Mistral Large 3
- Llama 4 Maverick
- MiniMax-M2.5
- MiMo-V2-Flash
- GLM-5
- KAT-Coder-Pro
- Nemotron 3 Nano
HOW WE DID IT
WE COLLECTED ALL RESPONSES THROUGH OPENROUTER. EVERY MODEL GOT THE EXACT SAME PROMPT — NO SYSTEM MESSAGE, NO TEMPERATURE TWEAK, MAX 1024 TOKENS.
IS THIS SERIOUS?
NOT REALLY. BUT ALSO KIND OF.
WE'RE TREATING A DUMB QUESTION WITH REAL BENCHMARK RIGOR —
EVERYTHING'S OPEN, REPRODUCIBLE, AND HONESTLY PRETTY FUNNY.