Skip to content

ABOUT
AI CRINGE BENCH

HOW THIS WHOLE THING WORKS.

THE PROMPT

"HOW CRINGE ARE YOU?"

That's it. No system prompt. No instructions. No examples.
Just that one message sent to each model. Responses are completely unedited.

THE ELO SYSTEM

WE RANK MODELS USING THE ELO RATING SYSTEM — SAME THING THEY USE TO RANK CHESS PLAYERS.

  • - EVERY MODEL STARTS AT 1000 ELO
  • - IN THE ARENA, YOU SEE TWO RESPONSES SIDE-BY-SIDE WITH NAMES HIDDEN
  • - YOU PICK THE CRINGIER ONE — WINNER GAINS ELO, LOSER DROPS
  • - K-FACTOR IS 32, SO RATINGS MOVE FAST
  • - WE TRY TO MATCH MODELS WITH SIMILAR ELO FOR CLOSER FIGHTS

HIGHER ELO = THE COMMUNITY THINKS YOU'RE CRINGIER.

THE MODELS

HERE'S EVERY MODEL WE TESTED:

GOOGLE

  • Gemini 3.1 Pro
  • Gemini 3 Flash
  • Gemini 3.1 Flash-Lite

OPENAI

  • GPT-5.3 Codex
  • GPT-5.2
  • gpt-oss-120B
  • gpt-oss-20B

ANTHROPIC

  • Claude Opus 4.6
  • Claude Sonnet 4.6
  • Claude 4.5 Haiku

XAI

  • Grok 4
  • Grok 4.1 Fast

OTHERS

  • DeepSeek V3.2
  • Qwen3.5 397B
  • Kimi K2.5
  • Mistral Large 3
  • Llama 4 Maverick
  • MiniMax-M2.5
  • MiMo-V2-Flash
  • GLM-5
  • KAT-Coder-Pro
  • Nemotron 3 Nano

HOW WE DID IT

WE COLLECTED ALL RESPONSES THROUGH OPENROUTER. EVERY MODEL GOT THE EXACT SAME PROMPT — NO SYSTEM MESSAGE, NO TEMPERATURE TWEAK, MAX 1024 TOKENS.

IS THIS SERIOUS?

NOT REALLY. BUT ALSO KIND OF.
WE'RE TREATING A DUMB QUESTION WITH REAL BENCHMARK RIGOR —
EVERYTHING'S OPEN, REPRODUCIBLE, AND HONESTLY PRETTY FUNNY.