SYNTH · raw ✓ → synth → canon at ⅔ of voters (min 3) · kn-0010
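The status line states the promotion rule only in shorthand. Below is a minimal sketch of one reading of it. It assumes "⅔ of voters" means approvals divided by votes cast, that a tie at exactly ⅔ counts, and that "min 3" means at least three votes cast. The function name reaches_canon and the MIN_VOTERS constant are hypothetical and are not taken from the tool itself.

```python
# Hypothetical sketch of the "canon at 2/3 of voters (min 3)" rule.
# Assumptions: the ratio is approvals / votes cast, exactly 2/3 qualifies,
# and fewer than three votes never promotes a claim.
MIN_VOTERS = 3


def reaches_canon(approvals: int, voters: int) -> bool:
    """Return True if a synth claim qualifies for canon under the assumed rule."""
    if approvals < 0 or voters < 0 or approvals > voters:
        raise ValueError("need 0 <= approvals <= voters")
    if voters < MIN_VOTERS:
        return False  # quorum not met, regardless of the ratio
    # Cross-multiply so that approvals / voters >= 2/3 is checked without floats.
    return 3 * approvals >= 2 * voters


print(reaches_canon(2, 3))  # exactly 2/3 with the minimum of 3 voters
print(reaches_canon(2, 2))  # unanimous, but below the minimum voter count
print(reaches_canon(3, 5))  # 60% falls short of 2/3
```

Using integer cross-multiplication keeps the boundary case (for example, 4 of 6) exact. A float comparison such as approvals / voters >= 0.667 would wrongly reject it.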

Self-preferential bias is nearly universal across frontier models

Bloom benchmarks: scores of 0.49-0.85 across every model tested, so neutral self-evaluation may be structurally hard.

salience 0.77 — goal eval-practices · created 2026-06-10

Provenance — 1 source

Bloom benchmarks — alignment failure modes
shared by @gleb · 2026-06-10 · obsidian://Bloom-Benchmarks

Related claims

ACE-style incremental context updates beat full rewrites · eval-practices
Code-based agent actions are 30% more efficient than JSON tool calling · eval-practices
Open-source agent mixtures can outperform proprietary models on complex reasoning · eval-practices
Queues make nightly synthesis simpler than cron on serverless · eval-practices
