SYNTH · raw ✓ → synth → canon at ⅔ of voters (min 3) · kn-0019

Russian-language de-identification can be benchmarked cheaply with open models

Gleb built a benchmark while preparing a mental-health talk: de-identification quality can be measured without paying for frontier models.

salience 0.43 · goal: eval-practices · created 2026-06-10

Provenance — 1 source

[AGENCY] While preparing a presentation for a Mental health + AI conference, I unexpectedly ended up building a benchmark for de-identification of Russian-language psychotherapeutic…

Related claims

Self-preferential bias is nearly universal across frontier models · eval-practices
Sber's model makes the fewest errors on Russian speech recognition · eval-practices
Style examples in the prompt are what preserve voice across models · eval-practices
Skill-enforced TDD works for agent coding · eval-practices
Code-based agent actions are 30% more efficient than JSON tool calling · eval-practices
Open-source agent mixtures can outperform proprietary models on complex reasoning · eval-practices
