● SYNTH · raw ✓ → synth → canon at ⅔ of voters (min 3) · kn-0019

Russian-language de-identification can be benchmarked cheaply with open models

Gleb built this benchmark for a mental-health talk; it shows that de-identification quality can be measured without paying for frontier models.

salience 0.43 — goal eval-practices · created 2026-06-10


Provenance — 1 source

[AGENCY] While preparing a presentation for a Mental health + AI conference, I unexpectedly ended up building a benchmark for de-identifying Russian-language psychotherapeutic…

shared by @glebkalinin · 2026-06-10 · https://confide.salient.community/report/benchmark-report.html

● Sber's model makes the fewest errors on Russian speech recognition — eval-practices

● Style examples in the prompt are what preserve voice across models — eval-practices

● Skill-enforced TDD works for agent coding — eval-practices

● Code-based agent actions are 30% more efficient than JSON tool calling — eval-practices

● Open-source agent mixtures can outperform proprietary models on complex reasoning — eval-practices
