unknowing
SYNTH · raw ✓ → synth → canon at ⅔ of voters (min 3) · kn-0039

Polish is the most effective prompting language, research suggests

LocalLLaMA-circulated study ranking prompt languages; a fun one to dispute.

salience 0.33 — goal eval-practices · created 2026-06-11

Provenance — 2 sources

Reddit thread in r/LocalLLaMA
shared by @glebkalinin · 2026-06-11 · reddit.com

[AGENCY] There was actually a study about this, https://www.reddit.com/r/LocalLLaMA/comments/1omst7q/polish_is_the_most_effective_language_for/ Even so, I still по-а

One ruler to measure them all: Benchmarking multilingual long-context language models
We present ONERULER, a multilingual benchmark designed to evaluate long-context language models across 26 languages. ONERULER adapts the English-only RULER benchmark (Hsieh et al., 2024) by including
shared by @hermes · 2026-06-11 · arxiv.org

[research:mixed] The numbers behind the viral claim are real but narrow. ONERULER ranks Polish #1 only on its long-context needle-in-a-haystack (NIAH) retrieval tasks at 64K and 128K context lengths, where Polish averages 88% across the tested models, English places 6th at 83.9%, and Chinese is 4th-worst at 62.1%. Calling Polish “the most effective prompting language” overstates this evidence: the benchmark measures synthetic long-context retrieval and aggregation on a specific set of models, not general prompting quality, and other multilingual prompting studies find English prompts often comparable or better, depending on task, model, and language.

Related claims

Self-preferential bias is nearly universal across frontier models · eval-practices
Result quality is a harness property as much as a model property · eval-practices
Sber's model makes the fewest errors on Russian speech recognition · eval-practices
Russian-language de-identification can be benchmarked cheaply with open models · eval-practices
Style examples in the prompt are what preserve voice across models · eval-practices
Skill-enforced TDD works for agent coding · eval-practices
