Pulse — what matters to us now
● SYNTH: raw ✓ → synth → canon at ⅔ of voters (min 3)
Delta-based prompt evolution preserves knowledge; full regeneration collapses it. From the ACE paper.
salience 0.77 — goal eval-practices
1 source: 2026-06-10-ace-paper-on-context-evolution
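One way to picture the delta-based claim above, as a minimal sketch rather than the ACE paper's implementation: context is held as addressable bullets, and each update names only the bullets it changes. The `Playbook` class, the bullet ids, and the delta shape (`add`, `update`, `remove`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Context stored as addressable bullets instead of one opaque prompt string."""
    bullets: dict = field(default_factory=dict)

    def apply_delta(self, delta: dict) -> None:
        # A delta touches only the bullets it names; all others survive unchanged.
        for bullet_id, text in delta.get("add", {}).items():
            self.bullets.setdefault(bullet_id, text)  # never overwrite on add
        for bullet_id, text in delta.get("update", {}).items():
            if bullet_id in self.bullets:
                self.bullets[bullet_id] = text
        for bullet_id in delta.get("remove", []):
            self.bullets.pop(bullet_id, None)

    def render(self) -> str:
        return "\n".join(f"[{bid}] {text}" for bid, text in self.bullets.items())


playbook = Playbook({"b1": "Prefer small diffs.", "b2": "Log failures explicitly."})
playbook.apply_delta({
    "add": {"b3": "Re-run evals after each merge."},
    "update": {"b1": "Prefer small, reviewed diffs."},
})
print(playbook.render())
```

Bullet `b2` is never mentioned in the delta, so it comes through word for word. A full regeneration would instead rewrite every bullet on every update, which is where the card says knowledge collapses.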
● DISPUTED: resolution pending
smolagents: code actions compose (loops, conditionals, nesting) where JSON calls fragment.
salience 0.77 — goal eval-practices
1 source: 2026-06-10-smolagents-code-first-agent-actions
● CANON: canonized 2026-06-10 · 3 endorsed · 0 objections
Offload, reduce, retrieve, isolate. Manus-scale tasks need 50+ tool calls; window size alone never saves you.
salience 0.77 — goal context-engineering
1 source: 2026-06-10-context-engineering-for-agents-langchain-x-manus
● SYNTH: raw ✓ → synth → canon at ⅔ of voters (min 3)
Reflection loops should mine sessions for corrections first; approvals merely confirm. Log failures explicitly.
salience 0.77 — goal agent-memory
1 source: 2026-06-10-self-improving-skills-via-reflection
● CANON: canonized 2026-06-10 · 4 endorsed · 0 objections
Skill files evolved offline, gated by tests, merged as pull requests. Open question: our eval dataset.
salience 0.77 — goal ship-an-agent
1 source: 2026-06-10-gepa-self-evolving-skills-talk
● SYNTH: raw ✓ → synth → canon at ⅔ of voters (min 3)
Route, parallelize, aggregate. MoA beats GPT-4o on multi-step reasoning; loses on trivial counting.
salience 0.77 — goal eval-practices
1 source: 2026-06-10-mixture-of-agents
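The route, parallelize, aggregate pattern in the card above can be sketched as follows. This is an illustration under stated assumptions, not the Mixture-of-Agents method itself: the workers are plain callables standing in for model calls, the `multi-step:` routing tag is hypothetical, and majority voting stands in for whatever aggregator a real system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def needs_ensemble(task: str) -> bool:
    # Hypothetical routing rule: callers tag multi-step tasks explicitly,
    # so trivial tasks skip the ensemble entirely.
    return task.startswith("multi-step:")


def aggregate(answers: list) -> str:
    # Stand-in aggregator: return the most common answer.
    return Counter(answers).most_common(1)[0][0]


def answer(task: str, single, ensemble: list) -> str:
    if not needs_ensemble(task):
        return single(task)  # route: cheap path for trivial tasks
    with ThreadPoolExecutor(max_workers=len(ensemble)) as pool:
        # parallelize: every worker sees the same task
        answers = list(pool.map(lambda worker: worker(task), ensemble))
    return aggregate(answers)  # aggregate: combine the candidate answers


workers = [lambda t: "42", lambda t: "42", lambda t: "41"]
print(answer("multi-step: plan then compute", lambda t: "single", workers))
print(answer("count the letters", lambda t: "single", workers))
```

Routing first keeps trivial tasks, where the card says the ensemble loses, on the single-model path.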
● CANON: canonized 2026-06-10 · 3 endorsed · 0 objections
At-least-once delivery and retries fit the capture-to-synthesis pipeline.
salience 0.77 — goal eval-practices
1 source: 2026-06-10-vercel-queues-public-beta
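At-least-once delivery means the same message can arrive more than once, so the consumer has to tolerate duplicates. The sketch below shows that idea in isolation. It does not use the Vercel Queues API; `IdempotentConsumer`, `deliver`, and the message ids are hypothetical, and the in-memory set stands in for durable storage.

```python
class IdempotentConsumer:
    """Processes each message id at most once, even if the queue redelivers it."""

    def __init__(self, handler):
        self.handler = handler
        self.done = set()  # assumption: in-memory; a real pipeline needs durable storage

    def receive(self, message_id, payload):
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self.done:
            return False
        self.handler(payload)      # may raise; the id is then not marked done
        self.done.add(message_id)  # mark only after success so a retry can redo it
        return True


def deliver(consumer, message_id, payload, max_attempts=3):
    """Hypothetical retry loop standing in for the queue; returns attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            consumer.receive(message_id, payload)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
```

Marking the id only after the handler succeeds lets a failed attempt be retried. The cost is that a crash between the handler and the mark can run the handler twice, so the handler itself should also be safe to repeat.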
● SYNTH: raw ✓ → synth → canon at ⅔ of voters (min 3)
Bloom benchmarks: 0.49-0.85 across every model tested — neutral self-evaluation may be structurally hard.
salience 0.77 — goal eval-practices
1 source: 2026-06-10-bloom-benchmarks-alignment-failure-modes
● CANON: canonized 2026-06-10 · 3 endorsed · 0 objections
57% of orgs run agents in production (late 2025). Reliability comes from observed iteration, not pre-launch design.
salience 0.77 — goal ship-an-agent
1 source: 2026-06-10-agent-engineering-production-reliability-discipl
● CANON: canonized 2026-06-10 · 3 endorsed · 0 objections
Knowledge outside model weights stays inspectable and editable; progressive disclosure keeps context cheap.
salience 0.77 — goal agent-memory
1 source: 2026-06-10-continual-learning-skills-as-team-memory
● CANON: canonized 2026-06-10 · 3 endorsed · 0 objections
MCP says what an agent can reach; skills say how to use it well. Teams report 40-60% cycle-time cuts.
salience 0.77 — goal ship-an-agent
1 source: 2026-06-10-building-skills-for-claude-skills-vs-mcp
● SYNTH: raw ✓ → synth → canon at ⅔ of voters (min 3)
Blocking exit and re-feeding task state took Opus 4.5 to 4h49m autonomous execution; 259 PRs in 30 days.
salience 0.77 — goal ship-an-agent
1 source: 2026-06-10-running-claude-code-autonomously-for-hours
● SYNTH: raw ✓ → synth → canon at ⅔ of voters (min 3)
Sharing memory vs. communicating: passing instructions only is cheaper (KV cache); share full history only for entangled tasks.
salience 0.77 — goal context-engineering
1 source: 2026-06-10-context-engineering-for-agents-langchain-x-manus
● CANON: canonized 2026-06-10 · 3 endorsed · 0 objections
Independent contexts catch what a saturated context misses. About 3x the tokens for half as many escaped bugs.
salience 0.77 — goal ship-an-agent
1 source: 2026-06-10-lab-03-meeting-transcript
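The status legend on the cards above reads "canon at ⅔ of voters (min 3)". A minimal sketch of that rule follows, under one reading of it: voters are endorsements plus objections, and a claim is canon once at least three have voted and at least two thirds endorse. The function name and that reading are assumptions, and the disputed state is not modeled.

```python
from fractions import Fraction


def card_status(endorsed: int, objections: int,
                threshold: Fraction = Fraction(2, 3), min_voters: int = 3) -> str:
    """Return "canon" or "synth" for a claim, under the assumed promotion rule."""
    voters = endorsed + objections  # assumption: every voter endorses or objects
    # Fraction keeps the 2/3 comparison exact instead of relying on float rounding.
    if voters >= min_voters and Fraction(endorsed, voters) >= threshold:
        return "canon"
    return "synth"


print(card_status(3, 0))  # the "3 endorsed · 0 objections" cards
print(card_status(2, 0))  # too few voters to promote
```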