Andreas Rau — Writing

https://andreasrau.tech/writing
High-signal writing on AI systems, engineering tradeoffs, and building products that have to work in production.
Last updated: Mon, 18 May 2026 07:12:07 GMT
Copyright 2026 Andreas Rau

The 39% Ceiling: What RoadmapBench Says About Long-Horizon Coding Agents
https://andreasrau.tech/writing/agentic-coding-paper-2026-05-18
Mon, 18 May 2026 07:11:30 GMT
AI Agents Research Daily

Documentation as IR: RustPrint and the case for spec-anchored agent loops
https://andreasrau.tech/writing/agentic-coding-paper-2026-05-15
Fri, 15 May 2026 07:05:05 GMT
AI Agents Research Daily

Agentic Coding Research Digest — May 2026
https://andreasrau.tech/writing/agentic-coding-digest-2026-05-13
Wed, 13 May 2026 14:19:58 GMT
AI Agents Research Digest

95% prompt-cache reuse on replay. A live supervisor lifted pair-coding success from 28.8% to 54.7% on CooperBench. Branching exploration beat baselines by up to 11 points while reducing compute by 58%. And forking rollouts during tree-RL training moved TerminalBench-2 from 34.2% to 39.4%. The whole system is open-sourced.

Why it matters: This is what serious agent infrastructure looks like: not a chat loop, but a typed, replayable trace of agent operations that you can fork, supervise, and intervene on. If you're building anything beyond a one-shot agent harness, this paper is worth a careful read. The Lean formalisation alone tells you the authors mean business.

arxiv.org/abs/2605.10913

The Common Thread

Autonomy is solved; reliability isn't. ProgramBench, Constraint Decay, and SWE Atlas all converge on the same finding: when the task demands architectural judgement or structural conformance, frontier agents fall off a cliff. The leaderboard story and the production story are increasingly different stories.

Context engineering is becoming a discipline. Mise en Place names what serious users already do (externalise domain knowledge, write specs, decompose tasks), and the Proactivity paper names the next step: deciding what the agent should surface back to you. Both are arguments that the human-agent interface is where the next 10× lives.

Infrastructure is catching up to the agent loop. Shepherd's typed execution trace, Atlas's beyond-bugfix evaluation, ProgramBench's fuzz-based behavioural tests: the tooling around agents is getting noticeably more sophisticated. The era of "a for-loop over a chat completion" is ending.

Agentic Coding Paper of the Day — May 13, 2026
https://andreasrau.tech/writing/agentic-coding-paper-2026-05-13
Wed, 13 May 2026 14:19:02 GMT
AI Agents Research Daily

When Doing Nothing Is the Right Patch: The Action Bias of Coding Agents
https://andreasrau.tech/writing/agentic-coding-paper-2026-05-11
Mon, 11 May 2026 12:21:49 GMT
AI Agents Research Daily

Multi-agent coordination is a graph problem, not a hierarchy problem
https://andreasrau.tech/writing/agentic-coding-paper-2026-05-08
Fri, 08 May 2026 14:53:26 GMT
AI Agents Research Daily

A 4B Model Just Replaced Frontier LLMs in the Subagent Slot
https://andreasrau.tech/writing/agentic-coding-paper-2026-05-07
Thu, 07 May 2026 15:41:08 GMT
AI Agents Research Daily

Agentic Coding Paper of the Day — May 6, 2026
https://andreasrau.tech/writing/agentic-coding-paper-2026-05-06
Wed, 06 May 2026 09:43:37 GMT
AI Agents Research Daily

Agentic Coding Research Digest — May 2026
https://andreasrau.tech/writing/agentic-coding-digest-2026-05-06
Wed, 06 May 2026 06:30:05 GMT
AI Agents Research Digest

Agentic Coding Research Digest — April 2026
https://andreasrau.tech/writing/agentic-coding-digest-2026-04-30
Thu, 30 Apr 2026 14:32:26 GMT