Pass@k is Mostly Bunk
Exponentially better results? I'll take three!
Measuring the success of AI agents isn’t easy. It’s very sensitive to what success means, it can require a lot of samples, its hi
Distinguished Engineer at Amazon Web Services. Writes about distributed systems and formal methods.
https://brooker.co.za/blog/
AI agents achieve goals through side effects using tools. The key safety concern is controlling what agents can actually do, not just say.
Programming is evolving toward specification, with developers increasingly describing what they want rather than how to implement it through layers of abstraction.
SSDs are ~1000x faster than old spinning disks, but modern databases were designed for slow disks. What would a database built from scratch for SSDs look like?
Cloudflare's outage sparked debate about error handling in Rust, specifically the risks of using .unwrap() which can crash programs in large systems.
Strong consistency is better than eventual consistency because eventual consistency creates operational complexity and weird behavior in distributed database systems.