Agentic tooling

New agentic tools and harnesses are being released at an explosive pace. Of the many, two stand out as worth a deeper dive:

  1. Pi: A minimalist coding agent that’s been gaining mindshare. Pi shuns complexity: a tiny system prompt, just four tools (Read, Write, Edit, Bash), and full transparency into what’s happening under the hood. The premise is that frontier models already know how to work with bash, Linux, and programming languages, so you don’t need a massive prompt to guide them. Unlike Claude Code, it’s also fully open source, so you can inspect exactly what it’s doing. Extensions are supported when the basics aren’t enough.
  2. Hermes Agent: Nous Research’s agent, positioned as a blend of coding agents like Claude Code and generalist agents like OpenClaw.
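To make the “four tools” idea concrete, here is a minimal sketch of what such a tool layer could look like. It is not Pi’s code, and Pi’s real tool definitions and signatures will differ; the function names, the {"name": ..., "args": ...} call shape and the exactly-one-match rule for edits are illustrative assumptions. The point is how little surface area is needed: file I/O plus a shell, with everything else delegated to bash.

```python
import subprocess
from pathlib import Path

# Hypothetical Pi-style minimal tool set: four tools, nothing else.
# Names and signatures are illustrative assumptions, not Pi's actual API.

def read(path: str) -> str:
    """Return the full contents of a text file."""
    return Path(path).read_text()

def write(path: str, content: str) -> str:
    """Create or overwrite a file with the given content."""
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"

def edit(path: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`; refuse ambiguous edits."""
    text = Path(path).read_text()
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match for edit, found {count}")
    Path(path).write_text(text.replace(old, new, 1))
    return f"edited {path}"

def bash(command: str, timeout: float = 30.0) -> str:
    """Run a shell command; return the exit code followed by stdout and stderr."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"

TOOLS = {"read": read, "write": write, "edit": edit, "bash": bash}

def dispatch(call: dict) -> str:
    """Execute one model-issued tool call of the form {"name": ..., "args": {...}}."""
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"error: unknown tool {call['name']!r}"
    try:
        return tool(**call["args"])
    except Exception as exc:  # report failures back to the model instead of crashing
        return f"error: {exc}"
```

Errors are returned as text rather than raised so the model can read them and retry, which is the usual pattern for agent tool loops.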

A Pod of Assistants

The “claw” ecosystem of AI assistants, named after OpenClaw’s lobster mascot, is expanding fast. Last week, OpenAI hired Peter Steinberger (@steipete), creator of OpenClaw (formerly Moltbot, originally Clawdbot), and this week several alternatives have appeared, each taking the idea in a different direction. A non-exhaustive list:

MCP vs CLI: the debate continues

Following the success of OpenClaw, which famously shuns MCP in favour of giving the agent CLI access to tools, the debate about MCP’s future has been heating up. Critics point to context bloat: every tool definition an MCP server exposes (name, description, input schema) is typically loaded into the model’s context before any work starts. Proponents counter that enterprise adoption shows it solves real problems.
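The context-bloat argument is easy to see with rough arithmetic. The sketch below assumes a client that loads every tool definition up front, using the name, description and inputSchema fields MCP uses to describe tools, and compares that with a single generic shell tool. The 40 tools, their schemas and the four-characters-per-token rule of thumb are all assumptions for illustration, not measurements of any real server.

```python
import json

# Rule of thumb (assumption): roughly 4 characters per token for English/JSON text.
CHARS_PER_TOKEN = 4

def estimate_tokens(obj) -> int:
    """Approximate token cost of serialising a tool definition into the prompt."""
    return len(json.dumps(obj)) // CHARS_PER_TOKEN + 1

def make_tool(name: str, description: str, params: dict) -> dict:
    """Build an MCP-style tool definition (name, description, JSON Schema input)."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {"type": "object", "properties": params, "required": list(params)},
    }

# Hypothetical server exposing 40 similar tools, all loaded into context up front.
mcp_tools = [
    make_tool(
        f"markets_action_{i}",
        "Perform a market-data operation and return the result as JSON.",
        {"market_id": {"type": "string"}, "limit": {"type": "integer"}},
    )
    for i in range(40)
]

# CLI approach: one shell tool; the agent runs `--help` only when it needs details.
cli_tools = [make_tool("bash", "Run a shell command.", {"command": {"type": "string"}})]

mcp_cost = sum(estimate_tokens(t) for t in mcp_tools)
cli_cost = sum(estimate_tokens(t) for t in cli_tools)
print(f"MCP-style upfront cost: ~{mcp_cost} tokens; single shell tool: ~{cli_cost} tokens")
```

The trade-off cuts both ways: the single shell tool is cheap up front, but the agent then spends turns (and tokens) running `--help` and parsing free-form output, whereas typed schemas hand the model structured inputs from the start.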

This week, Polymarket launched an official CLI for its prediction market. Presumably this is a response to the volume of agent-driven traffic (and fees) hitting the platform. Shipping a CLI rather than an MCP server raises interesting questions about when each approach makes sense. A good post discussing the trade-offs is worth a read.

Claude Cowork

Anthropic continues its bid to become the everything app for enterprise work, turning its models into specialised agents “for every role and department” through plugins. This week’s batch targets the financial services sector.

Claude Code

Claude Code Security, a new capability built into Claude Code, is now available in a research preview. It scans codebases for security vulnerabilities and suggests targeted software patches for human review.

Perplexity Computer

Perplexity launched Perplexity Computer, a multimodal interface that, in their words, “unifies every current AI capability into one system. It can research, design, code, deploy, and manage any project end-to-end.” The interesting part is model diversity: in theory, Perplexity can use any model from any provider, picking the best one for each (sub)task.
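Per-subtask model selection can be sketched as a simple router: tag each subtask with a kind, then pick the cheapest model that handles it. This is a hypothetical illustration of the idea, not how Perplexity Computer actually routes work; the model names, skill tags and costs are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str               # hypothetical model identifier
    skills: frozenset[str]  # task kinds the model is considered good at
    cost: float             # relative cost per call (illustrative units)

# Hypothetical catalogue mixing providers; names and numbers are placeholders.
CATALOGUE = [
    Model("provider-a/small-fast", frozenset({"summarise", "classify"}), 1.0),
    Model("provider-b/coder", frozenset({"code", "summarise"}), 3.0),
    Model("provider-c/frontier", frozenset({"research", "code", "design", "summarise"}), 10.0),
]

def route(task_kind: str, catalogue: list[Model] = CATALOGUE) -> Model:
    """Pick the cheapest model whose skills cover the subtask's kind."""
    candidates = [m for m in catalogue if task_kind in m.skills]
    if not candidates:
        raise LookupError(f"no model can handle {task_kind!r}")
    return min(candidates, key=lambda m: m.cost)

# Route each step of a multi-step project to a (possibly different) model.
plan = ["research", "design", "code", "summarise"]
for step in plan:
    print(step, "->", route(step).name)
```

A production router would also weigh latency, context length and measured quality per task, but the shape stays the same: a catalogue plus a selection rule.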

Inception Labs drops Mercury 2, billed as the first diffusion model that ‘thinks’

Inception’s blog post

Mercury 2 is a diffusion language model. Where traditional LLMs predict one token at a time, left to right, diffusion models start from noise (for text, typically a fully masked sequence) and iteratively refine the whole output in parallel over a small number of steps. Because each pass updates many tokens at once, generation is significantly faster: Inception claims 1,009 tokens/s on Blackwell GPUs, roughly 5x faster than leading speed-optimized models. Accuracy may also improve, since the model can revise any part of its draft rather than being locked into the tokens it has already emitted.
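Where the speed-up comes from can be shown with a toy step count. The sketch assumes masked diffusion, one common formulation for text (whether Mercury 2 works exactly this way is an assumption), and uses an oracle in place of a real model, so it only illustrates the number of sequential passes, not output quality.

```python
import random

MASK = "_"

def autoregressive_steps(target: list[str]) -> int:
    """Left-to-right decoding needs one sequential forward pass per token."""
    return len(target)

def diffusion_steps(target: list[str], tokens_per_step: int, seed: int = 0) -> int:
    """Toy masked-diffusion decoding: start fully masked, reveal several positions per pass.

    The 'model' is an oracle that already knows the target; a real model would
    predict every masked position in parallel and keep its most confident guesses.
    """
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    steps = 0
    while MASK in seq:
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Unmask up to `tokens_per_step` positions in this single pass, in any order.
        for i in rng.sample(masked, min(tokens_per_step, len(masked))):
            seq[i] = target[i]
        steps += 1
    return steps

target = "diffusion models refine a whole draft in parallel".split()
print(autoregressive_steps(target), diffusion_steps(target, tokens_per_step=3))
```

Fewer sequential passes is only part of the story: each diffusion pass processes the whole sequence, and real models may re-mask and revise low-confidence tokens, so the end-to-end speed-up depends on the model and the hardware.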

It remains to be proven that this approach holds up at frontier quality levels, but early results suggest quality comparable to smaller models like Haiku at significantly higher speed.