Claude Code
Thinking effort has been added and is set to Medium by default. To get maximum performance, use the ‘/model’ command and press the right arrow to set thinking effort to High.
Added the ‘/loop’ command: cron-style scheduling tools to run prompts repeatedly, poll for status, or set one-time reminders within a Claude Code session.
OpenAI Codex Security in research preview
OpenAI released Codex Security in research preview this week. Codex Security (formerly Aardvark) is OpenAI’s application security agent. It builds a project-specific, editable threat model of your codebase: it maps what the system does, what it trusts, and where it’s most exposed, then hunts for vulnerabilities using that context and validates findings in sandboxed environments before surfacing them.
OpenAI GPT 5.4
OpenAI released GPT-5.4 Thinking for paid ChatGPT subscribers, and GPT-5.4 Pro for Pro and Enterprise tiers. Pro means more inference-time compute thrown at harder problems: the same model, but with a longer leash to reason through particularly gnarly tasks in law, finance, coding, etc. OpenAI describes Pro as being for “people who want maximum performance on complex tasks”.
There’s also the API version, and a Codex variant, which OpenAI calls “the first general-purpose model we’ve released with native, state-of-the-art computer-use capabilities, enabling agents to operate computers and carry out complex workflows across applications”.
Always-On Memory Agent by Google Cloud
Google Cloud open-sourced a project called the Always-On Memory Agent.
This is a background agent that acts as a shared memory service for other agents or workflows. It runs continuously, watching a folder for new files (text, images, audio, video, PDFs) and exposing an HTTP API. When new information arrives, Gemini Flash-Lite analyses the content, extracts entities and topics, and stores structured memories in SQLite. Periodically, a consolidation loop reviews recent memories, finds patterns between them, and produces higher-level insights.
Agents POST their observations and decisions to it, and query it for context before acting — so each agent benefits from what the others have learned. The approach deliberately avoids vector search and embedding-based retrieval — the argument being that traditional RAG is passive (embed once, retrieve later), while this agent actively processes and connects information over time, more like how the brain consolidates during sleep. Older memories aren’t lost; they’re compressed into consolidated insights that are surfaced alongside recent memories at query time.
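A minimal sketch of how an agent might talk to such a shared memory service over HTTP. The endpoint paths (`/memories`, `/query`) and payload fields here are illustrative assumptions, not the project’s documented API:

```typescript
// Shape of a memory an agent records after acting. All field names here
// are hypothetical -- the real service defines its own schema.
interface MemoryPayload {
  agentId: string;                      // which agent made the observation
  kind: "observation" | "decision";     // what kind of memory this is
  content: string;                      // free text for the service to analyse
  timestamp: string;                    // ISO 8601
}

// Build the JSON body an agent would POST to the memory service.
function buildMemoryPayload(
  agentId: string,
  kind: MemoryPayload["kind"],
  content: string,
): MemoryPayload {
  return { agentId, kind, content, timestamp: new Date().toISOString() };
}

// Record an observation, then ask for relevant context before the next step.
// (Assumes a hypothetical service at baseUrl that returns recent memories
// plus consolidated insights for a query.)
async function recordAndQuery(baseUrl: string): Promise<string[]> {
  const payload = buildMemoryPayload(
    "billing-agent",
    "observation",
    "Customer 42 disputed invoice #1017",
  );
  await fetch(`${baseUrl}/memories`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  const res = await fetch(`${baseUrl}/query?q=customer+42`);
  const { memories } = (await res.json()) as { memories: string[] };
  return memories;
}
```

The key point is the loop: write after acting, read before acting, so context accumulates across agents without any of them holding state themselves.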
LangChain now supports the Standard Schema spec
LangChain’s adoption of Standard Schema is valuable to enterprises because it collapses the gap between existing application-layer validation and the AI layer. Rather than maintaining separate schema definitions for LLM structured output, teams can now pass the same domain schemas they already use across their APIs, forms, and databases — whether written in Zod, Valibot, or ArkType — directly into agent and structured output calls.
This means an enterprise’s CustomerRecord or ClaimPayload schema becomes the LLM’s output contract with no translation layer, reducing integration surface area and ensuring that validated, type-safe data flows end-to-end from model response through to downstream systems.
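To make the contract concrete, here is a hand-rolled schema implementing the Standard Schema v1 interface — the same ‘~standard’ contract that Zod, Valibot, and ArkType expose, which is what lets a consumer like LangChain accept any of them interchangeably. The `CustomerRecord` fields are illustrative; in practice a team would pass its existing Zod or Valibot schema directly rather than writing this by hand:

```typescript
// Result of validation: either a typed value or a list of issues.
interface StandardResult<T> {
  value?: T;
  issues?: { message: string }[];
}

// The Standard Schema v1 surface: a single "~standard" property carrying
// the spec version, the vendor name, and a validate function.
interface StandardSchemaV1<T> {
  "~standard": {
    version: 1;
    vendor: string;
    validate: (value: unknown) => StandardResult<T>;
  };
}

// Hypothetical domain type from the text above.
interface CustomerRecord {
  id: string;
  email: string;
}

const customerRecordSchema: StandardSchemaV1<CustomerRecord> = {
  "~standard": {
    version: 1,
    vendor: "hand-rolled",
    validate(value: unknown): StandardResult<CustomerRecord> {
      const v = value as Partial<CustomerRecord> | null;
      const issues: { message: string }[] = [];
      if (typeof v?.id !== "string") {
        issues.push({ message: "id must be a string" });
      }
      if (typeof v?.email !== "string" || !v.email.includes("@")) {
        issues.push({ message: "email must be a valid address" });
      }
      return issues.length ? { issues } : { value: v as CustomerRecord };
    },
  },
};

// A consumer only needs the "~standard" property -- it never has to know
// which validation library produced the schema.
function validateWith<T>(
  schema: StandardSchemaV1<T>,
  value: unknown,
): StandardResult<T> {
  return schema["~standard"].validate(value);
}
```

That vendor-agnostic `validate` call is the whole value proposition: one schema definition serves the API layer, the forms, and the LLM output contract.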
Google Workspace CLI
Google released a CLI for Workspace, covering Google Drive, Gmail, Calendar, and every Workspace API, with 40+ agent skills included.