10 Mar 2026
MondAI Roundup - March 2026
This write-up covers the MondAI session held on 9 March 2026, which reviewed the previous month's key AI headlines, from 9 February to 8 March.
MondAI Roundup has been presented by Dominik Lukeš since late 2024. It gives an overview of the previous month's AI news and links it to key themes of interest. It runs on the second Monday of every month, 12:30-13:30. You can sign up for the next session here. We are now introducing a regular summary of the key lessons as well as a more detailed summary on AI News Roundup.
This was a massive model month. Google shipped Gemini 3.1 Pro and Flash-Lite, Anthropic released Sonnet 4.6, and OpenAI dropped three models including GPT-5.4 - which by many accounts is the best model currently available. The open model space kept pace, with GLM-5 from Zhipu AI approaching frontier-level performance and Alibaba's Qwen 3.5 series running on consumer hardware.
The coding agent arms race accelerated further. Anthropic raised $30 billion in Series G funding, driven largely by Claude Code adoption. Cursor's agents can now test their own work by controlling a virtual machine. Claude Code got voice mode, security scanning, Figma integration, and auto memory. Google released a Workspace CLI designed not for humans but for agents to use directly.
On the agent side, METR measured Claude Opus 4.6 reliably handling tasks that take humans 14+ hours. Notion launched custom agents, bringing autonomous workflows to a mainstream productivity tool. Perplexity Computer orchestrates 19 models in parallel for complex autonomous tasks. And practitioner writing on agent design, harness engineering, and AGENTS.md instruction limits started shaping a new discipline around working with agents rather than just using them.
For more details, see the companion reading or browse the presentation slides.
Key Stories
- GPT-5.4 - the best model available by many accounts, with steerability, better tool use, and 83% on GDPval (up from 71%).
- Gemini 3.1 Pro - Google's strongest model for one-shot generation of visual and graphical artifacts.
- Claude Sonnet 4.6 - Anthropic's middle-tier model with significantly improved computer use skills.
- Anthropic raises $30 billion - growth driven by coding agents, not chatbot popularity.
- METR: Opus 4.6 handles 14.5-hour tasks - the task complexity ceiling for autonomous AI work keeps rising fast.
- Perplexity Computer - cloud-based autonomous agent orchestrating 19 specialized models.
- Agents can follow ~150 instructions - practical limits on AGENTS.md files, with implications for prompt design.
- OpenAI abandons SWE-bench Verified - the standard coding benchmark retired because it can't keep up.
- Anthropic refused Pentagon demand to remove safeguards - while OpenAI signed a classified DoD agreement.
- Ads coming to ChatGPT - OpenAI testing ads in the free tier for US users.
Resources
- Full narrative - chapter-by-chapter deep dive covering all major stories
- Presentation slides
- Quick reference index - searchable list of people, organizations, and links
Key Links
- GPT-5.4
- GPT-5.3 Codex Spark
- GPT-5.3 Instant
- Gemini 3.1 Pro
- Gemini 3.1 Flash-Lite
- Gemini 3 Deep Think
- Claude Sonnet 4.6
- GLM-5
- METR Task Horizon results
- Perplexity Computer
- Notion Custom Agents
- Google Workspace CLI
- Nathan Lambert: Post-benchmark era
- AGENTS.md evaluation paper
- AI Doesn't Reduce Work, It Intensifies It
Next Session
13 April 2026 | Register