Article index
AI coverage stream
A source-aware archive of reporting, launches, incidents, and policy developments across the AI field.
"Chat is dead": OpenAI preps overhaul of ChatGPT
OpenAI to recast hit chatbot as a route to higher-margin products before a potential IPO.
The weather and climate science AI revolution isn’t revolutionary
Machine learning has its limits—how is it being used?
School shooting survivor sues AI gun detection firm after system failed to spot weapon
How accurate does an AI system need to be?
S&P 500 rejects SpaceX, also blocking entry for OpenAI and Anthropic
SpaceX won’t get easy access to billions of dollars from passive investors.
"We pissed off a lot of people": Giant data center plan cut 50% amid protests
Developer felt "beaten up," with "no choice" but to shrink data center.
The Fitbit Air is a good wearable weighed down by a chatty AI "coach"
The Air succeeds as a minimalist, reliable fitness tracker, but Google's AI Health Coach feels unnecessary.
Flood of AI 'garbage' is pushing open-source developers to the limit
The modern world depends on open-source software maintained by volunteers, but the added demands of checking and fixing AI-written submissions are causing some to burn out and quit
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete...
Superintelligent machines may well need us after all
Despite AI's dizzying improvements in mathematical ability, its successes show just how integral human mathematicians are to the scientific process
The US is betting on AI to catch insider trading in prediction markets
The Commodity Futures Trading Commission wants us to know it's taking this very seriously.
Anthropic’s $1.5B copyright settlement is getting messy as judge delays approval
Lawyers accused of rushing historic settlement to seize $320 million in fees.
Preprint server arXiv will ban submitters of AI-generated hallucinations
One of the site's moderators described the new policy on social media.
OpenAI feels “burned” by Apple’s crappy ChatGPT integration, insiders say
Judge orders Apple to give Musk internal messages discussing secretive ChatGPT deal.
Pennsylvanians use town hall meeting to rail against data center boom
“This is a public trust and transparency issue.”
Claude Code's product lead talks usage limits, transparency, and the "lean harness"
"We have no grand plan," says Anthropic's Cat Wu—but that's by design.
Most "inner work" looks like entertainment.
Imagine you’re looking for a personal trainer. You open one trainer’s webpage and read their testimonials: “I had an experience tied for the most intense experiences of my life” ; “They do it all with fun, care, and a sense of humour.” You notice that none of the testimonials mention improved body composition, fitness, or bloodwork. What would you think? Personal training should improve your body. Inner work should improve your life. If inner work were optimized for results, what would we expect to see? I’d expect to see success stories: people who got undeniable life changes. Like: He was si…
Altman forced to confront claims at OpenAI trial that he's a prolific liar
"Very painful": Altman relives his Muskian reaction to losing control over OpenAI.
Start learning with Google’s new AI Educator Series.
Free AI literacy training is available to all 6 million K-12 and higher education teachers across the U.S.
Anthropic blames dystopian sci-fi for training AI models to act “evil”
But training on "synthetic stories" that model good AI behavior can help.
Quoting Boris Mann
“11 AI agents” is meaningless as a phrase. If I said “I have 11 spreadsheets” or “I have 11 browser tabs” to do my work, it means about the same thing. — Boris Mann
The case for fine-grained tracking of compute for AI
TL;DR Current approaches to tracking AI compute rely mainly on a handful of hardware proxies (like FLOP/s and bandwidth) that primarily track GPU progress. These metrics are becoming less useful for accurately tracking compute for AI because (1) they measure theoretical ceilings rather than actual performance, (2) as architectures diversify away from a GPU/TPU-dominant paradigm, they are becoming less comparable across architecture types and less likely to follow historical trends, and (3) they miss second-order effects from improving design and manufacturing processes. We…
Luma opens Uni-1.1 image model API at prices and quality matching OpenAI and Google
Luma is making its Uni-1.1 image model available via API, with prices starting at $0.04 per image at 2,048-pixel resolution. On the Arena leaderboard, the model ranks third, right behind Google and OpenAI. The API includes web search, built-in reasoning, and support for up to nine reference images.
Vibe Excel and the Future of White-Collar Work
This post was originally posted on my Substack. I can be reached on X and LinkedIn. For the past few months, I’ve been trying to “vibe Excel” (using ChatGPT and Anthropic’s Excel add-ons for investment workflows). My takeaway is that while AI tooling for finance is still relatively immature, its potential to disrupt financial services is clear. This raises an important question: if AI for software engineering went from novelty to ubiquity in ~2.5 years, how quickly will AI diffuse across other knowledge-work domains? My view is that the bottleneck to AI adoption has shifted. In coding, the mai…
Broadcom reportedly won't build OpenAI's custom chip unless Microsoft buys 40 percent of them
OpenAI's custom AI chip project with Broadcom has hit a funding wall. Broadcom won't finance production unless Microsoft commits to buying 40 percent of the chips, and Microsoft hasn't agreed yet. OpenAI manager Sachin Katti called the dependency "financially unattractive" in an internal message. The first phase alone costs around 18 billion dollars.
Google's "Preferred Sources" feature is a free pass for more garbage in search
Google frames "Preferred Sources" as a way to bring more quality journalism into search. In practice, it shifts responsibility to a manual setting almost no one will use. That gives Google a user-choice argument for users and regulators while it keeps sidelining the open web in favor of its own AI interfaces.
Pseudoscientific emotion AI is invading the workplace, an Atlantic report shows
Software that claims to read human emotions using AI is quietly becoming a fixture of everyday work life, Ellen Cushing reports in a feature for The Atlantic.
Does Opus 4.7 Generate Deceptive Denials About Its Own Guardrails?
The first rule of ethics reminders is you don't talk about ethics reminders. Epistemic status: Exploratory. Multiple sessions on one account, no controlled replication yet. I'm presenting observations, not conclusions. The main alternative explanation -- confabulation -- is real and I haven't ruled it out. I've been thinking a lot about policies that mutate inference context -- guardrails that inject, rewrite, or strip content before it reaches the model. This came out of my work on AI Gateways. I wanted to see what that looks like from the outside. So I went fishing. During the experiment…
Bad Problems Don't Stop Being Bad Because Somebody's Wrong About Fault Analysis
Here's a dynamic I’ve seen at least a dozen times: Alice: Man that article has a very inaccurate/misleading/horrifying headline. Bob: Did you know, *actually* article writers don't write their own headlines? … But what I care about is the misleading headline, not your org chart. Another example I’ve encountered recently is (anonymizing) when a friend complained about a prosaic safety problem at a major AI company that went unfixed for multiple months. Someone else with background information “usefully” chimed in with a long explanation of organizational restrictions and why the team respons…
Musk v. Altman week 2: OpenAI fires back, and Shivon Zilis reveals that Musk tried to poach Sam Altman
In the second week of the landmark trial between Elon Musk and OpenAI, Musk’s motivations for bringing the suit were under scrutiny. Last week, Musk took the stand, alleging that OpenAI CEO Sam Altman and president Greg Brockman had deceived him into donating $38 million to the company. He claimed that they’d promised to maintain…
Using Claude Code: The Unreasonable Effectiveness of HTML
Thought-provoking piece by Thariq Shihipar (on the Claude Code team at Anthropic) advocating for HTML over Markdown as an output format to request from Claude. The article is crammed with interesting examples (collected on this site) and prompt suggestions like this one: Help me review this PR by creating an HTML artifact that describes it. I'm not very familiar with the streaming/backpressure logic so focus on that. Render the actual diff with inline margin annotations, color-code findings by severity and whatever else might be neede…
AI money keeps flowing as Deepseek plans record raise and Core Automation quadruples valuation in weeks
Deepseek is planning a funding round of up to $7.35 billion, the largest ever for a Chinese AI company. Deepseek V4.1 is set to launch in June. Meanwhile, Core Automation, founded by ex-OpenAI researcher Jerry Tworek just six weeks ago, is already targeting a $4 billion valuation.
CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models
Reasonable logic
The Saturation View: some responses
A couple of weeks ago, I published a draft of a new population axiology that I’ve been working on with Christian Tarsney. It got a lot of comments and pushback — thanks to everyone who engaged! They’ll feed into the more-polished academic-draft paper that Christian and I are working on. Here I’ll quickly respond to some of the most common or noteworthy responses. I’ll generally avoid stuff that is already covered in the draft. What’s the view? Isn’t this old hat? Very roughly, the Saturation view says that the value of a life, experience, or welfare-event depends not only on how high-welfare…
Improving Bash Generation in Small Language Models with Grammar-Constrained Decoding
Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is...
Is ProgramBench Impossible?
ProgramBench is a new coding benchmark that all frontier models spectacularly fail. We’ve been on a quest for “hard benchmarks” for a while so it’s refreshing to see a benchmark where top models do badly. Unfortunately, ProgramBench has one big problem: it’s impossible! What is ProgramBench? ProgramBench tests if a model can recreate a program from a “clean room” environment. The model is given only a bit of documentation and black-box access to the program (all the programs are CLIs), then tasked with re-implementing it. How does ProgramBench know if the implementation is correct? It also ge…
SoftBank reportedly slashes OpenAI-backed loan from $10 billion to $6 billion as lenders balk at private AI valuations
SoftBank has reduced a loan secured by OpenAI shares from 10 to around 6 billion dollars. Lenders are apparently reluctant to reliably assess the value of an unlisted company like OpenAI.
Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo
An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return...
AI is Breaking Two Vulnerability Cultures
A week ago the Copy Fail vulnerability came out, and Hyunwoo Kim immediately realized that the fixes were insufficient, sharing a patch the same day . In doing this he followed standard procedure for Linux, especially within networking: share the security impact with a closed list of Linux security engineers, while fixing the bug quietly and efficiently in the open. His goal was that with only the raw fix public, the knowledge that a serious vulnerability existed could be "embargoed": the people in a position to address it know, but they've agreed not to say anything for a few days. Someone e…
Nick Bostrom Has a Plan for Humanity’s ‘Big Retirement’
The philosopher thinks humans should pursue advanced AI and the promise of a “solved world.”
There's a Long Shot Proposal to Protect California Workers From AI
California gubernatorial candidate Tom Steyer is proposing a new jobs guarantee for workers displaced by artificial intelligence.
See what happens when creative legends use AI to make ads for small businesses
Image: black-and-white card with headshots of Susan Credle, Jayonta Jenkins, and Tiffany Rolfe.
Agents and ROI
Remember that MIT study that showed that the ROI for generative AI wasn’t really there for most businesses?
Anthropic approaches $1 trillion valuation as revenue grows fivefold
According to the Financial Times, Anthropic's planned funding round is taking shape. The round aims to raise up to $50 billion, which would value the company at roughly $900 billion.
Please Be Serious
Recently, Eliezer Yudkowsky participated in a very flawed episode of the Doom Debates podcast that reflected poorly on him and, likely for many people, on the entire AI safety movement. The premise of the debate was that Eliezer Yudkowsky was offered $10,000 to debate an anonymous "AI lab director", and this director quickly made the debate into a mess by interrupting, yelling, and using profanity. Sure, Yudkowsky may have come across as sane in comparison, but his opponent did make one critical point during the debate: Yudkowsky's agreeing to debate him in the first place may have been a mistake. To analyz…
Userland Alignment
Most discourse around AI alignment centers on model development and the labs that develop them. This is a reasonable place to focus given the centrality of model training to AI advancement. However, there are neglected opportunities to build defense-in-depth via aligned harnesses – and these opportunities might be tractable by interested developers and researchers who otherwise would struggle to have impact given the limited opportunities to influence lab practices. The behavior of an AI system is an emergent property of the model, its harness, any initial seed prompt the harness injects, and…
A benchmark is a sensor
The simple mental picture A simple mental picture we have for an AI capability benchmark is to think of it as a sensor with a certain sensitivity within a certain range of capabilities. The sensitivity of a benchmark, i.e. its ability to distinguish the capability of different models, is given by a curve like this: The curve starts high (low sensitivity, high uncertainty), since for models with low capability all the tasks in the benchmark are too hard, and the benchmark can't distinguish between low and very low capability. Similarly all the tasks are too easy for a very capable model, and…
AI safety tests have a new problem: Models are now faking their own reasoning traces
Anthropic's Natural Language Autoencoders make Claude Opus 4.6's internal activations readable as plain text. Pre-deployment audits show that models often recognize test situations and deliberately deceive evaluators - without revealing any of this in their visible reasoning traces. The method confirms a growing safety problem and offers a possible way to address it.
Running Codex safely at OpenAI
How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption.