Academic papers and research lineage

This archive traces how AI capability, safety, evaluation, and governance ideas evolve over time alongside public reporting and discussion.

RePEc: Research Papers in Economics · date pending

Mastering Atari, Go, chess and shogi by planning with a learned model

Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess [1] and Go [2], where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games [3] (the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled [4]), the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi (canonical environments for high-performance planning), the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm [5] that was supplied with the rules of the game.

Julian Schrittwieser · Ioannis Antonoglou · Thomas Hubert · Karen Simonyan · Laurent Sifre · Simon Schmitt · Arthur Guez · Edward Lockhart · Demis Hassabis · Thore Graepel · Timothy Lillicrap · David Silver
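
The abstract above describes an iterable model that predicts policy, value and reward and is used for planning. A minimal sketch of that idea follows, under loud assumptions: it is not MuZero itself, the model is hand-written rather than learned, and exhaustive enumeration of short action sequences stands in for the tree-based search. All names (ToyModel, encode, step, predict, rollout_return, best_plan) are hypothetical and do not come from the paper.

```python
from itertools import product
from typing import List, Sequence, Tuple

State = Tuple[float, ...]


class ToyModel:
    """Hypothetical stand-in for a learned model; nothing here is trained."""

    actions = (0, 1)

    def encode(self, observation: Sequence[float]) -> State:
        # Map an observation to an internal state (here: a plain copy).
        return tuple(observation)

    def step(self, state: State, action: int) -> Tuple[State, float]:
        # Toy dynamics: action 1 moves the first component up, action 0 down.
        delta = 1.0 if action == 1 else -1.0
        next_state = (state[0] + delta,) + state[1:]
        # Toy reward: being closer to the target position 3.0 is better.
        reward = -abs(next_state[0] - 3.0)
        return next_state, reward

    def predict(self, state: State) -> Tuple[List[float], float]:
        # Uniform policy and a distance-based value estimate (both assumed).
        policy = [0.5, 0.5]
        value = -abs(state[0] - 3.0)
        return policy, value


def rollout_return(model: ToyModel, observation: Sequence[float],
                   plan: Sequence[int], discount: float = 0.9) -> float:
    """Iterate the model along a plan, never touching a real simulator."""
    state = model.encode(observation)
    total, scale = 0.0, 1.0
    for action in plan:
        state, reward = model.step(state, action)
        total += scale * reward
        scale *= discount
    # Bootstrap from the predicted value of the final internal state.
    _, value = model.predict(state)
    return total + scale * value


def best_plan(model: ToyModel, observation: Sequence[float],
              depth: int = 3, discount: float = 0.9) -> Tuple[int, ...]:
    """Score every action sequence of the given depth and keep the best."""
    return max(
        product(model.actions, repeat=depth),
        key=lambda plan: rollout_return(model, observation, plan, discount),
    )
```

Calling best_plan(ToyModel(), [0.0]) scores all eight three-step plans using only the model's own predictions. Enumeration grows exponentially with depth, which is one reason a guided tree search is the more practical choice.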

arXiv cs.CV · Jul 17, 2026

PanoAffordanceNet: Towards Holistic Affordance Grounding in 360° Indoor Environments

arXiv:2603.09760v2 Announce Type: replace Abstract: Global perception is essential for embodied agents in 360° spaces, yet current affordance grounding remains largely object-centric and restricted to perspective views. To bridge this gap, we introduce a novel task: Holistic Affordance Grounding in 360° Indoor Environments. This task faces unique challenges, including severe geometric distortions from Equirectangular Projection (ERP), semantic dispersion, and cross-scale alignment difficulties. We propose PanoAffordanceNet, an end-to-end framework featuring a Distortion-Aware Spectr…

arXiv cs.LG · Jul 17, 2026

Stop Thinking, Start Looking: Efficient Post-Training for Multimodal Document Question Answering via Reasoning-Free Alignment

arXiv:2607.14682v1 Announce Type: cross Abstract: Efficient multimodal document question answering with explicit visual grounding, that is, locating the precise document region that supports each answer, remains an open challenge. Current approaches bifurcate into Supervised Fine-Tuning (SFT), which requires large annotated datasets and reaches optimization plateaus, and reasoning-centric Reinforcement Learning (RL), which depends on verbose intermediate traces that inflate inference token cost without clear benefit. We introduce Perception-RFT, a training framework that applies Group Relative Policy O…

arXiv cs.CL · Jul 17, 2026

Smarter and Cheaper at Once: Byte-Exact KV-Cache Grafting Turns a Frozen Small Model into a Verified-Knowledge Flywheel

arXiv:2607.14431v1 Announce Type: new Abstract: We report a way to make a frozen small language model both more capable and dramatically cheaper at once, without changing any weights. Verified knowledge is deposited once as a byte-exact key-value (KV) state artifact and later restored, by graft, into a fresh inference context. The restore is bit-exact: under a pinned deterministic configuration, the grafted logits are byte-for-byte identical to a fresh computation (SHA-256 equality), with zero KL divergence and 100% argmax agreement over fifty samples. We show that own-position graft is the u…
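
The abstract above states three equality checks between grafted and freshly computed logits: SHA-256 equality, zero KL divergence, and argmax agreement. A minimal sketch of how such checks could be computed follows. It is not the authors' code: the function names are hypothetical, logits are assumed to be plain lists of floats, and serializing them as little-endian float64 before hashing is an assumption made only for illustration.

```python
import hashlib
import math
import struct
from typing import List, Sequence


def digest(logits: Sequence[float]) -> str:
    """SHA-256 over the logits packed as little-endian float64 (assumed format)."""
    raw = struct.pack(f"<{len(logits)}d", *logits)
    return hashlib.sha256(raw).hexdigest()


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_logits: Sequence[float], q_logits: Sequence[float]) -> float:
    """KL(p || q) between the softmax distributions of two logit vectors."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def argmax_agreement(fresh: Sequence[Sequence[float]],
                     grafted: Sequence[Sequence[float]]) -> float:
    """Fraction of samples whose top-scoring index matches."""
    matches = sum(argmax(f) == argmax(g) for f, g in zip(fresh, grafted))
    return matches / len(fresh)
```

Byte-level hash equality is the strictest of the three checks: identical bytes imply zero KL divergence and full argmax agreement, while the reverse does not hold.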

Academic papers and research lineage

VTM-Nav: Hierarchical Visual-Topological Memory for Cross-Episode Object-Goal Navigation

CRISP: Constrained Refinement via Iterative Squeezing Process for Robust Medical Image Segmentation under Domain Shift

Reward-Free Evolving Agents via Pairwise Validator

Learning in Infinitesimal Non-Compositional Sketches

SceneBind: Binding What and Where Across Vision, Audio and Language

Structural-Semantic Reciprocal Learning for Unsupervised Visible-Infrared Person Re-Identification

Temporal Cascading of Planning and Control for Quadrotor MPC

Beyond Generalist LLMs: Specialist Agentic Systems for Structured Code Workflow Execution

D-cut: Adaptive Verification Depth Pruning for Batched Speculative Decoding

Quality-Aware Robust Multi-View Clustering for Heterogeneous Observation Noise

Stabilizing Native Low-Rank LLM Pretraining

Communication-Efficient Relative Pose Estimation with Vision Foundation Models for Ephemeral Collaborative Perception

Bridge Evidence: Static Retrieval Utility Does Not Predict Causal Utility in Multi-Step Agentic Search

Global drivers and barriers to the public acceptance of autonomous vehicles: Evidence from 17 countries

BadWAM: When World-Action Models Dream Right but Act Wrong

AnyStyle: Single-Pass Multimodal Stylization for 3D Gaussian Splatting

Hierarchical Denoising For Multi-Step Visual Reasoning

Symbal: Detecting Systematic Misalignments in Model-Generated Captions

MAGiSt3R: Multi-Agent Feed-forward 3D Reconstruction from Monocular RGB Videos

LATTICE: Graph Self-Supervised Learning for Multimodal Spatial Omics Integration

Knowledge-Embedded and Hypernetwork-Guided Few-Shot Substation Meter Defect Image Generation Method

RetroAgent: Harnessing LLMs to Search Over Structured Memory for Agentic Retrosynthesis Planning

Routing Ceilings Are Domain-Independent: Structural Prior Injection in Code Security Vulnerability Detection

CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation

L-MARS: Legal Multi-Agent System with Agentic Search and Citation-Faithfulness Audit

CoDi -- an exemplar-conditioned diffusion model for low-shot counting

Step-Level Preference Learning for Generative Agents in Social Simulations

Marinarium: A Modular Experimental Facility for Reproducible Maritime and Space-Analog Field Robotics

Empirical evidence of Large Language Model's influence on human spoken communication

Knowledge-Aware Evolution for Task-Free Streaming Federated Continual Learning with Arbitrary Class Overlap

Non-vacuous Generalization Bounds for Reinforcement Learning with Verifiable Rewards

Memory-Driven Self-Disclosure and Relational Turning Points: A Longitudinal Multimodal Study of Human-AI Interaction

Energy-Efficient Federated Learning via Adaptive Encoder Freezing for MRI-to-CT Conversion: A Green AI-Guided Research

Towards an Intention Abstraction Layer for Autonomous Industrial Systems

EdgeFaaS: A Function-based Framework for Edge Computing

Harmonious Color Pairings: Insights from Human Preference and Natural Hue Statistics

Decision Making Needs Uncertainty Quantification [Lecture Notes]

Multi-LLM Collaborative MRI Report Generation for Visual Instruction Tuning in Brain Oncology

NIFA: Nonlinear IMC enhanced FPGA for efficient ML inference

An offline approach to fNIRS-guided reinforcement learning for robot behavior

Why Git Is the Memory Solution for the Agentic Development Lifecycle

Penny: Transition Network Analysis of Learner-Chatbot Interactions in Scaffolded EFL Writing

MagicPrompt: Ultra-Lightweight Prompt Tuning for Video Generation

Unsafe at any AUC: Unlearned Lessons from Sociotechnical Disasters for Responsible AI