Papers

Academic papers and research lineage

This archive traces how AI capability, safety, evaluation, and governance ideas evolve over time alongside public reporting and discussion.

RePEc: Research Papers in Economicsdate pending

Mastering Atari, Go, chess and shogi by planning with a learned model

Abstract Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess1 and Go2, where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games3—the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled4—the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi—canonical environments for high-performance planning—the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm5 that was supplied with the rules of the game.

Julian Schrittwieser · Ioannis Antonoglou · Thomas Hubert · Karen Simonyan · Laurent Sifre · Simon Schmitt · Arthur Guez · Edward Lockhart · Demis Hassabis · Thore Graepel · Timothy Lillicrap · David Silver

arXiv cs.ROJun 8, 2026

A Human-Sensitive Controller: Adapting to Human Musculoskeletal Disorder-Related Constraints via Reinforcement Learning

arXiv:2504.10102v2 Announce Type: replace Abstract: Work-Related Musculoskeletal Disorders continue to be a major challenge in industrial environments, leading to reduced workforce participation, increased healthcare costs, and long-term disability. This study introduces a human-sensitive robotic system aimed at reintegrating individuals with a history of musculoskeletal disorders into standard job roles, while simultaneously optimizing ergonomic conditions for the broader workforce. This research leverages reinforcement learning (RL) to develop a human-aware control strategy for collaborativ…

arXiv cs.ROJun 8, 2026

Where to Touch, How to Contact: Hierarchical RL-MPC Framework for Geometry-Aware Long-Horizon Dexterous Manipulation

arXiv:2601.10930v3 Announce Type: replace Abstract: A key challenge in contact-rich dexterous manipulation is the need to jointly reason over global geometry and nonsmooth contact dynamics. End-to-end policies bypass this complexity, but often require large amounts of data and transfer poorly from simulation to reality. We address the limitations with a simple insight: dexterous manipulation is inherently hierarchical--at a high level, a robot decides where to touch (geometry); at a low level it determines how to move the object through contact dynamics. Building on this insight, we propose a…

arXiv cs.ROJun 8, 2026

CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks

arXiv:2509.14380v3 Announce Type: replace Abstract: Multi-Agent Reinforcement Learning (MARL) provides a powerful framework for learning coordination in multi-agent systems. However, applying MARL to robotics remains challenging due to their high-dimensional continuous joint action spaces, complex reward design, and non-stationarity from concurrently learning agents. On the other hand, humans often learn complex coordination with the help of coaches, who guide learning through carefully designed curricula and detailed feedback. Building on the reasoning capabilities of foundation models, we a…

arXiv cs.ROJun 8, 2026

From Pixels to Shelf: An Integrated Robotic System for Autonomous Supermarket Stocking with a Mobile Manipulator

arXiv:2509.11740v2 Announce Type: replace Abstract: Autonomous stocking in retail environments, particularly supermarkets, presents challenges due to dynamic human interactions, constrained spaces, and diverse product geometries. This paper introduces an efficient modular robotic system for autonomous shelf stocking, integrating commercially available hardware with a scalable algorithmic architecture. A major contribution of this work is the system integration of off-the-shelf hardware and ROS2-based perception, planning, and control into a single deployable platform for retail environments.…

arXiv cs.ROJun 8, 2026

Test-Time Trajectory Optimization for Autonomous Driving

arXiv:2606.07170v1 Announce Type: new Abstract: End-to-end planners for autonomous driving typically generate a set of candidate trajectories, score each one, and return the highest-scoring candidate. However, the scorer is applied only after the proposals are generated and cannot influence the set of trajectories: a weak set of candidates limits planning performance regardless of the scorer's quality. We instead treat the scorer as a learned trajectory-level reward function and search for trajectories that maximize it. Our method, TOAD, runs the Cross-Entropy Method at test time, warm-starte…

arXiv cs.ROJun 8, 2026

CAPE: Contrastive Action-conditioned Parallel Encoding for Embodied Planning

arXiv:2606.07304v1 Announce Type: new Abstract: Embodied agents need to predict the future consequences of candidate actions in order to plan effectively before execution. Existing visual dynamics models learn by reconstructing future visual states or rolling out dense latent representations, which spreads learning capacity across visually salient but planning-irrelevant content rather than the action-conditioned changes that drive manipulation outcomes. We propose CAPE, a Contrastive Action-conditioned Parallel Encoding framework that learns visual dynamics by distinguishing the future outco…

arXiv cs.ROJun 8, 2026

A Causal Probabilistic Framework for Perception-Informed Closed-Loop Simulation of Autonomous Driving

arXiv:2606.07186v1 Announce Type: new Abstract: Software-in-the-loop (SIL) simulation is a cornerstone for the validation of modern automotive safety functions. However, many current frameworks utilize ideal sensing, which bypasses the functional insufficiencies of perception algorithms, leading to over-optimistic safety assessments. This paper proposes a perception-informed SIL testing methodology that bridges the gap between ground-truth simulation and real-world perception behavior. We present a framework for incorporating causal probabilistic models into standardized, scenario-based simul…

arXiv cs.ROJun 8, 2026

Physiologically Constrained Musculoskeletal Neural Network for Multi-DoF Joint Kinematics Estimation from Partially Observed sEMG

arXiv:2606.07476v1 Announce Type: cross Abstract: This paper investigates multi-degrees of freedom (DoF) joint kinematics estimation under partially observed surface electromyography (sEMG), where only a subset of task-relevant muscles can be measured due to anatomical inaccessibility or sensor constraints. A novel musculoskeletal neural network (MSK-NN) is proposed to estimate multi-DoF joint angles while simultaneously inferring activations for both measured and unmeasured muscles. MSK-NN consists of a CNN-based muscle activation estimator and an embedded MSK forward dynamics module, formin…

arXiv cs.ROJun 8, 2026

Dreaming when Necessary: Advancing World Action Models with Adaptive Multi-Modal Reasoning

arXiv:2606.07089v1 Announce Type: new Abstract: World Action Models (WAMs) offer a promising approach to embodied intelligence, yet existing methods rely heavily on video prediction as action priors and lack adaptive multimodal reasoning, limiting their effectiveness on long-horizon, complex tasks. We observe that WAMs require different multimodal reasoning modes under different execution contexts: textual reasoning is essential during task transitions to guide high-level action prediction, while visual reasoning is critical during fine-grained manipulation for precise control. Motivated by t…

arXiv cs.ROJun 8, 2026

Predictive Style Matching: Natural and Robust Humanoid Locomotion

arXiv:2606.07083v1 Announce Type: new Abstract: Reinforcement learning has become the prevailing approach to humanoid locomotion control: policies transfer reliably from simulation to hardware and recover gracefully from disturbances. Motion quality, however, still lags behind: task-only rewards often converge to stiff, asymmetric gaits, while motion imitation methods improve appearance but become more sensitive to external disturbances because reference signals can oppose the transient poses needed to regain balance. We propose Predictive Style Matching, in which an offline predictor maps th…

arXiv cs.ROJun 8, 2026

LIMMT: Less is More for Motion Tracking

arXiv:2606.06953v1 Announce Type: new Abstract: We argue that high-quality motion data can steer tracking policies toward better optimization trajectories early in training. In this work, we introduce LIMMT (Less Is More for Motion Tracking). To our knowledge, this is the first data-centric study for physics-based humanoid motion tracking. We go beyond simply removing low-quality and erroneous clips, but define motion data quality through three dimensions: physics feasibility, diversity, and complexity. We show that even training with under 3% of AMASS yields better tracking performance than…

arXiv cs.ROJun 8, 2026

T-GMP: Terrain-conditioned Generative Motion Priors for Versatile and Natural Humanoid Locomotion

arXiv:2606.06944v1 Announce Type: new Abstract: Achieving both anthropomorphic naturalness and robust terrain traversal remains a fundamental challenge in humanoid locomotion. Existing Reinforcement Learning (RL) approaches typically rely on fixed motion priors, limiting their adaptability to varying environments. We propose Terrain-conditioned Generative Motion Priors (T-GMP), a module that captures a terrain-conditioned latent motion manifold from a few expert state-terrain demonstrations using a Conditional Variational Autoencoder (CVAE). The learned priors enable smooth style transitions,…

arXiv cs.ROJun 8, 2026

Robots Need More than VLA and World Models

arXiv:2606.06556v1 Announce Type: new Abstract: Generalist robot intelligence is often framed as a policy-scaling problem: collect more robot demonstrations, train larger Vision-Language-Action (VLA) models, and expect broader generalisation. In this position paper, we argue that this framing is incomplete. The central bottleneck is not only policy learning, but the absence of mechanisms that convert the world's abundant unstructured behavioural data into grounded robot supervision. Human motion, internet video, simulation rollouts, and interactive demonstrations contain rich information abou…

arXiv cs.ROJun 8, 2026

PhyRoGen: Synthetic Generation of Physical Robot Manipulation Puzzles Using Procedural Content Generation

arXiv:2606.06569v1 Announce Type: new Abstract: Robot manipulation of physical puzzles is important for automatic assembly and disassembly tasks. However, to enable robots to solve physical puzzles, manipulation skills need to be learned, which requires large training datasets, the generation of which is often time consuming and tedious. To overcome this problem, we propose the Physical Robot Manipulation Puzzle Generation framework (PhyRoGen), which leverages procedural content generation (PCG) for automated generation of synthetic datasets of manipulation puzzles. PhyRoGen is a general-purp…

arXiv cs.ROJun 8, 2026

Optimal Control Approach for Non-prehensile Ball Juggling Using a 7-DoF Manipulator

arXiv:2606.06704v1 Announce Type: new Abstract: Non-prehensile object manipulation skills are important for real-world robot interactions, enabling highly dynamic tasks such as balancing a glass on a tray or the controlled sliding of items on a table. Among such tasks, those characterised by high-speed manipulation requirements and general sensitivity of the resulting hybrid dynamics are particularly hard to accomplish. Within these, juggling can be seen as a highly challenging maneuver to be solved. The key to robotic juggling is achieving dynamic stabilisation of an underactuated object. Si…

arXiv cs.ROJun 8, 2026

Mission-Level Runtime Assurance Framework for Autonomous Driving

arXiv:2606.06996v1 Announce Type: new Abstract: This paper studies runtime safety for autonomous driving when high-level driving commands become faulty or unreliable. Unlike conventional runtime-safety approaches that mainly focus on immediate vehicle safety, the proposed framework evaluates both driving safety and whether the vehicle can still successfully complete its mission before a command is executed. The framework extends highway-env with mission-level fault scenarios such as skipping required checkpoints, entering restricted areas, and generating future routes that can no longer compl…

arXiv cs.ROJun 8, 2026

IDDMBSE: Integrating Data-Driven and Model-Based Systems Engineering for Trusted Autonomous Cyber-Physical Systems

arXiv:2606.06727v1 Announce Type: new Abstract: Autonomous cyber-physical systems (CPS) sit at the intersection of Model-Based Systems Engineering (MBSE) and data-driven Machine Learning and Artificial Intelligence (ML/AI), yet no integrated Systems Engineering (SE) methodology natively spans both. We address this gap with IDDMBSE, an Integrated Data-Driven and Model-Based Systems Engineering methodology that extends the rigorous MBSE V-process with a data-driven loop at every step, anchored in SysML, the autonomy stack, and a hybrid model-based plus data-driven trade-off architecture. We ins…

arXiv cs.ROJun 8, 2026

STRIPS-WM: Learning Grounded Propositional STRIPS-style World Models from Images

arXiv:2606.06832v1 Announce Type: new Abstract: Robots performing long-horizon visual manipulation observe high-dimensional images, but successful plans depend on action-relevant facts: what can be done now and what changes afterward. A useful planning representation should discard irrelevant visual details while preserving action applicability and effects. Classical task planners exploit this structure through symbolic operators with preconditions and effects, but obtaining such representations from raw visual experience remains challenging. We study a visual task-planning setting in which a…

arXiv cs.ROJun 8, 2026

Predicting Dynamic Map States from Limited Field-of-View Sensor Data

arXiv:2602.12360v2 Announce Type: replace Abstract: When autonomous systems are deployed in real-world scenarios, sensors are often subject to limited field-of-view (FOV) constraints, either naturally through system design, or through unexpected occlusions or sensor failures. In conditions where a large FOV is unavailable, it is important to be able to infer information about the environment and predict the state of nearby surroundings based on available data to maintain safe and accurate operation. In this work, we explore the effectiveness of deep learning for dynamic map state prediction b…

Nature — Machine LearningJun 8, 2026