CAPE: Contrastive Action-conditioned Parallel Encoding for Embodied Planning
arXiv:2606.07304v1 (Announce Type: new)

Abstract: Embodied agents need to predict the future consequences of candidate actions in order to plan effectively before execution. Existing visual dynamics models learn by reconstructing future visual states or rolling out dense latent representations, which spreads learning capacity across visually salient but planning-irrelevant content rather than the action-conditioned changes that drive manipulation outcomes. We propose CAPE, a Contrastive Action-conditioned Parallel Encoding framework that learns visual dynamics by distinguishing the future outco…
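The abstract contrasts reconstruction-based dynamics learning with learning by distinguishing future outcomes. A common way to express the latter is an InfoNCE-style contrastive loss. The sketch below is a generic illustration of that technique, not CAPE's implementation (the abstract is truncated and gives no architectural details). It assumes that a predictor maps a current-state embedding and an action to a predicted future embedding, and that the true future embeddings of the other samples in a batch serve as negatives. The names `predict_future`, `info_nce`, and the toy additive dynamics are hypothetical stand-ins for a learned network.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    # L2-normalize so that dot products become cosine similarities.
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]


def predict_future(state: Vector, action: Vector) -> Vector:
    # Hypothetical toy predictor: additive latent dynamics, standing in
    # for a learned action-conditioned network.
    return [s + a for s, a in zip(state, action)]


def info_nce(predicted: List[Vector], targets: List[Vector],
             temperature: float = 0.1) -> float:
    """Average InfoNCE loss: predicted[i] should score highest against
    targets[i], with every other targets[j] in the batch as a negative."""
    assert predicted and len(predicted) == len(targets)
    preds = [normalize(p) for p in predicted]
    targs = [normalize(t) for t in targets]
    total = 0.0
    for i, p in enumerate(preds):
        logits = [dot(p, t) / temperature for t in targs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the positive at index i
    return total / len(preds)


# Toy batch: the true future embedding equals state + action.
states = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
actions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
futures = [predict_future(s, a) for s, a in zip(states, actions)]
shuffled = [actions[1], actions[2], actions[0]]

matched = info_nce([predict_future(s, a) for s, a in zip(states, actions)], futures)
mismatched = info_nce([predict_future(s, a) for s, a in zip(states, shuffled)], futures)
print(f"matched={matched:.5f} mismatched={mismatched:.2f}")
```

The loss is near zero when each predicted outcome is closest to its own true future and large when actions are paired with the wrong outcomes. The objective scores which outcome matches which action rather than reproducing every pixel of the future frame, which is the distinction the abstract draws.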