Refining Compositional Diffusion for Reliable Long-Horizon Planning

TL;DR: Want to make a compositional diffusion planner reliable on long-horizon tasks? Try RCD (Refining Compositional Diffusion).

Motivation: The Mode-Averaging Problem

We visualize the entire denoising process on AntMaze-Giant-Stitch: starting from pure noise, both methods iteratively refine multiple overlapping local plans into a single composed trajectory from start to goal. Each task panel shows 20 sampled plans for the corresponding test-time task. Plans that violate environment constraints (wall penetration) are coloured red; feasible plans are coloured green. CompDiffuser frequently produces mode-averaged trajectories in low-density regions, yielding plans that cut through walls, while RCD steers each denoising step toward high-density modes and recovers globally coherent paths.


Key Insight: Self-Reconstruction Error as a Density Proxy

A pretrained diffusion model already provides an intrinsic density signal. When a sample is noised and then denoised, a sample in a high-density region is reconstructed faithfully, whereas a mode-averaged sample in a low-density region is not: the denoiser pulls it toward one of the nearby modes. The resulting self-reconstruction error therefore serves as a training-free density proxy for compositional guidance, computed using only the local diffusion model itself.
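To make the proxy concrete, here is a minimal 1-D sketch, not RCD's implementation. It assumes the data are an equal-weight mixture of two narrow Gaussians at $\pm 1$ (mirroring the interior modes of the toy experiment), so the ideal denoiser $\mathbb{E}[x_0 \mid x_t]$ has a closed form that stands in for a trained network. The noise level `sigma`, the mean-squared-error metric, and the names `denoise` and `self_reconstruction_error` are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy data: equal-weight mixture of two narrow Gaussians at -1 and +1.
MODES = (-1.0, 1.0)
DATA_STD = 0.1


def denoise(x_t: float, sigma: float) -> float:
    """Exact posterior mean E[x0 | x_t] for x_t = x0 + sigma * eps.

    Stands in for a trained denoiser; a real planner would query its network here.
    """
    var = DATA_STD ** 2 + sigma ** 2
    # Log responsibility of each mode given the noisy observation.
    logits = [-(x_t - m) ** 2 / (2 * var) for m in MODES]
    top = max(logits)
    weights = [math.exp(lg - top) for lg in logits]
    total = sum(weights)
    shrink = DATA_STD ** 2 / var  # per-mode Gaussian posterior shrinkage
    return sum((w / total) * (m + shrink * (x_t - m)) for w, m in zip(weights, MODES))


def self_reconstruction_error(x0: float, sigma: float = 0.5,
                              n: int = 4000, seed: int = 0) -> float:
    """Monte Carlo mean squared error of a noise -> denoise round trip from x0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x_t = x0 + sigma * rng.gauss(0.0, 1.0)    # forward noising
        total += (denoise(x_t, sigma) - x0) ** 2  # reconstruction error
    return total / n


err_mode = self_reconstruction_error(1.0)  # sample sitting on a mode
err_mid = self_reconstruction_error(0.0)   # mode-averaged sample between the modes
print(f"on-mode: {err_mode:.3f}  mode-averaged: {err_mid:.3f}")
```

In this toy, the on-mode sample survives the round trip nearly unchanged, while the midpoint $0$ is pushed toward $+1$ or $-1$ and so incurs a much larger error, which is the ordering the density proxy relies on.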

Combined with an overlap consistency term that penalizes score disagreement at segment boundaries, RCD steers denoising toward a tilted distribution that concentrates on high-density, globally coherent plans, mitigating mode-averaging without the population-based resampling and ranking overhead of prior search-based approaches.
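The page does not give the exact form of either term, so the following is only a sketch under stated assumptions: the overlap term is taken to be the squared difference between neighbouring segments' predictions on shared timesteps, and the tilted target is taken to be $p(\tau)\exp(-w\,[R(\tau) + \lambda C(\tau)])$, where $R$ is the self-reconstruction error and $C$ the overlap term. The weight `lam` ($\lambda$) and both function names are hypothetical.

```python
from typing import Sequence


def overlap_consistency(preds: Sequence[Sequence[float]], overlap: int) -> float:
    """Sum of squared disagreements between neighbouring segments on shared steps.

    preds[i] is segment i's prediction (e.g. denoised values or scores) over its
    own window; the last `overlap` entries of segment i are assumed to cover the
    same timesteps as the first `overlap` entries of segment i + 1.
    """
    if overlap < 1:
        raise ValueError("segments must overlap by at least one step")
    total = 0.0
    for left, right in zip(preds, preds[1:]):
        for a, b in zip(left[-overlap:], right[:overlap]):
            total += (a - b) ** 2
    return total


def tilted_log_density(log_p: float, recon_error: float, consistency: float,
                       w: float, lam: float = 1.0) -> float:
    """Unnormalised log of p(tau) * exp(-w * (recon_error + lam * consistency))."""
    return log_p - w * (recon_error + lam * consistency)


agree = [[0.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
clash = [[0.0, 1.0, 1.0], [-1.0, -1.0, -1.0], [-1.0, -1.0, 0.0]]
print(overlap_consistency(agree, 1))  # 0.0: neighbours agree on every shared step
print(overlap_consistency(clash, 1))  # 4.0: segments 0 and 1 disagree (+1 vs -1)
```

Raising `w` sharpens the tilt toward plans that both reconstruct well and agree at their boundaries; RCD applies this kind of preference as guidance during denoising rather than by resampling or ranking a population.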

Toy Experiment

The training data consists of overlapping length-$3$ segments, each shown in a distinct color. The data are anchored at a fixed start ($x_1\!=\!0$) and goal ($x_L\!=\!0$), and the interior positions follow a bimodal distribution with modes at $+1$ and $-1$. No full long-horizon trajectory is ever seen during training. At test time, the planner must compose these short segments into a long-horizon trajectory without averaging across the two incompatible interior modes.

Training data.
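A dataset of this kind can be sketched as follows. Several details are assumptions not stated on this page: segments are cut with stride 1, each segment draws its interior mode independently (so no sample spans the full horizon), and interior values carry small Gaussian jitter with a hypothetical `NOISE_STD`. Positions are 0-based, so index `0` corresponds to $x_1$ and index `L - 1` to $x_L$; `make_toy_segments` is an illustrative name.

```python
import random
from typing import List, Tuple

SEG_LEN = 3        # length-3 segments, as in the toy experiment
NOISE_STD = 0.05   # hypothetical jitter around each mode


def make_toy_segments(L: int, n_per_window: int,
                      seed: int = 0) -> List[Tuple[int, List[float]]]:
    """Return (window_start, values) pairs of overlapping length-3 segments.

    Positions 0 and L - 1 are pinned to 0 (start and goal); every interior
    position sits near +1 or -1, with one mode drawn independently per segment.
    """
    if L < SEG_LEN:
        raise ValueError("horizon must be at least the segment length")
    rng = random.Random(seed)
    data = []
    for start in range(L - SEG_LEN + 1):  # stride-1 sliding windows
        for _ in range(n_per_window):
            mode = rng.choice((-1.0, 1.0))
            values = []
            for pos in range(start, start + SEG_LEN):
                if pos in (0, L - 1):
                    values.append(0.0)  # anchored start / goal
                else:
                    values.append(mode + rng.gauss(0.0, NOISE_STD))
            data.append((start, values))
    return data


segments = make_toy_segments(L=6, n_per_window=20, seed=1)
print(len(segments), segments[0])
```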

CompDiffuser vs. RCD

Pick a horizon $L$ and an RCD guidance weight $w$. The left panel always shows the CompDiffuser baseline (no guidance) at the chosen $L$; the right panel shows RCD at the chosen $(L, w)$. Green trajectories are valid (within the high-density mode bands at every position and mode-consistent across the interior); red trajectories are invalid (mode-averaged or off-mode).
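The green/red rule just stated can be written as a small predicate. This is a sketch: the band half-width `BAND` is a hypothetical value, since the page does not give the exact tolerance used to colour trajectories, and `is_valid` is an illustrative name.

```python
from typing import Sequence

BAND = 0.25  # hypothetical half-width of each high-density band


def is_valid(traj: Sequence[float], band: float = BAND) -> bool:
    """True if the plan stays in a mode band everywhere and uses one interior mode."""
    if len(traj) < 3:
        return False
    # Anchored endpoints must stay inside the band around 0.
    if abs(traj[0]) > band or abs(traj[-1]) > band:
        return False
    interior = traj[1:-1]
    # Each interior point must lie within the band around +1 or around -1 ...
    if any(abs(abs(x) - 1.0) > band for x in interior):
        return False
    # ... and all interior points must share the same mode (no switching).
    return all(x > 0 for x in interior) or all(x < 0 for x in interior)


print(is_valid([0.0, 1.0, 1.0, 1.0, 0.0]))    # mode-consistent: green
print(is_valid([0.0, 0.02, 0.0, -0.01, 0.0]))  # mode-averaged: red
```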

[Figure: CompDiffuser (left) and RCD (right) samples for the chosen horizon $L$ and guidance weight $w$.]

What to look for. As $L$ grows, the CompDiffuser panel collapses to the mode-averaged centre band (red), while RCD with sufficient $w$ recovers the two true modes (green).
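Why the centre band in particular? A deliberately simplified stitching model (an assumption for illustration, not CompDiffuser's actual conditioning scheme) shows the mechanism: if overlapping segment predictions are averaged position by position, neighbours that commit to opposite modes average to values near $0$ on their shared steps. `stitch_by_averaging` is a hypothetical helper.

```python
from typing import List, Sequence


def stitch_by_averaging(segments: Sequence[Sequence[float]],
                        stride: int, L: int) -> List[float]:
    """Compose overlapping segments into one length-L plan by averaging overlaps.

    Segment i is assumed to start at position i * stride.
    """
    sums = [0.0] * L
    counts = [0] * L
    for i, seg in enumerate(segments):
        for j, value in enumerate(seg):
            sums[i * stride + j] += value
            counts[i * stride + j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Neighbours that agree stitch cleanly; neighbours that disagree meet at 0.
print(stitch_by_averaging([[0.0, 1.0, 1.0], [1.0, 1.0, 0.0]], stride=2, L=5))
print(stitch_by_averaging([[0.0, 1.0, 1.0], [-1.0, -1.0, 0.0]], stride=2, L=5))
```

Under the further simplification that each of $K$ segments picks its mode independently and uniformly, all $K$ agree with probability $(1/2)^{K-1}$, so the chance of at least one averaged junction grows quickly with the horizon.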

Qualitative Results in OGBench

OGBench PointMaze

[Figure: qualitative results for Tasks 1 to 5.]

OGBench AntMaze

[Figure: qualitative results for Tasks 1 to 5.]

OGBench HumanoidMaze

[Figure: qualitative results for Tasks 1 to 5.]

OGBench AntSoccer (Object Manipulation)

[Figure: qualitative results for Tasks 1 to 5.]

OGBench Visual AntMaze (Pixel Observations)

[Figure: qualitative results for Tasks 1 to 5.]

OGBench Cube Manipulation

[Figure: qualitative results for Tasks 1 to 5.]

Conclusion

We presented RCD, a training-free guidance method that addresses the mode-averaging problem in compositional diffusion planning. By combining a self-reconstruction density proxy with an overlap consistency term, RCD steers compositional sampling toward high-density, globally coherent plans without any extra training, architectural change, or population-based search. Across long-horizon tasks in OGBench, including locomotion, manipulation, and pixel-based observation environments, RCD consistently outperforms prior compositional and search-based methods while remaining plug-and-play and broadly applicable to existing diffusion planners.