Under Review

Start right,
arrive right.

Asynchronous execution of action-chunking robot policies via initial noise selection — not trajectory steering.

We introduce PAINT (Prefix-Anchored INiTial noise), a training-free inference method. Instead of correcting a policy mid-generation, PAINT chooses the right starting noise so that the unmodified flow naturally produces a chunk that continues the motion already underway.

Anonymous

Training-free · no gradients · no policy modification

The one-paragraph version

A noise problem,
not a steering problem.

Robot policies predict action chunks — short bursts of future moves. Generating one takes time, so the robot keeps acting while the next chunk is computed. By the time it arrives, the robot has already moved on, and the new chunk must pick up exactly where the motion is — or it jerks at the seam.

Prior methods fix this by steering the generation toward the executed actions. PAINT asks a different question: what if the trajectory never needed correcting? We invert the flow ODE to find an initial noise whose unmodified rollout already satisfies the constraint — then run the policy exactly as trained.

12 simulated benchmarks (Kinetix)
6 real-world manipulation tasks
3 robot embodiments: single-arm, bimanual, humanoid
0 gradients, retraining or policy edits

The problem

The prefix constraint

Under asynchronous inference the robot can’t wait. It keeps executing the current chunk while the next one is generated — and advances d steps in the meantime.

So the first d actions of the new chunk must match the last d actions already executed. Violate it and the robot lurches: jerky, unsafe motion at every chunk boundary. The gap only widens as larger VLAs push inference latency past the controller’s clock.

$A_{t-1}[s + i] = A_t[i] \quad \text{for } i = 0, \dots, d-1$
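The constraint above can be checked numerically. The sketch below is a minimal illustration, not the paper's evaluation code: the helper names `delay_steps` and `prefix_mismatch` are hypothetical, the 20 ms control period is an assumed value, and `s` is taken to be the offset into the previous chunk that appears in the equation.

```python
import math

import numpy as np


def delay_steps(latency_ms, control_period_ms):
    """Number of control steps d the robot advances while inference runs."""
    return math.ceil(latency_ms / control_period_ms)


def prefix_mismatch(prev_chunk, new_chunk, s, d):
    """Largest absolute gap between A_{t-1}[s + i] and A_t[i], i = 0 .. d-1."""
    executed = prev_chunk[s:s + d]
    return float(np.abs(executed - new_chunk[:d]).max())


# Toy scalar actions: the previous chunk is a steady ramp 0, 1, ..., 9.
prev_chunk = np.arange(10.0)
consistent = np.arange(4.0, 14.0)   # continues the ramp from index s = 4
independent = np.zeros(10)          # ignores the motion already underway

print(delay_steps(86, 20))                                # 86 ms latency, assumed 20 ms period
print(prefix_mismatch(prev_chunk, consistent, s=4, d=3))
print(prefix_mismatch(prev_chunk, independent, s=4, d=3))
```

A mismatch of zero means the new chunk continues the executed motion exactly; any positive value shows up as a jump at the chunk boundary.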

Figure: a chunk sampled independently jumps at the boundary, while a prefix-consistent chunk continues the motion.

The key idea

Two ways to satisfy the constraint

Both want the same prefix. They differ in where they intervene — during generation, or before it.

Prior work — e.g. RTC

Velocity steering

[Diagram labels: noise, target, correction at every step]

Push the velocity field toward the prefix at each denoising step — using backprop, retraining, or extra compute. Effective, but the orthogonal part of the correction can drag the trajectory off the policy’s learned flow.

PAINT — ours

Noise selection

[Diagram labels: chosen x₀*, target, unmodified flow runs straight]

Pick the starting noise x₀* so the standard ODE lands on the constraint on its own. The velocity field is never touched, so the chunk stays faithful to the policy’s own distribution.

This works because optimal-transport flow matching is approximately local: the noise at each position mostly governs the action at the same position. So inverting from a desired prefix recovers a noise that reproduces it — under the unmodified forward pass.
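For reference, the forward rollout and its inversion can be written as Euler updates with step size 1/N. The symbols follow the Algorithm 1 listing; the continuous ODE form on the first line is the usual flow-matching convention and is assumed here rather than quoted from the paper.

```latex
\frac{dx_\tau}{d\tau} = v_\pi(x_\tau, o_t, \tau), \qquad x_0 \sim \mathcal{N}(0, I)

\text{forward (noise to actions):} \quad
x_{\tau + 1/N} = x_\tau + \tfrac{1}{N}\, v_\pi(x_\tau, o_t, \tau)

\text{inversion (actions to noise):} \quad
x_{\tau - 1/N} = x_\tau - \tfrac{1}{N}\, v_\pi(x_\tau, o_t, \tau)
```

Because the inversion reuses the same velocity field with the sign of the step flipped, it needs only forward evaluations of the policy.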

How PAINT works

Six steps, all forward-only

The predicted chunk goes from a jump at the boundary to a clean continuation, just by changing where generation starts.

Algorithm 1: PAINT (Prefix-Anchored INiTial noise)
Require: observation oₜ, executed prefix, delay d, ODE steps N
  x₀ᶠʳᵉᵉ ~ 𝒩(0, I)
  x₁ⁿᵃⁱᵛᵉ ← πθ(x₀ᶠʳᵉᵉ, oₜ)                        # naive forward pass
  x₁ᵗᵃʳᵍᵉᵗ ← [ executed prefix | x₁ⁿᵃⁱᵛᵉ[d:] ]     # build target
  xτ ← x₁ᵗᵃʳᵍᵉᵗ
  for τ = 1, 1−1/N, … , 1/N:
      xτ ← xτ − (1/N)·vπ(xτ, oₜ, τ)                # backward Euler
  x₀ⁱⁿᵛ ← xτ
  x₀* ← [ x₀ⁱⁿᵛ[:d] | x₀ᶠʳᵉᵉ[d:] ]                 # Mao re-painting rule
  Aₜ ← πθ(x₀*, oₜ)                                # final forward pass
  return Aₜ

Every line is a forward model call. No vector-Jacobian products at deployment — so PAINT slots cleanly into graph-compiled runtimes like TensorRT.
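The steps of Algorithm 1 can be sketched in NumPy on a toy problem. Everything below is illustrative: the `velocity` function is a hand-written stand-in for a trained policy (each position is pulled toward the observation, with a weak neighbour coupling so the field is only approximately local), and the function names, chunk length, step count and prefix values are assumptions, not the authors' implementation.

```python
import numpy as np


def velocity(x, obs, tau, coupling=0.2):
    """Toy velocity field, an assumption standing in for a trained policy.

    `tau` mirrors the v(x, o, tau) signature but is unused in this toy field.
    """
    padded = np.pad(x, 1, mode="edge")
    neighbours = 0.5 * (padded[:-2] + padded[2:])
    return (obs - x) + coupling * (neighbours - x)


def forward(x0, obs, n_steps):
    """Noise -> actions with forward Euler (plays the role of pi_theta)."""
    x = x0.copy()
    for k in range(n_steps):
        x = x + velocity(x, obs, k / n_steps) / n_steps
    return x


def invert(x1, obs, n_steps):
    """Actions -> noise with the sign-flipped Euler update, tau = 1 ... 1/N."""
    x = x1.copy()
    for k in range(n_steps, 0, -1):
        x = x - velocity(x, obs, k / n_steps) / n_steps
    return x


def paint(obs, prefix, x0_free, n_steps):
    """Return (PAINT chunk, naive chunk) for the same fresh noise."""
    d = len(prefix)
    naive = forward(x0_free, obs, n_steps)                # naive forward pass
    target = np.concatenate([prefix, naive[d:]])          # build target
    x0_inv = invert(target, obs, n_steps)                 # inversion
    x0_star = np.concatenate([x0_inv[:d], x0_free[d:]])   # splice the noise
    return forward(x0_star, obs, n_steps), naive          # final forward pass


rng = np.random.default_rng(0)
horizon, n_steps = 8, 20
obs = np.zeros(horizon)
prefix = np.array([1.0, 1.1, 1.2])        # actions already executed (d = 3)
x0_free = rng.standard_normal(horizon)

chunk, naive = paint(obs, prefix, x0_free, n_steps)
d = len(prefix)
naive_err = np.abs(naive[:d] - prefix).mean()
paint_err = np.abs(chunk[:d] - prefix).mean()
print(f"naive prefix error: {naive_err:.3f}, PAINT prefix error: {paint_err:.3f}")
```

In this toy setting the inversion is only approximate, so the PAINT chunk does not reproduce the prefix exactly. The residual comes from Euler discretisation and from the neighbour coupling, and it should shrink as the step count grows or the coupling weakens.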

Results

Matches or beats gradient-based steering

Across simulation and hardware, PAINT meets or improves on Real-Time Chunking (RTC) on both task success and prefix consistency — while never touching the policy.

Metrics: SR ↑ success rate · ATR ↓ average time of successful rollouts (s) · CON ↓ prefix consistency error

Task (embodiment)              Method   SR ↑     ATR ↓     CON ↓
Block Stacking (single-arm)    TE       0.55     28.27     —
                               RTC      0.75*    16.04     0.030
                               PAINT    0.75*    15.32*    0.023*
Toy in Drawer (single-arm)     TE       0.60     23.19     —
                               RTC      0.75     17.85     0.025
                               PAINT    0.85*    17.08*    0.023*
Banana in Pot (single-arm)     TE       0.60     29.57*    —
                               RTC      0.70*    30.15     0.031
                               PAINT    0.70*    29.79     0.026*
Towel Flinging (bimanual)      TE       0.51     17.13     —
                               RTC      0.76     6.98*     0.028
                               PAINT    0.79*    7.44      0.023*
Shorts Folding (bimanual)      TE       0.90     33.82     —
                               RTC      0.90     18.79*    0.027
                               PAINT    0.95*    19.77     0.025*
Part Placing (humanoid)        TE       0.50     68.51     —
                               RTC      0.70*    18.28     0.030
                               PAINT    0.70*    17.32*    0.021*

PAINT is our method. Best value per task and metric is marked with *. 20 trials per method–task pair. TE is synchronous, so it has no CON score (—).

Simulated benchmark · Kinetix

Across 12 force-control environments and rising inference delay, naive async degrades sharply while temporal ensembling and B-spline smoothing barely help — neither conditions on the executed prefix.

PAINT-Euler holds the strongest delay robustness and the lowest prefix mismatch, consistently above RTC — without any gradient computation. Each point aggregates 2,048 trials.

Ablation: among inversion methods, backward Euler best balances quality and cost, on par with the 2× costlier DPM-Solver.

Figure 3: success rate and consistency vs. inference delay on the Kinetix benchmark.

Latency · GR00T-N1.5

PAINT 86 ± 2 ms  vs.  RTC 113 ± 3 ms

Faster than gradient-based steering on smaller VLAs.

Latency · π₀

PAINT 311 ± 7 ms  vs.  RTC 213 ± 4 ms

The trade-off: PAINT replaces backward passes with extra forward passes, which makes it slower than RTC on the larger π₀.
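The two latency comparisons can be turned into relative figures with a few lines of arithmetic. The sketch below uses only the mean latencies reported above. The `paint_velocity_evals` helper is a hypothetical back-of-the-envelope count that assumes each πθ call in Algorithm 1 integrates N Euler steps, so one naive pass, one inversion and one final pass cost about 3N velocity evaluations.

```python
# Mean latencies in milliseconds, as reported above.
latency_ms = {
    "GR00T-N1.5": {"PAINT": 86, "RTC": 113},
    "pi0": {"PAINT": 311, "RTC": 213},
}


def paint_velocity_evals(n_steps):
    """Assumed cost model: naive pass + inversion + final pass, N steps each."""
    return 3 * n_steps


# Ratio below 1 means PAINT is faster than RTC; above 1 means slower.
ratios = {
    model: round(times["PAINT"] / times["RTC"], 2)
    for model, times in latency_ms.items()
}

for model, ratio in ratios.items():
    print(f"{model}: PAINT takes {ratio:.2f}x the RTC latency")
print(f"velocity evaluations for N = 10: {paint_velocity_evals(10)}")
```

Under that cost model, whether PAINT comes out ahead depends on how expensive one backward pass is relative to the extra forward passes on a given model, which matches the opposite outcomes on the two VLAs.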

Real-world tasks

Six tasks, three embodiments

Single-arm, bimanual (ALOHA) and humanoid — on two VLA architectures, under natural network inference delay.