TL;DR: SafeFlow is a real-time text-driven humanoid whole-body control framework that combines physics-guided rectified flow matching for executable motion generation with a 3-Stage Safety Gate for robust deployment under open-ended and out-of-distribution text inputs.
Recent advances in real-time interactive text-driven motion generation have enabled humanoids to perform diverse behaviors. However, kinematics-only generators often exhibit physical hallucinations, producing motion trajectories that are physically infeasible to track with a downstream motion tracking controller or unsafe for real-world deployment. These failures often arise from the lack of explicit physics-aware objectives for real-robot execution and become more severe under out-of-distribution (OOD) user inputs. Hence, we propose SafeFlow, a text-driven humanoid whole-body control framework that combines physics-guided motion generation with a 3-Stage Safety Gate driven by explicit risk indicators. SafeFlow adopts a two-level architecture. At the high level, we generate motion trajectories using Physics-Guided Rectified Flow Matching in a VAE latent space to improve real-robot executability, and further accelerate sampling via Reflow to reduce the number of function evaluations (NFE) for real-time control. The 3-Stage Safety Gate enables selective execution by detecting semantic OOD prompts using a Mahalanobis score in text-embedding space, filtering unstable generations via a directional sensitivity discrepancy metric, and enforcing final hard kinematic constraints such as joint and velocity limits before passing the generated trajectory to a low-level motion tracking controller. Extensive experiments on the Unitree G1 demonstrate that SafeFlow outperforms prior diffusion-based methods in success rate, physical compliance, and inference speed, while maintaining diverse expressiveness.
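The link between Reflow and a low NFE can be illustrated with a toy Euler sampler. This is a minimal sketch, not the SafeFlow generator: `sample_flow`, `straight_velocity`, and `nonuniform_velocity` are hypothetical names, and the hand-written velocity fields stand in for a learned network. It only shows that when the flow moves along a straight line at constant velocity, one Euler step already lands on the endpoint, whereas a flow with time-varying velocity needs more steps.

```python
import numpy as np

def sample_flow(velocity_fn, z0, nfe):
    """Integrate dz/dt = v(z, t) from t=0 to t=1 using `nfe` Euler steps."""
    z = np.asarray(z0, dtype=float).copy()
    dt = 1.0 / nfe
    for k in range(nfe):
        t = k * dt
        z = z + dt * velocity_fn(z, t)  # one function evaluation per step
    return z

# Toy latent endpoints (illustrative values only).
z0 = np.zeros(3)
target = np.array([1.0, -2.0, 0.5])

def straight_velocity(z, t):
    # Constant velocity: the path z(t) = z0 + t * (target - z0) is a straight line.
    return target - z0

def nonuniform_velocity(z, t):
    # Time-varying velocity: the path z(t) = z0 + t^2 * (target - z0).
    return 2.0 * t * (target - z0)

straight_nfe1 = sample_flow(straight_velocity, z0, nfe=1)       # exact with one step
nonuniform_nfe1 = sample_flow(nonuniform_velocity, z0, nfe=1)   # does not move at all
nonuniform_nfe10 = sample_flow(nonuniform_velocity, z0, nfe=10) # reaches 90% of the way
```

In this toy setting the straightened flow is solved exactly at NFE=1, which is the property that reflow distillation aims to approximate for the learned velocity field.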
A real-time text-driven humanoid whole-body control framework that couples physics-guided generation with deployment-time selective execution for robustness under unconstrained prompts.
Physics-guided rectified flow matching in a VAE latent space with reflow distillation reduces sampling to a single function evaluation (NFE=1, 92.6 Hz generator-only; ~67.7 Hz with full safety pipeline) while significantly improving the physical feasibility and real-robot executability of generated motions.
A 3-Stage Safety Gate that proactively blocks unsafe behaviors under OOD prompts via Mahalanobis semantic OOD filtering, a directional sensitivity discrepancy metric for generation instability, and hard kinematic screening.
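The Mahalanobis-based semantic filter in the last item can be sketched as follows. This is a rough illustration under stated assumptions: random vectors stand in for text embeddings of in-distribution prompts, a single Gaussian is fitted to them, and the acceptance threshold is calibrated as the 99th percentile of training scores. The function names, the ridge term, and the percentile are all hypothetical choices and are not taken from the source.

```python
import numpy as np

def fit_gaussian(embeddings, ridge=1e-6):
    """Estimate mean and inverse covariance of in-distribution embeddings."""
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False)
    cov += ridge * np.eye(embeddings.shape[1])  # small ridge for numerical stability
    return mu, np.linalg.inv(cov)

def mahalanobis_score(embedding, mu, cov_inv):
    """Distance of one embedding from the in-distribution Gaussian."""
    d = embedding - mu
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(0)
train_embeddings = rng.normal(size=(500, 8))  # stand-in for real text embeddings
mu, cov_inv = fit_gaussian(train_embeddings)

# Assumed calibration rule: accept prompts scoring below the 99th percentile
# of the training scores.
train_scores = [mahalanobis_score(e, mu, cov_inv) for e in train_embeddings]
threshold = float(np.quantile(train_scores, 0.99))

id_score = mahalanobis_score(train_embeddings[0], mu, cov_inv)
ood_score = mahalanobis_score(np.full(8, 6.0), mu, cov_inv)  # far from training data
is_ood = ood_score > threshold
```

Because the score depends only on a mean vector and one matrix, it is cheap to evaluate at deployment time.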
Failure Cases of a Baseline Text-Driven Reference Motion Generator: While a kinematics-only baseline produces physically feasible motions for simple prompts (a), it often generates infeasible references—including joint limit violations (b) and self-collisions (c)—even under in-distribution commands. For out-of-distribution prompts, the generation process becomes unstable, leading to structural collapse and unsafe, implausible full-body configurations (d). These failure modes underscore the critical need for physics-guided generation and runtime safety gating.
Overview of SafeFlow: Top (Deployment, Online): A 3-Stage Safety Gate hierarchically filters OOD semantics, generation instability, and kinematic violations. A reflow-accelerated high-level motion generator provides physically feasible reference motions. If accepted, these are executed by the downstream motion tracking controller; otherwise, a safe fallback is triggered. Bottom (Training, Offline): The motion generator is trained via VAE latent learning and physics-guided flow matching with reflow distillation (NFE=1). The motion tracking controller is trained in simulation via RL.
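The hierarchical accept-or-fallback logic described above can be sketched as a short decision function. This is an illustrative outline only: `GateConfig`, `hard_kinematic_ok`, `safety_gate`, the threshold values, and the finite-difference velocity check are assumptions, and the OOD and instability scores are treated as given inputs rather than computed here.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GateConfig:
    ood_threshold: float          # Stage 1 limit on the semantic OOD score
    instability_threshold: float  # Stage 2 limit on the instability score
    q_min: np.ndarray             # per-joint lower position limits
    q_max: np.ndarray             # per-joint upper position limits
    qd_max: np.ndarray            # per-joint velocity limits
    dt: float                     # time between trajectory frames, in seconds

def hard_kinematic_ok(traj, cfg):
    """Stage 3: check a (T, J) joint trajectory against position and velocity limits."""
    within_pos = np.all((traj >= cfg.q_min) & (traj <= cfg.q_max))
    vel = np.diff(traj, axis=0) / cfg.dt  # finite-difference joint velocities
    within_vel = np.all(np.abs(vel) <= cfg.qd_max)
    return bool(within_pos and within_vel)

def safety_gate(ood_score, instability_score, traj, cfg):
    """Return 'execute' or the stage that triggered the fallback."""
    if ood_score > cfg.ood_threshold:
        return "fallback:stage1"
    if abs(instability_score) > cfg.instability_threshold:
        return "fallback:stage2"
    if not hard_kinematic_ok(traj, cfg):
        return "fallback:stage3"
    return "execute"

# Hypothetical two-joint example.
cfg = GateConfig(1.0, 1.0, np.full(2, -1.0), np.full(2, 1.0), np.full(2, 5.0), dt=0.02)
smooth = np.tile(np.linspace(0.0, 0.5, 10)[:, None], (1, 2))
jumpy = smooth.copy()
jumpy[5, 0] = 2.0  # violates the position limit and causes a velocity spike

decision_ok = safety_gate(0.1, 0.1, smooth, cfg)
decision_ood = safety_gate(10.0, 0.1, smooth, cfg)
decision_unstable = safety_gate(0.1, 10.0, smooth, cfg)
decision_kinematic = safety_gate(0.1, 0.1, jumpy, cfg)
```

Ordering the checks from cheapest to most specific means a rejected prompt exits early, and only trajectories that pass every stage reach the tracking controller.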
SafeFlow improves generator compliance and downstream tracking fidelity. Comparing the TextOp baseline with the full SafeFlow pipeline (+ Guid. & Reflow), joint limit violations drop from 43.14% to 3.08%, and the success rate increases from 80.6% to 98.5%.
| Method | JV ↓ (generator-only) | SC ↓ (generator-only) | Succ. ↑ (system-level) | E_mpjpe ↓ (system-level) | E_vel ↓ (system-level) | E_acc ↓ (system-level) |
|---|---|---|---|---|---|---|
| TextOp (Baseline) | 43.14% | 11.05% | 80.6% | 81.42 | 0.23 | 10.61 |
| SafeFlow (Flow) | 12.75% | 7.25% | 92.7% | 55.32 | 0.17 | 7.98 |
| SafeFlow (+ Guid.) | 6.32% | 4.39% | 98.0% | 46.39 | 0.11 | 5.48 |
| SafeFlow (+ Guid. & Reflow) | 3.08% | 1.42% | 98.5% | 40.89 | 0.09 | 4.54 |
Kinematic Feasibility and Tracking Stability: Despite generating dynamic motions (left), our full pipeline, SafeFlow (+Guid. & Reflow), stabilizes kinematic references and improves tracking. (a) Generator-only: SafeFlow suppresses erratic spikes in CoM velocity and joint acceleration. (b) System-level: SafeFlow mitigates torque chattering and joint velocity spikes, enabling hardware-safe tracking.
Generation Instability Score 𝓡 Detects Failure-Prone References: Mean tracking MPJPE of 10-frame windows grouped into absolute 𝓡 quintiles for In-Distribution (ID) and Out-of-Distribution (OOD) sequences. MPJPE increases monotonically with 𝓡, indicating that high-𝓡 windows correspond to physically unstable references.
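The quintile analysis in this caption can be reproduced in outline as follows. This is a sketch on synthetic data: the scores and errors below are randomly generated for illustration and are not the paper's measurements, and `mean_by_quintile` is a hypothetical helper name.

```python
import numpy as np

def mean_by_quintile(scores, errors):
    """Group windows into quintiles of |score| and return the mean error per quintile."""
    abs_scores = np.abs(scores)
    edges = np.quantile(abs_scores, [0.2, 0.4, 0.6, 0.8])  # four inner quintile edges
    bins = np.digitize(abs_scores, edges)                   # bin index 0..4 per window
    return np.array([errors[bins == b].mean() for b in range(5)])

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)  # synthetic instability scores, one per window
# Synthetic tracking errors that grow with |score|, plus noise.
errors = 40.0 + 10.0 * np.abs(scores) + rng.normal(scale=1.0, size=1000)

quintile_means = mean_by_quintile(scores, errors)
is_monotonic = bool(np.all(np.diff(quintile_means) > 0))
```

A monotonic increase across the five bins is the pattern the caption reports for both ID and OOD sequences.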
Instability Score-Triggered Safe Fallback: When the instability score 𝓡 exceeds the fallback threshold due to unstable flow dynamics, Stage 2 temporarily overrides the current command, injects a standing prompt, and interpolates the tracker reference toward a predefined standing pose. Without Stage 2, the robot fails to track the unstable reference motion; with Stage 2 enabled, it remains stable and awaits the next prompt.
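The override-and-interpolate behavior described above might look like the following. This is a minimal sketch under assumptions: the blend is a linear interpolation in joint space over a fixed number of frames, the standing prompt text is the placeholder `"stand"`, and `blend_to_standing` and `step_stage2` are hypothetical names. The source does not specify the interpolation scheme or the prompt wording.

```python
import numpy as np

def blend_to_standing(current_ref, standing_pose, num_frames):
    """Linearly interpolate from the current reference pose to the standing pose."""
    # Exclude alpha=0 so the first frame already moves away from the unstable reference.
    alphas = np.linspace(0.0, 1.0, num_frames + 1)[1:, None]
    return (1.0 - alphas) * current_ref + alphas * standing_pose

def step_stage2(instability, threshold, prompt, current_ref, standing_pose, num_frames=10):
    """Return (active_prompt, fallback_trajectory or None)."""
    if abs(instability) > threshold:
        # Override the user command and steer the tracker reference to standing.
        return "stand", blend_to_standing(current_ref, standing_pose, num_frames)
    return prompt, None

# Hypothetical three-joint poses.
current_ref = np.array([0.8, -0.4, 1.2])
standing_pose = np.zeros(3)

prompt_safe, traj_safe = step_stage2(0.2, 1.0, "wave hands", current_ref, standing_pose)
prompt_fb, traj_fb = step_stage2(3.5, 1.0, "double backflip", current_ref, standing_pose)
```

Ending the blend exactly at the standing pose leaves the robot in a known configuration while it awaits the next prompt.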
Via reflow distillation, SafeFlow achieves real-time inference at ~67.7 Hz with the complete safety pipeline, adding only 3.98 ms overhead for the 3-Stage Safety Gate.
| Pipeline Component | Added (ms) ↓ | Latency (ms) ↓ | Freq. (Hz) ↑ |
|---|---|---|---|
| TextOp Generator (Baseline) | - | 23.59 | 42.4 |
| SafeFlow Generator (+ Guid.) | - | 172.03 | 5.8 |
| SafeFlow Generator (+ Guid. & Reflow) | - | 10.80 | 92.6 |
| + Stage 1 (Semantic OOD) | 0.006 | 10.81 | 92.5 |
| + Stage 2 (Generation Instability) | 3.96 | 14.77 | 67.7 |
| + Stage 3 (Hard Kinematic Screen) | 0.013 | 14.78 | 67.7 |
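The frequency column follows from the latency column as 1000 / latency in milliseconds, and the gate overhead is the sum of the three per-stage additions. The short check below recomputes those figures from the table values. The variable names are illustrative.

```python
def hz(latency_ms):
    """Convert a per-call latency in milliseconds to a rate in Hz."""
    return 1000.0 / latency_ms

generator_ms = 10.80  # SafeFlow Generator (+ Guid. & Reflow)
stage_added_ms = {"stage1": 0.006, "stage2": 3.96, "stage3": 0.013}

gate_overhead_ms = sum(stage_added_ms.values())  # total added by the 3-Stage Safety Gate
total_ms = generator_ms + gate_overhead_ms       # full pipeline latency

generator_hz = hz(generator_ms)
pipeline_hz = hz(total_ms)
reflow_speedup = 172.03 / generator_ms           # guided generator without vs with reflow

print(f"gate overhead: {gate_overhead_ms:.2f} ms")
print(f"generator-only: {generator_hz:.1f} Hz, full pipeline: {pipeline_hz:.1f} Hz")
print(f"reflow speedup over guided generator: {reflow_speedup:.1f}x")
```

Stage 2 accounts for nearly all of the added latency, so it is the stage to target if the pipeline rate needs to move closer to the generator-only rate.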
Real-Robot Deployment of SafeFlow on Unitree G1: The robot executes a continuous long-horizon command sequence with smooth transitions across diverse behaviors, including upper-body gestures ("wave hands", "punch") and whole-body actions ("squat down", "hop on left leg"). A high-risk prompt ("double backflip") is included. The 3-Stage Safety Gate filters the unsafe reference and triggers a standing fallback, allowing the robot to maintain balance and continue execution under subsequent prompts. This demonstrates sim-to-real transferability and deployment-time safety on hardware.
@article{cho2026safeflow,
title = {SafeFlow: Real-Time Text-Driven Humanoid Whole-Body Control via Physics-Guided Rectified Flow and Selective Safety Gating},
author = {Cho, Hanbyel and Kim, Sang-Hun and Kang, Jeonguk and Koo, Donghan},
journal = {arXiv preprint},
year = {2026}
}