SafeFlow: Real-Time Text-Driven Humanoid Whole-Body Control via Physics-Guided Rectified Flow and Selective Safety Gating

TL;DR: SafeFlow is a real-time text-driven humanoid whole-body control framework that combines physics-guided rectified flow matching for executable motion generation with a 3-Stage Safety Gate for robust deployment under open-ended and out-of-distribution text inputs.

Abstract

Recent advances in real-time interactive text-driven motion generation have enabled humanoids to perform diverse behaviors. However, kinematics-only generators often exhibit physical hallucinations, producing motion trajectories that are physically infeasible to track with a downstream motion tracking controller or unsafe for real-world deployment. These failures often arise from the lack of explicit physics-aware objectives for real-robot execution and become more severe under out-of-distribution (OOD) user inputs. Hence, we propose SafeFlow, a text-driven humanoid whole-body control framework that combines physics-guided motion generation with a 3-Stage Safety Gate driven by explicit risk indicators. SafeFlow adopts a two-level architecture. At the high level, we generate motion trajectories using Physics-Guided Rectified Flow Matching in a VAE latent space to improve real-robot executability, and further accelerate sampling via Reflow to reduce the number of function evaluations (NFE) for real-time control. The 3-Stage Safety Gate enables selective execution by detecting semantic OOD prompts using a Mahalanobis score in text-embedding space, filtering unstable generations via a directional sensitivity discrepancy metric, and enforcing final hard kinematic constraints such as joint and velocity limits before passing the generated trajectory to a low-level motion tracking controller. Extensive experiments on the Unitree G1 demonstrate that SafeFlow outperforms prior diffusion-based methods in success rate, physical compliance, and inference speed, while maintaining diverse expressiveness.

Key Contributions

SafeFlow Framework

A real-time text-driven humanoid whole-body control framework that couples physics-guided generation with deployment-time selective execution for robustness under unconstrained prompts.

Physics-Guided Rectified Flow + Reflow

Physics-guided rectified flow matching in a VAE latent space improves the physical feasibility and real-robot executability of generated motions. Reflow distillation then reduces sampling to a single function evaluation (NFE=1), giving 92.6 Hz for the generator alone and about 67.7 Hz with the full safety pipeline.
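To illustrate why straightened flows permit single-step sampling, here is a minimal sketch of fixed-step Euler integration of a velocity field from t=0 to t=1. The two toy fields (`straight_field`, `curved_field`) and all names are assumptions made for illustration; they stand in for the learned latent velocity network, which is not reproduced here.

```python
import math
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]

def euler_sample(velocity: VelocityField, x0: Vector, nfe: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with `nfe` Euler steps."""
    x = list(x0)
    dt = 1.0 / nfe
    for k in range(nfe):
        v = velocity(x, k * dt)  # one function evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def straight_field(target: Vector) -> VelocityField:
    """Toy field whose trajectories are straight lines ending at `target`."""
    def v(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return v

def curved_field(x: Vector, t: float) -> Vector:
    """Toy field with curved trajectories (a quarter-turn rotation)."""
    return [-0.5 * math.pi * x[1], 0.5 * math.pi * x[0]]

x0 = [1.0, 0.0]
target = [0.0, 1.0]

# Straight trajectories: one step lands where fifty steps do.
straight_1 = euler_sample(straight_field(target), x0, nfe=1)
straight_50 = euler_sample(straight_field(target), x0, nfe=50)

# Curved trajectories: one step overshoots badly compared with fifty steps.
curved_1 = euler_sample(curved_field, x0, nfe=1)
curved_50 = euler_sample(curved_field, x0, nfe=50)
```

In this toy setting the straight field gives the same endpoint at NFE=1 and NFE=50, while the curved field does not, which is the property that reflow-style straightening is meant to exploit.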

Training-Free 3-Stage Safety Gate

Proactively blocks unsafe behaviors under OOD prompts through three checks: Mahalanobis-based semantic OOD filtering, a directional sensitivity discrepancy metric that flags generation instability, and hard kinematic screening.
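As a rough illustration of the first check, the sketch below scores an embedding by its Mahalanobis distance to a Gaussian fitted on in-distribution embeddings. The random vectors stand in for real text embeddings, and the covariance regularizer `eps` and the 95th-percentile threshold are assumptions, not values from the paper.

```python
import numpy as np

def fit_gaussian(embeddings: np.ndarray, eps: float = 1e-3):
    """Fit a mean and a regularized precision matrix to in-distribution embeddings."""
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False) + eps * np.eye(embeddings.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(x: np.ndarray, mu: np.ndarray, precision: np.ndarray) -> float:
    """Mahalanobis distance of x from the fitted distribution."""
    d = x - mu
    return float(np.sqrt(d @ precision @ d))

# Synthetic stand-in for embeddings of in-distribution prompts.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))
mu, precision = fit_gaussian(train)

# Hypothetical threshold: 95th percentile of in-distribution scores.
train_scores = [mahalanobis_score(e, mu, precision) for e in train]
threshold = float(np.percentile(train_scores, 95))

id_score = mahalanobis_score(train[0], mu, precision)
ood_score = mahalanobis_score(np.full(8, 10.0), mu, precision)  # far from the data
is_ood = ood_score > threshold
```

A prompt whose score exceeds the threshold would be rejected before any motion is executed; how the threshold is calibrated in practice is a design choice this sketch does not settle.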

Motivation

Failure Cases of a Baseline Text-Driven Reference Motion Generator: While a kinematics-only baseline produces physically feasible motions for simple prompts (a), it often generates infeasible references, including joint limit violations (b) and self-collisions (c), even under in-distribution commands. For out-of-distribution prompts, the generation process becomes unstable, leading to structural collapse and unsafe, implausible full-body configurations (d). These failure modes motivate physics-guided generation and runtime safety gating.

Method

Overview of SafeFlow: Top (Deployment, Online): A 3-Stage Safety Gate hierarchically filters OOD semantics, generation instability, and kinematic violations. A reflow-accelerated high-level motion generator provides physically feasible reference motions. If accepted, these are executed by the downstream motion tracking controller; otherwise, a safe fallback is triggered. Bottom (Training, Offline): The motion generator is trained via VAE latent learning and physics-guided flow matching with reflow distillation (NFE=1). The motion tracking controller is trained in simulation via RL.
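The deployment-time control flow described above can be sketched as a sequence of early-exit checks. This is a simplified illustration: the scores are passed in precomputed, and every threshold, limit, and name (`GateConfig`, `safety_gate`) is hypothetical rather than taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class GateConfig:
    """Hypothetical thresholds and limits for the three stages."""
    ood_threshold: float          # Stage 1 cutoff on the semantic OOD score
    instability_threshold: float  # Stage 2 cutoff on the instability score
    joint_lower: Sequence[float]  # per-joint lower limits (rad)
    joint_upper: Sequence[float]  # per-joint upper limits (rad)
    max_joint_speed: float        # per-joint speed limit (rad/s)
    dt: float                     # time between reference frames (s)

def within_kinematic_limits(traj: List[List[float]], cfg: GateConfig) -> bool:
    """Stage 3: check joint limits, then finite-difference joint speeds."""
    for frame in traj:
        for q, lo, hi in zip(frame, cfg.joint_lower, cfg.joint_upper):
            if not lo <= q <= hi:
                return False
    for prev, cur in zip(traj, traj[1:]):
        for q0, q1 in zip(prev, cur):
            if abs(q1 - q0) / cfg.dt > cfg.max_joint_speed:
                return False
    return True

def safety_gate(ood_score: float, instability_score: float,
                traj: List[List[float]], cfg: GateConfig) -> str:
    """Return 'execute' or the stage that triggered the fallback."""
    if ood_score > cfg.ood_threshold:
        return "fallback:stage1"
    if abs(instability_score) > cfg.instability_threshold:
        return "fallback:stage2"
    if not within_kinematic_limits(traj, cfg):
        return "fallback:stage3"
    return "execute"

cfg = GateConfig(3.0, 1.0, [-1.0, -1.0], [1.0, 1.0], 5.0, 0.02)
smooth = [[0.0, 0.0], [0.05, 0.0], [0.1, 0.0]]  # 2.5 rad/s, within limits
jumpy = [[0.0, 0.0], [0.5, 0.0]]                # 25 rad/s, too fast

decision_ok = safety_gate(1.0, 0.2, smooth, cfg)
decision_ood = safety_gate(9.0, 0.2, smooth, cfg)
decision_fast = safety_gate(1.0, 0.2, jumpy, cfg)
```

Ordering the checks from cheapest to most specific lets an early rejection skip the later stages.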

Results

Physical Executability & Tracking Fidelity

SafeFlow improves generator compliance and downstream tracking fidelity. Relative to the TextOp baseline, the full pipeline (+ Guid. & Reflow) reduces joint limit violations (JV) from 43.14% to 3.08% and raises the success rate from 80.6% to 98.5%.

	Generator-Only		System-Level Tracking Fidelity			
Method	JV ↓	SC ↓	Succ. ↑	E_mpjpe ↓	E_vel ↓	E_acc ↓
TextOp (Baseline)	43.14%	11.05%	80.6%	81.42	0.23	10.61
SafeFlow (Flow)	12.75%	7.25%	92.7%	55.32	0.17	7.98
SafeFlow (+ Guid.)	6.32%	4.39%	98.0%	46.39	0.11	5.48
SafeFlow (+ Guid. & Reflow)	3.08%	1.42%	98.5%	40.89	0.09	4.54

Kinematic Feasibility and Tracking Stability: Despite generating dynamic motions (left), our full pipeline, SafeFlow (+Guid. & Reflow), stabilizes kinematic references and improves tracking. (a) Generator-only: SafeFlow suppresses erratic spikes in CoM velocity and joint acceleration. (b) System-level: SafeFlow mitigates torque chattering and joint velocity spikes, enabling hardware-safe tracking.

Deployment-Time Safety and Robustness

Generation Instability Score 𝓡 Detects Failure-Prone References: Mean tracking MPJPE of 10-frame windows grouped into absolute 𝓡 quintiles for In-Distribution (ID) and Out-of-Distribution (OOD) sequences. MPJPE increases monotonically with 𝓡, indicating that high-𝓡 windows correspond to physically unstable references.
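The grouping behind this analysis can be sketched as follows: windows are sorted by absolute instability score, split into five equal-sized bins, and the tracking error is averaged per bin. The scores and errors below are synthetic placeholders chosen only to exercise the function, not measurements from the paper.

```python
from statistics import mean
from typing import List, Sequence

def mean_error_by_quintile(scores: Sequence[float],
                           errors: Sequence[float]) -> List[float]:
    """Average error within each quintile of |score| (needs at least 5 windows)."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    n = len(order)
    bins = []
    for q in range(5):
        idx = order[q * n // 5:(q + 1) * n // 5]
        bins.append(mean(errors[i] for i in idx))
    return bins

# Synthetic example: error grows with the score by construction.
toy_scores = [0.1 * i for i in range(10)]
toy_errors = [float(i) for i in range(10)]
quintile_means = mean_error_by_quintile(toy_scores, toy_errors)

# True when every bin's mean error is at least that of the bin before it.
is_monotone = all(a <= b for a, b in zip(quintile_means, quintile_means[1:]))
```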

Instability Score-Triggered Safe Fallback: When the instability score 𝓡 exceeds the fallback threshold due to unstable flow dynamics, Stage 2 temporarily overrides the current command, injects a standing prompt, and interpolates the tracker reference toward a predefined standing pose. Without Stage 2, the robot fails to track the unstable reference motion; with Stage 2 enabled, it remains stable and awaits the next prompt.
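A minimal sketch of the reference interpolation step, assuming linear blending in joint space over a fixed number of frames. The interpolation scheme, the frame count, and the function name `blend_to_standing` are assumptions; the text above only states that the tracker reference is interpolated toward a predefined standing pose.

```python
from typing import List, Sequence

def blend_to_standing(current: Sequence[float],
                      standing: Sequence[float],
                      num_steps: int) -> List[List[float]]:
    """Linearly interpolate joint targets from `current` to `standing`."""
    frames = []
    for k in range(1, num_steps + 1):
        alpha = k / num_steps  # reaches 1.0 on the final frame
        frames.append([(1.0 - alpha) * c + alpha * s
                       for c, s in zip(current, standing)])
    return frames

# Hypothetical three-joint example.
current_pose = [0.8, -0.4, 1.2]
standing_pose = [0.0, 0.0, 0.3]
fallback_frames = blend_to_standing(current_pose, standing_pose, num_steps=10)
```

The last frame equals the standing pose exactly, so the tracker ends on a known stable reference while waiting for the next prompt.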

Real-Time Performance

Via reflow distillation, SafeFlow achieves real-time inference at ~67.7 Hz with the complete safety pipeline, adding only 3.98 ms overhead for the 3-Stage Safety Gate.

Pipeline Component	Added (ms) ↓	Latency (ms) ↓	Freq. (Hz) ↑
TextOp Generator (Baseline)	-	23.59	42.4
SafeFlow Generator (+ Guid.)	-	172.03	5.8
SafeFlow Generator (+ Guid. & Reflow)	-	10.80	92.6
+ Stage 1 (Semantic OOD)	0.006	10.81	92.5
+ Stage 2 (Generation Instability)	3.96	14.77	67.7
+ Stage 3 (Hard Kinematic Screen)	0.013	14.78	67.7
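The table's arithmetic can be checked directly: total latency is the generator latency plus the per-stage overheads, and frequency is 1000 divided by the latency in milliseconds. The numbers below are copied from the table; the variable names are illustrative.

```python
# Values taken from the latency table (milliseconds).
generator_ms = 10.80
stage_overheads_ms = {
    "stage1_semantic_ood": 0.006,
    "stage2_generation_instability": 3.96,
    "stage3_hard_kinematic_screen": 0.013,
}

gate_overhead_ms = sum(stage_overheads_ms.values())
total_ms = generator_ms + gate_overhead_ms

generator_hz = 1000.0 / generator_ms
pipeline_hz = 1000.0 / total_ms

print(f"gate overhead: {gate_overhead_ms:.2f} ms")
print(f"generator only: {generator_hz:.1f} Hz")
print(f"full pipeline: {pipeline_hz:.1f} Hz")
```

Stage 2 accounts for nearly all of the gate overhead, while Stages 1 and 3 together add about 0.02 ms.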

Real-Robot Deployment

Real-Robot Deployment of SafeFlow on Unitree G1: The robot executes a continuous long-horizon command sequence with smooth transitions across diverse behaviors, including upper-body gestures ("wave hands", "punch") and whole-body actions ("squat down", "hop on left leg"). A high-risk prompt ("double backflip") is included. The 3-Stage Safety Gate filters the unsafe reference and triggers a standing fallback, allowing the robot to maintain balance and continue execution under subsequent prompts. This demonstrates sim-to-real transferability and deployment-time safety on hardware.

BibTeX

@article{cho2026safeflow,
    title     = {SafeFlow: Real-Time Text-Driven Humanoid Whole-Body Control via Physics-Guided Rectified Flow and Selective Safety Gating},
    author    = {Cho, Hanbyel and Kim, Sang-Hun and Kang, Jeonguk and Koo, Donghan},
    journal   = {arXiv preprint},
    year      = {2026}
}