SafeFlow: Real-Time Text-Driven Humanoid Whole-Body Control via Physics-Guided Rectified Flow and Selective Safety Gating

Hanbyel Cho, Sang-Hun Kim, Jeonguk Kang, Donghan Koo
Future Robot AI Group, Samsung Electronics
arXiv Preprint

SafeFlow enables real-time text-driven humanoid control with physics-guided generation and a 3-Stage Safety Gate. When unsafe commands are detected, the system triggers a safe fallback to protect the robot.

TL;DR: SafeFlow is a real-time text-driven humanoid whole-body control framework that combines physics-guided rectified flow matching for executable motion generation with a 3-Stage Safety Gate for robust deployment under open-ended and out-of-distribution text inputs.

Abstract

Recent advances in real-time interactive text-driven motion generation have enabled humanoids to perform diverse behaviors. However, kinematics-only generators often exhibit physical hallucinations, producing motion trajectories that are physically infeasible to track with a downstream motion tracking controller or unsafe for real-world deployment. These failures often arise from the lack of explicit physics-aware objectives for real-robot execution and become more severe under out-of-distribution (OOD) user inputs. Hence, we propose SafeFlow, a text-driven humanoid whole-body control framework that combines physics-guided motion generation with a 3-Stage Safety Gate driven by explicit risk indicators. SafeFlow adopts a two-level architecture. At the high level, we generate motion trajectories using Physics-Guided Rectified Flow Matching in a VAE latent space to improve real-robot executability, and further accelerate sampling via Reflow to reduce the number of function evaluations (NFE) for real-time control. The 3-Stage Safety Gate enables selective execution by detecting semantic OOD prompts using a Mahalanobis score in text-embedding space, filtering unstable generations via a directional sensitivity discrepancy metric, and enforcing final hard kinematic constraints such as joint and velocity limits before passing the generated trajectory to a low-level motion tracking controller. Extensive experiments on the Unitree G1 demonstrate that SafeFlow outperforms prior diffusion-based methods in success rate, physical compliance, and inference speed, while maintaining diverse expressiveness.

Key Contributions

SafeFlow Framework

A real-time text-driven humanoid whole-body control framework that couples physics-guided generation with deployment-time selective execution for robustness under unconstrained prompts.

Physics-Guided Rectified Flow + Reflow

Physics-guided rectified flow matching in a VAE latent space with reflow distillation reduces sampling to a single function evaluation (NFE=1, 92.6 Hz generator-only; ~67.7 Hz with full safety pipeline) while significantly improving the physical feasibility and real-robot executability of generated motions.
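Why the number of function evaluations matters can be illustrated with a minimal Euler-integration sketch. The two velocity fields below are hypothetical toy stand-ins, not the trained SafeFlow network: one has constant velocity along a straight path (the regime reflow aims to approach), the other reaches the same endpoint with a time-varying speed.

```python
import numpy as np

def euler_sample(velocity, x0, nfe):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 using `nfe` Euler steps."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / nfe
    for k in range(nfe):
        t = k * dt
        x = x + dt * velocity(x, t)  # one function evaluation per step
    return x

# Hypothetical toy latents (illustration only).
x0 = np.zeros(2)                  # stand-in for a noise sample
target = np.array([1.0, -2.0])    # stand-in for a clean latent

def straight(x, t):
    # Constant velocity: the trajectory is a straight line at uniform speed.
    return target - x0

def time_varying(x, t):
    # Same start and end points, but the speed grows linearly with t.
    return 2.0 * t * (target - x0)

one_step_straight = euler_sample(straight, x0, nfe=1)
one_step_varying = euler_sample(time_varying, x0, nfe=1)
many_step_varying = euler_sample(time_varying, x0, nfe=100)
```

In this toy setting a single step recovers the target exactly for the constant-velocity field, while the time-varying field needs many steps to get close, which is the intuition behind straightening the flow before sampling with NFE=1.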

Training-Free 3-Stage Safety Gate

Proactively blocks unsafe behaviors under OOD prompts through three checks: Mahalanobis-based semantic OOD filtering, a directional sensitivity discrepancy metric that flags generation instability, and hard kinematic screening.
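A Mahalanobis-based OOD check of the kind named above can be sketched as follows. This is a minimal illustration under stated assumptions: a single Gaussian is fitted to training-prompt embeddings, the embeddings are random stand-ins rather than real text-encoder outputs, and the 99th-percentile threshold is a hypothetical choice, not a value from the paper.

```python
import numpy as np

def fit_gaussian(train_emb, ridge=1e-6):
    """Estimate the mean and inverse covariance of (N, D) training embeddings."""
    mean = train_emb.mean(axis=0)
    cov = np.cov(train_emb, rowvar=False)
    cov += ridge * np.eye(train_emb.shape[1])  # small ridge keeps the inverse stable
    return mean, np.linalg.inv(cov)

def mahalanobis_score(emb, mean, precision):
    """Distance of one embedding from the training distribution."""
    diff = emb - mean
    return float(np.sqrt(diff @ precision @ diff))

# Stand-in data: real usage would embed the training prompts with a text encoder.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(500, 8))
mean, precision = fit_gaussian(train_emb)

# Assumed thresholding rule: flag anything beyond the 99th percentile of training scores.
train_scores = np.array([mahalanobis_score(e, mean, precision) for e in train_emb])
threshold = np.quantile(train_scores, 0.99)

near_score = mahalanobis_score(np.zeros(8), mean, precision)      # close to the data
far_score = mahalanobis_score(np.full(8, 6.0), mean, precision)   # far from the data
```

Because the statistics are fitted once offline, the runtime check reduces to one matrix-vector product per prompt.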

Motivation

Failure Cases of a Baseline Text-Driven Reference Motion Generator: While a kinematics-only baseline produces physically feasible motions for simple prompts (a), it often generates infeasible references—including joint limit violations (b) and self-collisions (c)—even under in-distribution commands. For out-of-distribution prompts, the generation process becomes unstable, leading to structural collapse and unsafe, implausible full-body configurations (d). These failure modes underscore the critical need for physics-guided generation and runtime safety gating.

Method

Overview of SafeFlow: Top (Deployment, Online): A 3-Stage Safety Gate hierarchically filters OOD semantics, generation instability, and kinematic violations. A reflow-accelerated high-level motion generator provides physically feasible reference motions. If accepted, these are executed by the downstream motion tracking controller; otherwise, a safe fallback is triggered. Bottom (Training, Offline): The motion generator is trained via VAE latent learning and physics-guided flow matching with reflow distillation (NFE=1). The motion tracking controller is trained in simulation via RL.
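The accept-or-fallback logic in the deployment path can be sketched as a short cascade. This is an illustrative sketch only: the `cfg` dictionary, the threshold values, and the function names are hypothetical, the Stage 1 and Stage 2 scores are taken as precomputed inputs, and the sketch does not model when each stage runs relative to generation.

```python
import numpy as np

def violates_limits(traj, q_min, q_max, qd_max, dt):
    """Return True if a (T, J) joint trajectory breaks position or velocity limits."""
    out_of_range = np.any((traj < q_min) | (traj > q_max))
    velocity = np.diff(traj, axis=0) / dt  # finite-difference joint velocity
    too_fast = np.any(np.abs(velocity) > qd_max)
    return bool(out_of_range or too_fast)

def safety_gate(ood_score, instability, traj, cfg):
    """Apply the three checks in order; the first failure triggers the fallback."""
    if ood_score > cfg["ood_thresh"]:            # Stage 1: semantic OOD
        return "fallback: semantic OOD"
    if instability > cfg["inst_thresh"]:         # Stage 2: generation instability
        return "fallback: unstable generation"
    if violates_limits(traj, cfg["q_min"], cfg["q_max"], cfg["qd_max"], cfg["dt"]):
        return "fallback: kinematic violation"   # Stage 3: hard kinematic screen
    return "execute"

# Hypothetical limits and thresholds (illustration only).
cfg = {"ood_thresh": 4.0, "inst_thresh": 1.0,
       "q_min": -1.0, "q_max": 1.0, "qd_max": 2.0, "dt": 0.02}

# Two joints moving slowly from 0.0 to 0.1 rad over 11 frames.
safe_traj = np.linspace(0.0, 0.1, 11)[:, None] * np.ones((1, 2))

decision_safe = safety_gate(1.0, 0.2, safe_traj, cfg)
decision_ood = safety_gate(9.0, 0.2, safe_traj, cfg)
decision_kin = safety_gate(1.0, 0.2, 20.0 * safe_traj, cfg)
```

Ordering the checks from cheapest to most specific means an out-of-distribution prompt is rejected before any trajectory-level test is evaluated.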

Results

Physical Executability & Tracking Fidelity

SafeFlow improves generator compliance and downstream tracking fidelity. Relative to the TextOp baseline, the full pipeline (+ Guid. & Reflow) reduces joint limit violations (JV) from 43.14% to 3.08% and raises the success rate from 80.6% to 98.5%.

Method                        Generator-Only      System-Level   Tracking Fidelity
                              JV ↓      SC ↓      Succ. ↑        E_mpjpe   E_vel   E_acc
TextOp (Baseline)             43.14%    11.05%    80.6%          81.42     0.23    10.61
SafeFlow (Flow)               12.75%    7.25%     92.7%          55.32     0.17    7.98
SafeFlow (+ Guid.)            6.32%     4.39%     98.0%          46.39     0.11    5.48
SafeFlow (+ Guid. & Reflow)   3.08%     1.42%     98.5%          40.89     0.09    4.54

Kinematic Feasibility and Tracking Stability: Despite generating dynamic motions (left), our full pipeline, SafeFlow (+Guid. & Reflow), stabilizes kinematic references and improves tracking. (a) Generator-only: SafeFlow suppresses erratic spikes in CoM velocity and joint acceleration. (b) System-level: SafeFlow mitigates torque chattering and joint velocity spikes, enabling hardware-safe tracking.

Deployment-Time Safety and Robustness

Generation Instability Score 𝓡 Detects Failure-Prone References: Mean tracking MPJPE of 10-frame windows grouped into absolute 𝓡 quintiles for In-Distribution (ID) and Out-of-Distribution (OOD) sequences. MPJPE increases monotonically with 𝓡, indicating that high-𝓡 windows correspond to physically unstable references.
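The quintile analysis described in this caption can be reproduced in outline as follows. This is a sketch under assumptions: the scores and errors are synthetic stand-ins, and the quintile edges are taken as empirical quantiles of the supplied scores, which may differ from how the paper defines its bins.

```python
import numpy as np

def mean_error_by_quintile(scores, errors):
    """Group windows into score quintiles and return the mean error of each group."""
    scores = np.asarray(scores, dtype=float)
    errors = np.asarray(errors, dtype=float)
    edges = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])  # four interior quintile edges
    bins = np.digitize(scores, edges)                  # bin index 0..4 per window
    return np.array([errors[bins == b].mean() for b in range(5)])

# Synthetic stand-in: 100 windows whose error grows linearly with the score.
scores = np.arange(100.0)
errors = 2.0 * scores + 1.0

quintile_means = mean_error_by_quintile(scores, errors)
is_monotonic = bool(np.all(np.diff(quintile_means) > 0))
```

A strictly increasing `quintile_means` is the pattern the caption reports for the real tracking data.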

Instability Score-Triggered Safe Fallback: When the instability score 𝓡 exceeds the fallback threshold due to unstable flow dynamics, Stage 2 temporarily overrides the current command, injects a standing prompt, and interpolates the tracker reference toward a predefined standing pose. Without Stage 2, the robot fails to track the unstable reference motion; with Stage 2 enabled, it remains stable and awaits the next prompt.
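The reference override described above can be sketched as a threshold test followed by a blend toward the standing pose. Linear interpolation, the step count, and all names below are assumptions made for illustration; the source does not specify the interpolation scheme, and the standing-prompt injection is not modeled here.

```python
import numpy as np

def blend_to_standing(current_pose, standing_pose, n_steps):
    """Linearly interpolate from the current pose to the standing pose."""
    alphas = np.linspace(0.0, 1.0, n_steps + 1)[1:]  # skip 0 so the first frame already moves
    return np.array([(1.0 - a) * current_pose + a * standing_pose for a in alphas])

def select_reference(instability, generated_ref, current_pose, standing_pose,
                     threshold, n_steps=10):
    """Return (reference, fallback_triggered) for the tracking controller."""
    if instability > threshold:
        return blend_to_standing(current_pose, standing_pose, n_steps), True
    return generated_ref, False

# Hypothetical 3-joint example (illustration only).
current = np.array([0.4, -0.2, 0.1])
standing = np.zeros(3)
generated = np.ones((5, 3))

fallback_ref, triggered = select_reference(2.0, generated, current, standing,
                                           threshold=1.0, n_steps=4)
normal_ref, not_triggered = select_reference(0.5, generated, current, standing,
                                             threshold=1.0, n_steps=4)
```

Blending from the current pose avoids handing the tracker a discontinuous jump at the moment the fallback fires.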

Real-Time Performance

Via reflow distillation, SafeFlow achieves real-time inference at ~67.7 Hz with the complete safety pipeline, adding only 3.98 ms overhead for the 3-Stage Safety Gate.

Pipeline Component                      Added (ms) ↓   Latency (ms) ↓   Freq. (Hz) ↑
TextOp Generator (Baseline)             -              23.59            42.4
SafeFlow Generator (+ Guid.)            -              172.03           5.8
SafeFlow Generator (+ Guid. & Reflow)   -              10.80            92.6
+ Stage 1 (Semantic OOD)                0.006          10.81            92.5
+ Stage 2 (Generation Instability)      3.96           14.77            67.7
+ Stage 3 (Hard Kinematic Screen)       0.013          14.78            67.7
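The frequency column follows directly from the latency column (Hz = 1000 / ms), and the gate overhead is the sum of the per-stage additions. The short check below reproduces that arithmetic from the reported numbers; the variable names are illustrative.

```python
# Reported values from the latency table (milliseconds).
generator_ms = 10.80
guided_no_reflow_ms = 172.03
stage_added_ms = {"stage1": 0.006, "stage2": 3.96, "stage3": 0.013}

def to_hz(latency_ms):
    """Convert a per-inference latency in milliseconds to a rate in Hz."""
    return 1000.0 / latency_ms

gate_overhead_ms = sum(stage_added_ms.values())   # total cost of the three stages
total_ms = generator_ms + gate_overhead_ms        # generator plus full safety gate

generator_hz = to_hz(generator_ms)
pipeline_hz = to_hz(total_ms)
reflow_speedup = guided_no_reflow_ms / generator_ms  # guided generator without vs. with reflow
```

Stage 2 accounts for nearly all of the gate overhead, while Stages 1 and 3 together add about 0.02 ms.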

Real-Robot Deployment

Real-Robot Deployment of SafeFlow on Unitree G1: The robot executes a continuous long-horizon command sequence with smooth transitions across diverse behaviors, including upper-body gestures ("wave hands", "punch") and whole-body actions ("squat down", "hop on left leg"). A high-risk prompt ("double backflip") is included. The 3-Stage Safety Gate filters the unsafe reference and triggers a standing fallback, allowing the robot to maintain balance and continue execution under subsequent prompts. This demonstrates sim-to-real transferability and deployment-time safety on hardware.

BibTeX

@article{cho2026safeflow,
    title     = {SafeFlow: Real-Time Text-Driven Humanoid Whole-Body Control via Physics-Guided Rectified Flow and Selective Safety Gating},
    author    = {Cho, Hanbyel and Kim, Sang-Hun and Kang, Jeonguk and Koo, Donghan},
    journal   = {arXiv preprint},
    year      = {2026}
}