Comprehensive Summary
Hu and colleagues investigated how human eye gaze coordinates with full-body movements across a variety of real-world, virtual reality (VR), and augmented reality (AR) scenarios. Prior studies primarily examined eye-head coordination, but this work expanded the analysis to full-body motion using four public datasets (MoGaze, ADT, GIMO, and EgoBody). Their findings showed that during human-object interactions (pick-and-place), eye gaze correlates strongly with body motion and precedes physical actions, whereas in human-human interactions (chat, teach) gaze aligns more closely with body orientation toward the partner. Building on these insights, the team developed Pose2Gaze, a novel model that combines convolutional and spatio-temporal graph convolutional neural networks to predict gaze from head directions and body poses. Compared with state-of-the-art gaze predictors based only on head movements, Pose2Gaze achieved significant accuracy improvements: 24.0% on MoGaze, 10.1% on ADT, 21.3% on GIMO, and 28.6% on EgoBody. Moreover, Pose2Gaze boosted downstream tasks such as eye-based activity recognition, reaching accuracies close to those obtained with ground-truth gaze. These results highlight the rich information available in eye-body coordination and open new avenues for gaze prediction research in VR/AR.
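To make the architecture concrete, the following is a minimal PyTorch sketch of a Pose2Gaze-style model: a 1D CNN encodes the head-direction sequence, a spatio-temporal graph convolutional network (ST-GCN) encodes the body-pose sequence, and the fused features are decoded into a unit gaze-direction vector. The joint count, window length, channel sizes, the uniform adjacency placeholder, and the fusion scheme are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class STGCNBlock(nn.Module):
    """Spatial graph convolution over body joints followed by a temporal convolution."""

    def __init__(self, in_ch, out_ch, adjacency, kernel_t=9):
        super().__init__()
        # Normalized adjacency (with self-loops) defines the skeleton graph.
        self.register_buffer("A", adjacency)
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # per-joint feature mixing
        self.temporal = nn.Conv2d(
            out_ch, out_ch, kernel_size=(kernel_t, 1), padding=(kernel_t // 2, 0)
        )
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        # x: (batch, channels, time, joints)
        x = self.spatial(x)
        x = torch.einsum("bctj,jk->bctk", x, self.A)  # aggregate neighboring joints
        x = self.temporal(x)
        return F.relu(self.bn(x))


class Pose2GazeSketch(nn.Module):
    def __init__(self, num_joints=21, hidden=64):
        super().__init__()
        # Uniform adjacency placeholder; a real skeleton graph would be used instead.
        A = torch.eye(num_joints) + torch.ones(num_joints, num_joints) / num_joints
        A = A / A.sum(dim=1, keepdim=True)

        # Head branch: 1D CNN over the head-direction time series (3 channels: x, y, z).
        self.head_cnn = nn.Sequential(
            nn.Conv1d(3, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Pose branch: two ST-GCN blocks over 3D joint positions.
        self.pose_gcn = nn.Sequential(
            STGCNBlock(3, hidden, A), STGCNBlock(hidden, hidden, A)
        )
        # Fusion head maps the concatenated features to a 3D gaze direction.
        self.fuse = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 3)
        )

    def forward(self, head_dir, pose):
        # head_dir: (batch, time, 3); pose: (batch, time, joints, 3)
        h = self.head_cnn(head_dir.transpose(1, 2)).squeeze(-1)       # (batch, hidden)
        p = self.pose_gcn(pose.permute(0, 3, 1, 2)).mean(dim=(2, 3))  # (batch, hidden)
        gaze = self.fuse(torch.cat([h, p], dim=-1))
        return F.normalize(gaze, dim=-1)  # unit gaze-direction vector


if __name__ == "__main__":
    model = Pose2GazeSketch()
    head = torch.randn(4, 30, 3)      # 30-frame head-direction window
    pose = torch.randn(4, 30, 21, 3)  # 30-frame, 21-joint pose window
    print(model(head, pose).shape)    # torch.Size([4, 3])
```

Normalizing the output to a unit vector lets such a model be trained with an angular (cosine) loss against the measured gaze direction, which is a common choice for gaze estimation; the specific training objective here is assumed, not taken from the paper.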
Outcomes and Implications
While primarily positioned within VR/AR and human-computer interaction, this study has much wider implications for fields involving movement and cognitive coordination. Pose2Gaze demonstrates that full-body motion carries rich predictive information about eye gaze, suggesting potential applications in rehabilitation, adaptive user interfaces, and cognitive assessment. For practitioners in neurorehabilitation and motor recovery, integrating pose-based gaze modeling could inform next-generation diagnostic and training tools, enriching interventions where tracking attention and motor coordination is critical.