Understanding human behavior is important for many healthcare applications, including ambient intelligence for hospitals, elderly living environments, clinician-patient interactions, and behavioral health assessment. Human behavior is multimodal and dynamically changing. Moreover, privacy is a critical consideration when estimating human behavior in healthcare settings.
Our prior work has focused on (a) how non-verbal visual human behavior (e.g., visual focus of attention, body and arm pose, and facial activity) can be estimated from a variety of sensors, ranging from very sparse privacy-preserving overhead range sensors to frontal video cameras, (b) how complementary non-verbal speech information (speaking and interruption patterns, tone, prosody, etc.) can be integrated to supplement sparse visual information, and (c) how social science metrics such as perceived leadership, contribution, and personality traits can be predicted from automated human behavior estimates. For this multimodal group dynamics analysis, we leveraged machine learning, computer vision, and image and signal processing algorithms.
Unobtrusive and Privacy-Preserving Multimodal-Sensor-Enabled Ambient Intelligence
The automated estimation of human interactions in group settings forms the foundation for understanding team processes, and many systems use video cameras and/or wearable sensors as the basis for this estimation. However, such sensors may inhibit natural human behavior and can be particularly limiting in privacy-critical spaces such as hospitals or assisted daily living environments. We investigated a different modality for studying human interaction: time-of-flight (ToF) sensors. These sensors preserve privacy far better than video cameras, while still allowing fine-grained measurements that can effectively characterize individual and team behavior. We developed computer vision and machine learning methods that use ToF sensors to estimate human location; body, head, and arm pose; visual focus of attention; and, when combined with non-verbal audio signals, speaking and interruption patterns. We then studied whether the automatically extracted features could be used to predict perceptions of leadership, contribution, and group performance.
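To make the fusion step concrete, below is a minimal sketch of combining coarse ToF-derived visual features with non-verbal audio statistics to predict a perceived-leadership label. It is not the published pipeline: the feature sets, their dimensions, and the placeholder random data and labels are illustrative assumptions, and a simple early-fusion linear classifier stands in for the actual models.

```python
# Minimal early-fusion sketch (illustrative, not the published system).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_participants = 120  # hypothetical number of labeled participants

# Assumed visual features per participant, derived from overhead ToF data:
# fraction of time attending to each of 3 coarse gaze targets (from
# head/body orientation) plus a mean arm-activity level.
visual = rng.random((n_participants, 4))

# Assumed non-verbal audio features per participant: speaking-time
# fraction, turn count, and number of interruptions made.
audio = rng.random((n_participants, 3))

# Binary label: whether the participant was perceived as an emergent
# leader (random placeholders stand in for questionnaire annotations).
y = rng.integers(0, 2, size=n_participants)

# Early fusion: concatenate modality features, then train one classifier.
X = np.hstack([visual, audio])
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```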
Selected Publications:
- Arrays of single pixel time-of-flight sensors for privacy preserving tracking and coarse pose estimation, WACV, 2016.
- Privacy-Preserving Understanding of Human Body Orientation for Smart Meetings, CVPRW, 2017.
- A Multimodal-Sensor-Enabled Room for Unobtrusive Group Meeting Analysis, ICMI, 2018.
- The unobtrusive group interaction (UGI) corpus, MMSys, 2019.
- Classifying the emotional speech content of participants in group meetings using convolutional long short-term memory network, Journal of the Acoustical Society of America, 2021.
Human Behavior Estimation Using Frontal Video Cameras
Video cameras enable fine-grained analysis of eye gaze and facial expressions that is not possible with overhead privacy-preserving range sensors. We developed multimodal machine learning algorithms that fuse video information from frontal cameras with non-verbal speech information to predict perceived leadership, contribution, and personality traits.
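As a contrast to the early-fusion sketch above, the following is a minimal late-fusion sketch under assumed inputs: one regressor on frontal-video facial features and one on speech prosody features, with their predictions averaged to estimate a continuous perceived-contribution score. The feature dimensions and the synthetic data are hypothetical stand-ins, not the published setup.

```python
# Minimal late-fusion sketch (illustrative, not the published model).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 150  # hypothetical number of annotated meeting participants

face = rng.random((n, 17))    # e.g., facial action unit activations (assumed)
prosody = rng.random((n, 6))  # e.g., pitch/energy statistics (assumed)
y = face @ rng.random(17) + prosody @ rng.random(6)  # synthetic target score

f_tr, f_te, p_tr, p_te, y_tr, y_te = train_test_split(
    face, prosody, y, test_size=0.3, random_state=0)

# One model per modality; fusing at the prediction level tolerates a
# missing modality more gracefully than concatenating features.
face_model = Ridge().fit(f_tr, y_tr)
prosody_model = Ridge().fit(p_tr, y_tr)
fused = 0.5 * face_model.predict(f_te) + 0.5 * prosody_model.predict(p_te)
print(f"late-fusion R^2: {r2_score(y_te, fused):.2f}")
```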
Selected Publications: