Multi-Modal Dataset Across Exertion Levels: Capturing Post-Exercise Speech, Breathing, and Phonocardiogram
Jingping Nie, Yuang Fan, Minghui Zhao, and 4 more authors
In Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, UC Irvine Student Center, Irvine, CA, USA, 2025
Cardio exercise elevates both heart rate and respiration rate, producing distinct physiological changes that affect speech patterns, pitch, breathing sounds, and heart sounds. These post-exercise variations are influenced by factors such as exercise intensity and individual fitness level. A comprehensive audio dataset capturing these changes is critically needed: existing datasets focus mainly on resting speech, breathing, and heart sounds, and therefore miss post-exercise phenomena such as speech disfluencies, altered breathing patterns, and variable heart-sound intensities, limiting model generalizability to post-exercise conditions. To address this gap, we recruited 59 subjects from diverse backgrounds to perform cardio exercise (running) to varied exertion levels. The resulting dataset comprises 250 sessions totaling 143 minutes of structured reading, 47 minutes of spontaneous speech, 71 minutes of breathing sounds, and 62.5 minutes of phonocardiogram (PCG) recordings. We designed and deployed preliminary case studies showing that post-exercise speech changes can serve as an indicator of exertion level. We envision this dataset as a foundational resource for building speech and cardiorespiratory monitoring models that are resilient to exercise-induced physiological shifts, advancing natural language processing (NLP) applications, mobile health, and wearable sensing through accurate physiological monitoring in real-world conditions.
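As a rough illustration of the kind of analysis such a dataset enables, the sketch below extracts two speech features that plausibly shift after exertion, mean fundamental frequency (F0) and pause ratio, from a single recording. The file name, sampling rate, silence threshold, and the final decision rule are hypothetical assumptions for illustration, not the authors' case-study method.

```python
# Minimal sketch (not the authors' pipeline): extract mean pitch (F0) and
# pause ratio from one post-exercise speech recording. The file path,
# sampling rate, top_db threshold, and decision rule are hypothetical.
import numpy as np
import librosa

def post_exercise_features(wav_path: str, sr: int = 16000):
    """Return (mean_f0_hz, pause_ratio) for a speech recording."""
    y, sr = librosa.load(wav_path, sr=sr)

    # Fundamental frequency via pYIN; unvoiced frames come back as NaN.
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"),
        sr=sr,
    )
    mean_f0 = float(np.nanmean(f0))

    # Pause ratio: fraction of samples falling outside non-silent intervals.
    intervals = librosa.effects.split(y, top_db=30)
    voiced_samples = int(sum(end - start for start, end in intervals))
    pause_ratio = 1.0 - voiced_samples / len(y)
    return mean_f0, pause_ratio

if __name__ == "__main__":
    f0, pauses = post_exercise_features("session_001_reading.wav")  # hypothetical file
    # Illustrative threshold only; a real model would be trained on labeled sessions.
    level = "high exertion" if pauses > 0.35 else "low/moderate exertion"
    print(f"mean F0 = {f0:.1f} Hz, pause ratio = {pauses:.2f} -> {level}")
```

In practice, exertion-level estimation would be learned from the dataset's labeled sessions rather than from a hand-set pause threshold; the sketch only shows how post-exercise speech features can be surfaced from raw audio.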