Document Type : Original Article
Authors
Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
Abstract
Abstract— Voice Pathology Detection (VPD) aims to identify voice impairments through the analysis of speech signals, providing a foundation for diagnostic tools in advanced healthcare services for the public. This paper contributes to the development of efficient and accurate deep learning (DL) models for automatic VPD using sustained-vowel speech data. Specifically, it explores the comparative efficacy of Mel-Frequency Cepstral Coefficients (MFCCs) and Linear Predictive Coding (LPC) as acoustic features extracted from the vowels /i/, /a/, and /u/. Using the AVFAD database, we trained and optimized a Convolutional Neural Network (CNN) to classify healthy and pathological voices, prioritizing both accuracy and computational efficiency for real-time applications. Our findings reveal that 20 MFCC features extracted from the vowel /i/ yield the highest accuracy, with the optimal model reaching approximately 88% on test data.
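As a minimal illustrative sketch of the feature-extraction step summarized above, the snippet below shows how 20 MFCCs could be computed from a sustained-vowel recording using librosa; the file name, sampling-rate handling, and downstream use are assumptions for illustration, not the paper's exact pipeline.

```python
# Illustrative sketch (not the paper's implementation): computing 20 MFCCs
# from a sustained-vowel recording with librosa.
import numpy as np
import librosa

def extract_mfcc(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Load a sustained-vowel recording and return its frame-wise MFCC matrix."""
    y, sr = librosa.load(path, sr=None)  # keep the file's native sampling rate (assumption)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    return mfcc

# Hypothetical usage: the resulting matrix would be normalized/padded
# before being fed to a CNN classifier.
# features = extract_mfcc("vowel_i_sample.wav")  # hypothetical file name
# print(features.shape)  # e.g. (20, n_frames)
```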
Keywords
Main Subjects