RECOGNITION OF EMOTIONS THROUGH SPEECH USING MACHINE LEARNING TECHNIQUES
Keywords:
neural networks, artificial intelligence, emotion recognition, speech feature extraction, machine learning, emotions, speech

Abstract
Emotion recognition from speech is one of the areas where AI can be applied, and making such systems practical helps democratize access to this type of technology. Customer service can be personalized, with bots determining a customer's mood during a service interaction and redirecting to a human agent if slurred speech is detected. Call centers for emergency and insurance services, in particular, stand to benefit from emotion recognition. This work presents a Recurrent Neural Network with Gated Recurrent Units (RNN-GRU) and a Convolutional Neural Network (CNN) for speech emotion classification, with strong performance under experimental conditions. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) was used to train these models, which allowed for the creation of an evaluation and testing environment. A model trained on English speech achieved an accuracy of approximately 42%, which was considered unsatisfactory for the classifier. The main factors identified as responsible for this performance were the characteristics of the sample group, classification bias due to the absence of cross-validation, and the lack of noise processing. The best-performing network was the RNN-GRU, which reached 79.69% accuracy using a technique that enlarges the dataset through a stretching process.
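The abstract mentions enlarging the dataset through a stretching process but does not specify how it was implemented. A minimal sketch of the general idea, assuming stretching by naive resampling, is shown below; note that plain resampling also shifts pitch, whereas pitch-preserving time stretching (as commonly used for speech augmentation) would rely on a phase vocoder such as librosa.effects.time_stretch.

```python
import numpy as np

def stretch(y: np.ndarray, rate: float) -> np.ndarray:
    """Naively stretch a waveform by linear interpolation.

    rate > 1 shortens the signal (faster speech);
    rate < 1 lengthens it (slower speech).
    This is a sketch: plain resampling also shifts pitch, so a
    pitch-preserving phase-vocoder stretch would normally be used.
    """
    n_out = int(round(len(y) / rate))
    # Fractional sample positions in the original signal
    # for each sample of the stretched output.
    idx = np.linspace(0, len(y) - 1, num=n_out)
    return np.interp(idx, np.arange(len(y)), y)

# Each stretched copy can be added to the training set as a new example.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s at 16 kHz
slow = stretch(tone, rate=0.8)   # ~25% longer
fast = stretch(tone, rate=1.25)  # ~20% shorter
```

Generating several stretched variants per utterance multiplies the effective dataset size, which is the stated purpose of the augmentation in this work.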