Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers
Date
2023-05-03T00:00:00
Publisher
Springer Science and Business Media Deutschland GmbH
Abstract
Social media is rapidly growing in popularity and has both advantages and disadvantages. Users who post daily updates and opinions on social media may inadvertently hurt the feelings of others. Detecting hate speech and harmful information on social media is therefore critical, lest it lead to calamity. In this research, machine learning classifiers such as Naïve Bayes, support vector machines, and logistic regression, together with the pre-trained models BERT and RoBERTa, developed by Google and Facebook, respectively, are used to detect hate speech and offensive content in a newly created dataset that includes tweets and articles/blogs. The sentiments were obtained using the VADER sentiment analyzer. The results showed that the pre-trained classifiers outperformed the machine learning classifiers used in this study. BERT and RoBERTa achieved accuracies of 96% and 93%, respectively, on the tweet dataset and 97% and 98%, respectively, on the articles/blogs dataset, outperforming the other classifiers used in this work. The results further indicate that neutral content is shared more in articles/blogs, hate content is shared about equally in tweets and articles/blogs, and offensive content is shared more in tweets than in articles/blogs. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Keywords
BERT, Hate speech, Offensive language, RoBERTa, Tweets, VADER