SHED: Spam Ham Email Dataset
Date
2017
Publisher
International Journal on Recent and Innovation Trends in Computing and Communication
Abstract
Automatic filtering of spam emails has become an essential feature of a good email service provider. Organizations and individuals send large volumes of spam emails to gain direct or indirect benefits. Such emails not only distract users but also consume considerable resources, including processing power, memory, and network bandwidth. They also raise security issues, since they may contain malicious content and/or links. Content-based spam filtering is one of the effective approaches to filtering; however, its efficiency depends on the training set. Most existing datasets were collected and prepared long ago, and spammers have since been changing the content of their emails to evade filters trained on these datasets. In this paper, we introduce the Spam Ham Email Dataset (SHED), a dataset consisting of spam and ham emails. We evaluated the performance of filtering techniques trained on previous datasets against filtering techniques trained on SHED, and observed that the techniques trained on SHED outperformed those trained on the other datasets. Furthermore, we classified the spam emails into various categories.
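The abstract's point that a content-based filter is only as good as its training set can be made concrete with a small example. The paper's experiments use WEKA classifiers; the sketch below is not the authors' setup. It is a minimal illustration, assuming a multinomial Naive Bayes classifier (chosen here only because it is a common content-based filter) over bag-of-words tokens. The class name `NaiveBayesFilter`, the helpers `tokenize` and `accuracy`, and the toy training emails are all hypothetical and are not drawn from SHED.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial Naive Bayes spam filter with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()  # number of training emails per class
        self.vocab = set()  # every token seen during training

    def fit(self, emails, labels):
        # Assumes both "spam" and "ham" appear in labels; an empty class would
        # make its log prior undefined.
        for text, label in zip(emails, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for tok in tokens:
            # Words never seen in training carry no evidence and are skipped,
            # so the filter only "knows" the vocabulary of its training set.
            if tok in self.vocab:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(("spam", "ham"), key=lambda label: self._log_score(tokens, label))


def accuracy(model, emails, labels):
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(emails, labels))
    return correct / len(labels)


# Hypothetical toy data for illustration only (not taken from SHED).
train_emails = [
    "win a free prize now",
    "cheap meds click this link",
    "claim your free money today",
    "meeting notes attached for tomorrow",
    "lunch with the project team on friday",
    "please review the draft report",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayesFilter().fit(train_emails, train_labels)
print(model.predict("free prize click now"))  # spam
print(model.predict("draft report for the team meeting"))  # ham
```

Because tokens absent from the training vocabulary are ignored, spam written with new wording contributes little evidence to such a filter. This is one way to see why the abstract argues that filters trained on older datasets can be evaded and that a more recent training set such as SHED matters. Comparing datasets, as the paper does, would amount to training one model per dataset and calling `accuracy` on a common held-out test set.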
Keywords
Spam email, non-spam email, WEKA, feature selection, classifiers, parameters
Citation
Sharma, U., & Khurana, S. S. (2017). SHED: Spam Ham Email Dataset. International Journal on Recent and Innovation Trends in Computing and Communication, 5(6), 1078-1082.