fuzzy set model; similarity measures; phrase matching; information retrieval
Email is unquestionably one of the most popular communication media today. Not only is it fast and reliable, but it is also generally free. Unfortunately, a significant share of the emails that users receive each day are spam. Spam is a nuisance because users waste time reviewing and deleting it. In addition, spam emails consume resources such as storage, bandwidth, and computer processing time. Many attempts have been made to eradicate spam emails; however, none has proven highly effective. In this paper, we propose a spam-email detection approach, called SpamED, which uses the similarity of phrases in emails to detect spam. Our experiments not only verify that SpamED, using trigrams in emails, is capable of minimizing false positives and false negatives in spam detection, but also show that it outperforms a number of existing email filtering approaches, with a 96% accuracy rate.
(c) 2009 Wiley Periodicals, Inc. or copyright owner as specified in the Journal. This is a preprint of the article published in JASIST. The definitive version is available at http://www.interscience.Wiley.com/.