Download | Accepted manuscript: Using Data Mining Methods to Predict Personally Identifiable Information in Emails (PDF, 310 KiB) |
---|
Author | Geng, L.; Korba, Larry; Wang, X.; Wang, Y.; Liu, H.; You, Y. |
---|
Format | Text, Article |
---|
Conference | The Fourth International Conference on Advanced Data Mining and Applications (ADMA 2008), October 8-10, 2008, Chengdu, China |
---|
Abstract | Private information management and compliance are now important issues for most organizations. As a major communication tool for organizations, email is one of the many potential sources of privacy leaks. Information extraction methods have been applied to detect private information in text files. However, since email messages usually consist of low-quality text, information extraction methods for private information detection may not achieve good performance. In this paper, we address the problem of predicting the presence of private information in email using data mining and text mining methods. Two prediction models are proposed. The first model is based on association rules that predict one type of private information from other types of private information identified in emails. The second model is based on classification models that predict private information according to the content of the emails. Experiments on the Enron email dataset show promising results. |
---|
Publication date | 2008 |
---|
Language | English |
---|
NRC number | NRCC 50381 |
---|
NPARC number | 8914417 |
---|
Record identifier | 6a47e196-20ca-470b-ad71-905eb20e5e77 |
---|
Record created | 2009-04-22 |
---|
Record modified | 2020-08-12 |
---|