Abstract

We investigate a method for improving the accuracy of process mining when the task labels of log events are uncertain. Such uncertainty is common in business processes whose events are communications between people, such as email messages. We examine how the accuracy of an independent task identifier, such as a classification or clustering engine, can be improved by consulting the currently mined process model. First, we present a classification scheme that assigns an initial task label to each message based on the keywords it contains. We then show how these labels can be refined using the likelihood, derived from an analysis of the current process model, that each event represents a particular task. This cycle of mining and relabeling is repeated until the model is sufficiently refined. Results show that keyword classification and process model analysis can each be substantially effective on their own. Combined, they can correct virtually all labeling errors when noise is low (below 20%) and reduce the error rate by about 85% when noise is in the 30-40% range.
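The iterative refinement loop described above can be illustrated with a small sketch. It rests on several assumptions that the abstract does not state. The keyword classifier is a hypothetical smoothed keyword-count scorer (`keyword_probs`). The "current process model" is approximated by a first-order directly-follows model with add-one smoothing (`mine_transitions`). Each event's keyword probability is combined with the model's likelihood of that task at that position by simple multiplication. The loop stops after a fixed number of rounds (`max_rounds`) or once the labels no longer change. All names and parameters here are hypothetical, and the actual method may use a different model representation and scoring rule.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # virtual boundary labels added to each trace


def keyword_probs(words, keywords, alpha=1.0):
    """Hypothetical keyword classifier: smoothed count of matching keywords per task."""
    scores = {task: alpha + sum(w in kws for w in words) for task, kws in keywords.items()}
    total = sum(scores.values())
    return {task: s / total for task, s in scores.items()}


def mine_transitions(traces, tasks, alpha=1.0):
    """Mine a smoothed directly-follows model P(next | current) from the current labels."""
    counts = defaultdict(Counter)
    for labels in traces:
        seq = [START] + labels + [END]
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    targets = list(tasks) + [END]
    model = {}
    for a in [START] + list(tasks):
        total = sum(counts[a][b] for b in targets) + alpha * len(targets)
        model[a] = {b: (counts[a][b] + alpha) / total for b in targets}
    return model


def refine_labels(log, keywords, max_rounds=10):
    """Alternate between mining a model and relabeling events until labels stabilize."""
    tasks = list(keywords)
    kw = [[keyword_probs(msg, keywords) for msg in trace] for trace in log]
    # Initial labeling: keyword classifier alone (ties go to the first task listed).
    labels = [[max(p, key=p.get) for p in trace] for trace in kw]
    for _ in range(max_rounds):
        model = mine_transitions(labels, tasks)
        new_labels = []
        for trace_kw, old in zip(kw, labels):
            new = list(old)
            for i, probs in enumerate(trace_kw):
                prev = new[i - 1] if i > 0 else START          # already relabeled this round
                nxt = old[i + 1] if i + 1 < len(old) else END  # label from the previous round
                # Keyword evidence times the model's likelihood of this task between prev and nxt.
                new[i] = max(tasks, key=lambda t: probs[t] * model[prev][t] * model[t][nxt])
            new_labels.append(new)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


keywords = {
    "request": {"request", "need", "order"},
    "approve": {"approve", "approved", "ok"},
    "ship": {"ship", "shipped", "delivery"},
}
clean = [["we", "need", "parts"], ["approved"], ["shipped", "today"]]
noisy = [["order", "request"], ["thanks"], ["delivery"]]
result = refine_labels([clean, clean, clean, noisy], keywords)
print(result[3])  # ['request', 'approve', 'ship']
```

In this toy log the message "thanks" matches no keyword, so the classifier's tie-break initially labels it `request`. After one round of mining, the transition model makes `approve` far more likely between a `request` and a `ship`, so the label is corrected and the next round leaves all labels unchanged.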