National Research Council of Canada. NRC Institute for Information Technology
Text, Book Chapter
We describe the National Research Council's (NRC) entry in the Anomaly Detection/Text Mining competition organized at the Text Mining Workshop 2007. This entry relies on a straightforward implementation of a probabilistic categorizer described earlier [GGPC02]. This categorizer is adapted to handle multiple labeling and a piecewise-linear confidence estimation layer is added to provide an estimate of the labeling confidence. This technique achieves a score of 1.689 on the test data. This model has potentially useful features and extensions such as the use of a category-specific decision layer or the extraction of descriptive category keywords from the probabilistic profile.
Survey of Text Mining II: Clustering, Classification, and Retrieval (2008).