National Research Council of Canada. NRC Institute for Information Technology
Natural language processing; Medline; Molecular biology; Knowledge acquisition (computer); Semantics; Indexing and abstracting
Literature mining is the process of extracting and combining facts from scientific publications. In recent years, many computer programs have been designed to extract various molecular biology findings from Medline abstracts or full-text articles. The present article describes the range of text mining techniques that have been applied to scientific documents. It divides 'automated reading' into four general subtasks: text categorization, named entity tagging, fact extraction, and collection-wide analysis. Literature mining offers powerful methods to support knowledge discovery and the construction of topic maps and ontologies. An overview is given of recent developments in medical language processing. Special attention is given to the domain particularities of molecular biology, and the emerging synergy between literature mining and molecular databases accessible through the Internet.
International Journal of Medical Informatics 67, no. 1-3 (14 November 2002): 7–18.