Download | View accepted manuscript: Real-Time Identification of Parallel Texts from Bilingual Newsfeed (PDF, 532 KiB) |
---|
Author | Nadeau, David; Foster, George |
---|
Affiliation | National Research Council of Canada. NRC Institute for Information Technology |
---|
Format | Text, Article |
---|
Conference | Proceedings of the Computational Linguistics in the North-East (CLINE'2004), August 30, 2004, Montréal, Québec, Canada |
---|
Abstract | Parallel texts are documents that are translations of each other. This paper describes a simple method that can be deployed on a real-time news feed to create a continuously growing source of parallel texts in French and English. Our experiment was conducted on the Canada Newswire news feed. Given some intrinsic properties of this feed, it was possible to deploy relatively simple text-matching techniques that rely on language-independent cognates such as numbers, capitalized words, punctuation, and newline characters. On three weeks of press releases, our system correctly identified the vast majority of parallel press releases. It committed only minor errors on repeated news items. |
---|
Publication date | 2004 |
---|
Language | English |
---|
NRC number | NRCC 48081 |
---|
NPARC number | 5764063 |
---|
Record identifier | e6d2a7f8-a74d-406d-b1f9-7ca7d7d6720c |
---|
Record created | 2009-03-29 |
---|
Record modified | 2020-04-17 |
---|
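
The abstract describes matching press releases through language-independent cognates (numbers, capitalized words, punctuation, and newline characters). The following is a minimal sketch of that general idea, not the authors' implementation: the token pattern, the Dice overlap score, the `threshold` value, and the function names `cognate_features`, `dice_similarity`, and `best_match` are all assumptions introduced here for illustration.

```python
import re
from collections import Counter

# Assumed feature pattern (hypothetical, not taken from the paper):
# digit runs, capitalized words, a few punctuation marks, and newlines.
TOKEN_RE = re.compile(r'\d+|[A-ZÀ-ÖØ-Þ]\w+|[.,;:!?()$%"]|\n')


def cognate_features(text: str) -> Counter:
    """Count the language-independent tokens found in a text."""
    return Counter(TOKEN_RE.findall(text))


def dice_similarity(a: Counter, b: Counter) -> float:
    """Dice coefficient over two token multisets (an assumed scoring choice)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    overlap = sum((a & b).values())  # multiset intersection
    return 2.0 * overlap / total


def best_match(candidate: str, pool: list[str], threshold: float = 0.5):
    """Return (index, score) of the most similar text in pool, or None
    when no score reaches the (assumed) threshold."""
    target = cognate_features(candidate)
    best = None
    for i, text in enumerate(pool):
        score = dice_similarity(target, cognate_features(text))
        if score >= threshold and (best is None or score > best[1]):
            best = (i, score)
    return best
```

In a real-time setting, each incoming release in one language could be compared against a sliding window of recent releases in the other language by calling `best_match`; the window size and threshold would need tuning on actual feed data.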