Exploring Sentence Variations with Bilingual Corpora

From National Research Council Canada

Download	View accepted manuscript: Exploring Sentence Variations with Bilingual Corpora (PDF, 336 KiB)
Author	Jin, Z.; Barrière, Caroline
Format	Text, Article
Conference	Corpus Linguistics 2005 Conference, July 14-17, 2005, Birmingham, United Kingdom
Abstract	We propose a system for retrieving similar sentences from a corpus that treats sentences as pure strings. Compared with more linguistically motivated approaches, this has three advantages: the system can quickly retrieve similar sentences from a large corpus (over one million sentences), it works well with ill-structured sentences, and it works across different human languages. The system has been tested on English, French and Chinese corpora, and the results have been evaluated manually. The application suggested in this paper is to use our similar-sentence search engine in a language-learning context, helping learners improve their writing skills and better understand the grammar rules of their second language by studying sentence variants drawn from realistic examples. We further suggest using the system with bilingual parallel corpora to help translation students improve their translation skills by giving them access to professional translations.
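The abstract does not say which string-similarity measure the system uses, so the following is only a minimal sketch of the general idea, under our own assumptions: each sentence is treated as a raw character string (no tokenizer, parser or lexicon), broken into character trigrams, and indexed in an inverted index so that only sentences sharing at least one trigram with the query are scored. Ranking by the Dice coefficient over trigram sets is a swapped-in technique chosen for illustration, and the names `char_ngrams` and `SimilarSentenceIndex` are hypothetical; none of this should be read as the authors' actual method.

```python
from collections import Counter, defaultdict


def char_ngrams(sentence: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a sentence treated as a raw string."""
    s = f" {sentence.lower()} "  # pad with spaces so word boundaries form n-grams too
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


class SimilarSentenceIndex:
    """Hypothetical string-based index: no language-specific resources are used."""

    def __init__(self, sentences: list[str], n: int = 3) -> None:
        self.n = n
        self.sentences = sentences
        self.grams = [char_ngrams(s, n) for s in sentences]
        # Inverted index: n-gram -> ids of the sentences that contain it.
        self.postings: dict[str, list[int]] = defaultdict(list)
        for sid, grams in enumerate(self.grams):
            for g in grams:
                self.postings[g].append(sid)

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return up to k (score, sentence) pairs, most similar first."""
        q = char_ngrams(query, self.n)
        # Count shared n-grams per candidate by walking only the relevant postings.
        shared: Counter[int] = Counter()
        for g in q:
            for sid in self.postings.get(g, ()):
                shared[sid] += 1
        # Dice coefficient over n-gram sets: 2|A ∩ B| / (|A| + |B|).
        scored = [
            (2 * count / (len(q) + len(self.grams[sid])), self.sentences[sid])
            for sid, count in shared.items()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

For example, indexing the sentences "The cat sat on the mat.", "The cat is sitting on the mat." and "我喜欢学习语言。" and querying "The cat sat on a mat." ranks "The cat sat on the mat." first, while the query "我喜欢学习中文。" matches only the Chinese sentence. Because nothing beyond character sequences is used, the same code runs unchanged on English, French or Chinese text, which mirrors the language independence the abstract claims for a pure-string approach.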
Publication date	2005
In	Corpus Linguistics 2005 Conference [Proceedings].
Language	English
NRC number	NRCC 48511
NPARC number	5764603
Record identifier	b167b1d5-1e96-4599-90e9-c911f769e82d
Record created	2009-03-29
Record modified	2020-10-09

Date modified: 2025-05-11