Cost weighting for neural machine translation domain adaptation

From National Research Council Canada

Author	Chen, Boxing¹; Cherry, Colin¹; Foster, George¹; Larkin, Samuel¹
Affiliation	National Research Council of Canada. Information and Communication Technologies
Format	Text, Article
Conference	The First Workshop on Neural Machine Translation, August 4, 2017, Vancouver, BC, Canada
Abstract	In this paper, we propose a new domain adaptation technique for neural machine translation called cost weighting, which is appropriate for adaptation scenarios in which a small in-domain data set and a large general-domain data set are available. Cost weighting incorporates a domain classifier into the neural machine translation training algorithm, using features derived from the encoder representation to distinguish in-domain from out-of-domain data. Classifier probabilities are used to weight sentences according to their domain similarity when updating the parameters of the neural translation model. We compare cost weighting to two traditional domain adaptation techniques developed for statistical machine translation: data selection and sub-corpus weighting. Experiments on two large-data tasks show that both the traditional techniques and our novel proposal lead to significant gains, with cost weighting outperforming the traditional methods.
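The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes (these choices are not taken from the paper) that the classifier features are the mean-pooled encoder hidden states, that the classifier is a single logistic layer, and that each sentence's translation loss is simply multiplied by its in-domain probability. The names domain_probability and cost_weighted_loss are hypothetical. Training the classifier and handling gradient flow are left out.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mean_pool(states: Sequence[Sequence[float]]) -> List[float]:
    # Assumed feature: average the encoder hidden states over source positions.
    dim = len(states[0])
    return [sum(s[d] for s in states) / len(states) for d in range(dim)]


def domain_probability(states: Sequence[Sequence[float]],
                       weights: Sequence[float], bias: float) -> float:
    # Hypothetical logistic domain classifier: returns P(in-domain | sentence).
    feats = mean_pool(states)
    score = sum(w * f for w, f in zip(weights, feats)) + bias
    return sigmoid(score)


def cost_weighted_loss(nll_per_sentence: Sequence[float],
                       domain_probs: Sequence[float]) -> float:
    # Scale each sentence's translation loss (negative log-likelihood) by its
    # in-domain probability, then average over the batch.
    if len(nll_per_sentence) != len(domain_probs):
        raise ValueError("need one domain probability per sentence")
    total = sum(p * nll for p, nll in zip(domain_probs, nll_per_sentence))
    return total / len(nll_per_sentence)


if __name__ == "__main__":
    w, b = [1.0, 0.0], 0.0  # toy classifier parameters
    p_in = domain_probability([[2.0, 0.0], [2.0, 0.0]], w, b)
    p_out = domain_probability([[-2.0, 1.0], [-2.0, -1.0]], w, b)
    print(p_in, p_out, cost_weighted_loss([1.0, 1.0], [p_in, p_out]))
```

In this toy batch, the sentence the classifier scores as in-domain contributes most of the loss, so it would dominate the parameter update. The out-of-domain sentence is down-weighted rather than discarded. That soft weighting is the contrast with hard data selection that the abstract draws.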
Publication date	2017-08-04
Publisher	Association for Computational Linguistics
In	Proceedings of the First Workshop on Neural Machine Translation: 40–46.
Language	English
Peer reviewed	Yes
NPARC number	23002215
Record identifier	328f63b3-c8d0-4c4a-bd21-47ef78e5e696
Record created	2017-09-06
Record modified	2020-03-16

Date modified: 2025-02-05