| Download | View final version: Calibration and context in human evaluation of machine translation (PDF, 4.5 MiB); View supplementary information: Calibration and context in human evaluation of machine translation (PDF, 10.1 MiB) |
|---|
| DOI | https://doi.org/10.1017/nlp.2024.5 |
|---|
| Author | Knowles, Rebecca (ORCID: https://orcid.org/0000-0002-1647-584X); Lo, Chi-kiu (ORCID: https://orcid.org/0000-0001-8714-7846) |
|---|
| Affiliation | National Research Council Canada, Digital Technologies |
|---|
| Format | Text, Article |
|---|
| Subject | machine translation; evaluation |
|---|
| Abstract | Human evaluation of machine translation is considered the “gold standard” for evaluation, but it remains a challenging task for which to define best practices. Recent work has focused on incorporating intersentential context into human evaluation, to better distinguish between high-performing machine translation systems and human translations. In this work, we examine several ways that such context influences evaluation and evaluation protocols. We take a close look at annotator variation through the lens of calibration sets and focus on the implications for context-aware evaluation protocols. We then demonstrate one way in which degraded target-side intersentential context can influence annotator scores of individual sentences, a finding that supports the context-aware approach to evaluation and which also has implications for best practices in evaluation protocols. |
|---|
| Publication date | 2024-06-03 |
|---|
| Publisher | Cambridge University Press (CUP) |
|---|
| Language | English |
|---|
| Peer reviewed | Yes |
|---|
| Record identifier | 38f5f3ec-1a13-4100-bb65-f21273d1bccb |
|---|
| Record created | 2024-06-04 |
|---|
| Record modified | 2024-06-04 |
|---|