| Field | Value |
|---|---|
| Download | Final version: Calibration and context in human evaluation of machine translation (PDF, 4.5 MiB); supplementary information (PDF, 10.1 MiB) |
| DOI | https://doi.org/10.1017/nlp.2024.5 |
| Authors | Knowles, Rebecca (ORCID: https://orcid.org/0000-0002-1647-584X); Lo, Chi-kiu (ORCID: https://orcid.org/0000-0001-8714-7846) |
| Affiliation | National Research Council of Canada, Digital Technologies |
| Format | Text, Article |
| Subject | machine translation; evaluation |
| Abstract | Human evaluation of machine translation is considered the "gold standard" for evaluation, but it remains a challenging task for which to define best practices. Recent work has focused on incorporating intersentential context into human evaluation, to better distinguish between high-performing machine translation systems and human translations. In this work, we examine several ways that such context influences evaluation and evaluation protocols. We take a close look at annotator variation through the lens of calibration sets and focus on the implications for context-aware evaluation protocols. We then demonstrate one way in which degraded target-side intersentential context can influence annotator scores of individual sentences, a finding that supports the context-aware approach to evaluation and which also has implications for best practices in evaluation protocols. |
| Publication date | 2024-06-03 |
| Publisher | Cambridge University Press (CUP) |
| Language | English |
| Peer reviewed | Yes |
| Record identifier | 38f5f3ec-1a13-4100-bb65-f21273d1bccb |
| Record created | 2024-06-04 |
| Record modified | 2024-06-04 |