| Download | View final version: Calibration and context in human evaluation of machine translation (PDF, 4.5 MiB); View supplementary information: Calibration and context in human evaluation of machine translation (PDF, 10.1 MiB) |
|---|
| DOI | https://doi.org/10.1017/nlp.2024.5 |
|---|
| Author | Knowles, Rebecca (ORCID: https://orcid.org/0000-0002-1647-584X); Lo, Chi-kiu (ORCID: https://orcid.org/0000-0001-8714-7846) |
|---|
| Affiliation | National Research Council Canada, Digital Technologies |
|---|
| Format | Text, Article |
|---|
| Subject | machine translation; evaluation |
|---|
| Abstract | Human evaluation of machine translation is considered the “gold standard” for evaluation, but it remains a challenging task for which to define best practices. Recent work has focused on incorporating intersentential context into human evaluation, to better distinguish between high-performing machine translation systems and human translations. In this work, we examine several ways that such context influences evaluation and evaluation protocols. We take a close look at annotator variation through the lens of calibration sets and focus on the implications for context-aware evaluation protocols. We then demonstrate one way in which degraded target-side intersentential context can influence annotator scores of individual sentences, a finding that supports the context-aware approach to evaluation and which also has implications for best practices in evaluation protocols. |
|---|
| Publication date | 2024-06-03 |
|---|
| Publisher | Cambridge University Press (CUP) |
|---|
| Language | English |
|---|
| Peer reviewed | Yes |
|---|
| Record identifier | 38f5f3ec-1a13-4100-bb65-f21273d1bccb |
|---|
| Record created | 2024-06-04 |
|---|
| Record modified | 2024-06-04 |
|---|