Download | - View the final version: Beyond correlation: making sense of the score differences of new MT evaluation metrics (PDF, 1.6 MiB)
|
---|
Author | Lo, Chi-kiu; Knowles, Rebecca; Goutte, Cyril |
---|
Author affiliation | - National Research Council Canada. Digital Technologies
|
---|
Format | Text, Article |
---|
Conference | Machine Translation Summit (MTS) 2023, September 4–8, 2023, Macau SAR, China |
---|
Abstract | While many new automatic metrics for machine translation evaluation have been proposed in recent years, BLEU scores are still used as the primary metric in the vast majority of MT research papers. There are many reasons that researchers may be reluctant to switch to new metrics, from external pressures (reviewers, prior work) to the ease of use of metric toolkits. Another reason is a lack of intuition about the meaning of novel metric scores. In this work, we examine “rules of thumb” about metric score differences and how they do (and do not) correspond to human judgments of statistically significant differences between systems. In particular, we show that common rules of thumb about BLEU score differences do not in fact guarantee that human annotators will find significant differences between systems. We also show ways in which these rules of thumb fail to generalize across translation directions or domains. |
---|
Publication date | 2023-09-08 |
---|
Publisher | Asia-Pacific Association for Machine Translation |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
Record identifier | 250b820e-150c-477e-8a6c-da1bb4ee65ea |
---|
Record created | 2023-09-18 |
---|
Record modified | 2023-09-20 |
---|