| Download | - View the accepted manuscript: Phrase clustering for smoothing TM probabilities – or, how to extract paraphrases from phrase tables (PDF, 687 KiB)
|
|---|
| Author | Kuhn, Roland¹; Chen, Boxing¹; Foster, George¹; Stratford, Evan |
|---|
| Affiliation | - National Research Council Canada. NRC Institute for Information Technology
|
|---|
| Format | Text, Article |
|---|
| Conference | The 23rd International Conference on Computational Linguistics (COLING 2010), August 23-27, 2010, Beijing, China |
|---|
| Subject | Information and Communication Technologies |
|---|
| Abstract | This paper describes how to cluster together the phrases of a phrase-based statistical machine translation (SMT) system, using information in the phrase table itself. The clustering is symmetric and recursive: it is applied both to source-language and target-language phrases, and the clustering in one language helps determine the clustering in the other. The phrase clusters have many possible uses. This paper looks at one of these uses: smoothing the conditional translation model (TM) probabilities employed by the SMT system. We incorporated phrase-cluster-derived probability estimates into a baseline loglinear feature combination that included relative frequency and lexically-weighted conditional probability estimates. In Chinese-English (C-E) and French-English (F-E) learning curve experiments, we obtained a gain over the baseline in 29 of 30 tests, with a maximum gain of 0.55 BLEU points (though most gains were fairly small). The largest gains came with medium (200-400K sentence pairs) rather than with small (less than 100K sentence pairs) amounts of training data, contrary to what one would expect from the paraphrasing literature. We have only begun to explore the original smoothing approach described here. |
|---|
| Publication date | 2010-08-27 |
|---|
| In | |
|---|
| Language | English |
|---|
| Peer-reviewed publication | Yes |
|---|
| NPARC number | 15736686 |
|---|
| Record identifier | 68e35bd5-b0b9-4e25-8be2-36e382b8aa1b |
|---|
| Record created | 2010-07-05 |
|---|
| Record modified | 2020-04-17 |
|---|