Download | - View the final version: Lattice desegmentation for statistical machine translation (PDF, 301 KiB)
|
---|
DOI | https://doi.org/10.3115/v1/P14-1010 |
---|
Author | Salameh, Mohammad; Cherry, Colin¹; Kondrak, Grzegorz |
---|
Affiliation | - National Research Council of Canada. Information and Communication Technologies
|
---|
Format | Text, Article |
---|
Conference | 52nd Annual Meeting of the Association for Computational Linguistics, June 23-25, 2014, Baltimore, Maryland |
---|
Abstract | Morphological segmentation is an effective sparsity reduction strategy for statistical machine translation (SMT) involving morphologically complex languages. When translating into a segmented language, an extra step is required to desegment the output; previous studies have desegmented the 1-best output from the decoder. In this paper, we expand our translation options by desegmenting n-best lists or lattices. Our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs, which allows for inclusion of features related to the desegmentation process, as well as an unsegmented language model (LM). We investigate this technique in the context of English-to-Arabic and English-to-Finnish translation, showing significant improvements in translation quality over desegmentation of 1-best decoder outputs. |
---|
Publication date | 2014-06-25 |
---|
Publisher | Association for Computational Linguistics |
---|
In | |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
NPARC number | 21275904 |
---|
Record identifier | 72236846-b40a-4563-94bd-5d6d0bb299aa |
---|
Record created | 2015-07-31 |
---|
Record modified | 2020-06-02 |
---|