National Research Council of Canada. Information and Communication Technologies
MT Summit XIV Workshop on Post-editing Technology and Practice, September 2, 2013, Nice, France
Translation agencies are introducing statistical machine translation (SMT) into the work flow of human translators. Typically, SMT produces a first-draft translation, which is then post-edited by a person. SMT has met much resistance from translators, partly because of professional conservatism, but partly because the SMT community has often neglected some practical aspects of translation. Our paper discusses one of these: transferring formatting tags such as bold or italic from the source to the target document with a low error rate, thus freeing the post-editor from having to reformat SMT-generated text. In our “two-stream” approach, tags are stripped from the input to the decoder, then reinserted into the resulting target-language text. Tag transfer has been tackled by other SMT teams, but only a few have published descriptions of their work. This paper contributes to understanding tag transfer by explaining our approach in detail.
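The general shape of a two-stream pipeline can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: the function names `strip_tags` and `reinsert_tags` are hypothetical, the tag syntax is assumed to be simple HTML-like pairs without attributes, and the target tokens and the source-to-target word alignment are written by hand where a real system would obtain them from the decoder.

```python
import re
from typing import Dict, List, Tuple

# Assumption: tags are simple paired markers such as <b>...</b>, no attributes.
TAG_RE = re.compile(r"(</?[A-Za-z][A-Za-z0-9]*>)")

# (tag name, first source token index, one past the last source token index)
Span = Tuple[str, int, int]


def strip_tags(tagged: str) -> Tuple[List[str], List[Span]]:
    """Stream 1: plain tokens for the decoder. Stream 2: tag spans over those tokens."""
    tokens: List[str] = []
    spans: List[Span] = []
    stack: List[Tuple[str, int]] = []  # currently open tags and where they started
    for piece in TAG_RE.split(tagged):
        if TAG_RE.fullmatch(piece):
            name = piece.strip("</>")
            if piece.startswith("</"):
                if not stack or stack[-1][0] != name:
                    raise ValueError(f"unbalanced tag: {piece}")
                _, start = stack.pop()
                spans.append((name, start, len(tokens)))
            else:
                stack.append((name, len(tokens)))
        else:
            tokens.extend(piece.split())
    if stack:
        raise ValueError("unclosed tag in input")
    return tokens, spans


def reinsert_tags(target: List[str], spans: List[Span],
                  alignment: Dict[int, List[int]]) -> str:
    """Wrap each tag around the smallest target range covering its aligned tokens."""
    opens: Dict[int, List[str]] = {}
    closes: Dict[int, List[str]] = {}
    for name, start, end in spans:
        covered = [t for s in range(start, end) for t in alignment.get(s, [])]
        if not covered:
            continue  # no aligned target token; a real system needs a fallback here
        # Inner spans are recorded first, so outer opening tags go in front.
        opens.setdefault(min(covered), []).insert(0, f"<{name}>")
        closes.setdefault(max(covered), []).append(f"</{name}>")
    out = []
    for i, tok in enumerate(target):
        out.append("".join(opens.get(i, [])) + tok + "".join(closes.get(i, [])))
    return " ".join(out)


# Toy example: target tokens and alignment are hand-written stand-ins for decoder output.
src_tokens, src_spans = strip_tags("Press the <b>red button</b> now")
tgt_tokens = ["Appuyez", "maintenant", "sur", "le", "bouton", "rouge"]
src_to_tgt = {0: [0], 1: [3], 2: [5], 3: [4], 4: [1]}
print(reinsert_tags(tgt_tokens, src_spans, src_to_tgt))
# prints: Appuyez maintenant sur le <b>bouton rouge</b>
```

The sketch wraps each tag around the contiguous target range spanned by its aligned words. That is the simplest possible policy; it does not handle tagged words that end up separated in the target, or nested tags whose target ranges cross.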