Download | - View accepted manuscript: Experiments in discriminating similar languages (PDF, 531 KiB)
|
---|
Author | Goutte, Cyril; Léger, Serge |
---|
Affiliation | - National Research Council Canada. Information and Communication Technologies
|
---|
Format | Text, Article |
---|
Conference | LT4VarDial - Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, September 10th, 2015, Hissar, Bulgaria |
---|
Abstract | We describe the system built by the National Research Council (NRC) Canada for the 2015 shared task on Discriminating between similar languages. The NRC system uses various statistical classifiers trained on character and word ngram features. Predictions rely on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. This year, we focused on two issues: 1) the ngram generation process, and 2) the handling of the anonymized (“blinded”) Named Entities. Despite the slightly harder experimental conditions this year, our systems achieved an average accuracy of 95.24% (closed task) and 95.65% (open task), ending up second or (close) third on the closed task, and first on the open task. |
---|
Publication date | 2015-09 |
---|
Language | English |
---|
Peer-reviewed publication | Yes |
---|
NPARC number | 21276326 |
---|
Record identifier | 884cac9b-7d70-4078-9542-3f5980852d99 |
---|
Record created | 2015-10-02 |
---|
Record modified | 2020-06-04 |
---|