N-gram and neural models for Uralic language identification: NRC at VarDial 2021

By National Research Council Canada

Download	View the final version: N-gram and neural models for Uralic language identification: NRC at VarDial 2021 (PDF, 377 KiB)
Authors	Bernier-Colborne, Gabriel¹; Léger, Serge¹; Goutte, Cyril¹
Affiliation	¹ National Research Council Canada. Digital Technologies
Format	Text, Article
Conference	8th VarDial Workshop on NLP for Similar Languages, Varieties and Dialects, April 20, 2021, held virtually
Abstract	We describe the systems developed by the National Research Council Canada for the Uralic language identification shared task at the 2021 VarDial evaluation campaign. We evaluated two different approaches to this task: a probabilistic classifier exploiting only character 5-grams as features, and a character-based neural network pre-trained through self-supervision, then fine-tuned on the language identification task. The former method turned out to perform better, which casts doubt on the usefulness of deep learning methods for language identification, where they have yet to convincingly and consistently outperform simpler and less costly classification algorithms exploiting n-gram features.
Publication date	2021-04-20
Publisher	Association for Computational Linguistics
License	Creative Commons Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/deed.fr
In	Proceedings of the Eighth VarDial Workshop on NLP for Similar Languages, Varieties and Dialects (April 20, 2021): 128–134.
Language	English
Peer-reviewed publication	Yes
Record identifier	ff9f3cc0-73cd-4026-bcf7-3b60e247330d
Record created	2021-08-03
Record modified	2021-08-04
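The abstract describes the better-performing system only as a probabilistic classifier that uses character 5-grams as its sole features. To make that idea concrete, the following is a minimal sketch of one classifier in that family, assuming a multinomial naive Bayes model with additive smoothing. The model choice, the hypothetical names `char_ngrams` and `CharNgramNB`, the single-space padding, and the toy Finnish/Estonian sentences are all illustrative assumptions. This is not the NRC system, whose probabilistic model, smoothing, and preprocessing may differ.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=5):
    """Overlapping character n-grams; one space of padding at each end
    (the padding is an assumption, so word boundaries become features)."""
    padded = " " + text + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNB:
    """Hypothetical multinomial naive Bayes over character n-grams."""

    def __init__(self, n=5, alpha=1.0):
        self.n = n
        self.alpha = alpha  # additive (Lidstone) smoothing constant

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter(labels)   # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, self.n))
        self.vocab = set()
        for c in self.counts.values():
            self.vocab.update(c)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_scores(self, text):
        """Unnormalized log posterior: log prior + sum of smoothed log likelihoods."""
        grams = Counter(char_ngrams(text, self.n))
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for lab, c in self.counts.items():
            score = math.log(self.doc_counts[lab] / n_docs)
            denom = self.totals[lab] + self.alpha * v
            for g, k in grams.items():
                # Counter returns 0 for n-grams never seen with this label
                score += k * math.log((c[g] + self.alpha) / denom)
            scores[lab] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data (illustrative only, not shared-task data)
texts = ["minä olen kotona tänään", "hyvää päivää kaikille",
         "ma olen täna kodus", "head päeva kõigile"]
labels = ["fin", "fin", "est", "est"]
model = CharNgramNB(n=5).fit(texts, labels)
print(model.predict("olen kotona"))      # fin
print(model.predict("olen täna kodus"))  # est
```

A model of this kind needs only n-gram counts per language, which is consistent with the abstract's point that such classifiers are far cheaper to train than a pre-trained neural network.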

Date modified: 2024-09-30