Download | View the final version: Dialect and variant identification as a multi-label classification task: a proposal based on near-duplicate analysis (PDF, 319 KiB) |
---|
DOI | https://doi.org/10.18653/v1/2023.vardial-1.15 |
---|
Authors | Bernier-Colborne, Gabriel¹; Goutte, Cyril¹; Leger, Serge¹ |
---|
Affiliation | ¹ National Research Council Canada. Digital Technologies |
---|
Format | Text, Article |
---|
Conference | Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), May 5-6, 2023, Dubrovnik, Croatia |
---|
Abstract | We argue that dialect identification should be treated as a multi-label classification problem rather than the single-class setting prevalent in existing collections and evaluations. In order to avoid extensive human re-labelling of the data, we propose an analysis of ambiguous near-duplicates in an existing collection covering four variants of French. We show how this analysis helps us provide multiple labels for a significant subset of the original data, therefore enriching the annotation with minimal human intervention. The resulting data can then be used to train dialect identifiers in a multi-label setting. Experimental results show that on the enriched dataset, the multi-label classifier produces similar accuracy to the single-label classifier on test cases that are unambiguous (single label), but it increases the macro-averaged F1-score by 0.225 absolute (71% relative gain) on ambiguous texts with multiple labels. On the original data, gains on the ambiguous test cases are smaller but still considerable (+0.077 absolute, 20% relative gain), and accuracy on unambiguous test cases is again similar. This supports our thesis that modelling dialect identification as a multi-label problem potentially has a positive impact. |
---|
Publication date | 2023-05-05 |
---|
Publisher | Association for Computational Linguistics |
---|
Language | English |
---|
Peer-reviewed publication | Yes |
---|
Record identifier | acc68ace-d317-45d0-bbea-243e601bbc0d |
---|
Record created | 2023-05-15 |
---|
Record modified | 2023-11-02 |
---|
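The abstract's central idea can be illustrated with a small sketch: texts that are near-duplicates of one another but carry different variant labels are grouped, and each member receives the union of the group's labels. Multi-label predictions are then scored with macro-averaged F1. This is a minimal sketch under stated assumptions, not the authors' method. The duplicate key (lowercasing, stripping punctuation, collapsing whitespace) is a crude stand-in for a real near-duplicate detector. The label codes `BE`, `CA`, `CH`, `FR`, the helper names `normalize`, `merge_labels` and `macro_f1`, and the convention of skipping labels that appear in neither gold nor predictions are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical codes for the four French variants (illustrative assumption).
LABELS = ("BE", "CA", "CH", "FR")


def normalize(text):
    """Crude near-duplicate key (assumption): lowercase, replace punctuation
    with spaces, collapse whitespace. Not a real similarity measure."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def merge_labels(examples):
    """Relabel each (text, label) pair with the union of the labels found in
    its near-duplicate group; returns (text, frozenset_of_labels) pairs."""
    groups = defaultdict(set)
    for text, label in examples:
        groups[normalize(text)].add(label)
    return [(text, frozenset(groups[normalize(text)])) for text, _ in examples]


def macro_f1(gold, pred, labels=LABELS):
    """Macro-averaged F1 over labels, for parallel lists of gold and predicted
    label sets. Labels absent from both gold and predictions are skipped
    (an assumed convention)."""
    scores = []
    for lab in labels:
        tp = sum(lab in g and lab in p for g, p in zip(gold, pred))
        fp = sum(lab not in g and lab in p for g, p in zip(gold, pred))
        fn = sum(lab in g and lab not in p for g, p in zip(gold, pred))
        if tp + fp + fn == 0:
            continue
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (invented for illustration): the first two texts differ only in
# case and punctuation but carry different single labels.
data = [
    ("Il fait beau aujourd'hui.", "FR"),
    ("il fait beau aujourd'hui", "BE"),
    ("On va magasiner ce soir.", "CA"),
]
merged = merge_labels(data)
gold = [labs for _, labs in merged]
single = [{label} for _, label in data]  # one label per text, as originally annotated

print(macro_f1(gold, single))  # approximately 0.778
print(macro_f1(gold, gold))    # 1.0
```

In this toy case the single-label annotation can never match the merged label set of the ambiguous pair, which lowers its macro-F1. A multi-label predictor that outputs both labels is not penalised in this way.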