| Field | Value |
|---|---|
| Download | Final version: Transfer learning improves French cross-domain dialect identification: NRC @ VarDial 2022 (PDF, 319 KiB) |
| Author | Bernier-Colborne, Gabriel; Leger, Serge; Goutte, Cyril |
| Affiliation | National Research Council of Canada, Digital Technologies |
| Format | Text, Article |
| Conference | Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2022), October 2022, Gyeongju, Republic of Korea |
| Abstract | We describe the systems developed by the National Research Council Canada for the French Cross-Domain Dialect Identification shared task at the 2022 VarDial evaluation campaign. We evaluated two approaches to this task: SVM and probabilistic classifiers that use n-grams as features and are trained from scratch on the provided data; and a pre-trained French language model, CamemBERT, which we fine-tuned on the dialect identification task. The latter approach improved the macro-F1 score on the test set from 0.344 to 0.430 (a 25% relative increase), which indicates that transfer learning can be helpful for dialect identification. |
| Publication date | 2022-10-06 |
| Publisher | Association for Computational Linguistics |
| Language | English |
| Peer reviewed | Yes |
| Record identifier | 7d0c4e22-ed47-4519-a0d3-0f1c1b25b516 |
| Record created | 2022-10-19 |
| Record modified | 2022-10-21 |
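The abstract reports results as macro-F1, the unweighted mean of per-class F1 scores. Under this metric each dialect counts equally, regardless of how many test examples it has. The sketch below is a minimal Python version written from that standard definition. It is not the shared-task scorer. The class set is taken as the union of gold and predicted labels, which matches scikit-learn's default for `average="macro"`. The variety codes and the toy gold/predicted lists are hypothetical illustration data, not VarDial data. The `relative_increase` helper checks the 25% figure quoted in the abstract.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[g] += 1          # missed an instance of class g
    scores = []
    for label in labels:
        p_denom = tp[label] + fp[label]
        r_denom = tp[label] + fn[label]
        precision = tp[label] / p_denom if p_denom else 0.0
        recall = tp[label] / r_denom if r_denom else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    # Each class contributes equally, regardless of its frequency.
    return sum(scores) / len(scores)


def relative_increase(old, new):
    """Relative change from old to new, e.g. 0.25 for a 25% increase."""
    return (new - old) / old


# Hypothetical toy labels (assumed variety codes), for illustration only.
gold = ["FR", "FR", "BE", "CH", "CA", "CA"]
pred = ["FR", "BE", "BE", "CH", "FR", "CA"]

print(round(macro_f1(gold, pred), 3))              # 0.708
print(round(relative_increase(0.344, 0.430), 2))   # 0.25
```

In the toy data, every class has one correct prediction, yet per-class F1 ranges from 0.5 to 1.0. Averaging without weights is what makes macro-F1 sensitive to performance on rarer varieties.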