Download | View accepted manuscript: The NRC System for Discriminating Similar Languages (PDF, 533 KiB) |
---|
Author | Goutte, Cyril; Léger, Serge; Carpuat, Marine |
---|
Affiliation | National Research Council of Canada. Information and Communication Technologies |
---|
Format | Text, Article |
---|
Conference | First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, August 23-29, 2014, Dublin, Ireland |
---|
Abstract | We describe the system built by the National Research Council Canada for the "Discriminating between similar languages" (DSL) shared task. Our system uses various statistical classifiers and makes predictions based on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. Language groups are predicted using a generative classifier with 99.99% accuracy on the five target groups. Within each group (except English), we use a voting combination of discriminative classifiers trained on a variety of feature spaces, achieving an average accuracy of 95.71%, with per-group accuracy between 90.95% and 100% depending on the group. This approach turns out to reach the best performance among all systems submitted to the open and closed tasks. |
---|
Publication date | 2014-08-23 |
---|
Language | English |
---|
NPARC number | 21275282 |
---|
Record identifier | bd4a662e-ed67-47ef-8165-abde04de494c |
---|
Record created | 2015-05-28 |
---|
Record modified | 2020-06-04 |
---|
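
The two-stage process summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifiers or feature spaces, so a multinomial naive Bayes model stands in for the generative group classifier, and three small perceptrons (over character bigrams, character trigrams and words) stand in for the discriminative classifiers that vote within a group. All class and function names, the group and language labels, and the toy strings are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n):
    """Return the list of character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Assumed feature spaces for the second stage (not taken from the paper).
FEATURE_SPACES = [lambda t: char_ngrams(t, 2), lambda t: char_ngrams(t, 3), str.split]

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (generative stand-in)."""
    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.priors = Counter(labels)          # unnormalized class counts
        self.counts = defaultdict(Counter)     # per-class feature counts
        for text, label in zip(texts, labels):
            self.counts[label].update(self.featurize(text))
        self.vocab = {f for c in self.counts.values() for f in c}
        return self

    def predict(self, text):
        feats = self.featurize(text)
        def score(label):
            total = sum(self.counts[label].values()) + len(self.vocab)
            return math.log(self.priors[label]) + sum(
                math.log((self.counts[label][f] + 1) / total) for f in feats)
        return max(sorted(self.priors), key=score)

class Perceptron:
    """Multi-class perceptron (discriminative stand-in)."""
    def __init__(self, featurize, epochs=10):
        self.featurize, self.epochs = featurize, epochs

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.w = {label: Counter() for label in self.labels}
        for _ in range(self.epochs):
            for text, gold in zip(texts, labels):
                guess = self.predict(text)
                if guess != gold:              # update weights only on mistakes
                    for f in self.featurize(text):
                        self.w[gold][f] += 1
                        self.w[guess][f] -= 1
        return self

    def predict(self, text):
        feats = self.featurize(text)
        return max(self.labels, key=lambda l: sum(self.w[l][f] for f in feats))

def majority_vote(votes):
    """Most frequent vote; ties go to the alphabetically first label."""
    return max(sorted(set(votes)), key=votes.count)

class TwoStageClassifier:
    """Stage 1 predicts the group; stage 2 votes on the language within it."""
    def fit(self, texts, groups, langs):
        self.group_clf = NaiveBayes(lambda t: char_ngrams(t, 2)).fit(texts, groups)
        self.voters = {}
        for g in set(groups):
            ts, ls = zip(*[(t, l) for t, gg, l in zip(texts, groups, langs) if gg == g])
            self.voters[g] = [Perceptron(fz).fit(ts, ls) for fz in FEATURE_SPACES]
        return self

    def predict(self, text):
        group = self.group_clf.predict(text)
        votes = [clf.predict(text) for clf in self.voters[group]]
        return group, majority_vote(votes)

# Made-up toy data: two groups ("x", "y") with two variants each.
TEXTS = ["aaa bab", "aaa bcb", "ddd eee", "ddd efe"]
GROUPS = ["x", "x", "y", "y"]
LANGS = ["x1", "x2", "y1", "y2"]

model = TwoStageClassifier().fit(TEXTS, GROUPS, LANGS)
print(model.predict("aaa bcb"))
```

One consequence of this design is that each second-stage voter only ever has to separate the few variants of a single group, so an error can come either from the group prediction or from the within-group vote.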