| Download | View accepted manuscript: The NRC System for Discriminating Similar Languages (PDF, 533 KiB) |
|---|
| Author | Goutte, Cyril; Léger, Serge; Carpuat, Marine |
|---|
| Affiliation | National Research Council Canada. Information and Communication Technologies |
|---|
| Format | Text, Article |
|---|
| Conference | First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, August 23-29, 2014, Dublin, Ireland |
|---|
| Abstract | We describe the system built by the National Research Council Canada for the "Discriminating between similar languages" (DSL) shared task. Our system uses various statistical classifiers and makes predictions based on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. Language groups are predicted using a generative classifier with 99.99% accuracy on the five target groups. Within each group (except English), we use a voting combination of discriminative classifiers trained on a variety of feature spaces, achieving an average accuracy of 95.71%, with per-group accuracy between 90.95% and 100% depending on the group. This approach turns out to reach the best performance among all systems submitted to the open and closed tasks. |
|---|
| Publication date | 2014-08-23 |
|---|
| Language | English |
|---|
| NPARC number | 21275282 |
|---|
| Record identifier | bd4a662e-ed67-47ef-8165-abde04de494c |
|---|
| Record created | 2015-05-28 |
|---|
| Record modified | 2020-06-04 |
|---|