Title | Experiments in discriminating similar languages |
---|
Author | Goutte, Cyril; Léger, Serge |
---|
Affiliation | National Research Council of Canada. Information and Communication Technologies |
---|
Format | Text, Article |
---|
Conference | LT4VarDial - Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, September 10th, 2015, Hissar, Bulgaria |
---|
Abstract | We describe the system built by the National Research Council (NRC) Canada for the 2015 shared task on Discriminating between similar languages. The NRC system uses various statistical classifiers trained on character and word ngram features. Predictions rely on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. This year, we focused on two issues: 1) the ngram generation process, and 2) the handling of the anonymized (“blinded”) Named Entities. Despite the slightly harder experimental conditions this year, our systems achieved an average accuracy of 95.24% (closed task) and 95.65% (open task), ending up second or (close) third on the closed task, and first on the open task. |
---|
Publication date | 2015-09 |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
NPARC number | 21276326 |
---|
Record identifier | 884cac9b-7d70-4078-9542-3f5980852d99 |
---|
Record created | 2015-10-02 |
---|
Record modified | 2020-06-04 |
---|
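The two-stage prediction described in the abstract (first predict the language group, then discriminate between languages or variants within that group) can be illustrated with a minimal sketch. This is not the NRC system: the abstract does not name the classifiers used, so a small Laplace-smoothed naive Bayes over character n-grams is swapped in here, and word n-grams and the handling of blinded Named Entities are left out. All class and function names, the language codes, the group labels, and the four toy sentences are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams (n_min..n_max) of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative stand-in classifier)."""

    def __init__(self):
        self.gram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.gram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.gram_counts.items():
            # Add-one smoothing so unseen n-grams do not zero out a class.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / n_docs)
            score += sum(math.log((counts[g] + 1) / denom) for g in grams)
            scores[label] = score
        return max(scores, key=scores.get)


class TwoStageClassifier:
    """Stage 1 picks a language group; stage 2 picks a language within that group."""

    def __init__(self, group_of):
        self.group_of = group_of          # hypothetical mapping: language -> group
        self.group_clf = NgramNaiveBayes()
        self.lang_clfs = {}               # group -> within-group classifier

    def fit(self, texts, langs):
        self.group_clf.fit(texts, [self.group_of[lang] for lang in langs])
        by_group = defaultdict(lambda: ([], []))
        for text, lang in zip(texts, langs):
            g_texts, g_langs = by_group[self.group_of[lang]]
            g_texts.append(text)
            g_langs.append(lang)
        self.lang_clfs = {g: NgramNaiveBayes().fit(t, l)
                          for g, (t, l) in by_group.items()}
        return self

    def predict(self, text):
        group = self.group_clf.predict(text)
        return self.lang_clfs[group].predict(text)


# Toy data invented for this sketch; real training sets are far larger.
GROUP_OF = {"cz": "cz-sk", "sk": "cz-sk", "es-AR": "es", "es-ES": "es"}
TRAIN = [
    ("to je dobrý den", "cz"),
    ("to je dobrý deň", "sk"),
    ("vos tenés razón", "es-AR"),
    ("tú tienes razón", "es-ES"),
]
texts, langs = zip(*TRAIN)
model = TwoStageClassifier(GROUP_OF).fit(texts, langs)
print(model.predict("dobrý deň"))
```

One consequence of this design is that an error in the group stage cannot be recovered in the second stage, since each within-group classifier only knows the languages of its own group.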