Download | Final version: NLP Scholar: a dataset for examining the state of NLP research (PDF, 1.2 MiB) |
Author | Mohammad, Saif M. |
Affiliation | National Research Council of Canada. Digital Technologies |
Format | Text, Article |
Conference | 12th Conference on Language Resources and Evaluation [Held Virtually], May 11-16, 2020, Marseille, France |
Subject | scientometrics; trends in research; Google Scholar; ACL Anthology; citations |
Abstract | Google Scholar is the largest web search engine for academic literature that also provides access to rich metadata associated with the papers. The ACL Anthology (AA) is the largest repository of articles on Natural Language Processing (NLP). We extracted information from AA for about 44 thousand NLP papers and identified authors who published at least three papers in AA. We then extracted citation information from Google Scholar for all their papers (not just their AA papers). This resulted in a dataset of 1.1 million papers and associated Google Scholar information. We aligned the information in the AA and Google Scholar datasets to create the NLP Scholar Dataset—a single unified source of information (from both AA and Google Scholar) for tens of thousands of NLP papers. NLP Scholar can be used to identify broad trends in productivity, focus, and impact of NLP research. We present here initial work on analyzing the volume of research in NLP over the years and identifying the most cited papers in NLP. We also list a number of additional potential applications. |
Publication date | 2020-05-16 |
Publisher | European Language Resources Association (ELRA) |
Language | English |
Peer reviewed | Yes |
Record identifier | df3a2ffa-2418-4adc-9fe3-e9adc2aac0f3 |
Record created | 2020-11-02 |
Record modified | 2020-11-06 |