Abstract: A key aspect of cognitive diagnostic models is the specification of the Q-matrix associating test items with underlying student attributes. In many data-driven approaches, items are mapped to latent knowledge components (KCs) based on observed student performance, with little or no input from human experts. As a result, these latent skills typically model the data accurately but may be hard to describe and interpret. In this paper, we focus on the problem of describing these knowledge components. Using a simple probabilistic model, we extract from the text of the test items the keywords that are most relevant to each KC. On a small dataset from the PSLC DataShop, we show that this approach is surprisingly effective, retrieving unknown skill labels in close to 50% of cases. We also show that our method clearly outperforms typical baselines in specificity and diversity.
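The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of the underlying idea: score each word by how much more likely it is to appear in items mapped to a KC than in the corpus as a whole (a smoothed log-odds relevance score), and keep the top-scoring words as that KC's keywords. The function name `keywords_per_kc`, the toy items, and the KC labels below are hypothetical illustrations, not the paper's actual method or data.

```python
from collections import Counter, defaultdict
import math

def keywords_per_kc(item_texts, item_to_kcs, top_k=5, alpha=1.0):
    """Rank keywords for each KC by a smoothed log-odds relevance score:
    log P(word | KC) - log P(word | corpus).

    item_texts:  dict item_id -> item text (str)
    item_to_kcs: dict item_id -> iterable of KC labels assigned to that item
    """
    corpus_counts = Counter()
    kc_counts = defaultdict(Counter)

    # Accumulate word counts per KC and for the whole corpus.
    for item_id, text in item_texts.items():
        words = text.lower().split()
        corpus_counts.update(words)
        for kc in item_to_kcs.get(item_id, ()):
            kc_counts[kc].update(words)

    vocab = len(corpus_counts)
    total = sum(corpus_counts.values())

    keywords = {}
    for kc, counts in kc_counts.items():
        kc_total = sum(counts.values())
        # Additive (Laplace) smoothing keeps rare words from dominating.
        scores = {
            w: math.log((counts[w] + alpha) / (kc_total + alpha * vocab))
               - math.log((corpus_counts[w] + alpha) / (total + alpha * vocab))
            for w in counts
        }
        keywords[kc] = [w for w, _ in sorted(scores.items(),
                                             key=lambda x: x[1],
                                             reverse=True)[:top_k]]
    return keywords


if __name__ == "__main__":
    # Toy example with two hypothetical KCs ("geometry", "algebra").
    items = {
        "q1": "Find the area of the triangle with base 4 and height 3",
        "q2": "Compute the area of a rectangle of width 5 and length 2",
        "q3": "Solve the linear equation 2x + 3 = 7 for x",
    }
    q_matrix = {"q1": ["geometry"], "q2": ["geometry"], "q3": ["algebra"]}
    print(keywords_per_kc(items, q_matrix))
```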