I am currently developing an R package that enables researchers to explore their textual corpora using several measures from computational linguistics. The package calculates a range of word keyness metrics, which identify the “key” words characteristic of a target corpus when it is compared with a reference corpus. It supports both statistical significance measures (log-likelihood ratio, Bayesian information criterion) and effect size measures (%DIFF, relative risk, log ratio, odds ratio) of word keyness.
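Each of these measures compares a word's frequency in the target corpus with its frequency in the reference corpus. The following is a minimal sketch in Python, not the package's R implementation. It assumes the commonly cited definitions: expected frequencies from the pooled corpora, log-likelihood with one degree of freedom, BIC approximated as LL minus ln N, and log ratio as the base-2 logarithm of the ratio of relative frequencies. It also assumes the word occurs in both corpora, so no zero-frequency correction is applied. The function name `keyness` and its arguments are hypothetical.

```python
import math


def keyness(a: int, b: int, c: int, d: int) -> dict:
    """Keyness scores for a single word (hypothetical helper).

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    Assumes a > 0 and b > 0 (no zero-frequency correction).
    """
    n = c + d
    # Expected frequencies if the word were equally frequent in both corpora
    e1 = c * (a + b) / n
    e2 = d * (a + b) / n
    # Log-likelihood (G2), one degree of freedom
    ll = 2 * (a * math.log(a / e1) + b * math.log(b / e2))
    # BIC approximation for one degree of freedom: LL - ln(N)
    bic = ll - math.log(n)
    # Relative (normalised) frequencies in each corpus
    p1, p2 = a / c, b / d
    return {
        "log_likelihood": ll,
        "bic": bic,
        "pct_diff": (p1 - p2) * 100 / p2,
        "relative_risk": p1 / p2,
        "log_ratio": math.log2(p1 / p2),
        "odds_ratio": (a / (c - a)) / (b / (d - b)),
    }


scores = keyness(a=100, b=50, c=10_000, d=20_000)
# relative_risk ≈ 4.0, log_ratio ≈ 2.0, pct_diff ≈ 300.0
print({name: round(value, 3) for name, value in scores.items()})
```

The significance measures (log-likelihood, BIC) grow with corpus size, whereas the effect size measures depend only on the relative frequencies, which is why the two families are usually reported together.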
The current version, still under development, can be found on my GitHub, in the KeynessMeasures repository.
A second package calculates a range of F-scores and accuracy scores for multi-class and multi-label classification problems in R.
Its goal is to simplify the calculation of multi-class and multi-label classification measures in R. Currently, different R packages implement different types of averaging (for example, micro- versus macro-averaging) when calculating these measures. Especially for multi-label problems, this can make results difficult to interpret and to compare across applications. In the documentation, we clearly denote which metrics match those output by the sklearn.metrics functions (the most widely used implementations in Python), as well as those given by several R packages that offer multi-class (yardstick) or multi-label (HEMDAG and mlr) classification performance measures.
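To illustrate why the choice of averaging matters, here is a minimal Python sketch (not the package's R API) that compares macro- and micro-averaged F1 for a single-label multi-class problem. The helper `f1_scores` is hypothetical. It is intended to mirror the `average="macro"` and `average="micro"` options of `sklearn.metrics.f1_score`, assigns an F1 of 0 to any class whose denominator is zero, and does not cover multi-label averaging schemes.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); 0 when the class never appears
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, weighting classes equally
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool TP, FP and FN across classes, then compute one F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "a", "a", "b"]
macro, micro = f1_scores(y_true, y_pred)
print(round(macro, 3), round(micro, 3))
```

On this imbalanced toy example, macro-averaging gives every class equal weight, so the two missed minority classes pull the score down to about 0.296. Micro-averaging pools the counts and gives about 0.667, which equals plain accuracy in the single-label case.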
The current version, still under development, can be found on my GitHub, in the sklearnR.metrics repository.