WALS RoBERTa Sets 37-70.zip
The "RoBERTa" designation suggests that this data has been pre-processed or formatted for use with the RoBERTa (Robustly Optimized BERT Pretraining Approach) language model, likely for tasks such as cross-lingual transfer or testing a model's metalinguistic knowledge:

Cross-lingual transfer: Leveraging the broad cross-linguistic data in WALS to improve how models handle the hundreds of languages that lack large amounts of training text.
Typological knowledge: Testing whether models like RoBERTa or XLM-RoBERTa have "learned" the typological rules of specific languages during pre-training.
Probing: Using WALS features as labels to see whether a model's internal representations (embeddings) cluster according to known linguistic traits, such as whether a language uses definite articles.

Included Linguistic Features (Chapters 37–70)

Pronouns and demonstratives: Inclusive/exclusive distinctions (39A–40A), distance contrasts in demonstratives (41A), and third-person pronouns (43A).
Numerals: Ordinal (53A) and distributive (54A) numerals, and numeral classifiers (55A).

Nominal Syntax (Chapters 58–64)

Possession: Obligatory possessive inflection (58A) and possessive classification (59A).
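
The probing idea mentioned in this description (WALS feature values used as labels over a model's embeddings) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used to build or evaluate this archive: the language names, the three-dimensional vectors, and the two label strings are all made up, and a nearest-centroid classifier stands in for whatever probe a real study would use. In practice each vector would be something like a mean-pooled embedding from XLM-RoBERTa, and each label a language's value for a WALS feature such as 37A (Definite Articles).

```python
from collections import Counter, defaultdict
from math import dist

# Toy, invented data for illustration only. Real inputs would be
# per-language embeddings from a model and per-language WALS feature values.
EMBEDDINGS = {
    "lang_a": [0.90, 0.10, 0.00],
    "lang_b": [0.80, 0.20, 0.10],
    "lang_c": [0.85, 0.00, 0.20],
    "lang_d": [0.10, 0.90, 0.30],
    "lang_e": [0.20, 0.80, 0.10],
    "lang_f": [0.00, 0.95, 0.20],
}
# Hypothetical, simplified label strings (not the actual WALS value names).
LABELS = {
    "lang_a": "definite_article",
    "lang_b": "definite_article",
    "lang_c": "definite_article",
    "lang_d": "no_definite_article",
    "lang_e": "no_definite_article",
    "lang_f": "no_definite_article",
}


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict(train_embeddings, labels, query):
    """Return the label whose class centroid is closest to the query vector."""
    by_label = defaultdict(list)
    for lang, vec in train_embeddings.items():
        by_label[labels[lang]].append(vec)
    centroids = {lab: centroid(vecs) for lab, vecs in by_label.items()}
    return min(centroids, key=lambda lab: dist(centroids[lab], query))


def leave_one_out_accuracy(embeddings, labels):
    """Hold out each language in turn and classify it from the others."""
    correct = 0
    for held_out, query in embeddings.items():
        train = {l: v for l, v in embeddings.items() if l != held_out}
        if predict(train, labels, query) == labels[held_out]:
            correct += 1
    return correct / len(embeddings)


def majority_baseline(labels):
    """Accuracy obtained by always guessing the most frequent label."""
    counts = Counter(labels.values())
    return max(counts.values()) / len(labels)


if __name__ == "__main__":
    print("probe accuracy:", leave_one_out_accuracy(EMBEDDINGS, LABELS))
    print("majority baseline:", majority_baseline(LABELS))
```

The comparison against the majority baseline is the point of the exercise: a probe only suggests that a typological trait is encoded if it clearly beats always guessing the most common value. The toy vectors here are separated by construction, so the score on them says nothing about any real model.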