https://arxiv.org/abs/2212.00616 – extends the idea of textual inversion from image generation to text: pseudo-words are used to compactly describe some subset of a dataset (for example, a style of speech).
Mature benchmarks again. From the abstract: "We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest for LMs. Then we select a broad subset based on coverage and feasibility, noting what's missing or underrepresented (e.g. question answering for neglected English dialects, metrics for trustworthiness). Second, we adopt a multi-metric approach: We measure 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency) for each of 16 core scenarios when possible (87.5% of the time)." https://arxiv.org/abs/2211.09110
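As a quick sanity check on the HELM numbers quoted above, the coverage figure can be turned into a count of metric/scenario pairs. This is only a sketch: it assumes that "87.5% of the time" is the fraction of all (metric, scenario) pairs that were actually measured, and the variable names are made up for illustration.

```python
# Coverage arithmetic for the HELM figures quoted above.
# Assumption: 87.5% is the share of all (metric, scenario) pairs that were measured.
NUM_METRICS = 7      # accuracy, calibration, robustness, fairness, bias, toxicity, efficiency
NUM_SCENARIOS = 16   # core scenarios
COVERAGE = 0.875     # "when possible (87.5% of the time)"

total_pairs = NUM_METRICS * NUM_SCENARIOS        # every metric on every scenario
measured_pairs = round(total_pairs * COVERAGE)   # pairs where the metric could be measured
missing_pairs = total_pairs - measured_pairs     # pairs left unmeasured

print(total_pairs, measured_pairs, missing_pairs)  # prints: 112 98 14
```

Under that reading, 98 of the 112 possible pairs are covered and 14 are not.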