Determining Perceived Text Complexity: An Evaluation of German Sentences Through Student Assessments
Tailoring written texts to a specific audience is of particular importance in settings where the embedded information affects decision-making. Existing methods for measuring text complexity commonly rely on quantitative linguistic features and ignore differences in the readers' backgrounds. In this paper, we evaluate several machine learning models that determine the complexity of texts as perceived by teenagers in high school prior to deciding on their postsecondary pathways. The models are trained on data collected at German schools where a total of 3262 German sentences were annotated by 157 students with different demographic characteristics, school grades, and language abilities. In contrast to existing methods of determining text complexity, we build a model that is specialized to behave like the target audience, thereby accounting for the diverse backgrounds of the readers. We show that text complexity models benefit from including person-related features and that K-Nearest-Neighbors and ensemble models perform well in predicting the subjectively perceived text complexity. Furthermore, SHapley Additive exPlanation (SHAP) values reveal that these perceptions differ not only by the text's linguistic features but also by the students' math and language skills and by gender.
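As a rough illustration of the idea of combining linguistic and person-related features, the following minimal sketch runs a nearest-neighbor prediction over concatenated sentence and reader features. It is not the authors' implementation: the feature names, the scaling to [0, 1], the 1-5 rating scale, the averaging of neighbor ratings, and all data values are assumptions invented for this example.

```python
import math

def combine(sentence_features, reader_features):
    """Concatenate linguistic and person-related features into one vector."""
    return list(sentence_features) + list(reader_features)

def knn_predict(train, query, k=3):
    """Average the ratings of the k training points closest to the query."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(rating for _, rating in nearest) / len(nearest)

# Hypothetical toy annotations (NOT data from the paper).
# Sentence features: [scaled length, scaled average word length]
# Reader features:   [scaled language skill]
# Rating: perceived complexity, 1 (easy) to 5 (hard)
train = [
    (combine([0.2, 0.3], [0.9]), 1.0),
    (combine([0.3, 0.2], [0.8]), 1.0),
    (combine([0.8, 0.9], [0.9]), 3.0),
    (combine([0.8, 0.9], [0.2]), 5.0),
    (combine([0.9, 0.8], [0.1]), 5.0),
    (combine([0.2, 0.3], [0.1]), 2.0),
]

# The same long sentence, rated for two different hypothetical readers.
hard_sentence = [0.8, 0.9]
strong_reader = knn_predict(train, combine(hard_sentence, [0.9]), k=2)
weak_reader = knn_predict(train, combine(hard_sentence, [0.2]), k=2)
print(strong_reader, weak_reader)
```

In this toy setup the prediction for one and the same sentence changes with the reader's features, which is the behavior a purely linguistic complexity score cannot express.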
Thome, B., F. Hertweck and S. Conrad (2024), Determining Perceived Text Complexity: An Evaluation of German Sentences Through Student Assessments. Proceedings of the 17th International Conference on Educational Data Mining, July, 714-721