Daryna Dementieva

Friedrich Schiedel Fellow

Harmful Speech Proactive Moderation

Offensive speech remains a pervasive issue despite ongoing efforts, as underscored by recent EU regulations aimed at mitigating digital violence. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of hate speech. In this work, we advocate for a more comprehensive approach that assesses and classifies offensive speech into four new categories: (i) hate speech that can be prevented from being published by recommending a detoxified version; (ii) hate speech that necessitates counter speech initiatives to persuade the speaker; (iii) hate speech that should indeed be blocked or banned; and (iv) instances that require further human intervention.