Controversy and Consensus in News Hotspots: Using Outliers to Identify Emerging Trends in Press Corpora
Abstract
Topic modeling, a cornerstone of text mining, uses unsupervised machine learning to uncover clusters of semantically related words in textual documents. However, controversial or polarizing topics, especially in news media, complicate the task by producing outliers, i.e., data points that deviate significantly from the identified clusters. The problem is particularly acute for fake news, which propagates biased or false information and thereby intensifies controversy and polarization. Yet outliers can also reveal emerging trends and shifts in discourse, which makes their study critical in contentious domains.
This paper presents a pilot study on the evolution of outliers in French news corpora documenting a controversy over environmental responsibility. Using topic-based clustering with multiple vector embedding models, we examine whether outliers turn into topics over time or remain outliers. We also investigate the linguistic features driving these patterns, offering insights into the dynamics of discourse in news media.
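To make the notion of an outlier "transitioning into a topic" concrete, here is a minimal sketch, not the authors' method. It assumes that documents are already embedded and assigned to clusters, that an outlier is any document whose cosine similarity to its nearest topic centroid falls below a threshold, and that an outlier from period t has become a topic if some topic centroid in period t+1 is close enough to it. The function names, the centroid-based definition and the 0.9 threshold are all hypothetical illustrations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def topic_centroids(embeddings, labels):
    """Map each cluster label to the centroid of the documents carrying it."""
    groups = {}
    for vec, lab in zip(embeddings, labels):
        groups.setdefault(lab, []).append(vec)
    return {lab: centroid(vecs) for lab, vecs in groups.items()}


def find_outliers(embeddings, centroids, threshold):
    """Indices of documents whose best centroid similarity is below threshold.

    Hypothetical definition: 'deviates significantly from the identified
    clusters' is read as 'not similar enough to any topic centroid'.
    """
    outliers = []
    for i, vec in enumerate(embeddings):
        best = max(cosine(vec, c) for c in centroids.values())
        if best < threshold:
            outliers.append(i)
    return outliers


def track_outliers(outlier_vectors, next_centroids, threshold):
    """For each outlier of period t, return the period-(t+1) topic label it
    now falls into, or None if it still matches no topic (remains an outlier)."""
    result = []
    for vec in outlier_vectors:
        label, sim = max(
            ((lab, cosine(vec, c)) for lab, c in next_centroids.items()),
            key=lambda pair: pair[1],
        )
        result.append(label if sim >= threshold else None)
    return result
```

As a toy usage example, with period-1 embeddings `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [0.7, 0.7]]` and labels `[0, 0, 1, 1, 0]`, `find_outliers` at threshold 0.9 flags only the last document. If period 2 contains a new cluster around `[0.71, 0.695]`, `track_outliers` maps that former outlier to the new topic. A centroid threshold keeps the sketch dependency-free. Density-based clusterers such as HDBSCAN instead mark noise points directly with the label -1, which would replace `find_outliers` in a real pipeline.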
Authors: Evangelia Zve, Benjamin Icard, Alice Breton, Lila Sainero, Gauvain Bourgne and Jean-Gabriel Ganascia
Link to the article: