Frontiers of Data & Computing ›› 2023, Vol. 5 ›› Issue (2): 136-149.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.02.011

doi: 10.11871/jfdc.issn.2096-742X.2023.02.011

• Technology and Application •

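An Empirical Study of the “Content Convergence Gravity” Phenomenon and Its Effects in Social Networks: Weibo Data Mining Based on Word2vec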

XU Xiang*, ZHANG Lingyuan, WANG Yuchen

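  1. Research Center for Big Data and Computational Communication, College of Arts and Media, Tongji University, Shanghai 201804, China
  • Received: 2022-02-14 Online: 2023-04-20 Published: 2023-04-24
  • Corresponding author: XU Xiang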
  • About the author: XU Xiang is a professor and associate dean of the College of Arts and Media, Tongji University, and director of the Research Center for Big Data and Computational Communication. His main research interest is social media mining.
    In this paper, he was responsible for proposing the ideas, designing the research scheme, data collection and text mining, and revising and finalizing the manuscript.
    E-mail: xuxiang@tongji.edu.cn
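  • Funding:
    Shanghai “Science and Technology Innovation Action Plan” Soft Science Research Project “From Individual Preferences to a Shared Cage: Formation Mechanism and Regulation Strategies of Public Information Cocoons on Recommendation-Algorithm Platforms” (23692110600)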

An Empirical Study of the Phenomenon and Effect of “Content Convergence Gravity” in Social Networks: Data Mining of Sina Microblog Based on Word2vec

XU Xiang*, ZHANG Lingyuan, WANG Yuchen

  1. Research Center for Big Data and Computational Communication, College of Arts and Media, Tongji University, Shanghai 201804, China
  • Received:2022-02-14 Online:2023-04-20 Published:2023-04-24
  • Contact: XU Xiang

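Abstract:

[Objective] This study explicitly proposes the phenomenon of “content convergence gravity” and its analytical dimensions in order to examine information convergence and its characteristics in social network environments. [Methods] A sample of 14,111,274 valid posts was crawled from Sina Weibo, and Word2vec and other text and semantic mining techniques were used to examine, within a given time period, the pairwise similarity between each information layer and all layers after the content is partitioned into layers of equal propagation popularity, as well as the relationship between this similarity and the popularity of each layer. [Results] The content similarity between any two information units G1 and G2 is proportional to the sum of their popularity (H1+H2). The phenomenon and effect of “content convergence gravity” hold both at the fine-grained micro level, from single posts to multiple posts, and at the large scale of layer groups. [Limitations] The structural consequences of content convergence in social networks and the laws of its evolution still lack more specific and in-depth analysis. [Conclusions] The perspective of the “content convergence gravity” effect expands the theoretical possibilities and practical predictive value for the study of information circulation on Weibo, while also pointing to the information risks of “opinion extremism” contexts and an “anti-public sphere” in the information society.

Keywords: social media, content convergence, user-generated content, homogenization, text mining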

Abstract:

[Objective] This study explicitly proposes the phenomenon and analytical dimensions of “content convergence gravity” in order to analyze the convergence effect of information in the social media field, as represented by Sina Microblog. [Methods] A total of 14,111,274 valid sample posts were collected from Sina Microblog. After dividing the content into “spectrums” (levels) of different popularity, we used Word2vec and other text mining methods to examine, for a specific period of time, the similarity between each information level and all other levels, and the relationship between these similarities and the popularity of each level. [Results] The content similarity between any two information units G1 and G2 is proportional to the sum of the popularity of the two units (H1+H2). The effect of “content convergence gravity” holds at scales ranging from a single post to multiple posts, and from the microscopic, fine-grained level to large-scale level groups. [Limitations] A more specific and in-depth analysis of the structural consequences and evolution laws of content convergence in social networks is still lacking. [Conclusions] The perspective of “content convergence gravity” expands the theoretical possibilities and practical predictability for microblog information circulation, but it also points to the potential information risks of an “extreme public opinion” context and an “anti-public sphere”.

Key words: social media, content convergence, user-generated content, homogeneity, text mining
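
The level-by-level comparison described in the Methods can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not say how a level is turned into a vector or which similarity measure is used, so the sketch assumes each level is represented by the mean of its word vectors and compared by cosine similarity. The toy `WORD_VECTORS` table stands in for a trained Word2vec model, and all names, tokens, and popularity values are hypothetical.

```python
import math
from itertools import combinations


def layer_vector(posts, word_vectors):
    """Average the vectors of all in-vocabulary tokens in a level's posts."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    count = 0
    for tokens in posts:
        for tok in tokens:
            vec = word_vectors.get(tok)
            if vec is None:
                continue  # skip out-of-vocabulary tokens
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        raise ValueError("level has no in-vocabulary tokens")
    return [t / count for t in total]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def similarity_vs_heat(layers, word_vectors):
    """For every pair of levels, return (H1 + H2, cosine similarity)."""
    vecs = [(heat, layer_vector(posts, word_vectors)) for heat, posts in layers]
    return [(h1 + h2, cosine(v1, v2))
            for (h1, v1), (h2, v2) in combinations(vecs, 2)]


# Hypothetical 2-dimensional word vectors (assumption: a real study would
# load vectors from a Word2vec model trained on the post corpus).
WORD_VECTORS = {
    "news": [1.0, 0.0],
    "star": [0.8, 0.6],
    "cat": [0.0, 1.0],
    "film": [0.6, 0.8],
}

# Hypothetical levels: (popularity H, list of tokenized posts).
LAYERS = [
    (1, [["cat"], ["cat", "film"]]),
    (5, [["film", "star"], ["cat", "star"]]),
    (10, [["news", "star"], ["star", "film", "news"]]),
]

pairs = similarity_vs_heat(LAYERS, WORD_VECTORS)
r = pearson([h for h, _ in pairs], [s for _, s in pairs])
for heat_sum, sim in pairs:
    print(f"H1+H2={heat_sum:>2}  similarity={sim:.3f}")
print(f"Pearson r between H1+H2 and similarity: {r:.3f}")
```

With real data, the pairs returned by `similarity_vs_heat` are the quantities the Results relate to each other: the summed popularity H1+H2 of two units and their content similarity. Three toy levels give only three pairs, so the printed correlation shows the calculation and says nothing about the reported effect.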