A Distributed Recommender System Based on Graph Partition

doi: 10.11871/jfdc.issn.2096-742X.2024.05.010

Frontiers of Data and Computing, 2024, Vol. 6, Issue (5): 102-110.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.05.010


YANG Jinguang¹, XIONG Fei¹,*, GU Junyu²,³, XI Weiting⁴

1. Beijing Jiaotong University, Beijing 100044, China
2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
3. University of Chinese Academy of Sciences, Beijing 100049, China
4. North China Electric Power University, Beijing 100096, China

Received: 2023-01-15 Online: 2024-10-20 Published: 2024-10-21
Corresponding author: * XIONG Fei (E-mail: xiongf@bjtu.edu.cn)
About the authors:
YANG Jinguang is a master's student at Beijing Jiaotong University. His main research interests are artificial intelligence and recommender systems. In this paper, he is mainly responsible for model design and implementation of the model algorithm. E-mail: yangjg@bjtu.edu.cn
XIONG Fei is a Ph.D. supervisor at Beijing Jiaotong University. His main research interests are artificial intelligence, network content security, and recommender systems. In this paper, he is mainly responsible for guiding the optimization and design of the model. E-mail: xiongf@bjtu.edu.cn
Funding:
National Natural Science Foundation of China (61872033); National Natural Science Foundation of China (72004009); National Key Research and Development Program of China (2018YFC0832304); Beijing Nova Program (Z201100006820015)

Abstract

[Objective] Designing a recommender system that processes data efficiently is of great practical significance. [Methods] User preference relationships in the recommender system are modeled as a graph, which is then processed by a graph partition algorithm. Partitioning mines the information value of the data more deeply and yields load-balanced subgraphs, which serve as the input to a distributed system. Finally, fusion through an adaptive aggregation module yields a distributed recommender system. [Results] The system improves the efficiency with which the recommendation algorithm processes large-scale data. Without any loss of prediction accuracy, training on a cluster of 16 CPUs is 6.4 times as efficient as training on a single CPU. [Conclusions] The experimental results demonstrate the effectiveness of the system in terms of recommendation efficiency.

Key words: recommender system, graph partition, load balancing, distributed system
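
To make the phrase "load-balanced subgraphs" concrete, the following toy sketch splits a user-item interaction graph into partitions holding a similar number of edges. It is not the algorithm used in the paper, which this page does not specify. It assumes a simple greedy heuristic (heaviest user first, always into the lightest partition) and, unlike a real graph partitioner, makes no attempt to minimize the edges cut between partitions. The names `balanced_partition` and `interactions` are hypothetical.

```python
from collections import defaultdict
import heapq

def balanced_partition(interactions, n_parts):
    """Greedily split user-item edges into n_parts subgraphs of similar size.

    interactions: list of (user, item) pairs, the edges of the bipartite graph.
    Returns n_parts edge lists; all edges of one user stay in one partition.
    """
    edges_by_user = defaultdict(list)
    for user, item in interactions:
        edges_by_user[user].append((user, item))

    # Min-heap of (current edge count, partition index).
    heap = [(0, p) for p in range(n_parts)]
    heapq.heapify(heap)
    parts = [[] for _ in range(n_parts)]

    # Place heavy users first so that light users can even out the loads.
    for user in sorted(edges_by_user, key=lambda u: -len(edges_by_user[u])):
        load, p = heapq.heappop(heap)
        parts[p].extend(edges_by_user[user])
        heapq.heappush(heap, (load + len(edges_by_user[user]), p))
    return parts

# Hypothetical toy data: 5 users, 9 interactions.
interactions = [("u1", "a"), ("u1", "b"), ("u1", "c"),
                ("u2", "a"), ("u2", "d"),
                ("u3", "b"), ("u4", "c"), ("u4", "d"), ("u5", "e")]
parts = balanced_partition(interactions, 2)
sizes = [len(p) for p in parts]
print(sizes)
```

Each partition could then be handed to a separate worker for sub-model training, with the per-partition results merged afterwards by an aggregation step.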

YANG Jinguang, XIONG Fei, GU Junyu, XI Weiting. A Distributed Recommender System Based on Graph Partition[J]. Frontiers of Data and Computing, 2024, 6(5): 102-110, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2024.05.010.

Figures/Tables (13): Figures 1-7 and Tables 1-6


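Symbol	Description
U	Set of users
V	Set of items
Y	User-item interactions
N	Number of partitions
Pⁱ	User embeddings of the i-th partition
Qⁱ	Item embeddings of the i-th partition
Pⁱ_t	User embeddings of the i-th partition after transfer to the shared feature space
Qⁱ_t	Item embeddings of the i-th partition after transfer to the shared feature space
P	Aggregated user embeddings
Q	Aggregated item embeddings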
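
The notation in the symbol table suggests a three-stage flow: per-partition embeddings Pⁱ, their counterparts Pⁱ_t in a shared feature space, and an aggregated embedding P. The sketch below shows one plausible reading of that flow for a single user. It is an assumption throughout, not the paper's aggregation module: it supposes that each partition supplies an embedding for the user, that the transfer is a linear map, and that aggregation is a softmax-weighted sum. The names `aggregate`, `transfer_maps`, and `scores` are hypothetical.

```python
import math

def matvec(matrix, vec):
    """Multiply a d_out x d_in matrix (list of rows) by a length-d_in vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(embeddings, transfer_maps, scores):
    """Combine one user's N partition embeddings P^i into a single vector P.

    embeddings:    list of N vectors, the P^i of the symbol table
    transfer_maps: list of N matrices (assumed), giving P^i_t = W_i P^i
    scores:        list of N raw weights (assumed), normalized by softmax
    """
    transferred = [matvec(w, p) for w, p in zip(transfer_maps, embeddings)]
    weights = softmax(scores)
    dim = len(transferred[0])
    return [sum(a * vec[d] for a, vec in zip(weights, transferred))
            for d in range(dim)]

# Two partitions, 2-dimensional embeddings, equal raw scores.
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
p = aggregate([[1.0, 0.0], [1.0, 0.0]], [identity, swap], [0.0, 0.0])
print(p)
```

In a trained system the transfer maps and scores would be learned parameters; here they are fixed by hand only to keep the example small. The same function would apply unchanged to item embeddings Qⁱ.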

Dataset	Users	Items	User-item interactions	Density
MovieLens-1M	6,940	3,706	1,000,209	3.89%
MovieLens-10M	71,567	10,681	10,000,054	1.31%
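
The density column is consistent with the usual definition, the number of observed interactions divided by the number of possible user-item pairs. The table does not state the formula, so that definition is an assumption here, and the helper name `density` is hypothetical. A short check with the figures from the table:

```python
def density(n_users, n_items, n_interactions):
    """Fraction of the user-item matrix that holds an observed interaction."""
    return n_interactions / (n_users * n_items)

# Values copied from the dataset table.
d_1m = density(6940, 3706, 1000209)
d_10m = density(71567, 10681, 10000054)

print(f"{d_1m:.2%}")
print(f"{d_10m:.2%}")
```

Under this definition the two results round to the 3.89% and 1.31% listed in the table.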

Number of partitions N	1	2	4	8	16
recall@10	0.0693	0.0740	0.0757	0.0723	0.0714
recall@20	0.1135	0.1178	0.1282	0.1265	0.1250
precision@10	0.1681	0.1763	0.1803	0.1776	0.1749
precision@20	0.1422	0.1457	0.1550	0.1501	0.1492

Number of partitions N	1	2	4	8	16
recall@10	0.1050	0.1101	0.1180	0.1206	0.1029
recall@20	0.1684	0.1835	0.1899	0.1940	0.1814
precision@10	0.1715	0.1738	0.1826	0.1845	0.1701
precision@20	0.1437	0.1503	0.1550	0.1563	0.1445
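
The two result tables report recall@K and precision@K for K = 10 and K = 20. The sketch below uses the standard per-user definitions of these metrics. The exact evaluation protocol of the paper (data split, averaging over users) is not given on this page, so the code is illustrative only, and `precision_recall_at_k` is a hypothetical helper.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Precision@k and recall@k for one user.

    ranked_items:   recommended items, best first
    relevant_items: set of items the user actually interacted with (held out)
    """
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall

# Hypothetical ranking for one user with three held-out items.
ranked = ["i3", "i7", "i1", "i9", "i4"]
relevant = {"i1", "i4", "i8"}
p_at_3, r_at_3 = precision_recall_at_k(ranked, relevant, 3)
p_at_5, r_at_5 = precision_recall_at_k(ranked, relevant, 5)
print(p_at_3, r_at_3, p_at_5, r_at_5)
```

A system-level score is typically the mean of these per-user values over all test users.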

Number of partitions N	1	2	4	8	16
Sub-model training	-	28.3	15	7.8	4.2
Aggregation training	-	8.3	8.3	8.3	8.3
Total	80	36.6	23.3	16.1	12.5


Number of partitions N	1	2	4	8	16
Sub-model training	-	400	216.7	125	75
Aggregation training	-	83.3	83.3	83.3	83.3
Total	750	483.3	300	208.3	158.3
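
The speedup implied by the two timing tables can be checked by dividing the single-partition total by the total for each N. The sketch below copies the totals from the tables and keeps their time units, which the tables do not state. The helper name `speedups` and the labels `first_table` and `second_table` are hypothetical.

```python
def speedups(total_times):
    """Map each partition count N to T(1) / T(N)."""
    base = total_times[1]
    return {n: base / t for n, t in total_times.items()}

# Total training times copied from the two timing tables.
first_table = {1: 80, 2: 36.6, 4: 23.3, 8: 16.1, 16: 12.5}
second_table = {1: 750, 2: 483.3, 4: 300, 8: 208.3, 16: 158.3}

s1 = speedups(first_table)
s2 = speedups(second_table)
for n in (2, 4, 8, 16):
    print(n, round(s1[n], 2), round(s2[n], 2))
```

For N = 16 the first table gives 80 / 12.5 = 6.4, which matches the figure quoted in the abstract, while the second gives roughly 4.74. In both tables the aggregation time stays the same for every N, so it acts as a fixed cost that limits how far adding partitions can reduce the total.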
