Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (4): 112-126.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.04.010

doi: 10.11871/jfdc.issn.2096-742X.2023.04.010

• Technology and Application •

Taxi Demand Prediction Model Based on Spark and Improved BP Neural Network

MENG Zhe1, YU Su2,*

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  2. Graphic Information Center, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2022-05-02 Online: 2023-08-20 Published: 2023-08-23
  • Corresponding author: *YU Su (E-mail: yusu@sues.edu.cn)
  • About the authors:
    MENG Zhe is a master's student at the School of Electronic and Electrical Engineering, Shanghai University of Engineering Science. His main research interest is big data analysis.
    In this paper, he is responsible for model design, experimental design, cluster construction, and paper writing.
    E-mail: 574722546@qq.com
    YU Su is a professor at the Graphic Information Center, Shanghai University of Engineering Science. Her main research interests include electromechanical control, computer vision, and big data analysis.
    In this paper, she is responsible for guiding the writing of the paper.
    E-mail: yusu@sues.edu.cn
  • Funding:
    Scientific Research Program of the Science and Technology Commission of Shanghai Municipality, "Dyeing Robot Management Software System" (17511110204)

Abstract:

[Objective] Taxi dispatching is an important issue affecting the development of transportation in China and has long attracted wide attention from scholars. To address practical problems such as long idle-driving times, inefficient matching between taxis and passengers, and taxi supply falling short of demand, this paper studies a taxi demand prediction model based on Spark and an improved BP neural network. The model predicts the total taxi demand of a given area of a city over the next day, and its core prediction algorithm is the improved BP neural network. [Methods] Because a traditional BP neural network converges slowly and trains poorly on large datasets, grey relation analysis and a genetic algorithm are used to optimize the model. The dataset is first processed with grey relation analysis, and the results are applied inside the BP neural network to improve its convergence speed and training effect. The genetic algorithm is then used to further optimize the model parameters, and the final model is implemented on Spark to shorten training time. [Results] Training and prediction experiments on the taxi dataset from the TLC (New York City Taxi and Limousine Commission) show that, compared with the original BP neural network, the BP neural network optimized by a genetic algorithm, the BP neural network optimized by simulated annealing combined with a genetic algorithm, and the BP neural network optimized by particle swarm optimization, the proposed model improves prediction accuracy by 25%, 11.1%, 6.9%, and 12.4%, respectively, shortens training time by 32.9 h, 30.1 h, 36.2 h, and 33.5 h, respectively, and converges noticeably faster. Training and prediction on a Chengdu taxi dataset also demonstrate the generality of the model and its effectiveness for forecasting taxi demand in Chinese cities. [Conclusions] The improved model performs the taxi demand prediction task well and gives decision-makers an effective reference for taxi dispatching, helping to alleviate existing dispatching problems. [Limitations] The model still has room for improvement, for example in its prediction range and its selection of parameters.

Key words: taxi dispatching, taxi demand, prediction model, BP neural network, grey relation analysis, genetic algorithm, Spark
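
The abstract names grey relation analysis as the preprocessing step but does not say how its results are applied inside the BP neural network. As an illustration only, the sketch below computes the standard (Deng) grey relational grade of each candidate input feature against the demand series. The function name `grey_relational_grades`, the min-max normalization, the distinguishing coefficient `rho = 0.5`, and the toy data are all assumptions for this example, not details taken from the paper.

```python
import numpy as np


def grey_relational_grades(features, target, rho=0.5):
    """Grey relational grade of each feature column against the target.

    Hypothetical helper: follows the textbook Deng formulation, not
    necessarily the exact procedure used by the authors.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)

    def min_max(a):
        # Scale each column (or the 1-D target) to [0, 1]; constant columns map to 0.
        low = a.min(axis=0)
        span = a.max(axis=0) - low
        span = np.where(span == 0, 1.0, span)
        return (a - low) / span

    x = min_max(features)             # shape (n_samples, n_features)
    y = min_max(target)[:, None]      # shape (n_samples, 1)

    delta = np.abs(x - y)             # pointwise distance to the reference series
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        # Every feature coincides with the target after scaling.
        return np.ones(x.shape[1])

    # Grey relational coefficient per sample and feature, then average per feature.
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=0)


# Toy data (made up): column 0 is a rescaled copy of the demand series,
# column 1 is unrelated to it.
demand = [1, 2, 3, 4, 5]
candidates = np.array([[10, 5], [20, 1], [30, 4], [40, 2], [50, 3]])
grades = grey_relational_grades(candidates, demand)
```

In this toy case the rescaled copy of the demand series receives a grade of 1 and the unrelated column a lower one. One plausible use, again an assumption, is to rank or weight the network's input features by these grades before training.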