
[1] ZHANG Ruimin, ZHANG Qimiao, DU Shuqiang, et al. The researching of the Bayes classification algorithm based on Spark in large data environment[J]. Industrial Instrumentation & Automation, 2018, (03): 116-118. [doi:1000-0682（2018）03-0000-00]

The researching of the Bayes classification algorithm based on Spark in large data environment

Industrial Instrumentation & Automation [ISSN: 1000-0682 / CN: 61-1121/TH]

Volume:
Issue:: 2018, No. 03

Pages:: 116-118

Section:

Publication date:: 2018-06-15

Article Info

Title:: The researching of the Bayes classification algorithm based on Spark in large data environment

Author(s):: ZHANG Ruimin1; ZHANG Qimiao2; DU Shuqiang1; JIA Guixia1
1. Department of Software, Lanzhou Institute of Technology, Lanzhou 730050, China
2. Lanzhou Municipal Public Security Bureau, Lanzhou 730030, China

Keywords:: big data; Spark; parallel streaming; Bayes classification

CLC number:: TP31

DOI:: 1000-0682（2018）03-0000-00

Document code:: A

Abstract:: With the explosive growth of big data, improving the execution efficiency of algorithms has become a hot research topic in big data classification. Spark is a distributed parallel computing framework that supports iterative data flows. This paper applies parallel streaming processing to the naive Bayes text classification algorithm. Experiments show that the parallel streaming Bayes classification algorithm can effectively improve the efficiency of big data classification.
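
To illustrate the idea in the abstract, the following is a minimal sketch of naive Bayes text classification whose training counts are computed per data partition and then merged. It is not the authors' implementation and does not use Spark itself: the function names (`count_partition`, `merge_counts`, `train`, `classify`), the Laplace smoothing parameter `alpha`, and the toy data are all assumptions made for illustration. The map and reduce steps only mirror the kind of aggregation a Spark job could run across workers.

```python
import math
from collections import Counter
from functools import reduce


def count_partition(docs):
    """Map step: count documents per class and words per class in one partition."""
    class_counts = Counter()
    word_counts = {}
    for label, tokens in docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
    return class_counts, word_counts


def merge_counts(a, b):
    """Reduce step: add two partial count tables. Addition is associative,
    so partitions can be merged in any order."""
    class_counts = a[0] + b[0]
    word_counts = {label: Counter(c) for label, c in a[1].items()}
    for label, c in b[1].items():
        word_counts.setdefault(label, Counter()).update(c)
    return class_counts, word_counts


def train(partitions):
    """Count each partition independently, then merge the partial results."""
    return reduce(merge_counts, map(count_partition, partitions))


def classify(model, tokens, alpha=1.0):
    """Return the class with the highest log posterior (Laplace smoothing alpha)."""
    class_counts, word_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(n_docs / total_docs)
        for t in tokens:
            score += math.log((counts[t] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, split into two partitions as a cluster might do.
partitions = [
    [("spam", ["win", "cash", "now"]), ("ham", ["meeting", "at", "noon"])],
    [("spam", ["cash", "prize", "win"]), ("ham", ["lunch", "at", "noon"])],
]
model = train(partitions)
print(classify(model, ["win", "cash"]))
print(classify(model, ["lunch", "noon"]))
```

Because the model consists only of additive counts, training on the partitions separately yields the same model as training on all documents at once, which is what makes this algorithm a natural fit for parallel execution.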



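Memo

Memo:: Received: 2017-12-19
Funding: 2016 Gansu Province Higher Education Research Project, self-funded (2016B-115)
About the author: ZHANG Ruimin (b. 1978), female, from Lanzhou, master's degree, lecturer; research interests: mobile Internet, big data, and Web Services.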

Last Update: 2018-06-15
