统计研究

海量半结构化数据采集、存储及分析——基于实时空气质量数据处理的实践

黄恒君 漆威   

  • 出版日期:2014-05-15 发布日期:2014-05-12

Massive Semi-Structured Data: Collection, Storage and Analysis

Hengjun Huang & Wei Qi   

  • Online:2014-05-15 Published:2014-05-12

摘要: 大数据现象及处理引起了社会各界的关注。本文以大数据宏观层面理论为依据,试图从微观层面讨论一类大数据具体处理,归纳提出一种基于开源架构的海量半结构化数据采集、存储及分析自动化解决方案,并分析解决方案的开放性、融合性和经济性的特点,指出解决方案的可拓展方面。同时,结合海量空气质量实时数据,分析解决方案的具体开发细节,给出解决方案运行的经验做法,讨论分析过程的大数据压缩机制。

关键词: 大数据, 数据挖掘, 空气质量, 函数型

Abstract: Big data and its processing have attracted attention from all sectors of society. Building on macro-level theory of big data, this paper discusses the concrete processing of one type of big data at the micro level. It proposes an automated solution, built on an open-source architecture, for the collection, storage, and analysis of massive semi-structured data; discusses the solution's openness, integration, and economy; and points out directions in which it can be extended. Using massive real-time air quality data, the paper also describes the specific development details, gives practical experience from running the solution, and discusses the big data compression mechanism used in the analysis process.

Key words: Big Data, Data Mining, Air Quality, Functional Data