Statistical Research (统计研究)

Google Translate in the Context of Big Data: Current Status and Challenges

Si Jiesheng et al.

  • Publication date: 2016-05-15 Release date: 2016-05-10

Google Translate in the Era of Big Data: Actuality and Challenge

Si Jiesheng et al.

  • Online: 2016-05-15 Published: 2016-05-10

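Abstract: In the era of big data, how to uncover the underlying regularities of things through data analysis is a question that deserves attention. Google Translate rests on the premise that "the best expression is the one that occurs most frequently", which turns the translation problem into a statistical problem and thereby solves a practical one. Taking Google Translate as a case study, this paper analyzes the background of the case and its implementation in detail, and offers reflections on it. Google Translate succeeds because it ingeniously recasts a practical problem as a statistical one and uses its powerful computing capacity to solve it. Its bottleneck is that the current method exploits only a small part of the information in big data and cannot fully characterize all of it. The way Google Translate reformulates and handles the problem is a model application of big data and offers important lessons for using big data to solve practical problems in the future.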

Keywords: Google Translate, statistical machine translation, maximum entropy, minimum error rate loss

Abstract: In the era of Big Data, it is common to explore underlying patterns through data analysis. Building on the idea that the most frequent expression is the best one, Google Translate treats translation as a statistical problem. This paper reviews the background and statistical methods of Google Translate and discusses future work in this field. Google Translate succeeds because it turns a practical problem into a statistical one and brings powerful computing capability to bear on it. However, it uses only a small amount of the information in Big Data and does not describe all of it. Even so, it remains an excellent example of a Big Data application.

Key words: Google Translate, Statistical machine translation, Maximum entropy, Minimum error rate
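
The premise stated in the abstract, that the best expression is the one that occurs most frequently, can be illustrated with a minimal sketch. This is not Google Translate's actual system: the toy parallel corpus, the function names, and the phrase-level lookup are all assumptions made for illustration. The sketch counts how often each candidate translation was observed for a source phrase, converts the counts to relative frequencies, and picks the most frequent candidate.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus of (source phrase, observed translation)
# pairs. A real system would estimate these counts from very large corpora.
PARALLEL_PAIRS = [
    ("你好", "hello"),
    ("你好", "hello"),
    ("你好", "hi"),
    ("谢谢", "thank you"),
    ("谢谢", "thanks"),
    ("谢谢", "thank you"),
]


def build_counts(pairs):
    """Count how often each target phrase was observed for each source phrase."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return counts


def translation_probabilities(counts, source):
    """Relative-frequency estimate of P(target | source); empty if unseen."""
    observed = counts.get(source, Counter())
    total = sum(observed.values())
    if total == 0:
        return {}
    return {target: n / total for target, n in observed.items()}


def best_translation(counts, source):
    """Return the most frequently observed translation, or None if unseen."""
    observed = counts.get(source)
    if not observed:
        return None
    return observed.most_common(1)[0][0]


counts = build_counts(PARALLEL_PAIRS)
print(best_translation(counts, "你好"))
print(translation_probabilities(counts, "谢谢"))
```

The sketch deliberately ignores context, word order, and unseen phrases, so it picks each phrase's most frequent translation in isolation.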