### 大数据背景下CPI预测问题的文本挖掘技术设计与应用

• Publication date: 2021-08-25 Release date: 2021-08-25

### Design and Application of Text Mining Technology for CPI Prediction Based on Big Data

Tang Xiaobin, Dong Manru, Xu Rong

Abstract: This paper combines a semi-supervised, interactive keyword extraction algorithm based on Term Frequency-Inverse Document Frequency (TF-IDF) with the Bidirectional Encoder Representations from Transformers (BERT) model to design a text mining technique that expands the seed keywords used for CPI prediction. The interactive TF-IDF algorithm first expands the original CPI prediction seed keywords in breadth. A BERT “two-stage” search-and-filter model then mines the text more deeply and matches keywords, expanding the keyword set in depth and yielding a CPI prediction keyword database. Prediction models are then built on the keyword sets before and after this feature expansion and compared. The results show that, unlike traditional keyword extraction algorithms, the interactive TF-IDF algorithm does not need a corpus and accepts seed words as input. The BERT model, in turn, is fine-tuned from a base model through transfer learning, learns domain-specific knowledge, and supports language representation, semantic expansion, and human-computer interaction in CPI prediction. Compared with traditional text mining techniques, the technique designed here offers strong generalization and representation ability for CPI prediction problems. Starting from 84 key seed words for CPI prediction, the study mines the text more deeply, and the expanded keyword glossary gives higher accuracy and more comprehensive interpretability in CPI prediction. The technique also offers new research ideas and a reference for building keyword databases for other macroeconomic indicators.
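
The abstract does not spell out how the interactive TF-IDF step works, so the following is only a generic sketch of seed-driven keyword expansion, not the authors' algorithm. It assumes a small set of already tokenized documents, a smoothed IDF formula, and hypothetical function names (`tf_idf`, `expand_keywords`). Terms that co-occur with a seed word are ranked by their summed TF-IDF weight; the BERT search-and-filter stage is not covered.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def expand_keywords(docs, seeds, top_k=3):
    """Rank non-seed terms from documents that mention at least one seed."""
    seeds = set(seeds)
    scores = Counter()
    for doc_weights in tf_idf(docs):
        if seeds.isdisjoint(doc_weights):
            continue  # this document mentions no seed word, so skip it
        for term, weight in doc_weights.items():
            if term not in seeds:
                scores[term] += weight
    return [term for term, _ in scores.most_common(top_k)]


# Toy tokenized documents, invented purely for illustration.
docs = [
    ["pork", "price", "rise", "pork", "supply"],
    ["rent", "price", "housing", "rent"],
    ["football", "match", "score"],
    ["pork", "supply", "feed", "cost"],
]

first_round = expand_keywords(docs, {"price"}, top_k=10)
# "Interactive" step, simulated: a reviewer accepts "pork" as a new seed,
# which pulls in terms from a document the original seed did not reach.
second_round = expand_keywords(docs, {"price", "pork"}, top_k=10)
print(first_round)
print(second_round)
```

Feeding accepted candidates back in as seeds is one simple way to model the human-in-the-loop aspect; a real pipeline would also need word segmentation and stop-word handling, which are omitted here.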