As the fastest-growing research direction in machine learning and artificial intelligence, deep learning has attracted intense attention from both academia and industry. Deep learning is the collective name for a family of machine learning algorithms built on automatic feature learning and deep neural networks (DNNs). Research on deep learning has advanced rapidly, achieving major breakthroughs over traditional feature-selection and feature-extraction frameworks; it exerts a growing influence on fields including natural language processing, biomedical analysis, and remote-sensing image interpretation, and has achieved revolutionary success in computer vision and speech recognition.
How to apply deep learning techniques to natural language processing (NLP) tasks is currently a major research focus. As an important direction at the intersection of computer science and artificial intelligence, NLP draws on knowledge and results from linguistics, computer science, logic, psychology, artificial intelligence, and other disciplines. Its main research tasks include part-of-speech tagging, machine translation, named entity recognition, question answering, sentiment analysis, automatic summarization, syntactic parsing, and coreference resolution. Because natural language is a highly abstract symbolic system in which relations between texts are difficult to quantify, NLP research has long depended on hand-crafted features. The strength of deep learning lies precisely in its powerful discriminative capacity and automatic feature learning, which suit the high-dimensional, unlabeled, large-scale nature of natural language data. This paper therefore surveys how deep learning is applied in NLP and further analyzes the remaining difficulties and possible future breakthroughs.
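The automatic feature learning described above is most commonly realized in NLP through word embeddings, e.g. the skip-gram model with negative sampling popularized by word2vec. The following is a minimal NumPy sketch of that training step, not the original implementation; the toy corpus, dimensions, and hyperparameters are illustrative assumptions:

```python
import numpy as np

# Toy corpus standing in for a large unlabeled text collection (illustrative only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8  # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (target-word) embeddings
W_out = rng.normal(scale=0.1, size=(V, D))  # output (context-word) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center, context, lr=0.1, k=2):
    """One skip-gram step with k negative samples; returns this step's loss."""
    c, o = w2i[center], w2i[context]
    v = W_in[c].copy()
    loss = 0.0
    # Positive pair: push the score sigma(u_o . v) toward 1.
    s = sigmoid(W_out[o] @ v)
    loss -= np.log(s + 1e-10)
    grad_v = (s - 1.0) * W_out[o]
    W_out[o] -= lr * (s - 1.0) * v
    # Negative samples: push their scores toward 0.
    for n in rng.integers(0, V, size=k):
        sn = sigmoid(W_out[n] @ v)
        loss -= np.log(1.0 - sn + 1e-10)
        grad_v += sn * W_out[n]
        W_out[n] -= lr * sn * v
    W_in[c] -= lr * grad_v
    return loss

# Train on (center, context) pairs from a symmetric window of size 1.
losses = []
for epoch in range(200):
    total = 0.0
    for i, w in enumerate(corpus):
        for j in (i - 1, i + 1):
            if 0 <= j < len(corpus):
                total += train_pair(w, corpus[j])
    losses.append(total)
```

After training, each row of `W_in` is a dense vector whose geometry reflects co-occurrence patterns, replacing the hand-crafted sparse features that earlier NLP pipelines depended on.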