Deep Learning in NLP: Methods and Applications

Abstract: With the rise of the deep learning wave, deep learning methods have swept into natural language processing (NLP) and brought striking advances to many of its application areas. This article first presents the development history, main advantages, and research status of deep learning. Taking neural network language models and word embeddings as the entry point where deep learning meets NLP, it then surveys, in terms of both feature representation and model theory, the principles and applications of the main deep neural network (DNN) models. Next it reviews the newest deep learning methods researchers are applying to hot NLP topics and the results they have achieved. Finally, it summarizes the bottlenecks deep learning currently faces in NLP research and looks ahead to likely future research directions.

Key words: deep learning; deep neural networks; language models; natural language processing; word embedding
As one of the fastest-growing research directions in machine learning and artificial intelligence, deep learning has attracted intense attention from both academia and industry. Deep learning is an umbrella term for a family of machine learning algorithms built on automatic feature learning and deep neural networks (DNNs). Research on deep learning has made substantial progress: it has broken through the traditional framework of hand-crafted feature selection and extraction, exerts a growing influence on fields including natural language processing, biomedical analysis, and remote-sensing image interpretation, and has achieved revolutionary success in computer vision and speech recognition.
How to apply deep learning to natural language processing (NLP) tasks is currently a hot topic in deep learning research. NLP, an important research direction at the intersection of computer science and artificial intelligence, draws on knowledge and results from linguistics, computer science, logic, psychology, and artificial intelligence. Its main tasks include part-of-speech tagging, machine translation, named entity recognition, question answering, sentiment analysis, automatic summarization, syntactic parsing, and coreference resolution. Natural language is a highly abstract symbolic system in which relations between texts are hard to measure, so research has depended heavily on manually engineered features. The strengths of deep learning, namely its strong discriminative power and its ability to learn features automatically, suit the high-dimensional, unlabeled, large-scale character of natural language data well. This article therefore surveys how deep learning is currently applied in NLP, and goes on to analyze the difficulties involved and the directions in which breakthroughs may come.
A deep belief network (DBN) is a model formed by stacking restricted Boltzmann machines (RBMs). By training the network weights, a DBN acquires the ability to reconstruct the training data presented at its input layer. Training proceeds in the following steps (a minimal code sketch follows the list):
1) If the current RBM is the visible layer, feed it the raw input data; otherwise feed it the output of the previous RBM. Train the current RBM.
2) If the network has reached the required number of layers, go to step 4); otherwise make the next RBM the current layer.
3) Repeat steps 1) and 2).
4) Fine-tune the network, using a supervised learning algorithm to converge the model to a local optimum.
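The following NumPy sketch illustrates the procedure: each RBM is trained by one-step contrastive divergence (CD-1) on the output of the layer below it, matching steps 1)-3) above. The layer sizes, learning rate, epoch count, and toy data are illustrative assumptions, not settings from the cited papers.

```python
# Minimal sketch of DBN-style greedy layer-wise pretraining with RBMs
# trained by one-step contrastive divergence (CD-1).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step to reconstruct the input.
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        # Approximate gradient of the log-likelihood.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

# Steps 1)-3): train each RBM on the output of the layer below it.
data = (rng.random((256, 64)) < 0.3).astype(float)  # toy binary input
layer_sizes = [64, 32, 16]                          # illustrative sizes
rbms, layer_input = [], data
for n_v, n_h in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_v, n_h)
    for epoch in range(10):
        rbm.cd1_update(layer_input)
    rbms.append(rbm)
    layer_input = rbm.hidden_probs(layer_input)     # feed the next layer
# Step 4) (supervised fine-tuning) would unroll the stack into a
# feed-forward network and train it with backpropagation on labels.
```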
Reference [21] discusses the choice of layer counts for RBMs and DBNs, their generalization ability, and possible extensions, and replaces the RBM in each DBN layer with an auto-encoder (AE); the network obtained by simply stacking several AEs is called a stacked auto-encoder (SAE) in [3]. Two typical improvements on the SAE exist: 1) imposing a sparsity constraint on the hidden neurons so that most of them stay inactive, giving the sparse auto-encoder [22]; and 2) injecting noise into the encoding process to make the SAE robust to noise, giving the stacked denoising auto-encoder [23]. Thanks to their strong feature-learning ability [24], SAEs are widely used in multimodal retrieval [25], image classification [26], sentiment analysis [27], and many other areas.
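As a minimal sketch of the denoising variant, the PyTorch snippet below trains one denoising auto-encoder layer; stacking repeats the procedure on the hidden codes. All dimensions, the noise level, and the optimizer settings are illustrative assumptions.

```python
# One denoising auto-encoder layer of the kind stacked to form an SDAE.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x, noise_std=0.3):
        x_noisy = x + noise_std * torch.randn_like(x)  # corrupt the input
        return self.decoder(self.encoder(x_noisy))

x = torch.rand(128, 64)                  # toy data
ae = DenoisingAE(64, 32)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(x), x)  # reconstruct the CLEAN input
    loss.backward()
    opt.step()
# Stacking: train a second DAE on ae.encoder(x).detach(), and so on.
# A sparsity penalty on the hidden activations gives the sparse variant.
```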
Recurrent neural networks (RNNs) are networks whose hidden layer is connected to itself. Unlike a feed-forward network, an RNN feeds the hidden-layer result of the current step into the hidden-layer computation of the next step, which makes it suitable for sequence problems such as text generation [28], machine translation [29], and speech recognition [30]. RNNs are optimized with the backpropagation-through-time (BPTT) algorithm [31]. Because gradients vanish, the error signal in an RNN usually propagates back only 5 to 10 steps, so [32] built the long short-term memory (LSTM) model on top of the RNN. An LSTM uses a cell structure to memorize earlier inputs, allowing the network to learn the appropriate moments to reset the cell. LSTMs have many structural variants: [33] compares eight popular ones, and [34] tested more than ten thousand recurrent architectures and lists several that may outperform the LSTM on particular tasks.
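To make the gating concrete, here is a minimal NumPy sketch of a single LSTM step in its common form (input, forget, and output gates plus a candidate memory). The weight shapes and toy sequence are illustrative assumptions; real variants differ in exactly the ways [33] surveys.

```python
# One LSTM step, with the gating made explicit.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
# One weight matrix per gate, acting on the concatenation [h_prev; x].
W = {g: rng.normal(0, 0.1, size=(d_h + d_in, d_h)) for g in "ifoc"}
b = {g: np.zeros(d_h) for g in "ifoc"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    i = sigmoid(z @ W["i"] + b["i"])        # input gate
    f = sigmoid(z @ W["f"] + b["f"])        # forget gate: can "reset" the cell
    o = sigmoid(z @ W["o"] + b["o"])        # output gate
    c_tilde = np.tanh(z @ W["c"] + b["c"])  # candidate memory
    c = f * c_prev + i * c_tilde            # cell carries long-range state
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(20, d_in)):       # run over a toy sequence
    h, c = lstm_step(x, h, c)
```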
RNNs and LSTMs have many NLP applications. Reference [35] applied gated recurrent networks to sentiment analysis and gained roughly 5% accuracy over SVM and CNN baselines on movie-review datasets such as IMDB. Reference [36] combined a bidirectional LSTM with a convolutional neural network and a conditional random field for part-of-speech tagging and named entity recognition, reaching state-of-the-art results of 97.55% and 91.21%, respectively.
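A stripped-down PyTorch sketch of the bidirectional-LSTM backbone of such taggers is shown below, omitting the character-level CNN and CRF layers of [36]; the vocabulary size, tag-set size, and dimensions are illustrative assumptions.

```python
# Bidirectional-LSTM sequence tagger (backbone only).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=5000, n_tags=45, d_emb=100, d_h=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.lstm = nn.LSTM(d_emb, d_h, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * d_h, n_tags)  # forward + backward states

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.lstm(self.emb(tokens))
        return self.out(h)                     # per-token tag scores

tagger = BiLSTMTagger()
tokens = torch.randint(0, 5000, (4, 12))       # toy batch of 4 sentences
scores = tagger(tokens)                        # shape (4, 12, 45)
# Training would minimize per-token cross-entropy, or a CRF loss over
# whole tag sequences as in the cited work.
```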
A recursive neural network is a deep neural network built by recursively applying a tree-structured network, and is used to construct the semantic representation of a sentence [37]. The tree is usually binary; a typical recursive network is shown in Fig. 2, where, for the syntactic parsing task, x denotes a word vector and y the vector of a merged subtree. Define global parameters W, b, and U, and a subtree plausibility score s. For a merged node (x1, x2 → y1):
$$ \boldsymbol{y}_1 = \tanh\left(\boldsymbol{W}\begin{pmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \end{pmatrix} + \boldsymbol{b}\right) $$ (4)

$$ s_1 = \boldsymbol{U}^{\mathrm{T}} \boldsymbol{y}_1 $$ (5)

Starting from randomly initialized global parameters, a greedy algorithm merges adjacent leaf nodes (or subtrees) pairwise into candidate subtrees, scores each candidate, and keeps the highest-scoring merge, repeating until a complete parse tree is formed. Parsing is a supervised task: each sentence has a correct parse tree, so training optimizes the network parameters to minimize the scoring loss over the whole network. Beyond syntactic parsing, recursive neural networks are also used for relation classification [38] and sentiment analysis [39].
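The greedy loop defined by eqs. (4) and (5) can be sketched directly in NumPy: score every adjacent pair, merge the best one, and repeat. The dimensions and random parameters below are illustrative assumptions; actual training would fit W, b, and U against gold parse trees.

```python
# Greedy tree construction by repeated best-scoring merges.
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(0, 0.1, size=(d, 2 * d))   # composition matrix, eq. (4)
b = np.zeros(d)
U = rng.normal(0, 0.1, size=d)            # scoring vector, eq. (5)

def compose(x1, x2):
    y = np.tanh(W @ np.concatenate([x1, x2]) + b)  # eq. (4)
    return y, U @ y                                 # eq. (5)

nodes = [rng.normal(size=d) for _ in range(5)]      # word vectors of a sentence
while len(nodes) > 1:
    # Score every adjacent pair, then greedily merge the best one.
    scored = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
    best = max(range(len(scored)), key=lambda i: scored[i][1])
    nodes[best:best + 2] = [scored[best][0]]        # replace the pair by its subtree
root = nodes[0]   # vector for the full tree
```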
Convolutional neural networks (CNNs) were proposed in [40] and improved in [41]. Whereas an ordinary feed-forward network fully connects the input layer to the hidden layer, each node of a convolutional layer in a CNN connects only to a fixed-size region, and the shared weight matrix of those connections is called a convolution kernel. Pooling is the other key technique CNNs use: replacing each fixed-size region of the matrix with its average or maximum both reduces the number of features and makes the network more robust.
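The PyTorch snippet below sketches both operations on a toy "sentence" of word vectors: a narrow 1-D convolution whose kernel weights are shared across positions, followed by max pooling over positions. All dimensions are illustrative assumptions.

```python
# 1-D convolution over word-vector positions plus max-over-time pooling.
import torch
import torch.nn as nn

d_emb, n_filters, width = 50, 64, 3
conv = nn.Conv1d(d_emb, n_filters, kernel_size=width)  # kernel = shared weights
x = torch.randn(2, d_emb, 20)       # batch of 2 sentences, 20 positions
feats = torch.relu(conv(x))         # (2, 64, 18): one response per window
pooled = feats.max(dim=2).values    # (2, 64): max pooling over positions
# `pooled` is a fixed-size sentence feature vector regardless of sentence
# length, which is what downstream classifiers consume.
```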
CNNs are seeing many new attempts in NLP. Reference [6] applied CNNs to semantic role labeling, and [42] used characters as semantic features, training a CNN model on large-scale text corpora for ontology classification, sentiment analysis, and text classification; the CNN model used is shown in Fig. 3.
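As a sketch of the character-level input such models consume, the snippet below one-hot encodes characters over a fixed alphabet and pads to a fixed length; the alphabet and length here are illustrative assumptions, not the exact settings of [42].

```python
# Character quantization: one-hot over a fixed alphabet, fixed length.
import torch

alphabet = "abcdefghijklmnopqrstuvwxyz0123456789 .,;!?'\""
char_to_idx = {c: i for i, c in enumerate(alphabet)}
max_len = 32

def quantize(text):
    x = torch.zeros(len(alphabet), max_len)   # (channels, positions)
    for pos, ch in enumerate(text.lower()[:max_len]):
        if ch in char_to_idx:                 # unknown chars stay all-zero
            x[char_to_idx[ch], pos] = 1.0
    return x

x = quantize("Deep learning for NLP")         # feeds a stack of Conv1d layers
```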
References
[1] LANDAHL H D, MCCULLOCH W S, PITTS W. A statistical consequence of the logical calculus of nervous nets[J]. Bulletin of Mathematical Biophysics, 1943, 5(4): 135-137. doi: 10.1007/BF02478260
[2] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507. doi: 10.1126/science.1127647
[3] BENGIO Y, LAMBLIN P, POPOVICI D, et al. Greedy layer-wise training of deep networks[C]//Proceedings of NIPS. Vancouver, Canada: MIT Press, 2007: 153-160.
[4] MATSUGU M, MORI K, MITARI Y, et al. Subject independent facial expression recognition with robust face detection using a convolutional neural network[J]. Neural Networks, 2003, 16(5): 555-559.
[5] YU Kai, JIA Lei, CHEN Yu-qiang, et al. Deep learning: yesterday, today, and tomorrow[J]. Journal of Computer Research and Development, 2013(9): 1799-1804. (in Chinese) doi: 10.7544/issn1000-1239.2013.20131180
[6] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[C]//Proceedings of ICLR. Scottsdale, USA, 2013. arXiv: 1301.3781
[7] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. doi: 10.1038/nature14539
[8] XU W, RUDNICKY A I. Can artificial neural networks learn language models?[C]//Proceedings of International Conference on Speech and Language Processing. Beijing, China, 2000.
[9] BENGIO Y, DUCHARME R, VINCENT P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[10] MNIH A, HINTON G E. A scalable hierarchical distributed language model[C]//Proceedings of NIPS. New York, USA: Curran Associates Inc., 2008.
[11] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12: 2493-2537.
[12] TURIAN J, RATINOV L, BENGIO Y. Word representations: a simple and general method for semi-supervised learning[C]//Proceedings of ACL. Uppsala, Sweden: ACL Press, 2010: 384-394.
[13] HUANG E H, SOCHER R, MANNING C D, et al. Improving word representations via global context and multiple word prototypes[C]//Proceedings of ACL. Jeju, Korea: ACL Press, 2012: 873-882.
[14] PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]//Proceedings of EMNLP. Doha, Qatar: ACL Press, 2014: 1532-1543.
[15] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of NIPS. Nevada, USA: MIT Press, 2013: 3111-3119.
[16] LEVY O, GOLDBERG Y. Neural word embedding as implicit matrix factorization[C]//Proceedings of NIPS. Montreal, Canada: MIT Press, 2014: 2177-2185.
[17] SCHNABEL T, LABUTOV I, MIMNO D, et al. Evaluation methods for unsupervised word embeddings[C]//Proceedings of EMNLP. Lisbon, Portugal: ACL Press, 2015.
[18] AL-RFOU R, PEROZZI B, SKIENA S. Polyglot: distributed word representations for multilingual NLP[C]//Proceedings of CoNLL. Sofia, Bulgaria: ACL Press, 2013: 183.
[19] LAI S, LIU K, XU L, et al. How to generate a good word embedding?[J]. IEEE Intelligent Systems, 2016, 31(6): 5-14. doi: 10.1109/MIS.2016.45
[20] KUSNER M, SUN Y, KOLKIN N, et al. From word embeddings to document distances[C]//Proceedings of ICML. Lille, France: Omni Press, 2015: 957-966.
[21] LE ROUX N, BENGIO Y. Representational power of restricted Boltzmann machines and deep belief networks[J]. Neural Computation, 2008, 20(6): 1631-1649. doi: 10.1162/neco.2008.04-07-510
[22] LEE H, EKANADHAM C, NG A Y. Sparse deep belief net model for visual area V2[C]//Proceedings of NIPS. New York, USA: ACM Press, 2008: 873-880.
[23] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]//Proceedings of ICML. New York, USA: ACM Press, 2008: 1096-1103.
[24] GEHRING J, MIAO Y, METZE F, et al. Extracting deep bottleneck features using stacked auto-encoders[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada: IEEE, 2013: 3377-3381.
[25] WANG W, OOI B C, YANG X, et al. Effective multimodal retrieval based on stacked auto-encoders[J]. Proceedings of the VLDB Endowment, 2014, 7(8): 649-660. doi: 10.14778/2732296
[26] XIE J, XU L, CHEN E. Image denoising and inpainting with deep neural networks[C]//Proceedings of NIPS. Nevada, USA: MIT Press, 2012: 341-349.
[27] GLOROT X, BORDES A, BENGIO Y. Domain adaptation for large-scale sentiment classification: a deep learning approach[C]//Proceedings of ICML. Bellevue, USA: ACM Press, 2011: 513-520.
[28] SUTSKEVER I, MARTENS J, HINTON G E. Generating text with recurrent neural networks[C]//Proceedings of ICML. Bellevue, USA: ACM Press, 2011: 1017-1024.
[29] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of EMNLP. Doha, Qatar: ACL Press, 2014: 1724-1734.
[30] GRAVES A, JAITLY N. Towards end-to-end speech recognition with recurrent neural networks[C]//Proceedings of ICML. Beijing, China, 2014: 1764-1772.
[31] WERBOS P J. Backpropagation through time: what it does and how to do it[J]. Proceedings of the IEEE, 1990, 78(10): 1550-1560. doi: 10.1109/5.58337
[32] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
[33] GREFF K, SRIVASTAVA R K, KOUTNÍK J, et al. LSTM: a search space odyssey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2016, 99: 1-11.
[34] JOZEFOWICZ R, ZAREMBA W, SUTSKEVER I. An empirical exploration of recurrent network architectures[C]//Proceedings of ICML. Lille, France: Omni Press, 2015: 2342-2350.
[35] TANG D, QIN B, LIU T. Document modeling with gated recurrent neural network for sentiment classification[C]//Proceedings of EMNLP. Lisbon, Portugal: ACL Press, 2015: 1422-1432.
[36] MA X, HOVY E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF[C]//Proceedings of ACL. Berlin, Germany: ACL Press, 2016: 1064-1074.
[37] LIU S, YANG N, LI M, et al. A recursive recurrent neural network for statistical machine translation[C]//Proceedings of EMNLP. Doha, Qatar: ACL Press, 2014: 1491-1500.
[38] SOCHER R, HUVAL B, MANNING C D, et al. Semantic compositionality through recursive matrix-vector spaces[C]//Proceedings of EMNLP-CoNLL. Jeju Island, Korea: ACL Press, 2012: 1201-1211.
[39] SOCHER R, CHEN D, MANNING C D, et al. Reasoning with neural tensor networks for knowledge base completion[C]//Proceedings of NIPS. Nevada, USA: MIT Press, 2013: 926-934.
[40] FUKUSHIMA K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position[J]. Biological Cybernetics, 1980, 36(4): 193-202. doi: 10.1007/BF00344251
[41] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324. doi: 10.1109/5.726791
[42] ZHANG X, ZHAO J, LECUN Y. Character-level convolutional networks for text classification[C]//Proceedings of NIPS. Montreal, Canada: MIT Press, 2015: 649-657.
[43] SCHWENK H. Continuous space translation models for phrase-based statistical machine translation[C]//Proceedings of COLING. Mumbai, India: ACL Press, 2012: 1071-1080.
[44] ZOU W Y, SOCHER R, CER D, et al. Bilingual word embeddings for phrase-based machine translation[C]//Proceedings of EMNLP. Seattle, USA: ACL Press, 2013: 1393-1398.
[45] YANG N, LIU S, LI M, et al. Word alignment modeling with context dependent deep neural network[C]//Proceedings of ACL. Sofia, Bulgaria: ACL Press, 2013: 166-175.
[46] ZHANG J, LIU S, LI M, et al. Mind the gap: machine translation by minimizing the semantic gap in embedding space[C]//Proceedings of AAAI. Québec, Canada: AAAI Press, 2014.
[47] KARPATHY A. The unreasonable effectiveness of recurrent neural networks[EB/OL]. [2015-05-21]. https://karpathy.github.io/2015/05/21/rnn-effectiveness/.
[48] ZHENG Wei-zhi. Let the neural network write poetry of the Tang Dynasty[EB/OL]. [2016-02-01]. http://zhengwy.com/neural-network-for-tangshi/. (in Chinese)
[49] KARPATHY A, LI Fei-fei. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of CVPR. Boston, USA: IEEE, 2015.
[50] SANTOS C D, ZADROZNY B. Learning character-level representations for part-of-speech tagging[C]//Proceedings of ICML. Beijing, China, 2014: 1818-1826.
[51] DAHL G E, ADAMS R P, LAROCHELLE H. Training restricted Boltzmann machines on word observations[C]//Proceedings of ICML. Edinburgh, UK: Omni Press, 2012: 679-686.
[52] WESTON J, RATLE F, MOBAHI H, et al. Deep learning via semi-supervised embedding[M]//Neural Networks: Tricks of the Trade. [S.l.]: Springer, 2012: 639-655.
[53] TKACHENKO M, SIMANOVSKY A. Named entity recognition: exploring features[C]//Proceedings of KONVENS. Vienna, Austria, 2012: 118-127.
[54] LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al. Neural architectures for named entity recognition[C]//Proceedings of NAACL. San Diego, USA: ACL Press, 2016: 260-270.
[55] SOCHER R, MANNING C D, NG A Y. Learning continuous phrase representations and syntactic parsing with recursive neural networks[C]//Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop. British Columbia, Canada: MIT Press, 2010: 2550-2558.
[56] TAI K S, SOCHER R, MANNING C D. Improved semantic representations from tree-structured long short-term memory networks[C]//Proceedings of ACL. Beijing, China: ACL Press, 2015: 1556-1566.
[57] SAGARA T, HAGIWARA M. Natural language neural network and its application to question-answering system[J]. Neurocomputing, 2014, 142: 201-208. doi: 10.1016/j.neucom.2014.04.048
[58] WESTON J, CHOPRA S, BORDES A. Memory networks[C]//Proceedings of ICLR. San Diego, USA, 2015.
[59] ANDREAS J, ROHRBACH M, DARRELL T, et al. Learning to compose neural networks for question answering[C]//Proceedings of NAACL-HLT. San Diego, USA: ACL Press, 2016: 1545-1554.
[60] DONG L, WEI F, ZHOU M, et al. Adaptive multi-compositionality for recursive neural models with applications to sentiment analysis[C]//Proceedings of AAAI. Québec, Canada: AAAI Press, 2014: 1537-1543.
[61] TANG D, WEI F, QIN B, et al. Coooolll: a deep learning system for Twitter sentiment classification[C]//Proceedings of the 8th International Workshop on Semantic Evaluation. Dublin, Ireland: ACL Press, 2014: 208-212.
[62] XI Xue-feng, ZHOU Guo-dong. Pronoun resolution based on deep learning[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2014, 50(1): 100-110. (in Chinese)
[63] MENG F, LU Z, WANG M, et al. Encoding source language with convolutional neural network for machine translation[C]//Proceedings of ACL. Beijing, China: ACL Press, 2015.
[64] MA L, LU Z, LI H. Learning to answer questions from image using convolutional neural network[C]//Proceedings of AAAI. Phoenix, USA: [s.n.], 2016.
[65] DONG L, WEI F, ZHOU M, et al. Question answering over Freebase with multi-column convolutional neural networks[C]//Proceedings of ACL. Beijing, China: ACL Press, 2015: 260-269.
[66] FENG S, LIU S, YANG N, et al. Improving attention modeling with implicit distortion and fertility for machine translation[C]//Proceedings of COLING. Osaka, Japan: ACL Press, 2016: 3082-3092.
[67] YAN Z, DUAN N, BAO J, et al. DocChat: an information retrieval approach for chatbot engines using unstructured documents[C]//Proceedings of ACL. Berlin, Germany: ACL Press, 2016: 516-525.
[68] CHENG Y, XU W, HE Z, et al. Semi-supervised learning for neural machine translation[C]//Proceedings of ACL. Berlin, Germany: ACL Press, 2016: 1965-1974.
[69] LIN Y, SHEN S, LIU Z, et al. Neural relation extraction with selective attention over instances[C]//Proceedings of ACL. Berlin, Germany: ACL Press, 2016: 2124-2133.
[70] LIU Zhi-yuan, SUN Mao-song, LIN Yan-kai, et al. Knowledge representation learning: a review[J]. Journal of Computer Research and Development, 2016(2): 247-261. (in Chinese) doi: 10.7544/issn1000-1239.2016.20160020
[71] LI P, ZHOU G. Joint argument inference in Chinese event extraction with argument consistency and event relevance[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(4): 612-622. doi: 10.1109/TASLP.2015.2497148
[72] WANG Z, ZHANG Y, LEE S Y M, et al. A bilingual attention network for code-switched emotion prediction[C]//Proceedings of COLING. Osaka, Japan: ACL Press, 2016: 1624-1634.
[73] GUO J, CHE W, WANG H, et al. Revisiting embedding features for simple semi-supervised learning[C]//Proceedings of EMNLP. Doha, Qatar: ACL Press, 2014: 110-120.
[74] LI X, ZHANG J, ZONG C. Towards zero unknown word in neural machine translation[C]//Proceedings of IJCAI. New York, USA: AAAI Press, 2016: 2852-2858.
[75] PEI W, GE T, CHANG B. An effective neural network model for graph-based dependency parsing[C]//Proceedings of ACL. Beijing, China: ACL Press, 2015.
[76] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[C]//Proceedings of ICLR. San Diego, USA, 2015. arXiv: 1409.0473