Method of Automatic Extractive Text Summarization on the Basis of Recurrent Neural Networks
DOI: https://doi.org/10.20535/1810-0546.2018.4.141286

Keywords: Recurrent neural network, Extractive text summarization, Deep learning, Natural language processing, Unsupervised learning

Abstract
Background. The article addresses the problem of automatic extractive text summarization with a recurrent artificial neural network, using a graph interpretation of the text and an estimator of text unit importance. The abstractive approach is much more complex than the extractive one, since it requires the network to generate its own thought vector, which need not contain words from the input text and must be grammatically correct. The text unit importance estimator uses a recommendation rating principle that balances the graph weights according to the popularity of text units. Unsupervised learning is much closer to the real biological learning process and does not require a labeled, preprocessed dataset.
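The abstract gives no formulas for the recommendation rating principle; it is commonly realized as a PageRank-style iteration over a weighted directed graph of text units, and the sketch below is such an illustration. The damping factor, update rule, and toy graph are assumptions for exposition, not details taken from the article.

```python
# Minimal sketch (assumption: PageRank-style update) of a recommendation-rating
# importance estimator over a weighted directed graph of text units.
def rate_nodes(out_edges, damping=0.85, iterations=50):
    """out_edges: {node: {neighbor: weight}} for a weighted directed graph."""
    nodes = set(out_edges) | {v for nbrs in out_edges.values() for v in nbrs}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            # A text unit is "recommended" by every unit pointing to it,
            # in proportion to the edge weight and the recommender's own rating.
            rank = 0.0
            for u, nbrs in out_edges.items():
                if n in nbrs:
                    rank += score[u] * nbrs[n] / sum(nbrs.values())
            new[n] = (1 - damping) / len(nodes) + damping * rank
        score = new
    return score

# Toy graph: unit "s1" receives the heaviest recommendations, so it ends up
# rated as the most popular text unit.
edges = {"s0": {"s1": 1.0}, "s2": {"s1": 2.0, "s0": 0.5}}
print(rate_nodes(edges))
```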
Objective. The aim of the paper is to develop a method of automatic extractive text summarization based on recurrent artificial neural networks with unsupervised learning.
Methods. An algorithm is proposed that achieves deeper, more abstract text processing by interpreting the text as a graph. The algorithm relies on elements of graph theory and methods of algorithm design. The text unit importance estimator uses a recommendation rating principle.
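For concreteness, here is a minimal end-to-end sketch of such a pipeline: sentences become graph nodes, directed edges carry a similarity weight, a recommendation-rating iteration scores the nodes, and the top-scored sentences form the extract. The sentence splitter, the word-overlap similarity, the edge direction (later sentence to earlier sentence), and the summary length are illustrative assumptions rather than details from the article, which additionally employs a recurrent neural network not modeled here.

```python
import re
from math import log

def split_sentences(text):
    """Naive sentence splitter on terminal punctuation (assumed preprocessing)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def overlap(a, b):
    """Word-overlap similarity normalized by sentence length."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (log(len(wa)) + log(len(wb)))

def build_directed_graph(sentences):
    """Nodes are sentence indices; assumed edge direction: later -> earlier."""
    edges = {i: {} for i in range(len(sentences))}
    for i, si in enumerate(sentences):
        for j, sj in enumerate(sentences):
            if i > j:
                w = overlap(si, sj)
                if w > 0:
                    edges[i][j] = w
    return edges

def summarize(text, k=2, damping=0.85, iterations=50):
    """Return the k sentences with the highest recommendation rating."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return text
    edges = build_directed_graph(sentences)
    score = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for t in range(n):
            # Each incoming edge passes on a share of its source's rating.
            rank = sum(score[u] * w[t] / sum(w.values())
                       for u, w in edges.items() if t in w)
            new.append((1 - damping) / n + damping * rank)
        score = new
    top = sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))

print(summarize("Graphs model text structure. Nodes are sentences. "
                "Edges carry similarity weights. Popular nodes form the summary."))
```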
Results. In a relative comparison, the neural network built on the directed graph performs almost 5 % better than the undirected-graph version. With the graph interpretation algorithm, network performance is 15 % higher than with the usual simple lexical n-gram representation.
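The article reports only these relative figures. To make the comparison concrete, the sketch below shows one plausible reading of the two graph variants, assuming the undirected version is obtained by symmetrizing the directed sentence graph; this is an assumption for illustration, not the authors' construction.

```python
# Assumed contrast between the two variants: the undirected graph is taken
# here to be the symmetrized form of the directed one (each edge mirrored).
def symmetrize(directed_edges):
    """directed_edges: {u: {v: weight}} -> undirected adjacency with mirrored edges."""
    undirected = {u: dict(nbrs) for u, nbrs in directed_edges.items()}
    for u, nbrs in directed_edges.items():
        for v, w in nbrs.items():
            undirected.setdefault(v, {})[u] = max(w, undirected.get(v, {}).get(u, 0.0))
    return undirected

directed = {"s0": {"s1": 1.0}, "s2": {"s1": 2.0, "s0": 0.5}}
print(symmetrize(directed))  # every edge now has a reverse counterpart
```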
Conclusions. The distinguishing feature of this method is that it takes the text's own structure into account instead of processing the text as simple rows of lexical and semantic terms. It is the transformation of the text into a multidimensional oriented graph that opens the potential for much more abstract text processing. Practical applications, in turn, cover a large area of continuous processing not only of social networks, news, blogs, articles, and communications, but also of education, genetics, and medicine.
License
Copyright (c) 2018 Igor Sikorsky Kyiv Polytechnic Institute
This work is licensed under a Creative Commons Attribution 4.0 International License.