Using Stochastic Automaton for Data Consolidation

Olexandr V. Koval, Valeriy O. Kuzminykh, Dmitriy V. Khaustov

Abstract


Background. Methods and algorithms are needed for efficient on-demand search of relevant information. The article addresses the consolidation of data for subsequent use in information-analytical systems.

Objective. The aim of the paper is to identify capabilities and build algorithms for searching relevant information across disparate sources by analyzing probabilistic information about the possible presence of relevant documents in these sources.

Methods. To find information relevant to search queries, the approach uses probability estimates of the presence of relevant documents in the sources and then increases the number of documents selected from these sources so that their relevance to the query can be analyzed.

Results. The structure of a programmable stochastic automaton that selects the most likely information sources according to relevance parameters, and an information retrieval algorithm based on this automaton, were developed.
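
The idea of probability-driven source selection can be illustrated with a minimal sketch. The abstract does not specify the automaton's structure, so the code below substitutes a standard linear reward-inaction learning automaton as an assumption; the function name `consolidate`, the `rate` parameter, the toy sources, and the keyword-based relevance check are all hypothetical and not taken from the paper.

```python
import random

def consolidate(sources, is_relevant, rounds=200, rate=0.1, seed=0):
    """Sample documents from sources, shifting selection probability
    toward sources that yield relevant documents.

    Assumption: a linear reward-inaction update, not the paper's automaton.
    """
    rng = random.Random(seed)
    names = list(sources)
    # Start with no prior knowledge: every source is equally likely.
    probs = {n: 1.0 / len(names) for n in names}
    selected = []
    for _ in range(rounds):
        # The automaton picks a source according to its current probabilities.
        name = rng.choices(names, weights=[probs[n] for n in names])[0]
        # Documents are drawn with replacement, so repeats are possible.
        doc = rng.choice(sources[name])
        if is_relevant(doc):
            selected.append((name, doc))
            # Reward: move probability mass toward the chosen source.
            for n in names:
                if n == name:
                    probs[n] += rate * (1.0 - probs[n])
                else:
                    probs[n] *= 1.0 - rate
        # Inaction: an irrelevant document leaves the probabilities unchanged.
    return probs, selected

# Hypothetical toy data: one source rich in relevant documents, one without any.
toy_sources = {
    "archive": ["stochastic automaton survey", "automaton for retrieval"],
    "forum": ["weekend recipes", "travel notes"],
}
probs, selected = consolidate(toy_sources, lambda d: "automaton" in d)
```

In this sketch the update keeps the probabilities summing to one, and a source that never returns a relevant document is sampled less and less often, so more documents end up being drawn from the sources that have proven useful.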

Conclusions. The described algorithm, which uses a stochastic automaton for data consolidation, makes it possible to develop a set of software tools that solve the data consolidation problem fully and holistically for diverse systems that search information sources differing in composition and presentation type.


Keywords


Open data sources; Data consolidation; Information-analytical systems; Information retrieval systems; Probabilistic models; Relevance; Big data tasks




DOI: https://doi.org/10.20535/1810-0546.2017.2.100011


Copyright (c) 2017 NTUU KPI