Using Stochastic Automaton for Data Consolidation
DOI:
https://doi.org/10.20535/1810-0546.2017.2.100011
Keywords:
Open data sources, Data consolidation, Information-analytical systems, Information retrieval systems, Probabilistic models, Relevance, Big data tasks
Abstract
Background. Efficient on-demand search for relevant information requires new methods and algorithms. The article addresses the consolidation of data for subsequent use in information-analytical systems.
Objective. The aim of the paper is to identify the capabilities of, and to build, algorithms that search for relevant information in disparate sources by analyzing probabilistic information about the likely presence of relevant documents in those sources.
Methods. To find information relevant to search queries, the approach uses estimates of the probability that relevant documents are available in each source, and then increases the number of documents selected from those sources so that their relevance to the query can be analyzed.
Results. The structure of a stochastic programmable automaton that selects the most probable information sources according to relevance parameters, and an information retrieval algorithm based on this stochastic automaton, were developed.
Conclusions. The described algorithm, which uses a stochastic automaton for data consolidation, makes it possible to develop a set of software tools that solve the data consolidation problem fully and holistically for diverse systems searching for information in sources that differ in composition and presentation type.
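The idea summarized in the abstract, choosing sources at random according to estimated probabilities of relevance and drawing more documents from the sources that prove useful, can be illustrated with a minimal sketch. The abstract does not specify the automaton's structure or update rule, so everything below is an assumption made for illustration: the linear reward-inaction update, the `learning_rate` parameter, and the names `SourceSelectionAutomaton`, `consolidate`, and `is_relevant` are hypothetical and are not taken from the paper.

```python
import random


class SourceSelectionAutomaton:
    """Hypothetical stochastic automaton: one selection probability per source."""

    def __init__(self, n_sources, learning_rate=0.1, seed=None):
        if n_sources < 1:
            raise ValueError("need at least one source")
        # Start with no preference: every source is equally likely.
        self.probs = [1.0 / n_sources] * n_sources
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a source index according to the current probabilities."""
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, chosen, relevant):
        """Linear reward-inaction update (an assumed rule, not the paper's).

        A relevant document shifts probability mass toward the chosen source;
        an irrelevant one leaves the probabilities unchanged.
        """
        if not relevant:
            return
        a = self.learning_rate
        for i in range(len(self.probs)):
            if i == chosen:
                self.probs[i] += a * (1.0 - self.probs[i])
            else:
                self.probs[i] *= 1.0 - a


def consolidate(sources, is_relevant, steps, seed=0):
    """Draw documents from sources picked by the automaton; keep relevant ones.

    `sources` is a list of document lists; `is_relevant` is a caller-supplied
    predicate standing in for the query relevance analysis.
    """
    automaton = SourceSelectionAutomaton(len(sources), seed=seed)
    cursors = [0] * len(sources)  # next unread document in each source
    selected = []
    for _ in range(steps):
        s = automaton.choose()
        if cursors[s] >= len(sources[s]):
            continue  # this source is exhausted; skip the step
        doc = sources[s][cursors[s]]
        cursors[s] += 1
        relevant = is_relevant(doc)
        if relevant:
            selected.append(doc)
        automaton.update(s, relevant)
    return selected, automaton.probs
```

Under these assumptions, sources that keep returning relevant documents are sampled more often over time, so the number of documents taken from them grows, while the probabilities always sum to one. A different reinforcement scheme, for example one that also penalizes irrelevant results, could be substituted without changing the overall loop.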
License
Copyright (c) 2017 NTUU KPI
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under CC BY 4.0 that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.