Demythization of structural XML query processing: Comparison of holistic and binary approaches

Lukáš, Petr; Bača, Radim; Krátký, Michal; Ling, Tok Wang

dc.contributor.author	Lukáš, Petr
dc.contributor.author	Bača, Radim
dc.contributor.author	Krátký, Michal
dc.contributor.author	Ling, Tok Wang
dc.date.accessioned	2021-06-23T09:04:02Z
dc.date.available	2021-06-23T09:04:02Z
dc.date.issued	2021
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering. 2021, vol. 33, issue 4, p. 1439-1452.	cs
dc.identifier.issn	1041-4347
dc.identifier.issn	1558-2191
dc.identifier.uri	http://hdl.handle.net/10084/143107
dc.description.abstract	XML queries can be modeled by twig pattern queries (TPQs) specifying predicates on XML nodes and XPath relationships satisfied between them. Many TPQ types have been proposed; this paper takes into account a TPQ model extended by a specification of output and non-output query nodes since it complies with the XQuery semantics and, in many cases, leads to more efficient query processing. In general, there are two types of approaches to processing a TPQ: holistic joins and binary joins. Whereas the binary join approach builds a query plan as a tree of interconnected binary operators, the holistic join approach evaluates a whole query using one operator (i.e., using one complex algorithm). Surprisingly, a thorough analytical and experimental comparison is still missing despite an enormous research effort in this area. In this paper, we try to fill this gap; we analytically and experimentally show that the binary joins used in a fully-pipelined plan (i.e., a plan where each join operation does not wait for the complete result of the previous operation and no explicit sorting is used) can often outperform the holistic joins, especially for TPQs with a higher ratio of non-output query nodes. The main contributions of this paper can be summarized as follows: (i) we introduce several improvements of existing binary join approaches that allow building a fully-pipelined plan for a TPQ considering non-output query nodes, (ii) we prove that for a certain class of TPQs such a plan has linear time complexity with respect to the size of the input and output as well as linear space complexity with respect to the XML document depth (i.e., the same complexity as the holistic join approaches), (iii) we show that our improved binary join approach outperforms the holistic join approaches in many situations, and (iv) we propose a simple combined approach that utilizes the advantages of both types of approaches.	cs
dc.language.iso	en	cs
dc.publisher	IEEE	cs
dc.relation.ispartofseries	IEEE Transactions on Knowledge and Data Engineering	cs
dc.relation.uri	https://doi.org/10.1109/TKDE.2019.2946157	cs
dc.rights	© 2019 IEEE	cs
dc.subject	XML	cs
dc.subject	query processing	cs
dc.subject	semantics	cs
dc.subject	sorting	cs
dc.subject	time complexity	cs
dc.subject	impedance matching	cs
dc.subject	structural XML query processing	cs
dc.subject	twig pattern query	cs
dc.subject	holistic joins	cs
dc.subject	binary joins	cs
dc.subject	XPath	cs
dc.subject	XQuery	cs
dc.title	Demythization of structural XML query processing: Comparison of holistic and binary approaches	cs
dc.type	article	cs
dc.identifier.doi	10.1109/TKDE.2019.2946157
dc.type.status	Peer-reviewed	cs
dc.description.source	Web of Science	cs
dc.description.volume	33	cs
dc.description.issue	4	cs
dc.description.lastpage	1452	cs
dc.description.firstpage	1439	cs
dc.identifier.wos	000626617900008

Files in this item

There are no files associated with this item.

This item appears in the following collections

Publications of VŠB-TUO in Web of Science [7798]
The collection contains bibliographic records of articles by VŠB-TUO academic staff published in journals indexed in Web of Science from 1990 to the present.
Publications of Department of Computer Science (460) [562]
The collection contains bibliographic records of publications (articles) by academic staff of the Department of Computer Science (460) in journals and in Lecture Notes in Computer Science registered in Web of Science from 2003 to the present.
Articles from Impact Factor Journals [6377]
Journal articles (since 2008) from journals that had an impact factor at the time the article was published.
