Similarity joins in relational database systems

State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity int...


Bibliographic Details
Main Authors:	Augsten, Nikolaus (Author), Böhlen, Michael H. (Author)
Format:	eBook
Language:	English
Published:	San Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, 2014.
Series:	Synthesis digital library of engineering and computer science. Synthesis lectures on data management ; # 38.
Subjects:	Relational databases; Similarity transformations; edit distance; lower bound; pq-grams; q-grams; similarity; similarity join; strings; token-based distance; trees; upper bound
Online Access:	Abstract with links to full text


LEADER	06523nam a2200757 i 4500
001	201310DTM038
005	20160320103534.0
006	m eo d
007	cr cn \|\|\|m\|\|\|a
008	131221s2014 caua foab 001 0 eng d
020			\|a 9781627050296 \|q (ebook)
020			\|z 9781627050289 \|q (paperback)
024	7		\|a 10.2200/S00544ED1V01Y201310DTM038 \|2 doi
035			\|a (CaBNVSL)swl00402968
035			\|a (OCoLC)866563916
040			\|a CaBNVSL \|b eng \|e rda \|c CaBNVSL \|d CaBNVSL
050		4	\|a QA76.9.D3 \|b A938 2014
082	0	4	\|a 005.7565 \|2 23
100	1		\|a Augsten, Nikolaus, \|e author.
245	1	0	\|a Similarity joins in relational database systems / \|c Nikolaus Augsten, Michael H. Böhlen.
264		1	\|a San Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : \|b Morgan & Claypool, \|c 2014.
300			\|a 1 PDF (xvii, 106 pages) : \|b illustrations.
336			\|a text \|2 rdacontent
337			\|a electronic \|2 isbdmedia
338			\|a online resource \|2 rdacarrier
490	1		\|a Synthesis lectures on data management, \|x 2153-5426 ; \|v # 38
500			\|a Part of: Synthesis digital library of engineering and computer science.
500			\|a Series from website.
504			\|a Includes bibliographical references (pages 93-101) and index.
505	0		\|a 1. Introduction -- 1.1 Applications of similarity queries -- 1.2 Edit-based similarity measures -- 1.3 Token-based similarity measures --
505	8		\|a 2. Data types -- 2.1 Strings -- 2.2 Trees --
505	8		\|a 3. Edit-based distances -- 3.1 String edit distance -- 3.1.1 Definition of the string edit distance -- 3.1.2 Computation of the string edit distance -- 3.2 Tree edit distance -- 3.2.1 Definition of the tree edit distance -- 3.2.2 Computation of the tree edit distance -- 3.2.3 Constrained tree edit distance -- 3.2.4 Unordered tree edit distance -- 3.3 Further readings --
505	8		\|a 4. Token-based distances -- 4.1 Sets and bags -- 4.1.1 Counting approach -- 4.1.2 Frequency approach -- 4.2 Similarity measures for sets and bags -- 4.2.1 Overlap similarity -- 4.2.2 Jaccard similarity -- 4.2.3 Dice similarity -- 4.2.4 Converting threshold constraints -- 4.3 String tokens -- 4.3.1 q-gram tokens -- 4.4 Tokens for ordered trees -- 4.4.1 Overview of ordered tree tokens -- 4.4.2 The pq-gram distance -- 4.4.3 An algorithm for the pq-gram index -- 4.4.4 Relational implementation -- 4.5 Tokens for unordered trees -- 4.5.1 Overview of unordered tree tokens -- 4.5.2 Desired properties for unordered tree decompositions -- 4.5.3 The windowed pq-gram distance -- 4.5.4 Properties of windowed pq-grams -- 4.5.5 Building the windowed pq-gram index -- 4.6 Discussion: properties of tree tokens -- 4.7 Further readings --
505	8		\|a 5. Query processing techniques -- 5.1 Filters -- 5.2 Lower and upper bounds -- 5.3 String distance bounds -- 5.3.1 Length filter -- 5.3.2 Count filter -- 5.3.3 Positional count filter -- 5.3.4 Using string filters in a relational database -- 5.4 Tree distance bounds -- 5.4.1 Size lower bound -- 5.4.2 Intersection lower bound -- 5.4.3 Traversal string lower bound -- 5.4.4 pq-gram lower bound -- 5.4.5 Binary branch lower bound -- 5.4.6 Constrained edit distance upper bound -- 5.5 Further readings --
505	8		\|a 6. Filters for token equality joins -- 6.1 Token equality join, avoiding empty intersections -- 6.2 Prefix filter, avoiding small intersections -- 6.2.1 Prefix filter for overlap similarity -- 6.2.2 Prefix filter for Jaccard similarity -- 6.2.3 Effectiveness of prefix filtering -- 6.3 Size filter -- 6.4 Positional filter -- 6.5 Partitioning filter -- 6.6 Further readings --
505	8		\|a 7. Conclusion -- Bibliography -- Authors' biographies -- Index.
506			\|a Abstract freely available; full-text restricted to subscribers or individual document purchasers.
510	0		\|a Compendex
510	0		\|a Google book search
510	0		\|a Google scholar
510	0		\|a INSPEC
520	3		\|a State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify the edit distance as the de facto standard for comparing complex objects. Since the edit distance is computationally expensive, token-based distances have been introduced to speed up edit distance computations. The basic idea is to decompose complex objects into sets of tokens that can be compared efficiently. Token-based distances are used to compute an approximation of the edit distance and prune expensive edit distance calculations. A key observation when computing similarity joins is that many of the object pairs, for which the similarity is computed, are very different from each other. Filters exploit this property to improve the performance of similarity joins. A filter preprocesses the input data sets and produces a set of candidate pairs. The distance function is evaluated on the candidate pairs only. We describe the essential query processing techniques for filters based on lower and upper bounds. For token equality joins we describe prefix, size, positional and partitioning filters, which can be used to avoid the computation of small intersections that are not needed since the similarity would be too low.
530			\|a Also available in print.
538			\|a Mode of access: World Wide Web.
538			\|a System requirements: Adobe Acrobat Reader.
588			\|a Title from PDF title page (viewed on December 21, 2013).
650		0	\|a Relational databases.
650		0	\|a Similarity transformations.
653			\|a edit distance
653			\|a lower bound
653			\|a pq-grams
653			\|a q-grams
653			\|a similarity
653			\|a similarity join
653			\|a strings
653			\|a token-based distance
653			\|a trees
653			\|a upper bound
700	1		\|a Böhlen, Michael H., \|e author.
776	0	8	\|i Print version: \|z 9781627050289
830		0	\|a Synthesis digital library of engineering and computer science.
830		0	\|a Synthesis lectures on data management ; \|v # 38. \|x 2153-5426
856	4	8	\|3 Abstract with links to full text \|u http://dx.doi.org/10.2200/S00544ED1V01Y201310DTM038
942			\|c EB
999			\|c 81069 \|d 81069
952			\|0 0 \|1 0 \|4 0 \|7 0 \|9 73089 \|a MGUL \|b MGUL \|d 2016-03-20 \|l 0 \|r 2016-03-20 \|w 2016-03-20 \|y EB

Similar Items