Information Retrieval Models
Foundations and Relationships
2013
Morgan and Claypool Life Sciences (publisher)
978-1-62705-078-4 (ISBN)
Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the vector-space model (VSM), the probabilistic relevance framework (PRF), the binary independence retrieval (BIR) model, BM25 (Best-Match Version 25, the main instantiation of the PRF/BIR), and language modelling (LM). Also, the early 2000s saw the arrival of divergence from randomness (DFR).
Regarding intuition and simplicity, though LM is clear from a probabilistic point of view, several people stated: "It is easy to understand TF-IDF and BM25. For LM, however, we understand the math, but we do not fully understand why it works."
This book takes a horizontal approach, gathering the foundations of TF-IDF, PRF, BIR, Poisson, BM25, LM, probabilistic inference networks (PINs), and divergence-based models. The aim is to create a consolidated and balanced view of the main models.
A particular focus of this book is on the "relationships between models." This includes an overview of the main frameworks (PRF, logical IR, VSM, generalized VSM) and a pairing of TF-IDF with other models. It becomes evident that TF-IDF and LM measure the same thing, namely the dependence (overlap) between document and query. The Poisson probability helps to establish probabilistic, non-heuristic roots for TF-IDF, and the Poisson parameter, the average term frequency, is a binding link between several retrieval models and model parameters.
List of Figures
Preface
Acknowledgments
Introduction
Foundations of IR Models
Relationships Between IR Models
Summary & Research Outlook
Bibliography
Author's Biography
Index
Publication date (per publisher) | 1 September 2013 |
---|---|
Series | Synthesis Lectures on Information Concepts, Retrieval, and Services |
Place of publication | San Rafael, CA |
Language | English |
Dimensions | 191 x 235 mm |
Weight | 330 g |
Subject area | Mathematics / Computer Science ► Computer Science ► Databases |
Subject area | Mathematics / Computer Science ► Computer Science ► Networks |
Subject area | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
Subject area | Computer Science ► Other Topics ► Hardware |
ISBN-10 | 1-62705-078-7 / 1627050787 |
ISBN-13 | 978-1-62705-078-4 / 9781627050784 |
Condition | New |