Solr in Action

Trey Grainger, Timothy Potter (Authors)

Book | Softcover

664 pages

2014
Manning Publications (Publisher)
978-1-61729-102-9 (ISBN)

Clearly written, comprehensive guide
In-depth coverage of Solr 4
Uses real-world examples backed by years of experience

"Solr in Action" is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities.

Whether handling big data, building cloud-based services, or developing multi-tenant web applications, it's vital to have a fast, reliable search solution.

Apache Solr is a scalable and ready-to-deploy open-source full-text search engine powered by Lucene. It offers key features like multi-lingual keyword searching, faceted search, intelligent matching, and relevancy weighting right out of the box.

Solr 4 provides new features to enable large-scale distributed search solutions that can be deployed as an elastically scaling cloud-based service and can provide additional intelligence to other big data technologies like Hadoop and Mahout.

"Solr in Action" is the definitive guide to implementing fast and scalable search using Apache Solr 4. It uses well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries.

Readers will gain a deep understanding of how to implement core Solr capabilities such as faceted navigation through search results, matched snippet highlighting, field collapsing and search results grouping, spell checking, query auto-complete, querying by functions, and more.

This book assumes some knowledge of Java and standard database technology. No prior experience with Solr or Lucene is required.

Topics included:

How to scale Solr for big data
Rich real-world examples
Solr as a NoSQL data store
Advanced multilingual, data, and relevancy tricks
Coverage of versions through Solr 4.7

Trey Grainger manages the Search Technology Development group at CareerBuilder.com. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, relevancy tuning based on genetic algorithms and user groups, geo-spatial search and validation, and work on customized payload scoring models, data mining, clustering, and recommendations. Trey is the founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.

Timothy Potter is an architect on the Big Data team at Dachis Group, where he focuses on large-scale machine learning, text mining, and social network analysis. Tim has worked extensively with Lucene and Solr technologies and has been a speaker at Lucene Revolution. He is a contributing author to Taming Text (Manning 2012) and holds several US Patents related to J2EE-based enterprise application integration. He blogs at thelabdude.blogspot.com.

foreword
preface
acknowledgments
about this book

Part 1 Meet Solr

Chapter 1 Introduction to Solr
Why do I need a search engine?
What is Solr?
Why Solr?
Features overview
Summary
Chapter 2 Getting to know Solr
Getting started
Searching is what it’s all about
Tour of the Solr administration console
Adapting the example to your needs
Summary
Chapter 3 Key Solr concepts
Searching, matching, and finding content
Relevancy
Precision and Recall
Searching at scale
Summary
Chapter 4 Configuring Solr
Overview of solrconfig.xml
Query request handling
Managing searchers
Cache management
Remaining configuration options
Summary
Chapter 5 Indexing
Example microblog search application
Designing your schema
Defining fields in schema.xml
Field types for structured nontext fields
Sending documents to Solr for indexing
Update handler
Index management
Summary
Chapter 6 Text analysis
Analyzing microblog text
Basic text analysis
Defining a custom field type for microblog text
Advanced text analysis
Summary

Part 2 Core Solr capabilities

Chapter 7 Performing queries and handling results
The anatomy of a Solr request
Working with query parsers
Queries and filters
The default query parser (Lucene query parser)
Handling user queries (eDisMax query parser)
Other useful query parsers
Returning results
Sorting results
Debugging query results
Summary
Chapter 8 Faceted search
Navigating your content at a glance
Setting up test data
Field faceting
Query faceting
Range faceting
Filtering upon faceted values
Multiselect faceting, keys, and tags
Beyond the basics
Summary
Chapter 9 Hit highlighting
Overview of hit highlighting
How highlighting works
Improving performance using FastVectorHighlighter
PostingsHighlighter
Summary
Chapter 10 Query suggestions
Spell-check
Autosuggesting query terms
Suggesting document field values
Suggesting queries based on user activity
Summary
Chapter 11 Result grouping/field collapsing
Result grouping vs. field collapsing
Skipping duplicate documents
Returning multiple documents per group
Grouping by functions and queries
Paging and sorting grouped results
Grouping gotchas
Efficient field collapsing with the collapsing query parser
Summary
Chapter 12 Taking Solr to production
Developing a Solr distribution
Deploying Solr
Hardware and server configuration
Data acquisition strategies
Sharding and replication
Solr core management
Managing clusters of servers
Querying and interacting with Solr
Monitoring Solr’s performance
Upgrading between Solr versions
Summary

Part 3 Taking Solr to the next level

Chapter 13 SolrCloud
Getting started with SolrCloud
Core concepts
Distributed indexing
Distributed search
Collections API
Basic system-administration tasks
Advanced topics
Summary
Chapter 14 Multilingual search
Why linguistic analysis matters
Stemming vs. lemmatization
Stemming in action
Handling edge cases
Available language libraries in Solr
Searching content in multiple languages
Language identification
Summary
Chapter 15 Complex query operations
Function queries
Geospatial search
Pivot faceting
Referencing external data
Cross-document and cross-index joins
Big data analytics with Solr
Summary
Chapter 16 Mastering relevancy
The impact of relevancy tuning
Debugging the relevancy calculation
Relevancy boosting
Pluggable Similarity class implementations
Personalized search and recommendations
Creating a personalized search experience
Running relevancy experiments
Summary

appendix A Working with the Solr codebase
appendix B Language-specific field type configurations
appendix C Useful data import configurations

index

Solr has had a long and successful history, but a major new chapter began recently with the advent of Solr 4 and SolrCloud. This is the perfect time for Solr in Action. With clear examples, enlightening diagrams, and coverage from key concepts through the newest features, Solr in Action will have you successfully using Solr in no time!

Solr was born out of necessity in 2004, at CNET Networks (now CBS Interactive), to replace a commercial search engine being discontinued by the vendor. Even though I had no formal search background when I started writing Solr, it felt like a very natural fit, because I have always enjoyed making software “go fast.” I viewed Solr more as an alternate type of datastore designed around an inverted index than as a full-text search engine, and that has helped Solr extend beyond the legacy enterprise search market.

By the end of 2005, Solr was powering the search and faceted navigation of a number of CNET sites, and soon it was made open source. Solr was contributed to the Apache Software Foundation in January 2006 and became a subproject of the Lucene PMC (with Lucene Java as its sibling). There had always been a large degree of overlap with Lucene (the core full-text search library used by Solr) committers, and in 2010 the projects were merged. Separate Lucene and Solr downloads would still be available, but they would be developed by a single unified team. Solr’s version number jumped to match that of Lucene, and the releases have since been synchronized.

The recent Solr 4 release is a major milestone, adding SolrCloud—the set of highly scalable features including distributed indexing with no single points of failure. The NoSQL feature set was also expanded to include transaction logs, update durability, optimistic concurrency, and atomic updates.

Solr in Action, written by longtime Solr power users and community members, Trey and Timothy, covers these important recent Solr features and provides an excellent starting point for those new to Solr. Solr is now used in more places than I could ever have imagined—from integrated library systems to e-commerce platforms, analytics and business intelligence products, content-management systems, internet searches, and more. It’s been rewarding to see Solr grow from a few early adopters to a huge global community of helpful users and active volunteers cooperatively pushing development forward.

Solr in Action gives you the knowledge and techniques you need to use Solr’s features that have been under development since 2004. With Solr in Action in hand, you too are now well equipped to join the global community and help take Solr to new heights!

YONIK SEELEY
CREATOR OF SOLR

In 2008, I was asked to take over leadership of CareerBuilder’s search technology team. We were using the Microsoft FAST search platform at the time, but realized that search was too important to the success of our business for us to continue relying on a commercial vendor instead of developing the domain expertise internally. I immediately began investigating open source alternatives such as Solr, which seemed to provide most of the key features needed for our products. By the summer of 2009, we decided that we were ready to bring our search expertise in-house and convert our systems to Solr.

The timing was great. Lucene, the open source search library upon which Solr is built, had become a full top-level Apache project in February 2005, and Solr, which had been contributed to the Apache Software Foundation in 2006, had become a top-level Apache project in January of 2007. Both technologies were reaching critical mass and would soon be merged (in March 2010) into a unified project.

By the summer of 2010, our entire platform was converted to Solr. In the process, we increased the speed of our searches, significantly reduced the number of servers necessary to support our search infrastructure, dropped expensive licensing fees, increased platform stability, and in-sourced much of the search expertise for which we had previously been dependent on a commercial vendor.

Little did we know at that time how much additional value we would gain by bringing search in-house. We have been able to build entirely new suites of search-based products—from traditional keyword and semantic search, to big data analytics products, to real-time recommendation engines—utilizing Solr as a scalable search architecture to handle billions of documents and millions of queries an hour across hundreds of servers.

We have entered the era of cloud services, elastic scalability, and an explosion of data that we strive to make meaningful for society, and with Solr we are able to tackle each of these challenges head-on.

When Manning approached me about writing Solr in Action, I was hesitant because I knew it would be a large undertaking. My one requirement was that I needed a strong coauthor, and that is exactly what I found in Timothy Potter. Tim also has years of experience developing search-based solutions with Lucene and Solr. He has a wealth of expertise building text analysis systems for social data and architecting realtime analytics solutions using Solr and other cutting-edge big data technologies.

With both of us having received so much help from the Solr community over the years and with such a clear need for an example-driven guide to Solr, Tim and I are excited to be able to provide Solr in Action to help the next generation of search engineers. It’s the book we wish we’d had five years ago when we started with Solr, and we hope that you find it to be useful, whether you are just getting introduced to Solr or are looking to take your knowledge to the next level.

TREY GRAINGER

Publication date (per publisher)	3 April 2014
Foreword	Yonik Seeley
Place of publication	New York
Language	English
Dimensions	191 x 231 mm
Weight	1092 g
Binding	Paperback
Subject areas	Computer Science ► Databases ► Data Warehouse / Data Mining
	Computer Science ► Software Development ► SOA / Web Services
	Mathematics / Computer Science ► Computer Science ► Web / Internet
ISBN-10	1-61729-102-1 / 1617291021
ISBN-13	978-1-61729-102-9 / 9781617291029
Condition	New