Real-Time Analytics
John Wiley & Sons Inc (publisher)
978-1-118-83791-7 (ISBN)
- This title is out of print; no new edition is planned
Construct a robust end-to-end solution for analyzing and visualizing streaming data
Real-time analytics is the hottest topic in data analytics today. In Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data, expert Byron Ellis teaches data analysts the technologies needed to build an effective real-time analytics platform. This platform can then be used to make sense of the constantly changing data that is beginning to outpace traditional batch-based analysis platforms. The author is one of very few leading experts in the field. He has a prestigious background in research, development, analytics, real-time visualization, and Big Data streaming and is uniquely qualified to help you explore this revolutionary field. Moving from a description of the overall architecture of real-time analytics to the use of specific tools to obtain targeted results, Real-Time Analytics leverages open source and modern commercial tools to construct robust, efficient systems that can provide real-time analysis in a cost-effective manner.
The book includes:
* A deep discussion of streaming data systems and architectures
* Instructions for analyzing, storing, and delivering streaming data
* Tips on aggregating data and working with sets
* Information on data warehousing options and techniques
Real-Time Analytics includes in-depth case studies for website analytics, Big Data, visualizing streaming and mobile data, and mining and visualizing operational data flows. The book's "recipe" layout lets readers quickly learn and implement different techniques. All of the code examples presented in the book, along with their related data sets, are available on the companion website.
BYRON ELLIS is CTO of Spongecell, where he heads research and development. Previously the Chief Data Scientist for LivePerson and CTO at AdBrite, Ellis holds a Ph.D. in Statistics from Harvard University, and a B.S. in Cybernetics from UCLA. He presents sessions on real-time analytics at Strata and other major conferences.
Introduction
Chapter 1 Introduction to Streaming Data
Sources of Streaming Data; Operational Monitoring; Web Analytics; Online Advertising; Social Media; Mobile Data and the Internet of Things; Why Streaming Data Is Different; Always On, Always Flowing; Loosely Structured; High-Cardinality Storage; Infrastructures and Algorithms; Conclusion
Part I Streaming Analytics Architecture
Chapter 2 Designing Real-Time Streaming Architectures
Real-Time Architecture Components; Collection; Data Flow; Processing; Storage; Delivery; Features of a Real-Time Architecture; High Availability; Low Latency; Horizontal Scalability; Languages for Real-Time Programming; Java; Scala and Clojure; JavaScript; The Go Language; A Real-Time Architecture Checklist; Collection; Data Flow; Processing; Storage; Delivery; Conclusion
Chapter 3 Service Configuration and Coordination
Motivation for Configuration and Coordination Systems; Maintaining Distributed State; Unreliable Network Connections; Clock Synchronization; Consensus in an Unreliable World; Apache ZooKeeper; The znode; Watches and Notifications; Maintaining Consistency; Creating a ZooKeeper Cluster; ZooKeeper's Native Java Client; The Curator Client; Curator Recipes; Conclusion
Chapter 4 Data-Flow Management in Streaming Analysis
Distributed Data Flows; At Least Once Delivery; The n+1 Problem; Apache Kafka: High-Throughput Distributed Messaging; Design and Implementation; Configuring a Kafka Environment; Interacting with Kafka Brokers; Apache Flume: Distributed Log Collection; The Flume Agent; Configuring the Agent; The Flume Data Model; Channel Selectors; Flume Sources; Flume Sinks; Sink Processors; Flume Channels; Flume Interceptors; Integrating Custom Flume Components; Running Flume Agents; Conclusion
Chapter 5 Processing Streaming Data
Distributed Streaming Data Processing; Coordination; Partitions and Merges; Transactions; Processing Data with Storm; Components of a Storm Cluster; Configuring a Storm Cluster; Distributed Clusters; Local Clusters; Storm Topologies; Implementing Bolts; Implementing and Using Spouts; Distributed Remote Procedure Calls; Trident: The Storm DSL; Processing Data with Samza; Apache YARN; Getting Started with YARN and Samza; Integrating Samza into the Data Flow; Samza Jobs; Conclusion
Chapter 6 Storing Streaming Data
Consistent Hashing; NoSQL Storage Systems; Redis; MongoDB; Cassandra; Other Storage Technologies; Relational Databases; Distributed In-Memory Data Grids; Choosing a Technology; Key-Value Stores; Document Stores; Distributed Hash Table Stores; In-Memory Grids; Relational Databases; Warehousing; Hadoop as ETL and Warehouse; Lambda Architectures; Conclusion
Part II Analysis and Visualization
Chapter 7 Delivering Streaming Metrics
Streaming Web Applications; Working with Node; Managing a Node Project with NPM; Developing Node Web Applications; A Basic Streaming Dashboard; Adding Streaming to Web Applications; Visualizing Data; HTML5 Canvas and Inline SVG; Data-Driven Documents: D3.js; High-Level Tools; Mobile Streaming Applications; Conclusion
Chapter 8 Exact Aggregation and Delivery
Timed Counting and Summation; Counting in Bolts; Counting with Trident; Counting in Samza; Multi-Resolution Time-Series Aggregation; Quantization Framework; Stochastic Optimization; Delivering Time-Series Data; Strip Charts with D3.js; High-Speed Canvas Charts; Horizon Charts; Conclusion
Chapter 9 Statistical Approximation of Streaming Data
Numerical Libraries; Probabilities and Distributions; Expectation and Variance; Statistical Distributions; Discrete Distributions; Continuous Distributions; Joint Distributions; Working with Distributions; Inferring Parameters; The Delta Method; Distribution Inequalities; Random Number Generation; Generating Specific Distributions; Sampling Procedures; Sampling from a Fixed Population; Sampling from a Streaming Population; Biased Streaming Sampling; Conclusion
Chapter 10 Approximating Streaming Data with Sketching
Registers and Hash Functions; Registers; Hash Functions; Working with Sets; The Bloom Filter; The Algorithm; Choosing a Filter Size; Unions and Intersections; Cardinality Estimation; Interesting Variations; Distinct Value Sketches; The Min-Count Algorithm; The HyperLogLog Algorithm; The Count-Min Sketch; Point Queries; Count-Min Sketch Implementation; Top-K and Heavy Hitters; Range and Quantile Queries; Other Applications; Conclusion
Chapter 11 Beyond Aggregation
Models for Real-Time Data; Simple Time-Series Models; Linear Models; Logistic Regression; Neural Network Models; Forecasting with Models; Exponential Smoothing Methods; Regression Methods; Neural Network Methods; Monitoring; Outlier Detection; Change Detection; Real-Time Optimization; Conclusion
Index
Publication date (per publisher) | 19 August 2014 |
---|---|
Place of publication | New York |
Language | English |
Dimensions | 187 x 251 mm |
Weight | 744 g |
Subject area | Mathematics / Computer Science ► Computer Science ► Databases |
Computer Science ► Other Topics ► Hardware | |
ISBN-10 | 1-118-83791-6 / 1118837916 |
ISBN-13 | 978-1-118-83791-7 / 9781118837917 |
Condition | New |