Business Intelligence Guidebook - Rick Sherman

Business Intelligence Guidebook (eBook)

From Data Integration to Analytics

Rick Sherman (Author)

eBook Download: PDF | EPUB
2014 | 1st edition
550 pages
Elsevier Science (Publisher)
978-0-12-411528-6 (ISBN)
38,95 incl. VAT
  • Download available immediately
Between the high-level concepts of business intelligence and the nitty-gritty instructions for using vendors' tools lies the essential, yet poorly understood layer of architecture, design and process. Without this knowledge, Big Data is belittled: projects flounder, run late and go over budget. Business Intelligence Guidebook: From Data Integration to Analytics shines a bright light on an often neglected topic, arming you with the knowledge you need to design rock-solid business intelligence and data integration processes. Practicing consultant and adjunct BI professor Rick Sherman takes the guesswork out of creating systems that are cost-effective, reusable and essential for transforming raw data into valuable information for business decision-makers.
After reading this book, you will be able to design the overall architecture for functioning business intelligence systems with the supporting data warehousing and data integration applications. You will have the information you need to get a project launched, developed, managed and delivered on time and on budget, turning the deluge of data into actionable information that fuels business knowledge. Finally, you'll give your career a boost by demonstrating essential knowledge that puts corporate BI projects on a fast track to success.
• Provides practical guidelines for building successful BI, DW and data integration solutions.
• Explains underlying BI, DW and data integration design, architecture and processes in clear, accessible language.
• Includes the complete project development lifecycle that can be applied at large enterprises as well as at small to medium-sized businesses.
• Describes best practices and pragmatic approaches so readers can put them into action.
• Companion website includes templates and examples, further discussion of key topics, instructor materials, and references to trusted industry sources.

Rick Sherman is the founder of Athena IT Solutions, which provides consulting, training and vendor services for business intelligence, analytics, data integration and data warehousing. He is an adjunct faculty member at Northeastern University's Graduate School of Engineering and is a frequent contributor to industry publications, events, and webinars.

Front
Business Intelligence
Copyright
Contents
Foreword
How to Use This Book
CHAPTER SUMMARIES
Acknowledgments
PART I
CHAPTER 1 - THE BUSINESS DEMAND FOR DATA, INFORMATION, AND ANALYTICS
JUST ONE WORD: DATA
WELCOME TO THE DATA DELUGE
TAMING THE ANALYTICS DELUGE
TOO MUCH DATA, TOO LITTLE INFORMATION
DATA CAPTURE VERSUS INFORMATION ANALYSIS
THE FIVE CS OF DATA
COMMON TERMINOLOGY FROM OUR PERSPECTIVE
REFERENCES
PART II
CHAPTER 2 - JUSTIFYING BI: BUILDING THE BUSINESS AND TECHNICAL CASE
WHY JUSTIFICATION IS NEEDED
BUILDING THE BUSINESS CASE
BUILDING THE TECHNICAL CASE
ASSESSING READINESS
CREATING A BI ROAD MAP
DEVELOPING SCOPE, PRELIMINARY PLAN, AND BUDGET
OBTAINING APPROVAL
COMMON JUSTIFICATION PITFALLS
CHAPTER 3 - DEFINING REQUIREMENTS—BUSINESS, DATA AND QUALITY
THE PURPOSE OF DEFINING REQUIREMENTS
GOALS
DELIVERABLES
ROLES
DEFINING REQUIREMENTS WORKFLOW
INTERVIEWING
DOCUMENTING REQUIREMENTS
PART III
CHAPTER 4 - ARCHITECTURE FRAMEWORK
THE NEED FOR ARCHITECTURAL BLUEPRINTS
ARCHITECTURAL FRAMEWORK
INFORMATION ARCHITECTURE
DATA ARCHITECTURE
TECHNICAL ARCHITECTURE
PRODUCT ARCHITECTURE
METADATA
SECURITY AND PRIVACY
AVOIDING ACCIDENTS WITH ARCHITECTURAL PLANNING
DO NOT OBSESS OVER THE ARCHITECTURE
CHAPTER 5 - INFORMATION ARCHITECTURE
THE PURPOSE OF AN INFORMATION ARCHITECTURE
DATA INTEGRATION FRAMEWORK
DIF INFORMATION ARCHITECTURE
OPERATIONAL BI VERSUS ANALYTICAL BI
MASTER DATA MANAGEMENT
CHAPTER 6 - DATA ARCHITECTURE
THE PURPOSE OF A DATA ARCHITECTURE
HISTORY
DATA ARCHITECTURAL CHOICES
DATA INTEGRATION WORKFLOW
DATA WORKFLOW—RISE OF EDW AGAIN
OPERATIONAL DATA STORE
REFERENCES
CHAPTER 7 - TECHNOLOGY & PRODUCT ARCHITECTURES
WHERE ARE THE PRODUCT AND VENDOR NAMES?
EVOLUTION NOT REVOLUTION
TECHNOLOGY ARCHITECTURE
PRODUCT AND TECHNOLOGY EVALUATIONS
PART IV
CHAPTER 8 - FOUNDATIONAL DATA MODELING
THE PURPOSE OF DATA MODELING
DEFINITIONS—THE DIFFERENCE BETWEEN A DATA MODEL AND DATA MODELING
THREE LEVELS OF DATA MODELS
DATA MODELING WORKFLOW
WHERE DATA MODELS ARE USED
ENTITY-RELATIONSHIP (ER) MODELING OVERVIEW
NORMALIZATION
LIMITS AND PURPOSE OF NORMALIZATION
CHAPTER 9 - DIMENSIONAL MODELING
INTRODUCTION TO DIMENSIONAL MODELING
HIGH-LEVEL VIEW OF A DIMENSIONAL MODEL
FACTS
DIMENSIONS
SCHEMAS
ENTITY RELATIONSHIP VERSUS DIMENSIONAL MODELING
PURPOSE OF DIMENSIONAL MODELING
FACT TABLES
ACHIEVING CONSISTENCY
ADVANCED DIMENSIONS AND FACTS
DIMENSIONAL MODELING RECAP
CHAPTER 10 - BUSINESS INTELLIGENCE DIMENSIONAL MODELING
INTRODUCTION
HIERARCHIES
OUTRIGGER TABLES
SLOWLY CHANGING DIMENSIONS
CAUSAL DIMENSION
MULTIVALUED DIMENSIONS
JUNK DIMENSIONS
VALUE BAND REPORTING
HETEROGENEOUS PRODUCTS
ALTERNATE DIMENSIONS
TOO FEW OR TOO MANY DIMENSIONS
PART V
CHAPTER 11 - DATA INTEGRATION DESIGN AND DEVELOPMENT
GETTING STARTED WITH DATA INTEGRATION
DATA INTEGRATION ARCHITECTURE
DATA INTEGRATION REQUIREMENTS
DATA INTEGRATION DESIGN
DATA INTEGRATION STANDARDS
LOADING HISTORICAL DATA
DATA INTEGRATION PROTOTYPING
DATA INTEGRATION TESTING
CHAPTER 12 - DATA INTEGRATION PROCESSES
INTRODUCTION: MANUAL CODING VERSUS TOOL-BASED DATA INTEGRATION
DATA INTEGRATION SERVICES
PART VI
CHAPTER 13 - BUSINESS INTELLIGENCE APPLICATIONS
BI CONTENT SPECIFICATIONS
REVISE BI APPLICATIONS LIST
BI PERSONAS
BI DESIGN LAYOUT—BEST PRACTICES
DATA DESIGN FOR SELF-SERVICE BI
MATCHING TYPES OF ANALYSIS TO VISUALIZATIONS
CHAPTER 14 - BI DESIGN AND DEVELOPMENT
BI DESIGN
BI DEVELOPMENT
BI APPLICATION TESTING
CHAPTER 15 - ADVANCED ANALYTICS
ADVANCED ANALYTICS OVERVIEW AND BACKGROUND
PREDICTIVE ANALYTICS AND DATA MINING
ANALYTICAL SANDBOXES AND HUBS
BIG DATA ANALYTICS
DATA VISUALIZATION
REFERENCE
CHAPTER 16 - DATA SHADOW SYSTEMS
THE DATA SHADOW PROBLEM
ARE THERE DATA SHADOW SYSTEMS IN YOUR ORGANIZATION?
WHAT KIND OF DATA SHADOW SYSTEMS DO YOU HAVE?
DATA SHADOW SYSTEM TRIAGE
THE EVOLUTION OF DATA SHADOW SYSTEMS IN AN ORGANIZATION
DAMAGES CAUSED BY DATA SHADOW SYSTEMS
THE BENEFITS OF DATA SHADOW SYSTEMS
MOVING BEYOND DATA SHADOW SYSTEMS
MISGUIDED ATTEMPTS TO REPLACE DATA SHADOW SYSTEMS
RENOVATING DATA SHADOW SYSTEMS
PART VII
CHAPTER 17 - PEOPLE, PROCESS AND POLITICS
THE TECHNOLOGY TRAP
THE BUSINESS AND IT RELATIONSHIP
ROLES AND RESPONSIBILITIES
BUILDING THE BI TEAM
TRAINING
DATA GOVERNANCE
CHAPTER 18 - PROJECT MANAGEMENT
THE ROLE OF PROJECT MANAGEMENT
ESTABLISHING A BI PROGRAM
BI ASSESSMENT
WORK BREAKDOWN STRUCTURE
BI ARCHITECTURAL PLAN
BI PROJECTS ARE DIFFERENT
PROJECT METHODOLOGIES
BI PROJECT PHASES
BI PROJECT SCHEDULE
CHAPTER 19 - CENTERS OF EXCELLENCE
THE PURPOSE OF CENTERS OF EXCELLENCE
BI COE
DATA INTEGRATION CENTER OF EXCELLENCE
ENABLING A DATA-DRIVEN ENTERPRISE
REFERENCE
Index

Chapter 1

The Business Demand for Data, Information, and Analytics


Abstract


In the business world, knowledge is not just power. It is the lifeblood of a thriving enterprise. Knowledge comes from information, and that, in turn, comes from data. Many enterprises are overwhelmed by the deluge of data that they are receiving from all directions. They are wondering if they can handle Big Data—with its expanding volume, variety, and velocity. There is a big difference between raw data, which by itself is not useful, and actionable information, which business people can use with confidence to make decisions. Data must be transformed to make it clean, consistent, conformed, current, and comprehensive—the five Cs of data. It is up to a Business Intelligence (BI) team to gather and manage the data to empower the company’s business groups with the information they need to gain knowledge—knowledge that helps them make informed decisions about every step the company takes. While there are attempts to circumvent or replace BI with operational systems, there really is no good substitute for true BI. Operational systems may excel at data capture, but BI excels at information analysis.

Keywords


Big Data; Data; Data 5 Cs; Data capture; Data variety; Data velocity; Data volume; Information; Information analysis; Operational BI
Information in This Chapter
• The data and information deluge
• The analytics deluge
• Data versus actionable information
• Data capture versus information analysis
• The five Cs of data
• Common terminology

Just One Word: Data


“I just want to say one word to you. Just one word… Are you listening? … Plastics. There’s a great future in plastics.”

Mr. McGuire in the 1967 movie The Graduate.

The Mr. McGuires of the world are no longer advising newly minted graduates to get into plastics. But perhaps they should be recommending data. In today’s digital world, data is the key, the ticket, and the Holy Grail all rolled into one.
I do not just mean it’s growing in importance as a profession, although it is a great field to get into, and I’m thrilled that my sons Jake and Josh are pursuing careers in data and technology. Data is where the dollars are when it comes to company budgets. Every few years there is another report showing that business intelligence (BI) is at or near the top of the chief information officer’s (CIO) list of priorities.
Enterprises today are driven by data, or, to be more precise, information that is gleaned from data. It sheds light on what is unknown, it reduces uncertainty, and it turns decision-making from an art to a science.
But whether it’s Big Data or just plain old data, it requires a lot of work before it is actually something useful. You would not want to eat a cup of flour, but baked into a cake with butter, eggs, and sugar for the right amount of time at the right temperature it is transformed into something delicious. Likewise, raw data is unpalatable to the business person who needs it to make decisions. It is inconsistent, incomplete, outdated, unformatted, and riddled with errors. Raw data needs integration, design, modeling, architecting, and other work before it can be transformed into consumable information.
This is where you need data integration to unify and massage the data, data warehousing to store and stage it, and BI to present it to decision-makers in an understandable way. It can be a long and complicated process, but there is a path; there are guidelines and best practices. As with many things that are hard to do, there are promised shortcuts and “silver bullets” that you need to learn to recognize before they trip you up.
It will take a lot more than just reading this book to make your project a success, but my hope is that it will help set you on the right path.

Welcome to the Data Deluge


In the business world, knowledge is not just power. It is the lifeblood of a thriving enterprise. Knowledge comes from information, and that, in turn, comes from data. It is up to a BI team to gather and manage the data to empower the company’s business groups with the information they need to gain knowledge—knowledge that helps them make informed decisions about every step the company takes.
Enterprises need this information to understand their operations, customers, competitors, suppliers, partners, employees, and stockholders. They need to learn about what is happening in the business, analyze their operations, react to internal and external pressures, and make decisions that will help them manage costs, grow revenues, and increase sales and profits. Forrester Research sums it up perfectly: “Data is the raw material of everything firms do, but too many have been treating it like waste material—something to deal with, something to report on, something that grows like bacteria in a petri dish. No more! Some say that data is the new oil—but we think that comparing data to oil is too limiting. Data is the new sun: it’s limitless and touches everything firms do. Data must flow fast and rich for your organization to serve customers better than your competitors can. Firms must invest heavily in building a next-generation customer data management capability to grow revenue and profits in the age of the customer. Data is an asset that even CFOs will realize should have a line on the balance sheet right alongside property, plant, and equipment” [1].
It can be a problem, however, when there is more data than an enterprise can handle. Enterprises collect massive amounts of data every day, internally and externally, as they interact with customers, partners, and suppliers. They research and track information on their competitors and the marketplace. They put tracking codes on their websites so they can learn exactly how many visitors they get and where those visitors came from. They store and track information required by government regulations and industry initiatives. Now there is the Internet of Things (IoT), in which sensors embedded in physical objects such as pacemakers, thermostats, and dog collars collect data. It is a deluge of data (Figure 1.1).

Data Volume, Variety, and Velocity


Enterprises are not only accumulating data in ever-increasing volumes; the variety and velocity of that data are also increasing. Although the emerging “Big Data” databases can cause an enterprise’s ability to gather data to explode, the volume, velocity, and variety are all expanding no matter how “big” or “small” the data is.
Volume—According to many experts, 90% of the data in the world today was created in the last two years alone. When you hear that statistic you might think that it is coming from all the chatter on social media, but data is being generated by all manner of activities. For just one example, think about the emergence of radio frequency identification (RFID) to track products from manufacturing to purchase. It is a huge category of data that simply did not exist before. Although not all of the data gathered is significant for an enterprise, it still leaves a massive amount of data with which to deal.
Velocity—Much of the data now is time sensitive, and there is greater pressure to decrease the time between when it is captured and when it is used for reporting. We now depend on the speed of some of this data. It is extremely helpful to receive an immediate notification from your bank, for example, when a fraudulent transaction is detected, enabling you to cancel your credit card immediately. Businesses across industry sectors are using current data when interacting with their customers, prospects, suppliers, partners, employees, and other stakeholders.
Variety—The sources of data continue to expand. Receiving data from disparate sources further complicates things. Unstructured data, such as audio, video, and social media, and semistructured data like XML and RSS feeds must be handled differently from traditional structured data. The CIO of the past thought phones were just for talking, not something that collected data. He also thought Twitter was something that birds did. Now that an enterprise can collect data from tweets about its products, how does it handle that data and then what does it do with it? Also, what does it do with the invaluable data that business people create in spreadsheets and Microsoft Word documents and use in decision-making? Formerly, CIOs just had to worry about collecting and analyzing data from back office applications, but now their data can come from people, machines, processes, and applications spread across the world.

FIGURE 1.1 Too much information. www.CartoonStock.com.
Unfortunately, enterprises have not been as good at organizing and understanding the data as they have been at gathering it. Data has no value unless you can understand what you have, analyze it, and then act on the insights from the analysis.
See the book’s companion Website www.BIguidebook.com for links to industry research, templates, and other materials to help you learn more about business intelligence and make your next project a success.
To receive updates on newly posted material, subscribe to the email list on the Website or follow the RSS feed of my blog at www.datadoghouse.com.

Taming the Analytics Deluge


With this flood of data...

PDF (Adobe DRM)
Size: 35.3 MB

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console; in our experience it frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

EPUB (Adobe DRM)
Size: 22.1 MB

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, so EPUB is also well suited to mobile reading devices.