Commercial Data Mining - David Nettleton

Commercial Data Mining (eBook)

Processing, Analysis and Modeling for Predictive Analytics Projects

David Nettleton (Author)


2014 | 1st edition
304 pages
Elsevier Science (Publisher)
978-0-12-416658-5 (ISBN)

Whether you are brand new to data mining or working on your tenth predictive analytics project, Commercial Data Mining will be there for you as an accessible reference outlining the entire process and related themes. In this book, you'll learn that your organization does not need a huge volume of data or a Fortune 500 budget to generate business using existing information assets. Expert author David Nettleton guides you through the process from beginning to end, covering everything from business objectives, data sources, and data selection to analysis and predictive modeling. Commercial Data Mining includes case studies and practical examples from Nettleton's more than 20 years of commercial experience. Real-world cases covering customer loyalty, cross-selling, and audience prediction in industries including insurance, banking, and media illustrate the concepts and techniques explained throughout the book.

• Illustrates cost-benefit evaluation of potential projects

• Includes vendor-agnostic advice on what to look for in off-the-shelf solutions as well as tips on building your own data mining tools

• Approachable reference can be read from cover to cover by readers of all experience levels

• Includes practical examples and case studies as well as actionable business insights from the author's own experience

David F. Nettleton has more than 25 years of experience in IT system development, specializing in databases and data analysis. He has a Bachelor of Science degree in Computer Science, a Master of Science degree in Computer Software and Systems Design, and a Ph.D. in Artificial Intelligence. He has worked for IBM as a Business Intelligence Consultant, among other companies. In 1995 he founded his own consultancy dedicated to commercial data analysis projects, working in the banking, insurance, media, industry, and health sectors. He has published over 40 articles and papers in journals, national and international congresses, and magazines, and has given many presentations at conferences and workshops. He is currently a contract researcher at the Universitat Pompeu Fabra, Barcelona, Spain, and at the IIIA-CSIC, Spain, specializing in data mining applied to online social networks and data privacy. Dr. Nettleton was born in England and has lived in Barcelona, Spain, since 1988.

Chapter 2

Business Objectives

Abstract

This chapter discusses the definition of a data mining project, including its initial concept, motivation, objective, viability, estimated costs, and expected benefit (returns). Key considerations are defined, and a way of quantifying the cost and benefit is presented in terms of the factors that most influence the project. Two case studies illustrate how the cost/benefit evaluation can be applied to real-world projects.

Keywords

business objectives

cost

benefit

evaluation

influential factors

customer call center

mobile applications advertising

Introduction

A commercial data analysis project that lives up to its expectations will probably do so because sufficient time was dedicated at the outset to defining the project’s business objectives. What is meant by business objectives? The following are some examples:

• Reduce the loss of existing customers by 3 percent.

• Augment the contract signings of new customers by 2 percent.

• Augment the sales from cross-selling products to existing customers by 5 percent.

• Predict the television audience share with a probability of 70 percent.

• Predict, with a precision of 75 percent, which clients are most likely to contract a new product.

• Identify new categories of clients and products.

• Create a new customer segmentation model.

The first three examples state a specific percentage improvement as part of the objective.

Business Objective

Assigning a Value for Percent Improvement

The percentage improvement should always be considered with regard to the current precision of an existing index as a baseline. Also, the new precision objective should not get lost in the error bars of the current precision. That is, if the current precision has an error margin of ± 3% in its measurement or calculation, this should be taken into account.
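The check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a method taken from the book: the function name `improvement_is_measurable` and the rule that the targeted improvement must exceed the error margin are assumptions, and the figures are purely illustrative.

```python
def improvement_is_measurable(target_improvement_pct: float,
                              error_margin_pct: float) -> bool:
    """Return True if the targeted improvement is larger than the
    measurement error of the current baseline.

    Both arguments are in percentage points. The 'strictly greater than'
    rule is an assumption made for this sketch.
    """
    return abs(target_improvement_pct) > abs(error_margin_pct)


# Illustrative values: a baseline measured with a margin of +/- 3 points.
print(improvement_is_measurable(2.0, 3.0))  # False: lost in the error bars
print(improvement_is_measurable(5.0, 3.0))  # True: exceeds the error margin
```

Under this assumed rule, an objective of a 2 percent improvement would not be distinguishable from measurement noise when the baseline itself is only known to within ± 3%.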

In the fourth and fifth examples, an absolute value is specified for the desired precision for the data model. In the final two examples the desired improvement is not quantified; instead, the objective is expressed in qualitative terms.

Criteria for Choosing a Viable Project

This section enumerates the main issues and poses some key questions relevant to evaluating the viability of a potential data mining project. The checklists of general and specific considerations provided here form the basis for the rest of the chapter, which specifies benefit and cost criteria in more detail and applies these definitions to two case studies.

Evaluation of Potential Commercial Data Analysis Projects – General Considerations

The following is a list of questions to ask when considering a data analysis project:

• Is data available that is consistent and correlated with the business objectives?

• What is the capacity for improvement with respect to the current methods? (The greater the capacity for improvement, the greater the economic benefit.)

• Is there an operational business need for the project results?

• Can the problem be solved by other techniques or methods? (If the answer is no, the return on the project will be greater.)

• Does the project have a well-defined scope? (If this is the first instance of a project of this type, reducing the scale of the project is recommended.)

Evaluation of Viability in Terms of Available Data – Specific Considerations

The following list provides specific considerations for evaluating the viability of a data mining project in terms of the available data:

• Does the necessary data for the business objectives exist, and does the business have access to it?

• If part or all of the data does not exist, can processes be defined to capture or obtain it?

• What is the coverage of the data with respect to the business objectives?

• What is the availability of a sufficient volume of data over a required period of time, for all clients, product types, sales channels, and so on? (The data should cover all the business factors to be analyzed and modeled. The historical data should cover the current business cycle.)

• Is it necessary to evaluate the quality of the available data in terms of reliability? (The reliability depends on the percentage of erroneous data and incomplete or missing data. The ranges of values must be sufficiently wide to cover all cases of interest.)

• Are people available who are familiar with the relevant data and the operational processes that generate the data?

Factors That Influence Project Benefits

Several factors influence the benefits of a project. A qualitative assessment of current functionality is required first: how satisfactory is the way the task is currently being done? A value between 0 and 1 is assigned, where 1 is the highest grade of satisfaction and 0 is the lowest. The lower the current grade of satisfaction, the greater the potential improvement and, consequently, the benefit.

The potential quality of the result (the evaluation of future functionality) can be estimated by three aspects of the data: coverage, reliability, and correlation:

• The coverage or completeness of the data, assigned a value between 0 and 1, where 1 indicates total coverage.

• The quality or reliability of the data, assigned a value between 0 and 1, where 1 indicates the highest quality. (Both the coverage and the reliability are normally measured variable by variable, giving a total for the whole dataset. Good coverage and reliability for the data help to make the analysis a success, thus giving a greater benefit.)

• The correlation of the data with the business objective, that is, its degree of dependence on that objective, can be measured statistically. A correlation is typically measured as a value from –1 (total negative correlation) through 0 (no correlation) to 1 (total positive correlation). For example, if the business objective is that clients buy more products, the correlation would be calculated for each customer variable (age, time as a customer, zip code of postal address, etc.) with the customer’s sales volume.
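Two of the measures listed above can be sketched as follows. This is an illustrative example under stated assumptions, not the author's procedure: coverage is taken here to be the fraction of non-missing values in a variable, correlation is the Pearson coefficient, and the variable names and toy data (`age`, `sales`) are hypothetical.

```python
import math
from typing import Optional, Sequence


def coverage(values: Sequence[Optional[float]]) -> float:
    """Fraction of non-missing entries (None marks a missing value)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # Correlation is undefined for a constant variable; this sketch
    # returns 0.0 in that case (an assumption).
    return cov / denom if denom else 0.0


# Hypothetical toy data: one customer variable and the sales volume.
age = [25, None, 41, 58, 33]
sales = [120.0, 95.0, 210.0, 300.0, 150.0]

age_coverage = coverage(age)  # 4 of 5 values are present

# Correlate only the rows where the customer variable is present.
pairs = [(a, s) for a, s in zip(age, sales) if a is not None]
r = pearson([a for a, _ in pairs], [s for _, s in pairs])

print(f"coverage of age: {age_coverage:.2f}")
print(f"correlation of age with sales: {r:.2f}")
```

In practice the same calculation would be repeated for each customer variable, giving one coverage value and one correlation value per variable.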

Once individual values for coverage, reliability, and correlation are acquired, an estimation of the future functionality can be obtained using the formula:

An estimation of the possible improvement is then determined by calculating the difference between the current and the future functionality, thus:

Improvement = Future functionality − Current functionality
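The formula for future functionality is not reproduced in this excerpt. As a rough sketch only, the Python below assumes that future functionality is the unweighted mean of coverage, reliability, and the absolute value of the correlation. That averaging rule, the use of the absolute value, the function names, and the sample figures are all assumptions made for illustration, not the author's stated formula. The improvement is then the future functionality minus the current functionality, following the description in the text.

```python
def future_functionality(coverage: float, reliability: float,
                         correlation: float) -> float:
    """Estimate future functionality on a 0-to-1 scale.

    ASSUMPTION: unweighted mean of the three data aspects, using the
    magnitude of the correlation so that a strong negative correlation
    counts as informative. The book's own formula may differ.
    """
    return (coverage + reliability + abs(correlation)) / 3


def improvement(current: float, future: float) -> float:
    """Possible improvement: future minus current functionality."""
    return future - current


# Illustrative figures only.
future = future_functionality(coverage=0.9, reliability=0.8, correlation=0.7)
gain = improvement(current=0.5, future=future)

print(f"future functionality: {future:.2f}")  # 0.80
print(f"possible improvement: {gain:.2f}")    # 0.30
```

A negative result would suggest that, under these assumptions, the available data is unlikely to improve on the way the task is currently done.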

A fourth aspect, volatility, concerns the amount of time the results of the analysis or data modeling will remain valid.

The volatility of the environment of the business objective can be defined as a value between 0 and 1, where 0 = minimum volatility and 1 = maximum volatility. High volatility can cause models and conclusions to quickly become out of date with respect to the data; even the business objective can lose relevance. Volatility depends on whether the results are applicable over the long, medium, or short term with respect to the business cycle.

Note that this a priori evaluation gives an idea for the viability of a data mining project. However, it is clear that the quality and precision of the end result will also depend on how well the project is executed: analysis, modeling, implementation, deployment, and so on. The next section, which deals with the estimation of the cost of the project, includes a factor (expertise) that evaluates the availability of the people and skills necessary to guarantee the a posteriori success of the project.

Factors That Influence Project Costs

There are numerous factors that influence how much a project costs. These include:

• Accessibility: The more data sources, the higher the cost. Typically, there are at least two different data sources.

• Complexity: The greater the number of variables in the data, the greater the cost. Categorical-type variables (zones, product types, etc.) must especially be taken into account, given that each variable may have many possible values (for example, 50). On the other hand, there could be just 10 other variables, each of which has only two possible...

Publication date (per publisher)	29.1.2014
Language	English
ISBN-10	0-12-416658-X / 012416658X
ISBN-13	978-0-12-416658-5 / 9780124166585
