Why is KDD important?
Almost every company has been through a customer profiling exercise.
To better target your markets and best customer prospects you need to
be able to answer these proverbial questions:
- Who is my best customer / my worst customer?
- Who is my single buy customer?
- What do they buy?
- When do they buy it?
- Why don't they buy?
The customer profile will describe what the customers are like in marketing
terms, spending patterns, payment histories, repeat business opportunities,
cross selling products, and similar categories that are considered significant
to ongoing business development.
Important topics in analysing data
Data Mining is the process of finding new and potentially useful knowledge
from data. According to a recent study by Gartner Group, worldwide spending
on Data Mining licenses and services is expected to reach $76.3 billion
in 2005, more than tripling the $23.3 billion spent in 2000. The most
important business tasks in Data Mining are:
- Customer Relationship Management is a strategy used to learn
more about customers' needs and behaviors in order to develop stronger
relationships with them. After all, good customer relationships are
at the heart of business success. To be effective, CRM requires using
information about customers and prospects in all stages of their relationship
with a company. From the company's point of view, the stages are acquiring
customers, increasing the value of customers and retaining good customers.
Read more about Customer Relationship Management.
- Direct mailing is a method that companies commonly choose as
part of their direct marketing strategies. Of course every company
wants its mailings to be as effective as possible. The effectiveness
of a mailing campaign can be measured by its response rate. A high response
rate means that the marketing goals have been achieved and therefore
that the mailing costs were justified. A company that regularly sends
mailings for marketing purposes can reduce its mailing costs considerably
by using data mining techniques to optimize the response rate.
One example of a direct mailing action is described here.
- Fraud detection systems enable an operator to respond to fraud
by denying services to or detecting and preparing prosecutions against
fraudulent users. The huge volume of call activity in a network means
that fraud detection and analysis is a challenging problem.
- Other tasks are the prediction of sales in order to minimize
stocks, and the prediction of electricity consumption or telecommunication
services at particular times of day in order to minimize the use of external
services or to optimize network routing, respectively. The health sector
demands several analysis tasks for resource management, quality control,
and decision making.
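For the direct mailing task above, the link between response rate and mailing cost can be made concrete with a small calculation. The sketch below is only an illustration: the customer list, the scores, and the helper names `response_rate` and `top_scored` are hypothetical, and the scores merely stand in for the output of some response model.

```python
# Hypothetical campaign data: (customer_id, model_score, responded).
# The scores are made up; they stand in for a response model's output.
customers = [
    ("c1", 0.9, True),
    ("c2", 0.8, True),
    ("c3", 0.7, False),
    ("c4", 0.4, True),
    ("c5", 0.3, False),
    ("c6", 0.2, False),
    ("c7", 0.1, False),
    ("c8", 0.1, False),
]


def response_rate(mailed):
    """Fraction of mailed customers who responded."""
    if not mailed:
        return 0.0
    return sum(1 for _, _, responded in mailed if responded) / len(mailed)


def top_scored(rows, k):
    """Select the k customers with the highest model score."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:k]


rate_all = response_rate(customers)                 # mail everyone
rate_top = response_rate(top_scored(customers, 4))  # mail the top half only
```

In this toy data, mailing only the four best-scored customers halves the number of letters while still reaching every responder, which is the kind of saving the text refers to.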
On-line Analytical Processing (OLAP) offers interactive data analysis
by aggregating data and counting the frequencies. This already answers
questions like the following:
- What are the attributes of my most frequent customers?
- Which are the frequently sold products?
- How many unpaid bills do I have to expect per year?
- How many returns did I receive after my last direct mailing action?
Reports that support decision making need more detailed information. Questions
are more specific, for instance:
- Which customers are most likely to sell their insurance contract back
to the insurance company before it ends?
- How many sales of a certain item do I have to expect in order to not
offer empty shelves to customers and at the same time minimize my stock?
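The aggregating and counting that OLAP performs can be pictured with a few lines of code. This is a minimal sketch under stated assumptions: the sales records and the variable names are hypothetical, and a real OLAP tool would run such roll-ups inside the database rather than in application code.

```python
from collections import Counter

# Hypothetical sales records: (customer, product).
sales = [
    ("alice", "printer"),
    ("bob", "paper"),
    ("alice", "paper"),
    ("carol", "paper"),
    ("alice", "ink"),
    ("bob", "ink"),
]

# Count frequencies along each dimension, similar to a simple roll-up.
sales_per_product = Counter(product for _, product in sales)
sales_per_customer = Counter(customer for customer, _ in sales)

# The most frequent entry per dimension answers questions such as
# "which product is sold most often?" and "who buys most often?".
most_sold = sales_per_product.most_common(1)[0]
most_frequent_customer = sales_per_customer.most_common(1)[0]
```

Counts like these describe what has already happened. The more specific questions about what is likely to happen need a model learned from the data, which is where data mining comes in.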
Knowledge Discovery in Databases, or KDD for short, refers
to the broad process of finding knowledge in data, and emphasizes the
"high-level" application of particular data mining methods. The unifying
goal of the KDD process is to extract knowledge from data in the context
of large databases. KDD can be considered
a high-level query language for relational databases that aims at generating
sensible reports such that a company may enhance its performance. KDD
enables analysts to model virtually any customer activity and to find
previously hidden patterns relevant to current business problems, or business
evolution and growth.
But data mining is a difficult process which requires many iterations
and adaptations of the data and of the parameter settings until a satisfactory
result is achieved. Within the data mining process considerable time is
spent on preprocessing the data (data cleaning and handling of null
values) and on feature generation and selection (in databases this means
constructing additional columns and selecting the relevant attributes). Practical
experience has shown that the time spent on preprocessing can take from
50% up to 80% of the entire data mining process when using traditional
attribute-value learners. That's why preprocessing is the key issue
in data analysis.
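The preprocessing steps named above (handling null values, generating a feature, selecting attributes) can be sketched as follows. Everything in this example is an assumption made for illustration: the table rows, the column names, the helper names `mean_of` and `preprocess`, and the choice of replacing nulls with the column mean, which is only one of several common strategies and is not taken from MiningMart.

```python
# Hypothetical customer rows as they might come from a database table.
# None stands for a SQL NULL.
rows = [
    {"id": 1, "age": 34, "revenue": 1200.0, "orders": 12},
    {"id": 2, "age": None, "revenue": 300.0, "orders": 3},
    {"id": 3, "age": 52, "revenue": None, "orders": 0},
]


def mean_of(rows, column):
    """Mean of the non-null values in one column."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(values) / len(values)


def preprocess(rows):
    """Fill nulls, derive one new column, and keep only selected columns."""
    fill = {column: mean_of(rows, column) for column in ("age", "revenue")}
    prepared = []
    for r in rows:
        # Null handling (assumed strategy): use the column mean.
        age = r["age"] if r["age"] is not None else fill["age"]
        revenue = r["revenue"] if r["revenue"] is not None else fill["revenue"]
        # Feature generation: construct an additional column.
        revenue_per_order = revenue / r["orders"] if r["orders"] else 0.0
        # Feature selection: keep only the attributes judged relevant.
        prepared.append({"age": age, "revenue_per_order": revenue_per_order})
    return prepared


prepared = preprocess(rows)
```

Even this toy version needs decisions about fill values, derived columns, and which attributes to drop, which hints at why preprocessing takes up so much of the overall effort.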
The MiningMart Approach
MiningMart can help to reduce this time. The MiningMart
project aims at new techniques that give decision-makers direct
access to information stored in databases, data warehouses, and
knowledge bases. The main goal is to support users in making intelligent
choices by pursuing the following objectives:
- Operators for preprocessing with direct database access
- Use of machine learning for the preprocessing
- Detailed documentation of successful cases
- High quality discovery results
- Scalability to very large databases
- Techniques that automatically select or change representations
What is MiningMart's path to reaching the goal? Read
Examples of successfully applied Data Mining Cases with the MiningMart System
The MiningMart System was successfully applied in two telecommunications companies: the National Institute of Telecommunications in Warsaw, Poland, and the Telecom Italia Lab in Alessandria, Italy. The details of these cases are published in the internet case base that MiningMart provides (see the next section).
Case base of successful cases on the internet
One of the project's objectives is to set up a case base of successful
cases on the internet. The shared knowledge allows all internet users
to benefit from a new case. Submitting a new case of best practice is
a safe advertisement for KDD specialists or service providers, since the
relational data model is kept private. Only the conceptual model and
the case model are published. The case base can be found here.
A detailed description of the case base is available here.