Has Chains: DataMining, HandleCallsData, HandleCustomerData, HandleRevenueData, HandleServicesData, Select Client Ids For Mining
This case demonstrates a complex application based on churn prediction in telecommunications. It exemplifies common aggregation and pivotisation tasks. Detailed data about phone calls is aggregated and combined with other data to produce a single mining table. The actual mining step is not included; instead, the case demonstrates the combination of MiningMart with the external tool YALE, which offers a comprehensive range of learning methods.
The case is from the telecommunications domain. It is centered on a Call Detail Records (CDR) table, which stores information about every phone call made by each customer. Other tables provide additional data about the customers, such as contract-related information and the revenue generated by each customer, as computed by the accounts department of the company. The data is available for a two-year period.
The main business goal is churn prediction. Data on customers who have left the company in the past (positive examples for churn) is compared to data on current customers (negative examples for churn). The CDR data is aggregated into detailed fields showing how much time each customer spent calling internet providers, international numbers, freecall numbers, etc. This data is compared over a six-month period for each customer, deriving attributes that describe the amount and speed of change in these statistical indicators. The derived attributes, in combination with data from the other input tables, then allow a simple classification using decision trees.
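To make the idea of "amount and speed of change" concrete, here is a minimal sketch in Python. The case description does not say how these attributes are defined, so the definitions used here are assumptions: the amount of change is taken as the last month's value minus the first month's value, and the speed of change as the least-squares slope of the value over the month index. The function name `change_attributes` and the sample figures are hypothetical.

```python
def change_attributes(monthly_values):
    """Return (amount, speed) of change for one customer's monthly series.

    Assumed definitions (not taken from the case description):
    amount = value in the last month minus value in the first month,
    speed  = least-squares slope of the value against the month index.
    """
    n = len(monthly_values)
    if n < 2:
        raise ValueError("need at least two months to measure change")
    amount = monthly_values[-1] - monthly_values[0]
    # Month indices are 0..n-1, so their mean is (n - 1) / 2.
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_values) / n
    numerator = sum((x - mean_x) * (y - mean_y)
                    for x, y in enumerate(monthly_values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    speed = numerator / denominator
    return amount, speed


# Hypothetical six-month series: minutes spent calling internet providers.
internet_minutes = [120, 110, 90, 80, 50, 30]
amount, speed = change_attributes(internet_minutes)
```

With these assumed definitions, a customer whose usage falls steadily gets a negative amount and a negative speed, which a decision tree can then use as ordinary numeric attributes.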
This case is most interesting for its complex data preparation. Two issues play a central role: time and aggregation. There are two data sources that are indexed by time, but on different time scales. The more detailed source, the CDR table, is therefore aggregated to the monthly level, which is the level already used in the revenues table. Time-related attributes, such as the amount and speed of change in customer behaviour, are computed so that standard classification algorithms can be applied to a single mining table. Computing these attributes requires pivotisation of the monthly statistics, so that each month's value becomes a separate attribute.
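The two preparation steps described above, aggregation to the monthly level and pivotisation into one attribute per month, can be sketched in plain Python as follows. This is an illustration only, not the MiningMart operators themselves: the row layout, the category names, the attribute naming scheme (`internet_m1` and so on), and the function names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical CDR rows: (customer_id, call_date, category, duration_seconds).
cdr_rows = [
    (1, date(2003, 1, 5), "internet", 600),
    (1, date(2003, 1, 20), "internet", 300),
    (1, date(2003, 2, 3), "international", 120),
    (2, date(2003, 1, 9), "freecall", 60),
    (2, date(2003, 2, 14), "internet", 900),
]


def aggregate_monthly(rows):
    """Sum call duration per (customer, month, category)."""
    totals = defaultdict(int)
    for customer_id, call_date, category, duration in rows:
        month = (call_date.year, call_date.month)
        totals[(customer_id, month, category)] += duration
    return totals


def pivot_by_month(totals, months, categories):
    """Build one row per customer with one attribute per (category, month).

    Combinations without any calls are filled with 0.
    """
    customers = sorted({customer_id for customer_id, _, _ in totals})
    table = {}
    for customer_id in customers:
        row = {}
        for category in categories:
            for index, month in enumerate(months, start=1):
                key = (customer_id, month, category)
                row[f"{category}_m{index}"] = totals.get(key, 0)
        table[customer_id] = row
    return table


months = [(2003, 1), (2003, 2)]
categories = ["internet", "international", "freecall"]
monthly = aggregate_monthly(cdr_rows)
mining_table = pivot_by_month(monthly, months, categories)
```

After pivotisation each customer is described by a single row, so monthly revenue figures and contract data could be joined on the customer identifier to form the single mining table.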
Timm Euler, University of Dortmund, Computer Science VIII; Email: euler@ls8.cs.uni-dortmund.de; Web: www-ai.cs.uni-dortmund.de