main page

MiningMart Approach Part II


Mining Mart Approach



All the information about the conceptual descriptions and about the according database objects involved are represented within the M4 model and stored within relational tables. “M4-Cases” denote a collection of steps, basically performed sequentially, each of which changes or augments one or more concepts. Each step is related to exactly one operator, and holds all of its input arguments.


The M4 compiler reads the specifications of steps and starts the according executable operator, passing all the necessary inputs to it. Depending on the operator, the database might be altered. In any case the M4 meta-data will have to be updated by the compiler. A machine learning tool to replace missing values is an example for operators altering the database. In contrast, for operators like a join it is sufficient to virtually add the resulting view – together with corresponding SQL-statement – to the meta-data.

The task of a case designer, ideally a data mining expert, is to find sequences of steps resulting in a representation well suited for data mining. This work is supported by a special tool, the case editor. Usually a lot of different operators will be involved in a case of preprocessing steps. A list of available operators and their overall categories, e.g.Feature Construction, Clustering or Sampling is part of the conceptual case model M4. In every step the case designer chooses an applicable operator, sets all the parameters of the operator, assigns the input concepts, input attributes and/or input relations and gives some specifics about the output. Applicability conditions are considered in two ways. On one hand, constraints of operators can be checked on the basis of meta-data. These are, for instance, the presence or absence of NULL values. On the other hand, the conditions of operators can be checked on the basis of the business data when running the case. Applicability constraints and conditions support the case designer by checking the validity of a sequence of steps while it is created.

The sequence of many steps, namely a case, transforms the original database into another representation. Each step and their ordering is formalized within M4, so the system is automatically keeping track of the performed activities. This enables the user to interactively edit and replay the case or parts of it. Further more, as soon as an efficient chain of preprocessing has been found, it can easily be exported by just submitting the conceptual meta-data.

What operators are implemented in the Mining Mart System?