The MiningMart approach
The first component consists of operators that perform data
transformations such as discretization,
handling null values, aggregation of attributes into a new one, or collecting
sequences from time-stamped data. The operators directly access the database
and are capable of handling large volumes of data. Machine learning
is not restricted to a data mining step, but is also applicable in preprocessing.
This view opens up a variety of learning tasks that are less well investigated
than learning classifiers. For instance, an important task is to acquire
events and their duration (i.e. a time interval) on the basis of time
series (i.e. measurements at time points).
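To make the step from time points to time intervals concrete, here is a minimal sketch in Python. It is not the MiningMart operator: it assumes a fixed threshold in place of a learned model, and the names `Event`, `series_to_events`, and the labels "high" and "normal" are hypothetical. Consecutive measurements that receive the same label are merged into one event with a start and an end.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    label: str
    start: int  # time point of the first measurement in the interval
    end: int    # time point of the last measurement in the interval


def series_to_events(series: List[Tuple[int, float]], threshold: float) -> List[Event]:
    """Merge consecutive measurements with the same label into intervals.

    Assumption: a fixed threshold decides the label; a learned model
    could take its place.
    """
    events: List[Event] = []
    for t, value in series:
        label = "high" if value >= threshold else "normal"
        if events and events[-1].label == label:
            events[-1].end = t  # extend the current interval
        else:
            events.append(Event(label, t, t))  # open a new interval
    return events


# Made-up measurements as (time point, value) pairs.
series = [(0, 70.0), (1, 72.0), (2, 95.0), (3, 97.0), (4, 71.0)]
events = series_to_events(series, threshold=90.0)
for event in events:
    print(event)
```

With the made-up series above, the five measurements collapse into three events: a "normal" interval, a "high" interval, and a final "normal" interval of length one.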
The second component consists of successful cases of knowledge discovery. Since most of the time is spent finding chains of operator applications that lead to good answers to complex questions, it is cumbersome to develop such chains over and over again for very similar discovery tasks and data. Currently, even the same task on data of the same format is implemented anew every time new data are to be analysed. Re-using successful cases would therefore speed up the process considerably. For this reason, cases of successful preprocessing are stored for re-use.
Meta-data of cases can be adapted to similar cases. A library of best-practice cases in the form of their meta-data is currently being collected. MiningMart presents cases from areas ranging from on-line monitoring in intensive care to direct mailing actions.
The particular approach of the MiningMart project is to allow the re-use of cases by means of meta-data, also called ontologies. Meta-data describe the data as well as the operator chains. A compiler generates the SQL code according to the meta-data.
The MiningMart project has developed a model for meta-data together with its compiler and implements human-computer interfaces that allow database managers and case designers to fill in their application-specific meta-data. The system will support preprocessing and can be used stand-alone or in combination with a toolbox for the data mining step.
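The idea of a compiler that turns meta-data into SQL can be sketched as follows. This is an illustration under stated assumptions, not the MiningMart meta-data model or compiler: the dictionary layout, the operator names `replace_null` and `discretize`, and the function `compile_case` are all hypothetical, and SQLite stands in for the database. Each step of the operator chain becomes a view on top of the previous one, so the data stay in the database.

```python
import sqlite3

# Hypothetical meta-data for one case: the data and a chain of two operators.
case = {
    "input_table": "customers",
    "output_view": "customers_prepared",
    "steps": [
        {"operator": "replace_null", "column": "income",
         "default": 0, "target": "income_filled"},
        {"operator": "discretize", "column": "income_filled",
         "cutpoint": 30000, "low": "low", "high": "high",
         "target": "income_class"},
    ],
}


def literal(value):
    """Render a Python value as an SQL literal (numbers and strings only)."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def compile_case(case):
    """Translate the meta-data into one CREATE VIEW statement per step.

    Assumption: table and column names in the meta-data are trusted,
    since they are inserted into the SQL text directly.
    """
    statements = []
    source = case["input_table"]
    last = len(case["steps"])
    for i, step in enumerate(case["steps"], start=1):
        col = step["column"]
        if step["operator"] == "replace_null":
            expr = f"COALESCE({col}, {literal(step['default'])})"
        elif step["operator"] == "discretize":
            expr = (f"CASE WHEN {col} < {literal(step['cutpoint'])} "
                    f"THEN {literal(step['low'])} "
                    f"ELSE {literal(step['high'])} END")
        else:
            raise ValueError(f"unknown operator: {step['operator']}")
        # The last step produces the output view; earlier steps get helper names.
        view = case["output_view"] if i == last else f"{case['output_view']}_step{i}"
        statements.append(
            f"CREATE VIEW {view} AS SELECT *, {expr} AS {step['target']} FROM {source}"
        )
        source = view  # the next operator reads the output of this one
    return statements


# Made-up example data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, income REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, 50000.0), (2, None), (3, 12000.0)])
for statement in compile_case(case):
    conn.execute(statement)
rows = conn.execute(
    "SELECT id, income_class FROM customers_prepared ORDER BY id"
).fetchall()
print(rows)
```

Because the chain is described only by the `case` dictionary, adapting it to a similar task means editing the meta-data (for example the table name or the cutpoint) and compiling again, rather than rewriting the SQL by hand.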