Invited Presentation:
Data Mining: The Quest Perspective
Rakesh Agrawal
IBM Almaden Research Center,
San Jose, CA 95120, U.S.A.
ragrawal@almaden.ibm.com
Abstract
Data mining is the efficient discovery of
previously unknown patterns in large databases,
and is emerging as a major application area for databases.
The Quest project on data mining at the IBM Almaden
Research Center has developed technologies to discover
useful patterns in huge amounts of data in a short amount of time.
This software can be used to solve the following customer problems:
- Associations:
Given a database of sales transactions, find what sells together.
Discover all associations such that the presence of
one set of items in a transaction implies the presence of other items.
``If a customer buys salmon and mussels then the customer
buys white wine too in 90% of the cases.''
- Sequential Patterns:
Given a database of sales transactions, find which items customers buy
in sequence over a series of visits.
``A customer orders sheets and pillowcases, followed by
a comforter, followed by drapes in 70% of the cases.''
- Classification:
Given examples of people belonging to different groups, develop a
profile for each group. This profile is then used to retrieve
instances of these groups from a different population.
``Buyers of expensive sports cars are typically young urban professionals
with a high income whereas luxury sedans are preferred by elderly
wealthy persons.''
- Time Series Clustering:
Given a database of time sequences, find sequences similar to a given one,
or find all pairs of similar sequences. Typical uses of this software
include finding stocks with similar price movements,
products with similar sales patterns, or
stores or departments with similar revenue streams.
``If sales of coke go up due to a promotion, sales of salted snacks
also go up.''
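
As a rough illustration of two of the problems above, the sketch below computes the percentage attached to an association ("in 90% of the cases") and finds the sequence most similar to a given one. It is not the Quest implementation: the function names, the toy data, and the use of Euclidean distance as the similarity measure are all assumptions made for this example.

```python
from math import sqrt


def rule_confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero rather than divide by zero
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)


def most_similar(query, candidates):
    """Name of the candidate sequence closest to `query`.

    Assumption: similarity is plain Euclidean distance between
    equal-length sequences; the abstract does not name a measure.
    """
    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(candidates, key=lambda name: distance(query, candidates[name]))


# Hypothetical toy data, for illustration only.
transactions = [
    {"salmon", "mussels", "white wine"},
    {"salmon", "mussels", "white wine", "bread"},
    {"salmon", "mussels"},
    {"bread", "white wine"},
]
confidence = rule_confidence(transactions, {"salmon", "mussels"}, {"white wine"})

prices = {"stock_a": [10, 11, 12, 13], "stock_b": [10, 9, 8, 7]}
closest = most_similar([10, 11, 12, 14], prices)
```

Both functions scan the data naively, so they only convey the definitions; discovering all such rules or all similar pairs in a large database is the efficiency problem the abstract refers to.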
The Quest software has been tested on several customer datasets,
and a number of customers are actively developing applications of Quest
in retail, finance, and other industries.
The software currently runs on RS/6000 workstations under AIX, against flat files
and DB2/CS and DB2/MVS databases. Parallel algorithms for finding association
rules also run on the IBM Power Parallel System.
The software has been designed to be easily portable to other platforms
and to run against multiple data repositories.
In this talk, I will draw upon my Quest experience
to present my perspective of data mining, describe current work,
and present some open problems.