Invited Presentation:
Data Mining: The Quest Perspective

Rakesh Agrawal
IBM Almaden Research Center, San Jose, CA 95120, U.S.A.
ragrawal@almaden.ibm.com

Abstract

Data mining is the efficient discovery of previously unknown patterns in large databases, and it is emerging as a major application area for databases. The Quest data mining project at the IBM Almaden Research Center has developed technologies that quickly discover useful patterns in very large amounts of data. The resulting software can be used to solve the following customer problems:

Associations:
Given a database of sales transactions, find what sells together. Discover all associations such that the presence of one set of items in a transaction implies the presence of another set of items.
"If a customer buys salmon and mussels, then the customer also buys white wine in 90% of the cases."
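A minimal sketch of how such rules can be found, assuming a level-wise (Apriori-style) search. The abstract does not say which algorithm Quest uses, and the function names and toy baskets below are hypothetical. The support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X → Y is support(X ∪ Y) / support(X), which is what the "90% of the cases" figure expresses.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support (fraction of
    transactions containing it) is at least min_support, found level by level."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent, k = {}, 1
    while candidates:
        # Count how many baskets contain each size-k candidate.
        level = {}
        for c in candidates:
            s = sum(1 for b in baskets if c <= b) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets, then prune any candidate
        # with an infrequent k-subset (every subset of a frequent set is frequent).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Yield (lhs, rhs, confidence), where confidence = support(lhs | rhs) / support(lhs)."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    yield set(lhs), set(itemset - lhs), conf

# Hypothetical toy data, for illustration only.
toy_transactions = [
    {"salmon", "mussels", "white wine"},
    {"salmon", "mussels", "white wine", "bread"},
    {"salmon", "bread"},
    {"mussels", "white wine"},
]
for lhs, rhs, conf in association_rules(frequent_itemsets(toy_transactions, 0.5), 0.9):
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.0%}")
```

Counting candidates by scanning every basket keeps the sketch short; on a large database, this counting pass is the expensive step.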
Sequential Patterns:
Given a database of sales transactions, find which items customers buy in sequence over a series of visits.
"A customer orders sheets and pillowcases, followed by a comforter, followed by drapes in 70% of the cases."
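A sketch of the containment test behind such patterns, assuming each customer's history is an ordered list of visits and each visit is a set of items. It only measures the support of one given pattern (the fraction of customers whose history contains it). Enumerating candidate patterns, as a full miner would, is left out, and the names and data are hypothetical.

```python
def contains(history, pattern):
    """True if each itemset in `pattern` is contained in some transaction of
    `history` (oldest first), with those transactions strictly increasing in time.
    Greedily taking the earliest matching transaction never misses a match."""
    pos = 0
    for element in pattern:
        while pos < len(history) and not element <= history[pos]:
            pos += 1
        if pos == len(history):
            return False
        pos += 1  # the next element must come from a later visit
    return True

def sequence_support(customers, pattern):
    """Fraction of customers whose purchase history contains `pattern`."""
    return sum(contains(h, pattern) for h in customers.values()) / len(customers)

# Hypothetical purchase histories: customer -> list of visits (sets of items).
customers = {
    "c1": [{"sheets", "pillowcases"}, {"comforter"}, {"drapes"}],
    "c2": [{"sheets", "pillowcases", "towels"}, {"comforter", "lamp"}, {"rug"}, {"drapes"}],
    "c3": [{"sheets"}, {"pillowcases"}, {"comforter"}],
    "c4": [{"comforter"}, {"sheets", "pillowcases"}, {"drapes"}],
}
pattern = [{"sheets", "pillowcases"}, {"comforter"}, {"drapes"}]
print(sequence_support(customers, pattern))  # prints 0.5
```

Customer c3 does not count: sheets and pillowcases were bought on different visits, so the first element of the pattern never appears within a single transaction.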
Classification:
Given examples of people belonging to different groups, develop a profile for each group. These profiles are then used to identify members of these groups in a different population.
"Buyers of expensive sports cars are typically young urban professionals with high incomes, whereas luxury sedans are preferred by wealthy elderly people."
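The abstract does not specify how the profiles are built. As a stand-in, the sketch below uses a nearest-centroid classifier: each group's profile is the mean of its members' standardized attributes, and a new person is assigned to the group with the nearest profile. The attributes, group labels, and data are hypothetical, and only numeric attributes are handled.

```python
from math import dist
from statistics import mean, pstdev

def standardize(x, scaling):
    """Map a feature dict to a vector of z-scores, in a fixed attribute order."""
    return [(x[a] - mu) / sd for a, (mu, sd) in sorted(scaling.items())]

def build_profiles(examples):
    """examples: list of (features, group) pairs. Returns the per-attribute
    (mean, std) scaling and, for each group, the centroid of its members'
    standardized feature vectors (the group's "profile")."""
    scaling = {}
    for a in examples[0][0]:
        values = [x[a] for x, _ in examples]
        scaling[a] = (mean(values), pstdev(values) or 1.0)  # guard constant attributes
    members = {}
    for x, group in examples:
        members.setdefault(group, []).append(standardize(x, scaling))
    profiles = {g: [mean(col) for col in zip(*vecs)] for g, vecs in members.items()}
    return scaling, profiles

def classify(x, scaling, profiles):
    """Assign x to the group whose profile is nearest in Euclidean distance."""
    z = standardize(x, scaling)
    return min(profiles, key=lambda g: dist(z, profiles[g]))

# Hypothetical training examples (age in years, income in $1000s).
examples = [
    ({"age": 28, "income": 150}, "sports car"),
    ({"age": 32, "income": 170}, "sports car"),
    ({"age": 68, "income": 160}, "luxury sedan"),
    ({"age": 72, "income": 180}, "luxury sedan"),
]
scaling, profiles = build_profiles(examples)
print(classify({"age": 30, "income": 160}, scaling, profiles))  # prints sports car
```

Standardizing first keeps income and age on comparable scales; without it, whichever attribute has the larger numeric range would dominate the distance.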
Time Series Clustering:
Given a database of time sequences, find sequences similar to a given one, or find all pairs of similar sequences. Typical uses of this software include finding stocks with similar price movements, products with similar sales patterns, or stores or departments with similar revenue streams.
"If sales of coke go up due to a promotion, sales of salted snacks also go up."
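One simple way to answer both kinds of query, assuming equal-length sequences compared by shape: normalize each sequence to zero mean and unit standard deviation, then call two sequences similar when their Euclidean distance is at most a threshold `eps`. This is a brute-force sketch with hypothetical names and data, not the Quest method; it compares every pair, so its cost grows quadratically with the number of sequences.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev

def normalize(seq):
    """Shift to zero mean and scale to unit standard deviation, so sequences
    are compared by shape rather than by absolute level or magnitude."""
    mu, sd = mean(seq), pstdev(seq)
    return [(v - mu) / sd if sd else 0.0 for v in seq]

def similar_to(series, query, eps):
    """Sorted names of series within distance eps of the normalized query."""
    q = normalize(query)
    return sorted(name for name, s in series.items() if dist(normalize(s), q) <= eps)

def similar_pairs(series, eps):
    """All pairs of series within distance eps of each other (brute force)."""
    norm = {name: normalize(s) for name, s in series.items()}
    return [(a, b) for a, b in combinations(sorted(norm), 2)
            if dist(norm[a], norm[b]) <= eps]

# Hypothetical weekly sales figures, all of the same length.
weekly_sales = {
    "coke": [100, 180, 120, 110],
    "salted snacks": [50, 90, 60, 55],
    "detergent": [80, 78, 82, 81],
}
print(similar_pairs(weekly_sales, 0.5))  # prints [('coke', 'salted snacks')]
```

Because of the normalization, "salted snacks" matches "coke" even though its sales are only half as large.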

The Quest software has been tested on several customer datasets, and a number of customers are actively developing applications of Quest in retail, finance, and other industries. The software currently runs on RS/6000 workstations under AIX and works with data in flat files and in DB2/CS and DB2/MVS databases. Parallel algorithms for finding association rules also run on the IBM Power Parallel System. The software has been designed to be easily portable to other platforms and to run on multiple data repositories. In this talk, I will draw on my Quest experience to present my perspective on data mining, describe current work, and present some open problems.