This year, for the first time, the KDD-97 organizers are holding a Knowledge Discovery and Data Mining (KDDM) tools competition (KDD-CUP-97) in conjunction with the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97).

The Cup is open to all KDDM tool vendors, academics and corporations with significant applications. All products, applications, research prototypes and black-box solutions are welcome. If requested, the anonymity of the participants and their affiliated companies/institutions will be preserved. Our aim is not to rank the participants but to recognize the most innovative, efficient and methodologically advanced KDDM tools.

Attendance at the KDD-97 conference is not required to participate in the Cup. Participants are required to demonstrate the performance of their KDDM tool in the area of supervised learning (classification or discrimination). In the interest of time, the regression (or prediction) category, the clustering (or segmentation) category, and other descriptive modeling techniques, such as association rules, are not included in the competition this year.

The winners will be determined based on a weighted combination of classification accuracy (or predictive power), software novelty (or innovation), efficiency (people and CPU time), and the data mining methodology employed. The top three performing tools will be awarded Gold, Silver and Bronze Miner awards and will be listed on the KD Nuggets web site until the beginning of the KDD-98 conference, unless the participants and their affiliated companies/institutions wish to remain anonymous.


June 19, 1997: Data set release date
July 27, 1997: Participants turn in their results
August 11, 1997: Individual performance evaluations sent to the participants
August 14, 1997: Public announcement of the top three performing tools during the KDD-97 conference


Vasant Dhar (New York University, NY, USA)
Ronen Feldman (Bar-Ilan University, Ramat-Gan, Israel)
Ismail Parsa (Epsilon Data Management, Burlington, MA, USA)
Gregory Piatetsky-Shapiro (Knowledge Stream Partners/Geneve Consulting Group, Cambridge, MA, USA)


The primary evaluation criterion in the classification category will be the predictive power (classification accuracy) of the resulting model, measured in terms of lift, i.e., improvement over random or no prediction. The winner, however, will be selected based on a weighted combination of predictive power and all of the following:

A. Software Novelty/Innovation, e.g., unified approach to analyses through the implementation of analytic metadata, integration of data mining with data visualization, integration with other systems in novel ways, user interaction, built-in intelligence, etc.

B. Efficiency, i.e., people and CPU time

C. KDD Methodology, including but not limited to:

1. Data Archaeology, including but not limited to:
a. Data Hygiene (quality-control and cleaning)
Identify and eliminate noise
b. Preprocessing
Identify and eliminate constants
Identify and treat missing values
Identify (and treat) outliers
Identify (and treat) non-linearity
Identify (and treat) non-normality
Create derived features based on string-to-numeric conversions
Create derived features based on dates
Create derived features based on time series smoothing
Discretize or bin continuous features
Discretize or bin nominal features based on a criterion
Create derived features based on feature interactions
Create derived features based on transformations
Identify feature measurement scales: nominal, continuous, etc.

2. Exploratory Data Analysis (EDA), including but not limited to:
Collinearity screening (elimination of redundant features)
Feature dimensionality reduction
Feature subset selection
Data visualization

3. Model Development and Implementation, including but not limited to:
Application of data mining algorithm(s)
Evaluation of alternative algorithms, modeling technologies
Validation of results (to avoid over-fitting)
Interpretability of extracted patterns
Data visualization
Return on investment (ROI) or back-end analysis
Application of learned knowledge to the universe, i.e., scoring.
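
As an illustration of the lift measure named above as the primary criterion, the following is a minimal sketch, not the official scoring procedure. It assumes that lift at a given depth is the response rate among the top-scored fraction of records divided by the overall response rate; the competition's exact definition is not stated here. The function name `lift_at_depth` and its parameters are hypothetical.

```python
from typing import Sequence


def lift_at_depth(y_true: Sequence[int], scores: Sequence[float], depth: float) -> float:
    """Lift of a scored file at a given depth (assumed definition).

    y_true : 0/1 class labels, 1 = target class
    scores : model scores, higher = more likely to be the target class
    depth  : fraction of the file to select, in (0, 1]
    """
    if not 0.0 < depth <= 1.0:
        raise ValueError("depth must be in (0, 1]")
    n = len(y_true)
    if n == 0 or n != len(scores):
        raise ValueError("y_true and scores must be non-empty and equal in length")

    overall_rate = sum(y_true) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no target-class records")

    # Rank records from highest to lowest score and keep the top fraction.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(n * depth))
    top_rate = sum(label for _, label in ranked[:k]) / k

    # A lift of 1.0 means no improvement over random selection.
    return top_rate / overall_rate


if __name__ == "__main__":
    # Made-up labels and scores, for illustration only.
    labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print(lift_at_depth(labels, model_scores, depth=0.2))
```

Under this definition, selecting the whole file (depth 1.0) always gives a lift of 1.0, which matches the idea of "improvement over random or no prediction".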


To register for KDD-CUP-97, please fill out the KDD Cup registration form and email or fax it to:

Ismail Parsa
Epsilon Data Management
50 Cambridge Street
Burlington MA 01803 USA
E-mail: iparsa@epsilon.com
Phone: (617) 273-0250*6734
Fax: (617) 272-8604


Detailed information regarding the rules of the competition will be sent to the participants later.
