This year, for the first time, the KDD-97 Organization is holding a
Knowledge Discovery and Data Mining (KDDM) tools competition
(KDD-CUP-97) in conjunction with the 3rd International Conference on
Knowledge Discovery and Data Mining (KDD-97).

The Cup is open to all KDDM tool vendors, academics and corporations
with significant applications. All products, applications, research
prototypes and black-box solutions are welcome. If requested, the
anonymity of the participants and their affiliated
companies/institutions will be preserved. Our aim is not to rank the
participants but to recognize the most innovative, efficient and
methodologically advanced KDDM tools.

Attendance at the KDD-97 conference is not required to participate in
the Cup. Participants are required to demonstrate the performance of
their KDDM tool in the area of supervised learning (classification or
discrimination). In the interest of time, the regression (or
prediction) category, the clustering (or segmentation) category and
other descriptive modeling techniques, such as association rules, are
not included in the competition this year.

The winners will be determined based on a weighted combination of
classification accuracy (or predictive power), software novelty (or
innovation), efficiency (people and CPU time) and the data mining
methodology employed. The top three performing tools will be awarded
Gold, Silver and Bronze Miner awards and will be listed on the
KD Nuggets web site until the beginning of the KDD-98 conference,
unless the participants and their affiliated companies/institutions
wish to remain anonymous.


June 19, 1997: Data set release date
July 27, 1997: Participants turn in their results
August 11, 1997: Individual performance evaluations sent to the participants
August 14, 1997: Public announcement of the top three performing tools during the KDD-97 conference


Vasant Dhar (New York University, NY, USA)
Ronen Feldman (Bar-Ilan University, Ramat-Gan, Israel)
Ismail Parsa (Epsilon Data Management, Burlington, MA, USA)
Gregory Piatetsky-Shapiro (Knowledge Stream Partners/Geneve Consulting Group, Cambridge, MA, USA)


The primary evaluation criterion in the classification category will
be the predictive power, i.e., the classification accuracy, of the
resulting model, measured in terms of lift (the term 'lift' implies
improvement over random or no prediction). The winner, however, will
be selected based on a weighted combination of predictive power and
all of the following:
A. Software Novelty/Innovation, e.g., unified approach to analyses
   through the implementation of analytic metadata, integration of
   data mining with data visualization, integration with other systems
   in novel ways, user interaction, built-in intelligence, etc.
B. Efficiency, i.e., people and CPU time
C. KDD Methodology, including but not limited to:
   1. Data Archaeology, including but not limited to:
      a. Data Hygiene (quality-control and cleaning)
         Identify and eliminate noise
      b. Preprocessing
         Identify and eliminate constants
         Identify and treat missing values
         Identify (and treat) outliers
         Identify (and treat) non-linearity
         Identify (and treat) non-normality
         Create derived features based on string-to-numeric conversions
         Create derived features based on dates
         Create derived features based on time series smoothing
         Discretize or bin continuous features
         Discretize or bin nominal features based on a criterion
         Create derived features based on feature interactions
         Create derived features based on transformations
         Identify feature measurement scales: nominal, continuous, etc.
   2. Exploratory Data Analysis (EDA), including but not limited to:
         Collinearity screening (elimination of redundant features)
         Feature dimensionality reduction
         Feature subset selection
         Data visualization
   3. Model Development and Implementation, including but not limited to:
         Application of data mining algorithm(s)
         Evaluation of alternative algorithms, modeling technologies
         Validation of results (to avoid over-fitting)
         Interpretability of extracted patterns
         Data visualization
         Return on investment (ROI) or back-end analysis
         Application of learned knowledge to the universe, i.e., scoring.
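
This announcement does not say exactly how lift will be computed. As an
illustration only, the sketch below uses one common convention: rank the
records by model score, take the top fraction, and divide the response
rate in that group by the response rate in the whole data set. The
function name lift_at_fraction, its parameters and the sample data are
hypothetical and are not part of the competition rules.

```python
from typing import Sequence


def lift_at_fraction(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float = 0.1,
) -> float:
    """Lift of the top `fraction` of records (ranked by score) over random selection.

    Assumed convention, not taken from the competition rules:
    lift = (response rate in the top group) / (overall response rate).
    Labels are 1 for the target class and 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    n = len(labels)
    total_positives = sum(labels)
    if n == 0 or total_positives == 0:
        raise ValueError("need at least one record and one positive label")

    # Size of the top group; always keep at least one record.
    k = max(1, round(n * fraction))

    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_positives = sum(label for _, label in ranked[:k])

    top_rate = top_positives / k
    base_rate = total_positives / n
    return top_rate / base_rate


# Made-up scores and labels, for illustration only.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
example_lift = lift_at_fraction(example_scores, example_labels, fraction=0.2)
```

Under this convention a lift of 1.0 means the model does no better than
random selection, and selecting the entire data set (fraction=1.0)
always yields a lift of exactly 1.0.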


To register for KDD-CUP-97, please fill out the KDD Cup
registration form, and email or fax it to:
Ismail Parsa
Epsilon Data Management
50 Cambridge Street
Burlington MA 01803 USA
E-mail: iparsa@epsilon.com
Phone: (617) 273-0250*6734
Fax: (617) 272-8604


Detailed information regarding the rules of the competition will be
sent to the participants later.