KDD Cup  

KDD Cup 1998: Datasets

Abstract

This is the data set used for the Second International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-98, the Fourth International Conference on Knowledge Discovery and Data Mining. The competition task is a regression problem: estimate the return from a direct mailing in order to maximize donation profits.
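To make the objective concrete, here is a minimal sketch of how a profit figure could be computed from a model's predictions. It assumes a simple decision rule (mail only when the predicted donation exceeds the per-piece mailing cost) and sums the actual donation minus that cost over everyone mailed. The function name `mailing_profit` and the `cost` parameter are hypothetical and not part of the competition materials; the official evaluation rule and cost figure should be taken from the competition documentation.

```python
from typing import Sequence


def mailing_profit(
    predicted: Sequence[float],
    actual: Sequence[float],
    cost: float,
) -> float:
    """Return total profit when mailing only to records whose predicted
    donation exceeds the per-piece mailing cost.

    `cost` is a caller-supplied assumption; this page does not state it.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # For each mailed record, profit is the actual donation minus the cost.
    return sum(a - cost for p, a in zip(predicted, actual) if p > cost)


# Toy usage with made-up numbers (not competition data):
# records 1 and 3 are mailed, record 2 is skipped.
toy_profit = mailing_profit([2.0, 0.5, 3.0], [10.0, 5.0, 0.0], cost=1.0)
```

Under this rule a regression model is judged not by its squared error but by the donations it captures net of mailing costs, which is why a non-donor who gets mailed contributes a negative term.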

Usage Notes

The KDD-CUP-98 data set and the accompanying documentation are now available for general use with the following restrictions:
  1. If you intend to use this data set for training or educational purposes, you must not reveal the name of the sponsor PVA (Paralyzed Veterans of America) to the trainees or students. You are allowed to say "a national veterans organization"...

Information files
  • readme. This list of the files on the FTP server and their contents.
  • instruct.txt. General instructions for the competition.
  • cup98doc.txt. This file, an overview and pointer to more detailed information about the competition.
  • cup98dic.txt. Data dictionary to accompany the analysis data set.
  • cup98que.txt. KDD-CUP questionnaire. PARTICIPANTS ARE REQUIRED TO FILL OUT THE QUESTIONNAIRE and turn it in with the results.
  • valtargt.readme. Describes the valtargt.txt file.

Data files
  • cup98lrn.zip PKZIP compressed raw LEARNING data set. (36.5M; 117.2M uncompressed)
  • cup98val.zip PKZIP compressed raw VALIDATION data set. (36.8M; 117.9M uncompressed)
  • cup98lrn.txt.Z UNIX COMPRESSed raw LEARNING data set. (36.6M; 117.2M uncompressed)
  • cup98val.txt.Z UNIX COMPRESSed raw VALIDATION data set. (36.9M; 117.9M uncompressed)
  • valtargt.txt. This file contains the target fields that were left out of the validation data set that was sent to the KDD CUP 98 participants. (1.1M)
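
Because the uncompressed files are over 100 MB each, it can be convenient to stream records straight out of a PKZIP archive rather than unpacking it first. The sketch below uses only the Python standard library. It assumes the archive member is delimited text with a header row, that the delimiter is a comma, and that Latin-1 decoding is acceptable; none of these is stated on this page, so check cup98dic.txt and cup98doc.txt before relying on them. The function name `iter_records` is hypothetical.

```python
import csv
import io
import zipfile
from typing import Dict, Iterator, Optional


def iter_records(
    zip_source,
    member: Optional[str] = None,
    delimiter: str = ",",
) -> Iterator[Dict[str, str]]:
    """Yield one dict per row from a delimited text file inside a zip archive.

    `zip_source` may be a path or a binary file-like object.
    If `member` is None, the first entry in the archive is read (an assumption).
    """
    with zipfile.ZipFile(zip_source) as archive:
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Decode the byte stream lazily so the file is never fully in memory.
            text = io.TextIOWrapper(raw, encoding="latin-1", newline="")
            yield from csv.DictReader(text, delimiter=delimiter)


# Toy usage on an in-memory archive with made-up columns (not the real schema):
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _zf:
    _zf.writestr("toy.txt", "A,B\n1,2\n3,4\n")
_buf.seek(0)
toy_rows = list(iter_records(_buf))
```

With the real data, a call such as `iter_records("cup98lrn.zip")` would be the intended use, provided the format assumptions above hold. The `.Z` variants use UNIX compress, which the `zipfile` module does not read.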