Keynotes


K1: Mass Collaboration and Data Mining

Raghu Ramakrishnan
CTO and Founder, QUIQ; Professor, University of Wisconsin, Madison.

Session Chair: Ramakrishnan Srikant, IBM Almaden Research Center

Mass Collaboration is a new "P2P"-style approach to large-scale knowledge sharing, with applications in customer support, focused community development, and capturing knowledge distributed within large organizations. Effectively supporting this paradigm raises many technical challenges and offers intriguing opportunities for mining the massive amounts of data captured continually from user interactions. Data mining offers the promise of increased business intelligence and improved user experiences, leading to increased participation and higher quality in the knowledge that is captured, both of which are central objectives in Mass Collaboration. In this talk, I will introduce Mass Collaboration and discuss some important issues related to data mining.

K2: Extracting Targeted Data from the Web

Tom Mitchell
WhizBang! Labs and Carnegie Mellon University.

Session Chair: Foster Provost, New York University

Many knowledge discovery systems begin with a relational database containing relevant historical data. Increasingly, one finds relevant data trapped in unstructured text. This talk considers the problem of automatically extracting relational databases of factual information from text.

At WhizBang! Labs, we have developed a collection of machine learning algorithms that can be trained to extract targeted information from large volumes of unstructured text. For example, these have been trained to collect descriptions of job postings (including job title, location, etc.) from the web, resulting in the world's largest database of job postings (see www.flipdog.com). They have also been trained to extract continuing education course information from university web sites, company descriptions from corporate web sites, and biographical information from the web. This talk will survey several of the machine learning algorithms used and summarize key lessons learned.

K3: Challenges for Knowledge Discovery in Biology

Russ Altman
Associate Professor of Medicine, Stanford University.

Session Chair: David Page, University of Wisconsin at Madison

Bioinformatics is the study of information flow in biology. Interest in the field has exploded in the last 10 years with the emergence of techniques for large-scale experimental data collection, including genome sequencing, gene expression analysis, protein interaction detection, high-throughput structure determination, and others. These techniques, in the context of a large online published literature, have created relatively large data sets (by biological standards) that cannot be analyzed manually. There is therefore a critical need for methods to analyze these data and reduce them to new knowledge. The principal challenges to the field include the great diversity of data types and of questions asked of the data, and the communication difficulties that can exist between experts in biology and experts in machine learning. In this talk, I will provide an introduction to the major biological questions that are being addressed, why they are important, and how the field is trying to address them with technical approaches.