KDD95 Description of Demos
Demo Session Chair: Tej Anand, AT&T Global Information Solutions
6:00-8:00 PM, Sunday, August 20th
Room 407A, Palais des Congrès
Demo Presentation List:
Ronen Feldman
Hing-Yan Lee, Hwee-Leng Ong and Lee-Hian Quek
Christopher Matheus and Gregory Piatetsky-Shapiro
Arun Sanjeev and Jan Zytkow
Dr. Jiawei Han and Yongjian Fu
Richard Scheines, Peter Spirtes, Clark Glymour, and Christopher Meek
Knowledge Discovery in Textual Databases
Ronen Feldman,
Bar-Ilan University
Most KDD systems only handle structured databases. However, much
online information is in the form of unstructured text. KDT is a
system for browsing and analyzing collections of unstructured
texts. Each document in the collection is annotated by a set of
keywords organized in a hierarchical structure. KDT enables the user
to browse the textual database by selecting keywords from the
hierarchy and viewing their distributions against other classes of
keywords. KDT also enables the user to compare distributions of
similar keywords and view the results using tables and graphs.
Finally, as in traditional KDD systems, KDT searches for irregular
distributions, correlations, and associations based on conditions and
thresholds supplied by the user. KDT includes a browsing facility in
which the user can click on any discovered pattern and get the list of
all documents that contributed to this pattern. KDT is implemented on
MS-Windows and was designed with a special emphasis on efficiency and
ease of use.
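The keyword-distribution browsing described above can be illustrated with a small sketch. It is not KDT's implementation: the document records, the flat `category` set standing in for one class of the keyword hierarchy, and the rule of flagging a keyword when its share among documents carrying the selected keyword differs from its overall share by more than a user threshold are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical document collection: each document carries a set of keywords.
# A "category" is one class of keywords taken from the (assumed) hierarchy.

def distribution(docs, selected, category):
    """Share of each keyword in `category` among documents annotated with
    `selected` (all documents when `selected` is None)."""
    subset = [d for d in docs if selected is None or selected in d["keywords"]]
    counts = Counter(k for d in subset for k in d["keywords"] if k in category)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}

def irregular(docs, selected, category, threshold):
    """Keywords whose conditional share deviates from their overall share
    by more than the user-supplied threshold (signed difference returned)."""
    cond = distribution(docs, selected, category)
    base = distribution(docs, None, category)
    diffs = {k: cond.get(k, 0.0) - base[k] for k in base}
    return {k: d for k, d in diffs.items() if abs(d) > threshold}

def contributing_docs(docs, selected, keyword):
    """IDs of the documents behind one cell of a discovered pattern."""
    return [d["id"] for d in docs
            if selected in d["keywords"] and keyword in d["keywords"]]

docs = [
    {"id": 1, "keywords": {"iran", "oil"}},
    {"id": 2, "keywords": {"iran", "oil"}},
    {"id": 3, "keywords": {"japan", "cars"}},
    {"id": 4, "keywords": {"japan", "oil"}},
]
topics = {"oil", "cars"}
print(irregular(docs, "iran", topics, threshold=0.2))  # {'oil': 0.25, 'cars': -0.25}
print(contributing_docs(docs, "iran", "oil"))          # [1, 2]
```

Comparing each conditional distribution against the collection-wide one is only one possible notion of "irregular"; a real system could equally compare against a sibling keyword's distribution, as in KDT's comparison of similar keywords.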
WinViz and Machine Learning: An Integrated Approach to Data Mining
Hing-Yan Lee, Hwee-Leng Ong and Lee-Hian Quek,
Information Technology Institute
Knowledge Discovery in Databases encompasses many technologies, such as
visualization and machine learning. At ITI, we have developed WinViz, a
tool that uses a multidimensional visualization (MDV) technique to
discover patterns and trends in multidimensional data. However, we find
that a synergistic combination of WinViz with machine learning provides
even greater leverage for KDD. To this end, we have developed a
prototype that seamlessly integrates the two technologies.
The use of WinViz for KDD is motivated by the adage that a picture is
worth a thousand words. WinViz is a visual data analysis tool that
presents a global view of the data in a single picture. It also has an
interactive visual query interface that allows one to formulate
hypotheses and drill down through the data to discover hidden patterns
and trends. Using WinViz, we can quickly discover relationships between
different attributes in a dataset.
We have integrated WinViz with the popular machine learning algorithm
C4.5. The if-then rules generated by C4.5 can be visualized in WinViz
to spot potential exceptions to the rules.
The integration combines the interactivity and visual representation
of WinViz with the generalization capability of C4.5.
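One way to make "exceptions to a rule" concrete is to list the records that satisfy a rule's conditions but carry a different class. The sketch below is a hypothetical, non-graphical stand-in for what WinViz displays visually: the `Rule` class, its interval-style conditions, and the example attributes are assumptions, not C4.5's actual output format.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical if-then rule: IF every attribute lies in its inclusive
    (low, high) interval THEN the record belongs to `predicted`."""
    conditions: dict
    predicted: str

    def covers(self, record):
        return all(lo <= record[attr] <= hi
                   for attr, (lo, hi) in self.conditions.items())

def exceptions(rule, records, class_attr="class"):
    """Records covered by the rule whose actual class disagrees with it."""
    return [r for r in records
            if rule.covers(r) and r[class_attr] != rule.predicted]

def confidence(rule, records, class_attr="class"):
    """Fraction of covered records the rule classifies correctly (None if it covers none)."""
    covered = [r for r in records if rule.covers(r)]
    if not covered:
        return None
    hits = sum(r[class_attr] == rule.predicted for r in covered)
    return hits / len(covered)

rule = Rule(conditions={"age": (30, 50), "income": (40, 100)}, predicted="buy")
records = [
    {"age": 35, "income": 60, "class": "buy"},
    {"age": 42, "income": 80, "class": "buy"},
    {"age": 45, "income": 55, "class": "no_buy"},  # covered, but an exception
    {"age": 25, "income": 90, "class": "no_buy"},  # not covered (age too low)
]
print(exceptions(rule, records))  # [{'age': 45, 'income': 55, 'class': 'no_buy'}]
```

Listing exceptions explicitly is the textual counterpart of the visual check: a rule with high confidence but a cluster of exceptions in one region of the data is exactly what an analyst would want to inspect more closely.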
KEFIR: The Key Findings Reporter for the Analysis of Healthcare Information
Christopher Matheus and Gregory Piatetsky-Shapiro,
GTE Labs.
The Key Findings Reporter (KEFIR), a system for discovering and explaining
"key findings" in large, changing databases, is currently being applied to
the analysis of GTE healthcare data. The system performs an automatic
drill-down through the data along multiple dimensions to determine the most
interesting deviations of specific quantitative measures from their previous
and expected values. It explains "key" deviations through their relationship
to other deviations in the data and, where appropriate, generates
recommendations for actions in response to these deviations. KEFIR uses
Netscape, a WWW browser, to present its findings in a hypertext report that
combines natural-language text and business graphics.
Status: Application in beta-testing.
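The drill-down idea can be sketched as a greedy walk: at each dimension, group the rows, compare the current value of a measure with its expected value, and descend into the segment with the largest absolute deviation. This is a minimal illustration only; KEFIR's actual measures of interestingness, its comparison against previous values, and its explanation step are not reproduced, and the field names `current` and `expected` and the sample figures are hypothetical.

```python
from collections import defaultdict

def group_deviation(rows, dim):
    """Sum `current` and `expected` per value of `dim`; return current - expected."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for r in rows:
        totals[r[dim]][0] += r["current"]
        totals[r[dim]][1] += r["expected"]
    return {value: cur - exp_ for value, (cur, exp_) in totals.items()}

def drill_down(rows, dims):
    """Greedy drill-down: at each dimension, follow the segment whose measure
    deviates most (in absolute terms) from its expected value."""
    path = []
    for dim in dims:
        devs = group_deviation(rows, dim)
        if not devs:
            break
        value = max(devs, key=lambda v: abs(devs[v]))
        path.append((dim, value, devs[value]))
        rows = [r for r in rows if r[dim] == value]
    return path

# Hypothetical cost data: one row per (region, service) segment.
rows = [
    {"region": "east", "service": "lab", "current": 120.0, "expected": 100.0},
    {"region": "east", "service": "rx", "current": 50.0, "expected": 52.0},
    {"region": "west", "service": "lab", "current": 80.0, "expected": 81.0},
    {"region": "west", "service": "rx", "current": 60.0, "expected": 60.0},
]
print(drill_down(rows, ["region", "service"]))
# [('region', 'east', 18.0), ('service', 'lab', 20.0)]
```

A greedy walk keeps the search cheap but can miss a large deviation hidden under a segment that looks unremarkable at the top level; exploring several branches per level is the obvious refinement.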
Automated Large-scale Data Mining by Forty-Niner (49er)
Arun Sanjeev and Jan Zytkow
Universities all over the world vary widely in their student populations,
environmental settings, academic programs, and so on. Yet the problems of
higher education that all universities face, such as enrollment, attrition,
and retention, are strikingly similar. Universities maintain large databases
of hundreds of thousands of student records, and these databases are a
useful source of knowledge for solving such problems. But the knowledge is
implicit in the data and must be mined and expressed in a useful form. We
demonstrate an application of Forty-Niner (49er) to our university's student
database.
49er is an automated discovery system that explores databases in search of
knowledge. 49er discovers knowledge in the form of regularities, that is,
statements of the form "Pattern P holds for data in range R". We show how
49er systematically searches a large number of data subsets, discovering
even patterns that hold only in limited circumstances. 49er evaluates
patterns and reports only those that pass the user-specified thresholds. As
an example, we demonstrate a focused search (evaluating remedial programs)
in which the thresholds are set to select even the weakest patterns in the
data. The regularities discovered through incremental exploration are useful
for managing enrollment at our university.
Status: Fielded application
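The search for regularities of the form "Pattern P holds for data in range R" can be illustrated by slicing the data on one attribute at a time and measuring the strength of association between two other attributes inside each slice. This sketch assumes Cramér's V on a contingency table as the strength measure and uses hypothetical `v_min` and `n_min` thresholds; it illustrates the idea and is not necessarily 49er's exact evaluation procedure. The student records are invented.

```python
import math
from collections import Counter

def cramers_v(pairs):
    """Cramér's V for a list of (x, y) observations (0.0 when undefined)."""
    n = len(pairs)
    xs = Counter(x for x, _ in pairs)
    ys = Counter(y for _, y in pairs)
    k = min(len(xs), len(ys))
    if k < 2:
        return 0.0
    observed = Counter(pairs)
    chi2 = 0.0
    for x, nx in xs.items():
        for y, ny in ys.items():
            expected = nx * ny / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))

def search(records, x, y, slice_attrs, v_min, n_min):
    """Report (slice attribute, value, size, V) for every range R, here one
    attribute fixed to one value, where the x-y pattern passes both thresholds."""
    found = []
    for attr in slice_attrs:
        for value in sorted({r[attr] for r in records}):
            subset = [r for r in records if r[attr] == value]
            if len(subset) < n_min:
                continue
            v = cramers_v([(r[x], r[y]) for r in subset])
            if v >= v_min:
                found.append((attr, value, len(subset), round(v, 3)))
    return found

# Hypothetical student records: (campus, took remedial course, retained).
raw = [("A", "yes", "yes"), ("A", "yes", "yes"), ("A", "no", "no"), ("A", "no", "no"),
       ("B", "yes", "yes"), ("B", "yes", "no"), ("B", "no", "yes"), ("B", "no", "no")]
records = [dict(zip(("campus", "remedial", "retained"), t)) for t in raw]

print(search(records, "remedial", "retained", ["campus"], v_min=0.6, n_min=4))
# [('campus', 'A', 4, 1.0)]
print(cramers_v([(r["remedial"], r["retained"]) for r in records]))  # 0.5
```

In this toy data the association is only moderate over the whole set (V = 0.5) but perfect within campus A, the kind of regularity that holds only in a limited range and that a single global analysis would understate.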
Mining various kinds of knowledge by DBMiner (previously DBLearn)
Dr. Jiawei Han and Mr. Yongjian Fu
Database System Research Lab., Computing Science,
Simon Fraser University
The major features of DBMiner (an early version named DBLearn) include:
1. integration of machine learning and database technologies,
2. discovery of different kinds of knowledge from large databases, including
characteristic, discriminant, association, and classification rules,
3. high speed and efficiency in analyzing large databases,
4. interactive knowledge mining, and
5. smooth integration with commercial relational database systems.
The system will be demonstrated using a large database, an SQL-like data mining
language, and an interactive graphical user interface.
Status: A research prototype, seeking commercialization and applications
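One common way to derive a characteristic rule is attribute-oriented generalization: replace attribute values by their parents in a concept hierarchy and merge the tuples that become identical, keeping a count for each. The sketch below illustrates that style of generalization only; the hierarchies, attribute names, and records are hypothetical, and it should not be read as DBMiner's implementation or its SQL-like mining language.

```python
from collections import Counter

def generalize(records, hierarchies):
    """Replace each value by its parent concept (values without a parent are
    kept) and count how many original tuples collapse into each result."""
    merged = Counter()
    for r in records:
        key = tuple((attr, hierarchies.get(attr, {}).get(val, val))
                    for attr, val in sorted(r.items()))
        merged[key] += 1
    return merged

def characteristic_rule(merged):
    """Generalized tuples with their share of the data, most frequent first."""
    total = sum(merged.values())
    return [(dict(key), count / total) for key, count in merged.most_common()]

# Hypothetical one-level concept hierarchies.
hierarchies = {
    "major": {"physics": "science", "biology": "science", "history": "arts"},
    "birthplace": {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"},
}
students = [
    {"major": "physics", "birthplace": "Vancouver"},
    {"major": "biology", "birthplace": "Toronto"},
    {"major": "history", "birthplace": "Seattle"},
    {"major": "physics", "birthplace": "Toronto"},
]
for row, share in characteristic_rule(generalize(students, hierarchies)):
    print(row, share)
# {'birthplace': 'Canada', 'major': 'science'} 0.75
# {'birthplace': 'USA', 'major': 'arts'} 0.25
```

Read as a rule, the output says that 75% of these students are Canadian-born science majors. Generalizing further up a deeper hierarchy would trade detail for shorter, more general rules.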
TETRAD II: Tools for Discovery
Richard Scheines, Peter Spirtes, Clark Glymour, and Christopher Meek,
Carnegie Mellon University
TETRAD II is a multi-module program that assists in constructing Bayes
networks or causal models from sample data and in using Bayes networks for
prediction. With continuous variables, the program aids in the search for
"path models" or "structural equation models"; with discrete data, it
constructs and updates a Bayes network from sample data and the user's
knowledge of the domain. The program also includes Monte Carlo facilities.
Proofs of the asymptotic correctness of all but one of the search modules
are available in P. Spirtes, C. Glymour and R. Scheines, Causation,
Prediction and Search, Springer Lecture Notes in Statistics, 1993.
Platform(s): DOS
A Unix version may be available soon.
The DOS software comes with a 250-page manual containing chapters on
theoretical foundations and on interpreting output, plus a chapter on each
software module. Each chapter includes many detailed examples.
Status: Commercially available
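Search for path models with continuous variables can rest on tests of conditional independence. As a minimal illustration (not TETRAD II's code), the sketch below simulates a hypothetical linear causal chain X → Z → Y with Gaussian noise and checks that X and Y are correlated but have near-zero partial correlation given Z, the kind of evidence a constraint-based search could use to leave out a direct X-Y edge. The fixed cutoff is an illustrative assumption; a real search would use a statistical significance test instead.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after controlling linearly for a single z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical linear chain X -> Z -> Y with independent Gaussian noise.
rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)
print(f"corr(X, Y) = {r_xy:.2f}, corr(X, Y | Z) = {r_xy_given_z:.2f}")
# Illustrative cutoff only: treat a small partial correlation as independence.
independent_given_z = abs(r_xy_given_z) < 0.05
```

For this chain the population correlation of X and Y is 1/sqrt(3), about 0.58, while their partial correlation given Z is exactly zero, so the sample values should land close to those figures.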