ACM SIGKDD is pleased to sponsor webcasts as part of its efforts to improve data mining education and to share the outstanding expertise of many data mining researchers with a broader audience.
If you are interested in presenting a webcast, then please contact:
SIGKDD Webcast director
gjames at kddlab.com
Another excellent source of data mining webcasts can be found at:
SF Bay ACM Data Mining SIG
Bayesian networks are graphical structures for representing the probabilistic relationships among a large number of variables and for doing probabilistic inference with those variables. The 1990s saw the emergence of excellent algorithms for learning Bayesian networks from passive data. In 2004 I unified this research in my text Learning Bayesian Networks. This tutorial is based on that text and on my paper: Neapolitan, R.E., and X. Jiang, "A Tutorial on Learning Causal Influences," in Holmes, D. and L. Jain (Eds.): Innovations in Machine Learning, Springer-Verlag, New York, 2005. I will discuss the constraint-based method for learning Bayesian networks using an intuitive approach that concentrates on causal learning. Then I will show a few real examples.
Richard E. Neapolitan is Professor and Chair of Computer Science at Northeastern Illinois University. He has previously written three books, including the seminal 1990 Bayesian network text Probabilistic Reasoning in Expert Systems. More recently, he wrote the 2004 text Learning Bayesian Networks, and Foundations of Algorithms, which has been translated into three languages and is one of the most widely used algorithms texts worldwide. His books have a reputation for making difficult concepts easy to understand because of the logical flow of the material, the simplicity of the explanations, and the clear examples.
Algorithms like PageRank and HITS were developed in the late 1990s to explore links among Web pages in order to discover authoritative pages and hubs. Links have also been widely used in citation analysis and social network analysis. We show that the power of links can be explored thoroughly in data mining, in classification, clustering, information integration, and other interesting tasks. Some recent results of our research that explore the crucial information hidden in links will be introduced, including (1) multi-relational classification, (2) user-guided clustering, (3) link-based clustering, and (4) object distinction analysis. The power of links in other analysis tasks will also be discussed in the talk.
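For readers new to link analysis, the idea of authoritative pages and hubs can be illustrated with a minimal sketch of the HITS iteration. This is not taken from the talk: the function name `hits`, the toy graph, and the fixed iteration count are assumptions chosen only for illustration.

```python
import math

def hits(links, iterations=50):
    """Toy HITS iteration (illustrative sketch, not the talk's code).

    links maps each page to the list of pages it points to.
    Returns (hub, auth) dictionaries of normalized scores.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}
        # Hub score: sum of the authority scores of pages linked to.
        new_hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}
    return hub, auth

# Hypothetical four-page graph: "a" and "b" both point to "c"; "a" also points to "d".
links = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
hub, auth = hits(links)
```

In this toy graph, "c" receives links from both "a" and "b" and so ends up with the highest authority score, while "a", which points to the most authorities, ends up as the strongest hub.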
Jiawei Han's research covers data mining, data warehousing, database systems, and the mining of spatiotemporal, multimedia, stream and RFID, Web, social network, and biological data, with over 300 journal and conference publications. He has chaired or served on over 100 program committees of international conferences and workshops, including serving as PC co-chair of the 2005 (IEEE) International Conference on Data Mining (ICDM) and as Americas Coordinator of the 2006 International Conference on Very Large Data Bases (VLDB). He is also serving as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He is an ACM Fellow and has received the 2004 ACM SIGKDD Innovations Award and the 2005 IEEE Computer Society Technical Achievement Award. His book "Data Mining: Concepts and Techniques" (2nd ed., Morgan Kaufmann, 2006) is widely used as a textbook worldwide.
Multidisciplinary techniques to extract and mine useful knowledge from the Web.
Web content mining aims to extract or mine useful information or knowledge from Web page contents. Apart from the traditional tasks of Web page clustering and classification, there are many other Web content mining tasks, e.g., data/information extraction, information integration, mining opinions from user-generated content, mining the Web to build concept hierarchies, and Web page pre-processing and cleaning. Here, I will introduce these tasks and some of their basic algorithms. I will also try to put these tasks into a unified framework and present some fundamental challenges. The webcast will be useful not only to students and researchers but also to practitioners, as many of these tasks and techniques have immediate real-life applications.
Biotechnology makes it possible to perform thousands of experiments in the time it used to take to perform just one. This tutorial describes a variety of data mining tasks arising from high-throughput biological data. Examples include gene expression microarrays, mass spectrometry for proteomics and metabonomics, single-nucleotide polymorphism arrays for genotyping, and robotic high-throughput screening for potential drug compounds. We will discuss the challenges posed by the various data types and the technologies developed to address them. Case studies will be presented from a variety of data mining applications, and we will speculate about novel data mining tasks likely to arise in the near future.
David Page received his Ph.D. in computer science from the University of Illinois at Urbana-Champaign. He was a research scientist in the Oxford University Computing Laboratory and also served as a visiting member of the Faculty of Mathematics. He is now an associate professor at the University of Wisconsin, Madison, Dept. of Biostatistics and Medical Informatics (College of Medicine and Public Health) and Dept. of Computer Sciences. David was a founding member of the Institute for Molecular Diversity and Drug Design, an inaugural member of the U.S. National Institutes of Health study on Biodata Management and Analysis, and is on the editorial boards of Data Mining and the Machine Learning Journal.
Overview of techniques for scaling information extraction to the Web.
Data mining applications over text require efficient methods for extracting and structuring the information embedded in millions, or billions, of text documents. This presentation reviews the current research on enabling information extraction to operate at Web scale. Different dimensions of scalability include corpus size, heterogeneity of the information sources, access to the documents, and the diversity of the extraction domains. This presentation will focus on the first three dimensions. First, I will briefly review common information extraction tasks such as entity, relation, and event extraction, indicating the main scalability bottlenecks associated with each task. I will then review the key algorithmic approaches to improving the efficiency of information extraction, which include applications of randomized algorithms, ideas adapted from information retrieval, and recently developed specialized indexing techniques. I hope that data mining, database, and knowledge management researchers and developers can build on these general ideas to develop more effective tools to manage and discover information in text.
Eugene Agichtein is an Assistant Professor in the Mathematics & Computer Science Department at Emory University. Previously, Eugene was a Postdoctoral Researcher in the Text Mining, Search, and Navigation group at Microsoft Research, working on data mining for information retrieval. He received a Ph.D. in Computer Science from Columbia University in 2005 and a B.S. in Engineering from The Cooper Union in 1998. Eugene has co-authored several publications on scalable and efficient information extraction, including papers that received the best student paper award at the IEEE ICDE 2003 conference and the best paper award at the SIGMOD 2006 conference.