Data Mining. Instructor: Susi Dulli

 

Exam format:

The exam consists of a written test and a practical test. The practical test consists of term papers and/or exercises agreed upon with the instructor.

Tentative syllabus (to be detailed further during the course):

Database design and data warehousing
Introduction to data mining: concepts and an overview of the KDD process
Data mining algorithms:

 

Research papers that may be used for the term paper

Association rules

J. Han, J. Pei, and Y. Yin. "Mining Frequent Patterns without Candidate Generation", Proc. of 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), Dallas, TX, May 2000.

R. Agrawal, T. Imielinski, A. Swami. "Mining Associations between Sets of Items in Massive Databases", Proc. of the ACM-SIGMOD 1993 Int'l Conference on Management of Data, Washington D.C., May 1993, 207-216.

R. Agrawal, R. Srikant. "Fast Algorithms for Mining Association Rules", Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994.

S. Orlando, P. Palmerini, R. Perego, F. Silvestri. "Adaptive and Resource-Aware Mining of Frequent Sets", Proc. of the 2002 IEEE Int. Conf. on Data Mining (ICDM 2002), Maebashi City, Japan, 2002.

J. Dougherty, R. Kohavi and M. Sahami. "Supervised and Unsupervised Discretization of Continuous Features", ICML 1995.

Pang-Ning Tan and Vipin Kumar. "Interestingness Measures for Association Patterns: A Perspective", Technical Report TR00-036, University of Minnesota, 2000.

Shiby Thomas, Sreenath Bodagala, Khaled Alsabti, Sanjay Ranka. "An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases". In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD 97), Newport Beach, California, August 1997.

H. Toivonen. "Sampling large databases for association rules". In 22nd International Conference on Very Large Databases (VLDB'96), Mumbai, India, September 1996, pp. 134-145.

J. Han and Y. Fu. "Discovery of Multiple-Level Association Rules from Large Databases". In Proc. of 1995 Int'l Conf. on Very Large Data Bases (VLDB'95), Zürich, Switzerland, September 1995, pp. 420-431.

R. Ng, L. V. S. Lakshmanan, J. Han and A. Pang. "Exploratory Mining and Pruning Optimizations of Constrained Association Rules". Proc. of 1998 ACM-SIGMOD Conf. on Management of Data, Seattle, Washington, June 1998.

S. Sarawagi, S. Thomas, R. Agrawal. "Integrating association rule mining with databases: alternatives and implications". Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Seattle, Washington, June 1998.

R. Srikant, R. Agrawal. "Mining Generalized Association Rules". Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, Sep. 1995.

 

Sequential Mining

R. Agrawal, R. Srikant. "Mining Sequential Patterns". Proc. of the Int'l Conference on Data Engineering (ICDE), Taipei, Taiwan, March 1995.

R. Srikant, R. Agrawal. "Mining Sequential Patterns: Generalizations and Performance Improvements". Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, March 1996 (expanded version appeared as IBM Research Report RJ 9994).

Mohammed J. Zaki, "Efficient Enumeration of Frequent Sequences". 7th International Conference on Information and Knowledge Management, Washington DC, November 1998.

H. Mannila, H. Toivonen, and A. I. Verkamo. "Discovery of frequent episodes in event sequences". Data Mining and Knowledge Discovery, 1(3): 259 - 289, November 1997.

K. Koperski and J. Han. "Discovery of Spatial Association Rules in Geographic Information Databases". Proc. 4th Int'l Symp. on Large Spatial Databases (SSD95), Maine, Aug. 1995, pp. 47-66.

Clustering and Outlier Detection

Alexander Hinneburg, Daniel A. Keim. "Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering". VLDB'99.

Ester M., Kriegel H.-P., Sander J., Xu X. "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise". Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, 1996, pp. 226-231.

S. Guha, R. Rastogi and K. Shim. "CURE: An efficient algorithm for clustering large databases". In Proceedings of ACM-SIGMOD 1998 International Conference on Management of Data, Seattle, 1998.

Harsha Nagesh, Sanjay Goil, and Alok Choudhary. "Adaptive Grids for Clustering Massive Data Sets". SIAM Conference on Data Mining, 2001.

S. Guha, R. Rastogi and K. Shim. "ROCK: a robust clustering algorithm for categorical attributes". In Proceedings of International Conference on Data Engineering, 1999.

Tian Zhang, Raghu Ramakrishnan, Miron Livny. "BIRCH: An Efficient Data Clustering Method for Very Large Databases". SIGMOD'96.

R. Ng and J. Han. "Efficient and Effective Clustering Methods for Spatial Data Mining". Proc. of 1994 Int'l Conf. on Very Large Data Bases (VLDB'94), Santiago, Chile, September 1994, pp. 144-155.

Edwin M. Knorr and Raymond T. Ng. "A Unified Notion of Outliers: Properties and Computation". Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, August 14-17, 1997.

Edwin M. Knorr and Raymond T. Ng. "Algorithms for Mining Distance-Based Outliers in Large Datasets". Proceedings of the 24th VLDB Conference, New York, August 24-27, 1998.

Classification

Bing Liu, Wynne Hsu, Yiming Ma. "Integrating Classification and Association Rule Mining". Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98, Plenary Presentation), New York, USA, 1998.

Pedro Domingos and Geoff Hulten. "Mining High-Speed Data Streams". Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining, Boston, 2000, pp. 71-80.

Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti. "RainForest - A Framework for Fast Decision Tree Construction of Large Datasets". VLDB 1998 (416-427).

R. Agrawal and R. Srikant. "Privacy-Preserving Data Mining". Proc. of the ACM SIGMOD Conference on Management of Data, Dallas, May 2000.

J.C. Shafer, R. Agrawal, M. Mehta. "SPRINT: A Scalable Parallel Classifier for Data Mining". Proc. of the 22nd Int'l Conference on Very Large Databases, Mumbai (Bombay), India, Sept. 1996.

M. Mehta, R. Agrawal and J. Rissanen. "SLIQ: A Fast Scalable Classifier for Data Mining". Proc. of the Fifth Int'l Conference on Extending Database Technology, Avignon, France, March 1996.

K. Alsabti, S. Ranka and V. Singh. "CLOUDS: A Decision Tree Classifier for Large Datasets". Conference on Knowledge Discovery and Data Mining (KDD-98).

Surajit Chaudhuri, Usama Fayyad, Jeff Bernhardt. "Scalable Classification over SQL Databases". Proceedings of the IEEE Data Engineering Conference, Sydney, 1999.

K. Koperski, J. Han, and N. Stefanovic. "An Efficient Two-Step Method for Classification of Spatial Data". Proc. 1998 International Symposium on Spatial Data Handling (SDH'98), Vancouver, BC, Canada, July 1998.

Web and Text Mining

Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan. "Scalable Feature Selection, Classification and Signature Generation for Organizing Large Text Databases into Hierarchical Topic Taxonomies". VLDB Journal, 1998.

Soumen Chakrabarti. "Data mining for hypertext: A tutorial survey". ACM SIGKDD Explorations, 1(2), pages 1-11, 2000.

Soumen Chakrabarti, Byron Dom and Piotr Indyk. "Enhanced hypertext categorization using hyperlinks". In SIGMOD 1998.

Soumen Chakrabarti, M. van den Berg and B. Dom. "Focused crawling: A new approach to topic-specific Web resource discovery". In Proc. of WWW8, Toronto, May 1999.

Daniel Boley, Maria Gini, Robert Gross, Eui-Hong (Sam) Han, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, and Jerome Moore. "Document Categorization and Query Generation on the World Wide Web Using WebACE". AI Review, 1999

Cyrus Shahabi, Amir Zarkesh, Jafar Abidi, and Vishal Shah. "Knowledge Discovery from User's Web-Page Navigation". Seventh International Workshop on Research Issues in Data Engineering, April 7-8, 1997.

Tak Woon Yan, Matthew Jacobsen, Hector Garcia-Molina, Umeshwar Dayal. "From User Access Patterns to Dynamic Hypertext Linking". Fifth International World Wide Web Conference, May 1996.

O. R. Zaiane, M. Xin, J. Han. "Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs". Proc. Advances in Digital Libraries Conf. (ADL'98), Santa Barbara, CA, April 1998, pp. 19-29.

D.W. Cheung, B. Kao, and J.W. Lee. "Discovering User Access Patterns on the World-Wide-Web". Proc. First Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-97), Singapore, February, 1997.

R. Cooley, B. Mobasher, J. Srivastava. "Web Mining: Information and Pattern Discovery on the World Wide Web (A Survey Paper)". In Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997.

Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. "Mining the Web's Link Structure". IEEE Computer, Vol. 32, No. 8, August 1999.

Masaru Kitsuregawa, Masashi Toyoda, Iko Pramudiono. "WEB community mining and WEB log mining: Commodity Cluster based Execution". Thirteenth Australasian Database Conference (ADC2002).

Daniel Tkach. "Text Mining Technology: Turning Information Into Knowledge". A White Paper from IBM Software Solutions, 1998.

Raymond Kosala, Hendrik Blockeel. "Web Mining Research: A Survey". SIGKDD Explorations. July 2000.

Helena Ahonen, Oskari Heinonen, Mika Klemettinen, Inkeri Verkamo. "Applying Data Mining Techniques in Text Analysis". Report C-1997-23, Department of Computer Science, University of Helsinki, 1997.