Correlation Clustering

Francesco Bonchi, David García-Soriano, Francesco Gullo (Authors)

Book | Softcover

XV, 133 pages

2022
Springer International Publishing (Publisher)
978-3-031-79198-7 (ISBN)

Given a set of objects and a pairwise similarity measure between them, the goal of correlation clustering is to partition the objects into a set of clusters so as to maximize the similarity of objects within the same cluster and minimize the similarity of objects in different clusters. In most variants of correlation clustering, the number of clusters is not a given parameter; instead, the optimal number of clusters is determined automatically. Correlation clustering is perhaps the most natural formulation of clustering: because it needs only a definition of similarity, it applies to a wide range of problems in different contexts and is particularly well suited to clustering structured objects for which feature vectors can be difficult to obtain. Despite its simplicity, generality, and wide applicability, correlation clustering has so far received much more attention from an algorithmic-theory perspective than from the data-mining community. The goal of this lecture is to show how correlation clustering can be a powerful addition to the toolkit of a data-mining researcher and practitioner, and to encourage further research in the area.
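To make the problem statement above concrete, here is a minimal Python sketch. It is not taken from the book: it assumes the simplest setting, in which every pair of objects is labeled either similar or dissimilar, and it uses the well-known randomized pivot heuristic as one possible clustering procedure. The names `disagreements`, `pivot_clustering`, `objects`, and `similar` are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations


def disagreements(clusters, similar):
    """Count the pairs a clustering gets wrong: similar pairs that are
    split across clusters plus dissimilar pairs placed together."""
    label = {obj: i for i, cluster in enumerate(clusters) for obj in cluster}
    cost = 0
    for u, v in combinations(sorted(label), 2):
        together = label[u] == label[v]
        is_similar = frozenset((u, v)) in similar
        if together != is_similar:
            cost += 1
    return cost


def pivot_clustering(objects, similar, seed=0):
    """Randomized pivot heuristic (an illustrative choice, not the book's
    method): visit objects in random order; each unassigned object becomes
    a pivot and takes all still-unassigned objects similar to it."""
    rng = random.Random(seed)
    order = list(objects)
    rng.shuffle(order)
    assigned = set()
    clusters = []
    for pivot in order:
        if pivot in assigned:
            continue
        cluster = [pivot] + [
            o for o in order
            if o != pivot and o not in assigned
            and frozenset((pivot, o)) in similar
        ]
        assigned.update(cluster)
        clusters.append(cluster)
    return clusters


# Toy input (hypothetical): a, b, c are mutually similar, and so are d, e.
objects = ["a", "b", "c", "d", "e"]
similar = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]}

clusters = pivot_clustering(objects, similar)
print(sorted(sorted(c) for c in clusters))
print("disagreements:", disagreements(clusters, similar))
```

Note that the number of clusters is never passed in: it falls out of the similarity labels, which mirrors the point made in the description.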

Francesco Bonchi is Scientific Director at the ISI Foundation, Turin, Italy, where he also coordinates the "Learning and Algorithms for Data Analytics" Research Area. Before becoming Scientific Director, he served as Deputy Director with responsibility for the Industrial Research area. Earlier, he was Director of Research at Yahoo Labs in Barcelona, Spain, where he led the Web Mining Research group. He is also (part-time) Research Director for Big Data & Data Science at Eurecat (Technological Center of Catalunya), Barcelona. His recent research interests include algorithms and learning on graphs and complex networks (e.g., financial networks, social networks, brain networks), fair and explainable AI, and, more generally, privacy and all ethical aspects of data analysis and AI. He has more than 200 publications in these areas. He has also filed 16 U.S. patents and been granted 9. He is a member of the Steering Committees of ECML PKDD and IEEE DSAA, and is on the editorial boards of several journals in the Data Science area. Dr. Bonchi was General Co-Chair of the 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018). He has twice been PC Co-Chair of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2010 and 2018), and has also been PC Co-Chair of the 16th IEEE International Conference on Data Mining (ICDM 2016), the 28th ACM Conference on Hypertext and Hypermedia (HT 2017), the "Social Network Analysis and Graph algorithms for the Web" track at The International World Wide Web Conference (WWW 2018), and the 6th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2019). Dr. Bonchi has also served as program co-chair of the first and second ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD (PinKDD 2007 and 2008), the 1st IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2006), and the 4th International Workshop on Knowledge Discovery in Inductive Databases (KDID 2005). He is co-editor of the book Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques, published by Chapman & Hall/CRC Press. He will be General Chair of ECML PKDD 2023, to be held in Turin (Italy), and of ACM SIGKDD 2024, to be held in Barcelona (Spain).

David García-Soriano is a Senior Research Scientist at the Institute for Scientific Interchange (ISI) in Turin, in the "Algorithmic Data Analytics" group. He received his Ph.D. in Computer Science (2012) from the University of Amsterdam and his undergraduate degrees in Computer Science (2007) and Mathematics (2009) from the Complutense University of Madrid. He has been a member of the Algorithms and Complexity group at CWI Amsterdam (the Dutch National Research Center for Mathematics and Computer Science) and a research visitor at the Israel Institute of Technology in Haifa (Technion). Later, he was a postdoctoral researcher at Yahoo Labs Barcelona and a Lecturer in Computer Science at Pompeu Fabra University. He has also worked in industry as a software engineer at Google, CERN (the European Organization for Nuclear Research), and Tuenti. In recent years, he has been developing machine-learning and optimization-based solutions to financial portfolio management problems, in collaboration with the Intesa Sanpaolo banking group. His research focuses on the theory and practice of large-scale data mining and machine learning, with an emphasis on computational efficiency and provable quality guarantees; topics include algorithmic theory, combinatorial optimization, scalable machine learning, data mining, algorithmic fairness, social network analysis, data streams, and portfolio optimization. His research findings have been published in top-tier conferences (SODA, KDD, SIGMOD, ICALP, CCC, ICDM, WWW, ICDE, RANDOM, ECML/PKDD, SDM, ...) and journals (SIAM Journal on Computing, Combinatorica, Data Mining and Knowledge Discovery, ...).

Francesco Gullo is a senior researcher at the UniCredit banking group, specifically in the "Applied Research & Innovation" unit of the "AI, Data & Analytics ICT" department (UniCredit Services controlled company). Previously, he was part of the "Research & Development" department (UniCredit holding company) for 5 years. He received his Ph.D. in "Computer and Systems Engineering" from the University of Calabria, Italy, in 2010. During his Ph.D., he was an intern at George Mason University, U.S. After graduating, he spent 1.5 years at the University of Calabria, Italy (as a postdoc), and 4 years at Yahoo Labs, Spain (first as a postdoc, then as a research scientist). His research falls into the broad areas of artificial intelligence and data science, with special emphasis on algorithmic aspects. His recent interests include mining and learning on graphs, natural language processing, and AI in finance. He has practiced both applied research (with 10 years of work experience in industrial-research environments) and fundamental research (with 80 publications in premier venues such as SIGMOD, VLDB, KDD, ICDM, CIKM, EDBT, WSDM, ECML-PKDD, SDM, TODS, TKDE, TKDD, MACH, DAMI, TNSE, JCSS, PR). He has also served the scientific community by, e.g., being Workshop Chair of ICDM'16, organizing workshops/symposia (MIDAS workshop @ECML-PKDD['16-'21], MultiClust symposium @SDM'14, MultiClust workshop @KDD'13, 3Clust workshop @PAKDD'12), and being part of the program committee of major AI/data-science conferences (e.g., SIGMOD, KDD, WWW, IJCAI, AAAI, CIKM, SIGIR, ICDM, WSDM, SDM, ECML-PKDD, ICWSM).

Preface.- Acknowledgments.- Foundations.- Constraints.- Relaxed Formulations.- Other Types of Graphs.- Other Computational Settings.- Conclusions and Open Problems.- Bibliography.- Authors' Biographies.

Publication date	06.06.2022
Series	Synthesis Lectures on Data Mining and Knowledge Discovery
Additional info	XV, 133 p.
Place of publication	Cham
Language	English
Dimensions	191 x 235 mm
Weight	300 g
Subject areas	Computer Science ► Databases ► Data Warehouse / Data Mining
	Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics
	Mathematics / Computer Science ► Mathematics
ISBN-10	3-031-79198-3 / 3031791983
ISBN-13	978-3-031-79198-7 / 9783031791987
Condition	New