Data Cleaning
Springer International Publishing (publisher)
978-3-031-00769-9 (ISBN)
Venky Ganti is the co-founder and CTO of Alation Inc., where he is developing technology to effectively search, understand, and analyze structured and semi-structured data. Prior to Alation, he was a member of the Google AdWords engineering team for a few years. He helped develop the Dynamic Search Ads (DSA) product, whose goal is to completely automate the configuration and maintenance of AdWords campaigns based on an advertiser's website and a few configuration parameters. The main technical challenge is to mine for appropriate keywords and automatically create high-quality ads that match the accuracy and quality of manually configured campaigns. Prior to Google, Venky was a senior researcher at Microsoft Research (MSR). While at MSR, he worked extensively on data cleaning and integration technologies. Some of the technologies he helped develop in this context are now part of Microsoft SQL Server Integration Services, the ETL platform of Microsoft SQL Server. He also worked on leveraging rich structured databases on products, movies, people, etc., to enrich the user experience for web search. Some of the technologies he helped develop are now part of Bing product search. He has a Ph.D. in database systems and data mining from the University of Wisconsin-Madison.

Anish Das Sarma is currently a Senior Research Scientist at Google (since May 2010), before which he was a Research Scientist at Yahoo (August 2009-April 2010). Prior to joining Yahoo Research, Anish did his Ph.D. in Computer Science at Stanford University, advised by Prof. Jennifer Widom. Anish received a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology (IIT) Bombay in 2004, and an M.S. in Computer Science from Stanford University in 2006. Anish is a recipient of the Microsoft Graduate Fellowship, a Stanford University School of Engineering fellowship, and the IIT-Bombay Dr. Shankar Dayal Sharma Gold Medal. Anish has written over 40 technical papers, filed over 10 patents, is associate editor of SIGMOD Record, has served on the thesis committee of a Stanford Ph.D. student, and has served on numerous program committees. Two SIGMOD papers and one VLDB paper co-authored by Anish were selected among the best papers of the conference, with invitations to journals. While at Stanford, Anish co-founded Shout Velocity, a social tweet ranking system that was named a top-50 fbFund Finalist for most promising upcoming start-up ideas.
Preface.- Acknowledgments.- Introduction.- Technological Approaches.- Similarity Functions.- Operator: Similarity Join.- Operator: Clustering.- Operator: Parsing.- Task: Record Matching.- Task: Deduplication.- Data Cleaning Scripts.- Conclusion.- Bibliography.- Authors' Biographies.
Publication date | 6 June 2022 |
---|---|
Series | Synthesis Lectures on Data Management |
Additional info | XV, 69 p. |
Place of publication | Cham |
Language | English |
Original title | Data Cleaning |
Dimensions | 191 x 235 mm |
Weight | 183 g |
Subject area | Mathematics / Computer Science ► Computer Science ► Networks |
 | Computer Science ► Theory / Studies ► Algorithms |
ISBN-10 | 3-031-00769-7 / 3031007697 |
ISBN-13 | 978-3-031-00769-9 / 9783031007699 |
Condition | New |