Web Communities (eBook)
XI, 187 pages
Springer Berlin (publisher)
978-3-540-27739-2 (ISBN)
Due to the lack of a uniform schema for Web documents and the sheer amount and dynamics of Web data, both the effectiveness and the efficiency of information management and retrieval of Web data are often unsatisfactory when using conventional data management techniques.
A Web community, defined as a set of Web-based documents with its own logical structure, is a flexible and efficient approach to supporting information retrieval and implementing various applications. Zhang and his co-authors explain how to construct and analyse Web communities based on information such as Web document contents, hyperlinks, or user access logs. Their approaches combine results from Web search algorithms, Web clustering methods, and Web usage mining. They also detail the preliminaries needed to understand the algorithms presented, and they discuss several successful existing applications.
Researchers and students in information retrieval and Web search will find in this book all the necessary basics and methods to create and understand Web communities. Professionals developing Web applications will additionally benefit from the samples presented for their own designs and implementations.
Dr. Yanchun Zhang is Associate Professor and Head of the Computing Discipline in the Department of Mathematics and Computing at the University of Southern Queensland. He obtained his PhD degree in Computer Science from the University of Queensland in 1991. His research areas cover databases, electronic commerce, internet/web information systems, web data management, web search and web services. He has published over 100 research papers on these topics in international journals and conference proceedings, and edited over 10 books/proceedings and journal special issues. He is a co-founder and Co-Editor-in-Chief of World Wide Web: Internet and Web Information Systems and Co-Chairman of the International Web Information Systems Engineering Society.
Dr. Jeffrey Xu Yu received his B.E., M.E. and Ph.D. in computer science from the University of Tsukuba, Japan, in 1985, 1987 and 1990, respectively. He was a faculty member in the Institute of Information Sciences and Electronics, University of Tsukuba, Japan, and a Lecturer in the Department of Computer Science at The Australian National University. Currently, he is an Associate Professor in the Department of Systems Engineering and Engineering Management at the Chinese University of Hong Kong. His research areas cover databases, data warehousing and data mining. He has published over 100 research papers on these topics in international journals and conference proceedings. Jeffrey Xu Yu is a member of ACM and a society affiliate of the IEEE Computer Society.
Dr. Jingyu Hou received his BSc in Computational Mathematics from Shanghai University of Science and Technology (1985) and his PhD in Computational Mathematics from Shanghai University (1995). He is now a Lecturer in the School of Information Technology at Deakin University, Australia. He has also completed a PhD in Computer Science in the Department of Mathematics and Computing at The University of Southern Queensland, Australia. His research interests include Web-Based Data Management and Information Retrieval, Web Databases, Internet Computing and Electronic Commerce, and Semi-Structured Data Models. He has published extensively in the areas of Web information retrieval and Web Communities.
Contents
Preface
1 Introduction
1.1 Background
1.2 Web Community
1.3 Outline of the Book
1.4 Audience of the Book
2 Preliminaries
2.1 Matrix Expression of Hyperlinks
2.2 Eigenvalue and Eigenvector of the Matrix
2.3 Matrix Norms and the Lipschitz Continuous Function
2.4 Singular Value Decomposition (SVD) of a Matrix
2.5 Similarity in Vector Space Models
2.6 Graph Theory Basics
2.7 Introduction to the Markov Model
3 HITS and Related Algorithms
3.1 Original HITS
3.2 The Stability Issues
3.3 Randomized HITS
3.4 Subspace HITS
3.5 Weighted HITS
3.6 The Vector Space Model (VSM)
3.7 Cover Density Ranking (CDR)
3.8 In-depth Analysis of HITS
3.9 HITS Improvement
3.10 Noise Page Elimination Algorithm Based on SVD
3.11 SALSA (Stochastic algorithm)
4 PageRank Related Algorithms
4.1 The Original PageRank Algorithm
4.2 Probabilistic Combination of Link and Content Information
4.3 Topic-Sensitive PageRank
4.4 Quadratic Extrapolation
4.5 Exploring the Block Structure of the Web for Computing PageRank
4.6 Web Page Scoring Systems (WPSS)
4.7 The Voting Model
4.8 Using Non-Affiliated Experts to Rank Popular Topics
4.9 A Latent Linkage Information (LLI) Algorithm
5 Affinity and Co-Citation Analysis Approaches
5.1 Web Page Similarity Measurement
5.2 Hierarchical Web Page Clustering
5.3 Matrix-Based Clustering Algorithms
5.4 Co-Citation Algorithms
6 Building a Web Community
6.1 Web Community
6.2 Small World Phenomenon on the Web
6.3 Trawling the Web
6.4 From Complete Bipartite Graph to Dense Directed Bipartite Graph
6.5 Maximum Flow Approaches
6.6 Web Community Charts
6.7 From Web Community Chart to Web Community Evolution
6.8 Uniqueness of a Web Community
7 Web Community Related Techniques
7.1 Web Community and Web Usage Mining
7.2 Discovering Web Communities Using Co-occurrence
7.3 Finding High-Level Web Communities
7.4 Web Community and Formal Concept Analysis
7.5 Generating Web Graphs with Embedded Web Communities
7.6 Modeling Web Communities Using Graph Grammars
7.7 Geographical Scopes of Web Resources
7.8 Discovering Unexpected Information from Competitors
7.9 Probabilistic Latent Semantic Analysis Approach
8 Conclusions
8.1 Summary
8.2 Future Directions
References
Index
About the Authors
Publication date (per publisher) | 28 March 2006 |
---|---|
Additional info | XI, 187 p. |
Place of publication | Berlin |
Language | English |
Subject area | Mathematics / Computer Science ► Computer Science ► Web / Internet; Computer Science ► Other Topics ► Hardware |
Keywords | ALS • Content • Dom • organization • origin • search engine marketing (SEM) • Web • Web Clustering • Web data management • Webp • Web Search • Web Usage Mining |
ISBN-10 | 3-540-27739-0 / 3540277390 |
ISBN-13 | 978-3-540-27739-2 / 9783540277392 |
Size: 3.5 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thus personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Read online
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.