José Luis Ortega is a web researcher at the Spanish National Research Council (CSIC). He was awarded a fellowship at the Cybermetrics Lab of the CSIC, where he completed his doctoral studies. In 2005, he was hired by the Virtual Knowledge Studio of the Royal Netherlands Academy of Arts and Sciences, and in 2008 he received a full position in the Vice-presidency for Scientific and Technological Research at the CSIC, working in research evaluation. He collaborates with the Cybermetrics Lab in research areas such as webometrics, web usage mining, information visualization, social network analysis and web bibliometrics.
Academic Search Engines surveys the current panorama of academic search engines through a quantitative approach that analyses the reliability and consistency of these services. The objective is to describe the main characteristics of these engines, to highlight their advantages and drawbacks, and to discuss the implications of these new products for the future of scientific communication and their impact on research measurement and evaluation. In short, Academic Search Engines presents a summary view of the new challenges that the Web poses to scientific activity through the most novel and innovative searching services available on the Web.
- This is the first attempt to analyse search engines exclusively addressed to the research community in an integrative handbook. The novelty, expectation and usefulness of many of these services justify their analysis.
- This book is not merely a description of the web functionalities of these services; it is a scientific review of the most outstanding characteristics of each platform, discussing their significance for scholarly communication and research evaluation.
- This book introduces an original methodology based on a quantitative analysis of the covered data through the extensive use of crawlers and harvesters, which allows an in-depth look at how these engines work. In addition, it offers a detailed descriptive review of their functionalities and a critical discussion of their use by the scientific community.
Introduction
Abstract
This chapter first introduces the need for a clear definition of what an academic search engine is, and the colourful range of products that could be considered as such. The chapter goes on to summarize the main characteristics that these engines should have in order to serve the complexity of scholarly information and the thoroughness of the scientific community. A brief history of these engines is given, marking their principal milestones and contributions, such as the appearance of the first autonomous citation index or the first author profiling platform. Finally, the chapter looks at the challenges these services will face in the future as they try to establish themselves within the landscape of research evaluation and information retrieval.
Keywords
academic search engines
scientific users
autonomous citation indexing
profiling
research evaluation
Academic search engines could be considered to be the meeting point between two streams that started to diverge just when the web evolved: on the one hand the traditional specialized databases and on the other hand the new generalist web search engines. Before the appearance of the web, any search product was a specialized object addressed to a specific and sophisticated user (scientist, technician, lawyer, etc.) who needed to carry out complex queries to obtain the highest precision or recall. These databases contained records with a great number of fields which described structured and formal documents. This paradigm started to break up when the arrival of the web opened up these search services to a broad and heterogeneous population with few skills in information searching and with diverse needs. Thus, while the specialized databases retained their traditional search scheme, the new web search services began to adopt a search interface more suitable to their new users and the new hypertextual and multimedia documents, consistently simplifying the search pages (Lewandowski and Mayr, 2006). A good example of this transition was AltaVista, one of the first search engines on the web. The initial appearance of this service showed an advanced search with multiple boxes, Boolean operators and ranking criteria to display results, an unusual picture in the current scenario of increasingly accessible and user-friendly search interfaces (Figure 1.1). The arrival of Google in 1998 with its clear and simple search box, rapid response time and powerful crawling was the inflection point that definitively separated the world of search engines from the world of databases.
Nevertheless, the appearance of academic search engines united these distant worlds again, creating a developed product focused on a concrete and specialized public, but based on the information accessible on the web. This new environment brought with it important challenges and a new conceptual framework, because these services should be neither merely search engines of scientific information nor simply specialized databases running on the web; instead, they should provide a new insight into scholarly information searchable on the web. In short, an academic search engine is neither a search engine nor a database – rather, it is the union of the best of both and, unfortunately, the mix of the complexities of each as well.
What is an academic search engine?
Firstly, it is necessary to provide a definition of an academic search engine, although this is not an easy task because the scientific literature in this area is rather sparse. Perhaps the simplest approach is to consider academic search engines as the search products that locate scientific information on the web (Codina, 2007). A broad definition is needed because, as will be seen, there are engines that act only as specialized search engines, indexing their data directly from the web and returning a clickable list, such as CiteSeerX and Google Scholar; there are services that go beyond this to add elaborate and structured information, real assessment and benchmarking tools such as Microsoft Academic Search and AMiner, which incorporate functionalities to rank and measure scientific activity; and there are services that lean entirely on pre-processed data from secondary sources, such as BASE or Q-Sensei Scholar, as opposed to systems that rely entirely on their own means, such as CiteSeerX and Google Scholar. This range of types and approaches arises from the small number of initiatives now on the web, and because each one starts from a particular view of academic web search. Thus, although it is difficult to establish an outline definition, some criteria can be set to distinguish a simple bibliographic database on the web from a real academic search engine. Accordingly, such search services should be free web-based search services that incorporate added-value elements (citations, indicators, and so on) which allow their use for research evaluation, and should be open to different typologies of research results such as pre-prints, patents, presentations or teaching materials available on the web. Not all the products analysed in this book fit this definition, but it is desirable that these elements are present in new and future developments.
Challenges for an academic search engine
Web-based products
As outlined above, the definition of an academic search engine is marked by its web context, which makes it different to traditional bibliographic databases. This singularity introduces some particularities inherent to the web environment. For instance, these engines are freely accessible on the web and therefore their contents can reach a broader audience, which increases the popularization of science and the appreciation of scientific activity. However, this also brings greater public exposure and can prompt criticism of their functioning, coverage, searching and so on, as well as enabling the emergence of competing developments.
A further aspect arising from the web context is that the volume of information has grown exponentially – hundreds of millions of web pages and documents related to scientific issues are now accessible. The enormous scale at which these academic search engines operate leaves little room for manual data processing, so new automatic treatments are necessary in order to classify and index the amount of information generated. From a technical point of view, this is the sticking point of these search engines, because the quality of the services relies on the skill of developing autonomous processes that properly structure the data generated with as few mistakes as possible. The birth of autonomous citation indexes, the development of advanced parsing tools and the design of robust bots and harvesters are technical advances which help the implementation of these engines but which, without a doubt, introduce failures in the citation counts, the identification of document elements and the disambiguation of names and titles.
These technical challenges are bigger still when the web information is not technically structured and presents multiple formal and informal scientific typologies. Unlike scientific databases, where the characteristics of each document are well-defined (author, title, venue, etc.) and the type of document is homogeneous (articles, notes, letters, etc.), the web gathers a great variety of documents (articles, presentations, theses, etc.) in unstructured formats where it is very difficult to distinguish between author, title, abstract, etc. This then requires a great technical effort to develop advanced parsing techniques that allow us to obtain the precise information on each document, as well as a wide and flexible data framework that enables the inclusion and characterization of any scientific document.
Overcoming such problems is a necessary challenge for these initiatives because the quality of their product depends hugely on the technical solutions to these issues. In this sense, the way in which these matters are addressed makes these search services clear competitors of the traditional scientific databases due to their greater coverage and their ease of access.
Scientific users
An academic search engine is a specialized product which not only covers scientific information but is focused on a concrete user type: scientists. These users require search services which offer a range of instruments that not only allow them to locate precise and relevant information but that can also evaluate the quality of the information.
Unlike other users, scholars are simultaneously consumers and creators of content, which makes them more critical regarding coverage, the way in which the documents are indexed and the assignment of papers to authors – precisely because they can see how many of their own papers are indexed, whether they are correctly assigned to them, and whether their names are properly disambiguated. These personal insights can be one of the most important criteria for the refusal or acceptance of the use of a particular engine by a particular researcher.
The format and definition of the document type is very important, because the validity of its content is determined by the way in which the document originated. Unlike general-interest search engines, where the content is more important than the container, in academic search engines the document typology is a quality indicator of the information that it contains. For a scientist, the...
Publication date (per publisher) | 2 October 2014 |
---|---|
Language | English |
Subject area | Humanities ► Linguistics / Literary Studies |
 | Social Sciences ► Communication / Media ► Book Trade / Librarianship |
 | Business ► Business Administration / Management ► Corporate Management |
 | Business ► Business Administration / Management ► Business Informatics |
ISBN-10 | 1-78063-472-2 / 1780634722 |
ISBN-13 | 978-1-78063-472-2 / 9781780634722 |