Designing New Crawling and Indexing Techniques for Web Search Engines. A New Generation of Web Search Engine

Qingzhao Tan

     

Paperback



Publisher: Книга по требованию
Release date: July 2011
ISBN: 978-3-6392-0400-1
Length: 156 pages
Weight: 258 g
Dimensions (H x W x D), cm: 23 x 16 x 1

This thesis studies two problems in Web search engines: how a crawler with limited computing resources can effectively crawl the dynamically changing Web and acquire the most up-to-date Web documents, and how a search engine can provide information-object-oriented indexing methods that let users retrieve the desired information with high accuracy and efficiency. To address the first problem, we design a set of sampling policies with varying download granularity, taking into account link structure, directory structure, and content-based features, the last of which includes a clustering technique. We further extend the clustering-based sampling approach by testing more dynamic features and strategically selecting samples from each cluster. For the second problem, we propose building indexes on metadata extracted from various information objects instead of on the whole document. We set up a digital library named ArchSeer for the domain of archeology. ArchSeer allows users to retrieve archeology literature via domain-specific search engines.

This edition is not an original. The book is printed using print-on-demand technology after an order is received.
