Significance of Search Logs in Crawling

Karishma

doi:10.17577/IJERTV1IS5032

Volume 01, Issue 05 (July 2012)


Published (First Online): 02-08-2012
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Significance of Search Logs in Crawling

Karishma

Asst. Prof.

J B Knowledge Park, Faridabad, Haryana, India

Abstract

On most sites, users turn to on-site search after they have scanned the page for clues to the content they are seeking. A quality search experience is critical to keeping users satisfied. With careful instrumentation and reporting, you can build benchmarks for your search that help identify problems quickly and make search faster and more efficient. The place to start is the search log. How hard is search? How much does personalization help? The answers lie in the search logs. Effective organization of search results is critical for improving the utility of any crawler. In this paper we discuss the significance of search logs for crawlers. Using the contents of search logs, a crawler can refine the search, retrieve data more quickly, and personalize the search according to user needs and interests.

Keywords

Search logs, Personalization, Crawlers, Spiders.

Introduction

The utility of a crawler (search engine) is affected by multiple factors [1]. While the primary factor is the soundness of the underlying retrieval model and ranking function, how search results are organized and presented is also a very important factor that can affect the utility of a search engine significantly. Compared with the vast amount of literature on retrieval models, however, there is relatively little research on how to improve the effectiveness of search result organization. A search engine typically ranks the retrieved results and places the most suitable and relevant results at the top [3]. However, when the search results are diverse (e.g., due to ambiguity or multiple aspects of a topic), as is often the case in Web search, a ranked list presentation is not effective. By analyzing search patterns in the log data, search results can be optimized.
Search logs

Search engine logs [11] record the activities of Web users, which reflect the users' actual needs or interests when conducting Web search. They generally contain the following information: the text queries that users submitted, the URLs that they clicked after submitting the queries, and the time when they clicked. Search engine logs are separated by sessions. A session includes a single query and all the URLs that a user clicked after issuing the query. A small sample of search log data is shown in Table 1.

Table 1: Sample entries of search engine logs. Different IDs mean different sessions.

ID  Query       URL                            Time
1   win zip     http://www.winzip.com          xxxx
1   win zip     http://www.swinzip.com/winzip  xxx
2   time zones  http://www.timeanddate.com     xxxx

The process of recording the data in the search log is relatively straightforward. Web servers record and store the interactions between searchers (in practice, Web browsers on a particular computer) and search engines in a log file (the transaction log) on the server using a software application. Thus, most search logs are server-side recordings of interactions. Major Web search engines execute millions of these interactions per day. The server software application can record various types of data and interactions, depending on the file format that the server software supports.
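The session structure described above can be illustrated with a minimal Python sketch. It assumes a tab-separated log line with the four fields of Table 1 (ID, query, URL, time); real server log formats vary, and the names LogEntry, parse_line, and group_by_session are hypothetical helpers introduced only for this illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    session_id: str
    query: str
    url: str
    time: str


def parse_line(line: str) -> LogEntry:
    """Parse one tab-separated log line: ID, query, URL, time (assumed layout)."""
    session_id, query, url, time = line.rstrip("\n").split("\t")
    return LogEntry(session_id, query, url, time)


def group_by_session(entries):
    """Map each session ID to its single query and the URLs clicked for it."""
    sessions = defaultdict(lambda: {"query": None, "clicks": []})
    for entry in entries:
        sessions[entry.session_id]["query"] = entry.query
        sessions[entry.session_id]["clicks"].append(entry.url)
    return dict(sessions)


# Rows from Table 1; the time values are elided in the source,
# so they are kept as placeholders here.
raw_lines = [
    "1\twin zip\thttp://www.winzip.com\txxxx",
    "1\twin zip\thttp://www.swinzip.com/winzip\txxx",
    "2\ttime zones\thttp://www.timeanddate.com\txxxx",
]

sessions = group_by_session(parse_line(line) for line in raw_lines)
```

Here sessions["1"] holds the query "win zip" together with both clicked URLs, which matches the definition of a session as one query plus all URLs clicked after it.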

Search logs are records of the user requests for information from our index [12]. We can generate and export this information and then use it as input to our preferred log analysis or reporting software.

For Web searching, a search log is an electronic record of the interactions that occurred during a searching episode between a Web search engine and users searching for information on that search engine.

These are some examples of the information that search logs can provide:

What types of queries are users making?

How fast are users being served?

Are users getting the results they need?

Do you need to help users find relevant information by configuring the Related Queries, Key Match, Query Expansion, or One Box features?

On the basis of this information, links are prioritized, so that when a user later issues a query on a related topic, more relevant information is retrieved.
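One simple way to turn log data into link priorities is to count clicks. The sketch below is an illustration under assumptions, not a method taken from the cited work: the (query, URL) pairs are hypothetical sample data, and click frequency is used as a stand-in for relevance.

```python
from collections import Counter, defaultdict

# Hypothetical (query, clicked URL) pairs extracted from a search log.
clicks = [
    ("win zip", "http://www.winzip.com"),
    ("win zip", "http://www.winzip.com"),
    ("win zip", "http://www.swinzip.com/winzip"),
    ("time zones", "http://www.timeanddate.com"),
]


def top_queries(pairs, n=10):
    """Return the n queries that attracted the most clicks."""
    return Counter(query for query, _ in pairs).most_common(n)


def prioritize_links(pairs):
    """For each query, order the clicked URLs from most to least clicked."""
    per_query = defaultdict(Counter)
    for query, url in pairs:
        per_query[query][url] += 1
    return {
        query: [url for url, _ in counts.most_common()]
        for query, counts in per_query.items()
    }


priorities = prioritize_links(clicks)
```

With this sample, priorities["win zip"] lists http://www.winzip.com first because it received two of the three clicks for that query.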
ANALYSIS OF SEARCH LOGS TO IMPROVE SEARCH

A number of studies have indirectly compared successful and less successful search strategies by comparing expert and novice searchers in lab studies. Recently, researchers have also begun to use data from search engine logs to identify metrics that are related to users' search success. These studies have provided some promising findings, but the noisiness of the log data makes it hard to determine whether the searchers were successful and which signals are specific to which kinds of tasks. Next, we discuss improving search based on the following factors [13].
In most personalized search strategies, each user has a distinct profile, and the profile is used to personalize search results for that user. There are also approaches that personalize search results according to the preferences of a community of like-minded users. These approaches are called community-based personalized web search or collaborative web search. In a community-based personalized web search, when a user issues a query, the search histories of users with similar interests are used to filter or re-rank the search results. For example, documents that the community has selected for the target query or for similar queries are ranked higher in the results list.
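The community-based idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: user similarity is measured with Jaccard overlap of clicked URLs (one of many possible choices), only exact query matches are used, and all users, queries, and example.com URLs are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def all_clicks(user_history):
    """Collect every URL a user clicked, across all of that user's queries."""
    urls = set()
    for clicked in user_history.values():
        urls |= clicked
    return urls


def community_rerank(results, target_user, query, history):
    """Re-rank results using clicks of users similar to target_user.

    history maps user -> {query -> set of clicked URLs}.
    """
    target_urls = all_clicks(history.get(target_user, {}))
    scores = defaultdict(float)
    for user, user_history in history.items():
        if user == target_user:
            continue
        similarity = jaccard(target_urls, all_clicks(user_history))
        for url in user_history.get(query, set()):
            scores[url] += similarity
    # sorted() is stable, so ties keep the engine's original order.
    return sorted(results, key=lambda url: -scores.get(url, 0.0))


history = {
    "u1": {"cad tools": {"http://example.com/docs", "http://example.com/api"}},
    "u2": {
        "cad tools": {"http://example.com/docs", "http://example.com/api"},
        "acme widgets": {"http://example.com/spec"},
    },
    "u3": {
        "gift ideas": {"http://example.com/gifts"},
        "acme widgets": {"http://example.com/shop"},
    },
}
results = [
    "http://example.com/shop",
    "http://example.com/spec",
    "http://example.com/news",
]
reranked = community_rerank(results, "u1", "acme widgets", history)
```

Because u2 shares click history with u1 and u3 does not, the page that u2 selected for "acme widgets" moves to the top of the list for u1.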
PERSONALIZING THE SEARCH EXPERIENCE

Rather than providing one centralized search experience for everyone, you can provide different search experiences for different groups of users. Each personalized search experience is based on the interests [4], roles, departments, locations, or languages of the user group.

Users often search for "acme widgets," but they are not all searching for the same results. More typically, when searching for acme widgets:

Engineering staff is searching for design documents and status information

Sales staff is searching for sales forecasts and reports

Customer support is searching for support metrics and update information

With a centralized search experience, some users may find what they are looking for at the top of the results listings, while other users might have to view several results before finding what they are looking for. With a personalized search experience [5]:

Each group of users has a unique search experience where results are ranked according to their interests

Users find what they are looking for at the top of the search results.
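The "acme widgets" scenario can be sketched as a group-aware re-ranking step. The group names follow the example above; the document type labels, the GROUP_PREFERENCES table, and the personalize function are assumptions made for this illustration, not part of any particular search product.

```python
# Hypothetical mapping from user group to the document types it prefers.
GROUP_PREFERENCES = {
    "engineering": {"design document", "status"},
    "sales": {"sales forecast", "report"},
    "support": {"support metrics", "update"},
}


def personalize(results, group):
    """Move results whose type the group prefers to the top.

    sorted() is stable, so the original ranking is kept within the
    preferred and non-preferred partitions.
    """
    preferred = GROUP_PREFERENCES.get(group, set())
    return sorted(results, key=lambda result: result["type"] not in preferred)


# Centralized ranking for the query "acme widgets" (sample data).
results = [
    {"title": "Acme widgets Q3 sales forecast", "type": "sales forecast"},
    {"title": "Acme widgets design document", "type": "design document"},
    {"title": "Acme widgets support metrics", "type": "support metrics"},
]

for_engineering = personalize(results, "engineering")
for_support = personalize(results, "support")
```

With the same query and the same underlying results, engineering staff see the design document first and customer support sees the support metrics first.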
APPROACHES FOR PERSONALIZATION: RELATED WORK

The basic approach for personalization is:
1. Build a user model that captures the user's interests from the click history.
2. Apply a learning strategy to analyze this history.
3. After the analysis, apply a ranking mechanism or another categorization method to the search log to personalize search.
To realize personalized web search, a search engine needs to apply a different semantic expansion to each user's queries based on that user's personal information, which affects the user's whole search process. A common method is to construct a user profile model from the user's personal information and then apply the profile to influence the user's search results. A variety of methods can be used to construct the user profile model from the personal information collected from users. Different methods not only have different characteristics and performance but also have different effects on the personalized search results. First, the user profile can be described as a set of rules. Second, the user profile can be described as a set of keywords; the correlation of a web page with the user's interests can then be determined by calculating the distribution of these keywords in the page. Third, the user's personalized information can be represented in a vector space; after a web page is transformed into a corresponding vector in that space, it can be determined whether the page is relevant to the user profile. Fourth, the user profile can be represented as a probability table of user interests and their corresponding keywords; this table can be used to determine the probability of relevance between a user interest and a web page. The vector space model can express user profiles flexibly; moreover, many information retrieval and machine learning methods are based directly on the vector space model.
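The vector space representation can be illustrated with a short Python sketch. It assumes raw term counts as vector components and cosine similarity as the relevance measure (weighting schemes such as TF-IDF are common alternatives); the page texts are hypothetical sample data.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a text as a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(clicked_texts):
    """Step 1 of the basic approach: sum the vectors of clicked pages."""
    profile = Counter()
    for text in clicked_texts:
        profile.update(to_vector(text))
    return profile


def rank_pages(profile, pages):
    """Order candidate pages by similarity to the user profile."""
    return sorted(pages, key=lambda p: cosine(profile, to_vector(p)), reverse=True)


# Hypothetical click history and candidate results for the query "python".
profile = build_profile([
    "python tutorial for data analysis",
    "python pandas data frames",
])
candidates = [
    "python snake habitat and diet",
    "python data science tutorial",
]
ranked = rank_pages(profile, candidates)
```

For this user, whose history is about programming, the data science page is ranked above the page about snakes even though both match the query term.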

Three user profile modelling methods [8] have been proposed to realize personalized web search: the Rocchio method, the k-Nearest Neighbor (kNN) method, and the Support Vector Machine method. To measure and compare the search performance of these three methods, a domain dataset was constructed, and the user profile modelling methods were tested on it. Experimental results [2] show that the k-Nearest Neighbor and Support Vector Machine methods perform comparably well. In addition, the kNN method is more robust and easier to use. Therefore, the kNN method is a better way to construct a user profile model for personalized web search.
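To make the kNN idea concrete, the following Python sketch classifies a page as relevant or not by a majority vote among the k most similar pages the user has already judged. It is a generic kNN illustration, not the implementation evaluated in the cited experiments; the cosine similarity over raw term counts, the value k=3, and the labelled sample pages are all assumptions.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a text as a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_is_relevant(page_text, labelled_pages, k=3):
    """Majority vote among the k labelled pages most similar to page_text.

    labelled_pages is a list of (text, is_relevant) pairs, for example
    derived from pages the user did or did not click.
    """
    target = to_vector(page_text)
    neighbours = sorted(
        labelled_pages,
        key=lambda pair: cosine(target, to_vector(pair[0])),
        reverse=True,
    )[:k]
    votes = sum(1 for _, is_relevant in neighbours if is_relevant)
    return votes > len(neighbours) / 2


# Hypothetical profile data: this user reads football pages, not finance.
labelled = [
    ("football match results", True),
    ("football league table", True),
    ("football transfer news", True),
    ("stock market report", False),
    ("bond market prices", False),
    ("market interest rates", False),
]

likes_cup = knn_is_relevant("football cup results", labelled)
likes_stocks = knn_is_relevant("stock market prices", labelled)
```

The profile here is simply the set of labelled examples, which is part of why kNN is easy to use: adding a newly clicked page updates the model without retraining.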

Another approach is to personalize the search based on user behaviour [3]. Because users find it difficult to express their needs effectively, and those needs are difficult to analyze, it is hard for a personalized search engine to collect users' personal information directly. Instead of asking the user to express his or her needs, the system can analyze the user's behavior from the usage history and derive useful personal information from it. The search system should therefore be able to detect the user's direction of interest directly and accurately, without user intervention, so that users can find information that is more accurate and more in line with their needs. With the development of the network, online advertising has moved from showing the same ads to all users toward targeted delivery. The behavior-targeting ad model is built around users' interests: it tracks the pages a user visited during a certain period and then determines the user's interests from the content, time, and frequency of the visits and from whether a page was bookmarked. Under long-term tracking, it can even identify the user's long-term interests, short-term interests, and changes in interest, and adjust the ads in time so that users always see something they find interesting. The same idea can be applied in a search engine to collect personalized information. Of course, a user usually has more than one interest, and interests are not static, so the algorithm must be improved continuously to approach the best possible results. Given the complexity of search engine technology, collecting the user's interests in this way is not enough; the engine also needs to give targeted feedback according to the user's personal interests.
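One possible way to combine content, time, frequency, and bookmarking into interest scores is an exponentially decayed sum over page visits. The Python sketch below is an assumption-laden illustration: the visit fields, the dwell-time weighting, the bookmark bonus, and the half-life values are all hypothetical choices, and a short half-life is used to approximate short-term interest while a long one approximates long-term interest.

```python
from collections import defaultdict


def interest_scores(visits, now_day, half_life_days, bookmark_bonus=2.0):
    """Score each topic from visit history with exponential time decay.

    Each visit is a dict with keys: topic, day, seconds, bookmarked.
    Frequent visits add up, longer visits weigh more, bookmarked pages
    get a bonus, and older visits count less.
    """
    scores = defaultdict(float)
    for visit in visits:
        age_days = now_day - visit["day"]
        decay = 0.5 ** (age_days / half_life_days)
        weight = visit["seconds"] / 60.0  # dwell time in minutes
        if visit["bookmarked"]:
            weight *= bookmark_bonus
        scores[visit["topic"]] += weight * decay
    return dict(scores)


# Hypothetical history: a steady older interest in cameras and a
# recent burst of interest in flights.
visits = [
    {"topic": "cameras", "day": 0, "seconds": 120, "bookmarked": True},
    {"topic": "cameras", "day": 10, "seconds": 120, "bookmarked": False},
    {"topic": "cameras", "day": 20, "seconds": 120, "bookmarked": False},
    {"topic": "flights", "day": 58, "seconds": 60, "bookmarked": False},
    {"topic": "flights", "day": 59, "seconds": 60, "bookmarked": False},
    {"topic": "flights", "day": 60, "seconds": 60, "bookmarked": False},
]

short_term = interest_scores(visits, now_day=60, half_life_days=3)
long_term = interest_scores(visits, now_day=60, half_life_days=90)
```

Comparing the two score tables over time gives a simple signal for a change of interest: in this sample, flights dominate the short-term scores while cameras still lead the long-term ones.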

There also exist approaches to personalization based on user preferences, user interests [10], and so on.
CONCLUSION

We have discussed the effect of using the content of search logs in crawling. Search log data can be used to retrieve information from the web effectively. Search results can also be personalized with search log data by collecting user preferences, interests, or behavior. In closing, we conclude that search logs can be used to personalize search for effective and relevant retrieval of information from the web, and that there can be various other methods of analyzing the search log.
References

[1] A. Kritikopoulos and M. Sideri, The Compass Filter: Search Engine Result Personalization using Web Communities, Lecture Notes in Computer Science, vol. 3169, pp. 229-240, 2005.
[2] Chunyan Liang, User Profile for Personalized Web Search, in Proceedings of the 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2011.
[3] Sugiyama K., Studies on Improving Retrieval Accuracy in Web Information Retrieval. Tokyo: Nara Institute of Science and Technology, 2004.
[4] Ding Zhenfan, Deng Lei, Personalized Information Recommendation Service and Personalized Search Engines, Software Space, 2009-12, 205-206.
[5] Lee Aoki, Cui North light. Based on personalized information recommendation services, Web search engine technology Summary of Information. 2007-8, 98-101.
[6] Seher, I., Ginige, A., Shahrestani, S.A., A personalized query expansion approach using context, 3rd IET International Conference on Intelligent Environments, 2007.
[7] Wang, G.T., Xie, F., Tsunoda, F., Maezawa, H., Onoma, A.K., Web search with personalization and knowledge, IEEE International Conference on Multimedia Software Engineering, 2002.
[8] Hany M. Harb, Ahmed R. Khalifa, Hossam M. Ishkewy, Personal Search Engine Based on User Interests and Modified Page Rank, July 1, 2009.
[9] Fang Liu, Weiyi Meng, Personalized Web Search for Improving Retrieval Effectiveness, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, January 2004.
[10] Li, S.Q. and Han, Z.Y., Principles & Technique of Personalized Search Engine, Science Press, 2008.
[11] Yuan Hong, Xiaoyun He, Jaideep Vaidya, Nabil Adam, and Vijayalakshmi Atluri, Effective anonymization of query logs, in CIKM, 2009.
[12] Michaela Götz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, and Johannes Gehrke, Publishing Search Logs: A Comparative Study of Privacy Guarantees, IEEE Transactions on Knowledge and Data Engineering, 2011.
[13] Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mishra, and Alexandros Ntoulas, Releasing search queries and clicks privately, in WWW, 2009.
