Please use this identifier to cite or link to this item: /library/oar/handle/123456789/93686
Title: Using latent semantic analysis to cluster pages in web browser history
Authors: Felice Sant Cassia, Chiara (2014)
Keywords: Browsers (Computer programs)
Latent semantic indexing
Web sites
Information retrieval
Issue Date: 2014
Citation: Felice Sant Cassia, C. (2014). Using latent semantic analysis to cluster pages in web browser history (Bachelor's dissertation).
Abstract: The web browser has become one of the most significant applications on any device. The accumulated browsing history of a user is a large repository of information that reflects a wide range of that user's interests. Users frequently need to revisit web pages, yet the history views offered by modern web browsers are poorly structured. We assume that any query submitted to a search engine derives from a user interest and, on that basis, combine information extracted from web history with query terms. This is achieved using Latent Semantic Analysis, which transforms a document corpus into a reduced-dimensional semantic space by separating documents from the actual terms used and representing the extracted ideas as concepts. Queries are then represented as 'pseudo-document' vectors within this space, in an attempt to influence the significance of such query terms. The queries are clustered using a hierarchical clustering technique and transformed into a visualisation using a third-party application, with the aim of accurately representing a user's navigational patterns on the web. For any given query cluster, all relevant web pages from history are retrieved. Because a large number of participants would be required to achieve statistically significant results, a conclusive evaluation of our project is not realistic. However, some experiments showed that the system fares well in situations where user browsing is not heavily influenced by external sources, such as social media websites. This is because such web pages do not necessarily reflect a user interest, yet they are still given the same importance within the space. Future work in this area includes automatically identifying and filtering out the pages that contribute to 'noise'.
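The pipeline outlined in the abstract (LSA over the history pages, queries folded in as pseudo-documents, hierarchical clustering of the queries, retrieval of pages per cluster) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the toy pages and queries, the raw term counts, the average-link merging rule, the cosine threshold of 0.5, and every function name below are assumptions made for illustration, and the visualisation step is omitted.

```python
import numpy as np

# Hypothetical toy data standing in for page text and search queries.
DOCS = [
    "python code tutorial",
    "python code examples",
    "python code",
    "pasta recipe sauce",
    "pasta recipe tomato",
]
QUERIES = ["python tutorial", "code examples", "pasta sauce", "tomato recipe"]

def build_term_doc(docs):
    """Build a term-by-document matrix of raw counts (assumed weighting)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return A, index

def lsa(A, k):
    """Truncated SVD: keep the k strongest concepts."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # rows of the last item are documents

def fold_in(query, index, U_k, s_k):
    """Project a query into concept space as a pseudo-document: q^T U_k S_k^-1."""
    q = np.zeros(U_k.shape[0])
    for w in query.lower().split():
        if w in index:  # terms unseen in the history are ignored
            q[index[w]] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def cluster(vectors, threshold=0.5):
    """Agglomerative average-link clustering; stops when no pair is similar enough."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(c1, c2):
        return np.mean([cosine(vectors[i], vectors[j]) for i in c1 for j in c2])

    while len(clusters) > 1:
        best, x, y = max(
            (sim(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if best < threshold:
            break
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters

def pages_for_cluster(members, query_vecs, doc_vecs, threshold=0.5):
    """Return indices of pages whose concept vector is close to the cluster centroid."""
    centroid = np.mean([query_vecs[i] for i in members], axis=0)
    return [j for j, d in enumerate(doc_vecs) if cosine(centroid, d) >= threshold]

A, index = build_term_doc(DOCS)
U_k, s_k, doc_vecs = lsa(A, k=2)
query_vecs = [fold_in(q, index, U_k, s_k) for q in QUERIES]
clusters = cluster(query_vecs)
```

On this toy data the programming queries and the cooking queries should fall into separate clusters, and calling `pages_for_cluster` on either cluster should return only the pages on that topic. A real history would need term weighting, stop-word removal, and a value of `k` chosen for the corpus, none of which the abstract specifies.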
Description: B.Sc. IT (Hons)(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/93686
Appears in Collections: Dissertations - FacICT - 2014

Files in This Item:
File: B.SC.(HONS)ICT_Felice Sant Cassia_Chiara_2014.PDF (Restricted Access)
Size: 8.04 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.