Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/93686
| Title: | Using latent semantic analysis to cluster pages in web browser history |
| Authors: | Felice Sant Cassia, Chiara (2014) |
| Keywords: | Browsers (Computer programs) Latent semantic indexing Web sites Information retrieval |
| Issue Date: | 2014 |
| Citation: | Felice Sant Cassia, C. (2014). Using latent semantic analysis to cluster pages in web browser history (Bachelor's dissertation). |
| Abstract: | The web browser has become one of the most significant applications on any device. The accumulated browsing history of a user is a large repository of information that reflects a wide range of user interests. Users frequently need to revisit web pages, yet the history views offered by modern web browsers are poorly structured. We assume that any query submitted to a search engine is derived from a user interest and, on that basis, combine information extracted from web history with query terms. This is achieved using Latent Semantic Analysis, which transforms a document corpus into a reduced-dimensional semantic space by separating documents from the actual terms used and representing the extracted ideas as concepts. Queries are then represented as 'pseudo-document' vectors within this space, in an attempt to influence the significance of such query terms. The queries are clustered using a hierarchical clustering technique and transformed into a visualisation using a third-party application, aiming to accurately represent a user's navigational patterns on the web. For any given query cluster, all relevant web pages from history are retrieved. Because a large number of participants would be required to achieve statistically significant results, a conclusive evaluation of our project is not realistic. However, through some experiments we found that the system fares well in situations where user browsing is not heavily influenced by external sources, such as social media websites. This is because such web pages do not necessarily reflect a user interest, yet they are still given the same importance within the space. Future work in this area includes automatically identifying and filtering out the pages that contribute to 'noise'. |
| Description: | B.Sc. IT (Hons)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/93686 |
| Appears in Collections: | Dissertations - FacICT - 2014 |
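
The pipeline outlined in the abstract (Latent Semantic Analysis, queries folded in as pseudo-documents, hierarchical clustering, then retrieval of pages per query cluster) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the toy term-by-page matrix, the function names, the average-linkage criterion, the cosine measure, and the 0.5 similarity threshold are all assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def lsa_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix: A ~ U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Term basis, singular values, and one row of coordinates per document.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(query_vec, U_k, s_k):
    """Project a query's term-count vector into the space as a pseudo-document."""
    return (query_vec @ U_k) / s_k

def cosine_distance(a, b):
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def agglomerative(points, n_clusters):
    """Naive average-linkage hierarchical clustering (assumed linkage choice)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.mean([cosine_distance(points[a], points[b])
                             for a in clusters[i] for b in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

def relevant_pages(cluster, query_coords, doc_coords, threshold=0.5):
    """Pages whose cosine similarity to the cluster centroid meets the threshold."""
    centroid = np.mean([query_coords[i] for i in cluster], axis=0)
    return [d for d, vec in enumerate(doc_coords)
            if 1.0 - cosine_distance(centroid, vec) >= threshold]

# Hypothetical toy data. Rows are terms (python, code, recipe, pasta),
# columns are four pages from browsing history.
term_doc = np.array([[2.0, 1.0, 0.0, 0.0],
                     [1.0, 2.0, 0.0, 0.0],
                     [0.0, 0.0, 2.0, 1.0],
                     [0.0, 0.0, 1.0, 2.0]])
U_k, s_k, doc_coords = lsa_fit(term_doc, k=2)

# Four single-term queries, one per term, folded in as pseudo-documents.
queries = np.eye(4)
query_coords = np.array([fold_in(q, U_k, s_k) for q in queries])

clusters = agglomerative(query_coords, n_clusters=2)
first = next(c for c in clusters if 0 in c)  # cluster containing the "python" query
pages = relevant_pages(first, query_coords, doc_coords)
print(clusters, pages)
```

In this toy setup the programming queries and the cooking queries fall into separate clusters, and each cluster retrieves only the pages that share its concept. A page that reflects no user interest would still receive coordinates in the same space, which is the 'noise' effect the abstract mentions.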
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| B.SC.(HONS)ICT_Felice Sant Cassia_Chiara_2014.PDF | Restricted Access | 8.04 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
