One possible approach is to personalize the web space: create a system that responds to user queries by aggregating information, potentially from several sources, in a manner that depends on who the user is. As a trivial example, a European querying on casinos is probably better served by URLs pointing to Monaco, whereas someone in North America should get URLs pointing to Las Vegas. Similarly, a biologist querying on cricket in all likelihood wants something other than what a sports enthusiast would. Existing commercial systems do some minimal personalization based on declarative information provided directly by the user, such as their zip code, keywords describing their interests, specific URLs, or even particular pieces of information they are interested in (e.g., the price of a particular stock). Our research aims at creating systems that (semi-)automatically tailor the content delivered to the user from a web site. We do so by mining the web: both its contents and the users' interactions with it.

Web mining, viewed in data mining terms, can be said to have three
operations of interest: clustering (finding natural groupings of users,
pages, etc.), associations (which URLs tend to be requested together),
and sequential analysis (the order in which URLs tend to be accessed).
As in most real-world problems, the clusters and associations
in Web mining do not have crisp boundaries and often overlap
considerably. In addition, bad exemplars (outliers) and incomplete data
can easily occur in the data set, due to a wide variety of reasons
inherent to web browsing and logging. Thus, Web mining and personalization
require modeling an unknown number of overlapping sets in the presence
of significant noise and outliers. Moreover, the data sets in Web mining
are extremely large.
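To make the association and sequential operations more concrete, here is a minimal sketch in Python over hypothetical sessions, where each session is assumed to be an already-cleaned, ordered list of requested URLs. The names `pair_support` and `transition_counts` are illustrative only, not part of any system described here. The first counts how often unordered URL pairs are requested in the same session (plain pair counting, not a full Apriori-style association miner); the second counts first-order "next page" transitions as the simplest form of sequential analysis.

```python
from collections import Counter
from itertools import combinations


def pair_support(sessions):
    """Association sketch: fraction of sessions in which each unordered URL pair co-occurs."""
    counts = Counter()
    for session in sessions:
        urls = sorted(set(session))  # ignore repeats and order within a session
        counts.update(combinations(urls, 2))
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items()}


def transition_counts(sessions):
    """Sequential sketch: how often URL b is requested immediately after URL a."""
    counts = Counter()
    for session in sessions:
        counts.update(zip(session, session[1:]))  # consecutive (a, b) pairs
    return counts


# Hypothetical sessions, for illustration only.
sessions = [
    ["/home", "/casinos", "/monaco"],
    ["/home", "/casinos", "/las-vegas"],
    ["/home", "/cricket", "/insects"],
    ["/casinos", "/monaco"],
]

support = pair_support(sessions)
transitions = transition_counts(sessions)
```

On this toy data, the pairs `("/casinos", "/home")` and `("/casinos", "/monaco")` each co-occur in half of the sessions, and a request for `/home` is followed directly by `/casinos` twice. Real logs would also need sessionization and robust handling of the noise and incompleteness discussed above, which this sketch does not attempt.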
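One standard way to model overlapping groups in the presence of outliers is to combine fuzzy c-means memberships with a "noise cluster" (Davé's noise clustering). The text does not state which algorithm is used here, so the following is only an illustrative Python sketch of that general technique, not the authors' method. Given fixed cluster centers and an assumed noise distance `delta` (a hypothetical tuning parameter), it computes each point's graded membership in every cluster plus its membership in the noise cluster. It shows only the membership step. A full algorithm would alternate this with center updates and, unlike this sketch, would not assume the number of clusters is known in advance.

```python
import math


def noise_fuzzy_memberships(points, centers, delta, m=2.0):
    """Fuzzy c-means memberships extended with a noise cluster.

    The noise cluster is treated as being at the same distance `delta`
    (an assumed parameter) from every point, so points far from all
    centers give most of their membership to noise instead of pulling
    the real clusters toward them.
    """
    p = 1.0 / (m - 1.0)  # fuzzifier exponent; m > 1 controls overlap
    result = []
    for x in points:
        # Inverse squared distances raised to 1/(m-1); the floor avoids division by zero.
        weights = [(1.0 / max(math.dist(x, c) ** 2, 1e-12)) ** p for c in centers]
        noise_weight = (1.0 / delta ** 2) ** p
        total = sum(weights) + noise_weight
        memberships = [w / total for w in weights]  # one value per real cluster
        result.append((memberships, noise_weight / total))
    return result


# Hypothetical 2-D data: one point between two clusters, one far outlier.
centers = [(0.0, 0.0), (4.0, 0.0)]
points = [(2.0, 0.0), (50.0, 50.0)]
result = noise_fuzzy_memberships(points, centers, delta=3.0)
```

Here the midpoint `(2.0, 0.0)` belongs equally to both clusters (about 0.41 each, with about 0.18 going to noise), which is the kind of overlap crisp clustering cannot express, while the outlier `(50.0, 50.0)` assigns nearly all of its membership to the noise cluster.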