Social Bookmarking, Content Indexing, and Online Personalized Services
Zhang Liang (张亮)
(Department of Computer Science and Information Technology, Fudan University, 200433)
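Abstract Social bookmarking is a new content indexing method that emerged on the Web in 2004. Compared with the existing approaches of professional cataloguing and user-supplied metadata, social bookmarking has attracted wide attention and popularity because it is convenient and practical, and it is regarded as the information infrastructure of the next-generation Web. To realize this goal, information technology must advance accordingly: it has to mine, from the free-form tags that users assign to content, the essential meaning of the tags and the implicit relations among them, and use the mined results to index and recommend objects that match users' information needs. To this end, this paper proposes a tag analysis technique that extracts common and essential annotation information from the personal tags supplied by users and, on that basis, indexes digital objects, characterizes users' information needs, and delivers online personalized services. By applying higher-order singular value decomposition, the technique copes effectively with the insufficient-information problem in social bookmarking and with the diversity of user interests. It thereby combines the strengths of earlier content-based and collaborative recommendation and can meet users' short-term and long-term information needs. Experiments on a dataset from del.icio.us, a mainstream social bookmarking website, show that the proposed technique achieves better recommendation performance than previous methods.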
Abstract Social bookmarking has come to the fore since 2004. Because it has extremely low barriers to use and can organize multicultural metadata, it is considered a brand-new information infrastructure on the Web and a promising alternative framework for building digital libraries. To achieve this potential, the substantial correlations among user-provided tags need to be extracted. In this paper, we propose a technique to distill the crucial points of users' personal metadata. Based on the extracted information, we can maintain a profile of each user's interests from his or her personal tags and make recommendations accordingly. The proposed approach balances the strengths of content-based and collaborative recommendation and satisfies both short-term and long-term recommendation requirements. Experiments on data from the most famous social bookmarking website, del.icio.us, demonstrate the superiority of our method.
People are overwhelmed by a glut of information on the Web. Personalized recommendation is widely used to address this information overload, and collaborative filtering (CF) is one of the most successful recommendation techniques to date. It recommends items to a user according to the explicit or implicit ratings given by other users with similar interests. However, CF becomes less effective when users have multiple interests, because users who have similar taste in one aspect may behave quite differently in others. Thanks to the burgeoning Web application known as social bookmarking, we can learn not only what a user likes but also why he or she likes it, from the tags he or she assigns to digital entities. By analyzing this information, we can distill the interrelations among different users' various interests and make better personalized recommendations based on them. In this paper, the information attached to social bookmarks is represented by a third-order user-tag-item tensor. We propose a division algorithm to overcome the tensor's sparsity problem and perform cubic analysis using the higher-order singular value decomposition (HOSVD). Our analysis indicates that the approach can automatically capture the latent correlations inherent in the tensor, which is of great value for personalized recommendation. Experiments on data from the most famous social bookmarking website, del.icio.us, demonstrate the superiority of our method over traditional CF methods.
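To make the user-tag-item tensor and its HOSVD more concrete, the following is a minimal sketch in Python with NumPy. It is not the implementation used in this paper: the toy bookmarks, the binary encoding, the truncation ranks, and the helper names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) are all assumptions made for illustration, and the division algorithm for handling sparsity is not shown.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Return the core tensor and one factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Map the core tensor back to the original user-tag-item space."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx

# Hypothetical toy data: (user, tag, item) triples, one per bookmark.
bookmarks = [
    (0, 0, 0), (0, 0, 1),   # user 0 tags items 0 and 1 with tag 0
    (1, 0, 0),              # user 1 tags item 0 with tag 0
    (1, 1, 2), (2, 1, 2),   # users 1 and 2 tag item 2 with tag 1
    (2, 2, 3),              # user 2 tags item 3 with tag 2
]
tensor = np.zeros((3, 3, 4))  # 3 users x 3 tags x 4 items
for user, tag, item in bookmarks:
    tensor[user, tag, item] = 1.0

# Keep two latent dimensions per mode (an arbitrary choice for this sketch).
core, factors = truncated_hosvd(tensor, ranks=(2, 2, 2))
approx = reconstruct(core, factors)

# Score the unseen triple (user 1, tag 0, item 1) in the smoothed tensor.
print("score for user 1, tag 0, item 1:", approx[1, 0, 1])
```

In a sketch like this, entries of `approx` that were zero in `tensor` can be read as candidate recommendation scores, since truncation forces the reconstruction to share structure across users, tags, and items. How many dimensions to keep per mode is a tuning decision that the toy example does not address.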