Description: |
As the web continues to evolve, its users have gone from only consuming content to actively producing it, resulting in systems with highly dynamic, user-generated content that cannot be easily modelled with existing tools. In this paper we investigate two such systems, digg and reddit, which are social news sites that allow users to post links to other websites as well as to vote for them. We derive a general model for these systems and show how it can be used to improve their efficiency as well as that of other systems with similar characteristics. To this end, we have collected data on hundreds of thousands of posts and member profiles from both sites. We analyse the data to understand how content is generated and how the popularity of a post evolves over time. We combine the results of this analysis with user-location information to derive a general model that describes user posting behaviour across different time zones. We further demonstrate how this model can be used for efficient replication and caching, improving the performance of these systems. More importantly, the periodic trends inherent in the model apply not only to these news sites but also to applications as varied as chat and online gaming servers, peer-to-peer content distribution and energy-efficient load balancing. We end by showing how the derived model can be used to improve some of these systems. |