Yearly Archives: 2014

Verification of Social Media Content for News

@SNOW/WWW, 2014, by Christina Boididou, Symeon Papadopoulos, Nic Newman, Steve Schifferes, Yiannis Kompatsiaris

Over the last few years, social media has become a primary news source. On a range of stories, such as disasters and political uprisings, social networks are routinely the place where news breaks first – through eyewitness reports via text, pictures and videos.

The Hudson River plane crash in 2009 was an early example of how a mobile phone picture can reach a global audience within seconds. Since then, user-generated content posted to social networks like Twitter, Facebook, and YouTube has shaped coverage of a variety of news events, including the death of Osama Bin Laden in 2011, the Japanese earthquake of 2011, and the popular uprisings in the Middle East. During Hurricane Sandy in 2012, users of Instagram, a social photo-sharing site, posted ten photographs a second of the devastation, with around half a million photos posted in total.

But amid this deluge of information, we also find a large number of fakes. Analysis by the Guardian newspaper suggested up to 15% of the most shared items during Hurricane Sandy were misleading or deliberately falsified. Many of these were picked up by the mainstream media and given added credibility – at least for a while.


Figure 1: Fake Hurricane Sandy pictures sent in social media

Over the years there have been a number of similar well-documented cases. Following the death of Osama Bin Laden in May 2011, ‘Photoshopped’ pictures purporting to show his dead body were distributed on social media and picked up by newspapers, news websites, and TV stations, potentially inflaming passions in the region. In the December 2012 Connecticut school shootings, news agencies and TV outlets misidentified the gunman to millions of readers and viewers, based on false information in social media. Following the Boston Marathon bombings in April 2013, social media tried to crowdsource the identification of the perpetrators, with unsatisfactory results.



Figure 2: Twitter users make wrong assumptions about the identity of the Boston Marathon bombers (April 2013)

Reddit, Twitter and 4Chan contained information that wrongly identified innocent people – causing great distress to these individuals as well as their friends and family. Posting fake pictures can also cause unnecessary alarm among the general public. Images that purported to show sharks swimming through New York City streets were posted at the height of Hurricane Sandy. A picture of a giant creature washed up on a California beach was linked to the effects of radiation from the 2011 Fukushima nuclear disaster.

Such incidents have led to increasing concerns about the reliability of news in social networks. Journalists are looking to quickly identify relevant information and trusted sources, but how can they also quickly spot misinformation, faked pictures, and the setting of false trails? Without tackling these issues the use of social media as a source of news will be increasingly problematic.

So what can be done? In other domains researchers have been able to help mitigate the effects of false and potentially damaging information. The problem of email and web spam in the 1990s led to a series of new and increasingly effective approaches to combating it. Link-based and content-based dependencies among Web pages were used (Castillo et al. [1]) to develop an automated way to predict and identify spam and irrelevant content. Other studies (Seo et al. [2]) noted that false claims in social media tend to come from a small number of unreliable sources – so research has focused on how to identify this small group of users.

But the news problem is proving particularly hard to crack. Previously reliable contributors in one domain may prove unreliable in another. An eyewitness to a news story may have a very limited footprint on which to base credibility decisions – and yet the content may be uniquely valuable. Content analysis may be confounded by the volume and speed of information and the range of formats now flowing through social media. In polarised situations like Syria and Ukraine, protagonists often look to use social media to spread false rumours and have become sophisticated in tampering with pictures and video to promote their point of view. These issues are hard to resolve algorithmically with any degree of certainty.

One promising approach to distinguishing between real and fake news images came in research by Gupta et al. [3], where the authors attempted to capture the patterns of fake Twitter content by applying classification models to tweet text and user features. On a dataset built around a single story, this approach achieved success levels of up to 97%, but this depended on training the classifier on that particular story, introducing considerable bias and overfitting into the learning process.
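To make the shape of such a classifier concrete, here is a minimal Python sketch of a tweet-plus-user feature vector of the kind this line of work uses. The specific features and field names below are our own illustrative assumptions, not the authors' implementation; in practice such vectors would feed a trained classification model.

```python
import re

def tweet_features(tweet):
    """Simple content features extracted from a tweet dict (illustrative only)."""
    text = tweet["text"]
    return {
        "length": len(text),
        "num_exclamations": text.count("!"),
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "has_firstperson": int(bool(re.search(r"\b(I|my|me)\b", text))),
    }

def user_features(user):
    """Simple account features (illustrative only)."""
    return {
        "followers": user["followers"],
        "friends": user["friends"],
        "account_age_days": user["account_age_days"],
    }

def feature_vector(tweet, user):
    """Concatenate content and user features into one vector."""
    feats = {}
    feats.update(tweet_features(tweet))
    feats.update(user_features(user))
    return feats

# Hypothetical fake-looking tweet from a young, low-follower account
tweet = {"text": "Shark swimming in the street!! http://example.com/pic"}
user = {"followers": 12, "friends": 300, "account_age_days": 4}
fv = feature_vector(tweet, user)
```

A real pipeline would compute these features over a labelled corpus and fit a standard classifier; the point here is only that both the message and its author contribute signals.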

In our research for SocialSensor, an EU research project, we attempted to replicate these results using pictures mined from social media around Hurricane Sandy and the Boston bombings. Fake images were identified from journalistic blogs to provide a ‘ground truth’ against which we could test various algorithms. While we identified false pictures in around 80% of the cases, this depended on the training set used to create the classifier. Applying the algorithms trained on one dataset (Hurricane Sandy) to the second (Boston Marathon) produced precision not much higher than a random baseline.

Working closely with journalists, however, as part of the project, we have built up a deep understanding of how newsrooms verify content from social media and which factors they take into account under different circumstances. Without wishing to understate the complexity of these processes, we think a number of these factors are applicable to a wide range of news stories and could help us to come up with a more generalisable approach that could at least get closer to the answer.

One hypothesis is that it would be valuable to add the geographic location of the user as a key factor in determining the relevance of content in a breaking news situation. The time the tweet was posted and its proximity to the beginning of the story is another key element that a journalist would use to assess veracity and in most cases can be determined automatically. Finally, we believe the incorporation of features from appropriately selected terms (based on statistical analysis of an independent set of fake and real tweets) will also carry considerable predictive power.
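The three proposed cues – geographic proximity, time since the story broke, and statistically selected terms – can be sketched as features. Everything below (the term list, locations, thresholds) is a hypothetical illustration of the idea, not the project's actual feature set.

```python
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

# Placeholder list; in the approach described, such terms would be selected
# by statistical analysis of an independent set of fake and real tweets.
INDICATIVE_TERMS = {"shocking", "unbelievable", "share"}

def breaking_news_features(post_time, post_loc, event_start, event_loc, text):
    """Geo, time, and term features for one post about a breaking story."""
    words = set(text.lower().split())
    return {
        "minutes_after_start": (post_time - event_start).total_seconds() / 60,
        "distance_km": haversine_km(*post_loc, *event_loc),
        "indicative_term_hits": len(words & INDICATIVE_TERMS),
    }

# Hypothetical post 90 minutes after a storm makes landfall in New York
feats = breaking_news_features(
    post_time=datetime(2012, 10, 29, 19, 30),
    post_loc=(40.71, -74.01),
    event_start=datetime(2012, 10, 29, 18, 0),
    event_loc=(40.71, -74.01),
    text="Unbelievable shark photo, share this",
)
```

A post far from the event, long after it began, and loaded with attention-grabbing terms would score very differently from an on-the-spot eyewitness report.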

Even without these features, we hope that our work already sets out an experimental framework for assessing the performance of computational verification approaches on social multimedia which will benefit the wider research community – and we have published this on GitHub. We welcome feedback and collaboration to further work on this exciting challenge.

[1] Castillo, C., Donato, D., Gionis, A., Murdock, V., & Silvestri, F. (2007, July). Know your neighbors: Web spam detection using the web topology. In Proceedings of the 30th ACM SIGIR Conference (pp. 423-430). ACM.
[2] Seo, E., Mohapatra, P., & Abdelzaher, T. (2012, May). Identifying rumors and their sources in social networks. In SPIE Defense, Security, and Sensing (pp. 83891I-83891I). International Society for Optics and Photonics.
[3] Gupta, A., Lamba, H., Kumaraguru, P., & Joshi, A. (2013, May). Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 729-736). ACM.

Trends of News Diffusion in Social Media based on Crowd Phenomena

@SNOW/WWW, 2014, by Minkyoung Kim, David Newth, Peter Christen

Complex Information Pathways in Social Media

More and more web articles today share information simply by including hyperlinks to different types of social media, such as mainstream news (News), social networking sites (SNS), and blogs (Blog), as shown in Figure (a). For instance, within a few mouse clicks, an experienced blogger can merge news stories, relevant ideas from social networking sites, and supporting blog posts drawn from a virtually unlimited pool of digital content. Such behaviour across multiple social media platforms collectively forms complex information pathways on the Web, as shown in Figure (b).


With the help of web technologies such as RSS news feeds, social media aggregators, and miscellaneous mobile applications, a wide range of information from different sources is now more accessible than ever before. However, previous studies on diffusion have focused on a single social platform (such as Twitter or Facebook) rather than on combined social media of different types. Given these circumstances, it is meaningful to obtain macro-level diffusion trends from emergent information pathways. In this regard, there are several challenges. First, the underlying real diffusion structures are not only hard to define, but also dynamically changing. Second, high accessibility to diverse information sources increases the heterogeneity of social networks in diffusion. Finally, the diversity of information leads to significant variations in diffusion patterns.

A Few Considerations for the Study of Cross-Population Diffusion

When studying cross-population diffusion phenomena, there are several points to consider. First, the meta-population scheme is important, since the way heterogeneous social networks are classified changes the interpretation of diffusion dynamics across populations. Second, identifying trending topics across target populations helps us trace diffusion beyond the bounds of site-specific (local) spreading. Finally, categorizing the topics of information enables us to obtain common or distinct diffusion patterns between different categories.

In this study, we focus on real-world news diffusion in social media. Accordingly, as the meta-population scheme, we considered three different types of social media: News, SNS, and Blog, which constitute over 98% of our original Spinn3r dataset [1]. Also, for the identification of real-world news, we use the Wikipedia Current Events [2] as a note-worthy event registry, which helps us to identify trending topics across diverse social media platforms. As the figure below shows, each bullet point is treated as a news item: a short summary of an event on that day, along with reference hyperlinks (purple rectangles); each bold heading denotes the corresponding news category.


Macro-level Information Pathways based on Crowd Phenomena

We analyse crowd phenomena in news diffusion across different types of social media such as News, SNS, and Blog in terms of activity, reactivity, and heterogeneity. We found that News is the most active, SNS is the most reactive, and Blog is the most persistent, which governs time-evolving heterogeneity of populations in news diffusion. Finally, we interpret the collective behaviours of News, SNS, and Blog from various angles using our previous model-free [3] and model-driven [4] approaches, showing that the strength and directionality of influence reflect the discovered crowd phenomena in news diffusion. These attempts enable us to understand diffusion trends of news in social media from diverse aspects in a more consistent and systematic way.

Summarizing our findings, the main trends of news diffusion in social media are:

  • SNS and Blog users are less active but more reactive for real-world news than for other arbitrary topics.
  • Regarding activity, the Pareto principle does not apply uniformly across different online social systems.
  • Active news media are tightly connected, enhancing the opportunity for their stories to be exposed to other social systems.
  • One week is a meaningful period for tracking news cascades regardless of system types and news topics.
  • The most active news category in each system corresponds to the most reactive news category.
  • Larger diffusion exhibits higher heterogeneity.
  • News is a diligent creator and diligent adopter, SNS is a lazy creator and diligent adopter, and Blog is a diligent creator and lazy adopter.
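The one-week finding above can be turned into a simple operational rule for cascade tracking. The sketch below is our own illustrative interpretation, not the paper's method: it groups share timestamps for one news item into cascades, starting a new cascade whenever a share arrives more than a week after the current cascade began.

```python
from datetime import datetime, timedelta

def split_into_cascades(timestamps, window=timedelta(days=7)):
    """Group share timestamps for one news item into cascades.

    A new cascade starts whenever a share falls outside the window
    (one week by default) opened by the current cascade's first share.
    """
    cascades = []
    for t in sorted(timestamps):
        if cascades and t - cascades[-1][0] <= window:
            cascades[-1].append(t)   # still within the current week
        else:
            cascades.append([t])     # start a fresh cascade
    return cascades

# Hypothetical shares of one news item: a burst in early January,
# then a revival almost three weeks later.
shares = [
    datetime(2011, 1, 1), datetime(2011, 1, 3), datetime(2011, 1, 6),
    datetime(2011, 1, 20), datetime(2011, 1, 22),
]
cascades = split_into_cascades(shares)  # two cascades: sizes 3 and 2
```

Under the paper's observation, tracking each cascade for one week would capture most of its activity regardless of system type or news topic.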

For more details, please refer to our paper entitled: “Trends of News Diffusion in Social Media based on Crowd Phenomena” by Minkyoung Kim, David Newth and Peter Christen, Second Workshop on Social News on the Web @ WWW ’14 (SNOW 2014), Seoul, Korea, April 2014.

[1] ICWSM’11 Dataset.
[2] Wikipedia Current Events, January 2011.
[3] M. Kim, D. Newth, and P. Christen. Macro-level information transfer across social networks. In WWW Companion, Seoul, Korea, 2014.
[4] M. Kim, D. Newth, and P. Christen. Modeling dynamics of diffusion across heterogeneous social networks. Entropy, 15(10):4215-4242, 2013. doi:10.3390/e15104215

News from the Crowd: Grassroots and Collaborative Journalism in the Digital Age

@SNOW/WWW, 2014, by Jochen Spangenberg, Nicolaus Heise

Information content provided by members of the general public via Social Networks such as Twitter, Facebook and YouTube, to name but a few, is playing an ever-increasing role in the detection, production and distribution of news. “Ordinary citizens” (by this we mean non-professional journalists) can now be more actively involved in the production and distribution of news, in particular because of the availability of:

  • affordable portable devices (especially smartphones) that allow for the capturing of information in an audiovisual format;
  • Internet access almost anytime and anywhere (mobile and stationary), more and more of it provided at high speeds;
  • platforms with networking capabilities (especially Social Networks) that allow for the sharing of content and the fast spreading of information to potentially millions of people.

News organizations and information content providers, in turn, cannot ignore these developments. Their former “gatekeeper” functions have been challenged profoundly: no longer can a select few decide what is in the public interest or what can be exploited commercially.

In our paper, we will take a closer look at two new forms of audience involvement and the impacts this has on news and information production. We label these concepts (1) grassroots journalism and (2) collaborative journalism. For our purpose, grassroots journalism is defined as the collection, dissemination and analysis of news and information by the general public, especially by means of the Internet. Collaborative journalism, in turn, labels ways in which media organizations and professional journalists involve external parties in the production of information, thereby making audience contributions part of the storytelling process or the story itself.

Our investigations have shown that, while there are a variety of areas that require further detailed investigation, grassroots and collaborative journalism will continue to grow. It can be expected that accelerating technological developments, audiences’ eagerness to “get involved” and increasing Internet access will motivate even more people to participate in the process of news gathering and information dissemination. At the same time, further strategies that meet the emerging challenges need to be developed in order to maintain (or improve) the quality of grassroots/collaborative news coverage. All this is of great importance for the prospering of the media landscape as a whole, and thereby for the functioning of democratic societies.

This work has been supported by the European Commission under the EC co-funded projects SocialSensor (FP7-ICT-2011-7-287975) and REVEAL (FP7-ICT-2013-10-610928).

For a more elaborate investigation please consult our position paper entitled: “News from the Crowd: Grassroots and Collaborative Journalism in the Digital Age” by Jochen Spangenberg and Nicolaus Heise, Second Workshop on Social News on the Web @ WWW ’14 (SNOW 2014), Seoul, Korea, April 2014.

Please note: The views and findings presented here and in the position paper are those of the named authors. They are not necessarily identical with those of Deutsche Welle or SocialSensor / REVEAL project partners, nor do they in any way represent the views of the European Commission.

Alethiometer: a Framework for Assessing Trustworthiness and Content Validity in Social Media


@SNOW/WWW, 2014, by Eva Jaho, Efstratios Tzoannos, Aris Papadopoulos, Nikos Sarris

Social media refers to the interaction among people in virtual communities and networks, powered by Web 2.0 technologies, in which they exchange news, ideas, and information in general. However, it is not easy to digest the massive amounts of information the community is producing: hundreds of new blogs appear every day, hundreds of thousands of pictures and videos are uploaded, and millions of tweets are posted every minute. Validating content, or presenting it in an objective manner, is a crucial challenge if we are to avoid manipulation and safeguard the democratic role of the media.

We have developed the framework of a platform for assessing trustworthiness in one of the most popular social media, Twitter. “Alethiometer” derives from the Greek word Αλήθεια, which means truth. The Alethiometer is built around three axes: Contributor, Content and Context. The analysis of the validity of the Contributor concerns parameters such as the trust, reputation and influence of an information source. Content validity is expressed through parameters such as the language used, the history of the content and possible manipulations performed on it. Finally, Context analysis examines whether the ‘what’, ‘when’ and ‘where’ of an online publication concur with each other. Joint analysis of the validity of Contributor, Content and Context provides a more thorough approach for revealing trustworthiness.

Existing methods for assessing the veracity of social media content have focused on validating either the source of the content or the content itself, but not both aspects simultaneously. Furthermore, the analysis of the context of a post or article (publication date, place, etc.) and its coherence with the content itself can reveal mistakes that are often hidden in a well-written text.

For the analysis of each framework category, we have defined a set of related parameters, which we term modalities. Modalities concerning a contributor include reputation, history of valid contributions, popularity, influence, and account validity. Modalities referring to the posted content include the importance and reputation of the contained web links, the content’s popularity, influence, originality, authenticity and objectivity. Finally, analysis of context refers to cross-checking for similar reports in different social media, the coherence between the content and its tags, attached links and multimedia, and the coherence between reference location/time and publication location/time.

For each item (post, tweet, etc.), modality parameters are rated on a discrete 5-point scale, from 0 to 4. A score for the significance of each parameter is also derived by comparing its value with the values of the same parameters across all other similar items. By combining the parameter ratings with the significance of the modality parameters, a single score is derived for each modality, characterizing the quality of a contributor and of the content provided by that contributor.
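The combination step can be illustrated with a small sketch. The exact formula is not spelled out here, so the significance-weighted average below is just one plausible instance, and the parameter names are hypothetical.

```python
def modality_score(ratings, significance):
    """Combine per-parameter ratings (0-4) with significance weights into
    a single modality score. A weighted average is one plausible
    combination; the framework itself does not fix a formula here."""
    total_weight = sum(significance[p] for p in ratings)
    return sum(ratings[p] * significance[p] for p in ratings) / total_weight

# Hypothetical Contributor-modality ratings and significance weights
ratings = {"reputation": 3, "popularity": 4, "account_validity": 2}
significance = {"reputation": 0.5, "popularity": 0.2, "account_validity": 0.3}
score = modality_score(ratings, significance)  # stays on the 0-4 scale
```

Because the weights sum to one, the combined score remains on the same 0-4 scale as the individual ratings, so modality scores stay comparable across items.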

A preliminary statistical analysis on a large corpus of Twitter data has been conducted, which showed that the different parameters describing a user (number of followers, number of tweets, account age) exhibit different behaviours and are largely uncorrelated. For example, a ‘new’ user can have a large number of tweets or followers, and vice versa. The highest correlation found was between friends and followers, and the lowest between followers and tweets. All correlation values were, however, quite small, which means that these parameters are relatively independent of one another and have to be considered individually. These findings support our approach of examining many different parameters in order to evaluate social media content and contributors, and to decide on their trustworthiness and validity.
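This kind of pairwise analysis can be sketched with a plain Pearson correlation over user records. The five toy users below are invented for illustration and do not reproduce the corpus figures; the point is only how each parameter pair is compared.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy user records (illustrative only, not the paper's corpus)
followers = [10, 200, 50, 3000, 120]
friends = [15, 180, 60, 2500, 140]
tweets = [5000, 20, 900, 10, 3000]

r_friends = pearson(followers, friends)  # strongly positive in this toy data
r_tweets = pearson(followers, tweets)    # negative in this toy data
```

In a real corpus each such coefficient would be computed over all users; parameter pairs with low coefficients are treated as independent signals and scored separately.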

This work has been supported by the European Commission under the EU projects SocialSensor (FP7-ICT-2011-7-287975) and REVEAL (FP7-ICT-2013-10-610928).

For more details, please consult our position paper:

– Eva Jaho, Efstratios Tzoannos, Aris Papadopoulos, Nikos Sarris, “Alethiometer: a framework for assessing trustworthiness and content validity in social media”, Second Workshop on Social News on the Web @ WWW ’14 (SNOW 2014), Seoul, Korea, April 2014.