Twitter mood predicts the stock market (2011) Johan Bollen, Huina Mao and Xiaojun Zeng Journal of computational science, 2 (1), 1-8
Behavioral economics tells us that emotions can profoundly affect individual behavior and decision-making. Does this also apply to societies at large, i.e. can societies experience mood states that affect their collective decision making? By extension is the public mood correlated or even predictive of economic indicators? Here we investigate whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time. We analyze the text content of daily Twitter feeds by two mood tracking tools, namely OpinionFinder that measures positive vs. negative mood and Google-Profile of Mood States (GPOMS) that measures mood in terms of 6 dimensions (Calm, Alert, Sure, Vital, Kind, and Happy). We cross-validate the resulting mood time series by comparing their ability to detect the public's response to the presidential election …
Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena (2011) Johan Bollen, Huina Mao and Alberto Pepe
We perform a sentiment analysis of all tweets published on the microblogging platform Twitter in the second half of 2008. We use a psychometric instrument to extract six mood states (tension, depression, anger, vigor, fatigue, confusion) from the aggregated Twitter content and compute a six-dimensional mood vector for each day in the timeline. We compare our results to a record of popular events gathered from media and sources. We find that events in the social, political, cultural and economic sphere do have a significant, immediate and highly specific effect on the various dimensions of public mood. We speculate that large scale analyses of mood can provide a solid platform to model collective emotive trends in terms of their predictive value with regards to existing social as well as economic indicators.
Co-authorship networks in the digital library research community (2005) Xiaoming Liu, Johan Bollen, Michael L Nelson and Herbert Van de Sompel Information processing & management, 41 (6), 1462-1480
The field of digital libraries (DLs) coalesced in 1994: the first digital library conferences were held that year, awareness of the World Wide Web was accelerating, and the National Science Foundation awarded $24 Million (US) for the Digital Library Initiative (DLI). In this paper we examine the state of the DL domain after a decade of activity by applying social network analysis to the co-authorship network of the past ACM, IEEE, and joint ACM/IEEE digital library conferences. We base our analysis on a common binary undirectional network model to represent the co-authorship network, and from it we extract several established network measures. We also introduce a weighted directional network model to represent the co-authorship network, for which we define AuthorRank as an indicator of the impact of an individual author in the network. The results are validated against conference program committee members in …
A principal component analysis of 39 scientific impact measures (2009) Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan Chute PloS one, 4 (6), e6022
Background: The impact of scientific publications has traditionally been expressed in terms of citation counts. However, scientific activity has moved online over the past decade. To better capture scientific impact in the digital era, a variety of new impact measures has been proposed on the basis of social network analysis and usage log data. Here we investigate how these new measures relate to each other, and how accurately and completely they express scientific impact. Methodology: We performed a principal component analysis of the rankings produced by 39 existing and proposed measures of scholarly impact that were calculated on the basis of both citation and usage log data. Conclusions: Our results indicate that the notion of scientific impact is a multi-dimensional construct that cannot be adequately measured by any single indicator, although some measures are more suitable than others. The commonly used citation Impact Factor is not positioned at the core of this construct, but at its periphery, and should thus be used with caution.
Journal Status (2006) Johan Bollen, M Rodriguez and Herbert Van de Sompel Scientometrics, 69 (3), 669-687
The status of an actor in a social context is commonly defined in terms of two factors: the total number of endorsements the actor receives from other actors and the prestige of the endorsing actors. These two factors indicate the distinction between popularity and expert appreciation of the actor, respectively. We refer to the former as popularity and to the latter as prestige. These notions of popularity and prestige also apply to the domain of scholarly assessment. The ISI Impact Factor (ISI IF) is defined as the mean number of citations a journal receives over a 2 year period. By merely counting the amount of citations and disregarding the prestige of the citing journals, the ISI IF is a metric of popularity, not of prestige. We demonstrate how a weighted version of the popular PageRank algorithm can be used to obtain a metric that reflects prestige. We contrast the rankings of journals according to their ISI IF and their …
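The contrast between popularity and prestige can be illustrated with a minimal weighted PageRank sketch. This is not the authors' implementation: the abstract does not specify the weighting, so the sketch assumes that each citing journal distributes its rank in proportion to its citation counts. The toy citation data, the function name `weighted_pagerank`, and the damping value are all hypothetical.

```python
def weighted_pagerank(citations, damping=0.85, iterations=100):
    """Rank journals from a weighted citation graph.

    citations[a][b] is the number of citations from journal a to journal b.
    Assumption (not from the paper): a journal passes on its rank in
    proportion to these citation counts.
    """
    journals = sorted(
        set(citations) | {b for targets in citations.values() for b in targets}
    )
    n = len(journals)
    rank = {j: 1.0 / n for j in journals}
    for _ in range(iterations):
        # Every journal receives the same baseline share.
        new_rank = {j: (1.0 - damping) / n for j in journals}
        for source in journals:
            targets = citations.get(source, {})
            total = sum(targets.values())
            if total == 0:
                # A journal that cites nothing spreads its rank evenly.
                for j in journals:
                    new_rank[j] += damping * rank[source] / n
            else:
                for target, weight in targets.items():
                    new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank


# Hypothetical toy data: journal D cites heavily but is never cited itself.
citations = {
    "A": {"B": 3, "C": 1},
    "B": {"C": 2},
    "C": {"A": 1},
    "D": {"C": 5},
}
ranks = weighted_pagerank(citations)
for journal, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(journal, round(score, 3))
```

A pure citation count treats every incoming citation equally, whereas here a citation is worth more when it comes from a journal that itself ranks highly.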
Toward alternative metrics of journal impact: A comparison of download and citation data (2005) Johan Bollen, Herbert Van de Sompel, Joan A Smith and Rick Luce Information Processing & Management, 41 (6), 1419-1440
We generated networks of journal relationships from citation and download data, and determined journal impact rankings from these networks using a set of social network centrality metrics. The resulting journal impact rankings were compared to the ISI IF. Results indicate that, although social network metrics and ISI IF rankings deviate moderately for citation-based journal networks, they differ considerably for journal networks derived from download data. We believe the results represent a unique aspect of general journal impact that is not captured by the ISI IF. These results furthermore raise questions regarding the validity of the ISI IF as the sole assessment of journal impact, and suggest the possibility of devising impact metrics based on usage information in general.
How the scientific community reacts to newly submitted preprints: Article downloads, twitter mentions, and citations (2012) Xin Shuai, Alberto Pepe and Johan Bollen PloS one, 7 (11), e47523
We analyze the online response to the preprint publication of a cohort of 4,606 scientific articles submitted to the preprint database arXiv.org between October 2010 and May 2011. We study three forms of responses to these preprints: downloads on the arXiv.org site, mentions on the social media site Twitter, and early citations in the scholarly record. We perform two analyses. First, we analyze the delay and time span of article downloads and Twitter mentions following submission, to understand the temporal configuration of these reactions and whether one precedes or follows the other. Second, we run regression and correlation tests to investigate the relationship between Twitter mentions, arXiv downloads, and article citations. We find that Twitter mentions and arXiv downloads of scholarly articles follow two distinct temporal patterns of activity, with Twitter mentions having shorter delays and narrower time spans than arXiv downloads. We also find that the volume of Twitter mentions is statistically correlated with arXiv downloads and early citations just months after the publication of a preprint, with a possible bias that favors highly mentioned articles.
Happiness is assortative in online social networks (2011) Johan Bollen, Bruno Gonçalves, Guangchen Ruan and Huina Mao Artificial life, 17 (3), 237-251
Online social networking communities may exhibit highly complex and adaptive collective behaviors. Since emotions play such an important role in human decision making, how online networks modulate human collective mood states has become a matter of considerable interest. In spite of the increasing societal importance of online social networks, it is unknown whether assortative mixing of psychological states takes place in situations where social ties are mediated solely by online networking services in the absence of physical contact. Here, we show that the general happiness, or subjective well-being (SWB), of Twitter users, as measured from a 6-month record of their individual tweets, is indeed assortative across the Twitter social network. Our results imply that online social networks may be equally subject to the social mechanisms that cause assortative mixing in real social networks and that such …
Clickstream data yields high-resolution maps of science (2009) Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A Rodriguez ... PLoS one, 4 (3), e4803
Background: Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science. Methodology: Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of world-wide use of scholarly web portals in 2006, and provides a balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute's Architecture and Art Thesaurus. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences. Conclusions: Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the …
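The journal clickstream model can be sketched as follows, assuming each user session has already been reduced to an ordered list of journal names. This is an illustration of a first-order Markov chain estimated from transition counts, not the authors' pipeline. The session data and the function name `transition_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities.

    sessions is a list of clickstreams, each an ordered list of journals.
    Only consecutive pairs are counted, so the next journal depends solely
    on the current one.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return {
        source: {
            target: n / sum(targets.values()) for target, n in targets.items()
        }
        for source, targets in counts.items()
    }


# Hypothetical sessions; real logs would need cleaning and sessionization.
sessions = [
    ["J. Physics", "J. Chemistry", "J. Biology"],
    ["J. Physics", "J. Chemistry"],
    ["J. Physics", "J. Sociology"],
]
model = transition_probabilities(sessions)
print(model["J. Physics"])
```

The resulting transition probabilities can be read as weighted edges between journals, which is what makes a network visualization of the kind described above possible.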
More tweets, more votes: Social media as a quantitative indicator of political behavior (2013) Joseph DiGrazia, Karissa McKelvey, Johan Bollen and Fabio Rojas PloS one, 8 (11), e79449
Is social media a valid indicator of political behavior? There is considerable debate about the validity of data extracted from social media for studying offline behavior. To address this issue, we show that there is a statistically significant association between tweets that mention a candidate for the U.S. House of Representatives and his or her subsequent electoral performance. We demonstrate this result with an analysis of 542,969 tweets mentioning candidates selected from a random sample of 3,570,054,618, as well as Federal Election Commission data from 795 competitive races in the 2010 and 2012 U.S. congressional elections. This finding persists even when controlling for incumbency, district partisanship, media coverage of the race, time, and demographic variables such as the district's racial and gender composition. Our findings show that reliable data about political behavior can be extracted from social media.
Twitter mood as a stock market predictor (2011) Johan Bollen and Huina Mao Computer, 44 (10), 91-94
Behavioral finance researchers can apply computational methods to large-scale social media data to better understand and predict markets.
Computational fact checking from knowledge networks (2015) Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M Rocha, Johan Bollen, Filippo Menczer and Alessandro Flammini PloS one, 10 (6), e0128193
Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
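A minimal sketch of the shortest-path idea, under stated assumptions: the abstract does not define the semantic proximity metric, so this example assumes that passing through an intermediate node costs the logarithm of its degree (generic, highly connected concepts are penalized) and that support is 1 / (1 + total cost). The toy graph, the node names, and the function name `path_support` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def path_support(graph, source, target):
    """Return a support score in (0, 1] for the claim linking two concepts.

    graph maps each node to the set of its neighbours (undirected).
    Assumed metric: each intermediate node costs log(degree); the target
    itself is free. Dijkstra's algorithm finds the cheapest path.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return 1.0 / (1.0 + cost)
        if cost > best.get(node, math.inf):
            continue  # Stale queue entry.
        for neighbour in graph[node]:
            step = 0.0 if neighbour == target else math.log(len(graph[neighbour]))
            new_cost = cost + step
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return 0.0  # No path: no support.


# Hypothetical toy knowledge graph.
edges = [
    ("Ada", "London"), ("London", "England"),
    ("Ada", "Person"), ("Person", "Bob"), ("Person", "Carol"),
    ("Person", "Dan"), ("Bob", "Paris"), ("Paris", "France"),
]
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

print(path_support(graph, "Ada", "England"))
print(path_support(graph, "Ada", "France"))
```

In this toy graph the claim linking "Ada" to "England" runs through one specific node, while the link to "France" must pass through the generic "Person" hub, so the first claim receives more support.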
Usage impact factor: The effects of sample characteristics on usage‐based impact metrics (2008) Johan Bollen and Herbert Van de Sompel Journal of the American Society for Information Science and Technology, 59 (1), 136-149
There exist ample demonstrations that indicators of scholarly impact analogous to the citation‐based ISI Impact Factor can be derived from usage data; however, so far, usage can practically be recorded only at the level of distinct information services. This leads to community‐specific assessments of scholarly impact that are difficult to generalize to the global scholarly community. In contrast, the ISI Impact Factor is based on citation data and thereby represents the global community of scholarly authors. The objective of this study is to examine the effects of community characteristics on assessments of scholarly impact from usage. We define a journal Usage Impact Factor that mimics the definition of the Thomson Scientific ISI Impact Factor. Usage Impact Factor rankings are calculated on the basis of a large‐scale usage dataset recorded by the linking servers of the California State University system from 2003 to …
The World-Wide Web as a Super-Brain: from metaphor to model (1996) Francis Heylighen and Johan Bollen
If society is viewed as a super-organism, communication networks play the role of its brain. This metaphor is developed into a model for the design of a more intelligent global network. The World-Wide Web, through its distributed hypermedia architecture, functions as an “associative memory”, which may “learn” by the strengthening of frequently used links. Software agents, exploring the Web through spreading activation, function as problem-solving “thoughts”. Users are integrated into this “super-brain” through direct man-machine interfaces and the reciprocal exchange of knowledge between individual and Web.
Usage bibliometrics (2011) Michael J Kurtz and Johan Bollen arXiv preprint arXiv:1102.2891
Scholarly usage data provides unique opportunities to address the known shortcomings of citation analysis. However, the collection, processing and analysis of usage data remain an area of active research. This article provides a review of the state of the art in usage-based informetrics, i.e. the use of usage data to study the scholarly process.