Filippo Menczer

  • fil@indiana.edu
  • Brand Hall E314
  • (812) 856-1377
  • Home Website
  • Director
    Observatory on Social Media
  • Member
    Center for Complex Networks and Systems Research
  • Member
    IU Network Science Institute
  • Member
    Cognitive Science Program
  • Professor
    Informatics
  • Professor
    Computer Science
  • Adjunct Professor
    Physics

Field of study

  • Computational social science
  • Network science
  • Web science
  • Data science
  • Science of Science

Research interests

  • Information diffusion in complex information and techno-social networks. Agent-based models and data analytic methods to study the online spread of mis/disinformation. Tools to detect and counter the manipulation of social media.

Representative publications

Social phishing (2007)
Tom N Jagatic, Nathaniel A Johnson, Markus Jakobsson and Filippo Menczer
Communications of the ACM, 50 (10), 94-100

Phishing is a form of social engineering in which an attacker attempts to fraudulently acquire sensitive information from a victim by impersonating a trustworthy third party. Phishing attacks today typically employ generalized “lures.” For instance, a phisher misrepresenting himself as a large banking corporation or popular on-line auction site will have a reasonable yield, despite knowing little to nothing about the recipient. In a study by Gartner [11], about 19% of all those surveyed reported having clicked on a link in a phishing email, and 3% admitted to giving up financial or personal information. However, no existing studies provide us with a baseline success rate for individual phishing attacks. This was one of the motivating factors for the research project described here. It is worth noting that phishers are getting smarter. Following trends in other online crimes, it is inevitable that future generations of phishing attacks will incorporate greater elements of context to become more effective and thus more dangerous for society. For instance, suppose a phisher were able to induce an interruption of service to a frequently used resource, e.g., to cause a victim’s password to be locked by generating excessive authentication failures. The phisher could then notify the victim of a “security threat.” Such a message may be welcome or expected by the victim, who would then be easily induced into disclosing personal information.

Political polarization on Twitter (2011)
Michael Conover, Jacob Ratkiewicz, Matthew R Francisco, Bruno Gonçalves, Filippo Menczer and Alessandro Flammini
Proc. 5th International AAAI Conference on Weblogs and Social Media (ICWSM), 89-96

In this study we investigate how social media shape the networked public sphere and facilitate communication between communities with different political orientations. We examine two networks of political communication on Twitter, consisting of more than 250,000 tweets from the six weeks leading up to the 2010 US congressional midterm elections. Using a combination of network clustering algorithms and manually-annotated data we demonstrate that the network of political retweets exhibits a highly segregated partisan structure, with extremely limited connectivity between left- and right-leaning users. Surprisingly, this is not the case for the user-to-user mention network, which is dominated by a single politically heterogeneous cluster of users in which ideologically-opposed individuals interact at a much higher rate compared to the network of retweets. To explain the distinct topologies of the retweet and mention networks we conjecture that politically motivated individuals provoke interaction by injecting partisan content into information streams whose primary audience consists of ideologically-opposed users. We conclude with statistical evidence in support of this hypothesis.
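
The contrast between a segregated retweet network and a mixed mention network can be made concrete with one simple statistic: the fraction of edges that connect users of opposite leaning. The sketch below is illustrative only and is not the paper's analysis, which used network clustering and manual annotation. The function name, the "left"/"right" labels, and the toy edge lists are all hypothetical.

```python
from collections import Counter

def cross_partisan_fraction(edges, leaning):
    """Fraction of directed edges whose endpoints have different leanings.

    edges: iterable of (source, target) user pairs, e.g. retweets or mentions.
    leaning: dict mapping user -> "left" or "right" (hypothetical labels).
    Edges touching unlabeled users are skipped.
    """
    counts = Counter()
    for u, v in edges:
        if u in leaning and v in leaning:
            counts["cross" if leaning[u] != leaning[v] else "same"] += 1
    total = counts["cross"] + counts["same"]
    return counts["cross"] / total if total else 0.0

# Hypothetical toy data: retweets stay within each side, mentions cross over.
leaning = {"a": "left", "b": "left", "c": "right", "d": "right"}
retweets = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")]
mentions = [("a", "c"), ("c", "a"), ("b", "d"), ("a", "b")]
print(cross_partisan_fraction(retweets, leaning))  # 0.0
print(cross_partisan_fraction(mentions, leaning))  # 0.75
```

On real data the labels would come from annotation or a classifier, and a fraction alone ignores degree heterogeneity, which is one reason clustering-based analyses are used instead.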

The rise of social bots (2016)
E. Ferrara, O. Varol, C. Davis, F. Menczer and A. Flammini
Communications of the ACM, 59 (7), 96-104

Today's social bots are sophisticated and sometimes menacing. Indeed, their presence can endanger online ecosystems as well as our society.

Truthy: mapping the spread of astroturf in microblog streams (2011)
Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Snehal Patil, Alessandro Flammini ...
ACM. 249-252

Online social media are complementing and in some cases replacing person-to-person social interaction and redefining the diffusion of information. In particular, microblogs have become crucial grounds on which public relations, marketing, and political battles are fought. We demonstrate a web service that tracks political memes in Twitter and helps detect astroturfing, smear campaigns, and other misinformation in the context of US political elections. We also present some cases of abusive behaviors uncovered by our service. Our web service is based on an extensible framework that will enable the real-time analysis of meme diffusion in social media by mining, visualizing, mapping, classifying, and modeling massive streams of public microblogging events.

Predicting the political alignment of Twitter users (2011)
Michael D Conover, Bruno Gonçalves, Jacob Ratkiewicz, Alessandro Flammini and Filippo Menczer
IEEE. 192-199

The widespread adoption of social media for political communication creates unprecedented opportunities to monitor the opinions of large numbers of politically active individuals in real time. However, without a way to distinguish between users of opposing political alignments, conflicting signals at the individual level may, in the aggregate, obscure partisan differences in opinion that are important to political strategy. In this article we describe several methods for predicting the political alignment of Twitter users based on the content and structure of their political communication in the run-up to the 2010 U.S. midterm elections. Using a data set of 1,000 manually-annotated individuals, we find that a support vector machine (SVM) trained on hashtag metadata outperforms an SVM trained on the full text of users' tweets, yielding predictions of political affiliations with 91% accuracy. Applying latent semantic analysis to …

Detecting and Tracking Political Abuse in Social Media (2011)
Jacob Ratkiewicz, Michael Conover, Mark R Meiss, Bruno Gonçalves, Alessandro Flammini and Filippo Menczer
Proc. 5th International AAAI Conference on Weblogs and Social Media (ICWSM), 297-304

We study astroturf political campaigns on microblogging platforms: politically-motivated individuals and organizations that use multiple centrally-controlled accounts to create the appearance of widespread support for a candidate or opinion. We describe a machine learning framework that combines topological, content-based and crowdsourced features of information diffusion networks on Twitter to detect the early stages of viral spreading of political misinformation. We present promising preliminary results with better than 96% accuracy in the detection of astroturf content in the run-up to the 2010 US midterm elections.

Competition among memes in a world with limited attention (2012)
Lilian Weng, Alessandro Flammini, Alessandro Vespignani and Filippo Menczer
Scientific Reports, 2, 335

The wide adoption of social media has increased the competition among ideas for our finite attention. We employ a parsimonious agent-based model to study whether such a competition may affect the popularity of different memes, the diversity of information we are exposed to, and the fading of our collective interests for specific topics. Agents share messages on a social network but can only pay attention to a portion of the information they receive. In the emerging dynamics of information diffusion, a few memes go viral while most do not. The predictions of our model are consistent with empirical data from Twitter, a popular microblogging platform. Surprisingly, we can explain the massive heterogeneity in the popularity and persistence of memes as deriving from a combination of the competition for our limited attention and the structure of the social network, without the need to assume different intrinsic values …
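
The abstract describes agents who share messages on a network but can attend to only part of what they receive. The toy simulation below is a minimal sketch in that spirit and should not be read as the paper's model specification. The ring network, the new-meme probability `mu`, the memory length, and the function name are all assumptions chosen for illustration. Tallying posts per meme lets one inspect how unevenly attention ends up distributed.

```python
import random
from collections import Counter, deque

def simulate_limited_attention(n_agents=100, k=4, mu=0.1, memory=10,
                               steps=5000, seed=0):
    """Toy agent-based meme competition under finite attention.

    Assumptions (hypothetical, for illustration): agents sit on a ring and
    each agent's posts reach its k nearest neighbours. Each step a random
    agent posts a brand-new meme with probability mu, otherwise it reposts a
    meme chosen at random from its own limited memory. Memories are bounded
    queues, so old items are forgotten once `memory` newer ones arrive.
    Returns a Counter mapping meme id -> number of posts.
    """
    rng = random.Random(seed)
    # followers[i]: agents whose memories receive agent i's posts
    followers = [[(i + d) % n_agents for d in range(-k // 2, k // 2 + 1) if d != 0]
                 for i in range(n_agents)]
    memories = [deque(maxlen=memory) for _ in range(n_agents)]
    popularity = Counter()
    next_meme = 0
    for _ in range(steps):
        agent = rng.randrange(n_agents)
        # An agent with an empty memory has nothing to repost, so it innovates.
        if rng.random() < mu or not memories[agent]:
            meme = next_meme
            next_meme += 1
        else:
            meme = rng.choice(memories[agent])
        popularity[meme] += 1
        for f in followers[agent]:
            memories[f].append(meme)
    return popularity

pop = simulate_limited_attention()
print(len(pop), max(pop.values()))  # number of distinct memes, top meme's post count
```

Shrinking `memory` tightens the attention constraint; comparing popularity histograms across settings is one way to explore the kind of heterogeneity the abstract discusses.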

The science of fake news (2018)
David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly M Greenhill, Filippo Menczer ...
Science, 359 (6380), 1094-1096

The rise of fake news highlights the erosion of long-standing institutional bulwarks against misinformation in the internet age. Concern over the problem is global. However, much remains unknown regarding the vulnerabilities of individuals, institutions, and society to manipulations by malicious actors. A new system of safeguards is needed. Below, we discuss extant social and computer science research regarding belief in fake news and the mechanisms by which it spreads. Fake news has a long history, but we focus on unanswered scientific questions raised by the proliferation of its most recent, politically oriented incarnation. Beyond selected references in the text, suggested further reading can be found in the supplementary materials.

Virality prediction and community structure in social networks (2013)
Lilian Weng, Filippo Menczer and Yong-Yeol Ahn
Scientific Reports, 3, 2522

How does network structure affect diffusion? Recent studies suggest that the answer depends on the type of contagion. Complex contagions, unlike infectious diseases (simple contagions), are affected by social reinforcement and homophily. Hence, the spread within highly clustered communities is enhanced, while diffusion across communities is hampered. A common hypothesis is that memes and behaviors are complex contagions. We show that, while most memes indeed spread like complex contagions, a few viral memes spread across many communities, like diseases. We demonstrate that the future popularity of a meme can be predicted by quantifying its early spreading pattern in terms of community concentration. The more communities a meme permeates, the more viral it is. We present a practical method to translate data about community structure into predictive knowledge about what information will …
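
"Community concentration" can be quantified in several ways. One simple option, shown below as an assumption rather than the paper's exact feature set, is the Shannon entropy of a meme's early adopters across communities. Low entropy means the meme is trapped in one community (complex-contagion-like); high entropy means it has permeated many. The community ids are assumed to come from a prior community-detection step, and the function name is hypothetical.

```python
import math
from collections import Counter

def usage_entropy(adopter_communities):
    """Shannon entropy (bits) of a meme's early adopters across communities.

    adopter_communities: list with the community id of each early adopter.
    0.0 means every adopter is in a single community (high concentration);
    larger values mean adoption is spread across more communities.
    """
    counts = Counter(adopter_communities)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(usage_entropy(["A", "A", "A", "A"]))  # 0.0: concentrated in one community
print(usage_entropy(["A", "B", "C", "D"]))  # 2.0: spread evenly over four
```

Such a score would be one input among several to a predictor of future popularity; on its own it says nothing about how many adopters a meme has.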

Topical web crawlers: Evaluating adaptive algorithms (2004)
Filippo Menczer, Gautam Pant and Padmini Srinivasan
ACM Transactions on Internet Technology (TOIT), 4 (4), 378-419

Topical crawlers are increasingly seen as a way to address the scalability limitations of universal search engines, by distributing the crawling process across users, queries, or even client computers. The context available to such crawlers can guide the navigation of links with the goal of efficiently locating highly relevant target pages. We developed a framework to fairly evaluate topical crawling algorithms under a number of performance metrics. Such a framework is employed here to evaluate different algorithms that have proven highly competitive among those proposed in the literature and in our own previous research. In particular we focus on the tradeoff between exploration and exploitation of the cues available to a crawler, and on adaptive crawlers that use machine learning techniques to guide their search. We find that the best performance is achieved by a novel combination of explorative and exploitative …
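
To illustrate how topical context can guide link navigation, here is a minimal best-first crawler: the frontier is a priority queue, and each outlink is scored by the cosine similarity between its parent page and the topic keywords. This is a simple baseline for illustration, not the adaptive or exploration/exploitation algorithms evaluated in the paper. The in-memory `pages` and `links` dictionaries stand in for real fetching and are hypothetical, as are the function names.

```python
import heapq
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_first_crawl(seeds, pages, links, topic, max_pages=10):
    """Visit pages in order of the topical score inherited from their parent.

    pages: url -> page text; links: url -> list of outlinks (hypothetical
    in-memory stand-ins for HTTP fetching and link extraction).
    """
    topic_vec = Counter(topic.lower().split())
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    seen, visited = set(seeds), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        score = cosine(Counter(pages.get(url, "").lower().split()), topic_vec)
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                heapq.heappush(frontier, (-score, out))  # max-heap via negation
    return visited

pages = {"s1": "social bots on twitter", "s2": "pasta recipes",
         "x": "detecting social bots", "y": "more pasta recipes",
         "z": "bot detection tools", "w": "pasta sauce"}
links = {"s1": ["x"], "s2": ["y"], "x": ["z"], "y": ["w"]}
print(best_first_crawl(["s1", "s2"], pages, links, "social bots"))
# ['s1', 's2', 'x', 'z', 'y', 'w']
```

Note that `z`, two hops from the relevant seed, is visited before `y`, one hop from the irrelevant seed: the topical context, not distance, drives the order. A purely greedy policy like this one sits at the exploitation end of the tradeoff the abstract mentions.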

Feature selection in unsupervised learning via evolutionary search (2000)
YeongSeog Kim, W Nick Street and Filippo Menczer
365-369

Feature subset selection is an important problem in knowledge discovery, not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. In this paper we consider the problem of feature selection for unsupervised learning. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multidimensional objective space. Each evolved solution represents a feature subset and a number of clusters; a standard K-means algorithm is applied to form the given number of clusters based on the selected features. Preliminary results on both real and synthetic data show promise in finding Pareto-optimal solutions through which we can identify the significant features and the correct number of clusters.

Evaluating similarity measures for emergent semantics of social tagging (2009)
Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho and Gerd Stumme
ACM. 641-650

Social bookmarking systems are becoming increasingly important data sources for bootstrapping and maintaining Semantic Web applications. Their emergent information structures have become known as folksonomies. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as community detection, navigation support, semantic search, user profiling and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures, which are derived from several established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity between tags and between resources and consider different methods to …
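
As a concrete example of a folksonomy-based similarity, the sketch below collapses (user, tag, resource) annotations into one resource-count vector per tag and compares tags by cosine similarity. This is just one of the simplest possible measures and aggregations; the paper compares several information-theoretic, statistical, and practical measures that are not reproduced here. The function names and toy annotations are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tag_vectors(annotations):
    """Collapse (user, tag, resource) triples into tag -> Counter(resource -> count).

    Summing over users is only one possible way to aggregate a folksonomy.
    """
    vectors = defaultdict(Counter)
    for _user, tag, resource in annotations:
        vectors[tag][resource] += 1
    return vectors

def tag_cosine(vectors, t1, t2):
    """Cosine similarity of two tags based on the resources they annotate."""
    a, b = vectors.get(t1, Counter()), vectors.get(t2, Counter())
    dot = sum(a[r] * b[r] for r in a if r in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

annotations = [("u1", "python", "r1"), ("u2", "python", "r1"),
               ("u1", "programming", "r1"), ("u2", "programming", "r2"),
               ("u3", "cooking", "r3")]
v = tag_vectors(annotations)
print(round(tag_cosine(v, "python", "programming"), 4))  # 0.7071
print(tag_cosine(v, "python", "cooking"))  # 0.0
```

The same construction works symmetrically for resources (vectors over tags) or users, which is the kind of symmetric treatment the abstract describes.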

Online human-bot interactions: Detection, estimation, and characterization (2017)
Onur Varol, Emilio Ferrara, Clayton A Davis, Filippo Menczer and Alessandro Flammini

Increasing evidence suggests that a growing amount of social media content is generated by autonomous entities known as social bots. In this work we present a framework to detect such entities on Twitter. We leverage more than a thousand features extracted from public data and meta-data about users: friends, tweet content and sentiment, network patterns, and activity time series. We benchmark the classification framework by using a publicly available dataset of Twitter bots. This training data is enriched by a manually annotated collection of active Twitter users that include both humans and bots of varying sophistication. Our models yield high accuracy and agreement with each other and can detect bots of different nature. Our estimates suggest that between 9% and 15% of active Twitter accounts are bots. Characterizing ties among accounts, we observe that simple bots tend to interact with bots that exhibit more human-like behaviors. Analysis of content flows reveals retweet and mention strategies adopted by bots to interact with different target groups. Using clustering analysis, we characterize several subclasses of accounts, including spammers, self-promoters, and accounts that post content from connected applications.

Evaluating topic-driven web crawlers (2001)
Filippo Menczer, Gautam Pant, Padmini Srinivasan and Miguel E Ruiz
ACM. 241-249

Due to limited bandwidth, storage, and computational resources, and to the dynamic nature of the Web, search engines cannot index every Web page, and even the covered portion of the Web cannot be monitored continuously for changes. Therefore it is essential to develop effective crawling strategies to prioritize the pages to be indexed. The issue is even more important for topic-specific search engines, where crawlers must make additional decisions based on the relevance of visited pages. However, it is difficult to evaluate alternative crawling strategies because relevant sets are unknown and the search space is changing. We propose three different methods to evaluate crawling strategies. We apply the proposed metrics to compare three topic-driven crawling algorithms based on similarity ranking, link analysis, and adaptive agents.

BotOrNot: A system to evaluate social bots (2016)
Clayton Allen Davis, Onur Varol, Emilio Ferrara, Alessandro Flammini and Filippo Menczer
International World Wide Web Conferences Steering Committee. 273-274

While most online social media accounts are controlled by humans, these platforms also host automated agents called social bots or sybil accounts. Recent literature reported on cases of social bots imitating humans to manipulate discussions, alter the popularity of users, pollute content and spread misinformation, and even perform terrorist propaganda and recruitment actions. Here we present BotOrNot, a publicly-available service that leverages more than one thousand features to evaluate the extent to which a Twitter account exhibits similarity to the known characteristics of social bots. Since its release in May 2014, BotOrNot has served over one million requests via our website and APIs.

Dissertation Committee Service

  • Author: Laine, Tei
    Dissertation title: Agent-Based Model Selection Framework For Complex Adaptive Systems (August 2006)
    Committee: Menczer, F. (Chair), Gasser, M., Busemeyer, J., Janssen, M.