Staša Milojević

  • smilojev@indiana.edu
  • Wells Library - LI011
  • (812) 856-4182
  • Home Website
  • Professor
    School of Informatics and Computing

Field of study

  • Scientometrics; informetrics; information science; computational social science

Education

  • Ph.D., University of California, Los Angeles, 2009

Research interests

  • My research combines theoretical, mathematical, statistical and computational approaches to study how modern scientific disciplines/fields form, organize and develop.

Representative publications

The cognitive structure of library and information science: Analysis of article title words (2011)
Staša Milojević, Cassidy R Sugimoto, Erjia Yan and Ying Ding
Journal of the American Society for Information Science and Technology, 62 (10), 1933-1953

This study comprises a suite of analyses of words in article titles in order to reveal the cognitive structure of Library and Information Science (LIS). The use of title words to elucidate the cognitive structure of LIS has been relatively neglected. The present study addresses this gap by performing (a) co‐word analysis and hierarchical clustering, (b) multidimensional scaling, and (c) determination of trends in usage of terms. The study is based on 10,344 articles published between 1988 and 2007 in 16 LIS journals. Methodologically, novel aspects of this study are: (a) its large scale, (b) removal of non‐specific title words based on the “word concentration” measure, (c) identification of the most frequent terms that include both single words and phrases, and (d) presentation of the relative frequencies of terms using “heatmaps”. Conceptually, our analysis reveals that LIS consists of three main branches: the traditionally …

Science of science (2018)
Santo Fortunato, Carl T Bergstrom, Katy Börner, James A Evans, Dirk Helbing, Staša Milojević ...
Science, 359 (6379), eaao0185

BACKGROUND: The increasing availability of digital data on scholarly inputs and outputs—from research funding, productivity, and collaboration to paper citations and scientist mobility—offers unprecedented opportunities to explore the structure and evolution of science. The science of science (SciSci) offers a quantitative understanding of the interactions among scientific agents across diverse geographic and temporal scales: It provides insights into the conditions underlying creativity and the genesis of scientific discovery, with the ultimate goal of developing tools and policies that have the potential to accelerate science. In the past decade, SciSci has benefited from an influx of natural, computational, and social scientists who together have developed big data–based capabilities for empirical analysis and generative modeling that capture the unfolding of science, its institutions, and its workforce. The value …

Principles of scientific research team formation and evolution (2014)
Staša Milojević
Proceedings of the National Academy of Sciences, 111 (11), 3984-3989

Research teams are the fundamental social unit of science, and yet there is currently no model that describes their basic property: size. In most fields, teams have grown significantly in recent decades. We show that this is partly due to the change in the character of team size distribution. We explain these changes with a comprehensive yet straightforward model of how teams of different sizes emerge and grow. This model accurately reproduces the evolution of empirical team size distribution over the period of 50 y. The modeling reveals that there are two modes of knowledge production. The first and more fundamental mode employs relatively small, “core” teams. Core teams form by a Poisson process and produce a Poisson distribution of team sizes in which larger teams are exceedingly rare. The second mode employs “extended” teams, which started as core teams, but subsequently accumulated new members …

Citation content analysis (CCA): A framework for syntactic and semantic analysis of citation content (2013)
Guo Zhang, Ying Ding and Staša Milojević
Journal of the American Society for Information Science and Technology, 64 (7), 1490-1503

This study proposes a new framework for citation content analysis (CCA), for syntactic and semantic analysis of citation content that can be used to better analyze the rich sociocultural context of research behavior. This framework could be considered the next generation of citation analysis. The authors briefly review the history and features of content analysis in traditional social sciences and its previous application in library and information science (LIS). Based on critical discussion of the theoretical necessity of a new method as well as the limits of citation analysis, the nature and purposes of CCA are discussed, and potential procedures to conduct CCA, including principles to identify the reference scope, a two‐dimensional (citing and cited) and two‐module (syntactic and semantic) codebook, are provided and described. Future work and implications are also suggested.

Modes of collaboration in modern science: Beyond power laws and preferential attachment (2010)
Staša Milojević
Journal of the American Society for Information Science and Technology, 61 (7), 1410-1423

The goal of the study was to determine the underlying processes leading to the observed collaborator distribution in modern scientific fields, with special attention to nonpower‐law behavior. Nanoscience is used as a case study of a modern interdisciplinary field, and its coauthorship network for the 2000–2004 period is constructed from the NanoBank database. We find three collaboration modes that correspond to three distinct ranges in the distribution of collaborators: (1) for authors with fewer than 20 collaborators (the majority) preferential attachment does not hold and they form a log‐normal “hook” instead of a power law; (2) authors with more than 20 collaborators benefit from preferential attachment and form a power law tail; and (3) authors with between 250 and 800 collaborators are more frequent than expected because of the hyperauthorship practices in certain subfields.

Power law distributions in information science: Making the case for logarithmic binning (2010)
Staša Milojević
Journal of the American Society for Information Science and Technology, 61 (12), 2417-2425

We suggest partial logarithmic binning as the method of choice for uncovering the nature of many distributions encountered in information science (IS). Logarithmic binning retrieves information and trends “not visible” in noisy power law tails. We also argue that obtaining the exponent from logarithmically binned data using a simple least square method is in some cases warranted in addition to methods such as the maximum likelihood. We also show why often‐used cumulative distributions can make it difficult to distinguish noise from genuine features and to obtain an accurate power law exponent of the underlying distribution. The treatment is nontechnical, aimed at IS researchers with little or no background in mathematics.

Scientometrics (2012)
Loet Leydesdorff and Staša Milojević
arXiv preprint arXiv:1208.4566

The paper provides an overview of the field of scientometrics, that is: the study of science, technology, and innovation from a quantitative perspective. We cover major historical milestones in the development of this specialism from the 1960s to today and discuss its relationship with the sociology of scientific knowledge, the library and information sciences, and science policy issues such as indicator development. The disciplinary organization of scientometrics is analyzed both conceptually and empirically, using a map of journals cited in the core journal of the field, entitled Scientometrics. A state-of-the-art review of five major research threads is provided: (1) the measurement of impact; (2) the delineation of reference sets; (3) theories of citation; (4) mapping science; and (5) the policy and management contexts of indicator developments.

Accuracy of simple, initials-based methods for author name disambiguation (2013)
Staša Milojević
Journal of Informetrics, 7 (4), 767-773

There are a number of solutions that perform unsupervised name disambiguation based on the similarity of bibliographic records or common coauthorship patterns. Whether the use of these advanced methods, which are often difficult to implement, is warranted depends on whether the accuracy of the most basic disambiguation methods, which only use the author's last name and initials, is sufficient for a particular purpose. We derive realistic estimates for the accuracy of simple, initials-based methods using simulated bibliographic datasets in which the true identities of authors are known. Based on the simulations in five diverse disciplines we find that the first initial method already correctly identifies 97% of authors. An alternative simple method, which takes all initials into account, is typically two times less accurate, except in certain datasets that can be identified by applying a simple criterion. Finally, we introduce a …

Topics in dynamic research communities: An exploratory study for the field of information retrieval (2012)
Erjia Yan, Ying Ding, Staša Milojević and Cassidy R Sugimoto
Journal of Informetrics, 6 (1), 140-153

Research topics and research communities are not disconnected from each other: communities and topics are interwoven and co-evolving. Yet, scientometric evaluations of topics and communities have been conducted independently and synchronically, with researchers often relying on homogeneous unit of analysis, such as authors, journals, institutions, or topics. Therefore, new methods are warranted that examine the dynamic relationship between topics and communities. This paper examines how research topics are mixed and matched in evolving research communities by using a hybrid approach which integrates both topic identification and community detection techniques. Using a data set on information retrieval (IR) publications, two layers of enriched information are constructed and contrasted: one is the communities detected through the topology of coauthorship network and the other is the topics of the …

Information metrics (iMetrics): a research specialty with a socio-cognitive identity? (2013)
Staša Milojević and Loet Leydesdorff
Scientometrics, 95 (1), 141-157

“Bibliometrics”, “scientometrics”, “informetrics”, and “webometrics” can all be considered as manifestations of a single research area with similar objectives and methods, which we call “information metrics” or iMetrics. This study explores the cognitive and social distinctness of iMetrics with respect to the general information science (IS), focusing on a core of researchers, shared vocabulary and literature/knowledge base. Our analysis investigates the similarities and differences between four document sets. The document sets are drawn from three core journals for iMetrics research (Scientometrics, Journal of the American Society for Information Science and Technology, and Journal of Informetrics). We split JASIST into document sets containing iMetrics and general IS articles. The volume of publications in this representation of the specialty has increased rapidly during the last decade. A core of researchers …

Upper tag ontology for integrating social tagging data (2010)
Ying Ding, Elin K Jacob, Michael Fried, Ioan Toma, Erjia Yan, Schubert Foo ...
Journal of the American Society for Information Science and Technology, 61 (3), 505-521

Data integration and mediation have become central concerns of information technology over the past few decades. With the advent of the Web and the rapid increases in the amount of data and the number of Web documents and users, researchers have focused on enhancing the interoperability of data through the development of metadata schemes. Other researchers have looked to the wealth of metadata generated by bookmarking sites on the Social Web. While several existing ontologies have capitalized on the semantics of metadata created by tagging activities, the Upper Tag Ontology (UTO) emphasizes the structure of tagging activities to facilitate modeling of tagging data and the integration of data from different bookmarking sites as well as the alignment of tagging ontologies. UTO is described and its utility in modeling, harvesting, integrating, searching, and analyzing data is demonstrated with metadata …

Referenced Publication Years Spectroscopy applied to iMetrics: Scientometrics, Journal of Informetrics, and a relevant subset of JASIST (2014)
Loet Leydesdorff, Lutz Bornmann, Werner Marx and Staša Milojević
Journal of Informetrics, 8 (1), 162-174

We have developed a (freeware) routine for “Referenced Publication Years Spectroscopy” (RPYS) and apply this method to the historiography of “iMetrics,” that is, the junction of the journals Scientometrics, Informetrics, and the relevant subset of JASIST (approx. 20%) that shapes the intellectual space for the development of information metrics (bibliometrics, scientometrics, informetrics, and webometrics). The application to information metrics (our own field of research) provides us with the opportunity to validate this methodology, and to add a reflection about using citations for the historical reconstruction. The results show that the field is rooted in individual contributions of the 1920s to 1950s (e.g., Alfred J. Lotka), and was then shaped intellectually in the early 1960s by a confluence of the history of science (Derek de Solla Price), documentation (e.g., Michael M. Kessler's “bibliographic coupling”), and “citation …

Citations: Indicators of quality? The impact fallacy (2016)
Loet Leydesdorff, Lutz Bornmann, Jordan A Comins and Staša Milojević
Frontiers in Research Metrics and Analytics, 1, 1

We argue that citation is a composed indicator: short-term citations can be considered as currency at the research front, whereas long-term citations can contribute to the codification of knowledge claims into concept symbols. Knowledge claims at the research front are more likely to be transitory and are therefore problematic as indicators of quality. Citation impact studies focus on short-term citation, and therefore tend to measure not epistemic quality, but involvement in current discourses in which contributions are positioned by referencing. We explore this argument using three case studies: (1) citations of the journal Soziale Welt as an example of a venue that tends not to publish papers at a research front, unlike, for example, JACS; (2) Robert Merton as a concept symbol across theories of citation; and (3) the Multi-RPYS (“Multi-Referenced Publication Year Spectroscopy”) of the journals Scientometrics, Gene, and Soziale Welt. We show empirically that the measurement of “quality” in terms of citations can further be qualified: short-term citation currency at the research front can be distinguished from longer-term processes of incorporation and codification of knowledge claims into bodies of knowledge. The recently introduced Multi-RPYS can be used to distinguish between short-term and long-term impacts.

arXiv E‐prints and the journal of record: An analysis of roles and relationships (2014)
Vincent Larivière, Cassidy R Sugimoto, Benoit Macaluso, Staša Milojević, Blaise Cronin and Mike Thelwall
Journal of the Association for Information Science and Technology, 65 (6), 1157-1169

Since its creation in 1991, arXiv has become central to the diffusion of research in a number of fields. Combining data from the entirety of arXiv and the Web of Science (WoS), this article investigates (a) the proportion of papers across all disciplines that are on arXiv and the proportion of arXiv papers that are in the WoS, (b) the elapsed time between arXiv submission and journal publication, and (c) the aging characteristics and scientific impact of arXiv e‐prints and their published version. It shows that the proportion of WoS papers found on arXiv varies across the specialties of physics and mathematics, and that only a few specialties make extensive use of the repository. Elapsed time between arXiv submission and journal publication has shortened but remains longer in mathematics than in physics. In physics, mathematics, as well as in astronomy and astrophysics, arXiv versions are cited more promptly and decay …

How are academic age, productivity and collaboration related to citing behavior of researchers? (2012)
Staša Milojević
PloS one, 7 (11), e49176

References are an essential component of research articles and therefore of scientific communication. In this study we investigate referencing (citing) behavior in five diverse fields (astronomy, mathematics, robotics, ecology and economics) based on 213,756 core journal articles. At the macro level we find: (a) a steady increase in the number of references per article over the period studied (50 years), which in some fields is due to a higher rate of usage, while in others reflects longer articles and (b) an increase in all fields in the fraction of older, foundational references since the 1980s, with no obvious change in citing patterns associated with the introduction of the Internet. At the meso level we explore current (2006–2010) referencing behavior of different categories of authors (21,562 total) within each field, based on their academic age, productivity and collaborative practices. Contrary to some previous findings and expectations we find that senior researchers use references at the same rate as their junior colleagues, with similar rates of re-citation (use of same references in multiple papers). High Modified Price Index (MPI, which measures the speed of the research front more accurately than the traditional Price Index) of senior authors indicates that their research has the similar cutting-edge aspect as that of their younger colleagues. In all fields both the productive researchers and especially those who collaborate more use a significantly lower fraction of foundational references and have much higher MPI and lower re-citation rates, i.e., they are the ones pushing the research front regardless of researcher age. This paper introduces improved …