Luis Rocha

  • rocha@indiana.edu
  • (812) 856-1832
  • Professor
    Computer Science
  • Professor
    Informatics

Field of study

  • Complex systems modeling, distributed artificial intelligence and artificial life, computational and mathematical biology, adaptive agents, information retrieval, data mining, and uncertainty modeling

Representative publications

Singular value decomposition and principal component analysis (2003)
Michael E Wall, Andreas Rechtsteiner and Luis M Rocha
Springer US. 91-109

One of the challenges of bioinformatics is to develop effective ways to analyze global gene expression data. A rigorous approach to gene expression analysis must involve an up-front characterization of the structure of the data. Singular value decomposition (SVD) and principal component analysis (PCA) can be valuable tools in obtaining such a characterization. SVD and PCA are common techniques for analysis of multivariate data. A single microarray experiment can generate measurements for tens of thousands of genes. Present experiments typically consist of fewer than ten assays, but can consist of hundreds (Hughes et al., 2000). Gene expression data are currently rather noisy, and SVD can detect and extract small signals from noisy data. The goal of this chapter is to provide precise explanations of the use of SVD and PCA for gene expression analysis, illustrating methods using simple examples. We …

Computational fact checking from knowledge networks (2015)
Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M Rocha, Johan Bollen, Filippo Menczer and Alessandro Flammini
PLoS ONE, 10 (6), e0128193. doi:10.1371/journal.pon

Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.

The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text (2011)
Martin Krallinger, Miguel Vazquez, Florian Leitner, David Salgado, Andrew Chatr-Aryamontri, Andrew Winter ...
BMC bioinformatics, 12 (8), S3

Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. Therefore the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI relevant and non-relevant PubMed abstracts labeled manually by domain experts and recording also the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC …

Selected self-organization and the semiotics of evolutionary systems (1998)
Luis Mateus Rocha
Springer, Dordrecht. 341-358

Heinz von Foerster (1965, 1969, 1977) equated the ability of an organization to classify its environment with the notion of eigenbehavior. He postulated the existence of some stable structures (eigenvalues) which are maintained in the operations of an organization’s dynamics. Following Piaget (von Foerster, 1977), he observed that any specific instance of observation of such an organization will still be the result of an indefinite succession of cognitive/sensory-motor operations. This reiterated the constructivist position that observables do not refer directly to real world objects, but are instead the result of an infinite cascade of cognitive and sensory-motor operations in some environment/subject coupling. Eigenvalues are self-defining, or self-referent, through the imbedding dynamics — implying a complementary relationship (circularity, closure) between eigen-values and cognitive/sensory-motor operators …

Towards semiotic agent-based models of socio-technical organizations (2000)
Cliff Joslyn and Luis Rocha
Proc. AI, Simulation and Planning in High Autonomy Systems (AIS 2000) Conference, Tucson, Arizona, 70-79

We present an approach to agent modeling of socio-technical organizations based on the principles of semiotics. After reviewing complex systems theory and traditional Artificial Life (ALife) and Artificial Intelligence (AI) approaches to agent-based modeling, we introduce the fundamental principles of semiotic agents as decision-making entities embedded in artificial environments and exchanging and interpreting semiotic tokens. We proceed to discuss the design requirements for semiotic agents, including those for artificial environments with a rich enough “virtual physics” to support selected self-organization; semiotic agents as implementing a generalized control relation; and situated communication and shared knowledge within a community of such agents. We conclude with a discussion of the resulting properties of such systems for dynamical incoherence, and finally describe an application to the simulation of the decision structures of Command and Control Organizations.

Complex systems modeling: Using metaphors from nature in simulation and scientific models (1999)
Luis M Rocha
BITS: Computer and Communications News (November)

Symbiotic Intelligence: self-organizing knowledge on distributed networks driven by human interaction (1998)
Norman Johnson, Steen Rasmussen, Cliff Joslyn, Luis Rocha, Steven Smith and Marianna Kantor
MIT Press. 403-407

Through conceptual examples and demonstrations, we argue that the symbiotic combination of the Internet and humans will result in a significant enhancement of the previously existing, self-organizing social structure of humans. The combination of the unique capabilities of intelligent, distributed information systems (the relatively loss-less transmission and capturing of detailed signatures) with the unique capabilities of humans (processing and analysis of complex, but limited, systems) will enable essential problem solving within our increasingly complex world. The capability may allow solutions that are not achievable directly by individuals, organizations or governments.

Control of complex networks requires both structure and dynamics (2016)
Alexander J Gates and Luis M Rocha
Scientific Reports, 6 (24456), doi:10.1038/srep24456

The study of network structure has uncovered signatures of the organization of complex systems. However, there is also a need to understand how to control them; for example, identifying strategies to revert a diseased cell to a healthy state, or a mature cell to a pluripotent state. Two recent methodologies suggest that the controllability of complex systems can be predicted solely from the graph of interactions between variables, without considering their dynamics: structural controllability and minimum dominating sets. We demonstrate that such structure-only methods fail to characterize controllability when dynamics are introduced. We study Boolean network ensembles of network motifs as well as three models of biochemical regulation: the segment polarity network in Drosophila melanogaster, the cell cycle of budding yeast Saccharomyces cerevisiae, and the floral organ arrangement in Arabidopsis thaliana. We …

Evolution with material symbol systems (2001)
Luis Mateus Rocha
Biosystems, 60 (1-3), 95-121

Pattee's semantic closure principle is used to study the characteristics and requirements of evolving material symbols systems. By contrasting agents that reproduce via genetic variation with agents that reproduce via self-inspection, we reach the conclusion that symbols are necessary to attain open-ended evolution, but only if the phenotypes of agents are the result of a material, self-organization process. This way, a study of the inter-dependencies of symbol and matter is presented. This study is based first on a theoretical treatment of symbolic representations, and secondly on simulations of simple agents with matter-symbol inter-dependencies. The agent-based simulations use evolutionary algorithms with indirectly encoded phenotypes. The indirect encoding is based on Fuzzy Development programs, which are procedures for combining fuzzy sets in such a way as to model self-organizing development processes.

Material representations: from the genetic code to the evolution of cellular automata (2005)
Luis Mateus Rocha and Wim Hordijk
Artificial life, 11 (1-2), 189-214

We present a new definition of the concept of representation for cognitive science that is based on a study of the origin of structures that are used to store memory in evolving systems. This study consists of novel computer experiments in the evolution of cellular automata to perform nontrivial tasks as well as evidence from biology concerning genetic memory. Our key observation is that representations require inert structures to encode information used to construct appropriate dynamic configurations for the evolving system. We propose criteria to decide if a given structure is a representation by unpacking the idea of inert structures that can be used as memory for arbitrary dynamic configurations. Using a genetic algorithm, we evolved cellular automata rules that can perform nontrivial tasks related to the density task (or majority classification problem) commonly used in the literature. We present the particle catalogs of …

Eigenbehavior and symbols (1996)
Luis Mateus Rocha
Systems Research, 13 (3), 371-384

In this paper I sketch a rough taxonomy of self‐organization which may be of relevance in the study of cognitive and biological systems. I frame the problem both in terms of the language Heinz von Foerster used to formulate much of second‐order cybernetics as well as the language of current theories of self‐organization and complexity. In particular, I defend the position that, on the one hand, self‐organization alone is not rich enough for our intended simulations, and on the other, that genetic selection in biology and symbolic representation in cognitive science alone leave out the very important (self‐organizing) characteristics of particular embodiments of evolving and learning systems. I propose the acceptance of the full concept of symbol with its syntactic, semantic, and pragmatic dimensions. I argue that the syntax should be treated operationally in second‐order cybernetics.

Evidence sets and contextual genetic algorithms: Exploring uncertainty, context, and embodiment in cognitive and biological systems (1998)
Luis Mateus Rocha
2655-2655

Degree: Ph.D. Degree year: 1997. Institute: State University of New York at Binghamton.

This dissertation proposes a systems-theoretic framework to model biological and cognitive systems which requires both self-organizing and symbolic dimensions. The framework is based on an inclusive interpretation of semiotics as a conceptual theory used for the simulation of complex systems capable of representing, as well as evolving in their environments, with implications for Artificial Intelligence and Artificial Life. This evolving semiotics is referred to as Selected Self-Organization when applied to biological systems, and Evolutionary Constructivism when applied to cognitive systems. Several formal avenues are pursued to define tools necessary to build models under this framework.

Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks (2008)
Alaa Abi-Haidar, Jasleen Kaur, Ana Maguitman, Predrag Radivojac, Andreas Rechtsteiner, Karin Verspoor ...
Genome biology, 9 (2), S11

We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to …

Evaluation of the host transcriptional response to human cytomegalovirus infection (2004)
Jean F Challacombe, Andreas Rechtsteiner, Raphael Gottardo, Luis M Rocha, Edward P Browne, Thomas Shenk ...
Physiological genomics, 18 (1), 51-62

Gene expression data from human cytomegalovirus (HCMV)-infected cells were analyzed using DNA-Chip Analyzer (dChip) followed by singular value decomposition (SVD) and compared with a previous analysis of the same data that employed GeneChip software and a fold change filtering approach. dChip and SVD analysis revealed two clusters of coexpressed human genes responding differently to HCMV infection: one containing some genes identified previously, and another that was largely unique to this analysis. Annotating these genes, we identified several functional categories important to host cell responses to HCMV infection. These categories included genes involved in transcriptional regulation, oncogenesis, and cell cycle regulation, which were more prevalent in cluster 1, and genes involved in immune system regulation, signal transduction, and cell adhesion, which were more prevalent in cluster …

Contextual genetic algorithms: Evolving developmental rules (1995)
Luis Mateus Rocha
Springer, Berlin, Heidelberg. 368-382

A genetic algorithm scheme with a stochastic genotype/phenotype relation is proposed. The mechanisms responsible for this intermediate level of uncertainty are inspired by the biological system of RNA editing found in a variety of organisms. In biological systems, RNA editing represents a significant and potentially regulatory step in gene expression. The artificial algorithm presented here proposes the evolution of such regulatory steps as an aid to modeling the differentiated development of artificial organisms according to environmental, contextual constraints. This mechanism of genetic string editing is then used to define a genetic algorithm scheme, with good scaling and evolutionary properties, in which phenotypes are represented by mathematical structures based on fuzzy set and evidence theories.

Dissertation Committee Service

  • Author: Simas, Tiago De
    Dissertation Title: Stochastic Models and Transitivity in Complex Networks (May 2012)
    Committee: Rocha, L. (Chair), Sporns, O., Flammini, A., Bollen, J.