Michael Trosset

  • mtrosset@indiana.edu
  • (812) 856-7824
  • Professor
    Statistics

Field of study

  • Computational statistics, statistical learning, multidimensional scaling, nonlinear dimension reduction, classification, clustering

Education

  • Ph.D., University of California, Los Angeles, 1989

Research interests

  • Computational Statistics, especially problems that involve numerical optimization, e.g., the development of tractable formulations of and efficient numerical algorithms for multidimensional scaling and other methods for embedding dissimilarity data.
  • Statistical Learning, i.e., multivariate data-analytic techniques for nonlinear dimension reduction (manifold learning), classification, and clustering. Current interests include the application of distance geometry to the problem of inferring 3-dimensional molecular structure from distance restraints, and various high-dimensional classification problems in bioinformatics.
  • Design & Analysis of Computer Experiments, specifically for the purpose of optimizing computationally expensive computer simulations. Current interests include the application of statistical decision theory to computer-assisted robust design.
  • Stochastic Optimization and Response Surface Methodology, especially for tuning the inputs of highly nonlinear stochastic simulations and estimating the parameters of analytically intractable stochastic processes. Current interests include developing quasi-Newton methods for optimization in the presence of random noise.

Representative publications

Approximate Information Tests on Statistical Submanifolds (2019)
Michael W. Trosset, Carey E. Priebe

Parametric inference posits a statistical model that is a specified family of probability distributions. Restricted inference, e.g., restricted likelihood ratio testing, attempts to exploit the structure of a statistical submodel that is a subset of the specified family. We consider the problem of testing a simple hypothesis against alternatives from such a submodel. In the case of an unknown submodel, it is not clear how to realize the benefits of restricted inference. To do so, we first construct information tests that are locally asymptotically equivalent to likelihood ratio tests. Information tests are conceptually appealing but (in general) computationally intractable. However, unlike restricted likelihood ratio tests, restricted information tests can be approximated even when the statistical submodel is unknown. We construct approximate information tests using manifold learning procedures to extract information from samples of an unknown (or intractable) submodel, thereby providing a roadmap for computational solutions to a class of previously impenetrable problems in statistical inference. Examples illustrate the efficacy of the proposed methodology.

On the Power of Likelihood Ratio Tests in Dimension-Restricted Submodels (2016)
Michael W. Trosset, Mingyue Gao, Carey E. Priebe

Likelihood ratio tests are widely used to test statistical hypotheses about parametric families of probability distributions. If interest is restricted to a subfamily of distributions, then it is natural to inquire if the restricted LRT is superior to the unrestricted LRT. Marden's general LRT conjecture posits that any restriction placed on the alternative hypothesis will increase power. The only published counterexample to this conjecture is rather technical and involves a restriction that maintains the dimension of the alternative. We formulate the dimension-restricted LRT conjecture, which posits that any restriction that replaces a parametric family with a subfamily of lower dimension will increase power. Under standard regularity conditions, we then demonstrate that the restricted LRT is asymptotically more powerful than the unrestricted LRT for local alternatives. Remarkably, however, even the dimension-restricted LRT conjecture fails in the case of finite samples. Our counterexamples involve subfamilies of multinomial distributions. In particular, our study of the Hardy-Weinberg subfamily of trinomial distributions provides a simple and elegant demonstration that restrictions may not increase power.

Fast Embedding for JOFC Using the Raw Stress Criterion (2015)
Vince Lyzinski, Youngser Park, Carey E. Priebe, Michael W. Trosset
Journal of Computational and Graphical Statistics, 26 (4)

The Joint Optimization of Fidelity and Commensurability (JOFC) manifold matching methodology embeds an omnibus dissimilarity matrix consisting of multiple dissimilarities on the same set of objects. One approach to this embedding optimizes the preservation of fidelity to each individual dissimilarity matrix together with commensurability of each given observation across modalities via iterative majorizations of a raw stress error criterion by successive Guttman transforms. In this paper, we exploit the special structure inherent to JOFC to exactly and efficiently compute the successive Guttman transforms, and as a result we are able to greatly speed up and parallelize the JOFC procedure. We demonstrate the scalability of our implementation on both real and simulated data examples.

Parallel deterministic and stochastic global minimization of functions with very many minima (2013)
David R. Easterling, Layne T. Watson, Michael L. Madigan, Brent S. Castle, Michael W. Trosset
Computational Optimization and Applications, 57 (2), 469–492

The optimization of three problems with high dimensionality and many local minima is investigated under five different optimization algorithms: DIRECT, simulated annealing, Spall’s SPSA algorithm, the KNITRO package, and QNSTOP, a new algorithm developed at Indiana University.

Supplementary Material (2014)
David Robert Easterling, Layne T. Watson, Michael Madigan, Brent S. Castle, Michael W. Trosset

Fortran 95 implementation of QNSTOP for global and stochastic optimization (2014)
Brandon Amos, David Robert Easterling, L.T. Watson, B.S. Castle, Michael W. Trosset, William I. Thacker

A serial Fortran 95 implementation of the QNSTOP algorithm is presented. QNSTOP is a class of quasi-Newton methods for stochastic optimization with variations for deterministic global optimization. This discussion provides results from testing on various deterministic and stochastic optimization functions.

Adjusting process count on demand for petascale global optimization (2012)
Masha Sosonkina, Layne T. Watson, Nicholas R. Radcliff, Rafael T. Haftka, Michael W. Trosset
Parallel Computing, 39 (1), 21-35

There are many challenges that need to be met before efficient and reliable computation at the petascale is possible. Many scientific and engineering codes running at the petascale are likely to be memory intensive, which makes thrashing a serious problem for many petascale applications. One way to overcome this challenge is to use a dynamic number of processes, so that the total amount of memory available for the computation can be increased on demand. This paper describes modifications made to the massively parallel global optimization code pVTdirect in order to allow for a dynamic number of processes. In particular, the modified version of the code monitors memory use and spawns new processes if the amount of available memory is determined to be insufficient. The primary design challenges are discussed, and performance results are presented and analyzed.

Direct search and stochastic optimization applied to two nonconvex nonsmooth problems (2012)
David Robert Easterling, Layne T. Watson, Michael Madigan, Brent S. Castle, Michael W. Trosset
Proceedings of the 2012 Symposium on High Performance Computing

The optimization of two problems with high dimensionality and many local minima is investigated under two different optimization algorithms: DIRECT and QNSTOP, a new algorithm developed at Indiana University.

Interference competition in desert subterranean termites (2011)
S. C. Jones, Michael W. Trosset
Entomologia Experimentalis et Applicata, 61 (1), 83-90

We examined interspecific aggression between two subterranean termite species, Heterotermes aureus (Snyder) (Rhinotermitidae) and Gnathamitermes perplexus (Banks) (Termitidae). In laboratory tests with worker termites, neither species was the inherently superior fighter, but rather the outcome of interspecific encounters depended on the number of conspecifics. We then investigated patterns of resource use by these species during a 13-month period in the Sonoran Desert. Baits consisted of toilet-paper rolls, which have been shown to be a mutually acceptable food source. Analyses of foraging activity demonstrated that the two species did not forage independently of each other. Not only were the two species negatively associated spatially, but extended periods of temporal segregation were observed. G. perplexus took significantly longer to return to sites that it had simultaneously occupied with H. aureus than to sites that G. perplexus had occupied alone. The pattern of co-occurrence of these two species is consistent with the hypothesis that interspecific interference competition affects their spatial and temporal distribution.

Euclidean and circum-Euclidean distance matrices: Characterizations and linear preservers (2010)
Chi-Kwong Li, Thomas Milligan, Michael W. Trosset
Electronic Journal of Linear Algebra, 20 (1)

Short proofs are given to various characterizations of the (circum-)Euclidean squared distance matrices. Linear preserver problems related to these matrices are discussed.

Dissertation Committee Service

Author | Dissertation Title | Committee
Blaha, Leslie | A Dynamic Hebbian-style Model of Configural Learning (December 2010) | Townsend, J. (Co-Chair), Busey, T. (Co-Chair), Gold, J., Trosset, M.