Regression analysis and introduction to linear models. Topics: Multiple regression, analysis of covariance, least-squares means, logistic regression, and non-linear regression. This course includes a one-hour computer lab and emphasizes hands-on applications to datasets from the health sciences.
Advanced presentation of statistical methods for comparing populations and estimating and testing associations between variables. Topics: Point estimation, confidence intervals, hypothesis testing, ANOVA models for one-, two-, and k-way classifications, multiple comparisons, chi-square test of homogeneity, Fisher's exact test, McNemar's test; measures of association, including odds ratios, relative risks, Mantel-Haenszel tests of association, and standardized rates; repeated measures ANOVA; and simple regression and correlation. This course includes a one-hour computing lab and emphasizes hands-on applications to datasets from the health-related sciences.
Statistical tools for analyzing experiments involving genomic data. Topics: Basic genetics and statistics, linkage analysis and map construction using genetic markers, association studies, quantitative trait loci (QTL) analysis with ANOVA, variance components analysis and marker regression (including multiple and partial regression), QTL mapping with interval mapping and composite interval mapping, LOD test, and supervised and unsupervised methods for gene expression microarray data across multiple conditions.
This course is intended for students interested in statistical computing. The goal of this course is to enable students to do essential computations and statistical analysis using SAS and R software. Topics include descriptive statistics, graphical presentation, estimation, hypothesis testing, sample size and power; emphasis on learning statistical methods and concepts through hands-on experience with real data.
This course provides an introduction to the fundamental principles for evaluating causal relationships based on the potential outcomes framework. Topics include the concept of causation versus association, Hill’s criteria, and sufficient causal conditions; causal estimands, assumptions, associated biases, and sensitivity analysis; graphical methods for displaying a causal structure and formulating directed acyclic graphs and other modeling strategies to evaluate causal mechanisms; use of randomization, quasi-experimental designs, other non-randomized experiments, and purely observational data; direct and indirect effects, and mediation models; propensity score-based methods with matching, weighting, and other sample equating approaches for point interventions; marginal structural models and g-estimation for time-varying treatment; and the role of ‘big data’, study design, and pragmatic trials for causal inference.
Prerequisites: 503 and 504
Preferred: 521 and 522 (can be concurrent) or permission of instructor
Introduces alternative methods for designing and analyzing comparative studies that may be used when some or all of the assumptions underlying the usual parametric methods are questionable. Topics: 1-, 2-, and k-sample location problems, randomized block and repeated measures designs, the independence problem, rank transformation tests, randomization tests, the 2-sample dispersion problem, and other topics as time permits.
This course provides students with useful methods for analyzing categorical data. Topics: Cross-classification tables, tests for independence, log-linear models, Poisson regression, ordinal logistic regression, and multinomial logistic regression.
Provides students with the probability and distribution theory necessary for the study of statistics. Topics: Axioms of probability theory, independence, conditional probability, random variables, discrete and continuous probability distributions, functions of random variables, moment generating functions, the Law of Large Numbers, and the Central Limit Theorem.
Introduces principles of statistical inference. Classical methods of estimation, tests of significance, and Neyman-Pearson Theory of testing hypotheses, maximum likelihood methods, and Bayesian statistics are introduced and developed.
Since the completion of the Human Genome Project, there has been a burgeoning field of new applications for statistics involving high-throughput experiments designed to gather large amounts of information on biological systems. This course focuses on the wide array of approaches and technologies used to gather this information and the statistical issues involved, from initial data processing steps to end-stage research objectives. Specifically, time permitting, the technologies we will examine include two-dimensional protein gel electrophoresis, protein mass spectrometry, and several flavors of microarray experiments. Much of the work for the course will involve analyzing data sets from class and from the text using the R language.
Introduction to fundamental principles and planning techniques for designing and analyzing statistical experiments. Recommended for students in applied fields. Topics: Justification for randomized controlled clinical trials, methods of randomization, blinding and placebos, ethical issues, parallel groups design, crossover trials, inclusion of covariates, determining sample size, sequential designs, interim analyses, repeated measures studies.
Introduction to theory and practice of sample surveys involving collection of statistical data from planned surveys.
Introduces factorial experiments, fractional factorial experiments, confounding, lattice designs, various incomplete block designs, efficiency of experimentation, and problems of design construction.
Deals with statistical methods for estimation and testing hypotheses when samples are observed and analyzed sequentially.
This course presents the topic of data mining from a statistical perspective, with attention directed towards both applied and theoretical considerations. An emphasis will be placed on supervised learning methods. Topics include: linear and logistic regression, discriminant analysis, shrinkage methods, subset selection, dimension reduction techniques, classification and regression trees, ensemble methods, neural networks, and random forests. Model selection and estimation of generalization error will be emphasized. Considerations and issues that arise with high-dimensional (N<<p) applications will be highlighted. Applications will be presented in R to illustrate methods and concepts.
This course presents the topic of data mining from a statistical perspective, with attention directed towards both applied and theoretical considerations. An emphasis will be placed on unsupervised learning methods, especially those designed to discover and model patterns in data. Applications to high-dimensional data (N<<p) and big data (N>>p) will be highlighted. Topics include: market basket analysis, hierarchical and center-based clustering, self-organizing maps, factor analysis, computer vision, eigenfaces, data visualization, and graphical models. Applications will be presented in Matlab and R to illustrate methods and concepts.
For graduate students who have had an introduction to probability theory and advanced calculus. Concepts, properties, basic theory, and applications of stochastic processes.
Introduction to methods for analyzing longitudinal and time series data. Topics: Random coefficient regression models, growth curve analysis, hierarchical linear models, general mixed models, autoregressive and moving average models for time series data, and the analysis of cross-section time series data.
The Bayesian approach to statistical design and analysis can be viewed as a philosophical approach or as a procedure-generator. The use of Bayesian design and analysis is burgeoning. In this introduction to Bayesian methods, we consider basic examples of Bayesian thinking and formalism on which more complicated and comprehensive approaches are built. These include adjusting estimates using related information, the use of Bayes Factors in testing of hypotheses, the relationship of the prior and posterior distributions, and the key steps in a Bayesian analysis. We consider the Bayesian approach that requires a data likelihood (the sampling distribution) and a prior distribution. From these, the posterior distribution can be computed and used to inform statistical design and analysis. Applications of this technique are presented.
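As an illustration of how a prior distribution and a data likelihood combine into a posterior, the following is a minimal sketch and not part of the course materials. It assumes a binomial likelihood with a conjugate Beta prior; the function names and the example counts (7 successes, 3 failures) are hypothetical.

```python
from fractions import Fraction


def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Return the (a, b) parameters of the Beta posterior.

    Assumes a Beta(prior_a, prior_b) prior on a success probability and
    a binomial likelihood; conjugacy makes the update a simple addition.
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact with Fraction."""
    return Fraction(a, a + b)


# Hypothetical example: a uniform Beta(1, 1) prior and 7 successes in 10 trials.
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("prior mean:    ", beta_mean(*prior))
print("posterior:      Beta", posterior)
print("posterior mean:", beta_mean(*posterior))
```

In this sketch the posterior mean (2/3) falls between the prior mean (1/2) and the observed proportion (7/10), which is one simple view of adjusting an estimate using related information.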
This course aims to develop both technical and soft skills that are not directly taught in traditional courses but are relevant to modern data science. Students will be required to work in interdisciplinary teams in a data challenge sponsored by SAGE Bionetworks. These data challenges pose some of the most relevant problems in biomedicine, and students will compete with other teams across the globe. The course will blend traditional lectures and training with group-based learning. Modules that cover efficient shared computing, scientific presentations, technical writing, and group work strategies will be emphasized. Scientific topics will be tailored to the nature of the selected challenges. However, a strong foundational knowledge of data mining and statistical computing will be required.
An advanced course on the use of life tables and the analysis of failure time data. Topics: Use of Kaplan-Meier survival curves, use of the log-rank test, Cox proportional hazards model, evaluating the proportionality assumption, dealing with non-proportionality, stratified Cox procedure, extension to time-dependent variables, and comparison with logistic regression approaches.
Presents methods for analyzing multiple outcome variables simultaneously, and for classification and variable reduction. Topics: Multivariate normal distribution; simple, partial, and multiple correlation; Hotelling's T-squared; multivariate analysis of variance; the general linear hypothesis; discriminant analysis; cluster analysis; principal components analysis; and factor analysis.