Introduces the basic principles of probability, distribution theory, and statistical inference. Topics include axioms of probability theory, independence, conditional probability, random variables, discrete and continuous distributions, functions of random variables, moment generating functions, central limit theorem, point and interval estimation, maximum likelihood methods, tests of significance, and the Neyman-Pearson theory of testing hypotheses.
May not be used as credit for MA students in biostatistics.
Regression analysis and introduction to linear models. Topics: Multiple regression, analysis of covariance, least squares means, logistic regression, and non-linear regression. This course includes a one-hour computer lab and emphasizes hands-on applications to datasets from the health sciences.
Advanced presentation of statistical methods for comparing populations and for estimating and testing associations between variables. Topics: Point estimation; confidence intervals; hypothesis testing; ANOVA models for 1-, 2-, and k-way classifications; multiple comparisons; chi-square test of homogeneity; Fisher's exact test; McNemar's test; measures of association, including odds ratios, relative risks, Mantel-Haenszel tests of association, and standardized rates; repeated measures ANOVA; simple regression and correlation. This course includes a one-hour computing lab and emphasizes hands-on applications to datasets from the health-related sciences.
Statistical tools for analyzing experiments involving genomic data. Topics: Basic genetics and statistics, linkage analysis and map construction using genetic markers, association studies, quantitative trait loci (QTL) analysis with ANOVA, variance components analysis and marker regression (including multiple and partial regression), QTL mapping with interval mapping and composite interval mapping, the LOD test, and supervised and unsupervised methods for gene expression microarray data across multiple conditions.
This course provides the background in special topics in mathematics required to succeed in the biostatistics graduate programs and is required for students who have not had an advanced calculus and/or matrix algebra course. The basic mathematical concepts relevant to statistical studies will be discussed. Topics: convergence of sequences of sets, numbers, and functions; convergence of series; uniform convergence; power series; term-by-term integration and differentiation; matrix algebra; and other topics as time permits.
Introduces alternative methods for designing and analyzing comparative studies that may be used when some or all of the assumptions underlying the usual parametric methods are questionable. Topics: 1-, 2-, and k-sample location problems, randomized block and repeated measures designs, the independence problem, rank transformation tests, randomization tests, the 2-sample dispersion problem, and other topics as time permits.
This course provides students with useful methods for analyzing categorical data. Topics: Cross-classification tables, tests for independence, log-linear models, Poisson regression, ordinal logistic regression, and multinomial logistic regression.
It can be said there are no new problems in statistics, only new applications. Since the completion of the Human Genome Project, there has been a burgeoning field of new applications for statistics involving high-throughput experiments designed to gather large amounts of information on biological systems. This course focuses on the wide array of approaches and technologies used to gather this information and on the statistical issues involved, from initial data processing steps to end-stage research objectives. Specifically, time permitting, the technologies we will examine include two-dimensional protein gel electrophoresis, protein mass spectrometry, and several flavors of microarray experiments. We will use the text "Bioinformatics and Computational Biology Solutions Using R and Bioconductor". Much of the work for the course will involve analyzing data sets from class and from the text using the R language.
Introduction to fundamental principles and planning techniques for designing and analyzing statistical experiments. Recommended for students in applied fields. Topics: Justification for randomized controlled clinical trials, methods of randomization, blinding and placebos, ethical issues, parallel groups design, crossover trials, inclusion of covariates, determining sample size, sequential designs, interim analyses, repeated measures studies.
This course is a continuation of the introduction to the statistical analysis of data and statistical design of experiments, with an emphasis on regression methods. The material covered includes study design and the role of regression methods, simple linear regression, multiple regression, generalized linear models with a focus on logistic and Poisson outcomes, interactions, confounding variables, other regression models as time allows, and statistical software usage. Statistical techniques will be demonstrated using real-world examples. This is a hands-on course: students will be doing calculations and analyses, not just interpreting analyses done by others.
Instructor: Kristopher Attwood, PhD
Format: seated
Introduction to theory and practice of sample surveys involving collection of statistical data from planned surveys.
Introduces factorial experiments, fractional factorial experiments, confounding, lattice designs, various incomplete block designs, efficiency of experimentation, and problems of design construction.
Deals with statistical methods for estimation and testing hypotheses when samples are observed and analyzed sequentially.
This course presents the topic of data mining from a statistical perspective, with attention directed towards both applied and theoretical considerations. An emphasis will be placed on supervised learning methods. Topics include: linear and logistic regression, discriminant analysis, shrinkage methods, subset selection, dimension reduction techniques, classification and regression trees, ensemble methods, neural networks, and random forests. Model selection and estimation of generalization error will be emphasized. Considerations and issues that arise with high-dimensional (N<<p) applications will be highlighted. Applications will be presented in R to illustrate methods and concepts.
This course presents the topic of data mining from a statistical perspective, with attention directed towards both applied and theoretical considerations. An emphasis will be placed on unsupervised learning methods, especially those designed to discover and model patterns in data. Applications to high-dimensional data (N<<p) and big data (N>>p) will be highlighted. Topics include: market basket analysis, hierarchical and center-based clustering, self-organizing maps, factor analysis, computer vision, eigenfaces, data visualization, and graphical models. Applications will be presented in MATLAB and R to illustrate methods and concepts.
For graduate students who have had an introduction to probability theory and advanced calculus. Concepts, properties, basic theory, and applications of stochastic processes.
Introduction to methods for analyzing longitudinal and time series data. Topics: Random coefficient regression models, growth curve analysis, hierarchical linear models, general mixed models, autoregressive and moving average models for time series data, and the analysis of cross-section time series data.
The Bayesian approach to statistical design and analysis can be viewed as a philosophical approach or as a procedure-generator. The use of Bayesian design and analysis is burgeoning. In this introduction to Bayesian methods, we consider basic examples of Bayesian thinking and formalism on which more complicated and comprehensive approaches are built. These include adjusting estimates using related information, the use of Bayes factors in testing hypotheses, the relationship between the prior and posterior distributions, and the key steps in a Bayesian analysis. We consider the Bayesian approach that requires a data likelihood (the sampling distribution) and a prior distribution. From these, the posterior distribution can be computed and used to inform statistical design and analysis. Applications of this technique are presented.
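As a minimal illustration of the prior-to-posterior step described above (a sketch, not taken from the course materials), the Python snippet below assumes a binomial likelihood with a conjugate Beta prior, in which case the posterior is again a Beta distribution. The function names and the data (7 responders among 10 patients, uniform prior) are hypothetical.

```python
from fractions import Fraction


def beta_binomial_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: a Beta(prior_a, prior_b) prior combined with
    binomial data yields a Beta(prior_a + successes, prior_b + failures)
    posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact as a fraction."""
    return Fraction(a, a + b)


# Hypothetical data: 7 responders among 10 patients, uniform Beta(1, 1) prior.
post_a, post_b = beta_binomial_posterior(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(post_a, post_b)

print(f"Posterior: Beta({post_a}, {post_b}), mean = {posterior_mean}")
```

With a uniform prior, the posterior mean is pulled slightly from the sample proportion toward the prior mean of 1/2; a more informative prior would pull it further.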
It can be said there are no new problems in statistics, only new applications. Since the completion of the Human Genome Project, there has been a burgeoning field of new applications for statistics involving high-throughput experiments designed to gather large amounts of information on biological systems. This course focuses on the wide array of approaches and technologies used to gather this information and on the statistical issues involved, from initial data processing steps to end-stage research objectives. Specifically, time permitting, the technologies we will examine include two-dimensional protein gel electrophoresis, protein mass spectrometry, several flavors of microarrays, and Xerogel sensor experiments.
We will use the text "Bioinformatics and Computational Biology Solutions Using R and Bioconductor". Much of the work for the course will involve analyzing data sets from class and from the text using the R language.
Provides an advanced course on the use of life tables and analysis of failure time data. Topics: Use of Kaplan-Meier survival curves, use of the log-rank test, Cox proportional hazards model, evaluating the proportionality assumption, dealing with non-proportionality, stratified Cox procedure, extension to time-dependent variables, and comparison with logistic regression approaches.
Presents methods for analyzing multiple outcome variables simultaneously, and for classification and variable reduction. Topics: Multivariate normal distribution; simple, partial, and multiple correlation; Hotelling's T-squared; multivariate analysis of variance; general linear hypothesis; discriminant analysis; cluster analysis; principal components analysis; and factor analysis.
Prerequisite: None
This course is intended to provide a basic introduction to principles and methods of epidemiology. The course emphasizes the conceptual aspects of epidemiologic investigation and application of these concepts in public health and related professions. Topics include overview of the epidemiologic approach to studying disease; the natural history of disease; measures of disease occurrence, association and risk; epidemiologic study designs; disease surveillance; population screening; interpreting epidemiologic associations; causal inference using epidemiologic information; and application of these basic concepts in the context of selected major diseases and risk factors. Please note that this course cannot be used for degrees that require EEH 501 unless pre-approved by the program director, or as a prerequisite for courses that require EEH 501.
Format: Online
Corequisite: Students must enroll in STA 527 LEC and STA 527 REC in the same term.
This course is designed for students concerned with medical data. The material covered includes: the design of clinical trials and epidemiological studies; data collection; summarizing and presenting data; probability; standard error; confidence intervals and significance tests; techniques of data analysis, including multifactorial methods and the choice of statistical methods; problems of medical measurement and diagnosis; and vital statistics and calculation of sample size. The design and analysis of medical research studies will be illustrated. MINITAB is used to perform some data analysis. Descriptive statistics, probability distributions, estimation, tests of hypotheses, categorical data, regression models, analysis of variance, nonparametric methods, and other topics will be discussed as time permits.
Instructor: Kuhlmann
Format: seated and online