Yi's Knowledge Base
http://www.y1zhou.com/
Recent content on Yi's Knowledge Base
Hugo -- gohugo.io
en-us
Copyright 2019-{year}
Wed, 24 Nov 2021 00:00:00 +0000

Naive Bayes
http://www.y1zhou.com/series/data-mining/data-mining-naive-bayes/
Wed, 03 Feb 2021 15:55:00 -0400
We talk about one of the simplest classification methods, naive Bayes classifiers, and their applications in text classification. It's not really machine learning, as we only need a single pass through the data to compute the necessary values.

Introduction to Bayesian Statistics
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-introduction/
Wed, 13 Jan 2021 10:20:00 -0400
The first lecture of the course, and a brief history of Bayesian statistics.

Introduction
http://www.y1zhou.com/series/time-series/time-series-introduction/
Fri, 28 Aug 2020 19:05:11 -0400
We introduce some basic ideas of time series analysis and stochastic processes. Of particular importance are the concepts of stationarity and the autocovariance and sample autocovariance functions.

Matrices
http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrices/
Wed, 26 Aug 2020 15:14:34 -0400
Matrix algebra plays an important role in many areas of statistics, such as linear statistical models and multivariate analysis. In this chapter we introduce basic terminology, fundamental matrix operations, and some special types of matrices.

Estimation
http://www.y1zhou.com/series/linear-model/linear-models-estimation/
Mon, 30 Sep 2019 13:46:57 -0400
In this chapter we introduce the concept of linear models. We use the ordinary least squares estimator to get unbiased estimates of the unknown parameters.
$R^2$ is introduced as a measure of the goodness of fit, and the different types of sums of squares in a linear model are briefly discussed.

Basic Concepts
http://www.y1zhou.com/series/maths-stat/1-probability/mathematical-statistics-basic-concepts/
Wed, 25 Sep 2019 11:05:06 -0500
Introducing the concept of the probability of an event. Also covers set operations and the sample-point method.

Basic Concepts
http://www.y1zhou.com/series/nonparam-stat/1-introduction/nonparametric-methods-basic-concepts/
Fri, 25 Jan 2019 22:50:34 -0500
A brief introduction to what we're going to discuss in later chapters.

Conditional Probability
http://www.y1zhou.com/series/maths-stat/1-probability/mathematical-statistics-conditional-probability/
Thu, 26 Sep 2019 11:51:56 -0500
Introducing conditional probability and independence of events. Bayes' rule comes in as well.

Fundamentals of Nonparametric Methods
http://www.y1zhou.com/series/nonparam-stat/1-introduction/nonparametric-methods-fundamentals/
Fri, 25 Jan 2019 22:50:34 -0500
Some basic tools such as the permutation test and the binomial test. We also introduce order statistics and ranks, which will come in handy in later chapters.

Text Classification with Naive Bayes and NLTK
http://www.y1zhou.com/series/data-mining/data-mining-text-classification-naive-bayes-nltk/
Tue, 09 Feb 2021 15:55:00 -0400
In the last post we talked about the theoretical side of naive Bayes in text classification. Here we will implement the model in Python, both from scratch and using existing packages.
The corpus we use is a 26-line poem by T.S. Eliot. In each line a dummy string "ZZZ" or "XXX" has been inserted, representing the class of the line ("ZZZ" for class 0 and "XXX" for class 1).
corpus = [
    "And indeed there will be time ZZZ",
    "For the yellow smoke that slides along the street XXX",
    "Rubbing its back upon the window-panes ZZZ",
    "There will be time, there will be time ZZZ",
    "To prepare a face to meet the faces that you meet XXX",
    "There will be time to murder and create ZZZ",
    "And time for all the works and days of hands ZZZ",
    "That lift and drop a question on your plate ZZZ",
    "Time for you and time for me ZZZ",
    "And time yet for a hundred indecisions XXX",
    "And for a hundred visions and revisions XXX",
    "Before the taking of a toast and tea ZZZ",

Frequentist Inference
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-frequentist-inference/
Mon, 18 Jan 2021 10:20:00 -0400
A simple problem in the binomial setting solved under the frequentist view of statistics.

Linear Dependence and Independence
http://www.y1zhou.com/series/linear-algebra/2-linear-dep-and-indep/
Mon, 31 Aug 2020 12:33:24 -0400
A short piece on linearly dependent and independent sets of vectors.

Autoregressive Series
http://www.y1zhou.com/series/time-series/2-arma/time-series-autoregressive-model/
Fri, 28 Aug 2020 20:31:05 -0400
We talk about autoregressive models of different orders, and introduce their mean, variance, ACF and PACF values.
Their stationarity is also briefly discussed.

Definitions for Discrete Random Variables
http://www.y1zhou.com/series/maths-stat/2-discrete-random-variables/mathematical-statistics-discrete-rv-definition/
Sun, 06 Oct 2019 10:46:18 -0400
The probability mass function, cumulative distribution function, expectation and variance for random variables.

Location Inference for Single Samples
http://www.y1zhou.com/series/nonparam-stat/2-single-samples/nonparametric-methods-single-sample-location-inference/
Tue, 26 Mar 2019 21:12:45 -0500
The Wilcoxon signed rank test explained.

Moving Average Model
http://www.y1zhou.com/series/time-series/2-arma/time-series-moving-average-model/
Fri, 04 Sep 2020 20:31:13 -0400
The mean, variance, ACF and PACF of moving average models. Instead of stationarity, a new property called invertibility is introduced.

Common Discrete Random Variables
http://www.y1zhou.com/series/maths-stat/2-discrete-random-variables/mathematical-statistics-common-discrete-random-variables/
Sun, 06 Oct 2019 10:46:18 -0400
We introduce the binomial (Bernoulli), geometric and Poisson probability distributions and their properties.
The properties include their expectations, variances and moment generating functions.

Other Single Sample Inferences
http://www.y1zhou.com/series/nonparam-stat/2-single-samples/nonparametric-methods-other-single-sample-inferences/
Fri, 26 Apr 2019 23:45:36 -0500
Explore whether the sample is consistent with a specified distribution at the population level. Kolmogorov's test, the Lilliefors test and the Shapiro-Wilk test are introduced, as well as tests for runs or trends.

ARMA Model
http://www.y1zhou.com/series/time-series/2-arma/time-series-arma-model/
Sat, 12 Sep 2020 20:31:18 -0400
The mean, variance, ACF and PACF of ARMA models. The backshift operator is introduced, and the stationarity and invertibility of the general ARMA(p, q) model are discussed.

Gradient Descent and Linear Regression
http://www.y1zhou.com/series/data-mining/data-mining-gradient-descent-linear-regression/
Thu, 11 Feb 2021 15:55:00 -0400
We implement linear regression using gradient descent, a general optimization technique which in this case can find the global minimum.

Bayesian Inference for the Binomial Model
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-inference-binomial/
Mon, 25 Jan 2021 10:20:00 -0500
The general procedure for Bayesian analysis.
We use two different prior models and compare the resulting posteriors (visually and mathematically).

Model Fitting and Forecasting
http://www.y1zhou.com/series/time-series/time-series-model-fitting-and-forecasting/
Mon, 14 Sep 2020 12:06:41 -0400
The model-building strategy consists of three steps: model specification (identification), model fitting, and model diagnostics.

Vector Space
http://www.y1zhou.com/series/linear-algebra/linear-algebra-vector-space/
Mon, 31 Aug 2020 13:13:34 -0400
We introduce some basic terminology: vector space, subspace, span, basis, and dimension. These concepts lay the foundation for future discussions on matrices and matrix properties.

Definitions for Continuous Random Variables
http://www.y1zhou.com/series/maths-stat/3-continuous-random-variables/mathematical-statistics-continuous-rv-definition/
Wed, 25 Sep 2019 10:46:18 -0400
The probability density function, cumulative distribution function, expectation and variance for a continuous random variable.

Methods for Paired Samples
http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-paired-samples/
Mon, 29 Apr 2019 14:22:47 -0400
An obvious extension of the one-sample procedures.

Common Continuous Random Variables
http://www.y1zhou.com/series/maths-stat/3-continuous-random-variables/mathematical-statistics-common-continuous-rvs/
Fri, 01 Nov 2019 10:46:18 -0400
The uniform distribution, normal distribution, exponential distribution and their properties.

Two Independent Samples
http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-two-independent-samples/
Thu, 02 May 2019 12:09:42 -0400
With two independent samples, we may ask about the centrality of the population distributions and see if there's a shift. The Wilcoxon-Mann-Whitney test is here!

Basic Tests for Three or More Samples
http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-three-or-more-samples/
Sat, 04 May 2019 12:09:42 -0400
Nonparametric analogues of the one-way classification ANOVA and the simplest two-way classifications, namely the Kruskal-Wallis test, the Jonckheere-Terpstra test, and the Friedman test.

Logistic Regression
http://www.y1zhou.com/series/data-mining/data-mining-logistic-regression/
Fri, 12 Feb 2021 15:55:00 -0400
In linear regression, the function learned is used to estimate the value of the target $y$ using values of the input $x$. While it could be used for classification by setting the target value to a distinct constant for each class, it's a poor choice for this task: the target attribute takes on a finite number of values, yet the linear model produces a continuous range.
For classification tasks, logistic regression is a better choice.

Bayesian Inference for the Poisson Model
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-inference-poisson/
Mon, 08 Feb 2021 10:20:00 -0500
This lecture discusses Bayesian inference for the Poisson model, including conjugate prior specification, a different way to specify a "non-informative" prior, and relevant posterior summaries.

Multivariate Probability Distributions
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-multivariate-probability-distributions/
Wed, 06 Nov 2019 09:57:16 -0400
Joint probability distributions of two or more random variables defined on the same sample space. Also covers independence, conditional expectation and total expectation.

Mean Trend
http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-mean-trend/
Wed, 30 Sep 2020 11:30:42 -0400
We introduce detrending and differencing, two methods that aim to remove the mean trends in time series.

Definitions in Arbitrary Linear Space
http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-definitions-in-arbitrary-linear-space/
Mon, 14 Sep 2020 20:28:58 -0400
This chapter provides an introduction to some fundamental geometrical ideas and results. We start by giving definitions for norm, distance, angle, inner product and orthogonality.
The Cauchy-Schwarz inequality proves useful in many settings.

Correlation and Concordance
http://www.y1zhou.com/series/nonparam-stat/4-association-analysis/nonparametric-methods-correlation-and-concordance/
Sun, 05 May 2019 10:46:18 -0400
Measures for the strength of relationships between two or more variables. The Spearman rank correlation coefficient, Kendall's tau and Kendall's W are introduced.

ARIMA Models
http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-arima/
Mon, 05 Oct 2020 11:31:11 -0400
Combining differencing and ARMA models, we get ARIMA. The procedures of estimation, diagnosis and forecasting are very similar to those of ARMA models.

Projection
http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-projection/
Mon, 21 Sep 2020 20:38:19 -0400
Geometrically speaking, what is the projection of a vector onto another vector, and the projection of a vector onto a subspace?

Categorical Data
http://www.y1zhou.com/series/nonparam-stat/4-association-analysis/nonparametric-methods-categorical-data/
Mon, 06 May 2019 10:46:18 -0400
Dealing with contingency tables. Fisher's exact test comes back, together with the chi-squared test and the likelihood-ratio test. We also talk about testing goodness-of-fit.

Unit Root Test
http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-unit-root-test/
Wed, 07 Oct 2020 16:42:23 -0400
A test that helps us determine whether differencing is needed or not. We also talk about over-differencing (don't do it!)
and model selection (AIC/BIC and MAPE).

Orthogonalization
http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-orthogonalization/
Mon, 28 Sep 2020 21:55:43 -0400
Introducing the Gram-Schmidt process, a method for constructing an orthogonal basis given a non-orthogonal basis.

Variability of Nonstationary Time Series
http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-stationarity-variability/
Fri, 09 Oct 2020 11:30:59 -0400
Using the Box-Cox power transformation to stabilize the variance. At the end of this section, the standard procedure for fitting an ARIMA model is discussed.

Monte Carlo Sampling
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-monte-carlo-sampling/
Mon, 15 Feb 2021 10:20:00 -0500
This lecture discusses Monte Carlo approximations of the posterior distribution and summaries from it.
While this might not seem entirely useful now, it underlies some of the key computational methods for Bayesian inference that we will discuss later.

Seasonal Time Series
http://www.y1zhou.com/series/time-series/seasonal-time-series/
Tue, 13 Oct 2020 23:36:46 -0400
We introduce seasonal differencing and seasonal ARMA models, and combine them to get SARIMA models.

Linear Space of Matrices
http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-linear-space/
Wed, 30 Sep 2020 13:23:18 -0400
The column space, row space and rank of a matrix and their properties.

Functions of Random Variables
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-functions-of-random-variables/
Sun, 08 Dec 2019 09:57:16 -0400
Finding the distribution of a real-valued function of multiple random variables.
There’s the method of distribution functions, transformations and moment generating functions.Bootstraphttp://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-bootstrap/Mon, 06 May 2019 10:46:18 -0400http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-bootstrap/The procedure and applications of the nonparametric bootstrap.Density Estimationhttp://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-density-estimation/Mon, 06 May 2019 10:46:18 -0400http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-density-estimation/Wanna know more about histograms and density plots?Modern Nonparametric Regressionhttp://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-modern-nonparametric-regression/Wed, 08 May 2019 10:46:18 -0400http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-modern-nonparametric-regression/LOWESS, penalized least squares and the cubic spline.Bayesian Inference for the Normal Modelhttp://www.y1zhou.com/series/bayesian-stat/bayesian-stat-normal-distribution/Mon, 22 Feb 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-normal-distribution/The normal distribution has two parameters, but we focus on the one-parameter setting in this lecture. We also introduce the posterior predictive check as a way to assess model fit, and briefly discuss the issue with improper prior distributions.Decomposition and Smoothing Methodshttp://www.y1zhou.com/series/time-series/time-series-decomposition-and-smoothing/Mon, 02 Nov 2020 11:12:20 -0500http://www.y1zhou.com/series/time-series/time-series-decomposition-and-smoothing/Decomposition procedures to extract trend, seasonal and other components from a time series. 
Smoothing techniques like moving averages and LOWESS are often used, and exponential smoothing (Holt-Winters) is another powerful tool.

Matrix Trace
http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-trace/
Wed, 07 Oct 2020 13:07:16 -0400
Such a simple concept with so many properties and applications!

Sampling Distribution and Limit Theorems
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-sampling-distribution-and-limit-theorems/
Sat, 28 Dec 2019 09:57:16 -0400
We observe a random sample from a probability distribution of interest and want to estimate its properties. The CLT also comes into play.

The Normal Model in a Two Parameter Setting
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-normal-two-param-setting/
Mon, 15 Mar 2021 10:20:00 -0500
This lecture discusses Bayesian inference for the normal model, particularly the case where we are interested in joint posterior inference of the mean and variance simultaneously. We discuss approaches to prior specification, and introduce the Gibbs sampler as a way to generate posterior samples when the full conditional distributions of the parameters are available in closed form.

Spectral Analysis
http://www.y1zhou.com/series/time-series/time-series-spectral-analysis/
Tue, 17 Nov 2020 15:12:15 -0500
We talk about a method that helps us find the periodicity of a time series: the spectral density.

Matrix Inverse
http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-inverse/
Mon, 12 Oct 2020 12:42:03 -0400
…for a nonsingular matrix.
We talk about left and right inverses, *the* matrix inverse and orthogonal matrices.

Brief Review Before STAT 6520
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-brief-review-before-6520/
Wed, 08 Jan 2020 09:57:16 -0400
A brief review of the probability theory and statistics we've learnt so far.

Metropolis-Hastings Algorithms
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-metropolis-hastings-algorithms/
Mon, 22 Mar 2021 10:20:00 -0500
This lecture discusses the Metropolis and Metropolis-Hastings algorithms, two more tools for sampling from the posterior distribution when we do not have it in closed form. These are used when we are unable to obtain the full conditional distributions. MCMC for the win!

Conditional Heteroscedastic Models
http://www.y1zhou.com/series/time-series/time-series-conditional-heteroscedastic-models/
Mon, 23 Nov 2020 18:22:39 -0500
Introducing volatility to our time series models. The properties and building procedures of ARCH and GARCH models are discussed.

Generalized Inverse
http://www.y1zhou.com/series/linear-algebra/linear-algebra-generalized-inverse/
Wed, 21 Oct 2020 12:45:56 -0400
The generalized matrix inverse that applies to any $m \times n$ matrix.

Bias and Variance
http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-bias-and-variance/
Sat, 25 Jan 2020 10:46:18 -0400
The bias, variance and mean squared error of an estimator.
Efficiency is used to compare two estimators.

Consistency
http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-consistency/
Mon, 27 Jan 2020 10:46:18 -0400
Introducing consistency, a concept about the convergence of estimators. We start from the convergence of non-random number sequences, move to convergence in probability, and then to the consistency of estimators and its properties.

The Method of Moments
http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-method-of-moments/
Tue, 28 Jan 2020 10:46:18 -0400
A fairly simple method of constructing estimators that's not often used nowadays.

Hierarchical Models
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-hierarchical-models/
Mon, 29 Mar 2021 10:20:00 -0500
This model is useful for accommodating data which are grouped or have multiple levels, as its main feature is the addition of a between-group layer which relates the groups to each other. The presence of this layer forces group-level parameters to be more similar to each other, displaying the important properties of partial pooling and shrinkage.

Projection Matrix
http://www.y1zhou.com/series/linear-algebra/linear-algebra-projection-matrix/
Fri, 23 Oct 2020 13:21:22 -0400
We introduce idempotent matrices and the projection matrix.
Both are very important concepts in statistical analyses such as linear regression.

Maximum Likelihood Estimator
http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-maximum-likelihood-estimator/
Wed, 29 Jan 2020 10:46:18 -0400
Under parametric families of distributions, there's a much better way of constructing estimators: the maximum likelihood estimator.

Sufficiency
http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-sufficiency/
Thu, 30 Jan 2020 10:46:18 -0400
Introducing sufficient statistics for the inference of parameters. The factorization theorem comes in handy!

Optimal Unbiased Estimator
http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-optimal-unbiased-estimator/
Sun, 02 Feb 2020 10:46:18 -0400
Introducing the minimum variance unbiased estimator and the procedure for deriving it.

Bayesian Linear Regression
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-linear-regression/
Mon, 05 Apr 2021 10:20:00 -0500
The main difference from traditional approaches is in the specification of prior distributions for the regression parameters, which relate covariates to a continuous response variable.
In addition, the Bayesian approach provides a fairly intuitive way to add random effects (such as a random intercept or random slope), which results in what is traditionally known as a linear mixed model.

Determinant
http://www.y1zhou.com/series/linear-algebra/linear-algebra-determinant/
Mon, 26 Oct 2020 13:25:26 -0400
The determinant is a very important concept for square matrices, and its properties are key to various other notions such as block matrices and matrix inverses.

Confidence Intervals
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-confidence-intervals/
Sat, 08 Feb 2020 09:57:16 -0400
Confidence intervals and methods of constructing them.

Penalized Linear Regression and Model Selection
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-penalized-linear-regression-and-model-selection/
Mon, 12 Apr 2021 10:20:00 -0500
This lecture covers some Bayesian connections to penalized regression methods such as ridge regression and the LASSO. Further discussion of the posterior predictive distribution as well as a model selection criterion (DIC) is included.

Quadratic Form
http://www.y1zhou.com/series/linear-algebra/linear-algebra-quadratic-form/
Wed, 04 Nov 2020 16:45:31 -0500
This long post covers the quadratic form and the positive definiteness of matrices.
The decomposition of symmetric matrices is briefly touched on, and the entire post mainly prepares for the next chapter: eigenvalues and eigenvectors.

Statistical Decision
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-statistical-decision/
Wed, 01 Apr 2020 16:55:50 -0400
Up till now we've made the assumption that the data are generated from a statistical model controlled by some parameter(s). We used estimation to determine a point or a range of possible values of the parameters based on the sample. On the other hand, the goal of data analysis is often to help make decisions, which is not directly addressed by estimation.
Drug approval example: suppose a new drug can be approved only with a $\geq 90\%$ effective rate.

Statistical Test
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-statistical-test/
Wed, 01 Apr 2020 16:55:50 -0400
Here we introduce the elements of a statistical test, namely the null and alternative hypotheses, test statistic, rejection region, and type I and type II errors. We then proceed to large-sample Z-tests and some small-sample tests derived from the small-sample CIs.

p-values
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-p-values/
Thu, 09 Apr 2020 18:26:34 -0400
Introducing the definition of p-values, and why they are important in statistical tests.

Optimal Tests
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-optimal-tests/
Tue, 14 Apr 2020 18:26:34 -0400
Briefly introducing the optimality of a statistical test and showing why it's a difficult problem to solve.

Likelihood Ratio Test
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-likelihood-ratio-test/
Sat, 18 Apr 2020 22:15:07 -0400
In the previous section, we considered the situations where we:
- Test $H_0$: $\theta = \theta_0$ vs. $H_a$: $\theta = \theta_a$ using the rejection rule $\frac{L(\theta_0)}{L(\theta_a)} < k_\alpha$.
- Test $H_0$: $\theta = \theta_0$ vs. $H_a$: $\theta \in \Theta_a$ (typically one-sided) using the rejection rule $\frac{L(\theta_0)}{L(\theta_a)} < k_\alpha$, if it does not depend on $\theta \in \Theta_a$.

Beyond these situations, there are many other cases, such as:
What if $H_0: \theta \in \Theta_0$ is composite?

Bayesian Generalized Linear Models
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-generalized-linear-models/
Mon, 19 Apr 2021 10:20:00 -0500
This lecture discusses a simple logistic regression model for predicting a binary variable. GLMs are necessary when the response variable cannot be modeled appropriately by a normal distribution, and use a link function to connect parameters of the response distribution to the covariates.

Eigenvalues and Eigenvectors
http://www.y1zhou.com/series/linear-algebra/linear-algebra-eigenvalues-and-eigenvectors/
Wed, 18 Nov 2020 18:45:52 -0500
Probably the most important lecture in this course: we start from the calculation of eigenvalues and eigenvectors, and move on to related topics such as the eigendecomposition, singular value decomposition, and the Moore-Penrose inverse.

Linear Models
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-linear-models/
Tue, 21 Apr 2020 13:05:20 -0400
So far we've finished the main material of this course: estimation and hypothesis testing. The starting point of all these statistical analyses is really modeling. In other words, we assume that our data are generated by some random mechanism; specifically, we've been focusing on i.i.d. samples from a fixed population distribution.
Although this assumption can be regarded as reasonable for many applications, in practice there are other scenarios where it doesn't make sense, e.g.

A Bayesian Perspective on Missing Data Imputation
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-missing-data-imputation/
Mon, 26 Apr 2021 10:20:00 -0500
This lecture discusses some approaches to handling missing data, primarily when missingness occurs completely at random. We discuss a procedure, MICE, which uses Gibbs sampling to create multiple "copies" of filled-in datasets.

Are we there yet? A machine learning architecture to predict organotropic metastases
http://www.y1zhou.com/publications/2021-michael-mg-organotropic-metastasis/
Wed, 24 Nov 2021 00:00:00 +0000
Cancer metastasis into distant organs is an evolutionarily selective process. A better understanding of the driving forces endowing proliferative plasticity of tumor seeds in distant soils is required to develop and adapt better treatment systems for this lethal stage of the disease. To this end, we aimed, first, to utilize transcript expression profiling features to predict the site-specific metastases of primary tumors and, second, to identify the determinants of tissue-specific progression.

Molecular identification of protein kinase C beta in Alzheimer's disease
http://www.y1zhou.com/publications/2020-zhike-alzheimers/
Sun, 15 Nov 2020 00:00:00 +0000
The purpose of this study was to investigate the potential roles of protein kinase C beta (PRKCB) in the pathogenesis of Alzheimer's disease (AD). We identified 2,254 differentially expressed genes from 19,245 background genes in AD versus control, as well as in the PRKCB-low versus PRKCB-high group. Five co-expression modules were constructed by weighted gene correlation network analysis.
Among them, the 1,222 genes of the turquoise module had the strongest relation to AD and to low PRKCB expression; they were enriched in apoptosis, axon guidance, gap junction, Fc gamma receptor (FcγR)-mediated phagocytosis, and mitogen-activated protein kinase (MAPK) and vascular endothelial growth factor (VEGF) signaling pathways.

About
http://www.y1zhou.com/about/
Tue, 30 Jun 2020 00:00:00 +0000

{
  "name": "Yi",
  "job": { "PhD student": "bioinformatics" },
  "education": {
    "Master's": ["Statistics", "University of Georgia"],
    "Bachelor's": ["Biology", "China Agricultural University"]
  },
  "interests": [
    "cancer", "systems biology", "metabolic reprogramming",
    "NLP", "data visualization", "graph neural networks"
  ],
  "skills": ["R", "Python", "Linux", "Docker"]
}

Savage's approach to research via Mosteller (Hamada and Sitter, 2004):

Co-expression based cancer staging and application
http://www.y1zhou.com/publications/2020-xiangchun-coexpression-classifier/
Tue, 30 Jun 2020 00:00:00 +0000
A novel method is developed for predicting the stage of a cancer tissue based on the consistency level between the co-expression patterns in the given sample and samples in a specific stage. The basis for the prediction method is that cancer samples of the same stage share common functionalities as reflected by the co-expression patterns, which are distinct from samples in the other stages.
Test results reveal that our prediction results are as good as, or potentially better than, the stages manually annotated by cancer pathologists.

Metabolic Reprogramming in Cancer: the bridge that connects intracellular stresses and cancer behaviors
http://www.y1zhou.com/publications/2020-yi-nsr-perspective/
Thu, 30 Apr 2020 00:00:00 +0000
We outline in this perspective a novel framework for cancer study from the angle of stress-induced metabolic reprogramming. The driving question is: what may dictate the same or highly similar evolutionary trajectory across different cancers, consisting of cell proliferation, drug resistance, migration and metastasis? We have observed that cancer and cancer-forming cells are under a persistent intracellular alkaline stress, due to chronic inflammation and local iron overload. A wide range of reprogrammed metabolisms (RMs) are induced to keep the intracellular pH within a livable range for survival.

Install Dependencies for Puppeteer on Manjaro Linux
http://www.y1zhou.com/posts/manjaro-puppeteer/
Mon, 13 Apr 2020 21:31:56 -0400

Elucidation of Functional Roles of Sialic Acids in Cancer Migration
http://www.y1zhou.com/publications/2020-sun-sialic-acid/
Tue, 31 Mar 2020 00:00:00 +0000
Sialic acids (SA), negatively charged nine-carbon sugars, have long been implicated in cancer metastasis since the 1960s, but their detailed functional roles remain elusive. We present a computational analysis of transcriptomic data of cancer vs. control tissues of eight types in TCGA, aiming to elucidate the possible reason for the increased production and utilization of SAs in cancer and their possible driving roles in cancer migration.
Our analyses have revealed, for all cancer types:

Automatic and Interpretable Model for Periodontitis Diagnosis in Panoramic Radiographs
http://www.y1zhou.com/publications/2020-haoyang-miccai/
Sat, 14 Mar 2020 00:00:00 +0000
Periodontitis is a prevalent and irreversible chronic inflammatory disease in both developed and developing countries, affecting about 20%-50% of the global population. A tool for automatically diagnosing periodontitis is in high demand to screen at-risk people, and early detection could prevent the onset of tooth loss, especially in local communities and health care settings with limited dental professionals. In the medical field, doctors need to understand and trust the decisions made by computational models, so proposing interpretable machine learning models is crucial for disease diagnosis.

Neural Functions Play Different Roles in Triple Negative Breast Cancer (TNBC) and non-TNBC
http://www.y1zhou.com/publications/2020-renbo-neural-tnbc/
Thu, 20 Feb 2020 00:00:00 +0000
Triple negative breast cancer (TNBC) represents the most malignant subtype of breast cancer, and yet our understanding of its unique biology remains elusive. We have conducted a comparative computational analysis of transcriptomic data of TNBC and non-TNBC (NTNBC) tissue samples from the TCGA database, focused on genes involved in neural functions. Our main discoveries are:
- While both subtypes involve neural functions, TNBC has substantially more up-regulated neural genes than NTNBC, suggesting that TNBC is more complex than NTNBC.
- Non-neural functions related to cell-microenvironment interactions and intracellular damage processing are key inducers of the neural genes in both TNBC and NTNBC, but the inducer-responder relationships differ between the two cancer subtypes.
- Key neural functions such as neural crest formation are predicted to enhance adaptive immunity in TNBC, while glia development, along with a few other neural functions, induces both innate and adaptive immunity in NTNBC.

Metabolic Reprogramming in Cancer is Induced to Increase Proton Production
http://www.y1zhou.com/publications/2020-huiyan-metabolic-reprogramming/
Mon, 13 Jan 2020 00:00:00 +0000
Considerable metabolic reprogramming has been observed in a conserved manner across multiple cancer types, but its true causes remain elusive. We present an analysis of around 50 such reprogrammed metabolisms (RMs), including the Warburg effect, nucleotide de novo synthesis and sialic acid biosynthesis, in cancer.
Analyses of the biochemical reactions carried out by these RMs, coupled with gene expression data of their catalyzing enzymes in 7,011 tissues of 14 cancer types, revealed that all RMs produce more H+ than the original metabolisms they replace.

Transcription regulation by DNA methylation under stressful conditions in human cancer
http://www.y1zhou.com/publications/2017-sha-transcription-methylation/
Thu, 23 Nov 2017 00:00:00 +0000
We aim to address one question: do cancer cells and normal tissue cells execute their transcription regulation essentially the same way or differently, and why? We conducted an integrated computational study of cancer epigenomes and transcriptomes of 10 cancer types, using penalized linear regression models to evaluate the regulatory effects of DNA methylation on gene expression.
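The penalized linear regression mentioned in the last summary can be sketched in a few lines. This is a hypothetical illustration with a ridge penalty and simulated data; the variable names (`X` for methylation levels, `y` for gene expression) are assumptions for the sketch, not the paper's actual model, penalty, or data.

```python
import numpy as np

# Hypothetical sketch: penalized (ridge) linear regression, in the spirit of
# evaluating regulatory effects of DNA methylation on gene expression.
# All data below are simulated for illustration only.
rng = np.random.default_rng(0)
n_samples, n_sites = 50, 10                # tissues x methylation sites
X = rng.normal(size=(n_samples, n_sites))  # methylation levels (standardized)
beta_true = np.zeros(n_sites)
beta_true[:3] = [-2.0, 1.5, -1.0]          # only a few sites truly regulatory
y = X @ beta_true + rng.normal(scale=0.5, size=n_samples)  # gene expression

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)    # ordinary least squares (no penalty)
beta_pen = ridge(X, y, 10.0)   # penalty shrinks coefficients toward zero
```

For any positive penalty the ridge estimate has a smaller norm than the unpenalized fit, which stabilizes coefficient estimates when methylation sites are highly correlated; a lasso-type penalty would instead drive some coefficients exactly to zero.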