Definitions
| Acronym | Meaning | Use |
|---|---|---|
| PDF | probability density function (continuous) | describes relative likelihood |
| PMF | probability mass function (discrete) | probability at specific values |
| CDF | cumulative distribution function | probability ≤ x |
| CI | confidence interval | estimates parameter range |
| ANOVA | analysis of variance | tests group mean differences |
Key Symbols
| Symbol | Meaning | Use |
|---|---|---|
| $n$ | sample size | number of observations |
| $x_i$ | i-th data value | individual measurement |
| $\bar{x}$ | sample mean | average of sample |
| $\mu$ | population mean | true average |
| $s$, $s^2$ | sample SD, variance | sample spread |
| $\sigma$, $\sigma^2$ | population SD, variance | true spread |
| $p$ | success probability | Bernoulli success rate |
| $k$ | count of successes | number of positive outcomes |
| $\lambda$ | rate parameter | events per unit time |
| $\alpha$ | significance level | probability of Type I error |
| $z$, $t$ | standard quantiles | critical values |
| $\chi^2$ | chi-square quantile | variance test value |
| $\beta_0$, $\beta_1$ | intercept, slope | regression coefficients |
| $R^2$ | coefficient of determination | explained variance fraction |
Basics
| Concept | Formula | Use |
|---|---|---|
| Mean | $\bar{x} = \frac{1}{n}\sum_i x_i$ | central location |
| Sample Variance | $s^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2$ | measure of dispersion |
| Population Variance | $\sigma^2 = \frac{1}{n}\sum_i (x_i - \mu)^2$ | actual dispersion |
| PDF | $f(x)$ | density shape |
| CDF (cont.) | $F(x) = \int_{-\infty}^{x} f(t)\,dt$ | cumulative probability |
| PMF | $P(X = k)$ | discrete probability at $k$ |
| CDF (disc.) | $F(k) = \sum_{i=0}^{k} P(X = i)$ | cumulative discrete prob. |
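
The mean and variance formulas above can be checked by hand on a small sample. The following is a minimal sketch, assuming a made-up data set and a known population mean of 5; the function names are illustrative only.

```python
def mean(xs):
    """Sample mean: sum of the values divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def population_variance(xs, mu):
    """Variance about a known population mean mu, with the n divisor."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


# Hypothetical sample, chosen so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(mean(data))                      # sum 40 over n = 8
print(sample_variance(data))           # squared deviations sum to 32, divided by 7
print(population_variance(data, 5.0))  # same sum of squares, divided by 8
```

The only difference between the two variance functions is the divisor and the centre used for the deviations.
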
Discrete Distributions
| Distribution | PMF | CDF | Mean / Var | Use |
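|---|---|---|---|---|
| Binomial ($n$, $p$) | $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$ | $P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i}$ | $np$, $np(1-p)$ | number of successes in $n$ trials |
| Poisson ($\lambda$) | $P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$ | $F(k) = \sum_{i=0}^{k} \frac{\lambda^i e^{-\lambda}}{i!}$ | $\lambda$, $\lambda$ | counts of rare events |
| Binomial coefficient | $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$ | | | number of ways to choose $x$ of $n$ |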
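
The binomial and Poisson formulas above translate almost directly into code. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the CDFs are plain sums of the PMF rather than numerically optimized routines.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): sum the PMF from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k): sum the PMF from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# Example: 3 successes in 10 fair coin flips is C(10, 3) / 2**10.
print(binomial_pmf(3, 10, 0.5))

# The mean computed from the PMF should match n * p.
print(sum(k * binomial_pmf(k, 10, 0.3) for k in range(11)))
```
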
Continuous Distributions
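| Distribution | PDF | CDF | Mean, Var | Use |
|---|---|---|---|---|
| Normal ($\mu$, $\sigma^2$) | $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\Phi\left(\frac{x-\mu}{\sigma}\right)$ | $\mu$, $\sigma^2$ | symmetric spread around mean |
| Exponential ($\lambda$) | $\lambda e^{-\lambda x},\ x \ge 0$ | $1 - e^{-\lambda x}$ | $1/\lambda$, $1/\lambda^2$ | time between Poisson events |
| Lognormal ($\mu$, $\sigma^2$) | $\frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}$ | $\Phi\left(\frac{\ln x-\mu}{\sigma}\right)$ | $e^{\mu+\frac{1}{2}\sigma^2}$, $(e^{\sigma^2}-1)\,e^{2\mu+\sigma^2}$ | modeling skewed positive data |
| Gamma ($\alpha$, $\beta$) | $\frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}$ | $\frac{1}{\Gamma(\alpha)}\,\gamma(\alpha, x/\beta)$ | $\alpha\beta$, $\alpha\beta^2$ | sum of $\alpha$ exponentials (integer $\alpha$) |
| Weibull ($k$, $\lambda$) | $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}$ | $1 - e^{-(x/\lambda)^k}$ | $\lambda\,\Gamma\left(1+\frac{1}{k}\right)$, $\lambda^2\left[\Gamma\left(1+\frac{2}{k}\right)-\Gamma\left(1+\frac{1}{k}\right)^2\right]$ | life/failure time distribution |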
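
A few of the continuous formulas above can be evaluated with the standard library alone. The sketch below is illustrative: the function names are assumptions, the normal CDF is written via `math.erf`, and only a subset of the table is covered.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Phi((x - mu) / sigma), expressed through the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def exponential_cdf(x, lam):
    """1 - exp(-lam * x) for x >= 0, with lam a rate."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


def weibull_cdf(x, k, lam):
    """1 - exp(-(x / lam) ** k), with shape k and scale lam."""
    return 1 - math.exp(-((x / lam) ** k)) if x >= 0 else 0.0


def weibull_mean(k, lam):
    """lam * Gamma(1 + 1/k)."""
    return lam * math.gamma(1 + 1 / k)


print(normal_cdf(1.96, 0.0, 1.0))  # close to 0.975
# With shape k = 1 the Weibull reduces to an exponential with rate 1 / lam.
print(weibull_cdf(2.0, 1, 0.5), exponential_cdf(2.0, 2.0))
```

Note that $\lambda$ is a rate in the exponential row but a scale in the Weibull row, which is why the two calls in the last line use 2.0 and 0.5 respectively.
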
Parameter Estimation
| Estimate | Formula | Use |
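|---|---|---|
| CI for mean | $\bar{x} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}$ | interval for true mean |
| CI for variance | $\left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{\alpha/2}}\right)$ | interval for true variance; here $\chi^2_{q}$ is the $q$-quantile with $n-1$ degrees of freedom |
| CI for proportion | $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ | interval for true proportion |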
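
The interval formulas for the mean and the proportion can be sketched as follows. This is a minimal example under stated assumptions: the standard library has no t-distribution, so the critical value `t_crit` is passed in by the caller (for instance from a printed table), and the data values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ci_mean(xs, t_crit):
    """x_bar +/- t_crit * s / sqrt(n); t_crit must match n - 1 degrees of freedom."""
    n = len(xs)
    half_width = t_crit * stdev(xs) / math.sqrt(n)
    centre = mean(xs)
    return centre - half_width, centre + half_width


def ci_proportion(successes, n, alpha=0.05):
    """p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), with z the upper alpha/2 normal quantile."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical measurements; 2.776 is the tabulated t value for 4 df at 95%.
print(ci_mean([9.8, 10.2, 10.1, 9.9, 10.0], t_crit=2.776))

# Hypothetical survey: 40 successes out of 100 trials.
print(ci_proportion(40, 100))
```
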
Hypothesis Testing
| Test | Statistic | Decision Rule | Use |
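|---|---|---|---|
| Z-test | $Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$ | reject if $\lvert Z \rvert > z_{\alpha/2}$ | test mean with known $\sigma$ |
| T-test | $T = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$ | reject if $\lvert T \rvert > t_{n-1,\alpha/2}$ | test mean with unknown $\sigma$ |
| p-value | $p = P(\lvert Z \rvert > \lvert z \rvert)$, with $z$ the observed statistic | reject if $p < \alpha$ | quantify evidence against $H_0$ |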
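
The Z-test row and the p-value row describe the same decision in two ways. The sketch below, with a hypothetical function name and made-up inputs, computes both so they can be compared.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided Z-test for a mean with known sigma.

    Returns the statistic, the two-sided p-value, and the reject decision
    based on the critical value.
    """
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return z, p_value, abs(z) > z_crit


# Hypothetical numbers: sample mean 52 from n = 25, testing mu0 = 50 with sigma = 5.
z, p_value, reject = z_test(52.0, 50.0, 5.0, 25)
print(z, p_value, reject)
```

Rejecting when $\lvert Z \rvert$ exceeds the critical value and rejecting when $p < \alpha$ give the same answer, since the p-value is the tail probability beyond the observed statistic.
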
Error Propagation
| Rule | Formula | Use |
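|---|---|---|
| Addition / Subtraction | $Z = X \pm Y$, $\Delta Z = \Delta X + \Delta Y$ | combine absolute errors |
| Multiplication / Division | $Z = X \cdot Y$ or $Z = X/Y$, $\frac{\Delta Z}{\lvert Z \rvert} = \frac{\Delta X}{\lvert X \rvert} + \frac{\Delta Y}{\lvert Y \rvert}$ | combine relative errors |
| Power | $Z = X^n$, $\frac{\Delta Z}{\lvert Z \rvert} = \lvert n \rvert \frac{\Delta X}{\lvert X \rvert}$ | relative error of a power |
| General Function | $Z = f(X_1,\dots,X_k)$, $\Delta Z = \sqrt{\sum_{i=1}^{k}\left(\frac{\partial f}{\partial X_i}\,\Delta X_i\right)^2}$ | derivative propagation |
| Outlier Criterion (Chauvenet) | $\lvert x - \bar{x} \rvert > k\sigma$ | detect outliers |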
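
The general-function rule can be applied numerically when the partial derivatives are awkward to write out. The following is a rough sketch, assuming central-difference derivatives with a hypothetical step size `h`; it also includes the linear relative-error rule for a product so the two results can be compared.

```python
import math


def propagate(f, values, errors, h=1e-6):
    """Quadrature rule: sqrt(sum((df/dX_i * dX_i) ** 2)).

    Partial derivatives are approximated by central differences.
    """
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, errors)):
        step = h * max(abs(x), 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + step
        down[i] = x - step
        partial = (f(*up) - f(*down)) / (2 * step)
        total += (partial * dx) ** 2
    return math.sqrt(total)


def product_error_linear(x, dx, y, dy):
    """Linear rule for Z = X * Y: relative errors add."""
    return abs(x * y) * (dx / abs(x) + dy / abs(y))


# Hypothetical measurements: X = 2.0 +/- 0.1, Y = 3.0 +/- 0.2, Z = X * Y.
print(propagate(lambda x, y: x * y, [2.0, 3.0], [0.1, 0.2]))
print(product_error_linear(2.0, 0.1, 3.0, 0.2))
```

The linear rule gives the larger value because it adds the contributions directly, while the quadrature rule adds them in squares.
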
Regression Analysis
| Concept | Formula | Use |
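|---|---|---|
| Slope | $\hat{\beta}_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}$ | effect per unit change |
| Intercept | $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ | value when $x = 0$ |
| $R^2$ | $1 - \frac{\sum e_i^2}{\sum (y_i-\bar{y})^2}$ | proportion variance explained |
| ANOVA F | $F = \frac{SSR/1}{SSE/(n-2)}$ | test overall regression fit |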
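
The four regression quantities above can be computed together from the same sums. This is a minimal sketch for simple linear regression with one predictor; the function name and the data are hypothetical, and SSR is taken as the total sum of squares minus SSE.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r2, f_stat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept

    # Residuals e_i = y_i - fitted value.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sst - sse

    r2 = 1 - sse / sst
    f_stat = (ssr / 1) / (sse / (n - 2))
    return b0, b1, r2, f_stat


# Hypothetical data lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r2, f_stat = fit_line(xs, ys)
print(b0, b1, r2, f_stat)
```

For a single predictor, $F$ and $R^2$ carry the same information: $F = (n-2)\,R^2 / (1 - R^2)$.
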