The Basic Basics:
Bayesian Inferenceon a Binomial Proportion
The Basic Basics:
Bayesian InferenceProbabilities:
$p(\text{positive}|\text{disease}) = 0.98$
$p(\text{disease}) = 0.002$
$p(\text{positive}) = p(\text{positive}|\text{disease}) p(\text{disease}) + p(\text{positive}|\text{no disease}) p(\text{no disease})$
$p(\text{disease}|\text{positive}) = \frac{p(\text{positive}|\text{disease}) p(\text{disease})}{p(\text{positive})} = \frac{0.98 * 0.002}{0.98 * 0.002 + 0.02 * 0.998} = \frac{0.000196}{0.020156} = 0.00894 $
$p(\text{disease}|\text{positive}) = 0.00894 $
$ Y = y_1, y_2, y_3, ..., y_n $ | $ X = x_1, x_2, x_3, ..., x_n $ |
$\mu_y $ |
$ \sigma_y $ |
$ \rho_{x,y} $ |
$ y = f(x) + \epsilon $ |
$ \epsilon \overset{iid}{\sim} WN(0,\sigma^2) $ |
Point estimates
p values: the probability of observing our sample data under the hypothesis that the true value of the parameter is 0...
Confidence interval: not a probability range...
An interval that contains all the estimates for our parameter of interest for which we do not reject a null hypothesis of equality with our "true" parameter of interest...
All implicitly conditional on our assumptions
(and maybe lots of other assumptions)
Easier to DO (and scale).
Pre-packaged, widely adopted, off-the-shelf solutions.
Often computationally simpler.
Easier to black-box it.
Easier to understand what comes out.
Easier to update and include disparate data sources.
Flexible.
## Generate Theta: Unknown Probability of Success
## Using built-in R pseudo-random number generator
theta_true <- runif(1,0,1)
## N is random integer between 100,000 and 400,000
N = sample(seq(100000,400000),1)
## A is successes proportion of N
A = round(theta_true*N)
## B is the rest of N
B = N - A
## Zpop is randomly shuffled A successes and B failures
Zpop <- sample(c(rep(1,A),rep(0,B)))
$$p( \theta | Y ) \propto p( Y | \theta ) \times p( \theta )$$
### Function: Prior Plot Values
###################################################
### Function: Prior Plot Values
###################################################
Prior <- function(m, n, a=n*m, b=n*(1-m)){
dom <- seq(0,1,0.005)
val <- dbeta(dom,a,b)
return(data.frame('x'=dom, 'y'=val))
}
###################################################
### Function: Likelihood Plot Values
###################################################
Likelihood <- function(N, Y){
a <- Y + 1
b <- N - Y + 1
dom <- seq(0,1,0.005)
val <- dbeta(dom,a,b)
return(data.frame('x'=dom, 'y'=val))
}
###################################################
### Function: Posterior Plot Values
###################################################
Posterior <- function(m, n, N, Y, a_in=n*m, b_in=n*(1-m)){
a_out <- Y + a_in - 1
b_out <- N - Y + b_in - 1
dom <- seq(0,1,0.005)
val <- dbeta(dom,a_out,b_out)
return(data.frame('x'=dom, 'y'=val))
}
###################################################
### Function: Posterior Plot Values
###################################################
Posterior <- function(m, n, N, Y, a_in=n*m, b_in=n*(1-m)){
a_out <- Y + a_in - 1
b_out <- N - Y + b_in - 1
dom <- seq(0,1,0.005)
val <- dbeta(dom,a_out,b_out)
return(data.frame('x'=dom, 'y'=val))
}
$$ \beta + 1 = N - Y + n(1-m)$$ $$\rightarrow \beta = N - Y + n(1-m) - 1 $$
$$ \alpha + 1 = (Y + nm)$$ $$\rightarrow \alpha = Y + nm - 1 $$
###################################################
### Function: Mean of Posterior Beta
###################################################
MeanOfPosterior <- function(m, n, N, Y, a=n*m, b=n*(1-m)){
a_out <- Y + a - 1
b_out <- N - Y + b - 1
E_posterior <- a_out / (a_out + b_out)
return(E_posterior)
}
###################################################
### Function: Mode of Posterior Beta
###################################################
ModeOfPosterior <- function(m, n, N, Y, a=n*m, b=n*(1-m)){
a_out <- Y + a - 1
b_out <- N - Y + b - 1
mode_posterior <- (a_out - 1)/(a_out + b_out - 2)
return(mode_posterior)
}
###################################################
### Function: Standard Deviation of Posterior Beta
###################################################
StdDevOfPosterior <- function(m, n, N, Y, a=n*m, b=n*(1-m)){
a_out <- Y + a - 1
b_out <- N - Y + b - 1
sigma_posterior <- sqrt((a_out*b_out)/(((a_out+b_out)^2)*(a_out+b_out+1)))
return(sigma_posterior)
}
Prior | Mean | Mode | Std Dev | Alpha | Beta |
Uniform | 0.0520 | 0.0502 | 0.0099 | 26 | 474 |
Bimodal | 0.0511 | 0.0493 | 0.0098 | 25.5 | 473.5 |
Weak High | 0.0669 | 0.0652 | 0.0111 | 34 | 474 |
Weak Equal | 0.0591 | 0.0573 | 0.0104 | 30 | 478 |
Weak Low | 0.0512 | 0.0494 | 0.0098 | 26 | 482 |
Strong High | 0.1923 | 0.1913 | 0.0161 | 115 | 483 |
Strong Equal | 0.1254 | 0.1242 | 0.0135 | 75 | 523 |
Strong Low | 0.0585 | 0.0570 | 0.0096 | 35 | 563 |
#### 2008 Election ####
# 2008 Parameters
elecDay <- '2008-11-04'
cutOff <- '2008-08-01'
elecPassed = T
elecType <- 'President'
candidates <- list('d'='Obama','r'='McCain')
d_m = 0.5
r_m = 0.5
init_n = 2
Two binomial proportion models, one for each candidate.
The prior for each update after the first is the posterior of the last model.
Obama | McCain | Predicted Outcome | Predicted Winner | Forecast Error | |
FL | 49.00 | 47.20 | 1.80 | Obama | 1.00 |
IN | 46.40 | 47.80 | -1.40 | McCain | 2.50 |
MO | 47.80 | 48.50 | -0.70 | McCain | 0.60 |
NC | 48.00 | 48.40 | -0.40 | McCain | 0.70 |
NH | 52.80 | 42.20 | 10.60 | Obama | 1.00 |
OH | 48.80 | 46.30 | 2.50 | Obama | 2.10 |
VA | 50.20 | 45.80 | 4.40 | Obama | 1.90 |
Obama | McCain | Predicted Outcome | Predicted Winner | Forecast Error | |
FL | 47.40 | 46.50 | 0.90 | Obama | 1.90 |
IN | 46.50 | 47.40 | -0.90 | McCain | 2.00 |
MO | 47.40 | 47.70 | -0.30 | McCain | 0.20 |
NC | 47.40 | 47.30 | 0.10 | Obama | 0.20 |
NH | 50.70 | 42.60 | 8.10 | Obama | 1.50 |
OH | 47.80 | 45.30 | 2.50 | Obama | 2.10 |
VA | 49.40 | 45.50 | 3.90 | Obama | 2.40 |
Obama | McCain | Outcome | Winner | |
FL | 51.00 | 48.20 | 2.80 | Obama |
IN | 50.00 | 48.90 | 1.10 | Obama |
MO | 49.30 | 49.40 | -0.10 | McCain |
NC | 49.70 | 49.40 | 0.30 | Obama |
NH | 54.10 | 44.50 | 9.60 | Obama |
OH | 51.50 | 46.90 | 4.60 | Obama |
VA | 52.60 | 46.30 | 6.30 | Obama |
RMSE | RMSE, Diff < 2% | Number Correctly Predicted | |
My Predictions | 4.27 | 1.17 | 17.00 |
RealClearPolitics Predictions | 3.23 | 1.54 | 16.00 |
Obama | Romney | Outcome | Winner | |
CO | 48.70 | 45.80 | 2.90 | Obama |
FL | 49.30 | 46.10 | 3.20 | Obama |
MO | 43.00 | 50.30 | -7.30 | Romney |
NC | 47.80 | 46.70 | 1.10 | Obama |
NH | 48.70 | 45.70 | 3.00 | Obama |
VA | 48.70 | 45.00 | 3.70 | Obama |
Obama | Romney | Outcome | Winner | |
CO | 47.90 | 46.10 | 1.80 | Obama |
FL | 48.10 | 46.30 | 1.80 | Obama |
MO | 42.70 | 49.30 | -6.60 | Romney |
NC | 46.60 | 47.20 | -0.60 | Romney |
NH | 49.00 | 44.80 | 4.20 | Obama |
VA | 47.80 | 45.50 | 2.30 | Obama |