# "And you are?"

### By a show of hands:

1. Stats?
2. Probability?
3. Equations?
4. Bayes?
5. Code?

# A Chickensh** Disclaimer

## I'll do my best not to be tedious about it...

### "Tell 'em what you're gonna tell 'em..."

• A Conceptual Introduction to Bayesian Data Analysis
• A Few Equations...Ok, a lot of equations.
• A Little Code
• Pretty Pictures
• Maybe A Little History

# Fast But Useless Introductions™: Probability

## Probability allows us to quantify our uncertainty...

Probabilities:

1. Point probabilities: $80\%$ chance of rain
2. Probability range
3. Probability distribution:

• A model of uncertainty around some value or process
• A function: $y = f(x)$
• Describes the likelihood of specific values of some random variable.

### Rules:

• The probability of all possible values of a random variable always sums (or integrates) to 1.
• Joint: probability of A and B $$p(A,B) = p(B,A) = p(A|B)\,p(B)$$
• Marginal: probability of A $$p(A) = p(A,B) + p(A,\text{not }B)$$
• Conditional: probability of A given B $$p(A|B) = \frac{p(A,B)}{p(B)}$$
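These identities are easy to check numerically. A minimal sketch using a made-up joint distribution over two binary events A and B (the probabilities are purely illustrative):

```r
# Hypothetical joint distribution over two binary events A and B
joint <- matrix(c(0.30, 0.20,   # p(A, B),     p(A, not B)
                  0.10, 0.40),  # p(not A, B), p(not A, not B)
                nrow = 2, byrow = TRUE)

# All probabilities sum to 1
sum(joint)                          # 1

# Marginal: p(A) = p(A, B) + p(A, not B)
p_A <- joint[1, 1] + joint[1, 2]    # 0.5

# Conditional: p(A | B) = p(A, B) / p(B)
p_B <- joint[1, 1] + joint[2, 1]    # 0.4
p_A_given_B <- joint[1, 1] / p_B    # 0.75

# Joint recovered: p(A, B) = p(A | B) * p(B)
p_A_given_B * p_B                   # 0.3
```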

> # The Prior times the Likelihood is Proportional to the Posterior.

## So you want to analyze some data...

### How it begins:

• Data:  $Y = y_1, y_2, y_3, ..., y_n$ $X = x_1, x_2, x_3, ..., x_n$
• Quantities of Interest:  $\mu_y$ $\sigma_y$ $\rho_{x,y}$
• Assumptions!
•  $y = f(x) + \epsilon$ $\epsilon \overset{iid}{\sim} WN(0,\sigma^2)$

### What A Classical Data Analysis Gives Us:

1. Point estimates

2. p-values: the probability of observing data at least as extreme as our sample under the hypothesis that the true value of the parameter is 0...

3. Confidence interval: not a probability range...

An interval containing all the candidate values of our parameter of interest for which we would not reject a null hypothesis of equality with our estimate...

All implicitly conditional on our assumptions

(and maybe lots of other assumptions)

## Advantages of Classical Data Analysis

### There are reasons...

• Easier to DO (and scale).

• Pre-packaged, widely adopted, off-the-shelf solutions.

• Often computationally simpler.

• Easier to black-box it.

## Disadvantages of Classical Data Analysis

#### Inflexible

# Probability distributions!

## Advantages of Bayesian Data Analysis

### It's all probability...

• Easier to understand what comes out.

• Easier to update and include disparate data sources.

• Flexible.

## More time feeling dumb.

## More difficult to scale.

## Few widely adopted off-the-shelf solutions.

# And now for the main event...

## Our Data: Observations of some process with just two outcomes

## Lots of real-world processes can be modeled like this.

# $$z_i \in \{A, B\} \ \forall i$$

#### In order to model a process mathematically, we have to define one outcome as...

##### a success...

## and the other as a failure.

# $$\theta = 0.2$$

# $$\theta = 0.5$$

# $$\theta = 0.8$$

## Playing make-believe...

• We don't know $\theta$...

```r
## Generate theta: unknown probability of success,
## using R's built-in pseudo-random number generator
theta_true <- runif(1, 0, 1)
```

• We don't even know $N$...

```r
## N is a random integer between 100,000 and 400,000
N <- sample(seq(100000, 400000), 1)
```

• Our Dataset:

```r
## A is the number of successes, a proportion of N
A <- round(theta_true * N)
## B is the rest of N
B <- N - A
## Zpop is a randomly shuffled vector of A successes and B failures
Zpop <- sample(c(rep(1, A), rep(0, B)))
```
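A sample can then be drawn from `Zpop` the same way. A self-contained sketch (the seed is illustrative; the sample size of 500 anticipates the sample used later in the slides):

```r
## Rebuild the population as above, then draw a simple random sample
set.seed(42)  # illustrative seed, for reproducibility only
theta_true <- runif(1, 0, 1)
N <- sample(seq(100000, 400000), 1)
A <- round(theta_true * N)
B <- N - A
Zpop <- sample(c(rep(1, A), rep(0, B)))

n_samp <- 500                 # sample size used later in the slides
Zsamp <- sample(Zpop, n_samp)
Y <- sum(Zsamp)               # observed number of successes
Y / n_samp                    # sample proportion estimates theta_true
```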

## Our population...

# $$p( \theta )$$

$$p( \theta | Y ) \propto p( Y | \theta ) \times p( \theta )$$

# $$p(\theta) \propto \theta^{(\alpha - 1)} (1 - \theta)^{(\beta - 1)}$$

# $$\alpha = m \times n \qquad \beta = (1 - m) \times n$$

```r
###################################################
### Function: Prior Plot Values
###################################################
Prior <- function(m, n, a = n * m, b = n * (1 - m)) {
  dom <- seq(0, 1, 0.005)
  val <- dbeta(dom, a, b)
  return(data.frame(x = dom, y = val))
}
```
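For example, setting $m = 0.5$ and $n = 2$ (illustrative values) yields $\alpha = \beta = 1$, the uniform Beta(1, 1) prior — the density is flat at 1 across the whole unit interval:

```r
## Prior plot values for a Beta prior parameterized by mean m and weight n
Prior <- function(m, n, a = n * m, b = n * (1 - m)) {
  dom <- seq(0, 1, 0.005)
  val <- dbeta(dom, a, b)
  data.frame(x = dom, y = val)
}

prior_df <- Prior(m = 0.5, n = 2)               # Beta(1, 1): uniform
## plot(prior_df$x, prior_df$y, type = "l")     # a flat line at y = 1
```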

# $$p( Y | \theta )$$

##### $$p( \theta | Y ) \propto p( Y | \theta ) \times p( \theta )$$
```r
###################################################
### Function: Likelihood Plot Values
###################################################
Likelihood <- function(N, Y) {
  a <- Y + 1
  b <- N - Y + 1
  dom <- seq(0, 1, 0.005)
  val <- dbeta(dom, a, b)
  return(data.frame(x = dom, y = val))
}
```
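With the sample used later in the slides ($N = 500$, $Y = 26$), the likelihood peaks at the sample proportion $Y/N$. A quick check, assuming the function above:

```r
## Likelihood plot values via the Beta(Y + 1, N - Y + 1) kernel
Likelihood <- function(N, Y) {
  a <- Y + 1
  b <- N - Y + 1
  dom <- seq(0, 1, 0.005)
  val <- dbeta(dom, a, b)
  data.frame(x = dom, y = val)
}

lik_df <- Likelihood(N = 500, Y = 26)
lik_df$x[which.max(lik_df$y)]   # close to Y / N = 0.052
```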

# $$p( \theta | Y ) \propto p( Y | \theta ) \times p( \theta )$$

#### $$p( \theta | Y ) \propto [\theta^{Y} (1-\theta)^{N - Y}] \times [\theta^{nm - 1} (1-\theta)^{n(1-m) - 1}]$$

##### $$p( \theta | Y ) \propto \theta^{Y + nm - 1} (1-\theta)^{N - Y + n(1-m) - 1}$$
```r
###################################################
### Function: Posterior Plot Values
###################################################
Posterior <- function(m, n, N, Y, a_in = n * m, b_in = n * (1 - m)) {
  a_out <- Y + a_in
  b_out <- N - Y + b_in
  dom <- seq(0, 1, 0.005)
  val <- dbeta(dom, a_out, b_out)
  return(data.frame(x = dom, y = val))
}
```
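Under the uniform Beta(1, 1) prior ($m = 0.5$, $n = 2$ are illustrative inputs that produce it), the posterior for the sample used later in the slides ($N = 500$, $Y = 26$) concentrates near the sample proportion — a sketch using the conjugate update $\alpha = Y + nm$, $\beta = N - Y + n(1-m)$:

```r
## Posterior plot values via the conjugate Beta update
Posterior <- function(m, n, N, Y, a_in = n * m, b_in = n * (1 - m)) {
  a_out <- Y + a_in
  b_out <- N - Y + b_in
  dom <- seq(0, 1, 0.005)
  val <- dbeta(dom, a_out, b_out)
  data.frame(x = dom, y = val)
}

post_df <- Posterior(m = 0.5, n = 2, N = 500, Y = 26)  # Beta(27, 475)
post_df$x[which.max(post_df$y)]   # peaks near Y / N = 0.052
```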

# $$p( \theta | Y ) \propto \theta^{Y + nm - 1} (1-\theta)^{N - Y + n(1-m) - 1}$$


$$\alpha - 1 = Y + nm - 1$$ $$\rightarrow \alpha = Y + nm$$

$$\beta - 1 = N - Y + n(1-m) - 1$$ $$\rightarrow \beta = N - Y + n(1-m)$$

```r
###################################################
### Function: Mean of Posterior Beta
###################################################
MeanOfPosterior <- function(m, n, N, Y, a = n * m, b = n * (1 - m)) {
  a_out <- Y + a
  b_out <- N - Y + b
  E_posterior <- a_out / (a_out + b_out)
  return(E_posterior)
}

###################################################
### Function: Mode of Posterior Beta
###################################################
ModeOfPosterior <- function(m, n, N, Y, a = n * m, b = n * (1 - m)) {
  a_out <- Y + a
  b_out <- N - Y + b
  mode_posterior <- (a_out - 1) / (a_out + b_out - 2)
  return(mode_posterior)
}

###################################################
### Function: Standard Deviation of Posterior Beta
###################################################
StdDevOfPosterior <- function(m, n, N, Y, a = n * m, b = n * (1 - m)) {
  a_out <- Y + a
  b_out <- N - Y + b
  sigma_posterior <- sqrt((a_out * b_out) /
                          (((a_out + b_out)^2) * (a_out + b_out + 1)))
  return(sigma_posterior)
}
```
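Worked through by hand for the uniform Beta(1, 1) prior ($m = 0.5$, $n = 2$) and the sample $N = 500$, $Y = 26$, using the standard conjugate update $\alpha = Y + nm$, $\beta = N - Y + n(1-m)$:

```r
## Posterior summaries for a uniform Beta(1, 1) prior
## and the sample N = 500, Y = 26
m <- 0.5; n <- 2; N <- 500; Y <- 26
a_out <- Y + n * m             # alpha = 27
b_out <- N - Y + n * (1 - m)   # beta  = 475

post_mean <- a_out / (a_out + b_out)            # 27 / 502, about 0.054
post_mode <- (a_out - 1) / (a_out + b_out - 2)  # 26 / 500 = 0.052
post_sd   <- sqrt(a_out * b_out /
                  ((a_out + b_out)^2 * (a_out + b_out + 1)))  # about 0.010
```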

## Our population...

# Our sample...

## $N_{samp} = 500 \qquad Y_{samp} = 26$

# $$p(Y_{samp}|\theta, N_{samp}) \propto \theta^{26} (1-\theta)^{(500 - 26)} = \theta^{26} (1-\theta)^{474}$$

## Testing out different priors...

## So how'd we do?

### Results...

| Prior | Mean | Mode | Std Dev | Alpha | Beta |
|---|---|---|---|---|---|
| Uniform | 0.0520 | 0.0502 | 0.0099 | 26 | 474 |
| Bimodal | 0.0511 | 0.0493 | 0.0098 | 25.5 | 473.5 |
| Weak High | 0.0669 | 0.0652 | 0.0111 | 34 | 474 |
| Weak Equal | 0.0591 | 0.0573 | 0.0104 | 30 | 478 |
| Weak Low | 0.0512 | 0.0494 | 0.0098 | 26 | 482 |
| Strong High | 0.1923 | 0.1913 | 0.0161 | 115 | 483 |
| Strong Equal | 0.1254 | 0.1242 | 0.0135 | 75 | 523 |
| Strong Low | 0.0585 | 0.0570 | 0.0096 | 35 | 563 |

## Our population...

# A p value...

## And for my next trick...

### Using only these models...

```r
#### 2008 Election ####
# 2008 parameters
elecDay <- '2008-11-04'
cutOff <- '2008-08-01'
elecPassed <- TRUE
elecType <- 'President'
candidates <- list('d' = 'Obama', 'r' = 'McCain')
d_m <- 0.5
r_m <- 0.5
init_n <- 2
```

## Anyone affiliated with Real Clear Politics?

## 18 states with full polling data still 'available'...

### RCP Predictions

| State | Obama | McCain | Predicted Outcome | Predicted Winner | Forecast Error |
|---|---|---|---|---|---|
| FL | 49.00 | 47.20 | 1.80 | Obama | 1.00 |
| IN | 46.40 | 47.80 | -1.40 | McCain | 2.50 |
| MO | 47.80 | 48.50 | -0.70 | McCain | 0.60 |
| NC | 48.00 | 48.40 | -0.40 | McCain | 0.70 |
| NH | 52.80 | 42.20 | 10.60 | Obama | 1.00 |
| OH | 48.80 | 46.30 | 2.50 | Obama | 2.10 |
| VA | 50.20 | 45.80 | 4.40 | Obama | 1.90 |

### My Predictions

| State | Obama | McCain | Predicted Outcome | Predicted Winner | Forecast Error |
|---|---|---|---|---|---|
| FL | 47.40 | 46.50 | 0.90 | Obama | 1.90 |
| IN | 46.50 | 47.40 | -0.90 | McCain | 2.00 |
| MO | 47.40 | 47.70 | -0.30 | McCain | 0.20 |
| NC | 47.40 | 47.30 | 0.10 | Obama | 0.20 |
| NH | 50.70 | 42.60 | 8.10 | Obama | 1.50 |
| OH | 47.80 | 45.30 | 2.50 | Obama | 2.10 |
| VA | 49.40 | 45.50 | 3.90 | Obama | 2.40 |

### Actual Outcomes

| State | Obama | McCain | Outcome | Winner |
|---|---|---|---|---|
| FL | 51.00 | 48.20 | 2.80 | Obama |
| IN | 50.00 | 48.90 | 1.10 | Obama |
| MO | 49.30 | 49.40 | -0.10 | McCain |
| NC | 49.70 | 49.40 | 0.30 | Obama |
| NH | 54.10 | 44.50 | 9.60 | Obama |
| OH | 51.50 | 46.90 | 4.60 | Obama |
| VA | 52.60 | 46.30 | 6.30 | Obama |

### Performance...

| | RMSE | RMSE, Diff < 2% | Number Correctly Predicted |
|---|---|---|---|
| My Predictions | 4.27 | 1.17 | 17.00 |
| RealClearPolitics Predictions | 3.23 | 1.54 | 16.00 |

### RCP Predictions

| State | Obama | Romney | Outcome | Winner |
|---|---|---|---|---|
| CO | 48.70 | 45.80 | 2.90 | Obama |
| FL | 49.30 | 46.10 | 3.20 | Obama |
| MO | 43.00 | 50.30 | -7.30 | Romney |
| NC | 47.80 | 46.70 | 1.10 | Obama |
| NH | 48.70 | 45.70 | 3.00 | Obama |
| VA | 48.70 | 45.00 | 3.70 | Obama |

### My Predictions

| State | Obama | Romney | Outcome | Winner |
|---|---|---|---|---|
| CO | 47.90 | 46.10 | 1.80 | Obama |
| FL | 48.10 | 46.30 | 1.80 | Obama |
| MO | 42.70 | 49.30 | -6.60 | Romney |
| NC | 46.60 | 47.20 | -0.60 | Romney |
| NH | 49.00 | 44.80 | 4.20 | Obama |
| VA | 47.80 | 45.50 | 2.30 | Obama |

# A Tiny Bit of Context: The Holy War

## "SUBJECTIVE!" "UNINTUITIVE!" "ASYMPTOTIC!" "INTRACTABLE!"

#### "I'm pretty famous."

# A "quarrel".

## In the Bayesian corner...

#### "Oh. Oh dear. Well, then. I do believe... I definitely don't..."

# "The scientific method is neither deduction from a set of axioms nor a way of making plausible guesses, as Bertrand Russell said; but that it is a matter of successive approximation to probability distributions."

## And the winner...

#### R. A. Fisher:

#### "Clever. Now I will ruin you."

## "Critical tests of this kind may be called tests of significance, and when such tests are available we may discover whether a second sample is or is not significantly different from the first."

# But the thing is...

Simulation Code

Election Code

# These slides will be hosted at: www.obscureanalytics.com

## Questions? 