Probability Ross Chapter 7 Notes

14 May 2018

Example 4c

$$ \frac{n(n-1)}{N(N-1)}-\fracpB{n}N^2=\frac{nN(n-1)}{N^2(N-1)}-\frac{n^2(N-1)}{N^2(N-1)} $$

$$ =\frac{n^2N-nN-(n^2N-n^2)}{N^2(N-1)}=\frac{-nN+n^2}{N^2(N-1)}=\frac{-n(N-n)}{N^2(N-1)} $$

7.5.2 Computing Expectations by Conditioning

Let’s look at the first two sentences in this section: “Let us denote by $\evc{X}{Y}$ that function of the random variable $Y$ whose value at $Y=y$ is $\evc{X}{Y=y}$. Note that $\evc{X}{Y}$ is itself a random variable.”

This might be a bit more clear: define $\phi_X(\wt)=\evc{X}{\wt}$ as follows

$$ \phi_X(Y=y)\equiv\Ecwrt{X|Y}{X}{Y=y}\equiv\cases{\int_{-\infty}^{\infty}x\pdfa{x|y}{X|Y}dx=\int_{-\infty}^{\infty}x\frac{\pdf{x,y}}{\pdfa{y}{Y}}dx&X\text{ continuous}\\\sum_{x}xp_{X\bar Y}(x\bar y)=\sum_xx\frac{p(x,y)}{p_{Y}(y)}&X\text{ discrete}} $$

Then $\phi_X(\wt)=\evc{X}{\wt}$ is a function of a random variable. Since a function of a random variable is itself a random variable, I think this makes it a bit more clear that $\phi_X(Y)=\evc{X}{Y}$ is a random variable.
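For a concrete (made-up) illustration: let $Y$ be the roll of a fair die and, given $Y=y$, let $X$ be the number of heads in $y$ flips of a fair coin. Then $\evc{X}{Y=y}=\frac{y}2$, so

$$ \phi_X(Y)=\evc{X}{Y}=\frac{Y}2\dq\pr{\phi_X(Y)=\frac{y}2}=\frac16,\quad y=1,...,6 $$

That is, $\phi_X(Y)$ is an honest random variable with its own distribution.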

For emphasis, we state again that $\phi_X(Y)=\evc{X}{Y}$ is a function of $Y$, not of $X$. Then $\phi_X(Y)=\evc{X}{Y}$ is a function of the random variable $Y$ and is a random variable itself. Hence, like any random variable, we can take its expected value (with respect to $Y$):

Proposition 5.1

$$ \E{X}=\E{\evc{X}Y}=\Ewrt{Y}{\Ecwrt{X|Y}{X}Y} $$

Proof The discrete case was proven in the text. Here we prove the continuous case. Let $X$ and $Y$ be continuous random variables. Then

$$ \Ewrt{Y}{\Ecwrt{X|Y}{X}Y}=\int_{-\infty}^{\infty}\Ecwrt{X|Y}{X}{Y=y}\wts\pdfa{y}{Y}dy=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x\frac{\pdf{x,y}}{\pdfa{y}{Y}}dx\wts\pdfa{y}{Y}dy $$

$$ =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x\pdf{x,y}dx\frac1{\pdfa{y}{Y}}\wts\pdfa{y}{Y}dy $$

$$ =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x\pdf{x,y}dxdy $$

$$ =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x\pdf{x,y}dydx \tag{change order} $$

$$ =\int_{-\infty}^{\infty}x\int_{-\infty}^{\infty}\pdf{x,y}dydx $$

$$ =\int_{-\infty}^{\infty}x\pdfa{x}{X}dx=\E{X} $$

The only questionable equation here is the change of integration order. We can use measure theory and Fubini’s Theorem to show that, for sufficiently nice distributions of $X$ and $Y$, this change of integration order is justified. And of course the limits are constant, so it’s straightforward.

$\wes$
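Here is a Monte Carlo sanity check of Proposition 5.1 on a made-up example where the conditional expectation is known in closed form: $Y$ uniform on $(0,1)$ and, given $Y=y$, $X$ normal with mean $y$ and variance $1$, so $\evc{X}{Y}=Y$ and both estimates should be near $\frac12$.

```python
import numpy as np

# Monte Carlo check of E[X] = E[E[X|Y]] on a made-up example:
# Y ~ Uniform(0,1) and, given Y = y, X ~ Normal(y, 1), so E[X|Y] = Y.
rng = np.random.default_rng(0)
n = 10**6

y = rng.uniform(0.0, 1.0, size=n)
x = rng.normal(loc=y, scale=1.0)   # X | Y=y ~ Normal(y, 1)

print(x.mean())   # estimates E[X]
print(y.mean())   # estimates E[E[X|Y]] = E[Y] = 1/2
```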

Example 5j

$$ m(x)=\int_0^1\evc{N(x)}{U_1=y}dy $$

$$ =\int_0^x\evc{N(x)}{U_1=y}dy+\int_x^1\evc{N(x)}{U_1=y}dy $$

$$ =\int_0^x\sbr{1+m(x-y)}dy+\int_x^11dy $$

$$ =\int_0^x1dy+\int_0^xm(x-y)dy+\int_x^11dy $$

$$ =\int_0^1dy+\int_0^xm(x-y)dy=1+\int_0^xm(x-y)dy $$

$$ u=x-y\dq y=x-u\dq dy=-du\dq u_0=x-y_0=x\dq u_1=x-y_1=0 $$

$$ =1+\int_0^xm(x-y)dy=1-\int_x^0m(u)du=1+\int_0^xm(u)du $$

Succinctly:

$$ m(x)=1+\int_0^xm(u)du $$

The Fundamental Theorem of Calculus gives

$$ m'(x)=m(x) $$

or

$$ \frac{m'(x)}{m(x)}=1 $$

Let’s integrate the left side:

$$ \int\frac{m'(x)dx}{m(x)} $$

$$ y=m(x)\dq dy=m'(x)dx $$

$$ \int\frac{m'(x)dx}{m(x)}=\int\frac{dy}{y}=\ln{y}=\ln(m(x)) $$
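Integrating the right side as well gives $\ln(m(x))=x+c$, so $m(x)=k\e{x}$ for some constant $k$. Setting $x=0$ in $m(x)=1+\int_0^xm(u)du$ gives $m(0)=1$, hence $k=1$ and

$$ m(x)=\e{x} $$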

Example 5l INCOMPLETE

$$ \int_0^1p^i(1-p)^{n-i}dp $$

$$ u=(1-p)^{n-i}\dq du=-(n-i)(1-p)^{n-i-1}dp\dq dv=p^idp\dq v=\frac{p^{i+1}}{i+1} $$

$$ \int_0^1p^i(1-p)^{n-i}dp=\int_0^1udv=uv\eval01-\int_0^1vdu $$

$$ =(1-p)^{n-i}\frac{p^{i+1}}{i+1}\eval01+\int_0^1\frac{p^{i+1}}{i+1}(n-i)(1-p)^{n-i-1}dp $$

$$ =(0-0)+\frac{n-i}{i+1}\int_0^1p^{i+1}(1-p)^{n-i-1}dp $$

$$ =\frac{n-i}{i+1}\int_0^1p^{i+1}(1-p)^{n-(i+1)}dp $$

$$ =\frac{n-i}{i+1}\frac{n-(i+1)}{i+2}\int_0^1p^{i+2}(1-p)^{n-(i+2)}dp $$

$$ \int_0^1p^{0}(1-p)^{n-0}dp=\frac{n-0}{0+1}\int_0^1p^{0+1}(1-p)^{n-(0+1)}dp $$
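Iterating the recursion until the exponent on $(1-p)$ reaches $0$, and using $\int_0^1p^ndp=\frac1{n+1}$, gives the closed form

$$ \int_0^1p^i(1-p)^{n-i}dp=\frac{(n-i)(n-i-1)\cdots1}{(i+1)(i+2)\cdots n}\int_0^1p^ndp=\frac{i!(n-i)!}{n!}\wts\frac1{n+1}=\frac{i!(n-i)!}{(n+1)!} $$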

Conditional Variance p.347

This is the definition of conditional variance on p.347:

$$ \Vc{X}{Y}\equiv\Ec{(X-\Ec{X}{Y})^2}{Y} $$

Let’s show that

$$ \Vc{X}{Y}=\Ec{X^2}{Y}-\prn{\Ec{X}{Y}}^2 $$

Remember that these are equations of functions. We want to show that for all $y$ in the range of $Y$, the following holds:

$$ \Vc{X}{Y=y}=\Ec{X^2}{Y=y}-\prn{\Ec{X}{Y=y}}^2 $$

Equivalently:

$$ \Ec{(X-\Ec{X}{Y})^2}{Y=y}=\Ec{X^2}{Y=y}-\prn{\Ec{X}{Y=y}}^2 $$

For $y$ in the range of $Y$, define $\mu_y$ to be the conditional mean:

$$ \mu_y\equiv\Ec{X}{Y=y}=\sum_xx\cp{X=x}{Y=y} $$

where the last equation is the definition of conditional expectation on p.331. Then we have

$$ \Vc{X}{Y=y}=\Ec{(X-\Ec{X}{Y})^2}{Y=y} $$

$$ =\Ec{(X-\Ec{X}{Y=y})^2}{Y=y} $$

$$ =\Ec{(X-\mu_y)^2}{Y=y} $$

$$ =\sum_x(x-\mu_y)^2\cp{X=x}{Y=y} \tag{cv.1} $$

$$ =\sum_x(x^2-2x\mu_y+\mu_y^2)\cp{X=x}{Y=y} $$

$$ =\sum_xx^2\cp{X=x}{Y=y}-2\mu_y\sum_xx\cp{X=x}{Y=y}+\mu_y^2\sum_x\cp{X=x}{Y=y} $$

$$ =\Ec{X^2}{Y=y}-2\mu_y\Ec{X}{Y=y}+\mu_y^2 \tag{cv.2} $$

$$ =\Ec{X^2}{Y=y}-2\mu_y^2+\mu_y^2 $$

$$ =\Ec{X^2}{Y=y}-\mu_y^2=\Ec{X^2}{Y=y}-\prn{\Ec{X}{Y=y}}^2 $$

In cv.1 and cv.2, we refer to the remark on p.333: “conditional expectations satisfy the properties of ordinary expectations. For instance”

$$ \Ec{g(X)}{Y=y}=\sum_xg(x)\cp{X=x}{Y=y} $$

In cv.2 we also use the normalization $\sum_x\cp{X=x}{Y=y}=1$.
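A quick numerical sanity check of the conditional variance identity, on a small made-up joint pmf (the probabilities below are arbitrary but sum to $1$):

```python
# Exact check of Var(X|Y=y) = E[X^2|Y=y] - (E[X|Y=y])^2 on a made-up pmf.
pmf = {  # (x, y): P(X = x, Y = y)
    (0, 0): 0.10, (1, 0): 0.20, (2, 0): 0.10,
    (0, 1): 0.15, (1, 1): 0.05, (2, 1): 0.40,
}

for y in {yy for (_, yy) in pmf}:
    p_y = sum(p for (_, yy), p in pmf.items() if yy == y)
    cond = {x: p / p_y for (x, yy), p in pmf.items() if yy == y}  # P(X=x|Y=y)
    mu_y = sum(x * p for x, p in cond.items())                    # E[X|Y=y]
    ex2_y = sum(x * x * p for x, p in cond.items())               # E[X^2|Y=y]
    var_def = sum((x - mu_y) ** 2 * p for x, p in cond.items())   # definition
    print(y, var_def, ex2_y - mu_y ** 2)   # the two values agree for each y
```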

Example 7m

For all real values of $s,t$, we have

$$ \evw{\e{sX_c+t(X-X_c)}}=\evw{(p\e{s}+(1-p)\e{t})^X} $$

Now, since $X$ is Poisson with mean $\lambda$, it follows that $\evw{\e{rX}}=\e{\lambda(\e{r}-1)}$ for all real values of $r$. In particular, this holds for $r=r(s,t)=\ln(p\e{s}+(1-p)\e{t})$, which is equivalent to $\e{r}=p\e{s}+(1-p)\e{t}$. Hence

$$ \evw{\e{sX_c+t(X-X_c)}}=\evw{(p\e{s}+(1-p)\e{t})^X}=\evw{(\e{r})^X}=\evw{\e{rX}}=\e{\lambda(\e{r}-1)}=\e{\lambda(p\e{s}+(1-p)\e{t}-1)} $$
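Since $\lambda p+\lambda(1-p)=\lambda$, this factors as

$$ \e{\lambda(p\e{s}+(1-p)\e{t}-1)}=\e{\lambda p(\e{s}-1)}\wts\e{\lambda(1-p)(\e{t}-1)} $$

which is the product of the moment generating function of a Poisson with mean $\lambda p$ evaluated at $s$ and that of a Poisson with mean $\lambda(1-p)$ evaluated at $t$. Hence $X_c$ and $X-X_c$ are independent Poisson random variables with means $\lambda p$ and $\lambda(1-p)$.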

Example 8b

A number of things from this example are not self-evident to me. First let’s prove this statement by the author:

Suppose $Z\sim\nda{0}{1}$ and $\Theta\sim\nda{\mu}{\sigma}$. Also suppose that $Z$ and $\Theta$ are independent. Then the conditional distribution of $Z+\Theta$, given that $\Theta=\theta$, is also normal with mean $\theta$ and variance $1$.

Proof

Since $Z$ and $\Theta$ are independent normals, we have

$$ \pdfa{z,\theta}{Z,\Theta}=\pdfa{z}{Z}\pdfa{\theta}{\Theta}=\frac1{\sqrt{2\pi}}\e{-\frac{z^2}2}\frac1{\sqrt{2\pi}\sigma}\e{-\frac{(\theta-\mu)^2}{2\sigma^2}} $$

Define $W\equiv Z+\Theta$ and $Y\equiv\Theta$. Since $Z$ and $\Theta$ are independent normals, by Proposition 3.2, p.256, we know that $W$ is normal with mean $0+\mu=\mu$ and variance $1+\sigma^2$. Obviously $Y\sim\nda{\mu}{\sigma}$. That is

$$ \pdfa{w}{W}=\frac1{\sqrt{2\pi(1+\sigma^2)}}\e{-\frac{(w-\mu)^2}{2(1+\sigma^2)}}\dq\pdfa{y}{Y}=\frac1{\sqrt{2\pi}\sigma}\e{-\frac{(y-\mu)^2}{2\sigma^2}} $$

We wish to show that

$$ \pdfa{w|y}{W|Y}=\frac1{\sqrt{2\pi}}\e{-\frac{(w-y)^2}{2}} $$

The definition of conditional density on p.266 gives us

$$ \pdfa{w|y}{W|Y}=\frac{\pdfa{w,y}{W,Y}}{\pdfa{y}{Y}} \tag{Ex.8b.1} $$

We can compute the joint density in the numerator using equation 7.1 on p.275: Let $g_1(z,\theta)=z+\theta$ and $g_2(z,\theta)=\theta$. Then $W=g_1(Z,\Theta)=Z+\Theta$ and $Y=g_2(Z,\Theta)=\Theta$ and

$$ \pdfa{w,y}{W,Y}=\pdfa{z,\theta}{Z,\Theta}\inv{\normb{J(z,\theta)}} $$

where

$$ J(z,\theta)=\vmtrx{\wpart{g_1}{z}&\wpart{g_1}{\theta}\\\wpart{g_2}{z}&\wpart{g_2}{\theta}}=\wpart{g_1}{z}\wpart{g_2}{\theta}-\wpart{g_1}{\theta}\wpart{g_2}{z}=1\wts1-1\wts0=1 $$

Note that $\theta=y$ and $z=w-\theta=w-y$. Hence

$$ \pdfa{w,y}{W,Y}=\pdfa{z,\theta}{Z,\Theta}\inv{\normb{J(z,\theta)}}=\frac1{\sqrt{2\pi}}\e{-\frac{z^2}2}\frac1{\sqrt{2\pi}\sigma}\e{-\frac{(\theta-\mu)^2}{2\sigma^2}}\wts\frac1{\norm{1}} $$

$$ =\frac1{\sqrt{2\pi}}\e{-\frac{(w-y)^2}{2}}\frac1{\sqrt{2\pi}\sigma}\e{-\frac{(y-\mu)^2}{2\sigma^2}} $$

So Ex.8b.1 becomes

$$ \pdfa{w|y}{W|Y}=\frac{\pdfa{w,y}{W,Y}}{\pdfa{y}{Y}}=\frac{\frac1{\sqrt{2\pi}}\e{-\frac{(w-y)^2}{2}}\frac1{\sqrt{2\pi}\sigma}\e{-\frac{(y-\mu)^2}{2\sigma^2}}}{\frac1{\sqrt{2\pi}\sigma}\e{-\frac{(y-\mu)^2}{2\sigma^2}}}=\frac1{\sqrt{2\pi}}\e{-\frac{(w-y)^2}2} $$

$\wes$

Next claim: “Consequently, the joint density of $Z+\Theta,\Theta$ is the same as that of $X,\Theta$”.

In more detail: We know that $Z+\Theta\bar\Theta=\theta\ndansq{\theta}{1}\sim X\bar\Theta=\theta$. That is, they share the same conditional density and

$$ \frac{\pdfa{x,\theta}{X,\Theta}}{\pdfa{\theta}{\Theta}}=\pdfa{x|\theta}{X|\Theta}=\pdfa{w|\theta}{W|\Theta}=\frac{\pdfa{w,\theta}{W,\Theta}}{\pdfa{\theta}{\Theta}} $$

which implies their joint densities are the same:

$$ \pdfa{x,\theta}{X,\Theta}=\pdfa{w,\theta}{W,\Theta} $$

Next claims:

$$ \evw{X}=\evw{Z+\Theta}\dq\varw{X}=\varw{Z+\Theta}\dq\corr{X}{\Theta}=\corr{Z+\Theta}{\Theta} $$

In more detail: we showed above that $X,\Theta$ and $Z+\Theta,\Theta$ have the same joint density. Expected values, variances, and correlations are determined by the joint distribution, so the corresponding quantities for the two pairs must agree; in particular $\evw{X}=\evw{Z+\Theta}$, and similarly for the variance and the correlation.

Next claim: $\varw{Z+\Theta}=1+\sigma^2$. In detail: this follows directly from p.324, equation 4.1:

$$ \varw{Z+\Theta}=\varw{Z}+\varw{\Theta}+2\covw{Z}{\Theta}=1+\sigma^2+0 $$

where $\covw{Z}{\Theta}=0$ follows from the independence of $Z$ and $\Theta$.

Next, the author claims that

$$ \frac{\covw{Z+\Theta}{\Theta}}{\sqrt{\varw{Z+\Theta}\varw{\Theta}}}=\frac\sigma{\sqrt{1+\sigma^2}} $$

In more detail:

$$ \frac{\covw{Z+\Theta}{\Theta}}{\sqrt{\varw{Z+\Theta}\varw{\Theta}}}=\frac{\covw{Z}{\Theta}+\covw{\Theta}{\Theta}}{\sqrt{(1+\sigma^2)\sigma^2}}=\frac{0+\varw{\Theta}}{\sqrt{1+\sigma^2}\sigma}=\frac{\sigma^2}{\sqrt{1+\sigma^2}\sigma}=\frac\sigma{\sqrt{1+\sigma^2}} $$

In the first equality, the numerator follows from Proposition 4.2.iv on p.323. In the second equality, the independence of $Z$ and $\Theta$ implies $\covw{Z}\Theta=0$. Proposition 4.2.ii gives $\covw\Theta\Theta=\varw\Theta$.

Next claim: “Because $X,\Theta$ has a bivariate normal distribution, the conditional distribution of $\Theta$, given that $X=x$, is normal with mean”

$$ \evw{\Theta|X=x}=\evw{\Theta}+\rho\sqrt{\frac{\varw{\Theta}}{\varw{X}}}(x-\evw{X}) $$

and variance

$$ \varw{\Theta|X=x}=\varw{\Theta}(1-\rho^2) $$

This claim follows directly from p.268, the last computation and last paragraph: “Recognizing the preceding equation as a normal density, we can conclude that, given $Y=y$, the random variable $X$ is normally distributed with mean $\mu_x+\rho\frac{\sigma_x}{\sigma_y}(y-\mu_y)$ and variance $\sigma_x^2(1-\rho^2)$”.
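Plugging in $\evw{\Theta}=\evw{X}=\mu$, $\varw{\Theta}=\sigma^2$, $\varw{X}=1+\sigma^2$, and $\rho=\frac\sigma{\sqrt{1+\sigma^2}}$ from above gives the explicit forms

$$ \evw{\Theta|X=x}=\mu+\frac\sigma{\sqrt{1+\sigma^2}}\sqrt{\frac{\sigma^2}{1+\sigma^2}}(x-\mu)=\mu+\frac{\sigma^2}{1+\sigma^2}(x-\mu)\dq\varw{\Theta|X=x}=\sigma^2\prn{1-\frac{\sigma^2}{1+\sigma^2}}=\frac{\sigma^2}{1+\sigma^2} $$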

Section 7.8.2

These are unproven but used tools in the book:

Proposition 7.8.2.1 (Functions of Independent Variables are Independent) Let $X$ and $Y$ be independent random variables. Then $g(X)$ and $h(Y)$ are independent for any functions $g$ and $h$.

Proof Fix a set $G$ in the range of $g(X)$ and fix a set $H$ in the range of $h(Y)$. We wish to show that

$$ \pr{g(X)\in G,h(Y)\in H}=\pr{g(X)\in G}\pr{h(Y)\in H} $$

Define $G_x=\set{x\in\wR:g(x)\in G}$ and $H_y=\set{y\in\wR:h(y)\in H}$. Then we have

$$ g(X)\in G\iff X\in G_x\quad\text{and}\quad h(Y)\in H\iff Y\in H_y $$

This implies that

$$ g(X)\in G,h(Y)\in H\iff X\in G_x,Y\in H_y $$

Hence

$$ \pr{g(X)\in G,h(Y)\in H}=\pr{X\in G_x,Y\in H_y} $$

$$ =\pr{X\in G_x}\pr{Y\in H_y}=\pr{g(X)\in G}\pr{h(Y)\in H} $$

$\wes$
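Here is a quick exact check of Proposition 7.8.2.1 by enumeration, on a toy example; the dice, functions $g,h$, and sets $G,H$ below are arbitrary choices made up for illustration.

```python
from itertools import product
from fractions import Fraction

# Exact enumeration check: X and Y are independent fair six-sided dice,
# g, h, G, H are arbitrary choices for illustration.
vals = range(1, 7)
p = Fraction(1, 6)   # P(X = x) = P(Y = y) = 1/6

g = lambda x: x % 3
h = lambda y: y >= 4
G = {0, 1}           # event {g(X) in G}
H = {True}           # event {h(Y) in H}

joint = sum(p * p for x, y in product(vals, vals) if g(x) in G and h(y) in H)
pg = sum(p for x in vals if g(x) in G)   # P(g(X) in G)
ph = sum(p for y in vals if h(y) in H)   # P(h(Y) in H)
print(joint == pg * ph)                  # True
```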

Proposition 7.8.2.2 Suppose $Y$ is independent of the sequence $X_1,X_2$. Then $Y$ is independent of the sequence $g(X_1,X_2),h(X_1,X_2)$ for any functions $g$ and $h$.

Proof Fix sets $G,H,C$ in the respective ranges of $g(X_1,X_2),h(X_1,X_2),Y$. We wish to show that

$$ \pr{Y\in C,\prn{g(X_1,X_2),h(X_1,X_2)}\in G\times H}=\pr{Y\in C}\pr{\prn{g(X_1,X_2),h(X_1,X_2)}\in G\times H} $$

Define sets $G_x$ and $H_x$ as

$$ G_x=\set{(x_1,x_2)\in\wR^2:g(x_1,x_2)\in G}\dq H_x=\set{(x_1,x_2)\in\wR^2:h(x_1,x_2)\in H} $$

Then

$$ \pr{Y\in C,\prn{g(X_1,X_2),h(X_1,X_2)}\in G\times H}=\pr{Y\in C,g(X_1,X_2)\in G,h(X_1,X_2)\in H} $$

$$ =\pr{Y\in C,(X_1,X_2)\in G_x,(X_1,X_2)\in H_x} $$

$$ =\pr{Y\in C,(X_1,X_2)\in G_x\cap H_x} $$

$$ =\pr{Y\in C}\pr{(X_1,X_2)\in G_x\cap H_x} $$

$$ =\pr{Y\in C}\pr{(X_1,X_2)\in G_x,(X_1,X_2)\in H_x} $$

$$ =\pr{Y\in C}\pr{g(X_1,X_2)\in G,h(X_1,X_2)\in H} $$

$$ =\pr{Y\in C}\pr{\prn{g(X_1,X_2),h(X_1,X_2)}\in G\times H} $$

$\wes$

Proposition 7.8.2.3 Suppose $Y$ is independent of the sequence $X_1,X_2$. Then $Y$ is independent of $g(X_1,X_2)$ for any function $g$.

Proof Fix sets $G,C$ in the respective ranges of $g(X_1,X_2),Y$. We wish to show that

$$ \pr{Y\in C,g(X_1,X_2)\in G}=\pr{Y\in C}\pr{g(X_1,X_2)\in G} $$

Define the set $G_x$ as

$$ G_x=\set{(x_1,x_2)\in\wR^2:g(x_1,x_2)\in G} $$

Then

$$ \pr{Y\in C,g(X_1,X_2)\in G}=\pr{Y\in C,(X_1,X_2)\in G_x} $$

$$ =\pr{Y\in C}\pr{(X_1,X_2)\in G_x} $$

$$ =\pr{Y\in C}\pr{g(X_1,X_2)\in G} $$

$\wes$

Proposition 7.8.2.4 Suppose $Y,X_1,X_2$ are independent. Then $Y$ is independent of the sequence $X_1,X_2$.

Proof Fix sets $A,B,C$ in the respective ranges of $X_1,X_2,Y$. We wish to show that

$$ \pr{Y\in C,(X_1,X_2)\in A\times B}=\pr{Y\in C}\pr{(X_1,X_2)\in A\times B} $$

Then

$$ \pr{Y\in C,(X_1,X_2)\in A\times B}=\pr{Y\in C, X_1\in A, X_2\in B} $$

$$ =\pr{Y\in C}\pr{X_1\in A}\pr{X_2\in B} $$

$$ =\pr{Y\in C}\pr{X_1\in A,X_2\in B} $$

$$ =\pr{Y\in C}\pr{(X_1,X_2)\in A\times B} $$

$\wes$

Source of confusion: the definition of $Y$: “let $Y$ be a normal random variable … that is independent of the $X_i,i = 1,…,n$”. I will take this to mean that $Y$ is independent of the sequence $X_1,…,X_n$. Now let’s look at a series of claims based on this definition.

Claim: $Y,X_i-\overline{X}, i=1,…,n$ has a multivariate normal distribution. I guess the author is implying that $Y,X_1-\overline{X},X_2-\overline{X},…,X_n-\overline{X}$ are all linear combinations of a common collection of independent standard normals. Define $Z_i=\frac{X_i-\mu}\sigma$ for $i=1,…,n$ and $Z_0=\frac{Y-\mu}{\sigma/\sqrt{n}}$. Since $Y$ is independent of the sequence $X_1,…,X_n$, the variables $Z_0,Z_1,…,Z_n$ are independent standard normals, and

$$ Y=\frac{\sigma}{\sqrt{n}}Z_0+\sum_{i=1}^n0\wts Z_i+\mu\dq X_i-\overline{X}=\sigma Z_i-\frac\sigma n\sum_{j=1}^nZ_j $$

So each variable in the list is a linear combination of the independent standard normals $Z_0,Z_1,…,Z_n$, which gives the multivariate normal distribution. In particular, $Y$ has parameters $\mu=\frac{\sigma}{\sqrt{n}}\wts0+\mu$ and $\frac{\sigma^2}n=\fracpb{\sigma}{\sqrt{n}}^2\wts1^2$ as given.

Next claim: “$Y,X_i-\overline{X}, i=1,…,n$ has the same expected values and covariances as the random variables $\overline{X},X_i-\overline{X},i=1,…,n$”. Since $Y$ and $\overline{X}$ both have mean $\mu$ and variance $\frac{\sigma^2}{n}$, we just need to check the covariances. Proposition 7.8.2.3 from above (stated for two variables, but the same argument works for any finite sequence) tells us that $Y$ is independent of any function of $X_1,…,X_n$. In particular, $Y$ is independent of $X_i-\overline{X}$ for $i=1,…,n$. Hence

$$ \covw{Y}{X_i-\overline{X}}=0,\quad i=1,...,n $$

This matches with equation 8.1. Since a multivariate normal distribution is determined completely by its expected values and covariances, it must be that $Y,X_i-\overline{X}, i=1,…,n$ and $\overline{X},X_i-\overline{X}, i=1,…,n$ have the same joint distribution.

Next claim: “it follows that $Y,X_i-\overline{X}, i=1,…,n$ and $\overline{X},X_i-\overline{X}, i=1,…,n$ have the same joint distribution, thus showing that $\overline{X}$ is independent of the sequence of deviations $X_i-\overline{X},i=1,…,n$”.

Proposition 7.8.2.2 tells us that $Y$ is independent of the sequence of deviations $X_i-\overline{X},i=1,…,n$. Hence the joint density can be factored:

$$ \pdfa{y,x_1-\overline{x},...,x_n-\overline{x}}{Y,X_1-\overline{X},...,X_n-\overline{X}}=h(x_1-\overline{x},...,x_n-\overline{x})\wts g(y) $$

Since $Y,X_i-\overline{X},i=1,…,n$ and $\overline{X},X_i-\overline{X},i=1,…,n$ share the same joint distribution, the joint density of the latter factors in the same way, and so $\overline{X}$ is independent of the sequence of deviations $X_i-\overline{X},i=1,…,n$:

$$ \pdfa{\overline{x},x_1-\overline{x},...,x_n-\overline{x}}{\overline{X},X_1-\overline{X},...,X_n-\overline{X}}=h(x_1-\overline{x},...,x_n-\overline{x})\wts g(\overline{x}) $$

Next claim: “Since $\overline{X}$ is independent of the sequence of deviations $X_i-\overline{X},i=1,…,n$, it is also independent of the sample variance $S^2\equiv\sum_{i=1}^n\frac{(X_i-\overline{X})^2}{n-1}$”.

This claim follows from Proposition 7.8.2.1, with $\overline{X}$ in the role of $X$ and the sequence of deviations in the role of $Y$: the sample variance is a function of the sequence of deviations, and $\overline{X}$ is independent of that sequence, so $\overline{X}$ is independent of $S^2$.
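As a final sanity check, here is a small Monte Carlo illustration of the independence of $\overline{X}$ and $S^2$ for normal samples. A simulation can only check consequences of independence, so the snippet below just verifies that their sample correlation over many replications is near zero; the parameters $\mu,\sigma,n$ are arbitrary.

```python
import numpy as np

# For i.i.d. normal samples, the sample mean and sample variance are
# independent; check that their correlation across replications is ~ 0.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)          # sample means
s2 = samples.var(axis=1, ddof=1)     # sample variances (n-1 in denominator)

print(np.corrcoef(xbar, s2)[0, 1])   # close to 0
```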