• 1. Basic Probability

    Lecture 1

    Prop 1.2.1 (De Morgan's)

    $\left(\bigcup_{i=1}^{\infty} A_i\right)^C = \bigcap_{i=1}^{\infty} A_i^C$

    Proof

    Strategy: prove they are subsets of each other

    Let $\omega \in (\bigcup_{i=1}^{\infty} A_i)^C$. Then $\omega \notin \bigcup_{i=1}^{\infty} A_i \Rightarrow \omega \notin A_i \ \forall i \Rightarrow \omega \in A_i^C \ \forall i \Rightarrow \omega \in \bigcap_{i=1}^{\infty} A_i^C$. So we have $(\bigcup_{i=1}^{\infty} A_i)^C \subseteq \bigcap_{i=1}^{\infty} A_i^C$

    Let $\omega \in \bigcap_{i=1}^{\infty} A_i^C$. Then $\omega \in A_i^C \ \forall i \Rightarrow \omega \notin A_i \ \forall i \Rightarrow \omega \notin \bigcup_{i=1}^{\infty} A_i \Rightarrow \omega \in (\bigcup_{i=1}^{\infty} A_i)^C$. So we have $\bigcap_{i=1}^{\infty} A_i^C \subseteq (\bigcup_{i=1}^{\infty} A_i)^C$

    $\left(\bigcap_{i=1}^{\infty} A_i\right)^C = \bigcup_{i=1}^{\infty} A_i^C$

    Proof

    Let $\omega \in (\bigcap_{i=1}^{\infty} A_i)^C$. Then $\omega \notin \bigcap_{i=1}^{\infty} A_i \Rightarrow \omega \notin A_i$ for some $i \Rightarrow \omega \in A_i^C$ for some $i \Rightarrow \omega \in \bigcup_{i=1}^{\infty} A_i^C$. So LHS $\subseteq$ RHS

    Let $\omega \in \bigcup_{i=1}^{\infty} A_i^C$. Then $\omega \in A_i^C$ for some $i \Rightarrow \omega \notin A_i$ for some $i \Rightarrow \omega \notin \bigcap_{i=1}^{\infty} A_i \Rightarrow \omega \in (\bigcap_{i=1}^{\infty} A_i)^C$. So RHS $\subseteq$ LHS
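    Both laws are easy to sanity-check on a finite family. A minimal Python sketch (the universe `Omega` and the sets in `A` are hypothetical stand-ins for the countable family):

    ```python
    # Finite-set sanity check of De Morgan's laws (illustration only).
    Omega = set(range(10))
    A = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]  # stand-ins for A_1, A_2, ...

    def complement(S):
        return Omega - S

    # (U A_i)^C = n A_i^C
    assert complement(set.union(*A)) == set.intersection(*[complement(Ai) for Ai in A])

    # (n A_i)^C = U A_i^C
    assert complement(set.intersection(*A)) == set.union(*[complement(Ai) for Ai in A])
    ```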

    Def. Power Set

    Denoted $2^\Omega$, a power set consists of all subsets of $\Omega$.

    Its cardinality is given by $\#(2^\Omega) = 2^{\#\Omega}$

    Def. Sigma Algebra/Field

    A sigma algebra $\mathcal{A} \subseteq 2^\Omega$ on $\Omega$ is a set (containing subsets of $\Omega$) that:

    1. contains the null set

    $\emptyset \in \mathcal{A}$

    2. is closed under countable unions

    $A_1, A_2, \ldots \in \mathcal{A} \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$

    3. is closed under complementation

    $A \in \mathcal{A} \Rightarrow A^C \in \mathcal{A}$

    $\underbrace{\{\emptyset, \Omega\}}_{\text{coarsest } \sigma\text{-algebra}} \subseteq \text{all other sigma algebras} \subseteq \underbrace{2^\Omega}_{\text{finest } \sigma\text{-algebra}}$

    Def. Probability Measure

    A probability measure defined on a set $\Omega$ with $\sigma$-algebra $\mathcal{A}$ is a function $P: \mathcal{A} \to [0,1]$ with the following properties:

    1. normed

      $P(\Omega) = 1$

    2. countably additive

      $P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$, for mutually disjoint $A_i$

    To apply countable additivity to finitely many disjoint sets (i.e. to recover finite additivity), pad the finite sequence with infinitely many null sets.

    Many sample spaces are infinite sets, and there is no $P$ that can be defined for every subset of these sets. We thus restrict the domain of $P$ to be a subset $\mathcal{A} \subseteq 2^\Omega$.

    Prop 1.2.2 (Some Event Must Occur)

    If $(\Omega, \mathcal{A}, P)$ is a probability model, then $P(\emptyset) = 0$

    Proof

    Let $A_i = \emptyset$ for $i = 1, 2, \ldots$ so the $A_i$ are mutually disjoint, and $\bigcup_{i=1}^{\infty} A_i = \emptyset$

    (Contradiction) Suppose that $P(\emptyset) > 0$. Then $P(\emptyset) = P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(\emptyset) = \infty$, which contradicts $P(\emptyset) \in [0,1]$, so $P(\emptyset) = 0$

    Lecture 2

    Hierarchy Elements ($\omega$) → sets of elements (events, $A$) → sigma algebras ($\mathcal{A}$) → Borel sets ($\mathcal{B}^k$)

    Prop 1.3.1 (Intersection of Sigma Algebras)

    If $\{\mathcal{A}_\lambda : \lambda \in \Lambda\}$ is a family/set of $\sigma$-algebras on $\Omega$, then $\bigcap_{\lambda \in \Lambda} \mathcal{A}_\lambda$ is a $\sigma$-algebra on $\Omega$

    Proof

    $\bigcap_{\lambda \in \Lambda} \mathcal{A}_\lambda$ must have the properties of a $\sigma$-algebra:

    1. $\emptyset \in \mathcal{A}_\lambda \ \forall \lambda \Rightarrow \emptyset \in \bigcap_{\lambda \in \Lambda} \mathcal{A}_\lambda$
    2. $A_1, A_2, \ldots \in \bigcap_{\lambda \in \Lambda} \mathcal{A}_\lambda \Rightarrow A_1, A_2, \ldots \in \mathcal{A}_\lambda \ \forall \lambda \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{A}_\lambda \ \forall \lambda \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \bigcap_{\lambda \in \Lambda} \mathcal{A}_\lambda$
    3. $A \in \bigcap_{\lambda \in \Lambda} \mathcal{A}_\lambda \Rightarrow A \in \mathcal{A}_\lambda \ \forall \lambda \Rightarrow A^C \in \mathcal{A}_\lambda \ \forall \lambda \Rightarrow A^C \in \bigcap_{\lambda \in \Lambda} \mathcal{A}_\lambda$

    Since the intersection contains the null set and is closed under countable unions and complementation, it is a $\sigma$-algebra.

    Def. Sigma Algebra Generated by C

    $\mathcal{A}(\mathcal{C})$ is obtained by intersecting all $\sigma$-algebras containing $\mathcal{C} \subseteq 2^\Omega$.
    It is thus the smallest $\sigma$-algebra on $\Omega$ containing all subsets in $\mathcal{C}$.

    Def. Borel Set

    $\mathcal{B}^k$ is the $\sigma$-algebra generated by open sets. Formally:

    It is the smallest $\sigma$-algebra on $\Omega = \mathbb{R}^k$ containing all rectangles of the form $(a, b]$ where $a = (a_1, \ldots, a_k)^T, b = (b_1, \ldots, b_k)^T \in \mathbb{R}^k$

    $(a, b] = \times_{i=1}^{k} (a_i, b_i]$

    $= (a_1, b_1] \times \ldots \times (a_k, b_k]$

    $= \{(x_1, \ldots, x_k) : a_i < x_i \le b_i, \ i = 1, \ldots, k\}$

    $\mathcal{B}^k \subseteq 2^{\mathbb{R}^k}$, since $2^{\mathbb{R}^k}$ is a $\sigma$-algebra containing all such rectangles. $\mathcal{B}^k \neq 2^{\mathbb{R}^k}$, since there is a subset $A \subseteq \mathbb{R}^k$ that is not a Borel set.

    Loosely speaking, any set that can be defined explicitly is a Borel set.
    (Nice) transformations of Borel sets are also Borel sets.

    Def. Ellipsoidal Region

    A ball of radius $r$ centered at $x_0$ is given by $B_r(x_0) = \{x : (x - x_0)^T(x - x_0) = \sum_{i=1}^{k} (x_i - x_{0i})^2 \le r^2\} \in \mathcal{B}^k$

    The set that forms its boundary is denoted $S_r(x_0)$, and obtained by replacing $\le$ with $=$.

    Applying an affine transformation $y = Ax + b$ on $B_r(x_0)$, where $A = \begin{pmatrix} a_{11} & \ldots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \ldots & a_{kk} \end{pmatrix} \in \mathbb{R}^{k \times k}$ is an invertible matrix and $b \in \mathbb{R}^k$,

    $\begin{aligned} AB_r(x_0) + b &= \{y : y = Ax + b \text{ for some } x \in B_r(x_0)\} \\ &= \{y : (A^{-1}(y - b) - x_0)^T (A^{-1}(y - b) - x_0) \le r^2\} && (x = A^{-1}(y - b)) \\ &= \{y : (y - \underbrace{(Ax_0 + b)}_{\mu})^T \underbrace{(A^{-1})^T A^{-1}}_{\Sigma^{-1}} (y - (Ax_0 + b)) \le r^2\} && \text{(pull out } A^{-1}\text{)} \\ &= \{y : (y - \mu)^T \Sigma^{-1} (y - \mu) \le r^2\} = E_r(\mu, \Sigma) \in \mathcal{B}^k \end{aligned}$

    we obtain an ellipsoidal region centered at $\mu = Ax_0 + b$, whose axes and orientation are determined by $\Sigma = ((A^{-1})^T A^{-1})^{-1} = ((A^T)^{-1} A^{-1})^{-1} = AA^T$ and $r$

    Recall: A matrix is…
    - symmetric if $A^T = A$
    - invertible if $A^{-1}A = I$ (the $0$ matrix is not invertible)
    - positive definite if $v^T A v > 0 \ \forall v \neq 0 \in \mathbb{R}^k$

    $\Sigma = AA^T$ is symmetric, invertible, and positive definite.

    Note for the multivariate normal, $\mu$ is the mean vector, $\Sigma$ is the variance matrix.

    Lecture 3

    Def. Limit inferior/superior of a Sequence

    For a sequence $A_n \subseteq \Omega$:

    • $\liminf A_n = \bigcup_{n=1}^{\infty} \bigcap_{i=n}^{\infty} A_i = \{\omega : \omega \text{ is in all but finitely many } A_i\}$

      • $\omega$ is a member of at least one of the intersections

    • $\limsup A_n = \bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} A_i = \{\omega : \omega \text{ is in infinitely many } A_i\}$

      • $\omega$ is a member of all the unions

    Properties:

    Monotone Increasing/Decreasing Sequences

    $\bigcap_{i=n}^{\infty} A_i$ is an increasing sequence of sets in $n$ (as $n$ increases, fewer sets are intersected, so the resulting intersection gets bigger)

    $\bigcup_{i=n}^{\infty} A_i$ is a decreasing sequence of sets in $n$ (as $n$ increases, fewer sets are unioned, so the resulting union gets smaller)

    Prop 1.4.1 (Monotone Sequences Converge)

    A monotone decreasing sequence of sets converges to their intersection.

    If $A_n \in \mathcal{A} \ \forall n$, and $A_1 \supseteq A_2 \supseteq \ldots$, then $A_n \to A = \bigcap_{i=1}^{\infty} A_i$

    Proof

    Need to prove that $\liminf A_n = \limsup A_n$:

    (1) Since $A_n \subseteq A_{n-1} \subseteq \ldots$, we have that $\bigcup_{i=n}^{\infty} A_i = A_n$, so $\limsup A_n = \bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} A_i = \bigcap_{n=1}^{\infty} A_n$

    (2) Also, $\bigcap_{i=n}^{\infty} A_i = \bigcap_{i=1}^{\infty} A_i$, so $\liminf A_n = \bigcup_{n=1}^{\infty} \bigcap_{i=n}^{\infty} A_i = \bigcap_{i=1}^{\infty} A_i$ (if we union the same set over and over again, we get that set)

    Optional subproof:
    $\bigcap_{i=1}^{\infty} A_i \subseteq \bigcap_{i=n}^{\infty} A_i \ \forall n$, since the intersection of many sets $\subseteq$ intersection of fewer sets
    Other direction: let $\omega \in \bigcap_{i=n}^{\infty} A_i$, so $\omega \in A_n \subseteq \ldots \subseteq A_1$, i.e. $\omega \in \bigcap_{i=1}^{\infty} A_i$, hence $\bigcap_{i=n}^{\infty} A_i \subseteq \bigcap_{i=1}^{\infty} A_i$
    Since they are subsets of each other, $\bigcap_{i=n}^{\infty} A_i = \bigcap_{i=1}^{\infty} A_i$

    (1 & 2) Since $\liminf A_n = \bigcap_{i=1}^{\infty} A_i = \limsup A_n$, we have convergence: $A_n \to A = \bigcap_{i=1}^{\infty} A_i$

    A monotone increasing sequence of sets converges to their union.

    If $A_n \in \mathcal{A} \ \forall n$, and $A_1 \subseteq A_2 \subseteq \ldots$, then $A_n \to A = \bigcup_{i=1}^{\infty} A_i$

    Proof

    (1) Since $A_n \subseteq A_{n+1} \subseteq \ldots$, we have that $\bigcap_{i=n}^{\infty} A_i = A_n$, so $\liminf A_n = \bigcup_{n=1}^{\infty} \bigcap_{i=n}^{\infty} A_i = \bigcup_{n=1}^{\infty} A_n$

    (2) Also, $\bigcup_{i=n}^{\infty} A_i = \bigcup_{i=1}^{\infty} A_i$, so $\limsup A_n = \bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} A_i = \bigcup_{i=1}^{\infty} A_i$ (intersecting the same set over and over again gives that set)

    (1 & 2) Since $\liminf A_n = \limsup A_n$, we have convergence: $A_n \to \bigcup_{i=1}^{\infty} A_i$.

    Prop 1.4.2 (Continuity of P)

    If $A_n \in \mathcal{A} \ \forall n$ and $A_n \to A$, then $P(A_n) \to P(A)$ as $n \to \infty$

    Note The converse is true

    Proof

    By the previous proposition, we know (1) & (2)

    (1) Since $\bigcup_{i=n}^{\infty} A_i$ is a monotone decreasing sequence, it converges to the intersection of the sets,
    i.e. $\bigcup_{i=n}^{\infty} A_i \to \bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} A_i = \limsup A_n$

    (2) Since $\bigcap_{i=n}^{\infty} A_i$ is a monotone increasing sequence, it converges to the union of the sets,
    i.e. $\bigcap_{i=n}^{\infty} A_i \to \bigcup_{n=1}^{\infty} \bigcap_{i=n}^{\infty} A_i = \liminf A_n$

    By (1) & (2), $P(\bigcup_{i=n}^{\infty} A_i) \to P(\limsup A_n)$, and $P(\bigcap_{i=n}^{\infty} A_i) \to P(\liminf A_n)$ (using continuity along monotone sequences, proved below)

    So $P(\bigcap_{i=n}^{\infty} A_i) \le P(A_n) \le P(\bigcup_{i=n}^{\infty} A_i) \Rightarrow P(\liminf A_n) \le \lim_{n\to\infty} P(A_n) \le P(\limsup A_n)$, and since $A_n \to A$ means $\liminf A_n = \limsup A_n = A$, we get $P(A_n) \to P(A)$

    It remains to prove continuity for monotone sequences.

    Suppose $A_n$ is a monotone increasing sequence, so $A_n \to A = \bigcup_{i=1}^{\infty} A_i$

    Now create mutually disjoint $B_i \in \mathcal{A}$ like so $\begin{cases} B_1 = A_1 \\ B_2 = A_2 \cap A_1^C \\ B_3 = A_3 \cap A_2^C \\ \ldots \end{cases}$ such that $A_n = \bigcup_{i=1}^{n} B_i \Rightarrow P(A_n) = \sum_{i=1}^{n} P(B_i)$

    So $\lim_{n\to\infty} P(A_n) = \lim_{n\to\infty} \sum_{i=1}^{n} P(B_i) = \sum_{i=1}^{\infty} P(B_i) = P(\bigcup_{i=1}^{\infty} B_i) = P(\bigcup_{i=1}^{\infty} A_i) = P(\lim_{n\to\infty} A_n)$

    Suppose $A_n$ is a monotone decreasing sequence, so $A_n^C$ is monotone increasing.

    Hence $\lim_{n\to\infty} P(A_n^C) = P(\lim_{n\to\infty} A_n^C) = P(\bigcup_{i=1}^{\infty} A_i^C) = P((\bigcap_{i=1}^{\infty} A_i)^C) = 1 - P(\bigcap_{i=1}^{\infty} A_i)$, so $\lim_{n\to\infty} P(A_n) = P(\bigcap_{i=1}^{\infty} A_i) = P(\lim_{n\to\infty} A_n)$

    Prop 1.4.3 (Prob Measure on a Sigma Algebra)

    $P$ is a probability measure on $\mathcal{A}$ if $P: \mathcal{A} \to [0,1]$ satisfies

    (1) $P(\Omega) = 1$

    (2) $P$ is finitely additive

    (3) $P(A_n) \to P(A)$ as $n \to \infty$ whenever $A_n \in \mathcal{A} \ \forall n$ and $A_n \to A$

    Proof

    (1) and (2) are contained in the def of probability measure (normed and countably additive)

    Combining finite additivity (2) with continuity (3), we have that $P$ is countably additive:

    (3) can also be written as $A_n \to A \Rightarrow \lim_{n\to\infty} P(A_n) = P(A)$

    Let $B_n = \bigcup_{i=1}^{n} A_i$, where $A_1, A_2, \ldots \in \mathcal{A}$ are mutually disjoint.

    Then $B_n$ is a monotone increasing sequence of events with $\lim B_n = \bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} \bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{\infty} A_i$

    Then $P(\bigcup_{i=1}^{\infty} A_i) = P(\lim B_n) = \lim P(B_n) = \lim P(\bigcup_{i=1}^{n} A_i) = \lim \sum_{i=1}^{n} P(A_i) = \sum_{i=1}^{\infty} P(A_i)$

    So continuity + finite additivity $\Rightarrow$ countable additivity

    Important Note Countable additivity $\Rightarrow$ continuity of $P$. By ensuring countable additivity, we ensure continuity of $P$, which is needed when we have an infinite sample space.

    Def. Conditional Probability Model

    If $(\Omega, \mathcal{A}, P)$ is a probability model and $C \in \mathcal{A}$ has $P(C) > 0$, then the conditional probability model given $C$ is $(\Omega, \mathcal{A}, P(\cdot|C))$, where $P(\cdot|C): \mathcal{A} \to [0,1]$ is given by $P(A|C) = \frac{P(A \cap C)}{P(C)}$

    Proof

    (1) $P(\Omega|C) = \frac{P(\Omega \cap C)}{P(C)} = \frac{P(C)}{P(C)} = 1$

    (2) If $A_1, A_2, \ldots \in \mathcal{A}$ are mutually disjoint,

    then $P(\bigcup_{i=1}^{\infty} A_i | C) = \frac{P((\bigcup_{i=1}^{\infty} A_i) \cap C)}{P(C)} = \frac{P(\bigcup_{i=1}^{\infty} (A_i \cap C))}{P(C)} = \frac{\sum_{i=1}^{\infty} P(A_i \cap C)}{P(C)} = \sum_{i=1}^{\infty} P(A_i | C)$

    Since $P(\cdot|C)$ is normed and countably additive, $(\Omega, \mathcal{A}, P(\cdot|C))$ is a probability model.

    Note The model can also be presented as $(C, \mathcal{A} \cap C, P(\cdot|C))$

    Prop 1.5.1 (LOTP / Thm of Total Prob.)

    Suppose $C_1, C_2, \ldots \in \mathcal{A}$ with $P(C_i) > 0 \ \forall i$, and $\Omega = \bigcup_{i=1}^{\infty} C_i$ with $C_i \cap C_j = \emptyset \ \forall i \neq j$. Then for any $A \in \mathcal{A}$, $P(A) = \sum_{i=1}^{\infty} P(C_i) P(A|C_i)$

    Proof

    Since $A = \bigcup_{i=1}^{\infty} (A \cap C_i)$ where the $A \cap C_i \in \mathcal{A}$ are mutually disjoint,
    $P(A) = \sum_{i=1}^{\infty} P(A \cap C_i) = \sum_{i=1}^{\infty} \frac{P(A \cap C_i)}{P(C_i)} P(C_i) = \sum_{i=1}^{\infty} P(A|C_i) P(C_i)$

    Fact If the $C_i$ form a partition of $\Omega$, then $A = \bigcup_{i=1}^{\infty} (A \cap C_i)$ and the sets $A \cap C_i$ are mutually disjoint

    Proof

    Since $C_i \cap C_j = \emptyset$ when $i \neq j$, we have $(A \cap C_i) \cap (A \cap C_j) = \emptyset$, and $\bigcup_{i=1}^{\infty} (A \cap C_i) = A \cap \bigcup_{i=1}^{\infty} C_i = A$ (since $\bigcup_{i=1}^{\infty} C_i = \Omega$)
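    A quick numeric sanity check of LOTP (all numbers below are hypothetical):

    ```python
    # Law of total probability: partition Omega into C_1, C_2, C_3 and
    # compute P(A) from the conditional probabilities.
    P_C = [0.5, 0.3, 0.2]           # P(C_i), must sum to 1
    P_A_given_C = [0.1, 0.4, 0.9]   # P(A | C_i)

    P_A = sum(pc * pa for pc, pa in zip(P_C, P_A_given_C))
    print(P_A)  # 0.5*0.1 + 0.3*0.4 + 0.2*0.9 = 0.35
    ```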

    Lecture 4

    Def. Statistically Independent

    If $(\Omega, \mathcal{A}, P)$ is a probability model and $A, C \in \mathcal{A}$, then $A$ and $C$ are statistically independent if $P(A \cap C) = P(A)P(C)$

    It follows that when $P(C) > 0$, $P(A|C) = \frac{P(A \cap C)}{P(C)} = \frac{P(A)P(C)}{P(C)} = P(A)$

    Statistically Independent Sigma Algebras

    $A$ and $B$ are statistically independent if every element of the $\sigma$-algebra generated by $A$, $\{\emptyset, A, A^C, \Omega\}$, is statistically independent of every element of the $\sigma$-algebra generated by $B$, $\{\emptyset, B, B^C, \Omega\}$

    Proof

    $C$ and $\emptyset$ are statistically independent since $C \cap \emptyset = \emptyset \ \forall C$, and so $P(C \cap \emptyset) = P(\emptyset) = 0 = P(\emptyset)P(C)$

    $C$ and $\Omega$ are statistically independent since $C \cap \Omega = C \ \forall C$, and so $P(C \cap \Omega) = P(C) = P(C)P(\Omega)$

    $A$ and $B^C$ are statistically independent since $A \cap B^C = A \setminus (A \cap B)$, and so $P(A \cap B^C) = P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)(1 - P(B)) = P(A)P(B^C)$

    $A^C$ and $B$ are statistically independent in the same vein.

    $A^C$ and $B^C$ are statistically independent since $P(A^C \cap B^C) = P((A \cup B)^C) = 1 - P(A \cup B) = 1 - P(A) - P(B) + P(A \cap B) = 1 - P(A) - P(B) + P(A)P(B) = (1 - P(A))(1 - P(B)) = P(A^C)P(B^C)$

    Def. Mutually Statistically Independent

    When $(\Omega, \mathcal{A}, P)$ is a probability model and $\{\mathcal{A}_\lambda : \lambda \in \Lambda\}$ is a collection of sub $\sigma$-algebras of $\mathcal{A}$,
    the $\mathcal{A}_\lambda$ are mutually statistically independent if $P(A_1 \cap \ldots \cap A_n) = \prod_{i=1}^{n} P(A_i) \ \forall n$,
    where the $\lambda_1, \ldots, \lambda_n \in \Lambda$ are distinct, and $A_1 \in \mathcal{A}_{\lambda_1}, \ldots, A_n \in \mathcal{A}_{\lambda_n}$.

    Notes

    Union of 3 events (Inclusion-Exclusion Principles)

    $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$

    Proof

    $\begin{aligned} P(A \cup B \cup C) &= P((A \cup B) \cup C) \\ &= P(A \cup B) + P(C) - P((A \cup B) \cap C) \\ &= P(A) + P(B) - P(A \cap B) + P(C) - P((A \cap C) \cup (B \cap C)) \\ &= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P((A \cap C) \cap (B \cap C)) \\ &= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \end{aligned}$

    Generalized to n events

    $P(A_1 \cup \ldots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \ldots + (-1)^{n+1} P(A_1 \cap \ldots \cap A_n)$

    Proof

    Base The result is true for $n = 2$: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

    I.H. Assume it's true for $n$

    Consider

    $\begin{aligned} P(A_1 \cup \ldots \cup A_n \cup A_{n+1}) &= P((A_1 \cup \ldots \cup A_n) \cup A_{n+1}) \\ &= P(A_1 \cup \ldots \cup A_n) + P(A_{n+1}) - P((A_1 \cup \ldots \cup A_n) \cap A_{n+1}) \\ &= \underbrace{P(A_1 \cup \ldots \cup A_n)}_{(1)} + P(A_{n+1}) - \underbrace{P((A_1 \cap A_{n+1}) \cup \ldots \cup (A_n \cap A_{n+1}))}_{(2)} \end{aligned}$

    $(1) \quad P(A_1 \cup \ldots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j \le n} P(A_i \cap A_j) + \ldots + (-1)^{n+1} P(A_1 \cap \ldots \cap A_n)$

    $(2) \quad P((A_1 \cap A_{n+1}) \cup \ldots \cup (A_n \cap A_{n+1})) = \sum_{i=1}^{n} P(A_i \cap A_{n+1}) - \sum_{i<j \le n} P(A_i \cap A_j \cap A_{n+1}) + \ldots + (-1)^{n+1} P(A_1 \cap \ldots \cap A_n \cap A_{n+1})$

    Combining the above, we have

    $P(A_1 \cup \ldots \cup A_{n+1}) = \sum_{i=1}^{n+1} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \ldots + (-1)^{n+2} P(A_1 \cap \ldots \cap A_{n+1})$
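    The general formula can be verified exactly on finite sets under a uniform measure. A minimal sketch (the universe and the events are hypothetical):

    ```python
    # Exact check of inclusion-exclusion for the union of n events under a
    # uniform probability measure on a finite Omega.
    from itertools import combinations

    Omega = set(range(12))
    events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}, {1, 5, 9}]
    P = lambda S: len(S) / len(Omega)  # uniform measure

    lhs = P(set.union(*events))
    rhs = sum(
        (-1) ** (m + 1) * sum(P(set.intersection(*c)) for c in combinations(events, m))
        for m in range(1, len(events) + 1)
    )
    assert abs(lhs - rhs) < 1e-12
    ```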

    Intersection of 3 events

    $P(A \cap B \cap C) = P(A) + P(B) + P(C) - P(A \cup B) - P(A \cup C) - P(B \cup C) + P(A \cup B \cup C)$

    Proof

    $\begin{aligned} \text{LHS} &= 1 - P((A \cap B \cap C)^C) = 1 - P(A^C \cup B^C \cup C^C) \\ &= 1 - [P(A^C) + P(B^C) + P(C^C) - P(A^C \cap B^C) - P(A^C \cap C^C) - P(B^C \cap C^C) + P(A^C \cap B^C \cap C^C)] \\ &= 1 - [3 - P(A) - P(B) - P(C) - (1 - P(A \cup B)) - (1 - P(A \cup C)) - (1 - P(B \cup C)) + (1 - P(A \cup B \cup C))] \\ &= \text{RHS} \end{aligned}$

    Generalized to n events

    $P(A_1 \cap \ldots \cap A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cup A_j) + \ldots + (-1)^{n+1} P(A_1 \cup \ldots \cup A_n)$

    2. Random Variables and Stochastic Processes

    Lecture 5

    Motivation Suppose we have a population $\Omega$, a measurement of some sort $X(\omega)$, and we want to assign probabilities to events such as $a \le X(\omega) \le b$, i.e. $X(\omega) \in [a, b]$. The probabilities are defined on $\Omega$ instead of $\mathbb{R}^1$, which is difficult to work with directly. To navigate this, we use inverse images.

    Def. Inverse Image

    Under the function $X: \Omega \to \mathbb{R}^1$, the inverse image of the set $B \subseteq \mathbb{R}$ is given by $X^{-1}B = \{\omega \in \Omega : X(\omega) \in B\}$

    It is the set of ω that get mapped into B.

    Note to self $X(\omega) = b \iff \omega \in X^{-1}\{b\}$

    E.g. Suppose $\Omega = \{1, 2, 3, 4, 5\}$ and $X(\omega) = \begin{cases} 0 & \omega = 1 \\ 0.20 & \omega = 2 \\ 0.30 & \omega = 3 \\ 0.01 & \omega = 4 \\ 0.20 & \omega = 5 \end{cases}$

    Note that $X$ is not 1-1

    Given a set $B$, determine $X^{-1}B$

    $B = [0, 1] \Rightarrow X^{-1}B = \Omega$

    $B = [0.00, 0.25] \Rightarrow X^{-1}B = \{1, 2, 4, 5\}$

    $B = \{0\} \Rightarrow X^{-1}B = \{1\}$

    $B = (-\infty, 0) \Rightarrow X^{-1}B = \emptyset$
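    The definition translates directly into code. A small sketch reproducing the example above (representing $X$ as a dict on $\Omega$ and $B$ as a predicate):

    ```python
    # Inverse images computed from the definition X^{-1}B = {w : X(w) in B}.
    X = {1: 0.0, 2: 0.20, 3: 0.30, 4: 0.01, 5: 0.20}

    def inverse_image(X, B):
        """Return the set of sample points that X maps into B."""
        return {w for w, x in X.items() if B(x)}

    print(inverse_image(X, lambda x: 0 <= x <= 1))        # {1, 2, 3, 4, 5} = Omega
    print(inverse_image(X, lambda x: 0.00 <= x <= 0.25))  # {1, 2, 4, 5}
    print(inverse_image(X, lambda x: x == 0))             # {1}
    print(inverse_image(X, lambda x: x < 0))              # set()
    ```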

    Property Inverse images preserve Boolean operations.

    Proof for Unions

    Let $\omega \in X^{-1}(B_1 \cup B_2)$, then $X(\omega) \in B_1 \cup B_2$

    $\Rightarrow \omega \in X^{-1}B_1$ or $\omega \in X^{-1}B_2$

    $\Rightarrow \omega \in X^{-1}B_1 \cup X^{-1}B_2$

    So $X^{-1}(B_1 \cup B_2) \subseteq X^{-1}B_1 \cup X^{-1}B_2$ (1)

    Suppose $\omega \in X^{-1}B_1 \cup X^{-1}B_2$,

    $\Rightarrow \omega \in X^{-1}B_1$ or $\omega \in X^{-1}B_2$

    $\Rightarrow X(\omega) \in B_1$ or $X(\omega) \in B_2$

    $\Rightarrow X(\omega) \in B_1 \cup B_2$

    $\Rightarrow \omega \in X^{-1}(B_1 \cup B_2)$

    So $X^{-1}B_1 \cup X^{-1}B_2 \subseteq X^{-1}(B_1 \cup B_2)$ (2)

    By (1) and (2), we have $X^{-1}(B_1 \cup B_2) = X^{-1}B_1 \cup X^{-1}B_2$ since they are subsets of each other

    Proof for Complements

    Let $\omega \in X^{-1}B^C$, then $X(\omega) \in B^C$

    $\Rightarrow X(\omega) \notin B$

    $\Rightarrow \omega \notin X^{-1}B$

    $\Rightarrow \omega \in (X^{-1}B)^C$

    So $X^{-1}B^C \subseteq (X^{-1}B)^C$ (1)

    Suppose $\omega \in (X^{-1}B)^C$

    $\Rightarrow \omega \notin X^{-1}B$

    $\Rightarrow X(\omega) \notin B$

    $\Rightarrow X(\omega) \in B^C$

    $\Rightarrow \omega \in X^{-1}B^C$

    So $(X^{-1}B)^C \subseteq X^{-1}B^C$ (2)

    By (1) and (2), $X^{-1}B^C = (X^{-1}B)^C$

    Property If $B_1 \cap B_2 = \emptyset$, then $X^{-1}B_1$ and $X^{-1}B_2$ are also disjoint.

    Proof

    Suppose $A \cap B = \emptyset$, then $X^{-1}A \cap X^{-1}B = X^{-1}(A \cap B) = X^{-1}\emptyset = \emptyset$

    Def. Random Variable

    A random variable is a function $X: \Omega \to \mathbb{R}^1$ with the property that for any $B \in \mathcal{B}^1$ (i.e. Borel set in $\mathbb{R}^1$), $X^{-1}B \in \mathcal{A}$.

    Thus, when $X$ is a random variable, $P(X(\omega) \in B) = P(X^{-1}B)$

    Prop 2.1.1 (Marginal Probability Measure)

    When $X$ is a r.v., the marginal probability measure of $X$ is $P_X$, which is defined on $\mathcal{B}^1$ by $P_X(B) = P(X^{-1}B)$

    Proof

    $P_X: \mathcal{B}^1 \to [0,1]$

    1. Normed: $P_X(\mathbb{R}^1) = P(X^{-1}\mathbb{R}^1) = P(\Omega) = 1$
    2. Countably additive: If $B_1, B_2, \ldots$ are mutually disjoint elements of $\mathcal{B}^1$,
      then $P_X(\bigcup_{i=1}^{\infty} B_i) = P(X^{-1}\bigcup_{i=1}^{\infty} B_i) = P(\bigcup_{i=1}^{\infty} X^{-1}B_i) = \sum_{i=1}^{\infty} P(X^{-1}B_i) = \sum_{i=1}^{\infty} P_X(B_i)$

    Note The probability model for a random variable $X$ is $(\mathbb{R}^1, \mathcal{B}^1, P_X)$

    Prop 2.1.2 (Determine whether X is a random variable)

    If $X^{-1}(a, b] \in \mathcal{A}$ for every $a, b \in \mathbb{R}^1$, then $X$ is a random variable.

    Proof

    Let $\mathcal{B}_1 = \{B \in \mathcal{B}^1 : X^{-1}B \in \mathcal{A}\}$

    1. Since $\emptyset \in \mathcal{B}^1$ and $X^{-1}\emptyset = \emptyset \in \mathcal{A}$, we know $\emptyset \in \mathcal{B}_1$

    2. If $B \in \mathcal{B}_1$, then $X^{-1}B \in \mathcal{A} \Rightarrow (X^{-1}B)^C = X^{-1}B^C \in \mathcal{A}$

      Since $B^C \in \mathcal{B}^1$ and $X^{-1}B^C \in \mathcal{A}$, we know $B^C \in \mathcal{B}_1$

    3. If $B_1, B_2, \ldots \in \mathcal{B}_1$, then $X^{-1}B_1, X^{-1}B_2, \ldots \in \mathcal{A} \Rightarrow \bigcup_{i=1}^{\infty} X^{-1}B_i = X^{-1}\bigcup_{i=1}^{\infty} B_i \in \mathcal{A}$

      Since $\bigcup_{i=1}^{\infty} B_i \in \mathcal{B}^1$ and $X^{-1}\bigcup_{i=1}^{\infty} B_i \in \mathcal{A}$, we know $\bigcup_{i=1}^{\infty} B_i \in \mathcal{B}_1$

    By 1 (contains null set), 2 (closed under comp), & 3 (closed under union), we know $\mathcal{B}_1$ is a sub $\sigma$-algebra of $\mathcal{B}^1$ (1)

    By hypothesis, $(a, b] \in \mathcal{B}_1 \ \forall a, b \in \mathbb{R}^1 \Rightarrow \mathcal{B}^1 \subseteq \mathcal{B}_1$, since $\mathcal{B}^1$ is the smallest $\sigma$-algebra containing all intervals $(a, b]$ (2)

    By (1) and (2), they are subsets of each other, so $\mathcal{B}_1 = \mathcal{B}^1 \Rightarrow X^{-1}B \in \mathcal{A} \ \forall B \in \mathcal{B}^1 \Rightarrow X$ is a random variable.

    Examples

    Note When $\mathcal{A} = 2^\Omega$, any $X: \Omega \to \mathbb{R}^1$ is a random variable.

    Prop 2.1.3 (Sum & Prod of R.V.s are R.V.s)

    If X, Y are random variables defined on Ω, then (1) W = X+Y and (2) W = XY are both random variables.

    Proof of (1) W = X + Y

    Suppose $\omega \in W^{-1}(-\infty, b] = \{\omega : X(\omega) + Y(\omega) \le b\}$

    Let $c_n \in \mathbb{Q}$ be such that $c_n \downarrow b$. Then $\exists q \in \mathbb{Q}$ such that $X(\omega) \le q$ and $Y(\omega) \le c_n - q$,

    We can take the intersection to get that $\omega \in (X^{-1}(-\infty, q] \cap Y^{-1}(-\infty, c_n - q]) \in \mathcal{A}$

    Taking the union over all such $q$, let $C_n = \bigcup_{q \in \mathbb{Q}} (\{\omega : X(\omega) \le q\} \cap \{\omega : Y(\omega) \le c_n - q\})$, so $W^{-1}(-\infty, b] \subseteq C_n \ \forall n$

    Since $\mathbb{Q}$ is countable, and $C_n$ is a countable union of elements of $\mathcal{A}$, we have that $C_n \in \mathcal{A}$

    By construction ($c_n \downarrow b$), $C_n$ is monotone decreasing, so $\lim_{n\to\infty} C_n = \bigcap_{n=1}^{\infty} C_n = W^{-1}(-\infty, b] \in \mathcal{A} \Rightarrow W = X + Y$ is a r.v.

    Proof of (2) W = XY

    Suppose $b = 0$, then

    $W^{-1}(-\infty, 0] = \{\omega : X(\omega) \le 0, Y(\omega) \ge 0\} \cup \{\omega : X(\omega) \ge 0, Y(\omega) \le 0\} = (X^{-1}(-\infty, 0] \cap Y^{-1}[0, \infty)) \cup (X^{-1}[0, \infty) \cap Y^{-1}(-\infty, 0]) \in \mathcal{A}$

    Suppose $b > 0$, then

    $W^{-1}(-\infty, b] = W^{-1}(-\infty, 0] \cup W^{-1}(0, b]$

    We've shown $W^{-1}(-\infty, 0] \in \mathcal{A}$, so we just need to show the other part: $W^{-1}(0, b] \in \mathcal{A}$.

    $W^{-1}(0, b] = \{\omega : X(\omega) > 0, Y(\omega) > 0, X(\omega)Y(\omega) \le b\} \cup \{\omega : X(\omega) < 0, Y(\omega) < 0, X(\omega)Y(\omega) \le b\} = \{\omega \text{ in quadrant I}\} \cup \{\omega \text{ in quadrant III}\}$

    Since $xy = b$ is symmetric over the line $y = -x$, proving the argument for one of the two quadrants will suffice.

    Suppose $\omega$ is in the quadrant-I set and let $c_n \downarrow b$. Then $\exists q \in \mathbb{Q} \cap (0, \infty)$ such that $\omega \in X^{-1}(0, q] \cap Y^{-1}(0, c_n/q] \in \mathcal{A}$

    $C_n = \bigcup_{q \in \mathbb{Q} \cap (0, \infty)} X^{-1}(0, q] \cap Y^{-1}(0, c_n/q] \in \mathcal{A}$ since $\mathbb{Q} \cap (0, \infty)$ is countable.

    Since $C_n$ decreases to the quadrant-I set, that set is in $\mathcal{A}$, and with the symmetric argument for quadrant III, $W^{-1}(0, b] \in \mathcal{A}$

    A similar argument holds for $b < 0$. For any $b$, $W^{-1}(-\infty, b] \in \mathcal{A}$, so $W = XY$ is a r.v.

    E.g. $p(X) = \sum_{i=0}^{n} a_i X^i$ is a r.v. if $X$ is a r.v.

    Any constant function $Y(\omega) = c$ is a r.v., so all $a_i$ are r.v.'s.

    The product of r.v.'s is a r.v., so all $a_i X^i$ are r.v.'s

    The sum of r.v.'s is a r.v., so $\sum_{i=0}^{n} a_i X^i$ is a r.v.

    Prop 2.1.4 (Sigma Algebra generated by X)

    When $X$ is a random variable, $\mathcal{A}_X = X^{-1}\mathcal{B}^1 = \{X^{-1}B : B \in \mathcal{B}^1\}$ is a sub $\sigma$-algebra of $\mathcal{A}$, called the $\sigma$-algebra on $\Omega$ generated by $X$.

    Alternative notation: $\mathcal{A}_X = \mathcal{A}(\{X^{-1}(a, b] : a, b \in \mathbb{R}^1\})$

    Proof

    1. $\emptyset = X^{-1}\emptyset \in \mathcal{A}_X$

    2. If $A_1, A_2, \ldots \in \mathcal{A}_X$, then $\exists B_1, B_2, \ldots \in \mathcal{B}^1$ such that $A_i = X^{-1}B_i$.

      So $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} X^{-1}B_i = X^{-1}\bigcup_{i=1}^{\infty} B_i \in \mathcal{A}_X$ (since $\bigcup_{i=1}^{\infty} B_i \in \mathcal{B}^1$)

    3. If $A \in \mathcal{A}_X$, then $\exists B \in \mathcal{B}^1$ such that $A = X^{-1}B$.

      So $A^C = (X^{-1}B)^C = X^{-1}B^C \in \mathcal{A}_X$ (since $B^C \in \mathcal{B}^1$)

    By 1 (contains null), 2 (closed under unions), 3 (closed under complementation), $\mathcal{A}_X$ is a sub $\sigma$-algebra of $\mathcal{A}$

    Def. Random Vector

    Recall

    A random variable is a function $X: \Omega \to \mathbb{R}^1$ with the property that for any $B \in \mathcal{B}^1$, $X^{-1}B \in \mathcal{A}$.

    Thus, when $X$ is a random variable, $P(X(\omega) \in B) = P(X^{-1}B)$, since $X^{-1}B = \{\omega : X(\omega) \in B\}$

    A random vector is a function $X: \Omega \to \mathbb{R}^k$ with the property that for any $B \in \mathcal{B}^k$, $X^{-1}B \in \mathcal{A}$.

    Thus, when $X$ is a random vector, $P(X(\omega) \in B) = P(X^{-1}B)$, since $X^{-1}B = \{\omega : X(\omega) \in B\}$

    Properties

    Example (Pt. 1)

    Suppose we have $\Omega = \{1, 2, 3\}$, $\mathcal{A} = 2^\Omega$, and the uniform prob measure $P$

    Let $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}: \Omega \to \mathbb{R}^2$ be given by $X(\omega) = \begin{pmatrix} X_1(\omega) \\ X_2(\omega) \end{pmatrix}$ where $X_1, X_2$ are defined as $X_1(1) = 0, X_1(2) = 0, X_1(3) = 1$ and $X_2(1) = 1, X_2(2) = 0, X_2(3) = 0$

    $X^{-1}\{(0,1)\} = \{1\}$, $X^{-1}\{(0,0)\} = \{2\}$, $X^{-1}\{(1,0)\} = \{3\}$, so $X^{-1}B = \begin{cases} \emptyset & (0,1), (0,0), (1,0) \notin B \\ \{1\} \text{ or } \{2\} \text{ or } \{3\} & \text{if only one of } (0,0), (0,1), (1,0) \in B \\ \{1,2\} \text{ or } \{1,3\} \text{ or } \{2,3\} & \text{if only two of } (0,0), (0,1), (1,0) \in B \\ \Omega & (0,1), (0,0), (1,0) \in B \end{cases}$

    $P_X(B) = \begin{cases} 0 & (0,1), (0,0), (1,0) \notin B \\ 1/3 & \text{if only one of } (0,1), (0,0), (1,0) \in B \\ 2/3 & \text{if only two of } (0,1), (0,0), (1,0) \in B \\ 1 & (0,1), (0,0), (1,0) \in B \end{cases}$

    Example (Pt. 2)

    What if we change the def of $X_2$? If $X_1, X_2$ are now $X_1(1) = 0, X_1(2) = 0, X_1(3) = 1$ and $X_2(1) = 1, X_2(2) = 1, X_2(3) = 0$, what is $P_X$?

    Only 2 possible outputs now: $(0,1)$ and $(1,0)$

    $X^{-1}\{(0,1)\} = \{1,2\}$, $X^{-1}\{(1,0)\} = \{3\}$, so $X^{-1}B = \begin{cases} \emptyset & (0,1), (1,0) \notin B \\ \{1,2\} & (0,1) \in B, (1,0) \notin B \\ \{3\} & (0,1) \notin B, (1,0) \in B \\ \Omega & (0,1), (1,0) \in B \end{cases}$

    Then for $B \in \mathcal{B}^2$, $P_X(B) = \begin{cases} 0 & (0,1), (1,0) \notin B \\ 2/3 & (0,1) \in B, (1,0) \notin B \\ 1/3 & (0,1) \notin B, (1,0) \in B \\ 1 & (0,1), (1,0) \in B \end{cases}$

    Example (Pt. 3)

    If $P$ is not uniform, but instead defined $P(\{1\}) = \frac{1}{2}, P(\{2\}) = \frac{1}{3}, P(\{3\}) = \frac{1}{6}$, what is $P_X$?

    $P_X(B) = \begin{cases} 0 & (0,1), (1,0) \notin B \\ 5/6 & (0,1) \in B, (1,0) \notin B \\ 1/6 & (0,1) \notin B, (1,0) \in B \\ 1 & \text{o/w} \end{cases}$
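    The push-forward definition $P_X(B) = P(X^{-1}B)$ can be computed mechanically on finite spaces. A sketch reproducing Pt. 3:

    ```python
    # Marginal measure P_X(B) = P({w : X(w) in B}) for the Pt. 3 example.
    P = {1: 1/2, 2: 1/3, 3: 1/6}           # non-uniform P on Omega = {1, 2, 3}
    X = {1: (0, 1), 2: (0, 1), 3: (1, 0)}  # the Pt. 2 / Pt. 3 definition of X

    def P_X(B):
        """P_X(B) for a finite set B of points in R^2."""
        return sum(p for w, p in P.items() if X[w] in B)

    print(P_X({(0, 1)}))           # 5/6
    print(P_X({(1, 0)}))           # 1/6
    print(P_X({(0, 1), (1, 0)}))   # 1.0
    ```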

    Prop 2.1.5 (Cartesian Prod of Borel Sets is a Borel Set)

    If $B_1, \ldots, B_k \in \mathcal{B}^1$, then $B_1 \times \ldots \times B_k = \{(x_1, \ldots, x_k)^T : x_i \in B_i, i = 1, \ldots, k\} \in \mathcal{B}^k$, and $\mathcal{B}^k$ is the smallest $\sigma$-algebra on $\mathbb{R}^k$ containing all such sets

    Proof

    Consider the sets $\mathbb{R}^1 \times \ldots \times B_i \times \ldots \times \mathbb{R}^1$ that only restrict the $i$-th coordinate.

    Then $\{\mathbb{R}^1 \times \ldots \times B_i \times \ldots \times \mathbb{R}^1 : B_i \in \mathcal{B}^1\}$ is a sub $\sigma$-algebra of $\mathcal{B}^k$

    Sub-proof (shown for $i = 1$; the other coordinates are identical) Let $\mathcal{B}_y = \{B \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1 : B \in \mathcal{B}^1\}$

    Since $(-\infty, b] \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1 \in \mathcal{B}^k \ \forall b \in \mathbb{R}^1$, an argument as in Prop 2.1.2 gives $B \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1 \in \mathcal{B}^k$ for every $B \in \mathcal{B}^1$, and $\mathcal{B}_y$ is a $\sigma$-algebra:

    1. $\emptyset = \emptyset \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1 \in \mathcal{B}_y$
    2. If $B_i \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1 \in \mathcal{B}_y$ for $i = 1, 2, \ldots$, then $\bigcup_{i=1}^{\infty} (B_i \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1) = (\bigcup_{i=1}^{\infty} B_i) \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1 \in \mathcal{B}_y$ since $\bigcup_{i=1}^{\infty} B_i \in \mathcal{B}^1$
    3. If $B \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1 \in \mathcal{B}_y$, then $(B \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1)^C = B^C \times \mathbb{R}^1 \times \ldots \times \mathbb{R}^1 \in \mathcal{B}_y$ since $B^C \in \mathcal{B}^1$.

    So $B_1 \times \ldots \times B_k = \bigcap_{i=1}^{k} (\mathbb{R}^1 \times \ldots \times B_i \times \ldots \times \mathbb{R}^1) \in \mathcal{B}^k$

    Since each k-cell $(a, b] = (a_1, b_1] \times \ldots \times (a_k, b_k]$ is of this form, any $\sigma$-algebra on $\mathbb{R}^k$ containing all such product sets contains all k-cells, and hence contains $\mathcal{B}^k$; there cannot be a smaller such $\sigma$-algebra than $\mathcal{B}^k$.

    Prop 2.1.6 (A Vector of R.V.s is a Random Vector)

    If $X_i: \Omega \to \mathbb{R}^1$ is a random variable for $i = 1, \ldots, k$, then $X = (X_1, \ldots, X_k)^T: \Omega \to \mathbb{R}^k$ is a random vector.

    Proof

    Suppose $B_1, \ldots, B_k \in \mathcal{B}^1$. By the previous proposition, $B_1 \times \ldots \times B_k \in \mathcal{B}^k$. Then we have

    $X^{-1}(B_1 \times \ldots \times B_k) = \{\omega : X(\omega) \in B_1 \times \ldots \times B_k\} = \{\omega : X_i(\omega) \in B_i \text{ for } i = 1, \ldots, k\} = \bigcap_{i=1}^{k} X_i^{-1}B_i \in \mathcal{A}$

    Since $X^{-1}(a, b] \in \mathcal{A} \ \forall a, b \in \mathbb{R}^k \Rightarrow X^{-1}B \in \mathcal{A} \ \forall B \in \mathcal{B}^k \Rightarrow X$ is a random vector.

    Lecture 6

    Def. K-cells

    $(a, b] = \times_{i=1}^{k} (a_i, b_i]$, or $(-\infty, b] = \times_{i=1}^{k} (-\infty, b_i]$

    K-cells are the basic sets we want to assign probabilities to (using random vectors)

    For $k = 2$, $(a, b]$ is the half-open rectangle $(a_1, b_1] \times (a_2, b_2]$ in the plane.

    Def. Cumulative Distribution Function (CDF)

    The cumulative distribution function $F_X: \mathbb{R}^k \to [0,1]$ for random vector $X \in \mathbb{R}^k$ is given by $F_X(x_1, \ldots, x_k) = P_X((-\infty, x_1] \times \ldots \times (-\infty, x_k]) = P_X((-\infty, x])$

    Def. Difference Operator

    For any $g: \mathbb{R}^k \to \mathbb{R}^1$, the $i$-th difference operator $\Delta_{a,b}^{(i)}\, g: \mathbb{R}^{k-1} \to \mathbb{R}^1$ is given by $(\Delta_{a,b}^{(i)}\, g)(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k) = g(x_1, \ldots, x_{i-1}, b, x_{i+1}, \ldots, x_k) - g(x_1, \ldots, x_{i-1}, a, x_{i+1}, \ldots, x_k)$

    Prop 2.2.1 (Properties of Distribution Functions)

    Any distribution function $F_X: \mathbb{R}^k \to [0,1]$ satisfies

    1. If $a_i \le b_i$ for $i = 1, \ldots, k$, then $P_X((a, b]) = \Delta_{a_1,b_1}^{(1)} \Delta_{a_2,b_2}^{(2)} \ldots \Delta_{a_k,b_k}^{(k)} F_X$

    2. As $x_i \to -\infty$ (for some $i$), $F_X(x_1, \ldots, x_k) \to 0$

      As $x_i \to \infty$ (for all $i$), $F_X(x_1, \ldots, x_k) \to 1$

    3. $F_X$ is right continuous

      If $\delta_i \downarrow 0 \ \forall i$, then $F_X(x_1 + \delta_1, \ldots, x_k + \delta_k) \to F_X(x_1, \ldots, x_k)$

    Proof for (1)

     

    Proof for (2)

     

    Proof for (3)

     

    Thm 2.2.1 (Extension Theorem)

    If $F: \mathbb{R}^k \to [0,1]$ satisfies the 3 properties of distribution functions, then $\exists$ a unique probability measure $P$ on $\mathcal{B}^k$ such that $F$ is the distribution function of $P$

    Note such an $F$ determines a probability model $(\mathbb{R}^k, \mathcal{B}^k, P)$ and we can define a random vector with this model by taking $\Omega = \mathbb{R}^k$ and $X(\omega) = \omega$

    Now we can present $P_X$ by a function of points ($F_X$) rather than a function of sets

    Def. Marginal Distributions

    Def. Discrete Probability Models

    Prop 2.3.1 (Countably Many Points with Positive Prob)

    Prop 2.3.2 (Prob Measure Defined by p)

    Def. Multinomial Distribution

    Def. Multivariate Hypergeometric Distribution

    Lecture 7

    Def. Continuous Probability Models

    Def. Absolutely Continuous Probability Models

    Def. Probability Density Functions (PDF)

    Prop 2.4.1 (Properties of A.C. Models)

    1. $f(x) \ge 0$ with probability 1
    2. $\int_{\mathbb{R}^k} f(x)\,dx = 1$
    3. $F(x) = F(x_1, \ldots, x_k) = \int_{-\infty}^{x_k} \ldots \int_{-\infty}^{x_1} f(z_1, \ldots, z_k)\,dz_1 \ldots dz_k$
    4. $f(x) = f(x_1, \ldots, x_k) = \frac{\partial^k F(x_1, \ldots, x_k)}{\partial x_1 \ldots \partial x_k}$

    Prop 2.4.2 (Properties of PDFs)

    $f: (\mathbb{R}^k, \mathcal{B}^k) \to (\mathbb{R}^1, \mathcal{B}^1)$ is a density function for a.c. model $(\mathbb{R}^k, \mathcal{B}^k, P)$ if

    1. $f(x) \ge 0 \ \forall x$
    2. $\int_{\mathbb{R}^k} f(x)\,dx = 1$

    Def. Multivariate Normal Distribution

    Lecture 8 & 9

    Suppose we transform the random vector $X \in \mathbb{R}^k$ to the random vector $Y = T(X) \in \mathbb{R}^l$

    Discrete case

    If $X$ is discrete (with prob function $p_X$), then $p_Y(y) = P_Y(\{y\}) = P_X(T^{-1}\{y\}) = \sum_{x \in T^{-1}\{y\}} p_X(x)$

    Def. Projections (& their Prob Functions)

    Suppose $k \ge 2$, then the projection on the first 2 coordinates is $(y_1, y_2) = T(x_1, \ldots, x_k) = (x_1, x_2)$

    Prob Function Derivation:

    To find the probability functions of projections, take the joint probability function, and sum out unwanted variables.

    $T^{-1}\{y\} = T^{-1}\{(y_1, y_2)\} = \{(x_1, \ldots, x_k) : x_1 = y_1, x_2 = y_2\}$

    $p_Y(y) = p_Y(y_1, y_2) = P_X(T^{-1}\{y\}) = \sum_{x \in T^{-1}\{y\}} p_X(x) = \sum_{(x_1, \ldots, x_k): x_1 = y_1, x_2 = y_2} p_X(x_1, \ldots, x_k) = \underbrace{\sum_{(x_3, \ldots, x_k) \in \mathbb{R}^{k-2}} p_X(y_1, y_2, x_3, \ldots, x_k)}_{\text{fix } x_1, x_2 \text{ to } y_1, y_2}$

    The projection on the second coordinate is $y = T(x_1, \ldots, x_k) = x_2$

    Prob Function Derivation:

    $T^{-1}\{y\} = \{(x_1, \ldots, x_k) : x_2 = y\}$

    $p_Y(y) = \sum_{(x_1, \ldots, x_k): x_2 = y} p_X(x_1, \ldots, x_k) = \sum_{(x_1, x_3, \ldots, x_k) \in \mathbb{R}^{k-1}} p_X(x_1, y, x_3, \ldots, x_k)$

    Marginal of a Multinomial Random Vector

    Let $X = (X_1, \ldots, X_k) \sim$ multinomial$(n, p_1, \ldots, p_k)$, then $p_X(a) = \binom{n}{a_1 \ \ldots \ a_k} p_1^{a_1} \cdots p_k^{a_k}$
    where $a \in \mathbb{R}^k$, $a_i \in \{0, \ldots, n\}$, and $a_1 + \ldots + a_k = n$

    Suppose $k \ge 2$, $(y_1, y_2) = T(x_1, \ldots, x_k) = (x_1, x_2)$, and we want to find the distribution of $Y = (X_1, X_2)$

    By the defined constraints, $y_1, y_2, a_3, \ldots, a_k \in \{0, \ldots, n\}$ and $y_1 + y_2 + a_3 + \ldots + a_k = n$

    $\Rightarrow a_3, \ldots, a_k \in \{0, \ldots, n - y_1 - y_2\}$ and $a_3 + \ldots + a_k = n - y_1 - y_2 \quad (*)$

    so $\begin{aligned} p_Y(y_1, y_2) &= \sum_{(a_3, \ldots, a_k) \text{ sat. } (*)} \binom{n}{y_1 \ y_2 \ a_3 \ \ldots \ a_k} p_1^{y_1} p_2^{y_2} p_3^{a_3} \cdots p_k^{a_k} \\ &= \frac{n!}{y_1! y_2!} p_1^{y_1} p_2^{y_2} \sum_{(a_3, \ldots, a_k) \text{ sat. } (*)} \frac{1}{a_3! \cdots a_k!} p_3^{a_3} \cdots p_k^{a_k} && \text{took out terms where } i = 1, 2 \\ &= \frac{n!}{y_1! y_2! (n - y_1 - y_2)!} p_1^{y_1} p_2^{y_2} \sum_{(a_3, \ldots, a_k) \text{ sat. } (*)} \frac{(n - y_1 - y_2)!}{a_3! \cdots a_k!} p_3^{a_3} \cdots p_k^{a_k} && \text{multiplied prev by } \tfrac{(n - y_1 - y_2)!}{(n - y_1 - y_2)!} \\ &= \binom{n}{y_1 \ \ y_2 \ \ n - y_1 - y_2} p_1^{y_1} p_2^{y_2} (1 - p_1 - p_2)^{n - y_1 - y_2} \underbrace{\sum_{(a_3, \ldots, a_k) \text{ sat. } (*)} \binom{n - y_1 - y_2}{a_3 \ \ldots \ a_k} \left(\tfrac{p_3}{1 - p_1 - p_2}\right)^{a_3} \cdots \left(\tfrac{p_k}{1 - p_1 - p_2}\right)^{a_k}}_{\text{sum of all multinomial}(n - y_1 - y_2, \frac{p_3}{1 - p_1 - p_2}, \ldots, \frac{p_k}{1 - p_1 - p_2}) \text{ probabilities, so } = 1} \\ &= \binom{n}{y_1 \ \ y_2 \ \ n - y_1 - y_2} p_1^{y_1} p_2^{y_2} (1 - p_1 - p_2)^{n - y_1 - y_2} \end{aligned}$

    Thus, $(X_1, X_2) \sim$ multinomial$(n, p_1, p_2, 1 - p_1 - p_2)$

    Binomial(n, p) = Multinomial(n, p, 1-p)

    If $X = (X_1, \ldots, X_k) \sim$ multinomial$(n, p_1, \ldots, p_k)$, then prove $X_i \sim$ binomial$(n, p_i) =$ multinomial$(n, p_i, 1 - p_i)$. Note this is easy to see intuitively since the multinomial arises by placing $n$ independent observations into $k$ mutually disjoint categories, and when we project onto $l$ coordinates we are now categorizing into $l + 1$ mutually disjoint categories

    $\begin{aligned} P_{X_1}(x_1) &= \sum_{x_2=0}^{n - x_1} \binom{n}{x_1 \ x_2 \ n - x_1 - x_2} p_1^{x_1} p_2^{x_2} (1 - p_1 - p_2)^{n - x_1 - x_2} \\ &= \sum_{x_2=0}^{n - x_1} \frac{n!}{x_1! x_2! (n - x_1 - x_2)!} p_1^{x_1} p_2^{x_2} (1 - p_1 - p_2)^{n - x_1 - x_2} && \text{now multiply by } \tfrac{(n - x_1)!}{(n - x_1)!} \\ &= \frac{n!}{x_1! (n - x_1)!} p_1^{x_1} (1 - p_1)^{n - x_1} \underbrace{\sum_{x_2=0}^{n - x_1} \frac{(n - x_1)!}{x_2! (n - x_1 - x_2)!} \left(\tfrac{p_2}{1 - p_1}\right)^{x_2} \left(1 - \tfrac{p_2}{1 - p_1}\right)^{n - x_1 - x_2}}_{\text{sum of all binomial}(n - x_1, \frac{p_2}{1 - p_1}) \text{ probabilities, so } = 1} \\ &= \frac{n!}{x_1! (n - x_1)!} p_1^{x_1} (1 - p_1)^{n - x_1} \end{aligned}$

    So $X_1 \sim$ binomial$(n, p_1)$
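    A quick simulation sketch of this marginal fact (parameters are hypothetical):

    ```python
    # Check by simulation that a single coordinate of a multinomial(n, p)
    # vector is distributed binomial(n, p_1).
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 10, [0.2, 0.5, 0.3]
    X = rng.multinomial(n, p, size=200_000)   # rows are multinomial draws
    Y = rng.binomial(n, p[0], size=200_000)   # direct binomial(n, p_1) draws

    # The empirical distributions of X_1 and Y should agree closely.
    print(np.bincount(X[:, 0], minlength=n + 1) / len(X))
    print(np.bincount(Y, minlength=n + 1) / len(Y))
    ```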

    Sum of sub-Multinomial Random Vector ~ Binomial

    Use the previous note to determine the distribution of $Y = X_1 + \ldots + X_l$ for $l \le k$ when $(X_1, \ldots, X_k) \sim$ multinomial$(n, p_1, \ldots, p_k)$

    Note in the discrete case, if $T$ is 1-1 and $T^{-1}\{y\} \neq \emptyset$, then $p_Y(y) = P_X(T^{-1}\{y\}) = p_X(T^{-1}\{y\})$

    $Y = X_1 + \ldots + X_l$ is the number of responses falling in the first $l$ categories.

    A response falls into one of these categories with probability $p_1 + \ldots + p_l$.

    So $Y \sim$ binomial$(n, p_1 + \ldots + p_l)$
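    The same kind of simulation confirms the sum fact (hypothetical parameters):

    ```python
    # The sum of the first l coordinates of a multinomial(n, p) vector
    # should match binomial(n, p_1 + ... + p_l).
    import numpy as np

    rng = np.random.default_rng(1)
    n, p, l = 12, [0.1, 0.2, 0.3, 0.4], 2
    X = rng.multinomial(n, p, size=200_000)
    S = X[:, :l].sum(axis=1)                       # Y = X_1 + ... + X_l
    B = rng.binomial(n, sum(p[:l]), size=200_000)  # binomial(n, p_1 + ... + p_l)

    print(np.bincount(S, minlength=n + 1) / len(S))
    print(np.bincount(B, minlength=n + 1) / len(B))
    ```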

    Def. Indicator Function

    For $A \subseteq \Omega$, the indicator function $I_A: \Omega \to \mathbb{R}^1$ is given by $I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \in A^C \end{cases}$

    Indicator Variable ~ Bernoulli(P(A))

    Prove: if $(\Omega, \mathcal{A}, P)$ is a probability model and $A \in \mathcal{A}$, then $Y = I_A$ is a random variable with $Y \sim$ Bernoulli$(P(A))$

    $A \in \mathcal{A}$, $I_A: \Omega \to \mathbb{R}^1$, and $\forall B \in \mathcal{B}^1$, $I_A^{-1}B = \{\omega : I_A(\omega) \in B\} = \begin{cases} \emptyset & 0, 1 \notin B \\ A & 1 \in B, 0 \notin B \\ A^C & 0 \in B, 1 \notin B \\ \Omega & 0, 1 \in B \end{cases} \in \mathcal{A}$

    Since for any $B \in \mathcal{B}^1$, $I_A^{-1}B \in \mathcal{A}$, we know $Y = I_A$ is a r.v.

    $P_Y(\{1\}) = P(I_A^{-1}\{1\}) = P(\{\omega : I_A(\omega) = 1\}) = P(A) \Rightarrow Y \sim$ Bern$(P(A))$

    Transformation Determines Distribution Type

    Y=T(X) could have a discrete distribution no matter how X is distributed.

    E.g. Suppose $T(x) = c \in \mathbb{R}^l$ for every $x$, then $p_Y(y) = P_X(T^{-1}\{y\}) = \begin{cases} P_X(\mathbb{R}^k) = 1 & \text{if } y = c \\ P_X(\emptyset) = 0 & \text{if } y \neq c \end{cases}$

    and the distribution of $Y$ is degenerate at $c$

    E.g. Suppose $X \sim N(0,1)$, so $P(X \le 0) = P(X > 0) = 1/2$

    $Y = T(X) = I_{(-\infty, 0]}(X) = \begin{cases} 1 & \text{if } X \le 0 \\ 0 & \text{if } X > 0 \end{cases} \Rightarrow \begin{cases} p_Y(1) = P(X \le 0) = 1/2 \\ p_Y(0) = P(X > 0) = 1/2 \end{cases} \Rightarrow Y \sim$ Bernoulli$(1/2)$

    Absolutely continuous case

    Suppose $X \in \mathbb{R}^k$ has density function $f_X$, and $Y = T(X) \in \mathbb{R}^l$ where $l \le k$.

    $Y$ is also absolutely continuous with density $f_Y$ which we want to determine.

    Cdf Method

    Generally, the cdf method works with projections $T$ when there is a formula for $F_X$:

    $f_Y(y_1, \ldots, y_l) = \frac{\partial^l F_Y(y_1, \ldots, y_l)}{\partial y_1 \ldots \partial y_l} = \frac{\partial^l P_X(T^{-1}((-\infty, y_1] \times \ldots \times (-\infty, y_l]))}{\partial y_1 \ldots \partial y_l}$

    E.g. Define $F: \mathbb{R}^2 \to [0,1]$ by $F(x_1, x_2) = \begin{cases} 0 & x_1 < 0 \text{ or } x_2 < 0 \\ 1 - e^{-x_1} - e^{-x_2} + e^{-x_1 - x_2} & x_1 \ge 0 \text{ and } x_2 \ge 0 \end{cases}$

    It was proved (in a lec 6 exercise) that this is a cdf (using thm 2.2.1),

    so $f(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1 \partial x_2} = \begin{cases} 0 & x_1 < 0 \text{ or } x_2 < 0 \\ e^{-x_1 - x_2} & x_1 \ge 0 \text{ and } x_2 \ge 0 \end{cases}$

    Check that $f$ is a valid pdf:

    (i) $f(x_1, x_2) \ge 0$ for all $(x_1, x_2)$

    (ii) $f$ is normed:

    $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 dx_2 = \int_0^{\infty} \int_0^{\infty} e^{-x_1 - x_2}\,dx_1 dx_2 = \int_0^{\infty} e^{-x_1}\,dx_1 \int_0^{\infty} e^{-x_2}\,dx_2 = \left(-e^{-x_1}\big|_0^{\infty}\right)\left(-e^{-x_2}\big|_0^{\infty}\right) = 1$

    so it is valid and we obtain $F(x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f(z_1, z_2)\,dz_2 dz_1$

    Therefore, if $Y = T(X_1, X_2) = X_1$, then $F_{X_1}(x_1) = F(x_1, \infty) = \begin{cases} 0 & x_1 < 0 \\ 1 - e^{-x_1} & x_1 \ge 0 \end{cases}$

    so $f_{X_1}(x_1) = \frac{\partial F_{X_1}(x_1)}{\partial x_1} = \begin{cases} 0 & x_1 < 0 \\ e^{-x_1} & x_1 \ge 0 \end{cases}$, and $f_{X_2}(x_2) = \frac{\partial F_{X_2}(x_2)}{\partial x_2} = \begin{cases} 0 & x_2 < 0 \\ e^{-x_2} & x_2 \ge 0 \end{cases}$

    Thus, both $X_1$ and $X_2$ have exponential(1) distributions

    E.g. Suppose $y = T(x_1, x_2) = x_1 + x_2$, and $(X_1, X_2)$ has the triangular density $f(x_1, x_2) = \begin{cases} 2 & 0 < x_1 < x_2 < 1 \\ 0 & \text{o/w} \end{cases}$

    $F_Y(y) = P_Y((-\infty, y]) = P_{(X_1, X_2)}(\{(x_1, x_2) : x_1 + x_2 \le y\}) = \begin{cases} 0 & y < 0 \\ \int_0^{y/2} \int_{x_1}^{y - x_1} 2\,dx_2 dx_1 = y^2/2 & 0 \le y \le 1 \\ 1 - \int_{y/2}^{1} \int_{y - x_2}^{x_2} 2\,dx_1 dx_2 = 2y - y^2/2 - 1 & 1 < y \le 2 \\ 1 & 2 < y \end{cases}$ $\Rightarrow f_Y(y) = \begin{cases} 0 & y \le 0 \text{ or } y \ge 2 \\ y & 0 < y < 1 \\ 2 - y & 1 \le y < 2 \end{cases}$
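    A Monte Carlo sketch checking this cdf (sampling uniformly on the triangle by sorting two uniforms):

    ```python
    # Check F_Y for Y = X_1 + X_2 under f(x1, x2) = 2 on 0 < x1 < x2 < 1.
    import numpy as np

    rng = np.random.default_rng(2)
    U = np.sort(rng.uniform(size=(500_000, 2)), axis=1)  # uniform on the triangle
    Y = U.sum(axis=1)

    # Compare the empirical cdf with y^2/2 on [0,1] and 2y - y^2/2 - 1 on (1,2].
    for y in [0.5, 1.0, 1.5]:
        F_exact = y**2 / 2 if y <= 1 else 2*y - y**2/2 - 1
        print(y, (Y <= y).mean(), F_exact)
    ```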

    Change of Variable Method

    Suppose $T: \mathbb{R}^k \to \mathbb{R}^k$ is 1-1 and smooth (i.e. all 1st order partial derivatives exist and are continuous),

    so $T(x) = \begin{pmatrix} T_1(x) \\ \vdots \\ T_k(x) \end{pmatrix}$ and we can find the Jacobian $J_T(x) = \left| \det \begin{pmatrix} \frac{\partial T_1(x)}{\partial x_1} & \ldots & \frac{\partial T_1(x)}{\partial x_k} \\ \vdots & & \vdots \\ \frac{\partial T_k(x)}{\partial x_1} & \ldots & \frac{\partial T_k(x)}{\partial x_k} \end{pmatrix} \right|^{-1}$

    Since $J_T(x) = \lim_{\delta \to 0} \frac{\text{vol}(B_\delta(x))}{\text{vol}(T B_\delta(x))}$, $J_T^{-1}(x)$ indicates how $T$ is changing volume at $x$,

    so $J_T(x) < 1$ means $T$ expands volume at $x$, and $J_T(x) > 1$ means $T$ contracts volume at $x = T^{-1}(y)$

    If $Y = T(X)$, then for small $\delta$,

    $f_Y(y) \approx \frac{P_Y(T B_\delta(T^{-1}(y)))}{\text{vol}(T B_\delta(T^{-1}(y)))} = \frac{P_X(B_\delta(T^{-1}(y)))}{\text{vol}(B_\delta(T^{-1}(y)))} \cdot \frac{\text{vol}(B_\delta(T^{-1}(y)))}{\text{vol}(T B_\delta(T^{-1}(y)))} \to f_X(T^{-1}(y))\, J_T(T^{-1}(y))$

    This intuitive argument can be made rigorous to prove the following.

    Prop 2.5.1 (Change of Variable)

    If $T: \mathbb{R}^k \to \mathbb{R}^k$ is 1-1 and smooth, and $Y = T(X)$ where $X$ has an a.c. distribution with density $f_X$,
    then $Y$ has an a.c. distribution with density $f_Y(y) = f_X(T^{-1}(y))\, J_T(T^{-1}(y))$

    E.g. If we have a uniform dist $f(x) = \frac{1}{2}$ for $0 < x < 2$, find the density for $Y = T(X) = X^2$.

    $T^{-1}(y) = y^{1/2}$, $J_T(x) = |\det(2x)|^{-1} = \frac{1}{2x}$ for $x \in (0, 2)$

    Note: solving $J_T(x) = 1$, we see that $T$ contracts lengths on $(0, 1/2)$ and expands lengths on $(1/2, 2)$

    $f_Y(y) = f(T^{-1}(y))\, J_T(T^{-1}(y)) = f(y^{1/2}) \frac{1}{2y^{1/2}} = \begin{cases} 0 & y \le 0 \text{ or } y \ge 4 \\ \frac{1}{4} y^{-1/2} & 0 < y < 4 \end{cases}$
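    A small Monte Carlo check of this result, via the implied cdf $F_Y(y) = \sqrt{y}/2$ on $(0, 4)$:

    ```python
    # Check the change-of-variable density for Y = X^2, X ~ uniform(0, 2):
    # f_Y(y) = 1/(4 sqrt(y)) on (0, 4), so F_Y(y) = sqrt(y)/2.
    import numpy as np

    rng = np.random.default_rng(3)
    Y = rng.uniform(0, 2, size=500_000) ** 2

    for y in [0.25, 1.0, 2.25]:
        print(y, (Y <= y).mean(), np.sqrt(y) / 2)
    ```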

    E.g. Prove $\int_{-\infty}^{\infty} \varphi(x)\,dx = 1$ for $\varphi$, the $N(0,1)$ pdf.

    Consider $\left(\int_{-\infty}^{\infty} \varphi(x)\,dx\right)^2 = \int_{-\infty}^{\infty} \varphi(x)\,dx \int_{-\infty}^{\infty} \varphi(y)\,dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{1}{2\pi} \exp\left(-\frac{x^2 + y^2}{2}\right) dx\,dy$

    Make the polar coordinate change of variable $T(x, y) = (r, \theta)$ where for $r \in (0, \infty), \theta \in [0, 2\pi)$,

    $(x, y) = T^{-1}(r, \theta) = (r\cos\theta, r\sin\theta)$

    $J_{T^{-1}}(r, \theta) = \left| \det \begin{pmatrix} \frac{\partial r\cos\theta}{\partial r} & \frac{\partial r\cos\theta}{\partial \theta} \\ \frac{\partial r\sin\theta}{\partial r} & \frac{\partial r\sin\theta}{\partial \theta} \end{pmatrix} \right|^{-1} = \left| \det \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} \right|^{-1} = |r(\cos^2\theta + \sin^2\theta)|^{-1} = 1/r$

    Since $J_T(x) = J_{T^{-1}}^{-1}(T(x)) = \frac{1}{J_{T^{-1}}(T(x))} = r$, and $r^2 = x^2 + y^2$,

    $\left(\int_{-\infty}^{\infty} \varphi(x)\,dx\right)^2 = \int_0^{\infty} \int_0^{2\pi} \frac{r}{2\pi} \exp(-r^2/2)\,d\theta\,dr = \int_0^{\infty} r \exp(-r^2/2)\,dr = -\exp(-r^2/2)\big|_0^{\infty} = 1$

    so $\int_{-\infty}^{\infty} \varphi(x)\,dx = 1$
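    The same conclusion can be confirmed numerically, as a sanity check on the derivation:

    ```python
    # Numeric confirmation that the N(0,1) density integrates to 1.
    import numpy as np
    from scipy.integrate import quad

    phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    val, err = quad(phi, -np.inf, np.inf)
    print(val)  # ~1.0
    ```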

    Def. Affine Transformation

    (Affine transformations are linear transformations plus a constant.)

    $T: \mathbb{R}^k \to \mathbb{R}^k$ is an affine transformation if
    $T(x) = Ax + b = \begin{pmatrix} a_{11}x_1 + \ldots + a_{1k}x_k + b_1 \\ a_{21}x_1 + \ldots + a_{2k}x_k + b_2 \\ \vdots \\ a_{k1}x_1 + \ldots + a_{kk}x_k + b_k \end{pmatrix}$ where $b \in \mathbb{R}^k, A \in \mathbb{R}^{k \times k}$

    So $J_T(x) = \left| \det \begin{pmatrix} \frac{\partial T_1(x)}{\partial x_1} & \ldots & \frac{\partial T_1(x)}{\partial x_k} \\ \vdots & & \vdots \\ \frac{\partial T_k(x)}{\partial x_1} & \ldots & \frac{\partial T_k(x)}{\partial x_k} \end{pmatrix} \right|^{-1} = |\det A|^{-1}$

    Note: $T(x_1) = T(x_2)$ iff $A(x_1 - x_2) = 0$, so $T$ is 1-1 iff $A$ is a nonsingular (invertible) matrix, in which case $T^{-1}(y) = A^{-1}(y - b) = x$

    If $Y = AX + b$, then $f_Y(y) = f_X(T^{-1}(y))\, J_T(T^{-1}(y)) = f_X(A^{-1}(y - b)) |\det A|^{-1}$

    Multivariate Normal

    Suppose $Z \sim N_k(0, I)$, so $f_Z(z) = (2\pi)^{-k/2} \exp(-z'z/2)$ for $z \in \mathbb{R}^k$

    Let $X = AZ + \mu \Leftrightarrow Z = A^{-1}(X - \mu)$ where $A \in \mathbb{R}^{k \times k}$ is nonsingular and $\mu \in \mathbb{R}^k$. Then since $X$ is an affine transformation of $Z$, we know it has an a.c. distribution with density:

    $\begin{aligned} f_X(x) &= f_Z(A^{-1}(x - \mu)) |\det A|^{-1} \\ &= (2\pi)^{-k/2} \exp(-(A^{-1}(x - \mu))'A^{-1}(x - \mu)/2) |\det A|^{-1} && \text{plug in } z = A^{-1}(x - \mu) \\ &= (2\pi)^{-k/2} |\det A|^{-1} \exp(-(x - \mu)'(A^{-1})'A^{-1}(x - \mu)/2) && \text{reorder} \\ &= (2\pi)^{-k/2} |\det A \det A'|^{-1/2} \exp(-(x - \mu)'(AA')^{-1}(x - \mu)/2) && \det(A) = \det(A') \\ &= (2\pi)^{-k/2} |\det AA'|^{-1/2} \exp(-(x - \mu)'(AA')^{-1}(x - \mu)/2) && \det(AB) = \det(A)\det(B) \\ &= (2\pi)^{-k/2} (\det \Sigma)^{-1/2} \exp(-(x - \mu)'\Sigma^{-1}(x - \mu)/2) \end{aligned}$

    where $\Sigma = AA' \in \mathbb{R}^{k \times k}$

    If a random vector $X$ has this pdf, $X \sim N_k(\mu, \Sigma)$

    Note $\Sigma$ is symmetric, invertible, and positive definite (see note from lecture 2)
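    The derived density can be checked against a library implementation. A sketch with a hypothetical nonsingular $A$ and $\mu$:

    ```python
    # Verify the derived N_k(mu, Sigma) pdf against scipy, with Sigma = A A^T.
    import numpy as np
    from scipy.stats import multivariate_normal

    A = np.array([[2.0, 0.5], [0.0, 1.0]])  # nonsingular (hypothetical)
    mu = np.array([1.0, -1.0])
    Sigma = A @ A.T

    x = np.array([0.3, 0.7])
    k = len(mu)
    quad_form = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
    f = (2*np.pi)**(-k/2) * np.linalg.det(Sigma)**(-0.5) * np.exp(-quad_form/2)

    print(f, multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
    ```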

    Ex. Suppose