NLP - Probability Review

date: Aug 19, 2024
type: Post
AI summary: The document reviews key concepts in probability, including random variables (discrete and continuous), probability distributions (PMF and PDF), expectation and variance, joint distributions, conditional expectation, and complex queries involving random variables. It discusses applications such as order statistics and the expectation of products, along with conditional probabilities and normalization constants.
slug: ml-probability-review
status: Published
tags: NLP
summary: Reviews key concepts in probability, including random variables (discrete and continuous), probability distributions (PMF and PDF), expectation and variance, joint distributions, conditional expectation, and complex queries involving random variables.

Random Variable

A random variable is a numerical outcome of a random phenomenon. Essentially, it is a function that assigns a numerical value to each outcome in a sample space of a random experiment.
  1. Discrete Random Variable: A discrete random variable can take on a countable number of distinct values.
  2. Continuous Random Variable: A continuous random variable can take on any value within a certain range.

Probability Distribution

  1. Probability Mass Function (PMF) for Discrete Random Variables:
      • The PMF, denoted as $$p_X(x)$$, gives the probability that a discrete random variable $$X$$ is exactly equal to some value $$x$$: $$p_X(x) = P(X = x)$$.
      • Example: If $$X$$ is the number of heads in two flips of a fair coin, the PMF is $$P(X = 0) = \frac{1}{4}$$, $$P(X = 1) = \frac{1}{2}$$, $$P(X = 2) = \frac{1}{4}$$.
  2. Probability Density Function (PDF) for Continuous Random Variables:
      • The PDF, denoted as $$f_X(x)$$, describes the likelihood of the random variable taking on a particular value. The probability of $$X$$ falling within a particular interval $$[a, b]$$ is given by the area under the curve of the PDF over that interval: $$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$.
      • Example: If $$X$$ is the time until the next bus arrives, its PDF might be $$f(t) = \lambda e^{-\lambda t}$$ for $$t \ge 0$$ (an exponential distribution), where $$\lambda > 0$$ is the rate parameter.
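As a quick sanity check, both examples above can be written out in plain Python. This is a minimal sketch (the function names `coin_pmf` and `exp_pdf` are illustrative, not from any library):

```python
from math import comb, exp

def coin_pmf(k, n=2, p=0.5):
    """PMF of the number of heads k in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exp_pdf(t, lam=1.0):
    """PDF of an exponential distribution with rate lam, defined for t >= 0."""
    return lam * exp(-lam * t) if t >= 0 else 0.0

# PMF for two fair coin flips: P(X=0)=0.25, P(X=1)=0.5, P(X=2)=0.25
pmf = {k: coin_pmf(k) for k in range(3)}
assert abs(sum(pmf.values()) - 1.0) < 1e-12  # a valid PMF sums to 1
```

Note that a PDF value such as `exp_pdf(0.0, lam=2.0) == 2.0` can exceed 1; only the area under the curve is a probability.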

Expectation

The expectation of a random variable is a measure of the central tendency, or the average value it takes.
For a discrete random variable, it is calculated as:
$$E[X] = \sum_x x \, P(X = x)$$
For a continuous random variable:
$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$

Variance

The variance of a random variable measures the spread of its values.
For a discrete random variable:
$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big] = \sum_x (x - E[X])^2 \, P(X = x)$$
For a continuous random variable:
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - E[X])^2 \, f(x) \, dx$$
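Both discrete formulas can be checked numerically. The sketch below uses the two-fair-coin-flip PMF as the distribution (a minimal hand-rolled example, not a general library):

```python
# Two fair coin flips: values of X and their probabilities
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

# E[X] = sum over x of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = sum over x of (x - E[X])^2 * P(X = x)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, var)  # mean is 1.0, variance is 0.5
```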

Conditional Probabilities & Normalization Constant

Conditional Probability

Conditional probability is the probability of an event occurring given that another event has already occurred. In the context of random variables, it refers to the probability of one random variable taking a certain value given that another random variable has taken a specific value.
The conditional probability of an event $$A$$ given an event $$B$$ is denoted as $$P(A \mid B)$$ and is defined as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$
For random variables $$X$$ and $$Y$$, the conditional probability of $$X = x$$ given $$Y = y$$ is:
$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$$
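This definition can be applied directly to a joint PMF stored as a table. A minimal sketch with hypothetical numbers:

```python
# Hypothetical joint PMF P(X = x, Y = y) over binary variables
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def cond_prob(joint, x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)  # marginal P(Y = y)
    return joint[(x, y)] / p_y

print(cond_prob(joint, 1, 0))  # 0.3 / (0.1 + 0.3) = 0.75
```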

Normalization Constant

When dealing with conditional probabilities, especially in cases involving multiple possible outcomes, the probabilities may need to be normalized to ensure that they sum to 1. This is where the normalization constant, often denoted as $$\alpha$$, comes into play.
Suppose you have a set of unnormalized probabilities $$\tilde{P}(x \mid y)$$ for different values of $$x$$. To normalize these probabilities, you multiply them by a constant $$\alpha$$ such that the sum of the probabilities equals 1:
$$\alpha \sum_x \tilde{P}(x \mid y) = 1 \quad\Longrightarrow\quad \alpha = \frac{1}{\sum_x \tilde{P}(x \mid y)}$$
The normalized conditional probabilities are then given by:
$$P(x \mid y) = \alpha \, \tilde{P}(x \mid y)$$
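Normalization is a one-liner in practice. A minimal sketch, with hypothetical unnormalized scores for a part-of-speech guess:

```python
def normalize(unnormalized):
    """Scale a dict of non-negative scores so its values sum to 1."""
    z = sum(unnormalized.values())          # alpha = 1 / z
    return {x: v / z for x, v in unnormalized.items()}

scores = {"noun": 3.0, "verb": 1.0}         # hypothetical unnormalized scores
probs = normalize(scores)
print(probs)  # {'noun': 0.75, 'verb': 0.25}
```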

Complex Queries on Random Variables

Joint Distributions and Correlations

When dealing with multiple random variables, one of the key questions involves understanding how they relate to each other. The joint distribution of two or more random variables describes the probability distribution of these variables taken together. For example, if $$X$$ and $$Y$$ are two random variables, the joint distribution $$P(X = x, Y = y)$$ tells you the probability that $$X$$ takes a value $$x$$ and $$Y$$ takes a value $$y$$ simultaneously.
A full joint distribution is a probability distribution that captures the likelihood of every possible combination of values for a set of random variables. It represents the most comprehensive description of the relationships between these variables by specifying the probability for each possible scenario.
If you have multiple random variables, say $$X_1, X_2, \ldots, X_n$$, the full joint distribution is written as:
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$$
Hidden variables (also known as latent variables) are variables in a probabilistic model that are not directly observed or measured, but they influence the observed variables. These variables are "hidden" because their values are not known, yet they affect the relationships between the observed variables.
In many practical situations, some aspects of a system might not be directly observable, but their presence is inferred through their effects on other variables. Hidden variables are used in models to capture these effects and explain the dependencies between observed variables.
For instance, consider a model with two observed variables, $$X$$ and $$Y$$, and a hidden variable $$Z$$: the distribution over the observed variables is obtained by summing the full joint over the unobserved values of $$Z$$:
$$P(X, Y) = \sum_z P(X, Y, Z = z)$$
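To make marginalizing out a hidden variable concrete, here is a sketch over binary variables, with a hypothetical full joint table (the numbers are made up):

```python
# Hypothetical full joint P(X, Y, Z) over binary variables, as a dict
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.10,
    (0, 1, 0): 0.05, (0, 1, 1): 0.15,
    (1, 0, 0): 0.10, (1, 0, 1): 0.10,
    (1, 1, 0): 0.15, (1, 1, 1): 0.15,
}

# Marginalize out the hidden variable Z: P(X, Y) = sum_z P(X, Y, Z = z)
observed = {}
for (x, y, z), p in joint.items():
    observed[(x, y)] = observed.get((x, y), 0.0) + p

print(observed[(0, 0)])  # 0.20 + 0.10 = 0.30
```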

Conditional Expectation

Conditional expectation involves finding the expected value of a random variable given that another random variable has taken on a specific value. This is particularly useful in scenarios where the occurrence of one event influences the expected outcome of another.
  • Complex Query Example:
    • Given that $$Y = y$$, what is the expected value of $$X$$?
    • If you know the value of one random variable, how does that information change your expectation of another random variable?
The conditional expectation is formally written as $$E[X \mid Y = y]$$, which reads as "the expected value of $$X$$ given that $$Y$$ equals $$y$$." For a discrete random variable, it is computed as:
$$E[X \mid Y = y] = \sum_x x \, P(X = x \mid Y = y)$$
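A minimal sketch of conditional expectation over a joint PMF table (the joint probabilities here are hypothetical):

```python
# Hypothetical joint PMF P(X = x, Y = y)
joint = {(1, 0): 0.1, (2, 0): 0.3, (1, 1): 0.4, (2, 1): 0.2}

def cond_expectation(joint, y):
    """E[X | Y = y] = sum over x of x * P(X = x | Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)  # marginal P(Y = y)
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)

print(cond_expectation(joint, 0))  # (1*0.1 + 2*0.3) / 0.4 = 1.75
```

Conditioning on $$Y = 1$$ instead gives a different expectation, which is the whole point: knowing $$Y$$ changes what you expect of $$X$$.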

Convolution of Random Variables

When dealing with the sum of two independent random variables, the resulting distribution is obtained through a process called convolution. Convolution is a mathematical operation that combines two distributions to produce a third, representing the sum of the random variables.
  • Complex Query Example:
    • What is the probability distribution of the sum $$Z = X + Y$$ if $$X$$ and $$Y$$ are independent random variables?
    • How does the convolution of their probability density functions (PDFs) affect the resulting distribution?
Mathematically, the convolution for discrete random variables is given by:
$$P(Z = z) = \sum_x P(X = x) \, P(Y = z - x)$$
For continuous random variables, the convolution is expressed as:
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) \, f_Y(z - x) \, dx$$
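The discrete case is a double loop over the two supports. A minimal sketch, using the sum of two fair dice as the example:

```python
def convolve_pmf(px, py):
    """PMF of Z = X + Y for independent X, Y, each given as a {value: prob} dict."""
    pz = {}
    for x, p in px.items():
        for y, q in py.items():
            # Independence lets us multiply the marginal probabilities
            pz[x + y] = pz.get(x + y, 0.0) + p * q
    return pz

# Sum of two fair six-sided dice
die = {k: 1 / 6 for k in range(1, 7)}
total = convolve_pmf(die, die)
print(total[7])  # 6/36: seven is the most likely sum
```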

Expectation of Products

In some cases, you might be interested in the expectation of the product of two or more random variables. This can be more complex when the random variables are not independent.
  • Complex Query Example:
    • What is $$E[XY]$$ for two random variables $$X$$ and $$Y$$?
    • How does the covariance between and influence the expectation of their product?
In the case of independent random variables, $$E[XY] = E[X] \, E[Y]$$. However, if the variables are dependent, the calculation becomes more involved and requires knowledge of their covariance: $$E[XY] = E[X] \, E[Y] + \mathrm{Cov}(X, Y)$$.
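A quick Monte Carlo sanity check of the independent case (a sketch, assuming $$X$$ and $$Y$$ are independent Uniform(0, 1) draws):

```python
import random

random.seed(0)

# For independent X and Y, E[XY] should be close to E[X] * E[Y]
n = 100_000
xs = [random.random() for _ in range(n)]   # X ~ Uniform(0, 1)
ys = [random.random() for _ in range(n)]   # Y ~ Uniform(0, 1), drawn independently

e_xy = sum(x * y for x, y in zip(xs, ys)) / n
e_x = sum(xs) / n
e_y = sum(ys) / n

print(abs(e_xy - e_x * e_y))  # small: the sample covariance is near zero
```

Replacing `ys` with, say, `[x + random.random() for x in xs]` makes the variables dependent, and the gap `e_xy - e_x * e_y` converges to the covariance instead of zero.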

Variable Independence and the Product Rule

Variable Independence

Two random variables $$X$$ and $$Y$$ are independent if the occurrence of one does not affect the occurrence of the other. This is mathematically expressed as:
$$P(X = x, Y = y) = P(X = x) \, P(Y = y)$$
This equation indicates that the joint probability of $$X$$ and $$Y$$ equals the product of their individual probabilities.

Product Rule

The product rule relates joint probability to conditional probability. It states:
$$P(X, Y) = P(X \mid Y) \, P(Y) = P(Y \mid X) \, P(X)$$
When $$X$$ and $$Y$$ are independent, the product rule simplifies to:
$$P(X, Y) = P(X) \, P(Y)$$

Bayes' Rule

Bayes' Rule is derived from the product rule and allows you to update the probability of a hypothesis $$H$$ based on new evidence $$E$$. It is given by:
$$P(H \mid E) = \frac{P(E \mid H) \, P(H)}{P(E)}$$
Bayes' Rule is a powerful tool in probabilistic reasoning, especially in scenarios where you need to update probabilities as new information becomes available.
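A classic NLP-flavored illustration is a naive spam filter. The numbers below are hypothetical, chosen only to make the arithmetic easy to follow:

```python
def bayes(prior, likelihood, evidence):
    """P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Hypothetical numbers:
# P(spam) = 0.2, P("free" in email | spam) = 0.6, P("free" | not spam) = 0.05
p_spam = 0.2
p_word_given_spam = 0.6
# P("free") by the law of total probability over spam / not spam
p_word = p_word_given_spam * p_spam + 0.05 * (1 - p_spam)

posterior = bayes(p_spam, p_word_given_spam, p_word)
print(round(posterior, 3))  # 0.12 / 0.16 = 0.75
```

Seeing the word raises the spam probability from the prior 0.2 to a posterior of 0.75, which is exactly the kind of update Bayes' Rule formalizes.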
 

© Qiwei Mao 2024