NLP - Generalized Bayes’ Rule

date
Aug 20, 2024
type
Post
AI summary
Explores Bayes' theorem, which calculates the posterior probability of a hypothesis given evidence. Discusses how the generalized Bayesian model simplifies calculations under the Naive Bayes assumption of independent effects.
slug
nlp-bayes-rule
status
Published
tags
NLP
summary
Explores Bayes' theorem, which calculates the posterior probability of a hypothesis given evidence. Discusses how the generalized Bayesian model simplifies calculations under the Naive Bayes assumption of independent effects.

Bayes' theorem

Bayes' theorem is mathematically expressed as:
$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$
Where:
  • $P(H \mid E)$ is the posterior probability: the probability of the hypothesis $H$ being true given the evidence $E$.
  • $P(E \mid H)$ is the likelihood: the probability of observing the evidence $E$ given that the hypothesis $H$ is true.
  • $P(H)$ is the prior probability: the initial probability of the hypothesis being true before seeing the evidence.
  • $P(E)$ is the marginal likelihood (or evidence): the total probability of observing the evidence under all possible hypotheses. It can be thought of as a normalizing factor ensuring that the posterior probabilities sum to 1.
Example:
Suppose you want to determine whether a patient has a particular disease based on the results of a diagnostic test.
  • Hypothesis (H): The patient has the disease.
  • Evidence (E): The test result is positive.
Given:
  • $P(H) = 0.01$: Prior probability (1% of the population has the disease).
  • $P(E \mid H) = 0.9$: Likelihood (90% of patients with the disease test positive).
  • $P(E) = 0.1$: Marginal likelihood (10% of the population tests positive overall, considering both sick and healthy people).
Using Bayes' theorem:
$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)} = \frac{0.9 \times 0.01}{0.1} = 0.09$$
So, the posterior probability is 0.09, or 9%. Even with a positive test result, there’s only a 9% chance the patient has the disease because the disease is rare in the population.
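As a quick sanity check, here is the same arithmetic as a minimal Python sketch (the numbers are taken from the example above):
```python
# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
p_h = 0.01         # prior: 1% of the population has the disease
p_e_given_h = 0.9  # likelihood: 90% of sick patients test positive
p_e = 0.1          # marginal: 10% of all tests come back positive

p_h_given_e = p_e_given_h * p_h / p_e
print(p_h_given_e)  # ~0.09 -> a positive test still only means a 9% chance
```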

Multiple Evidence Variables

By expanding the joint distribution into conditional distributions, we can extend Bayes' theorem to include multiple evidence variables $E_1, \dots, E_n$.
Two evidence variables:
$$P(H \mid E_1, E_2) = \frac{P(E_1, E_2 \mid H)\, P(H)}{P(E_1, E_2)} = \frac{P(E_1 \mid E_2, H)\, P(E_2 \mid H)\, P(H)}{P(E_1, E_2)}$$
Three evidence variables:
$$P(H \mid E_1, E_2, E_3) = \frac{P(E_1 \mid E_2, E_3, H)\, P(E_2 \mid E_3, H)\, P(E_3 \mid H)\, P(H)}{P(E_1, E_2, E_3)}$$
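To see that this expansion is just the chain rule, here is a small numerical check, a sketch with an invented joint distribution over one hypothesis and two binary evidence variables:
```python
# An invented joint distribution P(H, E1, E2) over binary variables (sums to 1).
joint = {
    (1, 1, 1): 0.06, (1, 1, 0): 0.02, (1, 0, 1): 0.01, (1, 0, 0): 0.01,
    (0, 1, 1): 0.09, (0, 1, 0): 0.18, (0, 0, 1): 0.18, (0, 0, 0): 0.45,
}
KEYS = ("h", "e1", "e2")

def p(**fixed):
    """Marginal probability of the fixed assignments, summing out the rest."""
    return sum(pr for vals, pr in joint.items()
               if all(vals[KEYS.index(k)] == v for k, v in fixed.items()))

# Direct: P(E1=1, E2=1 | H=1)
direct = p(h=1, e1=1, e2=1) / p(h=1)
# Chain rule: P(E1=1 | E2=1, H=1) * P(E2=1 | H=1)
chained = p(h=1, e1=1, e2=1) / p(h=1, e2=1) * (p(h=1, e2=1) / p(h=1))
print(direct, chained)  # both 0.6 -- the expansion is bookkeeping, not a new model
```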

Generalized Bayesian Model

Naive Bayes Assumption: the effect variables are conditionally independent given the cause, or their causal interactions are so small as to be negligible. Under this assumption, the multiple-evidence-variable Bayes' theorem simplifies to:
$$P(H \mid E_1, \dots, E_n) = \frac{P(H) \prod_{i=1}^{n} P(E_i \mid H)}{P(E_1, \dots, E_n)} \propto P(H) \prod_{i=1}^{n} P(E_i \mid H)$$
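A minimal sketch of what this buys us computationally (the numbers are hypothetical; the normalizer $P(E_1, E_2)$ is recovered by summing the unnormalized scores over all hypotheses):
```python
import math

# Naive Bayes: P(H | E1..En) is proportional to P(H) * product of P(Ei | H)
priors = {"disease": 0.01, "healthy": 0.99}
likelihoods = {                 # hypothetical per-evidence P(Ei | H)
    "disease": [0.9, 0.7],
    "healthy": [0.1, 0.2],
}

scores = {h: priors[h] * math.prod(likelihoods[h]) for h in priors}
z = sum(scores.values())        # the marginal likelihood P(E1, E2)
posterior = {h: s / z for h, s in scores.items()}
print(posterior)                # {'disease': ~0.24, 'healthy': ~0.76}
```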

Why it is Useful — Cause & Effect

A lot of problems have the form:
$$P(\text{cause} \mid \text{effect}) = \frac{P(\text{effect} \mid \text{cause})\, P(\text{cause})}{P(\text{effect})}$$
For example, in medical diagnosis: if I observe anosmia, what is the probability that it is caused by COVID?
We know $P(\text{effect} \mid \text{cause})$ by observation, and $P(\text{cause})$ and $P(\text{effect})$ from population statistics.

In the Context of NLP

What is the sentiment of a movie review, given the words $w_1, \dots, w_n$ in the review? We can apply the generalized Bayesian model to compute this probability:
$$P(\text{sentiment} \mid w_1, \dots, w_n) \propto P(\text{sentiment}) \prod_{i=1}^{n} P(w_i \mid \text{sentiment})$$
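As a concrete sketch (the word probabilities below are invented for illustration; in practice they would be estimated from counts in a labeled corpus, and log-probabilities are used to avoid underflow on long reviews):
```python
import math

# Invented per-class word likelihoods P(w | sentiment).
word_probs = {
    "pos": {"great": 0.05, "boring": 0.001, "plot": 0.01},
    "neg": {"great": 0.005, "boring": 0.03, "plot": 0.01},
}
priors = {"pos": 0.5, "neg": 0.5}

def classify(words):
    """argmax over sentiments of log P(s) + sum_i log P(w_i | s)."""
    scores = {
        s: math.log(priors[s]) + sum(math.log(word_probs[s][w]) for w in words)
        for s in priors
    }
    return max(scores, key=scores.get)

print(classify(["boring", "plot"]))  # -> 'neg'
```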

Bayesian Network

A Bayesian network is a way of visualizing the causal relationships between random variables. github
 
(figure: an example Bayesian network)
A Bayesian Net is an equation:
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$$
Where,
  • Left: the full joint distribution of all random variables.
  • Right: one conditional probability distribution per node, conditioned on its parents (the arcs pointing into it) in the Bayes Net.
Here’s how to derive this equation using the generalized Bayesian theorem: list the variables so that parents come before children, expand the joint with the chain rule (the same expansion used above for multiple evidence variables), then drop every conditioning variable that is not a parent, since each node is conditionally independent of its non-descendants given its parents:
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_{i-1}, \dots, X_1) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$$
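As a small illustration of the factorization, here is a sketch of a hypothetical three-node chain A → B → C with binary variables, where the joint is the product of one conditional distribution per node given its parents:
```python
# Hypothetical Bayes net A -> B -> C over binary variables (numbers invented).
p_a = {1: 0.3, 0: 0.7}           # P(A)
p_b1_given_a = {1: 0.8, 0: 0.1}  # P(B=1 | A=a)
p_c1_given_b = {1: 0.9, 0: 0.2}  # P(C=1 | B=b)

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(A=a) * P(B=b | A=a) * P(C=c | B=b)."""
    pb = p_b1_given_a[a] if b else 1 - p_b1_given_a[a]
    pc = p_c1_given_b[b] if c else 1 - p_c1_given_b[b]
    return p_a[a] * pb * pc

# Sanity check: the eight joint entries sum to 1.
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))
```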

Covid Example

(figure: a Bayesian network for the COVID example)
Construct the full joint distribution:
Another example, with one variable unobserved. We need to expand the conditional probability distribution into a full joint distribution first:
Then use the generalized Bayes model:
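Since the original diagram and equations are not reproduced here, the following sketch assumes a hypothetical structure Covid → Anosmia, Covid → Fever with invented numbers. It expands the conditional distributions into the full joint, sums out the unobserved variable, and then normalizes, which is exactly the generalized Bayes computation:
```python
# Hypothetical structure: Covid -> Anosmia, Covid -> Fever (numbers invented).
# Query: P(Covid=1 | Anosmia=1), with Fever unobserved.
p_covid = 0.01
p_anosmia1 = {1: 0.6, 0: 0.02}  # P(Anosmia=1 | Covid=c)
p_fever1 = {1: 0.7, 0: 0.1}     # P(Fever=1 | Covid=c)

def joint(c, a, f):
    """Full joint from the factorization P(C) * P(A|C) * P(F|C)."""
    pc = p_covid if c else 1 - p_covid
    pa = p_anosmia1[c] if a else 1 - p_anosmia1[c]
    pf = p_fever1[c] if f else 1 - p_fever1[c]
    return pc * pa * pf

# Sum out the unobserved Fever, then normalize over Covid.
unnorm = {c: sum(joint(c, 1, f) for f in (0, 1)) for c in (0, 1)}
print(unnorm[1] / sum(unnorm.values()))  # P(Covid=1 | Anosmia=1) ~= 0.23
```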
 

© Qiwei Mao 2024