    Perplexity is inversely related to the average per-word log-likelihood (here denoted log_lik and taken in base 2):

    Perplexity = 2^(-log_lik)
    

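    As a minimal sketch (the function name perplexity and its inputs are illustrative choices, not from the original), the same definition can be computed in Python from a list of per-word base-2 log-probabilities:

    import math

    def perplexity(log2_probs):
        """Perplexity of a text from its per-word base-2 log-probabilities."""
        # Average per-word log-likelihood: log_lik = (1/n) * sum(log2 p(w_i))
        avg_log_lik = sum(log2_probs) / len(log2_probs)
        # Perplexity = 2^(-log_lik)
        return 2.0 ** (-avg_log_lik)

    # Example: three words with probabilities 1/2, 1/4, 1/8
    print(perplexity([math.log2(0.5), math.log2(0.25), math.log2(0.125)]))  # 4.0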
    For example, consider a unigram model in which every word w_i is equally likely, i.e. p(w_i) = 1/V, where V is the dictionary (vocabulary) size. The perplexity is then

    log_lik = (1/n) \sum_{i=1}^{n} \log_2 p(w_i) = (1/n) \sum_{i=1}^{n} \log_2 (1/V) = -\log_2 V
    Perplexity = 2^[-(-\log_2 V)] = 2^(\log_2 V) = V
    
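    The derivation can be checked numerically; in the sketch below the vocabulary size V and the text length n are arbitrary illustrative values:

    import math

    V = 10000   # vocabulary size (illustrative value)
    n = 50      # number of words in a sample text (illustrative value)

    # Under the uniform unigram model, p(w_i) = 1/V for every word.
    log2_probs = [math.log2(1.0 / V)] * n

    # log_lik = (1/n) * sum(log2 p(w_i)) = log2(1/V) = -log2(V)
    log_lik = sum(log2_probs) / n

    # Perplexity = 2^(-log_lik) = 2^(log2 V) = V
    print(2.0 ** (-log_lik))  # ~10000.0, i.e. V (up to floating-point rounding)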

    Perplexity can also be interpreted as a branching factor: under a uniform distribution over the words there are V equally likely possibilities (branches) for the next word.