One potential problem is splitting a string into words, though it is a minor one: you have to decide what counts as a "word", i.e. where the boundaries between words fall.
Otherwise it looks like a pretty standard Bayesian analysis, which I believe pg has already done.
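
To make the two pieces concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not necessarily what pg did: the token rule (runs of letters, digits, apostrophes, and dollar signs), the function names `tokenize`, `train`, and `classify`, and the add-one smoothing are all made up for the example.

```python
import math
import re
from collections import Counter

# Assumed definition of a "word": a run of letters, digits, apostrophes,
# or dollar signs. Everything else is treated as a separator.
TOKEN_RE = re.compile(r"[A-Za-z0-9'$]+")

def tokenize(text):
    """Split text into lowercase tokens using the assumed rule above."""
    return [t.lower() for t in TOKEN_RE.findall(text)]

def train(docs):
    """docs: iterable of (text, label). Returns (word counts per label, doc counts)."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    tokens = tokenize(text)
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior for the label.
        score = math.log(doc_counts[label] / total_docs)
        total = sum(counts.values())
        for tok in tokens:
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    # Toy training data, invented for the example.
    data = [
        ("cheap pills buy now", "spam"),
        ("buy cheap watches", "spam"),
        ("meeting notes for monday", "ham"),
        ("lunch on monday?", "ham"),
    ]
    wc, dc = train(data)
    print(tokenize("Don't split-this, OK?"))
    print(classify("cheap pills", wc, dc))
```

Changing `TOKEN_RE` is where the "what is a word" question shows up: whether hyphens, apostrophes, or punctuation belong inside a token changes the counts the Bayesian part sees, while the scoring code stays the same.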