From: "C. Hillman"
Newsgroups: sci.math
Subject: Re: Information Theory fundamental interpretations.
Date: Wed, 27 May 1998 20:36:04 -0700

On 28 May 1998, Mike wrote:

> I am interested in understanding the fundamental interpretation about the
> variables involved with information theory. I have seen how they define the
> amount of information content in terms of (bits) as a function of
> probabilities. They define a bit of information as any situation in which
> there is an equal chance of 2 possibilities that can result. The equation
> is defined on the bottom of the following Webpage:
>
> http://jcbmac.chem.brown.edu/baird/Chem22I/lectures/SecondLaw/ch1925.html
>
> My question is: Given a number of facts (bits of information), what is the
> certainty of any conclusion. In other words can the above reference
> equation be solved for the probabilities?

You don't mention your background in math, but I would guess that you
would find the original 1948 paper by Claude Shannon which founded
information theory (at least parts I & II) quite accessible. You can
find that paper in Shannon & Weaver, The Mathematical Theory of
Communication, University of Illinois Press, 1949 (still in print,
despite the date; skip the introduction by Weaver, though). Reading
this paper would give you a good appreciation for Shannon's
interpretation of the expression

  H(p) = -sum_{j=1}^n p(x_j) log p(x_j)

I'm not quite sure what you really have in mind by asking "given a
certain number of facts, what is the certainty of any conclusion?"
Perhaps you are guessing (correctly) that in a specific technical
sense, the information content H(C) of a conclusion drawn from a data
set of information content H(D) satisfies H(C) <= H(D). There are a
number of ways to make this idea precise.

I suspect, though, that you may be more interested in decision theory
and its relation to information. See the text by Blahut.
A more elementary (but also more comprehensive) text (not covering
decision theory but covering many related ideas) is Elements of
Information Theory, by Cover and Thomas, Wiley, 1991.

> Another question is: What is the maximum amount of information that can be
> obtained? Does the above equation have a maximum, or can be infinite for
> perhaps an infinite number of event with a particular set of probabilities?

I can't use a web browser, so I couldn't check what "entropy" you are
thinking of; I was only guessing that you mean H(p) as defined above.
If so, then for n possible outcomes H(p) does indeed have a maximum
value of log n, attained iff all the p(x_j) = 1/n. Shannon 1948 has a
very good discussion of this.

> Also, is it fair to plug the probability for only one even into the above
> formula and get the information content of that event?

Depends on what you mean by "information content". You can certainly
think of H(p) in terms of "information functions" which do assign
values to single events, but in Shannon's interpretation, the
information gained when you perform an experiment with only one
outcome is zero.

Chris Hillman

TO REACH ME BY EMAIL: the address optimist@u.washington.edu is only
for spammers; human correspondents can reach me at the address you can
find by visiting my home page:

http://www.math.washington.edu/~hillman/personal.html

(If you already know my email address: I haven't moved, this is just a
ruse to foil the spammers!)
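
P.S. The statements above about H(p) can be checked numerically. What
follows is only a minimal sketch in Python under some assumptions of
my own: the names entropy and pushforward, the example distributions,
and the use of base-2 logarithms (so values come out in bits) are
chosen for illustration and are not taken from Shannon's paper. The
"conclusion" C is modeled in just one of the several possible ways, as
a deterministic function of the data D.

```python
import math
from collections import defaultdict


def entropy(p):
    """Shannon entropy H(p) = -sum_j p_j log2 p_j, in bits.

    Terms with p_j = 0 are skipped (the usual 0 log 0 = 0 convention).
    """
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    return sum(-x * math.log2(x) for x in p if x > 0)


def pushforward(p, f):
    """Distribution of C = f(D), where D = j has probability p[j].

    Hypothetical helper: it models a conclusion as a deterministic
    function of the data, which is an assumption of this sketch.
    """
    q = defaultdict(float)
    for j, x in enumerate(p):
        q[f(j)] += x
    return list(q.values())


uniform = [0.25] * 4            # n = 4 equally likely outcomes
skewed = [0.7, 0.1, 0.1, 0.1]   # same n, unequal probabilities
certain = [1.0]                 # experiment with only one outcome

h_uniform = entropy(uniform)    # log2(4) = 2 bits, the maximum for n = 4
h_skewed = entropy(skewed)      # strictly less than 2 bits
h_certain = entropy(certain)    # 0 bits: nothing is learned

# Conclusion C = parity of the outcome index, drawn from skewed data D.
h_conclusion = entropy(pushforward(skewed, lambda j: j % 2))

print(h_uniform, h_skewed, h_certain, h_conclusion)
```

With these inputs the uniform case reaches the bound log n, the
one-outcome experiment gives zero, and the entropy of the conclusion
does not exceed that of the data it was computed from.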