BWT for NLP (2)
I show how the Burrows-Wheeler Transform can be used to compute the similarity between two strings. We submitted results from this method (along with results from the Context-Chain metric developed by my colleagues Frank Schilder and Ravi Kondadadi) for the Automatically Evaluating the Summaries of Peers (AESOP) task of the TAC 2009 conference.
The task was to produce an automatic metric to evaluate machine generated summaries (i.e., system summaries) against human generated summaries for the TAC ’09 Update Summarization Task. Clearly the automatic metric is just some function that produces a similarity score between the system summary and the human generated (the so-called model) summary.
The proposed metrics were evaluated by comparing their rankings of the system summaries from different peers to that of the ranking produced by human judges.
We use an estimate of the conditional “compressibility” of the model summary given the system summary as the similarity metric. The conditional compressibility is defined as the increase in the compressibility of the model summary when the system summary has been observed.
We define the compressibility of any string
by
and the conditional compressibility of string over an alphabet
given another string
over the same alphabet as
where is the concatenation of the strings
and
,
is the entropy of string
, and
is the length of the string
.
.
We use as the similarity metric to measure the similarity of a system summary
to the model summary
.
Our metric is similar to the one proposed by Li and Vitanyi and is theoretically well-justified from the perspective of algorithmic information theory. One peculiarity is that our similarity is asymmetric.
The only thing that is needed to implement the above similarity metric is an estimate of the entropy for a string
. We use the BWT for this estimate.
BWT-based String Entropy Estimate
We use the Move-To-Front (MTF) entropy of the Burrows-Wheeler transform of a given string as an estimate for its entropy $H(S)$.
In this paper the MTF coding is used to define the MTF entropy (which the authors also call local entropy) of a string as
where is the
symbol of the MTF coding of the string
.
Now we define , the entropy of string
as
where is the BWT of string
.
Since the Burrows-Wheeler transform involves just the construction of a suffix array, the computation of our compression based evaluation metric is linear in time and space in the length of the model and system summary strings.
Some Technical Details
Results

The results on the TAC-AESOP task (above) show that the BWT based metric (FraCC in the table) is reasonable for summarization evaluation, especially because there are not very many knobs to tune. I obtained these results from Frank (who will present them at TAC next week). The “best metric” is the AESOP submission that seemed to have high scores across several measures.
There is a tool, “complearn”, to compute this kind of distances and then do clustering: http://www.complearn.org/
This tool airises from the Ph.D. activity of R.Cilibrasi that worked with Vitanyi: http://cilibrar.com/
E.
Em,
Thanks. I was aware of Cilibrasi and Vitanyi’s work. Marcus Hutter is another guy who does similar things. Hopefully one day I’ll get down to reading and understanding the theory behind this stuff better.