Juris Kalnins wrote:
[snip]
Does anyone have a set of data points for squish comparing the
compressed sizes of sources with:
(1) a keyword token replacement filter, versus
(2) no keyword token replacement?
Now, in (2), LZ coding would zap most keywords into sliding-dictionary
match codes anyway, whereas in (1) the filtered sources start out smaller,
but the added token symbols increase the variation in the symbol
frequencies of the source and leave fewer opportunities for
sliding-dictionary matches.
So, would there be a big difference when we compare the compressed sizes?
Say we tabulate results as (a quick sketch that produces this table
follows the list):
(1) original
(2) token filtered
(3) original, compressed
(4) token filtered, compressed
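
Something like the following Python sketch would produce that table. It is
only an illustration, not squish itself: zlib stands in for the LZ coder,
and the keyword list, token byte values, and filename are hypothetical
placeholders for whatever the real filter uses.

import zlib

# Hypothetical keyword table: each keyword maps to a single high-bit byte.
# Longest keywords first, so a shorter keyword never mangles a longer one.
KEYWORDS = sorted([b"function", b"procedure", b"begin", b"end", b"return",
                   b"while", b"repeat", b"until", b"integer", b"string"],
                  key=len, reverse=True)
TOKENS = {kw: bytes([0x80 + i]) for i, kw in enumerate(KEYWORDS)}

def token_filter(source: bytes) -> bytes:
    # Naive byte-level replacement; a real filter would tokenize lexically
    # so keywords inside identifiers (e.g. "end" in "endpoint") survive.
    for kw, tok in TOKENS.items():
        source = source.replace(kw, tok)
    return source

def tabulate(path: str) -> None:
    original = open(path, "rb").read()
    filtered = token_filter(original)
    for label, size in [
        ("(1) original",                   len(original)),
        ("(2) token filtered",             len(filtered)),
        ("(3) original, compressed",       len(zlib.compress(original, 9))),
        ("(4) token filtered, compressed", len(zlib.compress(filtered, 9))),
    ]:
        print(f"{label:32s} {size:8d} bytes")

tabulate("example.pas")  # substitute any source file to hand

Running this over a handful of typical sources should show whether the
filtered input's smaller starting size makes up for its lost
sliding-dictionary matches.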