- Subject: Re: Domain Specific Languages
- From: Benjohn Barnes <benjohn@...>
- Date: Tue, 10 Sep 2002 10:33:11 +0100
On Tuesday, September 10, 2002, at 09:29 am, Reuben Thomas wrote:
>> <OT>
>> To think MS chose XML as the native format for Office 11... Unless they
>> always compress it, I expect the files to bloat excessively. Hard disk is
>> cheap these days, but I still have a 6GB HD on my personal computer, and
>> applications already eat a lot of it...
>> </OT> -- See, even mailing-list/newsgroup notations are infected :-)
>
> I'd far rather they did that than the binary brain dump they usually
> choose. Binary files are dreadful for forwards, backwards, and external
> compatibility.
>
> That's really just a question of how you encode XML. If you store it as
> ASCII, yes, it will bloat (but Gnumeric does this, and it compresses
> well: a spreadsheet I have is ~120KB of bloated XML, but only 2.5KB when
> gzipped).

Compressed XML is an efficient storage format for general data: the
repeated tags are exactly what zip-like compressors exploit. I'd imagine
that compressed XML could be loaded and decompressed a lot faster than
uncompressed XML could be loaded, because far fewer bytes have to come
off the disk.
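
To make the point about repeated tags concrete, here is a small sketch, assuming Python 3 and only the standard library `gzip` module. It builds a hypothetical spreadsheet-like XML document (the `<Sheet>`, `<Row>` and `<Cell>` names are made up for illustration), gzips it, and reports the size ratio. The exact ratio depends on the data; it is not meant to reproduce the Gnumeric figure quoted above.

```python
import gzip


def make_rows_xml(n_rows: int) -> bytes:
    """Build a hypothetical spreadsheet-like XML document with n_rows rows.

    Every row repeats the same tags and attribute names; only the numbers change.
    """
    rows = "".join(
        f'<Row num="{i}"><Cell col="A">{i}</Cell><Cell col="B">{i * 2}</Cell></Row>\n'
        for i in range(n_rows)
    )
    return f'<?xml version="1.0"?>\n<Sheet>\n{rows}</Sheet>\n'.encode("utf-8")


def compression_ratio(data: bytes) -> float:
    """Return the original size divided by the gzip-compressed size."""
    return len(data) / len(gzip.compress(data))


if __name__ == "__main__":
    xml = make_rows_xml(2000)
    packed = gzip.compress(xml)
    print(f"raw: {len(xml)} bytes, gzipped: {len(packed)} bytes, "
          f"ratio: {compression_ratio(xml):.1f}:1")
    # Round trip: decompression restores the exact original bytes.
    assert gzip.decompress(packed) == xml
```

Deflate, the algorithm behind both gzip and zip, replaces each repeated run of bytes (such as `</Cell><Cell col="`) with a short back-reference to an earlier occurrence, which is why tag-heavy XML shrinks so much.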

I recall hearing about research into processing data directly in
entropy-compressed (zip-like) files. With that approach, it may be
possible to parse and search a compressed file without first
decompressing it, giving both compact storage and fast processing.

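True compressed-domain search is a research topic in its own right and is not shown here. As a much simpler stand-in, the sketch below (assuming Python 3.9+ and only the standard library) decompresses a gzipped XML stream incrementally and parses it as it goes with `xml.etree.ElementTree.iterparse`, so the uncompressed document never has to be written to disk and memory use stays low. The `find_cells` function and the `Cell`/`col` names are hypothetical.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def find_cells(gz_stream, col: str) -> list[str]:
    """Return the text of every <Cell> whose col attribute equals `col`.

    gz_stream is a binary file-like object holding gzip-compressed XML.
    Decompression happens on the fly inside GzipFile; iterparse consumes
    the decompressed bytes as a stream.
    """
    found = []
    with gzip.GzipFile(fileobj=gz_stream) as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag == "Cell" and elem.get("col") == col:
                found.append(elem.text or "")
            # Discard this element's text and children once handled,
            # so processed content does not pile up in memory.
            elem.clear()
    return found


if __name__ == "__main__":
    sample = (b'<Sheet><Row><Cell col="A">1</Cell><Cell col="B">x</Cell></Row>'
              b'<Row><Cell col="A">2</Cell></Row></Sheet>')
    packed = io.BytesIO(gzip.compress(sample))
    print(find_cells(packed, "A"))  # ['1', '2']
```

This still decompresses every byte; it only avoids materialising the whole uncompressed file at once, which is a far weaker property than searching the compressed representation itself.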
Cheers,
Benjohn