
On Tuesday, September 10, 2002, at 09:29  am, Reuben Thomas wrote:

<OT>
To think MS chose XML as the native format for Office 11... Unless they always compress it, I expect the files to bloat excessively. Hard disks are cheap these days, but I still have a 6GB HD on my personal computer, and applications
already eat a lot of it...
</OT> -- See, even mailing list/newsgroups notations are infected :-)

I'd far rather they did that than the binary brain dump they usually choose. Binary files are dreadful for forwards, backwards, and external compatibility.

That's really just a question of how you encode XML. If you store it as
ASCII, yes it will bloat (but Gnumeric does this, and it compresses well:
a spreadsheet I have is ~120Kb of bloated XML, but only 2.5Kb when
gzipped).

Compressed XML is an efficient storage format for general data. The repeated tags are exactly what zip-like compressors exploit. I'd imagine that compressed XML could be read and decompressed a lot faster than the same XML could be read uncompressed, since far fewer bytes have to come off the disk.
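
As a rough illustration of the point about repeated tags, here is a minimal Python sketch. It builds a toy spreadsheet-like XML document, gzips it, and then parses it straight from the compressed stream. The element names (sheet, row, cell) and the helper functions are made up for the example and are not Gnumeric's actual schema, so the ratio it prints will not match the figures quoted above.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def build_sheet_xml(rows: int, cols: int) -> bytes:
    """Build a toy spreadsheet document (hypothetical schema, not Gnumeric's)."""
    root = ET.Element("sheet")
    for r in range(rows):
        row = ET.SubElement(root, "row", index=str(r))
        for c in range(cols):
            cell = ET.SubElement(row, "cell", col=str(c))
            cell.text = str(r * c)
    return ET.tostring(root, encoding="utf-8")


def count_cells_from_gzip(packed: bytes) -> int:
    """Parse XML incrementally from a gzip stream and count <cell> elements."""
    with gzip.GzipFile(fileobj=io.BytesIO(packed)) as stream:
        # iterparse yields an "end" event for each element as it is closed,
        # so the whole document is never decompressed into memory at once.
        return sum(1 for _, elem in ET.iterparse(stream) if elem.tag == "cell")


if __name__ == "__main__":
    raw = build_sheet_xml(rows=200, cols=10)
    packed = gzip.compress(raw)
    print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")
    print(f"cells parsed from compressed stream: {count_cells_from_gzip(packed)}")
```

Whether loading the compressed file is actually faster depends on the disk and the CPU, so this only shows the size difference and that parsing from the compressed stream is straightforward.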

I recall hearing about research into processing directly on entropy-compressed (zip-like) files. With that approach, it may be possible to parse and search a compressed file directly, giving both tight storage and fast processing.

Cheers,
	Benjohn