It was thus said that the Great Alex Marinko once stated:
> Hi,
> 
> a bizarre idea came to my mind: instead of relying on an email client, I
> would like to construct my own email system by assembling together
> existing code (LuaSocket, LuaPOP3, etc). It would not be a full fledged
> email client (and probably would have only a very limited, text-based UI,
> at best). Something along the lines of Mail Handler. I would need it for
> ordinary email operations, nothing complex. And IMAP4 is not needed at
> all.
> 
> The idea would be to reproduce, in Lua, something similar to Mail Handler,
> but probably much simpler. As far as I know, no such system has been
> constructed in Lua yet (please correct me if I am wrong), but from what I
> understand it would not be too difficult to put it together starting with
> Lua programs which provide SMTP and POP3 functionalities. Once cobbled
> together, I would be running an email system similar to Mail Handler and
> nmh.
> 
> Could you please give me some suggestions as to the best way to implement
> such a system (possibly, by storing each email as a separate file on the
> disk)? Would it be very difficult to put together, starting from the
> components already available? Is anyone interested in such a system?
> Anyone willing to cooperate? Any ideas will be appreciated.

  The hardest part, by far, will be actually parsing the email messages,
especially headers [1] that contain email addresses (like From:, To:, Cc:,
etc.).  I've been playing around with this for the past few years (not even
part time---basically, when I have *nothing* else to do, which is why it's
taken a few years) and I *think* I finally have a handle on it.  For
instance:

From: Sean Conner <sean@conman.org>, (Sean Conner) <sean@conman.org>,
	sean@conman.org (Sean Conner), sean@conman.org, "Sean (I Am A
	Programmer)" Conner <sean(that's me)@conman(my domain) . com>

  All of those are valid email addresses.  Okay, a few are pathological
but are quite possible (theoretically; probably never in practice).  Second
hardest (especially for older email) is Date:---you would not believe the
crap I've seen in that header.  I've even come across problematic Message-ID
headers (and about 20% of email I have doesn't have a Message-ID header).
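  To get a feel for these two headers before writing a parser, here is a
minimal sketch in Python, picked only because its standard library already
ships an address and date parser.  This is not the C code mentioned in [4];
split_addresses and parse_date are hypothetical helper names, and I make no
claim about how the library copes with the more pathological forms above.

```python
from email.utils import getaddresses, parsedate_to_datetime


def split_addresses(header_value):
    """Return a list of (display name, address) pairs from one header value."""
    return getaddresses([header_value])


def parse_date(header_value):
    """Return a datetime for a well-formed Date: header, or None otherwise."""
    try:
        return parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        # Unparseable input raises one of these, depending on Python version.
        return None


if __name__ == "__main__":
    value = ("Sean Conner <sean@conman.org>, "
             "sean@conman.org (Sean Conner), sean@conman.org")
    for name, address in split_addresses(value):
        print(repr(name), address)
    print(parse_date("Tue, 04 May 1999 10:00:00 -0400"))
    print(parse_date("sometime last week"))
```

  Returning None for a bad Date: instead of raising lets the caller decide
what to do with old mail (fall back to a Received: header, file time, etc.).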

  The third hardest issue with email is the use of character sets.  Most
messages declare the character set in use, but that still leaves plenty where
you have to guess at the set being used (not to mention the ones that don't
bother sticking to US-ASCII in the headers---technically not allowed but hey, I
have plenty of emails that break that particular part of the standard).
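  One way to handle the guessing, sketched in Python under some loud
assumptions: try the declared character set first, then UTF-8, then Latin-1
(which accepts any byte, so it always "succeeds", possibly wrongly).  The
fallback order and the names decode_bytes and decode_header_value are my own
invention for illustration, not anything a standard mandates.

```python
from email.header import decode_header

# Assumed guess order; Latin-1 maps every byte, so it is the last resort.
FALLBACKS = ("utf-8", "latin-1")


def decode_bytes(raw, declared=None):
    """Decode bytes using the declared charset, then the fallback guesses."""
    candidates = ([declared] if declared else []) + list(FALLBACKS)
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name, or bytes that don't fit it
    # Only reached if FALLBACKS is changed to omit latin-1.
    return raw.decode("ascii", errors="replace")


def decode_header_value(value):
    """Decode RFC 2047 encoded words in a header value into one string."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = decode_bytes(chunk, charset)
        parts.append(chunk)
    return "".join(parts)
```

  A wrong guess still gives you text you can display and search, which for an
archive is usually better than refusing the message outright.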

  Now, with that out of the way, a decent method of storing emails is one
per file, and there's even a semi-standard for that [3].  My preference is
to take the Message-ID (if it doesn't exist, generate one), take a hash
(SHA1, MD5, pick your favorite) and use that result as the basis for the
directory/filename.  I also store two versions of the headers and the body
as separate files.  For example:

	Message-ID: <d1b.3e6bbea3.37310e87@aol.com>

  This (I include the brackets since it's part of the message id) hashes to
(I use MD5 since it was handy):

	fff6c8c5b7ae790d732d6cf50b8a5ff6

  I then break the hash up into three components:

	fff6 c8c5 b7ae790d732d6cf50b8a5ff6

  The first two components become directories (I've found that too many files
in a single directory causes performance issues) and the third the basis for
the filename.  The base filename becomes the third portion of the hash plus
the message ID (sans the brackets):

	b7ae790d732d6cf50b8a5ff6,d1b.3e6bbea3.37310e87@aol.com

  I do this in case two email message IDs hash to the same value.  With that,
I create three files per email: one for the body and two for headers.  The
first headers file contains only the From:, To:, Date: and Subject: headers,
which, for me, are typically the only ones I'm interested in (say, for
display purposes).  The other headers file contains the full set of
headers.  So, this method creates:

	fff6/c8c5/b7ae790d732d6cf50b8a5ff6,d1b.3e6bbea3.37310e87@aol.com,B	
	fff6/c8c5/b7ae790d732d6cf50b8a5ff6,d1b.3e6bbea3.37310e87@aol.com,HF
	fff6/c8c5/b7ae790d732d6cf50b8a5ff6,d1b.3e6bbea3.37310e87@aol.com,HS

	,B = body of email message
	,HF = full headers
	,HS = From:, To:, Subject:, Date: headers only
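  The layout above can be sketched roughly as follows, again in Python.  This
is an illustration, not my actual code: base_path and store are hypothetical
names, the sketch assumes LF line endings with the headers ending at the first
blank line, and it makes no attempt to escape a Message-ID that contains a
path separator.

```python
import hashlib
import os
from email import message_from_bytes
from email.utils import make_msgid

SUMMARY_HEADERS = ("From", "To", "Date", "Subject")


def base_path(message_id):
    """Map '<id>' to 'xxxx/yyyy/<last 24 hex digits>,<id without brackets>'."""
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()
    name = digest[8:] + "," + message_id.strip("<>")
    return os.path.join(digest[:4], digest[4:8], name)


def store(root, raw):
    """Write the ,B ,HF and ,HS files for one raw message; return the base."""
    msg = message_from_bytes(raw)
    # Generate a Message-ID when the message lacks one.
    message_id = str(msg["Message-ID"] or make_msgid()).strip()
    base = os.path.join(root, base_path(message_id))
    os.makedirs(os.path.dirname(base), exist_ok=True)

    # Assumption: LF line endings, headers end at the first blank line.
    headers, _, body = raw.partition(b"\n\n")
    summary = "".join(
        f"{name}: {value}\n"
        for name in SUMMARY_HEADERS
        for value in msg.get_all(name, [])
    )
    with open(base + ",B", "wb") as f:
        f.write(body)
    with open(base + ",HF", "wb") as f:
        f.write(headers + b"\n")
    with open(base + ",HS", "w", encoding="utf-8", errors="replace") as f:
        f.write(summary)
    return base
```

  Note that a generated Message-ID only ends up in the filename here; whether
to also write it into the full headers file is a separate decision.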

  For "folders" of email, I use a text file that contains message IDs of
emails for that "folder".  The upside---an email message can be in multiple
"folders" while maintaining a single copy of the email.  The downside---I
need to track the "folders" an email is in (probably with the use of another
header, but I haven't gotten that far yet).  
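  A "folder" file of this kind needs very little code.  The sketch below
(Python, hypothetical function names, one message ID per line as an assumed
file format) also shows the brute-force answer to the tracking problem:
scanning every folder file for a given message ID.

```python
import os


def list_folder(folder_file):
    """Return the message IDs in a folder file, in order; empty if missing."""
    if not os.path.exists(folder_file):
        return []
    with open(folder_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def add_to_folder(folder_file, message_id):
    """Append message_id to the folder file unless it is already listed."""
    if message_id in list_folder(folder_file):
        return False
    with open(folder_file, "a", encoding="utf-8") as f:
        f.write(message_id + "\n")
    return True


def folders_containing(folder_files, message_id):
    """Scan the given folder files and return those that list message_id."""
    return [path for path in folder_files if message_id in list_folder(path)]
```

  Scanning is fine for a handful of folders; with many folders, recording the
membership alongside the message (the extra header idea) avoids the scan.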

  It works for me (and I have a ton of personal email and USENET messages
dating back to the early 90s).  

  -spc (And quite a bit of this work has been done with Lua code I wrote [4])
  
[1]	RFCs for email headers:

	RFC-822		earliest currently used standard [2]
	RFC-1036	USENET headers---may be of some use to email
	RFC-2045	MIME headers
	RFC-2046
	RFC-2047
	RFC-2048
	RFC-2049
	RFC-2369	Mailing list headers
	RFC-2822	update of RFC-822
	RFC-2919	List-ID
	RFC-5064	Archived-At header
	RFC-5322	update of RFC-2822

[2]	The RFCs leading up to RFC-822:

	RFC-0561	
	RFC-0680
	RFC-0724
	RFC-0733

[3]	Maildir format.

[4]	Except for the header parsing---for that I use C code, and I'm still
	working on that.