[SLL] Looking for a uber-MHonArc

Derek Simkowiak dereks at realloc.net
Tue Jan 20 10:09:19 PST 2009


    It seems like there's demand for a forum+email app that would 
combine forum threading and searching with email clients.

    Jeremy's thread from 2009-01-06 ("joined web forum and mailing 
list(s)") ended with a pointer to an existing deployment at matronics.com:

http://forums.matronics.com/viewforum.php?f=3

    Did anyone ever figure out what phpBB plugin (or add-on) Matronics 
is using for that functionality?

Anand> /[...] (stored in Outlook PST, which is another problem, since I need to convert PST to mbox)/

    I recommend Maildir format over mbox.  Since each message is an 
individual (plain) text file, it's easier to find specific email 
messages using grep/scripts/vi/etc.  I've used Maildir with several 
thousand messages (a few gigabytes' worth) and haven't had a problem.  
The only limitation I know of comes from the ext2/ext3 filesystems, 
which can get slow when a single folder (i.e., one directory) holds 
many thousands of message files, so I recommend ext4 (or similar) for 
Big Systems.
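    As an illustration of the "scripts" part, here is a small sketch 
using Python's standard mailbox module to search one Maildir folder by 
Subject.  The function name and the Subject-only match are my own 
choices for the example, not part of any existing tool.

```python
import mailbox
from typing import Iterator, Tuple

def search_maildir(path: str, needle: str) -> Iterator[Tuple[str, str]]:
    """Yield (key, subject) for each message in one Maildir folder whose
    Subject header contains `needle`, ignoring case."""
    # create=False: fail instead of silently creating an empty Maildir.
    box = mailbox.Maildir(path, factory=None, create=False)
    needle = needle.lower()
    for key, msg in box.iteritems():
        # str() because non-ASCII headers may come back as Header objects.
        subject = str(msg.get("Subject", ""))
        if needle in subject.lower():
            yield key, subject
```

    Usage would be something like 
list(search_maildir("/home/anand/Maildir", "build failure")), where the 
path is hypothetical.  Plain grep over the cur/ and new/ directories 
works just as well for quick one-off searches.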

    mbox works fine too, though.  Either one is infinitely better than 
PST.  :)

Anand> /So when we archive 200+ PSTs/mboxes, we are sure to have hundreds of duplicate mails. I am wondering whether any clean solution exists? Or does it need some custom scripting work?/

   This is not a complete solution, but here's an idea for getting the 
existing archive out of PST files and avoiding duplicate copies:

1. Set up an IMAP server on a Linux box.  (Courier-IMAP works well; note 
that it stores mail in Maildir format.)

2. Have the Outlook machines, with the archival PST files, connect to 
the IMAP server.  You'll need to set up user accounts on the IMAP server 
to do this.

    You can now "drag'n'drop" the emails from the local Outlook copy 
(PST file) onto the IMAP server.  Now you're out of the "vendor lock-in" 
woods -- the messages are out of PST and into your IMAP server's 
preferred format, either Maildir or mbox.

3. Use the excellent program "imapsync" to synchronize all of the users' 
new IMAP folders into a single, centralized IMAP account.

   From the imapsync man page:

We sometimes need to transfer mailboxes from one imap server to another. 
This is called migration.

imapsync is the adequate tool because it reduces the amount of data 
transferred by not transferring a given message if it is already on both 
sides. Same headers, same message size and the transfer is done only 
once. All flags are preserved, unread will stay unread, read will stay 
read, deleted will stay deleted. You can stop the transfer at any time 
and restart it later, imapsync is adapted to a bad connection.
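    To make the "same headers, same message size" idea concrete, here is 
a rough Python sketch of header-based duplicate detection.  It is NOT 
imapsync's actual algorithm, just an illustration under my own 
assumptions: the function names are hypothetical, and it hashes the body 
instead of comparing sizes so that an ignored header of a different 
length doesn't break a match.  The skip_header regex loosely imitates 
the --skipheader option.

```python
import email
import email.policy
import hashlib
import re
from typing import Iterable, List, Optional

def dedup_key(raw: bytes, skip_header: Optional[str] = None) -> str:
    """Fingerprint a raw message from its headers and its body.

    Headers whose names match skip_header (a case-insensitive regex)
    are ignored, loosely imitating imapsync's --skipheader option.
    """
    msg = email.message_from_bytes(raw, policy=email.policy.compat32)
    skip = re.compile(skip_header, re.IGNORECASE) if skip_header else None
    lines = []
    for name, value in msg.items():
        if skip is not None and skip.search(name):
            continue
        # Collapse folding/whitespace so re-wrapped headers compare equal.
        lines.append(f"{name.lower()}: {' '.join(str(value).split())}")
    h = hashlib.sha256()
    # Sort so that header order does not matter.
    for line in sorted(lines):
        h.update(line.encode("utf-8", "surrogateescape") + b"\n")
    # Hash the body (everything after the first blank line) rather than
    # the total size, so a skipped header can't change the result.
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    h.update(parts[1] if len(parts) > 1 else b"")
    return h.hexdigest()

def unique_messages(messages: Iterable[bytes],
                    skip_header: Optional[str] = None) -> List[bytes]:
    """Keep the first copy of each message, dropping later duplicates."""
    seen = set()
    kept = []
    for raw in messages:
        key = dedup_key(raw, skip_header)
        if key not in seen:
            seen.add(key)
            kept.append(raw)
    return kept
```

    In real archives each recipient's copy usually carries different 
Received: lines, so without something like skip_header=r"^(Received|X-)" 
almost nothing would be recognized as a duplicate.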

    Instead of migrating from one IMAP server to another, just "migrate" 
all your PST-dumped folders into a single user account on that same 
Linux IMAP server.  Now you've eliminated duplicates and created a 
central history repository in a single IMAP account.
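    With 200 users, that means one imapsync run per source account.  The 
sketch below only builds the command lines (it doesn't run anything).  
--host1/--user1/--passfile1, --host2/--user2/--passfile2, --skipheader and 
--syncinternaldates are imapsync options, but check them against your 
version's --help.  The host, user names, password-file directory and 
"archive" account are all hypothetical.  Also, as far as I know imapsync 
compares messages folder by folder, so copies filed under different 
folder names may not be detected as dupes.

```python
import shlex
from typing import List, Optional

def imapsync_commands(users: List[str], archive_user: str,
                      host: str = "localhost",
                      passfile_dir: str = "/root/imap-passwords",
                      skip_header_regex: Optional[str] = None) -> List[List[str]]:
    """Build one imapsync argument list per source account.

    Every command copies one user's folders into the same (hypothetical)
    archive account on the same server.  Password files are used instead
    of --password1/--password2 so passwords don't appear in `ps` output.
    """
    commands = []
    for user in users:
        cmd = [
            "imapsync",
            "--host1", host, "--user1", user,
            "--passfile1", f"{passfile_dir}/{user}",
            "--host2", host, "--user2", archive_user,
            "--passfile2", f"{passfile_dir}/{archive_user}",
            "--syncinternaldates",
        ]
        if skip_header_regex:
            cmd += ["--skipheader", skip_header_regex]
        commands.append(cmd)
    return commands

if __name__ == "__main__":
    # Print shell-quoted commands for review before running any of them.
    for cmd in imapsync_commands(["alice", "bob"], "archive",
                                 skip_header_regex="X-.*"):
        print(shlex.join(cmd))
```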

    From here, you can either process the IMAP server's mail files 
(Maildir or mbox) directly, or feed all the messages through MHonArc.  
It's been a long time since I looked at MHonArc, so I can't remember 
what its options are.
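    If the server stores Maildir (as Courier does) and you want a single 
mbox file to hand to MHonArc, Python's mailbox module can do the 
conversion.  This is a minimal sketch with hypothetical paths; it copies 
only one Maildir folder (Courier keeps subfolders as ".Name" directories, 
which mailbox.Maildir.list_folders() can enumerate).

```python
import mailbox

def maildir_to_mbox(maildir_path: str, mbox_path: str) -> int:
    """Append every message in one Maildir folder to an mbox file.

    Returns the number of messages copied.  The mailbox module converts
    each MaildirMessage to mbox format (including the From_ line).
    """
    src = mailbox.Maildir(maildir_path, factory=None, create=False)
    dst = mailbox.mbox(mbox_path, create=True)
    dst.lock()  # guard against another writer touching the mbox
    try:
        count = 0
        for msg in src:
            dst.add(msg)
            count += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
    return count
```

    The resulting file could then be passed to something like 
"mhonarc -outdir archive-html all.mbox"; check the MHonArc documentation 
for the exact options.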

    Note: you should do a couple of test runs with imapsync before 
processing large batches.  You'll probably need the --skipheader 
option to ignore certain mail headers that differ between copies and 
make the dupes not look like dupes.  You may also need the 
--syncinternaldates option.  Give yourself a week (or so) to read up 
on imapsync and to do some initial testing.
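    One way to find those headers during the test runs is to take two 
copies of the same email (e.g. from two users' folders) and list which 
headers differ.  This helper is hypothetical, just a sketch; its output 
gives candidates for --skipheader, not a guaranteed answer.

```python
import email
from email import policy
from typing import List

def differing_headers(raw_a: bytes, raw_b: bytes) -> List[str]:
    """Return header names (lower-cased, sorted) whose values differ
    between two copies of what should be the same message, including
    headers present in only one copy."""
    a = email.message_from_bytes(raw_a, policy=policy.compat32)
    b = email.message_from_bytes(raw_b, policy=policy.compat32)
    names = {n.lower() for n in a.keys()} | {n.lower() for n in b.keys()}
    diff = []
    for name in sorted(names):
        # get_all returns every occurrence (e.g. several Received: lines)
        # and matches header names case-insensitively.
        va = [str(v) for v in (a.get_all(name) or [])]
        vb = [str(v) for v in (b.get_all(name) or [])]
        if va != vb:
            diff.append(name)
    return diff
```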


--Derek

Anand Vaidya wrote:
> Hi All,
>
> I am familiar with MHonArc (and Hypermail) mbox to HTML archivers. However, I 
> have the following problem to solve:
>
> - A company has 200-odd developers (located in different countries), who email 
> one another regarding an ongoing project.
>
> - The team leader wants a searchable (threaded) archive built. The intention 
> is to quickly look up threads of discussion and glean problems / analysis / 
> solutions, as well as to generate summaries quickly.
>
> - We discussed wikis, forums, etc. as alternative documentation solutions, which 
> all devs would have to update online, but found them unworkable in this specific 
> situation and decided to go the MailArchiver route.
>
> My first instinct was to get everyone to CC: a special mailbox and run MHonArc 
> on that. This should work in the future.
>
> However, we will miss all the existing mails (stored in Outlook PST, which is 
> another problem, since I need to convert PST to mbox) of all the developers.
>
> So when we archive 200+ PSTs/mboxes, we are sure to have hundreds of duplicate 
> mails. I am wondering whether any clean solution exists? Or does it need some 
> custom scripting work?
>
> Regards
> Anand
>   



More information about the linux-list mailing list