Evan Martin (evan) wrote in evan_tech,

removing duplicates using formail

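An offlineimap hiccup from a long time ago put two copies of every message in my inbox. formail has a duplicate filter (-D): it records every Message-ID in a cache file of roughly the size you give it, and when it sees a Message-ID it has already recorded, it returns success. If you add the -s (split mail) flag, formail splits the mbox into individual messages, pipes each one to the given command (here just cat), and outputs each message only once.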
formail -D 10000000 cache -s cat < mbox > mbox2
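The same idea (remember each Message-ID, drop any message whose ID was already seen) can be sketched with Python's standard mailbox module. This is a minimal illustration under stated assumptions, not what formail does internally: the function name dedupe_mbox is hypothetical, the seen IDs live in an unbounded in-memory set rather than formail's size-limited cache file, and messages with no Message-ID header are always kept.

```python
import mailbox


def dedupe_mbox(src_path: str, dst_path: str) -> tuple[int, int]:
    """Copy src_path to dst_path, keeping only the first message for each
    Message-ID. Returns (kept, dropped). Hypothetical helper for illustration."""
    seen = set()  # in-memory stand-in for formail's on-disk id cache (assumption)
    kept = dropped = 0
    src = mailbox.mbox(src_path, create=False)  # fail if the input is missing
    dst = mailbox.mbox(dst_path)                # created if it does not exist
    dst.lock()
    try:
        for msg in src:
            msg_id = (msg.get("Message-ID") or "").strip()
            if msg_id and msg_id in seen:
                dropped += 1                    # duplicate: skip it
                continue
            if msg_id:
                seen.add(msg_id)
            # Messages without a Message-ID are always kept (assumption).
            dst.add(msg)
            kept += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()
    return kept, dropped
```

Calling `dedupe_mbox("mbox", "mbox2")` plays the role of the formail line above. An in-memory set is fine for a single mailbox; formail's fixed-size cache file matters more when it is filtering an unbounded stream of mail.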

Then, to keep offlineimap from copying all of your local duplicates back up to the server, rm your local copy of this mailbox and every mention of it in ~/.offlineimap and its subdirectories. The next run will then do a full download of that mailbox.

(No work today, so it's mail-cleaning time: adding more data to the SpamAssassin Bayes learner, removing duplicates, and next I'll try that bounce-filtering scheme y'all suggested.)
