Evan Martin (evan) wrote in evan_tech,

avva is hardcore

I know I've said it before, but avva is hardcore. From pan-devel:
Judging by recent discussions on this mailing list, developers of Pan have more or less decided to move to a DB backend. I've been working towards a different goal for the last few days, trying to make Pan work for me with very large groups in its current model of storing article headers in memory. This wasn't motivated by any ideological opposition to DB backends in general; I merely wanted to be able to use Pan for all my Usenet needs as soon as possible. Pan is the only GUI newsreader I can use w/o yearning for a return to slrn every minute or so.

I ended up with a patch that allows me to browse a 1-million-headers newsgroup comfortably on my machine, which is more or less what I needed. Basically, I use refcounted strings and normalised subjects. There's a new string type, RString, which stores unique strings only once by using a global hash table and a refcount field to keep track of how many times the string was referenced, allowing it to be freed when the refcount drops to 0. RStrings can be used for many strings inside Article which are now stored as PStrings separately for each article - for example, author's name, author's email address, newsgroup names in xref headers, etc. I may convert all of these to RStrings sometime later to further reduce memory use.

However, the biggest memory hog is the subject. I wrote up a separate Subject type which is a kind of normalised subject - it strips the "Re: " part at the beginning and the part number, if those are present, stores them separately, and then stores the rest as an RString, which means, in particular, that all parts of a multipart article end up referencing the same subject RString.

Additionally, all of article-thread.c needed to be rewritten (its normalisation of subjects when sorting or threading is no longer needed, and in general it became smaller, faster, and much less RAM-hungry), and all places in Pan which reference article subjects needed small adjustment.
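To make the idea concrete, here is a minimal sketch of what a refcounted interned string and a normalised subject could look like in C. It is not avva's patch or Pan's real code: every name here (`rstring_ref`, `rstring_unref`, `subject_parse`, the struct fields) is hypothetical, the hash table is a plain fixed-size chained table rather than whatever Pan uses, and the part number is assumed to be a trailing "(N/M)" marker.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch; none of these names come from Pan itself. */
#define NBUCKETS 1024

typedef struct RString {
    struct RString *next;  /* next entry in the same hash bucket */
    unsigned refcount;     /* how many holders reference this string */
    char text[];           /* the single shared copy of the bytes */
} RString;

static RString *buckets[NBUCKETS];  /* global table of unique strings */

static unsigned hash_n(const char *s, size_t n) {
    unsigned h = 5381;  /* djb2-style hash */
    for (size_t i = 0; i < n; i++)
        h = h * 33 + (unsigned char)s[i];
    return h % NBUCKETS;
}

/* Return the shared RString for the first n bytes of s, creating it on
   first use. Assumes the bytes contain no embedded NUL. */
RString *rstring_ref_n(const char *s, size_t n) {
    unsigned b = hash_n(s, n);
    for (RString *r = buckets[b]; r; r = r->next) {
        if (strlen(r->text) == n && memcmp(r->text, s, n) == 0) {
            r->refcount++;
            return r;
        }
    }
    RString *r = malloc(sizeof *r + n + 1);
    if (!r)
        return NULL;
    memcpy(r->text, s, n);
    r->text[n] = '\0';
    r->refcount = 1;
    r->next = buckets[b];
    buckets[b] = r;
    return r;
}

RString *rstring_ref(const char *s) {
    return rstring_ref_n(s, strlen(s));
}

/* Drop one reference; unlink and free the string when none remain. */
void rstring_unref(RString *r) {
    if (!r || --r->refcount > 0)
        return;
    RString **p = &buckets[hash_n(r->text, strlen(r->text))];
    while (*p != r)
        p = &(*p)->next;
    *p = r->next;
    free(r);
}

typedef struct {
    int is_reply;   /* subject started with "Re: " */
    int part;       /* N from a trailing "(N/M)", 0 if absent */
    int total;      /* M from a trailing "(N/M)", 0 if absent */
    RString *base;  /* the remainder, shared between articles */
} Subject;

/* Split a raw subject into prefix flag, part number, and shared base. */
Subject subject_parse(const char *s) {
    Subject out = {0, 0, 0, NULL};
    if (strncmp(s, "Re: ", 4) == 0) {
        out.is_reply = 1;
        s += 4;
    }
    size_t n = strlen(s);
    const char *open = strrchr(s, '(');
    int part = 0, total = 0, used = 0;
    /* used stays 0 unless the closing ')' was matched as well */
    if (open && sscanf(open, "(%d/%d)%n", &part, &total, &used) == 2 &&
        used > 0 && open[used] == '\0' && part >= 0 && total > 0) {
        out.part = part;
        out.total = total;
        n = (size_t)(open - s);
        while (n > 0 && s[n - 1] == ' ')
            n--;  /* drop the space before the part marker */
    }
    out.base = rstring_ref_n(s, n);
    return out;
}
```

With this sketch, parsing "big.iso (01/50)" and "Re: big.iso (02/50)" yields two Subject values whose `base` pointers are identical, so the text "big.iso" is held in memory once with a refcount of 2. A side effect worth noting is that comparing normalised subjects for equality becomes a pointer comparison, which fits the remark that the threading code no longer needs its own normalisation.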
