Evan Martin (evan) wrote in evan_tech,

nerding out: spamassassin cutoff

Gather your SpamAssassin scores:
trout:~/Mail/danga/spam/cur% grep 'X-Spam-Status' * | sed -e 's/.*hits=//' | sed -e 's/ .*//' > ~/n
Then, in R:
> n = read.table('n')$V1
> summary(n)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-38.800  -4.900  -4.900  -3.564  -1.400   4.000 

So if I lowered the cutoff to 4, I'd have...
> length(n[n < 4]) / length(n)
[1] 0.9976526

99.8% accuracy. And at 3.5:
> length(n[n < 3.5]) / length(n)
[1] 0.988263

One of the messages I'd lose at 3.5, with a spam score of 3.9, is from my ex inviting me to dinner. I think it's 'cause it's HTML and it has a disclaimer footer inserted by her job.
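
For comparing several cutoffs in one go without opening R, here's a rough sketch in plain shell and awk. It assumes the scores sit one per line in ~/n, as written by the grep above; the frac_below name and the list of candidate cutoffs are made up for illustration. Like the R one-liners, it uses a strict "less than", so a message scoring exactly the cutoff counts as lost.

```shell
# frac_below CUTOFF FILE: print the fraction of scores in FILE (one per line)
# that are strictly below CUTOFF, i.e. the mail that would still get through.
# (Hypothetical helper, not part of SpamAssassin.)
frac_below() {
    awk -v cutoff="$1" '
        NF { total++; if ($1 + 0 < cutoff + 0) kept++ }   # skip blank lines
        END { if (total) printf "%.4f\n", kept / total }
    ' "$2"
}

# Sweep a few candidate cutoffs over the scores file, if it exists.
if [ -f "$HOME/n" ]; then
    for c in 3 3.5 4 5; do
        printf '%s\t%s\n' "$c" "$(frac_below "$c" "$HOME/n")"
    done
fi
```

Each output line is a cutoff followed by the fraction of messages scoring below it, which should line up with the length(n[n < x]) / length(n) numbers from R.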

(...am I posting too much?)
