06:03 pm, 27 Nov 04
nerding out: spamassassin cutoff
Gather your SpamAssassin scores:
trout:~/Mail/danga/spam/cur% grep 'X-Spam-Status' * | sed -e 's/.*hits=//' | sed -e 's/ .*//' > ~/n
Then, in R:
> n = read.table('n')$V1
> summary(n)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-38.800  -4.900  -4.900  -3.564  -1.400   4.000
So if I lowered the cut off to 4, I'd have...
> length(n[n < 4]) / length(n)
[1] 0.9976526
About 99.8% of these messages would still score under the cutoff.
> length(n[n < 3.5]) / length(n)
[1] 0.988263
Hmm...
One of the messages I'd lose, with a spam score of 3.9, is from my ex inviting me to dinner. I think it's 'cause it's HTML and it has a disclaimer footer inserted by her job.
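For trying more cutoffs without leaving the shell, here is a rough sketch of the same "fraction below the cutoff" check using awk. The `frac_below` name is made up for illustration, and it assumes `~/n` holds one numeric score per line, as written by the grep/sed pipeline above.

```shell
# Hypothetical helper: read one score per line on stdin and print the
# fraction that falls strictly below the cutoff passed as $1.
frac_below() {
    awk -v cut="$1" '
        NF  { total++; if ($1 + 0 < cut + 0) below++ }   # skip blank lines
        END { if (total) printf "%.6f\n", below / total }
    '
}

# Sweep a few candidate cutoffs over the scores file, if it exists.
if [ -f "$HOME/n" ]; then
    for cut in 3 3.5 4 4.5 5; do
        printf '%s\t%s\n' "$cut" "$(frac_below "$cut" < "$HOME/n")"
    done
fi
```

Each output line is a cutoff followed by the share of messages that would still land under it, which should match the `length(n[n < x]) / length(n)` figures from R.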
(...am I posting too much?)
I was meaning to ask you about R last night. Any good suggestions/tutorials/tips on learning it and/or its mindset? I find myself doing more and more statistical-ish stuff lately, so i figure i should learn the real tools (though gnumeric is still my good, good friend).