% Mail filtering
Mail filtering in Niksula
=========================

If your Niksula account was created after September 2012, you don't have a
Niksula mailbox and thus this guide does not apply to you.

### How does the system work?

The Niksula mail server runs a Sieve script (RFC 5228) on each message at
delivery time to decide what to do with it. The script the system looks at is
`~/.dovecot.sieve` for each user. You can edit this file using your favorite
text editor and then syntax-check your edits by compiling the file with
`sievec(1)`, e.g. by running `sievec ~/.dovecot.sieve`. Compilation is also
done automatically on mail delivery if required; if any errors occur, mail is
delivered to INBOX and the error messages are appended to
`~/.dovecot.sieve.log`. Additionally, to test that your script works
correctly, you can use `sieve-test(1)`.
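
The edit-and-check cycle described above can be wrapped in a small shell
function. This is only a sketch: the function name `check_sieve` is made up
for this example, and it assumes that `sievec` is available on the machine
where you run it and that it exits with a non-zero status when compilation
fails.

```shell
# check_sieve [SCRIPT]: syntax-check a Sieve script by compiling it.
# The function name is hypothetical; sievec(1) is the real compiler.
check_sieve() {
    script="${1:-$HOME/.dovecot.sieve}"

    if [ ! -r "$script" ]; then
        echo "cannot read $script" >&2
        return 1
    fi
    if ! command -v sievec >/dev/null 2>&1; then
        echo "sievec not found on this machine" >&2
        return 1
    fi

    # sievec prints any errors itself; we only report the overall result.
    if sievec "$script"; then
        echo "OK: $script compiles"
    else
        echo "FAILED: fix the errors above and run again" >&2
        return 1
    fi
}
```

Running `check_sieve` without arguments checks `~/.dovecot.sieve`. If mail
still ends up in INBOX unexpectedly, `tail ~/.dovecot.sieve.log` shows the
errors recorded at delivery time.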

`procmail` is no longer supported in Niksula - having a `.procmailrc` does
nothing.

Before Sieve scripts are consulted, SpamAssassin is automatically run on all
mail delivered to Niksula mailboxes. It tries to detect spam by running
various tests, and tags the messages it considers spam with headers such as
`X-Spam-Flag` and `X-Spam-Status`. For more details see `spamassassin(1)`.

For more information about Sieve see <http://sieve.info/>. In Niksula, the
Sieve implementation in use is dovecot-pigeonhole, so it [supports some
extensions](http://wiki2.dovecot.org/Pigeonhole/Sieve#Supported_Features)
(those not enabled by default are not enabled in Niksula either).

### How to do it?

We'll now show you examples of how to use Sieve in combination with
SpamAssassin in Niksula to filter spam.

Since incoming mail is automatically tagged for you, all you need to do is
create Sieve rules to handle it.

### 1) The easy way - basic spam filtering

All mail arriving in Niksula is tagged using the default settings.
Therefore, to file spam into a separate mailbox as it arrives, all you
have to do is add the following lines to your `~/.dovecot.sieve` script.

    require ["fileinto"];

    if header :is "X-Spam-Flag" "YES" {
        fileinto "spam"; # or whatever mailbox name you wish
        stop; # don't continue processing for this message
    }
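
To try the rule without waiting for real spam, you can feed `sieve-test(1)` a
hand-made message. The sketch below writes such a message to a file; the
function name `make_spam_sample` and the addresses in the message are made up
for illustration.

```shell
# make_spam_sample FILE: write a minimal message carrying the header that
# SpamAssassin adds to spam, for feeding to sieve-test(1).
# The function name and the example.com addresses are hypothetical.
make_spam_sample() {
    cat > "$1" <<'EOF'
From: sender@example.com
To: you@example.com
Subject: Test message tagged as spam
X-Spam-Flag: YES

This body is only here so that the file is a complete message.
EOF
}
```

After `make_spam_sample /tmp/sample.eml`, running
`sieve-test ~/.dovecot.sieve /tmp/sample.eml` prints the actions the script
would take for that message. By default `sieve-test` only reports the actions
and does not deliver anything, so for this sample you would expect it to
report filing into `spam`.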

### 2) The medium level - spam filtering reloaded

This example shows how to use some personal SpamAssassin settings. Different
users get different kinds of spam, so the default settings can't be the best
for everyone.

As in the above example, add the required lines to file spam into a mailbox of
your choosing. To do some personal configuration, edit
`~/.spamassassin/user_prefs`.

Many settings can be tuned to better suit your own mail. The default required
score for spam is 5.0. If you still get spam with scores of around 4.5, you
can lower the threshold (or perhaps tune the scores assigned to certain
rules).

    required_score     4.0

Now messages that score 4.0 or higher are also tagged as spam and go to your
spam folder.
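
To pick a sensible threshold, it helps to see what scores the spam that
slipped through actually got. The sketch below prints the score recorded in a
message's `X-Spam-Status` header. It assumes the header has the form
`X-Spam-Status: No, score=4.5 required=5.0 ...` written by SpamAssassin 3.x,
and the function name `spam_score` is made up for this example.

```shell
# spam_score FILE: print the SpamAssassin score of one message file.
# The function name is hypothetical; the header layout is an assumption.
spam_score() {
    # Stop at the first empty line (end of the headers), and print the
    # value that follows "score=" on the X-Spam-Status line.
    sed -n -e '/^$/q' \
        -e 's/^X-Spam-Status:.* score=\([-0-9.]*\).*/\1/p' "$1"
}
```

For example, `spam_score path/to/message` on a spam message that reached your
INBOX shows how far below the threshold it scored.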

See the
[Mail::SpamAssassin::Conf](http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html)
POD documentation for details of what can be tweaked.

### 3) Last but far from least - Bayesian filtering

Even though SpamAssassin has very good rule-based filters, sometimes
even those are not enough. Some people get hundreds of spam messages
per day, which is very annoying. SpamAssassin includes a Bayesian module
that learns from your emails which ones are spam and which ones are good
(ham, i.e. non-spam).

First, make two mailboxes of training material, one for spam and one for ham.
By default SpamAssassin will not apply Bayesian filtering until it has learned
at least 200 messages of each type (this is configurable with the options
`bayes_min_ham_num` and `bayes_min_spam_num`). Make sure that the ham
collection contains only good mail and the spam collection contains only spam.

Once you have a mailbox full of spam and another one full of ham, ssh to some
machine in [Paniikki](maps) or kekkonen, and run `sa-learn` on your mail. This
may take some time, so it is polite to run it under `nice(1)`. Your mail is
stored in `~/Maildir` and is organized in the filesystem in Maildir++ format,
so if your mailboxes are called 'spambox' and 'hambox', run the following
commands:

    nice sa-learn --showdots --spam ~/Maildir/.spambox/
    nice sa-learn --showdots --ham ~/Maildir/.hambox/
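
Since the Bayesian filter is not applied until enough messages of each type
have been learned, it can be useful to check how many messages a training
folder holds before running `sa-learn` on it. A minimal sketch, relying on the
Maildir convention that each message is one file under the folder's `cur` or
`new` subdirectory; the function name `count_maildir_msgs` is made up for this
example.

```shell
# count_maildir_msgs DIR: count the messages in one Maildir folder.
# Each message is a single file under DIR/cur or DIR/new; DIR/tmp only
# holds deliveries in progress and is ignored.
count_maildir_msgs() {
    find "$1/cur" "$1/new" -type f 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `count_maildir_msgs ~/Maildir/.spambox` and
`count_maildir_msgs ~/Maildir/.hambox` should each print at least 200 if you
are using the default minimums.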

Bayesian filtering is enabled by default in Niksula (it can be controlled with
the `use_bayes` setting in `~/.spamassassin/user_prefs`), but autolearning is
disabled, so you need to train the filter yourself.
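
To see how many messages the filter has learned so far, you can inspect the
output of `sa-learn --dump magic`. The sketch below picks out the two relevant
counters. It assumes the output contains lines ending in
`non-token data: nspam` and `non-token data: nham` with the count in the third
column, as in SpamAssassin 3.x; the function name `bayes_counts` is made up
for this example.

```shell
# bayes_counts: read `sa-learn --dump magic` output on standard input and
# print the number of spam and ham messages learned so far.
# The function name is hypothetical; the column layout is an assumption.
bayes_counts() {
    awk '/non-token data: nspam$/ { spam = $3 }
         /non-token data: nham$/  { ham  = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}
```

Use it as `sa-learn --dump magic | bayes_counts`.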

You can test that the new filter is working by checking the
`X-Spam-Status` headers of emails that you receive; test names of the
form `BAYES_00` should appear there even for ordinary (non-spam) emails.
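
Checking this by eye gets tedious, so here is a sketch that prints the
Bayesian test name found in a message's `X-Spam-Status` header, including the
case where the header is folded over several lines. The function name
`bayes_result` is made up for this example, and the header layout is assumed
to be the one written by SpamAssassin 3.x.

```shell
# bayes_result FILE: print the BAYES_nn test name (if any) from the
# X-Spam-Status header of one message file.
# The function name is hypothetical.
bayes_result() {
    awk '
        /^$/ { exit }    # an empty line ends the headers
        # A line that does not start with whitespace begins a new header;
        # remember whether that header is X-Spam-Status.
        /^[^ \t]/ { in_status = ($0 ~ /^X-Spam-Status:/) }
        # Scan the header line and its folded continuation lines.
        in_status {
            while (match($0, /BAYES_[0-9]+/)) {
                print substr($0, RSTART, RLENGTH)
                $0 = substr($0, RSTART + RLENGTH)
            }
        }
    ' "$1"
}
```

If `bayes_result path/to/message` prints nothing for newly delivered mail, the
Bayesian filter is probably not being applied yet. Low numbers such as
`BAYES_00` mean the filter considers the message very unlikely to be spam,
and high numbers such as `BAYES_99` mean the opposite.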

**And as a reminder: no system is foolproof, so please check your spam folder
frequently.**