LINGER: Linger Is a Neural & Genetic Email Reader
Smart Internet Technology Research Group

Introduction

- The volume of email arriving in our mailboxes each day continues to grow with the ever-rising popularity of the Internet.

- To keep such large quantities of information manageable, it is convenient to organise mail into folders based on content, for example by keeping work mail separate from personal mail.

- Linger is an intelligent email filter, designed to automatically organise your email into folders based on the content.

- It does this with machine learning: a genetic algorithm combined with a neural network classifies the text of new email to match your existing folder arrangement.

Problems With Existing Solutions

- Many email clients offer automatic filtering of mail using simple keyword-spotting rules.

- These systems are not intelligent.

- The rules have to be designed manually by the user.

- The rules cannot adapt to change.

- It is impossible to create comprehensive rules given the variety of human expression.

- The rules will have to be rewritten frequently.

- We would like a system that works autonomously and filters our mail for us.

- A lot of research has gone into more intelligent ways of filtering mail, using statistical and Artificial Intelligence (AI) approaches.

- However, high accuracy is needed - the user has to be able to trust the AI not to make mistakes.

How It Works

- The contents of your mail folders can be thought of as a bag of words, each word with an associated frequency for each folder.

- Linger uses these word frequencies to identify your mail filing habits: it can notice specific patterns in the content of your folders that distinguish the types of mail you receive. For example, the words "Assignment" and "Deadline" could feature prominently in work-related mail but not in your personal correspondence.

- This means that Linger can dynamically adapt its decisions based on the semantics that it extracts from the mail that has already been sorted. If you split your stored mail into two different categories, Linger can adjust itself accordingly.
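The bag-of-words view described above can be illustrated with a minimal Python sketch. This is not Linger's actual code: the function names (`tokenize`, `folder_word_frequencies`), the tokenisation rule, and the sample folders are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def folder_word_frequencies(folders):
    """Map each folder name to the relative frequency of every word in it.

    `folders` maps a folder name to a list of message bodies (strings).
    """
    frequencies = {}
    for name, messages in folders.items():
        counts = Counter(w for message in messages for w in tokenize(message))
        total = sum(counts.values())
        frequencies[name] = (
            {word: n / total for word, n in counts.items()} if total else {}
        )
    return frequencies


# Hypothetical, hand-made mail folders (illustrative data only).
folders = {
    "work": [
        "Assignment deadline is Friday",
        "Please submit the assignment before the deadline",
    ],
    "personal": [
        "Dinner on Friday?",
        "See you at the beach",
    ],
}

freqs = folder_word_frequencies(folders)
```

In this toy data, "assignment" and "deadline" have a non-zero frequency only in the "work" folder, while "friday" appears in both, which is the kind of difference a classifier can exploit.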

Algorithms

- Linger uses a hybrid approach to classification.

- A genetic algorithm is used to isolate which features can be used to distinguish between mail types.

- Genetic algorithms can be used to efficiently search through a large space of possible parameters.

- In Linger's case, this allows an effective feature selection to be found quickly, based on the variance of each word's frequency across all mailboxes.

- A neural network is then used to classify the unknown mail based on what features it contains and what words are important to your mail folders.

- Neural networks are very good at adapting to their input values and have the ability to generalise to predict unseen data.

- This ability to generalise is very important in email classification, where the subject of a text can vary wildly.
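The feature-selection step can be sketched as a toy genetic algorithm in Python. This is a simplified illustration under stated assumptions, not Linger's implementation: the page does not describe the fitness function, so the one used here (sum of the per-word variances, with over-budget selections rejected) is an assumption, as are the function names, the bitmask encoding, the selection and mutation scheme, and the sample frequencies.

```python
import random
from statistics import pvariance


def word_variances(freqs):
    """Variance of each word's relative frequency across all folders."""
    vocabulary = sorted({w for f in freqs.values() for w in f})
    return {
        w: pvariance([f.get(w, 0.0) for f in freqs.values()])
        for w in vocabulary
    }


def select_features(variances, max_features, generations=60, pop_size=30, seed=0):
    """Evolve a bitmask over the vocabulary (needs at least two words)."""
    rng = random.Random(seed)
    words = list(variances)
    n = len(words)

    def fitness(mask):
        # Assumed fitness: total variance of the chosen words.
        chosen = [w for w, bit in zip(words, mask) if bit]
        if len(chosen) > max_features:
            return -1.0  # over-budget masks lose to every valid mask
        return sum(variances[w] for w in chosen)

    # Random initial population, biased towards small selections.
    population = [
        [int(rng.random() < max_features / n) for _ in range(n)]
        for _ in range(pop_size)
    ]
    population[0] = [0] * n  # guarantee at least one valid individual

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1       # mutation: flip one random bit
            children.append(child)
        population = parents + children

    best = max(population, key=fitness)
    return [w for w, bit in zip(words, best) if bit]


# Hypothetical per-folder word frequencies (illustrative data only).
sample_freqs = {
    "work": {"assignment": 0.2, "the": 0.1},
    "personal": {"the": 0.1, "beach": 0.2},
}
sample_variances = word_variances(sample_freqs)
selected = select_features(sample_variances, max_features=1)
```

With a fitness this simple, ranking words by variance would give the same answer directly; a genetic search becomes worthwhile when the fitness depends on how the selected words work together, for example the accuracy of a classifier trained on them. The selected words would then define the input features for the neural network.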

Additional Link

http://neko.freeshell.org/linger

Contact

James Clark
Dr Irena Koprinska
Dr Josiah Poon

 
University of Sydney