
Automatically Constructed Ontologies for User Modeling
Computer Human Adapted Interaction Research Group

Aims

- Extract an extensive ontology of computer science from a computer science dictionary

- Perform queries on the ontology to serve as a basis for computer science learning

- Use the ontology as a foundation for reasoning from limited models of user interests to more detailed and powerful models

- Compare and contrast these models to learn from the entities themselves

The Parsing Process

- An extract from the FOLDOC dictionary is shown below, with key components used in the parsing process highlighted in bold

- The generated ontology is backed by a weighted digraph

- Concepts discovered become nodes and the relationships between them become edges

- When further information about the relationship can be determined (for example, if it is a synonym, antonym, parent, child or sibling relationship) this becomes the type of the edge

- Each edge is then given a weight based on its type and on the position within the definition at which the corresponding relationship was discovered

ontology
1.
<philosophy> A systematic account of Existence.
2.
<artificial intelligence> (From philosophy) An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
For {AI} systems, what "exists" is that which can be represented. When the {knowledge} about a {domain} is represented in a {declarative language}, the set of objects that can be represented is called the {universe of discourse}. We can describe the ontology of a program by defining a set of representational terms. Definitions associate the names of entities in the {universe of discourse} (e.g. classes, relations, functions or other objects) with human-readable text describing what the names mean, and formal {axioms} that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a {logical theory}.
A set of {agents} that share the same ontology will be able to communicate about a domain of discourse without necessarily operating on a globally shared theory. We say that an agent commits to an ontology if its observable actions are consistent with the definitions in the ontology. The idea of ontological commitment is based on the {Knowledge-Level} perspective.
3.
<information science> The hierarchical structuring of knowledge about things by subcategorising them according to their essential (or at least relevant and/or cognitive) qualities. See {subject index}. This is an extension of the previous senses of "ontology" (above) which has become common in discussions about the difficulty of maintaining {subject indices}.
(1997-04-09)
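The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the project's parser: the function names, the base costs per relationship type and the position formula are all assumptions, as is the reading that a lower weight means a stronger relationship. It treats `<subject>` tags as parent concepts and `{cross-references}` as relationships of undetermined type.

```python
import re
from collections import defaultdict

# Hypothetical base costs per relationship type (lower = stronger).
# These values are illustrative and are not taken from the project.
BASE_COST = {"synonym": 0.2, "parent": 0.4, "child": 0.4,
             "sibling": 0.5, "antonym": 0.6, "unknown": 0.7}

CROSS_REF = re.compile(r"\{([^{}]+)\}")   # e.g. {universe of discourse}
SUBJECT = re.compile(r"<([^<>]+)>")       # e.g. <philosophy>


def edge_weight(edge_type, position, length):
    """Assumed formula: cost grows with relative position, kept below 1."""
    base = BASE_COST[edge_type]
    return round(min(base + 0.3 * position / length, 0.99), 3)


def parse_entry(headword, definition):
    """Return (source, target, type, weight) tuples for one entry."""
    edges = []
    length = max(len(definition), 1)
    for m in SUBJECT.finditer(definition):
        # Assumption: a subject tag names a parent concept.
        edges.append((headword, m.group(1).lower(), "parent",
                      edge_weight("parent", m.start(), length)))
    for m in CROSS_REF.finditer(definition):
        # Relationship type cannot be determined from the braces alone.
        edges.append((headword, m.group(1).lower(), "unknown",
                      edge_weight("unknown", m.start(), length)))
    return edges


def build_graph(entries):
    """Build {source: {target: (type, weight)}} from {headword: definition}."""
    graph = defaultdict(dict)
    for headword, definition in entries.items():
        for src, dst, etype, weight in parse_entry(headword, definition):
            current = graph[src].get(dst)
            # Keep the strongest (lowest cost) edge between two concepts.
            if current is None or weight < current[1]:
                graph[src][dst] = (etype, weight)
            graph.setdefault(dst, {})  # every concept becomes a node
    return graph


sample = {"ontology": "<philosophy> A systematic account. For {AI} systems, "
                      "the {knowledge} about a {domain}."}
ontology_graph = build_graph(sample)
```

In this sketch a relationship found early in a definition receives a lower cost than one found later, which is one plausible way of combining edge type and position.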

Results I – Ontology Properties

- The ontology generated contains 23,095 concepts and identifies 57,550 (directed) relationships (the direction is ignored when a query is processed, but not when the result is output)

- The chart below gives an indication of the connectivity of the graph. Essentially, 92% of the ontology resides in a single, connected subgraph with the cost of any shortest path between two nodes being less than 6.0 units (each edge has a weight between 0 and 1, with a majority between 0.4 and 0.7)
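Connectivity figures of this kind could be computed along the following lines. This is a sketch under assumptions: edges are given as hypothetical `(source, target, weight)` triples, direction is ignored as it is for queries, and edge weights are treated as additive path costs.

```python
import heapq
from collections import defaultdict


def undirected(edges):
    """Build an undirected adjacency map from (source, target, weight) triples."""
    adj = defaultdict(dict)
    for src, dst, weight in edges:
        # If both directions are present, keep the cheaper weight.
        adj[src][dst] = min(weight, adj[src].get(dst, weight))
        adj[dst][src] = min(weight, adj[dst].get(src, weight))
    return adj


def path_costs(adj, start):
    """Dijkstra's algorithm: cheapest cost from start to each reachable node."""
    costs = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs[node]:
            continue  # stale queue entry
        for nxt, weight in adj[node].items():
            new_cost = cost + weight
            if new_cost < costs.get(nxt, float("inf")):
                costs[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return costs


def largest_component_fraction(adj):
    """Fraction of all nodes that lie in the largest connected subgraph."""
    seen, best = set(), 0
    for node in adj:
        if node not in seen:
            component = set(path_costs(adj, node))
            seen |= component
            best = max(best, len(component))
    return best / len(adj) if adj else 0.0


adj = undirected([("a", "b", 0.5), ("b", "c", 0.4), ("x", "y", 0.6)])
fraction = largest_component_fraction(adj)
cost_a_to_c = path_costs(adj, "a")["c"]
```

The largest value returned by `path_costs` over all start nodes in a component gives the bound on shortest-path cost within that component.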


Rationale
- A software representation of an extensive computer science ontology can be used for a variety of roles in a teaching system

- Automatic construction of an ontology makes good use of existing resources and helps minimise errors that may result from manual ontology construction

- The sheer size and scope of such an ontology requires effective querying tools to extract useful information

- Expanding existing models and representing them in a common domain allows them to be compared more effectively; for example, in machine learning applications

Ontology Queries
- Queries can be performed based on a single node, a subset of nodes or two subgraphs

- A single node (point) query generates a subgraph similar to a concept map, centred around that node. An example is shown below from the ‘ontology’ node, which also demonstrates the output of the parsing process for the dictionary extract to the left.

- A subset of nodes is used to expand models using limited information. The nodes to be used in the query are determined through a matching process involving stemming and substring comparisons between arbitrary input and the ontology concepts.

- Subgraphs (models) can either be merged (with optional clustering) or compared quantitatively.
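A point query and a subset query might look something like the sketch below. It assumes that the depth limit is a bound on cumulative path cost and that `adj` is a hypothetical adjacency map `{node: {neighbour: weight}}` that already contains both directions of every edge, since direction is ignored when a query is processed. The function names are illustrative only.

```python
import heapq


def point_query(adj, centre, limit):
    """Return {node: cost} for nodes whose cheapest path from centre costs <= limit."""
    costs = {centre: 0.0}
    queue = [(0.0, centre)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs[node]:
            continue  # stale queue entry
        for nxt, weight in adj.get(node, {}).items():
            new_cost = cost + weight
            # Do not expand beyond the cost limit.
            if new_cost <= limit and new_cost < costs.get(nxt, float("inf")):
                costs[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return costs


def expand_model(adj, seeds, limit):
    """Subset query: merge the point queries of several matched nodes."""
    merged = {}
    for seed in seeds:
        for node, cost in point_query(adj, seed, limit).items():
            # Keep the cheapest cost at which any seed reaches the node.
            merged[node] = min(cost, merged.get(node, float("inf")))
    return merged


adj = {
    "ontology": {"ai": 0.4, "knowledge": 0.5},
    "ai": {"ontology": 0.4, "agent": 0.4},
    "knowledge": {"ontology": 0.5, "domain": 0.6},
    "agent": {"ai": 0.4},
    "domain": {"knowledge": 0.6},
}
neighbourhood = point_query(adj, "ontology", 0.9)
```

With these made-up weights, "domain" falls outside the query because its cheapest path from "ontology" costs more than the limit.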

Below: A Point query from ontology with a depth limit of 0.9
Bidirectional edges indicate synonym (or strong sibling) relationships, reversed arrowheads indicate antonym (or other opposing) relationships, directed edges indicate strict parent/child relationships and undirected edges indicate undetermined relationships (or weak siblings). Bold, normal, dashed and dotted line styles indicate progressively weaker relationships for all types.


Results II – Ontology Evaluations

- Using the ASIS Thesaurus, 519 of the 1,345 thesaurus terms were matched, either exactly or by using the stemmer. Many of the terms that could not be matched concerned the business focus of this resource and are not in the base computer science ontology.

- The thesaurus is arranged hierarchically in a tree structure (rather than a general graph), so structural analysis could be performed by finding the distance in the ontology between a node and its parent in the thesaurus. In this way, 90% of the path costs between matched parent-child edges were below 1.9 units, which translates to an average of around two edges on each path.

- The quantitative results of the concept mapping experiment are shown in the table to the right. In summary, approximately 86% of nodes could be matched and only four of the paths between matched nodes that shared an edge had a cost greater than 1.5 units.
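The structural comparison against a thesaurus could be sketched as below. This is an illustration under assumptions rather than the evaluation code itself: `crude_stem` is a deliberately simple stand-in for whatever stemmer the project used, `adj` is a hypothetical undirected adjacency map, and the thesaurus is reduced to a list of (parent, child) term pairs.

```python
import heapq


def crude_stem(term):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix, repl in (("ies", "y"), ("ing", ""), ("es", ""), ("s", "")):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: len(term) - len(suffix)] + repl
    return term


def match_term(term, concepts, by_stem):
    """Match a thesaurus term exactly, or else through its stem."""
    term = term.lower()
    if term in concepts:
        return term
    return by_stem.get(crude_stem(term))


def cheapest_cost(adj, start, goal):
    """Dijkstra's algorithm with early exit; None if goal is unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best[node]:
            continue  # stale queue entry
        for nxt, weight in adj.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return None


def fraction_below(adj, parent_child_pairs, threshold):
    """Share of matched, connected parent-child pairs with path cost below threshold."""
    concepts = set(adj)
    by_stem = {crude_stem(c): c for c in concepts}
    costs = []
    for parent, child in parent_child_pairs:
        p = match_term(parent, concepts, by_stem)
        c = match_term(child, concepts, by_stem)
        if p is None or c is None:
            continue  # unmatched terms are left out of the structural analysis
        cost = cheapest_cost(adj, p, c)
        if cost is not None:
            costs.append(cost)
    if not costs:
        return 0.0
    return sum(cost < threshold for cost in costs) / len(costs)
```

Calling `fraction_below(adj, pairs, 1.9)` on the real graph and thesaurus pairs would give a figure comparable to the 90% reported above, provided the matching and cost conventions agree with those the project used.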

Methodology

- For the initial ontology, the dictionary resource chosen was the Free On-line Dictionary of Computing (FOLDOC)

- A parser was written to generate an ontology from FOLDOC, or any resource that can be put into a similar format (for example, the ASIS Thesaurus of Information Science was parsed as part of the evaluation process)

- Evaluation of the ontology was conducted by comparing parts of it with trusted information sources and through concept mapping experiments using student volunteers

- Evaluation was also conducted in terms of the qualitative effectiveness of the querying tools for answering queries about ontological relationships

- Development and evaluation of the model expansion and comparison tools continues

Evaluation of the Ontology

- Initially, the parameters to the parsing process had to be tuned. These were progressively refined to ensure good use of the information in the dictionary without introducing too many errors (for example, anomalies in the classification of relationships)

- Once tuned, the content and structure of the ontology were compared with trusted resources such as the ASIS Thesaurus of Information Science. These comparisons validated some of the relationships determined by the system (its precision) and identified concepts where the ontology was lacking (its recall).

- Further evaluation was conducted by asking student volunteers to draw a concept map based upon a concept that was central to subjects they had recently studied. These drawings were then compared with similar queries in the ontology.

Contact

Associate Professor Judy Kay
Trent Apted

 
University of Sydney