- Extract an extensive ontology of computer science from a computer science dictionary
- Perform queries on the ontology to serve as a basis for computer science learning
- Use the ontology as a foundation for reasoning from limited models of user interests to more detailed and powerful models
- Compare and contrast these models to learn from the entities themselves
The Parsing Process
- An extract from the FOLDOC dictionary is shown below, with key components used in the parsing process highlighted in bold
- The generated ontology is backed by a weighted digraph
- Concepts discovered become nodes and the relationships between them become edges
- When further information about a relationship can be determined (for example, whether it is a synonym, antonym, parent, child or sibling relationship), this becomes the type of the edge
- Each edge is then given a weight based on its type and on the position within the definition at which the corresponding relationship was discovered
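The weighted digraph described above can be sketched as follows. This is only an illustration: the poster does not give the weighting formula, so the base values per edge type, the position penalty and the names `BASE_COST`, `edge_weight` and `Ontology` are all assumptions. The sketch also assumes that a weight acts as a cost, so that a lower weight means a stronger relationship.

```python
from dataclasses import dataclass, field

# Assumed base costs per relationship type; the values are illustrative only.
BASE_COST = {
    "synonym": 0.2,
    "parent": 0.4,
    "child": 0.4,
    "sibling": 0.5,
    "antonym": 0.6,
    "unknown": 0.7,
}


def edge_weight(edge_type, position, definition_length):
    """Combine edge type and position in the definition into a weight in (0, 1]."""
    base = BASE_COST.get(edge_type, BASE_COST["unknown"])
    # Relative position: 0.0 at the start of the definition, close to 1.0 at the end.
    relative = position / max(definition_length, 1)
    # Assumed rule: relationships found later in a definition cost up to 0.3 more.
    return min(1.0, base + 0.3 * relative)


@dataclass
class Ontology:
    # source concept -> {target concept: (edge type, weight)}
    edges: dict = field(default_factory=dict)

    def add_relationship(self, source, target, edge_type, position, definition_length):
        """Add both concepts as nodes and record a typed, weighted directed edge."""
        self.edges.setdefault(source, {})
        self.edges.setdefault(target, {})
        weight = edge_weight(edge_type, position, definition_length)
        self.edges[source][target] = (edge_type, weight)
        return weight


onto = Ontology()
onto.add_relationship("ontology", "knowledge", "unknown", 0, 100)
```

Storing the type and weight together on each edge keeps later queries simple, since a traversal can read the cost and the output stage can read the type from the same record.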
ontology

1. <philosophy>
A systematic account of Existence.

2. <artificial intelligence>
(From philosophy) An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.

For {AI} systems, what "exists" is that which can be represented. When the {knowledge} about a {domain} is represented in a {declarative language}, the set of objects that can be represented is called the {universe of discourse}. We can describe the ontology of a program by defining a set of representational terms. Definitions associate the names of entities in the {universe of discourse} (e.g. classes, relations, functions or other objects) with human-readable text describing what the names mean, and formal {axioms} that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a {logical theory}.

A set of {agents} that share the same ontology will be able to communicate about a domain of discourse without necessarily operating on a globally shared theory. We say that an agent commits to an ontology if its observable actions are consistent with the definitions in the ontology. The idea of ontological commitment is based on the {Knowledge-Level} perspective.

3. <information science>
The hierarchical structuring of knowledge about things by subcategorising them according to their essential (or at least relevant and/or cognitive) qualities. See {subject index}. This is an extension of the previous senses of "ontology" (above) which has become common in discussions about the difficulty of maintaining {subject indices}.

(1997-04-09)
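As a rough illustration of how an entry like the one above might be scanned, the sketch below pulls out the brace-delimited cross-references and angle-bracket subject tags together with their character positions. It assumes these markers are among the components the parser relies on; the poster does not list them, and the function names are hypothetical.

```python
import re

# FOLDOC marks cross-references as {term} and subject areas as <subject>.
CROSS_REF = re.compile(r"\{([^{}]+)\}")
SUBJECT = re.compile(r"<([^<>]+)>")


def extract_cross_references(definition):
    """Return (term, character offset) pairs for each {cross-reference}."""
    refs = []
    for match in CROSS_REF.finditer(definition):
        # Collapse internal whitespace so terms split across lines are rejoined.
        term = " ".join(match.group(1).split())
        refs.append((term, match.start()))
    return refs


def extract_subjects(definition):
    """Return the subject tags, such as 'philosophy', in order of appearance."""
    return [" ".join(m.group(1).split()) for m in SUBJECT.finditer(definition)]


sample = (
    "2. <artificial intelligence> For {AI} systems, the {knowledge} about a "
    "{domain} is represented in a {declarative\nlanguage}."
)
print(extract_subjects(sample))
print([term for term, _ in extract_cross_references(sample)])
```

The recorded offset is what a position-based weighting step could consume, since it indicates how early in the definition each related concept appears.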
Results I: Ontology Properties
- The generated ontology contains 23,095 concepts and identifies 57,550 directed relationships (the direction is ignored when a query is processed, but not when the result is output)
- The chart below gives an indication of the connectivity of the graph. Essentially, 92% of the ontology resides in a single connected subgraph in which the cost of the shortest path between any two nodes is less than 6.0 units (each edge has a weight between 0 and 1, with the majority between 0.4 and 0.7)
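The two connectivity measures quoted above (the share of nodes in the largest connected subgraph and shortest-path costs) could be computed along the following lines. This is a sketch on a toy edge list, not the project's code; it assumes edges are given as `(source, target, weight)` triples and that direction is ignored, as stated for query processing.

```python
import heapq
from collections import defaultdict


def undirected(edges):
    """Build an undirected adjacency map, keeping the cheapest weight per pair."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = min(w, adj[u].get(v, float("inf")))
        adj[v][u] = min(w, adj[v].get(u, float("inf")))
    return adj


def largest_component_fraction(adj):
    """Fraction of all nodes that lie in the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best / len(adj) if adj else 0.0


def shortest_path_costs(adj, source):
    """Dijkstra's algorithm: cheapest path cost from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, w in adj[node].items():
            nd = d + w
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


toy = [("a", "b", 0.5), ("b", "c", 0.4), ("d", "e", 0.6)]
graph = undirected(toy)
print(largest_component_fraction(graph))
print(shortest_path_costs(graph, "a"))
```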
Rationale
- A software representation of an extensive computer science ontology can be used for a variety of roles in a teaching system
- Automatic construction of an ontology makes good use of existing resources and helps minimise errors that may result from manual ontology construction
- The sheer size and scope of such an ontology requires effective querying tools to extract useful information
- Expanding existing models and representing them in a common domain allows them to be compared more effectively; for example, in machine learning applications
Ontology Queries
- Queries can be performed based on a single node, a subset of nodes or two subgraphs
- A single node (point) query generates a subgraph similar to a concept map, centred on that node. An example from the ontology node is shown below; it also demonstrates the output of the parsing process for the FOLDOC extract of that entry.
- A subset of nodes is used to expand models using limited information. The nodes to be used in the query are determined through a matching process involving stemming and substring comparisons between arbitrary input and the ontology concepts.
- Subgraphs (models) can either be merged (with optional clustering) or compared quantitatively.
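The matching step between arbitrary input and ontology concepts might look roughly like the sketch below. The poster does not name the stemmer used, so `crude_stem` is a deliberately simple, hypothetical stand-in that handles only a few plural forms, and `match_concepts` is an assumed name for the combined stemming and substring comparison.

```python
def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few plural endings."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # ontologies -> ontology
    if word.endswith("sses"):
        return word[:-2]                # classes -> class
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # agents -> agent
    return word


def normalise(text):
    """Lower-case and stem every whitespace-separated token."""
    return " ".join(crude_stem(token) for token in text.split())


def match_concepts(text, concepts):
    """Return concepts matching the input exactly after stemming, else by substring."""
    target = normalise(text)
    exact = [c for c in concepts if normalise(c) == target]
    if exact:
        return exact
    return [
        c for c in concepts
        if normalise(c) in target or target in normalise(c)
    ]


concepts = ["agent", "ontology", "knowledge", "universe of discourse"]
print(match_concepts("Ontologies", concepts))
print(match_concepts("software agents and knowledge", concepts))
```

Plain character-level substring tests can over-match short concepts (a two-letter concept would match inside many longer words), so a practical version would probably compare on token boundaries or impose a minimum length.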
Below: a point query from the ontology node with a depth limit of 0.9.
Bidirectional edges indicate synonym (or strong sibling) relationships,
reversed arrowheads indicate antonym (or other opposing) relationships,
directed edges indicate strict parent/child relationships and
undirected edges indicate undetermined relationships (or weak
siblings). Bold, normal, dashed and dotted line styles indicate
progressively weaker relationships for all types.
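A point query of this kind can be sketched as a cost-bounded search outward from the centre node. The sketch assumes that the "depth limit" is a cap on cumulative path cost and that edges are `(source, target, weight)` triples; direction is ignored during the search but the returned edges keep their original direction, in line with the behaviour described for queries. The function name and toy data are hypothetical.

```python
import heapq


def point_query(edges, centre, limit):
    """Return (costs, subgraph edges) for nodes within `limit` cost of `centre`."""
    # Direction is ignored while searching, so index each edge from both ends.
    adj = {}
    for source, target, weight in edges:
        adj.setdefault(source, []).append((target, weight))
        adj.setdefault(target, []).append((source, weight))

    dist = {centre: 0.0}
    heap = [(0.0, centre)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbour, weight in adj.get(node, []):
            nd = d + weight
            # Only keep nodes whose cheapest path stays within the limit.
            if nd <= limit and nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))

    # Output the original directed edges whose endpoints were both reached.
    subgraph = [e for e in edges if e[0] in dist and e[1] in dist]
    return dist, subgraph


toy_edges = [
    ("ontology", "knowledge", 0.4),
    ("domain", "knowledge", 0.4),
    ("agent", "ontology", 0.5),
    ("domain", "axiom", 0.6),
]
costs, sub = point_query(toy_edges, "ontology", 0.9)
print(sorted(costs))
print(sub)
```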
Results II: Ontology Evaluations
- Using the ASIS Thesaurus, 519 of the 1345 thesaurus terms were matched, either exactly or by using the stemmer. Many of the terms that could not be matched concerned the business focus of this resource and are not in the base computer science ontology.
- The thesaurus is arranged hierarchically in a tree structure (rather than a general graph), so structural analysis could be performed by finding the distance in the ontology between a node and its parent in the thesaurus. By this measure, 90% of the path costs between matched parent-child pairs were below 1.9 units, which translates to an average of around two edges on each path.
- The quantitative results of the concept mapping experiment are shown in the table to the right. In summary, approximately 86% of nodes could be matched, and only four of the paths between matched nodes that shared an edge had a cost greater than 1.5 units.
Methodology
- For the initial ontology, the dictionary resource chosen was the Free On-line Dictionary of Computing (FOLDOC)
- A parser was written to generate an ontology from FOLDOC, or any resource that can be put into a similar format (for example, the ASIS Thesaurus of Information Science was parsed as part of the evaluation process)
- Evaluation of the ontology was conducted by comparing parts of it with trusted information sources and through concept mapping experiments using student volunteers
- Evaluation was also conducted in terms of the qualitative effectiveness of the querying tools for answering queries about ontological relationships
- Development and evaluation of the model expansion and comparison tools continues
Evaluation of the Ontology
- Initially, the parameters to the parsing process had to be tuned. These were progressively refined to ensure good use of the information in the dictionary without introducing too many errors (for example, anomalies in the classification of relationships)
- Once tuned, the content and structure of the ontology were compared with trusted resources such as the ASIS Thesaurus of Information Science. These validated some of the relationships determined by the system (its precision) and identified concepts where the ontology was lacking (its recall).
- Further evaluation was conducted by asking student volunteers to draw a concept map based on a concept that was central to subjects they had recently studied. These drawings were then compared with similar queries in the ontology.