A Concept Lattice for Semantic Integration of Geo-Ontologies Based on Weight of Inclusion Degree Importance and Information Entropy

Xiao, Jia; He, Zongyi

doi:10.3390/e18110399


by Jia Xiao and Zongyi He *

School of Resources and Environment Science, Wuhan University, No.129 Luoyu Road, Wuhan 430079, China

* Author to whom correspondence should be addressed.

Entropy 2016, 18(11), 399; https://doi.org/10.3390/e18110399

Submission received: 23 August 2016 / Revised: 1 October 2016 / Accepted: 10 November 2016 / Published: 15 November 2016


Abstract: Constructing a merged concept lattice with formal concept analysis (FCA) is an important research direction in the field of integrating multi-source geo-ontologies. Extracting essential geographical properties and reducing the concept lattice are two key points of previous research. A formal integration method is proposed to address the challenges in these two areas. We first extract essential properties from multi-source geo-ontologies and use FCA to build a merged formal context. Second, the combined importance weight of each single attribute of the formal context is calculated by introducing the inclusion degree importance from rough set theory and information entropy; then a weighted formal context is built from the merged formal context. Third, a combined weighted concept lattice is established from the weighted formal context with FCA, and the importance weight value of every concept is defined as the sum of the weights of the attributes belonging to the concept’s intent. Finally, the semantic granularity of a concept is defined by its importance weight; we then gradually reduce the weighted concept lattice by setting a diminishing threshold of semantic granularity. Additionally, all of the reduced lattices are organized into a regular hierarchy structure based on the threshold of semantic granularity. A workflow is designed to demonstrate this procedure. A case study is conducted to show the feasibility and validity of this method and the procedure to integrate multi-source geo-ontologies.

Keywords: geo-ontologies integration; semantic granularity; formal concept analysis; weighted concept lattice; inclusion degree; information entropy

1. Introduction and Motivation

Effectively integrating extremely large collections of heterogeneous data from multiple sources is a great challenge [1]. This challenge also exists in the field of geographic information science (GIScience). One way to address it is geo-ontologies integration, a formal solution for integrating heterogeneous geo-information data from multiple sources. Ontology, as a standardization procedure or a systematic approach, has been applied in various disciplines, such as information science, knowledge engineering, and artificial intelligence, in order to address heterogeneity issues, although different definitions of ontology have been proposed in the related literature [2,3,4]. Ontology is also defined as “a formal specification of a shared conceptualization” [5]; thus, ontology is used not only as a tool or method for solving heterogeneity problems, but also as a determinate specification. In this paper, geo-ontologies are interpreted as formal specifications applied in various systems or data sources in the field of GIScience. Integration of geo-ontologies is thus the process of solving the problem of semantic heterogeneity, which is a concrete realization of ontology as a standardization procedure or systematic approach.

Various methods of integrating ontologies have been proposed in recent decades. Among these methods, formal concept analysis (FCA) is a key theory introduced to this research field. A merging technique, called FCA-MERGE, was proposed by Stumme et al. [6], which entails building a concept lattice and semi-automatically creating a target ontology from the lattice. Zhu [7] attempted to construct a general semantic level by using a concept lattice in which the redundant concept relations could be removed by a designed algorithm. Huang et al. [8] presented an approach to integrate heterogeneous tourism information by combining ontology with FCA. In the field of GIScience, using the same principle, Kokla et al. [9] were the first to explore integrating geo-ontologies by constructing a unified concept lattice. They conducted an experiment by extracting semantic factors of the category “stream” from three different ontologies (CYC top-level ontology, WordNet, and SDTSan), and then merging them into a unified concept context to generate a concept lattice. Li et al. [10] introduced information entropy to characterize the importance of attributes in order to merge multi-source geo-ontologies by constructing a weighted concept lattice, and gave a set of algorithms to reduce this weighted concept lattice.

In the process of integrating ontologies, extracting essential geographical properties to build a formal context and removing redundant concepts and relations between them in order to reduce the concept lattice are the two key tasks. However, the technologies and methods mentioned above have drawbacks in two aspects. First, most technologies and methods only focus on refining essential properties, but always assume that they have equal importance in the ontology and fail to take difference in importance into account. Li et al. [10] used information entropy to characterize the importance of properties, but entropy-based weight does not measure the semantic difference between properties. Second, integration methods that conduct a reduction of the concept lattice involve a great deal of randomness. More domain experts are needed in the reduction process. Thus, in this paper our work mainly focuses on addressing these two problems. First, a unified formal context is built by extracting objects and essential attributes from different ontologies. Second, the combined importance weight of each attribute in the formal context is calculated by using the inclusion degree importance introduced from rough set theory, and information entropy, from information science, and a weighted formal context is constructed. Third, a weighted concept lattice is constructed on the basis of the partial order relation between weighted concepts generated from the weighted formal context. Finally, semantic granularity of the concept [11] is introduced and we define the combined weight of a concept as its semantic granularity, and reduce the weighted concept lattice according to the different thresholds of semantic granularity. A workflow has been designed to demonstrate this procedure.

The structure of this paper is organized as follows: Section 2 gives a brief introduction to the formal concept analysis and introduces some semantic representations of geo-ontologies; Section 3 discusses rough set theory and introduces the inclusion degree importance in the decision table; Section 4 describes the method of calculating the combined importance weight of a single attribute and constructing and reducing the weighted concept lattice; Section 5 provides a case of integrating multi-source geo-ontologies; and the conclusion and outlook are given in Section 6.

2. A Semantic Representation of Geographical Concepts with Formal Concept Analysis

2.1. Concept and Concept Lattices

Formal concept analysis (FCA) [12], proposed by Wille in 1982, quickly developed into an important branch of applied mathematics. Its application has expanded into a number of fields, such as psychology, sociology, medicine, linguistics, information science, software engineering, and computer science [13,14,15]. Given a domain of interest, FCA essentially helps users build a conceptual framework of hierarchies, which is called a concept lattice, from a formal context in order to analyze, visualize, and share domain concepts and data. A formal concept in FCA is defined as a tuple which consists of a set of objects and a set of attributes. All concepts can be organized into a concept lattice using a partial order relation.

Definition 1.

In FCA, a formal context is described by a triple

K = (G, M, I)

, where G and M represent two nonempty sets of objects (called extent) and attributes (called intent), respectively, and I is a subset of the Cartesian product of G and M (

I \subseteq G \times M

). When

g \in G

and

m \in M

, if

g I m

, it means that object g has the attribute m, and that attribute m belongs to object g.

Given two sets,

A \subseteq G

and

B \subseteq M

, in a formal context

K = (G, M, I)

, if every element (object) in A contains all of the attributes in B and every element (attribute) in B belongs to all of the objects in A, respectively, as denoted with following operations:

A = B^{'} = {a \in G | \forall b \in B, a I b},

(1)

B = A^{'} = {b \in M | \forall a \in A, a I b},

(2)

then, the tuple

(A, B)

is a formal concept of K if

A = B^{'}

and

B = A^{'}

.

In a formal context, K, given two concepts,

(A_{1}, B_{1})

and

(A_{2}, B_{2})

, we can establish a hierarchical relation

(\leq)

between them, and

(A_{1}, B_{1})

is called a sub-concept of

(A_{2}, B_{2})

while

(A_{2}, B_{2})

is called a super-concept of

(A_{1}, B_{1})

, if they satisfy following condition:

(A_{1}, B_{1}) \leq (A_{2}, B_{2}) ⟺ A_{1} \subseteq A_{2} ⟺ B_{2} \subseteq B_{1} .

(3)

Condition (3) defines a hierarchical relation on the concepts of context K, although two given concepts need not be comparable. Consider the set of all concepts of K, indicated as

ℒ (G, M, I)

; then

(ℒ, \leq)

is called a concept lattice which is a complete lattice. By the definition of Condition (3), the relation

\leq

is a partial order relation.
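The derivation operators in Equations (1) and (2) and the order in Condition (3) can be illustrated with a minimal Python sketch. The three-object context below is hypothetical and is not taken from Table 1; the function names are likewise illustrative. The enumeration closes every attribute subset, which is exponential and only suitable for toy contexts; it is not the fast algorithm of [19].

```python
from itertools import combinations

# Hypothetical toy context (not Table 1): object -> set of attributes it has.
CONTEXT = {
    "g1": {"x", "y"},
    "g2": {"x"},
    "g3": {"y", "z"},
}
ATTRIBUTES = {"x", "y", "z"}


def intent_of(objects):
    """A' as in Equation (2): attributes shared by every object in A."""
    common = set(ATTRIBUTES)
    for g in objects:
        common &= CONTEXT[g]
    return common


def extent_of(attributes):
    """B' as in Equation (1): objects that have every attribute in B."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}


def all_concepts():
    """Close every attribute subset B into the concept (B', B'')."""
    found = set()
    attrs = sorted(ATTRIBUTES)
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            extent = extent_of(set(subset))
            intent = intent_of(extent)
            found.add((frozenset(extent), frozenset(intent)))
    return found


def is_subconcept(c1, c2):
    """Condition (3): (A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


concepts = all_concepts()
```

In this toy context the pair ({g1}, {x, y}) is a sub-concept of ({g1, g2}, {x}): its extent is smaller and its intent is larger, as Condition (3) requires.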

2.2. A Semantic Representation of Geographical Concepts

How to select reasonable attributes in order to represent geographical entities (objects) is a crucial problem in building a formal context. This process is the basis of processing semantic integration. Originating from philosophical categories of identity, unity, essence, and dependence, Guarino et al. [16] presented a set of meta-properties, including rigid property, non-rigid property, anti-rigid property, semi-rigid property, carrier identity, and external dependence, to describe and classify the essential properties of a concept. Kavouras et al. [17], focusing on the definitions of categories in geographic ontologies, proposed that semantic properties, such as purpose, location, cover, and semantic relations, such as hypernym, part of, and has part, are essential attributes in distinguishing geographic categories. Wang et al. [18] proposed a universal standard in which essential attributes of geographic ontologies can be classified into seven categories: space, time, cause, material, function, object, and metrics. Based on this standard, Li et al. [10] made efforts to build a merged formal context from the specifications of fundamental geographic information and the specifications of water conservancy projects in China. Based on the above-mentioned research, this paper takes advantage of geographic ontologies and their essential attributes, which are extracted from GB/T 13923-2006 (specifications for feature classification and codes of fundamental geographic information), GB/T 20258.1-2007 (data dictionary for fundamental geographic information features) and SL 213-98 (specification of basic information coding of water conservancy projects), in China, based on the principle proposed by the authors of [10]. Table 1 presents partial objects of inland hydrological concepts and their essential attributes; and this table is actually a formal context in formal concept analysis.

The formal concepts of the formal context recorded in Table 1 can be calculated with a fast algorithm for building lattices [19], and all concepts are listed as follows:

$C_{0} = ({s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}, s_{7}, s_{8}}; \emptyset)$ ;
$C_{1} = ({s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}, s_{7}}; {a})$ ;
$C_{2} = ({s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}, s_{8}}; {h})$ ;
$C_{3} = ({s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}}; {a, h})$ ;
$C_{4} = ({s_{1}, s_{3}, s_{4}, s_{5}}; {a, c, h})$ ;
$C_{5} = ({s_{2}, s_{6}, s_{7}, s_{8}}; {d})$ ;
$C_{6} = ({s_{1}, s_{2}, s_{3}, s_{6}}; {a, f, h, o})$ ;
$C_{7} = ({s_{2}, s_{6}, s_{7}}; {a, d})$ ;
$C_{8} = ({s_{4}, s_{5}, s_{7}}; {a, e})$ ;
$C_{9} = ({s_{2}, s_{6}, s_{8}}; {d, h})$ ;
$C_{10} = ({s_{1}, s_{2}, s_{4}}; {a, h, j})$ ;
$C_{11} = ({s_{6}, s_{7}, s_{8}}; {d, n})$ ;
$C_{12} = ({s_{4}, s_{5}}; {a, c, e, h, l, m})$ ;
$C_{13} = ({s_{1}, s_{3}}; {a, c, f, h, o})$ ;
$C_{14} = ({s_{2}, s_{6}}; {a, d, f, h, o})$ ;
$C_{15} = ({s_{1}, s_{4}}; {a, c, h, j})$ ;
$C_{16} = ({s_{1}, s_{2}}; {a, f, h, j, o})$ ;
$C_{17} = ({s_{3}, s_{5}}; {a, c, h, k})$ ;
$C_{18} = ({s_{6}, s_{7}}; {a, d, n})$ ;
$C_{19} = ({s_{6}, s_{8}}; {d, h, n})$ ;
$C_{20} = ({s_{8}}; {b, d, g, h, n})$ ;
$C_{21} = ({s_{7}}; {a, d, e, i, n})$ ;
$C_{22} = ({s_{2}}; {a, d, f, h, j, o})$ ;
$C_{23} = ({s_{4}}; {a, c, e, h, j, l, m})$ ;
$C_{24} = ({s_{1}}; {a, c, f, h, j, o})$ ;
$C_{25} = ({s_{3}}; {a, c, f, h, k, o})$ ;
$C_{26} = ({s_{5}}; {a, c, e, h, k, l, m})$ ;
$C_{27} = ({s_{6}}; {a, d, f, h, n, o})$ ;
$C_{28} = (\emptyset; {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o})$ .

With the complete set of concepts generated from the formal context in Table 1, we can construct the concept lattice shown in Figure 1 based on the partial order relation. This has proved to be an effective approach to partially tackle the problem of semantic heterogeneity in various domain ontologies. However, there are two main deficiencies in this solution. First, as the formal context grows, the size of the concept lattice and the time and space complexity of building it can increase exponentially. Second, essential attributes receive the same importance weight, although they should be assigned different weights. To tackle these two problems, this paper proposes assigning weights to attributes based on the inclusion degree and information entropy, and reducing the merged concept lattice in accordance with the semantic granularity of concepts. The formal context data in Table 1 are cited from [10]; however, there are some mistakes in the original formal context data. For example, the object ‘ground river’ satisfies the attribute ‘cause/artificial’ in Table 3 of [10], but a ground river is clearly a naturally formed geographical element. Additionally, the set of formal concepts and the concept lattice generated from Table 3 in Reference [10] are incorrect. The correct, complete formal concepts and the concept lattice are given above and in Figure 1, respectively.

3. An Attribute Importance Definition Based on the Inclusion Degree in Rough Set Theory

In Section 2, all attributes in a concept context are considered to hold an equal degree of importance in the construction of a concept lattice. However, it is intuitive and reasonable that the weight coefficients of attributes differ considerably in classification and integration problems, especially when the formal context grows larger (containing more objects and attributes) and the reduction of the concept lattice becomes necessary. Identifying the importance weight of each attribute has therefore become an important and valuable task. Information entropy has been introduced to assign weights to attributes in some cases [20,21,22], and Li et al. [10] used it as the basis of a workflow to build an entropy-based, weighted concept lattice. Weight based on information entropy considers the diversity of the attributes, but ignores the relations among them. For instance, in the formal context in Table 1, the ‘a’ and ‘b’ fields, which represent water and soil or stone, respectively, belong to the same property, ‘material’, while the ‘c’ and ‘d’ fields belong to the property ‘cause’; thus, the weight difference between the ‘a’ and ‘b’ attributes is quite distinct from that between the ‘a’ and ‘c’ attributes, which is not considered by entropy weight. This paper divides weight into two levels, property and attribute, which are calculated by the importance of the inclusion degree and by information entropy, respectively.

Consider that each of the objects in Table 1 is a category of geographic features in GB/T 13923-2006, and some of them belong to the same super-category; for instance, both ‘ground river’ and ‘seasonal river’ are sub-categories of ‘river’. According to the classification standard in GB/T 13923-2006, we present a method to calculate the weight of each property with a rough set decision table, which can be transformed from the formal context table. In this paper, in order to avoid confusion, we describe the attribute of an object as a ‘property’ in the decision table. Table 3 contains the conditional properties, which are converted from Table 1, and the decision property, which is taken from GB/T 13923-2006.

3.1. Necessary Knowledge of Rough Set Theory

First proposed in the early 1980s by the mathematician Pawlak, rough set theory has become an important tool for data processing in many fields, including information retrieval, data mining, text classification, and pattern recognition [23,24,25,26]. Determining the weight of attribute importance through a decision table is a foundational and worthwhile application of rough set theory. Some necessary knowledge of rough set theory is introduced as follows; for more background on rough set theory, we refer the reader to References [27,28].

Let

S = (U, C \cup D, V, f)

be a decision table in which

C

is the set of condition properties and

D

is the set of decision properties, if

B \subseteq C

,

{sig}_{C D} (B)

denotes the importance of

U / D

with respect to B:

{sig}_{C D} (B) = γ_{C} (D) - γ_{C - B} (D),

(4)

particularly, when

B = {a}

; the importance of

U / D

with respect to

a

is:

{sig}_{C D} (a) = γ_{C} (D) - γ_{C - {a}} (D),

(5)

where

γ_{C} (D) = | {POS}_{C} (D) | / | U |

, in which

{POS}_{C} (D)

is the positive region of the partition

U / D

[28].
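Equations (4) and (5) can be illustrated with a minimal Python sketch. The four-row decision table below is hypothetical (it is not Table 3), and the helper names are illustrative. The positive region is computed in the usual rough set way, as the union of the condition blocks that lie entirely inside one decision block.

```python
from collections import defaultdict

# Hypothetical decision table: conditional properties p and q, decision property d.
TABLE = {
    "u1": {"p": 0, "q": 0, "d": "river"},
    "u2": {"p": 0, "q": 1, "d": "lake"},
    "u3": {"p": 1, "q": 0, "d": "river"},
    "u4": {"p": 1, "q": 1, "d": "lake"},
}


def partition(props):
    """U / props: group objects that agree on every property in props."""
    blocks = defaultdict(set)
    for u, row in TABLE.items():
        blocks[tuple(row[p] for p in props)].add(u)
    return list(blocks.values())


def positive_region(cond, dec):
    """POS_cond(dec): union of blocks of U/cond contained in a block of U/dec."""
    dec_blocks = partition(dec)
    pos = set()
    for block in partition(cond):
        if any(block <= d for d in dec_blocks):
            pos |= block
    return pos


def gamma(cond, dec):
    """gamma_cond(dec) = |POS_cond(dec)| / |U|."""
    return len(positive_region(cond, dec)) / len(TABLE)


def sig(all_cond, dec, a):
    """Equation (5): drop in gamma when property a is removed from C."""
    rest = [c for c in all_cond if c != a]
    return gamma(all_cond, dec) - gamma(rest, dec)


sig_p = sig(["p", "q"], ["d"], "p")
sig_q = sig(["p", "q"], ["d"], "q")
```

In this toy table the decision depends only on q, so removing p leaves the positive region unchanged, while removing q empties it.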

3.2. Attribute Importance Based on Inclusion Degree

In Section 3.1,

{sig}_{C D} (a)

is a very valuable and useful characteristic parameter to describe the importance of property

a

in the property reduction of the decision table. However, this parameter can lose its ability to distinguish the importance of different properties when unnecessary properties exist in the decision table, or when the decision table has more than one reduction result. Ji et al. [29] proposed a new definition of property importance based on the inclusion degree, which partially solves this problem.

Given two sets, X and Y, we define a function as follows:

con (X / Y) = \begin{cases} 0, & X \not\subseteq Y; \\ 1, & X \subseteq Y. \end{cases}

(6)

Definition 2.

Let

S = (U, A, V, f)

be an information system,

A_{1}, A_{2} \subseteq A

, so

U / A_{1} = {X_{1}, X_{2} \dots X_{n}}

and

U / A_{2} = {Y_{1}, Y_{2} \dots Y_{m}}

are partitions of U, respectively; we denote

CON (A_{1} / A_{2})

to be the inclusion degree of

U / A_{2}

to

U / A_{1}

:

CON (A_{1} / A_{2}) = \sum_{\substack{1 \leq i \leq n \\ 1 \leq j \leq m}} con (X_{i} / Y_{j}) .

(7)

Obviously, $0 \leq CON (A_{1} / A_{2}) \leq n$. In particular, if $A_{1}$ is smaller than $A_{2}$, which means that $\forall X_{i}, \exists Y_{j}$ such that $X_{i} \subseteq Y_{j}$, then $CON (A_{1} / A_{2})$ takes the maximum value n.
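As a small illustration of Equations (6) and (7), the following Python sketch counts the block inclusions between two partitions. The two partitions of a six-element universe are hypothetical, and the function names are illustrative.

```python
def con(x, y):
    """Equation (6): 1 if X is a subset of Y, otherwise 0."""
    return 1 if x <= y else 0


def inclusion_degree(partition_1, partition_2):
    """Equation (7): number of pairs (X_i, Y_j) with X_i contained in Y_j."""
    return sum(con(x, y) for x in partition_1 for y in partition_2)


# Hypothetical partitions of U = {1, ..., 6}; the first refines the second.
fine = [{1}, {2, 3}, {4, 5, 6}]
coarse = [{1, 2, 3}, {4, 5, 6}]

fine_in_coarse = inclusion_degree(fine, coarse)
coarse_in_fine = inclusion_degree(coarse, fine)
```

Because every block of the finer partition lies inside some block of the coarser one, the first value equals the number of blocks n = 3. In the opposite direction only the shared block {4, 5, 6} is counted.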

Definition 3.

Let

S = (U, C \cup D, V, f)

be a decision table, property

a \in C

;

{SIG}_{C D} (a)

denotes the property importance of the inclusion degree:

{SIG}_{C D} (a) = ({| U |}^{2} {sig}_{C D} (a) + | U | p_{a} - q_{a} + 1) / ({| U |}^{2} + | U | + 1),

(8)

where

p_{a} = (| {POS}_{a} (D) | + CON (a / D) / | U |) / (| U | + 1),

q_{a} = (| {POS}_{D} (a) | + CON (D / a) / | U |) / (| U | + 1),

p_{a}

and

q_{a}

represent the influence of the positive region and the inclusion to the property importance, respectively.
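Equation (8) can be traced with a minimal Python sketch on a hypothetical four-row decision table (not Table 3). Two readings are assumptions of this sketch rather than statements from the text: POS_a(D) is taken as the positive region of U/D with respect to the single property a, and POS_D(a) as the positive region of U/a with respect to D. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical decision table: conditional properties p and q, decision property d.
TABLE = {
    "u1": {"p": 0, "q": 0, "d": "river"},
    "u2": {"p": 0, "q": 1, "d": "lake"},
    "u3": {"p": 1, "q": 0, "d": "river"},
    "u4": {"p": 1, "q": 1, "d": "lake"},
}
CONDITIONS = ["p", "q"]
DECISION = ["d"]


def partition(props):
    """U / props: group objects that agree on every property in props."""
    blocks = defaultdict(set)
    for u, row in TABLE.items():
        blocks[tuple(row[p] for p in props)].add(u)
    return list(blocks.values())


def inclusion_degree(props_1, props_2):
    """CON(A1 / A2), Equation (7): count of blocks X_i inside blocks Y_j."""
    return sum(x <= y for x in partition(props_1) for y in partition(props_2))


def pos_size(cond, dec):
    """|POS_cond(dec)|: total size of blocks of U/cond inside a block of U/dec."""
    dec_blocks = partition(dec)
    return sum(len(x) for x in partition(cond) if any(x <= y for y in dec_blocks))


def sig(a):
    """Equation (5) for a single conditional property a."""
    rest = [c for c in CONDITIONS if c != a]
    n = len(TABLE)
    return pos_size(CONDITIONS, DECISION) / n - pos_size(rest, DECISION) / n


def SIG(a):
    """Equation (8), with p_a and q_a as defined under it."""
    n = len(TABLE)
    p_a = (pos_size([a], DECISION) + inclusion_degree([a], DECISION) / n) / (n + 1)
    q_a = (pos_size(DECISION, [a]) + inclusion_degree(DECISION, [a]) / n) / (n + 1)
    return (n**2 * sig(a) + n * p_a - q_a + 1) / (n**2 + n + 1)


importance = {a: SIG(a) for a in CONDITIONS}
```

In this toy table the decision is determined by q alone, so q receives a value close to 1 and p a value close to 0, and both stay inside the interval [0, 1].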

Theorem 1.

The property importance of the inclusion degree

{SIG}_{C D} (a)

satisfies the following properties:

1. $0 \leq {SIG}_{C D} (a) \leq 1$ ;
2. If ${SIG}_{C D} (a) = 0$ , then ${sig}_{C D} (a) = 0$ must be true, but the converse does not necessarily hold; if ${SIG}_{C D} (a) = 1$ , then ${sig}_{C D} (a) = 1$ must be true, but again the converse does not necessarily hold;
3. If ${sig}_{C D} (a) > {sig}_{C D} (b)$ , then ${SIG}_{C D} (a) > {SIG}_{C D} (b)$ , but the converse does not necessarily hold; and
4. If $U / a$ is smaller than $U / b$ , which means that $\forall X_{i} \in U / a, \exists Y_{j} \in U / b$ such that $X_{i} \subseteq Y_{j}$ , then ${SIG}_{C D} (a) \geq {SIG}_{C D} (b)$ .

4. Construction of a Combined Weighted Concept Lattice with Inclusion Degree Importance and Information Entropy

In Section 3, we assigned a weight to each property of the decision table (Table 3) according to the importance of the inclusion degree. For example, let the conditional properties be $C =$ {material, cause, spatial morphology, spatial location, time, material state, function} and the decision property be $D =$ {category}; then ${SIG}_{C D} (material) = ({| U |}^{2} {sig}_{C D} (material) + | U | p_{material} - q_{material} + 1) / ({| U |}^{2} + | U | + 1) = 0.014$. However, a property in the decision table usually covers more than one attribute of the formal context from which the decision table is derived. Inspired by the viewpoint of information entropy [10,20,21,22], we build on the importance of the inclusion degree and introduce information entropy to characterize the combined weights of the various attributes that belong to the same property in the decision table.

4.1. Information Entropy

According to Shannon’s information theory, information entropy is established as a key information measure [20]. Li et al. [10] introduce information entropy into the field of formal concept analysis to quantify the weight of the concepts’ intent.

Definition 4.

Let

K = (G, M, I)

be a formal context, where

G = {g_{1}, g_{2}, \dots, g_{n}}

and

M = {m_{1}, m_{2}, \dots, m_{k}}

, then

p (m / g)

denotes the probability of object

g

possessing the corresponding attribute

m

, and

E (m)

is the average information of attribute

m

provided by

G

, the set of objects. We, thus, can compute

E (m_{j}) (1 \leq j \leq k)

as the following formula [10,22] shows:

E (m_{j}) = - \sum_{i = 1}^{n} p (m_{j} / g_{i}) \log_{2} p (m_{j} / g_{i}) .

(9)
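Equation (9) can be evaluated with a short Python sketch. How the probabilities p(m_j / g_i) are obtained from the formal context is not spelled out here, so the sketch simply takes them as input; the example values are hypothetical. Terms with zero probability are skipped, following the usual convention that 0 log 0 = 0.

```python
import math


def attribute_entropy(probabilities):
    """Equation (9): E(m_j) = -sum_i p(m_j / g_i) * log2 p(m_j / g_i)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical probabilities of one attribute over four objects.
uniform = attribute_entropy([0.25, 0.25, 0.25, 0.25])
two_objects = attribute_entropy([0.5, 0.5, 0.0, 0.0])
```

With these inputs, an attribute spread evenly over four objects yields 2 bits, and one spread over two objects yields 1 bit.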

4.2. Combined Weight Based on Attribute Importance and Information Entropy

As described in Section 3, given a decision table

S = (U, C \cup D, V, f)

, we can quantify the inclusion degree importance of each conditional property by the inclusion degree method.

Definition 5.

Let

K = (G, M, I)

be a formal context, which generates the conditional properties of the decision table

S

, where

G = {g_{1}, g_{2}, \dots, g_{n}}

and

M = {m_{1}, m_{2}, \dots, m_{k}}

,

S I G (m_{j})

denotes the importance of the inclusion degree of attribute

m_{j}

based on decision table

S

:

SIG (m_{j}) = \frac{1}{| [m_{j}] |} {SIG}_{C D} ([m_{j}]),

(10)

where

[m_{j}] \in C

represents one of the conditional properties in decision table

S

, one of whose possible values is

m_{j}

, and

| [m_{j}] |

is the number of nonempty values of

[m_{j}]

. For example, in the formal context (Table 1),

a

is the attribute representing water, and

[a]

denotes the property ‘material’, written as

[a] = [b] = a b

in Table 3 and

| [a] | = | [b] | = 2

.

Definition 6.

Let

K = (G, M, I)

be a formal context, where

G = {g_{1}, g_{2}, \dots, g_{n}}

and

M = {m_{1}, m_{2}, \dots, m_{k}}

, and

w_{m_{j}} (1 \leq j \leq k)

is the combined weight of attribute importance and information entropy of attribute

m_{j}

:

w_{m_{j}} = SIG (m_{j}) * E (m_{j}) / (\sum_{i = 1}^{k} SIG (m_{i}) * E (m_{i})) .

(11)
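Equations (10) and (11) can be combined in a minimal Python sketch. The grouping of attributes a, b under ‘material’ and c, d under ‘cause’ follows the text, but the property importance values and entropy values below are hypothetical placeholders, not the values of Tables 4 and 5. The function name is illustrative.

```python
def combined_weights(property_sig, property_attrs, entropy):
    """Return the normalized combined weight w_m of every attribute m."""
    sig_attr = {}
    for prop, attrs in property_attrs.items():
        for m in attrs:
            # Equation (10): share the property importance among its attributes.
            sig_attr[m] = property_sig[prop] / len(attrs)
    raw = {m: sig_attr[m] * entropy[m] for m in sig_attr}
    total = sum(raw.values())
    # Equation (11): normalize so that all attribute weights sum to 1.
    return {m: value / total for m, value in raw.items()}


# Hypothetical inputs for two properties and four attributes.
property_sig = {"material": 0.2, "cause": 0.4}
property_attrs = {"material": ["a", "b"], "cause": ["c", "d"]}
entropy = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 2.0}

weights = combined_weights(property_sig, property_attrs, entropy)
```

Attributes a and b share the same property importance, so their weights differ only through their entropy; c and d have equal entropy here and therefore equal weights.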

4.3. Construction of Weighted Concept Lattice

With the inclusion degree importance and information entropy, the weight value of each attribute of a formal context is computed using Equation (11); then we can define a weighted formal context and the combined weighted formal concepts generated from it.

Definition 7.

Given the formal context

K = (G, M, I)

and its derived decision table

S = (U, C \cup D, V, f)

in which

D

is the set of decision properties assigned by a given classification standard, a weighted formal context is defined to be

K' = (G, M, I, W)

, where

W

is the set of combined weight values of each attribute belonging to

M

, based on inclusion degree importance and information entropy.

Definition 8.

Given the weighted formal context

K' = (G, M, I, W)

, if the tuple

(A, B)

is a formal concept generated from the normal formal context

K = (G, M, I)

using Equations (1) and (2) in Section 2.1, the triple

C = (A, B, w_{C})

denotes the weighted concept based on inclusion degree importance and information entropy in which

w_{C}

is the weight of this concept:

w_{C} = \sum_{i = 1}^{n} w_{b_{i}},

(12)

where

B = {b_{1}, b_{2}, \dots, b_{n}}

and

w_{b_{i}} \in W

.
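Equation (12) amounts to a sum over the intent, as the following Python sketch shows. The three attribute weights are read off the concepts with single-attribute intents listed in Section 5 ($C_{1}$, $C_{2}$ and $C_{5}$); the function name is illustrative, and small differences from the published concept weights are due to rounding of the attribute weights.

```python
# Attribute weights taken from the single-attribute concepts C1, C2, C5 in Section 5.
ATTRIBUTE_WEIGHTS = {"a": 0.0107, "h": 0.0198, "d": 0.0185}


def concept_weight(intent, weights):
    """Equation (12): sum of the weights of the attributes in the intent."""
    return sum(weights[b] for b in intent)


w_a_d = concept_weight({"a", "d"}, ATTRIBUTE_WEIGHTS)  # intent of C7
w_d_h = concept_weight({"d", "h"}, ATTRIBUTE_WEIGHTS)  # intent of C9
w_a_h = concept_weight({"a", "h"}, ATTRIBUTE_WEIGHTS)  # intent of C3
w_top = concept_weight(set(), ATTRIBUTE_WEIGHTS)       # empty intent of C0
```

The sums for {a, d} and {d, h} reproduce the weights reported for $C_{7}$ and $C_{9}$, and the empty intent gives weight 0, as for $C_{0}$.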

Using Equation (12) as a foundation, we constructed a weighted concept lattice (Figure 2) from the classic concept lattice demonstrated in Figure 1, where each formal concept weight value is noted next to its corresponding concept node. In the weighted concept lattice, each concept is given a weight value to identify its relative significance. We, thus, can use the theory of granular computing to reduce the weighted concept lattice so as to satisfy the diversified requirements for constructing various concept lattices of different granular hierarchies.

4.4. Reduction of the Weighted Concept Lattice

The new method for calculating the weights of concepts proposed in this paper considers the significant differences among the attributes and quantifies them using inclusion degree importance and information entropy. On the basis of the complete weighted concept lattice, many further analyses can be implemented, such as formal concept similarity analysis and lattice hierarchy classification analysis. In this paper, we focus on reducing the weighted concept lattice by removing fine-grained concepts, which carry more specific characteristics that distinguish them from others. The remaining, relatively coarse-grained concepts contain only common generalities, and from them a reduced weighted concept lattice is reconstructed at a given granularity.

According to Equation (12), a concept weight value

w

is the sum of the weight values of each attribute belonging to the concept intent. Concept weight value

w

measures how many specific characteristics a concept includes; that is, the larger a concept’s weight is, the finer its granularity. From the point of view of granular computing theory, we define the semantic granularity of a concept as its combined weight value

w

. Then, we define the quantity

θ (0 \leq θ \leq 1)

which can be regarded as the threshold of concept semantic granularity as well as that of its combined weight. We reduce the complete weighted concept lattice with the following procedure: Given a weighted formal context

K' = (G, M, I, W)

, let

C

be the set of all formal concepts generated from

K'

,

\forall C_{i} \in C

if

w_{C_{i}} \geq θ

and

0 < w_{C_{i}} < 1

,

C_{i}

is removed from

C

. Then we use the remaining concepts to construct a new concept lattice based on the partial order relation among them; the new lattice is the reduction of the original weighted concept lattice. It is necessary and reasonable to add the constraint

0 < w_{C_{i}} < 1

, because the concepts with maximal extent and maximal intent, whose weights are 0 and 1, then remain in the lattice, which maintains the completeness of the reduced lattice.
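The reduction procedure can be sketched in Python as a filter followed by a recomputation of the covering relation among the remaining concepts. The extents and weights below are those of eight concepts listed in Section 5; because only a subset of the lattice is used, the resulting edges are covering pairs within this subset, not the full diagram of Figure 2. The function names and data layout are illustrative assumptions.

```python
# Eight weighted concepts from Section 5 (objects s1..s8 written as 1..8).
CONCEPTS = {
    "C0": {"extent": frozenset(range(1, 9)), "weight": 0.0},
    "C1": {"extent": frozenset(range(1, 8)), "weight": 0.0107},
    "C2": {"extent": frozenset({1, 2, 3, 4, 5, 6, 8}), "weight": 0.0198},
    "C3": {"extent": frozenset(range(1, 7)), "weight": 0.0306},
    "C12": {"extent": frozenset({4, 5}), "weight": 0.4419},
    "C23": {"extent": frozenset({4}), "weight": 0.4622},
    "C26": {"extent": frozenset({5}), "weight": 0.4610},
    "C28": {"extent": frozenset(), "weight": 1.0},
}


def reduce_concepts(concepts, theta):
    """Remove concepts with theta <= weight and 0 < weight < 1."""
    return {
        name: c
        for name, c in concepts.items()
        if not (c["weight"] >= theta and 0 < c["weight"] < 1)
    }


def cover_relation(concepts):
    """Pairs (lower, upper) with no remaining concept strictly between them."""
    edges = []
    for low, c_low in concepts.items():
        for up, c_up in concepts.items():
            if c_low["extent"] < c_up["extent"]:
                between = any(
                    c_low["extent"] < c_mid["extent"] < c_up["extent"]
                    for c_mid in concepts.values()
                )
                if not between:
                    edges.append((low, up))
    return sorted(edges)


reduced = reduce_concepts(CONCEPTS, theta=0.4)
edges = cover_relation(reduced)
```

With a threshold of 0.4, the three fine-grained concepts are dropped, and the bottom concept is reattached directly below the remaining coarse-grained concept $C_{3}$.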

From the discussion above, we designed a workflow to implement the integration of multi-source geo-ontologies (Figure 3) as follows. Our main study in this paper focuses on two steps of the workflow. One is to generate the weighted concept lattice from the concept lattice based on the weight of the inclusion degree importance and information entropy. The other is to reduce the weighted concept lattice in terms of different levels of conceptual semantic granularities.

In the following section, an experimental case presents the implementation process of integrating geographic ontologies from different sources.

5. A Case Study

In Section 2, all formal concepts and the concept lattice generated from the formal context (Table 1) were computed and presented. Then, in Section 3, a decision table was introduced from rough set theory to represent the formal context in another way. Using Equation (8), the weight of each property in the decision table is calculated, as listed in Table 4. Then, in Table 5, we compute the weight value of each attribute of the formal context by substituting the value of each $SIG (property)$ in Table 4 and each attribute’s information entropy, which is calculated using Equation (9).

By using Equations (11) and (12), substituted with the weight value of each single attribute in Table 5, we can compute each formal concept’s combined weight value, from which a weighted concept lattice is constructed with complete weighted formal concepts. The weighted concepts and weighted concept lattice are demonstrated as follows:

$C_{0} = ({s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}, s_{7}, s_{8}}; \emptyset; 0)$ ;
$C_{1} = ({s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}, s_{7}}; {a}; 0.0107)$ ;
$C_{2} = ({s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}, s_{8}}; {h}; 0.0198)$ ;
$C_{3} = ({s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}}; {a, h}; 0.0306)$ ;
$C_{4} = ({s_{1}, s_{3}, s_{4}, s_{5}}; {a, c, h}; 0.0490)$ ;
$C_{5} = ({s_{2}, s_{6}, s_{7}, s_{8}}; {d}; 0.0185)$ ;
$C_{6} = ({s_{1}, s_{2}, s_{3}, s_{6}}; {a, f, h, o}; 0.2470)$ ;
$C_{7} = ({s_{2}, s_{6}, s_{7}}; {a, d}; 0.0292)$ ;
$C_{8} = ({s_{4}, s_{5}, s_{7}}; {a, e}; 0.0485)$ ;
$C_{9} = ({s_{2}, s_{6}, s_{8}}; {d, h}; 0.0383)$ ;
$C_{10} = ({s_{1}, s_{2}, s_{4}}; {a, h, j}; 0.0509)$ ;
$C_{11} = ({s_{6}, s_{7}, s_{8}}; {d, n}; 0.2091)$ ;
$C_{12} = ({s_{4}, s_{5}}; {a, c, e, h, l, m}; 0.4419)$ ;
$C_{13} = ({s_{1}, s_{3}}; {a, c, f, h, o}; 0.2641)$ ;
$C_{14} = ({s_{2}, s_{6}}; {a, d, f, h, o}; 0.2641)$ ;
$C_{15} = ({s_{1}, s_{4}}; {a, c, h, j}; 0.0931)$ ;
$C_{16} = ({s_{1}, s_{2}}; {a, f, h, j, o}; 0.2660)$ ;
$C_{17} = ({s_{3}, s_{5}}; {a, c, h, k}; 0.0682)$ ;
$C_{18} = ({s_{6}, s_{7}}; {a, d, n}; 0.2198)$ ;
$C_{19} = ({s_{6}, s_{8}}; {d, h, n}; 0.2290)$ ;
$C_{20} = ({s_{8}}; {b, d, g, h, n}; 0.2527)$ ;
$C_{21} = ({s_{7}}; {a, d, e, i, n}; 0.3017)$ ;
$C_{22} = ({s_{2}}; {a, d, f, h, j, o}; 0.2845)$ ;
$C_{23} = ({s_{4}}; {a, c, e, h, j, l, m}; 0.4622)$ ;
$C_{24} = ({s_{1}}; {a, c, f, h, j, o}; 0.2845)$ ;
$C_{25} = ({s_{3}}; {a, c, f, h, k, o}; 0.2833)$ ;
$C_{26} = ({s_{5}}; {a, c, e, h, k, l, m}; 0.4610)$ ;
$C_{27} = ({s_{6}}; {a, d, f, h, n, o}; 0.4548)$ ;
$C_{28} = (\emptyset; {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o}; 1)$ .

The new method for calculating the weights of concepts proposed in this paper takes into consideration the significant differences among the attributes. The weighted concept lattice in Figure 2 is a complete lattice generated from the formal context (Table 1), and the weight of each of its concept nodes is calculated by summing the weights of the attributes that belong to the intent of that node.

Since we have calculated the weight value of each concept node of the concept lattice, with the complete weighted concept lattice in Figure 2 and the procedure of reducing the weighted lattice proposed in Section 4.4, we demonstrate several reduction experiments under various levels of conceptual semantic granularities.

First, we set the semantic granularity threshold to $\theta = 0.4$; the lattice nodes $C_{12}$, $C_{23}$, $C_{26}$ and $C_{27}$ are then removed because their semantic granularity weights are larger than $\theta$. The reduced weighted concept lattice under this $\theta$ is plotted in Figure 4. The objects of $C_{23}$, $C_{26}$ and $C_{27}$ are ‘ground river’, ‘seasonal river’ and ‘reservoir’, respectively, and $C_{12}$ is the direct super-concept of $C_{23}$ and $C_{26}$. This result might seem counterintuitive: according to intuition, ‘ground river’, ‘seasonal river’, ‘lake’ and ‘seasonal lake’ should lie at the same level of conceptual semantic granularity. However, comparing their attributes shows that ‘ground river’ and ‘seasonal river’ carry more specific characteristics, such as the ‘material state/flow’ attribute, which belongs only to rivers. The result is therefore reasonable: the four removed concepts include more specific characteristics and, from the viewpoint of granular computing theory, are relatively finer-grained concepts.

Second, the semantic granularity threshold is set to $\theta = 0.1$, and the lattice nodes whose semantic granularity is larger than 0.1 are removed. The reduced weighted concept lattice under this $\theta$ is plotted in Figure 5. The nodes $C_{6}$, $C_{11}$, $C_{12}$, $C_{13}$, $C_{14}$, $C_{16}$, $C_{18}$, $C_{19}$, $C_{20}$, $C_{21}$, $C_{22}$, $C_{23}$, $C_{24}$, $C_{25}$, $C_{26}$ and $C_{27}$ are removed. All concept nodes that contain only a single object are removed at this threshold. This is understandable: a concept node that contains more than one object retains only those objects’ common attributes and discards the attributes they do not share, so its intent, and hence its weight, is smaller. The reduced weighted concept lattice under $\theta = 0.1$ preserves fewer concept nodes than that under $\theta = 0.4$; more precisely, the set of concepts of the reduced lattice under $\theta = 0.1$ is a subset of that under $\theta = 0.4$. As the threshold $\theta$ is set smaller and smaller, the reduced lattices retain fewer and fewer concept nodes and their structures become simpler and simpler. From the viewpoint of granular computing theory, the structure (Hasse diagram) of the reduced weighted lattice is gradually simplified as $\theta$ decreases, which indicates that the granularity gradually becomes coarser. Reducing the weighted concept lattice step by step by gradually decreasing $\theta$ is thus a stepwise process that refines the generality of the concepts at various levels of granularity. Figure 6 and Figure 7 present the reduced weighted concept lattices under $\theta = 0.04$ and $\theta = 0.02$, respectively.
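The reductions at $\theta = 0.4$ and $\theta = 0.1$ can be reproduced with a simple filter. The sketch below is illustrative only. It uses the weights of nodes $C_{5}$ to $C_{27}$ from the concept listing in this section and leaves out the other nodes, including the bottom node $C_{28}$, which the text does not list among the removed nodes. It also treats reduction as dropping every node whose weight exceeds $\theta$, without rebuilding the order relation of the lattice. The function name `reduce_lattice` is hypothetical.

```python
# Concept weights for C5..C27, keyed by node index (from the listing in this section).
CONCEPT_WEIGHTS = {
    5: 0.0185, 6: 0.2470, 7: 0.0292, 8: 0.0485, 9: 0.0383, 10: 0.0509,
    11: 0.2091, 12: 0.4419, 13: 0.2641, 14: 0.2641, 15: 0.0931, 16: 0.2660,
    17: 0.0682, 18: 0.2198, 19: 0.2290, 20: 0.2527, 21: 0.3017, 22: 0.2845,
    23: 0.4622, 24: 0.2845, 25: 0.2833, 26: 0.4610, 27: 0.4548,
}


def reduce_lattice(weights, theta):
    """Split nodes into (kept, removed); a node is removed if its weight exceeds theta."""
    kept = {node for node, weight in weights.items() if weight <= theta}
    removed = set(weights) - kept
    return kept, removed


kept_04, removed_04 = reduce_lattice(CONCEPT_WEIGHTS, 0.4)
kept_01, removed_01 = reduce_lattice(CONCEPT_WEIGHTS, 0.1)

print("removed at theta = 0.4:", sorted(removed_04))
print("kept at theta = 0.1:   ", sorted(kept_01))
print("nested:", kept_01 <= kept_04)  # nodes kept at 0.1 are also kept at 0.4
```

Because a lower threshold can only remove additional nodes, the kept sets are nested, which matches the subset relation described above.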

6. Conclusions and Outlook

The case study above demonstrates a process for integrating multi-source geo-ontologies. First, heterogeneous ontology semantics from different domains are extracted from the normative definitions or descriptions of geo-concepts (objects) and formalized as a merged object-attribute table, which serves as a unified formal context in formal concept analysis. Second, we compute the combined weight of each attribute in this unified formal context by introducing the inclusion degree importance and information entropy, both of which characterize significant differences among the attributes, but at different levels. Inclusion degree importance, drawn from rough set theory, quantifies the significance weight of ontology properties, such as material or cause in this paper’s case; information entropy, drawn from information theory, differentiates the entropy weights of individual attributes, such as cause/nature and cause/artificial. The combined weights represent the significance of attributes more exactly and comprehensively. Third, a weighted concept lattice is generated from the unified formal context, and the significance weight of every concept node is defined as the sum of the combined weights of the attributes that belong to its intent. From the viewpoint of granular computing theory, this weighted concept lattice contains a hierarchy of granularities, because a concept node’s combined weight is a measure of conceptual particularity: a concept with a larger combined weight is more particular and semantically belongs to a finer-grained level. Finally, we reduce the weighted concept lattice from fine granularity to coarse granularity by gradually decreasing the semantic granularity threshold and removing from the lattice the concepts whose semantic granularity is larger than the threshold. All of the reduced lattices can be organized into a regular hierarchical structure based on the threshold of semantic granularity.

In this study, the inclusion degree importance, based on the decision table from rough set theory, is introduced to measure the significance weight of ontology properties. We chose the classification code of the objects’ super-categories from GB/T 13923-2006 as the decision property (attribute or field) in the decision table. This decision property, from the fundamental geographic information domain, can be regarded as prior classification knowledge that results from experts’ long-term recognition; it therefore makes sense to calculate the significance weights of the conditional properties backwards from this decision property. In future studies, we will attempt to merge multiple decision properties from different domains into the decision table, so that existing classification knowledge from multiple domains can be taken into account.

Reducing the weighted concept lattice and constructing its granularity hierarchies using the combined significance weights of attributes proves to be effective and feasible. Because tree structures of concepts have a very wide range of applications, our future goals also include converting the reduced weighted concept lattices into tree structures. In addition, our future research will focus on extending our methods to other kinds of ontologies.

Acknowledgments

The authors gratefully acknowledge the support of all the members of our research group. This research was financially supported by the National Natural Science Foundation of China (Grant No. 41071290, 41201463).

Author Contributions

Jia Xiao and Zongyi He conceived and designed the workflow and the study; Jia Xiao performed the case experiment, analyzed the data and wrote the paper; Jia Xiao and Zongyi He revised the paper.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. The concept lattice.

Figure 2. The weighted concept lattice.

Figure 3. The workflow of integrating multi-source geo-ontologies.

Figure 4. The reduced weighted concept lattice ($\theta = 0.4$).

Figure 5. The reduced weighted concept lattice ($\theta = 0.1$).

Figure 6. The reduced weighted concept lattice ($\theta = 0.04$).

Figure 7. The reduced weighted concept lattice ($\theta = 0.02$).

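**Table 1.** A formal context from geographic categories definitions.

| SN | Object | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $s_{1}$ | lake | * |  | * |  |  | * |  | * |  | * |  |  |  |  | * |
| $s_{2}$ | pond | * |  |  | * |  | * |  | * |  | * |  |  |  |  | * |
| $s_{3}$ | seasonal lake | * |  | * |  |  | * |  | * |  |  | * |  |  |  | * |
| $s_{4}$ | ground river | * |  | * |  | * |  |  | * |  | * |  | * | * |  |  |
| $s_{5}$ | seasonal river | * |  | * |  | * |  |  | * |  |  | * | * | * |  |  |
| $s_{6}$ | reservoir | * |  |  | * |  | * |  | * |  |  |  |  |  | * | * |
| $s_{7}$ | spillway | * |  |  | * | * |  |  |  | * |  |  |  |  | * |  |
| $s_{8}$ | dike |  | * |  | * |  |  | * | * |  |  |  |  |  | * |  |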

Note: Each letter from “a” to “o” represents attributes in Table 2; * represents a satisfied criterion.
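To illustrate how concept nodes arise from Table 1, the sketch below encodes the formal context as a Python dictionary and applies the two standard derivation operators of formal concept analysis: the attributes shared by a set of objects, and the objects that have a set of attributes. The names `CONTEXT`, `intent_of` and `extent_of` are illustrative assumptions, not identifiers from the paper.

```python
# Formal context of Table 1: object -> set of attribute identifiers.
CONTEXT = {
    "s1": set("acfhjo"),   # lake
    "s2": set("adfhjo"),   # pond
    "s3": set("acfhko"),   # seasonal lake
    "s4": set("acehjlm"),  # ground river
    "s5": set("acehklm"),  # seasonal river
    "s6": set("adfhno"),   # reservoir
    "s7": set("adein"),    # spillway
    "s8": set("bdghn"),    # dike
}
ALL_ATTRIBUTES = set("abcdefghijklmno")


def intent_of(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(ALL_ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def extent_of(attributes):
    """Objects that have every attribute in `attributes`."""
    wanted = set(attributes)
    return {obj for obj, attrs in CONTEXT.items() if wanted <= attrs}


# Closing the object set {s4, s5} gives a formal concept (extent, intent).
intent = intent_of({"s4", "s5"})
extent = extent_of(intent)
print(sorted(extent), sorted(intent))
```

Under these assumptions, the pair obtained for ‘ground river’ and ‘seasonal river’ has the extent and intent that the paper lists for node $C_{12}$.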

**Table 2.** Comparison table between attributes and identifiers.

| Identifier | Attribute |
|---|---|
| a | material/water |
| b | material/soil or stone |
| c | cause/nature |
| d | cause/artificial |
| e | spatial morphology/long strip slot |
| f | spatial morphology/depressions |
| g | spatial morphology/buildings |
| h | spatial location/on the earth |
| i | spatial location/underground |
| j | time/perennial |
| k | time/seasonal |
| l | material state/flow |
| m | function/shipping |
| n | function/prevent flood |
| o | function/store water |

**Table 3.** A decision table converted from Table 1.

| SN | Object | Material | Cause | Spatial Morphology | Spatial Location | Time | Material State | Function | Category |
|---|---|---|---|---|---|---|---|---|---|
| $s_{1}$ | lake | a | c | f | h | j | $\emptyset$ | o | 230000 |
| $s_{2}$ | pond | a | d | f | h | j | $\emptyset$ | o | 230000 |
| $s_{3}$ | seasonal lake | a | c | f | h | k | $\emptyset$ | o | 230000 |
| $s_{4}$ | ground river | a | c | e | h | j | l | m | 210000 |
| $s_{5}$ | seasonal river | a | c | e | h | k | l | m | 210000 |
| $s_{6}$ | reservoir | a | d | f | h | $\emptyset$ | $\emptyset$ | (n, o) | 240000 |
| $s_{7}$ | spillway | a | d | e | i | $\emptyset$ | $\emptyset$ | n | 240000 |
| $s_{8}$ | dike | b | d | g | h | $\emptyset$ | $\emptyset$ | n | 270000 |

Note: Each letter from “a” to “o” has the same meaning as in Table 1, and $\emptyset$ indicates that the object does not have the attribute. The value of the field ‘Category’ is the classification code of the super-category of the object; it is the decision attribute in the decision table. Because the object ‘reservoir’ has two ‘function’ values (n and o) in the formal context (Table 1), we represent its ‘function’ as ‘(n, o)’, which is a new value of the ‘function’ property in the decision table.

**Table 4.** Weight of properties in the decision table.

| Inclusion Degree | Material | Cause | Spatial Morphology | Spatial Location | Time | Material State | Function |
|---|---|---|---|---|---|---|---|
| SIG(property) | 0.0145 | 0.0055 | 0.0162 | 0.0177 | 0.0057 | 0.0266 | 0.0816 |

**Table 5.** Attribute weight value of the formal context.

| Attribute | p(x) | E(x) | SIG(x) | w(x) |
|---|---|---|---|---|
| a | 0.875 | 0.169 | 0.0048 | 0.0107 |
| b | 0.125 | 0.375 | 0.0048 | 0.0238 |
| c | 0.500 | 0.500 | 0.0028 | 0.0185 |
| d | 0.500 | 0.500 | 0.0028 | 0.0185 |
| e | 0.375 | 0.531 | 0.0054 | 0.0379 |
| f | 0.500 | 0.500 | 0.0054 | 0.0357 |
| g | 0.125 | 0.375 | 0.0054 | 0.0268 |
| h | 0.875 | 0.169 | 0.0885 | 0.0198 |
| i | 0.125 | 0.375 | 0.0885 | 0.0440 |
| j | 0.375 | 0.531 | 0.0029 | 0.0203 |
| k | 0.250 | 0.500 | 0.0029 | 0.0191 |
| l | 0.250 | 0.500 | 0.0266 | 0.1755 |
| m | 0.250 | 0.500 | 0.0272 | 0.1795 |
| n | 0.375 | 0.531 | 0.0272 | 0.1906 |
| o | 0.500 | 0.500 | 0.0272 | 0.1795 |
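The E(x) column can be checked against the p(x) column. The tabulated values appear consistent with the per-attribute term $E(x) = -p(x)\log_{2} p(x)$, where p(x) is the share of the eight objects that have attribute x. The sketch below assumes that form (the paper's own definition is given in Section 4.1) and recomputes it for the distinct p(x) values in Table 5. The function name `entropy_term` is illustrative.

```python
import math


def entropy_term(p):
    """Single-attribute entropy term -p * log2(p); assumed form of E(x)."""
    return -p * math.log2(p) if p > 0 else 0.0


# Distinct p(x) values appearing in Table 5.
for p in (0.875, 0.5, 0.375, 0.25, 0.125):
    print(f"p(x) = {p:.3f}  ->  E(x) = {entropy_term(p):.3f}")
```

Rounded to three decimals, the results match the E(x) entries in Table 5 for each of these p(x) values.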

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
