The advent of the big data era has brought unprecedented data dividends to the rapid development of artificial intelligence. Fed by big data, artificial intelligence technology has made remarkable progress, most prominently in knowledge engineering, represented by knowledge graphs, and in machine learning, represented by deep learning. As the data dividend of deep learning is gradually exhausted, the performance ceiling of deep learning models is approaching. At the same time, a large number of knowledge graphs continue to emerge, and these treasure troves of human prior knowledge have not yet been effectively exploited by deep learning. Integrating knowledge graphs with deep learning has therefore become one of the important ideas for further improving the performance of deep learning models. Symbolism, represented by knowledge graphs, and connectionism, represented by deep learning, are increasingly departing from their originally independent development tracks and embarking on a new path of coordinated progress.

Historical background of the integration of knowledge graphs and deep learning

Big data brings unprecedented data dividends to machine learning, especially deep learning. Thanks to large-scale labeled data, deep neural networks can learn effective hierarchical feature representations and thus achieve excellent results in areas such as image recognition. However, as the data dividend fades, deep learning increasingly shows its limitations, especially its reliance on large-scale labeled data and its difficulty in effectively utilizing prior knowledge. These limitations hinder the further development of deep learning. Moreover, in the extensive practice of deep learning, people increasingly find that the outputs of deep learning models often conflict with prior knowledge or expert knowledge. How can deep learning shed its dependence on large-scale samples? How can deep learning models effectively utilize the large amount of existing prior knowledge? How can the outputs of deep learning models be made consistent with prior knowledge? These have become important open questions in the field of deep learning.

Human society has accumulated a vast amount of knowledge. In particular, in recent years, driven by knowledge graph technology, machine-friendly online knowledge graphs have emerged in large numbers. A knowledge graph is essentially a semantic network that expresses entities, concepts, and the semantic relationships among them. Compared with traditional knowledge representation forms (such as ontologies and traditional semantic networks), knowledge graphs offer high entity/concept coverage, diverse semantic relationships, a machine-friendly structure (usually expressed in RDF), and high quality, making them the most important knowledge representation method in the era of big data and artificial intelligence. Whether the knowledge contained in knowledge graphs can be used to guide the learning of deep neural network models, and thus improve model performance, has become one of the important questions in deep learning research.

At present, applying deep learning technology to knowledge graphs is relatively straightforward.
A large number of deep learning models can effectively complete end-to-end entity recognition, relation extraction, and relation completion tasks, and can in turn be used to build or enrich knowledge graphs. This paper mainly discusses the opposite direction: the application of knowledge graphs in deep learning models. The current literature suggests two main approaches. The first is to feed the semantic information of the knowledge graph into the deep learning model as input: the discrete knowledge graph is expressed as continuous vectors, so that the prior knowledge in the graph can serve as input to a deep model. The second is to use knowledge as a constraint on the optimization objective to guide the learning of the deep model; typically, the knowledge in the graph is expressed as a posterior regularization term of the optimization objective. The former has produced a large body of work and has become a research hotspot: vector representations of knowledge graphs have been effectively applied as important features in practical tasks such as question answering and recommendation. Research on the latter has only just started; this paper focuses on deep learning models constrained by first-order predicate logic.

Knowledge graphs as input to deep learning

Knowledge graphs are a typical example of recent progress in the symbolist branch of artificial intelligence. Entities, concepts, and relations in knowledge graphs are all given discrete, explicit symbolic representations. However, such discrete symbols are difficult to use directly in neural networks, which operate on continuous numerical representations. To enable neural networks to effectively utilize the symbolic knowledge in knowledge graphs, researchers have proposed a large number of representation learning methods for knowledge graphs. Representation learning of knowledge graphs aims to learn real-valued vector representations of the graph's constituent elements (nodes and edges). These continuous vector representations can be used as inputs to neural networks, allowing neural models to make full use of the large amount of prior knowledge in knowledge graphs. This trend has spawned a large body of research on knowledge graph representation learning. This chapter first briefly reviews representation learning of knowledge graphs and then introduces how the learned vector representations are applied in practical tasks based on deep learning models, especially question answering and recommendation.

1. Representation learning of knowledge graphs

Representation learning of knowledge graphs aims to learn vector representations of entities and relations. The key is to define a reasonable loss function f_r(h, t) over the facts (triples <h, r, t>) of the knowledge graph, where h and t denote the vector representations of the head entity h and the tail entity t of the triple. Usually, when the fact <h, r, t> holds, f_r(h, t) should be minimized. Considering all the facts in the knowledge graph, the vector representations of entities and relations can be learned by minimizing

min Σ_{<h,r,t> ∈ O} f_r(h, t),

where O denotes the set of all facts in the knowledge graph. Different representation learning methods define the corresponding loss function according to different principles. Here, the basic ideas of knowledge graph representation are introduced through distance-based and translation-based models [1].
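As a concrete illustration of this objective, below is a minimal sketch of how such a loss is typically minimized in practice. Because directly minimizing f_r over observed triples alone admits a degenerate all-zero solution, methods in this family usually minimize a margin-based ranking variant that also scores corrupted (negative) triples; the specific score function, margin, and toy data here are illustrative assumptions, not details from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary: entity/relation ids -> randomly initialized embeddings.
n_entities, n_relations, dim = 5, 2, 8
E = rng.normal(scale=0.1, size=(n_entities, dim))   # entity embeddings
R = rng.normal(scale=0.1, size=(n_relations, dim))  # relation embeddings

def f_r(h, r, t):
    """Triple loss f_r(h, t); a TransE-style distance is assumed here."""
    return np.linalg.norm(E[h] + R[r] - E[t])

# O: the set of observed facts <h, r, t> in the knowledge graph.
O = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]

def margin_ranking_loss(O, margin=1.0):
    """Margin-based surrogate for min Σ f_r(h, t): push observed triples
    below randomly corrupted ones by at least `margin`."""
    total = 0.0
    for (h, r, t) in O:
        t_neg = rng.integers(n_entities)   # corrupt the tail at random
        total += max(0.0, margin + f_r(h, r, t) - f_r(h, r, t_neg))
    return total / len(O)

print("loss before training:", margin_ranking_loss(O))
```

In a real implementation, the embeddings would then be updated by stochastic gradient descent on this loss, typically with entity vectors renormalized after each step.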
Distance-based models. The representative work is the SE model [2]. Its basic idea is that when two entities belong to the same triple <h, r, t>, their vector representations should also be close to each other in a projected space. The loss function is therefore defined as the distance between the projected vectors,

f_r(h, t) = ||W_{r,1} h − W_{r,2} t||,

where the matrices W_{r,1} and W_{r,2} project the head entity h and the tail entity t of the triple, respectively. However, because SE introduces two separate projection matrices, it has difficulty capturing the semantic correlation between entities and relations. To address this problem, Socher et al. replaced the linear transformation layer of the traditional neural network with a third-order tensor to characterize the scoring function. Bordes et al. proposed an energy matching model that captures the interaction between entity vectors and relation vectors by introducing the Hadamard product of multiple matrices.

Translation-based models. The representative work, the TransE model, describes the correlation between entities and relations as a translation between vectors in the embedding space [3]. The model assumes that if <h, r, t> holds, the embedding of the tail entity t should be close to the embedding of the head entity h plus the relation vector r, that is, h + r ≈ t. TransE therefore adopts

f_r(h, t) = ||h + r − t||

as its scoring function: when a triple holds, the score is low; otherwise the score is high. TransE is very effective for simple 1-1 relations (where the ratio of the numbers of entities connected at the two ends of the relation is 1:1), but its performance drops significantly on complex 1-N, N-1, and N-N relations. For such complex relations, Wang et al. proposed the TransH model, which learns different entity representations under different relations by projecting entities onto the hyperplane associated with each relation. Lin et al. proposed the TransR model, which projects entities into a relation-specific subspace through a projection matrix, likewise learning different entity representations under different relations.

Beyond these two typical families, there are many other representation learning models. For example, Sutskever et al. used tensor factorization and Bayesian clustering to learn relational structure, and Ranzato et al. introduced a three-way restricted Boltzmann machine, parameterized by a tensor, to learn vector representations of knowledge graphs. Current mainstream knowledge graph representation learning methods still suffer from various problems: they cannot adequately describe the semantic correlation between entities and relations, cannot handle representation learning for complex relations well, introduce so many parameters that the models become overly complex, or are too computationally inefficient to scale to large knowledge graphs. To better provide prior knowledge for machine learning and deep learning, representation learning of knowledge graphs remains a long-term research topic.
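To make the difference between these scoring functions concrete, the following sketch implements the TransE score and the TransH-style hyperplane projection side by side. This is a minimal numpy illustration under simplifying assumptions: the dimensions and data are made up, and details such as the norm choice and parameter constraints of the published models are omitted.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE: f_r(h, t) = ||h + r - t||; a low score means a plausible triple."""
    return np.linalg.norm(h + r - t, ord=norm)

def transh_score(h, r, t, w_r, norm=1):
    """TransH: project h and t onto the hyperplane with normal w_r, so an
    entity can take a different effective representation per relation."""
    w_r = w_r / np.linalg.norm(w_r)   # keep the hyperplane normal a unit vector
    h_p = h - (h @ w_r) * w_r         # projection of h onto the hyperplane
    t_p = t - (t @ w_r) * w_r
    return np.linalg.norm(h_p + r - t_p, ord=norm)

rng = np.random.default_rng(1)
dim = 8
h, r, t = rng.normal(size=(3, dim))   # toy head, relation, tail vectors
w_r = rng.normal(size=dim)            # toy relation-specific normal vector

print("TransE score:", transe_score(h, r, t))
print("TransH score:", transh_score(h, r, t, w_r))
```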
2. Applications of vectorized knowledge graph representations

Application 1: Question answering systems. Natural language question answering is an important form of human-computer interaction, and deep learning has made it possible to generate answers directly from question-answer corpora. However, most deep question answering models still find it difficult to use large amounts of knowledge to produce accurate answers. For simple factual questions, Yin et al. proposed a deep question answering model based on an encoder-decoder framework that can make full use of the knowledge in a knowledge graph [4]. In deep neural networks, the semantics of a question is typically represented as a vector, and questions with similar vectors are considered to have similar semantics; this is the typical connectionist approach. The knowledge representation of a knowledge graph, by contrast, is discrete: there are no gradual transitions between pieces of knowledge. This is the typical symbolist approach. By vectorizing the knowledge graph, a question can be matched against triples (that is, their vector similarity can be computed), so that the best-matching triple can be found in the knowledge base for a given question. The matching process is shown in Figure 1. For the question Q: "How tall is Yao Ming?", the words of the question are first represented as a vector array H_Q. Candidate triples that may match the question are then retrieved from the knowledge graph. Finally, the semantic similarity between the question and each candidate triple is computed with the following similarity formula:

S(Q, τ) = x_Q^T M u_τ,

where S(Q, τ) denotes the similarity between question Q and candidate triple τ, x_Q is the vector of the question (computed from H_Q), u_τ is the vector of the knowledge graph triple, and M is a parameter matrix to be learned.
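The following is a minimal sketch of this bilinear matching score. The mean-pooling of H_Q into x_Q and the toy data are illustrative assumptions; the actual model in [4] uses an encoder-decoder architecture and is considerably more elaborate.

```python
import numpy as np

def question_triple_similarity(H_Q, u_tau, M):
    """S(Q, tau) = x_Q^T M u_tau, with x_Q obtained here by mean-pooling the
    word vectors in H_Q (the pooling choice is an illustrative assumption)."""
    x_Q = H_Q.mean(axis=0)        # question vector derived from word vectors
    return x_Q @ M @ u_tau

rng = np.random.default_rng(2)
d_q, d_kg = 6, 8
H_Q = rng.normal(size=(5, d_q))          # toy word vectors of the question
M = rng.normal(size=(d_q, d_kg))         # learned bilinear parameter matrix
candidates = rng.normal(size=(3, d_kg))  # vectors u_tau of 3 candidate triples

scores = [question_triple_similarity(H_Q, u, M) for u in candidates]
best = int(np.argmax(scores))            # pick the best-matching triple
print("similarity scores:", np.round(scores, 3), "-> best candidate:", best)
```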
Application 2: Recommendation systems. Personalized recommendation is one of the important intelligent services of major social media and e-commerce websites. With the growing adoption of knowledge graphs, many studies have realized that the knowledge in knowledge graphs can improve the content (feature) descriptions of users and items in content-based recommendation systems, thereby improving recommendation quality. Meanwhile, recommendation algorithms based on deep learning increasingly outperform traditional collaborative filtering models [5]. However, research that integrates knowledge graphs into a deep learning framework for personalized recommendation is still relatively rare. Zhang et al. made such an attempt, making full use of three typical types of knowledge: structured knowledge (knowledge graphs), textual knowledge, and visual knowledge (images) [6]. The authors obtained vector representations of the structured knowledge through network embedding, used an SDAE (stacked denoising autoencoder) and a stacked convolutional autoencoder to extract textual and visual features respectively, and finally integrated the three types of features into a collaborative ensemble learning framework to achieve personalized recommendation. Experiments on movie and book datasets showed that this recommendation algorithm, which integrates deep learning and knowledge graphs, performs well.

Knowledge graphs as constraints on deep learning

Hu et al. proposed a model that integrates first-order predicate logic into deep neural networks and successfully applied it to problems such as sentiment classification and named entity recognition [7]. Logical rules are a flexible representation of high-level cognition and structured knowledge, and a typical form of knowledge representation. Introducing the logical rules that people have accumulated into deep neural networks, and using human intent and domain knowledge to guide neural models, is therefore of great significance. Some earlier work attempted to introduce logical rules into probabilistic graphical models, the representative example being Markov logic networks [8], but few works have managed to introduce logical rules into deep neural networks. The framework proposed by Hu et al. can be summarized as a "teacher-student network", as shown in Figure 2, consisting of two parts: a teacher network q(y|x) and a student network pθ(y|x). The teacher network is responsible for modeling the knowledge expressed by the logical rules, while the student network is trained by back-propagation under the constraints imposed by the teacher network, thereby absorbing the logical rules. This framework can add logical rules to most tasks that use deep neural networks as models, including sentiment analysis and named entity recognition, improving performance over the base neural model. The learning process mainly includes the following steps: in each iteration, the teacher network q(y|x) is first constructed by projecting the current student network pθ(y|x) into the subspace constrained by the logical rules; the parameters θ of the student network are then updated so that the student both fits the true labels and imitates the soft predictions of the teacher. Iterating these two steps gradually distills the knowledge encoded in the rules into the student network's parameters.
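A minimal sketch of the student update in such a rule-distillation scheme follows. The cross-entropy form, the imitation weight π, and the hand-made teacher distribution are illustrative assumptions; in the actual framework, the teacher is derived from the student and the rules by a closed-form projection rather than specified by hand.

```python
import numpy as np

def cross_entropy(target, pred, eps=1e-12):
    """H(target, pred) for discrete probability distributions."""
    return -np.sum(target * np.log(pred + eps))

def student_loss(y_true, p_student, q_teacher, pi=0.5):
    """Balance fitting the hard labels against imitating the rule-constrained
    teacher: (1 - pi) * H(y_true, p) + pi * H(q_teacher, p)."""
    return (1 - pi) * cross_entropy(y_true, p_student) \
         + pi * cross_entropy(q_teacher, p_student)

# Toy 3-class example: the teacher softens the student's prediction toward
# whatever the logic rules prefer (here just a hand-made distribution).
y_true = np.array([0.0, 1.0, 0.0])      # hard ground-truth label
p_student = np.array([0.2, 0.5, 0.3])   # current student prediction
q_teacher = np.array([0.1, 0.8, 0.1])   # rule-constrained teacher output

print("student objective:", round(student_loss(y_true, p_student, q_teacher), 4))
```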
Conclusion

With the further development of deep learning research, how to effectively utilize the large amount of existing prior knowledge and reduce models' dependence on large-scale labeled samples has gradually become one of the mainstream research directions. Representation learning of knowledge graphs has laid the necessary foundation for exploration in this direction, and recent pioneering work on integrating knowledge into deep neural network models is quite inspiring. Overall, however, current deep learning models still have very limited means of exploiting prior knowledge, and the academic community still faces huge challenges in exploring this direction.