Ctrip's Exploration and Practice of Entity Linking Technology


Author | The Ctrip Travel AI R&D team, which is committed to providing the Ctrip Travel Division with a rich set of AI technology products. The knowledge graph group focuses on the construction and application of knowledge graphs in the tourism domain.

1. Background

With the rapid development of network application technology, the rapid expansion of diversified, low-density data has made it much harder for people to obtain correct information. The root cause of the large amount of redundant information lies in the diversity of natural language expression, namely polysemy (one word with many meanings) and synonymy (many words with one meaning). For example, depending on context, "apple" can refer either to the fruit of the apple tree in the family Rosaceae or to the company Apple. Although "Shencheng" and "Modu" are completely different in literal meaning, both are nicknames for Shanghai. The goal of entity linking is to process massive Web data efficiently, understand user intent, and reduce information overload.

In the tourism domain, the entities users care about are usually attractions, hotels, and activities around travel destinations. In Geographic Information Systems (GIS) these objects are collectively called Points of Interest (POIs), and each mainly comprises four core dimensions: name, address, coordinates, and category. With the popularization of Internet map services and location-based services (LBS), POIs have grown enormously in both conceptual scope and information depth. It is fair to say that virtually every popular Internet sector touches POIs, including e-commerce, O2O, social networking, local life, Internet finance, and the sharing economy.

Building an entity linking service on top of the POI knowledge base to improve the effectiveness of tourism search, intelligent question answering, knowledge mining, and information extraction is of great significance for user experience.

2. Problem Analysis

Entity linking refers to the task of linking expressions in text to corresponding entities in the knowledge base to perform entity disambiguation and help computers understand the specific meaning of the text. It generally includes three steps: entity mention recognition, candidate entity generation, and candidate entity disambiguation.

Figure 1 Example of entity linking function

1) Entity mention recognition aims to identify the boundaries of entity mention segments in natural language and mark their locations in the input text. Taking Figure 1 as an example, the search term "Wuhan East Lake Scenic Area" entered by the user contains two named entity mentions "Wuhan" and "East Lake", which may represent the formal name, abbreviation, common name or alias of certain entities in the knowledge base.

2) Candidate entity generation generates a set of candidate entities that may be linked to a given entity name in the text, that is, based on the entity mention fragments identified in the previous step, all entities that the user may be interested in are recalled from the knowledge base. The candidate item set generated in this step determines the scope of entity disambiguation. For example, the entity mention "Wuhan" can recall "Wuhan" as a city from the knowledge base, while "East Lake" can recall two scenic spots, "Wuhan East Lake" and "Shaoxing East Lake".

3) Entity disambiguation determines the real-world entity that a mention actually refers to, outputting a ranking score based on static features of the candidate entity or dynamic features from its interaction with the query. Taking Figure 1 as an example, the context shows that the user is querying the East Lake in Wuhan, not the East Lake in Shaoxing, so "Wuhan East Lake" should score higher than "Shaoxing East Lake".

Entity mention recognition is often treated as a sequence labeling task, and classic approaches are dictionary-based and statistics-based. Dictionary-based methods include forward maximum matching, backward maximum matching, and bidirectional maximum matching; representative statistical learning methods include HMM and CRF, whose performance usually relies on a large number of manually constructed and maintained feature templates. With growing computing power and the development of end-to-end neural networks, structures such as CNNs and RNNs are widely used to model sequence representations. Their ability to automatically combine low-order features into high-order ones has freed practitioners from time-consuming, labor-intensive manual feature engineering, while the strong expressive power of neural networks has significantly improved on traditional algorithms.

The Transformer, proposed by Google in 2017, brought the self-attention model into the mainstream and provided a feasible solution for efficient parallel computation of sequence representations. Self-attention lets the token at each position fully learn contextual semantics and adaptively receive information from tokens at other positions. It has become the basic encoding unit of the popular self-supervised learning tasks of recent years and has inspired many large-scale pre-trained language models built on this configuration, BERT being a representative.

BERT, which uses the Transformer Encoder structure, has learned a lot of prior knowledge from unlabeled corpus, and can achieve excellent results by only fine-tuning the weights on specific downstream tasks. BERT once dominated the GLUE list, setting a new SOTA record for major natural language understanding tasks, and its pre-training plus fine-tuning learning paradigm has also become a major milestone in the NLP community.

Candidate entity generation is a retrieval task. Traditional retrieval methods are represented by the Bag of Words (BOW) model, such as TF-IDF and BM25. These algorithms ignore word order and the context between words. Besides hand-designed scoring formulas, coverage, term proximity, and synonym expansion must be layered on top of statistical term weights and frequencies to achieve good results. The biggest drawback of the Bag of Words model is that it only handles literal matching and cannot capture the semantic relevance between query and document. Therefore, semantic vector retrieval solutions, represented by dual-tower and interactive models, have begun to attract attention.
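As an illustration of the bag-of-words baseline, here is a minimal Okapi BM25 scorer in Python; the toy tokenized documents and the parameter defaults k1=1.5, b=0.75 are common illustrative choices, not figures from our system:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Note that the score depends only on term overlap and frequencies, which is exactly the literal-matching limitation discussed above.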

The main dual-tower models include DSSM and Siamese networks. They usually use two identical or different encoders to extract low-dimensional sentence vector representations of query and document, and then design a correlation function, such as cosine, inner product, etc., to calculate the similarity score between the two; the interactive model starts to model the correlation between query and document at the low-order feature combination stage. The key idea lies in the construction of the interaction matrix, such as ESIM and MatchPyramid. This type of model ultimately obtains the overall representation of the query-document pair, so it can avoid the loss of precision caused by independent encoding of the two parts, and often has better performance in practice.
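A toy sketch of the dual-tower idea: the two inputs are encoded independently and compared with a correlation function such as cosine. The character-bigram "encoder" below is a hypothetical stand-in for the learned towers of a model like DSSM:

```python
import math
from collections import Counter

def encode(text):
    """Stand-in encoder: character-bigram counts.
    A real dual-tower model would use a trained neural encoder here."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(u, v):
    """Correlation function between the two tower outputs."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Because each side is encoded independently, document vectors can be precomputed offline, which is the main efficiency advantage of the dual-tower design over interactive models.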

Entity disambiguation ranks candidate entities at a finer granularity. Common learning-to-rank paradigms include pointwise, pairwise, and listwise. Pointwise uses a regression or classification model to score each candidate entity independently; pairwise considers the relative order between candidate entities rather than their absolute scores, so the loss is computed over paired samples; listwise treats the entire candidate set of a query as one sample and outputs a score for each candidate. For feature construction and representation learning, features of the query, features of the candidate entity itself, or interaction features of the two can be used; the relevant methods are similar to the semantic matching described above.
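To make the first two paradigms concrete, here are minimal pointwise and pairwise losses; the squared-error and hinge forms are illustrative choices, not the exact losses used in our system:

```python
def pointwise_loss(score, label):
    """Pointwise: regress each candidate's score toward its 0/1 relevance label."""
    return (score - label) ** 2

def pairwise_hinge(score_pos, score_neg, margin=1.0):
    """Pairwise: penalize when a relevant candidate does not beat an
    irrelevant one by at least `margin`; only relative order matters."""
    return max(0.0, margin - (score_pos - score_neg))
```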

3. Tourism Knowledge Graph

Knowledge Graph (KG) is a large-scale semantic network consisting of nodes and interconnected edges between nodes. It can represent the structured relationship between entities and is considered to be the cornerstone of cognitive intelligence. Nodes represent concepts or objects, and edges represent the relationship between concepts, concepts and objects, or objects and objects. In the knowledge graph, each piece of knowledge can be represented as a SPO (Subject-Predicate-Object) triple. For example, Changning District belongs to Shanghai, which is represented as (Changning District, upperClass, Shanghai) in the tourism knowledge graph. Because highly structured knowledge representation is easy for computers to understand, knowledge graphs have broad application scenarios in information retrieval, intelligent recommendation, and financial risk control.
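The SPO representation can be sketched as a tiny in-memory triple store with wildcard pattern matching; apart from upperClass, the relation names and entities below are illustrative toy data:

```python
# each piece of knowledge is a (Subject, Predicate, Object) triple
triples = {
    ("Changning District", "upperClass", "Shanghai"),
    ("East Lake", "locatedIn", "Wuhan"),       # hypothetical relation
    ("East Lake", "hasAlias", "Donghu"),       # hypothetical relation
}

def query(s=None, p=None, o=None):
    """Pattern-match over SPO triples; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]
```

Real graph databases such as Neo4j and Nebula answer the same kind of pattern queries, but with indexes and a graph query language instead of a linear scan.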

The tourism knowledge graph created by our team uses Neo4j and Nebula graph databases as storage solutions. The graph schema involves 18 entity types and 12 relationship types. Currently, there are about 10 million entities and 37 million triples, and the amount of knowledge has begun to take shape. At the same time, a complete automatic update mechanism and monitoring system have been established at the data governance level to ensure that new knowledge is added to the database every day and expired knowledge is removed, which improves the reliability of the knowledge graph.

Figure 2 Tourism knowledge graph data exploration

4. Technical Solution

In the technical design of the entity linking system, we follow a three-stage process of entity mention recognition, candidate entity generation, and candidate entity disambiguation; these three serial sub-modules work together to complete the analysis of natural language.

Figure 3 Entity linking system process

In addition, we have made some engineering optimizations, using Redis to cache the mapping relationship between aliases and candidate entity IDs and the mapping relationship between entity IDs and entity attributes, to avoid high latency caused by frequent queries to Neo4j or Nebula graph databases.
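The caching pattern is the usual cache-aside lookup. In the sketch below, a plain dict stands in for the Redis client and fetch_from_graph is a hypothetical callback wrapping the graph-database query:

```python
def get_entity(entity_id, cache, fetch_from_graph):
    """Cache-aside lookup: try the cache first, fall back to the graph DB,
    then populate the cache. `cache` is any dict-like store; in production
    this would be a Redis client and the value would carry a TTL."""
    val = cache.get(entity_id)
    if val is not None:
        return val
    val = fetch_from_graph(entity_id)  # the expensive Neo4j/Nebula query
    if val is not None:
        cache[entity_id] = val
    return val
```

The same pattern covers both mappings mentioned above: alias to candidate entity ids, and entity id to entity attributes.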

5. Functional modules

5.1 Entity Mention Recognition

This step combines the neural network model and alias prefix tree for multi-way detection to expand the recall of candidate entities.

5.1.1 Entity Alias Prefix Tree

We insert all entity alias strings in the knowledge base into a prefix tree structure. In this prefix tree, except for the root node, which does not contain any characters and the leaf nodes, which contain terminators, each intermediate node contains only one character. From the root node to a certain node, the characters passed through are connected to represent the string corresponding to the node, so the successor nodes of each node in the tree have the same prefix.

Figure 4 Example of entity alias prefix tree

The path from the root node to a leaf node spells out an entity alias in the knowledge base. In actual retrieval, the forward maximum matching strategy is usually adopted:

1) Maintain two pointers: prefix tree pointer and query pointer. The prefix tree pointer is located at the ROOT node when initialized, and the query pointer is located at the first character of the query text.

2) If the character pointed to by the query pointer is among the successors of the node at the prefix tree pointer, move the prefix tree pointer to that child node and advance the query pointer one position.

3) If the character pointed to by the query pointer is not among the successors of the node at the prefix tree pointer: if that node's successors contain the end marker, close the entity mention string and return the prefix tree pointer to ROOT; otherwise, backtrack the prefix tree pointer to its parent (retreating the query pointer in sync) until a node whose successors contain the end marker is reached, then close the entity mention string and return the prefix tree pointer to ROOT. If no entity mention is closed while backtracking to ROOT, advance the query pointer one position.
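The steps above can be sketched with a minimal alias prefix tree and the simpler, non-backtracking variant of forward maximum matching (always take the longest alias starting at each position), which illustrates the same idea:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # True if the path ending here spells a full alias

class AliasTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, alias):
        node = self.root
        for ch in alias:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def max_match(self, text):
        """Scan left to right, emitting the longest alias starting at each position."""
        mentions, i = [], 0
        while i < len(text):
            node, longest, j = self.root, 0, i
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.is_end:
                    longest = j - i
            if longest:
                mentions.append((i, text[i:i + longest]))
                i += longest
            else:
                i += 1  # no alias starts here; advance one character
        return mentions
```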

The prefix tree minimizes the matching of invalid substrings in user queries; its lookup cost is bounded by the query length and, unlike a hash table, is unaffected by collisions, making it a very efficient string search structure.

5.1.2 Named Entity Recognition Model

Here we use a pointer network with BERT as the skeleton to mark the boundaries of named entities. Figure 5 shows the model framework, forward propagation process, and label decoding method.

Figure 5 Named Entity Recognition Model Structure

In the inference stage, entity mention boundaries are obtained by pairing head and tail positions that are predicted with the same entity label.
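The decoding step can be sketched as pairing each predicted head position with the nearest following tail position carrying the same entity-type label; the label ids and the nearest-tail pairing rule here are illustrative:

```python
def decode_spans(head_labels, tail_labels):
    """Decode head/tail pointer predictions into entity spans.
    Label 0 means 'no boundary here'; other ids denote entity types.
    Each head is paired with the nearest following tail of the same type."""
    spans = []
    for i, h in enumerate(head_labels):
        if h == 0:
            continue
        for j in range(i, len(tail_labels)):
            if tail_labels[j] == h:
                spans.append((i, j, h))  # (start token, end token, type)
                break
    return spans
```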

5.2 Candidate Entity Generation

In the tourism knowledge graph, "alias" is a special node type. During graph construction, we establish a hasAlias relationship between each newly added POI, destination, product, or tag entity and each of its aliases (the entity name itself also counts as an alias). Every such entity is therefore associated with at least one alias node.

Taking Figure 6 as an example, when the input text is "Wuhan Jiangxi East Lake" and the recognized entity mentions are "Wuhan", "Jiangxi" and "East Lake", these three mentions are used as name attribute values to query for "alias" nodes, yielding three alias nodes (marked in yellow in the figure). Following the incoming hasAlias edges of these alias nodes leads to several POI nodes, which are the candidate entities recalled for the text.

Figure 6 Candidate entity subgraph when the text is "Wuhan Jiangxi East Lake"

We did not use a vector retrieval solution in the candidate entity generation stage because entity mentions are generally very short strings. Similarity-based retrieval has high uncertainty and it is difficult to ensure the reliability of recall results. Maintaining a high-quality alias table is more suitable for the current scenario.
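Candidate generation thus reduces to a lookup in an alias-to-entities index. The toy index below mirrors the hasAlias edges in the graph; the entity ids are hypothetical:

```python
# alias -> entity ids, mirroring hasAlias edges (toy data, hypothetical ids)
alias_index = {
    "Wuhan": ["city:wuhan"],
    "East Lake": ["poi:wuhan_east_lake", "poi:shaoxing_east_lake"],
}

def generate_candidates(mentions):
    """Recall every entity that shares an alias with a recognized mention."""
    return {m: alias_index.get(m, []) for m in mentions}
```

In production this index is the alias-to-id mapping cached in Redis, so a lookup avoids hitting the graph database at all.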

The candidate entity generation module also includes path-based pre-filtering logic. Taking Figure 6 as an example, path connections may be detected between candidate entities recalled from different mentions, such as "Wuhan City" to "East Lake" and "Jiangxi Province" to "Lulin Lake". POI nodes that share an alias with a node on such a path but do not lie on it, such as Shaoxing East Lake, are then not returned as candidates. In practice, to prevent the path assumption from being too strong and mistakenly dropping important nodes, some constraints are imposed; these are mostly rule-based and are not detailed here.

5.3 Candidate Entity Disambiguation

This module is used to calculate the ranking scores for candidate entities. We use the BERT-based interactive semantic matching model.

First, the query string and the candidate entity's description text are concatenated; after tokenization and numericalization, they are fed into BERT to extract high-order interaction features.

In the BERT output layer, the feature vector h_CLS at the [CLS] position is concatenated with the feature vectors h_head and h_tail of the tokens at the first and last positions of the candidate entity's mention fragment in the query. An affine transformation followed by a sigmoid activation gives the probability that the candidate entity is the link target:

p = sigmoid(w · [h_CLS; h_head; h_tail] + b)

where w and b are the linear layer parameters.

Figure 7 Entity disambiguation model structure

The loss function in the model training phase is the binary cross entropy loss:

L = -[y · log(p) + (1 - y) · log(1 - p)]

where y is the 0-1 ground-truth label of the candidate entity.

In the inference stage, the probability score is calculated for each candidate entity recalled by the query and sorted from high to low. The candidate entity sequence is truncated according to the preset threshold to obtain the link result.
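Putting the disambiguation head together, here is a minimal sketch of the scoring formula, the training loss, and the inference-time ranking with threshold truncation; the feature vectors and parameters are toy illustrations:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def link_score(h_cls, h_head, h_tail, w, b):
    """p = sigmoid(w . [h_CLS; h_head; h_tail] + b) over concatenated features."""
    x = h_cls + h_head + h_tail  # list concatenation stands in for vector concat
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce_loss(y, p):
    """Binary cross entropy used at training time (y is the 0-1 label)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def rank_and_truncate(candidate_probs, threshold=0.5):
    """Sort candidates by probability and keep those above the preset threshold."""
    ranked = sorted(candidate_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [(e, p) for e, p in ranked if p >= threshold]
```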

6. Practical Scenarios

6.1 Ctrip Travel Search

Ctrip Travel's search term parsing service performs word segmentation and part-of-speech tagging via a back-end dictionary and returns all matching POI terms. It cannot reject or rank POIs with the same name, and often introduces search results irrelevant to the query.

After connecting to the entity linking system, it is possible to disambiguate POIs with duplicate names based on context information. Even if context is missing, the departure city can be used to assist in sorting candidate entities.

Case 1 The search term is "Wuhan East Lake". The interface originally returns "Wuhan City" and all scenic spots named "East Lake". The entity link service is called, and only the East Lake Scenic Area (id: 1xxx6) located in "Wuhan City" is returned.

Case 2 The search term is "Shenzhen Disney", and the interface originally returns "Shenzhen City" and all Disney resorts. Although there is no Disneyland under Shenzhen City, common sense would make people think that the user's actual intention may be the Disneyland in Hong Kong (id: 1xxx9), which is exactly the result returned after entity linking.

Case 3 The search term is "Baiyun Mountain", and the departure city is set to Dongguan. The interface originally returns all scenic spots named "Baiyun Mountain" without any ranking, so the user's interest in each POI cannot be inferred. After calling the entity linking service, Baiyun Mountain in Guangzhou (id: 7xxx4) ranks first in the returned results, indicating that the disambiguation stage captured the association between "Guangzhou Baiyun Mountain" and the departure city "Dongguan".

6.2 Ctrip Travel Intelligent Customer Service

In human-computer dialogue systems, semantic slot filling is usually performed together with intent recognition to decide on follow-up questions and disambiguation prompts, or to complete the understanding of the user's utterance so that an answer can be retrieved from the knowledge base.

For example, when a user asks "flights from Shanghai to Chengdu", the intention is "query flights", but only classifying the user's intention is not enough to give an accurate answer, because two key pieces of information are missing: the departure and arrival stations of the flight, which are the semantic slots related to "query flights". Only after the intention recognition and semantic slot filling are completed, the conditions for searching for answers are met. Here, the departure station and arrival station refer to Shanghai and Chengdu respectively, which happen to be two POIs in the tourism knowledge graph. With the help of entity links, the id information of these two POIs can be easily found.
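The flow above can be sketched as mapping entity-linking output onto the slots required by the recognized intent; the intent name, slot schema, and POI ids below are hypothetical:

```python
def fill_slots(intent, linked_entities):
    """Map entity-linking output onto the semantic slots required by the intent.
    The slot schema and POI ids here are hypothetical illustrations."""
    required = {"query_flights": ["departure", "arrival"]}
    if intent not in required:
        return {}
    slots = dict(zip(required[intent], linked_entities))
    missing = [s for s in required[intent] if s not in slots]
    # unfilled slots trigger a follow-up question to the user
    return {"slots": slots, "missing": missing}
```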

After Ctrip Travel's intelligent customer service introduced the entity linking service, the F1 score of slot extraction increased by more than 12 percentage points, reflecting the great potential of entity linking in customer service scenarios.

6.3 Ctrip POI key information update

Ticket-related departments need to ensure the accuracy of frequently updated key POI information, such as the opening and closing times of scenic spots, which is of vital importance to product sales and user experience.

The main basis for updating opening and closing information is the daily announcements obtained from scenic spots' official channels; POI names and the corresponding opening or closing times are extracted by parsing these articles. With repeated epidemic flare-ups, this information changes more frequently, placing higher demands on accuracy and timeliness.

When the original text is parsed and written to the database, it will be attached to the scenic spot that published the information. However, this information is not necessarily correct. In practice, there are many situations where the scenic spots extracted from the text are inconsistent with the scenic spots that published the information. For example, a scenic spot announces that a subordinate sub-scenic spot is closed. At this time, it is necessary to map the extracted scenic spot name to the entity in the knowledge graph through entity linking to obtain the real POI id. This function can improve the accuracy of the information and perform POI disambiguation at the same time.

After the introduction of entity links, the accuracy of the scenic spot opening and closing extraction project increased by nearly six percentage points, greatly improving the effect of the original extraction process.

6.4 Ctrip’s identification of duplicate POIs and upper-lower-level POI relationships

The sources of POI data maintained by the ticketing and activities departments are very complex, spanning multiple internal and official platforms. When POI data is imported in batches, duplicate POIs and the hierarchical relationships between POIs are not fully identified. This leads to large numbers of duplicate POIs in the system that split traffic between themselves, or to POIs left outside the hierarchy, so that information is displayed incompletely and users cannot fully understand a scenic area. This information therefore needs to be obtained and repaired in a timely manner to improve the comprehensiveness of coverage and the reliability of the platform.

The address or description of a POI may imply its parent node. For example, a POI whose address is "xxx Road, No. xxx, xxx Scenic Area" may have that scenic area as its parent. If the scenic area's id can be obtained via entity linking and the current graph has no superior-subordinate relationship between the two POIs, this can be added as an important feature to the relationship recognition system. Since the project launched, the average accuracy of superior-subordinate relationship recognition has exceeded 90%, and the information of nearly a thousand POIs has been corrected.
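The address heuristic can be sketched as a regular expression that surfaces a possible parent scenic area from an address string; the pattern is illustrative, and the production rules are richer:

```python
import re

def parent_hint_from_address(address):
    """Heuristic: an address segment ending in '景区' / 'Scenic Area'
    hints at a parent POI. Illustrative pattern, not the production rules."""
    m = re.search(r"([^,，]+?(?:景区|Scenic Area))", address)
    return m.group(1).strip() if m else None
```

The extracted name would then go through entity linking to resolve the scenic area's id before the candidate parent relationship is added to the graph.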

7. Summary and Outlook

This article introduced the tourism AI knowledge graph group's exploration and practice of entity linking technology: the basic definition of entity linking, its technical development and application value, a detailed walkthrough of each sub-module in the entity linking system built on the tourism knowledge graph, and finally the scenarios where the system has been deployed.

In the future, we will keep pace with the development of cutting-edge technologies, promote a closer integration of knowledge graphs with entity recall and refined ranking tasks, make full use of the structure of the graph to improve the effectiveness and interpretability of existing models, explore more efficient and lightweight models, and at the same time take into account the implementation of technology to empower more tourism scenarios in the future.
