Using deep neural networks for named entity recognition (NER)

This article is organized as follows:

  1. What is Named Entity Recognition (NER)
  2. How to recognize entities?

cs224d Day 7: Project 2, using a DNN to solve the NER problem

Course project description

What is NER?

Named entity recognition (NER) refers to identifying entities with specific meaning in text, mainly names of people, places, and organizations, as well as proper nouns. NER is an important basic tool in application fields such as information extraction, question answering, syntactic analysis, and machine translation, and is an important step in extracting structured information. (Excerpted from BosonNLP)

How to recognize entities?

I will first explain the logic of the solution, and then walk through the main code. If you are interested, the complete code is available here.

The code builds a DNN with a single hidden layer in TensorFlow to handle the NER problem.

1. Problem identification:

NER is a classification problem.

Given a word, we need to determine from its context which of the following four categories it belongs to. If it belongs to none of them, its category is 0, meaning it is not an entity. This is therefore a 5-class classification problem:

  • Person (PER)
  • Organization (ORG)
  • Location (LOC)
  • Miscellaneous (MISC)
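
As a minimal sketch (the exact index assignment here is an assumption, not taken from the assignment code), the five labels can be mapped to class indices like this:

    # Hypothetical tag-to-index mapping; 0 means "not an entity"
    tag_to_num = {'O': 0, 'PER': 1, 'ORG': 2, 'LOC': 3, 'MISC': 4}
    num_to_tag = {v: k for k, v in tag_to_num.items()}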

Our training data has two columns: the first is the word and the second is its label.

    EU        ORG
    rejects   O
    German    MISC
    Peter     PER
    BRUSSELS  LOC

(Here "rejects" is not an entity, so its label is O, i.e. category 0.)

2. Model:

Next we train it using a deep neural network.

The model is as follows:

The input x^(t) is a context window of size 3 centered on the word x_t. Each word is a one-hot vector; multiplying it by the embedding matrix L yields the corresponding word vector, of length d = 50:
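
A sketch of the formula in standard window-model notation, writing e^(t) for the concatenated window vector:

$$ e^{(t)} = \left[\, x_{t-1} L,\; x_t L,\; x_{t+1} L \,\right] \in \mathbb{R}^{1 \times 150} $$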

We build a neural network with a single hidden layer of dimension 100; the prediction ŷ has dimension 5:
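
In the same notation, the hidden layer and the prediction are:

$$ h = \tanh\!\left(e^{(t)} W + b_1\right), \quad W \in \mathbb{R}^{150 \times 100},\; b_1 \in \mathbb{R}^{100} $$

$$ \hat{y} = \mathrm{softmax}\!\left(h U + b_2\right), \quad U \in \mathbb{R}^{100 \times 5},\; b_2 \in \mathbb{R}^{5} $$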

Use cross entropy to calculate the error:
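
With y the one-hot true label, the cross-entropy loss can be written as:

$$ J(\theta) = CE(y, \hat{y}) = -\sum_{i=1}^{5} y_i \log \hat{y}_i $$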

Differentiating J with respect to each parameter (U, b_2, W, b_1, and the embeddings L) gives the following gradients:
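
These are the standard gradients for a tanh hidden layer with a softmax cross-entropy output (row-vector convention; L2 terms omitted):

$$ \delta_2 = \hat{y} - y $$

$$ \frac{\partial J}{\partial U} = h^{\top} \delta_2, \qquad \frac{\partial J}{\partial b_2} = \delta_2 $$

$$ \delta_1 = \left(\delta_2 U^{\top}\right) \circ \left(1 - h \circ h\right) $$

$$ \frac{\partial J}{\partial W} = \left(e^{(t)}\right)^{\top} \delta_1, \qquad \frac{\partial J}{\partial b_1} = \delta_1, \qquad \frac{\partial J}{\partial e^{(t)}} = \delta_1 W^{\top} $$

The last gradient flows back into the three rows of L selected by the window.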

In TensorFlow, this differentiation is performed automatically. Here, the Adam optimization algorithm is used to update the parameters, iterating until the loss converges.

3. Specific implementation

In def test_NER(), we run max_epochs iterations. In each epoch we train the model on the training data to get train_loss and train_acc, then use the model to predict the validation data to get val_loss and predictions. We keep the parameter weights corresponding to the smallest val_loss, and finally use those parameters to predict the category labels of the test data:

    import os
    import time

    import tensorflow as tf

    def test_NER():
      config = Config()
      with tf.Graph().as_default():
        model = NERModel(config)  # The main class

        init = tf.initialize_all_variables()
        saver = tf.train.Saver()

        with tf.Session() as session:
          best_val_loss = float('inf')  # Track the best (smallest) validation loss and its epoch
          best_val_epoch = 0

          session.run(init)
          for epoch in xrange(config.max_epochs):
            print 'Epoch {}'.format(epoch)
            start = time.time()
            ###
            train_loss, train_acc = model.run_epoch(session, model.X_train,
                                                    model.y_train)  # 1. Train on the training data, get loss and accuracy
            val_loss, predictions = model.predict(session, model.X_dev,
                                                  model.y_dev)  # 2. Predict the dev data with this model, get loss and predictions
            print 'Training loss: {}'.format(train_loss)
            print 'Training acc: {}'.format(train_acc)
            print 'Validation loss: {}'.format(val_loss)
            if val_loss < best_val_loss:  # Use the loss on the dev data to find the minimum loss
              best_val_loss = val_loss
              best_val_epoch = epoch
              if not os.path.exists("./weights"):
                os.makedirs("./weights")

              saver.save(session, './weights/ner.weights')  # Save the weights corresponding to the minimum loss
            if epoch - best_val_epoch > config.early_stopping:
              break
            ###
            confusion = calculate_confusion(config, predictions, model.y_dev)  # 3. Compute the confusion matrix of the predictions against the dev labels
            print_confusion(confusion, model.num_to_tag)
            print 'Total time: {}'.format(time.time() - start)

          saver.restore(session, './weights/ner.weights')  # Reload the saved weights and predict on the test data
          print 'Test'
          print '=-=-='
          print 'Writing predictions to q2_test.predicted'
          _, predictions = model.predict(session, model.X_test, model.y_test)
          save_predictions(predictions, "q2_test.predicted")  # Save the prediction results

    if __name__ == "__main__":
      test_NER()

4. How is the model trained?

First, load the training, validation, and test data:

    # Load the training set
    docs = du.load_dataset('data/ner/train')

    # Load the dev set (for tuning hyperparameters)
    docs = du.load_dataset('data/ner/dev')

    # Load the test set (dummy labels only)
    docs = du.load_dataset('data/ner/test.masked')

Each word, represented as a one-hot vector, is then converted into its word vector:

    def add_embedding(self):
      # The embedding lookup is currently only implemented for the CPU
      with tf.device('/cpu:0'):
        embedding = tf.get_variable('Embedding',
                                    [len(self.wv), self.config.embed_size])  # L in the assignment
        window = tf.nn.embedding_lookup(embedding, self.input_placeholder)  # Look up the word vectors of the context window directly in L
        window = tf.reshape(
            window, [-1, self.config.window_size * self.config.embed_size])

        return window
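
As an aside, the lookup is mathematically the same as multiplying a one-hot vector by L; a minimal NumPy sketch (the names here are illustrative):

    import numpy as np

    vocab_size, embed_size = 100, 50
    L = np.random.randn(vocab_size, embed_size)  # embedding matrix

    word_id = 7
    one_hot = np.zeros(vocab_size)
    one_hot[word_id] = 1.0

    # Multiplying the one-hot vector by L selects row 7 of L,
    # which is exactly what an embedding lookup does
    assert np.allclose(np.dot(one_hot, L), L[word_id])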

Build the network layers, using Xavier initialization for the first layer, with L2 regularization and dropout to reduce overfitting:

    def add_model(self, window):

      with tf.variable_scope('Layer1', initializer=xavier_weight_init()) as scope:  # Initialize the first layer with Xavier
        W = tf.get_variable(  # This layer has W, b1, h
            'W', [self.config.window_size * self.config.embed_size,
                  self.config.hidden_size])
        b1 = tf.get_variable('b1', [self.config.hidden_size])
        h = tf.nn.tanh(tf.matmul(window, W) + b1)
        if self.config.l2:  # L2 regularization for W
          tf.add_to_collection('total_loss',
                               0.5 * self.config.l2 * tf.nn.l2_loss(W))

      with tf.variable_scope('Layer2', initializer=xavier_weight_init()) as scope:
        U = tf.get_variable('U', [self.config.hidden_size, self.config.label_size])
        b2 = tf.get_variable('b2', [self.config.label_size])
        y = tf.matmul(h, U) + b2
        if self.config.l2:  # L2 regularization for U
          tf.add_to_collection('total_loss',
                               0.5 * self.config.l2 * tf.nn.l2_loss(U))
      output = tf.nn.dropout(y, self.dropout_placeholder)  # Apply dropout to the output

      return output

For more on what L2 regularization and dropout are and how they reduce overfitting, please read this blog post, which summarizes them simply and clearly.
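
For reference, the penalty added in add_model corresponds, up to constant factors absorbed into the coefficient $\lambda$, to:

$$ J_{\text{total}} = CE(y, \hat{y}) + \frac{\lambda}{2} \left( \lVert W \rVert_F^2 + \lVert U \rVert_F^2 \right) $$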

Use cross entropy to calculate loss:

    def add_loss_op(self, y):

      cross_entropy = tf.reduce_mean(  # 1. Key step: the loss is defined with cross entropy
          tf.nn.softmax_cross_entropy_with_logits(y, self.labels_placeholder))  # y is the model's prediction; compute the cross entropy
      tf.add_to_collection('total_loss', cross_entropy)  # Stores the value in the collection with the given name.
      # Collections are not sets, so a value can be added to a collection several times.
      loss = tf.add_n(tf.get_collection('total_loss'))  # Adds all input tensors element-wise (a list of tensors with the same shape and type)

      return loss

Then use Adam Optimizer to minimize the loss:

    def add_training_op(self, loss):

      optimizer = tf.train.AdamOptimizer(self.config.lr)
      global_step = tf.Variable(0, name='global_step', trainable=False)
      train_op = optimizer.minimize(loss, global_step=global_step)  # 2. Key step: use AdamOptimizer to minimize the loss

      return train_op
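
For context, each training step then runs train_op in the session, roughly like this (a sketch of how run_epoch might feed the placeholders; the attribute names self.train_op and self.loss are assumptions):

    # Hypothetical usage inside run_epoch: each call performs one gradient update
    feed = {self.input_placeholder: x_batch,
            self.labels_placeholder: y_batch,
            self.dropout_placeholder: self.config.dropout}
    _, batch_loss = session.run([self.train_op, self.loss], feed_dict=feed)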

After each training run, the weights that minimize the loss are obtained.

In this way, the NER classification problem is solved. Of course, to further improve accuracy we still need to study the literature. Next time, we will implement an RNN.
