Using deep neural networks to solve the problem of named entity recognition (NER)

This article is organized as follows:

  1. What is Named Entity Recognition (NER)
  2. How to identify?

cs224d Day 7, Project 2: using a DNN to solve the NER problem.

The course project description can be found on the cs224d assignment page.

What is NER?

Named entity recognition (NER) refers to identifying entities with specific meanings in text, mainly names of people, places, and organizations, proper nouns, and so on. NER is an important basic tool in application fields such as information extraction, question-answering systems, syntactic analysis, and machine translation, and is an important step in extracting structured information. (Excerpted from BosonNLP.)

How to identify?

I will first explain the logic of solving the problem and then walk through the main code. If you are interested, the complete code is available here.

The code builds a DNN with a single hidden layer in TensorFlow to handle the NER problem.

1. Problem identification:

NER is a classification problem.

Given a word, we need to determine, based on its context, which of the following four categories it belongs to. If it belongs to none of them, its category is 0, meaning it is not an entity. This is therefore a five-class classification problem:

  * Person (PER)
  * Organization (ORG)
  * Location (LOC)
  * Miscellaneous (MISC)

Our training data has two columns: the first column is the word and the second column is the label. For example:

```
EU        ORG
rejects   O
German    MISC
Peter     PER
BRUSSELS  LOC
```
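
For reference, here is a minimal tag-to-index mapping consistent with the five classes above; the exact indices used by the assignment's data utilities may differ:

```python
# Hypothetical mapping: 0 stands for O, i.e. not an entity.
tag_to_num = {'O': 0, 'PER': 1, 'ORG': 2, 'LOC': 3, 'MISC': 4}
num_to_tag = {v: k for k, v in tag_to_num.items()}
```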

2. Model:

Next we train it using a deep neural network.

The model is as follows:

The input x^(t) is the context window of size 3 centered on x_t, where each word x_t is a one-hot vector. Multiplying by the embedding matrix L maps each one-hot vector to its word vector of length d = 50:
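
In formulas, following the assignment's notation:

$$
x^{(t)} = \left[\, x_{t-1} L,\; x_t L,\; x_{t+1} L \,\right] \in \mathbb{R}^{3d}, \qquad L \in \mathbb{R}^{|V| \times d}, \quad d = 50
$$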

We build a neural network with a single hidden layer of dimension 100; the prediction y^ has dimension 5:
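
Written out, with hidden size 100 and 5 output classes:

$$
h = \tanh\!\left(x^{(t)} W + b_1\right), \qquad \hat{y} = \mathrm{softmax}\!\left(h U + b_2\right)
$$

with $W \in \mathbb{R}^{3d \times 100}$ and $U \in \mathbb{R}^{100 \times 5}$.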

Use cross entropy to calculate the error:
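
With a one-hot true label $y$, the cross-entropy loss is:

$$
J(\theta) = -\sum_{k=1}^{5} y_k \log \hat{y}_k
$$

When L2 regularization is enabled, the terms $\frac{\lambda}{2}\lVert W\rVert_2^2 + \frac{\lambda}{2}\lVert U\rVert_2^2$ are added, matching the `0.5 * self.config.l2 * tf.nn.l2_loss(...)` lines in the code below.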

Next, J is differentiated with respect to each parameter.
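One standard way to write the resulting gradients (row-vector convention, L2 terms omitted):

$$
\delta_2 = \hat{y} - y, \qquad \frac{\partial J}{\partial U} = h^{\top} \delta_2, \qquad \frac{\partial J}{\partial b_2} = \delta_2
$$

$$
\delta_1 = \left(\delta_2 U^{\top}\right) \circ \left(1 - h \circ h\right), \qquad \frac{\partial J}{\partial W} = \left(x^{(t)}\right)^{\top} \delta_1, \qquad \frac{\partial J}{\partial b_1} = \delta_1
$$

where $\circ$ is element-wise multiplication and $1 - h \circ h$ is the derivative of $\tanh$.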

In TensorFlow, differentiation is done automatically. Here, the Adam optimization algorithm is used to apply the gradient updates, iterating until the loss converges.

3. Specific implementation

In def test_NER(), we run max_epochs iterations. In each epoch we train the model on the training data, obtaining train_loss and train_acc, and then use the model to predict the validation data, obtaining val_loss and predictions. We keep the parameter weights corresponding to the smallest val_loss, and finally use those weights to predict the category labels of the test data:

```python
import os
import time

import tensorflow as tf

# Config, NERModel, calculate_confusion, print_confusion and save_predictions
# are defined elsewhere in the assignment's q2_NER.py.

def test_NER():
  config = Config()
  with tf.Graph().as_default():
    model = NERModel(config)  # The main class

    init = tf.initialize_all_variables()
    saver = tf.train.Saver()

    with tf.Session() as session:
      best_val_loss = float('inf')  # Best validation loss and its epoch
      best_val_epoch = 0

      session.run(init)
      for epoch in xrange(config.max_epochs):
        print 'Epoch {}'.format(epoch)
        start = time.time()

        # 1. Run one epoch over the training data to get loss and accuracy
        train_loss, train_acc = model.run_epoch(session, model.X_train,
                                                model.y_train)
        # 2. Use the current model to predict the dev data
        val_loss, predictions = model.predict(session, model.X_dev,
                                              model.y_dev)
        print 'Training loss: {}'.format(train_loss)
        print 'Training acc: {}'.format(train_acc)
        print 'Validation loss: {}'.format(val_loss)
        if val_loss < best_val_loss:  # Track the minimum validation loss
          best_val_loss = val_loss
          best_val_epoch = epoch
          if not os.path.exists("./weights"):
            os.makedirs("./weights")
          # Save the weights corresponding to the minimum validation loss
          saver.save(session, './weights/ner.weights')
        if epoch - best_val_epoch > config.early_stopping:
          break

        # 3. Compare the dev predictions against the dev labels
        confusion = calculate_confusion(config, predictions, model.y_dev)
        print_confusion(confusion, model.num_to_tag)
        print 'Total time: {}'.format(time.time() - start)

      # Reload the best weights and predict on the test data
      saver.restore(session, './weights/ner.weights')
      print 'Test'
      print '=-=-='
      print 'Writing predictions to q2_test.predicted'
      _, predictions = model.predict(session, model.X_test, model.y_test)
      save_predictions(predictions, "q2_test.predicted")  # Save the results

if __name__ == "__main__":
  test_NER()
```
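
Note the early-stopping logic: if the validation loss has not improved for more than config.early_stopping epochs, training stops, and the weights from the best epoch rather than the last one are restored for the test predictions.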

4. How is the model trained?

First, load the training, validation, and test data:

```python
# du is the assignment's data-utilities module; in the full script each
# split is processed right after it is loaded.

# Load the training set
docs = du.load_dataset('data/ner/train')

# Load the dev set (for tuning hyperparameters)
docs = du.load_dataset('data/ner/dev')

# Load the test set (dummy labels only)
docs = du.load_dataset('data/ner/test.masked')
```
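
The full script then turns each document into fixed-size window examples. A minimal sketch of that idea (make_windows and pad_id are illustrative names, not the assignment's API):

```python
def make_windows(word_ids, pad_id, window_size=3):
    # One window per token; pad both ends so boundary words
    # still get a full window of context.
    half = window_size // 2
    padded = [pad_id] * half + list(word_ids) + [pad_id] * half
    return [padded[i:i + window_size] for i in range(len(word_ids))]

# make_windows([5, 9, 2], pad_id=0) -> [[0, 5, 9], [5, 9, 2], [9, 2, 0]]
```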

Conceptually the words are one-hot vectors; in the code, an embedding lookup converts word indices directly into word vectors:

```python
def add_embedding(self):
  # The embedding lookup is currently only implemented for the CPU
  with tf.device('/cpu:0'):
    # L in the assignment: one embed_size vector per vocabulary word
    embedding = tf.get_variable(
        'Embedding', [len(self.wv), self.config.embed_size])
    # Look up the word vectors for the whole context window at once
    window = tf.nn.embedding_lookup(embedding, self.input_placeholder)
    # Flatten the window into a single vector per example
    window = tf.reshape(
        window, [-1, self.config.window_size * self.config.embed_size])

  return window
```
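
The lookup returns a tensor of shape [batch, window_size, embed_size]; the reshape flattens each example into a single vector of length window_size * embed_size, i.e. 3 × 50 = 150 here.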

Build the network layers, using xavier initialization for the first layer, L2 regularization, and dropout to reduce overfitting:

```python
def add_model(self, window):

  # xavier_weight_init comes from the assignment code; it initializes
  # the variables in each scope with Xavier initialization.
  with tf.variable_scope('Layer1', initializer=xavier_weight_init()) as scope:
    # The hidden layer has parameters W and b1 and output h
    W = tf.get_variable(
        'W', [self.config.window_size * self.config.embed_size,
              self.config.hidden_size])
    b1 = tf.get_variable('b1', [self.config.hidden_size])
    h = tf.nn.tanh(tf.matmul(window, W) + b1)
    if self.config.l2:  # L2 regularization for W
      tf.add_to_collection('total_loss',
                           0.5 * self.config.l2 * tf.nn.l2_loss(W))

  with tf.variable_scope('Layer2', initializer=xavier_weight_init()) as scope:
    U = tf.get_variable('U', [self.config.hidden_size,
                              self.config.label_size])
    b2 = tf.get_variable('b2', [self.config.label_size])
    y = tf.matmul(h, U) + b2
    if self.config.l2:  # L2 regularization for U
      tf.add_to_collection('total_loss',
                           0.5 * self.config.l2 * tf.nn.l2_loss(U))

  output = tf.nn.dropout(y, self.dropout_placeholder)  # Dropout on the output

  return output
```
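
Note that dropout_placeholder carries the keep probability, so the training loop can feed the configured dropout value during training and 1.0 during evaluation, which effectively disables dropout at prediction time.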

For more about what L2 regularization and dropout are and how they reduce overfitting, please read this blog post, which summarizes them simply and clearly.

Use cross entropy to calculate loss:

```python
def add_loss_op(self, y):

  # 1. Key step: define the loss with cross entropy.
  # y holds the model's logits; compare them against the true labels.
  cross_entropy = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(y, self.labels_placeholder))
  # Store the value in the named collection (collections are not sets;
  # a value can be added several times).
  tf.add_to_collection('total_loss', cross_entropy)
  # Sum the cross entropy and the L2 terms added in add_model.
  loss = tf.add_n(tf.get_collection('total_loss'))

  return loss
```

Then use Adam Optimizer to minimize the loss:

```python
def add_training_op(self, loss):

  # 2. Key step: use AdamOptimizer to minimize the loss.
  optimizer = tf.train.AdamOptimizer(self.config.lr)
  global_step = tf.Variable(0, name='global_step', trainable=False)
  train_op = optimizer.minimize(loss, global_step=global_step)

  return train_op
```
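
To make the placeholder plumbing concrete, a single mini-batch step inside run_epoch might look roughly like this (a sketch assuming the placeholder names from the snippets above, with model.loss and model.train_op being the tensors returned by add_loss_op and add_training_op):

```python
# Sketch: x_batch and y_batch are numpy arrays for one mini-batch.
feed = {model.input_placeholder: x_batch,
        model.labels_placeholder: y_batch,
        model.dropout_placeholder: model.config.dropout}
batch_loss, _ = session.run([model.loss, model.train_op], feed_dict=feed)
```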

Each training run keeps the weights that minimize the validation loss.

With that, the NER classification problem is solved. Of course, to improve the accuracy further we still need to consult the literature. Next time, we will implement an RNN.
