Using deep neural networks for named entity recognition (NER)

This article is organized as follows:

  1. What is Named Entity Recognition (NER)
  2. How to recognize entities?

cs224d Day 7: Project 2, using a DNN to solve the NER problem

Course project description

What is NER?

Named entity recognition (NER) refers to identifying entities with specific meaning in text, mainly names of people, places, and organizations, as well as proper nouns. NER is an important basic tool in application fields such as information extraction, question answering, syntactic analysis, and machine translation, and is an important step in extracting structured information. (Excerpted from BosonNLP)

How to recognize entities?

I will first explain the logic of the solution, and then walk through the main code. If you are interested, the complete code is available here.

The code builds a DNN with a single hidden layer in TensorFlow to handle the NER problem.

1. Problem identification:

NER is a classification problem.

Given a word, we need to determine from its context which of the following four categories it belongs to. If it belongs to none of them, its category is 0, meaning it is not an entity. This is therefore a 5-class classification problem:

  • Person (PER)
  • Organization (ORG)
  • Location (LOC)
  • Miscellaneous (MISC)
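
As a minimal sketch (the exact index assignment here is an assumption, not taken from the assignment code), the five labels can be mapped to class indices like this:

    # Hypothetical tag-to-index mapping; 0 means "not an entity"
    tag_to_num = {'O': 0, 'PER': 1, 'ORG': 2, 'LOC': 3, 'MISC': 4}
    num_to_tag = {v: k for k, v in tag_to_num.items()}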

Our training data has two columns: the first is the word and the second is its label.

    EU        ORG
    rejects   O
    German    MISC
    Peter     PER
    BRUSSELS  LOC

(Here "rejects" is not an entity, so its label is O, i.e. category 0.)

2. Model:

Next we train it using a deep neural network.

The model is as follows:

The input x^(t) is a context window of size 3 centered on the word x_t. Each word is a one-hot vector; multiplying it by the embedding matrix L yields the corresponding word vector, of length d = 50:
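
A sketch of the formula in standard window-model notation, writing e^(t) for the concatenated window vector:

$$ e^{(t)} = \left[\, x_{t-1} L,\; x_t L,\; x_{t+1} L \,\right] \in \mathbb{R}^{1 \times 150} $$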

We build a neural network with a single hidden layer of dimension 100; the prediction ŷ has dimension 5:
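
In the same notation, the hidden layer and the prediction are:

$$ h = \tanh\!\left(e^{(t)} W + b_1\right), \quad W \in \mathbb{R}^{150 \times 100},\; b_1 \in \mathbb{R}^{100} $$

$$ \hat{y} = \mathrm{softmax}\!\left(h U + b_2\right), \quad U \in \mathbb{R}^{100 \times 5},\; b_2 \in \mathbb{R}^{5} $$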

Use cross entropy to calculate the error:
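
With y the one-hot true label, the cross-entropy loss can be written as:

$$ J(\theta) = CE(y, \hat{y}) = -\sum_{i=1}^{5} y_i \log \hat{y}_i $$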

Differentiating J with respect to each parameter (U, b_2, W, b_1, and the embeddings L) gives the following gradients:
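
These are the standard gradients for a tanh hidden layer with a softmax cross-entropy output (row-vector convention; L2 terms omitted):

$$ \delta_2 = \hat{y} - y $$

$$ \frac{\partial J}{\partial U} = h^{\top} \delta_2, \qquad \frac{\partial J}{\partial b_2} = \delta_2 $$

$$ \delta_1 = \left(\delta_2 U^{\top}\right) \circ \left(1 - h \circ h\right) $$

$$ \frac{\partial J}{\partial W} = \left(e^{(t)}\right)^{\top} \delta_1, \qquad \frac{\partial J}{\partial b_1} = \delta_1, \qquad \frac{\partial J}{\partial e^{(t)}} = \delta_1 W^{\top} $$

The last gradient flows back into the three rows of L selected by the window.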

In TensorFlow, this differentiation is performed automatically. Here, the Adam optimization algorithm is used to update the parameters, iterating until the loss converges.

3. Specific implementation

In def test_NER(), we run max_epochs iterations. In each epoch we train the model on the training data to get train_loss and train_acc, then use the model to predict the validation data to get val_loss and predictions. We keep the parameter weights corresponding to the smallest val_loss, and finally use those parameters to predict the category labels of the test data:

    import os
    import time

    import tensorflow as tf

    def test_NER():
      config = Config()
      with tf.Graph().as_default():
        model = NERModel(config)  # The main class

        init = tf.initialize_all_variables()
        saver = tf.train.Saver()

        with tf.Session() as session:
          best_val_loss = float('inf')  # Track the best (smallest) validation loss and its epoch
          best_val_epoch = 0

          session.run(init)
          for epoch in xrange(config.max_epochs):
            print 'Epoch {}'.format(epoch)
            start = time.time()
            ###
            train_loss, train_acc = model.run_epoch(session, model.X_train,
                                                    model.y_train)  # 1. Train on the training data, get loss and accuracy
            val_loss, predictions = model.predict(session, model.X_dev,
                                                  model.y_dev)  # 2. Predict the dev data with this model, get loss and predictions
            print 'Training loss: {}'.format(train_loss)
            print 'Training acc: {}'.format(train_acc)
            print 'Validation loss: {}'.format(val_loss)
            if val_loss < best_val_loss:  # Use the loss on the dev data to find the minimum loss
              best_val_loss = val_loss
              best_val_epoch = epoch
              if not os.path.exists("./weights"):
                os.makedirs("./weights")

              saver.save(session, './weights/ner.weights')  # Save the weights corresponding to the minimum loss
            if epoch - best_val_epoch > config.early_stopping:
              break
            ###
            confusion = calculate_confusion(config, predictions, model.y_dev)  # 3. Compute the confusion matrix of the predictions against the dev labels
            print_confusion(confusion, model.num_to_tag)
            print 'Total time: {}'.format(time.time() - start)

          saver.restore(session, './weights/ner.weights')  # Reload the saved weights and predict on the test data
          print 'Test'
          print '=-=-='
          print 'Writing predictions to q2_test.predicted'
          _, predictions = model.predict(session, model.X_test, model.y_test)
          save_predictions(predictions, "q2_test.predicted")  # Save the prediction results

    if __name__ == "__main__":
      test_NER()

4. How is the model trained?

First, load the training, validation, and test data:

    # Load the training set
    docs = du.load_dataset('data/ner/train')

    # Load the dev set (for tuning hyperparameters)
    docs = du.load_dataset('data/ner/dev')

    # Load the test set (dummy labels only)
    docs = du.load_dataset('data/ner/test.masked')

Each word, represented as a one-hot vector, is then converted into its word vector:

    def add_embedding(self):
      # The embedding lookup is currently only implemented for the CPU
      with tf.device('/cpu:0'):
        embedding = tf.get_variable('Embedding',
                                    [len(self.wv), self.config.embed_size])  # L in the assignment
        window = tf.nn.embedding_lookup(embedding, self.input_placeholder)  # Look up the word vectors of the context window directly in L
        window = tf.reshape(
            window, [-1, self.config.window_size * self.config.embed_size])

        return window
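
As an aside, the lookup is mathematically the same as multiplying a one-hot vector by L; a minimal NumPy sketch (the names here are illustrative):

    import numpy as np

    vocab_size, embed_size = 100, 50
    L = np.random.randn(vocab_size, embed_size)  # embedding matrix

    word_id = 7
    one_hot = np.zeros(vocab_size)
    one_hot[word_id] = 1.0

    # Multiplying the one-hot vector by L selects row 7 of L,
    # which is exactly what an embedding lookup does
    assert np.allclose(np.dot(one_hot, L), L[word_id])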

Build the network layers, using Xavier initialization for the first layer, with L2 regularization and dropout to reduce overfitting:

    def add_model(self, window):

      with tf.variable_scope('Layer1', initializer=xavier_weight_init()) as scope:  # Initialize the first layer with Xavier
        W = tf.get_variable(  # This layer has W, b1, h
            'W', [self.config.window_size * self.config.embed_size,
                  self.config.hidden_size])
        b1 = tf.get_variable('b1', [self.config.hidden_size])
        h = tf.nn.tanh(tf.matmul(window, W) + b1)
        if self.config.l2:  # L2 regularization for W
          tf.add_to_collection('total_loss',
                               0.5 * self.config.l2 * tf.nn.l2_loss(W))

      with tf.variable_scope('Layer2', initializer=xavier_weight_init()) as scope:
        U = tf.get_variable('U', [self.config.hidden_size, self.config.label_size])
        b2 = tf.get_variable('b2', [self.config.label_size])
        y = tf.matmul(h, U) + b2
        if self.config.l2:  # L2 regularization for U
          tf.add_to_collection('total_loss',
                               0.5 * self.config.l2 * tf.nn.l2_loss(U))
      output = tf.nn.dropout(y, self.dropout_placeholder)  # Apply dropout to the output

      return output

For more on what L2 regularization and dropout are and how they reduce overfitting, please read this blog post, which summarizes them simply and clearly.
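
For reference, the penalty added in add_model corresponds, up to constant factors absorbed into the coefficient $\lambda$, to:

$$ J_{\text{total}} = CE(y, \hat{y}) + \frac{\lambda}{2} \left( \lVert W \rVert_F^2 + \lVert U \rVert_F^2 \right) $$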

Use cross entropy to calculate loss:

    def add_loss_op(self, y):

      cross_entropy = tf.reduce_mean(  # 1. Key step: the loss is defined with cross entropy
          tf.nn.softmax_cross_entropy_with_logits(y, self.labels_placeholder))  # y is the model's prediction; compute the cross entropy
      tf.add_to_collection('total_loss', cross_entropy)  # Stores the value in the collection with the given name.
      # Collections are not sets, so a value can be added to a collection several times.
      loss = tf.add_n(tf.get_collection('total_loss'))  # Adds all input tensors element-wise (a list of tensors with the same shape and type)

      return loss

Then use Adam Optimizer to minimize the loss:

    def add_training_op(self, loss):

      optimizer = tf.train.AdamOptimizer(self.config.lr)
      global_step = tf.Variable(0, name='global_step', trainable=False)
      train_op = optimizer.minimize(loss, global_step=global_step)  # 2. Key step: use AdamOptimizer to minimize the loss

      return train_op
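
For context, each training step then runs train_op in the session, roughly like this (a sketch of how run_epoch might feed the placeholders; the attribute names self.train_op and self.loss are assumptions):

    # Hypothetical usage inside run_epoch: each call performs one gradient update
    feed = {self.input_placeholder: x_batch,
            self.labels_placeholder: y_batch,
            self.dropout_placeholder: self.config.dropout}
    _, batch_loss = session.run([self.train_op, self.loss], feed_dict=feed)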

After each training run, the weights that minimize the loss are obtained.

In this way, the NER classification problem is solved. Of course, to further improve accuracy we still need to study the literature. Next time, we will implement an RNN.
