Automatic information extraction in Vietnamese text

San Chanrathany, Lê Thanh Hương, Nguyễn Thanh Thủy, Nguyễn Hữu Thiệu


This paper presents semi-supervised approaches to constructing a Vietnamese information extraction system. For named entity extraction, our approach builds on the idea of Liao and extends it by using proper name coreference rules to find new entities. The new entities are added to the training set so that the extraction module can learn new context features. The experimental results show that our method achieves higher accuracy than Liao's. For relation extraction, we improve the Shallow Linguistic Kernel (SLK) of Giuliano et al. by modifying the window size of the kernel and using additional features to represent sentences, including part of speech, other entity types, and a dictionary of compound verbs. Our experimental results show that the supervised method using our SLK achieves higher accuracy than the one used by Giuliano et al. Moreover, the semi-supervised method achieves higher accuracy than the supervised one.
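To make the role of the window size concrete, the following is a minimal sketch of window-based context features around an entity, with a simple overlap count standing in for a kernel. It is an illustration under stated assumptions, not the authors' implementation: the function names `window_features` and `overlap_kernel`, the feature string format, and the sample sentence with its part-of-speech tags are all hypothetical, and the entity-type and compound-verb dictionary features mentioned above are omitted.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); the tag set is assumed


def window_features(tokens: List[Token], index: int, window: int) -> List[str]:
    """Collect word and POS features within `window` positions of tokens[index].

    Hypothetical helper: each feature records its offset from the entity,
    so the same word at a different distance is a different feature.
    """
    features = []
    start = max(0, index - window)
    end = min(len(tokens), index + window + 1)
    for position in range(start, end):
        offset = position - index
        word, tag = tokens[position]
        features.append(f"w[{offset}]={word.lower()}")
        features.append(f"pos[{offset}]={tag}")
    return features


def overlap_kernel(a: List[str], b: List[str]) -> int:
    """Count shared features (a linear kernel over binary feature sets).

    This is a simplified stand-in, not the Shallow Linguistic Kernel itself.
    """
    return len(set(a) & set(b))


# Illustrative sentence only; words and tags are made up for the example.
sentence = [("Ông", "N"), ("Nam", "Np"), ("làm_việc", "V"), ("tại", "E"), ("FPT", "Np")]
narrow = window_features(sentence, index=1, window=1)
wide = window_features(sentence, index=1, window=2)
```

A wider window yields more features per entity, which changes how often two contexts overlap; this is the kind of trade-off that tuning the window size adjusts.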


Journal of Computer Science and Cybernetics ISSN: 1813-9663

Published by Vietnam Academy of Science and Technology