Train Corpus For Ner With Nltk Ieer Or Conll2000 Corpus
Solution 1:
The nltk provides everything you need. Read the nltk book's chapter 6, on Learning to Classify Text. It gives you a worked example of classification. Then study sections 2 and 3 from Chapter 7, which show you how to work with IOB text and write a chunking classifier. Although the example application is not named entity recognition, the code examples should need almost no changes to work (although of course you'll need a custom feature function to get decent performance.)
You can also use the nltk's tagger (or another tagger) to add POS tags to your corpus, or you could take your chances and try to train a classifier on data without part-of-speech tags (just the IOB named entity categories). My guess is that POS tagging will improve performance, and you're actually much better off if the same POS tagger is used on the training data as for evaluation (and eventually production use).
Post a Comment for "Train Corpus For Ner With Nltk Ieer Or Conll2000 Corpus"