Yarowsky Algorithm


Lambert M. Surhone, Mariam T. Tennoe, Susan F. Henssonow

     

Paperback



Publisher: Книга по требованию (Book on Demand)
Release date: July 2011
ISBN: 978-6-1313-2556-4
Length: 68 pages
Weight: 123 g
Dimensions (H x W x D), cm: 23 x 16 x 1

High Quality Content by WIKIPEDIA articles! In computational linguistics, the Yarowsky algorithm is an unsupervised learning algorithm for word sense disambiguation that exploits two properties of human languages: "one sense per collocation" and "one sense per discourse". Empirically, a word tends to exhibit only one sense within a given discourse and within a given collocation. The algorithm starts with a large, untagged corpus, in which it identifies examples of the given polysemous word and stores each relevant sentence as a line. For instance, Yarowsky uses the word "plant" in his 1995 paper to demonstrate the algorithm. Assuming the word has two possible senses, the next step is to identify a small number of seed collocations representative of each sense, give each sense a label (e.g. sense A and sense B), and assign the appropriate label to all training examples containing the seed collocations. In this case, the words "life" and "manufacturing" are chosen as the initial seed collocations for senses A and B respectively. The residual examples (85%–98% according to Yarowsky) remain untagged.
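The seed-labelling step described above can be illustrated with a minimal Python sketch. It covers only the initial tagging and not the later iterations of the algorithm. The function names (`tokenize`, `seed_label`), the toy corpus, and the exact-token matching rule are assumptions made for illustration; only the target word "plant" and the seeds "life" and "manufacturing" come from the description.

```python
import re

# Seed collocations from the description: "life" -> sense A,
# "manufacturing" -> sense B. The dictionary layout is an assumption.
SEEDS = {"life": "A", "manufacturing": "B"}


def tokenize(line):
    """Lowercase a line and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", line.lower())


def seed_label(lines, target="plant", seeds=SEEDS):
    """Return (line, label) pairs for lines that contain the target word.

    The label is "A" or "B" when the seed words found in the line all
    point to one sense, and None (untagged residual) otherwise.
    """
    labelled = []
    for line in lines:
        tokens = set(tokenize(line))
        if target not in tokens:
            continue  # not an example of the polysemous word
        senses = {seeds[w] for w in tokens if w in seeds}
        label = senses.pop() if len(senses) == 1 else None
        labelled.append((line, label))
    return labelled


# Toy corpus, invented for this sketch.
corpus = [
    "Plant life in the rainforest is remarkably diverse.",
    "The manufacturing plant closed after forty years.",
    "The plant was moved to a sunnier window.",
    "Heavy rain fell throughout the night.",
]

result = seed_label(corpus)
for line, label in result:
    print(label, "|", line)
```

In this toy corpus the first sentence is tagged A, the second is tagged B, the third stays untagged as a residual example, and the fourth is skipped because it does not contain the target word. Matching on exact tokens is a simplification: inflected forms such as "plants" would be missed.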

This edition is not an original. The book is printed using print-on-demand technology after an order is received.
