Tokenization

Tokenization

Lambert M. Surhone, Mariam T. Tennoe, Susan F. Henssonow

     

бумажная книга



Издательство: Книга по требованию
Дата выхода: июль 2011
ISBN: 978-6-1336-2666-9
Объём: 148 страниц
Масса: 246 г
Размеры(В x Ш x Т), см: 23 x 16 x 1

High Quality Content by WIKIPEDIA articles! Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining. Tokenization is useful both in linguistics (where it is a form of text segmentation), and in computer science, where it forms part of lexical analysis.In languages such as English (and most programming languages) where words are delimited by whitespace, this approach is straightforward. However, tokenization is more difficult for languages such as Chinese which have no word boundaries.

Данное издание не является оригинальным. Книга печатается по технологии принт-он-деманд после получения заказа.

Каталог