Publisher: Книга по требованию
Release date: July 2011
ISBN: 978-6-1336-2666-9
Length: 148 pages
Weight: 246 g
Dimensions (H x W x D), cm: 23 x 16 x 1
High-quality content from Wikipedia articles! Tokenization is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes the input for further processing such as parsing or text mining. Tokenization is useful both in linguistics, where it is a form of text segmentation, and in computer science, where it forms part of lexical analysis. In languages such as English (and most programming languages), where words are delimited by whitespace, tokenization is straightforward: the text can simply be split at whitespace and punctuation. However, tokenization is more difficult for languages such as Chinese, which have no word boundaries.
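As a minimal sketch of the whitespace-and-punctuation approach described above (the function name and regular expression are illustrative choices, not taken from the book), such a tokenizer for English-like text can be written in a few lines of Python:

    import re

    def tokenize(text: str) -> list[str]:
        """Split text into word and punctuation tokens.

        Works for whitespace-delimited languages such as English;
        languages like Chinese, which lack word boundaries, need
        dictionary- or model-based segmentation instead.
        """
        # \w+ matches runs of word characters; [^\w\s] matches a
        # single punctuation symbol, so each becomes its own token.
        return re.findall(r"\w+|[^\w\s]", text)

    if __name__ == "__main__":
        print(tokenize("Tokenization is useful, isn't it?"))
        # ['Tokenization', 'is', 'useful', ',', 'isn', "'", 't', 'it', '?']

Note how even English already strains this simple rule: the contraction "isn't" is split into three tokens, which is why practical tokenizers add language-specific handling on top of delimiter-based splitting.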
This edition is not an original. The book is printed on demand after an order is received.