The technique and framework of language models construction for real-time Internet monitoring

Abstract

Nosko V.I., Svechkarev V.P., Rozin M.D.

Date received: 25.11.2015

The highly dynamic nature of online extremism makes it increasingly important to develop methods and software tools that can track the dissemination of information in real time, including in social networks, analyze the meaning and prospects of posts, and build predictive models. The article presents a technique for designing smart linguistic models that take context into account and can be flexibly adapted to a subject area. These models are intended for natural language processing of social network content in the field of real-time, Data Mining-based early warning of cyber-threats. We describe the main disadvantages of simple feature engineering and of the bag-of-words method for text classification. We present the programming interface and features of the generalized framework in which our technique is applied, and show how this framework can help business and government address the tasks of collecting and analyzing publications on the Internet.
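The abstract does not list which disadvantages of bag-of-words the article discusses. As a hedged illustration of one widely known limitation, the minimal Python sketch below shows that a bag-of-words representation discards word order. As a result, two sentences with different meanings can map to the same feature vector, while even simple bigram features retain some local context. This is not the authors' framework or API. The functions `bag_of_words`, `to_vector`, and `bigrams` and the example sentences are hypothetical, chosen for illustration only.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper (not from the article): unordered token counts."""
    # Naive lowercase + whitespace tokenization; a real pipeline would also
    # apply morphology analysis (e.g. lemmatization) before counting.
    return Counter(text.lower().split())


def to_vector(bow: Counter, vocabulary: list) -> list:
    """Project token counts onto a fixed vocabulary order."""
    return [bow[token] for token in vocabulary]


def bigrams(text: str) -> Counter:
    """Hypothetical helper: counts of adjacent token pairs (keeps local order)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


a = "police arrested the extremists"
b = "the extremists arrested police"

# Shared vocabulary for both sentences, in sorted order.
vocab = sorted(set(a.lower().split()) | set(b.lower().split()))
va = to_vector(bag_of_words(a), vocab)
vb = to_vector(bag_of_words(b), vocab)

print(va == vb)                    # True: bag-of-words cannot tell them apart
print(bigrams(a) == bigrams(b))    # False: bigrams capture some word order
```

Bigram counts are used here only as the simplest way to reintroduce word order. They are not the context-aware modelling technique the article proposes.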

Keywords: natural language processing, linguistic models, machine learning, feature engineering, text mining framework, text classification, language models constructor, morphology analysis