The technique and framework of language models construction for real-time Internet monitoring
Abstract
Incoming article date: 25.11.2015
The highly dynamic nature of online extremism increases the importance of developing methodology and software tools capable of tracking the dissemination of information in real time, including in social networks, analyzing the meaning and intent of posts, and building predictive models. The article presents a technique for designing smart linguistic models that take context into account and adapt flexibly to the subject area, for natural language processing in social networks in the field of real-time cyber-threat warning based on Data Mining. We describe the basic disadvantages of using simple feature engineering and the bag-of-words method for text classification. We show the programming interface and features of the generalized framework in which our technique is applied, and also show how this framework can be used to meet the challenges of business and government in collecting and analyzing publications on the Internet.
Keywords: natural language processing, linguistic models, machine learning, feature engineering, text mining framework, text classification, language models constructor, morphology analysis
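One well-known disadvantage of the bag-of-words method mentioned in the abstract is that it discards word order, so texts with opposite meanings can receive identical representations. The following is a minimal sketch of that effect, not the framework described in the article; the function name `bag_of_words` and both example posts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens, ignoring word order.

    Hypothetical helper for illustration; not part of the article's framework.
    """
    return Counter(text.lower().split())


# Two invented posts with opposite meanings built from the same words.
post_a = "the police stopped the attackers"
post_b = "the attackers stopped the police"

# Both posts map to the same multiset of words, so a classifier that sees
# only these counts cannot tell them apart.
same_bag = bag_of_words(post_a) == bag_of_words(post_b)
print(same_bag)
```

Because the two counters compare equal, any model trained purely on such counts treats the posts as the same input, which is the kind of context loss that context-aware linguistic models aim to avoid.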