A heavy workload cannot keep enthusiastic professionals from doing extra research, and the launch of the NaukaLabs corporate project is proof of that. Our research is aimed at identifying ideas with high business potential and at developing advanced technologies. The first project is a neural network.
So much has been written about neural networks that any detailed explanation of the technology would only repeat well-known facts. Therefore, we will give just a brief explanation of what a neural network is.
The structure of the neural network came to the programming world from biology:
The human brain contains billions of neurons that transmit information by means of impulses. A biological neural network is a set of neurons connected by synapses, which can amplify or weaken the electrical signal passing through them. The mathematical model of artificial neural networks was based on these biological networks of nerve cells in a living organism. An artificial neural network implemented as a program can analyze and even remember information.
Studying the neural processes in the brain and trying to model them led to the notion of an artificial neuron. The role of a synapse is played by a weight that scales the value arriving at the neuron’s input. The neuron sums all the weighted signals and passes the sum through an activation function, which produces the output.
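The artificial neuron described above can be sketched in a few lines of Python. This is only an illustration: the sigmoid activation is borrowed from the network described later in this article, while the bias term and the sample numbers are assumptions made for the example.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through the activation function."""
    # Each weight plays the role of a synapse: it amplifies or weakens its input.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Hypothetical values, chosen only to show the call.
print(neuron_output([1.0, 0.0, 1.0], [0.5, -0.3, 0.8]))
```

A positive weighted sum pushes the output above 0.5, a negative one pushes it below.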
Neural networks can contain one or more layers. The picture below illustrates the simplest case: the circles denote the input layer, which is not counted as a layer of the neural network, and on the right there is a layer of ordinary neurons. In a single-layer neural network, signals from the input layer go straight to the output layer, which performs the necessary calculations and immediately produces the output.
A multilayer neural network has a more complicated structure. The work of its hidden layers can be compared with a large factory. The product, that is, the output signal, is made in stages: each machine processes it and yields an intermediate result. In the same way, the hidden layers transform the input signals into various intermediate results.
Feedback (recurrent) networks are particularly interesting. With their help, you can restore or complete signals. In other words, such neural networks have a kind of short-term memory, similar to that of a human.
For what purposes is the neural network useful? We will list the main types of tasks:
- Classification. The technology greatly simplifies sorting data by parameters. For example, given a set of people as input, we need to decide who can be given credit and who will be rejected. A neural network can do this work effectively by analyzing age, solvency, credit history, etc.
- Regression analysis. In other words, approximation and prediction. An example is predicting the next rise or fall in share prices based on the current situation on the stock market.
- Clustering. This task is very similar to classification, but its result is a set of clusters, that is, classes that were not defined beforehand. In other words, clustering is a classification performed by the neural network itself: it is impossible to determine in advance which classes will be obtained, but it is possible to reveal hidden or unnoticed dependencies between the objects under study.
Before a neural network can be used, it needs to be trained. Training consists in searching for a set of weights with which the input signals, after passing through the network, produce the desired outputs. There are two main ways:
- Training with a teacher (supervised learning): the weights are changed so that the network’s responses differ as little as possible from the known correct answers.
- Training without a teacher (unsupervised learning): the network itself classifies the input signals. The correct (reference) output signals are not provided.
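To make training with a teacher more concrete, here is a minimal sketch that fits a single sigmoid neuron to the logical OR function. The task, the learning rate and the number of epochs are assumptions chosen for illustration; this is not the procedure used in our project.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_supervised(samples, targets, epochs=2000, learning_rate=0.5):
    """Nudge the weights so that outputs move towards the known answers."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            # Distance between the teacher's answer and the network's answer.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 1, 1, 1]  # logical OR, used here as toy "correct answers"
weights, bias = train_supervised(samples, targets)
print([round(predict(weights, bias, s)) for s in samples])
```

Training without a teacher has no `targets` list at all; the network has to group the inputs on its own.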
In the last six months alone, users have sent about 5700 messages to the NAUKA support department. Sorting them manually would be time-consuming and dull; it is more fun to create a neural network that classifies emails by type.
Let’s list the main reasons why users contact support:
- The message describes a software failure.
- New requirements (wishes). The message concerns new requirements for software functionality or a wish to change the data.
- Training or consultation is required. The user lacks knowledge of a program’s functionality or of a business process.
- The reason cannot be determined. The message is not directly related to software: for example, congratulations and season’s greetings, messages sent by mistake, etc.
Let’s break down the creation of the neural network into stages:
- to build a neural network that classifies emails based on the text of a message;
- to create a neural network that classifies emails based on errors in the log;
- to make a neural network for image processing to determine if the image contains an error box;
- to combine the resulting neural networks, either using them together or building a generalized network on their basis.
At the time of writing, NAUKA has implemented the first step, that is, sorting messages based on their content. This task was also carried out in several stages:
- creating a dictionary,
- selecting a neural network model,
- network training,
- checking performance of the network by determining the percentage of correct answers.
Let’s look at each stage in more detail.
Creating a Neural Network Dictionary
The dictionary is the full set of meaningful words found in the training sample. Not every word in the text is significant, so the set needs to be processed.
Prepositions, interjections and words from the blacklist (“hello”, “thank you”, etc.) should be excluded from the dictionary. It is also necessary to delete words containing numbers and to reduce the remaining words to a common form, i.e. to “normalize” them. For that we used Porter’s stemmer, an algorithm that strips endings and suffixes so that only word stems are processed, provided the language allows it. Words that occur in fewer than four messages are excluded, and each remaining word is assigned the number of an input variable.
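The dictionary steps above can be sketched roughly as follows. Note the assumptions: the blacklist is made up, and `crude_stem` is a deliberately naive stand-in that strips a few English suffixes; it is not Porter’s stemmer and not our production code.

```python
import re
from collections import Counter

# Hypothetical blacklist; the real one is not reproduced in this article.
STOP_WORDS = {"hello", "thank", "you", "the", "a", "on", "in", "at", "oh"}

def crude_stem(word):
    """Naive stand-in for Porter's stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text):
    """Split text into lowercase words, drop noise words, return stems."""
    stems = []
    for word in re.findall(r"\w+", text.lower()):
        if word in STOP_WORDS or any(ch.isdigit() for ch in word):
            continue  # blacklisted word or word containing a number
        stems.append(crude_stem(word))
    return stems

def build_dictionary(messages, min_messages=4):
    """Map each stem found in at least `min_messages` messages to an index."""
    message_counts = Counter()
    for text in messages:
        message_counts.update(set(normalize(text)))  # count each message once
    kept = sorted(w for w, n in message_counts.items() if n >= min_messages)
    return {word: index for index, word in enumerate(kept)}
```

The returned mapping gives every surviving stem the number of its input variable, starting from zero.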
As a result, 1111 significant words were obtained. However, the neural network cannot work directly with words, so the text of each message must be turned into an array of numerical values.
At the input, we split the text of a message into words, normalized them and checked whether each one was present in the dictionary. If a word was present, we set the variable with the corresponding number to 1. At the output, we got an array of 1111 elements made up of zeros and ones:
(1, 0, 0, 1, …, 0, 1, 0)
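A vector like the one above could be produced along these lines. The three-word dictionary and the default `normalize` function (plain lowercasing) are assumptions for the example; a real run would use the full 1111-word dictionary and the same normalization that built it.

```python
import re

def encode_message(text, dictionary, normalize=str.lower):
    """Turn a message into a list of 0/1 flags, one per dictionary word."""
    vector = [0] * len(dictionary)
    for token in re.findall(r"\w+", text):
        index = dictionary.get(normalize(token))
        if index is not None:
            vector[index] = 1  # the word is present, however many times
    return vector

# Hypothetical miniature dictionary: word -> number of the input variable.
dictionary = {"crash": 0, "export": 1, "report": 2}
print(encode_message("Report will not open after the crash", dictionary))
# [1, 0, 1]
```

Only the presence of a word is recorded, so repeating a word in the message does not change the vector.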
Building a Neural Network
Doing this from scratch would take quite a while, so we decided to use the Encog 3.3 library, a free open-source framework for Java and .NET. We used a feedforward network, in which signals travel in one direction only, from input to output. We do not want to overload our readers with mathematics, so for those who are interested we will just mention that the sigmoid function was chosen as the activation function.
The experts at NAUKA investigated three variants of a neural network – with two hidden layers of neurons, with one hidden layer and without any hidden layers.
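For readers who prefer code to words, here is a plain-Python sketch of a feedforward pass for the three variants. It does not use Encog. The 1111 inputs come from the dictionary; the four outputs (one per message type) and the hidden layer sizes of 50 and 20 are assumptions, as is the random weight initialization.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_network(layer_sizes, seed=0):
    """Create small random weights; the last item of each row is the bias."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

def forward(network, inputs):
    """Push the signal through every layer, from input to output."""
    activations = list(inputs)
    for layer in network:
        activations = [
            # zip stops at the last input, so row[-1] (the bias) is added once.
            sigmoid(sum(w * a for w, a in zip(row, activations)) + row[-1])
            for row in layer
        ]
    return activations

variants = {
    "no hidden layers": [1111, 4],
    "one hidden layer": [1111, 50, 4],       # 50 is a hypothetical size
    "two hidden layers": [1111, 50, 20, 4],  # so are 50 and 20
}
message = [0] * 1111
for name, sizes in variants.items():
    print(name, len(forward(make_network(sizes), message)))
```

Each variant returns four values between 0 and 1; the largest one can be read as the predicted message type.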
Training the Neural Network
As mentioned above, our data came from the NAUKA support service: we used 3416 messages sent in the last six months as samples. 75% of this data was fed to the neural network for training, and the remaining 25% was held back for verification.
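A 75/25 split of this kind might look as follows. Shuffling before the split and the fixed seed are assumptions; the article does not depend on how exactly the messages were divided.

```python
import random

def split_samples(samples, train_fraction=0.75, seed=42):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Message indices stand in for the real messages here.
train, test = split_samples(range(3416))
print(len(train), len(test))  # 2562 854
```

The test part is never shown to the network during training, which is what makes the later accuracy figures meaningful.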
We trained the network to an allowable error of 1% using the Resilient Propagation (RProp) learning algorithm. Unlike other algorithms based on gradient descent, RProp uses only the signs of the partial derivatives, not their values.
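The idea of using only the signs of the derivatives can be illustrated with a simplified variant of the algorithm (the one often called iRprop-). This is a sketch, not Encog’s implementation: the constants are commonly used defaults, and the quadratic test function is an assumption made for the example.

```python
def sign(x):
    return (x > 0) - (x < 0)

def rprop_minimize(gradient, weights, iterations=100,
                   step_init=0.1, step_min=1e-6, step_max=50.0,
                   grow=1.2, shrink=0.5):
    """Minimize a function given only its gradient, using sign-based steps."""
    weights = list(weights)
    steps = [step_init] * len(weights)
    prev_grad = [0.0] * len(weights)
    for _ in range(iterations):
        grad = gradient(weights)
        for i in range(len(weights)):
            agreement = grad[i] * prev_grad[i]
            if agreement > 0:    # same direction as last time: speed up
                steps[i] = min(steps[i] * grow, step_max)
            elif agreement < 0:  # overshot the minimum: slow down, skip a move
                steps[i] = max(steps[i] * shrink, step_min)
                grad[i] = 0.0
            # Only the sign of the derivative decides the direction.
            weights[i] -= sign(grad[i]) * steps[i]
            prev_grad[i] = grad[i]
    return weights

def toy_gradient(w):
    """Gradient of (w0 - 3)^2 + 10 * (w1 + 1)^2, a made-up test function."""
    return [2 * (w[0] - 3), 20 * (w[1] + 1)]

print(rprop_minimize(toy_gradient, [0.0, 0.0]))
```

The second coordinate has a gradient ten times larger than the first, yet both are handled the same way, because the magnitude of the derivative never enters the update.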
Testing revealed that only 55% of the messages submitted to the trained network were classified correctly. The team wondered: “Is it possible to improve this indicator?”
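The indicator in question is simply the share of test messages whose predicted type matches the real one. A minimal sketch, with made-up labels:

```python
def accuracy_percent(predicted, expected):
    """Percentage of positions where the prediction equals the right answer."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)

# Hypothetical labels for four test messages.
expected = ["failure", "wish", "training", "unknown"]
predicted = ["failure", "wish", "failure", "unknown"]
print(accuracy_percent(predicted, expected))  # 75.0
```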
One possible explanation is overfitting (overtraining): a phenomenon in which the constructed model explains the examples from the training sample well but works relatively poorly on examples that did not take part in training. When a network is trained to a minimal error, it generalizes worse to data it has not seen.
Next, we trained the neural network to an error of 8% (in practice the maximum was 7.2%). The test gave a better result: 65% of the messages were classified correctly. At this stage we noticed that messages with the same content are mostly sent to the same modules, so we expanded the dictionary by adding module signatures. The result was 68.6%.
It is worth noting that the multilayer and single-layer neural networks gave the same result when tested. This suggests that the initial data set is linearly separable. Hence, in practice it is acceptable to use a neural network without hidden layers.
And yet, can the precision of the forecast be improved? Yes, it can, if you take into account hidden dependencies that our programmers were not able to identify. This task can be handled well by a neural network that solves the clustering problem on its own. To do this, we moved away from feedforward networks. This time we used a Kohonen network: this class of networks, as a rule, learns without a teacher and is successfully used in recognition problems.
Having chosen the training algorithm called Neighborhood Competitive Training, we investigated networks with 10, 20, 40, 60, 80 and 100 clusters. We settled on 80 clusters: testing showed a forecast accuracy of 64.98%. Thus, clustering gave results similar to those obtained earlier with supervised training.
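The general idea of a Kohonen network can be sketched as below. This is a toy one-dimensional map in plain Python, not Encog’s Neighborhood Competitive Training. The learning-rate schedule, the fixed neighbourhood radius and, in particular, the way clusters are turned into predictions (each cluster takes the most frequent label among its training messages) are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def best_matching_unit(units, sample):
    """Index of the unit whose weight vector is closest to the sample."""
    def sq_dist(unit):
        return sum((w - x) ** 2 for w, x in zip(unit, sample))
    return min(range(len(units)), key=lambda j: sq_dist(units[j]))

def train_som(samples, n_clusters, epochs=20, learning_rate=0.5, radius=1, seed=0):
    """Train a one-dimensional Kohonen map without using any labels."""
    rng = random.Random(seed)
    dim = len(samples[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_clusters)]
    for epoch in range(epochs):
        rate = learning_rate * (1 - epoch / epochs)  # decays towards zero
        for sample in samples:
            winner = best_matching_unit(units, sample)
            lo, hi = max(0, winner - radius), min(n_clusters - 1, winner + radius)
            for j in range(lo, hi + 1):
                pull = rate if j == winner else rate * 0.5  # neighbours move less
                units[j] = [w + pull * (x - w) for w, x in zip(units[j], sample)]
    return units

def label_clusters(units, samples, labels):
    """Give each cluster the most frequent label among its training samples."""
    votes = defaultdict(Counter)
    for sample, label in zip(samples, labels):
        votes[best_matching_unit(units, sample)][label] += 1
    return {j: counts.most_common(1)[0][0] for j, counts in votes.items()}

def predict(units, cluster_labels, sample, default=None):
    """Predict the label of the cluster the sample falls into."""
    return cluster_labels.get(best_matching_unit(units, sample), default)
```

Labels are used only after training, to name the clusters; the map itself is built from the message vectors alone.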
To sum up, the text of a message alone is not enough to predict its type. The forecast may be inaccurate because messages with identical texts can belong to different types. Therefore, the NaukaLabs project has a long way to go: we still face the task of creating a neural network that classifies errors in the log, a neural network that analyzes images, and a combination of these technologies.
The prospects for such programs are very broad, even within just one of our projects. For example, a neural network can analyze texts from users of a site for banned materials: the sale of questionable drugs, obscene language, extremism, etc. It can help the support service draft possible responses to users’ queries. The technology will make it easier to search for people by photo and to analyze images to identify pornography. And this is only a small part of what neural networks can do. We will try them out ourselves and report new cases to you!