New Neural Networks Coming to PA

Last Friday I got the crazy idea to update the neural networks used by the AI. Why, you ask? Well, during work Thursday I watched a live stream by Alex Champandard that went over modern neural network techniques. There were a couple of things mentioned during the live stream that caught my attention. One was that neural networks are making a comeback. The second was a new (at least to me) activation function that has gained popularity.

The first point made me laugh because not very long ago I would get weird looks and questioning faces when I mentioned using neural networks for Supreme Commander 2. The second point highlighted the fact that my already limited knowledge of neural networks may be getting out of date. So, I took last Friday off of work and spent the majority of the weekend reading research papers and fiddling with the neural networks in PA, looking to see if I could improve them.

The first step was to update my neural network class to be more robust. The class I had in place was pretty rigid and unsuitable for what I had planned. The first thing I did was add support for having more than one hidden layer. It seems like a simple change, but one that was, up until now, completely unnecessary. The research I read backs that up as well. The second thing I added was support for specifying different activation functions per layer type. Sure, I could make them per layer, but for now per layer type is fine. These two changes opened up a lot of new possibilities.
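To make the idea concrete, here's a minimal sketch of what a more flexible network class could look like. This is not PA's actual code; the class name, layer layout, and activation table are all my own illustration of "multiple hidden layers plus per-layer-type activations":

```python
import math
import random

# Hypothetical activation table, keyed by name so each layer type
# (hidden vs. output) can pick its own function.
ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "relu": lambda x: max(0.0, x),
}

class NeuralNet:
    def __init__(self, layer_sizes, hidden_act="relu", output_act="sigmoid"):
        # layer_sizes = [num_inputs, hidden_1, ..., hidden_n, num_outputs],
        # so any number of hidden layers is allowed.
        self.acts = [hidden_act] * (len(layer_sizes) - 2) + [output_act]
        self.weights = [
            [[random.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
        ]

    def forward(self, inputs):
        # Feed the values through each layer, applying that layer
        # type's activation function to every node's weighted sum.
        values = inputs
        for layer_w, act_name in zip(self.weights, self.acts):
            act = ACTIVATIONS[act_name]
            values = [act(sum(w * v for w, v in zip(node_w, values)))
                      for node_w in layer_w]
        return values
```

A net with two hidden layers is then just `NeuralNet([3, 4, 4, 2])`, which is the kind of flexibility the rigid old class couldn't offer.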

Currently the live build of PA uses a trio of 3 layer multi-layer perceptrons, one each for land, fighter, and bomber/gunship platoons. The hidden and output layers of these neural networks utilize a sigmoid activation function to squash the values they receive from the previous layer. The reason just adding multiple hidden layers isn't an immediate bonus is, as I understand it, the sigmoid activation function itself, or rather its derivative. To train a neural network you need to be able to calculate an error amount for each output in the neural network. PA does this with backpropagation and gradient descent. The problem this poses is that as you propagate the error value further up the stack of layers the gradient gets tinier and tinier, causing the layers nearer to the input layer (top) to have a harder time learning than the layers closer to the output layer (bottom).
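You can see the shrinking-gradient problem with a few lines of arithmetic. The sigmoid's derivative tops out at 0.25 (at an input of 0), and backpropagation multiplies these derivatives together layer by layer, so even in the best case the error signal loses at least three quarters of its magnitude per sigmoid layer:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x == 0

# Chaining layers multiplies these derivatives together, so the
# gradient shrinks by at least 4x per sigmoid layer it passes through.
grad = 1.0
for layer in range(5):
    grad *= sigmoid_deriv(0.0)  # best-case derivative of 0.25
    print(f"after layer {layer + 1}: gradient factor = {grad}")
```

Five layers in, the gradient factor is already below 0.001, which is why the layers near the top barely learn.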

To get around this people are using what is called a rectified linear activation function. The benefits of using a rectified linear activation function are three-fold. One, the rectified linear activation function is cheaper to calculate (as is its derivative), which allows you to have more hidden nodes. Two, it allows for sparsity in the neural network, where a portion of the nodes in each layer will be inactive for a given set of inputs. Three, because a neuron's full input value is passed through rather than squashed (as long as it is positive), it allows for effective gradient descent in the upper layers.

All in all this allows for faster training times and potentially more effective neural networks. This also allows neural networks to grow upwards instead of outwards, which is more efficient to calculate. So far, I have been quite happy with the results. Once the patch that includes them goes live, I hope you will be too.