Neural nets in more detail

Since so many of you have requested more info on how the neural nets in Supreme Commander 2 work, I have decided to make another blog entry going into them in more detail. Actually, I am a bit surprised I haven't made a blog entry about them already.

Supreme Commander 2 contains 4 neural networks, each consisting of 3 layers of neurons (input, hidden, and output) and learning via backpropagation. There is a neural network for Land, Naval, Fighter, and Bomber/Gunship. The neural networks are used for AI platoon fight-or-flight decisions, which are used by the bulk of the AI's combat platoons.
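
To make that concrete, here is a minimal sketch of what such a 3-layer network could look like. The layer sizes, sigmoid activations, lack of bias terms, and all names here are my own assumptions for illustration, not the actual game code:

```python
import math
import random

class NeuralNet:
    """Minimal 3-layer (input, hidden, output) feedforward net trained with backprop."""

    def __init__(self, n_in, n_hidden, n_out, learning_rate=0.1):
        self.lr = learning_rate
        # Weight matrices initialised with small random values (no bias terms in this sketch).
        self.w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_ho = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def forward(self, inputs):
        """Feed the network forward and keep the activations for backprop."""
        self.inputs = inputs
        self.hidden = [self._sigmoid(sum(w * i for w, i in zip(row, inputs))) for row in self.w_ih]
        self.outputs = [self._sigmoid(sum(w * h for w, h in zip(row, self.hidden))) for row in self.w_ho]
        return self.outputs

    def backpropagate(self, targets):
        """Standard backprop: output deltas, hidden deltas, then weight updates."""
        out_deltas = [(t - o) * o * (1.0 - o) for t, o in zip(targets, self.outputs)]
        hid_deltas = []
        for j, h in enumerate(self.hidden):
            err = sum(out_deltas[k] * self.w_ho[k][j] for k in range(len(self.outputs)))
            hid_deltas.append(err * h * (1.0 - h))
        for k, row in enumerate(self.w_ho):
            for j in range(len(row)):
                row[j] += self.lr * out_deltas[k] * self.hidden[j]
        for j, row in enumerate(self.w_ih):
            for i in range(len(row)):
                row[i] += self.lr * hid_deltas[j] * self.inputs[i]
```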

When a platoon is formed it contacts the AI's strategic manager and requests a place to attack. The strategic manager looks at the map and chooses a place to attack based on a risk-versus-reward ratio. It also checks for pathability. Once it has chosen a spot it generates a path using good old A* and returns the path to the platoon. The platoon then sets up a series of move orders to take it to the attack location.
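
As a rough illustration of that target-selection step, something like the following could work. The scoring heuristic, attribute names, and function name are my own guesses, not the game's actual logic:

```python
def choose_attack_location(candidates, platoon_strength):
    """Pick the candidate spot with the best reward-to-risk ratio.

    Each candidate is assumed to carry a 'reward' (value of what is there),
    a 'risk' (estimated defending strength), and a 'reachable' flag from a
    pathability check. Illustrative only.
    """
    best, best_score = None, 0.0
    for spot in candidates:
        if not spot.reachable:          # skip spots the platoon cannot path to
            continue
        risk = max(spot.risk, 1.0)      # avoid division by zero
        score = spot.reward / risk
        if score > best_score:
            best, best_score = spot, score
    return best                         # an A* path would then be generated to this spot
```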

Up until now the neural networks have not even come into play (although that is one thing I would like to change in KnC). Once the platoon encounters an enemy it gathers information about the enemies in the area, and it also gathers information about allies in the area. Note the difference between live play and training: in training the platoon only gathers information about itself, while in live play it gathers information about all allies (including player-controlled units) in the area.

It then takes all of that information, feeds it into the input neurons of the neural network, and feeds the network forward. It then gathers the outputs from the neural network and evaluates them. Each output corresponds to an action that the platoon can take, such as attack structure from range or attack highest value target. Each output has a value between 0.0 and 1.0, with below 0.5 being a bad decision and above 0.5 being a good decision. If the neural network returns no good decisions, that means run away. After a small delay the enemy and ally data are gathered up again, fed to the neural network, and a new decision is made.
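
In pseudocode-ish form, the per-encounter decision might look something like this. It assumes a network object with a `forward` method as in the sketch above; the names and the exact retreat handling are my assumptions:

```python
RETREAT = "retreat"

def choose_action(net, inputs, actions):
    """Feed the gathered enemy/ally data forward and pick an action.

    'actions' is the list of possible platoon actions, one per output neuron
    (e.g. "attack structure from range", "attack highest value target").
    Outputs above 0.5 count as good decisions; if none qualify, retreat.
    """
    outputs = net.forward(inputs)
    best_idx, best_val = max(enumerate(outputs), key=lambda pair: pair[1])
    if best_val <= 0.5:
        return RETREAT          # no good options: run away
    return actions[best_idx]    # take the highest-rated good action
```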

The neural networks are trained by having the AI fight it out repeatedly and backpropagating the results. At the end of a match the game writes out the new neural networks to a file. The game runs using a few special command lines which do the following:

  • Set the game to run at +/- 50 sim speed.
  • Automatically restart the game when it ends.
  • Enable neural network training.
  • Set which map to play on.
  • Set up the AIs.

This allows me to run neural network training 24 hours a day on a dedicated machine. For Kings and Castles I want to look at having several computers running training at once and merging the results. The more iterations the neural networks get, the better they can be. This is because, during training, the platoons choose actions at random and record the results, and it can take a long time to test every action in a large set of circumstances.
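
A very rough sketch of how such an offline training loop could be structured follows. The helper functions (`simulate_encounter`, `encounters_in_match`, `save_network`) are hypothetical placeholders, and the win/loss scoring is an assumption on my part, not the game's actual code:

```python
import random

def train_encounter(net, inputs, actions):
    """During training, try a random action and learn from the outcome."""
    net.forward(inputs)                                  # run the net so backprop has activations
    action_idx = random.randrange(len(actions))          # actions are chosen at random in training
    won = simulate_encounter(actions[action_idx])        # hypothetical: True if the encounter was won
    # Target: push the chosen output toward 1.0 on a win, toward 0.0 on a loss.
    targets = list(net.outputs)
    targets[action_idx] = 1.0 if won else 0.0
    net.backpropagate(targets)

def training_run(net, actions, save_path):
    while True:                                          # the game restarts automatically when a match ends
        for inputs in encounters_in_match():             # hypothetical generator of encounter data
            train_encounter(net, inputs, actions)
        save_network(net, save_path)                     # write the updated weights out to a file
```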

Hopefully this answers most, if not all, the questions you all had.

15 comments:

BulletMagnet said... / October 2, 2010 at 6:40 PM  

Is it safe for me to say that you train your NNs based on the outcome of an entire match, not a particular sortie?

Wouldn't that leave a possibility for the NNs to perform exceptionally well, but still lose (to, say, a lucky nuke strike), and hence classify good choices as, well... bad?

Also, I don't think that EHC training can be parallelised (don't quote me on that, I've only done the one course on machine intelligence and it was only out of personal interest). I'd hazard a guess that you could train each NN on a different machine.

Sorian said... / October 2, 2010 at 6:57 PM  

Nope, the neural nets are trained per encounter. We are teaching them to recognize encounters where they can win and where they will lose.

If I am correct about what EHC stands for, then it would not apply here because we don't use a genetic algorithm for our neural nets. You may still be correct that we cannot train the neural nets in parallel, but I have not looked into it that deeply yet. It is still down the road.

BulletMagnet said... / October 2, 2010 at 7:48 PM  

Per encounter is great! :D

I probably have gaffed my terminology on EHC (Evolutionary Hill Climbing), but I meant that if you're iteratively improving the networks then you can't really split that task. If you were to introduce genetics to the situation, then you could try different variations on different machines, and judge which one worked best.

But since you have all the data there, your guess is better than mine. ;P

zeech said... / October 3, 2010 at 5:19 AM  

I asked this in the forums, but can you train your NNs on datasets such as replays?

Seems like it would be interesting to use real replays as training data instead of, or in addition to, AIvAI combat.

Sorian said... / October 3, 2010 at 9:45 AM  

@zeech: I suppose anything is possible. I would have to find a way of determining what action you took and then a way to rate the outcome, because we can't just assume that any action you take is always a good one.

olivier said... / October 5, 2010 at 12:07 AM  

Thx a lot Sorian for this detailed explanation. I have an additional question: as far as I know, the NN learning process relies on increasing/lowering/keeping unchanged a coefficient multiplier applied to the result of the neuron function. Could it be that in a specific situation what was learnt before gets lost, because the learning process for another situation has modified the NN so that what was applied before is not applied anymore? Or does the NN learning process prevent this?

Sorian said... / October 5, 2010 at 1:21 AM  

@olivier: There are a couple of things that prevent this. One is the learning rate. Set low enough, the learning rate will keep an oddball situation from negatively affecting the learning too much.

The second thing is the momentum factor. The neural nets store the previous change amounts and apply a portion of the previous change to the new change. This means that if all of the previous changes were positive, a negative change will have little impact.
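
In weight-update terms the idea is roughly the standard momentum rule (a generic sketch, not our exact code):

```python
def update_weight(w, gradient, prev_delta, learning_rate=0.05, momentum=0.9):
    """Momentum-based weight update.

    'gradient' is the error term for this weight (delta * activation, as in
    plain backprop). A portion of the previous change (prev_delta) is carried
    into the new one, so a single contrary example cannot swing the weight far.
    """
    delta = learning_rate * gradient + momentum * prev_delta
    return w + delta, delta   # return the new weight and remember this delta for next time
```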

olivier said... / October 5, 2010 at 9:13 AM  

Very clear Sorian, thanks a lot !

Kevin said... / October 5, 2010 at 1:38 PM  

The neural nets probably could be parallelized by incorporating a method of learning called classical conditioning, even if the current method cannot be. At the moment it seems that your AI learns by only two of the three methods used by humans (operant conditioning from examining its own matches, and indirect conditioning by examining the matches of top players).

Classical conditioning works on the basis that the intelligence associates a neutral stimulus with the unconditioned stimulus (the AI's action) and the unconditioned response (the opponent's reaction). By using an algorithm based upon this method of learning you can teach the AI to have an awareness of situations that lead to particular responses by the opponent. In this way it could generate an intuitive, strategic sense of a situation. This method has two difficulties. Firstly, it actually demands a large parallel processing ability. Secondly, it can throw up irrelevant associations which have to be dismissed by operant testing.

I understand that such a learning algorithm could be extremely difficult to design and might need specialised hardware. However, if it can be done it would improve your AI greatly and would probably be a groundbreaking development in the field as a whole.

Kevin said... / October 6, 2010 at 2:01 AM  

The point of a learning method based on classical conditioning is that it allows the AI to make good decisions when it does not have all the data. By associating a terrain feature such as a choke point with a defeat, for example, it can potentially realise that its opponents will often try to funnel it into choke points blocked by point defence. Upon making the association it will not charge through these choke points without radar scouting first. Similarly it could also help it to predict unit compositions based upon radar data, game time and the opponent's logical tech path. For example, upon seeing a single diamond approaching its early game base, it might realise that it is being ACU rushed and take steps to counter it before the ACU is actually on its doorstep. Similarly it might learn to associate shielded PDs and TMLs with a high probability of the opponent teching to loyalty guns, and so create a counter to the loyalty gun as well as to the PD or TML.

Sorian said... / October 6, 2010 at 9:42 AM  

I definitely want to give the AI the ability to plan more strategically in Kings and Castles, so we will see what happens.

Chris Love said... / October 8, 2010 at 1:49 PM  

New eco rocks, rocks i tell yer!

ProfessorAsian said... / October 23, 2010 at 10:50 PM  

So are the neural networks being trained by you and then sent to us, or are we training them just by playing them?

I am not sure what this training is.

George said... / November 15, 2010 at 1:22 AM  

Best would be the best of both worlds. Train the AI and let it train with the player so it can adapt.

The problem is that a real player would try to overcome bad odds by using unconventional tactics like deception, or by trying to buy time, etc. There are countless tactics a human player would try to get the upper hand.

On hard the AI is a really nice challenge now with a good mix of units, and paired with a team member it can be pretty deadly if you try to turtle.

I played settons a dozen times (my fav map) and tried different setups.

I placed 2 hard AIs on the upper left and right corners of the map, with me and my friend on the lower corners. We placed an AI teammate on the middle starting point on our side to cut off the land bridge for the AI, as a starting defence for me and my friend.

We tried to turtle as hard as we could; on the first run we survived 45 min and on the second only 22 min. We pretty much got buttkicked while turtling... that was awesome. The enemies were Cybran and Illuminate, and we were Cybran and 2x UEF.

We lowered the AI enemies to normal on our 3rd run. It was pretty unimpressive. Our normal AI friend could hold his own against those 2 normal AIs. Very weird, because you would assume that 2 are stronger than 1.

Maybe it's just coincidence, but I didn't notice transports on the AI side. Are they so ineffective that the AI doesn't use them?
