Coders Strike Back - pb4608's AI (Rank: 3rd)

Genetic algorithms inspired the top of the leaderboard. With this strategy based on evolutionary trajectory optimization, pb4608 took 3rd place on the leaderboard. He kindly took the time to explain his AI to us in detail. Thanks a lot to pb4608 for this amazing contribution! :)

    Core algorithm : evolutionary trajectory optimization

    1. Genetic algorithm

The core of the AI is based on a genetic algorithm. There are a lot of resources on the internet to understand this class of algorithm, but here are the basic requirements to implement one :

  • Define basic individuals with a behavior that can be described with a set of numeric parameters. In my case, an individual is a sequence of [thrust ; angle] instructions for the next six turns, and a turn at which the shield is activated.
  • Implement a fitness function which describes how well the individual behaves in its environment. In my case, the fitness calculation is based on a physics engine which simulates 6 game turns and evaluates the final state.
  • Mutate/crossover/select individuals from your population.

Gene                 Turn 0 | Turn 1 | Turn 2 | Turn 3 | Turn 5 | Turn 6
Angle                picked randomly in [-40; +40], then bounded in [-18; 18] (each turn)
Thrust               picked randomly in [-100; +400], then bounded in [0; 200] (each turn)
Shield activation    picked randomly in [0; 8] ; determines the turn when the shield is activated

Table : Example of genome encoding for an evolutionary entity
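
The encoding in the table can be sketched in a few lines of Python. This is only an illustration under assumptions : the names `Genome`, `random_genome` and `clamp` are hypothetical, and the post does not say whether the values are drawn as integers or floats (floats are used here for angle and thrust).

```python
import random
from dataclasses import dataclass

TURNS = 6  # number of simulated turns encoded in one genome


def clamp(value, low, high):
    """Bound value to the interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class Genome:
    angles: list      # one rotation per turn, in degrees
    thrusts: list     # one thrust per turn
    shield_turn: int  # turn at which the shield is activated


def random_genome(rng):
    """Draw each gene in a wide range, then bound it to the legal range."""
    angles = [clamp(rng.uniform(-40, 40), -18, 18) for _ in range(TURNS)]
    thrusts = [clamp(rng.uniform(-100, 400), 0, 200) for _ in range(TURNS)]
    shield_turn = rng.randint(0, 8)  # inclusive on both ends
    return Genome(angles, thrusts, shield_turn)


genome = random_genome(random.Random(1))
```

One consequence of drawing in a wider range and then bounding : the extreme values become much more likely than any intermediate value. With uniform draws, a thrust of 200 comes up 40% of the time and a thrust of 0 comes up 20% of the time. How a shield turn beyond the six simulated turns is interpreted is not stated in the post.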

    2. Evaluation function subtleties

The evaluation function presented in this paragraph only involves a runner pod which wants to end the race as soon as possible. A more complete evaluation function with a blocker pod is described in paragraph 2.
The parameters taken into account are :
  • If the race is finished, time until arrival
  • If the race is not finished, number of checkpoints left + distance until the next checkpoint. This distance is taken between the pod and an arbitrarily defined entry point, as shown on the figure below.
  • A late addition, with a small weight : the angle between the opponent blocker and my runner’s next checkpoint, as seen by my runner.
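
The first two parameters can be combined into a single score along these lines. This is a hedged sketch, not the author's code : `runner_fitness`, `CHECKPOINT_VALUE` and `FINISH_BONUS` are hypothetical names and values, the entry point is simply passed in as a coordinate pair, and the late blocker-angle term is left out.

```python
import math

# Hypothetical constants: one checkpoint is assumed to be worth more than
# any distance a pod can have to its next entry point.
CHECKPOINT_VALUE = 30000.0
FINISH_BONUS = 1e9


def runner_fitness(finished, finish_time, checkpoints_left, pod_pos, entry_point):
    """Higher is better."""
    if finished:
        # Race finished: earlier arrival gives a higher score.
        return FINISH_BONUS - finish_time
    # Race not finished: fewer checkpoints left first, then a shorter
    # distance to the entry point of the next checkpoint.
    distance = math.dist(pod_pos, entry_point)
    return -(checkpoints_left * CHECKPOINT_VALUE + distance)


near = runner_fitness(False, 0.0, 3, (1000.0, 1000.0), (1500.0, 1000.0))
far = runner_fitness(False, 0.0, 3, (1000.0, 1000.0), (4000.0, 1000.0))
```

Here `near` is greater than `far`, and any state with fewer checkpoints left beats both, as long as `CHECKPOINT_VALUE` exceeds the largest possible distance.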

The distance to the next checkpoint is measured from the pod to an arbitrary point inside the checkpoint, called the entry point. The figure below shows how this entry point is defined :

Figure : Definition of a checkpoint’s entry point

The definition of the entry point was used to counteract the pod’s tendency to settle in sub-optimal trajectories as shown in the figure below :
Figure : Two types of trajectories found by the pod (suboptimal / optimal)

Unfortunately, evaluating the factors above after six turns of simulation requires having a good prediction of the opponent’s movements. In order to account for uncertainties in what happens in the future, the evaluation is conducted twice :
  • After one turn of simulation with a 10% weight
  • After six turns of simulation with a 90% weight
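
A minimal sketch of this two-horizon evaluation, assuming a `step(state, move)` function standing in for the physics engine and an `evaluate(state)` function standing in for the evaluation described above (both names are hypothetical placeholders) :

```python
SHORT_WEIGHT = 0.1  # evaluation after one simulated turn
LONG_WEIGHT = 0.9   # evaluation after the last simulated turn


def two_horizon_fitness(state, moves, step, evaluate):
    """Simulate all moves, scoring the state after turn 1 and after the last turn.

    `moves` must contain at least one move (six in the post).
    """
    score_after_first = None
    for turn, move in enumerate(moves):
        state = step(state, move)
        if turn == 0:
            score_after_first = evaluate(state)
    return SHORT_WEIGHT * score_after_first + LONG_WEIGHT * evaluate(state)


# Toy usage: the "state" is a single number and each move is added to it.
toy_score = two_horizon_fitness(0, [1, 2, 3, 4, 5, 6],
                                step=lambda s, m: s + m,
                                evaluate=lambda s: float(s))
```

The short-horizon term depends least on the predicted opponent moves, so it acts as a hedge when the six-turn prediction turns out to be wrong.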

Twenty-four hours before the end of the contest, this simple AI with only two runners and no collision simulation was ranked 15th on the leaderboard. When approaching such a contest, a strong base is required before additional features are implemented.

Runner / chaser implementation

Based on the simulation/evolution component described above, a more complex runner/chaser behavior is devised :

    1. General behavior

Repeat each turn :
  • Define all pods’ roles. (My runner / my blocker / his runner / his blocker)
  • During 12ms, use the evolutionary algorithm to predict a trajectory for his runner.
  • During 12ms, use the evolutionary algorithm to predict a trajectory for his blocker. The simulation is done with my runner using a very basic movement behavior.
  • Inject the two trajectories predicted in the game simulator : they will be used for the next simulation.
  • Define a “waiting checkpoint” for my blocker bot to wait for the opponent’s runner.
  • During 116ms, use the evolutionary algorithm to define a combined trajectory for my two pods. This combination means that both the pods’ genomes are evolved at the same time. This is the main feature that allows the algorithm to find a cooperative behavior between the pods.
    During these 116ms, some simulations will involve collisions with the opponent’s pods. When a collision is detected, the fixed trajectories calculated beforehand are discarded : control of the opponent’s bot switches to a basic movement behavior.
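
The time slicing of one turn can be sketched as three budgeted phases. The function names below are hypothetical, and each `improve_once` callback stands for one cycle of the evolutionary algorithm ; only the three budgets come from the steps above.

```python
import time

# Budgets from the steps above, in milliseconds (12 + 12 + 116 = 140).
OPPONENT_RUNNER_MS = 12
OPPONENT_BLOCKER_MS = 12
MY_PODS_MS = 116


def run_phase(improve_once, budget_ms, clock_ns=time.perf_counter_ns):
    """Call improve_once() until budget_ms has elapsed; return the cycle count."""
    deadline = clock_ns() + budget_ms * 1_000_000
    iterations = 0
    while clock_ns() < deadline:
        improve_once()
        iterations += 1
    return iterations


def play_turn(predict_opponent_runner, predict_opponent_blocker, evolve_my_pods,
              clock_ns=time.perf_counter_ns):
    """Run the three phases in order and report how many cycles each one got."""
    return {
        "opponent_runner": run_phase(predict_opponent_runner, OPPONENT_RUNNER_MS, clock_ns),
        "opponent_blocker": run_phase(predict_opponent_blocker, OPPONENT_BLOCKER_MS, clock_ns),
        "my_pods": run_phase(evolve_my_pods, MY_PODS_MS, clock_ns),
    }
```

The clock is injectable so the loop can be exercised with a fake clock. A real bot would probably read the clock less often than once per cycle to reduce overhead.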

    2. Combined evaluation function
The behavior defined above involves using a more complex evaluation function to calculate the evolutionary entities’ fitness.
The evaluation is a combination of a runner evaluation and a blocker evaluation. The runner evaluation is unchanged from the one described in part 1.

The blocker part of the evaluation function rewards the following :
  • If I am not at risk of timing out :
    • Reducing the opponent’s runner evaluation
    • Keeping my blocker’s angle pointed toward the opponent’s runner (maintain eyes on target)
    • Minimizing the angle under which the opponent’s runner sees his next checkpoint and my blocker (stay between him and his next checkpoint)
    • Minimizing the blocker’s distance to his waiting point (wait for the opponent in front of his next checkpoint)
  • If I am at risk of timing out :
    • The blocker uses the same evaluation function as a runner to take his next checkpoint and reset the timer.
All the factors above are weighted according to their order of importance. For example, the “wait for the opponent in front of his next checkpoint” is of low priority compared to pushing him out of his path.
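
As an illustration, the blocker terms can be written as a weighted sum of penalties. Everything named here is an assumption : the post gives only the order of importance, so the values in `WEIGHTS` are hypothetical, as are the function names and the choice of radians for angles.

```python
import math

# Hypothetical weights: the post only says the terms are ordered by importance,
# with the waiting point being of low priority.
WEIGHTS = {"opponent": 1.0, "facing": 0.5, "between": 0.5, "waiting": 0.0001}


def direction(src, dst):
    """Angle in radians of the vector from src to dst."""
    return math.atan2(dst[1] - src[1], dst[0] - src[0])


def angle_gap(a, b):
    """Unsigned difference between two angles, in [0, pi]."""
    diff = abs(a - b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


def blocker_fitness(*, timeout_risk, runner_style_score, opponent_runner_score,
                    blocker_pos, blocker_heading, opponent_pos,
                    opponent_checkpoint, waiting_point, weights=WEIGHTS):
    """Higher is better."""
    if timeout_risk:
        # Behave like a runner: take the next checkpoint and reset the timer.
        return runner_style_score
    # Eyes on target: heading versus the direction to the opponent's runner.
    facing = angle_gap(blocker_heading, direction(blocker_pos, opponent_pos))
    # Angle, seen from the opponent, between his checkpoint and my blocker.
    between = angle_gap(direction(opponent_pos, opponent_checkpoint),
                        direction(opponent_pos, blocker_pos))
    waiting = math.dist(blocker_pos, waiting_point)
    penalty = (weights["opponent"] * opponent_runner_score
               + weights["facing"] * facing
               + weights["between"] * between
               + weights["waiting"] * waiting)
    return -penalty
```

With everything else equal, a blocker standing on the segment between the opponent’s runner and his checkpoint scores higher than one standing off to the side, because the `between` angle is zero in the first case.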
I received many questions during the contest asking how my runner would go berserk and act like a second blocker. There is no magic in there : the evaluation function simply does not specify which pod should be used to reduce the opponent’s runner evaluation. The blocker evaluation function can be over-weighted or under-weighted to change my runner bot’s tendency to behave like a blocker.
I had a lot of fun during the last three days of the contest uploading a version of the AI with a behavior where the runner wants to block a hundred times more than it wants to advance. In the final release, this value was brought down to 1.5.


The contest was challenging and interesting. Positives and negatives :

  • The games were 1v1. Games with more than two players generally evolve into a race towards greediness, and the top bots’ final strategies have (carefully designed) holes, that they hope other greedy opponents won’t exploit. Under that light, “Coders Strike Back” is much better designed than “Back to the code” for example.
  • Many approaches seemed viable, as seen on the forums (evolutionary algorithms, beam search, minimax, basic heuristics, or even neural networks for some).
  • “Easy to participate, hard to master”  : a great game for everybody !

  • Incomplete description of the rules and physics engine. Either the game is to reverse engineer the simulation engine, or the game is to program the best AI. It is particularly disappointing to spend 24h “debugging” a local simulation engine before understanding there is a hidden minimum-impulse on collision.

  • Ranking system. TrueSkill is set up to “forget” a player’s old matches, as it is expected that a human player’s level will evolve with time. This is not the case for an AI : the rankings should be able to stabilize over time without the feeling that luck is involved in the process.


  1. Congrats on the contest, and nice post dude !
    I'm starting to learn genetic algorithms and I would like to ask some questions. They may be personal, so feel free not to answer all of them ;)
    First, how many individuals does your population contain at the beginning, and how many generations do you generate each turn ?
    Then, can you tell me more about the evolution functions ? (The number of selected genomes, how random your mutation is, how many individuals you mutate, and the same question for the crossover.)
    These numbers could be really helpful for me in programming my first efficient AI :)

    1. Sorry for the late answer...
      I went with a special version of a genetic algorithm which is called "Steady-state genetic algorithm". Its specificity is that there is no definition of "generations". The idea is : at every cycle, you generate ONE child from a pool of genomes. That child has a chance of replacing an entity from that pool depending on its fitness.
      I decided to use this "steady state" algorithm for its relative ease of implementation. In hindsight, this is not a good reason and I would go with a regular genetic algorithm if I had to do it again.
      The size of my pool was 50 entities, which I decided to use after a bit of testing.
      Mutation probability was approximately 0.1, with the angle picked randomly in (0;2*PI)
      Crossover probability was approximately 0.3.
      I believe it generated around 10 000 entities each turn. This is equivalent to 200 generations if your pool size is 50.
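
A steady-state loop with the figures quoted in this answer could look like the sketch below. Several details are assumptions, since the answer does not give them : parents are picked uniformly at random, crossover is uniform, the mutation probability is applied per gene, and the child replaces a randomly chosen pool member only if it scores higher. All function names are hypothetical.

```python
import random

POOL_SIZE = 50         # pool size quoted in the answer
MUTATION_PROB = 0.1    # assumed to apply to each gene independently
CROSSOVER_PROB = 0.3


def steady_state_ga(fitness, random_genome, mutate_gene, cycles, rng):
    """Create ONE child per cycle; there are no explicit generations."""
    pool = [random_genome(rng) for _ in range(POOL_SIZE)]
    scores = [fitness(genome) for genome in pool]
    for _ in range(cycles):
        mother = rng.choice(pool)
        child = list(mother)
        if rng.random() < CROSSOVER_PROB:
            # Assumed uniform crossover: each gene comes from either parent.
            father = rng.choice(pool)
            child = [m if rng.random() < 0.5 else f
                     for m, f in zip(mother, father)]
        child = [mutate_gene(gene, rng) if rng.random() < MUTATION_PROB else gene
                 for gene in child]
        child_score = fitness(child)
        # Assumed replacement rule: challenge one random member of the pool.
        rival = rng.randrange(POOL_SIZE)
        if child_score > scores[rival]:
            pool[rival] = child
            scores[rival] = child_score
    best = max(range(POOL_SIZE), key=scores.__getitem__)
    return pool[best], scores[best]


# Toy usage: six genes in [-1, 1], fitness is highest when all genes are 0.
best_genome, best_score = steady_state_ga(
    fitness=lambda genome: -sum(x * x for x in genome),
    random_genome=lambda rng: [rng.uniform(-1, 1) for _ in range(6)],
    mutate_gene=lambda gene, rng: rng.uniform(-1, 1),
    cycles=10_000,
    rng=random.Random(0),
)
```

With this replacement rule the best score in the pool can never decrease, because a member is only ever replaced by a better child. The 10 000 cycles used in the toy run match the figure quoted above : 10 000 children for a pool of 50 is 10 000 / 50 = 200 generation-equivalents.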