As you probably all recognised, Hypersonic was strongly inspired from the game Bomberman. It was a multiplayer game where your AI had to move on a grid and place bombs to destroy crates and possibly eliminate other players. The goal being to survive and destroy the most crates.
Each bomb had a timer of 8 turns before they explode and a specific range. The game was really accessible in the wood leagues, because your own bombs couldn't harm you. In Bronze league however, they would indeed harm you ("friendly fire"), this is where the game started to get very challenging.
Strategy of Ixanezis, winner of the contest
The Depth-First Search algorithm (DFS)
Let's first imagine we don't have any opponent and we just want to destroy all crates as fast as we can. I didn't try to invent any heuristics or implement a self-educating system. Just a good old Depth-First Search! On every depth level of this recursive search, I tried to perform at most 10 types of actions:
- place a bomb + stay where I am
- do not place a bomb + stay where I am
- place a bomb + move left
- do not place a bomb + move left
- place a bomb + move right
My initial implementation was fairly straightforward, and has allowed me to achieve fixed D=5 on any test-case within 20ms (with #pragma GCC optimize("O2") surely).
This was basically the extent of what I did the first day of the contest. After that, I spent 4 days debugging, improving and speeding up the code.
Since different paths in the DFS tree often lead to the same (or very much alike) situations, we can prune away those branches that lead to the situations where we have already been. In the Hypersonic game, this has improved my depth level up to 6-7.
b. Dynamic Depth
It becomes evident that in some situations we have enough time to run 10-12 steps deep when searching, while other situations branch very quickly, leading to a much lower level of recursion. In order to fully use provided resources (100ms) we try to go as deeper as possible until we hit the time limit
Let me just show you the C++-based pseudocode that makes use of Caching + Dynamic Depth:
The Dynamic Depth approach has increased my depth level to 6-15, depending on the initial situation.
I'm convinced that any required operation such as moving a player, exploding bombs, removing crates or bonuses can be done in O(1) or almost O(1) time using bit masks in Hypersonic. However, I did not achieve such state-of-the-art implementation. I just replaced some commonly used operations with bitwise operations for speed and storage efficiency.
Hypersonic grid contains 11*13 cells, where 5*6 cells are always immovable and irreplaceable stones, so we have 11*13 - 5*6 = 113 playable cells. Luckily, 113 is less than 128, so we can store all boxes on a field within a GCC's unsigned __int128 type. If GCC wouldn't support this type, we would have used two unsigned long long's for the same purpose. After that, we can remove a box from a known location using a simple XOR operation. Most importantly, we can easily calculate the number of crates of a field using this code:
__builtin_popcountll would compile into a single popcnt instruction if the CPU architecture supports it, and if GCC feels like doing so (it usually does not, unless provided a -mpopcnt) option. Otherwise, it will compile into most efficient code possible to calculate those bits. From my experience, the resulting code is fast enough to be considered as a constant-time operation.
Same logic applies to storing and also manipulating bombs, players and bonuses.
These improvements allow to drastically speed up score and hash evaluation in some cases.
Estimating the score
No rocket science here. The score of the state consists of the following:
- Number of crates that I've destroyed before I reached this state + the number of crates I will destroy in next 8 moves, if I don't move at all
- Number of bonuses that I've collected, the score decreases if I already have enough bombs and a respectable fire range
- If I'm dead by this time, the score is strongly decreased.
- If I can't escape bombs' fire withing next 8 moves, the score is strongly decreased
- If an enemy can make such 2 moves, that I'm dead or deadly stuck after them, the score is strongly decreased
- If I can make such two moves, that an enemy is definitely dead or deadly stuck, no matter what he does in the future, the score is a little increased. (I removed this in my very final submission)
The overwhelming DFS replacement
It soon becomes obvious, that DFS search visits lots and lots of useless states. E.g., if you manage to destroy 3 crates within next 12 moves, then another search path, where you do not destroy any (within next 12 moves), is most probably useless and should not be evaluated further. However, in the DFS search, we know nothing about neighbouring states on the same level depth, so we can't estimate whether the situation is worth expanding.
The trivial solution to the problem is to replace the DFS with a Breadth-First Search algorithm (BFS) and expand all situations on the same depth level at once. On the other hand, BFS requires to simultaneously store all different states from the same depth level, which may require too much memory. The solution to the problem is to leave N (100, 300, 500) best states from the single depth level and to prune away less promising ones.
Such approach allows to iterate 30-50+ levels deep within 100ms. And thanks to this I was able to rank 1st in Hypersonic. Here goes the C++-based pseudocode:
In order to better understand whether one solution is better than another, I wrote a sandbox, that constantly clashed different strategies with each other. And only after a few thousands battles, you can conclude that solution A is 5.2% better, than solution B. Otherwise, it's sometimes very hard to say for sure, whether some improvement you've made improves anything.
Strategy of Romka, 2nd of Hypersonic
At first I thought that brute-force will perform poorly in this task. The reason was that you need to have a depth of at least 8 in order to see what happens after you put a bomb and it explodes. So my initial approach was the following.
For each cell on the field
- Find some optimal path to this cell to pick up a lot of bonuses (this path could visit each cell at least once)
- Find some optimal bomb placement along this path
- Calculate score for this path based on found bombs placement and bonuses collected
- Select plan with the best score
It was enough to be top-1 for several days, but this approach has obvious limitations. Then I decided to do Breadth-First Search on the graph of all states. I picked BFS because I could naturally limit its depth only by the time limit, which is harder to do with DFS. I decided not to pay any attention on my opponents at first.
One node of graph was describing state of the world after some steps. It included:
- Vector of bombs that was placed either at the beginning or during the BFS
- Mask of exploded boxes (64-bit int)
- Mask of exploded bonuses (64-bit int)
- My position, bombs, range
When performing BFS, for each node extracted from the queue, I did the following:
- Calculate possible moves before the explosion of bombs
- Perform bombs explosion
For each possible move:
- Check if I will not trap myself (that is by far the most time-consuming part)
- Create new state after that move and put it to the queue
When I have reached time limit for the move, I performed a move that leads to a branch with the state with maximum score. Score for each state was calculated as a linear combination of values of exploded boxes and picked bonuses.
When considering boxes, I assign higher values to boxes with bonuses than to empty boxes. Besides that, I reduce box value by (some coefficient) * (steps before this box will be exploded) in order to explode boxes faster.
When considering bonuses, I assign value to it depending on how much of this bonus I've already picked up. I decided that I do not need more than 5 bombs and more than 8 explosion range. Bonuses increasing bomb count were more preferred that ones that increase range.
That was quite good and I decided to improve it. I created "list of forbidden initial moves": at the beginning of each turn I did brute-force search of depth 2 for me and every opponent. If I see that after some move, the opponent has a sequence of two moves so that I get trapped, I added this move to the list of forbidden moves. This gave me much better survivability. I decided not to spend my time writing code for attacking opponents because I found it very hard to achieve: it is very difficult to trap an opponent that performs at least 2-ply DFS in order to escape (as I do).
This was still not as good as I wanted: I had quite optimised code and still could get only to depth 8-9 at the beginning of the game and to depth 4-5 in the middle when the field is almost empty. I decided then to cut off some branches of BFS tree.
My first cut was not to put a bomb if it will not explode any boxes. While it is not optimal in some situations, it is a very rational decision in general. That increased my depth by 1-3 in some cases.
But the main improvement that allowed me to secure my rank in the top-3 was converting BFS to a Beam Search. On each tree level, I decided to leave only some hundreds of nodes that were "most promising". I tried to do this in the Smash The Code contest too, but there were too many combos and it was very hard to decide which nodes were more promising than others.
Here in Hypersonic it was very, very natural. I selected nodes according to the same scoring function described above. It turned out to be a blast :)
With this improvement I managed to easily reach search depth of 30-40 during all the game and it allowed my bot to see way more ahead than other players.
So the secret was just in the Beam Search, not in the scoring function, as some people assumed in the chat :)
Strategy of kimiyuki, 3rd of Hypersonic
I did a simple Beam Search: generate 10 (MOVE/BOMB and 5 directions) next states for each current state and select good ones, depth was 7,8 turns.
I thought that the main difficulty of this game is the distance of time between the action (e.g. putting a bomb) and the effect (e.g. explosion), so I tried to solve this.
I think the important points of my implementation are the following:
- In the evaluation function, I estimate the boxes which will be broken in the future.
- For each state in the beam search, I roughly calculate whether it is a survivable state or not.
- For diversity of states, I restrict the state sets to having only a few states for the same player position.
I didn't implement anything to take other players into account. So my AI was often killed by others.
I suspect that the one step of beam search should be one event (e.g. getting item), instead of one turn.
Thank you all of you for sharing with us and congratulations again! You can find strategies and feedbacks from a lot of other players in this forum topic.
The game will be released really soon in the bot programming section of the platform, stay tuned!