I've always been eager to solve problems and work on challenging tasks. My current position at Epic Games meets these needs well. But recently, I was on vacation for an extended period, and I wanted a brain teaser to keep my mind busy during those days. I chose an optimization game on CodinGame, the Code of the Ring, for that purpose. Simply put, this game is brainf**k golfing: given an input text, you have to output that text on a brainf**k machine with the fewest instructions possible. I had a lot of fun with it. This post describes my thought process, errors, and successes in solving this optimization game.
The brainf**k machine we are interested in has a memory of 30 cells. Those cells are registers containing values. A value of 0 describes a space, and values between 1 and 26 describe characters from A to Z. This machine has a pointer on this memory, initialized to the first cell. We can increase the address of the pointer with the command `>`, moving the pointer to the next cell, and decrease it with `<`, moving the pointer to the previous cell. Addresses are wrapping: moving left while on the first cell places the pointer on the last memory cell, and moving right at the last memory cell puts it on the first cell. The value pointed can be increased by `+` and decreased by `-`. Values are also wrapping: decreasing a space results in a Z, and increasing a Z creates a space. A `.` command will output the character signified by the cell at the pointer. Lastly, the brainf**k language has a loop mechanism: `[` jumps past the matching `]` in the command buffer if the cell at the pointer is 0, and `]` jumps back to the matching `[` if the cell at the pointer is nonzero.
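To make these rules concrete, here is a minimal interpreter sketch for such a machine. It is only an illustration of the semantics above, not the simulator I actually used later:

```cpp
#include <array>
#include <iostream>
#include <string>

// Execute a command buffer on the 30-cell machine described above and
// return the produced text (0 is a space, 1..26 are A..Z).
std::string run(const std::string &commands) {
    std::array<int, 30> memory{};
    int pointer = 0;
    std::string output;
    for (std::size_t pc = 0; pc < commands.size(); ++pc) {
        switch (commands[pc]) {
        case '>': pointer = (pointer + 1) % 30; break;                 // addresses wrap
        case '<': pointer = (pointer + 29) % 30; break;
        case '+': memory[pointer] = (memory[pointer] + 1) % 27; break; // values wrap
        case '-': memory[pointer] = (memory[pointer] + 26) % 27; break;
        case '.': output += memory[pointer] == 0 ? ' ' : char('A' + memory[pointer] - 1); break;
        case '[': // jump past the matching ] when the cell is 0
            if (memory[pointer] == 0)
                for (int depth = 1; depth != 0;) {
                    ++pc;
                    depth += commands[pc] == '[' ? 1 : commands[pc] == ']' ? -1 : 0;
                }
            break;
        case ']': // jump back to the matching [ when the cell is nonzero
            if (memory[pointer] != 0)
                for (int depth = 1; depth != 0;) {
                    --pc;
                    depth += commands[pc] == ']' ? 1 : commands[pc] == '[' ? -1 : 0;
                }
            break;
        }
    }
    return output;
}

int main() { std::cout << run("+.>++.") << '\n'; } // prints "AB"
```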
The objective of this optimization game is to build the minimal command buffer to output a given input text. We know the input text for the 24 test levels. The 23 levels used to compute the final scores have hidden input texts. This game has been completed at 100% by 6805 people, and the best contestant is dbdr with a score of 2756.
I've set five goals for this optimization game, the most ambitious being to reach the top 1%, which was held by lechium06_ at rank #68. At first, I thought I was too ambitious with the top 1% goal. However, things went much more smoothly than expected. Still, I made some mistakes that complicated the last improvements. I have a project, goals, and a time frame; let's start!
I wanted an algorithm that may not be the brightest but guarantees that the same solution will always be found in a finite amount of time. I could then use this algorithm to measure improvements, provide a fallback solution when other algorithms fail, and retrieve hidden validator input texts.
What would be the most straightforward and safest algorithm to use as a baseline?
I chose to process each input text character one by one.
Subsequent sections will show that I kept this decision for too long, which caused me some issues.
However, I believe this decision remains valid until a search algorithm is fully operational.
When processing character C, move to a cell already containing C if such a cell exists; otherwise, move to the closest cell containing zero and update its value to C. Then output the current cell, thus sending C to the output buffer. For example, on input "AB" this yields `+.>++.`: set the first cell to 1 (A) and output it, then move to the closest zero cell, set it to 2 (B), and output it. When all characters are processed, the message is completely sent. Neat!
I thought this algorithm would be too simple to succeed in all the validator tests: there are some limitations on the number of executed instructions, and I never bothered with them. Surprisingly, it scored 8 062, reaching rank #4 482 (top 66%, bottom 34%), which is pretty decent for a first version.
I wrote a simulator to execute a sequence of brainf**k instructions. Apart from being useful for quickly testing ideas and being the core of future search algorithms, it allowed me to get my score per level. Indeed, optimization games do not provide you with a report per level; you only see which levels passed or failed and the total score. Due to the simplicity of the brainf**k language, writing the simulator was easy. The table below lists the baseline results on the validator levels. The baseline scores are the same on the test inputs, except for the last one (the final validator input has fewer characters than the final test input). Thus, you can compare your results on the test levels with the table below if you have not yet retrieved the hidden validator inputs.
Level | Score |
---|---|
00 - Sample 1 | 53 |
01 - Short spell | 200 |
02 - One letter x15 | 27 |
03 - Close letters | 44 |
04 - Edge of the alphabet | 25 |
05 - One letter x31 | 32 |
06 - Two letter word x20 | 104 |
07 - Far away letters | 193 |
08 - One letter x53 + one letter x38 | 105 |
09 - One letter x70 | 72 |
10 - Ten letter word x8 | 198 |
11 - Medium spell | 586 |
12 - Seven letter word x26 | 530 |
13 - Long random sequence of 4 letters | 260 |
14 - Incremental sequence of 18 letters | 117 |
15 - Entire alphabet x11 separated by one letter | 809 |
16 - One same letter + a random letter x32 | 534 |
17 - Incremental space separated sequence | 258 |
18 - Pattern followed by other pattern | 319 |
19 - Pattern followed by random sequence of two letters | 313 |
20 - Two patterns repeated randomly | 355 |
21 - Pattern followed by incremental sequence | 399 |
22 - Long spell | 2 549 |
This step cleared my first goal; four more to go!
My baseline algorithm sits idle for most of the two seconds allowed for computation time.
I wanted to know how well a Monte Carlo algorithm would perform on this task, so I wrote a modified baseline algorithm that randomly chooses the zero cell to update to match the current character.
The Monte Carlo algorithm initializes the best solution with the baseline output and then applies the modified baseline until it runs out of time.
It represented a 4.6% score / ~350 instructions / 95 ranks improvement for a copy-paste and a five-line code change. If only it could be that easy at work!
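For illustration, the driver looks roughly like the sketch below; `solve_baseline` and `solve_randomized` are hypothetical stand-ins for my functions, with `solve_randomized` picking the zero cell to update at random:

```cpp
#include <chrono>
#include <string>

// Hypothetical stand-ins for my solvers: solve_randomized picks the zero cell
// at random and may return an empty string to abort a hopeless run.
std::string solve_baseline(const std::string &input);
std::string solve_randomized(const std::string &input, std::size_t cutoff);

std::string monte_carlo(const std::string &input) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::milliseconds(1900); // ~2 s budget
    std::string best = solve_baseline(input); // never worse than the baseline
    while (clock::now() < deadline) {
        const std::string candidate = solve_randomized(input, best.size());
        if (!candidate.empty() && candidate.size() < best.size())
            best = candidate; // keep the shortest command buffer found so far
    }
    return best;
}
```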
I've allowed the Monte Carlo algorithm to run 20 times longer to check for result improvement. I also added an early exit to start another iteration as soon as the current one produces an instruction sequence longer than the shortest known solution. Unfortunately, it did not significantly change the results, hinting that the search space is too vast for a fully generic Monte Carlo algorithm to do wonders. We must guide the search algorithm among all the possibilities to find the best ones.
I had already constrained my search algorithm with my choice of baseline: each algorithm step outputs a single character. A step is composed of a sequence of three phases: `Move(m)` moves the memory pointer by -14 ≤ m ≤ 15 cells, `Update(u)` increments the value in the current cell by -13 ≤ u ≤ 13, and `Commit` sends the value of the current cell to the output. For example, the step `Move(3)`, `Update(-2)`, `Commit` converts to the expression `>>>--.`.
How can I extend the expression produced by a step to reach better scores?
I was astonished by the score on the long spell level: my baseline result on it alone accounts for 92% of dbdr's total score. For a word of 361 characters, it makes 2 043 pointer moves; that's 80% of the instructions. With the baseline algorithm, we always end up with a sequence of memory cells containing zero values. Crossing this contiguous memory region can cost a lot of pointer moves. I wondered if it would help to have a different value instead of zero in such a region.
To test this idea, I allowed a step to prepend two other phases: an `Update(u)`, followed by an `UpdateLoop(is_right, is_up)`, where `UpdateLoop(true, true)` is `[>+]`, `UpdateLoop(true, false)` is `[>-]`, `UpdateLoop(false, true)` is `[<+]`, and `UpdateLoop(false, false)` is `[<-]`. If unsure about these phases' effects, try `+++[>+]` in the CodinGame IDE: the loop keeps moving right and incrementing cells until an increment wraps a Z into a space.
I added an initialization-value parameter to the baseline and the modified baseline algorithms: they use this parameter to create a first step with the four phases. By adding loops over the possible initialization values, I improved my results by 4.6% score / ~370 instructions / 69 ranks.
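A sketch of that outer loop, with `solve_baseline_with_init` as a hypothetical stand-in for the parameterized baseline:

```cpp
#include <string>

// Hypothetical stand-in: the baseline extended with an initial four-phase
// step seeding the memory region; a delta of 0 disables the pre-step.
std::string solve_baseline_with_init(const std::string &input, int init_delta);

std::string best_over_init_values(const std::string &input) {
    std::string best = solve_baseline_with_init(input, 0); // plain baseline
    for (int delta = -13; delta <= 13; ++delta) {
        const std::string candidate = solve_baseline_with_init(input, delta);
        if (candidate.size() < best.size())
            best = candidate;
    }
    return best;
}
```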
Level | Score |
---|---|
00 - Sample 1 | 50 |
01 - Short spell | 200 |
02 - One letter x15 | 27 |
03 - Close letters | 44 |
04 - Edge of the alphabet | 24 |
05 - One letter x31 | 32 |
06 - Two letter word x20 | 97 |
07 - Far away letters | 193 |
08 - One letter x53 + one letter x38 | 105 |
09 - One letter x70 | 72 |
10 - Ten letter word x8 | 198 |
11 - Medium spell | 564 |
12 - Seven letter word x26 | 530 |
13 - Long random sequence of 4 letters | 251 |
14 - Incremental sequence of 18 letters | 117 |
15 - Entire alphabet x11 separated by one letter | 809 |
16 - One same letter + a random letter x32 | 383 |
17 - Incremental space separated sequence | 258 |
18 - Pattern followed by other pattern | 309 |
19 - Pattern followed by random sequence of two letters | 313 |
20 - Two patterns repeated randomly | 284 |
21 - Pattern followed by incremental sequence | 399 |
22 - Long spell | 2 085 |
I knew the `[>]` and `[<]` tricks to quickly move the pointer to the nearest zero cell, but I was reluctant to implement them at that time.
I needed to retrieve the hidden validator inputs to understand better what kind of inputs the search algorithm must solve.
Using the level names, I wrote code that detects which level a given validator input corresponds to. I assumed input strings were the same length as the test levels. This assumption was correct for all but the long spell level (the validator input is one character shorter than the test input). When the code recognizes a level, it calls the baseline algorithm; otherwise, it returns the empty string. Analyzing the submission reports, I quickly validated the detection code for all validator levels. Knowing the level and the input length, I needed to retrieve the characters of each level. I assumed there was some symmetry, such that test and validator levels have the same complexity. Again, this assumption was correct for all but the last validator.
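The detection dispatch looked roughly like this sketch; the detector shown and its length check are illustrative, not my exact predicates:

```cpp
#include <string>

std::string solve_baseline(const std::string &input); // hypothetical stand-in

// Illustrative detector for a level shape implied by its name ("One letter x15").
bool looks_like_one_letter_x15(const std::string &input) {
    return input.size() == 15 &&
           input.find_first_not_of(input[0]) == std::string::npos;
}

std::string solve_validator(const std::string &input) {
    if (looks_like_one_letter_x15(input))
        return solve_baseline(input); // recognized: answer with the baseline
    // ... one detector per level name ...
    return "";                        // unrecognized: let the level fail
}
```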
Usually, when retrieving hidden validator inputs on CodinGame, we can retrieve one bit of information per submission, i.e., whether we failed or succeeded on a level. I'll explain how we can retrieve more bits per level in this game.
Suppose you produce a solution containing N instructions for a level and want to know the first input character C of that level. Remember there are 27 possible characters, so you can encode C with a value V between 0 and 26. Append the `>` command V times to your N-instruction solution on that level, and let all other levels fail. The submission report then shows a score of N + V; you only have to subtract N to recover the first character C. Using this trick and my assumptions above, getting all the hidden validator inputs was easy.
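As a sketch, a probe submission might look like this; `solve` stands in for my baseline, and the trailing `>` commands are harmless no-ops that only inflate the reported instruction count:

```cpp
#include <iostream>
#include <string>

std::string solve(const std::string &input); // hypothetical stand-in for the baseline

int main() {
    std::string input;
    std::getline(std::cin, input);                    // the hidden validator text
    std::string commands = solve(input);              // N instructions
    const char c = input[0];                          // character to leak
    const int value = (c == ' ') ? 0 : (c - 'A' + 1); // V in 0..26
    commands.append(value, '>');                      // report shows N + V
    std::cout << commands << std::endl;
}
```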
I did not want to code the solutions manually for six levels: 1, 7, 11, 13, 16, and 22. It looked tedious and a better fit for the search algorithm. I found solutions by hand for the 17 other levels.
It took me a while, but I improved the score by 39% / 2 895 instructions / 3 855 ranks. I achieved two more of my goals and reached rank 463. All I needed to complete all my goals was to improve my score on the remaining six levels by 27% with a search algorithm.
Level | Score |
---|---|
00 - Sample 1 | 28 |
01 - Short spell | 200 |
02 - One letter x15 | 20 |
03 - Close letters | 31 |
04 - Edge of the alphabet | 21 |
05 - One letter x31 | 12 |
06 - Two letter word x20 | 22 |
07 - Far away letters | 193 |
08 - One letter x53 + one letter x38 | 31 |
09 - One letter x70 | 16 |
10 - Ten letter word x8 | 66 |
11 - Medium spell | 564 |
12 - Seven letter word x26 | 63 |
13 - Long random sequence of 4 letters | 251 |
14 - Incremental sequence of 18 letters | 21 |
15 - Entire alphabet x11 separated by one letter | 19 |
16 - One same letter + a random letter x32 | 383 |
17 - Incremental space separated sequence | 10 |
18 - Pattern followed by other pattern | 77 |
19 - Pattern followed by random sequence of two letters | 118 |
20 - Two patterns repeated randomly | 177 |
21 - Pattern followed by incremental sequence | 41 |
22 - Long spell | 2 085 |
I chose early what expression a step of a search algorithm produces. I knew it would limit me if I tried to compete with the top performers of this game, but this decision made the task easier. I wrote a StateChange data structure to describe the effect of such a step. Notice that I do not manipulate brainf**k expressions directly.
```cpp
#include <cstdint>

struct StateChange { // excerpt of the code I use
    // pre-update: first Update(u) + UpdateLoop(is_right, is_up)
    int8_t pre_update_delta : 5;      // parameter u of the first Update()
    uint8_t pre_update : 1;           // skip these two phases if 0
    uint8_t pre_update_right : 1;     // parameter is_right of UpdateLoop()
    uint8_t pre_update_up : 1;        // parameter is_up of UpdateLoop()
    // Move(m)
    int8_t move_delta : 5;            // parameter m
    uint8_t move_with_jump : 1;       // 1 if we use nonzero cell skips (used to convert to expression)
    uint8_t move_with_jump_right : 1; // 1 if we use [>] to skip nonzero cells, 0 for [<] (used to convert to expression)
    // Update(u)
    uint8_t new_value : 5;            // u = signed_distance(current_cell_value, new_value)
    // How many characters are needed to express this change in brainf**k
    uint8_t instruction_cost;
};
```
This StateChange structure takes into account shortcuts for updates (relying on value wrapping to use fewer characters, fast reset to 0 with `[+]`, and fast increases/decreases by 12 and 13) and for moves (relying on memory address wrapping, and jumping over nonzero values with `[>]` and `[<]`) when computing the instruction cost or converting to a brainf**k expression.
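As an illustration, the update part of the cost could be priced like this simplified sketch; my real code also folds in the move shortcuts when pricing a whole StateChange:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Cheapest way to turn `current` into `target` (both 0..26): direct +/-
// repetitions versus a "[+]" reset followed by a walk to the target.
uint8_t update_cost(uint8_t current, uint8_t target) {
    int delta = (int(target) - int(current) + 27) % 27; // forward distance, 0..26
    if (delta > 13)
        delta -= 27;                                    // value wrapping: -13..13
    const int direct = std::abs(delta);                 // |delta| '+' or '-' commands
    const int via_reset = 3 + std::min<int>(target, 27 - target); // "[+]" then +/- walk
    return uint8_t(std::min(direct, via_reset));
}
```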
A State
structure holds the cell values and the memory pointer address of the brainf**k machine and accepts StateChange
parameters to apply changes or convert to brainf**k expression.
```cpp
struct State { // excerpt of the code I use
    Memory memory;
    Position position;
    uint16_t instruction_cost;

    void apply(const StateChange change) {
        if (change.pre_update) {
            memory.update(position, change.pre_update_delta);
            const int8_t direction = change.pre_update_right ? 1 : -1;
            const int8_t increment = change.pre_update_up ? 1 : -1;
            while (memory.get(position)) {
                position += direction;
                memory.update(position, increment);
            }
        }
        position += change.move_delta;
        memory.set(position, change.new_value);
        instruction_cost += change.instruction_cost;
    }

    // Same as above but also converts the change to a brainf**k expression in an output structure
    void apply(const StateChange change, Output &output);
};
```
With these two data structures, I wrote some change selection algorithms, namely a random one, a baseline one, and a greedy one.
```cpp
// Fully construct a state change, ensuring optimal instruction cost
StateChange next(const State &state, const Position position, const uint8_t value);
// Fully construct a state change with a pre-update loop, ensuring optimal cost
StateChange next_with_pre_update_loop(const State &state, const Position position, const uint8_t value,
                                      const int8_t pre_update_delta, const bool pre_update_right, const bool pre_update_up);

// Modified baseline algorithm
StateChange next_baseline(const State &state, const uint8_t value,
                          const bool with_pre_update_loop, const int8_t pre_update_delta) {
    for (int8_t index = 0; index < constants::memory_size; ++index) {
        if (state.memory.get(index) == value) {
            return with_pre_update_loop && pre_update_delta
                ? next_with_pre_update_loop(state, index, value, pre_update_delta, true, true)
                : next(state, index, value);
        }
    }
    const Position destination = state.find_right_zero(state.position);
    return with_pre_update_loop && pre_update_delta
        ? next_with_pre_update_loop(state, destination, value, pre_update_delta, true, true)
        : next(state, destination, value);
}

// Random change
StateChange next_random(const State &state, Randomizer &randomizer, const uint8_t value, const bool with_pre_update_loop) {
    const int8_t index = int8_t(randomizer.choice(constants::memory_size));
    if (with_pre_update_loop) {
        const int8_t pre_update_delta = int8_t(randomizer.choice(Runes::code_count) - 13);
        const bool pre_update_right = randomizer.uniform() > 0.5f;
        const bool pre_update_up = randomizer.uniform() > 0.5f;
        return next_with_pre_update_loop(state, index, value, pre_update_delta, pre_update_right, pre_update_up);
    }
    return next(state, index, value);
}

// Greedy algorithm: find the change with the smallest instruction cost
StateChange next_greedy(const State &state, const uint8_t value, const bool with_pre_update_loop) {
    StateChange best = next(state, 0, value);
    for (int8_t index = 1; index < constants::memory_size; ++index) {
        StateChange candidate = next(state, index, value);
        if (candidate.instruction_cost < best.instruction_cost) {
            best = candidate;
        }
    }
    if (with_pre_update_loop) {
        for (int8_t index = 0; index < constants::memory_size; ++index) {
            for (int8_t update_delta = -13; update_delta <= 13; ++update_delta) {
                if (StateChange candidate_left = next_with_pre_update_loop(state, index, value, update_delta, false, true);
                    candidate_left.instruction_cost < best.instruction_cost) {
                    best = candidate_left;
                }
                if (StateChange candidate_right = next_with_pre_update_loop(state, index, value, update_delta, true, true);
                    candidate_right.instruction_cost < best.instruction_cost) {
                    best = candidate_right;
                }
            }
        }
    }
    return best;
}
```
I wrote a genetic algorithm template for the Mars Lander optimization game.
I applied it to search for better state change sequences and minimize the instruction costs.
With the genetic algorithm framework, a gene is a state change, and a chromosome is a state change sequence (a solution).
The state change structure allows for simple crossovers and mutations.
I was able to set up the whole genetic algorithm in one hour.
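For illustration, the genetic operators over this representation can be as simple as the following sketch; StateChange is the struct above, and `random_change` is a hypothetical stand-in for drawing a fresh gene (e.g., via next_random):

```cpp
#include <cstddef>
#include <random>
#include <vector>

StateChange random_change();                 // hypothetical: draw a fresh random gene
using Chromosome = std::vector<StateChange>; // one gene per output character

Chromosome crossover(const Chromosome &a, const Chromosome &b, std::mt19937 &rng) {
    // Parents encode the same text, one character per gene, so any cut point
    // yields a child that still outputs the right characters; only the
    // instruction costs must be recomputed by replaying State::apply.
    std::uniform_int_distribution<std::size_t> cut(0, a.size());
    const std::ptrdiff_t point = std::ptrdiff_t(cut(rng));
    Chromosome child(a.begin(), a.begin() + point);
    child.insert(child.end(), b.begin() + point, b.end());
    return child;
}

void mutate(Chromosome &chromosome, std::mt19937 &rng) {
    std::uniform_int_distribution<std::size_t> pick(0, chromosome.size() - 1);
    chromosome[pick(rng)] = random_change(); // replace one gene at random
}
```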
To my surprise, the first version of the genetic algorithm bot already reached the top 50 players.
I tweaked the genetic algorithm with a graphical visualizer application and searched for the best solutions offline.
In the end, the genetic algorithm improved my score by 27% in total / 32% on the six levels to search / 1 203 instructions / 430 ranks. With a score of 3 246, I was at rank 33.
At this point, I had reached all my goals. However, that seemed too easy, and I needed to learn more. The genetic algorithm was slow to progress. I kept a pool of the best-found chromosomes; when I removed one of them, it could take the search algorithm a long time to find it again. The graphical analyzer showed that idiotic/random changes undermined the effect of a good change in a chromosome. For a good change to shine, many other good changes must be present in the chromosome. This situation reminded me of the beam search algorithm.
Level | Genetic Algorithm | Beam Search |
---|---|---|
01 - Short spell | 127 | 119 |
07 - Far away letters | 184 | 167 |
11 - Medium spell | 313 | 284 |
13 - Long random sequence of 4 letters | 250 | 236 |
16 - One same letter + a random letter x32 | 215 | 201 |
22 - Long spell | 1 384 | 1 124 |
I wrote a simple beam search algorithm in which the heuristic used to select the best nodes is the partial score,
i.e., the number of brainf**k instructions needed to reach this state. Even with such a poor heuristic (or absence thereof, as I do not estimate the final cost),
the results were immediately better.
I spent many hours tweaking the beam search, adding new possible states, better filtering, optimizing memory footprint for more extensive searches, and iteratively improving my score.
At some point, I was stuck because of my choice of state change: I was unable to experiment with more complex expressions, such as loop expressions containing commit instructions, or expressions using nearby cells to make fast update loops. My vacation was almost over, I had achieved all my goals, and I wanted to finish this project soon, so I inserted a string field into the state change structure to represent a custom brainf**k expression. It was not enough to describe arbitrary expressions, but it helped me progress without rewriting everything.
As I generate a lot of children when analyzing a node, a heuristic is vital to identify suitable candidates; otherwise, the search favors nodes adding fewer instructions at the current depth (I call this a myopic heuristic, as it fails to see benefits from afar). I chose to compute the score after K extra greedy algorithm steps: a node adding more instructions now, but allowing a better result later, can then be selected.
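Here is a sketch of one beam-search layer under this design; `Node` and `expand` are illustrative, and the heuristic field is assumed to hold the cost after the K greedy lookahead steps:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct Node {
    State state;        // machine state after outputting `depth` characters
    uint32_t heuristic; // state.instruction_cost + K-step greedy lookahead
};

// Hypothetical: push one child per candidate StateChange for text[depth].
void expand(const Node &node, char next_character, std::vector<Node> &children);

std::vector<Node> step(const std::vector<Node> &beam, const std::string &text,
                       std::size_t depth, std::size_t beam_width) {
    std::vector<Node> children;
    for (const Node &node : beam)
        expand(node, text[depth], children);
    // Keep only the beam_width most promising children for the next layer.
    std::sort(children.begin(), children.end(),
              [](const Node &a, const Node &b) { return a.heuristic < b.heuristic; });
    if (children.size() > beam_width)
        children.resize(beam_width);
    return children;
}
```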
I stopped the optimization game at this point, with 2 094 instructions at rank #11.
Level | Score |
---|---|
00 - Sample 1 | 28 |
01 - Short spell | 119 |
02 - One letter x15 | 20 |
03 - Close letters | 31 |
04 - Edge of the alphabet | 21 |
05 - One letter x31 | 12 |
06 - Two letter word x20 | 22 |
07 - Far away letters | 167 |
08 - One letter x53 + one letter x38 | 31 |
09 - One letter x70 | 16 |
10 - Ten letter word x8 | 66 |
11 - Medium spell | 284 |
12 - Seven letter word x26 | 63 |
13 - Long random sequence of 4 letters | 236 |
14 - Incremental sequence of 18 letters | 21 |
15 - Entire alphabet x11 separated by one letter | 19 |
16 - One same letter + a random letter x32 | 201 |
17 - Incremental space separated sequence | 10 |
18 - Pattern followed by other pattern | 77 |
19 - Pattern followed by random sequence of two letters | 118 |
20 - Two patterns repeated randomly | 177 |
21 - Pattern followed by incremental sequence | 41 |
22 - Long spell | 1 124 |
This optimization game was really fun to play.
Being able to search for solutions manually helped me better grasp the tricks for reducing the instruction count. The step expression I chose for the search was enough to easily reach the top 1%. However, it hindered me from obtaining better scores; getting to rank #11 took more work.
I should have switched to a beam search algorithm sooner and worked directly on brainf**k expressions or instructions.
I wanted to avoid the problem of evaluating nodes that have output a different number of characters. I always struggle with writing evaluation functions, and that forces me to find alternatives that might be suboptimal.
The eleventh rank is still a decent result, and I'd recommend going the same route if you struggle with evaluations.
I think I'll choose an optimization game again for my next project. Preferably, it should have post-mortems that mention the efficient use of hill climbing or beam search. As was the case with MCTS (see Langton's Ant and Spring 21 Contest), I feel I'm about to understand those search algorithms better, adding another string to my bow.