Workaround

Hi Codeforces! In Round 507 earlier today, a large number of "mostly correct" randomized solutions on 1039B - Subway Pursuit were hacked. I wanted to write a quick explanation of how it's possible to hack these, as well as a guide on how to write your code so that you aren't vulnerable to getting hacked.

First to go over the solution quickly (skip down a few paragraphs if you already know the solution): one can use binary search to reduce the range of possible locations for the train to a constant multiple of \(K\), such as \(6K\). Just make sure to expand out both ends of the range by \(K\) after every iteration. Once the range is at most \(6K\), one can repeat the following:

  1. Make a random guess out of the remaining numbers. The probability of success is at least \(1\over 6K\).

  2. Binary search again until the range is at most \(6K\) again. It turns out this only takes one query, since after our previous guess our \(6K\) range will become an 8\(K\) range, and a single query will reduce this to \(\frac{8K}{2}+2K=6K\) again.

Since \(K\) ≤ 10 and we have 4500 queries, this works because our probability of failure on any test case is at most \((\frac{59}{60})^{2200}\approx8\cdot10^{-17}\), extremely low.

However, this makes a key assumption that the train locations are independent of your queries. The main problem that most hacked solutions had was being completely deterministic; i.e., the programs query the same sequence of numbers on every run, given the same output. Most people had this issue due to using rand() without ever calling srand(). There were a few other causes as well: random_device is oddly deterministic on Codeforces, as is mt19937 without a seed. Calling srand() with a fixed value is no good either, as the program is still totally predictable.

Because of the predictability, these programs are quite easy to hack: simply generate the same sequence of queries that the program would make, and set up the train to always be at a different location from each query. To make this even easier for yourself when hacking, you can choose \(N=2\) and \(K=1\), which skips the initial binary search phase, and then literally move the train to the non-queried option between \(1\) and \(2\) every time.

Workaround?

To get around this, many competitors seeded their generators with the current time by calling srand(time(NULL)); This stops the code from being deterministic and makes it less likely you will get hacked, but it turns out this is still very possible to hack. How? The main problem here is that time(NULL) is only precise to one second. So if I'm able to guess the second that your program gets run, your program is effectively deterministic.

It turns out I don't even need to guess. If I set up \(N=11\) and \(K=10\), I can produce all the different query sequences you might generate in the next \(10\) seconds, since your code will almost certainly be run a few seconds after my generator. Then for every query, at least one of the \(11\) positions will not be chosen by any of the \(10\) sequences, and I can move the train to that position each time. Unfortunately -- or fortunately for the people in my room 😉 -- I didn't have time to do this during the contest.

Solution

The fix is quite easy. time(NULL) has the right idea, but we should use a much more high-precision clock:

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

Then generate random numbers by calling rng(). Remember to include <chrono> and <random> for this.

Another option is to use the address of a newly allocated heap variable:

mt19937 rng((uint64_t) new char);

This should be different every run. Note this creates a tiny memory leak for the life of the program since we never call delete, but since it's just one variable it's not a big deal.

I personally like to combine both of these methods, but it isn't necessary. Either one will give your program much more unpredictability, making it effectively impossible to hack.

Thanks for reading! This will likely lead to me getting fewer hacks in the future, but I thought the community should have a guide explaining this unusual situation.

If you want to read more, take a look at dacin21's great post on anti-hash tests.

本文转载并翻译自 neal 的博客《How randomized solutions can be hacked, and how to make your solution unhackable》。

posted @ 2025-07-18 07:49  chu_yh  阅读(3)  评论(0)    收藏  举报