

The 31st try solved the problem only for odd m; the even case was still open. So of course this happened:
Filip also told me that he asked Claude to continue on the even case after the odd case had been resolved. "But there after a while it seemed to get stuck. In the end, it was not even able to write and run explore programs correctly anymore, very weird. So I stopped the search."
Knuth did add a postscript on other friends maybe kinda vibing a possible solution for even m:
On March 3, Stappers wrote me as follows: "The story has a bit of a sequel. I put Claude Opus 4.6 to work on the m = even cases again for about 4 hours yesterday. It made some progress, but not a full solution. The final program . . . sets up a partial fiber construction similar to the odd case, then runs a search to fix it all up. . . . Claude spent the last part of the process mostly on making the search quicker instead of looking for an actual construction. . . . It was running many programs trying to find solutions using simulated annealing or backtrack. After I suggested to use the ORTools CP-SAT [part of Google's open source toolkit, with the AddCircuit constraint] to find solutions, progress was better, since now solutions could be found within seconds." This program is [4].
Then on March 4, another friend, Ho Boon Suan in Singapore, wrote as follows: "I have code generated by gpt-5.3-codex that generates a decomposition for even m ≥ 8. . . . I've tested it for all even m from 8 to 200 and a bunch of random even values between 400 and 2000, and it looks good. Seems far more chaotic to prove correctness by hand here though; the pattern is way more complex." That program is [5]. (Wow. The graph for m = 2000 has 8 billion vertices!)
I find it slightly funny that Stappers had to suggest that the AI use specific external tools that are actually reliable (like ORTools). It also makes me question how much of the AI's "insight" was a result of handholding and the rubber-duck effect.
For context:
- This is planned as a hard exercise for a textbook.
- There are likely so many solutions that finding a general program that works is like hitting the side of a barn with an arrow. Random bullshit go is an excellent strategy here.
- The AIs did not provide proofs that their solutions worked. This is kind of a problem if you want to demonstrate that AI has understanding.
I'd say that the great problems that last for decades do not fall purely to random bullshit; they require serious advances in new concepts and understanding. But even then, the romanticized warrior-culture view is inaccurate. It's not like some big-brain genius says "I'm gonna solve this problem" and comes up with big-brain ideas that solve it. Instead, a big problem is solved after people make tons of incremental progress by trying random bullshit, and then someone realizes that the tools are now good enough to crack the big problem. A better analogy than the Good Will Hunting genius is picking fruit: you wait until it is ripe.
But math/CS research is not just random bullshit go. The truly valuable part is theory and understanding, which comes from critically evaluating the results of whatever random bullshit one tries: why did idea X work well with Y but not so well with Z, and where else could it work? So random bullshit go is a necessary part of the process, but I'd say research has value (and prestige) because of the theory that comes from people thinking about the results critically. Needless to say, LLMs are useless at this. (In the Knuth example, the AI didn't even prove that its construction worked.)
I think intelligence is overrated for research, and the most important quality for research is giving a shit. Solving big problems is mostly a question of having the right perspective and tools, and raw intelligence is not very useful without them. Getting there means taking the time to develop opinions and feelings about the strengths and weaknesses of various tools.
Of course, every rule has exceptions, and some long-standing problems have been solved only when someone had the chutzpah to apply far more random bullshit than anyone had dared to try before.