Quick Note: Since I began writing this series of articles, I have discovereda rather nifty plugin for displaying LaTeX mathematics markup in WordPress. Thus, combination notation, which I had previously been writing in the crude ASCII form of C(n,k) I will now write in its more proper and compact mathematical notation .
This brief analysis treats the generator in three stages. First, the enumeration of all tickets and matches. Second, the construction of the problem graph. And finally, the selection of tickets and updating of coverage potentials.
The enumeration of all tickets is straightforward to analyze. Since we use precomputed bit counts to avoid iterating over the full potential ticket representations, and focus only on those with the correct number of ball selections, the enumeration of all tickets takes place in time if we define as “the number of tickets in the game”. It is proportional only to the number of tickets. Similarly, since the size of a match is at largest the size of a ticket, all matches can be enumerated in time. Note that while this is linear in the number of tickets , it is in fact still super-polynomial in terms of the problem parameters, since the number of tickets is given by . Overall, this stage has a bound of .
Generating each match node and ticket node in the graph takes constant time, so all nodes can again be generated in time. However, generating the edges takes more time. For each ticket node, a number of edges equal to the number of matches per ticket must be generated. So overall, this stage has a bound of , which dominates the running time of the previous stage.
Simple observation of the behavior of the generator makes it clear that the third stage is what produces the bulk of the running time. Every time the coverage potentials are updated, during the depth-first traversal, the algorithm must traverse a number of graph edges equal to at the first and third levels, and equal to at the second and fourth levels. So the running time of each depth-first traversal is . And at worst, we will select all of the possibile tickets for the wheel, and so the overall complexity for this stage is , which again dominates the complexity of the previous two stages and so describes the overall time complexity of the whole algorithm.
The most interesting thing to compare this to is the time complexity of the most naïve greedy implementation, without all this graph construction and depth-first traversal business. The simplest implementation would, for each round, check all matches against every pair of tickets to determine which ticket covers the most uncovered tickets. Following the same logic as above, the dominating term for this algorithm is . Thus we can see the improvement of this approach over the more simplistic style of greedy algorithm in that we reduced by two the order of the term , which is by far the largest term involved. If you plug in values for r, t, and m, you will note that the magnitude of is always much larger than and and larger than by several orders of magnitude.
Also, we can argue that either of these is in some sense a pseudo-polynomial time algorithm. I’m abusing this term a bit, but the idea is that the algorithm can be thought of as polynomial time under a particular definition of what the “size of the problem” is.
If we want to simplify the bound on the graph-based implementation a bit (and loosen the bound) we can drop the subtracted variables in the third term, and obtain the complexity . Observe that is equal to , the number of tickets in the game. Also, represents the “number of matches per ticket”. Let’s again be loose and note that the number of matches per ticket can certainly not exceed the number of tickets in the game (even in the degenerate case of a single-ticket game). Then the running time of the algorithm is bounded by , which is polynomial.
I am fairly sure that this bound could be improved upon significantly; it is quite loose. The point is only to show that indeed this algorithm is pseudo-polynomial. Again, note that this is polynomial in the number of tickets, not in any of the individual parameters r, t, or m. In terms of the raw parameters, the algorithm is still super-polynomial.
The real, practical test of the generator, however is twofold: Can it run fast enough to generate a wheel for 3-matches in a 6-from-49 lottery under the time and memory constraints I initially imposed on myself, and is the generated wheel optimal?
Recall that the constraints I initially imposed on myself were as follows:
As for resources, I wanted the program to be able to generate the 3-matching wheel from the 6-from-49 lottery using less than 2 gigabytes of RAM, and to do so in at most a few days on a modern 2-4 GHz processor.
On a 2.66 GHz Intel Core 2 Duo (though only using a single core), the generator consumed a peak of 1.87 GB of RAM and ran for approximately 36 hours before returning its wheel for 3-matches in a 6-from-49 lottery. Here is the wheel it generated.
Incidentally, don’t let the 36 hours part scare you. A 6-from-49 lottery is on the upper end of practical for this generator, and since the algorithm is super-polynomial in its parameters, even modest reductions in parameters give large speedups in performance. Lotteries of under r = 25 or so tend to take more in the range of seconds to minutes, rather than days, and setting r = 32 produces a result in under 4 hours.
The wheel that it generated for my target 6-from-49 3-match lottery was 325 tickets long. While this is in my opinion a pretty good wheel, it is not optimal. This is demostrated simply by referring to the UK National Lottery Wheeling Challenge site which originally inspired this endeavor. Their current champion wheel (of unknown origin – possibly manually generated), is a scant 163 tickets long; my wheel is almost exactly twice as long as necessary. Still, I consider this a relatively good first attempt, given that a 6-from-49 lottery contains almost 14 million tickets.
What went wrong then? First, nothing really “went wrong”. This experiment has demonstrated that a simple greedy algorithm in insufficient to solve this problem optimally, but that it can give relatively good results. I was unable to find any research or information on the Internet that demonstrated how effective a greedy approach would be, or even gave an idea of how one might be implemented such that it could work on realistic problems under reasonable time and space constraints. In the context of filling that knowledge gap, I consider myself to have succeeded.
On the other hand, the ultimate goal remains to generate an optimal wheel, and in that my approach failed. Based on intermediate output that my generator produced, the tantalizing fact is that the greedy generator appears at first able to cover a huge number of tickets with a very small number of selections, but just before the end encounters a series of individual uncovered tickets that cannot be covered more than one, two, or three at a time. Tickets to cover these must be added to the wheel, which balloons its size.
If selection had been made more intelligently during the generation process, the generator may have been able to realize that it was risking such a “seven-ten split” situation, and attempted to avoid it. In that vein, my next improvement upon this generator will be to change from a simple greedy algorithm to a heuristic greedy algorithm which attempts to avoid this very situation as much as possible. In essence, I will want it to sometimes select a ticket with somewhat less than maximal coverage potential, if doing so can reduce the likelihood of leaving behind quite so many “orphaned” tickets.
This, by the way, is why I was a bit fast and loose above with the complexity analysis. It’s not particularly useful to have a tight upper bound on an almost-optimal algorithm (especially one without a definite approximation factor). I will save more of the heavy math work for later if I ever do achive a generator that is either optimal or for which I have come up with a precise approximation factor.
And last but not least, the code. The full implementation to the generator, as source code, can be downloaded here (453 KiB). It is a bzip2 compressed tar archive of C++ source code which can be compiled and run using GCC on Linux. I have not tested this on Windows or on any compiler other than GCC, but it should be simple and standards-conformant enough that it will work in other environments with only minor modifications. The comments in the wheelgen.cc source file point out a few areas that may need to be tweaked on other compilers.
This code is released under the GNU General Public License version 3 or later. See the included COPYING file for details. If anyone finds a bug in the code, feel free to e-mail me or post about it in the comments.
|« Relationship to Minimum Set Covering|
Share this content on: