# Results and Analysis of the Greedy Implementation

Quick Note: Since I began writing this series of articles, I have discovereda rather nifty plugin for displaying LaTeX mathematics markup in WordPress. Thus, combination notation, which I had previously been writing in the crude ASCII form of C(n,k) I will now write in its more proper and compact mathematical notation $\binom{n}{k}$.

### Complexity Analysis

This brief analysis treats the generator in three stages. First, the enumeration of all tickets and matches. Second, the construction of the problem graph. And finally, the selection of tickets and updating of coverage potentials.

The enumeration of all tickets is straightforward to analyze. Since we use precomputed bit counts to avoid iterating over the full $O(2^r)$ potential ticket representations, and focus only on those with the correct number of ball selections, the enumeration of all tickets takes place in $O(n_t)$ time if we define $n_t$ as “the number of tickets in the game”. It is proportional only to the number of tickets. Similarly, since the size of a match is at largest the size of a ticket, all matches can be enumerated in $O(n_t)$ time. Note that while this is linear in the number of tickets $n_t$, it is in fact still super-polynomial in terms of the problem parameters, since the number of tickets is given by $\binom{r}{t}$. Overall, this stage has a bound of $O(\binom{r}{t})$.

Generating each match node and ticket node in the graph takes constant time, so all nodes can again be generated in $O(n_t)$ time. However, generating the edges takes more time. For each ticket node, a number of edges equal to the number of matches per ticket $\binom{t}{m}$ must be generated. So overall, this stage has a bound of $O\left(\binom{r}{t}\binom{t}{m}\right)$, which dominates the running time of the previous stage.

Simple observation of the behavior of the generator makes it clear that the third stage is what produces the bulk of the running time. Every time the coverage potentials are updated, during the depth-first traversal, the algorithm must traverse a number of graph edges equal to $\binom{t}{m}$ at the first and third levels, and equal to $\binom{r-m}{t-m}$ at the second and fourth levels. So the running time of each depth-first traversal is $O\left(\binom{t}{m}^2\binom{r-m}{t-m}^2\right)$. And at worst, we will select all of the possibile tickets for the wheel, and so the overall complexity for this stage is $O\left(\binom{r}{t}\binom{t}{m}^2\binom{r-m}{t-m}^2\right)$, which again dominates the complexity of the previous two stages and so describes the overall time complexity of the whole algorithm.

The most interesting thing to compare this to is the time complexity of the most naïve greedy implementation, without all this graph construction and depth-first traversal business. The simplest implementation would, for each round, check all matches against every pair of tickets to determine which ticket covers the most uncovered tickets. Following the same logic as above, the dominating term for this algorithm is $O\left(\binom{r}{t}^3\binom{t}{m}\right)$. Thus we can see the improvement of this approach over the more simplistic style of greedy algorithm in that we reduced by two the order of the term $\binom{r}{t}$, which is by far the largest term involved. If you plug in values for r, t, and m, you will note that the magnitude of $\binom{r}{t}$ is always much larger than and $\binom{r-m}{t-m}$ and larger than $\binom{t}{m}$ by several orders of magnitude.

Also, we can argue that either of these is in some sense a pseudo-polynomial time algorithm. I’m abusing this term a bit, but the idea is that the algorithm can be thought of as polynomial time under a particular definition of what the “size of the problem” is.

If we want to simplify the bound on the graph-based implementation a bit (and loosen the bound) we can drop the subtracted $m$ variables in the third term, and obtain the complexity $O\left(\binom{t}{m}^2\binom{r}{t}^3\right)$. Observe that $\binom{r}{t}$ is equal to $n_t$, the number of tickets in the game. Also, $\binom{t}{m}$ represents the “number of matches per ticket”. Let’s again be loose and note that the number of matches per ticket can certainly not exceed the number of tickets in the game (even in the degenerate case of a single-ticket game). Then the running time of the algorithm is bounded by $O(n_t^5)$, which is polynomial.

I am fairly sure that this bound could be improved upon significantly; it is quite loose. The point is only to show that indeed this algorithm is pseudo-polynomial. Again, note that this is polynomial in the number of tickets, not in any of the individual parameters r, t, or m. In terms of the raw parameters, the algorithm is still super-polynomial.

### Practical Results

The real, practical test of the generator, however is twofold: Can it run fast enough to generate a wheel for 3-matches in a 6-from-49 lottery under the time and memory constraints I initially imposed on myself, and is the generated wheel optimal?

Recall that the constraints I initially imposed on myself were as follows:

As for resources, I wanted the program to be able to generate the 3-matching wheel from the 6-from-49 lottery using less than 2 gigabytes of RAM, and to do so in at most a few days on a modern 2-4 GHz processor.

On a 2.66 GHz Intel Core 2 Duo (though only using a single core), the generator consumed a peak of 1.87 GB of RAM and ran for approximately 36 hours before returning its wheel for 3-matches in a 6-from-49 lottery. Here is the wheel it generated.

Incidentally, don’t let the 36 hours part scare you. A 6-from-49 lottery is on the upper end of practical for this generator, and since the algorithm is super-polynomial in its parameters, even modest reductions in parameters give large speedups in performance. Lotteries of under r = 25 or so tend to take more in the range of seconds to minutes, rather than days, and setting r = 32 produces a result in under 4 hours.

The wheel that it generated for my target 6-from-49 3-match lottery was 325 tickets long. While this is in my opinion a pretty good wheel, it is not optimal. This is demostrated simply by referring to the UK National Lottery Wheeling Challenge site which originally inspired this endeavor. Their current champion wheel (of unknown origin – possibly manually generated), is a scant 163 tickets long; my wheel is almost exactly twice as long as necessary. Still, I consider this a relatively good first attempt, given that a 6-from-49 lottery contains almost 14 million tickets.

What went wrong then? First, nothing really “went wrong”. This experiment has demonstrated that a simple greedy algorithm in insufficient to solve this problem optimally, but that it can give relatively good results. I was unable to find any research or information on the Internet that demonstrated how effective a greedy approach would be, or even gave an idea of how one might be implemented such that it could work on realistic problems under reasonable time and space constraints. In the context of filling that knowledge gap, I consider myself to have succeeded.

On the other hand, the ultimate goal remains to generate an optimal wheel, and in that my approach failed. Based on intermediate output that my generator produced, the tantalizing fact is that the greedy generator appears at first able to cover a huge number of tickets with a very small number of selections, but just before the end encounters a series of individual uncovered tickets that cannot be covered more than one, two, or three at a time. Tickets to cover these must be added to the wheel, which balloons its size.

If selection had been made more intelligently during the generation process, the generator may have been able to realize that it was risking such a “seven-ten split” situation, and attempted to avoid it. In that vein, my next improvement upon this generator will be to change from a simple greedy algorithm to a heuristic greedy algorithm which attempts to avoid this very situation as much as possible. In essence, I will want it to sometimes select a ticket with somewhat less than maximal coverage potential, if doing so can reduce the likelihood of leaving behind quite so many “orphaned” tickets.

This, by the way, is why I was a bit fast and loose above with the complexity analysis. It’s not particularly useful to have a tight upper bound on an almost-optimal algorithm (especially one without a definite approximation factor). I will save more of the heavy math work for later if I ever do achive a generator that is either optimal or for which I have come up with a precise approximation factor.

### Implementation Code

And last but not least, the code. The full implementation to the generator, as source code, can be downloaded here (453 KiB). It is a bzip2 compressed tar archive of C++ source code which can be compiled and run using GCC on Linux. I have not tested this on Windows or on any compiler other than GCC, but it should be simple and standards-conformant enough that it will work in other environments with only minor modifications. The comments in the wheelgen.cc source file point out a few areas that may need to be tweaked on other compilers.

This code is released under the GNU General Public License version 3 or later. See the included COPYING file for details. If anyone finds a bug in the code, feel free to e-mail me or post about it in the comments.

1. Carlos says:

Hi there,

just a quick note:

The programm Covermaster, which apparently also works with a greedy algorythm, gets a 322 ticket wheel in about 10 minutes on my computer.

http://homepage.ntlworld.com/honest.john/

I use this programm since 2.000 for fun when looking for ticket combinations for saturday lotto.

What do you think might be difference between your approach and Covermaster’s that makes such a big difference in calculation time occur?

Thanks for your Site, well explained.

2. Obviously I can’t say what CoverMaster is doing that allows it to run so fast because there appears to be no source code or documentation available for it.

However, I would suspect that one thing it might be doing is either using something precomputed, or generating some portion of the wheel at random before starting to actually use a greedy algorithm.

If it is actually using a greedy algorithm from scratch like my approach, I would be very interested in knowing how it manages to do that, since in my experience with this it takes a few minutes just to compute how all of the tickets are related to one another.

3. Carlos says:

Hello again,

Covermaster also has an optimization button, which seems to go through the Matrix again, after the greedy value has been found.

Clicking there it takes the program 4 hours to spit a 312 ticket wheel. There still is some room until reaching 163.

The program with all its features is barely 2 MB big. I don’t know much about programming, but maybe it’s written in assembler.

Rgds

Carlos

4. Quite informative article I have to say. You put the problem to its deserved dimension. I have written such a program, called Wheel Generator and during the development I tried all sorts of different approaches over the years, including greedy. I had to tackle the same design issues as you have demonstrated so I’ll not delve in this here. However, my two cents here is that a greedy algorithm approach is definitely not a good option, from personal experience, when dealing with set cover theory. Better to opt for hill-climbing/simulated annealing. The only thing I haven’t tried yet and I intend to do sometime in the future is to use genetic algorithms & mutation.
Also, Covermaster depends on the concept of variants of an existing ticket (that is to change one of its numbers and test again if it can provide equal or better coverage – this is what the Var column shows during optimization: the variant picked) and it is written in assembly as far as I know. Don’t ask me more on this, I just don’t know more details 🙂

cheers
Anastasios

5. I’ve had an identical problem as an assignment of an advanced undergraduate class. My goal was just to find the wheel as I’ve had a month to do it and I wasn’t thinking so much about optimization, algorithm was there, it was a cover matrix where minimum set cover was being sought.(I was checking each ticket against every other ticket in a C(n,k)*C(n,k) matrix, put a 1 if it covers the ticket(if r numbers are the same), 0 otherwise and after that I just found the largest cover, selected that ticket and continued).

What surprised me when I compiled your solution is that even with approach of covering all of the C(n,r) (n = num of tickets, r = size of matching to guarantee) this greedy algorithm produces the same solution(number of cover tickets), faster.

Still, a nice read, I really wondered how I didn’t find this last year as a reference. Problem still stayed in my head all this time.