Implementation Challenges

Part of a series of articles on

Describing the greedy Set Covering algorithm, as I did in the previous article, is simple. It can be stated in a single sentence. Implementing it so that it can operate on Set Covering problems that correspond to real-world sized Lottery Wheel Generation problems using only the resources commonly available on a home computer is quite a bit more difficult.

For the purposes of this endeavor, I focused on being able to generate a 3-matching wheel for a 6-from-49 lottery. The reason for this particular choice is that it is the problem posed on the UK National Lottery Wheeling Challenge website, which inspired this work.

As for resources, I wanted the program to be able to generate the 3-matching wheel from the 6-from-49 lottery using less than 2 gigabytes of RAM, and to do so in at most a few days on a modern 2-4 GHz processor. Multiple cores were not considered as the implementation is not parallelized.

Each of the sections below will outline an implementation challenge inherent to this problem. The next article will describe the solutions I chose.

Expressing All Possible Tickets

For the specific problem I was targeting, expressing tickets was not really a problem. However, I wanted my program to work not just on 6-from-49 lotteries, but on a wide class of Lottery Wheel Generation problems, even if larger problems exceeded the resource limitations that I imposed above. For a 6-from-49 lottery, expressing a ticket can be as simple as having an array of the 6 numbers that the ticket contains.

However, for larger problems, this is an issue. Using this method increases memory consumption very quickly. Each additional number on the ticket requires at least one additional byte of RAM (possibly more than one, depending on how it is implemented) for every ticket. Large problems can have tens of millions of tickets or more, which means every byte is precious.

Additionally, an operation that will clearly be necessary in the implementation is determining which tickets are matched by a particular m-match. If the ticket expression and match expression are both arrays of integers, the best comparison algorithm will require O(t) time. There is a better way.

Expressing the Problem in Memory

Perhaps it is not obvious exactly how large a full specification of a Lottery Wheel Generation problem can be. What is required to specify the problem? In addition to specifying each ticket, in order to determine coverages for use in the greedy Set Cover algorithm, we must also specify relationships between tickets. In other words, we must know which tickets cover which other tickets.

How many other tickets might a given ticket cover? The number of tickets in a t-from-r lottery is C(r,t). Each ticket has C(m,t) m-matches embedded in it. But it doesn’t stop there. Each m-match matches C(r-m,t-m) tickets. So the total number of ticket relationships is as many as:

C(r,t) * C(m,t) * C(r-m,t-m)

In reality, it is somewhat less than that because the above, for reasons of simplicity, does not account for double-counting (where two tickets might share more than one m-match with each other). But this formula should give a general idea of the magnitude of relationships involved.

As a visual aide, consider again the simple toy 3-from-6 lottery where we are looking to cover all 2-matches. The following is a graphical depiction of the relationships between tickets. A line is drawn between two ticket nodes if they share a 2-match in common.

A visualization of 2-match relationships in a 3-from-6 lottery. Click for full size.

Keep in mind, the above is only a 3-from-6 lottery which has only 20 possible tickets. Imagine what the visualization of a realistically-sized lottery might look like.

To put some hard numbers to it, consider the 6-from-49 lottery that we are targeting. Plugging in the appropriate values for r, t and m we discover that we might have to deal with as many as 4,245,486,537,600 (that’s four trillion) relationships. If each relationship takes a few bytes of memory to store, then we would be looking at needing tens of terabytes of RAM to be able to even express the problem. Clearly, we need a more compact expression of the problem than simply an enumeration of ticket-covering relationships.

Finishing Computation in a Reasonable Amount of Time

Any reduction in the storage requirements is necessarily going to increase the run-time requirements. This is known as a space-time tradeoff and is largely unavoidable. Anything that we choose not to store explicitly must be made up for by either re-computing the information on demand, or using some other method of making use of implicit information. In either case, the result will certainly be much slower than simply being able to directly look up the answer in memory.

Re-computing relationships is not a feasible solution for realistically-sized Lottery Wheel Generation problems. The large number of tickets (over 13 million) in the 6-from-49 problem means that every time we needed to know which tickets a given ticket covered, we would have to iterate over millions of tickets checking for shared 3-matches.

Due to the already-large size of the 6-from-49 problem, placing an iteration over millions of values inside of an inner loop of the program greatly increases the running time. While I did no empirical calculations on this, I suspect that the running time would easily stretch into months if not years if implemented using re-computation. We must find a way to implicitly express relationships.

« Greedy Algorithm Implementation Walkthrough

Relationship to Minimum Set Covering »

Share this content on: