Relationship to Minimum Set Covering

Part of a series of articles on

We will now briefly forget about the details of the lottery wheel problem and explore the theoretical concepts on which the computation of wheels will be built.

Covering Problems

A reader familiar with topics in the theory of computation will likely conclude immediately that lottery wheel generation is in the class of covering problems. A covering problem exists when you have a collection of items, each of which satisfy some of your needs. You want to select as small a number of items as possible while still meeting all of your needs.

A simple example of a covering problem is hiring sports coaches at a school. Each applicant can coach some, but not all, of the sports that the school needs coaches for. For example:

Coach	Sports
Adams	Soccer, Rugby, Swimming
Brown	Soccer, Baseball, Swimming
Caroll	Baseball, Rugby
Devin	Swimming
Ernst	Basketball

The school wishes to hire as few coaches as possible, yet still field teams in Baseball, Basketball, Soccer, Swimming, and Rugby. Since this is a small problem, it’s easy to see that hiring Coaches Brown, Caroll, and Ernst would be the best choice. Coach Adams, although he coaches three sports, duplicates all of his sports with coaches Brown and Caroll who can coach other sports as well, so Adams is not selected.

Set Covering

The above is an example of the Set Covering Problem in which you want sets of elements to “cover” some larger collection of elements. In this case, the elements to be covered are the sports, and the sets doing the covering are the coaches. The contents of a “coach set” are the skills that the coach provides. A “sport element” is “covered” when a coach with that skill is hired.

Despite the simplicity of the Set Covering example above, the Set Covering problem is in general very difficult. In fact, it is in the class of computational problems known as NP-Hard. Problems that are NP-Hard are problems for which no efficient solution is known, and generally none is thought to exist. In this context “efficient” means that the time required to solve the
problem scales at less-than-exponential rate as the size of the problem increases. Finding exact solutions to even moderately sized NP-hard problems can easily require billions or trillions of years using any amount of computing power available today.

Aside: The reader may be more familiar with the term NP-Complete as opposed to NP-Hard. For a discussion on this level, they mean essentially the same thing except that NP-Complete is a term applied to decision problems (which answer yes-or-no questions), and NP-Hard is term applied to optimization problems (which try to find the biggest, smallest, or best of something).

Solving Set Covering

If you don’t have billions of years to wait for the best solution to a Set Covering problem, one alternative solution is to try to efficiently compute an approximate solution. One that is not the optimal solution, but is close enough for your needs.

The Set Covering problem is particularly difficult because unlike some other NP-Hard problems, there does not even exist any efficient algorithm which reliably produces a particularly good approximation to the optimal solution for all possible Set Covering problems. In fact, there is a proof which shows that such good approximation algorithms for Set Cover cannot exist (unless P=NP, which is quite literally a million-dollar question).

Yet, there are two well-known approaches to approximating Set Covering. One is to express the problem as a integer-valued linear program and use a technique known as relaxation to obtain an approximate solution. I do not intend to explore this method, but I will later describe the results of some previously-published academic papers which did.

The other approach, which is simpler, is known as the “greedy” algorithm for Set Covering. This algorithm can be described succinctly as follows: As long as there remain uncovered elements, select the set which covers the largest number of remaining uncovered elements. Perhaps it isn’t obvious that this will not necessarily produce an optimal solution, but it does not.

Formulating Lottery Wheel Generation as Set Covering

There is a direct yet subtle mapping between Lottery Wheel Generation and Set Covering. Clearly, the sets you are selecting are the tickets you are buying. However, what exactly are you trying to cover?

One potential answer is that you are trying to cover all possible m-matches. Covering all possible m-matches would certainly create a valid lottery wheel, since no matter which m-matches appeared in the winning ticket, you would have some ticket in the wheel that covered them. But this formulation, even for an optimal solution, generates a wheel that is much larger than necessary. Why?

The reason is that it is not actually necessary to cover every m-match. In fact it is quite possible to get away with covering only a relatively small fraction of them. This is possible because the winning drawing is not a single m-match, but a ticket that contains t numbers and therefore has C(t,m) m-matches contained within it. And in order for a lottery wheel to be valid, it needs only to match a single one of the C(t,m) m-matches for any drawn winning ticket.

So if we are not covering m-matches, then what are we covering? We are covering drawn winning tickets. This may be a little difficult to conceptualize at first, since I am essentially asserting here that in the Set Covering formulation of this problem, the sets are tickets and the elements are also tickets. To disambiguate, we might say that the sets are purchased tickets, while the elements are winning tickets. We want to find a minimal set of purchased tickets that covers all winning tickets.

In this case, what does it mean to “cover” a winning ticket? Quite simply, a winning ticket is covered when at least one purchased ticket in the wheel contains at least one of that winning ticket’s m-matches.

Tractability of Lottery Wheel Generation as Set Covering

Just because we can express Lottery Wheel Generation as an instance of a Set Covering problem does not mean that the two are equivalent. In particular, it does not necessarily imply that Lottery Wheel Generation is NP-Hard. In order to do that, we would need to demonstrate the opposite: that any arbitrary Set Covering problem (or some other NP-Hard problem) could be expressed as a Lottery Wheel Generation problem. I have no proof one way or the other whether Lottery Wheel Generation is NP-Hard or not. But I suspect that it may not be NP-Hard, or that if it is NP-Hard that known approximation algorithms for Set Cover may produce better results on Lottery Wheel Generation problems then they do for general Set Cover problems. This is simply intuition, however. I intend to work formally on this later.

The reason I suspect this is because the Lottery Wheel Generation problem, when expressed as a Set Cover problem, has an extremely high degree of symmetry. All of the sets are exactly the same size, and they are distributed over the elements in an extremely regular manner owing to their basis in combinatorics. The elements that any two sets have in common is entirely determined by the number of m-matches that the corresponding tickets share. This regularity, it appears, may be sufficient to avoid many or all of the pathological cases that cause approximation algorithms to produce sub-optimal results.

This same symmetry, however, makes the integer linear programming approach more difficult. It was from this basis that I decided to implement a lottery wheel generator program based on the simple greedy algorithm for set covering in hopes of determining how good it would actually be.

« Implementation Challenges

Results and Analysis of the Greedy Implementation »

Share this content on: