The Universe of Discourse

Sat, 05 Feb 2022

Pennsylvania license plate numbers have four digits and when I'm driving I habitually try to factor these. (This hasn't yet led to any serious injury or property damage…) In general factoring is a hardish problem but when !!n<10000!! the worst case is !!9991 = 97·103!! which is not out of reach. The toughest part is when you find a factor like !!661!! or !!667!! and have to decide if it is prime. For !!667!! you might notice right off that it is !!676-9 = (26+3)(26-3)!! but for !!661!! you have to wonder if maybe there is something like that and you just haven't thought of it yet. (There isn't.)

A related problem is the Nearly Equal Factors (NEF) problem: given !!n!!, find !!a!! and !!b!!, as close as possible, with !!ab=n!!. If !!n!! has one large prime factor, as it often does, this is quite easy. For example suppose we are driving on the Interstate and are behind a car with license plate GJA 6968. First we divide this by !!8!!, so !!871!!, which is obviously not divisible by !!2, 3, 5, 7,!! or !!11!!. So try !!13!!: !!871-780 = 91!! so !!871 = 13·67!!. If we throw the !!67!! into the !!a!! pile and the other factors into the !!b!! pile we get !!6968 = 67· 104!! and it's obvious we can't divide up the factors more evenly: the !!67!! has to go somewhere and if we put anything else with it, its pile is now at least !!2·67 = 134!! which is already bigger than !!104!!. So !!67·104!! is the best we can do.

When I first started thinking about this I thought maybe there could be a divide-and-conquer algorithm. For example, suppose !!n=4m!!. Then if we could find an optimal !!ab=m!!, we could conclude that the optimal factorization of !!n!! would simply be !!n = 2a· 2b!!. Except no, that is completely wrong; a counterexample is !!n=20!! where the optimal factorization is !!5·4!!, not !!(2·5)·(2·1)!!. So it's not that simple.

It's tempting to conclude that NEF is NP-hard, because it does look a lot like Partition. In Partition someone hands you a list of numbers and demands to know of they can be divided into two piles with equal sums. This is NP-hard, and so the optimization version of it, where you are asked to produce two piles as nearly equal as possible, is at least as hard. The NEF problem seems similar: if you know the prime factors !!n=p_1p_2…p_k!! then you can imagine that someone handed you the numbers !!\log p_1, … \log p_k!! and asked you to partition them into two nearly-equal piles. But this reduction is in the wrong direction; it only proves that Partition is at least as hard as NEF. Could there be a reduction in the other direction? I don't see anything obvious, but maybe there is something known about Partition or Knapsack that shows that even this restricted version is hard. [ Addendum: see below. ]

In practice, the first-fit-decreasing (FFD) algorithm usually performs well for this sort of problem. In FFD we go through the prime factors in decreasing order, and throw each one into the bin that is least full. This always works when there is one large prime factor. For example with !!6968!! we throw the !!67!! into the !!a!! bin, then the !!13!! and two of the !!2!!s into the !!b!! bin, at which point we have !!67·52!!, so the final !!2!! goes into the !!b!! bin also, and this is optimal. FFD does find the optimal solution much of the time, but not always. It works for !!20!! but fails for !!72 = 2^33^2!! because the first thing it does is to put the threes into separate bins. But the optimal solution !!72=9·8!! puts them in the same bin. Still it works for nearly all small numbers.

I would like to look into whether FFD it produces optimal results almost all of the time, and if so how almost? Wikipedia seems to say that the corresponding FFD algorithm for Partition, called LPT-first scheduling is guaranteed to produce a larger total which is no more than !!\frac76!! as big as the optimal, which would mean that for the NEF problem !!n=ab!! and !!a\ge b!! it will produce an !!a!! value no more than !!OPT^{7/6}!! where !!OPT!! is the minimum possible !!a!!.

Some small-number cases where FFD fails are:

$$\begin{array}{rccc} n & \text{FFD} = ab & \text{Optimal} & \frac{\log(a)}{\log(OPT)} (≤ 1.167) \\ 72 & 12·6 & 9·8 & 1.131 \\ 180 & 18·10 & 15·12 & 1.067 \\ 240 & 20·12 & 16·15 & 1.080 \\ 288 & 24·12 & 18·16 & 1.100 \\ 336 & 28·12 & 21·16 & 1.094 \\ 540 & 30·18 & 27·20 & 1.032 \\ \end{array}$$

I wrote code to compute these and then I lost it.

I would also like to look at algorithms for NEF that don't begin by factoring !!n!!. We can guarantee to find optimal solutions with brute force, and in some cases this works very well. Consider !!240!! we begin by computing (in at most !!O(\log^2 n)!! time) the integer square root of !!240!!, which is !!15!!. Then since !!240!! is a multiple of !!15!! we have !!240=16·15!! and we win. In general of course it is not so easy, and it fails even in some cases where !!n!! is easy to factor. !!n=p^{2k+1}!! is especially unfortunate. Say !!n=243!! so the square root is !!15!!, which is not a factor of !!243!!. So we try !!14,!! then !!13,12,11,10!! and at last we find !!243=27·9!!.

[ Addendum 20220207: Dan Brumleve points out that it is NP-complete to decide whether, given numbers !!L, U, N!!, there is a number !!f!! in !! [L, U]!! that divides !!N!!. Using this, he shows that it is probably NP-complete to decide whether a given !!N!! is a product of two integers with !!\lvert a-b\rvert ≤ N^{1/4}!!. “Probably” here means that the reduction from Partition is polynomial time if Cramér's conjecture is correct. ]