The Universe of Discourse

Well, I guess I believe everything now!

The principle of explosion is that in an inconsistent system everything is provable: if you prove both $$P$$ and not-$$P$$ for any $$P$$, you can then conclude $$Q$$ for any $$Q$$:

$$(P \land \lnot P) \to Q.$$

This is, to put it briefly, not intuitive. But it is awfully hard to get rid of because it appears to follow immediately from two principles that are intuitive:

If we can prove that $$A$$ is true, then we can prove that at least one of $$A$$ or $$B$$ is true. (In symbols, $$A\to(A\lor B)$$.)
If we can prove that at least one of $$A$$ or $$B$$ is true, and we can prove that $$A$$ is false, then we may conclude that $$B$$ is true. (Symbolically, $$(A\lor B)\to(\lnot A\to B)$$.)

Then suppose that we have proved that $$P$$ is both true and false. Since we have proved $$P$$ true, we have proved that at least one of $$P$$ or $$Q$$ is true. But because we have also proved that $$P$$ is false, we may conclude that $$Q$$ is true. Q.E.D.
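Here is a minimal sketch that checks, by brute-force truth table, that the two intuitive principles and the principle of explosion all hold under every classical truth assignment. The helper names `implies` and `is_tautology` are made up for this illustration. A truth table is only a classical, semantic check; it says nothing about what a paraconsistent logic would accept.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b."""
    return (not a) or b

def is_tautology(formula, nvars: int) -> bool:
    """True if formula holds under every assignment of its nvars Boolean variables."""
    return all(formula(*values) for values in product([False, True], repeat=nvars))

# Principle 1: A -> (A or B)
disjunction_intro = lambda a, b: implies(a, a or b)
# Principle 2: (A or B) -> (not A -> B)
disjunctive_syllogism = lambda a, b: implies(a or b, implies(not a, b))
# Explosion: (P and not P) -> Q
explosion = lambda p, q: implies(p and not p, q)

for name, formula in [("principle 1", disjunction_intro),
                      ("principle 2", disjunctive_syllogism),
                      ("explosion", explosion)]:
    print(name, is_tautology(formula, 2))
```

All three come out as tautologies, which is the awkward part: anyone who wants to reject the third has to give up one of the first two.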

This proof is as simple as can be. If you want to get rid of this, you have a hard road ahead of you. You have to follow Graham Priest into the wilderness of paraconsistent logic.

Raymond Smullyan observes that although logic is supposed to model ordinary reasoning, it really falls down here. Nobody, on discovering the fact that they hold contradictory beliefs, or even a false one, concludes that therefore they must believe everything. In fact, says Smullyan, almost everyone does hold contradictory beliefs. His argument goes like this:

Consider all the things I believe individually, $$B_1, B_2, \ldots$$. I believe each of these, considered separately, is true.
However, I also believe that I'm not infallible, and that at least one of $$B_1, B_2, \ldots$$ is false, although I don't know which ones.
Therefore I believe both $$\bigwedge B_i$$ (because I believe each of the $$B_i$$ separately) and $$\lnot\bigwedge B_i$$ (because I believe that not all the $$B_i$$ are true).

And therefore, by the principle of explosion, I ought to believe that I believe absolutely everything.

Well anyway, none of that was exactly what I planned to write about. I was pleased because I noticed a very simple, specific example of something I believed that was clearly inconsistent. Today I learned that K2, the second-highest mountain in the world, is in Asia, near the border of Pakistan and westernmost China. I was surprised by this, because I had thought that K2 was in Kenya somewhere.

But I also knew that the highest mountain in Africa was Kilimanjaro. So my simultaneous beliefs were flatly contradictory:

K2 is the second-highest mountain in the world.
Kilimanjaro is not the highest mountain in the world, but it is the highest mountain in Africa.
K2 is in Africa

Well, I guess until this morning I must have believed everything!

3-coloring the vertices of an icosahedron

I don't know that I have a point about this, other than that it makes me sad.

A recent Math SE post (since deleted) asked:

How many different ways are there to color the vertices of the icosahedron with 3 colors such that no two adjacent vertices have the same color?

I would love to know what was going on here. Is this homework? Just someone idly wondering?

Because the interesting thing about this question is (assuming that the person knows what an icosahedron is, etc.) it should be solvable in sixty seconds by anyone who makes the least effort. If you don't already see it, you should try. Try what? Just take an icosahedron, color the vertices a little, see what happens. Here, I'll help you out, here's a view of part of the end of an icosahedron, although I left out most of it. Try to color it with 3 colors so that no two adjacent vertices have the same color, surely that will be no harder than coloring the whole icosahedron.

The explanation below is a little belabored, it's what OP would have discovered in seconds if they had actually tried the exercise.

Let's color the middle vertex, say blue.

The five vertices around the edge can't be blue, they must be the other two colors, say red and green, and the two colors must alternate:

Oops, there's no color left for the fifth vertex.

The phrasing of the question, “how many” makes the problem sound harder than it is: the answer is zero because we can't even color half the icosahedron.
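For anyone who would rather let a computer do the fiddling, here is a brute-force sketch. It tries every assignment of 3 colors to the six-vertex partial icosahedron (a middle vertex plus its ring of five) and then to the full icosahedron. The vertex labeling and the function names are my own choices for this sketch, not anything from the original question.

```python
from itertools import product

def count_proper_colorings(n_vertices, edges, n_colors):
    """Count color assignments in which no edge joins two same-colored vertices."""
    return sum(
        all(coloring[u] != coloring[v] for u, v in edges)
        for coloring in product(range(n_colors), repeat=n_vertices)
    )

def wheel_edges():
    """Middle vertex 0 joined to a 5-cycle on vertices 1..5: the partial icosahedron."""
    spokes = [(0, i) for i in range(1, 6)]
    rim = [(1 + i, 1 + (i + 1) % 5) for i in range(5)]
    return spokes + rim

def icosahedron_edges():
    """One labeling: 0 = top, 1..5 = upper ring, 6..10 = lower ring, 11 = bottom."""
    edges = []
    for i in range(5):
        upper, lower = 1 + i, 6 + i
        edges.append((0, upper))                  # top cap
        edges.append((upper, 1 + (i + 1) % 5))    # upper ring
        edges.append((lower, 6 + (i + 1) % 5))    # lower ring
        edges.append((11, lower))                 # bottom cap
        edges.append((upper, lower))              # band between the rings
        edges.append((upper, 6 + (i + 1) % 5))    # band between the rings
    return edges

print(count_proper_colorings(6, wheel_edges(), 3))         # 0
print(count_proper_colorings(12, icosahedron_edges(), 3))  # 0
```

The second count checks all 531,441 colorings of the twelve vertices, which is overkill: the first count already shows that the ring of five around any one vertex cannot be finished.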

If OP had even tried, even a little bit, they could have discovered this. They didn't need to have had the bright idea of looking at a partial icosahedron. They could have grabbed one of the pictures from Wikipedia and started coloring the vertices. They would have gotten stuck the same way. They didn't have to try starting in the middle of my diagram, starting at the edge works too: if the top vertex is blue, the three below it must be green-red-green, and then the bottom two are forced to be blue, which isn't allowed. If you just try it, you win immediately. The only way to lose is not to play.

Before the post was deleted I suggested in a comment “Give it a try, see what happens”. I genuinely hoped this might be helpful. I'll probably never know if it was.

Like I said, I would love to know what was going on here. I think maybe this person could have used a dose of Lower Mathematics.

Just now I wondered for the first time: what would it look like if I were to try to list the principles of Lower Mathematics? “Try it and see” is definitely in the list.

Then I thought: How To Solve It has that sort of list and something like “try it and see” is probably on it. So I took it off the shelf and found: “Draw a figure”, “If you cannot solve the proposed problem”, “Is it possible to satisfy the condition?”. I didn't find anything called “fuck around with it and see what you learn” but it is probably in there under a different name, I haven't read the book in a long time. To this important principle I would like to add “fuck around with it and maybe you will stumble across the answer by accident” as happened here.

Mathematics education is too much method, not enough heuristic.

Optimal boxes with and without lids

Sometime around 1986 or so I considered the question of the dimensions that a closed cuboidal box must have to enclose a given volume but use as little material as possible. (That is, if its surface area should be minimized.) It is an elementary calculus exercise and it is unsurprising that the optimal shape is a cube.

Then I wondered: what if the box is open at the top, so that it has only five faces instead of six? What are the optimal dimensions then?

I did the calculus, and it turned out that the optimal lidless box has a square base like the cube, but it should be exactly half as tall.

For example the optimal box-with-lid enclosing a cubic meter is a 1×1×1 cube with a surface area of $$6\text{ m}^2$$.

Obviously if you just cut off the lid of the cubical box and throw it away you have a one-cubic-meter lidless box with a surface area of $$5\text{ m}^2$$. But the optimal box-without-lid enclosing a cubic meter is shorter, with a larger base. It has dimensions $$2^{1/3} \cdot 2^{1/3} \cdot \frac{2^{1/3}}2$$

and a total surface area of only $$3\cdot 2^{2/3} \approx 4.76\text{ m}^2$$. It is what you would get if you took an optimal complete box, a cube, that enclosed two cubic meters, cut it in half, and threw the top half away.
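A quick numerical check of those figures, as a sketch only. It assumes the base is square (the calculus says the optimum is, but this code does not verify that), and it uses a crude grid search in place of the calculus. The function names and the search bounds are arbitrary choices for this illustration.

```python
def closed_box_area(side: float, volume: float) -> float:
    """Surface area of a closed box with a square base of the given side."""
    height = volume / side**2
    return 2 * side**2 + 4 * side * height

def lidless_box_area(side: float, volume: float) -> float:
    """Same box with the lid removed: one base plus four walls."""
    height = volume / side**2
    return side**2 + 4 * side * height

def best_side(area_fn, volume, lo=0.1, hi=5.0, steps=100_000):
    """Grid search for the base side that minimizes area_fn at the given volume."""
    candidates = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min(candidates, key=lambda s: area_fn(s, volume))

side = best_side(lidless_box_area, 1.0)
print(round(side, 3), round(1.0 / side**2, 3))    # base side and height
print(round(lidless_box_area(side, 1.0), 3))      # minimal lidless area
print(round(best_side(closed_box_area, 1.0), 3))  # closed box: the cube
```

For one cubic meter the lidless search settles near a base side of about 1.26 and a height of about 0.63, which is half the side, matching the dimensions given above.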

I found it striking that the optimal lidless box was the same proportions as the optimal complete box, except half as tall. I asked Joe Keane if he could think of any reason why that should be obviously true, without requiring any calculus or computation. “Yes,” he said. I left it at that, imagining that at some point I would consider it at greater length and find the quick argument myself.

Then I forgot about it for a while.

Last week I remembered again and decided it was time to consider it at greater length and find the quick argument myself. Here's the explanation.

Take the cube and saw it into two equal halves. Each of these is a lidless five-sided box like the one we are trying to construct. The original cube enclosed a certain volume with the minimum possible material. The two half-cubes each enclose half the volume with half the material.

If there were a way to do better than that, you would be able to make a lidless box enclose half the volume with less than half the material. Then you could take two of those and glue them back together to get a complete box that enclosed the original volume with less than the original amount of material. But we already knew that the cube was optimal, so that is impossible.

Math SE report 2023-10: Peano's definition of addition is not a tautology, and what was great about Ramanujan?

Content warning: grumpy complaining. This was a frustrating month.

Need an intuitive example for how "P is necessary for Q" means "Q⇒P"?

This kind of thing comes up pretty often. Why are there so many ways that the logical expression $$Q\implies P$$ can appear in natural language?

If $$Q$$, then $$P$$
$$Q$$ implies $$P$$
$$P$$ if $$Q$$
$$Q$$ is sufficient for $$P$$
$$P$$ is necessary for $$Q$$

Strange, isn't it? $$Q\land P$$ is much simpler: “Both $$Q$$ and $$P$$ are true” is pretty much it.

Anyway this person wanted an intuitive example of “$$P$$ is necessary for $$Q$$”.

I suggested:

Suppose that it is necessary to have a ticket ($$P$$) in order to board a certain train ($$Q$$). That is, if you board the train ($$Q$$), then you have a ticket ($$P$$).

Again this follows the principle that rule enforcement is a good thing when you are looking for intuitive examples. Keeping ticketless people off the train is something that the primate brain is wired up to do well.

My first draft had “board a train” in place of “board a certain train”. One commenter complained:

many people travel on trains without a ticket, worldwide

I was (and am) quite disgusted by this pettifogging.

I said “Suppose that…”. I was not claiming that the condition applies to every train in all of history.

OP had only asked for an example, not some universal principle.

Does ...999.999... = 0?

This person is asking one of those questions that often puts Math StackExchange into the mode of insisting that the idea is completely nonsensical, when it is actually very close to perfectly mundane mathematics. (Previously: [1] [2] [3] ) That didn't happen this time, which I found very gratifying.

Normally, decimal numerals have a finite integer part on the left of the decimal point, and an infinite fractional part on the right of the decimal point. It turns out to work surprisingly well to reverse this, allowing an infinite integer part on the left and a finite fractional part on the right. For technical reasons we usually do this in base $$p$$ where $$p$$ is prime; it doesn't work as well in base $$10$$. But it works well enough to use: if we have the base-10 numeral $$\ldots9999$$ and we add $$1$$, using the ordinary elementary-school right-to-left addition algorithm, the carry in the units place goes to the tens place as usual, then the next carry goes to the hundreds place and so on to infinity, leaving us with $$\ldots0000$$, so that $$\ldots9999$$ can be considered a representation of the number $$-1$$, and that means we don't need negation signs.

In fact this system is fundamental to the way numbers are represented in computer arithmetic. Inside the computer the integer $$-1$$ is literally represented as the base-2 numeral $$1111\ldots1111$$, and when we add $$1$$ to it the carry bit wanders off toward infinity on the left. (In the computer the numeral is finite, so we simulate infinity by just discarding the carry bit when it gets too far away.)
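Here is a small sketch of the carry-discarding trick, with finite stand-ins for the infinite numerals: eight decimal digits for the string of nines, and 32 bits for the machine integer. The widths and the helper name `add_fixed_width` are assumptions chosen for the illustration.

```python
def add_fixed_width(a: int, b: int, base: int, digits: int) -> int:
    """Add two numerals, keeping only the lowest `digits` places.

    Any carry that runs off the left end is simply discarded."""
    return (a + b) % base**digits

all_nines = 10**8 - 1   # 99999999: an 8-digit stand-in for ...9999
all_ones = 2**32 - 1    # 32 one-bits: how a 32-bit machine stores -1

print(add_fixed_width(all_nines, 1, 10, 8))   # 0
print(add_fixed_width(all_ones, 1, 2, 32))    # 0
print(-1 % 2**32 == all_ones)                 # True
```

In both bases, adding one to the all-top-digit numeral gives zero once the runaway carry is dropped, which is exactly the property that makes it behave like $$-1$$.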

Once you've seen this a very reasonable next question is whether you can have numbers that have an infinite sequence of digits on both sides. I think something goes wrong here: for one thing it is no longer clear how to actually do arithmetic. For the infinite-to-the-left numerals arithmetic is straightforward (elementary-school algorithms go right-to-left anyway) and for the standard infinite-to-the-right numerals we can sort of fudge it. (Try multiplying the infinite decimal for $$\sqrt 2$$ by itself and see what trouble you get into. Or simpler: What's $$4.666\ldots\times 3$$?)

OP's actual question was: If $$\ldots9999$$ can be considered to represent $$-1$$, and if $$0.9999\ldots$$ can be considered to represent $$1$$, can we add them and conclude that $$\ldots9999.9999\ldots = 0$$?

This very deserving question got a good answer from someone who was not me. This was a relief, because my shameful answer was pure shitpostery. It should have been heavily downvoted, but wasn't. The gods of Math SE karma are capricious.

Why define addition with successor?

Ugh, so annoying. OP had read (Bertrand Russell's explanation of) the Peano definition of addition, and did not understand it. Several people tried hard to explain, but communication was not happening. Or, perhaps, OP was more interested in having an argument than in arriving at an understanding. I lost a bit of my temper when they claimed:

Russell's so-called definition of addition (as quoted in my question) is nothing but a tautology: ….

I didn't say:

If you think Bertrand Russell is stupid, it's because you're stupid.

although I wanted to at first. The reply I did make is still not as measured as I would like, and although it leaves this point implicit, the point is still there. I did at least shut up after that. I had answered OP's question as well as I was able, and carrying on a complex discussion in the comments is almost never of value.

Why is Ramanujan considered a great mathematician?

This was easily my best answer of the month, but the question was deleted, so you will only be able to see it if you have enough Math SE reputation.

OP asked a perfectly reasonable question: Ramanujan gets a lot of media hype for stuff like this:

$${\sqrt {\phi +2}}-\phi ={\cfrac {e^{{-2\pi /5}}}{1+{\cfrac {e^{{-2\pi }}}{1+{\cfrac {e^{{-4\pi }}}{1+{\cfrac {e^{{-6\pi }}}{1+\,\cdots }}}}}}}}$$

which is not of any obvious use, so “why is it given such high regard?”

OP appeared to be impugning a famous mathematician, and Math SE always responds badly to that; their heroes must not be questioned. And even worse, OP mentioned the notorious non-fact that $$1+2+3+\ldots =-\frac1{12}$$ which drives Math SE people into a frothing rage.

One commenter argued:

Mathematics is not inherently about its "usefulness". Even if you can't find practical use for those formulas, you still have to admit that they are by no means trivial

I think this is fatuous. OP is right here, and the commenter is wrong. Mathematicians are not considered great because they produce wacky and impractical equations. They are considered great because they solve problems, invent techniques that answer previously impossible questions, and because they contribute insights into deep and complex issues.

Some blockhead even said:

Most of the mathematical results are useless. Mathematics is more like an art.

Bullshit. Mathematics is about trying to understand stuff, not about taping a banana to the wall. I replied:

I don't think “mathematics is not inherently about its usefulness” is an apt answer here. Sometimes mathematical results have application to physics or engineering. But for many mathematical results the application is to other parts of mathematics, and mathematicians do judge the ‘usefulness’ of results on this basis. Consider for example Mochizuki's field of “inter-universal Teichmüller theory”. This was considered interesting only as long as it appeared that it might provide a way to prove the $$abc$$ conjecture. When that hope collapsed, everyone lost interest in it.

My answer to OP elaborated on this point:

The point of these formulas wasn't that they were useful in themselves. It's that in order to find them he had to have a deep understanding of matters that were previously unknown. His contribution was the deep understanding.

I then discussed Hardy's book on the work he did with Ramanujan and Hardy's own estimation of Ramanujan's work:

The first chapter is somewhat negative, as it summarizes the parts of Ramanujan's work that he felt didn't have lasting value — because Hardy's next eleven chapters are about the work that he felt did have value.

So if OP wanted a substantive and detailed answer to their question, that would be the first place to look.

I also did an arXiv search for “Ramanujan” and found many recent references, including one with “applications to the Ramanujan $$\tau$$-function”, and concluded:

The $$\tau$$-function is the subject of the entire chapter 10 of Hardy's book and appears to still be of interest as recently as last Monday.

The question was closed as “opinion-based” (a criticism that I think my answer completely demolishes) and then it was deleted. Now if someone else is trying to find out why Ramanujan is held in high regard, they will not be able to find my factual, substantive answer.

Screw you, Math SE. This month we both sucked.

Uncountable sets for seven-year-olds

I was recently talking to a friend whose seven-year-old had been reading about the Hilbert Hotel paradoxes. One example: The hotel is completely full when a bus arrives with 53 passengers seeking rooms. Fortunately the hotel has a countably infinite number of rooms, and can easily accommodate 53 more guests even when already full.

My friend mentioned that his kid had been unhappy with the associated discussion of uncountable sets, since the explanation he got involved something about people whose names are infinite strings, and it got confusing. I said yes, that is a bad way to construct the example, because names are not infinite strings, and even one infinite string is hard to get your head around. If you're going to get value out of the hotel metaphor, you waste an opportunity if you abandon it for some weird mathematical abstraction. (“Okay, Tyler, now let $$\pi$$ be a projection from a vector bundle onto a compact Hausdorff space…”)

My first attempt on the spur of the moment involved the guests belonging to clubs, which meet in an attached convention center with a countably infinite sequence of meeting rooms. The club idea is good but my original presentation was overcomplicated and after thinking about the issue a little more I sent this email with my ideas for how to explain it to a bright seven-year-old.

Here's how I think it should go. Instead of a separate hotel and convention center, let's just say that during the day the guests vacate their rooms so that clubs can meet in the same rooms. Each club is assigned one guest room that they can use for their meeting between the hours of 10 AM to 4 PM. The guest has to get out of the room while that is happening, unless they happen to be a member of the club that is meeting there, in which case they may stay.

If you're a guest in the hotel, you might be a member of the club that meets in your room, or you might not be a member of the club that meets in your room, in which case you have to leave and go to a meeting of one of your clubs in some other room.

We can paint the guest room doors blue and green: blue, if the guest there is a member of the club that meets in that room during the day, and green if they aren't. Every door is now painted blue or green, but not both.

Now I claim that when we were assigning clubs to rooms, there was a club we missed that has nowhere to meet. It's the Green Doors Club of all the guests who are staying in rooms with green doors.

If we did assign the Green Doors Club a guest room in which to meet, that door would be painted green or blue.

The Green Doors Club isn't meeting in a room with a blue door. The Green Doors Club only admits members who are staying in rooms with green doors. That guest belongs to the club that meets in their room, and it isn't the Green Doors Club because the guest's door is blue.

But the Green Doors Club isn't meeting in a room with a green door. We paint a door green when the guest is not a member of the club that meets in their room, and this guest is a member of the Green Doors Club.

So however we assigned the clubs to the rooms, we must have missed out on assigning a room to the Green Doors Club.

One nice thing about this is that it works for finite hotels too. Say you have a hotel with 4 guests and 4 rooms. Well, obviously you can't assign a room to each club because there are 16 possible clubs and only 4 rooms. But the blue-green argument still works: you can assign any four clubs you want to the four rooms, then paint the doors, then figure out who is in the Green Doors Club, and then observe that, in fact, the Green Doors Club is not one of the four clubs that got a room.

Then you can reassign the clubs to rooms, this time making sure that the Green Doors Club gets a room. But now you have to repaint the doors, and when you do you find out that membership in the Green Doors Club has changed: some new members were admitted, or some former members were expelled, so the club that meets there is no longer the Green Doors Club, it is some other club. (Or if the Green Doors Club is meeting somewhere, you will find that you have painted the doors wrong.)

I think this would probably work. The only thing that's weird about it is that some clubs have an infinite number of members so that it's hard to see how they could all squeeze into the same room. That's okay, not every member attends every meeting of every club they're in, that would be impossible anyway because everyone belongs to multiple clubs.

But one place you could go from there is: what if we only guarantee rooms to clubs with a finite number of members? There are only a countably infinite number of clubs then, so they do all fit into the hotel! Okay, Tyler, but what happens to the Green Doors Club then? I said all the finite clubs got rooms, and we know the Green Doors Club never gets a room, so what can we conclude?

It's tempting to try to slip in a reference to Groucho Marx, but I think it's unlikely that that will do anything but confuse matters.

[ Previously ]

[ Update: My friend said he tried it and it didn't go over as well as I thought it might. ]
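The four-room version of the Green Doors argument is small enough to check exhaustively, and here is a sketch that does so. It assumes guest number g stays in room number g, and it represents a club as a `frozenset` of guests. All the names are invented for the illustration.

```python
from itertools import combinations, product

GUESTS = range(4)  # guest g stays in room g

def all_clubs():
    """Every possible club: all 16 subsets of the four guests."""
    return [frozenset(c) for r in range(5) for c in combinations(GUESTS, r)]

def green_doors_club(assignment):
    """assignment[room] is the club that meets in that room.

    A door is green when the guest there is NOT a member of that club."""
    return frozenset(g for g in GUESTS if g not in assignment[g])

def club_is_always_missed():
    """Try every way of giving the 4 rooms to clubs (16**4 assignments)."""
    return all(
        green_doors_club(assignment) not in assignment
        for assignment in product(all_clubs(), repeat=4)
    )

print(len(all_clubs()))          # 16
print(club_is_always_missed())   # True
```

However the four rooms are handed out, the Green Doors Club computed from the resulting paint job is never one of the four clubs that got a room. The exhaustive check only covers the 4-guest hotel; the infinite case needs the argument itself.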

Math SE report 2023-09: Sense and reference, Wason tasks, what is a sequence?

Proving there is only one proof?

OP asks:

In mathematics, is it possible to prove that there is only one (shortest) proof of a given theorem (say, in ZFC)?

This was actually from back in July, when there was a fairly substantive answer. But it left out what I thought was a simpler, non-substantive answer: For a given theorem $$T$$ it's actually quite simple to prove that there is (or isn't) only one shortest proof of $$T$$: just generate all possible proofs in order by length until you find the shortest proofs of $$T$$, and then stop before you generate anything longer than those. There are difficult and subtle issues in provability theory, but this isn't one of them.

I say “non-substantive” because it doesn't address any of the possibly interesting questions of why a theorem would have only one proof, or multiple proofs, or what those proofs would look like, or anything like that. It just answers the question given: is it possible to prove that there is only one shortest proof.

So depending on what OP was looking for, it might be very unsatisfying. Or it might be hugely enlightening, to discover that this seemingly complicated question actually has a simple answer, just because proofs can be systematically enumerated.

This comes in handy in more interesting contexts. Gödel showed that arithmetic contains a theorem whose shortest proof is at least one million steps long! He did it by constructing an arithmetic formula $$G$$ which can be interpreted as saying:

$$G$$ cannot be proved in less than one million steps.

If $$G$$ is false, it can be proved (in less than one million steps) and our system is inconsistent. So assuming that our axioms are consistent, then $$G$$ is true and either:

(1) There is no proof at all of $$G$$, or
(2) There are proofs of $$G$$ but the shortest one is at least a million steps

Which is it? It can't be (1) because there is a proof of $$G$$: simply generate every single proof of one million steps or fewer, and check at the last line of each one to make sure that it is not $$G$$. So it must be (2).

What counts as a sequence, and how would we know that it isn't deceiving?

This is a philosophical question: What is a sequence, really? And:

if I write down random numbers with no pattern at all except for the fact that it gets larger, is it a viable sequence?

And several other related questions that are actually rather subtle: Is a sequence defined by its elements, or by some external rule? If the former how can you know when a sequence is linear, when you can only hope to examine a finite prefix?

I think this is a great question because I think a sequence, properly construed, is both a rule and its elements. The definition says that a sequence of elements of $$S$$ is simply a function $$f:\Bbb N\to S$$. This definition is a sort of spherical cow: it's a nice, simple model that captures many of the mathematical essentials of the thing being modeled. It works well for many purposes, but you get into trouble if you forget that it's just a model. It captures the denotation, but not the sense. I wouldn't yak so much about this if it wasn't so often forgotten. But the sense is the interesting part. If you forget about it, you lose the ability to ask questions like

Are sequences $$s_1$$ and $$s_2$$ the same sequence?

If all you have is the denotation, there's only one way to answer this question:

By definition, yes, if and only if $$s_1$$ and $$s_2$$ are the same function.

and there is nothing further to say about it. The question is pointless and the answer is useless. Sometimes the meaning is hidden a little deeper. Not this time. If we push down into the denotation, hoping for meaning, we find nothing but more emptiness:

Q: What does it mean to say that $$s_1$$ and $$s_2$$ are the same function?

A: It means that the sets $$S_1 = \{ \langle i, s_1(i) \rangle \mid i\in \Bbb N\}$$ and $$S_2 = \{ \langle i, s_2(i) \rangle \mid i\in \Bbb N\}$$ have exactly the same elements.

We could keep going down this road, but it goes nowhere and having gotten to the end we would have seen nothing worth seeing.

But we do ask and answer this kind of question all the time. For example:

$$s_1$$ is the infinite sequence of odd numbers starting at $$1$$
$$s_2$$ is the infinite sequence of numbers that are the difference between a square and its previous square, starting at $$1^2-0^2$$

Are sequences $$s_1$$ and $$s_2$$ the same sequence? Yes, yes, of course they are, don't focus on the answer. Focus on the question! What is this question actually asking?

The real essence of the question is not about the denotation, about just the elements. Rather: we're given descriptions of two possible computations, and the question is asking if these two computations will arrive at the same results in each case. That's the real question.
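To make "two computations" concrete, here is a sketch of the odd-numbers sequence and the differences-of-squares sequence as two differently written Python generators. The function names are mine. Note that the comparison can only ever examine a finite prefix, which is exactly the limitation OP was worried about.

```python
from itertools import count, islice

def odd_numbers():
    """First rule: start at 1 and keep adding 2."""
    return count(1, 2)

def square_differences():
    """Second rule: n**2 minus the previous square, for n = 1, 2, 3, ..."""
    return (n * n - (n - 1) * (n - 1) for n in count(1))

def agree_on_prefix(a, b, length):
    """Compare the first `length` terms of two (possibly infinite) iterators."""
    return list(islice(a, length)) == list(islice(b, length))

print(list(islice(square_differences(), 5)))                        # [1, 3, 5, 7, 9]
print(agree_on_prefix(odd_numbers(), square_differences(), 1000))   # True
```

Agreement on the first thousand terms is evidence, not proof. What settles it is the algebra, $$n^2-(n-1)^2 = 2n-1$$, which is a statement about the two rules and not about any list of elements.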

Well, I started this blog article back in October and it's still not ready because I got stuck writing about this question. I think the answer I gave on SE is pretty good, OP asked what is essentially a philosophical question and the backbone of my answer is on the level of philosophy rather than mathematics.

[ Addendum: On review, I am pleasantly surprised that this section of the blog post turned out both coherent and relevant. I really expected it to be neither. A Thanksgiving miracle! ]

Can inequalities be added the way that equations can be added?

OP says:

Suppose you have and . Adding the inequalities, the terms cancel and you end up with … . It is not intuitively obvious to me that this holds true … I can see that you can't subtract inequalities, but is it always okay to add them?

I have a theory that if someone is having trouble with the intuitive meaning of some mathematical property, it's a good idea to turn it into a question about fair allocation of resources, or who has more of some commodity, because human brains are good at monkey tasks like seeing who got cheated when the bananas were shared out.

About ten years ago someone asked for an intuitive explanation of why you could add $$\frac a2$$ to both sides of $$\frac a2 < \frac b2$$ to get $$a < \frac{a+b}2$$. I said:

Say I have half a bag of cookies, that's $$\frac a2$$ cookies, and you have half a carton of cookies, that's $$\frac b2$$ cookies, and the carton is bigger than the bag, so you have more than me, so that $$\frac a2 < \frac b2$$.

Now a friendly djinn comes along and gives you another half a bag of cookies, $$\frac a2$$. And to be fair he gives me half a bag too, also $$\frac a2$$.

So you had more cookies before, and the djinn gave each of us an extra half a bag. Then who has more now?

I tried something similar this time around:

Say you have two bags of cookies, $$a$$ and $$b$$. A friendly baker comes by and offers to trade with you: you will give the baker your bag $$a$$ and in return you will get a larger bag $$c$$ which contains more cookies. That is, $$a < c$$. You like cookies, so you agree.

Then the baker also trades your bag $$b$$ for a bigger bag $$d$$.

Is it possible that you might not have more cookies than before you made the trades? … But that's what it would mean if $$a < c$$ and $$b < d$$ but not $$a+b < c+d$$ too.

Someday I'll write up a whole blog article about this idea, that puzzles in arithmetic sometimes become intuitively obvious when you turn them into questions about money or commodities, and that puzzles in logic sometimes become intuitively obvious when you turn them into questions about contract and rule compliance.

I don't remember why I decided to replace the djinn with a baker this time around. The cookies stayed the same though. I like cookies. Here's another cookie example, this time to explain why .

What is the difference between "for all" and "there exists" in set builder notation?

This is the same sort of thing again. OP was asking about

$$B = \{n \in \mathbb{N} : \forall x \in \mathbb{N} \text{ and } n=2^x\}$$

but attempting to understand this is trying to swallow two pills at once. One pill is the logic part (what role is the $$\forall$$ playing) and the other pill is the arithmetic part having to do with powers of $$2$$. If you're trying to understand the logic part and you don't have an instantaneous understanding of powers of $$2$$, it can be helpful to simplify matters by replacing the arithmetic with something you understand intuitively. In place of the relation $$n = 2^x$$ I like to use the relation “$$n$$ is the mother of $$x$$”, which everyone already knows.
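For readers who find code more intuitive than mothers, the difference between the two quantifiers can be sketched with Python's `any` and `all`. This is only an illustration: it reads OP's set $$B$$ as "for every $$x$$, $$n = 2^x$$", and it uses a finite cutoff `LIMIT` (an arbitrary value of my choosing) to stand in for all of $$\mathbb{N}$$.

```python
LIMIT = 64  # hypothetical finite cutoff standing in for the natural numbers

naturals = range(LIMIT)

# "There exists": n is in the set if SOME x satisfies n == 2**x.
exists_version = {n for n in naturals if any(n == 2**x for x in naturals)}

# "For all": n is in the set only if EVERY x satisfies n == 2**x.
forall_version = {n for n in naturals if all(n == 2**x for x in naturals)}

print(sorted(exists_version))  # [1, 2, 4, 8, 16, 32]
print(forall_version)          # set()
```

The "there exists" version picks out the powers of $$2$$. The "for all" version is empty, because no single number equals $$2^x$$ for every $$x$$ at once, just as nobody is the mother of everyone.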

Are infinities included in the closure of the real set

This is a good question by the Chip Buchholtz criterion: The answer is much longer than the question was. OP wants to know if the closure of $$\Bbb R$$ is just $$\Bbb R$$ or if it's some larger set like $$[-\infty, \infty]$$. They are running up against the idea that topological closure is not an absolute notion; it only makes sense in the context of an enclosing space.

I tried to draw an analogy between the closure and the complement of a set: Does the complement of the real numbers include the number $$i$$? Well, it depends on the context.

OP preferred someone else's answer, and I did too, saying:

I thought your answer was better because it hit all the important issues more succinctly!

I try to make things very explicit, but the downside of that is that it makes my answers longer, and shorter is generally better than longer. Sometimes it works, and sometimes it doesn't.

Vacuous falsehood - does it exist, and are there examples?

I really liked this question because I learned something from it. It brought me up short: “Huh,” I said. “I never thought about that.” Three people downvoted the question, I have no idea why.

I didn't know what a vacuous falsity would be either but I decided that since the negation of a vacuous truth would be false it was probably the first thing to look at. I pulled out my stock example of vacuous truth, which is:

All my rubies are red.

This is true, because all rubies are red, but vacuously so because I don't own any rubies.

Since this is a vacuous truth, negating it ought to give us a vacuous falsity, if there is such a thing:

I have a ruby that isn't red.

This is indeed false. And not in the way one would expect! A more typical false claim of this type would be:

I have a belt that isn't leather.

This is also false, in rather a different way. It's false, but not vacuously so, because to disprove it you have to get my belts out of the closet and examine them.
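Python's built-in `all` and `any` follow the same convention over an empty collection, which gives a compact way to see the three cases side by side. The contents of the belt closet here are made up for the sketch.

```python
my_rubies = []                      # I don't own any rubies
my_belts = ["leather", "leather"]   # hypothetical closet contents

# "All my rubies are red": true, and vacuously so.
all_rubies_red = all(color == "red" for color in my_rubies)

# "I have a ruby that isn't red": false without inspecting a single ruby.
some_ruby_not_red = any(color != "red" for color in my_rubies)

# "I have a belt that isn't leather": false, but only after checking each belt.
some_belt_not_leather = any(material != "leather" for material in my_belts)

print(all_rubies_red, some_ruby_not_red, some_belt_not_leather)  # True False False
```

The last two results are the same `False`, but they were reached differently: for the rubies the test inside the generator never ran, and for the belts it ran once per belt.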

Now though I'm not sure I gave the right explanation in my answer. I said:

In the vacuously false case we don't even need to read the second half of the sentence:
there is a ruby in my vault that …
… The irrelevance of the “…is not red” part is mirrored exactly in the irrelevance of the “… are red” part in the vacuously true statement:
all the rubies in my vault are …

But is this the right analogy? I could have gone the other way:

In the vacuously false case we don't even need to read the first half of the sentence:
there is a ruby … that is not red
… The irrelevance of the “… in my vault …” part is mirrored exactly in the irrelevance of the “… are red” part in the vacuously true statement:
all the rubies in my vault are …

Ah well, this article has been drying out on the shelf for a month now. I'm making an editorial decision to publish it without thinking about it any more.

The discrete logarithm, shorter and simpler

I recently discussed the “discrete logarithm” method for multiplying integers, and I feel like I took too long and made it seem more complicated and mysterious than it should have been. I think I'm going to try again.

Suppose for some reason you found yourself needing to multiply a lot of powers of 2. What's $$4096\cdot 512$$? You could use the conventional algorithm:

$$ \begin{array}{cccccccc} & & & & 4 & 0 & 9 & 6 \\ \times & & & & & 5 & 1 & 2 \\ \hline & & & & 8 & 1 & 9 & 2 \\ & & & 4 & 0 & 9 & 6 & \\ & 2 & 0 & 4 & 8 & 0 & & \\ \hline & 2 & 0 & 9 & 7 & 1 & 5 & 2 \end{array} $$

but that's a lot of trouble, and a simpler method is available. You know that $$2^i\cdot 2^j = 2^{i+j}$$

so if you had an easy way to convert $$2^i\leftrightarrow i$$ you could just convert the factors to exponents, add the exponents, and convert back. And all that's needed is a simple table:

\begin{array}{rr} 0 & 1\\ 1 & 2\\ 2 & 4\\ 3 & 8\\ 4 & 16\\ 5 & 32\\ 6 & 64\\ 7 & 128\\ 8 & 256\\ 9 & 512\\ 10 & 1\,024\\ 11 & 2\,048\\ 12 & 4\,096\\ 13 & 8\,192\\ 14 & 16\,384\\ 15 & 32\,768\\ 16 & 65\,536\\ 17 & 131\,072\\ 18 & 262\,144\\ 19 & 524\,288\\ 20 & 1\,048\,576\\ 21 & 2\,097\,152\\ \vdots & \vdots \\ \end{array}

We check the table, and find that $$4096\cdot512 = 2^{12}\cdot 2^9 = 2^{12+9} = 2^{21} = 2097152.$$ Easy-peasy.
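As a minimal sketch of the table method in Python (the names `POWERS`, `EXPONENT`, and `multiply_powers_of_two` are mine, chosen for illustration):

```python
# Table of the first few powers of 2, indexed by exponent.
POWERS = [2**i for i in range(32)]
# Reverse table: power of 2 -> its exponent.
EXPONENT = {value: i for i, value in enumerate(POWERS)}

def multiply_powers_of_two(x, y):
    """Multiply two powers of 2 with two lookups, one addition, and one reverse lookup."""
    return POWERS[EXPONENT[x] + EXPONENT[y]]

print(multiply_powers_of_two(4096, 512))  # 2097152
```

It fails with a `KeyError` as soon as either factor is not a power of 2, which is exactly the limitation of the method.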

That is all very well but how often do you find yourself having to multiply a lot of powers of 2? This was a lovely algorithm but with very limited application.

What Napier (the inventor of logarithms) realized was that while not every number is an integer power of , every number is an integer power of , or nearly so. For example, is very close to . Napier made up a table, just like the one above, except with powers of instead of powers of . Then to multiply you would just find numbers close to and in Napier's table and use the same algorithm. (Napier's original table used powers of , but it works the same way for the same reason.)

There's another way to make it work. Consider the integers mod 101, called $$\mathbb{Z}_{101}$$. In $$\mathbb{Z}_{101}$$, every nonzero number is an integer power of 2!

For example, is a power of . It's simply , because if you multiply out you get , and .

Or:

$$\begin{array}{rcll} 14 & \stackrel{\pmod{101}}{\equiv} & 10\cdot 101 & + 14 \\ & = & 1010 & + 14 \\ & = & 1024 \\ & = & 2^{10} \end{array} $$

Or:

$$\begin{array}{rcll} 3 & \stackrel{\pmod{101}}{\equiv} & 5844512973848570809\cdot 101 & + 3 \\ & = & 590295810358705651709 & + 3 \\ & = & 590295810358705651712 \\ & = & 2^{69} \end{array} $$

Anyway that's the secret. In $$\mathbb{Z}_{101}$$ the silly algorithm that quickly multiplies powers of 2 becomes more practical, because in $$\mathbb{Z}_{101}$$, every nonzero number is a power of 2.
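This claim is easy to check by brute force. The sketch below uses Python's built-in three-argument `pow` for modular exponentiation; the variable names are mine.

```python
P = 101

# 2**n mod 101 for n = 0..99
powers = [pow(2, n, P) for n in range(P - 1)]

# Every nonzero residue 1..100 shows up exactly once among those powers.
covers_everything = sorted(powers) == list(range(1, P))
print(covers_everything)

# The two worked examples above:
print(pow(2, 10, P))  # 14
print(pow(2, 69, P))  # 3
```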

What works for works in other cases larger and more interesting. It doesn't work to replace with (try it and see what goes wrong), but we can replace it with , or . The key is that if we want to replace with and with , we need to be sure that there is a solution to for every possible . (The jargon term here is that must be a “primitive root mod ”. is a primitive root mod , but not mod .)
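Here is a brute-force primitive root test, as a sketch. The function name `is_primitive_root` and the particular bases and moduli tried are my own picks for illustration, and the loop is only sensible for small primes.

```python
def is_primitive_root(g, p):
    """Return True if the powers of g reach every nonzero residue mod the prime p."""
    seen = set()
    x = 1
    for _ in range(p - 1):
        x = (x * g) % p
        seen.add(x)
    return len(seen) == p - 1

print(is_primitive_root(2, 101))  # True
print(is_primitive_root(3, 101))  # True
print(is_primitive_root(5, 101))  # False: the powers of 5 repeat after 25 steps
print(is_primitive_root(2, 7))    # False: the powers of 2 mod 7 are only 2, 4, 1
```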

Is this actually useful for multiplication? Perhaps not, but it does have cryptographic applications. Similar to how multiplying is easy but factoring seems difficult, computing for given is easy, but nobody knows a quick way in general to reverse the calculation and compute the for which for a given . When is small we can simply construct a lookup table with entries. But if is a -digit number, the table method is impractical. Because of this, Alice and Bob can find a way to compute a number that they both know, but someone else, seeing can't easily figure out what the original was. See Diffie-Hellman key exchange for more details.
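To give the flavor of the key exchange, here is a toy sketch with the modulus 101 and base 2 from above. The variable names are hypothetical, and a modulus this small offers no security at all, since the lookup table has only 100 entries.

```python
import secrets

P = 101   # toy modulus; real systems use primes hundreds of digits long
G = 2     # a primitive root mod 101

# Each party picks a private exponent in 1..99.
alice_secret = secrets.randbelow(P - 2) + 1
bob_secret = secrets.randbelow(P - 2) + 1

# They exchange only these public values.
alice_public = pow(G, alice_secret, P)
bob_public = pow(G, bob_secret, P)

# Each raises the other's public value to their own private exponent.
alice_shared = pow(bob_public, alice_secret, P)
bob_shared = pow(alice_public, bob_secret, P)

print(alice_shared == bob_shared)  # True
```

An eavesdropper sees `alice_public` and `bob_public`, and would have to solve a discrete logarithm problem to recover either private exponent.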

[ Also previously: Percy Ludgate's weird variation on this ]

The discrete logarithm

[ Addendum 20231020: This came out way longer than it needed to be, so I took another shot at it, and wrote a much simpler explanation of the same thing that is only one-third as long. ]

A couple days ago I discussed the weird little algorithm of Percy Ludgate's, for doing single-digit multiplication using a single addition and three scalar table lookups. In Ludgate's algorithm, there were two tables, $$T_1$$ and $$T_2$$, satisfying the following properties:

$$ \begin{align} T_2(T_1(n)) & = n \tag{$\color{darkgreen}{\spadesuit}$} \\ T_2(T_1(a) + T_1(b)) & = ab. \tag{$\color{purple}{\clubsuit}$} \end{align} $$

This has been called the “Irish logarithm” method because of its resemblance to ordinary logarithms. Normally in doing logarithms we have a magic logarithm function $$\ell$$ with these properties:

$$ \begin{align} \ell^{-1}(\ell(n)) & = n \tag{$\color{darkgreen}{\spadesuit}$} \\ \ell^{-1}(\ell(a) + \ell(b)) & = ab. \tag{$\color{purple}{\clubsuit}$} \end{align} $$

(The usual notation for is of course “” or “” or something of that sort, and is usually written or .)

The properties of Ludgate's $$T_1$$ and $$T_2$$ are formally identical, with $$T_1$$ playing the role of the logarithm function $$\ell$$ and $$T_2$$ playing the role of its inverse $$\ell^{-1}$$. Ludgate's versions are highly restricted, to reduce the computation to something simple enough that it can be implemented with brass gears.

Both $$T_1$$ and $$T_2$$ map positive integers to positive integers, and can be implemented with finite lookup tables. The ordinary logarithm does more, but is technically much more difficult. With the ordinary logarithm you are not limited to multiplying single digit integers, as with Ludgate's weird little algorithm. You can multiply any two real numbers, and the multiplication still requires only one addition and three table lookups. But the cost is huge! The tables are much larger and more complex, and to use them effectively you have to deal with fractional numbers, perform table interpolation, and worry about error accumulation.

It's tempting at this point to start explaining the history and use of logarithm tables, slide rules, and so on, but this article has already been delayed once, so I will try to resist. I will do just one example, with no explanation, to demonstrate the flavor. Let's multiply 7 by 13.

I look up in my table of logarithms and find that .
I look up similarly and find that .
I add .
I do a reverse lookup on and find that the result is approximately .
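The same four steps can be sketched in Python, with `math.log10` standing in for the printed table. Base-10 logarithms are an assumption on my part, and floating-point arithmetic plays the role of the table's limited precision.

```python
import math

log_7 = math.log10(7)        # first lookup
log_13 = math.log10(13)      # second lookup
total = log_7 + log_13       # the single addition
product = 10 ** total        # reverse lookup

print(product)               # very close to 91, but not necessarily exactly 91
print(round(product))        # 91
```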

If I were multiplying by , I would be willing to accept all these costs, and generations of scientists and engineers did accept them. But for the process is ridiculous. One might wonder if there wasn't some analogous technique that would retain the small, finite tables, and permit multiplication of integers, using only integer calculations throughout. And there is!

Now I am going to demonstrate an algorithm, based on logarithms, that exactly multiplies any two integers $$a$$ and $$b$$, as long as $$ab \le 100$$. Like Ludgate's and the standard algorithm, it will use one addition and three lookups in tables. Unlike the standard algorithm, the tables will be small, and will contain only integers.

Here is the table of the $$\ell$$ function, which corresponds to Ludgate's $$T_1$$:

$$ \begin{array}{rrrrrrrrrrr} {\tiny\color{gray}{1}} & 0, & 1, & \color{darkblue}{69}, & 2, & 24, & 70, & \color{darkgreen}{9}, & 3, & 38, & 25, \\ {\tiny\color{gray}{11}} & 13, & \color{darkblue}{71}, & \color{darkgreen}{66}, & 10, & 93, & 4, & 30, & 39, & 96, & 26, \\ {\tiny\color{gray}{21}} & 78, & 14, & 86, & 72, & 48, & 67, & 7, & 11, & 91, & 94, \\ {\tiny\color{gray}{31}} & 84, & 5, & 82, & 31, & 33, & 40, & 56, & 97, & 35, & 27, \\ {\tiny\color{gray}{41}} & 45, & 79, & 42, & 15, & 62, & 87, & 58, & 73, & 18, & 49, \\ {\tiny\color{gray}{51}} & 99, & 68, & 23, & 8, & 37, & 12, & 65, & 92, & 29, & 95, \\ {\tiny\color{gray}{61}} & 77, & 85, & 47, & 6, & 90, & 83, & 81, & 32, & 55, & 34, \\ {\tiny\color{gray}{71}} & 44, & 41, & 61, & 57, & 17, & 98, & 22, & 36, & 64, & 28, \\ {\tiny\color{gray}{81}} & \color{darkred}{76}, & 46, & 89, & 80, & 54, & 43, & 60, & 16, & 21, & 63, \\ {\tiny\color{gray}{91}} & 75, & 88, & 53, & 59, & 20, & 74, & 52, & 19, & 51, & 50\hphantom{,} \\ \end{array} $$

(If we only want to multiply numbers with we only need the first row, but with the full table we can also compute things like .)

Like $$T_2$$, this is not really a two-dimensional array. It is just a list of numbers, arranged in rows to make it easy to find the 81st number when you need it. The small gray numerals in the margin are a finding aid. If you want to look up $$\ell(81)$$ you can see that it is 76 without having to count up 81 elements. This element is highlighted in red in the table above.

Note that the elements are numbered from 1 to 100, whereas all the other tables in these articles have been zero-indexed. I wondered if there was a good way to fix this, but there really isn't. $$\ell$$ is analogous to a logarithm function, and the one thing a logarithm function really must do is to have $$\log 1 = 0$$. So too here; we have $$\ell(1) = 0$$.

We also need an $$\ell^{-1}$$ table analogous to Ludgate's $$T_2$$:

$$ \begin{array}{rrrrrrrrrrr} {\tiny\color{gray}{0}} & 1, & 2, & 4, & 8, & 16, & 32, & 64, & 27, & 54, & 7, \\ {\tiny\color{gray}{10}} & 14, & 28, & 56, & 11, & 22, & 44, & 88, & 75, & 49, & 98, \\ {\tiny\color{gray}{20}} & 95, & 89, & 77, & 53, & 5, & 10, & 20, & 40, & 80, & 59, \\ {\tiny\color{gray}{30}} & 17, & 34, & 68, & 35, & 70, & 39, & 78, & 55, & 9, & 18, \\ {\tiny\color{gray}{40}} & \color{darkblue}{36}, & 72, & 43, & 86, & 71, & 41, & 82, & 63, & 25, & 50, \\ {\tiny\color{gray}{50}} & 100, & 99, & 97, & 93, & 85, & 69, & 37, & 74, & 47, & 94, \\ {\tiny\color{gray}{60}} & 87, & 73, & 45, & 90, & 79, & 57, & 13, & 26, & 52, & 3, \\ {\tiny\color{gray}{70}} & 6, & 12, & 24, & 48, & 96, & \color{darkgreen}{91}, & \color{darkred}{81}, & 61, & 21, & 42, \\ {\tiny\color{gray}{80}} & 84, & 67, & 33, & 66, & 31, & 62, & 23, & 46, & 92, & 83, \\ {\tiny\color{gray}{90}} & 65, & 29, & 58, & 15, & 30, & 60, & 19, & 38, & 76, & 51\hphantom{,} \\ \end{array} $$

Like and , this is just a list of numbers in order.

As the notation suggests, and are inverses. We already saw that the first table had and . Going in the opposite direction, we see from the second table that (again in red) and . The elements of tell you where to find numbers in the table. Where is in the second table? Look at the th element in the first table. , so is at position in the second table.

Before we go too deeply into how these were constructed, let's try the example we did before. The algorithm is just $$\color{purple}{\clubsuit}$$:

$$ \begin{align} 7\cdot 13 &= \ell^{-1}(\ell(7) + \ell(13)) \\ &= \ell^{-1}(\color{darkgreen}{9} + \color{darkgreen}{66}) \\ &= \ell^{-1}(75) \\ &= \color{darkgreen}{91} \end{align} $$

(The relevant numbers are picked out in green in the two tables.)

As promised, with three table lookups and a single integer addition.

What if the sum in the middle exceeds ? No problem, the table wraps around, so that element is the same as element :

$$ \begin{align} 3\cdot 12 &= \ell^{-1}(\ell(3) + \ell(12)) \\ &= \ell^{-1}(\color{darkblue}{69} + \color{darkblue}{71}) \\ &= \ell^{-1}(140) \\ &= \ell^{-1}(40) &\text{(wrap around)}\\ &= \color{darkblue}{36} \end{align} $$

How about that.

(This time the relevant numbers are picked out in blue.)
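Both examples can be replayed with a short sketch. Rather than typing in the two tables, it rebuilds them by repeated doubling mod 101; the names `ELL`, `ELL_INV`, and `multiply` are mine.

```python
P = 101

# Second table: position n (0..99) -> number stored there.
ELL_INV = [pow(2, n, P) for n in range(P - 1)]
# First table: number (1..100) -> its position in the second table.
ELL = {value: n for n, value in enumerate(ELL_INV)}

def multiply(a, b):
    """Two lookups in ELL, one addition, one lookup in ELL_INV (wrapping past 99)."""
    return ELL_INV[(ELL[a] + ELL[b]) % (P - 1)]

print(ELL[7], ELL[13], multiply(7, 13))   # 9 66 91
print(ELL[3], ELL[12], multiply(3, 12))   # 69 71 36
```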

I said this only computes $$ab$$ when the product is at most 100. That is not quite true. If you are willing to ignore a small detail, this algorithm will multiply any two numbers. The small detail is that the multiplication will be done mod 101. That is, instead of the exact answer, you get one that differs from it by a multiple of 101. Let's do an example to see what I mean when I say it works even for products bigger than 100:

$$ \begin{align} 16\cdot 26 &= \ell^{-1}(\ell(16) + \ell(26)) \\ &= \ell^{-1}(4 + 67) \\ &= \ell^{-1}(71) \\ &= 12 \end{align} $$

This tells us that $$16\cdot 26 = 12$$. The correct answer is actually $$16\cdot 26 = 416$$, and indeed $$416 - 12 = 404$$ which is a multiple of 101. The reason this happens is that the elements of the second table, $$\ell^{-1}$$, are not true integers, they are mod 101 integers.
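A brute-force check over every pair of nonzero residues suggests the table method always agrees with ordinary multiplication once both are reduced mod 101. As before, this is a sketch with names of my own choosing, and the tables are regenerated rather than copied.

```python
P = 101
ELL_INV = [pow(2, n, P) for n in range(P - 1)]
ELL = {value: n for n, value in enumerate(ELL_INV)}

def multiply_mod(a, b):
    """Table-based product of a and b, for a and b in 1..100; the result is mod 101."""
    return ELL_INV[(ELL[a] + ELL[b]) % (P - 1)]

print(multiply_mod(16, 26), (16 * 26) % P)   # 12 12

# Pairs where the table method disagrees with a*b mod 101 (there should be none).
mismatches = [(a, b) for a in range(1, P) for b in range(1, P)
              if multiply_mod(a, b) != (a * b) % P]
print(mismatches)   # []
```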

Okay, so what is the secret here? Why does this work? It should jump out at you that it is often the case that an entry in the $$\ell^{-1}$$ table is twice the previous entry:

$$\ell^{-1}(1+n) = 2\cdot \ell^{-1}(n)$$

In fact, this is true everywhere, if you remember that the numbers are not ordinary integers but mod 101 integers. For example, the number that follows 64, in place of 128, is 27. But $$27 \equiv 128 \pmod{101}$$ because they differ by a multiple of 101. From a mod 101 point of view, it doesn't matter whether we put 27 or 128 after 64, as they are the same thing.

Those two facts are the whole secret of the table:

Each element is twice the one before, but
The elements are not quite ordinary numbers, but mod numbers where .

Certainly $$\ell^{-1}(0) = 1$$. And every entry in the $$\ell^{-1}$$ table is twice the previous one, if you are thinking in mod 101. The two secrets are actually one secret:

$$\ell^{-1}(n) = 2^n\pmod{101}.$$

This is why the multiplication algorithm works. Say we want to multiply 7 and 13 again. We look up 7 and 13 in $$\ell$$, and find 9 and 66. What this is really telling us is that

$$ \begin{align} 7 & = 2^{9\hphantom6} \pmod{101} \\ 13 & = 2^{66} \pmod{101} \\ \end{align} $$

so that multiplying $$7\cdot 13$$ mod 101 is the same as multiplying $$2^9\cdot 2^{66}.$$

But multiplying exact powers of 2 is easy, since you just add the exponents: $$2^9\cdot 2^{66} = 2^{75}$$, whether you are doing it in regular numbers or mod 101 numbers. And the $$\ell^{-1}$$ table tells us directly that $$2^{75} = 91 \pmod{101}$$.

The $$\ell$$ function, which is analogous to the regular logarithm, is called a discrete logarithm.

What's going on with Percy Ludgate's algorithm? It's a sort of compressed, limited version of the discrete logarithm.

I had a hope that maybe we could reimplement Ludgate's thing by basing it more directly on discrete logarithms. Say we had the table encoded in a wheel of some sort, with the entries in order around the rim. There's a “current position” , initially , and a “current number” , initially .

On the same axle as the wheel, mount a gear with exactly 100 teeth. We can easily turn the wheel exactly positions by taking a straight bar with teeth and using it to turn the gear, which turns the wheel. We easily multiply the current number by just by turning the wheel one position clockwise.

Multiplying by isn't too hard, just turn the wheel positions clockwise. We can do this by constructing a short bar with exactly teeth and using it to turn the gear. Or maybe we have a meshing gear with teeth, on another axle, which we give one full turn. Either way, if the current number was before, it's after.

Multiplying by 3 is rather more of a pain, because we have to turn the wheel 69 positions, so we need a bar or a meshing gear with 69 teeth.

(We could get away with one with only 31 teeth, if we could turn the wheel the other way, but that seems like it might be more complicated. Hmm, I suppose it would work to use a meshing gear with 31 teeth that engages a second gear (with any number of teeth) that engages the main gear.)

Anyway I took a look to see if there were any better tables to use, and the answer is: maybe! If, instead of a table of $$2^n$$, we use a table of $$26^n$$, then the brass wheel approach performs a little better:

$$ \begin{array}{rrrrrrrrrr} {\tiny\color{gray}{0}} & 1, & 26, & 70, & 2, & 52, & 39, & 4, & 3, & 78, & 8, \\ {\tiny\color{gray}{10}}& 6, & 55, & 16, & 12, & 9, & 32, & 24, & 18, & 64, & 48, \\ {\tiny\color{gray}{20}}& 36, & 27, & 96, & 72, & 54, & 91, & 43, & 7, & 81, & 86, \\ {\tiny\color{gray}{30}}& 14, & 61, & 71, & 28, & 21, & 41, & 56, & 42, & 82, & 11, \\ {\tiny\color{gray}{40}}& 84, & 63, & 22, & 67, & 25, & 44, & 33, & 50, & 88, & 66, \\ {\tiny\color{gray}{50}}& 100, & 75, & 31, & 99, & 49, & 62, & 97, & 98, & 23, & 93, \\ {\tiny\color{gray}{60}}& 95, & 46, & 85, & 89, & 92, & 69, & 77, & 83, & 37, & 53, \\ {\tiny\color{gray}{70}}& 65, & 74, & 5, & 29, & 47, & 10, & 58, & 94, & 20, & 15, \\ {\tiny\color{gray}{80}}& 87, & 40, & 30, & 73, & 80, & 60, & 45, & 59, & 19, & 90, \\ {\tiny\color{gray}{90}}& 17, & 38, & 79, & 34, & 76, & 57, & 68, & 51, & 13, & 35\hphantom{,} \\ \end{array} $$

Multiplying by is no longer as simple as turning the wheel one notch clockwise; you have to turn it positions counterclockwise. But that seems pretty easy. Multiplying by is also rather easy: just turn the wheel positions. If the table above is , then the analogue of Ludgate's table is:

$$ \begin{array}{cccccccccc} \tiny\color{gray}{1} & \tiny\color{gray}{2} & \tiny\color{gray}{3} & \tiny\color{gray}{4} & \tiny\color{gray}{5} & \tiny\color{gray}{6} & \tiny\color{gray}{7} & \tiny\color{gray}{8} & \tiny\color{gray}{9} \\ 0 & 3 & 7 & 6 & 72 & 10 & 27 & 9 & 14 \\ \end{array} $$

That is, if you want to compute , you start with the wheel in position , then turn it by positions, then by , and now it's at position , where the current number is .

The numbers in the table are all pretty small, except that to multiply by you have to turn by positions, which is kinda awful. Still it's only a little worse than in the powers-of-2 version where to multiply by you would have to turn the wheel by positions. And overall the powers-of-26 table is better: the sum of the entries is only , which is optimal; the corresponding sum of the entries for the powers-of-2 table is .
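The comparison between the two wheels can be recomputed with a short sketch. The helper name `digit_logs` is mine; it walks the powers of a base mod 101 and records the position at which each single digit appears.

```python
P = 101

def digit_logs(g):
    """Map each digit 1..9 to the exponent n (0..99) with g**n congruent to it mod 101."""
    logs = {}
    x = 1
    for n in range(P - 1):
        if 1 <= x <= 9:
            logs[x] = n
        x = x * g % P
    return logs

for g in (2, 26):
    logs = digit_logs(g)
    turns = [logs[d] for d in range(1, 10)]
    print(g, turns, sum(turns))
# 2 [0, 1, 69, 2, 24, 70, 9, 3, 38] 216
# 26 [0, 3, 7, 6, 72, 10, 27, 9, 14] 148
```

Running the same loop over every primitive root mod 101 would be one way to check which base gives the smallest total; the sketch above only compares the two bases discussed here.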

Who knows, it might work, and even if it didn't work well it might be pretty cool.

Irish logarithm forward instead of backward

Yesterday I posted about the so-called “Irish logarithm”, Percy Ludgate's little algorithm for single-digit multiplication.

Hacker News user sksksfpuzhpx said:

There's a much simpler way to derive Ludgate's logarithms

and referred to Brian Coghlan's article “Percy Ludgate's Logarithmic indices”.

Whereas I was reverse-engineering Ludgate's tables with a sort of ad-hoc backtracking search, if you do it right you can do it more easily with a simple greedy search.

Uh oh, I thought, I will want to write this up before I move on to the thing I planned to do next, which made it all the more likely that I never would get to the thing I had planned to do next. But Shreevatsa R. came to my rescue and wrote up the Coghlan thing at least as well as I could have myself. Definitely check it out.

Thank you, Shreevatsa!

[ Update 20231015: A different kind of all-integer logarithm: the discrete logarithm. ]

[ Update 20231020: Better explanation of the discrete logarithm. ]

The Irish logarithm

The Wikipedia article on “Irish logarithm” presents this rather weird little algorithm, invented by Percy Ludgate. Suppose you want to multiply $$a$$ and $$b$$, where both are single-digit numbers $$0\le a, b\le 9$$.

Normally you would just look it up on a multiplication table, but please bear with me for a bit.

To use Ludgate's algorithm you need a different little table:

$$ \begin{array}{rl} T_1 = & \begin{array}{cccccccccc} \tiny\color{gray}{0} & \tiny\color{gray}{1} & \tiny\color{gray}{2} & \tiny\color{gray}{3} & \tiny\color{gray}{4} & \tiny\color{gray}{5} & \tiny\color{gray}{6} & \tiny\color{gray}{7} & \tiny\color{gray}{8} & \tiny\color{gray}{9} \\ 50 & 0 & 1 & 7 & 2 & 23 & 8 & 33 & 3 & 14 \\ \end{array} \end{array} $$

and a different bigger one:

$$ \begin{array}{rl} T_2 = & \begin{array}{rrrrrrrrrrr} {\tiny\color{gray}{0}} & 1, & 2, & 4, & 8, & 16, & 32, & 64, & 3, & 6, & 12, \\ {\tiny\color{gray}{10}} & 24, & 48, & 0, & 0, & 9, & 18, & 36, & 72, & 0, & 0, \\ {\tiny\color{gray}{20}} & 0, & 27, & 54, & 5, & 10, & 20, & 40, & 0, & 81, & 0, \\ {\tiny\color{gray}{30}} & 15, & 30, & 0, & 7, & 14, & 28, & 56, & 45, & 0, & 0, \\ {\tiny\color{gray}{40}} & 21, & 42, & 0, & 0, & 0, & 0, & 25, & 63, & 0, & 0, \\ {\tiny\color{gray}{50}} & 0, & 0, & 0, & 0, & 0, & 0, & 35, & 0, & 0, & 0, \\ {\tiny\color{gray}{60}} & 0, & 0, & 0, & 0, & 0, & 0, & 49\hphantom{,} \end{array} \end{array} $$

I've formatted in rows for easier reading, but it's really just a zero-indexed list of numbers. So for example is .

The tiny gray numbers in the margin are not part of the table, they are counting the elements so that it is easy to find element .

Ludgate's algorithm is simply:

$$ ab = T_2(T_1(a) + T_1(b)) $$

Let's see an example. Say we want to multiply . We first look up and in , and get and , which we add, getting . Then is , which is the correct answer.
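Here is a sketch of the algorithm in Python, with the two tables copied from above. The function name `ludgate_multiply` is mine, and I have restricted it to the digits 1 through 9, since the 67-entry list shown here is not long enough to handle a factor of zero.

```python
T1 = [50, 0, 1, 7, 2, 23, 8, 33, 3, 14]

T2 = [
    1, 2, 4, 8, 16, 32, 64, 3, 6, 12,
    24, 48, 0, 0, 9, 18, 36, 72, 0, 0,
    0, 27, 54, 5, 10, 20, 40, 0, 81, 0,
    15, 30, 0, 7, 14, 28, 56, 45, 0, 0,
    21, 42, 0, 0, 0, 0, 25, 63, 0, 0,
    0, 0, 0, 0, 0, 0, 35, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 49,
]

def ludgate_multiply(a, b):
    """Multiply two digits 1..9 with two T1 lookups, one addition, and one T2 lookup."""
    return T2[T1[a] + T1[b]]

print(ludgate_multiply(7, 8))   # 56

# Check every product of two nonzero digits against ordinary multiplication.
all_correct = all(ludgate_multiply(a, b) == a * b
                  for a in range(1, 10) for b in range(1, 10))
print(all_correct)              # True
```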

This isn't useful for paper-and-pencil calculation, because it only works for products up to , and an ordinary multiplication table is easier to use and remember. But Ludgate invented this for use in a mechanical computing engine, for which it is much better-suited.

The table lookups are mechanically very easy. They are simple one-dimensional lookups: to find you just look at entry in the table, which can be implemented as a series of ten metal rods of different lengths, or something like that. Looking things up in a multiplication table is harder because it is two-dimensional.

The single addition in Ludgate's algorithm can also be performed mechanically: to add and , you have some thingy that slides up by units, and then by more, and then wherever it ends up is used to index into to get the answer. The table doesn't have to be calculated on the fly, it can be made up ahead of time, and machined from brass or something, and incorporated directly into the machine. (It's tempting to say “hardcoded”.)

The tables look a little uncouth at first but it is not hard to figure out what is going on. First off, is the inverse of in the sense that $$T_2(T_1(n)) = n\tag{$\color{darkgreen}{\spadesuit}$}$$

whenever is in range — that is when .

is more complex. We must construct it so that

$$T_2(T_1(a) + T_1(b)) = ab.\tag{$\color{purple}{\clubsuit}$}$$

for all and of interest, which means that .

If you look over the table you should see that the entry is often followed by . That is, , at least some of the time. In fact, this is true in all the cases we care about, where for some single digits .

The second row could just as well have started with , but Ludgate doesn't need the entries, so he made them zero, which really means “I don't care”. This will be important later.

The algorithm says that if we want to compute , we should compute $$ \begin{align} 2n & = T_2(T_1(2) + T_1(n)) && \text{Because $\color{purple}{\clubsuit}$} \\ & = T_2(1 + T_1(n)) \\ & = 2T_2(T_1(n)) && \text{Because moving one space right doubles the value}\\ & = 2n && \text{Because $\color{darkgreen}{\spadesuit}$} \end{align} $$

when .

I formatted in rows of because that makes it easy to look up examples like , and because that's how Wikipedia did it. But this is very misleading, and not just because it makes appear to be a table when it's really a vector. is actually more like a compressed version of a table.

Let's reformat the table so that the rows have length instead of :

We have already seen that moving one column right usually multiplies the entry by . Similarly, moving down by one row is seen to triple the value — not always, but in all the cases of interest. Since the rows have length , moving down one row from gets you to , and this is why : to compute , one does:

$$ \begin{align} 3n & = T_2(T_1(3) + T_1(n)) && \text{Because $\color{purple}{\clubsuit}$} \\ & = T_2(7 + T_1(n)) \\ & = 3T_2(T_1(n)) && \text{Because moving down triples the value}\\ & = 3n && \text{Because $\color{darkgreen}{\spadesuit}$} \end{align} $$

Now here is where it gets clever. It would be straightforward to build as a stack of tables, with each layer in the stack having entries quintuple the layer above, like this:

This works, if we make the correct offset, which is . But it wastes space, and the larger is, the more complicated and expensive is the brass thingy that encodes it. The last six entries of each layer in the stack are don't-cares, so we can just omit them:

And to compensate we make instead of : you now move down one layer in the stack by skipping entries forward, instead of .

The table is still missing all the multiples of , but we can repeat the process. The previous version of can now be thought of as a table, and we can stack another table below it, with all the entries in the new layer being times the original one:

$$ \begin{array}{rrrrrrrr} {\tiny\color{gray}{0}} & 1, & 2, & 4, & 8, & 16, & 32, & 64, \\ {\tiny\color{gray}{7}} & 3, & 6, & 12, & 24, & 48, & 0, & 0, \\ {\tiny\color{gray}{14}} & 9, & 18, & 36, & 72, & 0, & 0, & 0, \\ {\tiny\color{gray}{21}} & 27, & 54, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{28}} & 81, \\ \\ {\tiny\color{gray}{29}} & 5, & 10, & 20, & 40, & 0, & 0, & 0, \\ {\tiny\color{gray}{36}} & 15, & 30, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{43}} & 45, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{50}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{57}} & 0, \\ \\ {\tiny\color{gray}{58}} & 25, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{65}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{72}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{79}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{86}} & 0, \\ \\ \hline \\ {\tiny\color{gray}{87}} & 7, & 14, & 28, & 56, & 0, & 0, & 0, \\ {\tiny\color{gray}{94}} & 21, & 42, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{101}} & 63, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{108}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{115}} & 0, \\ \\ {\tiny\color{gray}{116}} & 35, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{123}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{130}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{137}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{144}} & 0, \\ \\ {\tiny\color{gray}{145}} & 0, & 0, & 0, & 0, & 0, & 0, & \ldots \\ \\ \hline \\ {\tiny\color{gray}{174}} & 49\hphantom{,} \\ \end{array} $$

Each layer in the stack has entries, so we could take and it would work, but the last entries in every layer are zero, so we can discard those and reduce the layers to entries each.

$$ \begin{array}{rrrrrrrr} {\tiny\color{gray}{0}} & 1, & 2, & 4, & 8, & 16, & 32, & 64, \\ {\tiny\color{gray}{7}} & 3, & 6, & 12, & 24, & 48, & 0, & 0, \\ {\tiny\color{gray}{14}} & 9, & 18, & 36, & 72, & 0, & 0, & 0, \\ {\tiny\color{gray}{21}} & 27, & 54, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{28}} & 81, \\ \\ {\tiny\color{gray}{29}} & 5, & 10, & 20, & 40, & 0, & 0, & 0, \\ {\tiny\color{gray}{36}} & 15, & 30, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{43}} & 45, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{50}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{57}} & 0, \\ \\ {\tiny\color{gray}{58}} & 25, \\ \\ \hline \\ {\tiny\color{gray}{59}} & 7, & 14, & 28, & 56, & 0, & 0, & 0, \\ {\tiny\color{gray}{66}} & 21, & 42, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{73}} & 63, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{80}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{87}} & 0, \\ \\ {\tiny\color{gray}{88}} & 35, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{95}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{102}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{109}} & 0, & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{116}} & 0, \\ \\ {\tiny\color{gray}{117}} & 0, \\ \\ \hline \\ {\tiny\color{gray}{118}} & 49\hphantom{,} \\ \end{array} $$

Doing this has reduced the layers from to elements each, but Ludgate has another trick up his sleeve. The last few numbers in the top layer are and a lot of zeroes. If he could somehow finesse and , he could trim the top two layers all the way back to only 38 entries each:

We're now missing and we need to put it back. Fortunately the place we want to put it is , and that slot contains a zero anyway. And similarly we want to put at position , also empty:

The arithmetic pattern is no longer as obvious, but property still holds.

We're not done yet! The table still has a lot of zeroes we can squeeze out. If we change from to , the group will slide backward to just after the , and the will move to the row below that.

We will also have to move the other multiples of . The itself moved back by six entries, and so did everything after that in the table, including the (from position to ) and the (from position to ) so those are still in the right places. Note that this means that has moved from position to position , so we now have .

But the is giving us trouble. It needed to move back twice as far as the others, from to , and unfortunately it now collides with which is currently at position .

$$ \begin{array}{rrrrrrrr} {\tiny\color{gray}{0}} & 1, & 2, & 4, & 8, & 16, & 32, & 64, \\ {\tiny\color{gray}{7}} & 3, & 6, & 12, & 24, & 48, & 0, & 0, \\ {\tiny\color{gray}{14}} & 9, & 18, & 36, & 72, & 0, & 0, & 0, \\ {\tiny\color{gray}{21}} & 27, & 54, & \color{purple}{5}, & \color{purple}{10}, & \color{purple}{20}, & \color{purple}{40}, & 0, \\ {\tiny\color{gray}{28}} & 81, & 0, & \color{purple}{15} & \color{purple}{30}, \\ \\ \hline \\ {\tiny\color{gray}{32}} & 7, & 14, & 28, & 56, & 0, & \color{darkgreen}{45}, & 0, \\ {\tiny\color{gray}{39}} & 21, & 42, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{46}} & {63\atop\color{darkred}{¿25?}} & 0, & 0, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{53}} & 0, & 0, & \color{darkgreen}{35}, & 0, & 0, & 0, & 0, \\ {\tiny\color{gray}{60}} & 0, & 0, & 0, & 0, \\ \\ \hline \\ {\tiny\color{gray}{64}} & 49\hphantom{,} \\ \end{array} $$

We need another tweak to fix . is currently at position . We can't move any farther back to the left without causing more collisions. But we can move it forward, and if we move it forward by one space, the will move up one space also and the collision with will be solved. So we insert a zero between and , which moves up from position to :

All the other multiples of moved up by one space, but not the non-multiples and . Also had to move up by two, but that's no problem at all, since it was at the end of the table and has all the space it needs.

And now we are done! This is exactly Ludgate's table, which has the property that

$$T_2(p + 7q + 23r + 33s) = 2^p3^q5^r7^s$$

whenever for some . Moving right by one space multiplies the entry by , at least for the entries we care about. Moving right by seven spaces multiplies the entry by . To multiply by or we move right by or or by , respectively.

These are exactly the values in the table:

$$\begin{align} T_1(2) & = 1\\ T_1(3) & = 7\\ T_1(5) & = 23\\ T_1(7) & = 33 \end{align}$$

The rest of the table can be obtained by remembering , that , so for example because . Or we can get by multiplication, using : multiplying by is the same as multiplying by and then by , which means you move right by and then by , for a total of . Here's again for reference:

$$ \begin{array}{rl} T_1 = & \begin{array}{cccccccccc} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline 50 & 0 & 1 & 7 & 2 & 23 & 8 & 33 & 3 & 14 \\ \end{array} \end{array} $$
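Going the other way, both tables can be rebuilt from nothing but the four indices 1, 7, 23, and 33. This is a sketch under that assumption; the names `INDEX` and `t1` are mine, and the zero digit is left out.

```python
# Indices of the primes, as listed above.
INDEX = {2: 1, 3: 7, 5: 23, 7: 33}

def t1(n):
    """Index of a digit 1..9: add up the indices of its prime factors."""
    total = 0
    for prime, idx in INDEX.items():
        while n % prime == 0:
            n //= prime
            total += idx
    return total

T1 = {n: t1(n) for n in range(1, 10)}

# Fill in T2 slot by slot; unused slots stay 0 ("don't care").
T2 = [0] * 67
for a in range(1, 10):
    for b in range(1, 10):
        slot = T1[a] + T1[b]
        assert T2[slot] in (0, a * b)   # two different products never share a slot
        T2[slot] = a * b

print([T1[n] for n in range(1, 10)])   # [0, 1, 7, 2, 23, 8, 33, 3, 14]
print(T2[:12])                         # [1, 2, 4, 8, 16, 32, 64, 3, 6, 12, 24, 48]
```

The inner assertion is the important part: it confirms that the offsets 7, 23, and 33 never force two different products into the same slot.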

(Actually I left out a detail: . Ludgate wants for all . So we need for each in . is the smallest value that works. This is rather painful, because it means that the -item table above is not sufficient. Ludgate has to extend all the way out to items in order to handle the seemingly trivial case of . But the last 35 entries are all zeroes, so the brass widget probably doesn't have to be too much more complicated.)
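The search for the index of zero can be sketched as follows. I am assuming the requirement is that adding the index of zero to any digit's index (including zero's own) must land on a slot that holds no nonzero product; the names `occupied`, `works`, and `smallest` are mine.

```python
# T1(1) .. T1(9), from the table above.
T1_NONZERO = [0, 1, 7, 2, 23, 8, 33, 3, 14]

# Slots of T2 that hold a genuine (nonzero) product.
occupied = {x + y for x in T1_NONZERO for y in T1_NONZERO}

def works(z):
    """True if using z as the index of 0 always lands on an unoccupied slot."""
    return all(z + t not in occupied for t in T1_NONZERO) and 2 * z not in occupied

smallest = next(z for z in range(200) if works(z))
print(smallest)           # 50
print(2 * smallest + 1)   # 101 slots are needed to hold position 2 * 50
```

The result agrees with the entry for 0 in the table above.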

Wasn't that fun? A sort of mathematical engineering of a kind that has not been really useful for at least fifty years.

But actually that was not what I planned to write about! (Did you guess that was coming?) I thought I was going to write this bit as a brief introduction to something else, but the brief introduction turned out to be 2500 words and a dozen complicated tables.

We can only hope that part 2 is forthcoming. I promise nothing.

[ Update 20231002: Rather than the ad-hoc backtracking approach I described here, one can construct $T_1$ and $T_2$ in a simpler and more direct way. Shreevatsa R. explains. ]

[ Update 20231015: Part 2 has arrived! It discusses a different kind of all-integer logarithm called the “discrete” logarithm. ]

[ Update 20231020: I think this is a clearer explanation of the discrete logarithm. Shorter, anyway. ]

Math SE report 2023-06: funky-looking Hasse diagrams, and what is a polynomial anyway?

Is a fourth-degree equation?

This is actually a really good question! (You can tell because it's quick to ask and complicated to answer.) It goes back to a very fundamental problem that beginners in mathematics have, which is that there is a difference between an object's true nature and the way it happens to be written down. And often these problems are compounded because there is no way to talk about the object except by referring to how it is written down.

OP says:

The best definition I could find for the degree of an equation is the following:

The highest power of the unknown term whose coefficient isn't zero in a given equation

And they are bothered by this, and rightly so. I was almost derailed at this point into writing an article about what an equation is, but I'm going to put it off for another day, because I think to get to this person's question what we really need to do is to say what a polynomial is.

One way is to describe it as an expression in a certain form, but this is a bit roundabout. It's like describing a rational number as an expression of the form where and are relatively prime integers. Under this sort of definition, isn't a polynomial at all, because it's not an expression of the correct form.

But I think the right way to define a polynomial is that it's an element of the free ring over some ring of coefficients. This leaves completely open the question of how a polynomial is written, or what it looks like. It becomes a theorem that polynomials are in one-to-one correspondence with finite sequences of elements of . Then we can say that the “degree” of a polynomial is one less than the length of the corresponding finite sequence, or something like that.

[ Sometimes we make an exception for the zero polynomial and say its degree is $-\infty$, to preserve the law $\deg(pq) = \deg p + \deg q$. ]

In this view the zero polynomial is simply the zero element of the ring. The polynomial called “” is the fourth power of the free element .

Since the polynomials are elements of a ring, addition, subtraction, and multiplication come along automatically, and we can discuss the value of the expression , which by the usual properties of is also the zero polynomial.

Anyway that all is pretty much what I said:

is just a way to write the polynomial , which is not a fourth-degree polynomial. Similarly is not a fifth-degree polynomial.

There's an underlying reality here, the abstract elements of the ring . And then there's a representation theorem, which is that elements of are in one-to-one correspondence with finite sequences of elements of . The ring laws give us ways to normalize arbitrary expressions involving polynomials. And then there's also the important functor that turns every polynomial ring into a ring of functions, turning the polynomial into the function .

This kind of abstract approach isn't usually explained in secondary or tertiary education, and I'm not sure it would be an improvement if it were. (You'd have to explain rings, for one thing.) But the main conceptual point is that there is a difference between the thing itself (say, ) and the way we happen to write the thing (say, ), and some properties are properties of the thing itself and others are properties of expressions, of the way the thing has been written. The degree of a polynomial is a property of the thing itself, not of the way it happens to be written, so both of those expressions are ways to write the same polynomial, which is fourth-degree, regardless of the fact that in one of them, “the highest power of the unknown term whose coefficient isn't zero” is five.
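A rough Python sketch of the representation idea, under stated assumptions: a polynomial is stored as a tuple of coefficients, lowest power first, and `normalize`, `add`, and `degree` are names made up for this illustration. The example polynomial is hypothetical and is not the one from the question. Returning `None` for the degree of the zero polynomial is just one possible convention.

```python
from itertools import zip_longest


def normalize(coeffs):
    """Drop trailing zero coefficients; coeffs[i] is the coefficient of x**i."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return tuple(coeffs)


def add(p, q):
    """Add two polynomials and return the normalized result."""
    return normalize(a + b for a, b in zip_longest(p, q, fillvalue=0))


def degree(p):
    """One less than the length of the normalized sequence; None for zero."""
    p = normalize(p)
    return len(p) - 1 if p else None


# "x^5 + x^4 - x^5" mentions a fifth power when written this way...
written_with_fifth_power = add((0, 0, 0, 0, 1, 1), (0, 0, 0, 0, 0, -1))
# ...but it is the same object as plain "x^4".
plain_fourth_power = normalize((0, 0, 0, 0, 1))
```

The degree is computed from the normalized sequence, so it does not depend on which expression was used to build the polynomial.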

There is one example of this abstraction that everyone learns in childhood, rational numbers. I lean hard on this example, because most people know it pretty well, even if they don't realize it yet. and are the same thing, written in two different ways. Mathematicians will, without even thinking about it, speak of the numerator of a rational number, and without batting an eyelash will say that the numerator of the rational number is . The fraction is a mere notation that represents a rational number, in this case the rational number , and this rational number has a numerator of .
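Python's standard `fractions` module behaves the same way, which makes for a quick illustration. The particular fraction below is an arbitrary example chosen for this sketch.

```python
from fractions import Fraction

# Two different notations...
q1 = Fraction(10, 15)
q2 = Fraction(2, 3)

# ...for one rational number, whose numerator is a property of the number,
# not of the notation used to write it.
same_number = q1 == q2
numerator = q1.numerator
```

Here `numerator` is 2 even though the number was written with a 10 on top.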

Beginning (and intermediate) computer programmers also have this issue, that the thing itself, usually some embedding of data in the computer's memory, may be serialized into a string in multiple ways. There's a string that represents the thing, and then there's the thing itself, but it's hard to talk about the thing itself because anything you can write is a string. I wish this were made explicit in CS education. Computer programmers who don't pick up on this crucial point, at least on an intuitive level, do not progress past intermediate.
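A minimal illustration in Python, using JSON as the serialization format. The two strings are made-up examples.

```python
import json

# Two different strings...
s1 = '{"x": 1, "y": [2, 3]}'
s2 = '{ "y":[2,3], "x":1 }'

strings_differ = s1 != s2

# ...that are serializations of the same underlying thing.
same_thing = json.loads(s1) == json.loads(s2)
```

Comparing the strings and comparing the things they represent are different operations, and they give different answers here.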

What are the names given to statements that can be true or false?

I think I totally flubbed this one. OP is really concerned with open and closed formulas. For example, “” is true, or false, depending on the value of . And OP astutely noted that while “” is always true, its meaning still depends on the value of .

I did get around to explaining that part of the issue, eventually. The crucial point, which is that there are formulas which may have free variables and then there are statements which do not, is buried at the end after a lot of barely-relevant blather about Quinian quasiquotation. What was I thinking? Perhaps I should go back and shorten the answer to just the relevant part.

How does one identify the weakest preconditions in Hoare triples?

I wrote a detailed explanation of how one identifies weakest preconditions in Hoare triples, before realizing that actually OP understood this perfectly, and their confusion was because their book wrote “{x≠1}” where it should have had “{x≠-1}”.

Sheesh.

Artifacts of mathematical logic

This was fun. OP wants actual physical artifacts that embody concepts from mathematical logic, the way models of the platonic solids embody concepts from solid geometry.

I couldn't think of anything good, but then Michael Weiss brought up Lewis Carroll's Game of Logic. This reminded me that Martin Gardner had written a book about embodiments of simple logic, including the Carroll one, so I mentioned that. It's a fun book. Check out the account of Ramon Llull, who missed being canonized because his martyrdom looked a bit too much like FAFO.

I find this answer a little unsatisfying though. The logic machines in Gardner's book do only a little boolean algebra, or maybe at best a tiny bit of predicate logic. But I'd love to see a physical system that somehow encapsulated sequent calculus or natural deduction or something like that. Wouldn't it be cool to have blocks stuck together with magnets or physical couplings, so that you could physically detach the from only if you already had an assemblage that matched exactly? I have no idea how you'd do it. Maybe a linear logic model would be more feasible: once you used with to get , you wouldn't be able to use either one again.

We need some genius to come and invent some astonishing mechanism that formerly seemed impossible. I wonder if Ernő Rubik is available?

Joachim Breitner's Incredible Proof Machine is a fun thing along these lines, but it's not at all an artifact.

Is there a name for this refinement of the subset ordering?

This was my question. I've never seen this ordering elsewhere, but it has a simple description. We start with a totally ordered finite set . Then if and are subsets of , we deem if there is an injective mapping where for each .

So for example, if has three elements then the ordering I want, on , has this Hasse diagram:

because we can match the 's. And because we can match with and with . But because we can't match with either or with , and because, while we can match either of or with , we aren't allowed to match both of them with .

Here's the corresponding Hasse diagram for :

Maybe a better way to describe this is: the bottom element is . To go up the lattice one step, you either increment one of the elements of the current set, or you insert an if there isn't one already. So from you can either increment the to move up to or you can insert an to move up to .
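Here is a hedged Python sketch of the ordering. It assumes the underlying set is $\{1, \ldots, n\}$, that the mapping must send each element to one at least as large, and that the element inserted in a single step is $1$. The names `precedes` and `steps_up` are invented for this sketch. The comparison rests on the observation that an injective matching exists exactly when the $k$-th largest element of the smaller set is at most the $k$-th largest element of the bigger one, for every $k$.

```python
def precedes(a, b):
    """True if some injective f: a -> b has x <= f(x) for every x in a.

    Greedy check: pair the largest of a with the largest of b, and so on.
    """
    xs = sorted(a, reverse=True)
    ys = sorted(b, reverse=True)
    return len(xs) <= len(ys) and all(x <= y for x, y in zip(xs, ys))


def steps_up(a, n):
    """The sets reachable from a in one step, inside {1, ..., n}.

    Either insert 1 if it is missing, or increment one element,
    provided the incremented value is still in range and not already present.
    """
    a = frozenset(a)
    result = set()
    if 1 not in a:
        result.add(a | {1})
    for x in a:
        if x < n and x + 1 not in a:
            result.add((a - {x}) | {x + 1})
    return result
```

For example, `precedes({1, 2}, {3})` is false, because both elements would have to be matched with the same one.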

This ordering comes up in connection with a problem I've thought about a lot: Say you have a number and you want to find with and as close together as possible. Even if you have the prime factorization of available, it's not usually clear what the answer is. (In general it's NP-hard.)

If is the product of two primes, the answer is obvious. And if is a product of three primes there is a definitive answer. Without loss of generality, , and the answer is simply that is always optimal.

But if it can go two different ways. Assuming , it usually turns out that the optimal solution is . But sometimes the optimal solution is . These are the only two possibilities.

Which ways of splitting the prime factors might be optimal relates to those Hasse diagrams above. The possibly-optimal splits between and correspond to nodes that are just at the boundary of the left and right halves of the diagram.
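For small cases one can simply try every split. The following brute-force Python sketch does that. The name `best_split` is made up for this illustration, and it takes the list of prime factors as given. It is exponential in the number of factors, so it is only useful for experimenting.

```python
from itertools import combinations
from math import prod


def best_split(primes):
    """Return (a, b) with a * b == prod(primes) and a <= b, minimizing b - a.

    Tries every subset of the prime factors as the factors of a.
    """
    n = prod(primes)
    best = None
    indices = range(len(primes))
    for k in range(len(primes) + 1):
        for chosen in combinations(indices, k):
            a = prod(primes[i] for i in chosen)
            b = n // a
            if a <= b and (best is None or b - a < best[1] - best[0]):
                best = (a, b)
    return best
```

With three primes, `best_split([2, 3, 5])` puts the two smallest primes together against the largest. With four primes both behaviors show up: `[2, 3, 5, 7]` pairs the smallest with the largest, while `[2, 2, 2, 7]` puts the three smallest together against the largest.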

Nobody had an answer for what this order was called, so I could not look it up. This is OK, I will figure it all out eventually.

Gantō's axe does have computational content

A while back I was thinking about this theorem of intuitionistic logic:

$$((P\to Q)\land ((P\to \bot) \to Q)) \to {(Q\to\bot)\to \bot} \tag{$\color{darkred}{\heartsuit}$}$$

Since it's intuitionistically valid, its proof can be converted into a program with the corresponding type. But the function that you get seemed to be computationally vacuous, since it could only be called with arguments that were impossible to obtain, or else with arguments that render the whole thing silly. For the former, you can make it work in general if you have at hand a function of type — but how could you? And for the latter, you can make it work if , in which case you don't really need the program at all since is trivially true.

Several people replied to me about this, but the most interesting response I got was from Simon Tatham, who observed that $\color{darkred}{\heartsuit}$ is still intuitionistically valid if you replace $\bot$ with arbitrary $R$:

$$((P\to Q)\land ((P\to R) \to Q)) \to {(Q\to R)\to R} \tag{$\color{purple}{\spadesuit}$}$$

The proof is essentially the same:

Suppose you have: $P\to Q$ and $(P\to R)\to Q$
then if you also have $Q\to R$, you can compose it with $P\to Q$ to get $P\to R$
which you can feed to $(P\to R)\to Q$ to get $Q$
and then feed that to $Q\to R$ to get $R$.

But unlike $\color{darkred}{\heartsuit}$, this is computationally interesting.

M. Tatham gives this example: Take to be . We can understand a function as a subset of , represented by its characteristic function. Say that is the type of subsets of , perhaps in some representation that is more concrete than characteristic functions. If you'd like a more specific example, take to be the natural numbers, and to be (recursive) sets of natural numbers as represented by (possibly infinite) lists in strictly increasing order.

Then:

is some arbitrary function that assigns, to each element of , a related subset of . Say for example that is the semispatulated closure of .
could be the isomorphism that converts a subset of , represented as a characteristic function, into one represented as a .
is a single subset of , represented as a characteristic function on . Say, is true if and only if the set has the Cosell property.

With these interpretations, tells you whether 's semispatulated closure has the Cosell property, and is the element of that represents, -style, the set of all such .

Then computes whether this set itself also has the Cosell property: true if so, false if not.
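To make this concrete, here is a runnable Python sketch under loudly stated assumptions. The function names `spade`, `f`, `g`, and `h` are chosen for this sketch. The universe is cut down to the integers from 1 to 12 so that everything is finite. "Semispatulated closure" is replaced by "set of divisors" and "the Cosell property" by "has an even number of elements", purely as stand-ins for the made-up properties.

```python
UNIVERSE = range(1, 13)  # a finite stand-in for the natural numbers


def spade(f, g):
    """The program for the theorem: given f: P -> Q and g: (P -> R) -> Q,
    return a function that takes h: Q -> R and produces an R."""
    return lambda h: h(g(lambda p: h(f(p))))


def f(p):
    """P -> Q: a subset related to p (here, its divisors), as a sorted tuple."""
    return tuple(d for d in UNIVERSE if p % d == 0)


def g(chi):
    """(P -> bool) -> Q: turn a characteristic function into a sorted tuple."""
    return tuple(p for p in UNIVERSE if chi(p))


def h(q):
    """Q -> bool: a stand-in property, true when q has evenly many elements."""
    return len(q) % 2 == 0


# The set of all p whose related subset has the property...
selected = g(lambda p: h(f(p)))
# ...and whether that set itself has the property.
answer = spade(f, g)(h)
```

With these stand-ins, `selected` is the set of non-squares in the universe, since a number has an even number of divisors exactly when it is not a perfect square, and `answer` reports whether that set has an even number of elements.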

This is certainly nontrivial, and if the made-up properties were replaced by real ones, it could even be interesting.

One could also replace with some other set, and then instead of a characteristic function, would be some sort of valuation function.

I had asked:

Is the whole thing just trivial because there is no interesting way to instantiate data objects with the right types? Or is there some real computational content here?

I think M. Tatham's example shows that there is, and that the apparent vacuity of the theorem arose only because $R$ had been replaced with the empty set.