Mon, 24 Sep 2018

A long time ago, I wrote up a blog article about how to derive the linear regression formulas from first principles. Then I decided it was not of general interest, so I didn't publish it. (Sometime later I posted it to math stack exchange, so the effort wasn't wasted.)

The basic idea is, you have some points !!(x_i, y_i)!!, and you assume that they can be approximated by a line !!y=mx+b!!. You let the error be a function of !!m!! and !!b!!:

$$\varepsilon(m, b) = \sum (mx_i + b - y_i)^2$$

and you use basic calculus to find !!m!! and !!b!! for which !!\varepsilon!! is minimal. Bing bang boom.

I knew this for a long time but it didn't occur to me until a few months ago that you could use basically the same technique to fit any other sort of curve. For example, suppose you think your data is not a line but a parabola of the type !!y=ax^2+bx+c!!. Then let the error be a function of !!a, b,!! and !!c!!:

$$\varepsilon(a,b,c) = \sum (ax_i^2 + bx_i + c - y_i)^2$$

and again minimize !!\varepsilon!!. You can even get a closed form as you can with ordinary linear regression.

I especially wanted to try fitting hyperbolas to data that I expected to have a Zipfian distribution. For example, take the hundred most popular names for girl babies in Illinois in 2017. Is there a simple formula which, given an ordinal number like 27, tells us approximately how many girls were given the 27th most popular name that year? (“Scarlett”? Seriously?)

I first tried fitting a hyperbola of the form !!y = c + \frac ax!!. We could, of course, take !!y_i' = \frac 1{y_i}!! and then try to fit a line to the points !!\langle x_i, y_i'\rangle!! instead. But this will distort the measurement of the error. It will tolerate gross errors in the points with large !!y!!-coordinates, and it will be extremely intolerant of errors in points close to the !!x!!-axis. This may not be what we want, and it wasn't what I wanted. So I went ahead and figured out the Zipfian regression formulas:

$$ \begin{align} a & = \frac{HY-NQ}D \\ c & = \frac{HQ-JY}D \end{align} $$

Where:

$$\begin{align} H & = \sum x_i^{-1} \\ J & = \sum x_i^{-2} \\ N & = \sum 1\\ Q & = \sum y_ix_i^{-1} \\ Y & = \sum y_i \\ D & = H^2 - NJ \end{align} $$

When I tried to fit this to some known hyperbolic data, it worked just fine. For example, given the four points !!\langle1, 1\rangle, \langle2, 0.5\rangle, \langle3, 0.333\rangle, \langle4, 0.25\rangle!!, it produces the hyperbola

$$y = \frac{1.00018461538462}{x} - 0.000179487179486797.$$

This is close enough to !!y=\frac1x!! to confirm that the formulas work; the slight error in the coefficients is because we used !!\bigl\langle3, \frac{333}{1000}\bigr\rangle!! rather than !!\bigl\langle3, \frac13\bigr\rangle!!.

Unfortunately these formulas don't work for the Illinois baby data. Or rather, the hyperbola fits very badly. The regression produces

$$y = \frac{892.765272442475}{x} + 182.128894972025.$$

I think maybe I need to be using some hyperbola with more parameters, maybe something like !!y = \frac a{x-b} + c!!. In the meantime, here's a trivial script for fitting !!y = \frac ax + c!! hyperbolas to your data:
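The formulas translate directly into code. Here is a minimal sketch in Haskell (my reconstruction, not the original script):

```haskell
-- Least-squares fit of y = a/x + c, using the closed-form
-- solution derived above.
zipfFit :: [(Double, Double)] -> (Double, Double)
zipfFit pts = (a, c)
  where
    h  = sum [ 1 / x       | (x, _) <- pts ]  -- H = Σ 1/x
    j  = sum [ 1 / (x * x) | (x, _) <- pts ]  -- J = Σ 1/x²
    n  = fromIntegral (length pts)            -- N = Σ 1
    q  = sum [ y / x       | (x, y) <- pts ]  -- Q = Σ y/x
    y' = sum [ y           | (_, y) <- pts ]  -- Y = Σ y
    d  = h * h - n * j                        -- D = H² − NJ
    a  = (h * y' - n * q) / d
    c  = (h * q - j * y') / d

main :: IO ()
main = print (zipfFit [(1, 1), (2, 0.5), (3, 0.333), (4, 0.25)])
-- (1.00018..., -0.000179...), matching the example above
```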
[ Addendum 20180925: Shreevatsa R. asked a related question on StackOverflow and summarized the answers. The problem is more complex than it might first appear. Check it out. ]

[Other articles in category /math] permanent link

Fri, 14 Sep 2018
How not to remember the prime numbers under 1,000
A while back I said I wanted to memorize all the prime numbers under 1,000, because I am tired of getting some number like 851 or 857, or even 307, and then not knowing whether it is prime. The straightforward way to deal with this is: just memorize the list. There are only 168 of them, and I have the first 25 or so memorized anyway.

But I had a different idea also. Say that a set of numbers from !!10n!! to !!10n+9!! is a “decade”. Each decade contains at most 4 primes, so 4 bits are enough to describe the primes in a single decade. Assign a consonant to each of the 16 possible patterns, say “b” when none of the four numbers is a prime, “d” when only !!10n+1!! is prime, “f” when only !!10n+3!! is, and so on.

Now memorizing the primes in the 90 decades is reduced to memorizing 90 consonants. Inserting vowels wherever convenient, we have now turned the problem into one of memorizing around 45 words. A word like “potato” would encode the constellation of primes in three consecutive decades. 45 words is only a few sentences, so perhaps we could reduce the list of primes to a short and easily-remembered paragraph. If so, memorizing a few sentences should be much easier than memorizing the original list of primes.

The method has several clear drawbacks. We would have to memorize the mapping from consonants to bit patterns, but this is short and maybe not too difficult. More significant is that if we're trying to decide if, say, 637 is prime, we have to remember which consonant in which word represents the 63rd decade. This can be fixed, maybe, by selecting words and sentences of standard length. Say there are three sentences and each contains 30 consonants. Maybe we can arrange that words always appear in patterns, say four words with 1 or 2 consonants each that together total 7 consonants, followed by a single long word with three consonants. Then each sentence can contain three of these five-word groups and it will be relatively easy to locate the 23rd consonant in a sentence: it is early in the third group.

Katara and I tried this, with not much success. But I'm not ready to give up on the idea quite yet. A problem we encountered early on is that we remember consonants not by how words are spelled but by how they sound. So we don't want a word like “hammer” to represent the consonant pattern h-m-m but rather just h-m.

Another problem is that some constellations of primes are much more common than others. We initially assigned consonants to constellations in order. This assigned letter “b” to the decades that contain no primes. But this is the most common situation, so the letter “b” tended to predominate in the words we needed for our mnemonic. We need to be careful to assign the most common constellations to the best letters.

Some consonants in English like to appear in clusters, and it's not trivial to match these up with the common constellations. The mapping from prime constellations to consonants must be carefully chosen to work with English. We initially assigned “s” to the constellation “☆•☆☆” (where !!10n+1, 10n+7,!! and !!10n+9!! are prime but !!10n+3!! is not) and “t” to the constellation “☆☆••” (where !!10n+1!! and !!10n+3!! are prime but !!10n+7!! and !!10n+9!! are not) but these constellations cannot appear consecutively, since at least one of !!10n+7, 10n+9, 10n+11!! is composite. So any word with “s” and “t” with no intervening consonants was out of play. This eliminated a significant fraction of the entire dictionary!

I still think it could be made to work, maybe.
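The encoding side of this is mechanical; here is a sketch of it in Haskell (the names are mine, and the consonant assignment — the hard part — is left out):

```haskell
-- A decade's “constellation”: which of 10n+1, 10n+3, 10n+7, 10n+9
-- are prime, packed into 4 bits (10n+1 is the high bit).
isPrime :: Int -> Bool
isPrime n = n >= 2 && all (\p -> n `mod` p /= 0)
                          (takeWhile (\p -> p * p <= n) [2 ..])

constellation :: Int -> [Bool]
constellation n = [ isPrime (10 * n + k) | k <- [1, 3, 7, 9] ]

-- Pack the four bits into a number 0..15, ready to be assigned
-- one of the sixteen consonants.
code :: Int -> Int
code n = foldl (\acc b -> 2 * acc + fromEnum b) 0 (constellation n)

main :: IO ()
main = print [ code n | n <- [10 .. 99] ]  -- the 90 decades of 100..999
```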
If you're interested in playing around with this, the programs I wrote are available on Github. The mapping from decade constellations to consonant clusters is in
[Other articles in category /math] permanent link

Wed, 12 Sep 2018
Perils of hacking on mature software
Yesterday I wrote up an interesting bug in git log --follow.

People complain that the trouble of working on mature software like Git is understanding the way the code is structured, its conventions, the accumulated layers of cruft, and where everything is. I think this is a relatively minor difficulty. The hard part is not so much doing what you want, as knowing what you want to do.

My original idea for the fix was this: I can give

I excavated the code and found where the change needed to go. It's not actually in

But then I ran into a roadblock. Diff already has an undocumented flag called

(See this commit for further details.)

The roadblock is: how does

If they should be controlled in unison, should

If we add new options, how do they interact with the existing and already non-orthogonal flags that do something like this? They include at least the following options of

Only

Now suppose you would like to configure a default for this option in your

The thing to do at this point is to come up with some reasonable-seeming proposal and send it to Jeff King, who created the undocumented

Doing any particular thing would not be too hard. The hard part is deciding what particular thing to do.

[Other articles in category /prog] permanent link
Language fluency in speech and print
Long ago I worked among the graduate students at the University of Pennsylvania department of Computer and Information Sciences. Among other things, I did system and software support for them, and being about the same age and with many common interests, I socialized with them also.

There was one Chinese-Malaysian graduate student who I thought of as having poor English. But one day, reading one of his emailed support requests, I was struck by how clear and well-composed it was. I suddenly realized I had been wrong. His English was excellent. It was his pronunciation that was not so good. When speaking to him in person, this was all I had perceived. In email, his accent vanished and he spoke English like a well-educated native. When I next met him in person I paid more careful attention and I realized that, indeed, I had not seen past the surface: he spoke the way he wrote, but his accent had blinded me to his excellent grammar and diction.

Once I picked up on this, I started to notice better. There were many examples of the same phenomenon, and also the opposite phenomenon, where someone spoke poorly but I hadn't noticed because their pronunciation was good. But then they would send email and the veil would be lifted. This was even true of native speakers, who can get away with all sorts of mistakes because their pronunciation is so perfect. (I don't mean perfect in the sense of pronouncing things the way the book says you should; I mean in the sense of pronouncing things the way a native speaker does.) I didn't notice this unless I was making an effort to look for it.

I'm not sure I have anything else to say about this, except that it seems to me that when learning a foreign language, one ought to consider whether one will be using it primarily for speech or primarily for writing, and optimize one's study time accordingly. For speech, concentrate on good pronunciation; for writing, focus on grammar and diction. Hmm, put that way it seems obvious. Also, the sky is sometimes blue.

[Other articles in category /lang] permanent link

Mon, 10 Sep 2018
Why hooks and forks in the J language?
And I think I now recall that the name of the language itself, J, is intended to showcase the hook, so he must have thought it was pretty wonderful. A helpful Hacker News comment pointed me to the explanation. Here Iverson explains the “hook” feature: it is actually the S combinator in disguise. Recall that

$${\bf S} x y z = x z (y z).$$

This is exactly what J's hook computes when you write (f g) y.
As McBride and Paterson point out, S is also the same as the <*> operator at the function (Reader) instance of Applicative.
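In Haskell terms (a sketch; the name s is mine):

```haskell
-- The S combinator: S x y z = x z (y z).
s :: (a -> b -> c) -> (a -> b) -> a -> c
s x y z = x z (y z)

-- For functions, (<*>) from the Applicative instance of ((->) r)
-- does exactly the same thing: (f <*> g) z = f z (g z).
main :: IO ()
main = do
  print (s (+) (* 2) 5)        -- 5 + (5*2) = 15
  print (((+) <*> (* 2)) 5)    -- same: 15
```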
[Other articles in category /prog] permanent link
git log --follow enthusiastically tracks empty files
This bug I just found in git log --follow. I knew I'd written a draft of a blog article about the Watchmen movie, and I went to find out how long it had been sitting around:
The log stopped there, and the commit message says clearly that the article was moved from elsewhere, so I used git log --follow:

Okay, it was moved, with slight modifications, from

Okay, the previous month I added some text to it. Then I skipped to the bottom to see when it first appeared, and the bottom was completely weird, mentioning a series of completely unrelated articles:
(The complete output is available for your perusal.)

The log is showing unrelated files being moved to totally unrelated places. And also, the log messages do not seem to match up. “First chunk of linear regression article” should be on some commit that adds text to

My first thought was: when I imported my blog from CVS to Git, many years ago, I made a series of mistakes, and mismatched the log messages to the commits, or worse, and I might have to do it over again. Despair! But no, it turns out that

But if I do

This is easy to understand. The commit message was correct: the
I believe what happened here is this: In 2012 I “finally started article”. But I didn't create the file at that time. Rather, I had created the file in 2009 with the intention of putting something into it later:
This commit does appear in the

It appears that Git, having detected that

At this point it has gone completely off the rails, because it is now following the unrelated empty file

Commit ff398402 created the empty file

There is a

As far as I can tell there is no option to set an absolute threshold on when two files are considered the same by

The part I don't fully understand is how

The problem appears in Git 1.7.11, 2.7.4, and 2.13.0.

[ Addendum 20180912: A followup about my work on a fix for this. ]

[Other articles in category /prog] permanent link

Sun, 09 Sep 2018

I very recently suggested a mathematical operation that does this:

$$\begin{align} \left((\sqrt\bullet) \cdot x + \left(\frac1\bullet\right) \cdot 1 \right) ⊛ (9x+4) & = \sqrt9 x^2 + \sqrt4 x + \frac19 x + \frac14 \\ & = 3x^2 + \frac{19}{9} x + \frac 14 \end{align}$$

Here the left-hand argument is like a polynomial, except that the coefficients are functions. The right-hand argument is an ordinary polynomial.

It occurs to me that the APL programming language (invented around 1966) actually has something almost like this, in its generalized matrix product. In APL, if !!C!! is the matrix product of !!A!! and !!B!!, then

$$c_{ij} = a_{i1} \cdot b_{1j} + a_{i2} \cdot b_{2j} + \cdots + a_{in} \cdot b_{nj}.$$

The APL

With this feature, the ⊛ operator I proposed above would be something like
APL does have a

$$c_{11} = a_{11} \cdot b_{11} + a_{12} \cdot b_{21} + \cdots + a_{1n} \cdot b_{n1},$$

and similarly if !!a!! is !!n×1!! and !!b!! is !!1×m!! then !!a +.× b!! is the outer product, the !!n×m!! matrix whose !!c_{ij} = a_i × b_j!!. But I think APL doesn't distinguish between a !!1×n!! matrix and a vector, and always considers them to be vectors, so that in such cases !!a +.× b!! always gets you the dot product, if !!a!! and !!b!! are the same length, and an error otherwise. If you want the outer product of two vectors you use

I applied for an APL job once; I went to a job fair (late 1980s maybe?) and some Delaware bank was looking for APL programmers to help maintain their legacy APL software. I was quite excited at the idea of programming APL professionally, but I had no professional APL experience so they passed me over. I think they made a mistake, because there are not that many people with professional APL experience anyway, and how many twenty-year-olds are there who know APL and come knocking on your door looking for a job? But whatever, it's probably better that I didn't take that route.

I was pleased to find out that Iverson had designed a successor language, J, and then quickly disappointed when I saw how little it added. For example, it has an implicit “hook” construction, a special syntax in the language for handling one special case of function composition. In Haskell it would be:
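presumably something like this sketch (hook is my name):

```haskell
-- J's hook: (f g) y means y f (g y), i.e. the dyad f applied to
-- y on the left and (g y) on the right.
hook :: (a -> b -> c) -> (a -> b) -> a -> c
hook f g y = f y (g y)

-- e.g. hook (+) (* 2) 5  ==  5 + (5*2)  ==  15
```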
but in J the

[ Addendum 20180910: The explanation. ]

Meanwhile the awful APL notation has gotten much more awful in J, and you get little in return. You even lose all the fun of the little squiggles. Haskell is a much better J than J ever was. Haskell's notation can be pretty awful too (

I thought I'd see about implementing APL's

For a regular matrix product, !!C = AB!! means that !!c_{ij}!! is the dot product of the !!i!!th row of !!A!! and the !!j!!th column of !!B!!, so I implemented a dot product function:
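Presumably along these lines (a sketch; the names dot and inner are mine — inner is thrown in to show the APL-style generalization):

```haskell
-- Dot product of two vectors.
dot :: Num a => [a] -> [a] -> a
dot xs ys = sum (zipWith (*) xs ys)

-- APL's generalized inner product f.g, for comparison: combine
-- corresponding entries with g, then fold the results with f.
inner :: (c -> c -> c) -> (a -> b -> c) -> [a] -> [b] -> c
inner f g xs ys = foldr1 f (zipWith g xs ys)
-- dot behaves like inner (+) (*), except on empty vectors.
```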
OK, that was straightforward. The rows of !!A!! are right there, but we also need the columns from !!B!!, so here's a function to get those:
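Data.List already does the work here, so presumably (a sketch; cols is my name for it):

```haskell
import Data.List (transpose)

-- The columns of a matrix are the rows of its transpose.
cols :: [[a]] -> [[a]]
cols = transpose
```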
Also straightforward. After that I toiled for a very long time over the matrix product itself. My first idea was to turn !!A!! into a list of functions, each of which would dot-product one of the rows of !!A!! by a given vector. Then I would map each of these functions over the columns of !!B!!. Turning !!A!! into a list of functions was easy:
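Presumably via dot from above (a sketch; rowFns is my name):

```haskell
-- Each row of A becomes a function that dot-products a given
-- vector with that row.
rowFns :: Num a => [[a]] -> [[a] -> a]
rowFns a = map dot a
```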
and getting the columns of !!B!! I had already done:
and now I just need to apply each row of functions in the first part to each column in the second part and collect the results:
I don't know why this turned out to be so damn hard. This is the sort of thing that ought to be really, really easy in Haskell. But I had many difficulties. First I wasted a bunch of time trying to get

whereas

and I needed to keep that extra structure. I tried all sorts of tinkering with

Another part of the problem was that I didn't know any primitive for “map a list of functions over a single argument”. Although it's not hard to write, I had some trouble thinking about it after I wrote it:
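Something like this (a sketch; the name applyAll is mine):

```haskell
-- Map a list of functions over a single argument.
applyAll :: [a -> b] -> a -> [b]
applyAll fs x = map ($ x) fs
```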
Then the “map each function over each list of arguments” is
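perhaps this (a sketch, building on applyAll):

```haskell
-- Apply every function to every argument; each argument yields
-- one list of results.
mapApply :: [a -> b] -> [a] -> [[b]]
mapApply fs xs = map (applyAll fs) xs
```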
and this almost works, except it produces the columns of the results instead of the rows. There is an easy fix and a better fix. The easy fix is to just transpose the final result. I never did find the better fix. I thought I'd be able to replace

Anyway this did work:
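Reconstructing from the description, it presumably looked something like this (a sketch, using dot, cols, rowFns, and applyAll from above):

```haskell
-- Matrix product: apply each row-function of A to each column of
-- B, then transpose to put the results back into row order.
matrixProduct :: Num a => [[a]] -> [[a]] -> [[a]]
matrixProduct a b = transpose (map (applyAll (rowFns a)) (cols b))
```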
but that

So then I went down a rabbit hole and wrote nine more versions of

I finally settled on

The result was okay, but it took me so long to get there. Now I have

just use:

Except uh oh, that

I tinkered a bit with requiring a Monoid instance for the matrix entries, which seemed interesting at least, but to do that I would need to switch monoids in the middle of the computation and I didn't want to think about how to do that. So instead I wrote a version of

This fails on empty lists, which is just fine, since I wasn't planning on multiplying any empty matrices. Then I have the final answer:
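The exact final version isn't shown here; one plausible shape, consistent with the description (a sketch):

```haskell
-- One row of the product per row of A; one entry per column of B.
matrixProduct :: Num a => [[a]] -> [[a]] -> [[a]]
matrixProduct a b = [ [ dot row col | col <- cols b ] | row <- a ]
```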
It's nice and short, but on the other hand it has that mysterious

As for the shortness, let's see what it looks like in a more conventional language:
Wow, that was amazingly easy.
Okay, that was kind of a mess. The

I think the standard Python answer to this is that you don't need

I don't know how I feel about that argument in general but in this case the result was lovely. I have no complaints.

While I was writing the Python program I got a weird bug that turned out to be related to mutability: I had initialized

But this makes the rows of

A lot of the mess in the code is because Python is so obstinate about extending lists when you need them extended that you have to say pretty please every time. Maybe I can get rid of that by using more list comprehensions?
Python's list comprehensions usually make me long for Haskell's, which are so much nicer, but this time they were fine. Python totally wins here. No wait, that's not fair: maybe I should have been using list comprehensions in Haskell also?
Yeah, okay. All that

Well, lesson learned. I really wish I could write Haskell faster. In the mid-1990s I wrote thousands of lines of SML code and despite (or perhaps because of) SML's limitations I was usually able to get my programs to do what I wanted. But when I try to write programs in Haskell it takes me a really long time to get anywhere.

Apropos of nothing, today is the 77th birthday of Dennis M. Ritchie.

[ Addendum: It took me until now to realize that, after all that, the operation I wanted for polynomials is not matrix multiplication. Not at all! It is actually a convolution: $$ c_k = \sum_{i+j=k} a_ib_j $$ or, for my weird functional version, replace the multiplication !!a_ib_j!! with function composition !!a_i ∘ b_j!!. I may implement this later, for practice. And it's also tempting to try to do it in APL, even though that would most likely be a terrible waste of time… ]

[ Addendum 20180909: Vaibhav Sagar points out that my

[Other articles in category /prog] permanent link

Sat, 08 Sep 2018
Why I never finish my Haskell programs (part 2 of ∞)
Here's something else that often goes wrong when I am writing a Haskell program. It's related to the problem in the previous article but not the same. Let's say I'm building a module for managing polynomials. Say
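something like this (a sketch — the constructor name P is my guess):

```haskell
-- A polynomial is its list of coefficients.
newtype Poly a = P [a] deriving Show
```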
Now clearly this is going to be a functor, so I define the Functor instance, which is totally straightforward:
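Presumably (a sketch):

```haskell
instance Functor Poly where
  fmap f (P coefficients) = P (map f coefficients)
```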
Then I ask myself if it is also going to be an Applicative. Certainly the pure half presents no problem. But what about <*>? Its type here would be Poly (a -> b) -> Poly a -> Poly b.
The first argument there is a polynomial whose coefficients are functions. This is not something we normally deal with. That ought to be the end of the matter. But instead I pursue it just a little farther. Suppose we did have such an object. What would it mean to apply a functional polynomial and an ordinary polynomial? Do we apply the functions on the left to the coefficients on the right and then collect like terms? Say for example $$\begin{align} \left((\sqrt\bullet) \cdot x + \left(\frac1\bullet\right) \cdot 1 \right) ⊛ (9x+4) & = \sqrt9 x^2 + \sqrt4 x + \frac19 x + \frac14 \\ & = 3x^2 + \frac{19}{9} x + \frac 14 \end{align}$$ Well, this is kinda interesting. And it would mean that the
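⊛ operation is exactly what <*> ought to do. Here is a sketch of it on raw coefficient lists, constant term first (the name applyPoly is mine):

```haskell
-- ⊛: apply each coefficient-function on the left to each
-- coefficient on the right, collecting like terms, exactly as in
-- the example above.
applyPoly :: Num b => [b -> b] -> [b] -> [b]
applyPoly fs bs =
  [ sum [ f b | (i, f) <- zip [0 ..] fs
              , (j, b) <- zip [0 ..] bs
              , i + j == k ]
  | k <- [0 .. length fs + length bs - 2] ]

-- ((√•)·x + (1/•)·1) ⊛ (9x + 4):
main :: IO ()
main = print (applyPoly [recip, sqrt] [4, 9])
-- [0.25, 2.111..., 3.0], i.e. 3x² + (19/9)x + 1/4
```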
Then the ⊛ can be understood to be just like polynomial multiplication, except that coefficients are combined with function composition instead of with multiplication. The operation is associative, as one would hope and expect, and even though the ⊛ operation is not commutative, it has a two-sided identity element, which is

This is different from the failure mode of the previous article because in that example I was going down a Haskell rabbit hole of more and more unnecessary programming. This time the programming is all trivial. Instead, I've discovered a new kind of mathematical operation and I abandon the programming entirely and go off chasing a mathematical wild goose.

[ Addendum 20181109: Another one of these. ]

[Other articles in category /prog/haskell] permanent link

Mon, 03 Sep 2018
Why I never finish my Haskell programs (part 1 of ∞)
Whenever I try to program in Haskell, the same thing always goes wrong. Here is an example. I am writing a module to operate on polynomials. The polynomial !!x^3 - 3x + 1!! is represented as
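a list of coefficients, constant term first (my reconstruction; the addendum below confirms the !!x^3!! term comes last):

```haskell
-- x³ − 3x + 1; lowest-degree coefficient first.
poly :: [Int]
poly = [1, -3, 0, 1]
```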
[ Addendum 20180904: This is not an error. The !!x^3!! term is last, not first. Much easier that way. Fun fact: two separate people on Reddit both commented that I was a dummy for not doing it the easy way, which is the way I did do it. Fuckin' Reddit, man. ]

I want to add two polynomials. To do this I just add the corresponding coefficients, so it's just
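zipWith, more or less (a sketch):

```haskell
addPoly :: Num a => [a] -> [a] -> [a]
addPoly = zipWith (+)
```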
Except no, that's wrong, because it stops too soon. When the lists are different lengths, zipWith stops at the end of the shorter one and discards the leftover coefficients of the longer one.
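What I actually want keeps the leftovers (a sketch):

```haskell
addPoly :: Num a => [a] -> [a] -> [a]
addPoly (x:xs) (y:ys) = (x + y) : addPoly xs ys
addPoly xs     []     = xs
addPoly []     ys     = ys
```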
and I can write this off the top of my head. But do I? No, this is where things go off the rails. “I ought to be able to generalize this,” I say. “I can define a function like
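this, perhaps (a sketch; the name zipWithPad is mine):

```haskell
-- zipWith, but pad the shorter list with mempty instead of
-- stopping early.
zipWithPad :: (Monoid a, Monoid b) => (a -> b -> c) -> [a] -> [b] -> [c]
zipWithPad f (x:xs) (y:ys) = f x y : zipWithPad f xs ys
zipWithPad f xs     []     = map (\x -> f x mempty) xs
zipWithPad f []     ys     = map (f mempty)         ys
```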
as long as there is a suitable Monoid instance for the element types.”

I could write

So do I write

Then I open a new file and start writing
And I go farther and farther down the rabbit hole and I never come back to what I was actually working on. Maybe the next step in this descent into madness is that I start thinking about how to perform unification of arbitrary algebraic data structures,

I abandon

Actually when I try to program in Haskell there are a lot of things that go wrong and this is only one of them, but it seems like this one might be more amenable to a quick fix than some of the other things.

[ Addendum 20180904: A lobste.rs
user points out that I don't need Monoid, but only Semigroup, since I don't need mempty. ]

[ Addendum 20181109: More articles in this series: [2] [3] ]

[Other articles in category /prog/haskell] permanent link