The Universe of Discourse


Sat, 26 Mar 2022

U.S. surnames with no vowels

While writing the recent article about Devika Icecreamwala (born Patel) I acquired the list of most common U.S. surnames. (“Patel” is 95th most common; there are about 230,000 of them.) Once I had the data I ran various queries on it, and one of the things I looked for was names with no vowels. Here are the results:

name rank count
NG 1125 31210
VLK 68547 287
SMRZ 91981 200
SRP 104156 172
SRB 129825 131
KRC 149395 110
SMRT 160975 100
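
A query like this is easy to sketch. The C function below is only an illustration, not the query actually used: the name `has_no_vowels` is hypothetical, and counting Y as a vowel is an assumption.

```c
#include <ctype.h>
#include <stdbool.h>

/* Return true if name contains none of A, E, I, O, U, Y (case-insensitive).
   Counting Y as a vowel is an assumption made for this sketch. */
bool has_no_vowels(const char *name)
{
    for (const char *p = name; *p != '\0'; p++) {
        switch (toupper((unsigned char)*p)) {
        case 'A': case 'E': case 'I': case 'O': case 'U': case 'Y':
            return false;
        }
    }
    return true;
}
```

Applied to each name in the surname list, it keeps NG and VLK and rejects PATEL.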

It is no surprise that Ng is by far the most common. It's an English transcription of the Cantonese pronunciation of 吳, which is one of the most common names in the world. 吳 belongs to at least twenty-seven million people. Its Mandarin pronunciation is Wu, which itself is twice as common in the U.S. as Ng.

I suspect the others are all Czech. Vlk definitely is; it's Czech for “wolf”. (Check out the footer of the Vlk page for eighty other common names that all mean “wolf”, including Farkas, López, Lovato, Lowell, Ochoa, Phelan, and Vuković.)

Similarly Smrz is common enough that Wikipedia has a page about it. In Czech it was originally Smrž, and Wikipedia mentions Jakub Smrž, a Czech motorcycle racer. In the U.S. the confusing háček is dropped from the z and one is left with just Smrz.

The next two are Srp and Srb. Here it's a little harder to guess. Srb means a Serbian person in several Slavic languages, including Czech, and it's not hard to imagine that it is a Czech toponym for a family from Serbia. (Srb is also the Serbian word for a Serbian person, but an immigrant to the U.S. named Srb, coming from Czechia, might fill out the immigration form with “Srb” and might end up with their name spelled that way, whereas a Serbian with that name would write the unintelligible Срб and would probably end up with something more like Serb.) There's also a town in Croatia with the name Srb, and the surname could mean someone from that town.

I'm not sure whether Srp is similar. The Serbian-language word for the Serbian language itself is Srpski (српски), but srp is also Slavic for “sickle” and appears in quite a few Slavic agricultural-related names such as Sierpiński. (It's also the name for the harvest month of August.)

Next is Krc. I guessed maybe this was Czech for “church” but it seems that that is kostel. There is a town south of Prague named Krč and maybe Krc is the háčekless American spelling of the name of a person whose ancestors came from there.

Last is Smrt. Wikipedia has an article about Thomas J. Smrt but it doesn't say whether his ancestry was Czech. I had a brief fantasy that maybe some of the many people named Smart came from Czech families originally named Smrt, but I didn't find any evidence that this ever happened; all the Smarts seem to be British. Oh well.

[ Bonus trivia: smrt is the Czech word for “death”, which we also meet in the name of James Bond's antagonist SMERSH. SMERSH was a real organization, its name a combination of смерть (/smiert/, “death”) and шпио́нам (/shpiónam/, “to spies”). Шпио́нам, incidentally, is borrowed from the French espion, and ultimately akin to English spy itself. ]

[ Addenda 20220327: Thanks to several readers who wrote to mention that Smrž is a morel and Krč is (or was) a stump or a block of wood, I suppose analogous to the common German name Stock. Petr Mánek corrected my spelling of háček and also directed me to KdeJsme.cz, a web site providing information about Czech surnames. Finally, although Smrt is not actually a shortened form of Smart I leave you with this consolation prize. ]


[Other articles in category /lang/etym] permanent link

Best occupational name ever?

For a long while I've been planning an article about occupational surnames, but it's not ready and this is too delightful to wait.

There is, in California, a dermatologist named:

Dr. Devika Icecreamwala, M.D.

Awesome.

“Wala” is an Indian-language suffix that indicates a person who deals in, transports, or otherwise has something to do with the suffixed thing. You may have heard of the famous dabbawalas of Mumbai. A dabba is a lunchbox, and in Mumbai thousands of dabbawalas supply workers with the hot lunches that were cooked fresh in the workers’ own homes that morning.

Similarly, an icecreamwala is an ice cream vendor. Apparently there was some point during the British Raj that the Brits went around handing out occupational surnames, and at least one ice cream wala received the name Icecreamwala.

It is delightful enough that Dr. Icecreamwala exists, but the story gets better. Icecreamwala is her married name. She was born Devika Patel. Some people might have stuck with Patel, preferring the common and nondescript to the rare and wonderful. Not Dr. Icecreamwala! She not only changed her name, she embraced the new one. Her practice is called Icecreamwala Dermatology and their internet domain is icecreamderm.com.

Ozy Brennan recently considered the problem of which parent's surname to give to the children, and suggested that they choose whichever is coolest. Dr. Icecreamwala appears to be in agreement.

[ Addendum 20220328: Vaibhav Sagar informs me that Icecreamwala is probably rendered in Hindi as आइसक्रीमवाला. ]


[Other articles in category /misc] permanent link

Fri, 25 Mar 2022

My horse Pongo

I tried playing Red Dead Redemption 2 last week. I was a bit disappointed because I was hoping for Old West Skyrim but it's actually Old West GTA. I'm not sure how long I will continue.

Anyway, I acquired a new horse and was prompted to name it. My first try, “Pongo”, was rejected by the profanity filter. Puzzled, I supposed I had mistyped and included a ZWNJ or something. No, it was rejecting “Pongo”.

The only meaning I know for “Pongo” is that it is the name of the daddy dog in 101 Dalmatians. So I asked the Goog. The Goog shrugged and told me that was the only Pongo it knew also.

Steeling myself, I asked Urban Dictionary, preparing to learn that Pongo was obscene, racist, or probably both. Urban Dictionary told me that “Pongo” is 1900-era Brit slang for a soldier. (Which I suppose explains its appearance as the name of the dog.) Nothing obscene or racist.

I'm stumped. I forget what I ended up naming the horse.


[Other articles in category /lang] permanent link

Mon, 14 Mar 2022

There is a Unix error device

Yesterday I discussed /dev/full and asked why there wasn't a generalization of it, and laid out some very 1990s suggestions that I have had in the back of my mind since the 1990s. I ended by acknowledging that there was probably a more modern solution in user space:

Eh, probably the right solution these days is to LD_PRELOAD a complete mock filesystem library that has any hooks you want in it.

Carl Witty suggested that there is a more modern solution in userspace, FUSE, and Leah Neukirchen filled in the details:

UnreliableFS is a FUSE-based fault injection filesystem that allows to change fault-injections in runtime using simple configuration file.

Also, Dave Vasilevsky suggested that something like this could be done with the device mapper.

I think the real takeaway from this is that I had not accepted the hard truth that all Unix is Linux now, and non-Linux Unix is dead.

Thanks everyone who sent suggestions.

[ Addendum: Leah Neukirchen informs me that FUSE also runs on FreeBSD, OpenBSD and macOS, and reminds me that there are a great many MacOS systems. I should face the hard truth that my knowledge of Unix systems capabilities is at least fifteen years out of date. ]


[Other articles in category /Unix] permanent link

Sun, 13 Mar 2022

Why no Unix error device?

Suppose you're writing some program that does file I/O. You'd like to include a unit test to make sure it properly handles the error when the disk fills up and the write can't complete. This is tough to simulate. The test itself obviously can't (or at least shouldn't) actually fill the disk.

A while back some Unix systems introduced a device called /dev/full. Reading from /dev/full returns zero bytes, just like /dev/zero. But all attempts to write to /dev/full fail with ENOSPC, the system error that indicates a full disk. You can set up your tests to try to write to /dev/full and make sure they fail gracefully.
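
As a minimal sketch of such a test helper, assuming a system that provides /dev/full (Linux does), something like the following reports which error a write produced. The name `try_write` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open path for writing and try to write len bytes from buf.
   Returns 0 on success, or the errno value of whichever call failed. */
int try_write(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;

    int err = 0;
    if (write(fd, buf, len) < 0)
        err = errno;          /* save it before close() can disturb errno */

    close(fd);
    return err;
}
```

A unit test would then check that `try_write("/dev/full", "x", 1)` returns ENOSPC, and that the code under test copes with the same failure.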

That's fun, but why not generalize it? Suppose there was a /dev/error device:

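#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/errdev.h>   /* hypothetical header that would define ERRDEV_SET */

int error = open("/dev/error", O_RDWR);

ioctl(error, ERRDEV_SET, 23);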

The device driver would remember the number 23 from this ioctl call, and the next time the process tried to read or write the error descriptor, the request would fail and set errno to 23, whatever that is. Of course you wouldn't hardwire the 23, you'd actually do

#include <errno.h>

ioctl(error, ERRDEV_SET, EBUSY);

and then the next I/O attempt would fail with EBUSY.

Well, that's the way I always imagined it, but now that I think about it a little more, you don't need this to be a device driver. It would be better if instead of an ioctl it was an fcntl that you could do on any file descriptor at all.
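
Lacking kernel support, the per-descriptor version can at least be approximated in user space by routing writes through a wrapper. This is only a sketch under stated assumptions: `fi_set_error`, `fi_write`, and `FI_MAX_FD` are hypothetical names, the saved error is one-shot (it fires on the next write only), and nothing here is thread-safe.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define FI_MAX_FD 1024              /* arbitrary table size for this sketch */

static int fi_pending[FI_MAX_FD];   /* 0 means no error is armed */

/* Arm descriptor fd so that the next fi_write on it fails with err. */
int fi_set_error(int fd, int err)
{
    if (fd < 0 || fd >= FI_MAX_FD) {
        errno = EBADF;
        return -1;
    }
    fi_pending[fd] = err;
    return 0;
}

/* Like write(2), except that an armed error is reported once, then cleared. */
ssize_t fi_write(int fd, const void *buf, size_t len)
{
    if (fd >= 0 && fd < FI_MAX_FD && fi_pending[fd] != 0) {
        errno = fi_pending[fd];
        fi_pending[fd] = 0;
        return -1;
    }
    return write(fd, buf, len);
}
```

After `fi_set_error(fd, EBUSY)`, the next `fi_write(fd, ...)` returns -1 with errno set to EBUSY, and the one after that goes through to the real write. The obvious limitation is that only code that calls the wrapper sees the injected failure.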

Big drawback: the most common I/O errors are probably EACCES and ENOENT, failures in the open, not in the actual I/O. This idea doesn't address that at all. But maybe some variation would work there. Maybe for those we go back to the original idea, have a /dev/openerror, and after you do ioctl(dev_openerror, ERRDEV_SET, EACCES), the next call to open fails with EACCES. That might be useful.

There are some security concerns with the fcntl version of the idea. Suppose I write a malicious program that opens some file descriptor, dups it to standard input, does fcntl(0, ERRDEV_SET, ESOMEWEIRDERROR), then execs the target program t. Hapless t tries to read standard input, gets ESOMEWEIRDERROR, and then does something unexpected that it wasn't supposed to do. This particular attack is easily foiled: exec should reset all the file descriptor saved-error states. But there might be something more subtle that I haven't thought of, and in OS security there usually is.

Eh, probably the right solution these days is to LD_PRELOAD a complete mock filesystem library that has any hooks you want in it. I don't know what the security implications of LD_PRELOAD are but I have to believe that someone figured them all out by now.

[ Addendum 20220314: Better solutions exist. ]


[Other articles in category /Unix] permanent link

Wed, 09 Mar 2022

Bad but interesting mathematical notation idea

Zaz Brown showed up on Math SE yesterday with a proposal to make mathematical notation more uniform. It's been pointed out several times that the expressions

$$y^n = x \qquad n = \log_y x \qquad y=\sqrt[n]x $$

all mean the same thing, and yet look completely different. This has led to proposals to try to unify the three notations, although none has gone anywhere. (For example, this Math SE thread.)

!!\def\o{\overline}\def\u{\underline}!!

In this new thread, M. Brown has an interesting observation: exponentiation also unifies addition and multiplication. So write !!\o x!! to mean !!e^x!!, and !!\u x!! to mean !!\ln x!!, and leave multiplication as it is. Now !!x^y!! can be written as !!\o{\u x y}!! and !!x+y!! can be written as !!\u{\bar x \! \bar y}!!.
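
To see why these two translations work, one can unfold the bars using only the definitions just given. This is merely a check in standard notation, assuming !!x > 0!! in the first line so that !!\ln x!! is defined:

$$\begin{align}
\o{\u x y} &= e^{(\ln x)\, y} = \left(e^{\ln x}\right)^y = x^y, \\
\u{\o x\, \o y} &= \ln\left(e^x e^y\right) = \ln e^{x+y} = x+y.
\end{align}$$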

Well, this is a terrible idea, and I'll explain why I think so in some detail. But I really hope nobody will think I mean this as any sort of criticism of its author. I have a lot of ideas too, and most of them are amazingly bad, way worse than this one. Having bad ideas doesn't make someone a bad person. And just because an idea is bad, doesn't mean it wasn't worth considering; thinking about ideas is how you decide which ones are bad and which aren't. M. Brown's idea was interesting enough for me to think about it and write an article. That's a compliment, not a criticism.

I'm deeply interested in notation. I think mathematicians don't yet understand the power of mathematical notation and what it does. We use it, but we don't understand it. I've observed before that you can solve algebraic equations or calculus problems just by “pushing around the symbols”. But why can you do that? Where is the meaning, and how do the symbols capture the meaning? How does that work? The fact that symbols in general can somehow convey meaning is a deep philosophical mystery, not just in mathematics but in all communication, and nobody understands how it works. Mathematical symbols can be even more amazing: they don't just tell you what other people were thinking, they tell you things themselves. You rearrange them in a certain way and they smile and whisper secrets: “now you can see this function is everywhere zero”, “this is evidently unbounded” or “the result is undefined when !!\lvert x_1\rvert > \frac 23!!”. It's almost as if the symbols are doing some of the thinking for you.

Anyway this particular idea is not good, but maybe we can learn something from its failure modes?

Here's how you would write !!x^2+x!!: $$\u{\o{\o{2\u x}}{\o x}}$$

Zaz Brown suggested that this expression might be better written as !!x{\u{\o x \o 1}}!!, which is analogous to !!x(x+1)!!, but I think that reply misses a very important point: you need to be able to write both expressions so that you can equate them, or transform one into the other. The expression !!x(x+1)!! is useful because you can see at a glance that it is composite for all integer !!x!! larger than 1, and actually twice a composite for sufficiently large !!x!!. (This is the kind of thing I had in mind when I said the symbols whisper secrets to you.) !!x^2+x!! is useful in different ways: you can see that it's !!\Theta(x^2)!! and it's !!(x+1)^2 - (x+1)!! and so on. Both are useful and you need to be able to turn one into the other easily. Good notation facilitates that sort of conversion.

M. Brown's proposal actually has at least two components. One component is its choice of multiplication, exponentials and logarithms as the only first-class citizens. The other is the specific way that was chosen to write these, with the over- and underbars. This second component is no good at all, for purely typographic reasons. These three expressions look almost identical but have completely different meanings: $$ \u{\o a\, \o c}\qquad \u{\o { ac}} \qquad \o{\u a\, \u c}.$$

In fact, the two on the right were almost indistinguishable until I told MathJax to put in some extra space. I'm sure you can imagine similar problems with !!\u{\o{\o{2\u x}}{\o x}}!! turning into !!\u{\o{\o{2\u x x}}}!! or !!\u{\o{\o{2\u x }x}}!! or whatever. Think of how easy it is to drop a minus sign; this is much worse.

[ Addendum 20220308: Earlier, I had said that !!x+y!! could be written as !!\u{\bar x\bar y}!!. A Gentle Reader pointed out that the bar on the bottom wasn't connected but should have been, as on the far right of this screenshot:

Screenshot of blog text “x+y can be written as (xy) (xy)” where in each case both the x and the y have overbars, and the whole thing has an underbar, except that on the right the underbar has a tiny break, and on the left the x and y have been squished together uncomfortably to eliminate the break in the underbar.

I meant it to be connected and what I wrote asked for it to be connected, but MathJax, which formats the math formulas on the blog, didn't connect it. To remove the gap, I had to explicitly subtract space between the !!x!! and the !!y!!. ]

But maybe the other component of the proposal has something to it and we will find out what it is if we fix the typographic problem with the bars. What's a good alternative?

Maybe !!\o x = x^\bullet!! and !!\u x = x_\bullet!! ? On the one hand we get the nice property that !!x^\bullet_\bullet = x!!. But I think the dots would make my head swim. Perhaps !!\o x = x\top!! and !!\u x = x\bot!!? Let's try.

Good notation facilitates transformation of expressions into equal expressions. The !!\top\bot!! notation allows us to easily express the simple identities $$a\top\bot \quad = \quad a\bot\top \quad = \quad a.$$ That kind of thing is good, although the dots did it better. But I couldn't find anything else like it.

Let's see what the distributive law looks like. In standard notation it is $$a(b+c) = ab + ac.$$ In the original bar notation it was $$a\u{\o b\o c} = \u{\o{ab}\, \o{ac}}.$$ This looks uncouth but perhaps would not be worse once one got used to it.

With the !!\top\bot!! idea we have

$$ a(b\top c\top)\bot = ((ab)\top(ac)\top)\bot. $$
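
Translating each side back into standard notation, using only !!x\top = e^x!! and !!x\bot = \ln x!!, confirms that this is the same identity:

$$\begin{align}
a(b\top c\top)\bot &= a\ln\left(e^b e^c\right) = a(b+c), \\
((ab)\top(ac)\top)\bot &= \ln\left(e^{ab} e^{ac}\right) = ab+ac.
\end{align}$$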

I had been hoping that by making the !!\top!! and !!\bot!! symbols postfix we'd be able to avoid parentheses. That didn't happen: without the parentheses you can't distinguish between !!(ab)\top!! and !!a(b\top)!!. Postfix notation is famous for allowing you to omit parentheses, but that's only if your operators all have fixed arity. Here the invisible variadic multiplication ruins that. And making it visible dyadic multiplication is not really an improvement:

$$ ab\top c\top\cdot\bot\cdot = ab\cdot\top ac\cdot \top\cdot \bot. $$

You know what I think would happen if we actually tried to use this idea? Someone would very quickly invent an abbreviation for !!\u{\o {x_1}\, \o {x_2} \cdots \o{x_k}}!!, I don't know, something like “!!x_1 + x_2 + \ldots + x_k!!” maybe. (It looks crazy, I know, but it might just work.) Because people might like to discuss the fact that $$ \u{\o 2\, \o 3 } = 5$$ and without an addition sign there seems to be no way to explain why this should be.

Well, I have been turning away from the real issue for a while now, but !!a(b\top c\top)\bot = !! !!((ab)\top(ac)\top)\bot!! forces me to confront it. The standard expression of the distributive law equates a computation with two operations and another with three. The computations expressed by the new notation involve five and six operations respectively. Put this way, the distributive law is no longer simple!

This reminds me of the earlier suggestion that if !!x^2+x!! is too complicated, one can write !!x(x+1)!! instead. But expressions don't only express a result, they express a way of arriving at that result. The purpose of an equation is to state that two different computations arrive at the same result. Yes, it's true that $$a+b = \ln e^ae^b,$$ but the two computations are not the same! If they were, the statement would be vacuous. Instead, it says that the simple computation on the left arrives at the same result as the complicated one on the right, an interesting thing to know. “!!2+3=5!!” might imply that !!e^2\cdot e^3=e^5!! but it doesn't say the same thing.

Here's my takeaway from consideration of the Zaz Brown proposal:

It's not sufficient for a system of notation to have a way of expressing every result; it has to be able to express every possible computation.

Put that way, other instructive examples come to mind. Consider Egyptian fractions. It's known that every rational number between !!0!! and !!1!! can be written in the form $$\frac1{a_1} + \frac1{a_2} + \ldots + \frac1{a_n}$$ where !!\{ a_i\}!! is a strictly increasing sequence of positive integers. For example $$\frac 7{23} = \frac 14 + \frac1{19} + \frac1{583} + \frac1{1019084}$$ or with a bit more ingenuity, $$\frac7{23} = \frac16 + \frac1{12} + \frac1{23} + \frac1{138} + \frac1{276},$$ longer but less messy. The ancient Egyptians did in fact write numbers this way, and when they wanted to calculate !!2\cdot\frac17!!, they had to look it up in a table, because !!\frac27!! was not an expressible computation; it had to be expressed in terms of reciprocals and sums, so !!2\cdot\frac 17 = \frac14 + \frac1{28}!!. They could write all the numbers, but they couldn't write all the ways of making the numbers.
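
The first expansion of !!\frac7{23}!! above is what the greedy method produces: repeatedly subtract the largest unit fraction that fits. The text doesn't say how that expansion was found, so treat the connection as an observation, and the code below as a sketch. The name `egyptian_greedy` is hypothetical, and there is no overflow checking, so it is suitable for small inputs only.

```c
#include <stddef.h>

/* Greedy expansion of num/den (0 < num < den) into distinct unit fractions.
   Stores up to max denominators in out and returns how many were stored.
   No overflow checking: a sketch for small inputs only. */
size_t egyptian_greedy(unsigned long long num, unsigned long long den,
                       unsigned long long *out, size_t max)
{
    size_t n = 0;
    while (num != 0 && n < max) {
        /* Smallest d with 1/d <= num/den, that is, d = ceil(den/num). */
        unsigned long long d = (den + num - 1) / num;
        out[n++] = d;
        /* num/den - 1/d = (num*d - den) / (den*d) */
        num = num * d - den;
        den = den * d;
    }
    return n;
}
```

Called with 7 and 23 it yields the denominators 4, 19, 583, 1019084, and with 2 and 7 it yields 4 and 28, matching the table entry for !!2\cdot\frac17!!. The shorter-denominator expansion with 6, 12, 23, 138, 276 is not something this method finds.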

(Neither can we. We can write the real root of !!x^3-2!! as !!\sqrt[3]2!!, but there is no effective notation for the real root of !!x^5-x-1!!. The best we can do is something like “!!1.16730\ldots!!”, which is even less effective than how the Egyptians had to write !!\frac27!! as !!\frac14+\frac1{28}!!.)
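
Decimal approximations like the one quoted above come from numerical root-finding. As a minimal sketch (the names `bisect` and `cubic` are hypothetical, and bisection is just one method among many), here it is applied to !!x^3-2!!, where the answer can be checked against !!\sqrt[3]2!!:

```c
/* Find a root of f in [lo, hi] by bisection.
   Assumes f is continuous and f(lo), f(hi) have opposite signs. */
double bisect(double (*f)(double), double lo, double hi, int steps)
{
    for (int i = 0; i < steps; i++) {
        double mid = lo + (hi - lo) / 2.0;
        /* Keep the half-interval on which f still changes sign. */
        if ((f(lo) < 0.0) == (f(mid) < 0.0))
            lo = mid;
        else
            hi = mid;
    }
    return lo + (hi - lo) / 2.0;
}

/* Example polynomial from the text: x^3 - 2. */
double cubic(double x) { return x * x * x - 2.0; }
```

With `bisect(cubic, 1.0, 2.0, 60)` the result agrees with the cube root of 2 to many decimal places. The point stands, though: the output is a string of digits, not a notation one can compute with symbolically.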

Anyway I think my conclusion from all this is that a practical mathematical notation really must have a symbol for addition, which is not at all surprising. But it was fun and interesting to see what happened without it. It didn't work well, but maybe the next idea will be better.

Thanks again, Zaz Brown.


[Other articles in category /math/se] permanent link