| The Universe of Discourse | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
12 recent entries Archive:
In this section:
Comments disabled |
Sat, 12 Jul 2008
runN revisited
Check it out. Thank you, M. Crane.
[Other articles in category /prog] permanent link
Another useful utility
#!/usr/bin/perl
my $field = shift or usage();
$field -= 1 if $field > 0;
$|=1;
while (<>) {
chomp;
my @f = split;
print $f[$field], "\n";
}
sub usage {
print STDERR "$0 fieldnumber\n";
exit 1;
}
I got tired of writing awk '{print $11}' when I wanted to
extract the 11th field of some stream of data in a Unix pipeline,
which is something I do about six thousand times a day. So
I wrote this tiny thing. It was probably the most useful piece of
software I wrote in that calendar year, and as you can see from the
length, it certainly had the best cost-to-benefit ratio. I use it
every day.The point here is that you can replace awk '{print $11}' with just f 11. For example, f 11 access_log finds out the referrer URLs from my Apache httpd log. I also frequently use f -1, which prints the last field in each line. ls -l | grep '^l' | f -1 prints out the targets of all the symbolic links in the current directory. Programs like this won't win me any prizes, but they certainly are useful. Anyway, today's post was inspired by another similarly tiny utility that I expect will be similarly useful that I just finished. It's called runN:
#!/usr/bin/perl
use Getopt::Std;
my %opt;
getopts('r:n:c:v', \%opt) or usage();
$opt{n} or usage();
$opt{c} or usage();
@ARGV = shuffle(@ARGV) if $opt{r};
my $N = $opt{n};
my %pid;
while (@ARGV) {
if (keys(%pid) < $N) {
$pid{spawn($opt{c}, split /\s+/, shift @ARGV)} = 1;
} else {
delete $pid{wait()};
}
}
1 while wait() >= 0;
sub spawn {
my $pid = fork;
die "fork: $!" unless defined $pid;
return $pid if $pid;
exec @_;
die "exec: $!";
}
You can tell I just finished it because the shuffle() and
usage() functions are unimplemented.The idea is that you execute the program like this:
runN -n 3 -c foo arg1 arg2 arg3 arg4...and it runs the commands foo arg1, foo arg2, foo arg3, foo arg4, etc., simultaneously, but with no more than 3 running at a time. The -n option says how many commands to run simultaneously; after running that many the main control waits until one has exited before starting another. If I had implemented shuffle(), then -r would run the commands in random order, instead of in the order specified. Probably I should get rid of -c and just have the program take the first argument as the command name, so that the invocation above would become runN -n 3 foo arg1 arg2 arg3 arg4.... The -v flag, had I implemented it, would put the program into verbose mode. I find that it's best to defer the implementation of features like -r and -v until I actually need them, which might be never. In the past I've done post-analyses of the contents of ~mjd/bin, and what I found was that my tendency was to implement a lot more features than I needed or used. In the original implementation, the -n is mandatory, because I couldn't immediately think of a reasonable default. The only obvious choice is 1, but since the point of the program was to run programs concurrently, 1 is not reasonable. But it occurs to me now that if I let -n default to 1, then this command would replace many of my current invocations of:
for i in ...; do cmd $i donewhich I do quite a lot. Typing runN cmd ... would be a lot quicker and easier. As I've written before, when a feature you put in turns out to have unanticipated uses, it's a sign of a good, modular design. The code itself makes me happy for two reasons. One is that the program worked properly on the first try, which does not happen very often for me. When I was in elementary school, my teachers always complained that although I was very bright, I made a lot of careless mistakes because I was not methodical enough. They tried hard to fix this personality flaw. They did not succeed. The other thing I like about the code is that it's so very brief. Not to say that it is any briefer than it should be; I think it's just about perfect. One of the recurring themes of my study of programming for the last few years is that beginner programmers use way more code than is necessary, just like beginning writers use way too many words. The process and concurrency management turned out to be a lot easier than I thought they would be: the default Unix behavior was just exactly what I needed. I am particularly pleased with delete $pid{wait()}. Sometimes these things just come together. The 1 while wait() >= 0 line is a non-obfuscated version of something I wrote in my prize-winning obfuscated program, of all places. Sometimes the line between the sublime and the ridiculous is very fine indeed. Despite my wariness of adding unnecessary features, there is at least one that I will put in before I deploy this to ~mjd/bin and start using it. I'll implement usage(), since experience has shown that I tend to forget how to invoke these things, and reading the usage message is a quicker way to figure it out than is rereading the source code. In the past, usage messages have been good investments. I'm tempted to replace the cut-rate use of split here with something more robust. The problem I foresee is that I might want to run a command with an argument that contains a space. Consider:
runN -n 2 -c ls foo bar "-l baz"This runs ls foo, then ls bar, then ls -l baz. Without the split() or something like it, the third command would be equivalent to ls "-l baz" and would fail with something like -l baz: no such file or directory. (Actually it tries to interpret the space as an option flag, and fails for that reason instead.) So I put the split in to enable this usage. (Maybe this was a you-ain't-gonna-need-it moment; I'm not sure.) But this design makes it difficult or impossible to apply the command to an argument with a space in it. Suppose I'm trying to do ls on three directories, one of which is called old stuff. The natural thing to try is:
runN -n 2 -c ls foo bar "old stuff"But the third command turns into ls old stuff and produces:
ls: old: No such file or directory ls: stuff: No such file or directoryIf the split() were omitted, it would just work, but then the ls -l baz example above would fail. If the split() were replaced by the correct logic, I would be able to get what I wanted by writing something like this:
runN -n 2 -c ls foo bar "'old stuff'"But as it is this just produces another error:
ls: 'old: No such file or directory ls: stuff': No such file or directoryPerl comes standard with a library called ShellWords that is probably close to what I want here. I didn't use it because I wasn't sure I'd actually need it—only time will tell—and because shell parsing is very complicated and error-prone, more so when it is done synthetically rather than by the shell, and even more so when it is done multiple times; you end up with horrible monstrosities like this:
s='q=`echo "$s" | sed -e '"'"'s/'"'"'"'"'"'"'"'"'/'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'/g'"'"'`; echo "s='"'"'"$q"'"'"'"; echo $s' q=`echo "$s" | sed -e 's/'"'"'/'"'"'"'"'"'"'"'"'/g'`; echo "s='"$q"'"; echo $sSo my fear was that by introducing a double set of shell-like interpretation, I'd be opening a horrible can of escape character worms and weird errors, and my hope was that if I ignored the issue the problems might be simpler, and might never arise in practice. We'll see. [ Addendum 20080712: Aaron Crane wrote a thoughtful followup. Thank you, M. Crane. ]
[Other articles in category /prog] permanent link
Period three and chaos
A Möbius function is simply a function of the form f : x → (ax + b) / (cx + d) for some constants a, b, c, and d. Möbius functions are of major importance in complex analysis, where they correspond to certain transformations of the Riemann sphere, but I'm mostly looking at the behavior of Möbius functions on the reals, and so restricting a, b, c, and d to be real.
One nice thing about the Möbius functions is that you can identify the
Möbius function f : x → (ax + b) /
(cx + d) with the matrix
The matrices are not quite identical with the Möbius functions,
because the matrix You can also consider various subgroups of PGL(2), such as the subgroup that leaves the set {0, 1, ∞, -1} fixed. The reciprocal function x → 1/x is one such; it leaves 1 and -1 fixed and exchanges 0 and ∞. In general a Möbius function has three degrees of freedom, since you can choose the four constants a, b, c, and d however you like, but one degree of freedom is removed because of the equivalence relation—or, to look at it another way, you get to pick b/a, c/a, and d/a however you like. So in general you can pick any p, q, and r and find the unique Möbius function m with m(0) = p, m(1) = q, m(-1) = r. These then determine m(∞), which turns out to be (4qr - 2p(q+r))/(q + r - 2p) when that is defined. And sometimes even when it isn't.
You may be worrying about the infinities here, but it's really nothing
much to worry about. f(∞) is nothing more than If (4qr - 2p(q+r))/(q + r - 2p) in the presence of infinities worries you, try a few examples. For instance, consider m : x → x+1. This function has p = m(0) = 1, q = m(1) = 2, r = m(-1) = 0. Plugging into the formula, we get m(∞) = -2pq/(q - 2p) = -4 / (2-2) = -4/0 = ∞, which is just right. The only other thing you have to remember is that +∞ = -∞, because we're really living on the Riemann sphere. Or rather, we're living on the real part of the Riemann sphere, but either way there's only one ∞. We might call this space the "Riemann circle", but I've never heard it called that. And neither has Google, although it did turn up a bulletin board post in which someone else asked the same question in a similar context. There's a picture of it farther down on the right. Anyway, most choices of p, q, and r in {0, 1, ∞, -1} do not get you permutations of {0, 1, ∞, -1}, because they end up mapping ∞ outside that set. For example, if you take p = 1, q = -1, r = 0, you get m(∞) = -2/3. But obviously the identity function has the desired property, and if you think about the Riemann circle (excuse me, Riemann sphere) you immediately get the rest: any rigid motion of the Riemann sphere is a Möbius function, and some of those motions permute the four points {0, 1, ∞, -1}. In fact, there are eight such functions, because {0, 1, ∞, -1} are at the vertices of a square, so any rigid motion of the Riemann sphere that permutes {0, 1, ∞, -1} must be a rigid motion of that square, and the square has eight symmetries, namely the elements of the group D4:
Here we have eight functions on the reals which make the group D4 under the operation of composition. For example, if f(x) = (x+1)/(x-1), then f(f(f(f(x)))) = x. Isn't that nice? Anyway, none of that was what I was really planning to talk about. (You knew that was coming, didn't you?) What I wanted to discuss was the function f : x → 1 / (1 - x). I found this function because I was considering other permutations of {0, 1, ∞, -1}. The f function takes 0 → 1 → ∞ → 0. (It also takes -1 → 1/2, and so is not one of the functions in the D4 table above.) We say that f has a periodic point of order 3 because f(f(f(x))) = x for some x; in this case at least for x ∈ {0, 1, ∞}. A function with a periodic point of order three is not something you see every day, and I was somewhat surprised that as simple a function as 1/(1-x) had one. But if you do the algebra and calculate f(f(f(x))) explicitly, you find that you do indeed get x, so every point is a periodic point of order 3, or possibly 1.
Or you can do a simpler calculation: since
f is the Möbius function that corresponds to the matrix
F = This also gives you a simple matrix M for which M7 = M, if you happened to be looking for such a thing. I had noticed a couple of years ago that this 1/(1-x) function had period 3, and then forgot about it. Then I noticed it again a few weeks ago, and a nagging question came into my mind, which is reflected in a note I wrote in my notebook at that point: "WHAT ABOUT SARKOVSKY'S THEOREM?" Well, what about it? Sharkovskii's theorem (I misspelled it in the notebook) is a delightful generalization of the "Period three implies chaos" theorem of Li and Yorke. It says, among other things, that if a continuous function of the reals has a periodic point of order 3, then it also has a periodic point of order n for all positive integers n. In particular, we can take n=1, so the function f, which has a periodic point of order 3 must also have a fixed point. But it's quite easy to see that f has no fixed point on the reals: Just put f(x) = 1/(1-x) = x and solve for x; there are no real solutions. So what about Sharkovskii's theorem? Oh, it only applies to continuous functions, and f is not, because f(1) = ∞. So that's all right. The Sharkovskii thing is excellent. The Sharkovskii ordering of the integers is:
3 < 5 < 7 < 9 < ... The 1/(1-x) function led me to read more about Sharkovskii's theorem and its predecessor, the "period three implies chaos" theorem. Isn't that a great name for a theorem? And Li and Yorke knew it, because that's what they titled their paper. "Chaos" in this context means the following: say that two values a and b are "scrambled" by f if, for any given d and ε, there is some n for which |fn(a) - fn(b)| > d, and some m for which |fm(a) - fm(b)| < ε. That is, a and b are scrambled if repeated application of f drives a and b far apart, then close together, then far apart again, and so on. Then, if f is a continuous function with a periodic point of order 3, there is some uncountable set S of reals such that f scrambles all distinct pairs of values a and b from S. All that was from memory; I hope it got it more or less correct. (The Li and Yorke paper also includes an example of a continuous function with a periodic point of order 5 but no periodic point of order 3. It's pretty simple.)
I was pleased to finally be studying this material, because it was a very early inspiration to me. When I was about fourteen, my cousin Alex, who is an analytic chemist, came to visit, and told me about period-doubling and chaos in the logistic map. (It was all over the news at the time.) The logistic map is just f : x → λx(1-x) for some constant λ. For small &lambda, the map has a single fixed point, which increases as λ does. But at a certain critical value of λ (λ=3, actually) the function's behavior changes, and it suddenly begins to have a periodic point of order 2. As λ increases further, the behavior changes again, and the periodicity changes from order 2 to order 4. As &lambda increases, this happens again and again, with the splits occurring at exponentially closer and closer values of λ. Eventually there is a magic value of λ at which the function goes berserk and is chaotic. Chaos continues for a while, and then the function develops a periodic point of order 3, which bifurcates... (The illustration here, which I copied from Wikipedia, uses r instead of λ.) I was deeply impressed. For some reason I got the idea that I would need to understand partial differential equations to understand the chaos and the logistic map, so I immeditately set out on a program to learn what I thought I would need to know. I enrolled in differential equations courses at Columbia University instead of in something more interesting. The partial differential equations turned out to be a sidetrack, but in those days there were no undergraduate courses in iterated dynamic systems. I am happy to discover that after only twenty-five years I am finally arriving at the destination. Cousin Alex also told me to carry a notebook and pen with me wherever I went. That was good advice, and it took me rather less time to learn.
[Other articles in category /math] permanent link |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||