The Universe of Discourse


Thu, 27 Feb 2014

2banner, which tells you when someone else is looking at the same web page

I was able to release a pretty nice piece of software today, courtesy of my employer, ZipRecruiter. If you have a family of web pages, and whenever you are looking at one you want to know when someone else is looking at the same page, you can use my package. The package is called 2banner, because it pops up a banner on a page whenever two people are looking at it. With permission from ZipRecruiter, I have put it on github, and you can download and use it for free.

A typical use case would be a customer service organization. Say your users create requests for help, and that the customer service reps have to answer the requests. There is a web page with a list of all the unserviced requests, and each one has a link to a page with details about what is requested and how to contact the person who made the request. But it might sometimes happes that Fred and Mary independently decide to service the same request, which is at best a waste of effort, and at worst confusing for the customer who gets email from both Fred and Mary and doesn't know how to respond. With 2banner, when Mary arrives at the customer's page, she sees a banner in the corner that says Fred is already looking at this page!, and at the same time a banner pops up in Fred's browser that says Mary has started looking at this page! Then Mary knows that Fred is already on the case, and she can take over a different one, or Fred and Mary can communicate and decide which of them will deal with this particular request.

You can similarly trick out the menu page itself, to hide the menu items that someone is already looking out.

I wanted to use someone else's package for this, but I was not able to find one, so I wrote one myself. It was not as hard as I had expected. The system comprises three components:

  • The back-end database for recording who started looking at which pages and when. I assumed a SQL database and wrote a component that uses Perl's DBIx::Class module to access it, but it would be pretty easy throw this away and use something else instead.

  • An API server that can propagate notifications like “user X is now looking at page Y” and “user X is no longer looking at page Y” into the database, and which can answer the question “who else is looking at page Y right now?”. I used Perl's Catalyst framework for this, because our web app already runs under it. It would not be too hard to throw this away and use something else instead. You could even write a standalone server using HTTP::Server, and borrow most of the existing code, and probably have it working in under an hour.

  • A JavaScript thingy that lives in the web page, sends the appropriate notifications to the API server, and pops up the banner when necessary. I used jQuery for this. Probably there is something else you could use instead, but I have no idea what it is, because I know next to nothing about front-end web programming. I was happy to have the chance to learn a little about jQuery for this project.

Often a project seems easy but the more I think about it the harder it seems. This project was the opposite. I thought it sounded hard, and I didn't want to do it. It had been an outstanding request of the CS people for some time, but I guess everyone else thought it sounded hard too, because nobody did anything about it. But the longer I let it percolate around in my head, the simpler it seemed. As I considered it, one difficulty after another evaporated.

Other than the jQuery stuff, which I had never touched before, the hardest part was deciding what to do about the API server. I could easily have written a standalone, special-purpose API server, but I felt like it was the wrong thing, and anyway, I didn't want to administer one. But eventually I remembered that our main web site is driven by Catalyst, which is itself a system for replying to HTTP requests, which already has access to our database, and which already knows which user is making each request.

So it was natural to say that the API was to send HTTP requests to certain URLs on our web site, and easy-peasy to add a couple of handlers to the existing Catalyst application to handle the API requests, query the database, and send the right replies.

I don't know why it took me so long to think of doing the API server with Catalyst. If it had been someone else's suggestion I would probably feel dumb for not having thought of it myself, because in hindsight it is so obvious. Happily, I did think of it, because it is clearly the right solution for us.

It was not too hard to debug. The three components are largely independent of one another. The draft version of the API server responded to GET requests, which are easier to generate from the browser than the POST requests that it should be getting. Since the responses are in JSON, it was easy to see if the API server was replying correctly.

I had to invent techniques for debugging the jQuery stuff. I didn't know the right way to get diagnostic messages out of jQuery, so I put a big text area on the web page, and had the jQuery routines write diagnostic messages into it. I don't know if that's what other people do, but I thought it worked pretty well. JavaScript is not my ideal language, but I program in Perl all day, so I am not about to complain. Programming the front end in JavaScript and watching stuff happen on the page is fun! I like writing programs that make things happen.

The package is in ZipRecruiter's GitHub repository, and is available under a very lenient free license. Check it out.

(By the way, I really like working for ZipRecruiter, and we are hiring Perl and Python developers. Please send me email if you want to ask what it is like to work for them.)


[Other articles in category /tech] permanent link

Mon, 10 Feb 2014

The perfect machine

You sometimes hear people claim that there is no perfectly efficient machine, that every machine wastes some of its input energy in noise or friction.

However, there is a counterexample. An electric space heater is perfectly efficient. Its purpose is to heat the space around it, and 100% of the input energy is applied to this purpose. Even the electrical energy lost to resistance in the cord you use to plug it into the wall is converted to heat.

Wait, you say, the space heater does waste some of its energy. The coils heat up, and they emit not only heat, but also light, which is useless, being a dull orange color. Ah! But what happens when that light hits the wall? Most of it is absorbed, and heats up the wall. Some is reflected, and heats up a different wall instead.

Similarly, a small fraction of the energy is wasted in making a quiet humming noise—until the sound waves are absorbed by the objects in the room, heating them slightly.

Now it's true that some heat is lost when it's radiated from the outside of the walls and ceiling. But some is also lost whenever you open a window or a door, and you can't blame the space heater for your lousy insulation. It heated the room as much as possible under the circumstances.

So remember this when you hear someone complain that incandescent light bulbs are wasteful of energy. They're only wasteful in warm weather. In cold weather, they're free.


[Other articles in category /physics] permanent link

Sat, 08 Feb 2014

Everything needs an ID

I wrote some time ago about Moonpig's use of GUIDs: every significant object was given a unique ID. I said that this was a useful strategy I had only learned from Rik, and I was surprised to see how many previously tricky programming problems became simpler once the GUIDs were available. Some of these tricky problems are artifacts of Perl's somewhat limited implementation of hashes; hash keys must be strings, and the GUID gives you an instantaneous answer to any question about what the keys should be.

But it reminds me of a similar maxim which I was thinking about just yesterday: Every table in a relational database should have a record ID field. It often happens that I am designing some table and there is no obvious need for such a field. I now always put one in anyway, having long ago learned that I will inevitably want it for something.

Most recently I was building a table to record which web pages were being currently visited by which users. A record in the table is naturally identified by the pair of user ID and page URL; it is not clear that it needs any further keys.

But I put in a record ID anyway, because my practice is to always put in a record ID, and sure enough, within a few hours I was glad it was there. The program I was writing has not yet needed to use the record IDs. But to test the program I needed to insert and manipulate some test records, and it was much easier to write this:

update table set ... where record_id = 113;

than this:

update table set ... where user_id = 97531 and url = 'http://hostname:port/long/path/that/is/hard/to/type';

If you ever end up with two objects in the program that represesent record sets and you need to merge or intersect them synthetically, having the record ID numbers automatically attached to the records makes this quite trivial, whereas if you don't have them it is a pain in the butt. You should never be in such a situation, perhaps, but stranger things have happened. Just yesterday I found myself writing

    function relativize (pathPat) {
      var dummyA = document.createElement('a');
      dummyA.href = document.URL;
      return "http://" + dummyA.host + pathPat;
    }

which nobody should have to do either, and yet there I was. Sometimes programming can be a dirty business.

During the bootstrapping of the user-url table project some records with bad URLs were inserted by buggy code, and I needed to remove them. The URLs all ended in % signs, and there's probably some easy way to delete all the records where the URL ends in a % sign. But I couldn't remember the syntax offhand, and looking up the escape sequence for LIKE clauses would have taken a lot longer than what I did do, which was:

delete from table where record_id in (43, 47, 49)

So the rule is: giving things ID numbers should be the default, because they are generally useful, like handles you can use to pick things up with. You need a good reason to omit them.


[Other articles in category /IT] permanent link

Sat, 01 Feb 2014

Why I like Java

(为什么我喜欢Java)

My current employer uses an online quiz to pre-screen applicants for open positions. The first question on the quiz is a triviality, just to let the candidate get familiar with the submission and testing system. The question is to write a program that copies standard input to standard output. Candidates are allowed to answer the questions using whatever language they prefer.

Sometimes we get candidates who get a zero score on the test. When I see the report that they failed to answer even the trivial question, my first thought is that this should not reflect badly on the candidate. Clearly, the testing system itself is so hard to use that the candidate was unable to submit even a trivial program, and this is a failure of the testing system and not the candidate.

But it has happened more than once that when I look at the candidate's incomplete submissions I see that the problem, at least this time, is not necessarily in the testing system. There is another possible problem that had not even occurred to me. The candidate failed the trivial question because they tried to write the answer in Java.

I am reminded of Dijkstra's remark that the teaching of BASIC should be rated as a criminal offense. Seeing the hapless candidate get bowled over by a question that should be a mere formality makes me wonder if the same might be said of Java.

I'm not sure. It's possible that this is still a failure of the quiz. It's possible that the Java programmers have valuable skills that we could use, despite their inability to produce even a trivial working program in a short amount of time. I could be persuaded, but right now I have a doubtful feeling.

When you learn Perl, Python, Ruby, or Javascript, one of the things you learn is a body of technique for solving problems using hashes, which are an integral part of the language. When you learn Haskell, you similarly learn a body of technique for solving problems with lazy lists and monads. These kinds of powerful general-purpose tools are at the forefront of the language.

But when you learn Java, there aren't any powerful language features you can use to solve many problems. Instead, you spend your time learning a body of technique for solving problems in the language. Java has hashes, but if you are aware of them at all, they are just another piece of the immense Collections library, lost among the many other sorts of collections, and you have no particular reason to know about them or think about them. A good course of Java instruction might emphasize the more useful parts of the Collections, but since they're just another part of the library it may not be obvious that hashes are any more or less useful than, say, AbstractAction or zipOutputStream.

I was a professional Java programmer for three years (in a different organization), and I have meant for some time to write up my thoughts about it. I am often very bitter and sarcastic, and I willingly admit that I am relentlessly negative and disagreeable, so it can be hard to tell when I am in earnest about liking something. I once tried to write a complimentary article about Blosxom, which has generated my blog since 2006, and I completely failed; people thought I was being critical, and I had to write a followup article to clarify, and people still thought I was dissing Blosxom. Because this article about Java might be confused with sarcastic criticism, I must state clearly that everything in this article about Java is in earnest, and should be taken at face value. Including:

I really like Java

I am glad to have had the experience of programming in Java. I liked programming in Java mainly because I found it very relaxing. With a bad language, like say Fortran or csh, you struggle to do anything at all, and the language fights with you every step of the way forward. With a good language there is a different kind of struggle, to take advantage of the language's strengths, to get the maximum amount of functionality, and to achieve the clearest possible expression.

Java is neither a good nor a bad language. It is a mediocre language, and there is no struggle. In Haskell or even in Perl you are always worrying about whether you are doing something in the cleanest and the best way. In Java, you can forget about doing it in the cleanest or the best way, because that is impossible. Whatever you do, however hard you try, the code will come out mediocre, verbose, redundant, and bloated, and the only thing you can do is relax and keep turning the crank until the necessary amount of code has come out of the spout. If it takes ten times as much code as it would to program in Haskell, that is all right, because the IDE will generate half of it for you, and you are still being paid to write the other half.

So you turn the crank, draw your paycheck, and you don't have to worry about the fact that it takes at least twice as long and the design is awful. You can't solve any really hard design problems, but there is a book you can use to solve some of the medium-hard ones, and solving those involves cranking out a lot more Java code, for which you will also be paid. You are a coder, your job is to write code, and you write a lot of code, so you are doing your job and everyone is happy.

You will not produce anything really brilliant, but you will probably not produce anything too terrible either. The project might fail, but if it does you can probably put the blame somewhere else. After all, you produced 576 classes that contain 10,000 lines of Java code, all of it seemingly essential, so you were doing your job. And nobody can glare at you and demand to know why you used 576 classes when you should have used 50, because in Java doing it with only 50 classes is probably impossible.

(Different languages have different failure modes. With Perl, the project might fail because you designed and implemented a pile of shit, but there is a clever workaround for any problem, so you might be able to keep it going long enough to hand it off to someone else, and then when it fails it will be their fault, not yours. With Haskell someone probably should have been fired in the first month for choosing to do it in Haskell.)

So yes, I enjoyed programming in Java, and being relieved of the responsibility for producing a quality product. It was pleasant to not have to worry about whether I was doing a good job, or whether I might be writing something hard to understand or to maintain. The code was ridiculously verbose, of course, but that was not my fault. It was all out of my hands.

So I like Java. But it is not a language I would choose for answering test questions, unless maybe the grade was proportional to the number of lines of code written. On the test, you need to finish quickly, so you need to optimize for brevity and expressiveness. Java is many things, but it is neither brief nor expressive.

When I see that some hapless job candidate struggled for 15 minutes and 14 seconds to write a Java program for copying standard input to standard output, and finally gave up, without even getting to the real questions, it makes me sad that their education, which was probably expensive, has not equipped them with with better tools or to do something other than grind out Java code.


[Other articles in category /prog] permanent link