The Universe of Discourse


Sat, 31 May 2025

I didn't like jq in 2021 either

[ Previously: [1] ]

A couple of weeks ago Dan Vanderkam asked if it would be worth while putting in time figuring out how to use jq really well. I said I thought not and took three minutes to find a nice solid example of why jq sucks.

I went to turn that into a blog article and would have dropped it into the file prog/jq.html, but I found that file already existed and contained an unpublished article called “I don't like jq”.

This is that article, from 2021.

2021-06-23

I like the idea of jq, which is something like “AWK, but for JSON instead of space-separated records”. It reads a JSON input and then uses a supplied script to transform it to a JSON output.

But I don't like jq itself because I find the language too hard to use. Two years into using jq, I still struggle to do things I think should be simple.

Most recently, I had a dictionary like this:

    { "Fish": { ...
                "food": { ...
                          "modifiers": [ "huge meal",
                                         "meaty",
                                         "fishy",
                                         "fatty" ]
                        }
              },
      …
      "French Toast": { ...
                        "food": { ...
                                  "modifiers": [ "earthy",
                                                 "planty",
                                                 "fatty" ]
                                }
                      },
      …
      "Iron Sword": { ...
                      "food": null
                    },
      …
    }

I wanted jq to print out all the keys of “earthy” foods. So for example French Toast yes, Fish and Iron Sword no. This sounds simple. It wasn't simple. What I eventually came up with was:

     jq -r 'to_entries |
            .[] |
            select(.value.food.modifiers) |
            select(.value.food.modifiers | .[] | any(. ; . == "earthy")) |
            .key '

Geez Louise. Really?

(Not really; some improvements are possible. But this is the first thing I found that worked and by that time I was tired of jq and wanted to get on with my life.)
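To make the query easy to try, here is a minimal sketch that runs it against a cut-down dictionary. The sample data is hypothetical: it keeps only the food.modifiers fields shown above and drops everything that was elided, and the variable names data and earthy are made up for the example.

```shell
# Hypothetical sample data: only the fields the query touches are kept.
data='{
  "Fish":         { "food": { "modifiers": ["huge meal", "meaty", "fishy", "fatty"] } },
  "French Toast": { "food": { "modifiers": ["earthy", "planty", "fatty"] } },
  "Iron Sword":   { "food": null }
}'

# The query from the article, unchanged.
earthy=$(printf '%s' "$data" | jq -r 'to_entries |
       .[] |
       select(.value.food.modifiers) |
       select(.value.food.modifiers | .[] | any(. ; . == "earthy")) |
       .key')

echo "$earthy"   # French Toast
```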

The pipe symbols | are real pipes; they take the output of the filter on the left and feed it into the filter on the right.

  • to_entries takes the original dictionary

           {k1: v1, k2: v2, …}
    

    and turns it into an array of sub-dictionaries:

           [ {"key": k1, "value": v1},
             {"key": k2, "value": v2},
             … ]
    

    There must be a better way to handle this, I know. (Addendum 20250615: Most obviously, I think [ [k1, v1], [k2, v2], … ] would be a small but immediate improvement.)

  • The .[] means to iterate over the array: instead of feeding it as a single object to what follows, destructure it and feed each element to the following program. The result of this is not an array, it is a stream of separate objects, each of the form

    { "key": "Iron Sword", "value": { … } }
    

    The .[] would work on the original stream, before to_entries, but it would emit a stream of the values, discarding the keys. The whole point of this pipeline is to emit the matching keys, so that is a non-starter.

  • The select(.value.food.modifiers) discards the stream elements where there is no .value.food.modifiers data.

    (If the value had a food value that is an array or a scalar, the whole program would crash. If the food had a modifiers that was a scalar, the select would emit the elements where .value.food.modifiers was truthy, that is, anything other than false or null. Fortunately neither of these was an issue for me.)

  • select(.value.food.modifiers | .[] | any(.; . == "earthy")) is more complicated. Like the previous item, it will pass along matching items and discard non-matching items. Here, instead of selecting items where .value.food.modifiers is true, I want to ask if any of the modifiers is equal to "earthy". I want to just write

    select("earthy" is in .value.food.modifiers)
    

    but I can't do this. jq has an x in array operator, but it doesn't check to see if a value is in an array. It also has an array has x operator, but that also doesn't check to see if the element is in the array. They only tell you if a particular numerical index is in range for the array.

    Instead, I must take the array of food modifier strings and pass it to .[] to turn it into a stream of single strings. Each string goes into the any operator, which returns true if a predicate is true for any item in the stream named by its first argument. Here that first argument is just ., so any looks at one string at a time. The predicate is . == "earthy", which asks if the string is equal to "earthy". The result is a stream of booleans, one per modifier, and the select passes along the input key-value structure once for each boolean that is true.

  • Finally, the .key filter discards the values and emits a stream of the matching keys. The -r flag on the command tells jq to emit these as plain strings, not quoted JSON strings. (For example, simply Tomato "Sushi" instead of "Tomato \"Sushi\"".)
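The individual steps above can be poked at in isolation. This is a small sketch using jq -n with made-up literal inputs (the object {"a": 1, "b": 2} and the two-element array are illustrative only); it shows what to_entries produces, what .[] does to an object, and that has and in test indices rather than membership.

```shell
# to_entries turns an object into an array of {key, value} objects.
entries=$(jq -c -n '{"a": 1, "b": 2} | to_entries')
echo "$entries"              # [{"key":"a","value":1},{"key":"b","value":2}]

# .[] applied to the object itself yields only the values; the keys are lost.
values=$(jq -c -n '{"a": 1, "b": 2} | .[]')
echo "$values"               # 1 and 2, on separate lines

# has and in ask whether an index is in range, not whether a value is present.
has_index=$(jq -n '["earthy", "planty"] | has(1)')
in_index=$(jq -n '5 | in(["earthy", "planty"])')
echo "$has_index $in_index"  # true false
```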

Looking at this now I see several possible improvements that a more experienced user of jq might have tried first. Instead of the duplicate select(.value.food.modifiers) test, which first discards structures where .value.food.modifiers is missing or null, and then filters the remaining items, we can use

     jq -r 'to_entries |
            .[] |
            select(.value.food.modifiers | .[]? | any(. ; . == "earthy")) |
            .key '

where the .[]? just means to discard items where .[] does not make sense. In particular if .value.food.modifiers is null, .[]? will discard the null instead of crashing the program.
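The difference between .[] and .[]? on a null input can be seen directly. This is a minimal sketch with a literal null standing in for a missing modifiers value; the variable names strict and lenient are made up for the example.

```shell
# .[] on null is a runtime error; record jq's failure without aborting the script.
if jq -n 'null | .[]' >/dev/null 2>&1; then strict=ok; else strict=error; fi

# .[]? yields an empty stream instead, so the enclosing array comes out empty.
lenient=$(jq -c -n '[null | .[]?]')

echo "$strict $lenient"   # error []
```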

We can fold the .[]? into the any. any has two arguments. The first one, before the ;, says what stream to process. Instead of preprocessing the stream with .[]? and then telling any to process the resulting stream ., we can tell any to process the result of .[]? directly:

     jq -r 'to_entries |
            .[] |
            select(.value.food.modifiers | any(.[]? ; . == "earthy")) |
            .key '

We can also fold the .value.food.modifiers into the input stream given to any:

            select(any(.value.food.modifiers | .[]? ; . == "earthy"))

Now the input stream to any is the result of .value.food.modifiers | .[]?. Each element of this stream is used as the value of . in the condition on the right.
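It may help to see what each arrangement actually returns for one entry. In this sketch the entry is a hypothetical, cut-down version of the French Toast record, and the results are collected into arrays only so they print on one line. When .[]? sits outside any, the any is run once per string and yields one boolean each time; when the stream is given as any's first argument, there is a single boolean for the whole array.

```shell
# Hypothetical single entry, as produced by to_entries | .[] on trimmed data.
entry='{"key": "French Toast", "value": {"food": {"modifiers": ["earthy", "planty", "fatty"]}}}'

# .[]? outside any: one boolean per modifier string.
per_string=$(printf '%s' "$entry" | jq -c '[.value.food.modifiers | .[]? | any(. ; . == "earthy")]')

# .[]? as the first argument of any: one boolean for the whole array.
folded=$(printf '%s' "$entry" | jq -c '[.value.food.modifiers | any(.[]? ; . == "earthy")]')

# The path folded into the first argument as well: same single boolean.
folded_more=$(printf '%s' "$entry" | jq -c '[any(.value.food.modifiers | .[]? ; . == "earthy")]')

echo "$per_string $folded $folded_more"   # [true,false,false] [true] [true]
```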

I thought we would be able to eliminate the pipe in the input stream:

            select(any(.value.food.modifiers.[]? ; . == "earthy"))

No, this is a syntax error. Why? Oh, I see. The . on the right is notionally an abbreviation for the result of .value.food.modifiers on the left. To combine them, we want to write:

            select(any(.value.food.modifiers[]? ; . == "earthy"))

I suppose I can similarly abbreviate to_entries | .[] to to_entries[]. The version I have now is:

   jq -r 'to_entries[] |
          select(any(.value.food.modifiers[]? ; . == "earthy")) |
          .key'

This works okay, and produces the same output as the original. I still find it somewhat obtuse.
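A quick way to check that claim is to run both versions on the same input and compare. This is a sketch only: the data is a hypothetical cut-down dictionary with the elided fields omitted, and the variable names original and final are made up to hold the two programs.

```shell
# Hypothetical sample data: only the fields the query touches are kept.
data='{
  "Fish":         { "food": { "modifiers": ["huge meal", "meaty", "fishy", "fatty"] } },
  "French Toast": { "food": { "modifiers": ["earthy", "planty", "fatty"] } },
  "Iron Sword":   { "food": null }
}'

# The first working version and the final shortened version.
original='to_entries | .[] | select(.value.food.modifiers) | select(.value.food.modifiers | .[] | any(. ; . == "earthy")) | .key'
final='to_entries[] | select(any(.value.food.modifiers[]? ; . == "earthy")) | .key'

out_original=$(printf '%s' "$data" | jq -r "$original")
out_final=$(printf '%s' "$data" | jq -r "$final")

# Both print the same single key for this sample.
echo "$out_original"   # French Toast
echo "$out_final"      # French Toast
```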

What would I like better? I'm not sure; I'd have to think about it carefully. I like many of the specific syntactic choices, though not all of them:

  • The distinction between arrays and streams is really confusing to me. And suppose I wanted to emit the final result as a single array instead of as a stream of strings? I think the only way to do that is to wrap the entire program as [entire-program]. (Just | [.] doesn't do it; that wraps each individual string as a single-element array.)

  • . == "earthy" is essentially a lambda function, or what Haskell might call a “section”. I like this notation just fine; using . instead of a named lambda variable is a fine innovation.

  • The trailing ? to suppress errors seems like a good idea, and it can be applied in other situations. But it's not clear to me when its behavior is to suppress the result entirely and when it emits a null value.
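Two of the points above can be tried out on small inputs. This is a sketch with a made-up two-key object. The first half contrasts a stream, | [.], and wrapping the whole program in brackets. The second half shows one case of each ? behavior: as far as I can tell, ? suppresses output only where there would otherwise be an error, while a null comes from an ordinary lookup that already returns null.

```shell
# Hypothetical two-key object.
data='{"x": {"tag": "a"}, "y": {"tag": "b"}}'

# A stream of keys, each wrapped separately, and the whole program wrapped once.
stream=$(printf '%s' "$data" | jq -c 'to_entries[] | .key')
each=$(printf '%s' "$data" | jq -c 'to_entries[] | .key | [.]')
whole=$(printf '%s' "$data" | jq -c '[to_entries[] | .key]')
echo "$whole"             # ["x","y"]

# .a on null is not an error, so .a? still emits null.
from_null=$(jq -c -n '[null | .a?]')
# .a on a string is an error, so .a? emits nothing at all.
from_string=$(jq -c -n '["str" | .a?]')
echo "$from_null $from_string"   # [null] []
```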


2025-06-15

Four years on, I still do not like jq. In the next article in this series I will bring some more detailed criticisms and some ideas for what I think would have been better.

