I didn't like jq in 2021 either
[ Previously: [1] ]
A couple of weeks ago Dan Vanderkam
asked if it would be worthwhile putting in time figuring out how to
use jq really well. I said I thought not and took three minutes to
find a nice solid example of why jq sucks.
I went to turn that into a blog article and would have dropped it into
the file prog/jq.html, but I found that file already existed and
contained an unpublished article called “I don't like jq”.
This is that article, from 2021.
2021-06-23
I like the idea of jq,
which is something like “AWK, but for JSON instead of space-separated
records”. It reads a JSON input and then uses a supplied script to
transform it to a JSON output.
But I don't like jq itself because I find the language too hard to
use. Two years into using jq, I still struggle to do things I
think should be simple.
Most recently, I had a dictionary like this:
{ "Fish": { ...
    "food": { ...
      "modifiers": [ "huge meal",
                     "meaty",
                     "fishy",
                     "fatty" ]
    }
  },
  …
  "French Toast": { ...
    "food": { ...
      "modifiers": [ "earthy",
                     "planty",
                     "fatty" ]
    }
  },
  …
  "Iron Sword": { ...
    "food": null
  },
  …
}
I wanted jq to print out all the keys of “earthy” foods. So for
example French Toast yes, Fish and Iron Sword no. This sounds
simple. It wasn't simple. What I eventually came up with was:
jq -r 'to_entries |
       .[] |
       select(.value.food.modifiers) |
       select(.value.food.modifiers | .[] | any(. ; . == "earthy")) |
       .key '
Geez Louise. Really?
(Not really; some improvements are possible. But this is the first
thing I found that worked and by that time I was tired of jq and
wanted to get on with my life.)
The pipe symbols | are real pipes; they take the output of the
filter on the left and feed it into the filter on the right.
to_entries takes the original dictionary
{k1: v1, k2: v2, …}
and turns it into an array of sub-dictionaries:
[ {"key": k1, "value": v1},
  {"key": k2, "value": v2},
  … ]
There must be a better way to handle this, I know. (Addendum
20250615: Most obviously, I think [ [k1, v1], [k2, v2], … ] would
be a small but immediate improvement.)
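To see the shape that to_entries produces, it can be run on a tiny dictionary. This is only a sketch: the keys and values below are made up, and -c just asks jq for compact one-line output.

```shell
# Hypothetical two-key dictionary, not the real food data.
entries=$(printf '%s' '{"k1": 1, "k2": 2}' | jq -c 'to_entries')

# Prints: [{"key":"k1","value":1},{"key":"k2","value":2}]
echo "$entries"
```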
The .[] means to iterate over the array: instead of feeding it as
a single object to what follows, destructure it and feed each
element to the following program. The result of this is not an
array, it is a stream of separate objects, each of the form
{ "key": "Iron Sword", "value": { … } }
The .[] would work on the original dictionary, before to_entries,
but it would emit a stream of the values, discarding the keys. The
whole point of this pipeline is to emit the matching keys, so that
is a non-starter.
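The difference is easy to see side by side. The sketch below uses a reduced, made-up stand-in for the real dictionary, with only the fields that matter here.

```shell
# Reduced, made-up stand-in for the real dictionary.
data='{"French Toast": {"food": {"modifiers": ["earthy"]}}, "Iron Sword": {"food": null}}'

# .[] directly on the dictionary: a stream of values, the keys are gone.
values=$(printf '%s' "$data" | jq -c '.[]')

# to_entries | .[] : a stream of {key, value} objects, the keys survive.
pairs=$(printf '%s' "$data" | jq -c 'to_entries | .[]')

echo "$values"
echo "$pairs"
```

The first command prints the two values with no indication of which food each belongs to; the second prints one {"key": …, "value": …} object per food.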
The select(.value.food.modifiers) discards the stream elements
where there is no .value.food.modifiers data.
(If the value had a food value that was an array or a non-null scalar, the
whole program would crash. If the food had a modifiers that was
a scalar, the select would emit the elements where
.value.food.modifiers was anything other than false or null.
Fortunately neither of these was an issue for me.)
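A minimal sketch of that first select, on made-up data. The "Rock" entry is hypothetical and is there only to show that a missing food key behaves the same as a null one: the path lookup yields null and the entry is dropped.

```shell
# Made-up entries: one with modifiers, one with null food, one with no food key.
data='{"French Toast": {"food": {"modifiers": ["earthy"]}},
       "Iron Sword": {"food": null},
       "Rock": {}}'

kept=$(printf '%s' "$data" |
  jq -r 'to_entries | .[] | select(.value.food.modifiers) | .key')

# Only French Toast survives the select.
echo "$kept"
```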
select(.value.food.modifiers | .[] | any(.; . == "earthy")) is
more complicated. Like the previous item, it will pass along matching
items and discard non-matching items. Here, instead of selecting
items where .value.food.modifiers is true, I want to ask if any of
the modifiers is equal to "earthy". I want to just write
select("earthy" is in .value.food.modifiers)
but I can't do this. jq has an x in array operator, but it doesn't
check to see if a value is in an array. It also has an array has x
operator, but that also doesn't check to see if the element is in
the array. They only tell you if a particular numerical index is in
range for the array.
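The index-only behavior can be checked directly. This is a sketch using the French Toast modifiers as the array; the variable names are just for illustration.

```shell
mods='["earthy", "planty", "fatty"]'

# has(n) asks whether index n exists, not whether a value is present.
has0=$(printf '%s' "$mods" | jq 'has(0)')
has3=$(printf '%s' "$mods" | jq 'has(3)')

# in(array) is the same test with the roles flipped.
in2=$(printf '%s' "$mods" | jq '. as $m | 2 | in($m)')

# Asking an array has("earthy") is an error, so jq exits nonzero.
if printf '%s' "$mods" | jq 'has("earthy")' >/dev/null 2>&1; then
  has_str=ok
else
  has_str=error
fi

# Prints: true false true error
echo "$has0 $has3 $in2 $has_str"
```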
Instead, I must take the array of food modifier strings and pass it to
.[] to turn it into a stream of single strings. Each string goes
into the any operator, which returns true if a predicate is
true for any item in the stream it is given; here that stream is
just . , the one string. The predicate is . == "earthy", which asks
if the string is equal to "earthy". The result is a stream of
booleans, one per modifier; for each one that is true, the select
passes along the input key-value structure.
Finally, the .key filter discards the values and emits a stream of
the matching keys. The -r flag on the command tells jq to emit
these as plain strings, not quoted JSON strings. (For example,
simply Tomato "Sushi" instead of "Tomato \"Sushi\"".)
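The effect of -r shows up as soon as a key contains a double quote. The dictionary below is a made-up one-entry example.

```shell
# A made-up key containing double quotes.
data='{"Tomato \"Sushi\"": {"food": null}}'

quoted=$(printf '%s' "$data" | jq 'to_entries | .[] | .key')
raw=$(printf '%s' "$data" | jq -r 'to_entries | .[] | .key')

echo "$quoted"   # "Tomato \"Sushi\""
echo "$raw"      # Tomato "Sushi"
```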
Looking at this now I see several possible improvements that a more
experienced user of jq might have tried first. Instead of the
duplicate select(.value.food.modifiers) test, which first discards
structures where .value.food.modifiers is missing or null, and then
filters the remaining items, we can use
jq -r 'to_entries |
       .[] |
       select(.value.food.modifiers | .[]? | any(. ; . == "earthy")) |
       .key '
where the .[]? just means to discard items where .[] does not make
sense. In particular if .value.food.modifiers is null, .[]? will
discard the null instead of crashing the program.
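A small sketch of the difference, using jq -n so that no input file is needed:

```shell
# .[] on null is an error; jq reports it and exits nonzero.
if jq -n 'null | .[]' >/dev/null 2>&1; then
  plain=ok
else
  plain=error
fi

# .[]? swallows the error and simply produces no output,
# so collecting its results gives an empty array.
quiet=$(jq -n -c '[null | .[]?]')

# Prints: error []
echo "$plain $quiet"
```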
We can fold the .[]? into the any. any has two arguments. The
first one, before the ;, says what stream to process. Instead of
preprocessing the stream with .[]? and then telling any to process
the resulting stream ., we can tell any to process the result of
.[]? directly:
jq -r 'to_entries |
       .[] |
       select(.value.food.modifiers | any(.[]? ; . == "earthy")) |
       .key '
We can also fold the .value.food.modifiers into the input stream
given to any:
select(any(.value.food.modifiers | .[]? ; . == "earthy"))
Now the input stream to any is the result of .value.food.modifiers | .[]?.
Each element of this stream is used as the value of . in the
condition on the right.
I thought we would be able to eliminate the pipe in the input stream:
select(any(.value.food.modifiers.[]? ; . == "earthy"))
No, this is a syntax error. Why? Oh, I see. The . on the right is
notionally an abbreviation for the result of .value.food.modifiers
on the left. To combine them, we want to write:
select(any(.value.food.modifiers[]? ; . == "earthy"))
I suppose I can similarly abbreviate to_entries | .[] to
to_entries[]. The version I have now is:
jq -r 'to_entries[] |
       select(any(.value.food.modifiers[]? ; . == "earthy")) | .key'
This works okay, and produces the same output as the original. I
still find it somewhat obtuse.
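For anyone who wants to check that claim, here is a sketch that runs the first version and the last version against the same input. The data is a reduced stand-in for the real dictionary: the fields elided with "..." above are simply left out.

```shell
# Reduced, made-up stand-in for the real dictionary (other fields omitted).
data='{
  "Fish": {"food": {"modifiers": ["huge meal", "meaty", "fishy", "fatty"]}},
  "French Toast": {"food": {"modifiers": ["earthy", "planty", "fatty"]}},
  "Iron Sword": {"food": null}
}'

# The first version that worked.
original=$(printf '%s' "$data" | jq -r 'to_entries |
  .[] |
  select(.value.food.modifiers) |
  select(.value.food.modifiers | .[] | any(. ; . == "earthy")) |
  .key')

# The shortened version.
final=$(printf '%s' "$data" | jq -r 'to_entries[] |
  select(any(.value.food.modifiers[]? ; . == "earthy")) | .key')

# Both print: French Toast
echo "$original"
echo "$final"
```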
What would I like better? I'm not sure; I'd have to think about it
carefully. I like many of the specific syntactic choices:
The distinction between arrays and streams is really confusing to
me. And suppose I wanted to emit the final result as a single array
instead of as a stream of strings? I think the only way to do that
is to wrap the entire program as [entire-program]. (Just | [.] doesn't do it;
that wraps each individual string as a single-element array.)
. == "earthy" is essentially a lambda function, or what Haskell might call
a “section”. I like this notation just fine; using . instead of a
named lambda variable is a fine innovation.
The trailing ? to suppress errors seems like a good idea, and it
can be applied in other situations. But it's not clear to me when
its behavior is to suppress the result entirely and when it emits a
null value.
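On the question of collecting the results into a single array, the two spellings can be compared directly. This is a sketch on made-up data; "Salad" is a hypothetical second earthy food, added so that there is more than one match.

```shell
# Made-up data with two earthy foods.
data='{"French Toast": {"food": {"modifiers": ["earthy"]}},
       "Salad": {"food": {"modifiers": ["earthy", "planty"]}}}'

# Wrapping the whole program in [ ] collects the stream into one array.
wrapped=$(printf '%s' "$data" | jq -c '[ to_entries[] |
  select(any(.value.food.modifiers[]? ; . == "earthy")) | .key ]')

# Appending | [.] wraps each key separately instead.
each=$(printf '%s' "$data" | jq -c 'to_entries[] |
  select(any(.value.food.modifiers[]? ; . == "earthy")) | .key | [.]')

echo "$wrapped"   # ["French Toast","Salad"]
echo "$each"      # ["French Toast"] and ["Salad"] on separate lines
```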
2025-06-15
Four years on, I still do not like jq. In the next article in this
series I will bring some more detailed criticisms and some ideas for
what I think would have been better.