Parse JSON with jq

You might have noticed that our beloved GNU utils, especially sed and awk, work better with some file types than others.

In fact, although awk can do wonders with CSV files, more modern formats like JSON, YAML or even XML (which is by no means modern) can be a bit of a pain to work with.

Well, jq is a parser specifically designed to address this issue for JSON files. We’ll see further down how to tackle the other formats mentioned.

Basics

Let’s take a simple JSON document as an example. Run curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' on your command line to see the data.

Since this data comes back as a one-liner, we can use jq with the identity filter ('.') to pretty-print the output:

sh
curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' | jq '.'

We can query the interesting data to remove some noise simply by referring to its node name:

sh
curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' | jq '.data.posts[]'

To output the data as an array we can just enclose the query in []:

sh
curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' | jq '[.data.posts[]]'
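
To see the difference without hitting the API, here is a minimal sketch on a made-up input: the bare iterator emits a stream of separate values, while wrapping the query in [] collects them into a single array.

sh
# Made-up input, just to illustrate the behaviour
echo '{"posts":[1,2,3]}' | jq '.posts[]'    # three separate values
echo '{"posts":[1,2,3]}' | jq '[.posts[]]'  # a single array containing 1, 2 and 3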

Or we can do some interesting manipulation to the data and present a parsed version:

sh
curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' | jq '.data.posts[] | {hi: .slug, mom: ("THIS IS A TITLE " + .title)}'

Again, we are accessing the data by its node name and doing some string concatenation, nothing too fancy. Notice how we use a pipe to pass the data from one filter to the next.
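
If you want to play with this construction offline, here is a minimal sketch on a made-up object that mimics a single post (the field names are invented for illustration):

sh
# Object construction plus string concatenation on a fake "post"
echo '{"slug":"my-post","title":"Hello"}' \
  | jq '{hi: .slug, mom: ("THIS IS A TITLE " + .title)}'
# outputs (pretty-printed): {"hi": "my-post", "mom": "THIS IS A TITLE Hello"}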

Not so basics

jq has a bunch of very useful built-in functions; we’ll go over a few of them. From now on, the curl command will be left out to keep the code blocks more concise.

You can still use it to test these queries!
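
One way to avoid hammering the API while experimenting is to save the response to a local file once and point jq at that file instead (posts.json is just a name picked for this sketch):

sh
# Fetch the data once...
curl -s 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' -o posts.json
# ...then run any of the queries below against the local copy, e.g.:
jq '.data.posts[].title' posts.json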

Delete node

Use it to clear out unwanted noise:

sh
jq '. | del(.data.posts[].slug)'
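
A self-contained way to see the effect, using a made-up object instead of the API response:

sh
# del() removes the given path and returns the rest of the document
echo '{"slug":"my-post","title":"Hello"}' | jq 'del(.slug)'
# leaves only {"title": "Hello"}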

Filter with select

Select only the entries that match the given condition.

sh
jq '[.data.posts[] | {ID: .slug, TITLE: .title} | select(.TITLE | length > 30)]'

Notice how we now use .TITLE instead of .title to filter the output. This is because by this point, the .title node is not there anymore!
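
The same idea on a tiny made-up array, with a lower length threshold so the example stays short: only the object whose TITLE is longer than 10 characters survives the select.

sh
echo '[{"title":"short"},{"title":"a noticeably longer title"}]' \
  | jq '[.[] | {TITLE: .title} | select(.TITLE | length > 10)]'
# keeps only {"TITLE": "a noticeably longer title"}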

Conditional logic

We can use concise if statements to selectively modify a node. Here we create a new node called IS_VALID whose value is "Too short!" or "yes" depending on the length of .title.

sh
jq '[.data.posts[] | {ID: .slug, TITLE: .title, IS_VALID: (if .title | length < 30 then "Too short!" else "yes" end)}]'

Again, notice how in this case we reference .title. This is because the .TITLE node is not created yet!
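
Here is the same pattern on a made-up input, with a smaller threshold to keep the example short:

sh
echo '[{"title":"hi"},{"title":"a much longer title"}]' \
  | jq '[.[] | {TITLE: .title, IS_VALID: (if .title | length < 5 then "Too short!" else "yes" end)}]'
# "hi" gets IS_VALID "Too short!", the longer title gets "yes"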

Group by

Group by the value of any given node with group_by()!

sh
jq '[.data.posts[] | {ID: .slug, TITLE: .title, IS_VALID: (if .title | length < 30 then "Too short!" else "yes" end)}] | group_by(.IS_VALID)'

Notice how this function is called outside the array construction: group_by() expects an array as its input!
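
On a made-up array, the effect is easier to see: group_by() turns a flat array into an array of arrays, one per distinct value of the given node, sorted by that value.

sh
echo '[{"t":"a"},{"t":"b"},{"t":"a"}]' | jq 'group_by(.t)'
# produces [[{"t":"a"},{"t":"a"}],[{"t":"b"}]] (pretty-printed)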

Sort by

Sorting is also possible, and the result can be reversed if needed:

sh
jq '[.data.posts[] | {ID: .slug, TITLE: .title, len: (.title | length)}] | sort_by(.len) | reverse'

Notice that we set .len to the result of passing .title to the length built-in function.
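
The same recipe on a made-up list of strings: we compute a length for each entry, sort on it, and reverse so the longest one ends up first.

sh
echo '["ant","butterfly","bee"]' \
  | jq '[.[] | {name: ., len: length}] | sort_by(.len) | reverse'
# "butterfly" (len 9) comes first, the three-letter names follow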

Handle other file types with YQ

Since this is so nice to work with, someone took the time to build a similar tool with jq-like syntax that handles other file types, and created yq (as in YAML query).

It doesn’t just handle YAML files, but also JSON, XML, CSV and TSV. Not only that, you can easily use this application to convert one file type into another! It is worth mentioning however that not all file conversions are supported as there are still some edge cases to be tackled. Check the docs to find out more.

Keep in mind that, apart from what is shown below, all the previous operations can be applied to any of these file types. Since yq uses the same query syntax as jq, I’ll stick to the identity filter ('.') in the examples to keep things simple.

This is just a quick overview of how you might want to use the tool, it can achieve much more than I’m showing here.

YAML to other types

For a your_cool.yaml file with the following structure:

yaml
pets:
  cat:
    - purrs
    - meows

The command yq -o xml '.' your_cool.yaml would output it in XML format:

xml
<pets>
  <cat>purrs</cat>
  <cat>meows</cat>
</pets>

Or you can run it as yq -o json '.' your_cool.yaml to get JSON instead:

json
{
  "pets": {
    "cat": ["purrs", "meows"]
  }
}
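
The reverse direction works too. Assuming the JSON above is saved as your_cool.json (a made-up filename for this sketch), yq can turn it back into YAML; since JSON is essentially valid YAML, the default parser reads it without any extra flag:

sh
# Read the JSON file and emit YAML
yq -o yaml '.' your_cool.json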

Any Input, Any Output

As mentioned above, the possibilities are nearly limitless. Say you have a your_cool.csv file with the following structure:

csv
name,numberOfCats,likesApples,height
Gary,1,true,168.8
Samantha's Rabbit,2,false,-188.8

Hassle-free conversion to YAML can be achieved with yq -o yaml -p csv '.' your_cool.csv:

yaml
- name: Gary
  numberOfCats: 1
  likesApples: true
  height: 168.8
- name: Samantha's Rabbit
  numberOfCats: 2
  likesApples: false
  height: -188.8

Again, use the -o flag to change the output format, this time running yq -o json -p csv '.' your_cool.csv:

json
[
  {
    "name": "Gary",
    "numberOfCats": 1,
    "likesApples": true,
    "height": 168.8
  },
  {
    "name": "Samantha's Rabbit",
    "numberOfCats": 2,
    "likesApples": false,
    "height": -188.8
  }
]

Notice that in order to properly take a non-YAML file as input, the -p flag must be used.
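
And because the jq-style filters from earlier apply here as well, you can query while converting. A small sketch, assuming the same your_cool.csv, that keeps only the rows where likesApples is true:

sh
# -p csv parses the input, the filter picks matching rows, -o json prints them
yq -p csv -o json '.[] | select(.likesApples == true)' your_cool.csv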

