Parse JSON with jq

Our beloved GNU utils, especially sed and awk, are line-oriented and work better with some file types than others. Nested formats such as JSON, YAML or XML can be a bit of a pain to work with.

jq is a command-line processor designed specifically for JSON, and there’s a bonus tool at the end for YAML and XML files as well!

The basics

Let’s take a simple JSON document as an example: run curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' on your command line to see the data. The URL is quoted so that shells such as zsh don’t treat the ? as a glob character.

Since this data is presented as a one-liner, we can use jq to format the output:

curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' | jq '.'

We can query the interesting bits and remove some noise simply by referring to the node names:

curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' | jq '.data.posts[]'

To output the data as an array we can just enclose the query in []:

curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' | jq '[.data.posts[]]'

Or we can do some interesting manipulation to the data and present a parsed version:

curl 'https://til.hashrocket.com/api/developer_posts.json?username=doriankarter' | jq '.data.posts[] | {id: .slug, formatted_title: ("THIS IS A TITLE - " + .title)}'

Again, we are accessing the data by node name and doing some string concatenation.

Notice how we use a pipe (|) inside the jq program to pass the output of one filter to the next, much like a shell pipeline.
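If the API is unreachable, the same filter can be tried offline. Below is a minimal sketch using a hypothetical two-post sample; its shape (data.posts[] with slug and title) is an assumption inferred from the filters above, not a copy of the real response.

```shell
# Hypothetical sample; the real API response has more fields per post.
sample='{"data":{"posts":[{"slug":"abc123","title":"Short title"},{"slug":"def456","title":"A considerably longer title for testing filters"}]}}'

# -c prints each result on a single line.
# For every post, build a new object: slug becomes id, and the title gets a prefix.
formatted=$(printf '%s' "$sample" | jq -c '.data.posts[] | {id: .slug, formatted_title: ("THIS IS A TITLE - " + .title)}')

printf '%s\n' "$formatted"
```

This prints one object per post, each with only the id and formatted_title keys.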

The not so basics

This tool has a bunch of very useful functions available; we’ll go over a few of them.

From now on, the curl command is omitted to keep the code blocks concise. Each snippet expects the JSON on standard input.

Delete node

Use it to clear out unwanted noise:

jq 'del(.data.posts[].slug)'
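To see what del() leaves behind, here is a small sketch against a hypothetical sample (the slug and title values are made up for illustration). Note that del() returns the whole document minus the deleted paths, so the data and posts nodes survive.

```shell
# Hypothetical sample with the same shape the filter expects.
sample='{"data":{"posts":[{"slug":"abc123","title":"Short title"},{"slug":"def456","title":"Another title"}]}}'

# Remove the slug key from every post; everything else is kept as-is.
without_slug=$(printf '%s' "$sample" | jq -c 'del(.data.posts[].slug)')

printf '%s\n' "$without_slug"
```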

Filter data

Select only the entries that match the given condition:

jq '[.data.posts[] | select(.title | length > 30)]'
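As a quick check of how select() behaves, the sketch below runs the same filter on a hypothetical sample with one short and one long title. The sample data is an assumption for illustration only.

```shell
# Hypothetical sample: one title under 30 characters, one over.
sample='{"data":{"posts":[{"slug":"abc123","title":"Short title"},{"slug":"def456","title":"A considerably longer title for testing filters"}]}}'

# select() passes a post through only when the condition is true;
# the surrounding [] collects the survivors into an array.
long_titles=$(printf '%s' "$sample" | jq -c '[.data.posts[] | select(.title | length > 30)]')

printf '%s\n' "$long_titles"
```

Only the post with the long title should remain in the resulting array.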

Add a node

You can add nodes to the JSON:

jq '.data.posts[] | (. + {hi: "mom"})'

Conditional logic

Following the previous example, we can use if expressions to add a node with variable content.

Here we create a new node called IS_VALID with the value "Too short!" or "yes" depending on the length of the .title.

jq '.data.posts[] | (. + {IS_VALID: (if .title | length < 30 then "Too short!" else "yes" end)})'

Perhaps more useful, we can add the new node or not depending on the condition:

jq '.data.posts[] | (if .title | length > 30 then . + {IS_VALID: true} else . end)'
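A small sketch of the conditional version, again on a hypothetical sample: only the post whose title is longer than 30 characters should gain the IS_VALID key, while the other passes through unchanged via the else branch.

```shell
# Hypothetical sample: one short title, one long title.
sample='{"data":{"posts":[{"slug":"abc123","title":"Short title"},{"slug":"def456","title":"A considerably longer title for testing filters"}]}}'

# Add IS_VALID only when the title is longer than 30 characters.
flagged=$(printf '%s' "$sample" | jq -c '.data.posts[] | (if .title | length > 30 then . + {IS_VALID: true} else . end)')

printf '%s\n' "$flagged"
```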

Group by

Group nodes by values using group_by():

jq '[.data.posts[] | (. + {IS_VALID: (if .title | length < 30 then "Too short!" else "yes" end)})] | group_by(.IS_VALID)'

Notice how in this case we create a new array with the data before sending it to group_by().
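The array wrapper matters because group_by() takes an array as input and returns an array of groups. The sketch below uses a hypothetical three-post sample and adds one extra step of my own (the map at the end, with made-up key names status and slugs) to summarize each group so the result is easier to read.

```shell
# Hypothetical sample: two short titles and one long one.
sample='{"data":{"posts":[{"slug":"abc123","title":"Short title"},{"slug":"ghi789","title":"Hi"},{"slug":"def456","title":"A considerably longer title for testing filters"}]}}'

grouped=$(printf '%s' "$sample" | jq -c '
  # Tag every post, collecting the results into an array for group_by().
  [.data.posts[] | (. + {IS_VALID: (if .title | length < 30 then "Too short!" else "yes" end)})]
  | group_by(.IS_VALID)
  # Illustrative summary: one object per group with its key and member slugs.
  | map({status: .[0].IS_VALID, slugs: map(.slug)})
')

printf '%s\n' "$grouped"
```

The groups come back sorted by the grouping value, so the "Too short!" group is listed before the "yes" group.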

Sort by length

Sorting is also possible, and the result can be reversed if needed:

jq '[.data.posts[] | (. + {len: (.title | length)})] | sort_by(.len) | reverse'

Notice that we add a .len node with the result of passing .title to the length built-in function.
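Here is a short sketch of the sort on a hypothetical three-post sample. The final map({slug, len}) is an addition of mine to trim the output down to the two keys that matter for the comparison.

```shell
# Hypothetical sample with titles of three different lengths.
sample='{"data":{"posts":[{"slug":"abc123","title":"Short title"},{"slug":"ghi789","title":"Hi"},{"slug":"def456","title":"A considerably longer title for testing filters"}]}}'

# Add the title length as .len, sort ascending by it, then reverse for longest-first.
sorted=$(printf '%s' "$sample" | jq -c '[.data.posts[] | (. + {len: (.title | length)})] | sort_by(.len) | reverse | map({slug, len})')

printf '%s\n' "$sorted"
```

The post with the longest title should come first and the one titled "Hi" last.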

Modify in place

So far we’ve only output the contents of the posts array, dropping the enclosing data and posts nodes in the process.

This might be what you want, but in some cases one needs to modify the data ‘in place’, keeping the original data structure. (The input itself is never touched; only what jq outputs changes.)

This can be done by swapping the pipe operator (|) for the update-assignment operator (|=). Take this simple example from before:

jq '.data.posts[] | (. + {hi: "mom"})'

If we wanted to modify the original data structure including the data and posts node names, we could instead do:

jq '.data.posts[] |= (. + {hi: "mom"})'
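The difference is easiest to see side by side. This sketch runs both variants on a hypothetical one-post sample (the slug value is made up for illustration).

```shell
# Hypothetical one-post sample.
sample='{"data":{"posts":[{"slug":"abc123"}]}}'

# Plain pipe: outputs only the modified posts, without the outer structure.
piped=$(printf '%s' "$sample" | jq -c '.data.posts[] | (. + {hi: "mom"})')

# Update-assignment: outputs the whole document with each post modified.
updated=$(printf '%s' "$sample" | jq -c '.data.posts[] |= (. + {hi: "mom"})')

printf 'pipe:   %s\n' "$piped"
printf 'update: %s\n' "$updated"
```

The first line shows a bare post object, while the second still has the data and posts nodes wrapped around it.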

Handle other file types with yq

Since this is so useful, someone took the time to create yq (as in YAML query). It doesn’t just handle YAML files, but also JSON, XML, CSV and TSV.

Not only that, you can easily use this application to convert one file type into another!
Check the docs to find out more.

Keep in mind that, apart from what is shown below, all the previous operations can be applied to any of these file types.
Since yq uses syntax similar to jq’s, I’ll leave those operations out of the examples to keep things simple.

This is just a quick overview of how you might want to use the tool; it can achieve much more than I’m showing here.

YAML to other types

For a your_cool.yaml file with this structure:

pets:
    cat:
        - purrs
        - meows

The command yq -o xml '.' your_cool.yaml outputs it as XML:

<pets>
  <cat>purrs</cat>
  <cat>meows</cat>
</pets>

Or you can run yq -o json '.' your_cool.yaml to get JSON instead:

{
    "pets": {
        "cat": ["purrs", "meows"]
    }
}

Any input, any output

Say you have a your_cool.csv file with this structure:

name,numberOfCats,likesApples,height
Gary,1,true,168.8
Samantha's Rabbit,2,false,-188.8

Convert it to YAML with yq -o yaml -p csv '.' your_cool.csv:

- name: Gary
  numberOfCats: 1
  likesApples: true
  height: 168.8
- name: Samantha's Rabbit
  numberOfCats: 2
  likesApples: false
  height: -188.8

Again, use the -o flag to change the output format. Running yq -o json -p csv '.' your_cool.csv gives:

[
    {
        "name": "Gary",
        "numberOfCats": 1,
        "likesApples": true,
        "height": 168.8
    },
    {
        "name": "Samantha's Rabbit",
        "numberOfCats": 2,
        "likesApples": false,
        "height": -188.8
    }
]

Notice the use of the -p flag to indicate the input format, since by default yq expects YAML.
