1. Introduction

The JavaScript Object Notation (JSON) format has been an IETF informational standard since 2006 (RFC 4627). It’s widely used for storing structured data in fields such as data analytics.

In this tutorial, we’ll explore various ways to convert a JSON dataset to a table using jq. First, we’ll download a public JSON dataset. Subsequently, we’ll use options of the jq command to transform that dataset into tables with different formats.

We’ll use an Ubuntu 22.04 environment for running the examples.

2. Sample JSON File

First, we’ll download a sample JSON file:

$ curl 'http://api.worldbank.org/v2/countries/IND/indicators/SP.POP.TOTL?format=json' -o ind.json
$ cat ind.json | jq -r '.[1][1]'
{ 
  "indicator": { 
    "id": "SP.POP.TOTL", 
    "value": "Population, total" 
  }, 
  "country": { 
    "id": "IN", 
    "value": "India" 
  }, 
  "countryiso3code": "IND", 
  "date": "2021", 
  "value": 1407563842, 
  "unit": "", 
  "obs_status": "", 
  "decimal": 0 
}

As illustrated above, we use the curl command to download a public JSON dataset. The dataset consists of the population of India since 1960. The curl command has one argument and one option:

  • http://api.worldbank.org/v2/countries/IND/indicators/SP.POP.TOTL?format=json is the URL to download from
  • -o ind.json specifies the output file path

We inspect the downloaded file using the cat and jq commands:

  • -r writes the output as a raw string without escaping quote characters
  • '.[1][1]' is a filter that selects the element at index 1 of the array at index 1 of the top-level array, i.e., the second record of the second top-level array

We’ll use this sample dataset for exploring the various options of the jq command.
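Before moving on, it helps to confirm the dataset’s shape: the top-level value is an array whose first element holds paging metadata and whose second element is the array of yearly records. The following is a self-contained sketch using a miniature stand-in file with hypothetical values, so it runs without the download:

```shell
# mini.json mimics the [metadata, records] layout of the World Bank response
cat > mini.json <<'EOF'
[
  { "page": 1, "total": 2 },
  [
    { "country": { "id": "IN" }, "date": "2022", "value": 1417173173 },
    { "country": { "id": "IN" }, "date": "2021", "value": 1407563842 }
  ]
]
EOF
jq '.[1] | length' mini.json    # number of yearly records: 2
jq -r '.[1][1].date' mini.json  # date of the second record: 2021
```

The same two filters work against the real ind.json, where .[1] holds one record per year.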

3. CSV Data

The jq filter syntax supports extracting the data and provides directives for transforming JSON into CSV:

$ cat ind.json |
  jq -r '.[1][] | [.country.id, .date, .value] | @csv' |
  head
"IN","2022",1417173173 
"IN","2021",1407563842 
"IN","2020",1396387127 
"IN","2019",1383112050 
"IN","2018",1369003306 
"IN","2017",1354195680 
"IN","2016",1338636340 
"IN","2015",1322866505 
"IN","2014",1307246509 
"IN","2013",1291132063

As we can see above, the command performs several actions, using head to show only the first ten rows of the output:

  • .[1][] iterates over all the elements of the second array in the dataset
  • [.country.id, .date, .value] extracts individual fields and creates an array for each record
  • we pipe each array to the @csv directive to generate a line of CSV output, and -r prints it as raw text

In summary, we created a CSV file from a list of JSON objects.
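As a quick check of how @csv renders the different types, we can run a one-off filter with no input at all (a standalone sketch, not tied to the dataset):

```shell
# @csv double-quotes string fields but leaves numbers bare,
# which is why the population column above has no quotes
jq -rn '["IN", "2022", 1417173173] | @csv'
# "IN","2022",1417173173
```

This explains why the date appears quoted in our output: the API delivers it as a JSON string, while the population is a JSON number.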

4. Custom Table Header

We’ll explore adding a header row to our table. This can be helpful to better understand the dataset.

4.1. CSV With Custom Header

First, we’ll add a header while converting to a CSV file:

$ cat ind.json |
  jq -r '["country","year","population"], (.[1][] | [.country.id, .date,.value]) | @csv' |
  head
"country","year","population"
"IN","2022",1417173173
"IN","2021",1407563842
"IN","2020",1396387127
"IN","2019",1383112050
"IN","2018",1369003306
"IN","2017",1354195680
"IN","2016",1338636340
"IN","2015",1322866505
"IN","2014",1307246509
"IN","2013",1291132063

Here, the jq command prepends a custom header ["country","year","population"] to the output.
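The trick is jq’s comma operator: f, g runs both filters and emits both outputs, and the low-precedence pipe then sends every output through @csv. A minimal sketch with placeholder values:

```shell
# The comma operator emits the header array and the data array
# as separate outputs, so each becomes its own CSV row
jq -rn '["h1","h2"], ["v1","v2"] | @csv'
# "h1","h2"
# "v1","v2"
```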

4.2. Tab Separated Table With Custom Header

Next, we’ll use the tab character as a separator of the fields in the generated table:

$ cat ind.json |
  jq -r '["country","year","population"], (.[1][] | [.country.id, .date,.value]) | @tsv' |
  head
country year population 
IN 2022 1417173173 
IN 2021 1407563842 
IN 2020 1396387127 
IN 2019 1383112050 
IN 2018 1369003306 
IN 2017 1354195680 
IN 2016 1338636340 
IN 2015 1322866505 
IN 2014 1307246509 

The jq command uses the @tsv directive instead of @csv to separate the fields with a tab character.
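One caveat worth knowing: @tsv joins the fields with literal tab characters and escapes any tabs or newlines embedded in the data, so each record always stays on a single line. A standalone sketch:

```shell
# An embedded tab is escaped to the two characters \t,
# while the field separators are real tabs
jq -rn '["a\tb", "c"] | @tsv'
```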

Finally, let’s add dashes under the header row based on the length of each header field:

$ cat ind.json |
  jq -r '(["country","year","population"] | (., map(length*"-"))), (.[1][] | [.country.id, .date,.value]) | @tsv' |
  head
country year population
------- ---- ----------
IN      2022 1417173173
IN      2021 1407563842
IN      2020 1396387127
IN      2019 1383112050
IN      2018 1369003306
IN      2017 1354195680
IN      2016 1338636340
IN      2015 1322866505

In summary, the filter (., map(length*"-")) first outputs the header array unchanged, then an array in which each header field is replaced by a run of dashes of the same length. Another key point is that map(f) is equivalent to [.[] | f], so map and .[] achieve the same result in this case.
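We can verify that equivalence with a standalone sketch:

```shell
# map(f) is shorthand for [.[] | f]; both replace each header
# field with a run of dashes of matching length
jq -cn '["country","year"] | map(length * "-")'
jq -cn '["country","year"] | [.[] | length * "-"]'
# both print ["-------","----"]
```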

5. Table With Pipe Separators

Likewise, we can use the column command to separate fields using the pipe character:

$ cat ind.json |
  jq -r '["country","year","population"], (.[1][] | [.country.id, .date,.value]) | @tsv' |
  column -t -o '|' |
  head
country|year|population 
IN |2022|1417173173 
IN |2021|1407563842 
IN |2020|1396387127 
IN |2019|1383112050 
IN |2018|1369003306 
IN |2017|1354195680 
IN |2016|1338636340 
IN |2015|1322866505 
IN |2014|1307246509

As shown above, the column command has a couple of options:

  • -t tells column to build a table, splitting the input fields on whitespace (which covers our tab separators) and aligning the columns
  • -o '|' sets the pipe character as the separator between output columns

In summary, combining the jq and column commands provides a powerful way of formatting tables.
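The column step works on any tab-separated input, with or without jq. Here is a minimal sketch, assuming the util-linux column shipped with Ubuntu (the BSD variant lacks -o):

```shell
# column pads each field to the width of its column,
# then joins the columns with the -o separator
printf 'country\tyear\nIN\t2022\n' | column -t -o '|'
```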

6. Converting Command Output to Tables

At this point, we’ll use the output of the df command to build a table from the disk usage information:

$ df -P |
  jq -Rs 'split("\n")[1:-1] | map(split(" +";"")) | map({FileSystem: .[0], Blocks: .[1], Used: .[2], Available: .[3]})'
[
  {
    "FileSystem": "tmpfs",
    "Blocks": "1091488",
    "Used": "1856",
    "Available": "1089632"
  },
  {
    "FileSystem": "/dev/sda3",
    "Blocks": "97977620",
    "Used": "26216284",
    "Available": "66738208"
  }
]

In this case, the jq command transforms the output of the df command into an array of JSON objects, in several steps:

  • -Rs reads the entire input as one raw string instead of parsing it as JSON
  • split("\n") breaks the input into an array of lines, using the newline as a separator
  • [1:-1] drops the header line and the empty element left after the final newline
  • map(split(" +";"")) operates on each line and breaks it into separate fields
  • map({FileSystem: .[0], Blocks: .[1], Used: .[2], Available: .[3]}) creates a JSON object with keys that match the df -P column headers
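The split(" +"; "") step is the workhorse here: its first argument is a regular expression and the second is a flags string. A standalone sketch with a sample line:

```shell
# Runs of spaces collapse into single field boundaries
jq -cn '"tmpfs   1091488  1856" | split(" +"; "")'
# ["tmpfs","1091488","1856"]
```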

Next, we’ll convert the above JSON to a table:

$ df -P |
  jq -Rs 'split("\n")[1:-1] | map(split(" +";"")) | map({FileSystem: .[0], Blocks: .[1], Used: .[2], Available: .[3]})' > disk.json
$ cat disk.json | jq -r '["FileSystem","Blocks","Used"], (.[] | [.FileSystem,.Blocks,.Used]) | @tsv' |
  column -t -o '|'
FileSystem|Blocks   |Used
tmpfs     |1091488  |1864
/dev/sda3 |97977620 |26216476
tmpfs     |5457432  |0
tmpfs     |5120     |4
/dev/sda2 |524252   |6216
Scratch   |497775612|131487316
tmpfs     |1091484  |108
/dev/sr0  |51806    |51806

Similar to our earlier example with pipe separators, we create a jq pipeline with several steps:

  • we generate an intermediate disk.json file to store the output of the df command in JSON format
  • we pipe the disk.json file to a jq filter
  • '["FileSystem","Blocks","Used"], (.[] | [.FileSystem,.Blocks,.Used]) | @tsv' generates a tab-separated table with a custom header
  • column transforms the table using the pipe character as the field separator

Thus, we transformed the line-oriented output of the df command into a sequence of JSON objects. Subsequently, we converted that list of JSON objects into a table.
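To exercise the whole pipeline end to end without depending on the local disks, we can feed it a canned df -P line. This is a self-contained sketch with hypothetical values; the object keys follow df -P’s own column headers:

```shell
# Simulated df -P output: a header line plus one data row
printf 'Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/sda3 97977620 26216284 66738208 29%% /\n' |
  jq -Rs 'split("\n")[1:-1] | map(split(" +";"")) |
    map({FileSystem: .[0], Blocks: .[1], Used: .[2], Available: .[3]})' |
  jq -r '.[] | [.FileSystem, .Used, .Available] | @tsv'
```

The result is one tab-separated row per data line, ready for column or any other downstream formatter.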

7. Conclusion

In this article, we learned a few ways to transform a sequence of JSON objects into a table format using jq.

Firstly, we used the sample dataset to create a CSV file from the separate JSON objects. Secondly, we enhanced the table with a header, underlined with dashes. Lastly, we saw an advanced example of converting the Linux df command output into a table.

In summary, jq is a powerful text processing tool to reshape JSON objects to meet our use cases.
