Bruce Momjian: Postgres Blog


This blog is about my work on the Postgres open source database, and is published on Planet PostgreSQL. PgLife allows monitoring of all Postgres community activity.

Monday, July 13, 2020

The popular json and jsonb data types are more than just a collection of single-data-type values like arrays. They can contain multiple data types:

CREATE TABLE test(x JSONB); INSERT INTO test VALUES ('"abc"'), ('5'), ('true'), ('null'), (NULL); \pset null (null) SELECT x, jsonb_typeof(x), pg_typeof(x) FROM test; x | jsonb_typeof | pg_typeof
--------+--------------+----------- "abc" | string | jsonb 5 | number | jsonb true | boolean | jsonb null | null | jsonb (null) | (null) | jsonb

You can see I stored a string, number, boolean, json null, and an sql null in a single jsonb column, and Postgres knew the type of each value in the jsonb column, as shown by jsonb_typeof(). However, pg_typeof() still sees all these as jsonb values, and that is the type exposed to anything referencing the json column.

Here is another example that uses #>> to convert all json values to text:

SELECT x, x::TEXT, x#>>'{}', pg_typeof(x#>>'{}') FROM test; x | x | ?column? | pg_typeof
--------+--------+----------+----------- "abc" | "abc" | abc | text 5 | 5 | 5 | text true | true | true | text null | null | (null) | text (null) | (null) | (null) | text

(I am using #>> rather than the more popular ->> because it lets me access json values that are not associated with keys.) Notice that using :: to cast to text retains the double-quotes, while #>> removes them. (->> does also.) However, the return value for #>> did not change based on the json contents — it always returned text or sql null. Also notice that :: casting returns the string "null" for the jsonb null value, while #>> returns an sql null. (I used psql's \pset null above.)

There is a fundamental problem with the interaction between jsonb and sql data types. Postgres knows the data type of each json value inside the jsonb field (as shown by jsonb_typeof), but each field can have multiple json values inside, and each row can be different as well. Therefore, you only have two choices for interfacing jsonb to sql data types:

  1. Use ->> or #>> to map all values to text
  2. Cast values to an sql data type

For number one, all data types can be converted to text, so it is a simple solution, though it doesn't work well if you need to process the values in a non-textual way, e.g., a numeric comparison. You can cast jsonb values to any sql data type (number two) as long as all jsonb values can be cast successfully. For example:

DELETE FROM test; -- all inserted values must cast to a jsonb type
INSERT INTO test VALUES ('"1"'), ('2'), ('3e1'), ('4f1'), ('true'), ('null'), (NULL);
ERROR: invalid input syntax for type json
LINE 1: ...NSERT INTO test VALUES ('"1"'), ('2'), ('3e1'), ('4f1'), (... ^
DETAIL: Token "4f1" is invalid.
CONTEXT: JSON data, line 1: 4f1 -- 4e1 uses exponential notation
INSERT INTO test VALUES ('"1"'), ('2'), ('3e1'), ('4e1'), ('true'), ('null'), (NULL); SELECT x, x::TEXT, x#>>'{}', jsonb_typeof(x), pg_typeof(x#>>'{}') FROM test; x | x | ?column? | jsonb_typeof | pg_typeof
--------+--------+----------+--------------+----------- "1" | "1" | 1 | string | text 2 | 2 | 2 | number | text 30 | 30 | 30 | number | text 40 | 40 | 40 | number | text true | true | true | boolean | text null | null | (null) | null | text (null) | (null) | (null) | (null) | text -- all values can't be cast to numeric
SELECT x, x::TEXT, x#>>'{}', x::NUMERIC * 5, jsonb_typeof(x)
FROM test;
ERROR: cannot cast jsonb string to type numeric -- all values of jsonb type 'number' can be cast to numeric
SELECT x, x::TEXT, x#>>'{}', x::NUMERIC * 5, jsonb_typeof(x)
FROM test
WHERE jsonb_typeof(x) = 'number'; x | x | ?column? | ?column? | jsonb_typeof
----+----+----------+----------+-------------- 2 | 2 | 2 | 10 | number 30 | 30 | 30 | 150 | number 40 | 40 | 40 | 200 | number -- Use #>> to remove double-quotes from the jsonb string
SELECT x, x::TEXT, x#>>'{}', (x#>>'{}')::NUMERIC * 5, jsonb_typeof(x)
FROM test
WHERE jsonb_typeof(x) = 'number' OR jsonb_typeof(x) = 'string'; x | x | ?column? | ?column? | jsonb_typeof
-----+-----+----------+----------+-------------- "1" | "1" | 1 | 5 | string 2 | 2 | 2 | 10 | number 30 | 30 | 30 | 150 | number 40 | 40 | 40 | 200 | number

The first insert fails because 4f1 isn't double-quoted and can't be cast to a json numeric, but 4e1 can because it represents exponential notation. In trying to cast all values to numeric in the select, the "1" has double-quotes, so it fails. Using #>> removes the double quotes and allows the string value to be cast to numeric too. This discussion exposes the confusion of using json without casting.

I have been using single jsonb values, but the same behavior happens with jsonb documents:

DELETE FROM test; -- create document with keys 'a' and 'b'
INSERT INTO test VALUES ('{"a": "xyz", "b": 5}'); -- access key 'a'
SELECT x->'a', jsonb_typeof(x->'a'), pg_typeof(x->'a'), (x->'a')::TEXT, x->>'a', pg_typeof(x->>'a') FROM test; ?column? | jsonb_typeof | pg_typeof | text | ?column? | pg_typeof
----------+--------------+-----------+-------+----------+----------- "xyz" | string | jsonb | "xyz" | xyz | text -- access key 'b'
SELECT x->'b', jsonb_typeof(x->'b'), pg_typeof(x->'b'), (x->'b')::TEXT, x->>'b', pg_typeof(x->>'b') FROM test; ?column? | jsonb_typeof | pg_typeof | text | ?column? | pg_typeof
----------+--------------+-----------+------+----------+----------- 5 | number | jsonb | 5 | 5 | text

So, in summary:

  • If you want to use json values as text, use ->> or #>>
  • If you want to cast to another sql data type, use :: or cast to cast, but be sure all values can be cast to the new sql data type