
Json Library Interprets Space Characters As "\xa0"

When I load a JSON file into Python, there's no problem with encodings as long as the file is treated as a string. However, loading the file as JSON, either using json.load

Solution 1:

Assuming that:

  1. Your JSON file is valid and uses UTF-8 as the encoding.
  2. Your JSON file contains keys with non-breaking spaces.

Then the output you get is perfectly correct.
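If you are not sure whether the second assumption holds for your file, a quick check helps. The sketch below is only an illustration: the helper name keys_with_nbsp and the sample JSON text are made up, and it assumes the data has already been parsed into ordinary dicts and lists. It walks the parsed data and collects every key that contains U+00A0:

```python
import json

NBSP = "\u00a0"  # the non-breaking space character, U+00A0

def keys_with_nbsp(obj):
    """Recursively collect dictionary keys that contain a non-breaking space."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if NBSP in key:
                found.append(key)
            found.extend(keys_with_nbsp(value))  # also search nested values
    elif isinstance(obj, list):
        for item in obj:
            found.extend(keys_with_nbsp(item))
    return found

# Hypothetical sample: two keys contain U+00A0, "last name" uses a regular space.
text = '{"first\u00a0name": 1, "last name": {"nested\u00a0key": 2}}'
data = json.loads(text)
print(keys_with_nbsp(data))
```

Printing the returned list shows the keys as 'first\xa0name' and 'nested\xa0key', because printing a list also calls repr() on each element.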

The first piece of code reads and prints strings:

# json_path is the path to your JSON file
with open(json_path, encoding="utf-8") as f:
    lines = f.readlines()
    for line in lines:
        print(line)

When you print a string, its characters are written out unchanged, so the non-breaking spaces look exactly like regular spaces.
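The difference between printing a string and looking at its repr() can be seen with a single made-up string that contains one non-breaking space:

```python
# A string containing a non-breaking space (U+00A0) between the two words.
s = "hello\u00a0world"

print(s)                   # the NBSP is rendered like an ordinary space
print(repr(s))             # repr() escapes it: 'hello\xa0world'
print(s == "hello world")  # False: the characters really differ
```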

The second piece of code parses the JSON file, thereby creating a dictionary, and then prints the dictionary keys. For simplicity of explanation, let's assume the dictionary itself is printed (instead of the keys):

import json

with open(json_path, encoding="utf-8") as f:
    data = json.load(f)
    print(data)
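To reproduce both behaviours without an existing file, the following sketch writes a small made-up UTF-8 JSON file to a temporary location (the key and value are invented for the demonstration), then reads it once as text and once with json.load:

```python
import json
import os
import tempfile

# Write a small UTF-8 JSON file whose only key contains a non-breaking space.
content = '{"first\u00a0name": "Ada"}'
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".json", delete=False) as tmp:
    tmp.write(content)
    json_path = tmp.name

# Reading the raw text: print() shows the NBSP like a regular space.
with open(json_path, encoding="utf-8") as f:
    print(f.read())

# Parsing it: printing the dict formats the key with repr(), hence \xa0.
with open(json_path, encoding="utf-8") as f:
    data = json.load(f)
print(data)

os.remove(json_path)  # clean up the temporary file
```

The parsed key is the same string in both cases; only the way it is displayed differs.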

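Calling print with a dictionary as an argument invokes the dictionary's __str__ method, which for dict produces the same result as __repr__. It uses its own rules to format the output: it encloses the dictionary in braces, separates entries with commas, and formats every key and value with repr(), which is what adds the single quotes around strings.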

If you study the output, you will find that printing a dictionary produces valid Python code for that dictionary (at least when it only contains strings, numbers and similar literals).
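One way to convince yourself of this is to parse the printed form back with ast.literal_eval from the standard library. The dictionary below is a made-up example containing only string and integer literals; for dictionaries holding arbitrary objects the round trip does not work:

```python
import ast

d = {"first\u00a0name": "Ada", "age": 36}
text = str(d)  # the same text that print(d) writes
print(text)

# literal_eval safely parses the printed form back into an equal dictionary.
restored = ast.literal_eval(text)
print(restored == d)  # True
```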

In Python string literals, certain characters need to be escaped, and an escape sequence starts with a backslash. A typical example is the newline character:

d = {'line1\nline2': 3}
print(d)

Output:

{'line1\nline2': 3}

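The same thing happens to non-breaking spaces. When repr() formats a string, it escapes every character that str.isprintable() reports as non-printable. The non-breaking space (U+00A0) is classified as a space separator in Unicode, and the only separator Python treats as printable is the ASCII space, so U+00A0 is escaped. This is not strictly necessary, but it keeps the character visually distinguishable from a regular space. The escape sequence Python uses for it is \xa0.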
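The classification can be inspected directly with the standard unicodedata module. The final line is an optional extra, not part of the original answer: if your application really needs ordinary spaces in the keys, replacing the character explicitly is one simple option.

```python
import unicodedata

nbsp = "\u00a0"
print(unicodedata.name(nbsp))      # NO-BREAK SPACE
print(unicodedata.category(nbsp))  # Zs (space separator)
print(nbsp.isprintable())          # False, so repr() escapes it
print(" ".isprintable())           # True: the ASCII space is the exception
print(repr(nbsp))                  # '\xa0'

# Optional: turn the NBSP into a regular space if that is what you need.
print("first\u00a0name".replace(nbsp, " "))  # first name
```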

So everything works as designed. It's a feature, not a bug.
