SyntaxStudy

String Encoding

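Python 3 strings (str) are sequences of Unicode code points. Files and network sockets deal in bytes, so understanding how text is encoded to bytes and decoded back is essential when reading or writing files and communicating over networks.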

Unicode Strings

s = "Hello, 世界! 🌍"
print(len(s))          # 12 (characters, not bytes)
print(s[7])            # 世
print(ord("A"))        # 65
print(chr(65))         # A
print("\u0041")        # A (Unicode escape)
print("\U0001F30D")    # 🌍

Encoding to Bytes

s = "Hello"
b = s.encode("utf-8")
print(b)           # b'Hello'
print(type(b))     # bytes
print(b[0])        # 72 (ASCII code for H)

Decoding from Bytes

data = b"caf\xc3\xa9"
text = data.decode("utf-8")
print(text)   # café
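
Decoding only works when the codec matches the one used to encode. As a minimal sketch of what goes wrong otherwise, the snippet below decodes the same UTF-8 bytes as Latin-1 and then reverses the mistake. The variable names are illustrative only.

```python
data = b"caf\xc3\xa9"            # "café" encoded as UTF-8

# Decoding with the wrong codec does not fail here; it silently
# produces garbled text, because every byte is valid Latin-1.
wrong = data.decode("latin-1")
print(wrong)                     # cafÃ©

# The damage is reversible as long as no bytes were lost:
# re-encode with the wrong codec, then decode with the right one.
repaired = wrong.encode("latin-1").decode("utf-8")
print(repaired)                  # café
```

The repair step only works because Latin-1 maps every byte to exactly one character; with other codecs the round trip may not be lossless.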

Common Encodings

s = "café"
print(s.encode("utf-8"))    # b'caf\xc3\xa9' (5 bytes)
print(s.encode("latin-1"))  # b'caf\xe9' (4 bytes)
print(s.encode("ascii", errors="replace"))  # b'caf?'
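
The errors argument used above has several standard values. The following sketch compares a few of them for both encoding and decoding; it assumes nothing beyond the built-in str and bytes methods.

```python
s = "café"

# The default handler is "strict": unencodable characters raise an error.
try:
    s.encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# Other handlers trade fidelity for robustness.
replaced = s.encode("ascii", errors="replace")           # b'caf?'
ignored = s.encode("ascii", errors="ignore")             # b'caf'
escaped = s.encode("ascii", errors="backslashreplace")   # b'caf\\xe9'
print(replaced, ignored, escaped)

# Decoding accepts the same argument. b"caf\xe9" is Latin-1, not valid UTF-8.
bad = b"caf\xe9"
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)

lenient = bad.decode("utf-8", errors="replace")          # 'caf\ufffd'
print(lenient)
```

Handlers such as "replace" and "ignore" discard information, so they are best reserved for display or logging rather than for data that must round-trip.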

Example

s = "Python 🐍"
encoded = s.encode("utf-8")
print(encoded)           # b'Python \xf0\x9f\x90\x8d'
print(len(s))            # 8 (characters)
print(len(encoded))      # 11 (bytes; the emoji takes 4)
decoded = encoded.decode("utf-8")
print(decoded == s)      # True

Pro Tip

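Always specify the encoding explicitly when opening files in text mode: open("file.txt", encoding="utf-8"). Never rely on the system default, which depends on the platform and locale, so the same code can behave differently on another machine.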
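
A minimal sketch of the tip in practice: write text with an explicit encoding, read it back, and inspect the raw bytes. The file name example.txt and the temporary directory are assumptions made only for this illustration.

```python
import os
import tempfile

text = "Hello, 世界! 🌍"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, created inside a throwaway directory.
    path = os.path.join(tmp, "example.txt")

    # Text mode with an explicit encoding for both writing and reading.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(round_trip == text)        # True
print(len(text), len(raw))       # 12 19
```

The character count and byte count differ because each CJK character takes 3 bytes in UTF-8 and the emoji takes 4.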