String Encoding
Python 3 strings (str) are sequences of Unicode code points. Encoding turns a str into bytes, and decoding turns bytes back into a str. You need both whenever you read or write files or communicate over networks.
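The file case can be sketched as follows. This is a minimal example that assumes a throwaway temporary directory and an illustrative file name, example.txt. Passing encoding= explicitly avoids relying on open()'s default encoding, which can vary with platform and locale settings.

```python
import os
import tempfile

text = "café 世界"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # illustrative file name

    # Text mode with an explicit encoding: str in, UTF-8 bytes on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode skips decoding and returns the raw bytes that were written.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with the same encoding decodes those bytes back into a str.
    with open(path, encoding="utf-8") as f:
        restored = f.read()

print(len(text), len(raw))                            # 7 12
print(raw == text.encode("utf-8"), restored == text)  # True True
```

The text has 7 characters but occupies 12 bytes on disk, because é, 世 and 界 each need more than one byte in UTF-8.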
Unicode Strings
s = "Hello, 世界! 🌍"
print(len(s))  # 12 (code points, not bytes)
print(s[7]) # 世
print(ord("A")) # 65
print(chr(65)) # A
print("\u0041")  # A (16-bit Unicode escape)
print("\U0001F30D")  # 🌍 (32-bit Unicode escape)
Encoding to Bytes
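UTF-8 is a variable-width encoding, so the byte length of an encoded string generally differs from its character count. Below is a minimal sketch that reuses the string from the Unicode Strings example. The helper utf8_width is a hypothetical name chosen for illustration.

```python
def utf8_width(ch: str) -> int:
    """Hypothetical helper: number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

s = "Hello, 世界! 🌍"
print(len(s))                  # 12 code points
print(len(s.encode("utf-8")))  # 19 bytes

# ASCII takes 1 byte, 世 (U+4E16) takes 3, and 🌍 (U+1F30D) takes 4.
for ch in "A世🌍":
    print(ch, f"U+{ord(ch):04X}", utf8_width(ch), ch.encode("utf-8"))
```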
s = "Hello"
b = s.encode("utf-8")
print(b) # b'Hello'
print(type(b)) # bytes
print(b[0])  # 72 (ASCII code for H)
Decoding from Bytes
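decode() only restores the original text when it is given the same codec that produced the bytes. The following minimal sketch shows a correct round trip, a mismatched codec, and an invalid byte sequence.

```python
data = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Same codec on both sides: the round trip is lossless.
print(data.decode("utf-8"))    # café

# Wrong codec: latin-1 maps every byte to some character, so this "succeeds"
# but produces mojibake instead of raising an error.
print(data.decode("latin-1"))  # cafÃ©

# b"caf\xe9" is the latin-1 encoding of "café"; in UTF-8, 0xE9 starts a
# multi-byte sequence that never completes, so strict decoding raises.
bad = b"caf\xe9"
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)

# errors="replace" substitutes U+FFFD (the replacement character) instead.
print(bad.decode("utf-8", errors="replace"))  # caf�
```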
data = b"caf\xc3\xa9"
text = data.decode("utf-8")
print(text)  # café
Common Encodings
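The number of bytes a string needs depends on the codec. The minimal sketch below compares a few of Python's built-in codecs (the selection is arbitrary). The -le variants are used so that the UTF-16 and UTF-32 output contains no byte-order mark.

```python
s = "café"
sizes = {}

for codec in ("ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"):
    try:
        sizes[codec] = len(s.encode(codec))
    except UnicodeEncodeError:
        sizes[codec] = None  # this codec has no byte sequence for "é"

print(sizes)
# {'ascii': None, 'latin-1': 4, 'utf-8': 5, 'utf-16-le': 8, 'utf-32-le': 16}
```

Plain "utf-16" and "utf-32" prepend a byte-order mark, which adds 2 and 4 bytes respectively.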
s = "café"
print(s.encode("utf-8"))  # b'caf\xc3\xa9' (5 bytes)
print(s.encode("latin-1"))  # b'caf\xe9' (4 bytes)
print(s.encode("ascii", errors="replace")) # b'caf?'
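errors="replace" is one of several error handlers that str.encode() accepts. The minimal sketch below compares the standard handlers on the same string. The default handler, "strict", raises UnicodeEncodeError instead.

```python
s = "café"
handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")

# ASCII has no byte for "é"; each handler deals with it differently.
results = {h: s.encode("ascii", errors=h) for h in handlers}
for handler, encoded in results.items():
    print(f"{handler:>17}: {encoded!r}")

# Without an errors argument, encoding is "strict" and raises.
try:
    s.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # ordinal not in range(128)
```

"ignore" and "replace" discard the original character. "xmlcharrefreplace" and "backslashreplace" keep a textual representation of its code point.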