Today I was hit with a strange error in my script:
'utf8' codec can't decode byte 0xc3 in position 21: invalid continuation byte
I’m reading data from a socket with sock.recv(), and the result is buff.decode('utf-8'), where buff is the returned data.
But today I ran into a real “unicorn”: one of the characters came back as “▒”, and that is what makes decode('utf-8') throw the exception. Is there some preprocessing step that would either remove or replace such a strange character?
Solution:
.decode() takes a second parameter named errors, which defaults to 'strict' (raise an exception). Set it to 'ignore' to silently drop byte sequences that are not valid UTF-8, or to 'replace' to substitute the replacement character U+FFFD (the diamond question mark, �) for them.
buff.decode('utf-8', 'ignore')
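Here is a minimal sketch of what each errors value does, using made-up sample bytes rather than real socket data. Because sock.recv() can return a multi-byte character split across two calls, the sketch also includes decode_chunks, a hypothetical helper (not a standard library function) built on codecs.getincrementaldecoder that passes the same errors value through.

```python
import codecs

# Made-up sample bytes: 0xc3 starts a two-byte UTF-8 sequence, but b"(" (0x28)
# is not a continuation byte (those are 0x80-0xBF), so decoding fails there.
raw = b"a\xc3(b"

# The default, errors="strict", raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xc3 in position 1: invalid continuation byte

print(raw.decode("utf-8", "ignore"))   # prints: a(b   (bad byte dropped)
print(raw.decode("utf-8", "replace"))  # prints: a�(b  (bad byte -> U+FFFD)


def decode_chunks(chunks, errors="replace"):
    """Hypothetical helper: decode byte chunks (e.g. from repeated
    sock.recv() calls) without breaking characters that are split
    across chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush: any incomplete sequence left at the very end is handled by `errors`.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# "é" is b"\xc3\xa9"; here it is split across two chunks.
print(decode_chunks([b"caf\xc3", b"\xa9!"]))  # prints: café!
```

With 'replace' the damage stays visible as �, whereas 'ignore' drops it silently, so 'replace' is usually easier to debug.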