encoding – A Passionate Techie

Today I was hit with a strange error in my script:

'utf8' codec can't decode byte 0xc3 in position 21: invalid continuation byte

I’m reading data from a socket with sock.recv and decoding the result with buff.decode('utf-8'), where buff is the returned data.

But today I ran into a real “unicorn”: one of the returned characters showed up as “▒”, and that is what made the UTF-8 decode throw an exception. Is there some preprocessing step that would either remove or replace such a strange character?

Solution:

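There is a second parameter for .decode() named errors. Its default is 'strict', which raises the exception above. You can set it to 'ignore' to silently drop any bytes that are not valid UTF-8, or to 'replace' to substitute them with the replacement character U+FFFD, the diamond question mark (�).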

buff.decode('utf-8', 'ignore')
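
To make the difference between the modes concrete, here is a small sketch. The sample bytes are made up for illustration (they are not the data from my socket): a valid "é" followed later by a stray 0xc3 byte that has no continuation byte after it, which triggers the same kind of error.

```python
# Hypothetical sample data: "café", a space, then a lone 0xc3 byte
# followed by "(" (0x28), which is not a valid UTF-8 continuation byte.
buff = b"caf\xc3\xa9 \xc3( ok"

# Default behaviour (errors='strict'): decoding raises UnicodeDecodeError.
try:
    buff.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# errors='ignore': the invalid byte is dropped silently.
ignored = buff.decode("utf-8", "ignore")

# errors='replace': the invalid byte becomes U+FFFD.
replaced = buff.decode("utf-8", "replace")

print(strict_error)
print(repr(ignored))
print(repr(replaced))
```

Keep in mind that 'ignore' loses data without telling you, so 'replace' is usually the safer pick when you want to notice later that something was wrong with the input.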



'utf8' codec can't decode byte 0xc3 while decode('utf-8') in python
