Python: UnicodeDecodeError: 'utf-8' codec can't decode byte...invalid continuation byte
Date : March 29 2020, 07:55 AM
You should not decode the response yourself. First, you are incorrectly assuming the response is UTF-8 encoded (it is not, as the error shows). More importantly, BeautifulSoup will detect the encoding for you; see the Encodings section of the BeautifulSoup documentation. Pass a byte string to BeautifulSoup and it will use any meta tag in the document that declares the encoding, or do a good job of autodetecting it. If autodetection gets it wrong, you can fall back to the encoding the server supplied:

from bs4 import BeautifulSoup

# `page` is assumed to be the response returned by urllib.request.urlopen()
encoding = page.info().get_content_charset()  # charset from Content-Type, or None
page = page.read()
soup = BeautifulSoup(page)
if encoding is not None and soup.original_encoding != encoding:
    print('Server and BeautifulSoup disagree')
    print('Content-type states it is {}, BS4 thinks it is {}'.format(encoding, soup.original_encoding))
    print('Forcing encoding to server-supplied codec')
    soup = BeautifulSoup(page, from_encoding=encoding)
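
To illustrate where a server-supplied encoding comes from, here is a minimal standard-library sketch that reads the charset out of a Content-Type header value and uses it to decode a byte string. The helper names charset_from_content_type and decode_body, the sample header, and the sample bytes are assumptions made for this illustration; they are not part of BeautifulSoup or of the answer above.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(raw, content_type, fallback="utf-8"):
    """Decode raw bytes with the declared charset, else with the fallback."""
    charset = charset_from_content_type(content_type) or fallback
    return raw.decode(charset)


# Hypothetical sample: 0xe9 is "é" in ISO-8859-1 but is not valid UTF-8 here.
raw = b"caf\xe9"
text = decode_body(raw, "text/html; charset=ISO-8859-1")
print(text)
```

When the header carries no charset, the sketch falls back to UTF-8 and may still raise UnicodeDecodeError, which is exactly the case where handing the raw bytes to BeautifulSoup is the better option.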
|
Python 'utf8' codec can't decode byte 0xc3 in position 72: invalid continuation byte
Date : March 29 2020, 07:55 AM
Keep it simple and it works. The data has already been decoded by the requests module, so there is no need to decode it again:

import json
import requests

data = requests.get('https://www.whoisxmlapi.com/whoisserver/WhoisService?domainName=http://N%E2%94%9CO-RESPONDER@MERCAOLIVRE.COM&outputFormat=json')
print(data.text)  # .text is already a decoded string
print(json.loads(data.text))
|
Python 'utf8' codec can't decode byte 0xcd in position 0: invalid continuation byte
Date : March 29 2020, 07:55 AM
datetime.datetime(...) returns a datetime object, not a string, so json.dumps cannot serialize it. Convert it to a string first, for example with str(datetime.datetime(2015, 6, 17, 7, 43)).
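
A short sketch of the conversion described above. The dictionary key "created" is a made-up example; default=str is the standard json.dumps hook that is called for any object the encoder cannot serialize on its own.

```python
import datetime
import json

record = {"created": datetime.datetime(2015, 6, 17, 7, 43)}

# Without a conversion, json.dumps rejects the datetime object.
try:
    json.dumps(record)
    failed = False
except TypeError:
    failed = True

# default=str converts every non-serializable value with str().
encoded = json.dumps(record, default=str)
print(encoded)
```

Using default=str converts every unsupported value, not only datetimes, so calling str() on the one field is the more explicit choice when only that field is affected.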
|
Python SMTP: 'utf-8' codec can't decode byte 0xe7 in position 7: invalid continuation byte
Date : March 29 2020, 07:55 AM
\xe7 is the ç in your name, but it is not encoded as UTF-8 (maybe cp1254, for a Turkish name?). Save your source file in UTF-8 and try again. It helps to have a reproducible example; the ****** you used to mask the source probably removed the problem. Note that #coding:utf8 at the top of the file declares the encoding of the source file, but UTF-8 is the default in Python 3, so it is not required. Python 2 would need it.
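
The following sketch shows why a lone \xe7 byte triggers this error. The name used is a hypothetical sample, and cp1254 is only the guess made above, not a confirmed fact about the original file.

```python
name = "Fran\u00e7ois"  # hypothetical sample name containing "ç"

utf8_bytes = name.encode("utf-8")     # "ç" becomes the two bytes 0xc3 0xa7
cp1254_bytes = name.encode("cp1254")  # "ç" becomes the single byte 0xe7

# Decoding the cp1254 bytes as UTF-8 fails: 0xe7 announces a multi-byte
# sequence, but the byte that follows is not a continuation byte.
try:
    cp1254_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason   # text of the failure
    position = exc.start  # index of the offending byte

print(reason, position)
print(cp1254_bytes.decode("cp1254"))  # decoding with the right codec works
```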
|
Python / Pandas: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 133: invalid continuation byte
Date : March 29 2020, 07:55 AM
I'm trying to build a method that imports multiple types of CSV or Excel files and standardizes them. Everything was running smoothly until a certain CSV showed up and raised this error.

For the record, this is probably better than multiple try/excepts:

import os
import pandas as pd

def read_csv(filepath):
    if os.path.splitext(filepath)[1] != '.csv':
        return  # or whatever
    seps = [',', ';', '\t']  # ',' is default
    encodings = [None, 'utf-8', 'ISO-8859-1']  # None is default
    # Try every separator/encoding pair and return the first that parses
    for sep in seps:
        for encoding in encodings:
            try:
                return pd.read_csv(filepath, encoding=encoding, sep=sep)
            except Exception:  # should really be more specific
                pass
    raise ValueError("{!r} has no encoding in {} or separator in {}"
                     .format(filepath, encodings, seps))
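
As a complement to the loop above, the encoding can also be probed on the raw bytes before any parsing happens. This is a minimal sketch, not part of the original answer: the function name sniff_encoding, its candidate list, and the sample row are assumptions chosen for illustration.

```python
def sniff_encoding(raw, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("none of {} can decode the data".format(candidates))


# Hypothetical sample row: 0xcd is "Í" in ISO-8859-1 and invalid as UTF-8 here.
sample = "\u00cdndice;valor\n".encode("iso-8859-1")
print(sniff_encoding(sample))
```

ISO-8859-1 maps every possible byte to a character, so it never fails to decode; keep it last in the candidate list and treat a match as a fallback rather than proof that the file really uses that encoding. The returned name could then be passed as the encoding argument of pd.read_csv.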
|