'utf-8' codec can't decode bytes in position 203-204: invalid continuation byte

2020-05-27

The CSV file produced by my crawler is 2.7 GB, too large for Excel or Notepad++ to open. Opening it in Python gave garbled text, so I tried changing the encoding.

Setting encoding='utf-8' raised the following error:

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 203-204: invalid continuation byte

Setting encoding='gbk' failed as well, with this error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 62: illegal multibyte sequence
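
Both errors mean the same thing: somewhere in the file there is a byte sequence that is not valid in the codec being used. The following is a minimal sketch that reproduces both error types on a made-up byte string. The `sample` bytes are an assumption chosen for illustration and are not taken from the real CSV.

```python
# Hypothetical sample, not from the real file:
#   b"\xe4\xb8" starts a 3-byte UTF-8 character but is followed by a space,
#   b"\xad" followed by a space is not a valid GBK two-byte sequence.
sample = b"\xe4\xb8 \xad "

reasons = {}
for codec in ("utf-8", "gbk"):
    try:
        sample.decode(codec)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first offending byte,
        # exc.reason is the text shown at the end of the error message.
        reasons[codec] = (exc.start, exc.reason)
        print(codec, "failed at byte", exc.start, ":", exc.reason)
```

If a file fails under every codec you try, it may contain mixed encodings or corrupted bytes, in which case no single `encoding` value will decode it cleanly.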

The fix that finally worked:

Add errors='ignore' after encoding='utf-8', so that bytes that cannot be decoded are skipped instead of raising an error:

with open('ceshi.csv', 'r', encoding='utf-8', errors='ignore') as f:
    first_line = f.readline()  # example body (not in the original): read one line to check the decoding
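
Because errors='ignore' drops bad bytes silently, it can help to know which rows were affected. Below is a rough sketch, not the original author's code, that reads a file line by line (so a multi-GB file never has to fit in memory) and compares errors='ignore' with errors='replace', which inserts U+FFFD where a byte could not be decoded. The helper name `read_lines_lossy` and the small temporary file are assumptions made for the example.

```python
import os
import tempfile


def read_lines_lossy(path, encoding="utf-8", errors="replace"):
    """Yield decoded lines one at a time instead of loading the whole file."""
    with open(path, "r", encoding=encoding, errors=errors) as f:
        for line in f:
            yield line


# Tiny stand-in for the real CSV: two clean rows and one row with a stray 0xad byte.
sample = "id,name\n1,中文\n".encode("utf-8") + b"2,bad\xadrow\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

# 'ignore' drops the bad byte; 'replace' leaves a visible U+FFFD marker.
ignored = list(read_lines_lossy(sample_path, errors="ignore"))
replaced = list(read_lines_lossy(sample_path, errors="replace"))

# Line numbers (1-based) of rows that contained undecodable bytes.
damaged_rows = [n for n, line in enumerate(replaced, start=1) if "\ufffd" in line]

os.remove(sample_path)
```

With errors='replace' the damaged rows stay identifiable, so they can be inspected or discarded deliberately rather than being altered without notice.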
