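The CSV file produced by my crawler is 2.7 GB, too large for Excel or Notepad++ to open. When I opened it in Python the text came out garbled, so I tried changing the encoding.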
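Because the file is too big for an editor, one way to look at the raw bytes is to read only the first few lines in binary mode, which never raises a decoding error. This is a minimal sketch; `head_bytes` is a hypothetical helper name, not part of any library.

```python
import itertools


def head_bytes(path, n=5):
    """Return the first n lines of a file as raw bytes, without decoding them."""
    # Binary mode iterates lazily over b"\n"-terminated lines, so only the
    # first few lines of a multi-gigabyte file are actually read into memory.
    with open(path, "rb") as f:
        return list(itertools.islice(f, n))
```

For example, `for line in head_bytes('ceshi.csv'): print(line)` prints each line as a `b'...'` literal; UTF-8 Chinese text shows up as three-byte `\xe4\xb8...` style escapes, while GBK text uses two-byte sequences.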
Setting `encoding='utf-8'` raised this error:
```
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 203-204: invalid continuation byte
```
Setting `encoding='gbk'` failed as well:
```
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 62: illegal multibyte sequence
```
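Rather than trying encodings one at a time on the full 2.7 GB file, one option is to strictly decode only a sample from the start of the file and record where each codec fails. The sketch below is illustrative: the helper name `probe_encodings`, the candidate list (which adds `gb18030`, a superset of GBK) and the 1 MiB sample size are assumptions, not part of the original fix. An incremental decoder with `final=False` is used so that a multibyte character cut off at the end of the sample is buffered instead of being reported as an error.

```python
import codecs


def probe_encodings(path, candidates=("utf-8", "gbk", "gb18030"), sample_size=1 << 20):
    """Strictly decode the first sample_size bytes with each candidate codec.

    Returns {codec: None} when the sample decodes cleanly, otherwise
    {codec: (start, end, reason)} copied from the UnicodeDecodeError.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    results = {}
    for enc in candidates:
        # final=False: an incomplete sequence at the very end of the sample
        # is held in the decoder's buffer rather than raising an error.
        decoder = codecs.getincrementaldecoder(enc)(errors="strict")
        try:
            decoder.decode(sample, final=False)
            results[enc] = None
        except UnicodeDecodeError as e:
            results[enc] = (e.start, e.end, e.reason)
    return results
```

A codec that maps to `None` decodes the sample cleanly; this only covers the sample, so bad bytes further into the file can still appear later.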
The fix that finally worked:
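Add `errors='ignore'` after `encoding='utf-8'`. Bytes that cannot be decoded are then silently dropped instead of raising an error, so a small amount of text may be lost.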
```python
with open('ceshi.csv', 'r', encoding='utf-8', errors='ignore') as f:
    first_line = f.readline()  # undecodable bytes are silently skipped
```
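Since `errors='ignore'` hides how much data was discarded, one way to estimate it is to decode the same file with `errors='replace'`, which inserts U+FFFD for each undecodable sequence, and count those characters. This is a sketch under assumptions: `count_undecodable` is a hypothetical helper, the count is per replaced sequence rather than per byte, and any U+FFFD already present in the file would be counted too. Iterating line by line keeps memory use low even for a 2.7 GB file.

```python
def count_undecodable(path, encoding="utf-8"):
    """Count U+FFFD replacement characters produced while decoding a file.

    Each one marks a byte sequence that errors="ignore" would have dropped.
    """
    bad = 0
    # errors="replace" substitutes "\ufffd" instead of raising or dropping.
    with open(path, "r", encoding=encoding, errors="replace") as f:
        for line in f:  # streamed line by line, never the whole file at once
            bad += line.count("\ufffd")
    return bad
```

For comparison, `b"a\xffb".decode("utf-8", errors="ignore")` gives `"ab"`, while `errors="replace"` gives `"a\ufffdb"`. If `count_undecodable('ceshi.csv')` is small relative to the file size, dropping those bytes with `errors='ignore'` is unlikely to matter much.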