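UTF-8 and BOM
What is UTF?
UTF stands for Unicode Transformation Format: a family of encodings (UTF-8, UTF-16, UTF-32) that map Unicode code points to byte sequences.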
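What is a BOM?
BOM stands for Byte Order Mark.
Note that it is not file-header metadata; it is written directly at the very start of the file stream. The BOM is the code point U+FEFF (\uFEFF), which UTF-8 encodes as the 3 bytes EF BB BF.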
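As a small illustration of how the code point U+FEFF relates to the bytes at the start of a UTF-8 file, here is a minimal Python sketch. It uses only the standard library (`codecs.BOM_UTF8` and the `utf-8-sig` codec); the variable names are just for illustration.

```python
import codecs

BOM_CHAR = "\ufeff"  # the BOM as a single code point, U+FEFF

# Encoding U+FEFF as UTF-8 yields three bytes.
utf8_bom = BOM_CHAR.encode("utf-8")
print(utf8_bom.hex())               # efbbbf
print(utf8_bom == codecs.BOM_UTF8)  # True

# The "utf-8-sig" codec prepends the BOM when encoding
# and strips it (if present) when decoding.
data = "hello".encode("utf-8-sig")
print(data[:3] == codecs.BOM_UTF8)  # True

# Decoding with plain "utf-8" keeps the BOM as a leading character.
print(repr(data.decode("utf-8")))      # '\ufeffhello'
print(repr(data.decode("utf-8-sig")))  # 'hello'
```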
When is a BOM used?
Mostly on Windows. Quoting Wikipedia:
Microsoft compilers and interpreters, and many pieces of software on Microsoft Windows such as Notepad treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII. Windows PowerShell (up to 5.1) will add a BOM when it saves UTF-8 XML documents. However, PowerShell Core 6 has added a -Encoding switch on some cmdlets called utf8NoBOM so that document can be saved without BOM. Google Docs also adds a BOM when converting a document to a plain text file for download.
On Unix-like systems, UTF-8 files are generally written without a BOM.
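How do you fix parsing problems caused by a BOM?
Add or remove the leading \uFEFF (the bytes EF BB BF in UTF-8) at the start of the file, depending on what the consuming tool expects.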
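One way to do this on raw bytes is sketched below in Python. The helper names `strip_bom` and `add_bom` are hypothetical, chosen for this example; they only handle the UTF-8 BOM and assume the whole file fits in memory.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def add_bom(data: bytes) -> bytes:
    """Return data with exactly one leading UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return data  # already has a BOM; do not add a second one
    return codecs.BOM_UTF8 + data


with_bom = add_bom("hello".encode("utf-8"))
without_bom = strip_bom(with_bom)
print(with_bom[:3].hex())           # efbbbf
print(without_bom.decode("utf-8"))  # hello
```

When reading text rather than bytes, opening the file with `open(path, encoding="utf-8-sig")` has a similar effect: Python drops a leading BOM if there is one and otherwise reads the file as ordinary UTF-8.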
References
Wikipedia article on the BOM:
https://en.wikipedia.org/wiki/Byte_order_mark
https://blog.csdn.net/weixin_40449300/article/details/86567129