UTF-8 and the BOM

What is UTF?

UTF (Unicode Transformation Format) is a family of encodings that map Unicode code points to byte sequences.

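What is the BOM?

BOM stands for Byte Order Mark.

Note that it is not file header metadata; it is written directly at the very start of the file stream. In UTF-8 it is the code point U+FEFF (\uFEFF) encoded as the 3 bytes EF BB BF.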
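To make the relationship between \uFEFF and the three bytes concrete, here is a minimal Python sketch. The variable names are illustrative only; `codecs.BOM_UTF8` and the `utf-8-sig` codec come from the Python standard library.

```python
import codecs

# Encoding the single code point U+FEFF as UTF-8 yields three bytes.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex())  # efbbbf

# The standard library exposes the same byte sequence as a constant.
same_as_constant = bom_bytes == codecs.BOM_UTF8

# The "utf-8-sig" codec prepends the BOM when encoding; plain "utf-8" does not.
with_bom = "hello".encode("utf-8-sig")
without_bom = "hello".encode("utf-8")
```

Here `with_bom` is exactly `without_bom` with the three BOM bytes in front.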

When is the BOM used?

Mainly on Windows.

Microsoft compilers and interpreters, and many pieces of software on Microsoft Windows such as Notepad, treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII. Windows PowerShell (up to 5.1) adds a BOM when it saves UTF-8 XML documents. However, PowerShell Core 6 added a utf8NoBOM value for the -Encoding switch on some cmdlets, so that documents can be saved without a BOM. Google Docs also adds a BOM when converting a document to a plain text file for download.

On Unix-like systems, UTF-8 files are conventionally written without a BOM.

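How do you fix parsing problems caused by the BOM?

Add or remove the \uFEFF at the start of the file, depending on what the consuming tool expects.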
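A minimal Python sketch of adding and removing the BOM on raw bytes follows. The helper names `strip_bom` and `add_bom` and the sample JSON payload are hypothetical, chosen only for illustration; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def add_bom(data: bytes) -> bytes:
    """Return data with exactly one leading UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


# Sample payload (made up for this example): a JSON string saved with a BOM.
raw = codecs.BOM_UTF8 + '{"key": "value"}'.encode("utf-8")

# Decoding with plain "utf-8" keeps the BOM as a leading U+FEFF character,
# which is what typically trips up strict parsers.
text_plain = raw.decode("utf-8")

# Decoding with "utf-8-sig" drops the BOM if present.
text_sig = raw.decode("utf-8-sig")
```

When reading text files directly, `open(path, encoding="utf-8-sig")` handles both cases, since that codec skips the BOM when it is present and otherwise decodes as ordinary UTF-8.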

References

Wikipedia article on the BOM:

https://en.wikipedia.org/wiki/Byte_order_mark

https://blog.csdn.net/weixin_40449300/article/details/86567129