2017 May 07 Python

关于bytes和str

Python3有两种表示字符串：bytes和str。

bytes是具有特定编码二进制字符串。
str是不具有特定编码的字符串，就是人们常说的Unicode字符串。

bytes <> str

    s = "中文"
    b = s.encode("utf-8")  # str > bytes
    ss = b.decode("utf-8") # bytes > str

字符串转换函数

    def to_str(bytes_or_str):
        if isinstance(bytes_or_str, bytes):
            value = bytes_or_str.decode("utf-8")
        else:
            value = bytes_or_str
        return value
  
  
    def to_bytes(bytes_or_str):
        if isinstance(bytes_or_str, str):
            value = bytes_or_str.encode("utf-8")
        else:
            value = bytes_or_str
        return value

读写二进制文件需要注意

    with open('t.txt', 'w') as f:
        f.write(os.urandom(5))

    >>> TypeError: must be str, not bytes

Python3给 open 函数添加了名为encoding的新参数，这个新参数的默认值就是 ‘utf-8’。对文件进行 read 和 write 操作时，就必须传入 str 类型字符串。

我们必须使用二进制写入模式（’wb’）来开启待操作的文件

    with open('t.txt', 'wb') as f:
        f.write(os.urandom(5))

从文件中读取数据的时候也有这种问题。解决办法与写入时相似：用 ‘rb’ 模式