使用 Sequence 初始化字符串的时间复杂度是多少？-解网

问：

到目前为止，我做了什么？

我查看了初始值设定项的公共文档。
我看了一下开源代码，见下面的代码。

细节

在查看了方法定义之后，我认为我们可以通过说 / 将存储在连续的字节中并将使用来进一步澄清这个问题。SequenceArrayUTF8

  /// Creates a string from the given Unicode code units in the specified
  /// encoding.
  ///
  /// - Parameters:
  ///   - codeUnits: A collection of code units encoded in the encoding
  ///     specified in `sourceEncoding`.
  ///   - sourceEncoding: The encoding in which `codeUnits` should be
  ///     interpreted.
  @inlinable
  @inline(__always) // Eliminate dynamic type check when possible
  public init<C: Collection, Encoding: Unicode.Encoding>(
    decoding codeUnits: C, as sourceEncoding: Encoding.Type
  ) where C.Iterator.Element == Encoding.CodeUnit {
    guard _fastPath(sourceEncoding == UTF8.self) else {
      self = String._fromCodeUnits(
        codeUnits, encoding: sourceEncoding, repair: true)!.0
      return
    }

    // Fast path for user-defined Collections and typed contiguous collections.
    //
    // Note: this comes first, as the optimizer nearly always has insight into
    // wCSIA, but cannot prove that a type does not have conformance to
    // _HasContiguousBytes.
    if let str = codeUnits.withContiguousStorageIfAvailable({
      (buffer: UnsafeBufferPointer<C.Element>) -> String in
      Builtin.onFastPath() // encourage SIL Optimizer to inline this closure :-(
      let rawBufPtr = UnsafeRawBufferPointer(buffer)
      return String._fromUTF8Repairing(
        UnsafeBufferPointer(
          start: rawBufPtr.baseAddress?.assumingMemoryBound(to: UInt8.self),
          count: rawBufPtr.count)).0
    }) {
      self = str
      return
    }

    // Fast path for untyped raw storage and known stdlib types
    if let contigBytes = codeUnits as? _HasContiguousBytes,
      contigBytes._providesContiguousBytesNoCopy
    {
      self = contigBytes.withUnsafeBytes { rawBufPtr in
        Builtin.onFastPath() // encourage SIL Optimizer to inline this closure
        return String._fromUTF8Repairing(
          UnsafeBufferPointer(
            start: rawBufPtr.baseAddress?.assumingMemoryBound(to: UInt8.self),
            count: rawBufPtr.count)).0
      }
      return
    }

    self = String._fromNonContiguousUnsafeBitcastUTF8Repairing(codeUnits).0
  }

Swift 字符串时间复杂度

答：

1赞 Itai Ferber 8/16/2023 #1

这不是对时间复杂度的正式分析，但希望足够具体以满足您的需求：从代码单元集合创建字符串的时间复杂度为 O（n）（其中输入集合的长度以代码单元为单位）。n

String从 A 的代码单元初始化有三个初始化路径：Collection

String._fromCodeUnits（_：encoding：repair：） 用于非 UTF-8 输入
String._fromUTF8Repairing（_：）表示连续的 UTF-8 输入（例如，现有字符串或 UTF-8 代码单元）Array
String._fromNonContiguousUnsafeBitcastUTF8Repairing（_：），它将输入集合转换为，然后对生成的数组调用（2）ArrayString._fromUTF8Repairing(_:)
- 此副本本身是 O（n），因为输入集合中的每个代码单元都被复制到数组中（每个代码单元的恒定时间）

因此，我们有两个案例要看：

String._fromCodeUnits(_:encoding:repair:)本身有两条主要路径：
- 如果输入是非 ASCII，则每个代码单元都会迭代并转码为 UTF-8;从可能的输入编码转码是一个恒定时间操作（所有支持的编码在代码单元的长度（以字节为单位）上都有一个固定的上限，因此在操作的复杂性上有一个恒定的、固定的界限），所以这里的整体操作是 O（n） 用于构造
- 如果输入是 ASCII，则迭代并验证每个代码单元为 ASCII，然后按原样复制（因为 ASCII 是 UTF-8 的子集）;因为每个代码单元都是在恒定时间内复制的，所以这里的整体运算是 O（n）
String._fromUTF8Repairing(_:) 验证输入集合中的所有代码单元，以确保它们都是有效的 UTF-8
- 如果全部有效，则按原样复制整个集合（每个代码单元在恒定时间内），因此 O（n）
- 如果任何代码单元无效，则通过复制有效 UTF-8 代码单元的范围，并将无效 UTF-8 代码单元的范围替换为 Unicode 替换字符来修复字符串。每个代码单元只需要访问一次，如果有效，则复制，如果无效，则跳过，这是在恒定时间内完成的，这将使操作保持在 O（n）

所以：

ASCII 输入：O（n）（验证）+ O（n）（复制）= O（n）
非 ASCII、非 UTF-8 输入：O（n）（转码 + 复制）
连续 UTF-8 输入：O（n）（验证）+ O（n）（修复/复制）= O（n）
非连续 UTF-8 输入：O（n）（复制到数组中）+ O（n）（连续运算（3））= O（n）

或者，也许更直观：

复制字符串时，需要一次访问一个代码单元才能复制它
复制单个代码单元是一个常量时间操作（因为代码单元的长度（以字节为单位）具有固定的上限）
- 如果选择用不同的代码单元替换代码单元，则可以在恒定时间内做出决定（复制该字符只是恒定时间）

O（n [长度] * 1 [复制或替换]） = O（n）

上一个：输入换行符后，Swift 无法读取 textDocumentProxy.documentContextBeforeInput

下一个：如何在 swift 中使用 “ 和 ' 单引号和双引号过滤？

使用 Sequence 初始化字符串的时间复杂度是多少？

What is the time complexity of String initialization with a Sequence?

评论