Asked by: user2902773 · Asked: 11/12/2013 · Last edited by: Karl Knechtel, user2902773 · Updated: 3/25/2023 · Views: 90303
How can I emulate a mutable string in Python (like StringBuffer in Java or StringBuilder in C#)?
Answers:
This link may be useful for string concatenation in Python:
http://pythonadventures.wordpress.com/2010/09/27/stringbuilder/
Example from the link above:
def g():
    # collect the characters in a list, then join once at the end
    sb = []
    for i in range(30):
        sb.append("abcdefg"[i % 7])
    return ''.join(sb)

print(g())
# abcdefgabcdefgabcdefgabcdefgab
Perhaps use a bytearray (the session below is Python 2):
In [1]: s = bytearray('Hello World')
In [2]: s[:5] = 'Bye'
In [3]: s
Out[3]: bytearray(b'Bye World')
In [4]: str(s)
Out[4]: 'Bye World'
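In Python 3, bytearray('Hello World') without an encoding raises a TypeError, and str(s) returns the repr rather than the text. A minimal sketch of the same steps in Python 3, assuming ASCII-only text:

```python
# Python 3: a bytearray holds bytes, so the literals need a b'' prefix
s = bytearray(b'Hello World')

# slice assignment may change the length, and it happens in place
s[:5] = b'Bye'
print(s)     # bytearray(b'Bye World')

# decode() turns the bytes back into a str (str(s) would give the repr instead)
text = s.decode('ascii')
print(text)  # Bye World
```

For non-ASCII text the bytearray works on encoded bytes, not characters, so slice indices no longer line up with character positions.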
The appeal of using a bytearray is its memory efficiency and convenient syntax. It can also be faster than using a temporary list:
In [36]: %timeit s = list('Hello World'*1000); s[5500:6000] = 'Bye'; s = ''.join(s)
1000 loops, best of 3: 256 µs per loop
In [37]: %timeit s = bytearray('Hello World'*1000); s[5500:6000] = 'Bye'; str(s)
100000 loops, best of 3: 2.39 µs per loop
Note that most of the difference in speed can be attributed to creating the container:
In [32]: %timeit s = list('Hello World'*1000)
10000 loops, best of 3: 115 µs per loop
In [33]: %timeit s = bytearray('Hello World'*1000)
1000000 loops, best of 3: 1.13 µs per loop
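The memory-efficiency point can be checked roughly with sys.getsizeof. This is a Python 3 sketch; exact byte counts depend on the Python version and platform, so none are claimed here:

```python
import sys

text = 'Hello World' * 1000

as_list = list(text)                     # one pointer per character
as_bytearray = bytearray(text, 'ascii')  # one byte per character

# getsizeof reports only the container itself, so the list figure does not
# even include the one-character str objects it points to
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_bytearray))
```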
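It depends on what you want to do. If you want a mutable sequence, the built-in list type is your friend, and going from str to list and back is as simple as: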
mystring = "abcdef"
mylist = list(mystring)
mystring = "".join(mylist)
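Between the two conversions the list can be edited in place with ordinary list operations. A small sketch (the particular edits are arbitrary):

```python
mystring = "abcdef"
mylist = list(mystring)

# edit the list in place: replace one character, insert one, delete one
mylist[0] = "A"
mylist.insert(3, "-")
del mylist[-1]

mystring = "".join(mylist)
print(mystring)  # Abc-de
```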
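If you want to build a large string in a for loop, the pythonic way is usually to build a list of strings and then join them together with the appropriate separator (a newline or whatever else).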
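A minimal sketch of that pattern with a newline separator (the row labels are made up for illustration):

```python
# collect the pieces in a list, then join once at the end
lines = []
for i in range(3):
    lines.append("row {}".format(i))

report = "\n".join(lines)
print(report)
```

Joining once at the end keeps the cost linear in the total length, instead of copying the growing string on every iteration.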
Otherwise, you could also use a text templating system, a parser, or whatever specialized tool is best suited to the job.
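As one example of such a tool, the standard library's string.Template fills named placeholders. This is only an illustration, not a specific recommendation from the answer, and the field names are made up:

```python
from string import Template

# a template with named $placeholders
tmpl = Template("Dear $name,\nyour order #$order_id has shipped.\n")

# substitute() raises KeyError if a placeholder has no value
letter = tmpl.substitute(name="Alice", order_id=42)
print(letter)
```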
Python 3
From the docs:
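Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. To get a linear runtime cost, you must switch to one of the alternatives below: if concatenating str objects, you can build a list and use str.join() at the end, or else write to an io.StringIO instance and retrieve its value when complete.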
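A short Python 3 sketch of the io.StringIO alternative. Besides appending with write() and reading the result with getvalue(), a StringIO can be repositioned with seek(), which gives some in-place overwrite behaviour; the sample text is arbitrary:

```python
import io

buf = io.StringIO("Hello World")  # the position starts at 0

buf.write("Jello")                # overwrites the first five characters
buf.seek(0, io.SEEK_END)          # move to the end of the buffer
buf.write("!")                    # appends

print(buf.getvalue())             # Jello World!
```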
A test comparing the runtimes of several options:
import sys
import timeit
from io import StringIO
from array import array

def test_concat():
    out_str = ''
    for _ in range(loop_count):
        out_str += 'abc'
    return out_str

def test_join_list_loop():
    str_list = []
    for _ in range(loop_count):
        str_list.append('abc')
    return ''.join(str_list)

def test_array():
    char_array = array('b')
    for _ in range(loop_count):
        char_array.frombytes(b'abc')
    # tostring() was removed in Python 3.9; tobytes().decode() also returns
    # the text itself rather than the repr of a bytes object
    return char_array.tobytes().decode()

def test_string_io():
    file_str = StringIO()
    for _ in range(loop_count):
        file_str.write('abc')
    return file_str.getvalue()

def test_join_list_compr():
    return ''.join(['abc' for _ in range(loop_count)])

def test_join_gen_compr():
    return ''.join('abc' for _ in range(loop_count))

loop_count = 80000
print(sys.version)

# time every module-level function whose name starts with "test_"
res = {}
for k, v in dict(globals()).items():
    if k.startswith('test_'):
        res[k] = timeit.timeit(v, number=10)

for k, v in sorted(res.items(), key=lambda x: x[1]):
    print('{:.5f} {}'.format(v, k))
Results:
3.7.5 (default, Nov 1 2019, 02:16:32)
[Clang 11.0.0 (clang-1100.0.33.8)]
0.03738 test_join_list_compr
0.05681 test_join_gen_compr
0.09425 test_string_io
0.09636 test_join_list_loop
0.11976 test_concat
0.19267 test_array
Python 2
Efficient String Concatenation in Python is a rather old article, and its main claim, that naive concatenation is far slower than joining, is no longer valid, because this part has since been optimized in CPython. From the docs:
CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations.
I've adapted their code a bit and got the following results on my machine:
from cStringIO import StringIO
from UserString import MutableString
from array import array
import sys, timeit

def method1():
    out_str = ''
    for num in xrange(loop_count):
        out_str += `num`
    return out_str

def method2():
    out_str = MutableString()
    for num in xrange(loop_count):
        out_str += `num`
    return out_str

def method3():
    char_array = array('c')
    for num in xrange(loop_count):
        char_array.fromstring(`num`)
    return char_array.tostring()

def method4():
    str_list = []
    for num in xrange(loop_count):
        str_list.append(`num`)
    out_str = ''.join(str_list)
    return out_str

def method5():
    file_str = StringIO()
    for num in xrange(loop_count):
        file_str.write(`num`)
    out_str = file_str.getvalue()
    return out_str

def method6():
    out_str = ''.join([`num` for num in xrange(loop_count)])
    return out_str

def method7():
    out_str = ''.join(`num` for num in xrange(loop_count))
    return out_str

loop_count = 80000

print sys.version

print 'method1=', timeit.timeit(method1, number=10)
print 'method2=', timeit.timeit(method2, number=10)
print 'method3=', timeit.timeit(method3, number=10)
print 'method4=', timeit.timeit(method4, number=10)
print 'method5=', timeit.timeit(method5, number=10)
print 'method6=', timeit.timeit(method6, number=10)
print 'method7=', timeit.timeit(method7, number=10)
Results:
2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
method1= 0.171155929565
method2= 16.7158739567
method3= 0.420584917068
method4= 0.231794118881
method5= 0.323612928391
method6= 0.120429992676
method7= 0.145267963409
Conclusions:
- join still wins over concat, but only marginally
- list comprehensions are faster than loops (when building a list)
- joining generators is slower than joining lists
- other methods are of no use (unless you're doing something special)
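The previously provided answers are almost always best. However, sometimes the string is built up across many method calls and/or loops, so it isn't necessarily natural to build a list of lines and then join them. And since there is no guarantee that you are using CPython, or that CPython's optimization will apply, another approach is to just use print!
Here is an example helper class. Although the helper class is trivial and probably unnecessary, it serves to illustrate the approach (Python 3):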
import io

class StringBuilder(object):

    def __init__(self):
        self._stringio = io.StringIO()

    def __str__(self):
        return self._stringio.getvalue()

    def append(self, *objects, sep=' ', end=''):
        # print() formats the objects and writes them into the StringIO
        print(*objects, sep=sep, end=end, file=self._stringio)

sb = StringBuilder()
sb.append('a')
sb.append('b', end='\n')
sb.append('c', 'd', sep=',', end='\n')
print(sb)  # 'ab\nc,d\n'
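A related variant, for when the string is produced by existing code that already calls print(), is to redirect standard output into a StringIO with contextlib.redirect_stdout. Here emit_report is a hypothetical stand-in for such code. Note that redirect_stdout temporarily replaces sys.stdout globally, so it does not suit threaded code:

```python
import io
from contextlib import redirect_stdout

def emit_report(items):
    # ordinary print() calls that know nothing about the buffer
    print("items:")
    for item in items:
        print(" -", item)

buf = io.StringIO()
with redirect_stdout(buf):
    emit_report(["a", "b"])

captured = buf.getvalue()
print(repr(captured))  # 'items:\n - a\n - b\n'
```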
Just a test I ran on Python 3.6.2, showing that join still wins big!
from time import time

def _with_format(i):
    _st = ''
    for i in range(0, i):
        _st = "{}{}".format(_st, "0")
    return _st

def _with_s(i):
    _st = ''
    for i in range(0, i):
        _st = "%s%s" % (_st, "0")
    return _st

def _with_list(i):
    l = []
    for i in range(0, i):
        l.append("0")
    return "".join(l)

def _count_time(name, i, func):
    start = time()
    r = func(i)
    total = time() - start
    print("%s done in %ss" % (name, total))
    return r

iterationCount = 1000000

r1 = _count_time("with format", iterationCount, _with_format)
r2 = _count_time("with s", iterationCount, _with_s)
r3 = _count_time("with list and join", iterationCount, _with_list)
if r1 != r2 or r2 != r3:
    print("Not all results are the same!")
Output:
with format done in 17.991968870162964s
with s done in 18.36879801750183s
with list and join done in 0.12142801284790039s
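I added 2 extra tests to Roee Gavirel's code, and they conclusively show that, up to Python 3.6, joining a list into a string is not faster than s += "something". Later versions give different results.
Results: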
Python 2.7.15rc1
Iterations: 100000
format done in 0.317540168762s
%s done in 0.151262044907s
list+join done in 0.0055148601532s
str cat done in 0.00391721725464s
Python 3.6.7
Iterations: 100000
format done in 0.35594654083251953s
%s done in 0.2868080139160156s
list+join done in 0.005924701690673828s
str cat done in 0.0054128170013427734s
f str done in 0.12870001792907715s
Python 3.8.5
Iterations: 100000
format done in 0.1859891414642334s
%s done in 0.17499303817749023s
list+join done in 0.008001089096069336s
str cat done in 0.014998912811279297s
f str done in 0.1600024700164795s
Code:
from time import time

def _with_cat(i):
    _st = ''
    for i in range(0, i):
        _st += "0"
    return _st

def _with_f_str(i):
    _st = ''
    for i in range(0, i):
        _st = f"{_st}0"
    return _st

def _with_format(i):
    _st = ''
    for i in range(0, i):
        _st = "{}{}".format(_st, "0")
    return _st

def _with_s(i):
    _st = ''
    for i in range(0, i):
        _st = "%s%s" % (_st, "0")
    return _st

def _with_list(i):
    l = []
    for i in range(0, i):
        l.append("0")
    return "".join(l)

def _count_time(name, i, func):
    start = time()
    r = func(i)
    total = time() - start
    print("%s done in %ss" % (name, total))
    return r

iteration_count = 100000
print('Iterations: {}'.format(iteration_count))
r1 = _count_time("format ", iteration_count, _with_format)
r2 = _count_time("%s ", iteration_count, _with_s)
r3 = _count_time("list+join", iteration_count, _with_list)
r4 = _count_time("str cat ", iteration_count, _with_cat)
r5 = _count_time("f str ", iteration_count, _with_f_str)

if len(set([r1, r2, r3, r4, r5])) != 1:
    print("Not all results are the same!")
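Timing a single call via differences of time() is noisy. A sketch of the same comparison using timeit.repeat and keeping the best of several runs, reduced to two of the variants above (no timings are claimed here, since they depend on the machine and Python version):

```python
import timeit

def with_cat(n):
    s = ''
    for _ in range(n):
        s += "0"
    return s

def with_join(n):
    parts = []
    for _ in range(n):
        parts.append("0")
    return "".join(parts)

n = 100000
# timeit uses a high-resolution clock; the minimum of several runs
# is less affected by other activity on the machine
for name, func in (("str cat", with_cat), ("list+join", with_join)):
    best = min(timeit.repeat(lambda: func(n), number=1, repeat=5))
    print("{:10} best of 5: {:.6f}s".format(name, best))
```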
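The closest thing Python offers to a mutable string or StringBuffer is probably a Unicode-typed array from the standard library's array module. It can be useful when you only want to edit small parts of the string: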
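import array

# example input (made up for illustration); assumes every row has the same width
multiline_string = "abcdefg\nhijklmn\nopqrstu"
modifications = [(2, 3, 'h'), (0, 6, '!')]

# a row starts at row * (line width + 1 for the '\n')
row_width = multiline_string.index('\n') + 1
strarray = array.array('u', multiline_string)
for row, column, character in modifications:
    strarray[row * row_width + column] = character
multiline_string = strarray.tounicode()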
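The 'u' typecode is deprecated in recent Python 3 versions. A similar sketch that avoids it keeps one list of characters per row, which also works for lines of different widths (the sample string is made up):

```python
# hypothetical example grid; rows need not have equal width here
multiline_string = "abcdefg\nhij\nopqrstu"
modifications = [(2, 3, 'h'), (0, 6, '!')]

# one mutable list of characters per line
rows = [list(line) for line in multiline_string.split('\n')]
for row, column, character in modifications:
    rows[row][column] = character

multiline_string = '\n'.join(''.join(r) for r in rows)
print(multiline_string)
```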
Here is my StringBuffer implementation:
class StringBuffer:
    def __init__(self, s: str = None):
        self._a = [] if s is None else [s]

    def a(self, v):
        # append str(v); returning self allows chaining
        self._a.append(str(v))
        return self

    def al(self, v):
        # append str(v) followed by a newline
        self._a.append(str(v))
        self._a.append('\n')
        return self

    def ts(self, delim=''):
        return delim.join(self._a)

    def __bool__(self): return True
Usage:
sb = StringBuffer('{')
for i, (k, v) in enumerate({'k1': 'v1', 'k2': 'v2'}.items()):
    if i > 0:
        sb.a(', ')
    sb.a('"').a(k).a('": ').a('"').a(v).a('"')
sb.a('}')
print(sb.ts())
This outputs {"k1": "v1", "k2": "v2"}.
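A variation on the same idea, sketched as a hypothetical class rather than the answerer's code, overloads += via __iadd__ so the builder reads like ordinary string concatenation while deferring the join until str() is called:

```python
class StringBuilder:
    """Hypothetical list-backed builder supporting += and str()."""

    def __init__(self, initial=''):
        self._parts = [initial] if initial else []

    def __iadd__(self, value):
        # store the piece instead of creating a new string on every +=
        self._parts.append(str(value))
        return self

    def __len__(self):
        return sum(len(p) for p in self._parts)

    def __str__(self):
        return ''.join(self._parts)

sb = StringBuilder('{')
sb += '"k1": '
sb += 1
sb += '}'
print(str(sb))  # {"k1": 1}
print(len(sb))  # 9
```

Because __len__ is defined, an empty builder is falsy; the StringBuffer above defines __bool__ to always return True, presumably to avoid exactly that.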