Asked by: Robert Gould  Asked: 9/29/2008  Last edited by: Yu Hao, Robert Gould  Updated: 10/4/2017  Views: 15054
Multithreaded Memory Allocators for C/C++
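Q:
I currently have a heavily multithreaded server application, and I'm shopping around for a good multithreaded memory allocator.
So far I'm torn between:
- Sun's umem
- Google's tcmalloc
- Intel's Threading Building Blocks allocator
- Emery Berger's Hoard
From what I've found, Hoard might be the fastest, but I hadn't heard of it before today, so I'm skeptical that it's really as good as it seems. Does anyone have personal experience trying out these allocators?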
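A:
Maybe this is the wrong way to approach what you're asking, but perhaps a different tactic could be employed altogether. If you're looking for a really fast memory allocator, maybe you should ask why you need to spend so much time allocating memory when you could perhaps get away with stack allocation of variables. Stack allocation, while more annoying, can save you a lot of mutex contention when done right, and it keeps strange memory corruption issues out of your code. You'll also potentially have less fragmentation, which might help.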
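As a rough illustration of the stack-versus-heap point, here is a minimal sketch that computes the same checksum twice: once with a heap-allocated scratch buffer and once with a fixed-size buffer on the stack. The function names and the 256-byte limit (`kMaxMessage`) are hypothetical, picked only for this example. Only the heap version calls into the global allocator, and therefore only it can contend on whatever locks that allocator uses.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical upper bound on message size, assumed for this sketch.
constexpr std::size_t kMaxMessage = 256;

// Simple additive checksum over a byte buffer.
inline std::uint32_t checksum(const unsigned char* data, std::size_t len) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum += data[i];
    return sum;
}

// Heap version: each call allocates and frees scratch space via the global allocator.
std::uint32_t checksum_heap(const char* msg, std::size_t len) {
    std::vector<unsigned char> scratch(len);
    std::copy_n(msg, len, scratch.begin());
    return checksum(scratch.data(), len);
}

// Stack version: scratch space lives in this function's stack frame, so no allocator call.
std::uint32_t checksum_stack(const char* msg, std::size_t len) {
    assert(len <= kMaxMessage);  // the fixed buffer only works for bounded input
    std::array<unsigned char, kMaxMessage> scratch;
    std::copy_n(msg, len, scratch.begin());
    return checksum(scratch.data(), len);
}
```

The trade-off is the hard size bound: the stack version only applies when the maximum size is known and small enough not to threaten the thread's stack.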
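We used Hoard on a project where I worked a few years ago. It seemed to work great. I have no experience with the other allocators. It should be pretty easy to try different ones and load test, no?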
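I have used tcmalloc and read about Hoard. Both have similar implementations, and both achieve roughly linear performance scaling with respect to the number of threads/CPUs (according to the graphs on their respective sites).
So: if performance is really that crucial, do performance/load testing. Otherwise, just roll a die and pick one of the listed allocators (weighted by ease of use on your target platform).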
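For what a load test might look like, here is a minimal sketch, not a serious benchmark. It spawns a number of threads that each run malloc/free pairs over a range of sizes and reports the total count and wall-clock time. `LoadResult`, `run_alloc_load`, and the 16 to 1024 byte size range are all made up for this example; a real test should replay your application's actual allocation pattern.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

struct LoadResult {
    std::size_t operations;  // total malloc/free pairs completed
    double seconds;          // wall-clock time across all threads
};

// Each thread performs `iterations` malloc/free pairs with sizes cycling over 16..1024 bytes.
LoadResult run_alloc_load(unsigned threads, std::size_t iterations) {
    assert(threads > 0);
    std::vector<std::size_t> done(threads, 0);

    auto worker = [&done, iterations](unsigned id) {
        std::size_t count = 0;
        for (std::size_t i = 0; i < iterations; ++i) {
            std::size_t size = 16 + (i * 16) % 1024;
            void* p = std::malloc(size);
            if (p == nullptr) break;
            // Touch the block so the compiler cannot drop the malloc/free pair.
            static_cast<volatile char*>(p)[0] = 1;
            std::free(p);
            ++count;
        }
        done[id] = count;  // each thread writes only its own slot
    };

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();
    auto stop = std::chrono::steady_clock::now();

    std::size_t total = 0;
    for (std::size_t n : done) total += n;
    return {total, std::chrono::duration<double>(stop - start).count()};
}
```

Since most of the allocators discussed here are drop-in malloc replacements, the same binary can be timed under each one (for instance via LD_PRELOAD) and the `seconds` figures compared at several thread counts.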
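Judging from trshiv's link, Hoard, tcmalloc, and ptmalloc are all roughly comparable for speed. Overall, it looks like ptmalloc is optimized for taking as little room as possible, Hoard is optimized for a trade-off of speed and memory usage, and tcmalloc is optimized for pure speed.
I personally prefer and recommend ptmalloc as a multithreaded allocator. Hoard is good, but in the evaluation my team did between Hoard and ptmalloc a few years ago, ptmalloc was better. From what I know, ptmalloc has been around for several years and is quite widely used as a multithreaded allocator.
You might find this comparison useful.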
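The only way to really tell which memory allocator is right for your application is to try a few. All of the allocators mentioned were written by smart folks and will beat the others on one particular microbenchmark or another. If all your application does all day long is malloc one 8-byte chunk in thread A and free it in thread B, and doesn't need to handle anything else at all, you could probably write a memory allocator that beats the pants off any of those listed so far. It just won't be very useful for much else. :)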
I have some experience using Hoard where I work (enough so that one of the more obscure bugs addressed in the recent 3.8 release was found as a result of that experience). It's a very good allocator - but how good, for you, depends on your workload. And you do have to pay for Hoard (though it's not too expensive) in order to use it in a commercial project without GPL'ing your code.
A very slightly adapted ptmalloc2 has been the allocator behind glibc's malloc for quite a while now, and so it's incredibly widely used and tested. If stability is important above all things, it might be a good choice, but you didn't mention it in your list, so I'll assume it's out. For certain workloads, it's terrible - but the same is true of any general purpose malloc.
If you're willing to pay for it (and the price is reasonable, in my experience), SmartHeap SMP is also a good choice. Most of the other allocators mentioned are designed as drop-in malloc/free new/delete replacements that can be LD_PRELOAD'd. SmartHeap can be used that way as well, but it also includes an entire allocation-related API that lets you fine-tune your allocators to your heart's content. In tests that we've done (again, very specific to a particular application), SmartHeap was about the same as Hoard for performance when acting as a drop-in malloc replacement; the real difference between the two is the degree of customization. You can get better performance the less general-purpose you need your allocator to be.
And depending on your use case, a general-purpose multithreaded allocator might not be what you want to use at all; if you're constantly malloc & free'ing objects that are all the same size, you might want to just write a simple slab allocator. Slab allocation is used in several places in the Linux kernel that fit that description. (I would give you a couple more useful links, but I'm a "new user" and Stack Overflow has decided that new users are not allowed to be too helpful all in one answer. Google can help out well enough, though.)
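To make the same-size-object idea concrete, here is a minimal sketch of a fixed-size pool in the spirit of a slab allocator. It is not the Linux kernel's implementation, and `FixedPool` and its member names are invented for this example. It assumes single-threaded use (or one pool per thread); a shared pool would need a lock or per-thread caches on top.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hands out equally sized slots carved from one up-front block.
class FixedPool {
    struct Node { Node* next; };  // free slots double as free-list links

public:
    FixedPool(std::size_t object_size, std::size_t capacity)
        : slot_size_(round_up(object_size)),
          capacity_(capacity),
          storage_(static_cast<unsigned char*>(std::malloc(slot_size_ * capacity))),
          free_list_(nullptr),
          in_use_(0) {
        if (storage_ == nullptr) throw std::bad_alloc();
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i < capacity_; ++i) link(storage_ + i * slot_size_);
    }
    ~FixedPool() { std::free(storage_); }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // O(1): pop the head of the free list; nullptr when the pool is exhausted.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Node* n = free_list_;
        free_list_ = n->next;
        ++in_use_;
        return n;
    }

    // O(1): push the slot back; the most recently freed slot is reused first.
    void deallocate(void* p) {
        assert(owns(p));
        link(p);
        --in_use_;
    }

    std::size_t in_use() const { return in_use_; }

private:
    // Round up so each slot can hold a Node and stays maximally aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }
    void link(void* p) { free_list_ = new (p) Node{free_list_}; }
    bool owns(const void* p) const {
        auto* b = static_cast<const unsigned char*>(p);
        return b >= storage_ && b < storage_ + slot_size_ * capacity_ &&
               static_cast<std::size_t>(b - storage_) % slot_size_ == 0;
    }

    std::size_t slot_size_;
    std::size_t capacity_;
    unsigned char* storage_;
    Node* free_list_;
    std::size_t in_use_;
};
```

Because every slot is the same size, there is no searching and no splitting or coalescing, which is where a general-purpose malloc spends much of its time.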
Probably a late response to your question, but why do mallocs at all if you have performance hiccups?
A better way would be to malloc a big memory window at initialization and then come up with a lightweight memory manager that leases out memory chunks at run time.
This avoids any possibility of system calls for heap expansion.
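One way to read the "big window plus lightweight manager" suggestion is a bump-pointer arena. The sketch below is only one possible interpretation: `Arena`, `lease`, and `reset` are names invented here, and individual chunks cannot be returned, only the whole window at once. It is not thread-safe as written; one arena per thread is an assumption that sidesteps locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// One malloc at start-up; afterwards leases are pointer arithmetic only.
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : base_(static_cast<unsigned char*>(std::malloc(bytes))), size_(bytes), used_(0) {
        if (base_ == nullptr) throw std::bad_alloc();
    }
    ~Arena() { std::free(base_); }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Lease `bytes` bytes at an offset aligned to `align` (a power of two no larger
    // than alignof(std::max_align_t)); returns nullptr when the window is exhausted.
    void* lease(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > size_ || bytes > size_ - start) return nullptr;
        used_ = start + bytes;
        return base_ + start;
    }

    // Give back every lease at once, e.g. at the end of a request.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};
```

This fits workloads with a clear lifetime boundary (per request, per frame), where everything leased can be dropped together with `reset()`. If chunks must be freed individually, the manager needs free lists and starts to look like a small malloc of its own.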
You can try ltalloc (general purpose global memory allocator with speed of fast pool allocator).
The locklessinc allocator is very good, and the developer is responsive if you have questions. He wrote an article about some of the optimization tricks used; it's an interesting read: http://locklessinc.com/articles/allocator_tricks/. I've used it in the past with excellent results.