Analysis of Pimpl
Let's address your three questions one by one, from a hardware-oriented, low-level perspective:
1️⃣ Pointer dereference overhead vs direct member access
Mechanism
- Direct member access:

```cpp
obj.member;
```

→ The compiler generates:

```asm
mov rax, [rdi + offset]
```

One memory load from `obj`'s base address (stack or heap) at a fixed offset.
- Pointer + member access (Pimpl):

```cpp
obj.pImpl->member;
```

→ The compiler generates:

```asm
mov rax, [rdi]          ; load pImpl pointer
mov rax, [rax + offset] ; load member from Impl
```

This adds one extra memory load to fetch the pointer first.
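For concreteness, here is a minimal sketch of the two layouts being compared (the `DirectWidget`/`PimplWidget` names are illustrative, not from any real codebase):

```cpp
#include <memory>

struct Impl {
    int member;
};

struct DirectWidget {
    Impl impl;                    // stored inline: one load to read member
};

struct PimplWidget {
    std::unique_ptr<Impl> pImpl;  // stored behind a pointer: two loads to read member
};

int readDirect(const DirectWidget& w) { return w.impl.member; }   // 1 load
int readPimpl(const PimplWidget& w)   { return w.pImpl->member; } // 2 loads
```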
Timing
- Cache hit (L1): ~0.5–1 ns per load (~3–5 CPU cycles on modern CPUs).
- Cache miss (L2): ~4–12 ns.
- DRAM miss: ~50–100 ns.
But note:
- Both cases still need to access `member` in memory eventually.
- The extra pointer read will likely sit in the same cache line as the rest of the object (`obj`), so the cost difference is ~1 cycle when everything is cached.
🔹 Practical impact:
- Accessing via Pimpl is slightly slower (~1 extra load) but usually negligible compared to cache misses caused by member access itself.
- In tight loops over millions of objects, this might add roughly 5–10% overhead at most; the bigger risk is that each `Impl` lives in a separate heap block, hurting locality (see the sketch below).
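As a rough illustration, this is the kind of hot loop where the difference could show up (reusing the illustrative types from the sketch above):

```cpp
#include <vector>

// Direct layout: elements are contiguous, one load each, prefetch-friendly.
long sumDirect(const std::vector<DirectWidget>& v) {
    long total = 0;
    for (const auto& w : v) total += w.impl.member;
    return total;
}

// Pimpl layout: every element chases a pointer into a separate heap block,
// so each iteration does two loads and the Impl blocks may be cache-cold.
long sumPimpl(const std::vector<PimplWidget>& v) {
    long total = 0;
    for (const auto& w : v) total += w.pImpl->member;
    return total;
}
```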
2️⃣ Function call overhead: Pimpl vs Virtual Function
Pimpl function call

```cpp
void Widget::foo() { pImpl->foo(); }
```

- `Impl::foo()` is a non-virtual call and can be inlined if the compiler sees its definition (see the sketch below).
- Overhead:
  - Pointer dereference to `pImpl` (~1 cycle).
  - Direct function call (~0–1 cycles if inlined).
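A minimal sketch of the forwarding pattern (illustrative names; with LTO, or with `Impl`'s definition visible at the call site, the whole chain can collapse into a direct or fully inlined call):

```cpp
// widget.h
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();
    void foo();                       // public, non-virtual entry point
private:
    struct Impl;                      // incomplete in the header
    std::unique_ptr<Impl> pImpl;
};

// widget.cpp
struct Widget::Impl {
    int state = 0;
    void foo() { ++state; }           // the real work; trivially inlinable here
};

Widget::Widget() : pImpl(std::make_unique<Impl>()) {}
Widget::~Widget() = default;          // defined where Impl is complete

void Widget::foo() { pImpl->foo(); }  // one deref + a direct (often inlined) call
```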
Virtual function call

```cpp
obj->foo(); // foo is virtual
```

- Needs:
  - Load the vtable pointer.
  - Load the function pointer from the vtable.
  - Indirect call (can't be inlined easily).
- Overhead:
  - 2 memory loads (vtable pointer + function pointer).
  - Pipeline stall on a mispredicted indirect branch (~5–10 cycles in the worst case).
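For comparison, a minimal sketch of the virtual-dispatch case (illustrative names); a call through a `Base*` must go through the vtable:

```cpp
struct Base {
    virtual ~Base() = default;
    virtual void foo() = 0;
};

struct Derived : Base {
    int state = 0;
    void foo() override { ++state; }
};

void call(Base* obj) {
    obj->foo(); // load vptr, load the slot from the vtable, indirect call
}
```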
🔹 Result:
- A Pimpl call is cheaper than a virtual call, and can often be optimized away entirely if the compiler has visibility into `Impl` (especially with LTO).
- Virtual calls are inherently harder to inline and slightly more expensive due to the indirect branch.
3️⃣ Heap allocation with Pimpl vs direct member
Pimpl

```cpp
struct Widget {
    std::unique_ptr<Impl> pImpl;
};
```

`Impl` is usually constructed via `new`:

```cpp
pImpl = std::make_unique<Impl>();
```

- Always one heap allocation per object.
- Heap allocation cost: ~50–200 ns (varies with the allocator and fragmentation).
Direct member

```cpp
struct Widget {
    Impl impl;
};
```

- The `impl` memory is stored inline within the `Widget` object.
- Where `Widget` lives depends on how you allocate it: `Widget w;` → stack memory; `new Widget` → heap memory.
- No extra allocation is needed beyond wherever the object itself is stored.
- One fewer pointer dereference.
🔹 Result:
- Pimpl always incurs the extra heap allocation unless you use advanced "small buffer optimization" tricks (e.g., storing `Impl` inline in a suitably sized buffer, or using a custom arena allocator); a sketch follows below.
- A direct member avoids this, unless a member such as `std::vector` internally allocates memory of its own.
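A minimal sketch of one such trick, sometimes called "fast pimpl": an inline, suitably aligned buffer replaces the `unique_ptr`. The buffer size and alignment constants here are assumptions that must be kept in sync with the real `Impl` (the `static_assert` guards that), and the trick trades away part of the ABI-stability benefit, since the buffer size is baked into the header:

```cpp
// widget.h -- "fast pimpl": inline buffer instead of unique_ptr<Impl>.
#include <cstddef>
#include <new>

class Widget {
public:
    Widget();
    ~Widget();
    Widget(const Widget&) = delete;                // buffer must not be copied blindly
    Widget& operator=(const Widget&) = delete;
    void foo();
private:
    struct Impl;                                   // still incomplete in the header
    static constexpr std::size_t kImplSize  = 64;  // assumed upper bound
    static constexpr std::size_t kImplAlign = 8;   // assumed upper bound
    alignas(kImplAlign) unsigned char storage_[kImplSize];
    Impl& impl();
};

// widget.cpp
struct Widget::Impl {
    int state = 0;
    void foo() { ++state; }
};

Widget::Impl& Widget::impl() {
    return *std::launder(reinterpret_cast<Impl*>(storage_));
}

Widget::Widget() {
    static_assert(sizeof(Impl) <= kImplSize && alignof(Impl) <= kImplAlign,
                  "adjust kImplSize/kImplAlign to fit Impl");
    new (storage_) Impl();                         // placement new: no heap allocation
}

Widget::~Widget() { impl().~Impl(); }
void Widget::foo() { impl().foo(); }
```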
✅ Summary
| Aspect | Direct member | Pimpl (`unique_ptr<Impl>`) |
|---|---|---|
| Memory access | 1 load | 2 loads (pointer + member) |
| Typical overhead | baseline | ~1 extra cycle; negligible unless on a very hot path |
| Function call | Direct/inlined | Direct/inlined with visibility or LTO (cheaper than virtual) |
| Allocation | Inline (stack or heap) | Always one extra heap allocation |
| ABI stability, encapsulation | Poor | Excellent |
So:
- Overhead from pointer access: ~1 extra CPU cycle (negligible except on extremely hot paths).
- Function call overhead: Pimpl < Virtual.
- Heap allocation: Always present for Pimpl (unless optimized away manually); direct members can be stack or heap allocated as needed.
Would you like me to prepare a small C++ benchmark (Google Benchmark) that measures:
- Direct member access
- Pimpl access
- Heap allocation cost
So you can see nanosecond-level differences on real hardware?
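In the meantime, here is a minimal sketch of what such a benchmark could look like (this assumes Google Benchmark is installed; the type and function names are illustrative):

```cpp
#include <benchmark/benchmark.h>
#include <memory>
#include <vector>

struct Impl { int member = 1; };
struct DirectWidget { Impl impl; };
struct PimplWidget { std::unique_ptr<Impl> pImpl = std::make_unique<Impl>(); };

static void BM_DirectAccess(benchmark::State& state) {
    std::vector<DirectWidget> v(1'000'000);
    for (auto _ : state) {
        long total = 0;
        for (const auto& w : v) total += w.impl.member;
        benchmark::DoNotOptimize(total);   // keep the loop from being elided
    }
}
BENCHMARK(BM_DirectAccess);

static void BM_PimplAccess(benchmark::State& state) {
    std::vector<PimplWidget> v(1'000'000);
    for (auto _ : state) {
        long total = 0;
        for (const auto& w : v) total += w.pImpl->member;
        benchmark::DoNotOptimize(total);
    }
}
BENCHMARK(BM_PimplAccess);

static void BM_HeapAllocation(benchmark::State& state) {
    for (auto _ : state) {
        auto p = std::make_unique<Impl>(); // one allocation + deallocation per iteration
        benchmark::DoNotOptimize(p.get());
    }
}
BENCHMARK(BM_HeapAllocation);

BENCHMARK_MAIN();
```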