Analysis of Pimpl
Let's address your three questions one by one, from a hardware-oriented, low-level perspective:
1️⃣ Pointer dereference overhead vs direct member access
Mechanism
- Direct member access:

```cpp
obj.member;
```

→ The compiler generates:

```asm
mov rax, [rdi + offset]
```

One memory load from `obj`'s base address (stack or heap) at a fixed offset.
- Pointer + member access (Pimpl):

```cpp
obj.pImpl->member;
```

→ The compiler generates:

```asm
mov rax, [rdi]          ; load pImpl pointer
mov rax, [rax + offset] ; load member from Impl
```

This adds one extra memory load to fetch the pointer first.
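For concreteness, here is a minimal sketch of the two layouts being compared (the `DirectWidget`/`PimplWidget` names are illustrative, not from any real codebase):

```cpp
#include <memory>

struct Impl {
    int member;
};

struct DirectWidget {
    Impl impl;                    // stored inline: one load to read member
};

struct PimplWidget {
    std::unique_ptr<Impl> pImpl;  // stored behind a pointer: two loads to read member
};

int readDirect(const DirectWidget& w) { return w.impl.member; }   // 1 load
int readPimpl(const PimplWidget& w)   { return w.pImpl->member; } // 2 loads
```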
Timing
- Cache hit (L1): ~0.5–1 ns per load (~3–5 CPU cycles on modern CPUs).
- Cache miss (L2): ~4–12 ns.
- DRAM miss: ~50–100 ns.
But note:
- Both cases still need to access `member` in memory eventually.
- The extra pointer read will likely sit in the same cache line as the rest of the object (`obj`), so the cost difference is ~1 cycle when everything is cached.
🔹 Practical impact:
- Accessing via Pimpl is slightly slower (~1 extra load) but usually negligible compared to cache misses caused by member access itself.
- In tight loops over millions of objects, this might add roughly 5–10% overhead at most; the bigger risk is that each `Impl` lives in a separate heap block, hurting locality (see the sketch below).
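As a rough illustration, this is the kind of hot loop where the difference could show up (reusing the illustrative types from the sketch above):

```cpp
#include <vector>

// Direct layout: elements are contiguous, one load each, prefetch-friendly.
long sumDirect(const std::vector<DirectWidget>& v) {
    long total = 0;
    for (const auto& w : v) total += w.impl.member;
    return total;
}

// Pimpl layout: every element chases a pointer into a separate heap block,
// so each iteration does two loads and the Impl blocks may be cache-cold.
long sumPimpl(const std::vector<PimplWidget>& v) {
    long total = 0;
    for (const auto& w : v) total += w.pImpl->member;
    return total;
}
```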
2️⃣ Function call overhead: Pimpl vs Virtual Function
Pimpl function call

```cpp
void Widget::foo() { pImpl->foo(); }
```

- `Impl::foo()` is a non-virtual call and can be inlined if the compiler sees its definition (see the sketch below).
- Overhead:
  - Pointer dereference to `pImpl` (~1 cycle).
  - Direct function call (~0–1 cycles if inlined).
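A minimal sketch of the forwarding pattern (illustrative names; with LTO, or with `Impl`'s definition visible at the call site, the whole chain can collapse into a direct or fully inlined call):

```cpp
// widget.h
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();
    void foo();                       // public, non-virtual entry point
private:
    struct Impl;                      // incomplete in the header
    std::unique_ptr<Impl> pImpl;
};

// widget.cpp
struct Widget::Impl {
    int state = 0;
    void foo() { ++state; }           // the real work; trivially inlinable here
};

Widget::Widget() : pImpl(std::make_unique<Impl>()) {}
Widget::~Widget() = default;          // defined where Impl is complete

void Widget::foo() { pImpl->foo(); }  // one deref + a direct (often inlined) call
```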
Virtual function call

```cpp
obj->foo(); // foo is virtual
```

- Needs:
  - Load the vtable pointer.
  - Load the function pointer from the vtable.
  - Indirect call (can't be inlined easily).
- Overhead:
  - 2 memory loads (vtable pointer + function pointer).
  - Pipeline stall on a mispredicted indirect branch (~5–10 cycles in the worst case).
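For comparison, a minimal sketch of the virtual-dispatch case (illustrative names); a call through a `Base*` must go through the vtable:

```cpp
struct Base {
    virtual ~Base() = default;
    virtual void foo() = 0;
};

struct Derived : Base {
    int state = 0;
    void foo() override { ++state; }
};

void call(Base* obj) {
    obj->foo(); // load vptr, load the slot from the vtable, indirect call
}
```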
🔹 Result:
- A Pimpl call is cheaper than a virtual call, and can often be optimized away entirely if the compiler has visibility into `Impl` (especially with LTO).
- Virtual calls are inherently harder to inline and slightly more expensive due to the indirect branch.
3️⃣ Heap allocation with Pimpl vs direct member
Pimpl

```cpp
struct Widget {
    std::unique_ptr<Impl> pImpl;
};
```

`Impl` is usually constructed via `new`:

```cpp
pImpl = std::make_unique<Impl>();
```

- Always one heap allocation per object.
- Heap allocation cost: ~50–200 ns (varies with the allocator and fragmentation).
Direct member

```cpp
struct Widget {
    Impl impl;
};
```

- The `impl` memory is stored inline within the `Widget` object.
- Where `Widget` lives depends on how you allocate it: `Widget w;` → stack memory; `new Widget` → heap memory.
- No extra allocation is needed beyond wherever the object itself is stored.
- One fewer pointer dereference.
🔹 Result:
- Pimpl always incurs the extra heap allocation unless you use advanced "small buffer optimization" tricks (e.g., storing `Impl` inline in a suitably sized buffer, or using a custom arena allocator); a sketch follows below.
- A direct member avoids this, unless a member such as `std::vector` internally allocates memory of its own.
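A minimal sketch of one such trick, sometimes called "fast pimpl": an inline, suitably aligned buffer replaces the `unique_ptr`. The buffer size and alignment constants here are assumptions that must be kept in sync with the real `Impl` (the `static_assert` guards that), and the trick trades away part of the ABI-stability benefit, since the buffer size is baked into the header:

```cpp
// widget.h -- "fast pimpl": inline buffer instead of unique_ptr<Impl>.
#include <cstddef>
#include <new>

class Widget {
public:
    Widget();
    ~Widget();
    Widget(const Widget&) = delete;                // buffer must not be copied blindly
    Widget& operator=(const Widget&) = delete;
    void foo();
private:
    struct Impl;                                   // still incomplete in the header
    static constexpr std::size_t kImplSize  = 64;  // assumed upper bound
    static constexpr std::size_t kImplAlign = 8;   // assumed upper bound
    alignas(kImplAlign) unsigned char storage_[kImplSize];
    Impl& impl();
};

// widget.cpp
struct Widget::Impl {
    int state = 0;
    void foo() { ++state; }
};

Widget::Impl& Widget::impl() {
    return *std::launder(reinterpret_cast<Impl*>(storage_));
}

Widget::Widget() {
    static_assert(sizeof(Impl) <= kImplSize && alignof(Impl) <= kImplAlign,
                  "adjust kImplSize/kImplAlign to fit Impl");
    new (storage_) Impl();                         // placement new: no heap allocation
}

Widget::~Widget() { impl().~Impl(); }
void Widget::foo() { impl().foo(); }
```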
✅ Summary
| Aspect | Direct member | Pimpl (`unique_ptr<Impl>`) |
|---|---|---|
| Memory access | 1 load | 2 loads (pointer + member) |
| Typical overhead | baseline | ~1 extra cycle; negligible unless on a very hot path |
| Function call | Direct/inlined | Direct/inlined with visibility or LTO (cheaper than virtual) |
| Allocation | Inline (stack or heap) | Always one extra heap allocation |
| ABI stability, encapsulation | Poor | Excellent |
So:
- Overhead from pointer access: ~1 extra CPU cycle (negligible except on extremely hot paths).
- Function call overhead: Pimpl < Virtual.
- Heap allocation: Always present for Pimpl (unless optimized away manually); direct members can be stack or heap allocated as needed.
Would you like me to prepare a small C++ benchmark (Google Benchmark) that measures:
- Direct member access
- Pimpl access
- Heap allocation cost
So you can see nanosecond-level differences on real hardware?
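In the meantime, here is a minimal sketch of what such a benchmark could look like (this assumes Google Benchmark is installed; the type and function names are illustrative):

```cpp
#include <benchmark/benchmark.h>
#include <memory>
#include <vector>

struct Impl { int member = 1; };
struct DirectWidget { Impl impl; };
struct PimplWidget { std::unique_ptr<Impl> pImpl = std::make_unique<Impl>(); };

static void BM_DirectAccess(benchmark::State& state) {
    std::vector<DirectWidget> v(1'000'000);
    for (auto _ : state) {
        long total = 0;
        for (const auto& w : v) total += w.impl.member;
        benchmark::DoNotOptimize(total);   // keep the loop from being elided
    }
}
BENCHMARK(BM_DirectAccess);

static void BM_PimplAccess(benchmark::State& state) {
    std::vector<PimplWidget> v(1'000'000);
    for (auto _ : state) {
        long total = 0;
        for (const auto& w : v) total += w.pImpl->member;
        benchmark::DoNotOptimize(total);
    }
}
BENCHMARK(BM_PimplAccess);

static void BM_HeapAllocation(benchmark::State& state) {
    for (auto _ : state) {
        auto p = std::make_unique<Impl>(); // one allocation + deallocation per iteration
        benchmark::DoNotOptimize(p.get());
    }
}
BENCHMARK(BM_HeapAllocation);

BENCHMARK_MAIN();
```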