Analysis of PIMPL

Let's address your three questions one by one from a hardware-oriented, low-level perspective:


1️⃣ Pointer dereference overhead vs direct member access

Mechanism

  • Direct member access:
obj.member;

→ Compiler generates:

mov rax, [rdi + offset]

One memory load from obj’s base address (stack or heap) at a fixed offset.


  • Pointer + member access (Pimpl):
obj.pImpl->member;

→ Compiler generates:

mov rax, [rdi]          ; load pImpl pointer
mov rax, [rax + offset] ; load member from Impl

This adds one extra memory load to fetch the pointer first.
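
For reference, a minimal side-by-side sketch of the two layouts (Direct, Pimpl, and Impl are illustrative names, not from any real codebase):

struct Impl {
    int member;
};

struct Direct {
    Impl impl;                    // stored inline: one load at [this + offset]
};

struct Pimpl {
    Impl* pImpl;                  // only the pointer is stored inline
    int get() const {
        return pImpl->member;     // two loads: [this], then [pImpl + offset]
    }
};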


Timing

  • L1 hit: ~0.5–1 ns per load (~3–5 CPU cycles on modern CPUs).
  • L1 miss served from L2/L3: ~4–12 ns.
  • Miss all the way to DRAM: ~50–100 ns.

But note:

  • Both cases still need to load the member itself from memory eventually.
  • The extra pointer read usually lands in the same cache line as the rest of obj, so it costs ~1 cycle when cached. The bigger risk is locality: the Impl data lives in a separate heap allocation, so iterating over many objects touches roughly twice as many cache lines.

🔹 Practical impact:

  • Accessing via Pimpl is slightly slower (~1 extra load), but usually negligible next to the cache misses caused by the member access itself.
  • In tight loops over millions of objects, this can add on the order of ~5–10% overhead, more if the extra indirection causes additional cache misses; a minimal way to measure this is sketched below.
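
A self-contained sketch of such a measurement using std::chrono (the struct names are illustrative; absolute numbers vary with compiler flags, CPU, and allocator, and an aggressive optimizer may transform the loops):

#include <chrono>
#include <cstdio>
#include <memory>
#include <vector>

struct Impl { long member = 1; };
struct Direct { Impl impl; };
struct Pimpl { std::unique_ptr<Impl> pImpl = std::make_unique<Impl>(); };

int main() {
    std::vector<Direct> direct(1'000'000);  // Impl data contiguous, inline
    std::vector<Pimpl>  pimpl(1'000'000);   // 1M scattered heap allocations

    long sum = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (const auto& d : direct) sum += d.impl.member;    // 1 load per object
    auto t1 = std::chrono::steady_clock::now();
    for (const auto& p : pimpl)  sum += p.pImpl->member;  // 2 loads per object
    auto t2 = std::chrono::steady_clock::now();

    using ns = std::chrono::nanoseconds;
    std::printf("direct: %lld ns, pimpl: %lld ns (sum=%ld)\n",
                (long long)std::chrono::duration_cast<ns>(t1 - t0).count(),
                (long long)std::chrono::duration_cast<ns>(t2 - t1).count(), sum);
}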

2️⃣ Function call overhead: Pimpl vs Virtual Function

Pimpl function call

void Widget::foo() { pImpl->foo(); }
  • foo() is a non-virtual call and can be inlined if the compiler sees the definition (the usual header/source split is sketched after this list).

  • Overhead:

    • Pointer deref to pImpl (~1 cycle).
    • Direct call (~0 cycles if inlined; a few cycles for call/ret otherwise).
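
Concretely, the usual header/source split looks roughly like this (a sketch; Impl::foo can inline into Widget::foo inside Widget.cpp, while callers in other TUs need LTO to inline Widget::foo itself):

// Widget.h — Impl is only forward-declared here
#include <memory>

class Widget {
public:
    Widget();            // defined in Widget.cpp (see section 3 below)
    ~Widget();           // must be out of line: Impl is incomplete here
    void foo();
private:
    struct Impl;
    std::unique_ptr<Impl> pImpl;
};

// Widget.cpp — Impl is complete, so Impl::foo() can inline into Widget::foo()
struct Widget::Impl {
    void foo() { /* real work */ }
};

void Widget::foo() { pImpl->foo(); }  // non-virtual call through pImpl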

Virtual function call

obj->foo(); // foo is virtual
  • Needs:

    • Load vtable pointer.
    • Load function pointer from vtable.
    • Indirect call (can't be inlined easily).
  • Overhead:

    • 2 memory loads (vtable + function pointer).
    • A possible pipeline stall when the indirect branch is mispredicted (~10–20 cycles on a misprediction, near-free when predicted correctly); the dispatch sequence is sketched below.
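
In machine terms, the dispatch looks roughly like this (Itanium-ABI-style layout; register and offset names are illustrative):

mov rax, [rdi]         ; load the vtable pointer from the object
call [rax + slot]      ; indirect call through the vtable entry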

🔹 Result:

  • A Pimpl call is cheaper than a virtual call and can often be optimized away entirely if the compiler has visibility (especially with LTO).
  • Virtual calls are inherently harder to inline and slightly more expensive due to the indirect branch.

3️⃣ Heap allocation with Pimpl vs direct member

Pimpl

struct Widget {
    std::unique_ptr<Impl> pImpl;
};
  • Impl is usually constructed via new:
pImpl = std::make_unique<Impl>();
  • Always one heap allocation per object.
  • Heap allocation cost: ~50–200 ns (varies with the allocator, fragmentation, and contention). See the note after this list.
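
One detail this implies (a sketch): the allocating constructor and the destructor must both be defined in Widget.cpp, where Impl is a complete type, or std::unique_ptr<Impl> cannot instantiate its deleter:

// Widget.cpp
Widget::Widget() : pImpl(std::make_unique<Impl>()) {}  // the one heap allocation
Widget::~Widget() = default;                           // Impl is complete here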

Direct member

struct Widget {
    Impl impl;
};
  • The impl memory is stored inline with the Widget object.

  • Where Widget lives depends on how you allocate it:

    • Widget w; → stack memory.
    • new Widget → heap memory.
  • No extra allocation needed beyond where the object is stored.

  • One fewer pointer dereference.


🔹 Result:

  • Pimpl always incurs the extra heap allocation unless you use "small buffer optimization" tricks: the "fast Pimpl" idiom (placement-new Impl into an aligned in-object buffer, sketched after this list) or a custom arena allocator. Note that std::optional<Impl> in the header does not work, because std::optional requires a complete type.
  • A direct member avoids the extra allocation, unless the member itself (e.g., a std::vector) allocates internally.
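
A minimal sketch of that fast-Pimpl variant: placement-new Impl into aligned in-object storage. The buffer size and alignment here are assumptions that must track the real Impl (hence the static_asserts), and copy/move handling is omitted for brevity:

// Widget.h — storage lives inside Widget, no heap allocation
#include <new>

class Widget {
public:
    Widget();
    ~Widget();
private:
    struct Impl;
    alignas(16) unsigned char storage_[64];  // assumed upper bound for Impl
    Impl* impl_;                             // points into storage_
};

// Widget.cpp — Impl is complete here
struct Widget::Impl { /* ... */ };

Widget::Widget() {
    static_assert(sizeof(Impl) <= sizeof(storage_), "enlarge storage_");
    static_assert(alignof(Impl) <= 16, "raise the alignas");
    impl_ = ::new (storage_) Impl{};         // placement new into the buffer
}
Widget::~Widget() { impl_->~Impl(); }        // manual destruction, no delete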

✅ Summary

| Aspect | Direct member | Pimpl (unique_ptr<Impl>) |
|---|---|---|
| Memory access | 1 load | 2 loads (pointer + member) |
| Typical access overhead | baseline | ~1 extra cycle; negligible unless very hot path |
| Function call | direct / inlined | direct / inlined (cheaper than virtual) |
| Allocation | inline (stack or heap) | always a heap allocation (extra cost) |
| ABI stability, encapsulation | poor | excellent |

So:

  • Overhead from pointer access: ~1 CPU cycle (negligible unless extremely hot).
  • Function call overhead: Pimpl < Virtual.
  • Heap allocation: Always present for Pimpl (unless optimized away manually); direct members can be stack or heap allocated as needed.

Would you like me to prepare a small C++ benchmark (Google Benchmark) that measures:

  1. Direct member access
  2. Pimpl access
  3. Heap allocation cost

so you can see nanosecond-level differences on real hardware?
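
For reference, such a benchmark could look roughly like this with Google Benchmark (a sketch assuming the library is available; it reuses the illustrative Direct/Pimpl structs from above):

#include <benchmark/benchmark.h>
#include <memory>

struct Impl { long member = 1; };
struct Direct { Impl impl; };
struct Pimpl { std::unique_ptr<Impl> pImpl = std::make_unique<Impl>(); };

static void BM_DirectAccess(benchmark::State& state) {
    Direct d;
    for (auto _ : state)
        benchmark::DoNotOptimize(d.impl.member);   // 1 load
}
BENCHMARK(BM_DirectAccess);

static void BM_PimplAccess(benchmark::State& state) {
    Pimpl p;
    for (auto _ : state)
        benchmark::DoNotOptimize(p.pImpl->member); // 2 loads
}
BENCHMARK(BM_PimplAccess);

static void BM_HeapAlloc(benchmark::State& state) {
    for (auto _ : state) {
        auto p = std::make_unique<Impl>();         // alloc + free per iteration
        benchmark::DoNotOptimize(p);
    }
}
BENCHMARK(BM_HeapAlloc);

BENCHMARK_MAIN();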