How std::function erases the type and storage differences between functors

intro

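std::function feels a lot like something from a dynamic language such as Python: a single type can hold function pointers, class objects, lambda expressions, and other kinds of callables. It offers the runtime flexibility of a dynamic language together with the compile-time safety of a static one.

Going one step further, function objects initialized from different kinds of targets can be placed in the same array. An array is one of the most restrictive structures in C/C++: every element must have the same type and the same size. So how do function objects initialized from different targets end up with the same static type and memory layout? A function initialized from a large object clearly needs a different amount of storage than one initialized from a plain function, and the underlying call logic differs as well.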
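The observation above can be checked with a small sketch. The names add_one, Scale, and sum_of_calls are made up for this illustration; only std::function itself comes from the standard library.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callables used only for this illustration.
int add_one(int x) { return x + 1; }

struct Scale
{
    int factor;
    int padding[16];   // makes the object much larger than a pointer
    int operator()(int x) const { return x * factor; }
};

// Calls every element of one homogeneous array with the same argument
// and returns the sum of the results.
int sum_of_calls(int arg)
{
    int offset = 10;
    std::function<int(int)> fs[] = {
        add_one,                                // plain function
        Scale{3, {}},                           // large class object
        [offset](int x) { return x + offset; }  // capturing lambda
    };

    // All three elements share one static type, hence one size.
    static_assert(sizeof(fs) == 3 * sizeof(std::function<int(int)>),
                  "array elements must have identical size");

    int sum = 0;
    for (const auto &f : fs)
    {
        assert(f != nullptr);   // each element holds a target
        sum += f(arg);
    }
    return sum;
}
```

Calling sum_of_calls(2) adds up 3, 6, and 12, even though the three targets differ in both type and size.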

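Memory

Memory size

Whatever the target is, it ultimately needs a certain amount of storage. The most basic storage unit in C is char, so the storage needed by the different target types is uniformly declared as an array of char. What the bytes hold depends on the concrete type: for a function, a function pointer is enough; for a larger object, the bytes may instead hold a pointer to an object allocated on the heap. How the bytes are interpreted is decided at runtime by a function that casts them back to the specific type.

In function, the structure that reserves this storage is _Any_data. Every function object, no matter what it was initialized from, must fit its storage into this one type.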

///@file: gcc\libstdc++-v3\include\bits\std_function.h
  union _Nocopy_types
  {
    void*       _M_object;
    const void* _M_const_object;
    void (*_M_function_pointer)();
    void (_Undefined_class::*_M_member_pointer)();
  };

  union [[gnu::may_alias]] _Any_data
  {
    void*       _M_access()       noexcept { return &_M_pod_data[0]; }
    const void* _M_access() const noexcept { return &_M_pod_data[0]; }

    template<typename _Tp>
      _Tp&
      _M_access() noexcept
      { return *static_cast<_Tp*>(_M_access()); }

    template<typename _Tp>
      const _Tp&
      _M_access() const noexcept
      { return *static_cast<const _Tp*>(_M_access()); }

    _Nocopy_types _M_unused;
    char _M_pod_data[sizeof(_Nocopy_types)];
  };
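To make the idea concrete, here is a minimal sketch of the same trick outside the library. Storage, twice, Big, and the two helper functions are hypothetical names for this example, not libstdc++ code; the union only imitates the role that _Any_data plays.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for _Any_data: pointer-sized, pointer-aligned bytes.
union Storage
{
    void *unused;                 // only here to force size and alignment
    char  bytes[sizeof(void *)];

    // Reinterpret the bytes as a T, in the spirit of _M_access<_Tp>().
    template <typename T>
    T &as() { return *static_cast<T *>(static_cast<void *>(bytes)); }
};

int twice(int x) { return 2 * x; }

struct Big { double values[8]; };
static_assert(sizeof(Big) > sizeof(Storage), "Big cannot be stored inline");

// Case 1: a function pointer fits, so it lives directly in the bytes.
int call_stored_function(int arg)
{
    using Fn = int (*)(int);
    Storage s;
    ::new (static_cast<void *>(s.bytes)) Fn(twice);
    return s.as<Fn>()(arg);
}

// Case 2: a large object lives on the heap; the bytes hold only its address.
double read_heap_object()
{
    using BigPtr = Big *;
    Storage s;
    ::new (static_cast<void *>(s.bytes)) BigPtr(new Big{{1.5, 2.5}});
    assert(s.as<BigPtr>() != nullptr);
    double second = s.as<BigPtr>()->values[1];
    delete s.as<BigPtr>();        // the owner of the bytes must free it
    return second;
}
```

The same eight bytes are read as a function pointer in one case and as an object pointer in the other; nothing in the type Storage records which one it is.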

local vs heap

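Clearly, when a function is initialized from an object larger than _Any_data, the object cannot fit inside it; a bare function pointer, on the other hand, can be stored in _Any_data directly. Fortunately, whether a functor is stored locally or allocated on the heap (with the address of the heap block saved in _Any_data) can be decided at compile time.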

///@file: libstdc++-v3\include\bits\std_function.h
    template<typename _Functor>
      class _Base_manager
      {
      protected:
	static const bool __stored_locally =
	(__is_location_invariant<_Functor>::value
	 && sizeof(_Functor) <= _M_max_size
	 && __alignof__(_Functor) <= _M_max_align
	 && (_M_max_align % __alignof__(_Functor) == 0));

	using _Local_storage = integral_constant<bool, __stored_locally>;
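
The condition in this excerpt can be reproduced as a standalone trait. This is a sketch under assumptions: Slot, kMaxSize, kMaxAlign, StoredLocally, and the three test structs are hypothetical names, and std::is_trivially_copyable is used here as a stand-in for __is_location_invariant.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical inline slot; it plays the role of _Nocopy_types.
union Slot
{
    void *object;
    void (*function)();
};
constexpr std::size_t kMaxSize  = sizeof(Slot);
constexpr std::size_t kMaxAlign = alignof(Slot);

// True when F may be constructed directly inside the slot.
template <typename F>
struct StoredLocally
    : std::integral_constant<bool,
          std::is_trivially_copyable<F>::value   // assumed stand-in
          && sizeof(F) <= kMaxSize
          && alignof(F) <= kMaxAlign
          && kMaxAlign % alignof(F) == 0>
{};

struct Small { int id; };
struct Big { char text[256]; };
struct Tracked
{
    int id;
    Tracked(const Tracked &other) : id(other.id) {}  // not trivially copyable
};

static_assert(StoredLocally<int (*)(int)>::value, "function pointer: local");
static_assert(StoredLocally<Small>::value, "small trivial struct: local");
static_assert(!StoredLocally<Big>::value, "too large: heap");
static_assert(!StoredLocally<Tracked>::value, "small but non-trivial: heap");
```

Note the last case: Tracked is small enough, yet it still goes to the heap because size is only one of the conditions.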

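Depending on this compile-time value, the functor is then placed in local or heap storage. Note that the condition checks more than size: the functor must also be location-invariant and suitably aligned. Releasing the memory needs the matching treatment, of course.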


	template<typename _Fn>
	  static void
	  _M_init_functor(_Any_data& __functor, _Fn&& __f)
	  noexcept(__and_<_Local_storage,
			  is_nothrow_constructible<_Functor, _Fn>>::value)
	  {
	    _M_create(__functor, std::forward<_Fn>(__f), _Local_storage());
	  }
	  
	// Construct a location-invariant function object that fits within
	// an _Any_data structure.
	template<typename _Fn>
	  static void
	  _M_create(_Any_data& __dest, _Fn&& __f, true_type)
	  {
	    ::new (__dest._M_access()) _Functor(std::forward<_Fn>(__f));
	  }

	// Construct a function object on the heap and store a pointer.
	template<typename _Fn>
	  static void
	  _M_create(_Any_data& __dest, _Fn&& __f, false_type)
	  {
	    __dest._M_access<_Functor*>()
	      = new _Functor(std::forward<_Fn>(__f));
	  }
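
The tag dispatch used by _M_create, together with the matching release step, might look like this in isolation. Buffer, Manager, Tracked, and the two round-trip functions are hypothetical; the local-storage test is simplified compared with libstdc++.

```cpp
#include <cassert>
#include <new>
#include <type_traits>
#include <utility>

struct Buffer
{
    alignas(void *) char bytes[sizeof(void *)];
};

template <typename F>
struct Manager
{
    // Simplified compile-time decision: inline or heap.
    using Local = std::integral_constant<bool,
        std::is_trivially_copyable<F>::value
        && sizeof(F) <= sizeof(Buffer)
        && alignof(F) <= alignof(Buffer)>;

    static void create(Buffer &b, F f) { create(b, std::move(f), Local()); }
    static F &get(Buffer &b) { return get(b, Local()); }
    static void destroy(Buffer &b) { destroy(b, Local()); }

private:
    using Ptr = F *;
    static void *raw(Buffer &b) { return b.bytes; }

    // Inline case: the object itself lives in the buffer.
    static void create(Buffer &b, F &&f, std::true_type)
    { ::new (raw(b)) F(std::move(f)); }
    static F &get(Buffer &b, std::true_type)
    { return *static_cast<F *>(raw(b)); }
    static void destroy(Buffer &b, std::true_type)
    { get(b, std::true_type()).~F(); }

    // Heap case: the buffer holds only a pointer to the object.
    static void create(Buffer &b, F &&f, std::false_type)
    { ::new (raw(b)) Ptr(new F(std::move(f))); }
    static F &get(Buffer &b, std::false_type)
    { return **static_cast<Ptr *>(raw(b)); }
    static void destroy(Buffer &b, std::false_type)
    { delete &get(b, std::false_type()); }
};

// Counts live instances so the heap path can be observed.
struct Tracked
{
    static int live;
    Tracked() { ++live; }
    Tracked(const Tracked &) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

int local_round_trip()
{
    Buffer b;
    Manager<int>::create(b, 42);
    int value = Manager<int>::get(b);
    Manager<int>::destroy(b);
    return value;
}

// Returns the number of live Tracked objects while only the stored copy exists.
int heap_round_trip()
{
    Buffer b;
    {
        Tracked original;
        Manager<Tracked>::create(b, original);
    }
    int while_stored = Tracked::live;
    Manager<Tracked>::destroy(b);
    return while_stored;
}
```

The caller never writes an if; the third argument selects the overload at compile time, and destroy has to use the same tag as create or it would free the wrong thing.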

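Function pointer

Once the storage type is unified, the meaning of those bytes still has to be interpreted according to the concrete target, and that job is done by _M_invoker in function. It is a function pointer whose parameters are an _Any_data (the structure described above) followed by the argument types given by function's template parameter.

This pointer of type _Invoker_type points to a different concrete function for each functor type; that function decodes the _Any_data and then calls the functor.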

    private:
      using _Invoker_type = _Res (*)(const _Any_data&, _ArgTypes&&...);
      _Invoker_type _M_invoker = nullptr;
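
As a rough sketch of what such an invoker does, the two function templates below share one pointer type but decode the bytes differently. Buffer, Invoker, invoke_inline, invoke_heap, and the demo helpers are hypothetical names, and the signature is fixed to int(int) to keep the example short.

```cpp
#include <cassert>
#include <new>

struct Buffer
{
    alignas(void *) char bytes[sizeof(void *)];
};

// One pointer type for every target, like _Invoker_type.
using Invoker = int (*)(const Buffer &, int);

// Decodes the bytes as the functor itself.
template <typename F>
int invoke_inline(const Buffer &b, int arg)
{
    const F &f = *static_cast<const F *>(static_cast<const void *>(b.bytes));
    return f(arg);
}

// Decodes the bytes as a pointer to a functor on the heap.
template <typename F>
int invoke_heap(const Buffer &b, int arg)
{
    F *const *slot = static_cast<F *const *>(static_cast<const void *>(b.bytes));
    return (**slot)(arg);
}

int negate_value(int x) { return -x; }

struct AddN
{
    int n;
    int padding[32];   // too large for the buffer
    int operator()(int x) const { return x + n; }
};

int demo_inline(int arg)
{
    using Fn = int (*)(int);
    Buffer b;
    ::new (static_cast<void *>(b.bytes)) Fn(negate_value);
    Invoker inv = &invoke_inline<Fn>;
    return inv(b, arg);
}

int demo_heap(int arg)
{
    using Ptr = AddN *;
    Buffer b;
    ::new (static_cast<void *>(b.bytes)) Ptr(new AddN{5, {}});
    Invoker inv = &invoke_heap<AddN>;
    int result = inv(b, arg);
    delete *static_cast<Ptr *>(static_cast<void *>(b.bytes));
    return result;
}
```

Both inv variables have exactly the same type; the knowledge of Fn and AddN exists only inside the instantiated functions they point to.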

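The implementation again generates a different function pointer for each functor type. Based on the functor passed in, the constructor of function names a _Handler<_Functor> type, and each instantiation of that template has its own static member functions. The constructor assigns the addresses of those static functions to the function pointers stored in the object.

As a result, the type information of each functor effectively lives inside the function that the pointer targets. Because function itself stores only function pointers of one uniform type, all function objects share the same static type.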


      /**
       *  @brief Builds a %function that targets a copy of the incoming
       *  function object.
       *  @param __f A %function object that is callable with parameters of
       *  type `ArgTypes...` and returns a value convertible to `Res`.
       *
       *  The newly-created %function object will target a copy of
       *  `__f`. If `__f` is `reference_wrapper<F>`, then this function
       *  object will contain a reference to the function object `__f.get()`.
       *  If `__f` is a null function pointer, null pointer-to-member, or
       *  empty `std::function`, the newly-created object will be empty.
       *
       *  If `__f` is a non-null function pointer or an object of type
       *  `reference_wrapper<F>`, this function will not throw.
       */
      // _GLIBCXX_RESOLVE_LIB_DEFECTS
      // 2774. std::function construction vs assignment
      template<typename _Functor,
	       typename _Constraints = _Requires<_Callable<_Functor>>>
	function(_Functor&& __f)
	noexcept(_Handler<_Functor>::template _S_nothrow_init<_Functor>())
	: _Function_base()
	{
	  static_assert(is_copy_constructible<__decay_t<_Functor>>::value,
	      "std::function target must be copy-constructible");
	  static_assert(is_constructible<__decay_t<_Functor>, _Functor>::value,
	      "std::function target must be constructible from the "
	      "constructor argument");

	  using _My_handler = _Handler<_Functor>;

	  if (_My_handler::_M_not_empty_function(__f))
	    {
	      _My_handler::_M_init_functor(_M_functor,
					   std::forward<_Functor>(__f));
	      _M_invoker = &_My_handler::_M_invoke;
	      _M_manager = &_My_handler::_M_manager;
	    }
	}
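
Putting the constructor pattern together, a stripped-down callable wrapper could be sketched as follows. MiniFunction, its Handler, triple, and call_all are hypothetical names; unlike std::function this sketch always stores the target on the heap, is move-only, and does not check for an empty target.

```cpp
#include <cassert>
#include <utility>

template <typename Signature>
class MiniFunction;   // only the partial specialization below is defined

template <typename R, typename... Args>
class MiniFunction<R(Args...)>
{
    void *target_ = nullptr;                        // heap copy of the target
    R (*invoker_)(void *, Args &&...) = nullptr;    // knows how to call it
    void (*deleter_)(void *) = nullptr;             // knows how to free it

    // One set of static functions per target type F.
    template <typename F>
    struct Handler
    {
        static R invoke(void *p, Args &&...args)
        { return (*static_cast<F *>(p))(std::forward<Args>(args)...); }
        static void destroy(void *p) { delete static_cast<F *>(p); }
    };

public:
    // F is deduced from the argument and selects the Handler instantiation.
    template <typename F>
    MiniFunction(F f)
        : target_(new F(std::move(f))),
          invoker_(&Handler<F>::invoke),
          deleter_(&Handler<F>::destroy)
    {}

    MiniFunction(MiniFunction &&other) noexcept
        : target_(other.target_),
          invoker_(other.invoker_),
          deleter_(other.deleter_)
    {
        other.target_ = nullptr;
        other.invoker_ = nullptr;
        other.deleter_ = nullptr;
    }

    ~MiniFunction() { if (deleter_) deleter_(target_); }

    R operator()(Args... args) const
    { return invoker_(target_, std::forward<Args>(args)...); }
};

int triple(int x) { return 3 * x; }

int call_all(int arg)
{
    int base = 100;
    MiniFunction<int(int)> fs[] = {
        triple,
        [base](int x) { return base + x; },
    };
    int sum = 0;
    for (const auto &f : fs)
        sum += f(arg);
    return sum;
}
```

The deleter_ pointer is a much reduced counterpart of _M_manager: it is set in the same place as the invoker, for the same reason that only the constructor still knows F.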

wrap up

any_data

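any_data is declared as a char array to reserve storage. The storage has the same type for every target; function pointers of one common type, each pointing to a different function, interpret what the bytes actually mean.

Type deduction in function templates

Template argument deduction, a feature the language provides, recovers the type of the argument passed to a function template.

Instantiating a class template inside the function with the deduced type

Inside the function, the type deduced by the compiler is used as the template argument of a class template, so that different arguments get different function implementations (different behavior), for example function pointers that point to different implementations.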
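
These two steps can be shown on their own in a few lines. Describe, Namer, describer_for, and describes_as are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <cstring>

// One function pointer type shared by all instantiations.
using Describe = const char *(*)();

template <typename T>
struct Namer
{
    static const char *name() { return "something else"; }
};
template <>
struct Namer<int>
{
    static const char *name() { return "int"; }
};
template <>
struct Namer<double>
{
    static const char *name() { return "double"; }
};

// T is deduced from the argument, then used to instantiate Namer<T>.
template <typename T>
Describe describer_for(const T &)
{
    return &Namer<T>::name;
}

// Small helper so the result can be compared as text.
bool describes_as(Describe d, const char *expected)
{
    return std::strcmp(d(), expected) == 0;
}
```

describer_for(1) and describer_for(2.5) return values of the same type Describe, but calling them runs different code, which is the same mechanism that fills _M_invoker.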

demo

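Following the idea behind std::function, we can write a similar poor man's implementation: it also puts structures of different sizes and types into one type, and calls go through the same interface and are forwarded to the concrete type's implementation. Of course, compared with the std implementation this one is primitive.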

tsecer@harry: cat FunctionDemo.cpp
#include <iostream>

using namespace std;

// Like _Any_data in std::function
struct Any
{
    char pod[sizeof(void*)];
};

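// Like _Handler in std::function
template<typename T>
struct Handler
{
    // Store t inline when it fits in Any; otherwise copy it to the heap
    // and keep only the pointer. sizeof(T) is a compile-time constant, so
    // only one branch is ever taken for a given T.
    // NOTE: this demo never frees the heap copy; std::function does that
    // through its _M_manager pointer.
    static void Init(Any &a, const T &t)
    {
        if (sizeof(T) > sizeof(a))
        {
            *(T**)&a.pod = new T(t);
        }
        else
        {
            *((T*)&a.pod) = t;
        }
    }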

    static void Disp(const Any &a)
    {
        if (sizeof(T) > sizeof(a))
        {
            cout << **(T**)&a.pod << endl;
        }
        else
        {
            cout << *(T*)&a.pod << endl;
        }
    }
};

// Struct Larger than a pointer
struct Large
{
    const char *m_p1;
    const char *m_p2;
};
std::ostream& operator<<(std::ostream& os, const Large& obj)
{
    os << obj.m_p1 << obj.m_p2;
    return os;
}


// Like std::function itself
struct Function
{
    Any m_a;
    using DispFunc = void(const Any&);
    DispFunc *m_pDisp;

    template <typename T>
    Function(const T t)
    {
        using H = Handler<T>;
        H::Init(m_a, t);
        m_pDisp = H::Disp;
    }
    void Disp() const
    {
        m_pDisp(m_a);
    }
};

int main()
{
    // Functions built from different types share one array and one interface, like std::function
    Function a[] = { Large{"hello", "world"}, 12356, "tsecer" };
    for (const auto &i : a)
    {
        i.Disp();
    }
}

tsecer@harry: g++ FunctionDemo.cpp
tsecer@harry: ./a.out
helloworld
12356
tsecer
tsecer@harry:

outro

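With the general approach understood, one has to admire the authors' ingenuity: from plain char arrays and function pointers, features that C already has, they built something that looks impossible at first.

As the saying goes: "The finest ingredients often need only the simplest cooking."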

posted on 2025-12-06 20:29 tsecer
