Cannot enlarge string buffer containing XX bytes by XX more bytes

The database alerting in our ELK stack flagged the following error on one of the machines:

2018-12-04 18:55:26.842 CST,"XXX","XXX",21106,"XXX",5c065c3d.5272,4,"idle",2018-12-04 18:51:41 CST,117/0,0,ERROR,54000,"out of memory","Cannot enlarge string buffer containing 0 bytes by 1342177281 more bytes.",,,,,,,"enlargeStringInfo, stringinfo.c:268",""

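Seeing "out of memory", I first assumed the whole database instance was in trouble, but a check of the production system showed the database running normally. After some reading I learned that pg limits the length of a single query: a query whose length exceeds 1 GB (more precisely MaxAllocSize, which is 1 GB minus 1 byte) fails with the error above. The SQLSTATE in the log line, 54000, is program_limit_exceeded, so this is a size limit being enforced rather than the server actually running out of memory.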

The 1342177281 bytes in the log above is the length of the query, roughly 1.25 GB.
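To make the arithmetic concrete, here is a minimal sketch of the size check that rejected the request. The constant is copied from memutils.h in the pg source; the function name request_fits is made up for this illustration and is not part of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* Value of MaxAllocSize in src/include/utils/memutils.h: 1 gigabyte - 1 */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Hypothetical helper that mirrors the guard in enlargeStringInfo():
 * returns 1 if a buffer already holding `len` bytes may grow by `needed`
 * more bytes, 0 if the request would be rejected.
 */
int
request_fits(int len, int needed)
{
    /* a StringInfo never holds a negative or over-limit length */
    assert(len >= 0 && (size_t) len <= MAX_ALLOC_SIZE);

    if (needed < 0)
        return 0;
    if ((size_t) needed >= MAX_ALLOC_SIZE - (size_t) len)
        return 0;
    return 1;
}
```

With the values from the log, request_fits(0, 1342177281) returns 0, because 1342177281 is larger than 0x3fffffff (1073741823).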

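COPY often reports a similar error. In that case, use the line number in the error message to check whether a quoting or escaping problem left that row without a proper terminator, or whether a single row really is larger than 1 GB.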
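As a rough aid for the quoting check, the sketch below counts the double quotes in one line of the input file. It assumes the CSV format with the double quote as both the quote and the escape character, so that a well-formed single-line row contains an even number of quotes. The function name is made up for this illustration, and rows with quoted newlines legitimately span several lines, so an odd count is only a hint of where to look.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `line` contains an odd number of
 * double quotes, which suggests a quoted field that was never closed.
 */
int
has_unbalanced_quotes(const char *line)
{
    size_t      quotes = 0;
    const char *p;

    assert(line != NULL);

    for (p = line; *p != '\0'; p++)
    {
        if (*p == '"')
            quotes++;
    }
    return (quotes % 2) != 0;
}
```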

Below is the relevant code I found in the pg 9.6 source. With the comments, it is easy to follow.

src/include/utils/memutils.h

/*
 * MaxAllocSize, MaxAllocHugeSize
 *      Quasi-arbitrary limits on size of allocations.
 *
 * Note:
 *      There is no guarantee that smaller allocations will succeed, but
 *      larger requests will be summarily denied.
 *
 * palloc() enforces MaxAllocSize, chosen to correspond to the limiting size
 * of varlena objects under TOAST.  See VARSIZE_4B() and related macros in
 * postgres.h.  Many datatypes assume that any allocatable size can be
 * represented in a varlena header.  This limit also permits a caller to use
 * an "int" variable for an index into or length of an allocation.  Callers
 * careful to avoid these hazards can access the higher limit with
 * MemoryContextAllocHuge().  Both limits permit code to assume that it may
 * compute twice an allocation's size without overflow.
 */
#define MaxAllocSize    ((Size) 0x3fffffff)     /* 1 gigabyte - 1 */

src/backend/lib/stringinfo.c

/*
 * enlargeStringInfo
 *
 * Make sure there is enough space for 'needed' more bytes
 * ('needed' does not include the terminating null).
 *
 * External callers usually need not concern themselves with this, since
 * all stringinfo.c routines do it automatically.  However, if a caller
 * knows that a StringInfo will eventually become X bytes large, it
 * can save some palloc overhead by enlarging the buffer before starting
 * to store data in it.
 *
 * NB: because we use repalloc() to enlarge the buffer, the string buffer
 * will remain allocated in the same memory context that was current when
 * initStringInfo was called, even if another context is now current.
 * This is the desired and indeed critical behavior!
 */
void
enlargeStringInfo(StringInfo str, int needed)
{
   int         newlen;

   /*
    * Guard against out-of-range "needed" values.  Without this, we can get
    * an overflow or infinite loop in the following.
    */
   if (needed < 0)             /* should not happen */
       elog(ERROR, "invalid string enlargement request size: %d", needed);
   if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
       ereport(ERROR,
               (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                errmsg("out of memory"),
                errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
                          str->len, needed)));

   needed += str->len + 1;     /* total space required now */

   /* Because of the above test, we now have needed <= MaxAllocSize */

   if (needed <= str->maxlen)
       return;                 /* got enough space already */

   /*
    * We don't want to allocate just a little more space with each append;
    * for efficiency, double the buffer size each time it overflows.
    * Actually, we might need to more than double it if 'needed' is big...
    */
   newlen = 2 * str->maxlen;
   while (needed > newlen)
       newlen = 2 * newlen;

   /*
    * Clamp to MaxAllocSize in case we went past it.  Note we are assuming
    * here that MaxAllocSize <= INT_MAX/2, else the above loop could
    * overflow.  We will still have newlen >= needed.
    */
   if (newlen > (int) MaxAllocSize)
       newlen = (int) MaxAllocSize;

   str->data = (char *) repalloc(str->data, newlen);

   str->maxlen = newlen;
}
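The doubling strategy in enlargeStringInfo can be tried out in isolation. The sketch below reproduces only the size calculation, with no allocation; next_buffer_size is a made-up name, and the caller is assumed to have already passed the limit check shown in the function above.

```c
#include <assert.h>

/* Value of MaxAllocSize in memutils.h: 1 gigabyte - 1 */
#define MAX_ALLOC_SIZE 0x3fffffff

/*
 * Hypothetical helper: returns the buffer size enlargeStringInfo() would
 * end up with for a buffer of `maxlen` bytes holding `len` bytes when
 * `needed` more bytes are requested.
 *
 * Precondition: 0 <= needed < MAX_ALLOC_SIZE - len, i.e. the request has
 * already passed the "Cannot enlarge string buffer" check.
 */
int
next_buffer_size(int len, int maxlen, int needed)
{
    int         newlen;

    assert(maxlen > 0 && len >= 0 && len < maxlen);
    assert(needed >= 0 && needed < MAX_ALLOC_SIZE - len);

    needed += len + 1;          /* total space required, with terminator */
    if (needed <= maxlen)
        return maxlen;          /* enough space already, nothing changes */

    newlen = 2 * maxlen;        /* double until the request fits */
    while (needed > newlen)
        newlen = 2 * newlen;

    if (newlen > MAX_ALLOC_SIZE)
        newlen = MAX_ALLOC_SIZE;    /* clamp to the allocation limit */
    return newlen;
}
```

For example, a 1024-byte buffer holding 1000 bytes grows to 2048 when 100 more bytes are requested, and an empty 1024-byte buffer grows to 8192 for a 5000-byte request.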

src/include/lib/stringinfo.h

Below is the struct used to store the string:

/*-------------------------
 * StringInfoData holds information about an extensible string.
 *      data    is the current buffer for the string (allocated with palloc).
 *      len     is the current string length.  There is guaranteed to be
 *              a terminating '\0' at data[len], although this is not very
 *              useful when the string holds binary data rather than text.
 *      maxlen  is the allocated size in bytes of 'data', i.e. the maximum
 *              string size (including the terminating '\0' char) that we can
 *              currently store in 'data' without having to reallocate
 *              more space.  We must always have maxlen > len.
 *      cursor  is initialized to zero by makeStringInfo or initStringInfo,
 *              but is not otherwise touched by the stringinfo.c routines.
 *              Some routines use it to scan through a StringInfo.
 *-------------------------
 */
typedef struct StringInfoData
{
    char       *data;
    int         len;
    int         maxlen;
    int         cursor;
} StringInfoData;

typedef StringInfoData *StringInfo;

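The StringInfoData struct, which holds strings or binary data, hints at why pg string types do not support \u0000: strings in pg are C strings, which end at the first \0. \0 is called NUL in ASCII, is written \u0000 as a Unicode escape, and is 0x00 in hexadecimal. If a string contains \0, pg treats it as the end of the string.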

pg strings cannot contain the NUL character (\0, 0x00). This is clearly different from the NULL value, which pg does support.
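The effect of an embedded NUL on a C string can be shown with a few lines of standard C. This is only an illustration of C string semantics, not pg code, and c_string_length is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns how many of the `nbytes` bytes in `buf`
 * are visible when the buffer is treated as a C string, i.e. the number
 * of bytes before the first NUL (or nbytes if there is none).
 */
size_t
c_string_length(const char *buf, size_t nbytes)
{
    const char *nul;

    assert(buf != NULL);

    nul = memchr(buf, '\0', nbytes);
    return nul != NULL ? (size_t) (nul - buf) : nbytes;
}
```

For the 7 data bytes of "abc\0def" this returns 3, the same answer strlen gives: everything after the embedded NUL is invisible to code that works on C strings.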

In practice, strip \u0000 from the data before importing it into pg.

When importing from another database into pg, the replacement can be done like this:

regexp_replace(stringWithNull, '\\u0000', '', 'g')

In a Java program:

str = str.replace("\u0000", "");

In vim:

:%s/\%x00//g
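The same cleanup can be done on a raw byte buffer before the data is handed to pg. The following is a minimal sketch in C, assuming the data is already in memory together with its byte count; strip_nul_bytes is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: removes every NUL byte from the first `nbytes`
 * bytes of `buf`, compacting the remaining bytes in place, and returns
 * the new length. The buffer is not NUL-terminated by this function.
 */
size_t
strip_nul_bytes(char *buf, size_t nbytes)
{
    size_t      in;
    size_t      out = 0;

    assert(buf != NULL);

    for (in = 0; in < nbytes; in++)
    {
        if (buf[in] != '\0')
            buf[out++] = buf[in];   /* keep every non-NUL byte */
    }
    return out;
}
```

The length has to be passed explicitly because strlen would stop at the first NUL, which is exactly the byte being removed.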

References:

src/backend/lib/stringinfo.c

src/include/lib/stringinfo.h

src/include/utils/memutils.h

https://en.wikipedia.org/wiki/Null-terminated_string

https://stackoverflow.com/questions/1347646/postgres-error-on-insert-error-invalid-byte-sequence-for-encoding-utf8-0x0?rq=1

posted on 2018-12-07 23:21 Still water run deep