Measure the size of a PostgreSQL table row

Q:
 

I have a PostgreSQL table. select * is very slow whereas select id is nice and quick. I think it may be that the size of the row is very large and it's taking a while to transport, or it may be some other factor.

I need all of the fields (or nearly all of them), so selecting just a subset isn't a quick fix. Selecting the fields that I want is still slow.

Here's my table schema minus the names:

integer                  | not null default nextval('core_page_id_seq'::regclass)
character varying(255)   | not null
character varying(64)    | not null
text                     | default '{}'::text
character varying(255)   | 
integer                  | not null default 0
text                     | default '{}'::text
text                     | 
timestamp with time zone | 
integer                  | 
timestamp with time zone | 
integer                  | 

The text fields can be any length, but no more than a few kilobytes in the worst case.

Questions

  1. Is there anything about this that screams 'crazy inefficient'?
  2. Is there a way to measure page size at the Postgres command-line to help me debug this?
 
 
A1:
 
 
 
 

Q2: way to measure page size

PostgreSQL provides a number of Database Object Size Functions you can use. I packed the most interesting ones into this query and added some Statistics Access Functions.

This is going to demonstrate that the various methods to measure the "size of a row" can lead to very different results. It all depends on what exactly you want to measure.

Replace public.tbl with your (optionally schema-qualified) table name to get a compact view of collected statistics about the size of your rows.

WITH x AS (
   SELECT count(*)               AS ct
        , sum(length(t::text))   AS txt_len  -- length in characters
        , 'public.tbl'::regclass AS tbl  -- provide (qualified) table name here
   FROM   public.tbl t  -- ... and here
   )
, y AS (
   SELECT ARRAY [pg_relation_size(tbl)
               , pg_relation_size(tbl, 'vm')
               , pg_relation_size(tbl, 'fsm')
               , pg_table_size(tbl)
               , pg_indexes_size(tbl)
               , pg_total_relation_size(tbl)
               , txt_len
             ] AS val
        , ARRAY ['core_relation_size'
               , 'visibility_map'
               , 'free_space_map'
               , 'table_size_incl_toast'
               , 'indexes_size'
               , 'total_size_incl_toast_and_indexes'
               , 'live_rows_in_text_representation'
             ] AS name
   FROM   x
   )
SELECT unnest(name)                AS what
     , unnest(val)                 AS "bytes/ct"
     , pg_size_pretty(unnest(val)) AS bytes_pretty
     , unnest(val) / ct            AS bytes_per_row
FROM   x, y

UNION ALL SELECT '------------------------------', NULL, NULL, NULL
UNION ALL SELECT 'row_count', ct, NULL, NULL FROM x
UNION ALL SELECT 'live_tuples', pg_stat_get_live_tuples(tbl), NULL, NULL FROM x
UNION ALL SELECT 'dead_tuples', pg_stat_get_dead_tuples(tbl), NULL, NULL FROM x;

I only pack the values in arrays and unnest() again, so I don't have to spell out calculations for every single row repeatedly.

General row count statistics are appended at the end with unconventional SQL-foo to get everything in one query. You could wrap it into a plpgsql function for repeated use, hand in the table name as parameter and use EXECUTE.

Result:

               what                | bytes/ct | bytes_pretty | bytes_per_row
-----------------------------------+----------+--------------+---------------
 core_relation_size                | 44138496 | 42 MB        |            91
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |    32768 | 32 kB        |             0
 table_size_incl_toast             | 44179456 | 42 MB        |            91
 indexes_size                      | 33128448 | 32 MB        |            68
 total_size_incl_toast_and_indexes | 77307904 | 74 MB        |           159
 live_rows_in_text_representation  | 29987360 | 29 MB        |            62
 ------------------------------    |          |              |
 row_count                         |   483424 |              |
 live_tuples                       |   483424 |              |
 dead_tuples                       |     2677 |              |
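
A minimal sketch of such a plpgsql wrapper for the query above (the function name tbl_size_report is my own placeholder, and it only reports a subset of the values):

CREATE OR REPLACE FUNCTION tbl_size_report(_tbl regclass)
  RETURNS TABLE (what text, nr bigint)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- Dynamic SQL because the table name is a parameter;
   -- format() with %1$s / %1$L on a regclass parameter is safe against SQL injection.
   RETURN QUERY EXECUTE format(
   $q$
   SELECT v.what, v.nr
   FROM  (SELECT count(*) AS ct, sum(length(t::text))::bigint AS txt_len
          FROM   %1$s t) x
    , LATERAL (
      VALUES ('core_relation_size'               , pg_relation_size(%1$L))
           , ('table_size_incl_toast'            , pg_table_size(%1$L))
           , ('indexes_size'                     , pg_indexes_size(%1$L))
           , ('total_size_incl_toast_and_indexes', pg_total_relation_size(%1$L))
           , ('live_rows_in_text_representation' , x.txt_len)
           , ('row_count'                        , x.ct)
      ) v(what, nr)
   $q$, _tbl);
END
$func$;

-- Call with the (schema-qualified) table name as parameter:
SELECT * FROM tbl_size_report('public.tbl');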

The additional module pgstattuple provides more useful functions.
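
For example (a small sketch; the extension has to be installed first, and public.tbl is again a placeholder):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Physical tuple statistics, incl. tuple lengths, dead tuples and free space:
SELECT * FROM pgstattuple('public.tbl');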

Update for Postgres 9.3+

We could use the new form of unnest() in pg 9.4 taking multiple parameters to unnest arrays in parallel.
But using LATERAL and a VALUES expression, this can be simplified further. Plus some other improvements:

SELECT l.what, l.nr AS "bytes/ct"
     , CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
     , CASE WHEN is_size THEN nr / x.ct END          AS bytes_per_row
FROM  (
   SELECT min(tableoid)        AS tbl      -- same as 'public.tbl'::regclass::oid
        , count(*)             AS ct
        , sum(length(t::text)) AS txt_len  -- length in characters
   FROM   public.tbl t  -- provide table name *once*
   ) x
 , LATERAL (
   VALUES
      (true , 'core_relation_size'               , pg_relation_size(tbl))
    , (true , 'visibility_map'                   , pg_relation_size(tbl, 'vm'))
    , (true , 'free_space_map'                   , pg_relation_size(tbl, 'fsm'))
    , (true , 'table_size_incl_toast'            , pg_table_size(tbl))
    , (true , 'indexes_size'                     , pg_indexes_size(tbl))
    , (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(tbl))
    , (true , 'live_rows_in_text_representation' , txt_len)
    , (false, '------------------------------'   , NULL)
    , (false, 'row_count'                        , ct)
    , (false, 'live_tuples'                      , pg_stat_get_live_tuples(tbl))
    , (false, 'dead_tuples'                      , pg_stat_get_dead_tuples(tbl))
   ) l(is_size, what, nr);

Same result.
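
For reference, the parallel form of unnest() mentioned above takes several array arguments and unnests them side by side. A toy illustration, not part of the query:

-- unnest() with multiple array arguments (Postgres 9.4+), only valid in the FROM clause:
SELECT *
FROM   unnest(ARRAY['core_relation_size', 'indexes_size']
            , ARRAY[44138496, 33128448]) AS u(what, bytes);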

 

Q1: anything inefficient?

You could optimize column order to save some bytes per row, currently wasted to alignment padding:

 

integer                  | not null default nextval('core_page_id_seq'::regclass)
integer                  | not null default 0
character varying(255)   | not null
character varying(64)    | not null
text                     | default '{}'::text
character varying(255)   | 
text                     | default '{}'::text
text                     |
timestamp with time zone |
timestamp with time zone |
integer                  |
integer                  |

This saves between 8 and 18 bytes per row. I call it "column tetris".
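
To see the effect of alignment padding, you can compare row values with pg_column_size(). A small illustration of my own (exact numbers assume a typical 64-bit build):

-- A 4-byte integer before an 8-byte timestamptz leaves 4 bytes of padding;
-- a second integer fits into that gap for free (24 bytes are row header overhead):
SELECT pg_column_size(ROW(1::int, now()))         AS int_ts      -- typically 40
     , pg_column_size(ROW(1::int, 2::int, now())) AS int_int_ts; -- typically 40, too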


A2:
 
An approximation of the size of a row, including the TOAST'ed contents, is easy to get by querying the length of the TEXT representation of the entire row:

SELECT octet_length(t.*::text) FROM tablename AS t WHERE primary_key=:value;

This is a close approximation to the number of bytes that will be retrieved client-side when executing:

SELECT * FROM tablename WHERE primary_key=:value;

...assuming that the caller of the query is requesting results in text format, which is what most programs do (binary format is possible, but it's not worth the trouble in most cases).

The same technique could be applied to locate the N "biggest-in-text" rows of tablename:

SELECT primary_key, octet_length(t.*::text) FROM tablename AS t
   ORDER BY 2 DESC LIMIT :N;

 

 

 

Note:
1. The octet_length(string) function returns the number of bytes in the string, whereas length() returns the number of characters.
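
A small illustration, assuming a UTF-8 database encoding:

SELECT octet_length('naïve') AS bytes  -- 6: 'ï' takes two bytes in UTF-8
     , length('naïve')       AS chars; -- 5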
 
 
 
 
 
