Monday, August 7, 2017

How to Find Values of User Variables With gdb

In his comment to my announcement of the previous post, Shane Bester kindly suggested to consider pretty printing the information about user variables from gdb. I tried to do this tonight, after a long working day, while working with the same Percona server 5.7.x on CentOS 6.9, and found several interesting details to share even before getting to the pretty printing part (that I'd surely try to avoid doing with Python anyway, as I am lazy and not in a mood to use that programming language for a decade already). So, I decided to share them in a separate post.

To begin with, I checked if actually using gdb for this task is needed for anyone else but those poor souls who has to study core dumps after the crash. This time I do not want Mark Leith to step in with a comment that I had better use <some cool P_S feature> (instead of attaching gdb to properly working production server), so I did my homework...

One can surely use the user_variables_by_thread table in the Performance Schema to get these details if server still works, P_S is enabled and it's possible to connect to it to run SQL queries. There is one minor problem (the type of user variable is missing there), that I was happy to report immediately as Bug #87341. By the way, I had found old feature request, Bug #20307, that I verified 11 years ago and that should be just properly closed because of this nice feature of current MySQL.

So, there is (small) reason to use gdb even before any crashes, and I did it immediately after settings values for few user variables in connection with id 16 with the following statements:
set @a := 'aaa', @b := 254, @c := sysdate();
set @e = 10.5;
I also want to show how one may use (relatively) new column in the threads table of Performance Schema while studying live server with gdb. This was also a hint from Mark Leith in a comment on my older post, to check THREAD_OS_ID column:
mysql> select * from threads\G
...
*************************** 27. row ***************************
          THREAD_ID: 42
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 16
   PROCESSLIST_USER: root
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
   PROCESSLIST_TIME: 195
  PROCESSLIST_STATE: NULL
   PROCESSLIST_INFO: NULL
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: Socket
       THREAD_OS_ID: 2224
...
My first naive attempt to check the details about user variables with gdb ended up as follows:
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fc5997bc700 (LWP 2224))]#0  0x00007fc5cc299383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->m_thread_id
$1 = 16
(gdb) p do_command::thd->user_vars
$2 = {key_offset = 0, key_length = 0, blength = 8, records = 4, flags = 0,
  array = {buffer = 0x7fc5aabf1560 "\001", elements = 4, max_element = 16,
    alloc_increment = 32, size_of_element = 16, m_psi_key = 38},
  get_key = 0xc63630 <get_var_key(user_var_entry*, size_t*, my_bool)>,
  free = 0xc636c0 <free_user_var(user_var_entry*)>, charset = 0x1ded740,
  hash_function = 0xeb6990 <cset_hash_sort_adapter>, m_psi_key = 38}
I see 4 elements for the thread I am interested in. Looks good. I also assume from the above that they are of user_var_entry type. So, given some buffer, I try to cast it to this type and check (that was stupid, especially further pointer increment attempt - if that were an array of structures imagine how long would it take to find some variable by name!):
(gdb) p do_command::thd->user_vars->array->buffer
$3 = (uchar *) 0x7fc5aabf1560 "\001"
(gdb) p (user_var_entry *)(do_command::thd->user_vars->array->buffer)
$4 = (user_var_entry *) 0x7fc5aabf1560
(gdb) p sizeof(user_var_entry)
$5 = 104
(gdb) set $vars = (user_var_entry *)(do_command::thd->user_vars->array->buffer)
(gdb) p $vars
$6 = (user_var_entry *) 0x7fc5aabf1560
(gdb) p *($vars)
$7 = {static extra_size = 8, m_ptr = 0x1 <Address 0x1 out of bounds>,
  m_length = 140486683628224, m_type = 4294967295
, m_owner = 0x7fc59adf90e0,
  m_catalog = {str = 0xffffffff <Address 0xffffffff out of bounds>,
    length = 140486683628064}, entry_name = {
    m_str = 0xffffffff <Address 0xffffffff out of bounds>,
    m_length = 140486683627904}, collation = {collation = 0x0,
    derivation = DERIVATION_EXPLICIT, repertoire = 0}, update_query_id = 0,
  used_query_id = 0, unsigned_flag = false}
(gdb) set $vars = $vars + 1
(gdb) p *($vars)
$8 = {static extra_size = 8, m_ptr = 0x0, m_length = 0,
  m_type = STRING_RESULT, m_owner = 0x0, m_catalog = {str = 0x0, length = 0},
  entry_name = {m_str = 0x0, m_length = 0}, collation = {collation = 0x0,
    derivation = DERIVATION_EXPLICIT, repertoire = 0}, update_query_id = 0,
  used_query_id = 0, unsigned_flag = false}
...
So, while eventually I've see something expected (STRING_RESULT as type) and pointer trick "worked", the content I get is obviously a bullshit - addresses are out of bounds or zero etc. This leads us nowhere at best.

I had to dig into the code to see how this structure is actually used. This is easy with grep:
[root@centos openxs]# grep -rn user_vars git/mysql-server/*
git/mysql-server/sql/item_func.cc:6017:  HASH *hash= & thd->user_vars;
...
git/mysql-server/sql/sql_prepare.cc:2182:    entry= (user_var_entry*)my_hash_search(&thd->user_vars,...
git/mysql-server/storage/perfschema/table_uvar_by_thread.cc:76:    sql_uvar= reinterpret_cast<user_var_entry*> (my_hash_element(& thd->user_vars, index));
...
git/mysql-server/sql/sql_class.cc:1209:  my_hash_init(&user_vars, system_charset_info, USER_VARS_HASH_SIZE, 0, 0,
git/mysql-server/sql/sql_class.cc:1650:  my_hash_init(&user_vars, system_charset_info, USER_VARS_HASH_SIZE, 0, 0,
...
git/mysql-server/sql/rpl_binlog_sender.cc:654:    (user_var_entry*) my_hash_search(&m_thd->user_vars, (uchar*) name.str,
...
git/mysql-server/sql/sql_class.h:1576:  HASH    user_vars;                     // hash for user variables...
git/mysql-server/storage/perfschema/table_uvar_by_thread.cc:76:    sql_uvar= reinterpret_cast<user_var_entry*> (my_hash_element(& thd->user_vars, index));[root@centos openxs]#
So, user_vars is actually a HASH, and we see functions to find the item in this hash by name, my_hash_search(), and to get item by index, my_hash_element(). I've immediately proceed to use the latter, as I wanted to see all variables:
...
(gdb) thread 10
[Switching to thread 10 (Thread 0x7fc5997bc700 (LWP 2224))]#0  0x00007fc5cc299383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->m_thread_id
$1 = 16
(gdb) p do_command::thd->user_vars$2 = {key_offset = 0, key_length = 0, blength = 8, records = 4, flags = 0,
  array = {buffer = 0x7fc5aabf1560 "\001", elements = 4, max_element = 16,
    alloc_increment = 32, size_of_element = 16, m_psi_key = 38},
  get_key = 0xc63630 <get_var_key(user_var_entry*, size_t*, my_bool)>,
  free = 0xc636c0 <free_user_var(user_var_entry*)>, charset = 0x1ded740,
  hash_function = 0xeb6990 <cset_hash_sort_adapter>, m_psi_key = 38}
(gdb) p my_hash_element(&(do_command::thd->user_vars), 1)
$3 = (uchar *) 0x7fc59adf90e0 "H\221ъ \305\177"
(gdb) set $uvar = (user_var_entry *)(my_hash_element(&(do_command::thd->user_vars), 1))
[New Thread 0x7fc59946f700 (LWP 2447)]
[Thread 0x7fc59973a700 (LWP 2396) exited]
[New Thread 0x7fc59932a700 (LWP 2448)]
(gdb) p $uvar
$4 = (user_var_entry *) 0x7fc59adf90e0
(gdb) p *$uvar
$5 = {static extra_size = 8, m_ptr = 0x7fc59adf9148 "aaa", m_length = 3,
  m_type = STRING_RESULT
, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x41272c275441434e <Address 0x41272c275441434e out of bounds>,
    length = 6075168230325441358}, entry_name = {m_str = 0x7fc59adf9150 "a",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 430,
  used_query_id = 426, unsigned_flag = false}
...
(gdb) set $uvar = (user_var_entry *)(my_hash_element(&(do_command::thd->user_vars), 3))
[Thread 0x7fc599636700 (LWP 2400) exited]
(gdb) p *$uvar
$7 = {static extra_size = 8, m_ptr = 0x7fc59adf91e8 "\376", m_length = 8,
  m_type = INT_RESULT, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x655f79625f726573 <Address 0x655f79625f726573 out of bounds>,
    length = 7881702179129419126}, entry_name = {m_str = 0x7fc59adf91f0 "b",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 430,
  used_query_id = 426, unsigned_flag = false}
(gdb) set $uvar = (user_var_entry *)(my_hash_element(&(do_command::thd->user_vars), 4))
(gdb) p *$uvar
Cannot access memory at address 0x0
We see that this experimenting was performed on a server under some load, we see threads created and gone, we are able to get proper details by index, but should take extra case to check addresses before accessing them, and should note that items are numbered starting from zero, not 1 (there is no item 4, we get NULL pointer). If anyone cares, item with index 0 was this:
$8 = {static extra_size = 8, m_ptr = 0x7fc5b23b41c0 "\002", m_length = 64,
  m_type = DECIMAL_RESULT, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x7665602e60616d65 <Address 0x7665602e60616d65 out of bounds>,
    length = 7593481698565975653}, entry_name = {m_str = 0x7fc59adf9330 "e",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 444,
  used_query_id = 444, unsigned_flag = false}
As this was a hash table the order of items is determined by hash, not by the order of assignments to variables.

Now, if I know the name of the variable I can access it in the same way as it's done in the server code (one fo examples found by grep):
    entry= (user_var_entry*)my_hash_search(&thd->user_vars,
                                           (uchar*)lex->prepared_stmt_code.str,
                                           lex->prepared_stmt_code.length);
Basically, we need to pass variable name and the length of this name:
(gdb) set $uvar=(user_var_entry *)(my_hash_search(&(do_command::thd->user_vars), "e", strlen("e")))
[Thread 0x7fc5995b4700 (LWP 2848) exited]
(gdb) p *$uvar
$1 = {static extra_size = 8, m_ptr = 0x7fc5b23b41c0 "\002", m_length = 64,
  m_type = DECIMAL_RESULT, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x7665602e60616d65 <Address 0x7665602e60616d65 out of bounds>,
    length = 7593481698565975653}, entry_name = {m_str = 0x7fc59adf9330 "e",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 444,
  used_query_id = 444, unsigned_flag = false}

...
(gdb) p *((my_decimal *)$uvar->m_ptr)
$4 = {<st_decimal_t> = {intg = 2, frac = 1, len = 9, sign = 0 '\000',
    buf = 0x7fc5b23b41d8}, buffer = {10, 500000000, 0, 0, 0, 0, 257, 0,
    -1697445128}}
I highlighted some parts that eventually should help to pretty print decimal values. With just strings it's also not that trivial:
...
$5 = {static extra_size = 8, m_ptr = 0x7fc59adf9148 "aaa", m_length = 3,
  m_type = STRING_RESULT, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x41272c275441434e <Address 0x41272c275441434e out of bounds>,
    length = 6075168230325441358}, entry_name = {m_str = 0x7fc59adf9150 "a",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 430,
  used_query_id = 426, unsigned_flag = false}
(gdb) p *$uvar->m_ptr
$6 = 97 'a'
We can not simply print the pointer in a hope, as this is not a null-terminated string.

I am too tired to create a useful summary right now, but you can see that essentially it's easy (even if not trivial) to get all the details about the user variables defined in any session with gdb, both from working process or from core dump. Enjoy!

1 comment:

  1. Surely, one can get the details about all the user variables set for the thread even without calls to my_hash_element(). If we have 3 variables:

    (gdb) thread 2
    [Switching to thread 2 (Thread 0x7fc3a037b700 (LWP 3061))]#0 0x00007fc3d2cb3383 in poll () from /lib64/libc.so.6
    (gdb) p do_command::thd->m_thread_id
    $1 = 5
    (gdb) p do_command::thd->user_vars
    $2 = {key_offset = 0, key_length = 0, blength = 4, records = 3, flags = 0,
    array = {buffer = 0x7fc3ba2b3560 "\377\377\377\377", elements = 3,
    max_element = 16, alloc_increment = 32, size_of_element = 16,
    m_psi_key = 38},
    get_key = 0xc63630 ,
    free = 0xc636c0 , charset = 0x1ded740,
    hash_function = 0xeb6990 , m_psi_key = 38}

    we can get the details about them as follows:

    (gdb) set $uvars=&(do_command::thd->user_vars)
    (gdb) p $uvars
    $3 = (HASH *) 0x7fc3b9fad280
    (gdb) p $uvars->array
    $4 = {buffer = 0x7fc3ba2b3560 "\377\377\377\377", elements = 3,
    max_element = 16, alloc_increment = 32, size_of_element = 16, m_psi_key = 38}
    (gdb) p &($uvars->array)
    $5 = (DYNAMIC_ARRAY *) 0x7fc3b9fad2a8
    (gdb) p ((HASH_LINK*)((&($uvars->array))->buffer) + (0))
    $6 = (HASH_LINK *) 0x7fc3ba2b3560
    (gdb) p ((HASH_LINK*)((&($uvars->array))->buffer) + (0))->data
    $7 = (uchar *) 0x7fc3b9fc80e0 "H\201\374\271\303\177"
    (gdb) p (user_var_entry *)(((HASH_LINK*)((&($uvars->array))->buffer) + (0))->data)
    $8 = (user_var_entry *) 0x7fc3b9fc80e0
    (gdb) p *(user_var_entry *)(((HASH_LINK*)((&($uvars->array))->buffer) + (0))->data)
    $9 = {static extra_size = 8, m_ptr = 0x7fc3b9fc8148 "bbb", m_length = 3,
    m_type = STRING_RESULT, m_owner = 0x7fc3b9fad000, m_catalog = {
    str = 0x100000000 <...>,
    length = 416611827727}, entry_name = {m_str = 0x7fc3b9fc8150 "a",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 25,
    used_query_id = 25, unsigned_flag = false}
    (gdb) p *(user_var_entry *)(((HASH_LINK*)((&($uvars->array))->buffer) + (1))->data)
    $10 = {static extra_size = 8, m_ptr = 0x7fc3b9fc81e8 "\n", m_length = 8,
    m_type = INT_RESULT, m_owner = 0x7fc3b9fad000, m_catalog = {str = 0x0,
    length = 0}, entry_name = {m_str = 0x7fc3b9fc81f0 "b", m_length = 1},
    collation = {collation = 0x1ded740, derivation = DERIVATION_IMPLICIT,
    repertoire = 3}, update_query_id = 25, used_query_id = 25,
    unsigned_flag = false}
    (gdb) p *(user_var_entry *)(((HASH_LINK*)((&($uvars->array))->buffer) + (2))->data)
    $11 = {static extra_size = 8, m_ptr = 0x7fc3b9e6e220 "\002", m_length = 64,
    m_type = DECIMAL_RESULT, m_owner = 0x7fc3b9fad000, m_catalog = {str = 0x0,
    length = 0}, entry_name = {m_str = 0x7fc3b9fc8290 "c", m_length = 1},
    collation = {collation = 0x1ded740, derivation = DERIVATION_IMPLICIT,
    repertoire = 3}, update_query_id = 25, used_query_id = 25,
    unsigned_flag = false}
    (gdb)

    ReplyDelete