Debugging Heap Errors |
The heap verifier debugger extension
The heap verifier debugger extension is part of the !heap extension (NT heap debugger extension). Simple help can be obtained with !heap -? or more extensive with !heap -p -? .
The current extension does not detect on its own if page heap is enabled for a process and act accordingly. For now the user of the extension needs to know that page heap is enabled and use commands prefixed by !heap -p .
!heap -p
Dumps addresses of all full page heaps created in the process.
!heap -p -h ADDRESS-OF-HEAP
Full dump of full page heap at ADDRESS-OF-HEAP.
!heap -p -a ADDRESS
Tries to figure out if there is a heap block at ADDRESS. This value does not need to be the address of the start of the block. The command is useful if there is no clue whatsoever about the nature of a memory area.
Heap operation log
The heap operation log tracks all heap routines. These include HeapAlloc, HeapReAlloc, and HeapFree.
You can use the !avrf -hp Length extension command to display the last several records; Length specifies the number of records.
You can use !avrf -hp -a Address to display all heap space operations that affected the specified Address. For an allocation operation, it is sufficient that Address be contained in the allocated heap block. For a free operation, the exact address of the beginning of the block must be given.
For each entry in the log, the following information is displayed:
The heap function called.
The thread ID of the thread that called the routine.
The address involved in the call � this is the address that was returned by an allocation routine or that was passed to a free routine.
The size of the region involved in the call.
The stack trace of the call.
The most recent entries are displayed first.
In this example, the two most recent entries are displayed:
0:001> !avrf -hp 2
alloc (tid: 0xFF4):
address: 00ea2fd0
size: 00001030
00403062: Prymes!_heap_alloc_dbg+0x1A2
00402e69: Prymes!_nh_malloc_dbg+0x19
00402e1e: Prymes!_malloc_dbg+0x1E
00404ff3: Prymes!_stbuf+0xC3
00401c23: Prymes!printf+0x43
00401109: Prymes!main+0xC9
00402039: Prymes!mainCRTStartup+0xE9
77e7a278: kernel32!BaseProcessStart+0x23
alloc (tid: 0xFF4):
address: 00ea07d0
size: 00000830
00403062: Prymes!_heap_alloc_dbg+0x1A2
00402e69: Prymes!_nh_malloc_dbg+0x19
00402e1e: Prymes!_malloc_dbg+0x1E
00403225: Prymes!_calloc_dbg+0x25
00401ad5: Prymes!__initstdio+0x45
00401f38: Prymes!_initterm+0x18
00401da1: Prymes!_cinit+0x21
00402014: Prymes!mainCRTStartup+0xC4
77e7a278: kernel32!BaseProcessStart+0x23
Typical debug scenarios
There are several failure scenarios that might be encountered. Some of them require quite a bit of detective work to get the whole picture.
Access violation in non-accessible page
This happens when full page heap is enabled if the tested application accesses beyond the end of buffer. It can also happen if it touches a freed block. In order to understand what is the nature of the address on which the exception occurred, you need to use:
!heap �p �a ADDRESS-OF-AV
Corrupted block message
At several moments during the lifetime of an allocation (allocation, user free, real free) the page heap manager checks if the block has all fill patterns intact and the block header has consistent data. If this is not the case you will get a verifier stop.
If the block is a full page heap block (for example, if you know for sure full page heap is enabled for all allocations) then you can use �!heap �p �a ADDRESS� to find out what are the characteristics of the block.
If the block is a light page heap block then you need to find out the start address for the block header. You can find the start address by dumping 30-40 bytes below the reported address and look for the magic start/end patterns for a block header (ABCDAAAA, ABCDBBBB, ABCDAAA9, ABCDBBBA).
The header will give all the information you need to understand the failure. Particularly, the magic patterns will tell if the block is allocated or free if it is a light page heap or a full page heap block. The information here has to be matched carefully with the offending call.
For example if a call to HeapFree is made with the address of a block plus four bytes, then you will get the corrupted message. The block header will look fine but you will have to notice that the first byte after the end of the header (first byte after 0xDCBAXXXX magic value) has a different address then the one in the call.
Special fill pointers
The page heap manager fills the user allocation with values that will look as kernel pointers. This happens when the block gets freed (fill value is F0) and when the block gets allocated but no request is made for the block to be zeroed (fill value is E0 for light page heap and C0 for full page heap). The non-zeroed allocations are typical for malloc/new users. If there is a failure (access violation) where a read/write is attempted at addresses like F0F0F0F0, E0E0E0E0, C0C0C0C0 then most probably you hit one of these cases.
A read/write at F0F0F0F0 means a block has been used after it got freed. Unfortunately you will need some detective work to find out which block caused this. You need to get the stack trace of the failure and then inspect the code for the functions on the stack. One of them might make a wrong assumption about an allocation being alive.
A read/write at E0E0E0E0/C0C0C0C0 means the application did not initialize properly the allocation. This also requires code inspection of the functions in the current stack trace. Here it is an example for this kind of failure. In a test process an access violation while doing a HeapFree on address E0E0E0E0 was noticed. It turned out that the test allocated a structure, did not initialize it correctly and then called the destructor of the object. Since a certain field was not null (it had E0E0E0E0 in it) it called delete on it.