Jun 20, 2017
 

What’s a Heap?

Before looking at heap corruption, here’s a quick review of the heap. The HeapCreate function creates a private heap object from which the calling process can allocate memory blocks by using the HeapAlloc function. HeapCreate specifies both an initial size and a maximum size for the heap. The initial size determines the number of committed, read/write pages initially allocated for the heap. The maximum size determines the total number of reserved pages. These pages form a contiguous block in the virtual address space of the process into which the heap can grow. Additional pages are automatically committed from this reserved space if requests by HeapAlloc exceed the current size of committed pages, assuming that physical storage for them is available. Once the pages are committed, they are not decommitted until the process is terminated or until the heap is destroyed by calling the HeapDestroy function.
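To make the reserve/commit behaviour concrete, here is a minimal toy model in portable C++. It is only a sketch of the bookkeeping described above, not the Windows implementation: the class name ToyGrowableHeap, its methods and the 4 KB page size are all assumptions made for illustration, and it ignores block headers and fragmentation.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the reserve/commit behaviour described above.
// Hypothetical class: not the Windows heap, no block headers, no fragmentation.
class ToyGrowableHeap {
public:
    ToyGrowableHeap(std::size_t initial_bytes, std::size_t maximum_bytes,
                    std::size_t page_size = 4096)  // 4 KB pages are an assumption
        : page_size_(page_size),
          committed_pages_(pages_for(initial_bytes)),
          reserved_pages_(pages_for(maximum_bytes)) {
        assert(committed_pages_ <= reserved_pages_);
    }

    // Returns false when the request does not fit in the reserved range.
    bool alloc(std::size_t bytes) {
        const std::size_t needed_pages = pages_for(used_bytes_ + bytes);
        if (needed_pages > reserved_pages_) return false;
        if (needed_pages > committed_pages_) {
            committed_pages_ = needed_pages;  // commit more pages on demand
        }
        used_bytes_ += bytes;
        return true;
    }

    // Freeing gives bytes back to the heap but leaves the pages committed.
    void release(std::size_t bytes) {
        assert(bytes <= used_bytes_);
        used_bytes_ -= bytes;
    }

    std::size_t committed_pages() const { return committed_pages_; }
    std::size_t reserved_pages() const { return reserved_pages_; }

private:
    // Rounds a byte count up to whole pages.
    std::size_t pages_for(std::size_t bytes) const {
        return (bytes + page_size_ - 1) / page_size_;
    }

    std::size_t page_size_;
    std::size_t committed_pages_;
    std::size_t reserved_pages_;
    std::size_t used_bytes_ = 0;
};
```

With an initial size of one page and a maximum of four, allocating 6000 bytes raises the committed count to two pages, and releasing those bytes leaves it at two, which mirrors the "not decommitted until the heap is destroyed" rule above.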

We rarely call these functions directly; runtime code such as the CRT heap does it for us. When we use ‘new’ or ‘malloc’, somewhere underneath there is a call to HeapAlloc. When we call ‘delete’ or ‘free’, somewhere underneath there is a call to HeapFree.

So what’s HeapCorruption?

Any inconsistency introduced into a process heap by application code can be called a heap corruption. For example, heap memory blocks are mostly kept in linked lists, so if we end up overwriting the links between blocks, the heap can no longer be traversed from one node to the next, and we call that a heap corruption.
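The overwritten-link scenario can be sketched with a toy arena. This is an illustration only, under loud assumptions: the ToyHeap type, the 16-byte blocks and the 1-byte "next" header are invented for the example and do not reflect the real Windows heap layout. Links are stored as indices inside one array so that the overrun stays well-defined C++.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model only: a 64-byte arena split into four 16-byte blocks.
// Each block starts with a 1-byte header holding the index of the next
// block in the chain (kEnd terminates it), followed by 15 payload bytes.
constexpr std::size_t kBlockSize = 16;
constexpr std::size_t kBlockCount = 4;
constexpr std::uint8_t kEnd = 0xFF;

struct ToyHeap {
    std::array<std::uint8_t, kBlockSize * kBlockCount> arena{};

    ToyHeap() {
        for (std::size_t i = 0; i < kBlockCount; ++i) {
            const bool last = (i + 1 == kBlockCount);
            arena[i * kBlockSize] = last ? kEnd : static_cast<std::uint8_t>(i + 1);
        }
    }

    // Start of the payload area of block i (15 usable bytes).
    std::uint8_t* payload(std::size_t i) { return &arena[i * kBlockSize + 1]; }

    // Walks the chain from block 0. Returns false if a link points outside
    // the arena, the chain loops, or it does not visit every block.
    bool walk_is_consistent() const {
        std::size_t steps = 0;
        std::uint8_t cur = 0;
        while (cur != kEnd) {
            if (cur >= kBlockCount || ++steps > kBlockCount) return false;
            cur = arena[cur * kBlockSize];
        }
        return steps == kBlockCount;
    }
};

// Simulates application code writing `len` bytes of 'A' into block `i`.
// Anything above 15 bytes runs into the next block's header.
void app_write(ToyHeap& h, std::size_t i, std::size_t len) {
    assert(i * kBlockSize + 1 + len <= h.arena.size());  // stay inside the arena
    std::memset(h.payload(i), 'A', len);
}
```

Writing 15 bytes into block 0 leaves the chain intact. Writing 16 bytes replaces block 1's link with 'A' (0x41), which is not a valid block index, so the walk fails. That is the same kind of damage a real heap manager trips over later, often far from the code that caused it.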

We might see an application crash with the following error…

0:143> .lastevent
Last event: 616c.a550: Unknown exception – code c0000374 (first/second chance not available)
debugger time: Tue Jun 13 12:41:19.531 2017 (UTC – 5:00)

0:143> !error c0000374
Error code: (NTSTATUS) 0xc0000374 (3221226356) – A heap has been corrupted.

In my case, though, the application crashed with c0000374, but when I checked the reason for the heap corruption I got the following…

Error type: HEAP_FAILURE_INVALID_ARGUMENT

So essentially the heap is not corrupted; we passed an invalid argument to the heap free function, hence this failure. How do we verify that this is indeed the reason? We pick up the address that was passed to the HeapFree function and analyze it using the !address command…

0:143> !address 0x0000003c`5cec97a8

Usage: Stack
Base Address: 0000003c`5ceae000
End Address: 0000003c`5ced0000
Region Size: 00000000`00022000 ( 136.000 kB)
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE
Allocation Base: 0000003c`5ce50000
Allocation Protect: 00000004 PAGE_READWRITE
More info: ~143k
Content source: 1 (target), length: 6858

In the output above, the usage shows as stack memory. Stack memory should never be freed via heap functions; it is released automatically when the owning scope ends. Let’s now look at another address, one which is located on the heap…

0:143> !address 0x0000003c`6acbfc00

Usage: Heap
Base Address: 0000003c`6a930000
End Address: 0000003c`6af17000
Region Size: 00000000`005e7000 ( 5.902 MB)
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE
Allocation Base: 0000003c`6a220000
Allocation Protect: 00000004 PAGE_READWRITE
More info: heap owning the address: !heap 0x3c74e70000
More info: heap segment
More info: heap entry containing the address: !heap -x 0x3c6acbfc00

Content source: 1 (target), length: 257400

This memory is owned by a heap, so a block allocated here can be freed via the HeapFree Windows API without triggering the HEAP_FAILURE_INVALID_ARGUMENT error. The ‘HEAP_FAILURE_INVALID_ARGUMENT’ diagnosis is further confirmed by the following output from an internal extension command…

**************************************************************
*                                                            *
*                    HEAP ERROR DETECTED                     *
*                                                            *
**************************************************************

Details:

Heap address: 0000003c74e70000
Error address: 0000003c5cec97a8
Error type: HEAP_FAILURE_INVALID_ARGUMENT

Details: The caller tried to a free a block at an invalid (unaligned) address.
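The kind of argument check that rejects such a pointer can be sketched in a few lines. This is a hedged toy, not the actual HeapFree validation: the names FreeCheck, Region and check_free_argument are hypothetical, the 16-byte granularity is an assumption (commonly described for 64-bit Windows heaps), and a real heap also verifies that the address is the start of a live block, which this sketch does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result codes for the toy check below.
enum class FreeCheck { Ok, Unaligned, NotInHeap };

// Assumption: heap blocks are aligned to 16 bytes on 64-bit Windows.
constexpr std::uint64_t kGranularity = 16;

// Half-open address range [base, end), e.g. taken from !address output.
struct Region {
    std::uint64_t base;
    std::uint64_t end;
};

// Toy version of the sanity checks a heap manager might run on the
// pointer passed to a free call. It checks alignment and range only.
FreeCheck check_free_argument(std::uint64_t address, Region heap_region) {
    assert(heap_region.base < heap_region.end);
    if (address % kGranularity != 0) return FreeCheck::Unaligned;
    if (address < heap_region.base || address >= heap_region.end) {
        return FreeCheck::NotInHeap;
    }
    return FreeCheck::Ok;
}
```

Fed with the values from the dump, the stack address 0x3c5cec97a8 ends in 0xa8, which is 8 modulo 16, so it fails the alignment test. That is consistent with the "invalid (unaligned) address" wording in the error details. The heap address 0x3c6acbfc00 is aligned and lies inside the region 0x3c6a930000 to 0x3c6af17000, so it passes. In source code, the typical bug behind this is calling free or delete on the address of a local variable or buffer.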

Sep 10, 2014
 

Issue

Recently a colleague asked this question. They had a customer who was experiencing a heap corruption, so as expected we enabled PageHeap, but there was a catch. The application had to run for a long time (around 30 days) in order to reproduce the crash, and we had no idea what was causing it.

How do we enable PageHeap?

We can enable standard PageHeap using the following command, run from an admin command prompt: gflags /p /enable ImageFileName
To enable full PageHeap, use the following: gflags /p /enable ImageFileName /full

(MSDN) Use care in interpreting the Enable page heap check box for an image file in the GFlags dialog box. It indicates that page heap verification is enabled for an image file, but it does not indicate whether it is full or standard page heap verification. If the check results from selecting the check box, then full page heap verification is enabled for the image file. However, if the check results from use of the command-line interface, then the check can represent the enabling of either full or standard page heap verification for the image file.

Why did the application hang?

So the customer enabled PageHeap and went home. They came back the next day to find that the application had stopped responding. The hang appeared only after PageHeap was enabled, and as we know of full PageHeap: every allocation gets its own page(s) followed by an inaccessible guard page, so committed memory, and with it the demand on the page file, grows far faster than normal. So guess why the hang took place? PageFile size!

Resolution

The customer had set the PageFile to its default size, which was not enough in this case. We suggested increasing the PageFile size, and the hang went away. Note that if you enable PageHeap and then go home, the result will eventually be unpredictable whatever the PageFile size, because the PageFile is finite. You might need to tweak your PageHeap settings: limit it to specific modules, or use standard (non-full) page heap.
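A rough back-of-the-envelope estimate shows why the page file fills up. The sketch below assumes 4 KB pages and that each live allocation under full PageHeap occupies at least one committed page of its own; the guard page is left out, so treat the result as a lower bound. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;  // assumption: 4 KB pages

// Number of whole pages needed to hold `bytes` (at least one page).
std::uint64_t pages_for(std::uint64_t bytes) {
    const std::uint64_t pages = (bytes + kPageSize - 1) / kPageSize;
    return pages == 0 ? 1 : pages;
}

// Rough lower bound on committed bytes under full page heap: every live
// allocation is assumed to sit in its own page(s). Guard pages are ignored.
std::uint64_t full_page_heap_commit_estimate(std::uint64_t live_allocations,
                                             std::uint64_t bytes_each) {
    const std::uint64_t per_allocation = pages_for(bytes_each) * kPageSize;
    // Guard against overflow in the multiplication below.
    assert(live_allocations == 0 ||
           per_allocation <= UINT64_MAX / live_allocations);
    return live_allocations * per_allocation;
}
```

For one million live 64-byte allocations, the raw data is about 64 MB, but the estimate comes to 1,000,000 × 4096 = 4,096,000,000 bytes, roughly 3.8 GB of commit. Over a 30-day run with a growing number of live allocations, a default-sized page file can plausibly be exhausted.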

Conclusion

Please note that there are different variants of PageHeap. In this case we needed full PageHeap, which is pretty heavy on the PageFile.