Sep 10 2014
 

Issue

Recently a colleague asked this question. They had a customer who was experiencing heap corruption, so as expected we enabled PageHeap, but there was a catch: the application had to run for a long time (around 30 days) to reproduce the crash, and we had no idea what was causing it.

How do we enable PageHeap?

We can enable standard PageHeap using the following command, run from an admin command prompt: gflags /p /enable ImageFileName
To enable full PageHeap use the following: gflags /p /enable ImageFileName /full

(MSDN) Use care in interpreting the Enable page heap check box for an image file in the GFlags dialog box. It indicates that page heap verification is enabled for an image file, but it does not indicate whether it is full or standard page heap verification. If the check results from selecting the check box, then full page heap verification is enabled for the image file. However, if the check results from use of the command-line interface, then the check can represent the enabling of either full or standard page heap verification for the image file.

Why did the application hang?

So the customer enabled PageHeap and went home. He came back the next day to find that the application had stopped responding. The hang showed up only after enabling PageHeap, and as we know, with PageHeap every allocation consumes far more memory than usual, and all of that memory has to be backed by the page file. So guess why the hang took place? PageFile size!

Resolution

The customer had left the PageFile at its default size, which was not enough in this case. We suggested increasing the PageFile size, and the hang went away. Note that if you enable PageHeap and then go home, no matter what the PageFile size is, the result will eventually be unpredictable because the PageFile is finite. You might need to tweak your PageHeap settings: enable it per module, or use standard rather than full PageHeap.
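
To get a feel for why full PageHeap is so heavy, here is a rough back-of-the-envelope model. It assumes 4 KB pages and that full PageHeap rounds every allocation up to whole pages and places one inaccessible guard page after it. The struct and function names are made up for this sketch, and the heap's own bookkeeping is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 4 KB pages, which is typical on x86/x64 Windows.
constexpr std::size_t kPageSize = 4096;

// Hypothetical cost model, not an API of GFlags or the heap manager.
struct FullPageHeapCost {
    std::size_t dataBytes;   // whole pages that hold the user data
    std::size_t guardBytes;  // one inaccessible page placed after the data
};

// Estimates the address space used by one allocation under full PageHeap.
FullPageHeapCost EstimateFullPageHeapCost(std::size_t requestedBytes) {
    assert(requestedBytes > 0);
    // Round the request up to a whole number of pages.
    const std::size_t pages = (requestedBytes + kPageSize - 1) / kPageSize;
    return { pages * kPageSize, kPageSize };
}
```

Under this model a 16-byte allocation occupies 4096 bytes of data pages plus a 4096-byte guard page, so a million small allocations would need on the order of 8 GB of address space.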

Conclusion

Please note that there are different variants of PageHeap. In this case we needed full PageHeap, which is pretty heavy on the PageFile.

Aug 27 2014
 
What is a Symbol?

When you compile your executable, the compiler generates debugging symbol information for every file it compiles, and the linker then assembles all of this symbol information into one file called the PDB file, or Program DataBase file. Every variable and function in your application code can be called a symbol, and as we will see there are private and public symbols.

The full local path of the generated .pdb file is embedded into the executable file by the linker. This comes in handy while you are debugging the application on your development machine, because the debugger can pick up the PDB file from that path. The PDB helps the debugger figure out line numbers, file names, call stacks, and local and global variables.

Why do we need Symbols?

A symbol file (.pdb file) contains information that is not needed to run your application but comes in handy when debugging it. Without symbol files, figuring out bugs or getting exact call stacks is a pain on Windows. You might then ask why this information is not embedded into the executable. The answer is that symbols are not always needed, so they are kept in a separate .pdb file: you can debug when needed, and you can also choose who sees which symbols, which in turn makes it harder for people to reverse engineer your code.

What does a PDB file contain?

PDB files can contain a variety of information. For example:

  • Source code information: Line numbers, file names
  • Variables: Global and Local variables mapped to their addresses
  • Function names mapped to their addresses
  • FPO information to get correct call stack.
  • etc

Windows Debugger installation contains utilities to check out a PDB file namely: symchk, agestore, symstore, pdbcopy etc.

Public and Private Symbols

When the linker generates a PDB file, it contains both private and public debugging symbol information. Of course you can configure what it generates in the linker property pages.

Private symbol data contains following (mostly)

  • Global and Local Variables.
  • Functions
  • All user defined types.
  • Line number and source file information.

Public symbol table contains following…

  • Functions (just the address)
  • Global variables that are visible across obj files.

As you might have inferred, private symbol files are bigger than public symbol files. Also, since a private symbol file contains the public symbol information as well, we can generate a separate public symbol file from a private symbol file. We use a tool called pdbcopy.exe for this purpose, which comes with the Windows debugger installation.

Symbol Path

So how do we tell the debugger where to look for symbols? One of my favorites is to use the environment variable _NT_SYMBOL_PATH. This variable gives us the flexibility to specify cache directories for downloaded symbols; we can even specify a cache directory per symbol server.

The following value for _NT_SYMBOL_PATH downloads symbols from the servers and puts them into the C:\Symbols folder.

cache*c:\Symbols;SRV*http://symbolserver;srv*http://anotherserver;srv*http://onemoreserver

The following value for _NT_SYMBOL_PATH downloads symbols from http://symbolserver into the C:\Symbols folder and downloads symbols from http://anotherserver into c:\anotherserver_cache_folder.

cache*c:\Symbols;SRV*http://symbolserver;srv*c:\anotherserver_cache_folder*http://anotherserver
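
To make the structure of these values easier to see, here is a small sketch that splits a symbol path into its elements (separated by ';') and each element into its tokens (separated by '*'). The function names Split and ParseSymbolPath are my own for illustration; this is not how the debugger itself parses the variable.

```cpp
#include <string>
#include <vector>

// Splits text on a single separator character, keeping empty tokens.
std::vector<std::string> Split(const std::string& text, char sep) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = text.find(sep, start);
        if (pos == std::string::npos) {
            out.push_back(text.substr(start));
            break;
        }
        out.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}

// Hypothetical helper: one inner vector per ';'-separated element,
// holding that element's '*'-separated tokens (e.g. "srv", cache, server).
std::vector<std::vector<std::string>> ParseSymbolPath(const std::string& path) {
    std::vector<std::vector<std::string>> elements;
    for (const std::string& element : Split(path, ';')) {
        elements.push_back(Split(element, '*'));
    }
    return elements;
}
```

For the second value above, the last element comes out as three tokens: srv, c:\anotherserver_cache_folder and http://anotherserver, which shows the per-server cache folder sitting between the keyword and the server URL.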

The Windows debugger provides commands to control the symbol path: .sympath and .symfix. I use .symfix to quickly set up a default symbol path; symbols will then be downloaded to a sym folder under the debugger folder. .sympath is a cool command too. If you want to quickly add a symbol path to the debugger, just do the following…

.sympath+ C:\AnotherSymbolFolder
.reload

Controlling Symbol Loading in Windows Debugger

The debugger provides a command called .symopt. If we run the command without any arguments it shows our current symbol loading settings, for example:

Output from .symopt

So we see in this case we’ve configured to load line number information, and since we haven’t said SYMOPT_PUBLICS_ONLY, then private symbols are loaded. SYMOPT_AUTO_PUBLICS tells debugger to look for public symbols only as a last resort.

More information on symbols loading options can be found here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff558827(v=vs.85).aspx

Along with this, to see a list of modules for which symbol loading failed, use the command ‘lme’. To get verbose output of the symbol loading process in the debugger use “!sym noisy”; to turn it off use “!sym quiet”.

Conclusion

Always keep your symbols handy. Never know when you might need them.

Dec 09 2013
 

Recently a colleague of mine asked where the length of a CString string is stored in memory. Hmm, so let's dig around. Please note I’ve declared the following CString object in my code…

CString TestCString = _T("Nibu is testing CString");

If you dump CString type in the debugger we see following…

0:000> dt TestCString
Local var @ 0xb4fcd4 Type ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >
   +0x000 m_pszData        : 0x00dfa2f8  "Nibu is testing CString"

From above dump of type CString we see that CString class defines just one variable: m_pszData. I don’t see a length variable here so where is the length stored for CString string?

Length of a CString string is stored at a negative offset from m_pszData. The data structure that resides at the negative offset is: ATL::CStringData

0:000> dt mfc100ud!ATL::CStringData
   +0x000 pStringMgr       : Ptr32 ATL::IAtlStringMgr
   +0x004 nDataLength      : Int4B
   +0x008 nAllocLength     : Int4B
   +0x00c nRefs            : Int4B

CStringData is retrieved via a call to function: GetData()

CStringData* GetData() const throw()
{
    return( reinterpret_cast< CStringData* >( m_pszData )-1 );
}

The above code is a bit of pointer arithmetic: first m_pszData is cast to a pointer to CStringData, and then 1 is subtracted from that pointer, which moves it back by sizeof(CStringData) bytes. So let's see while debugging if we can get to the CStringData located at that negative offset. First let's get the size of ATL::CStringData in memory.

0:045> ?? sizeof(ATL::CStringData)
unsigned int 0x10

The size of ATL::CStringData comes to 0x10 bytes. So in my test application let's find out what is located at a negative offset of 0x10 bytes. In my current frame I have the following locals. My CString object is called TestCString.

0:000> dv
           this = 0x00ef6ba8
        cmdInfo = class CCommandLineInfo
       ttParams = class CMFCToolTipInfo
      InitCtrls = struct tagINITCOMMONCONTROLSEX
   pDocTemplate = 0xcccccccc
    TestCString = class ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > > 
     pMainFrame = 0xcccccccc

Subtracting 0x10 bytes from the address held in m_pszData (0x00dfa2f8) gives us the address: 00dfa2e8

0:000> ? 0x00dfa2f8-0x10
Evaluate expression: 14656232 = 00dfa2e8

Let's try dumping out the CStringData located at the address 00dfa2e8. See below.

0:000> dt 00dfa2e8 TestStack!ATL::CStringData
   +0x000 pStringMgr       : 0x786cb8e4 ATL::IAtlStringMgr
   +0x004 nDataLength      : 0n23
   +0x008 nAllocLength     : 0n23
   +0x00c nRefs            : 0n1

Dump type says, length of string is: 0n23 which is correct. The length of string “Nibu is testing CString” is indeed 23.

Code documentation of CStringData says this about its member variables…

struct CStringData
{
    IAtlStringMgr* pStringMgr;  // String manager for this CStringData
    int nDataLength;  // Length of currently used data in XCHARs (not including terminating null)
    int nAllocLength;  // Length of allocated data in XCHARs (not including terminating null)
    long nRefs;     // Reference count: negative == locked
    // XCHAR data[nAllocLength+1]  // A CStringData is always followed in memory by the actual array of character data
    // (member functions omitted)
};

Difference between nDataLength and nAllocLength is quite evident from the above documentation. Hope this helps.
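
The same header-before-data trick can be tried outside MFC. The following is a minimal sketch in standard C++ using a made-up StringHeader type and helper functions; it is a stand-in for illustration and not the real ATL::CStringData or its string manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for ATL::CStringData; not the real ATL type.
struct StringHeader {
    int dataLength;   // characters in use, excluding the terminating null
    int allocLength;  // characters allocated, excluding the terminating null
    long refs;        // reference count
};

// Allocates header and characters in one block and returns a pointer to
// the characters, so the header sits at a negative offset from it.
char* CreateString(const char* text) {
    const std::size_t length = std::strlen(text);
    void* block = std::malloc(sizeof(StringHeader) + length + 1);
    if (block == nullptr) throw std::bad_alloc();
    StringHeader* header = new (block) StringHeader{
        static_cast<int>(length), static_cast<int>(length), 1 };
    char* data = reinterpret_cast<char*>(header + 1);
    std::memcpy(data, text, length + 1);
    return data;
}

// Same pointer arithmetic as GetData(): step back by one header.
StringHeader* GetHeader(char* data) {
    assert(data != nullptr);
    return reinterpret_cast<StringHeader*>(data) - 1;
}

void DestroyString(char* data) {
    std::free(GetHeader(data));
}
```

Calling GetHeader on the pointer returned by CreateString("Nibu is testing CString") yields a header whose dataLength is 23, mirroring what the debugger showed for nDataLength.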

Nov 05 2013
 

High Memory Usage Scenario

Recently I had a customer who was complaining about high memory usage on Windows 8.1. The application consumed about 140 MB on Windows 8.1, compared to a meager 3 to 4 MB on a Windows 7 or 8 machine.

Hmm, interesting. Having done troubleshooting for some time now, this smelled to me like an issue with some kind of debug flag setting. So I immediately checked with the customer whether he had accidentally left some GFlags setting configured.

It reminded me of a customer who had an issue wherein all applications on his box started showing high memory usage; eventually this turned out to be caused by a system-wide flag configured via GFlags. GFlags is a helpful tool, but please do remember to undo the changes once you’re done debugging. Maybe put a sticky note somewhere to remind you to turn these settings off.

So coming back to this incident: why would the application consume so much memory on Windows 8.1? Note: he had compiled the application using VS2008.

Memory Dump Analysis for High Memory Usage

I checked a memory dump of Test.exe running on Windows 8.1 in our debugger and saw that it had some heap validation features enabled. This is why such a huge amount of memory was being consumed: these heap validation features require extra memory.

0:000> !heap
Index   Address  Name      Debugging options enabled
  1:   00300000                 tail checking free checking validate parameters
  2:   00c20000                 tail checking free checking validate parameters
  3:   00200000                 tail checking free checking validate parameters
  4:   02170000                 tail checking free checking validate parameters

I was a bit surprised, as the customer said he didn’t have GFlags on his box. So I renamed Test.exe to Test1.exe, and this is what the dump shows now. It looks like someone is enabling heap validation flags specifically for Test.exe.

0:000> !heap
Index   Address  Name      Debugging options enabled
  1:   001d0000                
  2:   00c20000                
  3:   02220000                
  4:   00390000

The application memory usage, after renaming, came down to 3.5 MB.


Cause of High Memory Usage

Eventually we figured out who was turning the heap validation flags on. The integrated Application Verifier included in the Visual Studio Team Suite and Visual Studio Team System for Developers editions of Visual Studio was turning these features on, which is expected behavior. The customer had the Pro edition, hence he probably didn’t see the settings in the project properties. This is how the project property pages look…

[Screenshot: project property pages]

So if you have Application Verifier installed on your box, you’ll see your application listed, as Visual Studio turns certain registry settings on or off based on your project settings. Once your application starts up these settings take effect. Troubleshooting is fun, isn’t it? :)

Sep 26 2013
 
About AgeStore

It’s a good habit to clear out old symbol files. Debugging Tools for Windows comes with a built-in tool that helps us do this. The tool is named ‘AgeStore’.

AgeStore executes in three modes…

  • -date=mm-dd-yy – deletes all files that were last accessed before the specified date.
  • -days=xx – deletes all files that were last accessed before today minus the amount of days specified by ‘xx’.
  • -size=xx – deletes files in order of last access time (oldest first), until all the files in the directory total to the amount of bytes specified by ‘xx’.

There is a caveat when running this command on Vista and later. On Vista and later, “Last Access Time” is disabled by default, and since AgeStore works on “Last Access Time” the tool will fail. Use the fsutil command to turn on the “Last Access Time” feature, as follows…

E:\>fsutil behavior set DisableLastAccess 0
DisableLastAccess = 0

This turns on the last access feature. Please note that if this feature was off until now, AgeStore will not find any old files (based on access), since last access tracking has only just been turned on. So you’ll have to leave this feature on and run the AgeStore command later.

Note also that the default action of the AgeStore command is to delete files unless you pass the -l switch, so please be very careful. AgeStore can be used on any folder, not just on a symbol folder.

AgeStore Help Text
E:\>Agestore

agestore [pathspec]

Deletes all files from a directory based on the last access time of the files.
[pathspec] defines the root path and file specification.
The default is all files in the current working directory

It runs in one of these modes...

-date=mm-dd-yy    - deletes all files that were last accessed before the specified date.
-days=xx          - deletes all files that were last accessed before today minus the
                    amount of days specified by 'xx'.
-size=xx          - deletes files in order of last access time (oldest first), until all the
                    files in the directory total to the amount of bytes specified by 'xx'.
-size             - lists the amount of bytes in the directory.
-lat=<on off>     - toggles filesytem support for last-access-time.

These other command line switches alter the behavior of the program.

-l                - list files only, don't delete
-s                - include subdirectories.
-k                - keep empty subdirectories - normally they are removed.
-q                - quiet mode stops listing of files as they are deleted.
-y                - eliminates the (y/n) prompt.
-r                - deletes RO files

This program deletes files.  You should run agestore with the -l switch
to see what it will delete, before actual usage

Sample Commands

  • The following command lists all symbols older than the given date
    AgeStore e:\pdbsymbols -date=07-08-13 -s -l
  • The following command lists all pdb files older than the number of days given below
    AgeStore e:\pdbsymbols -days=60 -s -l
  • The following command lists the files that would be deleted in order of last access time (oldest first), until all the files in the directory total to the amount of bytes specified by the parameter passed to the -size switch. Because of -l nothing is actually deleted.
    AgeStore e:\pdbsymbols -size=8000000 -s -l
    <snip>
    10375868360 bytes would be deleted
    4336640 bytes would remain
  • The following command lists the amount of bytes in the directory.
    AgeStore e:\pdbsymbols -size -s
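
The -size=xx mode is the least obvious of the three, so here is a small sketch of the selection logic as I read it from the help text: sort by last access time and pick the oldest files until what remains fits in the limit. FileEntry and SelectForDeletion are names I made up; this is not AgeStore's actual implementation and it touches no real files.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory description of a file.
struct FileEntry {
    std::string name;
    std::int64_t lastAccess;   // e.g. seconds since some epoch; smaller is older
    std::uint64_t sizeBytes;
};

// Returns the files that would be deleted, oldest first, so that the
// remaining files total at most limitBytes.
std::vector<FileEntry> SelectForDeletion(std::vector<FileEntry> files,
                                         std::uint64_t limitBytes) {
    std::sort(files.begin(), files.end(),
              [](const FileEntry& a, const FileEntry& b) {
                  return a.lastAccess < b.lastAccess;
              });

    std::uint64_t total = 0;
    for (const FileEntry& file : files) total += file.sizeBytes;

    std::vector<FileEntry> toDelete;
    for (const FileEntry& file : files) {
        if (total <= limitBytes) break;  // what is left already fits
        total -= file.sizeBytes;
        toDelete.push_back(file);
    }
    return toDelete;
}
```

This matches the shape of the sample output above: with -size=8000000 the tool reported that 4336640 bytes would remain, which is the first total at or below the limit after removing the oldest files.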

Aug 23 2013
 

While stepping through disassembly code you might have wondered if there is a way to jump directly to the next branching instruction, the next call, or the next return instruction. The answer is yes, there are some very useful commands for this. The following commands are taken from the WinDbg documentation.

  • p (Step). Menu: Debug | Step Over. Shortcut: F10. Target executes one instruction. If this instruction is a function call, that function is executed as a single step.
  • pa (Step to Address). Target executes until it reaches the specified address. All steps in this function are displayed (but steps in called functions are not).
  • pc (Step to Next Call). Target executes until the next call instruction. If the current instruction is a call instruction, this call is executed completely and execution continues until the next call.
  • pct (Step to Next Call or Return). Target executes until it reaches a call instruction or a return instruction.
  • ph (Step to Next Branching Instruction). Target executes until it reaches any kind of branching instruction, including conditional or unconditional branches, calls, returns, and system calls.
  • pt (Step to Next Return). Target executes until it reaches a return instruction.
  • t (Trace). Menu: Debug | Step Into. Shortcuts: F11, F8. Target executes one instruction. If this instruction is a function call, debugger traces into that call.
  • ta (Trace to Address). Target executes until it reaches the specified address. All steps in this function and called functions are displayed.
  • tb (Trace to Next Branch). (All modes, except kernel mode, only on x86-based systems) Target executes until it reaches the next branch instruction.
  • tc (Trace to Next Call). Target executes until the next call instruction. If the current instruction is a call instruction, the instruction is traced into until a new call is reached.
  • tct (Trace to Next Call or Return). Target executes until it reaches a call instruction or return instruction. If the current instruction is a call instruction or return instruction, the instruction is traced into until a new call or return is reached.
  • th (Trace to Next Branching Instruction). Target executes until it reaches any kind of branching instruction, including conditional or unconditional branches, calls, returns, and system calls. If the current instruction is a branching instruction, the instruction is traced into until a new branching instruction is reached.
  • tt (Trace to Next Return). Target executes until it reaches a return instruction. If the current instruction is a return instruction, the instruction is traced into until a new return is reached.
  • wt (Trace and Watch Data). Target executes until the completion of the whole specified function. Statistics are then displayed.

Apr 27 2013
 

Filename and line number information is stored inside private symbols (.pdb file). So if private symbols are available the debugger will try to figure out the line number information. Note: public symbols don’t have line number information.

A question I’ve heard from people new to WinDbg is how to turn off the line number display. The easiest way, and what I normally do, is to use the ‘.lines’ command. It is a toggle: the next time you execute .lines, line number information is turned back on.

Another option is to use .symopt command:
http://msdn.microsoft.com/en-in/library/windows/hardware/ff558827(v=vs.85).aspx

The symbol option of interest to us is: SYMOPT_LOAD_LINES. Following is the MSDN description of this item.

This symbol option allows line number information to be read from source files. This option must be on for source debugging to work correctly.

In KD and CDB, this option is off by default; in WinDbg, this option is on by default. In CDB and KD, the -lines command-line option will turn this option on. Once the debugger is running, it can be turned on or off by using .symopt+0x10 or .symopt-0x10, respectively. It can also be toggled on and off by using the .lines (Toggle Source Line Support) command.

This option is on by default in DBH. Once DBH is running, it can be turned on or off by using symopt +10 or symopt -10, respectively.
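
Since .symopt+0x10, .symopt-0x10 and .lines all manipulate one bit in a mask of symbol options, here is a tiny sketch of the bit operations involved. The value 0x10 for SYMOPT_LOAD_LINES comes from the text above; the helper function names are my own and are not debugger or DbgHelp APIs.

```cpp
#include <cstdint>

// SYMOPT_LOAD_LINES, as used by .symopt+0x10 / .symopt-0x10 above.
constexpr std::uint32_t kSymoptLoadLines = 0x10;

// Like ".symopt+<flag>": set the bit, leave the others alone.
constexpr std::uint32_t EnableOption(std::uint32_t options, std::uint32_t flag) {
    return options | flag;
}

// Like ".symopt-<flag>": clear the bit, leave the others alone.
constexpr std::uint32_t DisableOption(std::uint32_t options, std::uint32_t flag) {
    return options & ~flag;
}

// Like ".lines": flip the bit.
constexpr std::uint32_t ToggleOption(std::uint32_t options, std::uint32_t flag) {
    return options ^ flag;
}

constexpr bool IsEnabled(std::uint32_t options, std::uint32_t flag) {
    return (options & flag) != 0;
}
```

For example, starting from an options value of 0x2, enabling kSymoptLoadLines gives 0x12, and toggling it twice returns to the starting value.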

Apr 27 2013
 

What’s NonInvasive debugging?

Non-invasive debugging is a useful technique for debugging hung processes. During non-invasive debugging the debugger suspends all threads in the process and has access to all of the process's threads, memory and registers. However, the debugger cannot modify process memory and cannot instruct the process to run.

NonInvasive debugging via WinDbg

To do non-invasive debugging via windbg/cdb check this link out:
http://msdn.microsoft.com/en-in/library/windows/hardware/ff552274(v=vs.85).aspx

Here let me show you how to do NonInvasive debugging via WinDbg UI.

Open WinDbg, press F6 or File->Attach to a Process. Please make sure you check “Noninvasive” check box in the “Attach to Process” dialog.

Enable NonInvasive debugging

While non-invasive debugging is in progress we can have another instance of the debugger attached to the debuggee. This proves that during non-invasive debugging the debugger is not really attached to the debuggee; note that on Windows only one debugger can be attached to a process at any time. Also, while debugging non-invasively, common WinDbg commands like ‘g’ won’t work, because this debugger is not attached invasively and hence cannot instruct the process to resume execution. A debugger invasively attached to a debuggee manipulates it via a thread created in the remote process.

Thread manipulation during NonInvasive debugging

As written already: for non-invasive debugging the debugger suspends all the threads in the process so this also means that we can resume execution of these threads too. The command to do this is as follows…

0:000> ~*m

~m resumes a thread while ~n suspends a thread. If we don’t resume the threads we won’t see the process UI as the UI thread is also in a suspended state.

Viewing debuggee process memory during NonInvasive debugging

Now with two debuggers monitoring the process we can view the process’ memory via the non-invasive debugger as well. For example, when you set a breakpoint via the second debugger (attached invasively) it's interesting to see how the function code is modified by the debugger to get the breakpoint to work.

This is how the code for ntdll!ntopenfile looks before a breakpoint is set…

0:001> uf ntdll!ntopenfile
ntdll!ZwOpenFile:
000007f9`25972f10 4c8bd1          mov     r10,rcx           <<<<---- This three byte instruction is replaced, see below
000007f9`25972f13 b831000000      mov     eax,31h
000007f9`25972f18 0f05            syscall
000007f9`25972f1a c3              ret

This is how the code for ntdll!ntopenfile looks after a breakpoint is set…

0:000> uf ntdll!ntopenfile
ntdll!ZwOpenFile:
000007f9`25972f10 cc              int     3  <<<<------- Single byte instruction cc, and followed by
000007f9`25972f11 8bd1            mov     edx,ecx <<--- 8bd1: remaining two bytes of the above three bytes instruction: 4c8bd1. 4c replaced by cc.
000007f9`25972f13 b831000000      mov     eax,31h
000007f9`25972f18 0f05            syscall
000007f9`25972f1a c3              ret

In effect the original three byte instruction (4c8bd1) is replaced by (cc8bd1). The only change: 4c –> cc. cc is the opcode for int 3. When the breakpoint is hit (or when we press Ctrl + Break) the breakpoint instruction is replaced by the original opcode bytes, i.e. 4c8bd1.
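
The byte swap described above can be simulated on a plain buffer. This is only a sketch of the idea, with made-up names (SoftwareBreakpoint, SetBreakpoint, RemoveBreakpoint); a real debugger writes into the debuggee's memory through OS debugging APIs, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kInt3 = 0xCC;  // opcode of the int 3 instruction

// Hypothetical record of a patched location.
struct SoftwareBreakpoint {
    std::size_t offset;          // where the first opcode byte was replaced
    std::uint8_t originalByte;   // saved so the instruction can be restored
};

// Saves the first byte of the instruction and overwrites it with int 3.
SoftwareBreakpoint SetBreakpoint(std::vector<std::uint8_t>& code, std::size_t offset) {
    assert(offset < code.size());
    const SoftwareBreakpoint bp{ offset, code[offset] };
    code[offset] = kInt3;
    return bp;
}

// Puts the saved byte back, restoring the original instruction.
void RemoveBreakpoint(std::vector<std::uint8_t>& code, const SoftwareBreakpoint& bp) {
    assert(bp.offset < code.size());
    code[bp.offset] = bp.originalByte;
}
```

Feeding it the bytes of ZwOpenFile from the listing (4c 8b d1 b8 31 00 00 00 0f 05 c3) and setting a breakpoint at offset 0 turns the first three bytes into cc 8b d1, and removing it restores 4c 8b d1.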

We could figure this out via the non-Invasive debugger. If a process is hung we can in effect go through the call stacks and find out potential hang scenarios, for e.g. a process waiting on a network drive.