Windows Server 2003: stack overflow

In this case Windows Server 2003 was crashing during boot.

kd> vertarget
Windows Server 2003 Kernel Version 3790 (Service Pack 2) UP Free x86 compatible
Product: LanManNt, suite: TerminalServer SingleUserTS
Built by: 3790.srv03_sp2_qfe.150316-2035

As always, a good starting point for analyzing a crash dump is the !analyze command.

kd> !analyze -v
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
kb will then show the corrected stack.
Arg2: 80042000
Arg3: 00000000
Arg4: 00000000
Debugging Details:
STACK_OVERFLOW: Stack Limit: b7952000 Use (kF) and (!stackusage) to investigate stack usage.
b7952000 f722de1f 8a168b00 88b0c201 88b0c201 SCSIPORT!GetLogicalUnitExtension+0x9
b7952020 f7247d29 8a15f004 88b0c201 88b0c201 SCSIPORT!ScsiPortGetLogicalUnit+0x1e
WARNING: Stack unwind information not available. Following frames may be wrong.
b795204c f7248d95 8a15f004 88b0c201 8a168b00 nvgts+0x1d29
b79520cc f72290be 8a15f004 88b0c288 8a168b00 nvgts+0x2d95
b79520f0 8088f3f1 8a168a48 00000002 88b0c202 SCSIPORT!SpStartIoSynchronized+0x14f
b7952128 80a63147 8a168a48 88ae0588 88ac0e08 nt!KeSynchronizeExecution+0x21
b7952154 f72293a6 88ac0e08 8a168a48 88ac4c08 hal!HalBuildScatterGatherList+0x1c7
b79521a0 8081d180 8a168a48 88ac0e08 8a168b00 SCSIPORT!ScsiPortStartIo+0x36a
b79521c4 f722962f 8a168a48 88ae0588 00000000 nt!IoStartPacket+0x82
b79521f8 f7229146 8a168a48 02ae0640 88b0c288 SCSIPORT!ScsiPortFdoDispatch+0x270
b7952214 f7228dc3 8a1520e8 88ae0588 88ae0588 SCSIPORT!SpDispatchRequest+0x68
b7952230 f7228299 8a152030 88ae0588 8a188298 SCSIPORT!ScsiPortPdoScsi+0x129
b7952244 8081e185 8a152030 88ae0588 f735939c SCSIPORT!ScsiPortGlobalDispatch+0x1d
b7952258 f73593c6 f735939c b7952294 f7359bb2 nt!IofCallDriver+0x45
b7952264 f7359bb2 8a183e00 88ae0588 88ae0588 ACPI!ACPIDispatchForwardIrp+0x2a
b7952294 8081e185 8a183e00 f7370294 88b0c288 ACPI!ACPIDispatchIrp+0x15a
b79522a8 f729c6c8 88ae0588 88b0c288 88ab9428 nt!IofCallDriver+0x45
b79522c0 f729fee4 8a153020 88b0c288 88ae0588 nvrd32+0x16c8
b7952330 f72a0f90 8a153020 f7ab6000 00001000 nvrd32+0x4ee4
b7952360 f72a19a4 8a153020 8a18c160 8a18c160 nvrd32+0x5f90
b7952380 f72a1a6b b79523a8 f72a53d0 8a18c160 nvrd32+0x69a4
b7952388 f72a53d0 8a18c160 88ab9428 8a198050 nvrd32+0x6a6b
b79523a8 f72a542a 88ab9428 88ab5bf0 b79523ec nvrd32+0xa3d0
b79523b8 f72ad0aa 88ab5bf0 8a1bd1a0 88ab5be0 nvrd32+0xa42a
b79523ec f72a522e 8a1bd1a0 88ab5be0 88ab5be0 nvrd32+0x120aa
b7952400 f72a546c 8a1bd1a0 88ab5be0 b795241c nvrd32+0xa22e
b7952410 f72a5488 88ab5be0 b795243c f72add0d nvrd32+0xa46c
b795241c f72add0d 88ab5be0 00000000 8a1bd1a0 nvrd32+0xa488
b795243c f72a53d0 8a1bd1a0 88ab5be0 f7ab6000 nvrd32+0x12d0d
b795245c f72a183b 00ab5be0 8a18eba8 8a18e9d8 nvrd32+0xa3d0
b7952484 f729fb52 8a18e9d8 88ab4d04 f72b0134 nvrd32+0x683b
b79524ac f72a0d98 00000000 f72b0134 00000000 nvrd32+0x4b52
b79524c8 f72a0ed2 00000000 8a18e9d8 88ab4c08 nvrd32+0x5d98
b79524e4 f729c284 8a18e9d8 88ab4ac0 b7952510 nvrd32+0x5ed2
b79524f4 f72a2d8b 8a18e9d8 88ab4ac0 88ade578 nvrd32+0x1284
b7952510 8081e185 8a229428 88ab4ac0 88ab4c58 nvrd32+0x7d8b
b7952524 f7289607 88ab4c58 d4642000 b7952568 nt!IofCallDriver+0x45
b7952534 f72892b2 88ab4c58 8a122b70 88ade750 CLASSPNP!SubmitTransferPacket+0xbb
b7952568 f7289533 00000000 00001000 88ade578 CLASSPNP!ServiceTransferRequest+0x1e4
b795258c 8081e185 8a122ab8 00000000 8a151030 CLASSPNP!ClassReadWrite+0x159
b79525a0 f74c803f 8a1205e0 8a122880 b79525c4 nt!IofCallDriver+0x45
b79525b0 8081e185 8a122880 88ade578 88ade774 PartMgr!PmReadWrite+0x9a
b79525c4 f7317053 88ade578 8a22f058 88ade578 nt!IofCallDriver+0x45
b79525e0 8081e185 8a120528 88ade578 88ade798 ftdisk!FtDiskReadWrite+0x1a9
b79525f4 f72c08bc 8a2211c8 e161f438 8a18d868 nt!IofCallDriver+0x45
b795260c 8081e185 8a18d868 88ade578 88ade578 volsnap!VolSnapRead+0x52
b7952620 f7b50a62 b7952904 b7952804 f7b508d9 nt!IofCallDriver+0x45
b795262c f7b508d9 b7952904 8a18d868 5cc44000 Ntfs!NtfsSingleAsync+0x91
b7952804 f7b51156 b7952904 88ade578 e161f438 Ntfs!NtfsNonCachedIo+0x2db
b79528f0 f7b51079 b7952904 88ade578 00000001 Ntfs!NtfsCommonRead+0xaf5
b7952a9c 8081e185 8a09f718 88ade578 8a17f400 Ntfs!NtfsFsdRead+0x113
b7952ab0 f7205c59 8a17f400 00000000 00000306 nt!IofCallDriver+0x45
b7952ad8 8081e185 8a0a02d0 88ade578 88ade578 fltMgr!FltpDispatch+0x6f
b7952aec 8081e70d 88b53920 88b5d950 c060ba08 nt!IofCallDriver+0x45
b7952b04 808523fe 8a0a380d 88b5d988 88b5d968 nt!IoPageRead+0x109
b7952ba0 8086000e 00000001 c1741000 88b5d950 nt!MiDispatchFault+0xece
b7952c24 8088e680 00000000 c1741000 00000000 nt!MmAccessFault+0x89e
b7952c24 808b84a6 00000000 c1741000 00000000 nt!KiTrap0E+0xdc
b7952cec f7b90f2d 8a0a3898 b7952d1c 00001000 nt!CcMapData+0x8c
b7952d0c f7b939d5 88b0d648 e161f438 00041000 Ntfs!NtfsMapStream+0x4b
b7952d3c f7b94050 88b0d648 0000000c 00000041 Ntfs!ReadIndexBuffer+0x8f
b7952d6c f7b7f746 88b0d648 b7952e10 b7952ed4 Ntfs!FindFirstIndexEntry+0x196
b7952e74 f7b7f83c 88b0d648 e161f438 b7952ed4 Ntfs!NtOfsFindRecord+0xa4
b7952edc f7b7f95e 88b0d648 8a09f7f8 00000dad Ntfs!MapSecurityIdToSecurityDescriptorHeaderUnsafe+0x32
b7952f2c f7b8463e 88b0d648 8a09f7f8 00000dad Ntfs!NtfsCacheSharedSecurityBySecurityId+0x9e
b7952fc0 f7b847be 88b0d648 00000001 e17d3330 Ntfs!NtfsUpdateFcbInfoFromDisk+0x1d9
b795308c f7b820b9 88b0d648 88ad5db8 88ad5fd8 Ntfs!NtfsOpenFile+0x330
b79532b0 f7b91ef8 88b0d648 88ad5db8 b79532f0 Ntfs!NtfsCommonCreate+0x127e
b79533b4 8081e185 8a09f718 88ad5db8 8a17f400 Ntfs!NtfsFsdCreate+0x17d
b79533c8 f7213482 88ac2198 8a17f400 8a174368 nt!IofCallDriver+0x45
b79533f4 8081e185 8a0a02d0 88ad5db8 88ad5db8 fltMgr!FltpCreate+0xe4
b7953408 808fb411 b79535b4 8a120510 00000000 nt!IofCallDriver+0x45
b79534f0 80939f4d 8a120528 00000000 88b63760 nt!IopParseDevice+0xa35
b7953574 80936066 00000000 b79535b4 00000240 nt!ObpLookupObjectName+0x5c1
b79535c8 808ed0b5 00000000 00000000 00000100 nt!ObOpenObjectByName+0xea
b7953644 808ee36b b79537d8 80100000 88ad4db4 nt!IopCreateFile+0x447
b79536a0 808f0faa b79537d8 80100000 88ad4db4 nt!IoCreateFile+0xa3
b79536e0 8088b658 b79537d8 80100000 88ad4db4 nt!NtCreateFile+0x30
b79536e0 8082f4d1 b79537d8 80100000 88ad4db4 nt!KiSystemServicePostCall
b7953784 b8572ab9 b79537d8 80100000 88ad4db4 nt!ZwCreateFile+0x11
b79537cc 80a64456 ffdff120 00000000 b79537f8 nv4_mini!nvDumpConfig+0x1e1b29
b79537dc 80831eaa 00000001 ffdffa7c 80a64456 hal!KfLowerIrql+0x62
b79537f8 80828f97 00000000 88b53920 88b539c8 nt!KiExitDispatcher+0x130
b7953810 80829f96 ffdffa7c 88aadf28 88aade88 nt!KiAdjustQuantumThread+0x109
b795385c b79b6220 b79aa140 00000000 b79b6a9b nt!KeWaitForSingleObject+0x536
b79538e4 8081e185 8a02d038 88aade70 00000004 VIDEOPRT!pVideoPortDispatch+0xae
b79538f8 bf827957 00000003 00000000 80100000 nt!IofCallDriver+0x45
b7953928 bf809b0f 8a02d038 00232ee8 b7953984 win32k!GreDeviceIoControl+0x93
b795394c bd1195dd 8a02d038 00232ee8 b7953984 win32k!EngDeviceIoControl+0x1f
b7953c50 80a64456 808b0b40 808b6100 b7953ca8 nv4_disp+0x1025dd
b7953c60 80a6256d 808954f8 88b51204 0000006e hal!KfLowerIrql+0x62
b7953c64 808954f8 88b51204 0000006e 00000000 hal!KeReleaseQueuedSpinLock+0x2d
b7953ca8 88ac3d38 00000078 00000005 00000004 nt!ExAllocatePoolWithTag+0x980
00000000 00000000 00000000 00000000 00000000 0x88ac3d38

The !analyze output indicates that this crash is the result of a stack overflow, and it suggests using kF and !stackusage to investigate stack usage. So let's follow the debugger's advice.

kd> !stackusage
    Stack Usage By Module
      Size     Count  Module
0x00000B44        14  Ntfs
0x0000078C        32  nt
0x00000304         1  nv4_disp
0x00000268        19  nvrd32
0x00000118         8  SCSIPORT
0x000000AC         2  nvgts
0x00000088         1  VIDEOPRT
0x00000068         3  CLASSPNP
0x00000054         2  win32k
0x00000054         2  fltMgr
0x00000050         4  hal
0x00000048         1  nv4_mini
0x0000003C         2  ACPI
0x0000001C         1  ftdisk
0x00000018         1  volsnap
0x00000010         1  PartMgr
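The totals that !stackusage reports appear to be the per-frame sizes (the Memory column of kF output) summed per module. As a sanity check, the hypothetical helper below parses the Ntfs and nvrd32 frames copied from the manually reconstructed stack later in this post and sums them by module. It is only an illustration of the arithmetic, not how the debugger extension is implemented.

```python
from collections import defaultdict

# Ntfs and nvrd32 frames copied from the manually reconstructed kF listing.
# Columns: frame number, Memory (frame size in hex bytes), ChildEBP, RetAddr, symbol.
KF_EXCERPT = """
12        18 b79522c0 f729fee4 nvrd32+0x16c8
13        70 b7952330 f72a0f90 nvrd32+0x4ee4
14        30 b7952360 f72a19a4 nvrd32+0x5f90
15        20 b7952380 f72a1a6b nvrd32+0x69a4
16         8 b7952388 f72a53d0 nvrd32+0x6a6b
17        20 b79523a8 f72a542a nvrd32+0xa3d0
18        10 b79523b8 f72ad0aa nvrd32+0xa42a
19        34 b79523ec f72a522e nvrd32+0x120aa
1a        14 b7952400 f72a546c nvrd32+0xa22e
1b        10 b7952410 f72a5488 nvrd32+0xa46c
1c         c b795241c f72add0d nvrd32+0xa488
1d        20 b795243c f72a53d0 nvrd32+0x12d0d
1e        20 b795245c f72a183b nvrd32+0xa3d0
1f        28 b7952484 f729fb52 nvrd32+0x683b
20        28 b79524ac f72a0d98 nvrd32+0x4b52
21        1c b79524c8 f72a0ed2 nvrd32+0x5d98
22        1c b79524e4 f729c284 nvrd32+0x5ed2
23        10 b79524f4 f72a2d8b nvrd32+0x1284
24        1c b7952510 8081e185 nvrd32+0x7d8b
30         c b795262c f7b508d9 Ntfs!NtfsSingleAsync+0x91
31       1d8 b7952804 f7b51156 Ntfs!NtfsNonCachedIo+0x2db
32        ec b79528f0 f7b51079 Ntfs!NtfsCommonRead+0xaf5
33       1ac b7952a9c 8081e185 Ntfs!NtfsFsdRead+0x113
3c        20 b7952d0c f7b939d5 Ntfs!NtfsMapStream+0x4b
3d        30 b7952d3c f7b94050 Ntfs!ReadIndexBuffer+0x8f
3e        30 b7952d6c f7b7f746 Ntfs!FindFirstIndexEntry+0x196
3f       108 b7952e74 f7b7f83c Ntfs!NtOfsFindRecord+0xa4
40        68 b7952edc f7b7f95e Ntfs!MapSecurityIdToSecurityDescriptorHeaderUnsafe+0x32
41        50 b7952f2c f7b8463e Ntfs!NtfsCacheSharedSecurityBySecurityId+0x9e
42        94 b7952fc0 f7b847be Ntfs!NtfsUpdateFcbInfoFromDisk+0x1d9
43        cc b795308c f7b820b9 Ntfs!NtfsOpenFile+0x330
44       224 b79532b0 f7b91ef8 Ntfs!NtfsCommonCreate+0x127e
45       104 b79533b4 8081e185 Ntfs!NtfsFsdCreate+0x17d
"""


def stack_usage_by_module(kf_text):
    """Sum the Memory column per module; returns {module: (total_bytes, frames)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in kf_text.strip().splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # lines without a Memory value (e.g. frame 00) are skipped
        _frame, memory, _child_ebp, _ret_addr, symbol = fields
        # "Ntfs!NtfsOpenFile+0x330" -> "Ntfs", "nvrd32+0x16c8" -> "nvrd32"
        module = symbol.split("!")[0].split("+")[0]
        totals[module][0] += int(memory, 16)
        totals[module][1] += 1
    return {module: (size, count) for module, (size, count) in totals.items()}


usage = stack_usage_by_module(KF_EXCERPT)
# Print in the same shape as !stackusage, largest consumer first.
for module, (size, count) in sorted(usage.items(), key=lambda item: -item[1][0]):
    print(f"0x{size:08X}{count:>10}  {module}")
```

Running it reproduces the Ntfs (0xB44, 14 frames) and nvrd32 (0x268, 19 frames) rows of the !stackusage output above.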

It would be easy to blame NTFS, since it is the top stack consumer. Instead, let's find out which drivers are calling into NTFS to access data. As we can see above, the stack is incomplete: it stops at an unresolved address (0x88ac3d38). This is because the third-party drivers were compiled with frame pointer omission (FPO) enabled, so the debugger cannot reliably walk past their frames. So I reconstructed the stack manually.

    #   Memory  ChildEBP RetAddr  
00           00000000 f722c88e nt!KiTrap08+0x75
01  b7952000 b7952000 f722de1f SCSIPORT!GetLogicalUnitExtension+0x9
02        20 b7952020 f7247d29 SCSIPORT!ScsiPortGetLogicalUnit+0x1e
03        2c b795204c f7248d95 nvgts+0x1d29
04        80 b79520cc f72290be nvgts+0x2d95
05        24 b79520f0 8088f3f1 SCSIPORT!SpStartIoSynchronized+0x14f
06        38 b7952128 80a63147 nt!KeSynchronizeExecution+0x21
07        2c b7952154 f72293a6 hal!HalBuildScatterGatherList+0x1c7
08        4c b79521a0 8081d180 SCSIPORT!ScsiPortStartIo+0x36a
09        24 b79521c4 f722962f nt!IoStartPacket+0x82
0a        34 b79521f8 f7229146 SCSIPORT!ScsiPortFdoDispatch+0x270
0b        1c b7952214 f7228dc3 SCSIPORT!SpDispatchRequest+0x68
0c        1c b7952230 f7228299 SCSIPORT!ScsiPortPdoScsi+0x129
0d        14 b7952244 8081e185 SCSIPORT!ScsiPortGlobalDispatch+0x1d
0e        14 b7952258 f73593c6 nt!IofCallDriver+0x45
0f         c b7952264 f7359bb2 ACPI!ACPIDispatchForwardIrp+0x2a
10        30 b7952294 8081e185 ACPI!ACPIDispatchIrp+0x15a
11        14 b79522a8 f729c6c8 nt!IofCallDriver+0x45
12        18 b79522c0 f729fee4 nvrd32+0x16c8
13        70 b7952330 f72a0f90 nvrd32+0x4ee4
14        30 b7952360 f72a19a4 nvrd32+0x5f90
15        20 b7952380 f72a1a6b nvrd32+0x69a4
16         8 b7952388 f72a53d0 nvrd32+0x6a6b
17        20 b79523a8 f72a542a nvrd32+0xa3d0
18        10 b79523b8 f72ad0aa nvrd32+0xa42a
19        34 b79523ec f72a522e nvrd32+0x120aa
1a        14 b7952400 f72a546c nvrd32+0xa22e
1b        10 b7952410 f72a5488 nvrd32+0xa46c
1c         c b795241c f72add0d nvrd32+0xa488
1d        20 b795243c f72a53d0 nvrd32+0x12d0d
1e        20 b795245c f72a183b nvrd32+0xa3d0
1f        28 b7952484 f729fb52 nvrd32+0x683b
20        28 b79524ac f72a0d98 nvrd32+0x4b52
21        1c b79524c8 f72a0ed2 nvrd32+0x5d98
22        1c b79524e4 f729c284 nvrd32+0x5ed2
23        10 b79524f4 f72a2d8b nvrd32+0x1284
24        1c b7952510 8081e185 nvrd32+0x7d8b
25        14 b7952524 f7289607 nt!IofCallDriver+0x45
26        10 b7952534 f72892b2 CLASSPNP!SubmitTransferPacket+0xbb
27        34 b7952568 f7289533 CLASSPNP!ServiceTransferRequest+0x1e4
28        24 b795258c 8081e185 CLASSPNP!ClassReadWrite+0x159
29        14 b79525a0 f74c803f nt!IofCallDriver+0x45
2a        10 b79525b0 8081e185 PartMgr!PmReadWrite+0x9a
2b        14 b79525c4 f7317053 nt!IofCallDriver+0x45
2c        1c b79525e0 8081e185 ftdisk!FtDiskReadWrite+0x1a9
2d        14 b79525f4 f72c08bc nt!IofCallDriver+0x45
2e        18 b795260c 8081e185 volsnap!VolSnapRead+0x52
2f        14 b7952620 f7b50a62 nt!IofCallDriver+0x45
30         c b795262c f7b508d9 Ntfs!NtfsSingleAsync+0x91
31       1d8 b7952804 f7b51156 Ntfs!NtfsNonCachedIo+0x2db
32        ec b79528f0 f7b51079 Ntfs!NtfsCommonRead+0xaf5
33       1ac b7952a9c 8081e185 Ntfs!NtfsFsdRead+0x113
34        14 b7952ab0 f7205c59 nt!IofCallDriver+0x45
35        28 b7952ad8 8081e185 fltMgr!FltpDispatch+0x6f
36        14 b7952aec 8081e70d nt!IofCallDriver+0x45
37        18 b7952b04 808523fe nt!IoPageRead+0x109
38        9c b7952ba0 8086000e nt!MiDispatchFault+0xece
39        84 b7952c24 8088e680 nt!MmAccessFault+0x89e
3a         0 b7952c24 808b84a6 nt!KiTrap0E+0xdc
3b        c8 b7952cec f7b90f2d nt!CcMapData+0x8c
3c        20 b7952d0c f7b939d5 Ntfs!NtfsMapStream+0x4b
3d        30 b7952d3c f7b94050 Ntfs!ReadIndexBuffer+0x8f
3e        30 b7952d6c f7b7f746 Ntfs!FindFirstIndexEntry+0x196
3f       108 b7952e74 f7b7f83c Ntfs!NtOfsFindRecord+0xa4
40        68 b7952edc f7b7f95e Ntfs!MapSecurityIdToSecurityDescriptorHeaderUnsafe+0x32
41        50 b7952f2c f7b8463e Ntfs!NtfsCacheSharedSecurityBySecurityId+0x9e
42        94 b7952fc0 f7b847be Ntfs!NtfsUpdateFcbInfoFromDisk+0x1d9
43        cc b795308c f7b820b9 Ntfs!NtfsOpenFile+0x330
44       224 b79532b0 f7b91ef8 Ntfs!NtfsCommonCreate+0x127e
45       104 b79533b4 8081e185 Ntfs!NtfsFsdCreate+0x17d
46        14 b79533c8 f7213482 nt!IofCallDriver+0x45
47        2c b79533f4 8081e185 fltMgr!FltpCreate+0xe4
48        14 b7953408 808fb411 nt!IofCallDriver+0x45
49        e8 b79534f0 80939f4d nt!IopParseDevice+0xa35
4a        84 b7953574 80936066 nt!ObpLookupObjectName+0x5c1
4b        54 b79535c8 808ed0b5 nt!ObOpenObjectByName+0xea
4c        7c b7953644 808ee36b nt!IopCreateFile+0x447
4d        5c b79536a0 808f0faa nt!IoCreateFile+0xa3
4e        40 b79536e0 8088b658 nt!NtCreateFile+0x30
4f         0 b79536e0 8082f4d1 nt!KiSystemServicePostCall
50        a4 b7953784 b8572ab9 nt!ZwCreateFile+0x11
51        48 b79537cc 80a64456 nv4_mini!nvDumpConfig+0x1e1b29
52        10 b79537dc 80831eaa hal!KfLowerIrql+0x62
53        1c b79537f8 80828f97 nt!KiExitDispatcher+0x130
54        18 b7953810 80829f96 nt!KiAdjustQuantumThread+0x109
55        4c b795385c b79b6220 nt!KeWaitForSingleObject+0x536
56        88 b79538e4 8081e185 VIDEOPRT!pVideoPortDispatch+0xae
57        14 b79538f8 bf827957 nt!IofCallDriver+0x45
58        30 b7953928 bf809b0f win32k!GreDeviceIoControl+0x93
59        24 b795394c bd1195dd win32k!EngDeviceIoControl+0x1f
5a       304 b7953c50 80a64456 nv4_disp+0x1025dd
5b        10 b7953c60 80a6256d hal!KfLowerIrql+0x62
5c         4 b7953c64 808954f8 hal!KeReleaseQueuedSpinLock+0x2d
5d        44 b7953ca8 88ac3d38 nt!ExAllocatePoolWithTag+0x980
5e           00000000 00000000 0x88ac3d38

00           b795402c 80a64456 hal!HalpCheckForSoftwareInterrupt+0x81
01        10 b795403c 80831eaa hal!KfLowerIrql+0x62
02        1c b7954058 8082aebf nt!KiExitDispatcher+0x130
03        20 b7954078 8081e457 nt!KeInsertQueueApc+0x57
04        34 b79540ac b79b6c64 nt!IopfCompleteRequest+0x201
05        74 b7954120 8081e185 VIDEOPRT!pVideoPortDispatch+0xaf2
06        14 b7954134 bf827957 nt!IofCallDriver+0x45
07        30 b7954164 bf809b0f win32k!GreDeviceIoControl+0x93
08        24 b7954188 bd1198ef win32k!EngDeviceIoControl+0x1f
09        ec b7954274 808520c0 nv4_disp+0x1028ef
0a        98 b795430c b79543f8 nt!MiDispatchFault+0xb90
0b        5c b7954368 8088e680 0xb79543f8
0c         0 b7954368 e16e0010 nt!KiTrap0E+0xdc
0d           00000000 00000000 0xe16e0010

00           b79544e4 8081e185 VIDEOPRT!pVideoPortDispatch+0xaf2
01        14 b79544f8 bf827957 nt!IofCallDriver+0x45
02        30 b7954528 bf809b0f win32k!GreDeviceIoControl+0x93
03        24 b795454c bd085d0c win32k!EngDeviceIoControl+0x1f
04        20 b795456c bd3f0968 nv4_disp+0x6ed0c
05         4 b7954570 e1704010 nv4_disp+0x3d9968
06         4 b7954574 e1704010 0xe1704010
07         4 b7954578 e1705230 0xe1704010
08         4 b795457c 00000000 0xe1705230

00           b79549e8 bf80484e win32k!PDEVOBJ::PDEVOBJ+0x1bc
01        90 b7954a78 bf80465c win32k!hCreateHDEV+0x319
02       17c b7954bf4 bf8065dd win32k!DrvCreateMDEV+0x4f0
03        f4 b7954ce8 bf80645d win32k!DrvChangeDisplaySettings+0x2eb
04        3c b7954d24 bf8063c4 win32k!InitVideo+0x28
05        24 b7954d48 bf8062d8 win32k!UserInitialize+0x10d
06         8 b7954d50 8088b658 win32k!NtUserInitialize+0x8b
07         0 b7954d50 7c9383ac nt!KiSystemServicePostCall
08           0015fdb0 00000000 0x7c9383ac
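In a kF listing, the Memory column of a frame is the distance between its ChildEBP and the ChildEBP of the frame below it, since the stack grows toward lower addresses. That also makes a manual reconstruction easy to cross-check: once a chain of frame addresses has been found, each frame size falls out of a subtraction. A minimal sketch using frames 0x2f to 0x33 of the listing above (the helper name is hypothetical):

```python
# ChildEBP values and symbols of frames 0x2f-0x33 from the reconstructed stack.
FRAMES = [
    (0xB7952620, "nt!IofCallDriver+0x45"),
    (0xB795262C, "Ntfs!NtfsSingleAsync+0x91"),
    (0xB7952804, "Ntfs!NtfsNonCachedIo+0x2db"),
    (0xB79528F0, "Ntfs!NtfsCommonRead+0xaf5"),
    (0xB7952A9C, "Ntfs!NtfsFsdRead+0x113"),
]


def frame_sizes(frames):
    """Size of each frame = its ChildEBP minus the ChildEBP of the frame below it.

    Deeper (more recent) frames come first and have lower addresses. The first
    entry has no predecessor in this list, so no size is computed for it.
    """
    return [
        (symbol, child_ebp - prev_ebp)
        for (prev_ebp, _), (child_ebp, symbol) in zip(frames, frames[1:])
    ]


for symbol, size in frame_sizes(FRAMES):
    print(f"{size:>5x}  {symbol}")
```

The computed sizes (c, 1d8, ec, 1ac) match the Memory column of frames 0x30 to 0x33.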

As we can see, only one call in this chain accesses data on disk: nv4_mini calling nt!ZwCreateFile (frame 0x50). Everything triggered by that single call consumed about half of the stack:

kd> ? b7953784-b7952000
Evaluate expression: 6020 = 00001784

kd> ? b7955000-b7953784
Evaluate expression: 6268 = 0000187c
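The two ? expressions split the kernel stack at the ZwCreateFile frame. The stack base b7955000 used above is the stack limit plus 0x3000 (12 KB, the kernel stack size on x86). The same arithmetic, with the ratio made explicit:

```python
STACK_LIMIT = 0xB7952000        # "Stack Limit" from the STACK_OVERFLOW line
KERNEL_STACK_SIZE = 0x3000      # 12 KB kernel stack on x86
STACK_BASE = STACK_LIMIT + KERNEL_STACK_SIZE  # 0xB7955000, the value used above
ZWCREATEFILE_EBP = 0xB7953784   # ChildEBP of nt!ZwCreateFile (frame 0x50)

# The stack grows down: everything below the ZwCreateFile frame was consumed
# by the call chain it started, everything above it was in use before the call.
used_by_call = ZWCREATEFILE_EBP - STACK_LIMIT
used_before_call = STACK_BASE - ZWCREATEFILE_EBP

print(f"used by the ZwCreateFile call: {used_by_call} bytes "
      f"({used_by_call / KERNEL_STACK_SIZE:.0%} of the stack)")
print(f"used before the call:          {used_before_call} bytes")
```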

Let's find out the name of the file. We know that the third parameter to the ZwCreateFile function is a pointer to an OBJECT_ATTRIBUTES structure that specifies the object name.

kd> dps b7953784 L5
b7953784  00080286
b7953788  b8572ab9 nv4_mini!nvDumpConfig+0x1e1b29
b795378c  b79537d8
b7953790  80100000
b7953794  88ad4db4
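ZwCreateFile uses the stdcall (NTAPI) convention on x86: each argument is a 4-byte stack slot and the arguments are pushed right to left, so in the dps output the return address into nv4_mini (b8572ab9) is followed by FileHandle, DesiredAccess and ObjectAttributes, in that order. The sketch below decodes the dump that way, taking the return-address slot at b7953788 as observed in this particular dump; the helper names are hypothetical. It also shows that DesiredAccess 0x80100000 is GENERIC_READ | SYNCHRONIZE, a plain read-only open.

```python
# dps b7953784 L5 output from above: stack address -> DWORD stored there.
DPS = {
    0xB7953784: 0x00080286,
    0xB7953788: 0xB8572AB9,  # return address into nv4_mini
    0xB795378C: 0xB79537D8,
    0xB7953790: 0x80100000,
    0xB7953794: 0x88AD4DB4,
}
RET_ADDR_SLOT = 0xB7953788

# First three ZwCreateFile parameters, in declaration order.
ZWCREATEFILE_PARAMS = ("FileHandle", "DesiredAccess", "ObjectAttributes")

# Access-mask bits needed to decode this particular value (Windows SDK/WDK values).
ACCESS_BITS = {0x80000000: "GENERIC_READ", 0x00100000: "SYNCHRONIZE"}


def stdcall_args(dump, ret_addr_slot, names):
    """Read 32-bit stdcall arguments that follow a return address on the stack.

    Arguments are pushed right to left, so argument i sits at
    ret_addr_slot + 4 + 4*i.
    """
    return {name: dump[ret_addr_slot + 4 + 4 * i] for i, name in enumerate(names)}


def decode_access(mask):
    """Return the names of the known access bits set in mask."""
    return [name for bit, name in ACCESS_BITS.items() if mask & bit]


args = stdcall_args(DPS, RET_ADDR_SLOT, ZWCREATEFILE_PARAMS)
for name, value in args.items():
    print(f"{name:<17} {value:08x}")
print("DesiredAccess =", " | ".join(decode_access(args["DesiredAccess"])))
```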

kd> dt nt!_OBJECT_ATTRIBUTES 88ad4db4
   +0x000 Length           : 0x18
   +0x004 RootDirectory    : (null) 
   +0x008 ObjectName       : 0xb79539c8 _UNICODE_STRING "\SystemRoot\system32\nvdrssel.bin"
   +0x00c Attributes       : 0x240
   +0x010 SecurityDescriptor : (null) 
   +0x014 SecurityQualityOfService : (null)
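The dt output is consistent with the 32-bit structure: Length 0x18 is six 4-byte members, and Attributes 0x240 is OBJ_CASE_INSENSITIVE (0x40) | OBJ_KERNEL_HANDLE (0x200), a case-insensitive open returning a kernel handle, which is what a driver opening a file would normally request. A minimal ctypes sketch that models the x86 layout (HANDLE and pointer members stored as 32-bit integers, an assumption so it runs on any host) and checks the offsets against dt:

```python
import ctypes


class OBJECT_ATTRIBUTES32(ctypes.Structure):
    """x86 layout of OBJECT_ATTRIBUTES; HANDLE and pointer members are modelled
    as 32-bit integers so the sizes match the 32-bit target on any host."""
    _fields_ = [
        ("Length", ctypes.c_uint32),
        ("RootDirectory", ctypes.c_uint32),             # HANDLE
        ("ObjectName", ctypes.c_uint32),                # PUNICODE_STRING
        ("Attributes", ctypes.c_uint32),                # ULONG
        ("SecurityDescriptor", ctypes.c_uint32),        # PVOID
        ("SecurityQualityOfService", ctypes.c_uint32),  # PVOID
    ]


# Object attribute flags relevant to the 0x240 value (ntdef.h values).
OBJ_FLAGS = {0x00000040: "OBJ_CASE_INSENSITIVE", 0x00000200: "OBJ_KERNEL_HANDLE"}


def decode_attributes(value):
    """Return the names of the known OBJ_* flags set in value."""
    return [name for bit, name in OBJ_FLAGS.items() if value & bit]


# Values from the dt output above.
oa = OBJECT_ATTRIBUTES32(Length=0x18, ObjectName=0xB79539C8, Attributes=0x240)

print("sizeof:", hex(ctypes.sizeof(OBJECT_ATTRIBUTES32)))
for name, _ in OBJECT_ATTRIBUTES32._fields_:
    print(f"+0x{getattr(OBJECT_ATTRIBUTES32, name).offset:03x} {name}")
print("Attributes =", " | ".join(decode_attributes(oa.Attributes)))
```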

So, to reduce the stack consumption and stop the overflow, I recommended renaming the nvdrssel.bin file. Once that was done, the server came back to life.