There are conditions that can occur on a windows backup that can lead you to believe the backup is in progress, but it is hung indefinitely.
Example:
Job details:
10/22/2013 20:18:54 - Info nbjm (pid=19111) starting backup job (jobid=3688964) for client ironbark-bk, policy ITSS-ironbark, schedule Daily_Inc
10/22/2013 20:18:54 - estimated 2241187 kbytes needed
10/22/2013 20:18:54 - Info nbjm (pid=19111) started backup (backupid=ironbark-bk_1382487534) job for client ironbark-bk, policy ITSS-ironbark, schedule Daily_Inc on storage unit fimbo-disk-pool-01
10/22/2013 20:18:57 - started process bpbrm (pid=15885)
10/22/2013 20:18:59 - Info bpbrm (pid=15885) ironbark-bk is the host to backup data from
10/22/2013 20:18:59 - Info bpbrm (pid=15885) reading file list from client
10/22/2013 20:18:59 - connecting
10/22/2013 20:19:00 - Info bpbrm (pid=15885) starting bpbkar on client
10/22/2013 20:19:00 - connected; connect time: 0:00:00
10/22/2013 20:19:06 - Info bpbkar (pid=3420) Backup started
10/22/2013 20:19:06 - Info bpbrm (pid=15885) bptm pid: 15886
10/22/2013 20:19:08 - Info bptm (pid=15886) start
10/22/2013 20:19:08 - Info bptm (pid=15886) using 1048576 data buffer size
10/22/2013 20:19:08 - Info bptm (pid=15886) setting receive network buffer to 1048576 bytes
10/22/2013 20:19:08 - Info bptm (pid=15886) using 48 data buffers
10/22/2013 20:19:14 - Info bptm (pid=15886) start backup
10/22/2013 20:19:15 - Info bptm (pid=15886) backup child process is pid 15916
10/22/2013 20:19:15 - begin writing
bpbrm
20:18:56.981 [15885] <2> bpbrm main: INITIATING (VERBOSE = 0): version NetBackup 7.5 2013061020
20:18:56.982 [15885] <2> logparams: -backup -S walleye -c ironbark-bk -ct 13 -ru root -cl ITSS-ironbark -sched Daily_Inc -bt 1382487534 -dt 86396 -st 1 -b ironbark-bk_1382487534 -mediasvr fimbo -jobid 3688964 -jobgrpid 3688824 -masterversion 750000 -maxfrag 524287 -bpstart_time 1382487835 -reqid -1379845250 -mt 0 -to 0 -stunit fimbo-disk-pool-01 -rl 3 -rp 2678400 -eari 0 -p /ux1848 -mst 6 -flags 0 -storagesvr fimbo -sts_type AdvancedDisk -use_ofb -use_otm -WOFB_enabled 1 -WOFB_fim 1 -WOFB_error 1 -nbu -jm -secure 1 -kl 8 -rg root -fso -tir -tir_plus -connect_options 16974338
20:18:57.001 [15885] <2> vnet_pbxConnect: pbxConnectEx Succeeded
20:19:06.087 [15885] <2> write_file_names: successfully wrote buffer to COMM_SOCK
20:19:06.088 [15885] <2> bpbrm main: wrote CONTINUE on COMM_SOCK
20:19:06.089 [15885] <2> bpbrm main: from client ironbark-bk: INF - BACKUP 10/22/2013 8:19:05 PM ironbark-bk ITSS-ironbark Daily_Inc INCR 10/21/2013 7:19:09 PM
20:19:15.405 [15885] <2> bpbrm mm_sig: received ready signal from media manager
Nothing else
bptm
20:19:08.240 [15886] <2> bptm: INITIATING (VERBOSE = 0): -w -c ironbark-bk -dpath /ux1848 -stunit fimbo-disk-pool-01 -cl ITSS-ironbark -bt 1382487534 -b ironbark-bk_1382487534 -st 1 -cj 1 -reqid -1379845250 -jm -brm -hostname ironbark-bk -ru root -rclnt ironbark-bk -rclnthostname ironbark-bk -rl 3 -rp 2678400 -sl Daily_Inc -ct 13 -maxfrag 524287 -tir -eari 0 -mediasvr fimbo -nonrsvdports -no_callback -connect_options 0x01010101 -jobid 3688964 -jobgrpid 3688824 -masterversion 750000 -bpbrm_shm_id 419430423 -blks_per_buffer 2048
20:19:08.257 [15886] <2> vnet_pbxConnect: pbxConnectEx Succeeded
20:19:08.258 [15886] <2> job_connect: SO_KEEPALIVE set on socket 1 for client walleye
.....
20:19:15.339 [15886] <2> write_backup: backup child process is pid 15916
20:19:15.341 [15916] <2> close_disk_descriptors: descriptor close: files = FALSE sockets = FALSE
The above line is the only child PID for 15916 I see.
20:19:15.394 [15886] <2> db_end: Need to collect reply
20:19:15.405 [15886] <4> write_backup: begin writing backup id ironbark-bk_1382487534, copy 1, fragment 1, destination path /ux1848
20:19:15.405 [15886] <2> signal_parent: sending SIGUSR1 to bpbrm (pid = 15885)
20:19:15.405 [15886] <2> write_data: twin_index: 0 active: 1 dont_process: 0 wrote_backup_hdr: 0 finished_buff: 0 saved_cindex: -1 twin_is_disk 1 delay_brm: 0
20:19:15.405 [15886] <2> write_data: Total Kbytes transferred 0
bpbkar
8:19:20.067 PM: [3420.3168] <4> dos_backup::tfs_scannext: INF - no more path list entries
8:19:20.239 PM: [3420.3168] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - Drive 'D:' added to VSS Volume Set
8:19:20.239 PM: [3420.3168] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - Drive 'd:' added to VSS Volume Set
8:19:59.319 PM: [3420.3168] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - VSS Volume Set snapped
8:19:59.319 PM: [3420.3168] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - About to commence post-snap unbounded association checks
8:19:59.319 PM: [3420.3168] <4> backup_create: INF - NetBackup Temp Directory: 'D:\Program Files\Veritas\\NetBackup\Temp'
8:19:59.319 PM: [3420.3168] <4> tar_backup_tfi::UpdateExcludeListWithVHD: INF - UpdateExludeListWithVHD begin
Nothing else
the windows event log shows the cause:
Here is what we see for this hung backup:
Wed Oct 23 2013 01:24:53 Virtual Disk Service E1 Unexpected failure. Error code: 2@0200001D
Wed Oct 23 2013 01:04:54 Virtual Disk Service E1 Unexpected failure. Error code: 2@0200001D
Wed Oct 23 2013 00:44:54 Virtual Disk Service E1 Unexpected failure. Error code: 2@0200001D
Wed Oct 23 2013 00:24:54 Virtual Disk Service E1 Unexpected failure. Error code: 2@0200001D
Wed Oct 23 2013 00:04:54 Virtual Disk Service E1 Unexpected failure. Error code: 2@0200001D
Tue Oct 22 2013 20:19:59 Service Control Manager I7036 The Virtual Disk Service service entered the running state.
Tue Oct 22 2013 20:19:59 Service Control Manager I7035 The Virtual Disk Service service was successfully sent a start control.
Tue Oct 22 2013 20:19:59 Virtual Disk Service I3 Service started
Can something in NBU be changed so it can know of a fault like this and not have the backup hang indefinitely?