Intermittent Network Pauses

Intermittent Network Pauses

Guest
Hi There,

We have an HP c-Class chassis with 4 blade servers, all running Windows
Server 2003. Two of these servers are clustered in active/passive mode. This
cluster is connected to an HP EVA3000 SAN array, and the network interface of the
cluster connects to a Cisco 3750 stack.
For the past several months we have been having issues where the clients,
running Windows XP, lose their network drives and reconnect after a pause of
approx 10-15 seconds. In many cases a reboot of the workstation is
required.

We have performed a number of tests to isolate the root cause of the issue.
The network was fully checked and ruled out as the cause. Packet capture did
not reveal anything unusual, except traffic from the cluster stopping during
the pause.

We performed a number of tests on the cluster nodes, one of which was
a copy test.
We ran perfmon and started copying files between local drives and SAN drives.

Test 1: Copy a 1 GB file from local C: to D:
Test 2: Copy the same file from local C: to SAN
Test 3: Copy the same file from SAN to local C:

During all of the above tests we observed, in perfmon, the CPU utilisation
and the network interface utilisation dropping to zero
at exactly the same time. While the CPU utilisation recovered almost
immediately, the network utilisation stayed at 0% for the duration of the copy.
These tests caused exactly the same outages that our users experience.
While this was happening I could still ping the server at all times.

The servers are running Windows 2003 Server SP2.
RSS is disabled on the NICs.
TOE is disabled.
Teaming is disabled as well.

Has anyone seen or had this or a similar issue? Please help.

Thank you
 
Re: Intermittent Network Pauses


I have this exact same problem. My setup is:

Windows 2003 R2 SP2 x64 active/passive setup attached to a Xiotech
array.
The systems are Dell 2950 servers, quad processor, 8 GB of memory.

We have the exact same problem: every so often, network activity to
the servers pauses for 10-15 seconds. We had this problem pre-SP2 and
upgraded to SP2 to try to mitigate it. Things I have discovered:

1. It seems to correlate with periods when many connections are being timed
out. If you watch TCPView (the Sysinternals tool) you normally see a bunch of
TIME_WAIT connections; when the system pauses, the number of TIME_WAIT
connections drops drastically. But correlation isn't causation.
I think this is a side effect, not the problem.

2. I get the feeling that it has something to do with RPC getting
hung up doing reverse lookups. But I can't prove it.
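
For anyone who wants to track the TIME_WAIT count from point 1 over time instead of eyeballing it, here is a rough Python sketch. It assumes you capture the text output of `netstat -an` yourself and that the connection state is the last column of each TCP row; the sample text and the function name are made up for illustration.

```python
from collections import Counter

def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: Proto, Local Address, Foreign Address, State.
        # UDP rows and the header line are skipped.
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            states[fields[-1]] += 1
    return states

# Made-up sample in the Windows `netstat -an` layout.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:445           10.0.0.21:1045         ESTABLISHED
  TCP    10.0.0.5:445           10.0.0.22:1101         TIME_WAIT
  TCP    10.0.0.5:445           10.0.0.23:1102         TIME_WAIT
  UDP    0.0.0.0:137            *:*
"""

for state, n in count_tcp_states(SAMPLE).most_common():
    print(f"{state:<12} {n}")
```

Logging these counts every few seconds next to a timestamp would show whether the TIME_WAIT drop really lines up with the pauses.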

Have you disabled the TCP Chimney stuff?


--
Squidi
------------------------------------------------------------------------
Squidi's Profile: http://forums.techarena.in/member.php?userid=48647
View this thread: http://forums.techarena.in/showthread.php?t=962570

http://forums.techarena.in
 
Re: Intermittent Network Pauses

We've seen this exact problem as well since around February or March this
year. Our hardware is Dell PowerEdge 2650s (2-node active/passive cluster on
W2k3 SP2 32-bit Enterprise), with a Dell/EMC SAN. It seems to be somehow
connected to increased network activity or large file transfers, but there
are never any useful events in the logs or illuminating activity on any
performance counters.

Despite many, many hours spent on the phone, so far neither MS nor Dell has
been able to isolate the root cause. :(
 
Re: Intermittent Network Pauses


$hawn,

Have you looked into your storport driver versions? I figure you have,
but I have a very similar setup to yours at a different facility, and there I
overlooked the Microsoft KBs that update storport. We had a
nagging performance issue that was caused by older storport drivers.


The other guy,

Do you have VSS in use in any form? We don't use it for snapshots, but
we have a backup program that uses it to back up the SQL database on one
of the nodes. Do you ever see VSS messages in your event viewer?


--
Squidi
 
Re: Intermittent Network Pauses

Yes. Actually today we replaced an entire server with a new Dell; all brand
new hardware and the latest drivers for everything. The dropouts are still
happening.
Dell swears there's nothing wrong with the SAN.

I'm starting to think that it could be a Win 2003 OS update? I might try
removing all updates since January to see if it goes away...


> Have you looked into your storport driver versions? I figure you have,
> but I have a very similar setup to yours at a different facility and I
> overlooked the Microsoft KB's that upgrade the storport and we had a
> nagging performance issue that was caused by older storport drivers.
 
Re: Intermittent Network Pauses


$hawn;3729513 Wrote:
> Yes. Actually today we replaced an entire server with a new Dell; all
> brand new hardware and the latest drivers for everything. The dropouts
> are still happening.
> Dell swears there's nothing wrong with the SAN.
>
> I'm starting to think that it could be a Win 2003 OS update? I might
> try removing all updates since January to see if it goes away...


If it works, let me know. The problem started in October (ish) of 2007
for me, and I stopped updating the machines shortly thereafter because
I didn't want to throw in extra variables. Then I did all of the
updates (including SP2) because I was out of ideas.


--
Squidi
 
RE: Intermittent Network Pauses

We logged a call with MS and they asked us to upgrade a couple of drivers. We
will be doing the upgrade in the next day or so and will post the outcome.

Drivers to be upgraded:

1) Update the elxstor driver to the latest version.
ELXSTOR.SYS | Emulex | 5.1:20.7 | Aug 04 2006 | Storport Miniport Driver for LightPulse HBAs

2) Update hpcisss2.sys, or contact HP to get the latest ProLiant Support
Pack.
HPCISSS2.SYS | Hewlett-Packard Company | 6.8:0.32 | Jun 21 2007 | Smart Array SAS/SATA Controller Storport Driver
 
Re: Intermittent Network Pauses


Below is a little Perl script that keeps track of when it happens
(roughly). Part of the problem with figuring this out is that there is
no record of when or how often it happens. If you point it at a text
file on a share (smbstat \\server\share\file.txt), it opens it, shows
it to you, closes it and records the time and how long that took in the
log file. I then look through the log file with something like:

awk '{if ( $7 > 1 ) {print $0}}' smbstat4.log

which spits out everything that took longer than a second. No making
fun of my perlfu. This is one of the ways to do it!

I also posted on Microsoft's forums. The response was basically, "Call
PSS".

Your two clusters are on the supported list, so maybe you would have
better luck.


---perl-------------------------------------------------

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Usage: smbstat <UNC path to a small text file on the share>
die "Usage: smbstat <uncpathnametofile>\n" unless @ARGV;
my $filename = $ARGV[0];

while (1) {
    open my $log, '>>', 'smbstat4.log' or die $!;
    my $before = [gettimeofday];

    # Open the file on the share, print its contents and size, close it.
    open my $net, '<', $filename or die $!;
    print while <$net>;
    my $size = (stat($filename))[7];
    print $size;
    close $net;

    # Log the wall-clock time and how long the access took, in seconds.
    my $elapsed = tv_interval($before, [gettimeofday]);
    my $t = localtime;
    print {$log} $t, " Interval: $elapsed \n";
    close $log;
    sleep 5;
}
--------------------------------------
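
If awk isn't handy on the box where the log ends up, the same filter can be sketched in Python. This assumes the log lines look exactly like what the script above writes (a timestamp, then " Interval: ", then the elapsed seconds); the function name and the sample lines are made up for illustration.

```python
def slow_accesses(log_lines, threshold=1.0):
    """Yield (timestamp, seconds) for log entries slower than `threshold`."""
    for line in log_lines:
        stamp, sep, rest = line.partition(" Interval: ")
        if not sep:
            continue  # not a line written by the probe script
        try:
            seconds = float(rest.strip())
        except ValueError:
            continue  # truncated or garbled entry
        if seconds > threshold:
            yield stamp.strip(), seconds

# Made-up sample entries in the assumed log format.
SAMPLE_LOG = [
    "Thu Sep  4 10:15:02 2008 Interval: 0.013402 \n",
    "Thu Sep  4 10:15:07 2008 Interval: 12.481135 \n",
    "Thu Sep  4 10:15:25 2008 Interval: 0.011987 \n",
]

for stamp, seconds in slow_accesses(SAMPLE_LOG):
    print(f"{stamp}  took {seconds:.1f}s")
```

To run it on the real log, pass an open file handle for smbstat4.log instead of the sample list; the threshold is in seconds, same as the awk test.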


--
Squidi
 
Re: Intermittent Network Pauses

We rebuilt one of our fileservers, leaving off all Windows Updates since
Oct 2007. So far this seems to have done the trick!

Now if we can only figure out which update caused the problem...
 
Re: Intermittent Network Pauses


That's great! Are you at SP2 or not?

If you are in a good condition, could you generate a list of updates
(with something like http://www.nirsoft.net/utils/wul.html, or whatever)
that would be a "safe list"? Maybe I could work backwards.
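
Working backwards from a safe list could look roughly like the Python sketch below: pull the KB numbers out of a text dump from the good server and one from the bad server, and list what only the bad one has. It assumes both dumps are plain text with IDs in the usual KBnnnnnn form; the function names and the KB000001 entry are made up for illustration.

```python
import re

# KB followed by 6 or 7 digits; suffixes such as "-v2" are ignored.
KB_RE = re.compile(r"KB\d{6,7}", re.IGNORECASE)

def kb_ids(text):
    """Return the set of KB identifiers found anywhere in `text`."""
    return {m.group(0).upper() for m in KB_RE.finditer(text)}

def suspect_updates(good_list_text, bad_list_text):
    """KBs installed on the misbehaving server but absent from the safe list."""
    return sorted(kb_ids(bad_list_text) - kb_ids(good_list_text))

# Made-up example: the bad server has one extra (hypothetical) update.
good = "KB921503 Security Update\nKB924667-v2 Security Update\n"
bad = good + "KB000001 Hypothetical later update\n"

print(suspect_updates(good, bad))
```

The result is only a list of candidates; each one would still have to be uninstalled and tested to confirm anything.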


--
Squidi
 
Re: Intermittent Network Pauses

Yes, we are at SP2. I used the utility you mentioned to generate this list of
installed patches (the "safe list"), pasted below (sorry it's so ugly);
it's every applicable Windows Update for Server 2k3 through Sept 2007. If you
are able to isolate which update released after this list is causing
the problem, I think a lot of us would be very grateful. :)

NAME / DESCRIPTION / INSTALL DATE / DISPLAY VERSION / UPDATE TYPE / APPLICATION / WEB LINK / UNINSTALL COMMAND / LAST MODIFIED TIME

KB914961 Windows Server 2003 Service Pack 2 5/13/2008 Service Pack Windows Server 2003 http://support.microsoft.com/?kbid=914961 C:\WINDOWS\$NtServicePackUninstall$\spuninst\spuninst.exe 5/13/2008 1:42:55 PM
KB921503 Security Update for Windows Server 2003 (KB921503) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=921503 C:\WINDOWS\$NtUninstallKB921503$\spuninst\spuninst.exe 5/13/2008 2:14:02 PM
KB924667-v2 Security Update for Windows Server 2003 (KB924667-v2) 5/13/2008 2 Update Windows Server 2003 http://support.microsoft.com/?kbid=924667-v2 C:\WINDOWS\$NtUninstallKB924667-v2$\spuninst\spuninst.exe 5/13/2008 2:16:57 PM
KB925398_WMP64 Security Update for Windows Media Player 6.4 (KB925398) N/A Windows Media Player 6.4 http://support.microsoft.com/?kbid=925398_WMP64 5/13/2008 2:14:27 PM
KB925398_WMP64 5/13/2008 Update Windows Media Player 6.4 http://support.microsoft.com/?kbid=925398_WMP64 C:\WINDOWS\$NtUninstallKB925398_WMP64$\spuninst\spuninst.exe 5/13/2008 2:14:27 PM
KB925902 Security Update for Windows Server 2003 (KB925902) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=925902 C:\WINDOWS\$NtUninstallKB925902$\spuninst\spuninst.exe 5/13/2008 2:17:06 PM
KB926122 Security Update for Windows Server 2003 (KB926122) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=926122 C:\WINDOWS\$NtUninstallKB926122$\spuninst\spuninst.exe 5/13/2008 2:13:55 PM
KB927891 Update for Windows Server 2003 (KB927891) 5/13/2008 5 Update Windows Server 2003 http://support.microsoft.com/?kbid=927891 C:\WINDOWS\$NtUninstallKB927891$\spuninst\spuninst.exe 5/13/2008 2:14:09 PM
KB929123 Security Update for Windows Server 2003 (KB929123) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=929123 C:\WINDOWS\$NtUninstallKB929123$\spuninst\spuninst.exe 5/13/2008 2:17:38 PM
KB930178 Security Update for Windows Server 2003 (KB930178) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=930178 C:\WINDOWS\$NtUninstallKB930178$\spuninst\spuninst.exe 5/13/2008 2:17:31 PM
KB931784 Security Update for Windows Server 2003 (KB931784) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=931784 C:\WINDOWS\$NtUninstallKB931784$\spuninst\spuninst.exe 5/13/2008 2:14:41 PM
KB932168 Security Update for Windows Server 2003 (KB932168) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=932168 C:\WINDOWS\$NtUninstallKB932168$\spuninst\spuninst.exe 5/13/2008 2:13:42 PM
KB933360 Update for Windows Server 2003 (KB933360) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=933360 C:\WINDOWS\$NtUninstallKB933360$\spuninst\spuninst.exe 5/13/2008 2:14:15 PM
KB935839 Security Update for Windows Server 2003 (KB935839) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=935839 C:\WINDOWS\$NtUninstallKB935839$\spuninst\spuninst.exe 5/13/2008 2:13:34 PM
KB935840 Security Update for Windows Server 2003 (KB935840) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=935840 C:\WINDOWS\$NtUninstallKB935840$\spuninst\spuninst.exe 5/13/2008 2:17:44 PM
KB936021 Security Update for Windows Server 2003 (KB936021) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=936021 C:\WINDOWS\$NtUninstallKB936021$\spuninst\spuninst.exe 5/13/2008 2:17:23 PM
KB936782 Security Update for Windows Server 2003 (KB936782) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=936782 C:\WINDOWS\$NtUninstallKB936782$\spuninst\spuninst.exe 5/13/2008 2:13:49 PM
KB937143 Security Update for Windows Server 2003 (KB937143) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=937143 C:\WINDOWS\$NtUninstallKB937143$\spuninst\spuninst.exe 5/13/2008 2:17:16 PM
KB938127 Security Update for Windows Server 2003 (KB938127) 5/13/2008 1 Update Windows Server 2003 http://support.microsoft.com/?kbid=938127 C:\WINDOWS\$NtUninstallKB938127$\spuninst\spuninst.exe 5/13/2008 2:14:33 PM
MMC30Core 5/13/2008 Update Windows Server 2003 http://support.microsoft.com/kb/MC30Core C:\WINDOWS\$NtUninstallMMC30Core$\spuninst\spuninst.exe 5/13/2008 12:57:33 PM
R2-In-band 5/13/2008 Update Windows Server 2003 http://support.microsoft.com/?kbid=R2-In-band C:\WINDOWS\$NtUninstallR2-In-band$\spuninst\spuninst.exe 5/13/2008 12:57:15 PM
R2-New-files 5/13/2008 Update Windows Server 2003 http://support.microsoft.com/?kbid=R2-New-files C:\WINDOWS\$NtUninstallR2-New-files$\spuninst\spuninst.exe 5/13/2008 12:57:44 PM
SP1 Microsoft .NET Framework 1.1 Service Pack 1 N/A SP .NETFramework http://support.microsoft.com/?kbid=SP1 5/13/2008 1:42:10 PM
 
Re: Intermittent Network Pauses


Hello,

After many weeks of fiddling with everything and anything the storage
group turned on Write Cache for our LUNs on the Xiotech array and the
problem disappeared.

It appears that we are having very poor performance from our setup when
writing to and reading from a single LUN. I have not figured out where the
bottleneck is; however, we are seeing 8-11 MB/s speeds from the volumes,
which is terrible. I've got USB disks that work better than that.
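
For a quick sanity check of write speed on a volume, something like the Python sketch below gives a rough MB/s figure. It is only an approximation under stated assumptions: it writes a temporary file in the directory you give it, forces it to disk with fsync, and times that; the function name and the default sizes are made up for illustration, and a proper disk benchmark will be more accurate.

```python
import os
import tempfile
import time

def write_throughput_mb_s(directory, size_mb=64, chunk_mb=1):
    """Write about `size_mb` MB to a temp file in `directory`; return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb // chunk_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data past the OS cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temp file
    return size_mb / elapsed

# Demo against the local temp directory; point it at a SAN-backed
# drive letter or mount point to test that volume instead.
rate = write_throughput_mb_s(tempfile.gettempdir(), size_mb=8)
print(f"{rate:.1f} MB/s")
```

Running it once against a local disk and once against the LUN gives a comparison like the one described above.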

For anyone else with similar problems:

Perfmon->Physical Disk->% Idle Time shouldn't be 0 for long stretches.
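
If you log that counter to a file, a small Python sketch like this can flag the long 0% stretches for you. It assumes you have already pulled the % Idle Time samples into a list of numbers (one per sample interval); the function name, the `min_len` default and the sample values are made up for illustration.

```python
def zero_idle_stretches(samples, min_len=3):
    """Return (start_index, length) for each run of 0% idle of at least `min_len` samples."""
    runs = []
    start = None
    for i, idle in enumerate(samples):
        if idle <= 0.0:
            if start is None:
                start = i  # a new run of zero-idle samples begins
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # Handle a run that lasts until the end of the data.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples) - start))
    return runs

# Made-up % Idle Time samples.
idle_samples = [35.0, 0.0, 0.0, 0.0, 0.0, 12.5, 0.0, 40.0, 0.0, 0.0, 0.0]
print(zero_idle_stretches(idle_samples))
```

Multiply the run length by your sample interval to get the duration of each stretch.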


--
Squidi
 