Regular intermittent Kerberos failures

Thread starter: JimLad

Hi,

This is a last desperate call for help. About once a week, for between
2 and 10 minutes, users are unable to log in to our main web
application (ASP based). They get the following message:

'Failed to generate SSPI context'

The System Log on the web server shows the following
messages for the web site and SQL SPNs:

'The Security System detected an authentication error for the server
HTTP/<website name>. The failure code from authentication protocol
Kerberos was "The time at the Primary Domain Controller is different
than the time at the Backup Domain Controller or member server by too
large an amount.
(0xc0000133)".'

'The Security System detected an authentication error for the server
MSSQLSvc/S05010010.corp.dnsdom.net:1433. The failure code from
authentication protocol Kerberos was "The time at the Primary Domain
Controller is different than the time at the Backup Domain Controller
or member server by too large an amount.
(0xc0000133)".'

I have used net time to check the times on the Domain Controller, web
server and db server. Can't see any problems. Our system guys have
been through the 'Failed to generate SSPI context' knowledge base
articles.

I haven't seen anything referring to this as a regularly repeating
intermittent problem. We are getting worried because there is always the
chance it won't come back up!

I also notice that the Kerberos group policy "Maximum Tolerance for
Computer Clock Synchronization" is not defined. Does this need to be
defined or will it automatically use the default of 5 minutes?

Any help very gratefully received.

Cheers,

James
 
Re: Regular intermittent Kerberos failures

I am only replying because you have not had a response.
Is there any chance at all that the clocks are in fact out? Any chance that
you are looking at local times on the servers and that the Universal times
are in fact out?
If not:
- obviously there is a chance that something is intermittently breaking the
network, but that would be hard to track unless it tends to happen at the
same time.
- what account are you using for your application pool? The worker process
is recycled every 1740 minutes by default, and I am just wondering if the
process is being recycled, re-authenticating and taking a while to come
back.
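
The recycling idea can be sanity-checked with a little arithmetic. The sketch below assumes the pool recycles strictly every 1740 minutes and nothing else restarts it; the function name is made up for illustration. It asks how often such a recycle would land on the same slot of the week.

```python
from math import gcd

RECYCLE_MINUTES = 1740        # default recycle interval quoted above (29 hours)
WEEK_MINUTES = 7 * 24 * 60    # 10080 minutes in a week


def minutes_until_realigned(period, cycle):
    """Least common multiple: how long until a multiple of `period`
    is also a whole number of `cycle`s."""
    return period * cycle // gcd(period, cycle)


realign = minutes_until_realigned(RECYCLE_MINUTES, WEEK_MINUTES)
recycles_per_week = WEEK_MINUTES / RECYCLE_MINUTES

print(f"recycles per week: {recycles_per_week:.2f}")
print(f"weeks until a recycle hits the same weekly slot: {realign // WEEK_MINUTES}")
```

Under these assumptions a recycle happens nearly six times a week and only realigns with the same weekly slot every 29 weeks, so a default recycle on its own would not obviously produce a once-a-week pattern.
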
Anthony,
http://www.airdesk.co.uk
 
Re: Regular intermittent Kerberos failures

On Aug 24, 3:35 pm, "Anthony" <anthony.s...@spammedout.com> wrote:
> I am only replying because you have not had a response.
> Is there any chance at all that the clocks are in fact out? Any chance that
> you are looking at local times on the servers and that the Universal times
> are in fact out?

Hi Anthony,

Looks like it's definitely the time, but I'm not sure where and why it
should be such an intermittent problem. Basically it occurs on a
weekly cycle (moving forward by the amount of time it was broken the
week before) and occurs for between a few seconds and 10 minutes.

We turned on Kerberos tracing and in the 16 seconds that it didn't
work this week we got the following messages on the web server:

Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 3
Date: 30/08/2007
Time: 17:01:38
User: N/A
Computer: S05010072
Description:

A Kerberos Error Message was received:
on logon session
Client Time:
Server Time: 16:1:39.0000 8/30/2007 Z
Error Code: 0xb KDC_ERR_NEVER_VALID
Extended Error: 0xc0000133 KLIN(0)
Client Realm:
Client Name:
Server Realm: CORP.DNSDOM.NET
Server Name: MSSQLSvc/S05010010.corp.dnsdom.net:1433
Target Name: MSSQLSvc/S05010010.corp.dnsdom.net:1433@CORP.DNSDOM.NET
Error Text:
File: 9
Line: ae0
Error Data is in record data.

Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 3
Date: 30/08/2007
Time: 17:01:47
User: N/A
Computer: S05010072
Description:

A Kerberos Error Message was received:
on logon session
Client Time:
Server Time: 16:1:49.0000 8/30/2007 Z
Error Code: 0xb KDC_ERR_NEVER_VALID
Extended Error: 0xc0000133 KLIN(0)
Client Realm:
Client Name:
Server Realm: CORP.DNSDOM.NET
Server Name: HTTP/<websitehostheader>
Target Name: HTTP/<websitehostheader>@CORP.DNSDOM.NET
Error Text:
File: 9
Line: ae0
Error Data is in record data.
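
As a rough cross-check of the two events above, the "Server Time" field (UTC, marked Z) can be compared with the local Date/Time at which the web server logged each event. This is only a sketch: it assumes the web server's local clock was on UTC+1 at the time, which the thread does not confirm, and the helper names are invented for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the web server's local time was UTC+1 when the events were logged.
ASSUMED_LOCAL_OFFSET = timezone(timedelta(hours=1))

SERVER_TIME_RE = re.compile(r"(\d+):(\d+):(\d+)\.\d+ (\d+)/(\d+)/(\d+) Z")


def parse_server_time(field):
    """Parse a 'Server Time' field such as '16:1:39.0000 8/30/2007 Z' (UTC)."""
    match = SERVER_TIME_RE.fullmatch(field.strip())
    if match is None:
        raise ValueError(f"unrecognised Server Time field: {field!r}")
    hour, minute, second, month, day, year = map(int, match.groups())
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


def parse_logged_time(date_field, time_field):
    """Parse the event header's local Date/Time fields (dd/mm/yyyy, hh:mm:ss)."""
    local = datetime.strptime(f"{date_field} {time_field}", "%d/%m/%Y %H:%M:%S")
    return local.replace(tzinfo=ASSUMED_LOCAL_OFFSET)


# (Date, Time, Server Time) copied from the two events above.
events = [
    ("30/08/2007", "17:01:38", "16:1:39.0000 8/30/2007 Z"),
    ("30/08/2007", "17:01:47", "16:1:49.0000 8/30/2007 Z"),
]

# Seconds by which the reported server time is ahead of the logging time.
gaps = [
    (parse_server_time(server) - parse_logged_time(date, time)).total_seconds()
    for date, time, server in events
]
print(gaps)
```

If the UTC+1 assumption holds, the two clocks differ by a second or two in these events, nowhere near five minutes. The logged time is when the event was written rather than when the request was sent, so this is an approximate comparison only.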

0xB - KDC_ERR_NEVER_VALID: Requested start time is later than end time
Associated internal Windows error codes
·None
Corresponding debug output messages
·DebugLog("Client asked for endtime before starttime\n")
Possible Cause and Resolution
·There is a time difference between the KDC and the client.
Resolution
For Kerberos authentication to work, you must synchronize clocks on
the client and on the server. For more information about this error
and how to resolve it, see Time Synchronization (Clock Skew) earlier
in this white paper.

Any ideas why we would get this error message once a week for a window
of between a few seconds and 10 minutes?

Is there any way of knowing where the KDC is? I assume it's one of the
domain controllers, but as we have several is there a way of knowing
which is being used?

We have also been getting non-fatal Kerberos messages (0x25
KRB_AP_ERR_SKEW) about the time on file server S20. This isn't a DC
and isn't involved in the authentication so I'm not sure why we are
getting this message, even though that server is indeed 6 minutes
fast.
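
The S20 message is at least consistent with the 5-minute default tolerance mentioned earlier in the thread, since 0x25 KRB_AP_ERR_SKEW is the "clock skew too great" error. A minimal sketch of that comparison follows; only S20's "6 minutes fast" comes from the thread, the other offset is a made-up placeholder.

```python
from datetime import timedelta

DEFAULT_TOLERANCE = timedelta(minutes=5)   # default quoted earlier in the thread


def exceeds_tolerance(offset, tolerance=DEFAULT_TOLERANCE):
    """True when a clock offset (fast or slow) is larger than the allowed skew."""
    return abs(offset) > tolerance


# Offsets relative to the KDC. S20's value is from the thread;
# the S05010072 value is hypothetical, for illustration only.
offsets = {
    "S20": timedelta(minutes=6),
    "S05010072": timedelta(seconds=2),
}

flagged = [host for host, offset in offsets.items() if exceeds_tolerance(offset)]
print(flagged)
```

A host is flagged whether it is fast or slow, because only the size of the offset matters for the tolerance check.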

Outside this time window we get lots of the following messages:
0x34 KRB_ERR_RESPONSE_TOO_BIG (harmless apparently)
0xd KDC_ERR_BADOPTION (for web server)
0x7 KDC_ERR_S_PRINCIPAL_UNKNOWN (for one of the local domain
controllers, that should always be available)
0x25 KRB_AP_ERR_SKEW (for file server S20)
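
With this many different codes in the log, it may help to tally them per code before and during the failure window. The sketch below assumes the events have been exported as text containing an "Error Code:" line in the format shown in this thread; the sample list and function name are made up for illustration.

```python
import re
from collections import Counter

ERROR_CODE_RE = re.compile(r"Error Code:\s*(0x[0-9a-fA-F]+)\s+(\w+)")


def tally_error_codes(event_texts):
    """Count events per (numeric code, symbolic name) pair."""
    counts = Counter()
    for text in event_texts:
        match = ERROR_CODE_RE.search(text)
        if match:
            code = int(match.group(1), 16)
            counts[(code, match.group(2))] += 1
    return counts


# Hypothetical sample: lines in the same shape as the events quoted in the thread.
sample_events = [
    "Error Code: 0xb KDC_ERR_NEVER_VALID",
    "Error Code: 0xb KDC_ERR_NEVER_VALID",
    "Error Code: 0x25 KRB_AP_ERR_SKEW",
]

counts = tally_error_codes(sample_events)
for (code, name), n in counts.most_common():
    print(f"{code:#04x} {name}: {n}")
```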

Cheers,

James
 
Re: Regular intermittent Kerberos failures

James,
This is a difficult one to help you troubleshoot. 7 days is the maximum
renewal for tickets, so it suggests that an account, possibly a machine
account, is up for renewal and can't.
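
One way to picture this theory alongside James's observation that the outage moves forward each week by the length of the previous outage: if a new ticket can only be obtained once the outage ends, the next 7-day window starts that much later. The sketch below is only a model of that idea. The 7-day figure and the 16-second outage come from the thread; the first issue time and the second outage length are illustrative assumptions.

```python
from datetime import datetime, timedelta

MAX_RENEWAL = timedelta(days=7)   # maximum renewal period quoted above


def predict_outage_starts(first_issue, outage_lengths):
    """Return when each outage would start if every new ticket is issued
    only once the previous outage has ended."""
    starts = []
    issued = first_issue
    for outage in outage_lengths:
        start = issued + MAX_RENEWAL   # renewal window closes, new ticket needed
        starts.append(start)
        issued = start + outage        # new ticket obtained after the outage
    return starts


first_issue = datetime(2007, 8, 23, 17, 1, 38)                   # hypothetical
outages = [timedelta(seconds=16), timedelta(minutes=2)]          # second value hypothetical

starts = predict_outage_starts(first_issue, outages)
for start in starts:
    print(start)
```

In this model a 16-second outage pushes the following week's start 16 seconds later, which matches the drift James describes, though it does not show which account or ticket is involved.
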
If you have a time error on your file server too, it suggests that time is
not being maintained in the domain.
Things I would check are:
- the actual time of each machine when the error occurs
- the CMOS battery in the servers, for a failure to hold time
- whether someone has tried unsuccessfully to set an external time source: have a
look at the registry keys for Time to check this
Hope that helps, but you may be looking at MS Support to solve this,
Anthony;
http://www.airdesk.co.uk
 
Re: Regular intermittent Kerberos failures

On Aug 30, 8:06 pm, "Anthony" <anthony.s...@spammedout.com> wrote:
> James,
> This is a difficult one to help you troubleshoot. 7 days is the maximum
> renewal for tickets, so it suggests that an account, possibly a machine
> account, is up for renewal and can't.

Thanks Anthony. I've tried looking at tickets on the web server and db
server using klist in a scheduled job - I read somewhere that that
allowed you to look at tickets for the machine account rather than the
current user account. I can't see any tickets or TGTs with a renewal
datetime corresponding to the error time. I have asked one of our
systems guys to have a look at the tickets on the domain controllers
although I'm not hopeful.
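
The comparison described here could be automated once the ticket times have been copied out of the klist output by hand. The sketch below does not parse klist itself; the ticket list is entirely hypothetical and the function name is invented. Only the error time is taken from the Kerberos event earlier in the thread.

```python
from datetime import datetime, timedelta


def tickets_near(error_time, tickets, window=timedelta(minutes=10)):
    """Return (name, renew_until) pairs whose renew-until time
    falls within `window` of the error time."""
    return [
        (name, renew_until)
        for name, renew_until in tickets
        if abs(renew_until - error_time) <= window
    ]


# UTC server time from the first Kerberos event in the thread.
error_time = datetime(2007, 8, 30, 16, 1, 39)

# Hypothetical ticket list; real values would have to come from klist.
tickets = [
    ("krbtgt/CORP.DNSDOM.NET", datetime(2007, 9, 4, 9, 30, 0)),
    ("MSSQLSvc/S05010010.corp.dnsdom.net:1433", datetime(2007, 8, 30, 16, 5, 0)),
]

matches = tickets_near(error_time, tickets)
for name, renew_until in matches:
    print(name, renew_until)
```

All times need to be in the same zone (UTC here) before comparing, otherwise a one-hour offset could hide a real match.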

Thanks for your help. I'm not sure we'll be able to solve this. We are
going to build a new web server and transfer and if that doesn't cure
it we'll put in a call to Microsoft Support.

Cheers,

James
 