Friday, March 30, 2012

PAE problem question (KB834628)

This question is related to
http://support.microsoft.com/defaul...b;en-us;834628.
We had a database crash about a month ago that was non-recoverable. This was
a subscriber database run in a continuous transactional environment. Server
is W2K3, /PAE and /3GB, 8GB ram, SQL Server gets 6655mb.
I ran a dbcc checkdb and came back with a large number of errors starting
with Errors 1 (see "Errors 1" below).
There are quite few more errors. The checkdb ends with
--CHECKDB found 0 allocation errors and 150 consistency errors in
database 'DB_rpt1'.
repair_allow_data_loss is the minimum repair level for the errors found by
DBCC CHECKDB (DB_rpt1 ).
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.--
---
At the time we saw absolutely no errors at ANY level (SAN, OS, etc) except
when the SQL Server errorlog start screaming about --
2004-02-18 16:48:32.10 spid52 Error: 823, Severity: 24, State: 2
2004-02-18 16:48:32.10 spid52 I/O error (bad page ID) detected during
read at offset 0x00000004a66000 in file
'w:\DBdata\DB_rpt1_data2.NDF'..--
We had an HBA hardware failure 6 days later, we semi-attributed the
corruption to the bad HBA.
Could this be the 834628 problem? If so how can I tell - the KB is fairly
non-specific about how to spot the error.
Also would the fn_dblog function be of any use and what would I look for -
the article seems to say that one line would be all zero's but this isn't
that clear.
---
Errors 1 --(this is the very beginning of the checkdb
dump)--
Server: Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID 0, page ID (4:9523). The PageId in the
page header = (0:0).
Server: Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID 0, page ID (4:9524). The PageId in the
page header = (0:0).
Server: Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID 0, page ID (4:9525). The PageId in the
page header = (0:0).
Server: Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID 0, page ID (4:9526). The PageId in the
page header = (0:0).
Server: Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID 0, page ID (4:9527). The PageId in the
page header = (0:0).
Server: Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID 0, page ID (4:9528). The PageId in the
page header = (0:0).
Server: Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID 0, page ID (4:9529). The PageId in the
page header = (0:0).
Server: Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID 0, page ID (4:9530). The PageId in the
page header = (0:0).
DBCC results for 'DB_rpt1'.
CHECKDB found 0 allocation errors and 8 consistency errors not associated
with any single object.That is very unlikely a PAE problem. Errors like that are almost always
attributable to bad blocks on disk. With an HBA on the blink, it is likely
that some bad data got written to the disk array, possibly where data was
deleted since the pages are not owned by a specific object.
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
"frankm" <frank@.nxspxm.mallardcentral.com> wrote in message
news:uHA$AJOEEHA.3472@.TK2MSFTNGP09.phx.gbl...
> This question is related to
> http://support.microsoft.com/defaul...b;en-us;834628.
> We had a database crash about a month ago that was non-recoverable. This
was
> a subscriber database run in a continuous transactional environment.
Server
> is W2K3, /PAE and /3GB, 8GB ram, SQL Server gets 6655mb.
> I ran a dbcc checkdb and came back with a large number of errors starting
> with Errors 1 (see "Errors 1" below).
> There are quite few more errors. The checkdb ends with
> --CHECKDB found 0 allocation errors and 150 consistency errors in
> database 'DB_rpt1'.
> repair_allow_data_loss is the minimum repair level for the errors found by
> DBCC CHECKDB (DB_rpt1 ).
> DBCC execution completed. If DBCC printed error messages, contact your
> system administrator.--
> ---
> At the time we saw absolutely no errors at ANY level (SAN, OS, etc) except
> when the SQL Server errorlog start screaming about --
> 2004-02-18 16:48:32.10 spid52 Error: 823, Severity: 24, State: 2
> 2004-02-18 16:48:32.10 spid52 I/O error (bad page ID) detected during
> read at offset 0x00000004a66000 in file
> 'w:\DBdata\DB_rpt1_data2.NDF'..--
> We had an HBA hardware failure 6 days later, we semi-attributed the
> corruption to the bad HBA.
> Could this be the 834628 problem? If so how can I tell - the KB is fairly
> non-specific about how to spot the error.
> Also would the fn_dblog function be of any use and what would I look for -
> the article seems to say that one line would be all zero's but this isn't
> that clear.
> ---
> Errors 1 --(this is the very beginning of the checkdb
> dump)--
> Server: Msg 8909, Level 16, State 1, Line 1
> Table error: Object ID 0, index ID 0, page ID (4:9523). The PageId in the
> page header = (0:0).
> Server: Msg 8909, Level 16, State 1, Line 1
> Table error: Object ID 0, index ID 0, page ID (4:9524). The PageId in the
> page header = (0:0).
> Server: Msg 8909, Level 16, State 1, Line 1
> Table error: Object ID 0, index ID 0, page ID (4:9525). The PageId in the
> page header = (0:0).
> Server: Msg 8909, Level 16, State 1, Line 1
> Table error: Object ID 0, index ID 0, page ID (4:9526). The PageId in the
> page header = (0:0).
> Server: Msg 8909, Level 16, State 1, Line 1
> Table error: Object ID 0, index ID 0, page ID (4:9527). The PageId in the
> page header = (0:0).
> Server: Msg 8909, Level 16, State 1, Line 1
> Table error: Object ID 0, index ID 0, page ID (4:9528). The PageId in the
> page header = (0:0).
> Server: Msg 8909, Level 16, State 1, Line 1
> Table error: Object ID 0, index ID 0, page ID (4:9529). The PageId in the
> page header = (0:0).
> Server: Msg 8909, Level 16, State 1, Line 1
> Table error: Object ID 0, index ID 0, page ID (4:9530). The PageId in the
> page header = (0:0).
> DBCC results for 'DB_rpt1'.
> CHECKDB found 0 allocation errors and 8 consistency errors not associated
> with any single object.
>|||My original theory still looks good - that the hardware was on the way out
and "glitched" writing bad data to disk. The article wasn't very clear on
cause and effect - so I had to ask.
Thank You !
"Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
news:u3mbNcOEEHA.1368@.TK2MSFTNGP11.phx.gbl...
> That is very unlikely a PAE problem. Errors like that are almost always
> attributable to bad blocks on disk. With an HBA on the blink, it is
likely
> that some bad data got written to the disk array, possibly where data was
> deleted since the pages are not owned by a specific object.
> --
> Geoff N. Hiten
> Microsoft SQL Server MVP
> Senior Database Administrator
> Careerbuilder.com
> I support the Professional Association for SQL Server
> www.sqlpass.org
> "frankm" <frank@.nxspxm.mallardcentral.com> wrote in message
> news:uHA$AJOEEHA.3472@.TK2MSFTNGP09.phx.gbl...
> was
> Server
starting
by
except
fairly
for -
isn't
the
the
the
the
the
the
the
the
associated
>|||Hi Frankm,
I agreed with Geoff N that it seems this problem is not mostly related the
problem in KB834628. Based on my experience, 823 errors are mostly hardware
related. Specially this error 823 on bad page ID is raised when the page ID
is invalid. When the error occurred, you can try to perform the following
steps to troubleshoot.
1. Check for error in the system and application from NT even viewer
2. Run DBCC checkdb on databse
a. IF it's sql 7.0, use "-y823" to start sql server and capture the dump
when the error is raised again
b. IF it's sql 2k, user DBCC Dumptrigger('SET', 823) to enable the dump
(this will NOT require the sql to be recycled!)
When the error 823 happens, it will catch the dump on it in sql errorlog,
3. Please also contact hardware vendor to run hardware diagnostic.
If the problem is very complex and you are unable to narrow down, you can
contact our PSS with the above information to continue working with a
dedicated Support Professional.
Regards,
Michael Shao
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.|||Thanks...
I have not used the DumpTrigger before - sounds really useful...
As I mentioned - there were absolutely NO errors at any other level - SAN,
OS, hardware -nothing.
The SAN people were in and could find no problem, keep in mind that the lun
was spread across 54 disks.
So the likelihood of it being a SAN problem was remote at best.
I had to ask the question -- Just in case it may be related...
Thanks for your help...
""Yuan Shao"" <v-yshao@.online.microsoft.com> wrote in message
news:PgR3kzUEEHA.616@.cpmsftngxa06.phx.gbl...
> Hi Frankm,
> I agreed with Geoff N that it seems this problem is not mostly related the
> problem in KB834628. Based on my experience, 823 errors are mostly
hardware
> related. Specially this error 823 on bad page ID is raised when the page
ID
> is invalid. When the error occurred, you can try to perform the following
> steps to troubleshoot.
> 1. Check for error in the system and application from NT even viewer
> 2. Run DBCC checkdb on databse
> a. IF it's sql 7.0, use "-y823" to start sql server and capture the dump
> when the error is raised again
> b. IF it's sql 2k, user DBCC Dumptrigger('SET', 823) to enable the dump
> (this will NOT require the sql to be recycled!)
> When the error 823 happens, it will catch the dump on it in sql errorlog,
> 3. Please also contact hardware vendor to run hardware diagnostic.
> If the problem is very complex and you are unable to narrow down, you can
> contact our PSS with the above information to continue working with a
> dedicated Support Professional.
> Regards,
> Michael Shao
> Microsoft Online Partner Support
> Get Secure! - www.microsoft.com/security
> This posting is provided "as is" with no warranties and confers no rights.
>

No comments:

Post a Comment