Monday, 9 July 2012

Troubleshooting AD Replication Issues, Part 1


AD replication is not working between NYC and London

1.     Deleted all the manually configured replication partners for and under NYC-DC-01
2.     Ran the command repadmin /kcc on NYC-DC-01
3.     Opened dssite.msc on NYC-DC-01 and confirmed that the replication partner was populated automatically as DCLON1
4.     Tried Replicate Now, and NYC-DC-01 was able to pull replication from DCLON1
5.     Ran the command repadmin /kcc on DCLON1
6.     Opened dssite.msc on DCLON1 and confirmed that the replication partner was populated automatically as NYC-DC-01
7.     Tried Replicate Now, but it failed with the following error message:
---------------------------
Replicate Now
---------------------------
The following error occurred during the attempt to synchronize naming context pcdir.int.gnl from domain controller NYC-DC-01 to domain controller DCLON1:
The naming context is in the process of being removed or is not replicated from the specified server.
This operation will not continue.
---------------------------
OK  
---------------------------
8.     The above error message can point to DNS issues, so name resolution was checked first
9.     On NYC-DC-01, confirmed it was pointing to the Infoblox servers for name resolution
10.    A DNS server was also installed on NYC-DC-01, running a secondary zone for PCDIR.INT.GNL transferred from the Infoblox servers
11.    Checked and confirmed that all the records for domain controller NYC-DC-01 were registered correctly
12.    Pinged the GUID-based DNS name of NYC-DC-01 from DCLON1 and vice versa; both worked fine
13.    Tried to access \\nyc-dc-01 from DCLON1 and vice versa; this also worked fine
14.    Checked Event Viewer on DCLON1
15.    Found the following event logged for every partition each time repadmin /kcc was run on DCLON1:

Event Type:            Warning
Event Source:        NTDS KCC
Event Category:    Knowledge Consistency Checker
Event ID:                1925
Date:                       31/05/2012
Time:                      07:48:01
User:                       NT AUTHORITY\ANONYMOUS LOGON
Computer:             DCLON1
Description:
The attempt to establish a replication link for the following writable directory partition failed.

Directory partition:
CN=Configuration,DC=pcdir,DC=int,DC=gnl
Source domain controller:
CN=NTDS Settings,CN=NYC-DC-01,CN=Servers,CN=NYCUS,CN=Sites,CN=Configuration,DC=pcdir,DC=int,DC=gnl
Source domain controller address:
d634407b-dd98-43e2-a6ec-b14d09ddd5b1._msdcs.pcdir.int.gnl
Intersite transport (if any):
CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=pcdir,DC=int,DC=gnl

This domain controller will be unable to replicate with the source domain controller until this problem is corrected. 

User Action
Verify if the source domain controller is accessible or network connectivity is available.

Additional Data
Error value:
1753 There are no more endpoints available from the endpoint mapper.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

16.    Checked the following registry keys on DCLON1:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

Registry value: TCP/IP Port
Value type: REG_DWORD
Value data: 65000

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters

Registry value: DCTcpipPort
Value type: REG_DWORD
Value data: 65000


17.    Checked the same settings on NYC-DC-01; they were not defined
18.    Checked for the same registry values on all the domain controllers in the London site; they were present on all of them
19.    Created the same values on NYC-DC-01
20.    Restarted the domain controller so the registry changes would take effect
21.    Checked replication from NYC-DC-01; it worked fine with DCLON1
22.    Ran the command repadmin /kcc on DCLON1 again; this time a different event was logged:

Event Type:            Error
Event Source:        NTDS Replication
Event Category:    Replication
Event ID:                2042
Date:                       31/05/2012
Time:                      08:05:09
User:                       NT AUTHORITY\ANONYMOUS LOGON
Computer:             DCLON1
Description:
It has been too long since this machine last replicated with the named source machine. The time between replications with this source has exceeded the tombstone lifetime. Replication has been stopped with this source.
The reason that replication is not allowed to continue is that the two machine's views of deleted objects may now be different. The source machine may still have copies of objects that have been deleted (and garbage collected) on this machine. If they were allowed to replicate, the source machine might return objects which have already been deleted.
Time of last successful replication:
2012-03-28 13:25:21
Invocation ID of source:
057af820-f810-057a-0100-000000000000
Name of source:
d634407b-dd98-43e2-a6ec-b14d09ddd5b1._msdcs.pcdir.int.gnl
Tombstone lifetime (days):
60

The replication operation has failed.

User Action:

Determine which of the two machines was disconnected from the forest and is now out of date. You have three options:

1. Demote or reinstall the machine(s) that were disconnected.
2. Use the "repadmin /removelingeringobjects" tool to remove inconsistent deleted objects and then resume replication.
3. Resume replication. Inconsistent deleted objects may be introduced. You can continue replication by using the following registry key. Once the systems replicate once, it is recommended that you remove the key to reinstate the protection.
 Registry Key:
HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Allow Replication With Divergent and Corrupt Partner


For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

23.    Created the following registry values:
HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Allow Replication With Divergent and Corrupt Partner
HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Strict Replication Consistency
Both values were set to 1

24.    Ran Replicate Now on DCLON1 for NYC-DC-01; this time it worked fine
25.    Ran the command repadmin /syncall /ePAD from both domain controllers and confirmed that all the partitions were replicating
26.    Restarted the NTFRS service and confirmed that SYSVOL and Netlogon were also replicating
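
The port mismatch found in steps 16 to 19 comes down to comparing two registry values across domain controllers. Here is a minimal Python sketch of that comparison. The helper name find_port_mismatches and the observed dictionary are hypothetical: the values are typed in by hand from the steps above, not read from any registry, and a missing entry stands for "value not defined".

```python
# Registry value names inspected in step 16.
VALUE_NAMES = ("TCP/IP Port", "DCTcpipPort")


def find_port_mismatches(observed, reference):
    """Compare every DC against a reference DC.

    Returns {dc: {value_name: (found, expected)}} for each value that
    differs from the reference. A value that is not defined shows as None.
    """
    expected = observed[reference]
    mismatches = {}
    for dc, settings in observed.items():
        for name in VALUE_NAMES:
            want = expected.get(name)
            found = settings.get(name)
            if found != want:
                mismatches.setdefault(dc, {})[name] = (found, want)
    return mismatches


# Hypothetical snapshot, entered by hand from steps 16 and 17.
observed = {
    "DCLON1": {"TCP/IP Port": 65000, "DCTcpipPort": 65000},
    "NYC-DC-01": {},  # values not defined before step 19
}

for dc, diffs in find_port_mismatches(observed, "DCLON1").items():
    for name, (found, want) in diffs.items():
        print(f"{dc}: {name} is {found}, expected {want}")
```

With the snapshot above, both values are reported as missing on NYC-DC-01. Once the values from step 19 are added to the dictionary, the function returns an empty result.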

So there were two issues here. First, AD replication was set to work only on a specific port through the firewall, and NYC-DC-01 did not have that port configured. Second, because replication had not worked for a long time, the gap had exceeded the 60-day tombstone lifetime and DCLON1 had stopped replicating from NYC-DC-01. Please feel free to ask if any more details are required.
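
The second cause can be cross-checked with simple date arithmetic on the figures reported in event 2042. The sketch below is illustrative Python only and was not part of the original troubleshooting. The function name replication_gap_exceeds_tombstone is hypothetical, the timestamps and the 60-day lifetime are copied from the event text, and any time zone difference between the two timestamps is ignored.

```python
from datetime import datetime, timedelta


def replication_gap_exceeds_tombstone(last_success, now, tombstone_days):
    """Return (exceeded, gap), where gap is the time since the last
    successful replication and exceeded is True if it is longer than
    the tombstone lifetime."""
    gap = now - last_success
    return gap > timedelta(days=tombstone_days), gap


# Values copied from event 2042 on DCLON1.
last_success = datetime(2012, 3, 28, 13, 25, 21)  # time of last successful replication
event_time = datetime(2012, 5, 31, 8, 5, 9)       # when the event was logged
tombstone_days = 60

exceeded, gap = replication_gap_exceeds_tombstone(
    last_success, event_time, tombstone_days
)
print(exceeded, gap.days)  # True 63
```

A gap of 63 full days against a 60-day lifetime is consistent with DCLON1 refusing to replicate until the registry change in step 23.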