Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.5.3
Fix Version/s: 2.5.10
Component/s: Registration
Labels: None
Environment: PostgreSQL 8.4 under Debian 64 bit.
Description
We have 4 levels of nodes but with different link options, so we needed to create more than 4 node groups to configure links and routers:
ROOT
---------------------
SERVER1 SERVER2
| ... ------------------------ |
|||
ADM1 ADM2 ADM3 | |
----- ----- | |
S11 S12 S21 S22 S01 S02
ROOT, SERVER and ADMinistration levels can register children. Store nodes can register and synchronize with either a SERVER node or an ADMinistration node (the customer's main server), so there are 2 node groups defined for the Store level.
After registration at ADM level, the Store level node FAILED to connect:
Client side log:
2012-02-02 13:50:21,440 INFO [SymmetricDS] [SymmetricDS-job-7] Unregistered node is attempting to register
2012-02-02 13:50:21,485 INFO [SymmetricDS] [SymmetricDS-job-7] Using registration URL of https://adm1url:1234/sync/registration?nodeGroupId=STORE_ADM&externalId=11&syncURL=https%3A%2F%2Fanyurl%3A1895%2Fsync&schemaVersion=%3F&databaseType=PostgreSQL&databaseVersion=9.1&symmetricVersion=2.5.3
2012-02-02 13:50:22,610 ERROR [SymmetricDS] [SymmetricDS-job-7] Node identity is missing after registration. The registration server may be misconfigured or have an error.
2012-02-02 13:50:22,610 WARN [SymmetricDS] [SymmetricDS-job-7] Could not register. Sleeping for 17000 ms before attempting again.
Server side log:
2012-02-02 18:34:01,345 INFO [SymmetricDS] [qtp968414967-22] 12 data rows were not routed during the initial load of sym_node_group because they were filtered out by the data router.
2012-02-02 18:34:01,349 INFO [SymmetricDS] [qtp968414967-22] 26 data rows were not routed during the initial load of sym_node_group_link because they were filtered out by the data router.
2012-02-02 18:34:01,354 INFO [SymmetricDS] [qtp968414967-22] 20 data rows were not routed during the initial load of sym_node because they were filtered out by the data router.
2012-02-02 18:34:01,358 INFO [SymmetricDS] [qtp968414967-22] 2 data rows were not routed during the initial load of sym_node_security because they were filtered out by the data router.
2012-02-02 18:34:01,365 INFO [SymmetricDS] [qtp968414967-22] 7 data rows were not routed during the initial load of sym_channel because they were filtered out by the data router.
2012-02-02 18:34:01,373 INFO [SymmetricDS] [qtp968414967-22] 133 data rows were not routed during the initial load of sym_node_channel_ctl because they were filtered out by the data router.
2012-02-02 18:34:01,392 INFO [SymmetricDS] [qtp968414967-22] 202 data rows were not routed during the initial load of sym_trigger because they were filtered out by the data router.
2012-02-02 18:34:01,398 INFO [SymmetricDS] [qtp968414967-22] 50 data rows were not routed during the initial load of sym_router because they were filtered out by the data router.
2012-02-02 18:34:01,539 INFO [SymmetricDS] [qtp968414967-22] 2756 data rows were not routed during the initial load of sym_trigger_router because they were filtered out by the data router.
I ran SymmetricDS on the ADM node in debug mode and found that the recursive function NetworkedNode.getNumberOfLinksAwayFromMe may have a bug:
protected int getNumberOfLinksAwayFromMe(String nodeId, int numberOfLinksIAmFromRoot) {
    if (!node.getNodeId().equals(nodeId)) {
        if (children != null) {
            for (NetworkedNode child : children) {
                if (child.getNode().getNodeId().equals(nodeId)) {
                    // statement missing from the report (presumably a return)
                } else {
                    int numberOfLinksAwayFromMe = child.getNumberOfLinksAwayFromMe(nodeId,
                            numberOfLinksIAmFromRoot + 1);
                    if (numberOfLinksAwayFromMe > numberOfLinksIAmFromRoot) {
                        // statement missing from the report (presumably a return, which
                        // would explain why the loop exits on the first child)
                    }
                }
            }
        }
    }
    return numberOfLinksIAmFromRoot;
}
If a node has more than one child, the function always exits on the first child and never iterates over the rest.
I still don't understand all the code, but when I changed this function to work the way I think it should (iterating over all children), the registration succeeded.
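To illustrate what "iterating over all children" means here, the following is a minimal sketch and not the actual SymmetricDS fix. SketchNode and linksDownTo are hypothetical names standing in for NetworkedNode and getNumberOfLinksAwayFromMe, and the -1 "not found" return value is an assumption made for the example. The point is that the loop only returns when a child's subtree actually contains the target node:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for NetworkedNode (not SymmetricDS code). */
class SketchNode {
    private final String nodeId;
    private final List<SketchNode> children = new ArrayList<SketchNode>();

    SketchNode(String nodeId) {
        this.nodeId = nodeId;
    }

    /** Adds a child and returns this node so calls can be chained. */
    SketchNode addChild(SketchNode child) {
        children.add(child);
        return this;
    }

    /**
     * Returns the number of links from this node down to targetId,
     * or -1 if targetId is not in this node's subtree.
     */
    int linksDownTo(String targetId) {
        if (nodeId.equals(targetId)) {
            return 0;
        }
        for (SketchNode child : children) {
            int below = child.linksDownTo(targetId);
            if (below >= 0) {
                // Only stop once a subtree really contains the target.
                return below + 1;
            }
            // Otherwise keep going: the target may be under a later sibling.
        }
        return -1;
    }
}
```

With a tree shaped like ROOT, SERVER1, ADM1/ADM2 and their stores, a lookup for a store under ADM2 has to skip past ADM1's subtree first, which is the case the quoted function appears to get wrong.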
Another problem we are having is that the registered Store node (a child of a specific ADMinistration node) replicates up to ROOT and down to all the other ADM servers, which then start generating batches for a Store that they are not the parent of. I think the sym_node row for that Store should not replicate to all ADM nodes.
The network diagram in the description was broken by the font; here it is again:
ROOT <==> SERVER1 <============> S01/S02/.../S0x
ROOT <==> SERVER1 <==> ADM1 <==> S11/S12/.../S1x
ROOT <==> SERVER1 <==> ADM2 <==> S21/S22/.../S2x
ROOT <==> SERVER1 <==> ADMx ...
ROOT <==> SERVER2 ...
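Using the topology above, the suggestion that a Store's sym_node row should only reach its own parents can be sketched as follows. This is an illustration of the idea only: RegistrationTree and its methods are hypothetical names, not a SymmetricDS router API, and the sketch assumes each node has exactly one registration parent and that the structure has no cycles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of SymmetricDS): tracks each node's registration parent. */
class RegistrationTree {
    private final Map<String, String> parentOf = new HashMap<String, String>();

    /** Records that childId registered with parentId. */
    void register(String childId, String parentId) {
        parentOf.put(childId, parentId);
    }

    /** Returns the chain of parents from nodeId up to the root, nearest parent first. */
    List<String> ancestorsOf(String nodeId) {
        List<String> chain = new ArrayList<String>();
        String current = parentOf.get(nodeId);
        while (current != null) {
            chain.add(current);
            current = parentOf.get(current);
        }
        return chain;
    }

    /** True if targetId lies on the path from nodeId up to the root. */
    boolean shouldReceiveNodeRow(String targetId, String nodeId) {
        return ancestorsOf(nodeId).contains(targetId);
    }
}
```

Under this rule, the row for S11 would go to ADM1, SERVER1 and ROOT, but not to ADM2, so ADM2 would have no reason to generate batches for a Store it is not the parent of.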