added a comment - - edited
This is the problem analysis on 18/01/2010.
Workflow log
Initially from workflow's console log QKK20100117043047_output.txt:
Start histmem ...
Histmem stopped
Current rate: 0.980000019073
Driving attenuator to 150 degree ...
ERROR:root:Exception occured. It will now go to clean up mode.
(<class 'jep.SicsExecutionException'>, SicsExecutionException('org.gumtree.gumnix.sics.io.SicsExecutionException: Time out on running device att',), <traceback object at 0x04AD2B20>)
GumTree had suggested that it had failed to run att to 150.
GumTree log
2010-01-17 04:29:02,396 INFO [Thread-9] org.gumtree.gumnix.sics.internal.io.Sic
sSocketListener:general - Server replied: { "con": 45, "trans": 1, "object": "hn
otify", "flag": "hdbevent", "data": { "/commands/histogram/histmem/feedback/stat
us": "IDLE" } }
2010-01-17 04:29:03,443 INFO [pool-6-thread-1] org.gumtree.gumnix.sics.internal
.io.SicsChannel:general - Client sent: contextdo 1332 hset /instrument/collimato
r/att 150.0
2010-01-17 04:29:03,599 INFO [Thread-8] org.gumtree.gumnix.sics.internal.io.Sic
sSocketListener:general - Server replied: { "con": 44, "trans": 1332, "object":
"hset", "flag": "error", "data": "ERROR: somebody else is still driving, Request
rejected" }
2010-01-17 04:29:03,911 INFO [Thread-9] org.gumtree.gumnix.sics.internal.io.Sic
sSocketListener:general - Server replied: { "con": 45, "trans": 1, "object": "hn
otify", "flag": "hdbevent", "data": { "/data/sics_suid": 1886 } }
2010-01-17 04:29:03,911 INFO [Thread-9] org.gumtree.gumnix.sics.internal.io.Sic
sSocketListener:general - Server replied: { "con": 45, "trans": 2, "object": "st
atemon", "flag": "warning", "data": "FINISH = hmcontrol" }
2010-01-17 04:29:03,911 INFO [Thread-9] org.gumtree.gumnix.sics.internal.io.Sic
sSocketListener:general - Server replied: { "con": 45, "trans": 4, "object": "st
atus", "flag": "warning", "data": "status = Eager to execute commands" }
Coming form the GumTree log, it suggested that SICS had refused to run att as it was still busy. Although the count command had returned idle, the histogram memory (hmcontrol) was still busy.
SICS log
04:26:21.198207: Accepted dummy connection
04:26:21.198246: Executing -> ::histogram_memory::countend_event <- from dummy s
ocket
04:26:21.198495: Next line intended for socket: 21
04:26:21.198507: {"/commands/histogram/histmem/feedback/status": "IDLE"}
...
04:26:22.416233: Executing -> contextdo 1332 hset /instrument/collimator/att 150
.0 <- from socket 20
04:26:22.416318: Executing -> hset /instrument/collimator/att 150.0 <- from sock
et 20
04:26:22.416401: Next line intended for socket: 20
04:26:22.416414: ERROR: somebody else is still driving, Request rejected
...
04:26:22.689239: SUID MSG FINISH = hmcontrol
04:26:22.689340: Next line intended for socket: 21
04:26:22.689352: {"/data/sics_suid": 1886}
04:26:22.689433: Next line intended for socket: 21
04:26:22.689447: FINISH = hmcontrol
04:26:22.696145: Next line intended for socket: 21
04:26:22.696176: status = Eager to execute commands
Please note that there is about 2:41 minute time difference between the server and the client, so the timestamp does not match between the logs from SICS and GumTree.
Again, it clear shows SICS was not ready before it can drive another drivable device. However, it is rather unusual, because GumTree did not attempt to drive until 1.2 sec later. In fact, it has never been happened before. The hmcontrol took about 1.5 sec to stop. In comparison, hmcontrol took 0.6 sec to stop from its previous run:
04:25:47.440289: Executing -> ::histogram_memory::countend_event <- from dummy s
ocket
04:25:47.440544: Next line intended for socket: 21
04:25:47.440558: {"/commands/histogram/histmem/feedback/status": "IDLE"}
...
04:25:48.094417: SUID MSG FINISH = hmcontrol
04:25:48.094505: Next line intended for socket: 21
04:25:48.094517: {"/data/sics_suid": 1878}
04:25:48.094595: Next line intended for socket: 21
04:25:48.094608: FINISH = hmcontrol
04:25:48.101428: Next line intended for socket: 21
04:25:48.101461: status = Eager to execute commands
04:25:48.671108: Executing -> contextdo 1328 hset /instrument/collimator/att 180
.0 <- from socket 20
04:25:48.671194: Executing -> hset /instrument/collimator/att 180.0 <- from sock
et 20
Would there be a network latency for such delay?
Summary
The histogram memory took almost double the amount of time to stop, and hence GumTree did not wait sufficiently (1 sec wait time) to drive the next command. The solution to this is to monitor the SICS status after the command is completed (command IDLE doesn't mean SICS is ready to execute next command).
This is the problem analysis on 18/01/2010.
Workflow log
Initially from workflow's console log QKK20100117043047_output.txt:
Start histmem ... Histmem stopped Current rate: 0.980000019073 Driving attenuator to 150 degree ... ERROR:root:Exception occured. It will now go to clean up mode. (<class 'jep.SicsExecutionException'>, SicsExecutionException('org.gumtree.gumnix.sics.io.SicsExecutionException: Time out on running device att',), <traceback object at 0x04AD2B20>)GumTree had suggested that it had failed to run att to 150.
GumTree log
Coming form the GumTree log, it suggested that SICS had refused to run att as it was still busy. Although the count command had returned idle, the histogram memory (hmcontrol) was still busy.
SICS log
Please note that there is about 2:41 minute time difference between the server and the client, so the timestamp does not match between the logs from SICS and GumTree.
Again, it clear shows SICS was not ready before it can drive another drivable device. However, it is rather unusual, because GumTree did not attempt to drive until 1.2 sec later. In fact, it has never been happened before. The hmcontrol took about 1.5 sec to stop. In comparison, hmcontrol took 0.6 sec to stop from its previous run:
Would there be a network latency for such delay?
Summary
The histogram memory took almost double the amount of time to stop, and hence GumTree did not wait sufficiently (1 sec wait time) to drive the next command. The solution to this is to monitor the SICS status after the command is completed (command IDLE doesn't mean SICS is ready to execute next command).