16. How to run a distributed test case which reboots one of the systems

The information in this article is not presented as a complete solution but might be helpful to someone who is attempting to solve a similar type of problem.
Question
We are using TETware 3.2 for running distributed tests
on UNIX systems using the
We have a requirement where we need to shut down one of the systems that is running the test.
Answer
First some background . . .
The precise behaviour that you will observe depends on what TCP/IP does
when the machine at the other end shuts down.
If the machine that is shutting down closes the connections in an
orderly way (as would happen in a normal shutdown), then the connected
peers will get notification of the close in the normal way (EOF on read,
By contrast, if the connections are not closed in an orderly way (as can sometimes happen when a machine crashes), the connection will simply hang for some period of time. Synchronisation requests will time out, but other connections will wait indefinitely for something to happen to the connection.

Now, to answer your questions . . .
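As an aside on the background above, the difference between the two cases can be observed with plain POSIX calls. The following is only a minimal sketch and is not TETware code: it assumes a local UNIX-domain socket pair as a stand-in for the TCP connection, and the names wait_for_peer() and demo_peer() are hypothetical. An orderly close shows up as read() returning 0, while a silent peer is handled with a poll() timeout so that the caller does not wait indefinitely.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for data or EOF on fd.
 * Returns 1 if data arrived, 0 if the peer closed in an orderly way (EOF),
 * -2 if nothing happened before the timeout, and -1 on error. */
int wait_for_peer(int fd, int timeout_ms)
{
    struct pollfd pfd;
    char buf[64];
    ssize_t n;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return -2;              /* silent peer: give up instead of hanging */

    n = read(fd, buf, sizeof buf);
    if (n < 0)
        return -1;
    return n == 0 ? 0 : 1;      /* read() returning 0 means EOF */
}

/* Hypothetical demo: create a local socket pair, optionally close the
 * peer end in an orderly way, and report what wait_for_peer() sees. */
int demo_peer(int close_peer, int timeout_ms)
{
    int sv[2];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (close_peer)
        close(sv[1]);           /* orderly close: sv[0] will see EOF */
    result = wait_for_peer(sv[0], timeout_ms);
    close(sv[0]);
    if (!close_peer)
        close(sv[1]);
    return result;
}
```

Calling demo_peer(1, 1000) exercises the orderly case and demo_peer(0, 50) the silent case. A real crash on a remote machine is not reproduced here; the timeout merely stands in for it.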
So, if you want to reboot (say) system 2, you should not include
system 2 in the system list that you pass to the
You will need to call tet_remsync()
at various times so as to
ensure that all this happens in the correct order.
The order of events will look something like this. Events that are synchronised are connected by <-----> .
System 1                                    System 2
---------------------------------           ---------------------------------
Create a child process using
tet_fork() with a NULL parentproc
and zero validresults
(parent blocks in tet_fork() call,
waiting for child to exit)

In child process
----------------
Call tet_remexec() <--------------------->  tccd forks and execs the
to launch a remote process on               remote process
system 2 that will reboot the
system

tet_remexec() returns <------------------>  Remote process controller calls
(if tet_remexec returns -1, don't sync      tet_main()
but print diagnostic and call
tet_exit(1))

                                            In remote process
                                            -----------------
                                            Call signal(SIGHUP, SIG_IGN)

Sync with system 2 to syncpoint N <------>  Sync with system 1 to syncpoint N+1
(sync call returns)                         (sync call blocks)

Call tet_exit(0) <----------------------->  (tccd sends SIGHUP to remote
(child logs off tccd on system 2            process which is ignored - process
and exits)                                  stays blocked in sync call)

(call to tet_fork() returns in parent -
if child exit status is non-zero
this means that tet_remexec() has
failed so give up; the API has
already reported UNRESOLVED in this case)

Parent process continues
------------------------
Sync with system 2 to syncpoint N+1 <---->  (sync call returns)

Sleep a bit - wait for system 2             call tet_logoff()
to call reboot()                            (no more API calls are allowed
                                            after this point!)

                                            call reboot()
                                            (remote process and tccd get killed
                                            as system 2 goes down)
                                            ====================================
Enter the ping/sleep loop - wait
for system 2 to come back up again

ping loop ends <------------------------->  System restarts - a new instance
                                            of tccd becomes
Sleep a bit - wait for system 2             available once the system
to come up multi-user                       comes up multi-user

Call tet_remexec() to launch a              etc ...
different remote process on
system 2; this time, call
tet_remwait() to wait for the
remote process to terminate

Footnote
This suggestion was offered speculatively and had not been tried out at the time of writing. However, a subsequent message from the recipient indicated that a strategy based on this suggestion had in fact been successful.
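Two steps in the sequence above use no TETware calls and can be sketched in plain POSIX C: ignoring SIGHUP in the remote process, and the wait loop on system 1. This is a rough sketch under stated assumptions, not part of the original suggestion. It swaps a TCP connection attempt in for ping (which needs raw sockets), and the names ignore_hangup(), tcp_probe() and wait_for_system() are hypothetical, as is the choice of which host and port to probe.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <netdb.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Remote process on system 2: ignore SIGHUP so that the hangup sent by
 * tccd when the child on system 1 logs off does not kill the process.
 * Returns 0 on success, -1 on failure. */
int ignore_hangup(void)
{
    return signal(SIGHUP, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Hypothetical helper: try one TCP connection to host:port.
 * Returns 0 if something accepted the connection, -1 otherwise. */
static int tcp_probe(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int ok = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL && ok != 0; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            ok = 0;
        close(fd);
    }
    freeaddrinfo(res);
    return ok;
}

/* The "ping/sleep loop" on system 1, using a TCP probe in place of ping.
 * Returns 0 as soon as a probe succeeds, -1 after max_tries failures. */
int wait_for_system(const char *host, const char *port,
                    int max_tries, unsigned int delay_secs)
{
    int i;

    assert(host != NULL && port != NULL);
    for (i = 0; i < max_tries; i++) {
        if (tcp_probe(host, port) == 0)
            return 0;
        if (i + 1 < max_tries)
            sleep(delay_secs);  /* pause between attempts */
    }
    return -1;
}
```

A blocking connect() to a machine that is down can itself take a long time to fail, so the real elapsed time per attempt may be much longer than delay_secs. A successful probe only shows that a service is listening; as the sequence notes, it is still worth sleeping a little before calling tet_remexec() again.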