On 03/03/2014 02:06 PM, François Andriot wrote:
> To be more precise, TDE does not lose track of the tdeio process. The tdeio
> scheduler is always aware of its slave threads.
> The actual problem is that the tdeio scheduler never receives the "job is
> finished" notification from some slaves.
> So it considers this slave as being eternally busy and keeps spawning new
> ones ...
>
> The nominal scenario looks like:
> 1) an application requests a URL from the tdeio scheduler (e.g. konqueror
> asks "directory listing for sftp://remotehost/")
> 2) the tdeio scheduler instantiates a "job"
> 3) the job looks for an idle "slave" that can do the job (e.g. correct
> protocol), uses one if it exists, or else asks the scheduler to instantiate
> a new slave.
> 4) the slave spawns a 3rd party process (ssh in my case) and waits for text
> output. (note: some slaves do the job directly without spawning a 3rd party
> process)
> 5) The 3rd party process does its job (remote directory listing for example)
> and writes output to the slave.
> 6) After the command is complete, the slave ceases receiving data because
> nothing is written anymore by the 3rd party process.
> 7) The slave sends "finished" to the job.
> 8) the job sends "finished" to the scheduler.
> 9) the scheduler deletes the job and puts the slave in the "idle slave list"
> so that it can be reused by another job, or will be killed after some
> minutes of idleness.
>
> What happens with my "kdirlist" problem (and probably your imap problem
> too), is that step 7 never occurs.
> For an unknown reason, the slave, after having received the correct data,
> never notifies the job that it has finished, then the job never notifies the
> scheduler, then the scheduler thinks the slave is still active and does not
> mark it as "idle" ... and here is our stale tdeioslave ... (note: the slave
> eventually gets killed if the remote host closes the network connection for
> idleness ... but it looks like this does not happen with the ssh protocol)
>
> I'm currently looking into the cache mechanism of the "kdirlist" job class.
> I believe (to be confirmed) that when kdirlist uses its internal cache, it
> still spawns a slave but does NOT use it at all, since it already has the
> data it is looking for in its cache.
> Then it returns the cached data and ignores the spawned slave, which sits
> there forever, waiting for a query from the job that never comes ...
>
> Francois

Francois,

I'll say it again, "you're good!"

Is the slave in #7 actually sending the "finished" and it never makes it to
the job? Or is it not sending "finished" at all?

Is there any possibility that #7 does not occur because the slave does not
know where to send "finished"? What links/connects the slave to the job? Is it
a signal/slot, or some memory address that the job originally passed to the
slave in #3? Or does the slave just generate the "finished" and pass some type
of job number along with it after #6?

Could the slave/job connection created in #3 be broken somehow such that the
reverse path in #7 no longer exists after the delay in #6?

-- 
David C. Rankin, J.D.,P.E.
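
For anyone following along, the nine-step scenario quoted above can be modeled
in a few lines. This is only a toy sketch in Python, not tdeio code: the class
names Scheduler, Job and Slave, the sends_finished flag and the run_requests
helper are all made up for illustration. In particular, the plain
back-reference from slave to job is an assumption, since how the real slave is
tied to its job is exactly the open question. The sketch just shows why a
missing step 7 leaves one stale slave per request.

```python
class Slave:
    """Hypothetical stand-in for a tdeioslave; not the real TDE class."""

    def __init__(self, protocol, sends_finished=True):
        self.protocol = protocol
        self.sends_finished = sends_finished  # False models the missing step 7
        self.job = None  # assumed back-reference, set when a job takes the slave

    def run(self):
        # Steps 4-6: the transfer itself is assumed to succeed.
        # Step 7: notify the job, unless this slave shows the faulty behaviour.
        if self.sends_finished and self.job is not None:
            self.job.slave_finished(self)


class Job:
    def __init__(self, scheduler, protocol):
        self.scheduler = scheduler
        self.protocol = protocol

    def start(self):
        slave = self.scheduler.acquire_slave(self.protocol)  # step 3
        slave.job = self
        slave.run()

    def slave_finished(self, slave):
        slave.job = None
        self.scheduler.job_finished(slave)  # step 8


class Scheduler:
    def __init__(self, slave_factory):
        self.slave_factory = slave_factory
        self.idle = []  # slaves that can be reused
        self.busy = []  # slaves believed to be working

    def acquire_slave(self, protocol):
        # Reuse an idle slave for this protocol if there is one.
        for slave in self.idle:
            if slave.protocol == protocol:
                self.idle.remove(slave)
                self.busy.append(slave)
                return slave
        # Otherwise spawn a new one.
        slave = self.slave_factory(protocol)
        self.busy.append(slave)
        return slave

    def job_finished(self, slave):
        # Step 9: only reached if "finished" travelled slave -> job -> scheduler.
        self.busy.remove(slave)
        self.idle.append(slave)


def run_requests(n, sends_finished):
    """Issue n identical requests; return (busy_count, idle_count)."""
    scheduler = Scheduler(lambda proto: Slave(proto, sends_finished))
    for _ in range(n):
        Job(scheduler, "sftp").start()
    return len(scheduler.busy), len(scheduler.idle)


print(run_requests(5, True))   # (0, 1): one slave, reused five times
print(run_requests(5, False))  # (5, 0): five slaves, all stuck as "busy"
```

The toy does not distinguish between "finished" never being sent and
"finished" being sent but lost on the way to the job. From the scheduler's
side both look the same, which is why the questions above matter.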