trinity-devel@lists.pearsoncomputing.net

Month: February 2019

TDM grey-screen-of-death. Can anyone point me in the right direction?

From: Russell Brown <russell@...>
Date: Wed, 20 Feb 2019 13:06:30 +0000 (GMT)
Hi Devs,

Before I file a bug on this I'd like to try to narrow it down a bit so
I'm looking for some pointers on how TDM works and where I might start
looking.


Since upgrading to Debian Stretch and TDE R14.0.6, we're seeing an
issue where TDM won't start a new session for a display and has zombie
child process(es).  This is with networked X-Terminals using XDMCP.

It's happening roughly every week or so (no fixed time of day) on a
system with around 30 users/terminals logging in and out.

It might happen after a user logs out, after the remote terminal is
reset or, according to user rumour, after a kdesktoplock has been
unlocked; I'm failing to get any concrete info on that last point (I'm
sure you know what users are like!).

Anyway, the terminal gets fired up/restarts X11 after a logout but
doesn't then get a response to its XDMCP requests and just sits there
showing a default X11 background (hence the grey-screen-of-death
moniker).

When TDM is in this state, doing "ps waux | grep tdm" shows:

root     18481  0.0  0.0      0     0 ?        Z    Feb19   0:00 [tdm] <defunct>
root     20178  0.0  0.0      0     0 ?        Z    Feb19   0:00 [tdm] <defunct>
root     25478  0.0  0.0      0     0 ?        Z    Feb19   0:00 [tdm] <defunct>
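
For anyone who wants to check for this state from a monitoring script
rather than by eye, here is a minimal sketch of the same test done in C
on Linux. It assumes the /proc/<pid>/stat layout described in proc(5),
where the state letter follows the parenthesised command name. The
function names stat_line_state and pid_is_zombie are made up for
illustration and are not part of TDM.

```c
#include <stdio.h>
#include <string.h>

/* Extract the state letter from one /proc/<pid>/stat line.
 * The command name is wrapped in parentheses and may itself contain
 * spaces or ')' characters, so search for the LAST ')' in the line.
 * Returns 0 if the line is malformed. */
char stat_line_state(const char *line)
{
    const char *close = strrchr(line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return 0;
    return close[2];
}

/* Hypothetical helper: 1 if pid is a zombie, 0 if it is not,
 * -1 if the stat file cannot be read or parsed. Linux only. */
int pid_is_zombie(long pid)
{
    char path[64];
    char line[1024];
    char state;
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    state = stat_line_state(line);
    if (state == 0)
        return -1;
    return state == 'Z';
}
```

Calling pid_is_zombie() on each child PID of the main tdm process would
flag the same entries that ps marks as <defunct>.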

These processes in pstree look like:

 |-tdm(12251)-+-tdm(1004)---tdm_greet(1005)-+-krootimage(1009)
           |            |                             |-twin(1013)
           |            |                             `-{tdm_greet}(1020)
           |            |-tdm(1389)---tdm_greet(1390)-+-krootimage(1392)
           |            |                             `-{tdm_greet}(2309)
               <snip>
           |            |-tdm(4546)---tdm_greet(4547)-+-krootimage(4549)
           |            |                             |-twin(4552)
           |            |                             `-{tdm_greet}(4556)
           |            |-tdm(4738)---starttde(21417)-+-ssh-agent(21539)
           |            |                             `-tdeinit_phase1(21613)---kwrapper(21614)
           |            |-tdm(18481)
           |            |-tdm(20178)
           |            |-tdm(25478)
               <snip>
           |-tdmtsak(15721)

Doing an strace on the main TDM process shows it doing a select on a
bunch of fds but nothing else.

If I restart tdm (/etc/init.d/tdm restart), it will start working again
and mostly preserve the live user sessions...  but sometimes not, and
everyone gets booted out :-(

Once restarted like this, some tdms have a parent of systemd and some
are children of the (new?) main tdm process:

systemd(1)-+-ModemManager(1591)-+-{gdbus}(1599)
           |                    `-{gmain}(1595)
      <snip>
           |-tdm(16655)---starttde(6171)-+-ssh-agent(6293)
           |                             `-tdeinit_phase1(6369)---kwrapper(6370)
           |-tdm(29430)---starttde(26723)-+-ssh-agent(26845)
           |                              `-tdeinit_phase1(26933)---kwrapper(26934)
           |-tdm(32506)---starttde(21854)-+-ssh-agent(21976)
           |                              `-tdeinit_phase1(22057)---kwrapper(22058)
           |-tdm(32680)-+-tdm(424)---tdm_greet(429)-+-krootimage(1068)
           |            |                           |-twin(1071)
           |            |                           `-{tdm_greet}(1075)
           |            |-tdm(426)---starttde(7968)-+-ssh-agent(8092)
           |            |                           `-tdeinit_phase1(8229)---kwrapper(8231)
           |            |-tdm(428)---tdm_greet(431)-+-krootimage(1150)
           |            |                           |-twin(1155)


So...  can anyone suggest where I should be looking to try and narrow
down the cause of this?

How does TDM know when a session ends and what stops it providing a new
one?

Should it have zombies hanging around?  Should it reap its zombie
children?  I see there's a ReapChildren function in dm.c which seems
like it should get called when it gets a SIGCHLD.  How could that not be
working?

Thanks in advance.

-- 
 Regards,
     Russell
 --------------------------------------------------------------------
| Russell Brown          | MAIL: russell@... PHONE: 01780 471800 |
| Lady Lodge Systems     | WWW Work: http://www.lls.com              |
| Peterborough, England  | WWW Play: http://www.ruffle.me.uk         |
 --------------------------------------------------------------------