Fixed
Details
Assignee: Corentin Labbe
Reporter: Jan-Simon Moeller
Labels
Contract ID
Components
Priority
Created September 22, 2018 at 8:53 PM
Updated December 7, 2018 at 8:56 PM
Resolved October 23, 2018 at 11:52 AM
Take this job:
https://lava.automotivelinux.org/scheduler/job/332
It gets stuck halfway and stays that way for more than 23 hours.
Isn't there a global job timeout?
Why does the job lock up? My hunch is that a packet drop on the zmq pipe to the master causes the connection to reset somehow. The master then cannot reacquire the correct job state/log (or vice versa: the slave may terminate correctly, but the master never captures it).
The result is that a long queue of test jobs builds up in Jenkins, waiting for these to finish.
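To illustrate what I mean by a global job timeout, here is a rough sketch of a master-side check. It flags a job as stuck if it has been running longer than a hard limit or if no log line has arrived for a while. The second condition would also catch the lost-connection case described above. This is not LAVA code: `JobRecord`, `find_stuck_jobs`, and both thresholds are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class JobRecord:
    """Hypothetical view of a job as the master sees it (not a LAVA type)."""
    job_id: int
    state: str             # e.g. "Running" or "Finished"
    started_at: datetime   # when the job entered the running state
    last_log_at: datetime  # when the master last received a log line


def find_stuck_jobs(
    jobs: Iterable[JobRecord],
    now: datetime,
    max_runtime: timedelta,
    max_silence: timedelta,
) -> List[JobRecord]:
    """Return running jobs that exceed the global runtime or log-silence limit."""
    stuck = []
    for job in jobs:
        if job.state != "Running":
            continue  # only running jobs can be stuck
        too_long = now - job.started_at > max_runtime
        too_quiet = now - job.last_log_at > max_silence
        if too_long or too_quiet:
            stuck.append(job)
    return stuck
```

A periodic task on the master could call this and cancel whatever it returns, so that the Jenkins queue keeps moving even when the job state was lost on the way.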