Uploaded image for project: ' AGL Development'
  1. AGL Development
  2. SPEC-1750

[lava][lava-docker] Jobs deadlock

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: CIAT
    • Labels:

      Description

      Take this job:

      https://lava.automotivelinux.org/scheduler/job/332

       

      It gets stuck half-way and stays like this for >23h.

      Isn't there a global job timeout ?
      Why does the job lock-up. My hinch is a packet drop on the zmq pipe to the master that causes the connection to reset somehome. Then the master cannot re-aquire the correct job state / log (or vice versa, the slave possibly terminates correctly but it is not captured by the master).

      Result is that a long queue builds-up in jenkins for test-jobs waiting for these to finish.

       

        Attachments

        1. 332.log
          15 kB
        2. js.log.gz
          155 kB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

            Assignee:
            clabbe Corentin Labbe
            Reporter:
            jsmoeller Jan-Simon Moeller
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: