LAVA: deployimages has false timeout when failure_retry = 2
Description
Environment
Activity
Kevin Hilman May 21, 2019 at 6:03 PM
The LAVA developers acknowledge that this is a bug (cf. the mailing list thread linked in my earlier comment) with no simple workaround.
Kevin Hilman April 29, 2019 at 9:56 PM
I tried what you did and got the same results. I sent an email to the list for clarification:
https://lists.lavasoftware.org/pipermail/lava-users/2019-April/001804.html
Jan-Simon Moeller April 29, 2019 at 8:56 PM(edited)
Well, deploy in total should not exceed e.g. 30 minutes or we're in a bad state.
According to https://validation.linaro.org/static/docs/v2/timeouts.html#individual-action-overrides we should be able to influence the http_download timeout. I just don't see that override actually taking effect, e.g. here:
https://lava.automotivelinux.org/scheduler/job/3265/definition#defline48
but
https://lava.automotivelinux.org/scheduler/job/3265#action_1-1-1
If I read the above doc correctly, we should be able to limit http_download to e.g. 15 min and deployimages to e.g. 30 min.
Kevin Hilman April 29, 2019 at 5:22 PM
I don't think there's a logic bug.
The job has set a 25-minute (1500 sec) timeout for the deploy action. That includes all downloads and retries. IMO, if we want this to be able to deal with the (often slow or broken) AGL download server, then we'll need to just increase that timeout.
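To illustrate the point about the shared timeout, here is a rough sketch in Python. It is only a simplified model of "one timeout covers all attempts", not LAVA's actual implementation; the function name and the 900-second attempt durations are made up for the example.

```python
def attempts_within_budget(total_timeout_s, attempt_durations_s):
    """Model a single timeout shared by every attempt of an action.

    Returns (completed, elapsed_s): how many attempts ran to completion
    before the shared timeout expired, and how many seconds were used.
    Hypothetical helper for illustration; not part of LAVA.
    """
    elapsed = 0
    completed = 0
    for duration in attempt_durations_s:
        if elapsed + duration > total_timeout_s:
            # This attempt is cut off when the shared timeout expires.
            return completed, total_timeout_s
        elapsed += duration
        completed += 1
    return completed, elapsed


# Assumed numbers: the first download attempt fails after 900 s,
# and the retry would need another 900 s to finish.
print(attempts_within_budget(1500, [900, 900]))  # retry only gets 600 s
print(attempts_within_budget(1800, [900, 900]))  # both attempts fit
```

In this model the retry does not get a fresh 1500 seconds; it only gets whatever the first attempt left over, which is why a larger deploy timeout is needed for a retry to be useful.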
Jan-Simon Moeller April 26, 2019 at 9:56 AM
It blocks the email reports as well.
We should allow failure_retry because of the time it takes to download artifacts.
See:
https://lava.automotivelinux.org/scheduler/job/3209
I don't know if this is a logic bug (no extension of timeout on failure_retry) or if we need to extend a different timeout to allow for the failure_retry to go through.
This is a blocker for enabling the reporting of failed boots back to Gerrit.
Please check.
@Kevin Hilman, @Corentin Labbe.