agl-compositor crashing in KVM demo
Close for Pike M1
When Weston 11 lands in poky master we can probably more easily rig up a test with just agl-image-compositor + native-shell-client by using the meta-agl-core next branch.
I guess an immediate follow-up would be to add some hot-plugging support to that native shell client. I can look into improving that, given that I also need to add some notes (https://lf-automotivelinux.atlassian.net/browse/SPEC-4832#icft=SPEC-4832).
Thanks for the update. There's no faking of outputs like Qt does when the underlying Wayland objects go away, so I think that might be why the shell client is crashing. There's no code to handle that, and I haven't tried it at all with that simple client, just the Qt one.
Even so, the compositor should (still) be up & running after that.
I'd be in favor of making that shell client a bit more resilient so it can cope with hot-plugging events (which I suspect would crash it, this issue or not). That would be one thing to improve, and it would probably improve the situation a bit more.
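To make the resilience idea concrete, here is a minimal sketch of the bookkeeping a shell client might do around the Wayland `wl_registry` `global` and `global_remove` events. It is not taken from native-shell-client or agl-compositor: the `client`, `output`, and `shell_surface` types and the `client_output_added` / `client_output_removed` helpers are hypothetical names, and it assumes the connector name (for example from the `wl_output` `name` event) is already known when the output is registered. No libwayland calls are made, so it only models the state handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CONNECTOR_LEN 32

/* Hypothetical record for one output advertised by the compositor. */
struct output {
    uint32_t global_name;              /* name from the registry "global" event */
    char connector[MAX_CONNECTOR_LEN]; /* e.g. "HDMI-A-2" (assumed to be known) */
    struct output *next;
};

/* Hypothetical shell surface (panel, background, ...) tied to a connector. */
struct shell_surface {
    char connector[MAX_CONNECTOR_LEN]; /* connector this surface wants */
    struct output *output;             /* NULL while that output is unplugged */
};

struct client {
    struct output *outputs;            /* singly linked list of live outputs */
    struct shell_surface *surfaces;
    size_t n_surfaces;
};

void copy_name(char *dst, const char *src)
{
    strncpy(dst, src, MAX_CONNECTOR_LEN - 1);
    dst[MAX_CONNECTOR_LEN - 1] = '\0';
}

/* What a "global" handler might do: track the output and re-attach any
 * surface that was waiting for this connector to come back. */
int client_output_added(struct client *c, uint32_t global_name,
                        const char *connector)
{
    struct output *o = calloc(1, sizeof(*o));
    if (!o)
        return -1;
    o->global_name = global_name;
    copy_name(o->connector, connector);
    o->next = c->outputs;
    c->outputs = o;

    for (size_t i = 0; i < c->n_surfaces; i++) {
        struct shell_surface *s = &c->surfaces[i];
        if (!s->output && strcmp(s->connector, connector) == 0)
            s->output = o;
    }
    return 0;
}

/* What a "global_remove" handler might do: detach surfaces first, then
 * free the output, so nothing keeps a dangling pointer to it. */
void client_output_removed(struct client *c, uint32_t global_name)
{
    struct output **link = &c->outputs;
    while (*link && (*link)->global_name != global_name)
        link = &(*link)->next;
    if (!*link)
        return; /* unknown or already removed: ignore instead of crashing */

    struct output *gone = *link;
    *link = gone->next;

    for (size_t i = 0; i < c->n_surfaces; i++) {
        if (c->surfaces[i].output == gone)
            c->surfaces[i].output = NULL;
    }
    free(gone);
}
```

In a real client the two helpers would be called from the registry listener, and drawing code would skip any surface whose `output` is `NULL` until the connector reappears under a new global name. Matching on the connector name rather than the global name is a design choice made for this sketch, since the compositor is not required to reuse the same global name after a re-plug.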
Another thing is that a few more hot-plug fixes have made it into Weston 11: the fix included here and a few more landed there, so I'd suggest trying with that if possible. At least that's something we can do from our side to mitigate the hardware issue.
My apologies, I did do a quick test last week, but didn't update here. I would say the backported change helps somewhat: on a couple of boots I saw HDMI-A-2 keep disconnecting and reconnecting without agl-compositor crashing, but IIRC native-shell-client did not handle the display going away and coming back well (I think it crashes; I can confirm again when I rig the test setup back up). I didn't comment here because on probably two thirds of my test boots there was still a crash of both agl-compositor and native-shell-client, and I was going to circle back and build an image with debug information to try to get useful stack traces for a comment. I'll try to find time to do that in the next couple of days. Overall, I am on the fence about whether to reopen this, and if so where to go next. It seems quite likely the root issue is hardware related, since the same HDMI extender works fine on the other HDMI connector on the board, so realistically we'd perhaps have to ask Panasonic to investigate that. As to whether we should put a lot of effort into making the agl-compositor/libweston + shell client + apps stack actually handle such a display issue completely, it's unclear to me if we can justify it...
I tried booting the agl-kvm-demo-platform image on the AGL reference hardware to vet some recent changes, and native-shell-client or QEMU seems to be tickling something in the newer agl-compositor in Octopus and master, as it throws an assert:
Apr 25 21:02:57 h3ulcb agl-compositor[1047]: agl-compositor: ../weston-10.0.2/libweston/backend-drm/kms.c:1403: atomic_flip_handler: Assertion `output->atomic_complete_pending' failed.
This results in a bit of a mess: the abort triggers coredumps from the QEMU instances and native-shell-client, which systemd then attempts to restart, so the whole process repeats. The coredumps don't really provide much more information to my untrained eye, but I can try pulling something out of an agl-compositor one if you can tell me what to look for.