netboot fails due to network service triggering interface causing nbd to fail

Fixed

Description

Captured this:

[ OK ] Reached target Network (Pre).
Starting Connection service...
Starting Network Service... <<<
[ OK ] Started Cynagora service. <<<
[ OK ] Started Service to start the API 'supervisor'.
Starting Run pending agl postinsts...
[ OK ] Started User Mode Init Manager for TI shared transport.
Starting Bluetooth service...
[ 20.413584] run-agl-postinsts[3075]: Running postinst /etc/agl-postinsts/10-afb-test.sh...
[ 51.933948] block nbd0: Connection timed out <<<
[ OK ] Started Network Service. <<<
[FAILED] Failed to start Start the security manager. See 'systemctl status security-manager.service' for details.
[FAILED] Failed to start Telephony service. See 'systemctl status ofono.service' for details.
[FAILED] Failed to start Login Service. See 'systemctl status systemd-logind.service' for details.
[ OK ] Stopped Login Service.
Starting Login Service...
[FAILED] Failed to start Connection service. See 'systemctl status connman.service' for details.
[ OK ] Reached target Network.
Starting Avahi mDNS/DNS-SD Stack...
Starting Target Communication Framework agent...
[FAILED] Failed to start Target Communication Framework agent. See 'systemctl status tcf-agent.service' for details.
[FAILED] Failed to start Avahi mDNS/DNS-SD Stack. See 'systemctl status avahi-daemon.service' for details.
[ OK ] Stopped Connection service.
Starting Connection service...
[ OK ] Stopped Network Service.
Starting Network Service...
[FAILED] Failed to start Login Service. See 'systemctl status systemd-logind.service' for details.
[FAILED] Failed to start Bluetooth service. See 'systemctl status bluetooth.service' for details.
[ OK ] Reached target Bluetooth.
[ OK ] Stopped Login Service.
Starting Login Service...

It seems like the network is now cycling up/down/up, which kills nbd.
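To check a captured console log for this pattern, a small script can count the start/stop events of one unit and collect any nbd timeouts. This is only a rough sketch: the function name `summarize_boot_log` and the sample lines are illustrative (the sample is copied from the capture above), and it assumes one log entry per line.

```python
import re

# Matches kernel messages such as "block nbd0: Connection timed out".
NBD_TIMEOUT = re.compile(r"block (nbd\d+): Connection timed out")


def summarize_boot_log(lines, unit="Network Service"):
    """Collect start/stop events for one unit and any nbd timeouts.

    Hypothetical helper for illustration; it only does substring matching
    on systemd console status lines.
    """
    events = []
    nbd_timeouts = []
    for raw in lines:
        # Drop the "<<<" annotations added by hand in the capture.
        line = raw.replace("<<<", "").strip()
        match = NBD_TIMEOUT.search(line)
        if match:
            nbd_timeouts.append(match.group(1))
        for verb in ("Starting", "Started", "Stopped"):
            if f"{verb} {unit}" in line:
                events.append(verb)
    return {
        "events": events,
        "restarts": events.count("Stopped"),
        "nbd_timeouts": nbd_timeouts,
    }


SAMPLE = [
    "Starting Network Service... <<<",
    "[ 51.933948] block nbd0: Connection timed out <<<",
    "[ OK ] Started Network Service. <<<",
    "[FAILED] Failed to start Connection service.",
    "[ OK ] Stopped Network Service.",
    "Starting Network Service...",
]

if __name__ == "__main__":
    print(summarize_boot_log(SAMPLE))
```

A non-zero `restarts` count together with an nbd timeout would be consistent with the up/down/up theory, but it does not prove the cause.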

Environment

None

Activity

Walt Miner 
July 10, 2020 at 3:14 PM

Close for JJ RC1

Jan-Simon Moeller 
March 11, 2020 at 4:49 PM

We had another session today, on the h3.

So what happens is:

http://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?h=zeus&id=a972e401

Of course debugging on qemux86-64 did not help here ;)

Now the path forward is:

a) Fix upstream: we actually have a nicer solution in our tree that we can upstream

    see: https://git.automotivelinux.org/AGL/meta-agl/tree/meta-agl-profile-core/recipes-core/systemd/systemd_%25.bbappend#n23

b) in our tree, remove the 80-wired.network file to avoid any unwanted interactions with connman

c) discuss and rework the usage of systemd-networkd:

    - we currently use it for CAN - but we do not need it strictly speaking

    - do the containers rely on systemd-networkd ?

    - option: disable systemd-networkd completely

Stephane Desneux 
March 4, 2020 at 2:51 PM

After firmware upgrade on h3ulcb (ws2.0), I don't get any boot issue anymore. Note that connman is also configured correctly to exclude eth0.
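The comment does not say how connman was configured to exclude eth0. One mechanism connman offers is the `NetworkInterfaceBlacklist` key in the `[General]` section of its `main.conf`, which matches interface names by prefix. Assuming that mechanism, a minimal sketch for checking a config text could look like this (the helper name `connman_ignores` is hypothetical):

```python
import configparser


def connman_ignores(conf_text, ifname):
    """Return True if main.conf text blacklists the given interface.

    Assumption: exclusion is done via NetworkInterfaceBlacklist in
    [General]; other mechanisms are not checked by this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("General", "NetworkInterfaceBlacklist", fallback="")
    prefixes = [p.strip() for p in raw.split(",") if p.strip()]
    # connman compares the leading characters of the interface name.
    return any(ifname.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    example = "[General]\nNetworkInterfaceBlacklist = vmnet,vboxnet,eth0\n"
    print(connman_ignores(example, "eth0"))
```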

Scott Murray 
March 3, 2020 at 11:28 PM

I spoke too soon: I did see what looks like the same issue twice after a few boots, but then it stopped reproducing. That possibly suggests some form of race condition, but I'm not sure what it would be.

Jan-Simon Moeller 
March 3, 2020 at 11:00 PM

LAVA uses plain xnbd-server:

root@agl-core-lab-1:/# dpkg -l | grep nbd
ii xnbd-common 0.3.0-2 amd64 Network Block Device - common files
ii xnbd-server 0.3.0-2 amd64 Network Block Device server with support for live migration

Details

Created March 3, 2020 at 11:47 AM
Updated July 10, 2020 at 3:14 PM
Resolved May 5, 2020 at 10:00 PM
