[LTP] [RFC] [PATCH] netns: Fix race in virtual interface bringup

Alexey Kodanev alexey.kodanev@oracle.com
Fri Nov 17 13:08:20 CET 2017


On 11/17/2017 09:09 AM, Li Wang wrote:
> Hi Dan,
>
> On Fri, Nov 10, 2017 at 4:38 AM, Dan Rue <dan.rue@linaro.org> wrote:
>> Symptoms (+ command, error):
>>     netns_comm_ip_ipv6_ioctl:
>>         + ip netns exec tst_net_ns1 ping6 -q -c2 -I veth1 fd00::2
>>         connect: Cannot assign requested address
>>
>>     netns_comm_ip_ipv6_netlink:
>>         + ip netns exec tst_net_ns0 ping6 -q -c2 -I veth0 fd00::3
>>         connect: Cannot assign requested address
>>
>>     netns_comm_ns_exec_ipv6_ioctl:
>>         + ns_exec 6689 net ping6 -q -c2 -I veth0 fd00::3
>>         connect: Cannot assign requested address
>>
>>     netns_comm_ns_exec_ipv6_netlin:
>>         + ns_exec 6891 net ping6 -q -c2 -I veth0 fd00::3
>>         connect: Cannot assign requested address
>>
>> The error comes from ping6, which tries to use an address on veth0 as
>> its source (because of -I veth0) but cannot. Waiting for two seconds
>> fixes the test in my testing; one second is not long enough.
>>
>> dmesg shows the following during the test:
>>
>>     [Nov 7 15:39] LTP: starting netns_comm_ip_ipv6_ioctl (netns_comm.sh ip ipv6 ioctl)
>>     [  +0.302401] IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
>>     [  +0.048059] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready

It's quite strange that the veth interface needs 2 seconds to become
operational when, according to dmesg, it is up in less than 0.3 s, and
yet you say that even 1 second is not enough... Are you sure that the
IPv6 address is not in the tentative state and that DAD (duplicate
address detection) is actually disabled? I'm asking because the script
does not disable it:
https://gist.github.com/danrue/7b76bbcbc23a6296030b7295650b69f3
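If DAD is what delays the address, a bounded poll along these lines could replace the fixed sleep. It is only a sketch under assumptions: `wait_dad_done` is a made-up name, not an existing LTP function; it uses `ip -n NS` and the `tentative` flag filter of `ip address show`, which lists only addresses still undergoing DAD; and the 200 x 10 ms budget just mirrors Li's wait_for_set_ip() below.

```shell
# Hypothetical helper (not an LTP function): wait until no IPv6 address
# on device DEV inside network namespace NS is still tentative, i.e.
# until duplicate address detection (DAD) has finished.
# Usage: wait_dad_done NS DEV [RETRIES]   (polls every 10 ms, default 200)
wait_dad_done()
{
	local ns=$1 dev=$2 tries=${3:-200} out

	while [ "$tries" -gt 0 ]; do
		# The "tentative" filter makes ip list only addresses that are
		# still undergoing DAD. If ip itself fails (e.g. no such
		# device), give up instead of treating it as "no tentative addr".
		out=$(ip -n "$ns" -6 addr show dev "$dev" tentative 2>/dev/null) ||
			return 1
		[ -z "$out" ] && return 0
		tries=$((tries - 1))
		# Fractional sleep needs GNU coreutils or busybox; inside LTP,
		# "tst_sleep 10ms" would be the portable choice.
		sleep 0.01
	done
	return 1
}

# Alternatively, skip DAD altogether before assigning the address
# (requires root; the /64 prefix is an assumption):
#   ip netns exec tst_net_ns0 sysctl -qw net.ipv6.conf.veth0.accept_dad=0
#   ip -n tst_net_ns0 -6 addr add fd00::2/64 dev veth0 nodad
```

For example, `wait_dad_done tst_net_ns0 veth0 || tst_brkm TBROK "DAD did not finish on veth0"` after the addresses are assigned; disabling DAD up front, as in the commented lines, would avoid the wait entirely.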

>>
>> Signed-off-by: Dan Rue <dan.rue@linaro.org>
>> ---
>>
>> We've periodically hit this problem across many arm64 kernels and boards, and
>> it seems to be caused by "ping6" running before the virtual interface is
>> actually ready. "sleep 2" works around the issue and proves that it is a race
>> condition, but I would prefer something faster and deterministic. Please
>> suggest a better implementation.
> Just FYI:
>
> I'm not an expert in networking, but one method I copied from the
> ltp/numa tests is to split the 2-second sleep into many shorter waits
> and poll in between.
>
> Something like:
>
> --- a/testcases/kernel/containers/netns/netns_helper.sh
> +++ b/testcases/kernel/containers/netns/netns_helper.sh
> @@ -240,6 +240,22 @@ netns_ip_setup()
>                 tst_brkm TBROK "unable to add device veth1 to the separate network namespace"
>  }
>
> +wait_for_set_ip()
> +{
> +       local dev=$1
> +       local retries=200
> +
> +       while [ $retries -gt 0 ]; do
> +               dmesg -c | grep -q "IPv6: ADDRCONF(NETDEV_CHANGE): $dev: link becomes ready"


What about "grep -q up /sys/class/net/$dev/operstate && break"?
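A minimal sketch of that suggestion as a bounded poll, keeping the same 200 x 10 ms budget. `wait_operstate_up` and the `NET_SYSFS` override (there only so the function can be exercised against a fake tree) are hypothetical names. Note that operstate only reflects the link: with DAD enabled, the IPv6 address may still be tentative after this returns.

```shell
# Hypothetical helper (not an LTP function) built around the operstate
# check: poll /sys/class/net/DEV/operstate until it reads "up".
# NET_SYSFS is an assumed override so the function can be pointed at a
# fake tree; it defaults to the real /sys/class/net.
# Usage: wait_operstate_up DEV [RETRIES]   (polls every 10 ms, default 200)
wait_operstate_up()
{
	local dev=$1 tries=${2:-200}
	local path="${NET_SYSFS:-/sys/class/net}/$dev/operstate"

	while [ "$tries" -gt 0 ]; do
		# operstate holds one word such as "up", "down", "unknown" or
		# "lowerlayerdown"; -x matches only a line that is exactly "up".
		grep -qx up "$path" 2>/dev/null && return 0
		tries=$((tries - 1))
		sleep 0.01	# LTP code would use "tst_sleep 10ms" instead
	done
	return 1
}
```

Since veth0 and veth1 sit in tst_net_ns0 and tst_net_ns1 here, the check has to run inside the matching namespace; `ip netns exec` remounts /sys for the target namespace, so e.g. `ip netns exec tst_net_ns1 grep -qx up /sys/class/net/veth1/operstate` sees the right device.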

Thanks,
Alexey


> +               if [ $? -eq 0 ]; then
> +                       break
> +               fi
> +
> +               retries=$((retries-1))
> +               tst_sleep 10ms
> +       done
> +}
> +
>  ##
>  # Enables virtual ethernet devices and assigns IP addresses for both
>  # of them (IPv4/IPv6 variant is decided by netns_setup() function).
> @@ -285,6 +301,9 @@ netns_set_ip()
>                         tst_brkm TBROK "enabling veth1 device failed"
>                 ;;
>         esac
> +
> +       wait_for_set_ip veth0
> +       wait_for_set_ip veth1
>  }
>
>  netns_ns_exec_cleanup()
>
>> Also, is it correct that "ifconfig veth0 up" returns before the interface is
>> actually ready?
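As far as I understand, that is expected: `ifconfig veth0 up` (like `ip link set veth0 up`) sets the administrative IFF_UP flag, while the operational state changes asynchronously, typically once the veth peer is up too. That matches the NETDEV_UP "link is not ready" line followed by NETDEV_CHANGE "link becomes ready" in the dmesg output above. A small sketch that prints both states side by side; `link_states` and the `NET_SYSFS` override are hypothetical, not LTP API:

```shell
# Hypothetical helper (not an LTP function): print the administrative
# state (IFF_UP bit of /sys/class/net/DEV/flags) next to the operational
# state (/sys/class/net/DEV/operstate). NET_SYSFS is an assumed override
# for testing; it defaults to the real /sys/class/net.
link_states()
{
	local dev=$1 base="${NET_SYSFS:-/sys/class/net}" flags oper
	local admin=down

	flags=$(cat "$base/$dev/flags") || return 1
	oper=$(cat "$base/$dev/operstate") || return 1
	# flags is a hex string such as 0x1003; bit 0x1 is IFF_UP, which is
	# what "ifconfig DEV up" / "ip link set DEV up" sets.
	[ $(( $flags & 0x1 )) -ne 0 ] && admin=up
	echo "admin=$admin oper=$oper"
}
```

Right after bringup the two fields can briefly disagree, which is the window in which ping6 can fail.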
>>
>> See also this isolated test script:
>> https://gist.github.com/danrue/7b76bbcbc23a6296030b7295650b69f3
>>
>>  testcases/kernel/containers/netns/netns_helper.sh | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/testcases/kernel/containers/netns/netns_helper.sh b/testcases/kernel/containers/netns/netns_helper.sh
>> index a95cdf206..99172c0c0 100755
>> --- a/testcases/kernel/containers/netns/netns_helper.sh
>> +++ b/testcases/kernel/containers/netns/netns_helper.sh
>> @@ -285,6 +285,7 @@ netns_set_ip()
>>                         tst_brkm TBROK "enabling veth1 device failed"
>>                 ;;
>>         esac
>> +       sleep 2
>>  }
>>
>>  netns_ns_exec_cleanup()
>> --
>> 2.13.6
>>
>>
>> --
>> Mailing list info: https://lists.linux.it/listinfo/ltp
>
>


