qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] linux-user: manage SOCK_PACKET socket type.


From: Laurent Vivier
Subject: Re: [Qemu-devel] [PATCH] linux-user: manage SOCK_PACKET socket type.
Date: Tue, 27 Oct 2015 11:47:38 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0


Le 27/10/2015 04:09, Laurent Vivier a écrit :
> 
> 
> Le 26/10/2015 15:40, Peter Maydell a écrit :
>> On 6 October 2015 at 18:11, Laurent Vivier <address@hidden> wrote:
>>> This is obsolete, but if we want to use dhcp with some distros (like debian
>>> ppc 8.2 jessie), we need it.
>>>
>>> At the bind level, we are not able to know the socket type so we try to
>>> guess it by analyzing the name. We manage only the case "ethX",
>>> "ethX" in spk_device is similar to set htons(0x6574) in sll_protocol in the
>>> normal case, and as this protocol does not exist, it's ok.
>>>
>>> SOCK_PACKET uses network endian to encode protocol in socket()
>>>
>>> in PACKET(7) :
>>>                                  protocol is the  IEEE  802.3  protocol
>>> number in network order.  See the <linux/if_ether.h> include file for a
>>> list of allowed protocols.  When protocol is  set  to  htons(ETH_P_ALL)
>>> then all protocols are received.  All incoming packets of that protocol
>>> type will be passed to the packet socket before they are passed to  the
>>> protocols implemented in the kernel.
>>>
>>> Signed-off-by: Laurent Vivier <address@hidden>
>>> ---
>>> This patch is a remix of an old patch sent in 2012:
>>> https://patchwork.ozlabs.org/patch/208892/
>>>
>>>  linux-user/syscall.c | 33 +++++++++++++++++++++++++++------
>>>  1 file changed, 27 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>>> index 64be431..71cc1e2 100644
>>> --- a/linux-user/syscall.c
>>> +++ b/linux-user/syscall.c
>>> @@ -111,6 +111,7 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
>>>  #include <linux/route.h>
>>>  #include <linux/filter.h>
>>>  #include <linux/blkpg.h>
>>> +#include <linux/if_packet.h>
>>>  #include "linux_loop.h"
>>>  #include "uname.h"
>>>
>>> @@ -1198,11 +1199,20 @@ static inline abi_long 
>>> target_to_host_sockaddr(struct sockaddr *addr,
>>>      memcpy(addr, target_saddr, len);
>>>      addr->sa_family = sa_family;
>>>      if (sa_family == AF_PACKET) {
>>> -       struct target_sockaddr_ll *lladdr;
>>> +        /* Manage an obsolete case :
>>> +         * if socket type is SOCK_PACKET, bind by name otherwise by index
>>> +         * but we are not able to know socket type, so check if the name
>>> +         * is usable...
>>> +         * see linux/net/packet/af_packet.c: packet_bind_spkt()
>>> +         */
>>> +        if (strncmp((char *)((struct sockaddr_pkt *)addr)->spkt_device,
>>> +                    "eth", 3) != 0) {
>>> +            struct target_sockaddr_ll *lladdr;
>>
>> This confuses me. The packet(7) manpage suggests there are two flavours
>> of packet socket:
>>  (1) legacy AF_INET + SOCK_PACKET
>>  (2) new style AF_PACKET + SOCK_RAW / SOCK_DGRAM
>>
>> but this comment suggests it's trying to handle AF_PACKET + SOCK_PACKET ?
> 
> In fact, I've started not from the man page, but from a non working dhcp
> client, originally with a m68k target and etch-m68k distro, and I've met
> again this problem on a ppc target and jessie distro.
> 
>>
>> If AF_PACKET was introduced as the new way of doing things, it's not
>> clear why it would be the family type used in the legacy approach's
>> sockaddr_pkt (though there seems to be code around that does this).
>> I suspect that 2.0 kernels just didn't check af_family here at all.
> 
> by digging into the code, I have found:
> 
> $ apt-get source isc-dhcp-client
> $ vi isc-dhcp-4.3.1/common/lpf.c
> 
> ...
>  72 int if_register_lpf (info)
>  73         struct interface_info *info;
>  74 {
> ...
>  79         if ((sock = socket(PF_PACKET, SOCK_PACKET,
>  80                            htons((short)ETH_P_ALL))) < 0) {
> ...
> 
> So we can see socket() is used with domain PF_PACKET and type SOCK_PACKET.
> 
> Next, the interface name is put into the sa_data of sockaddr, and bind()
> is used with AF_PACKET:
> 
>  94         /* Bind to the interface name */
>  95         memset (&sa, 0, sizeof sa);
>  96         sa.sa_family = AF_PACKET;
>  97         strncpy (sa.sa_data, (const char *)info -> ifp, sizeof
> sa.sa_data);
>  98         if (bind (sock, &sa, sizeof sa)) {
> 
> ifp is initialized from a list of all discovered interfaces in
> common/discover.c:
> ...
>  238 int
>  239 begin_iface_scan(struct iface_conf_list *ifaces) {
> ...
>  283         if (ioctl(ifaces->sock, SIOCGLIFCONF, &ifaces->conf) < 0) {
> ...
>  918 void
>  919 discover_interfaces(int state) {
> ...
>  924         struct interface_info *tmp;
> ...
>  939         if (!begin_iface_scan(&ifaces)) {
> ...
>  955         while (next_iface(&info, &err, &ifaces)) {
>  956
>  957                 /* See if we've seen an interface that matches this
> one. */
>  958                 for (tmp = interfaces; tmp; tmp = tmp->next) {
>  959                         if (!strcmp(tmp->name, info.name))
>  960                                 break;
>  961                 }
> 
> ...
>  987                         strcpy(tmp->name, info.name);
> ...
> 1050         }
> ...
> 1063         for (tmp = interfaces ; tmp != NULL ; tmp = tmp->next) {
> 1064                 if (tmp->ifp == NULL) {
> 1065                         struct ifreq *tif;
> 1066
> 1067                         tif = (struct ifreq *)dmalloc(sizeof(struct
> ifreq),
> 1068                                                       MDL);
> 1069                         if (tif == NULL)
> 1070                                 log_fatal("no space for ifp mockup.");
> 1071                         strcpy(tif->ifr_name, tmp->name);
> 1072                         tmp->ifp = tif;
> 1073                 }
> 1074         }
> 
>>
>> The code in the kernel's packet_recvmsg fills in the spkt_family
>> field with the netdevice's type field, which is an ARPHRD_* constant
>> as far as I can tell (I could well be wrong here).
> 
> kernel 4.3.0-rc3, net/packet/af_packet.c:
> 
>    2961
>    2962 static int packet_bind_spkt(struct socket *sock, struct sockaddr
> *uaddr,
>    2963                             int addr_len)
>    2964 {
>    2965         struct sock *sk = sock->sk;
>    2966         char name[15];
>    2967         struct net_device *dev;
>    2968         int err = -ENODEV;
>    2969
>    2970         /*
>    2971          *      Check legality
>    2972          */
>    2973
>    2974         if (addr_len != sizeof(struct sockaddr))
>    2975                 return -EINVAL;
>    2976         strlcpy(name, uaddr->sa_data, sizeof(name));
>    2977
>    2978         dev = dev_get_by_name(sock_net(sk), name);
>    2979         if (dev)
>    2980                 err = packet_do_bind(sk, dev, pkt_sk(sk)->num);
>    2981         return err;
>    2982 }
> ...
>    4246 static const struct proto_ops packet_ops_spkt = {
>    4247         .family =       PF_PACKET,
> ...
>    4250         .bind =         packet_bind_spkt,
> ...
>    3022
>    3023 static int packet_create(struct net *net, struct socket *sock,
> int protocol,
>    3024                          int kern)
> ...
>    3045         if (sock->type == SOCK_PACKET)
>    3046                 sock->ops = &packet_ops_spkt;
> ...
> 
>> Odd to have code in target_to_host_sockaddr and not in
>> host_to_target_sockaddr.
> 
> In the case of host_to_target_sockaddr(), there is no "if (sa_family ==
> AF_PACKET) {" as it was in target_to_host_sockaddr(), it's why I didn't
> add it (and I don't like to add code I don't test).
> 
>>
>>> -       lladdr = (struct target_sockaddr_ll *)addr;
>>> -       lladdr->sll_ifindex = tswap32(lladdr->sll_ifindex);
>>> -       lladdr->sll_hatype = tswap16(lladdr->sll_hatype);
>>> +            lladdr = (struct target_sockaddr_ll *)addr;
>>> +            lladdr->sll_ifindex = tswap32(lladdr->sll_ifindex);
>>> +            lladdr->sll_hatype = tswap16(lladdr->sll_hatype);
>>> +        }
>>>      }
>>>      unlock_user(target_saddr, target_addr, 0);
>>>
>>> @@ -2509,7 +2519,12 @@ static abi_long do_socketcall(int num, abi_ulong 
>>> vptr)
>>>      /* now when we have the args, actually handle the call */
>>>      switch (num) {
>>>      case SOCKOP_socket: /* domain, type, protocol */
>>> -        return do_socket(a[0], a[1], a[2]);
>>> +        if (a[0] == AF_PACKET ||
>>> +            a[1] == TARGET_SOCK_PACKET) {
>>> +            return do_socket(a[0], a[1], tswap16(a[2]));
>>> +        } else {
>>> +            return do_socket(a[0], a[1], a[2]);
>>> +        }
>>>      case SOCKOP_bind: /* sockfd, addr, addrlen */
>>>          return do_bind(a[0], a[1], a[2]);
>>>      case SOCKOP_connect: /* sockfd, addr, addrlen */
>>> @@ -7500,7 +7515,13 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
>>> arg1,
>>>  #endif
>>>  #ifdef TARGET_NR_socket
>>>      case TARGET_NR_socket:
>>> -        ret = do_socket(arg1, arg2, arg3);
>>> +        if (arg1 == AF_PACKET ||
>>> +            arg2 == TARGET_SOCK_PACKET) {
>>> +            /* in this case, socket() needs a network endian short */
>>> +            ret = do_socket(arg1, arg2, tswap16(arg3));
>>> +        } else {
>>> +            ret = do_socket(arg1, arg2, arg3);
>>> +        }
>>
>> This doesn't make sense to me. The argument to socket()
>> is passed in via a register; so if the guest code correctly
>> passes the protocol value as htons(whatever) then that will
>> be the value in arg3 and we do not need to swap anything.
>>
>> I see we had this discussion about the previous version of the
>> patch too... and my argument that we don't need the tswap16
>> in the socketcall code path either still makes sense to me.
> 
> I agree with you, I think I have confused socketcall() that passes
> parameters by memory, and socket() that passes parameters by registers.
> I will remove this part and resend a patch.

And for the socketcall part, we need the tswap16():

for instance,

    int a = htons(0x0003);

On a LE host:

    a = 0x00000300

On a BE host:

    a = 0x00000003

If the guest is BE, it will put in memory:

    0x00 0x00 0x00 0x03

Then a LE host, will read:

    int b = 0x03000000

but get_user_ual() in do_socketcall() will byte-swap it and put
0x00000003 in a[2].

so without the byte-swap, we call do_socket(..., 0x0003),
whereas the syscall is waiting for htons(0x0003) -> 0x0300 as we are on
LE host.

I'm sorry, I can't explain that better...

>>
>>>          fd_trans_unregister(ret);
>>>          break;
>>>  #endif
>>> --
>>> 2.4.3
>>
>> thanks
>> -- PMM
>>
> 
> Thank you for your comments,
> Laurent
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]