For the past week I’ve been struggling with a very particular issue in the Linux kernel. I seriously doubt that anyone else will have this issue, but if they do, perhaps a bit of luck and SEO will bring them to this page.
For part of my thesis work, I’m dealing with network devices—think Ethernet
and Wi-Fi cards, although they can also be “virtual”, created by a program like
a VPN. The kernel represents these devices, both physical and virtual, with the
struct net_device
. I don’t know much about them, but I do know that Chapter 2
of Understanding Linux Network Internals by Christian Benvenuti will give you
a good overview of the structure.
Anyway, back in the “good old days” of the kernel, you could iterate over the list of network devices quite easily. You would have to acquire a lock for safety, but then you just accessed a global linked list. However, back in 2009, network namespaces were added to the kernel.
Network namespaces serve the same purpose as any other type of namespace in the kernel. They help partition resources into groups, so that processes can be placed in these namespaces and only have access to some of the available resources. These are the foundation of technologies like containers. In the case of the network namespace, the resources being partitioned are network devices, along with firewall rules and all the other parts of the network stack.
Unfortunately, this complicates things if you’re looking to simply iterate over
the available network devices, or maybe find the device named eth0
or tun0
.
Instead of a global list of all network devices, each namespace has its own
list, and in order to get a network device you first need to have a reference to
the network namespace that device is in (struct net
). But there may be many
namespaces, or network namespaces may not be enabled in the kernel config! So
how do you get the correct struct net *
for looking up a network device?
In my case, I was working on behalf of a particular socket (struct sock
). I
figured that a socket is a network resource, and so it ought to belong to a
network namespace. So how do I get that namespace from the socket? I searched
through the definition of struct sock
and found that it has a member named
sk_net
(sort of 1), which has type possible_net_t
. The definition of
possible_net_t
is:
// find me in include/net/net_namespace.h
typedef struct {
#ifdef CONFIG_NET_NS
struct net *net;
#endif
} possible_net_t;
So, sockets will only contain a reference to their containing namespace when
namespaces are enabled—a smart optimization that saves (usually) 8 bytes of
memory for every single socket on the system. But how do you get the namespace
when CONFIG_NET_NS
is not enabled? The struct net
still exists in this
situation, but userspace cannot create new ones, so there will only ever be one
namespace.
It turns out that there is a function exactly for this purpose: read_pnet()
:
// find me in include/net/net_namespace.h
static inline struct net *read_pnet(const possible_net_t *pnet)
{
#ifdef CONFIG_NET_NS
return pnet->net;
#else
return &init_net;
#endif
}
This function ensures that when you read one of these possible_net_t
structures, you will always get the correct struct net *
, even when the config
option is disabled! Even more conveniently, there is a helper function for
sockets that does this for you:
// find me in include/net/sock.h
static inline
struct net *sock_net(const struct sock *sk)
{
return read_pnet(&sk->sk_net);
}
So at the end of the day, the answer to the question “how do I get a struct net
*
from a struct sock *
is quite simple: use sock_net()
”. In retrospect, it
sounds almost obvious, as if it were some helper method or attribute of a class!
But believe it or not, making this discovery (as banal as it may seem) took me
several hours (think, like, at least 8 hours of staring at a screen).
struct sock
… there’s also struct inet_sock
and struct tcp_sock
. Plus,
if you happen to work on MPTCP, there’s struct mptcp_sock
, plus the
additional bonus round of figuring out what the difference between a “master”
and “meta” socket is.struct net
was in the real world.net_device
,
net
, and sock
(which could not be more generic…).sk_net
field of struct sock
, I tried to use it directly,
until I compiled with a configuration that had CONFIG_NET_NS
disabled and
got an angry compiler error.read_pnet()
and then sock_net()
.
Any documentation at all (even a comment above the functions in the header)
would have been awesome for getting me there sooner, but alas, there was
none.I think most developers have experienced that issue they spent way too much time investigating, only to find that there was a simple solution under their very nose. The trouble here is that the pace of kernel development is so fast that extra documentation would simply go out of date in a few months or years, adding to the wasteland of old information future learners need to search through. No matter what, it seems like these questions are best answered by reading the code.
Actually, as is normal in the kernel, the true story is a bit more
complicated. Instead of having a sk_net
element in struct sock
, there is
a struct sock_common
which holds (you guessed it) common variables for all
sockets. They use some clever preprocessor magic to make you think that
sk_net
exists in struct sock
, when really it’s named skc_net
inside
of struct sock_common
. Check out the summary below:
// find me in include/net/sock.h
struct sock_common {
/* ... */
possible_net_t skc_net;
/* ... */
}
struct sock {
/* ... */
struct sock_common __sk_common;
/* ... */
#define sk_net __skc_common.skc_net
/* ... */
}