
VLAN Device Performance Problems on CentOS 6.5

2020-06-16 17:50:57

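Problem description

Our earlier network performance tests were all run on a layer 3 network. Recently, when retesting TDocker network performance on a large layer 2 network, we found that the physical host performed worse than the container: inside the container the test reached 600,000+, while the physical machine managed only 450,000+. Both figures are far below the expected 1,000,000+.

Because the large layer 2 network brings in a VLAN device (needed because the Linux bridge does not support VLAN), the initial suspicion was that the problem lay in the VLAN network device.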

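A look with perf showed that a spin lock in dev_queue_xmit was consuming a large share of the CPU, 70%+.

With a 3.10.x kernel, however, the problem does not occur:

As seen above, the kernel spin lock overhead under 3.10.x is very small. The call path in the latter case also shows that the spin lock is mainly hit when the sk_buff is passed from the VLAN device down to the physical NIC, not when it is passed from the protocol stack to the VLAN device. So for CentOS 6.5 (2.6.32-431), the problem lies mainly in the VLAN device.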

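Root cause analysis

Start with the dev_queue_xmit function, which is the entry point from the protocol stack to the underlying network device.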

dev_queue_xmit

//net/core/dev.c
int dev_queue_xmit(struct sk_buff *skb)
{
    struct net_device *dev = skb->dev;
    struct netdev_queue *txq;
    struct Qdisc *q;
...
    txq = netdev_pick_tx(dev, skb);
    q = rcu_dereference(txq->qdisc);

    trace_net_dev_queue(skb);
    if (q->enqueue) { ///a VLAN device has no qdisc queue; see noqueue_qdisc
        rc = __dev_xmit_skb(skb, q, dev, txq);
        goto out;
    }

    /* The device has no queue. Common case for software devices:
       loopback, all the sorts of tunnels...

       Really, it is unlikely that netif_tx_lock protection is necessary
       here.  (f.e. loopback and IP tunnels are clean ignoring statistics
       counters.)
       However, it is possible, that they rely on protection
       made by us here.

       Check this and shot the lock. It is not prone from deadlocks.
       Either shot noqueue qdisc, it is even simpler 8)
     */
    if (dev->flags & IFF_UP) {
        int cpu = smp_processor_id(); /* ok because BHs are off */

        if (txq->xmit_lock_owner != cpu) {

            HARD_TX_LOCK(dev, txq, cpu);

            if (!netif_tx_queue_stopped(txq)) {
                rc = NET_XMIT_SUCCESS;
                if (!dev_hard_start_xmit(skb, dev, txq)) {
                    HARD_TX_UNLOCK(dev, txq);
                    goto out;
                }
            }
            HARD_TX_UNLOCK(dev, txq);
        } 
    }

    rc = -ENETDOWN;
    rcu_read_unlock_bh();

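As the code shows, before handing an sk_buff to the network device driver, the kernel tries to take the queue's xmit_lock. This prevents multiple CPUs on an SMP system from passing data to the driver at the same time. In practice, many drivers (software devices in particular) already handle locking themselves, which makes this xmit_lock somewhat redundant. The kernel therefore introduced NETIF_F_LLTX: a driver that does its own locking sets the NETIF_F_LLTX flag, and dev_queue_xmit then skips taking xmit_lock.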

TX_LOCK

#define HARD_TX_LOCK(dev, txq, cpu) {           \
    if ((dev->features & NETIF_F_LLTX) == 0) {  \
        __netif_tx_lock(txq, cpu);              \
    }                                           \
}

static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
{
    spin_lock(&txq->_xmit_lock);
    txq->xmit_lock_owner = cpu;
}

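The code above shows that when a network device sets NETIF_F_LLTX, the kernel does not take xmit_lock.

The CentOS 6.5 (2.6.32-431) kernel, however, does not set NETIF_F_LLTX on VLAN devices. Because a VLAN device has only one queue, every sending CPU contends for the same xmit_lock, which drives sys CPU above 70%.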
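To make the effect of the flag concrete, here is a minimal user-space sketch of the HARD_TX_LOCK decision. It is not kernel code: `mock_txq`, `mock_hard_tx_lock`, `mock_hard_tx_unlock` and `mock_send` are hypothetical names made up for illustration, the counter merely stands in for `spin_lock(&txq->_xmit_lock)`, and the value 4096 for NETIF_F_LLTX is the one listed in this article's 2.6.32 feature table. The model is single-threaded, so it shows how often the lock would be taken, not the contention itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value as listed for NETIF_F_LLTX in this article's 2.6.32 feature table. */
#define NETIF_F_LLTX 4096u

/* Hypothetical stand-in for struct netdev_queue: it only counts lock traffic. */
struct mock_txq {
    unsigned long lock_acquisitions;
    int xmit_lock_owner;            /* -1 while the lock is free */
};

/* Same test as HARD_TX_LOCK: take the queue lock only when LLTX is clear. */
bool mock_hard_tx_lock(uint32_t features, struct mock_txq *txq, int cpu)
{
    if ((features & NETIF_F_LLTX) != 0)
        return false;               /* lockless device: xmit_lock is skipped */
    txq->lock_acquisitions++;       /* stands in for spin_lock(&txq->_xmit_lock) */
    txq->xmit_lock_owner = cpu;
    return true;
}

/* Release the modeled lock. */
void mock_hard_tx_unlock(struct mock_txq *txq)
{
    txq->xmit_lock_owner = -1;
}

/* Push n_packets through the model; return how often the lock was taken. */
unsigned long mock_send(uint32_t features, unsigned long n_packets)
{
    struct mock_txq txq = { 0, -1 };

    for (unsigned long i = 0; i < n_packets; i++) {
        bool locked = mock_hard_tx_lock(features, &txq, 0);
        /* dev_hard_start_xmit() would run here in the kernel */
        if (locked)
            mock_hard_tx_unlock(&txq);
    }
    return txq.lock_acquisitions;
}
```

In this model, a device whose feature mask lacks the LLTX bit takes the lock once per packet, while the same mask with the bit set never takes it. On a real multi-CPU machine, each of those per-packet acquisitions is a point where CPUs sending through the same queue can collide.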

  • 2.6.32-431
static int vlan_dev_init(struct net_device *dev)
{
    struct net_device *real_dev = vlan_dev_info(dev)->real_dev;
...
    /* IFF_BROADCAST|IFF_MULTICAST; ??? */
    dev->flags  = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI);
    dev->iflink = real_dev->ifindex;
    dev->state  = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) |
                      (1<<__LINK_STATE_DORMANT))) |
              (1<<__LINK_STATE_PRESENT);

    dev->features |= real_dev->features & real_dev->vlan_features;
...
  • 3.10.x

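In the 3.10.x kernel a VLAN device also has only one queue, so why is there no performance problem?

The answer is that the 3.10.x kernel sets NETIF_F_LLTX on VLAN devices, so even with a single queue there is no xmit_lock overhead.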

static int vlan_dev_init(struct net_device *dev)
{
    struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
...
    /* IFF_BROADCAST|IFF_MULTICAST; ??? */
    dev->flags  = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI |
                      IFF_MASTER | IFF_SLAVE);
    dev->iflink = real_dev->ifindex;
    dev->state  = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) |
                      (1<<__LINK_STATE_DORMANT))) |
              (1<<__LINK_STATE_PRESENT);

    dev->hw_features = NETIF_F_ALL_CSUM | NETIF_F_SG |
               NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |
               NETIF_F_HIGHDMA | NETIF_F_SCTP_CSUM |
               NETIF_F_ALL_FCOE;

    dev->features |= real_dev->vlan_features | NETIF_F_LLTX;

Checking network device features

In general, ethtool -k shows the features of a network device:

  • 2.6.32-431
# ethtool  -k eth1.11
Features for eth1.11:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off

On CentOS 6.5 (2.6.32-431), the features are read from /sys/class/net/${ethX}/features:

#cat /sys/class/net/eth1.11/features
0x114833
--------------------------
1 0001 0100 1000 0011 0011  0x114833
          1 0000 0000 0000  NETIF_F_LLTX    4096
            1000 0000 0000  NETIF_F_GSO     2048
     1 0000 0000 0000 0000  NETIF_F_TSO     1<<16
        100 0000 0000 0000  NETIF_F_GRO     16384
                        01  NETIF_F_SG      1
                        10  NETIF_F_IP_CSUM 2
                    1 0000  NETIF_F_IPV6_CSUM  16
                   10 0000  NETIF_F_HIGHDMA  32
1 0000 0000 0000 0000 0000  NETIF_F_TSO6  (1<<20)

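As the breakdown shows, the NETIF_F_LLTX bit (4096) is clear in 0x114833: the CentOS 6.5 kernel does not set the NETIF_F_LLTX flag on VLAN devices.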
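The manual bit breakdown can be automated. The sketch below parses the text of the sysfs features attribute and lists the flags that are set. The bit positions are copied from the table in this article, so they apply to the 2.6.32-era layout only and are an assumption for any other kernel. `feature_bits`, `parse_features` and `describe_features` are hypothetical helper names, not part of ethtool or the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct feature_bit {
    uint32_t mask;
    const char *name;
};

/* Bit values as listed in this article's table (2.6.32-era layout). */
static const struct feature_bit feature_bits[] = {
    { 1u << 0,  "NETIF_F_SG" },
    { 1u << 1,  "NETIF_F_IP_CSUM" },
    { 1u << 4,  "NETIF_F_IPV6_CSUM" },
    { 1u << 5,  "NETIF_F_HIGHDMA" },
    { 1u << 11, "NETIF_F_GSO" },
    { 1u << 12, "NETIF_F_LLTX" },
    { 1u << 14, "NETIF_F_GRO" },
    { 1u << 16, "NETIF_F_TSO" },
    { 1u << 20, "NETIF_F_TSO6" },
};

/* Parse the text of the sysfs attribute, for example "0x114833\n". */
uint32_t parse_features(const char *text)
{
    return (uint32_t)strtoul(text, NULL, 0);
}

/*
 * Write the space-separated names of the set bits into buf.
 * Returns the bits of `features` that the table above does not cover.
 */
uint32_t describe_features(uint32_t features, char *buf, size_t buf_len)
{
    size_t used = 0;
    uint32_t known = 0;

    if (buf_len > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < sizeof(feature_bits) / sizeof(feature_bits[0]); i++) {
        known |= feature_bits[i].mask;
        if ((features & feature_bits[i].mask) == 0 || used >= buf_len)
            continue;   /* bit clear, or no room left in buf */
        int n = snprintf(buf + used, buf_len - used, "%s%s",
                         used ? " " : "", feature_bits[i].name);
        if (n > 0)
            used += (size_t)n;
    }
    return features & ~known;
}
```

Passing the string read from /sys/class/net/eth1.11/features to `parse_features` and then calling `describe_features` with a buffer of a few hundred bytes yields a list without NETIF_F_LLTX for the value 0x114833, and a return value of 0 because every set bit is covered by the table.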

  • 3.10.x

The 3.10.x kernel no longer has /sys/class/net/${ethX}/features, but it supports the ETHTOOL_GFEATURES command (2.6.32-431 does not), and ethtool uses ETHTOOL_GFEATURES to obtain the features of a network device:

//net/core/ethtool.c
int dev_ethtool(struct net *net, struct ifreq *ifr)
{   
    case ETHTOOL_GFEATURES:
        rc = ethtool_get_features(dev, useraddr);
        break;
# ./ethtool -k eth1.11 | grep tx-lockless
tx-lockless: on [fixed]
# ./ethtool -k eth1 | grep tx-lockless   
tx-lockless: off [fixed]

This confirms that the 3.10.x kernel does set the NETIF_F_LLTX flag on VLAN devices.

  • ethtool implementation
//ethtool-3.5
static struct feature_state *
get_features(struct cmd_context *ctx, const struct feature_defs *defs)
{
...
    if (defs->n_features) { ///the kernel supports ETHTOOL_GFEATURES
        state->features.cmd = ETHTOOL_GFEATURES;
        state->features.size = FEATURE_BITS_TO_BLOCKS(defs->n_features);
        err = send_ioctl(ctx, &state->features);
        if (err)
            perror("Cannot get device generic features");
        else
            allfail = 0;
    } else {
        /* We should have got VLAN tag offload flags through
         * ETHTOOL_GFLAGS.  However, prior to Linux 2.6.37
         * they were not exposed in this way - and since VLAN
         * tag offload was defined and implemented by many
         * drivers, we shouldn't assume they are off.
         * Instead, since these feature flag values were
         * stable, read them from sysfs.
         */
        char buf[20]; ///read the features from /sys/class/net/%s/features
        if (get_netdev_attr(ctx, "features", buf, sizeof(buf)) > 0)
            state->off_flags |=
                strtoul(buf, NULL, 0) &
                (ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN);
    }


static int get_netdev_attr(struct cmd_context *ctx, const char *name,
            char *buf, size_t buf_len)
{
#ifdef TEST_ETHTOOL
    errno = ENOENT;
    return -1;
#else
    char path[40 + IFNAMSIZ];
    ssize_t len;
    int fd;

    len = snprintf(path, sizeof(path), "/sys/class/net/%s/%s",
               ctx->devname, name);
    assert(len < sizeof(path));
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return fd;
    len = read(fd, buf, buf_len - 1);
    if (len >= 0)
        buf[len] = 0;
    close(fd);
    return len;
#endif
} 

Permanent link to this article: http://www.linuxidc.com/Linux/2015-10/124529.htm

