% Network Subsystem
% Thadeu Cascardo

# Socket Buffers

* include linux/skbuff.h
* struct sk\_buff
    - struct net\_device *dev
* alloc\_skb(len, gfp)
* dev\_alloc\_skb(len) - uses GFP\_ATOMIC, reserves NET\_SKB\_PAD bytes of headroom
* netdev\_alloc\_skb(netdev, len) - uses GFP\_ATOMIC, since 2.6.18
* kfree\_skb, dev\_kfree\_skb

# Diagram

* head - start of the allocated buffer
* end - end of the allocated buffer
* data - start of data, after the headroom
* tail - end of used space
* headroom - space between head and data
* tailroom - space between tail and end

# Socket Buffer operations

* skb\_put(skb, len) - appends data at the end of the skb
* skb\_push(skb, len) - prepends data at the start of the skb
* skb\_pull(skb, len) - removes data from the start of the skb
* skb\_headroom and skb\_tailroom
* skb\_reserve - only allowed on an empty buffer, reserves headroom
* skb\_orphan - releases the skb from its owning socket
* A usage sketch appears in the example slides near the end of this deck

# Network Device

* include linux/netdevice.h
* struct net\_device
    - char name[]
    - features
    - stats
    - netdev\_ops
    - ethtool\_ops
    - header\_ops
    - flags
    - mtu
    - type
    - hard\_header\_len

# Network Device Setup

* alloc\_netdev(szpriv, name, setup)
* setup function
* include linux/etherdevice.h
* ether\_setup
* alloc\_etherdev
* register\_netdev
* unregister\_netdev
* free\_netdev

# Network Device Operations

* struct net\_device\_ops
* ndo\_init
* ndo\_open - should call netif\_start\_queue
* ndo\_stop - should call netif\_stop\_queue

# Network Device Address

* struct net\_device
    - dev\_addr
* random\_ether\_addr
* struct net\_device\_ops
    - ndo\_set\_mac\_address
* eth\_mac\_addr

# Transmission

* ndo\_start\_xmit
* Called with softirqs disabled or in softirq context
* Called with the device transmit lock held

# Limits on transmission

* When TX buffers are full, xmit may call netif\_stop\_queue
* The driver should arrange for netif\_wake\_queue to be called once TX buffers are free again (usually from the TX-done interrupt)

# Transmission timeout

* Transmission may time out after the queue has been stopped
* The core already records the time the last packet was transmitted
* The driver should set watchdog\_timeo and ndo\_tx\_timeout

# Reception

* Usually happens in an interrupt handler
* The driver allocates the skb: some drivers allocate buffers at setup time and arrange for the device to write into them directly
* Must set the skb protocol field: for Ethernet drivers, eth\_type\_trans does it
* Finally, call netif\_rx

# NAPI

* For better performance, NAPI introduces polling, avoiding an interrupt per packet when load is high
* The driver disables interrupts and enables polling in its interrupt handler when RX happens
* The network subsystem uses a softirq to do the polling
* The driver poll function disables polling and re-enables interrupts when it is done with its hardware queue

# NAPI

* struct napi\_struct
* netif\_napi\_add(dev, napi, poll\_func, weight)
* napi\_enable: called in open
* napi\_disable: called in stop - waits for any running poll to finish
* napi\_schedule
    - napi\_schedule\_prep
    - \_\_napi\_schedule
* napi\_complete: called in poll when all work is done
* Use netif\_receive\_skb instead of netif\_rx

# NAPI step by step

* In the interrupt handler:
    - Check that the interrupt received is RX
    - Call napi\_schedule\_prep to check that NAPI isn't already scheduled
    - Disable RX interrupts
    - Call \_\_napi\_schedule

# Weight and Budget

* The weight is the starting budget for the interface, usually 16
* The poll function must not dequeue more frames than the budget
* It must call napi\_complete if and only if it has exhausted the hardware queues using less than the budget
* It must return the number of entries processed from the queue
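# Example: socket buffer operations

A minimal sketch of the buffer geometry, assuming hypothetical `payload`/`plen` data and a prebuilt Ethernet header `eth`; real drivers would use the helpers shown on the earlier slides in much the same way.

```c
#include <linux/string.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>

static struct sk_buff *my_build_frame(const void *eth, const void *payload,
				      unsigned int plen)
{
	struct sk_buff *skb;

	skb = alloc_skb(ETH_HLEN + plen, GFP_KERNEL);
	if (!skb)
		return NULL;

	skb_reserve(skb, ETH_HLEN);		/* headroom for the header */
	memcpy(skb_put(skb, plen), payload, plen);	/* tail grows by plen */
	memcpy(skb_push(skb, ETH_HLEN), eth, ETH_HLEN);	/* data moves back
							   into the headroom */

	/* now skb_headroom(skb) == 0 and skb->len == ETH_HLEN + plen */
	return skb;
}
```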
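# Example: device setup and net\_device\_ops

The next few slides build a hypothetical skeleton driver (all `my_*` names are invented for illustration). This one covers allocation, the ops table, and registration; `my_start_xmit` and `my_tx_timeout` are filled in on the TX slide.

```c
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

struct my_priv {
	struct net_device *dev;
	struct napi_struct napi;	/* used on the NAPI slide */
};

static netdev_tx_t my_start_xmit(struct sk_buff *skb, struct net_device *dev);
static void my_tx_timeout(struct net_device *dev);

static int my_open(struct net_device *dev)
{
	/* bring the hardware up, request the IRQ... (elided) */
	netif_start_queue(dev);
	return 0;
}

static int my_stop(struct net_device *dev)
{
	netif_stop_queue(dev);
	/* shut the hardware down, free the IRQ... (elided) */
	return 0;
}

static const struct net_device_ops my_netdev_ops = {
	.ndo_open		= my_open,
	.ndo_stop		= my_stop,
	.ndo_start_xmit		= my_start_xmit,
	.ndo_tx_timeout		= my_tx_timeout,
	.ndo_set_mac_address	= eth_mac_addr,	/* generic helper */
};

static struct net_device *my_dev;

static int __init my_init(void)
{
	int err;

	my_dev = alloc_etherdev(sizeof(struct my_priv));  /* ether_setup() applied */
	if (!my_dev)
		return -ENOMEM;

	my_dev->netdev_ops = &my_netdev_ops;
	my_dev->watchdog_timeo = 5 * HZ;	/* arm the TX watchdog */
	random_ether_addr(my_dev->dev_addr);	/* random local MAC */

	err = register_netdev(my_dev);
	if (err)
		free_netdev(my_dev);
	return err;
}
module_init(my_init);

static void __exit my_exit(void)
{
	unregister_netdev(my_dev);
	free_netdev(my_dev);
}
module_exit(my_exit);
```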
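# Example: transmission

Continuing the skeleton: a TX sketch assuming hypothetical `my_hw_*()` helpers that copy the frame synchronously into a TX ring, which is why the skb can be freed right away.

```c
static netdev_tx_t my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);

	my_hw_queue_frame(priv, skb->data, skb->len);	/* hypothetical */

	if (my_hw_tx_ring_full(priv))	/* hypothetical: no room left */
		netif_stop_queue(dev);	/* stop until TX-done frees space */

	dev_kfree_skb(skb);	/* safe here: the data was already copied */
	return NETDEV_TX_OK;
}

/* Called from the TX-done interrupt once descriptors are free again. */
static void my_tx_done(struct net_device *dev)
{
	/* reclaim descriptors... (elided) */
	if (netif_queue_stopped(dev))
		netif_wake_queue(dev);
}

/* Watchdog hook, fired when no TX completes within watchdog_timeo. */
static void my_tx_timeout(struct net_device *dev)
{
	dev->stats.tx_errors++;
	/* reset the hardware... (elided) */
	netif_wake_queue(dev);
}
```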
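# Example: reception

A sketch of the classic (non-NAPI) RX path, run from the interrupt handler; `my_hw_rx_len()`/`my_hw_rx_copy()` stand in for real hardware accessors.

```c
static void my_rx(struct net_device *dev)
{
	unsigned int len = my_hw_rx_len(dev);	/* hypothetical */
	struct sk_buff *skb;

	skb = netdev_alloc_skb(dev, len + NET_IP_ALIGN);
	if (!skb) {
		dev->stats.rx_dropped++;
		return;
	}
	skb_reserve(skb, NET_IP_ALIGN);		/* align the IP header */
	my_hw_rx_copy(dev, skb_put(skb, len));	/* copy the frame in */

	skb->protocol = eth_type_trans(skb, dev);  /* also pulls the MAC header */
	netif_rx(skb);				   /* hand to the stack */

	dev->stats.rx_packets++;
	dev->stats.rx_bytes += len;
}
```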
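# Example: NAPI

A NAPI sketch under the same assumptions: the interrupt handler only disables RX interrupts and schedules the poll, and the poll function honors the budget. Setup would be `netif_napi_add(dev, &priv->napi, my_poll, 16)` in init, `napi_enable` in open and `napi_disable` in stop.

```c
#include <linux/interrupt.h>

static irqreturn_t my_interrupt(int irq, void *dev_id)
{
	struct net_device *dev = dev_id;
	struct my_priv *priv = netdev_priv(dev);

	if (!my_hw_irq_is_rx(priv))		/* hypothetical status check */
		return IRQ_NONE;

	if (napi_schedule_prep(&priv->napi)) {	/* not already scheduled? */
		my_hw_disable_rx_irq(priv);	/* hypothetical */
		__napi_schedule(&priv->napi);
	}
	return IRQ_HANDLED;
}

/* Runs in softirq context with the given budget. */
static int my_poll(struct napi_struct *napi, int budget)
{
	struct my_priv *priv = container_of(napi, struct my_priv, napi);
	int work_done = 0;

	while (work_done < budget && my_hw_rx_pending(priv)) {
		struct sk_buff *skb = my_hw_build_skb(priv);  /* alloc + fill */

		skb->protocol = eth_type_trans(skb, priv->dev);
		netif_receive_skb(skb);		/* not netif_rx in NAPI mode */
		work_done++;
	}

	if (work_done < budget) {	/* hardware queue exhausted under budget */
		napi_complete(napi);
		my_hw_enable_rx_irq(priv);	/* back to interrupt mode */
	}
	return work_done;
}
```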
# Changes in net device

* Use netdev\_priv; there is no priv member anymore
* struct net\_device\_ops introduced in 2.6.29, with compatibility provided for old drivers
* Compatibility removed in 2.6.31
* netdev\_tx\_t: NETDEV\_TX\_OK, NETDEV\_TX\_BUSY, NETDEV\_TX\_LOCKED

# Other recent changes

* Some members moved to netdev\_queue to improve cache-line usage
* GRO/GSO - generic receive offload and generic segmentation offload
    - Handle hardware checksum acceleration
* Multi-queue support, for devices with multiple hardware queues, so they can be handled on different CPUs
* RPS - Receive Packet Steering, which distributes protocol processing among multiple CPUs on a single-device, single-queue system
* RFS - Receive Flow Steering, which tries to handle the packet on the CPU where the consuming application is running
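# Example: the new-style xmit signature

A small sketch of the porting direction described above: private data comes from netdev\_priv, and ndo\_start\_xmit returns netdev\_tx\_t instead of int (`my_hw_tx_ring_full` is, again, hypothetical).

```c
/* Old style (removed): struct my_priv *priv = dev->priv; */
static netdev_tx_t my_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);

	if (my_hw_tx_ring_full(priv)) {
		netif_stop_queue(dev);
		return NETDEV_TX_BUSY;	/* the core requeues the skb */
	}
	/* ...queue the frame... (elided) */
	dev_kfree_skb(skb);
	return NETDEV_TX_OK;
}
```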