Viewing the network as a part of the operating system

Posted by Weitao Wang on May 24, 2021

Latency and throughput demands on data center networks keep rising, driven by new technologies such as NVMe, Optane memory, in-memory file systems, and disaggregated data centers. The bottleneck for services is shifting from host-side CPU/GPU computation to network transmission; one indication is that most data centers have replaced over-subscribed networks with non-blocking ones.

To solve this problem, industry and academia have both accepted that the network should be viewed as an I/O device of the operating system, and that a latency target should be handed to the network stack along with each transmission task. However, is it really a good choice to leave network transmissions unattended and accept the possibility of SLO violations?

The benefit of this new view is explained in one of my papers: MXDAG: A Hybrid Abstraction for Cluster Applications.