N-Epitomizer: A Semantic Offloading Framework Leveraging Essential Information for Timely Neural Network Inferences
Wooseung Nam, Sung Yong Lee, Jinsung Lee, Huijeong Choe, Sangtae Ha, Kyunghan Lee
IEEE/ACM Transactions on Networking
Offloading neural network inferences from resource-constrained mobile devices to an edge server over wireless networks is becoming more crucial as neural networks get heavier. To this end, recent studies have tried to make this offloading process more efficient. However, the fundamental question of how to extract and offload only the minimal information needed to preserve inference accuracy has remained unanswered. We call such ideal offloading semantic offloading and propose N-Epitomizer, a new offloading framework that enables semantic offloading, thus achieving more reliable and timely inferences even in highly fluctuating or low-bandwidth wireless networks. To realize N-Epitomizer, we design an autoencoder-based scalable encoder trained to extract the most informative data and to scale its output size to meet the latency and accuracy requirements of inferences over a network. We also accelerate N-Epitomizer by exploiting lightweight knowledge distillation for the encoder design and decoder slimming for the decoder design, significantly reducing its overall computation time. Moreover, we extend N-Epitomizer to support multiple DNNs by extracting and offloading the union of the essential information required by each DNN.
Our evaluation shows that N-Epitomizer achieves exceptionally high compression for images without compromising inference accuracy: 21×, 77×, and 192× higher than JPEG compression, and 20×, 55×, and 86× higher than GRACE, the state-of-the-art DNN-aware image compression, for semantic segmentation, depth estimation, and classification, respectively. Our results show N-Epitomizer's strong potential as the first semantic offloading system to guarantee end-to-end latency even under highly varying cellular networks.
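The abstract's core mechanism is a scalable encoder whose output size is chosen to meet a latency budget under the current network bandwidth. The paper does not publish code, so the following is only a minimal pure-Python sketch of that sizing-and-truncation idea under stated assumptions: a hypothetical `choose_latent_size` picks how many latent dimensions fit in the latency budget, the "encoder" simply keeps that prefix of a latent vector, and the "decoder" zero-pads it back. The real system uses a trained autoencoder; all names and parameters here are illustrative.

```python
def choose_latent_size(bandwidth_bps, latency_budget_s,
                       bytes_per_dim=4, max_dims=512):
    """Pick how many latent dimensions fit in the latency budget
    at the estimated uplink bandwidth (hypothetical policy)."""
    budget_bytes = bandwidth_bps / 8 * latency_budget_s
    return max(1, min(max_dims, int(budget_bytes // bytes_per_dim)))

def encode_prefix(latent, k):
    """Stand-in for the scalable encoder: transmit only the first
    k latent dimensions (the most informative prefix)."""
    return latent[:k]

def decode_padded(prefix, max_dims):
    """Stand-in for the decoder: zero-pad the truncated latent
    back to its full dimensionality before reconstruction."""
    return prefix + [0.0] * (max_dims - len(prefix))

# Example: a 10 ms transmission budget on a 1 Mbps uplink.
latent = [float(i) for i in range(512)]          # toy latent vector
k = choose_latent_size(bandwidth_bps=1e6, latency_budget_s=0.01)
sent = encode_prefix(latent, k)
restored = decode_padded(sent, 512)
print(k, len(sent), len(restored))               # prints: 312 312 512
```

The design point this illustrates is that the offloaded payload shrinks or grows with the network, so end-to-end latency stays bounded; accuracy then depends on how informative the retained prefix is, which is what the trained encoder optimizes.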
https://doi.org/10.1109/ton.2024.3523891
Topics: Computer science; Artificial neural network; Human–computer interaction; Data science; Artificial intelligence; World Wide Web