Model Compression & Optimization

Model compression has emerged as an important area of research for deploying deep learning models on IoT devices. However, compression alone is often not enough to fit a model within the memory of a single device, so the model must be distributed across multiple devices. This leads to a distributed inference paradigm in which communication cost becomes another major bottleneck. To address this, we focus on knowledge distillation and teacher–student architectures for distributed model compression, as well as data-independent model compression.
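
As a concrete illustration of the teacher–student idea, below is a minimal PyTorch sketch of the classic knowledge distillation loss (Hinton et al., 2015), where a small student is trained to match the softened outputs of a larger teacher. The temperature T, the weighting alpha, and the training-step usage are illustrative assumptions, not the exact setups used in the publications listed here.

```python
# Minimal sketch of teacher-student knowledge distillation (Hinton et al., 2015).
# T and alpha are illustrative assumptions, not values from the papers below.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Combine a soft loss (match the teacher's softened output distribution)
    with a hard loss (standard cross-entropy against the true labels)."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: usual supervised loss on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Typical training step (teacher frozen, only the student is updated):
# with torch.no_grad():
#     teacher_logits = teacher(x)
# loss = distillation_loss(student(x), teacher_logits, y)
```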

Selected Publications

Farcas, Allen-Jasmin; Li, Guihong; Bhardwaj, Kartikeya; Marculescu, Radu

A Hardware Prototype Targeting Distributed Deep Learning for On-Device Inference (Inproceedings)

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 398–399, 2020.

Bhardwaj, Kartikeya; Suda, Naveen; Marculescu, Radu

Dream Distillation: A Data-Independent Model Compression Framework (Journal Article)

arXiv preprint arXiv:1905.07072, 2019.

Bhardwaj, Kartikeya; Lin, Ching-Yi; Sartor, Anderson; Marculescu, Radu

Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT (Journal Article)

ACM Transactions on Embedded Computing Systems (TECS), 18 (5s), pp. 1–22, 2019.

Sartor, Anderson Luiz; Becker, Pedro Henrique Exenberger; Wong, Stephan; Marculescu, Radu; Beck, Antonio Carlos Schneider

Machine Learning-Based Processor Adaptability Targeting Energy, Performance, and Reliability (Inproceedings)

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 158–163, IEEE, 2019.
