On-Device AI with Efficient Transformer Variants

Authors

  • Arvind Rajendra Choudhary, MITE, Moodbidri, Karnataka, India

DOI:

https://doi.org/10.15662/IJRAI.2024.0703002

Keywords:

On-Device AI, Efficient Transformers, Model Compression, Quantization, Hardware-Aware Optimization, Mobile NLP, Edge Computing

Abstract

On-device artificial intelligence (AI) has become increasingly vital for applications requiring real-time processing, privacy preservation, and reduced latency. Transformer models, originally designed for large-scale cloud deployment, have since been adapted to run efficiently on resource-constrained devices. This paper reviews efficient transformer variants tailored for on-device AI, focusing on their architectural innovations, performance benchmarks, and deployment strategies. Key approaches include model compression, quantization, pruning, and hardware-aware optimization. We also discuss the trade-offs between computational efficiency and model accuracy, providing insights into the practical deployment of these models on mobile and embedded systems.
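
As a concrete illustration of one of the approaches named above, the sketch below applies post-training dynamic quantization to a small Transformer encoder in PyTorch. The model, layer sizes, and the size-comparison helper are illustrative assumptions and are not taken from the paper; the sketch only shows how int8 weight quantization of linear layers shrinks the on-device memory footprint while inference still accepts ordinary float tensors.

```python
# Illustrative sketch only (not the paper's method): post-training dynamic
# quantization of a toy Transformer encoder with PyTorch. Linear-layer weights
# are stored in int8; activations are quantized dynamically at run time.
import io

import torch
import torch.nn as nn

# A small encoder standing in for an on-device NLP model (sizes are arbitrary).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, dim_feedforward=512),
    num_layers=4,
)
model.eval()

# Dynamically quantize all plain nn.Linear modules (here, the feed-forward
# layers) to int8 weights.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_mb(m: nn.Module) -> float:
    """Size of the serialized state_dict in megabytes."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32 model: {serialized_mb(model):.2f} MB")
print(f"int8 model: {serialized_mb(quantized):.2f} MB")

# Inference still takes ordinary float tensors: (sequence, batch, d_model).
x = torch.randn(16, 1, 256)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([16, 1, 256])
```

In practice, dynamic quantization of this kind trades a small accuracy loss for roughly 4x smaller linear-layer weights, which is one of the efficiency/accuracy trade-offs discussed in the paper.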

References

1. Wu, Z., Liu, Z., Lin, J., Lin, Y., & Han, S. (2020). Lite Transformer with Long-Short Range Attention. arXiv preprint arXiv:2004.11886.

2. Wang, H., Wu, Z., Liu, Z., Cai, H., Zhu, L., Gan, C., & Han, S. (2020). HAT: Hardware-Aware Transformers for Efficient Natural Language Processing. arXiv preprint arXiv:2005.14187.

3. Ge, T., Chen, S.-Q., & Wei, F. (2022). EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation. arXiv preprint arXiv:2202.07959.

4. Pan, J., Bulat, A., Tan, F., Zhu, X., Dudziak, L., Li, H., Tzimiropoulos, G., & Martinez, B. (2022). EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers. arXiv preprint arXiv:2205.03436.

5. Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

6. Sun, Z., Yu, H., Song, X., Liu, R., Yang, Y., & Zhou, D. (2020). MobileBERT: a compact task-agnostic BERT for resource-limited devices. arXiv preprint arXiv:2004.02984.

7. Jiao, X., et al. (2020). TinyBERT: Distilling BERT for Natural Language Understanding. Findings of EMNLP.

8. Jacob, B., et al. (2018). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. CVPR.

Published

2024-05-01

How to Cite

On-Device AI with Efficient Transformer Variants. (2024). International Journal of Research and Applied Innovations, 7(3), 10718-10721. https://doi.org/10.15662/IJRAI.2024.0703002