CLOS network: A parameter-efficient alternative to linear layers in neural networks


Orgil Jargalsaikhan
Magvan-Erdene Gantulga
Nomin-Erdene Zorigtbaatar
Battsetseg Erdenebat

Keywords

Linear layer, CLOS network, Transformers, parameter reduction

Abstract

In this paper, we propose using the CLOS network architecture to replace traditional linear layers in deep learning models, including transformers. The CLOS network, a topology commonly used in communication and switching systems, is adapted to neural networks to reduce parameter counts while maintaining model performance. Our experiments show that the CLOS network matches the accuracy and loss of a conventional linear layer while using fewer parameters. However, this efficiency comes at the cost of increased processing time, with the CLOS layers running 1.5x to 3x slower. Despite this trade-off, the CLOS network can be an effective alternative for parameter reduction in a variety of architectures, including large models such as transformers.
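To make the idea concrete, the sketch below shows one way a Clos-style topology could stand in for a dense linear layer: three block-diagonal stages with a channel shuffle between them, so every input coordinate can still reach every output. The paper's exact construction is not given on this page, so the class name, group count, and shuffle scheme here are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class ClosLinear(nn.Module):
    """Illustrative Clos-style replacement for a square nn.Linear(dim, dim).

    Each of the three stages applies an independent small weight matrix per
    group (a block-diagonal map); a channel shuffle between stages routes
    activations across groups, mimicking the three-stage Clos fabric.
    This is a sketch under assumed design choices, not the paper's code.
    """

    def __init__(self, dim: int, groups: int):
        super().__init__()
        assert dim % groups == 0, "dim must be divisible by groups"
        self.groups = groups
        self.block = dim // groups
        # 3 * groups * block^2 = 3 * dim * block parameters,
        # versus dim^2 for a dense linear layer.
        self.stages = nn.ParameterList(
            [nn.Parameter(torch.randn(groups, self.block, self.block) / self.block ** 0.5)
             for _ in range(3)]
        )

    def _shuffle(self, x: torch.Tensor) -> torch.Tensor:
        # Interleave groups so the next stage mixes outputs of different groups.
        b = x.shape[0]
        return x.view(b, self.groups, self.block).transpose(1, 2).reshape(b, -1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.shape[0]
        for i, w in enumerate(self.stages):
            xg = x.view(b, self.groups, self.block)   # split channels into groups
            xg = torch.einsum("bgi,gio->bgo", xg, w)  # per-group (block x block) matmul
            x = xg.reshape(b, -1)
            if i < 2:
                x = self._shuffle(x)                  # route between stages
        return x

With dim = 512 and groups = 16, this sketch uses 3 * 512 * 32 = 49,152 weights against 262,144 for a dense 512 x 512 layer, which is consistent with the parameter-for-time trade-off the abstract reports.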

