Publications
Publications in reverse chronological order.
An up-to-date list is available on Google Scholar.
2026
- [TMLR] Layer Collapse Can be Induced by Unstructured Pruning
  Zhu Liao, Victor Quétu, Van-Tam Nguyen, and Enzo Tartaglione
  Transactions on Machine Learning Research, 2026
Unstructured pruning is a popular compression method for efficiently reducing model parameters. However, while it effectively decreases the number of parameters, it is commonly believed that unstructured pruning cannot shorten the computational critical path, i.e., the maximum number of layers traversed during forward propagation. In this paper, we study when and how unstructured pruning can yield structural effects. For rectifier-activated networks, we introduce the notion of neuron entropy, which quantifies the degree of nonlinearity utilization. We show that magnitude-based pruning naturally lowers this entropy, sometimes down to zero-entropy layers that become linearizable and can thus be removed. Building on this insight, we propose a method that leverages "unstructured" pruning to favor sparsity in low-entropy layers, enabling their complete removal. We validate the phenomenon across CNNs, Vision Transformers, and NLP models: unstructured pruning can induce effective layer removal with little or no performance degradation in over-parameterized networks. Our code is available at https://github.com/ZhuLIAO001/NEPENTHE.git.
@article{liao2026layer,
  title = {Layer Collapse Can be Induced by Unstructured Pruning},
  author = {Liao, Zhu and Qu{\'e}tu, Victor and Nguyen, Van-Tam and Tartaglione, Enzo},
  journal = {Transactions on Machine Learning Research},
  issn = {2835-8856},
  year = {2026},
}
2025
- [ICCV] FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
  Haicheng Wang, Zhemeng Yu, Gabriele Spadaro, Chen Ju, Victor Quétu, and Enzo Tartaglione
  ICCV, 2025
Recently, Multi-modal Large Language Models (MLLMs) have shown remarkable effectiveness for multi-modal tasks due to their abilities to generate and understand cross-modal data. However, processing long sequences of visual tokens extracted from visual backbones poses a challenge for deployment in real-time applications. To address this issue, we introduce FOLDER, a simple yet effective plug-and-play module designed to reduce the length of the visual token sequence, mitigating computational and memory demands during both training and inference. Through a comprehensive analysis of the token reduction process in the vision encoder, we analyze the information loss introduced by different reduction strategies and develop FOLDER to preserve key information while removing visual redundancy. We show the effectiveness of FOLDER by integrating it into the visual backbone of various MLLMs, significantly accelerating the inference phase. Furthermore, we evaluate its utility as a training accelerator or even performance booster for MLLMs. In both contexts, FOLDER achieves comparable or even better performance than the original models, while dramatically reducing complexity by removing up to 70% of visual tokens.
@article{wang2025folder,
  title = {FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance},
  author = {Wang, Haicheng and Yu, Zhemeng and Spadaro, Gabriele and Ju, Chen and Qu{\'e}tu, Victor and Tartaglione, Enzo},
  journal = {ICCV},
  year = {2025},
}
- [ICCV] LaCoOT: Layer Collapse through Optimal Transport
  Victor Quétu, Zhu Liao, Nour Hezbri, Fabio Pizzati, and Enzo Tartaglione
  ICCV, 2025
Although deep neural networks are well-known for their outstanding performance in tackling complex tasks, their hunger for computational resources remains a significant hurdle, posing energy consumption issues, restricting their deployment on resource-constrained devices, and preventing their widespread adoption. In this paper, we present an optimal transport-based method to reduce the depth of over-parametrized deep neural networks, alleviating their computational burden. More specifically, we propose a new regularization strategy based on the Max-Sliced Wasserstein distance to minimize the distance between the intermediate feature distributions in the neural network. We show that minimizing this distance enables the complete removal of intermediate layers in the network, achieving a better performance/depth trade-off compared to existing techniques. We assess the effectiveness of our method on traditional image classification setups and extend it to generative image models. Our code is available at https://github.com/VGCQ/LaCoOT.
@article{quetu2025lacoot,
  title = {LaCoOT: Layer Collapse through Optimal Transport},
  author = {Qu{\'e}tu, Victor and Liao, Zhu and Hezbri, Nour and Pizzati, Fabio and Tartaglione, Enzo},
  journal = {ICCV},
  year = {2025},
}
- [AAAI] Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers
  Zhu Liao, Nour Hezbri, Victor Quétu, Van-Tam Nguyen, and Enzo Tartaglione
  In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
Today, deep neural networks are widely used since they can handle a variety of complex tasks. Their generality makes them very powerful tools in modern technology. However, deep neural networks are often overparameterized. The usage of these large models consumes a lot of computational resources. In this paper, we introduce a method called Till the Layers Collapse (TLC), which compresses deep neural networks through the lenses of batch normalization layers. By reducing the depth of these networks, our method decreases deep neural networks’ computational requirements and overall latency. We validate our method on popular models such as Swin-T, MobileNet-V2, and RoBERTa, across both image classification and natural language processing (NLP) tasks.
@inproceedings{liao2024till,
  title = {Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers},
  author = {Liao, Zhu and Hezbri, Nour and Qu{\'e}tu, Victor and Nguyen, Van-Tam and Tartaglione, Enzo},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year = {2025},
}
- [SoftwareX] LayerFold: A Python library to reduce the depth of neural networks
  Giommaria Pilo, Nour Hezbri, André Pereira Ferreira, Victor Quétu, and Enzo Tartaglione
  SoftwareX, 2025
Large-scale models are the backbone of Computer Vision and Natural Language Processing, and their generalizability allows for transfer learning and deployment in different scenarios. However, their large size means that reducing their computational and memory demands remains a challenge. Recent research proposes to achieve “layer collapse”, a condition where multiple layers can be combined due to the collapse of non-linearities to linear operators. While this is an important discovery, most studies remain theoretical, often replacing non-linearities with simple identity functions and not providing a real implementation of the more compact architecture. Our contribution is LayerFold, a library that studies and implements the merging of collapsed layers. We address typical cases, from fully connected to convolutional layers, discussing constraints and prospective challenges. Our tests on edge devices reveal that merely reducing network depth does not always result in faster computation, even when GPU-equipped. This work raises important warnings and opens the door to further advances in efficient model deployment.
@article{pilo2025layerfold,
  title = {LayerFold: A Python library to reduce the depth of neural networks},
  author = {Pilo, Giommaria and Hezbri, Nour and e Ferreira, Andr{\'e} Pereira and Qu{\'e}tu, Victor and Tartaglione, Enzo},
  journal = {SoftwareX},
  volume = {29},
  pages = {102030},
  year = {2025},
  publisher = {Elsevier},
}
2024
- [AAAI] DSD²: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?
  Victor Quétu and Enzo Tartaglione
  In Proceedings of the AAAI Conference on Artificial Intelligence, 2024
Recent works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. Indeed, as the sparsity of the model increases, the test performance first worsens since the model is overfitting the training data; then, the overfitting reduces, leading to an improvement in performance, and finally, the model begins to forget critical information, resulting in underfitting. Such behavior prevents the use of traditional early-stopping criteria. In this work, we make three key contributions. First, we propose a learning framework that avoids such a phenomenon and improves generalization. Second, we introduce an entropy measure providing more insights into the emergence of this phenomenon and enabling the use of traditional stopping criteria. Third, we provide a comprehensive quantitative analysis of contingent factors such as re-initialization methods, model width and depth, and dataset noise. The contributions are supported by empirical evidence in typical setups. Our code is available at https://github.com/VGCQ/DSD2.
@inproceedings{quetu2024dsd2,
  title = {DSD²: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?},
  author = {Qu{\'e}tu, Victor and Tartaglione, Enzo},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {38},
  number = {13},
  pages = {14749--14757},
  year = {2024},
}
- [ECMLPKDD] The simpler the better: An entropy-based importance metric to reduce neural networks’ depth
  Victor Quétu, Zhu Liao, and Enzo Tartaglione
  In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2024
While deep neural networks are highly effective at solving complex tasks, large pre-trained models are commonly employed even to solve consistently simpler downstream tasks, which do not necessarily require a large model’s complexity. Motivated by the awareness of the ever-growing AI environmental impact, we propose an efficiency strategy that leverages prior knowledge transferred by large models. Our method is simple but effective: it relies on an Entropy-bASed Importance mEtRic (EASIER) to reduce the depth of over-parametrized deep neural networks, which alleviates their computational burden. We assess the effectiveness of our method on traditional image classification setups. Our code is available at https://github.com/VGCQ/EASIER.
@inproceedings{quetu2024simpler,
  title = {The simpler the better: An entropy-based importance metric to reduce neural networks’ depth},
  author = {Qu{\'e}tu, Victor and Liao, Zhu and Tartaglione, Enzo},
  booktitle = {Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages = {92--108},
  year = {2024},
  organization = {Springer},
}
- [ECCVW] Memory-Optimized Once-For-All Network
  Maxime Girard, Victor Quétu, Samuel Tardieu, Van-Tam Nguyen, and Enzo Tartaglione
  In European Conference on Computer Vision, 2024
Deploying Deep Neural Networks (DNNs) on different hardware platforms is challenging due to varying resource constraints. Besides handcrafted approaches aiming at making deep models hardware-friendly, Neural Architecture Search is rising as a toolbox to craft more efficient DNNs without sacrificing performance. Among these, the Once-For-All (OFA) approach offers a solution by allowing the sampling of well-performing sub-networks from a single supernet, which leads to evident advantages in terms of computation. However, OFA does not fully utilize the potential memory capacity of the target device, focusing instead on limiting maximum memory usage per layer. This leaves unexploited potential in terms of model generalizability. In this paper, we introduce a Memory-Optimized OFA (MOOFA) supernet, designed to enhance DNN deployment on resource-limited devices by maximizing memory usage (and, for instance, feature diversity) across different configurations. Tested on ImageNet, our MOOFA supernet demonstrates improvements in memory exploitation and model accuracy compared to the original OFA supernet. Our code is available at https://github.com/MaximeGirard/memory-optimized-once-for-all.
@inproceedings{girard2024memory,
  title = {Memory-Optimized Once-For-All Network},
  author = {Girard, Maxime and Qu{\'e}tu, Victor and Tardieu, Samuel and Nguyen, Van-Tam and Tartaglione, Enzo},
  booktitle = {European Conference on Computer Vision},
  year = {2024},
}
2023
- [ICCVW] Can unstructured pruning reduce the depth in deep neural networks?
  Zhu Liao, Victor Quétu, Van-Tam Nguyen, and Enzo Tartaglione
  In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Pruning is a widely used technique for reducing the size of deep neural networks while maintaining their performance. However, such a technique, despite being able to massively compress deep models, is hardly able to remove entire layers from a model (even when structured): is this an addressable task? In this study, we introduce EGP, an innovative Entropy Guided Pruning algorithm aimed at reducing the size of deep neural networks while preserving their performance. The key focus of EGP is to prioritize pruning connections in layers with low entropy, ultimately leading to their complete removal. Through extensive experiments conducted on popular models like ResNet-18 and Swin-T, our findings demonstrate that EGP effectively compresses deep neural networks while maintaining competitive performance levels. Our results not only shed light on the underlying mechanism behind the advantages of unstructured pruning, but also pave the way for further investigations into the intricate relationship between entropy, pruning techniques, and deep learning performance. The EGP algorithm and its insights hold great promise for advancing the field of network compression and optimization.
@inproceedings{liao2023can,
  title = {Can unstructured pruning reduce the depth in deep neural networks?},
  author = {Liao, Zhu and Qu{\'e}tu, Victor and Nguyen, Van-Tam and Tartaglione, Enzo},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages = {1402--1406},
  year = {2023},
}
- [ICIP] Dodging the Double Descent in Deep Neural Networks
  Victor Quétu and Enzo Tartaglione
  In 2023 IEEE International Conference on Image Processing (ICIP), 2023
Finding the optimal size of deep learning models is a highly topical problem with broad impact, especially in energy-saving schemes. Very recently, an unexpected phenomenon, the “double descent”, has caught the attention of the deep learning community. As the model’s size grows, the performance first gets worse and then starts improving again. This raises serious questions about the optimal model size to maintain high generalization: the model needs to be sufficiently over-parametrized, but adding too many parameters wastes training resources. Is it possible to find, in an efficient way, the best trade-off? Our work shows that the double descent phenomenon is potentially avoidable with proper conditioning of the learning problem, but a final answer is yet to be found. We empirically observe that there is hope to dodge the double descent in complex scenarios with proper regularization, as a simple ℓ2 regularization already contributes positively to this goal.
@inproceedings{quetu2023dodging,
  title = {Dodging the Double Descent in Deep Neural Networks},
  author = {Qu{\'e}tu, Victor and Tartaglione, Enzo},
  booktitle = {2023 IEEE International Conference on Image Processing (ICIP)},
  pages = {1625--1629},
  year = {2023},
  organization = {IEEE},
}
- [ICIAP] Sparse Double Descent in Vision Transformers: real or phantom threat?
  Victor Quétu, Marta Milovanović, and Enzo Tartaglione
  In International Conference on Image Analysis and Processing, 2023
Vision transformers (ViT) have been of broad interest in recent theoretical and empirical works. They are state-of-the-art thanks to their attention-based approach, which boosts the identification of key features and patterns within images thanks to the capability of avoiding inductive bias, resulting in highly accurate image analysis. Meanwhile, recent studies have reported a “sparse double descent” phenomenon that can occur in modern deep-learning models, where extremely over-parametrized models can generalize well. This raises practical questions about the optimal size of the model and launches the quest for the best trade-off between sparsity and performance: are Vision Transformers also prone to sparse double descent? Can we find a way to avoid such a phenomenon?
@inproceedings{quetu2023sparse,
  title = {Sparse Double Descent in Vision Transformers: real or phantom threat?},
  author = {Qu{\'e}tu, Victor and Milovanovi{\'c}, Marta and Tartaglione, Enzo},
  booktitle = {International Conference on Image Analysis and Processing},
  pages = {490--502},
  year = {2023},
  organization = {Springer},
}
- [Neurocomputing] Disentangling private classes through regularization
  Enzo Tartaglione, Francesca Gennari, Victor Quétu, and Marco Grangetto
  Neurocomputing, 2023
Deep learning models are nowadays broadly deployed to solve an incredibly large variety of tasks. However, little attention has been devoted to connected legal aspects. In 2016, the European Union approved the General Data Protection Regulation which entered into force in 2018. Its main rationale was to protect the privacy and data protection of its citizens by the way of operating the so-called “Data Economy”. As data is the fuel of modern Artificial Intelligence, it is argued that the GDPR can be partly applicable to a series of algorithmic decision-making tasks before a more structured AI Regulation enters into force. In the meantime, AI should not allow undesired information leakage deviating from the purpose for which it is created. In this work, we propose DisP, an approach for deep learning models disentangling the information related to some classes we desire to keep private, from the data processed by AI. In particular, DisP is a regularization strategy de-correlating the features belonging to the same private class at training time, hiding the information about private class membership. Our experiments on state-of-the-art deep learning models show the effectiveness of DisP, minimizing the risk of extraction for the classes we desire to keep private.
@article{tartaglione2023disentangling,
  title = {Disentangling private classes through regularization},
  author = {Tartaglione, Enzo and Gennari, Francesca and Qu{\'e}tu, Victor and Grangetto, Marco},
  journal = {Neurocomputing},
  volume = {554},
  pages = {126612},
  year = {2023},
  publisher = {Elsevier},
}
- [ECMLPKDDW] The Quest of Finding the Antidote to Sparse Double Descent
  Victor Quétu and Marta Milovanović
  In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2023
In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as the model’s sparsity increases, the performance first worsens, then improves, and finally deteriorates. Such a non-monotonic behavior raises serious questions about the optimal model’s size to maintain high performance: the model needs to be sufficiently over-parametrized, but having too many parameters wastes training resources. In this paper, we aim to find the best trade-off efficiently. More precisely, we tackle the occurrence of the sparse double descent and present some solutions to avoid it. Firstly, we show that a simple ℓ2 regularization method can help to mitigate this phenomenon but sacrifices the performance/sparsity compromise. To overcome this problem, we then introduce a learning scheme in which distilling knowledge regularizes the student model. Supported by experimental results achieved using typical image classification setups, we show that this approach leads to the avoidance of such a phenomenon.
@inproceedings{quetu2023quest,
  title = {The Quest of Finding the Antidote to Sparse Double Descent},
  author = {Qu{\'e}tu, Victor and Milovanovi{\'c}, Marta},
  booktitle = {Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages = {153--167},
  year = {2023},
  organization = {Springer},
}