1. Using mixup as regularization and tuning hyper-parameters for ResNets (arXiv)
Author : Venkata Bhanu Teja Pallakonda
Abstract : While novel computer vision architectures are gaining traction, the impact of model architectures is often coupled with changes to, or exploration of, the training methods. Identity-mapping-based architectures such as ResNets and DenseNets have delivered path-breaking results on the image classification task and remain go-to methods even now when the available data is fairly limited. Considering the ease of training with limited resources, this work revisits ResNets and improves ResNet50 by using mixup data augmentation as regularization and by tuning the hyper-parameters.
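For readers unfamiliar with the technique, the sketch below shows the core mixup step in PyTorch: a batch is mixed with a random permutation of itself using a coefficient drawn from a Beta distribution, and the loss is mixed with the same coefficient. This is a minimal illustrative sketch, assuming PyTorch; the alpha value and helper names are placeholders, not the paper's exact configuration.

```python
# Minimal mixup sketch (PyTorch assumed). alpha and the helper names are
# illustrative assumptions, not the configuration used in the paper.
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.2):
    """Mix a batch of inputs and return the mixed inputs plus both label sets."""
    lam = np.random.beta(alpha, alpha)          # mixing coefficient ~ Beta(alpha, alpha)
    index = torch.randperm(x.size(0))           # random pairing within the batch
    mixed_x = lam * x + (1.0 - lam) * x[index]  # convex combination of the inputs
    return mixed_x, y, y[index], lam

def mixup_loss(logits, y_a, y_b, lam):
    """Apply the same convex combination to the losses of the two targets."""
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)
```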
2. Interpolation and approximation via Momentum ResNets and Neural ODEs (arXiv)
Author : Domènec Ruiz-Balet, Elisa Affili, Enrique Zuazua
Abstract : In this article, we explore the effects of memory terms in continuous-layer Deep Residual Networks by studying Neural ODEs (NODEs). We investigate two types of models. On one side, we consider the case of Residual Neural Networks with dependence on multiple layers, more precisely Momentum ResNets. On the other side, we analyze a Neural ODE with auxiliary states playing the role of memory states. We examine the interpolation and universal approximation properties for both architectures through a simultaneous control perspective. We also prove the ability of the second model to represent sophisticated maps, such as parametrizations of time-dependent functions. Numerical simulations complement our study.
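As a rough illustration of the memory term discussed in the abstract, the sketch below implements a discrete momentum residual update in the spirit of Momentum ResNets, where a velocity state v accumulates information across layers: v_{n+1} = gamma * v_n + (1 - gamma) * f(x_n) and x_{n+1} = x_n + v_{n+1}. This is a minimal sketch, assuming PyTorch; the residual function f, the layer widths, and gamma are illustrative placeholders rather than the architectures analyzed in the paper.

```python
# Minimal sketch of a momentum residual update (PyTorch assumed). The residual
# branch f, the feature dimension, and gamma are illustrative assumptions.
import torch
import torch.nn as nn

class MomentumResidualBlock(nn.Module):
    def __init__(self, dim, gamma=0.9):
        super().__init__()
        self.gamma = gamma
        # f plays the role of the residual branch driving the velocity state
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, x, v):
        # v carries memory of earlier layers: v_{n+1} = gamma * v_n + (1 - gamma) * f(x_n)
        v = self.gamma * v + (1.0 - self.gamma) * self.f(x)
        # x_{n+1} = x_n + v_{n+1}, the momentum analogue of the plain ResNet update
        return x + v, v

# Example: iterate the block, threading (x, v) through successive layers
# x, v = torch.randn(8, 32), torch.zeros(8, 32)
# block = MomentumResidualBlock(32)
# x, v = block(x, v)
```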
3. Revisiting 3D ResNets for Video Recognition (arXiv)
Author : Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello
Abstract : A recent work from Bello shows that training and scaling strategies may be more significant than model architectures for visual recognition. This short note studies effective training and scaling strategies for video recognition models. We propose a simple scaling strategy for 3D ResNets, in combination with improved training strategies and minor architectural changes. The resulting models, termed 3D ResNet-RS, attain competitive top-1 accuracy of 81.0 on Kinetics-400 and 83.8 on Kinetics-600 without pre-training. When pre-trained on a large Web Video Text dataset, our best model achieves 83.5 and 84.3 on Kinetics-400 and Kinetics-600. The proposed scaling rule is further evaluated in a self-supervised setup using contrastive learning, demonstrating improved performance. Code is available at: https://github.com/tensorflow/models/tree/master/official
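To make the 3D setting concrete, the sketch below shows how a basic 2D residual block translates to video by convolving over (time, height, width) with Conv3d. This is only an illustrative sketch, assuming PyTorch; the channel counts, kernel sizes, and block layout are assumptions and do not reproduce the 3D ResNet-RS configuration or its scaling rule.

```python
# Minimal 3D residual block sketch (PyTorch assumed). Channel counts and
# kernel sizes are illustrative, not the 3D ResNet-RS configuration.
import torch
import torch.nn as nn

class Residual3DBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Identity shortcut plus two 3D convolutions, as in a basic ResNet block
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)

# Example: a clip of 8 frames at 32x32 spatial resolution with 64 channels
# y = Residual3DBlock(64)(torch.randn(1, 64, 8, 32, 32))
```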