2.1 Problem 🎯
In applications of Physics-Informed Neural Networks (PINNs), it comes as no surprise that network hyperparameters such as depth, width, and the choice of activation function all have a significant impact on a PINN’s efficiency and accuracy.
Naturally, people would resort to AutoML (more specifically, neural architecture search) to automatically identify the optimal network hyperparameters. But before we can do that, there are two questions that need to be addressed:
- How to effectively navigate the vast search space?
- How to define a proper search objective?
The second question arises because PINN training is usually seen as an “unsupervised” problem: no labeled data is needed, since the training is guided by minimizing the ODE/PDE residuals.
To better understand those two issues, the authors conducted extensive experiments to investigate how sensitive PINN performance is to the network structure. Let’s now take a look at what they found.
2.2 Solution 💡
The first idea proposed in the paper is that the training loss can be used as the surrogate for the search objective, as it highly correlates with the final prediction accuracy of the PINN. This addresses the issue of defining a proper optimization target for hyperparameter search.
The second idea is that there is no need to optimize all network hyperparameters simultaneously. Instead, we can adopt a step-by-step decoupling strategy: for example, first search for the optimal activation function, then fix that choice and find the optimal network width, then fix both previous decisions and optimize the network depth, and so on. In their experiments, the authors demonstrated that this strategy is very effective.
With those two ideas in mind, let’s see how we can execute the search in detail.
First of all, which network hyperparameters are considered? In the paper, the recommended search space is:
- Width: number of neurons in each hidden layer. The considered range is [8, 512] with a step of 4 or 8.
- Depth: number of hidden layers. The considered range is [3, 10] with a step of 1.
- Activation function: Tanh, Sigmoid, ReLU, and Swish.
- Changing point: the fraction of the total training epochs that use Adam. The considered values are [0.1, 0.2, 0.3, 0.4, 0.5]. In PINN training, it’s common practice to first train with Adam for a certain number of epochs and then switch to L-BFGS for the remaining epochs. The changing point hyperparameter determines when that switch happens.
- Learning rate: a fixed value of 1e-5, as it has a small effect on the final architecture search results.
- Training epochs: a fixed value of 10000, as it has a small effect on the final architecture search results.
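To make the list above concrete, here is a minimal sketch of how the search space could be written down in plain Python. The dictionary layout, the names `SEARCH_SPACE`, `FIXED`, and `split_epochs`, and the choice of a width step of 8 (rather than 4) are assumptions made for illustration; they are not taken from the paper’s code.

```python
# Illustrative encoding of the search space listed above.
# All names and the dictionary layout are assumptions for this sketch.
SEARCH_SPACE = {
    "width": list(range(8, 513, 8)),       # 8, 16, ..., 512 (step of 8 assumed)
    "depth": list(range(3, 11)),           # 3, 4, ..., 10
    "activation": ["tanh", "sigmoid", "relu", "swish"],
    "changing_point": [0.1, 0.2, 0.3, 0.4, 0.5],
}

# Hyperparameters that are held fixed during the search.
FIXED = {"learning_rate": 1e-5, "epochs": 10_000}


def split_epochs(total_epochs, changing_point):
    """Return (adam_epochs, lbfgs_epochs) for a given changing point."""
    adam_epochs = round(total_epochs * changing_point)
    return adam_epochs, total_epochs - adam_epochs
```

For instance, with the fixed budget of 10000 epochs, a changing point of 0.3 would mean 3000 epochs of Adam followed by 7000 epochs of L-BFGS.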
Secondly, let’s examine the proposed procedure in detail:
- The first search target is the activation function. To find it, we sample the width-depth parameter space and calculate the losses for all width-depth samples under each activation function. These results indicate which activation function dominates. Once decided, we fix the activation function for the following steps.
- The second search target is the width. More specifically, we are looking for a couple of width intervals where PINN performs well.
- The third search target is the depth. Here, we only consider width varying within the best-performing intervals determined from the last step, and we would like to find the best K width-depth combinations where PINN performs well.
- The final search target is the changing point. We simply search for the best changing point for each of the top-K configurations identified from the last step.
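The four steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the authors’ implementation: `train_loss` is a hypothetical callable that trains a PINN with the given settings and returns its final training loss, width intervals are taken to be fixed-size chunks of the sorted width list, and the parameters `n_samples`, `n_intervals`, `interval_size`, and `top_k` are made-up names for the tuning knobs. A toy loss function stands in for real PINN training so that the sketch can run on its own.

```python
import random
from statistics import mean


def decoupled_search(train_loss, widths, depths, activations, changing_points,
                     n_samples=20, n_intervals=2, interval_size=4, top_k=3, seed=0):
    """Step-by-step search: activation -> width intervals -> depth -> changing point."""
    rng = random.Random(seed)
    default_cp = changing_points[0]  # placeholder until step 4 tunes it

    # Step 1: pick the activation with the lowest mean loss over sampled
    # (width, depth) pairs.
    samples = [(rng.choice(widths), rng.choice(depths)) for _ in range(n_samples)]
    best_act = min(
        activations,
        key=lambda a: mean(train_loss(w, d, a, default_cp) for w, d in samples),
    )

    # Step 2: split the widths into contiguous intervals and keep the
    # intervals with the lowest mean loss over a few probe depths.
    probe_depths = rng.sample(depths, k=min(3, len(depths)))
    chunks = [widths[i:i + interval_size] for i in range(0, len(widths), interval_size)]

    def chunk_score(chunk):
        return mean(train_loss(w, d, best_act, default_cp)
                    for w in chunk for d in probe_depths)

    good_chunks = sorted(chunks, key=chunk_score)[:n_intervals]
    good_widths = [w for chunk in good_chunks for w in chunk]

    # Step 3: within the good width intervals, rank all width-depth
    # combinations and keep the top K.
    combos = [(w, d) for w in good_widths for d in depths]
    combos.sort(key=lambda wd: train_loss(wd[0], wd[1], best_act, default_cp))
    top = combos[:top_k]

    # Step 4: tune the changing point separately for each top-K configuration.
    results = []
    for w, d in top:
        cp = min(changing_points, key=lambda c: train_loss(w, d, best_act, c))
        results.append({"width": w, "depth": d,
                        "activation": best_act, "changing_point": cp})
    return results


def toy_loss(width, depth, activation, changing_point):
    """Stand-in for real PINN training; the formula is entirely made up."""
    penalty = {"tanh": 0.0, "sigmoid": 0.5, "relu": 1.0, "swish": 0.2}[activation]
    return (abs(width - 64) / 64 + abs(depth - 5)
            + penalty + abs(changing_point - 0.4))


candidates = decoupled_search(
    toy_loss,
    widths=list(range(8, 513, 8)),
    depths=list(range(3, 11)),
    activations=["tanh", "sigmoid", "relu", "swish"],
    changing_points=[0.1, 0.2, 0.3, 0.4, 0.5],
)
```

In a real setting every call to `train_loss` is a full training run, so the results would need to be cached and the sample counts chosen to fit the available budget.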
The outcome of this search procedure is K different PINN structures. We can either select the best-performing one out of those K candidates or simply use all of them to form a K-ensemble PINN model.
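How the K models are combined into an ensemble is not spelled out here. One simple option, shown purely as an assumption, is to average their predictions; `ensemble_predict` and the callable-model interface are hypothetical names for this sketch.

```python
def ensemble_predict(models, x):
    """Average the predictions of several models at input x.

    Each model is assumed to be a callable returning a number (or an
    array-like object that supports addition and division by a scalar).
    """
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)
```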
Notice that several tuning parameters need to be specified in the procedure above (e.g., the number of width intervals, the value of K, etc.), and their values would depend on the available tuning budget.
As for the specific optimization algorithms used in the individual steps, off-the-shelf AutoML libraries can be employed. For example, the authors used the Tune package to execute the hyperparameter tuning.
2.3 Why the solution might work 🛠️
By decoupling the search of different hyperparameters, the scale of the search space can be greatly decreased. This not only substantially decreases the search complexity, but also significantly increases the chance of locating a (near) optimal network architecture for the physical problems under investigation.
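A back-of-the-envelope count gives a feel for the reduction. The sketch below compares an exhaustive grid over all four searched hyperparameters with an idealized, fully sequential search that sweeps one hyperparameter at a time. Both are simplifications introduced here for illustration: the actual procedure samples width-depth pairs and searches width-depth combinations within selected intervals, so its true cost lies between these two numbers. A width step of 8 is assumed.

```python
from math import prod

# Number of candidate values per hyperparameter (width step of 8 assumed).
sizes = {
    "activation": 4,                          # Tanh, Sigmoid, ReLU, Swish
    "width": len(range(8, 513, 8)),           # 64 values
    "depth": len(range(3, 11)),               # 8 values
    "changing_point": 5,                      # 0.1, 0.2, ..., 0.5
}

joint_grid = prod(sizes.values())   # every combination: 4 * 64 * 8 * 5
sequential = sum(sizes.values())    # one 1-D sweep per hyperparameter

print(joint_grid, sequential)       # 10240 81
```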
Also, using the training loss as the search objective is both simple to implement and desirable. As the training loss (mainly constituted by PDE residual loss) highly correlates with the PINN accuracy during inference (according to the experiments conducted in the paper), identifying an architecture that delivers minimum training loss will also likely lead to a model with high prediction accuracy.
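One way to check this assumption on a new problem is to train a handful of architectures, record each one’s training loss and its error against a reference solution, and compute a rank correlation between the two. The sketch below implements Spearman’s rank correlation in plain Python; the choice of Spearman’s coefficient is an assumption made here (the specific correlation measure is not stated above), and the numbers in the example are made-up placeholders, not results from the paper.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up placeholder numbers for four hypothetical architectures.
train_losses = [1e-4, 5e-4, 2e-3, 1e-2]
test_errors = [3e-3, 8e-3, 2e-2, 9e-2]
rho = spearman(train_losses, test_errors)
```

A value of `rho` close to 1 would indicate that ranking architectures by training loss ranks them almost the same way as ranking by prediction error, which is the property the search objective relies on.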
2.4 Benchmark ⏱️
The paper considered a total of 7 different benchmark problems. All problems are forward problems where PINN is used to solve the PDEs.
- Heat equation with Dirichlet boundary conditions. This type of equation describes the heat or temperature distribution in a given domain over time.
- Heat equation with Neumann boundary conditions.
- Wave equation, which describes the propagation of oscillations in a space, such as mechanical and electromagnetic waves. Both Dirichlet and Neumann conditions are considered here.
- Burgers equation, which has been leveraged to model shock flows, wave propagation in combustion chambers, vehicular traffic movement, and more.
- Advection equation, which describes the motion of a scalar field as it is advected by a known velocity vector field.
- Advection equation, with different boundary conditions.
- Reaction equation, which describes chemical reactions.
The benchmark studies showed that:
- The proposed Auto-PINN shows stable performance for various PDEs.
- For most cases, Auto-PINN is able to identify the neural network architecture with the smallest error values.
- Auto-PINN needs fewer search trials to reach those results.
2.5 Strengths and Weaknesses ⚡
Strengths:
- Significantly reduced computational cost for performing neural architecture search for PINN applications.
- Improved likelihood of identifying a (near) optimal neural network architecture for different PDE problems.
Weaknesses:
- The effectiveness of using the training loss value as the search objective might depend on the specific characteristics of the PDE problem at hand, as the benchmarks cover only a specific set of PDEs.
- The data sampling strategy influences Auto-PINN performance. While the paper discusses the impact of different data sampling strategies, it does not provide a clear guideline on how to choose the best strategy for a given PDE problem. This could add another layer of complexity to the use of Auto-PINN.
2.6 Alternatives 🔀
Conventional out-of-the-box AutoML algorithms can also be employed to tackle hyperparameter optimization in Physics-Informed Neural Networks (PINNs). Those algorithms include Random Search, Genetic Algorithms, Bayesian optimization, etc.
Compared to those general-purpose algorithms, Auto-PINN is designed specifically for PINNs: it uses the training loss as the search objective and searches the hyperparameters step by step rather than jointly, which is what lets it get by with fewer search trials.
There are several possibilities to further improve the proposed strategy:
- Incorporating more sophisticated data sampling strategies, such as adaptive and residual-based sampling methods, to improve the search accuracy and the model performance. (Another post in the PINN design pattern series covers how to optimize the distribution of residual points.)
- More benchmarking of the search objective, to assess whether the training loss value is indeed a good surrogate across various types of PDEs.
- Incorporating other types of neural networks. The current version of Auto-PINN is designed for multilayer perceptron (MLP) architectures only. Future work could explore convolutional neural networks (CNNs) or recurrent neural networks (RNNs), which could potentially enhance the capability of PINNs in solving more complex PDE problems.
- Transfer learning in Auto-PINN. For instance, architectures that perform well on certain types of PDE problems could be used as starting points for the search process on similar types of PDE problems. This could potentially speed up the search process and improve the performance of the model.