## 2.1 Problem

Physics-Informed Neural Networks (PINNs) offer a distinct advantage over conventional neural networks by explicitly integrating known governing ordinary or partial differential equations (ODEs/PDEs) of physical processes. The enforcement of these governing equations in PINNs relies on a set of points known as residual points. These points are strategically selected within the simulation domain, and the corresponding network outputs are substituted into the governing equations to evaluate the residuals. The residuals indicate the extent to which the network outputs align with the underlying physical processes, thereby serving as a crucial physical loss term that guides the neural network training process.
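To make the notion of a residual concrete, here is a toy sketch (not the paper's code): we measure how well a candidate solution satisfies the ODE *du/dx = u* at a set of residual points. In a real PINN, *u* would be the network and the derivative would come from automatic differentiation; here a closed-form candidate and central finite differences stand in for both.

```python
import numpy as np

def residual(u, xs, h=1e-5):
    """PDE residual eps(x) = |du/dx - u(x)| at each residual point.

    Central finite differences stand in for automatic differentiation.
    """
    du = (u(xs + h) - u(xs - h)) / (2 * h)
    return np.abs(du - u(xs))

xs = np.linspace(0.0, 1.0, 11)          # uniformly sampled residual points

exact = residual(np.exp, xs)            # e^x solves du/dx = u exactly
wrong = residual(lambda x: x**2, xs)    # x^2 does not

print(exact.max())   # near zero: the physics loss vanishes
print(wrong.max())   # clearly nonzero: large physical loss term
```

Averaging such residuals over the residual points gives the physical loss term that guides training.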

It is evident that the distribution of these residual points plays a pivotal role in influencing the accuracy and efficiency of PINNs during training. However, the prevailing approach often involves simple uniform sampling, which leaves ample room for improvement.

Consequently, a pressing question arises: How can we optimize the distribution of residual points to enhance the accuracy and training efficiency of PINNs?

## 2.2 Solution

Two promising ways of distributing the residual points are the **adaptive strategy** and the **refinement strategy**:

- The adaptive strategy means that every fixed number of training iterations, a new batch of residual points is generated to replace the previous residual points;
- The refinement strategy means that extra residual points can be added to the existing ones, thus “refining” the residual points.
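The two strategies differ only in how the residual-point set is updated over training. A minimal loop skeleton may help, with the PINN update stubbed out; `train_step` and `sample_new_points` are hypothetical placeholders, not the paper's API:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_new_points(n):
    # Stand-in sampler; RAD/RAR-D would sample from a residual-based density.
    return rng.uniform(0.0, 1.0, size=n)

def train_step(points):
    pass  # the PINN gradient update would go here

points = sample_new_points(100)   # initial residual points
RESAMPLE_PERIOD = 2000            # the paper's suggested default period N

for it in range(1, 6001):
    train_step(points)
    if it % RESAMPLE_PERIOD == 0:
        # Adaptive strategy: REPLACE the residual points entirely.
        points = sample_new_points(100)
        # Refinement strategy would APPEND extra points instead:
        # points = np.concatenate([points, sample_new_points(10)])
```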

Based on these two foundational strategies, the paper proposes two novel sampling methods: *Residual-based Adaptive Distribution* (RAD) and *Residual-based Adaptive Refinement with Distribution* (RAR-D):

1. RAD: **R**esidual-based **A**daptive **D**istribution

The key idea is to draw new residual points according to a customized probability density function over the spatial domain.

The probability density function $p(\mathbf{x})$ is designed to be proportional to the PDE residual $\varepsilon(\mathbf{x})$ at $\mathbf{x}$:

$$p(\mathbf{x}) \propto \frac{\varepsilon^{k}(\mathbf{x})}{\mathbb{E}\left[\varepsilon^{k}(\mathbf{x})\right]} + c$$

Here, *k* and *c* are two hyperparameters, and the expectation term in the denominator can be approximated by, e.g., Monte Carlo integration.

In total, the RAD approach has three hyperparameters: *k*, *c*, and the resampling period *N*. Although the optimal hyperparameter values are problem-dependent, the suggested defaults are *k* = 1, *c* = 1, and *N* = 2000.
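A minimal sketch of one RAD resampling step, under the assumption that residuals have already been evaluated on a dense set of candidate points; the expectation in the density is approximated by the Monte Carlo mean over those candidates, and the residual profile below is purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def rad_sample(candidates, residuals, n_points, k=1.0, c=1.0):
    """Draw new residual points with probability ~ eps^k / E[eps^k] + c."""
    eps_k = residuals**k
    p = eps_k / eps_k.mean() + c          # unnormalized density
    p = p / p.sum()                       # normalize over the candidate set
    idx = rng.choice(len(candidates), size=n_points, replace=False, p=p)
    return candidates[idx]

# Toy usage: a hypothetical residual profile peaking near x = 0.5,
# so RAD should concentrate the new points there.
candidates = np.linspace(0.0, 1.0, 10_000)
residuals = np.exp(-((candidates - 0.5) ** 2) / 0.01)
new_points = rad_sample(candidates, residuals, n_points=1000, k=1.0, c=1.0)

print(np.mean(np.abs(new_points - 0.5) < 0.2))  # majority land near the peak
```

With *c* > 0, even zero-residual regions keep nonzero sampling probability, so the whole domain stays covered.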

2. RAR-D: **R**esidual-based **A**daptive **R**efinement with **D**istribution

Essentially, RAR-D adds the element of refinement on top of the proposed RAD approach: after a certain number of training iterations, instead of entirely replacing the old residual points with new ones, RAR-D keeps the old residual points and draws extra residual points according to the same custom probability density function.

For RAR-D, the suggested default values for *k* and *c* are 2 and 0, respectively.
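A sketch of one RAR-D refinement step under the same assumptions as before (residuals precomputed on a dense candidate set; the residual profile is hypothetical): the old points are kept, and a small batch drawn from the residual-based density is appended.

```python
import numpy as np

rng = np.random.default_rng(1)

def rar_d_refine(old_points, candidates, residuals, n_new, k=2.0, c=0.0):
    """Append n_new points drawn with probability ~ eps^k / E[eps^k] + c."""
    eps_k = residuals**k
    p = eps_k / eps_k.mean() + c
    p = p / p.sum()
    idx = rng.choice(len(candidates), size=n_new, p=p)
    return np.concatenate([old_points, candidates[idx]])  # old points kept

old = np.linspace(0.0, 1.0, 100)               # initial uniform point set
candidates = np.linspace(0.0, 1.0, 5_000)
residuals = np.abs(candidates - 0.8) ** -0.5   # hypothetical: sharp feature near x = 0.8

refined = rar_d_refine(old, candidates, residuals, n_new=50)
print(len(refined))  # 150: the original 100 plus 50 refinement points
```

With the default *c* = 0 and a larger *k*, the added points are pulled strongly toward the highest-residual regions, which is the refinement behavior described above.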

## 2.3 Why the solution might work

The key lies in the designed sampling probability density function: this density function tends to place more points in regions where the PDE residuals are large and fewer points in regions where the residuals are small. This strategic distribution of points enables a more detailed analysis of the PDE in regions where the residuals are higher, potentially leading to enhanced accuracy in PINN predictions. Additionally, the optimized distribution allows for more efficient use of computational resources, thus reducing the total number of points required for accurate resolution of the governing PDE.

## 2.4 Benchmark

The paper benchmarked the performance of the two proposed approaches against eight other sampling strategies on both forward and inverse problems. The considered physical equations include:

- Diffusion-reaction equation (inverse problem, calibrating the reaction rate *k*(*x*))

- Korteweg-de Vries equation (inverse problem, calibrating λ₁ and λ₂)

The comparison studies showed that:

- RAD always performed the best, thus making it a good default strategy;
- If computational cost is a concern, RAR-D could be a strong alternative, as it tends to provide adequate accuracy and it’s less expensive than RAD;
- RAD & RAR-D are especially effective for complicated PDEs;
- The advantage of RAD & RAR-D shrinks if the simulated PDEs have smooth solutions.

## 2.5 Strengths and Weaknesses

👍 **Strengths**

- dynamically improves the distribution of residual points based on the PDE residuals during training;
- leads to an increase in PINN accuracy;
- achieves comparable accuracy to existing methods with fewer residual points.

👎 **Weaknesses**

- can be more computationally expensive than other non-adaptive uniform sampling methods. However, this is the price to pay for a higher accuracy;
- for PDEs with smooth solutions, e.g., diffusion equation, diffusion-reaction equation, some simple uniform sampling methods may produce sufficiently low errors, making the proposed solution potentially less suitable in those cases;
- introduces two new hyperparameters, *k* and *c*, that need to be tuned, as their optimal values are problem-dependent.

## 2.6 Alternatives

Other residual-point sampling approaches had been proposed prior to the current paper. Among them, two heavily influenced the methods proposed here:

- Residual-based adaptive refinement (Lu et al.), which is a special case of the proposed RAR-D with a large value of *k*;
- Importance sampling (Nabian et al.), which is a special case of RAD obtained by setting *k* = 1 and *c* = 0.