Reveal flocking phase transition of self-propelled active particles by machine learning regression uncertainty

We develop a neural-network-based "learning from regression uncertainty" approach for the automated detection of phases of matter in nonequilibrium active systems. Taking the flocking phase transition of self-propelled active particles described by the Vicsek model as an example, we find that after training a neural network to solve the inverse statistical problem, i.e., to perform the regression task of reconstructing the noise level from given samples of this nonequilibrium many-body complex system's steady-state configurations, the uncertainty of the regression results obtained by the well-trained network can be utilized to reveal possible phase transitions in the system under study. The noise-level dependence of the regression uncertainty assumes a non-trivial M-shape, and its valley appears at the critical point of the flocking phase transition. By directly comparing this regression-based approach with the widely used classification-based "learning by confusion" and "learning with blanking" approaches, we show that our approach is practically effective and efficient, generalizes well to various physical systems across interdisciplinary fields, and has a greater possibility of being interpreted via conventional notions of physics. These approaches complement each other and can serve as a promising generic toolbox for investigating rich critical phenomena and for providing data-driven evidence on the existence of various phase transitions, especially in complex scenarios associated with first-order phase transitions or nonequilibrium active systems, where traditional research methods in physics face difficulties.


I. INTRODUCTION
In recent years, machine learning techniques based on the artificial neural network (ANN) have been increasingly utilized to assist research across the fields of condensed matter physics and statistical physics, particularly since the establishment of two pioneering approaches dubbed "learning with blanking" [1,2] and "learning by confusion" [2]. By utilizing the powerful classification ability of ANNs to identify phases of matter [1-8], these two approaches and their variants have successfully provided new data-driven evidence on the existence of various phases of matter in different many-body complex systems, along with data-driven estimates of the critical points of the associated phase transitions. Their practical successes can now be found throughout these fields, including some tricky scenarios related to non-equilibrium physics [3,4], topological defects [5,6], and strongly correlated fermions [7,8]. For both classical and quantum systems, thanks to the versatility of ANNs in pattern recognition and data fitting, these machine learning approaches can readily handle not only data generated from numerical simulations but also experimental data [9-11]. On the other hand, due to the insufficient clarity regarding the underlying working mechanism of ANNs [12-14], these approaches often lack a direct physical connection between the classes recognized by the ANNs in the raw data and the distinct phases of matter in the system under study [12-14], which remains a challenge for deeper applications of machine learning techniques and ANNs in physics.
In the face of this challenge, it is worth noting that the power of ANNs is not limited to classification tasks. ANNs also excel at regression, and significantly, when ANNs are trained to perform regression tasks, the meaning of their outputs can usually be traced back to the regression tasks themselves in a straightforward, readily interpretable way. For instance, corresponding to the forward way of studying a physical system, i.e., finding the system's possible states given the values of a system parameter, the so-called inverse statistical problem (ISP) [15] refers to the regression task of finding the system parameter's possible values given the system's state. When ANNs are trained to perform such an ISP regression task, their outputs are known directly to be the reconstructed system parameter itself, in sharp contrast to the ambiguous classes encountered in classification tasks. Indeed, the connection between the regression results of ANNs and conventional notions of physics has already started to garner attention from physicists. As a case in point, the opportunities offered by symbolic regression [16-19] towards automated theory building are being explored, where ANNs have been found capable of extracting equations of motion [16], symmetries [17], and conservation laws [18] from various types of data of physical systems, and even 100 equations from the Feynman Lectures on Physics [19]. These investigations demonstrate that, compared to the classification results of ANNs, the regression results of ANNs have a greater opportunity to be interpreted in terms of physics.
Hence, regression-based machine learning approaches that utilize ANNs for automated detection of phases of matter have emerged, such as the recently established "learning from regression uncertainty" (LFRU) approach [20]. The generic application of regression uncertainty to learning continuous phase transitions has been demonstrated on the Ising and q-state clock models, and an intrinsic connection between the regression uncertainty and the system's response properties has been revealed [20]. But further validation is still required to assess the generality of this new approach, particularly in the complex scenarios associated with first-order phase transitions in non-equilibrium (NEQ), non-lattice systems. Is LFRU still effective and efficient there? In sharp contrast to a continuous phase transition, there is no diverging correlation length associated with a first-order phase transition. This protects the rich physics at different length scales from being washed out by a diverging correlation length, but it also makes such physics difficult to study with powerful tools such as the renormalization group [21]. It is known as well that in NEQ many-body systems, the general absence of detailed balance naturally gives rise to much richer physics than in their equilibrium counterparts, e.g., the irregular behavior of turbulent flows [22], but it also means that no unified approach, or even any general guiding rule, seems to exist when one attempts to tackle physical problems in the NEQ scenario [23]. Considering the versatility of ANNs in pattern recognition and data fitting, they are well-suited for scenarios like this. Concerning the complex scenarios associated with first-order phase transitions in active NEQ systems, if the ANN-based machine learning approaches, e.g., LFRU (and also the "learning with blanking" and "learning by confusion" approaches), can handle them effectively and efficiently without the need for additional case-by-case designs, then they can serve as a promising generic tool and help to unveil the rich physics in such systems.
In this work, we investigate the generic application of LFRU to learning first-order phase transitions in active NEQ systems. We address this question by revealing the flocking phase transition of self-propelled active particles described by the Vicsek model [24-27] as an example. This is a stochastic dynamical model with extrinsic noise (also known as vectorial noise), initially developed to simulate the flocking behavior of flying birds in low-visibility conditions such as foggy weather; it has also become one of the foundational models of statistical physics for self-propelled active particle systems, exhibiting rich collective dynamics and self-organization phenomena [24-27]. Changing the noise level in this active NEQ system [see Fig. 1 and Fig. 2(a)] drives a first-order phase transition between the flocking phase (corresponding to low noise levels, where the directions of motion of all particles are generally aligned) and the disordered phase (corresponding to high noise levels, where the system's rotational symmetry is preserved) [24-27]. Here we find that even in such a complex scenario, i.e., a first-order phase transition in an active NEQ system, ANNs can still be readily trained to perform the ISP regression and successfully reconstruct the noise level [see Fig. 2(b)]. We then investigate the uncertainty of the regression results obtained by the ANNs and find that its noise-level dependence contains hidden information, with the curve assuming an M-shape [see Fig. 3(a)]. Most significantly, we find that this M-shaped curve of regression uncertainty, figured out autonomously by the well-trained ANNs, can be utilized to reveal the first-order flocking phase transition of self-propelled active particles in this active NEQ system. The existence of the non-trivial minimum of the regression uncertainty manifests the presence of the flocking phase transition, and the position of this minimum corresponds to the critical noise level of the transition. Building upon our recent investigations [20], our findings in this work clearly demonstrate the good generality of LFRU for various physical systems across interdisciplinary fields. We also study the practical effectiveness of the widely used classification-based "learning by confusion" and "learning with blanking" approaches in revealing the same flocking phase transition of self-propelled active particles, directly compare LFRU with them regarding efficiency, the requirement for prior physical knowledge, and the possibility of being interpreted via conventional notions of physics, and discuss the similarities, differences, and respective characteristics of these three approaches.

II. FLOCKING PHASE TRANSITION AND THE INVERSE STATISTICAL PROBLEM OF SELF-PROPELLED ACTIVE PARTICLES
A. System and model

The physical system under consideration consists of N self-propelled active particles in a two-dimensional box of size L × L with periodic boundary conditions, under the influence of environmental fluctuations, e.g., flying birds in low-visibility conditions such as foggy weather. These self-propelled active particles share a free speed v_0, and at any time t, each particle i adjusts its own velocity v_i(t) according to the average velocity of all particles located within its neighborhood U_i (including i itself). This can be described by the Vicsek model with extrinsic noise, where the collective behavior of self-propelled active particles is modeled by a set of stochastic discrete-time dynamical equations [24-27]:

v_i(t + ∆t) = v_0 ϑ[ Σ_{j ∈ U_i} v_j(t) + N_i η ξ_i(t) ],
x_i(t + ∆t) = x_i(t) + ∆t v_i(t + ∆t).    (1)

Figure 1. Typical samples corresponding to different noise levels, generated by numerical simulations. In every sample, each circular marker represents a single self-propelled particle in the two-dimensional space; the markers' spatial distribution represents the instantaneous spatial distribution of the self-propelled particles, and their color distribution represents the instantaneous angular distribution of these particles' directions of motion. Among the samples shown here, the five samples on the left are in the flocking phase, and the rightmost one is in the disordered phase. See text for more details.
Here, ∆t is the discrete time step, ϑ is a normalization operator [ϑ(w) = w/|w|], and N_i denotes the number of particles within U_i. The neighborhood U_i is a disk of radius r_0 centered at the location of particle i, and its determination is also subject to the periodic boundary conditions. The random unit vector ξ_i is a vectorial noise, with η ∈ [0, 1] being the noise level that captures the environmental fluctuations. A key feature of this system is that, at fixed density ρ ≡ N/L² of self-propelled active particles, a change in the noise level η can drive a first-order phase transition between the flocking phase and the disordered phase, as illustrated in Fig. 1, which is characterized by a jump in the system's global group velocity

v ≡ (1 / N v_0) | Σ_{i=1}^{N} v_i(t) |.

In the following, we focus on the case N = 2048, ρ = 2, L = 32, v_0 = 0.5, r_0 = 1 for instance; in this case, the jump in v appears at η_c = 0.626 ± 0.006, as one can see from Fig. 2(a). The concrete goal of our work is to apply LFRU to make the ANNs automatically extract this critical noise level η_c by analyzing data on the system's spatial distributions and velocity distributions, as shown in Fig. 1.
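To make the update rule concrete, here is a minimal NumPy sketch of one discrete time step of the vectorial-noise dynamics described above, together with the global group velocity. The function names, default parameter values, and the brute-force all-pairs neighbor search are our own illustrative choices, not code from the actual simulations behind Fig. 1 and Fig. 2.

```python
import numpy as np

def vicsek_step(x, v, eta, L=32.0, v0=0.5, r0=1.0, dt=1.0, rng=None):
    """One update of the Vicsek model with extrinsic (vectorial) noise.

    x : (N, 2) positions, v : (N, 2) velocities with |v_i| = v0,
    eta : noise level in [0, 1], periodic box of side L.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(x)
    # Pairwise displacements with the minimum-image (periodic) convention.
    d = x[:, None, :] - x[None, :, :]
    d -= L * np.round(d / L)
    neighbors = (d ** 2).sum(-1) <= r0 ** 2          # U_i, includes i itself
    Ni = neighbors.sum(1)                            # number of particles in U_i
    v_sum = neighbors.astype(float) @ v              # sum of neighbor velocities
    # Vectorial noise: one random unit vector per particle, weighted by N_i * eta.
    phi = rng.uniform(0.0, 2.0 * np.pi, N)
    xi = np.stack([np.cos(phi), np.sin(phi)], axis=1)
    w = v_sum + eta * Ni[:, None] * xi
    v_new = v0 * w / np.linalg.norm(w, axis=1, keepdims=True)  # theta-normalization
    x_new = (x + v_new * dt) % L
    return x_new, v_new

def group_velocity(v, v0=0.5):
    """Global group velocity v = |sum_i v_i| / (N v0), between 0 and 1."""
    return np.linalg.norm(v.sum(0)) / (len(v) * v0)
```

At eta = 0 a perfectly aligned flock stays aligned (group velocity 1), while increasing eta degrades the alignment, which is the mechanism behind the flocking transition discussed above.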

B. Machine learning
To investigate this active NEQ system of self-propelled active particles, whether via classification-based or regression-based approaches, one shall first prepare the data in a form suitable for analysis by ANNs. There are various ways to achieve this; here, for convenience, we adopt a way similar to the usage of ANNs in image processing applications like facial recognition, i.e., we prepare the data in the form of images. As illustrated in Fig. 1, in every sample, each circular marker represents a single self-propelled particle in the two-dimensional space, with the markers' spatial distribution representing the instantaneous spatial distribution of the self-propelled particles, and their color distribution representing the instantaneous angular distribution of these particles' directions of motion. Since we directly employ an industrially mature deep ANN architecture, the residual neural network (ResNet) [28], whose standard input size is 3 × 224 × 224, we accordingly prepare the samples as images of 224 × 224 pixels (with 3 RGB channels). These samples are then divided into three categories forming the training dataset, the validation dataset, and the test dataset.
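As an illustration of this data-preparation step, the sketch below rasterizes one configuration into a 3 × 224 × 224 array: positions map to pixel coordinates and directions of motion map to hues. This is only a stand-in under our own assumptions (one pixel per particle and a minimal HSV-style colormap); the authors' exact rendering of circular markers is not reproduced here.

```python
import numpy as np

def sample_to_image(x, v, L=32.0, size=224):
    """Rasterize one configuration into a 3 x size x size RGB array in [0, 1].

    Pixel position encodes each particle's location; pixel color encodes its
    direction of motion via the hue angle.
    """
    img = np.ones((3, size, size))                       # white background
    cols = np.clip((x[:, 0] / L * size).astype(int), 0, size - 1)
    rows = np.clip((x[:, 1] / L * size).astype(int), 0, size - 1)
    hue = (np.arctan2(v[:, 1], v[:, 0]) + np.pi) / (2 * np.pi)  # angle -> [0, 1)
    # Minimal HSV(h, 1, 1) -> RGB conversion.
    k = (hue[:, None] * 6.0 + np.array([5.0, 3.0, 1.0])) % 6.0
    rgb = 1.0 - np.clip(np.minimum(k, 4.0 - k), 0.0, 1.0)
    img[:, rows, cols] = rgb.T
    return img
```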
The so-called training of an ANN refers to traversing the samples in the training dataset for several epochs. In each epoch, the ANN works as a 3 × 224 × 224 → 1 map (for the ISP regression task) or a 3 × 224 × 224 → 2 map (for the binary classification tasks). The ANN yields a single value (ISP regression) or two values (binary classification) as outputs for every input sample, and a loss function is then calculated from these outputs and the samples' attached labels. Via backpropagation, the values of the ANN's numerous trainable parameters (e.g., the weights and biases among neurons) are optimized towards minimizing the loss function. For the ISP regression task involved in Sec. III A in the implementation of LFRU, the loss function can be the mean squared error between the reconstructed noise level η_R and the actual noise level η at which the input sample was generated. For the binary classification tasks involved in the implementations of "learning by confusion" and "learning with blanking", the loss function can instead be the cross-entropy between the confidence outputs and the class labels (we discuss these binary classification tasks in detail in Sec. III B and Sec. III C). To improve the generalization ability of the trained ANN, the values of the trainable parameters used after training are not those that minimize the loss function on the training dataset, but those that minimize it on the validation dataset. Eventually, the well-trained ANN with this final set of trainable parameters is applied to the test dataset to evaluate its learning performance. In the following, let us start with the ANN-based ISP regression.
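The validation-based parameter selection described above can be sketched as follows, with the deep ResNet deliberately replaced by a trivial one-parameter model y = w·x so the sketch stays self-contained; the function name, learning rate, and epoch count are our own illustrative assumptions. The point is the selection rule: keep the parameters that minimize the loss on the validation set, not the training set.

```python
import numpy as np

def train_with_validation(train, val, epochs=200, lr=0.05):
    """Gradient descent on MSE loss with validation checkpointing.

    train, val : (x, y) pairs of 1D arrays; the 'network' is y = w * x,
    a stand-in for the ResNet so the protocol itself is visible.
    """
    x_tr, y_tr = train
    x_va, y_va = val
    w, best_w, best_val = 0.0, 0.0, np.inf
    for _ in range(epochs):
        # MSE between reconstruction w*x and label y; one gradient step.
        grad = 2.0 * np.mean((w * x_tr - y_tr) * x_tr)
        w -= lr * grad
        val_loss = np.mean((w * x_va - y_va) ** 2)
        if val_loss < best_val:            # keep the best-on-validation weights
            best_val, best_w = val_loss, w
    return best_w, best_val
```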

C. Inverse statistical problem
The LFRU approach for automated detection of phases of matter utilizes the regression uncertainty in the ISP. To apply LFRU to the flocking phase transition of self-propelled active particles, we shall construct a concrete ISP regression task so that the ANN can learn to deal with it and thereby expose its regression uncertainty. Concerning the active NEQ system under consideration, corresponding to the forward thinking, i.e., finding the system's possible steady states (spatial distributions and velocity distributions, as shown in Fig. 1) for a given noise level η, one quite natural ISP is finding the possible value of η for every given steady state of the system. Since the data are generated by directly simulating the stochastic discrete-time dynamical equations (1), there inevitably exist a few samples that are highly similar but were actually generated at different η. As a result, among all the samples generated at the same noise level η, the reconstructed noise levels η_R obtained by ANNs (or any other method) will not be exactly the same, which gives rise to an intrinsic regression uncertainty U(η) for the reconstructed noise levels. Straightforwardly, we can use the standard deviation of the well-trained ANN's outputs to characterize this regression uncertainty at each η, whose explicit form reads

U(η) = [ ⟨ (η_R − ⟨η_R⟩)² ⟩ ]^{1/2},

where ⟨•⟩ denotes the average over all the samples generated at η in the test dataset. Reducing this intrinsic regression uncertainty is usually regarded as one of the central goals of the ISP itself, but here it is just a necessary intermediate product for revealing possible phase transitions, and theoretically it cannot be reduced to zero, so we do not intend to pursue its minimization. Further noticing that this is a complex scenario associated with a first-order phase transition in an active NEQ system, how to effectively and efficiently realize the ISP in the Vicsek model is already a non-trivial open question.
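Given per-sample predictions from a trained regressor, the regression uncertainty defined above is simply a per-noise-level standard deviation; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def regression_uncertainty(eta_true, eta_pred):
    """U(eta): standard deviation of the reconstructed noise levels eta_R
    over all test samples generated at the same actual eta."""
    levels = np.unique(eta_true)
    U = np.array([eta_pred[eta_true == e].std() for e in levels])
    return levels, U
```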
Traditional research methods have mostly focused on the inverse Ising problem, usually resorting to case-by-case methods such as mean-field approximations [29] and maximum likelihood estimation [30]. In this work, we directly utilize ANN-based machine learning techniques to perform the ISP regression.

III. AUTOMATED EXTRACTION OF CRITICAL NOISE LEVEL
A. Learning from regression uncertainty (LFRU)

The datasets involve 17 different noise levels in the range η ∈ [0.39, 0.71] with a constant spacing ∆η = 0.02. For each noise level η, there are 2000 samples in the datasets. As shown in Fig. 2(b), the reconstructed noise level η_R predicted by the well-trained ANN does not exactly match the actual noise level η, but almost. This suggests that ANNs can indeed learn the noise level η in this active NEQ system, overcoming potential interferences such as metastable states, and hence the outputs of the well-trained ANNs can naturally be considered as having a direct physical connection to the noise level η.
Since the automated detection of the flocking phase and the extraction of the critical noise level η_c are built upon the outputs of the well-trained ANNs, they thus also have a great possibility of being interpreted via conventional notions of physics. Having verified that ANNs can deal with the ISP regression in the active NEQ system under consideration, let us take a deeper look at the regression uncertainty U(η), i.e., the error bars of the regression results shown in Fig. 2(b). Its non-trivial information is actually "hidden" in Fig. 2(b), but when the noise-level dependence of the regression uncertainty U(η) is plotted explicitly in Fig. 3(a), one can clearly notice that the curve assumes an M-shape, and that the valley position η̄_c = 0.63 ± 0.01 is not located at the middle of the parameter region [0.39, 0.71], but automatically corresponds to the critical noise level η_c of the system (the vertical lines in Fig. 3 represent the critical noise level η_c = 0.626 ± 0.006 obtained via traditional methods, i.e., from the jump in v). This suggests that ANNs can successfully extract the critical noise level η_c of the flocking phase transition of self-propelled active particles.
These findings are consistent with our recent investigations establishing LFRU and demonstrating it on the Ising and q-state clock models [20], reflecting LFRU's good generality for various physical systems across interdisciplinary fields. To summarize, by utilizing the powerful regression ability of ANNs and its direct physical connection to conventional notions of physics, physicists applying LFRU simply need to provide the ANNs with the actual parameter value of each sample to train for the ISP regression task. The uncertainty of the regression results obtained by the well-trained ANNs can then be utilized to reveal possible phase transitions in the system under study. If there is only one distinct phase of matter, i.e., no phase transition exists in the parameter region examined by LFRU, the curve of the regression uncertainty has been demonstrated to exhibit only one trivial peak, without any non-trivial minimum. Once the curve of the regression uncertainty assumes the M-shape, it reveals that a phase transition is expected to exist in the examined parameter region, and the critical point of the phase transition can be extracted from the valley position of the regression uncertainty.
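The extraction step just summarized, i.e., locating a non-trivial interior minimum of U(η) and distinguishing an M-shaped curve from a single trivial peak, can be sketched as follows; the M-shape check here is our own minimal heuristic, not a procedure specified in the text.

```python
import numpy as np

def valley_position(levels, U):
    """Interior minimum of the U(eta) curve: the predicted critical noise
    level when the curve assumes an M-shape. Returns None when the minimum
    is not flanked by a peak on each side (no transition revealed)."""
    i = int(np.argmin(U[1:-1])) + 1          # exclude the two endpoints
    # An M-shape needs a higher value on each side of the valley.
    if U[:i].max() > U[i] and U[i + 1:].max() > U[i]:
        return levels[i]
    return None
```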

B. Learning by confusion
In the following, we shall also apply the widely used classification-based "learning by confusion" and "learning with blanking" approaches to revealing the same flocking phase transition of self-propelled active particles, as a direct comparison. To train ANNs for the classification tasks (more specifically, the binary classification tasks in this case), compared to the usage above, one shall change the loss function to a candidate suitable for classification rather than regression, e.g., the cross-entropy function. Moreover, here each sample shall be attached with a two-valued class label (C_1, C_2) that can be interpreted as probabilities, with the label (1, 0) indicating that the sample is 100% likely to be class-A, and the label (0, 1) indicating that the sample is 100% likely to be class-B. Accordingly, the two outputs of the ANN for each sample can also be interpreted as probabilities. For instance, an output (0.6, 0.4) means that the ANN has 60% confidence in recognizing the sample as class-A and 40% confidence in recognizing it as class-B. Naturally, the classification result given by the ANN is the class with the higher confidence.

Now let us start with the "learning by confusion" approach. This approach reveals possible phase transitions via monitoring the contrast between the ANN's good and bad recognition performance when confusing labels are deliberately attached to some of the samples. To implement it, one shall propose an arbitrary noise level η′_c, declaring that any sample satisfying η < η′_c (η > η′_c) belongs to class-A (class-B). In general, this binary classification rule has no justification at all, and these ambiguous classes are entirely unrelated to the distinct phases of matter in the system under study (e.g., the flocking phase and the disordered phase). After training, the ANN is applied to the test dataset to evaluate its learning performance on this binary classification task associated with η′_c. Among the m test samples generated at
different noise levels, when the trained ANN matches m′ of them with their attached class labels, its recognition performance is estimated by the classification accuracy P(η′_c) = m′/m. Repeating the above procedure for a series of different proposed η′_c, one can thus establish the η′_c dependence of the classification accuracy P(η′_c).
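As a runnable illustration of this scan, the sketch below replaces the ANN with a decision stump acting on a single scalar feature (the global group velocity), which, like a phase-sensitive classifier, can separate the two physical phases but not individual noise levels; the resulting accuracy curve then peaks at the true transition point. The synthetic feature, the stump stand-in, and all names are our own assumptions, not the paper's setup.

```python
import numpy as np

def confusion_scan(eta, feature, proposals):
    """'Learning by confusion' with the ANN replaced by a decision stump.

    eta : (m,) actual noise levels of the samples,
    feature : (m,) scalar feature per sample (here: group velocity),
    proposals : candidate critical noise levels eta'_c.
    Returns the accuracy P(eta'_c) of the best stump for each proposal.
    """
    accs = []
    for eta_c in proposals:
        labels = eta < eta_c                         # confusing class-A labels
        # Best stump: try every threshold on the feature, both orientations.
        thresholds = np.concatenate([[-np.inf], np.sort(feature), [np.inf]])
        best = 0.0
        for t in thresholds:
            for pred in (feature > t, feature <= t):
                best = max(best, np.mean(pred == labels))
        accs.append(best)
    return np.array(accs)
```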
As one can see from Fig. 3(b), the curve of the classification accuracy P(η′_c) assumes a W-shape. Notice that, apart from the two trivial choices η′_c = 0 and η′_c = 1, for which essentially no classification is performed, any arbitrary η′_c that does not match the physical flocking transition point η_c yields a way of labeling the samples that inevitably confuses the ANN during training through the wrong labels, and hence lowers the classification accuracy P(η′_c) of the trained ANN.
To see this, let us concretely discuss the η′_c > η_c case as an example (the η′_c < η_c case is similar). The class-A samples satisfying η_c < η < η′_c and the class-B samples satisfying η > η′_c are in fact both in the disordered phase, yet opposite class labels have been attached to them. How could the ANN learn the actually non-existent "difference" between these two classes? Meanwhile, these samples do look different from the other class-A samples satisfying η < η_c. How could the ANN then learn this actually non-existent "similarity"? Such confusing labels that deviate from the physical facts inevitably limit the recognition performance of the ANN, and hence the classification accuracy P(η′_c) will not be ideal. Naturally, for the same datasets, the closer η′_c is to the physical flocking transition point η_c, the fewer such confusing labels exist, leading to a relatively higher P(η′_c). This means that the classification accuracy P(η′_c) is expected to assume a non-trivial maximum exactly when η′_c = η_c. Therefore, the "learning by confusion" approach takes the peak position of the W-shaped curve of the classification accuracy P(η′_c) as the predicted critical noise level.
In Fig. 3(b), the average results of 20 well-trained ANNs suggest that the maximum of P(η′_c) corresponds to η̄_c = 0.62 ± 0.01, which matches well with the critical noise level η_c = 0.626 ± 0.006 obtained via traditional methods (see also the vertical lines in Fig. 3). These investigations manifest that the "learning by confusion" approach also remains practically effective for extracting the critical noise level η_c of the flocking phase transition of self-propelled active particles by analyzing the data on this active NEQ system's spatial distributions and velocity distributions, as shown in Fig. 1.

C. Learning with blanking
Now let us switch to the "learning with blanking" approach. This approach reveals possible phase transitions by directly utilizing the ANN's ability to identify various phases of matter. When all the samples are attached with appropriate labels consistent with the physical facts (i.e., the η′_c = η_c case above), even though the ANN is trained with only the samples corresponding to low and high noise levels, blanking out the intermediate noise levels, the ANN is still expected to accomplish its binary classification task with ease. More specifically, here in the training and validation datasets, the samples corresponding to η = 0.39, 0.41 (η = 0.69, 0.71) are labeled as class-A (class-B), and the other samples, corresponding to the intermediate noise levels 0.41 < η < 0.69, are not involved until the ANN is well-trained. After training, the ANN's recognition confidences are evaluated on the test dataset within [0.43, 0.67], for instance.
Fig. 3(c) shows the average confidences of 20 well-trained ANNs (all also ResNet, but independently trained and validated) for identifying the samples with η ∈ [0.43, 0.67] as class-A or class-B. The dashed and solid curves in Fig. 3(c) represent the noise-level dependence of the class-A confidence C_1 and the class-B confidence C_2, respectively. The intersection point of these two curves is located at η̄_c ≈ 0.625, which is the critical noise level predicted by the ANN via this approach: at the critical point, the instantaneous states of the system can be either in the flocking phase or in the disordered phase, resulting in equal confidences C_1(η) = C_2(η) of the well-trained ANN. This predicted critical noise level also matches the value η_c = 0.626 ± 0.006 obtained via traditional methods (see also the vertical lines in Fig. 3), manifesting that the "learning with blanking" approach is practically effective as well.
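Locating the intersection point of the two confidence curves can be sketched by linearly interpolating their difference between neighboring noise levels; this post-processing choice and the function name are our own illustration, since the text does not specify how the crossing is read off.

```python
import numpy as np

def confidence_crossing(eta, c1, c2):
    """Intersection of the class-A and class-B confidence curves:
    the predicted critical noise level in 'learning with blanking'."""
    d = np.asarray(c1) - np.asarray(c2)
    s = np.where(np.diff(np.sign(d)) != 0)[0]    # intervals with a sign change
    if s.size == 0:
        return None                              # curves never cross
    i = s[0]
    # Linear interpolation between the two bracketing noise levels.
    return eta[i] + (eta[i + 1] - eta[i]) * d[i] / (d[i] - d[i + 1])
```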

IV. COMPARISON
So far, we have witnessed in Fig. 3 that, without additional case-by-case designs for such a complex scenario, i.e., a first-order phase transition in an active NEQ system, the regression-based LFRU approach and the two classification-based approaches can all readily utilize the powerful abilities of ANNs for extracting the critical noise level η_c. In the following, let us further discuss their respective characteristics.

A. Efficiency
Efficiency is a fundamental demand for any practical research method. Comparing the ANNs performing the regression and classification tasks, there is only a slight difference in their network architecture, namely the number of output neurons (one for the ISP regression, two for the binary classification), which leads to almost equal computational complexity for traversing the same dataset once. Their loss functions' contributions to the computational complexity are also roughly the same, and their convergence speeds are close as well [20]. Consequently, in the implementations of LFRU and the "learning by confusion" approach, the time required to produce one well-trained ANN is approximately equal. However, a well-trained ANN in the "learning by confusion" approach yields only a single value of P(η′_c), and one must train a series of ANNs examining a series of different η′_c in order to locate the maximum of P(η′_c). In sharp contrast, when applying LFRU, a complete curve of U(η) is obtained directly from one well-trained ANN. This makes the total time required by LFRU for automated detection of phases of matter less than the total time required by the "learning by confusion" approach. As for the "learning with blanking" approach, it requires the least time, since it involves only a small part of the whole datasets; but this comes at a cost, as it requires prior physical knowledge to a much greater extent and thus cannot independently realize the automated detection of phases of matter.

B. Requirement on prior physical knowledge
To realize the automated detection of phases of matter, a practical research method based on machine learning techniques is naturally expected not to require prior physical knowledge about the phases of matter in the system under study. This topic is related to the concept of supervision in machine learning. In the terminology of machine learning, a supervised learning algorithm refers to an algorithm that involves labels attached to each sample as the target for learning. In this sense, all three approaches investigated in this work are technically supervised learning algorithms. However, as we have seen in Sec. II C and Sec. III A, when applying LFRU to reveal the flocking phase transition of self-propelled active particles, the labels are the actual noise levels η at which the samples were generated, while the physical target of applying LFRU is not to solve the ISP itself, but to extract the critical noise level of the flocking phase transition. The labels provide the ANN with knowledge about the ISP, having nothing to do with the phases of matter or possible phase transitions in this system. Therefore, concerning the applications of machine learning techniques in physical research, LFRU can be regarded as unsupervised. The "learning by confusion" approach is also unsupervised in the same sense [2], since the labels are attached according to an arbitrarily proposed noise level η′_c. But it is also noteworthy that its binary classification implies the prior judgment that at most two phases exist in the system, so it requires further modifications before it can be applied to complex many-body systems with distinct intermediate phases [6]. On the other hand, the "learning with blanking" approach takes the intersection point of two confidence curves as the system's phase transition point, which assumes that there exists one and only one phase transition. This makes it not only hard to readily handle the
systems with distinct intermediate phases, but also likely to face great challenges in phase coexistence and crossover scenarios, since a phase transition, a phase coexistence, and a crossover can all similarly lead to such an intersection [1]. In general, among these three approaches, "learning with blanking" requires the most prior physical knowledge, followed by the "learning by confusion" approach, while LFRU shows a relative advantage in this aspect.

C. Possibility of being interpretable via conventional notions of physics
Another fundamental requirement for automated detection of phases of matter is the possibility of physical interpretation. Due to the insufficient clarity regarding the underlying working mechanism of ANNs [12][13][14], the interpretability of the ANN-based machine learning techniques themselves is beyond the scope of our work, and in the implementations of all three approaches here we treat the employed ANNs as black-box maps. Even under such circumstances, however, one can still seek the physical interpretability of the machine learning results obtained with these black-box maps, connecting the results to conventional notions of physics. In this context, the final step of extracting the critical noise level via the classification-based approaches is completed by humans rather than by the ANNs, i.e., the data boundary point between class A and class B is directly taken as the physical phase transition point between the flocking phase and the disordered phase. This effectively injects additional physical knowledge about the flocking phase transition into the ANNs after the fact, which weakens the unsupervised nature of the approach: in practical application scenarios, i.e., for an unexplored system, this association requires further justification. In contrast, the ANNs in the regression-based LFRU faithfully learn the noise level η of the system under study, and its results can therefore be naturally connected to conventional notions of physics [20]. The outputs of the well-trained ANNs are the reconstructed system parameter η itself, and hence the statistical properties of these outputs (such as the regression uncertainty) reflect the system's statistical properties. When the statistical properties of the ANN outputs exhibit non-trivial behavior, for instance when the regression uncertainty U(η) assumes a non-trivial minimum at a certain noise level η = ηc, there is reason to expect that the system's statistical properties exhibit non-trivial behavior at that point as well. This justifies taking the noise level ηc corresponding to the non-trivial minimum of the regression uncertainty as the critical noise level of the flocking phase transition in this active NEQ system. Moreover, since an intrinsic connection between regression uncertainty and the system's response properties has been revealed in the Ising and q-state clock models [20], analogues of such a connection could also exist in the Vicsek model. These possibilities of numerically and theoretically connecting the machine learning results of ANNs to conventional notions of physics are rarely available for the classification-based approaches.
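The post-processing step from per-sample regression outputs to the estimate ηc can be sketched as follows. This is a toy example: the well-trained ANN's reconstructed noise levels are replaced by synthetic predictions whose spread follows an assumed M-shaped profile, and all numerical values (hump positions, widths, sample counts) are illustrative rather than taken from the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
eta_grid = np.linspace(0.1, 1.1, 21)
eta_c_toy = 0.6                             # hypothetical critical noise level of the toy profile

# Assumed M-shaped uncertainty profile: two humps flanking a valley at eta_c_toy
def toy_spread(eta):
    return (0.02 + 0.06 * np.exp(-((eta - 0.35) / 0.1) ** 2)
                 + 0.06 * np.exp(-((eta - 0.85) / 0.1) ** 2))

# Stand-in for the well-trained ANN's per-sample reconstructed noise levels
predictions = {eta: eta + toy_spread(eta) * rng.normal(size=2000) for eta in eta_grid}

# Regression uncertainty U(eta): spread of the reconstructed values at each true eta
U = np.array([predictions[e].std() for e in eta_grid])

# Locate the two humps of the M and take the valley between them as the estimate
left_hump = int(np.argmax(U[: len(U) // 2]))
right_hump = len(U) // 2 + int(np.argmax(U[len(U) // 2 :]))
eta_est = eta_grid[left_hump + np.argmin(U[left_hump : right_hump + 1])]
```

Restricting the minimum search to the interval between the two humps singles out the non-trivial interior valley of the M-shape, rather than the trivially small uncertainty at the edges of the scanned noise-level range.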

V. CONCLUSIONS
In this work, after training ANNs to perform the ISP regression in the active NEQ system of self-propelled active particles described by the Vicsek model, we find that the regression uncertainty of the well-trained ANNs contains hidden information that can be utilized to reveal the flocking phase transition in this system. The noise level η dependence of the regression uncertainty U(η) assumes an M-shape, providing new data-driven evidence for the existence of this phase transition. Its valley position provides a data-driven estimate ηc = 0.63 ± 0.01 for the critical noise level, which matches well with the value ηc = 0.626 ± 0.006 obtained via traditional methods. We further find that the regression-based LFRU approach developed in this work for automated detection of phases of matter in active NEQ systems has several distinctive characteristics and practical advantages. LFRU can complement the widely-used classification-based "learning by confusion" and "learning with blanking" approaches to serve as a promising generic toolbox, bringing a new perspective for investigating rich critical phenomena with ANN-based machine learning techniques, especially in those complex scenarios associated with first-order phase transitions in active NEQ systems where traditional research methods in physics could face difficulties. Owing to the powerful ability of ANNs in regression and its direct connection to conventional notions of physics, these findings could inspire and guide further work revealing, in the Vicsek model and Vicsek-like models, the connection between regression uncertainty and the system's statistical properties such as its response properties. In these regards, we believe that the development of LFRU will stimulate further efforts in both developing and applying physically interpretable machine learning approaches to unveil new physics in these active NEQ systems.

Figure 2. Inverse statistical problem in a self-propelled active particle system: (a) Noise level dependence of the system's global group velocity, whose jump at ηc = 0.626 ± 0.006 characterizes the first-order flocking phase transition; (b) Noise level dependence of the reconstructed noise level predicted by the well-trained ANN. The error bars represent the regression uncertainty U(η), and the diagonal line represents the ideal regression result ηR = η. See text for more details.

Figure 3. Revealing the flocking phase transition of self-propelled active particles via three different machine learning approaches: (a) the LFRU approach; (b) the "learning by confusion" approach; (c) the "learning with blanking" approach. See text for more details.