Avoiding Barren Plateaus

During the current NISQ era of quantum computation the most common algorithms used to solve problems are based on a hybrid variational approach, where the quantum hardware is used to implement a variational wave function and measure observables, whereas the classical computer is used to store and update the variational parameters. Such algorithms are known as Variational Quantum Algorithms (VQAs) and are the most promising approach in reaching quantum advantage in the near future. However, despite proven advantages, variational algorithms are limited by the existence of barren plateaus (BPs) , which are regions of the optimization landscape where the gradients become vanishingly small. This makes the optimization exponentially hard as the gradients of the cost function are on average zero and deviations vanish exponentially in system size, which is furthermore exacerbated by finite estimation statistics in case of Hamiltonian averaging procedures.

 

So far, numerous approaches have been proposed to minimize this limitation. For instance, one can avoid BP at the initialization stage of variational algorithms, by performing pre-processing that finds a favorable initial configuration that can lead the system towards the desired optimization solution without the gradient becoming small. Also, studying the relation between BPs and entanglement can potentially lead to controlling entanglement to mitigate BPs. However, these approaches are impractical to implement on current quantum devices due to the difficulty in measuring entanglement. In an attempt to diagnose and avoid BPs, the authors in this paper propose the notion of weak barren plateaus (WBPs), which emerge when the entanglement of a local subsystem exceeds a certain threshold identified by the entanglement of a fully scrambled state. Utilizing WBPs, the authors propose a general algorithm to avoid barren plateaus that can be implemented on available NISQ devices, both at initialization and during the optimization.

 

The proposed algorithm estimates the expectation value of the cost function, its gradients, and the second Rényi entropy of small subsystems using classical shadow estimation during the optimization process. For the optimization procedure, gradient descent is used. The classical shadows protocol enables the tracking of the second Rényi entropy, which allows for an efficient diagnosis of the WBP both at the initialization step and during the optimization process of variational parameters. Upon encountering a WBP, as a certain subregion having a sufficiently large Rényi entropy, the algorithm restarts the optimization process with a decreased value of the update step. This process is controlled by a certain learning rate. The overall results show that the absence of WBPs is a sufficient condition for avoiding BPs.

 

The work further includes numerical simulations utilizing variational quantum eigensolvers (VQE) to test the proposed algorithm. For this, a WBP-free initial state is prepared using small qubit rotation angles. The algorithm is then implemented with different learning rates to compare the optimization performance. The results show that lower learning rates avoid WBPs during the optimization cycle, however if the learning rate is further decreased after an optimum value, then the algorithm converges slower and therefore to a larger (worse) energy due to the less than necessary training iterations. Hence, the learning rate should be sufficiently low enough to avoid BPs, but at the same time high enough such that it does not degrade the performance.

 

The results further indicate that entanglement also plays a crucial role in circumventing BPs and is important for achieving a good optimization performance. Moreover, observing the optimization performance according to the learning rate used, indicates that the system might get stuck in local minima characterized by large entanglement. The proposed way to avoid such areas is to define a parameter ‘alpha’ that suggests the presence of a WBP. It was found in the experimentation that setting the parameter ‘alpha’ to less than 1 can be beneficial for the optimization. This can help in avoiding low-quality local minima that are characterized by higher entanglement. However, this requires the entanglement structure of the target state to be previously known. One potential direction to further explore is whether an algorithm can be designed where the learning rate can be dynamically adjusted at every step of the optimization process and not only when a WBP is encountered. This may allow for efficiently maneuvering complicated optimization landscapes by staying clear of highly entangled local minima. Avoiding barren plateaus is a critical step in increasing the performance of any VQA, therefore it will be of interest to test the applicability of the techniques introduced here for quantum machine learning, quantum optimization, or variational time evolution.