Quantum Machine Learning (QML) models often consist of a data embedding followed by a parametrized quantum circuit often called Quantum Neural Network (QNN). So far, QNNs have played a significant role in quantum machine learning applications in function approximation, handling big data, modelling social networks, associative memory devices, and automated control systems. However, the training of such NNs requires optimizing the QNN’s parameters, which is a non-deterministic polynomial time problem.

The authors in this work employ the technique of over-parameterization, which is the regime where one trains a NN with a larger capacity than required to represent the distribution of the training data. Therefore, the QNN has more parameters to explore all relevant directions in the Hilbert space. Over-parametrizing a NN can improve its performance by improving the trainability and reducing generalization errors, leading to faster convergence. In this work, the authors provide a theoretical framework for the over-parametrization of QNNs and describe its correspondence to a computational phase transition, where the trainability of QNNs is vastly improved due to a landscape with less local minima. The work proposes three theorems to analyze the over-parameterization phenomenon followed by numerical verifications.

The first theorem derives an upper bound as a sufficient condition for over-parametrization. This indicates that for a general type of periodic-structured QNNs, an over-parameterized regime can be reached by increasing the number of parameters past some threshold critical value. This value is related to the dimension of the Dynamical Lie Algebra (DLA) associated with the generators of the QNN. The second theorem provides an operational meaning to the over-parameterization definition in terms of the model’s capacity i.e. increasing the number of parameters can never increase the model capacity beyond the upper bound derived in the first theorem. Lastly, the third theorem shows that the maximum number of relevant directions around the global minima of the optimization problem is always smaller than the upper bound, hence imposing a maximal rank on the Hessian when evaluated at the solution.

For numerical demonstration, three different optimization tasks were considered: i) finding the groundstate energy with a Variational Quantum Eigensolver (VQE), ii) decomposing a unitary to quantum gates through unitary compilation, and iii) compressing information with quantum autoencoding. For VQE, the problem sizes of n = 4, 6, 8, 10 qubits were considered for ansatzes with different depths L with both open and closed boundary conditions. After averaging over 50 random parameter initializations, it is observed that the loss function decreases exponentially with each optimization step when the number of layers is large enough. Similarly for unitary compilation, the numerical results suggest that as the depth of the circuit increases, the convergence towards the global optimum improves dramatically until reaching a saturation point. Lastly, the results for quantum autoencoding were obtained for a system of n = 4 qubits and for the same hardware efficient ansatz used for unitary compilation. Here, it was again observed that an over-parameterized QNN is able to accurately reach the global optima in a few iterations. The value of the obtained rank in contrast to the dimension of the DLA suggests that over-parameterization can be achieved with a number of parameters that is much smaller than the upper-bound.

It is noteworthy to mention a crucial difference between the results obtained for the Hamiltonian variational ansatz (theoretical) and for the hardware efficient ansatz (numerical). In the case of Hamiltonian variational ansatz, the dimension of the DLA scaled polynomially with n, whereas for the hardware efficient ansatz the dimension of the DLA grows exponentially with the system size. This implies that the system becomes over-parameterized at a polynomial number of parameters for the first case, while in the latter case one requires an exponentially large number of parameters. Since most ansatzes used for QNNs are ultimately hardware efficient ansatzes which exhibit barren plateaus, this may require an exponential number of parameters to be over-parameterized, which requires more effort in designing scalable and trainable ansatzes. Finally, a take-away message of this work is that QNN architectures with polynomially large DLAs will have more favorable loss landscapes due to the lack of barren plateaus.

One direction of future research would be to consider more complicated loss functions consisting of functions of different QNN contributions, such as those used in Quantum Circuit Learning for supervised learning or Differentiable Quantum Circuits for solving differential equations. It remains an open question whether such loss function would exhibit similar traits as those single-Hamiltonian minimizations considered here.
11 September 2021

Quantum Classifiers

Recently, there has been extensive research into quantum machine learning models and quantum algorithms that rival their classical counterparts. Some notable examples are image recognition advancements, automated driven cars, identification of skin cancers, and prediction of protein structures. However, one of the most well-known branches of machine learning is classification, which is widely applied in commercial and academic applications ranging from face recognition and recommendation systems to earthquake detection, disease diagnosis and further handling complex classification tasks.

Previous studies on quantum classifiers have extended popular classical classification algorithms to the quantum domain such as the quantum support vector machine, quantum nearest neighbor algorithms, and quantum decision tree classifiers, exhibiting the potential of providing quadratic or even exponential speedups.

This review paper introduces some important algorithms and models for quantum classifiers including i) quantum support vector machines, ii) quantum decision tree, iii) quantum nearest neighbors algorithms, iv) quantum annealing based classifiers, and v) variational quantum classifiers. The experimental advancements of the considered classifiers are also discussed along with theoretical progress.

So far, quantum kernel methods used in quantum support vector machines are proven to be the most successful to handle classification tasks. In that scenario, the quantum computer can be used to map the classical data to a Hilbert state space and estimate the inner products between those states to obtain the kernel matrix, which is further handled by classical computers.

Another strong candidate for classification is the quantum decision tree model, that uses von Neumann entropy for choosing the attributes to split the nodes. Recently proposed binary quantum decision tree classifier, named Q-tree, is based on a probabilistic approach where a quantum computer can be used to accomplish the probabilistic traversal of the decision tree via measurements, while tree inductions and predictions of query data can be integrated into this framework as well.

For the case of the quantum nearest neighbors algorithm, the authors discuss two algorithms. First is the quantum version of the nearest centroid algorithm which shows an exponential speedup for both the size of the training set and the dimension of the vectors. However, this is inefficient in few situations like when the centroid of the subset lies in the space of other subsets. The issue can be overcome by the second proposed algorithm, which includes a quantum random access memory, amplitude estimation, and the Dürr Høyer minimization algorithm. This algorithm can be implemented for the quantum nearest neighbors algorithm with a quadratic speedup over its classical counterparts.

Another type of classifier is the quantum annealing classifier. As the name suggests, quantum annealing is used for classification tasks reaching high levels of generalization and physical interpretability for various applications such as anomaly detection, Higgs boson classifications, classifying and ranking binding affinities, etc. In quantum annealing, two Hamiltonians are considered: a simple and easy to prepare Hamiltonian with a known ground state, and a problem Hamiltonian whose ground state represents the solution of the problem. By slowly evolving the simple Hamiltonian towards the problem Hamiltonian, one can find the solution of the problem. Quantum annealing uses quantum tunneling to explore the problem space and to avoid local minima.

The last type of quantum classifier that is discussed are variational quantum classifiers (VQCs), which are a subclass of quantum neural networks. As such, the parameters in a classification task will be optimized during the training process to minimize a predefined loss function. Many structures and protocols for VQCs have already been proposed, mainly addressing issues like gradient descent evaluation, neural network structure (convolutional, recurrent, etc.), avoiding barren plateaus, and vulnerability to adversarial attacks.

In terms of experimental advancements and proof-of-concept implementations to physical hardware, the authors discuss the following implementations. The distance-based quantum classifier was demonstrated in the superconducting platform of IBM Quantum Experience with a classic-quantum hybrid approach. The implementations for quantum nearest neighbors algorithms can be carried out using single photons acting as qubits generated through spontaneous parametric down-conversion. A nearest centroid classifier has been experimentally demonstrated on a eleven-qubit trapped ion quantum computer with a quantum device that is now commercially available via IonQ’s cloud service. Variational quantum classifiers have been experimentally implemented on the superconducting platform to classify artificially generated data. In this experiment, a variety of circuit depths was investigated, and it was shown that when error mitigation techniques were employed alongside a depth four circuit, 100% success rate was achieved.

Nevertheless, when discussing quantum advantage from the perspective of computational complexity, it has been proven that the classical optimization problems for variational quantum algorithms are NP-hard, even for classically tractable systems which are composed of only logarithmically many qubits or free fermions. Moreover, quantum advantage in machine learning can also be viewed in terms of the number of times that a quantum process E has been accessed. It has been proven that for any input distribution, a classical machine learning model can achieve accurate predictions on average by accessing E with times comparable to the optimal quantum machine learning model. By comparison, an exponential quantum advantage is possible for achieving an accurate prediction on all inputs. However, despite proven potential advantage, there are several challenges for quantum classifiers that limit their practical applications. First, implementing quantum random access memory (QRAM) with a desirable scale at the current stage. Second, due to inevitable noise in the quantum devices, it is crucial to increase the fidelity of the quantum operations and develop better error correction techniques. Third, further exploration of all aforementioned structures and protocols is required to identify potential advantage against their classical counterparts for future applications.

Based on these limitations, there is no commercial quantum classifier available at the moment. However, with the advancements of hardware platforms, quantum algorithms, and quantum error correction techniques, quantum classifiers are expected to provide significant advantage in the near future.
Variational Quantum Algorithms (VQAs) are one of the most prominent methods used during the Noisy Intermediate Scale Quantum (NISQ) era as they adapt to the constraints of NISQ devices. VQAs are used in a wide range of applications from dynamical quantum simulation to machine learning. NISQ devices do not have the resources to exploit the benefits arising from quantum error correction (QEC) techniques, therefore such algorithms suffer from the effects of noise, which decreases their performance. As a substitute of full QEC, many algorithms employ error mitigation techniques to counter the effects of noise. An example of a VQA is the Variational Quantum Eigensolver (VQE), a setting which does offer some strategies to mitigate errors, however additional strategies are required to complement the existing advantages over noise.

It has been proven that noise can significantly decrease the trainability of linear (or super-linear) depth VQAs. One effect of noise is the flattening of the cost landscape as a function of the variational parameters, thus requiring an exponential precision in system size to resolve its features. This phenomenon is known as Noise-Induced Barren Plateaus (NIBPs), because the algorithm can no longer resolve a finite cost gradient (known as Barren Plateau). It can be advantageous to pursue algorithms with sublinear circuit depth and utilizing strategies that limit the hardware noise.

In this work, the authors investigate the effects of error mitigation on the trainability of noisy cost function landscapes in two regimes: i) asymptotic regime (in terms of scaling with system size) and ii) non-asymptotic regime for various error mitigation strategies including Zero Noise Extrapolation (ZNE), Virtual Distillation (VD), Probabilistic Error Cancellation (PEC), and Clifford Data Regression (CDR). The error mitigation protocols are subjected to a class of local depolarizing noise, known to cause NIBP. The first case of an asymptotic regime was considered in terms of scaling with system size. The theoretical results show that, if VQA suffers from exponential cost concentration, the error mitigation strategies cannot remove this exponential scaling, implying that at least an exponential number of resources is required to extract accurate information from the cost landscape in order to find a cost-minimizing optimization direction.

In the second case, it is shown that VD decreases the resolvability of the noisy cost landscape, and impedes trainability. ZNE also shows similar results under more restrictive assumptions on the cost landscape. It is also observed that any improvement in the resolvability after applying PEC under local depolarizing noise exponentially degrades with increasing number of qubits. Finally, for Clifford Data Regression, there is no change to the resolvability of any pair of cost values if the same ansatz is used.

The work also numerically investigates the effects of error mitigation on VQA trainability for the case when the effects of cost concentration is minimal. This is done by simulating the Quantum Approximate Optimization Algorithm (QAOA) for 5-qubit MaxCut problems on a realistic noise model of an IBM quantum computer, obtained by gate set tomography of IBM’s Ourense quantum device. The results compare the quality of the solutions of noisy (unmitigated) and CDR-mitigated optimization and demonstrate that CDR-mitigated optimization outperforms noisy optimization for all considered implementations.

Unlike all the considered error mitigation strategies, it is shown that CDR reverses the concentration of cost values more than it increases the statistical uncertainty as it has a neutral impact on resolvability under a global depolarizing noise model. Also, it can remedy the effects of more complex noise models. This indicates that CDR could resolve trainability issues arising due to corruptions of the cost function outside of cost concentration, whilst having a neutral effect on cost concentration itself, and thus improving overall trainability. A potential direction for future work will be investigating other mechanisms which can allow mitigation strategies to improve the trainability of noisy cost landscapes. As is known from the theory of error correction, once sufficient resources exist, then NIBPs can indeed be avoided.
30 August 2021

Quantum Alphatron

Quantum machine learning has recently been one of the prominent areas in the field of quantum computing where quantum algorithms can be generalized to solve important problems. For specific problems like search and integer factoring, quantum algorithms can be generalized to important subroutines replacing linear algebra and machine learning. More accurately, the quantum search can be generalized to amplitude amplification allowing speedups for sampling and estimation tasks. Quantum factoring contains the phase estimation subroutine, which can be used for decomposing a matrix into eigenvalues and eigenvectors. Also, quantum kernel methods and quantum gradient computation are being actively researched for various machine learning applications.

Many quantum algorithms use heuristic methods similar to classical machine leaning. In the classical regime, it is difficult to obtain provable guarantees regarding the quality of learning and the run time. However, in quantum regimes, some algorithms retain the benefits of provable classical algorithms for learning and at the same time examine provable quantum speedups for machine learning. One such quantum algorithm that is inspired by its classical analogue is presented in this paper. It is called Alphatron and the classical algorithm is enhanced by quantum sub-routines. It is a gradient-descent-like algorithm for isotonic regression with the inclusion of kernel functions. This algorithm can be used to learn a kernelized, non-linear concept class of functions with a bounded noise term, hence can be exploited to learn a two-layer neural network, where one layer of activation functions feeds into a single activation function. In this work, the authors provide quantum speedups for the Alphatron and its application to non-linear classes of functions and two-layer neural networks.

The work considers pre-computation of the kernel matrix used in the algorithm and provides three sets of algorithms: Alphatron with pre-computation (Pre), approximate pre-computation (Approx Pre), and quantum estimation pre-computation (Q-Pre). Each algorithm offers improvement on the run-time complexity by pre-computation and quantum estimation, eventually improving the original Alphatron. The algorithms were tested against 'n' dimensional data, which is comparatively larger than the other parameters, thus showing relevance to many practical applications. With the aid of pre-computation, the complexity term which is quadratic in the total size of the dataset (m) and linear in "n" dimensions of data, loses a factor of T (number of iterations), therefore improving the performance. Furthermore, utilizing quantum amplitude estimation, a gain of quadratic speedup can be obtained along the dimension.

The work further exploits a setting where the samples are provided via quantum query access in order to harness quantum subroutines to estimate the entries of the kernel matrices used in the algorithm. This access can be provided by quantum random access memory (QRAM) via the GroverRudolph procedure. Such a device stores the data in (classical) memory cells but allows for superposition queries to the data. The cost of the resources is proportional to the length of the vector to set up, but then can provide a run-time of the superposition state that is logarithmic in the length of the vector.

The quantum subroutines used in this work are adaptations of the amplitude estimation algorithm. The results show that the learning guarantees can be preserved despite the erroneous estimation of the kernel matrices. Also, there are estimations inside the main loop of the algorithm which can be replaced with quantum subroutines while keeping the main loop intact. Moreover, in the case where data dimension 'n' is much larger than the other parameters, quantum pre-computation requires asymptotically more time than Alphatron with Kernel, since the same Alphatron with Kernel algorithm is used after the preparation of the kernel matrices. Therefore, optimizing Alphatron with Kernel is not cost-effective when the cost of preparing the data of size n is taken into account. Finally, further exploration of a potential quantum speedup is required for Alphatron with Kernel, especially for the case of pre-computation occurring prior to the algorithm.
26 August 2021

Honeycomb Memory

When designing fault-tolerant quantum memories, geometrical codes have shown significant success. A frequently used geometrical code is the surface code, mainly because it has proven higher circuit-level thresholds among other quantum codes, requiring only 4-body measurements. However, adding more locality, therefore decreasing the number of n-body measurements, can simplify quantum error-correction circuits, either through making the physical layout more sparse or making the syndrome circuit more compact. This can be achieved by decomposing the parity constraints into non-commuting measurements of unprotected degrees of freedom. The crucial limitation of adding more locality is that it often compromises the quality of error-correction, resulting in less information collected about the errors.

One of the potential candidates for building quantum codes is Kitaev's honeycomb model, which requires only 2-body measurements. In the Kitaev’s honeycomb model, the commuting operators serve as low-weight parity constraints, hence the resulting subsystem code from these constraints encodes no protected logical qubits. Based on that model, Hastings and Haah defined a static subsystem code, which they termed honeycomb code, that manifests ‘dynamic’ logical qubits instead of a static subsystem code. These qubits are encoded into pairs of macroscopic anticommuting observables changing in time, which when combined with fault-tolerant initialization and measurement, can lead to the design of a robust error-corrected quantum memory.

In this work, the authors quantify the robustness of logical qubits preserved by the honeycomb code using a correlated minimum-weight perfect-matching decoder. Using Monte Carlo sampling, the work estimates both thresholds and qubit counts required for finite-sized calculations at scale using the honeycomb memory, along with comparisons with the surface code. The authors consider three noisy gate sets: Noisy Gateset, Measurement Ancillae, and Honeycomb Cycle Length, each with an associated circuit-level error model controlled by a single error parameter p. For each of these error models, three figures of merit are considered, namely: the threshold, the lambda factor, and the teraquop qubit count, each one of them depending on the code, decoder, and error model. The notion of the teraquop qubit count is introduced as a good quantification of a code’s overall finite-size performance.

The honeycomb code was simulated using two pieces of software: i) Stim and ii) in-house minimum-weight perfect-matching decoder. For each code and error model, Stim generates a circuit file describing the stabilizer circuit and analyzes each error mechanism in the circuit, determining which detectors the error violates and which logical observables the error flips. A “graph-like” error model is obtained for the decoder which converts it into a weighted matching graph. The decoder optionally notes how errors with more than two symptoms have been decomposed into multiple graph-like (edge-like) errors, and derives reweighting rules between the edges corresponding to the graph-like pieces. These reweighting rules are used for correlated decoding and estimating the logical observable measurement outcome based on the detection event data provided.

When entanglement is generated using two-qubit unitary gates, the numerical results demonstrate a threshold of 0.2% − 0.3% for the honeycomb code compared to a threshold of 0.5% − 0.7% for the surface code in a controlled-not circuit model. It is evident that in this scenario, the 2-body measurements of the honeycomb model are not advantageous in the error correction scheme. The honeycomb memory requires between a 5-10 times qubit overhead to reach the teraquop regime compared to the surface code at 10^-3 error rates.

However, entanglement is instead generated using native two-body measurements (which is advantageous for quantum hardware due to less crosstalk), the honeycomb code achieves a threshold of 1.5% < p < 2.0%, where p is the collective error rate of the two-body measurement gate, including both readout errors and depolarizing noise. The comparative physical qubit savings of a teraquop honeycomb memory at a 10^−3 error rate is several orders of magnitude, because of the proximity to the surface code’s threshold.

The two-body locality of the honeycomb code allows to jointly measure the data qubits directly. This reduces the number of noisy operations in a syndrome cycle compared to decomposing the four-body measurements of the surface code, and eliminates the need for ancilla qubits. As a result, with two-body measurements at a physical error rate of 10^−3, a teraquop honeycomb memory requires only 600 qubits. An important direction to explore is whether the honeycomb code can be embedded naturally in a planar geometry. The main issue would be the proper introduction of boundaries, in a way that does not compromise certain properties of the code, similarly to the surface code. Finally, explicit constructions of the application of logical gates can be another challenging target problem.
The capabilities of present-day NISQ devices are constrained by high noise levels and limited error mitigation of qubits. Hence, optimizing NISQ algorithms to use a limited amount of resources is an essential requirement to reduce the effect of errors. One strategy is to use a hybrid classical-quantum approach that employ shallow quantum circuits to solve the “hard” part of the problem. These shallow circuits are more resilient to noise by construction and require limited resources. Variational Quantum Algorithms (VQA) such as the Variational Quantum Eigensolver(VQE) and Quantum Approximate Optimization Algorithm(QAOA) have shown great promise in exploiting this hybrid approach, where a cost function is evaluated in the quantum circuit whose parameters are optimized by a classical non-linear optimizer. The application areas cover an extensive range including chemistry, machine learning, circuit compilation, and classical optimization among others.

Although VQA algorithms are shown to be resilient against coherent errors, the qubits utilized in the quantum circuit are still affected by decoherence. Also, due to high qubit requirements, quantum error correction methods cannot be utilized with VQAs to overcome the effect of decoherence. Another significant source of error that limits the current capability of quantum devices is the readout error caused by the imperfect measurement devices. To combat the noise, one can have the circuit evolution taking place only on a subspace of the full Hilbert space, which will lead to valid and invalid measurement outcomes. The invalid measurement outcomes are attributed to the noise and are discarded. Another helpful approach in combating noise is encoding the data in a format that indicates errors in a straightforward way. One popular approach is one-hot encoding, which results in one-hot quantum states. Such type of binary encodings are used to obtain qubit-saving formulations for the Travelling Salesman Problem, graph coloring problem, quadratic Knapsack problem, and Max k-Cut problem.

In this paper, the authors propose schemes for error mitigation in variational quantum circuits through mid-circuit post-selection, which is performed by injecting the error mitigation sub-circuit consisting of gates and measurements, that detects errors and discards the erroneous data. The work presents post-selection schemes for various encodings which were used to obtain different valid subspaces of quantum states to be used with VQA while solving particular combinatorial optimization problems and problems from quantum chemistry. The encoding methods are i) k-hot, ii) one-hot iii) domain wall encoding iv) Binary and gray encoding and v) one-hot and binary mixed. The measurement was done via two approaches: post-selection through filtering and post-selection through compression. The advantage of the second approach is that it does not require ancilla qubits.

The work implements one-hot to binary post-selection through a compression scheme to solve the Travelling Salesman Problem (TSP) using the Quantum Alternating Operator Ansatz (QAOA) algorithm. In the case of the one-hot method, encoding works by compressing the valid subspace to a smaller subspace of quantum states and differentiates from the known methods. The experiment results show that for amplitude damping, depolarizing, and bit-flip noise, the mid-circuit post-selection has a positive impact on the outcome compared to just using the final post-selection as a criterion. The proposed error mitigation schemes are qubit efficient i.e. they require only mid-circuit measurements and reset instead of a classical “if” operation. The presented methods are currently applicable to existing NISQ algorithms, as well as outside the scope of VQA and with different objective Hamiltonians. Furthermore, mid-circuit measurements have been increasingly made available by providers of quantum hardware such as IBM and Honeywell.

The ancilla-free post-selection through compression scheme can be applied to any problem where the feasible states are one-hot, including the problems defined over permutations such as Vehicle Routing Problem, variations of TSP, Railway Dispatching Problem, Graph Isomorphism Problem, and Flight Gate Assignment Problem. There are multiple factors that should be considered when designing such schemes, including the complexity of the post-selection, the form of the feasible subspace S, the strength, and form of the noise affecting the computation. It is advantageous to design methods that would choose the optimal number (and perhaps the position) of mid-circuit post-selections to be applied automatically. Utilizing such optimized error mitigation schemes can lead to high level of cancellation of noise in NISQ devices.
The classical world is fundamentally governed by quantum mechanics. When attempting to simulate models of nature, one has to incorporate its quantum-mechanical nature. In many cases, simulating the quantum-mechanical nature is synonymous to quantum time evolution (QTE), which is the process of evolving a quantum state over time with respect to a Hamiltonian H. Current applications of QTE ranges from simulating Ising models, approximating quantum Gibbs states, and solving combinatorial optimization problems.

The implementation of QTE on an actual physical system such as a gate-based quantum computer requires the respective evolution to be translated into quantum gates. This translation can be approximated with approaches like Trotterization. However, this approach may lead to deep quantum circuits depending on the evolution time and expected accuracy, thus is not well-suited for near-term quantum computers. An alternative approach is Variational quantum time evolution (VarQTE) which approximates the target state with a state whose time dependence is projected onto the parameters of a variational ansatz and can be used to simulate quantum time dynamics with parameterized quantum circuits. Thus, it can utilize near-term devices for solving practically relevant tasks. However, available parameterized unitaries possess limited expressivity due to the usage of short circuits which results in an approximation error with VarQTE.

The authors in this work derive global phase-agnostic error bounds for real and imaginary time evolution based on McLachlan’s variational principle. The error bounds for variational quantum real and imaginary time evolution are referred to as VarQRTE (Variational Quantum Real Time Evolution) and VarQITE (Variational Quantum Imaginary Time Evolution), respectively.

The authors proposed theoretical derivations of the error bounds for the Bures metric between a state prepared with VarQTE and the target state. The preparation of the state was taken in infinitesimally small time steps in order to prevent large errors in the numerical simulations. To demonstrate the efficiency of the derived error bounds, the authors present 3 examples; an illustrative Hamiltonian (Hill), an Ising model and two qubit hydrogen molecule approximations. Furthermore, different ODE formulations and solvers (Euler & RK54) were used to implement VarQRTE and VarQITE, with the final goal being the comparison between the error bounds.

For VarQRTE, the results show very tight error bounds for (Hill) when the Euler method was implemented with 100 time steps, as well as an adaptive step size RK54 ODE solver. The results show that RK54 achieves better error bounds as well as smaller fluctuations in the system energy while using significantly less time steps. In the case of the Ising model, the argmin ODE, which is analytically equivalent to solving the standard ODE with a least square solver, leads to smaller errors than the standard ODE. Moreover, the experiment which uses RK54 and the argmin ODE, leads to the smallest state error as well as error bound in the case of hydrogen Hamiltonian. For VarQITE, the application of RK54 reduces the error in the integration in comparison to standard ODE when implemented on (Hill). However, the argmin ODE performs better than the standard ODE by a small margin in case of the Ising model. For VarQITE applied to Hydrogen, the argmin ODE was used with RK54 and all gradient errors were reduced to 0.

The overall results suggest that the error bounds are good approximations to the actual error in case the gradient errors are small, otherwise they can grow fast, exceeding the meaningful range. Also, the error bounds are strongly dependent on the numerical integration method used. For instance, the numerical integration error introduced by forward Euler can easily exceed the error bounds beyond the useful range, however an adaptive step size scheme such as RK54 can significantly increase the numerical stability and accuracy. Furthermore, ODE formulation based on the minimization of the local gradient error also contributes to reducing the simulation errors. An open question for future investigation is to derive error bounds that are applicable for larger errors and quantification of the influence of quantum hardware noise on these error bounds. Finally, a critical investigation of the behavior of VarQTE and the respective error bounds is crucial at designated points, such as phase transitions in order to unlock the full potential of the QTE simulation technique.
The main objective of error mitigation techniques in quantum computing algorithms is to make quantum computation more efficient by reducing the noise level due to noisy quantum hardware to a low level. The main advantage of error mitigation techniques compared to error correction techniques lies in the lower resource requirements.

There already exist a variety of error mitigation techniques. A popular example is ZNE, which relies on collecting data at different noise levels and fitting an extrapolation to the noiseless case. However, this is limited to small size problems and is ineffective in high noise regimes. On the other hand, classically simulable circuits based on the Clifford data regression (CDR) method can be used to inform researchers about noise in a quantum device. CDR constructs a training set using data obtained from classically simulable near-Clifford circuits and performs a regressive fit on an ansatz mapping noisy values to exact expectation values. The fitted ansatz is used to estimate the value of the noiseless result. The combination of both methods is proven to be outperforming either individual method. Another recent method is Virtual Distillation (VD), which mitigates noise by simultaneously preparing multiple copies of a noisy state, entangling them, and making measurements to virtually prepare a state which is purer than the original noisy states, achieving exponential error suppression.

In this work, the authors propose to unify zero noise extrapolation (ZNE), variable noise Clifford data regression (vnCDR), and virtual distillation (VD) into one mitigation framework called UNIfied Technique for Error mitigation with Data (UNITED). The objective is to combine the advantages of each method in order to improve the mitigation of noise. The methods were applied to the mitigation of errors in local observables, which are measured from states prepared with random circuits. Each of these methods has different strengths and resource requirements, but their performance with 10^5 - 10^10 total shots was considered enough to mitigate the expectation value of an observable of interest.

The work combines the ideas of near-Clifford circuit data and the multi-copy data from VD to formulate Clifford guided VD (CGVD), which uses Clifford data to guide a Richardson-like extrapolation to the many copy limit. Furthermore, the work generalizes this method to perform Clifford-guided double Richardson-like extrapolation to the zero-noise limit. It is also shown that performance improves if the CDR training data are supplemented with data from the training circuits that run under increased noise. Similarly, supplementing the vnCDR training set with results obtained by VD will improve its performance.

A realistic simulation of a noisy trapped-ion device was used to enable all-to-all connectivity and to benchmark UNITED and the other methods for problems with various numbers of qubits, circuit depths, and total numbers of shots. The results show that different techniques are optimal for different shot budgets. For instance, ZNE is comparatively more efficient for small shot budgets (10^5), while Clifford-based approaches perform better for larger shot budgets (10^6 - 10^8). UNITED outperforms all other methods for shot budgets on the scale of 10^10.

These results offer new hope for future work in the direction of further unification with other error mitigation techniques such as noise-resilient quantum compiling, verified phase estimation, and specific approaches leveraging symmetries and/or post-selection. Another potential improvement lies in the timing of measurements of new expectation values (i.e. more near-Clifford circuits, more noise levels, more copies of the state, etc.) versus expending more shots per expectation value in order to get the best mitigation possible, given fixed resources. Finally, investigation of the effects of noisy controlled derangement on VD and UNITED performance is crucial in determining the maximum potential of these methods.
Quantum Machine Learning (QML) techniques have been rigorously researched in the past few years, showing great promise in a number of applications, using various quantum algorithmic techniques and potential speedups as compared to its classical counterparts. Many of these techniques exploit the high dimensionality of the Hilbert space through kernel methods while most exponential speedups require fault-tolerant quantum computers with access to a quantum memory (QRAM).

While previous works have successfully established quantum analogues to common classical approaches like classification, clustering, regression, and other tasks in data analysis, there has been a lack of a clear demonstration of a quantum speedup for tasks on classical datasets for existing quantum neural networks proposals. While algorithms for quantum machine learning are largely based on methods from linear algebra, neural networks rely on nonlinearity to act as a universal approximator. Given the success of deep neural networks in classical machine learning, the use of quantum computers to efficiently train classical deep neural networks remains an interesting open question.

In this work, the authors show how one may exploit the benefit of deep neural networks via the Neural Tangent Kernel (NTK) formalism, i.e. increasing the number of hidden layers L. As the neural network is deepened, the NTK matrix becomes increasingly well-conditioned, improving the speed at which gradient descent trains the neural network. The authors propose a general framework for fully connected neural network architectures via a quantum algorithm to train a wide and deep neural network under an approximation of the NTK, estimating the trained neural network output with vanishing error as the training set size increases. Furthermore, the linear form of the NTK offers a general framework for neural network architectures similar to those used in state-of-the-art applications of deep learning. The work provides two different approximations: a sparsified NTK and a diagonal NTK approximation. The diagonal NTK is defined by setting all off-diagonal elements of the NTK to zero, while the sparsified NTK is defined by only permitting off-diagonal elements to be nonzero in any row or column. For both-approximations, the matrix element bounds of the NTK guarantee the convergence of the approximation and enable efficient gradient descent which highlights the correspondence between conditions for trainable classical neural networks and an efficient quantum algorithm.

The sparsified NTK approximation has a strictly tighter upper bound given by the Gershgorin circle theorem on the error compared to the diagonal NTK approximation, suggesting the superior performance of the former. To support this theoretical result, numerical demonstration has been done using the MNIST image dataset, more accurately considering the MNIST binary image classification task between pairs of digits. Two infinite-width neural network architectures are used: i) a feedforward fully-connected neural network and ii) an architecture based on the convolutional Myrtle network. In each case, the handwritten image dataset is projected onto the surface of a unit sphere, during the initial encoding into QRAM. The results demonstrate that even relatively common data distributions satisfy the necessary conditions for efficient quantum state preparation and quantum state measurement, fully realizing an exponential quantum speedup over gradient descent.

This work demonstrates a quantum algorithm to train classical neural networks in logarithmic time O(log n) and provides numerical evidence of such efficiency on the standard MNIST image dataset. As the neural tangent kernel offers a versatile framework across architectures such as convolutional neural networks, graph neural networks, and transformers, the approximation introduced by a sparsified or diagonal kernel in this work has the potential to extend to any chaotic kernel. Also, the increase of the depth of a chaotic kernel might give rise to the kernel structure key to well-conditioning and successful approximation in logarithmic time. This can potentially open new possibilities for improved machine learning methods to quantum computing beyond the scope of classical neural networks.
In recent years, there have been significant developments in the field of quantum algorithms, especially in applications involving quantum chemistry, and representations of molecular Hamiltonians. These include variational quantum algorithms that aim to maximize the algorithmic performance despite the limited coherence times of currently available hardware. Unfortunately, that comes at the cost of introducing heuristic aspects, making it difficult to obtain rigorous performance guarantees. In contrast, quantum phase estimation approaches provide a route to accurately calculate eigenstates to small specifiable error, assuming sufficiently high overlap with the true eigenstates during the preparation of approximate eigenstates. Also, in fault-tolerant quantum computation, required resources will depend on the ability to bound errors in the algorithm which means that tighter error estimates will lead to fewer resource requirements.

Estimation of the resources required for phase estimation can be done using product-formula decomposition, also known as Trotterization. Previous works have shown that the number of fermions in the system can be used to offset the dependence of the error on the number of orbitals that can be applied to chemical systems in realistically sized basis sets. In this work, the authors present three factorized decompositions and use these in conjunction with the fermionic seminorm to obtain tighter Trotter error bounds in practice and apply it to the particular case of simulating a uniform electron gas (Jellium). Another objective of this approach is reducing the classical runtime and the gate complexity of quantum algorithms.

Each of the three factorized decompositions used in the work; namely spectral decomposition, cosine decomposition, and Cholesky decomposition, exhibits its own advantage. The spectral decomposition is generally applicable and extends beyond the plane wave dual basis. The cosine decomposition best exploits fermion number information and thus can be the most effective in the low-filling fraction regime. The Cholesky decomposition has the smallest constant factor overhead and therefore performs best in the medium and half-filling regimes.

The approach is applied to simulation of a uniform electron gas, finding substantial (over 100×) improvements in Trotter error for low-filling fraction and pushing towards much higher numbers of orbitals than is possible with existing methods. The approach was benchmarked against three prior art bounds: the analytic SHC bound, the fermionic commutator approach, and a similar Pauli commutator approach. A substantial classical runtime advantage for the calculation of the bounds was observed. In the case of fermionic and Pauli commutator approaches, the calculation of large spin-orbital number N>200 becomes intractable without access to > 100 GB of RAM. In contrast, the new bounds on a 512 spin-orbital can be calculated in less than 6 hours using < 16 GB of RAM. The work also calculates the T-bound for phase estimation of the ground state energy for Jellium, with significant improvements in gate complexity of over 10× as compared to the existing Trotter-based approaches. The gate counts in the results demonstrate that the trotter approach can comparatively perform better in specific regimes of interest than post-Trotter methods like qubitization. This holds true even while using fewer gates than qubitization for a large enough Wigner-Seitz radius.

The work focuses on second-order Trotter in the plane wave dual basis, but the techniques can be further generalized to higher-order trotter bounds. While the spectral and Cholesky decompositions are applicable in case of compact basis sets, an interesting potential lies in the cosine decomposition to obtain an even tighter bound in the low-filling fraction regime. Also, calculation of the higher-order trotter bounds would require time (scaling as O(N^7)), making it a costlier run. A potential alternative would be the post-Trotter methods, which exhibit superior asymptotic performance with respect to target error as well as being cost-effective due to the use of Hamiltonian factorizations. However, trotter methods possess good constant pre-factors in the runtime and few additional ancilla qubits requirements compared to post-Trotter methods, suggesting that trotter methods could perform comparatively better at specific tasks in the pre-asymptotic regime. It would be interesting to explore whether second quantized post-Trotter methods can similarly exploit low-filling fractions, where Trotter methods perform better.
Classical machine learning research is blossoming in recent years. Some of the most interesting approaches focus on studying graphs and developing algorithms to exploit the information contained in their structures. In many fields relevant to the modern world, such as chemistry, bioinformatics, social network analysis, computer vision, and natural language, one can find inherent graph structures which possess valuable information about the underlying problem structure. One way to find the similarities between various graph structures is done via graph kernels. The graph kernel approach is based on finding a way to associate any graph with a feature vector encapsulating its relevant characteristics (the feature map) and then computing the similarity between those vectors, in the form of a scalar product in the feature space.

However, the design of a graph kernel is always dependent on the problem and the specific features that need to be encapsulated. Thus, its biggest limitation is the algorithmic complexity when applied to a variety of problems. Typical examples are the graphlets subsampling and the random walk kernels. In this work, the authors propose a versatile and easily scalable graph kernel based on the time-evolution of a quantum system, where the information about a graph is encoded in the parameters of a tunable Hamiltonian acting on an array of qubits. An alternating sequence of free Hamiltonian evolution produces measurement samples that retain key features of the data.

The proposed Quantum Evolution (QE) Kernel approach is based on associating each graph with a probability distribution, obtained by the measurement of an observable on a quantum system whose dynamics are driven by the topology of the associated graph. The QE kernel between two graphs is given as a distance between their respective probability distributions. The protocol was tested on a graph classification task on several datasets and results were compared with other graph kernels. Each graph kernel is associated with a Support Vector Machine, and the accuracy is obtained via a score, namely the proportion of graphs in the test set that is attributed to the correct class. A grid search is performed in the defined range and the value with the best score is selected and averaged over 10 repetitions of the following cross-validation procedure: the dataset is first randomly split into 10 subsets of equal size, and the accuracy of the kernel is then measured on each of these 10 subsets, after having been trained on the 9 other subsets. The accuracy resulting from this split is then the average accuracy over the 10 possible choices of the test subset. The performance of the kernel is also shown to be stable against detection error and noise.

The results show that for all datasets, at least one quantum kernel is better than the best classical kernel. The depth of the time evolution is quite crucial as better accuracy is obtained by increasing the number of layers for the Ising evolution, suggesting richer features can be captured by adding more layers. However, as shown in the IMDB-MULTI and Fingerprints experiments, the larger layer Ising scheme performed worse, because of the incomplete convergence of the optimization. Bigger datasets require more function evaluations, which require a larger computational budget.

The physical implementation of the approach is presented for the case of a programmable array of qubits made of neutral atoms trapped in optical tweezers. By promoting the atoms to Rydberg states, they behave as huge electric dipoles and experience dipole-dipole interactions that map into spin Hamiltonians. The spins undergo various types of interactions depending on the Rydberg states involved in the process, leading to different Hamiltonians. This platform is limited to smaller graphs for now (~100 nodes), however, the authors point out that potentially a significant speed-up could also be gained by implementing variations of near-term quantum computing algorithms running on classical compute hardware like GPUs, in a so-called quantum-inspred manner, possibly applicable to the current approach as well. Moreover, designing the right observable and choosing the appropriate Hamiltonian are crucial parameters, in order to improve the accuracy of the design. Finally, potential improvement of such experiments lies in the use of multi-kernel learning, which combines different kernels built from the same sequence of measurements on the final states, and optimization of noise cancellation techniques.
The study of quantum many body systems is one of the most significant applications of quantum computing and has applications in physics, materials science, and chemistry. Classical algorithms and methods such as density functional theory, Monte Carlo, and density-matrix renormalization group have assisted in solving some problems in many body systems but are proven inefficient when it comes to a general class of problems. The reason classical computers fall short in simulating quantum many-body physics is that describing an 𝑛-qubit quantum system accurately may require an amount of classical data that is exponential in 𝑛.

Recently, classical machine learning (ML) techniques have shown promising results in investigating problems in quantum many-body physics. Although these methods are experimentally effective with present size systems, they cannot provide the theoretical proof for efficiently solving larger system sizes, which cannot be solved by these classical algorithms yet. In this work, the authors attempt to establish classical machine learning algorithms based on the concept of a classical shadow derived from randomized Pauli measurements. The objective is to examine two potential applications of classical ML. The first application involves learning to predict classical representations of quantum many-body ground states while the second attempts to classify quantum states of matter into phases in a supervised learning scenario.

In the first application, a family of Hamiltonians is considered, where the Hamiltonian 𝐻(𝑥) depends on m real parameters. The ML algorithm is trained on a set of training data consisting of sampled values of 𝑥, each accompanied by the corresponding classical shadow for the ground state of the Hamiltonians. The ML algorithm then predicts a classical representation of the ground states for new values of 𝑥, hence estimating ground state properties using the predicted classical representation. This method is shown to be efficient for the case of predicting ground state properties that do not vary too rapidly as a function of 𝑥, with provable bounds.

In the second application, in order to predict the phase label for new quantum states that were not encountered during training, it was assumed that any two phases can be distinguished by a nonlinear function of marginal density operators of subsystems of constant size. It is shown that if such a function exists, a classical ML algorithm can learn to distinguish the phases using an amount of training data and classical processing which are polynomial in the system size. The authors also performed numerical experiments on quantum systems such as the Rydberg atom chain and 2D antiferromagnetic Heisenberg model, in order to test the efficiency of the learning and prediction phases. The numerics verify the claim that classical ML algorithms can efficiently classify a wide range of quantum phases of matter, in particular when informed by data collected in physical experiments and/or simulations on quantum computers.

Overall, the work focuses on the classical shadow formalism which assumes that each quantum state is represented by its classical shadow, that could be obtained either from a classical computation or from an experiment on a quantum device. The classical ML algorithm is then trained on labeled classical shadows, andshadows and learns to predict labels for new classical shadows. This is an interesting concept that can be potentially used for other classical representations of quantum states which could be exploited by classical ML. As an added bonus, existing highly programmable quantum simulators, which lack local controls in implementing single qubit Pauli measurements, can be benefitted from such a formalism.

An interesting outlook for quantum computing can then be predicted as follows; typically, the computational scaling of quantum computaitonal algorithms can be shown to be significantly better than classical algorithms for the same task in quantum many-body problems. However, the requirements of fault-tolerant quantum computers and other overhead implies a pre-factor that allows only a limited number of (very accurate) simulations to be performed in a reasonable timeframe. In this way, quantum computing may not help in generating huge datasets or explore exhaustively a large configurational space to design new materials, by themselves. However, when combined with Machine Learning approaches such as those presented in the work discussed today, the accurate quantum-computer generated data may replace the physical experiments required for generating the data in this strategy. That in-turn allows for a pure in-silico and more cost-effective strategy for discovery and design of new materials and understanding of (potentially new) physics.
Page 1 of 11

What's Interesting?

How can we help you?

Invalid Input

Invalid Input

Invalid Input

Invalid Input

Invalid Input

Invalid Input

Copyright © Qu & Co BV
close