There has been a large body of work investigating the potential advantages of quantum algorithms over their classical counterparts, with the ultimate proof being an experimental demonstration of a "quantum advantage". Classical problems being targeted include the factorization of large integers and unstructured database search. While advances in both experimental hardware and software are driving the field slowly but steadily towards quantum supremacy, further effort is still required on both fronts. Among the most promising candidates for near-term quantum advantage is the class of variational quantum algorithms (VQAs). VQAs have been applied to many scientific domains, including molecular dynamics studies and quantum optimization problems. They are also being studied for various quantum machine learning (QML) applications such as regression, classification, generative modeling, deep reinforcement learning, sequence modeling, speech recognition, metric and embedding learning, transfer learning, and federated learning.

Beyond serving as quantum implementations of classical machine learning paradigms, VQAs may conversely benefit from machine learning themselves, with one of the most popular paradigms being reinforcement learning (RL). RL has been used to tackle several problems in quantum information processing, such as error decoding, quantum feedback, and adaptive code design. While such schemes have so far been envisioned with a classical computer acting as the RL co-processor, implementing "quantum" RL on quantum computers has been shown to make the decision-making process of RL agents quadratically faster than on classical hardware.

In this work, the authors present a new quantum architecture search framework, powered by deep reinforcement learning (DRL), in which an RL agent interacts with a quantum computer or quantum simulator. The objective is to investigate whether an RL agent can be trained to find a quantum circuit architecture that generates a desired quantum state.

The proposed framework consists of two major components: a quantum computer or quantum simulator, and an RL agent hosted on a classical computer that interacts with it. At each time step, the RL agent chooses an action from a set of possible quantum operations (one- and two-qubit gates), thereby updating the quantum circuit. After each update, the quantum simulator executes the new circuit and computes its fidelity with respect to the given target state. This fidelity determines the reward sent back to the agent: positive if the fidelity reaches a predefined threshold, and negative otherwise.
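The interaction loop described above can be sketched as a small environment. This is a minimal pure-Python sketch under stated assumptions, not the authors' Qiskit-based implementation: it uses a dense state-vector simulator, a hypothetical gate set (X, Y, Z, H on every qubit plus CNOT on every ordered qubit pair), and hypothetical reward values (+1.0 once the fidelity reaches an assumed threshold of 0.95, -0.01 otherwise). The names `CircuitEnv`, `apply_1q`, `apply_cnot`, and `fidelity` are illustrative, and `step` returns the fidelity in place of the agent's actual observation.

```python
import math
from typing import List, Tuple

State = List[complex]
Action = Tuple[str, Tuple[int, ...]]

INV_SQRT2 = 1 / math.sqrt(2)

# Assumed single-qubit gate set (2x2 matrices, row-major).
GATES_1Q = {
    "X": ((0, 1), (1, 0)),
    "Y": ((0, -1j), (1j, 0)),
    "Z": ((1, 0), (0, -1)),
    "H": ((INV_SQRT2, INV_SQRT2), (INV_SQRT2, -INV_SQRT2)),
}


def apply_1q(state: State, gate: str, target: int) -> State:
    """Apply a single-qubit gate to `target` (qubit 0 = least significant bit)."""
    m, bit, out = GATES_1Q[gate], 1 << target, list(state)
    for i in range(len(state)):
        if not i & bit:  # pair amplitude i (target=0) with j (target=1)
            j = i | bit
            a, b = state[i], state[j]
            out[i] = m[0][0] * a + m[0][1] * b
            out[j] = m[1][0] * a + m[1][1] * b
    return out


def apply_cnot(state: State, control: int, target: int) -> State:
    """Swap amplitudes that differ only in `target` whenever `control` is 1."""
    cbit, tbit, out = 1 << control, 1 << target, list(state)
    for i in range(len(state)):
        if i & cbit and not i & tbit:
            j = i | tbit
            out[i], out[j] = state[j], state[i]
    return out


def fidelity(state: State, target: State) -> float:
    """|<target|state>|^2 for pure states."""
    overlap = sum(t.conjugate() * s for t, s in zip(target, state))
    return abs(overlap) ** 2


class CircuitEnv:
    """Hypothetical architecture-search environment: each action appends one gate."""

    def __init__(self, n_qubits: int, target: State,
                 threshold: float = 0.95, max_steps: int = 20):
        self.n, self.target = n_qubits, target
        self.threshold, self.max_steps = threshold, max_steps
        qubits = range(n_qubits)
        self.actions: List[Action] = [(g, (q,)) for g in GATES_1Q for q in qubits]
        self.actions += [("CNOT", (c, t)) for c in qubits for t in qubits if c != t]
        self.reset()

    def reset(self) -> State:
        self.state: State = [0j] * (2 ** self.n)
        self.state[0] = 1 + 0j  # start in |0...0>
        self.circuit: List[Action] = []
        self.steps = 0
        return self.state

    def step(self, action_index: int) -> Tuple[float, float, bool]:
        name, qubits = self.actions[action_index]
        if name == "CNOT":
            self.state = apply_cnot(self.state, *qubits)
        else:
            self.state = apply_1q(self.state, name, qubits[0])
        self.circuit.append((name, qubits))
        self.steps += 1
        fid = fidelity(self.state, self.target)
        success = fid >= self.threshold
        reward = 1.0 if success else -0.01  # assumed reward values
        done = success or self.steps >= self.max_steps
        return fid, reward, done


# Usage: build the Bell state (|00> + |11>)/sqrt(2) with H on qubit 0, then CNOT(0 -> 1).
env = CircuitEnv(2, [INV_SQRT2, 0.0, 0.0, INV_SQRT2])
for action in [("H", (0,)), ("CNOT", (0, 1))]:
    fid, reward, done = env.step(env.actions.index(action))
    print(action, round(fid, 3), reward, done)
```

With a Bell-state target, the first action leaves the fidelity at 0.25 and earns the step penalty, while the second reaches fidelity 1 and ends the episode with the positive reward. A trained agent would choose `action_index` from its policy instead of following a fixed list.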

The authors use the set of single-qubit Pauli measurements as the "states" (observations) that the environment returns to the RL agent, which is then iteratively updated based on this information. The procedure continues until the agent either reaches the desired fidelity threshold or exhausts the maximum number of allowed steps. The RL algorithms A2C (advantage actor-critic) and PPO (proximal policy optimization) were used to optimize the agent. The results demonstrate that, given the same neural network architecture, PPO performs significantly better than A2C in terms of convergence speed and stability in both noise-free and noisy environments. The same trend is observed in the 2-qubit case.
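One natural reading of "single-qubit Pauli measurements" is the vector of expectation values ⟨X_q⟩, ⟨Y_q⟩, ⟨Z_q⟩ for every qubit q, giving 3n numbers for n qubits. The sketch below computes these exactly from a state vector in pure Python. This is an assumption for illustration: on real hardware or a shot-based simulator the values would be estimated from repeated measurements, and the helper names `expectation` and `pauli_observation` are hypothetical.

```python
import math
from typing import List

PAULIS = {
    "X": ((0, 1), (1, 0)),
    "Y": ((0, -1j), (1j, 0)),
    "Z": ((1, 0), (0, -1)),
}


def expectation(state: List[complex], pauli: str, qubit: int) -> float:
    """Exact <psi| P_qubit |psi> for a pure state (qubit 0 = least significant bit)."""
    m, bit, total = PAULIS[pauli], 1 << qubit, 0j
    for i in range(len(state)):
        if not i & bit:
            j = i | bit
            a, b = state[i], state[j]
            # Contribution of the 2x2 Pauli block acting on amplitudes (a, b).
            total += a.conjugate() * (m[0][0] * a + m[0][1] * b)
            total += b.conjugate() * (m[1][0] * a + m[1][1] * b)
    return total.real  # Pauli operators are Hermitian, so the imaginary part is ~0


def pauli_observation(state: List[complex], n_qubits: int) -> List[float]:
    """Observation vector [<X_0>, <Y_0>, <Z_0>, <X_1>, ...] of length 3 * n_qubits."""
    return [expectation(state, p, q) for q in range(n_qubits) for p in "XYZ"]


s = 1 / math.sqrt(2)
plus_on_q0 = [s, s, 0.0, 0.0]  # |+> on qubit 0, |0> on qubit 1
bell = [s, 0.0, 0.0, s]        # (|00> + |11>)/sqrt(2)
print([round(v, 3) for v in pauli_observation(plus_on_q0, 2)])
print([round(v, 3) for v in pauli_observation(bell, 2)])
```

For the Bell state every entry is zero, because each qubit's reduced state is maximally mixed; single-qubit observations of this kind capture each qubit's local state but not the correlations between qubits.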

In this work, quantum circuits in both noise-free and noisy environments are simulated with IBM's Qiskit software. This approach becomes inefficient for large quantum circuits, since the cost of classical simulation scales exponentially with the number of qubits. In theory, given a universal set of one- and two-qubit quantum gates, a circuit with a finite number of gates can approximate any quantum state up to a chosen error tolerance. Hence, in principle, the RL approach remains valid for arbitrarily many qubits, but it becomes extremely hard to implement computationally. Verifying this kind of experiment on real quantum computers would also require thousands of training episodes. Significant progress can be expected once quantum computing resources become more accessible. Finally, another interesting direction is to investigate the quantum architecture search problem with different target quantum states and different noise configurations.
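A back-of-the-envelope calculation illustrates the exponential cost. Assuming a dense representation stored as complex128 values (16 bytes per entry), an n-qubit pure state needs 16·2^n bytes, while a dense density matrix, one common way to model noise, needs 16·4^n bytes. These are generic figures for dense simulation, not measurements of the Qiskit setup used in this work, and `simulation_memory_bytes` is a hypothetical helper.

```python
def simulation_memory_bytes(n_qubits: int, density_matrix: bool = False,
                            bytes_per_entry: int = 16) -> int:
    """Dense-simulation memory: 2**n amplitudes (pure state) or 4**n entries (density matrix)."""
    entries = 4 ** n_qubits if density_matrix else 2 ** n_qubits
    return entries * bytes_per_entry


GIB = 2 ** 30
for n in (3, 10, 20, 30):
    sv = simulation_memory_bytes(n) / GIB
    dm = simulation_memory_bytes(n, density_matrix=True) / GIB
    print(f"{n:>2} qubits: state vector {sv:.3g} GiB, density matrix {dm:.3g} GiB")
```

At 30 qubits the state vector alone occupies 16 GiB and each additional qubit doubles it, while a density matrix reaches the same 16 GiB at only 15 qubits.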