Affiliation:
1. The Smart Materials Research Institute, Southern Federal University
2. Vorovich Institute of Mathematics, Mechanics, and Computer Sciences, Southern Federal University
Abstract
The yield of a catalytic reaction depends on the interplay of processes on the catalyst surface, including adsorption, activation, reaction, and desorption. These processes in turn depend on the flow rates of the reaction mixture, the temperature, and the pressure. Under stationary conditions, active sites on the surface can be poisoned by reaction by-products or blocked by an excess of adsorbed reactant molecules. Dynamic control of the reaction parameters takes changes in surface properties into account and adjusts temperature, flow rates, and other parameters accordingly. A reinforcement learning algorithm was applied to control the oxidation of carbon monoxide (CO) on the surface of palladium nanoparticles. The algorithm was trained to maximize the rate of carbon dioxide production using the CO, O2, and CO2 fluxes at each time step. A policy gradient algorithm with a continuous action space was chosen, and the observations of the flow rates were extended over several successive time steps, which made it possible to obtain a set of nonstationary solutions. The maximum product yield is achieved with a periodic variation of the gas flows, which maintains a balance between the available adsorption sites and the concentration of activated intermediates. This methodology opens up prospects for optimizing catalytic reactions under nonstationary conditions.
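The idea of extending the flux observations over several successive time steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `FluxHistory`, the window length, the zero-padding at the start of an episode, and the ordering of the fluxes (CO, O2, CO2) are all assumptions made for illustration. The sketch only shows how a fixed-length history of flux readings could be flattened into a single observation vector for a policy.

```python
from collections import deque
from typing import Deque, List, Sequence


class FluxHistory:
    """Hypothetical helper: keeps the last n_steps flux readings
    and flattens them into one observation vector."""

    def __init__(self, n_steps: int, n_fluxes: int = 3) -> None:
        if n_steps < 1:
            raise ValueError("n_steps must be at least 1")
        self.n_steps = n_steps
        self.n_fluxes = n_fluxes
        # Assumption: pad with zeros so the observation has a fixed
        # length from the very first time step.
        self._buffer: Deque[List[float]] = deque(
            ([0.0] * n_fluxes for _ in range(n_steps)), maxlen=n_steps
        )

    def push(self, fluxes: Sequence[float]) -> List[float]:
        """Add the newest (CO, O2, CO2) reading and return the stacked
        observation, ordered from oldest to newest."""
        if len(fluxes) != self.n_fluxes:
            raise ValueError(f"expected {self.n_fluxes} flux values")
        # deque with maxlen drops the oldest reading automatically.
        self._buffer.append(list(fluxes))
        return [value for step in self._buffer for value in step]


history = FluxHistory(n_steps=3)
obs_first = history.push([1.0, 0.5, 0.0])
obs_second = history.push([0.8, 0.4, 0.2])
print(len(obs_second))  # 3 steps x 3 fluxes = 9 values
```

With a history of this kind, the policy can distinguish rising from falling fluxes, which a single-step observation cannot express; this is one plausible reason why stacked observations allow periodic, nonstationary control strategies to emerge.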
Publisher
The Russian Academy of Sciences