Two-Sided Deep Reinforcement Learning for Dynamic

Mobility-on-Demand Management with Mixed-Autonomy

Xie Jiaohong, Liu Yang*, Chen Nan

Congratulations to Ms. Xie Jiaohong (supervised by A/Prof. Chen Nan and Dr. Liu Yang), who won Second Place in the Student Research Competition at the virtual event on Artificial Intelligence Enabled Next Generation Transportation Systems, organized by the Artificial Intelligence in Transportation Committee of the ASCE Transportation & Development Institute. Her research and presentation are titled “Two-Sided Deep Reinforcement Learning for Dynamic Mobility-on-Demand Management with Mixed-Autonomy”.

Below is a short description of her work presented in this competition.

Autonomous vehicles (AVs) are expected to operate on Mobility-on-Demand (MoD) platforms because AV technology enables flexible self-relocation and system-optimal coordination. Unlike existing studies, which focus on MoD systems with either a pure AV fleet or a pure conventional-vehicle (CV) fleet, we aim to optimize the dynamic fleet management of an MoD system with mixed autonomy of CVs and AVs. We consider a realistic setting in which human drivers may relocate freely and learn strategies to maximize their own compensation, whereas AVs are fully compliant with the platform's decisions. To achieve a high level of service with a mixed fleet, we propose that the platform prioritizes human drivers in the matching decisions when on-demand requests arrive, and dynamically determines the optimal commission fee to influence drivers' behavior.
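To make the matching priority concrete, here is a minimal sketch (function name and data layout are illustrative assumptions, not the paper's implementation): when requests arrive, idle human-driven CVs are matched first, and AVs only serve the requests that remain.

```python
def prioritized_match(requests, idle_cvs, idle_avs):
    """Assign each request a vehicle, preferring CVs over AVs.

    This is a simplified first-come-first-served illustration; the actual
    platform would also account for locations, travel times, and fares.
    """
    assignments = {}
    cvs, avs = list(idle_cvs), list(idle_avs)
    for req in requests:
        if cvs:
            assignments[req] = ("CV", cvs.pop(0))   # human drivers served first
        elif avs:
            assignments[req] = ("AV", avs.pop(0))   # AVs fill the remaining demand
        # otherwise the request stays unserved in this decision epoch
    return assignments
```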

Figure 1 Illustration of the MoD system with mixed fleet

However, it is challenging to make efficient real-time fleet management decisions when spatio-temporal demand uncertainty and the complex interactions between human drivers and the operator are explicitly considered. To tackle these challenges, we develop a two-sided multi-agent deep reinforcement learning (DRL) approach, in which the operator acts as a supervisor agent on one side and makes centralized decisions for the mixed fleet, while each CV driver acts as an individual agent on the other side and learns to make desirable decisions non-cooperatively.
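The two-sided structure can be sketched as a pair of agent classes (a simplified illustration under assumed names, not the authors' code): the supervisor centrally decides AV relocation and the commission rate, while each driver agent independently picks a relocation that maximizes its own expected net earnings.

```python
import random

class SupervisorAgent:
    """One-side centralized agent: controls all AVs plus the commission fee."""
    def act(self, global_state, zones):
        # Centralized decisions: a relocation target per idle AV and a
        # commission rate (here a fixed placeholder standing in for a
        # learned policy output).
        av_relocation = {av: random.choice(zones)
                         for av in global_state["idle_avs"]}
        commission = 0.2  # placeholder rate in [0, 1]
        return av_relocation, commission

class DriverAgent:
    """Other-side individual agent: one per CV driver, non-cooperative."""
    def __init__(self, driver_id):
        self.driver_id = driver_id

    def act(self, local_state, commission, zones):
        # Greedy illustration of self-interested relocation: pick the zone
        # with the highest demand, weighted by the driver's share (1 - fee).
        return max(zones,
                   key=lambda z: local_state["demand"][z] * (1 - commission))
```

In the actual approach both sides learn their policies with DRL; the point of the sketch is only the division of decisions between the centralized supervisor and the self-interested drivers, with the commission fee as the coupling between the two sides.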

Figure 2 The two-sided multi-agent reinforcement learning approach

For the first time, a scalable algorithm that trains the agents with the advantage actor-critic (A2C) method and a mean-field approximation is developed for mixed fleet management. Furthermore, deep neural networks (DNNs) are adopted to enhance function approximation for our high-dimensional, large-scale problem. We propose a two-head policy network that enables the supervisor agent to make two sets of decisions from a single policy network, which greatly reduces the computational time. The proposed approach is validated through a case study of New York City using real taxi trip data. Results show that our algorithm can make high-quality decisions quickly and outperforms benchmark policies. Our fleet management strategy makes both the platform and the drivers better off, especially in scenarios with higher demand volume.
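The two-head idea can be illustrated with a minimal NumPy sketch (layer sizes, names, and the specific decision sets are assumptions for illustration): the state is encoded once by a shared trunk, and two separate output heads then produce the supervisor's two decision distributions, e.g. over AV relocation zones and over discretized commission levels.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TwoHeadPolicy:
    """Shared trunk with two heads: one forward pass, two decision sets."""
    def __init__(self, state_dim, hidden, n_zones, n_fee_levels):
        self.W1 = rng.standard_normal((state_dim, hidden)) * 0.1
        self.Wz = rng.standard_normal((hidden, n_zones)) * 0.1       # relocation head
        self.Wf = rng.standard_normal((hidden, n_fee_levels)) * 0.1  # commission head

    def forward(self, state):
        h = np.tanh(state @ self.W1)  # shared trunk, computed only once
        p_zone = softmax(h @ self.Wz)  # distribution over relocation zones
        p_fee = softmax(h @ self.Wf)   # distribution over commission levels
        return p_zone, p_fee
```

Because the costly trunk computation is shared, producing both decision sets costs little more than producing one, which is the source of the computational savings mentioned above.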

Figure 3 Training curves of gross merchandise value (GMV)

Figure 4 Training curves of order fulfilment rate (OFR)