Fields such as robotics, medicine, and political science are increasingly exploring how to train artificial intelligence (AI) to make impactful decisions. Imagine using AI to intelligently manage traffic in a busy city, enabling faster travel for drivers while enhancing both safety and sustainability.
However, training AI to make effective decisions is a challenging endeavor.
Reinforcement learning models, the backbone of these AI decision-making systems, often falter when confronted with small task variations. For example, a model might struggle to oversee intersections with differing speed limits, lane numbers, or traffic behaviors.
To enhance the reliability of reinforcement learning models for complex, variable tasks, MIT researchers have proposed a more efficient training algorithm.
This algorithm smartly identifies the best tasks for training an AI agent, enabling it to perform effectively across a range of related challenges. In traffic signal management, each task could represent control over individual intersections in a metropolis.
By concentrating on a small number of intersections that contribute the most to the algorithm’s overall effectiveness, this approach maximizes performance while keeping training costs low.
The researchers found that their method was five to fifty times more efficient than standard approaches across a range of simulated tasks. This efficiency boost allows the algorithm to identify better solutions in a fraction of the time, improving the AI agent’s performance.
“We achieved remarkable performance gains with a straightforward algorithm by thinking creatively. A less complicated algorithm is more likely to be embraced by the community because it’s easier to implement and understand,” states Cathy Wu, the Thomas D. and Virginia W. Cabot Career Development Associate Professor in Civil and Environmental Engineering (CEE) and the Institute for Data, Systems, and Society (IDSS), also a member of the Laboratory for Information and Decision Systems (LIDS).
Wu collaborated on the research paper with lead author Jung-Hoon Cho, a CEE graduate student; Vindula Jayawardana, a graduate student in the Department of Electrical Engineering and Computer Science (EECS); and Sirui Li, an IDSS graduate student. Their findings will be presented at the Conference on Neural Information Processing Systems.
Striking a Balance in AI Training
When training an algorithm to regulate traffic lights across multiple intersections, an engineer typically faces a choice between two primary methods. One approach is to train separate algorithms for each intersection using only that intersection’s data. The alternative is to train a singular algorithm using data from all intersections, then apply it to each individual setting.
Both strategies come with inherent drawbacks. Training a separate algorithm for each task is labor-intensive and demands substantial data and computational resources, while a single shared algorithm often delivers only mediocre performance on each individual task.
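The trade-off between these two baselines can be illustrated with a toy sketch (this is hypothetical illustration code, not the researchers' implementation): each "task" is a single number, a trained "model" is just an average of the task parameters it saw, and performance decays with the distance between model and task.

```python
# Toy illustration (not the paper's code) of the two baseline strategies.

def train(task_params):
    """Toy training: the model is the average of the tasks it sees."""
    return sum(task_params) / len(task_params)

def performance(model, task):
    """Toy metric in [0, 1]: 1.0 when the model matches the task exactly."""
    return max(0.0, 1.0 - abs(model - task))

tasks = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical task parameters

# Strategy 1: one model per task -- best performance, highest training cost.
independent = [performance(train([t]), t) for t in tasks]

# Strategy 2: one shared model for all tasks -- cheap, but mediocre on
# tasks far from the "center" of the task distribution.
shared_model = train(tasks)
shared = [performance(shared_model, t) for t in tasks]

print(sum(independent) / len(tasks))  # perfect on every task
print(sum(shared) / len(tasks))       # noticeably lower on average
```

The gap between the two averages is the cost of sharing one model; the gap in training effort (five training runs versus one) is the cost of not sharing.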
Wu and her team aimed to find a middle ground between these options.
In their method, they select a subset of tasks, training one algorithm for each task autonomously. Crucially, they choose tasks that are most likely to enhance the algorithm’s overall effectiveness.
They utilize a common reinforcement learning strategy called zero-shot transfer learning, which enables an already-trained model to tackle a new task without further training. Typically, this leads to remarkable performance on related tasks.
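In miniature, zero-shot transfer looks like the following sketch (a toy setup with hypothetical numbers, not the paper's code): a model is trained once, then evaluated on a neighboring task with no additional training.

```python
# Toy zero-shot transfer sketch (illustrative only).

def train(task):
    """Toy training: the model simply fits the task it was trained on."""
    return task

def performance(model, task):
    """Toy metric in [0, 1]: decays with model-task mismatch."""
    return max(0.0, 1.0 - abs(model - task))

model = train(0.5)              # train on one task...
print(performance(model, 0.5))  # perfect on the source task
print(performance(model, 0.6))  # still strong on a nearby task, untrained
```

The key point is the last line: the model is applied to the new task as-is, and the closer the new task is to the training task, the less performance is lost.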
“Ideally, we would train on all tasks, but we wondered if we could limit our training to a subset while still boosting performance across all tasks,” Wu explains.
To determine which tasks to prioritize, the researchers developed the Model-Based Transfer Learning (MBTL) algorithm.
MBTL features two key components: it evaluates how well each algorithm would perform if trained independently on a task, and it assesses the potential performance decline when transferring the trained model to other tasks, a concept referred to as generalization performance.
This modeling of generalization performance enables MBTL to ascertain the value of training on new tasks effectively.
MBTL operates sequentially, initially selecting the task that yields the most significant performance gain, then identifying additional tasks that provide further improvements.
By concentrating solely on the most promising tasks, MBTL dramatically enhances training efficiency.
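The greedy loop described above can be sketched as follows. This is a minimal illustration of the selection idea only, under two simplifying assumptions that are ours, not the paper's: training on a task yields perfect performance on that task, and transfer performance decays linearly with the distance between task parameters.

```python
# Minimal sketch of MBTL-style greedy task selection (toy model,
# not the released implementation).

def estimated_transfer_perf(source, target):
    # Assumed generalization model: zero-shot performance decays
    # linearly with the gap between training task and target task.
    return max(0.0, 1.0 - abs(source - target))

def mbtl_greedy(tasks, budget):
    """Greedily pick training tasks that most improve coverage of all tasks."""
    selected = []
    # Best performance currently achievable on each task via transfer
    # from the tasks selected so far.
    best = {t: 0.0 for t in tasks}
    for _ in range(budget):
        def marginal_gain(candidate):
            # Total improvement across all tasks if we train on `candidate`.
            return sum(
                max(best[t], estimated_transfer_perf(candidate, t)) - best[t]
                for t in tasks
            )
        pick = max(tasks, key=marginal_gain)
        selected.append(pick)
        for t in tasks:
            best[t] = max(best[t], estimated_transfer_perf(pick, t))
    return selected

tasks = [i / 9 for i in range(10)]  # ten hypothetical task parameters
print(mbtl_greedy(tasks, budget=2))  # picks well-spread tasks
```

With a budget of two, the sketch first picks a task near the middle of the range (largest total gain), then a task far from the first (largest remaining gain), mirroring the sequential selection the researchers describe.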
Slashing Training Costs
When the researchers applied this technique to various simulated tasks—including traffic signal control, real-time speed advisories, and classic control challenges—they found it to be five to fifty times more efficient than conventional methods.
This means they could achieve comparable results using far less data. For instance, with a fiftyfold efficiency increase, the MBTL algorithm could train effectively using only two tasks, achieving results equivalent to a standard method that requires data from one hundred tasks.
“This indicates that either the data from the other ninety-eight tasks wasn’t required, or that training on all one hundred tasks created confusion for the algorithm, resulting in worse performance than our method’s,” Wu notes.
With MBTL, even a slight extension of training time can lead to significantly enhanced results.
In future work, the researchers aspire to develop MBTL algorithms that handle more complex scenarios, such as high-dimensional task spaces. They also plan to apply their approach to real-world challenges, particularly in next-generation mobility systems.
This research is partially supported by a National Science Foundation CAREER Award, the Kwanjeong Educational Foundation PhD Scholarship Program, and an Amazon Robotics PhD Fellowship.
Photo credit & article inspired by: Massachusetts Institute of Technology