Humans may be one of the biggest roadblocks preventing fully autonomous vehicles from entering city streets.
If a robot is going to drive a vehicle safely in downtown Boston, it needs to be able to predict what nearby drivers, cyclists and pedestrians are going to do next.
Behavior prediction is a difficult problem, however, and current AI solutions are either too simplistic (they may assume that pedestrians always walk in a straight line), too conservative (to avoid pedestrians, the robot simply leaves the car in park), or able to predict the next moves of only one agent (roads typically carry many users at once).
MIT researchers have come up with a deceptively simple solution to this complicated challenge. They break a multi-agent behavior prediction problem into smaller pieces and tackle each one individually, so that a computer can solve this complex task in real time.
Their behavior prediction framework first guesses the relationships between two road users (which car, cyclist or pedestrian has the right of way, and which agent will yield) and uses these relationships to predict the future trajectories of multiple agents.
These estimated trajectories were more accurate than those from other machine learning models, compared to actual traffic flow in a huge dataset compiled by self-driving company Waymo. The MIT technique even outperformed Waymo’s recently released model. And because the researchers broke the problem into simpler chunks, their technique used less memory.
“It’s a very intuitive idea, but no one has fully explored it before, and it works pretty well. The simplicity is definitely a plus. We benchmark our model against other leading models in the field, including the one from Waymo, the leading company in this field, and our model achieves the best performance on this difficult benchmark. This has a lot of potential for the future,” says co-lead author Xin “Cyrus” Huang, a graduate student in the Department of Aeronautics and Astronautics and a research assistant in the lab of Brian Williams, a professor of aeronautics and astronautics and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Joining Huang and Williams on the paper are three researchers from Tsinghua University in China: co-lead author Qiao Sun, a research assistant; Junru Gu, a graduate student; and senior author Hang Zhao PhD ’19, an assistant professor. The research will be presented at the Conference on Computer Vision and Pattern Recognition.
Several small models
The researchers’ machine learning method, called M2I, takes two inputs: the past trajectories of cars, cyclists and pedestrians interacting in a traffic environment such as a four-way intersection, and a map with street locations, lane configurations, etc.
Using this information, a relationship predictor infers which of the two agents has the right of way first, classifying one as the passer and the other as the yielder. Then a prediction model, called the marginal predictor, guesses the trajectory of the passing agent, since this agent behaves independently.
A second prediction model, known as the conditional predictor, then guesses what the yielding agent will do based on the actions of the passing agent. The system predicts a number of different trajectories for the yielder and passer, calculates the probability of each individually, and then selects the six joint outcomes with the highest probability of occurring.
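The selection step can be illustrated with a minimal Python sketch. It is not the researchers' code: the function name, the data layout, and the choice to score each joint outcome as the passer's probability multiplied by the yielder's conditional probability are all assumptions made for illustration, and the tiny hand-written trajectories stand in for the outputs of the marginal and conditional predictors.

```python
from typing import Dict, List, Tuple

Trajectory = List[Tuple[float, float]]   # (x, y) waypoints over time
Scored = Tuple[Trajectory, float]        # a trajectory and its probability


def top_joint_outcomes(
    passer_samples: List[Scored],
    yielder_given_passer: Dict[int, List[Scored]],
    k: int = 6,
) -> List[Tuple[Trajectory, Trajectory, float]]:
    """Rank (passer, yielder) trajectory pairs by an assumed joint score.

    passer_samples: hypothetical marginal-predictor output.
    yielder_given_passer: hypothetical conditional-predictor output,
        keyed by the index of the passer sample it was conditioned on.
    """
    joint = []
    for i, (p_traj, p_prob) in enumerate(passer_samples):
        for y_traj, y_prob in yielder_given_passer[i]:
            # Assumption: joint score = P(passer) * P(yielder | passer).
            joint.append((p_traj, y_traj, p_prob * y_prob))
    joint.sort(key=lambda item: item[2], reverse=True)
    return joint[:k]


# Toy data: two passer samples, each with two conditional yielder samples.
go_fast = [(0.0, 0.0), (2.0, 0.0)]
go_slow = [(0.0, 0.0), (1.0, 0.0)]
wait = [(0.0, 3.0), (0.0, 3.0)]
creep = [(0.0, 3.0), (0.0, 2.0)]

passer_samples = [(go_fast, 0.7), (go_slow, 0.3)]
yielder_given_passer = {
    0: [(wait, 0.9), (creep, 0.1)],
    1: [(wait, 0.4), (creep, 0.6)],
}

best = top_joint_outcomes(passer_samples, yielder_given_passer, k=3)
for p_traj, y_traj, score in best:
    print(round(score, 2), p_traj[-1], y_traj[-1])
```

In this toy case the highest-scoring pair is the fast passer with a waiting yielder. With `k=6` and larger sample sets, the same ranking would return six joint outcomes, the number the article reports.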
M2I produces a prediction of how these agents will move through traffic for the next eight seconds. In one example, their method had a vehicle slow down so a pedestrian could cross the street, then speed up when they cleared the intersection. In another example, the vehicle waited until several cars had passed before turning from a side street onto a busy main road.
While this initial research focuses on the interactions between two agents, M2I could infer relationships between many agents and then guess their trajectories by connecting several marginal and conditional predictors.
Real-life driving tests
The researchers trained the models using the Waymo Open Motion Dataset, which contains millions of real-life traffic scenes involving vehicles, pedestrians and cyclists recorded by lidar (light detection and ranging) sensors and cameras mounted on the company’s autonomous vehicles. They focused specifically on cases with multiple agents.
To determine accuracy, they compared each method’s six prediction samples, weighted by their confidence levels, to the actual trajectories followed by cars, cyclists and pedestrians in a scene. Their method was the most accurate. It also outperformed baseline models on a metric known as overlap rate; if two trajectories overlap, this indicates a collision. M2I had the lowest overlap rate.
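To make the overlap-rate idea concrete, here is a rough Python sketch. It is a simplification, not the benchmark's official metric: it treats each agent as a point and assumes two predicted trajectories "overlap" if the agents come within a hypothetical `min_gap` distance of each other at the same time step. A real evaluation would also have to account for the size and orientation of each agent.

```python
import math
from typing import List, Tuple

Trajectory = List[Tuple[float, float]]  # (x, y) at matching time steps


def trajectories_overlap(a: Trajectory, b: Trajectory, min_gap: float = 1.0) -> bool:
    """True if the two agents are closer than min_gap at any shared time step.

    min_gap is an illustrative threshold, not a value from the article.
    """
    return any(math.dist(pa, pb) < min_gap for pa, pb in zip(a, b))


def overlap_rate(pairs: List[Tuple[Trajectory, Trajectory]], min_gap: float = 1.0) -> float:
    """Fraction of predicted trajectory pairs that contain an overlap."""
    if not pairs:
        return 0.0
    hits = sum(trajectories_overlap(a, b, min_gap) for a, b in pairs)
    return hits / len(pairs)


car = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
oncoming = [(2.0, 0.0), (1.5, 0.0), (1.0, 0.0)]   # meets the car mid-way
far_lane = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]   # stays five units away

rate = overlap_rate([(car, oncoming), (car, far_lane)])
print(rate)
```

Here one of the two predicted pairs brings the agents within the threshold, so the rate is 0.5. A lower value means fewer predicted collisions.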
“Rather than simply building a more complex model to solve this problem, we took an approach that more closely resembles the way a human thinks when reasoning about interactions with others. A human does not reason about all the hundreds of combinations of future behaviors. We make pretty quick decisions,” Huang says.
Another advantage of M2I is that because it breaks the problem down into smaller pieces, it is easier for a user to understand the model’s decision making. In the long run, this could help users trust self-driving vehicles more, Huang says.
But the framework can’t account for cases where two agents mutually influence each other, such as when two vehicles each nudge forward at a four-way stop because the drivers don’t know who should yield.
The researchers plan to address this limitation in future work. They also want to use their method to simulate realistic interactions between road users, which could be used to verify planning algorithms for self-driving cars or to create large amounts of synthetic driving data to improve model performance.
This research is supported, in part, by the Qualcomm Innovation Grant. The Toyota Research Institute also provided funds to support this work.