Researchers from MIT and the MIT-IBM Watson AI Lab have introduced a new AI model that could substantially improve the high-resolution computer vision that autonomous vehicles rely on. Self-driving cars must identify objects quickly and accurately, and the new model promises to make that recognition significantly faster and more efficient.
The model, named EfficientViT, tackles semantic segmentation, the task of assigning a category to every pixel in a high-resolution image. Earlier vision transformers struggle at high resolution because the computational cost of their attention mechanism grows quadratically with the number of pixels. EfficientViT matches their accuracy while its computational cost grows only linearly.
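To see why the scaling matters at high resolution, here is a rough cost model. It is a sketch under simplifying assumptions, not the researchers' analysis: it counts only the multiply-accumulates in the two attention matrix products for a single head, assumes a hypothetical 16 × 16-pixel patch per token and a hypothetical head dimension of 32, and ignores softmax, projections, and the rest of the network. Standard softmax attention compares every token with every other token, costing on the order of N²·d for N tokens of dimension d, whereas a linear-attention reordering costs on the order of N·d².

```python
def attention_macs(num_tokens: int, dim: int) -> dict:
    """Rough multiply-accumulate counts for one attention head (illustrative cost model only)."""
    # Softmax attention: Q @ K^T builds an N x N map (N*N*d), then map @ V (another N*N*d).
    softmax_cost = 2 * num_tokens * num_tokens * dim
    # Linear attention: K^T @ V gives a d x d summary (N*d*d), then Q @ (K^T V) (another N*d*d).
    linear_cost = 2 * num_tokens * dim * dim
    return {"softmax": softmax_cost, "linear": linear_cost}


def tokens_for(height: int, width: int, patch: int = 16) -> int:
    # Hypothetical patching scheme: one token per patch x patch block of pixels.
    return (height // patch) * (width // patch)


if __name__ == "__main__":
    dim = 32  # hypothetical per-head feature dimension
    for h, w in [(512, 1024), (1024, 2048)]:
        n = tokens_for(h, w)
        costs = attention_macs(n, dim)
        ratio = costs["softmax"] / costs["linear"]  # simplifies to n / dim
        print(f"{h}x{w}: {n} tokens, softmax/linear cost ratio = {ratio:.0f}")
```

Under this model the ratio is simply N/d, so it grows from 64 at 512 × 1024 to 256 at 1024 × 2048. Doubling the image height and width quadruples the token count, which multiplies the softmax-attention cost by 16 but the linear-attention cost by only 4. These are operation counts, not measured speedups, and should not be confused with the up-to-nine-times figure reported for the full model.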
On devices with limited hardware resources, such as the on-board computers in self-driving cars, the model can process high-resolution images up to nine times faster than earlier models, while maintaining or even improving accuracy.
The research team's approach simplifies the way vision transformers generate their attention maps, reducing the computational load while preserving the model's understanding of global context. To compensate for the accuracy lost through this simplification, the team added two components: one that captures local features and one that enables multiscale learning.
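The article describes the simplification only in words. Below is a minimal sketch, assuming the softmax similarity is replaced by a ReLU-based similarity, a common form of linear attention and, as we understand it, the variant described in the EfficientViT paper. Because each score is then a plain dot product of ReLU'd features, the sums over keys can be computed once and reused by every query, so the N × N attention map never has to be built. All function names are hypothetical. The sketch uses a single head with no learned projections, and it omits the local-feature and multiscale components.

```python
import random
from typing import List

Matrix = List[List[float]]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in m]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def quadratic_relu_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """Reference version: builds every row of the N x N similarity map (quadratic in N)."""
    q, k = relu(q), relu(k)
    out = []
    for qi in q:
        # Similarity of query i to every key j: one N-length row of the attention map.
        sims = [sum(a * b for a, b in zip(qi, kj)) for kj in k]
        norm = sum(sims) + eps
        out.append([sum(s * vj[c] for s, vj in zip(sims, v)) / norm for c in range(len(v[0]))])
    return out


def linear_relu_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """Same result, but keys and values are summarized once, so cost is linear in N."""
    q, k = relu(q), relu(k)
    d, dv = len(k[0]), len(v[0])
    # kv[a][c] = sum_j k[j][a] * v[j][c]: a d x dv summary shared by all queries.
    kv = [[sum(kj[a] * vj[c] for kj, vj in zip(k, v)) for c in range(dv)] for a in range(d)]
    # k_sum[a] = sum_j k[j][a], used for the normalizing denominator.
    k_sum = [sum(kj[a] for kj in k) for a in range(d)]
    out = []
    for qi in q:
        norm = sum(qa * ks for qa, ks in zip(qi, k_sum)) + eps
        out.append([sum(qi[a] * kv[a][c] for a in range(d)) / norm for c in range(dv)])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 6, 4  # six tokens with four features each
    q, k, v = random_matrix(rng, n, d), random_matrix(rng, n, d), random_matrix(rng, n, d)
    a = quadratic_relu_attention(q, k, v)
    b = linear_relu_attention(q, k, v)
    max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    print(f"max difference between the two versions: {max_diff:.2e}")
```

Both functions compute the same quantity. The linear version just applies the associativity of matrix multiplication, grouping (K^T V) first, so its work scales with N·d·d rather than N·N·d. Any remaining difference is floating-point rounding.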
EfficientViT is not limited to autonomous vehicles. It could also enhance other high-resolution computer vision tasks, such as medical image segmentation. The researchers stress the importance of balancing performance and efficiency, a balance that makes the model suitable for a wide range of devices and applications.
This research opens new doors for AI applications, potentially benefiting industries beyond self-driving cars, and it showcases what efficient AI computing can achieve in real-world scenarios.