Shanghai Artificial Intelligence Laboratory open sources InternVLA-A1, a unified end-to-end vision-language-action model
The Shanghai Artificial Intelligence Laboratory has open sourced InternVLA-A1, a unified vision-language-action model that performs tasks from natural language prompts alone, such as "put the pen on the table into the pen holder," without requiring preset coordinates. It addresses the information loss that occurs between the separate perception and action stages of traditional pipelines, and supports multi-modal perception and understanding.