Data visualizations make complex information more accessible, and even memorable, distilling a sea of words and numbers into a tight, compelling story. But while AI models excel at summarizing pages of text, they often miss the big picture when it comes to these tidy visualizations.
The ability to grasp the important takeaways in a chart or table involves knowing how to interpret closely entwined linguistic and graphical information. Even multi-modal language models trained on both text and images can struggle to make sense of the graphical data that we humans find so compelling.
To close this gap, IBM Research set out to build an open-source vision-language model (VLM) that could analyze not only natural images but the charts, tables, and other data visualizations that are the mainstay of enterprise reports. The first version of Granite Vision, released under an Apache 2.0 license, is now available on Hugging Face.
Granite Vision is fast and inexpensive to run. It’s also competitive with other small, open-source VLMs at extracting information from the tables, charts, and diagrams featured in popular document understanding benchmarks.
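Because the model is published on Hugging Face, it can be loaded with the standard transformers tooling used for other open VLMs. The sketch below is a minimal, illustrative example; the repository name, image file, and prompt are placeholders rather than details from this article, and the model card on Hugging Face documents the canonical usage.

```python
# Minimal sketch: querying a Granite Vision checkpoint about a chart image.
# The model ID and file name below are assumptions for illustration only.
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_id = "ibm-granite/granite-vision-3.2-2b"  # assumed repo name; verify on Hugging Face

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto")

# Ask a question about a chart rendered as an image (hypothetical file).
image = Image.open("quarterly_revenue_chart.png")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Which quarter had the highest revenue?"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```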
