A new study by Databricks revealed that interest in large language models (LLMs) surged by more than 1,300% in the six months between November 2022 and May 2023.
“The historic surge of interest in large language models (LLMs) since ChatGPT launched to the public late last year has made the topic inescapable. Not only is the technology improving at an unparalleled cadence, but companies are also building their own models like never before,” says Nick Eayrs, vice president, field engineering at Databricks Asia Pacific & Japan.
"Now, predictive models are underpinning mission-critical tasks, giving organisations significant competitive advantage and allowing them to provide highly differentiated products and services."
Nick Eayrs, Databricks
The newly released 2023 State of Data + AI report analyses anonymised usage data from over 9,000 global Databricks customers to comprehensively examine organisations’ data and AI initiatives. It uncovers exactly where enterprises find themselves in this transformation, and the platforms and tools they are using to take advantage of it.
The research noted that the hype around LLMs is real. From the end of November 2022 to the beginning of May 2023, use of SaaS LLMs, which provide access to models such as those from OpenAI, grew a rapid 1,310% among Lakehouse customers. Transformer-related libraries like Hugging Face (an NLP toolkit and model hub), which are used to train homegrown LLMs and were in demand even before the launch of ChatGPT, grew 82% within the same time frame.
Other key findings include:
- Data transformation and integration are more vital than ever: The fastest-growing tools on Databricks are dbt (206% YoY) and Fivetran (181%). Of the ten most popular data and AI products, six are data integration tools, including Informatica and Qlik, making data integration the fastest-growing market on the Databricks Lakehouse.
- Companies eye open source: When looking at the most popular data and AI products, Microsoft Power BI and Plotly reign above the rest. But organisations are showing a strong pull to open technologies; eight of the ten most popular data and AI products are based on open source software, including dbt, Hugging Face and GeoPandas.
- Enterprises are doing more AI projects than ever before – and getting better at it: The number of models that are candidates for production (used in operations) grew 411% year-over-year, while the number of experimental projects grew 54%. The data also shows that, on average, one in three experimental models is a candidate for production, compared to one in five last year, suggesting organisations are getting better at building and scaling these projects.
- AI is growing, but don’t forget traditional data analytics: Power BI was the most popular program running on top of the Lakehouse last year. The Lakehouse is increasingly used for data warehousing, including serverless data warehousing with Databricks SQL, which grew 144% YoY.
While it is still early days, these emerging trends are bound to define the future of AI, and business leaders need to pay attention. It's never been clearer: the companies that harness the power of data science and machine learning (DS/ML) will lead the next generation of data.