Google BigQuery Launches Third-Party Generative AI Inference: What You Need to Know
Google has recently unveiled a trailblazing capability in BigQuery, enabling third-party generative AI inference for open models. This feature allows data teams to deploy and run models from Hugging Face or Vertex AI Model Garden using simple SQL commands. Let’s dive into what this means for data teams, how it works, and what advantages it brings to the table.
Simplifying AI Deployment
Historically, deploying open-source AI models has been a cumbersome task for data teams. They faced a multitude of challenges, including managing Kubernetes clusters, configuring endpoints, and coordinating various tools. As Virinchi T noted in a Medium article, "This process requires multiple tools, different skill sets, and significant operational overhead." For many teams, this friction prevented them from harnessing AI capabilities—even when the models were readily available.
With the latest enhancement in BigQuery, this complexity has been significantly reduced. Now, utilizing a SQL interface, the entire workflow can be distilled down to merely two SQL statements.
How to Use the New Feature
Deploying a Model
To get started, users can create a model by executing a CREATE MODEL statement, specifying either a Hugging Face model ID, such as sentence-transformers/all-MiniLM-L6-v2, or a model name from the Vertex AI Model Garden. Google’s BigQuery takes care of provisioning compute resources with default configurations, typically completing the deployment process within 3 to 10 minutes, depending on the chosen model’s size.
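As a sketch, deploying the embedding model named above might look like the following. The project, dataset, and model names are placeholders, and the exact option keys (`model_type`, `hugging_face_model_id`) are assumptions based on BigQuery's CREATE MODEL conventions for deployed models, so verify them against the official documentation:

```sql
-- Deploy an open Hugging Face embedding model from SQL.
-- Option names here are assumed; check BigQuery's CREATE MODEL reference.
CREATE OR REPLACE MODEL `my_project.my_dataset.minilm_embedder`
OPTIONS (
  model_type = 'DEPLOYED_MODEL',
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2'
);
```

BigQuery then provisions the Vertex AI endpoint behind the scenes with default compute settings.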
Running Inference
Once the model is deployed, running inference is seamless. Users can utilize AI.GENERATE_TEXT for language models or AI.GENERATE_EMBEDDING for embeddings, querying the necessary data directly from BigQuery tables. BigQuery also smartly manages the resource lifecycle with the endpoint_idle_ttl option, automatically shutting down idle endpoints to prevent unnecessary charges. If a team needs to undeploy endpoints, they can easily do so using the ALTER MODEL statement when batch jobs conclude.
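A minimal inference-and-cleanup sketch, assuming the model created earlier. The table and column names are placeholders, and the function signature and the `deployed` option in ALTER MODEL are assumptions modeled on BigQuery's AI function conventions rather than confirmed syntax:

```sql
-- Generate embeddings for rows in an existing table.
-- The input alias (content) and exact signature are assumptions.
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL `my_project.my_dataset.minilm_embedder`,
  (SELECT review_text AS content FROM `my_project.my_dataset.reviews`)
);

-- Undeploy the endpoint once the batch job concludes (option name assumed).
ALTER MODEL `my_project.my_dataset.minilm_embedder`
SET OPTIONS (deployed = FALSE);
```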
Customization for Production Use
One of the standout features of this new capability is its customization options for production use cases. Users can specify machine types, set replica counts, and configure endpoint idle times directly within the CREATE MODEL statement. Additionally, Compute Engine reservations can secure GPU instances to ensure consistent performance. When it’s time to retire a model, a simple DROP MODEL statement cleans up all associated resources in Vertex AI.
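The production-oriented options described above might be combined in a single CREATE MODEL statement along these lines. The machine type, replica counts, model ID, and `endpoint_idle_ttl` values are illustrative placeholders, and the option keys are assumptions to be checked against the CREATE MODEL reference:

```sql
-- A production-oriented deployment with explicit sizing and idle shutdown.
-- All option names and values below are assumptions for illustration.
CREATE OR REPLACE MODEL `my_project.my_dataset.llama_generator`
OPTIONS (
  model_type = 'DEPLOYED_MODEL',
  hugging_face_model_id = 'meta-llama/Llama-3.1-8B-Instruct',
  machine_type = 'g2-standard-12',
  min_replica_count = 1,
  max_replica_count = 4,
  endpoint_idle_ttl = INTERVAL 2 HOUR
);

-- Retiring the model cleans up the associated Vertex AI resources.
DROP MODEL `my_project.my_dataset.llama_generator`;
```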
Granular Resource Control
Google’s blog emphasizes "granular resource control" and "automated resource management," which allow teams to strike an effective balance between performance and cost without leaving the SQL environment. Earlier posts demonstrated that, using similar patterns with open-source embedding models, processing 38 million rows cost as little as $2-3.
Model Compatibility
This new feature supports an impressive array of over 13,000 Hugging Face text embedding models and more than 170,000 text generation models, including Meta’s Llama series and Google’s Gemma family. However, models must meet Vertex AI Model Garden’s deployment requirements, such as regional availability and quota limits.
Impacts on Data Roles
The launch has distinct advantages for various roles within data teams:
- For Data Analysts: The new SQL interface empowers you to experiment with ML models directly in your SQL environment, eliminating the need to wait for engineering resources.
- For Data Engineers: It simplifies the process of building ML-powered data pipelines, removing the need for separate ML infrastructure maintenance.
Competitive Landscape
With the introduction of this feature, BigQuery enters the competitive landscape alongside Snowflake’s Cortex AI and Databricks’ Model Serving, both of which offer SQL-accessible ML inference. BigQuery’s strength lies in its direct integration with the extensive Hugging Face model catalog, making it an attractive option for users already leveraging Google Cloud.
Learning Resources
For those eager to explore this new functionality, comprehensive documentation and tutorials are readily available for text generation with Gemma models and embedding generation, ensuring users can quickly get up to speed and make the most of these advancements.
This new capability is set to revolutionize how data teams approach machine learning, streamlining processes and making cutting-edge AI more accessible than ever.