Vertex AI Pipelines will run your three Cloud Functions in sequence. It will also charge you like you're training a foundation model to do it.
I had two production pipelines: an ETL ingestion framework and an NLP review analysis system, both built on the GCP stack that made sense in 2024. Cloud Functions for triggers, Vertex AI Pipelines for orchestration, custom Docker containers for each processing step, and enough glue code to wallpaper a studio apartment. They worked. They also cost ten times more than they should have.
I modernized both. Ripped out Vertex AI Pipelines, killed every Docker container, deleted a GPU instance, replaced three separate AI services with one Gemini Flash call. The result: 98% cost reduction on the NLP pipeline. Zero containers across both projects. And the code is actually readable now.
Here's what I changed, what broke along the way, and what I'd do differently.
Three Things That Made the Old Stack Obsolete
Gemini 2.0 Flash collapsed entire ML pipelines into single API calls. Structured JSON output means you get sentiment analysis, text classification, and summarization in one request. At $0.075/million input tokens and $0.30/million output tokens, it's practically free at typical data volumes. The old pipeline ran three separate AI services to get the same result. Three billing accounts. Three failure modes. Three sets of documentation to maintain.
Cloud Workflows replaced Vertex AI Pipelines for orchestration. Vertex Pipelines is built for training complex ML models with custom containers, distributed compute, and artifact management. Using it to run three functions in sequence is like renting a freight terminal to mail a letter. Cloud Workflows is YAML-defined, lightweight, and costs orders of magnitude less.
Dataform hit production maturity in BigQuery. Declarative SQL transformations with dependency management, incremental processing, and built-in testing, all native to BigQuery. No Cloud Composer. No Airflow. No custom Python transformation scripts.
DataBridge: Config-Driven Ingestion
The original pipeline was built for one client: CSV files land in GCS, a Cloud Function picks them up and loads them into BigQuery. Simple enough. Then client two shows up. Then client three. Each one needs different file formats, different schemas, different transformations. The original system had hardcoded Python dictionaries for schemas, environment variables for configuration, and a gcloud CLI README as the deployment strategy. Every new client meant code changes.
DataBridge is config-driven. One YAML file per client defines file types, schemas, transformations, and BigQuery targets:
```yaml
# One YAML file = one client pipeline
client_name: acme_corp
project_id: acme-analytics
dataset: raw_data
bucket: acme-incoming
file_types:
  sales:
    pattern: "sales_report"
    target_table: raw_sales
    schema_file: sales.json
    transformation:
      add_ingestion_date: true
      rename_columns:
        "Store ID": "store_id"
```
A JSON schema registry validates incoming data against declared contracts. A Pub/Sub Dead Letter Queue catches failures with automatic retry (3x) before alerting. Terraform manages all infrastructure.
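A minimal sketch of what the contract check and YAML-declared transformations can look like, assuming the client config has already been parsed into a dict (e.g. with PyYAML). The schema shape, function names, and fields here are illustrative, not the actual DataBridge code:

```python
from datetime import datetime, timezone

# Hypothetical registry entry, as it might live in sales.json
SALES_SCHEMA = {
    "required": ["store_id", "sale_date", "amount"],
    "types": {"store_id": str, "sale_date": str, "amount": float},
}

def validate_row(row: dict, schema: dict) -> list[str]:
    """Return a list of contract violations for one row (empty = valid)."""
    errors = []
    for field in schema["required"]:
        if field not in row or row[field] in ("", None):
            errors.append(f"missing required field: {field}")
    for field, expected in schema["types"].items():
        if field in row and row[field] is not None and not isinstance(row[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(row[field]).__name__}"
            )
    return errors

def apply_transformations(row: dict, config: dict) -> dict:
    """Apply the YAML-declared transformations: column renames, ingestion date."""
    renamed = {config.get("rename_columns", {}).get(k, k): v for k, v in row.items()}
    if config.get("add_ingestion_date"):
        renamed["ingestion_date"] = datetime.now(timezone.utc).date().isoformat()
    return renamed
```

Rows that fail validation go to the DLQ instead of BigQuery; rows that pass get the declared renames and an ingestion date before load.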
The transformation layer moved from custom Python to Dataform. Three staging models clean and normalize raw data. Three mart models produce analytics-ready tables. The SQL is declarative, version-controlled, and testable. Any analyst can read and modify it without touching Python.
```
GCS Bucket → Pub/Sub → Cloud Function (validate + load) → BigQuery (raw)
                              ↓ (failure)                      ↓
                      DLQ → Retry → Alert           Dataform (staging → marts)
                                                               ↓
                                                  Cloud Monitoring (6 widgets)
```
The result: 56 files, 4,100 lines of code, 61 tests. New client onboarding went from days to minutes.
The thing that surprised me: the hardest part wasn't the config system or the schema registry. It was extracting the hardcoded assumptions. The original code had client-specific logic buried three layers deep in transformation functions. Column names that only made sense for one client's CSV format. Date parsing that assumed US locale because the first client was American. The archeology took longer than the architecture.
ReviewForge: One API Call Replaces Everything
This is the one that changed how I think about ML pipelines.
Before the pipeline existed, this work was manual. An analyst running sentiment analysis, classification, and summarization in Jupyter notebooks, one review batch at a time. I built the Vertex AI Pipeline to automate that entirely. Drop a CSV into GCS, the pipeline handles the rest. No notebooks. No analyst bottleneck.
The automated system worked. But it was a four-container stack on Vertex AI Pipelines:
- Container 1: Google Natural Language API for sentiment analysis
- Container 2: BART zero-shot classification, a custom TensorFlow model running on GPU
- Container 3: Gemini Pro for batch summarization
- Container 4: Custom Python for data loading and transformation
Four Docker images built via Cloud Build. GPU instances for BART inference. The BART zero-shot classification step alone took 2.5 hours to process 250 reviews. Three separate AI services, each with its own pricing model, latency profile, and failure behavior. Cost: roughly $0.006 per review.
I replaced all of it with one Gemini 2.0 Flash call.
Structured JSON output means sentiment, categories, summary, and aspect-level sentiments come back in a single response. No GPU. No containers. No Vertex AI Pipelines.
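A sketch of what that single request can look like. This builds the request body only, nothing is sent; the model name and pricing come from this post, but the exact schema fields and the `response_mime_type`/`response_schema` wiring are my assumptions and should be checked against the current Gemini API docs:

```python
def build_review_request(review_text: str) -> dict:
    """Assemble one Gemini request asking for sentiment, categories,
    summary, and aspect-level sentiment in a single structured response."""
    response_schema = {  # OpenAPI-style schema the API can enforce
        "type": "object",
        "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
            "categories": {"type": "array", "items": {"type": "string"}},
            "summary": {"type": "string"},
            "aspects": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "aspect": {"type": "string"},
                        "sentiment": {"type": "string"},
                        "evidence": {"type": "string"},
                    },
                },
            },
        },
        "required": ["sentiment", "categories", "summary", "aspects"],
    }
    return {
        "model": "gemini-2.0-flash",
        "contents": f"Analyze this product review:\n\n{review_text}",
        "config": {
            "response_mime_type": "application/json",
            "response_schema": response_schema,
        },
    }
```

One request, four outputs. The old stack needed three services and a GPU to produce less.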
Cloud Workflows orchestrates three lightweight Cloud Functions in sequence: Ingest (CSV to BigQuery), Analyze (Gemini Flash), Load (staging to final). Dataform handles the rest: two staging models and four mart models produce per-product sentiment aggregations, category trends, and aspect analysis.
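The orchestration itself fits on a page of YAML. A sketch of the three-step sequence, with placeholder function URLs; the step names are illustrative, while the `http.post` call and OIDC auth are standard Cloud Workflows constructs:

```yaml
main:
  params: [event]
  steps:
    - ingest:
        call: http.post
        args:
          url: https://REGION-PROJECT.cloudfunctions.net/ingest
          auth:
            type: OIDC
          body: ${event}
        result: ingest_result
    - analyze:
        call: http.post
        args:
          url: https://REGION-PROJECT.cloudfunctions.net/analyze
          auth:
            type: OIDC
          body: ${ingest_result.body}
        result: analyze_result
    - load:
        call: http.post
        args:
          url: https://REGION-PROJECT.cloudfunctions.net/load
          auth:
            type: OIDC
          body: ${analyze_result.body}
```

Compare that to a Vertex AI Pipeline definition with four container specs and you see where the operational simplicity comes from.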
The Numbers
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Cost per review | $0.006 | $0.0001 | -98% |
| 10K reviews/month | $60 | $1.50 | -97.5% |
| Docker containers | 4 | 0 | -100% |
| GPU instances | Yes (BART) | No | Eliminated |
| AI services used | 3 (NL API + BART + Gemini Pro) | 1 (Gemini Flash) | -67% |
| Orchestrator | Vertex AI Pipelines | Cloud Workflows | 10x cheaper |
| Dataform models | 0 | 6 | Full SQL layer |
| Terraform lines | 0 | 791 | Full IaC |
The 98% cost reduction isn't cherry-picked. At 100K reviews per month, the old pipeline costs roughly $600 in AI inference alone. The new one costs $15 total, including all GCP services.
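The per-review number is easy to sanity-check from Gemini Flash's list prices. Assuming a review plus prompt runs around 400 input tokens and the structured response around 200 output tokens (my assumption for illustration, not measured figures from the pipeline):

```python
# Gemini 2.0 Flash list prices cited above (USD per token)
INPUT_PRICE = 0.075 / 1_000_000
OUTPUT_PRICE = 0.30 / 1_000_000

def cost_per_review(input_tokens: int = 400, output_tokens: int = 200) -> float:
    """Inference cost for one review at assumed token counts."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

per_review = cost_per_review()        # ≈ $0.00009, right around the table's $0.0001
monthly_100k = per_review * 100_000   # ≈ $9 of inference at 100K reviews/month
```

The gap between that ~$9 of inference and the $15 total is Cloud Functions, Workflows, and BigQuery.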
The obvious question: is the output actually as good?
The old pipeline did document-level sentiment. Positive, negative, neutral, plus a magnitude score. That's it. BART classified reviews into fixed categories that required retraining to change. The summarization was a separate Gemini Pro call with no structured output.
Gemini Flash does aspect-level sentiment. It doesn't just tell you a review is negative. It tells you which specific feature drove the negativity, with the exact phrase as evidence. Categories are configurable YAML, not retrained models. Sentiment, classification, summarization, and aspect analysis all come back in a single structured JSON response.
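Concretely, a single structured response might look like this (an illustrative shape, not actual pipeline output):

```json
{
  "sentiment": "negative",
  "categories": ["battery", "display", "price"],
  "summary": "Great screen, but the battery can't get through a workday.",
  "aspects": [
    {"aspect": "battery life", "sentiment": "negative", "evidence": "dead by 3pm every day"},
    {"aspect": "display", "sentiment": "positive", "evidence": "the screen is gorgeous"},
    {"aspect": "price", "sentiment": "neutral", "evidence": "about what you'd expect at this price"}
  ]
}
```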
It's not just cheaper. It sees more. The old system answered "is this review positive?" The new system answers "this review is negative because of battery life, positive about the display, and neutral on price," with the exact phrases that say so.
One thing to watch: Gemini Flash structured output isn't 100% reliable. You need a validation layer in the Cloud Function that catches malformed responses and retries them. Models are probabilistic. Your pipeline shouldn't be.
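A minimal version of that validation layer, with a hypothetical `call_model` standing in for the actual Gemini request:

```python
import json

# The contract every response must satisfy before it reaches BigQuery
REQUIRED_KEYS = {"sentiment", "categories", "summary", "aspects"}

def analyze_with_retry(call_model, review: str, max_attempts: int = 3) -> dict:
    """Call the model, validate the JSON contract, retry on malformed output.

    `call_model(review) -> str` is a stand-in for the real Gemini request.
    """
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(review)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = e
            continue
        if not isinstance(parsed, dict):
            last_error = ValueError("response is not a JSON object")
            continue
        missing = REQUIRED_KEYS - parsed.keys()
        if missing:
            last_error = ValueError(f"missing keys: {missing}")
            continue
        return parsed
    raise RuntimeError(f"model output invalid after {max_attempts} attempts") from last_error
```

Failures that survive all retries land in the DLQ like any other bad record, which is exactly where a probabilistic component belongs in a deterministic pipeline.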
Dataform Is the Underrated Move
Both projects lean on Dataform, and it's the change I'd push hardest on anyone still writing custom Python transformation code for BigQuery.
Declarative SQL over imperative Python. Instead of writing Python functions that read from one table and write to another, you write SQL with config blocks that declare dependencies, materialization strategy, and assertions. Dataform handles execution order, incremental processing, and schema management.
Native BigQuery integration. No external scheduler, no Cloud Composer, no separate execution environment. It runs inside BigQuery's infrastructure.
Testable transformations. Each model can have assertions (row counts, uniqueness, not-null checks) that run automatically. In ReviewForge, the staging models validate that every review has a parsed sentiment score and at least one category before the data reaches the mart layer.
Version-controlled SQL. The entire transformation layer lives in git. Code review for SQL changes. Branch-based development. It's what dbt promised but built directly into BigQuery.
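Here's what a model looks like in practice. A sketch of a ReviewForge-style staging model; the table and column names are illustrative, while the `config` block and `assertions` syntax are standard Dataform SQLX:

```sqlx
config {
  type: "table",
  schema: "staging",
  assertions: {
    nonNull: ["review_id", "sentiment"],
    uniqueKey: ["review_id"]
  }
}

SELECT
  review_id,
  product_id,
  LOWER(sentiment) AS sentiment,
  categories,
  PARSE_DATE('%Y-%m-%d', review_date) AS review_date
FROM ${ref("raw_reviews")}
WHERE sentiment IS NOT NULL
```

The `ref()` call is what gives Dataform the dependency graph; the assertions run on every execution, so a bad batch fails loudly instead of silently polluting the marts.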
After two projects and 12 combined Dataform models, I'm confident saying this: if your transformation layer is Python scripts reading from BigQuery and writing back to BigQuery, you're maintaining complexity you don't need.
Patterns That Made Both Projects Work
Config-driven everything. Both projects use YAML configs validated by Pydantic models. No hardcoded thresholds, no environment variable sprawl. One file per client, version-controlled, with sensible defaults and clear overrides.
Terraform from day one. DataBridge's Terraform module provisions the full stack: GCS bucket, Pub/Sub topic, DLQ subscription, Cloud Function, IAM bindings, monitoring dashboard. ReviewForge's 791-line main.tf covers Cloud Workflows, Eventarc triggers, and alert policies. Infrastructure is reproducible and auditable.
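A taste of what that looks like, trimmed heavily. Resource and variable names are illustrative, and the exact required arguments should be checked against the Google provider docs:

```hcl
resource "google_workflows_workflow" "review_pipeline" {
  name            = "reviewforge-pipeline"
  region          = var.region
  service_account = google_service_account.pipeline.email
  source_contents = file("${path.module}/workflow.yaml")
}

resource "google_eventarc_trigger" "csv_landed" {
  name     = "reviewforge-csv-landed"
  location = var.region

  matching_criteria {
    attribute = "type"
    value     = "google.cloud.storage.object.v1.finalized"
  }
  matching_criteria {
    attribute = "bucket"
    value     = google_storage_bucket.incoming.name
  }

  destination {
    workflow = google_workflows_workflow.review_pipeline.id
  }

  service_account = google_service_account.pipeline.email
}
```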
Monitoring built-in, not bolted on. Both projects ship with Cloud Monitoring dashboards and alert policies as part of the Terraform deployment. DataBridge has 6 widgets tracking ingestion volume, error rates, and DLQ depth. ReviewForge has 9 widgets. You see problems the day you deploy, not three weeks later when a client calls.
Test coverage that catches real bugs. 61 tests for DataBridge. 63 tests for ReviewForge. Unit tests for config parsing, schema validation, transformation logic, and API contracts. Integration tests that run the full pipeline against local fixtures. The test suites caught schema mismatches, edge cases in date parsing, and malformed JSON from Gemini during development. Those would have been production incidents.
Whitelabel from day one. Both projects started as client-specific pipelines, and extracting hardcoded values, building schema registries, and designing client isolation took real effort. Starting whitelabel costs maybe 20% more upfront. It saves weeks when client two shows up.
Five Things I'd Tell You Before You Start
1. Match the orchestrator to the workload. Vertex AI Pipelines is for training ML models with custom containers and distributed compute. If you're running three functions in sequence, use Cloud Workflows. The 10x cost difference is the least important reason. The operational simplicity matters more.
2. Gemini Flash killed the multi-model pipeline. For classification, sentiment, and summarization, there's no reason to maintain separate models. Structured output with JSON schema enforcement gives you typed, validated responses. But validate the JSON anyway. Models are probabilistic. Your pipeline shouldn't be.
3. Dataform is production-ready. Stop waiting. The declarative SQL approach is more maintainable, more testable, and more accessible to analysts who need to understand the logic. Custom Python transformation scripts in BigQuery pipelines are technical debt you're choosing to carry.
4. The extraction is harder than the build. Modernizing an existing pipeline means finding every hardcoded assumption in the old code. Client-specific column names. Locale-dependent date parsing. Magic numbers in transformation thresholds. Budget twice as long for this as you think you need.
5. Ship monitoring with the infrastructure, not after. If the Terraform that creates your pipeline also creates your dashboards and alerts, monitoring is never the thing you'll "get to later." It's there from the first deploy.
Both repos are open source. Go break them.
Christian Bourlier builds pipelines at rezzed.ai. He also deletes them. The deleting pays better.