
Building a gold price API with Cloudflare Workers and Puppeteer


Introduction

If you’ve ever needed real-time commodity pricing data for a spreadsheet, dashboard, or application, you know the challenge: manually refreshing websites, copying values, and hoping they’re still accurate by the time you use them. Third-party gold price APIs often come with expensive monthly subscriptions, restrictive rate limits, or unreliable data quality, creating demand for DIY alternatives.

What if you could build your own serverless price tracker that’s free, fast, and updates automatically?

In this tutorial, we’ll build a production-ready gold price tracker API using Cloudflare Workers, Puppeteer, and R2 storage. The entire project runs on Cloudflare’s edge network, meaning zero servers to maintain, near-instant response times globally, and a monthly cost of $0 for most use cases.

Here’s what makes this approach powerful:

  • Automated scraping with Puppeteer: extract real-time prices from goldprice.org using browser automation on the edge
  • Smart caching with R2 storage: store the latest price in Cloudflare’s S3-compatible object storage, complete with metadata tracking previous values
  • Scheduled updates: run automated price checks every 6 hours using built-in cron triggers
  • Google Sheets ready: use a simple Apps Script function to pull live prices directly into your spreadsheets

What you’ll learn

By the end of this tutorial, you’ll know how to:

  1. Set up Cloudflare Workers for web scraping
  2. Use Puppeteer for browser automation at the edge
  3. Configure scheduled tasks with cron triggers
  4. Integrate your API with Google Sheets

Prerequisites

To follow along, you’ll need:

  • Basic JavaScript/TypeScript knowledge
  • A Cloudflare account (free tier works perfectly)
  • 20 minutes of focused time

The complete project is around 200 lines of well-organized TypeScript, and you’ll have a working API by the time we’re done.

🧑‍💻 The complete code is available in the GitHub repository. Deploy your own version, customize it for your needs, and share what you build. https://github.com/Gosu-Team/gold-price-api

Serverless web scraping vs. VPS

Traditional web scraping typically involves managing VPS servers, dealing with cron job failures, or watching cloud bills increase month after month. Serverless web scraping with Cloudflare Workers changes this entire paradigm.

The old approach costs money even when your scraper sits idle

The traditional approach: rent a VPS, install Node.js and Puppeteer, set up cron jobs, configure monitoring, handle updates, and maintain server uptime. You’re paying for a server 24/7 even though your scraper might only run for 30 seconds every few hours.

The serverless approach flips this model. With Cloudflare Workers, you write your scraping logic once, deploy it to Cloudflare’s global edge network, and it runs on-demand. No servers to maintain, no operating system updates, no worrying about uptime. You only pay for the actual execution time, measured in milliseconds.

Cloudflare Workers excel at certain scraping patterns:

| Perfect for | Not ideal for |
| --- | --- |
| Scheduled price/data collection (like our gold price tracker) | Complex multi-step scraping flows with heavy session management |
| Public API endpoints that cache scraped data | Large-scale scraping operations (thousands of pages per run) |
| Simple scraping tasks (extract a few elements from a page) | Tasks requiring long-running processes (Workers have CPU time limits) |
| Projects that need global distribution | Scraping that needs rotating proxy networks |
| Sites without advanced bot detection | |

For our gold price API, we’re in the sweet spot: a simple scrape every 6 hours, cache the result in R2 storage, and serve it via a fast API endpoint.

Your API architecture uses three Cloudflare services that work together seamlessly

Before we dive into code, let’s understand how all the pieces of our gold price API work together. This is a simple system, but it leverages three powerful Cloudflare services: Workers, Browser Workers, and R2 storage.

System design components


Cloudflare Worker (HTTP handler)

This is your main entry point. It receives HTTP requests, checks the R2 cache, and either returns cached data or triggers a fresh scrape. It’s also responsible for handling the scheduled cron events.

Browser Worker (Puppeteer)

Cloudflare’s browser rendering service gives you a real Chromium browser at the edge. We use this to navigate to goldprice.org, wait for the page to load, and extract the current price from the DOM. No need to parse HTML or deal with JavaScript rendering issues; Puppeteer handles it all.

R2 bucket (storage)

Cloudflare R2 offers scalable, durable object storage integrated with Workers, enabling serverless applications to efficiently store and serve data without egress fees. We store a single JSON file (gold-price.json) containing the current price, source URL, and timestamp. R2 also supports custom metadata, which we use to track the previous price and last update time.

Cron trigger (scheduler)

Built into Cloudflare Workers, cron triggers let you schedule functions using standard cron syntax. We set it to 0 */6 * * * (every 6 hours) to keep our price data fresh without manual intervention.

Data flow

First request scenario:

  1. User makes initial API request
  2. Worker checks R2 cache (empty)
  3. Worker launches Puppeteer to scrape goldprice.org
  4. Price data stored in R2 with metadata
  5. JSON response returned to user

Subsequent requests:

  1. User makes API request
  2. Worker checks R2 cache (hit)
  3. Cached JSON returned immediately

Scheduled updates:

  1. Cron trigger fires every 6 hours
  2. Worker launches Puppeteer to scrape current price
  3. R2 cache updated with fresh data and metadata

Puppeteer on Workers requires keeping your scraping logic lean and efficient

Now let’s build the heart of our API: the scraping service that fetches gold prices from goldprice.org using Puppeteer.

Browser rendering at the edge has limitations you need to understand

Cloudflare’s browser rendering service lets you run Puppeteer directly at the edge. Unlike the Node.js version, it’s optimized for serverless environments with some limitations:

  • No persistent browser instances between requests
  • Limited execution time (30 seconds for free tier)
  • Reduced API surface (core automation features only)

The key is keeping your scraping logic focused and efficient. Launch the browser, grab the data, and clean up immediately.

Finding the right CSS selector is the foundation of reliable scraping

Before writing code, inspect goldprice.org in your browser:

  1. Open Developer Tools
  2. Find the price element using the inspector
  3. Note the selector: .gpoticker-price
  4. Verify it contains the current spot price

This selector is stable and reliable for our scraper. If the website changes its structure, you’ll only need to update this one value.
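You can sanity-check the selector in the DevTools console before writing any Worker code:

javascript
// Run in the DevTools console on goldprice.org;
// null means the selector no longer matches
document.querySelector(".gpoticker-price")?.textContent?.trim();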

Implementing the crawler service

Create src/services/crawler.ts:

typescript
import puppeteer from "@cloudflare/puppeteer";
import { PriceData } from "../types";
import { storePrice } from "./storage";

export async function crawlAndStorePrice(
  env: RuntimeEnv,
  targetUrl: string
): Promise<void> {
  let browser, page;

  try {
    // Launch browser instance
    browser = await puppeteer.launch(env.BROWSER);
    page = await browser.newPage();

    // Navigate and wait for network to settle
    await page.goto(targetUrl, {
      waitUntil: "networkidle0",
    });

    // Extract price using CSS selector
    const price = await page.$eval(
      ".gpoticker-price",
      (el) => el.textContent?.trim() ?? ""
    );

    if (!price) {
      throw new Error("Price not found - check selector");
    }

    // Prepare data object
    const priceData: PriceData = {
      price,
      source: targetUrl,
      fetchedAt: new Date().toISOString(),
    };

    // Store in R2 bucket
    await storePrice(env, priceData);
  } finally {
    // Always clean up resources
    await page?.close();
    await browser?.close();
  }
}
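The file imports PriceData from ../types and references a RuntimeEnv type we haven’t defined. Here’s a minimal sketch, with shapes inferred from how the tutorial uses them (the repository may declare these differently):

typescript
// src/types.ts (sketch — shapes inferred from usage in this tutorial)
export interface PriceData {
  price: string;     // raw text extracted from .gpoticker-price
  source: string;    // URL the price was scraped from
  fetchedAt: string; // ISO 8601 timestamp
}

// Since the handlers use RuntimeEnv without importing it, it's presumably
// declared globally, e.g. in a .d.ts file. Fetcher and R2Bucket come from
// @cloudflare/workers-types.
declare interface RuntimeEnv {
  BROWSER: Fetcher;        // Browser Rendering binding
  MISC: R2Bucket;          // R2 bucket binding
  GOLD_PRICE_ORG?: string; // optional override for the target URL
}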

Always clean up browser resources in a finally block

The finally block is essential. Browser instances consume memory and count toward your Worker’s resource limits. Always close them, even if scraping fails.

For production, add specific error types:

typescript
try {
  // scraping logic
} catch (error) {
  // In TypeScript the catch variable is typed `unknown`, so narrow it first
  const message = error instanceof Error ? error.message : String(error);

  if (message.includes("timeout")) {
    // Handle slow page loads
  } else if (message.includes("not found")) {
    // Handle missing elements
  }

  throw error; // Re-throw for logging
}

Performance optimization tips

  1. Use networkidle0: waits until network activity stops, ensuring dynamic content loads completely
  2. Keep sessions short: don’t reuse browser instances. Launch, scrape, close. Each request gets a fresh browser.
  3. Validate early: check if the price element exists before processing. Fail fast on structural changes.
  4. Set timeouts: add page.setDefaultTimeout(10000) to prevent hanging on slow connections (see the snippet below)
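Tip 4 in context: the timeout belongs right after the page is created in crawler.ts (a two-line change to the listing above):

typescript
page = await browser.newPage();
// Abort waits (goto, $eval, ...) after 10s instead of hanging until the Worker's limit
page.setDefaultTimeout(10_000);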

This crawler is the foundation. Next, we’ll wire it to R2 storage for intelligent caching.

R2 storage eliminates egress fees that make traditional object storage expensive

R2 is Cloudflare’s S3-compatible object storage with a key advantage: zero egress fees. This makes it perfect for our caching strategy.

Traditional object storage charges you for every read

Traditional object storage services charge for data transfer out of their network. For an API that serves price data repeatedly, these egress fees add up quickly. R2 eliminates this cost entirely.

Benefits for our use case:

  • S3-compatible API (familiar interface)
  • No egress fees (unlimited reads for our volume)
  • Metadata support (track previous prices)
  • Perfect for JSON caching
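One setup step worth calling out: the bucket has to exist before the Worker can bind to it. Wrangler can create it from the CLI (the bucket name here is illustrative and must match the binding in your wrangler.jsonc):

bash
# Create the R2 bucket that the MISC binding will point at
npx wrangler r2 bucket create gold-price-cache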

Storage service implementation

Create src/services/storage.ts:

typescript
import { PriceData } from "../types";

export async function storePrice(
  env: RuntimeEnv,
  priceData: PriceData
): Promise<void> {
  // Get existing data for metadata
  const existing = await getStoredPrice(env);

  await env.MISC.put(
    "gold-price.json",
    JSON.stringify(priceData),
    {
      httpMetadata: {
        contentType: "application/json",
      },
      customMetadata: {
        lastUpdated: priceData.fetchedAt,
        previousPrice: existing?.price ?? "none",
        previousUpdate: existing?.fetchedAt ?? "none",
      },
    }
  );
}

export async function getStoredPrice(
  env: RuntimeEnv
): Promise<PriceData | null> {
  const object = await env.MISC.get("gold-price.json");

  if (!object) {
    return null;
  }

  const data = await object.text();
  return JSON.parse(data) as PriceData;
}
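getStoredPrice() returns only the JSON body, but the custom metadata written by storePrice() is retrievable too. If you want to surface the previous price (say, to compute a change indicator), here’s a sketch using R2’s head() call (the function name is ours, not from the repository):

typescript
export async function getPriceMetadata(
  env: RuntimeEnv
): Promise<Record<string, string> | null> {
  // head() returns the object's metadata without downloading the body
  const object = await env.MISC.head("gold-price.json");
  return object?.customMetadata ?? null;
}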

Cron triggers automate price updates without external dependencies

One of Cloudflare Workers’ most powerful features is built-in cron triggers. Instead of setting up external schedulers or keeping servers running just to trigger tasks, you can schedule your Worker to run automatically at specific intervals.

Configuring cron triggers

In your wrangler.jsonc file, add the triggers configuration:

json
{
  "name": "gold-api",
  "main": "src/index.ts",
  "triggers": {
    "crons": ["0 */6 * * *"]
  }
}
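Note that this snippet shows only the cron trigger. The code also relies on the BROWSER and MISC bindings, so a more complete wrangler.jsonc would declare them too (the compatibility date and bucket name are illustrative; check the repository for the exact configuration):

json
{
  "name": "gold-api",
  "main": "src/index.ts",
  "compatibility_date": "2024-01-01",
  "browser": { "binding": "BROWSER" },
  "r2_buckets": [
    { "binding": "MISC", "bucket_name": "gold-price-cache" }
  ],
  "triggers": {
    "crons": ["0 */6 * * *"]
  }
}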

The cron expression 0 */6 * * * means “run at minute 0 of every 6th hour.” In other words, the scraper runs at 12:00 AM, 6:00 AM, 12:00 PM, and 6:00 PM UTC (Workers cron triggers are evaluated in UTC), four times per day to keep prices fresh.

Cron syntax breakdown:

  • 0 - Minute (0th minute of the hour)
  • */6 - Every 6 hours
  • * - Every day of month
  • * - Every month
  • * - Every day of week

Implementing the scheduled handler

Create src/handlers/scheduled.ts:

typescript
import { crawlAndStorePrice } from "../services/crawler";
import { DEFAULT_URL } from "../config/constants";

export async function handleScheduled(
  env: RuntimeEnv,
  ctx: ExecutionContext
): Promise<void> {
  const targetUrl = env.GOLD_PRICE_ORG ?? DEFAULT_URL;

  ctx.waitUntil(
    crawlAndStorePrice(env, targetUrl)
  );
}
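The handler imports DEFAULT_URL from src/config/constants.ts, which the tutorial doesn’t show. A minimal sketch (the PRICE_SELECTOR constant is our addition; the crawler above hardcodes the selector):

typescript
// src/config/constants.ts (sketch)
export const DEFAULT_URL = "https://goldprice.org/";

// Optional: hoist the CSS selector here so a site change means a one-line fix
export const PRICE_SELECTOR = ".gpoticker-price";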

The ctx.waitUntil() method ensures the Worker doesn’t terminate before the scraping completes, even though the scheduled event itself has no client waiting for a response.

Main entry point

Wire both handlers in src/index.ts:

typescript
import { handleFetch } from "./handlers/fetch";
import { handleScheduled } from "./handlers/scheduled";

export default {
  async fetch(
    request: Request,
    env: RuntimeEnv,
    ctx: ExecutionContext
  ): Promise<Response> {
    return handleFetch(request, env);
  },

  async scheduled(
    event: ScheduledEvent,
    env: RuntimeEnv,
    ctx: ExecutionContext
  ): Promise<void> {
    return handleScheduled(env, ctx);
  },
};
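The fetch handler imported above isn’t shown in this walkthrough. Following the data flow described earlier (serve from the R2 cache, scrape on a miss), src/handlers/fetch.ts might look roughly like this sketch:

typescript
import { crawlAndStorePrice } from "../services/crawler";
import { getStoredPrice } from "../services/storage";
import { DEFAULT_URL } from "../config/constants";

export async function handleFetch(
  request: Request,
  env: RuntimeEnv
): Promise<Response> {
  // Fast path: serve the cached price from R2
  let priceData = await getStoredPrice(env);

  // Cache miss (first request ever): scrape synchronously
  // so this caller still gets a price back
  if (!priceData) {
    await crawlAndStorePrice(env, env.GOLD_PRICE_ORG ?? DEFAULT_URL);
    priceData = await getStoredPrice(env);
  }

  if (!priceData) {
    return new Response(JSON.stringify({ error: "Price unavailable" }), {
      status: 503,
      headers: { "Content-Type": "application/json" },
    });
  }

  return new Response(JSON.stringify(priceData), {
    headers: { "Content-Type": "application/json" },
  });
}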

Testing locally

Before deploying, test your cron handler locally:

bash
# Start the development server
# (if /__scheduled returns 404, run wrangler dev with --test-scheduled)
npm run dev

# In another terminal, trigger the cron manually
# (quoted so the shell doesn't expand * and ?)
curl "http://localhost:8787/__scheduled?cron=0+*/6+*+*+*"

If everything works as expected, you’ll see the freshly scraped price in the response.


Google Sheets integration makes your API accessible to non-technical users

Now that we have a working API, let’s make it accessible to one of the most common use cases: Google Sheets. Whether you’re tracking an investment portfolio, building a financial dashboard, or just monitoring commodity prices, pulling live data into spreadsheets is incredibly useful.

Google Sheets is the perfect no-code frontend for price data

Google Sheets is an ideal frontend for our gold price API. Non-technical users can access real-time prices without writing code, share data across teams, and build custom dashboards with charts and formulas. Best of all, it’s free and works everywhere.

Many investors already track their precious metals holdings using spreadsheet formulas to calculate current values based on live spot prices.

Creating a custom function

Google Sheets supports custom functions via Apps Script. Here’s how to create a GETGOLDPRICE() function:

Step 1: Open Apps Script

  1. Open your Google Sheet
  2. Click Extensions → Apps Script
  3. Delete any existing code in the editor

Step 2: Add the function

Paste this code:

javascript
function GETGOLDPRICE() {
  var url = "https://gold-api.your-worker.dev/";

  try {
    var response = UrlFetchApp.fetch(url);
    var data = JSON.parse(response.getContentText());

    // Return just the numeric price value
    return parseFloat(data.price.replace(/[$,]/g, ''));
  } catch (error) {
    return "Error: " + error.message;
  }
}

Replace the placeholder URL with your deployed Worker’s actual address.

Step 3: Save and use

  1. Save the project (Ctrl/Cmd + S)
  2. Name it “Gold Price Tracker”
  3. Return to your spreadsheet
  4. In any cell, type =GETGOLDPRICE()

Real-world portfolio tracking examples

Here’s a simple portfolio tracker structure:

| Asset | Price | Holdings | Value |
| --- | --- | --- | --- |
| Gold | =GETGOLDPRICE() | 10 | =B2*C2 |

Add conditional formatting to highlight price changes, build charts to visualize trends, or set up email alerts when prices cross certain thresholds using Apps Script triggers.
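As an example of that last idea, a time-driven Apps Script trigger can watch the price and email you when it crosses a threshold. A sketch (the function name and threshold are ours; adjust both to taste):

javascript
function checkGoldPriceAlert() {
  var THRESHOLD = 2500; // example threshold in USD
  var url = "https://gold-api.your-worker.dev/";

  var response = UrlFetchApp.fetch(url);
  var data = JSON.parse(response.getContentText());
  var price = parseFloat(data.price.replace(/[$,]/g, ''));

  if (price > THRESHOLD) {
    MailApp.sendEmail(
      Session.getActiveUser().getEmail(),
      "Gold price alert",
      "Gold is trading at " + price + ", above your threshold of " + THRESHOLD + "."
    );
  }
}
// In the Apps Script editor, use Triggers → Add Trigger to run
// this function on a time-driven schedule (e.g. hourly).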

Your gold price API is now accessible to anyone with a spreadsheet (no coding required). This solves a common problem where users struggle to find free APIs with adequate access for their spreadsheet integrations.

Conclusion

You’ve just built a production-ready gold price API using Cloudflare’s edge platform and it cost you nothing but 20 minutes of setup time. This isn’t just a toy project; it’s a fully functional serverless application that scrapes live data, caches intelligently, updates automatically, and serves requests globally with minimal latency.

What we accomplished:

  • Automated web scraping with Puppeteer at the edge
  • Smart caching using R2 storage with metadata tracking
  • Scheduled updates via cron triggers (zero external dependencies)
  • Google Sheets integration for non-technical users
  • Global deployment across data centers worldwide

The beauty of this architecture is its versatility. The same pattern works for tracking silver prices, cryptocurrency rates, stock indices, or any publicly available data. Change the URL, update the selector, and you have a new API in minutes.


Additional resources