Getting Started with Google’s Page Speed API

Heather Armstrong
Reading time: 9 minutes
11th March 2019

Page speed is one of the biggest indicators of how long someone will spend on your site. Slow-loading pages can lead to higher bounce rates, lower conversion rates and, in turn, lower revenue.

To get some insight into whether load times may be affecting your audience retention and conversion, Google’s Page Speed Insights tool is a great place to start.

What’s so great about the Page Speed Insights API?

With this tool, you can plug in a URL and receive a summary of its performance. This is great for sampling a handful of URLs, but what if you have a large website and want to see a comprehensive overview of performance across multiple sections and page types?

This is where the API comes in. Google’s Page Speed Insights API gives us the opportunity to analyze performance for many pages and log the results, without needing to explicitly request URLs one at a time and interpret the results manually.

With this in mind, we’ve put together a simple guide that will get you started using the API for your own website. Once you’ve familiarized yourself with the process outlined below, you’ll see how it can be used to analyze your site speed at scale, keep track of how it’s changing over time, or even set up monitoring tools.

This guide assumes some familiarity with scripting. Here we use Python to interface with the API and parse the results.

Objectives

In this post you will learn how to:

  1. Construct a Google Page Speed Insights API query
  2. Make API requests for a table of URLs
  3. Extract basic information from the API response
  4. Run the given example script in Python

Getting set up

There are a few steps you will need to follow before querying the Page Speed Insights API with Python.

  • API setup: Many Google APIs require API keys, passwords and other authentication measures. However, you don’t need any of these to get started with the Google Page Speed Insights API!
  • Python 3 installation: If you’ve never used Python before, we recommend getting started with the Anaconda distribution (Python 3.x version), which installs Python along with popular data analysis libraries like Pandas.

Making the requests

Basics of a request

The API can be queried at this endpoint using GET requests:

GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed

We then add query parameters to specify the URL whose page speed we want to measure and the device type to use, as shown below:

https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={url}&strategy={device_type}

When making requests, replace {url} with the URL-encoded address of a page on your website, and {device_type} with either mobile or desktop.
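Rather than formatting the query string by hand, the standard library’s urllib.parse.urlencode can take care of the URL encoding. This is a minimal sketch: build_request_url is a hypothetical helper name, and the optional api_key argument assumes the key is sent as the standard key query parameter that Google APIs accept.

```python
from urllib.parse import urlencode

# Endpoint from this post
ENDPOINT = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'


def build_request_url(page_url, device_type='mobile', api_key=None):
    """Hypothetical helper: build a Page Speed Insights request URL.

    urlencode() percent-encodes every value, so characters such as ':' and
    '/' inside page_url are escaped correctly.
    """
    if device_type not in ('mobile', 'desktop'):
        raise ValueError("device_type must be 'mobile' or 'desktop'")
    params = {'url': page_url, 'strategy': device_type}
    if api_key:
        # Assumption: the API key goes in the standard 'key' query parameter
        params['key'] = api_key
    return '{}?{}'.format(ENDPOINT, urlencode(params))


print(build_request_url('https://www.example.com/blog', 'desktop'))
```

Validating device_type up front means a typo in the input table fails immediately instead of producing a confusing API error.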

Python packages

To make the requests, parse the responses and write the results to tables, we’ll use a few Python libraries (urllib and json are part of Python’s standard library):

  • urllib: To make the HTTP requests.
  • json: To parse and read the response objects.
  • pandas: To save the results in CSV format.

Constructing the query

To make an API request using Python, we can use the urllib.request.urlopen function:

import urllib.parse
import urllib.request

url = 'http://www.example.com'
device_type = 'mobile'  # or 'desktop'

# URL-encode the page URL (safe='' also escapes the '/' characters)
escaped_url = urllib.parse.quote(url, safe='')

# Construct the request URL and send a GET request to the API
contents = urllib.request.urlopen(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}&strategy={}'\
    .format(escaped_url, device_type)
).read().decode('UTF-8')

This request should return a (surprisingly large) JSON response. We’ll discuss this in more detail shortly.
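For a first look at the response, it can help to parse the decoded string with json.loads and list its top-level keys. A minimal sketch, assuming contents holds the string returned by the request above; top_level_keys is a hypothetical name, and the sample string is a made-up, heavily trimmed stand-in rather than real API output.

```python
import json


def top_level_keys(contents):
    """Parse a raw JSON response string and return its top-level keys, sorted."""
    return sorted(json.loads(contents).keys())


# Made-up stand-in for a response, only to show the shape of the call
sample = '{"id": "https://www.example.com/", "loadingExperience": {}, "lighthouseResult": {}}'
print(top_level_keys(sample))
```

With a real response, you would call top_level_keys(contents) instead of using the sample string.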

Making multiple queries

A major selling point of this API is that it enables us to pull page speeds for batches of URLs. Let’s take a look at how this can be done with Python.

One option is to store the request parameters (url and device_type) in a CSV, which we can load into a Pandas DataFrame to iterate over. Notice below that each request, i.e. each unique url + device_type pair, has its own row.

Store data in CSV

URL,device_type
https://www.example.com,desktop
https://www.example.com,mobile
https://www.example.com/blog,desktop
https://www.example.com/blog,mobile

Load the CSV

import pandas as pd

# Path to your CSV of URLs (replace with your own file name)
url_file = 'urls.csv'
df = pd.read_csv(url_file)
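Stray spaces after the commas or inconsistent capitalization in the CSV can break the column and device-type lookups in the request loop, so it may be worth tidying the table as it is loaded. A minimal sketch; load_url_table is a hypothetical name and the sample CSV text is made up.

```python
import io

import pandas as pd


def load_url_table(source):
    """Hypothetical helper: load the URL table and tidy it before querying.

    Strips stray whitespace from column names and values (so a header like
    'URL, device_type' still yields a 'device_type' column), lower-cases the
    device types and rejects anything other than mobile or desktop.
    """
    df = pd.read_csv(source, skipinitialspace=True)
    df.columns = df.columns.str.strip()
    df['URL'] = df['URL'].str.strip()
    df['device_type'] = df['device_type'].str.strip().str.lower()
    bad = ~df['device_type'].isin(['mobile', 'desktop'])
    if bad.any():
        raise ValueError('Unexpected device_type values: {}'.format(
            sorted(df.loc[bad, 'device_type'].unique())))
    return df


csv_text = 'URL, device_type\nhttps://www.example.com, desktop\nhttps://www.example.com, Mobile\n'
df = load_url_table(io.StringIO(csv_text))
print(df)
```

In practice you would pass the file path instead, e.g. df = load_url_table(url_file).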

Once we have a dataset with all the URLs to request, we can iterate through them and make an API request for each row. This is shown below:

import json
import time
import urllib.parse
import urllib.request

# This is where the responses will be stored, keyed by device type and then URL
response_object = {}

# Iterating through df
for i in range(0, len(df)):

    # Define the request parameters
    url = df.iloc[i]['URL']
    device_type = df.iloc[i]['device_type']

    # Error handling
    try:
        print('Requesting row #:', i)

        # Making request (the page URL must be URL-encoded)
        contents = urllib.request.urlopen(
            'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}&strategy={}'\
            .format(urllib.parse.quote(url, safe=''), device_type)
        ).read().decode('UTF-8')

        # Converts to json format
        contents_json = json.loads(contents)

        # Insert returned json response into response_object
        response_object.setdefault(device_type, {})[url] = contents_json

    except Exception as e:
        print('Error:', e)
        print('Returning empty response for url:', url)
        response_object.setdefault(device_type, {})[url] = {}

    # Pause between requests (successful or not) to stay within the rate limits
    print('Sleeping for 20 seconds between responses.')
    time.sleep(20)

Reading the response

Before applying any filters or formatting on the data, we can first store the full responses for future use like this:

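import json
import os
from datetime import datetime

# Make sure the output folder exists
os.makedirs('data', exist_ok=True)

# Timestamped file name (hyphens instead of colons, which Windows does not allow in file names)
f_name = 'data/{}-response.json'.format(datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))

with open(f_name, 'w') as outfile:
    json.dump(response_object, outfile, indent=4)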
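Since the full responses are saved for future use, a later script will likely want to load them back. A minimal sketch, assuming the files sit in the data folder and keep the timestamped *-response.json naming, so that sorting the names alphabetically also sorts them chronologically; load_latest_response is a hypothetical name.

```python
import glob
import json
import os


def load_latest_response(folder='data'):
    """Hypothetical helper: load the most recently saved *-response.json file.

    Assumes each file name starts with a YYYY-MM-DD_HH... timestamp, so the
    alphabetically last file is also the newest one.
    """
    paths = sorted(glob.glob(os.path.join(folder, '*-response.json')))
    if not paths:
        raise FileNotFoundError('No saved responses found in {}'.format(folder))
    with open(paths[-1]) as infile:
        return json.load(infile)
```

For example, response_object = load_latest_response('data') restores the dictionary that the extraction step below works on, without re-querying the API.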

As mentioned above, each request returns a JSON object. These objects have many different properties relating to the given URL and are far too large to decipher without filtering and formatting.

To do this, we will be using the Pandas library, which makes it easy to extract the data we want in table format and export to CSV.

[Figure: General structure of the response. The data on load times has been minimized due to its size.]

Among other information, the response includes two major sources of page speed data: Lab data, stored in ‘lighthouseResult’, and Field data, stored in ‘loadingExperience’. Lab data comes from a Lighthouse audit of the page, while Field data is crowdsourced from real-world Chrome users (via the Chrome User Experience Report). In this post, we’ll focus on Field data only.
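Field data can be missing for pages with too little real-user traffic, and the request loop stores an empty {} for failed requests, so chained lookups such as response['loadingExperience']['metrics'] may raise errors. A minimal sketch of a defensive lookup helper that returns a default instead; get_nested is a hypothetical name and the sample response is made up and heavily trimmed.

```python
def get_nested(obj, keys, default=None):
    """Follow a sequence of dict keys / list indexes; return default if any step is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Made-up, heavily trimmed stand-in for a single API response
response = {
    'loadingExperience': {'metrics': {}},
    'lighthouseResult': {'requestedUrl': 'https://www.example.com/'},
}

print(get_nested(response, ['lighthouseResult', 'requestedUrl']))
print(get_nested(response, ['loadingExperience', 'metrics', 'FIRST_CONTENTFUL_PAINT_MS']))
```

Returning None (or a chosen default) lets you decide per field whether a gap should become an ‘Error’ cell or simply stay empty.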

In particular, we are going to extract the following metrics:

  • Requested URL and Final URL
    • We need both the requested URL and the final URL that was actually audited, so that we can check they are the same. If they differ, the request was redirected and the result describes the redirect target rather than the intended page.

    [Figure: ‘requestedUrl’ and ‘finalUrl’ in ‘lighthouseResult’, showing that both URLs are the same.]

  • First Contentful Paint (ms)
    • This is the time between the user’s first navigation to the page and when the browser first renders a piece of content, telling the user that the page is loading.
    • This metric is measured in milliseconds.
  • First Contentful Paint (proportions of slow, average, fast)
    • This shows the proportion of real-user page loads whose First Contentful Paint was fast, average or slow.

    [Figure: First Contentful Paint load time in milliseconds, labeled ‘percentile’, and the proportions of fast, average and slow loads.]

All these results can be extracted for the mobile data, the desktop data, or both.
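The First Contentful Paint metrics above map onto a handful of fields in the FIRST_CONTENTFUL_PAINT_MS entry of loadingExperience['metrics']. A minimal sketch that flattens that entry into one record, assuming the same field names and the fast, average, slow ordering of ‘distributions’ that the extraction code in this post relies on; summarize_fcp is a hypothetical name and the sample numbers are invented.

```python
def summarize_fcp(fcp_loading):
    """Hypothetical helper: flatten a FIRST_CONTENTFUL_PAINT_MS entry into one dict.

    Assumes 'distributions' holds exactly three buckets ordered fast, average,
    slow; unpacking raises ValueError if there is a different number.
    """
    fast, avg, slow = (bucket['proportion'] for bucket in fcp_loading['distributions'])
    return {
        'FCP_ms': fcp_loading['percentile'],
        'FCP_category': fcp_loading['category'],
        'FCP_fast': fast,
        'FCP_avg': avg,
        'FCP_slow': slow,
    }


# Invented example entry (the real buckets also carry min/max thresholds, omitted here)
sample_fcp = {
    'percentile': 1500,
    'category': 'AVERAGE',
    'distributions': [{'proportion': 0.5}, {'proportion': 0.3}, {'proportion': 0.2}],
}
print(summarize_fcp(sample_fcp))
```

Keeping the mapping in one function means a change in the response layout only needs fixing in one place.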

If we call our Pandas dataframe df_field_responses, here is how we would extract these properties:

import pandas as pd

# Specify the device_type (mobile or desktop)
device_type = 'mobile'

# Create dataframe to store responses
df_field_responses = pd.DataFrame(
    columns=['requested_url',
             'final_url',
             'FCP_ms',
             'FCP_category',
             'FCP_fast',
             'FCP_avg',
             'FCP_slow'
    ]
)

# enumerate() numbers the URLs 0, 1, 2, ... so each one gets its own row
for i, url in enumerate(response_object[device_type].keys()):

    try:
        print('Trying to insert response for url:', url)
        response = response_object[device_type][url]

        # We reuse this below when selecting data from the response
        fcp_loading = response['loadingExperience']['metrics']['FIRST_CONTENTFUL_PAINT_MS']

        # URLs
        df_field_responses.loc[i, 'requested_url'] = response['lighthouseResult']['requestedUrl']
        df_field_responses.loc[i, 'final_url'] = response['lighthouseResult']['finalUrl']

        # Loading experience: First Contentful Paint (ms)
        df_field_responses.loc[i, 'FCP_ms'] = fcp_loading['percentile']
        df_field_responses.loc[i, 'FCP_category'] = fcp_loading['category']

        # Proportions: First Contentful Paint (distributions are ordered fast, average, slow)
        df_field_responses.loc[i, 'FCP_fast'] = fcp_loading['distributions'][0]['proportion']
        df_field_responses.loc[i, 'FCP_avg'] = fcp_loading['distributions'][1]['proportion']
        df_field_responses.loc[i, 'FCP_slow'] = fcp_loading['distributions'][2]['proportion']

        print('Inserted for row {}: {}'.format(i, df_field_responses.loc[i]))

    except Exception as e:
        print('Error:', e)
        print('Filling row with Error for row: {}; url: {}'.format(i, url))
        # Fill in 'Error' for every column if a field couldn't be found
        df_field_responses.loc[i] = ['Error'] * len(df_field_responses.columns)

Then to store the dataframe, df_field_responses, in a CSV:

df_field_responses.to_csv('page_speeds_filtered_responses.csv', index=False)
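With the results in a DataFrame, the requested and final URLs can be compared to flag pages where the result came from a redirect rather than the intended URL. A minimal sketch using made-up rows; find_redirects is a hypothetical name.

```python
import pandas as pd


def find_redirects(df):
    """Return the rows whose audited (final) URL differs from the requested URL."""
    # Skip rows that were filled with 'Error' during extraction
    valid = df[df['requested_url'] != 'Error']
    return valid[valid['requested_url'] != valid['final_url']]


# Made-up example rows
df_field_responses = pd.DataFrame({
    'requested_url': ['https://www.example.com/', 'http://www.example.com/blog', 'Error'],
    'final_url': ['https://www.example.com/', 'https://www.example.com/blog/', 'Error'],
})
print(find_redirects(df_field_responses))
```

Any rows this returns are worth re-requesting with the final URL, so the audit describes the page you actually meant to measure.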

Running the scripts on GitHub

The repository on GitHub contains instructions on how to run the files, but here is a quick breakdown.

  1. Before running the example scripts on GitHub, you will need to clone the repository using
    • git clone https://github.com/Ayima/page-speed-blog-post.git
  2. Then create a CSV file with the URLs to query.
  3. Fill in the config file with the URL file name.
  4. Run the scripts with:
    • python main.py --config-file config.json

Something to keep in mind:

The API limits how many requests you can make per day and per second.

There are several ways to prepare for this including:

  • Error handling: retry requests that return an error.
  • Throttling: pause between requests in your script to limit the number of requests sent per second.
  • API key: get one if necessary (usually if you’re making more than one query per second).
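The error-handling and throttling points can be combined in a single helper. A minimal sketch, assuming exponential backoff (doubling the wait after each failure) suits your use; that strategy, the helper names fetch_with_retries and default_fetch, and their parameters are illustrative choices, not part of the API or the post’s repository.

```python
import time
import urllib.error
import urllib.request


def default_fetch(request_url):
    """Send a GET request and return the decoded response body."""
    return urllib.request.urlopen(request_url).read().decode('UTF-8')


def fetch_with_retries(request_url, max_attempts=3, wait_seconds=20,
                       fetch=default_fetch, sleep=time.sleep):
    """Hypothetical helper: GET request_url, retrying when an attempt fails.

    After each failure it waits, doubling the wait every time (exponential
    backoff). urllib's network and HTTP errors are subclasses of OSError;
    the last error is re-raised if every attempt fails. fetch and sleep can
    be swapped out, e.g. for testing.
    """
    wait = wait_seconds
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(request_url)
        except OSError as e:
            if attempt == max_attempts:
                raise
            print('Attempt {} failed ({}); retrying in {} seconds'.format(attempt, e, wait))
            sleep(wait)
            wait *= 2
```

For example, contents = fetch_with_retries(request_url) could stand in for the direct urllib.request.urlopen(...) call in the request loop, with the fixed 20-second pause still spacing out consecutive requests.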

Hopefully, after reading this guide, you’re able to get up and running with some basic querying of the Google Page Speed Insights API. Feel free to reach us on Twitter @ayima with any questions or if you run into any problems!

How we use the Page Speed Insights API at Ayima

Here at Ayima, we continuously collect and warehouse page speeds for clients. This helps us keep an eye on the health of their websites and identify negative or positive trends. By monitoring speeds for a variety of pages, we are able to visualize performance by site section or page type (e.g. product pages vs. category pages for ecommerce websites).

We also track other interesting metrics provided by the API, including Google’s Lab data, and present everything in an interactive dashboard. For more information on this please get in touch, we would love to chat with you!

Source Code: You can find the GitHub project with an example script to run at https://github.com/Ayima/page-speed-blog-post.
