Version: Upcoming

Parquet Files on AWS S3

All SpiderRock historical data is available in Parquet files, hosted on AWS S3 and partitioned in Hive style by date. This page covers the file format, how to download the data, and how to read it using common tools.

File Format

Date partitioning follows Central Time for US datasets and Central European Time for European data. To avoid conflicts with date columns that may be present in the files, the date partition is named date_p.

For options datasets, SpiderRock additionally partitions by type: futures (type=FUT) and equities (type=EQT). For European data, a third partitioning level is provided based on the exchange the data pertains to.

This partitioning scheme enables efficient access when filtering by the partition key in a WHERE clause. Additionally, data is sorted by ticker, option characteristics, and timestamp (the date column) where applicable, further improving query performance.
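To illustrate why filtering on the partition key is cheap, the following sketch shows how a reader can decide which files to open from their paths alone. It is a simplified model, not how DuckDB or Polars are implemented; the helper names and the file name part.parquet are hypothetical.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Return the key=value directory segments of a Hive-style path."""
    partitions = {}
    for segment in PurePosixPath(path).parts[:-1]:  # skip the file name
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


def prune(paths: list[str], **wanted: str) -> list[str]:
    """Keep only the paths whose partition values match every requested key."""
    return [
        p
        for p in paths
        if all(parse_hive_partitions(p).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical listing; the directory layout mirrors the examples on this page.
files = [
    "OptionIntradayHist/type=EQT/date_p=2025-12-24/part.parquet",
    "OptionIntradayHist/type=EQT/date_p=2025-12-26/part.parquet",
    "OptionIntradayHist/type=FUT/date_p=2025-12-26/part.parquet",
]

# Only the matching file would need to be opened; the others are skipped unread.
selected = prune(files, type="EQT", date_p="2025-12-26")
print(selected)
```

Because the partition values live in the directory names, a filter on date_p or type can discard whole directories before any Parquet data is read.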

All timestamps are in UTC.
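Because timestamps are in UTC while partitions follow local calendars, a UTC timestamp near midnight can belong to a different date_p than its UTC date suggests. The sketch below maps a UTC timestamp to a partition date. It assumes that a row's partition is the local calendar date of its timestamp, and that the IANA zones America/Chicago and Europe/Berlin stand in for Central Time and Central European Time; both are assumptions, and the function name and region keys are hypothetical.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

# Assumed IANA zone names for the two partitioning calendars.
PARTITION_ZONES = {
    "US": ZoneInfo("America/Chicago"),
    "EU": ZoneInfo("Europe/Berlin"),
}


def partition_date(ts_utc: datetime, region: str = "US") -> date:
    """Return the local calendar date that a UTC timestamp falls on."""
    if ts_utc.tzinfo is None:
        # Treat naive timestamps as UTC, matching the files' convention.
        ts_utc = ts_utc.replace(tzinfo=timezone.utc)
    return ts_utc.astimezone(PARTITION_ZONES[region]).date()


ts = datetime(2026, 2, 10, 3, 0, tzinfo=timezone.utc)
us_day = partition_date(ts, "US")  # 21:00 on the previous day in Chicago
eu_day = partition_date(ts, "EU")  # 04:00 on the same day in Berlin
print(us_day, eu_day)
```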

Downloading the Data

Downloading via AWS CLI

During client onboarding, you will receive AWS credentials for accessing SpiderRock data. These include a set of CLI credentials and a set of web portal credentials. SpiderRock recommends logging into the AWS portal first to explore the folder structure and identify the data available to you.

To download data to your local machine, SpiderRock recommends using the AWS CLI. Once you have identified the data you need, set your CLI credentials as environment variables and run aws s3 sync:

export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your access key secret>
aws s3 sync <s3://srhistdatastore/v8/......> <your local copy path>

Downloading via IAM Role (AWS CLI)

If you access SpiderRock data through a cross-account IAM role, add a named profile to your ~/.aws/config file. The AWS CLI will assume the role automatically when that profile is used.

~/.aws/config
[profile spiderrock]
role_arn = <SPIDERROCK_ROLE_ARN>
source_profile = default
role_session_name = historical-data-download

Once the profile is configured, pass it to aws s3 sync with the --profile flag:

aws s3 sync \
    s3://srhistdatastore/v8/US/SurfaceFixedGridIntradayHist/type=EQT/date_p=2026-02-09/ \
    ./SurfaceFixedGridIntradayHist/date_p=2026-02-09/ \
    --profile spiderrock
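To fetch several days at once, one option is to generate one aws s3 sync command per date partition. The sketch below only builds the commands; it assumes the US path layout shown in the example above and a profile named spiderrock. The function name and its parameters are hypothetical, and the loop does not skip weekends or holidays.

```python
import shlex
from datetime import date, timedelta

# Bucket and version prefix taken from the example path on this page.
BUCKET_PREFIX = "s3://srhistdatastore/v8/US"


def sync_commands(dataset, start, end, type_=None, profile="spiderrock"):
    """Build one `aws s3 sync` argument list per date partition, inclusive."""
    commands = []
    day = start
    while day <= end:
        parts = [dataset]
        if type_:
            parts.append(f"type={type_}")
        parts.append(f"date_p={day.isoformat()}")
        rel = "/".join(parts) + "/"
        commands.append(
            ["aws", "s3", "sync", f"{BUCKET_PREFIX}/{rel}", f"./{rel}",
             "--profile", profile]
        )
        day += timedelta(days=1)
    return commands


cmds = sync_commands(
    "SurfaceFixedGridIntradayHist",
    date(2026, 2, 9),
    date(2026, 2, 10),
    type_="EQT",
)
for cmd in cmds:
    # Print for review; pass each list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Unlike the single-day command above, this sketch keeps the type= level in the local path, so the local copy preserves the full Hive layout for tools that read partition keys from directory names.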

Downloading via IAM Role (Python)

If your organization accesses SpiderRock data through a cross-account IAM role rather than long-lived credentials, use the AWS Security Token Service (STS) to assume the role and obtain temporary credentials before downloading.

import boto3

# Create an STS client using AWS credentials from your own AWS account.
# The IAM user or role running this code must have permission to assume the SpiderRock-provided IAM role.
sts = boto3.client("sts")

# Replace with the AWS Role ARN provided by SpiderRock.
provided_role = "<SPIDERROCK_ROLE_ARN>"

# Assume the SpiderRock role and obtain temporary credentials.
# RoleSessionName can be customized for audit and logging purposes.
response = sts.assume_role(
    RoleArn=provided_role,
    RoleSessionName="historical-data-download",
)

# Create an S3 client using the temporary credentials returned from the AssumeRole request.
s3 = boto3.client(
    "s3",
    aws_access_key_id=response["Credentials"]["AccessKeyId"],
    aws_secret_access_key=response["Credentials"]["SecretAccessKey"],
    aws_session_token=response["Credentials"]["SessionToken"],
)

# Download an object from the SpiderRock Historical Data bucket.
# Replace the Key value with the desired dataset, date, and file path.
# In production workflows, clients typically parameterize the Key and run
# this process on a schedule to retrieve newly available data files.
s3.download_file(
    Bucket="srhistdatastore",
    Key="v8/US/SurfaceFixedGridIntradayHist/type=EQT/date_p=2026-02-09/SurfaceFixedGridIntradayHist_2026-02-09.parquet",
    Filename="SurfaceFixedGridIntradayHist_2026-02-09.parquet",
)
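When the exact file names inside a partition are not known in advance, one approach is to list the partition prefix and download every object found. The following is a minimal sketch that takes an existing boto3 S3 client, such as the s3 client created above, as an argument. The function name download_prefix and its parameters are hypothetical; get_paginator, list_objects_v2, and download_file are standard boto3 S3 client calls.

```python
from pathlib import Path


def download_prefix(s3, bucket: str, prefix: str, dest: str) -> list[str]:
    """Download every object under `prefix`, keeping the key layout under `dest`."""
    downloaded = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" marker objects
            target = Path(dest) / key
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(Bucket=bucket, Key=key, Filename=str(target))
            downloaded.append(key)
    return downloaded


# Example call (requires a real client with access to the bucket):
# download_prefix(
#     s3,
#     "srhistdatastore",
#     "v8/US/SurfaceFixedGridIntradayHist/type=EQT/date_p=2026-02-09/",
#     "./data",
# )
```

Temporary credentials from AssumeRole expire (after one hour unless a different DurationSeconds is requested), so long-running jobs may need to assume the role again and rebuild the client.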

Reading the Data

Many tools can read Parquet files; internally, SpiderRock uses DuckDB and Polars. The following examples read a single day of SpiderRock volatilities (srvol) for AAPL from OptionIntradayHist, filtering on the date_p partition key so that the query benefits from the partitioning described above.

DuckDB

SELECT
	okey_tk,
	okey_dt,
	okey_xx,
	okey_cp,
	date,
	srvol
FROM
	read_parquet('./OptionIntradayHist/**', hive_partitioning = true)
WHERE
	date_p = '2025-12-26'
	AND okey_tk = 'AAPL'

Polars

import polars as pl
import datetime

df = (
    # Enable Hive partitioning explicitly so date_p is exposed as a column.
    pl.scan_parquet("./OptionIntradayHist/**", hive_partitioning=True)
    .filter(
        (pl.col("date_p") == datetime.date(2025, 12, 26)).and_(
            pl.col("okey_tk") == "AAPL"
        )
    )
    .select(["okey_tk", "okey_dt", "okey_xx", "okey_cp", "date", "srvol"])
    .collect()
)

Working with Complex Data Types

Some SpiderRock datasets use complex data types available in Parquet files. For example, the AuctionNotice dataset contains an OrderLegs column of type "list of struct."

Some clients prefer to flatten these fields by creating a separate row for each list item and extracting the struct fields into separate columns. The following example demonstrates how to accomplish this using Polars:

import polars as pl

df = (
    pl.read_parquet(
        "./AuctionNotice/date_p=2025-11-03/AuctionNotice_2025-11-03.parquet"
    )
    .explode("OrderLegs")
    .unnest("OrderLegs", separator="_")
)
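For readers unfamiliar with these two operations, the following plain-Python sketch shows the shape of the result on a tiny in-memory example. It is an illustration of the idea rather than Polars itself: the field names NoticeId, Side, and Ratio are hypothetical, and the sketch assumes the separator joins the list column's name to each struct field name. Rows with an empty list produce no output rows here, which may differ from your DataFrame library's behavior.

```python
def explode_and_unnest(rows, column, separator="_"):
    """Emit one flat dict per list item, prefixing struct fields with the column name."""
    flat = []
    for row in rows:
        # Copy every column except the list column being flattened.
        base = {k: v for k, v in row.items() if k != column}
        for item in row[column]:
            flat_row = dict(base)
            for field, value in item.items():
                flat_row[f"{column}{separator}{field}"] = value
            flat.append(flat_row)
    return flat


# Hypothetical records with a "list of struct" column.
rows = [
    {
        "NoticeId": 1,
        "OrderLegs": [
            {"Side": "Buy", "Ratio": 1},
            {"Side": "Sell", "Ratio": 2},
        ],
    },
    {"NoticeId": 2, "OrderLegs": [{"Side": "Buy", "Ratio": 1}]},
]

flat = explode_and_unnest(rows, "OrderLegs")
for r in flat:
    print(r)
```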
