Parquet Files on AWS S3
All SpiderRock historical data is available in Parquet files, hosted on AWS S3 and partitioned in Hive style by date. This page covers the file format, how to download the data, and how to read it using common tools.
File Format
Date partitioning follows Central Time for US datasets and Central European Time for European data. To avoid conflicts with date columns that may be present in the files, the date partition is named date_p.
For options datasets, SpiderRock additionally partitions by type: futures (type=FUT) and equities (type=EQT). European data adds a third partitioning level based on the exchange the data pertains to.
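This partitioning scheme makes queries efficient when they filter on the partition keys (for example, date_p in a WHERE clause), because readers can skip entire partition directories. Data is also sorted by ticker, option characteristics, and timestamp (the date column) where applicable, which further improves query performance by helping readers skip row groups that cannot match a filter.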
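The partition layout can be expressed as a small path helper. The sketch below assumes the layout visible in the example paths on this page (v8/US/<dataset>/type=<TYPE>/date_p=<YYYY-MM-DD>/) and covers US datasets only, since the name and format of the European exchange level are not shown here. partition_prefix and parse_hive_partitions are hypothetical helpers, not part of any SpiderRock library.

```python
from datetime import date
from typing import Dict, Optional


def partition_prefix(region: str, dataset: str, day: date, type_: Optional[str] = None) -> str:
    """Build the S3 key prefix for one date partition (layout inferred from this page's examples)."""
    parts = ["v8", region, dataset]
    if type_ is not None:
        # Options datasets carry a type level (FUT or EQT) above the date level.
        parts.append(f"type={type_}")
    # The date partition column is date_p, formatted as YYYY-MM-DD.
    parts.append(f"date_p={day.isoformat()}")
    return "/".join(parts) + "/"


def parse_hive_partitions(key: str) -> Dict[str, str]:
    """Return the name=value partition segments found in an S3 key or local path."""
    partitions = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep:  # plain folder and file names contain no '=' and are skipped
            partitions[name] = value
    return partitions
```

parse_hive_partitions can be useful when post-processing a downloaded tree, since it recovers the same partition values that Hive-aware readers expose as columns.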
All timestamps are in UTC.
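Because date_p follows local time while timestamps are UTC, each partition covers a UTC interval that crosses midnight UTC, so a row's UTC calendar date can differ from its date_p. The sketch below computes that interval for a partition date. It assumes America/Chicago stands in for "Central Time" and Europe/Berlin for "Central European Time" (CET/CEST); partition_utc_window is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumed zones for the partition dates; adjust if a dataset documents otherwise.
US_PARTITION_TZ = ZoneInfo("America/Chicago")
EU_PARTITION_TZ = ZoneInfo("Europe/Berlin")


def partition_utc_window(day: date, tz: ZoneInfo) -> tuple[datetime, datetime]:
    """Return the [start, end) UTC interval covered by one local calendar date."""
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    end_local = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    # Converting both local midnights to UTC yields the bounds to apply to the date column.
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)
```

Computing the bounds from local midnights, rather than adding a fixed offset, keeps daylight saving transition days correct: those partitions span 23 or 25 hours.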
Downloading the Data
Downloading via AWS CLI
During client onboarding, you will receive AWS credentials for accessing SpiderRock data. These include a set of CLI credentials and a set of web portal credentials. SpiderRock recommends logging into the AWS portal first to explore the folder structure and identify the data available to you.
SpiderRock recommends using the AWS CLI to download data to your local machine. Once you have identified the data you need, set your CLI credentials as environment variables and run aws s3 sync:
export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your access key secret>
aws s3 sync <s3://srhistdatastore/v8/......> <your local copy path>
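If you prefer to explore the folder structure from code rather than through the web portal, a minimal sketch using boto3's list_objects_v2 paginator is shown below. Passing Delimiter="/" makes S3 group keys by their next path segment, which mimics one level of a folder view. It assumes your credentials are allowed to list the prefixes you query, and list_subfolders is a hypothetical helper that accepts any boto3 S3 client.

```python
from typing import Iterator


def list_subfolders(s3_client, bucket: str, prefix: str) -> Iterator[str]:
    """Yield the immediate 'subfolder' prefixes directly below prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    # With a delimiter, S3 returns grouped key prefixes as CommonPrefixes
    # instead of listing every object underneath them.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common_prefix in page.get("CommonPrefixes", []):
            yield common_prefix["Prefix"]
```

For example, calling it with boto3.client("s3"), the bucket srhistdatastore, and the prefix v8/US/ should list the dataset folders your credentials can see.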
Downloading via IAM Role (AWS CLI)
If you access SpiderRock data through a cross-account IAM role, add a named profile to your ~/.aws/config file. The AWS CLI will assume the role automatically when that profile is used.
[profile spiderrock]
role_arn = <SPIDERROCK_ROLE_ARN>
source_profile = default
role_session_name = historical-data-download
Once the profile is configured, pass it to aws s3 sync with the --profile flag:
aws s3 sync \
    s3://srhistdatastore/v8/US/SurfaceFixedGridIntradayHist/type=EQT/date_p=2026-02-09/ \
    ./SurfaceFixedGridIntradayHist/date_p=2026-02-09/ \
    --profile spiderrock
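The same named profile also works from Python. boto3 reads ~/.aws/config, so a session created with that profile name assumes the role in role_arn using the credentials from source_profile, and refreshes the temporary credentials as they near expiry. make_s3_client is a hypothetical helper; boto3 is imported inside the function only so the sketch can be loaded where boto3 is not installed.

```python
def make_s3_client(profile_name: str = "spiderrock"):
    """Create an S3 client from a named profile in ~/.aws/config."""
    import boto3  # deferred import; a normal script would import this at the top

    # The profile's role_arn/source_profile settings make boto3 call AssumeRole itself.
    session = boto3.Session(profile_name=profile_name)
    return session.client("s3")
```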
Downloading via IAM Role (Python)
If your organization accesses SpiderRock data through a cross-account IAM role rather than long-lived credentials, use the AWS Security Token Service (STS) to assume the role and obtain temporary credentials before downloading.
import boto3
# Create an STS client using AWS credentials from your own AWS account.
# The IAM user or role running this code must be permissioned to assume the SpiderRock-provided IAM role.
sts = boto3.client("sts")
# Replace with the AWS Role ARN provided by SpiderRock.
provided_role = "<SPIDERROCK_ROLE_ARN>"
# Assume the SpiderRock role and obtain temporary credentials.
# RoleSessionName can be customized for audit and logging purposes.
response = sts.assume_role(
    RoleArn=provided_role,
    RoleSessionName="historical-data-download",
)
# Create an S3 client using the temporary credentials returned from the AssumeRole request.
s3 = boto3.client(
    "s3",
    aws_access_key_id=response["Credentials"]["AccessKeyId"],
    aws_secret_access_key=response["Credentials"]["SecretAccessKey"],
    aws_session_token=response["Credentials"]["SessionToken"],
)
# Download an object from the SpiderRock Historical Data bucket.
# Replace the Key value with the desired dataset, date, and file path.
# In production workflows, clients typically parameterize the Key and run
# this process on a schedule to retrieve newly available data files.
s3.download_file(
    Bucket="srhistdatastore",
    Key="v8/US/SurfaceFixedGridIntradayHist/type=EQT/date_p=2026-02-09/SurfaceFixedGridIntradayHist_2026-02-09.parquet",
    Filename="SurfaceFixedGridIntradayHist_2026-02-09.parquet",
)
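One way to parameterize the Key is to list every object under a partition prefix and download each one, keeping the Hive directory names so the local tree can later be read with hive partitioning. download_prefix is a hypothetical helper that works with an S3 client such as the one created above. It skips files that already exist locally with the same size, a cruder check than aws s3 sync, which also compares modification times.

```python
from pathlib import Path


def download_prefix(s3_client, bucket: str, prefix: str, dest: str) -> list:
    """Download all objects under prefix into dest; return the paths actually fetched."""
    # Keep everything after the last '/' of the prefix, so a partial prefix such as
    # ".../date_p=2026-02" still produces date_p=2026-02-09/... folders locally.
    base = prefix[: prefix.rfind("/") + 1]
    fetched = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # zero-byte "folder" placeholder objects
            target = Path(dest) / key[len(base):]
            if target.exists() and target.stat().st_size == obj["Size"]:
                continue  # already downloaded (size-only check)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3_client.download_file(Bucket=bucket, Key=key, Filename=str(target))
            fetched.append(target)
    return fetched
```

Because S3 prefixes are matched as plain text, a prefix such as v8/US/SurfaceFixedGridIntradayHist/type=EQT/date_p=2026-02 should select every February 2026 partition of that dataset.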
Reading the Data
Many tools support working with Parquet files. Internally, SpiderRock uses DuckDB and Polars. The following examples demonstrate how to efficiently read a single day's worth of SpiderRock volatilities from OptionIntradayHist for AAPL.
DuckDB
SELECT
    okey_tk,
    okey_dt,
    okey_xx,
    okey_cp,
    date,
    srvol
FROM
    read_parquet('./OptionIntradayHist/**', hive_partitioning = true)
WHERE
    date_p = '2025-12-26'
    AND okey_tk = 'AAPL'
Polars
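import datetime

import polars as pl

df = (
    # hive_partitioning=True exposes date_p as a column; Polars only enables
    # Hive partitioning automatically when given a single directory, not a glob.
    pl.scan_parquet("./OptionIntradayHist/**", hive_partitioning=True)
    .filter(
        (pl.col("date_p") == datetime.date(2025, 12, 26))
        & (pl.col("okey_tk") == "AAPL")
    )
    .select(["okey_tk", "okey_dt", "okey_xx", "okey_cp", "date", "srvol"])
    .collect()
)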
Working with Complex Data Types
Some SpiderRock datasets use Parquet's nested (complex) data types. For example, the AuctionNotice dataset contains an OrderLegs column whose type is a list of structs.
Some clients prefer to flatten these fields by creating a separate row for each list item and extracting the struct fields into separate columns. The following example demonstrates how to accomplish this using Polars:
import polars as pl

df = (
    pl.read_parquet(
        "./AuctionNotice/date_p=2025-11-03/AuctionNotice_2025-11-03.parquet"
    )
    .explode("OrderLegs")
    .unnest("OrderLegs", separator="_")
)