
Parquet Files on AWS S3

Overview

All SpiderRock historical data is available in Parquet files, hosted on AWS S3 and partitioned in Hive style by date. This page covers the file format, how to download the data, and how to read it using common tools.

File Format

Date partitioning follows Central Time for US datasets and Central European Time for European data. To avoid conflicts with date columns that may be present in the files, the date partition is named date_p.

For options datasets, SpiderRock additionally partitions by type: futures (type=FUT) and equities (type=EQT). For European data, a third partitioning level is provided based on the exchange the data pertains to.

This partitioning scheme enables efficient access when filtering by the partition key in a WHERE clause. Additionally, data is sorted by ticker, option characteristics, and timestamp (the date column) where applicable, further improving query performance.
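Put together, a dataset root looks roughly like this (dates and file names here are illustrative, not actual bucket contents):

```text
OptionIntradayHist/
  date_p=2025-12-26/
    type=EQT/
      part-0.parquet
    type=FUT/
      part-0.parquet
  date_p=2025-12-27/
    type=EQT/
      part-0.parquet
    type=FUT/
      part-0.parquet
```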

All timestamps are in UTC.
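Because the date partition follows Central Time while timestamps are UTC, a record shortly after midnight UTC belongs to the previous day's partition. A quick standard-library check:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# 02:00 UTC on 2025-12-27 is 20:00 on 2025-12-26 in Chicago (CST, UTC-6),
# so a US record with this timestamp lands in the date_p=2025-12-26 partition.
ts_utc = datetime(2025, 12, 27, 2, 0, tzinfo=timezone.utc)
central = ts_utc.astimezone(ZoneInfo("America/Chicago"))
print(central.date())  # 2025-12-26
```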

Downloading the Data

During client onboarding, you will receive AWS credentials for accessing SpiderRock data. These include a set of CLI credentials and a set of web portal credentials. SpiderRock recommends logging into the AWS portal first to explore the folder structure and identify the data available to you.

To download data to your local machine, SpiderRock recommends using the AWS CLI. Once you have identified the data you need, export your credentials and run aws s3 sync:

export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your access key secret>
aws s3 sync <s3://srhistdatastore/v8/......> <your local copy path>

Reading the Data

Many tools support working with Parquet files. Internally, SpiderRock uses DuckDB and Polars. The following examples demonstrate how to efficiently read a single day's worth of SpiderRock volatilities from OptionIntradayHist for AAPL.

DuckDB

SELECT
    okey_tk,
    okey_dt,
    okey_xx,
    okey_cp,
    date,
    srvol
FROM
    read_parquet('./OptionIntradayHist/**', hive_partitioning = true)
WHERE
    date_p = '2025-12-26'
    AND okey_tk = 'AAPL'

Polars

import polars as pl
import datetime

df = (
    pl.scan_parquet("./OptionIntradayHist/**", hive_partitioning=True)
    .filter(
        (pl.col("date_p") == datetime.date(2025, 12, 26)).and_(
            pl.col("okey_tk") == "AAPL"
        )
    )
    .select(["okey_tk", "okey_dt", "okey_xx", "okey_cp", "date", "srvol"])
    .collect()
)

Working with Complex Data Types

Some SpiderRock datasets use complex data types available in Parquet files. For example, the AuctionNotice dataset contains an OrderLegs column of type "list of struct."

Some clients prefer to flatten these fields by creating a separate row for each list item and extracting the struct fields into separate columns. The following example demonstrates how to accomplish this using Polars:

df = (
    pl.read_parquet(
        "./AuctionNotice/date_p=2025-11-03/AuctionNotice_2025-11-03.parquet"
    )
    .explode("OrderLegs")
    .unnest("OrderLegs", separator="_")
)