Parquet Files on AWS S3
File Format
All our data is available in Parquet files, partitioned in Hive style by date. Date partitioning follows Central Time for US datasets and Central European Time for European data. To avoid conflicts with date columns that may be present in the files, we name the date partition date_p.
For options datasets, we additionally partition by type: futures (type=FUT) and equities (type=EQT). For European data, we add a third partitioning level: the exchange the data pertains to.
This partitioning scheme allows for efficient access when filtering by the partition key in a WHERE clause. Additionally, the data is sorted by ticker, option characteristics, and timestamp (the date column) where applicable, further improving query performance.
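For illustration, a downloaded dataset laid out this way looks like the following (the file-name pattern matches the example paths later in this document; options datasets add a type=FUT / type=EQT level, and European datasets an exchange level):

AuctionNotice/
    date_p=2025-11-03/
        AuctionNotice_2025-11-03.parquet
    date_p=2025-11-04/
        AuctionNotice_2025-11-04.parquet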
All timestamps are in UTC.
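If you need to view timestamps in exchange-local time, convert them after loading. The following is a minimal Polars sketch, assuming a timezone-aware UTC timestamp column named date, as in the examples below:

import datetime
import polars as pl

# Toy frame standing in for loaded data; "date" carries UTC timestamps.
df = pl.DataFrame(
    {"date": [datetime.datetime(2025, 12, 26, 14, 30, tzinfo=datetime.timezone.utc)]}
)

# Convert UTC to US Central Time (the partition calendar for US datasets).
df = df.with_columns(pl.col("date").dt.convert_time_zone("America/Chicago"))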
Downloading the Data
During client onboarding, you will receive AWS credentials for accessing our data. These include a set of CLI credentials and a set of web portal credentials. We recommend logging into the AWS portal first to explore the folder structure and identify the data available to you.
To download data to your local machine, we recommend using the AWS CLI. Once you have identified the data you need, use the following command:
# Authenticate the AWS CLI with the credentials issued during onboarding
export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your access key secret>

# Mirror the chosen S3 prefix to a local directory
aws s3 sync <s3://srhistdatastore/v8/......> <your local copy path>
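If you only need a subset of a dataset, aws s3 sync accepts --exclude and --include filters (for example, --exclude "*" --include "date_p=2025-12-26/*" to fetch a single date partition), and --dryrun previews what would be transferred without downloading anything.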
Reading the Data
Many tools support working with Parquet files. Internally, we use DuckDB and Polars.
The following examples show how to efficiently read a single day's worth of SpiderRock volatilities from OptionIntradayHist for AAPL.
DuckDB
-- hive_partitioning = true exposes the date_p partition key (and, where
-- present, type) as queryable columns.
SELECT
    okey_tk,
    okey_dt,
    okey_xx,
    okey_cp,
    date,
    srvol
FROM
    read_parquet('./OptionIntradayHist/**', hive_partitioning = true)
WHERE
    date_p = '2025-12-26'  -- prunes the scan to a single partition directory
    AND okey_tk = 'AAPL'
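The same query can also be run from Python. A minimal sketch, assuming the duckdb Python package is installed (the .pl() call fetches the result as a Polars DataFrame):

import duckdb

# Run the query and fetch the result as a Polars DataFrame.
df = duckdb.sql(
    """
    SELECT okey_tk, okey_dt, okey_xx, okey_cp, date, srvol
    FROM read_parquet('./OptionIntradayHist/**', hive_partitioning = true)
    WHERE date_p = '2025-12-26' AND okey_tk = 'AAPL'
    """
).pl()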
Polars
import polars as pl
import datetime

df = (
    # hive_partitioning=True exposes the date_p partition key as a column;
    # it is not always enabled automatically when scanning a glob pattern.
    pl.scan_parquet("./OptionIntradayHist/**", hive_partitioning=True)
    .filter(
        (pl.col("date_p") == datetime.date(2025, 12, 26))
        & (pl.col("okey_tk") == "AAPL")
    )
    .select(["okey_tk", "okey_dt", "okey_xx", "okey_cp", "date", "srvol"])
    .collect()
)
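Because scan_parquet builds a lazy query, Polars can push the date_p filter down to partition pruning, reading only the matching partition directory, and decode only the selected columns rather than loading the full dataset into memory.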
Working with Complex Data Types
Some of our datasets use complex data types available in Parquet files. For example, the AuctionNotice dataset contains an OrderLegs column of type "list of struct".
Some clients prefer to flatten these fields by creating a separate row for each list item and extracting the struct fields into separate columns. The following code snippet demonstrates how to do this using Polars:
import polars as pl

df = (
    pl.read_parquet(
        "./AuctionNotice/date_p=2025-11-03/AuctionNotice_2025-11-03.parquet"
    )
    # one row per element of the OrderLegs list
    .explode("OrderLegs")
    # one column per struct field, prefixed with the original column name
    .unnest("OrderLegs", separator="_")
)
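The result has one row per order leg, with each struct field promoted to its own column named OrderLegs_<field>, so the flattened fields cannot collide with existing top-level column names.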