Delta Lake Adapter
ZooPipe provides experimental support for Delta Lake, enabling high-performance reads and writes to Data Lakehouses.
Features
- ACID Transactions: Writers generate data files, and a central Coordinator commits them atomically to the Delta Log (Implementation pending API stabilization).
- Time Travel: Read older versions of your data by specifying a version number.
- Cloud Native: Native support for S3, Azure Blob Storage, and GCS using Rust-based connectors.
- Schema Evolution: (Planned) Automatically merge schema changes.
Installation
Delta Lake support is included in the standard zoopipe package, but requires cloud credentials if accessing remote storage.
Usage
Reading from Delta Lake
from zoopipe import DeltaInputAdapter
# Read the latest version
adapter = DeltaInputAdapter(
"s3://my-data-lake/users_table",
storage_options={
"AWS_ACCESS_KEY_ID": "...",
"AWS_SECRET_ACCESS_KEY": "...",
"AWS_REGION": "us-east-1"
}
)
for batch in adapter.get_batches():
print(f"Read {len(batch)} records")
Writing to Delta Lake
from zoopipe import DeltaOutputAdapter, Pipe
# Append mode (default)
writer = DeltaOutputAdapter(
"s3://my-data-lake/processed_users",
mode="append",
storage_options={...}
)
pipe = Pipe(
...,
output_adapter=writer
)
pipe.run()
Architecture
ZooPipe uses the delta-rs (deltalake) Rust crate for protocol compliance.
- Reading: The
DeltaReaderqueries the_delta_logto find valid Parquet files for the requested snapshot and uses Zoopipe's multi-threaded Parquet engine to read them. - Writing: Workers write standard Parquet files to the table directory.
- Committing: The
DeltaCoordinatorcollects the list of new files and creates an atomicAddtransaction in the Delta Log.
Note on Version 2026.1: Support for atomic commits is currently stubbed due to breaking changes in the underlying
deltalake0.30 Rust API. Files are written to storage but not yet registered in the log.