Introduction
This library is a toolkit for building structured data pipelines. It builds on Apache Arrow and Parquet to deliver high-throughput, low-latency processing for analytical workloads. The design emphasizes composability, type safety, and minimal runtime overhead while remaining accessible to users coming from pandas or NumPy.
Installation
Install the library from PyPI with pip. The package supports Python 3.9 through 3.12 on Linux, macOS, and Windows. For best performance on large datasets, we recommend installing the optional acceleration extras, which include native SIMD-optimized routines and parallel execution backends. Building from the GitHub source requires a recent Rust toolchain, plus CMake for the C++ extensions used by the columnar storage layer.
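Before installing, it can help to confirm that the active interpreter falls inside the supported range stated above. The following is a minimal sketch using only the standard library; the helper name is_supported and the two constants are hypothetical and are not part of the package.

```python
import sys

# Supported range taken from the text above: Python 3.9 through 3.12.
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 12)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair lies in the supported range."""
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Comparing (major, minor) tuples avoids the string-comparison pitfall where "3.10" sorts before "3.9".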
Quickstart
The simplest way to get started is to load a sample dataset and run a few common operations. Below is a complete example demonstrating CSV ingestion, schema inference, basic filtering, and aggregation. Each step is intentionally explicit so you can see how the high-level API maps to lower-level primitives. After working through it, you should understand the data model and how to extend it for your own use cases.
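Independently of the library's own API, which this section does not reproduce, the four steps named above (CSV ingestion, schema inference, filtering, aggregation) can be sketched with the Python standard library alone. This is an illustration of the concepts under stated assumptions, not the library's implementation: the sample data, the column names, and the helpers infer_type, load_csv, and mean_by are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data, inlined so the sketch runs without any files.
SAMPLE_CSV = """city,temp_c,day
Oslo,3.5,1
Oslo,5.0,2
Lima,19.0,1
Lima,21.0,2
"""


def infer_type(values):
    """Return the narrowest of int, float, str that parses every value."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def load_csv(text):
    """Ingest CSV text, infer a per-column schema, and return (schema, rows)."""
    raw_rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(raw_rows[0].keys()) if raw_rows else []
    schema = {col: infer_type([row[col] for row in raw_rows]) for col in columns}
    # Convert every cell to the type inferred for its column.
    rows = [{col: schema[col](row[col]) for col in columns} for row in raw_rows]
    return schema, rows


def mean_by(rows, key, value):
    """Group rows by `key` and return the mean of `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: sum(vals) / len(vals) for group, vals in groups.items()}


schema, rows = load_csv(SAMPLE_CSV)               # ingestion + schema inference
warm = [row for row in rows if row["temp_c"] > 4.0]  # basic filtering
summary = mean_by(warm, "city", "temp_c")         # aggregation

print(schema)
print(summary)
```

A columnar engine would perform the same steps on whole columns rather than row dictionaries; the row-based form is used here only because it keeps each step visible.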