The `connect` function creates a connection to the embedded backend:

```python
import xorq.api as xo

con = xo.connect()
print(f"Connected to: {con}")
```
This guide shows you how to install Xorq and configure it for your environment.
Xorq requires Python 3.10 or higher. Check your version with `python --version`. If you need to install or update Python, visit the official Python downloads page.
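You can also run the same check from Python itself using only the standard library:

```python
import sys

# Xorq requires Python 3.10 or higher.
meets_requirement = sys.version_info >= (3, 10)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if meets_requirement else 'upgrade required'}")
```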
Choose the installation method that matches your needs. Start with the examples option to get all the dependencies for tutorials and ML workflows.
Install Xorq using pip. This gives you the core library, an embedded DataFusion backend, and Pandas support.
This option includes example datasets and ML libraries like scikit-learn, XGBoost, and the OpenAI SDK.
Install the latest development version directly from GitHub.
For local development, clone the repository and install in editable mode:
Xorq uses uv internally for dependency management. You can use it to install Xorq as well.
For project-based installation with locked dependencies:
Xorq runs on multiple execution engines. Choose the backends that match your infrastructure.
If you’re just getting started, install all backends. This lets you experiment with different engines and find what works best for your use case.
This installs support for all backends at once. It’s the fastest way to explore what Xorq can do.
DuckDB works well for analytical workloads on local or moderate-sized datasets. It excels at asof joins and working with Parquet files.
DataFusion works well for in-memory analytical processing and custom UDFs. Note that an embedded DataFusion backend is included in the base installation.
PostgreSQL works well for production workloads with existing PostgreSQL databases.
PyIceberg works well for reading and writing Apache Iceberg tables in data lakes.
Snowflake works well for cloud data warehouse operations with managed infrastructure and scalability.
SQLite works well for lightweight, serverless databases and local development.
Once you’ve installed Xorq, you’ll need to connect to a backend before you can work with data.
The embedded backend is the default option. It uses a modified DataFusion engine that’s optimized for Arrow UDF execution.
Pandas works well for local development and small datasets. Here’s how to create a connection and load data into it:
PostgreSQL connections require database credentials. You can provide them directly or load them from environment variables.
Set these environment variables before running your code:
- `POSTGRES_HOST`
- `POSTGRES_PORT`
- `POSTGRES_DATABASE`
- `POSTGRES_USER`
- `POSTGRES_PASSWORD`

Then connect using `connect_env`, which reads these credentials from the environment.
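For illustration, here is how those variables can be set and checked with the standard library `os` module (the values below are placeholders):

```python
import os

# Placeholder values for demonstration; in practice, set these in your shell
# or deployment configuration rather than in code.
os.environ.setdefault("POSTGRES_HOST", "localhost")
os.environ.setdefault("POSTGRES_PORT", "5432")
os.environ.setdefault("POSTGRES_DATABASE", "analytics")
os.environ.setdefault("POSTGRES_USER", "app")
os.environ.setdefault("POSTGRES_PASSWORD", "secret")

required = [
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "POSTGRES_DATABASE",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
]
missing = [name for name in required if not os.environ.get(name)]
print("missing variables:", missing)  # empty list: connect_env has what it needs
```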
You can also provide credentials directly in your code:
Don’t hardcode credentials in production code. Use environment variables or a secrets management system instead.
DuckDB connections can be in-memory or persistent. Here’s how both options work.
For a persistent database, provide the path:
Use persistent databases when you want your data to survive between sessions. In-memory databases are faster but lose data when your program exits.
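The difference is easy to see with Python's built-in `sqlite3` module, used here purely to illustrate the in-memory versus file-backed trade-off (DuckDB behaves the same way with respect to persistence):

```python
import os
import sqlite3
import tempfile

# In-memory: the table vanishes when the connection closes.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE t (x INTEGER)")
mem.execute("INSERT INTO t VALUES (1)")
mem.close()  # the data is gone now

# File-backed: the table survives closing and reopening the connection.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE t (x INTEGER)")
disk.execute("INSERT INTO t VALUES (1)")
disk.commit()
disk.close()

reopened = sqlite3.connect(path)
persisted_rows = reopened.execute("SELECT COUNT(*) FROM t").fetchone()[0]
reopened.close()
print(persisted_rows)  # 1
```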
Snowflake connections require your account credentials and resource identifiers.
Trino connections let you query federated data sources across your infrastructure.
You can verify your installation by running a simple query. This example loads the iris dataset and filters it.
This query uses the embedded backend, so you don’t need any additional setup. It’s a good way to confirm everything is working before connecting to external databases.
If everything works correctly, you'll see output showing the aggregated sepal widths grouped by species. The result is a PyArrow Table with two columns: species (Versicolor, Setosa, Virginica) and the summed sepal widths for each species. This confirms Xorq can load data, apply transformations, and run queries on your system.
Now that you’ve installed Xorq and connected to a backend, explore these tutorials: