The `connect` function creates a connection to the embedded backend:

```python
import xorq.api as xo

con = xo.connect()
print(f"Connected to: {con}")
```
This guide shows you how to install Xorq and configure it for your environment.
Xorq requires Python 3.10 or higher. Check your version with `python --version`. If you need to install or update Python, visit the official Python downloads page.
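You can also run the same check from Python itself using only the standard library:

```python
import sys

# Xorq requires Python 3.10 or higher.
meets_requirement = sys.version_info >= (3, 10)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if meets_requirement else 'upgrade required'}")
```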
Choose the installation method that matches your needs. Start with the examples option to get all the dependencies for tutorials and ML workflows.
Install Xorq using pip. This gives you the core library, an embedded DataFusion backend, and Pandas support.
This option includes example datasets and ML libraries like scikit-learn, XGBoost, and the OpenAI SDK.
Install the latest development version directly from GitHub.
For local development, clone the repository and install in editable mode:
Xorq uses uv internally for dependency management. You can use it to install Xorq as well.
For project-based installation with locked dependencies:
Xorq runs on multiple execution engines. Choose the backends that match your infrastructure.
If you’re just getting started, install all backends. This lets you experiment with different engines and find what works best for your use case.
This installs support for all backends at once. It’s the fastest way to explore what Xorq can do.
DuckDB works well for analytical workloads on local or moderate-sized datasets. It excels at asof joins and working with Parquet files.
DataFusion works well for in-memory analytical processing and custom UDFs. Note that an embedded DataFusion backend is included in the base installation.
PostgreSQL works well for production workloads with existing PostgreSQL databases.
PyIceberg works well for reading and writing Apache Iceberg tables in data lakes.
Snowflake works well for cloud data warehouse operations with managed infrastructure and scalability.
SQLite works well for lightweight, serverless databases and local development.
Once you’ve installed Xorq, you’ll need to connect to a backend before you can work with data.
The embedded backend is the default option. It uses a modified DataFusion engine that’s optimized for Arrow UDF execution.
Pandas works well for local development and small datasets. Here’s how to create a connection and load data into it:
PostgreSQL connections require database credentials. You can provide them directly or load them from environment variables.
Set these environment variables before running your code:
- `POSTGRES_HOST`
- `POSTGRES_PORT`
- `POSTGRES_DATABASE`
- `POSTGRES_USER`
- `POSTGRES_PASSWORD`

Then connect using `connect_env`, which reads these credentials from the environment.
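For illustration, here is how those variables can be set and checked with the standard library `os` module (the values below are placeholders):

```python
import os

# Placeholder values for demonstration; in practice, set these in your shell
# or deployment configuration rather than in code.
os.environ.setdefault("POSTGRES_HOST", "localhost")
os.environ.setdefault("POSTGRES_PORT", "5432")
os.environ.setdefault("POSTGRES_DATABASE", "analytics")
os.environ.setdefault("POSTGRES_USER", "app")
os.environ.setdefault("POSTGRES_PASSWORD", "secret")

required = [
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "POSTGRES_DATABASE",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
]
missing = [name for name in required if not os.environ.get(name)]
print("missing variables:", missing)  # empty list: connect_env has what it needs
```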
You can also provide credentials directly in your code:
Don’t hardcode credentials in production code. Use environment variables or a secrets management system instead.
DuckDB connections can be in-memory or persistent. Here’s how both options work.
For a persistent database, provide the path:
Use persistent databases when you want your data to survive between sessions. In-memory databases are faster but lose data when your program exits.
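The difference is easy to see with Python's built-in `sqlite3` module, used here purely to illustrate the in-memory versus file-backed trade-off (DuckDB behaves the same way with respect to persistence):

```python
import os
import sqlite3
import tempfile

# In-memory: the table vanishes when the connection closes.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE t (x INTEGER)")
mem.execute("INSERT INTO t VALUES (1)")
mem.close()  # the data is gone now

# File-backed: the table survives closing and reopening the connection.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE t (x INTEGER)")
disk.execute("INSERT INTO t VALUES (1)")
disk.commit()
disk.close()

reopened = sqlite3.connect(path)
persisted_rows = reopened.execute("SELECT COUNT(*) FROM t").fetchone()[0]
reopened.close()
print(persisted_rows)  # 1
```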
Snowflake connections require your account credentials and resource identifiers.
Trino connections let you query federated data sources across your infrastructure.
You can verify your installation by running a simple query. This example loads the iris dataset and filters it.
This query uses the embedded backend, so you don’t need any additional setup. It’s a good way to confirm everything is working before connecting to external databases.
If everything works correctly, you'll see output showing the aggregated sepal widths grouped by species. The result is a PyArrow Table with two columns: species (Versicolor, Setosa, Virginica) and the summed sepal widths for each species. This confirms Xorq can load data, apply transformations, and run queries on your system.
Now that you’ve installed Xorq and connected to a backend, explore these tutorials: