Compare model performance
This tutorial shows you how to compare multiple machine learning models to find the best performer. You’ll learn how to evaluate different classifiers systematically using Xorq’s ML workflows, then select the best model based on accuracy scores.
You’ll train four classifiers (k-nearest neighbors, linear SVM, decision tree, random forest) on the same dataset and compare their performance. Xorq lets you run these scikit-learn operations directly on data from databases (PostgreSQL, Snowflake, etc.) without loading data into memory first. This tutorial uses synthetic data for demonstration, but the same approach works with database tables.
After completing this tutorial, you’ll know how to run experiments that compare model performance and select the best approach for your data.
Prerequisites
You need:
- Xorq installed (see Install Xorq)
- Basic familiarity with scikit-learn classifiers
- Understanding of train/test splits
Why compare models?
Different algorithms perform differently on different datasets. You compare multiple classifiers to find which one gives the best accuracy for your specific data.
This tutorial trains four models: k-nearest neighbors, linear SVM, decision tree, and random forest. It then compares their accuracy scores.
How to follow along
This tutorial builds code incrementally: each section provides a code block, and you run the blocks in order.
Recommended approach: Open a terminal, run python to start an interactive Python shell, then copy and paste each code block in order.
Alternative approaches:
- Jupyter notebook: Create a new notebook and run each code block in a separate cell
- Python script: Combine all code blocks into a single .py file and run it
The code blocks build on each other. Variables like X_train, train, test, and features are created in earlier blocks and used in later ones.
Create synthetic data
Start by generating a classification dataset. You’ll use the “moons” dataset, which has two interleaving half-circles:
```python
import xorq.api as xo
import pandas as pd
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(noise=0.3, random_state=0)  # 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)  # 2
print(f"Training samples: {len(X_train)}")  # 3
print(f"Test samples: {len(X_test)}")
print(f"Features: {X.shape[1]}")
```

1. Generate a "moons" dataset with 100 samples (the make_moons default) and some noise.
2. Split into train (60%) and test (40%) sets.
3. Check the sizes.

The output shows:

```
Training samples: 60
Test samples: 40
Features: 2
```
This synthetic data has two classes that aren’t linearly separable. It’s perfect for comparing how different classifiers handle non-linear boundaries.
Understanding this data helps you interpret results later. The moons shape means linear models struggle while non-linear models perform better.
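To see this effect directly, here's a quick sanity check (a sketch using plain scikit-learn, independent of Xorq) that fits a linear model and a non-linear model on the same moons data. LogisticRegression stands in here as an illustrative linear baseline; it isn't one of the four classifiers compared later:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Same data-generation settings as above
X, y = make_moons(noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=42)

# A linear model can only draw a straight decision boundary
linear_score = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)

# k-NN can follow the curved class boundary
knn_score = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).score(X_te, y_te)

print(f"Linear model: {linear_score:.2%}")
print(f"k-NN: {knn_score:.2%}")
```

With this noise level the gap may be modest, but the non-linear model typically tracks the curved boundary more closely.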
Convert to Xorq tables
Convert the NumPy arrays into Xorq table expressions:
```python
def make_xorq_tables(X_train, y_train, X_test, y_test):  # 1
    con = xo.connect()
    # Create training table
    train = con.register(
        pd.DataFrame(X_train, columns=["feature_0", "feature_1"])
        .assign(target=y_train),
        "train"
    )
    # Create test table
    test = con.register(
        pd.DataFrame(X_test, columns=["feature_0", "feature_1"])
        .assign(target=y_test),
        "test"
    )
    features = ["feature_0", "feature_1"]
    return train, test, features

train, test, features = make_xorq_tables(X_train, y_train, X_test, y_test)  # 2

print("\nXorq tables created")  # 3
print(f"Train columns: {train.columns}")
print(f"Features: {features}")
```

1. Create a helper function that converts arrays to Xorq tables using con.register.
2. Convert your train/test data to Xorq expressions.
3. Verify the tables.

The output shows:

```
Xorq tables created
Train columns: ('feature_0', 'feature_1', 'target')
Features: ['feature_0', 'feature_1']
```
What just happened? You registered pandas DataFrames as tables in Xorq. Now you can use these tables with Xorq’s deferred execution patterns.
Using database data: Instead of con.register(), you can use data directly from databases. For example, with PostgreSQL: pg = xo.postgres.connect_env() then train = pg.table("training_data"). The same scikit-learn operations work regardless of whether your data comes from a database table or an in-memory DataFrame. Xorq handles the data transfer automatically.
Train and evaluate one model
Train a single classifier and measure its accuracy. This establishes a baseline before comparing multiple models.
```python
import sklearn.pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from xorq.expr.ml import Pipeline

sklearn_pipeline = sklearn.pipeline.Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=3))
])  # 1

xorq_pipeline = Pipeline.from_instance(sklearn_pipeline)  # 2

fitted_pipeline = xorq_pipeline.fit(
    train,
    features=features,
    target="target"
)  # 3

score = fitted_pipeline.score_expr(test)  # 4
print(f"\nK-Nearest Neighbors accuracy: {score:.2%}")
```

1. Create a scikit-learn pipeline with scaling and k-nearest neighbors (k=3).
2. Wrap it with Xorq's Pipeline.from_instance().
3. Fit on the training data (deferred).
4. Score the model on test data. score_expr() executes immediately and returns the accuracy score.

Example output:

```
K-Nearest Neighbors accuracy: 90.00%
```
Note: score_expr() executes immediately and returns a float score. The fit() operation is deferred, but scoring executes right away to give you the accuracy.
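For reference, the accuracy that score_expr() reports is the standard classification metric: the fraction of test rows predicted correctly. A small sklearn-only sketch of that equivalence, using the same pipeline and data settings as above:

```python
import sklearn.pipeline
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=42)

pipe = sklearn.pipeline.Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
pipe.fit(X_tr, y_tr)

# For classifiers, Pipeline.score is accuracy: mean of (prediction == label)
assert pipe.score(X_te, y_te) == accuracy_score(y_te, pipe.predict(X_te))
```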
Compare multiple classifiers
Compare multiple models by defining each classifier, wrapping it in a pipeline, and evaluating them:
```python
import sklearn.pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xorq.expr.ml import Pipeline

classifiers = {  # 1
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Linear SVM": SVC(kernel="linear", C=0.025, random_state=42),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(
        max_depth=5, n_estimators=10, max_features=1, random_state=42
    ),
}

results = {}
for name, clf in classifiers.items():  # 2
    # Wrap in sklearn pipeline with scaling
    sklearn_pipe = sklearn.pipeline.Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", clf)
    ])
    # Convert to Xorq and fit
    xorq_pipe = Pipeline.from_instance(sklearn_pipe)
    fitted = xorq_pipe.fit(train, features=features, target="target")
    # Evaluate
    score = fitted.score_expr(test)
    results[name] = score
    print(f"{name}: {score:.2%}")

best_model = max(results, key=results.get)  # 3
best_score = results[best_model]
print(f"\nBest model: {best_model}")
print(f"Best accuracy: {best_score:.2%}")
```

1. Define four classifiers to compare.
2. Loop through each: wrap, fit, score.
3. Find the best performer.

Example output:

```
K-Nearest Neighbors: 90.00%
Linear SVM: 85.00%
Decision Tree: 87.50%
Random Forest: 92.50%

Best model: Random Forest
Best accuracy: 92.50%
```
You’ve compared four different classifiers and identified that Random Forest performs best on this dataset. The non-linear decision boundary of Random Forest handles the moons shape better than linear models.
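Beyond picking the single winner, it can help to rank all candidates. A small sketch in plain Python, using illustrative hard-coded scores in place of the results dict built above:

```python
# Hypothetical scores standing in for the results dict built in the loop above
results = {
    "K-Nearest Neighbors": 0.90,
    "Linear SVM": 0.85,
    "Decision Tree": 0.875,
    "Random Forest": 0.925,
}

# Sort by accuracy, best first
leaderboard = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: {score:.2%}")

# The first entry matches max(results, key=results.get)
assert leaderboard[0][0] == max(results, key=results.get)
```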
Verify against scikit-learn
Now you’ll verify that Xorq’s scores match scikit-learn’s scores exactly. This builds confidence that Xorq’s wrapper doesn’t change the underlying algorithms.
```python
import numpy as np
import sklearn.pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from xorq.expr.ml import Pipeline

def verify_score(train, test, features, target, sklearn_pipeline):  # 1
    # Xorq evaluation
    xorq_pipe = Pipeline.from_instance(sklearn_pipeline)
    fitted = xorq_pipe.fit(train, features=features, target=target)
    xorq_score = fitted.score_expr(test)
    # sklearn evaluation
    train_df = train.execute()
    test_df = test.execute()
    sklearn_pipeline.fit(train_df[features], train_df[target])
    sklearn_score = sklearn_pipeline.score(test_df[features], test_df[target])
    return xorq_score, sklearn_score

sklearn_pipe = sklearn.pipeline.Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5))
])  # 2

xorq_score, sklearn_score = verify_score(
    train, test, features, "target", sklearn_pipe
)

print("\nVerification:")  # 3
print(f"Xorq score: {xorq_score:.4f}")
print(f"sklearn score: {sklearn_score:.4f}")
print(f"Match: {np.isclose(xorq_score, sklearn_score)}")
```

1. Create a helper that evaluates with both Xorq and scikit-learn.
2. Test with a k-nearest neighbors classifier.
3. Verify the scores match.

Example output:

```
Verification:
Xorq score: 0.9000
sklearn score: 0.9000
Match: True
```
This confirms that Xorq produces identical results to scikit-learn. The only difference is deferred execution and caching. The algorithms themselves are unchanged.
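A note on the comparison: the verification uses np.isclose rather than == because floating-point arithmetic can introduce tiny rounding differences even when two computations are mathematically identical. A minimal illustration:

```python
import numpy as np

# Classic floating-point surprise: the exact comparison fails...
print(0.1 + 0.2 == 0.3)            # False

# ...but np.isclose tolerates tiny rounding error
print(np.isclose(0.1 + 0.2, 0.3))  # True
```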
Complete example
Here’s the full workflow in one place:
```python
import xorq.api as xo
import pandas as pd
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import sklearn.pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xorq.expr.ml import Pipeline

# Generate synthetic data
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Convert to Xorq tables
def make_xorq_tables(X_train, y_train, X_test, y_test):
    con = xo.connect()
    train = con.register(
        pd.DataFrame(X_train, columns=["feature_0", "feature_1"])
        .assign(target=y_train),
        "train"
    )
    test = con.register(
        pd.DataFrame(X_test, columns=["feature_0", "feature_1"])
        .assign(target=y_test),
        "test"
    )
    features = ["feature_0", "feature_1"]
    return train, test, features

train, test, features = make_xorq_tables(X_train, y_train, X_test, y_test)

# Define classifiers
classifiers = {
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Linear SVM": SVC(kernel="linear", C=0.025, random_state=42),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(
        max_depth=5, n_estimators=10, max_features=1, random_state=42
    ),
}

# Evaluate each classifier
results = {}
for name, clf in classifiers.items():
    sklearn_pipe = sklearn.pipeline.Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", clf)
    ])
    xorq_pipe = Pipeline.from_instance(sklearn_pipe)
    fitted = xorq_pipe.fit(train, features=features, target="target")
    score = fitted.score_expr(test)
    results[name] = score
    print(f"{name}: {score:.2%}")

# Select best model
best_model = max(results, key=results.get)
print(f"\nBest model: {best_model} ({results[best_model]:.2%})")
```

The pattern is consistent: wrap, fit, score, compare.
What you learned
You’ve learned how to evaluate multiple models systematically. Here’s what you accomplished:
- Created synthetic classification data with make_moons
- Converted NumPy arrays to Xorq table expressions
- Trained and scored individual classifiers
- Compared multiple models to find the best performer
- Verified that Xorq matches scikit-learn exactly
The same approach works with data from databases (PostgreSQL, Snowflake, etc.). Instead of loading data into memory, you can run scikit-learn operations directly on database tables. Model comparison is systematic with Xorq: define your candidates, evaluate them all, pick the winner. The deferred execution pattern works across any scikit-learn estimator.
Next steps
Now that you know how to compare models, continue learning:
- Train your first model — Learn the basics of model training with Xorq
- Split data for training — Create proper train/test/validation splits