Compare model performance

Run scikit-learn operations directly on database data and compare multiple models to find the best classifier

This tutorial shows you how to compare multiple machine learning models to find the best performer. You’ll learn how to evaluate different classifiers systematically using Xorq’s ML workflows, then select the best model based on accuracy scores.

You’ll train four classifiers (k-nearest neighbors, linear SVM, decision tree, random forest) on the same dataset and compare their performance. Xorq lets you run these scikit-learn operations directly on data from databases (PostgreSQL, Snowflake, etc.) without loading data into memory first. This tutorial uses synthetic data for demonstration, but the same approach works with database tables.

After completing this tutorial, you’ll know how to run experiments that compare model performance and select the best approach for your data.

Prerequisites

You need:

  • A working Python installation
  • The xorq, scikit-learn, pandas, and numpy packages (everything imported in the code blocks below)

Why compare models?

Different algorithms perform differently on different datasets. You compare multiple classifiers to find which one gives the best accuracy for your specific data.

This tutorial trains four models: k-nearest neighbors, linear SVM, decision tree, and random forest. It then compares their accuracy scores.
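A single train/test split can be a noisy estimate of accuracy. As a quick sanity check outside of Xorq, scikit-learn's cross_val_score averages over several splits; this is a plain-scikit-learn sketch on the same kind of data used in this tutorial, not part of the Xorq workflow below:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 5-fold cross-validation: five train/test splits instead of one
X, y = make_moons(noise=0.3, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(f"Mean accuracy over 5 folds: {scores.mean():.2%}")
```

The same idea applies to any of the classifiers compared later: a higher mean over several folds is stronger evidence than a single-split win.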

How to follow along

This tutorial builds code incrementally. Each section provides a code block; run the blocks in order.

Recommended approach: Open a terminal, run python to start an interactive Python shell, then copy and paste each code block in order.

Alternative approaches:

  • Jupyter notebook: Create a new notebook and run each code block in a separate cell

  • Python script: Combine all code blocks into a single .py file and run it

The code blocks build on each other. Variables like X_train, train, test, and features are created in earlier blocks and used in later ones.

Create synthetic data

Start by generating a classification dataset. You’ll use the “moons” dataset, which has two interleaving half-circles:

import xorq.api as xo
import pandas as pd
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split


X, y = make_moons(noise=0.3, random_state=0)


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)


print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Features: {X.shape[1]}")
1. Generate a “moons” dataset with 100 samples and some noise.
2. Split into train (60%) and test (40%) sets.
3. Check the sizes.

The output shows:

Training samples: 60
Test samples: 40
Features: 2

This synthetic data has two classes that aren’t linearly separable. It’s perfect for comparing how different classifiers handle non-linear boundaries.

Understanding this data helps you interpret results later. The moons shape means linear models struggle while non-linear models perform better.
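If you want to confirm the shape of this data yourself, a quick numpy check (independent of the rest of the tutorial) shows the two classes are balanced:

```python
import numpy as np
from sklearn.datasets import make_moons

X, y = make_moons(noise=0.3, random_state=0)  # 100 samples by default
print(np.bincount(y))  # samples per class → [50 50]
print(X.shape)         # (n_samples, n_features) → (100, 2)
```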

Convert to Xorq tables

Convert the NumPy arrays into Xorq table expressions:


def make_xorq_tables(X_train, y_train, X_test, y_test):
    con = xo.connect()
    
    # Create training table
    train = con.register(
        pd.DataFrame(X_train, columns=["feature_0", "feature_1"])
        .assign(target=y_train),
        "train"
    )
    
    # Create test table
    test = con.register(
        pd.DataFrame(X_test, columns=["feature_0", "feature_1"])
        .assign(target=y_test),
        "test"
    )
    
    features = ["feature_0", "feature_1"]
    return train, test, features


train, test, features = make_xorq_tables(X_train, y_train, X_test, y_test)


print(f"\nXorq tables created")
print(f"Train columns: {train.columns}")
print(f"Features: {features}")
1. Create a helper function that converts arrays to Xorq tables using con.register.
2. Convert your train/test data to Xorq expressions.
3. Verify the tables.

The output shows:

Xorq tables created
Train columns: ('feature_0', 'feature_1', 'target')
Features: ['feature_0', 'feature_1']

What just happened? You registered pandas DataFrames as tables in Xorq. Now you can use these tables with Xorq’s deferred execution patterns.
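The DataFrame construction inside make_xorq_tables is plain pandas, so you can verify the column layout on its own with no Xorq involved (the toy values here are arbitrary):

```python
import numpy as np
import pandas as pd

X_demo = np.array([[0.1, 0.2], [0.3, 0.4]])
y_demo = np.array([0, 1])

# Same pattern as the helper: features as columns, then .assign adds the target
df = pd.DataFrame(X_demo, columns=["feature_0", "feature_1"]).assign(target=y_demo)
print(list(df.columns))  # ['feature_0', 'feature_1', 'target']
```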

Tip

Using database data: Instead of con.register(), you can use data directly from databases. For example, with PostgreSQL: pg = xo.postgres.connect_env() then train = pg.table("training_data"). The same scikit-learn operations work regardless of whether your data comes from a database table or an in-memory DataFrame. Xorq handles the data transfer automatically.

Train and evaluate one model

Train a single classifier and measure its accuracy. This establishes a baseline before comparing multiple models.

import sklearn.pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from xorq.expr.ml import Pipeline


sklearn_pipeline = sklearn.pipeline.Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=3))
])


xorq_pipeline = Pipeline.from_instance(sklearn_pipeline)


fitted_pipeline = xorq_pipeline.fit(
    train,
    features=features,
    target="target"
)


score = fitted_pipeline.score_expr(test)
print(f"\nK-Nearest Neighbors accuracy: {score:.2%}")
1. Create a scikit-learn pipeline with scaling and k-nearest neighbors (k=3).
2. Wrap it with Xorq’s Pipeline.from_instance().
3. Fit on the training data (deferred).
4. Score the model on test data. score_expr() executes immediately and returns the accuracy score.

Example output:

K-Nearest Neighbors accuracy: 90.00%

Note: score_expr() executes immediately and returns a float score. The fit() operation is deferred, but scoring executes right away to give you the accuracy.

Compare multiple classifiers

Compare multiple models by defining each classifier, wrapping it in a pipeline, and evaluating them:

import sklearn.pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xorq.expr.ml import Pipeline


classifiers = {
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Linear SVM": SVC(kernel="linear", C=0.025, random_state=42),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(
        max_depth=5, n_estimators=10, max_features=1, random_state=42
    ),
}


results = {}
for name, clf in classifiers.items():
    # Wrap in sklearn pipeline with scaling
    sklearn_pipe = sklearn.pipeline.Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", clf)
    ])
    
    # Convert to Xorq and fit
    xorq_pipe = Pipeline.from_instance(sklearn_pipe)
    fitted = xorq_pipe.fit(train, features=features, target="target")
    
    # Evaluate
    score = fitted.score_expr(test)
    results[name] = score
    
    print(f"{name}: {score:.2%}")


best_model = max(results, key=results.get)
best_score = results[best_model]

print(f"\nBest model: {best_model}")
print(f"Best accuracy: {best_score:.2%}")
1. Define four classifiers to compare.
2. Loop through each: wrap, fit, score.
3. Find the best performer.

Example output:

K-Nearest Neighbors: 90.00%
Linear SVM: 85.00%
Decision Tree: 87.50%
Random Forest: 92.50%

Best model: Random Forest
Best accuracy: 92.50%

You’ve compared four different classifiers and identified that Random Forest performs best on this dataset. Its non-linear decision boundary fits the moons shape better than a linear model can.
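To see the full ranking rather than just the winner, you can sort the results dict. The scores below are copied from the example output above; your numbers may differ:

```python
# Example scores from the output above
results = {
    "K-Nearest Neighbors": 0.90,
    "Linear SVM": 0.85,
    "Decision Tree": 0.875,
    "Random Forest": 0.925,
}

# Sort by score, highest first
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2%}")
```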

Verify against scikit-learn

Now you’ll verify that Xorq’s scores match scikit-learn’s scores exactly. This builds confidence that Xorq’s wrapper doesn’t change the underlying algorithms.

import numpy as np
import sklearn.pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from xorq.expr.ml import Pipeline


def verify_score(train, test, features, target, sklearn_pipeline):
    # Xorq evaluation
    xorq_pipe = Pipeline.from_instance(sklearn_pipeline)
    fitted = xorq_pipe.fit(train, features=features, target=target)
    xorq_score = fitted.score_expr(test)
    
    # sklearn evaluation
    train_df = train.execute()
    test_df = test.execute()
    sklearn_pipeline.fit(train_df[features], train_df[target])
    sklearn_score = sklearn_pipeline.score(test_df[features], test_df[target])
    
    return xorq_score, sklearn_score


sklearn_pipe = sklearn.pipeline.Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5))
])

xorq_score, sklearn_score = verify_score(
    train, test, features, "target", sklearn_pipe
)


print(f"\nVerification:")
print(f"Xorq score: {xorq_score:.4f}")
print(f"sklearn score: {sklearn_score:.4f}")
print(f"Match: {np.isclose(xorq_score, sklearn_score)}")
1. Create a helper that evaluates with both Xorq and scikit-learn.
2. Test with a k-nearest neighbors classifier.
3. Verify the scores match.

Example output:

Verification:
Xorq score: 0.9000
sklearn score: 0.9000
Match: True

This confirms that Xorq produces identical results to scikit-learn. The only difference is deferred execution and caching. The algorithms themselves are unchanged.
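np.isclose is used instead of == because both scores are floats; by default it allows a relative tolerance of 1e-05 and an absolute tolerance of 1e-08, so tiny floating-point differences still count as a match:

```python
import numpy as np

# Default tolerances: rtol=1e-05, atol=1e-08
print(np.isclose(0.90, 0.90 + 1e-9))  # True: difference is within tolerance
print(np.isclose(0.90, 0.91))         # False: a real difference in accuracy
```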

Complete example

Here’s the full workflow in one place:

import xorq.api as xo
import pandas as pd
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import sklearn.pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xorq.expr.ml import Pipeline

# Generate synthetic data
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Convert to Xorq tables
def make_xorq_tables(X_train, y_train, X_test, y_test):
    con = xo.connect()
    train = con.register(
        pd.DataFrame(X_train, columns=["feature_0", "feature_1"])
        .assign(target=y_train),
        "train"
    )
    test = con.register(
        pd.DataFrame(X_test, columns=["feature_0", "feature_1"])
        .assign(target=y_test),
        "test"
    )
    features = ["feature_0", "feature_1"]
    return train, test, features

train, test, features = make_xorq_tables(X_train, y_train, X_test, y_test)

# Define classifiers
classifiers = {
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Linear SVM": SVC(kernel="linear", C=0.025, random_state=42),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(
        max_depth=5, n_estimators=10, max_features=1, random_state=42
    ),
}

# Evaluate each classifier
results = {}
for name, clf in classifiers.items():
    sklearn_pipe = sklearn.pipeline.Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", clf)
    ])
    xorq_pipe = Pipeline.from_instance(sklearn_pipe)
    fitted = xorq_pipe.fit(train, features=features, target="target")
    score = fitted.score_expr(test)
    results[name] = score
    print(f"{name}: {score:.2%}")

# Select best model
best_model = max(results, key=results.get)
print(f"\nBest model: {best_model} ({results[best_model]:.2%})")

The pattern is consistent: wrap, fit, score, compare.

What you learned

You’ve learned how to evaluate multiple models systematically. Here’s what you accomplished:

  • Created synthetic classification data with make_moons
  • Converted NumPy arrays to Xorq table expressions
  • Trained and scored individual classifiers
  • Compared multiple models to find the best performer
  • Verified that Xorq matches scikit-learn exactly

The same approach works with data from databases (PostgreSQL, Snowflake, etc.). Instead of loading data into memory, you can run scikit-learn operations directly on database tables. Model comparison is systematic with Xorq: define your candidates, evaluate them all, pick the winner. The deferred execution pattern works across any scikit-learn estimator.

Next steps

Now that you know how to compare models, continue learning: