XGBoost vs LSTM with CV
Learn how to leverage the temporal structure in CrowdCent's training data to dramatically improve prediction performance using an LSTM model compared to a traditional XGBoost approach. We'll use sklego.model_selection.TimeGapSplit to set up proper time-series cross-validation with gap periods that prevent leakage, then visualize performance on several metrics over time with moving averages.
Key Insight: Sequential Feature Processing¶
CrowdCent's training and inference data contains features with a defined temporal sequence through lag windows (e.g., feature_1_lag15, feature_1_lag10, feature_1_lag5, feature_1_lag0). While traditional models like XGBoost treat these as independent features, an LSTM can process them sequentially, capturing temporal dependencies.
Model Comparison¶
- XGBoost: Treats all features as independent inputs, achieving ~0.06 average 30-day Spearman correlation
- LSTM (from Centimators): Processes features sequentially by reshaping them along the lag axis, achieving ~0.19 average 30-day Spearman correlation - over 3x improvement!
Feature Reshaping Example¶
The LSTM transforms the flat feature vector:
[feature_1_lag10, feature_1_lag5, feature_1_lag0, feature_2_lag10, feature_2_lag5, feature_2_lag0, ...]
into a sequential 2D tensor:
[[feature_1_lag10, feature_2_lag10],
[feature_1_lag5, feature_2_lag5],
[feature_1_lag0, feature_2_lag0]]
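This reshaping can be sketched with plain numpy on toy values (LSTMRegressor performs the equivalent transformation internally, driven by its lag_windows and n_features_per_timestep arguments):

```python
import numpy as np

# Toy flat row: two features, three lag windows, grouped feature-first:
# [f1_lag10, f1_lag5, f1_lag0, f2_lag10, f2_lag5, f2_lag0]
flat = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
n_features, n_timesteps = 2, 3

# Reshape to (n_features, n_timesteps), then transpose to (n_timesteps, n_features):
# each row now holds every feature at one lag, ordered oldest to newest
sequential = flat.reshape(n_features, n_timesteps).T
print(sequential.shape)  # (3, 2)
```

Row 0 is the lag10 snapshot of both features and row 2 the lag0 snapshot, which is exactly the time-major layout an LSTM consumes.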
What You'll Learn¶
- How to set up proper time-series cross-validation with gap periods to prevent leakage
- How to train both XGBoost and LSTM models on the same features for fair comparison
- How to evaluate models using CrowdCent's official scoring metrics (Spearman correlation and NDCG@40)
- How to visualize performance over time with moving averages
Key Findings¶
- 10-day predictions achieve higher raw scores than 30-day predictions (e.g., ~0.255 vs ~0.19 Spearman correlation for the LSTM)
- The LSTM significantly outperforms XGBoost in these experiments, without any hyperparameter tuning
- All results shown are out-of-sample using proper time-series cross-validation (TimeGapSplit from sklego)
Note: This comparison uses identical features for both models and no hyperparameter optimization, demonstrating the power of sequential processing for this dataset.
import crowdcent_challenge as cc
import polars as pl
from xgboost import XGBRegressor
from datetime import timedelta
from sklego.model_selection import TimeGapSplit
import altair as alt
import numpy as np
import os
from crowdcent_challenge.scoring import evaluate_hyperliquid_submission
os.environ["KERAS_BACKEND"] = "jax"
from centimators.model_estimators import LSTMRegressor
Initialize the client¶
client = cc.ChallengeClient(
    challenge_slug="hyperliquid-ranking",
)
2026-02-03 21:00:42,153 - INFO - ChallengeClient initialized for 'hyperliquid-ranking' at URL: https://crowdcent.com/api
Get CrowdCent's training data¶
client.download_training_dataset(version="latest", dest_path="training_data.parquet")
data = pl.read_parquet("training_data.parquet")
data.head()
2026-02-03 21:00:42,162 - INFO - Downloading training data v2.0 to training_data.parquet Downloading training_data.parquet: 100%|██████████| 124M/124M [00:02<00:00, 59.4MB/s] 2026-02-03 21:00:44,852 - INFO - Successfully downloaded training data v2.0 to training_data.parquet
(Output: a preview of the first five rows. Each row is one asset on one date; columns are id, eodhd_id, date, 20 base features each at 4 lag windows (_lag15, _lag10, _lag5, _lag0) for 80 feature columns, plus the targets target_10d and target_30d. All feature and target values in the preview lie in [0, 1].)
Create cross validation folds¶
cv_kwargs = {
    "val_days": 100,
    "gap_days": 30,
    "n_splits": 3,
    "cv_window_type": "rolling",
}
cv = TimeGapSplit(
    date_serie=data["date"],
    valid_duration=timedelta(days=cv_kwargs["val_days"]),
    gap_duration=timedelta(days=cv_kwargs["gap_days"]),
    n_splits=cv_kwargs["n_splits"],
    window=cv_kwargs["cv_window_type"],
)
cv.summary(data)
cv.summary(data)
| Start date | End date | Period | Unique days | nbr samples | part | fold |
|---|---|---|---|---|---|---|
| datetime[μs] | datetime[μs] | duration[μs] | i64 | i64 | str | i64 |
| 2019-07-26 00:00:00 | 2025-02-04 00:00:00 | 2020d | 2021 | 182226 | "train" | 0 |
| 2025-03-07 00:00:00 | 2025-06-14 00:00:00 | 99d | 100 | 18348 | "valid" | 0 |
| 2019-11-03 00:00:00 | 2025-05-15 00:00:00 | 2020d | 2021 | 196281 | "train" | 1 |
| 2025-06-15 00:00:00 | 2025-09-22 00:00:00 | 99d | 100 | 19601 | "valid" | 1 |
| 2020-02-11 00:00:00 | 2025-08-23 00:00:00 | 2020d | 2021 | 211223 | "train" | 2 |
| 2025-09-23 00:00:00 | 2025-12-31 00:00:00 | 99d | 100 | 20309 | "valid" | 2 |
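The gap matters because the targets are computed over future return horizons: a training row near the split boundary would otherwise share label information with validation rows. A hand-rolled sketch of one split in plain pandas (synthetic dates, not sklego's implementation) shows how the gap keeps those windows apart:

```python
import pandas as pd
from datetime import timedelta

# Hold out the last 100 days for validation and drop the 30 days before it
# from training, so targets computed over a 30-day horizon never peek into
# the validation window.
dates = pd.date_range("2024-01-01", periods=400, freq="D")
val_days, gap_days = 100, 30

val_start = dates[-val_days]
train_end = val_start - timedelta(days=gap_days)

train_dates = dates[dates < train_end]
val_dates = dates[dates >= val_start]

print((val_dates.min() - train_dates.max()).days)  # a gap of at least 30 days
```

TimeGapSplit applies the same idea per fold, rolling the train/gap/validation windows forward through time.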
Define features, targets, and lag windows¶
feature_cols = [
    col
    for col in data.columns
    if col not in ["id", "eodhd_id", "date", "target_10d", "target_30d"]
]
target_cols = ["target_10d", "target_30d"]
lag_windows = [0, 5, 10, 15]
n_features_per_timestep = len(feature_cols) // len(lag_windows)
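As a quick sanity check (reconstructing the column-name pattern from the data preview), 20 base features across 4 lag windows should yield 80 feature columns, i.e. 20 per timestep:

```python
lag_windows = [0, 5, 10, 15]
base_features = [f"feature_{i}" for i in range(1, 21)]

# One column per (base feature, lag window) pair
feature_cols = [f"{f}_lag{lag}" for f in base_features for lag in lag_windows]

n_features_per_timestep = len(feature_cols) // len(lag_windows)
print(len(feature_cols), n_features_per_timestep)  # 80 20
```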
Train models on each train fold and predict validation¶
print("Training and evaluating XGBoost and LSTM models with detailed scoring...")
fold_data = []
models = [
    XGBRegressor(n_estimators=2000, device="cuda"),
    LSTMRegressor(
        output_units=2,
        lag_windows=lag_windows,
        n_features_per_timestep=n_features_per_timestep,
    ),
]
for fold, (train_idx, val_idx) in enumerate(cv.split(data)):
    print(f"\nFold {fold + 1}/{cv.n_splits}")

    # Get train and validation data
    train_data = data[train_idx]
    val_data = data[val_idx]
    print(f"  Train dates: {train_data['date'].min()} to {train_data['date'].max()}")
    print(f"  Val dates: {val_data['date'].min()} to {val_data['date'].max()}")
    print(f"  Train samples: {len(train_data)}, Val samples: {len(val_data)}")

    # Train each model on the fold
    for model in models:
        fit_kwargs = {}
        if isinstance(model, LSTMRegressor):
            fit_kwargs["epochs"] = 5
            fit_kwargs["validation_data"] = (
                val_data[feature_cols],
                val_data[target_cols],
            )
        model.fit(
            train_data[feature_cols],
            train_data[target_cols],
            **fit_kwargs,
        )

        # Make predictions
        preds = pl.from_numpy(
            np.asarray(model.predict(val_data[feature_cols])),
            schema={"pred_10d": pl.Float64, "pred_30d": pl.Float64},
        )
        preds = preds.with_columns(
            pl.lit(fold).alias("fold"), pl.lit(model.__class__.__name__).alias("model")
        )
        fold_data.append(val_data.with_columns(preds))
Training and evaluating XGBoost model with detailed scoring... Fold 1/3 Train dates: 2019-07-26 00:00:00 to 2025-02-04 00:00:00 Val dates: 2025-03-07 00:00:00 to 2025-06-14 00:00:00 Train samples: 182226, Val samples: 18348
/home/exx/projects/crowdcent-challenge/.venv/lib/python3.12/site-packages/xgboost/core.py:774: UserWarning: [21:01:01] WARNING: /workspace/src/common/error_msg.cc:62: Falling back to prediction using DMatrix due to mismatched devices. This might lead to higher memory usage and slower performance. XGBoost is running on: cuda:0, while the input data is on: cpu. Potential solutions: - Use a data structure that matches the device ordinal in the booster. - Set the device for booster before call to inplace_predict. This warning will only be shown once. return func(**kwargs) INFO:2026-02-03 21:01:01,218:jax._src.xla_bridge:834: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory 2026-02-03 21:01:01,218 - INFO - Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory WARNING:2026-02-03 21:01:01,219:jax._src.xla_bridge:876: An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu. 2026-02-03 21:01:01,219 - WARNING - An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
Epoch 1/5 5695/5695 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 0.9988 - mse: 0.9988 - val_loss: 1.0020 - val_mse: 1.0020 Epoch 2/5 5695/5695 ━━━━━━━━━━━━━━━━━━━━ 9s 2ms/step - loss: 0.9965 - mse: 0.9965 - val_loss: 0.9976 - val_mse: 0.9976 Epoch 3/5 5695/5695 ━━━━━━━━━━━━━━━━━━━━ 10s 2ms/step - loss: 0.9958 - mse: 0.9958 - val_loss: 0.9988 - val_mse: 0.9988 Epoch 4/5 5695/5695 ━━━━━━━━━━━━━━━━━━━━ 10s 2ms/step - loss: 0.9945 - mse: 0.9945 - val_loss: 0.9959 - val_mse: 0.9959 Epoch 5/5 5695/5695 ━━━━━━━━━━━━━━━━━━━━ 10s 2ms/step - loss: 0.9933 - mse: 0.9933 - val_loss: 1.0026 - val_mse: 1.0026 36/36 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step Fold 2/3 Train dates: 2019-11-03 00:00:00 to 2025-05-15 00:00:00 Val dates: 2025-06-15 00:00:00 to 2025-09-22 00:00:00 Train samples: 196281, Val samples: 19601 Epoch 1/5 6134/6134 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - loss: 0.9928 - mse: 0.9928 - val_loss: 1.0110 - val_mse: 1.0110 Epoch 2/5 6134/6134 ━━━━━━━━━━━━━━━━━━━━ 9s 2ms/step - loss: 0.9923 - mse: 0.9923 - val_loss: 1.0119 - val_mse: 1.0119 Epoch 3/5 6134/6134 ━━━━━━━━━━━━━━━━━━━━ 11s 2ms/step - loss: 0.9919 - mse: 0.9919 - val_loss: 1.0109 - val_mse: 1.0109 Epoch 4/5 6134/6134 ━━━━━━━━━━━━━━━━━━━━ 11s 2ms/step - loss: 0.9917 - mse: 0.9917 - val_loss: 1.0095 - val_mse: 1.0095 Epoch 5/5 6134/6134 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - loss: 0.9915 - mse: 0.9915 - val_loss: 1.0165 - val_mse: 1.0165 39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step Fold 3/3 Train dates: 2020-02-11 00:00:00 to 2025-08-23 00:00:00 Val dates: 2025-09-23 00:00:00 to 2025-12-31 00:00:00 Train samples: 211223, Val samples: 20309 Epoch 1/5 6601/6601 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - loss: 0.9900 - mse: 0.9900 - val_loss: 0.9999 - val_mse: 0.9999 Epoch 2/5 6601/6601 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - loss: 0.9901 - mse: 0.9901 - val_loss: 0.9919 - val_mse: 0.9919 Epoch 3/5 6601/6601 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - loss: 0.9897 - mse: 0.9897 - val_loss: 0.9919 - val_mse: 0.9919 Epoch 4/5 6601/6601 ━━━━━━━━━━━━━━━━━━━━ 
12s 2ms/step - loss: 0.9895 - mse: 0.9895 - val_loss: 0.9932 - val_mse: 0.9932 Epoch 5/5 6601/6601 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - loss: 0.9890 - mse: 0.9890 - val_loss: 0.9922 - val_mse: 0.9922 40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
combined_preds = pl.concat(fold_data)
combined_preds.head()
(Output: a preview of the first five rows of combined_preds — the original validation rows from every fold, with four columns appended: pred_10d, pred_30d, fold, and model.)
Calculate scores¶
def calculate_daily_scores(predictions_df: pl.DataFrame) -> pl.DataFrame:
    """Calculate daily scores for predictions using hyperliquid evaluation metrics.

    Args:
        predictions_df: DataFrame containing predictions and targets

    Returns:
        DataFrame with daily evaluation metrics
    """
    columns = ["pred_10d", "target_10d", "pred_30d", "target_30d"]

    # Define the struct schema for the scoring output
    score_schema = pl.Struct(
        {
            "spearman_10d": pl.Float64,
            "spearman_30d": pl.Float64,
            "ndcg@40_10d": pl.Float64,
            "ndcg@40_30d": pl.Float64,
        }
    )

    return (
        predictions_df.group_by(["date", "model"])
        .agg([pl.col(col) for col in columns])
        .with_columns(
            pl.struct(columns)
            .map_elements(
                lambda x: evaluate_hyperliquid_submission(
                    y_true_10d=x["target_10d"],
                    y_pred_10d=x["pred_10d"],
                    y_true_30d=x["target_30d"],
                    y_pred_30d=x["pred_30d"],
                ),
                return_dtype=score_schema,
            )
            .alias("daily_scores")
        )
        .unnest("daily_scores")
    )
daily_scores = calculate_daily_scores(combined_preds)
daily_scores
| date | model | pred_10d | target_10d | pred_30d | target_30d | spearman_10d | spearman_30d | ndcg@40_10d | ndcg@40_30d |
|---|---|---|---|---|---|---|---|---|---|
| datetime[μs] | str | list[f64] | list[f64] | list[f64] | list[f64] | f64 | f64 | f64 | f64 |
| 2025-04-23 00:00:00 | "LSTMRegressor" | [0.504178, 0.454119, … 0.517369] | [0.838542, 0.46875, … 0.161458] | [0.502318, 0.46069, … 0.510441] | [0.890625, 0.442708, … 0.135417] | 0.085382 | 0.005518 | 0.636939 | 0.586528 |
| 2025-05-10 00:00:00 | "XGBRegressor" | [0.482643, 0.389933, … 0.60705] | [0.923077, 0.420513, … 0.323077] | [0.436532, 0.5263, … 0.600579] | [0.969231, 0.410256, … 0.210256] | 0.023454 | -0.014625 | 0.548543 | 0.516524 |
| 2025-12-18 00:00:00 | "LSTMRegressor" | [0.40894, 0.479743, … 0.546537] | [0.976303, 0.829384, … 0.085308] | [0.41065, 0.471246, … 0.529192] | [0.668246, 0.791469, … 0.890995] | -0.130748 | -0.097971 | 0.479473 | 0.476082 |
| 2025-12-28 00:00:00 | "LSTMRegressor" | [0.434099, 0.512211, … 0.499404] | [0.028436, 0.123223, … 0.739336] | [0.426927, 0.510298, … 0.475056] | [0.075829, 0.781991, … 0.976303] | -0.042675 | 0.049757 | 0.499236 | 0.52808 |
| 2025-04-22 00:00:00 | "LSTMRegressor" | [0.501167, 0.455346, … 0.515472] | [0.82199, 0.361257, … 0.13089] | [0.49875, 0.453371, … 0.510702] | [0.879581, 0.39267, … 0.151832] | 0.115555 | -0.017872 | 0.618231 | 0.576474 |
| … | … | … | … | … | … | … | … | … | … |
| 2025-03-09 00:00:00 | "LSTMRegressor" | [0.480994, 0.50544, … 0.516951] | [0.091892, 0.481081, … 0.994595] | [0.47823, 0.494458, … 0.504098] | [0.172973, 0.189189, … 0.989189] | -0.015914 | 0.14982 | 0.51775 | 0.600634 |
| 2025-07-13 00:00:00 | "LSTMRegressor" | [0.519002, 0.471842, … 0.487213] | [0.10101, 0.328283, … 0.070707] | [0.510259, 0.476054, … 0.485446] | [0.393939, 0.494949, … 0.515152] | 0.164265 | 0.177585 | 0.646322 | 0.606326 |
| 2025-10-09 00:00:00 | "LSTMRegressor" | [0.539149, 0.470603, … 0.528085] | [0.362319, 0.31401, … 0.164251] | [0.535295, 0.454981, … 0.512345] | [0.429952, 0.487923, … 0.458937] | -0.011584 | 0.202059 | 0.573776 | 0.621784 |
| 2025-03-28 00:00:00 | "XGBRegressor" | [0.539257, 0.514973, … 0.729769] | [0.258065, 0.188172, … 0.83871] | [0.46656, 0.577899, … 0.658475] | [0.198925, 0.284946, … 0.645161] | 0.154568 | -0.014839 | 0.621086 | 0.509221 |
| 2025-03-15 00:00:00 | "LSTMRegressor" | [0.485251, 0.486415, … 0.486219] | [0.372973, 0.535135, … 0.945946] | [0.471388, 0.485451, … 0.473527] | [0.405405, 0.172973, … 0.859459] | -0.245933 | 0.19142 | 0.413213 | 0.649429 |
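For intuition, the correlation half of each daily score can be sketched with scipy on toy data (evaluate_hyperliquid_submission additionally computes NDCG@40, which emphasizes getting the top of the ranking right):

```python
import numpy as np
from scipy.stats import spearmanr

# Toy data: predictions that track the target's ranks with some noise
rng = np.random.default_rng(0)
y_true = rng.random(200)
y_pred = y_true + rng.normal(scale=0.1, size=200)

# Spearman correlation compares ranks, not raw values, so monotone
# transformations of the predictions would not change the score
corr, _ = spearmanr(y_true, y_pred)
print(round(corr, 2))  # strongly positive rank correlation
```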
from centimators.feature_transformers import MovingAverageTransformer

daily_scores = daily_scores.sort("date")
ma_transformer = MovingAverageTransformer(
    feature_names=["spearman_10d", "spearman_30d", "ndcg@40_10d", "ndcg@40_30d"],
    windows=[7, 30],
)
ma_columns = ma_transformer.fit_transform(
    daily_scores, ticker_series=daily_scores["model"]
)
daily_scores_df = daily_scores.with_columns(ma_columns)
Plot validation metrics (combining all folds)¶
def plot_metric_comparison(df, metric_name, anchor_ref=0, width=400, height=200):
    """Plot a single metric across 10d and 30d timeframes side by side with a shared y-axis.

    Args:
        df: Polars DataFrame with daily scores
        metric_name: Base metric name ('spearman' or 'ndcg@40')
        anchor_ref: y-value for the dashed reference line (0 for correlations, 0.5 for NDCG)
        width: Chart width in pixels per timeframe
        height: Chart height
    """
    # Convert to pandas once
    pdf = df.to_pandas()

    # Shared selection for zooming
    brush = alt.selection_interval(bind="scales", encodings=["x"])

    # Define timeframes
    timeframes = ["10d", "30d"]

    # Get column names for this metric
    col_10d = f"{metric_name}_10d"
    col_30d = f"{metric_name}_30d"

    # Calculate shared y-axis domain
    min_val = min(df[col_10d].min(), df[col_30d].min())
    max_val = max(df[col_10d].max(), df[col_30d].max())
    y_domain = [min_val * 0.95, max_val * 1.05]

    charts = []
    for timeframe in timeframes:
        col_name = f"{metric_name}_{timeframe}"

        # Calculate per-model statistics
        model_stats = (
            df.group_by("model")
            .agg(
                [
                    pl.col(col_name).mean().alias("mean"),
                    pl.col(col_name).std().alias("std"),
                ]
            )
            .with_columns((pl.col("mean") / pl.col("std")).alias("sharpe"))
        )

        # Create concise title with mean values
        mean_values = model_stats.to_pandas()
        mean_text = " | ".join(
            [f"{row['model']}: {row['mean']:.3f}" for _, row in mean_values.iterrows()]
        )
        title_text = f"{metric_name.upper()} {timeframe}\nMeans: {mean_text}"

        chart = (
            alt.Chart(pdf)
            .add_params(brush)
            .mark_point(opacity=0.6)
            .encode(
                x=alt.X("date:T", title="Date"),
                y=alt.Y(
                    f"{col_name}:Q",
                    title=f"{metric_name.upper()} {timeframe}",
                    scale=alt.Scale(domain=y_domain),
                ),
                color=alt.Color("model:N", legend=alt.Legend(symbolOpacity=1.0)),
                tooltip=[
                    "date:T",
                    f"{col_name}:Q",
                    alt.Tooltip("model:N", title="Model"),
                ],
            )
            .properties(
                width=width,
                height=height,
                title=alt.TitleParams(
                    text=title_text,
                    fontSize=10,
                    anchor="start",
                ),
            )
        )

        moving_average_chart = (
            alt.Chart(pdf)
            .mark_line(strokeWidth=2, opacity=1)
            .encode(
                x=alt.X("date:T"),
                y=alt.Y(f"{col_name}_ma30:Q", scale=alt.Scale(domain=y_domain)),
                color=alt.Color("model:N", legend=alt.Legend(symbolOpacity=1.0)),
                tooltip=[
                    "date:T",
                    f"{col_name}_ma30:Q",
                    alt.Tooltip("model:N", title="Model"),
                ],
            )
        )

        # Add per-model mean reference lines
        mean_lines = []
        for row in model_stats.iter_rows(named=True):
            model = row["model"]
            mean_val = row["mean"]
            mean_line = (
                alt.Chart(pdf[pdf["model"] == model])
                .mark_rule(strokeDash=[5, 5], opacity=1, strokeWidth=2)
                .encode(
                    y=alt.datum(mean_val),
                    color=alt.Color("model:N", legend=alt.Legend(symbolOpacity=0.7)),
                )
            )
            mean_lines.append(mean_line)

        # Add anchor reference line
        anchor_line = (
            alt.Chart(pdf)
            .mark_rule(strokeDash=[2, 2], opacity=0.7, color="gray", strokeWidth=1)
            .encode(y=alt.datum(anchor_ref))
        )

        combined_chart = chart + moving_average_chart + anchor_line
        for mean_line in mean_lines:
            combined_chart += mean_line
        charts.append(combined_chart)

    return alt.hconcat(*charts, spacing=10).resolve_scale(x="shared")
# Plot both metrics
spearman_chart = plot_metric_comparison(daily_scores_df, "spearman", anchor_ref=0)
ndcg_chart = plot_metric_comparison(daily_scores_df, "ndcg@40", anchor_ref=0.5)
# Combine vertically
alt.vconcat(spearman_chart, ndcg_chart, spacing=20)