# Frequently Asked Questions (FAQ)

## General & Getting Started

### What is the CrowdCent Challenge?
The CrowdCent Challenge is a series of open data science competitions (challenges) focused on predicting investment/market outcomes. Participants build machine learning models that predict future returns over multiple time horizons. Submissions are aggregated into meta models that can be turned into investable portfolios.
### How do I get started?
Refer to the Installation & Quick Start Guide for detailed steps.

- **Install the client library:** We recommend using uv: `uv pip install crowdcent-challenge`. Alternatively, use `pip install crowdcent-challenge`.
- **Get an API Key:** Visit your CrowdCent profile page and generate a new key. Save it securely.
- **Set up Authentication:** Provide your API key either when initializing the Python `ChallengeClient`, by setting the `CROWDCENT_API_KEY` environment variable, or by placing it in a `.env` file (`CROWDCENT_API_KEY=your_key_here`) in your project directory.
- **Explore:** Use the Python client or the `crowdcent` CLI to list available challenges (`crowdcent list-challenges`).
- **Download Data:** Choose a challenge and download the training and inference data using the client or CLI (e.g., `crowdcent download-training-data <challenge_slug> latest`).
- **Build & Submit:** Train your model and submit your predictions in the required format; see the sketch after this list.
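A minimal end-to-end sketch in Python, assuming the package imports as `crowdcent_challenge` and that the client takes a challenge slug; the download method names here are hypothetical, so check the client reference for exact signatures (only `submit_predictions` is quoted verbatim elsewhere in this FAQ):

```python
from crowdcent_challenge import ChallengeClient  # import path assumed from the package name

# api_key can be omitted if CROWDCENT_API_KEY is set in the environment or a .env file.
client = ChallengeClient("some-challenge-slug", api_key="your_key_here")  # slug argument assumed

# Hypothetical download methods; the CLI equivalent shown above is
# `crowdcent download-training-data <challenge_slug> latest`.
client.download_training_dataset("latest", "training_data.parquet")
client.download_inference_data("current", "inference_data.parquet")

# ... train your model and write predictions to predictions.parquet ...

client.submit_predictions("predictions.parquet")  # documented in the Submissions section
```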
### Who can participate?
The challenge is open to anyone interested in data science, machine learning, and finance. Check the terms of service for more details.
## API Key & Authentication

### How do I get an API Key?
Go to your profile page on the CrowdCent website after logging in. Click the "Generate New Key" button.
### How do I use my API Key?
The `crowdcent-challenge` library (both Python client and CLI) automatically looks for your API key in the following order:

- Passed directly to the `ChallengeClient` initializer (`api_key=...`).
- The `CROWDCENT_API_KEY` environment variable.
- A `.env` file in your current working directory containing `CROWDCENT_API_KEY=your_key_here`.
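A short sketch of the three options; the import path and the positional challenge slug are assumptions, and only the `api_key=...` keyword is confirmed by the list above:

```python
from crowdcent_challenge import ChallengeClient  # import path assumed

# Option 1: pass the key explicitly.
client = ChallengeClient("some-challenge-slug", api_key="your_key_here")  # slug argument assumed

# Option 2: rely on the environment variable, e.g. after
#   export CROWDCENT_API_KEY=your_key_here
client = ChallengeClient("some-challenge-slug")

# Option 3: place CROWDCENT_API_KEY=your_key_here in a .env file in the
# current working directory; the client picks it up automatically.
```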
### What if my API key doesn't work?
Ensure you copied the key correctly and included the `ApiKey` prefix if using tools like Swagger UI directly (the client library handles this automatically). Verify the key hasn't been revoked on your profile page. If issues persist, generate a new key or contact support.
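If you do call the API directly (e.g., with `requests`), the header looks like the following; the endpoint URL here is an assumed placeholder, shown only to illustrate the `ApiKey` prefix:

```python
import requests

# The client library builds this header for you; the URL below is illustrative only.
response = requests.get(
    "https://crowdcent.com/api/challenges/",
    headers={"Authorization": "ApiKey your_key_here"},  # note the "ApiKey " prefix
)
response.raise_for_status()
```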
## Python Client vs. CLI

### What's the difference between the `ChallengeClient` (Python) and the `crowdcent` CLI?
They both interact with the same CrowdCent API.

- **`ChallengeClient` (Python):** Designed for programmatic use within your modeling scripts or notebooks. Ideal for automating data downloads, processing, and prediction uploads.
- **`crowdcent` (CLI):** A command-line tool for manual operations, like listing challenges, downloading specific data files, or checking submission status, directly from your terminal.
### Which one should I use?

Use the `ChallengeClient` within your Python code for automation and integration with your modeling workflow. Use the `crowdcent` CLI for quick checks, manual downloads, or exploring the available challenges and data without writing Python code.
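For example, checking submissions both ways (both forms are documented in the Submissions section below; the import path and slug argument are assumptions):

```python
# CLI: a quick manual check from the terminal:
#   crowdcent list-submissions <challenge_slug> --period current

# Python: the same check inside an automated workflow.
from crowdcent_challenge import ChallengeClient  # import path assumed

client = ChallengeClient("some-challenge-slug", api_key="your_key_here")  # slug argument assumed
for submission in client.list_submissions(period="current"):
    print(submission)
```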
## Data

### What format is the data provided in?
All datasets (training data, inference features, meta models) and submission files are in the Apache Parquet (`.parquet`) format. This columnar format is efficient for the type of data used in the challenge.
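To inspect a downloaded file, any Parquet-capable library works; a pandas sketch (polars or pyarrow are fine alternatives, and the path is illustrative):

```python
import pandas as pd

df = pd.read_parquet("training_data.parquet")  # path illustrative
print(df.shape)   # rows x columns
print(df.dtypes)  # column types
print(df.head())  # first few rows
```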
### What kind of features are included?

- **Important Note:** Some challenges or training datasets might only provide target labels and identifiers (`id`, `date`, `target_10d`, `target_30d`). In these cases, participants are expected to source or engineer their own relevant features.
- Features are often obfuscated. Refer to the Data page and specific challenge rules for details on the data provided for each competition.
### What am I predicting?

Your goal is to predict relative performance metrics (e.g., returns) for different future time horizons based on the provided features. The required prediction columns vary by challenge, but may look like `pred_10d` and `pred_30d`. Always check the specific challenge rules for the exact requirements.
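A sketch of assembling the prediction columns; the `pred_10d`/`pred_30d` names follow the example above, the file paths are illustrative, and the random values stand in for your model's outputs:

```python
import numpy as np
import pandas as pd

inference = pd.read_parquet("inference_data.parquet")  # path illustrative

rng = np.random.default_rng(0)
predictions = pd.DataFrame({
    "id": inference["id"],
    "pred_10d": rng.uniform(size=len(inference)),  # replace with your model's output
    "pred_30d": rng.uniform(size=len(inference)),  # replace with your model's output
})
predictions.to_parquet("predictions.parquet", index=False)
```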
### What's the difference between Training and Inference data?

- **Training Data:** Contains historical data, including target variables (e.g., `target_1M`, `target_3M`, ...) and identifiers (`id`, dates). It may also contain pre-computed features, but sometimes you will need to generate your own features based on the provided IDs and timestamps. Used to train your models. Training datasets are versioned.
- **Inference Data:** Contains features (if provided by the challenge) and identifiers (`id`) for a new period, but without the target labels. This is the data you use (along with any features you generate) to make predictions for submission. Inference data is released periodically.
### What is the 'Meta Model'?
The meta model typically represents an aggregation (e.g., an average or ensemble) of all valid user submissions for past inference periods within a specific challenge. It can serve as a benchmark or potentially as an additional feature for your own models. You can download it via the client or CLI.
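A sketch of pulling it down in Python; this FAQ only says the meta model is downloadable "via the client or CLI", so the method name below is hypothetical, and the import path and slug argument are assumptions:

```python
import pandas as pd
from crowdcent_challenge import ChallengeClient  # import path assumed

client = ChallengeClient("some-challenge-slug", api_key="your_key_here")  # slug argument assumed
client.download_meta_model("meta_model.parquet")  # hypothetical method name

meta = pd.read_parquet("meta_model.parquet")  # Parquet, per the Data section above
print(meta.head())
```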
## Submissions

### What format does my submission file need to be?

Your submission must be a Parquet file containing an `id` column that matches the IDs from the corresponding inference dataset, plus all the required prediction columns (e.g., `pred_1M`, `pred_3M`, etc.). All prediction columns must contain numeric values, and no missing values are allowed.
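A quick pre-submission sanity check along these lines (file paths and the exact prediction columns are illustrative; match them to your challenge's rules):

```python
import pandas as pd

sub = pd.read_parquet("predictions.parquet")
inference = pd.read_parquet("inference_data.parquet")
pred_cols = ["pred_1M", "pred_3M"]  # use your challenge's required columns

assert set(["id"] + pred_cols) <= set(sub.columns), "missing required columns"
assert set(sub["id"]) == set(inference["id"]), "ids must match the inference dataset"
assert not sub[pred_cols].isna().any().any(), "missing values are not allowed"
assert all(pd.api.types.is_numeric_dtype(sub[c]) for c in pred_cols), "predictions must be numeric"
```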
### How do I submit my predictions?

- **Python:** Use `client.submit_predictions("path/to/your/predictions.parquet")`.
- **CLI:** Use `crowdcent submit <challenge_slug> path/to/your/predictions.parquet`.

Submissions are automatically associated with the currently active inference period for the specified challenge.
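In context (the `submit_predictions` call is quoted above; the import path and slug argument are assumptions):

```python
from crowdcent_challenge import ChallengeClient  # import path assumed

client = ChallengeClient("some-challenge-slug", api_key="your_key_here")  # slug argument assumed
client.submit_predictions("path/to/your/predictions.parquet")
# The submission attaches to the currently active inference period automatically.
```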
### How often can I submit?
You can typically submit multiple times for an active inference period. Your latest valid submission before the deadline is the one that counts for scoring. Check the specific challenge rules.
### How do I check my submission status?

- **Python:** Use `client.list_submissions()` (optionally filter by `period='current'` or `period='YYYY-MM-DD'`) and `client.get_submission(<submission_id>)`.
- **CLI:** Use `crowdcent list-submissions <challenge_slug>` (optionally filter with `--period current` or `--period YYYY-MM-DD`) and `crowdcent get-submission <challenge_slug> <submission_id>`.

Statuses include "pending", "processing", "evaluated" (or "scored"), and "error" (or "failed").
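A polling sketch built on the calls above; the response shape (a list of dicts with `id` and `status` fields) is an assumption, so adapt it to what the client actually returns:

```python
import time
from crowdcent_challenge import ChallengeClient  # import path assumed

client = ChallengeClient("some-challenge-slug", api_key="your_key_here")  # slug argument assumed

submissions = client.list_submissions(period="current")
latest_id = submissions[-1]["id"]  # response shape assumed

while True:
    status = client.get_submission(latest_id)["status"]  # "status" field assumed
    if status not in ("pending", "processing"):
        print(f"final status: {status}")
        break
    time.sleep(30)  # wait before polling again
```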
## Environment & Troubleshooting

### What version of Python should I use?

The client library requires Python 3.10 or higher (as specified in `pyproject.toml`).
### `uv` or `pip`?

We recommend `uv` for faster dependency management (`uv pip install ...`). However, standard `pip install ...` also works.
### Where can I get help?
- Discord: Join the CrowdCent Discord server.
- Email: Contact info@crowdcent.com.
## Contributing

### How can I contribute?

Contributions to the `crowdcent-challenge` client library and its documentation are welcome! Please see the Contributing Guidelines for details on the standard GitHub workflow (fork, branch, commit, PR).