Introduction
Whale‑E is a command-line backtesting engine designed for crypto markets. It tests and optimizes trading strategies using historical data from multiple exchanges.
Strategies are written in TOML, a declarative, human-readable format. You do not need to write code: you combine indicators and conditional blocks to define your entry and exit rules.
Instead of setting a single value for each parameter, you define ranges or sets of possible values. All of those combinations are then tested. For each run, an objective that you define computes a score; this score is used to rank the results and select the configurations that best match your performance criteria.
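The ranges-and-objective idea can be sketched in plain Python. The parameter names and the objective below are hypothetical placeholders, not Whale‑E's actual schema: the point is only that every combination is enumerated, scored, and ranked.

```python
from itertools import product

# Hypothetical parameter space: each key maps to the set of values to test.
param_space = {
    "ma_length": range(10, 51, 10),   # 10, 20, 30, 40, 50
    "rsi_threshold": [20, 25, 30],
}

def objective(params):
    # Toy score; in Whale-E this would be the backtest metric you define.
    return -abs(params["ma_length"] - 30) - abs(params["rsi_threshold"] - 25)

keys = list(param_space)
combos = [dict(zip(keys, values)) for values in product(*param_space.values())]
ranked = sorted(combos, key=objective, reverse=True)

print(len(combos))  # 5 * 3 = 15 combinations
print(ranked[0])    # best-scoring combination under this toy objective
```

Ranking by the objective is what lets you keep only the top configurations out of the full enumeration.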
Backtests are deterministic: for a given strategy and parameter set, the results are always the same. You can also export a backtest configuration as Pine Script to verify a combination outside the engine.
Grid search basics
Whale‑E uses grid search to test all the parameters that vary in a strategy. This method systematically explores every possible combination of the parameter values defined in your TOML file.
Take a moving average as an example. Instead of locking in a single type of moving average, you can test several variants, such as SMA, EMA or WMA. For each type of moving average, you then define a range of possible lengths, for example from 10 to 50 periods.
This exploration works on two levels:
- grids, which group together all the fixed “qualitative” choices, such as the moving average type, price source, timeframe or symbol
- hyperparameters, which are purely numeric settings, such as the length of an indicator or an RSI threshold, and are explored within each grid.
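The two-level structure can be sketched as follows (the specific parameters are illustrative assumptions, not Whale‑E's schema): qualitative choices multiply into a number of grids, and numeric ranges are explored within each grid, so the total is the product of the two.

```python
# Hypothetical two-level search space.
grids = {                            # "qualitative" choices
    "ma_type": ["SMA", "EMA", "WMA"],
    "timeframe": ["1h", "4h"],
}
hyperparams = {                      # numeric settings, explored within each grid
    "ma_length": range(10, 51),      # 41 values
    "rsi_threshold": range(20, 41),  # 21 values
}

n_grids = 1
for values in grids.values():
    n_grids *= len(values)           # 3 * 2 = 6 grids

n_hyper = 1
for values in hyperparams.values():
    n_hyper *= len(values)           # 41 * 21 = 861 combinations per grid

total = n_grids * n_hyper
print(n_grids, n_hyper, total)       # 6 861 5166
```

Each of those 5,166 combinations corresponds to one backtest run.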
For every grid and every hyperparameter combination, a backtest is run and evaluated with the objective you defined. This allows you to sort the backtests and select the configurations that best match your criteria.
There are other optimization methods besides grid search, such as genetic algorithms or Bayesian optimization. These approaches only evaluate a subset of the possible combinations: they aim to get closer to the best solution by focusing first on the most promising regions of the parameter space. Their advantage is that they can find good configurations with far fewer tests, which significantly reduces computation time when the search space is very large.
Grid search takes a different approach: it tests every combination one by one, which covers all the possibilities you defined upfront at the cost of a higher number of tests.
Growth in the number of combinations
With grid search, the number of combinations grows very quickly as soon as you add parameters or possible values for each parameter.
If you define 3 parameters with 50 values each, you already get 50³, i.e. 125,000 combinations. With 4 parameters, this becomes 50⁴, or 6,250,000 combinations. With 5 parameters, 50⁵ represents 312,500,000 combinations, and with 6 parameters, 50⁶ rises to around 15.6 billion combinations.
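The figures above are simple exponentials and can be checked directly:

```python
# Grid-search combinations for k parameters with 50 values each: 50 ** k.
for k in range(3, 7):
    print(k, 50 ** k)
# 3 125000
# 4 6250000
# 5 312500000
# 6 15625000000
```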
Computation time increases accordingly. As a rough guideline, a few hundred thousand combinations are usually processed in seconds, a few million in seconds or minutes, and hundreds of millions can already mean hours of runtime. Beyond a billion combinations, runtime tends to be measured in days, even on a recent CPU, depending on the strategy and the machine used.
Grid search remains relevant as long as the search space stays manageable.
Overfitting and historical performance
When you explore a large number of parameter combinations with grid search, the risk of overfitting increases: the strategy ends up adapting to the quirks of the historical dataset instead of capturing more general market behavior.
Strong backtest performance mostly shows that the strategy fits past data. It does not guarantee similar results in the future.
The goal is not just to maximize performance on the historical sample, but to increase the chances that the strategy remains valid on new, unseen data, often called out-of-sample data, which has not been used during optimization. To reduce the risk of overfitting, there are specific validation techniques: out-of-sample tests, walk-forward procedures, robustness analysis, and so on.
In this workflow, optimization remains a first step, to be complemented by validation techniques and by checking the strategy’s economic rationale before any live use.
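A walk-forward procedure can be illustrated with a minimal sketch (this is a generic illustration, not a Whale‑E feature; the function name and window sizes are made up): each window optimizes on a training segment, then validates on the out-of-sample segment that immediately follows, and the whole pair slides forward through the history.

```python
# Illustrative walk-forward splitter over a bar-indexed history.
def walk_forward_windows(n_bars, train, test):
    """Return (train_indices, test_indices) pairs sliding forward by `test` bars."""
    windows = []
    start = 0
    while start + train + test <= n_bars:
        windows.append((range(start, start + train),
                        range(start + train, start + train + test)))
        start += test  # advance by one out-of-sample segment
    return windows

windows = walk_forward_windows(n_bars=1000, train=600, test=100)
print(len(windows))  # 4 windows
print(windows[0])    # (range(0, 600), range(600, 700))
```

Only the training segment is used for optimization; performance on the test segments gives a less biased estimate of how the strategy behaves on unseen data.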