Data parts – what they are and how they can be used
The new SQ X build 125 introduces multiple data range parts. Until now, you could split your historical data into only two parts:
- In Sample – this is where strategies are evolved using genetic evolution. Each strategy is evaluated on this part of the data, and its performance score (fitness) is computed from its metrics on this part. Fitness then determines which strategies in the population are selected to be crossed and mutated to create a new generation. The best strategies have the highest probability of being chosen, so the population as a whole should improve with every generation.
- Out of Sample – this is the “unknown” part of the data that was not part of the evolution. It is used to verify that strategies also work on “unknown” data.
Genetic evolution doesn’t see this part of the data.
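To make the selection step more concrete, here is a minimal sketch of fitness-proportional ("roulette wheel") selection. It is only an illustration: the source does not say which selection scheme SQ X actually uses, and the function name `select_parents`, the clamping of negative fitness to zero, and the uniform fallback are all assumptions made for this example.

```python
import random


def select_parents(population, fitness, k, rng=None):
    """Pick k strategies with probability proportional to In Sample fitness.

    population: list of strategy names (hypothetical identifiers)
    fitness:    dict mapping strategy name -> fitness computed on In Sample data
    """
    rng = rng or random.Random()
    # Assumption: negative fitness is treated as zero weight.
    weights = [max(fitness[name], 0.0) for name in population]
    if sum(weights) == 0:
        # Nobody has positive fitness, so fall back to uniform selection.
        weights = [1.0] * len(population)
    # Sampling with replacement: a strong strategy can be picked several times.
    return rng.choices(population, weights=weights, k=k)


parents = select_parents(["A", "B", "C"], {"A": 9.0, "B": 1.0, "C": 0.5}, k=4)
```

Strategies with higher fitness are drawn more often, which is what pushes the population to improve from one generation to the next.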
Build 125 adds two more part types, making four in total:
- In Sample Training (IST) – this is the same as the In Sample part we had until now. Genetic evolution uses this part to determine fitness and rank the strategies in the population.
- In Sample Validation (ISV) – a new part in SQ X that is used to check whether strategy performance in the IST part also holds in the ISV part.
In machine learning, a validation set is used to check whether a model trained on the training set (IST) also performs well on data it was not trained on.
In SQ X, it can be used to restart genetic evolution when fitness stagnates in this part.
- Out of Sample (OOS) – this is the same as before: it represents an “unknown” part of the data that was not part of the evolution.
- No Trade – a special part in which the strategy will not trade. It can be used, for example, to skip a stretch in the middle of the data that has low volatility.
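The idea of restarting evolution when fitness stagnates on the ISV part can be sketched as a simple check. This is an assumption-heavy illustration, not SQ X's actual rule: the source does not describe how stagnation is detected, and the function name `isv_stagnated` and the parameters `patience` and `min_improvement` are hypothetical.

```python
def isv_stagnated(isv_fitness_by_generation, patience=5, min_improvement=0.0):
    """Return True if the best ISV fitness has not improved recently.

    isv_fitness_by_generation: best fitness measured on the ISV part,
                               one value per generation, oldest first.
    patience:                  number of most recent generations to inspect.
    min_improvement:           gain required to count as real progress.
    """
    history = list(isv_fitness_by_generation)
    if len(history) <= patience:
        return False  # not enough generations yet to judge
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent <= best_before + min_improvement


# IST fitness may keep rising while ISV fitness stays flat.
# That pattern suggests overfitting and is a reason to restart.
if isv_stagnated([1.0, 2.0, 3.0, 3.0, 3.0, 3.0], patience=3):
    print("restart evolution")
```

The point of checking ISV rather than IST is that IST fitness almost always keeps improving, because evolution optimizes it directly.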
The general recommendation in machine learning is to split the historical data into three equal parts: IST, ISV and OOS.
Another possible split is 60/20/20, and you can also move the Out of Sample period to the beginning of the data.
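The splits mentioned above come down to simple index arithmetic. The sketch below assumes the data is a sequence of `n_bars` bars addressed by index; the helper name `split_ranges` is hypothetical and is not part of SQ X.

```python
def split_ranges(n_bars, proportions, labels):
    """Split n_bars into consecutive (label, start, end) ranges, end exclusive."""
    if len(proportions) != len(labels):
        raise ValueError("need exactly one label per proportion")
    total = sum(proportions)
    bounds = [0]
    cumulative = 0
    for p in proportions:
        cumulative += p
        # Rounding the cumulative boundary keeps the ranges gap-free.
        bounds.append(round(n_bars * cumulative / total))
    return [(label, bounds[i], bounds[i + 1]) for i, label in enumerate(labels)]


# Three equal parts
print(split_ranges(900, [1, 1, 1], ["IST", "ISV", "OOS"]))
# 60/20/20
print(split_ranges(1000, [60, 20, 20], ["IST", "ISV", "OOS"]))
# Out of Sample moved to the beginning
print(split_ranges(1000, [20, 60, 20], ["OOS", "IST", "ISV"]))
```

Because the order of `labels` determines the order of the parts, moving OOS to the beginning is just a matter of listing it first.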
Multiple data sections
Another new feature of SQ X build 125 is that you can define multiple In Sample Validation or Out of Sample parts, not just one, and in any order you want.
In the picture above, the white parts are all In Sample Training – this is the data on which strategies are evolved.
Blue (ISV) parts are the data on which strategies are verified, and evolution can be restarted when fitness stagnates there.
The gray (No Trade) part is the one that is left out – the strategy doesn’t trade here.
Green parts are OOS parts – they are not part of genetic evolution, so strategies are evaluated there on unknown data.
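A layout with several interleaved parts can be modeled as a list of labeled date ranges, with everything outside those ranges treated as In Sample Training. The dates, the `PARTS` table, and the numbering of parts in order of appearance (ISV1, OOS1, OOS2) are assumptions made for this sketch, and the trades are made-up values used only to show the mechanics.

```python
from datetime import date

# Hypothetical layout: (label, start, end), end exclusive.
# Any date not covered here counts as In Sample Training (IST).
PARTS = [
    ("ISV1", date(2012, 1, 1), date(2013, 1, 1)),
    ("NoTrade", date(2014, 1, 1), date(2014, 7, 1)),
    ("OOS1", date(2015, 1, 1), date(2016, 1, 1)),
    ("OOS2", date(2018, 1, 1), date(2019, 1, 1)),
]


def part_for(day, parts=PARTS):
    """Return the label of the data part that contains the given day."""
    for label, start, end in parts:
        if start <= day < end:
            return label
    return "IST"


def net_profit_by_part(trades, parts=PARTS):
    """Sum profit/loss per data part; trades is a list of (day, pnl) pairs."""
    totals = {}
    for day, pnl in trades:
        label = part_for(day, parts)
        if label == "NoTrade":
            continue  # the strategy does not trade in this part
        totals[label] = totals.get(label, 0.0) + pnl
    return totals


trades = [
    (date(2011, 6, 1), 100.0),
    (date(2012, 6, 1), 50.0),
    (date(2014, 3, 1), 999.0),  # falls in the No Trade part, ignored
    (date(2015, 6, 1), -20.0),
    (date(2018, 6, 1), 70.0),
]
print(net_profit_by_part(trades))
```

Only the IST total would feed the fitness used by evolution; the ISV and OOS totals are kept separately for verification.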
Conditions and filtering by each section
All metrics are now also computed independently for each of these parts, and you can use them in your conditions.
For example, you can filter out strategies where Net profit (OOS1) is worse than 80% of Net profit (OOS2).
This allows “stricter” filtering, where a strategy has to perform well on multiple parts of the data.
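The example condition above can be written as a small predicate. The dictionary layout, the function name `passes_oos_filter`, and the strategy names and numbers are hypothetical; in SQ X itself the condition is set up in the filtering settings rather than in code.

```python
def passes_oos_filter(net_profit, ratio=0.8):
    """Keep a strategy only if Net profit (OOS1) >= ratio * Net profit (OOS2).

    net_profit: dict with per-part net profit, e.g. {"OOS1": ..., "OOS2": ...}
    """
    return net_profit["OOS1"] >= ratio * net_profit["OOS2"]


# Made-up per-part results for two strategies.
strategies = {
    "Strategy A": {"OOS1": 900.0, "OOS2": 1000.0},  # 900 >= 800, kept
    "Strategy B": {"OOS1": 700.0, "OOS2": 1000.0},  # 700 < 800, filtered out
}

kept = [name for name, np_ in strategies.items() if passes_oos_filter(np_)]
print(kept)
```

Note that a ratio test like this behaves differently when Net profit (OOS2) is negative, so in practice it would usually be combined with a condition that requires a profit in each part.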