I disagree with this simple formula – that IS should be bigger than OOS. In my opinion it is better to build strategies on a small amount of data, make your selection choices against the whole range of market behaviour (high volatility, low volatility, chop), generate on only a few months (or years), and treat everything else as OOS.
That way you will get more diverse strategies, ready for most states of the market.
I agree with hankeys: the smaller your IS is, the less overfit the strategy, and the more strategies you can generate in a shorter time. Filter against the large data set, generate on the small one.
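The "generate on small, filter on large" split described above can be sketched in plain Python. This is only an illustration of the date split itself (the dates, the monthly bars, and the window boundaries are all made up for the example; SQX handles this internally via its IS/OOS settings):

```python
from datetime import date

# Hypothetical monthly bars covering 2010-2020 (payload omitted, hence None).
history = [(date(2010 + y, m, 1), None) for y in range(11) for m in range(1, 13)]

# Small In Sample window for generation; everything else becomes Out of Sample.
IS_START, IS_END = date(2014, 1, 1), date(2015, 12, 31)

in_sample  = [bar for bar in history if IS_START <= bar[0] <= IS_END]
out_sample = [bar for bar in history if not (IS_START <= bar[0] <= IS_END)]

print(len(in_sample), len(out_sample))  # 24 IS months vs 108 OOS months
```

The point of the small window is exactly what hankeys says: the generator only ever "sees" the 24 IS months, while the remaining 108 months stay available as a large filter.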
Thanks to you both, that's a good point. But how does SQX behave when we search for strategies using the random build method? Does SQX make no distinction between IS and OOS there?
Thank you guys
Now I know – it's written in the user's guide: In Sample is used during genetic evolution; Out of Sample is the part of the data used to verify.
When I set IS to 0, no evolution happens – the fitness stays at 0.06. I guess evolution is about improving the strategy's fitness value.
I'm not sure if it's theoretically right to generate strategies with a small IS and a big OOS.
I think it makes no difference whether the overfitting happens on IS or on OOS.
Isn't backtesting about overfitting anyway – the "history will repeat itself" sort of thing?
I’ll give you my take on things:
– First and foremost, genetic evolution is indeed about improving a strategy's fitness level: it slightly changes the strategy's parameters on each run and checks whether the fitness improved.
– Generation of the strategy ("training of the model") happens on the In Sample period, regardless of whether you use genetic or random generation.
– Using Out of Sample periods makes sense in both types of generation; I'll explain why.
– Genetic evolution: this is the obvious one. Once you "recycle" the same strategy, fine-tuning and permuting it until it satisfies your filters (or minimum fitness), that strategy is most probably overfit to that data set. It therefore makes sense to run it once, "out of the box", on a previously unseen set of data to confirm it still performs well without having been fitted to it, thereby reducing the chance of overfitting. This can be done with the OOS slider during the building process.
– Essentially, a single strategy here may run 200 times on the IS data until it satisfies your needs, then only once on the OOS data to confirm it meets your performance requirements. That shows the advantage of the OOS period and how much it reduces the chance of overfitting.
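The recycle-on-IS, check-once-on-OOS pattern from the two points above can be sketched as a toy loop. Everything here is illustrative: the `fitness` function is a seeded stand-in for a real backtest score, not anything SQX actually computes.

```python
import random

random.seed(42)

def fitness(params, data_seed):
    # Toy stand-in for a backtest score on one data segment;
    # a real fitness would come from simulated trading results.
    return random.Random(params * 7919 + data_seed).random()

IS_SEED, OOS_SEED = 1, 2  # stand-ins for the IS and OOS data segments

# Genetic-style loop: mutate parameters many times, keep improvements,
# and score every attempt on the In Sample segment only.
best_params = 0
best_is = fitness(best_params, IS_SEED)
for _ in range(200):
    candidate = random.randrange(10**6)   # a "mutation" of the parameter set
    score = fitness(candidate, IS_SEED)   # evaluated on IS only
    if score > best_is:
        best_params, best_is = candidate, score

# The survivor has run ~200 times on IS data, but only once on OOS data.
oos_score = fitness(best_params, OOS_SEED)
print(best_is, oos_score)
```

The asymmetry is the whole point: the IS score is the best of 200 tries and is therefore biased upward, while the OOS score is a single honest draw, which is why a large IS/OOS gap signals overfitting.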
– Random generation: here, unlike genetic evolution, a single strategy runs only once on the IS data and once on the OOS data, and then your filters are checked. So "to the naked eye" there seems to be no difference between IS and OOS, since before landing in your databank the strategy sweeps through the whole combined period.
BUT, firstly, don't forget that if a strategy doesn't satisfy your IS criteria, it will not go on to be tested on the OOS (unless your filters only concern the whole data set, in which case the above is true and the whole data set is effectively IS, whether you divide it or not).
Secondly, because random generation passes a lower percentage of strategies to the databank (we just throw in a random mix of parameters and see if it works), it produces its own kind of overfitting: even though we don't recycle a strategy as in genetic evolution, we switch strategies in hyper-drive until one "sits nicely" on our data set. So here it makes sense to keep an unseen slice of data, say the last six months or a year, completely outside the generation process, to be revealed to the strategy only in the retester. That way we have more confidence that our randomly created strategy didn't just "accidentally" fall perfectly onto the building data set.
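The random-generation workflow above, including the IS filter gating the OOS test and a final untouched holdout checked only in the retester, can be sketched the same toy way. The `backtest` function, the segment ids, and the 0.8 threshold are all invented for illustration:

```python
import random

random.seed(7)

def backtest(params, segment_id):
    # Toy stand-in for a backtest metric on one data segment.
    return random.Random(params * 31 + segment_id).random()

IS, OOS_BUILD, OOS_UNSEEN = 0, 1, 2   # building IS, building OOS, untouched holdout
MIN_SCORE = 0.8
databank = []

# Random generation: each candidate is tested once on IS, once on building OOS.
for _ in range(2000):
    params = random.randrange(10**6)
    if backtest(params, IS) < MIN_SCORE:
        continue                          # failed the IS filter: never sees OOS
    if backtest(params, OOS_BUILD) >= MIN_SCORE:
        databank.append(params)

# Final check on data no candidate ever touched (e.g. the last year, retester only).
survivors = [p for p in databank if backtest(p, OOS_UNSEEN) >= MIN_SCORE]
print(len(databank), len(survivors))
```

Note the `continue`: a candidate that fails on IS is never even scored on the building OOS, which is the "firstly" point above, and the holdout segment is only consulted after the databank is already full, which is the "secondly" point.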
And that's only the tip of the iceberg of avoiding overfitting: robustness testing and walk-forward are the real MVPs there. But OOS periods are, to me, an essential part of the building and retesting process. I always reserve at least one OOS period for building, and another, more recent, totally unseen one to be run in the retester.