
Walk-Forward Optimization vs Robustness

7 replies

kainc301


4 years ago #245968

I want to start a thread about WFO in general and discuss how useful (or not) other quants here find it to be.

I am of the opinion that it is not necessary at all and does not help determine the robustness of any particular strategy. The way WFO is structured, any given strategy is repeatedly optimized across multiple periods of market data and then tested on a small portion of OOS data. The idea is that this more closely simulates live trading and tests how a strategy adapts to different market conditions. But is this really true?

If the parameters of a given strategy are repeatedly optimized, then in my view all WFO is doing is testing the robustness of those specific parameters as they occur during that period of market data. In other words, it relies on constant optimization to determine the robustness of a strategy, and I believe constantly optimizing strategies is a flawed premise in the first place. I have tested hundreds of thousands of strategies at this point, and one conclusion I have drawn is that by changing the parameters by a large degree, you’re effectively working with a completely different strategy every time you do that. I am not talking about changing parameters by a small degree, as you can test that with Monte Carlo simulations. If you are constantly optimizing a strategy over a much smaller period of time, the parameters can change so much that the strategy may well be optimized for that period but fail long term.

Furthermore, since you are optimizing over a small time period, there is no way to tell when a given optimization will fail, because each optimization can be drastically different from the last; you may end up with new parameters that were good for the last year but fail in live markets, because you are essentially trading a different strategy than the last optimization. And if you do not keep reoptimizing the final strategy that passes WFO, then there is no real point in looking at what WFO tells you in the first place, because it says nothing about the performance of a strategy whose parameters you never change.

I say all this because it doesn’t make much sense to me why anyone would constantly optimize strategies for current market conditions. Every time you do this, you’re working with a different strategy that may only have worked during the small time frame it was optimized for, and you don’t know how drastically the optimization can affect the parameters given new market data. It makes much more sense to me to find strategies that survive OOS data and optimize it once in order to fit the performance to the entire data set. If it survives OOS and Monte Carlo simulations, optimizing it once finds the parameters that retained the most performance over the whole data set, and you don’t have to worry about overfitting because it has already survived OOS at this point. You also should not need to constantly come back to your strategy and reoptimize it, which can change the nature of the strategy altogether. Instead, optimization is done once, and you have reliable performance backing the final parameters.
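For reference, the rolling walk-forward procedure being debated can be sketched in a few lines. This is a minimal illustration of the splitting scheme, not SQX’s actual implementation; the window lengths are made-up numbers:

```python
def walk_forward_splits(n_bars, is_len, oos_len):
    """Yield (in_sample, out_of_sample) index ranges for a rolling walk-forward."""
    start = 0
    while start + is_len + oos_len <= n_bars:
        yield (range(start, start + is_len),
               range(start + is_len, start + is_len + oos_len))
        start += oos_len  # roll forward by one OOS step; re-optimize each time

# e.g. 1000 bars, 400-bar in-sample window, 100-bar out-of-sample test window
splits = list(walk_forward_splits(1000, 400, 100))
```

The point of contention in this thread is the `start += oos_len` line: every iteration re-optimizes on a new in-sample window, so each OOS segment is traded by a potentially different parameter set.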

To be clear, my goal here is not to be lazy, even though this does mean less development work. Cutting WFO out of my workflow has made reaching a final strategy I consider robust considerably faster, and you don’t need to come back and reoptimize the 10-100 different strategies you may be running. But I wouldn’t want to cut WFO out just for practicality. If it truly makes the robustness process better, I would not mind trading time for a more reliable strategy; it is more important not to let small oversights cost you money in the markets. So far, however, I have not found any legitimate reason or evidence as to why it is better to constantly optimize a given strategy.

So I want to pose the question to everyone who uses WFO: do you find it beneficial? If so, why do you feel constantly optimizing strategies is better than not doing so? I have struggled to find any reason for this and would like feedback from others who disagree with my position, because as I understand it, WFO is considered an optional test for this very reason.


bentra


4 years ago #245977

changing the parameters by a large degree, you’re effectively working with a completely different strategy every time you do that.

This thought has crossed my mind a lot. At what point do we draw the line between optimizing and randomly trying completely different strategies? In my own EA I have optimizable parameters that completely change entire aspects of the strategy, and I consider optimizing them to be more akin to trying different strategies at random. This is one of the reasons I think it’s important to know exactly what every variable actually does before optimizing it, so we can judge whether it should be optimized at all and how much we can change it without it becoming a different strategy.

Judging from my limited experience, plus all the trading books that include a walk-forward chapter, I’d say it’s highly likely that WF can have some value if we do it properly, take care not to data-mine too much, and take care not to, as you say, “end up with a completely different strategy.” A WF can do more harm than good if not done properly, though; especially in SQX, the WF matrix can sometimes make a WF of certain parameters look good when we have actually over-optimized the WF process itself.

It’s pretty rare for me that a strategy does as well or better in its WF OOS than it did in its OOS during the strategy development phase, but when it does, I have a little more confidence in the WF. Of course I will re-optimize as necessary; as you said, it makes no sense not to. Yes, it’s a pain, and I’ve made special adjustments to my MT4 code to ease it.

I’m not sure that I would bother to use it as a robustness test because like you said the MC test can do something similar.
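The Monte Carlo parameter test being referred to can be sketched roughly as follows. Here `backtest` is a hypothetical scoring callable and the 10% jitter is an arbitrary illustration, not an SQX setting:

```python
import random

def mc_parameter_robustness(backtest, base_params, jitter=0.10, runs=200, seed=42):
    """Perturb each numeric parameter by up to +/-jitter and re-score.

    `backtest` maps a params dict to a performance score. A robust parameter
    set should keep most perturbed scores near the baseline instead of
    collapsing when the inputs wiggle slightly.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        p = {k: v * (1 + rng.uniform(-jitter, jitter)) for k, v in base_params.items()}
        scores.append(backtest(p))
    baseline = backtest(base_params)
    survival = sum(s > 0 for s in scores) / runs  # fraction still profitable
    return baseline, survival

# toy score surface: peaks at sl_atr = 3.0, falls off linearly away from it
toy = lambda p: 1.0 - abs(p["sl_atr"] - 3.0)
baseline, survival = mc_parameter_robustness(toy, {"sl_atr": 3.0})
```

This probes small neighborhoods of the chosen parameters, which is exactly why it answers a different question than re-optimizing from scratch each period.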

I agree with your entire post; it is extremely well written and poses some great questions that we both share. But this absolutely is ringing alarm bells for me:

It makes much more sense to me to find strategies that survive OOS data and optimize it once in order to fit the performance to the entire data set.

You stand the risk of over-fitting during your final optimization! I wouldn’t optimize after you’ve already done all the other tests. This is no different from a walk forward with a large in-sample that you haven’t tested out-of-sample yet. I’d at least go for an anchored WF before attempting that.

May all your fits be loose.


https://www.darwinex.com/darwin/SUG.4.2/


mabi


4 years ago #245978

The concept must be: if a strategy can be walk-forward optimized on unseen data, and doing so improves performance significantly, then it is a strategy that adapts to market conditions and can most probably be adapted to different market conditions if recently optimized. When walk-forward was developed, they needed a way to validate the strategy, since they had only one that they had worked on for two years and needed it to work on as many markets as possible. I am actually testing this: I have about 400 strategies running on demo that were optimized on unseen data, and they will be re-optimized in December. Originally they all had >80 percent winning periods on unseen data, which in practice should mean that by June next year, the time of the second optimization, at least 90% should have had a profitable period. If that holds true I will probably use it; otherwise I will dump it, which is the most likely scenario.

Although, so far I see the best performance from strategies that are simply optimized on recent data but still performed great on unseen data, and that is much simpler to do. However, this seems to create a certain percentage of immediate total losers, so they first need to be incubated. The remaining ones have had good performance, although in my test they were of a type that had recently performed well anyway, looking back six months. So things like the type of strategy and its general performance on the recent market have to be considered as well, and compared against the optimized versions, to determine the effectiveness of implementing these operations for strategies used with real money. However, I found it is almost impossible to make a strategy work in a losing period by optimizing it; the only thing that seems to happen is a slight performance improvement over time. They become better when they are already making a profit unoptimized.

For a robustness test, I think the best option is to have strategies that work on many instruments and time frames, and they are easy to find in SQX. How can a strategy be curve-fitted or bad if it works on unseen data on X different markets? It will of course work; maybe not in the next 6 months, but later it will for sure, or the market would stop being random, which will not happen.


kainc301


4 years ago #245980

You stand the risk of over-fitting during your final optimization! I wouldn’t optimize it after you’ve already done all the other tests. This is no different than a walk forward with a large in-sample that you haven’t tested out-of-sample yet. I’d at least go for an anchored WF before attempting that.

I would have to disagree with this. The final optimization after everything has been tested is generally limited to changing a few parameters only slightly; for example, moving a stop-loss from 3.4x ATR down to 2.8x ATR, or moving the exit-after-bars from 25 up to 40. It is illogical to me that these changes would make a strategy “overfit” all of a sudden. Overfit strategies are strategies that only work on the data they were TRAINED on. Optimization across the data set is not the same as retraining; the strategy has still survived on previous OOS data. Minorly tweaking the strategy is not retraining it on this OOS data as it should still be considered unseen from the original training data set.

For robustness test i think  the best option is to have strategies that work on many instruments and time frames and they are easy to find on SQx. How can a strategy be curve fitted or bad if it works on unseen data on X different markets. It will of course work maybe not the next 6 months but later it will for sure or the market would stop being random  which will not happen.

I thought of this originally as well. However, my concern was with how different other markets can behave at times. This is even more of a gray area to me than WFO, and I still want to do more research into whether working on other markets is a true robustness test. From what I’ve gathered, you have to test markets that are similar. As an obvious example, you can’t do something like compare results of one strategy on EUR/USD and USD/JPY as they move near opposite to one another. You would have to do something like comparing a strategy on EUR/USD to GBP/USD and if testing USD/JPY, compare it to EUR/JPY and so on.
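One way to make “markets that are similar” measurable is to correlate their return series before pairing them for a cross-market test. A dependency-free sketch (the sample return numbers are made up for illustration):

```python
def corr(xs, ys):
    """Plain Pearson correlation of two equal-length return series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# two series that move near-opposite (like EUR/USD vs USD/JPY in the example)
a = [0.010, -0.020, 0.015, -0.005]
b = [-0.010, 0.020, -0.015, 0.005]
```

A strongly negative correlation would flag the pair as unsuitable for a direct (asymmetrical) cross-market comparison.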

Although, so far I see the best performance from strategies that are simply optimized on recent data but still performed great on unseen data, and that is much simpler to do.

I am curious to do some more research and get more opinions on this. I have never trained on more recent data and tested against older OOS; in the past I would do the opposite, training on past data and making recent data my OOS, in order to prove that the strategy still worked in today’s market conditions. The best of both worlds would be to have multiple IS periods as well as multiple OOS periods, but I’m waiting on that feature request to be implemented 😉


bentra


4 years ago #245991

Minorly tweaking the strategy is not retraining it on this OOS data as it should still be considered unseen from the original training data set.

Any tweaking at all is a form of fitting to some degree; if you use the OOS for the tweaks, then it is by definition seen and by definition no longer OOS. As I understand it, small tweaks are less likely to hurt and more likely to help, but the whole idea of the walk forward is to verify how much the optimizing/tweaking part is helping or hurting. You could use slightly less data, or obtain more data, and go for a walk forward by “minor tweak” instead. In your example of using the “entire data sample,” what’s the difference between these three things:
- “minor tweaking” on all 16 years of your data
- obtaining more data first, then using 16 out of 25 years for IS and doing a walk forward on the rest by “minor tweaking”
- using 14 out of 16 years and doing a walk forward by “minor tweak” on the remaining 2 OOS years? (This could be better than nothing and give you confidence in your “minor tweaking” stage.)
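The anchored variant behind those last two options can be made concrete with split indices. A toy sketch using whole years as units, with the year counts from the example above:

```python
def anchored_wf_splits(n_years, first_is, step):
    """Anchored walk-forward: the in-sample always starts at year 0 and grows.

    Each step re-optimizes on all history seen so far, then tests on the
    next `step` years, unlike a rolling WF whose in-sample window slides.
    """
    splits, end = [], first_is
    while end + step <= n_years:
        splits.append((range(0, end), range(end, end + step)))
        end += step
    return splits

# third option from the post: 14 of 16 years in-sample, walk the remaining 2
splits = anchored_wf_splits(16, 14, 1)
```

Because the in-sample only ever grows, later optimizations can drift less between steps than in a rolling WF, which is the appeal of anchoring before a final full-sample tweak.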

As an obvious example, you can’t do something like compare results of one strategy on EUR/USD and USD/JPY as they move near opposite to one another. You would have to do something like comparing a strategy on EUR/USD to GBP/USD and if testing USD/JPY, compare it to EUR/JPY and so on.

If your strategy is symmetrical then of course you can compare results from EURUSD to USDJPY. If your strategy is not symmetrical, then your statement makes sense. For context when talking about cross-market testing, it will be helpful to specify the time frame and whether or not we are talking about symmetrical strategies. For instance, I find that larger time frames (H4/D1) work better as symmetrical, cross-market strategies, while smaller time frames work better as asymmetrical strategies that seem more specialized for a single pair. I also find that GOLD, USDJPY, EURUSD and GBPUSD are a solid set of cross tests for larger-time-frame symmetrical strategies. But for a smaller-time-frame asymmetrical strategy, a set like EURUSD, EURJPY and GBPUSD seems best for a EURUSD strat (one additional xxxUSD and one additional EURxxx). I actually made a feature request so we can “flip” the cross, giving a logical cross-check set like USDJPY, EURJPY, USDGBP for an asymmetrical USDJPY strategy. I forgot all about it till now. (https://roadmap.strategyquant.com/tasks/sq4_3319)
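As a side note on the “flip” idea: the quote series itself can be inverted to turn, say, GBPUSD into a synthetic USDGBP. A minimal sketch, assuming plain OHLC dicts (real data would also need spread and volume handled):

```python
def flip_pair(ohlc):
    """Invert a quote series (e.g. GBPUSD -> USDGBP).

    Each price becomes 1/price, and high/low swap roles because
    inversion reverses the ordering of prices within a bar.
    """
    return [{"open": 1 / bar["open"],
             "high": 1 / bar["low"],   # the old low becomes the new high
             "low": 1 / bar["high"],
             "close": 1 / bar["close"]} for bar in ohlc]

flipped = flip_pair([{"open": 2.0, "high": 4.0, "low": 1.0, "close": 2.0}])
```

With a flip like this, an asymmetrical USDJPY strategy could be cross-checked against synthetic USD-base versions of other pairs instead of their natural quote direction.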




kainc301


4 years ago #246888

what’s the difference between these three things:
- “minor tweaking” on 16 years of your entire data
- obtaining more data first and then using 16 years out of 25 years for IS and doing a walk forward on the rest by “minor tweaking”
- using 14 out of 16 years and doing a walk forward by “minor tweak” on the remaining 2 OOS years? (could be better than nothing and give you confidence in your “minor tweaking” stage.)

Well, WFO tests how a strategy holds up under repeated optimization, and I don’t optimize repeatedly. So the difference is that performing this test does not give me the final data I want for making a strategy production-ready, as I don’t plan on optimizing it more than once.

If your strategy is symmetrical then of course you can compare results from EURUSD to USDJPY. If your strategy is not symmetrical then your statement makes sense. I think for context when talking about cross market testing, it will be helpful to specify time frame and also whether or not we are talking about asymmetrical strategies.

I used to swear by symmetrical strategies until data showed me otherwise. Now I only use short-only and long-only strategies for a given market, doing one-sided training per side per market. The final result is two different strategies that each trade only one direction, with completely different rules. Some people like yourself swear by symmetry; to each their own. As for the time frame, it really doesn’t matter as long as it is not something like 5m or lower. If anyone has been able to make 5m-or-lower strategies work in live trading, I salute them.


bentra


4 years ago #247232

until data showed me otherwise. Now I only use short-only and long-only strategies

Not surprising, considering how flawed the symmetry in SQX still is.

I can see how you might think I love symmetry from all my recent activity, but actually most of my live strats are not symmetrical.

