Optimal instances of SQ3 per core and scripts for managing them
21 replies
Threshold
7 years ago #115144
What is the optimal # of SQ3 instances you run per core?
I run 1 SQ instance per core, and since it’s multithreaded I give each instance 2 threads on my old Dell PowerEdges.
Are there programs or scripts out there for managing multiple instances of software? My new server has 32 cores, so that’s 32 instances of SQ3 (or maybe 16) that I’ll probably be installing and running on it for optimal strategy generation speed. That’s a lot of work changing the settings in each one if I want them all doing exactly the same thing.
mikeyc
7 years ago #137094
SQ is multithreaded, can you please explain why you do this? Maybe I missed something in a discussion somewhere….
Threshold
7 years ago #137097
You get way faster results running multiple instances. SQ3 uses multithreading/cores very inefficiently even with the Zulu hack.
1 SQ per core using 2 threads seems to be optimal, maybe even 1 SQ per thread.
_Cujo
7 years ago #137102
I’m running 4 instances of SQ right now on this machine (I took the screenshot a while ago, then got distracted, but it’s still running now), with the Zulu hack and the command line settings from Geektrader’s thread.
It’s not super fast, or anything, but works fine. I changed machines start of the month, downgrading actually, as the last machine was overkill.
statistic
7 years ago #137325
I believe you have 16 real cores / 32 threads. I have the same machine (two-socket Xeon E5-2630). SQ cannot use more than 4 real cores, so with 16 real cores you can run 4 SQ instances and they will load the CPU to 98%, but you need to optimise the disk (SSD-PCI-X in RAID0 will do).
Even if you set SQ to 32 threads, it makes no difference compared to 4 threads. I have tested extensively on different machines and this gives optimum performance; if you run more instances they will fight for CPU time and there will be a long queue. But I have not checked the number of strategies generated, I am only speaking about CPU load and queuing. Could you check the actual generation speed in strategies/minute for me? I think it might be worth disregarding the CPU queuing.
If you run that test, it will be helpful, so I can do the same:
Test #1: Run 1-2-3-4-x SQ instances to load the CPU to 100%, and measure how many strategies are generated (use Random in this case; on average the strategies will be the same size).
Test #2: Run 1-2-3-4-x + 1-2-3-4 more instances of SQ and measure how many strategies are generated.
I hope you run Server 2012 on that machine, or at least Windows 10.
This way we can find out what the optimum is. I have not applied any hacks, as there is no need: all real cores are fully loaded.
I will wait for your results and then post mine.
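The comparison between Test #1 and Test #2 comes down to aggregate strategies per minute at each instance count. A minimal sketch of that bookkeeping follows; the function names and the example figures are placeholders for illustration, not measurements from this thread.

```python
# Sketch: compare aggregate generation throughput across instance counts.
# The example numbers are made-up placeholders, not real SQ measurements.

def strategies_per_minute(total_strategies, elapsed_minutes):
    """Aggregate throughput summed over all running SQ instances."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    return total_strategies / elapsed_minutes


def best_instance_count(runs):
    """runs maps instance count -> (total strategies generated, minutes run).

    Returns the instance count with the highest throughput and all rates.
    """
    rates = {n: strategies_per_minute(s, m) for n, (s, m) in runs.items()}
    best = max(rates, key=rates.get)
    return best, rates


if __name__ == "__main__":
    # Hypothetical results: Test #1 with 4 instances, Test #2 with 8.
    example_runs = {4: (1200, 60), 8: (1500, 60)}
    best, rates = best_instance_count(example_runs)
    for n in sorted(rates):
        print(f"{n} instances: {rates[n]:.1f} strategies/minute")
    print(f"Highest throughput with {best} instances")
```

Comparing throughput rather than CPU load is the point of the test: a higher instance count only wins if the strategies/minute figure actually goes up.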
======
_Cujo
Using Cloud is a waste of money: they will never give you real cores, only crappy vCPUs, which are something like 4-10 times slower than real cores. Just go for a dedicated server or buy a decent E5 machine.
mentaledge
7 years ago #137331
Well, running multiple instances will give some workflow parallelization, which is useful when I want to break things down into smaller sets of building blocks.
I’m new to SQ, and what puzzles me is that regardless of the number of cores or the machine itself, “CPU busy” does not go above 50%.
Am I missing something that blocks SQ from using all available resources?
statistic
7 years ago #137332
Please read my previous reply about the machine with 16 real cores. 1 instance uses only 25% of all 16 cores (despite being configured to use 16 cores or 32 threads), hence I conclude it uses 4 cores. On another machine with 4 cores, the same set uses 98% CPU. I think SQ3 only uses a maximum of 4 real cores and 7GB RAM (not 6GB as was mentioned, but 7GB, more precisely 6.8GB). The server has 38GHz of total processor power (hyper-threading is never counted, it comes as a bonus), and it is ALL used with just 4 instances, but I run 6 just to make sure I am squeezing all the juice out of it, despite the CPU queue.
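The reasoning above (25% load on 16 cores means about 4 cores per instance, so 4 instances fill the machine) can be written out as a short calculation. This is only a sketch of that arithmetic; the function names are made up for illustration, and the 16-core and 25% figures are the ones quoted in the post.

```python
import math


def cores_used_by_one_instance(total_cores, cpu_load_fraction):
    """Estimate how many cores one instance occupies from overall CPU load."""
    return total_cores * cpu_load_fraction


def instances_to_saturate(total_cores, cores_per_instance):
    """Smallest number of instances whose combined core usage covers the CPU."""
    return math.ceil(total_cores / cores_per_instance)


if __name__ == "__main__":
    total_cores = 16      # real cores, as stated in the post
    observed_load = 0.25  # one instance loads the machine to about 25%
    per_instance = cores_used_by_one_instance(total_cores, observed_load)
    print(f"Cores used by one instance: {per_instance:g}")
    print(f"Instances to saturate: {instances_to_saturate(total_cores, per_instance)}")
```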
mentaledge
7 years ago #137333
Yes, I got that. What I meant is that I see the same low load even on 4 cores. That is for the generation phase.
geektrader
7 years ago #137337
My findings are that 1.5 threads per instance is optimal (a theoretical calculation); I worked out what the optimum is a long time ago, about a year back. Since the value is 1.5 threads, I am sticking to 1 thread per instance, also because memory usage doubles when switching from 1 to 2 threads, but CPU usage does not really double (just ~1.6x). Hence 1 thread per instance is the optimum for me. Launching is done via a .bat file which copies SQ to X directories and then launches all of them (you have to do this via the Windows “START blah.exe” command from within the .bat file, otherwise the .bat file does not continue past the first SQ instance).
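The memory-versus-CPU trade-off described above can be checked with a small calculation: if 2 threads cost twice the memory but deliver only about 1.6x the CPU work, a fixed RAM budget gets more total work from 1-thread instances. The sketch below assumes a hypothetical 20000 MB budget and 2500 MB per 1-thread instance (the heap size used in the batch file); the function names are illustrative only.

```python
def instances_that_fit(ram_budget_mb, ram_per_instance_mb):
    """How many instances fit into a fixed RAM budget."""
    return ram_budget_mb // ram_per_instance_mb


def relative_cpu_work(ram_budget_mb, ram_per_instance_mb, cpu_per_instance):
    """Total CPU work (in units of one 1-thread instance) within the budget."""
    return instances_that_fit(ram_budget_mb, ram_per_instance_mb) * cpu_per_instance


if __name__ == "__main__":
    budget_mb = 20000  # hypothetical RAM available for SQ instances
    # 1 thread: baseline memory and CPU usage.
    one_thread = relative_cpu_work(budget_mb, 2500, 1.0)
    # 2 threads: memory doubles, CPU usage is only ~1.6x (figures from the post).
    two_threads = relative_cpu_work(budget_mb, 5000, 1.6)
    print(f"1 thread per instance:  {one_thread:.1f} units of CPU work")
    print(f"2 threads per instance: {two_threads:.1f} units of CPU work")
```

This only models the RAM-limited case; on a machine with plenty of memory and few cores the balance could come out differently.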
4 instances example. It uses NTFS-compressed directories if available; cleans SQ’s temp and log directories before copying so they don’t get copied; starts every instance with LOW priority so that I can do other work and the SQ instances only use the CPU that my normal daily work leaves unused; and uses exclusion.txt to skip my /strategies/ directory, which is huge and doesn’t need to be copied for each instance since I can just load it from the main SQ directory in each instance:
@echo off
rmdir "C:\Program Files\StrategyQuant\temp" /S /Q
rmdir "C:\Program Files\StrategyQuant\log" /S /Q
rmdir "c:\temp\strategyquant-temp" /S /Q
mkdir "c:\temp\strategyquant-temp"
mkdir "c:\temp\strategyquant-temp\1"
mkdir "c:\temp\strategyquant-temp\2"
mkdir "c:\temp\strategyquant-temp\3"
mkdir "c:\temp\strategyquant-temp\4"
compact /c /s:"c:\temp\strategyquant-temp\1"
compact /c /s:"c:\temp\strategyquant-temp\2"
compact /c /s:"c:\temp\strategyquant-temp\3"
compact /c /s:"c:\temp\strategyquant-temp\4"
c:
xcopy "C:\Program Files\StrategyQuant" "c:\temp\strategyquant-temp\1" /E /Y /EXCLUDE:exclusion.txt
CD "c:\temp\strategyquant-temp\1"
start /LOW StrategyQuant64.exe -J-server -J-Xmx2500m -J-XX:+DisableExplicitGC -J-XX:+AggressiveOpts -J-XX:+UseSerialGC
xcopy "C:\Program Files\StrategyQuant" "c:\temp\strategyquant-temp\2" /E /Y /EXCLUDE:exclusion.txt
CD "c:\temp\strategyquant-temp\2"
start /LOW StrategyQuant64.exe -J-server -J-Xmx2500m -J-XX:+DisableExplicitGC -J-XX:+AggressiveOpts -J-XX:+UseSerialGC
xcopy "C:\Program Files\StrategyQuant" "c:\temp\strategyquant-temp\3" /E /Y /EXCLUDE:exclusion.txt
CD "c:\temp\strategyquant-temp\3"
start /LOW StrategyQuant64.exe -J-server -J-Xmx2500m -J-XX:+DisableExplicitGC -J-XX:+AggressiveOpts -J-XX:+UseSerialGC
xcopy "C:\Program Files\StrategyQuant" "c:\temp\strategyquant-temp\4" /E /Y /EXCLUDE:exclusion.txt
CD "c:\temp\strategyquant-temp\4"
start /LOW StrategyQuant64.exe -J-server -J-Xmx2500m -J-XX:+DisableExplicitGC -J-XX:+AggressiveOpts -J-XX:+UseSerialGC
exclusion.txt just contains:
strategies
which will exclude the /strategies directory as mentioned.
Of course you also need to adjust the amount of RAM used for each instance depending on your system’s memory. The batch file can easily be extended at the relevant places to use more or fewer instances. I have .bat files for 2 to 21 instances.
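Rather than keeping one hand-edited .bat file per instance count, the same commands could be generated for any N. The sketch below is one possible way to do that in Python, not the method used in this thread: the `build_launcher` function, the `reserve_mb` parameter, and the rule of splitting the remaining RAM equally across instances are all assumptions made for illustration. The paths, xcopy/compact/start commands, and JVM flags are copied from the batch file above.

```python
def build_launcher(instances, total_ram_mb, reserve_mb=4096,
                   source=r"C:\Program Files\StrategyQuant",
                   work_root=r"c:\temp\strategyquant-temp"):
    """Return the text of a .bat file that copies and starts N SQ instances.

    Assumption: the heap per instance is (total RAM - reserve) / instances.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    heap_mb = (total_ram_mb - reserve_mb) // instances
    if heap_mb < 1:
        raise ValueError("not enough RAM for the requested instance count")

    jvm_flags = (f"-J-server -J-Xmx{heap_mb}m -J-XX:+DisableExplicitGC "
                 "-J-XX:+AggressiveOpts -J-XX:+UseSerialGC")
    numbers = range(1, instances + 1)

    # Clean SQ temp/log and recreate the working root, as in the original.
    lines = [
        "@echo off",
        f'rmdir "{source}\\temp" /S /Q',
        f'rmdir "{source}\\log" /S /Q',
        f'rmdir "{work_root}" /S /Q',
        f'mkdir "{work_root}"',
    ]
    lines += [f'mkdir "{work_root}\\{i}"' for i in numbers]
    lines += [f'compact /c /s:"{work_root}\\{i}"' for i in numbers]
    lines.append("c:")
    # Copy, change directory, and launch each instance at LOW priority.
    for i in numbers:
        target = f"{work_root}\\{i}"
        lines.append(f'xcopy "{source}" "{target}" /E /Y /EXCLUDE:exclusion.txt')
        lines.append(f'CD "{target}"')
        lines.append(f"start /LOW StrategyQuant64.exe {jvm_flags}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: 4 instances on a machine with 16 GB of RAM (hypothetical).
    print(build_launcher(4, 16384))
```

The output can be redirected into a .bat file and inspected before running; the heap size it computes should be checked against what the machine can really spare.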
Threshold
7 years ago #137338
No, I have an R810 with 4 sockets (4x X7560) = 32 cores, 64 threads, 128GB RAM, Windows Server 2008 Enterprise. Windows 10 cannot use 4 sockets, and I don’t like the OS.
Threshold
7 years ago #137373
It seems that running SQ beyond a certain thread count actually slows generation down.
I tested 1 SQ running 64 threads just to see what happens: generation speed goes from the normal 0.2s per strategy to about 5-10s per strategy. CPU load hovered around 1-3%.
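To put those timings in perspective, they can be converted to strategies per hour and a slowdown factor. A minimal sketch using only the figures quoted above (0.2s normally, 5-10s with 64 threads); the function names are illustrative.

```python
def strategies_per_hour(seconds_per_strategy):
    """Convert a per-strategy generation time into an hourly rate."""
    return 3600 / seconds_per_strategy


def slowdown_factor(normal_seconds, slow_seconds):
    """How many times slower the slow configuration is."""
    return slow_seconds / normal_seconds


if __name__ == "__main__":
    normal = 0.2  # seconds per strategy at the usual thread count
    for slow in (5, 10):  # seconds per strategy observed with 64 threads
        print(f"{slow}s per strategy: "
              f"{strategies_per_hour(slow):.0f}/hour versus "
              f"{strategies_per_hour(normal):.0f}/hour normally, "
              f"about {slowdown_factor(normal, slow):.0f}x slower")
```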
statistic
7 years ago #137379
Thank you for the test. It confirms my thinking.
The true test would be to:
1. Create a folder with 20000 strategies
2. Clone it (to avoid any modifications from SQ)
3. Load it cleanly (start from the beginning to avoid any issues with memory / disk / CPU)
4. Run the test for 20 years on M1 and record the time
5. Clean everything and close
6. Go to step #3 and compare results (repeat from #3 for every combination: with threads, without Hyper-Threading, cores only, etc.)
It is on my to-do list, but if someone can run this test and show the results, it would be helpful.
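The record-and-compare part of that procedure could be organised as below. This is a sketch only: `run_config` is a hypothetical callable standing in for "load the cloned folder, run the 20-year M1 test, close", which in practice may have to be driven by hand in SQ with the elapsed times entered manually.

```python
import time


def time_run(run_once):
    """Return the wall-clock seconds taken by a single call to run_once()."""
    start = time.perf_counter()
    run_once()
    return time.perf_counter() - start


def benchmark(configs, run_config, repeats=1):
    """Time each configuration label and keep its best (lowest) time.

    configs: list of labels, e.g. "4 threads" or "16 cores, HT off".
    run_config: hypothetical callable performing one clean run for a label.
    """
    results = {}
    for label in configs:
        timings = [time_run(lambda: run_config(label)) for _ in range(repeats)]
        results[label] = min(timings)
    return results


def fastest(results):
    """Label of the configuration with the shortest recorded time."""
    return min(results, key=results.get)


if __name__ == "__main__":
    def fake_run(label):
        # Placeholder workload; a real run would start SQ from a clean state.
        time.sleep(0.01)

    results = benchmark(["4 threads", "16 threads, HT off"], fake_run)
    for label, seconds in results.items():
        print(f"{label}: {seconds:.3f}s")
    print(f"Fastest: {fastest(results)}")
```

Starting every run from the same cloned folder, as in step 2, is what makes the recorded times comparable between configurations.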
Threshold
7 years ago #137390
Do you find this is much faster than just loading a ‘master’ settings file into each one or just less of a headache?
Avoiding the headache probably makes it worth it on its own.
geektrader
7 years ago #137394
Yeah, I want everything to be correct and up to date for each instance, especially since I don’t want to update the data on 28 pairs each week on X instances. Also, the batch files have the RAM allocation pre-configured for my system and for the number of instances run each time, so I know it’s always correct. And I often run 16 instances at once, for example, and then it even gets bothersome to load the settings file in each of them. With an SSD + eBoostr, all the launching via the .bat files takes 3 minutes or less.
Threshold
7 years ago #137397
I ran it last night, it’s definitely better.
I created a separate “master bin” folder copy. Before I ran the batch, I deleted all the historical data I wouldn’t be using from the bin folder. Far fewer gigs copied.
Threshold
7 years ago #137398
Averaging about 80-85% usage with 32 SQs at 2 threads each. 64 SQs with 1 thread each would have been too much work.