Optimal instances of SQ3 per core and scripts for managing them

21 replies

Threshold

Customer, bbp_participant, community, 723 replies.

7 years ago #115144

What is the optimal number of SQ3 instances to run per core?

I run 1 SQ instance per core, and since it's multithreaded I give each SQ 2 threads on my old Dell PowerEdges.

Are there programs or scripts out there for managing multiple instances of software? My new server has 32 cores, so that's 32 instances of SQ3 (maybe 16) that I'll probably be installing and running on it for optimal strategy generation speed. That's a lot of work changing the settings in each one if I want them all doing the exact same thing.

mikeyc

Customer, bbp_participant, community, 877 replies.

7 years ago #137094

SQ is multithreaded, can you please explain why you do this? Maybe I missed something in a discussion somewhere….

Threshold

Customer, bbp_participant, community, 723 replies.

7 years ago #137097

You get way faster results running multiple instances. SQ3 uses multithreading/cores very inefficiently even with the Zulu hack.

1 SQ per core using 2 threads seems to be optimal, maybe even 1 SQ per thread.

_Cujo

Customer, bbp_participant, community, 101 replies.

7 years ago #137102

I’m running 4 instances of SQ right now on this machine (I took the screenshot a while ago, then got distracted, but it’s still running now), with the Zulu hack and the command-line options from Geektrader’s thread.

It’s not super fast or anything, but it works fine. I changed machines at the start of the month, downgrading actually, as the last machine was overkill.

[Screenshot: edq11x7.png]

statistic

Subscriber, bbp_participant, community, 31 replies.

7 years ago #137325

I believe you have 16 real cores / 32 threads. I have the same machine (two-socket Xeon E5-2630). SQ cannot use more than 4 real cores, so with 16 real cores you can run 4 SQ instances and that will load the CPU to 98%, but you need to optimise the disk (SSD-PCI-X in RAID 0 will do).

Even if you enable 32 threads in SQ, it will not make any difference compared to 4 threads. I have tested extensively on different machines and this is the optimum: if you run more instances, they will fight for CPU time and there will be a long queue. But I have not checked the number of strategies generated; I am only speaking about CPU load and queuing. Could you check the actual generation speed for me, in strategies per minute? It might turn out that the CPU queuing can be disregarded.

If you do that test, it will be helpful so I can do the same:

Test #1: Run 1-2-3-4-x SQ instances to load the CPU to 100%, and measure how many strategies are generated (use Random generation in this case; on average the strategies will be the same size).

Test #2: Run 1-2-3-4-x + 1-2-3-4 more instances of SQ and measure how many strategies are generated.

I hope you run Server 2012 on that machine, or at least Windows 10.

This way we can find out what the optimum is. I have not applied any hacks, as there is no need: all real cores are fully loaded.

I will wait for your results and then post mine.
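Both tests above come down to comparing aggregate strategies per minute across all running instances. A minimal Python sketch of that comparison follows; the function name and the sample counts are hypothetical placeholders, not measurements from this thread, and the counts are assumed to be read manually from each SQ window after the same wall-clock time.

```python
def strategies_per_minute(counts, elapsed_seconds):
    """Aggregate generation speed across all SQ instances in one test run.

    counts: strategies generated by each instance (read off by hand).
    elapsed_seconds: wall-clock duration of the run, identical for all instances.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(counts) * 60.0 / elapsed_seconds


# Hypothetical readings after 10 minutes; replace with your own numbers.
test1 = strategies_per_minute([1200, 1180, 1210, 1190], 600)
test2 = strategies_per_minute([650, 640, 660, 655, 645, 650, 640, 660], 600)
print(f"Test #1: {test1:.0f}/min, Test #2: {test2:.0f}/min")
```

Whichever run gives the higher total is the better configuration, regardless of how the CPU load or queue length looks.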

======

_Cujo

Using the cloud is a waste of money: they will never give you real cores, only crappy vCPUs that are something like 4-10 times slower than real cores. Just go for a dedicated server or buy a decent E5 machine.

mentaledge

Customer, bbp_participant, community, sq-ultimate, 25 replies.

7 years ago #137331

Well, running multiple instances gives some workflow parallelization, which is useful when I want to break things down into smaller sets of building blocks.

I’m new to SQ, and what puzzles me is that regardless of the number of cores or the machine itself, the “CPU busy” figure does not go above 50%.

Am I missing something that blocks SQ from using all available resources?

statistic

Subscriber, bbp_participant, community, 31 replies.

7 years ago #137332

Please read my previous reply about the machine with 16 real cores. One instance uses only 25% of all 16 cores (despite being configured to use 16 cores or 32 threads), hence I conclude it uses 4 cores. I have another machine with 4 cores, and there the same set uses 98% CPU. I think SQ3 only uses a maximum of 4 real cores and 7 GB of RAM (not 6 GB as was mentioned, but 7 GB, or 6.8 GB to be precise). The server has 38 GHz of total processor power (hyper-threading is never counted; it comes as a bonus), and it is ALL used with just 4 instances, but I run 6 just to make sure I am squeezing all the juice out of it, despite the CPU queue.
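The arithmetic behind that estimate can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that the CPU percentage shown for a single instance scales linearly with the number of cores it keeps busy.

```python
import math


def cores_used(total_cores, cpu_percent_one_instance):
    """Cores effectively kept busy by one instance, from its overall CPU load."""
    return total_cores * cpu_percent_one_instance / 100.0


def instances_to_saturate(cpu_percent_one_instance):
    """Smallest number of identical instances whose loads add up to 100%."""
    return math.ceil(100.0 / cpu_percent_one_instance)


# Figures from the post: 16 real cores, one instance loads the machine to 25%.
print(cores_used(16, 25))         # cores one instance keeps busy
print(instances_to_saturate(25))  # instances needed to fill the machine
```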

mentaledge

Customer, bbp_participant, community, sq-ultimate, 25 replies.

7 years ago #137333

Yes, I got that. What I meant is that I see the same low load even on 4 cores. That is during the generation phase.

geektrader

Customer, bbp_participant, community, 522 replies.

7 years ago #137337

My finding is that 1.5 threads per instance is optimal (a theoretical calculation); I worked out the optimum about a year ago. Since 1.5 threads is not a usable setting, I am sticking to 1 thread per instance, also because memory usage doubles when switching from 1 to 2 threads while CPU usage does not really double (just ~1.6x). Hence 1 thread per instance is the optimum for me. Launching is done via a .bat file which copies SQ to X directories and then launches all of them (you have to do this via the Windows “START blah.exe” command from within the .bat file, otherwise the .bat file does not get past launching the first SQ instance).
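The trade-off described here can be illustrated with a small Python sketch. It makes two loud assumptions that are not in the post: that the ~1.6x CPU usage translates into ~1.6x generation speed, and that RAM rather than core count is the limiting resource. The 20000 MB budget and 2500 MB per single-thread instance are illustrative figures only.

```python
def total_throughput(ram_budget_mb, mem_per_instance_mb, speed_per_instance):
    """Relative throughput of as many instances as fit into the RAM budget."""
    instances = ram_budget_mb // mem_per_instance_mb
    return instances * speed_per_instance


# Assumed figures: 1 thread = 2500 MB and speed 1.0;
# 2 threads = double the memory and speed 1.6.
one_thread = total_throughput(20000, 2500, 1.0)
two_threads = total_throughput(20000, 5000, 1.6)
print(one_thread, two_threads)
```

Under these assumptions, the same RAM buys more total speed when it is spent on single-thread instances, which matches the choice of 1 thread per instance.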

 

Example for 4 instances. It uses NTFS-compressed directories if available, cleans SQ's temp and log directories before copying so they don´t get copied, starts every instance with LOW priority so that I can do other work and the SQ instances only use the CPU time left over from my normal daily work, and uses exclusion.txt to avoid copying my /strategies/ directory, since it is huge and each instance can simply load strategies from the main SQ directory:

@echo off
REM Clean SQ's temp and log folders first so they are not copied to each instance.
rmdir "C:\Program Files\StrategyQuant\temp" /S /Q
rmdir "C:\Program Files\StrategyQuant\log" /S /Q

REM Remove any previous working copies and recreate one directory per instance.
rmdir "c:\temp\strategyquant-temp" /S /Q
mkdir "c:\temp\strategyquant-temp"
mkdir "c:\temp\strategyquant-temp\1"
mkdir "c:\temp\strategyquant-temp\2"
mkdir "c:\temp\strategyquant-temp\3"
mkdir "c:\temp\strategyquant-temp\4"

REM Mark each instance directory as NTFS-compressed; files copied in later inherit this.
compact /c /s:"c:\temp\strategyquant-temp\1"
compact /c /s:"c:\temp\strategyquant-temp\2"
compact /c /s:"c:\temp\strategyquant-temp\3"
compact /c /s:"c:\temp\strategyquant-temp\4"

REM Switch to drive C: so the CD commands below take effect.
c:

REM Instance 1: /E copies all subfolders, /Y overwrites without asking,
REM /EXCLUDE skips every path that matches a line in exclusion.txt.
xcopy "C:\Program Files\StrategyQuant" "c:\temp\strategyquant-temp\1" /E /Y /EXCLUDE:exclusion.txt

REM START returns immediately, so the script carries on to the next instance.
CD "c:\temp\strategyquant-temp\1"
start /LOW StrategyQuant64.exe -J-server -J-Xmx2500m -J-XX:+DisableExplicitGC -J-XX:+AggressiveOpts -J-XX:+UseSerialGC

REM Instance 2
xcopy "C:\Program Files\StrategyQuant" "c:\temp\strategyquant-temp\2" /E /Y /EXCLUDE:exclusion.txt

CD "c:\temp\strategyquant-temp\2"
start /LOW StrategyQuant64.exe -J-server -J-Xmx2500m -J-XX:+DisableExplicitGC -J-XX:+AggressiveOpts -J-XX:+UseSerialGC

REM Instance 3
xcopy "C:\Program Files\StrategyQuant" "c:\temp\strategyquant-temp\3" /E /Y /EXCLUDE:exclusion.txt

CD "c:\temp\strategyquant-temp\3"
start /LOW StrategyQuant64.exe -J-server -J-Xmx2500m -J-XX:+DisableExplicitGC -J-XX:+AggressiveOpts -J-XX:+UseSerialGC

REM Instance 4
xcopy "C:\Program Files\StrategyQuant" "c:\temp\strategyquant-temp\4" /E /Y /EXCLUDE:exclusion.txt

CD "c:\temp\strategyquant-temp\4"
start /LOW StrategyQuant64.exe -J-server -J-Xmx2500m -J-XX:+DisableExplicitGC -J-XX:+AggressiveOpts -J-XX:+UseSerialGC

exclusion.txt just contains:

strategies

which will exclude the /strategies directory as mentioned.

 

Of course, you also need to adjust the amount of RAM given to each instance (the -Xmx value) depending on your system's memory. The batch file can easily be extended in the obvious places to run more or fewer instances. I have .bat files for 2 to 21 instances.
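Rather than keeping one hand-edited .bat file per instance count, the same text could be generated. The Python sketch below is only an illustration and not the script used above: `build_launcher` is a hypothetical helper, its default paths and JVM flags are copied from the 4-instance example, and it assumes exclusion.txt is found from wherever the generated file is run.

```python
def build_launcher(instances,
                   sq_dir=r"C:\Program Files\StrategyQuant",
                   work_dir=r"c:\temp\strategyquant-temp",
                   xmx_mb=2500):
    """Return the text of a .bat file that copies and starts `instances` SQ copies."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    jvm = (f"-J-server -J-Xmx{xmx_mb}m -J-XX:+DisableExplicitGC "
           "-J-XX:+AggressiveOpts -J-XX:+UseSerialGC")
    lines = [
        "@echo off",
        f'rmdir "{sq_dir}\\temp" /S /Q',
        f'rmdir "{sq_dir}\\log" /S /Q',
        f'rmdir "{work_dir}" /S /Q',
        f'mkdir "{work_dir}"',
    ]
    # One NTFS-compressed working directory per instance.
    for i in range(1, instances + 1):
        target = f"{work_dir}\\{i}"
        lines += [f'mkdir "{target}"', f'compact /c /s:"{target}"']
    lines.append("c:")
    # Copy SQ into each directory and start it at low priority.
    for i in range(1, instances + 1):
        target = f"{work_dir}\\{i}"
        lines += [
            f'xcopy "{sq_dir}" "{target}" /E /Y /EXCLUDE:exclusion.txt',
            f'CD "{target}"',
            f"start /LOW StrategyQuant64.exe {jvm}",
        ]
    return "\n".join(lines) + "\n"


# Print the script for 4 instances; redirect the output into a .bat file to use it.
print(build_launcher(4))
```

Changing the instance count or the heap size then means changing one argument instead of editing every copied block.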


Threshold

Customer, bbp_participant, community, 723 replies.

7 years ago #137338

No, it's not a 16-core machine: I have an R810 with 4 sockets (4x X7560) = 32 cores, 64 threads, and 128 GB RAM, running Windows Server 2008 Enterprise. Windows 10 cannot use 4 sockets, and I don’t like the OS.

Threshold

Customer, bbp_participant, community, 723 replies.

7 years ago #137373

It seems that running SQ beyond a certain thread count actually slows generation down.

I tested 1 SQ running 64 threads just to see what happens: generation speed went from the normal 0.2 s per strategy to about 5-10 s per strategy. CPU load hovered around 1-3%.
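To compare these figures with the strategies-per-minute numbers asked for earlier in the thread, the seconds-per-strategy values can be inverted. A small Python sketch using only the timings quoted above (the function name is hypothetical):

```python
def per_minute(seconds_per_strategy):
    """Convert a per-strategy generation time into strategies per minute."""
    if seconds_per_strategy <= 0:
        raise ValueError("seconds_per_strategy must be positive")
    return 60.0 / seconds_per_strategy


normal = per_minute(0.2)                        # usual speed
slow_range = (per_minute(10), per_minute(5))    # 64-thread test, worst to best
slowdown = (normal / slow_range[1], normal / slow_range[0])
print(f"normal: {normal:.0f}/min")
print(f"64 threads: {slow_range[0]:.0f} to {slow_range[1]:.0f}/min")
print(f"slowdown: {slowdown[0]:.0f}x to {slowdown[1]:.0f}x")
```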

statistic

Subscriber, bbp_participant, community, 31 replies.

7 years ago #137379

Thank you for the test. It confirms my thinking.

The true test would be to:

1. Create a folder with 20000 strategies

2. Clone it (to avoid any modifications from SQ)

3. Load it from a clean start (start from the beginning to avoid any issues with memory / disk / CPU)

4. Run the test for 20 years on M1 and record the time

5. Clean everything and close

6. Go to step #3 and compare results (repeat #3 for every combination: with threads, without Hyper-Threading and only cores, etc.)

It is on my to-do list, but if someone can run this test and show the results, it would be helpful.

Threshold

Customer, bbp_participant, community, 723 replies.

7 years ago #137390

Do you find the batch-file approach much faster than just loading a ‘master’ settings file into each instance, or just less of a headache?
Avoiding the headache probably makes it worth it on its own.

geektrader

Customer, bbp_participant, community, 522 replies.

7 years ago #137394

Yeah, I want everything to be right and up to date for each instance, especially since I don´t want to update the data for 28 pairs each week on X instances. Also, the batch files have the RAM allocation pre-configured for my system and for the number of instances run each time, so I know it´s always correct. I often run 16 instances at once, for example, and then it gets bothersome to load the set file in each of them. With an SSD + eBoostr, all the launching via the .bat files takes 3 minutes or less.
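The per-instance RAM figure baked into each batch file can be derived rather than guessed. A hedged Python sketch: the function name, the 4096 MB reserve for the operating system, and the 64 GB example machine are all assumptions for illustration. Note that -Xmx only caps the Java heap, so each process will use somewhat more memory than this value.

```python
def xmx_per_instance_mb(total_ram_mb, instances, reserved_mb=4096):
    """Heap size (in MB) for the -J-Xmx flag, splitting usable RAM evenly."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    usable = total_ram_mb - reserved_mb
    if usable <= 0:
        raise ValueError("reserved_mb leaves no RAM for the instances")
    return usable // instances


# Assumed example: 64 GB machine, 16 instances, 4 GB kept back for Windows.
heap = xmx_per_instance_mb(64 * 1024, 16)
print(f"-J-Xmx{heap}m")
```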


Threshold

Customer, bbp_participant, community, 723 replies.

7 years ago #137397

I ran it last night; it's definitely better.
I created a separate “master bin” copy of the folder. Before running the batch file I deleted all the historical data I wouldn't be using from that bin folder, so far fewer gigabytes get copied.

Threshold

Customer, bbp_participant, community, 723 replies.

7 years ago #137398

Averaging about 80-85% CPU usage with 32 SQs at 2 threads each. 64 SQs with 1 thread each would have been too much work.

[Attachment: 32SQs.png]
