If you look into that benchmark, the dual Xeons are faster than the M4; it's the NVMe drives the owner has paired them with that are holding back his score.
But regardless, it is rather impressive that an M4 can keep up with a ten-year-old dual Xeon, likely because PI is not compiled to leverage the Xeon CPUs' specific matrix manipulation capabilities.
A further update from STAstro - he has found two interesting insights:
1. WBPP is faster under Windows than Ubuntu - total surprise there
2. Under Windows, v1.8.9-1 is materially slower in WBPP than v1.8.9-3 - most of this accrues to Local Normalisation being about 40% faster in the latest version of PI running WBPP v2.7.8
That Linux benchmark is my PC, a used Dell Precision T7910 workstation that I bought off eBay for $1200 in Jan 2024. What does a MacBook Pro M4 Max (the machine used in the other comparison test linked above) set you back...?
Curious about your comment re NVMe swap drives compromising performance. The swap storage directories (16 in all) are on a fast NVMe SSD that is separate from both the OS drive and the PI working directory.
Other than using a ramdisk for PI swap, how else would you suggest setting up the swap directories to improve performance?
nice job with the setup / benchmark.
i think you're looking at about $5k for that particular model.
I am not sure how he has set up his NVMe drives, but if he used an add-on Gen 4 PCIe card and set the drives up in RAID 0, that score would double, and if he is on Gen 5 the performance would double again! RAM disks sometimes don't score as high as the very best NVMe drives in RAID 0; not sure why that is the case though...
BTW PixInsight v1.9 dropped yesterday. For a brief moment in time I had the two fastest Windows-based system scores in the world. That's right, mine were the only two Windows entries. V1.9 for Windows scores about 2,000 points lower than V1.8.9-3. I haven't tried WBPP on it yet though!
Well, a very interesting latest release from PixInsight: v1.9.3 delivers 20% - 30% performance improvements (real world and benchmarks) through one-off performance monitoring of worker thread counts and sub-threads spawned versus image sizes. This is basically per-machine optimisation of thread counts for the most frequent and heavily used tasks in PI.
So once this update is installed and all patches are downloaded, users are encouraged to run the processor optimisation to create their own optimisation profile. On 8-core machines this takes about 10 - 12 minutes. On my 36-core Xeons it took about 35 minutes. One user with a 48-core / 96-thread Platinum Xeon took 70 minutes running this diagnostic.
So the end results: my best run with PI version 1.9.2 scored 17,523; my best run with version 1.9.3 is 20,207. A very nice improvement for free!
Interesting to see the third-fastest rig on the table, a Platinum Xeon 8558P: its scores top out at 21,473 for Windows and 55,091 on Linux, with exactly the same gear.
Makes me ponder whether my scores would also increase by 2.5x, giving me the 4th-fastest rig benchmarked vs my current 28th ranking!
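For anyone wanting to check the arithmetic on the scores quoted above, here is a minimal Python sketch. The four scores come from this post; the function and variable names are just illustrative, and the projection assumes the Windows-to-Linux ratio seen on the Xeon 8558P would carry over unchanged to a different machine, which is far from guaranteed.

```python
def relative_gain(old_score: float, new_score: float) -> float:
    """Fractional improvement going from old_score to new_score."""
    return (new_score - old_score) / old_score


def ratio(numerator: float, denominator: float) -> float:
    """Simple score ratio, e.g. Linux score over Windows score."""
    return numerator / denominator


# Scores quoted in this post.
my_v192 = 17_523          # my best run, PI 1.9.2 (Windows)
my_v193 = 20_207          # my best run, PI 1.9.3 (Windows)
xeon_8558p_win = 21_473   # Xeon 8558P, Windows
xeon_8558p_linux = 55_091 # Xeon 8558P, Linux, same hardware

gain = relative_gain(my_v192, my_v193)
linux_vs_windows = ratio(xeon_8558p_linux, xeon_8558p_win)

# ASSUMPTION: the 8558P's Linux/Windows ratio applies to my rig as well.
projected_linux = my_v193 * linux_vs_windows

print(f"1.9.2 -> 1.9.3 gain: {gain:.1%}")
print(f"8558P Linux/Windows ratio: {linux_vs_windows:.2f}x")
print(f"Projected Linux score for my rig: {projected_linux:,.0f}")
```

The ratio works out a little above the 2.5x mentioned, so the projection is only a rough upper-end guess until someone actually runs the benchmark on Linux with this hardware.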
Lastly, a user with a Xeon Gold 6536N processor has not only mapped Windows vs Linux scores, he has done this with 8, 16, 24, 32, 48, 56 and 64 threads. The surprising conclusion is that this version of PI (and most recent versions) scales very poorly with increased processor threads in Windows, but on the exact same gear performance scales linearly in Linux, pointing to a poor Windows coding issue!
Wow, I want your computer, I must check Grey's auctions.
I tried a couple of old servers but they weren't that good. I may have been running Windows Server edition on them, I don't remember. One was a 4-processor unit, but with old processors, and they sucked energy like sand sucks water.
so plenty more free performance to be had with an OS upgrade! would be interesting to see your results if you decide to add a linux partition.
I may try the Linux path in the future if the PI team can't figure out what in the thread management is bottlenecking PI. The fact that Linux running under Windows runs far faster than running directly under Windows should give them some hint of where to look for what is going askew!