Dual Xeons - a processing platform I am playing with
Living 13 km north-east of Sydney, there is not much I can do about our night-time skies and light pollution - but I did want to experiment with what one can achieve by integrating massive amounts of data on a target. To do that in a sensible amount of time requires a lot of compute and I/O power.
So it got me thinking about what my options were - and I could see four obvious paths:
1. Upgrade my existing workstation from an 8-core Extreme i7 to an 18-core i9-10980XE
2. Build an AMD Ryzen Threadripper 3990X based rig (64 cores)
3. Buy an old HP Z640 with dual Xeon E5-2699 v3 CPUs (36 cores)
4. Rent a massive AWS compute and I/O cluster whenever I want to stack and integrate shots
The first option I may do down the track - the chips aren't so common, and finding one and potentially upsizing my water cooler for the larger, hotter CPU meant this path would probably cost $1,400.
The second option is likely the most powerful - the CPU alone costs around $5K, as it's basically a baby EPYC - but a build going down that pathway could land in the $8K - $10K range, a bit too much for a processing nice-to-have.
Option 4 was a bit tricky to justify - there would be a lot of data to move up to and down from the cloud, which would require considerable time and internet bandwidth, and I would need not just very high compute but high IOPS too - so, lacking the experience, I discounted this option.
So that left option 3 - and during a road trip to the Sunshine Coast over the past two weeks opportunity struck. The TechFactory just before Brisbane had an HP Z640 dual Xeon running Win 10 Pro with 36 cores and 72 threads, a Quadro P4000 graphics card (about equivalent to an NVIDIA GTX 1070), a 1TB SSD, an 8TB HDD and 256GB RAM for $1,900 - so I jumped on it.
I also picked up a 27" 2K monitor, keyboard and mouse for $200, and my wife got a great Mac Air - so it was a really fun visit to a place that re-purposes end-of-life high-end servers, switches and workstations. It was a real geek's playground - they had military-grade equipment everywhere.
So I got back home and set it all up. First task was to improve I/O - so I added an old ASUS Hyper M.2 X16 Gen 4 PCIe card that I had bought years ago and never used (for lack of 16 spare PCIe lanes). To this I added 4 x 2TB MP600 Elite drives and put them in RAID 0 as my working space - an 8TB scratch file space. So all up that is another $1K in gear. The Z640 BIOS makes it easy to bifurcate a PCIe x16 slot into x4/x4/x4/x4, which you need in order to see all of the M.2 drives and then stripe them under Windows Disk Management as a dynamic disk. The only loss is that the Z640 is only PCIe x16 Gen 3 - whereas the drives and Hyper card are Gen 4 - so although each drive is rated to 7000 MB/sec under Gen 4, this roughly halves under Gen 3. But in RAID 0, scores above 8000 MB/sec are commonplace, which makes for blindingly fast I/O.
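For anyone wanting to sanity-check those numbers, here is a rough back-of-envelope sketch of the PCIe ceilings. It assumes the published raw rates of 8 GT/s per lane for Gen 3 and 16 GT/s for Gen 4 with 128b/130b encoding, and it ignores all other protocol and filesystem overhead, so real benchmarks will come in lower. The function names are mine, purely for illustration.

```python
# Rough PCIe bandwidth arithmetic for a 4-drive RAID 0 scratch array.
# Assumptions: 8 GT/s per lane (Gen 3), 16 GT/s per lane (Gen 4), both
# with 128b/130b encoding. Other protocol overhead is ignored, so these
# are theoretical ceilings, not expected benchmark scores.

GT_PER_S = {3: 8.0, 4: 16.0}  # raw transfer rate per lane, in GT/s
ENCODING = 128 / 130          # 128b/130b line-encoding efficiency


def link_mb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction link bandwidth in MB/s."""
    bits_per_s = GT_PER_S[gen] * 1e9 * ENCODING
    return bits_per_s / 8 / 1e6 * lanes


def array_ceiling_mb_per_s(gen: int, drives: int, drive_rating: float) -> float:
    """Upper bound for a RAID 0 stripe where each drive sits on its own x4 link."""
    per_drive = min(drive_rating, link_mb_per_s(gen, lanes=4))
    return drives * per_drive


if __name__ == "__main__":
    for gen in (3, 4):
        per_link = link_mb_per_s(gen, lanes=4)
        total = array_ceiling_mb_per_s(gen, drives=4, drive_rating=7000)
        print(f"Gen {gen}: x4 link ~{per_link:.0f} MB/s, "
              f"4-drive ceiling ~{total:.0f} MB/s")
```

Under Gen 3 each x4 link tops out a little under 4000 MB/sec, so a 7000 MB/sec drive is link-limited, and four of them striped have a theoretical ceiling well above the 8000 MB/sec I see in practice.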
So all that was left to do was install the latest NVIDIA Quadro P4000 drivers, install PI and set up GPU acceleration for PI using CUDA. These CPUs are not approved for Windows 11 - so I am stuck on Win 10 Pro for a while.
Testing PI's WBPP showed interesting results - PI only uses half the processor cores. It sees 72 logical cores and schedules significant work for that many - but Windows seems to dispatch all the work to only one of the CPUs - the first NUMA node of 36 logical cores - while the rest just sit idle.
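One possible explanation, and this is my assumption rather than anything the PI team has confirmed, is Windows processor groups: on systems with more than 64 logical processors Windows splits them into groups, and on Windows 10 a process runs in a single group unless the application is written to span groups. A minimal sketch to check how a machine is split, using the kernel32 calls GetActiveProcessorGroupCount and GetActiveProcessorCount via ctypes (the function name below is just mine for illustration):

```python
import ctypes
import os
import sys
from typing import List


def logical_processors_per_group() -> List[int]:
    """Return the active logical processor count of each Windows processor group.

    Other operating systems have no processor groups, so there this falls
    back to a single entry holding the total logical processor count.
    """
    if sys.platform != "win32":
        return [os.cpu_count() or 1]

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # WORD GetActiveProcessorGroupCount(void)
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    # DWORD GetActiveProcessorCount(WORD GroupNumber)
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_ulong

    group_count = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(g) for g in range(group_count)]


if __name__ == "__main__":
    counts = logical_processors_per_group()
    for group, count in enumerate(counts):
        print(f"group {group}: {count} logical processors")
    print(f"total: {sum(counts)}")
```

If a 72-thread box reports two groups of 36, that would at least be consistent with an application that stays in one group only ever loading half the machine.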
So at 50% CPU load this new workstation took about 3 hours to stack and integrate 730 subs - a job that took my old workstation 4 hours. So if half the processors can do that, I will be really keen to see what the rig can do when fully loaded! It is also super quiet - at full load it is basically silent and cool. The PI guys are working on release 1.9, due out in the next few weeks, and are trying to investigate why my rig and a few others only use half the available processors (this is only a Windows behaviour - on Linux all processors get used).
I also saw very unusual CPU scheduling behaviour with the RC-Astro Xterminator suite. My old workstation could process an image with BlurXterminator in about 3 minutes on CPU and 20 seconds on GPU; the new one took 40 minutes on CPU (with only about 5-6 cores active, and those only about 5% loaded) but on GPU it did it in under 40 seconds - a 60-fold improvement over its own CPU run - pointing to something really off with the workload dispatcher.
So for anyone thinking of having a dedicated astro processing rig - a used HP Z640 with the right configuration, core count and memory size can be bought for a real steal nowadays!
Last edited by g__day; 25-11-2024 at 08:56 PM.