First you should identify the price/performance/watt of the candidate systems (processor + memory + network) over their lifetime in your use case. This turns out to be very difficult. The few benchmarks you find online cannot easily be compared, and the most interesting choices are usually just released and have no benchmarks yet. More problematic, benchmarks seldom list the energy use and cost of a system.
Every year I spend a few weeks trying to design the system with the lowest price/performance/watt I can find. Even with that much effort I have not been able to establish whether a cluster of Raspberry Pis is faster and cheaper than an AMD EPYC system with several CPUs. Establishing the cost is even harder if you want to manufacture enough of the systems to benefit from economies of scale (not a problem for a homebrew system).
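To make the lifetime comparison concrete, here is a minimal Python sketch that folds energy into cost: purchase price plus the energy drawn over the system's life, divided by the performance it delivers on your workload. The `System` fields, the 5-year lifetime, the $0.15/kWh electricity price and all candidate figures are hypothetical placeholders, not measurements; the hard part is still getting trustworthy numbers to plug in.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class System:
    name: str
    price_usd: float  # purchase price of processor + memory + share of network
    watts: float      # average power draw under your workload
    perf: float       # benchmark score on *your* workload (higher is better)


def lifetime_cost(s: System, years: float, usd_per_kwh: float) -> float:
    """Purchase price plus energy cost over the system's lifetime."""
    energy_kwh = s.watts * HOURS_PER_YEAR * years / 1000.0
    return s.price_usd + energy_kwh * usd_per_kwh


def cost_per_perf(s: System, years: float, usd_per_kwh: float) -> float:
    """Lifetime dollars per unit of delivered performance (lower is better)."""
    return lifetime_cost(s, years, usd_per_kwh) / s.perf


if __name__ == "__main__":
    # Hypothetical figures for illustration only; substitute your own measurements.
    candidates = [
        System("small-board cluster", price_usd=400.0, watts=60.0, perf=100.0),
        System("server node", price_usd=6000.0, watts=400.0, perf=1500.0),
    ]
    for s in sorted(candidates, key=lambda c: cost_per_perf(c, 5, 0.15)):
        print(f"{s.name}: ${cost_per_perf(s, 5, 0.15):.2f} per perf unit over 5 years")
```

Note how the energy term can dominate: with these placeholder numbers the cheaper cluster loses once five years of electricity are counted.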
Currently I guesstimate that the best price/performance/watt system is some mass-produced $1-$3 ARM chip (soon RISC-V) with two DRAM chips and a fast network connection, such as an FPGA acting as a fabric switch with high-speed SERDES links. Price per core plus memory must be well below $10. Raspberry Pis are not contenders: they only have 200-300 Mbps of network bandwidth for 4 cores to share, where it should be several gigabits/s to be competitive.
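A quick way to apply these two cutoffs to a candidate node is sketched below in Python. The thresholds are assumptions: `max_cost_per_core_usd=10.0` encodes "well below $10" as a hard ceiling, and `min_link_mbps=2000.0` is a stand-in for "several gigabits/s". The node prices are hypothetical.

```python
def per_core_mbps(node_link_mbps: float, cores: int) -> float:
    """Network bandwidth each core gets when all cores share one node link."""
    return node_link_mbps / cores


def is_contender(node_cost_usd: float, cores: int, node_link_mbps: float,
                 max_cost_per_core_usd: float = 10.0,   # assumed cutoff for "well below $10"
                 min_link_mbps: float = 2000.0) -> bool:  # assumed stand-in for "several Gbit/s"
    """True if a node (cores + memory) passes both the price and the network cutoff."""
    cost_ok = node_cost_usd / cores < max_cost_per_core_usd
    net_ok = node_link_mbps >= min_link_mbps
    return cost_ok and net_ok


if __name__ == "__main__":
    # Hypothetical 4-core board with a 300 Mbps link: 75 Mbps per core, fails the network test.
    print(per_core_mbps(300, 4), is_contender(35.0, 4, 300))
    # Hypothetical $20 4-core node with a 10 Gbit/s link: passes both tests.
    print(per_core_mbps(10_000, 4), is_contender(20.0, 4, 10_000))
```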
A custom-built AMD EPYC system with several GPUs, networked together, can turn out to be faster and cheaper. However, its retail price needs to come down and you would need to build a similar network fabric switch.
You get better performance if you tailor the processor design to the task, so the system with the best price/performance/watt will be different for different software.
Even cheaper will be systems where you can rebalance part of the compute resources between different programs. Such a system will be less efficient in general, because hardware reconfigurability has a high overhead, but that is offset by being more efficient for each program it is configured for. You could, for example, rebalance transistors between integer units, floating-point units, cache or the network-on-chip (NoC).
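The trade-off can be illustrated with a toy bottleneck model in Python. This is a hypothetical model, not a description of any real chip: a program runs only as fast as its scarcest resource allows, a fixed chip splits its tiles evenly, and a reconfigurable chip first loses an `overhead` fraction of its tiles and then allocates the rest in proportion to the program's demand.

```python
RESOURCES = ("integer", "float", "cache", "noc")


def throughput(alloc: dict, demand: dict) -> float:
    """Bottleneck model: speed is limited by the scarcest resource relative to demand."""
    return min(alloc[r] / demand[r] for r in RESOURCES if demand[r] > 0)


def fixed_split(total_tiles: float) -> dict:
    """General-purpose chip: an even, hard-wired split of the transistor budget."""
    return {r: total_tiles / len(RESOURCES) for r in RESOURCES}


def reconfigured_split(total_tiles: float, demand: dict, overhead: float) -> dict:
    """Reconfigurable chip: lose `overhead` of the tiles to configurability,
    then give out the remainder in proportion to this program's demand."""
    usable = total_tiles * (1.0 - overhead)
    total_demand = sum(demand.values())
    return {r: usable * demand[r] / total_demand for r in RESOURCES}


if __name__ == "__main__":
    integer_heavy = {"integer": 14, "float": 2, "cache": 3, "noc": 1}
    balanced = {"integer": 1, "float": 1, "cache": 1, "noc": 1}
    for name, demand in (("integer-heavy", integer_heavy), ("balanced", balanced)):
        fixed = throughput(fixed_split(100), demand)
        reconf = throughput(reconfigured_split(100, demand, overhead=0.5), demand)
        print(f"{name}: fixed={fixed:.2f} reconfigured={reconf:.2f}")
```

With 100 tiles and 50% overhead, the integer-heavy program gets 2.5 from the reconfigured chip versus about 1.79 from the fixed split, while the balanced program gets 12.5 versus 25: worse in general, better for the program it is tuned for.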
We are currently prototyping our own design of a reconfigurable manycore processor with a NoC fabric in an FPGA, at around $9 per core. The entry-level systems cost around $80 and $500.
We plan to build this as an ASIC; the price will then drop to $1 per core, including DRAM.
Even cheaper would be to not slice the wafer into 22,000 chips but to leave it whole. You get over 100,000 cores with little memory and a petabits/s network for less than $6000. Half the wafer is reconfigurable logic that can be reprogrammed at runtime as a GPU, TPU, CPU or any other custom-optimised design. The energy cost of the wafer can be zero if you use the wafer as a water-heating element, and lower if you only run it on solar PV during the day. (If you share two wafers with a user on the nighttime side of the Earth, you can both have 24-hour computing on $0.02 per kWh solar PV.) We can make a $500 version in 180nm and a $1 version, but they would not be as good as the $6000 28nm version. A future 7nm wafer-scale version might never be cheaper than the 28nm version; we will have to wait and see.
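The arithmetic behind these figures can be checked with a short Python sketch. The per-core prices come from the text ($9 FPGA, $1 ASIC, $6000 for 100,000 wafer-scale cores); the wafer's power draw is not given, so `watts` is a free, hypothetical input, and the solar-sharing check assumes idealized 12-hour days (sunrise 06:00, sunset 18:00 local) at two sites 12 hours apart.

```python
# Per-core prices quoted in the text.
COST_PER_CORE_USD = {
    "FPGA prototype": 9.00,
    "ASIC incl. DRAM": 1.00,
    "28nm wafer-scale": 6000 / 100_000,  # about $0.06 per core
}

SOLAR_USD_PER_KWH = 0.02


def energy_cost_usd(watts: float, hours: float,
                    usd_per_kwh: float = SOLAR_USD_PER_KWH) -> float:
    """Cost of running a load of `watts` (a hypothetical figure) for `hours`."""
    return watts * hours / 1000.0 * usd_per_kwh


def in_daylight(utc_hour: int, utc_offset: int,
                sunrise: int = 6, sunset: int = 18) -> bool:
    """Idealized 12-hour day: True if local time is between sunrise and sunset."""
    local = (utc_hour + utc_offset) % 24
    return sunrise <= local < sunset


def solar_wafer(utc_hour: int, offsets=(0, 12)):
    """Return the UTC offset of the wafer currently running on sunlight, or None."""
    for off in offsets:
        if in_daylight(utc_hour, off):
            return off
    return None


if __name__ == "__main__":
    for name, usd in COST_PER_CORE_USD.items():
        print(f"{name}: ${usd:.2f} per core")
    # Two sites 12 hours apart: some wafer is on sunlight every hour of the day.
    print("24-hour solar coverage:", all(solar_wafer(h) is not None for h in range(24)))
```

In this idealized model exactly one of the two sites is in daylight at any hour, so both users can always run on whichever wafer is currently on solar power.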
A silicon optical network on the wafer (also $6000) would allow two or more wafers to be networked at several terabits/s. This overcomes the small memory of a single wafer.
Think about it: you want to get rid of every overhead in the system, such as PCBs, chip packages, connectors and cables. You want to put everything on a single large chip: the whole wafer. Because DRAM and processors use different CMOS technologies, we wind up needing one wafer for each.
We are confident that within 30 years we will be able to grow the wafer (or 3D block) and the solar panel from CO2.