korbin's comments

korbin · on June 21, 2024

SHA-256 is designed such that the maximum amount of data that fits in a single block is 440 bits (55 bytes): the 64-byte block also has to hold the mandatory 0x80 padding byte and the 8-byte length field.
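A minimal Python sketch of the padding arithmetic behind that 55-byte limit. The helper names `sha256_pad` and `block_count` are made up for illustration; the padding rule itself (a 0x80 byte, zero fill, then a 64-bit big-endian bit length) is the standard one.

```python
import struct

def sha256_pad(message: bytes) -> bytes:
    """Apply SHA-256 padding: 0x80, zero fill, 64-bit big-endian bit length."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    # Zero-fill until 8 bytes remain in the current 64-byte block.
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)

def block_count(message: bytes) -> int:
    """Number of 64-byte blocks the compression function has to process."""
    return len(sha256_pad(message)) // 64

# 55 bytes still fits in one block; one more byte forces a second block.
assert block_count(b"a" * 55) == 1
assert block_count(b"a" * 56) == 2
```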

If you carefully place the nonce at the end and use all 55 bytes, you can pre-hash roughly the first 20 of the 64 rounds of state, plus the first several rounds of W generation, and base every further iteration on that static value (this is known as a "midstate optimization").
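To make the idea concrete, here is a rough Python sketch under an assumed layout: a fixed 52-byte prefix followed by a 3-byte nonce, so the nonce and the 0x80 padding byte both land in message word W[13]. Under that assumption only rounds 0 to 12 are strictly independent of the nonce, so this does not try to reproduce the ~20-round figure, which presumably depends on the exact layout and on further partial precomputation. The function names are hypothetical, and the round constants are derived from the first 64 primes rather than typed out.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# First 32 fractional bits of the square roots (H0) and cube roots (K).
H0 = tuple(math.isqrt(p << 64) & MASK for p in PRIMES[:8])
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def expand(w16):
    """Expand 16 message words into the full 64-word schedule."""
    w = list(w16)
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def run_rounds(state, w, start, stop):
    """Run compression rounds start..stop-1 on the working variables."""
    a, b, c, d, e, f, g, h = state
    for i in range(start, stop):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + s1 + ch + K[i] + w[i]) & MASK
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return (a, b, c, d, e, f, g, h)

def midstate(prefix: bytes):
    """Done once: rounds 0..12 only consume W[0..12], i.e. the fixed prefix."""
    assert len(prefix) == 52  # assumed layout: 13 fixed words
    w_fixed = list(struct.unpack(">13I", prefix))
    return run_rounds(H0, w_fixed, 0, 13), w_fixed

def hash_with_nonce(mid, w_fixed, nonce: bytes) -> bytes:
    """Done per nonce: resume from the saved state at round 13."""
    assert len(nonce) == 3  # assumed layout: nonce fills bytes 52..54
    w13 = int.from_bytes(nonce + b"\x80", "big")  # nonce plus padding byte
    w = expand(w_fixed + [w13, 0, 55 * 8])        # W[14..15] hold the bit length
    state = run_rounds(mid, w, 13, 64)
    return struct.pack(">8I", *((x + y) & MASK for x, y in zip(H0, state)))

prefix = b"x" * 52
mid, w_fixed = midstate(prefix)
digest = hash_with_nonce(mid, w_fixed, b"ABC")
assert digest == hashlib.sha256(prefix + b"ABC").digest()
```

For simplicity the sketch recomputes the whole message schedule per nonce. In this layout W[16] through W[19] do not depend on W[13] either, so they could also be computed once.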

> If you limit your variable portion to a base16 alphabet like A-P

The more nonce bits you decide to use, the less you can statically pre-hash.

In FPGA, I am using 64-deep, 8-bit-wide memories to do the alphabet expansion. I am guessing in CUDA you could do something similar with `LOP3.LUT`.
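As a software analogue of a 64-deep, 8-bit-wide memory, the sketch below maps each 6-bit slice of a counter to one output byte through a 64-entry table. The table contents (the standard base64 alphabet) and the name `expand_nonce` are assumptions for illustration only, not the alphabet or design actually used.

```python
# Assumed 64-entry table; each entry is one 8-bit character.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
assert len(ALPHABET) == 64

def expand_nonce(counter: int, length: int) -> bytes:
    """Turn a counter into `length` alphabet bytes, 6 counter bits per byte."""
    return bytes(
        ALPHABET[(counter >> (6 * i)) & 0x3F]  # 6-bit address into the table
        for i in reversed(range(length))       # most significant slice first
    )

assert expand_nonce(0, 3) == b"AAA"
assert expand_nonce(1, 3) == b"AAB"
assert expand_nonce(64, 2) == b"BA"
```

Every output byte carries only 6 bits of counter, so a restricted alphabet needs more bytes for the same nonce space.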

korbin · on June 20, 2024

I'm at ~22 GH/s per Xilinx VU9P UltraScale+ FPGA; at least 40 GH/s is possible (for around 250 W/device).

The nonce alphabet being limited makes the whole thing quite a bit more expensive.

jffry · on June 25, 2024

That's a beefy FPGA! Wish I had access to one to revive my 2009-era FPGA coursework knowledge, but it seems the dev boards start at 5 figures, which is too rich for a side project.

korbin · on July 4, 2024

You can obtain them on capable boards on eBay for $600-1000 - https://www.ebay.com/itm/326181455270