Okay, if I understand your proposed PoW algorithm correctly, we are generating pseudo-random values that can range over the address space of the memory size we target, until we generate a duplicate (or triplicate). The Birthday math tells us that we will find a solution with 50% probability after visiting fewer than roughly 10% of the addresses, or with 99% probability after visiting fewer than roughly 20% of the addresses. Thus this is a way to require large memory without actually writing to every memory address, i.e. a sparse memory access algorithm. Btw, I also considered this sort of algorithm before and dumped it.
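To put rough numbers on the Birthday math, here is a minimal sketch using the standard approximation n ≈ sqrt(2·N·ln(1/(1−p))) for the number of uniform draws n needed to see a repeat among N addresses with probability p. The function names and the address-space sizes are my own choices for illustration, not part of the proposed algorithm, and it only covers the duplicate case, not the triplicate one.

```python
import math

def draws_for_collision(num_addresses: int, probability: float) -> float:
    """Approximate number of uniform draws needed to hit a duplicate
    with the given probability (standard birthday approximation)."""
    return math.sqrt(2.0 * num_addresses * math.log(1.0 / (1.0 - probability)))

def fraction_visited(num_addresses: int, probability: float) -> float:
    """Fraction of the address space touched before a duplicate is expected."""
    return draws_for_collision(num_addresses, probability) / num_addresses

if __name__ == "__main__":
    # Hypothetical address-space sizes, chosen only to show the trend.
    for bits in (8, 16, 24, 32):
        n = 1 << bits
        print(f"2^{bits} addresses: "
              f"50% after {fraction_visited(n, 0.5):.6f} of the space, "
              f"99% after {fraction_visited(n, 0.99):.6f} of the space")
```

Under this approximation the 10% and 20% figures behave as upper bounds: the fraction shrinks as the address space grows, because the number of draws scales with the square root of the number of addresses.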
Assuming the generation of the pseudo-random values can be accelerated by parallelism, i.e. that the algorithm is bound by main memory latency, then the GPU is going to be much faster, because the GPU masks memory latency by employing 1000s of threads, i.e. we can test 1000s of pseudo-random values in parallel. This was the key myopia that caused Litecoin to fail at stopping GPUs from outperforming CPUs, although it did narrow the GPU's advantage relative to Bitcoin's PoW.
I don't see how the use of a hash table really changes the outcome significantly, as it will only serialize perhaps every 10 buckets, and we assume the pseudo-random values are uniformly distributed over the entire address space. With 1000 threads, collisions are going to be rare.
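As a rough check on how rare such collisions are, here is a small sketch that applies the same birthday approximation to threads instead of addresses: the chance that at least two of T concurrent threads probe the same bucket out of B. The function name and the bucket counts are hypothetical, and it assumes uniformly distributed probes as stated above.

```python
import math

def prob_any_bucket_conflict(threads: int, buckets: int) -> float:
    """Approximate probability that at least two of `threads` uniform,
    independent probes land in the same one of `buckets` buckets."""
    return 1.0 - math.exp(-threads * (threads - 1) / (2.0 * buckets))

if __name__ == "__main__":
    # Hypothetical table sizes; 1000 threads as in the argument above.
    for bits in (20, 26, 30):
        p = prob_any_bucket_conflict(1000, 1 << bits)
        print(f"1000 threads, 2^{bits} buckets: P(conflict) ~ {p:.6f}")
```

This only estimates how often any two threads would contend for one bucket in a single round of probes; it says nothing about how expensive the resulting serialization would be on real hardware.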
I have revealed my key insight into PoW. If you weren't aware of this and I am correct, I hope you tip me commensurately, regardless of whether it leads you to make changes or not (I should not be responsible for your success, only for proving what doesn't work).
Edited: inserted the missing word "not" after "I should".