Is the code optimized for a single image? So far, at 5 LanPaint steps, it takes around 36 minutes on a 4090 with 256 GB of RAM.