I agree, I would love a mono version of this sensor, though sadly that seems unlikely.

Regarding the pixel size, what you wrote is more or less what I had decided is most likely right, with one exception: each colour "pixel" is actually a 2x2 block of same-colour pixels that has already been binned in software. The physical pixel size is about 2.31 µm.
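
To make the binning concrete, here is a rough sketch of what I mean, assuming the raw mosaic is laid out so that every aligned 2x2 block holds four sub-pixels of the same colour. The function names are my own, and averaging (rather than summing) is an assumption on my part, not something taken from the Sony documentation.

```python
PHYSICAL_PITCH_UM = 2.31  # physical sub-pixel size quoted above


def bin_quad_bayer(raw):
    """Average each aligned 2x2 same-colour block into one output value.

    raw is a list of rows (lists of numbers) with even height and width.
    Averaging is an assumption; a real pipeline might sum instead.
    """
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("mosaic height and width must both be even")
    binned = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            total = raw[y][x] + raw[y][x + 1] + raw[y + 1][x] + raw[y + 1][x + 1]
            row.append(total / 4)
        binned.append(row)
    return binned


def effective_pitch_um(physical_pitch_um=PHYSICAL_PITCH_UM, bin_factor=2):
    """Pitch of one binned "pixel": bin_factor sub-pixels along each side."""
    return physical_pitch_um * bin_factor
```

With a 2x2 bin, the effective pitch comes out at roughly twice the physical one, so about 4.62 µm from the 2.31 µm figure.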

The Sony literature shows the tricks that configuration allowed in its original life as a security camera sensor. It can make HDR frames in a single exposure by exposing the sub-pixels in pairs rather than as the full 2x2 "pixel". Each pair is exposed for a different time, but both exposures start concurrently, which reduces the blurring of moving objects that HDR normally produces.
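
As a very rough illustration of how two differently exposed pairs inside one 2x2 block could be combined, here is a sketch. It is not Sony's actual algorithm; the function name, the 12-bit saturation level and the simple "use the long pair unless it has clipped" rule are all my own assumptions.

```python
def merge_block_hdr(long_pair, short_pair, exposure_ratio, saturation=4095):
    """Combine one 2x2 block read out as a long pair and a short pair.

    long_pair / short_pair: the two sub-pixel values from each exposure.
    exposure_ratio: long exposure time divided by short exposure time.
    saturation: assumed clipping level (12-bit here, purely hypothetical).
    The result is expressed in long-exposure units.
    """
    long_mean = sum(long_pair) / 2
    short_mean = sum(short_pair) / 2
    if max(long_pair) < saturation:
        # Long exposure is not clipped, so it has the better signal to noise.
        return long_mean
    # Long exposure has clipped: scale the short exposure up to match.
    return short_mean * exposure_ratio
```

For example, merge_block_hdr((4095, 4095), (2000, 2002), 4.0) falls back to the short pair and returns a value well above the clipping level, which is where the extra dynamic range comes from.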
