Floating point numbers and colors…

… and why the idea of mapping a floating point color value of 1.0 to the byte value 255 was a really bad idea.

We’re stuck with it, I know, and I found a way to hack around the problem. Still, it’s one of those things that may have seemed like a good idea at the time; the sensible thing would have been to map 1.0 to 256 – a lot of things would have worked a lot nicer that way.

This all started with me messing around with shaders in Unity. I needed to pass a bunch of data from the Unity application all the way down to the fragment shader, and decided that the best way to do this was to encode it in a texture – I mean 4 floats per pixel and relatively cheap access from the shader – what could go wrong?

The first couple of issues I ran into came from a lot of sub-systems’ over-zealous desire to “improve” my texture before handing it over. No thank you, I do not want that pixel averaged and mipmapped. This was mostly a matter of setting the right options and properties, so I got that sorted.

The real problem, however, was that I somehow managed to completely ignore what a texture really is: an array of 8-bit color values. While we may see pixel values as a nice vector of 4 floats in both Unity and the shader code, the color itself spends a small portion of its life in the cramped space of just 4 bytes.

And here comes the mapping issue: because 1.0 is mapped to 255, the nice round floating point value of 0.5 does not become a nice round byte value of 128. It becomes 127.5. You cannot store 127.5 in a byte, so it ends up as 127. When you convert 127 back to a float it, in turn, ends up as 127/255 = 0.498..something. That is not just off from the 0.5 I was hoping for, it is also rounded down, so when I try to use it to find a tile in a texture, I end up in the wrong tile.
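The round trip can be replayed outside Unity with a few lines of Python. This is only a sketch: the helper names `float_to_byte` and `byte_to_float` are made up for the illustration, and it assumes the float-to-byte step simply truncates, as described above (a real pipeline may round differently).

```python
import math


def float_to_byte(value: float) -> int:
    """Store a [0, 1] channel in 8 bits, assuming plain truncation."""
    return int(value * 255)


def byte_to_float(byte: int) -> float:
    """Read the 8-bit channel back as a float in [0, 1]."""
    return byte / 255


stored = float_to_byte(0.5)        # 0.5 * 255 = 127.5, truncated to 127
restored = byte_to_float(stored)   # 127 / 255, slightly below 0.5
tile = math.floor(restored * 16)   # intended tile is 8, but this gives 7

print(stored, restored, tile)
```

With 16 tiles, the value that was meant to select tile 8 selects tile 7 instead.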


What I really needed to store was the integer values from 0 to 15, so here is my solution:

In Unity: color.r = N / 16f + N / 4080f;

In my shader: N = floor(color.r * 16);

Not rocket science; I convert the number to a float in the range [0;1], and add a magic offset. N/4080 is the smallest number that, after being converted to a byte, will not cause the conversion back to float to be rounded down. This number depends on N due to the nature of FP numbers. Why the constant is 4080 I don’t know, but I’m sure there’s some IEEE FP guru somewhere who can enlighten me.
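For what it's worth, the constant may have a simple origin: 4080 is 16 × 255, so N/16 + N/4080 simplifies to 16N/255, and multiplying by 255 lands exactly on the byte value 16N. The sketch below checks the round trip for every N from 0 to 15. It is a model, not the Unity code path: the function names are hypothetical, the byte conversion is assumed to truncate, and exact fractions stand in for the 32-bit floats a GPU would use, so that the check is not itself disturbed by rounding.

```python
from fractions import Fraction
from math import floor


def to_byte(channel: Fraction) -> int:
    """Model the texture write: [0, 1] channel to 8 bits, assuming truncation."""
    return floor(channel * 255)


def decode(byte: int) -> int:
    """Model the shader side: floor(color.r * 16) on the restored channel."""
    return floor(Fraction(byte, 255) * 16)


def encode_naive(n: int) -> Fraction:
    """Plain N/16, without the offset."""
    return Fraction(n, 16)


def encode_offset(n: int) -> Fraction:
    """N/16 plus the N/4080 offset; equal to 16N/255."""
    return Fraction(n, 16) + Fraction(n, 4080)


naive = [decode(to_byte(encode_naive(n))) for n in range(16)]
offset = [decode(to_byte(encode_offset(n))) for n in range(16)]
stored_bytes = [to_byte(encode_offset(n)) for n in range(16)]

print("naive :", naive)
print("offset:", offset)
print("bytes :", stored_bytes)
```

Under these assumptions the naive encoding comes back one too low for every N from 1 to 15, while the offset version returns each N unchanged and stores it as the byte 16N.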

For now, I am just happy that my textures are tiling properly 🙂
