jes notes Index Gallery .

Clocks ▼

House ▼

Ideas ▼

Machining ▼

Signed Distance Functions Shaft passers Snap issues

2025-03-18

Last modified: 2025-03-18 23:00:58

< 2025-03-17 2025-03-19 >

Shader compilation time
Multisampling
Peptide

Shader compilation time

This may be the sort of thing where I have to work on something else and hope to solve it by accident, so I probably shouldn't put off writing the interval arithmetic stuff much longer.

But today I do want to have one more go at figuring out why my shaders take so long to compile, or what I can do about it.

The first thing is coming up with a way to ensure that when I compile a shader it is actually recompiling it rather than using a cached result. So I think my map() function wants to permute the distance by some compile-time constant? Is that enough?

 float map(vec3 p) {
     p = rotatePoint(p);
     return ${document.shaderCode()} + ${Math.random()};
 }

For a document that is just a single triangle SketchNode, it compiles in 1323 ms (best case out of 6 or so refreshes).

With the in/out test removed, this goes down to 652 ms.

All 3 tests: 1323 ms
Only first 2 tests: 1103 ms
Only final test: 901 ms
No tests: 652 ms

In each case reporting the best result out of a bunch of refreshes. (Rationale for taking the "best" is that other random software stuff can slow it down but can't speed it up, so the fastest time is most representative of the time that is purely spent on compiling the shader).

So it looks like the compile time grows by about 225 ms per in/out test, apparently linearly.

The test is:

  s *= 1.0 - 2.0 * float((p2d.y >= va.y && p2d.y < vb.y && ds.y > 0.0) ||
                    (p2d.y < va.y && p2d.y >= vb.y && ds.y <= 0.0));

I just don't see how it takes 225 ms to compile that.

What if I run the test twice each time? It will always be a no-op because 1.0*1.0 == -1.0*-1.0, but I wonder how it affects compile time.

On the theory above we would expect about 650 + 6 * 225 = 2000 ms.

Best case I saw was 1773 ms! Which is not super far off.

And what if we make a square instead of a triangle? We'd expect maybe a bit more than 650 ms, plus 4 times 225 = maybe 1600 ms?

Best case was 1777 ms, which is curiously close to 1773 ms before. Is there some fixed-duration optimisation process or something, and it increases the amount of time it spends optimising based on the size of your shader?

ChatGPT told me about something called WEBGL_debug_shaders which can tell you what the GLSL of your "translated" shader looks like. Which is interesting because I didn't know it was going to be "translated". I think it basically looks like my original shader but with all the variables renamed like "webgl_2df94a985cc6c16b".

https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_debug_shaders/getTranslatedShaderSource

ChatGPT suggested rewriting the in/out test as:

  float signCheck = step(va.y, p2d.y) * step(p2d.y, vb.y) * step(0.0, ds.y) +
              step(vb.y, p2d.y) * step(p2d.y, va.y) * step(ds.y, 0.0);
  s *= 1.0 - 2.0 * signCheck;

And... it now compiles in 175 ms?? I'm pretty sure I tried this before so I won't count my chickens too early, but that is looking promising.

It does seem to cause a slight visual glitch though, I think because it has basically turned a "<=" into a "<" or something. I guess I subtract eps from the first argument?

Oh! A much worse problem is that now if both sides of the disjunction are true then we get signCheck == 2.0.

This works:

  float signCheck = step(va.y, p2d.y) * step(p2d.y, vb.y-eps) * step(0.0, ds.y) +
              step(vb.y, p2d.y) * step(p2d.y, va.y-eps) * step(ds.y, 0.0);
  s = mix(s, -s, min(1.0, signCheck));

Great, so now the default object compiles in 1000 ms instead of 1900 ms. So there is more to do. But the document with just a triangle sketch is down from 1323 ms to 176 ms.

So what else in the default document is slow?

Starting from 1000 ms, how much do I lose when I delete each thing:

Sketch: 200 ms
Box: 150 ms
Extrude: 140 ms
DistanceDeform: 130 ms
Transform (rotate): 120 ms
Sphere: 120ms
Torus: 75 ms
Thickness: 20 ms
Transform (translate): ~0 ms

So the Sketch is still the most expensive part.

OK, and if I start with a blank document, how much does compilation time grow when I add each thing?

empty: 9 ms
Box: 215 ms
Box+Sphere: 277 ms
Sphere: 184 ms
Torus: 198 ms
Box+Torus: 324 ms
Sphere+Torus: 264 ms
Box+Sphere+Torus: 391 ms

A union of 0 objects is a no-op. Union of 1 object is free because the generated code is a straight pass-through. Union of 2 objects is one min() operation, and union of 3 objects is 2 min() operations.

But somehow adding a second object increases the overhead by less than adding the first object did.

It's pretty crazy that a document with just a sphere takes 184 ms to compile when the empty document only takes 9 ms, because the sphere code is just length(p) - r.

It would be really good if I could somehow statically compile the ray marching code and lighting code etc., and just recompile the SDF when it changes. Isn't that what gl.linkProgram() is meant to be for? But I couldn't get it to work last time I tried.

Cursor insists that it is meant to be possible.

I think it's not possible. The closest you could do is maybe express the SDF as a huge collection of uniforms?

Default scene is taking 5712 ms on laptop now. I'm going to look at deleting the secondary object code so that we can do the thing of drawing the secondary object with a second render pass of broadly the same shader.

Still takes about the same amount of time to compile the main one. How is that possible!

Deleting the in/out test in SketchNode brings it down to 3710 ms. So that is still massively expensive, for whatever reason.

If I comment out the edge detection code then it comes down to 823 ms! OK so that is part of the problem. Maybe if all the function calls in shaders are basically completely unrolled? detectEdge() makes 6 calls to calcNormal(), and calcNormal() makes 6 calls to map(). If that is how it works then we'd get 36 extra copies of the contents of map().

Instead of having calcNormal() look at 6 points nearby, and detectEdge() call calcNormal() for 6 points nearby, I think we could just have calcNormalAndEdge() look at 13 points in total, and calculate both the normal and the edge information from those 13 points.

So in one dimension, points at -2, -1, 0, 1, 2. And then the normal is based on the difference between points -1 and 1. And the edge is based on the difference from -2 to 0 and from 0 to 2.

Actually that still turns into like 20 evaluations in 3 dimensions.

How else could we detect edges? I think an edge is somewhere that has a discontinuous normal.

What if we compare "central differences" and "forward differences" in each direction? Then we don't need any more evaluations than we're already doing for the normals.

"Discontinuities" in the first derivative are characterised by very large values in the second derivative.

OK, so we can actually do it with just the 6 evaluations for central differences for the normal, plus the one evaluation for the actual point (which will be approximately 0.0000, but best to have the true value).

And now the default scene compiles in 807 ms on the laptop. So this is a big improvement.

One issue is we want a fixed offset for the normal calculation but a scale-dependent one for the edge detection. Never mind?

Multisampling

The multisampling doesn't actually seem to be working very well, it is giving jaggy edges even though resolution scale is about 7.5.

Do I need to make the canvas bigger than the viewport in order to get the benefit?

Peptide

With shader compile times "solved" to an extent, I am ready to start working on Peptide.

So the plan is:

make a tree structure of arithmetic expressions
evaluate the trees in JavaScript (and unit tests)
compile the trees to GLSL (and unit tests)
make Isoform use Peptide to convert SDFs to GLSL, and to show coordinates under cursor
evaluate the trees in JavaScript with interval arithmteic (and unit tests)
compile the trees to GLSL with interval arithmetic (and unit tests)
make Isoform use binary search over interval for rendering

OK, I have a basic version working with evaluation in JavaScript, with unit tests. But it only supports scalar values, need to make it support vectors.

Cool, now has vectors, and has methods to construct vectors, and select components from vectors.

I could start with compiling to GLSL by compiling to JavaScript instead and then running the same test suite on it. It might also be useful to have a "compiled" JavaScript form of the SDF as it will probably be more efficient to evaluate than the tree is.

So:

Turn the "tree" into a DAG by unifying identical subtrees
DFS the DAG and create static single assignment form
"register allocation"
codegen

For a first pass we can simplify translate SSA straight into JavaScript.

OK, I have it in SSA in JavaScript, and with a greedy register allocation algorithm, and generating equivalent GLSL code as well. And eliminating common subexpressions.

So next up I guess is working out how to unit test the GLSL code.

And then we can look at doing interval arithmetic, and skipping unused branches of min/max.

I think for testing GLSL code we want to run it on a fragment with a single pixel and then read out the colour that it draws?

Actually a good first test would be to check that it compiles.

< 2025-03-17 2025-03-19 >