Last modified: 2025-03-18 16:05:21
< 2025-03-17
This may be the sort of thing where I have to work on something else and hope to solve it by accident, so I probably shouldn't put off writing the interval arithmetic stuff much longer.
But today I do want to have one more go at figuring out why my shaders take so long to compile, or what I can do about it.
The first thing is coming up with a way to ensure that when I compile a shader it is actually recompiling it rather than using a cached result. So I think my map() function wants to perturb the distance by some compile-time constant? Is that enough?
float map(vec3 p) {
    p = rotatePoint(p);
    return ${document.shaderCode()} + ${Math.random()};
}
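A minimal sketch of one way to generate that cache-busting map(), assuming the shader source is assembled as a JavaScript string. The helper name buildMapSource and the 1e-6 scale are made up for illustration. Scaling the random value down keeps it from visibly moving the surface, and toFixed() guarantees the literal always contains a decimal point, so it can never come out as the integer literal "0".

```javascript
// Hypothetical helper (not from the original code): wraps an SDF expression
// in map() with a unique constant so identical source is never compiled twice.
function buildMapSource(sdfExpression, nonce = Math.random()) {
  // Assumption: an offset below 1e-6 is too small to matter visually.
  // toFixed(12) always yields a literal with a decimal point, e.g. "0.000000500000".
  const literal = (nonce * 1e-6).toFixed(12);
  return [
    "float map(vec3 p) {",
    "    p = rotatePoint(p);",
    `    return ${sdfExpression} + ${literal};`,
    "}",
  ].join("\n");
}
```

Any change to the source text should be enough to miss a cache keyed on the source, so the size of the constant should not matter for that purpose.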
For a document that is just a single triangle SketchNode, it compiles in 1323 ms (best case out of 6 or so refreshes).
With the in/out test removed, this goes down to 652 ms.
In each case I'm reporting the best result out of a bunch of refreshes. (The rationale for taking the "best" is that other random software stuff can slow it down but can't speed it up, so the fastest time is most representative of the time spent purely on compiling the shader.)
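The best-of-N idea could be automated along these lines. This is only a sketch: timeOnce and bestOf are hypothetical names, and the work callback is assumed to do the whole compile and link, including something that forces it to finish (for example reading the link status back), otherwise the timer may stop before the driver has done the work.

```javascript
// Hypothetical helper: time one synchronous piece of work in milliseconds.
function timeOnce(work) {
  const start = performance.now();
  work();
  return performance.now() - start;
}

// Hypothetical helper: repeat a measurement and keep the fastest result,
// on the theory that noise can only add time, never remove it.
// `measure` is any function returning one elapsed time in milliseconds.
function bestOf(measure, runs = 6) {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    best = Math.min(best, measure());
  }
  return best;
}
```

Usage would look something like bestOf(() => timeOnce(compileMyShader), 6), where compileMyShader stands in for whatever builds the program.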
So it looks like the compile time grows by about 225 ms per in/out test, apparently linearly.
The test is:
s *= 1.0 - 2.0 * float((p2d.y >= va.y && p2d.y < vb.y && ds.y > 0.0) ||
                       (p2d.y < va.y && p2d.y >= vb.y && ds.y <= 0.0));
I just don't see how it takes 225 ms to compile that.
What if I run the test twice each time? It will always be a no-op because 1.0*1.0 == -1.0*-1.0, but I wonder how it affects compile time.
On the theory above we would expect about 650 + 6 * 225 = 2000 ms.
Best case I saw was 1773 ms! Which is not super far off.
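The arithmetic behind that estimate, as a small sketch. It assumes the triangle runs 3 in/out tests (one per edge) and that the cost really is linear in the number of tests. The function names are made up.

```javascript
// Per-test cost estimated from two measurements: with and without the tests.
function perTestCost(withTestsMs, withoutTestsMs, testCount) {
  return (withTestsMs - withoutTestsMs) / testCount;
}

// Linear prediction: fixed base cost plus a constant cost per test.
function predictCompileMs(baseMs, perTestMs, testCount) {
  return baseMs + perTestMs * testCount;
}

// Numbers from the measurements above: 1323 ms with 3 tests, 652 ms with none.
const perTest = perTestCost(1323, 652, 3);              // about 224 ms
const predictedDoubled = predictCompileMs(652, perTest, 6); // about 1994 ms
```

The measured 1773 ms is roughly 11% under that prediction, so the linear model is in the right region but not exact.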
And what if we make a square instead of a triangle? We'd expect maybe a bit more than 650 ms, plus 4 times 225 = maybe 1600 ms?
Best case was 1777 ms, which is curiously close to 1773 ms before. Is there some fixed-duration optimisation process or something, and it increases the amount of time it spends optimising based on the size of your shader?
ChatGPT told me about something called WEBGL_debug_shaders which can tell you what the GLSL of your "translated" shader looks like. Which is interesting because I didn't know it was going to be "translated". I think it basically looks like my original shader but with all the variables renamed like "webgl_2df94a985cc6c16b".
https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_debug_shaders/getTranslatedShaderSource
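A minimal sketch of reading the translated source. The wrapper name translatedSource is made up; the extension name and getTranslatedShaderSource() are the ones documented at the link above. The extension is not available in every browser, so the sketch returns null when getExtension() gives nothing back.

```javascript
// Returns the browser's translated GLSL for an already-compiled shader,
// or null if the WEBGL_debug_shaders extension is not available.
function translatedSource(gl, shader) {
  const ext = gl.getExtension("WEBGL_debug_shaders");
  if (!ext) {
    return null;
  }
  return ext.getTranslatedShaderSource(shader);
}
```

The shader needs to have been compiled first; per the documentation, an empty string comes back otherwise.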
ChatGPT suggested rewriting the in/out test as:
float signCheck = step(va.y, p2d.y) * step(p2d.y, vb.y) * step(0.0, ds.y) +
                  step(vb.y, p2d.y) * step(p2d.y, va.y) * step(ds.y, 0.0);
s *= 1.0 - 2.0 * signCheck;
And... it now compiles in 175 ms?? I'm pretty sure I tried this before so I won't count my chickens too early, but that is looking promising.
It does seem to cause a slight visual glitch though, I think because it has basically turned a "<" into a "<=" or something (step() is inclusive on both sides). I guess I subtract eps from the first argument?
Oh! A much worse problem is that now if both sides of the disjunction are true then we get signCheck == 2.0.
This works:
float signCheck = step(va.y, p2d.y) * step(p2d.y, vb.y-eps) * step(0.0, ds.y) +
                  step(vb.y, p2d.y) * step(p2d.y, va.y-eps) * step(ds.y, 0.0);
s = mix(s, -s, min(1.0, signCheck));
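To see where the 2.0 comes from, here is the same logic emulated in JavaScript, with step() and mix() written to follow the GLSL definitions. This is a sketch on scalar y values only; the parameter names mirror the shader but the functions are made up. With eps = 0, both products are 1 when p2d.y, va.y and vb.y are all equal and ds.y is 0 (a horizontal edge sampled exactly at its height). With eps > 0 the two terms can no longer both be 1, and the min() clamp guards against it anyway.

```javascript
// GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0.
function step(edge, x) {
  return x >= edge ? 1.0 : 0.0;
}

// GLSL mix(a, b, t): linear blend from a (t = 0) to b (t = 1).
function mix(a, b, t) {
  return a * (1 - t) + b * t;
}

// Emulates the step()-based in/out test for one edge (y components only).
function signCheck(vaY, vbY, pY, dsY, eps) {
  return step(vaY, pY) * step(pY, vbY - eps) * step(0.0, dsY) +
         step(vbY, pY) * step(pY, vaY - eps) * step(dsY, 0.0);
}

// Flips the sign of s when the check fires, clamped so 2.0 behaves like 1.0.
function flipSign(s, check) {
  return mix(s, -s, Math.min(1.0, check));
}
```

For example, signCheck(0.5, 0.5, 0.5, 0.0, 0.0) gives 2, which in the unclamped version multiplies s by 1.0 - 2.0 * 2.0 = -3.0 instead of -1.0.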
Great, so now the default object compiles in 1000 ms instead of 1900 ms. So there is more to do. But the document with just a triangle sketch is down from 1323 ms to 176 ms.
So what else in the default document is slow?
Starting from 1000 ms, how much do I lose when I delete each thing:
So the Sketch is still the most expensive part.
OK, and if I start with a blank document, how much does compilation time grow when I add each thing?
A union of 0 objects is a no-op. Union of 1 object is free because the generated code is a straight pass-through. Union of 2 objects is one min() operation, and union of 3 objects is 2 min() operations.
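A sketch of how a union node might generate that code, assuming each child has already been turned into a GLSL expression string. The function name unionCode is hypothetical, and returning null for the empty case is an assumption about what "no-op" means here.

```javascript
// Hypothetical code generator: combines child SDF expressions with min().
// n children produce n - 1 nested min() calls.
function unionCode(childExpressions) {
  if (childExpressions.length === 0) {
    return null; // assumption: nothing to emit for an empty union
  }
  // reduce() with no initial value returns a single child unchanged,
  // which is the "straight pass-through" case.
  return childExpressions.reduce((acc, expr) => `min(${acc}, ${expr})`);
}
```

So unionCode(["a", "b", "c"]) produces "min(min(a, b), c)".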
But somehow adding a second object increases the overhead by less than adding the first object did.
It's pretty crazy that a document with just a sphere takes 184 ms to compile when the empty document only takes 9 ms, because the sphere code is just length(p) - r.
It would be really good if I could somehow statically compile the ray marching code and lighting code etc., and just recompile the SDF when it changes. Isn't that what gl.linkProgram() is meant to be for?
But I couldn't get it to work last time I tried.
Cursor insists that it is meant to be possible.
I think it's not possible. The closest you could do is maybe express the SDF as a huge collection of uniforms?
Default scene is taking 5712 ms on laptop now. I'm going to look at deleting the secondary object code so that we can do the thing of drawing the secondary object with a second render pass of broadly the same shader.
Still takes about the same amount of time to compile the main one. How is that possible!
Deleting the in/out test in SketchNode brings it down to 3710 ms. So that is still massively expensive, for whatever reason.
If I comment out the edge detection code then it comes down to 823 ms! OK so that is part of the problem. Maybe all the function calls in shaders are basically completely unrolled?
detectEdge() makes 6 calls to calcNormal(), and calcNormal() makes 6 calls to map().
If that is how it works then we'd get 36 extra copies of the contents of map().
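The copy count under that theory can be worked out from the call graph. This sketch assumes every call is inlined in full, and the callCounts structure and function name are made up for illustration. GLSL does not allow recursion, so the walk always terminates.

```javascript
// Counts how many copies of `target` end up inside `from` if every call
// is inlined. callCounts maps each function to its callees and call counts.
function inlinedCopies(callCounts, from, target) {
  if (from === target) {
    return 1;
  }
  let total = 0;
  for (const [callee, count] of Object.entries(callCounts[from] || {})) {
    total += count * inlinedCopies(callCounts, callee, target);
  }
  return total;
}

// The call structure described above.
const shaderCalls = {
  detectEdge: { calcNormal: 6 },
  calcNormal: { map: 6 },
  map: {},
};
```

Here inlinedCopies(shaderCalls, "detectEdge", "map") comes out as 6 * 6 = 36.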
Instead of having calcNormal() look at 6 points nearby, and detectEdge() call calcNormal() for 6 points nearby, I think we could just have calcNormalAndEdge() look at 13 points in total, and calculate both the normal and the edge information from those 13 points.
So in one dimension, points at -2, -1, 0, 1, 2. And then the normal is based on the difference between points -1 and 1. And the edge is based on the difference from -2 to 0 and from 0 to 2.
Actually that still turns into like 20 evaluations in 3 dimensions.
How else could we detect edges? I think an edge is somewhere that has a discontinuous normal.
What if we compare "central differences" and "forward differences" in each direction? Then we don't need any more evaluations than we're already doing for the normals.
"Discontinuities" in the first derivative are characterised by very large values in the second derivative.
OK, so we can actually do it with just the 6 evaluations for central differences for the normal, plus the one evaluation for the actual point (which will be approximately 0.0000, but best to have the true value).
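A sketch of the 7-evaluation idea in JavaScript rather than GLSL, so it can be checked against simple test SDFs. This is not necessarily what the shader does: the function name, the use of the largest absolute second difference as the edge measure, and the single shared offset h are all assumptions. Per axis, the central difference (f(x+h) - f(x-h)) / 2h gives the normal component, and the second difference (f(x+h) - 2 f(x) + f(x-h)) / h^2 is proportional to the gap between the forward and central differences.

```javascript
// Estimates the surface normal and an edge measure from 7 SDF evaluations:
// one at the point itself and one at +h and -h along each axis.
// sdf takes a point [x, y, z] and returns a distance.
function normalAndEdge(sdf, p, h) {
  const center = sdf(p);
  const gradient = [];
  let edge = 0;
  for (let axis = 0; axis < 3; axis++) {
    const plus = p.slice();
    plus[axis] += h;
    const minus = p.slice();
    minus[axis] -= h;
    const fPlus = sdf(plus);
    const fMinus = sdf(minus);
    // Central difference: first derivative along this axis.
    gradient.push((fPlus - fMinus) / (2 * h));
    // Second difference: large where the first derivative jumps.
    const second = (fPlus - 2 * center + fMinus) / (h * h);
    edge = Math.max(edge, Math.abs(second));
  }
  const length = Math.hypot(gradient[0], gradient[1], gradient[2]) || 1;
  return { normal: gradient.map((g) => g / length), edge };
}
```

On a flat plane the edge measure is 0, while on the crease of max(x, y) it comes out as 1/h, so it grows without bound as h shrinks. A curved surface such as a sphere gives a small but non-zero value, so any threshold would need to depend on scale.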
And now the default scene compiles in 807 ms on the laptop. So this is a big improvement.
One issue is we want a fixed offset for the normal calculation but a scale-dependent one for the edge detection. Never mind?
The multisampling doesn't actually seem to be working very well, it is giving jaggy edges even though resolution scale is about 7.5.
Do I need to make the canvas bigger than the viewport in order to get the benefit?
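For reference, a canvas has two sizes: the CSS size it is displayed at, and the drawing buffer size set by its width and height attributes. A sketch of sizing the drawing buffer from the displayed size, assuming "resolution scale" means drawing-buffer pixels per CSS pixel (the function name is made up):

```javascript
// Sets the canvas drawing buffer to `scale` times its displayed CSS size.
// Returns the new buffer size so the caller can pass it to gl.viewport().
function resizeDrawingBuffer(canvas, scale) {
  const width = Math.max(1, Math.round(canvas.clientWidth * scale));
  const height = Math.max(1, Math.round(canvas.clientHeight * scale));
  // Assigning width or height clears the canvas, so only do it on change.
  if (canvas.width !== width || canvas.height !== height) {
    canvas.width = width;
    canvas.height = height;
  }
  return { width, height };
}
```

If the drawing buffer is left at the displayed size, rendering more samples per pixel would have to happen inside the shader instead.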
With shader compile times "solved" to an extent, I am ready to start working on Peptide.
So the plan is:
OK, I have a basic version working with evaluation in JavaScript, with unit tests. But it only supports scalar values, need to make it support vectors.
Cool, now has vectors, and has methods to construct vectors, and select components from vectors.
I could start with compiling to GLSL by compiling to JavaScript instead and then running the same test suite on it. It might also be useful to have a "compiled" JavaScript form of the SDF as it will probably be more efficient to evaluate than the tree is.
So:
For a first pass we can simply translate SSA straight into JavaScript.
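A sketch of what that translation could look like, assuming an SSA form that is just an ordered list of instructions, each defining one named value. The instruction shape ({ name, op, args }), the op names and the function name are all hypothetical and not Peptide's actual representation. Each instruction becomes one const declaration and the last value is returned.

```javascript
// Translates a list of SSA instructions into a JavaScript function.
// paramNames are the free inputs; the last instruction is the result.
function ssaToJavaScript(instructions, paramNames) {
  const lines = instructions.map(({ name, op, args }) => {
    switch (op) {
      case "const": return `const ${name} = ${args[0]};`;
      case "add": return `const ${name} = ${args[0]} + ${args[1]};`;
      case "sub": return `const ${name} = ${args[0]} - ${args[1]};`;
      case "mul": return `const ${name} = ${args[0]} * ${args[1]};`;
      case "sqrt": return `const ${name} = Math.sqrt(${args[0]});`;
      default: throw new Error(`unknown op: ${op}`);
    }
  });
  const result = instructions[instructions.length - 1].name;
  return new Function(...paramNames, `${lines.join("\n")}\nreturn ${result};`);
}

// Example: a unit sphere, sqrt(x*x + y*y + z*z) - 1, in scalar SSA form.
const sphereSsa = [
  { name: "v0", op: "mul", args: ["x", "x"] },
  { name: "v1", op: "mul", args: ["y", "y"] },
  { name: "v2", op: "mul", args: ["z", "z"] },
  { name: "v3", op: "add", args: ["v0", "v1"] },
  { name: "v4", op: "add", args: ["v3", "v2"] },
  { name: "v5", op: "sqrt", args: ["v4"] },
  { name: "v6", op: "const", args: [1] },
  { name: "v7", op: "sub", args: ["v5", "v6"] },
];
```

The compiled function can then go through the same unit tests as the tree evaluator, for example ssaToJavaScript(sphereSsa, ["x", "y", "z"])(3, 4, 0) evaluates to 4.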