Last modified: 2025-02-18 22:30:47
< 2025-02-15 2025-02-19 >So the latest on this is I have a version of FreeCAD
build and run from inside the ubuntu-gpu
distrobox.
So then what do I want to do to it?
My sense is that pattern operations are kind of quadratic-time. If you imagine the first one cuts a shape out of the body by intersecting it, and then the second one has a more complicated body to intersect against, and the third one has an even more complicated body to intersect against. Like if intersection is linear in the number of faces or whatever, and each item of the pattern increases the number of faces by some fixed amount, then the overall pattern operation is quadratic.
So maybe I start by doing a slow pattern operation with callgrind, and see where it is slow?
FreeCAD is very slow in callgrind.
This is the object I'm using:
It is a rectangle padded up, then a circle pocketed in, and PolarPattern around Z axis. And I exercise the patterning code multiple times by repeatedly clicking the "up" arrow in the PolarPattern form to make it re-run the calculation with 1 more item each time.
Then eventually just close FreeCAD, then look in KCacheGrind.
Initial map is:
I think maybe too much of the profile was spent on startup time, so it's not highlighting whatever is slow in patterning. According to ChatGPT I can launch with
$ valgrind --tool=callgrind --instr-atstart=no bin/FreeCAD
And then use:
$ callgrind_control -i on $pid
to start profiling. Let's give it a go.
Now we have:
From a quick look I think all the slow stuff is inside OpenCascade. Perhaps I need to be building my own OpenCascade. And in fact this might be helpful: if I can work out what it is doing with OpenCascade that is slow, then I can make a minimal test program that does the same thing, and skip the whole FreeCAD layer entirely.
I have found the code that implements the PolarPattern feature, the important bit is
in Transformed::execute()
in src/Mod/PartDesign/App/FeatureTransformed.cpp
. There is
code that loops through all the "original tool shapes" and transforms them according to
the pattern and applies them again. So yes my earlier speculation about how it would
end up being quadratic time was basically right.
The quickest thing I can do is make it break the loop early if it gets a SIGHUP, and then also make the GUI level fork a child process which just shows a window with a single button on it which sends a SIGHUP.
And once the concept is proven we could maybe do the same thing but using threads instead of forking, if it matters.
Another idea is to make this loop run a tick of the event loop on every iteration, not sure if that is possible and then forget about threads.
But first: build it, and see if manually sending a SIGHUP interrupts the operation.
But somehow my distrobox is broken now. cmake
wasn't installed, I think because I had
already built FreeCAD by the time I created the GPU distrobox, so I only installed
the runtime dependencies, but now if I try to install cmake
I get:
jes@ubuntu-gpu:~$ sudo apt install cmake
...
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies.
cmake : Depends: cmake-data (= 3.28.3-1build7) but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
jes@ubuntu-gpu:~$ sudo apt --fix-broken install
...
Unpacking cmake-data (3.28.3-1build7) ...
dpkg: error processing archive /var/cache/apt/archives/cmake-data_3.28.3-1build7_all.deb (
--unpack):
unable to make backup link of './usr/share/cmake-3.28/Modules/Compiler/NVIDIA-CUDA.cmake'
before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
/var/cache/apt/archives/cmake-data_3.28.3-1build7_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
Maybe the plan is to still use the old distrobox for building it and just use the GPU one for running it? That works.
I tried making the signal handler set a volatile int
, but it didn't seem to be working.
Claude suggested making it a std::atomic
or something.
OK, I added some logging to say what the loop iterations are doing, and it looks like it only does one iteration? I guess it is looping over the bodies that I'm cutting away, so I am handling SIGHUP in the wrong place. There is a more-inner loop elsewhere.
It's getTransformedCompShape
. This function is the one that iterates over your
transformations. So I need to make this one stop iterating if you get a SIGHUP,
and then make the function that calls it give up and return early if it got a SIGHUP.
Seems to be working!
And if the processing gets far enough that getTransformedCompShape()
completes and
OpenCascade starts applying your transformation, then I guess there's nothing you can do?
I've switched to SIGUSR1.
https://github.com/FreeCAD/FreeCAD/commit/20f2f0a6afa7c4dcdaa373febe9b5b8f141d3e42
Next up is making some sort of zenity
UI pop up or something if it isn't finished
within a few seconds.
And maybe I could abstract this into some sort of "Abortable Operation" function,
instead of special-casing it in FeatureTransformed.cpp
.
OK, so now we're at:
FeatureTransformed.cpp
to launch a Qt dialog doesn't work because PartDesign is a standalone .so file that isn't linked with all the Qt widget stuffSo we either need to move the GUI creation stuff further up the call stack, or else (better) run the pattern operation in a separate pthread or whatever so that we have the option to cancel it and it doesn't block the GUI.
OK, I think I'm wrong about getTransformedCompShape()
being the slow part. The slow part
is actually a single call into OpenCascade, which is much more annoying to cancel.
So the new plan is to try to get FeatureTransformed.cpp
to do its work in a child
process and just send the resulting shape over a pipe.
And get ViewProviderTransformed.cpp
to pop up a modal that kills the child process
if you click the abort button.
Wow, this is actually working. So good.
I need the modal to disappear on its own when the transformation is complete. And I want the modal to not appear until a few seconds have passed, otherwise it's just annoying and distracting.
OK, great, all working!
https://github.com/jes/FreeCAD/commit/a496b9bb2d5134d3bafefb9fc1f53f246a602da7
One regression is that we used to get more fine-grained info in error cases, but now it is just "Child process failed".
And another is that it will be totally broken on Windows etc. I should move the computation code into another function, and on Linux we call that function in the child process, and on Windows we just call it directly pending any better ideas.
Also the comments are kind of not very good, need to address that when I'm done.
Annoyingly it's only working on fuse operations! It hangs on cuts. Not sure what is wrong. If I can't fix it quite quickly I might just make a demo video of what I already have on fuse operations, and open a draft PR.
Another issue is it looks like opening up the Transform dialog results in running the calculation (at least) twice! We should definitely fix that.
Also it seems to do 2 cuts when you ask for 2 copies of the thing - but aren't you already starting from a body that has 1 cut in it?
No, it's not a difference between cuts and fuses. It seems to be that it works if you're just doing a pattern of 1 existing feature (i.e. rotating copies of the entire body?), but doesn't work if you are just doing a pattern of one feature on to an existing body.
Maybe if I try passing the shapes in over a pipe instead of accessing them directly in memory?
Ah! I think the problem might simply have been that I applied my changes from one repo to a repo from a slightly different tip, and then copied the file back across, and there actually were other changes to the same file in the mean time. Hopefully that's all it was.
Nope :(. Still bad.
So now I'm trying serialising the shape before forking and then deserialising it after. Not super hopeful though.
Yeah, still no good.
So the next thing to do is make sure I'm starting from the version that I pushed earlier, and not just that one single file copied into a different tree.
Done that, still hangs. So what is the problem here?
So the next thing is:
Issues are:
Surprisingly, the child process is not stuck in an infinite loop. I attached with strace
and it looks like it is stuck on some sort of mutex:
futex(0x61616b3cd360, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY
I should attach with gdb and see where it is.
(gdb) bt
#0 0x0000797d6863bd71 in __futex_abstimed_wait_common64 (private=31101, cancel=true,
abstime=0x0, op=393, expected=0, futex_word=0x61616b3cd360)
at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=31101, abstime=0x0, clockid=0,
expected=0, futex_word=0x61616b3cd360) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x61616b3cd360,
expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x0000797d6863e7ed in __pthread_cond_wait_common (abstime=0x0, clockid=0,
mutex=0x61616b3cd310, cond=0x61616b3cd338) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x61616b3cd338, mutex=0x61616b3cd310)
at ./nptl/pthread_cond_wait.c:627
#5 0x0000797d4f4b670c in Standard_Condition::Wait() ()
from /lib/x86_64-linux-gnu/libTKernel.so.7
#6 0x0000797d4f4b1df8 in OSD_ThreadPool::EnumeratedThread::WaitIdle() ()
from /lib/x86_64-linux-gnu/libTKernel.so.7
#7 0x0000797d4f4b3555 in OSD_ThreadPool::Launcher::wait() ()
from /lib/x86_64-linux-gnu/libTKernel.so.7
#8 0x0000797d4d76f296 in BOPAlgo_PaveFiller::FillShrunkData(TopAbs_ShapeEnum, TopAbs_ShapeEnum) () from /lib/x86_64-linux-gnu/libTKBO.so.7
#9 0x0000797d4d74912c in BOPAlgo_PaveFiller::PerformEE(Message_ProgressRange const&) ()
from /lib/x86_64-linux-gnu/libTKBO.so.7
#10 0x0000797d4d73eb50 in BOPAlgo_PaveFiller::PerformInternal(Message_ProgressRange const&) () from /lib/x86_64-linux-gnu/libTKBO.so.7
#11 0x0000797d4d7386a3 in BOPAlgo_PaveFiller::Perform(Message_ProgressRange const&) ()
from /lib/x86_64-linux-gnu/libTKBO.so.7
#12 0x0000797d4d6ed550 in BRepAlgoAPI_BuilderAlgo::IntersectShapes(NCollection_List<TopoDS_Shape> const&, Message_ProgressRange const&) () from /lib/x86_64-linux-gnu/libTKBO.so.7
#13 0x0000797d4d6e3464 in BRepAlgoAPI_BooleanOperation::Build(Message_ProgressRange const&) () from /lib/x86_64-linux-gnu/libTKBO.so.7
#14 0x0000797d37ac0839 in Part::TopoShape::makeElementBoolean(char const*, std::vector<Part::TopoShape, std::allocator<Part::TopoShape> > const&, char const*, double) ()
from /home/jes/projects/FreeCAD/build/Mod/Part/Part.so
#15 0x0000797d37ab110d in Part::TopoShape::makeElementCut(std::vector<Part::TopoShape, std::allocator<Part::TopoShape> > const&, char const*, double) ()
from /home/jes/projects/FreeCAD/build/Mod/Part/Part.so
#16 0x0000797d2a8b6dcf in PartDesign::Transformed::execute() ()
from /home/jes/projects/FreeCAD/build/Mod/PartDesign/_PartDesign.so
So, somewhere deep inside OpenCascade, it is waiting on something that never comes.
I wonder if OpenCascade has spawned a thread pool or something that we then lose when we fork? How can I get a reference to the OSD_ThreadPool or whatever? Or make a fresh one in the child?
If I find the interface between FreeCAD code and OpenCascade code, won't that have to pass in all the required state? Or is it global state? Global state might be easier because I can then re-initialise it without asking anyone?
Part::TopoShape::makeElementBoolean
is the last function in FreeCAD land.
Hello:
mk->SetRunParallel(Standard_True);
OSD_Parallel::SetUseOcctThreads(Standard_True);
Maybe I invert that and see what happens.
Ho ho! Now it works. I don't really want to give up multi-threading though.
I wonder if I can find where FreeCAD initialises the OSD_ThreadPool
, and do the
same thing in the child?
It appears it doesn't initialise the ThreadPool at all. I wonder how many threads it is using?
I might be able to call OSD_ThreadPool::DefaultPool(10)->Init(10)
to reinitialise
with 10 threads.
Nope, that segfaults.
OK, what if we keep SetRunParallel
but turn off SetUseOcctThreads
? Still hangs.
OK then, the "big guns" is instead of a straightforward fork, we spawn a helper process, send its arguments over a pipe, and get its results over a pipe, and it can initialise its own OpenCascade.
OK, I think doing this in FeatureTransformed.cpp
is not the right place. I should
do it in TopoShape.makeElementCut()
and TopoShape.makeElementFuse()
. That way
the helper process just needs to know whether it is cutting or fusing, and the shapes
it's working with, all of which are easy to serialise.
So I need an alternative implementation of makeElementCut()
and makeElementFuse()
that come with some way to abort, and then FeatureTransformed
just passes the
abort signal along.
I think we want an alternative makeElementBoolean()
that, instead of waiting until
it's complete, instead gives you a handle that only has 2 methods: one requests an
abort, and one waits for the child process to complete and yields the result.
makeElementBooleanAsync()
let's say.
So makeElementBooleanAsync()
serialises its arguments, launches the helper
process, and constructs the AsyncProcessHandle
which knows the pid of the child,
and the fd that the results will come on.
And then the caller code changes from:
shape.makeElementBoolean(...);
to:
void Transformed::abort() {
this.isAborted = true;
this.handle.abort();
}
...
this.isAborted = false;
...
this.handle = shape.makeElementBooleanAsync(...);
shape = this.handle.join();
if (this.isAborted) {
return App::DocumentObjectExecReturn("aborted");
}
OK, update: I may be making this way more complicated than it needs to be.
There is already FreeCADCmd
which is a standalone Python console for FreeCAD.
Could I write my child process in Python and launch it with FreeCADCmd
?
Meh, don't think so, I can't see any way to tell it to run a particular Python script.
Maybe there is a way to get FreeCADCmd
to do it. For now I'm making the "Part" module
create a new binary called "BooleanWorker", and I'll make it shell out to that
if available, and otherwise fallback to conventional blocking mode. So the
AsyncProcessHandle
also wants to have a way to say that actually it's not
asynchronous at all and the handle does nothing. And the FeatureTransformed code needs
to also not try to abort()
the handle before it is created.
So next steps are:
makeElementBooleanAsync()
And then fixing issues: