
2023-11-25

Last modified: 2023-11-25 22:52:20


Garage

I've put all the decking planks up on the wall, along with some of the other longer planks, and put the rest of the wood storage nearby:

I've put a big shelf up above the workbench:

And sorted out the cans, sandpaper, etc.:

I'm slightly concerned that the "data" light isn't showing on the powerline ethernet adapter. Is it only intermittently within range? I may need to move it.

Next up is probably putting an MDF top on the island bench and the computer desk, and putting up some plywood for the "tool wall".

MDF on the island bench (albeit messy already):

I often like to use this bench for applying finishes, and the old bench top got lots of paint and lacquer on it because of that. I kind of want to keep the new top clean and smooth and flat, so perhaps I should get a big roll of paper to use as a sacrificial top when I'm painting?

And I used an offcut from the island bench top to make the (first piece of?) "tool wall":

Plywood would be a better choice for the tool wall, but the offcut from the island bench top was almost exactly the right size, so I used it. It still needs fixing to the wall, and some screws or hooks or something to hold the tools. It is smaller than I imagined, so there may end up being another piece.

Stepper motor clock

Some reference information for making the stepper motor clocks:

The Geneva drive pin is 2mm diameter and 20mm long.

Shaft lengths required (all in 5mm aluminium round bar, with ~1mm chamfer at each end):

The frame pieces are 100mm wide, and 120mm long for the base and 240mm long for the backplane.

Here's the arrangement of wheels and shafts:

SCAMP

I wrote some more tests, using a lot of the installed system binaries, so they should indirectly exercise quite a lot of the compiler's functionality.

I'm wondering if I should implement regexes with a Thompson NFA, in preparation for Advent of Code. OK, I seem to have managed so far without regexes, but it might be fun anyway. And it could make my grep more powerful.

Features that I want:

Forgetting the Thompson NFA idea for a second, how would I naively implement this? I'd probably turn it into an expression tree and then evaluate the expression a bit like how slangi works.
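For concreteness, here's a minimal sketch in C (not SLANG) of what that naive backtracking evaluation could look like. The node types, the continuation-list trick, and the anchored-match behaviour are all my own assumptions for illustration, not anything taken from parse.sl or slangi:

```c
/* Hypothetical regex syntax tree node; names and layout are made up. */
enum retype { Char, Any, Concat, Alt, Star };

typedef struct Renode Renode;
struct Renode {
    enum retype type;
    int c;                  /* character, for Char nodes */
    Renode *left, *right;
};

/* Linked list of nodes still to be matched after the current one. */
typedef struct Cont Cont;
struct Cont { Renode *re; Cont *next; };

static int matchcont(Cont *c, const char *s);

/* Match re against the start of s, then require the continuation to
 * match the rest.  The || operators are where backtracking happens:
 * if one alternative fails, we just try the next. */
static int match(Renode *re, Cont *cont, const char *s) {
    switch (re->type) {
    case Char:
        return *s == re->c && matchcont(cont, s + 1);
    case Any:
        return *s != '\0' && matchcont(cont, s + 1);
    case Concat: {
        Cont c = { re->right, cont };
        return match(re->left, &c, s);
    }
    case Alt:
        return match(re->left, cont, s) || match(re->right, cont, s);
    case Star: {
        /* greedily try one more copy, then fall back to zero copies;
         * loops forever if re->left can match the empty string */
        Cont c = { re, cont };
        return match(re->left, &c, s) || matchcont(cont, s);
    }
    }
    return 0;
}

/* An empty continuation means the whole tree matched; this version is
 * anchored, i.e. it insists on consuming the entire string. */
static int matchcont(Cont *c, const char *s) {
    if (c == NULL)
        return *s == '\0';
    return match(c->re, c->next, s);
}
```

The Alt and Star cases are exactly where the exponential blow-up comes from: on failure the matcher re-tries every combination of choices it has made so far.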

Russ Cox says (in 2007) that the Thompson NFA is a lot faster than Perl: https://swtch.com/~rsc/regexp/regexp1.html

How does my idea differ from an NFA?

it is possible to write so-called “pathological” regular expressions that Perl matches very very slowly. In contrast, there are no regular expressions that are pathological for the Thompson NFA implementation

...

Today, regular expressions have also become a shining example of how ignoring good theory leads to bad programs. The regular expression implementations used by today's popular tools are significantly slower than the ones used in many of those thirty-year-old Unix tools.

This feels analogous to how Harrison invented good pendulum clocks in the 1700s but people ignored his ideas and kept making bad pendulum clocks.

I wonder if I'd use my parse.sl lib in 2 different ways: once to parse the regular expression and build the compiled form, and then again from within the compiled form to check whether a string matches.

No, I don't think backtracking with parse.sl is optimal:

A more efficient but more complicated way to simulate perfect guessing is to guess both options simultaneously. In this approach, the simulation allows the machine to be in multiple states at once. To process each letter, it advances all the states along all the arrows that match the letter.

So Thompson's contribution is not that a regular expression can be turned into an NFA, but that an NFA can be evaluated more efficiently than naive backtracking.

In the worst case, the NFA might be in every state at each step, but this results in at worst a constant amount of work independent of the length of the string, so arbitrarily large input strings can be processed in linear time. This is a dramatic improvement over the exponential time required by the backtracking approach.

Apparently Thompson's implementation compiled the NFA such that every state was turned into machine code, and the "super-state" was a list of functions to call with the next character; I guess each of those would add its next states (if any) to the next list of functions to call.

So let's say each state is a function that takes a character and a bitmap of "next states", and it sets a 1 in the bitmap for each state that it can move to?

Then we use "double-buffering", clear out the "next states" bitmap, call each state function from the "current states" bitmap, and repeat until done? How does capturing work? Is there a more time-/memory-efficient "set" data structure than a bitmap that I should be using instead?
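Here's a rough sketch in C of how that could look, just to convince myself the idea hangs together. The tiny hard-coded NFA (for "ab*c"), the statefn signature, and all the names are made up for illustration:

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Bitmap;    /* one bit per NFA state */

/* a compiled state is a function: given the next input character, set
 * a bit in *next for every state it can move to on that character */
typedef void (*statefn)(int c, Bitmap *next);

/* hand-compiled NFA for "ab*c": 0 --a--> 1, 1 --b--> 1, 1 --c--> 2 */
static void s0(int c, Bitmap *next) { if (c == 'a') *next |= 1 << 1; }
static void s1(int c, Bitmap *next) { if (c == 'b') *next |= 1 << 1;
                                      if (c == 'c') *next |= 1 << 2; }
static void s2(int c, Bitmap *next) { (void)c; (void)next; }  /* match state */

static statefn state[] = { s0, s1, s2 };
enum { NSTATES = 3, MATCHSTATE = 2 };

static int run(const char *str) {
    Bitmap cur = 1 << 0, next;          /* start in state 0 only */
    for (; *str; str++) {
        next = 0;                       /* double-buffering: build the next
                                           super-state from scratch */
        for (int i = 0; i < NSTATES; i++)
            if (cur & (1u << i))
                state[i](*str, &next);
        cur = next;
    }
    return (cur >> MATCHSTATE) & 1;     /* ended up in the match state? */
}

int main(void) {
    printf("%d %d\n", run("abbbc"), run("abx"));   /* prints "1 0" */
    return 0;
}
```

The work per input character is bounded by the number of states, so it's linear in the input length, but the inner loop scans every bit of the bitmap whether or not the NFA can actually be in that state.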

Instead of using a bitmap, the Cox document suggests associating with each state a "lastlist" id, and when you add that state to the "next states" list, you set its "lastlist" to be the current list id, and increment the list id at each step. Then when adding a state to the list, you check if that state's "lastlist" is the current list id, and if so it is already in the list and doesn't need to be added again. This is better than a bitmap because it permits evaluating the states in the list without having to check every element of the bitmap.
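Roughly, following the code in the Cox article (the struct layouts here are my trimmed-down paraphrase of it, not guaranteed to match the original exactly):

```c
#include <stddef.h>

enum { Match = 256, Split = 257 };  /* special values for State.c */

typedef struct State State;
struct State {
    int c;                  /* character to match, or Match/Split */
    State *out, *out1;      /* next state(s) */
    int lastlist;           /* id of the last list this state was added to */
};

typedef struct { State **s; int n; } List;

static int listid;

/* add s to the list, but only once per step: if s->lastlist already
 * equals the current listid, it's already on the list */
void addstate(List *l, State *s)
{
    if (s == NULL || s->lastlist == listid)
        return;
    s->lastlist = listid;
    if (s->c == Split) {            /* epsilon branch: follow both arms */
        addstate(l, s->out);
        addstate(l, s->out1);
        return;
    }
    l->s[l->n++] = s;
}

/* one step of the simulation: advance every state on the current list
 * that matches character c, building the next list */
void step(List *cur, int c, List *next)
{
    listid++;
    next->n = 0;
    for (int i = 0; i < cur->n; i++)
        if (cur->s[i]->c == c)
            addstate(next, cur->s[i]->out);
}
```

The win over the bitmap version is that step() only ever touches states that are actually on the current list, rather than scanning every possible state on every input character.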

Re. capturing:

The extraction of submatch boundaries has been mostly ignored by computer science theorists, and it is perhaps the most compelling argument for using recursive backtracking. However, Thompson-style algorithms can be adapted to track submatch boundaries without giving up efficient performance. The Eighth Edition Unix regexp(3) library implemented such an algorithm as early as 1985, though as explained below, it was not very widely used or even noticed.

But the page doesn't explain how to do it. One obvious complication is conditional capturing, like /abc|(123)/. There might be a capture of "123", or there might not, even if the regex matches.

In principle I'm happy for the library to behave weirdly in the case of conditional captures, but it should still correctly decide whether a string matches a regex, it should still capture something sensible if it captures anything at all, and it must not crash. I'd also be happy for conditional captures to simply be a compile error.

There is a man page for 8th edition regexp(3): https://man.cat-v.org/unix_8th/3/regexp - but it contains nothing of interest.

Apparently Rob Pike wrote the code that became the 8th edition regexp code, but:

Pike's implementation incorporated submatch tracking into an efficient NFA simulation but, like the rest of the Eighth Edition source, was not widely distributed. Pike himself did not realize that his technique was anything new. Henry Spencer reimplemented the Eighth Edition library interface from scratch, but using backtracking, and released his implementation into the public domain. It became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on.

What a disaster! Where is Rob Pike's 8th edition regexp implementation? Apparently it was released in 1992: https://groups.google.com/g/comp.os.research/c/fvfHNv_t_Dw/m/UYDRogQ1ePEJ but "the particularly efficient regular expression search algorithm went unnoticed".

OK, apparently "the code is now available in many forms":

Only the third of those is still a working URL. But I have a link to a "libregexp9.tgz": https://9fans.github.io/plan9port/unix/libregexp9.tgz

Let's look inside.

It contains only 1500 lines of C, so that is good.

Lol, the NOTICE file says it was written by Rob Pike. Is this actually weirdly arcane knowledge that only a few people have ever understood?

I can't be bothered understanding this now, but this is definitely worth looking at before I implement regexes for SCAMP. It's in ~/Downloads/libregexp.
