jes notes Index Gallery . 4th axis Blog ideas Micro-machining CNC milling machine Paint colours CNC Router Shaft passers Software ideas Stepper motor clock Toy ideas Watchmaking

2023-11-26

Last modified: 2023-11-26 21:45:49

< 2023-11-25 2023-11-27 >

Thompson NFA

Maybe the way we do capturing is we assign capture order based on the literal order that the capturing groups occur in the regex (not in the matched string, this is different when captures are conditional for example, but we don't care what we do in that case).

And then whenever we enter a state that starts a capture, we update the "start" index of that state, and whenever we enter a state that ends a capture, we update the "end" index of that state.

We might want to make updating the indexes conditional on not already having entered, or not still being in, the capture.

Example: "aabb" =~ /^([ab]+)$/

The capture could allowably be any of: empty string, a, aa, b, bb, ab, aab, abb, aabb.

I think the algorithm I stated above (update index whenever you can enter or exit) would capture just "b", because it would always start and end at the last possible point.

What about regexes with multiple capturing groups that could overlap? We certainly wouldn't want to capture the same text in both groups, would we?

I think we want the capturing index to start at the time we can go into the capture if we're not already in it, and stop at the time we go out of the capture. If we enter the capture again for a second time after having lost it previously, then we update the start index to the new one.

So each state needs to know the capturing group that it pertains to, and we need something to say not only which states are active, but which capturing groups are active.

So in /^x([ab]+)$/, the states are:

And the transitions are:

s0 --x--> s1 --ab--> s2 ---EOF--> s3 ^ | --ab--

So if we read an 'x' from s0 we go into s1. If we read an 'a' or a 'b' from s1, we go into s2. If we reach the end of the string from s2, we are done. If we read an 'a' or a 'b' from s2, we stay in s2.

So I think the capture is purely on s2. When you enter s2, if you're not already in it, you enter the capturing group. When you leave s2, if you're not still in it, you leave the capturing group.

So each state needs to have:

And each capture group needs to have:

So I think matching and capturing, given the NFA, is straightforward. Then it's just a case of turning the regex into an NFA, which also should be straightforward.

It would be good to read Thompson's actual paper, but the link in the Russ Cox page ( https://www.cse.chalmers.se/~coquand/AUTOMATA/thompson.pdf ) has been broken for the entire time that Wayback Machine has indexed it.

Aha, great success: https://dl.acm.org/doi/pdf/10.1145/363347.363387 (mirrored)

Plan for making the regex library:

I don't know how big it will end up being, might have to not be part of the library blob.

Well I've read the paper and the implementation is very small, however it compiles to IBM 7094 machine code, which I don't understand. It's not obvious how much of the heavy lifting is done by the CPU. I literally don't understand how the implementation in the paper maps to what I understood from Russ Cox's web page.

Oh well, let's proceed to implementing my understanding in Perl.

Well I wrote a handful of scripts:

Next is to make capturing in regex work correctly.

Stepper motor clock

I'm going to print the electrical cover, with "organic" supports. This is a 3h59m print job.

I have made all of the shafts for 2 clocks, except only cut the flat on one of them. The Z axis motor on the milling machine wasn't working when I first got to it. I took the motor off, swapped the cables around, couldn't get it to work. Then I hit the end of the shaft with a hammer and it started working. I don't know what was wrong.

The hole in the second wheel needs to be clearance for the 5mm shaft (but ideally should locate the centre), I've updated the CAD, STL, and G-code accordingly. Also saved the PrusaSlicer job as a 3mf this time, in case I need to update it again.

Might be a good idea to put a groove down the side of the motor slot, next to one of the screw holes, to allow an unseen wire to earth the motor housing:

Garage

Yeah, the networking doesn't work with the powerline adapter where it is. Frustrating. At least I haven't bothered pinning the cable up yet, lol. I will have to move it closer to the house and get a longer cable.

< 2023-11-25 2023-11-27 >