timing tests ... ok, endmatch totally wins. Especially for a mismatch in which case endmatch returns early, but match-end shows no gain. These tests were run on anarki arc:
average time im ms over 100000 runs of [_ ".jpg" "hello.jpg"]
endmatch 0.029
match-end 0.064
average time im ms over 100000 runs of [_ ".jpg" "hello.png"]
endmatch 0.016
match-end 0.062
So it seems performance is a likely explanation for the way endmatch is implemented.
(interestingly, for comparison, on rainbow the time for endmatch is roughly similar (0.028,0.019) but for match-end is disastrous (0.11,0.11). I'll need to figure out why.)
You're right about my match-end not working on arc2 - I mostly use anarki as it has a bunch of convenient changes including negative offsets for cut.
Though keep in mind that effort making a particular function run faster is only useful if a significant percentage of a program's execution time is spent in that function. For example, even if your match-end runs four times slower than endmatch in rainbow, if a program spends only 0.01% of its time matching the end of strings, then making match-end run faster won't help the program run any faster.
Certainly as an optimization arc2's implementation of endmatch would appear to be premature one, considering that endmatch isn't even used in the arc2 and HN source code, much less identified (as far as I know :) as a hot point slowing down the execution of someone's program. And while I do personally find it to be a fun piece of code (demonstrating how loop unrolling can be done using macros), your implementation is a clear winner in clarity and succinctness. :-)