Are there any regex implementations out there that support intersection and complement operators in the regex syntax itself?
-
Are there any regex implementations out there that support intersection and complement operators in the regex syntax itself?
(These operators appear to have been omitted in regular expressions by historical accident, tracing back to a paper from Kleene in 1951, where he skipped them because they added no expressive power to the formalism he was working on)
-
Are there any regex implementations out there that support intersection and complement operators in the regex syntax itself?
(These operators appear to have been omitted in regular expressions by historical accident, tracing back to a paper from Kleene in 1951, where he skipped them because they added no expressive power to the formalism he was working on)
@bradlarsen It's not just by historical accident. Intersection/complement blows up the state space of even an NFA-based matcher. https://cstheory.stackexchange.com/questions/52062/intersection-non-emptiness-problem-over-regular-expressions-and-nfa
-
@bradlarsen It's not just by historical accident. Intersection/complement blows up the state space of even an NFA-based matcher. https://cstheory.stackexchange.com/questions/52062/intersection-non-emptiness-problem-over-regular-expressions-and-nfa
@bradlarsen Although if you're just trying to match top-level intersections or complements against particular inputs, you can run different DFAs/NFAs in parallel on the same input and combine their accept/reject results with the appropriate boolean operator.
-
Are there any regex implementations out there that support intersection and complement operators in the regex syntax itself?
(These operators appear to have been omitted in regular expressions by historical accident, tracing back to a paper from Kleene in 1951, where he skipped them because they added no expressive power to the formalism he was working on)
@bradlarsen i have a scala implementation that does this: https://github.com/non/antimirov
the real goal was being able to test regexes for equality/inclusion (i.e. subset/superset). it uses partial derivatives of regular expressions to try to do this efficiently. it does fairly well most of the time but in the worst case it is very slow
-
@bradlarsen Although if you're just trying to match top-level intersections or complements against particular inputs, you can run different DFAs/NFAs in parallel on the same input and combine their accept/reject results with the appropriate boolean operator.
@pervognsen right! At the top level, it's straightforward to use boolean operations in the host language, possibly with multiple calls to regex operations, to approximate complement / intersection. No change in algorithmic complexity doing that
-
R relay@relay.infosec.exchange shared this topic
-
@bradlarsen It's not just by historical accident. Intersection/complement blows up the state space of even an NFA-based matcher. https://cstheory.stackexchange.com/questions/52062/intersection-non-emptiness-problem-over-regular-expressions-and-nfa
@pervognsen is that page discussing a slightly different problem though? That PSPACE-complete problem discussed there is to determine if the intersection of DFAs is empty. This seems like a stronger property than whether a particular input string is accepted by all the DFAs (how you might imagine implementing intersection in a DFA-based regex engine). Or am I overlooking something?
-
@pervognsen is that page discussing a slightly different problem though? That PSPACE-complete problem discussed there is to determine if the intersection of DFAs is empty. This seems like a stronger property than whether a particular input string is accepted by all the DFAs (how you might imagine implementing intersection in a DFA-based regex engine). Or am I overlooking something?
@bradlarsen You run into the same problem with intersection/complement if you try to "internalize" the product construction as a single DFA/NFA the way we do with the other regular expression operations. Once you have a single DFA/NFA you can check its equivalence to the empty set in linear time and space by searching for a path from the initial states to accepting states.
-
@bradlarsen i have a scala implementation that does this: https://github.com/non/antimirov
the real goal was being able to test regexes for equality/inclusion (i.e. subset/superset). it uses partial derivatives of regular expressions to try to do this efficiently. it does fairly well most of the time but in the worst case it is very slow
@d6 how slow is "very slow"? Would it realistically be able to handle equality testing of small variations to patterns like this?
noseyparker/crates/noseyparker/data/default/builtin/rules/phpmailer.yml at 2e6e7f36ce36619852532bbe698d8cb7a26d2da7 · praetorian-inc/noseyparker
Nosey Parker is a command-line tool that finds secrets and sensitive information in textual data and Git history. - noseyparker/crates/noseyparker/data/default/builtin/rules/phpmailer.yml at 2e6e7f36ce36619852532bbe698d8cb7a26d2da7 · praetorian-inc/noseyparker
GitHub (github.com)