pub struct Cylon { /* private fields */ }Expand description
A Cylon is an NFA that recognizes rules from a compiled robots.txt file. By providing it a URL path, it can decide whether or not the robots file that compiled it allows or disallows that path.
The performance is on average O(n ^ k), where n is the length of the path and k is the average number of transitions from one prefix. This exponontial runtime is acceptable in most cases because k tends to be very small.
Contrast that with the naive approach of matching each rule individually. If you can match a rule in O(n) time and there are p rules of length q, then the performance will be O(n * p * q). However the NFA is likely more efficient, because it can avoid matching the same prefix multiple times. If there are x prefixes and each prefix is used y times, then the naive approach must make O(x * y) comparisons whereas the NFA only makes O(y) comparisons.
In general robots.txt files have a lot of shared prefixes due to the nature of URLs. That is why the pre-compiled NFA will be faster in most cases. However there is an upfront cost of compiling the NFA which is not present when doing naive matching. That cost can be amortized by caching the compiled Cylon for subsequent uses.