The Fastest Way to Parse Regex in C#

By Matt Mombrea

As software engineers, we often struggle to balance optimization vs. just getting the job done.

The important thing to consider when building a system is the actual use cases for certain functions. For example, a function that gets called infrequently, like user registration, doesn’t always need a ton of consideration for the absolute fastest performance, and the effort spent squeezing out marginal gains there is likely better spent elsewhere.

On the other hand, in cases where performance is critical, implementing the fastest possible solution can have a significant impact on user experience, server costs, and scalability. 

Recently while working on one such case, I came across the need for a series of Regular Expressions to check for pattern-based matching during a rapidly firing API request. This request needs to process as quickly as possible and handle thousands of requests per hour. 

This service is built in C# on .NET 9, so at first pass I created a new Regex with my pattern and used it to check for a match the same way I’ve done for years.

Regex regex = new(myPattern, RegexOptions.IgnoreCase);
return regex.IsMatch(item);

I tested the results for the expected outcome and had what I was looking for - and ALMOST moved on. 

For added context, there are several different parsing operations in the call, so I paused when I considered the impact this logic would have on a high frequency route. 

Ways to Check for a Regex Match

The first thing to do is to figure out the available techniques for matching via Regex so we can benchmark them. There are 3 obvious ways and 1 not so obvious.

1. Instantiated (Bad - Don’t Do This)

The bad way to do this would be to instantiate the Regex class for every request or iteration, so imagine the below function is called 10,000 times with a different input:

public static bool TestRegexBad(string item)
{
   Regex regex = new(myPattern, RegexOptions.IgnoreCase);
   return regex.IsMatch(item);
}

2. Reusable

A smarter way to go would be to instantiate the Regex class once and reuse it for each iteration.
In the calling class you can define the Regex:

private static readonly Regex RegexReusable = new(myPattern, RegexOptions.IgnoreCase);

Then pass it to the function without recreating it:

public static bool TestRegexReusable(string item, Regex regexReusable)
{
   return regexReusable.IsMatch(item);
}

3. Inline

Rather than defining it at all, you can use the base Regex class statically and provide it with the input and the pattern:

public static bool TestRegexInline(string item)
{
   return Regex.IsMatch(item, myPattern, RegexOptions.IgnoreCase);
}
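
To make the first three approaches easier to compare, here is a minimal sketch that puts them in one class. The pattern (`^order-\d+$`) and the class name `RegexApproaches` are placeholders I made up for illustration, since the real pattern isn't shown here. The Reusable method reads the static field directly rather than taking it as a parameter, which is just a simplification.

```csharp
using System.Text.RegularExpressions;

public static class RegexApproaches
{
    // Hypothetical pattern for illustration only.
    private const string MyPattern = @"^order-\d+$";

    // 2. Reusable: constructed once, when the class is first used.
    private static readonly Regex RegexReusable = new(MyPattern, RegexOptions.IgnoreCase);

    // 1. Instantiated: builds a new Regex (and re-parses the pattern) on every call.
    public static bool TestRegexBad(string item)
    {
        Regex regex = new(MyPattern, RegexOptions.IgnoreCase);
        return regex.IsMatch(item);
    }

    // 2. Reusable: only the match itself happens per call.
    public static bool TestRegexReusable(string item) => RegexReusable.IsMatch(item);

    // 3. Inline: the static helper on the Regex class.
    public static bool TestRegexInline(string item) =>
        Regex.IsMatch(item, MyPattern, RegexOptions.IgnoreCase);
}
```

All three return the same answer for the same input. The only difference is how much work happens on each call.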

4. Source Generated / Compile-Time

When you have a fixed pattern, meaning the pattern is not created with a variable inside of it, you have a fourth option which is a compile-time Regex.

The Source Generated Regex needs to be defined in a partial class, so it’s easiest to create a separate class to hold all of your compile-time Regex patterns. You can create it as a static class, or you can use dependency injection if you don’t make it static.

public static partial class RegexCompiled
{
   [GeneratedRegex(myPattern, RegexOptions.IgnoreCase)]
   public static partial Regex MyGeneratedRegex();
}

With that Source Generated Regex defined, you can then use it like so:

public static bool TestRegexCompiled(string item)
{
   return RegexCompiled.MyGeneratedRegex().IsMatch(item);
}

At build time, the source generator emits C# code that implements the matching logic for your pattern, and that code is compiled along with the rest of your project.
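
One detail worth spelling out: because the pattern is an attribute argument, it has to be a compile-time constant (a `const` string or a literal). Below is a self-contained sketch of the Source Generated approach, assuming .NET 7 or later. The pattern and the class name `RegexCompiledSketch` are hypothetical stand-ins, not the ones from my project.

```csharp
using System.Text.RegularExpressions;

public static partial class RegexCompiledSketch
{
    // Hypothetical pattern. It must be a compile-time constant to be used in the attribute.
    private const string MyPattern = @"^order-\d+$";

    // The source generator supplies the body of this partial method at build time.
    [GeneratedRegex(MyPattern, RegexOptions.IgnoreCase)]
    public static partial Regex MyGeneratedRegex();

    public static bool TestRegexCompiled(string item) => MyGeneratedRegex().IsMatch(item);
}
```

If the pattern has to be assembled from a variable at runtime, this option isn't available and you fall back to one of the other three.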

Benchmarks

Using BenchmarkDotNet we’re running 10,000 iterations on each method.
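
For anyone who wants to try this themselves, a benchmark class along these lines should get you close. Treat it as a rough sketch rather than my exact harness: the pattern, the input data, and the class name `RegexBenchmarks` are all made up for illustration, and it assumes the BenchmarkDotNet NuGet package is referenced.

```csharp
using System;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // adds allocation and GC columns to the results
public partial class RegexBenchmarks
{
    private const string MyPattern = @"^order-\d+$"; // hypothetical pattern
    private const int Iterations = 10_000;

    private static readonly Regex RegexReusable = new(MyPattern, RegexOptions.IgnoreCase);

    [GeneratedRegex(MyPattern, RegexOptions.IgnoreCase)]
    private static partial Regex MyGeneratedRegex();

    private string[] _inputs = Array.Empty<string>();

    [GlobalSetup]
    public void Setup()
    {
        // Hypothetical inputs: half match the pattern, half do not.
        _inputs = new string[Iterations];
        for (int i = 0; i < Iterations; i++)
            _inputs[i] = i % 2 == 0 ? $"order-{i}" : $"invoice-{i}";
    }

    [Benchmark(Baseline = true)]
    public int Bad()
    {
        int matches = 0;
        foreach (string item in _inputs)
        {
            Regex regex = new(MyPattern, RegexOptions.IgnoreCase); // new instance every time
            if (regex.IsMatch(item)) matches++;
        }
        return matches; // returned so the loop is not optimized away
    }

    [Benchmark]
    public int Reusable()
    {
        int matches = 0;
        foreach (string item in _inputs)
            if (RegexReusable.IsMatch(item)) matches++;
        return matches;
    }

    [Benchmark]
    public int Inline()
    {
        int matches = 0;
        foreach (string item in _inputs)
            if (Regex.IsMatch(item, MyPattern, RegexOptions.IgnoreCase)) matches++;
        return matches;
    }

    [Benchmark]
    public int SourceGenerated()
    {
        int matches = 0;
        foreach (string item in _inputs)
            if (MyGeneratedRegex().IsMatch(item)) matches++;
        return matches;
    }
}
```

To run it, call `BenchmarkDotNet.Running.BenchmarkRunner.Run<RegexBenchmarks>()` from a console app built in Release mode. Your numbers will depend on your own pattern, inputs, and hardware.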

First, the Bad technique is very bad! Not only is it 14x slower than the next slowest, it also allocates a ton of memory and triggers all kinds of garbage collection. Don’t do that.

Next, the Inline and Reusable methods are basically the same performance.

Finally, the Source Generated / compile-time method is over 14x faster than Inline or Reusable!

Improving Reusable

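In the previous test, we created a static reusable Regex which turned out to be no better than the Inline static call. That makes sense, because the static Regex methods keep an internal cache of recently used patterns, so the Inline call isn’t re-parsing the pattern every time either. When creating an instance of the Regex class, however, you have the ability to pass an option to the constructor indicating that you want this instance to be compiled.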

This compiles the pattern to IL at runtime, which performs similarly to the compile-time Regex but with the downside of a slower start-up, because the code isn’t pre-generated. If the expression is called many times, however, the increased performance will outweigh the start-up cost.

private static readonly Regex RegexReusableCompiled = new(myPattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
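
If you want a feel for that start-up cost, a crude stopwatch check like the one below can help. This is only a sketch under assumptions: the pattern and the names `CompiledStartupSketch` and `MeasureFirstUse` are hypothetical, and a single stopwatch reading is far less reliable than a proper BenchmarkDotNet run. Whichever Regex is built first also pays for warming up the Regex engine itself, so read the numbers as a rough indication only.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CompiledStartupSketch
{
    private const string MyPattern = @"^order-\d+$"; // hypothetical pattern

    // Times construction plus the first match, with and without RegexOptions.Compiled.
    public static (TimeSpan Interpreted, TimeSpan Compiled) MeasureFirstUse(string item)
    {
        var stopwatch = Stopwatch.StartNew();
        Regex interpreted = new(MyPattern, RegexOptions.IgnoreCase);
        interpreted.IsMatch(item);
        TimeSpan interpretedTime = stopwatch.Elapsed;

        stopwatch.Restart();
        Regex compiled = new(MyPattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
        compiled.IsMatch(item);
        TimeSpan compiledTime = stopwatch.Elapsed;

        return (interpretedTime, compiledTime);
    }
}
```

Calling `CompiledStartupSketch.MeasureFirstUse("order-42")` and printing both values shows what the first use of each option costs on your machine. The compiled instance only pays off once it has been reused enough times to recover that up-front work.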

Running our same benchmark of 10,000 iterations, we now see the benefit of a compiled Reusable approach, which comes in a close second to the Source Generated Regex.

The Winners

Source Generated / Compile-Time Regex

If you have fixed Regex patterns, leveraging Source Generated / compile-time Regex is by far the most performant method. With a 14x performance increase over the Inline and Reusable methods, it can save a huge amount of processing time on frequently called code paths. Prefer this method whenever your project allows it.

In reality you’ll likely have a blend of fixed patterns that you can use Source Generated Regex for, and Inline Regex for cases where the pattern is variable.
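
As a rough illustration of that blend, the sketch below uses a Source Generated Regex for a fixed pattern and the Inline static call for a pattern built from a runtime value. The class name `PatternChecks`, both method names, and both patterns are hypothetical. `Regex.Escape` is used so the runtime value is treated as literal text rather than as pattern syntax.

```csharp
using System.Text.RegularExpressions;

public static partial class PatternChecks
{
    // Fixed pattern (hypothetical): a good fit for the source generator.
    [GeneratedRegex(@"^order-\d+$", RegexOptions.IgnoreCase)]
    private static partial Regex OrderId();

    public static bool IsOrderId(string item) => OrderId().IsMatch(item);

    // Variable pattern: the keyword is only known at runtime, so use the Inline call.
    public static bool ContainsKeyword(string item, string keyword) =>
        Regex.IsMatch(item, $@"\b{Regex.Escape(keyword)}\b", RegexOptions.IgnoreCase);
}
```

For example, `PatternChecks.ContainsKeyword("Ship the ORDER today", "order")` matches, while `"reorder now"` does not, because the word boundary requires `order` to stand alone.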

Runner Up: Reusable Compiled Regex

A close second in performance is passing the Compiled option to the instantiation of a reusable Regex instance. If for some reason you are unable to use Source Generated Regex, this is the next best option if the function is called many times.

Honorable Mention: Benchmarking

While this is a pretty straightforward experiment, it highlights the value of benchmarking your code. Creating a set of benchmarks for frequently called functions will help you discover issues and improve performance that you might otherwise overlook. 

If you’re struggling with bottlenecks and performance issues in your critical business systems, reach out to the experts at Cypress North for assistance.


Meet the Author

CTO / Partner

Matthew Mombrea

Matt is our Chief Technology Officer and one of the founders of our agency. He started Cypress North in 2010 with Greg Finn, and now leads our Buffalo office. As the head of our development team, Matt oversees all of our technical strategy and software and systems design efforts.

With more than 19 years of software engineering experience, Matt has the knowledge and expertise to help our clients find solutions that will solve their problems and help them reach their goals. He is dedicated to doing things the right way and finding the right custom solution for each client, all while accounting for long-term maintainability and technical debt.

Matt is a Buffalo native and graduated from St. Bonaventure University, where he studied computer science.

When he’s not at work, Matt enjoys spending time with his kids and his dog. He also likes to golf, snowboard, and roast coffee.