
Introduction To Regular Expressions [ Regex ]


What Is A Regular Expression

A regular expression, or regex for short, is a sequence of characters that defines a search pattern.
Let me simplify this for you.
Imagine you're writing a huge assignment or report and you realize you misspelled a word that appears a couple of hundred times throughout the document. Any sane person living in the 21st century would use:
find and replace -> “misspelled word” -> “correct word”.

Have you ever wondered how the computer checks for the word?
How the words are magically found and replaced?
It just searches the entire document for “word-to-be-searched” and replaces each occurrence with the new string.
Now imagine this scenario: you are asked to redact phone numbers from a letter. You don’t know any of the numbers personally, and the letter contains plenty of other numbers that are not phone numbers and should not be redacted. What would you do then?
This is where regular expressions come into play. A regular expression is basically a phrase, or sequence of characters, that describes a set of strings built from the symbols of a given alphabet. To simplify it further:
Let G be the alphabet we use, such that G = {a, b, c}
This basically means that we are going to use only a, b, and c as the letters of our language instead of all 26 letters of the English alphabet.

Now these letters can come in any order and any number of times, say abc, bac, aaa, bbb, acc, and so on. Suppose that we have to find all the three-letter words that start with a, have any of the 3 letters in the middle, and end with c. Our possible options are abc, aac, and acc.
[Figure: Regex state machine (state transition diagram)]
Looking at the state transition diagram, A, B, and C denote the states, and a, b, and c denote the input letters. We always begin at the start state: the automaton starts if and only if the first character is ‘a’; otherwise we skip the word altogether.

From state A, the next state is reached when any of the three letters is encountered. From B we move to state C, which is the final state (denoted by a double circle), if and only if ‘c’ is encountered; otherwise we move to the dead state, which means that the automaton halts and reports that the word is not one of the accepted words (aac, abc, and acc).
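The walk through the state machine can be sketched in C#. This is only a minimal hand-rolled version: it assumes there is a separate start state in front of A, and the names AbcAutomaton, State, Start, Dead, and Accepts are made up for illustration rather than taken from the diagram.

```csharp
using System;

public static class AbcAutomaton
{
    // Hypothetical state names: Start is an assumption, A/B/C follow the text,
    // and Dead stands for the dead state.
    private enum State { Start, A, B, C, Dead }

    // Returns true if the automaton is in the final state C after reading the whole word.
    public static bool Accepts(string word)
    {
        State state = State.Start;
        foreach (char ch in word)
        {
            switch (state)
            {
                case State.Start:
                    // Only 'a' lets the automaton start.
                    state = ch == 'a' ? State.A : State.Dead;
                    break;
                case State.A:
                    // Any of the three letters moves us on to B.
                    state = (ch == 'a' || ch == 'b' || ch == 'c') ? State.B : State.Dead;
                    break;
                case State.B:
                    // Only 'c' reaches the final state.
                    state = ch == 'c' ? State.C : State.Dead;
                    break;
                default:
                    // Extra input after C, or any input in Dead, is rejected.
                    state = State.Dead;
                    break;
            }
        }
        return state == State.C;
    }

    public static void Main()
    {
        foreach (string word in new[] { "abc", "aac", "acc", "bac", "abcc" })
        {
            Console.WriteLine(word + " -> " + Accepts(word));
        }
    }
}
```

Running it should report True for abc, aac, and acc, and False for bac and abcc.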
You don’t always need to draw out the transition diagram to use a regular expression; the whole thing can be represented in text as:
Regular Expression: a(a+b+c)c
Where the symbols represent the following:
+ :- OR operator
* :- 0 or more occurrences
So if we need to find a sequence where the above word repeats itself over and over, we just use the regular expression (a(a+b+c)c)*
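In C#, as in most programming languages, the OR operator is written | (or as a character class such as [abc]) rather than +, and * means zero or more repetitions, so the empty string also matches the repeated form. As a rough sketch, the two expressions might be written like this. The ^ and $ anchors are an addition here so that the whole word has to match, and the class and constant names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormalToCSharp
{
    // a(a+b+c)c in the formal notation; ^ and $ force the whole string to match.
    public const string SingleWord = "^a[abc]c$";

    // (a(a+b+c)c)* : zero or more repetitions of the three-letter word.
    public const string RepeatedWord = "^(a[abc]c)*$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("abc", SingleWord));       // True
        Console.WriteLine(Regex.IsMatch("abb", SingleWord));       // False
        Console.WriteLine(Regex.IsMatch("abcaac", RepeatedWord));  // True
        Console.WriteLine(Regex.IsMatch("", RepeatedWord));        // True (zero repetitions)
    }
}
```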
Now that you have somewhat of an understanding of what regular expressions are and how they work, let’s move on to the actual coding side and the implementation of regex.
Implementation in C#.

To access regular expression functionality, you have to add a using directive for the System.Text.RegularExpressions namespace.
Some of the symbols are interpreted differently when you use them in programming languages like C#.

List Of Symbols And Their Use In C# Regex

  • ‘\b’: A special symbol that tells the regex engine to match at a word boundary, that is, the position at the beginning or end of a word.
  • ‘\d’: A digit from 0-9 (in .NET, other Unicode decimal digits also match unless RegexOptions.ECMAScript is set).
  • {n}: Used after a symbol or group; the preceding item must occur exactly n times.
  • ‘+’: One or more occurrences.
  • ‘\w’: Word characters: letters, digits, and the underscore.
  • ‘.’: Any character except a new line.
  • ‘\s’: Whitespace.
  • ‘^’: Beginning of a string.
  • ‘$’: End of a string.
  • ‘*’: Any number of repetitions, including zero.
  • {n,m}: Repeat the preceding item at least n times but not more than m times.
  • {n,}: Repeat the preceding item at least n times with no upper limit.
  • ‘\W’: Not a word character.
  • ‘\S’: Not whitespace.
  • ‘\D’: Not a digit.
  • ‘\B’: Not a word boundary (not the beginning or end of a word).
  • [^x]: Any character that is not x.
  • [^aeiou]: Any character that is not a lowercase vowel (this includes consonants, but also digits, spaces, and punctuation).
  • ‘*?’: Any number of times, but as few as possible.
  • ‘+?’: One or more occurrences, but as few as possible.
  • ‘??’: 0 or 1 occurrences, but as few as possible.
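To see a few of these symbols in action, here is a small sketch. The sample strings, the SymbolDemo class, and the FirstMatch helper are made up for illustration; Regex.Match and Regex.Replace are standard methods of the Regex class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SymbolDemo
{
    // Returns the first match of pattern in input, or an empty string if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string html = "<b>bold</b> and <i>italic</i>";

        // Greedy: .* grabs as much as possible, up to the last '>'.
        Console.WriteLine(FirstMatch(html, "<.*>"));   // <b>bold</b> and <i>italic</i>

        // Lazy: .*? stops at the first '>' it can reach.
        Console.WriteLine(FirstMatch(html, "<.*?>"));  // <b>

        // \b, \w and {n}: the first whole word made of exactly four word characters.
        Console.WriteLine(FirstMatch("the lazy dog jumps", @"\b\w{4}\b"));  // lazy

        // [^aeiou] matches anything that is not a lowercase vowel,
        // digits and spaces included.
        Console.WriteLine(Regex.Replace("regex 101", "[^aeiou]", "-"));  // -e-e-----
    }
}
```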
The main function that we will be using from the Regex class is:
Regex.Matches(string textInput, string regexStatement) :- returns a MatchCollection object containing every match of regexStatement found in textInput.
We will go over the MatchCollection class in the future.
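As a sketch of how Regex.Matches might be applied to the phone-number scenario from earlier, assume for illustration that every phone number is written as three digits, three digits, and four digits separated by hyphens (for example 555-123-4567). That format, the sample letter, and the PhoneRedactor class are assumptions of this example. Regex.Replace, another method of the same Regex class, does the redaction itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneRedactor
{
    // Assumed format: 3 digits, hyphen, 3 digits, hyphen, 4 digits, on word boundaries.
    private const string PhonePattern = @"\b\d{3}-\d{3}-\d{4}\b";

    // Replaces every phone number in the assumed format with a placeholder.
    public static string Redact(string text)
    {
        return Regex.Replace(text, PhonePattern, "[REDACTED]");
    }

    public static void Main()
    {
        string letter = "Order 4521 shipped on 12 May. Call 555-123-4567 or 555-987-6543.";

        // List every phone number found; the other numbers are left alone.
        foreach (Match m in Regex.Matches(letter, PhonePattern))
        {
            Console.WriteLine("Found: " + m.Value);
        }

        Console.WriteLine(Redact(letter));
    }
}
```

The order number and the date do not fit the pattern, so only the two phone numbers are replaced.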

Example Program:

using System;
using System.Text.RegularExpressions;

public class BitshiftProgrammer
{
    private static void CheckForCapitalsAtSentenceStart(string text)
    {
        // Match a '.', then a single space, then one capital letter from A to Z,
        // followed by any word characters until a non-word character is seen.
        MatchCollection mc = Regex.Matches(text, @"\. [A-Z]\w*");
        foreach (Match m in mc)
        {
            // Printing a Match writes the matched text (m.Value).
            Console.WriteLine(m);
        }
    }

    public static void Main(string[] args)
    {
        Console.WriteLine("Checking for capital rule non-violating words");
        CheckForCapitalsAtSentenceStart("This is first sentence. Second sentence is better. third sentence needs some work. Fourth has become better");
    }
}
Output :
Checking for capital rule non-violating words
. Second
. Fourth
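Note that the very first sentence is never checked, because no ". " comes before it. As a rough sketch of the opposite check, the version below lists the sentences that break the rule and also covers the start of the text. It uses a lookbehind, (?<=...), which is not in the symbol list above; the SentenceStartChecker class and FindViolations method are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentenceStartChecker
{
    // (?<=^|\. ) is a lookbehind: the word must sit at the start of the text
    // or directly after ". ", but that prefix is not part of the match itself.
    private const string ViolationPattern = @"(?<=^|\. )[a-z]\w*";

    // Returns every sentence-opening word that starts with a lowercase letter.
    public static List<string> FindViolations(string text)
    {
        var violations = new List<string>();
        foreach (Match m in Regex.Matches(text, ViolationPattern))
        {
            violations.Add(m.Value);
        }
        return violations;
    }

    public static void Main()
    {
        string text = "This is first sentence. Second sentence is better. third sentence needs some work. Fourth has become better";
        foreach (string word in FindViolations(text))
        {
            Console.WriteLine(word);   // third
        }
    }
}
```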
We will go over many more C# examples in the future, including much more complicated ones.
Well, I hope you learnt something of value.
Please do support Bitshift Programmer by sharing this with your friends and colleagues.
For More C# Tutorials, go HERE.
For Unity Tutorials, go HERE.
