Getting to Grips with Grep!


Introduction

Grep has been around since the early 1970s and has an interesting backstory attached to it (well explained by this video from Computerphile on YouTube).

Nowadays, it’s a staple of Linux administration, and a powerful tool that you can use to improve your efficiency.

What is Grep?

Grep stands for “globally search for a regular expression and print matching lines”. And as the name suggests, it is a tool well suited to sifting through text.

When Should You Use Grep?

There are two main use cases for grep:

  1. Search text within files
  2. Filter output of another program, so you only get what you care about
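
The second use case can be sketched roughly as follows. The noisy_program function and its three lines of output are made up for illustration; in practice it would be whatever command you want to filter.

```shell
# Hypothetical stand-in for a noisy program: three lines of made-up output.
noisy_program() {
  printf '%s\n' 'INFO  service started' 'ERROR disk full' 'INFO  request served'
}

# Pipe the output through grep so only the lines containing ERROR get through.
errors=$(noisy_program | grep 'ERROR')
echo "$errors"
```

Only the ERROR line survives the pipe; the two INFO lines are dropped.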

We’ll go through some examples of grep in action. The text file we will work with will be a pretty trivial one, just so I can show you the absolute basics of what grep is capable of, and some of the options you can run it with. Perhaps in the future we could explore some more realistic applications.

Working With Grep: A Basic Example

Consider the following file, saved as grep-example.txt. It’s 17 lines long: 15 lines contain a word, one rogue line contains a random IP address, and another contains something that had ambitions of being an IP address once upon a time…

Let’s see what we can do with this.

vessel
cemetery
scintillating
Scintillating
shape
204.235.208.222
2049235%208a222
opine
declare
zinc
disagreeable
lizards
mark
Hello
hello
queen
queue

Exact Match

The most basic way to use grep is to try to get it to produce an exact match. We don’t have to wrap our search term in single quotes, but it does help guard against some unpredictable outcomes if you’re new to the tool, so I’d recommend doing so to start.
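
Here is a small sketch of one such unpredictable outcome, assuming a throwaway directory and two made-up file names (words.txt and quiet.log). Without quotes, the shell may expand a pattern such as qu* into matching file names before grep ever sees it.

```shell
# Work in a throwaway directory so the demo leaves nothing behind.
demo_dir=$(mktemp -d)
printf '%s\n' 'queen' 'queue' > "$demo_dir/words.txt"
: > "$demo_dir/quiet.log"   # an empty file whose name happens to match the glob qu*

# Quoted: grep receives the pattern qu* untouched ("q" followed by zero or more "u").
quoted=$(cd "$demo_dir" && grep 'qu*' words.txt)

# Unquoted: the shell expands qu* to quiet.log first, so grep ends up
# looking for the text "quiet.log" in words.txt and finds nothing.
unquoted=$(cd "$demo_dir" && grep qu* words.txt || true)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
rm -r "$demo_dir"
```

The quoted search finds both words, while the unquoted one comes back empty, purely because of what else happens to be in the directory.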

If I type in grep 'cemetery' grep-example.txt the command will return the following:

$ grep 'cemetery' grep-example.txt
cemetery

Nice! What if I want all the entries that contain the letter “c”? I can run grep 'c' grep-example.txt:

$ grep 'c' grep-example.txt
cemetery
scintillating
Scintillating
declare
zinc

OK, but this by itself might be pretty useless: how am I going to actually look these up? One way to do this would be to use the --line-number option, which can be abbreviated to -n:

$ grep -n 'c' grep-example.txt
2:cemetery
3:scintillating
4:Scintillating
9:declare
10:zinc

Regular Expressions

So far so good. But what if we want to grab that IP address?

$ grep '204.235.208.222' grep-example.txt
204.235.208.222
2049235%208a222

Wait! What? We didn’t get the results we were expecting. This didn’t work because grep (not Linux itself) interpreted the . character as standing in for any single character. As we touched upon earlier, the “re” in grep stands for “regular expression”: the programme treats your search term as one.

Because of this, the following strings would also have been returned by the search, had they been in the file:

204A235B208C222
204-235-208-222
204323532083222
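
A quick way to check this for yourself is to feed those strings to grep on standard input. This is only a sketch; the candidates variable is introduced here for illustration.

```shell
# The three look-alike strings from the list above, plus the real address.
candidates=$(printf '%s\n' '204A235B208C222' '204-235-208-222' '204323532083222' '204.235.208.222')

# Unescaped dots match any single character, so all four lines match.
# -c prints the number of matching lines instead of the lines themselves.
loose=$(printf '%s\n' "$candidates" | grep -c '204.235.208.222')
echo "$loose"
```

The count comes out as 4: every candidate matches, not just the genuine address.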

We wanted to find the IP address though. What can we do to get the results we want?

There are two ways of dealing with this situation.

You could escape characters, so their special meaning in Regex is taken away:

$ grep '204\.235\.208\.222' grep-example.txt
204.235.208.222

Which gives us what we want.

However, for a large string, this could be quite tedious, and if you were sending this line to a colleague who isn’t used to working with regex, they might not find the command you wrote to be intuitive.

A cleaner alternative would be to pass the -F flag (which is an abbreviation for --fixed-strings) to grep, or use fgrep. Both will have the same result: they let you search for your string literally, with no character treated as special. Note that recent releases of GNU grep print a warning that fgrep is obsolescent, so grep -F is the safer habit.

$ grep --fixed-strings 204.235.208.222 grep-example.txt
204.235.208.222

$ grep -F 204.235.208.222 grep-example.txt
204.235.208.222

$ fgrep 204.235.208.222 grep-example.txt
204.235.208.222

#All these commands return the very same result

When you do want to use regular expressions in your search term, it can be a good idea to use the --extended-regexp option (-E for short). You can also call egrep instead of grep to achieve the same thing, although recent GNU grep releases warn that egrep is obsolescent:

$ grep -E '[Hh]ello' grep-example.txt
Hello
hello

A lot of Regex features are available without passing this flag, but in case you can’t remember which are included and which are not, passing this flag can help you stay on the safe side.
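
One feature that does depend on the flag is alternation with |. The sketch below uses three words from our example file, held in a variable (words) introduced just for this illustration.

```shell
words=$(printf '%s\n' 'queen' 'queue' 'mark')

# Basic syntax: the | is an ordinary character, so no line matches.
# grep exits non-zero when nothing matches, hence the "|| true".
basic=$(printf '%s\n' "$words" | grep -c 'queen|queue' || true)

# Extended syntax: the | means "or", so both words match.
extended=$(printf '%s\n' "$words" | grep -E 'queen|queue')

echo "basic matches: $basic"
echo "$extended"
```

Without -E the pattern looks for the literal text queen|queue and finds nothing; with -E it returns queen and queue.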

Filtering Out Noise

Imagine we are getting a huge amount of input from a programme we are using which is giving us information we don’t care about right now. Often when we use grep, we use it to filter out things we don’t care about.

Imagine that, in the example from earlier, instead of searching for all lines that contain the letter c, we want the lines that omit it.

To do this, we have the option to use the inversion flag -v, short for --invert-match, which works like this:

$ grep -v 'c' grep-example.txt
vessel
shape
204.235.208.222
2049235%208a222
opine
disagreeable
lizards
mark
Hello
hello
queen
queue

$ grep --invert-match 'c' grep-example.txt
vessel
shape
204.235.208.222
2049235%208a222
opine
disagreeable
lizards
mark
Hello
hello
queen
queue

#Again, both these commands do the very same thing

This flag is especially useful when you are using grep to control the flow of data from one application to another (as you might do in a CI pipeline).
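
As a rough sketch of that idea, the build_output function below is a hypothetical stand-in for a build step, and its four log lines are made up. The point is only to show -v sitting in the middle of a pipe.

```shell
# Hypothetical build output; the DEBUG lines are the noise we want gone.
build_output() {
  printf '%s\n' 'DEBUG resolving dependencies' 'WARN  deprecated flag used' \
                'DEBUG cache hit' 'ERROR test failed'
}

# -v drops every line that matches, passing the rest on to the next stage.
filtered=$(build_output | grep -v 'DEBUG')
echo "$filtered"
```

Only the WARN and ERROR lines are left for whatever reads the output next.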

Case Insensitivity

The next thing we should take a look at is how to sift through a large amount of text for all instances of a string, regardless of upper or lower case. At the moment, we only know how to get grep to return one of the two hello strings in our file at a time:

$ grep 'Hello' grep-example.txt
Hello

$ grep 'hello' grep-example.txt
hello

To match both, grep has a special flag, --ignore-case, which (as the name suggests) ignores case when you do your searches. The short form of this flag is -i.

$ grep --ignore-case 'H' grep-example.txt
shape
Hello
hello

The abbreviated version of this command, grep -i 'H' grep-example.txt, would have accomplished the very same thing.

Multiple Files

These commands should get you started and give you some idea of how grep is used within a single file. But what if you have a large number of files spread over a large number of directories? This is probably how you are going to use grep a fair bit in reality and, fortunately, as we mentioned at the beginning of this article, it is a problem that grep was born to solve.

To show how grep works recursively, I’ve taken our example file from earlier and placed copies of it in several places within the directory structure of a project. I used the tree command (which is a pretty handy tool to get your hands on, in and of itself) to generate the diagram below.

$ tree nested_texts/

nested_texts/
├── grep-example.txt
└── nest
    ├── grep-example.txt
    └── nest
        └── grep-example.txt

Right, so to run a search recursively through every file in a tree, we can pass grep the -r flag as below (the long form of this flag being --recursive).

$ grep -r hello nested_texts/
nested_texts/grep-example.txt:hello
nested_texts/nest/grep-example.txt:hello
nested_texts/nest/nest/grep-example.txt:hello

Pretty handy, right? You can probably already start to see how this could be used to trawl through some of the larger repositories you work with.
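
If you would like to try this yourself, the sketch below rebuilds a similar layout in a temporary directory. The three-line file contents are a cut-down stand-in for the full example file, and the root variable is introduced just for the demo.

```shell
# Recreate a small nested layout in a throwaway directory.
root=$(mktemp -d)
mkdir -p "$root/nested_texts/nest/nest"
printf '%s\n' 'Hello' 'hello' 'queen' > "$root/nested_texts/grep-example.txt"
cp "$root/nested_texts/grep-example.txt" "$root/nested_texts/nest/"
cp "$root/nested_texts/grep-example.txt" "$root/nested_texts/nest/nest/"

# Search every file under nested_texts/; sort the results because the order
# in which grep walks directories is not guaranteed.
matches=$(cd "$root" && grep -r 'hello' nested_texts/ | sort)
echo "$matches"
rm -r "$root"
```

Each of the three copies contributes one matching line, prefixed with the path of the file it came from.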

A Side Note About Flags

That pretty much concludes everything I wanted to show you. However, there is one last thing that I couldn’t find a good way to fit in: when you use flags with grep (and a number of other Linux tools, for that matter), you can combine them. A couple of quick examples are below:

A command combining recursion with case insensitivity:

$ grep -ri 'H' nested_texts/
nested_texts/grep-example.txt:shape
nested_texts/grep-example.txt:Hello
nested_texts/grep-example.txt:hello
nested_texts/nest/grep-example.txt:shape
nested_texts/nest/grep-example.txt:Hello
nested_texts/nest/grep-example.txt:hello
nested_texts/nest/nest/grep-example.txt:shape
nested_texts/nest/nest/grep-example.txt:Hello
nested_texts/nest/nest/grep-example.txt:hello

#Returns every line containing the letter 'h', upper or lower case, in every file within the tree.

A command combining recursion with case insensitivity and line numbers. I’ve seen people use this a lot, and I’m sure you will too:

$ grep -rni 'H' nested_texts/
nested_texts/grep-example.txt:5:shape
nested_texts/grep-example.txt:14:Hello
nested_texts/grep-example.txt:15:hello
nested_texts/nest/grep-example.txt:5:shape
nested_texts/nest/grep-example.txt:14:Hello
nested_texts/nest/grep-example.txt:15:hello
nested_texts/nest/nest/grep-example.txt:5:shape
nested_texts/nest/nest/grep-example.txt:14:Hello
nested_texts/nest/nest/grep-example.txt:15:hello

#Same as before, but with line numbers
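
To convince yourself that a combined flag is just shorthand, you can compare it against the spelled-out version. This is a minimal sketch using a temporary three-line file; the variable names are made up for the demo.

```shell
# A tiny sample file in a temporary location.
sample=$(mktemp)
printf '%s\n' 'shape' 'mark' 'Hello' > "$sample"

# Same two flags, written together and written separately.
combined=$(grep -ni 'H' "$sample")
separate=$(grep -n -i 'H' "$sample")

echo "$combined"
rm "$sample"
```

Both variables end up holding the same two lines, 1:shape and 3:Hello.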

Conclusion

So, if you’re getting used to using grep for the first time, I hope this has served as a good introduction. Note that if you’re working with really large repositories there are faster tools available to you, but grep is still worth getting to know.

Thanks for reading.
