How to sed

Dive much deeper into sed here and here.

Keep in mind

Not all sed implementations are created equal.

This post is about the GNU version as it has a lot of cool features that OSX, the various BSDs and Busybox variants are missing.

The basics

Sed stands for Stream EDitor. You can edit a stream like this:

sh
echo "searching, seek and destroy" | sed 's/seek/destroy/g'

Or run the program directly on a file like this:

sed 's/seek/destroy/g' lightning.md
| | |
sed 'do_this' on_this

Let’s break down the ‘do_this’ part: Sed will Substitute seek with destroy Globally¹ within lightning.md.

As is the case with most terminal utilities, it outputs to stdout by default, so no changes will be made to our lightning.md file. We can pass it the -i flag to make the changes ‘in place’, i.e. overwrite the original file.

Of course, we can also redirect its output to a different file with >.

So given a file like:

This line contains the word line twice
This line also contains the word line twice

If we run a sed command like sed 's/line/potato/' test-one-line.md, it would print the following to stdout:

This potato contains the word line twice
This potato also contains the word line twice

Notice how we didn’t use the Global¹ scope, so sed replaced only the first instance of line on each line.

Using the -i flag it will overwrite the file instead of printing to stdout.
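Here is a minimal sketch of the three output options, assuming GNU sed. The temporary directory and the file name thunder.md are made up for the example; -i.bak is the GNU form of -i that keeps a backup copy next to the edited file.

```sh
# Work in a throwaway directory so no real file is overwritten.
tmp=$(mktemp -d)
printf 'searching, seek and destroy\n' > "$tmp/lightning.md"

# 1. Default: the result goes to stdout, the file stays untouched.
to_stdout=$(sed 's/seek/destroy/g' "$tmp/lightning.md")
echo "$to_stdout"

# 2. Redirect stdout to a different (hypothetical) file.
sed 's/seek/destroy/g' "$tmp/lightning.md" > "$tmp/thunder.md"

# 3. Edit in place, keeping the original as lightning.md.bak.
sed -i.bak 's/seek/destroy/g' "$tmp/lightning.md"
```

Keeping a backup suffix while experimenting is a cheap safety net, since -i on its own leaves no way back.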

Quality of life

Always quote

Notice the ' in sed 's/seek/destroy/g'. Single quotes stop the shell from interpreting characters such as $, * or \ before sed gets to see them, so the regex reaches sed exactly as written.
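A small sketch of what sed actually receives with each kind of quote. The variable name is made up for the example.

```sh
name=world

# Single quotes: sed receives s/$name/there/ verbatim.
# In a basic regex a $ that is not at the end of the pattern is a literal $.
single=$(echo 'hello $name' | sed 's/$name/there/')

# Double quotes: the shell expands $name first, so sed receives s/world/there/.
double=$(echo 'hello world' | sed "s/$name/there/")

echo "$single"
echo "$double"
```

Double quotes are still handy when you do want a shell variable inside the command. Just be aware that the shell gets the first pass at the text.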

Extended Regex

By default, only basic regex (BRE) is enabled: some special characters (like . or *) work as usual, while others (like + or ?) are taken literally unless you escape them (GNU sed accepts \+ and \? in basic mode).

We can choose to use Extended regex by passing the -E flag to the command. Give this a try if your regex doesn’t work as expected.

Learn more about regex here.
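To make the difference concrete, here is the same pattern in basic and extended mode, assuming GNU sed:

```sh
# Basic regex: + is a literal plus sign, so only "a+" is matched.
basic=$(echo 'aaa+' | sed 's/a+/X/')

# Extended regex: + means "one or more", so "aaa" is matched.
extended=$(echo 'aaa+' | sed -E 's/a+/X/')

# GNU basic regex: escaping the + turns it into the quantifier.
escaped=$(echo 'aaa+' | sed 's/a\+/X/')

echo "$basic"     # aaX
echo "$extended"  # X+
echo "$escaped"   # X+
```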

Pick a convenient delimiter

Usually, sed examples are shown with the / char as a delimiter.

For this to work, every literal / within the command needs to be escaped.

You might find it useful to switch delimiter, especially when using sed on paths:

sed 's/\/bin\/bash\//\/bin\/sh\//g' -> sed 's:/bin/bash/:/bin/sh/:g' or sed 's_/bin/bash/_/bin/sh/_g'

Sed doesn’t really care what you use as long as you are consistent with it.
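A quick check that both spellings do the same job, using a shebang line as sample input:

```sh
# Escaped slashes: works, but hard to read.
slashes=$(echo '#!/bin/bash' | sed 's/\/bin\/bash/\/bin\/sh/')

# Colon as delimiter: no escaping needed for the path.
colons=$(echo '#!/bin/bash' | sed 's:/bin/bash:/bin/sh:')

echo "$slashes"  # #!/bin/sh
echo "$colons"   # #!/bin/sh
```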

Simple but useful

Remove all EOL spaces

sh
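sed -E 's/\s+$//'

Remove all spaces at the end of all lines in the given file.

The \s is a GNU shorthand for any whitespace character (spaces and tabs included), and the + makes it match the whole trailing run rather than a single character. You can learn more about it here.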
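A quick check of the difference between matching one trailing whitespace character and matching the whole trailing run, assuming GNU sed (for \s and -E). The brackets are only there to make the remaining spaces visible.

```sh
# A line ending in three spaces.
line='keep me   '

# \s$ removes a single trailing whitespace character.
one=$(printf '%s\n' "$line" | sed 's/\s$//')

# \s+$ removes the whole run of trailing whitespace.
all=$(printf '%s\n' "$line" | sed -E 's/\s+$//')

echo "[$one]"  # [keep me  ]
echo "[$all]"  # [keep me]
```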

Delete all instances of word

sh
sed 's/foo//g'

Delete all instances of foo.

You might be tempted to use something like s/.*foo.*//g to delete any line containing foo.

Don’t, it will leave an empty line in its place. There is a delete command for this use case.
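The difference is easy to see on a three-line input:

```sh
# Blanking the line with s leaves an empty line behind.
blanked=$(printf 'one\nfoo here\ntwo\n' | sed 's/.*foo.*//')

# The d command removes the line altogether.
deleted=$(printf 'one\nfoo here\ntwo\n' | sed '/foo/d')

printf '%s\n' "$blanked"  # one, (empty line), two
printf '%s\n' "$deleted"  # one, two
```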

Only in nth instance

sh
sed 's/lorem/ipsum/2'

Replace lorem with ipsum only on the 2nd instance of lorem on every line.

Only from nth instance

sh
sed 's/lorem/ipsum/2g'

Replace lorem with ipsum from the 2nd instance of lorem on every line until the end of the line.
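Side by side, on a line with four instances (the combined 2g form is a GNU feature):

```sh
line='lorem lorem lorem lorem'

# Only the 2nd instance.
second=$(echo "$line" | sed 's/lorem/ipsum/2')

# The 2nd instance and every one after it.
from_second=$(echo "$line" | sed 's/lorem/ipsum/2g')

echo "$second"       # lorem ipsum lorem lorem
echo "$from_second"  # lorem ipsum ipsum ipsum
```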

The not-so-basics

Only on matching lines

sh
sed '/^foo/ s/hi/mom/' file

Replace hi with mom only on lines that start with foo.

For example, to migrate CSS classes from snake_case to camelCase, without compromising their properties, you might use something like:

sh
sed -E '/\{$/ s/_([a-z])/\u\1/g' file.css

This only applies the substitution on lines that end with {.

If that looks like a bunch of random symbols to you, check out this post.
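Here is a sketch of that idea on a tiny, made-up stylesheet, assuming GNU sed (for \u). It deliberately uses the narrow pattern _([a-z]) because \w also matches underscores, and / as the delimiter because the pattern itself has to contain a _.

```sh
# Hypothetical CSS: one selector line and one property line.
css_in=$(printf '%s\n' '.main_nav_bar {' '  content: "snake_case";' '}')

# Only lines ending in { are touched; each _x becomes X.
css_out=$(printf '%s\n' "$css_in" | sed -E '/\{$/ s/_([a-z])/\u\1/g')

printf '%s\n' "$css_out"
# .mainNavBar {
#   content: "snake_case";
# }
```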

Between matching lines

You can apply a command only within a certain (variable) range:

sh
sed '/#region/,/#endregion/s/foo/bar/' file.cs
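For instance, with a foo before, inside and after the region, only the one inside is touched:

```sh
ranged=$(printf 'foo\n#region\nfoo\n#endregion\nfoo\n' \
  | sed '/#region/,/#endregion/s/foo/bar/')

printf '%s\n' "$ranged"
# foo
# #region
# bar
# #endregion
# foo
```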

Re-use the match

You can use & to represent the match:

sh
echo "what a nice example, this is a cool program!" | sed -E 's/(nice|cool)/VERY&/g'

Would output:

what a VERYnice example, this is a VERYcool program!
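Two more small uses of &, as a sketch: wrapping every match in brackets, and escaping & when you want a literal ampersand in the replacement.

```sh
# & stands for whatever the pattern matched.
wrapped=$(echo 'ports 8080 and 443' | sed -E 's/[0-9]+/[&]/g')

# \& is a literal ampersand.
literal=$(echo 'rock and roll' | sed 's/and/\&/')

echo "$wrapped"  # ports [8080] and [443]
echo "$literal"  # rock & roll
```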

Case-insensitive

You can add an i at the end to make the match case-insensitive:

sh
sed 's/foo/bar/gi'

Which means:

foo Foo -> bar bar
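A one-line check, assuming GNU sed (the case-insensitive flag is one of its extensions):

```sh
insensitive=$(echo 'foo Foo FOO' | sed 's/foo/bar/gi')

echo "$insensitive"  # bar bar bar
```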

Negate matches

You can tell sed to do its magic only on lines not matching a given pattern:

sh
sed '/^foo bar baz.*/! s/foo bar/hi mom/' afile.txt

This would replace foo bar with hi mom except on lines that start with foo bar baz.
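On a two-line sample, only the line that does not start with foo bar baz changes:

```sh
negated=$(printf 'foo bar baz stays\nsay foo bar\n' \
  | sed '/^foo bar baz.*/! s/foo bar/hi mom/')

printf '%s\n' "$negated"
# foo bar baz stays
# say hi mom
```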

Output replacements to separate file

You can write the lines affected by sed to a separate file with w:

sh
sed 's_foo_bar_w replacementsFile' fileToModify
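A sketch of what ends up in each place, run inside a temporary directory so the file names from the example above do not clash with anything real:

```sh
tmp=$(mktemp -d)
printf 'foo one\nnothing\nfoo two\n' > "$tmp/fileToModify"

# stdout still receives every line; replacementsFile only gets the changed ones.
all_lines=$(cd "$tmp" && sed 's_foo_bar_w replacementsFile' fileToModify)
changed_lines=$(cat "$tmp/replacementsFile")

printf '%s\n' "$changed_lines"
# bar one
# bar two
```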

Substitute multiple lines

By default, sed uses \n chars as line delimiters, so multi-line substitutions are non-trivial.

Thankfully, the GNU version supports the -z flag, which tells sed to use NUL as the line delimiter.

This allows you to get a bit fancy and do things like:

sh
sed -z 's_line one\nline two_merged lines one and two_g'

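Keep in mind, however, that the whole file is now treated as a single ‘line’ (as long as it contains no NUL bytes), so ^ and $ match the start and end of the file rather than of each line. It also changes what the g at the end of the command means: without it, only the first match in the entire file is replaced.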

Sadly, non-GNU implementations of sed require a bit more ‘sed-Fu’ to achieve this.
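A short sketch of both effects, assuming GNU sed and an input without NUL bytes:

```sh
# The pattern can now span a line break.
merged=$(printf 'line one\nline two\nline three\n' \
  | sed -z 's_line one\nline two_merged lines one and two_g')

# ^ only matches once, at the very start of the input.
anchored=$(printf 'a\nb\n' | sed -z 's/^/> /')

printf '%s\n' "$merged"
# merged lines one and two
# line three
printf '%s\n' "$anchored"
# > a
# b
```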

Groupings and References

You can leverage the magic of Groupings and References to, for example, switch words around:

sh
sed -E 's:([a-zA-Z]*) ([a-zA-Z]*):\2 \1:' file

Which means:

World Hello -> Hello World
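The same trick works with more than two groups. As a sketch, here is a made-up date being reordered from year-month-day to day.month.year:

```sh
swapped=$(echo 'World Hello' | sed -E 's:([a-zA-Z]*) ([a-zA-Z]*):\2 \1:')

# Three groups, referenced in reverse order.
date_out=$(echo '2024-01-31' | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3.\2.\1/')

echo "$swapped"   # Hello World
echo "$date_out"  # 31.01.2024
```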

Want a better use case?

sh
sed -E 's_(.+?)\[(.+?)\]\(([^)]+)\)(.+?)_\1\2[^\3]\4\n\n\n[^\3]: \3\n_g' book.md

😓

Let’s take it apart:

The ‘search’ part looks like this: (.+?)\[(.+?)\]\(([^)]+)\)(.+?).

The first and last groupings are pretty simple: ‘whatever goes before/after the mess in between’.

That leaves us with \[(.+?)\]\(([^)]+)\), which looks like a mess because we have to escape a lot of parentheses and square brackets.

There are two distinct zones to this regex: \[(.+?)\] and \(([^)]+)\).

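The first means ‘everything inside [square brackets]’, while the second could also be written like \((.+?)\) (which is pretty much the same as the other one, except for the different brackets). Keep in mind that sed’s regex flavors have no lazy quantifiers, so the ? in .+? does not make the match stop early; the negated class [^)]+ is the safer way to stop at the first closing parenthesis.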

Want to know why to use one instead of the other? Check out this post.

So we have four groups:

  1. Everything before
  2. Everything within []
  3. Everything within ()
  4. Everything after

Replace

On the other hand, the ‘replace’ part reads \1\2[^\3]\4\n\n\n[^\3]: \3\n.

We can see that there are two parts to this mess: \1\2[^\3]\4 and [^\3]: \3, with a bunch of line breaks (\n) here and there.

Notice also how the ‘[square brackets]’ are not escaped here.

The first part simply removes all the brackets and parentheses from the match, while enclosing the third grouping in square brackets and prefixing it with a ^.

So text [looks like](a-link) more text becomes text looks like[^a-link] more text.

The second half repeats the previous behavior regarding the third grouping while adding it again after a : and a white space.

Taking into account the line breaks, text [looks like](a-link) more text becomes:

text looks like[^a-link] more text
[^a-link]: a-link

So we successfully turned Markdown links into Markdown references, without breaking the rest of the line.

Keep in mind that this command will hammer through images (![image-text](image-link)) as well. You might want to negate those matches with something like /!.*/!.

Also, this command won’t behave nicely on lines with two or more links.
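To try this out safely, here is a sketch of a variant run on two sample lines, assuming GNU sed (for \n in the replacement). It is not the exact command from above: it uses plain greedy .* for the groups, and it skips lines containing ![ with a negated address so that images are left alone.

```sh
sample=$(printf '%s\n' \
  'text [looks like](a-link) more text' \
  '![image-text](image-link)')

# /!\[/! means: only run the substitution on lines that do NOT contain "![".
converted=$(printf '%s\n' "$sample" \
  | sed -E '/!\[/! s_(.*)\[(.*)\]\(([^)]+)\)(.*)_\1\2[^\3]\4\n\n\n[^\3]: \3\n_')

printf '%s\n' "$converted"
```

The first output line is the rewritten sentence, the reference definition follows after two empty lines, and the image line comes out unchanged.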

Was it a headache? Yes.

Was it more of a headache than doing it by hand on a 400+ page, heavily referenced book? Hell no!

Change cases

Here are some of the GNU specific goodies mentioned earlier:

\l Turn the next character to lowercase.
\L Apply \l until a \U or \E is found.
\u Turn the next character to uppercase.
\U Apply \u until a \L or \E is found.
\E End case conversion started by \L or \U.

So to give a simple example, you can ensure all headings in a .md file start with upper case letters by running this:

sh
sed -E 's/^(#+) (\w+)/\1 \u\2/' cases.md

Which means:

## all caps -> ## All caps
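A small sketch combining several of these, assuming GNU sed: \U upper-cases the whole first word, \E stops the conversion, and \u only touches the first letter of the second word.

```sh
heading=$(echo '## all caps' | sed -E 's/^(#+) (\w+)/\1 \u\2/')
mixed=$(echo 'hello world' | sed -E 's/(\w+) (\w+)/\U\1\E \u\2/')

echo "$heading"  # ## All caps
echo "$mixed"    # HELLO World
```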

Concatenate multiple commands

Sometimes doing everything in one go is a bit of a headache or actually impossible.

You can pipe sed commands using the shell (|) or chain them in a single call by adding the -e flag before each one:

sh
sed -Ee 's/(^#+) (\w+)/\1 \u\2/' -e 's/foo/bar/g' cases.md

This way, the file is read once and the commands are run one after the other on each line.
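As a sketch, here are the three ways of chaining side by side. The last one separates the commands with a semicolon inside a single script, which sed also accepts.

```sh
input='# title foo'

# Two sed processes connected by a shell pipe.
piped=$(echo "$input" | sed -E 's/^(#+) (\w+)/\1 \u\2/' | sed 's/foo/bar/g')

# One process, one -e per command.
with_e=$(echo "$input" | sed -E -e 's/^(#+) (\w+)/\1 \u\2/' -e 's/foo/bar/g')

# One process, commands separated by a semicolon.
with_semicolon=$(echo "$input" | sed -E 's/^(#+) (\w+)/\1 \u\2/; s/foo/bar/g')

echo "$with_e"  # # Title bar
```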

More than substitutions

Sed is a stream editor, so you can do much more than substitutions with it.

Delete

To delete any line containing the word vim you could do:

sh
sed '/vim/d' file

For a more useful example, you could delete empty lines with:

sh
sed '/^$/d' file

Or delete commented lines (starting with #) like so:

sh
sed '/^#/d' file

Or negate the whole thing and delete everything but commented lines:

sh
sed -E '/^#/!d' file
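Here are those delete commands run against a small, made-up config snippet:

```sh
conf=$(printf '%s\n' '# comment' '' 'key=value' '# another' 'other=1')

# Drop empty lines and comments in one go.
settings=$(printf '%s\n' "$conf" | sed '/^$/d; /^#/d')

# Keep only the comments.
only_comments=$(printf '%s\n' "$conf" | sed '/^#/!d')

printf '%s\n' "$settings"
# key=value
# other=1
printf '%s\n' "$only_comments"
# # comment
# # another
```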

Print

You can tell sed to print the lines where replacements are made with p:

sh
sed 's/foo/bar/p' file

You can also simulate grep-like behavior with something like sed '/re/p' file (familiar?), which would simply print every line matching re.

Of course, sed prints every line by default, so without the -n flag you end up with the lines you are interested in printed twice and everything else printed once.

Use the -n flag to make it behave as expected (which is to only print matching lines).

For a more practical example, you can print the lines between two matches:

sh
sed -nE '/between-this/,/and-this/p' file
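The effect of -n is easiest to see side by side:

```sh
# Without -n: every line is printed, and matching lines are printed again by p.
twice=$(printf 'foo\nbar\n' | sed '/foo/p')

# With -n: only what p prints explicitly.
once=$(printf 'foo\nbar\n' | sed -n '/foo/p')

printf '%s\n' "$twice"
# foo
# foo
# bar
printf '%s\n' "$once"
# foo
```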

Append, Insert and Change

Append text on a new line after each line containing the given text:

sh
sed '/foo/a\AFTER FOO' file

Insert text on a new line before each line containing the given text:

sh
sed '/foo/i\BEFORE FOO' file

Replace each line containing the given text:

sh
sed '/bar/c\BAR IS CHANGED' file
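All three on the same two-line input, as a sketch that assumes GNU sed’s one-line form of these commands:

```sh
appended=$(printf 'foo\nbar\n' | sed '/foo/a\AFTER FOO')
inserted=$(printf 'foo\nbar\n' | sed '/foo/i\BEFORE FOO')
changed=$(printf 'foo\nbar\n' | sed '/bar/c\BAR IS CHANGED')

printf '%s\n' "$appended"
# foo
# AFTER FOO
# bar
printf '%s\n' "$inserted"
# BEFORE FOO
# foo
# bar
printf '%s\n' "$changed"
# foo
# BAR IS CHANGED
```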

Footnotes

  1. sed operates on a per-line basis, so when we determine the scope (Global in the example), we are referring to the scope within each line.

