Dive much deeper into sed here and here.
Keep in mind
Not all sed implementations are created equal.
This post is about the GNU version as it has a lot of cool features that OSX, the various BSDs and Busybox variants are missing.
The basics
Sed stands for Stream EDitor. You can edit a stream like this:
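A minimal sketch of the stream form, assuming GNU sed. The text piped in is a made-up sample:

```shell
# Pipe some text into sed and capture what it writes to stdout.
out=$(printf 'seek and seek again\n' | sed 's/seek/destroy/g')
printf '%s\n' "$out"   # destroy and destroy again
```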
Or run the program directly on a file like this:
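Run against a file, the same substitution might look like the sketch below. The file contents are invented, and lightning.md is created in a temporary directory so nothing real is touched:

```shell
# Create a throwaway lightning.md with hypothetical contents.
workdir=$(mktemp -d)
printf 'seek\nseek and seek\n' > "$workdir/lightning.md"

# sed reads the file and prints the edited text to stdout.
out=$(sed 's/seek/destroy/g' "$workdir/lightning.md")
printf '%s\n' "$out"
```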
Let's break down the "do_this" part: sed will Substitute seek with destroy Globally within lightning.md.
As is the case with most terminal utilities, sed writes to stdout by default, so no changes are made to our lightning.md file. We can pass it the -i flag to make the changes "in place", i.e. overwrite the original file.
Of course, we can also redirect its output to a different file with >.
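Both options side by side, as a rough sketch assuming GNU sed. The file contents and the thunder.md name are hypothetical:

```shell
workdir=$(mktemp -d)
printf 'seek\n' > "$workdir/lightning.md"

# Redirect: the result goes to a new file, the original is untouched.
sed 's/seek/destroy/g' "$workdir/lightning.md" > "$workdir/thunder.md"
before=$(cat "$workdir/lightning.md")   # still "seek"

# In place: the original file itself is rewritten.
# (GNU syntax; BSD/macOS sed expects a suffix argument after -i.)
sed -i 's/seek/destroy/g' "$workdir/lightning.md"
after=$(cat "$workdir/lightning.md")    # now "destroy"
copy=$(cat "$workdir/thunder.md")       # "destroy"
```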
So given a file like:
If we run a sed command like sed 's/line/potato/' test-one-line.md, it would print the following to stdout:
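To make this concrete, here is a sketch with an invented two-line test-one-line.md (the contents are an assumption, chosen so that each line contains line twice):

```shell
workdir=$(mktemp -d)
printf 'first line of lines\nsecond line, same line\n' > "$workdir/test-one-line.md"

# No g flag, so only the first match on each line is replaced.
out=$(sed 's/line/potato/' "$workdir/test-one-line.md")
printf '%s\n' "$out"
# first potato of lines
# second potato, same line
```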
Notice how we didn't use the Global flag, so sed replaced only the first instance of line on each of the two lines.
With the -i flag, sed overwrites the file instead of printing to stdout.
Quality of life
Always quote
Notice the ' in sed 's/seek/destroy/g'. Single quotes stop the shell from interpreting any special characters in the regex before sed gets to see them.
Extended Regex
By default, only basic regex is enabled, which lets you use some special characters (like . or *) while others are taken literally (like + or ?).
We can switch to extended regex by passing the -E flag to the command. Give this a try if your regex doesn't work as expected.
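A quick way to see the difference, sketched with GNU sed and a made-up input:

```shell
# Basic regex: + is a literal plus sign, so nothing matches.
basic=$(printf 'foooo\n' | sed 's/o+/0/')

# Extended regex: + means "one or more", so the run of o's is replaced.
extended=$(printf 'foooo\n' | sed -E 's/o+/0/')

printf '%s\n%s\n' "$basic" "$extended"
# foooo
# f0
```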
Learn more about regex here.
Pick a convenient delimiter
Usually, sed examples are shown with the / char as a delimiter.
For this to work, every / within the command needs to be escaped.
You might find it useful to switch delimiter, especially when using sed on paths:
sed 's/\/bin\/bash\//\/bin\/sh\//g'
-> sed 's:/bin/bash/:/bin/sh/:g'
or sed 's_/bin/bash/_/bin/sh/_g'
Sed doesn't really care what you use as long as you are consistent with it.
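As a small check that the variants really are equivalent, here is a sketch run on a made-up path:

```shell
# Same substitution written with three different delimiters.
escaped=$(printf '/bin/bash/run.sh\n' | sed 's/\/bin\/bash\//\/bin\/sh\//g')
colon=$(printf '/bin/bash/run.sh\n' | sed 's:/bin/bash/:/bin/sh/:g')
underscore=$(printf '/bin/bash/run.sh\n' | sed 's_/bin/bash/_/bin/sh/_g')

printf '%s\n' "$escaped" "$colon" "$underscore"   # /bin/sh/run.sh, three times
```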
Simple but useful
Remove all EOL spaces
Remove all spaces at the end of all lines in the given file.
The \s is simply a way of representing whitespace. You can learn more about it here.
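One possible form of the trailing-whitespace cleanup, as a sketch. It assumes GNU sed (\s and \+ are GNU extensions in basic regex) and uses invented input:

```shell
# Lines end with spaces or a tab; \s\+$ matches that trailing run.
out=$(printf 'hello   \nworld\t\nclean\n' | sed 's/\s\+$//')
printf '%s\n' "$out"
```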
Delete all instances of word
Delete all instances of foo.
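Deleting a word is just a substitution with an empty replacement. A sketch on made-up input:

```shell
# Replace every foo with nothing.
out=$(printf 'foobar and barfoo\n' | sed 's/foo//g')
printf '%s\n' "$out"   # bar and bar
```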
You might be tempted to use something like s/.*foo.*//g to delete any line containing foo.
Don't: it will leave an empty line in its place. There is a delete command for this use case.
Only in nth instance
Replace lorem with ipsum only on the 2nd instance of lorem in every line.
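A numeric flag after the last delimiter selects which match to replace. A sketch with invented input:

```shell
# The 2 flag targets only the second match on each line.
out=$(printf 'lorem lorem lorem\n' | sed 's/lorem/ipsum/2')
printf '%s\n' "$out"   # lorem ipsum lorem
```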
Only from nth instance
Replace lorem with ipsum from the 2nd instance of lorem in every line onward, until the end of the line.
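Combining the number with g gives the "from the nth match onward" behavior. This combination is a GNU sed feature; the input is made up:

```shell
# 2g: skip the first match, replace the second and every later one.
out=$(printf 'lorem lorem lorem\n' | sed 's/lorem/ipsum/2g')
printf '%s\n' "$out"   # lorem ipsum ipsum
```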
The not-so-basics
Only on matching lines
Replace hi with mom only on lines that start with foo.
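An address pattern in front of the s command restricts where it runs. A sketch on invented input:

```shell
# /^foo/ selects the lines; s/hi/mom/g runs only on those.
out=$(printf 'foo says hi\nbar says hi\n' | sed '/^foo/s/hi/mom/g')
printf '%s\n' "$out"
# foo says mom
# bar says hi
```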
For example, to migrate CSS classes from snake_case to camelCase, without compromising their properties, you might use something like:
This only applies the substitution to lines that end with {.
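One way such a command could look, as a sketch rather than the exact original. It assumes GNU sed (for \U) and a made-up stylesheet snippet:

```shell
# Only lines ending in { are touched; each _x becomes X.
# The url(...) value on the second line keeps its underscore.
css='.main_nav_item {\n  background: url(my_image.png);\n}\n'
out=$(printf "$css" | sed -E '/\{$/s/_([a-z])/\U\1/g')
printf '%s\n' "$out"
# .mainNavItem {
#   background: url(my_image.png);
# }
```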
If that looks like a bunch of random symbols to you, check out this post.
Between matching lines
You can apply a command only within a certain (variable) range:
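A sketch of a range address, where BEGIN and END are hypothetical marker lines:

```shell
# The substitution runs from the first line matching BEGIN
# through the next line matching END, inclusive.
out=$(printf 'foo\nBEGIN\nfoo\nEND\nfoo\n' | sed '/BEGIN/,/END/s/foo/bar/')
printf '%s\n' "$out"
```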
Re-use the match
You can use & to represent the match:
Would output:
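For instance, wrapping the matched word in brackets (the input is a made-up sample):

```shell
# & in the replacement stands for whatever the pattern matched.
out=$(printf 'hello world\n' | sed 's/world/[&]/')
printf '%s\n' "$out"   # hello [world]
```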
Case-insensitive
You can add an i flag at the end to make the match case-insensitive:
Which means:
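A small illustration, assuming GNU sed (the i flag is a GNU extension) and invented input:

```shell
# g replaces every match, i ignores case while matching.
out=$(printf 'Foo FOO foo\n' | sed 's/foo/bar/gi')
printf '%s\n' "$out"   # bar bar bar
```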
Negate matches
You can tell sed to do its magic only on lines not matching a given pattern:
This would replace foo bar with hi mom except in lines that start with foo bar baz.
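Sketched out with made-up input, a negated address looks like this:

```shell
# The ! after the address inverts it: run s on every line
# that does NOT start with "foo bar baz".
out=$(printf 'foo bar\nfoo bar baz\nsay foo bar\n' | sed '/^foo bar baz/!s/foo bar/hi mom/g')
printf '%s\n' "$out"
```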
Output replacements to separate file
You can write the lines affected by sed to a separate file with w:
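A rough sketch of the w flag. The file names input.txt and changed.txt are hypothetical, and everything happens in a temporary directory:

```shell
workdir=$(mktemp -d)
printf 'foo\nbaz\nfoo again\n' > "$workdir/input.txt"

# w must be the last flag; the rest of the line is the output file name.
out=$(cd "$workdir" && sed 's/foo/bar/w changed.txt' input.txt)

# changed.txt holds only the lines where a replacement happened.
changed=$(cat "$workdir/changed.txt")
```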
Substitute multiple lines
By default, sed uses \n chars as line delimiters, so multi-line substitutions are non-trivial.
Thankfully, the GNU version supports the -z flag, which tells sed to use NUL as the line delimiter.
This allows you to get a bit fancy and do things like:
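A minimal sketch of a substitution that spans a line break, assuming GNU sed and made-up input:

```shell
# With -z the whole input is one record, so \n can be matched directly.
out=$(printf 'one\ntwo\nfour\n' | sed -z 's/one\ntwo/three/')
printf '%s\n' "$out"
# three
# four
```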
Keep in mind, however, that with -z the anchors ^ and $ refer to the start and end of the whole file (which ends at NUL) instead of each line, which also changes the scope of the g at the end of the command.
Sadly, non-GNU implementations of sed require a bit more "sed-Fu" to achieve this.
Groupings and References
You can leverage the magic of Groupings and References to, for example, switch words around:
Which means:
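For example, swapping two words could be sketched like this (the input is invented):

```shell
# Each (...) captures a word; \2 \1 emits them in reverse order.
out=$(printf 'hello world\n' | sed -E 's/([a-z]+) ([a-z]+)/\2 \1/')
printf '%s\n' "$out"   # world hello
```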
Want a better use case?
Let's take it apart:
Search
The "search" part looks like this: (.+?)\[(.+?)\]\(([^)]+)\)(.+?).
The first and last groupings are pretty simple: "whatever goes before/after the mess in between".
That leaves us with \[(.+?)\]\(([^)]+)\), which looks like a mess because we have to escape a lot of parentheses and square brackets.
There are two distinct zones to this regex: \[(.+?)\] and \(([^)]+)\).
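The first means "everything inside [square brackets]", while the second could also be written like \((.+?)\) (which is pretty much the same as the other one, except for the different brackets). Keep in mind that sed's regex flavor has no lazy quantifiers, so .+? still matches greedily and [^)]+ is the form that reliably stops at the first closing parenthesis.
Want to know why to use one instead of the other? Check out this post.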
So we have four groups:
- Everything before
- Everything within []
- Everything within ()
- Everything after
Replace
On the other hand, the "replace" part reads \1\2[^\3]\4\n\n\n[^\3]: \3\n.
We can see that there are two parts to this mess: \1\2[^\3]\4 and [^\3]: \3, with a bunch of line breaks (\n) here and there.
Notice also how the square brackets are not escaped here.
The first part simply removes all the brackets from the match, while enclosing the third grouping in square brackets and prepending it with a ^.
So text [looks like](a-link) more text becomes text looks like[^a-link] more text.
The second half repeats the previous behavior regarding the third grouping while adding it again after a : and a white space.
Taking into account the line breaks, text [looks like](a-link) more text becomes:
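Putting the search and replace parts quoted above into a single command gives something along these lines. The -E flag and the absence of a trailing g are assumptions, and GNU sed is required for \n in the replacement:

```shell
# Search and replace parts are copied from the text above;
# the sample line is the one used in the walkthrough.
out=$(printf 'text [looks like](a-link) more text\n' |
  sed -E 's/(.+?)\[(.+?)\]\(([^)]+)\)(.+?)/\1\2[^\3]\4\n\n\n[^\3]: \3\n/')
printf '%s\n' "$out"
# text looks like[^a-link] more text
#
#
# [^a-link]: a-link
```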
So we successfully turned Markdown links into Markdown references, without breaking the rest of the line.
Keep in mind that this command will hammer through images (![image-text](image-link)) as well. You might want to negate those matches with something like /!.*/!.
Also, this command won't behave nicely on lines with two or more links.
Was it a headache? Yes.
Was it more of a headache than doing it by hand on a 400+ page, heavily referenced book? Hell no!
Change cases
Here are some of the GNU-specific goodies mentioned earlier:
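A short tour of the case-conversion escapes GNU sed accepts in the replacement, sketched on made-up input:

```shell
# \U and \L convert everything that follows; \u and \l only the next character.
upper=$(printf 'hello world\n' | sed 's/.*/\U&/')       # HELLO WORLD
lower=$(printf 'HELLO WORLD\n' | sed 's/.*/\L&/')       # hello world
first_up=$(printf 'hello world\n' | sed 's/.*/\u&/')    # Hello world
first_low=$(printf 'HELLO WORLD\n' | sed 's/.*/\l&/')   # hELLO WORLD
```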
So to give a simple example, you can ensure all headings in a .md file start with upper case letters by running this:
Which means:
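One possible version of that command, shown as a sketch on piped input rather than on a real .md file. The exact pattern is an assumption:

```shell
# Group 1 is the run of # plus spaces, group 2 the first letter of the title.
# \u upper-cases just the next character, i.e. the start of \2.
out=$(printf '# title\n## another heading\nplain text\n' | sed -E 's/^(#+ +)([a-z])/\1\u\2/')
printf '%s\n' "$out"
# # Title
# ## Another heading
# plain text
```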
Concatenate multiple commands
Sometimes doing everything in one go is a bit of a headache or actually impossible.
You can pipe sed commands using the shell (|) or add the -e flag before each of them:
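Both styles, side by side, in a sketch with invented input:

```shell
# Two sed processes chained with a pipe.
piped=$(printf 'foo bar\n' | sed 's/foo/hi/' | sed 's/bar/mom/')

# One sed process running two commands in order.
chained=$(printf 'foo bar\n' | sed -e 's/foo/hi/' -e 's/bar/mom/')

printf '%s\n%s\n' "$piped" "$chained"   # hi mom, twice
```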
This way, the file is read once and the commands are run one after the other on each line.
More than substitutions
Sed is a stream editor, so you can do much more than substitutions with it.
Delete
To delete any line containing the word vim you could do:
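A sketch of the d command with a pattern address (the input is made up):

```shell
# Every line matching /vim/ is dropped from the output.
out=$(printf 'emacs\nvim rocks\nnano\n' | sed '/vim/d')
printf '%s\n' "$out"
```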
For a more useful example, you could delete empty lines with:
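Empty lines are the ones where start and end of line are adjacent, so a sketch could be:

```shell
# ^$ matches lines with nothing between start and end.
out=$(printf 'a\n\nb\n\n\nc\n' | sed '/^$/d')
printf '%s\n' "$out"
```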
Or delete commented lines (starting with #) like so:
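Sketched on a made-up config snippet:

```shell
# Lines whose first character is # are removed.
out=$(printf '# a comment\nkey=value\n# another\n' | sed '/^#/d')
printf '%s\n' "$out"   # key=value
```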
Or negate the whole thing and delete everything but commented lines:
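The same snippet with the address negated, again as a sketch:

```shell
# ! inverts the address: delete every line that is NOT a comment.
out=$(printf '# a comment\nkey=value\n# another\n' | sed '/^#/!d')
printf '%s\n' "$out"
```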
You can tell sed to print the lines where replacements are made with p:
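A sketch of the p flag on a substitution. Pairing it with -n (which suppresses the default output) is an assumption here, made so that only the changed lines show up:

```shell
# -n silences normal printing; p prints lines where s made a replacement.
out=$(printf 'foo\nbaz\nfoo too\n' | sed -n 's/foo/bar/p')
printf '%s\n' "$out"
```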
You can also simulate grep-like behavior with something like sed '/re/p' file (familiar?), which prints every line matching re.
Of course, by default sed prints every line anyway, so you end up with the lines you are interested in printed twice.
Use the -n flag to make it behave as expected (which is to only print matching lines).
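The difference is easy to see on a tiny made-up input:

```shell
input='red\nblue\ngreen\n'

# Without -n: every line is printed, matching lines twice.
twice=$(printf "$input" | sed '/re/p')

# With -n: only the lines matching /re/ are printed.
once=$(printf "$input" | sed -n '/re/p')
```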
For a more practical example, you can print the lines between two matches:
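A sketch using a range address, with BEGIN and END as hypothetical markers:

```shell
# Print only the block from the BEGIN line through the END line.
out=$(printf 'intro\nBEGIN\nbody\nEND\noutro\n' | sed -n '/BEGIN/,/END/p')
printf '%s\n' "$out"
```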
Append, Insert and Change
Append text on a new line after each line containing the given text:
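A sketch using GNU sed's one-line form of the a command (other implementations may want the text on its own line after a backslash):

```shell
# After each line matching /foo/, add a new line containing "bar".
out=$(printf 'foo\nbaz\n' | sed '/foo/a bar')
printf '%s\n' "$out"
```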
Insert text on a new line before each line containing the given text:
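The i command works the same way but places the text first. Again a sketch in GNU's one-line form:

```shell
# Before each line matching /foo/, add a new line containing "bar".
out=$(printf 'foo\nbaz\n' | sed '/foo/i bar')
printf '%s\n' "$out"
```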
Change line containing the given text:
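And the c command swaps the whole line out, sketched with the same made-up input and GNU's one-line form:

```shell
# Each line matching /foo/ is replaced entirely by "bar".
out=$(printf 'foo line\nbaz\n' | sed '/foo/c bar')
printf '%s\n' "$out"
```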