Mastering SED

The Stream Editor for Efficient Text Processing

By Bhavay Goyal, Heer Ahir, Ritika

How to Install and Use sed (Linux Stream Editor) for Data Cleaning

Introduction

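sed (Stream Editor) is a powerful text-processing tool in Linux that performs search-and-replace, deletion, insertion, and filtering on text. It reads its input one line at a time, applies your commands to each line, and writes the result to standard output, so the original file stays unchanged unless you explicitly edit it in place. This makes it well suited to data cleaning and manipulation tasks.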


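Installing sed

sed comes preinstalled on virtually every Linux distribution, so the commands below are only needed if it is missing.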

On Debian-based systems (Ubuntu, Debian)

sudo apt update
sudo apt install sed

On Red Hat-based systems (CentOS, Fedora, RHEL)

sudo dnf install sed  # For Fedora 22+ and RHEL/CentOS 8+
sudo yum install sed  # For older releases such as CentOS 7

On Arch Linux

sudo pacman -S sed

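On macOS (using Homebrew)

macOS ships with BSD sed, which differs from GNU sed in some options (for example, BSD sed's -i always takes a backup-suffix argument, which may be empty: -i ''). To get GNU sed:

brew install gnu-sed

Homebrew installs it as gsed, so run gsed wherever this guide says sed, or add Homebrew's gnubin directory to your PATH to use it under the name sed.

To verify the installation, run:

sed --version

(On macOS, run gsed --version instead; the built-in BSD sed does not support --version.)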

Windows Systems

Since sed is a Unix-based tool, Windows users can access it through one of the following methods:

  • Windows Subsystem for Linux (WSL)
    • Run a Linux distribution inside Windows and use sed just like on a native Linux system.
    • Install WSL by following this guide: Install WSL - Microsoft Learn
  • Git for Windows (Git Bash)
    • Git Bash provides a Unix-like terminal on Windows that includes sed.
    • Download Git for Windows here: Git for Windows

Understanding the sed Command Format

The basic syntax of sed is:

sed [OPTIONS] 'COMMAND' filename

Where:

  • OPTIONS – Modify sed's behavior (e.g., -i for in-place editing).
  • COMMAND – The editing command to run (s, d, i, a, etc.).
  • filename – The file to process; if omitted, sed reads from standard input.
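The sketch below runs the same substitution once on a named file and once on standard input. It assumes a POSIX shell and GNU sed; sample.txt is a hypothetical file created in a temporary directory.

```shell
#!/bin/sh
# Sketch: the same sed command reading a file vs. reading standard input.
# Assumes a POSIX shell and GNU sed; sample.txt is a hypothetical file.
dir=$(mktemp -d)                               # throwaway working directory
printf 'hello world\n' > "$dir/sample.txt"

# 1) A filename is given: sed reads that file.
from_file=$(sed 's/world/sed/' "$dir/sample.txt")

# 2) No filename: sed reads standard input (here, a pipe).
from_stdin=$(printf 'hello world\n' | sed 's/world/sed/')

# Without -i, the file on disk is left exactly as it was.
original=$(cat "$dir/sample.txt")

printf '%s\n' "$from_file"     # hello sed
printf '%s\n' "$from_stdin"    # hello sed
printf '%s\n' "$original"      # hello world
rm -r "$dir"
```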

Commonly Used sed Options

  • -i – Edit files in-place (modifies the file directly).
  • -n – Suppress automatic printing of lines (useful with p to print only selected lines).
  • -e – Adds a command to the script; repeat it to run several commands in a single invocation.
  • -f – Read commands from a separate script file instead of writing them inline.
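A small sketch of -n, -e and -f side by side, assuming GNU sed. The files words.txt and edits.sed are hypothetical, created only for the demonstration.

```shell
#!/bin/sh
# Sketch of -n/p, -e and -f on a small hypothetical file (words.txt).
dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$dir/words.txt"

# -n suppresses automatic printing; p prints only the matching line.
only_beta=$(sed -n '/beta/p' "$dir/words.txt")

# -e chains several commands in one run.
chained=$(sed -e 's/alpha/ALPHA/' -e 's/gamma/GAMMA/' "$dir/words.txt")

# -f reads the same commands from a script file, one per line.
printf 's/alpha/ALPHA/\ns/gamma/GAMMA/\n' > "$dir/edits.sed"
scripted=$(sed -f "$dir/edits.sed" "$dir/words.txt")

printf '%s\n' "$only_beta"     # beta
printf '%s\n' "$chained"       # ALPHA, beta, GAMMA (one per line)
[ "$chained" = "$scripted" ] && printf '%s\n' "-e and -f produced the same output"
rm -r "$dir"
```

A script file is handy once the list of edits grows long or needs to be reused across many files.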

For example, to replace every occurrence of "error" with "warning" in logfile.log and save the change back to the file (in-place editing is covered in more detail below):

sed -i 's/error/warning/g' logfile.log

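Redirecting sed Output (> and >>)

Redirection is handled by the shell, not by sed: sed writes its result to standard output, and these operators send that output to a file.

  • > – Writes output to a file, creating it or overwriting it if it already exists.
  • >> – Appends output to the end of a file, creating it if it does not exist.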

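For example, to replace text and save the result to output.txt while leaving the original file untouched:

sed 's/old/new/g' filename > output.txt

Do not redirect output back into the file you are reading (sed 's/old/new/g' filename > filename): the shell empties the file before sed reads it, so all of its data is lost. Use -i for that instead.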
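To see the difference between the two operators, this sketch writes the same result twice with > and once more with >>. It assumes a POSIX shell; input.txt and output.txt are hypothetical names in a temporary directory.

```shell
#!/bin/sh
# Sketch: > overwrites, >> appends. input.txt and output.txt are hypothetical.
dir=$(mktemp -d)
printf 'old value\n' > "$dir/input.txt"

# > creates output.txt, or truncates it first if it already exists.
sed 's/old/new/g' "$dir/input.txt" > "$dir/output.txt"
sed 's/old/new/g' "$dir/input.txt" > "$dir/output.txt"    # still one line

# >> adds to the end of output.txt instead of replacing it.
sed 's/old/new/g' "$dir/input.txt" >> "$dir/output.txt"   # now two lines

content=$(cat "$dir/output.txt")
lines=$(wc -l < "$dir/output.txt" | tr -d ' ')
printf '%s\n' "$content"       # new value (printed twice)
printf '%s\n' "$lines"         # 2
rm -r "$dir"
```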

Key Features & Explanation

1. Text Substitution

Syntax:

sed 's/old_text/new_text/' filename    # Replace the first occurrence in each line
sed 's/old_text/new_text/g' filename   # Replace all occurrences in each line

Example:

Replace the first “cat” on each line with “dog”.

sed 's/cat/dog/' animals.txt

Input:

I have a cat and a black cat.

Output:

I have a dog and a black cat.
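The sketch below contrasts the command with and without the g flag on the same input line, and shows that patterns match characters rather than whole words. It assumes a POSIX shell with GNU sed.

```shell
#!/bin/sh
# Sketch: effect of the g flag, plus substring matching.
line='I have a cat and a black cat.'

first_only=$(printf '%s\n' "$line" | sed 's/cat/dog/')   # first match per line
every=$(printf '%s\n' "$line" | sed 's/cat/dog/g')       # every match per line

# Patterns match characters, not whole words: "cat" inside "category" matches too.
inside_word=$(printf 'category\n' | sed 's/cat/dog/')

printf '%s\n' "$first_only"    # I have a dog and a black cat.
printf '%s\n' "$every"         # I have a dog and a black dog.
printf '%s\n' "$inside_word"   # dogegory
```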

2. Conditional Substitution

Syntax:

sed '/pattern/ s/old/new/' filename   # pattern is a regular expression (regex)

Example:

Replace “cat” with “dog” only in lines containing “pet” in animals.txt.

sed '/pet/ s/cat/dog/' animals.txt

Input (animals.txt):

I have a cat.
My pet is a cat.
Cats are independent.

Output:

I have a cat.
My pet is a dog.
Cats are independent.

Explanation: The substitution s/cat/dog/ is applied only to lines matching the pattern “pet”.
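Because the address is a full regular expression, it can be more specific than a plain word. This sketch recreates animals.txt in a temporary directory (an assumption for the demo) and compares the /pet/ address with an anchored /^I/ address, assuming GNU sed.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Recreate the sample animals.txt from the example above.
printf 'I have a cat.\nMy pet is a cat.\nCats are independent.\n' > "$dir/animals.txt"

# Only lines matching /pet/ get the substitution.
by_word=$(sed '/pet/ s/cat/dog/' "$dir/animals.txt")

# ^ anchors the regex: only lines that start with "I" are edited.
by_anchor=$(sed '/^I/ s/cat/dog/' "$dir/animals.txt")

printf '%s\n' "$by_word"       # only "My pet is a dog." changes
printf '%s\n' "$by_anchor"     # only "I have a dog." changes
rm -r "$dir"
```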

3. Deleting Lines

Syntax:

sed '/pattern/d' filename     # Delete lines containing a pattern (RegEx pattern)
sed 'Nd' filename             # Delete line number N
sed 'M,Nd' filename           # Delete lines from M to N

Example:

Delete line 3 from a file.

sed '3d' logs.txt

Input (logs.txt):

Line 1: Success
Line 2: Warning
Line 3: Error detected
Line 4: Success

Output:

Line 1: Success
Line 2: Warning
Line 4: Success
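The other two deletion forms from the syntax list work the same way. This sketch, assuming GNU sed and a recreated logs.txt in a temporary directory, deletes by pattern and by line range.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'Line 1: Success\nLine 2: Warning\nLine 3: Error detected\nLine 4: Success\n' > "$dir/logs.txt"

no_errors=$(sed '/Error/d' "$dir/logs.txt")   # delete every line matching the regex
trimmed=$(sed '2,3d' "$dir/logs.txt")         # delete lines 2 through 3

printf '%s\n' "$no_errors"     # Line 1, Line 2, Line 4
printf '%s\n' "$trimmed"       # Line 1, Line 4
rm -r "$dir"
```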

4. Inserting and Appending Lines

Syntax:

sed 'Ni\New Line' filename    # Insert a line before line N
sed 'Na\New Line' filename    # Append a line after line N

Example:

Insert “This is a new inserted line” before line 3.

sed '3i\This is a new inserted line' file.txt

Input (file.txt):

Line 1
Line 2
Line 3
Line 4

Output:

Line 1
Line 2
This is a new inserted line
Line 3
Line 4
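A sketch of appending, inserting at a pattern address, and the portable multi-line form. The one-line i\text and a\text forms used above are GNU sed extensions; the form with the text on the following line is the POSIX one that BSD/macOS sed also accepts. file.txt is recreated in a temporary directory for the demo.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'Line 1\nLine 2\nLine 3\nLine 4\n' > "$dir/file.txt"

# GNU one-line form: append a line after line 2.
appended=$(sed '2a\This is a new appended line' "$dir/file.txt")

# Addresses can also be patterns: insert before each line matching /Line 4/.
before_4=$(sed '/Line 4/i\Inserted before Line 4' "$dir/file.txt")

# POSIX form: the text goes on the line after "a\".
portable=$(sed '2a\
This is a new appended line' "$dir/file.txt")

printf '%s\n' "$appended"
printf '%s\n' "$before_4"
rm -r "$dir"
```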

5. Replacing Text in a Specific Line

Syntax:

sed 'Ns/old_text/new_text/' filename

Example:

Replace “apple” with “orange” only in line 2.

sed '2s/apple/orange/' fruits.txt

Input (fruits.txt):

Apple is red.
I love apple juice.
Apples are tasty.

Output:

Apple is red.
I love orange juice.
Apples are tasty.
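The line address can also be $, which means the last line. This sketch recreates fruits.txt in a temporary directory (an assumption for the demo) and assumes GNU sed.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'Apple is red.\nI love apple juice.\nApples are tasty.\n' > "$dir/fruits.txt"

line2=$(sed '2s/apple/orange/' "$dir/fruits.txt")    # edit line 2 only
last=$(sed '$s/Apples/Oranges/' "$dir/fruits.txt")   # $ addresses the last line

printf '%s\n' "$line2"
printf '%s\n' "$last"
rm -r "$dir"
```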

6. Case-Insensitive Substitution

Syntax:

sed 's/old_text/new_text/I' filename   # 'I' flag: case-insensitive match (GNU sed extension)

Example:

Replace “hello” with “Hi” in a case-insensitive manner.

sed 's/hello/Hi/I' greetings.txt

Input (greetings.txt):

Hello world!
hello there!

Output:

Hi world!
Hi there!
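Since the I flag is GNU-specific, a bracket expression that lists both cases of each letter gives the same result in any POSIX sed. This sketch compares the two, assuming GNU sed is available for the first command.

```shell
#!/bin/sh
# GNU sed only: the I flag.
gnu=$(printf 'Hello world!\nhello there!\nHELLO again!\n' | sed 's/hello/Hi/I')

# Portable: spell out both cases of each letter.
portable=$(printf 'Hello world!\nhello there!\nHELLO again!\n' | sed 's/[Hh][Ee][Ll][Ll][Oo]/Hi/')

printf '%s\n' "$gnu"           # Hi world!, Hi there!, Hi again!
[ "$gnu" = "$portable" ] && printf '%s\n' "both approaches agree"
```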

7. Printing Specific Lines

Syntax:

sed -n 'N,Mp' filename   # Print lines from N to M
sed -n '/pattern/p' filename  # Print lines matching a pattern

Example:

Print only lines 2 to 4.

sed -n '2,4p' file.txt

Input (file.txt):

Line 1
Line 2
Line 3
Line 4
Line 5

Output:

Line 2
Line 3
Line 4
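This sketch prints by pattern and shows why -n matters: without it, sed prints every line automatically and p prints matching lines a second time. It assumes GNU sed and recreates file.txt in a temporary directory.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'Line 1\nLine 2\nLine 3\nLine 4\nLine 5\n' > "$dir/file.txt"

range=$(sed -n '2,4p' "$dir/file.txt")     # lines 2 to 4
match=$(sed -n '/3/p' "$dir/file.txt")     # lines matching the regex "3"

# Without -n: all 5 lines are printed, plus "Line 3" a second time.
doubled=$(sed '/3/p' "$dir/file.txt" | wc -l | tr -d ' ')

printf '%s\n' "$match"         # Line 3
printf '%s\n' "$doubled"       # 6
rm -r "$dir"
```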

8. Replacing Multiple Patterns

Syntax:

sed -e 's/old1/new1/g' -e 's/old2/new2/g' filename

Example:

Replace “red” with “blue” and “green” with “yellow” in one command.

sed -e 's/red/blue/g' -e 's/green/yellow/g' colors.txt

Input (colors.txt):

I like red and green colors.

Output:

I like blue and yellow colors.
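Commands can also be separated with a semicolon inside one quoted script. Either way they run in order on each line, so an earlier substitution can feed a later one, as the last command in this sketch shows (GNU sed assumed).

```shell
#!/bin/sh
line='I like red and green colors.'

with_e=$(printf '%s\n' "$line" | sed -e 's/red/blue/g' -e 's/green/yellow/g')
with_semicolon=$(printf '%s\n' "$line" | sed 's/red/blue/g; s/green/yellow/g')

# Order matters: red becomes green first, then every green becomes yellow.
chained=$(printf '%s\n' "$line" | sed -e 's/red/green/g' -e 's/green/yellow/g')

printf '%s\n' "$with_e"        # I like blue and yellow colors.
printf '%s\n' "$chained"       # I like yellow and yellow colors.
```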

9. Removing Extra Spaces

Syntax:

sed 's/  */ /g' filename   # Replace multiple spaces with a single space

Example:

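Squeeze each run of spaces into a single space. The pattern '  *' (two spaces followed by *) means one space followed by zero or more spaces, that is, one or more spaces.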

sed 's/  */ /g' testfile.txt

Input (testfile.txt):

This    is   a   test   file.

Output:

This is a test file.
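The pattern above matches spaces only, so tabs survive. For messier data, a common variation uses the POSIX class [[:blank:]] (space or tab) and then trims the leading and trailing space. This is a sketch of one approach, assuming GNU sed, not the only way to do it.

```shell
#!/bin/sh
# Input with leading/trailing spaces and a mix of spaces and tabs.
line=$(printf '  This \t is   a test\tfile.  ')

cleaned=$(printf '%s\n' "$line" | sed \
  -e 's/[[:blank:]][[:blank:]]*/ /g' \
  -e 's/^ //' \
  -e 's/ $//')
# Step 1 squeezes every run of blanks to one space; steps 2-3 trim the ends.

# The space-only pattern leaves tabs untouched.
tab_kept=$(printf 'a\tb\n' | sed 's/  */ /g')

printf '%s\n' "$cleaned"       # This is a test file.
```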

10. Performing In-Place Edits with Backup

Syntax:

sed -i.bak 's/pattern/replacement/' filename

Example:

Replace the first occurrence of “error” with “warning” in each line of log.txt, creating a backup named log.txt.bak.

sed -i.bak 's/error/warning/' log.txt

Input (log.txt):

error: file not found
error: access denied

Output (log.txt):

warning: file not found
warning: access denied

Backup (log.txt.bak):

error: file not found
error: access denied

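Explanation: The -i.bak option edits the file in place and first saves a copy of the original with the .bak extension. Write the suffix directly after -i with no space; otherwise GNU sed tries to run .bak as the editing command.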
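The sketch below repeats the example in a temporary directory and checks both files afterwards. The -i.bak spelling is accepted by both GNU and BSD/macOS sed; for in-place editing without a backup, GNU sed uses plain -i, while BSD sed needs an explicit empty suffix (-i '').

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'error: file not found\nerror: access denied\n' > "$dir/log.txt"

# Edit in place, keeping the original as log.txt.bak.
sed -i.bak 's/error/warning/' "$dir/log.txt"

edited=$(cat "$dir/log.txt")
backup=$(cat "$dir/log.txt.bak")
printf '%s\n' "$edited"        # warning: ... (both lines)
printf '%s\n' "$backup"        # error: ... (both lines, unchanged)
rm -r "$dir"
```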


Use Cases

  • Configuration Management – Automate updates to configuration files.
  • Data Cleaning – Remove or replace unwanted data patterns in datasets.
  • Log Processing – Extract and format specific information from log files.
  • Code Refactoring – Modify source code files in bulk.
  • Batch Text Processing – Efficiently manipulate large text files.

Conclusion

sed is a powerful tool for text processing and data cleaning in Linux. Its substitution, deletion, insertion, and filtering commands make it an essential utility for anyone working with large text files or automation scripts.

Further Readings

  • For further learning, check out the official documentation: GNU Sed Manual
  • To learn more about regular expressions, see this Regex Tutorial