How to Install and Use sed (Linux Stream Editor) for Data Cleaning
Introduction
sed (Stream Editor) is a powerful text-processing tool in Linux that allows you to perform search, replace, delete, insert, and filter operations on text files. It is widely used for data cleaning and manipulation tasks.
Installing sed
On Debian-based systems (Ubuntu, Debian)
sudo apt update
sudo apt install sed
On Red Hat-based systems (CentOS, Fedora, RHEL)
sudo dnf install sed # For Fedora 22+
sudo yum install sed # For older versions
On Arch Linux
sudo pacman -S sed
On macOS (using Homebrew)
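brew install gnu-sed # Installed as gsed; macOS already ships BSD sed as sed
To verify the installation, run:
sed --version # GNU sed only; with Homebrew's gnu-sed on macOS, run gsed --version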
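Because the examples in this guide lean on GNU behavior in places, it can help to check which sed is on the PATH before running them. The snippet below is a minimal sketch, not an official detection method: it assumes that a sed which accepts --version is GNU-style, and the variable name sed_flavor is made up for illustration.

```shell
# Check whether the sed on the PATH accepts --version.
# GNU sed does; BSD/macOS sed rejects the option with a usage error.
if sed --version >/dev/null 2>&1; then
    sed_flavor="GNU-style (accepts --version)"
else
    sed_flavor="non-GNU (possibly BSD)"
fi
echo "sed flavor: $sed_flavor"

# Homebrew's gnu-sed package installs GNU sed under the name gsed.
if command -v gsed >/dev/null 2>&1; then
    echo "gsed found at: $(command -v gsed)"
fi
```

If the result is non-GNU, options such as the I flag or the one-line form of i and a may behave differently from what is shown later.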
Windows Systems
Since sed is a Unix-based tool, Windows users can access it through one of the following methods:
- Windows Subsystem for Linux (WSL)
  - Run a Linux distribution inside Windows and use sed just like on a native Linux system.
  - Install WSL by following this guide: Install WSL - Microsoft Learn
- Git for Windows (Git Bash)
  - Git Bash provides a Unix-like terminal on Windows that includes sed.
  - Download Git for Windows here: Git for Windows
Understanding the sed Command Format
The basic syntax of sed is:
sed [OPTIONS] 'COMMAND' filename
Where:
OPTIONS – Modify sed behavior (e.g., -i for in-place editing).
COMMAND – The actual operation (s, d, i, a, etc.).
filename – The file to be processed (or input via stdin).
Commonly Used sed Options
-i – Edit files in place (modifies the file directly).
-n – Suppress automatic printing of lines (useful with p to print only selected lines).
-e – Allows multiple sed commands in a single execution.
-f – Read commands from a separate script file instead of writing them inline.
For example, to replace all instances of the word "error" with "warning" in a file named logfile.log and modify the file directly (more on this below):
sed -i 's/error/warning/g' logfile.log
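The other options can be tried the same way. The following is a small sketch that exercises -n, -e, and -f on a throwaway copy of a log; the sample log lines and the script file name clean.sed are invented for illustration.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
printf '%s\n' 'error: disk full' 'info: started' 'error: timeout' > "$workdir/logfile.log"

# -n with p: print only the lines that match /error/.
only_errors=$(sed -n '/error/p' "$workdir/logfile.log")

# -e: chain two commands in a single invocation.
chained=$(sed -e 's/error/warning/g' -e 's/info/notice/g' "$workdir/logfile.log")

# -f: read the same two commands from a script file (hypothetical name).
printf '%s\n' 's/error/warning/g' 's/info/notice/g' > "$workdir/clean.sed"
from_script=$(sed -f "$workdir/clean.sed" "$workdir/logfile.log")

printf '%s\n' "$only_errors" '---' "$chained"
rm -rf "$workdir"
```

The -e and -f runs produce identical output; a script file mainly pays off once the list of commands grows too long to read comfortably on one line.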
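What is Redirection in sed (> and >>)
Redirection is handled by the shell, not by sed itself: sed writes its result to standard output, and the shell sends that output to a file.
> – Redirects output to a file, creating it or overwriting it if it already exists.
>> – Redirects output to a file, appending to it (the file is created if it does not exist).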
For example, replacing text and saving the output to output.txt:
sed 's/old/new/g' filename > output.txt
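To see the difference between the two operators, here is a short sketch run against temporary files. The file contents are made up for the demonstration, and the variable names are only there to make the result easy to inspect.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'old value' 'another old value' > "$workdir/input.txt"

# > creates output.txt (or overwrites it); input.txt is left unchanged.
sed 's/old/new/g' "$workdir/input.txt" > "$workdir/output.txt"

# >> appends the first line of the original to the end of the same output file.
sed -n '1p' "$workdir/input.txt" >> "$workdir/output.txt"

output_lines=$(wc -l < "$workdir/output.txt" | tr -d ' ')
input_first=$(sed -n '1p' "$workdir/input.txt")
cat "$workdir/output.txt"

# Caution: 'sed ... file > file' truncates the file before sed reads it.
rm -rf "$workdir"
```

Writing to a different file and renaming it afterwards, or using -i, avoids the truncation problem noted in the last comment.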
Key Features & Explanation
1. Text Substitution
Syntax:
sed 's/old_text/new_text/' filename # Replace first occurrence in each line
sed 's/old_text/new_text/g' filename # Replace all occurrences in each line
Example:
Replace “cat” with “dog” in each line.
sed 's/cat/dog/' animals.txt
Input:
I have a cat and a black cat.
Output:
I have a dog and a black cat.
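For comparison, the same input line can be run through both forms of the command. This is a minimal sketch that pipes the line in through stdin instead of reading animals.txt.

```shell
line='I have a cat and a black cat.'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/cat/dog/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

echo "$first_only"    # I have a dog and a black cat.
echo "$all_matches"   # I have a dog and a black dog.
```

Note that the pattern matches substrings, so a word such as "category" would also be changed unless the pattern is made more specific.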
2. Conditional Substitution
Syntax:
sed '/pattern/ s/old/new/' filename # pattern is a regular expression (regex)
Example:
Replace “cat” with “dog” only in lines containing “pet” in animals.txt.
sed '/pet/ s/cat/dog/' animals.txt
Input (animals.txt):
I have a cat.
My pet is a cat.
Cats are independent.
Output:
I have a cat.
My pet is a dog.
Cats are independent.
Explanation: The substitution s/cat/dog/ is applied only to lines matching the pattern “pet”.
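The condition can also be inverted by placing ! after the address, which applies the command to every line that does not match. The sketch below runs both variants on the sample input from above; the inverted form is an addition for illustration and is not part of the original example.

```shell
input=$(printf '%s\n' 'I have a cat.' 'My pet is a cat.' 'Cats are independent.')

# Substitute only on lines that contain "pet".
matching=$(printf '%s\n' "$input" | sed '/pet/ s/cat/dog/')

# Substitute only on lines that do NOT contain "pet".
non_matching=$(printf '%s\n' "$input" | sed '/pet/!s/cat/dog/')

printf '%s\n' "$matching" '---' "$non_matching"
```

In both runs "Cats are independent." is left alone, because the lowercase pattern cat does not match the capitalized "Cats".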
3. Deleting Lines
Syntax:
sed '/pattern/d' filename # Delete lines containing a pattern (RegEx pattern)
sed 'Nd' filename # Delete line number N
sed 'M,Nd' filename # Delete lines from M to N
Example:
Delete line 3 from a file.
sed '3d' logs.txt
Input (logs.txt):
Line 1: Success
Line 2: Warning
Line 3: Error detected
Line 4: Success
Output:
Line 1: Success
Line 2: Warning
Line 4: Success
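The pattern and range forms are often the more useful ones for data cleaning. As a rough sketch, the following strips comment lines and blank lines from a small CSV-like sample (invented for this example) and deletes a line range from the log shown above.

```shell
raw=$(printf '%s\n' '# comment line' 'id,name' '' '1,Ada' '' '2,Linus')

# Delete lines starting with # and lines that are completely empty.
cleaned=$(printf '%s\n' "$raw" | sed -e '/^#/d' -e '/^$/d')

logs=$(printf '%s\n' 'Line 1: Success' 'Line 2: Warning' 'Line 3: Error detected' 'Line 4: Success')

# Delete lines 2 through 3 by number.
ranged=$(printf '%s\n' "$logs" | sed '2,3d')

printf '%s\n' "$cleaned" '---' "$ranged"
```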
4. Inserting and Appending Lines
Syntax:
sed 'Ni\New Line' filename # Insert a line before line N
sed 'Na\New Line' filename # Append a line after line N
Example:
Insert “This is a new inserted line” before line 3.
sed '3i\This is a new inserted line' file.txt
Input (file.txt):
Line 1
Line 2
Line 3
Line 4
Output:
Line 1
Line 2
This is a new inserted line
Line 3
Line 4
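The one-line form used above is accepted by GNU sed. A more portable spelling puts the new text on its own line after the backslash, and the line number can be swapped for a pattern. The sketch below shows both; the inserted texts are placeholders.

```shell
lines=$(printf '%s\n' 'Line 1' 'Line 2' 'Line 3')

# Append after line 2, with the text on the line following "a\".
appended=$(printf '%s\n' "$lines" | sed '2a\
Appended after line 2')

# Insert before whichever line matches /Line 3/.
inserted=$(printf '%s\n' "$lines" | sed '/Line 3/i\
Inserted before the match')

printf '%s\n' "$appended" '---' "$inserted"
```

Addressing by pattern is usually more robust than a fixed line number when the file's length can change.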
5. Replacing Text in a Specific Line
Syntax:
sed 'Ns/old_text/new_text/' filename
Example:
Replace “apple” with “orange” only in line 2.
sed '2s/apple/orange/' fruits.txt
Input (fruits.txt):
Apple is red.
I love apple juice.
Apples are tasty.
Output:
Apple is red.
I love orange juice.
Apples are tasty.
6. Case-Insensitive Substitution
Syntax:
sed 's/old_text/new_text/I' filename # The I flag makes the match case-insensitive (GNU sed extension)
Example:
Replace “hello” with “Hi” in a case-insensitive manner.
sed 's/hello/Hi/I' greetings.txt
Input (greetings.txt):
Hello world!
hello there!
Output:
Hi world!
Hi there!
7. Printing Specific Lines
Syntax:
sed -n 'N,Mp' filename # Print lines from N to M
sed -n '/pattern/p' filename # Print lines matching a pattern
Example:
Print only lines 2 to 4.
sed -n '2,4p' file.txt
Input (file.txt):
Line 1
Line 2
Line 3
Line 4
Line 5
Output:
Line 2
Line 3
Line 4
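The pattern form works like a simple filter, and the special address $ refers to the last line of the input. A minimal sketch, reusing the log lines from the deletion example:

```shell
log=$(printf '%s\n' 'Line 1: Success' 'Line 2: Warning' 'Line 3: Error detected' 'Line 4: Success')

# Print only the lines that contain "Success".
matches=$(printf '%s\n' "$log" | sed -n '/Success/p')

# Print only the last line of the input.
last=$(printf '%s\n' "$log" | sed -n '$p')

printf '%s\n' "$matches" '---' "$last"
```

Without -n, every line would be printed once automatically and the selected lines a second time.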
8. Replacing Multiple Patterns
Syntax:
sed -e 's/old1/new1/g' -e 's/old2/new2/g' filename
Example:
Replace “red” with “blue” and “green” with “yellow” in one command.
sed -e 's/red/blue/g' -e 's/green/yellow/g' colors.txt
Input (colors.txt):
I like red and green colors.
Output:
I like blue and yellow colors.
9. Removing Extra Spaces
Syntax:
sed 's/  */ /g' filename # Replace runs of spaces with a single space (note the two spaces before *)
Example:
Convert multiple spaces into a single space.
sed 's/  */ /g' testfile.txt
Input (testfile.txt):
This   is  a    test   file.
Output:
This is a test file.
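Leading and trailing whitespace is a related cleanup task. The sketch below trims both ends and then collapses internal runs, using the POSIX class [[:space:]] so that tabs are handled as well; the sample string is invented for illustration.

```shell
messy='   This   is  a    test   file.   '

# 1) strip leading whitespace, 2) strip trailing whitespace,
# 3) collapse each remaining run of whitespace into one space.
tidy=$(printf '%s\n' "$messy" | sed \
    -e 's/^[[:space:]]*//' \
    -e 's/[[:space:]]*$//' \
    -e 's/[[:space:]][[:space:]]*/ /g')

# Brackets make any leftover edge whitespace visible.
printf '[%s]\n' "$tidy"
```

The third expression requires at least one whitespace character per match; a pattern that can match zero characters would insert a space between every character.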
10. Performing In-Place Edits with Backup
Syntax:
sed -i.bak 's/pattern/replacement/' filename
Example:
Replace the first occurrence of “error” with “warning” in each line of log.txt, creating a backup named log.txt.bak.
sed -i.bak 's/error/warning/' log.txt
Input (log.txt):
error: file not found
error: access denied
Output (log.txt):
warning: file not found
warning: access denied
Backup (log.txt.bak):
error: file not found
error: access denied
Explanation: The -i.bak option edits the file in place and creates a backup with the .bak extension before making changes.
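A backup is most useful when it is actually compared against the result. The following sketch repeats the edit on a temporary copy of log.txt and uses diff to count the changed lines; the temporary directory and variable names are assumptions made for this demonstration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'error: file not found' 'error: access denied' > "$workdir/log.txt"

# Edit in place; the original content is kept as log.txt.bak.
sed -i.bak 's/error/warning/' "$workdir/log.txt"

edited=$(cat "$workdir/log.txt")
backup=$(cat "$workdir/log.txt.bak")

# Count the lines that diff reports as new in the edited file.
changed=$(diff "$workdir/log.txt.bak" "$workdir/log.txt" | grep -c '^>')
echo "changed lines: $changed"

rm -rf "$workdir"
```

If the edit turns out to be wrong, copying log.txt.bak back over log.txt restores the original.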
Use Cases
- Configuration Management – Automate updates to configuration files.
- Data Cleaning – Remove or replace unwanted data patterns in datasets.
- Log Processing – Extract and format specific information from log files.
- Code Refactoring – Modify source code files in bulk.
- Batch Text Processing – Efficiently manipulate large text files.
Conclusion
sed is a powerful tool for text processing and data cleaning in Linux. With its substitution, deletion, insertion, and filtering capabilities, it becomes a crucial utility for anyone working with large text files or automation scripts.
Further Readings
- For further learning, check out the official documentation: GNU Sed Manual
- To learn more about regular expressions (RegEx), see: Regex Tutorial