When you need to quickly extract the third column from a 100,000-line log file, calculate the sum of a column in a CSV file, or find duplicates in configuration files, experienced system administrators reach for one tool: awk. This command-line utility has existed since 1977 but remains irreplaceable for processing structured text.
Unlike a Python script, which has to be written and debugged, awk solves most text-processing tasks with a single line of code. And unlike grep, which simply searches for patterns, awk understands the structure of the data, columns and rows, performs mathematical calculations, and formats output.
This guide shows how to use awk for daily system administration tasks: web server log analysis, CSV file processing, system resource monitoring, and routine automation. All examples have been tested on real THE.Hosting servers and are ready to use.
What is AWK and Why You Need It
AWK is both a programming language and a command-line utility for processing text data on Unix and Linux systems. The name comes from the surnames of its creators, Aho, Weinberger, and Kernighan, who developed the language at Bell Labs in 1977.
AWK's core concept: the program reads input text line by line, splits each line into fields (columns) by a delimiter, and performs the specified actions on those fields. This makes awk ideal for working with structured data: logs, CSV files, system command output, configuration files.
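This read-split-act loop can be seen in a minimal sketch (the sample lines are arbitrary placeholders):

```shell
# awk reads each line (a record), splits it into fields on whitespace,
# then runs the action; NF is the field count, $1 the first field.
printf 'alpha 1\nbeta 2\n' | awk '{print NF, $1}'
# prints:
# 2 alpha
# 2 beta
```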
When to use awk:
- Extract specific columns from data tables
- Filter rows by conditions (greater/less, contains text)
- Calculations on columns (sum, average, count)
- Format output of other commands
- Process CSV, TSV, log files
- Quick analysis of large text files without loading into memory
Comparison with alternatives:
| Tool | When to use | Complexity |
|---|---|---|
| awk | Work with columns, calculations, formatting | Medium |
| grep | Simple line search by pattern | Low |
| sed | Text replacement, stream editing | Medium |
| cut | Extract fixed columns | Low |
| Python | Complex logic, large projects | High |
Real example: you have a web server log with 50,000 lines and need to find all requests to the /api/users endpoint and count how many times each IP address accessed it. With awk this is one line:
awk '/\/api\/users/ {count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log
An equivalent Python script would take 15-20 lines of code.
AWK Command Syntax
Basic structure:
awk 'pattern {action}' file
- pattern — condition for selecting lines (optional)
- action — what to do with lines matching the condition
- file — input file (can be several, or stdin via a pipe)
Main usage variants:
Print all lines (like cat)
awk '{print}' file.txt
Print first column
awk '{print $1}' file.txt
Lines containing "error"
awk '/error/ {print}' file.txt
With condition on column value
awk '$3 > 100 {print $1, $3}' file.txt
Multiple actions via semicolon
awk '{sum += $2; count++} END {print sum/count}' file.txt
Important command-line options:
| Option | Description | Example |
|---|---|---|
| -F | Specify field separator | awk -F':' '{print $1}' /etc/passwd |
| -v | Pass variable to program | awk -v limit=100 '$3 > limit' |
| -f | Read program from file | awk -f script.awk data.txt |
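A quick sketch of -v in action, passing a shell variable into the awk program (the threshold value here is arbitrary):

```shell
# The shell variable becomes an awk variable named limit
threshold=5500
printf 'John 5000\nAlice 6000\n' | awk -v limit="$threshold" '$2 > limit {print $1}'
# prints: Alice
```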
Field separators:
By default awk splits lines by spaces and tabs. For other separators:
CSV file (comma)
awk -F',' '{print $1, $3}' data.csv
/etc/passwd file (colon)
awk -F':' '{print $1, $6}' /etc/passwd
Multiple separator characters
awk -F'[,:]' '{print $1}' mixed.txt
Regular expression as separator
awk -F'[[:space:]]+' '{print $1}' file.txt
Basic AWK Concepts
Fields and Records
Record — a line of input text. Awk processes files line by line.
Field — a column within a record, separated by the field separator.
Field notation:
- $0 — entire line
- $1 — first field (first column)
- $2 — second field
- $NF — last field (NF = number of fields)
- $(NF-1) — second-to-last field
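Because NF is recomputed for every record, $NF and $(NF-1) adapt to however many fields each line has. A sketch:

```shell
# $NF is the last field, $(NF-1) the one before it
echo "a b c d" | awk '{print $NF, $(NF-1)}'
# prints: d c
```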
Example data (employees.txt):
John Manager Sales 5000
Alice Developer IT 6000
Bob Analyst Finance 4500
Working with fields:
Print name and salary
awk '{print $1, $4}' employees.txt
Swap columns
awk '{print $4, $2, $1}' employees.txt
String concatenation
awk '{print $1 " works in " $3}' employees.txt
Arithmetic
awk '{print $1, $4 * 1.15}' employees.txt
Patterns (Conditions)
Patterns define which lines to process.
Pattern types:
- Regular expressions (between slashes)
awk '/IT/ {print}' employees.txt
- Field comparison
awk '$4 > 5000 {print $1, $4}' employees.txt
- Logical operators
AND: Sales department AND salary > 4000
awk '$3 == "Sales" && $4 > 4000 {print}' employees.txt
OR: salary below 5000 OR position is Manager
awk '$4 < 5000 || $2 == "Manager" {print}' employees.txt
- Line ranges
awk '/START/,/END/ {print}' file.txt
- Negation
All except the IT department
awk '$3 != "IT" {print}' employees.txt
All lines not containing Manager
awk '!/Manager/ {print}' employees.txt
Comparison operators:
| Operator | Meaning | Example |
|---|---|---|
| == | Equal | $1 == "John" |
| != | Not equal | $2 != "Sales" |
| < | Less than | $3 < 100 |
| > | Greater than | $3 > 1000 |
| <= | Less or equal | $3 <= 50 |
| >= | Greater or equal | $3 >= 100 |
| ~ | Matches regex | $1 ~ /^A/ |
| !~ | Doesn't match regex | $1 !~ /test/ |
Special Patterns BEGIN and END
BEGIN — executed once, before any input is read
END — executed once, after all input has been processed
Header and footer
awk 'BEGIN {print "Name\tSalary"}
{print $1, $4}
END {print "---\nTotal employees:", NR}' employees.txt
Practical example — calculating average:
awk 'BEGIN {sum=0; count=0}
{sum += $4; count++}
END {print "Average salary:", sum/count}' employees.txt
Built-in AWK Variables
AWK provides a set of predefined variables for working with data.
| Variable | Description | Example usage |
|---|---|---|
| NR | Number of Records — current line number | {print NR, $0} |
| NF | Number of Fields — field count in line | {print $NF} (last field) |
| FS | Field Separator — input field separator | BEGIN {FS=":"} {print $1} |
| OFS | Output Field Separator | BEGIN {OFS=","} {print $1, $2} |
| RS | Record Separator — input record separator | BEGIN {RS=";"} {print} |
| ORS | Output Record Separator | BEGIN {ORS="---\n"} {print} |
| FILENAME | Current processed file name | {print FILENAME, $0} |
| FNR | Line number in current file | {print FILENAME, FNR, $0} |
Usage examples:
Line numbering
awk '{print NR, $0}' file.txt
Output field count in each line
awk '{print "Line", NR, "contains", NF, "fields"}' data.txt
Change output separator
awk 'BEGIN {OFS=" | "} {print $1, $2, $3}' file.txt
Processing multiple files
awk '{print FILENAME, FNR, $1}' file1.txt file2.txt
Operations and Calculations in AWK
Arithmetic Operations
Basic operations: + - * / % (modulo)
awk '{print $1, $2 * 1.2}' file.txt
Increase value
awk '{$4 = $4 * 1.15; print}' employees.txt
Accumulate sum
awk '{sum += $3} END {print "Total:", sum}' sales.txt
Counter
awk '/error/ {count++} END {print "Errors:", count}' log.txt
String Operations
String concatenation (adjacent values are joined; no operator needed)
awk '{print $1 $2}' file.txt # JohnDoe
awk '{print $1 " " $2}' file.txt # John Doe
awk '{print $1 "-" $2}' file.txt # John-Doe
length() function — string length
awk '{print $1, length($1)}' file.txt
awk 'length($0) > 80 {print}' file.txt # Lines longer than 80 chars
substr() function — substring
awk '{print substr($1, 1, 3)}' file.txt # First 3 characters
tolower() and toupper() functions
awk '{print tolower($1)}' file.txt
awk '{print toupper($1)}' file.txt
gsub() function — replace all occurrences
awk '{gsub(/old/, "new"); print}' file.txt
sub() function — replace first occurrence
awk '{sub(/old/, "new"); print}' file.txt
match() function — find pattern
awk 'match($0, /[0-9]+/) {print substr($0, RSTART, RLENGTH)}' file.txt
Conditional Constructs
If-else:
Basic if
awk '{if ($3 > 5000) print $1, "high salary"}' employees.txt
If-else
awk '{if ($4 > 5000)
print $1, "high"
else
print $1, "low"}' employees.txt
Multi-line program
awk '{
if ($4 >= 6000)
category = "senior"
else if ($4 >= 5000)
category = "middle"
else
category = "junior"
print $1, category
}' employees.txt
Ternary operator:
awk '{print $1, ($4 > 5000 ? "high" : "low")}' employees.txt
Loops
For loop:
Output all fields line by line
awk '{for (i=1; i<=NF; i++) print i, $i}' file.txt
Sum all numbers in line
awk '{sum=0; for(i=1; i<=NF; i++) sum+=$i; print sum}' numbers.txt
While loop:
awk '{i=1; while(i<=NF) {print i":"$i; i++}}' file.txt
Arrays in AWK
Arrays in awk are associative (like hash tables): the key can be any string.
Basic Array Usage
Count unique values
awk '{count[$1]++} END {for (name in count) print name, count[name]}' file.txt
Sum by categories
awk '{sales[$3] += $4}
END {for (dept in sales) print dept, sales[dept]}' data.txt
Practical Array Examples
Count IP addresses in log:
awk '{ip_count[$1]++}
END {for (ip in ip_count) print ip, ip_count[ip]}' access.log | sort -rn -k2
Group by HTTP status:
awk '{status[$9]++}
END {for (code in status) print code, status[code]}' access.log
Find duplicates:
awk '{if (seen[$0]++) print "Duplicate:", $0}' file.txt
Remove duplicates (output unique lines):
awk '!seen[$0]++' file.txt
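The one-liner above works because seen[$0]++ evaluates to the old count before incrementing: it is 0 (false) the first time a line appears, so !seen[$0]++ is true exactly once per distinct line, triggering the default print action. A sketch:

```shell
# Each distinct line is printed only on its first occurrence, order preserved
printf 'a\nb\na\nc\nb\n' | awk '!seen[$0]++'
# prints:
# a
# b
# c
```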
Practical Examples for Real Tasks
Web Server Log Analysis
access.log format:
192.168.1.100 - - [24/Feb/2026:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 1234
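With awk's default whitespace splitting, the fields of this format line up as sketched below (note that $6 keeps the opening quote of the request string):

```shell
# Field map for the common/combined log format:
# $1 = client IP, $4 = timestamp start, $6 = method (with leading quote),
# $7 = request path, $9 = status code, $10 = response size in bytes
line='192.168.1.100 - - [24/Feb/2026:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 1234'
echo "$line" | awk '{print "ip="$1, "method="$6, "path="$7, "status="$9, "bytes="$10}'
# prints: ip=192.168.1.100 method="GET path=/index.html status=200 bytes=1234
```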
Top-10 IP addresses by request count:
awk '{ip[$1]++} END {for (i in ip) print ip[i], i}' access.log | sort -rn | head -10
Count requests by HTTP method ($6 keeps the opening quote of the request string, so strip it):
awk '{gsub(/"/, "", $6); print $6}' access.log | sort | uniq -c
Filter 404 errors:
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn
Traffic by hour:
awk '{print substr($4, 14, 2)}' access.log | sort | uniq -c
CSV File Processing
Example CSV (sales.csv):
Date,Product,Quantity,Price
2026-02-01,Widget,10,25.50
2026-02-01,Gadget,5,120.00
2026-02-02,Widget,8,25.50
Calculate total revenue:
awk -F',' 'NR>1 {sum += $3 * $4} END {print "Total:", sum}' sales.csv
Sum by products:
awk -F',' 'NR>1 {total[$2] += $3 * $4}
END {for (p in total) print p, total[p]}' sales.csv
Filter by date:
awk -F',' '$1 ~ /2026-02-01/ {print $2, $3}' sales.csv
Formatted output with headers:
awk -F',' 'BEGIN {print "Product | Total"}
NR>1 {total[$2] += $3 * $4}
END {for (p in total) printf "%-10s | $%.2f\n", p, total[p]}' sales.csv
System Administration
Monitor disk usage:
df -h | awk 'NR>1 {if ($5+0 > 80) print $6, "used", $5}'
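The $5+0 trick above works because awk's string-to-number conversion reads digits up to the first non-numeric character, so a value like "82%" compares as the number 82. A sketch:

```shell
# Adding 0 forces numeric conversion; parsing stops at the '%'
echo "82% full" | awk '{print $1+0}'
# prints: 82
```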
Top processes by memory:
ps aux | awk 'NR>1 {print $11, $4"%"}' | sort -k2 -rn | head -10
Analyze active connections:
netstat -an | awk '/ESTABLISHED/ {count[$5]++}
END {for (ip in count) print ip, count[ip]}' | sort -rn -k2
Check open ports:
netstat -tuln | awk 'NR>2 {print $4}' | awk -F':' '{print $NF}' | sort -n | uniq
Configuration File Processing
Extract active users from /etc/passwd:
awk -F':' '$7 !~ /nologin|false/ {print $1, $6}' /etc/passwd
Users with UID > 1000:
awk -F':' '$3 >= 1000 {print $1, $3}' /etc/passwd
Primary group of a user (replace username with a real account name):
awk -F':' '/^username:/ {print $4}' /etc/passwd |
xargs -I {} awk -F':' '$3 == {} {print $1}' /etc/group
Advanced Techniques
Multi-file Processing
Intersection: lines of file2 whose first field also appears in file1
awk 'NR==FNR {a[$1]; next} $1 in a' file1.txt file2.txt
Difference: lines of file2 whose first field does not appear in file1
awk 'NR==FNR {a[$1]; next} !($1 in a)' file1.txt file2.txt
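These two-file one-liners rely on the fact that NR (the global line counter) equals FNR (the per-file line counter) only while the first file is being read; next then skips the second pattern for those lines. A sketch with throwaway files (the /tmp paths are arbitrary):

```shell
# Build two small test files
printf 'x\ny\n' > /tmp/awk_f1.txt
printf 'y\nz\n' > /tmp/awk_f2.txt
# While reading f1 (NR==FNR), store keys; while reading f2, print matches
awk 'NR==FNR {a[$1]; next} $1 in a' /tmp/awk_f1.txt /tmp/awk_f2.txt
# prints: y
```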
Output Formatting with printf
Column alignment
awk '{printf "%-10s %8.2f\n", $1, $2}' file.txt
Table with headers
awk 'BEGIN {printf "%-10s %10s %10s\n", "Name", "Salary", "Dept"}
{printf "%-10s %10d %10s\n", $1, $4, $3}' employees.txt
Calling External Commands
Execute shell command
awk '{system("echo Processing: " $1)}' file.txt
Read command output
awk 'BEGIN {
cmd = "date"
cmd | getline result
close(cmd)
print result
}'
Creating Reports
Detailed log report (note: strftime() is a gawk extension):
awk 'BEGIN {
print "=== Web Server Report ==="
print "Generated:", strftime("%Y-%m-%d %H:%M:%S")
}
{
requests++
bytes += $10
status[$9]++
if ($9 >= 400) errors++
}
END {
print "\nTotal Requests:", requests
print "Total Traffic:", bytes/1024/1024, "MB"
print "Error Rate:", (errors/requests)*100 "%"
print "\nStatus Codes:"
for (code in status)
printf " %s: %d (%.1f%%)\n", code, status[code], (status[code]/requests)*100
}' access.log
Comparing AWK with Alternatives
awk vs sed
awk better for:
- Working with data columns
- Mathematical calculations
- Complex logic (conditions, loops)
- Report creation
sed better for:
- Simple text replacement
- Stream editing
- Line deletion
- Text insertion
Example task: Extract email addresses from text
awk
awk 'match($0, /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]{2,}/) {
print substr($0, RSTART, RLENGTH)
}' file.txt
sed (harder and less readable)
sed -n 's/.*\([a-zA-Z0-9._-]\+@[a-zA-Z0-9._-]\+\.[a-zA-Z]\{2,\}\).*/\1/p' file.txt
awk vs Python
awk advantages:
- Faster startup for simple tasks (no heavyweight runtime to load)
- Shorter code for typical operations
- Built into any Linux
- More convenient in pipe chains
Python advantages:
- More standard library capabilities
- Better for complex logic
- Easier debugging
- More documentation
Code comparison — calculate average salary:
awk (1 line)
awk '{sum+=$4; count++} END {print sum/count}' employees.txt
Python (minimum 7 lines)
with open('employees.txt') as f:
    total = 0
    count = 0
    for line in f:
        salary = int(line.split()[3])
        total += salary
        count += 1
print(total / count)
Optimization and Best Practices
Performance
1. Avoid unnecessary operations:
Bad - runs an explicit if on every line
awk '{if (NR > 1) print}' file.txt
Good - use a bare pattern (the default action is print)
awk 'NR > 1' file.txt
2. Use built-in variables:
Print the needed fields explicitly
awk '{print $1, $2, $3, $4}' file.txt
Or blank out the unwanted field (note: this clears only field 5 and rewrites $0 with OFS)
awk '{$5=""; print}' file.txt
3. Filter early:
Bad - the sum accumulates over every line, not just Sales
awk '{sum += $3} $2 == "Sales" {print}' file.txt
Good - apply the condition before doing any work
awk '$2 == "Sales" {sum += $3; print}' file.txt
Code Readability
Use variables:
Bad
awk '$3 > 5000 && $4 == "IT" {print $1}' file.txt
Good
awk '{
salary = $3
dept = $4
name = $1
if (salary > 5000 && dept == "IT")
print name
}' file.txt
Extract complex programs to files:
script.awk
BEGIN {
FS = ","
OFS = " | "
total = 0
}
NR > 1 {
total += $3 * $4
printf "%s | %s | %.2f\n", $1, $2, $3 * $4
}
END {
print "---"
printf "Total: $%.2f\n", total
}
Usage:
awk -f script.awk data.csv
Common Mistakes and Solutions
Problem: Wrong field separator
Error - CSV processed as spaces
awk '{print $2}' data.csv
Solution - specify separator
awk -F',' '{print $2}' data.csv
Problem: Skipping header
Error - header participates in calculations
awk '{sum += $2} END {print sum}' file.csv
Solution 1 - start from second line
awk 'NR > 1 {sum += $2} END {print sum}' file.csv
Solution 2 - skip non-numeric
awk '{if ($2 ~ /^[0-9]+$/) sum += $2} END {print sum}' file.csv
Problem: Division by zero
Error - possible division by zero
awk '{print $1 / $2}' file.txt
Solution - check
awk '{if ($2 != 0) print $1 / $2; else print "N/A"}' file.txt
Problem: Spaces around CSV fields
Error - a field like " New York " keeps its padding after splitting on commas
awk -F',' '{print $3}' cities.csv
Solution - trim leading and trailing spaces
awk -F',' '{gsub(/^ +| +$/, "", $3); print $3}' cities.csv
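Trimming spaces does not help when a quoted field contains a comma itself (e.g. "New York, NY"). GNU awk's FPAT variable defines what a field looks like rather than what separates fields; this sketch assumes gawk is installed:

```shell
# FPAT (gawk only): a field is either a run of non-commas
# or a double-quoted string, so embedded commas stay inside the field
printf 'Boston,"New York, NY",Chicago\n' |
  gawk 'BEGIN {FPAT = "([^,]+)|(\"[^\"]+\")"} {print $2}'
# prints: "New York, NY"
```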
Frequently Asked Questions (FAQ)
How to print last column?
awk '{print $NF}' file.txt
How to print all except first column?
awk '{$1=""; print}' file.txt
Or cleaner:
awk '{for(i=2;i<=NF;i++) printf "%s ", $i; print ""}' file.txt
How to count lines?
awk 'END {print NR}' file.txt
Or simply
wc -l file.txt
How to process files with paths containing spaces?
Quote the path on the command line; inside awk, FILENAME holds the current file's name
awk '{print FILENAME, $0}' "file with spaces.txt"
How to output lines between two patterns?
awk '/START/,/END/' file.txt
How to remove duplicates preserving order?
awk '!seen[$0]++' file.txt
Can you modify original file?
Standard awk only reads its input; write results to a new file and replace the original if needed:
awk '{print $1, $2}' input.txt > output.txt
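The usual pattern is to write to a temporary file and then replace the original (GNU awk 4.1+ also offers a gawk-only -i inplace option). A sketch with hypothetical throwaway paths:

```shell
# Create a demo file
printf 'John 5000\nAlice 6000\n' > /tmp/demo_input.txt
# Write awk's output to a temp file, then replace the original
awk '{print $1}' /tmp/demo_input.txt > /tmp/demo_input.txt.tmp &&
  mv /tmp/demo_input.txt.tmp /tmp/demo_input.txt
cat /tmp/demo_input.txt
# prints:
# John
# Alice
```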
How to process JSON in awk?
Awk is poorly suited for JSON. Use jq instead:
jq '.items[] | .name' file.json
AWK Versions and Installation
Check installed version:
awk --version
or
which awk
Main implementations:
- gawk — GNU Awk, the most common on Linux
- mawk — a fast implementation with fewer features
- nawk — "new awk", the POSIX-standard version
- original awk ("one true awk") — still maintained and the default on BSD and macOS; on most Linux systems, awk is a symlink to gawk or mawk
Installation on different systems:
Ubuntu/Debian
sudo apt install gawk
CentOS/RHEL
sudo yum install gawk
macOS
brew install gawk
On THE.Hosting servers gawk is pre-installed on all plans and ready to use.
Conclusion
The awk command is an indispensable tool in the arsenal of system administrators and DevOps engineers. It solves in seconds tasks that would take dozens of lines of code in other languages.
Main awk advantages: working with structured data by columns, built-in regular expression support, mathematical calculations, a minimalist syntax for simple tasks, and the ability to build complex programs for advanced scenarios.
Having learned the basic concepts from this guide, you can analyze web server and application log files, process CSV and other tabular data, automate routine text operations, monitor system resources, and build reports.
On THE.Hosting servers you can use awk for web server log analysis, load monitoring and administration task automation. All modern Linux distributions on our VPS include GNU Awk ready to work.
Additional Resources:
- GNU Awk Manual — official documentation
- The AWK Programming Language — book by language creators
- Awk One-Liners — ready solutions collection