When you need to quickly extract the third column from a 100,000-line log file, calculate the sum of a column in a CSV file, or find duplicates in configuration files, experienced system administrators reach for one tool: awk. This command-line utility has existed since 1977 but remains irreplaceable for processing structured text.
Unlike a Python script, which has to be written and debugged, awk solves most text-processing tasks with a single line of code. And unlike grep, which simply searches for patterns, awk understands the structure of the data, columns and rows, performs mathematical calculations, and formats output.
This guide shows how to use awk for daily system administration tasks: web server log analysis, CSV file processing, system resource monitoring, and routine automation. All examples have been tested on real THE.Hosting servers and are ready to use.
What is AWK and Why You Need It
AWK is both a programming language and a command-line utility for processing text data on Unix and Linux systems. The name comes from the surnames of its creators, Aho, Weinberger, and Kernighan, who developed the language at Bell Labs in 1977.
AWK's core concept: the program reads input text line by line, splits each line into fields (columns) by a delimiter, and performs the specified actions on those fields. This makes awk ideal for working with structured data: logs, CSV files, system command output, configuration files.
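This read-split-act loop can be seen in a minimal sketch (the sample lines are arbitrary placeholders):

```shell
# awk reads each line (a record), splits it into fields on whitespace,
# then runs the action; NF is the field count, $1 the first field.
printf 'alpha 1\nbeta 2\n' | awk '{print NF, $1}'
# prints:
# 2 alpha
# 2 beta
```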
When to use awk:
- Extract specific columns from data tables
- Filter rows by conditions (greater/less, contains text)
- Calculations on columns (sum, average, count)
- Format output of other commands
- Process CSV, TSV, log files
- Quick analysis of large text files without loading into memory
Comparison with alternatives:
| Tool | When to use | Complexity |
|---|---|---|
| awk | Work with columns, calculations, formatting | Medium |
| grep | Simple line search by pattern | Low |
| sed | Text replacement, stream editing | Medium |
| cut | Extract fixed columns | Low |
| Python | Complex logic, large projects | High |
Real example: you have a web server log with 50,000 lines and need to find all requests to the /api/users endpoint and count how many times each IP address accessed it. With awk this is one line:
awk '/\/api\/users/ {count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log
An equivalent Python script would take 15-20 lines of code.
AWK Command Syntax
Basic structure:
awk 'pattern {action}' file
- pattern — condition for selecting lines (optional)
- action — what to do with lines matching the condition
- file — input file (can be several, or stdin via a pipe)
Main usage variants:
Print all lines (like cat)
awk '{print}' file.txt
Print first column
awk '{print $1}' file.txt
Lines containing "error"
awk '/error/ {print}' file.txt
With condition on column value
awk '$3 > 100 {print $1, $3}' file.txt
Multiple actions via semicolon
awk '{sum += $2; count++} END {print sum/count}' file.txt
Important command-line options:
| Option | Description | Example |
|---|---|---|
| -F | Specify field separator | awk -F':' '{print $1}' /etc/passwd |
| -v | Pass variable to program | awk -v limit=100 '$3 > limit' |
| -f | Read program from file | awk -f script.awk data.txt |
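A quick sketch of -v in action, passing a shell variable into the awk program (the threshold value here is arbitrary):

```shell
# The shell variable becomes an awk variable named limit
threshold=5500
printf 'John 5000\nAlice 6000\n' | awk -v limit="$threshold" '$2 > limit {print $1}'
# prints: Alice
```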
Field separators:
By default awk splits lines by spaces and tabs. For other separators:
CSV file (comma)
awk -F',' '{print $1, $3}' data.csv
/etc/passwd file (colon)
awk -F':' '{print $1, $6}' /etc/passwd
Multiple separator characters
awk -F'[,:]' '{print $1}' mixed.txt
Regular expression as separator
awk -F'[[:space:]]+' '{print $1}' file.txt
Basic AWK Concepts
Fields and Records
Record — a line of input text. Awk processes files line by line.
Field — a column within a record, separated by the field separator.
Field notation:
- $0 — entire line
- $1 — first field (first column)
- $2 — second field
- $NF — last field (NF = number of fields)
- $(NF-1) — second-to-last field
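Because NF is recomputed for every record, $NF and $(NF-1) adapt to however many fields each line has. A sketch:

```shell
# $NF is the last field, $(NF-1) the one before it
echo "a b c d" | awk '{print $NF, $(NF-1)}'
# prints: d c
```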
Example data (employees.txt):
John Manager Sales 5000
Alice Developer IT 6000
Bob Analyst Finance 4500
Working with fields:
Print name and salary
awk '{print $1, $4}' employees.txt
Swap columns
awk '{print $4, $2, $1}' employees.txt
String concatenation
awk '{print $1 " works in " $3}' employees.txt
Arithmetic
awk '{print $1, $4 * 1.15}' employees.txt
Patterns (Conditions)
Patterns define which lines to process.
Pattern types:
- Regular expressions (between slashes)
awk '/IT/ {print}' employees.txt
- Field comparison
awk '$4 > 5000 {print $1, $4}' employees.txt
- Logical operators
AND: Sales department AND salary > 4000
awk '$3 == "Sales" && $4 > 4000 {print}' employees.txt
OR: salary below 5000 OR position is Manager
awk '$4 < 5000 || $2 == "Manager" {print}' employees.txt
- Line ranges
awk '/START/,/END/ {print}' file.txt
- Negation
All except the IT department
awk '$3 != "IT" {print}' employees.txt
All lines not containing Manager
awk '!/Manager/ {print}' employees.txt
Comparison operators:
| Operator | Meaning | Example |
|---|---|---|
| == | Equal | $1 == "John" |
| != | Not equal | $2 != "Sales" |
| < | Less than | $3 < 100 |
| > | Greater than | $3 > 1000 |
| <= | Less or equal | $3 <= 50 |
| >= | Greater or equal | $3 >= 100 |
| ~ | Matches regex | $1 ~ /^A/ |
| !~ | Doesn't match regex | $1 !~ /test/ |
Special Patterns BEGIN and END
BEGIN — executed once, before any input is read
END — executed once, after all input has been processed
Header and footer
awk 'BEGIN {print "Name\tSalary"}
{print $1, $4}
END {print "---\nTotal employees:", NR}' employees.txt
Practical example — calculating average:
awk 'BEGIN {sum=0; count=0}
{sum += $4; count++}
END {print "Average salary:", sum/count}' employees.txt
Built-in AWK Variables
AWK provides a set of predefined variables for working with data.
| Variable | Description | Example usage |
|---|---|---|
| NR | Number of Records — current line number | {print NR, $0} |
| NF | Number of Fields — field count in line | {print $NF} (last field) |
| FS | Field Separator — input field separator | BEGIN {FS=":"} {print $1} |
| OFS | Output Field Separator | BEGIN {OFS=","} {print $1, $2} |
| RS | Record Separator — input record separator | BEGIN {RS=";"} {print} |
| ORS | Output Record Separator | BEGIN {ORS="---\n"} {print} |
| FILENAME | Current processed file name | {print FILENAME, $0} |
| FNR | Line number in current file | {print FILENAME, FNR, $0} |
Usage examples:
Line numbering
awk '{print NR, $0}' file.txt
Output field count in each line
awk '{print "Line", NR, "contains", NF, "fields"}' data.txt
Change output separator
awk 'BEGIN {OFS=" | "} {print $1, $2, $3}' file.txt
Processing multiple files
awk '{print FILENAME, FNR, $1}' file1.txt file2.txt
Operations and Calculations in AWK
Arithmetic Operations
Basic operations: + - * / % (modulo)
awk '{print $1, $2 * 1.2}' file.txt
Increase value
awk '{$4 = $4 * 1.15; print}' employees.txt
Accumulate sum
awk '{sum += $3} END {print "Total:", sum}' sales.txt
Counter
awk '/error/ {count++} END {print "Errors:", count}' log.txt
String Operations
String concatenation (adjacent values are joined; no operator needed)
awk '{print $1 $2}' file.txt # JohnDoe
awk '{print $1 " " $2}' file.txt # John Doe
awk '{print $1 "-" $2}' file.txt # John-Doe
length() function — string length
awk '{print $1, length($1)}' file.txt
awk 'length($0) > 80 {print}' file.txt # Lines longer than 80 chars
substr() function — substring
awk '{print substr($1, 1, 3)}' file.txt # First 3 characters
tolower() and toupper() functions
awk '{print tolower($1)}' file.txt
awk '{print toupper($1)}' file.txt
gsub() function — replace all occurrences
awk '{gsub(/old/, "new"); print}' file.txt
sub() function — replace first occurrence
awk '{sub(/old/, "new"); print}' file.txt
match() function — find pattern
awk 'match($0, /[0-9]+/) {print substr($0, RSTART, RLENGTH)}' file.txt
Conditional Constructs
If-else:
Basic if
awk '{if ($3 > 5000) print $1, "high salary"}' employees.txt
If-else
awk '{if ($4 > 5000)
print $1, "high"
else
print $1, "low"}' employees.txt
Multi-line program
awk '{
if ($4 >= 6000)
category = "senior"
else if ($4 >= 5000)
category = "middle"
else
category = "junior"
print $1, category
}' employees.txt
Ternary operator:
awk '{print $1, ($4 > 5000 ? "high" : "low")}' employees.txt
Loops
For loop:
Output all fields line by line
awk '{for (i=1; i<=NF; i++) print i, $i}' file.txt
Sum all numbers in line
awk '{sum=0; for(i=1; i<=NF; i++) sum+=$i; print sum}' numbers.txt
While loop:
awk '{i=1; while(i<=NF) {print i":"$i; i++}}' file.txt
Arrays in AWK
Arrays in awk are associative (like hash tables): the key can be any string.
Basic Array Usage
Count unique values
awk '{count[$1]++} END {for (name in count) print name, count[name]}' file.txt
Sum by categories
awk '{sales[$3] += $4}
END {for (dept in sales) print dept, sales[dept]}' data.txt
Practical Array Examples
Count IP addresses in log:
awk '{ip_count[$1]++}
END {for (ip in ip_count) print ip, ip_count[ip]}' access.log | sort -rn -k2
Group by HTTP status:
awk '{status[$9]++}
END {for (code in status) print code, status[code]}' access.log
Find duplicates:
awk '{if (seen[$0]++) print "Duplicate:", $0}' file.txt
Remove duplicates (output unique lines):
awk '!seen[$0]++' file.txt
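The one-liner above works because seen[$0]++ evaluates to the old count before incrementing: it is 0 (false) the first time a line appears, so !seen[$0]++ is true exactly once per distinct line, triggering the default print action. A sketch:

```shell
# Each distinct line is printed only on its first occurrence, order preserved
printf 'a\nb\na\nc\nb\n' | awk '!seen[$0]++'
# prints:
# a
# b
# c
```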
Practical Examples for Real Tasks
Web Server Log Analysis
access.log format:
192.168.1.100 - - [24/Feb/2026:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 1234
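With awk's default whitespace splitting, the fields of this format line up as sketched below (note that $6 keeps the opening quote of the request string):

```shell
# Field map for the common/combined log format:
# $1 = client IP, $4 = timestamp start, $6 = method (with leading quote),
# $7 = request path, $9 = status code, $10 = response size in bytes
line='192.168.1.100 - - [24/Feb/2026:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 1234'
echo "$line" | awk '{print "ip="$1, "method="$6, "path="$7, "status="$9, "bytes="$10}'
# prints: ip=192.168.1.100 method="GET path=/index.html status=200 bytes=1234
```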
Top-10 IP addresses by request count:
awk '{ip[$1]++} END {for (i in ip) print ip[i], i}' access.log | sort -rn | head -10
Count requests by HTTP method ($6 keeps the opening quote of the request string, so strip it):
awk '{gsub(/"/, "", $6); print $6}' access.log | sort | uniq -c
Filter 404 errors:
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn
Traffic by hour:
awk '{print substr($4, 14, 2)}' access.log | sort | uniq -c
CSV File Processing
Example CSV (sales.csv):
Date,Product,Quantity,Price
2026-02-01,Widget,10,25.50
2026-02-01,Gadget,5,120.00
2026-02-02,Widget,8,25.50
Calculate total revenue:
awk -F',' 'NR>1 {sum += $3 * $4} END {print "Total:", sum}' sales.csv
Sum by products:
awk -F',' 'NR>1 {total[$2] += $3 * $4}
END {for (p in total) print p, total[p]}' sales.csv
Filter by date:
awk -F',' '$1 ~ /2026-02-01/ {print $2, $3}' sales.csv
Formatted output with headers:
awk -F',' 'BEGIN {print "Product | Total"}
NR>1 {total[$2] += $3 * $4}
END {for (p in total) printf "%-10s | $%.2f\n", p, total[p]}' sales.csv
System Administration
Monitor disk usage:
df -h | awk 'NR>1 {if ($5+0 > 80) print $6, "used", $5}'
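The $5+0 trick above works because awk's string-to-number conversion reads digits up to the first non-numeric character, so a value like "82%" compares as the number 82. A sketch:

```shell
# Adding 0 forces numeric conversion; parsing stops at the '%'
echo "82% full" | awk '{print $1+0}'
# prints: 82
```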
Top processes by memory:
ps aux | awk 'NR>1 {print $11, $4"%"}' | sort -k2 -rn | head -10
Analyze active connections:
netstat -an | awk '/ESTABLISHED/ {count[$5]++}
END {for (ip in count) print ip, count[ip]}' | sort -rn -k2
Check open ports:
netstat -tuln | awk 'NR>2 {print $4}' | awk -F':' '{print $NF}' | sort -n | uniq
Configuration File Processing
Extract active users from /etc/passwd:
awk -F':' '$7 !~ /nologin|false/ {print $1, $6}' /etc/passwd
Users with UID > 1000:
awk -F':' '$3 >= 1000 {print $1, $3}' /etc/passwd
Primary group of a user (replace username with a real account name):
awk -F':' '/^username:/ {print $4}' /etc/passwd |
xargs -I {} awk -F':' '$3 == {} {print $1}' /etc/group
Advanced Techniques
Multi-file Processing
Intersection: lines of file2 whose first field also appears in file1
awk 'NR==FNR {a[$1]; next} $1 in a' file1.txt file2.txt
Difference: lines of file2 whose first field does not appear in file1
awk 'NR==FNR {a[$1]; next} !($1 in a)' file1.txt file2.txt
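These two-file one-liners rely on the fact that NR (the global line counter) equals FNR (the per-file line counter) only while the first file is being read; next then skips the second pattern for those lines. A sketch with throwaway files (the /tmp paths are arbitrary):

```shell
# Build two small test files
printf 'x\ny\n' > /tmp/awk_f1.txt
printf 'y\nz\n' > /tmp/awk_f2.txt
# While reading f1 (NR==FNR), store keys; while reading f2, print matches
awk 'NR==FNR {a[$1]; next} $1 in a' /tmp/awk_f1.txt /tmp/awk_f2.txt
# prints: y
```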
Output Formatting with printf
Column alignment
awk '{printf "%-10s %8.2f\n", $1, $2}' file.txt
Table with headers
awk 'BEGIN {printf "%-10s %10s %10s\n", "Name", "Salary", "Dept"}
{printf "%-10s %10d %10s\n", $1, $4, $3}' employees.txt
Calling External Commands
Execute shell command
awk '{system("echo Processing: " $1)}' file.txt
Read command output
awk 'BEGIN {
cmd = "date"
cmd | getline result
close(cmd)
print result
}'
Creating Reports
Detailed log report (note: strftime() is a gawk extension):
awk 'BEGIN {
print "=== Web Server Report ==="
print "Generated:", strftime("%Y-%m-%d %H:%M:%S")
}
{
requests++
bytes += $10
status[$9]++
if ($9 >= 400) errors++
}
END {
print "\nTotal Requests:", requests
print "Total Traffic:", bytes/1024/1024, "MB"
print "Error Rate:", (errors/requests)*100 "%"
print "\nStatus Codes:"
for (code in status)
printf " %s: %d (%.1f%%)\n", code, status[code], (status[code]/requests)*100
}' access.log
Comparing AWK with Alternatives
awk vs sed
awk better for:
- Working with data columns
- Mathematical calculations
- Complex logic (conditions, loops)
- Report creation
sed better for:
- Simple text replacement
- Stream editing
- Line deletion
- Text insertion
Example task: Extract email addresses from text
awk
awk 'match($0, /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]{2,}/) {
print substr($0, RSTART, RLENGTH)
}' file.txt
sed (harder and less readable)
sed -n 's/.*\([a-zA-Z0-9._-]\+@[a-zA-Z0-9._-]\+\.[a-zA-Z]\{2,\}\).*/\1/p' file.txt
awk vs Python
awk advantages:
- Faster startup for simple tasks (no heavyweight runtime to load)
- Shorter code for typical operations
- Built into any Linux
- More convenient in pipe chains
Python advantages:
- More standard library capabilities
- Better for complex logic
- Easier debugging
- More documentation
Code comparison — calculate average salary:
awk (1 line)
awk '{sum+=$4; count++} END {print sum/count}' employees.txt
Python (minimum 7 lines)
with open('employees.txt') as f:
    total = 0
    count = 0
    for line in f:
        salary = int(line.split()[3])
        total += salary
        count += 1
print(total / count)
Optimization and Best Practices
Performance
1. Avoid unnecessary operations:
Bad - runs an explicit if on every line
awk '{if (NR > 1) print}' file.txt
Good - use a bare pattern (the default action is print)
awk 'NR > 1' file.txt
2. Use built-in variables:
Print the needed fields explicitly
awk '{print $1, $2, $3, $4}' file.txt
Or blank out the unwanted field (note: this clears only field 5 and rewrites $0 with OFS)
awk '{$5=""; print}' file.txt
3. Filter early:
Bad - the sum accumulates over every line, not just Sales
awk '{sum += $3} $2 == "Sales" {print}' file.txt
Good - apply the condition before doing any work
awk '$2 == "Sales" {sum += $3; print}' file.txt
Code Readability
Use variables:
Bad
awk '$3 > 5000 && $4 == "IT" {print $1}' file.txt
Good
awk '{
salary = $3
dept = $4
name = $1
if (salary > 5000 && dept == "IT")
print name
}' file.txt
Extract complex programs to files:
script.awk
BEGIN {
FS = ","
OFS = " | "
total = 0
}
NR > 1 {
total += $3 * $4
printf "%s | %s | %.2f\n", $1, $2, $3 * $4
}
END {
print "---"
printf "Total: $%.2f\n", total
}
Usage:
awk -f script.awk data.csv
Common Mistakes and Solutions
Problem: Wrong field separator
Error - CSV processed as spaces
awk '{print $2}' data.csv
Solution - specify separator
awk -F',' '{print $2}' data.csv
Problem: Skipping header
Error - header participates in calculations
awk '{sum += $2} END {print sum}' file.csv
Solution 1 - start from second line
awk 'NR > 1 {sum += $2} END {print sum}' file.csv
Solution 2 - skip non-numeric
awk '{if ($2 ~ /^[0-9]+$/) sum += $2} END {print sum}' file.csv
Problem: Division by zero
Error - possible division by zero
awk '{print $1 / $2}' file.txt
Solution - check
awk '{if ($2 != 0) print $1 / $2; else print "N/A"}' file.txt
Problem: Spaces around CSV fields
Error - a field like " New York " keeps its padding after splitting on commas
awk -F',' '{print $3}' cities.csv
Solution - trim leading and trailing spaces
awk -F',' '{gsub(/^ +| +$/, "", $3); print $3}' cities.csv
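Trimming spaces does not help when a quoted field contains a comma itself (e.g. "New York, NY"). GNU awk's FPAT variable defines what a field looks like rather than what separates fields; this sketch assumes gawk is installed:

```shell
# FPAT (gawk only): a field is either a run of non-commas
# or a double-quoted string, so embedded commas stay inside the field
printf 'Boston,"New York, NY",Chicago\n' |
  gawk 'BEGIN {FPAT = "([^,]+)|(\"[^\"]+\")"} {print $2}'
# prints: "New York, NY"
```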
Frequently Asked Questions (FAQ)
How to print last column?
awk '{print $NF}' file.txt
How to print all except first column?
awk '{$1=""; print}' file.txt
Or cleaner:
awk '{for(i=2;i<=NF;i++) printf "%s ", $i; print ""}' file.txt
How to count lines?
awk 'END {print NR}' file.txt
Or simply
wc -l file.txt
How to process files with paths containing spaces?
Quote the path on the command line; inside awk, FILENAME holds the current file's name
awk '{print FILENAME, $0}' "file with spaces.txt"
How to output lines between two patterns?
awk '/START/,/END/' file.txt
How to remove duplicates preserving order?
awk '!seen[$0]++' file.txt
Can you modify original file?
Standard awk only reads its input; write results to a new file and replace the original if needed:
awk '{print $1, $2}' input.txt > output.txt
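The usual pattern is to write to a temporary file and then replace the original (GNU awk 4.1+ also offers a gawk-only -i inplace option). A sketch with hypothetical throwaway paths:

```shell
# Create a demo file
printf 'John 5000\nAlice 6000\n' > /tmp/demo_input.txt
# Write awk's output to a temp file, then replace the original
awk '{print $1}' /tmp/demo_input.txt > /tmp/demo_input.txt.tmp &&
  mv /tmp/demo_input.txt.tmp /tmp/demo_input.txt
cat /tmp/demo_input.txt
# prints:
# John
# Alice
```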
How to process JSON in awk?
Awk is poorly suited for JSON. Use jq instead:
jq '.items[] | .name' file.json
AWK Versions and Installation
Check installed version:
awk --version
or
which awk
Main implementations:
- gawk — GNU Awk, the most common on Linux
- mawk — a fast implementation with fewer features
- nawk — "new awk", the POSIX-standard version
- original awk ("one true awk") — still maintained and the default on BSD and macOS; on most Linux systems, awk is a symlink to gawk or mawk
Installation on different systems:
Ubuntu/Debian
sudo apt install gawk
CentOS/RHEL
sudo yum install gawk
macOS
brew install gawk
On THE.Hosting servers gawk is pre-installed on all plans and ready to use.
Conclusion
The awk command is an indispensable tool in the arsenal of system administrators and DevOps engineers. It solves in seconds tasks that would take dozens of lines of code in other languages.
Main awk advantages: working with structured data by columns, built-in regular expression support, mathematical calculations, a minimalist syntax for simple tasks, and the ability to build complex programs for advanced scenarios.
Having learned the basic concepts from this guide, you can analyze web server and application log files, process CSV and other tabular data, automate routine text operations, monitor system resources, and build reports.
On THE.Hosting servers you can use awk for web server log analysis, load monitoring and administration task automation. All modern Linux distributions on our VPS include GNU Awk ready to work.
Additional Resources:
- GNU Awk Manual — official documentation
- The AWK Programming Language — book by language creators
- Awk One-Liners — ready solutions collection