AWK Command in Linux: Complete Guide to Text Processing and Data Analysis

24.02.2026
13:23

When you need to quickly extract the third column from a 100,000-line log file, calculate the sum of a column in a CSV, or find duplicates in configuration files, experienced system administrators reach for one tool: awk. This command-line utility has existed since 1977, yet it remains irreplaceable for processing structured text.

Unlike a Python script that has to be written and debugged, awk solves most text-processing tasks with a single line of code. Unlike grep, which simply searches for patterns, awk understands the column-and-row structure of the data, performs mathematical calculations and formats output.

This guide shows how to use awk for everyday system administration tasks: web server log analysis, CSV file processing, system resource monitoring and automation of routine operations. All examples were tested on real THE.Hosting servers and are ready to use.

What is AWK and Why You Need It

AWK is a programming language and command-line utility for processing text data on Unix and Linux systems. The name comes from its creators' surnames: Aho, Weinberger and Kernighan, who developed the language at Bell Labs in 1977.

AWK's core concept: the program reads input text line by line, splits each line into fields (columns) by a delimiter and performs the specified actions on those fields. This makes awk ideal for working with structured data: logs, CSV files, system command output and configuration files.
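A minimal illustration of this model (the sample data is invented):

```shell
# Default separator is whitespace; print the first field of each line
printf 'alice 30\nbob 25\n' | awk '{print $1}'
# alice
# bob
```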

When to use awk:

  • Extract specific columns from data tables
  • Filter rows by conditions (greater/less, contains text)
  • Calculations on columns (sum, average, count)
  • Format output of other commands
  • Process CSV, TSV, log files
  • Quick analysis of large text files without loading into memory

Comparison with alternatives:

Tool     When to use                                   Complexity
awk      Work with columns, calculations, formatting   Medium
grep     Simple line search by pattern                 Low
sed      Text replacement, stream editing              Medium
cut      Extract fixed columns                         Low
Python   Complex logic, large projects                 High

Real example: you have a web server log with 50,000 lines and need to find all requests to the /api/users endpoint, counting how many times each IP address accessed it. With awk this is one line:

awk '/\/api\/users/ {count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log

An equivalent Python script would take 15-20 lines of code.

AWK Command Syntax

Basic structure:

awk 'pattern {action}' file
  • pattern — line selection condition (optional)
  • action — action on lines matching condition
  • file — input file (can be multiple or stdin via pipe)
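Both parts are optional: with no action the default is {print}, and with no pattern the action applies to every line. A quick check (input invented):

```shell
# These two commands are equivalent: the default action is {print}
printf 'x 1\ny 2\n' | awk '$2 > 1'
printf 'x 1\ny 2\n' | awk '$2 > 1 {print}'
# both print: y 2
```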

Main usage variants:

Print all lines (like cat)

awk '{print}' file.txt

Print first column

awk '{print $1}' file.txt

Lines containing "error"

awk '/error/ {print}' file.txt

With condition on column value

awk '$3 > 100 {print $1, $3}' file.txt

Multiple actions via semicolon

awk '{sum += $2; count++} END {print sum/count}' file.txt

Important command-line options:

Option   Description                Example
-F       Specify field separator    awk -F':' '{print $1}' /etc/passwd
-v       Pass variable to program   awk -v limit=100 '$3 > limit'
-f       Read program from file     awk -f script.awk data.txt

Field separators:

By default awk splits lines by spaces and tabs. For other separators:

CSV file (comma)

awk -F',' '{print $1, $3}' data.csv

/etc/passwd file (colon)

awk -F':' '{print $1, $6}' /etc/passwd

Multiple separator characters

awk -F'[,:]' '{print $1}' mixed.txt

Regular expression as separator

awk -F'[[:space:]]+' '{print $1}' file.txt

Basic AWK Concepts

Fields and Records

A record is one line of input text; awk processes files record by record.

A field is one column within a record, delimited by a separator character.

Field notation:

  • $0 — entire line
  • $1 — first field (first column)
  • $2 — second field
  • $NF — last field (NF = number of fields)
  • $(NF-1) — second to last field
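The last-field notation is handy when lines vary in width; a quick demonstration on an invented line:

```shell
# NF is the field count, so $NF is the last field
echo 'one two three four' | awk '{print $NF, $(NF-1)}'
# four three
```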

Example data (employees.txt):

John Manager Sales 5000
Alice Developer IT 6000
Bob Analyst Finance 4500

Working with fields:

Print name and salary

awk '{print $1, $4}' employees.txt

Swap columns

awk '{print $4, $2, $1}' employees.txt

String concatenation

awk '{print $1 " works in " $3}' employees.txt

Arithmetic

awk '{print $1, $4 * 1.15}' employees.txt

Patterns (Conditions)

Patterns define which lines to process.

Pattern types:

  1. Regular expressions (between slashes)

awk '/IT/ {print}' employees.txt

  2. Field comparison

awk '$4 > 5000 {print $1, $4}' employees.txt

  3. Logical operators

AND: Sales department AND salary > 4000

awk '$3 == "Sales" && $4 > 4000 {print}' employees.txt

OR: salary below 5000 OR position Manager

awk '$4 < 5000 || $2 == "Manager" {print}' employees.txt

  4. Line ranges

awk '/START/,/END/ {print}' file.txt

  5. Negation

All except the IT department

awk '$3 != "IT" {print}' employees.txt

All lines not containing "Manager"

awk '!/Manager/ {print}' employees.txt

Comparison operators:

Operator   Meaning               Example
==         Equal                 $1 == "John"
!=         Not equal             $2 != "Sales"
<          Less than             $3 < 100
>          Greater than          $3 > 1000
<=         Less or equal         $3 <= 50
>=         Greater or equal      $3 >= 100
~          Matches regex         $1 ~ /^A/
!~         Doesn't match regex   $1 !~ /test/
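The regex-match operators are easy to try out (input invented):

```shell
# ~ keeps lines whose first field starts with "A"; !~ keeps the rest
printf 'Alice\nBob\n' | awk '$1 ~ /^A/'    # Alice
printf 'Alice\nBob\n' | awk '$1 !~ /^A/'   # Bob
```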

Special Patterns BEGIN and END

BEGIN — executed once before processing data
END — executed once after processing all data

Header and footer

awk 'BEGIN {print "Name\tSalary"}
{print $1, $4}
END {print "---\nTotal employees:", NR}' employees.txt

Practical example — calculating average:

awk 'BEGIN {sum=0; count=0}
{sum += $4; count++}
END {print "Average salary:", sum/count}' employees.txt

Built-in AWK Variables

AWK provides a set of predefined variables for working with data.

Variable   Description                                   Example usage
NR         Current record (line) number across all input {print NR, $0}
NF         Number of fields in the current line          {print $NF} (last field)
FS         Input field separator                         BEGIN {FS=":"} {print $1}
OFS        Output field separator                        BEGIN {OFS=","} {print $1, $2}
RS         Input record separator                        BEGIN {RS=";"} {print}
ORS        Output record separator                       BEGIN {ORS="---\n"} {print}
FILENAME   Name of the file currently being processed    {print FILENAME, $0}
FNR        Record number within the current file         {print FILENAME, FNR, $0}

Usage examples:

Line numbering

awk '{print NR, $0}' file.txt

Output field count in each line

awk '{print "Line", NR, "contains", NF, "fields"}' data.txt

Change output separator

awk 'BEGIN {OFS=" | "} {print $1, $2, $3}' file.txt

Processing multiple files

awk '{print FILENAME, FNR, $1}' file1.txt file2.txt

Operations and Calculations in AWK

Arithmetic Operations

Basic operations: + - * / % (modulo)

awk '{print $1, $2 * 1.2}' file.txt

Increase value

awk '{$4 = $4 * 1.15; print}' employees.txt

Accumulate sum

awk '{sum += $3} END {print "Total:", sum}' sales.txt

Counter

awk '/error/ {count++} END {print "Errors:", count}' log.txt

String Operations

String concatenation (values written side by side are joined; separators must be quoted)

awk '{print $1 $2}' file.txt          # JohnDoe
awk '{print $1 " " $2}' file.txt      # John Doe
awk '{print $1 "-" $2}' file.txt      # John-Doe

length() function — string length

awk '{print $1, length($1)}' file.txt
awk 'length($0) > 80 {print}' file.txt  # Lines longer than 80 chars

substr() function — substring

awk '{print substr($1, 1, 3)}' file.txt  # First 3 characters

tolower() and toupper() functions

awk '{print tolower($1)}' file.txt
awk '{print toupper($1)}' file.txt

gsub() function — replace all occurrences

awk '{gsub(/old/, "new"); print}' file.txt

sub() function — replace first occurrence

awk '{sub(/old/, "new"); print}' file.txt

match() function — find pattern

awk 'match($0, /[0-9]+/) {print substr($0, RSTART, RLENGTH)}' file.txt

Conditional Constructs

If-else:

Basic if

awk '{if ($3 > 5000) print $1, "high salary"}' employees.txt

If-else

awk '{if ($4 > 5000)
          print $1, "high"
      else
          print $1, "low"}' employees.txt

Multi-line program

awk '{
    if ($4 >= 6000)
        category = "senior"
    else if ($4 >= 5000)
        category = "middle"
    else
        category = "junior"
    print $1, category
}' employees.txt

Ternary operator:

awk '{print $1, ($4 > 5000 ? "high" : "low")}' employees.txt

Loops

For loop:

Output all fields line by line

awk '{for (i=1; i<=NF; i++) print i, $i}' file.txt

Sum all numbers in line

awk '{sum=0; for(i=1; i<=NF; i++) sum+=$i; print sum}' numbers.txt

While loop:

awk '{i=1; while(i<=NF) {print i":"$i; i++}}' file.txt

Arrays in AWK

Arrays in awk are associative (like hash tables) — key can be any string.

Basic Array Usage

Count unique values

awk '{count[$1]++} END {for (name in count) print name, count[name]}' file.txt

Sum by categories

awk '{sales[$3] += $4}
END {for (dept in sales) print dept, sales[dept]}' data.txt

Practical Array Examples

Count IP addresses in log:

awk '{ip_count[$1]++} 
     END {for (ip in ip_count) print ip, ip_count[ip]}' access.log | sort -rn -k2

Group by HTTP status:

awk '{status[$9]++} 
     END {for (code in status) print code, status[code]}' access.log

Find duplicates:

awk '{if (seen[$0]++) print "Duplicate:", $0}' file.txt

Remove duplicates (output unique lines):

awk '!seen[$0]++' file.txt
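Why this works: seen[$0]++ evaluates to the counter's previous value, which is 0 (false) the first time a line appears, so !seen[$0]++ is true exactly once per distinct line and the default action prints it. A quick check on invented input:

```shell
# The repeated "a" is dropped; original order is preserved
printf 'a\nb\na\n' | awk '!seen[$0]++'
# a
# b
```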

Practical Examples for Real Tasks

Web Server Log Analysis

access.log format:

192.168.1.100 - - [24/Feb/2026:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 1234

Top 10 IP addresses by request count:

awk '{ip[$1]++} END {for (i in ip) print ip[i], i}' access.log | sort -rn | head -10

Count requests by HTTP method (the method field carries the opening quote of the request string, so strip it first):

awk '{gsub(/"/, "", $6); print $6}' access.log | sort | uniq -c

Filter 404 errors:

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn

Traffic by hour:

awk '{print substr($4, 14, 2)}' access.log | sort | uniq -c

CSV File Processing

Example CSV (sales.csv):

Date,Product,Quantity,Price
2026-02-01,Widget,10,25.50
2026-02-01,Gadget,5,120.00
2026-02-02,Widget,8,25.50

Calculate total revenue:

awk -F',' 'NR>1 {sum += $3 * $4} END {print "Total:", sum}' sales.csv

Sum by products:

awk -F',' 'NR>1 {total[$2] += $3 * $4} 
           END {for (p in total) print p, total[p]}' sales.csv

Filter by date:

awk -F',' '$1 ~ /2026-02-01/ {print $2, $3}' sales.csv

Formatted output with headers:

awk -F',' 'BEGIN {print "Product | Total"} 
           NR>1 {total[$2] += $3 * $4} 
           END {for (p in total) printf "%-10s | $%.2f\n", p, total[p]}' sales.csv

System Administration

Monitor disk usage:

df -h | awk 'NR>1 {if ($5+0 > 80) print $6, "used", $5}'

Top processes by memory:

ps aux | awk 'NR>1 {print $11, $4"%"}' | sort -k2 -rn | head -10

Analyze active connections:

netstat -an | awk '/ESTABLISHED/ {count[$5]++} 
                   END {for (ip in count) print ip, count[ip]}' | sort -rn -k2

Check open ports:

netstat -tuln | awk 'NR>2 {print $4}' | awk -F':' '{print $NF}' | sort -n | uniq

Configuration File Processing

Extract active users from /etc/passwd:

awk -F':' '$7 !~ /nologin|false/ {print $1, $6}' /etc/passwd

Users with UID > 1000:

awk -F':' '$3 >= 1000 {print $1, $3}' /etc/passwd

Primary group of a user (username is a placeholder for the actual login):

awk -F':' '/^username:/ {print $4}' /etc/passwd | 
  xargs -I {} awk -F':' '$3 == {} {print $1}' /etc/group

Advanced Techniques

Multi-file Processing

Compare two files

awk 'NR==FNR {a[$1]; next} $1 in a' file1.txt file2.txt

File difference

awk 'NR==FNR {a[$1]; next} !($1 in a)' file1.txt file2.txt
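How the NR==FNR trick works: while awk reads the first file, the global record counter NR equals the per-file counter FNR, so only the first block runs (filling the array, with next skipping the rest of the program); for the second file the counters diverge and only the membership test applies. A self-contained demo (file names and contents invented):

```shell
# Keep only events whose first field appears in the allow-list
tmp=$(mktemp -d)
printf 'alice\nbob\n'      > "$tmp/allowed.txt"
printf 'bob 42\ncarol 7\n' > "$tmp/events.txt"

awk 'NR==FNR {a[$1]; next} $1 in a' "$tmp/allowed.txt" "$tmp/events.txt"
# bob 42

rm -r "$tmp"
```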

Output Formatting with printf

Column alignment

awk '{printf "%-10s %8.2f\n", $1, $2}' file.txt

Table with headers

awk 'BEGIN {printf "%-10s %10s %10s\n", "Name", "Salary", "Dept"}
{printf "%-10s %10d %10s\n", $1, $4, $3}' employees.txt

Calling External Commands

Execute shell command

awk '{system("echo Processing: " $1)}' file.txt

Read command output

awk 'BEGIN {
    cmd = "date"
    cmd | getline result
    close(cmd)
    print result
}'

Creating Reports

Detailed log report (strftime here is a GNU Awk extension):

awk 'BEGIN {
  print "=== Web Server Report ==="
  print "Generated:", strftime("%Y-%m-%d %H:%M:%S")
}
{
  requests++
  bytes += $10
  status[$9]++
  if ($9 >= 400) errors++
}
END {
  print "\nTotal Requests:", requests
  print "Total Traffic:", bytes/1024/1024, "MB"
  print "Error Rate:", (errors/requests)*100 "%"
  print "\nStatus Codes:"
  for (code in status) 
    printf "  %s: %d (%.1f%%)\n", code, status[code], (status[code]/requests)*100
}' access.log

Comparing AWK with Alternatives

awk vs sed

awk better for:

  • Working with data columns
  • Mathematical calculations
  • Complex logic (conditions, loops)
  • Report creation

sed better for:

  • Simple text replacement
  • Stream editing
  • Line deletion
  • Text insertion

Example task: Extract email addresses from text

awk

awk 'match($0, /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]{2,}/) {
print substr($0, RSTART, RLENGTH)
}' file.txt

sed (harder and less readable)

sed -n 's/.*\([a-zA-Z0-9._-]\+@[a-zA-Z0-9._-]\+\.[a-zA-Z]\{2,\}\).*/\1/p' file.txt

awk vs Python

awk advantages:

  • Lower startup overhead, faster for simple one-liners
  • Shorter code for typical operations
  • Built into any Linux
  • More convenient in pipe chains

Python advantages:

  • More standard library capabilities
  • Better for complex logic
  • Easier debugging
  • More documentation

Code comparison — calculate average salary:

awk (1 line)

awk '{sum+=$4; count++} END {print sum/count}' employees.txt

Python (about 8 lines)

with open('employees.txt') as f:
    total = 0
    count = 0
    for line in f:
        salary = int(line.split()[3])
        total += salary
        count += 1
print(total / count)

Optimization and Best Practices

Performance

1. Prefer patterns to conditions inside actions:

Verbose - condition checked inside the action

awk '{if (NR > 1) print}' file.txt

Better - the pattern filters, the default action prints

awk 'NR > 1' file.txt

2. Stop reading once you have the answer:

Scans the entire file

awk '/error/ {print}' file.txt

Exits at the first match

awk '/error/ {print; exit}' file.txt

3. Filter early:

Wrong - sums every line but prints only Sales

awk '{sum += $3} $2 == "Sales" {print}' file.txt

Right - both the sum and the output are restricted to Sales

awk '$2 == "Sales" {sum += $3; print}' file.txt

Code Readability

Use variables:

Bad

awk '$3 > 5000 && $4 == "IT" {print $1}' file.txt

Good

awk '{
    salary = $3
    dept = $4
    name = $1
    if (salary > 5000 && dept == "IT")
        print name
}' file.txt

Extract complex programs to files:

script.awk

BEGIN {
    FS = ","
    OFS = " | "
    total = 0
}
NR > 1 {
    total += $3 * $4
    printf "%s | %s | %.2f\n", $1, $2, $3 * $4
}
END {
    print "---"
    printf "Total: $%.2f\n", total
}

Usage:

awk -f script.awk data.csv

Common Mistakes and Solutions

Problem: Wrong field separator

Error - CSV processed as spaces

awk '{print $2}' data.csv

Solution - specify separator

awk -F',' '{print $2}' data.csv

Problem: Skipping header

Error - header participates in calculations

awk '{sum += $2} END {print sum}' file.csv

Solution 1 - start from second line

awk 'NR > 1 {sum += $2} END {print sum}' file.csv

Solution 2 - skip non-numeric

awk '{if ($2 ~ /^[0-9]+$/) sum += $2} END {print sum}' file.csv

Problem: Division by zero

Error - possible division by zero

awk '{print $1 / $2}' file.txt

Solution - check

awk '{if ($2 != 0) print $1 / $2; else print "N/A"}' file.txt

Problem: Spaces in CSV fields

Error - field "New York" splits into two

awk -F',' '{print $3}' cities.csv

Solution - trim spaces

awk -F',' '{gsub(/^ +| +$/, "", $3); print $3}' cities.csv

Frequently Asked Questions (FAQ)

How to print last column?

awk '{print $NF}' file.txt

How to print all except first column?

awk '{$1=""; print}' file.txt

Or cleaner:

awk '{for(i=2;i<=NF;i++) printf "%s ", $i; print ""}' file.txt

How to count lines?

awk 'END {print NR}' file.txt

Or simply

wc -l file.txt

How to process files whose names contain spaces?

Quote the file name:

awk '{print FILENAME, $0}' "file with spaces.txt"

How to output lines between two patterns?

awk '/START/,/END/' file.txt

How to remove duplicates preserving order?

awk '!seen[$0]++' file.txt

Can you modify the original file?

Classic awk only reads its input; redirect the output to a new file:

awk '{print $1, $2}' input.txt > output.txt

GNU Awk 4.1+ also supports in-place editing:

gawk -i inplace '{print $1, $2}' file.txt

How to process JSON in awk?

Awk is poorly suited for JSON. Use jq instead:

jq '.items[] | .name' file.json

Installation and AWK Versions

Check installed version:

awk --version

or

which awk

Main implementations:

  • gawk — GNU Awk, the most common implementation on Linux
  • mawk — a fast implementation with fewer features
  • nawk — "new awk", close to the POSIX specification
  • original awk (Brian Kernighan's "one true awk") — still the default on BSD and macOS; on Linux, /usr/bin/awk is usually a symlink to gawk or mawk

Installation on different systems:

Ubuntu/Debian

sudo apt install gawk

CentOS/RHEL

sudo yum install gawk

macOS

brew install gawk

On THE.Hosting servers gawk is pre-installed on all plans and ready to use.

Conclusion

The awk command is an indispensable tool in the system administrator's and DevOps engineer's arsenal. It solves in seconds tasks that would otherwise take dozens of lines of code in other languages.

Main awk advantages: working with structured data by columns, built-in regular expression support, mathematical calculations, a minimalist syntax for simple tasks and the ability to build complex programs for advanced scenarios.

Having learned the basic concepts from this guide, you can analyze web server and application logs, process CSV and other tabular data, automate routine text operations, monitor system resources and build reports.

On THE.Hosting servers you can use awk for web server log analysis, load monitoring and administration task automation. All modern Linux distributions on our VPS include GNU Awk ready to work.

