Regular Expressions and the Linux Text-Processing Trio, Day 2 (Continued)

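The awk command

Patterns

BEGIN, END, and other conditions

  • A BEGIN pattern runs its action once, before any input is read
  • An END pattern runs its action once, after all input has been processed
  • Other conditions are expressions you define yourself, e.g. NR==2 selects the second line and NR<4 selects lines 1 through 3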
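
The two NR conditions mentioned in the last bullet can be tried out with a short sketch like the one below. The temporary file is an assumption made for illustration; its four lines mirror the awk1.txt contents used in this post.

```shell
# Hypothetical sample file, created only for this sketch
sample=$(mktemp)
printf 'a1 a2 a3 a4\nb1 b2 b3 b4\nc1 c2 c3 c4\nd1 d2 d3 d4\n' > "$sample"

# NR==2 selects only the second record
second_line=$(awk 'NR==2{print $0}' "$sample")
echo "$second_line"

# NR<4 selects records 1 through 3; NR is printed in front of each line
first_three=$(awk 'NR<4{print NR": "$0}' "$sample")
echo "$first_three"

rm -f "$sample"
```

A condition with no action block, such as awk 'NR<4' file, prints the matching lines as well, because printing the record is awk's default action.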

BEGIN

awk 'BEGIN{print "I am printed first!"}{print $0}' awk1.txt

Demo

[root@localhost ~]# awk 'BEGIN{print "I am printed first!"}{print $0}' awk1.txt
I am printed first!
a1 a2 a3 a4
b1 b2 b3 b4
c1 c2 c3 c4
d1 d2 d3 d4

END

awk 'BEGIN{print "I am the BEGIN block, I run first!"}{print $0}END{print "I am the END block, I run last!"}' awk2.txt

Demo

[root@localhost ~]# awk 'BEGIN{print "I am the BEGIN block, I run first!"}{print $0}END{print "I am the END block, I run last!"}' awk2.txt
I am the BEGIN block, I run first!
a:b:c:d
1:2:3:4
x:x:x:x
I am the END block, I run last!
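
Beyond printing fixed text, BEGIN is a natural place for setup and END for a summary. The sketch below is one possible use, not the only one: it assumes a temporary file with the same three colon-separated lines as awk2.txt, sets the field separator in BEGIN, and reports the record count in END.

```shell
# Hypothetical copy of the awk2.txt contents, created only for this sketch
data=$(mktemp)
printf 'a:b:c:d\n1:2:3:4\nx:x:x:x\n' > "$data"

# BEGIN: set the field separator and print a header before any input is read
# main block: print the second field of every record
# END: NR still holds the number of records that were read
report=$(awk 'BEGIN { FS = ":"; print "second field of each line:" }
              { print $2 }
              END { print "lines processed: " NR }' "$data")
echo "$report"

rm -f "$data"
```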

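Other conditions

Relational operators in awk patterns

Operator   Meaning                    Example
<          less than                  NR<2
<=         less than or equal to      NR<=3
==         equal to                   NR==1
!=         not equal to               NR!=1
>=         greater than or equal to   NR>=1
>          greater than               NR>1
~          matches regex              x~/regex/
!~         does not match regex       x!~/regex/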
!=
awk 'NR!=1{print NR,$0}' awk1.txt

Demo

[root@localhost ~]# awk 'NR!=1{print NR,$0}' awk1.txt
2 b1 b2 b3 b4
3 c1 c2 c3 c4
4 d1 d2 d3 d4
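
The ~ and !~ operators from the table are most useful when applied to a single field rather than the whole line. As a rough sketch, assuming a small passwd-style temporary file (three entries borrowed from the grep.txt sample in this post), the seventh field can be tested against a regex like this:

```shell
# Hypothetical passwd-style sample, created only for this sketch
users=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'commen:x:1000:1000::/home/commen:/bin/bash' > "$users"

# $7 ~ /bash$/ : the login shell field ends with "bash"
bash_users=$(awk -F: '$7 ~ /bash$/ {print $1}' "$users")
echo "$bash_users"

# $7 !~ /bash$/ : the login shell field does not end with "bash"
other_users=$(awk -F: '$7 !~ /bash$/ {print $1}' "$users")
echo "$other_users"

rm -f "$users"
```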

Regex in practice

Syntax for using a regex

grep 'regex' file
awk '/regex/{action}' file

[root@localhost ~]# cat awk2.txt
a:b:c:d
1:2:3:4
x:x:x:x
[root@localhost ~]# awk '/d$/{print $0}' awk2.txt
a:b:c:d
[root@localhost ~]# grep "d$" awk2.txt
a:b:c:d

awk in practice: nginx logs

Log file

access.log

60.255.73.42 - - [24/Jul/2022:18:55:11 +0800] "GET /js/schedule_index.js HTTP/1.1" 404 555 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
60.255.73.42 - - [24/Jul/2022:18:55:11 +0800] "GET /js/canvas-nest.js HTTP/1.1" 304 0 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
60.255.73.42 - - [24/Jul/2022:18:55:11 +0800] "GET /img/post/gawr_gura1.jpg HTTP/1.1" 304 0 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
60.255.73.42 - - [24/Jul/2022:18:55:12 +0800] "GET /img/post/gawr_gura2.jpg HTTP/1.1" 304 0 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
60.255.73.42 - - [24/Jul/2022:18:55:12 +0800] "GET /img/web-info/avator.jpg HTTP/1.1" 304 0 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
60.255.73.42 - - [24/Jul/2022:18:55:12 +0800] "GET /css/schedule_style.css HTTP/1.1" 404 555 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
60.255.73.42 - - [24/Jul/2022:18:55:13 +0800] "GET /js/schedule_index.js HTTP/1.1" 404 555 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
184.105.247.195 - - [24/Jul/2022:18:55:37 +0800] "GET / HTTP/1.1" 301 169 "-" "-"
156.96.154.202 - - [24/Jul/2022:18:55:48 +0800] "GET / HTTP/1.1" 301 169 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:48 +0800] "" 400 0 "-" "-"
156.96.154.202 - - [24/Jul/2022:18:55:49 +0800] "GET / HTTP/1.1" 200 47180 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:49 +0800] "GET //wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET //xmlrpc.php?rsd HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET / HTTP/1.1" 200 47180 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET //blog/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET //web/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET //wordpress/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //website/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //wp/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //news/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //2018/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //2019/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //shop/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //wp1/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //test/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //media/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //wp2/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:53 +0800] "GET //site/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:53 +0800] "GET //cms/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:53 +0800] "GET //sito/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
156.96.154.202 - - [24/Jul/2022:18:55:53 +0800] "" 400 0 "-" "-"
111.30.182.61 - - [24/Jul/2022:18:56:25 +0800] "GET / HTTP/1.1" 301 169 "-" "DNSPod-Monitor/2.0"

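Counting visitor IPs

Helper commands

sort -n    sort numerically, smallest first
wc -l      count lines, i.e. the total number of IP entries
uniq       collapse adjacent duplicate lines (so sort the input first)

Looking at access.log, the fields are separated by spaces and the first column is the IP address we need

awk '{print $1}' access.log|sort -n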

Demo

[root@localhost ~]# awk '{print $1}' access.log|sort -n
60.255.73.42
60.255.73.42
60.255.73.42
60.255.73.42
60.255.73.42
60.255.73.42
60.255.73.42
111.30.182.61
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
156.96.154.202
184.105.247.195
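
One caveat about the ordering above: sort -n only looks at the leading number of each line, so for dotted addresses it does not compare octet by octet. If a true per-octet order matters, one option is to split on the dot and sort each octet as its own numeric key. This is a sketch under that assumption; the address 60.9.1.1 is made up for illustration and does not appear in access.log.

```shell
# Sample addresses; 60.9.1.1 is a hypothetical entry added to show the difference
ips=$(printf '156.96.154.202\n60.255.73.42\n60.9.1.1\n111.30.182.61\n')

# -t. splits each line on dots; each -kN,Nn sorts octet N numerically
sorted=$(printf '%s\n' "$ips" | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n)
echo "$sorted"
```

For grouping identical addresses before uniq, either ordering works, since uniq only needs equal lines to be adjacent.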

Deduplicating the IPs

uniq
-c    prefix each line with the number of times it occurred

Run it again

[root@localhost ~]# awk '{print $1}' access.log|sort -n|uniq
60.255.73.42
111.30.182.61
156.96.154.202
184.105.247.195
#######
[root@localhost ~]# awk '{print $1}' access.log|sort -n|uniq -c
7 60.255.73.42
1 111.30.182.61
23 156.96.154.202
1 184.105.247.195
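
To finish the job with wc -l, the deduplicated list can be piped into it to get the number of distinct visitors, and the uniq -c counts can be sorted to find the busiest address. The sketch below assumes a temporary log holding four shortened lines in the same layout as access.log; the trailing referer and user-agent fields are trimmed for brevity.

```shell
# Hypothetical, trimmed excerpt in the access.log layout
log=$(mktemp)
cat > "$log" <<'EOF'
60.255.73.42 - - [24/Jul/2022:18:55:11 +0800] "GET /js/canvas-nest.js HTTP/1.1" 304 0
60.255.73.42 - - [24/Jul/2022:18:55:12 +0800] "GET /img/web-info/avator.jpg HTTP/1.1" 304 0
184.105.247.195 - - [24/Jul/2022:18:55:37 +0800] "GET / HTTP/1.1" 301 169
156.96.154.202 - - [24/Jul/2022:18:55:48 +0800] "" 400 0
EOF

# Number of distinct IPs: sort -u deduplicates, wc -l counts the remaining lines
unique_ips=$(awk '{print $1}' "$log" | sort -u | wc -l | tr -d ' ')
echo "distinct IPs: $unique_ips"

# Busiest IP: count occurrences, sort by count descending, keep the first line
top_ip=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "busiest IP: $top_ip"

rm -f "$log"
```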

Text-processing trio exercises

Prepare the files

REg2.txt

 I am studying REg.
I like grep.

 My phone number is +86 123456789000

#我是注释
aoooz
 aooz
 aoz
az

grep.txt

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd2:x:14:14:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
commen:x:1000:1000::/home/commen:/bin/bash

grep1.txt

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin

grep2.txt

bin:x:1:1:bin:/bin:/sbin/nologin

grep

Match lines beginning with root or sshd

[root@localhost ~]# grep "^(root|sshd)" grep.txt -E
root:x:0:0:root:/root:/bin/bash
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd2:x:14:14:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

Match only the root and sshd lines (sshd1 and sshd2 must not appear); the \> anchor requires the word to end right after the name

[root@localhost ~]# grep "^(root|sshd)\>" grep.txt -E
root:x:0:0:root:/root:/bin/bash
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

Show lines beginning with bin, with line numbers

[root@localhost ~]# grep -n "^bin" grep.txt
2:bin:x:1:1:bin:/bin:/sbin/nologin

Count the lines beginning with sshd

[root@localhost ~]# grep -c "^sshd" grep.txt
3

Show at most two lines beginning with sshd

[root@localhost ~]# grep -m 2 "^sshd" grep.txt
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

Search several files and list the names of the files that contain a match (the -l option prints file names only)

[root@localhost ~]# grep "^bin" grep.txt grep1.txt grep2.txt
grep.txt:bin:x:1:1:bin:/bin:/sbin/nologin
grep1.txt:bin:x:1:1:bin:/bin:/sbin/nologin
##############
[root@localhost ~]# grep -l "^bin" grep.txt grep1.txt grep2.txt
grep.txt
grep1.txt

Exclude lines beginning with root

[root@localhost ~]# grep -v "^root" grep1.txt
bin:x:1:1:bin:/bin:/sbin/nologin

Show lines that do not end with /bin/bash

To keep this short, only the command is given here

grep -v "/bin/bash$" grep.txt -n

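Find lines containing a two-digit or three-digit number

This uses extended regex together with the GNU word anchors

  • [abc] or [a-c], [012] or [0-2]: match any one character from the set

  • a{n,m}: match a at least n and at most m times

  • \>: match at the end of a word

  • \<: match at the start of a word

This exercise doubles as one of the worked examples for the Day 1 regex symbols. The first command also matches longer numbers such as 1000, because any two adjacent digits satisfy it. The second anchors only the end of the number, so 1000 still matches. The third anchors both ends and matches only numbers of exactly two or three digits.

grep "[0-9]{2,3}" grep.txt -E
grep "[0-9]{2,3}\>" grep.txt -E
grep "\<[0-9]{2,3}\>" grep.txt -E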
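
The difference between the three patterns is easier to see on a tiny input. The sketch below assumes four made-up lines and GNU grep (the \< and \> anchors are a GNU extension); it counts how many lines each pattern accepts.

```shell
# Hypothetical sample lines with 1-, 2-, 3- and 4-digit numbers
lines=$(printf 'uid 5\nuid 42\nuid 999\nuid 1000\n')

# No anchors: any run of two or three digits matches, including inside 1000
loose=$(printf '%s\n' "$lines" | grep -cE '[0-9]{2,3}')

# End anchor only: the last three digits of 1000 still match
end_only=$(printf '%s\n' "$lines" | grep -cE '[0-9]{2,3}\>')

# Both anchors: the whole number must be two or three digits long
exact=$(printf '%s\n' "$lines" | grep -E '\<[0-9]{2,3}\>')

echo "loose=$loose end_only=$end_only"
echo "$exact"
```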

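Find lines that begin with one or more whitespace characters followed by a non-whitespace character

How to write a single whitespace character (regex background): [[:space:]] matches any one whitespace character, such as a space or a tab

[[:space:]]

Method one

This only checks that the line starts with whitespace; the trailing .* matches anything, including nothing

[root@localhost ~]# grep "^[[:space:]].*" REg2.txt -n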
1: I am studying REg.
4: My phone number is +86 123456789000
8: aooz
9: aoz

Method two

Extended regex

[^[:space:]]

Negating the whitespace class matches any character that is not whitespace

[root@localhost ~]# grep "^[[:space:]]+[^[:space:]]" REg2.txt -n -E
1: I am studying REg.
4: My phone number is +86 123456789000
8: aooz
9: aoz
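
Both methods give the same result on REg2.txt, but they are not equivalent in general. A line made up of whitespace only passes method one and fails method two. The sketch below shows this on four made-up lines; the sample input is an assumption for illustration.

```shell
# Hypothetical sample: space + text, plain text, spaces only, tab + text
sample=$(printf ' a\nb\n   \n\tc\n')

# Method one: the line merely has to start with a whitespace character
starts_ws=$(printf '%s\n' "$sample" | grep -c '^[[:space:]]')

# Method two: leading whitespace must be followed by a non-whitespace character
ws_then_text=$(printf '%s\n' "$sample" | grep -cE '^[[:space:]]+[^[:space:]]')

echo "method one: $starts_ws lines, method two: $ws_then_text lines"
```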

Find all lines containing i, ignoring case

Method one

The -i option

[root@localhost ~]# grep -i "i" REg2.txt -n
1: I am studying REg.
2:I like grep.
4: My phone number is +86 123456789000

Method two

Extended regex

(i|I)

[root@localhost ~]# grep -E "(i|I)" REg2.txt -n
1: I am studying REg.
2:I like grep.
4: My phone number is +86 123456789000

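Method three

Bracket expression [iI] (inside brackets | is a literal character, not a separator, so it is left out)

[root@localhost ~]# grep "[iI]" REg2.txt -n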
1: I am studying REg.
2:I like grep.
4: My phone number is +86 123456789000
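
A common slip with bracket expressions is to write an alternation bar inside them. Within [...] the bar is just another character in the set, so the pattern also accepts lines containing a literal |. The sketch below uses three made-up lines to show the difference.

```shell
# Hypothetical sample: one line with a pipe, one with "i", one with neither
sample=$(printf 'a|b\nHi\nxyz\n')

# [i|I] is the set {i, |, I}, so the "a|b" line matches too
with_pipe=$(printf '%s\n' "$sample" | grep -c '[i|I]')

# [iI] is the set {i, I}, which is what the exercise asks for
letters_only=$(printf '%s\n' "$sample" | grep -c '[iI]')

echo "[i|I] matched $with_pipe lines, [iI] matched $letters_only"
```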

Find the entries for root, sshd, and nobody

[root@localhost ~]# grep -E "^(root|sshd|nobody)\>" grep.txt
root:x:0:0:root:/root:/bin/bash
nobody:x:99:99:Nobody:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

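List all function names in /etc/init.d/functions

Mind the escapes: with -E, bare parentheses form a group, so literal parentheses must be written as \( and \)

grep -E "[a-zA-Z]+\(\)" /etc/init.d/functions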
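
The command above prints every line that contains a definition, and its letter-only class stops at underscores and digits. If only the names are wanted, one possible variation is to widen the character class and add -o, which prints just the matched text. The sketch below runs against a small stand-in script, since /etc/init.d/functions may not exist on every system; the file contents are an assumption for illustration.

```shell
# Hypothetical stand-in for a shell library file
script=$(mktemp)
cat > "$script" <<'EOF'
checkpid() {
    local i
}
__pids_var_run() {
    echo done
}
EOF

# -o prints only the matched part; sed then strips the trailing "()"
names=$(grep -oE '[A-Za-z_][A-Za-z0-9_]*\(\)' "$script" | sed 's/()$//')
echo "$names"

rm -f "$script"
```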

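Find users whose username is the same as their shell

That is, the username and the name of the login program are identical

For example: sync:x:5:0:sync:/sbin:/bin/sync

More on extended regex

\1 refers back to the text matched by the first group; a group is whatever is enclosed in ()
\2
...

grep -E "^([^:]+\>).*\1$" grep.txt -n
# How the regex works:
# ^([^:]+\>) captures the username: one or more non-colon characters from the start of the line, ending at a word boundary
# .* matches everything in between, and \1$ requires the line to end with the captured username

[root@localhost ~]# grep -E "^([^:]+\>).*\1$" grep.txt -n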
6:sync:x:5:0:sync:/sbin:/bin/sync
7:shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
8:halt:x:7:0:halt:/sbin:/sbin/halt
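
One limitation worth knowing: the pattern only requires the line to end with the username, so a user named sh would also match a shell of /bin/bash. A tighter variant, offered here as a sketch rather than the definitive answer, ties the capture to the first colon and requires a slash right before the back-reference. The sh entry below is made up to trigger the false positive, and back-references with -E rely on GNU grep behaviour.

```shell
# Hypothetical passwd-style sample; the "sh" user is invented for this sketch
users=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'sync:x:5:0:sync:/sbin:/bin/sync' \
  'sh:x:1001:1001::/home/sh:/bin/bash' > "$users"

# Pattern from the exercise: "sh" is accepted because "bash" ends with "sh"
loose_count=$(grep -cE '^([^:]+\>).*\1$' "$users")

# Tighter pattern: the name must be the whole last path component of the shell
strict=$(grep -E '^([^:]+):.*/\1$' "$users")

echo "loose pattern matched $loose_count lines"
echo "$strict"

rm -f "$users"
```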