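Regular Expressions and the Three Linux Text-Processing Tools, Day 2 (continued)
The awk command
Patterns: BEGIN, END, and other conditions
- A BEGIN pattern runs its action once, before awk reads any input.
- An END pattern runs its action once, after the last line has been processed.
- Any other condition is up to you. For example, NR==2 selects the second line, and NR<4 selects the lines whose line number is less than 4.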
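As a rough sketch of how such conditions select lines, the snippet below recreates the four-line awk1.txt sample used in this post and applies NR==2 and NR<4. The scratch directory and the variable names are only there for the demo.

```shell
# Work in a scratch directory so the demo does not touch real files.
cd "$(mktemp -d)"
printf '%s\n' 'a1 a2 a3 a4' 'b1 b2 b3 b4' 'c1 c2 c3 c4' 'd1 d2 d3 d4' > awk1.txt

# A pattern with no action prints every line for which the condition is true.
second=$(awk 'NR==2' awk1.txt)
first_three=$(awk 'NR<4' awk1.txt)

echo "$second"        # b1 b2 b3 b4
echo "$first_three"   # lines 1 to 3
```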
BEGIN

    awk 'BEGIN{print "I get printed first!"}{print $0}' awk1.txt

Sample run

    [root@localhost ~]
    I get printed first!
    a1 a2 a3 a4
    b1 b2 b3 b4
    c1 c2 c3 c4
    d1 d2 d3 d4

END

    awk 'BEGIN{print "I am the BEGIN text, printed first!"}{print $0}END{print "I am the END text, printed last!"}' awk2.txt

Sample run

    [root@localhost ~]
    I am the BEGIN text, printed first!
    a:b:c:d
    1:2:3:4
    x:x:x:x
    I am the END text, printed last!

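BEGIN is a natural place for setup and END for a summary. A minimal sketch, assuming the three-line awk2.txt shown in the run above: BEGIN sets the field separator to a colon, the main block collects the first field of every line, and END prints the line count next to the collected fields. The variable name `first` is made up for this example.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'a:b:c:d' '1:2:3:4' 'x:x:x:x' > awk2.txt

# BEGIN runs once before any input, END once after the last line.
# In END, NR still holds the number of lines read.
summary=$(awk 'BEGIN{FS=":"} {first = first $1} END{print NR, first}' awk2.txt)
echo "$summary"   # 3 a1x
```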
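Other conditions
Relational operators in awk patterns

| Operator | Meaning | Example |
| --- | --- | --- |
| < | less than | NR<2 |
| <= | less than or equal to | NR<=3 |
| == | equal to | NR==1 |
| != | not equal to | NR!=1 |
| >= | greater than or equal to | NR>=1 |
| > | greater than | NR>1 |
| ~ | matches the regex | x~/regex/ |
| !~ | does not match the regex | x!~/regex/ |
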
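The value on the left of ~ and !~ is usually a field such as $1, or the whole line $0. A small sketch, assuming the colon-separated awk2.txt sample from earlier in this post:

```shell
cd "$(mktemp -d)"
printf '%s\n' 'a:b:c:d' '1:2:3:4' 'x:x:x:x' > awk2.txt

# Keep lines whose first field is a single digit.
digits=$(awk -F: '$1 ~ /^[0-9]$/' awk2.txt)
# Keep lines whose first field is NOT a single digit.
others=$(awk -F: '$1 !~ /^[0-9]$/' awk2.txt)

echo "$digits"   # 1:2:3:4
echo "$others"   # a:b:c:d and x:x:x:x
```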
!=

    awk 'NR!=1{print NR,$0}' awk1.txt

Sample run

    [root@localhost ~]
    2 b1 b2 b3 b4
    3 c1 c2 c3 c4
    4 d1 d2 d3 d4

Regex in practice
Syntax for using a regex

    grep 'regex' file
    awk '/regex/{action}' file

    [root@localhost ~]
    a:b:c:d
    1:2:3:4
    x:x:x:x
    [root@localhost ~]
    a:b:c:d
    [root@localhost ~]
    a:b:c:d

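The commands that produced this output are not shown. One plausible pair, assuming the pattern was the letter a and the file was awk2.txt, is sketched here; both tools select the same line.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'a:b:c:d' '1:2:3:4' 'x:x:x:x' > awk2.txt

# grep prints matching lines; awk with a bare /regex/ pattern does the same,
# because the default action is to print the current line.
with_grep=$(grep 'a' awk2.txt)
with_awk=$(awk '/a/' awk2.txt)

echo "$with_grep"   # a:b:c:d
echo "$with_awk"    # a:b:c:d
```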
awk in practice: analyzing an nginx log
The log file
access.log

    60.255.73.42 - - [24/Jul/2022:18:55:11 +0800] "GET /js/schedule_index.js HTTP/1.1" 404 555 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
    60.255.73.42 - - [24/Jul/2022:18:55:11 +0800] "GET /js/canvas-nest.js HTTP/1.1" 304 0 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
    60.255.73.42 - - [24/Jul/2022:18:55:11 +0800] "GET /img/post/gawr_gura1.jpg HTTP/1.1" 304 0 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
    60.255.73.42 - - [24/Jul/2022:18:55:12 +0800] "GET /img/post/gawr_gura2.jpg HTTP/1.1" 304 0 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
    60.255.73.42 - - [24/Jul/2022:18:55:12 +0800] "GET /img/web-info/avator.jpg HTTP/1.1" 304 0 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
    60.255.73.42 - - [24/Jul/2022:18:55:12 +0800] "GET /css/schedule_style.css HTTP/1.1" 404 555 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
    60.255.73.42 - - [24/Jul/2022:18:55:13 +0800] "GET /js/schedule_index.js HTTP/1.1" 404 555 "https://lptexas.top/images/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71"
    184.105.247.195 - - [24/Jul/2022:18:55:37 +0800] "GET / HTTP/1.1" 301 169 "-" "-"
    156.96.154.202 - - [24/Jul/2022:18:55:48 +0800] "GET / HTTP/1.1" 301 169 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:48 +0800] "" 400 0 "-" "-"
    156.96.154.202 - - [24/Jul/2022:18:55:49 +0800] "GET / HTTP/1.1" 200 47180 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:49 +0800] "GET //wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET //xmlrpc.php?rsd HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET / HTTP/1.1" 200 47180 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET //blog/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET //web/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:50 +0800] "GET //wordpress/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //website/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //wp/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //news/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //2018/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:51 +0800] "GET //2019/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //shop/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //wp1/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //test/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //media/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:52 +0800] "GET //wp2/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:53 +0800] "GET //site/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:53 +0800] "GET //cms/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:53 +0800] "GET //sito/wp-includes/wlwmanifest.xml HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    156.96.154.202 - - [24/Jul/2022:18:55:53 +0800] "" 400 0 "-" "-"
    111.30.182.61 - - [24/Jul/2022:18:56:25 +0800] "GET / HTTP/1.1" 301 169 "-" "DNSPod-Monitor/2.0"

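Counting visitor IPs
Helper commands used

    sort -n    sort numerically, in ascending order
    wc -l      count lines, which here is the total number of IP entries
    uniq       collapse adjacent duplicate lines, so the input should be sorted first
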
通过观察access.log文件,我们发现文件分隔符为空格,且第一列为我们所需要的ip
1
| awk '{print $1}' 30access.log|sort -n
|
执行演示
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
| [root@localhost ~] 60.255.73.42 60.255.73.42 60.255.73.42 60.255.73.42 60.255.73.42 60.255.73.42 60.255.73.42 111.30.182.61 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 156.96.154.202 184.105.247.195
|
De-duplicating the IPs
Run it again

    [root@localhost ~]
    60.255.73.42
    111.30.182.61
    156.96.154.202
    184.105.247.195

    [root@localhost ~]
    7 60.255.73.42
    1 111.30.182.61
    23 156.96.154.202
    1 184.105.247.195

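The two commands behind this output are not shown. A likely reconstruction, offered only as a sketch: pipe the sorted addresses through uniq to drop duplicates, and through uniq -c to count requests per address. The file sample.log below is a shortened, made-up stand-in for the real access log.

```shell
cd "$(mktemp -d)"
# sample.log is a hypothetical, shortened stand-in for the real log.
cat > sample.log <<'EOF'
60.255.73.42 - - "GET /js/canvas-nest.js HTTP/1.1" 304 0
156.96.154.202 - - "GET / HTTP/1.1" 200 47180
60.255.73.42 - - "GET /css/schedule_style.css HTTP/1.1" 404 555
111.30.182.61 - - "GET / HTTP/1.1" 301 169
156.96.154.202 - - "GET //xmlrpc.php?rsd HTTP/1.1" 404 555
60.255.73.42 - - "GET /js/schedule_index.js HTTP/1.1" 404 555
EOF

# uniq only collapses adjacent duplicates, so sort first.
unique_ips=$(awk '{print $1}' sample.log | sort -n | uniq)
# uniq -c prefixes each address with its number of occurrences.
hits_per_ip=$(awk '{print $1}' sample.log | sort -n | uniq -c)
# wc -l on the de-duplicated list gives the number of distinct visitors.
visitor_count=$(awk '{print $1}' sample.log | sort -n | uniq | wc -l)

echo "$unique_ips"
echo "$hits_per_ip"
echo "$visitor_count"
```

Appending `| sort -rn` to the uniq -c pipeline would list the busiest addresses first.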
Exercises with the three text tools
Preparing the files
REg2.txt (lines 1, 4, 8 and 9 begin with whitespace)

     I am studying REg.
    I like grep.

     My phone number is +86 123456789000


    aoooz
     aooz
     aoz
    az

grep.txt

    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:x:3:4:adm:/var/adm:/sbin/nologin
    lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
    sync:x:5:0:sync:/sbin:/bin/sync
    shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    halt:x:7:0:halt:/sbin:/sbin/halt
    mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
    operator:x:11:0:operator:/root:/sbin/nologin
    games:x:12:100:games:/usr/games:/sbin/nologin
    ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
    nobody:x:99:99:Nobody:/:/sbin/nologin
    systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
    dbus:x:81:81:System message bus:/:/sbin/nologin
    polkitd:x:999:998:User for polkitd:/:/sbin/nologin
    sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    sshd2:x:14:14:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    postfix:x:89:89::/var/spool/postfix:/sbin/nologin
    chrony:x:998:996::/var/lib/chrony:/sbin/nologin
    commen:x:1000:1000::/home/commen:/bin/bash

grep1.txt

    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin

grep2.txt

    bin:x:1:1:bin:/bin:/sbin/nologin

grep
Match lines that start with root or sshd

    [root@localhost ~]
    root:x:0:0:root:/root:/bin/bash
    sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    sshd2:x:14:14:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

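The command is missing from the output above. One way to get this result, sketched on a trimmed copy of grep.txt, is an extended-regex alternation anchored to the start of the line:

```shell
cd "$(mktemp -d)"
# A trimmed copy of grep.txt, enough to show the effect.
cat > grep.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd2:x:14:14:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
EOF

# ^ anchors the alternation to the start of the line, so operator
# (which only contains /root later in the line) is not selected.
matches=$(grep -E '^(root|sshd)' grep.txt)
echo "$matches"
```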
Match only the root and sshd lines (without sshd1 and sshd2)

    [root@localhost ~]
    root:x:0:0:root:/root:/bin/bash
    sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

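Again the command itself is not shown. Two candidate patterns are sketched below on a trimmed copy of grep.txt: one requires the field-separating colon right after the name, the other uses the \> word-end anchor, which is a GNU grep extension.

```shell
cd "$(mktemp -d)"
# A trimmed copy of grep.txt.
cat > grep.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd2:x:14:14:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
EOF

# Requiring the colon right after the name excludes sshd1 and sshd2.
exact=$(grep -E '^(root|sshd):' grep.txt)
# Same idea with the word-end anchor (GNU grep).
exact_gnu=$(grep -E '^(root|sshd)\>' grep.txt)

echo "$exact"
```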
Select the lines that start with bin and show their line numbers

    [root@localhost ~]
    2:bin:x:1:1:bin:/bin:/sbin/nologin

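A minimal sketch of a command that yields this kind of output, assuming the -n option was used to prefix each match with its line number (shown on the first three lines of grep.txt):

```shell
cd "$(mktemp -d)"
# The first three lines of grep.txt.
cat > grep.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
EOF

# -n prints "line-number:" in front of every matching line.
numbered=$(grep -n '^bin' grep.txt)
echo "$numbered"   # 2:bin:x:1:1:bin:/bin:/sbin/nologin
```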
Count the lines that start with sshd
Print at most two lines that start with sshd

    [root@localhost ~]
    sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

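Neither command appears with the output above. A sketch of both tasks, assuming grep -c for the count and grep -m for the limit (-m is available in GNU and BSD grep but is not part of POSIX):

```shell
cd "$(mktemp -d)"
# A trimmed copy of grep.txt.
cat > grep.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd2:x:14:14:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
EOF

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c '^sshd' grep.txt)
# -m 2 stops reading after the second matching line.
first_two=$(grep -m 2 '^sshd' grep.txt)

echo "$count"       # 3
echo "$first_two"   # the sshd and sshd1 lines
```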
Search several files and list the names of the files that contain a match

    [root@localhost ~]
    grep.txt:bin:x:1:1:bin:/bin:/sbin/nologin
    grep1.txt:bin:x:1:1:bin:/bin:/sbin/nologin

    [root@localhost ~]
    grep.txt
    grep1.txt

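The commands are not shown with this output. A plausible pair, sketched on shortened copies of grep.txt and grep1.txt: with several files grep prefixes each match with its file name, and -l reduces the output to the file names alone.

```shell
cd "$(mktemp -d)"
# Shortened copies of the two sample files.
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'bin:x:1:1:bin:/bin:/sbin/nologin' \
              'daemon:x:2:2:daemon:/sbin:/sbin/nologin' > grep.txt
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'bin:x:1:1:bin:/bin:/sbin/nologin' > grep1.txt

# With more than one file, each match is prefixed by "filename:".
with_names=$(grep '^bin' grep.txt grep1.txt)
# -l lists only the names of files that contain at least one match.
names_only=$(grep -l '^bin' grep.txt grep1.txt)

echo "$with_names"
echo "$names_only"
```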
Filter out the lines that start with root

    [root@localhost ~]
    bin:x:1:1:bin:/bin:/sbin/nologin

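The single bin line in this output suggests the search ran against grep1.txt with an inverted match. A sketch under that assumption:

```shell
cd "$(mktemp -d)"
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'bin:x:1:1:bin:/bin:/sbin/nologin' > grep1.txt

# -v inverts the match: print every line that does NOT start with root.
not_root=$(grep -v '^root' grep1.txt)
echo "$not_root"   # bin:x:1:1:bin:/bin:/sbin/nologin
```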
Show the lines that do not end with /bin/bash
To keep things short, only the command is given here.

    grep -vn '/bin/bash$' grep.txt

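Find the lines that contain a two-digit or three-digit number
This uses extended regular expressions:

    [abc] or [a-c], [012] or [0-2]    any one character from the set
    a{n,m}                            a repeated between n and m times
    \>                                end of a word (GNU extension)
    \<                                start of a word (GNU extension)

This exercise also serves as one of the worked examples for the Day 1 regex symbols.

    grep -E '[0-9]{2,3}' grep.txt
    grep -E '[0-9]{2,3}\>' grep.txt
    grep -E '\<[0-9]{2,3}\>' grep.txt

Only the third form restricts the match to whole numbers of two or three digits. The first two also match inside longer numbers such as 1000.
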
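To see how the three patterns differ, here is a small sketch on a made-up file, nums.txt, that holds a 1-, 2-, 3- and 4-digit number. It counts matching lines with -c and relies on the GNU grep word anchors.

```shell
cd "$(mktemp -d)"
# nums.txt is a hypothetical sample: 5, 42, 999 and 1000.
printf '%s\n' 'a:5:x' 'b:42:x' 'c:999:x' 'd:1000:x' > nums.txt

# 1) Any run of 2 or 3 digits: also hits 1000, which contains "100".
loose=$(grep -cE '[0-9]{2,3}' nums.txt)
# 2) The run must end at a word boundary: 1000 still matches through "000".
right=$(grep -cE '[0-9]{2,3}\>' nums.txt)
# 3) Boundaries on both sides: only whole numbers of 2 or 3 digits.
both=$(grep -cE '\<[0-9]{2,3}\>' nums.txt)

echo "$loose $right $both"   # 3 3 2
```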
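Find the lines that begin with one or more whitespace characters followed by a non-whitespace character
How to write a single whitespace character (regex background)
[[:space:]]
Method 1

    [root@localhost ~]
    1: I am studying REg.
    4: My phone number is +86 123456789000
    8: aooz
    9: aoz

Method 2
Extended regex
[^[:space:]]
Negating the whitespace class matches any character that is not whitespace.

    [root@localhost ~]
    1: I am studying REg.
    4: My phone number is +86 123456789000
    8: aooz
    9: aoz
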
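The commands for the two methods are not shown. As a sketch, one basic-regex and one extended-regex spelling of the same idea are given below on a short, made-up sample file; the exact patterns used in the original run are an assumption.

```shell
cd "$(mktemp -d)"
# sample.txt is hypothetical; lines 1 and 3 start with a space.
printf '%s\n' ' I am studying REg.' 'I like grep.' ' aoz' 'az' > sample.txt

# Method 1 (basic regex): one whitespace, then any more, then non-whitespace.
m1=$(grep -n '^[[:space:]][[:space:]]*[^[:space:]]' sample.txt)
# Method 2 (extended regex): + means "one or more" of the preceding class.
m2=$(grep -nE '^[[:space:]]+[^[:space:]]' sample.txt)

echo "$m1"
echo "$m2"
```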
Find all lines that contain the letter i, ignoring case
Method 1
The -i option

    [root@localhost ~]
    1: I am studying REg.
    2:I like grep.
    4: My phone number is +86 123456789000

Method 2
Extended regex
(i|I)

    [root@localhost ~]
    1: I am studying REg.
    2:I like grep.
    4: My phone number is +86 123456789000

Method 3
A bracket expression, []

    [root@localhost ~]
    1: I am studying REg.
    2:I like grep.
    4: My phone number is +86 123456789000

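The three methods can be sketched side by side. The commands below are an assumption about what was run (the originals are not shown), applied to a short made-up sample; all three select the same lines.

```shell
cd "$(mktemp -d)"
# sample.txt is hypothetical; only the first two lines contain i or I.
printf '%s\n' ' I am studying REg.' 'I like grep.' 'aoz' > sample.txt

by_option=$(grep -in 'i' sample.txt)        # Method 1: -i ignores case
by_alternation=$(grep -nE '(i|I)' sample.txt)   # Method 2: ERE alternation
by_bracket=$(grep -n '[iI]' sample.txt)     # Method 3: bracket expression

echo "$by_option"
```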
Find the entries for root, sshd, and nobody

    [root@localhost ~]
    root:x:0:0:root:/root:/bin/bash
    nobody:x:99:99:Nobody:/:/sbin/nologin
    sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

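One command that would give this result, sketched on a trimmed copy of grep.txt (the original command is not shown): a three-way alternation anchored at the line start and closed by the colon that ends the username field.

```shell
cd "$(mktemp -d)"
# A trimmed copy of grep.txt.
cat > grep.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sshd1:x:4:4:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
EOF

# The trailing colon makes each name match the whole first field only.
accounts=$(grep -E '^(root|sshd|nobody):' grep.txt)
echo "$accounts"
```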
Find all function names in /etc/init.d/functions
Note the escaping: \( and \) match literal parentheses.

    grep -E "[a-zA-Z]+\(\)" /etc/init.d/functions

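The command above prints whole lines. To print just the names, a variation along these lines might help. It is a sketch on a made-up file, functions.sample, since /etc/init.d/functions may not exist on every system; the wider character class is an addition so that names with underscores or digits are captured in full.

```shell
cd "$(mktemp -d)"
# functions.sample is a hypothetical stand-in for /etc/init.d/functions.
cat > functions.sample <<'EOF'
# helper library
checkpid() {
    return 0
}
__pids_var_run() {
    local base=$1
}
echo "done"
EOF

# -o prints only the matched text; tr then strips the parentheses.
names=$(grep -oE '^[A-Za-z_][A-Za-z0-9_]*\(\)' functions.sample | tr -d '()')
echo "$names"
```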
Find the users whose username and shell are the same
That is, the username is identical to the name of the login shell.
For example: sync:x:5:0:sync:/sbin:/bin/sync
More on extended regex

    \1    refers back to the text matched by the first group; a group is whatever is enclosed in ()
    \2    ...

    grep -nE '^([^:]+)\>.*\1$' grep.txt

    [root@localhost ~]
    6:sync:x:5:0:sync:/sbin:/bin/sync
    7:shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    8:halt:x:7:0:halt:/sbin:/sbin/halt

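A back-reference only requires that the line end with the captured text, so a short username can match the tail of a longer shell name. The sketch below compares that loose form with a stricter variant on a made-up users.txt; the user sh is hypothetical, and back-references under grep -E are a GNU extension.

```shell
cd "$(mktemp -d)"
# users.txt is a hypothetical sample: sync really matches,
# while sh only matches the last two letters of bash.
cat > users.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
sh:x:1001:1001::/home/sh:/bin/bash
EOF

# Loose form: \1 may match the tail of a longer word.
loose=$(grep -E '^([^:]+)\>.*\1$' users.txt)
# Stricter form: the name must be the whole first field and the
# whole last path component.
strict=$(grep -E '^([^:]+):.*/\1$' users.txt)

echo "$loose"    # sync and sh
echo "$strict"   # sync only
```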