ABNF Format Explained

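1. Introduction

ABNF stands for Augmented Backus-Naur Form. It is used throughout Internet specifications to describe syntax, such as message formats and protocol elements, in a compact textual notation. Specifications written with ABNF include the e-mail standards [RFC733] and its successor [RFC822], and the HTTP/1.1 specification [RFC7230]. Reading these documents therefore requires a working knowledge of ABNF. ABNF itself is specified in RFC5234.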

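2. Rule Definitions

2.1 Rule naming

Rule names in ABNF are case-insensitive. A name starts with a letter, followed by any number of letters, digits, or hyphens.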
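
The naming rule can be checked mechanically. The following is a minimal Python sketch; the helper names is_valid_rulename and same_rule are hypothetical and chosen only for illustration.

```python
import re

# Mirrors the grammar rule: rulename = ALPHA *(ALPHA / DIGIT / "-")
RULENAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_valid_rulename(name: str) -> bool:
    """Return True if `name` is a syntactically valid ABNF rule name."""
    return RULENAME_RE.fullmatch(name) is not None


def same_rule(a: str, b: str) -> bool:
    """Rule names are case-insensitive, so compare them in lower case."""
    return a.lower() == b.lower()


print(is_valid_rulename("c-wsp"))   # True
print(is_valid_rulename("1rule"))   # False: must start with a letter
print(same_rule("CRLF", "crlf"))    # True
```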

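2.2 Rule form

A rule is defined in the following form:

name = elements crlf

Here name is the name of the rule, elements is one or more rule names or terminal values, and crlf is the end-of-line indicator: a carriage return followed by a line feed, commonly written \r\n.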
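
As an illustration of this form, the sketch below splits a single rule line into its name, its defining operator, and its elements. It is a simplification under stated assumptions: the function split_rule is hypothetical, comments are not stripped, and continuation lines are not joined.

```python
import re
from typing import Tuple

# name, optional blanks, "=" or "=/", optional blanks, then the elements.
# "=/" is listed first so that it is not mistaken for a plain "=".
RULE_RE = re.compile(r"([A-Za-z][A-Za-z0-9-]*)[ \t]*(=/|=)[ \t]*(.*)")


def split_rule(line: str) -> Tuple[str, str, str]:
    """Split one rule line into (name, operator, elements)."""
    match = RULE_RE.fullmatch(line.rstrip("\r\n"))  # drop the trailing crlf
    if match is None:
        raise ValueError(f"not a rule definition: {line!r}")
    name, operator, elements = match.groups()
    return name, operator, elements.strip()


print(split_rule("CR = %x0D\r\n"))  # ('CR', '=', '%x0D')
```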

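2.3 Terminal values

A rule ultimately resolves into a string of terminal values, also called characters. In ABNF a character is simply a non-negative integer (for example, a is decimal 97 in ASCII). A terminal value is written as a percent sign, a base indicator, and the number. The following bases are defined:

b     = binary
d     = decimal
x     = hexadecimal

Hence both of the following define a carriage return: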

CR = %d13
CR = %x0D

A period (".") joins several values in the same base into a single string, so CRLF can be written as:

CRLF = %d13.10
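
The numeric notation above can be decoded mechanically. The sketch below turns a terminal value, optionally with period-separated parts, into a list of integers. The function decode_num_val is a hypothetical helper, and value ranges written with a dash are deliberately left out.

```python
from typing import List

# Digits allowed after each base indicator; the length of each string is the base.
DIGITS = {"b": "01", "d": "0123456789", "x": "0123456789ABCDEF"}


def decode_num_val(text: str) -> List[int]:
    """Decode a terminal value such as '%d13.10' into a list of integers."""
    if len(text) < 3 or text[0] != "%":
        raise ValueError(f"not a numeric terminal value: {text!r}")
    allowed = DIGITS.get(text[1].lower())
    if allowed is None:
        raise ValueError(f"unknown base indicator in {text!r}")
    values = []
    for part in text[2:].split("."):
        # Every part needs at least one digit that is valid in the chosen base.
        if not part or any(ch.upper() not in allowed for ch in part):
            raise ValueError(f"bad digits in {text!r}")
        values.append(int(part, len(allowed)))
    return values


print(decode_num_val("%d13"))      # [13]
print(decode_num_val("%x0D"))      # [13]
print(decode_num_val("%d13.10"))   # [13, 10]
```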

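2.4 External encodings

How terminal values are represented externally depends on the storage or transmission environment, so the same ABNF grammar can have several external encodings: the encoded result in a 7-bit US-ASCII environment differs from the result when 16-bit Unicode is used. Encoding details are outside the scope of ABNF; 7-bit US-ASCII is the environment that has been most common on the Internet.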

3. Operators

3.1 Concatenation: Rule1 Rule2

Concatenation means that a rule can be built by listing other rules in sequence. For example:

foo = %x61 ; a
bar = %x62 ; b
mumble = foo bar foo

The rule mumble therefore matches the lowercase string aba.

3.2 Alternatives: Rule1 / Rule2

Elements separated by a forward slash are alternatives, exactly one of which is chosen. For example:

rule = foo / bar

Here rule accepts either foo or bar.

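3.3 Incremental alternatives: Rule1 =/ Rule2

A rule can be defined in fragments: =/ appends further alternatives to a rule that has already been defined. For example: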

ruleset = rule1 / rule2
ruleset =/ rule3
ruleset =/ rule4 / rule5

This makes ruleset equivalent to:

ruleset = rule1 / rule2 / rule3 / rule4 / rule5
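
A grammar loader could handle the two defining operators roughly as sketched below. The dictionary layout and the function define are assumptions made for this example. Rejecting =/ for a rule that has not been defined yet is likewise a choice of this sketch.

```python
from typing import Dict, List


def define(grammar: Dict[str, List[str]], name: str,
           operator: str, alternatives: List[str]) -> None:
    """Record a rule; "=" defines it, "=/" appends more alternatives."""
    key = name.lower()  # rule names are case-insensitive
    if operator == "=":
        grammar[key] = list(alternatives)
    elif operator == "=/":
        if key not in grammar:
            raise KeyError(f"{name} must be defined with '=' first")
        grammar[key].extend(alternatives)
    else:
        raise ValueError(f"unknown operator: {operator!r}")


grammar: Dict[str, List[str]] = {}
define(grammar, "ruleset", "=", ["rule1", "rule2"])
define(grammar, "ruleset", "=/", ["rule3"])
define(grammar, "ruleset", "=/", ["rule4", "rule5"])
print(" / ".join(grammar["ruleset"]))  # rule1 / rule2 / rule3 / rule4 / rule5
```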

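3.4 Value range alternatives: %c##-##

A dash between two numeric values denotes a range of alternative values. For example: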

DIGIT = %x30-39

is equivalent to:

DIGIT = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
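
The equivalence above can be reproduced by expanding a range into its individual characters. This is a minimal sketch: expand_range is a hypothetical helper, and Python's int() is more lenient than the ABNF digit rules, so a strict parser would validate the digits first.

```python
from typing import List

BASES = {"b": 2, "d": 10, "x": 16}


def expand_range(text: str) -> List[str]:
    """Expand a value range such as '%x30-39' into its characters."""
    if text[:1] != "%" or text[1:2].lower() not in BASES:
        raise ValueError(f"not a numeric terminal value: {text!r}")
    base = BASES[text[1].lower()]
    low_text, dash, high_text = text[2:].partition("-")
    if not dash:
        raise ValueError(f"not a value range: {text!r}")
    low, high = int(low_text, base), int(high_text, base)
    if low > high:
        raise ValueError(f"empty range: {text!r}")
    return [chr(value) for value in range(low, high + 1)]


print(" / ".join(f'"{ch}"' for ch in expand_range("%x30-39")))
```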

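3.5 Sequence group: (Rule1 Rule2)

Elements enclosed in parentheses are treated as a single element whose contents are strictly ordered. Grouping makes the intended structure explicit, because concatenation binds more tightly than alternation. Thus: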

elem (foo / bar) blat

matches:

(elem foo blat) or (elem bar blat)

elem foo / bar blat

matches:

(elem foo) or (bar blat)
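
The difference between the two forms can be observed with a few matcher combinators. The sketch below is illustrative only: lit, seq, alt and accepts are hypothetical helpers, and the one-character definitions chosen for elem, foo, bar and blat are assumptions, not part of the original example.

```python
def lit(s):
    """Match the literal string s; return the set of possible end positions."""
    def run(text, i):
        return {i + len(s)} if text.startswith(s, i) else set()
    return run


def seq(*parts):
    """Concatenation: every part must match, one after the other."""
    def run(text, i):
        ends = {i}
        for part in parts:
            ends = {j for e in ends for j in part(text, e)}
        return ends
    return run


def alt(*choices):
    """Alternation: any one of the choices may match."""
    def run(text, i):
        ends = set()
        for choice in choices:
            ends |= choice(text, i)
        return ends
    return run


def accepts(rule, text):
    """True if the rule matches the whole text."""
    return len(text) in rule(text, 0)


# Hypothetical one-character rules, for illustration only.
elem, foo, bar, blat = lit("E"), lit("f"), lit("b"), lit("T")

grouped = seq(elem, alt(foo, bar), blat)          # elem (foo / bar) blat
ungrouped = alt(seq(elem, foo), seq(bar, blat))   # elem foo / bar blat

print(accepts(grouped, "EfT"), accepts(ungrouped, "EfT"))  # True False
print(accepts(grouped, "bT"), accepts(ungrouped, "bT"))    # False True
```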

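3.6 Variable repetition: *Rule

The full form is <a>*<b>element, where <a> and <b> are optional decimal values meaning at least <a> and at most <b> occurrences of the element. The defaults are 0 and infinity.

Therefore:

*<element>      zero or more
1*<element>     at least one
3*3<element>    exactly three
1*2<element>    one or two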

3.7 Specific repetition: nRule

n<element> is equivalent to n*n<element>, that is, exactly n occurrences.
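
Both repetition forms reduce to a pair of bounds. The sketch below parses the prefix written in front of an element into (minimum, maximum), using None for "no upper limit". The function parse_repeat is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Either a plain count "n", or "<a>*<b>" with both numbers optional.
REPEAT_RE = re.compile(r"([0-9]+)|([0-9]*)\*([0-9]*)")


def parse_repeat(prefix: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) for a repetition prefix; None means unbounded."""
    match = REPEAT_RE.fullmatch(prefix)
    if match is None:
        raise ValueError(f"not a repetition prefix: {prefix!r}")
    if match.group(1) is not None:          # specific repetition: n == n*n
        count = int(match.group(1))
        return count, count
    minimum = int(match.group(2)) if match.group(2) else 0
    maximum = int(match.group(3)) if match.group(3) else None
    return minimum, maximum


print(parse_repeat("*"))     # (0, None)
print(parse_repeat("1*"))    # (1, None)
print(parse_repeat("3*3"))   # (3, 3)
print(parse_repeat("3"))     # (3, 3)
```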

3.8 Optional sequence: [Rule]

Square brackets mark a sequence as optional: it may appear once or not at all. Thus [foo bar] is equivalent to *1(foo bar).
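
The stated equivalence between an optional sequence and a repetition bounded by one can be checked with a small matcher sketch. The combinators lit, seq, rep and opt are hypothetical helpers, and the literal definitions of foo and bar are assumptions made for this example.

```python
def lit(s):
    """Match the literal string s; return the set of possible end positions."""
    def run(text, i):
        return {i + len(s)} if text.startswith(s, i) else set()
    return run


def seq(*parts):
    """Concatenation of several matchers."""
    def run(text, i):
        ends = {i}
        for part in parts:
            ends = {j for e in ends for j in part(text, e)}
        return ends
    return run


def rep(part, lo=0, hi=None):
    """Variable repetition <lo>*<hi>part; hi=None means no upper bound."""
    def run(text, i):
        results = {i} if lo == 0 else set()
        frontier, count = {i}, 0
        while frontier and (hi is None or count < hi):
            frontier = {j for e in frontier for j in part(text, e)}
            count += 1
            if count >= lo:
                if frontier <= results:
                    break  # nothing new; this also stops empty-match loops
                results |= frontier
        return results
    return run


def opt(part):
    """Optional sequence [part]: match it once or skip it."""
    def run(text, i):
        return {i} | part(text, i)
    return run


foo_bar = seq(lit("a"), lit("b"))  # assumed: foo = "a", bar = "b"
for text in ["", "ab", "abab", "a"]:
    # [foo bar] and *1(foo bar) produce the same end positions.
    print(repr(text), opt(foo_bar)(text, 0) == rep(foo_bar, 0, 1)(text, 0))
```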

3.9 Comment: ;Comment

A semicolon starts a comment that continues to the end of the line.

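3.10 Operator precedence

The operators have the following precedence, from highest (binding tightest) at the top to lowest (loosest) at the bottom:

Rule name, prose-val, terminal value
Comment
Value range
Repetition
Grouping, optional
Concatenation
Alternative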

4. Defining ABNF in ABNF

rulelist = 1*( rule / (*c-wsp c-nl) )

rule = rulename defined-as elements c-nl
        ; continues if next line starts
        ; with white space

rulename = ALPHA *(ALPHA / DIGIT / "-")

defined-as = *c-wsp ("=" / "=/") *c-wsp
        ; basic rules definition and
        ; incremental alternatives

elements = alternation *c-wsp

c-wsp = WSP / (c-nl WSP)

c-nl = comment / CRLF
        ; comment or newline

comment = ";" *(WSP / VCHAR) CRLF

alternation = concatenation
        *(*c-wsp "/" *c-wsp concatenation)

concatenation = repetition *(1*c-wsp repetition)

repetition = [repeat] element

repeat = 1*DIGIT / (*DIGIT "*" *DIGIT)

element = rulename / group / option /
        char-val / num-val / prose-val

group = "(" *c-wsp alternation *c-wsp ")"

option = "[" *c-wsp alternation *c-wsp "]"

char-val = DQUOTE *(%x20-21 / %x23-7E) DQUOTE
        ; quoted string of SP and VCHAR
        ; without DQUOTE

num-val = "%" (bin-val / dec-val / hex-val)

bin-val = "b" 1*BIT
        [ 1*("." 1*BIT) / ("-" 1*BIT) ]
        ; series of concatenated bit values
        ; or single ONEOF range

dec-val = "d" 1*DIGIT
        [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]

hex-val = "x" 1*HEXDIG
        [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]

prose-val = "<" *(%x20-3D / %x3F-7E) ">"
        ; bracketed string of SP and VCHAR
        ; without angles
        ; prose description, to be used as
        ; last resort

Appendix: Core Rules

ALPHA = %x41-5A / %x61-7A ; A-Z / a-z

BIT = "0" / "1"

CHAR = %x01-7F
        ; any 7-bit US-ASCII character,
        ; excluding NUL

CR = %x0D
        ; carriage return

CRLF = CR LF
        ; Internet standard newline

CTL = %x00-1F / %x7F
        ; controls

DIGIT = %x30-39
        ; 0-9

DQUOTE = %x22
        ; " (Double Quote)

HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

HTAB = %x09
        ; horizontal tab

LF = %x0A
        ; linefeed

LWSP = *(WSP / CRLF WSP)
        ; Use of this linear-white-space rule
        ; permits lines containing only white
        ; space that are no longer legal in
        ; mail headers and have caused
        ; interoperability problems in other
        ; contexts.
        ; Do not use when defining mail
        ; headers and use with caution in
        ; other contexts.

OCTET = %x00-FF
        ; 8 bits of data

SP = %x20

VCHAR = %x21-7E
        ; visible (printing) characters

WSP = SP / HTAB
        ; white space
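
Several of the core rules translate directly into single-character predicates. The sketch below shows one possible mapping. The function names are hypothetical, and each function assumes it is given a string of exactly one character.

```python
# Quoted strings are case-insensitive in ABNF, so HEXDIG also accepts a-f.
HEX_LETTERS = frozenset("ABCDEFabcdef")


def _between(ch: str, low: int, high: int) -> bool:
    """True if the code point of ch lies in the inclusive range low..high."""
    return low <= ord(ch) <= high


def is_alpha(ch: str) -> bool:      # ALPHA = %x41-5A / %x61-7A
    return _between(ch, 0x41, 0x5A) or _between(ch, 0x61, 0x7A)


def is_digit(ch: str) -> bool:      # DIGIT = %x30-39
    return _between(ch, 0x30, 0x39)


def is_hexdig(ch: str) -> bool:     # HEXDIG = DIGIT / "A" / ... / "F"
    return is_digit(ch) or ch in HEX_LETTERS


def is_ctl(ch: str) -> bool:        # CTL = %x00-1F / %x7F
    return _between(ch, 0x00, 0x1F) or ord(ch) == 0x7F


def is_vchar(ch: str) -> bool:      # VCHAR = %x21-7E
    return _between(ch, 0x21, 0x7E)


def is_wsp(ch: str) -> bool:        # WSP = SP / HTAB
    return ch in (" ", "\t")


print(is_alpha("a"), is_hexdig("g"), is_wsp("\t"))  # True False True
```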

RFC5234: https://tools.ietf.org/html/rfc5234