Backus–Naur Form
2011-08-24 18:31
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar, '|', indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. Symbols that do appear on a left side are nonterminals and are always enclosed between the pair <>.
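The distinction between terminals and nonterminals can be sketched in Python. This is only an illustration: the dictionary layout and the two rules <digit> and <number> are assumptions made for the example and are not part of BNF itself.

```python
# A grammar as a mapping: left-side symbol -> list of alternative sequences.
# The rules below are hypothetical, chosen only for illustration.
GRAMMAR = {
    "<digit>": [["0"], ["1"]],
    "<number>": [["<digit>"], ["<digit>", "<number>"]],
}

def classify(grammar):
    """Split the symbols of a grammar into (nonterminals, terminals)."""
    # Every symbol that appears on a left side is a nonterminal.
    nonterminals = set(grammar)
    # Collect every symbol used on a right side.
    used = {sym for alts in grammar.values() for seq in alts for sym in seq}
    # Symbols that never appear on a left side are terminals.
    terminals = used - nonterminals
    return nonterminals, terminals

nonterminals, terminals = classify(GRAMMAR)
print(sorted(nonterminals))  # ['<digit>', '<number>']
print(sorted(terminals))     # ['0', '1']
```

Each inner list is one sequence, and the outer list plays the role of the '|' choice.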
Example
As an example, consider this possible BNF for a U.S. postal address:
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part>      ::= <personal-part> <last-name> <opt-jr-part> <EOL>
                   | <personal-part> <name-part>
<personal-part>  ::= <first-name> | <initial> "."
<street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>
<zip-part>       ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-jr-part>    ::= "Sr." | "Jr." | <roman-numeral> | ""
This translates into English as:
A postal address consists of a name-part, followed by a street-address part, followed by a zip-part.
A name-part consists of either: a personal-part followed by a last name followed by an optional suffix (Jr., Sr., or dynastic number) and end-of-line, or a personal-part followed by a name-part (this rule illustrates the use of recursion in BNFs, covering the case of people who use multiple first and middle names and/or initials).
A personal-part consists of either a first name or an initial followed by a dot.
A street address consists of a house number, followed by a street name, followed by an optional apartment specifier, followed by an end-of-line.
A zip-part consists of a town-name, followed by a comma, followed by a state code, followed by a ZIP-code, followed by an end-of-line.
An opt-jr-part consists of a suffix, such as "Sr.", "Jr." or a roman-numeral, or an empty string (i.e. nothing).
Note that many things (such as the format of a first-name, apartment specifier, ZIP-code, and Roman numeral) are left unspecified here. If necessary, they may be described using additional BNF rules.
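To make the recursion in <name-part> concrete, here is a minimal backtracking recognizer for this grammar, written in Python. It is a sketch under several assumptions that are not in the grammar above: the input is already split into tokens, the unspecified symbols are filled in with hypothetical regular expressions (the LEAVES table), and the optional apartment specifier is modelled with an extra hypothetical <apt-num> symbol.

```python
import re

# The rules from the example; an empty list stands for the empty string "".
GRAMMAR = {
    "<postal-address>": [["<name-part>", "<street-address>", "<zip-part>"]],
    "<name-part>": [
        ["<personal-part>", "<last-name>", "<opt-jr-part>", "<EOL>"],
        ["<personal-part>", "<name-part>"],
    ],
    "<personal-part>": [["<first-name>"], ["<initial>", "."]],
    "<street-address>": [["<house-num>", "<street-name>", "<opt-apt-num>", "<EOL>"]],
    "<zip-part>": [["<town-name>", ",", "<state-code>", "<ZIP-code>", "<EOL>"]],
    "<opt-jr-part>": [["Sr."], ["Jr."], ["<roman-numeral>"], []],
    # Assumption: the example leaves <opt-apt-num> unspecified.
    "<opt-apt-num>": [["<apt-num>"], []],
}

# Assumption: hypothetical single-token patterns for the unspecified symbols.
LEAVES = {
    "<first-name>": r"[A-Z][a-z]+",
    "<last-name>": r"[A-Z][a-z]+",
    "<initial>": r"[A-Z]",
    "<house-num>": r"\d+",
    "<street-name>": r"[A-Z][A-Za-z ]+",
    "<apt-num>": r"Apt \d+",
    "<town-name>": r"[A-Z][A-Za-z ]+",
    "<state-code>": r"[A-Z]{2}",
    "<ZIP-code>": r"\d{5}",
    "<roman-numeral>": r"[IVX]+",
    "<EOL>": r"\n",
}

def match(symbol, tokens, pos):
    """Yield every position at which `symbol` can end when started at `pos`."""
    if symbol in GRAMMAR:
        # Nonterminal: try each alternative in turn.
        for sequence in GRAMMAR[symbol]:
            yield from match_sequence(sequence, tokens, pos)
    elif pos < len(tokens):
        if symbol in LEAVES:
            if re.fullmatch(LEAVES[symbol], tokens[pos]):
                yield pos + 1
        elif tokens[pos] == symbol:
            # Literal terminal such as "," or "Jr."
            yield pos + 1

def match_sequence(sequence, tokens, pos):
    """Yield every end position for a sequence of symbols started at `pos`."""
    if not sequence:
        yield pos
        return
    for middle in match(sequence[0], tokens, pos):
        yield from match_sequence(sequence[1:], tokens, middle)

def is_postal_address(tokens):
    """True if the whole token list derives from <postal-address>."""
    return any(end == len(tokens) for end in match("<postal-address>", tokens, 0))

tokens = ["John", "Q", ".", "Public", "Jr.", "\n",
          "12", "Main Street", "\n",
          "Springfield", ",", "IL", "62704", "\n"]
print(is_postal_address(tokens))  # True
```

The name "John Q. Public Jr." is accepted because the second alternative of <name-part> consumes "John" and then recurses on the remaining tokens. The approach works here because no rule is left-recursive, so every recursive call starts after at least one consumed token.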
Further examples
BNF's syntax itself may be represented with a BNF like the following:
<syntax>         ::= <rule> | <rule> <syntax>
<rule>           ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""  <!-- "" is empty string, i.e. no whitespace -->
<expression>     ::= <list> | <list> "|" <expression>
<line-end>       ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list>           ::= <term> | <term> <opt-whitespace> <list>
<term>           ::= <literal> | "<" <rule-name> ">"
<literal>        ::= '"' <text> '"' | "'" <text> "'"  <!-- actually, the original BNF did not use quotes -->
This assumes that no whitespace is necessary for proper interpretation of the rule. <EOL> represents the appropriate line-end specifier (in ASCII, carriage-return and/or line-feed, depending on the operating system). <rule-name> and <text> are to be substituted with a declared rule's name/label or literal text, respectively.
In the U.S. postal address example above, the entire block-quote is a syntax. Each line or unbroken grouping of lines is a rule; for example, one rule begins with "<name-part> ::=". The other part of that rule (aside from a line-end) is an expression, which consists of two lists separated by a pipe "|". These two lists consist of some terms (four terms and two terms, respectively). Each term in this particular rule is a rule-name.
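The rule / expression / list / term structure described above maps directly onto a small parser. The Python sketch below is an illustration only: the function names parse_rule and parse_syntax are hypothetical, it assumes each rule sits on a single line, and it drops <!-- ... --> comments rather than interpreting them.

```python
import re

# One token is a <rule-name>, a quoted literal, the "::=" sign, or a "|".
TOKEN = re.compile(r"""<[^<>]+>|"[^"]*"|'[^']*'|::=|\|""")
COMMENT = re.compile(r"<!--.*?-->")

def parse_rule(line):
    """Parse one BNF rule into (rule_name, lists), where lists is a list of term lists."""
    tokens = TOKEN.findall(COMMENT.sub("", line))
    if len(tokens) < 2 or tokens[1] != "::=":
        raise ValueError(f"not a BNF rule: {line!r}")
    name, body = tokens[0], tokens[2:]
    lists, current = [], []
    for token in body:
        if token == "|":
            # A pipe closes the current list and starts the next one.
            lists.append(current)
            current = []
        else:
            current.append(token)
    lists.append(current)
    return name, lists

def parse_syntax(text):
    """Parse a whole syntax (one rule per line) into a dict of rules."""
    return dict(parse_rule(line) for line in text.splitlines() if line.strip())

name, lists = parse_rule('<personal-part> ::= <first-name> | <initial> "."')
print(name)   # <personal-part>
print(lists)  # [['<first-name>'], ['<initial>', '"."']]
```

Literals keep their quotes in the result, which keeps them distinguishable from rule names.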
Variants
There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of regular expression repetition operators such as * and +. The Extended Backus–Naur Form (EBNF) is a common one. In fact the example above is not the pure form invented for the ALGOL 60 report. The bracket notation "[ ]" was introduced a few years later in IBM's PL/I definition but is now widely recognised.
ABNF and RBNF are other extensions commonly used to describe Internet Engineering Task Force (IETF) protocols.
Parsing expression grammars build on the BNF and regular expression notations to form an alternative class of formal grammar, which is essentially analytic rather than generative in character.
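One visible consequence of the analytic character is that choice in a parsing expression grammar is ordered: the first alternative that succeeds is kept and later ones are never tried. The Python sketch below illustrates this with three hypothetical combinators (literal, sequence, ordered_choice); it is not taken from any particular PEG library.

```python
def literal(text):
    """Parser that matches `text` exactly at the current position."""
    def parse(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return parse

def sequence(*parsers):
    """Parser that applies each parser in turn; fails if any of them fails."""
    def parse(s, pos):
        for parser in parsers:
            pos = parser(s, pos)
            if pos is None:
                return None
        return pos
    return parse

def ordered_choice(*parsers):
    """PEG choice: commit to the first alternative that succeeds."""
    def parse(s, pos):
        for parser in parsers:
            end = parser(s, pos)
            if end is not None:
                return end
        return None
    return parse

# Read as BNF, ("a" | "ab") "c" generates both "ac" and "abc".
# Read as a PEG, the choice commits to "a", so "abc" is rejected.
a_first = sequence(ordered_choice(literal("a"), literal("ab")), literal("c"))
print(a_first("abc", 0))   # None

# Listing the longer alternative first makes the same input parse.
ab_first = sequence(ordered_choice(literal("ab"), literal("a")), literal("c"))
print(ab_first("abc", 0))  # 3
```

In a generative reading the order of alternatives does not matter; here it changes which strings are accepted.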
Many BNF specifications found online today are intended to be human readable and are non-formal. These often include many of the following syntax rules and extensions:
Optional items are enclosed in square brackets, e.g. [<item-x>].
Items repeating 0 or more times are enclosed in curly brackets or suffixed with an asterisk, e.g. <word> ::= <letter> {<letter>}.
Items repeating 1 or more times are followed by a '+'.
Terminals may appear in bold and nonterminals in plain text, rather than using italics and angle brackets.
Alternative choices in a production are separated by the '|' symbol, e.g. <alternative-A> | <alternative-B>.
Where items need to be grouped, they are enclosed in simple parentheses.
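The optional and repetition extensions add convenience rather than expressive power: each can be rewritten as an extra rule in plain BNF using choice, recursion and the empty string. The Python sketch below shows one such rewriting. The function names and the naming scheme for the generated rules (<opt-x>, <x-list>, <x-list1>) are assumptions made for illustration.

```python
def desugar(kind, item):
    """Return (new_symbol, rules) expressing one extension in plain BNF."""
    base = item.strip("<>")
    if kind == "optional":        # [item]
        name = f"<opt-{base}>"
        alternatives = [[item], []]
    elif kind == "zero-or-more":  # {item} or item*
        name = f"<{base}-list>"
        alternatives = [[item, name], []]
    elif kind == "one-or-more":   # item+
        name = f"<{base}-list1>"
        alternatives = [[item], [item, name]]
    else:
        raise ValueError(f"unknown extension: {kind}")
    return name, {name: alternatives}

def to_bnf(rules):
    """Render rules as BNF text; an empty sequence is written as ""."""
    lines = []
    for name, alternatives in rules.items():
        body = " | ".join(" ".join(seq) if seq else '""' for seq in alternatives)
        lines.append(f"{name} ::= {body}")
    return "\n".join(lines)

# <word> ::= <letter> {<letter>} becomes <word> ::= <letter> <letter-list>
name, rules = desugar("zero-or-more", "<letter>")
print(to_bnf(rules))  # <letter-list> ::= <letter> <letter-list> | ""
```

Grouping with parentheses can be handled the same way, by giving the parenthesised group its own rule name.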