
Backus–Naur Form (BNF)

2010-11-11 21:50, 197 views
From Wikipedia, the free encyclopedia


In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols.
It is applied wherever exact descriptions of languages are needed, for
instance, in official language specifications, in manuals, and in
textbooks on programming language theory.

Many extensions and variants of the original notation are used; some are exactly defined, including Extended Backus–Naur Form (EBNF) and Augmented Backus–Naur Form (ABNF).

History

The idea of describing the structure of language with rewriting rules can be traced back to at least the work of Pāṇini (around 400 BC), who used it in his description of Sanskrit word structure; hence, some suggest renaming BNF to Panini–Backus Form.[1] American linguists such as Leonard Bloomfield and Zellig Harris took this idea a step further by attempting to formalize language and its study in terms of formal definitions and procedures (around 1920-1960).

Meanwhile, string rewriting rules as formal, abstract systems were introduced and studied by mathematicians such as Axel Thue (in 1914), Emil Post (1920s-1940s) and Alan Turing (1936). Noam Chomsky, teaching linguistics to students of information theory at MIT, combined linguistics and mathematics by taking what is essentially Thue's formalism as the basis for the description of the syntax of natural language; he also introduced a clear distinction between generative rules (those of context-free grammars) and transformation rules (1956).[2][3]

John Backus, a programming language designer at IBM, adopted Chomsky's generative rules[4] to describe the syntax of the new programming language IAL, known today as ALGOL 58 (1959),[5] using the BNF notation.

Further development of ALGOL led to ALGOL 60; in its report (1963), Peter Naur named Backus's notation Backus Normal Form, and simplified it to minimize the character set used. However, Donald Knuth argued that BNF should rather be read as Backus-Naur Form, as it is "not a normal form in any sense",[6] unlike, for instance, Chomsky Normal Form.

Introduction

A BNF specification is a set of derivation rules, written as

 <symbol> ::= __expression__


where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar, '|', indicating a choice; each sequence is a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. Symbols that do appear on a left side are nonterminals and are always enclosed between the pair <>.
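The distinction between terminals and nonterminals can be checked mechanically. The following is a minimal Python sketch, not part of the original notation: it assumes a grammar stored as a dictionary whose keys are left-hand-side symbols and whose values are lists of alternatives. The sample rules and the name `classify` are hypothetical, chosen only for illustration.

```python
# Hypothetical grammar: each key is a left-hand-side symbol, each value is a
# list of alternatives, and each alternative is a sequence of symbols.
GRAMMAR = {
    "<personal-part>": [["<first-name>"], ["<initial>", '"."']],
    "<first-name>": [['"Ann"'], ['"Bob"']],
    "<initial>": [['"A"'], ['"B"']],
}


def classify(grammar):
    """Split the symbols of a grammar into nonterminals and terminals."""
    # Every symbol that appears on a left side is a nonterminal.
    nonterminals = set(grammar)
    # Collect every symbol used on any right side.
    used = {sym for alts in grammar.values() for alt in alts for sym in alt}
    # Symbols that never appear on a left side are terminals.
    terminals = used - nonterminals
    return nonterminals, terminals


nonterminals, terminals = classify(GRAMMAR)
print(sorted(nonterminals))
print(sorted(terminals))
```

In this sample every quoted literal ends up in the terminal set, because none of them appears on a left side.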

Example

As an example, consider this possible BNF for a U.S. postal address:

 <postal-address> ::= <name-part> <street-address> <zip-part>

      <name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL>
                    | <personal-part> <name-part>

  <personal-part> ::= <first-name> | <initial> "."

 <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

       <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

    <opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""


This translates into English as:

A postal address consists of a name-part, followed by a street-address part, followed by a zip-code part.

A name-part consists of either: a personal-part followed by a last name followed by an optional "jr-part" (Jr., Sr., or dynastic number) and end-of-line, or a personal part followed by a name part (this rule illustrates the use of recursion in BNFs, covering the case of people who use multiple first and middle names and/or initials).

A personal-part consists of either a first name or an initial followed by a dot.

A street address consists of a house number, followed by a street name, followed by an optional apartment specifier, followed by an end-of-line.

A zip-part consists of a town-name, followed by a comma, followed by a state code, followed by a ZIP-code followed by an end-of-line.

An opt-jr-part consists of "Sr." or "Jr." or a roman-numeral or an empty string (i.e. nothing).

Note that many things (such as the format of a first-name, apartment
specifier, ZIP-code, and Roman numeral) are left unspecified here. If
necessary, they may be described using additional BNF rules.
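To show how such rules can drive a program, here is a minimal Python sketch of a backtracking recognizer for the postal-address grammar. It is only an illustration under several assumptions: the input is already split into a list of tokens, the empty string "" is written as an empty alternative, and the items the example leaves unspecified (first-name, ZIP-code, and so on) are given simple regular expressions in `LEXICAL`. Those patterns, the `apt-num` helper and all function names are hypothetical.

```python
import re

# Rules from the postal-address example; an empty list stands for "".
GRAMMAR = {
    "postal-address": [["name-part", "street-address", "zip-part"]],
    "name-part": [
        ["personal-part", "last-name", "opt-jr-part", "EOL"],
        ["personal-part", "name-part"],  # recursive alternative
    ],
    "personal-part": [["first-name"], ["initial", "."]],
    "street-address": [["house-num", "street-name", "opt-apt-num", "EOL"]],
    "zip-part": [["town-name", ",", "state-code", "ZIP-code", "EOL"]],
    "opt-jr-part": [["Sr."], ["Jr."], ["roman-numeral"], []],
    "opt-apt-num": [["apt-num"], []],  # assumed definition
}

# Assumed token shapes for the parts the example leaves unspecified.
LEXICAL = {
    "first-name": r"[A-Z][a-z]+",
    "last-name": r"[A-Z][a-z]+",
    "initial": r"[A-Z]",
    "roman-numeral": r"[IVXLCDM]+",
    "house-num": r"\d+",
    "street-name": r"[A-Z][a-z]+",
    "apt-num": r"#\d+",
    "town-name": r"[A-Z][a-z]+",
    "state-code": r"[A-Z]{2}",
    "ZIP-code": r"\d{5}",
    "EOL": r"\n",
}


def match(symbol, tokens, pos):
    """Yield every position where `symbol` can end when it starts at `pos`."""
    if symbol in GRAMMAR:
        for alternative in GRAMMAR[symbol]:
            yield from match_sequence(alternative, tokens, pos)
    elif pos < len(tokens):
        pattern = LEXICAL.get(symbol)
        if pattern is None:
            ok = tokens[pos] == symbol  # literal terminal such as "." or ","
        else:
            ok = re.fullmatch(pattern, tokens[pos]) is not None
        if ok:
            yield pos + 1


def match_sequence(symbols, tokens, pos):
    """Yield every end position for a sequence of symbols starting at `pos`."""
    if not symbols:
        yield pos
        return
    for middle in match(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, middle)


def is_postal_address(tokens):
    """Return True if the whole token list derives from <postal-address>."""
    return any(end == len(tokens) for end in match("postal-address", tokens, 0))


tokens = ["J", ".", "Ann", "Smith", "Jr.", "\n",
          "12", "Main", "#4", "\n",
          "Springfield", ",", "IL", "62704", "\n"]
print(is_postal_address(tokens))
```

The recursive alternative of name-part always consumes a personal-part before recursing, so the search terminates on any finite token list.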

Further examples

BNF's syntax itself may be represented with a BNF like the following:

 <syntax>         ::= <rule> | <rule> <syntax>
 <rule>           ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::="
                      <opt-whitespace> <expression> <line-end>
 <opt-whitespace> ::= " " <opt-whitespace> | ""  <!-- "" is empty string, i.e. no whitespace -->
 <expression>     ::= <list> | <list> "|" <expression>
 <line-end>       ::= <opt-whitespace> <EOL> | <line-end> <line-end>
 <list>           ::= <term> | <term> <opt-whitespace> <list>
 <term>           ::= <literal> | "<" <rule-name> ">"
 <literal>        ::= '"' <text> '"' | "'" <text> "'"  <!-- actually, the original BNF did not use quotes -->

This assumes that no whitespace is necessary for proper interpretation of the rule. <EOL> represents the appropriate line-end specifier (in ASCII, carriage-return and/or line-feed, depending on the operating system). <rule-name> and <text> are to be substituted with a declared rule's name/label or literal text, respectively.

In the U.S. postal address example above, the entire block-quote is a
syntax. Each line or unbroken grouping of lines is a rule; for example,
one rule begins with "<name-part> ::=". The other part of that
rule (aside from a line-end) is an expression, which consists of two
lists separated by a pipe "|". These two lists consist of some terms
(four terms and two terms, respectively). Each term in this particular
rule is a rule-name.
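The decomposition of a rule into a rule name, an expression, lists and terms can be sketched in Python as follows. This is a simplified illustration, not a full implementation of the grammar above: it assumes each rule sits on a single line and contains no comments, and the names `RULE_HEAD`, `TOKEN` and `parse_rule` are hypothetical.

```python
import re

# "<rule-name> ::=" at the start of a rule.
RULE_HEAD = re.compile(r"\s*<([^<>]+)>\s*::=")
# One term (a <rule-name> or a quoted literal) or the "|" separator.
TOKEN = re.compile(r"""\s*(<[^<>]+>|"[^"]*"|'[^']*'|\|)""")


def parse_rule(line):
    """Parse one single-line BNF rule into (rule_name, lists_of_terms)."""
    text = line.rstrip()
    head = RULE_HEAD.match(text)
    if head is None:
        raise ValueError(f"not a BNF rule: {line!r}")
    lists, current, pos = [], [], head.end()
    while pos < len(text):
        token = TOKEN.match(text, pos)
        if token is None:
            raise ValueError(f"unexpected text at column {pos}: {text[pos:]!r}")
        if token.group(1) == "|":
            # An unquoted pipe closes the current list and starts a new one.
            lists.append(current)
            current = []
        else:
            current.append(token.group(1))
        pos = token.end()
    lists.append(current)
    return head.group(1), lists


name, lists = parse_rule(
    "<name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL>"
    " | <personal-part> <name-part>"
)
print(name)
for terms in lists:
    print(len(terms), terms)
```

Because quoted literals are matched as whole tokens, a literal pipe such as the one in the <expression> rule is kept as a term instead of being treated as a separator.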

Variants

There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of regular expression repetition operators such as * and +. The Extended Backus–Naur Form (EBNF) is a common one. In fact, the examples above are not in the pure form invented for the ALGOL 60 report; the original notation, for instance, did not quote literals. The bracket notation "[ ]" was introduced a few years later in IBM's PL/I definition and is now widely recognised. ABNF and RBNF are other extensions commonly used to describe IETF protocols.

Parsing expression grammars build on the BNF and regular expression notations to form an alternative class of formal grammar, which is essentially analytic rather than generative in character.

Many BNF specifications found online today are intended to be human
readable and are non-formal. These often include many of the following
syntax rules and extensions:

Optional items enclosed in square brackets. E.g. [<item-x>]

Items repeating 0 or more times are enclosed in curly brackets or suffixed with an asterisk. E.g. <word> ::= <letter> {<letter>}

Items repeating 1 or more times are followed by a '+'

Terminals may appear in bold and nonterminals in plain text, rather than using italics and angle brackets

Alternative choices in a production are separated by the ‘|’ symbol. E.g., <alternative-A> | <alternative-B>

Where items need to be grouped they are enclosed in simple parentheses
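The optional and repetition extensions listed above can be rewritten into plain BNF by introducing helper rules, in the same way that <opt-jr-part> and <opt-whitespace> are defined earlier. The Python sketch below shows one possible rewriting; the helper-rule naming scheme (`opt-`, `-list`, `-seq`) and the function names are assumptions made for illustration, not part of any standard.

```python
def optional(item):
    """Rewrite [item] as a helper rule deriving item or the empty string."""
    name = f"<opt-{item.strip('<>')}>"
    return name, f'{name} ::= {item} | ""'


def zero_or_more(item):
    """Rewrite {item} (or item*) as a right-recursive helper rule."""
    name = f"<{item.strip('<>')}-list>"
    return name, f'{name} ::= {item} {name} | ""'


def one_or_more(item):
    """Rewrite item+ as a right-recursive helper rule with no empty case."""
    name = f"<{item.strip('<>')}-seq>"
    return name, f"{name} ::= {item} | {item} {name}"


# <word> ::= <letter> {<letter>} becomes two plain BNF rules.
helper_name, helper_rule = zero_or_more("<letter>")
print(f"<word> ::= <letter> {helper_name}")
print(helper_rule)
```

Each function returns the name to use in place of the bracketed item together with the text of the extra rule that defines it.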

http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form