《Engineering A Compiler》 1. An Overview of Compilation【好书翻译】

An Overview of Compilation 

【Syntax 】
In a compiler, this task is called syntax analysis. To perform syntax analysis efficiently, the compiler needs:
1. a formal definition of the source language,
2. an efficient membership test for the source language, and
3. a plan for how to handle illegal inputs.

   Mathematically, the source language is a set, usually infinite, of strings defined by some finite set of rules. The compiler’s front end must determine whether the source program presented for compilation is, in fact, an element in that set of valid strings.

   To keep the set of rules that define a language small, the rules typically refer to words by their syntactic categories, or parts of-speech, rather than individual words. In describing English, for example, this abstraction allows us to state that many sentences have the form:
                       sentence → subject verb object period 
   rather than trying to enumerate the set of all sentences. For example, we use a syntactic variable, verb, to represent all possible verbs. With English, the reader generally recognizes many thousand words and knows the possible partsof-speech that each can fulfill. For an unfamiliar string, the reader consults a dictionary. Thus, the syntactic structure of the language is based on a set of rules, or a grammar, and a system for grouping characters together to form words and for classifying those words into their syntactic categories.

To understand whether the sentence “Compilers are engineered objects.” is, in fact, a valid English sentence, we first establish that each word is valid. Next, each word is replaced by its syntactic category to create a somewhat more abstract representation of the sentence–noun verb adjective noun period Finally, we try to fit this sequence of abstracted words into the rules for an English sentence. A working knowledge of English grammar might include the following rules:
1 sentence → subject verb object
2 subject → noun
3 subject → modifier noun
4 object → noun
5 object → modifier noun
6 modifier → adjective
7 modifier → adjectival phrase
. . .

Here, the symbol → reads “derives” and means that an instance of the right hand side can be abstracted to the left hand side. By inspection, we can discover the following derivation for our example sentence.

Rule   Prototype Sentence
-----------------------------
-        sentence
1       subject verb object period
2       noun verb object period
5       noun verb modifier noun period
6       noun verb adjective noun period

At this point, the prototype sentence generated by the derivation matches the abstract representation of our input sentence. Because they match, at this level of abstraction, we can conclude that the input sentence is a member of the language described by the grammar. This process of discovering a valid derivation for some stream of tokens is called parsing.

                   编译理论概览

【句法】
在编译器中,这项任务称作“句法分析”,为了有效地分析句法,编译器需要如下几条:
1. 源语言的形式化定义
2. 有效测试源语言的成员性
3. 对于如何处理非法输入的计划

   从数学角度来说,源语言通常是有穷规则集定义的字符串的无穷集。编译器的前端必须决定待编译的源程序事实上是前述有效字符串集合的一个元素。

   为了使得定义语言的规则的集合小,规则通常指代词的句法属类,或是词性,而不是单个的词。比如,在描述英语时,这种抽象使得我们可以声明很多有同样的形式句子:
                                       句子  —> 主语  动词   宾语 句号
     而不是试图枚举集合中的所有句子。 比如,我们用一个句法变量"verb"代表所有可能的动词。对于英语,读者通常认得上千单词,知道每个需要填的词性。 对于不熟悉的字符串,读者求助词典。因此,语言的句法结构是基于规则集合或语法,以及基于把字符组合一起形成单词并把这些单词分成句法属类。

   为了理解句子“编译器是构造出来的客体”是一句有效的句子,问首先确立每个单词是有效的。接着,每个词由它的句法属类代替,以此生成稍微更抽象的句子表达——名词 动词 形容词 名词 句号。最后,我们试图使这个由抽象词组成的句子符合规则集描述的英语句子。有用的英语语法可能包括如下规则:
1. 句子 —》 主语 动词 宾语
2. 主语 —》 名词
3. 主语 —》 修饰符 名词
4. 宾语 —》 名词
5. 宾语 —》 修饰符 名词
6. 修饰符 —》 形容词
7. 修饰符 —》 形容词短语
。。。

这里符号—》读作“派生”,意指右边的实例可以被抽象成左边的东东。通过检查,我们能够发现我们的示例句子可以这样派生出来:

规则   句子原型
------------------------------
-         句子
1         主语 动词 宾语 句号
2         名词 动词 宾语 句号
5         名词 动词 修饰符 名词 句号
6         名词 动词 形容词 名词 句号

到此为止,通过派生得出的句子原型与我们的输入句子的抽象表达完全匹配。因为它们在抽象层次匹配,我们可以得出结论:输入的句子是由语法描述的语言的一个组成成员。对于符号流发现一个有效派生的过程成为解析。


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

读书笔记:
1. 编译器需要知道源语言的形式化定义,即由有穷规则描述的形式化的语法,这样才能在语法空间里派生出一个句子原型,然后把该原型与输入句子的抽象描述进行匹配。如果完全匹配,则说明输入的句子确实是目标语言的一个句子,也即语言空间里的一个元素。
2. 对于计算机编程语言来说,把输入的指令句子,如 int x=3; 分割成符号流 int x = 3 ; 然后抽象成语法变量描述:keyword id operator number ;  然后再在语言的文法空间里查找是否有可以派生出一条句子原型与这个抽象描述匹配。

3. at this point  在这一点上
   Could I put in a word at this point? 我现在说几句行吗?
   At this point, your logic is at fault. 在这一点上你的推理是错误的。
4. right hand side, left hand side 右边、左边
5. fit into  vt. 适应、符合

posted on 2009-08-13 23:52 挑灯看剑 阅读(405) 评论(0)  编辑 收藏 引用 所属分类: 工作随笔

只有注册用户登录后才能发表评论。
<2009年7月>
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678

导航

公告

【自我介绍】 08年南开大学硕士毕业 最近关注:算法、Linux、c++、高并发 爱好:滑旱冰、打乒乓球、台球、保龄球

常用链接

随笔分类(139)

文章分类

我常去的网站

技术博客(都是大牛)

技术站点

搜索

积分与排名