The Flow Caml language

Chapter 3 The Flow Caml language

This chapter is a brief description of the Flow Caml language. It lists the language constructs, and gives their syntax and semantics. However, this description is only informal, without the attempt to provide a mathematical formalization of the language. For a more detailed description of the core language and its type system, the reader is refereed to [PS03]. Moreover, many aspects of the Flow Caml language are directly derived from Objective Caml, so they are described only succinctly, the reader is refereed to the Objective Caml's documentation [LDG⁺02a] for further details.

Notations

The syntax of the language is given in BNF-like notation. Terminal symbols are set in typewriter font (like this). Non-terminal symbols are set in italic font (like that). Square brackets [...] denote optional components. Curly brackets <...> denotes zero, one or several repetitions of the enclosed components. Curly bracket with a trailing plus sign <...>+ denote one or several repetitions of the enclosed components. Parentheses (...) denote grouping.

3.1 Lexical conventions

Flow Caml adopts the same lexical conventions as Objective Caml.

ident	::=	(letter \| `_`) <letter \| `0`...`9` \| `_` \| `'`>

letter	::=	`A`...`Z` \| `a`...`z`
infix-symbol	::=	(`=` \| `<` \| `>` \| `@` \| `^` \| `\|` \| `&` \| `+` \| `-` \| `` \| `/` \| `$` \| `%`) <operator-char*>
prefix-symbol	::=	(`!` \| `?` \| `~`) <operator-char>
operator-char	::=	`!` \| `$` \| `%` \| `&` \| `+` \| `-` \| `.` \| `/` \| `:` `<` \| `=` \| `>` \| `?` \| `@` \| `^` \| `\|` \| `~`

Flow Caml integer, float and character literals are identical to those of Objective Caml. There are two flavors of string literals in Flow Caml: mutable ones (charray-literal), introduced between backquotes (`) and immutable ones (string-literal), between double quotes ("). It is worth noting that escape sequences are the same for both. In particular, \` is not a valid escape sequence, even between `, so one has to write \096.

integer-literal	::=	[`-`] <`0`...`9`>+
	\|	[`-`] (`0x` \| `0X`) <`0`...`9` \| `A`...`F` \| `a`...`f`>+
	\|	[`-`] (`0o` \| `0O`) <`0`...`7`>+
	\|	[`-`] (`0b` \| `0B`) <`0`...`1`>+
float-literal	::=	[`-`] <`0`...`9`>+ [`.` <`0`...`9`>] [(`e` \| `E`) [`+` \| `-`] <`0`...`9`>+]
char-literal	::=	`'` regular-char `'`
	\|	`'` escape-sequence `'`
escape-sequence	::=	`\` (`\` \| `"` \| `'` \| `n` \| `t` \| `b` \| `t`)
	\|	`\` (`0`...`9`) (`0`...`9`) (`0`...`9`)
string-literal	::=	`"` <string-character> `"`
charray-literal	::=	` <charray-character> `
string-character	::=	regular-char-string
	\|	escape-sequence
charray-character	::=	regular-char-charray
	\|	escape-sequence

The identifiers below are reserved as keywords, and cannot be employed otherwise. The first groups include all Objective Caml keywords while the second group lists Flow Caml additional ones.

    and         as          assert      asr         begin       class  
    closed      constraint  do          done        downto      else   
    end         exception   external    false       for         fun    
    function    functor     if          in          include     inherit
    land        lazy        let         lor         lsl         lsr    
    lxor        match       method      mod         module      mutable
    new         of          open        or          parser      private
    rec         sig         struct      then        to          true   
    try         type        val         virtual     when        while  
    with                                                               

    affects     content     finally     flow        greater     less   
    level       noneq       propagate   raise       raises      row    
    than

The following character sequences are also keywords:

    #     &     '     (     )     *     ,     ->    ? 
    ??    .     ..    .(    .[    :     ::    :=    ; 
    ;;    <-    =     [     [|    [<    {<    ]     |]
    >]    >}    _     `     {     |     }     ~       

    -{    }->   ={    }=>

3.2 The core language

3.2.1 Values

Values of Flow Caml are those of Objective Caml: integer numbers, floating-point numbers, characters, character strings, tuples, arrays, variant values and functions. (Polymorphic variants and objects are not supported.)

3.2.2 Names

Naming objects

Identifiers are used to give names to several classes of language objects and refer to these objects by name later:

· Value names:	value-name	::=	lowercase-ident \| `(` operator-name `)`
	operator-name	::=	prefix-symbol \| infix-symbol
		\|	`*` \| `=` \| `or` \| `&` \| `:=`

· Value constructors:	constr-name	::=	capitalized-ident \| `false` \| `true` \| `[]` \| `()` \| `::`

· Record fields:	field-name	::=	lowercase-ident

· Type constructors:	typeconstr-name	::=	lowercase-ident

· Level names:	level-name	::=	uppercase-ident

· Principals:	principal	::=	`!` lowercase-ident

· Exception names:	exception-name	::=	capitalized-ident

· Module name:	module-name	::=	capitalized-ident

· Module type names:	modtype-name	::=	ident

These nine name-spaces are distinguished both by the context and by the capitalization of the identifier. In comparison with Objective Caml, we have three additional classes: level names and principals (see section 3.2.3), and exception names. Exception names are syntactically distinguished from value constructors because exceptions are not first class values in Flow Caml.

Referring to named objects

A named object can be referred to either by its name (following the usual static scoping rules for names) or by an access path prefix.name, where prefix designates a module and name is the name of an object defined in that module. The first component of the path, prefix is either a simple module name or an access path name₁.name₂..., in case the defining module is itself nested inside other modules. For referring to type constructors, levels, exceptions (in type expressions) or module types, the prefix can also contain simple functor applications (as in the syntactic class ext-module-path), in case the defining module is the result of a functor application.

value-path	::=	value-name \| module-path `.` lowercase-ident
constr	::=	constr-name \| module-path `.` capitalized-ident
typeconstr	::=	typeconstr-name \| extended-module-path `.` lowercase-ident
level	::=	level-name \| extended-module-path `.` capitalized-ident
exception	::=	exception-name \| extended-module-path `.` capitalized-ident
field	::=	field-name \| module-path `.` lowercase-ident
module-path	::=	module-name \| module-path `.` capitalized-ident
ext-module-path	::=	module-name
	\|	ext-module-path `.` capitalized-ident
	\|	ext-module-path `(` ext-module-path `)`
modtype-path	::=	modtype-name \| module-path `.` ident

Principal names need not be qualified because they are global labels.

3.2.3 Security levels

Flow Caml types are annotated by security levels, which are supposed to belong to a lattice (the corresponding partial order is denoted by <, although it is not strict). The lattice must include principals (principal) and levels introduced by level declarations (level), which form ``(constant) security levels'':

security-level

::=

principal | level

However, in order to preserve the lattice structure, it may also contains other security levels, which are not representable in Flow Caml syntax: they never will appear in type expressions; however, assuming their existence is necessary to interpret type schemes. Principals are intended to represent external entities or channels which programs may interact with.

The partial order on security levels is the smallest order which satisfies:

The inequalities between principals provided in flow declarations (see section 3.3.3),
and the assumptions made in level definitions (see section 3.2.4).

Information flow is allowed from a source labeled by the security level security-level₁ to a sink labeled by security-level₂ if and only if security-level₁ < security-level₂ holds.

3.2.4 Level definitions

Level names can be bound to security levels of the lattice thanks to level definitions.

level-definition	::=	`level` level-name level-repr
level-repr	::=	[`greater` `than` security-level-list] [`less` `than` security-level-list]
	\|	`=` security-level
security-level-list	::=	security-level <`,` security-level>

A level definition is introduced by the level keyword. It consists in a capitalized identifier followed by two optional sets of assumptions or bounds. The identifier is the name of the level being defined. The assumptions relate this new level with existing ones:

If a clause greater than security-level₁ , ... , security-level_n is provided, then the new level is made greater than or equal to security-level_i, for all i.
If a clause less than security-level₁ , ... , security-level_n is provided, then the new level is made less than or equal to security-level_i, for all i.

The representation = security-level is a shorthand for greater than security-level less than security-level. The definition cannot introduce new relationships about levels listed in these assumptions: every level which appears in the former must be known to be less than or equal to each of those appearing in the latter.

3.2.5 Type expressions

Type expressions denote types, security levels and rows in types schemes, definitions of data-types as well as in type constraints over patterns and expressions. A type expression is of one of the three kinds: level, type or row (this last kind is parametrized with a (possibly empty) set of exceptions which is the complementary of its domain).

kind ::= level | type | row [ [exception <, exception>] ]

A row of kind row [exception₁, ..., exception_n] is a mapping from exceptions distinct of exception₁, ..., exception_n to security levels.

typexpr	::=	`'` ident
	\|	`(` typeexpr `)`
	\|	typexpr `-{`typexpr `\|` typexpr `\|` typexpr `}->` typexpr
	\|	typexpr <`` typexpr*>+
	\|	typeconstr
	\|	typeexpr typeconstr
	\|	`(` typexpr <`,` typexpr> `)` typeconstr
	\|	exception `:` typexpr `;` typexpr

Type expressions involve type variables 'ident. A type variable can be used with any kind (i.e. there is a not a distinguished name-space for every kind of variables); however in a given context (i.e. a type scheme, a data-type definition or a type constraint), every occurrence of a given type variable must be used with the same kind. In type definitions, type variables are names for the type parameters. In type schemes, they are implicitly universally quantified.

A function type comprises 5 type expressions: typexpr₁ -{typexpr₂ | typexpr₃ | typexpr₄}-> typexpr₅. typexpr₁ and typexpr₅ have the kind type; the former is the type of the argument of the function and the latter that of its result. typexpr₂ and typexpr₄ are levels (of kind level). The former is the security level of the context (generally written pc in the literature) where the function is called while the latter is the annotation attached to the function's identity. Lastly, typexpr₃ is the row (of kind row []) describing the exceptions raised by the function.

Tuples types typexpr₁ * ... * typexpr_n is the type of tuples whose elements belong to types typexpr₁, ..., typexpr_n respectively.

Constructed types consists either of a type constructor with no parameter (typeconstr) or with one (typeexpr typeconstr) or several (( typeexpr <, typeexpr> )) arguments. Every type constructor is supposed to have a signature which gives the number of parameters it expects and their respective kind, see section 3.2.7.

The row term exception : typexpr₁ ; typexpr₂ stands for the row whose entry at index exception is typexpr₁ and whose other entries are given by the row typexpr₂.

3.2.6 Type schemes

Type schemes intend to describe the set of admissible types for some expression or value. Because of the presence of subtyping, this set cannot be represented by a single type expression with (universally quantified) free variables but must involve constraints.

type-scheme	::=	typexpr [`with` constraint <`and` constraint>]
constraint	::=	left-hand-side <`,` left-hand-side> `<` right-hand-side <`,` right-hand-side>
	\|	typexpr <`~` typexpr>+
left-hand-side	::=	typexpr \| `content` `(`typexpr`)`
right-hand-side	::=	typexpr \| `level` `(`typexpr`)`

A type scheme consists in a type expression (the body) and an optional list of constraints, stating assumptions between the variables appearing in the body. Type variables which appear in a type scheme are implicitly universally quantified: intuitively, the type scheme stands for the set of all instances of the body which satisfy then constraints. A constraint is either an inequality, i.e. a pair of a left-hand-side and a right-hand-side, separated by the symbol <, or a same-skeleton constraint, i.e. a ~-separated list of type expressions.

Some abbreviations are allowed in type expressions which appear as the body of a type scheme:

A simple constant level security-level stands for a fresh level variable 'a with the two constraints security-level < 'a and 'a < security-level.
[< security-level₁ , ... , security-level_n] is a shorthand for a fresh level variable 'a with the constraint 'a < security-level₁ , ... , security-level_n. Similarly, [> security-level₁ , ... , security-level_n] is a shorthand for a fresh level variable 'a with the constraint security-level₁ , ... , security-level_n < 'a. Lastly, [< security-level₁ , ... , security-level_n |> security-level_{n+ 1} , ... , security-level_{n+ k}] is a shorthand for a fresh level variable 'a with the constraints security-level₁ , ... , security-level_n < 'a and 'a < security-level_{n+ 1} , ... , security-level_{n+ k}.
_ stands for an anonymous fresh variable of any kind.
On functions arrows, anonymous variables can be omitted. For instance, -{| typexpr₁ | typexpr₂ }-> is a shorthand for -{ 'a | typexpr₁ | typexpr₂ }-> where 'a is a fresh variable. Furthermore, an arrow whose three annotations are omitted, -{ | | | }->, can be written ->.

We do not give here a precise description of the interpretation of type schemes and constraints. The reader is referred to section 2.2 for an informal presentation and to [PS03] for a formal definition of the type system.

3.2.7 Type definitions

Type definitions bind type constructors to data types: either variant types, record types, type abbreviations, or abstract data types. They also bind the value constructors and record fields associated with the definition.

type-definition	::=	`type` typedef <`and` typedef>
typedef	::=	[`noneq`] [type-params] typeconstr-name [`=` typexpr] [type-repr]
type-params	::=	type-param \| `(` type-param <`,` type-param> `)`
type-param	::=	[`+` \| `-` \| `=` \| `#`] `'` ident [`:` (`level` \| `type` `row` `[` [exception-list] `]` )]
type-repr	::=	constr-decl <`\|` constr-decl> [`#` `'`ident]
	\|	`{` field-decl <`;` field-decl> `}` [`#` `'`ident]
constr-decl	::=	constr-name \| constr-name `of` typexpr
field-decl	::=	[`mutable`] field-name `:` typexpr

Type definitions are introduced by the type keyword, and consist in one or several simple definitions, possibly mutually recursive, separated by the and keyword. Each simple definition defines one type constructor. A simple definition consists in a lowercase identifier, possibly preceded by a noneq flag and one or several type parameters, and followed by an optional type equation, and then an optional type representation. The identifier is the name of the type constructor being defined.

The optional type parameters are either one type variable 'ident or a list of type variables ( 'ident₁ , ... , ident_n) for type constructors with several parameters. Each parameter may be annotated by its variance and its kind. In the case where the type definition introduces an abstract type (i.e. no type equation is provided), these annotations are mandatory and reproduced in the signature of the type constructor. In other cases, the signature computed for the type constructor is the minimal one which fits the type equation, the type representation and these annotations.

The optional type equation = typexpr makes the defined type equivalent to the type expression typexpr on the right of the = sign: one can be substituted for the other during typing. If no type equation is given, a new type is generated which is incompatible with any other type.

The optional type representation describes the data structure representing the defined type, by giving the list of associated constructors (if it is a variant type) or associated fields (if it is a record type):

The type representation = constr-decl <| constr-decl> [# ' ident] describes a variant type. The optional annotation [# ' ident] declares the security level attached to variant values. It must be one of the parameters of the type constructor and it is required if the variant type comprises several constructors.
The type representation = { field-decl <; field-decl > } describes a record type. The optional annotation [# ' ident] declares the security level attached to records values. It must be one of the parameter of the type constructor and it is required if the record type comprises one or several mutable fields.

If no type representation is given, nothing is assumed on the structure of the type besides what is stated in the optional type equation.

3.2.8 Exception definitions

Exception definitions introduce new exception names.

exntypexpr	::=	`'` ident
	\|	security-level
	\|	`(` typeexpr `)`
	\|	exntypexpr `-{`exntypexpr `\|` exntypexpr `\|` exntypexpr `}->` exntypexpr
	\|	exntypexpr <`` exntypexpr*>+
	\|	typeconstr
	\|	typeexpr typeconstr
	\|	`(` exntypexpr <`,` exntypexpr> `)` typeconstr
	\|	exception `:` exntypexpr `;` exntypexpr
exception-definition	::=	`exception` exception-name [exception-argument] [`=` exception-name]
exception-argument	::=	[`:` `'`ident] `of` exntypexpr

They consist in an upper case identifier, followed by an optional argument declaration, then an optional equation.

If no argument declaration is provided, then the exception name is constant. Otherwise, the argument declaration consists in an optional parameter and a type expression. The parameter is a level variable 'ident which can appear in the type expression. The type expression exntypexpr gives the type of the exception's argument, it is a type expression which may involve constant security levels.

If no equation is provided, then the definition generates a new exception, distinct from all other exceptions in the system. Otherwise, it only gives an alternate name to an existing exception. In this case, the optional argument declaration must be identical to that of the re-binded definition.

3.2.9 Constants

The syntactic class of constants comprises literals from the five base types (integers, floating-point numbers, characters, mutable characters strings and immutable characters strings) and constant constructors.

constant	::=	integer-literal
	\|	float-literal
	\|	char-literal
	\|	string-literal
	\|	charray-literal
	\|	constr

3.2.10 Patterns

Flow Caml provides the same patterns as Objective Caml (at the exception of the forms dealing with polymorphic variants). Their respective semantics is naturally preserved.

pattern	::=	value-name
	\|	`_`
	\|	pattern `as` value-name
	\|	`(` pattern `)`
	\|	`(` pattern `:` type-scheme `)`
	\|	pattern `\|` pattern
	\|	constr pattern
	\|	`{` field `=` pattern <`;` field `=` pattern> `}`
	\|	`[` pattern <`;` pattern> `]`
	\|	pattern `::` pattern
	\|	`[\|` pattern <`;` pattern> `\|]`

Because exceptions are not first class values and exception names are not regular value constructors, Flow Caml has a class of patterns dedicated to exceptions:

exception-pattern	::=	`_`
	\|	exception [pattern] <`\|` exception [pattern]>

The first kind of pattern matches every exception. The second matches the exceptions of the given names, with an argument value matching the additional pattern (if any).

3.2.11 Expressions

expr	::=	value-path
	\|	constant
	\|	`(` expr `)`
	\|	`begin` expr `end`
	\|	`(` expr `:` type-scheme `)`
	\|	expr `,` expr <`,` expr>
	\|	ncconstr expr
	\|	expr `::` expr
	\|	`[` expr <`;` expr> `]`
	\|	`[\|` expr <`;` expr> `\|]`
	\|	`{` field `=` expr <`;` field `=` expr> `}`
	\|	`{` expr `with` field `=` expr <`;` field `=` expr> `}`
	\|	expr <expr>+
	\|	prefix-symbol expr
	\|	expr (infix-symbol \| `` \| `=` \| `or` \| `&`) expr*
	\|	expr `.` field
	\|	expr `.` field `<-` expr
	\|	expr `.(` expr `)`
	\|	expr `.(` expr `) <-` expr
	\|	expr `.[` expr `]`
	\|	expr `.[` expr `] <-` expr
	\|	`if` expr `then` expr [`else` expr]
	\|	`while` expr `do` expr `done`
	\|	`for` ident `=` expr (`to` \| `downto`) expr `do` expr `done`
	\|	expr `;` expr
	\|	`match` expr `with` pattern-matching
	\|	`function` pattern-matching
	\|	`fun` multiple-matching
	\|	`raise` (exception \| `(` exception expr `)`)
	\|	`try` expr `with` [`\|`] handler <`\|` handler>
	\|	`try` expr `finally` expr
	\|	`let` [`rec`] let-binding <`and` let-binding> `in` expr
pattern-matching	::=	[`\|`] pattern [`when` expr] `->` expr <`\|` pattern [`when` expr] `->` expr>
multiple-matching	::=	<pattern>+ [`when` expr] `->` expr
let-binding	::=	pattern [`:` type-scheme] `=` expr
	\|	value-name <pattern>+ [`:` type-scheme] `=` expr
handler	::=	exception-pattern `->` expr [`propagate`]

The only differences between the syntax of Flow Caml expressions and those of Objective Caml rely in the fact that exceptions are first class values in the former while they are not in the latter. Thus, raise is no longer a regular function but a construct of the language. Two exceptions handlers are provided: the expression

	`try`
	expr
	`with`
	pattern₁ `->` expr₁ [`propagate`]
	...
	\| pattern_n `->` expr_n [`propagate`]

evaluates the expression expr and returns its value if the evaluation does not raise any exception. If it raises an exception, it is matched against the patterns pattern₁ to pattern_n. If the matching against pattern_i is the first which succeeds, the associated expression expr_i is evaluated (in an environment enriched by the bindings performed during the matching). If the handler is terminated with the keyword propagate, the trapped exception is propagated (in this case, exceptions raised by expr_i are not thrown), otherwise the value produced by the evaluation of expr_i becomes that of the whole try expression. If none of the patterns matches the exception raised by expr, the exception is raised again, thereby transparently ``passing through'' the try construct.

The expression try expr₁ finally expr₂ evaluates the expression expr₁. This produces a result which is either a value of a raised exception. In both cases, the expression expr₂ is evaluated and its result (value or exception) discarded. Finally, the result produced by expr₁ becomes the result of the whole expression.

Evaluation order

For the purpose of obtaining a precise information flow analysis, the evaluation order of expressions must be specified. As a result, the right-to-left evaluation order of the current implementation of the Objective Caml language is made part of the specification of the Flow Caml core language.

3.3 The module language

3.3.1 Module types (module specifications)

Module types are the module-level equivalent of type expressions: they specify the general shape and type properties of modules.

module-type	::=	modtype-path
	\|	`sig` <specification [`;;`]> `end`
	\|	`functor` `(` module-name `:` module-type `)` functor-arrow module-type
	\|	module-type `with` mod-constraint <`and` mod-constraint>
	\|	`(` module-type `)`
specification	::=	`val` value-name `:` type-scheme
	\|	`external` value-name `:` type-scheme `=` external-declaration
	\|	type-definition
	\|	level-definition
	\|	exception-definition
	\|	`module` module-name module-args `:` module-type
	\|	`module type` modtype-name `=` [module-type]
	\|	`open` module-path
	\|	`include` module-type
module-args	::=	<`(` module-name `:` module-type `)`>
mod-constraint	::=	`type` [type-parameters] typeconstr `=` typexpr
	\|	`level` level level-repr
	\|	`module` module-path `=` ext-module-path
functor-arrow	::=	`->` \| `-{` [security-level-list] `\|` [security-level-list] `}->`

This language of module types is largely identical to that of Objective Caml, so we only mention the significant differences:

A new for of specifications, level definitions, may appear in signatures.
In Flow Caml signatures, exception declarations (introduced by the keyword exception) may mention an equality between exceptions (as in structures). This is made necessary because exceptions appears in types for values.
Functors arrows may mention two lists of security levels, a pair of bounds whose purpose is explained in section 2.7.3 of the tutorial.

3.3.2 Module expressions (module implementations)

Module expressions are the module-level equivalent of value expressions: they evaluate to modules, thus providing implementations for the specifications expressed in module types. There is no difference with Objective Caml to be mentioned.

module-expr	::=	module-path
	\|	`struct` <definition [`;;`]> `end`
	\|	`functor` `(` module-name `:` module-type `)` `->` module-expr
	\|	module-expr `(` module-expr `)`
	\|	`(` module-expr `)`
	\|	`(` module-expr `:` module-type `)`
definition	::=	`let` [`rec`] let-binding <`and` let-binding>
	\|	`external` value-name `:` type-scheme `=` external-declaration
	\|	type-definition
	\|	level-definition
	\|	exception-definition
	\|	`module` module-name module-args `=` module-expr
	\|	`module type` modtype-name `=` module-type
	\|	`open` module-path
	\|	`include` module-expr

Evaluation order

The remark about the evaluation order made for the core language also applies to the module language. As a result, the right-to-left evaluation order of the current implementation of the Objective Caml language is made part of the specification of the Flow Caml module language.

3.3.3 Compilation units

interface	::=	flows-declaration [`affects` security-level-list] [`raises` security-level-list] <specification>
implementation	::=	flows-declaration <specification>
flows-declaration	::=	`flow` principal-list `<` principal-list <`and` principal-list `<` principal-list>
principal-list	::=	principal <`,` principal>

A Flow Caml program can be made of one or several compilation units. Each compilation is made of an interface and an implementation. It also has a name unit-name, derived from the names of the files containing the interface and the implementation (see section 4.2 for details).

An implementation consists in two parts: a list of flow declarations and a specification. The flow declarations define the partial order < between principals which is used throughout the type-checking of the specification: it is the smallest one which satisfies all the listed inequalities. The specification is the body of a structure which lists the values, types, levels, exceptions, modules and module types implemented by the unit.

An interface is made of three sections: a list of flow declarations, the affects and raises statements and a specification. The flow declarations define the partial order < between principals which has been used throughout the type-checking of the unit: in short, it provides a description of the possible information flow generated by the code of the unit. The inequality between principals provided in the interface of a unit must imply (possibly by transitivity) all those mentioned in the implementation of the unit. Lastly, the statements affects and raises mentions respectively the lower and upper bounds of the module expression underlying the compilation unit. They are omitted when the bound is empty.