Chapter 1
LEXICAL TOKENS

DEFINITIONAL CONVENTIONS

The meta language used in this standard to specify the syntax of the constructs is based on the Backus-Naur Form. The meanings of the various meta symbols are listed in the table below. Further (semantic) specifications of the constructs are given in prose and, in some cases, by equivalent program fragments. In such program fragments some identifiers introduced by declarations are printed in upper case. The use of upper case letters signifies that the identifier in question represents some quantity which is inaccessible to a program. An example of this convention is the identifier EVENT_NOTICE of chapter 12. Any other identifier that is defined elsewhere in the standard will denote the corresponding entity by its occurrence in such a program fragment.

Note:

The use of program fragments as described above, as well as the description of standard facilities (see chapters 8-12) by algorithmic means should be taken as definitive only as far as their effect is concerned. An actual implementation should seek to produce these effects in as efficient a manner as practicable. Furthermore, when arithmetic of real type is concerned, even the effects must be regarded as defined with only a finite degree of accuracy (see 3.5.3).

Metalanguage Symbols

          Metasymbol             Meaning

              =                  is defined to be
              |                  alternatively
              [ x ]              0 or 1 instance of x
              { x }              0 or more instances of x
              ( x | y )          grouping: either x or y
              xyz                the terminal symbol xyz
              meta-identifier    a non-terminal symbol
              ...                see below

A meta-identifier is a sequence of letters, digits and hyphens beginning with a letter. The identifier has intentionally been chosen to convey a hint of its meaning to the reader. The exact meaning is, however, defined by its (single) occurrence on the left hand side of a production. When used outside productions these identifiers are generally written with spaces instead of hyphens, except in cases where possible ambiguities might result.

A few productions contain the ellipsis (...) as a right hand side. In such cases a prose explanation is given immediately below the production.

A sequence of terminal and non-terminal symbols in a production implies concatenation of the text that they ultimately represent. Within chapter 1 this concatenation is direct; no characters may intervene. In the remainder of the Standard the concatenation is in accordance with the rules set out in this chapter.

The characters required to form SIMULA programs are those explicitly classified as "basic" in the table given in section 1.2. Additional characters of that table may be employed as described in that section.

A SIMULA source module consists of directive lines and program lines. Apart from 1.1 this standard is not concerned with directive lines. The lexical tokens used to construct program lines are classified into special symbols, identifiers, unsigned numbers, simple strings and character constants.

No lexical token may consist of more than 72 characters.

     letter
         =  A | B | C | D | E | F | G | H | I
         |  J | K | L | M | N | O | P | Q | R
         |  S | T | U | V | W | X | Y | Z
         |  a | b | c | d | e | f | g | h | i
         |  j | k | l | m | n | o | p | q | r
         |  s | t | u | v | w | x | y | z

The representation of any letter (upper or lower case, differences in font, etc.) occurring anywhere other than in a simple string or a character constant has no significance in that occurrence for the meaning of the program.

     digit
         = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

     space
         =  SP

SP is the space (blank) character (ISO 646 code 2/0).

Directive lines

If the first character of a line is "%" (percent) the line as a whole is a directive line.

A directive line serves to communicate information to the processor and consequently its meaning is entirely implementation-dependent, with the following single exception. If the second character is a space, the line has no significance; it may be used for annotation purposes.

Note:

The interpretation of a directive line takes precedence over the treatment of subsequent lines.
The interpretation by the processor may cause inclusion of lines not present in the module, or deletion of some lines actually following the directive in question.

The language defined in the following defines the resulting program text after all directive lines have been interpreted and thereafter deleted.
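
The directive line rules above can be illustrated with a minimal Python sketch. It assumes the source module is available as a list of lines and that no directive is actually interpreted, so nothing is included or deleted beyond the directive lines themselves. The function names and the directive text in the sample are hypothetical.

```python
def is_directive_line(line: str) -> bool:
    """A line is a directive line if its first character is '%'."""
    return line.startswith("%")


def is_annotation_line(line: str) -> bool:
    """A directive line whose second character is a space has no significance."""
    return line.startswith("% ")


def strip_directive_lines(lines):
    """Return the program lines left once all directive lines are deleted.

    Assumption: directives are not interpreted, so none of them includes
    or deletes other lines (a real processor is free to do both).
    """
    return [line for line in lines if not is_directive_line(line)]


module = [
    "% this line is pure annotation",
    "begin",
    "%SOMEDIRECTIVE hypothetical, implementation-dependent",
    "   integer i;",
    "end",
]
program_lines = strip_directive_lines(module)
```

Only the three program lines remain in `program_lines`; everything a real processor does with the second directive is outside the scope of this sketch.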

The character set

The standard presupposes an 8-bit internal representation of characters (cf. ISO 2022). Thus the isocode facility allows for inclusion of characters with "isorank" value greater than 127 in simple strings and character constants. An implementation may restrict this possibility as well as the character set given below, as long as the "basic" characters of the table are included.

The standard character set is defined by the table below. For each character its "isorank" (see 9.2), name or printable representation and the classification of the character as a program text constituent are given.

  basic:    Significant in all contexts.
  skip:     Skipped in all contexts.
  graphic:  Significant inside comments, inside simple strings, and inside
            character constants; illegal outside these constructs.
  national: Reserved for national alphabet extension; treated as "graphic".
  format:   Format effector, see 1.9.

   0 NUL  skip        32  SP  basic       64      national    96      national
   1 SOH  illegal     33  !   basic       65  A   basic       97  a   basic
   2 STX  illegal     34  "   basic       66  B   basic       98  b   basic
   3 ETX  illegal     35  #   graphic     67  C   basic       99  c   basic
   4 EOT  illegal     36  $   graphic     68  D   basic      100  d   basic
   5 ENQ  illegal     37  %   graphic     69  E   basic      101  e   basic
   6 ACK  illegal     38  &   basic       70  F   basic      102  f   basic
   7 BEL  illegal     39  '   basic       71  G   basic      103  g   basic

   8 BS   format      40  (   basic       72  H   basic      104  h   basic
   9 HT   format      41  )   basic       73  I   basic      105  i   basic
  10 LF   format      42  *   basic       74  J   basic      106  j   basic
  11 VT   format      43  +   basic       75  K   basic      107  k   basic
  12 FF   format      44  ,   basic       76  L   basic      108  l   basic
  13 CR   format      45  -   basic       77  M   basic      109  m   basic
  14 SO   illegal     46  .   basic       78  N   basic      110  n   basic
  15 SI   illegal     47  /   basic       79  O   basic      111  o   basic

  16 DLE  illegal     48  0   basic       80  P   basic      112  p   basic
  17 DC1  illegal     49  1   basic       81  Q   basic      113  q   basic
  18 DC2  illegal     50  2   basic       82  R   basic      114  r   basic
  19 DC3  illegal     51  3   basic       83  S   basic      115  s   basic
  20 DC4  illegal     52  4   basic       84  T   basic      116  t   basic
  21 NAK  illegal     53  5   basic       85  U   basic      117  u   basic
  22 SYN  illegal     54  6   basic       86  V   basic      118  v   basic
  23 ETB  illegal     55  7   basic       87  W   basic      119  w   basic

  24 CAN  illegal     56  8   basic       88  X   basic      120  x   basic
  25 EM   illegal     57  9   basic       89  Y   basic      121  y   basic
  26 SUB  illegal     58  :   basic       90  Z   basic      122  z   basic
  27 ESC  illegal     59  ;   basic       91      national   123      national
  28 FS   illegal     60  <   basic       92      national   124      national
  29 GS   illegal     61  =   basic       93      national   125      national
  30 RS   illegal     62  >   basic       94      national   126      national
  31 US   illegal     63  ?   graphic     95  _   basic      127 DEL  skip

Table 1.1. Standard character set
(International Reference Version)
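
As an illustration, the classification column of table 1.1 can be transcribed into a small lookup function. This is a sketch in Python; the function name `classify` is an assumption, and isorank values above 127 are rejected because the table does not classify them.

```python
def classify(isorank: int) -> str:
    """Return the table 1.1 classification for an isorank in 0..127."""
    if not 0 <= isorank <= 127:
        raise ValueError("table 1.1 only covers isorank 0..127")
    if isorank in (0, 127):                      # NUL and DEL
        return "skip"
    if 8 <= isorank <= 13:                       # BS HT LF VT FF CR
        return "format"
    if isorank < 32:                             # remaining control characters
        return "illegal"
    if isorank in (35, 36, 37, 63):              # # $ % ?
        return "graphic"
    if isorank in (64, 91, 92, 93, 94, 96, 123, 124, 125, 126):
        return "national"
    return "basic"
```

For example, `classify(ord("A"))` yields `"basic"` and `classify(9)` yields `"format"`.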

Special symbols

         +   -   *   /   //  **      Arithmetic operators
         &                           Text concatenation operator, or exp. mark
         &&                          Exponent mark in long real numbers
         :=  :-                      Assignment operators
         <   <=  =   >=  >   <>      Value relational operators
         ==  =/=                     Reference relational operators
         '                           Character quote
         "   ""                      String quote ("" only within strings)
         !                           Code quote, or comment
         ;                           Statement separator, or
                                     declaration or specification delimiter
         :                           Array bounds separator, or
                                     label definition or virtual delimiter
         (   )                       Parameter, array bounds grouping, or expr.
         .                           Remote indicator ("dot"), or decimal mark
         ,                           Parameter, array bounds pair or expression
                                     separator

Table 1.2. Special symbols, excluding key words

Normally the syntax of the language assumes that all syntactic units are recognised as being the largest possible string of characters which fits the syntax of a symbol. However, in an array declaration the symbol ":" is always a bounds separator, even if it is immediately followed by a minus.
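
The longest-match rule and its array declaration exception can be sketched as follows in Python. The symbol list is copied from table 1.2. The function name and the `in_array_bounds` flag are assumptions; how a scanner learns that it is inside an array declaration is not addressed here.

```python
# Special symbols of table 1.2, longest first so that the first hit is
# the longest symbol that fits.
SPECIAL_SYMBOLS = sorted(
    ["+", "-", "*", "/", "//", "**", "&", "&&", ":=", ":-",
     "<", "<=", "=", ">=", ">", "<>", "==", "=/=",
     "'", '"', "!", ";", ":", "(", ")", ".", ","],
    key=len,
    reverse=True,
)


def match_special(text, pos, in_array_bounds=False):
    """Return the longest special symbol starting at text[pos], or None."""
    for symbol in SPECIAL_SYMBOLS:
        if text.startswith(symbol, pos):
            if in_array_bounds and symbol == ":-":
                return ":"      # ':' is always a bounds separator here
            return symbol
    return None
```

With this sketch, `match_special("x:-y", 1)` gives `":-"`, whereas `match_special("(-5:-1)", 3, in_array_bounds=True)` gives `":"`, leaving the minus to be scanned as a separate symbol.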

         activate    else        if          none        short
         after       end         imp         not         step
         and         eq          in          notext      switch
         array       eqv         inner
         at          external    inspect     or          text
                                 integer     otherwise   then
         before      false       is                      this
         begin       for                     prior       to
         boolean                 label       procedure   true
                     ge          le          protected
         character   go          long                    until
         class       goto        lt          qua
         comment     gt                                  value
                                 name        reactivate  virtual
         delay       hidden      ne          real
         do                      new         ref         when
                                                         while

Table 1.3. SIMULA key words

Note:

For typographical reasons, the standard key words may, within this Standard, be printed as indicated in table 1.3. Within a program, the key words are printed as identifiers (cf. the letter production above).

Identifiers

    identifier
        =  letter  { letter  |  digit  |  _ }

No identifier can have the same spelling as any key word. Apart from this, identifiers may be chosen freely. They have no inherent meaning, but serve for the identification of language quantities, i.e. simple variables, arrays, texts, labels, switches, procedures, classes and class attributes. Within a procedure declaration identifiers also act as formal parameters, in which capacity they may represent a literal value or any language quantity except a class. All constituent characters are significant in distinguishing between identifiers.
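
A minimal Python sketch of an identifier check follows. It combines the identifier production, the key word list of table 1.3, the 72-character token limit, and the rule that the representation of a letter (upper or lower case) has no significance. The function names are assumptions.

```python
import re

# Key words of table 1.3.
KEY_WORDS = frozenset("""
    activate after and array at before begin boolean character class comment
    delay do else end eq eqv external false for ge go goto gt hidden if imp
    in inner inspect integer is label le long lt name ne new none not notext
    or otherwise prior procedure protected qua reactivate real ref short step
    switch text then this to true until value virtual when while
""".split())

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_identifier(token: str) -> bool:
    """True if token fits the identifier production and is not a key word."""
    return (
        IDENTIFIER.fullmatch(token) is not None
        and len(token) <= 72                  # limit on any lexical token
        and token.lower() not in KEY_WORDS    # letter case is not significant
    )


def same_identifier(a: str, b: str) -> bool:
    """All characters are significant, but letter case is not."""
    return a.lower() == b.lower()
```

Under these assumptions `Count` and `COUNT` denote the same identifier, while `a_b` and `ab` are distinct, and `Begin` is rejected because it spells a key word.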

Numbers

     unsigned-number
         =  decimal-number  [ exponent-part ]
         |  exponent-part

     decimal-number
         =  unsigned-integer  [ decimal-fraction ]
         |  decimal-fraction

     decimal-fraction
         =  .  unsigned-integer

     exponent-part
         =  ( & | && )  [ + | - ]  unsigned-integer

     unsigned-integer
         =  digit  { digit | _ }
         |  radix R radix-digit { radix-digit | _ }

     radix
         =  2 | 4 | 8 | 16

     radix-digit
         =  digit | A | B | C | D | E | F

Decimal numbers have their conventional meaning. The exponent part is a scale factor expressed as an integral power of 10.

Unsigned integers are normally expressed in decimal digits. Unsigned integers of radix 2, 4, 8, or 16 may be expressed as shown. The radix digits A through F express radix 16 digits 10 through 15 (decimal). The radix determines the legality and the interpretation of a radix digit in an obvious manner.

An unsigned number which is an unsigned integer is of type integer. Otherwise, if an unsigned number contains an exponent part with a double ampersand (&&) it is of type long real, else it is of type real.

Examples

     2&1    2.0&+1   .2&2   20.0   200&-1   - all represent the same real value (20.0)
     2.345_678&&0                           - long real value (2.345678)
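
The number productions and typing rules can be exercised with a short Python sketch. It makes several assumptions that go beyond the text: radix notation is accepted only for a stand-alone unsigned integer, a bare exponent part is taken to scale the value 1, the result types are returned as plain strings, and a Python float stands in for both real and long real.

```python
import re

DEC = r"[0-9][0-9_]*"
RADIX_INTEGER = re.compile(r"(2|4|8|16)[Rr]([0-9A-Fa-f][0-9A-Fa-f_]*)")
DECIMAL_NUMBER = re.compile(
    rf"(?P<whole>{DEC})?(?:\.(?P<frac>{DEC}))?"
    rf"(?:(?P<mark>&&?)(?P<exp>[+-]?{DEC}))?"
)


def parse_unsigned_number(token):
    """Return (type name, value) for an unsigned-number token."""
    m = RADIX_INTEGER.fullmatch(token)
    if m:
        radix, value = int(m.group(1)), 0
        for ch in m.group(2).replace("_", ""):
            digit = int(ch, 16)
            if digit >= radix:                 # the radix decides legality
                raise ValueError(f"digit {ch!r} is not legal in radix {radix}")
            value = value * radix + digit
        return "integer", value
    m = DECIMAL_NUMBER.fullmatch(token)
    if not token or m is None:
        raise ValueError(f"not an unsigned number: {token!r}")
    whole, frac, mark, exp = (
        (m[name] or "").replace("_", "")
        for name in ("whole", "frac", "mark", "exp")
    )
    if not frac and not mark:
        return "integer", int(whole)           # a plain unsigned integer
    # Assumption: an exponent part on its own scales the value 1.
    mantissa = f"{whole or '0'}.{frac or '0'}" if (whole or frac) else "1"
    value = float(f"{mantissa}e{exp or '0'}")
    return ("long real" if mark == "&&" else "real"), value
```

Applied to the examples above, the first five tokens all yield `("real", 20.0)` and `2.345_678&&0` yields a long real, while `16RFF` is the integer 255.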

Strings

     string
         =  simple-string { string-separator simple-string }

     string-separator
         =  token-separator  { token-separator }

     simple-string
         =  " { ISO-code | non-quote-character | "" } "

     ISO-code
         =  !  digit  [ digit ]  [ digit ]  !

     non-quote-character
         =  ...

A non-quote-character is

A simple string must be contained within a single program line. Long strings are included as a sequence of simple strings separated by token separators.

In order to include a complete 8-bit coded character set, any character may be represented within a string by an integer, its isocode, corresponding to its bit combination. An isocode cannot consist of more than three digits, and it must be less than 256. If these conditions are not satisfied, the construction is interpreted as a character sequence. The string quote may, however, also be represented in simple strings by two consecutive quotes (see the last example below). Observe that, as a consequence of the definitional conventions given earlier in this chapter, no spaces may intervene between such a pair of string quotes.
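
The decoding rules above can be sketched in Python as follows. The sketch assumes that an isorank maps to the Python character with the same code point, and it ignores whatever separates the simple strings instead of validating it as token separators. The function names are hypothetical.

```python
import re

# A simple string: quotes around any characters, where "" stands for one quote.
SIMPLE_STRING = re.compile(r'"((?:[^"\n]|"")*)"')
ISO_CODE = re.compile(r"!([0-9]{1,3})!")


def decode_simple_string(body):
    """Decode the text between the enclosing quotes of one simple string."""
    out, i = [], 0
    while i < len(body):
        iso = ISO_CODE.match(body, i)
        if iso and int(iso.group(1)) < 256:
            out.append(chr(int(iso.group(1))))   # isocode: one character
            i = iso.end()
        elif body.startswith('""', i):
            out.append('"')                      # doubled quote: one quote
            i += 2
        else:
            out.append(body[i])                  # plain character sequence
            i += 1
    return "".join(out)


def decode_string(source):
    """Concatenate the decoded simple strings found in source."""
    return "".join(
        decode_simple_string(body) for body in SIMPLE_STRING.findall(source)
    )
```

Because each simple string is decoded on its own, `"!2" "!ABCDE!" "3!"` keeps its exclamation marks, whereas `"!2!ABCDE!3!"` produces the characters with isorank 2 and 3 around ABCDE.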

Examples

                The string:               represents:

                "Ab"  "cde"               Abcde
                "AB"  "CDE"               ABCDE
                "!2!ABCDE!3!"             ABCDE enclosed by STX and ETX
                "!2" "!ABCDE!" "3!"       !2!ABCDE!3!
                "AB"" C""DE"              AB" C"DE

Character constants

     character-constant
         =  '  character-designator  '

     character-designator
         =  ISO-code
         |  non-quote-character
         |  "

A character constant is either a single printing character or it is an ISO-code - in both cases surrounded by character quotes (' - ISO 646 code 2/7).

Within the data processing system, characters are represented by values according to some implementation-defined code. This code also defines the collating sequence used when comparing character (and text) values by means of relational operators.
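
A character constant can be decoded along the same lines. The Python sketch below assumes that an isorank maps to the Python character with the same code point, and it does not check that a single character is a printing character, since that depends on the definition of non-quote-character. The function name is hypothetical.

```python
import re

# Either an ISO-code (!ddd!) or exactly one character between character quotes.
CHARACTER_CONSTANT = re.compile(r"'(?:!([0-9]{1,3})!|(.))'", re.DOTALL)


def decode_character_constant(token):
    """Return the single character denoted by a character constant."""
    m = CHARACTER_CONSTANT.fullmatch(token)
    if m is None:
        raise ValueError(f"not a character constant: {token!r}")
    if m.group(1) is not None:
        rank = int(m.group(1))
        if rank > 255:
            raise ValueError("an isocode must be less than 256")
        return chr(rank)
    return m.group(2)
```

Under these assumptions `'A'` and `'!65!'` decode to the same character, and `'!'` denotes the exclamation mark itself.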

Comment convention

Comments may be included in a program for the purpose of annotating it. The substitution of end for an end-comment, or of a space for a direct comment, does not alter the meaning of a program.

Note:

As a consequence of 1.8.1 and 1.8.2 comments cannot be nested. It is understood that the comment structure encountered first in a program when reading from left to right has precedence in being replaced over later structures contained by the sequence.

End comment

The key word end may be followed by any sequence of characters and separation of lines not containing any of the special symbols end, else, when, otherwise, or ";". This sequence (excluding the delimiting special symbol, but including the initial end) constitutes an end-comment.

Direct comment

The special symbol "!" (exclamation mark) followed by any sequence of characters or separation of lines not containing ";" (semicolon), and delimited by semicolon, is treated as a comment if the exclamation mark does not occur within a character constant or a simple string (in which cases it may either represent itself or act as a code quote), or within a comment.

Note:

The delimiting semicolon is considered part of a direct comment and thus takes part in the substitution.

Example

            if B then begin ... end !then; else ...

is not valid since the ! is part of an end-comment. Thus ";" will act as a statement separator (and no statement can start with else).
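
The two substitutions can be sketched as a single left-to-right scan in Python. This is an illustration under assumptions, not a complete scanner: the name `strip_comments` is hypothetical, the key word comment is treated like "!" (it is listed as its alternative representation in this chapter), an unterminated comment is taken to run to the end of the text, and a space is emitted after a substituted end when a key word follows so that the two words stay apart.

```python
import re

WORD = re.compile(r"[A-Za-z0-9_]+")
CHAR_CONSTANT = re.compile(r"'(?:!\d{1,3}!|.)'", re.DOTALL)
END_COMMENT_STOP = re.compile(r";|\b(?:end|else|when|otherwise)\b", re.IGNORECASE)


def _string_end(text, i):
    """Return the index just past the simple string starting at text[i]."""
    j = i + 1
    while j < len(text):
        if text.startswith('""', j):
            j += 2                  # doubled quote inside the string
        elif text[j] == '"':
            return j + 1
        else:
            j += 1
    return len(text)                # unterminated: copy the rest unchanged


def strip_comments(text):
    """Replace direct comments by a space and end-comments by end."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        word = WORD.match(text, i)
        const = CHAR_CONSTANT.match(text, i)
        if ch == '"':                               # strings are copied verbatim
            j = _string_end(text, i)
            out.append(text[i:j])
            i = j
        elif const:                                 # so are character constants
            out.append(const.group())
            i = const.end()
        elif ch == "!" or (word and word.group().lower() == "comment"):
            j = text.find(";", i)                   # the semicolon is consumed too
            out.append(" ")
            i = len(text) if j < 0 else j + 1
        elif word and word.group().lower() == "end":
            stop = END_COMMENT_STOP.search(text, word.end())
            out.append(word.group())
            if stop and stop.group() != ";":
                out.append(" ")                     # keep end apart from the key word
            i = stop.start() if stop else len(text)
        elif word:
            out.append(word.group())
            i = word.end()
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Applied to the example above, the scan turns `end !then; else` into `end; else`, which shows why the statement is not valid: the exclamation mark never starts a direct comment, so the semicolon survives.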

Token separators

     format-effector
         =  BS  |  HT  |  LF  |  VT  |  FF  |  CR

BS, HT, LF, VT, FF, and CR represent the characters thus named in table 1.1. A format effector in general acts as a space. In addition, an implementation may define some additional action to be taken (such as tabulation when listing the program); such action has no significance for the meaning of the program.

     token-separator
         =  ...

A token-separator is

Zero or more token separators may occur between any two consecutive tokens, or before the first token of a program text. At least one token separator must occur between any pair of consecutive tokens made up of identifiers, key words, simple strings or unsigned numbers. No token separators may occur within tokens.

Program interchange and lexical alternatives

In order to ease portability of SIMULA programs, a common representation has been adopted for the language. This representation is used throughout this standard except for the following conventions adopted for typographical reasons:

Alternate representation of some symbols

The representation for lexical tokens and separators given in 1.2 to 1.9 constitutes a standard representation for these tokens and separators. This standard representation is recommended for program interchange.

For historical reasons the following alternatives have been defined. All processors that have the required characters in their character set must provide both the standard and the alternate representations, and there is no distinction made between corresponding tokens or separators.

The alternate representations for the tokens are

           standard token   alternative representation

                 <                    lt
                 <=                   le
                 =                    eq
                 >=                   ge
                 >                    gt
                 <>                   ne
                 !                    comment
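
Since no distinction is made between corresponding representations, a processor might map every alternative to its standard token right after scanning. A minimal Python sketch, in which the names `ALTERNATIVES` and `to_standard` are assumptions and tokens are taken to be already separated:

```python
# Alternative representation -> standard token, as listed in the table above.
ALTERNATIVES = {
    "lt": "<",
    "le": "<=",
    "eq": "=",
    "ge": ">=",
    "gt": ">",
    "ne": "<>",
    "comment": "!",
}


def to_standard(token: str) -> str:
    """Return the standard representation of a token (letter case ignored)."""
    return ALTERNATIVES.get(token.lower(), token)


tokens = ["a", "NE", "b"]
standard_tokens = [to_standard(t) for t in tokens]
```

Here `standard_tokens` holds `a`, `<>` and `b`; tokens without an alternative representation pass through unchanged.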