Chapter 8
ATTRIBUTES OF TEXT

A text object is conceptually an instance of

         class TEXTOBJ( SIZE, CONST); integer SIZE; boolean CONST;
         begin character array MAIN( 1:SIZE ); end;

Any text value processed by the program is contained within a text frame, i.e. a non-empty segment of the MAIN attribute of some TEXTOBJ instance, or it is empty (i.e. notext). See 2.5.

A text variable is conceptually an instance of a composite structure

         ref(TEXTOBJ)  OBJ;  integer START, LENGTH, POS;

It references (and has as its value the contents of) some text frame defined by its first three components. POS identifies the current character. See 3.1.2.
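
As an illustration of this structure, the following sketch shows two text variables which reference the same text frame while keeping separate position indicators. The names A and B are arbitrary. "copy", "setpos" and "putchar" are defined later in this chapter, and "outtext" and "outimage" are assumed to be the usual output procedures of the standard environment.

         begin
            text A, B;
            A :- copy("xyz");    ! A references a new alterable frame;
            B :- A;              ! B references the same frame, B.pos = 1;
            A.setpos(3);         ! alters the POS of A only, B.pos is still 1;
            B.putchar('Q');      ! stores into the shared frame, B.pos = 2;
            if A = "Qyz" and B.pos = 2 then outtext("shared frame");
            outimage
         end

The character stored through B is visible through A because both variables have the same OBJ component, whereas POS is a component of each variable.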

See also 3.3.3 and 3.3.6 (text relations), 3.7 (text expressions), 4.1.2 and 4.1.3 (text assignments).

This chapter defines all procedure attributes of any text variable. They may be accessed by remote identifiers of the form

         simple-text-expression . procedure-identifier

The attributes are

         boolean procedure constant; ........................ 8.1;
         integer procedure start; ........................... 8.1;
         integer procedure length; .......................... 8.1;
         text procedure main; ............................... 8.1;
         integer procedure pos; ............................. 8.2;
         procedure setpos(i); integer i; .................... 8.2;
         boolean procedure more; ............................ 8.2;
         character procedure getchar; ....................... 8.2;
         procedure putchar(c); character c; ................. 8.2;
         text procedure sub(i,n); integer i,n; .............. 8.4;
         text procedure strip; .............................. 8.4;
         integer procedure getint; .......................... 8.6;
         long real procedure getreal; ....................... 8.6;
         integer procedure getfrac; ......................... 8.6;
         procedure putint(i); integer i; .................... 8.7;
         procedure putfix(r,n); <real-type> r; integer n; ... 8.7;
         procedure putreal(r,n); long real r; integer n; .... 8.7;
         procedure putfrac(i,n); integer i,n; ............... 8.7;

In the following "X" denotes a text variable unless otherwise specified.

"constant", "start", "length" and "main"

 CONSTANT   boolean procedure constant;
            constant:= OBJ == none  or else  OBJ.CONST;


 START      integer procedure start;   start  := START;


 LENGTH     integer procedure length;  length := LENGTH;


 MAIN       text procedure main;
            if OBJ =/= none then
            begin text T;
               T.OBJ    :- OBJ;
               T.START  := 1;
               T.LENGTH := OBJ.SIZE;
               T.POS    := 1;
               main     :- T;
            end main;

            "X.main" is a reference to the main frame which contains the
            frame referenced by X.
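
A brief sketch of these four attributes in use. The names X and Y are illustrative, and "copy" and "sub" are defined later in this chapter.

         begin
            text X, Y;
            X :- copy("ABCDEF");      ! alterable main frame of length 6;
            Y :- X.sub(3,2);          ! subframe holding "CD";
            ! Here Y.start = 3, Y.length = 2 and Y.main == X;
            ! X.constant is false, whereas "ABC".constant and
              notext.constant are both true;
            if Y.main == X and Y.main.length = 6 then outtext("same main frame");
            outimage
         end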

The following relations are true for any text variable X:

                      X.main.length  >=  X.length
                      X.main.main    ==  X.main

In addition,

                     notext.main    ==  notext
                     "ABC".main     =   "ABC"      (but "ABC".main =/= "ABC")

Examples

     boolean procedure overlapping(X,Y);  text X,Y;
     overlapping := X.main == Y.main  and then
                    (if    X.start <= Y.start
                     then  X.start + X.length > Y.start
                     else  Y.start + Y.length > X.start);

"overlapping(X,Y)" is true if and only if X and Y reference text frames which overlap each other.

     boolean procedure subtext(X,Y);  text X,Y;
     subtext := X.main == Y.main
                and then  X.start >= Y.start
                and then  X.start + X.length <= Y.start + Y.length;

"subtext(X,Y)" is true if and only if X references a subframe of Y, or if both reference notext.

Character access

The characters of a text are accessible one at a time. Any text variable contains a "position indicator" POS, which identifies the currently accessible character, if any, of the referenced text frame. The position indicator of a given text variable X is an integer in the range 1 to X.length+1, inclusive.

The position indicator of a given text variable may be altered by the procedures "setpos", "getchar", and "putchar" of the text variable. Also any of the procedures defined in 8.6 and 8.7 may alter the position indicator of the text variable of which the procedure is an attribute.

Position indicators are ignored and left unaltered by text reference relations, text value relations and text value assignments.

The following procedures are facilities available for character accessing. They are oriented towards sequential access.

Note: The implicit modification of POS is lost immediately if "setpos", "getchar" or "putchar" is successfully applied to a text expression which is not a variable (see 3.7).

 POS        integer procedure pos;  pos := POS;


 SETPOS     procedure setpos(i); integer i;
            POS := if i < 1 or i > LENGTH + 1 then LENGTH + 1 else i;


 MORE       boolean procedure more;  more := POS <= LENGTH;


 GETCHAR    character procedure getchar;
            if POS > LENGTH then error("..." ! Pos out of range;)
            else begin
               getchar:= OBJ.MAIN(START + POS - 1);  POS:= POS + 1
            end getchar;


 PUTCHAR    procedure putchar(c); character c;
            if  OBJ == none or else OBJ.CONST or else POS>LENGTH
            then error("...")
            else begin
               OBJ.MAIN(START + POS - 1):= c;  POS:= POS + 1
            end putchar;
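
The sequential orientation of these procedures may be illustrated as follows. This is a sketch only: T and n are arbitrary names, and "copy" is defined in the next section.

         begin
            text T; integer n;
            T :- copy("a b c");
            while T.more do
               if T.getchar = ' ' then n := n + 1;
            ! Now n = 2 and T.pos = 6, i.e. T.length + 1;
            T.setpos(0);       ! out of range, so T.pos becomes T.length + 1;
            T.setpos(2);
            T.putchar('-');    ! T = "a-b c" and T.pos = 3;
            if T = "a-b c" then outtext(T);
            outimage
         end

A further "T.getchar" while "T.more" is false would cause a run-time error, which is why the loop is guarded by "more".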

Text generation

The following standard procedures are available for text frame generation:

 BLANKS    text procedure blanks(n); integer n;
           if        n < 0 then error("..." ! Parm. to blanks < 0;)
           else if   n > 0
           then begin text T;
              T.OBJ    :- new TEXTOBJ(n,false);
              T.START  := 1;
              T.LENGTH := n;
              T.POS    := 1;
              T        := notext;    ! blank-fill, see 4.1.2;
              blanks   :- T
           end blanks;

           "blanks(n)", with n > 0, references a new alterable main frame of
           length n, containing only blank characters. "blanks(0)"
           references notext.


 COPY      text procedure copy(T); text T;
           if T =/= notext
           then begin text U;
              U.OBJ    :- new TEXTOBJ(T.LENGTH,false);
              U.START  := 1;
              U.LENGTH := T.LENGTH;
              U.POS    := 1;
              U        := T;
              copy     :- U
           end copy;

           "copy(T)", with T =/= notext, references a new alterable main
           frame which contains a text value identical to that of T.

Text frame generation is also performed by the text concatenation operator (see 3.7.1) and by the standard procedure "datetime" (see 9.10).
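
A sketch contrasting the two procedures (the names A and B are illustrative):

         begin
            text A, B;
            A :- blanks(4);       ! new frame holding four blanks;
            A := "ab";            ! value assignment with blank-fill, A = "ab  ";
            B :- copy(A);         ! new frame with the same contents;
            B.putchar('X');       ! B = "Xb  ", A is unchanged;
            if A = "ab  " and B = "Xb  " and A.main =/= B.main
            then outtext("separate frames");
            outimage
         end

Since a frame generated by "copy" is alterable, "copy" is also the usual way to obtain an alterable text from a text constant, to which "putchar" may not be applied.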

Subtexts

Two procedures are available for referencing subtexts (subframes).

 SUB       text procedure sub(i,n); integer i,n;
           if i < 1 or n < 0 or i + n > LENGTH + 1
           then error("..." ! Sub out of frame;)
           else if n > 0
           then begin text T;
              T.OBJ    :- OBJ;
              T.START  := START + i - 1;
              T.LENGTH := n;
              T.POS    := 1;
              sub      :- T
           end;

           If legal, "X.sub(i,n)" references that subframe of X whose first
           character is character number i of X, and which contains n
           consecutive characters. The POS attribute of the expression
           defines a local numbering of the characters within the subframe.
           If n = 0, the expression references notext.

           If legal, the following Boolean expressions are true for any text
           variable X:

                X.sub(i,n).sub(j,m) == X.sub(i+j-1,m)

                n <> 0  imp  X.main == X.sub(i,n).main

                X.main.sub(X.start,X.length) == X
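
For instance, assuming an arbitrary alterable text X of length 8:

         begin
            text X, Y;
            X :- copy("ABCDEFGH");
            Y :- X.sub(3,4);          ! Y = "CDEF", Y.pos = 1;
            ! Y.sub(2,2) and X.sub(4,2) both reference "DE";
            ! X.sub(9,0) is legal, since 9 + 0 <= X.length + 1;
            if Y.sub(2,2) == X.sub(4,2) and X.sub(9,0) == notext
            then outtext("ok");
            outimage
         end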


 STRIP     text procedure strip; ... ;

           The expression "X.strip" is equivalent to "X.sub(1,n)", where n
           indicates the position of the last non-blank character in X. If X
           does not contain any non-blank character, notext is returned.

           Let X and Y be text variables. Then after the value assignment
           "X:=Y", if legal, the relation "X.strip = Y.strip" has the value
           true, while "X = Y" is true only if X.length = Y.length.
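
A sketch of the last relation, with illustrative names. The frame of X is longer than that of Y, so the value assignment pads X with blanks:

         begin
            text X, Y;
            Y :- copy("ab");
            X :- blanks(6);
            X := Y;                 ! X = "ab    ";
            ! X.strip references the first two characters of X;
            if X.strip = Y.strip and X.strip.length = 2 and X <> Y
            then outtext("equal when stripped");
            outimage
         end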

Numeric text values

The names of the syntactic units in this section are in upper case to indicate that these rules concern syntax for data and not for program text.

The syntax applies to sequences of characters, i.e. to text values.

     NUMERIC-ITEM     =  REAL-ITEM  |  GROUPED-ITEM

     REAL-ITEM        =  DECIMAL-ITEM  [ EXPONENT ]
                      |  SIGN-PART  EXPONENT

     GROUPED-ITEM     =  SIGN-PART  GROUPS  [ DECIMAL-MARK  GROUPS ]
                      |  SIGN-PART  DECIMAL-MARK  GROUPS

     DECIMAL-ITEM     =  INTEGER-ITEM  [ FRACTION ]
                      |  SIGN-PART  FRACTION

     INTEGER-ITEM     =  SIGN-PART  DIGITS

     FRACTION         =  DECIMAL-MARK  DIGITS

     SIGN-PART        =  BLANKS  [ SIGN ]  BLANKS

     EXPONENT         =  LOWTEN-CHARACTER  INTEGER-ITEM

     GROUPS           =  DIGITS  { BLANK  DIGITS }

     SIGN             =  +  |  -

     DIGITS           =  DIGIT  { DIGIT }

     DIGIT            =  0  |  1  |  2  |  3  |  4
                      |  5  |  6  |  7  |  8  |  9

     LOWTEN-CHARACTER =  &  |  ...

     DECIMAL-MARK     =  .  |  ,

     BLANKS           =  {  BLANK  |  TAB  }

BLANK and TAB are the characters space and horizontal tabulation respectively.

The default representations of LOWTEN CHARACTER and DECIMAL MARK are & and ., respectively. These values may, however, be changed by appropriate procedure calls, see 9.2.

A numeric item is a character sequence which may be derived from NUMERIC ITEM. "Editing" and "de-editing" procedures are available for conversion between arithmetic values and text values which are numeric items.

The editing and de-editing procedures are oriented towards "fixed field" text manipulation.

Note: Both the editing and the de-editing procedures are understood to operate on text values represented in the internal character set.

"De-editing" procedures

A de-editing procedure of a given text variable X operates in the following way:

The longest numeric item, if any, of a given form, which is contained by X and which contains the first character of X is located. (Note that leading blanks and tabs are accepted as part of any numeric item.)
If no such numeric item is found, a run-time error occurs.
Otherwise, the numeric item is interpreted as a number.
If that number is outside a relevant implementation-defined range, a run-time error occurs.
Otherwise, an arithmetic value is computed, which is equal to or approximates to that number.
The position indicator of X is made one greater than the position of the last character of the numeric item. Note that this increment is lost immediately if X does not correspond to a variable (see 3.7).
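
A sketch of these steps for "getint", which is defined below. T and i are arbitrary names, and "outint" is assumed to be the usual output procedure of the standard environment.

         begin
            text T; integer i;
            T :- copy("  42 apples");
            i := T.getint;       ! the item located is "  42", so i = 42;
            ! T.pos is now 5, one greater than the position of "2";
            outint(i, 4);  outint(T.pos, 4);
            outimage
         end

Because the numeric item must contain the first character of X, the search does not depend on the position indicator. A field further along in a text would therefore be de-edited through a subtext obtained with "sub".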

The following de-editing procedures are available.

 GETINT    integer procedure getint; ... ;

           The procedure locates an INTEGER ITEM.  The function value is
           equal to the corresponding integer.


 GETREAL   long real procedure getreal; ... ;

           The procedure locates a REAL ITEM.  The function value is equal
           to or approximates to the corresponding number. An INTEGER ITEM
           exceeding a certain implementation-defined range may lose
           precision when converted to long real.

     Note: No distinction is made between real and long real items.
           In order to preserve precision the procedure assumes long real
           precision.


 GETFRAC   integer procedure getfrac; ... ;

           The procedure locates a GROUPED ITEM.  The function value is
           equal to the resulting integer. The digits of a GROUPED ITEM
           may be interspersed with BLANKS and a single DECIMAL MARK which
           are ignored by the procedure.

Note: "getfrac" is thus able to de-edit more general patterns than those generated by "putfrac".
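
The following sketch applies "getreal" and "getfrac" to small numeric items. It assumes the default representations & and . of LOWTEN CHARACTER and DECIMAL MARK. The name T is arbitrary, and "outfix" and "outint" are assumed to be the usual output procedures of the standard environment.

         begin
            text T;
            T :- copy("3.5&2");
            ! REAL ITEM, designating 350 or an approximation to it;
            outfix(T.getreal, 1, 8);
            T :- copy("1 234.50");
            ! GROUPED ITEM, blank and decimal mark are ignored, value 123450;
            outint(T.getfrac, 8);
            outimage
         end

Applied to the second text, "getint" would locate only the INTEGER ITEM "1".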

Editing procedures

Editing procedures of a given text variable X serve to convert arithmetic values to numeric items. After an editing operation, the numeric item obtained, if any, is right-adjusted in the text frame referenced by X and preceded by as many blanks as necessary to fill the text frame. The final value of the position indicator of X is X.length+1. Note that this increment is lost immediately if X does not correspond to a variable (see 3.7).

A positive number is edited without a sign. A negative number is edited with a minus sign immediately preceding the most significant character. Leading non-significant zeros are suppressed, except possibly in an EXPONENT.

If X references a constant text frame or notext, an error results. Otherwise, if the text frame is too short to contain the resulting numeric item, the text frame into which the number was to be edited is filled with asterisks. If the parameters to "putfix" and "putreal" are such that some of the printed digits will be without significance, zeros are substituted for these digits (and no error condition is raised).

In "putfix" and "putreal", the numeric item designates that number of the specified form which differs by the smallest possible amount from the value of "r" or from the approximation to the value of "r".

 PUTINT    procedure putint(i); integer i; ... ;

           The value of the parameter is converted to an INTEGER ITEM which
           designates an integer equal to that value.

 PUTFIX    procedure putfix(r,n); <real-type> r; integer n; ... ;

           The resulting numeric item is an INTEGER ITEM if n=0 or a DECIMAL
           ITEM with a FRACTION of n digits if n>0. It designates a number
           equal to the value of r or an approximation to the value of r,
           correctly rounded to n decimal places. If n<0, a run-time error
           is caused.

 PUTREAL   procedure putreal(r,n); <real-type> r; integer n; ... ;

           The resulting numeric item is a REAL ITEM containing an EXPONENT
           with a fixed implementation-defined number of characters. The
           EXPONENT is preceded by a SIGN PART if n=0, or by an INTEGER ITEM
           with one digit if n=1, or if n>1, by a DECIMAL ITEM with an
           INTEGER ITEM of 1 digit only, and a fraction of n-1 digits. If
           n<0, a run-time error is caused.

 PUTFRAC   procedure putfrac(i,n); integer i,n; ... ;

           The resulting numeric item is a GROUPED ITEM with no DECIMAL MARK
           if n<=0, and with a DECIMAL MARK followed by a total of n digits if
           n>0. Each digit group consists of 3 digits, except possibly the
           first one, and possibly the last one following a DECIMAL MARK. The
           numeric item is an exact representation of the number i * 10**(-n).
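
A sketch of the editing procedures applied to a field of ten characters. The name F is arbitrary. "putreal" is left out because the length of its EXPONENT is implementation-defined.

         begin
            text F;
            F :- blanks(10);
            F.putint(-42);              ! F = "       -42";
            F.putfix(3.14159, 2);       ! F = "      3.14";
            F.putfrac(1234567, 2);      ! F = " 12 345.67";
            F.sub(1,3).putint(12345);   ! subframe too short, F = "*** 345.67";
            outtext(F);
            outimage
         end

Each call overwrites the whole frame to which it is applied and leaves the position indicator of that text at length+1. In the last call only the three-character subframe is filled with asterisks, and the rest of F keeps its previous contents.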

Examples

               procedure compact(T); text T;
               begin text U; character c;
                     T.setpos(1);  U:- T;
                     while U.more do begin
                        c:=U.getchar; if c <> ' ' then T.putchar(c)
                     end;
                     while T.more do T.putchar(' ')
               end compact;

The procedure rearranges the characters of the text frame referenced by its parameter. The non-blank characters are collected in the leftmost part of the text frame and the remainder, if any, is filled with blank characters. Since the parameter is called by reference, its position indicator is not altered.

            begin
               text tr, type, amount, price, payment;
               integer pay, total;
               tr      :- blanks(80);
               type    :- tr.sub(1,5);
               amount  :- tr.sub(20,5);
               price   :- tr.sub(30,6);
               payment :- tr.sub(40,10);
               ... ; ! ***;
               if type = "order" then begin
                  pay   := amount.getint * price.getfrac;
                  total := total + pay;
                  payment.putfrac(pay,2)
               end
            end

If tr at *** holds the text

               "order                1200    155.75               ..."

it will after editing contain

               "order                1200    155.75     18 690.00 ...".
