5.1.2 Description

An assembly source program consists of statements.

A statement is written in one line, using the characters listed in "(1) Character set". An assembly language statement consists of a "symbol", a "mnemonic", "operands", and a "comment".

[symbol][:]       [mnemonic]     [operand], [operand]    ;[comment]

 

These fields are delimited by a space, a tab, a colon (:), or a semicolon (;). The maximum number of characters in one line is theoretically 4294967294 (= 0xFFFFFFFE), but the memory size limits the actual maximum number of characters.

Statements can be written in a free format; as long as the symbol, mnemonic, operands, and comment appear in the correct order, they can start in any column. Note that a statement must be written within one line; it cannot continue onto the next line.

 

To write a symbol in the symbol field, a colon, one or more spaces, or a tab should be appended to delimit the symbol from the rest of the statement. Whether colons or spaces or tabs are used, however, depends on the instruction coded by the mnemonic. Before and after a colon, any number of spaces or tabs can be inserted.

 

When operands are necessary, they should be separated from the rest of the statement by one or more spaces or tabs.

 

To write a comment in the comment field, it should be delimited from the rest of the statement by a semicolon. Before and after a semicolon, any number of spaces or tabs can be inserted.

 

One assembly language statement is described on one line. There is a line feed (return) at the end of the statement.
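The field layout described above can be illustrated with a small Python sketch. It is only a model of the rules in this section, not the assembler's actual parser: the function name `split_statement` is hypothetical, it recognizes a symbol only when a colon follows it (so names delimited by spaces are not detected), and it assumes that no semicolon or comma appears inside quoted character data.

```python
import re

def split_statement(line: str) -> dict:
    """Split one statement into symbol, mnemonic, operands and comment.

    Simplified model: labels only (symbol followed by ':'), and no
    ';' or ',' inside quoted character data.
    """
    # Everything after the first semicolon is the comment field.
    code, _, comment = line.rstrip("\r\n").partition(";")

    # A label is a symbol at the start of the line followed by a colon;
    # spaces or tabs may appear before the colon.
    symbol = None
    m = re.match(r"\s*([A-Za-z_@.][A-Za-z0-9_@.]*)\s*:", code)
    if m:
        symbol = m.group(1)
        code = code[m.end():]

    # The mnemonic is separated from the operands by spaces or tabs,
    # and operands are separated from each other by commas.
    parts = code.split(None, 1)
    mnemonic = parts[0] if parts else None
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []

    return {
        "symbol": symbol,
        "mnemonic": mnemonic,
        "operands": operands,
        "comment": comment.strip() if comment else None,
    }
```

For example, `split_statement("HERE:   MOV     A, #0x0F        ;This is a comment")` yields the symbol `HERE`, the mnemonic `MOV`, the operands `A` and `#0x0F`, and the comment text.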

 

(1) Character set

The characters that can be used in a source program (assembly language) supported by the assembler are the following 3 types of characters.

(a) Language characters

These characters are used to code instructions in the source.

The language characters are further classified by their functions as follows.

Table 5.1 Language Characters and Character Set

General Name            Subclass                          Character
Numerals                                                  0 1 2 3 4 5 6 7 8 9
Alphabetic characters   Uppercase letter                  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
                        Lowercase letter                  a b c d e f g h i j k l m n o p q r s t u v w x y z
                        Characters similar to alphabet    @ _ (underscore) . (period)
Special characters      Special character type 1          . , : ; * / + - < > ( ) $ = ! & # [ ] " % << >> | ^ ? ~
                        Special character type 2          \
                        Special character type 3          LF, CR LF, HT

- The alphabetic characters (including the characters similar to alphabet) and numerals are collectively called alphanumeric characters.
- Lowercase alphabetic characters in reserved words and in numeric constants are interpreted as the corresponding uppercase characters.
- When lowercase alphabetic characters are used in a user-defined symbol, uppercase and lowercase characters are distinguished.

 

The following shows the usage of special characters of type 1.

If a special character of this type appears outside constant data or comment fields in a source program for a purpose other than those listed below, an error will occur.

Table 5.2 Special Characters Type 1 and Usage of Characters

Character               Usage
. (period)              Bit position specifier
                        Symbol for beginning a directive
, (comma)               Delimits an operand
: (colon)               Delimits a label
                        Extended address specification ("ES:")
; (semicolon)           Beginning of comment
*                       Multiplication operator
/                       Division operator
+                       Positive sign
                        Addition operator
- (hyphen)              Negative sign
                        Subtraction operator
' (single quotation)    Symbol for beginning or ending a character constant
<                       Relational operator
                        Shift operator
>                       Relational operator
                        Shift operator
( )                     Specifies an operation sequence
$                       Symbol for beginning a control instruction
                        Symbol specifying relative addressing
=                       Relational operator
!                       Relational operator
                        Symbol for beginning absolute addressing
&                       Bit logic operator
                        Logical operator
#                       Symbol for beginning immediate addressing
                        Beginning of comment (when used at the beginning of a line)
[ ]                     Indirect indication symbol
" (double quotation)    Start and end of character string constant
%                       Remainder operator
|                       Bit logic operator
                        Logical operator
^                       Bit logic operator
?                       Concatenation symbol (in macro body)
~                       Bit logic operator
(blank or tab)          Field delimiter

 

The following shows the usage of special characters of type 2.

Table 5.3 Special Characters Type 2 and Usage of Characters

Escape Sequence   Value (ASCII)   Meaning
\0                0x00            Null character
\a                0x07            Alert (Warning tone)
\b                0x08            Backspace
\f                0x0C            Form feed (New Page)
\n                0x0A            New line (Line feed)
\r                0x0D            Carriage return (Restore)
\t                0x09            Horizontal tab
\v                0x0B            Vertical tab
\\                0x5C            Backslash
\'                0x27            Single quotation
\"                0x22            Double quotation
\?                0x3F            Question mark
\ooo              0 - 0377        Octal number (0 to 255 in decimal) having up to three digits (o indicates an octal digit)
\xhh              0x00 - 0xFF     Hexadecimal number (0 to 255 in decimal) having up to two digits (h indicates a hexadecimal digit)
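As a rough illustration of Table 5.3, the following Python sketch decodes the listed escape sequences into byte values. The function name `decode_escapes` is hypothetical, the input is assumed to be plain ASCII text, and the error handling (raising `ValueError` for unknown or out-of-range sequences) is an assumption of this sketch rather than documented assembler behavior.

```python
# Single-character escapes from Table 5.3 (\0 is handled by the octal branch).
SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D,
    "t": 0x09, "v": 0x0B, "\\": 0x5C, "'": 0x27, '"': 0x22, "?": 0x3F,
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_escapes(text: str) -> bytes:
    """Convert ASCII text containing escape sequences into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("ascii")
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(text):
            raise ValueError("dangling backslash")
        nxt = text[i]
        if nxt in OCTAL_DIGITS:
            # \ooo: up to three octal digits, value 0 to 0377
            j = i
            while j < len(text) and j < i + 3 and text[j] in OCTAL_DIGITS:
                j += 1
            value = int(text[i:j], 8)
            if value > 0o377:
                raise ValueError("octal escape out of range")
            out.append(value)
            i = j
        elif nxt == "x":
            # \xhh: up to two hexadecimal digits
            j = i + 1
            while j < len(text) and j < i + 3 and text[j] in HEX_DIGITS:
                j += 1
            if j == i + 1:
                raise ValueError("\\x needs at least one hexadecimal digit")
            out.append(int(text[i + 1:j], 16))
            i = j
        elif nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        else:
            raise ValueError("unknown escape sequence")
    return bytes(out)
```

With this sketch, `\x41` and `\101` both decode to the byte 0x41, which matches the ranges given in the last two rows of the table.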

 

The following shows the usage of special characters of type 3.

<1> CR LF, LF

These characters delimit lines.

Special Character Type 3    Value Output to List
CR LF                       0x0D0A
LF                          0x0A

<2> HT

This character moves the column position in a source program. It is output to a list as is.

(b) Character data

Character data refers to characters used to write character constants, character string constants, and the quote-enclosed operands of some control instructions.

Caution

- Character data can use all characters (including multibyte characters, although the encoding depends on the OS).
- Uppercase and lowercase characters are distinguished.
- The following shows the handling of HT, CR LF, and LF.

Character    Object Output    Value Output to List
HT           0x09             0x09 (a tab is expanded as is)
CR LF        0x0D0A           0x0D0A (see Note)
LF           0x0A             0x0A (see Note)

Note

These characters only delimit lines and they are not regarded as part of the character data.

(c) Comment characters

Comment characters are used to write comments.

Caution

Comment characters and character data have the same character set.

(2) Constants

A constant is a fixed value or data item and is also referred to as immediate data.

There are three types of constant as shown below.

(a) Numeric constants

Integer constants can be written in binary, octal, decimal, or hexadecimal notation.
Integer constants have a width of 32 bits. A negative value is expressed as a 2's complement. If an integer value that exceeds the range of values that can be expressed in 32 bits is specified, the assembler uses the lower 32 bits of that value and continues processing without outputting a message.
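The 32-bit truncation and 2's complement rule can be sketched in Python as follows. The helper names `to_u32` and `to_s32` are hypothetical; the sketch only models the arithmetic stated above.

```python
def to_u32(value: int) -> int:
    """Keep only the lower 32 bits, as the assembler does for oversized constants."""
    return value & 0xFFFFFFFF

def to_s32(value: int) -> int:
    """Interpret the lower 32 bits as a signed 2's complement value."""
    low = value & 0xFFFFFFFF
    return low - 0x100000000 if low & 0x80000000 else low
```

For example, the 33-bit value 0x100000001 is reduced to 1, and -1 is represented by the bit pattern 0xFFFFFFFF.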

Type          Notation                                        Example
Binary        Prepend "0b" or "0B" to the number.             0b1101, 0B1101
              Append "b" or "B" at the end of the number.     1101b, 1101B
Octal         Prepend "0" to the number.                      074
              Append "o" or "O" at the end of the number.     074o, 074O
Decimal       Simply write the number.                        128
Hexadecimal   Prepend "0x" or "0X" to the number.             0xA6, 0XA6
              Append "h" or "H" at the end of the number.     6Ah, 6AH

 

The beginning of a numeric constant should be a numeral.

For example, when 10 in decimal is written in hexadecimal with "H" appended at its end, append "0" at the beginning and write "0AH". If it is written as "AH", it is regarded as a symbol.

Caution

Prefix notation (like 0xn...n) and suffix notation (n...nh) cannot be used together within one source program.
Specify the notation through the -base_number = (prefix | suffix) option.
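The notation rules above can be modeled with a short Python sketch. The function `parse_numeric` and its `notation` argument are hypothetical stand-ins for the assembler's internal behavior and the notation option; the sketch assumes one notation is selected for the whole source program, as the caution requires.

```python
def parse_numeric(text: str, notation: str = "prefix") -> int:
    """Evaluate a numeric constant in prefix or suffix notation (model only)."""
    # A numeric constant must begin with a numeral; otherwise it is a symbol.
    if not text or text[0] not in "0123456789":
        raise ValueError("not a numeric constant (would be treated as a symbol)")

    t = text.upper()  # lowercase letters in numeric constants are read as uppercase
    if notation == "prefix":
        if t.startswith("0B"):
            value = int(t[2:], 2)
        elif t.startswith("0X"):
            value = int(t[2:], 16)
        elif len(t) > 1 and t.startswith("0"):
            value = int(t[1:], 8)
        else:
            value = int(t, 10)
    elif notation == "suffix":
        # Check H first: hexadecimal digits may themselves end in B.
        if t.endswith("H"):
            value = int(t[:-1], 16)
        elif t.endswith("O"):
            value = int(t[:-1], 8)
        elif t.endswith("B"):
            value = int(t[:-1], 2)
        else:
            value = int(t, 10)
    else:
        raise ValueError("notation must be 'prefix' or 'suffix'")

    # Integer constants are 32 bits wide; excess upper bits are dropped.
    return value & 0xFFFFFFFF
```

In suffix mode, `parse_numeric("0AH", "suffix")` evaluates to 10, while `"AH"` is rejected because it does not begin with a numeral and would be regarded as a symbol.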

(b) Character constants

A character constant consists of a single character enclosed by a pair of single quotation marks (' ') and indicates the value of the enclosed character.

The number of characters should be 1.

This is a 32-bit value holding the right-justified code for the specified character. When the upper bytes are empty, they are filled with 0.

Example

Character Constant    Evaluated Value
'A'                   0x00000041
' ' (1 blank)         0x00000020
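The evaluation of a character constant can be sketched in Python as below. The function name `char_constant_value` is hypothetical, and the sketch assumes a single ASCII character with no escape sequence between the quotation marks.

```python
def char_constant_value(text: str) -> int:
    """Return the 32-bit value of a character constant such as 'A' (ASCII only)."""
    if len(text) != 3 or text[0] != "'" or text[-1] != "'":
        raise ValueError("expected exactly one character between single quotation marks")
    # The character code is right-justified; the upper bytes are filled with 0.
    return ord(text[1]) & 0xFFFFFFFF
```

Formatting the result with `format(value, "#010x")` gives the zero-filled 32-bit form used in the table, for example `0x00000041` for `'A'`.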

(c) Character string constants

A character string constant is a sequence of characters shown in "(1) Character set" enclosed by a pair of double quotation marks (" ") and indicates the characters themselves.

 

To include the double quote character in the string, write it twice in succession.

Example

Character String Constant    Evaluated Value
"ab"                         0x6162
"A"                          0x41
" " (1 blank)                0x20
""                           None
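The evaluation of a character string constant, including the doubled double quote rule, can be sketched in Python as follows. The function name `string_constant_bytes` is hypothetical; the sketch assumes ASCII text and does not process backslash escape sequences.

```python
def string_constant_bytes(text: str) -> bytes:
    """Return the bytes denoted by a character string constant (ASCII only)."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("expected text enclosed in double quotation marks")
    # A double quote inside the string is written twice in succession.
    body = text[1:-1].replace('""', '"')
    return body.encode("ascii")
```

For `"ab"` this returns the two bytes 0x61 and 0x62, and for the empty string `""` it returns no bytes, matching the "None" entry in the table.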

(3) Symbol

A reference to a symbol is handled as a specification of the value defined for the symbol.

The symbols allowed in this assembler are classified into the following types.

- Name
  A symbol specified in the symbol field of a symbol definition directive. This type of symbol has a value.
  The range of a value is -2147483648 to 2147483647 (0x80000000 to 0x7FFFFFFF).

- Label
  A symbol written between the beginning of a line and a colon (:). This type of symbol has an address value.
  The range of an address value is 0 to 1048575 (0x00000 to 0xFFFFF).

- External reference name
  A symbol specified in the operand field of an external reference name definition directive to refer to the symbol defined in a module from another module. The address value for this symbol is set to 0 at assembly and it is determined at linkage.
  A symbol that is not defined in the module where the symbol is referenced is also regarded as an external reference name.

- Section name
  A symbol specified in the symbol field of a section definition directive.
  This symbol does not have a value.

- Macro name
  A symbol specified in the symbol field of a macro definition directive. It is used for reference to a macro.
  This symbol does not have a value.

- Macro formal parameter name
  A symbol specified in the operand field of a macro definition directive.
  This symbol does not have a value.

 

A symbol defined using a bit position specifier is called a bit symbol.

A reference to a symbol using a bit position specifier is called a bit reference to a symbol.

 

The symbol field is for symbols, which are names given to addresses and data objects. Symbols make programs easier to understand.

(a) Symbol types

Symbols that can be written in the symbol field are classified as shown below, depending on their purpose and how they are defined.

Symbol Type: Label
  Purpose: Use this type when referring to the address of the label location.
           Note that a label appended to a directive is regarded as included in the section immediately before the directive.
  Definition Method: Write a symbol followed by a colon ( : ).

Symbol Type: Name
  Purpose: Use this type when assigning numerical data or an address and referring to it as a symbol.
  Definition Method: Write in the symbol field of a symbol definition directive.
                     Delimit the symbol field and mnemonic field by one or more spaces or tabs.

Symbol Type: Section name
  Purpose: Use this type when referring to a symbol as input information for the optimizing linker.
  Definition Method: Write in the symbol field of a section definition directive.
                     Delimit the symbol field and mnemonic field by one or more spaces or tabs.

Symbol Type: Macro name
  Purpose: Use this type to name macros in source programs.
  Definition Method: Write in the symbol field of a macro definition directive.
                     Delimit the symbol field and mnemonic field by one or more spaces or tabs.

 

Multiple symbols cannot be written in a symbol field. In addition, only one symbol of any one of the above types can be defined in a line.

(b) Conventions of symbol description

Observe the following conventions when writing symbols.

- The characters which can be used in symbols are the alphanumeric characters and special characters (@, _, .).
  The first character in a symbol cannot be a digit (0 to 9).

- The maximum number of characters for a symbol is 4,294,967,294 (=0xFFFFFFFE) (theoretical value). The actual number that can be used depends on the amount of memory, however.

- Reserved words cannot be used as symbols.
  See "5.6 Reserved Words" for a list of reserved words.

- The same symbol cannot be defined more than once.
  However, a symbol defined with the .SET directive can be redefined with the .SET directive.

- When a label is written in a symbol field, the colon ( : ) must appear immediately after the label name.
  When using another type of symbol, insert a space or a tab to delimit the symbol from the mnemonic field.

 

Example

Correct symbols

CODE01  .CSEG               ; "CODE01" is a section name.
VAR01   .EQU    0x10        ; "VAR01" is a name.
LAB01:  .DB2    0           ; "LAB01" is a label.

 

Example

Incorrect symbols

1ABC    .EQU    0x3         ; The first character is a digit.
LAB     MOV     1, r10      ; "LAB" is a label and must be separated from the
                            ; mnemonic field by a colon ( : ).
FLAG:   .EQU    0x10        ; The colon ( : ) is not needed for names.

 

Example

A statement composed of a symbol only

ABCD:                       ; ABCD is defined as a label.
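The character rules for symbols can be checked with a short Python sketch. The function `is_valid_symbol` and the `reserved` argument are hypothetical; the actual reserved word list is the one in "5.6 Reserved Words" and is not reproduced here. The sketch does not check for duplicate definitions or assembler-generated names.

```python
import re

# Alphanumeric characters plus @, _ and period; the first character is not a digit.
SYMBOL_PATTERN = re.compile(r"[A-Za-z@_.][A-Za-z0-9@_.]*\Z")

def is_valid_symbol(name: str, reserved=frozenset()) -> bool:
    """Check a user-defined symbol against the character conventions (model only)."""
    if not SYMBOL_PATTERN.match(name):
        return False
    # Reserved words are matched without regard to case.
    return name.upper() not in {word.upper() for word in reserved}
```

With this sketch, `LAB01` is accepted and `1ABC` is rejected because its first character is a digit.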

(c) Points to note about symbols

Do not use an assembler-generated symbol (see "5.7 Assembler Generated Symbols") as a symbol name, because doing so may cause a multiple-definition error.

In addition, if a section name is not specified in a section definition directive, note that the assembler automatically generates a section name.

(4) Mnemonic field

Write instruction mnemonics, directives, and macro references in the mnemonic field.

If the instruction, directive, or macro reference requires one or more operands, the mnemonic field must be separated from the operand field with one or more blanks or tabs.

However, if the first operand begins with "#", "$", "!", "[", or "(", the statement will be assembled properly even if nothing exists between the mnemonic field and the first operand field.

Example

Correct mnemonics

MOV     A, #1

 

Example

Incorrect mnemonics

MOVA, #1      ; There is no blank between the mnemonic and operand fields.
MO V  A, #1   ; The mnemonic field contains a blank.
MOVE  A, #1   ; "MOVE" is not an instruction that can be written in the mnemonic field.
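The delimiter rule between the mnemonic field and the operand field can be sketched in Python as follows. The function `split_mnemonic` is hypothetical and checks only the delimiter rule; it does not verify that the mnemonic is a valid instruction, directive, or macro name, and the operand `$LOOP` in the usage note is an illustrative placeholder.

```python
import re

# Characters that may start the first operand directly after the mnemonic.
OPERAND_STARTERS = "#$![("

def split_mnemonic(code: str):
    """Split a statement body (no symbol, no comment) into mnemonic and operand text."""
    m = re.match(r"\s*(\.?[A-Za-z][A-Za-z0-9]*)", code)
    if m is None:
        raise ValueError("no mnemonic found")
    mnemonic, rest = m.group(1), code[m.end():]
    if rest.strip() == "":
        return mnemonic, ""  # no operands
    # Either blanks or tabs follow, or the operand starts with an allowed character.
    if rest[0] in " \t" or rest[0] in OPERAND_STARTERS:
        return mnemonic, rest.strip()
    raise ValueError("mnemonic and operand fields are not delimited")
```

For example, `split_mnemonic("BR$LOOP")` separates `BR` from `$LOOP` even without a blank, while `MOVA, #1` is rejected because a comma follows the first field.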

(5) Operand field

In the operand field, write operands (data) for the instructions, directives, or macro references that require them.

Some instructions and directives require no operands, while others require two or more.

When you provide two or more operands, delimit them with a comma ( , ). Before and after a comma, any number of spaces or tabs can be inserted.

(6) Comment

Write a comment after a number sign (#) at the beginning of a line or after a semicolon (;) in the middle of a line.

The comment field continues from the # or semicolon to the new line code at the end of the line, or to the EOF code of the file.

Comments make it easier to understand and maintain programs.

Comments are not processed by the assembler, and are output verbatim to assembly lists.

Characters that can be described in the comment field are those shown in "(1) Character set".

Example

# This is a comment
HERE:   MOV     A, #0x0F        ;This is a comment
;
;       BEGIN LOOP HERE
;
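The two comment forms can be separated from the rest of a line with a Python sketch like the one below. The function `split_comment` is hypothetical. It assumes that "#" starts a comment only in the first column, and that a semicolon inside quoted character data is data rather than a comment delimiter; escaped quotation marks inside quoted data are not handled.

```python
def split_comment(line: str):
    """Return (code, comment); comment is None when the line has no comment."""
    text = line.rstrip("\r\n")  # the comment ends at the new line code

    # A number sign at the beginning of a line makes the whole line a comment.
    if text.startswith("#"):
        return "", text[1:]

    quote = None  # currently open quotation mark, if any
    for i, ch in enumerate(text):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == ";":
            return text[:i], text[i + 1:]
    return text, None
```

For the example line `HERE:   MOV     A, #0x0F        ;This is a comment`, the "#" in the operand is not treated as a comment start because it is not at the beginning of the line.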