An assembly source program consists of statements.
A statement is written in one line, using the characters listed in "(1) Character set". An assembly language statement consists of a "symbol", a "mnemonic", "operands", and a "comment".
These fields are delimited by a space, a tab, a colon (:), or a semicolon (;). The maximum number of characters in one line is theoretically 4294967294 (= 0xFFFFFFFE), but the memory size limits the actual maximum number of characters.
Statements can be written in a free format; as long as the order of the symbol, mnemonic, operands, and comment is correct, they can start in any column. Note that each statement must be written within a single line.
To write a symbol in the symbol field, a colon, one or more spaces, or a tab should be appended to delimit the symbol from the rest of the statement. Whether colons or spaces or tabs are used, however, depends on the instruction coded by the mnemonic. Before and after a colon, any number of spaces or tabs can be inserted.
When operands are necessary, they should be separated from the rest of the statement by one or more spaces or tabs.
To write a comment in the comment field, it should be delimited from the rest of the statement by a semicolon. Before and after a semicolon, any number of spaces or tabs can be inserted.
One assembly language statement is described on one line. There is a line feed (return) at the end of the statement.
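The field rules above can be modeled with a short Python sketch. This is only an illustration, not the assembler's actual parser: the function name `split_statement` is hypothetical, only the colon-delimited symbol form is handled, and colons or semicolons inside quoted character data are not treated specially.

```python
def split_statement(line):
    """Split one source line into (symbol, mnemonic, operands, comment).

    Simplified model: a symbol is recognized only when it is followed by
    a colon, and quoted character data is not taken into account.
    """
    # Everything after the first semicolon is the comment field.
    body, sep, comment = line.partition(";")
    comment = comment.strip() if sep else None

    # A name followed by a colon is taken as the symbol field.
    symbol = None
    head, colon, rest = body.partition(":")
    if colon:
        symbol, body = head.strip(), rest

    # The mnemonic is the first blank-delimited word; operands follow,
    # separated from each other by commas.
    parts = body.split(None, 1)
    mnemonic = parts[0] if parts else None
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return symbol, mnemonic, operands, comment
```

For example, `split_statement("LAB01: .DB2 0 ; a label")` yields the symbol `LAB01`, the mnemonic `.DB2`, the single operand `0`, and the comment text.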
The characters that can be used in a source program (assembly language) supported by the assembler are the following 3 types of characters.
These characters are used to code instructions in the source.
The language characters are further classified by their functions as follows.
The alphabetic characters (including the characters similar to alphabet) and numerals are collectively called alphanumeric characters.
Lowercase alphabetic characters in reserved words and in numeric constants are interpreted as the corresponding uppercase characters.
In user-defined symbols, uppercase and lowercase alphabetic characters are distinguished.
The following shows the usage of special characters of type 1.
If a special character of this type appears outside constant data or comment fields in a source program for a purpose other than those listed below, an error will occur.
The following shows the usage of special characters of type 2.
Octal number (0 to 255 in decimal) having up to three digits (o indicates an octal digit)
Hexadecimal number (0 to 255 in decimal) having up to two digits (h indicates a hexadecimal digit)
The following shows the usage of special characters of type 3.
These characters delimit lines.
This character moves the column position in a source program. It is output to a list as is.
Character data refers to characters used to write character constant, character string constant, and the quote-enclosed operands of some control instructions.
Character data can use all characters (including multibyte characters, although the encoding depends on the OS).
Comment characters are used to write comments.
A constant is a fixed value or data item and is also referred to as immediate data.
There are three types of constant as shown below.
Integer constants can be written in binary, octal, decimal, or hexadecimal notation.
Integer constants have a width of 32 bits. A negative value is expressed as a 2's complement. If an integer value that exceeds the range expressible in 32 bits is specified, the assembler uses the lower 32 bits of that value and continues processing without outputting a message.
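The 32-bit handling described above can be reproduced with a small Python sketch. The function name `to_int32` is hypothetical; the code only models the stated rule (keep the lower 32 bits, read them as a 2's complement value) and is not taken from the assembler itself.

```python
def to_int32(value):
    """Keep the lower 32 bits of value and read them as a 2's complement number."""
    low = value & 0xFFFFFFFF            # discard everything above bit 31
    if low & 0x80000000:                # bit 31 set: the value is negative
        return low - 0x100000000
    return low
```

Under this model, a constant such as 0x100000001 is silently reduced to 1, and 0xFFFFFFFF denotes the same bit pattern as -1.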
The beginning of a numeric constant should be a numeral.
For example, when 10 in decimal is written in hexadecimal with "H" appended at its end, append "0" at the beginning and write "0AH". If it is written as "AH", it is regarded as a symbol.
Prefix notation (like 0xn...n) and suffix notation (n...nh) cannot be used together within one source program.
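The leading-numeral rule can be illustrated with the following Python sketch. The function name `parse_token` is hypothetical, and the sketch covers only decimal values, the 0x prefix, and the H suffix; binary and octal notations are left out. It accepts both prefix and suffix forms purely for demonstration, whereas a real source program may use only one of the two notations.

```python
def parse_token(token):
    """Return ('constant', value) or ('symbol', name) for one token."""
    # A numeric constant must begin with a numeral; anything else is a symbol.
    if not token[0].isdigit():
        return ("symbol", token)
    # Lowercase letters in numeric constants are read as uppercase.
    text = token.upper()
    if text.startswith("0X"):           # prefix notation
        return ("constant", int(text[2:], 16))
    if text.endswith("H"):              # suffix notation
        return ("constant", int(text[:-1], 16))
    return ("constant", int(text, 10))  # plain decimal
```

With this model, "0AH" is read as the constant 10, while "AH" is classified as a symbol.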
A character constant consists of a single character enclosed by a pair of single quotation marks (' ') and indicates the value of the enclosed character.
The number of characters should be 1.
This is a 32-bit value holding the right-justified code for the specified character. When the upper bytes are empty, they are filled with 0.
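As a rough model of the character constant value, the Python sketch below places the character code in the low-order bytes of a 32-bit value and leaves the upper bytes at 0. The function name `char_constant_value` and the `encoding` parameter are assumptions made for illustration; for multibyte characters the actual value depends on the encoding used by the OS.

```python
def char_constant_value(ch, encoding="ascii"):
    """Return the 32-bit value of a one-character constant."""
    if len(ch) != 1:
        raise ValueError("a character constant holds exactly one character")
    # Right-justify the character code; unused upper bytes remain 0.
    code = int.from_bytes(ch.encode(encoding), "big")
    return code & 0xFFFFFFFF
```

For instance, `format(char_constant_value("A"), "08X")` gives `00000041`.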
A character string constant is a sequence of characters shown in "(1) Character set" enclosed by a pair of quotation marks (" ") and indicates the characters themselves.
To include the double quote character in the string, write it twice in succession.
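The quote-doubling rule can be sketched in Python as follows. The helper names `quote_string` and `unquote_string` are hypothetical and only model the rule stated above.

```python
def quote_string(text):
    """Write text as a character string constant, doubling embedded quotes."""
    return '"' + text.replace('"', '""') + '"'


def unquote_string(literal):
    """Recover the characters denoted by a character string constant."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("not enclosed by a pair of quotation marks")
    return literal[1:-1].replace('""', '"')
```

Under this sketch, the text `say "hi"` is written in source as `"say ""hi"""`.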
A reference to a symbol is handled as a specification of the value defined for the symbol.
The symbols allowed in this assembler are classified into the following types.
A symbol specified in the symbol field of a symbol definition directive. This type of symbol has a value.
The range of a value is -2147483648 to 2147483647 (0x80000000 to 0x7FFFFFFF).
A symbol written between the beginning of a line and a colon (:). This type of symbol has an address value.
The range of an address value is 0 to 1048575 (0x00000 to 0xFFFFF).
A symbol specified in the operand field of an external reference name definition directive to refer to the symbol defined in a module from another module. The address value for this symbol is set to 0 at assembly and it is determined at linkage.
A symbol that is not defined in the module where the symbol is referenced is also regarded as an external reference name.
A symbol specified in the symbol field of a section definition directive.
This symbol does not have a value.
A symbol specified in the symbol field of a macro definition directive. It is used for reference to a macro.
This symbol does not have a value.
A symbol specified in the operand field of a macro definition directive.
This symbol does not have a value.
A symbol defined using a bit position specifier is called a bit symbol.
A reference to a symbol using a bit position specifier is called a bit reference to a symbol.
The symbol field is for symbols, which are names given to addresses and data objects. Symbols make programs easier to understand.
Symbols that can be written in the symbol field are classified as shown below, depending on their purpose and how they are defined.
Multiple symbols cannot be written in a symbol field. In addition, only one symbol of any one of the above types can be defined in a line.
Observe the following conventions when writing symbols.
The characters which can be used in symbols are the alphanumeric characters and special characters (@, _, .).
The maximum number of characters for a symbol is 4,294,967,294 (=0xFFFFFFFE) (theoretical value). The actual number that can be used depends on the amount of memory, however.
Reserved words cannot be used as symbols.
The same symbol cannot be defined more than once.
When a label is written in a symbol field, the colon ( : ) must appear immediately after the label name.
CODE01 .CSEG        ; "CODE01" is a section name.
VAR01  .EQU  0x10   ; "VAR01" is a name.
LAB01: .DB2  0      ; "LAB01" is a label.
Do not use assembler-generated symbols (see "5.7 Assembler Generated Symbols") as symbol names, because doing so may cause a multiple-definition error.
In addition, if a section name is not specified in a section definition directive, note that the assembler automatically generates a section name.
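The symbol conventions above can be checked with a short Python sketch. The names `is_valid_symbol` and `define` are hypothetical, and the sketch makes assumptions: it accepts only ASCII letters as alphabetic characters, it rejects names that begin with a numeral (inferred from the rule that a numeric constant begins with a numeral), and the reserved word set is supplied by the caller rather than taken from the assembler.

```python
import re

# Alphanumeric characters plus @, _ and . ; first character is not a numeral (assumption).
_SYMBOL_RE = re.compile(r"[A-Za-z@_.][A-Za-z0-9@_.]*\Z")


def is_valid_symbol(name, reserved=frozenset()):
    """Check a user-defined symbol against the character and reserved-word rules."""
    if not _SYMBOL_RE.match(name):
        return False
    # Reserved words are matched without regard to case.
    return name.upper() not in reserved


def define(table, name, value):
    """Add a symbol to table, rejecting a second definition of the same name."""
    if name in table:                   # user-defined symbols are case-sensitive
        raise ValueError("symbol defined more than once: " + name)
    table[name] = value
```

Because user-defined symbols distinguish uppercase from lowercase, `LAB01` and `lab01` can both be defined in the same table, while defining `LAB01` twice raises an error.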
Write instruction mnemonics, directives, and macro references in the mnemonic field.
If an instruction, directive, or macro reference requires one or more operands, the mnemonic field must be separated from the operand field with one or more blanks or tabs.
However, if the first operand begins with "#", "$", "!", "[", or "(", the statement will be assembled properly even if nothing exists between the mnemonic field and the first operand field.
MOVA, #1     ; There is no blank between the mnemonic and operand fields.
MO V A, #1   ; The mnemonic field contains a blank.
MOVE A, #1   ; This is an instruction that cannot be coded in the mnemonic field.
In the operand field, write operands (data) for the instructions, directives, or macro references that require them.
Some instructions and directives require no operands, while others require two or more.
When you provide two or more operands, delimit them with a comma ( , ). Before and after a comma, any number of spaces or tabs can be inserted.
Write a comment after a number sign (#) at the beginning of a line or after a semicolon (;) in the middle of a line.
The comment field continues from the # or semicolon to the new line code at the end of the line, or to the EOF code of the file.
Comments make it easier to understand and maintain programs.
Comments are not processed by the assembler, and are output verbatim to assembly lists.
Characters that can be described in the comment field are those shown in "(1) Character set".
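Since "#" can also begin an operand, the difference between the two comment forms is worth illustrating. The Python sketch below is a simplified model with a hypothetical function name, `split_comment`; it assumes that "beginning of a line" means the very first column and it does not treat semicolons inside quoted character data specially.

```python
def split_comment(line):
    """Return (code, comment); comment is None when the line has no comment."""
    # A number sign starts a comment only at the beginning of the line.
    if line.startswith("#"):
        return "", line[1:].rstrip("\r\n")
    # Otherwise the comment runs from the first semicolon to the end of the line.
    code, sep, comment = line.partition(";")
    return code.rstrip(), (comment.rstrip("\r\n") if sep else None)
```

In this model, the "#" in `MOV A, #1 ; load` stays part of the operand, and only the text after the semicolon is treated as a comment.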