And if you're writing mutating viruses, you can apparently insert extra NOPs during the virus's replication phase -- this can fool some types of virus scanners that simply search for certain byte patterns.

Conditional jump limitations

Conditional jumps (all of the J instructions except JMP) have a serious limitation: they can only jump forward 127 bytes, or backward 128 bytes, starting with the byte immediately following the conditional jump instruction.
Many instructions assemble to three or four bytes in length (or more, or less, of course), so this doesn't allow much of a range. If you try to conditional-jump to a label that is more than 127 bytes ahead or 128 bytes back (the distance from the instruction after the conditional-jump instruction to the label is called the displacement), the assembler will vigorously complain. The usual workaround is to reverse the condition so that it skips over an unconditional JMP, and let that JMP, which has no such range limit, carry execution to the distant label (FarAwayLabel, say). Very clumsy indeed. Add extra comments to explain if the logic becomes less than clear.
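To make the range rule concrete, here is a small sketch in portable C (not assembler) that models the check the assembler performs. A conditional jump stores its displacement in a single signed byte, which is where the limits of 127 bytes forward and 128 bytes back come from. The function names and the idea of passing offsets as plain numbers are my own assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: both arguments are byte offsets within the same
   code segment. next_instr_offset is the offset of the byte immediately
   following the conditional jump; label_offset is the target label. */
long jump_displacement(long next_instr_offset, long label_offset)
{
    return label_offset - next_instr_offset;
}

/* A conditional jump holds its displacement in one signed byte,
   so only -128..+127 can be encoded. */
bool fits_short_jump(long displacement)
{
    return displacement >= INT8_MIN && displacement <= INT8_MAX;
}
```

If fits_short_jump() would return false for your label, that is the situation in which the assembler complains and you need to restructure the jump.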
In many cases you can get away with avoiding these "gymnastics" -- in the above TEST5.ASM program, we didn't have any problem. Personally, I don't even think about this problem unless the assembler complains, in which case the code can be modified reasonably quickly. The 386 and later processors actually permit conditional-jump displacements of 32767 bytes ahead and 32768 bytes back, but in order to use the extended instruction sets (386, 486, etc.), you have to tell the assembler to enable them. Let's ignore this for now.

Loops using jumps

You can construct loops by using jumps to transfer execution to previously-occurring labels.
You could put a compare-jump construction inside an infinite loop as a way of getting out of the infinite loop, but it would be more sensible just to use the conditional jump like this:

    LoopTop:
        ; Do something here
        ; Test for some condition:
        CMP BX, 50
        JNE LoopTop

(The label is called LoopTop rather than Loop because LOOP is itself an instruction name.) This loop would continue until BX contained 50 dec.

One very common type of loop involves using a counter, so that some portion of code is repeatedly executed some pre-determined number of times.
The LOOP instruction handles this kind of loop, using CX as the counter. Before entering the loop, you must load some value, such as 10, into CX. The LOOP instruction goes at the bottom of the loop, and it takes as an operand a label, just like a conditional jump instruction. Each time LOOP executes, it first subtracts one from CX. Then, if CX is zero, the loop ends; if CX is non-zero, execution jumps to the specified label. You are certainly permitted to read the value of CX inside the loop, and you are also allowed to modify it, to extend or shorten or end the loop. The sample program for LOOP also uses arrays: it adds up all of the values in an array.
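If it helps to see the counting behavior spelled out, here is a rough model of a LOOP-driven array sum written in portable C rather than assembler. The variables cx, ax and si only stand in for the registers, and the function name is made up for this sketch. The important detail is the order of operations: the body runs first, then the counter is decremented, then it is tested.

```c
#include <stdint.h>

/* Hypothetical model of a LOOP-based summing routine.
   values: array of 16-bit words, count: number of elements. */
uint16_t sum_words_with_loop(const uint16_t *values, uint16_t count)
{
    uint16_t cx = count;   /* MOV CX, count : load the counter first  */
    uint16_t ax = 0;       /* running total                            */
    uint16_t si = 0;       /* index into the array                     */

    /* Guard added for this C model only: on the real processor,
       entering a LOOP with CX = 0 wraps around and runs 65536 times. */
    if (cx == 0) {
        return 0;
    }

    do {
        ax = (uint16_t)(ax + values[si]);   /* loop body: add one element */
        si++;
        cx--;                               /* LOOP: decrement CX ...     */
    } while (cx != 0);                      /* ... jump back if non-zero  */

    return ax;
}
```

Because the test comes after the decrement, loading 5 into the counter gives exactly five passes through the body.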
It doesn't display anything, so if you want to see it in action, try using a debugger. I made sure that the sum of the five values in the array would not exceed the limit for a word, 65535 dec. If you do find yourself in a situation where you need to add lists of bigger numbers (doublewords, perhaps), look up the instruction ADC, meaning "add with carry". For subtracting bigger numbers, a related instruction is SBB, meaning "subtract with borrow".
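Here is a sketch, in portable C, of what an ADD followed by an ADC accomplishes when a doubleword is handled as two separate words. The struct and function names are invented for illustration; on the real processor the carry lives in the carry flag rather than in a variable.

```c
#include <stdint.h>

/* Hypothetical type: a 32-bit value held as two 16-bit words. */
typedef struct {
    uint16_t low;
    uint16_t high;
} dword_parts;

dword_parts add_with_carry(dword_parts a, dword_parts b)
{
    dword_parts r;

    /* ADD on the low words: the sum may not fit in 16 bits. */
    uint32_t low_sum = (uint32_t)a.low + b.low;
    uint16_t carry = (uint16_t)(low_sum >> 16);   /* 0 or 1, like the carry flag */

    r.low = (uint16_t)low_sum;

    /* ADC on the high words: add them plus the carry from the low words. */
    r.high = (uint16_t)(a.high + b.high + carry);
    return r;
}
```

For example, adding 40000 and 30000 overflows a word: the low word of the result is 4464 and the carry bumps the high word to 1, which together represent 70000.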
Declaring variables in the code segment

Now that we know how to use jump instructions, we can declare variables in the code segment.
You have to jump over the declared data. If you didn't, the processor would continue reading instructions from the code segment, including those declared bytes! The processor would try to execute the instructions meant by those bytes, which might cause some strange behavior! If you use variables in the code segment, remember to use the segment override "CS:" whenever you refer to them.

Of course, variables are not usually declared in the code segment. The data segment is a much better place for them.
But if the data segment gets full (there is a 64K limit under the small memory model), then you can use this alternative.

Pointers, of course, are just addresses of certain variables in memory. Near pointers can only point to variables stored in the data segment and stack segment, and so the segment is always constant. Because the segment is constant, only the offset needs to be stored or manipulated. Far pointers include both the segment and the offset, and so they can point to any data anywhere in the 1MB address space.
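As a reminder of how a far pointer's two halves turn into one physical address, here is a minimal sketch in portable C. It relies on the standard real-mode rule that the segment is multiplied by 16 (shifted left four bits) and added to the offset; the function name is just an assumption for this example.

```c
#include <stdint.h>

/* Hypothetical helper: combine a segment and an offset into the
   linear (physical) address the processor uses in real mode. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Note that different segment:offset pairs can name the same byte: 0000:0417 and 0040:0017 both come out as linear address 00417 hex.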
Also, near and far pointers suffer from segment wraparound; when the offset is incremented past its limit, it resets back to 0000 hex, but the segment is not changed. Huge pointers, because they are normalized with each change in pointed-to address, don't suffer from this problem. Huge pointers are significantly slower than far pointers, however, because of the overhead associated with checking and normalizing the pointers with each use.
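The difference is easier to see with numbers, so here is a sketch in portable C that models both behaviors. It assumes the usual meaning of "normalized": as much of the address as possible is kept in the segment, leaving an offset between 0 and 15. The type and function names are hypothetical, and a real compiler does this work behind the scenes.

```c
#include <stdint.h>

/* Hypothetical type: a segment:offset pair. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} seg_off;

/* Far-pointer arithmetic: only the offset changes, so it wraps
   around to 0000 hex and the segment stays where it was. */
seg_off far_increment(seg_off p, uint16_t bytes)
{
    p.offset = (uint16_t)(p.offset + bytes);
    return p;
}

/* Normalize: move everything except the low 4 bits of the
   offset into the segment. */
seg_off huge_normalize(seg_off p)
{
    p.segment = (uint16_t)(p.segment + (p.offset >> 4));
    p.offset = (uint16_t)(p.offset & 0x000F);
    return p;
}

/* Huge-pointer arithmetic: work on the linear address, then
   re-normalize. (Addresses past the top of memory wrap in this model.) */
seg_off huge_increment(seg_off p, uint16_t bytes)
{
    uint32_t linear = ((uint32_t)p.segment << 4) + p.offset + bytes;
    seg_off r;
    r.segment = (uint16_t)(linear >> 4);
    r.offset = (uint16_t)(linear & 0x0F);
    return r;
}
```

Starting from 2000:FFFF and adding one byte, far_increment() lands on 2000:0000 (back at the start of the same segment), while huge_increment() lands on 3000:0000, the byte that actually follows.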
Information on huge pointers is sparse.

A real example of memory access

Here's a real example in which we read from and write to memory. We also get to use our bitwise logical operators and our methods for manipulating fields within larger types.
One of the status bytes in the first segment resides at 0000:0417. We can read in this byte in a program to see if any of the Shift, Ctrl, or Alt keys are being pressed. We can also check the states of the Caps Lock, Num Lock, and Scroll Lock keys, and whether insert or overstrike mode is active.
To do this, we can read in this byte using peekb or by using a far pointer, then construct an AND mask to block out the bits we are not interested in, and then compare the remaining bits with zero to see if they are on or off.
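The masking step can be sketched in portable C like this. The bit positions follow the standard BIOS keyboard status byte layout; the constant and function names are my own, and the status byte is passed in as a parameter because actually fetching it from memory (with peekb or a far pointer) only works under a DOS compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for the keyboard status byte (names are hypothetical). */
enum {
    KB_RIGHT_SHIFT = 0x01,
    KB_LEFT_SHIFT  = 0x02,
    KB_CTRL        = 0x04,
    KB_ALT         = 0x08,
    KB_SCROLL_LOCK = 0x10,
    KB_NUM_LOCK    = 0x20,
    KB_CAPS_LOCK   = 0x40,
    KB_INSERT      = 0x80
};

/* AND with the mask to block out the other bits, then compare
   what is left with zero. */
bool flag_is_on(uint8_t status, uint8_t mask)
{
    return (status & mask) != 0;
}
```

A mask may combine several bits: flag_is_on(status, KB_LEFT_SHIFT | KB_RIGHT_SHIFT) reports whether either Shift key is down.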
But we can have more fun by setting certain bits in this byte on or off. Changing the Caps Lock, Num Lock, and Scroll Lock bits should appropriately change the status lights on your keyboard! We'll be careful to leave the other bits in this byte alone. Writing junk data to random addresses, as I have shown in some of the examples previously, is not a good idea!
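Here is a sketch, again in portable C, of how to switch individual bits on or off while leaving the rest of the byte alone. The constant and function names are assumptions for illustration. The functions only compute the new value; writing it back to the status byte is a separate step that needs a DOS compiler.

```c
#include <stdint.h>

/* Lock-key bits in the keyboard status byte (names are hypothetical). */
#define LOCK_SCROLL 0x10
#define LOCK_NUM    0x20
#define LOCK_CAPS   0x40

/* OR turns the masked bits on and leaves the others untouched. */
uint8_t set_bits(uint8_t status, uint8_t mask)
{
    return (uint8_t)(status | mask);
}

/* AND with the inverted mask turns the masked bits off. */
uint8_t clear_bits(uint8_t status, uint8_t mask)
{
    return (uint8_t)(status & (uint8_t)~mask);
}

/* XOR flips the masked bits. */
uint8_t toggle_bits(uint8_t status, uint8_t mask)
{
    return (uint8_t)(status ^ mask);
}
```

The pattern is always read, modify, write: fetch the current byte, pass it through one of these functions, and store the result, so that bits such as the Ctrl and Alt states are never clobbered.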
If you want some practice with accessing memory, you might want to re-write the above code fragment using far pointers. You might want to use the delay function, from DOS.H, for generating short pauses during your program. Or, write a program that continuously prints out the contents of this status byte, and watch how the contents change as you press the Ctrl and Alt keys, etc.
Summary

This chapter has explained the segment:offset addressing system used by the 8086 and later processors, as well as how the 1MB address space in the PC is laid out (at least under DOS). We have examined the mysteries of little-endian number storage.

References to material

The memory map in this chapter was adapted from similar diagrams in the following two books (both of which I recommend highly):

Norton, Peter. Norton's book has a good listing of many of the status bytes in the first memory segment, as well as very comprehensive interrupt lists (interrupts will be discussed in a later chapter).
Messmer, Hans-Peter. USA: Addison-Wesley.

Hooray for endorsements of bad standards.

No, that would be too convenient! The semicolon character can't be used to signify comments within an "asm" block.
C has a set of special, "sacred" registers that it doesn't want disturbed. C expects you to leave these registers as you found them -- in other words, C leaves critical information such as the location of the code, data, and stack segments in these registers, and it expects that information to be there when it comes back to use it later. Now, this doesn't mean you can't use these registers. You are certainly permitted to use them -- but if you modify one of these special registers, you must set that register back to its original value when you're finished.
Why do we need to preserve these special registers? The compiler has certain standards -- when generating machine code, the compiler expects certain registers to have certain values.
The compiler normally puts in code at the start of a program to set up the segment registers, so that, for example, DS points to the data segment that contains the global variables. It then expects DS to remain unchanged throughout the program, so if we were to modify DS, we could imagine that certain nasty things might happen.
The compiler relies on the other special registers to keep track of the code segment and the stack. As always, creating a stack overflow or underflow is not a healthy thing to do. How can you find out how big the stack is? If you're using the compiler from the command line, you can change the stack size with a command-line switch. Of course, you must leave the stack in the same condition you found it.
If we put some junk data on the stack within the function, as in the above example, then when the function returns, that junk data will be used as the return address. Basically, just follow safe stack guidelines -- leave the stack in the same condition in which you found it!
The built-in assembler doesn't handle macros. You can't put a label inside an "asm" block. You don't see labels very often in C code, because most people try to avoid gotos.
And of course it's ugly. It's C. Inline assembler is perfect for many tasks. It is much, much easier to use than the other method (assembling .ASM files to produce .OBJ files, and linking those .OBJ files with the .OBJ files generated from the C program). As far as I can tell, inline assembler is quite popular.
Apparently the excellent demo Crystal Dreams 2 by Triton was written in Borland Pascal, with virtually all of the code in inline assembler. I think that's quite clever -- you can leave some of the messy details, such as parameter passing and variable scoping, to the high-level language.
However, the Triton fellows did complain about the lack of support for macros -- see the credits at the end of Crystal Dreams 2.

External assembly -- combining assembler with C

Inline assembler is useful for many purposes, but sometimes we need a heavy-duty solution.
Using "external assembly", we can combine C and assembler by doing this: we write a bunch of assembler procedures, put them into an .ASM file, and assemble that file to get an .OBJ file. Then we link that .OBJ file with the .OBJ files that are generated by compiling an associated C program, and we get the final .EXE program. Each source code file will be called a module. Our goal is to be able to write a procedure in assembler, and then call it from a C program, just as if it were a normal function written in C.
In order to do this, we have to follow certain rules and conventions in our assembler procedures. We need to emulate a C function -- we need to do everything that a C function does -- so that we can trick C into thinking that the assembler procedure is a C function.
The first thing we must do is ensure that the memory models for our assembler and C modules are the same. In assembler, that means the "small" in the "MODEL small" directive we've used in just about every assembler program. I personally use the large memory model for all my C programs, so the examples in this tutorial will also use the large memory model.
If you're using the command-line compiler, there are switches to specify the memory model -- we'll see these later. That is, just plain normal assembler, where we put our code into an .ASM file. Well, that's reasonably easy to do. First, we'd need to declare the string -- let's make it say something different -- in the data segment. In fact, this assembler procedure is almost ready to be integrated with a C program.
Let's start learning how to convert the DisplayMessage procedure to a procedure that is compatible with C. But first, we must learn how to share global variables between an assembler module and a C module.

Sharing global variables

There are two ways to share global variables between a C module and an assembler module:

1. C to assembler:
   - Declare a global variable in the C module
   - Import the global variable in the assembler module, using "EXTRN"
2. Assembler to C:
   - Declare a global variable in the data segment of the assembler module
   - Tell the assembler that we want to export (or "make public") this global variable
   - Import the global variable in the C program

Let's see how the first option, the C-to-assembler version, works.
Then, in the assembler module, we need to import the global variable. Of course, we do this in the data segment. There appears to be an extra underscore at the start of each variable name!
Yes, there's one important catch: you have to add an underline character (underscore) to the front of the variable name in assembler. That becomes part of the variable name in the assembler module, so if you want to access one of the variables, you must use the name with the underscore. So we have to play C's game here.

Assembly language instructions perform "tiny little actions". You won't find any assembly language instructions to write text to the screen or handle keyboard input. This means assembler programs have to deal with a lot more detail.
For writing to the screen, for example, you can't just call a pre-written function like printf. You have to write your own special routines in assembler to write to the screen (which, in this case, can be made a bit easier using interrupts).
So a major disadvantage is the micromanagement that you must handle. It almost always takes longer to write a program in assembler than it does to write an equivalent program in a high-level language.
So why would anyone use assembly language? The two biggest reasons are control and speed. Assembler allows you direct access to whatever hardware devices and resources you want. You can do anything however you like, whereas in high-level languages, if you don't like the way a built-in function works, there's not much you can do about it. This added control lets you optimize for speed. Properly written assembler programs can run much faster than compiled programs, for several reasons.
Compilers often generate redundant code or code that could execute faster if written a different way. Modern compilers are actually very good at producing optimized code, but there is still room for improvement. Also, high-level languages often perform extra error checking, such as bounds checking. True, C does basically no error checking, which is why it's generally faster than other languages.
Assembler gives you total control over these matters, so you can decide how efficiently a routine should run. In terms of readability and maintainability, high-level languages are far superior to assembler.
The worst aspect of assembler is its complete lack of portability -- you can't easily convert your program to other platforms that use different families of microprocessors.
Other makes and models of processors use different instruction sets and assembly languages.

Which assembler should I use?

I use TASM. It seems to be the most widely-used assembler for the PC, so it's more or less the standard. I'm satisfied with it. The latest version of TASM as of this writing is 5. It's impossible to find in stores and it's not cheap, although you can get a slight discount if you're a student. A good alternative is the shareware assembler A86, along with its debugger D86. These should be reasonably easy to find on the internet -- do an FTP search or visit some software repository sites and look for "A86".
There are also some other lesser-known commercial brands of assemblers, and there are some old shareware ones such as CHASM. The assembler code here has only been tested with TASM.
For other assemblers, you may need to do some basic conversions. The modern assemblers are more or less similar. The instructions and opcodes must be the same across all assemblers for the PC; they mainly differ in the formatting of the "overhead" assembler directives and structures, and they differ in terms of fancy new features. While you're choosing an assembler, you might want to go out and get a book on assembly language.
These tutorials will cover all of the important points, but it's always good to have a second source of reference. More importantly, though, make sure that you get an assembly language book that has a good instruction set listing at the back.
Getting started

Most assembler books and tutorials start with binary and hexadecimal numbers. I'm assuming that you've read Chapters 1 through 3; if you have, congratulations, you're already familiar with binary and hex, so we don't need to bother with it here.
I'm also assuming you've read Chapters 4 through 6, so when we have to deal with memory addressing, interrupts, and hardware ports, you'll already know what's going on and we'll only need to learn how to use them with assembler.

Registers

The processor's registers were covered in Chapter 5, but they are so important that I'm going to briefly review them here.
If you've read Chapter 5 recently and you feel familiar with this material, skip it! Although the general-purpose registers (AX, BX, CX, and DX) can be used for whatever miscellaneous purposes you like, some instructions require that certain data be present in particular registers: AX is usually used for storing values to be operated on by mathematical operations. CX is often used as a counter.
BX and DX are occasionally used to store address segments or offsets.