Yes, but the problem is that even the references that are actual pointers do not really hold a pointer address, as this address should not end with a 0. You would do something like :
ref = (malloc (sizeof (cons-cell)) << 1) + 1;
That way, ref knows it is pointing something (as ref & 1 == 1), it knows the address to this thing (it is ref >> 1), but this is totally invisible to Boehm's GC, I suppose. I guess it would think "hmm, this address 01001000 does not seem to be pointed by anyone. The only pointer I can see around is 10010001. Not the same thing obviously. Let's free 01001000 !"
But maybe I'm missing something, I never used Boehm except in toy apps.
Oh, in order to use boehm's GC, you should store references as normal pointers. And you have to tag the lowest two bits of fixnums and immediate consts with 01/10/11.
Another solution is to write your own object memory, allocation functions and GC yourself.
This is true; we'll have to change tags so that real objects are actual pointers. The alternative is to make everything a pointer, i.e. even "numbers" are really pointers to numbers.
I think it's easier to swap the meaning of numbers with objects rather than build our own GC.
As for infinite loops in GC - do you make sure not to scan a memory area you've already scanned?
Yes, and if he wants to write a GC, I advise him to start with a semi-space copying GC, which is simpler and has no recursions. By this you don't have to preserve an extra bit for mark-and-sweep, and you need no GC stacks for deep-first traversing.
I don't really like making everything a pointer. I don't know any Lisp implementation working this way (even the slowest ones). It really implies a performance hit every time you use a small integer value (with -2G < small < 2G). Anyway, binx's idea seems to be good...
Well, about infinite loops, I don't know, I stopped yesterday when my head was starting to burn... And gcc always compaining about pointers being of the wrong type (the way it is implemented, with tagged references, I have to cheat a lot). Oh, come on, gcc, who do you think you are, an Ada compiler ?
You don't need to make everything a pointer, but you can make everything a structure of 12 or 16 bytes. Lua takes this approach and it performs rather well.
Here's the Lua way:
struct obj{
int type;
union data{
double num;
void *ptr;
};
}
By this means, you can achieve architecture independence.
This approach is the best. In fact this has to be done, because not all computers have sizeof(int) == sizeof(void* ); certainly my AMD64 running on a 64-bit kernel doesn't.
Note that we can actually abstract away the representation of objects from the actual binary bits that represent it. For instance I would recommend that objects use the following representation (as noted by binx):
typedef struct s_obj{
enum e_type {
other = 0,
number = 1,
symbol = 2,
character = 3
}
type;
union u_data{
void* other;
int number;
void* symbol;
Uchar character; //unicode character
} data;
} obj;
#define OBJ_TYPE(o) ((o).type)
#define OBJ2FIXNUM(o) ((o).data.number)
#define FIXNUM_MIN INT_MIN //requires glibc
#define FIXNUM_MAX INT_MAX
//others defined similarly
However, for machines with the following characteristics:
1. 32-bit pointers and 32-bit int's
2. malloc() always returns an address at a 32-bit boundary (i.e. every four bytes, or with the lowest two bits zero)
In fact, maybe it's not a problem even on machines where ints are not as big as pointers. You can treat each value as an int pointer, and, when you need a fixnum, do
(int) my_int_ptr << 1
If ints are bigger than pointers, that's okay : you loose possible bits, but at least it's working (for example, if ints are on 6 bytes and addresses on 4 bytes, only the 4 least significant bytes are useful, the 2 other ones are lost). If pointers are bigger than ints, it's still working, but this time lost space is in their pointer representation.
The only thing is that fixnums don't have an architecture-independant size, but that's not a problem in practice anyway. The only constraint is that the biggest fixnum is
2**(max(sizeof(int), sizeof(int*)) - 2)
The last-bit-is-1-for-fixnum trick works on every architecture except machines where addresses are on 8 bits. Well, maybe we can ignore these ?
Actually it is, arc2c output crashes on my AMD64. I'm trying to hack out a replacement, but it's not working yet.
The problem is that apparently the bits of pointers that happen to be beyond the bits of ints are not zero, so assigning a pointer to an int chops off the significant bits.
Oh... Maybe a simple fix would be to change int to long. I know this is not standard behavior, but in practice I think long and pointers are always the same size. mzscheme obviously works that way : every object is a pointer that you can cast to a long if it is a fixnum.
However I think there may be processors/archs where sizeof(long) != sizeof(void* ). My attempt at fixing was to use a union, but something's wrong with the way closures are handled in my fix attempt - closures don't seem to have a type associated with them, so I'm not exactly sure how they're supposed to be done.
Edit: I'll probably need to search through C language specs, though - I'm not sure, but it might be standardized that sizeof(long) >= sizeof(void* )
What you want are the two typedefs intptr_t and uintptr_t . Each one is defined to be an integer big enough to hold a pointer (the former being signed, and the latter being unsigned); they aren't mandatory, though, so you might have
Making a fixnum a pointer to a allocated memory containing the value is terribly slow.
The infinite loop could be caused by two objects that references each other, e.g. a references b and b references a, so the GC keeps going from a to b and from b to a. To solve the problem, if this is the problem, if an object has been already marked, don't follow it.
Hmm... That would work because addresses are on 4 bytes, so they always end with 00, right ? That's a nice idea :) But it wouldn't work on an architecture with addresses on less than 4 bytes, no ? There are not many nowadays, I guess, but I suppose that should be added as an assertion is the code...
Do you know ALL_INTERIOR_POINTERS? In gc/doc/README:
"Any objects not intended to be collected must be pointed to either from other such accessible objects, or from the registers, stack, data, or statically allocated bss segments. Pointers from the stack or registers may point to anywhere inside an object. The same is true for heap pointers if the collector is compiled with ALL_INTERIOR_POINTERS defined, or GC_all_interior_pointers is otherwise set, as is now the default."