A method to write object oriented code in ANSI C

24 May 2020 - tsp
Last update 09 Jul 2020

Quick note: This blog post emerged out of a chat with a student in a few minutes. If you spot errors, please feel free to send corrections. It might also grow from time to time whenever I am reminded of its existence.

Object oriented programming?

What is object oriented programming anyway? Basically it’s a paradigm in which one treats some amount of data (specified by a data structure) together with a set of functions operating on it as a single unit called an object. In the simplest case this is realized by having a data structure and a set of functions defined in the same namespace and by tracking the type of the object at compile time - this is what happens when one uses C++ without virtual functions and without runtime type information (RTTI). In this case the compiler tracks the data type of all objects and thus knows which routines are available for a given object and which one is going to be called. Additionally the names of all functions are mangled so that they are unique per object type instead of sharing one global name. A C++ function named exampleFunction(int a, int b) that belongs to the object ExampleObject would be denoted void ExampleObject::exampleFunction(int a, int b) in the implementation and might be called @@ExampleObject@exampleFunction@@void@@int@int by some compiler. Here one sees that the object name as well as the function name are encoded, together with the return type and the types of the arguments. The argument types are required since C++ also supports overloading, i.e. functions that use the same name but different sets of arguments.
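Without any compiler support the same idea can be sketched by hand in plain C: a data structure plus a set of functions that share a name prefix and receive a pointer to the structure as their first argument. The names Counter, Counter_init, Counter_add and Counter_get are made up for this illustration and do not appear anywhere else in this article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example object: the data part */
struct Counter {
	int value;
};

/*
	The functions that belong to the object. The "Counter_" prefix
	plays the role of the mangled name, the first argument plays the
	role of the this pointer.
*/
void Counter_init(struct Counter* lpThis, int start) {
	assert(lpThis != NULL);
	lpThis->value = start;
}
void Counter_add(struct Counter* lpThis, int amount) {
	assert(lpThis != NULL);
	lpThis->value += amount;
}
int Counter_get(const struct Counter* lpThis) {
	assert(lpThis != NULL);
	return lpThis->value;
}
```

Which function gets called is fixed at compile time here, exactly like for non-virtual C++ member functions.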

The main drawback of this simple approach shows up when overriding functions in a process called inheritance. When one object type inherits from another it contains the same set of values inside its data structure as the original type and also inherits all of its functions. It might be extended, i.e. new functions as well as new data fields might be added. Code might address such an object either by its new type, using the union of both sets of functions, or by its base type, using only the original set of functions. As a side note: because of this a cast towards the original type that contains less data and fewer functions (upcast) is always implicitly possible, while a downcast requires additional checks or knowledge by the program, since a downcast is a claim that the object contains more properties and methods than the previously known type.
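In C this kind of data inheritance can be sketched by embedding the base structure as the first member of the derived structure. The names Base, Derived and the helper functions are hypothetical and only serve as illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Base {
	int id;
};
struct Derived {
	struct Base base;	/* Has to be the first member */
	int extra;		/* Newly added data field */
};

/* A function of the base type, usable for every derived object too */
int Base_getId(const struct Base* lpThis) {
	assert(lpThis != NULL);
	return lpThis->id;
}

/* Upcast: always safe, a Derived always contains a Base */
struct Base* Derived_upcast(struct Derived* lpThis) {
	assert(lpThis != NULL);
	return &(lpThis->base);
}

/*
	Downcast: only valid if the caller knows that lpThis really
	points into a Derived. Nothing is checked here.
*/
struct Derived* Base_downcastUnchecked(struct Base* lpThis) {
	return (struct Derived*)lpThis;
}
```

The unchecked downcast is exactly the claim mentioned above: the program asserts more knowledge about the object than the base type alone provides.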

The main problem with this approach is that functions can be overridden during inheritance - the new data type can define a different implementation for a function that has already been defined in the parent class. Pseudocode might represent that the following way:

class Animal {
	public String getName() { return "Unspecified animal"; }
}
class Cat extends Animal {
	public String getName() { return "Cat"; }
}
class Dog extends Animal {
	public String getName() { return "Dog"; }
}

Now when one instantiates Animal and calls getName() the function obviously returns "Unspecified animal". The same happens when one directly instantiates Cat or Dog and calls their respective functions:

Animal animal = new Animal();
animal->getName(); // Returns "Unspecified animal"

Dog dog = new Dog();
dog->getName(); // Returns "Dog"

Cat cat = new Cat();
cat->getName(); // Returns "Cat"

The problem that might now arise happens during upcasting:

Cat cat = new Cat();
cat->getName(); // Returns "Cat"

Animal animal = cat; // Upcast
animal->getName(); // Returns "Unspecified animal" without virtual functions

To let the application call overridden methods even through a base type, languages normally use virtual function tables. In C++ one enables this by declaring a function virtual; in Java, for example, instance methods are virtual by default. The basic idea is that for each object type there exists a list of function pointers, called the virtual (function) table, that is referenced from the data structure that represents the object. For every virtual function that’s called on the object the compiler generates code that first fetches the function pointer from the table and then performs an indirect call through this pointer. This allows an object to behave the following way:

Cat cat = new Cat();
cat->getName(); // Returns "Cat"

Animal animal = cat; // Upcast
animal->getName(); // Still returns "Cat" with virtual functions
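A minimal sketch of the mechanism in C, assuming a single virtual function: the object carries a pointer to a table of function pointers and every call goes through that table. All identifiers (AnimalVtbl, Animal_callGetName and so on) are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Animal;
struct AnimalVtbl {
	const char* (*getName)(struct Animal* lpThis);
};
struct Animal {
	const struct AnimalVtbl* vtbl;	/* Set once when the object is created */
};
struct Cat {
	struct Animal base;		/* A Cat is an Animal */
};

static const char* Animal_getName(struct Animal* lpThis) {
	(void)lpThis;
	return "Unspecified animal";
}
static const char* Cat_getName(struct Animal* lpThis) {
	(void)lpThis;
	return "Cat";
}

/* One table per type, shared by all objects of that type */
static const struct AnimalVtbl animalVtbl = { &Animal_getName };
static const struct AnimalVtbl catVtbl = { &Cat_getName };

void Animal_init(struct Animal* lpThis) { lpThis->vtbl = &animalVtbl; }
void Cat_init(struct Cat* lpThis) { lpThis->base.vtbl = &catVtbl; }

/* What a compiler would emit for animal->getName(): fetch, then call indirectly */
const char* Animal_callGetName(struct Animal* lpThis) {
	assert(lpThis != NULL && lpThis->vtbl != NULL);
	return lpThis->vtbl->getName(lpThis);
}
```

Calling Animal_callGetName on a pointer to the base member of an initialized Cat ends up in Cat_getName, since the table pointer and not the static type of the pointer decides which function runs.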

Additionally some languages always (for example Java) or optionally (C++) provide a way to gather information about objects at runtime. This might be an extensive facility like Java reflection that allows code to walk over the variables and methods contained inside an object, determine their signatures, etc. - or it might be a little more basic like C++ runtime type information, which records the real most derived type of the object so one can follow the inheritance hierarchy. That allows one to use dynamic_cast during a downcast to check whether the object really is of the claimed type.
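The basic variant can be sketched in C with a small type descriptor per class that links to its parent descriptor. This is not how any particular C++ compiler lays out its type information; TypeInfo, Object and both functions are assumptions made for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type descriptor: one static instance per class */
struct TypeInfo {
	const char* name;
	const struct TypeInfo* parent;	/* NULL for a root type */
};

/* Every object records its most derived type */
struct Object {
	const struct TypeInfo* type;
};

static const struct TypeInfo typeAnimal = { "Animal", NULL };
static const struct TypeInfo typeCat = { "Cat", &typeAnimal };
static const struct TypeInfo typeDog = { "Dog", &typeAnimal };

/* Walks up the inheritance chain; returns 1 if wanted is found, else 0 */
int Object_isInstanceOf(const struct Object* lpObj, const struct TypeInfo* wanted) {
	const struct TypeInfo* t;
	assert(lpObj != NULL && wanted != NULL);
	for(t = lpObj->type; t != NULL; t = t->parent) {
		if(t == wanted) { return 1; }
	}
	return 0;
}

/* Checked cast: returns NULL instead of handing out a wrongly typed pointer */
struct Object* Object_checkedCast(struct Object* lpObj, const struct TypeInfo* wanted) {
	return Object_isInstanceOf(lpObj, wanted) ? lpObj : NULL;
}
```

An object created with typeCat passes the check for typeCat and typeAnimal but not for typeDog.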

Multiple inheritance, COM, etc.

Another often used pattern is multiple inheritance, in which an object derives from multiple objects, implements different independent interfaces, etc. This is commonly found in frameworks like the Component Object Model or to some extent in Java (which allows implementing multiple interfaces but not inheriting from multiple classes). The basic idea is that a class can inherit from multiple different parents:

class ParentA { void funA(); }
class ParentB { void funB(); }

class ChildC extends ParentA, ParentB { ... } // Now contains funA and funB

Language support for this kind of inheritance is limited though, since it requires extensive sets of rules on how to deal with situations in which the parent classes define the same methods (which parent wins) or derive from the same parent while only some of the classes inside the inheritance DAG override a method. To sidestep that, it’s normally possible to query an object for a given interface. One of the most prominent frameworks that uses this approach is the Component Object Model (COM) developed by Microsoft, which has been used extensively inside the Windows operating system and multimedia frameworks as well as their Office products.

The basic idea is that every object derives from the same base interface that supports only 3 methods:

unsigned long int AddRef();
void Release();
void QueryInterface(void** lpOut, UUID* interfaceUuid);

Note that these methods do not precisely match the IUnknown interface of COM but reflect the same idea. This interface allows one to increment a reference counter (AddRef), to ask for a specific interface by its globally unique identifier (QueryInterface) and to decrement the reference count and potentially release the object (Release). In case one wants to use a specific interface of an object one simply calls QueryInterface and gets the implementation’s behavior according to this interface. One does not even need to know anything about the concrete type of the object, neither during implementation nor when it is injected at runtime.
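The reference counting part can be sketched in a few lines of C. The names are hypothetical, and unlike the Release signature shown above this variant returns the remaining count, which is only done to make the behavior easy to observe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference counted object */
struct RefCounted {
	unsigned long refCount;
};

/* A new object starts with one reference that is owned by the caller */
struct RefCounted* RefCounted_create(void) {
	struct RefCounted* lpNew = malloc(sizeof(*lpNew));
	if(lpNew == NULL) { return NULL; }
	lpNew->refCount = 1;
	return lpNew;
}

unsigned long RefCounted_addRef(struct RefCounted* lpThis) {
	assert(lpThis != NULL);
	lpThis->refCount = lpThis->refCount + 1;
	return lpThis->refCount;
}

/* Frees the object when the last reference is dropped */
unsigned long RefCounted_release(struct RefCounted* lpThis) {
	unsigned long remaining;
	assert(lpThis != NULL && lpThis->refCount > 0);
	remaining = lpThis->refCount - 1;
	lpThis->refCount = remaining;
	if(remaining == 0) { free(lpThis); }
	return remaining;
}
```

Every holder of a pointer calls the release function exactly once for each reference it owns; after the count reaches zero the pointer must not be used any more. The sketch is not thread safe.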

Why do this in an imperative language like ANSI C?

The big advantages of a language like ANSI C are its simplicity (the language specification is rather simple), its broad platform support - to my knowledge there is no other language that has such wide platform support as ANSI C or is as well supported by compilers when it comes to standards coverage -, the really advanced tooling (for example static analysis and proof assistants, runtime verification tools, etc.) as well as the stability of the language and its associated environments. Of course it’s easier to use languages like Java for a heavily OOP dependent project, and it might not be a good idea to use OOP patterns when developing simple C applications - one just has to use the right tool for the given task. ANSI C of course also lacks things like compile time code generation by templates and the extensive type checking offered by the C++ language model - so there are also drawbacks of using ANSI C that way.

The first motivation is of course to get to know how this stuff is done and how it’s realized by C++ compilers anyway. In my opinion that’s always a good excuse to look into something - and sometimes it’s really a nice and clean way of structuring ANSI C applications, although formal proofs get way more challenging when using function pointer based patterns.

Implementing objects that way in ANSI C

The basic idea is to define function pointer types for every member function of a class as well as a so called virtual function table structure. This structure might contain the virtual function table of the parent class that it directly extends as its first member, since converting a pointer to a structure into a pointer to its first member is always a valid operation. This virtual function table is then referenced by a pointer that is the first member of the structure that represents the object.

For a pseudocode class like:

class A {
	int functionA() { return 1; }
	int functionB() { return 1; }
}
class B extends A {
	// Inherited but overridden
	int functionB() { return 2; }

	// Newly added
	int functionC() { return 2; }
}

one might define:

struct classA;
struct classAVtbl;

typedef int (*classA_functionA)(struct classA* lpThis);
typedef int (*classA_functionB)(struct classA* lpThis);

struct classAVtbl {
	classA_functionA	functionA;
	classA_functionB	functionB;
};
struct classA {
	struct classAVtbl*	vtbl;	/* Pointer to the shared table, not a copy */
	/* Any other data (for example a reserved void* would be useful) */
};



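struct classB;
struct classBVtbl;

typedef int (*classB_functionC)(struct classB* lpThis);

struct classBVtbl {
	struct classAVtbl	classA;	/* Parent table first, so a classBVtbl can be used as a classAVtbl */
	classB_functionC	functionC;
};
struct classB {
	struct classBVtbl*	vtbl;
	/* Any other data */
};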

The implementation might then provide factory routines and virtual function tables:

#include <stdlib.h>	/* malloc */

struct classAImpl {
	struct classA		object;

	/* Add internal state private to this implementation here */
};

static int classAImpl_FunctionA(struct classA* lpThis) {
	return 1;
}
static int classAImpl_FunctionB(struct classA* lpThis) {
	return 1;
}

static struct classAVtbl classAImplVtbl = {
	&classAImpl_FunctionA,
	&classAImpl_FunctionB
};

struct classA* factoryA() {
	struct classAImpl* newObject = malloc(sizeof(*newObject));
	if(newObject == NULL) { return NULL; }

	newObject->object.vtbl = &classAImplVtbl;
	return &(newObject->object);
}

And the implementation for classB:

struct classBImpl {
	struct classB		object;

	/* Add internal state private to this implementation here */
};

/* Inherited from classA, behaves the same */
static int classBImpl_FunctionA(struct classA* lpThis) {
	return 1;
}
/* Overrides functionB of classA */
static int classBImpl_FunctionB(struct classA* lpThis) {
	return 2;
}
/* Newly added by classB */
static int classBImpl_FunctionC(struct classB* lpThis) {
	return 2;
}

static struct classBVtbl classBImplVtbl = {
	{
		&classBImpl_FunctionA,
		&classBImpl_FunctionB
	},
	&classBImpl_FunctionC
};

struct classB* factoryB() {
	struct classBImpl* newObject = malloc(sizeof(*newObject));
	if(newObject == NULL) { return NULL; }

	newObject->object.vtbl = &classBImplVtbl;
	return &(newObject->object);
}

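In this way it’s possible to upcast a classB to classA and call all functions through the base type, with overridden functions still resolving to the classB implementation: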

struct classB* lpB = factoryB();
struct classA* lpA = (struct classA*)lpB;

lpA->vtbl->functionA(lpA); // Returns 1
lpA->vtbl->functionB(lpA); // Returns 2
lpB->vtbl->functionC(lpB); // Returns 2

This might sound somewhat strange at first glance but this is also how most modern C++ and native code compilers solve the problem of class inheritance when generating code. Passing a reference to the object as first argument also matches the method used by most compilers to keep track of the this pointer. Some runtime environments (like Java) might do additional checks and enforce security constraints within their VM.

Additionally languages like C++ keep internal type information about classes so that dynamic_cast can perform checks. In case a downcast targets a type that doesn’t match the implementation, dynamic_cast yields a null pointer for pointer casts or throws an exception (std::bad_cast) for reference casts; throwing - by the way - involves some kind of stack unwinding.

One might use the same method as presented above to implement multiple inheritance using the QueryInterface method. To do this one might simply implement multiple object structures that each contain an arbitrary pointer (defined as char* or void*) pointing to the real object implementation structure. Depending on the supplied UUID one returns a reference to a different object structure contained inside the main implementation structure. This also allows checking at runtime whether a given interface, other than the base interface, is really supported by the object.
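A rough sketch of that idea, assuming two made up interfaces (Named and Sized) that are implemented by one hypothetical BoxImpl structure. The Uuid type, the identifier values and all function names are assumptions for illustration. To keep it short the query function takes the implementation structure directly; in a complete design it would be reachable through the virtual function table of every interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 16 byte identifier standing in for a UUID */
struct Uuid { unsigned char bytes[16]; };

static const struct Uuid IID_Named = { { 0x01 } };
static const struct Uuid IID_Sized = { { 0x02 } };

struct Named;
struct Sized;
struct NamedVtbl { const char* (*getName)(struct Named* lpThis); };
struct SizedVtbl { int (*getSize)(struct Sized* lpThis); };

/* One object structure per interface; lpReserved points to the implementation */
struct Named { const struct NamedVtbl* vtbl; void* lpReserved; };
struct Sized { const struct SizedVtbl* vtbl; void* lpReserved; };

struct BoxImpl {
	struct Named named;
	struct Sized sized;
	int size;	/* Private state shared by both interfaces */
};

static const char* BoxImpl_getName(struct Named* lpThis) {
	(void)lpThis;
	return "Box";
}
static int BoxImpl_getSize(struct Sized* lpThis) {
	struct BoxImpl* lpSelf = (struct BoxImpl*)(lpThis->lpReserved);
	return lpSelf->size;
}

static const struct NamedVtbl boxNamedVtbl = { &BoxImpl_getName };
static const struct SizedVtbl boxSizedVtbl = { &BoxImpl_getSize };

void BoxImpl_init(struct BoxImpl* lpSelf, int size) {
	assert(lpSelf != NULL);
	lpSelf->named.vtbl = &boxNamedVtbl;
	lpSelf->named.lpReserved = lpSelf;
	lpSelf->sized.vtbl = &boxSizedVtbl;
	lpSelf->sized.lpReserved = lpSelf;
	lpSelf->size = size;
}

/* Returns 1 and stores the interface in *lpOut, or returns 0 and stores NULL */
int BoxImpl_queryInterface(struct BoxImpl* lpSelf, void** lpOut, const struct Uuid* iid) {
	assert(lpSelf != NULL && lpOut != NULL && iid != NULL);
	if(memcmp(iid->bytes, IID_Named.bytes, sizeof(iid->bytes)) == 0) {
		*lpOut = &(lpSelf->named);
		return 1;
	}
	if(memcmp(iid->bytes, IID_Sized.bytes, sizeof(iid->bytes)) == 0) {
		*lpOut = &(lpSelf->sized);
		return 1;
	}
	*lpOut = NULL;
	return 0;
}
```

A caller that only knows IID_Sized gets back a struct Sized* and never sees BoxImpl or the Named interface. A query with an unknown identifier fails cleanly, which is the runtime check mentioned above.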


Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
