C#: Value and Reference Types

Gabriel Calegari Included in Programming Language

2020-11-19 • 11 minutes read

C# is a programming language developed by Microsoft and the main language of the .NET platform. Each new version brings improvements that enhance the developer’s experience and make it easier to build robust software. The latest version, C# 9, was recently released.

This post aims to start a series of publications that will address C#. Just as a literature writer needs to master the language to be able to manipulate it and construct logical sentences, a software developer also needs to explore the language that serves as an instrument to produce software. So, let’s practice!

Common Type System

Software written for the .NET platform is language-independent. This means two things:

The platform will be able to execute any program that can be compiled to the Intermediate Language. F# and Visual Basic are examples of languages that compile to the Intermediate Language.
You can reference libraries written in F# from a C# project (and vice versa), as long as they are compiled for .NET. For this to work, these languages must follow certain conventions.

The data types in C# are derived from the Common Type System (CTS), which is a system for specifying how languages that compile to the .NET platform should work with types.

The CTS defines, for example, various categories of types: classes, structures, enumerations, interfaces, and delegates. It also defines access modifiers, how inheritance should be defined, and operator overloading.

Data Types

In C#, when you create a variable and assign a value to it, the storage of that value can occur in different ways depending on the type. This is because the CTS defines value types and reference types.

Value Types

When you assign a value to a variable of a value type, the variable itself holds an instance of that type. When you assign this variable to another variable of the same type, the value is copied. For example:

int x = 20;
int y = x;

When assigning the value 20 to the variable x, a space in memory is allocated to store the value 20. When assigning the variable x to the variable y, a new space in memory is allocated to store a copy of the value 20. That is, there are two separate memory spaces for storing this value.

In C#, all value types derive from System.ValueType. They include the integral numeric types such as int and byte; the floating-point numeric types such as float and double; the types bool and char; and any struct, enum, or value tuple (System.ValueTuple).
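
The copy behavior described above also applies to user-defined structs. The following is a minimal sketch, assuming a hypothetical Point struct that is not part of the original post:

```csharp
using System;

// Hypothetical struct, used only for this illustration.
public struct Point
{
    public int X;
    public int Y;
}

public static class ValueCopyDemo
{
    // Copies a Point, changes the copy, and returns the original's X.
    public static int CopyThenModify()
    {
        var a = new Point { X = 1, Y = 2 };
        Point b = a;   // the whole value is copied
        b.X = 99;      // changes only the copy
        return a.X;    // the original is untouched
    }

    public static void Main()
    {
        Console.WriteLine(CopyThenModify());          // 1
        Console.WriteLine(typeof(Point).IsValueType); // True
        Console.WriteLine(typeof(int).BaseType);      // System.ValueType
    }
}
```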

Reference Types

Unlike a value type variable, a reference type variable does not hold the content itself. When it is initialized, one space in memory is allocated to store the content, and the variable, stored in another memory space, holds a reference to the location of that content. This means that when assigning one reference type variable to another, only the reference is copied. For example:

Car car = new Car("Gol");
Car car2 = car;

When assigning the object resulting from the expression new Car("Gol") to the variable car, a space in memory is allocated to store all the values that the car object requires. In addition, a new space in memory is allocated to store the memory address (reference) of the stored object. Therefore, when assigning the content of the variable car to the variable car2, only the reference of the stored object is copied. This means in practice that these two variables point to the same location in memory.

Now consider the following situation:

Car car = new Car("Gol");
Car car2 = car;

car2.Name = "Uno";

Console.WriteLine(car.Name); // Uno
Console.WriteLine(car2.Name); // Uno

Notice that the Name property was changed through the variable car2. Because car2 points to the same memory position as car, printing this property through either variable returns the same value, which reflects the change made to the Name property.

Since reference type variables store only memory addresses, nullifying the variable car2 would have no impact on car.

Car car = new Car("Gol");
Car car2 = car;

car2 = null;

Console.WriteLine(car.Name); // Gol, which proves that nothing happened to car
Console.WriteLine(car2.Name); // NullReferenceException, which proves that car2 no longer references any object

The same occurs when a new Car object is created and assigned to the variable car2. New memory is allocated to store the content of this object, and car2 now holds the reference to this newly allocated space, while car is unaffected.

Car car = new Car("Gol");
Car car2 = car;

car2 = new Car("Uno");

Console.WriteLine(car.Name); // Gol, which proves that nothing happened to car
Console.WriteLine(car2.Name); // Uno, which proves that car2 points to a new position in memory

In C#, classes, interfaces, and delegates are reference types. Variables declared as being of type string and dynamic are also reference types.
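
The snippets in this section use a Car class whose definition is not shown. The sketch below assumes a minimal, hypothetical Car with a Name property and uses object.ReferenceEquals to check whether two variables refer to the same object:

```csharp
using System;

// Hypothetical Car class; the post does not show its definition.
public class Car
{
    public string Name { get; set; }
    public Car(string name) { Name = name; }
}

public static class ReferenceCopyDemo
{
    // Returns true when both variables refer to the same object.
    public static bool SameObject(Car a, Car b)
    {
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Car car = new Car("Gol");
        Car car2 = car;                            // copies the reference, not the object

        Console.WriteLine(SameObject(car, car2));  // True
        car2.Name = "Uno";                         // changes the single shared object
        Console.WriteLine(car.Name);               // Uno

        car2 = new Car("Palio");                   // car2 now refers to a different object
        Console.WriteLine(SameObject(car, car2));  // False
        Console.WriteLine(car.Name);               // Uno
    }
}
```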

Going a bit further down…

All of this behavior of value types and reference types is related to the way the operating system provides a running process with memory. Two of these memory areas are the abstractions known as the stack and the heap.

Stack

The stack memory area is much smaller than the heap and works as a LIFO (last in, first out) data structure. Every time a function is called, a block of memory is reserved on the stack for storing the function’s local variables. When the function returns, this area is deallocated. Because this area is small, depending on the amount of space required for a function, an error called stack overflow can occur, indicating that the stack is full. Due to its size and navigation method, retrieving data from the stack is very fast. Additionally, the memory spaces of the stack are not fragmented.

Heap

The heap memory area is intended for dynamic allocation. Its size varies according to use. Unlike data on the stack, values in the heap are reached through references: a local variable on the stack (or a field of another object) holds a reference to the data in the heap. When an object in the heap can no longer be reached through any reference, the .NET garbage collector is responsible for reclaiming its memory. Retrieving a value stored in the heap may not be as fast as in the stack due to its size and fragmentation.

Memory areas vs Data Types

Given the characteristics of each memory area, you might be wondering where each data type is stored. In C#, instances of reference types are stored in the heap. Value types, in general, are stored in the stack, with two main exceptions: (1) when a value type is declared as a member of a class, it is stored in the heap along with the rest of the object; (2) when a value type is declared as a member of a struct, it is stored wherever the struct is stored (in the stack if the struct is a local variable of a method, or in the heap if it is a member of a class). Other cases, such as array elements and boxed values, also place value types in the heap.
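
The rules above can be annotated in code. This is only a sketch: Counter and Holder are hypothetical types, and the storage locations in the comments describe typical behavior rather than something the program can observe directly.

```csharp
using System;

// Hypothetical value type.
public struct Counter
{
    public int Value;
}

// Hypothetical reference type that contains a value type.
public class Holder
{
    public Counter Field;   // stored inline, inside the Holder object in the heap
}

public static class StorageDemo
{
    public static int Run()
    {
        Counter local = new Counter { Value = 1 }; // local variable: typically in the stack
        var holder = new Holder();                  // the Holder object lives in the heap
        holder.Field = local;                       // copies the value into the heap object
        holder.Field.Value++;                       // changes the copy inside the object only

        Counter[] array = new Counter[2];           // elements live inside the array object (heap)
        array[0].Value = 5;

        return local.Value + holder.Field.Value + array[0].Value; // 1 + 2 + 5
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 8
    }
}
```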

Nullable Types

Because a value type variable holds its value directly, it is not possible to assign null to it. However, there may be situations where a null value of this type is desired. Version 2 of C# introduced nullable value types. A nullable value type T? can hold any value of the value type T, or null. Every nullable value type is an instance of the generic struct System.Nullable<T>. For example:

char? letter = null;
letter = 'M';
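
A nullable value type is usually inspected before its value is used. The sketch below shows a few common ways to do that; Describe is a hypothetical helper introduced only for this example:

```csharp
using System;

public static class NullableDemo
{
    // Hypothetical helper: describes the content of a nullable char.
    public static string Describe(char? letter)
    {
        return letter.HasValue ? $"letter is {letter.Value}" : "letter is null";
    }

    public static void Main()
    {
        char? letter = null;
        Console.WriteLine(Describe(letter));            // letter is null
        Console.WriteLine(letter ?? '-');               // - (fallback when null)

        letter = 'M';
        Console.WriteLine(Describe(letter));            // letter is M
        Console.WriteLine(letter.GetValueOrDefault());  // M

        // char? is shorthand for System.Nullable<char>.
        Console.WriteLine(typeof(char?) == typeof(Nullable<char>)); // True
    }
}
```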

Every reference type can be assigned the value null. And this can be a problem. How many times have you seen a NullReferenceException occur? Many, right? To try to address this, C# 8 introduced the feature of nullable reference types, which is disabled by default. When this feature is enabled, every reference type variable declared without the T? syntax is expected to hold a non-null value. If this rule is violated, the compiler’s static analysis produces a warning. For example:

string name = null; // Warning: null is being assigned to a non-nullable reference type.
var person = new Person(name); // Warning: a possibly null value is being passed as an argument.

With nullable reference types enabled, for the variable name to be nullable, it must be declared as string? name.
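
One way to try the feature without changing project settings is the #nullable enable directive, which turns it on for a single file. The sketch below assumes C# 8 or later; SafeLength is a hypothetical helper, not part of any library:

```csharp
#nullable enable
using System;

public static class NullableReferenceDemo
{
    // Hypothetical helper: the parameter type string? states that null is accepted.
    public static int SafeLength(string? text)
    {
        // The null-conditional and null-coalescing operators avoid dereferencing null.
        return text?.Length ?? 0;
    }

    public static void Main()
    {
        string? maybeName = null;   // no warning: declared as nullable
        string name = "Maria";      // non-nullable: assigning null here would produce a warning

        Console.WriteLine(SafeLength(maybeName)); // 0
        Console.WriteLine(SafeLength(name));      // 5
    }
}
```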

Parameter passing

In C#, parameter passing to a method can be done by value or by reference.

Pass by value

When an argument is passed to a method by value, a copy of it is made in memory. Changes to the parameter inside the method do not affect the original data stored in the argument variable. For example:

static void DoubleIt(int x)
{
  x *= 2;
  Console.WriteLine($"Value of x inside of DoubleIt: {x}");
}

static void Main()
{
  int x = 10;

  Console.WriteLine($"Value of x before calling DoubleIt: {x}");
  DoubleIt(x);
  Console.WriteLine($"Value of x after calling DoubleIt: {x}");
}

/* Output:
    Value of x before calling DoubleIt: 10
    Value of x inside of DoubleIt: 20
    Value of x after calling DoubleIt: 10
*/

The value of the variable x from the Main method is copied to the variable x within the scope of the DoubleIt method. Thus, the variable x from the Main method does not undergo any modification, as they are different variables.

By default, arguments in C# are passed to a method by value. When a reference type is passed this way, a copy of its reference is made, so the method and the caller refer to the same object.
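
Passing a reference type by value has a consequence that is easy to miss: the method can modify the object, but it cannot make the caller’s variable point to a different object. A minimal sketch, assuming a hypothetical Person class with a Name property:

```csharp
using System;

// Hypothetical Person class, defined here so the example is self-contained.
public class Person
{
    public string Name { get; set; }
    public Person(string name) { Name = name; }
}

public static class PassReferenceByValueDemo
{
    // Modifies the object that both the caller and the parameter refer to.
    public static void Rename(Person p)
    {
        p.Name = "Maria";
    }

    // Reassigns only the local copy of the reference; the caller is unaffected.
    public static void Replace(Person p)
    {
        p = new Person("Joana");
    }

    public static string Run()
    {
        var person = new Person("Leticia");
        Rename(person);     // visible to the caller
        Replace(person);    // not visible to the caller
        return person.Name;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // Maria
    }
}
```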

Pass by reference

To pass a parameter by reference, one must use the ref or out modifiers.

The ref modifier can be declared alongside the parameter in the method declaration to pass a value type by reference. For example:

static void DoubleIt(ref int x) 
{
  x *= 2;
  Console.WriteLine($"Value of x inside of DoubleIt: {x}");
}

static void Main()
{
  int x = 10;

  Console.WriteLine($"Value of x before calling DoubleIt: {x}");
  DoubleIt(ref x); // Pass by reference
  Console.WriteLine($"Value of x after calling DoubleIt: {x}");
}

/* Output:
    Value of x before calling DoubleIt: 10
    Value of x inside of DoubleIt: 20
    Value of x after calling DoubleIt: 20
*/

When ref is used with a reference type, the method does not receive a copy of the reference stored in the variable. It receives a reference to the variable itself, which in turn holds the object’s reference. This makes it possible for the method to make the caller’s variable point to another object. For example:

static void Change(ref Person person)
{
  person.Name = "Maria";
  person = new Person("Joana");

  Console.WriteLine($"Value of Name inside Change: {person.Name}");
}

static void Main()
{
  var person = new Person("Leticia");

  Console.WriteLine($"Value of Name before calling Change: {person.Name}");
  Change(ref person); // ref is also required at the call site
  Console.WriteLine($"Value of Name after calling Change: {person.Name}");
}

/* Output:
    Value of Name before calling Change: Leticia
    Value of Name inside Change: Joana
    Value of Name after calling Change: Joana
*/

The out modifier is used in a similar way to ref, but the argument can be passed without being initialized, and the called method is required to assign a value to it before returning. For example:

string numberAsString = "10";
int number;
int.TryParse(numberAsString, out number);

The variable number is passed to the method int.TryParse by reference, since the out modifier is used. It can be passed without being initialized, but the method int.TryParse will have to initialize it before returning.
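
The same pattern can be used in your own methods. The sketch below assumes a hypothetical TryHalve helper and also uses the inline out variable declaration available since C# 7:

```csharp
using System;

public static class OutDemo
{
    // Hypothetical helper that follows the Try... pattern of int.TryParse.
    public static bool TryHalve(int value, out int half)
    {
        half = value / 2;        // an out parameter must be assigned before the method returns
        return value % 2 == 0;   // reports whether the division was exact
    }

    public static void Main()
    {
        // The out variable is declared inline, at the call site.
        if (int.TryParse("10", out int number))
        {
            Console.WriteLine(number);          // 10
        }

        // When parsing fails, the out variable is still assigned (to 0).
        bool ok = int.TryParse("abc", out int invalid);
        Console.WriteLine($"{ok} {invalid}");   // False 0

        if (TryHalve(number, out int half))
        {
            Console.WriteLine(half);            // 5
        }
    }
}
```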

Type conversion

A value type can be converted to a reference type and vice versa. These operations are relatively costly and, when frequent, can affect the performance of the application.

Boxing

When a value type is converted to the type object, this process is called boxing. The boxing conversion of a value type allocates an object instance in the heap and copies the value into the new object.

int x = 123;
object o = x; // This operation does boxing conversion

Unboxing

Unboxing is the opposite conversion: from a reference type (the boxed object) back to a value type. It copies the value from the object in the heap into the value type variable. Attempting to unbox to a value type other than the one that was boxed causes an InvalidCastException.

int x = 123;
object o = x; // This operation does boxing conversion
int y = (int) o; // This operation does unboxing conversion
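
Two details are worth seeing in code: the boxed object holds its own copy of the value, and unboxing only succeeds for the exact value type that was boxed. A small sketch, where UnboxToLongFails is a hypothetical helper:

```csharp
using System;

public static class BoxingDemo
{
    // Hypothetical helper: returns true when unboxing to long throws.
    public static bool UnboxToLongFails(object boxed)
    {
        try
        {
            _ = (long)boxed;   // fails if the boxed value is not exactly a long
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int x = 123;
        object o = x;    // boxing: a copy of x is placed in a new object
        x = 456;         // does not affect the boxed copy

        Console.WriteLine(o);                    // 123
        Console.WriteLine((int)o);               // 123 (unboxing to the original type)
        Console.WriteLine(UnboxToLongFails(o));  // True (a boxed int is not a long)
        Console.WriteLine((long)(int)o);         // 123 (unbox first, then convert)
    }
}
```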

The figure below shows what happens in the heap and the stack when the boxing and unboxing conversion processes occur.

[Figure: schema showing boxing and unboxing conversions]

References

MICROSOFT. Common Type System. Available at: https://docs.microsoft.com/en/dotnet/standard/base-types/common-type-system
MICROSOFT. Typing (C# Programming Reference). Available at: https://docs.microsoft.com/en/dotnet/csharp/programming-guide/types/
MICROSOFT. Reference Types (C# Reference). Available at: https://docs.microsoft.com/en/dotnet/csharp/language-reference/keywords/reference-types
MICROSOFT. Value Types (C# Reference). Available at: https://docs.microsoft.com/en/dotnet/csharp/language-reference/builtin-types/value-types