Embedded Office Blog

How to Identify Files After They Leave the Version Control System

by Matthias Riegel

In our business, we face the challenge of tracking the content and integrity of source code files (and later the executable op-code) from our version control system, through our customers' development environments, and into the customers' safety end products.

This is a consequence of providing pre-certified software components. In these software components for safety-critical applications, a source file used by an application must be the exact version that was used during the code review and software verification phases, and it must be possible to prove that this is the case.

This article describes the shortcomings of a widely used approach and presents an alternative that allows consistent file identification through all of the steps described above. The method is described for the use case of ANSI C source code files.

Using RCS-keywords

In a centralized version control system like Perforce or ClearCase, a file has a unique path, and the file's revision number makes the file version unique. Using so-called RCS keywords (named after the Revision Control System, RCS), the file identifier can be placed in a string variable inside the source file:

const char *fileid = "$Id: //path/to/file.c#42 $";

Thus, the file contains a unique string that can be referred to during code review, printed into the test log, and checked during delivery. In addition, by computing a checksum over the identifiers of all source files of a module, a single value can be derived and checked during application startup, which ensures that the correct version of every file of the module is in use.

However, the identifier is not tied to the file's content, but rather to the file's location in the version control system. When the file is moved into a staging area or placed in the customer's version control system, the identifier changes, and additional mappings have to be maintained to prove that the version in use is in fact the reviewed and tested one.

Furthermore, if the RCS keyword feature is disabled, the source file can be modified without any change to the identifier.

Other version control systems do not offer such per-file revision numbers: Subversion is centralized but numbers revisions across the whole repository, and decentralized systems like Git identify commits by hash. Git, for example, can use hooks or filters to place the commit ID of the last modification in the file, but this identifier has the same drawback: it loses its meaning when the file is moved or placed in an unrelated Git repository.

Manually supplied Unique IDs

A policy of always placing a generated unique id (e.g. a UUID or a random number) in the file would also make each file version identifiable, but the id has to be changed manually. The version control system can be configured to verify that the id was modified by checking it against previous versions, but updating it remains a deliberate action that must not be forgotten. Proving that the file was not modified on the way from review and test to the application is difficult.

File Hash

During review and delivery, the hash of a source file can be computed to identify the file version. This hash cannot be placed inside the source file, because inserting it would change the file and therefore its hash. As a result, it must be verified manually during software testing and when rolling out the application.

Proposed method

My proposed approach is to compute the hash over the source file while excluding from hashing those parts that are harmless if modified:

  • The file identifier itself is removed
  • RCS keywords are emptied (so they look like $Id$); they can still be used in Doxygen comments, but do not influence the file identification

During review, delivery and compilation, it can be verified that the file id was updated after the last modification to the source code, and the id is available during software tests and to the application.
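
The two exclusion rules can be sketched in a few lines of C. This is only a minimal illustration, not the actual tool: FNV-1a stands in for the real hash function, the function names are hypothetical, and I assume that the identifier sits on a line of its own and that an expanded keyword does not span several lines.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a (64 bit): only a stand-in for the hash the real tool uses. */
static uint64_t fnv1a(uint64_t h, const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* True if the line holds the file identifier, which is excluded from hashing. */
static int is_identifier_line(const char *line)
{
    static const char file_id[] = "FILE_IDENTIFIER(";
    static const char header_id[] = "HEADER_IDENTIFIER(";

    while (*line == ' ' || *line == '\t') {
        line++;
    }
    return strncmp(line, file_id, sizeof file_id - 1) == 0 ||
           strncmp(line, header_id, sizeof header_id - 1) == 0;
}

/* Hash one line; an expanded "$Keyword: ... $" is hashed as "$Keyword$". */
static uint64_t hash_line(uint64_t h, const char *line, size_t len)
{
    size_t i = 0;

    while (i < len) {
        if (line[i] == '$') {
            size_t k = i + 1;
            while (k < len && isalpha((unsigned char)line[k])) {
                k++;
            }
            if (k > i + 1 && k < len && line[k] == ':') {
                const char *end = memchr(line + k, '$', len - k);
                if (end != NULL) {
                    h = fnv1a(h, line + i, k - i); /* "$Keyword" */
                    h = fnv1a(h, "$", 1);          /* closing "$" */
                    i = (size_t)(end - line) + 1;
                    continue;
                }
            }
        }
        h = fnv1a(h, line + i, 1);
        i++;
    }
    return h;
}

/* Hash a whole NUL-terminated source text, line by line. */
uint64_t file_content_hash(const char *text)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */

    assert(text != NULL);
    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        size_t len = (nl != NULL) ? (size_t)(nl - text) + 1 : strlen(text);

        if (!is_identifier_line(text)) {
            h = hash_line(h, text, len);
        }
        text += len;
    }
    return h;
}
```

With this sketch, a file yields the same value whether its keywords are expanded or not and whether the identifier line is filled in or not, while any change to the remaining code changes the value.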

Setting the File Identification

Each source file must have a line containing:

FILE_IDENTIFIER();

and each header file must have a similar line:

HEADER_IDENTIFIER();

After the source code is modified, a small tool updates the file identifier line so that it contains the hash over all the relevant code. For use at runtime, a name is derived from the filename, which can act as a variable name to hold the file id. After that, the line in file.c looks like this:

FILE_IDENTIFIER(file_c, "780392e2f64bdcef705f98e02cd68094");
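
How the tool derives the name is not spelled out here; the following sketch assumes that the directory part is dropped and every character that is not a letter or digit becomes an underscore, which reproduces file.c -> file_c. The function names are hypothetical, and the assumption that files ending in .h get the HEADER_IDENTIFIER form is mine as well.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Derive a C identifier from a file name, e.g. "src/file.c" -> "file_c". */
static void derive_name(const char *filename, char *out, size_t out_size)
{
    const char *base = strrchr(filename, '/');
    size_t i = 0;

    assert(out_size > 0);
    base = (base != NULL) ? base + 1 : filename; /* drop the directory part */
    for (; base[i] != '\0' && i + 1 < out_size; i++) {
        out[i] = isalnum((unsigned char)base[i]) ? base[i] : '_';
    }
    out[i] = '\0';
}

/* Write the identifier line for a file; returns 0 on success, -1 if out is too small. */
int format_identifier_line(const char *filename, const char *hash_hex,
                           char *out, size_t out_size)
{
    char name[64];
    size_t len = strlen(filename);
    const char *macro = (len >= 2 && strcmp(filename + len - 2, ".h") == 0)
                            ? "HEADER_IDENTIFIER" : "FILE_IDENTIFIER";
    int n;

    derive_name(filename, name, sizeof name);
    n = snprintf(out, out_size, "%s(%s, \"%s\");", macro, name, hash_hex);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

A real tool would additionally have to make sure that two files with the same base name in different directories do not end up with the same variable name.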

Checking the File Identification

During review, delivery and compilation, the same tool can check that a file contains such a line, and that the hash value is valid.

The version control system can be configured to reject files without a valid file identification by using the same tool to check and approve files before they are transferred into the version control system.
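
The core of such a check could look like the sketch below. It assumes the hash over the relevant code has already been recomputed and is passed in as a lowercase hex string; the status type and the function name are hypothetical, and the character ranges in the sscanf scan sets rely on behaviour that common C libraries provide but the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { FI_OK, FI_MISSING, FI_MISMATCH } fi_status;

/* Compare the hash stored in an identifier line with the recomputed one. */
fi_status check_identifier_line(const char *line, const char *expected_hash)
{
    char name[64];
    char hash[65];
    int end = -1; /* set by %n only if the whole pattern matched */

    assert(line != NULL && expected_hash != NULL);
    if (sscanf(line, " FILE_IDENTIFIER( %63[A-Za-z0-9_] , \"%64[0-9a-f]\" ) ;%n",
               name, hash, &end) != 2 || end < 0) {
        end = -1;
        if (sscanf(line, " HEADER_IDENTIFIER( %63[A-Za-z0-9_] , \"%64[0-9a-f]\" ) ;%n",
                   name, hash, &end) != 2 || end < 0) {
            return FI_MISSING; /* no identifier, or it was never filled in */
        }
    }
    return (strcmp(hash, expected_hash) == 0) ? FI_OK : FI_MISMATCH;
}
```

A pre-commit check in the version control system would reject the file for any result other than FI_OK.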

Using the File ID at runtime

You can define the macros FILE_IDENTIFIER and HEADER_IDENTIFIER as needed. For example, the macro FILE_IDENTIFIER may expand to a variable that contains the hash string.

The macro line:

FILE_IDENTIFIER(file_c, "780392e2f64bdcef705f98e02cd68094");

expands to:

const char* FILEID_file_c = "780392e2f64bdcef705f98e02cd68094";

Note: the macro HEADER_IDENTIFIER should evaluate to an empty statement in all but one object file; otherwise, each header's id will be present once for every object file that includes the header.

To achieve this, a single C file defines HEADER_IDENTIFIER the same way as FILE_IDENTIFIER and then includes all header files. This way, the id of every file is present in the linked binary exactly once.
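
One possible set of definitions is sketched below. The switch FILEID_COLLECT_HEADERS and the placeholder hash for file.h are hypothetical. As a small variation on the empty statement, the sketch lets HEADER_IDENTIFIER expand to an extern declaration in all other files, so that they can still refer to the header's id without defining it again.

```c
/* Usually kept in a small header of its own. */
#define FILE_IDENTIFIER(name, hash) const char *FILEID_##name = (hash)

/* Defined in exactly one C file, before the headers are included (hypothetical name). */
#define FILEID_COLLECT_HEADERS

#ifdef FILEID_COLLECT_HEADERS
/* The collecting file defines the variable, exactly like FILE_IDENTIFIER. */
#define HEADER_IDENTIFIER(name, hash) const char *FILEID_##name = (hash)
#else
/* Every other file only sees a declaration and contributes no data. */
#define HEADER_IDENTIFIER(name, hash) extern const char *FILEID_##name
#endif

/* This line would normally live in file.h (placeholder hash): */
HEADER_IDENTIFIER(file_h, "00000000000000000000000000000000");

/* This line lives in file.c: */
FILE_IDENTIFIER(file_c, "780392e2f64bdcef705f98e02cd68094");
```

The collecting file can be as short as the define followed by one #include per header.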

Identifying files used in the application

The application can compute a checksum over all file identifiers using CRC32 or a similar simple checksum. During startup, the application verifies that this checksum matches the one computed during software verification.

uint32_t sum = 0;
sum += checksum(FILEID_file_h);
sum += checksum(FILEID_file_c);
sum += checksum(FILEID_file2_c);
ASSERT(sum == EXPECTED_CHECKSUM);
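
For completeness, here is one way checksum() could be implemented: a bitwise CRC-32 with the common reflected polynomial 0xEDB88320. The helper functions around it are hypothetical and only show how the startup check might be packaged.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) over a NUL-terminated string. */
static uint32_t checksum(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (; *s != '\0'; s++) {
        crc ^= (unsigned char)*s;
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; apply the polynomial if the bit shifted out was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Sum of the checksums of all file identifiers of a module. */
uint32_t fileid_checksum(const char *const ids[], size_t count)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        sum += checksum(ids[i]);
    }
    return sum;
}

/* Returns 1 if the identifiers match the value recorded during verification. */
int fileid_startup_check(const char *const ids[], size_t count, uint32_t expected)
{
    return fileid_checksum(ids, count) == expected;
}
```

Because the individual checksums are simply added, the result does not depend on the order in which the files are listed.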

Identifying files used during verification

During software test, one file may print all the file identifiers so that the tester can confirm by visual inspection that all intended files are part of the verification activities:

print("file.h: ");
print(FILEID_file_h);
print("file.c: ");
print(FILEID_file_c);
print("file2.c: ");
print(FILEID_file2_c);

If all the test files get a file identifier as well, this printing function becomes large and bothersome to maintain. This work can be delegated to the linker by using a more advanced definition of FILE_IDENTIFIER().

For example, you may define FILE_IDENTIFIER() to expand:

FILE_IDENTIFIER(file_c, "780392e2f64bdcef705f98e02cd68094");

to:

static char FI_FileHash_file_c[] = "780392e2f64bdcef705f98e02cd68094";
static char FI_FileName_file_c[] = __FILE__;
__attribute__((section(".fileid"))) FI_IDENTIFIER_T FI_FileId_file_c = {
    &FI_FileName_file_c[0],
    &FI_FileHash_file_c[0]
};

This creates two local variables containing the file name and the file id, as well as a structure which contains pointers to the file name string and the file id string. This structure is placed in the special section .fileid, where the linker collects the structures from all object files. If nothing else is placed in that section, you can interpret the section as an array of FI_IDENTIFIER_T structures, which can be iterated over to access all file names and identifiers. By printing them into the test log, all involved files are identified.
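
The iteration could be sketched as follows for GCC or Clang with GNU ld on an ELF target. Several things here are assumptions: the layout of FI_IDENTIFIER_T, the function name, the placeholder hash for file2.c, and above all the section name fileid without the leading dot. For a section whose name is a valid C identifier, GNU ld provides the symbols __start_fileid and __stop_fileid automatically; for .fileid you would define equivalent symbols in your linker script.

```c
#include <stdio.h>

typedef struct {
    const char *name; /* file name, taken from __FILE__ */
    const char *hash; /* file id string */
} FI_IDENTIFIER_T;

#define FILE_IDENTIFIER(id, hash_string)                                 \
    static char FI_FileHash_##id[] = hash_string;                        \
    static char FI_FileName_##id[] = __FILE__;                           \
    __attribute__((section("fileid"), used))                             \
    FI_IDENTIFIER_T FI_FileId_##id = { &FI_FileName_##id[0],             \
                                       &FI_FileHash_##id[0] }

/* Section bounds, provided by GNU ld for the section "fileid". */
extern FI_IDENTIFIER_T __start_fileid[];
extern FI_IDENTIFIER_T __stop_fileid[];

/* In a real project, each of these lines lives in its own source file. */
FILE_IDENTIFIER(file_c, "780392e2f64bdcef705f98e02cd68094");
FILE_IDENTIFIER(file2_c, "00000000000000000000000000000000");

/* Print every entry of the section; returns the number of entries. */
int fileid_print_all(FILE *out)
{
    int count = 0;

    for (const FI_IDENTIFIER_T *it = __start_fileid; it < __stop_fileid; it++) {
        fprintf(out, "%s: %s\n", it->name, it->hash);
        count++;
    }
    return count;
}
```

The test log then lists each file without anyone having to maintain the list by hand; the order of the entries depends on the link order.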

Summary

In this article, we investigated some methods of tracking pre-certified source code from the originating version control system, through the customer's development environment, and into the customer's end products.

We looked at some classic methods and at my favorite, the proposed method, which has some nice features:

  • version control system independent - so it works for supplier and customer in the same way
  • setting and checking the identification with one simple tool
  • the identification information is usable during runtime (if required)
  • only Open Source tools required (Python >= 3.4 and my great script)

Do you think this tool could be helpful for you as well?

Note: we will be providing this tool as Open Source soon!
