Wednesday, 5 June 2019

Procedural Map Reduce and Parallelism

Since toy examples seem like fun, I thought I would make my own toy example in Ada, and play around with some parallelism while I was at it.

The idea is that we want to code a very simple but optimised map reduce function across a dataset consisting of several million 64-bit floating point values. In our code, we want to map the square function over the data points and then reduce with the sum function. In other words, we wish to calculate the sum of the squares of the data points, as illustrated by the following Common Lisp:

In reality, instead of using five integers for our data, we will be using 256 million 64-bit floating point values, occupying 2,048,000,000 bytes (2GiB) of RAM. That should make for a fairly decent computation.

A straightforward way to do this in Ada is shown below. (Code is available at this link.)

The Params module merely keeps the desired number of data points, so it can be easily shared with the subsequent parallelised version of the algorithm:

It should be readily seen from perusing the code that we are allocating an array of 256 million IEEE 64-bit floating point values, setting each value to Pi, and then summing the squares straightforwardly with "Sum := Sum + Val * Val;".

Compiling this with full optimisations (-O3), and running it gives us an output similar to the following.

Sum           :  2.52661872562787E+09
Duration (ms) :  203.672997000

The C example in the article mentioned in the first paragraph achieves 17ms for 32 million data values. That is about 33% faster than this naive Ada code. I would attribute this to the C compiler having AVX2 instructions activated, which may not be the case with the GNAT Ada compiler I used.

However, unlike C, Ada has tasking built into the language, so let's see what we need to do to parallelise the computation, and what effect this may have on the execution time. Here is a parallelised version of the algorithm.

There are two main additional complexities here: defining the worker task (in lines 28 to 59), and dividing up the array into suitable chunks depending on the number of worker tasks to be run (lines 63 to 87). Of these, I'm sure that latter of these two can be greatly simplified, but the initial approach works.

The main part of the code (lines 109 to 120) takes care of starting each of the worker tasks, collecting the results, and summing them to arrive at the overall total.

The last few lines (123 to 127) merely calculate the map reduce time and then print out the details.

Running this code with eight worker tasks results in output similar to the following.

Sum           :  2.52661872742146E+09
Duration (ms) :  111.897841000

This represents around a 45% performance improvement over the single-threaded Ada code, and about an 18% improvement over the C version from the referenced article.

Now, a slightly dissatisfying aspect of these results is that the measured duration of the process would vary somewhat from run to run, no doubt depending upon the vagaries of what was happening in the system from moment to moment. In order to be a bit more scientific, it would be nice to run the job a certain number of times and then report the mean duration. To do this, I first added an extra parameter for the number of calculation runs to perform:

I settled on a choice of fifty runs. As you can see, I also upped the number of data points to a thousand and twenty-four million, which consumes 8GiB of RAM to store the data array.

I then factored out the single calculation run into its own subroutine so that it could be called multiple times:

The factored out procedure can be seen in lines 21 to 39. This is called the required number of times in the main program (lines 68 to 70). Then the mean duration is calculated (line 72), and the results printed out as before.

As expected, this approach greatly evens out the variations previously obtained. The following is a sample output:

Data points    :  1024000000
Sum of squares :  1.01064747680946E+10
Duration (ms)  :  824.309429480

Comparing this again to the C time of 17ms for 32 million floats, the C code using AVX2 instructions (not available on many hardware architectures) does 34% better. So, the previously shown result was about right.

I made corresponding changes to the multi-threaded example, but also ensured that each task would run on a dedicated CPU, so that there would be no ping-ponging from one to the other. In this particular exercise, doing this did not seem to make much of a difference, as presumably the Linux scheduler already did an excellent job in spreading the load. Version two of the multi-threaded code follows.

Part one:

Part two:

I also used a more sensible method of setting the worker index ranges into the dataset.

The results of running this were as follows:

Data points    :  1024000000
Worker tasks   :  8
Sum of squares :  1.01064748782376E+10
Duration (ms)  :  455.692586600

This represents a 16% performance benefit over the architecture-optimised C code (see article referenced in the first paragraph). The previous estimate was 18%, so there is a marginal pessimisation factor happening here, probably the result in the first instance of choosing a suboptimal representative run.

The gains of multitasking are obviously there, and would be even more pronounced with tasks requiring more CPU-intensive computing, but of course, there is the added programmer work involved in parallelising the algorithm, which most often would not be done unless performance were crucial. The good news is that in the next Ada standard (202x) just around the corner, we expect that this sort of simple parallelisation will become the matter of only an extra keyword or two, thus making it even easier to exploit multiprocessing in Ada!

Monday, 31 October 2016

Lisp to Ada, the Omer Count, Part Two

In the previous instalment, we introduced this series of articles whose purpose is to explore, using a very simple arbitrary example, the idea of using Common Lisp to interactively discover and develop a program requirements definition which would then be implemented in Ada 2012. This is a type of prototype model, except that the prototype is not so much the implementation of a specification as it is the specification itself.

We already talked about the requirements of the example "omer count" program in that first article, but in reality, many of those detailed requirements or specifications were developed during the course of interactively creating the script itself. The purpose of the present article, then, is to retrospectively give some idea of the sort of process which was followed in developing the simple program or script.

We will not review here the purpose or requirements of the example program. Those interested are encouraged to quickly review the previous article first before continuing.

The Beginning

It was known at the start that the program would never be anything more than a simple script, so it was written that way from the beginning. For example:

  • Everything is in one small text file
  • There is no system definition detailing module dependencies. (The one external library dependency is noted as a comment at the top of the file.)
  • A few general purpose, self-written library functions are copy and pasted in preference to using the whole library, for no other reason than that happened to be inconvenient at the time.

One thing that I did do which wasn't necessary, was to create a dedicated package to accommodate the script without polluting the CL-USER package.

Here are the top few lines of the program, which were written immediately to get started:

The first line is a comment noting the dependency on the DATE-CALC library, which provides a DELTA-DAYS function for calculating the number of days between one date and another.

The DEFPACKAGE form defines a package called DKT.OMER, and specifying a shorter "nickname" of OMER. The COMMON-LISP package symbols are used or "imported", and the symbols OMER and OMER-FOR are exported. (These exports were actually added later.)

The IN-PACKAGE form sets the current package to DKT.OMER, so that all unqualified symbols from then onwards refer to symbols in that package. (When things like functions and variables are given names in Lisp, symbols are used for the purpose.)

The DEFPARAMETER form creates a dynamic special variable, or parameter (hence the name) to the program, called *PESACH* which will store the required base date from which the count is to begin.

Starting Interactive Development with a Lisp Image

At present, all we have is a few lines of text but no live code. Now, to most programmers, that is a confusing statement. This is because Lisp development typically occurs within a live, running Lisp environment, and not in a batch-like edit, compile, run cycle, as happens typically in most other program development environments.

Since my Lisp IDE consists of the combination of Emacs and SLIME (Superior Lisp Interaction Mode for Emacs), all I do to start interactive development is type in Emacs "ALT-X slime <enter>" and within a second or two I have an interactive REPL (Read Eval Print Loop) prompt within my editor.

After that, I want to load the DATE-CALC library, downloading and compiling it, if necessary, by typing the form: (ql:quickload "date-calc").

Then I compile my omer.lisp file containing the above few lines of code by simply pressing CTRL C K.

And lastly, to switch the REPL into the OMER package, I just type CTRL C ALT P OMER <enter> or alternatively type the following equivalent form at the REPL prompt: (in-package #:dkt.omer).

Incremental Development

It's time to get started with interactively building up the required functionality. The most basic requirement is that we need a count of days, so we can start by entering something like this at the REPL:

Output looks good except that we never want a day count of zero. Now, since we are counting up to day fifty, it would be nice to also be told how many days are left, so let's add that:

Day fifty is our target (Shavuot), so all is looking as expected so far as counting is concerned. However, we want the day count to be an ordinal number in words (fourteenth, twenty-second, and so on), we want the number of days remaining to likewise be written as a cardinal number in words (twenty, thirty-four, and so on), and we want singular and plurals in the output to be correct (we don't want to read, "there are one days to go!") This can be done as follows by adjusting the call to FORMAT:

Note that FORMAT is a very powerful sub-language in Common Lisp, the details of which can be found online here.

Now that we are thinking about how far left to go until Shavuot in our counting, we realise that it would be nice to note when we reach half way! So let's go ahead and add that little note:

Apart from the day count, we also want to know which week it is since Pesach, and which day of that week it is. Because Pesach can fall on any day of the week, the "day of that week" just mentioned is not equivalent to the normal days of the week. After a little mucking around with modulus arithmetic, I came up with the following code:

Finally, we also want to keep track of how many Sabbaths (Saturdays) we have counted along the way. To do this, it would be nice to have a function which would return a count of Sabbaths from the day after Pesach up to and including a given date. Here is one that works, along with a supporting function to increment any given date:

Note that it would certainly be possible to come up with some clever formula which calculates the same answer; however, this method is easier to verify correct, and any lack of efficiency in calculation is purely notional given the short time periods involved in the calculations. In fact, we can easly calculate the number of Sabbaths over a time span of a thousand years within a couple of seconds, which is more than good enough for our purposes!

Now that we have a Sabbath count function, it is very easy to add the Sabbath count to our original code, as shown below:

We're getting pretty close now, but we have accumulated a few lines of code at the REPL, so it's probably time to formalise this into an actual callable function. Since the function outputs the required omer count for a given date, we will just call the function OMER-FOR, as follows:

All we have done here is move DATE so that it becomes the function parameter, and we have added a heading line to the output. So now we can write:

We're nearly there. We have taken care of the normal cases of counting; that is, where the day count is greater than zero and less than fifty. We need to do something different when the day count is:

  • equal to zero,
  • equal to fifty,
  • greater than fifty,
  • less than zero.

Since the code so far represents the normal case, we will encapsulate it in its own local function and call it NORMAL-COUNT, as follows:

FLET means Function LET. In this example, it is binding the five forms of our code so far (FORMAT, WHEN, FORMAT, FORMAT, and LET) to the name NORMAL-COUNT, so that we can call (NORMAL-COUNT) when we want to evaluate or execute that code.

In the place of the comment, "Normal and other cases go here," we need to put a conditional expression to take care of all five cases (the four noted above, plus the normal case). In Lisp we use a CONDitional expression for that, as follows:

In the first case, where (< 0 day-count 50), i.e., when the day count is between zero and fifty but not equal to zero or fifty, we want to evaluate the code for NORMAL-COUNT.

In the second case, where (= 0 day-count), i.e., when the day count is equal to zero, we want to print out a single line message noting that the day is Pesach.

In the third case, where (= 50 day-count), i.e., when the day count is equal to fifty, we want to print a message saying that the day is... and then print a banner that reads "SHAVUOT".

In the fourth case, where (> day-count 50), i.e., when the day count is greater than fifty, we want to print a single line message saying that Shavuot has passed.

In the fifth and final case, which is when the day count is negative, we want to print a single line message saying that it doesn't make any sense, since we don't start the count until the day after Pesach.

Inserting this conditional code, we now have:

So, let's try it out:

All looks good, but we can note two things. First, when using the REPL for this, as is the intention, it is slightly annoying to see the NIL return value printed after each call. Second, we have conveniently skipped showing the code to generate the banner. The fix for the NIL issue is simply to return something besides NIL at the end of the OMER-FOR function, for instance, the symbol OMER-COUNT. So, we now have:

And when we run it, we get OMER-COUNT as the evaluation result instead of NIL:

The code to generate the banner is as follows. It simply consists of one function per letter, and one function to print the required word. Each letter function defines a string vector of length six, since each letter consists of six lines of print, and prints out one of those six lines depending on the value provided for the LINE parameter. Since these "functions" simply print output, and the return value is ignored, they are really "procedures" called only for their so-called "side effects".

There is one final finishing touch. During the period of counting, the computer is quite capable of knowing what today's date is, so we don't need to tell it all the time; hence, we just want a function, call it OMER, which supplies today's date to OMER-FOR:


We have now accomplished a couple of things. First, we have finished defining what the requirements are for this application, and these requirements are embodied in the "executable pseudo-code", except that the pseudo code is real Lisp code (which is why it is executable).

The second thing we have at least attempted to accomplish is to hint at the dynamic nature of interactive and incremental development in Lisp, which is quite unlike most other languages, including other languages which may boast a "REPL". Having a REPL is far from the only requirement for a fully immersive, interactive development environment.

Next Steps

We are now ready to take our design specification and use it to develop a type-safe, binary executable application in the Ada 2012 language. For this exercise, we are going to forget that the whole point of this particular program was to develop an easily-modifiable script which can be conveniently executed from within the REPL. (Note that the REPL is not just for development! It is also for executing finished, "production" code.)

Until then, happy programming!

Addendum: Full Source Code of the Design Specification

The only code omitted consists of a couple of functions from the DATE-CALC library. The source for these can be obtained if it is required to code equivalent functionality ourselves in the target Ada language.

Friday, 21 October 2016

Lisp to Ada, the Omer Count, Part One

This short series of posts will concern itself with using Common Lisp and Ada to create a simple omer count application. Common Lisp is used to incrementally and interactively generate what is effectively a working code specification of the required functionality. A prototype, in other words, but one which itself defines the functional requirements of the application. This is what may be called an executable specification, fully documented by the working source code.

Using code as documentation is a powerful language-based idiom, against which we may contrast essentially picture-based idioms such as the use of UML diagrams.

After the Common Lisp prototype is complete, a more formal implementation in Ada will be designed and coded according to the Lisp specifications, resulting in a compact, standalone executable program which produces the same output as the Lisp.

The purpose of the exercise is to explore in microcosm how well Lisp and Ada can work together to engineer a solution to a problem.

There is a small issue, though. I don't know Ada. Well, it's not entirely true, since I remember programming a network battleship game in Ada 95 about a decade or more ago, while my son did a network protocol compatible version in Python. Since then, there have been two major updates to Ada: 2005 and 2012. So I'll be learning Ada along the way. I have ordered John Barnes' Programming in Ada 2012 (*), which I am now waiting to receive for the second time, due to it being damaged in the mail <sad face/>.

It is said that a software engineer is a software engineer first, and a programmer of a particular language only secondarily, with the implication that the language is learned a lot more easily and quickly than the engineering. We'll be putting that little pearl of wisdom to the test! <smile/>

(*) The link is to Amazon, but I purchased my particular copy elsewhere.

What is an Omer count anyway?

There is the word 'eclectic' in the title of the blog, right? So what better way to kick off than with a little example about something that no one's ever heard of? Well, that's not quite true, as you will see.

The biblical people of God, called the Hebrews, the Israelites, the House of Israel, or most often simply, the Jews, have two feasts---among others---called Pesach (which is Passover) and Shavuot (which is Pentecost). The Passover holy day is on the fifteenth day of the month of Aviv, and Pentecost falls fifty days after that --- hence the English name, which comes from the Greek, and meaning 'fiftieth'. Counting the fifty days from Passover to Pentecost is often called the "Omer Count" because an omer is a measure used for grain, and since Pentecost and Passover occur during the northern hemisphere Spring harvest time, and Pentecost in particular is often referred to as a harvest festival.

The Scripture gives a couple of particular instructions on the counting from Passover to Pentecost:

  1. The counting is to begin (from one) on the day after the Passover Sabbath on 15 Aviv.
  2. The days from one to forty-nine are to be counted, and then the fiftieth day is to be celebrated as the day of Pentecost.
  3. The celebration of the fiftieth day is to begin at sunset of the previous day.
  4. The seven weekly Sabbath days along the way are also to be separately counted.
  5. Each of the seven weeks is to be counted.

We take it that to do this counting, the intention is that it be done each day until the eve of Pentecost arrives; and therein lies the problem which this program is meant to solve: a single person performing this count manually each day may possibly be error prone, especially if done first thing in the morning while still sleepy <grin/>. It's handy to have a convenient confirmation that one has not skipped or doubled up on a day, as well as to have a usable textual notice that can easily be copied and pasted for sending off to friends as a reminder.

So, it's quite a trivial little example which may better be called a script than a program or application, but it serves well as a first experiment with the whole Lisp to Ada design methodology.

Desired output

The normal output desired is like the following sample:


It is the ninth day of fifty to Shavuot.
There are forty-one more days to go!
It is the second day of the second week since Pesach.
We have counted two of the seven Sabbaths!

This demonstrates the following requirements:

  • The heading in upper case, containing the day of the week as a non-abbreviated English word, and the date in ISO 8601 numeric date format as shown.
  • The first line of the output body shows the number of days out of fifty to Pentecost, which is referred to as 'Shavuot' (meaning 'weeks').
  • The second line of the output body shows the number of days remaining.
  • The third line of the output body shows which day of which week since Passover ('Pesach'). Note that this does not correspond to normal calendar weeks, as Passover can be on any day of the week.
  • The fourth and last line of the output body shows the number of weekly Sabbaths (Saturdays) counted up to and including the given date.
  • Except for the date, there are no printed numerics. All numbers are written out in words, including ordinals (first, second, third, etc.).
  • Singulars and plurals must be printed correctly. For example, we do not want to see, "There are one more days to go!", but instead, "There is one more day to go!"

If the output is for a Saturday, the content is varied slightly, as follows:


It is the fifteenth day of fifty to Shavuot.
There are thirty-five more days to go!
It is the first day of the third week since Pesach.
It is the third of the Sabbaths!

That demonstrates the following additional requirements:

  • The "SATURDAY" is replaced with the word, "SABBATH".
  • The last line of the output reads, "It is the xxxth of the Sabbaths!", where xxxth is 'first', 'second', 'third', ... , 'seventh'.

There are four outlier or boundary cases which we also want to handle:

  1. The date of Passover itself is not included in the count. If the count is requested for this date, the output should be as per: "2016-03-25 is Pesach. Fifty more days to Shavuot." Obviously, the date shown will vary from year to year, since Passover is defined according to lunar months and not Gregorian calendar months.
  2. The date of Pentecost (Shavuot) itself is also not included in the count. If the count is requested for this date, the output should be as per the following:

Today (2016-05-14) is ...

SSSSSSS  H    H      A     V     V  U     U   OOOOO   TTTTTTT  
S        H    H     A A     V   V   U     U  O     O     T     
SSSSSSS  HHHHHH    A   A    V   V   U     U  O     O     T     
SSSSSSS  HHHHHH   AAAAAAA   V   V   U     U  O     O     T     
      S  H    H   A     A    V V    U     U  O     O     T     
SSSSSSS  H    H   A     A     V      UUUUU    OOOOO      T     

  1. Obviously, any date after Shavuot is also not included in the count. If the count is requested for any date after Shavuot, the output should be, "Sorry, but you have missed Shavuot! :-("
  2. Equally obviously, any date prior to Passover is not included in the count. If the count is requested for any date before Passover, then output, "You have confused me." Or else some informative but boring message explaining that the omer count does not begin until after Pesach.

Note in passing that the dates given in the examples were the actual dates for 2016. For the benefit of those who may know that the Jews kept those feasts one month later in 2016, the reason is that the Jews now follow an imprecise, precalculated calendar which is slowly drifting with respect to the true solar year. The accumulated error in the Jewish calendar caused all of their feasts to be one month late this year, as a result of intercalating an extra (leap) month in the wrong year.

Preparing to develop the Common Lisp version

It is assumed that if you wish to work with or run the Lisp code, then you will already have a Common Lisp development environment set up. I am personally using and recommend the combination of Emacs and SLIME as a development IDE on Linux, together with the latest release of SBCL or CCL as the Common Lisp implementation. If you are new to Common Lisp, then it may be less trouble to get started with LispWorks or Allegro Common Lisp. Details of how to do this are beyond the scope of this article. All of these options will work on Linux, Windows or Mac.

What's next?

In the next part, we will look at what was involved in developing the Lisp version of the program interactively and incrementally, and the full source will be presented so that those who wish to run it or modify it are able to do so. The script was actually programmed back earlier in the year, in March or April, so there will be a bit of reconstruction from memory involved.

The Ada version, however, has not yet been developed, so we will be able to see what it looks like moving from a simple Lisp example program and implementing this in Ada, which is the main purpose of these posts. A small extra challenge will be learning Ada along the way! :-)

Thursday, 20 October 2016


Welcome to Eclectic Software Engineering, a blog for software developers and engineers, and those who aspire to those roles. While "eclectic" in the sense that we do not aspire to formally teach software engineering or the craft of software development, nor do we aim to build up any sort of comprehensive "curriculum" or treament of set topics, it is hoped nevertheless that the content to come will be found both interesting and useful to those working in the software development field (or those wishing to do so).

Apart from the occasional, or even more frequent, general article relating to software engineering and development, you will also expect to find here numerous articles complete with downloadable source code detailing the creation of actual applications and toy or illustrative programs meant mainly to demonstrate particular focus points. In the case of larger applications, we expect there to be a series of articles relating the various stages of the analysis, design, development, testing and deployment of them. It is hoped that these may be used by others both as a tool for learning and as a base for further development, where the code proves to be of sufficient interest or utility to the individual concerned.

All code provided here will be free and open source, and made available on GitHub so that it can be freely downloaded and modified as required. Naturally, as we are all tired of hearing, no warranties or guarantees of any sort are implied by making this code available to you for your own use. You are responsible for ensuring the code you use and create will do what you want it to do.

On with the show! Happy programming!