Migrating from Eclipse Tycho to bndtools (OSGi)

In this post we show how we migrated our codebase from an Eclipse Tycho OSGi build to a bndtools / Gradle build.

Disclaimer: This post is rather technical and targeted at Java developers. Feel free to silently leave for your own peace of mind.

Intro

Synesty has been built on OSGi module technology right from the beginning in 2010, a choice we have not regretted. OSGi is a natural fit for us, since every Add-On in Synesty (e.g. a Connector for a Shop or ERP-System) is represented by an OSGi bundle. Add-Ons can be created by us, but also by third parties, and they can come and go at any time at runtime (installed as plugins while the application is still running).
This property is one of the huge benefits of OSGi.

Since then we have already completed two fundamental changes to our build system. In 2010 we used an Ant-based PDE build process. In 2015 we switched to a Maven build based on Eclipse Tycho, still living in the PDE manifest-first world. This felt a bit better, but it was still based on the Eclipse Target-Platform concept and P2 update sites.
At the end of 2019 we started a new migration to bndtools with the help of Data in Motion Consulting. Over a lot of discussions and beer they told us about bndtools, which they had started using some time earlier and now use for almost all of their OSGi-based projects. They are now also an OSGi Alliance member.

Why bndtools now?

The problems we wanted to solve were the following:

  • time-consuming setup of the development environment (required very detailed documentation for developers)
  • Eclipse Target-Platform was hard to maintain
  • OSGi MANIFEST.MF needed to be maintained manually (manifest-first)
  • adding new dependencies was complicated
  • discrepancies between the build process in Eclipse and on the command line (Maven / Tycho)
  • weird effects when switching git branches, which confused Eclipse

Setting up development laptops, switching branches, or adding a new library (a .jar file) to our local P2 update site was a pain. It required written documentation, some manual hacks to .xml files, and restarts or cache clearing whenever Eclipse did not recognize the changes.
We wished we could just add new libraries by adding some Maven coordinates somewhere and be done.

Sure, the reason could also be our own incompetence, but it felt like black magic sometimes.

Our wishes and goals

  • simpler setup of development environment
  • easier management of dependencies
  • specify dependencies as Maven coordinates (no Target Platform anymore)
  • be able to update the technologies we use to current versions more easily
  • use newer OSGi technologies like Declarative Services (DS) to reduce boilerplate code and use annotations
  • reduce the startup time of the application through better lazy initialization (only start a bundle when another bundle starts using the reference)
  • consistent build in Eclipse and on the command line

How was the migration process?

Painful :) An operation like this is like rebuilding your house on a different foundation: lots of parts need to be replaced with newer materials while still behaving exactly the same.
But we already have tons of customers with thousands of running flows living in that house, and they should not notice anything about this change.

That means our migration strategy had the following parts:

  1. Get the build up and running using bndtools
  2. Go live in parallel on a few servers
  3. Really go live on all servers
  4. Improve the code base and add the shiny features

Steps 1 and 2 (and finally 3) took our team around 15 months, done alongside our normal daily work.
The hardest part was getting the dependencies right.
What does that mean?
Before, we basically had a local folder with .jar files, which was our target platform.
Now we wanted to define as many dependencies as possible using Maven coordinates, so that they are fetched from Maven Central. Ideally, those libraries should be real OSGi bundles.

Most of the libraries we use are already OSGi-ready (they ship with OSGi metadata), but unfortunately not all of them. So we had to wrap some of those libraries into OSGi bundles (a manual process, but one that can be done using bnd / bndtools).
In some rare cases, where we thought wrapping was not worth the effort, we still put the .jar files into the bundle itself. For example, some legacy vendor SDKs come as a ZIP file including tons of dependencies which are not needed anywhere else.
As those cases were living in a separate bundle anyway, we just left the .jar files there too.

The main problems we had were related to classloading (as a lot of problems are, when you read about OSGi), where some library expected a class to be loaded by a specific classloader. Unfortunately, those errors mostly occur at runtime or at app startup.
Understanding the error messages related to resolving and classloading was the hardest part. Sometimes it took a ridiculous amount of restarting, googling, and trial and error.
But to be fair: the classloading of OSGi, where every bundle has its own classloader, is a major strength of OSGi. It allows, for example, using different versions of the same library in your application, something which is almost impossible with plain Java or at least requires other hacks.
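
To illustrate why classloading behaves so differently from plain Java, here is a minimal sketch (plain Java, not OSGi code and not taken from our code base) in which two independent classloaders each define their own copy of the same class. The names IsolatingLoader and LibraryClass are hypothetical and made up for this illustration; a real OSGi framework does far more, such as resolving package imports and exports between bundles.

```java
import java.io.IOException;
import java.io.InputStream;

final class ClassLoaderIsolationDemo {

    /** Hypothetical stand-in for a class that ships inside some library. */
    public static class LibraryClass {
        public String version() {
            return "1.0";
        }
    }

    /** Defines exactly one class itself; every other class is delegated to the parent. */
    static final class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    byte[] bytes = readClassBytes(name);
                    loaded = defineClass(name, bytes, 0, bytes.length); // our own copy
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }

        /** Reads the compiled .class file through the parent loader's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads LibraryClass through a fresh, independent loader. */
    static Class<?> loadIsolated() {
        String name = LibraryClass.class.getName();
        ClassLoader parent = ClassLoaderIsolationDemo.class.getClassLoader();
        try {
            return new IsolatingLoader(parent, name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object newInstance(Class<?> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = loadIsolated();
        Class<?> b = loadIsolated();
        System.out.println("same name:     " + a.getName().equals(b.getName()));
        System.out.println("same class:    " + (a == b));
        System.out.println("is instance:   " + LibraryClass.class.isInstance(newInstance(a)));
    }
}
```

When run from compiled class files on the classpath, the two loaded classes share a name but are different Class objects, and an instance created from one of them is not an instance of the LibraryClass the application itself sees. That is the kind of mismatch behind many of the runtime errors described above, and it is also what lets two versions of a library coexist.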

One very helpful debugging-tool along the way was the OSGi bnd Snapshot Viewer.

Also, our Spring and Hibernate stack does not fit very well into the OSGi world. Basically we kept both of them in a single central “main-app” bundle together with all entity classes which are used throughout our application. This part is done exactly as before in “our” old world. The .jar files for Spring and Hibernate also still live locally in the bundle due to classloading issues.
We have not tackled this problem yet. Maybe we will in the future when we upgrade to more recent versions.

So basically we have a kind-of monolithic CORE bundle with lots of dynamic OSGi bundles flying around outside (our Add-Ons).
The CORE contains our base application artifacts (database, entities, web app), while the Add-Ons are additional connectors and features built on top of the CORE.

But finally we did it

Yeah. It was around January 2021 when we went live with a full bndtools environment and Gradle build on all servers. The month before, we did a parallel deployment of the bndtools branch on a single production server to spot any runtime bugs in a real production environment. That means a single server was already running in the new world while all other servers were still in the old world. Since both were based on the current master (old world), we could do a side-by-side comparison.

We found a couple of runtime bugs in some areas which were not covered by automated test cases. But thanks to our monitoring, we could quickly stop the affected server when a bug was caused by the “new world”, so that only a very small fraction of customers was affected for a very short time. This is far from optimal, but for us it was the least risky option in this big-bang migration scenario, which touched almost every corner of our code base.

Since we went live we have improved our codebase even more and made use of new OSGi features.
Mainly we started splitting some larger bundles into smaller ones, introducing API bundles, and using more and more @Component and @Reference annotations here and there. Basically, building better and smaller modules and removing old sins from the past.

Conclusion

What did we gain after all this pain?

Simpler development setup

Every team member who switched their Eclipse from the old world to the new bndtools world confirmed that the setup-process was much simpler now.

Basically for a new team member it is now:

  • Clone the repo with git
  • Download Eclipse
  • Install bndtools Eclipse Plugin
  • Import “Existing projects” of the plugins folder of the repo
  • Start application using our .bndrun file

Compared to the long step-by-step documentation we had for the old world, these few steps are minimal.

Switching between feature branches is not a headache anymore either.

Adding or changing dependencies is simple now

To change or add a library, all we have to do now is change a line in our central.maven file, which contains all of our dependencies.

An entry in this file looks like this:

org.freemarker:freemarker:2.3.30

which corresponds to Maven coordinates in a pom.xml like:

<dependency>
 <groupId>org.freemarker</groupId>
 <artifactId>freemarker</artifactId>
 <version>2.3.30</version>
</dependency>
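
The correspondence between the two notations is purely mechanical. As a small illustration (plain Java, not part of bnd; the class name MavenCoordinate is made up for this sketch), the following splits a group:artifact:version line and renders the equivalent pom.xml snippet. It only handles the three-part form shown above.

```java
final class MavenCoordinate {
    final String groupId;
    final String artifactId;
    final String version;

    private MavenCoordinate(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    /** Parses a line of the form group:artifact:version. */
    static MavenCoordinate parse(String line) {
        String[] parts = line.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected group:artifact:version but got: " + line);
        }
        return new MavenCoordinate(parts[0], parts[1], parts[2]);
    }

    /** Renders the same coordinate as a pom.xml dependency element. */
    String toPomDependency() {
        return String.join("\n",
            "<dependency>",
            " <groupId>" + groupId + "</groupId>",
            " <artifactId>" + artifactId + "</artifactId>",
            " <version>" + version + "</version>",
            "</dependency>");
    }

    public static void main(String[] args) {
        System.out.println(parse("org.freemarker:freemarker:2.3.30").toPomDependency());
    }
}
```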

Before, this process was harder and involved messing with the Eclipse target-platform to recognize the new jar. Fortunately I cannot remember the numerous steps needed to make that happen anymore.

Better modules and easier codebase

Creating new bundles is easier now with the bndtools Eclipse plugin, as it takes just a few clicks to add a new bundle to the codebase.
By using the DS annotations @Component and @Reference to wire our bundles, we also reduced boilerplate code, since we could get rid of Activator classes and ServiceTrackers. This also makes the code easier to read and maintain.
For example, to create a new Servlet / Controller, a developer previously had to touch 3 different places to register stuff. Now it is just a matter of adding some annotations to the class, and registration happens automatically.
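
As a rough idea of what annotation-driven wiring looks like, here is a self-contained toy sketch. Important assumptions: to keep it compilable without any OSGi jars, it declares local stand-ins named Component and Reference instead of importing the real annotations from org.osgi.service.component.annotations, and the wire method is a deliberately tiny replacement for what a DS runtime does. The real annotations are processed at build time by bnd into component descriptors, and the real runtime also handles service lifecycle and dynamics. MailService, SmtpMailService and NotificationController are hypothetical names, not classes from our code base.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

final class DsWiringSketch {

    // Local stand-ins (assumption) for the real DS annotations, so the
    // sketch compiles and runs without OSGi libraries on the classpath.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Component {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Reference {}

    /** Hypothetical service interface. */
    interface MailService {
        String send(String recipient);
    }

    /** Declares itself as a component; no Activator or manual registration code. */
    @Component
    static final class SmtpMailService implements MailService {
        @Override
        public String send(String recipient) {
            return "sent to " + recipient;
        }
    }

    /** Asks for a MailService via an annotated field; no ServiceTracker code. */
    @Component
    static final class NotificationController {
        @Reference
        MailService mail;

        String notifyUser(String user) {
            return mail.send(user);
        }
    }

    /** Toy wiring: injects the service into every matching @Reference field. */
    static <T> T wire(T component, Object service) {
        try {
            for (Field field : component.getClass().getDeclaredFields()) {
                if (field.isAnnotationPresent(Reference.class) && field.getType().isInstance(service)) {
                    field.setAccessible(true);
                    field.set(component, service);
                }
            }
            return component;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NotificationController controller = wire(new NotificationController(), new SmtpMailService());
        System.out.println(controller.notifyUser("anna"));
    }
}
```

The point of the sketch is only the shape of the component classes: they state what they are and what they need, and something else does the wiring.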

Any drawbacks?

Yes: the longer build time in Eclipse.
To give you an example:
We have a Util bundle with static helper classes which are used in almost every other bundle. So a single change in that Util bundle causes a full rebuild of all other bundles which depend on it.
This takes much longer now (between 30 and 50 seconds), and hot-code reloading also works differently. (Note: this is a worst-case situation for a small fraction of very central classes. For most bundles build times are 1-2s.)

Why?

First: the main problem might be ourselves. We could structure our code / bundles differently to avoid a situation like this.
But on a technical level it comes down to the difference in how bndtools and the former Eclipse PDE build work.
Eclipse PDE was based on the ECJ (Eclipse Compiler for Java). It somehow does some incremental build magic which compiles Java code much faster. We are not sure what exactly it does, but in the old world a change to a central Util function took just a few seconds for the full workspace build, even with hot-code replacement while the app was running. Now something like this easily takes 30-50 seconds.

The reason why it is much slower now is how bndtools works. (…not saying bndtools is slow. It just works differently.)
Whenever you change something, bndtools checks the dependency graph to find out which other bundles depend on the changed bundle.
It then first builds your bundle, and then all dependent bundles one by one. Building a bundle means compiling the Java code, creating the .jar file, installing it into the (local) repository and, in case your server is already running, stopping and uninstalling the running bundle, then re-installing and starting the new bundle. A lot of (heavy) steps.

This is the correct OSGi way things should work, and the basis for cool features like deploying bundles at runtime without restarting your app. Basically it is re-deploying the bundle into the running OSGi container (Equinox in our case).
But it also means things will be different, and you have to think and code differently.
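
The cascade described above can be modeled in a few lines. The following is a simplified sketch of the idea, not bnd's actual implementation: given a map of which bundle depends on which (assumed to be free of cycles), it collects the changed bundle plus all of its transitive dependents and orders them so that each bundle comes after the bundles it depends on. The bundle names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RebuildOrderSketch {

    /**
     * @param dependsOn bundle name mapped to the bundles it depends on directly
     * @param changed   the bundle that was edited
     * @return the changed bundle and all transitive dependents, dependencies first
     */
    static List<String> rebuildOrder(Map<String, List<String>> dependsOn, String changed) {
        // Invert the edges: bundle -> bundles that depend on it directly.
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            for (String dependency : entry.getValue()) {
                dependents.computeIfAbsent(dependency, k -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Breadth-first search for everything affected by the change.
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        affected.add(changed);
        queue.add(changed);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String dependent : dependents.getOrDefault(current, List.of())) {
                if (affected.add(dependent)) {
                    queue.add(dependent);
                }
            }
        }

        // Depth-first post-order over the affected bundles only.
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String bundle : affected) {
            visit(bundle, dependsOn, affected, visited, order);
        }
        return order;
    }

    private static void visit(String bundle, Map<String, List<String>> dependsOn,
                              Set<String> affected, Set<String> visited, List<String> order) {
        if (!affected.contains(bundle) || !visited.add(bundle)) {
            return; // unaffected bundles are not rebuilt; visited ones are already placed
        }
        for (String dependency : dependsOn.getOrDefault(bundle, List.of())) {
            visit(dependency, dependsOn, affected, visited, order);
        }
        order.add(bundle);
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = Map.of(
            "util", List.of(),
            "core", List.of("util"),
            "shop-connector", List.of("core", "util"),
            "erp-connector", List.of("core"),
            "docs", List.of());
        System.out.println("change in util:           " + rebuildOrder(dependsOn, "util"));
        System.out.println("change in shop-connector: " + rebuildOrder(dependsOn, "shop-connector"));
    }
}
```

In this toy graph a change in util drags four bundles through the build, while a change in a leaf bundle such as shop-connector rebuilds only that one bundle. That asymmetry is exactly what we see with our central Util bundle.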

We are probably still far from a perfectly modularized system and are still working to improve things, but for the important parts we already benefit a lot from this new world.

How did we reduce the build times?

Improving build times often means restructuring bundles and dependencies by splitting bundles into API and implementation bundles.
For example, create API bundles, move the interfaces there, and make sure other bundles only depend on the interfaces in the API bundle. As interfaces do not change as often as implementations, the long rebuilds of dependent bundles happen much less often.
Sounds easy in theory, but it is sometimes easier said than done with a running system in production.
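
A minimal sketch of such a split, squeezed into one file for readability. In a real workspace the three parts would live in three separate bundles; all names here (PriceCalculator, VatPriceCalculator, InvoiceAddOn) are hypothetical and not from our code base.

```java
final class ApiImplSplitSketch {

    // --- would live in an API bundle: changes rarely ---
    interface PriceCalculator {
        long grossCents(long netCents);
    }

    // --- would live in an implementation bundle: changes often ---
    static final class VatPriceCalculator implements PriceCalculator {
        private final int vatPercent;

        VatPriceCalculator(int vatPercent) {
            this.vatPercent = vatPercent;
        }

        @Override
        public long grossCents(long netCents) {
            // Integer arithmetic; rounds down to whole cents.
            return netCents + netCents * vatPercent / 100;
        }
    }

    // --- would live in a consumer bundle: compiles against the interface only ---
    static final class InvoiceAddOn {
        private final PriceCalculator calculator;

        InvoiceAddOn(PriceCalculator calculator) {
            this.calculator = calculator;
        }

        String line(String item, long netCents) {
            return item + ": " + calculator.grossCents(netCents);
        }
    }

    public static void main(String[] args) {
        InvoiceAddOn addOn = new InvoiceAddOn(new VatPriceCalculator(19));
        System.out.println(addOn.line("Widget", 1000));
    }
}
```

Because InvoiceAddOn never mentions VatPriceCalculator, an edit to the implementation leaves the consumer's compile-time dependencies untouched, so only the implementation bundle needs rebuilding.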

So we are basically fine right now. For the places with frequent code changes, we have properly split bundles into API bundles and Impl bundles, so build times in Eclipse are very short (seconds). We still have the issue with a few central Util bundles, but we refactored them so that changes do not happen very often anymore.

Final words

Phew, a lot of stuff above…
This migration project was a huge effort for us, but we are confident that we did the right thing. We have a more current and stable platform now and a lot of options to make use of new tools which bring us forward on our journey to “automate and connect without coding”, while we have to code a lot to achieve that :) Thanks to Jürgen from Data in Motion and the bnd / bndtools community. Everybody was always very helpful when we had questions or problems. Also thanks to our customers, who don’t care at all about all this and did not notice any of it… until this article ;-)