A lesson in humility

It happened again.

I grossly underestimated the effort of integrating a third-party SDK into our Kotlin/Native-based iOS app. And by grossly I mean not just 2x or 5x, but an order of magnitude. I'm not proud of it, but it is what it is, and it's certainly worth a blog post, as I learned one or two things from it (actually four; feel free to skip to the last section if you are in a hurry). I guess I just need to write this down to finally start looking at it not just as a gigantic waste of time, but also as a lesson in patience and humility.

First of all, technically I lied in the intro: I underestimated the effort by at least an order of magnitude, as I am still not done. After two weeks. Not two weeks of working on it straight, but on and off, which is kind of worse, as it created a lot of context-switching overhead. That being said, let's start from the beginning.

It was just another task on just another day...

OneSignal is a push and in-app messaging SDK for mobile apps. It helps facilitate communication with your user base and as such is quite important for many mobile apps.

Integrating OneSignal into our Android app (not based on Kotlin/Native, just plain Kotlin) was easy peasy. Just as expected, I was done in a day, including extensive testing.

So I expected around two days for the iOS integration, as firstly, I have nowhere near as much iOS experience as Android experience, and secondly, our iOS app is based on Kotlin/Native, which could potentially complicate things. However, I didn't expect much trouble from the latter, as the OneSignal SDK was supposed to touch only the Swift side of things, so what could go wrong?

Well, turns out, a hell of a lot!

The OneSignal SDK does something clever when you integrate it into your app (no matter whether you are using Swift Package Manager or CocoaPods). Without you writing a single line of code, its mere existence as a dependency triggers a process called swizzling on app start. I am not super proficient in iOS development - I am more on a can-make-things-work level - but here is what I understood swizzling to do at a very high level: at app start (so at runtime, not at compile time), swizzling exchanges or adds implementations of methods on existing classes. This is usually done so that developers don't need to implement certain calls to the SDK on their own and, in the process, make mistakes by calling the wrong functions at the wrong time.
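To make that a bit more tangible, here is a minimal sketch of method swizzling in Swift. It is not OneSignal's code: the `Greeter` class and the `swizzle` helper are hypothetical names made up for illustration, and the snippet assumes an Apple platform where the Objective-C runtime is available.

```swift
import Foundation

// Hypothetical class for illustration only; not part of any SDK.
class Greeter: NSObject {
    // `@objc dynamic` forces calls to go through the Objective-C runtime,
    // which is what makes them swizzlable in the first place.
    @objc dynamic func greet() -> String { "original" }
    @objc dynamic func swizzledGreet() -> String { "swizzled" }
}

/// Hypothetical helper: swaps the implementations behind two selectors of `cls`.
/// Returns false if either method cannot be found.
func swizzle(_ cls: AnyClass, _ original: Selector, _ replacement: Selector) -> Bool {
    guard let originalMethod = class_getInstanceMethod(cls, original),
          let replacementMethod = class_getInstanceMethod(cls, replacement) else {
        return false
    }
    method_exchangeImplementations(originalMethod, replacementMethod)
    return true
}

let greeter = Greeter()
let before = greeter.greet()   // runs the original implementation

let swapped = swizzle(Greeter.self,
                      #selector(Greeter.greet),
                      #selector(Greeter.swizzledGreet))

let after = greeter.greet()    // same call site, now runs the other implementation
print(before, after)
```

The call site never changes, only what the runtime executes behind it. That is the convenience an SDK gets out of swizzling, and also the reason it can surprise code that knows nothing about it.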

In OneSignal's case, they swizzle the push notification handling, which should make my life as an engineer easier. Should. The problem is that the way they do the swizzling changes the initialization/loading order of certain classes - something the Kotlin/Native code at the other end of the food chain didn't expect!

What the swizzle?

My first thought was: why is the Kotlin/Native code even affected? It's all Swift stuff being touched by OneSignal, or is it? Turns out, it's not. Swizzling causes virtually all classes in your binary to be touched - that's just how it works, from what I understood from the more advanced documentation and posts you can find online. It seems to follow the visitor pattern: visit all classes and functions and then ask whether the implementation should be changed. If all classes are touched, so are the Kotlin/Native classes.
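As a rough illustration of what "visiting all classes" can look like, here is a sketch that searches the whole Objective-C class list for subclasses of a given class. It is my own approximation, not OneSignal's implementation: `subclasses(of:)`, `Animal` and `Dog` are hypothetical names, and I use `objc_copyClassList` where the trace further down shows the related `objc_getClassList` being called from a function named `ClassGetSubclasses`.

```swift
import Foundation

// Hypothetical classes for illustration only.
class Animal: NSObject {}
class Dog: Animal {}

/// Hypothetical helper: returns every class registered with the Objective-C
/// runtime that has `parent` somewhere in its superclass chain.
func subclasses(of parent: AnyClass) -> [AnyClass] {
    var count: UInt32 = 0
    // Asks the runtime for ALL registered classes, not just the interesting ones.
    guard let allClasses = objc_copyClassList(&count) else { return [] }
    // The returned buffer is owned by the caller and must be freed.
    defer { free(UnsafeMutableRawPointer(allClasses)) }

    var result: [AnyClass] = []
    for index in 0..<Int(count) {
        let candidate: AnyClass = allClasses[index]
        // Walk up the superclass chain without sending any messages.
        var current: AnyClass? = class_getSuperclass(candidate)
        while let cls = current {
            if cls == parent {
                result.append(candidate)
                break
            }
            current = class_getSuperclass(cls)
        }
    }
    return result
}

let found = subclasses(of: Animal.self)
print(found.contains { $0 == Dog.self })
```

Nothing in such a walk distinguishes the classes you care about from the rest: to find a handful of candidates, the runtime has to hand over every class in the binary.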

And that's the culprit. Right at app start, the app crashed with a SIGABRT (feel free to ignore the trace, it's just here for show):

__pthread_kill 0x00000001ba63a9e8
pthread_kill 0x00000001dab8d824
abort 0x000000018b0cb0b4
konan::abort() 0x00000001012ce774
kotlin::internal::RuntimeAssertFailedPanic(bool, char const*, char const*, ...) 0x00000001012ce458
__Kotlin_ObjCExport_initialize_block_invoke 0x00000001012d2cb8
_dispatch_client_callout 0x000000018091e198
_dispatch_once_callout 0x00000001808ee7b8
Kotlin_ObjCExport_initialize 0x00000001012d08b8
+[KotlinBase initialize] 0x00000001010fe2fc
CALLING_SOME_+initialize_METHOD 0x0000000197e031e4
initializeNonMetaClass 0x0000000197df91f8
initializeNonMetaClass 0x0000000197df8f84
initializeNonMetaClass 0x0000000197df8f84
initializeAndMaybeRelock(objc_class *, objc_object *, mutex_tt<…> &, bool) 0x0000000197dfd17c
lookUpImpOrForward 0x0000000197df67bc
_objc_msgSend_uncached 0x0000000197df2400
swift_dynamicCastObjCClassMetatype 0x00000001856048a4
swift_dynamicCastMetatypeImpl(const swift::TargetMetadata<…> *, const swift::TargetMetadata<…> *) 0x00000001855c2dec
swift::_checkGenericRequirements(__swift::__runtime::llvm::ArrayRef<…>, __swift::__runtime::llvm::SmallVectorImpl<…> &, std::function<…>, std::function<…>) 0x00000001855f9e38
_gatherGenericParameters(const swift::TargetContextDescriptor<…> *, __swift::__runtime::llvm::ArrayRef<…>, const swift::TargetMetadata<…> *, __swift::__runtime::llvm::SmallVectorImpl<…> &, __swift::__runtime::llvm::SmallVectorImpl<…> &, swift::Demangle::__runtime::Demangler &) 0x00000001855f5cf8
DecodedMetadataBuilder::createBoundGenericType(const swift::TargetContextDescriptor<…> *, __swift::__runtime::llvm::ArrayRef<…>, const swift::TargetMetadata<…> *) const 0x00000001855f4a88
swift::Demangle::__runtime::TypeDecoder::decodeMangledType(swift::Demangle::__runtime::Node *) 0x00000001855f0f88
swift_getTypeByMangledNodeImpl(swift::MetadataRequest, swift::Demangle::__runtime::Demangler &, swift::Demangle::__runtime::Node *, const void *const *, std::function<…>, std::function<…>) 0x00000001855ee934
swift::swift_getTypeByMangledNode(swift::MetadataRequest, swift::Demangle::__runtime::Demangler &, swift::Demangle::__runtime::Node *, const void *const *, std::function<…>, std::function<…>) 0x00000001855ee6b4
swift_getTypeByMangledNameImpl(swift::MetadataRequest, __swift::__runtime::llvm::StringRef, const void *const *, std::function<…>, std::function<…>) 0x00000001855eee28
swift::swift_getTypeByMangledName(swift::MetadataRequest, __swift::__runtime::llvm::StringRef, const void *const *, std::function<…>, std::function<…>) 0x00000001855ec5d0
getSuperclassMetadata 0x00000001855d6a8c
_swift_initClassMetadataImpl(swift::TargetClassMetadata<…> *, swift::ClassLayoutFlags, unsigned long, const swift::TypeLayout *const *, unsigned long *, bool) 0x00000001855d6c64
type metadata completion function for MyApp.WeightHistoryValueFormatter 0x0000000100b1edf0
swift::MetadataCacheEntryBase::doInitialization(swift::ConcurrencyControl &, swift::MetadataCompletionQueueEntry *, swift::MetadataRequest) 0x00000001855e308c
swift_getSingletonMetadata 0x00000001855d3e30
type metadata accessor for MyApp.WeightHistoryValueFormatter 0x0000000100b1d6dc
ObjC metadata update function for MyApp.WeightHistoryValueFormatter 0x0000000100b1ee34
realizeAllClasses() 0x0000000197e119c4
objc_getClassList 0x0000000197e12a70
ClassGetSubclasses 0x0000000104ef407c
+[OneSignalUNUserNotificationCenter swizzleSelectorsOnDelegate:] 0x0000000104f00288
-[OneSignalUNUserNotificationCenter setOneSignalUNDelegate:] 0x0000000104f0021c
+[OneSignalUNUserNotificationCenter registerDelegate] 0x0000000104effe60
+[UIApplication(OneSignal) load] 0x0000000104edf8d4
load_images 0x0000000197dff14c
dyld4::RuntimeState::notifyObjCInit(const dyld4::Loader *) 0x000000010295d9dc
dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState &, dyld3::Array<…> &) const 0x0000000102961a54
dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState &, dyld3::Array<…> &) const 0x0000000102961a3c
dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState &) const 0x00000001029675c4
dyld4::APIs::runAllInitializersForMain() 0x000000010298402c
dyld4::prepare(dyld4::APIs &, const dyld3::MachOAnalyzer *) 0x00000001029718bc
start 0x0000000102970170

Yes, that's what I, with my limited iOS knowledge, got to see after launching my app. Quite a mouthful, huh? There are better ways to start your day, believe me.

It took me some time to figure out what was happening - although even that is an exaggeration. I didn't really know what was going on, just that swizzling somehow seemed to interfere with some Kotlin/Native magic.

Road to redemption?

So I did what every responsible engineer does: I fired up Google and searched for what I thought was causing the problem. For lack of meaningful search results (Kotlin/Native still isn't as widely used as it could be), I then reached out to the OneSignal and Kotlin/Native developers.

If anyone is interested, here are the bug reports:

and

Luckily, the developers of both OneSignal and Kotlin turned out to be very responsive, kudos! After I had managed to create a sample project for him to replicate the issue, the Kotlin engineer I talked to even came up with a quick-to-implement workaround: I had used CocoaPods to integrate the OneSignal SDK, and he asked me to use SwiftPM instead and, at the same time, to make the Kotlin/Native shared code module a dynamic framework.

And it worked! As you might expect, I was overjoyed! And it got even better: the Kotlin/Native engineer had already implemented a fix for the issue, which is scheduled to be released in about 3 months as part of the next Kotlin beta.

Actually, I would have preferred to stay with CocoaPods and the static framework and instead disable swizzling for the OneSignal SDK, but it's not clear yet whether the latter is possible - the cost of taking much of the integration work away from user-engineers is usually paid by those same engineers when their setup is not as expected. ¯_(ツ)_/¯

At that point, I had spent about 2 weeks on the problem. Phew.

Time to merge. I created the PR and our CI started to do its thing. Part of that thing is to create and export an archive for internal testing purposes. And guess what? Yup, exporting failed.

shared not found in dylib search path

Bummer.

By changing the shared Kotlin/Native module from a static to a dynamic framework, I screwed up linking big time! At the time of writing, this issue has not yet been solved despite two days of trial and error (once more violating learning #3, see below). I'll update the post when we have a solution for that problem.

Lessons learned

But I already learned my lessons (at least for the time being):

  1. Never underestimate the integration costs of a third-party SDK, especially if it promises to be integrated "in less than 10 lines of code" (quoting the OneSignal landing page).
  2. Never underestimate the added costs when employing a growing, but not yet established technology (such as Kotlin/Native in my case). However, to be fair, this was the first time I stumbled on such an issue with Kotlin/Native. I never had any similar problems in over 2 years of development, so it also seems to be a special case.
  3. Never overestimate your own skills, and ask for help when you need it. It took me three days until I reached out to the OneSignal and Kotlin engineers. Before that, I spent way too much time trying to debug things I clearly had no real idea of. That doesn't mean you should bombard dev support or Stack Overflow without spending some time thinking about a solution on your own. But you should foster the mental clarity to see when you need help. And when you reach out to your fellow developers: ensure that your bug report adheres to their project's guidelines and provides all the information needed in a concise and complete manner. It's just a matter of etiquette and respect for their time.
  4. Never underestimate the time it takes to create a sample project. Here, it took me over two days to finally boil it all down to the essentials and make sure that the issue was still reproducible.

Now, none of those learnings should be new to someone who has spent more than two decades coding (oy, I'm getting old...). But it helps to reiterate them from time to time to overcome the hubris that sometimes comes with growing experience.

In the end, I will probably end up being the person who reads this post the most - it shall serve as a reminder for my future self, but it might also help you, dearest reader, to avoid the estimation traps I fell into.
