Intro: A Quick Rant On Code Snippets
As a developer, I see code snippets and technical language absolutely everywhere. They’re in developer docs, shared in messengers like Slack or Teams, written in productivity tools like Notion or Google Docs, unfortunately still sent via email sometimes… and, of course, they’re in our IDEs.
These apps just aren’t well built to identify and handle this code, leading to a series of frustrating experiences as a user.
THE WORST experience is to encounter code snippets that are unlabeled, unclassified and not syntax highlighted 🤦‍♂️…and if I may say so myself, developers are often lazy, so every day I see more and more snippets like this.
Close behind in annoyance are SLOW DEVELOPER DOCS SITES.
And last but certainly not least, snippets that get lost in the avalanche of Slack/Teams/Discord messages. Good luck finding that TypeScript snippet later, ESPECIALLY if it’s not classified or syntax highlighted.
Today, I’m throwing out some power tips on how to up your game when building a docsite, web app or chat bot that is going to be dealing with lots of code snippets, code editing, syntax highlighting and/or documentation.
The Classic Solutions — HighlightJS, PrismJS and CodeMirror
An old time favorite — HighlightJS — but it’s not without shortcomings…
For some context, earlier in my career I was building a messaging platform for my university and I REALLY wanted to add a feature where users could paste a code snippet into the send input, press send, and the system would automagically think, “Hey, this isn’t natural language, it’s actually technical language,” and, “Not only is this code, but it’s actually Kotlin or Dart or TypeScript,” and, “I should syntax highlight this and do some additional tagging/sanitization/security…”
This would be an amazing experience for users sharing code in my communications platform. Truly, this would be magical for forums, blogs, comments and threads as well!
Alas, the code and technical language generated by our users was unstructured and unlabeled, and therefore oftentimes neither syntax-highlighted nor formatted. 😥
In search of a solution, HighlightJS stood out — it has Automatic Language Detection and Multi-Language Code Highlighting. That’s perfect, right!?
Well, somewhat…
Problem 1: You still need to know if the user input is code / technical language vs plain text / natural language. (This turns out to be quite difficult to solve, but there’s a solution below.)
Problem 2: If it’s code, you need to know what language it is, and then, of course, syntax highlight it!
HighlightJS does not solve Problem 1, and for Problem 2, HighlightJS can attempt to determine the coding language but at an expensive network and compute cost.
The caveat in solving Problem 2 is that HighlightJS adds a lot of weight when loading the Automatic Language Detection system. For reference, when you go to the HighlightJS website and download the source for their default 34 languages, depending on your build process, you’ll likely end up adding ~1.6 megabytes to your bundle size (yes, this is minified), and for full language support, your bundle size clocks in at ~2.5 megabytes. Further, looking at HighlightJS’s Bundlephobia and taking into account variable network speeds, we’re looking at download times somewhere between 300 and ~5600 milliseconds.
If you take a conservative load time for HighlightJS, say 1500 milliseconds, you’re already negatively impacting your users’ experience.
Beyond page load, HighlightJS uses regex under the hood, so heavy highlighting can wreck your Time to Interactive and frame rates. If you’re not careful, too much regex processing on the main thread can easily freeze the user’s window or drop frames below the target 60fps.
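One way to keep regex-heavy highlighting from hogging the main thread is to do it in small slices and give the browser a chance to paint in between. Here is a minimal sketch of that idea. The names highlightInSlices and runSlices are my own for illustration, and highlightLine stands in for whatever highlighter you use. Note the assumption: highlighting line by line can mis-color constructs that span lines (block comments, template strings), so treat this as a starting point rather than a finished approach.

```typescript
// Yields highlighted lines in small batches so the caller can pause between them.
function* highlightInSlices(
  lines: string[],
  highlightLine: (line: string) => string, // hypothetical: any per-line highlighter
  sliceSize: number,
): Generator<string[], void, void> {
  const size = Math.max(1, Math.floor(sliceSize)); // guard against 0 or negative sizes
  for (let start = 0; start < lines.length; start += size) {
    yield lines.slice(start, start + size).map((line) => highlightLine(line));
  }
}

// Drives the generator one slice at a time. `schedule` decides when the next
// slice runs, e.g. (next) => setTimeout(next, 0) or requestAnimationFrame in a browser.
function runSlices(
  slices: Generator<string[], void, void>,
  onSlice: (html: string[]) => void,
  schedule: (next: () => void) => void,
): void {
  const step = (): void => {
    const result = slices.next();
    if (result.done) return; // nothing left to highlight
    onSlice(result.value);
    schedule(step); // hand control back before doing more work
  };
  step();
}
```

The scheduler is passed in on purpose: in production it can defer to the next frame, and in tests it can run everything synchronously.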
Long story short — props to HighlightJS for attempting to solve the Problem 2 around Language Detection, but we can do better.
A New and Improved Stack? PrismJS + CodeDetectionAPI
Given HighlightJS’s limitations, my team at Pieces built and published the CodeDetectionAPI to solve Problems 1 and 2 in our app with machine learning while achieving optimal network and compute performance.
CodeDetectionAPI is open and accessible through our API platform at runtime.dev and via our friends at apilayer.com.
CodeDetectionAPI is gloriously simple and performant. It can determine whether a given string/input is natural language or code/technical language. If it detects code, it will classify it with a high level of accuracy in up to 30 programming languages. And, it does all of this in approximately 250 milliseconds while minimizing the requirements of on-device compute.
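To make the flow concrete, here is a rough sketch of how a detection result could drive rendering. Everything in it is an assumption for illustration: the Detection shape is hypothetical (the real API response may look different), and detect and highlight are passed in rather than tied to any particular service or library.

```typescript
// Hypothetical result shape; the real CodeDetectionAPI response may differ.
type Detection =
  | { kind: "text" }
  | { kind: "code"; language: string };

// Any highlighter that takes source plus a known language and returns safe HTML.
type Highlighter = (source: string, language: string) => string;

// Escape user input before placing it into HTML.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Problem 1: is it code at all? Problem 2: if so, highlight it as that language.
function renderMessage(
  input: string,
  detection: Detection,
  highlight: Highlighter,
): string {
  if (detection.kind === "text") {
    return `<p>${escapeHtml(input)}</p>`;
  }
  const cssClass = `language-${escapeHtml(detection.language)}`;
  return `<pre><code class="${cssClass}">${highlight(input, detection.language)}</code></pre>`;
}

// Async wrapper: `detect` is whatever client you use to call a detection service.
async function detectAndRender(
  input: string,
  detect: (source: string) => Promise<Detection>,
  highlight: Highlighter,
): Promise<string> {
  return renderMessage(input, await detect(input), highlight);
}
```

Keeping renderMessage pure means the network call and the rendering can be tested separately.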
And with that, CodeDetectionAPI is finally the silver bullet I’ve been looking for, allowing me to explore other syntax highlighting solutions…
This is where PrismJS comes into play.
PrismJS has a very small core, ~2 kilobytes, which results in significantly faster load times than HighlightJS. Further, if we know the language of the code snippet that we want to highlight, we can utilize PrismJS’s Autoloader Plugin to lazy-load the grammar for that particular language on demand.
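The on-demand idea boils down to "load each language once, only when it is first needed." Below is a small sketch of that pattern on its own. createLazyLoader and the load callback are hypothetical names of mine; in a real page, load might wrap Prism’s Autoloader plugin or a dynamic import of a language component, and that wiring is left out here.

```typescript
// Loads whatever a language needs (grammar file, etc.). Supplied by the caller.
type GrammarLoader = (language: string) => Promise<void>;

// Wraps a loader so each language is fetched at most once, even when many
// snippets in the same language show up at the same time.
function createLazyLoader(load: GrammarLoader): GrammarLoader {
  const inFlight = new Map<string, Promise<void>>();

  return (language: string): Promise<void> => {
    const key = language.toLowerCase();
    let pending = inFlight.get(key);
    if (!pending) {
      pending = load(key).catch((error: unknown) => {
        inFlight.delete(key); // forget failures so a later call can retry
        throw error;
      });
      inFlight.set(key, pending);
    }
    return pending;
  };
}
```

Because the cached value is the promise itself, two snippets that ask for the same language at the same moment share one request.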
Take a look at PrismJS’s Bundlephobia — PrismJS clocks in at significantly faster load times!
When deploying PrismJS with CodeDetectionAPI, the hope is that your users will have a significantly improved experience around both page/product performance and ease of use! Code can finally and automagically be treated as code with the beautiful syntax highlighting it deserves.
As an aside, it’s worth mentioning that you can actually utilize a similar technique with HighlightJS by loading specific dependencies without the Autodetection features and supplement those with CodeDetectionAPI, similar to what I do with PrismJS.
Finally, there are times when you’re building a product where you not only want code to be highlighted, but also editable. Problem 2 around Language Detection and Syntax Highlighting also applies in this case.
I want to give a quick shoutout to CodeMirror 6, which is doing a great job of building a modular and performant Code Editor for the browser. CodeDetectionAPI can be used in conjunction with CodeMirror to automatically detect and set the language as the user begins writing. No dropdowns or selections necessary! In practice, this is actually pretty smooth and users enjoy simply being able to write code without configuring anything. 🎩 🪄
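Detecting while the user types means several detection requests can be in flight at once, and an older answer can arrive after a newer one. Here is a small sketch of one way to handle that. createLanguageSync is a hypothetical helper of mine, not part of CodeMirror or CodeDetectionAPI, and applyLanguage is where you would reconfigure the editor (in CodeMirror 6 that would typically go through a Compartment, which is not shown here).

```typescript
// Tracks in-flight detection requests and applies only the newest result,
// and only when the detected language actually changed.
function createLanguageSync(applyLanguage: (language: string) => void) {
  let latestTicket = 0;
  let currentLanguage: string | null = null;

  return {
    // Call right before starting a detection request; keep the ticket.
    begin(): number {
      latestTicket += 1;
      return latestTicket;
    },
    // Call when a detection result arrives. Returns true if the editor was updated.
    resolve(ticket: number, language: string | null): boolean {
      if (ticket !== latestTicket) return false; // a newer request superseded this one
      if (language === null) return false; // input was not detected as code
      if (language === currentLanguage) return false; // nothing to change
      currentLanguage = language;
      applyLanguage(language);
      return true;
    },
  };
}
```

In practice you would also debounce the calls to begin so you are not sending a request on every keystroke.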
Icing on the Cake — All-around Better UX
Whatever you’re building, whether a chatbot, messenger, documentation site, forum, online code editor or a developer productivity tool like code.pieces.app, the CodeDetectionAPI is fundamental to leveling up the performance and user experience around writing, sharing, reading and searching the technical language and code within your product.
No more labeling code snippets, no more dropdowns, no more unnecessary loading of large JavaScript bundles, no more regex parsing on the main thread. At Runtime.dev, we’re creating automagical experiences around code and technical language. The CodeDetectionAPI coupled with something like PrismJS or CodeMirror 6 is a mega-step in the right direction.
Where to next?
To see the CodeDetectionAPI in production in the wild, check out the Pieces app, which also includes some really innovative work around Technical Language Processing, NLP and extracting text from screenshots (OCR).
CodeDetectionAPI is the debut API for the runtime.dev platform. In the coming weeks, look for some fantastic new APIs, including auto-tag generation from a code snippet and the extraction of code snippets from images.
When these capabilities land in the runtime.dev platform, it’s going to open another set of opportunities for app developers to improve Search, SEO and Accessibility. Being able to automatically generate meta-tags for Search and Alt-Text is a game changer for SEO and Accessibility. Furthermore, being able to automatically extract code from images on a webpage is going to level up in-browser features like Find in Page or Copy and Paste.
All in all, I hope you found some of these insights useful and that you’ll perhaps reconsider your user experience and performance around code snippets!