Enabling applications to inject consistent, rich experiences inside blocks
October 24th, 2022
Open up an application and start typing.
There’s a good chance that wherever you’re typing offers a world of possibilities beyond just plain text.
Inputs and dropdowns are summoned by hotkeys, ready to inject links to people, pages, places.
Formatting bars wait dutifully nearby, seen or unseen, ready to elevate your prose.
The text and the space around it can brighten your screen with a rainbow of colours.
We don’t often stop to think about how much magic underlies all this.
A single formatted paragraph can be made up of dozens of pieces, each with a constellation of metadata.
Any combination of fragments of those pieces can be selected, and transformations demanded.
As we type, keystrokes are intercepted, inputs rendered, databases searched, relationships created.
The average user is blissfully unaware of any of this complexity.
They only notice when something they are familiar with is missing, or doesn’t work as expected.
When the hotkeys painstakingly learned in one application do nothing, or — even worse — do something else. Ugh. Who decided that?
A good text editing experience makes you forget it’s there.
It makes you forget that the humble rectangle you are typing inside packs in more features, pixel for pixel, than anything else around it.
Text editing varies across applications: what features are available, and how the text is stored.
Block-based applications are no different. Think WordPress, Contentful, Notion.
At HASH, we’re helping develop a new open standard: the Block Protocol. Its goal is to allow any block to be used in any application, with zero configuration.
It does this by standardizing how blocks and applications communicate, and what messages they exchange.
Because rich text editing is a core part of many modern web applications, we knew the Block Protocol had to support it.
We had to decide how blocks made by different people could provide a consistent text editing experience within an application, while allowing for variations in the experience across applications – without the block knowing anything about the application it’s being used in.
There are two parties to the Block Protocol – the block, and the application using it.
To display some formatted text on a web page, one of these parties has to take a representation of the text and render it.
We can’t expect blocks to take any old text and understand it. The same words with the same formatting in different applications can look identical on the screen, but vary wildly in how they look behind the scenes. Text might be stored as Markdown, HTML, or any of a number of other approaches to storing a series of characters with formatting.
If we wanted the block to handle rendering, the Block Protocol would need to specify a standard representation of formatted text that blocks could predictably parse.
Text would have to be transformed from however the application stores it into that standard interface format and then transformed back again when the block requests updates to it.
This is absolutely possible – it’s a burden on the embedding application, but we already require that applications translate their records into a specified graph representation of entities and links. For text, we could choose a popular format – some flavour of Markdown would be a good candidate – and provide utilities to handle the conversion to and from other common formats.
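To make that round trip concrete, here is a minimal sketch of an application converting its own storage format to a small Markdown subset and back. The `TextRun` shape and both function names are assumptions invented for illustration; they are not part of the Block Protocol, and only bold and italic are handled.

```typescript
// Hypothetical application-side storage: a flat list of formatted runs.
interface TextRun {
  text: string;
  bold?: boolean;
  italic?: boolean;
}

// Application format -> Markdown subset (bold and italic only).
function runsToMarkdown(runs: TextRun[]): string {
  return runs
    .map((run) => {
      let out = run.text;
      if (run.italic) out = `_${out}_`;
      if (run.bold) out = `**${out}**`;
      return out;
    })
    .join("");
}

// Markdown subset -> application format.
// Literal "*" or "_" characters in the text are not escaped or handled here.
function markdownToRuns(markdown: string): TextRun[] {
  const runs: TextRun[] = [];
  // Matches **_bold italic_**, **bold**, _italic_, or a stretch of plain text.
  const pattern = /\*\*_(.+?)_\*\*|\*\*(.+?)\*\*|_(.+?)_|([^*_]+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    if (match[1] !== undefined) {
      runs.push({ text: match[1], bold: true, italic: true });
    } else if (match[2] !== undefined) {
      runs.push({ text: match[2], bold: true });
    } else if (match[3] !== undefined) {
      runs.push({ text: match[3], italic: true });
    } else if (match[4] !== undefined) {
      runs.push({ text: match[4] });
    }
  }
  return runs;
}

const stored: TextRun[] = [
  { text: "Hello " },
  { text: "world", bold: true },
  { text: "!" },
];
const asMarkdown = runsToMarkdown(stored); // "Hello **world**!"
const roundTripped = markdownToRuns(asMarkdown);
```

Even this toy version shows where the burden sits: the application has to write and maintain both directions of the conversion, and anything the interchange format cannot express has no place to go.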
But this would only get us so far – there would still be application-specific content, such as @mentions of people, and hotkeys, that might require special handling.
We could provide an escape hatch for applications to render special elements within a series of rich text nodes – but how does that data get created in the first place?
Text editing in modern applications isn’t just about formatting characters and inserting links to URLs – there is often the ability to search data within the system via special keystrokes and insert references to non-page entities.
Those references might not be static text (showing the entity’s name) or hyperlinks (to the entity’s page), but instead identifiers which are dynamically resolved by the application at runtime, to be able to display the latest name for the entity in question. Clicking on them might do something other than take the user to a page.
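As a rough illustration of a reference that is resolved at runtime, the sketch below stores a mention as an entity identifier and looks the name up at render time. The `TextToken` type, the `renderTokens` function, and the `@` prefix are hypothetical; a real application would have its own storage format and lookup mechanism.

```typescript
// Hypothetical stored representation: plain text mixed with entity references.
type TextToken =
  | { kind: "text"; text: string }
  | { kind: "mention"; entityId: string };

// Resolve each mention against the application's current data at render time.
function renderTokens(
  tokens: TextToken[],
  currentNames: Record<string, string>,
): string {
  return tokens
    .map((token) =>
      token.kind === "text"
        ? token.text
        : `@${currentNames[token.entityId] ?? "unknown"}`,
    )
    .join("");
}

const tokens: TextToken[] = [
  { kind: "text", text: "Ask " },
  { kind: "mention", entityId: "user-1" },
  { kind: "text", text: " about it" },
];

const before = renderTokens(tokens, { "user-1": "Alice" });
// The stored tokens are unchanged; only the application's data differs.
const after = renderTokens(tokens, { "user-1": "Alicia" });
```

Because the stored text holds only the identifier, renaming the entity changes what every mention displays without touching the text itself.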
This all involves more application-specific logic. In order to handle it under the Block Protocol, we could allow applications to inject inputs and content in response to special keys (e.g. when the user types / or @).
But now we have a potentially awkward mix of the block and application manipulating the same DOM element or data while the user types, and we have a lot of specification to do to get there – what else do we need to think about in order to build a foundation suitable for general use?
Each block is only a part of the application the user is interacting with. There are most likely other blocks on a page, and other UI elements beyond them.
Even if we specified a standard text format and ways of applications injecting special content handlers, leaving the rest of the editing experience up to individual blocks means that it’s likely to vary within a page.
The hotkeys that toggle different formatting options. The features on the formatting bar, what it looks like, its position – even whether it is visible by default or not. How cursors are managed inside and across blocks.
If each block determines this on its own, a user is going to be faced with potentially wildly different experiences editing text on the same page.
This is already something we think about in the Block Protocol – we want applications to be able to share styling data with blocks to maximize visual consistency across blocks – but text editing involves so many complex features, with so much scope for variation, that ensuring it is consistent across blocks seems to involve one of:
(1) specifying the text editing experience in detail within the Block Protocol itself, so that every block implements it the same way, or
(2) leaving the text editing experience to the embedding application, which provides it wherever a block asks for it.
To avoid blowing up the scope and complexity of the Block Protocol, and to avoid attempting standardization in an area that could potentially hinder embedding applications’ ability to provide systems which work for them, we are opting for (2).
The hook module introduces the concept of “hooks”.
You can read the formal specification for the hook module here, but in summary:
Under the hook module, a block can invite the embedding application to take over the rendering of an element within the block (the hook), allowing embedding applications to provide users with application-specific views of data of a specified type.
Using the module, blocks can send a message specifying: a DOM element, the type of data it is intended to hold, and which entity property the data should be retrieved from and stored in. The application then renders the data according to its preference, including any editing controls.
This allows applications to control the display and editing experience for specialized data, while blocks retain control over where exactly in their interface a given property is displayed or edited.
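The exchange described above might look roughly like the following sketch. The field names, the `EmbeddingApplication` class, and the `HookNode` stand-in for a DOM element are all assumptions made so the example runs without a browser; the formal specification remains the authority on the actual message shape.

```typescript
// Minimal stand-in for a DOM element (assumption, so no browser is needed).
interface HookNode {
  textContent: string | null;
}

// Hypothetical shape of the message a block sends to request a hook.
interface HookRequest {
  node: HookNode; // the element the application should render into
  type: string; // the kind of data the element holds, e.g. "text"
  entityId: string; // assumed: identifies the entity holding the data
  property: string; // the entity property to read from and write to
}

type HookRenderer = (node: HookNode, value: unknown) => void;

class EmbeddingApplication {
  private renderers: Record<string, HookRenderer> = {};
  private entities: Record<string, Record<string, unknown>>;

  constructor(entities: Record<string, Record<string, unknown>>) {
    this.entities = entities;
  }

  // The application decides how each type of data is displayed and edited.
  registerRenderer(type: string, renderer: HookRenderer): void {
    this.renderers[type] = renderer;
  }

  // Returns true if the application took over the node, false otherwise.
  handleHook(request: HookRequest): boolean {
    const renderer = this.renderers[request.type];
    const entity = this.entities[request.entityId];
    if (!renderer || !entity) return false;
    renderer(request.node, entity[request.property]);
    return true;
  }
}

const app = new EmbeddingApplication({ "entity-1": { text: "Hello, hooks" } });
app.registerRenderer("text", (node, value) => {
  node.textContent = `[app editor] ${String(value)}`;
});

// The block chooses where the data appears; the application fills it in.
const paragraphNode: HookNode = { textContent: null };
const handled = app.handleHook({
  node: paragraphNode,
  type: "text",
  entityId: "entity-1",
  property: "text",
});
```

In this sketch the block never inspects the stored text, so the application is free to keep it in whatever format it likes.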
This isn’t just intended for rich text, although text was the motivating factor — the hook module also allows applications to provide their own display, editing, and sharing controls for images, for example.
We aren’t prescribing a fixed list of types – we want to see which are supported by applications and requested by blocks, including ones that we can’t imagine ourselves.
The use of the hook module makes blocks that only render a piece of rich text extremely boring. Our paragraph block, for example, does nothing more than render a <p> tag and demand that the embedding application does everything else.
If we only needed rich text in paragraphs, the hook module would be unnecessary – embedding applications could manage rendering the <p> tag as well, and forget using the Block Protocol for rich text.
But we don’t only expect rich text in paragraph-shaped rectangles — we expect it in lists, in columns, in table cells.
The ability for blocks to summon a fully-featured, familiar interface for text editing anywhere it is required is what will make the hook module shine.
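A block with more structure than a paragraph could request one hook per slot. The sketch below imagines a list block that owns its item elements and asks for a text hook in each; `renderListBlock`, the `path` format, and the stand-in application callback are hypothetical names chosen for illustration.

```typescript
// Minimal stand-in for a DOM element (assumption, so no browser is needed).
interface HookNode {
  tag: string;
  textContent: string | null;
}

// Hypothetical hook request: where to render, what type, and which property.
interface HookRequest {
  node: HookNode;
  type: string;
  path: string;
}

type SendHook = (request: HookRequest) => void;

// The block owns the list structure and delegates each item's content.
function renderListBlock(itemCount: number, sendHook: SendHook): HookNode[] {
  const items: HookNode[] = [];
  for (let index = 0; index < itemCount; index++) {
    const li: HookNode = { tag: "li", textContent: null };
    sendHook({ node: li, type: "text", path: `items[${index}]` });
    items.push(li);
  }
  return items;
}

// Stand-in application: looks up the data for each path and renders it.
const listData: Record<string, string> = {
  "items[0]": "First",
  "items[1]": "Second",
};
const applicationSendHook: SendHook = ({ node, path }) => {
  node.textContent = listData[path] ?? "";
};

const renderedItems = renderListBlock(2, applicationSendHook);
```

The block decides how many items there are and where they sit, while every item gets the same application-provided editing experience.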
At the moment, hooks are initialized at a block’s request, with the type of data specified by the block – but often only the embedding application knows which data requires an application-rendered view. Leaving it up to blocks might mean that specialized data is displayed or edited incorrectly.
Solving this might involve a way of applications telling blocks which data require bespoke handling, allowing blocks to ignore it or ask the application to render a view of it into a specified DOM node. It could also involve special entity or property types which identify the data that rich experiences may be available for – we’ll be thinking about this in the context of the upcoming new type system, which improves the composability and reusability of types.
It’s also worth noting that because it relies on the application rendering something inside the block, the hook module requires that the application has access to the block’s DOM. That is fine in the current system, but if we ever amend the Block Protocol to provide for blocks running in a scope that the application can only communicate with asynchronously, the hook module would not work for those blocks.
This is only the first step in supporting rich text experiences.
This implementation is likely to evolve as hooks become more widely used, and we discover new factors to consider.
If you have ideas, we’d love to hear them (start a discussion on GitHub).