HTML was historically an application of SGML, and SGML could do includes. You could define a new "entity", and if you created a "system" entity, you could refer to it later and have it substituted in.
<!DOCTYPE html example [
<!ENTITY myheader SYSTEM "myheader.html">
]>
....
&myheader;
SGML is complex, so various efforts were made to simplify HTML, and that's one of the capabilities that was dropped along the way.
The XML subset of SGML still includes most forms of entity usage SGML has, including external general entities as described by the grandparent. XInclude can include any fragment, not just a complete document, but apart from that it was redundant, and what remains of XInclude in HTML today (<svg href=...>) doesn't make use of fragments and also does away with the xinclude and other namespaces. For reusing fragments, OTOH, SVG has the more specific <use href=...> construct. XInclude also worked really badly in the presence of XML Schema.
It's too bad we didn't go down the XHTML/semantic web route twenty years ago.
Strict documents, reusable types, microformats, etc. would have put search into the hands of the masses rather than kept it in Google's unique domain.
The web would have been more composable and P2P. We'd have been able to slurp first class article content, comments, contact details, factual information, addresses, etc., and built a wealth of tooling.
Google / WhatWG wanted easy to author pages (~="sloppy markup, nonstandard docs") because nobody else could "organize the web" like them if it was disorganized by default.
Once the late 2010's came to pass, Google's need for the web started to wane. They directly embed lifted facts into the search results, tried to push AMP to keep us from going to websites, etc.
Google's decisions and technologies have been designed to keep us in their funnel. Web tech has been nudged and mutated to accomplish that. It's especially easy to see when the tides change.
I kinda bailed on being optimistic/enthusiastic about the Web when xhtml wasn't adopted as the way forward.
It was such a huge improvement. For some reason rather than just tolerating old tag-soup mess while forging the way for a brighter future, we went "nah, let's embrace the mess". WTF.
It was so cool to be able to apply XML tools to the Web and have it actually work. Like getting a big present for Christmas. That was promptly thrown in a dumpster.
As a programmer, I really liked XHTML because it meant I could use a regular XML parser/writer to work with it. Such components can be made small and efficient if you don't need the more advanced features of XML (ex: schemas), on the level of JSON. I remember an app I wrote that had a "print" feature that worked by generating an HTML document. We made it XHTML, and used the XML library we already used elsewhere to generate the document. Much more reliable than concatenating strings (hello injections!) and no need for an additional dependency.
In addition, we used XSLT quite a bit too. It is nice being able to open your XML data files in a web browser and having it nicely formatted without any external software. All you needed was a link to the style sheet.
The thing I liked the most about XHTML was how it enforced strict notation.
Elements had to be used in their pure form, and CSS was for all visual presentation.
It really helped me understand and be better at web development - getting the tick from the XHTML validator was always an achievement for complicated webpages.
The "semantic" part was what eventually became W3C's RDF stuff (a pet peeve of TBL's predating even the Web). When people squeeze poetry, threaded discussion, and other emergent text forms into a vocabulary for casual academic publishing and call that "semantic HTML", that still doesn't make it semantic.
The "strict markup" part can be (and always could be) had using SGML which is just a superset of XML that also supports HTML empty elements, tag inference, attribute shortforms, etc. HTML was invented as SGML vocabulary in the first place.
Agree though that Google derailed any meaningful standardization effort for the reasons you stated. Actually, it started already with CSS and the idiocy of piling yet another item-value syntax on top of SGML/HTML, when it already has attributes for formatting. The "semantic HTML" postulate is kind of just an after-the-fact justification for insane CSS complexity that could grow because it wasn't part of HTML proper and so escaped the scrutiny that goes with introducing new elements or attributes.
I don't think there was ever a sustainable route to a semantic web that would work for the masses.
People wanted to write and publish. Only a small portion of people/institutions would have had the resources or appetite to tag factual information on their pages. Most people would have ignored the semantic taxonomies (or just wouldn't have published at all). I guess a small and insular semantic web is better than no semantic web, but I doubt there was a scenario where the web would have been as rich as it actually became, but was also rigidly organized.
Also even if you do have good practices of semantic tagging, there are huge epistemological problems around taxonomies - who constructs them, what do the terms actually mean, how to organize them and so on.
In my experience trying to work with wikidata taxonomies, it can be a total mess when it's crowdsourced, and if you go to an "expert"-derived taxonomy there are all kinds of other problems with coverage, meaning, democracy.
I've had a few flirtations with the semantic web going back to 2007 and long ago came to the personal conclusion that unfortunately AI is the only viable approach.
That’s not how the history went at all. When I worked at an internet co in the late 1990s (ie pre Google’s dominance), SGML was a minority interest back then. We used to try to sell clients on an intranet based on SGML because of the flexibility etc., and there was little interest; sloppy markup and incorrect HTML were very much the norm on the web back then (pre Chrome etc).
I kinda agree with you but I'd argue the "death" of microformats is unrelated to the death of XHTML (tho schema.org is still around).
You could still use e.g. hReview today, but nobody does. In the end the problem of microformats was that "I want my content to be used outside my web property" is something nobody wants, beyond search engines that are supposed to drive traffic to you.
The fediverse is the only chance of reviving that concept because it basically keeps attribution around.
Me personally, I didn't even care that much about strict semantic web, but XML has the benefits of the entire ecosystem around it (like XPath and XSLT), composable extensibility in form of namespaces etc. It was very frustrating to see all that thrown out with HTML5, and the reasoning never made any sense to me (backwards compatibility with pre-XHTML pages would be best handled by defining a spec according to which they should be converted to XHTML).
The semantic web is a silly dream of the 90s and 00s. It's not a realizable technology, and Google basically showed exactly why: as soon as you have a fixed algorithm for finding pages on the web, people will start gaming that algorithm to prioritize their content over others'. And I'm not talking about malicious actors trying to publish malware, but about every single publisher that has the money to invest in figuring out how, and doing it.
So any kind of purely algorithmic, metadata based retrieval algorithm would very quickly return almost pure garbage. What makes actual search engines work is the constant human work to change the algorithm in response to the people who are gaming it. Which goes against the idea of the semantic web somewhat, and completely against the idea of a local-first web search engine for the masses.
I would encourage you to go and read more about triples/asserting facts, and the trust/provenance of facts in this context.
You are basically saying "it's impossible to make basic claims" in your comment, which perhaps you don't realize
I'm as big a critic of Google as anyone, but I'm always surprised at modern day takes around the lost semantic web technologies - they are missing facts or jumping to conclusions in hindsight.
Here's what people should know.
1) The failure of XHTML was very much a multi-vendor, industry-wide affair; the problem was that the syntax of XML was stricter than the syntax of HTML, and the web was already littered with broken HTML that the browser vendors all had to implement layers of quirk handling to parse. There was simply no clear user payoff for moving to the stricter parsing rules of XML and there was basically no vendor who wanted to do the work. To my memory Google does not really stand out here, they largely avoided working on what was frequently referred to as a science project, like all the other vendors.
A few things stand out as interesting. First of all, the old semantic web never had a business case. JSON+LD Structured Data does: Google will parse your structured data and use it to inform the various snippets, factoids, previews and interactive widgets they show all over their search engine and other web properties. So as a result JSON+LD has taken off massively. Millions of websites have adopted it. The data is there in the document. It is just in a JSON+LD section. If you work in SEO you know all about this. Seems to be quite rare that anyone on Hacker News is aware of it however.
Second interesting thing, why did we end up with the semantic data being in JSON in a separate section of the file? I don't know. I think everyone just found that interleaving it within the HTML was not that useful. For the legacy reasons discussed earlier, HTML is a mess. It's difficult to parse. It's overloaded with a lot of stuff. JSON is the more modern thing. It seems reasonable to me that we ended up with this implementation. Note that Google does have some level of support for other semantic data, like RDFa which I think is directly in the HTML - it is not popular.
Which brings us to the third interesting thing, the JSON+LD schemas Google uses, are standards, or at least... standard-y. The W3C is involved. Google, Yahoo, Yandex and Microsoft have made the largest contributions to my knowledge. You can read all about it on schema.org.
TL;DR - XHTML was not a practical technology and no browser or tool vendor wanted to support it. We eventually got the semantic web anyway!
I remember that just using PHP sessions back then on an XHTML document produced parse errors, because PHP appended the session id to the query strings of links and used the raw & character instead of &amp; for separating params in the query string, thus causing an XML parse error.
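For illustration (hypothetical URL and session id), the rewritten links looked something like

  <a href="page.php?id=1&PHPSESSID=abc123">next</a>

which an HTML parser shrugs off, while an XML parser demands the well-formed

  <a href="page.php?id=1&amp;PHPSESSID=abc123">next</a>

and treats the bare ampersand as a fatal well-formedness error.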
There was a push to prevent browsers from being too lenient with the syntax, in order to avoid the problem that sloppy HTML produced (inconsistent rendering across browsers).
Google does support multiple semantic web standards: RDFa, JSON+LD and I believe microdata as well.
JSON+LD is much simpler to extract and parse, however it makes site HTML bigger because information gets duplicated compared to RDFa, where values could be inlined.
The “semantic web” has been successful in a few areas, but not so much as SQL or document databases. Many data formats use it, such as RSS feeds and the XMP metadata used by Adobe tools.
As someone who worked in the field of "semantic XML processing" at the time, I can tell you that while the "XML processing" part was (while full of unnecessary complications) well understood, the "semantic" part was purely aspirational and never well understood. The common theme with the current flurry of LLMs and their noisy proponents is that it is, in both cases, possible to do worthwhile and impressive demos with these technologies and also real applications that do useful things, but people who have their feet on the ground know that XML doesn't engender "semantics" and LLMs are not "conscious". Yet the hype peddlers keep the fire burning by suggesting that if you just do "more XML" and build bigger LLMs, then at some point real semantics and actual consciousness will somehow emerge like a chicken hatching from an egg. And, being emergent properties, who is to say semantics and consciousness will not emerge, at some point, somehow? A "heap" of grains is emergent after all, and so is the "wetness" of water. But I have strong doubts about XHTML being more semantic than HTML5.
And anyway, even if Google had nefarious intentions and even if they managed to steer the standardization, one has also to concede that all search engines before Google were encumbered by too much structure, too rigid approaches. When you were looking for a book in a computerized library at that point it was standard to be sat in front of a search form with many, many fields; one for the author's name, one for the title and so forth, and searching was not only a pain, it was also very hard to do for a user without prior training. Google had demonstrated it could deliver far better results with a single short form field filled out by naive users that just plonked down three or five words that were on their mind et voila. They made it plausible that instead of imposing a structure onto data at creation time maybe it's more effective to discover associations in the data at search time (well, at indexing time really).
As for the strictness of documents, I'm not sure what it would give us that we don't get with sloppy documents. OK, web browsers could refuse to display a web page if any one image tag is missing the required `alt` attribute. So what happens then: will web authors duly include alt="picture of a cat" for each picture of a cat? Maybe, to a degree, but the other 80% of alt attributes will just contain some useless drivel to appease the browser. I'm actually more for strict documents than I used to be, but on the other hand we (I mean web browsers) have become quite good at reconstructing usable HTML documents from less-than-perfect sources, and the reconstructed source is also a strictly validating source. So I doubt this is the missing piece; I think the semantic web failed because the idea never was strong, clear, compelling, well-defined and rewarding enough to catch on with enough people.
If we're honest, we still don't know, 25 years later, what 'semantic' means after all.
Yes it did, and there are HTML 5.x DTDs for HTML versions newer than HTML 4.x at [1], including post-HTML 5.2 review drafts until 2023; see notes at [2].
That’s what lots of sites used to do in the late 90s and early aughts in order to have fixed elements.
It was really shit. Browser navigation cues disappear, minor errors will fuck up the entire thing by navigating fixed element frames instead of contents, design flexibility disappears (even as consistent styling requires more efforts), frames don’t content-size so will clip and show scroll bars all over, debugging is absolute ass, …
This was the rabbit hole that I started down in the late 90s and still haven’t come out of. I was the webmaster of the Analog Science Fiction website and I was building tons of static pages, each with the same header and side bar. It drove me nuts. So I did some research and found out about Apache server side includes. Woo hoo! Keeping it DRY (before I knew DRY was a thing).
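For anyone who never ran into them: Apache SSI directives live inside HTML comments, so a shared header is roughly (example path; mod_include has to be enabled)

  <!--#include virtual="/includes/header.html" -->

and the server splices the file's contents in before the page ever reaches the browser.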
Yeah, we’ve been solving this over and over in different ways. For those saying that iframes are good enough, they’re not. Iframes don’t expand to fit content. And server side solutions require a server. Why not have a simple client side method for this? I think it’s a valid question. Now that we’re fixing a lot of the irritation in web development, it seems worth considering.
Server-side includes FTW! When a buddy and I started making "web stuff" back in the mid-90s the idea of DRY also just made sense to us.
My dialup ISP back then didn't disable using .htaccess files in the web space they provided to end users. That meant I could turn on server-side includes! Later I figured out how to enable CGI. (I even went so far as to code rudimentary webshells in Perl just so I could explore the webserver box...)
this here is the main idea of HTMX - extended to work for any tag p, div, content, aside …
there are many examples of HTMX (since it is self-contained and tiny) being used alongside existing frameworks
of course for some of us, since HTMX brings dynamic UX to back end frameworks, it is a way of life https://harcstack.org (warning - raku code may hurt your eyes)
I used the seamless attribute extensively in the past, it still doesn't work the way GP intended, which is to fit in the layout flow, for example to take the full width provided by the parent, or automatically resize the height (the pain of years of my career)
It worked rather like a reverse shadow DOM, allowing CSS from the parent document to leak into the child, removing borders and other visual chrome that would make it distinguishable from the host, except you still had to use fixed CSS layouts and resize it with JS.
> The optimal solution would be using a template engine to generate static documents.
This helps the creator, but not the consumer, right? That is, if I visit 100 of your static documents created with a template engine, then I'll still be downloading some identical content 100 times.
I'll still be downloading some identical content 100 times.
That doesn't seem like a significant problem at all, on the consumer side.
What is this identical content across 100 different pages? Page header, footer, sidebar? The text content of those should be small relative to the unique page content, so who cares?
Usually most of the weight is images, scripts and CSS, and those don't need to be duplicated.
If the common text content is large for some reason, put the small dynamic part in an iframe, or swap it out with javascript.
If anyone has a genuine example of a site where redundant HTML content across multiple pages caused significant bloat, I'd be interested to hear about it.
I care! It is unnecessary complexity, and frankly ugly. If you can avoid repetition, then you should, even if the reason is not obvious.
To give you a concrete example, consider caching (or, equivalently, compiling) web pages. Maybe you have 100 articles, which share a common header and footer. If you make a change to the header, then all 100 articles have to be uncached/rebuilt. Why? Because somebody did not remove the duplication when they had the chance :-)
XSLT solved this problem. But it had poor tool support (DreamWeaver etc) and a bunch of anti-XML sentiment I assume as blowback from capital-E Enterprise stacks going insane with XML for everything.
XSLT did exactly what HTML includes could do and more. The user agent could cache stylesheets or if it wanted override a linked stylesheet (like with CSS) and transform the raw data any way it wanted.
> Woo hoo! Keeping it DRY (before I knew DRY was a thing)
I still remember the script I wrote to replace thousands (literally) of slightly different headers and footers in some large websites of the 90s. How liberating to finally have that.
You can message the page dimensions to the parent. To do it cross-domain you can load the same URL into the parent with the height in the #location hash. It won't refresh that way.
I know it’s possible to work around it, but that’s not the point. This is such a common use case that it seems worthwhile to pave the cowpath. We’ve paved a lot of cowpaths that are far less trodden than this one. This is practically a cow superhighway.
We’ve built an industry around solving this problem. What if, for some basic web publishing use cases, we could replace a complex web framework with one new tag?
> We’ve built an industry around solving this problem. What if, for some basic web publishing use cases, we could replace a complex web framework with one new tag?
I actually did that replacement, with a few enhancements (maybe 100 lines of code, total?). It's in arxiv pending at the moment. In about two days it will be done and I'll post a Show HN here.
> XHTML 2 takes a completely different approach, by taking the premise that all images have a long description and treating the image and the text as equivalents. In XHTML 2 any element may have a @src attribute, which specifies a resource (such as an image) to load instead of the element.
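Going by that quoted design, an image plus its long description would be something like (hypothetical file name)

  <p src="brick-house.jpg">A red brick house with a small front garden.</p>

where the image, if it loads, is shown in place of the element's content, and otherwise the text renders.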
The difference between "a line of JS" and a standardized declarative solution is of course that a meek "line of $turing_complete_language" can not, in the general case, be known and trusted to do what it purports to do, and nothing else; you've basically enabled any kind of computation, and any kind of behavior. With an include tag or attribute that's different; its behavior is described by standards, and (except for knowing what content we might be pulling in) we can 100% tell the effects from static analysis, that is, without executing the code. With "a line of JS" the only way, in the general case, to know what it does is to run it (an infinite number of times). Also, because it's not standardized, it's much harder to save to disk, to index and to archive it.
I mean in 1996s netscape you could do this (I run the server for a website that still uses this):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<frameset cols="1000, *">
<frame src="FRAMESET_navigation.html" name="navigation">
<frame src="FRAMESET_home.html" name="in">
</frameset>
</html>
The thing that always bugged me about frames is that they are too clever. I don't want to reload only the frame html when I rightclick and reload. Sure the idea was to cache those separately, but come on — frames and caching are meant to solve two different problems and by munching them together they somewhat sucked at solving either.
To me includes for HTML should work in the dumbest way possible. And that means: Take the text from the include and paste it where the include was and give the browser the resulting text.
If you want to cache a nav section separately because it appears the same on every page, let's add a cache attribute that solves the problem independently:
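Something along these lines, say (made-up syntax, nav.html is just an example):

  <include src="nav.html" cache="max-age=86400"></include>

The browser would paste the fetched text where the tag stands, and the cache attribute would only govern how long the fetched fragment can be reused across pages.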
I think of all the “hygienic macro” sorts of problems. You really ought to be able to transclude a chunk of HTML and the associated CSS into another document but you have to watch out for ‘id’ being unique never mind the same names being used for CSS classes. Figuring out the rendering intent for CSS could also be complicated: the guest CSS might be written like
.container .style { … }
Where the container is basically the whole guest document but you still want those rules to apply…. Maybe, you want the guest text to appear in the same font as the host document but you still want colors and font weights to apply. Maybe you want to make the colors muted to be consistent with the host document, maybe the background of the host document is different and the guest text isn’t contrasts enough anymore, etc.
HTML is a markup language, not a programming language. It's like asking why Markdown can't handle includes. Some Markdown editors support them (just like some server-side tools do for HTML), but not all.
Including another document is much closer to a markup operation than a programming operation. We already include styles, scripts, images, videos, fonts...why not document fragments?
Markdown can't do most of those, so it makes more sense why it doesn't have includes, but I'd still argue it definitely should. I generally dislike LaTeX, but about the only thing I liked about it when writing my thesis was that I could have each chapter in its own file and just include all of them in the main file.
This isn’t programming. It’s transclusion[0]. Essentially, iframes and images are already forms of transclusion, so why not transclude html and have the iframe expand to fit the content?
As I wrote that, I realized there could be cumulative layout shift, so that’s an argument against. To avoid that, the browser would have to download all transcluded content before rendering. In the past, this would have been a dealbreaker, but maybe it’s more feasible now with http multiplexing.
With Early Hints (HTTP code 103), it seems especially feasible. You can start downloading the included content one round-trip after the first byte is sent.
I'm not defending it, because when I started web development this was one of the first problems I ran into as well -- how the heck do you include a common header.
But the original concept of HTML was standalone documents, not websites with reusable components like headers and footers and navbars.
That being said, I still don't understand why then the frames monstrosity was invented, rather than a basic include. To save on bandwidth or something?
The original concept of HTML was as an SGML subset, and SGML had this functionality, precisely because it's very handy for document authoring to be able to share common snippets.
Frames were widely abused by early web apps to do dynamic interfaces before XHR was invented/widely supported. The "app" had a bunch of sub-frames with all the links and forms carefully pointing to different frames in the frameset.
A link in a sidebar frame would open a link in the "editor" frame which loaded a page with a normal HTML form. Submitting the form reloaded it in that same frame. Often the form would have multiple submit buttons, one to save edits in progress and another to submit the completed form and move to the next step. The current app state was maintained server side and validation was often handled there save for some basic formatting client side JavaScript could handle.
This setup allowed even the most primitive frame-supporting browsers to use CRUD web apps. IIRC early web frameworks like WebObjects leaned into that model of web app.
Oh my goodness, yes you're right, I'd forgotten entirely about those.
They were horrible -- you'd hit the back button and only one of the frames would go back and then the app would be in an inconsistent state... it was a mess!
You needed to hit the reset button (and hoped it worked) and never the back button! Yes, I suffered through early SAP web apps built entirely with frames and HTML forms. It was terrible.
I don't love JavaScript monstrosities but XHR and dynamic HTML were a vast improvement over HTML forms and frame/iframe abuse.
Really well written web form applications were a delight in 2001 and a large improvement over conventional applications written for Windows. It helped that application data was in a SQL database, with a schema, protected by transactions, etc as opposed to a tangle of pointers that would eventually go bad and crash the app -- I made very complicated forms for demographic profiling, scientific paper submission, application submission, document search, etc. If you did not use "session" variables for application state this could at worst cause a desynchronization between the browser and the server which (1) would get resynchronized at any load or reload and (2) never get the system into a "stuck" state from the user viewpoint and (3) never lose more than a screen full of work.
Try some other architecture though and all bets were off.
Amazon's web store looked and worked mostly the same as it does now, people were very impressed with MapQuest, etc.
Applications like that can feel really fast, almost desktop application fast, if you are running them on a powerful desktop computer and viewing them on another computer or tablet over a LAN
No, HTML is fundamentally different because (for a static site without any JS dom manipulation) it has all the semantic content, while stylesheets, images, objects, etc. are just about presentation.
I think the distinction is "semantic on what level/perspective?". An image packaged as a binary blob is semantically opaque until it is rendered. Meanwhile, seeing <img> in the HTML or the file extension .jpg in any context that displays file extensions tells me some information right out of the gate. And note that all three of these examples are different information: the HTML tag tells me it's an image, whereas the file extension tells me it's a JPEG image, and the image tells me what the image contains. HTML is an example of some kind of separation, as it can tell you some semantic meaning of the data without telling you all of it. Distinguishing and then actually separating semantics means data can be interpreted with different semantics, and we usually choose to focus on one alternative interpretation. Then I can say that HTML alone regards some semantics (e.g. there is an image here) while disregarding others (e.g. the image is an image of a brick house).
I'm not sure what isn't computing. Presumably you know (or have looked up) the meaning of "semantic"? Images and videos are graphic, not semantic, content. To the extent they are rendering semantic content, that content should be described in the alt tag.
The feature proposal was called HTML Imports [1], created as part of the Web Components effort.
> HTML Imports are a way to include and reuse HTML documents in other HTML documents
There were plans for <template> tag support and everything.
If I remember correctly, Google implemented the proposed spec in Blink but everyone else balked for various reasons. Mozilla was concerned with the complexity of the implementation and its security implications, as well as the overlap with ES6 modules. Without vendor support, the proposal was officially discontinued.
That matches with the comment [1] on the article, citing insufficient demand, no vendor enthusiasm, etc.
The thing is that all those are non-reasons that don't really explain anything: Low demand is hard to believe if this feature is requested for 20 years straight and there are all kinds of shim implementations using scripts, backend engines, etc. (And low demand didn't stop other features that the vendors were interested in for their own reasons)
Vendor refusal also doesn't explain why they refused it, even to the point of rolling back implementations that already existed.
So I'd be interested to understand the "various reasons" in more detail.
"Security implications" also seem odd as you already are perfectly able to import HTML cross origin using script tags. Why is importing a script that does document.write() fine, but a HTML tag that does exactly the same thing hugely problematic?
(I understand the security concern that you wouldn't want to allow something like "<import src=google.com>" and get an instant clone of the Google homepage. But that issue seems trivially solvable with CORS.)
There are various specs/semantics you can choose, which prescribe the implementation & required cross-cutting complexity. Security is only relevant in some of them.
To give you some idea:
- HTML load ordering is a pretty deeply held assumption. People understand JS can change those assumptions (document.write). Adding an obscure HTML tag that does so is going to be an endless parade of bugs & edge cases.
- To keep top-to-bottom fast we could define preload semantics (Dropping the linear req-reply, define client-cache update policy when the template changes, etc). Is that added complexity truly simpler than having the server combine templates?
- <iframe> exists
In other words, doing the simplest thing 75% of people want requires a few lines of code, either client side or server side.
To fit the other 25% (even to 'deny' it) is endlessly complex in ways few if any can oversee.
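The client-side flavour of those few lines is roughly this sketch (placeholder id and file name are made up; no error handling):

  <div id="nav"></div>
  <script>
    // fetch the shared fragment and drop it into the placeholder
    fetch('nav.html')
      .then(r => r.text())
      .then(html => { document.getElementById('nav').innerHTML = html; });
  </script>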
Maybe something that adds to this low demand is that:
1. Web pages that are developed from the viewpoint of the user having JS make it trivial to implement something that provides the same results.
2. Web pages that are developed for user agents that don't run js, probably want to have some interaction, so already have a server runtime that can provide this feature.
2b. And if it doesn't have any user interaction, it's probably a static content site, and nobody is writing content in HTML, so there already is a build step that provides this feature.
JS-first developers want something that works the same way client-side and server-side, and the mainstream front-end dev community shifted to JS-first, for better or worse
HTML Imports went in a similar direction but they do not do what the blog post is about. HTML should be imported and displayed in a specific place of the document. HTML Imports could not do this without JavaScript.
To be fair, it was pretty complicated. IIRC, using it required using Javascript to instantiate the template after importing it, rather than just having something like <include src="myinclude.html">.
As far as I'm aware of it, changing the SRC-attribute was quite crash-y and the functionality was stripped soon. (I remember playing with this in beta, and then it was gone in the production version.)
Ward Cunningham (inventor of the Wiki) spent some time trying to invent a transclusion-first wiki, where everyone had their own wiki-space and used transclusion socially https://en.wikipedia.org/wiki/Federated_Wiki
I think true transclusion would be more than that.
In Xanadu you could transclude just an excerpt from one document into another document.
If you wanted to do this with HTML you need an answer for the CSS. In any particular case you can solve it, making judgements about which attributes should be consistent between the host document, the guest document and the guest-embedded-in-host. The general case, however, is unclear.
For a straightforward <include ...> tag the guest document is engineered to live inside the CSS environment (descendant of the 3rd div child of a p that has class ".rodney") that the host puts it in.
Another straightforward answer is the Shadow DOM which, for the most part, lets the guest style itself without affecting the rest of the document. I think in that case the host can still put some styles in to patch the guest.
Isn't this what proper framesets (not iframes) were supposed to do a long time ago (HTML 4)? At least they autoexpanded just fine and the user could even adjust the size to their preference.
There was a lot of criticism for frames [1] but still they were successfully deployed for useful stuff like Java API documentation [2].
In my opinion the whole thing didn't stick around mostly because of too little flexibility for designers: framesets were probably good enough for useful information pages but, with their bulky scrollbars and limited number of subspaces on the screen, didn't account for all the designers' needs. Today it is too late to revive them because framesets as-is probably wouldn't work well on mobile...
The issue with framesets was way more fundamental: no deep linking. People coming via bookmarks or Google (or its predecessors) were left on a page without navigation, which people then tried working around with JavaScript, which never gave a good experience.
Nowadays it is sometimes the other way around: pages are all JavaScript, so there's no good experience in the first place. I have encountered difficulty trying to get a proper “link” to something multiple times. Also, given that browsers love to reduce/hide the address bar, I wonder if it is really still that important a feature.
Of course "back then" this was an important feature and one of the reasons for getting rid of frames :)
"Includes" functionality is considered to be server-side, i.e. handled outside of the web browser. HTML is client-side, and really just a markup syntax, not a programming language.
As the article says, the problem is a solved one. The "includes" issue is how every web design student learns about PHP. In most CMSes, "includes" become "template partials" and are one of the first things explained in the documentation.
There really isn't any need to make includes available through just HTML. HTML is a presentation format and doesn't do anything interesting without CSS and JS anyway.
> "Includes" functionality is considered to be server-side, i.e. handled outside of the web browser. HTML is client-side, and really just a markup syntax, not a programming language.
That's not an argument that client-side includes shouldn't happen. In fact HTML already has worse versions of this via frames and iframes. A client-side equivalent of a server-side include fits naturally into what people do with HTML.
I think it feels off because an HTML file can include scripts, fonts, images, videos, styles, and probably a few other things. But not HTML. It can probably be coded with a custom element (<include src=.../>). I would be surprised if there wasn't a github repo with something similar.
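A minimal sketch of such an element (hypothetical tag name html-include, since custom element names need a hyphen; no caching or error handling):

  <script>
    class HTMLInclude extends HTMLElement {
      async connectedCallback() {
        // fetch the referenced fragment and swap this element for it
        const resp = await fetch(this.getAttribute('src'));
        this.outerHTML = await resp.text();
      }
    }
    customElements.define('html-include', HTMLInclude);
  </script>
  <html-include src="header.html"></html-include>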
It stands for "markup language", and was inherited from SGML, which had includes. Strictly speaking, so did early HTML (since it was just an SGML subset), it's just that browsers didn't bother implementing it, for the most part. So it's not that it didn't evolve, but rather it devolved.
Nor is this something unique to SGML. XML is also a "markup language", yet XInclude is a thing.
That's why I joked about flamebait, it's hypertext though, aren't anchors essentially a goToURL() click handler in some ways? Template partials seem like a basic part of this system.
> considered to be server-side
Good point! Wouldn't fetching a template partial happen the same way (like fetching an image?)
Agree with what you said, however, HTML is a document description language and not a presentation format. CSS is for presentation (assuming you meant styling).
HTML is a markup language that identifies the functional role of bits of text. In that sense, it is there to provide information about how to present the text, and is thus a presentation format.
It is also a document description language, because almost all document description languages are also a presentation format.
> As the article says, the problem is a solved one.
It's "solved" only in the sense that you need to use a programming language on the server to "solve" it. If all you are doing is static pages, it's most definitely not solved.
Then you just pre-build the page before publishing it. It's way cheaper as you do the work once, instead of every client being much slower because they have to do additional requests.
> Then you just pre-build the page before publishing it.
That's "using a programming language to solve the problem", isn't it?
> It's way cheaper as you do the work once, instead of every client being much slower because they have to do additional requests.
What work do client-side includes have to do other than fetching the page (which will get cached anyway)? It's less work to have a `<include-remote ...>` builtin than even a simple Makefile on the server.
It does not have to be a programming language on the server, no, unless you want to. The server can have a static site, that you build as you deploy it.
Fetching another resource is expensive. It's another round trip, and depending on many factors it could be another second to load the page. And if the HTML includes other nested HTML then it can be much slower.
This is the exact thing we try to avoid when building websites that perform well. You want as few chained requests as possible, and you want the browser to be aware of them as soon as possible, with the correct priority. That way the browser can get the important stuff needed to display content fast.
Including HTML client side for templating is just wasteful, slow and dumb from a technical standpoint.
Every client would have to do another request for each include. It would literally be many thousands of times slower (or worse) than doing it locally, where the templates can be in memory as you pre-render the pages. You also save a ton of CPU cycles and bandwidth by not serving more files with additional overhead like headers.
> It would literally be many thousands of times slower (or worse) than doing it locally, where the templates can be in memory as you pre-render the pages.
Yeah, it's not. I'm doing client side includes and the includes get cached by the browser. I'm sure I would have noticed if my pages went from 1s to display to 1000s to display.
If you have a site/webapp with (say) twenty pages, that's only two extra requests for both header and footer.
By "whole 'build' process", do you think something like a makefile or do you think something more advanced is required?
One drawback though would be that one indeed would have to maintain dependencies, which would be error prone beyond simply adding headers and footers... I wonder if one could (ab)use CPP [1] and its -M option to do that.
Well, that very much depends on your definition of slow, doesn't it?
An additional request is another round trip. That can be very slow. Average TTFB on the internet in the US is ~0.7 seconds.
It's much faster to send it as part of the same request as you then don't have to wait for the browser to discover it, request it, wait for the response and then add it.
A build process does not have to be complicated, at all. If you can write HTML, then a tool that simply reads the include syntax you wish existed and swaps it for the contents of the specified file is trivial.
Ofc, the idea has many other issues, like how to handle dependencies of the included HTML, how to handle conflicts, what path to use, and many more.
> "Includes" functionality is considered to be server-side
Exactly! Include makes perfect sense on server-side.
But client-side include means that the client should be able to modify original DOM at unknown moment of time. Options are
1. at HTML parse time (before even DOM is generated). This requires synchronous request to server for the inclusion. Not desirable.
2. after DOM creation: <include src=""> (or whatever) needs to appear in the DOM, chunk loaded asynchronously and then the <include> DOM element(sic!) needs to be replaced(or how?) by external fragment. This disables any existing DOM structure validation mechanism.
Having said that...
I've implemented <include> in my Sciter engine using strategy #1. It works there as HTML in Sciter usually comes from local app resources / file system where price of issuing additional "get chunk" request is negligible.
There are all kinds of issues with HTML includes, as others have pointed out.
If main.html includes child/include1.html and child/include1.html has a link src="include2.html" then when the user clicks the link where does it go? If it goes to "include2.html", which by the name was meant to be included, then that page is going to be missing everything else. If it goes to main.html, how does it specify this time, use include2.html, not include1.html?
You could do the opposite: you can have article1.html, article2.html, article3.html etc., each including header.html, footer.html, navi.html. Ok, that works, but now you've made it so that a global change to the structure of your articles requires editing all articles. In other words, if you want to add comments.html to every article you have to edit all articles, and you're back to wanting to generate pages from articles based on some template, at which point you don't need the browser to support include.
I also suspect there would be other issues, like the header wants to know the title, or the footer wants a next/prev link, which now require some way to communicate this info between includes and you're basically back to generate the pages and include not being a solution
I think if you work though the issues you'll find an HTML include would be practically useless for most use cases.
These are all solvable issues with fairly obvious solutions. For example:
> If main.html includes child/include1.html and child/include1.html has a link src="include2.html" then when the user clicks the link where does it go? If it goes to "include2.html", which by the name was meant to be included, then that page is going to be missing everything else. If it goes to main.html, how does it specify this time, use include2.html, not include1.html?
There are two distinct use cases here: snippet reuse and embeddable self-contained islands. But the latter is already handled by iframes (the behavior being your latter case). So we only need to do the former.
> These are all solvable issues with fairly obvious solutions.
No, they are a can of worms and decades of arguments and incompatibilities and versioning
> But the latter is already handled by iframes
iframes don't handle this case because the page can not adjust to the iframe's content. There have been proposals to fix this but they always run into issues.
So, HTML did have includes and they fell out of favor.
The actual term include is an XML feature and it’s that feature the article is hoping for. HTML had an alternate approach that came into existence before XML. That approach was frames. Frames did much more than XML includes and so HTML never gained that feature. Frames lost favor due to misuse, security, accessibility, and variety of other concerns.
Unlike Framesets I think XML includes were never really supported in many browsers (or even any major browsers)?
I still like to use them occasionally but it incurs a "compilation" step to evaluate them prior to handing the result of this compilation to the users/browsers.
As it happens, the major browsers still can do XML 'includes' to some extent, since by some miracle they haven't torn out their support for XSLT 1.0. E.g. this outputs "FizzBuzz" on Firefox:
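Something along these lines (made-up file names): an XML page whose stylesheet pulls half of the text out of a second file via document().

  <!-- page.xml -->
  <?xml-stylesheet type="text/xsl" href="page.xsl"?>
  <page>Fizz</page>

  <!-- buzz.xml -->
  <word>Buzz</word>

  <!-- page.xsl -->
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>
    <xsl:template match="/page">
      <!-- local text plus text pulled in from the second document -->
      <p><xsl:value-of select="."/><xsl:value-of select="document('buzz.xml')/word"/></p>
    </xsl:template>
  </xsl:stylesheet>

Serve the three files from the same origin and the page renders as a paragraph reading FizzBuzz.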
Yep, and this can be used to e.g. make a basically static site template and then do an include for `userdata.xml` to decorate your page with the logged in user's info (e.g. on HN, adding your username in the top right, highlighting your comments and showing the edit/delete buttons, etc.). You can for example include into a variable `<xsl:variable name="myinfo" select="document('userdata.xml')"/>` and then use it in xpath expressions like `$myinfo/user/@id`. Extremely simple, good for caching, lightweight, very high performance. Easy to fail gracefully to the logged out template. You basically get your data "API" for free since you're returning XML in your data model. I will never understand why it didn't take off.
XML includes are blocking because XSL support hasn't been updated for 25 years, but there's no reason why we couldn't have it async by now if resources were devoted to this instead of webusb etc.
You'd better not jinx it: XSL support seems like just the sort of thing browser devs would want to tear out in the name of reducing attack surface. They already dislike the better-known SVG and never add any new features to it. I often worry that the status quo persists only because they haven't really thought about it in the last 20 years.
I’ve used XSLT in anger - I used it to build Excel worksheets (in XML format) using libXSLT. I found it very verbose and hard to read. And Xpath is pretty torturous.
I wish I could have used Javascript. I wish Office objects were halfway as easy to compose as the DOM. I know a lot of people hate on Javascript and the DOM, but it’s way easier to work with than the alternatives.
I know it’s not straight HTML, but SSI (server side includes) helped with this and back in the day made for some incredibly powerful caching solutions. You could write out chunks of your site statically and periodically refresh them in the server side, while benefitting from serving static content to your users. (This was in the pre varnish era, and before everyone was using memcached)
I personally used this to great success on a couple of Premier League football club websites around the mid 2000s.
One benefit of doing it on the client is the client can cache the result of an include. So for example, instead of having to download the content of a header and footer for every page, it is just downloaded once and re-used for future pages.
How big are your headers and footers, really? Is caching them worth the extra complexity on the client, plus all the pain of cache invalidation (and the two extra requests in the non-cached case)?
I’m willing to bet the runtime overhead of assembly on the client is going to be larger than the download cost of the fragments being included server or edge side and cached
If you measure download cost in time then sure.. If you measure download cost in terms of bytes downloaded, or server costs, then nope. The cost would be smaller to cache.
Not necessarily, compression is really effective at reducing downloaded bytes
In server terms, the overhead of tracking one download is going to be less than the overhead of tracking the downloads of the multiple components.
And for client side caching to be any use then a visitor would need to view more than one page and the harsh reality is many sessions are only one page long e.g. news sites, blogs etc
I'm a full stack developer. I do server side rendering. I agree that this is a 'solved problem' for that case. However there are many times I don't want to run a server or a static site generator. I manage a lot of projects. I don't want more build steps than necessary. I just want to put some HTML on the net with some basic includes, without JavaScript. But currently I would go the web component route and accept the extra JS.
This is just my own understanding, but doesn't a webpage consist of a bunch of nodes, which can be combined in any way? And an HTML document is supposed to be a complete set of nodes, so a combination of those won't be a single document anymore.
Nodes can be addressed individually, but a document is the unit of transmission, which also contains metadata. You can combine nodes as you like, but you can't really combine two already packed and annotated documents of nodes.
So I would say it is more due to semantic meaning. I think there was also the idea of requesting arbitrary sets of nodes, but that was never developed, and with the shift away from a semantic document, it didn't make sense anymore.
I think the quickest way to say it is that there is only one head on a page, and every HTML file needs a head. So if you include one into the other, you either have two heads, or the inner document didn't have a head.
I wasn't talking about the nodes in the DOM. I meant the minimal annotated information snippets, that the WWW is supposed to consist of, as opposed to the minimum addressable units.
At least some of the blame here is the bias towards HTML being something that is dynamically generated by code, as opposed to something that is statically handwritten by many people.
There are features that would be good for the latter that have been removed. For example, if you need to embed HTML code examples, you can use the <xmp> tag, which makes it so you don't need to encode escapes. Sadly, the HTML5 spec is trying to obsolete the <xmp> tag even though it's the only way to make this work. All browsers seem to be supporting it anyways, but once it is removed you will always have to encode the examples.
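For example (arbitrary markup), the contents of an xmp block render literally, angle brackets and all:

  <xmp>
    <p>This <em>markup</em> is displayed as-is.</p>
  </xmp>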
HTML spec developers should be more careful to consider people hand coding HTML when designing specifications, or at least avoid decisions that will require JavaScript to accomplish something it probably shouldn't be needed for.
It's the other way around, HTML was designed to be hand written, and the feature set was defined at that stage. If it ended up being dynamically generated, that happened after the feature set was defined.
The article asks about includes but also about imports ("HTML cannot import HTML"), which this very directly is.
This feature was billed as #includes for the web [1]. No, it acts nothing like an #include. TBH I don't see why ES modules are a "replacement" here.
Personally I would like to see something like these imports come back, as a way to reuse HTML structure across pages, BUT purely declaratively (no JS needed).
#includes of partially formed HTML (i.e., header.html has an opening <body> tag and footer.html has the closing tag) aren't very DOM compatible.
I don't think you even need to wrap it, really. You need to make sure it's valid XML, but the root element could be <html> just fine. And then use an identity transform with <xsl:output method="html">.
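The identity transform in question is the standard one, roughly:

  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>
    <!-- copy every node and attribute through unchanged, serializing as HTML -->
    <xsl:template match="@*|node()">
      <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
    </xsl:template>
  </xsl:stylesheet>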
The reason is simple, HTML is not a hypertext markup language. Markup is the process of adding commentary and other information on top of an existing document, and HTML is ironically incapable of doing the one thing it most definitely should be able to do.
It's so bad that if you want to discuss marking up hypertext (i.e. putting notes on top of existing read-only text files, etc.) you'll have to Google the word "annotation" to even start to get close.
Along with C macros, Case Sensitivity, Null terminated strings, unauthenticated email, ambient authority operating systems, HTML is one of the major mistakes of computing.
We should have had the Memex at least a decade ago, and we've got this crap instead. 8(
Kind of serious question. Do we have any alternatives to html? If not, why? It’s essentially all html. Yes, browser will render svg/pdf/md and so on, but as far as I can tell, it’s not what I consider "real web" (links to other documents, support for styling, shared resources, scripting, and so on ).
I would have loved for there to be a json based format, or perhaps yaml, as an alternative to the xml- based stuff we have today.
You have to give an iframe a specific height in pixels. There is no “make this iframe the height its content wants to be” (like normal HTML).
This leads to two options:
- your page has nested vertical scroll bars (awful UX)
- you have to write JavaScript inside and outside the frame to constantly measure and communicate how tall the frame wants to be (roughly as sketched below).
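A sketch of that second option with postMessage (element id and file name are made up; real code should check the message origin):

  <!-- child page, inside the iframe -->
  <script>
    // report this document's height to the parent whenever it changes
    new ResizeObserver(() => {
      parent.postMessage({ frameHeight: document.documentElement.scrollHeight }, '*');
    }).observe(document.documentElement);
  </script>

  <!-- parent page -->
  <iframe id="inc" src="header.html" style="width:100%;border:0"></iframe>
  <script>
    window.addEventListener('message', (e) => {
      if (e.data && e.data.frameHeight) {
        document.getElementById('inc').style.height = e.data.frameHeight + 'px';
      }
    });
  </script>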
I guess, the best you could do is emulating a frameset layout with a fixed navigation and a display frame for the actual content. (By setting the overflow to `hidden` you can get rid of the outer scrollbars.)
If I really need HTML includes for some reason, I'd reach for XSLT. I know it's old, and barely maintained at best, but that was the layer intentionally added to bring programming language features to the markup language that is HTML.
My main gripe is a decade(s?) old Firefox bug related to rendering an HTML string to the DOM.
That may be a fairly specific use case though, and largely it still works great today. I've done a few side projects with XSLT and web components for interactivity, worked great.
This bug is specifically about <xsl:text disable-output-escaping="yes"> not working in Firefox. How is disabling output escaping relevant in regards to sharing templates between pages?
> The only combination that fails to render these entities correctly is Firefox/XSLT.
Which is one good reason not to adopt XSLT to implement HTML includes. You just don't know what snags you'll hit upon but you can be sure you'll be on your own.
> Bug 98168 (doe) Opened 24 years ago Updated 21 days ago
Well it does look like someone's still mulling over whether and how to fix it... 24 years later...
I think XSLT is still a reasonable technology in itself - the lack of updated implementations is the bad part. I think modern browsers only support 1.0 (?). At least most modern programming languages should have 3.0 support.
Firefox has a very old bug related to rendering an HTML string to the DOM without escaping it, that one has bit me a few times. Nothing a tiny inline script can't fix, but its frustrating to have such a basic feature fail.
Debugging is also pretty painful, or I at least haven't found a good dev setup for it.
That said, I'm happy to reach for XSLT when it makes sense. It's pretty amazing what can be done with such old tech; for the core use case of props and templates to HTML you really don't need React.
If you want to include HTML sandboxes, we have iframes. If you want it served from the server, it's just text. Putting text A inside text B is a solved problem.
> Putting text A inside text B is a solved problem.
Yes, but in regards to HTML it hasn't been solved in a standard way; it's been solved in hundreds, if not thousands, of non-standard ways. The point of the article is that having one standard way could reduce a lot of complexity in the ecosystem, as ES6 imports did.
> We’ve got <iframe>, which technically is a pure HTML solution, but they are bad for overall performance, accessibility, and generally extremely awkward here
What does this mean? This is a pure HTML solution, not just "technically" but in reality. (And before iframe there were frames and frameset). Just because the author doesn't like them don't make them non-existent.
An iframe is a window into another webpage, and is bounded as such both visually and in terms of DOM interfaces. A simple example would be that an iframe header can't have drop-down menus that overlap content from the page hosting it.
They are categorically not the same DX/UX as SSI et al. and it's absolutely bizarre to me that there's so many comments making this complaint.
The real problem with iframes is that their size is set by the parent document only.
They would be a lot more useful if we could write e.g. <iframe src=abc.html height=auto width=100> so the height of the iframe element is set by the abc.html document instead of the parent document.
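Until something like that exists, the usual workaround is the postMessage dance; a rough sketch (abc.html and the element id are placeholders): the framed page reports its height, and the parent applies it.

<!-- inside abc.html -->
<script>
  // report the document height to the embedding page whenever it changes
  new ResizeObserver(() => {
    parent.postMessage({ frameHeight: document.documentElement.scrollHeight }, "*");
  }).observe(document.documentElement);
</script>

<!-- in the parent page -->
<iframe id="abc" src="abc.html" style="width: 100px; border: 0;"></iframe>
<script>
  window.addEventListener("message", (e) => {
    if (e.data && e.data.frameHeight) {
      document.getElementById("abc").style.height = e.data.frameHeight + "px";
    }
  });
</script>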
I'm not an expert on this, but IMO, from a language point of view, HTML is a markup language: it 'must' have no logic or processing. It is there to structure the information, not to dynamically change it, nor even to display it nicely.
The logic is performed elsewhere. If you were to have includes directly in HTML, it means that browsers must implement logic for HTML. So it is not 'just' a parser anymore.
Imagine for example that I create an infinite loop of includes, who is responsible to limit me? How to ensure that all other browsers implement it in the same way?
What happens if I perform an injection from another website? Then we start to have CORS policy management to write. (iframes were bad for this)
Now imagine using Javascript I inject an include somewhere, should the website reload in some way? So we have a dynamic DOM in HTML?
I understand your point, but I still think it is bad from the point of view of language paradigms[1]. Iframes should not have been created in the first place. You are changing the purpose of the language when it was not made for that.
(yes in my view I interpret includes as a basic procedure)
There is a very, very broad line in that "no logic or processing". HTML/CSS already do a lot of logic and processing. And many "markup languages" have include support. Like wikitext used in wikipedia and includes in Asciidoc.
* the actual existence of frames (although those are deprecated)
* iframes (which are not deprecated, so seemingly doing declarative inclusion of HTML in HTML was not what was wrong with frames)
* imports in CSS, which share some of the same problems / concerns as HTML imports
* the existence of JavaScript, with its ability to change anything on the page, including the ability to issue HTTP requests, and to be written in arbitrarily obfuscated ways.
I 100% agree with the sentiment of this article. For my personal website, I write pretty much every page by hand, and I have a header and a footer on most of those pages. I certainly don't want to have to update every single page every time I want to add a new navigation button to the top of the page. For a while I used PHP, but I was running a PHP server literally for only this feature. I eventually switched to JavaScript, but likewise, on a majority of my pages, this was the only JavaScript I had, and I wanted to have a "pure" HTML page for a multitude of reasons.
In the end, I settled on using a Caddy directive to do it. It still feels like a tacked on solution, but this is about as pure as I can get to just automatically "pasting" in the code, as described in the article.
I'd say in 80% of the cases a pure, static html include is not enough. In a menu include, you want to disable the link to the currently shown page or show a page specific breadcrumb.
In a footer include, you may want a dynamic "last updated" timestamp or the current year in the copyright notice.
As all these use cases required a server-side scripting language anyway, there was no push behind an html include.
Initially HTML was less about the presentation layer and more about the "document" concept. Documents should be self-contained, outside of references to other documents.
One document == one HTML page was never the idea. Documents are often way too long to comfortably read and navigate that way. Breaking them into sections and linking between them was part of the core idea of HTML.
Includes are a standard part of many document systems. Headers and footers are a perfect example - if I update a document I certainly don't want to update the document revision number on every single page! It also allows you to add navigation between documents in a way that is easy to maintain.
LaTeX can do it. Microsoft Word can do it (in a typically horrible Microsoftian way). Why not HTML?
I still think this is the best web. Either you are a collection of interlinked documents and forms (manual pages, wikis, ...), or you are a full application (Figma, Gmail, Google Docs). But a lot of sites are trying to be both, and some are trying to be one while they are really the other type.
> Our developer brains scream at us to ensure that we’re not copying the exact code three times, we’re creating the header once then “including” it on the three (or a thousand) other pages.
Interesting, my brain is not this way: I want to send a minimum number of files per link requested. I don't care if I include the same text because the web is generally slow and it's generally caused by a zillion files sent and a ton of JS.
We discussed this back when creating web components, but the focus quickly became about SPA applications instead of MPAs and the demand for features like this was low in that space.
I wish I had advocated more for it though. I think it would be pretty easy to add using a new attribute on <script>, since the parser already pauses there, so something like <script transclude={url}> would likely not be too difficult.
I’m not saying my first website was impressive — but as a programmer there’s no way I was copying and pasting the same header / footer stuff into each page and quickly found “shtml” and used that as much as possible.
Then used the integrated FTP support in whatever editor it was (“HTML-kit” I think it was called?) - to upload it straight to prod. Like a true professional cowboy.
On topic: what's the absolute minimal static site generator that can achieve this feature? I know things like Pelican can do it but it's pretty heavy. C preprocessor probably can be used for this...
This seems to be forgetting the need to render other site's content. That's the main reason for iframes to be used, as people need to render ads, email previews, games, and so forth, without potentially breaking the rest of the page.
The "extremely awkward" aspect they complain about is a side effect of needing to handle that case.
You could add some nicer way to include content for the same domain, but I suspect having two highly similar HTML features would be fairly awkward in practice, as you'd have to create a whole new set of security rules for it.
We used to have this in the form of a pair of HTML tags: <frameset> and <frame> (not to be confused with the totally separate <iframe>!). <frameset> provided the scaffolding with slots for multiple frames, letting you easily create a page made up entirely of subpages. It was once popular and, in many ways, worked quite neatly. It let you define static elements once entirely client-side (and without JS!), and reload only the necessary parts of the page - long before AJAX was a thing. You could even update multiple frames at once when needed.
From what I remember, the main problem was that it broke URLs: you could only link to the initial state of the page, and navigating around the site wouldn't update the address bar - so deep linking wasn’t possible (early JavaScript SPA frameworks had the same issue, BTW). Another related problem was that each subframe had to be a full HTML document, so they did have their own individual URLs. These would get indexed by search engines, and users could end up on isolated subframe documents without the surrounding context the site creator intended - like just the footer, or the article content without any navigation.
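For readers who never used it, a sketch of the pattern (file names invented): the frameset defines the scaffolding once, and links in one frame target another frame by name, so only that frame reloads.

<frameset cols="200,*">
  <frame src="nav.html" name="nav">
  <frame src="content.html" name="content">
  <noframes><body>This site needs frame support.</body></noframes>
</frameset>

<!-- in nav.html: clicking this reloads only the "content" frame -->
<a href="article2.html" target="content">Article 2</a>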
In the nineties we fixed it with frames or CGI. I still think of it as one of those “if it was fiction it would be unrealistic” things (although, who writes fictional markup standards?)
SSI is still a thing: I use it on my personal website. It isn't really part of the HTML, though: it's a server-dependent extension to HTML. It's supported by Apache and nginx, but not by every server, so you have to have control over the server stack, not just access to the documents.
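For reference, the directive is just an HTML comment, so it degrades to nothing on servers that don't process it; on Apache that typically means enabling Options +Includes plus an INCLUDES output filter for the file extension, and on nginx it's ssi on;. Paths below are illustrative.

<!--#include virtual="/includes/header.html" -->
<main>
  <h1>Page content</h1>
</main>
<!--#include virtual="/includes/footer.html" -->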
Originally, iframes were the solution, as the post mentions. By the time iframes became unfashionable, nobody was writing HTML with their bare hands anymore. Since then, people use a myriad of other tools and, as also mentioned, they all have a way to fix this.
So the only group who would benefit from a better iframe is the group of people who don't use any tools and write their HTML with their bare hands in 2025. That is an astonishingly small group. Even if you use a script to convert markdown files to blog posts, you already fall outside of it.
No-one needs it, so the iframe does not get reinvented.
No, originally frameset[0] and frame[1] were the solution to this problem. I remember building a website in the late 1990s with frameset. iframe came later, and basically allowed you to do frames without the frameset. Anyway, frameset is also the reason every browser's user agent starts with "Mozilla".
what if it could be a larger group though? modern css has been advancing rather rapidly... I don't even need a preprocessing library any more... I've got nested rules, variables, even some light data handling... why not start beefing up html too? we've got some new features but includes would be killer
I think it’s because it would be so easy to make a recursive page that includes itself forever. So you have to have rules when it’s okay, and that’s more complex and opaque than just programming it yourself.
It's a pity: of all the web's advancements (JS, CSS, runtimes, web engines), HTML was the most stagnant aspect, despite the "HTML5" effing hype. My guess is they did not want to empower HTML and threaten SSRs or similar solutions. I believe the biggest argument against making a step is the damned backward compatibility; some just won't budge.
HTML5 hype started strong out of the gate because of the video and audio tags, and canvas slightly after. Those HTML tags were worth the hype.
Flash's reputation was quite low at the time and people were ready to finally move on from plugins being required on the web. (Though the "battle" then shifted to open vs. closed codecs.)
At the end of the day it’s not something trivial to implement at the HTML spec/parser level.
For relative links, how should the page doing the import handle them?
Do nothing and let it break, convert to absolute links, or remap it as a new relative link?
Should the include be done synchronously or asynchronously?
The big benefit of traditional server-side includes is that they are synchronous, which simplifies logic for in-page JavaScript; but all browsers are trying to eliminate synchronous calls for speed, so it's hard to see them agreeing to add a new synchronous bottleneck.
Should it be CORS restricted? If it is then it blocks offline use (file:// protocol) which really kills its utility.
There are a lot of hurdles to it and it’s hard to get people to agree on the exact implementation, it might be best to leave it to JavaScript libraries.
Someone else made the same thing - https://github.com/Paul-Browne/HTMLInclude - but it hasn't been updated in 7 years, leaving questions. I'll try yours and theirs in due course. Err, and the fragment approach @HumanOstrich mentioned elsewhere in the comments.
My guess is that some or maybe all of your concerns have already been worked through for CSS @import (https://developer.mozilla.org/en-US/docs/Web/CSS/@import), although, as I'm reading the first few lines of the linked article, those must appear near the top of a CSS file, so they are significantly more restricted than an import that could appear in the middle of a document.
So glad I decided early in my career to not do webpages. Look how much discussion this minor feature has generated. I did make infra tools that outputted basic html, get post cgi type of stuff. What's funny is this stuff was deployed right before AWS was launched and a year later the on prem infra was sold and the warehouse services were moved to the cloud.
You and me both. I did some web dev back in the early days, and noped out when IE was dragging everyone down with its refusal to change. I have never had a reason to regret that decision.
Honestly, HTML can already include CSS and JavaScript via link, style, and script tags. There's no reason for it not to have an <include src="" /> tag, and let the browser parsing it fetch the content to replace it.
Chris is an absolute legend in this space and I’m so glad he’s bringing this up. I feel like he might actually have pull here and start good discussions that might have actual solutions.
HTML frames solved this problem just fine, but they were deprecated in favour of using AJAX to replace portions of the body as you navigate (e.g. SPAs).
I still feel like frames were great for their use case.
> I don't think we should do this. The user experience is much better if such inclusion is done server-side ahead of time, instead of at runtime. Otherwise, you can emulate it with JavaScript, if you value developer convenience more than user experience.
The "user experience" problem he's describing is a performance problem, adding an extra round-trip to the server to fetch the included HTML. If you request "/blog/article.html", and the article includes "/blog/header.html", you'll have to do another request to the server to fetch the header.
It would also prevent streaming parsing and rendering, where the browser can parse and render HTML bit-by-bit as it streams in from the server.
Before you say, "so, what's the big deal with adding another round trip and breaking the streaming parser?" go ahead and read through the hundreds of comments on that thread. "What's the big deal" has not convinced browser devs for at least eight years, so, pick another argument.
I think there is a narrow opening, where some noble volunteer would spec out a streaming document-fragment parser.
It would involve a lot of complicated technical specification detail. I know a thing or two about browser implementation and specification writing, and designing a streaming document-fragment parser is far, far beyond my ken.
But, if you care about this, that's where you'd start. Good luck!
P.S. There is another option available to you: it is kinda possible to do client-side includes using a service worker. A service worker is a client-side proxy server that the browser will talk to when requesting documents; the service worker can fetch document fragments and merge them together (even streaming fragments!) with just a bit of JS.
But that option kinda sucks as a developer experience, because the service worker doesn't work the first time a user visits your site, so you'd have to implement server-side includes and also serve up document fragments, just for second-time visitors who already have the header cached.
Still, if all you want is to return a fast cached header while the body of your page loads, service workers are a fantastic solution to that problem.
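A very rough sketch of that service-worker approach, assuming pages are also published as body fragments under /fragments/ (all paths hypothetical); this shows only the stitching idea, not production code.

// sw.js
self.addEventListener("fetch", (event) => {
  const url = new URL(event.request.url);
  if (event.request.mode === "navigate" && url.pathname.startsWith("/blog/")) {
    event.respondWith((async () => {
      // serve the header from cache if available, then concatenate it with the
      // page body (a real version could stream instead of buffering everything)
      const header = (await caches.match("/fragments/header.html")) ||
                     (await fetch("/fragments/header.html"));
      const body = await fetch("/fragments" + url.pathname);
      const html = (await header.text()) + (await body.text());
      return new Response(html, { headers: { "Content-Type": "text/html" } });
    })());
  }
});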
I desperately want to go back to crafting sites by hand and not reach for React/Vue as a default. I do a lot of static and temporary sites that do very little.
I would guess back in the days having extra requests was expensive, thus discouraged. Later there were attempts via xinclude, but by then PHP and similar took over or people tolerated frames.
SVG use element can do exactly what the OP desires. SVGs can be inlined in html and html can be inlined in SVG too. I never understand why web devs learn html and then stop there instead of also learning svg which looks just like html, but with a lot more power.
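For example (file and fragment id invented), a fragment defined in an external same-origin SVG file can be reused like this in modern browsers:

<!-- assets.svg defines a reusable group with id="logo" -->
<svg width="120" height="40" role="img" aria-label="Site logo">
  <use href="assets.svg#logo"></use>
</svg>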
I still use server side includes. It is absolutely the best ratio of templating power to attack surface. SSI basically hasn't changed in the last 20 years and is solid in apache, nginx, etc. You can avoid all the static site generator stuff and just write pure .html files.
It should not have gone away. It never did for me.
Also, this is kind of what 'frames' were and how they were used. Everything old is new again.
Why include this stuff just to load more at once? Ever since CGI, the whole web has worked on asynchronously loading just what you actually need; these days most of that is fetch or XHR calls. I can see it making sense for a one-pager to keep the markup a bit more structured, but why would you want to build statically rendered homepages these days?
A frame is a separate rendering context—it's (almost) as heavyweight as a new tab. The author wants to insert content from another file directly into the existing DOM, merging the two documents completely.
The simplest answer is that HTML wasn't designed as a presentation language, but a hypertext document language. CSS and Javascripts were add-ons after the fact. Images weren't even in the first version. Once usage of the web grew beyond the initial vision, solutions like server-side includes and server-side languages that rendered HTML were sufficient.
I think the best examples of HTML in that regard is HTML-rendered info pages[0], for Emacs and its ecosystem. Then you have the same content presented in HTML [1]. Templates were enough in the first case. Includes are better in the second case due to common assets external to the content.
That's the first thing listed in the article? "Javascript to go fetch the HTML and insert it". What they're after is something that's _just_ HTML and not another language.
While you do need a server, I think this is the functional equivalent? The fetch-JS-and-insert approach outlined (linked to) in the article is async. This blocks execution like you'd expect an HTML include to do. It's WAY easier to reason about - which is the initial ask, I think...
You need a server for HTML to work, as a practical matter. But yes. There IS a workaround to that too, if you're REALLY determined, but you have to format your HTML as a giant JS comment block (lol, really :))
[edit: I'm sure there are still some file:// workflows for docs - and yes this doesn't address that]
After researching this very topic earlier; SSI is the most pragmatic solution. Check out Caddy's Template Language (based on Go), it is quite capable and quite similar to building themes in Hugo. Just much more bare bones.
I have built several sites with pure HTML+CSS, sprinkled with some light SSI with Caddy, and it is rock solid and very performant!
Lots of rationalization in here—it's always been needed. I complained about the lack of <include src="..."> when building my first site in '94/95, with simpletext and/or notepad!
It was not in the early spec, and seems someone powerful wouldn't allow it in later. So everyone else made work arounds, in any way they could. Resulting in the need being lessened quite a bit.
My current best workaround is the <object data=".."> tag, which has a few better defaults than iframe. If you put a link to the same stylesheet in the include file it will match pretty well. Size with width=100%, though with height you'll need to eyeball or use javascript.
I think this is a genuinely good question that I was also wondering some time ago.
And it is a genuinely good question!
I think the answer of PD says feels the truest.
JS/CSS with all their bureaucracy are nothing compared to HTML, it seems. Maybe people don't find anything wrong with HTML, or if they do, they just reach for JS/CSS and try to fix it there (ahem, frontend frameworks).
That being said, I have just regurgitated what PD says has said, and I give them full credit for that. But I am also genuinely confused, because I have heard that JS/CSS are bureaucratic too (I remember a Fireship video about types being added to JS, which I think I watched at least a year ago, though I could be wrong, and I haven't heard anything about it since; from my observation a lot of JS proposals are just stuck).
And yet HTML is at such a level of bureaucracy that the answer to why it doesn't have a feature is its bureaucracy. Maybe someone can explain the history of it and why?
Are we solving the information-centric transclusion problem, or the design-centric asset reuse problem?
An iframe is fine for the former but is not geared towards design and layout solutions.
It kinda sucks for both! Dropping in a box of text that flatly does not resize to fit its contents does not fit the definition of "fine" for me, here.
You can do some really silly maneuvers with `window.postMessage` to communicate an expected size between the parent and frame on resize, but that's expensive and fiddly.
Because it's HyperText, the main idea is that you link to other content, so this is not a weird feature that is being asked for, it's just a different way of doing the whole raison d'etre of the tech. In fact the tag to link stuff is the <a> tag. It just so happens that it makes you load the other "page", instead of transcluding content, the idea is that you load it.
It wouldn't make sense to transclude the article about the United States in the article about Wyoming (and in fact modern wikipedia shows a pop up bubble doing a partial transclusion, but would benefit in no way from basic html transclusion.)
It's a simple idea. But of course modern HTML is not at all what HTML was designed to be, but that's the canonical answer.
The elders of HTML would just tell you to make an <a> link to whatever you wanted to transclude instead, be it a footer/header/table of contents or another encyclopedic article, or whatever. Because that's how HTML works, and not the way you suggest.
Think of what would happen if it were the case: you would transclude page A, which transcludes page B, and so on with page C, possibly recursively transcluding page B again, and so on. You would turn the User Agent (browser) into a whole WWW crawler!
It's because HTML is pass by reference, not pass by copy.
the web platform is the tech stack version of the human concept of "failing upward". It sucks but will only get more and more vital in the modern tech scene as time goes by.
Honest answer: because any serious efforts to improve HTML died 20 years ago, and the web as it's envisaged today is not an infinite library of the world's knowledge but a JavaScript-based platform.
Asking for things that the W3C had specced out in 2006 for XML tech is just not reasonable if it doesn't facilitate clicks.
<iframe> is different from what the author is asking for, it has its own DOM and etc. He wants something like an SSI but client side. He explains some of the problems right after the part you cut off above
"We’ve got <iframe>, which technically is a pure HTML solution, but they are bad for overall performance, accessibility, and generally extremely awkward here"
Headers and their menus are often problematic for this approach, unless they are 100% static (e.g. HN would work but Reddit and Google wouldn't since they both put things in their header which can expand over the content). I.e. you can make it transparent but that doesn't solve eating the interactions. The code needed to work around that is more than just using JS to do the imports.
I guess for the similar reason that Markdown does not have any "include" ability -- it is a feature not useful enough yet with too many issues to deal with. They are really intended to be used as "single" documents.
There was also a brief detour into XML with XHTML, and XML has XInclude, although it's not a required feature.
The "strict markup" part can be (and always could be) had using SGML which is just a superset of XML that also supports HTML empty elements, tag inference, attribute shortforms, etc. HTML was invented as SGML vocabulary in the first place.
Agree though that Google derailed any meaningful standardization effort for the reasons you stated. Actually, it started already with CSS and the idiocy of piling yet another item-value syntax on top of SGML/HTML when it already has attributes for formatting. The "semantic HTML" postulate is kind of an after-the-fact justification for insane CSS complexity, which could grow because it wasn't part of HTML proper and so escaped the scrutiny that goes with introducing new elements or attributes.
I don't think there was ever a sustainable route to a semantic web that would work for the masses.
People wanted to write and publish. Only a small portion of people/institutions would have had the resources or appetite to tag factual information on their pages. Most people would have ignored the semantic taxonomies (or just wouldn't have published at all). I guess a small and insular semantic web is better than no semantic web, but I doubt there was a scenario where the web would have been as rich as it actually became, but was also rigidly organized.
Also even if you do have good practices of semantic tagging, there are huge epistemological problems around taxonomies - who constructs them, what do the terms actually mean, how to organize them and so on.
In my experience trying to work with Wikidata taxonomies, it can be a total mess when it's crowdsourced, and if you go to an "expert"-derived taxonomy there are all kinds of other problems with coverage, meaning, and democracy.
I've had a few flirtations with the semantic web going back to 2007 and long ago came to the personal conclusion that unfortunately AI is the only viable approach.
That's not how the history went at all. When I worked at an internet co in the late 1990s (i.e. pre Google's dominance), SGML was a minority interest. We used to try to sell clients on an intranet based on SGML because of the flexibility etc., and there was little interest; sloppy markup and incorrect HTML were very much the norm on the web back then (pre Chrome etc.).
I kinda agree with you but I'd argue the "death" of microformats is unrelated to the death of XHTML (tho schema.org is still around).
You could still use e.g. hReview today, but nobody does. In the end the problem of microformats was that "I want my content to be used outside my web property" is something nobody wants, beyond search engines that are supposed to drive traffic to you.
The fediverse is the only chance of reviving that concept because it basically keeps attribution around.
JSON LD is alive and kicking.
Me personally, I didn't even care that much about strict semantic web, but XML has the benefits of the entire ecosystem around it (like XPath and XSLT), composable extensibility in form of namespaces etc. It was very frustrating to see all that thrown out with HTML5, and the reasoning never made any sense to me (backwards compatibility with pre-XHTML pages would be best handled by defining a spec according to which they should be converted to XHTML).
If XHTML was literally just HTML but with XML syntax, it would be pretty cool.
It is.
XHTML 1.0 was, and they evolved it incompatibly.
Isn't that what XHTML5 is, more or less?
The semantic web is a silly dream of the 90s and 00s. It's not a realizable technology, and Google basically showed exactly why: as soon as you have a fixed algorithm for finding pages on the web, people will start gaming that algorithm to prioritize their content over others'. And I'm not talking about malicious actors trying to publish malware, but about every single publisher that has the money to invest in figuring out how, and doing it.
So any kind of purely algorithmic, metadata based retrieval algorithm would very quickly return almost pure garbage. What makes actual search engines work is the constant human work to change the algorithm in response to the people who are gaming it. Which goes against the idea of the semantic web somewhat, and completely against the idea of a local-first web search engine for the masses.
I would encourage you to go and read more about triples/asserting facts, and the trust/provenance of facts in this context. You are basically saying "it's impossible to make basic claims" in your comment, which perhaps you don't realize
It was certainly a good way to win EU grants.
XHTML was just a stricter syntax for HTML. It didn't make it any more semantic.
I'm as big a critic of Google as anyone, but I'm always surprised at modern day takes around the lost semantic web technologies - they are missing facts or jumping to conclusions in hindsight.
Here's what people should know.
1) The failure of XHTML was very much a multi-vendor, industry-wide affair; the problem was that the syntax of XML was stricter than the syntax of HTML, and the web was already littered with broken HTML that the browser vendors all had to implement layers of quirk handling to parse. There was simply no clear user payoff for moving to the stricter parsing rules of XML and there was basically no vendor who wanted to do the work. To my memory Google does not really stand out here, they largely avoided working on what was frequently referred to as a science project, like all the other vendors.
2) In subsequent years, Google has actually delivered a semantic web of sorts: https://developers.google.com/search/docs/appearance/structu...
A few things stand out as interesting. First of all, the old semantic web never had a business case. JSON+LD Structured Data does: Google will parse your structured data and use it to inform the various snippets, factoids, previews and interactive widgets they show all over their search engine and other web properties. So as a result JSON+LD has taken off massively. Millions of websites have adopted it. The data is there in the document. It is just in a JSON+LD section. If you work in SEO you know all about this. Seems to be quite rare that anyone on Hacker News is aware of it however.
Second interesting thing, why did we end up with the semantic data being in JSON in a separate section of the file? I don't know. I think everyone just found that interleaving it within the HTML was not that useful. For the legacy reasons discussed earlier, HTML is a mess. It's difficult to parse. It's overloaded with a lot of stuff. JSON is the more modern thing. It seems reasonable to me that we ended up with this implementation. Note that Google does have some level of support for other semantic data, like RDFa which I think is directly in the HTML - it is not popular.
Which brings us to the third interesting thing, the JSON+LD schemas Google uses, are standards, or at least... standard-y. The W3C is involved. Google, Yahoo, Yandex and Microsoft have made the largest contributions to my knowledge. You can read all about it on schema.org.
TL;DR - XHTML was not a practical technology and no browser or tool vendor wanted to support it. We eventually got the semantic web anyway!
I remember that just using PHP sessions back then on an XHTML document produced parse errors, because PHP added the session ID to the query strings of links and used the raw & character instead of &amp; for separating params in the query string, thus causing an XML parse error.
There was a push to prevent browsers to be too lenient with the syntax in order to avoid the problem that sloppy HTML produced (inconsistent rendering across browsers)
That is not true at all…
Point n°2 is only partially correct.
Google does support multiple semantic web standards: RDFa, JSON+LD and I believe microdata as well.
JSON+LD is much simpler to extract and parse; however, it makes the site's HTML bigger, because information gets duplicated compared to RDFa, where values could be inlined.
The “semantic web” has been successful in a few areas, but not so much as SQL or document databases. Many data formats use it, such as RSS feeds and the XMP metadata used by Adobe tools.
As someone who worked in the field of "semantic XML processing" at the time, I can tell you that while the "XML processing" part was (though full of unnecessary complications) well understood, the "semantic" part was purely aspirational and never well understood. The common theme with the current flurry of LLMs and their noisy proponents is that it is, in both cases, possible to do worthwhile and impressive demos with these technologies, and also real applications that do useful things, but people who have their feet on the ground know that XML doesn't engender "semantics" and LLMs are not "conscious". Yet the hype meddlers keep the fire burning by suggesting that if you just do "more XML" and build bigger LLMs, then at some point real semantics and actual consciousness will somehow emerge like a chicken hatching from the egg. And, being emergent properties, who is to say semantics and consciousness will not emerge, at some point, somehow? A "heap" of grains is emergent after all, and so is the "wetness" of water. But I have strong doubts about XHTML being more semantic than HTML5.
And anyway, even if Google had nefarious intentions and even if they managed to steer the standardization, one has also to concede that all search engines before Google were encumbered by too much structure, too rigid approaches. When you were looking for a book in a computerized library at that point it was standard to be sat in front of a search form with many, many fields; one for the author's name, one for the title and so forth, and searching was not only a pain, it was also very hard to do for a user without prior training. Google had demonstrated it could deliver far better results with a single short form field filled out by naive users that just plonked down three or five words that were on their mind et voila. They made it plausible that instead of imposing a structure onto data at creation time maybe it's more effective to discover associations in the data at search time (well, at indexing time really).
As for the strictness of documents, I'm not sure what it would give you that we don't get with sloppy documents. OK, web browsers could refuse to display a web page if any one image tag is missing the required `alt` attribute. So now what happens: will web authors duly include alt="picture of a cat" for each picture of a cat? Maybe, to a degree, but the other 80% of alt attributes will just contain some useless drivel to appease the browser. I'm actually more for strict documents than I used to be, but on the other hand we (I mean web browsers) have become quite good at reconstructing usable HTML documents from less-than-perfect sources, and the reconstructed source is also a strictly validating source. So I doubt this is the missing piece; I think the semantic web failed because the idea never was strong, clear, compelling, well-defined and rewarding enough to catch on with enough people.
If we're honest, we still don't know, 25 years later, what 'semantic' means after all.
It also existed in DTDs (Document Type Definitions) used with HTML 4 and below, and with XML. It came from SGML too, I guess.
Yes it did, and there are HTML 5.x DTDs for HTML versions newer than HTML 4.x at [1], including post-HTML 5.2 review drafts until 2023; see notes at [2].
[1]: https://sgmljs.net/docs/html5.html
[2]: https://sgmljs.net/blog/blog2303.html
Neat reference, going to look into that.
The <object> tag appears to include/embed other html pages.
An embedded HTML page:
<object data="snippet.html" width="500" height="200"></object>
https://www.w3schools.com/tags/tag_object.asp
<object> used like this is just a poor iframe in a much shakier spot in the standards, mostly for backwards compatibility.
Like iframe, it "includes" a full subdocument as a block element, which isn't quite what the OP is hinting at.
Sound like it's good enough for headers and footers, which is 80% of what people need.
That’s what lots of sites used to do in the late 90s and early aughts in order to have fixed elements.
It was really shit. Browser navigation cues disappear, minor errors will fuck up the entire thing by navigating fixed element frames instead of contents, design flexibility disappears (even as consistent styling requires more efforts), frames don’t content-size so will clip and show scroll bars all over, debugging is absolute ass, …
And it increases resource use.
It’s not ideal, but t it does exist in pure html… and the OP didn’t seem to note it.
A bit of vanilla JavaScript with WebComponents is a few lines:
https://gomakethings.com/html-includes-with-web-components/
Edit: “t” was supposed to be the object tag.
> it does exist in pure html [...] JavaScript with WebComponents
You seem to have a rather original definition of "pure HTML".
Typo, from the numerous fat finger typos you see above. :)
An HTML-only option that exists is using <object>. Replying to point out what the OP missed, in case others might find it suitable.
If a tiny bit of vanilla JavaScript can be tolerated, WebComponents appear to have a broad standardized approach that is not framework dependant.
Whether it’s good enough or not it does exist and everyone can decide if it works for them.
I’d probably explore WebComponents, but wanting the height of JavaScript without JavaScript..
It’s an option built into html.
The OP doesn’t need to hint.
Yeah, that is just a crappier version of HTML Frames [1]
1 - https://en.m.wikipedia.org/wiki/Frame_(World_Wide_Web)
I don't disagree; the premise of the article seems to be unaware that it's entirely possible to use HTML alone to do includes. :)
Some might argue react is over abstracted or over engineered to do the same.
Interpretation and preference are different from whether it's possible.
Well, that is an entire attack surface on its own.
https://en.wikipedia.org/wiki/Billion_laughs_attack
https://en.wikipedia.org/wiki/XML_external_entity_attack would be the more relavent link.
This was the rabbit hole that I started down in the late 90s and still haven’t come out of. I was the webmaster of the Analog Science Fiction website and I was building tons of static pages, each with the same header and side bar. It drove me nuts. So I did some research and found out about Apache server side includes. Woo hoo! Keeping it DRY (before I knew DRY was a thing).
Yeah, we’ve been solving this over and over in different ways. For those saying that iframes are good enough, they’re not. Iframes don’t expand to fit content. And server side solutions require a server. Why not have a simple client side method for this? I think it’s a valid question. Now that we’re fixing a lot of the irritation in web development, it seems worth considering.
Server-side includes FTW! When a buddy and I started making "web stuff" back in the mid-90s the idea of DRY also just made sense to us.
My dialup ISP back then didn't disable using .htaccess files in the web space they provided to end users. That meant I could turn on server-side includes! Later I figured out how to enable CGI. (I even went so far as to code rudimentary webshells in Perl just so I could explore the webserver box...)
I've become a fan of https://htmx.org for this reason.
A small 10KB lib that augments HTML with the essential good stuff (like dynamic imports of static HTML)
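The include use case in htmx is about one line; something along these lines (path invented), where the fragment is fetched when the element loads and swapped into it:

<script src="https://unpkg.com/htmx.org"></script>
<!-- fetch /partials/header.html on load and insert it into this div -->
<div hx-get="/partials/header.html" hx-trigger="load"></div>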
Seems like overkill to bring in a framework just for inlining some static html. If that's all you're doing, a self-replacing script tag is neat: ... The `script` element is replaced with the html from `/footer.html`.
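The original snippet isn't preserved above, but the self-replacing script idea is roughly this (a sketch only; /footer.html as in the comment, and note that scripts inside the fetched fragment won't execute when inserted this way):

<script>
  // fetch the fragment and swap it in where this <script> element sits
  (async (el) => {
    const html = await (await fetch("/footer.html")).text();
    el.insertAdjacentHTML("afterend", html);
    el.remove();
  })(document.currentScript);
</script>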
This here is the main idea of HTMX - extended to work for any tag: p, div, content, aside ...
there are many examples of HTMX (since it is a self contained and tiny) being used alongside existing frameworks
of course for some of us, since HTMX brings dynamic UX to back end frameworks, it is a way of life https://harcstack.org (warning - raku code may hurt your eyes)
But this requires JavaScript...
> But this requires JavaScript...
Depending on the specific objection to Javascript, this may or may not matter:
1. You object to any/all JS on a page? Yeah, then this won't work for you.
2. You object to having to write JS just to get client-side includes? This should mostly work for you.
It all depends on what the actual objection is.
... but the folks behind that standard don't want to encourage browsing with Javascript off.
Why would the ES standards organization that defines JavaScript syntax encourage people not to use it?
The minified version needs ~51 kilobytes (16 compressed):
see fixi if you want bare-bones version of the same idea:
https://github.com/bigskysoftware/fixi
> Iframes don’t expand to fit content
Actually, that was part of the original plan - https://caniuse.com/iframe-seamless
I used the seamless attribute extensively in the past, it still doesn't work the way GP intended, which is to fit in the layout flow, for example to take the full width provided by the parent, or automatically resize the height (the pain of years of my career)
It worked rather like a reverse shadow DOM, allowing CSS from the parent document to leak into the child, removing borders and other visual chrome that would make it distinguishable from the host, except you still had to use fixed CSS layouts and resize it with JS.
The optimal solution would be using a template engine to generate static documents.
> The optimal solution would be using a template engine to generate static documents.
This helps the creator, but not the consumer, right? That is, if I visit 100 of your static documents created with a template engine, then I'll still be downloading some identical content 100 times.
> I'll still be downloading some identical content 100 times.
That doesn't seem like a significant problem at all, on the consumer side.
What is this identical content across 100 different pages? Page header, footer, sidebar? The text content of those should be small relative to the unique page content, so who cares?
Usually most of the weight is images, scripts and CSS, and those don't need to be duplicated.
If the common text content is large for some reason, put the small dynamic part in an iframe, or swap it out with javascript.
If anyone has a genuine example of a site where redundant HTML content across multiple pages caused significant bloat, I'd be interested to hear about it.
I care! It is unnecessary complexity, and frankly ugly. If you can avoid repetition, then you should, even if the reason is not obvious.
To give you a concrete example, consider caching (or, equivalently, compiling) web pages. Maybe you have 100 articles, which share a common header and footer. If you make a change to the header, then all 100 articles have to be uncached/rebuilt. Why? Because somebody did not remove the duplication when they had the chance :-)
Compression Dictionary Transport [0] seems like something that can potentially address this. If you squint, this looks almost like XSLT.
[0] https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Com...
True for any server side solution, yes.
On the other hand it means less work for the client, which is a pretty big deal on mobile.
XSLT solved this problem. But it had poor tool support (DreamWeaver etc) and a bunch of anti-XML sentiment I assume as blowback from capital-E Enterprise stacks going insane with XML for everything.
XSLT did exactly what HTML includes could do and more. The user agent could cache stylesheets or if it wanted override a linked stylesheet (like with CSS) and transform the raw data any way it wanted.
The Umbraco CMS was amazing during the time that it used and supported XSLT.
While it evaluated the xslt serverside it was a really neat and simple approach.
macros!
Doesn't the Service Worker API provide this now? It essentially acts like an in-browser proxy to the server.
https://developer.mozilla.org/en-US/docs/Web/API/Service_Wor...
Rational or not, some of us try very hard to avoid JavaScript based solutions.
> Woo hoo! Keeping it DRY (before I knew DRY was a thing)
I still remember the script I wrote to replace thousands of (literally) slightly different headers and footers in some large websites of the 90s. How liberating to finally have that.
You can message the page dimensions to the parent. To do it cross-domain, you can load the same URL into the parent with the height in the location hash; it won't refresh that way.
I know it’s possible to work around it, but that’s not the point. This is such a common use case that it seems worthwhile to pave the cowpath. We’ve paved a lot of cowpaths that are far less trodden than this one. This is practically a cow superhighway.
We’ve built an industry around solving this problem. What if, for some basic web publishing use cases, we could replace a complex web framework with one new tag?
> We’ve built an industry around solving this problem. What if, for some basic web publishing use cases, we could replace a complex web framework with one new tag?
I actually did that replacement, with a few enhancements (maybe 100 lines of code, total?). It's in arxiv pending at the moment. In about two days it will be done and I'll post a Show HN here.
I couldn't agree more.
<div src="foo.txt"></div>
https://www.w3.org/TR/xhtml2/introduction.html
> XHTML 2 takes a completely different approach, by taking the premise that all images have a long description and treating the image and the text as equivalents. In XHTML 2 any element may have a @src attribute, which specifies a resource (such as an image) to load instead of the element.
> Why not have a simple client side method for this?
Like writing a line of js?
A line of JS that has to run through the Javascript interpreter in your browser rather than a simple I/O operation?
If internally this gets optimized to a simple I/O operation (which it should) then why add the JS indirection in the first place?
A block of in-line JavaScript stops the renderer until it runs because its output cannot be determined before it completes.
> A block of in-line JavaScript stops the renderer until it runs because its output cannot be determined before it completes.
I do it in a way that doesn't stop the renderer.
So would any form of html inclusion.
The difference between "a line of JS" and a standardized declarative solution is of course that a meek "line of $turing_complete_language" cannot, in the general case, be known and trusted to do what it purports to do, and nothing else; you've basically enabled any kind of computation, and any kind of behavior. With an include tag or attribute that's different: its behavior is described by standards, and (except for knowing what content we might be pulling in) we can 100% tell the effects from static analysis, that is, without executing the code. With "a line of JS" the only way, in the general case, to know what it does is to run it (an infinite number of times). Also, because it's not standardized, it's much harder to save to disk, to index and to archive it.
I mean, in 1996's Netscape you could do this (I run the server for a website that still uses this):
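(The snippet itself isn't preserved in this copy; judging from the replies about the doctype, it was presumably a frameset page along these lines, with file and frame names invented here.)

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
  <frameset cols="200,*">
    <frame src="nav.html" name="nav">
    <frame src="content.html" name="main">
  </frameset>
</html>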
The thing that always bugged me about frames is that they are too clever. I don't want to reload only the frame HTML when I right-click and reload. Sure, the idea was to cache those separately, but come on: frames and caching are meant to solve two different problems, and by munging them together they somewhat sucked at solving either. To me, includes for HTML should work in the dumbest way possible. And that means: take the text from the include, paste it where the include was, and give the browser the resulting text.
If you want to cache a nav section separately because it appears the same on every page, let's add a cache attribute that solves the problem independently (see the sketch below), to tell the browser it should load the inner HTML or the src of that element from cache if it has it. Now you could convince me that the include should allow for more, but it being dumb is a feature, not a bug.
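Presumably something like this; to be clear, the syntax is entirely hypothetical, since neither an <include> element nor a cache attribute exists today:

<!-- hypothetical: a dumb textual include, with caching controlled separately -->
<include src="nav.html" cache="max-age=86400"></include>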
Nitpick: the HTML4 spec was released in December 1997, and HTML4.01 only in December 1999 so it probably wouldn't have run in 1996s Netscape.
The doctype doesn’t matter in this context. Netscape Navigator 2 supported frames in 1995 and would render that page.
Back then it was common for Netscape to have features that (years) later became standard HTML.
The web seems like it was deliberately designed to make any form of composability impossible. It’s one of the worst things about it as a platform.
I’m sure some purist argument has driven this somewhere.
I think of all the “hygienic macro” sorts of problems. You really ought to be able to transclude a chunk of HTML and the associated CSS into another document, but you have to watch out for ‘id’ values being unique, never mind the same names being used for CSS classes. Figuring out the rendering intent for CSS could also be complicated: the guest CSS might be written with every rule scoped under some container selector, where the container is basically the whole guest document, but you still want those rules to apply. Maybe you want the guest text to appear in the same font as the host document but you still want colors and font weights to apply. Maybe you want to make the colors muted to be consistent with the host document; maybe the background of the host document is different and the guest text doesn't contrast enough anymore, etc.

I look back longingly at the promise of XML services in the early days of Web 2.0, before the term just meant JavaScript everywhere.
All sorts of data could be linked together to display or remix by user agents.
HTML is a markup language, not a programming language. It's like asking why Markdown can't handle includes. Some Markdown editors support them (just like some server-side tools do for HTML), but not all.
Including another document is much closer to a markup operation than a programming operation. We already include styles, scripts, images, videos, fonts...why not document fragments?
Markdown can't do most of those, so it makes more sense why it doesn't have includes, but I'd still argue it definitely should. I generally dislike LaTeX, but about the only thing I liked about it when writing my thesis was that I could have each chapter in its own file and just include all of them in the main file.
This isn’t programming. It’s transclusion[0]. Essentially, iframes and images are already forms of transclusion, so why not transclude html and have the iframe expand to fit the content?
As I wrote that, I realized there could be cumulative layout shift, so that’s an argument against. To avoid that, the browser would have to download all transcluded content before rendering. In the past, this would have been a dealbreaker, but maybe it’s more feasible now with http multiplexing.
[0] https://en.m.wikipedia.org/wiki/Transclusion#Client-side_HTM...
With Early Hints (HTTP code 103), it seems especially feasible. You can start downloading the included content one round-trip after the first byte is sent.
Well, asciidoc - a markup language supports includes, so the "markup languages" analogy doesn't hold.
https://docs.asciidoctor.org/asciidoc/latest/directives/incl...
I think this is the most likely answer.
I'm not defending it, because when I started web development this was one of the first problems I ran into as well -- how the heck do you include a common header.
But the original concept of HTML was standalone documents, not websites with reusable components like headers and footers and navbars.
That being said, I still don't understand why then the frames monstrosity was invented, rather than a basic include. To save on bandwidth or something?
The original concept of HTML was as an SGML subset, and SGML had this functionality, precisely because it's very handy for document authoring to be able to share common snippets.
Frames were widely abused by early web apps to do dynamic interfaces before XHR was invented/widely supported. The "app" had a bunch of sub-frames with all the links and forms carefully pointing to different frames in the frameset.
A link in a sidebar frame would open a link in the "editor" frame which loaded a page with a normal HTML form. Submitting the form reloaded it in that same frame. Often the form would have multiple submit buttons, one to save edits in progress and another to submit the completed form and move to the next step. The current app state was maintained server side and validation was often handled there save for some basic formatting client side JavaScript could handle.
This setup allowed even the most primitive frame-supporting browsers to use CRUD web apps. IIRC early web frameworks like WebObjects leaned into that model of web app.
Oh my goodness, yes you're right, I'd forgotten entirely about those.
They were horrible -- you'd hit the back button and only one of the frames would go back and then the app would be in an inconsistent state... it was a mess!
You needed to hit the reset button (and hoped it worked) and never the back button! Yes, I suffered through early SAP web apps built entirely with frames and HTML forms. It was terrible.
I don't love JavaScript monstrosities but XHR and dynamic HTML were a vast improvement over HTML forms and frame/iframe abuse.
Really well written web form applications were a delight in 2001 and a large improvement over conventional applications written in Windows. It helped that application data was in a SQL database, with a schema, protected by transactions, etc as opposed to a tangle of pointers that would eventually go bad and crash the app -- I made very complicated forms for demographic profiling, scientific paper submission, application submission, document search, etc. If you did not use "session" variables for application state this could at worst cause a desynchronization between the browser and the server which (1) would get resynchronized at any load or reload and (2) never get the system into a "stuck" state from the user viewpoint and (3) never lose more than a screen full of work.
Try some other architecture though and all bets were off.
Amazon's web store looked and worked mostly the same as it does now, people were very impressed with MapQuest, etc.
Applications like that can feel really fast, almost desktop application fast, if you are running them on a powerful desktop computer and viewing them on another computer or tablet over a LAN
To be fair, modern SAP web apps are also terrible.
A lot of early HTML was about taking the output of a different system such as a mainframe and putting that output into HTML.
Lots of gateways between systems.
That’s the Hyper part of HTML, and what makes it special.
It’s made to pull in external resources (as opposed to other document formats like PDF).
Scripts, stylesheets, images, objects, favicons, etc. Including HTML would be thematically similar.
No, HTML is fundamentally different because (for a static site without any JS dom manipulation) it has all the semantic content, while stylesheets, images, objects, etc. are just about presentation.
Images are content. Videos are content. Objects/iframes are content.
The only one that is presentational is stylesheets.
Which (as I'm sure you know), also literally has 'content' :)
https://developer.mozilla.org/en-US/docs/Web/CSS/content
True :)
Images and videos are not semantic content. The alt attributes that describe them on the other hand are indeed semantic content.
Can you describe what semantic, non-textual content would be?
> Images and videos are not semantic content
Something in that tenet does not compute with me.
I think the distinction is "semantic on what level/perspective?". An image packaged as a binary blob is semantically opaque until it is rendered. Meanwhile, seeing <img> in the HTML or the file extension .jpg in any context that displays file extensions tells me some information right out of the gate. And note that all three of these examples are different information: the HTML tag tells me it's an image, whereas the file extension tells me it's a JPEG image, and the image tells me what the image contains. HTML is an example of some kind of separation, as it can tell you some semantic meaning of the data without telling you all of it. Distinguishing and then actually separating semantics means data can be interpreted with different semantics, and we usually choose to focus on one alternative interpretation. Then I can say that HTML alone regards some semantics (e.g. there is an image here) while disregarding others (e.g. the image is an image of a brick house).
I'm not sure what isn't computing. Presumably you know (or have looked up) the meaning of "semantic"? Images and videos are graphic, not semantic, content. To the extent they are rendering semantic content, that content should be described in the alt tag.
Iframes exist.
Markdown doesn't have this common HTML pattern of wanting to include a header/footer in all pages of a site.
The feature proposal was called HTML Imports [1], created as part of the Web Components effort.
> HTML Imports are a way to include and reuse HTML documents in other HTML documents
There were plans for <template> tag support and everything.
If I remember correctly, Google implemented the proposed spec in Blink but everyone else balked for various reasons. Mozilla was concerned with the complexity of the implementation and its security implications, as well as the overlap with ES6 modules. Without vendor support, the proposal was officially discontinued.
[1] https://www.w3.org/TR/html-imports/
That matches with the comment [1] on the article, citing insufficient demand, no vendor enthusiasm, etc.
The thing is that all those are non-reasons that don't really explain anything: Low demand is hard to believe if this feature is requested for 20 years straight and there are all kinds of shim implementations using scripts, backend engines, etc. (And low demand didn't stop other features that the vendors were interested in for their own reasons)
Vendor refusal also doesn't explain why they refused it, even to the point of rolling back implementations that already existed.
So I'd be interested to understand the "various reasons" in more detail.
"Security implications" also seem odd, as you are already perfectly able to import HTML cross-origin using script tags. Why is importing a script that does document.write() fine, but an HTML tag that does exactly the same thing hugely problematic?
(I understand the security concern that you wouldn't want to allow something like "<import src=google.com>" and get an instant clone of the Google homepage. But that issue seems trivially solvable with CORS.)
[1] https://frontendmasters.com/blog/seeking-an-answer-why-cant-...
That is a bit of a large ask.
There are various specs/semantics you can choose, which prescribe the implementation & required cross-cutting complexity. Security is only relevant in some of them.
To give you some idea:
- HTML load ordering is a pretty deeply held assumption. People understand JS can change those assumptions (document.write). Adding an obscure HTML tag that does so is going to be an endless parade of bugs & edge cases.
- To keep top-to-bottom fast we could define preload semantics (Dropping the linear req-reply, define client-cache update policy when the template changes, etc). Is that added complexity truly simpler than having the server combine templates?
- <iframe> exists
In other words, doing the simplest thing 75% of people want requires a few lines of code, either client side or server side.
Fitting the other 25% (even to 'deny' it) is endlessly complex, in ways few if any can fully foresee.
Maybe something that adds to this low demand is that:
1. Web pages developed on the assumption that the user has JS make it trivial to implement something that provides the same result.
2. Web pages developed for user agents that don't run JS probably want some interaction anyway, so they already have a server runtime that can provide this feature.
2b. And if a page has no user interaction, it's probably a static content site, nobody is writing that content directly in HTML, and the existing build step already provides this feature.
HTML Imports could not include markup within the body; they could only be used to reference template elements for custom elements.
JS-first developers want something that works the same way client-side and server-side, and the mainstream front-end dev community shifted to JS-first, for better or worse
HTML Imports went in a similar direction but they do not do what the blog post is about. HTML should be imported and displayed in a specific place of the document. HTML Imports could not do this without JavaScript.
See https://github.com/whatwg/html/issues/2791#issuecomment-3112... for details.
To be fair, it was pretty complicated. IIRC, using it required using Javascript to instantiate the template after importing it, rather than just having something like <include src="myinclude.html">.
https://caniuse.com/imports says FF even had it as a config flag
Tbf, HTML Imports were significantly more complex than includes, which this article requests.
Frames essentially could do html import
Netscape 4 had this with inflow layers — `<ILAYER SRC=included.html></ILAYER>`
https://web.archive.org/web/19970630074729fw_/http://develop...
https://web.archive.org/web/19970630094813fw_/http://develop...
As far as I'm aware, changing the SRC attribute was quite crashy, and the functionality was soon stripped. (I remember playing with this in a beta, and then it was gone in the production version.)
I always wondered why it was called ILAYER. Ty
The name of this feature is transclusion.
https://en.wikipedia.org/wiki/Transclusion
It was part of Project Xanadu, and originally considered to be an important feature of hypertext.
Notably, mediawiki uses transclusion extensively. It sometimes feels like the wiki is the truest form of hypertext.
Ward Cunningham (inventor of the Wiki) spent some time trying to invent a transclusion-first wiki, where everyone had their own wiki-space and used transclusion socially https://en.wikipedia.org/wiki/Federated_Wiki
it never quite took off
I think true transclusion would be more than that.
In Xanadu you could transclude just an excerpt from one document into another document.
If you wanted to do this with HTML you need an answer for the CSS. In any particular case you can solve it, making judgements about which attributes should be consistent between the host document, the guest document and the guest-embedded-in-host. The general case, however, is unclear.
For a straightforward <include ...> tag the guest document is engineered to live inside the CSS environment (descendant of the 3rd div child of a p that has class ".rodney") that the host puts it in.
Another straightforward answer is the Shadow DOM which, for the most part, lets the guest style itself without affecting the rest of the document. I think in that case the host can still put some styles in to patch the guest.
Isn't this what proper framesets (not iframes) were supposed to do a long time ago (HTML 4)? At least they autoexpanded just fine, and the user could even adjust the size to their preference.
There was a lot of criticism for frames [1] but still they were successfully deployed for useful stuff like Java API documentation [2].
In my opinion the whole thing didn't survive mostly because it offered too little flexibility for designers: framesets were probably good enough for useful information pages, but they didn't account for all the designers' needs, with their bulky scrollbars and limited number of subspaces on the screen. Today it is too late to revive them, because framesets as-is probably wouldn't work well on mobile...
[1] <https://www.nngroup.com/articles/why-frames-suck-most-of-the...> - I love how much of it is not applicable anymore, and how all of the problems attributed to frames are present in today's web in an even nastier way.
[2] <https://www.eeng.dcu.ie/~ee553/ee402notes/html/figures/JavaD...>
The issue with framesets was way more fundamental: no deep linking. People coming via bookmarks or Google (or its predecessors) were left on a page without navigation, which site authors then tried to work around with JavaScript, and that never gave a good experience.
Nowadays it is sometimes the other way around: pages are all JavaScript, so there's no good experience in the first place. I have encountered difficulty trying to get a proper "link" to something multiple times. Also, given that browsers love to reduce/hide the address bar, I wonder if it is really still that important a feature.
Of course "back then" this was an important feature and one of the reasons for getting rid of frames :)
"Includes" functionality is considered to be server-side, i.e. handled outside of the web browser. HTML is client-side, and really just a markup syntax, not a programming language.
As the article says, the problem is a solved one. The "includes" issue is how every web design student learns about PHP. In most CMSes, "includes" become "template partials" and are one of the first things explained in the documentation.
There really isn't any need to make includes available through just HTML. HTML is a presentation format and doesn't do anything interesting without CSS and JS anyway.
> "Includes" functionality is considered to be server-side, i.e. handled outside of the web browser. HTML is client-side, and really just a markup syntax, not a programming language.
That's not an argument that client-side includes shouldn't happen. In fact HTML already has worse versions of this via frames and iframes. A client-side equivalent of a server-side include fits naturally into what people do with HTML.
I think it feels off because an HTML file can include scripts, fonts, images, videos, styles, and probably a few other things. But not HTML. It can probably be coded with a custom element (<include src=.../>). I would be surprised if there wasn't a github repo with something similar.
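A sketch of such an element (the tag name is made up, since custom element names need a dash, and this obviously still requires JS):

    <script>
      // Hypothetical <include-html src="..."> element: fetch a fragment
      // and swap this element for the fetched markup.
      customElements.define('include-html', class extends HTMLElement {
        async connectedCallback() {
          const src = this.getAttribute('src');
          if (!src) return;
          const res = await fetch(src);
          if (res.ok) this.outerHTML = await res.text();
        }
      });
    </script>

    <include-html src="header.html"></include-html>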
I created something like this relatively recently. The downside is of course that it requires JavaScript.
https://github.com/benstigsen/include.js
I also improved an existing custom component for this : https://amc.melanie-de-la-salette.fr/polyfill.js
Well said, this is many students' intro to PHP. Why not `<include src=header.html/>` though?
Some content is already loaded asynchronously such as images, content below the fold etc.
> HTML is really just a markup syntax, not a programming language
flamebait detected :) It's a declarative language, interpreted by each browser engine separately.
What's the ML in HTML stand for? I think that's probably the crux of the argument. Are we gonna evolve it past its name?
If the issue is that "include" somehow makes it sound like it's not markup, the solution seems obvious. Just use the src attribute on other tags:
<html src="/some/page.html">, <div src="/some/div.html">, <span src="/some/span.html">, etc.
Or create a new tag that's a noun like fragment, page, document, subdoc or something.
Surely that's no less markup than svg, img, script, video, iframe, and what not.
It stands for "markup language", and was inherited from SGML, which had includes. Strictly speaking, so did early HTML (since it was just an SGML subset), it's just that browsers didn't bother implementing it, for the most part. So it's not that it didn't evolve, but rather it devolved.
Nor is this something unique to SGML. XML is also a "markup language", yet XInclude is a thing.
> It stands for "markup language", and was inherited from SGML, which had includes
touchay!!
That's why I joked about flamebait. It's hypertext though; aren't anchors essentially a goToURL() click handler in some ways? Template partials seem like a basic part of this system.
> considered to be server-side
Good point! Wouldn't fetching a template partial happen the same way (like fetching an image?)
> What's the ML in HTML stand for?
I always assumed it stood for my initials.
Agree with what you said, however, HTML is a document description language and not a presentation format. CSS is for presentation (assuming you meant styling).
They didn't mean styling.
HTML is a markup language that identifies the functional role of bits of text. In that sense, it is there to provide information about how to present the text, and is thus a presentation format.
It is also a document description language, because almost all document description languages are also a presentation format.
> As the article says, the problem is a solved one.
It's "solved" only in the sense that you need to use a programming language on the server to "solve" it. If all you are doing is static pages, it's most definitely not solved.
Then you just pre-build the page before publishing it. It's way cheaper as you do the work once, instead of every client being much slower because they have to do additional requests.
> Then you just pre-build the page before publishing it.
That's "using a programming language to solve the problem", isn't it?
> It's way cheaper as you do the work once, instead of every client being much slower because they have to do additional requests.
What work do client-side includes have to do other than fetching the page (which will get cached anyway)? It's less work to have a `<include-remote ...>` builtin than even a simple Makefile on the server.
It does not have to be a programming language on the server, no, unless you want to. The server can have a static site, that you build as you deploy it.
Fetching another resource is expensive. It's another round trip, and depending on many factors it could be another second to load the page. And if the HTML includes other nested HTML then it can be much slower.
This is the exact thing we try to avoid when building websites that perform well. You want as few chained requests as possible, and you want the browser to be aware of them as soon as possible, with the correct priority. That way the browser can get the important stuff needed to display content fast.
Including HTML client side for templating is just wasteful, slow and dumb from a technical standpoint.
Every client would have to do another request for each include. It would literally be many thousands of times slower (or worse) than doing it locally, where the templates can be in memory as you pre-render the pages. You also save a ton of CPU cycles and bandwidth by not serving more files with additional overhead like headers.
> It would literally be many thousands of times slower(or worse) than doing it locally where the templates can be in memory as you render the pre-render the pages.
Yeah, it's not. I'm doing client side includes and the includes get cached by the browser. I'm sure I would have noticed if my pages went from 1s to display to 1000s to display.
If you have a site/webapp with (say) twenty pages, that's only two extra requests for both header and footer.
An additional request for HTML isn't slow, and now I have to have a whole "build" process for something that is basically static. Not ideal.
By "whole 'build' process", do you think something like a makefile or do you think something more advanced is required?
One drawback though would be that one indeed would have to maintain dependencies, which would be error prone beyond simply adding headers and footers... I wonder if one could (ab)use CPP [1] and its -M option to do that.
[1] https://gcc.gnu.org/onlinedocs/cpp/Invocation.html
Well, that very much depends on your definition of slow, doesn't it?
An additional request is another round trip. That can be very slow. Average TTFB on the internet in the US is ~0.7 seconds.
It's much faster to send it as part of the same request as you then don't have to wait for the browser to discover it, request it, wait for the response and then add it.
A build process does not have to be complicated, at all. If you can write HTML then using something that can simply read the HTML includes you wish existed and swap it with the specified filename is trivial.
Ofc, the idea has many other issues, like how to handle dependencies of the included HTML, how to handle conflicts, what path to use and many more.
> "Includes" functionality is considered to be server-side
Exactly! Include makes perfect sense on server-side.
But a client-side include means that the client should be able to modify the original DOM at an unknown moment in time. The options are:
1. At HTML parse time (before the DOM is even generated). This requires a synchronous request to the server for the inclusion. Not desirable.
2. After DOM creation: <include src=""> (or whatever) needs to appear in the DOM, the chunk is loaded asynchronously, and then the <include> DOM element(sic!) needs to be replaced (or how?) by the external fragment. This disables any existing DOM structure validation mechanism.
Having said that...
I've implemented <include> in my Sciter engine using strategy #1. It works there because HTML in Sciter usually comes from local app resources / the file system, where the price of issuing an additional "get chunk" request is negligible.
See: https://docs.sciter.com/docs/HTML/html-include
This argument applies just as much to CSS and JS. Why do they include "includes" when you can just bundle on the server?
For caching and sharing resources across the whole site, I suppose.
But that would apply to <header> and <footer> and <nav> too. We could cache them.
hearing someone assert that
> the problem is a solved one
is a sure-fire way to know that a problem is not solved
There are all kind of issues with HTML include as others have pointed out
If main.html includes child/include1.html and child/include1.html has a link src="include2.html" then when the user clicks the link where does it go? If it goes to "include2.html", which by the name was meant to be included, then that page is going to be missing everything else. If it goes to main.html, how does it specify this time, use include2.html, not include1.html?
You could do the opposite: you can have article1.html, article2.html, article3.html etc., each including header.html, footer.html, navi.html. Ok, that works, but now you've made it so that a global change to the structure of your articles requires editing all articles. In other words, if you want to add comments.html to every article you have to edit all articles, and you're back to wanting to generate pages from articles based on some template, at which point you don't need the browser to support include.
I also suspect there would be other issues, like the header wants to know the title, or the footer wants a next/prev link, which now require some way to communicate this info between includes and you're basically back to generate the pages and include not being a solution
I think if you work through the issues you'll find an HTML include would be practically useless for most use cases.
These are all solvable issues with fairly obvious solutions. For example:
> If main.html includes child/include1.html and child/include1.html has a link src="include2.html" then when the user clicks the link where does it go? If it goes to "include2.html", which by the name was meant to be included, then that page is going to be missing everything else. If it goes to main.html, how does it specify this time, use include2.html, not include1.html?
There are two distinct use cases here: snippet reuse and embeddable self-contained islands. But the latter is already handled by iframes (the behavior being your latter case). So we only need to do the former.
> These are all solvable issues with fairly obvious solutions.
No, they are a can of worms and decades of arguments and incompatibilities and versioning
> But the latter is already handled by iframes
iframes don't handle this case because the page can not adjust to the iframe's content. There have been proposals to fix this but they always run into issues.
https://github.com/domenic/cooperatively-sized-iframes/issue...
The problem of include2.html missing everything else would also apply to every other kind of include.
If a user clicked a link directly to "include.css", they'd get rubbish too.
It would be good for static data: images, CSS, and static HTML content.
There is an open issue about this at WHATWG (also mentioned in the comment section of the blog post):
Client side include feature for HTML
https://github.com/whatwg/html/issues/2791
So, HTML did have includes and they fell out of favor.
The actual term "include" comes from an XML feature, and it's that feature the article is hoping for. HTML had an alternate approach that came into existence before XML. That approach was frames. Frames did much more than XML includes, so HTML never gained that feature. Frames lost favor due to misuse, security, accessibility, and a variety of other concerns.
Unlike Framesets I think XML includes were never really supported in many browsers (or even any major browsers)?
I still like to use them occasionally but it incurs a "compilation" step to evaluate them prior to handing the result of this compilation to the users/browsers.
As it happens, the major browsers still can do XML 'includes' to some extent, since by some miracle they haven't torn out their support for XSLT 1.0. E.g. this outputs "FizzBuzz" on Firefox:
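Something along these lines (a sketch, not necessarily the original snippet; made-up file names, all served over HTTP from the same directory):

    <!-- page.xml -->
    <?xml-stylesheet type="text/xsl" href="page.xsl"?>
    <page>Fizz</page>

    <!-- buzz.xml -->
    <page>Buzz</page>

    <!-- page.xsl: copy the local text, then pull the rest in via document() -->
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
      <xsl:template match="/page">
        <p>
          <xsl:value-of select="."/>
          <xsl:value-of select="document('buzz.xml')/page"/>
        </p>
      </xsl:template>
    </xsl:stylesheet>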
You can even use XSLT for HTML5 output, if you're careful. But YMMV with which XML processors will support stylesheets.
Yep, and this can be used to e.g. make a basically static site template and then do an include for `userdata.xml` to decorate your page with the logged in user's info (e.g. on HN, adding your username in the top right, highlighting your comments and showing the edit/delete buttons, etc.). You can for example include into a variable `<xsl:variable name="myinfo" select="document('userdata.xml')"/>` and then use it in xpath expressions like `$myinfo/user/@id`. Extremely simple, good for caching, lightweight, very high performance. Easy to fail gracefully to the logged out template. You basically get your data "API" for free since you're returning XML in your data model. I will never understand why it didn't take off.
XML includes are blocking because XSL support hasn't been updated for 25 years, but there's no reason why we couldn't have it async by now if resources were devoted to this instead of webusb etc.
> if resources were devoted to this
You'd better not jinx it: XSL support seems like just the sort of thing browser devs would want to tear out in the name of reducing attack surface. They already dislike the better-known SVG and never add any new features to it. I often worry that the status quo persists only because they haven't really thought about it in the last 20 years.
Fortunately, XSLT is used by far too many high-importance websites (e.g. official government legal sites) for removing it to be a real threat.
> I will never understand why it didn't take off.
I’ve used XSLT in anger - I used it to build Excel worksheets (in XML format) using libXSLT. I found it very verbose and hard to read. And Xpath is pretty torturous.
I wish I could have used Javascript. I wish Office objects were halfway as easy to compose as the DOM. I know a lot of people hate on Javascript and the DOM, but it’s way easier to work with than the alternatives.
XQuery is basically XSLT with saner syntax.
Nice, I didn't think of that approach, and it should work very well for the purposes of static headers and footers.
This is the closest we can do today:
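A sketch of the pattern (assuming a fragment at /footer.html):

    <script>
      // Fetch a fragment and replace this <script> tag with its markup.
      (() => {
        const tag = document.currentScript; // only valid during initial execution
        fetch('/footer.html')
          .then((r) => r.text())
          .then((html) => { tag.outerHTML = html; });
      })();
    </script>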
Scripts will replace their tags with HTML, producing a clean source. Not pretty, but it works on the client.
You could even have a server wrap static HTML resources in that JS if you request them a certain way, like
<script src="/include/footer.html">
For /footer.html
But then you probably might as well use server side includes
Pretty sure it is possible without JavaScript, too.
I know it’s not straight HTML, but SSI (server side includes) helped with this and back in the day made for some incredibly powerful caching solutions. You could write out chunks of your site statically and periodically refresh them in the server side, while benefitting from serving static content to your users. (This was in the pre varnish era, and before everyone was using memcached)
I personally used this to great success on a couple of Premier League football club websites around the mid 2000s.
One benefit of doing it on the client is the client can cache the result of an include. So, for example, instead of having to download the content of a header and footer for every page, it is just downloaded once and re-used for future pages.
It’s amazing how people vociferously argue against this. If it was implemented we would be arguing over something else.
How big are your headers and footers, really? Is caching them worth the extra complexity on the client, plus all the pain of cache invalidation (and the two extra requests in the non-cached case)?
I’m willing to bet the runtime overhead of assembly on the client is going to be larger than the download cost of the fragments being included server or edge side and cached
If you measure download cost in time then sure.. If you measure download cost in terms of bytes downloaded, or server costs, then nope. The cost would be smaller to cache.
Not necessarily, compression is really effective at reducing downloaded bytes
In server terms, the overhead of tracking one download is going to be less than the overhead of tracking the download of the multiple components.
And for client side caching to be any use then a visitor would need to view more than one page and the harsh reality is many sessions are only one page long e.g. news sites, blogs etc
I'm a full stack developer. I do server side rendering. I agree that this is a 'solved problem' for that case. However there are many times I don't want to run a server or a static site generator. I manage a lot of projects. I don't want more build steps than necessary. I just want to put some HTML on the net with some basic includes, without JavaScript. But currently I would go the web component route and accept the extra JS.
This is just my own understanding, but doesn't a webpage consist of a bunch of nodes, which can be combined in any way? And an HTML document is supposed to be a complete set of nodes, so a combination of those won't be a single document anymore.
Nodes can be addressed individually, but a document is the unit of transmission, which also contains metadata. You can combine nodes as you like, but you can't really combine two already packed and annotated documents of nodes.
So I would say it is more due to semantic meaning. I think there was also the idea of requesting arbitrary sets of nodes, but that was never developed, and with the shift away from the semantic document it didn't make sense anymore.
I think the quickest way to say it is that there is only one head on a page, and every HTML file needs a head. So if you include one into the other, you either have two heads, or the inner document didn't have a head.
They can just be html chunks. No need to make sense on their own.
Maybe a single tag that points at an url to load if someone attempts to load the chunk directly.
See DocumentFragment - sounds a lot like this: https://developer.mozilla.org/en-US/docs/Web/API/DocumentFra...
> a webpage consist of a bunch of nodes, which can be combined in any way
More or less, but manipulating the nodes requires JavaScript, which some people would like to avoid.
I wasn't talking about the nodes in the DOM. I meant the minimal annotated information snippets, that the WWW is supposed to consist of, as opposed to the minimum addressable units.
At least some of the blame here is the bias towards HTML being something that is dynamically code-generated, as opposed to something that is statically handwritten by many people.
There are features that would be good for the latter that have been removed. For example, if you need to embed HTML code examples, you can use the <xmp> tag, which makes it so you don't need to encode escapes. Sadly, the HTML5 spec is trying to obsolete the <xmp> tag even though it's the only way to make this work. All browsers seem to be supporting it anyways, but once it is removed you will always have to encode the examples.
HTML spec developers should be more careful to consider people hand coding HTML when designing specifications, or at least decisions that will require JavaScript to accomplish something it probably shouldn't be needed for.
It's the other way around, HTML was designed to be hand written, and the feature set was defined at that stage. If it ended up being dynamically generated, that happened after the feature set was defined.
There used to be a thing for this
https://caniuse.com/imports
No, HTML Imports were an idea for using the HTML document format to encapsulate the 3 distinct data types needed for custom elements:
- JS for functionality via the custom elements API
- HTML for layout via <template> tags
- CSS for aesthetics via <style> tags
Not for just quickly and simply inserting the contents of header.html at a specific location in the DOM.
Says "superseded by ES modules". Not really the same thing, right?
The article asks about includes but also about imports ("HTML cannot import HTML"), which this very directly is.
This feature was billed as #includes for the web [1]. No, it acts nothing like an #include. TBH I don't see why ES modules are a "replacement" here.
Personally I would like to see something like these imports come back, as a way to reuse HTML structure across pages, BUT purely declaratively (no JS needed).
#includes of partially formed HTML (i.e., header.html has an opening <body> tag and footer.html has the closing tag) aren't very DOM compatible.
[1] https://web.archive.org/web/20181121181125/https://www.html5...
Chad QQ and UC browsers, the only ones still supporting HTML imports lmao. I've never heard of them before but I like the cut of their jib.
You can get JS-free, client-side include functionality if you're willing to wrap your HTML in XML. Here is a demo:
https://github.com/Evidlo/xsl-website
I don't think you even need to wrap it, really. You need to make sure it's valid XML, but the root element could be <html> just fine. And then use an identity transform with <xsl:output method="html">.
That's interesting, thanks.
How well supported is XSLT in modern browsers? What would be the drawbacks of using this approach for a modern website?
The reason is simple, HTML is not a hypertext markup language. Markup is the process of adding commentary and other information on top of an existing document, and HTML is ironically incapable of doing the one thing it most definitely should be able to do.
It's so bad that if you want to discuss marking up hypertext (i.e. putting notes on top of existing read-only text files, etc.) you'll have to Google the word "annotation" to even start to get close.
Along with C macros, Case Sensitivity, Null terminated strings, unauthenticated email, ambient authority operating systems, HTML is one of the major mistakes of computing.
We should have had the Memex at least a decade ago, and we've got this crap instead. 8(
They didn't mention frameset (https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...).
Kind of serious question. Do we have any alternatives to html? If not, why? It’s essentially all html. Yes, browser will render svg/pdf/md and so on, but as far as I can tell, it’s not what I consider "real web" (links to other documents, support for styling, shared resources, scripting, and so on ).
I would have loved for there to be a JSON-based format, or perhaps YAML, as an alternative to the XML-based stuff we have today.
Fun fact: this does work with iframes:
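Something like this (a sketch with placeholder file names):

    <nav>
      <a href="article1.html" target="content">Article 1</a>
      <a href="article2.html" target="content">Article 2</a>
    </nav>
    <!-- Links with target="content" load into this iframe because it is *named* -->
    <iframe name="content" src="article1.html"></iframe>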
The important part is that the target iframe must have a `name` attribute (not be identified by `id`). I guess this is a legacy of framesets & frames.
(Of course, this has all the issues of framesets, as in deep linking, accessibility, etc.)
The worst part of frames is scrolling.
You have to give an iframe a specific height in pixels. There is no "make this iframe the height its content wants to be" (like normal HTML).
This leads to two options:
- your page has nested vertical scroll bars (awful UX)
- you have to write JavaScript inside and outside the frame to constantly measure and communicate how tall the frame wants to be
Or you could just not use frames.
I guess the best you could do is emulate a frameset layout, with a fixed navigation and a display frame for the actual content. (By setting the overflow to `hidden` you can get rid of the outer scrollbars.)
If I really need HTML includes for some reason, I'd reach for XSLT. I know it's old, and barely maintained at best, but that was the layer intentionally added to bring programming-language features to the markup language that is HTML.
I believe XSLT 1 is still working in all major browsers today. Here's a simple HTML 5 example with two pages sharing a header template: https://gist.github.com/MarkTiedemann/0e6d36c337159a3e6d5072...
My main gripe is a decade(s?) old Firefox bug related to rendering an HTML string to the DOM.
That may be a fairly specific use case though, and largely it still works great today. I've done a few side projects with XSLT and web components for interactivity, worked great.
What bug specifically?
Couldn't find a good link earlier, guess I didn't have quite the right keywords for search.
Here we go, looks like its 17 years old now:
https://bugzilla.mozilla.org/show_bug.cgi?id=98168#c99
This bug is specifically about <xsl:text disable-output-escaping="yes"> not working in Firefox. How is disabling output escaping relevant in regards to sharing templates between pages?
from the linked thread:
> The only combination that fails to render these entities correctly is Firefox/XSLT.
Which is one good reason not to adopt XSLT to implement HTML includes. You just don't know what snags you'll hit upon but you can be sure you'll be on your own.
> Bug 98168 (doe) Opened 24 years ago Updated 21 days ago
Well it does look like someone's still mulling over whether and how to fix it... 24 years later...
I think XSLT is still a reasonable technology in itself - the lack of updated implementations is the bad part. I think modern browsers only support 1.0 (?). At least most modern programming languages should have 3.0 support.
Firefox has a very old bug related to rendering an HTML string to the DOM without escaping it; that one has bitten me a few times. Nothing a tiny inline script can't fix, but it's frustrating to have such a basic feature fail.
Debugging is also pretty painful, or I at least haven't found a good dev setup for it.
That said, I'm happy to reach for XSLT when it makes sense. It's pretty amazing what can be done with such old tech; for the core use case of props and templates to HTML you really don't need React.
If you want to include HTML sandboxes, we have iframes. If you want it served from the server, it's just text. Putting text A inside text B is a solved problem.
> Putting text A inside text B is a solved problem.
Yes, but in regards to HTML it hasn't been solved in a standard way; it's been solved in hundreds, if not thousands, of non-standard ways. The point of the article is that having one standard way could remove a lot of complexity from the ecosystem, as ES6 imports did.
The article references both of these methods with explanations of why they don't feel they answer the question posed.
> We’ve got <iframe>, which technically is a pure HTML solution, but they are bad for overall performance, accessibility, and generally extremely awkward here
What does this mean? This is a pure HTML solution, not just "technically" but in reality. (And before iframe there were frames and frameset). Just because the author doesn't like them don't make them non-existent.
What do you mean what does it mean?
An iframe is a window into another webpage, and is bounded as such both visually and in terms of DOM interfaces. A simple example would be that an iframe header can't have drop-down menus that overlap content from the page hosting it.
They are categorically not the same DX/UX as SSI et al. and it's absolutely bizarre to me that there's so many comments making this complaint.
The real problem with iframes is that their size is set by the parent document only.
They would be a lot more useful if we could write e.g. <iframe src=abc.html height=auto width=100> so the height of the iframe element is set by the abc.html document instead of the parent document.
You could do this with JS in the child document, if it's important to keep JS out of the parent.
No, not if it is a cross-site URL.
Then you need postMessage to send the body size to the parent frame, which then needs to listen for messages and resize the iframe element.
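Roughly like this (a sketch; the id and message shape are invented, and real code should check e.origin):

    <!-- Parent document: listen for the message and resize the iframe -->
    <iframe id="embed" src="https://example.org/widget.html"></iframe>
    <script>
      window.addEventListener('message', (e) => {
        if (e.data && e.data.kind === 'frame-height') {
          document.getElementById('embed').style.height = e.data.px + 'px';
        }
      });
    </script>

    <!-- Child document: report its own height to the embedding page -->
    <script>
      parent.postMessage(
        { kind: 'frame-height', px: document.documentElement.scrollHeight },
        '*' // in real code, pass the parent's origin instead of '*'
      );
    </script>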
Totally! I thought we were talking about the same site case.
You can achieve that with js in the parent document.
You can achieve everything with JS in the parent document, it doesn’t mean it should be required or even recommended
No way. You can't make a decent single web page by iframing a bunch of components together.
I'm not an expert on this, but IMO, from a language point of view, HTML is a markup language: it 'must' have no logic or processing. It is there to structure the information, not to dynamically change it, nor even to display it nicely.
The logic is performed elsewhere. If you were to have includes directly in HTML, it means that browsers must implement logic for HTML. So it is not 'just' a parser anymore.
Imagine for example that I create an infinite loop of includes: who is responsible for limiting me? How do we ensure that all browsers implement it in the same way?
What happens if I perform an injection from another website? Then we start to have CORS policy management to write. (iframes were bad for this.)
Now imagine using Javascript I inject an include somewhere, should the website reload in some way? So we have a dynamic DOM in HTML?
> from a language point of view, HTML is a markup language, it 'must' have no logic or processing.
Client-side includes are not "processing". HTML already has frames and iframes which do this, just in a worse way, so we'd be better off.
I understand your point, but I still think it is bad from the point of view of language paradigms[1]. Iframes should not have been created in the first place. You are changing the purpose of the language, which was not made for this.
(yes in my view I interpret includes as a basic procedure)
[1] http://www.info.ucl.ac.be/people/PVR/paradigmsDIAGRAMeng201....
There is nothing procedural about includes. In fact, the idea first appeared, under the name transclusion, in Xanadu, before HTML even became popular:
https://en.wikipedia.org/wiki/Transclusion
> an infinite loop of includes
We can probably copy the specs for <frameset> and deal with it the same way:
https://www.w3.org/TR/WD-frames-970331#:~:text=Infinite%20Re...
> How to ensure that all other browsers implement it in the same way?
Browsers that don't implement the specs will eventually break:
https://bugzilla.mozilla.org/show_bug.cgi?id=8065
There is a very, very broad line in that "no logic or processing". HTML/CSS already do a lot of logic and processing. And many "markup languages" have include support. Like wikitext used in wikipedia and includes in Asciidoc.
I feel like most of your answer is invalidated by
* the actual existence of frames (although those are deprecated)
* iframes (which are not deprecated, so seemingly doing declarative inclusion of HTML in HTML was not what was wrong with frames)
* imports in CSS, which share some of the same problems / concerns as HTML imports
* the existence of JavaScript with its ability to change anything on the page, including the ability to issue HTTP requests, and to be written in arbitrarily obfuscated ways.
We have the object tag, don't we? Is there anything wrong with it?
https://www.w3.org/TR/WD-html40-970708/struct/includes.html#...
You can't include a menu like this. Clicking on a link in a menu included like this won't work.
I 100% agree with the sentiment of this article. For my personal website, I write pretty much every page by hand, and I have a header and a footer on most of those pages. I certainly don't want to have to update every single page every time I want to add a new navigation button to the top of the page. For a while I used PHP, but I was running a PHP server literally for only this feature. I eventually switched to JavaScript, but likewise, on a majority of my pages, this was the only JavaScript I had, and I wanted to have a "pure" HTML page for a multitude of reasons.
In the end, I settled on using a Caddy directive to do it. It still feels like a tacked on solution, but this is about as pure as I can get to just automatically "pasting" in the code, as described in the article.
I'd say in 80% of the cases a pure, static html include is not enough. In a menu include, you want to disable the link to the currently shown page or show a page specific breadcrumb. In a footer include, you may want a dynamic "last updated" timestamp or the current year in the copyright notice. As all these use cases required a server-side scripting language anyway, there was no push behind an html include.
> In a menu include, you want to disable the link to the currently shown page
I’ve always just styled the link to the current page differently, not disabled it, which you can do with an id on the page and a line of CSS.
Initially HTML was less about the presentation layer and more about the "document" concept. Documents should be self-contained, outside of references to other documents.
One document == one HTML page was never the idea. Documents are often way too long to comfortably read and navigate that way. Breaking them into sections and linking between them was part of the core idea of HTML.
Includes are a standard part of many document systems. Headers and footers are a perfect example - if I update a document I certainly don't want to update the document revision number on every single page! It also allows you to add navigation between documents in a way that is easy to maintain.
LaTeX can do it. Microsoft Word can do it (in a typically horrible Microsoftian way). Why not HTML?
I still think this is the best web: either you are a collection of interlinked documents and forms (manual pages, wikis, ...), or you are a full application (Figma, Gmail, Google Docs). But a lot of sites are trying to be both, and some are trying to be one type while they are really the other.
> Our developer brains scream at us to ensure that we’re not copying the exact code three times, we’re creating the header once then “including” it on the three (or a thousand) other pages.
Interesting, my brain doesn't work this way: I want to send a minimum number of files per link requested. I don't care if I include the same text, because the web is generally slow, and that is generally caused by a zillion files sent and a ton of JS.
We discussed this back when creating web components, but the focus quickly became about SPA applications instead of MPAs and the demand for features like this was low in that space.
I wish I would have advocated more for it though. I think it would be pretty easy to add using a new attribute on <script> since the parser already pauses there, so making something like <script transclude={url}> would likely not be too difficult.
My first ever website I wrote with mod_include and .shtml - updating a website was just adding a few tags.
Also I miss framesets - with that a proper sidebar navigation was easily possible.
Same here.
I’m not saying my first website was impressive — but as a programmer there’s no way I was copying and pasting the same header / footer stuff into each page and quickly found “shtml” and used that as much as possible.
Then used the integrated FTP support in whatever editor it was (“HTML-kit” I think it was called?) - to upload it straight to prod. Like a true professional cowboy.
On topic: what's the absolute minimal static site generator that can achieve this feature? I know things like Pelican can do it but it's pretty heavy. C preprocessor probably can be used for this...
Probably m4 as well, but the syntax isn't really pretty by today's standards.
This seems to be forgetting the need to render other site's content. That's the main reason for iframes to be used, as people need to render ads, email previews, games, and so forth, without potentially breaking the rest of the page.
The "extremely awkward" aspect they complain about is a side effect of needing to handle that case.
You could add some nicer way to include content for the same domain, but I suspect having two highly similar HTML features would be fairly awkward in practice, as you'd have to create a whole new set of security rules for it.
We used to have this in the form of a pair of HTML tags: <frameset> and <frame> (not to be confused with the totally separate <iframe>!). <frameset> provided the scaffolding with slots for multiple frames, letting you easily create a page made up entirely of subpages. It was once popular and, in many ways, worked quite neatly. It let you define static elements once entirely client-side (and without JS!), and reload only the necessary parts of the page - long before AJAX was a thing. You could even update multiple frames at once when needed.
From what I remember, the main problem was that it broke URLs: you could only link to the initial state of the page, and navigating around the site wouldn't update the address bar - so deep linking wasn’t possible (early JavaScript SPA frameworks had the same issue, BTW). Another related problem was that each subframe had to be a full HTML document, so they did have their own individual URLs. These would get indexed by search engines, and users could end up on isolated subframe documents without the surrounding context the site creator intended - like just the footer, or the article content without any navigation.
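For anyone who never saw one, the skeleton looked roughly like this (a sketch from memory; file names invented):

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
      "http://www.w3.org/TR/html4/frameset.dtd">
    <html>
      <head><title>Site</title></head>
      <!-- frameset replaces <body>; each frame is its own full document -->
      <frameset cols="200,*">
        <frame name="nav" src="nav.html">
        <frame name="content" src="welcome.html">
        <noframes><body>Your browser does not support frames.</body></noframes>
      </frameset>
    </html>

    <!-- In nav.html, links target the content frame by name: -->
    <a href="article.html" target="content">Article</a>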
For HTML-in-HTML natively I use iframes and still remember how to use Frames & framesets.
Like MHTML/MHT/MIME HTML files? Browsers could load *.mht files that included everything. IE, Firefox, Chrome all used to support it.
In the nineties we fixed it with frames or CGI. I still think of it as one of those “if it was fiction it would be unrealistic” things (although, who writes fictional markup standards?)
SHTML used to be a thing back in the 1990s: https://en.wiktionary.org/wiki/SHTML
Or, better, "Server Side Includes" (SSI): https://en.wikipedia.org/wiki/Server_Side_Includes
SSI is still a thing: I use it on my personal website. It isn't really part of the HTML, though: it's a server-dependent extension to HTML. It's supported by Apache and nginx, but not by every server, so you have to have control over the server stack, not just access to the documents.
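The directive itself is just an HTML comment that the server expands before the page is sent, something like this (a sketch; paths are made up, and it only runs if SSI is enabled, e.g. via the .shtml extension or explicit Apache/nginx config):

    <!--#include virtual="/includes/header.html" -->
    <main>
      Page content here.
    </main>
    <!--#include virtual="/includes/footer.html" -->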
I made this to get around pages being cached at CDN level, but still needing to get live data...
https://github.com/jasoncartwright/clientsideinclude
My guess: no-one needs it.
Originally, iframe were the solution, like the posts mentions. By the time iframes became unfashionable, nobody was writing HTML with their bare hands anymore. Since then, people use a myriad of other tools and, as also mentioned, they all have a way to fix this.
So the only group who would benefit from a better iframe is the group of people who don't use any tools and write their HTML with their bare hands in 2025. That is an astonishingly small group. Even if you use a script to convert markdown files to blog posts, you already fall outside of it.
No-one needs it, so the iframe does not get reinvented.
No, originally frameset[0] and frame[1] were the solution to this problem. I remember building a website in the late 1990s with frameset. iframe came later, and basically allowed you to do frames without the frameset. Anyway, frameset is also the reason every browser's user agent starts with "Mozilla".
[0] https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
[1] https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
Originally my footers and navbars were included with server side includes
what if it could be a larger group though? modern css has been advancing rather rapidly... I don't even need a preprocessing library any more... I've got nested rules, variables, even some light data handling... why not start beefing up html too? we've got some new features but includes would be killer
I think it’s because it would be so easy to make a recursive page that includes itself forever. So you have to have rules when it’s okay, and that’s more complex and opaque than just programming it yourself.
It's a pity: of all the web's advancements (JS, CSS, runtimes, web engines), HTML was the most stagnant, despite the "HTML5" effing hype. My guess is they did not want to empower HTML and threaten SSRs or similar solutions. I believe the biggest reason for not taking that step is the damned backward compatibility. Some just won't budge.
HTML5 hype started strong out of the gate because of the video and audio tags, and canvas slightly after. Those HTML tags were worth the hype.
Flash's reputation was quite low at the time and people were ready to finally move on from plugins being required on the web. (Though the "battle" then shifted to open vs. closed codecs.)
I too lamented the loss of HTML imports and ended up coming up with my own JavaScript library for it.
https://miragecraft.com/blog/replacing-html-imports
At the end of the day it’s not something trivial to implement at the HTML spec/parser level.
For relative links, how should the page doing the import handle them?
Do nothing and let it break, convert to absolute links, or remap it as a new relative link?
Should the include be done synchronously or asynchronously?
The big benefit of traditional server side includes is that they're synchronous, thus simplifying logic for in-page JavaScript, but all browsers are trying to eliminate synchronous calls for speed; it's hard to see them agreeing to add a new synchronous bottleneck.
Should it be CORS restricted? If it is then it blocks offline use (file:// protocol) which really kills its utility.
There are a lot of hurdles to it and it’s hard to get people to agree on the exact implementation, it might be best to leave it to JavaScript libraries.
Someone else made the same - https://github.com/Paul-Browne/HTMLInclude - but it's not been updated in 7 years, leaving questions. I'll try yours and theirs in due course. Err, and the fragment @HumanOstrich said elsewhere in comments.
My guess is that some or maybe all of your concerns should have been solved by CSS @import (https://developer.mozilla.org/en-US/docs/Web/CSS/@import) although, as I'm reading the first few lines of the linked article, those must appear near the top of a CSS file, so are significantly more restricted than an import that can appear in the middle of a document.
So glad I decided early in my career to not do webpages. Look how much discussion this minor feature has generated. I did make infra tools that outputted basic html, get post cgi type of stuff. What's funny is this stuff was deployed right before AWS was launched and a year later the on prem infra was sold and the warehouse services were moved to the cloud.
You and me both. I did some web dev back in the early days, and noped out when IE was dragging everyone down with its refusal to change. I have never had a reason to regret that decision.
honestly, html can include css and javascript via link and script tags. there's no reason for it to not have an <include src="" /> tag, and let the browser parsing it fetch the content to replace it.
Chris is an absolute legend in this space and I’m so glad he’s bringing this up. I feel like he might actually have pull here and start good discussions that might have actual solutions.
HTML frames solved this problem just fine, but they were deprecated in favour of using AJAX to replace portions of the body as you navigate (e.g.: SPAs).
I still feel like frames were great for their use case.
We know why HTML alone can't do includes! In https://github.com/whatwg/html/issues/2791 the standards committee discussed this. The issue has been open for years.
The first naysayer was @domenic: https://github.com/whatwg/html/issues/2791#issuecomment-3113...
> I don't think we should do this. The user experience is much better if such inclusion is done server-side ahead of time, instead of at runtime. Otherwise, you can emulate it with JavaScript, if you value developer convenience more than user experience.
The "user experience" problem he's describing is a performance problem, adding an extra round-trip to the server to fetch the included HTML. If you request "/blog/article.html", and the article includes "/blog/header.html", you'll have to do another request to the server to fetch the header.
It would also prevent streaming parsing and rendering, where the browser can parse and render HTML bit-by-bit as it streams in from the server.
Before you say, "so, what's the big deal with adding another round trip and breaking the streaming parser?" go ahead and read through the hundreds of comments on that thread. "What's the big deal" has not convinced browser devs for at least eight years, so, pick another argument.
I think there is a narrow opening, where some noble volunteer would spec out a streaming document-fragment parser.
It would involve a lot of complicated technical specification detail. I know a thing or two about browser implementation and specification writing, and designing a streaming document-fragment parser is far, far beyond my ken.
But, if you care about this, that's where you'd start. Good luck!
P.S. There is another option available to you: it is kinda possible to do client-side includes using a service worker. A service worker is a client-side proxy server that the browser will talk to when requesting documents; the service worker can fetch document fragments and merge them together (even streaming fragments!) with just a bit of JS.
But that option kinda sucks as a developer experience, because the service worker doesn't work the first time a user visits your site, so you'd have to implement server-side includes and also serve up document fragments, just for second-time visitors who already have the header cached.
Still, if all you want is to return a fast cached header while the body of your page loads, service workers are a fantastic solution to that problem.
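A rough sketch of that service worker (the fragment URLs are made up; real code also needs registration, error handling, and caching):

    // sw.js: assemble pages from fragments on the client.
    self.addEventListener('fetch', (event) => {
      if (event.request.mode !== 'navigate') return;
      event.respondWith((async () => {
        const [header, body, footer] = await Promise.all([
          fetch('/fragments/header.html').then((r) => r.text()),
          fetch(event.request).then((r) => r.text()), // page body fragment
          fetch('/fragments/footer.html').then((r) => r.text()),
        ]);
        return new Response(header + body + footer, {
          headers: { 'Content-Type': 'text/html; charset=utf-8' },
        });
      })());
    });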
I bet a lot of other features were implemented without as much forethought and we were ok.
I desperately want to go back to crafting sites by hand and not reach for React/Vue as a default. I do a lot of static and temporary sites that do very little.
Use HTMX.
this
(and this https://harcstack.org)
Astro is all the goodness of components but for static sites.
I think the authors of htmx have the same questions :)
I would guess that back in the day extra requests were expensive, and thus discouraged. Later there were attempts via XInclude, but by then PHP and the like had taken over, or people tolerated frames.
The SVG <use> element can do exactly what the OP desires. SVG can be inlined in HTML, and HTML can be inlined in SVG too. I never understand why web devs learn HTML and then stop there instead of also learning SVG, which looks just like HTML but has a lot more power.
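For example, same-document reuse with <use> looks like this (ids are made up; reusing fragments across files has its own caveats):
<svg width="0" height="0" style="position:absolute">
  <defs>
    <g id="logo">
      <circle cx="20" cy="20" r="18" fill="teal"/>
    </g>
  </defs>
</svg>
<svg width="40" height="40"><use href="#logo"/></svg>
<svg width="40" height="40"><use href="#logo"/></svg>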
Now you can include HTML in HTML, see https://include.franzai.com/ - a quick Chrome Polyfill based on the discussion here. MIT License
Github: https://github.com/franzenzenhofer/html-include-polyfill-ext...
SHOW HN: https://news.ycombinator.com/item?id=43881815
I still use server side includes. It is absolutely the best ratio of templating power to attack surface. SSI basically hasn't changed in the last 20 years and is solid in apache, nginx, etc. You can avoid all the static site generator stuff and just write pure .html files.
It should not have gone away. It never did for me.
Also, this is kind of what 'frames' were and how they were used. Everything old is new again.
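For anyone who hasn't seen it, an SSI directive is just an HTML comment the server expands (path made up); Apache needs Options +Includes plus an INCLUDES output filter, and nginx needs ssi on;.
<!--#include virtual="/partials/header.html" -->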
When I read all the arguments against, I think: “Perfect is the enemy of good.”
Just to the point: why include this stuff and load more at once at all? Ever since CGI, the web has worked asynchronously, loading just what you actually need; nowadays most of that is fetch or XHR calls. I can see it making sense for one-pagers, to keep the markup a bit more structured, but why would you want statically rendered homepages these days?
Do remote entity references still work in XHTML? XML had its issues, but it did have a decent toolbox of powerful, if sometimes insecure, primitives.
HTML does have frames and iframes, which can accomplish some of the same goals.
It is indeed mentioned in the article; it's an awful solution that performs poorly and breaks accessibility.
Thanks. I was reading too fast and missed the iframe reference in the article
Why would the performance be any better with another tag?
A frame is a separate rendering context—it's (almost) as heavyweight as a new tab. The author wants to insert content from another file directly into the existing DOM, merging the two documents completely.
Negligible twenty years ago. But yes, if there's an improvement it should be merged automatically into the same document.
Seems everyone forgot HTML-SSI which worked something like this. Many servers and hosting websites of the 90s supported it.
Because how would the browser detect that it's in a fetch loop?
The simplest answer is that HTML wasn't designed as a presentation language, but a hypertext document language. CSS and JavaScript were add-ons after the fact. Images weren't even in the first version. Once usage of the web grew beyond the initial vision, solutions like server-side includes and server-side languages that rendered HTML were sufficient.
I think the best example of HTML in that regard is the HTML-rendered Info pages [0] for Emacs and its ecosystem. Then you have the same content presented in HTML at [1]. Templates were enough in the first case. Includes are better in the second case due to common assets external to the content.
[0]: https://www.gnu.org/software/emacs/manual/html_node/emacs/in...
[1]: https://emacsdocs.org/docs/emacs/The-Emacs-Editor
This has always worked for me. Pretty much the ask? https://gist.github.com/sreekotay/08f9dfcd7553abb8f1bb17375d...
That's the first thing listed in the article? "Javascript to go fetch the HTML and insert it". What they're after is something that's _just_ HTML and not another language.
While you do need a server, I think this is the functional equivalent? The fetch-and-insert JS outlined (linked to) in the article is async. This blocks execution like you'd expect an HTML include to do. It's WAY easier to reason about - which is the point of the initial ask, I think...
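I haven't dug into the gist, but a blocking include along those lines can be as small as this (sync XHR is deprecated, it's only here to show the blocking behaviour; header.html is a made-up path):
<script>
  // synchronous request blocks parsing right here, then writes the fragment in place
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "header.html", false);
  xhr.send();
  document.write(xhr.responseText);
</script>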
The <object> tag appears to include/embed other html pages.
An embedded HTML page:
<object data="snippet.html" width="500" height="200"></object>
https://www.w3schools.com/tags/tag_object.asp
But you need a server for that to work.
You need a server for HTML to work, as a practical matter. But yes. There IS a workaround to that too, if you're REALLY determined, but you have to format your HTML as a giant JS comment block (lol, really :))
[edit: I'm sure there are still some file:// workflows for docs - and yes this doesn't address that]
You don't need a server for HTML to work, I can just hand you a USB stick/floppy disk/MO disk for your NeXT with HTML files on it.
( •_•) ( •_•)>⌐■-■ (⌐■_■) Deal with it.
:)
Have you ever heard of Server Side Includes?
https://en.wikipedia.org/wiki/Server_Side_Includes
After researching this very topic earlier, SSI is the most pragmatic solution. Check out Caddy's template language (based on Go templates); it is quite capable and quite similar to building themes in Hugo, just much more bare-bones.
I have built several sites with pure HTML+CSS, sprinkled with some light SSI with Caddy, and it is rock solid and very performant!
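From memory, the Caddy side is roughly this (site name and paths are made up; I think the template function is include, but check the templates docs):
example.com {
    root * /srv/site
    templates
    file_server
}
and then inside any served page:
{{include "/partials/header.html"}}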
That's mentioned in TFA under "old school web server directives".
Seems like this would help with caching, too.
Lots of rationalization in here; it's always been needed. I complained about the lack of <include src="..."> when building my first site in '94/'95, with SimpleText and/or Notepad!
It was not in the early spec, and it seems someone powerful wouldn't allow it in later. So everyone else made workarounds, in any way they could, resulting in the need being lessened quite a bit.
My current best workaround is the <object data=".."> tag, which has a few better defaults than iframe. If you put a link to the same stylesheet in the include file it will match pretty well. Size with width=100%, though with height you'll need to eyeball or use javascript.
Or, Javascript can also hoist the elements to the document level if you really need. Sample code at this site: https://www.filamentgroup.com/lab/html-includes/
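From memory it's something along these lines (not necessarily identical to the article's code; header.html is a made-up path, and contentDocument only works same-origin):
<iframe src="header.html" style="border:0"
        onload="this.before(...this.contentDocument.body.childNodes); this.remove()"></iframe>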
Some people are too smart for their own, and ours, good.
just use react or nextjs or whatever and move on jeez
we had no problem using <object> for headers and footers
in the meantime, you can use <html-include> https://www.npmjs.com/package/html-include-element
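Usage is roughly this, if I remember the element's API right (URL and path from memory):
<script type="module" src="https://unpkg.com/html-include-element"></script>
<html-include src="/partials/header.html"></html-include>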
I think this is a genuinely good question that I was also wondering some time ago.
And it is a genuinely good question!
I think the answer from PD says feels the truest.
JS/CSS with all their bureaucracy are nothing compared to HTML, it seems. Maybe people don't find anything wrong with HTML, or if they do, they just reach for JS/CSS and try to fix HTML from there (ahem, frontend frameworks).
That being said, I have just regurgitated what PD says has said, and I give him full credit for that. But I'm also genuinely confused as to why I have heard that JS/CSS are bureaucratic (I remember a Fireship video about types being added to JS, which I think I watched at least a year ago, so I could be wrong), yet I haven't heard anything since, and from my observation a lot of JS proposals are just stuck.
And yet HTML is bureaucratic to such a degree that the answer to why it doesn't have a feature is its bureaucracy. Maybe someone can explain the history of it and why?
Iframes, while not perfect, are pretty close though...
Making iframes be the right size is super awkward. I might actually use them more if they were easy to get responsive.
This post does link to a technique (new to me) to extract iframe contents:
I've come across this technique here [0] to try it on <object> elements, but sizing is even more difficult there.
[0]: https://www.filamentgroup.com/lab/html-includes/
Are we solving the information-centric transclusion problem, or the design-centric asset reuse problem? An iframe is fine for the former but is not geared towards design and layout solutions.
It kinda sucks for both! Dropping in a box of text that flatly does not resize to fit its contents does not fit the definition of "fine" for me, here.
You can do some really silly maneuvers with `window.postMessage` to communicate an expected size between the parent and frame on resize, but that's expensive and fiddly.
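For reference, the dance looks roughly like this; the header id is made up, and real code should check e.origin/e.source before trusting the message:
<!-- inside the framed document: report our height to the parent whenever it changes -->
<script>
  new ResizeObserver(function () {
    parent.postMessage({ height: document.documentElement.scrollHeight }, "*");
  }).observe(document.documentElement);
</script>
<!-- in the parent: resize the iframe when the framed page reports its height -->
<script>
  window.addEventListener("message", function (e) {
    var frame = document.getElementById("header");
    if (frame && e.data && typeof e.data.height === "number") {
      frame.style.height = e.data.height + "px";
    }
  });
</script>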
Iframes fundamentally encapsulate html documents, not fragments.
Interaction between elements in different iframes is very restricted.
IIRC, you can communicate entire JSON objects between an iframe and its host frame with postMessage.
The host can then act as a server for the iframe client, even updating its state or DOM in response to a message from the iframe.
Because it's HyperText, the main idea is that you link to other content. So this is not a weird feature to ask for; it's just a different way of doing the whole raison d'être of the tech. In fact, the tag to link stuff is the <a> tag. It just so happens that it makes you load the other "page": instead of transcluding content, the idea is that you load it.
It wouldn't make sense to transclude the article about the United States in the article about Wyoming (and in fact modern wikipedia shows a pop up bubble doing a partial transclusion, but would benefit in no way from basic html transclusion.)
It's a simple idea. But of course modern HTML is not at all what HTML was designed to be, but that's the canonical answer.
The elders of HTML would just tell you to make an <a> link to whatever you wanted to transclude instead. Be it a "footer/header/table of contents" or another encyclopedic article, or whatever. Because that's how HTML works, and not the way you suggest.
Think of what would happen if it were the case: you would transclude page A, which transcludes page B, and so on with page C, possibly transcluding page B again recursively. You would turn the user agent (browser) into a whole WWW crawler!
It's because HTML is pass by reference, not pass by copy.
Iframes have a src and include other HTML. We used to make sites with them way back.
the web platform is the tech stack version of the human concept of "failing upward". It sucks but will only get more and more vital in the modern tech scene as time goes by.
Honest answer: because any serious efforts to improve HTML died 20 years ago, and the web as it's envisaged today is not an infinite library of the world's knowledge but instead a JavaScript-based platform.
Asking for things that the W3C had specced out in 2006 for XML tech is just not reasonable if it doesn't facilitate clicks.
iframe is html
FTA
> We’ve got <iframe>, which technically is a pure HTML solution, but
And then on the following paragraph..
> But none of the solutions is HTML
> None of these are a straightforward HTML tag
Not sure what the point is. Maybe just complaining.
<iframe> is different from what the author is asking for; it has its own DOM, etc. He wants something like an SSI but client side. He explains some of the problems right after the part you cut off above.
"We’ve got <iframe>, which technically is a pure HTML solution, but they are bad for overall performance, accessibility, and generally extremely awkward here"
Iframe is stuck in a rectangular box. It's not really suitable for things like site wide headers, footers and menus.
While I get your point, headers and footers and menus tend to all live within rectangular boxes.
Headers and their menus are often problematic for this approach, unless they are 100% static (e.g. HN would work, but Reddit and Google wouldn't, since they both put things in their header which can expand over the content). I.e. you can make it transparent, but that doesn't stop it from eating the interactions. The code needed to work around that is more than just using JS to do the imports.
Headers and footers, yes. Menus generally need to expand when you interact with them, especially on mobile.
Try doing a drop-down list of links in a header in an iframe.
I guess for a similar reason that Markdown does not have any "include" ability -- it is a feature that isn't useful enough and has too many issues to deal with. They are really intended to be used as "single" documents.
Yeah, people downvote me but can't be bothered to leave a comment.
If you disagree, and you think you are in the right, you probably have a somewhat good argument you can use in a reply.
The fact that you don't means my explanation makes sense.
You could start with something like this:
Well, that's not "HTML alone".
Could you use an object tag for this?
In one line
customElements.define("x-include", class extends HTMLElement { connectedCallback() { fetch(this.getAttribute("href")).then(x => x.text()).then(x => this.outerHTML = x) } })
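(Custom element names must contain a hyphen, so the tag has to be something like x-include.) Then in the page it's just:
<x-include href="/partials/header.html"></x-include>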