This should be higher up and seems very relevant to understanding what's going on. Looks like the (former) maintainer does not actually want to abandon libxml2.
2 hours ago [-]
AndyKelley 5 hours ago [-]
If you think you need libxml2, think again. XML is a complex beast. Do you really need all those features? Maybe a much smaller, more easily maintained library would suit your needs while performing better at the same time!
For instance, consuming XML and creating it are two very different use cases. Zooming into consuming it, perhaps your input data has more guarantees than libxml2 assumes, such as the nonexistence of meta definition tags.
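To make the "your input has more guarantees" point concrete, here is a minimal sketch using Python's stdlib binding to Expat (`xml.parsers.expat`). It assumes the producer never sends a DTD. Instead of implementing DTD and entity handling, it enforces that guarantee by rejecting any `<!DOCTYPE>`. The names `parse_no_dtd` and `DoctypeNotAllowed` are hypothetical.

```python
import xml.parsers.expat


class DoctypeNotAllowed(ValueError):
    """Raised when the input declares a DOCTYPE (and therefore possibly a DTD)."""


def parse_no_dtd(data: bytes) -> list:
    """Return element names in document order, refusing any DOCTYPE.

    Sketch only: it assumes the producer guarantees 'no DTDs' and turns
    that assumption into a hard check rather than supporting entities.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(doctype_name, system_id, public_id, has_internal_subset):
        # Expat calls this as soon as it sees <!DOCTYPE ...>; the exception
        # propagates out of Parse() and stops parsing before any entity use.
        raise DoctypeNotAllowed(f"DOCTYPE {doctype_name!r} not allowed")

    def on_start(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return names
```

The smaller feature set becomes an explicit, testable contract instead of an unstated hope.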
throw0101a 1 hours ago [-]
> Do you really need all those features?
"You" probably do not.
But different "yous" need different features, and so they get all glommed together into one big thing. So no one needs "all" of libxml2/XML's features, each individual needs a different subset.
bartread 56 minutes ago [-]
It's the same as the old joke about Microsoft Word: people only use 10% of Word's functionality, but the problem is each person uses a different 10%.
Of course this is an oversimplification, and there will no doubt be some sort of long tail, but it expresses the challenge well. I'd imagine the same is true for many other reasonably complex libraries, frameworks, or applications.
remus 2 hours ago [-]
This process usually goes:
1. "This XML library is way bigger than what I need, I'll write something more minimal for my use case"
2. write a library for whatever minimal subset you need
3. crash report comes in, realise you missed off some feature x. Add support for some feature x.
4. Bob likes your library. So small, so elegant. He'd love to use it, if only you supported feature y, so you add support for feature y.
...
End result is x+1 big, complex XML libraries.
Obviously I'm being a bit obtuse here because you might be able to guarantee some subset of it in whatever your specific circumstances are, but I think it's hard to do over a long period of time. If people think you're speaking XML then at some point they'll say "why don't we use this nice XML feature to add this new functionality".
mort96 5 hours ago [-]
I kinda want something which just treats XML as a dumb tree definition language... give me elements with attributes as string key/value pairs, and children as an array of elements. And have a serialiser in there as well, it shouldn't hurt.
Basically something that behaves like your typical JSON parser and serialiser, but for XML.
To my knowledge, this is what TinyXML2 does, and I've used TinyXML2 for this before to great effect.
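A rough sketch of that "dumb tree" model in Python, using only the standard library's `xml.etree.ElementTree` for parsing and escaping. The `Node`, `load`, and `dump` names are hypothetical, and this is not TinyXML2's API. Mixed content (text interleaved with child elements) is deliberately flattened, which is exactly the kind of shortcut such a model makes.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)     # attribute name -> string value
    children: list = field(default_factory=list)  # child Nodes, in order
    text: str = ""  # all direct character data, whitespace-trimmed


def load(xml_text: str) -> Node:
    """Parse with the stdlib, then convert into the plain Node tree."""
    def convert(el):
        # Direct text plus each child's tail; their positions relative to
        # the children are lost (the simplification this model accepts).
        parts = [el.text or ""] + [child.tail or "" for child in el]
        return Node(el.tag, dict(el.attrib), [convert(c) for c in el],
                    "".join(parts).strip())
    return convert(ET.fromstring(xml_text))


def dump(node: Node) -> str:
    """Serialise a Node tree back to XML text; ElementTree does the escaping."""
    def build(n):
        el = ET.Element(n.tag, n.attrs)
        el.text = n.text or None
        el.extend([build(c) for c in n.children])
        return el
    return ET.tostring(build(node), encoding="unicode")
```

For config-style documents, `load(dump(tree)) == tree` holds, which is roughly the JSON-like round trip being asked for.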
cHaOs667 5 hours ago [-]
That's what you call a DOM parser. The problem with them is that, because they deserialize all the elements into objects, bigger XML files tend to eat up all of your RAM. This is where SAX2 parsers come into play: you define event-based callbacks that process the data as it streams past.
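For contrast with the DOM approach, here is a minimal SAX sketch using Python's `xml.sax`, which drives Expat underneath. The handler reacts to start, characters, and end events and keeps only the data it needs. The `<title>` element and the `collect_titles` helper are assumptions for illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element without building a tree."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # list of text pieces while inside <title>, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        # One text node may arrive split across several characters() calls.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title" and self._buf is not None:
            self.titles.append("".join(self._buf))
            self._buf = None


def collect_titles(data: bytes):
    # For a multi-GB file, pass a path or file object to xml.sax.parse() instead.
    handler = TitleCollector()
    xml.sax.parseString(data, handler)
    return handler.titles
```

Memory stays proportional to one title rather than to the whole document.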
mort96 5 hours ago [-]
The solution is simple: don't have XML files that are many gigabytes in size.
iberator 4 hours ago [-]
A lot of telco stuff dumps multiple GB of XML hourly, per BTS. We were processing a few TB of XML files on one server daily.
It's doable, just use the right tools and hacks :)
Processing schema-less or broken schema stuff is always hilarious.
Good times.
senorrib 48 minutes ago [-]
Lol I love the upbeat tone here. Helps me deal with my PTSD after working with XML files.
cHaOs667 5 hours ago [-]
Depending on the XML structure and the server's RAM, it can already happen when you approach 80-100 MB file sizes. And to be fair, in the enterprise context, you are quite often not in a position to decide how big the export of another system is. But yes, back in 2010 we built preprocessing systems that checked XMLs and split them up into smaller chunks if they exceeded a certain size.
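A chunk-splitting preprocessor like that might look roughly like this streaming sketch built on Python's `xml.etree.ElementTree.iterparse`. It assumes every record is a direct child of the document element. `split_records` and the `<chunk>` wrapper are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag, per_chunk):
    """Yield small XML documents holding at most `per_chunk` records each.

    `source` is a filename or binary file object. Assumes records are direct
    children of the document element, so clearing the root frees them.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the document element
    batch = []
    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue
        elem.tail = None  # tostring() would otherwise also emit any trailing text
        batch.append(ET.tostring(elem, encoding="unicode"))
        root.clear()  # detach finished records so memory use stays bounded
        if len(batch) == per_chunk:
            yield "<chunk>" + "".join(batch) + "</chunk>"
            batch = []
    if batch:  # flush the final, possibly short, chunk
        yield "<chunk>" + "".join(batch) + "</chunk>"


if __name__ == "__main__":
    data = b"<export>" + b"".join(b"<rec id='%d'/>" % i for i in range(5)) + b"</export>"
    for chunk in split_records(io.BytesIO(data), "rec", 2):
        print(chunk)
```

If records sit deeper in the tree, clearing the root no longer releases them, so you would clear their actual parent instead.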
stuaxo 4 hours ago [-]
Some formats are exactly like this, and they are historical (legacy) formats.
lyu07282 5 hours ago [-]
Tell that to wikimedia, I've used libxml's SAX parser in the past to parse 80GB+ xml dumps.
jeroenhd 5 hours ago [-]
XML is used in countless standards. You can't just not use it if you interact with the outside world. Every XML feature is still in the many XML libraries because someone has a need for it, even things like external entities.
Maybe you don't need libxml2 specifically (good luck finding an alternative to parse XML in C and other such languages though), but "I don't like the complex side of XML so let's pretend it doesn't exist" doesn't solve the problem most people pick libxml2 for. It's the de-facto standard because it supports everything you could possibly need.
dontlaugh 4 hours ago [-]
Exactly. For example if you need to integrate SAML, you have to support a significant subset of several XML specs. It may be possible to write a SAML-only library that supports less, but it's not clear it would be any simpler.
lyu07282 5 hours ago [-]
You shouldn't be downvoted, it's just the truth, no matter how unfortunate.
pferde 3 hours ago [-]
There is always libexpat, which works very well, also for the streaming case.
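Expat's streaming mode accepts input in arbitrary pieces, which is what makes it usable when a document never fits in memory. Here is a small sketch via Python's stdlib binding. `count_elements_incrementally` is a hypothetical helper, and the chunk boundaries can fall anywhere, even mid-tag.

```python
import xml.parsers.expat


def count_elements_incrementally(chunks) -> int:
    """Feed XML to Expat piece by piece, e.g. as bytes arrive from a socket.

    `chunks` is any iterable of bytes. Only a running counter is kept.
    """
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    for chunk in chunks:
        parser.Parse(chunk, False)  # isfinal=False: more data will follow
    parser.Parse(b"", True)  # end of input; raises ExpatError if the document is incomplete
    return count
```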
> <blink>Expat is UNDERSTAFFED and WITHOUT FUNDING.</blink>
> The following topics need additional skilled C developers to progress
> in a timely manner or at all (loosely ordered by descending priority):
pferde 2 hours ago [-]
Yep, another case of XKCD 2347, unfortunately.
EvanAnderson 5 hours ago [-]
Gratuitous use of XML does sometimes smell like a "now you have two problems" kind of affair.
fergie 4 hours ago [-]
It's a shame that XSLT seems to be struggling so much at the moment. If XSLT 3 support were fully implemented in libxml2/libxslt (and therefore in xsltproc and browsers), it would be by far the most sensible option for anything to do with getting text onto the web.
* XSLT is still the only templating option for HTML pages that runs natively in the browser (but for now you are limited to XSLT 1.0, which has a number of drawbacks and limitations)
* XSLT/XML is still best at text markup. In particular interpolation. There is no simple way to represent marked up text in, say, JSON.
* Content syndication (Atom, RSS) is still very dependent on XML.
Surely somebody somewhere has money to pay for a greybeard to fix XSLT for us? It seems far too fundamental to be left to wither on the vine.
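To illustrate the marked-up-text point: in XML, marked-up text is just text with inline elements, but a JSON encoding has to invent a convention for interleaving strings and elements. The sketch below shows one such convention, loosely modelled on JsonML's `[tag, attributes, ...children]` arrays, using Python's `xml.etree.ElementTree`. `to_json_like` is a hypothetical name.

```python
import json
import xml.etree.ElementTree as ET


def to_json_like(el):
    """Encode an element as [tag, attributes, *content] (JsonML-style sketch).

    `content` interleaves plain strings and nested element arrays in document
    order, which mixed content needs and a naive {"tag": ..., "children": [...]}
    object cannot express without extra rules.
    """
    out = [el.tag, dict(el.attrib)]
    if el.text:
        out.append(el.text)  # text before the first child
    for child in el:
        out.append(to_json_like(child))
        if child.tail:
            out.append(child.tail)  # text between this child and the next
    return out


if __name__ == "__main__":
    p = ET.fromstring('<p>Use <b>bold</b> and <a href="x">a link</a>.</p>')
    print(json.dumps(to_json_like(p)))
```

The XML reads naturally. The JSON is only usable by consumers who know this particular convention.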
omcnoe 3 hours ago [-]
Rather than struggling/withering, it's actively being killed. Efforts are underway to completely remove XSLT support from browsers, due to the poor state of libxml2 and a lack of any new maintainer stepping up.
fergie 19 minutes ago [-]
Right, but AFAIK it's _still_ being maintained on a voluntary basis. That's nuts, and it's not clear why, say, Chrome or Firefox wouldn't want to take over XSLT/libxml2 development, particularly if it could win market share from stuff like React and create a developer acquisition pipeline for their respective ecosystems.
omcnoe 5 minutes ago [-]
They don’t want to because they don’t see any bright future for the technology even if it’s better maintained. XML/XSLT isn’t trendy anymore, nobody is building new apps on it. It is never going to win market share from React; it’s too baroque and dated.
I feel like it adds more weight to my feeling that we should have a software building code. When you have software that's critical infrastructure, with a nutso security policy like "no embargoes / 0day me bruh", we should have some regulations in place to require the software be maintained properly (that is to say, in a sane manner) or you can't use it commercially or for safety-critical things. Which would inevitably force commercial entities to pay for the maintenance so it could be done right.... which they should be doing already, the same way any company that builds safety-critical infrastructure has to pay to do it right.
If we want society to be safe, we have to make a law that enforces it. That's how that shit works.
(as an aside: holy shit, you're a prolific HN submitter, and all from different sources. where do you get it all?)
Snild 6 hours ago [-]
> we should have a software building code
This made my brain go
"Oh no, not this again. Open source projects don't owe you..." etc etc.
> or you can't use it commercially or for safety-critical things
Oh. Yeah, okay, absolutely! For safety-critical, I would like to think the responsibility already lies with the integrator/seller, but making it explicitly so can't hurt.
WJW 4 hours ago [-]
> or you can't use it commercially or for safety-critical things
The license for libxml2 (like the license for almost any kind of open source software) already states "THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT." I don't see how you can put the responsibility even more on the integrator/seller than that. It literally states the devs don't even guarantee it works correctly.
elcritch 6 hours ago [-]
Safety critical fields like aviation already have strict requirements. Usually there's very few software dependencies used in those projects.
Expanding that to more fields would be interesting, but difficult and expensive across the board. Particularly any sort of requirements like that generally incur significant regulatory and certification overhead.
However, if it were done like PCI DSS, through an industry forum, it might work better. Especially if certain fields, like anything connecting to the electric grid, were required to use certified software.
pcdavid 5 hours ago [-]
Isn't this what the European Cyber Resilience Act (CRA) is about?
See https://orcwg.org/cra/ and the work of the Open Regulatory Compliance Working Group in general.
rcxdude 4 hours ago [-]
More or less, though the CRA is pretty minimal: it has a few basic requirements and hobby/unpaid open source software is not covered. A company integrating open source software is essentially responsible for covering those requirements themselves.
jeroenhd 4 hours ago [-]
The company being responsible for the open source components they integrate should solve the biggest dependency problems, though. From a security perspective, it doesn't really matter if a company fixes the bugs themselves or if they pay someone to fix it for them.
tinco 5 hours ago [-]
People building "safety critical" systems already pay for a "secure" ecosystem. It's called Microsoft. We don't need regulations to have Microsoft exist. Do you think some random med tech startup is going to pay to have libxml2 maintained? They'll see the regulation and go "oh ok, Windows licenses it is".
It's not the "safety critical" software that needs this fixed, it's all software in general. There's a million software systems that have important privacy sensitive data or safety relevant processes that fly under the "safety critical" radar.
thyristan 3 hours ago [-]
Read your Microsoft licensing agreement. If you don't have one, read the EULA for OEM Windows. The warranty, fitness-for-purpose and damages exclusion is not as extensive as what the grandparent cited, but it basically boils down to "as limited as legally possible, and the most damages you will get is your license fee back". You also won't get a binding requirements document anyway, so you don't even really know what the software Microsoft sells you is fit for. At any point in time, there could be some knowledge base article saying something like "oh, and btw, don't do this because it breaks", so per the warranty agreement you signed they are free from any responsibility simply by documenting the problem.
Really safety-critical stuff like ASIL-D, ISO 26262, IEC 61508 (and tons of other magic numbers) isn't something you can buy from Microsoft. At best, you can sometimes get a reseller to sign something a little more binding, but with tons of restrictions that basically boil down to "use the Microsoft stuff for the readout gauges, but the critical control part goes somewhere else".
tinco 1 hours ago [-]
It's not about warranties, it's about having a stable ecosystem with some guaranteed measure of maintenance. The point is not that there are even more stable and expensive options than Microsoft. The point is that there's very little space for OSS here. Go to any hospital and count the number of Windows devices, and compare that to the number of devices running other operating systems. The second something becomes even a little safety-oriented, there's going to be proprietary software.
So when the regulations OP proposes start to take hold, would we get companies to sponsor random open source dependencies like libxml2? Or would they gather around some stable proprietary ecosystem like Microsoft's, and maybe some big innovative solutions built on top of Microsoft?
darkamaul 6 hours ago [-]
Nick Wellnhofer is stepping away from libxml2 after a decade of unpaid maintenance. He’s forking it under the AGPL, but that will probably scare off most corporate users.
Meanwhile libxml2 is still everywhere. Without someone with real backing, a core piece of infrastructure is about to go unmaintained.
Once again, the open-source funding problem is laid bare: the internet runs on the unpaid evenings of a few people until they burn out (add relevant reference from XKCD, obviously).
ricardo81 44 minutes ago [-]
True. It'd be illuminating to know how far and wide it is used. It has always been my go-to library for parsing XML in a number of languages.
speed_spread 24 minutes ago [-]
"Expected effort required to maintain an implementation" should be an evaluation criterion when selecting technologies. Thousand-page RFCs do not make sustainable standards in the long run. Most committee-designed specs end up in this category. People are impressed by complexity, but actively pursuing simplicity is what we should be doing.
jeroenhd 5 hours ago [-]
With not enough time to develop an alternative and too many application ecosystems relying on this library, I think it's only a matter of time before a large company forks the library to fix its security issues, now that they have no choice but to do the work themselves. At least until IBM and Google figure out a way to move away from this library.
moomin 2 hours ago [-]
Ironically, IBM and Google 100% could just pay for it to be maintained under current licensing. (But won't.)
Maybe my human interaction interfacing software has a glitch but I am having a hard time parsing this content. Do I detect a hint of sarcasm? Please add a '/s' at the end of your future posts to aid my very archaic and vintage brain matter.
yupyupyups 6 hours ago [-]
Jia Tan was the alias of the hacker(s) who infiltrated xz to plant a backdoor. He/They were in the project for 2 years I believe, and so had "significant experience" "maintaining" open source software.
ivolimmen 3 hours ago [-]
Thanks for the info; I read the news but did not remember the name of the person.
tsimionescu 6 hours ago [-]
"Jia Tan" was the name of the person (or group) who became a maintainer of libxz and sneaked in a vulnerability targeting OpenSSH.
mid-kid 4 hours ago [-]
*liblzma
rjh29 4 hours ago [-]
Maybe _my_ software has a glitch but was your comment also sarcastic? Be sure to add an /s next time...
ivolimmen 3 hours ago [-]
No I had a hard time understanding as I was not aware of the person in question, no sarcasm.
bombcar 3 hours ago [-]
All Internet comments are to be assumed sarcastic until proven otherwise. Bombcar’s law.
throw839393949 6 hours ago [-]
[flagged]
throw839393949 6 hours ago [-]
[flagged]
mort96 5 hours ago [-]
I don't think this is about money but about will.
throw839393949 5 hours ago [-]
> Since commercial users of libxml2 are completely unwilling to fund further development