Add support for parsing YAML #7340
As mentioned in #4910 (comment), we might want to consider encoding the YAML version into the function somehow. Instead of |
If this should be implemented, then a decision about the versioning scheme is needed and how the versioning should be done, i.e. whether the builtins should have different names or whether a function with additional version argument should be used instead. Having explicitly different function names would be advantageous with respect to reproducibility. The versioning should apply to different code versions, not only to different YAML standards, so that bug compatibility can be retained. The YAML standard is ugly and we need to parse real world data, i.e. the "YAML" might be invalid, so that some tradeoffs are required... I'm wondering how updates of |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/how-to-convert-yaml-nix-object/23755/2 |
I discussed this once more with @NaN-git and it doesn't seem like a good idea to encode the YAML spec version in the function name. Whatever naming scheme we come up with right now, there is a high chance that it won't be meaningful. Therefore I think the builtin should just be introduced as a plain |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-01-02-nix-team-meeting-minutes-20/24403/1 |
Discussed in Nix team meeting on 2023-01-09:
Complete discussion
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-01-09-nix-team-meeting-minutes-22/24577/1 |
@edolstra any reason not to merge this? |
Decisions
Issues to clarify
Versioning
If the underlying library returns different results over time, this impacts reproducibility |
Regarding the tests:
builtins.fromYAML ''
---
values:
- &value someValue
---
# some YAML template
template: *value
''
evaluates to (I'm not very happy with the handling of multiple documents)
[
{
values = [ "someValue" ];
}
{
template = "someValue";
}
]
Build questions
rapidyaml uses cmake as its build system. I don't think that this is a good choice for Nix because bootstrapping of Nix has to be easy. The only dependency of rapidyaml is c4core from the same author. Even with rapidyaml as single header file the build time of |
I would package it and its dependency in Nixpkgs. I am happy to help with that part if you want. I personally am not too concerned about the bootstrapping because we could always make YAML an optional feature (at Nix compile time). |
Packaging rapidyaml is rather easy:
{ cmake
, fetchFromGitHub
, git
, stdenv
, enableStatic ? true
}:
stdenv.mkDerivation rec {
  pname = "rapidyaml";
  version = "0.5.0";
  src = fetchFromGitHub {
    owner = "biojppm";
    repo = pname;
    rev = "v${version}";
    fetchSubmodules = true;
    hash = "sha256-1/P6Szgng94UU8cPFAtOKMS+EmiwfW/IJl2UTolDU5s=";
  };
  nativeBuildInputs = [ cmake git ];
  cmakeFlags = [
    "-DRYML_WITH_TAB_TOKENS=ON"
    "-DBUILD_SHARED_LIBS=${if enableStatic then "OFF" else "ON"}"
  ];
}
I prefer to statically link this because the static library is rather small (~500 KB). In my opinion not only bootstrapping of nix has to be easy, but compiling it for other distributions should be easy, too. I don't know how to add rapidyaml as optional dependency to nix, so that builds without nix aren't messed up. |
I removed rapidyaml as a single header file and made it an optional library instead. If the library cannot be found then Regarding the testing framework: I need more information about what's actually required/wanted. I commented on some points in #7340 (comment). Of course it would be possible to preprocess the tests differently so that less logic is needed in the testing code and it would be more obvious which tests are actually executed. |
@edolstra what do you think about an optional built-in? Doesn't sound right to me. |
Supporting a build without yaml library might be useful for some build-from-source bootstrapping process, but in this case it must be an explicit choice to remove the yaml feature and create an incomplete Nix. It must not happen by accident. |
Whether that's even a worthwhile effort, I don't know. Personally I think bootstrapping Nix from source is not an important use case, but packaging by other distros is. Those distros must not package a Nix without yaml though! I'd rather require rapidyaml unconditionally. |
Ok, I see two possible solutions:
|
- failed assertion throws exception
- parse values correctly
- handle empty YAML
This reverts commit 82e4242.
also check consistency with fromJSON
Co-authored-by: Eelco Dolstra
update rapidyaml version cleanup/fix parsing of yaml make fromYAML experimental
cleanup test logic don't ignore whole classes of tests
- add additional argument to fromYAML for optional parameters of the parser
- adhere to the YAML 1.2 core schema
- much stronger error checks and improved error messages
- proper conversion of null, floats, integers and booleans
- additional testcases and more checks for expected failures
- restrict patterns of floats and ints to patterns defined by YAML 1.2 core schema
- parse integers with tag !!float
- map: enforce key uniqueness
fix: parse "!!float -0" as -0.0
- detect integer over- and underflow
- disallow denormal numbers
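The scalar rules named in the commits above (YAML 1.2 core schema patterns for ints and floats, plus integer overflow detection) can be sketched roughly as follows. This is not the code from this PR: the names `ScalarKind`, `classifyScalar` and `parseInt` are hypothetical, the regular expressions are transcribed from the core schema tag resolution table of the YAML 1.2 spec, and the sketch assumes untagged plain scalars and a 64-bit signed integer type.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>
#include <system_error>

// Hypothetical names for illustration; not taken from the PR.
enum class ScalarKind { Null, Bool, Int, Float, String };

// Classify an untagged plain scalar with the YAML 1.2 core schema patterns.
ScalarKind classifyScalar(const std::string & s)
{
    static const std::regex nullRe("null|Null|NULL|~");
    static const std::regex boolRe("true|True|TRUE|false|False|FALSE");
    static const std::regex intRe("[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+");
    static const std::regex floatRe(
        "[-+]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)([eE][-+]?[0-9]+)?"
        "|[-+]?\\.(inf|Inf|INF)|\\.(nan|NaN|NAN)");

    // An empty plain scalar resolves to null in the core schema.
    if (s.empty() || std::regex_match(s, nullRe)) return ScalarKind::Null;
    if (std::regex_match(s, boolRe)) return ScalarKind::Bool;
    // Ints are checked before floats because "[0-9]+" also fits the float pattern.
    if (std::regex_match(s, intRe)) return ScalarKind::Int;
    if (std::regex_match(s, floatRe)) return ScalarKind::Float;
    return ScalarKind::String;
}

// Convert a core schema integer to int64_t; returns nullopt if the text is
// not an integer or the value does not fit (over- or underflow).
std::optional<std::int64_t> parseInt(const std::string & s)
{
    if (classifyScalar(s) != ScalarKind::Int) return std::nullopt;

    int base = 10;
    const char * first = s.data();
    const char * last = s.data() + s.size();
    if (s.rfind("0o", 0) == 0) { base = 8; first += 2; }
    else if (s.rfind("0x", 0) == 0) { base = 16; first += 2; }
    else if (s[0] == '+') first += 1; // std::from_chars rejects a leading '+'

    std::int64_t value = 0;
    auto [ptr, ec] = std::from_chars(first, last, value, base);
    // ec is result_out_of_range when the value does not fit into int64_t.
    if (ec != std::errc() || ptr != last) return std::nullopt;
    return value;
}
```

In this sketch `yes` stays a string (it is a boolean only under YAML 1.1), and `9223372036854775808` is rejected instead of silently wrapping around.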
610cd2f to ce61749
Why builtins.fromYAML?
YAML is widely used among other package managers and deployment tools. If we want better compatibility with these ecosystems, the ability to parse YAML efficiently in Nix is useful, as it is for TOML and JSON, which we already support. Further discussion can be found in #4910, among other places.
Description
YAML 1.2 is a complex standard and Nix has a limited set of data types, so only a subset of YAML can be represented in Nix. For example, attribute sets require string keys, i.e. attribute sets can represent only YAML maps with string keys, and Nix has no data types for binary data or dates. If builtins.fromYAML encounters YAML with incompatible data types, it fails, similar to builtins.fromTOML.
First, the implementation uses rapidyaml to parse the YAML string; afterwards the Nix objects are created while traversing the YAML tree, similar to builtins.fromTOML. Tags are mostly ignored and affect only scalars. Custom tags are always ignored and I don't see how custom tags could be handled by Nix.
Why rapidyaml?
As part of Nix, robustness and safety of the implementation of builtins.fromYAML are important.
Some reasons why rapidyaml was chosen:
limitations of rapidyaml
rapidyaml has a few limitations. Most of these limitations are not really relevant for the nix use case because of the limitations of nix.
Some comments with respect to the limitations:
RYML_WITH_TAB_TOKENS is defined for builtins.fromYAML.
{ &anchor: key: val, anchor: *anchor: } is parsed correctly, but this test case fails. The test case is ignored in the builtins.fromYAML tests because it is an edge case and it's not valid YAML 1.3.
Otherwise the only issue that I found is that a string/block containing only tab, space and newline characters might be parsed incorrectly as an empty string (test case). I don't think that this limits real-world usage. UPDATE: This is fixed in the newest rapidyaml release.
:-token and missing value, e.g. this example cannot be parsed.
Also rapidyaml parses some invalid YAML successfully, but that is actually helpful. Otherwise the valid JSON {"a":"b"}, which is emitted by builtins.toJSON, could not be parsed by builtins.fromYAML because it is not valid YAML due to the missing separation white space after the :-token. UPDATE: Actually this is allowed in flow mappings.
Tested platforms