Rumble 1.7.0 "Phoenix Atlantica"
July 10, 2020
Submitted by Ghislain Fourny.
We are happy to announce the release of Rumble 1.7.0 beta. Rumble runs JSONiq (XQuery's cousin) on top of Spark.
This release brings many new features, prioritized according to our users' needs, that move us closer to full coverage. Since the previous announcement on this website (1.5.1), they include:
- Parallel execution now "crosses" function boundaries, so you can pass sequences of billions of items into and out of a function. Likewise, global variables may be bound to sequences of billions of items. Many expressions and functions (conditional, typeswitch, switch, ...) can now also run in parallel on billions of items.
- Declarative machine learning: function items are used as estimators and transformers, with Spark ML doing the work under the hood on your large-scale datasets.
- Library modules are supported; for now, the module namespace is used to resolve the location, i.e., the namespace is the actual location on any file system, which should be straightforward to understand for Python programmers. Location hints will follow soon.
- Support for writing queries in Jupyter notebooks (very popular with data scientists), with Rumble running as an HTTP server in the background.
- HTTP is supported as a read-only file system, in addition to HDFS, S3, Azure, the local file system, etc. As most XQuery users already know, this means that you can share your code on the Web and import modules directly from there. You can also share your code within your institution on HDFS, etc.
- Relative URIs (of imported modules, input data, etc) are resolved according to the standard (e.g., relative URIs in a query are resolved against the query location, etc).
- Support for global variables (with dependency cycle detection), try-catch expressions, simple map expressions, XQuery 3.1's "=>" function call notation, and many new built-in functions, including trace for debugging.
- Support for the Avro and SVM input formats (in addition to JSON, Parquet, CSV, ROOT...)
- Support for the brand-new Spark 3.0.0.
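
To illustrate what resolving relative URIs against the query location means in practice, here is a minimal Python sketch of standard (RFC 3986) reference resolution using the standard library. It is not Rumble's implementation, and the query location and file names below are hypothetical, chosen only for the example.

```python
from urllib.parse import urljoin

# Hypothetical location of a query file (an assumption for illustration only).
QUERY_LOCATION = "http://example.com/queries/main.jq"


def resolve(base: str, reference: str) -> str:
    """Resolve a possibly relative URI reference against a base URI (RFC 3986)."""
    return urljoin(base, reference)


# A module sitting next to the query.
sibling = resolve(QUERY_LOCATION, "lib/utils.jq")

# Input data one directory up from the query.
parent = resolve(QUERY_LOCATION, "../data/input.json")

# A path that is absolute on the same host.
rooted = resolve(QUERY_LOCATION, "/shared/common.jq")

# An absolute URI is left unchanged.
absolute = resolve(QUERY_LOCATION, "https://other.example.org/module.jq")

print(sibling)
print(parent)
print(rooted)
print(absolute)
```

The point of the sketch is that a relative reference is interpreted relative to the document that contains it, not relative to the working directory of the process, which is what makes shared queries and modules portable across locations.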
Enjoy! It's free and open source (Apache 2.0). You can get started on an online sandbox within seconds from http://www.rumbledb.org/