The Apache Software Foundation (ASF) has released a new major version of Apache Impala, the SQL query engine for Hadoop, with numerous bug fixes, improvements, and new features. Impala uses the same metadata and SQL syntax as Apache Hive, and version 4.0 gives users enhanced multithreading options. The update also brings some fundamental changes to authentication and authorization, for example dropping Sentry in favor of Ranger.
Multithreading for all queries
For analytical queries on data stored in HDFS (Hadoop Distributed File System), in Kudu, or in the cloud, Impala offers an adjustable degree of parallelism. It is set with the MT_DOP query option and applies to all operations that can benefit from multithreaded execution. Until now, however, the option was limited to queries consisting only of scans and aggregations. As of version 4.0, MT_DOP is available for all queries.
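In impala-shell, MT_DOP is set per session with a `SET` statement before the query runs. The following Python sketch only illustrates that pattern: the helper `mt_dop_script` is hypothetical, the table and column names are invented, and the 0 to 64 range check is an assumption about the option's valid values. The generated text could be saved to a file and run with `impala-shell -f`.

```python
# Hypothetical helper, not part of Impala: builds an Impala SQL script
# that sets the MT_DOP query option and then runs a single query.


def mt_dop_script(query: str, mt_dop: int = 4) -> str:
    """Return a script that sets MT_DOP and then executes `query`.

    Assumption: MT_DOP accepts integers from 0 to 64.
    """
    if not 0 <= mt_dop <= 64:
        raise ValueError("MT_DOP must be between 0 and 64")
    # Normalize whitespace and a trailing semicolon so exactly one is emitted.
    statement = query.strip().rstrip(";")
    return f"SET MT_DOP={mt_dop};\n{statement};\n"


if __name__ == "__main__":
    # Invented example tables; a join like this is the kind of query that
    # the article says could not use MT_DOP before Impala 4.0.
    sql = (
        "SELECT c.region, COUNT(*) FROM orders o "
        "JOIN customers c ON o.cust_id = c.id GROUP BY c.region"
    )
    print(mt_dop_script(sql, mt_dop=8))
```

Keeping the `SET` statement in the same script as the query ensures the option applies to that session only rather than relying on a server-wide default.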
When Cloudera, the original developer of Impala, merged with Hortonworks, it announced that it would phase out its own authorization and auditing project, Sentry, in favor of Apache Ranger, which Hortonworks had contributed. Impala 4.0 now takes the final step and drops Sentry support entirely. Although Ranger was not yet compatible with Impala at the time, its broader feature set and more strategically promising integration with other Hadoop components tipped the balance.
Ranger and Knox for more security
In the future, Ranger will not only be the standard tool for authorization, which matters for GDPR-compliant masking of personal data, but will also pave the way for integrating Apache Knox. Knox acts as a gateway that provides a single, central authentication and access point for the Hadoop services in a cluster, shielding clients from Kerberos. The stateless reverse-proxy framework can be used via REST, among other interfaces.
Impala 4.0 also meets the compliance requirements of FIPS (Federal Information Processing Standards) and supports the Security Assertion Markup Language (SAML), an XML-based framework for exchanging authentication and authorization information.
A complete overview of all the new features and improvements in Apache Impala 4.0 can be found in the release notes as well as in the changelog.