What’s new in 0.3.0 (May 23, 2023)

Features

  • Added an inequality join operation, which matches rows where one column’s values are less than or greater than the other column’s values.

  • Parallelized Theta join

  • Changed pandance.theta_join() arguments (and documentation) to use the term “condition” instead of “relation”.
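
As a rough illustration of what an inequality join computes, the sketch below uses plain pandas only. It is not pandance's implementation and does not call the pandance API, whose exact signature is not given in these notes. The helper name naive_inequality_join and the example columns are hypothetical.

```python
import pandas as pd


def naive_inequality_join(left, right, left_on, right_on, op="<"):
    """Hypothetical helper: keep row pairs where left[left_on] op right[right_on].

    Assumes left_on and right_on are distinct column names.
    """
    # The cross join builds every (left row, right row) pair,
    # so this is only suitable for small inputs.
    pairs = left.merge(right, how="cross")
    if op == "<":
        mask = pairs[left_on] < pairs[right_on]
    elif op == ">":
        mask = pairs[left_on] > pairs[right_on]
    else:
        raise ValueError("op must be '<' or '>'")
    return pairs[mask].reset_index(drop=True)


# Made-up example data
bids = pd.DataFrame({"bid": [5, 10, 20]})
asks = pd.DataFrame({"ask": [8, 15]})
result = naive_inequality_join(bids, asks, "bid", "ask", op="<")
print(result)
```

Each output row pairs a bid with an ask that is strictly greater than it.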

Performance

  • Large performance improvements for pandance.theta_join(): it is 25x faster on the benchmark and no longer builds an intermediate Cartesian join, which can quickly consume all memory for larger inputs.

  • Slight performance improvements for pandance.fuzzy_join()
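
To show why skipping the full Cartesian join matters for memory, here is a minimal sketch in plain pandas that evaluates a join condition one chunk of the left table at a time. This is an assumed, generic technique for illustration and may differ from what pandance does internally. The names chunked_theta_join, condition, and chunk_size are hypothetical.

```python
import pandas as pd


def chunked_theta_join(left, right, condition, chunk_size=1000):
    """Hypothetical helper: theta join without materializing all pairs at once.

    `condition` takes a cross-joined DataFrame and returns a boolean Series.
    """
    parts = []
    for start in range(0, len(left), chunk_size):
        chunk = left.iloc[start:start + chunk_size]
        # Only chunk_size * len(right) candidate pairs exist at any one time.
        pairs = chunk.merge(right, how="cross")
        parts.append(pairs[condition(pairs)])
    if not parts:
        # Empty left input: return an empty frame with the joined columns.
        return left.merge(right, how="cross").iloc[0:0]
    return pd.concat(parts, ignore_index=True)


# Made-up example data
left = pd.DataFrame({"a": [1, 2, 3, 4, 5]})
right = pd.DataFrame({"b": [2, 4]})
joined = chunked_theta_join(left, right, lambda p: p["a"] <= p["b"], chunk_size=2)
print(joined)
```

Peak memory is bounded by the chunk size times the size of the right table, instead of the product of both table sizes.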

Documentation

  • Clarified time complexity and worst case for Fuzzy join