Home Populating Java PriorityQueue Efficiently
Post
Cancel

Populating Java PriorityQueue Efficiently

TL;DR Use PriorityQueue constructor(s) to populate elements in queue instead of addAll, add or offer methods

Couple of weeks back, I was going over PriorityQueue docs in Java and wondered if there is any difference in performances over how Priorityqueue is constructed. After some analysis, I realized that it does make a difference how tree is constructed. In this blog post, I try will to explain the reasons and provide some actual benchmarks with some sample test cases.

Populating Priority Heap:

There are couple of ways to populate a heap.

  • Method 1: Create an instance and later populate the values via add or addAll, offer methods.
1
2
3
4
 // values is a subtype of Collection<Integer> 
 
 PriorityQueue<Integer> queue = new PriorityQueue<Integer>();
 queue.addAll(values); 

We will only examine addAll method as similar arguments can be made against other methods.

  • Method 2: Provide the values during object construction
1
2
3
 // values is a subtype of Collection<Integer> 
 
 PriorityQueue<Integer> queue = new PriorityQueue<Integer>(values);

Background

Priority queues are generally implemented using array based binary heap data structure. Operations on the binary tree are performed in such a way that this binary tree is balanced all the times along with upholding the heap property. No surprise, Java uses binary heap to implement PriorityQueue.

Method 1

If you were to add each element to an existing heap, it takes O(k) (where k = size of the heap at the time of insertion). Lets says if you want to add a total of n elements, then complexity would be nlogn. If we observe carefully, the cost of insertion at any moment depends on the height of the tree.

Total cost of adding n elements, one after another, would be:

\[\log_21 + \log_22 + \log_23 + \cdots + \log_2 n = \sum_{x=1}^{n} \log_2 x = \log _2(n!) = O(n\log n)\]

Uses Stirling’s approximation for time complexity

Time Complexity of Method 1 = O(nlogn)

Method 2

One could use same approach in method 1 and populate each element one by one. However, a better Floyd heap construction algorithm exists which can construct the heap in O(n) instead of O(nlogn). PriorityQueue class uses **Floyd** method to construct heap if all the elements are provided at once. This is the reason why method 2 is more efficient.

Time Complexity of Method 2 = O(n)

Performance Benchmarks

We will micro-benchmark these two methods under the following scenarios.

  1. Worst Case Scenario: When elements are added in reverse order
  2. Best Case Scenario: When elements are already sorted in the collection. A sorted array is a min binary heap
  3. Random Case Scenario: Just pick random numbers

In each of the above scenarios, we will test with different collection size namely 50, 500, 5000.

Results

The numbers indicate the average latency taken to perform the operation

Best Case ScenarioN=50N=500N=5000
Method 1 (via addAll)0.362 μs3.801 μs55.818 μs
Method 2 (via constructor)0.167 μs1.455 μs14.435 μs
Worst Case ScenarioN=50N=500N=5000
Method 1 (via addAll)0.962 μs16.349 μs221.430 μs
Method 2 (via constructor)0.319 μs3.139 μs32.609 μs
Random Case ScenarioN=50N=500N=5000
Method 1 (via addAll)0.587 μs7.582 μs119.197 μs
Method 2 (via constructor)0.331 μs2.885 μs63.401 μs

These micro-benchmarks are collected on my personal mac laptop using openjdk 11. The time unit is in microseconds.

As we can easily see from the benchmarks, adding elements via constructor (Method 2) is at least 2x times faster and in cases, it is even more than that.

The source code used for benchmarking is present here.

Limitations:

It is not always possible to know all the elements that need to be added to the PriorityQueue in advance.

Other Languages

References

This post is licensed under CC BY 4.0 by the author.