While refactoring the code for a Java Barnes-Hut simulation, I came across loops that had been explicitly unrolled. At a high level, the bodies are stored in an octree, so each node can have up to eight neighbors, one for each octant of the three-dimensional space it covers. Once the tree has been constructed, the nodes are compacted: because a node's neighbors are stored in an array, any non-existent neighbors are removed, leaving the existing ones at the front of the array, so a pass over the neighbors can stop at the first non-existent one.

To compute the force acting on each body, we traverse the tree recursively. At each node of the octree we have to iterate over all of its neighbors, which means visiting up to eight nodes. The simplest way is a loop over the neighbors that breaks on the first non-existent one, something like:

    GNode child = null;
    for (int i = 0; i < 8; i++) {
        child = octree.getNeighbor(nn, i, MethodFlag.NONE);
        if (child == null) break;
        RecurseForce(leaf, child, dsq, epssq);
    }
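To make the contrast concrete, here is a minimal, self-contained sketch of the two shapes side by side. It is not the simulation's code: `Node`, `TraversalSketch`, `visitLoop` and `visitUnrolled` are hypothetical names, the neighbors live in a plain array instead of being fetched through `octree.getNeighbor`, and the unrolled form shown is only one plausible way such unrolling could have been written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an octree node; not the simulation's GNode.
final class Node {
    final String name;
    // Compacted neighbor array: existing neighbors first, then nulls.
    final Node[] neighbors = new Node[8];

    Node(String name) {
        this.name = name;
    }
}

final class TraversalSketch {
    // Loop form: stops at the first empty slot, valid only after compaction.
    static void visitLoop(Node parent, List<String> visited) {
        for (int i = 0; i < 8; i++) {
            Node child = parent.neighbors[i];
            if (child == null) break;
            visited.add(child.name);
        }
    }

    // Hand-unrolled form (assumed shape): one explicit check per slot,
    // with the same early exit on the first empty slot.
    static void visitUnrolled(Node parent, List<String> visited) {
        Node[] n = parent.neighbors;
        if (n[0] == null) return;
        visited.add(n[0].name);
        if (n[1] == null) return;
        visited.add(n[1].name);
        if (n[2] == null) return;
        visited.add(n[2].name);
        if (n[3] == null) return;
        visited.add(n[3].name);
        if (n[4] == null) return;
        visited.add(n[4].name);
        if (n[5] == null) return;
        visited.add(n[5].name);
        if (n[6] == null) return;
        visited.add(n[6].name);
        if (n[7] == null) return;
        visited.add(n[7].name);
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        root.neighbors[0] = new Node("a");
        root.neighbors[1] = new Node("b");
        root.neighbors[2] = new Node("c");

        List<String> loop = new ArrayList<>();
        List<String> unrolled = new ArrayList<>();
        visitLoop(root, loop);
        visitUnrolled(root, unrolled);

        System.out.println("loop:     " + loop);
        System.out.println("unrolled: " + unrolled);
        System.out.println("same order: " + loop.equals(unrolled));
    }
}
```

Both methods visit the same neighbors in the same order; the only difference is whether the repetition is written out by hand or left to the loop.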

The other change was to encapsulate all the coordinate-handling code in a point class, Point3. Combining the two changes gives four variants:

- Original (without the Point3 class, with explicit loop unrolling)
- Point3 (with the Point3 class, with explicit loop unrolling)
- For (without the Point3 class, without explicit loop unrolling)
- Naive (with the Point3 class, without explicit loop unrolling)
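For readers who want a picture of what the Point3 change amounts to, the sketch below computes a squared distance both ways. The field names, `distSq`, `distSqRaw` and `Point3Sketch` are assumptions made for illustration; the actual class in the simulation is not shown here and may look quite different.

```java
// Hypothetical Point3; the real class's fields and methods are not shown
// in this post, so everything here is an assumption.
final class Point3 {
    final double x, y, z;

    Point3(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Squared Euclidean distance to another point.
    double distSq(Point3 o) {
        double dx = x - o.x, dy = y - o.y, dz = z - o.z;
        return dx * dx + dy * dy + dz * dz;
    }
}

final class Point3Sketch {
    // Same computation with the coordinates passed as separate scalars,
    // roughly what the variants without Point3 would have to do.
    static double distSqRaw(double ax, double ay, double az,
                            double bx, double by, double bz) {
        double dx = ax - bx, dy = ay - by, dz = az - bz;
        return dx * dx + dy * dy + dz * dz;
    }

    public static void main(String[] args) {
        Point3 a = new Point3(0.0, 0.0, 0.0);
        Point3 b = new Point3(1.0, 2.0, 2.0);
        System.out.println(a.distSq(b));
        System.out.println(distSqRaw(a.x, a.y, a.z, b.x, b.y, b.z));
    }
}
```

The arithmetic is identical in both forms; the class version only moves it behind a method call and an object reference.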

As a developer, you would naturally write the Naive version. So, comparing the performance of the four variants, here is what we get.