I fixed the code to use SimpleBatchNode (all spheres are there now).
I temporarily disabled all my code in physicsTick() and prePhysicsTick(), but it was lightweight and didn't change the results.
But indeed, that huge amount of simultaneous physics is unnecessary. The MainThread FPS and the PhysThread ticks per second were great until I reached about 125 spheres moving around, which is more than enough to play with.
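For anyone who wants to reproduce the ticks-per-second numbers, here is a minimal sketch of a counter. `TickRateCounter` is a hypothetical helper, not part of jME. The idea is to call `tick(System.nanoTime())` from `physicsTick(PhysicsSpace, float)` of a `PhysicsTickListener`. The timestamp is passed in so the class can be tested without a real clock. It is not thread-safe, so it assumes it is only ever called from one thread (the physics thread, in PARALLEL mode).

```java
// Hypothetical helper (not a jME class): counts how many times tick() is called
// per wall-clock second. Single-threaded use only.
class TickRateCounter {
    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private long windowStartNanos;   // start of the current one-second window
    private int ticksInWindow;       // ticks seen so far in the current window
    private int lastTicksPerSecond;  // result of the last completed window

    TickRateCounter(long nowNanos) {
        this.windowStartNanos = nowNanos;
    }

    /**
     * Records one tick at time nowNanos and returns the tick count of the most
     * recently completed window (0 until the first window completes).
     */
    int tick(long nowNanos) {
        ticksInWindow++;
        if (nowNanos - windowStartNanos >= ONE_SECOND_NANOS) {
            // Close the window: publish its count and start a new one.
            lastTicksPerSecond = ticksInWindow;
            ticksInWindow = 0;
            windowStartNanos = nowNanos;
        }
        return lastTicksPerSecond;
    }
}
```

Inside a `PhysicsTickListener`, the call would be something like `int tps = counter.tick(System.nanoTime());` in `physicsTick`, printing or displaying `tps` only when it changes.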
[details=Some test results: I don't know if this could be improved, I'll keep them here just for the curious ones :)]
Initially, without the physics spheres, the MainThread FPS is about 650 and my GPU utilization goes up to 35%.
With 250 spheres, the physics ticks per second (counted in physicsTick()) drop from a constant 60 to 40, while the MainThread FPS drops to 10.
I was expecting the CPU usage to increase to keep the physics thread at 60 ticks per second, but that didn't happen.
From what you said, it seems the MainThread dropping as low as 10 FPS is holding back the updates on the physics thread; is that so?
I profiled it, and the biggest time consumers are:
- MainThread: `BulletAppState ... java.util.concurrent.locks.LockSupport.park (Object)`: 28,757 ms (47.1%), 13.9 ms (0%)
- PhysThread: `com.jme3.bullet.PhysicsSpace.stepSimulation[native] (long, float, int, float)`: 31,783 ms (52.1%), 31,783 ms (98.8%)
Also, when the MainThread FPS drops to 10, my GPU utilization drops to 4% as well; it seems the GPU load only scales with the MainThread (rendering) FPS.
BTW, I measured GPU utilization on Linux using `nvidia-settings -q GPUUtilization`.
[/details]
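A toy model can help reason about the question of which thread holds back which. It assumes (my understanding of `BulletAppState` in PARALLEL mode, worth checking against its source) that each frame the main thread starts one physics step on a worker thread, does its own work, and then blocks until that step finishes. Under that assumption the frame time is roughly max(render time, physics step time), so a slow physics step would show up as the main thread parked in `LockSupport.park` and as a lower FPS. `LockstepModel` is hypothetical, and `Thread.sleep` stands in for both the rendering work and `stepSimulation`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model (not jME code): main thread and physics thread run in
// lock step, synchronizing once per frame.
class LockstepModel {

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Simulates `frames` frames and returns the measured average frame time in ms. */
    static double averageFrameMillis(int frames, long renderMs, long physicsMs) {
        ExecutorService physicsThread = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            for (int i = 0; i < frames; i++) {
                // Start the physics step in parallel (stand-in for stepSimulation).
                Future<?> step = physicsThread.submit(() -> sleep(physicsMs));
                // Main-thread work for this frame (stand-in for update + render).
                sleep(renderMs);
                // The main thread blocks here whenever physics is the slower side.
                try {
                    step.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return (System.nanoTime() - start) / 1e6 / frames;
        } finally {
            physicsThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.printf("render 2 ms, physics 5 ms:   %.1f ms/frame%n", averageFrameMillis(20, 2, 5));
        System.out.printf("render 2 ms, physics 100 ms: %.1f ms/frame%n", averageFrameMillis(5, 2, 100));
    }
}
```

In the second case the frame time should come out at roughly 100 ms (about 10 FPS) however cheap rendering is, and the GPU would sit mostly idle while the main thread waits.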
Good to know! So maybe I can put things that don't collide (just gravitating), things that collide far away, or anything else that doesn't interfere with the player's actions into separate physics spaces. That will be helpful, thanks!
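A rough illustration of why separate spaces might also help with the CPU usage not going up: bodies in different spaces are never tested against each other, and independent spaces can, in principle, be stepped on different threads. This is a toy model, not jME code. `ToySpace` and `ParallelSpaces` are hypothetical classes, the bodies only fall under gravity, and whether jME actually steps several spaces concurrently depends on the setup (for example, one `BulletAppState` per space in PARALLEL mode), which is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical stand-in for a physics space: bodies that only fall under
// gravity and never interact with bodies in any other space.
class ToySpace {
    static final double GRAVITY = -9.81;

    final double[] height;
    final double[] velocity;

    ToySpace(int bodies, double startHeight) {
        height = new double[bodies];
        velocity = new double[bodies];
        Arrays.fill(height, startHeight);
    }

    // Semi-implicit Euler step: update velocity first, then position.
    void step(double dt) {
        for (int i = 0; i < height.length; i++) {
            velocity[i] += GRAVITY * dt;
            height[i] += velocity[i] * dt;
        }
    }
}

class ParallelSpaces {
    /** Steps every space once, each as its own task on the pool, and waits for all. */
    static void stepAll(List<ToySpace> spaces, double dt, ExecutorService pool) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (ToySpace s : spaces) {
            // Spaces share no state, so their steps need no locking.
            tasks.add(() -> {
                s.step(dt);
                return null;
            });
        }
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any exception thrown inside a step
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a fixed pool of, say, two threads and two spaces, each `stepAll` call can keep two cores busy instead of one.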