Dimensioning using a stereo camera

I want to find the real-world dimensions (L x W x H) of any object, approximated by its nearest bounding cuboid. Any ideas how I can achieve this? I can get the depth from the stereo camera, but the number of pixels the object's length and width cover changes with its distance from the camera. For example, an object measuring 50 x 50 cm appears much smaller at a distance of 500 cm than the same object placed at 100 cm.
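The effect described above follows from the pinhole camera model: an object of real size S at distance Z spans roughly f * S / Z pixels, where f is the focal length in pixels. The sketch below only illustrates that relationship; the focal length of 700 px is a hypothetical value, not a property of any particular camera.

```python
def projected_size_px(real_size_cm: float, distance_cm: float, focal_px: float) -> float:
    """Pinhole model: size in pixels = focal length (px) * real size / distance."""
    return focal_px * real_size_cm / distance_cm


FOCAL_PX = 700.0  # hypothetical focal length in pixels (take yours from calibration)

for z in (100.0, 500.0):
    # The same 50 cm edge shrinks in the image as the distance grows.
    print(f"50 cm edge at {z:.0f} cm -> {projected_size_px(50.0, z, FOCAL_PX):.1f} px")
```

With these numbers the 50 cm edge covers 350 px at 100 cm and 70 px at 500 cm. Because the pixel size is inversely proportional to Z, a known depth is exactly what lets you convert pixels back to centimetres.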

How can I combine the depth information with the pixel measurements to get real-world dimensions?

Sorry, we don't have much experience with this. Maybe you can get some inspiration from this article:
https://www.researchgate.net/publication/274773924_Object_Distance_and_Size_Measurement_Using_Stereo_Vision_System
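A common approach, sketched here under stated assumptions and not taken from the linked article, is to invert the pinhole model. First, recover depth from disparity with Z = f * B / d for a rectified stereo pair with baseline B. Then back-project each pixel (u, v) to camera coordinates X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, and measure distances between the 3D points. The intrinsics (fx, fy, cx, cy) would come from your stereo calibration. The height step assumes a camera looking straight down at a box resting on a flat floor, so H is approximately Z_floor - Z_top. All function names below are hypothetical helpers. If you use OpenCV, `cv2.reprojectImageTo3D` with the Q matrix from `cv2.stereoRectify` performs the same back-projection for a whole disparity map.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels along x (from calibration)
    fy: float  # focal length in pixels along y
    cx: float  # principal point x (pixels)
    cy: float  # principal point y (pixels)


def depth_from_disparity(disparity_px: float, fx: float, baseline_cm: float) -> float:
    """Rectified stereo: Z = f * B / d (Z in the same units as the baseline)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return fx * baseline_cm / disparity_px


def back_project(u: float, v: float, z: float, k: Intrinsics) -> tuple:
    """Inverse pinhole projection: pixel (u, v) at depth z -> camera-frame (X, Y, Z)."""
    x = (u - k.cx) * z / k.fx
    y = (v - k.cy) * z / k.fy
    return (x, y, z)


def estimate_cuboid(top_corners_px, z_top: float, z_floor: float, k: Intrinsics):
    """Estimate (L, W, H) of a box seen from directly above.

    Assumptions (hypothetical setup): the camera looks straight down, the four
    top-face corners are given in order around the face, the whole top face
    lies at depth z_top, and the box rests on a floor at depth z_floor.
    """
    pts = [back_project(u, v, z_top, k) for (u, v) in top_corners_px]
    side_a = math.dist(pts[0], pts[1])  # first edge of the top face
    side_b = math.dist(pts[1], pts[2])  # adjacent edge of the top face
    length, width = max(side_a, side_b), min(side_a, side_b)
    height = z_floor - z_top  # box height from the depth difference
    return length, width, height


if __name__ == "__main__":
    k = Intrinsics(fx=700.0, fy=700.0, cx=320.0, cy=240.0)  # hypothetical calibration
    z_top = depth_from_disparity(14.0, k.fx, baseline_cm=10.0)  # -> 500 cm
    corners = [(285, 205), (355, 205), (355, 275), (285, 275)]  # hypothetical detections
    print(estimate_cuboid(corners, z_top, z_floor=550.0, k=k))
```

In this made-up example the 70 px wide top face at 500 cm comes out as 50 x 50 cm with a height of 50 cm. The same 50 cm edge seen at 100 cm would span 350 px but still back-project to 50 cm, which is the depth-dependent scaling the question asks about. For a tilted camera or an irregular object, a more robust option is to back-project the whole depth map into a point cloud and fit a bounding box to it.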