Robot Alignment - Part 1

Peter Evers
Feb 16, 2024
4 min read

TETMET is developing ASLM (Adaptive Spacial Lattice Manufacturing), a methodology to create lattice structures in an efficient way and scale. This video offers a slow-motion glimpse into the process. At the heart of our technology lies a robotic arm welding and cutting a continuous rod, as well as a fiber optic laser, to build intricate lattice structures.

This blog post takes a deep dive into our computer vision stack, which automatically repositions the robot based on camera input and the evolving structure's characteristics. Traditional methods rely on two cameras and known horizontal disparity ("stereo vision") to perceive depth. However, TETMET leverages trigonometry, AI, and linear algebra to achieve precise robot control using four cameras freely positioned on a robot axis. Our setup is shown in the next image.

Three cameras are mounted on the flange, and one camera is coaxial to the laser.

Inaccuracies in Robotics

Inaccuracy is a common challenge in robotics, and our process is no exception. We continuously weld steel rods (1mm-3mm diameter) using a fiber optic laser. Weld quality hinges on rod alignment, beam quality, and protective gas flow. This blog post features an ABB 1660 IRD robot, having an absolute accuracy of around 1 mm on average (max 2 mm) and repeatability of 0.05mm. In simpler terms, while the robot excels at repeatedly hitting the same spot, that spot might not be perfectly accurate.

Our case is low accuracy with high precision/repeatability (top right)

Furthermore, welding steel can deform the structure leading to misalignment of the rods. An example of low accuracy can be found in the next camera image, which shows a small gap between two rods that should be tangent to achieve a successful weld.

A simple example of robot inaccuracy is shown on four cameras. The top left image is the coaxial camera, the other three cameras are mounted on axis 5.

Four cameras guide our robot: one directly above the laser (top left), and three spaced around the flange at roughly 120-degree intervals. Only the last joint (axis 6) affects the tool position relative to the camera image.

The green arrow in the image indicates the Tool Center Point (TCP). The blue arrow indicates the theoretical position of the existing rod. The blue dotted field indicates the laser beam range in the first image. In this image, we can see that the blue arrow (theoretical position) is not aligned with the actual rod.

To understand and measure the coordinates of a point on the images, we designed special targets for calibrating the robot tool axes. The next image shows two 3D-printed samples. This tool sits precisely at 30 degrees, mirroring the TCP angle, and the circle at its center has a perfect 8mm diameter. The purpose of these targets is to transfer the tool axes to the camera frame. By simply rotating around axis 6, we can measure each tool axis in every configuration. This process resembles the OpenCV checkboard method for calibrating cameras in stereo vision.

Example of a 3D printed version of our calibration tool (left) and the image on the camera (right)

With axis 6 rotating, we track the projection of tool axes onto the camera plane (U, V axes). Our custom calibration target, angled exactly like the robot tool, ensures a clear axis definition. This lets us build a precise model predicting how the camera plane and tool axes relate based on axis 6's position. Since the top camera is aligned with axis 6, the projection onto the camera frame describes a circle, for the other cameras the projections follow an ellipse.

This is great! We now have a simple mathematical model of the relationship between our camera plane and the TCP axes, since an ellipse is described by:

Solving Linear Equations

SciPy's 'optimize' function comes in handy to fit ellipses to the points for each camera and TCP axes. This gives us 12 ellipses – the building blocks for depth estimation. On a given angle of axis 6, each tool axis vector is defined as a set of pixel coordinates. We can express the coordinates of a point in the tool referential as the sum of the components on each axis which would result in a position on the camera frame. Since we have three unknowns (X, Y, and Z coordinates of the point) we need three equations to solve the problem. Each camera frame axis (U and V) can provide an independent equation.

Linear equation to solve the XYZ coordinates from the projected planes (UV)

Hence, with a single-camera image, we cannot solve the problem. With an image from two cameras we already have more equations than unknowns (4 equations for 3 unknowns). As we increase the number of cameras, we increase the number of equations while keeping the number of unknowns the same. To further improve the accuracy of the method, we choose to take the median solution of every solvable system of equations. This solution can be directly used as an input for a relative movement in the tool referential by the robot.

We move the robot based on two markers manually placed at the rod tip endpoint, and the result is shown in the next image.

The blue arrow is now aligned with the structure

There is however one problem with this solution. We need input from the user from multiple cameras to identify the same reference point, also referred to as a “keypoint”. We choose the tips of the rods that are both defined in the structure and can be observed on the camera frames. Finding the keypoint location on each image is a tedious task that luckily can be automated using AI which we will discuss in our next blogpost!