Typical software engineering projects are organized around certain technologies. For example, somebody can build a website with PHP on a Linux server, or write a new computer game in C++. Once the programming environment is fixed, the details can be worked out, and the number of ways to create a game in C++ is limited.
The situation in a robotics project is a bit more difficult, because there is no established framework. Admittedly, some libraries for building robots and even some programming languages are mentioned in the literature: the ROS project is sometimes called a quasi-standard, and embedded control is often handled in C. But these technologies are not used for creating the AI itself; they only help once it is already known how to realize the robot.
A better way to start a robotics project is to follow the steps of human-computer interaction. A new robotics project usually begins as a manual control system: the human operator gets a joystick and moves the robot arm remotely, much like a crane operator does. The second step is about reducing the workload of the human operator, with the goal of increasing the level of automation. For a robot arm that grasps objects, this is done by automating the grasping step itself. That means the human operator still controls the arm, but the robot decides when the right moment has come to close the gripper.
In the literature this concept is called shared autonomy. It means that some tasks are done by the human and others by the Artificial Intelligence. The human operator controls the movement of the arm, while the vision system detects whether an object is in the hand and activates the grasping action. The advantage is that only subparts of the system get automated: only the software that executes the grasping action works autonomously, while the position of the gripper is not controlled by the software. The overall pipeline can later be improved into a fully autonomous system; the next step would be that the AI controls both the grasping and the position of the robot hand.
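A minimal sketch can make the division of labor concrete. In the loop below, the human part (joystick input moving the gripper) and the AI part (a vision check that triggers the grasp) are separated exactly as described above. All names, the distance-based stand-in for the vision system, and the threshold value are illustrative assumptions, not a real robot API.

```python
# Shared autonomy sketch: the human steers, the software only decides
# when to close the gripper. All names and values are assumptions.

GRASP_DISTANCE = 0.02  # meters; assumed threshold for "object is in the hand"

def object_in_gripper(object_pos, gripper_pos):
    """Stand-in for the vision system: is the object close enough to grasp?"""
    dist = sum((a - b) ** 2 for a, b in zip(object_pos, gripper_pos)) ** 0.5
    return dist < GRASP_DISTANCE

def control_step(joystick_delta, gripper_pos, object_pos):
    """One control cycle combining human input and autonomous grasping."""
    # Human part: the joystick directly moves the gripper position.
    new_pos = tuple(p + d for p, d in zip(gripper_pos, joystick_delta))
    # AI part: close the gripper automatically at the right moment.
    close_gripper = object_in_gripper(object_pos, new_pos)
    return new_pos, close_gripper

pos, grasp = control_step((0.01, 0.0, 0.0), (0.0, 0.0, 0.0), (0.011, 0.0, 0.0))
print(pos, grasp)  # -> (0.01, 0.0, 0.0) True
```

Replacing `control_step` so that the software also computes `joystick_delta` itself would be exactly the later step toward full autonomy mentioned above.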
Somebody may argue that the difference between a teleoperated robot arm and a robot arm that can grasp by itself is small. Indeed, in both cases the human operator is in the loop; he has to move the joystick to do the task. The advantage is that the human will notice the reduced workload: if he no longer needs to press the "grasp" button, it is a clear improvement.
Combining GOAP with a vision model
GOAP (goal-oriented action planning) is a well-known technique from game AI for building realistic AI characters. The idea is that the agent is in a world state and has a behavior library in the background. A solver tests out different behaviors to bring the agent to a goal. GOAP is comparable to an automatic text adventure which takes an input world state and generates the next behaviors.
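The solver described above can be sketched as a breadth-first search over world states: each action has preconditions and effects, and the planner tries out action sequences until the goal holds. The domain (a pick-and-place task), the predicate names, and the actions are illustrative assumptions.

```python
# Minimal GOAP solver: breadth-first search over world states.
# The action library and predicates are assumed for illustration.
from collections import deque

# Each action: name -> (preconditions, effects), both dicts of predicates.
ACTIONS = {
    "goto_object": ({"at_object": False}, {"at_object": True}),
    "grasp":       ({"at_object": True, "holding": False}, {"holding": True}),
    "goto_bin":    ({"holding": True, "at_bin": False}, {"at_bin": True}),
    "drop":        ({"holding": True, "at_bin": True}, {"holding": False}),
}

def applicable(state, preconds):
    """True if every precondition predicate holds in the state."""
    return all(state.get(k) == v for k, v in preconds.items())

def apply_effects(state, effects):
    new_state = dict(state)
    new_state.update(effects)
    return new_state

def plan(start, goal):
    """Search for an action sequence that reaches the (partial) goal state."""
    queue = deque([(start, [])])
    seen = {tuple(sorted(start.items()))}
    while queue:
        state, actions = queue.popleft()
        if applicable(state, goal):
            return actions
        for name, (pre, eff) in ACTIONS.items():
            if applicable(state, pre):
                nxt = apply_effects(state, eff)
                key = tuple(sorted(nxt.items()))
                if key not in seen:
                    seen.add(key)
                    queue.append((nxt, actions + [name]))
    return None  # no plan found

start = {"at_object": False, "at_bin": False, "holding": False}
print(plan(start, {"holding": False, "at_bin": True}))
# -> ['goto_object', 'grasp', 'goto_bin', 'drop']
```

The goal is a partial state, so the solver behaves like the automatic text adventure mentioned above: given an input world state, it generates the next behaviors by itself.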
To use the concept for real robotics, a vision model is needed which provides the input world state. In the simplest case, the vision model is a vision cone in front of the agent. This is sometimes described as spatial grounding in the literature, because it connects pixel coordinates like "object=(100,100)" to language, e.g. "object isat front".
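A toy version of such a grounding step can be written in a few lines: the image is divided into regions, and a detected pixel position is mapped to a predicate string. The image width and the three-zone split are assumptions chosen so that the example coordinate from the text falls into the "front" region.

```python
# Spatial grounding sketch: map pixel coordinates from a vision model
# to symbolic predicates. Image width and zones are assumed values.
IMAGE_WIDTH = 200  # assumed camera resolution in pixels

def ground(name, x, y):
    """Convert a detected object's pixel position into a language predicate."""
    if x < IMAGE_WIDTH / 3:
        region = "left"
    elif x < 2 * IMAGE_WIDTH / 3:
        region = "front"
    else:
        region = "right"
    return f"{name} isat {region}"

print(ground("object", 100, 100))  # -> "object isat front"
```

The resulting predicates can be fed directly into the world state that the GOAP solver takes as input, which is exactly the connection between the vision model and the planner described above.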