Multi-Agent System: A Two-Level BDI Model Integrating Theory of Mind

Multi-agent system modeling of human social behavior or of organizations is an important and rapidly evolving area of research. A key factor in human social interactions is our beliefs about others. The ability to attribute mental states to oneself and others and to interpret the behavior of others in terms of mental states is called theory of mind. This article presents a formal BDI agent model of theory of mind. The model uses BDI concepts and the operating loop architecture of a BDI agent, with agent control based on a limited commitment strategy, to describe the reasoning process of an agent who reasons about the reasoning process of another agent, which is also based on BDI concepts. A case study illustrates how the model can be used to prevent cheating at school.


INTRODUCTION
When agents are used to simulate the functioning of a complex organization or system, it is important that they are able to infer and understand their own mental states and those of others in order to act more effectively. In other words, agents must be able to reason about the actual and potential behavior of the agents around them (Bosse T. a., 2007). This ability to attribute mental states to oneself and others and to interpret the behavior of others in terms of mental states is called theory of mind (Harbers, 2009). According to Goldman (Goldman, 2012), there are four competing views on how theory of mind (ToM) develops: the first is the theory theory; the second is simulation theory, sometimes called "empathy theory"; the third body of literature states that the brain has two separate mechanisms, the Theory of Mind Mechanism (ToMM) and the Selection Processor (SP), which work together to provide ToM; and the last is the rationality-teleology theory. In the literature, theory of mind is not properly taken into account in the decision-making process of ADMs. Tina Balke and Nigel Gilbert (Balke, 2014) presented a review of decision-making architectures that neither take into account nor highlight a sound modeling of theory of mind in their decision making.
Indeed, in 2005, Pynadath and Marsella (Pynadath, 2005) proposed an agent-based simulation tool, PsychSim, to model the influence that some agents have on others. In this tool, every agent has models of the beliefs, goals, policies, etc. of the other agents and is able to reason about them. The agent architecture of PsychSim is fairly simple, since each agent knows exactly the mental states of the other agents; moreover, it has no formal basis. In 2007, Tibor Bosse and Memon (Bosse T. a., 2007) proposed a two-level BDI model, an extension of the BDI architecture that uses the concepts of the BDI model to describe the reasoning process of an agent who reasons about the reasoning process of another agent, which is also based on BDI concepts. The model is applied to social manipulation, notably the case of a manager who reasons about the behavior of an employee trying to avoid tasks. In this architecture, ToM is modeled according to the last of the competing views above, the rationality-teleology theory. Although BDI architectures seem both descriptive enough to represent the cognitive processes influencing behavior and intuitive enough to be easily understood by modelers and by non-computer scientists in general, the two-level BDI model has neither a good agent control strategy nor a good modeling of the BDI architecture that it extends. This is reflected in a number of simplifying hypotheses. In particular, it does not explicitly deal with situations in which the reasoning process does not follow the expected course, or in which the agent does not check whether its intentions are still possible or achievable when it performs the action. In 2010, Hiatt and Trafton (Hiatt, 2010) suggested how ToM can be added to the ACT-R architecture, with the aim of experimenting with alternative theories of how ToM develops in humans rather than of designing agent models.
In light of these observations, we ask: how can theory of mind be properly modeled in multi-agent systems?
This article proposes a decision-making model integrating ToM for a cognitive multi-agent system. To this end, we improve the two-level BDI model for a better representation of ToM by changing the way the BDI model is represented in the two-level BDI architecture, adapting it to the operating loop architecture of a BDI agent proposed by Florea (Florea, 1998). We also change the agent control strategy, formerly based on blind commitment, to an open commitment strategy. In addition to the rationality-teleology theory, we take into account the third competing view to model ToM, and we illustrate our model with cheating in an academic environment. The operating loop of a BDI agent is described by Kinny and Rao in (Kinny, 1996). They put forward a number of hypotheses: the agent is immersed in an environment from which it receives events conveying information about the state of the environment, and in which it acts through actions that modify this environment. Another hypothesis is that the agent is characterized by representations (objects, data structures, ...) of beliefs, desires and intentions. Based on these hypotheses, Florea (Florea, 1998) modeled the operating loop architecture of a BDI agent shown in Figure 1.
Figure 1: Operating loop architecture of a BDI agent (Florea, 1998)
The agent has an explicit representation of its beliefs, desires and intentions. We denote by B, D and I, respectively, the agent's current beliefs, desires and intentions, each a subset of the set of all its possible beliefs, desires and intentions. The agent must produce plans, each consisting of a sequence of actions that it will take to solve the problem. The most common representation of actions describes their effects on the environment.
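As an illustration only, the sets B, D and I and the representation of actions by their effects can be sketched in Python. The STRIPS-style encoding (preconditions plus add and delete lists), the names `BDIState`, `Action` and `run_plan`, and the door example are our assumptions, not Florea's notation; for simplicity the agent's beliefs also serve as its view of the environment.

```python
from dataclasses import dataclass, field


@dataclass
class BDIState:
    """Current beliefs B, desires D and intentions I of an agent, as sets of atoms."""
    beliefs: set = field(default_factory=set)
    desires: set = field(default_factory=set)
    intentions: set = field(default_factory=set)


@dataclass(frozen=True)
class Action:
    """An action described only by its effects (assumed STRIPS-style encoding)."""
    name: str
    preconditions: frozenset = frozenset()
    add: frozenset = frozenset()
    delete: frozenset = frozenset()

    def applicable(self, state: frozenset) -> bool:
        # The action can be executed only if all its preconditions hold.
        return self.preconditions <= state

    def apply(self, state: frozenset) -> frozenset:
        # Effect on the environment: remove deleted facts, then add new ones.
        return (state - self.delete) | self.add


def run_plan(state: frozenset, plan: list) -> frozenset:
    """Execute a plan, i.e. a sequence of actions, and return the resulting state."""
    for action in plan:
        if not action.applicable(state):
            raise ValueError(f"action {action.name!r} is not applicable")
        state = action.apply(state)
    return state


# Hypothetical two-step plan: open a door, then walk in.
open_door = Action("open_door", frozenset({"at_door"}), add=frozenset({"door_open"}))
walk_in = Action("walk_in", frozenset({"door_open"}),
                 add=frozenset({"inside"}), delete=frozenset({"at_door"}))
agent = BDIState(beliefs={"at_door"}, desires={"inside"})
final = run_plan(frozenset(agent.beliefs), [open_door, walk_in])
print(sorted(final))  # ['door_open', 'inside']
```

Representing an action only by its effects turns the question "can this action still be executed?" into a set-inclusion test on the current state.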

d) Open commitment strategy for controlling BDI agents
The open commitment strategy is one of the three main control strategies for BDI agents. An agent with an open commitment strategy maintains its intentions as long as those intentions are also its desires. This also implies that once the agent has concluded that its intentions are no longer achievable, it no longer counts them among its desires. Its algorithm is obtained by modifying the algorithm designed by Michael Wooldridge. For this, first(PE) denotes the first action of a plan PE and rest(PE) the remaining actions of PE once that first action has been executed. In this algorithm, after receiving new perceptions of the environment and revising its beliefs, the agent also reconsiders its desires and, through the filter function, a possible change in all of its intentions. In that case the agent engages in a planning process again, because the result of the filter function is a partial plan.

e) General structure of the BDI model
The BDI model presented in (Bosse T. a., 2007) explains behavior in a refined form, building on Aristotle's analysis of how humans (and animals) come to act; cf. (Ross, 1962) (Nussbaum, 1976). Aristotle explains how the occurrence of certain internal (mental) state properties within living beings causes an action in the outside world. He sometimes calls these internal state properties "things in the soul", for example sensation, reason and desire: "Now there are three things in the soul that control action and truth: sensation, reason, desire" (Ross, 1962). Here, sensation denotes the agent's sensing of its environment, which leads (in modern terms) to internal representations called beliefs. Reason denotes the (rational) choice of an action that is reasonable for satisfying the given desire. On this basis, Aristotle introduces the following model to explain action, called the practical syllogism:
If A has a desire D,
and A is convinced that action AC is one (or: the best) means of reaching D,
then A will do AC.
To model the BDI architecture, the authors built on this model with one adjustment: instead of going from desire to action in a single step, an intention is first generated as an intermediate step, and the action is then generated from the intention.
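The two-step refinement just described (a desire and a belief produce an intention; the intention and a further belief produce the action) can be simulated with a small discrete-time reading of leads-to rules, in the style of the LEADSTO rules used in this model. This is a simplified sketch rather than the LEADSTO software: it assumes a delay of exactly one time step, ignores LEADSTO's timing parameters, and lets state properties persist once derived; `leads_to_trace` and `RULES` are hypothetical names.

```python
def leads_to_trace(initial, rules, max_steps=10):
    """Discrete-time reading of leads-to rules (simplified sketch).

    A consequent becomes true one time step after all its antecedents hold.
    As a simplification, state properties persist once they are true.
    Returns the list of successive states until nothing new is derived.
    """
    trace = [frozenset(initial)]
    for _ in range(max_steps):
        current = trace[-1]
        fired = {consequent for antecedents, consequent in rules
                 if antecedents <= current}
        nxt = current | fired
        if nxt == current:
            break  # fixed point: no rule adds anything new
        trace.append(nxt)
    return trace


# The two generic BDI rules: desire + belief -> intention, intention + belief -> action.
RULES = [
    (frozenset({"desire(D)", "belief(B1)"}), "intention(AC)"),
    (frozenset({"intention(AC)", "belief(B2)"}), "performs(AC)"),
]

trace = leads_to_trace({"desire(D)", "belief(B1)", "belief(B2)"}, RULES)
for t, state in enumerate(trace):
    print(t, sorted(state))
```

The intention appears one step before the action, which is exactly the intermediate step that distinguishes this model from a one-step desire-to-action rule; without belief(B2) the chain stops at the intention.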
Figure 2 presents the BDI model as a generic structure in causal-graph style, a style also often used to visualize LEADSTO specifications. In Figure 2, the frame represents the boundaries of the agent, the circles denote state properties, and the arrows represent dynamic properties expressing that one state property leads to (or causes) another. In this model, an action is performed when the agent intends to perform it and believes that certain circumstances hold in the world, so that the action is possible. Beliefs are created on the basis of observations. The intention to perform a specific type of action is created if there is a desire D and a belief that certain circumstances hold in the world that make it possible for the performance of this action to fulfill this desire (the rationality criterion discussed above). The relationships of the general BDI model, drawn as arrows in Figure 2, can be specified in the formal LEADSTO format as follows:

desire(D) ∧ belief(B1) → intention(AC)
intention(AC) ∧ belief(B2) → performs(AC)
With appropriate desire D, action AC and beliefs B1 and B2. Note that both beliefs used here depend on observations, as shown in Figure 2. The symbol ∧ denotes the conjunction (and) of atomic state properties, drawn in the graphical format as an arc connecting two (or more) arrows. LEADSTO dynamic properties are often also presented in a semi-formal format.

RESULTS AND ILLUSTRATION
a) Adjustment of the general structure of the BDI model
The general structure of the BDI model presented in (Bosse T. a., 2007) is very simple: it does not build a plan for executing the action, and it relies on a blind commitment strategy. We therefore adjust this model by adapting it to Florea's operating loop architecture of a BDI agent, based on the open commitment strategy, so that the agent behaves more rationally and is able to find other ways to achieve its goals, or to realize that these goals have been achieved or cannot be achieved. To adapt the general structure of the BDI model to Florea's operating loop architecture, we simply insert a new step, the generation of a plan, before the agent performs the action. For the control of BDI agents, we opted for an open commitment strategy, which allows the agent to maintain its chosen intentions as long as they are also its desires. Thus, after executing each action of the plan, the agent makes new perceptions that allow it to update its beliefs, desires and intentions and to adjust its plan; it checks whether its intentions are still achievable or already achieved. Using the LEADSTO modeling language to model the general structure of the BDI model adapted to Florea's operating loop architecture, we have the following: for any perception P made by agent A, A forms a corresponding belief W about the world; in other words:

Observe(A, P) → belief(A, W)
Likewise, a mental state (belief, desire or intention) can generate a desire. For any desire D and any world state property W such that has_reason_for(A, D, W, AC) holds, the agent can generate an intention AC to realize its desire (we encapsulate partial plans in the intention: when the intention is generated, its partial plan is generated as well, but it always remains encapsulated in the intention):

desire(A, D) ∧ belief(A, W) → intention(A, AC)
For any world state property W and any intention AC, agent A can, when W is relevant, generate a complete plan PE = (P1, P2, ..., Pn), where P1, ..., Pn are the actions it will execute in sequence to realize its intention AC. This is captured by the relation is_opportunity_for(A, W, PE, AC), which means:

intention (A, AC) ∧ belief (A, W) → generate (A, PE)   (1)
generate (A, PE) → performs (A, PE)   (2)
Relation (1) transforms the partial plans encapsulated in the intention into a complete, detailed plan, and relation (2) executes that plan. Since it is the execution of the plan that makes the agent achieve its intention, we have:

performs (A, PE) → performs (A, AC).
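Relations (1) and (2) and the rule above can be sketched as follows, under the assumption that a partial plan is a list of step names in which abstract steps are expanded through a lookup table. `Intention`, `generate`, `perform`, the `refinements` table and the step names are hypothetical illustrations, not part of the original model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Intention:
    """An intention AC with its encapsulated partial plan (possibly abstract steps)."""
    action: str
    partial_plan: List[str]


def generate(intention: Intention, beliefs: set, opportunity: str,
             refinements: Dict[str, List[str]]) -> Optional[List[str]]:
    """Relation (1): intention(A, AC) and belief(A, W) lead to generate(A, PE).

    Returns the complete plan PE = (P1, ..., Pn), or None when the opportunity
    belief W does not hold. Abstract steps are expanded with the hypothetical
    `refinements` table; concrete steps are kept unchanged.
    """
    if opportunity not in beliefs:
        return None
    plan: List[str] = []
    for step in intention.partial_plan:
        plan.extend(refinements.get(step, [step]))
    return plan


def perform(intention: Intention, plan: List[str]) -> List[str]:
    """Relation (2) plus performs(A, PE) -> performs(A, AC): executing every Pi realizes AC."""
    trace = [f"performs(A, {step})" for step in plan]
    trace.append(f"performs(A, {intention.action})")
    return trace


ac = Intention("AC", ["prepare", "P3"])
pe = generate(ac, {"W"}, opportunity="W", refinements={"prepare": ["P1", "P2"]})
print(pe)  # ['P1', 'P2', 'P3']
print(perform(ac, pe))
```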
Since agent control is based on the open commitment strategy, each time the agent performs an action of the plan (P1, P2, ..., or Pn) it makes a new perception, updates its beliefs, desires and intentions, adjusts its plan, and checks whether its chosen intentions are still achievable or have already been achieved. That is, each time agent A performs an action Pi, it makes a new perception L:

performs (A, Pi) → Observe (A, L).
This new observation will allow Agent A to generate a belief.

Observe (A, L) → belief (A, L)
Based on this new belief, the agent updates its desires and intentions and adjusts its plan. It then checks whether the new belief is consistent with the intention it wants to realize, that is, whether, given an intention AC and a belief W, it still intends to do AC:

intention (A, AC) ∧ belief (A, W) → intention (A, AC)
The agent also checks whether this new belief indicates that its intentions have already been fulfilled before the execution of its plan is complete.
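The re-checking behavior described above (execute first(PE), perceive, revise beliefs, then test whether the intention is achieved or still achievable before continuing with rest(PE)) can be sketched as a control loop. This is a simplified sketch under stated assumptions: the agent holds a single intention, one predicate `achievable` stands in for both "still desired" and "still achievable", and `perceive`, `achieved`, `achievable` and `replan` are hypothetical callbacks supplied by the modeler, not functions from Wooldridge's or Florea's algorithms.

```python
from typing import Callable, List, Set, Tuple

Beliefs = Set[str]


def open_commitment_loop(
    beliefs: Beliefs,
    intention: str,
    plan: List[str],
    perceive: Callable[[Beliefs, str], Beliefs],
    achieved: Callable[[str, Beliefs], bool],
    achievable: Callable[[str, Beliefs], bool],
    replan: Callable[[str, Beliefs], List[str]],
    max_steps: int = 50,
) -> Tuple[str, List[str]]:
    """Execute a plan while re-checking the intention after every action (sketch).

    The agent stops as soon as it believes the intention is achieved (possibly
    before the plan is finished) and drops it when it believes it is no longer
    achievable. After each action: performs(A, Pi) -> observe(A, L) -> belief(A, L).
    """
    executed: List[str] = []
    for _ in range(max_steps):
        if achieved(intention, beliefs):
            return "achieved", executed
        if not achievable(intention, beliefs):
            return "dropped", executed
        if not plan:
            plan = replan(intention, beliefs)  # plan exhausted: plan again
            if not plan:
                return "dropped", executed
        action, plan = plan[0], plan[1:]  # first(PE), rest(PE)
        executed.append(action)
        beliefs = perceive(beliefs, action)  # new perception revises beliefs
    return "timeout", executed


def perceive(beliefs: Beliefs, action: str) -> Beliefs:
    """Hypothetical environment: each action is observed, and the goal happens after P2."""
    observed = beliefs | {f"done({action})"}
    if action == "P2":
        observed = observed | {"goal_reached"}
    return observed


status, executed = open_commitment_loop(
    beliefs=set(), intention="goal", plan=["P1", "P2", "P3"],
    perceive=perceive,
    achieved=lambda i, b: "goal_reached" in b,
    achievable=lambda i, b: "blocked" not in b,
    replan=lambda i, b: [],
)
print(status, executed)
```

In the example the goal is already reached after P2, so the loop stops without executing P3.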

b) Improvement of the two-level BDI model
The architecture we propose is an improvement of the two-level BDI model. It extends the general structure of the BDI model adapted to the operating loop architecture of a BDI agent presented above, whose agent control is based on an open commitment strategy. To it we have added filter modules, represented in Figure 4 by the black tubes, which select among the inferred mental states the one(s) with the highest likelihood. The agent then uses the selected mental states, which are BDI concepts, to reason about the reasoning process of another agent whose model is also the general structure of the BDI model adapted to the operating loop architecture of a BDI agent. Thus agent B obtains a theory of mind by attributing to agent A the mental states with the highest weight, and then reasons about the beliefs, desires and intentions of agent A that it has inferred. Our new architecture therefore uses not only the rationality-teleology theory but also the third competing view (ToMM-SP) to model theory of mind. Assuming that the agent has already selected the most likely mental states, the new dependencies can be read as follows. For example, agent B can express its theory of mind about agent A by beliefs such as:

1)))
belief(B, depends_on(desire(A, f), belief(A, e1)))
belief(B, depends_on(desire(A, f), belief(A, e2)))
belief(B, depends_on(belief(A, e1), observes(A, e1)))
In Figure 4 this is shown by the black dot: when agent B has a desire d and believes that d depends on f, it first resolves f and only then satisfies d. These beliefs can also be expressed by the leads_to relationship as follows:

desire(B, IE1) ∧ belief(B, IE2) → intention(B, AC)
For any world state property IE: WORLD_PROP and any action AC: ACTION such that is_opportunity_for(B, IE, AC) holds:

intention(B, AC) ∧ belief(B, IE) → performs(B, AC)
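A minimal sketch of the two ingredients of the second level, assuming that B's hypotheses about A carry numeric weights: a filter that keeps the highest-weighted mental states (a plain sort, standing in for the filter modules of Figure 4), and a traversal of B's depends_on beliefs down to the states that depend on nothing else, which is where B can act first, as in the black-dot reading above. The weights, `select_most_likely` and `leaves` are hypothetical; the dependencies are the ones listed above.

```python
from typing import Dict, List, Optional, Set


def select_most_likely(candidates: Dict[str, float], keep: int = 1) -> List[str]:
    """Filter module (sketch): keep the `keep` inferred mental states with the highest weight."""
    return sorted(candidates, key=lambda state: candidates[state], reverse=True)[:keep]


def leaves(state: str, depends_on: Dict[str, List[str]],
           seen: Optional[Set[str]] = None) -> Set[str]:
    """Follow depends_on links down to states that depend on nothing else.

    These are the points (typically A's observations) where B can intervene first.
    """
    seen = set() if seen is None else seen
    if state in seen:
        return set()  # guard against cyclic dependencies
    seen.add(state)
    deps = depends_on.get(state, [])
    if not deps:
        return {state}
    result: Set[str] = set()
    for dep in deps:
        result |= leaves(dep, depends_on, seen)
    return result


# Hypothetical weighted hypotheses of B about A's desires.
hypotheses = {"desire(A, f)": 0.8, "desire(A, g)": 0.3}
# B's beliefs depends_on(x, y), written as x -> [y, ...].
DEPENDS_ON = {
    "desire(A, f)": ["belief(A, e1)", "belief(A, e2)"],
    "belief(A, e1)": ["observes(A, e1)"],
}

attributed = select_most_likely(hypotheses)
targets = leaves(attributed[0], DEPENDS_ON)
print(attributed, sorted(targets))  # ['desire(A, f)'] ['belief(A, e2)', 'observes(A, e1)']
```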
Figure 4 represents our new two-level BDI model.
Figure 4: General structure of the two-level BDI model

c) Illustration of our model with a case study: cheating in an academic environment
➢ Scenario: A teacher found that students had cheated enormously during the previous academic year. Wanting to prevent this from recurring in his course, he decided to take drastic measures at the start of the following academic year. To this end, the teacher tries to generate all the ways in which the student could cheat and chooses the one(s) most likely to be true. This allows him to model the behavior of the student (without ToM) and then reason from this model so that the student cannot perform the action of cheating.
➢ Inference of student behavior
The student is denoted by A. The desire to cheat arises in the student if:
• Student A believes that the subject is hard.

Belief (A, understands_nothing) → Desire (A, cheated)
• He believes he has not learned enough

Belief (A, not_sufficiently_learned) → Desire (A, cheated)
• He believes that this teaching unit (UE) could make him retake it

Belief (A, UE_faire_reprendre) → Desire (A, cheated)
Since all subjects are equal, none being harder than another, and since any teaching unit can make a student retake it, we instead consider the two other cases that generate the desire, namely: the student does not understand anything, and he believes that he has not learned enough. The intention to cheat arises in the student at time t:
• If the desire to cheat is present and he believes that students are allowed to sit the exam with their phone.

Desire (A, cheated) ∧ belief (A, compose_telephone) → intention (A, cheated_telephone)
• If the desire to cheat is present and he believes he can use a crib sheet ("cartridge").

Desire (A, cheated) ∧ belief (A, use_cartridge) → intention (A, cheated_cartridge)
• If the desire to cheat is present and he believes that he can ask his classmates and peek at their papers.

Desire (A, cheated) ∧ belief (A, close-comrade) → intention (A, cheated_requesting_guettant)
Since the student is searched before entering the room, he cannot intend to cheat with a phone or a cartridge; his intention can therefore only be to ask his classmates and peek at their papers, as he believes he can.
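The student's desire and intention rules, together with the elimination of the phone and cartridge cases, can be checked with a small forward pass. The rule tables reuse the identifiers of the formulas above; encoding them as Python dictionaries and the function name `infer_student` are our assumptions.

```python
# Rules transcribed from the formulas above: belief -> desire(A, cheated),
# and desire(A, cheated) + belief -> intention.
DESIRE_RULES = {
    "understands_nothing": "cheated",
    "not_sufficiently_learned": "cheated",
    "UE_faire_reprendre": "cheated",
}
INTENTION_RULES = {
    "compose_telephone": "cheated_telephone",
    "use_cartridge": "cheated_cartridge",
    "close-comrade": "cheated_requesting_guettant",
}


def infer_student(beliefs):
    """Derive student A's desires and intentions from his beliefs (one forward pass)."""
    desires = {d for b, d in DESIRE_RULES.items() if b in beliefs}
    intentions = set()
    if "cheated" in desires:  # every intention rule requires the desire to cheat
        intentions = {i for b, i in INTENTION_RULES.items() if b in beliefs}
    return desires, intentions


# Retained beliefs: the UE case is discarded, and because students are searched
# at the entrance the phone and cartridge beliefs cannot hold.
beliefs = {"understands_nothing", "not_sufficiently_learned", "close-comrade"}
desires, intentions = infer_student(beliefs)
print(desires, intentions)
```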
The plan is generated at time t:
• If the intention has occurred and the student believes that the supervisor cannot see him.

generate (A, PE) → performs (A, PE)
He carries out his plan action by action, making a new perception after each action to check whether the supervisor is watching, that is, whether his intention to cheat is still achievable.

performs (A, PE) → performs (A, tricher_demandant_guettant)
Thus the following specific relationships are used to model the behavior of the student:
has_reason_for (A, cheated, understands_nothing, not_sufficiently_learned, cheated_requesting_guettant)
is_opportunity_for (A, surveillant_ne_le_voit_pa, PE, cheat_demandant_guettant)
Figure 5: Simulation trace of student behavior
➢ Reasoning of the teacher who reasons about the behavior of the student to prevent him from cheating
Here the teacher, denoted B, reasons and acts in anticipation so that the student does not have the desire or the intention to cheat and/or does not perform the cheating action. The initial desire of the teacher is that the student does not perform the act of cheating:
desire (B, not (performs (A, tricher_demandant_guettant)))
This desire can be satisfied in the following four ways:
• prevent student A's desire to cheat from arising;
• prevent student A's intention to ask for and peek at a classmate's paper from arising (once the desire has arisen);
• prevent the student from generating a plan (once the intention has arisen);
• prevent student A from executing his plan to perform the cheating action (once the plan is generated, even if some of its actions have already been performed).
For simplicity, the model does not select among these options but addresses all of them to prevent the student from cheating. This means that the teacher generates desires that the student:
• hears that it is possible to meet the teacher outside class about concepts not understood in class, and knows that there is a remedial session that gives him another chance to pass the unit.

desire (B, hears (A, teacher_available_out_class))
• hears that it will not be possible to get in touch with his neighbor;

desire (B, hears (A, assieds_seul_banc))
• hears that there will be an additional supervisor at the back of the room;

desire (B, hears (A, two_monitoring))
• hears the warning of one of the supervisors when he performs an action of his plan.

desire (B, hears (A, About_Warning))
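Teacher B's reasoning can be sketched as B running its model of A and checking at which stage each message heard by A interrupts A's chain (desire, intention, plan, execution). The mapping from messages to stages is our interpretation of the order of the four options listed above, and `BLOCKS`, `simulate_student` and `student_cheats` are hypothetical names; the message identifiers come from the formulas above.

```python
# Stages of student A's reasoning chain, in the order of the adapted BDI model.
STAGES = ["desire", "intention", "plan", "execution"]

# Interpretive mapping (our reading of the four options): which piece of
# information heard by A blocks which stage of A's reasoning.
BLOCKS = {
    "teacher_available_out_class": "desire",
    "assieds_seul_banc": "intention",
    "two_monitoring": "plan",
    "About_Warning": "execution",
}


def simulate_student(heard):
    """B's model of A: the stages A reaches before the first one blocked by what A heard."""
    blocked = {BLOCKS[h] for h in heard if h in BLOCKS}
    reached = []
    for stage in STAGES:
        if stage in blocked:
            break
        reached.append(stage)
    return reached


def student_cheats(heard):
    """performs(A, tricher_demandant_guettant) holds only if A completes every stage."""
    return simulate_student(heard) == STAGES


# B addresses all four options at once, so A is stopped at the very first stage.
all_messages = set(BLOCKS)
print(simulate_student(set()), simulate_student({"two_monitoring"}),
      simulate_student(all_messages))
```

Because the options act on successive stages, any single message is enough to block the action in this sketch; addressing all four simply stops A earlier in the chain.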
To satisfy these desires, teacher B must generate intentions to perform actions such as:
• B tells A that he is available, even outside school hours, for all concepts not understood.

performs (B, tell (A, teacher_available_out_class))
• B tells A that everyone will sit alone on their bench on the day of the assessment.

performs (B, tell (A, assieds_seul_banc))
• B tells A that there will be two supervisors, one in front and the other at the back.

performs (B, tell (A, two_monitoring))
• B tells A about the warnings so that he does not carry out his plan.

belief (B, adequate_communication (B, A))
• B believes he has observed that A was lazy and gesticulated a lot during all the class sessions.

belief (B, observations (A, parasseux_avant_examen))
In addition, these intentions of B can lead to corresponding sequences of actions (a plan) when the following beliefs of B are present:
• B believes that A is calm and attentive and that there is no noise in the room.

belief (B, A_calme_attentif_pas_bruit)
• B believes he has observed that A gesticulates during the assessment.

belief (B, observe (A, gesticule_pendant_examen))
By combining the specific relationships and the generic LEADSTO rules, we obtain a simulation of the behavior of the teacher who wants to prevent the student from cheating by peeking at or asking his classmate.

CONCLUSION
In this article, we proposed a decision-making model integrating ToM for a cognitive multi-agent system. More precisely, we proposed a multi-agent system (MAS) architecture that improves the two-level BDI architecture. To achieve this, we first changed the BDI architecture on which the two-level BDI model was based, adapting it to Florea's operating loop architecture of a BDI agent, with agent control based on the limited commitment strategy. Thus the agent maintains its intentions until it believes that they have been realized or that they are no longer achievable. In addition to the rationality-teleology theory, we used the third competing view in the literature to model theory of mind (ToM) effectively in MAS. Thus an agent infers all the possible mental states of another agent, selects those most likely to be true, and then reasons about them in order to make social anticipations or manipulations. The model was formalized using the high-level modeling language LEADSTO, which describes dynamics in terms of direct temporal dependencies between state properties in successive states. We illustrated our model with the scenario of a teacher who does not want students to cheat during the year. The teacher infers all the possibilities (mental states) that students can use to cheat, chooses those most likely to be true to model the student's behavior using our operating loop architecture of a BDI agent with a limited commitment strategy, and then uses this model to reason about the student's behavior so that the student does not cheat. In future work, we envisage a multi-level BDI model integrating theory of mind, applied to a game with several participants.