Artificial Intelligence used in Robotics

DOI : 10.17577/IJERTCONV6IS14083



R. Ajay Balaji, N. Mohamed Inzamam Kutty

Department of Mechanical Engineering, Parisutham Institute of Technology and Science

Abstract:- Novel anthropomorphic robotic systems increasingly employ variable impedance actuation with a view to achieving robustness against uncertainty, superior agility and improved efficiency that are hallmarks of biological systems. Controlling and modulating impedance profiles such that they are optimally tuned to the controlled plant is crucial in realizing these benefits. In this work, we propose a methodology to generate optimal control commands for variable impedance actuators under a prescribed tradeoff of task accuracy and energy cost. We employ a supervised learning paradigm to acquire both the plant dynamics and its stochastic properties. This enables us to prescribe an optimal impedance and command profile (i) tuned to the hard-to-model plant noise characteristics and (ii) adaptable to systematic changes. To evaluate the scalability of our framework to real hardware, we designed and built a novel antagonistic series elastic actuator (SEA) characterized by a simple mechanical architecture and we ran several evaluations on a variety of reach and hold tasks. These results highlight, for the first time on real hardware, how impedance modulation profiles tuned to the plant dynamics emerge from the first principles of stochastic optimization, achieving clear performance gains over classical methods that ignore or are incapable of incorporating stochastic information.

Keywords:- Antagonistic actuator, dynamics learning, equilibrium point control, impedance control, stochastic optimal control

1. INTRODUCTION

Humans have remarkable abilities in controlling their limbs in a fashion that outperforms most artificial systems in terms of versatility, compliance and energy efficiency. The fact that biological motor systems suffer from significant noise, sensory delays and other sources of stochasticity (Faisal et al. 2008) makes their performance even more impressive. Therefore, it comes as no surprise that biological motor control is often used as a benchmark for robotic systems. On the one hand, biological motor control characteristics are a result of the inherent biophysical properties of human limbs, and on the other hand, they are achieved through a framework of learning and adaptation processes (Wolpert et al. 1995; Kawato 1999; Davidson and Wolpert 2005). These concepts can be transferred to robotic systems by (i) developing appropriate anthropomorphic hardware and (ii) employing learning mechanisms that support motor control in the presence of noise and perturbations (Mitrovic et al. 2008).

In this paper, we focus on issues related to adaptive motor control of antagonistically actuated robots. Antagonistic actuator designs are based on the biological principle of opposing muscle pairs. Therefore, the joint torque motors, for example, of a robotic arm are replaced by opposing actuators, typically using mechanical springs (Pratt and Williamson 1995). Such series elastic actuators (SEAs) have had increasing attention in the last few decades (Vanderborght et al. 2009) as they provide several beneficial properties over classic joint torque actuated systems:

  1. Impedance control and variable compliance: Through the use of antagonistic actuation, the system is able to vary co-contraction levels, which in turn change the system's mechanical properties: this is commonly referred to as impedance control (Hogan 1984). Impedance in a mechanical system is defined as a measure of force response to a motion exerted on the system and is composed of components such as inertia, damping and stiffness. In general SEAs can only vary the stiffness of a system, and achieving variable damping is technically challenging (e.g. Laffranchi et al. 2010). Consequently, in this paper, when we refer to impedance control, we will solely address a change in stiffness and ignore variable damping or variable inertia. Antagonistic actuation introduces an additional degree of freedom in the limb dynamics, i.e. the same joint torque can be achieved by different muscle activations. This means low co-contraction leads to low joint impedance whereas high co-contraction increases joint impedance. This degree of freedom can be used beneficially in many motion tasks, especially those involving manipulation or interaction with tools. It has been shown through many neurophysiological studies (e.g. Burdet et al. 2001) that humans are capable of modulating this impedance in an optimal way with respect to the task requirements, trading it off selectively against energy consumption. For example, when you use a drill to drill holes in a wall, you learn how to co-contract your muscles such that the random perturbations of the drilling have a minimal impact. Furthermore, the ability to vary compliance plays a crucial role in robot safety (Zinn et al. 2004). In general, impedance modulation is an efficient way to control systems that suffer from noise, disturbances or sensorimotor delays.

  2. Energy efficiency and energy storage: By appropriately controlling the SEA, one can take into account the passive properties of the springs and produce control strategies with low energy requirements. A well-known example is walking, where the spring properties combined with an ideal actuation timing can be used to produce energetically efficient gaits (Collins and Ruina 2005; Collins and Kuo 2010). Furthermore, an SEA has impressive energy storage and fast discharge capabilities, enabling explosive behavior such as throwing a ball (Wolf and Hirzinger 2008), which is quite hard to achieve with regular joint torque controllers. Therefore, series elasticity can amplify the power and work output of an actuator, which is important in the fabrication of lightweight but powerful robotic or prosthetic devices (Paluska and Herr 2006).

A disadvantage of antagonistic actuation is that it imposes higher demands on the redundancy resolution capabilities of a motor controller. Optimality principles have successfully been used in biological (Flash and Hogan 1985; Scott 2004; Todorov 2004) and in artificial systems (Nakamura and Hanafusa 1987; Cortes et al. 2001) as a principled strategy to resolve redundancies in a way that is beneficial for the task at hand. More specifically, stochastic optimal control (SOC) (Stengel 1994; Bertsekas 1995; Todorov 2006) appears to be an especially appealing theory as it studies optimality principles under the premise of noisy and uncertain dynamics. Another important aspect when studying stochastic systems is how the information, for example, about noise or uncertainty is obtained without prior knowledge. Supervised learning methods can provide a viable solution to this problem as they can be used to extract information from the plant's sensorimotor data directly.

Here, we propose a control strategy for antagonistic systems that is based on stochastic optimal control theory under the premise of minimal energy cost. We propose to extend SOC by learning the dynamic model of the plant, which enables us (i) to adapt to systematic changes of the plant and (ii) to extract its stochastic properties. Stochastic properties or stochastic information refers to noise or random perturbations of the controlled system that cannot be modeled deterministically. By incorporating this stochastic information into the optimization process, we show how impedance modulation and co-contraction behavior emerges as an optimal control strategy from first principles. In the next section, we present a new antagonistic actuator, which serves as our implementation testbed for studying impedance control in the presence of stochasticity and which, compared to previous antagonistic designs, has a much simpler mechanical design. In Section 3, we introduce the basic concepts of optimal control and propose an extension that uses a learned dynamic model. This supervised learning methodology allows us to adapt online to changes in the dynamics as well as to extract localized stochastic information from movement data. We then propose a systematic methodology for incorporating deterministic and stochastic plant dynamic information into the optimal control framework, resulting in a scheme that improves performance significantly by exploiting the antagonistic redundancy of our plant. Our claims are supported by a number of experimental evaluations on real hardware in Section 4. We conclude the paper with a discussion and an outlook.

2. A Novel Antagonistic Actuator Design for Impedance Control

To study impedance control, we developed an antagonistic joint with a simple mechanical setup. Our design is based on the SEA approach in which the driven joint is connected via spring(s) to a stiff actuator (e.g. a servomotor). A variety of SEA designs have been proposed (for a recent review see Vanderborght et al. (2009)), which we here classify into pseudo-antagonistic and antagonistic setups. Pseudo-antagonistic SEAs have one or multiple elastic elements, which are connected between the driving motor and the driven joint. The spring tension, and therefore the joint stiffness, is regulated using a mechanism equipped with a second actuator. Antagonistic SEAs have one motor per opposing spring and the stiffness is controlled through a combination of both motor commands. Therefore, in antagonistic designs, the relationship between motor commands and stiffness must be resolved by the controller. This additional computational cost is the tradeoff for a biologically plausible architecture.

Fig. 1. Schematic of the variable stiffness actuator. The robot's dimensions are: a = 15 mm, L = 26 mm, d = 81 mm, h = 27 mm. The spring rest length is s0 = 27 mm.

For antagonistic SEAs, non-linearity of the springs is essential to obtain variable compliance (van Ham et al. 2009). Because forces produced through springs with linear tension-to-force characteristics tend to cancel out in an antagonistic setup, an increase in the tension of both springs (i.e. co-contraction) does not change the stiffness of the system. Commercially available springs usually have linear tension-to-force characteristics and consequently most antagonistic SEAs require relatively complex mechanical structures to achieve a non-linear tension-to-force curve (Hurst et al. 2004; Migliore et al. 2005; Tonietti et al. 2005). These mechanisms typically increase construction and maintenance effort but can also complicate the system identification and controllability, for example, due to added drag and friction properties. We directly addressed this aspect in our design of the SEA, which aims to achieve variable stiffness characteristics using a simple mechanical setup.

2.1. Variable Stiffness with Linear Springs

Here we propose an SEA design that does not rely on complex mechanisms to achieve variable stiffness but achieves the desired properties through a specific geometric arrangement of the springs. While the emphasis of this paper is not on the mechanical design of actuators, we will explain the essential dynamic properties of our testbed. Figure 1 shows a sketch of the robot, which is mounted horizontally and consists of a single joint and two antagonistic servomotors that are connected to the joint via linear springs. The springs are mounted with a moment arm offset a at the joint and an offset of L at the motors. Therefore, the spring endpoints move along circular paths at the joint and at the motors. Under the assumption that the servomotors are infinitely stiff, we can calculate the torque acting on the arm as follows. Let s1 denote the vector from point C to A, and s2 the vector from D to B, and ‖s1‖ and ‖s2‖ their respective lengths. Putting the origin of the coordinate system at the arm joint, we have

s1 = (h − L sin u1, d + L cos u1, 0)ᵀ − a1,  with a1 = (a cos θ, a sin θ, 0)ᵀ,
s2 = (h + L sin u2, d + L cos u2, 0)ᵀ − a2,  with a2 = (−a cos θ, −a sin θ, 0)ᵀ. (1)

Denoting the spring constant by κ and the rest length by s0, this yields forces

F1 = κ (‖s1‖ − s0) s1/‖s1‖   and   F2 = κ (‖s2‖ − s0) s2/‖s2‖. (2)

Given the motor positions u1 and u2 and the arm position θ, the torque generated by the springs is

τ(u1, u2, θ) = zᵀ (F1 × a1 + F2 × a2), (3)

where z denotes the three-dimensional basis vector (0, 0, 1)ᵀ. To calculate the equilibrium position θeq for given motor positions u1 and u2, we need to solve τ(u1, u2, θeq) = 0, which in practice is done by numerical optimization. At this position, we can calculate the joint stiffness as

K(u1, u2) = −∂τ(u1, u2, θ)/∂θ evaluated at θ = θeq. (4)

Note that K depends linearly on the spring stiffness κ, but the geometry of the arm induces a non-linear dependency on u1 and u2. Figure 2 shows the computed profiles of the equilibrium position and stiffness, respectively. Further, denoting the arm's inertia around the z-axis by Iz and a damping torque given by θ̇ D, the dynamic equation can be analytically written as

Iz θ̈ = τ(u1, u2, θ) − θ̇ D. (5)
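To make the geometric relations in Equations (1)-(4) concrete, the following sketch computes the spring torque numerically and obtains the equilibrium position and stiffness by root finding and finite differences. It is a minimal illustration only, not the authors' implementation: the endpoint sign conventions in `torque` and the use of `fsolve` are assumptions, and only the dimensions and spring constant quoted in this section are taken from the text.

```python
import numpy as np
from scipy.optimize import fsolve

# Dimensions and spring constant quoted in Section 2 (metres, N/m)
a, L, d, h = 0.015, 0.026, 0.081, 0.027
kappa, s0 = 424.0, 0.027

def torque(u1, u2, theta):
    """Moment of the two spring forces about the joint axis, cf. Equations (1)-(3).
    The exact sign conventions of the endpoint vectors are assumptions."""
    a1 = np.array([a * np.cos(theta), a * np.sin(theta), 0.0])        # arm-side endpoint A
    a2 = -a1                                                          # opposite endpoint B
    c = np.array([h - L * np.sin(u1), d + L * np.cos(u1), 0.0])       # motor-side endpoint C
    dd = np.array([-(h - L * np.sin(u2)), d + L * np.cos(u2), 0.0])   # motor-side endpoint D (mirrored)
    s1, s2 = c - a1, dd - a2
    f1 = kappa * (np.linalg.norm(s1) - s0) * s1 / np.linalg.norm(s1)  # spring forces, Equation (2)
    f2 = kappa * (np.linalg.norm(s2) - s0) * s2 / np.linalg.norm(s2)
    return float(np.cross(a1, f1)[2] + np.cross(a2, f2)[2])

def equilibrium(u1, u2):
    """Solve torque(u1, u2, theta_eq) = 0 numerically, as described after Equation (3)."""
    return float(fsolve(lambda th: torque(u1, u2, th[0]), x0=0.0)[0])

def stiffness(u1, u2, eps=1e-5):
    """K = -d(tau)/d(theta) at the equilibrium position, Equation (4), by central differences."""
    th = equilibrium(u1, u2)
    return -(torque(u1, u2, th + eps) - torque(u1, u2, th - eps)) / (2.0 * eps)

# Equal motor commands (co-contraction) keep the equilibrium near zero but change the stiffness
for u in np.radians([30.0, 90.0, 150.0]):
    print(f"u1 = u2 = {np.degrees(u):5.1f} deg  ->  K = {stiffness(u, u):6.3f} N m/rad")
```

Under these assumptions the equilibrium of a symmetric command pair stays at zero while the stiffness changes with the co-contraction level, which is the qualitative behaviour shown in Figure 2.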

Fig. 2. Left: Equilibrium position as a function of the motor positions (in degrees), with contour lines spaced at 5° intervals. Right: Stiffness profile of the arm, as calculated from Equation (4). The maximum achievable stiffness is 150% of the intrinsic spring stiffness.

2.2. Actuator Hardware

Figure 3 depicts our prototype SEA hardware implementation of the discussed design. For actuation, we employ two servomotors (Hitec HSR-5990TG), each of which is connected to the arm via a spring mounted on two low-friction ball bearings. To avoid excessive oscillations, the joint is attached to a rotary viscous damper. The servos are controlled using 50 Hz PWM signals by an Arduino Duemilanove microcontroller board (Atmel ATmega328). That board also measures the arm's joint angle θ with a contact-free rotary position encoder (Melexis MLX90316GO), as well as its angular acceleration θ̈ using a LilyPad accelerometer (Analog Devices ADXL330). Finally, we also measure the servomotor positions by feeding a signal from their internal potentiometers to the AD converters of the Arduino. While the operating frequency is limited to 50 Hz due to the PWM control, all measurements are taken at a 4× higher frequency and averaged on the board to reduce the amount of noise, before sending the results to a PC via a serial connection (RS232/USB).

Fig. 3. Photograph of our antagonistic robot. Inset panel (a): Separate servomotor mounted at the end of the arm to create stochastic perturbations (see Section 4.2).

2.3. System Identification

Fig. 4. Comparison of the prediction performance of the estimated motor dynamics (top and middle) and of the arm dynamics (bottom) for an independent test data set.

Apart from measuring the exact dimensions (L = 2.6 cm, a = 1.5 cm, h = 2.7 cm, d = 8.1 cm) of the robot and the stiffness constant of the spring (κ = 424 N m⁻¹), system identification consists of a series of steps, each of which involves a least-squares fit between known and actually measured quantities.

  1. Identify servomotor dynamics: The servomotors are controlled by sending the desired position (encoded as a PWM signal), which we refer to as u1 and u2 for motors 1 and 2, respectively. Even though the servomotors we use are very accurate, they need some time to reach the desired position, and therefore we model the true motor positions (ũ1, ũ2) as a low-pass filtered version of the commands (u1, u2) using a finite impulse response (FIR) filter, i.e.

     ũ1[n] = (h ∗ u1)[n] + ε[n] = Σ_{k=0}^{K} h[k] u1[n − k] + ε[n], (6)

     and similarly for ũ2 and u2. The term ε[n] denotes a noise component of the true motor position that cannot be modeled with the FIR filter. By using the internal potentiometer of the servomotors, we can measure the actual motor positions to identify the filter coefficients h_i using a least-squares fit, that is, by minimizing Σ_n (ũ1[n] − (h ∗ u1)[n])² with respect to h_i. We retrieved a good fit of the motor dynamics (cf. Figure 4) using an FIR filter with seven steps, with estimated coefficients h = [0, 0, 0, 0.0445, 0.2708, 0.3189, 0.3658].

  2. Calibrate position sensor: Tests with the position sensor revealed linear position characteristics. By moving the arm physically to several predefined and geometrically measured positions, we determined the sensor's offset and slope.

  3. Calibrate acceleration sensor: We matched the accelerations measured with the accelerometer with accelerations derived from the position sensor (using finite differences).

  4. Collect training data and fit parameters: We carried out motor babbling (any excitation movements are applicable) on the servos and measured the resulting arm positions, velocities and accelerations. Taking into account the estimated motor dynamics using the fitted filter, we estimated the arm's inertia Iz (in kg m² rad⁻¹) and viscous damping coefficient D (in N m s rad⁻¹) using least squares from Equation (5).

On a large independent test set of S_test = 300,000 data points, the motor prediction produces a Normalized Mean Squared Error (NMSE) of e_nmse = 1.85%. Figure 4 shows an example of the prediction performance for a sequence of random motor commands (20 s from the test set S_test) using the estimated dynamic model.

3. Stochastic Optimal Control

In many control scenarios it is desirable to be able to perform in the best way possible. For example, one may wish to move the system to a desired posture and consume as little energy as possible during the movement. This type of problem is studied in optimal control theory, the central ingredient of which is the minimization of an optimality criterion

J[u] = ∫₀ᵀ c(x(t), u(t), t) dt + h(x(T))   or   J[u] = ∫₀^∞ c(x(t), u(t), t) dt, (7)

for a task with a finite or infinite horizon. Apart from the optional final cost h(·), the criterion integrates a cost rate c(x, u) over the course of the movement. That cost may depend on the system's state x, control commands u and on time t, where the initial state of the system is given as x(0), and x(t) evolves depending on the commands u(t). In the context of biological motor control, this theory has been studied for decades with the well-known examples of various optimality criteria such as minimum time (Bobrow et al. 1985), energy (Li and Todorov 2007), jerk (Flash and Hogan 1985) and torque change (Uno et al. 1989).

For a system with deterministic (and accurately modeled) dynamics ẋ = f(x, u), it is sufficient to find the open-loop sequence of commands u(t) and the associated trajectory x(t) that minimizes J, which can usually be obtained by solving a two-point boundary difference/differential equation derived by applying Pontryagin's minimum principle (Stengel 1994). In practice, in the presence of small perturbations or modeling errors, the optimal open-loop sequence of commands can be run on the real plant together with a simple PD controller that corrects deviations from the planned trajectory. However, those corrections will usually not adhere to the optimality criterion, and the resulting cost J will be higher.

Alternatively, we can try to incorporate stochasticity, e.g. as a dynamic model

dx = f(x, u) dt + F(x, u) dξ,   ξ ∼ N(0, I), (8)

directly into the optimization process and minimize the expected cost.1 Here, dξ is a Gaussian noise process and F(·) tells us how strongly the noise affects each part of the state and control space. A well-studied example of this case is the LQG problem, which stands for linear dynamics (f(x, u) = Ax + Bu), quadratic cost (in both x and u), and Gaussian noise (F is constant). A solution to this class of problems is the optimal feedback controller (OFC), that is, a policy u = π(x, t) that calculates the optimal command u based on feedback x from the real plant. In the LQG case, the solution is a linear feedback law u = −L(t) x with precomputed time-dependent gain matrices L(t) (Stengel 1994).2

Solving OFC problems for more complex systems (non-linear dynamics, non-quadratic cost, varying noise levels F) is a difficult computational task. A general way to solve such non-linear OFC problems is Dynamic Programming (DP) (Bellman 1957). DP in its basic form relies on a discretization of the state and action space, which in practice is difficult to obtain: on the one hand, tiling the state-action space too sparsely will lead to a poor representation of the underlying plant dynamics. On the other hand, a very fine discretization leads to a combinatorial explosion of the problem, which is commonly referred to as the curse of dimensionality. For example, consider a discretization of 100 steps for each variable of the state and action space. In the case of the presented SEA, this corresponds to a state space dimensionality n = 2, for positions and velocities, and an action space dimensionality m = 2, for the two motors.3 Even for this low-dimensional system the possible combinations of states and actions that DP needs to evaluate and store in order to find the optimal control law are p = 100⁴ = 100,000,000. One way to avoid the curse of dimensionality is to restrict the state space to a region that is close to a nominal optimal trajectory. In the neighborhood of such trajectories the DP problem can be approximated analytically using Taylor expansions of the dynamics and the cost function. The idea is to compute an optimal trajectory together with a locally valid feedback law and then iteratively improve this nominal solution until convergence. Well-known examples of such iterative methods are Differential Dynamic Programming (DDP) (Dyer and McReynolds 1970; Jacobson and Mayne 1970) or the more recent iterative Linear Quadratic Gaussian (ILQG) (Todorov and Li 2005), which will serve as the solution technique of choice in this paper. ILQG yields both an open-loop sequence of commands and optimized feedback gain matrices, but these are not guaranteed to converge towards the global optimum: depending on the initial guess of the trajectory, the iterative improvement might result in a solution with only a locally optimal expected cost J.
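For the LQG case mentioned above, the time-dependent gain matrices L(t) can be obtained with a standard backward Riccati recursion. The sketch below is a generic textbook construction, not the paper's implementation: the system matrices are invented (a simple double integrator), and only the structure u = -L(t) x is what the text describes.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Finite-horizon discrete-time Riccati recursion; returns gains L[0..T-1] for u = -L[t] x."""
    S = Qf.copy()
    gains = [None] * T
    for t in reversed(range(T)):
        L = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)   # L[t] = (R + B'SB)^-1 B'SA
        gains[t] = L
        S = Q + A.T @ S @ (A - B @ L)                       # cost-to-go update
    return gains

# Toy double-integrator example (hypothetical matrices, dt = 0.02 s matching the 50 Hz loop)
dt = 0.02
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])        # state cost (position, velocity)
R = np.array([[0.01]])         # control cost
gains = lqr_gains(A, B, Q, R, Qf=10 * Q, T=100)

x = np.array([0.3, 0.0])       # start 0.3 rad away from the target (origin)
for t in range(100):
    u = -gains[t] @ x          # linear feedback law u = -L(t) x
    x = A @ x + B @ u
print(x)                        # state driven close to the origin
```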

3.1. Modeling Dynamics and Noise through Learning

Analytical dynamic formulations, as described in Section 2.1 or in Equation (5), have the tremendous advantage of being compact and fast to evaluate numerically, but they also suffer from drawbacks. First, their accuracy is limited to the level of detail put into the physical model. For example, our model is based on the assumption that the robot is completely symmetric, that both motors are perfectly calibrated, and that the two springs are identical, but in reality we cannot avoid small errors in all of these. Second, the analytical model does not provide obvious ways to model changes in the dynamics, such as from wear and tear, or more systematic changes due to the weight of an added tool. While these problems can to some extent be alleviated by a more involved and repeated system identification process, the situation is more difficult if we consider the noise model F(·), or stochastic changes to the dynamics. For example, an arm might be randomly perturbed by tool interactions such as when drilling into a wall, with stronger effects for certain postures and milder effects for others. It is not obvious how one can model such state dependent noise analytically.

We therefore propose to include a supervised learning component and to acquire both the dynamics and the noise model in a data-driven fashion (Figure 5). Our method of choice in this paper is Locally Weighted Projection Regression (LWPR) (Vijayakumar et al. 2005), because that algorithm allows us to adapt the models incrementally and online, and it is able to reflect heteroscedastic4 noise in the training data through localized confidence intervals around its predictions. More details on learning with LWPR can be found in Appendix A.

In order to simplify the presentation as much as possible, and also due to technical challenges of operating on the real hardware (for details see Section 5), in this work we learn the stochastic mapping θ = f(u) from motor positions u to the joint angle θ, not taking into account velocities and accelerations. During stationary conditions and in the absence of perturbations, this mapping reflects the equilibrium position of the arm (Figure 2, left). In correspondence to the general dynamic equation (8), here the state x = θeq represents the current equilibrium position, u the applied motor action, and dx the resulting change in equilibrium position. Therefore the reduced dynamics used here only depends on the control signals, i.e.

dx = f(u) dt + F(u) dξ,   ξ ∼ N(0, 1). (9)

Learning this mapping from data, we can directly account for asymmetries. More interestingly, when we collect data from the perturbed system, we can acquire a model of the arm's kinematic variability as a function of the motor positions. We use this learned model f̃ in two ways: first, in (slow) position control tasks (Section 3.2), and second, in conjunction with full analytic dynamic models for dynamic reaching tasks (Section 3.3).

3.2. Energy Optimal (Equilibrium) Position Control

Consider the task of holding the arm at a certain position θ*, while consuming as little energy as possible. Let us further assume that we have no feedback from the system,5 but that the arm is perturbed randomly. We can state this mathematically as the minimization of a cost

J = ⟨ wp (f(u) − θ*)² ⟩ + |u|², (10)

where wp is a factor that weights the importance of being at the right position against the energy consumption, which for simplicity we model by |u|². Taking into account that the motor commands u are deterministic, and decomposing the expected position error into an error of the mean plus the variance, we can write the expected cost J as

J = wp (⟨f(u)⟩ − θ*)² + wp ⟨(f(u) − ⟨f(u)⟩)²⟩ + |u|², (11)

which based on the LWPR learned model becomes

J = wp (f̃(u) − θ*)² + wp σ²(u) + |u|². (12)

Here f̃(u) and σ(u) denote the prediction and the one-standard-deviation-based confidence interval of the LWPR model of f(u). The constant wp represents the importance of the accuracy requirements in our task. We then can easily minimize J with respect to u = (u1, u2)ᵀ numerically, taking into account the box constraints 0 ≤ ui ≤ 180°.6
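A minimal sketch of this minimization: given a learned model that returns a prediction and a confidence interval, the cost of Equation (12) can be minimized under the box constraints with an off-the-shelf optimizer. The `model` object, its `predict` interface and the toy stand-in below are hypothetical placeholders for the LWPR model, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_command(model, theta_star, w_p, u_init=(90.0, 90.0)):
    """Minimize J = w_p (f(u) - theta*)^2 + w_p sigma^2(u) + |u|^2, Equation (12),
    subject to the box constraints 0 <= u_i <= 180 (degrees)."""
    def cost(u):
        mean, std = model.predict(u)          # hypothetical LWPR-style prediction and confidence bound
        return w_p * (mean - theta_star) ** 2 + w_p * std ** 2 + np.dot(u, u)
    res = minimize(cost, x0=np.asarray(u_init, dtype=float),
                   bounds=[(0.0, 180.0), (0.0, 180.0)], method="L-BFGS-B")
    return res.x

class ToyModel:
    """Invented stand-in: equilibrium shifts with the command difference,
    noise shrinks with co-contraction (u1 + u2), mimicking the learned behaviour."""
    def predict(self, u):
        mean = 0.25 * (u[1] - u[0])                       # degrees
        std = 5.0 / (1.0 + 0.05 * (u[0] + u[1]))          # degrees
        return mean, std

print(np.round(optimal_command(ToyModel(), theta_star=15.0, w_p=100.0), 1))
```

With the variance term included, a larger accuracy weight pushes the optimizer towards larger co-contraction, which is the behaviour examined in Experiment 2.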

3.3. Dynamic Control with Learned Stochastic Information

Equilibrium position control is ignorant about the dynamics of the arm, that is, going from one desired position to the next might induce swinging movements, which are not damped out actively. Proper dynamic control should take these effects into account and optimize the command sequence accordingly. What follows is a description of how we model the full dynamics of the arm, that is, the combination of the dynamics of the joint and the motors.

The state vector x[k] of our system at time k consists of the joint angle x1[k] = θ[k] and joint velocity x2[k] = θ̇[k] as well as 12 additional state variables, which represent the command history of the two motors, i.e. the last six motor commands that were applied to the system. The state vector, therefore, is

x[k] = (θ[k], θ̇[k], u1[k−1], …, u1[k−6], u2[k−1], …, u2[k−6])ᵀ, (13)

where the additional state variables x3[k], …, x8[k] for motor 1, and similarly x9[k], …, x14[k] for motor 2, are required to represent the FIR filter states of the motor dynamics from Equation (6). We can estimate the motor positions ũ1[k] and ũ2[k] solely from these filter states because the FIR coefficients are h0 = h1 = 0:

ũ1[k] = Σ_{j=2}^{7} hj u1[k − j + 1] = Σ_{j=2}^{7} hj x_{j+1}[k], (14)
ũ2[k] = Σ_{j=2}^{7} hj u2[k − j + 1] = Σ_{j=2}^{7} hj x_{j+7}[k]. (15)

Based on the rigid body dynamics from Equation (5) we can compute the acceleration from states (i.e. forward dynamics) as

θ̈[k] = (1/Iz) (τ(ũ1[k], ũ2[k], θ[k]) − D θ̇[k]). (16)

Therefore running the dynamics here means accounting for the motor dynamics by shifting the filter states, that is x_{i+1}[k+1] = x_i[k] for i = 3, …, 7 and i = 9, …, 13, and then Euler-integrating the velocities and accelerations:

x[k+1] = x[k] + Δt f(x[k], u[k]) (17)
       = (θ[k] + Δt θ̇[k], θ̇[k] + Δt θ̈[k], u1[k], x3[k], …, x7[k], u2[k], x9[k], …, x13[k])ᵀ. (18)

Alternatively, we can drop the time index k and write the dynamics in compact form as

f(x, u) = (x2, θ̈(x), (1/Δt)(u1 − x3), (1/Δt)(x3 − x4), …, (1/Δt)(u2 − x9), (1/Δt)(x9 − x10), …)ᵀ. (19)

The gradient of θ̈(x) is given by the chain rule, where τ is short notation for τ(ũ1, ũ2, θ). Note that θ = x1, θ̇ = x2, and ũ1 and ũ2 are calculated from x3, …, x14:

∂θ̈/∂x = (1/Iz) ( ∂τ/∂θ, −D, (∂τ/∂ũ1) h2, …, (∂τ/∂ũ1) h7, (∂τ/∂ũ2) h2, …, (∂τ/∂ũ2) h7 ). (20)

This only shows the second row of the Jacobian ∂f(x, u)/∂x and for brevity we omit the others as they are trivial. The other Jacobian ∂f(x, u)/∂u consists of zero entries apart from the entry 1/Δt = 50 at indices (3, 1) and (9, 2).

Since the dynamics of our system is non-linear and high-dimensional, we have to employ an iterative local optimization approach. We employ the ILQG method due to its ability to include constraints on the commands. More details of the ILQG algorithm can be found in Appendix B.

The usual ILQG formulation is based on an analytically given (deterministic) cost function and a stochastic dynamic function. Here we use deterministic dynamics (with the idealized analytic model) and we propose a cost function that takes stochastic information into account:

c(x, u) = wp (x1 − θ*)² + wv x2² + we |u|² + wd ((u1 − x3)² + (u2 − x9)²) + wp σ²(u). (21)

All quantities in Equation (21) (also possibly the pre-factors) are time-dependent, but we have dropped the time indices for notational simplicity. As before wp governs the accuracy requirement. In addition, a stability term wv governs the importance of having zero velocity and we penalizes energy consumption at the level of the springs. The weighting factor wd penalizes changes in motor commands and therefore energy consumption at the level of the servomotors. The last term includes the learned uncertainty in our equilibrium positions, which is here also scaled by wp. This is justified because, for example, for a reaching task, the arm will finish with the servomotors in a position such that the arm's equilibrium position is the desired position θ*, and we have learned from data how much perturbation we can expect at any such configuration. The same holds true for slow tracking tasks, where the servos will be moved such that the equilibrium positions track the desired trajectory.

Fig. 5. Schematic diagram of our proposed combination of stochastic optimal control (SOC) and learning. The dynamic model used in SOC is acquired and constantly updated with data from the plant. The learning algorithm extracts the dynamics as well as the stochastic information contained (noise model from confidence intervals). SOC takes into account both measures in the optimization.
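To make the bookkeeping of Equations (13)-(18) explicit, the following sketch performs one discrete dynamics step: it recovers the filtered motor positions from the stored command history, evaluates the forward dynamics, Euler-integrates, and shifts the FIR states. The torque function, inertia and damping values are invented placeholders, and the assignment of the six listed filter taps to h2..h7 is an assumption about the indexing.

```python
import numpy as np

h_taps = np.array([0.0, 0.0, 0.0445, 0.2708, 0.3189, 0.3658])  # taps h2..h7 (assumption; h0 = h1 = 0)
dt = 1.0 / 50.0                                                 # 50 Hz control rate
I_z, D = 0.01, 0.05                                             # placeholder inertia and damping

def tau(u1_f, u2_f, theta):
    # Placeholder for the spring torque of Equation (3) / the identified model
    return 0.01 * (u2_f - u1_f) - 0.5 * theta

def step(x, u):
    """One step of Equations (14)-(18). x = (theta, theta_dot, u1 history (6), u2 history (6))."""
    theta, theta_dot = x[0], x[1]
    u1_hist, u2_hist = x[2:8], x[8:14]
    u1_f = h_taps @ u1_hist                # filtered motor position, Equation (14)
    u2_f = h_taps @ u2_hist                # Equation (15)
    theta_ddot = (tau(u1_f, u2_f, theta) - D * theta_dot) / I_z   # Equation (16)
    x_next = np.empty_like(x)
    x_next[0] = theta + dt * theta_dot     # Euler integration, Equations (17)-(18)
    x_next[1] = theta_dot + dt * theta_ddot
    x_next[2] = u[0]                       # newest command enters the history ...
    x_next[3:8] = u1_hist[:-1]             # ... and the FIR states are shifted
    x_next[8] = u[1]
    x_next[9:14] = u2_hist[:-1]
    return x_next

x = np.zeros(14)
for k in range(100):                       # hold an asymmetric command for 2 s
    x = step(x, np.array([60.0, 120.0]))
print(x[:2])                               # joint settles near its new equilibrium with ~zero velocity
```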

4. Results

In this section we present results from the optimal control model applied to the hardware described earlier in Section 2. We first highlight the adaptation capabilities of this framework experimentally and then show how the learned stochastic information leads to an improved control strategy over solutions obtained without stochastic information. More specifically, the new model achieves higher positional accuracy by varying the impedance of the arm through motor co-contraction. We study position holding, trajectory tracking and target reaching tasks.

4.1. Experiment 1: Adaptation towards a Systematic Change in the System

An advantage of the learned dynamic paradigm is that it allows us to account for systematic changes without prior knowledge of the shape or source of the perturbation. To demonstrate such an adaptation scenario we set up a systematic change in the hardware by replacing the left spring, between motor 1 and the joint (i.e. between points A and C in Figure 1), with one that has a lower, unknown spring constant. The aim is to hold a certain equilibrium position using the energy optimal position controller described in Section 3.2. As expected, the prediction about the equilibrium points (i.e. f̃(u)) does not match the real, changed system properties.

Next, we demonstrate how the system can adapt online and increase performance, trial by trial. We specified a target trajectory that is a linear interpolation of 200 steps between the start position θ0 = 30° and the target position θ* = −30°. We tracked this trajectory by recomputing the equilibrium positions, i.e. by minimizing Equation (12), at a rate of 50 Hz. At the same time we updated f̃(u) during reaching. Due to the nature of local learning algorithms, f̃ is only updated in the neighborhood of the current trajectory and therefore shows limited generalization. To account for this, after each trial, we additionally updated the model with 400 training data points, collected from a 20 × 20 grid of the motor range u1, u2 ∈ [0°, 180°].

Figure 6 depicts the outcome of this adaptation experiment. One can observe that the controller initially (lighter lines) fails to track the desired trajectory (red). However, there is significant improvement between each trial, especially between trials 1 to 5. After about nine trials the internal model has been updated and manages to track the desired trajectory well (up to the hardware's level of precision). The equilibrium position predictions in Figure 7 confirm that the systematic shift has been successfully learned, as shown by the asymmetric shape. Analyzing the motor commands (Figure 6, right) shows that the optimal controller, for all trials, chooses the motor commands with virtually no co-contraction. This is a sensible choice as co-contraction would contradict the minimum energy cost function that we have specified.
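The trial-by-trial procedure just described can be summarised as a short control-and-learning loop. The sketch below is schematic only: `ToyPlant` and `ToyModel` are hypothetical stand-ins for the hardware and the LWPR model, the command optimization mirrors Equation (12) but omits the variance term, and all numbers (gain, noise level, grid size) are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

class ToyPlant:
    """Invented stand-in for the hardware: the equilibrium angle depends on the
    command difference through a spring gain the controller does not know."""
    def __init__(self, gain=0.2, noise=0.2):
        self.gain, self.noise = gain, noise
    def measure(self, u):
        return self.gain * (u[1] - u[0]) + np.random.normal(0.0, self.noise)

class ToyModel:
    """Invented stand-in for LWPR: theta ~ w0 + w1*u1 + w2*u2, refit on all data so far."""
    def __init__(self):
        self.X, self.y, self.w = [], [], np.zeros(3)
    def update(self, u, theta):
        self.X.append([1.0, u[0], u[1]]); self.y.append(theta)
        self.w, *_ = np.linalg.lstsq(np.array(self.X), np.array(self.y), rcond=None)
    def predict(self, u):
        return self.w @ np.array([1.0, u[0], u[1]])

def command_for(model, theta_star, w_p=1000.0):
    """Energy-optimal command for one target (Equation (12) without the variance term)."""
    cost = lambda u: w_p * (model.predict(u) - theta_star) ** 2 + np.dot(u, u)
    return minimize(cost, x0=np.array([90.0, 90.0]),
                    bounds=[(0.0, 180.0)] * 2, method="L-BFGS-B").x

plant, model = ToyPlant(), ToyModel()
targets = np.linspace(30.0, -30.0, 200)          # desired trajectory, as in Experiment 1
grid = np.linspace(0.0, 180.0, 5)                # coarse command grid (the paper used 20 x 20)
for trial in range(5):
    errors = []
    for theta_star in targets:                   # track the trajectory, adapting online
        u = command_for(model, theta_star)
        theta = plant.measure(u)
        model.update(u, theta)
        errors.append(abs(theta - theta_star))
    for u1 in grid:                              # extra grid data to aid generalization
        for u2 in grid:
            model.update(np.array([u1, u2]), plant.measure(np.array([u1, u2])))
    print(f"trial {trial}: mean tracking error {np.mean(errors):.2f} deg")
```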

4.2. The Role of Stochastic Information for Impedance Control

Because co-contraction and energy consumption are opposing properties, our controller will hardly make use of the redundant degree of freedom in the actuation. Even though minimum energy optimal control in an antagonistic system seems to be unable to co-contract, it remains our favorite choice of performance index as it also implies compliant movement and as it follows the biological motivation. If we consider the stochastic information that would arise from a task involving random perturbations, we can see that the produced stochasticity holds valuable information about the stability of the system.7 If the uncertainty can be reduced by co-contracting, this will be reflected in the data, i.e. in the LWPR confidence bounds. Therefore, given that we wish to achieve high task accuracy, the controller should co-contract whenever it can reduce the expected noise/stochasticity in the system (weighted with the accuracy requirement).

Suppose our system experiences some form of small random perturbations during control. In the hardware we realize such a scenario by adding a perturbation motor at the end of the arm, which mimics, for example, a drilling tool (panel (a) in Figure 3). The perturbation on the arm is produced by quickly alternating the servomotor position every 200 ms between −40° and 40°. The inertia of the additional weight then produces deflections of the arm from the current equilibrium position. With these perturbations, we collected new training data and updated the existing LWPR model f̃. The collected data reveals that the arm stabilizes in regions with higher co-contraction, where the stiffness is higher. This behavior is illustrated in Figure 8, which shows motion traces around θ = 0° due to the perturbation motor for different co-contraction levels. This information is contained in the learned confidence bounds (Figure 9) and, therefore, the optimal controller effectively tries to find the tradeoff between accuracy and energy consumption.


Fig. 6. Visualization of the adaptation process. Left: Desired (red) and observed arm positions. Right: Motor commands for the corresponding trials. Darker and thinner lines indicate later stages of learning.

4.3. Experiment 2: Impedance Control for Varying Accuracy Requirements

Based on the learned LWPR model f̃ from the previous section, we can demonstrate the improved control behavior of the stochastic optimization with emerging impedance control. We formulate a task to hold the arm at the fixed positions θ* = 15° and θ* = 0°, respectively. While minimizing for the cost function in Equation (12), we continuously and slowly increased the position penalty within the range wp = [10⁻², 10⁵]. The left column in Figure 10 summarizes the results we discuss next: from wp = 10⁻² to approximately wp = 10⁰ the optimization neglects position accuracy and minimizes mainly for energy, i.e. u1 = u2 = 0. The actual joint positions, because of the perturbations, oscillate around the mean θ = 0° as indicated by the shaded area. Between wp = 10⁰ and wp = 10² the position constraint starts to catch up with the energy constraint; a shift in the mean position towards θ* can be observed. At about wp = 5 · 10¹, the variance in the positions increases as the periodic perturbation seems to coincide with the resonance frequency of the system. For wp > 10² the stochastic information is weighted sufficiently such that the optimal solution increases the co-contraction and the accuracy improves further.

In contrast, if we run the same experiment while ignoring the stochastic part of the cost function, i.e. we minimize for the deterministic cost function J = wp (f̃(u) − θ*)² + |u|² only, we can see (Figure 11) that the system does, as expected, not co-contract and hardly improves performance accuracy.

4.4. Experiment 3: ILQG-Reaching Task with a Stochastic Cost Function

For certain tasks, such as quick target reaching or faster tracking of trajectories, the system dynamics based on equilibrium points θ = f(u) may not be sufficient, as it contains no information about the velocities and accelerations of the system. Next, we assume a full forward dynamic description of our system as identified in Equation (13), where the state consists of the joint angle, joint velocity, and twelve motor states.

The task is to start at position θ0 = 0 and reach towards the target θ* = 0.3 rad (= 17.18°). The reaching movement duration is fixed at 2 s, which corresponds to T = 100 discretized time steps at the hardware's operation rate of 50 Hz. This task can be formalized based on the cost function (21) by setting the weighting terms as follows: the time-dependent position penalty is a monotonically increasing linear interpolation of 100 steps, i.e. wp[t] = [0.1, 0.2, …, 10]. The penalty for zero endpoint velocity was set to wv[t] = 0 for 0 < t < 80 and wv[t] = 1 for t ≥ 80. The energy penalties are assumed constant, we = wd = 1, during the whole movement.

By using ILQG, we then compute an optimal control sequence ū with the corresponding desired trajectory x̄ and a feedback control law L. Figure 12 depicts the reaching performance of the ILQG trajectory, applied in open-loop mode and in closed-loop mode (i.e. using feedback law L), where the robot has been perturbed by a manual push. The closed-loop scheme successfully corrects the perturbation and reaches the target, while the open-loop controller oscillates and fails to reach the target. This experiment highlights the benefits of closed-loop optimization, which can, by incorporating the full dynamic description of the system, account for such perturbations. However, the ability to correct perturbations is limited by the hardware control bandwidth (i.e. slow servomotor dynamics and the 50 Hz control board frequency). If the system also suffers from feedback or motor delays, the correction ability is limited and, for example, accounting for vibrations or noise8 is difficult to achieve using feedback signals only. For such stochastic perturbations, impedance control can improve performance as it changes the mechanical properties of the system in a feed-forward manner, i.e. it reduces the effects of the perturbations in the first place.

To realize such a scenario, we defined a tracking task that starts at the zero position then moves away and back again along a sinusoidal curve for 2.5 s. The cost function parameters for this task are defined as follows: the time-dependent position penalty is wp[t] = [50, 100, …, 4,000] for 0 < t < 80 and wp[t] = 4,000 for t ≥ 80. The endpoint velocity term is wv[t] = 0 for 0 < t < 80 and wv[t] = 10 for t ≥ 80. The energy penalties are held constant, i.e. we = wd = 1.
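Written out as arrays over the discrete time steps, the weight schedules described for the two tasks look roughly as follows. This is a small sketch: the ramp shapes and switch point are taken from the text, while the use of a single 100-step horizon for both tasks is a simplifying assumption.

```python
import numpy as np

T = 100                                    # 2 s at the 50 Hz operation rate
t = np.arange(T)

# Reaching task: ramped position weight, terminal velocity weight (pre-factors of Equation (21))
w_p_reach = np.linspace(0.1, 10.0, T)      # wp[t] = [0.1, 0.2, ..., 10]
w_v_reach = np.where(t < 80, 0.0, 1.0)     # wv[t] = 0 for t < 80, then 1
w_e = w_d = 1.0                            # constant energy penalties

# Tracking task: steeper ramp that saturates at 4000, larger terminal velocity weight
w_p_track = np.concatenate([np.linspace(50.0, 4000.0, 80), np.full(T - 80, 4000.0)])
w_v_track = np.where(t < 80, 0.0, 10.0)
```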

Fig. 7. Learned position models during the adaptation process (after 0, 1, 3 and 9 iterations). The white numbers represent the equilibrium points.

As before, we observe the benefits of using stochastic information for optimization compared to a deterministic optimization (not using the LWPR confidence bounds). After computing the optimal control using ILQG we ran the optimal feedback control law (see Appendix B) consecutively 20 times in each condition, i.e. with and without stochastic optimization. Note that the perturbation motor is switched on at all times. Figure 13 summarizes the results: as expected, the stochastic information in the cost function induces a co-activation for the reaching task, which shows generally better performance in terms of reduced variability of the trajectories. Evaluating the movement variability where the accuracy weight is maximal, i.e. for t > 80, the standard deviation of the trajectories is significantly lower with σ_stoch = 0.55 for the stochastic optimization compared to the deterministic optimization with σ_det = 1.38.

A detailed look at the bottom right plot in Figure 13 reveals a minor shift in the recorded trajectory compared to the planned one from the analytic model. We attribute this error to imprecisions in the hardware, i.e. tiny asymmetries, which are not included in the analytic model. In the case of higher co-contraction, small manufacturing errors and an increased joint friction lead to deviations from the idealized analytic model predictions. Indeed the learned dynamic model can account for these asymmetries, as can be seen in Figure 9 (left): along the equilibrium position θ = 0, the line u1 = u2 is slightly skewed.

5. Conclusion and Outlook

In this paper we have presented a stochastic optimal control model for antagonistically actuated systems. We proposed to learn the dynamics as well as the stochastic information of the controlled system from sensorimotor feedback of the plant. This control architecture can account for a systematic change in the system properties (Experiment 1) and, furthermore, is able, by incorporating the heteroscedastic prediction variances into the optimization, to compensate for stochastic perturbations that were induced in the plant. Doing so, our control model demonstrated significantly better accuracy performance than the deterministic optimization in both energy-optimal equilibrium point control (Experiment 2) and energy-optimal reaching using dynamic optimization (Experiment 3). The improved behavior was achieved by co-activating antagonistic motors, i.e. by using the redundant degree of freedom in the system based on the first principles of optimality. The presented results demonstrate that this is a viable optimal control strategy for real hardware systems that exhibit hard-to-model system properties (e.g. asymmetries, systematic changes) as well as stochastic characteristics (e.g. using a power tool) that may be unknown a priori.

Fig. 8. Motion traces of our SEA hardware around θ = 0°. The perturbation motor causes different deflections depending on the co-contraction levels: (a) u1 = u2 = 0°, (b) u1 = u2 = 45°, (c) u1 = u2 = 120°.

Fig. 9. Left: Learned equilibrium position as a function of the motor positions (in degrees), with contour lines spaced at 5° intervals. Right: Stochastic information given by the heteroscedastic confidence intervals of LWPR.

Fig. 10. Experiment with increasing position penalty wp for two targets. Left plot column: θ* = 15°; right plot column: θ* = 0°. The plots show the desired vs. measured position and corresponding motor commands as a function of the accuracy requirement (pre-factor wp). The shaded area results from the perturbation motor.

An advantage of the presented control architecture is that motor co-activation (or impedance) does not need to be specified explicitly as a control variable but emerges from the actual learned stochasticity within the system (scaled with the specified accuracy requirements of the task). Therefore, co-activation (i.e. higher impedance), since it is energetically expensive, will only be applied if it actually is beneficial for the accuracy of the task.

Exploiting stochasticity in wider domains The methodology we suggest for optimal exploitation of sensorimotor stochasticity through learning is a generic principle that goes beyond applications to impedance modulation of antagonistic systems and can be generalized to deal with any kind of control or state dependent uncertainties. For example, if we wish to control a robot arm that suffers from poor repeatability in certain joint angles or in a particular range of velocities, this would be visible in the noise landscape (given one has learned state dependent stochastic dynamics) and consequently those regions would be avoided by the optimal controller. In this context, the source of the stochasticity is irrelevant for the learner and therefore it could arise from internal (i.e. noise in the motor) as well as external (i.e. power tool) sources. However, the stochastic system properties must, to a certain degree, be stationary in time so that the learner can acquire enough information about the noise landscape.

Fig. 11. The same experiment as in Figure 10 where the stochastic information was not incorporated into the optimization.

Fig. 12. ILQG-reaching task without stochastic information used in open-loop and closed-loop control. We perturbed the arm by hitting it once after 0.4 s (left plot) and 1.2 s (right plot), respectively. The dashed lines in the right plot represent unperturbed trials (open loop and closed loop).

Biological relevance As mentioned in the introduction, biological systems are often used as a benchmark for the control of artificial systems. In this work not only the antagonistic hardware but also the actual control architecture is motivated by biological principles. Optimality approaches have been a very fruitful line of research (Todorov 2004; Scott 2004; Shadmehr and Krakauer 2008) and their combination with a learning paradigm (Mitrovic et al. 2008) is biologically justified a priori, since the sensorimotor system can be seen as the product of an optimization process (i.e. evolution, development, learning, adaptation) that constantly learns to improve its behavioral performance (Li 2006). Indeed, internal models play a key role in efficient human motor control (Davidson and Wolpert 2005) and it has been suggested that the motor system forms an internal forward dynamic model to compensate for delays, uncertainty of sensory feedback, and environmental changes in a predictive fashion (Wolpert et al. 1995; Kawato 1999; Shadmehr and Wise 2005). Notably, a learned optimal tradeoff between energy consumption, accuracy and impedance has been repeatedly observed in human impedance control studies (Burdet et al. 2001; Franklin et al. 2008). More specifically, the amount of impedance modulation in

humans seems to be governed by some measure of uncertainty, which could arise from internal (e.g. motor noise) or external (e.g. tools) sources (Selen et al. 2009).

In the computational model presented here, these uncertainties are represented by the heteroscedastic confidence bounds of LWPR and integrated into the optimization process via the performance index (i.e. cost function). Such an assumption is biologically plausible, since humans have the ability to learn not only the dynamics but also the stochastic characteristics of tasks, in order to optimally learn the control of a complex task (Chhabra and Jacobs 2006; Selen et al. 2009).

Fig. 13. Twenty trials of ILQG for a tracking task of 2.5 s. Left column: Deterministic optimization does not exhibit co-contraction. Right column: Twenty trials of ILQG using stochastic information in the cost function. The system co-contracts as the accuracy requirements increase.

Hardware Limitations and Scalability This work represents an initial attempt to modulate the impedance of a real antagonistic system in a principled fashion. The proposed SEA has been primarily designed to perform as a proof of concept of our control method on a real system. Specifically, we can identify several limitations of our system that need further investigation in the future.

First, the stiffness range of the system is fairly low as spring non-linearities are achieved by the geometric effect of changing the moment arms. There are other, mechanically sophisticated, SEA designs with large stiffness ranges (e.g. Grebenstein and van der Smagt 2008; van Ham et al. 2009), which could also serve as attractive implementation platforms for our algorithm. Specifically the MACCEPA design (van Ham et al. 2007) is very appealing as it is technically simple and offers a large stiffness range; however, parallels to biologically realistic implementations are less obvious in this design, as the system is not antagonistically actuated. The fact that we were able to obtain a significant increase in co-contraction from the learned stochastic information, even for hardware with a very low stiffness range, is promising, indicating good resolution capabilities of the localized variance measure in LWPR.

Second, the relatively slow control loop (50 Hz) causes controllability issues (i.e. slow feedback) and, furthermore, turned out to be sensitive to numerical integration errors within ILQG. While these numerical issues did not cause problems in the analytic dynamic formulation (Experiment 3), they turned out to be critical when we ran ILQG using the full learned forward dynamics f̃(x, u). Under these conditions, for most of the time ILQG does not converge to a reasonable solution. A potential route of improvement could be a combination of LWPR learning with an analytic model. Instead of ignoring valuable knowledge about the system given in analytic form, one could focus on learning an error model only, i.e. aspects of the dynamics that are not described by the analytic model.

Third, the transfer of optimal controls from simulation to the real hardware has proven to be very challenging. Currently we are computing ILQG solutions for a fixed time horizon and later applying them to the SEA. For slower movements this approach produces satisfying accuracy. In Experiment 3 we enforced slower and smoother movements by formulating an appropriate time-dependent cost function. However, for movements with higher frequency the situation is more difficult: errors accumulate on the hardware over the course of the trajectory, since the feedback loop for corrections is very slow. This leads to solutions that differ significantly from the preplanned optimal solution. A potential route to resolve this problem is to use a model predictive control approach in which the optimal solutions are re-computed during control with current states of the plant as initial states. However, this approach requires computationally efficient re-computations of the optimal control law, which may be hard to obtain, especially for systems with higher dimensionality.

Finally, our experiments were carried out on a low-dimensional system with a single joint and two motors. Implementations on systems with higher dimensionality, however, are still very challenging as the construction of antagonistic robots is non-trivial and the availability of systems with many degrees of freedom is very limited. Due to the curse of dimensionality, high-dimensional systems impose serious computational challenges on both optimal control methods and machine learning techniques. While some of these issues have been addressed in previous work (Todorov et al. 2005; Mitrovic et al. 2008, 2010), we believe that the study of impedance control based on stochastic sensorimotor feedback is a promising route of research for both robotic and biological systems.

Funding

This work was partially supported by the EU FP7 Project STIFF and by a Royal Academy of Engineering and Microsoft Research Fellowship to SV.

Notes

1. This means we put expectation brackets around the integrals and h(·) in (7).

  2. For the infinite-horizon case, the matrix is constant.

  3. Note that we have ignored any motor dynamics.

  4. Stability here refers to the desired equilibrium position.

5. Heteroscedastic noise has different variances across the state and action space. For example, the variance of the noise can scale with the magnitude of the control signal u, which is also called signal-dependent noise.

  6. Alternatively, assume the feedback loop is so slow that it is practically unusable.

7. For our SEA this optimization can be performed in real time, i.e. at least 50 times per second, which corresponds to the maximum control frequency of our system (50 Hz).

8. Or any other high-frequency perturbation.

References

Bellman R (1957) Dynamic Programming. Princeton, NJ: Princeton University Press.
Bertsekas DP (1995) Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific.
Bobrow J, Dubowsky S and Gibson J (1985) Time-optimal control of robotic manipulators along specified paths. The International Journal of Robotics Research 4(3): 3–17.
Burdet E, Osu R, Franklin DW, Milner TE and Kawato M (2001) The central nervous system stabilizes unstable dynamics by learning optimal impedance. Nature 414: 446–449.
Chhabra M and Jacobs RA (2006) Near-optimal human adaptive control across different noise environments. Journal of Neuroscience 26: 10883–10887.
Collins S and Ruina A (2005) A bipedal walking robot with efficient human-like gait. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 1983–1988.
Collins SH and Kuo AD (2010) Recycling energy to restore impaired ankle function during human walking. PLoS ONE 5: e9307.
Cortes J, Martinez S, Ostrowski J and McIsaac KA (2001) Optimal gaits for dynamic robotic locomotion. The International Journal of Robotics Research 20: 707–728.
Davidson PR and Wolpert DM (2005) Widespread access to predictive models in the motor system: a short review. Journal of Neural Engineering 2: 313–319.
Dyer P and McReynolds S (1970) The Computational Theory of Optimal Control. New York: Academic Press.
Faisal AA, Selen LPJ and Wolpert DM (2008) Noise in the nervous system. Nature Reviews Neuroscience 9: 292–303.
Flash T and Hogan N (1985) The coordination of arm movements: an experimentally confirmed mathematical model. Journal of Neuroscience 5: 1688–1703.
Franklin D, Burdet E, Tee KP, Osu R, Chew CM, Milner TE, et al. (2008) CNS learns stable, accurate, and efficient movements using a simple algorithm. The Journal of Neuroscience 28: 11165–11173.
Grebenstein M and van der Smagt P (2008) Antagonism for a highly anthropomorphic hand–arm system. Advanced Robotics 22: 39–55.
Hogan N (1984) Adaptive control of mechanical impedance by coactivation of antagonist muscles. IEEE Transactions on Automatic Control 29: 681–690.
Hurst JW, Chestnutt J and Rizzi A (2004) An Actuator with Mechanically Adjustable Series Compliance. Technical Report CMU-RI-TR-04-24, Robotics Institute, Carnegie Mellon University.
Jacobson DH and Mayne DQ (1970) Differential Dynamic Programming. New York: Elsevier.
Kawato M (1999) Internal models for motor control and trajectory planning. Current Opinion in Neurobiology 9: 718–727.
Laffranchi M, Tsagarakis NG and Caldwell DG (2010) A variable physical damping actuator (VPDA) for compliant robotic joints. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
Li W (2006) Optimal Control for Biological Movement Systems. PhD dissertation, University of California, San Diego, CA.
Li W and Todorov E (2007) Iterative linearization methods for approximately optimal control and estimation of non-linear stochastic system. International Journal of Control 80: 1439–1453.
Migliore SA, Brown EA and DeWeerth SP (2005) Biologically inspired joint stiffness control. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA), pp. 4508–4513.
Mitrovic D, Klanke S and Vijayakumar S (2008) Adaptive optimal control for redundantly actuated arms. In Proceedings of the 10th International Conference on Simulation of Adaptive Behaviour (SAB), Osaka, Japan.
Mitrovic D, Klanke S and Vijayakumar S (2010) Optimal feedback control for anthropomorphic manipulators. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
Nakamura Y and Hanafusa H (1987) Optimal redundancy control of robot manipulators. The International Journal of Robotics Research 6: 32–42.
Paluska D and Herr H (2006) The effect of series elasticity on actuator power and work output: Implications for robotic and prosthetic joint design. Robotics and Autonomous Systems 54: 667–673.
Pratt GA and Williamson MM (1995) Series elastic actuators. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 399–406.
Scott SH (2004) Optimal feedback control and the neural basis of volitional motor control. Nature Reviews Neuroscience 5: 532–546.
Selen LP, Franklin DW and Wolpert DM (2009) Impedance control reduces instability that arises from motor noise. Journal of Neuroscience 40(9): 12606–12616.
Shadmehr R and Krakauer JW (2008) A computational neuroanatomy for motor control. Experimental Brain Research 185: 359–381.
Shadmehr R and Wise S (2005) The Computational Neurobiology of Reaching and Pointing. Cambridge, MA: The MIT Press.
Stengel RF (1994) Optimal Control and Estimation. New York: Dover Publications.
Todorov E (2004) Optimality principles in sensorimotor control. Nature Neuroscience 7: 907–915.
Todorov E (2006) Optimal control theory. In Doya K (ed.), Bayesian Brain: Probabilistic Approaches to Neural Coding. Cambridge, MA: MIT Press, pp. 269–298.
Todorov E and Li W (2005) A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the American Control Conference.
Todorov E, Li W and Pan X (2005) From task parameters to motor synergies: A hierarchical framework for approximately-optimal control of redundant manipulators. Journal of Robotic Systems 22: 691–710.
Tonietti G, Schiavi R and Bicchi A (2005) Design and control of a variable stiffness actuator for safe and fast physical human/robot interaction. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 526–531.
Uno Y, Kawato M and Suzuki R (1989) Formation and control of optimal trajectories in human multijoint arm movements: minimum torque-change model. Biological Cybernetics 61: 89–101.
van Ham R, Sugar T, Vanderborght B, Hollander K and Lefeber D (2009) Compliant actuator designs. IEEE Robotics and Automation Magazine 16(3): 81–94.
van Ham R, Vanderborght B, Van Damme M, Verrelst B and Lefeber D (2007) MACCEPA, the mechanically adjustable compliance and controllable equilibrium position actuator: Design and implementation in a biped robot. Robotics and Autonomous Systems 55: 761–768.
Vanderborght B, Van Ham R, Lefeber D, Sugar TG and Hollander KW (2009) Comparison of mechanical design and energy consumption of adaptable, passive-compliant actuators. The International Journal of Robotics Research 28: 90–103.
Vijayakumar S, D'Souza A and Schaal S (2005) Incremental online learning in high dimensions. Neural Computation 17: 2602–2634.
Wolf S and Hirzinger G (2008) A new variable stiffness design: Matching requirements of the next robot generation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
Wolpert DM, Ghahramani Z and Jordan MI (1995) An internal model for sensorimotor integration. Science 269: 1880–1882.
Zinn M, Khatib O, Roth B and Salisbury JK (2004) Playing it safe. IEEE Robotics and Automation Magazine 11: 12–21.

Appendix A: Learning with LWPR

In order to learn the plant dynamics, various supervised learning algorithms could be applied. Here we focus on local learning methods, which represent a function using small, simple patches, e.g. first-order polynomials. The size of each local region is determined by gating activation kernels, and the positions and number of the local kernels are adapted during learning to represent the non-linear function. Because the input data activates only local patches, local learning algorithms are robust against global negative interference. This preserves the flexibility of the learned model under systematic changes in the dynamic properties of the arm (e.g. load, material wear). Furthermore, real-time robot control demands certain properties of a learning algorithm, namely fast learning rates and high prediction speeds at run-time when the model is trained incrementally. LWPR has been shown to exhibit these properties and to be very efficient for incremental learning of non-linear models (Vijayakumar et al. 2005).

In LWPR, the regression function is constructed by blending local linear models, each of which is endowed with a locality kernel that defines the area of its validity (also termed its receptive field). During training, the parameters of the local models (locality and fit) are updated using incremental Partial Least Squares, and models can be pruned or added on an as-needed basis, for example, when training data is generated in previously unexplored regions. Usually the receptive fields of LWPR are modeled by Gaussian kernels, so their activation or response to a query vector z (here the inputs are the two motor commands u) is given by

w_k(z) = \exp\left( -\tfrac{1}{2} (z - c_k)^T D_k (z - c_k) \right),   (22)

where c_k is the center of the kth linear model and D_k is its distance metric. Treating each output dimension separately for notational convenience, the regression function can be written as

f(z) = \frac{1}{W} \sum_{k=1}^{K} w_k(z)\, \psi_k(z),  \qquad  W = \sum_{k=1}^{K} w_k(z),   (23)

\psi_k(z) = b_k^0 + b_k^T (z - c_k),   (24)

where b_k^0 and b_k denote the offset and slope of the kth model, respectively.
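To make the blending in Equations (22)–(24) concrete, here is a minimal numerical sketch of how a prediction is formed from a handful of receptive fields. The centers, distance metrics and local regression parameters below are invented for illustration; in LWPR they would be learned incrementally, and this is not the LWPR library itself.

```python
import numpy as np

# Minimal sketch of the LWPR prediction equations (22)-(24): Gaussian
# receptive-field activations blend K local linear models. The parameters
# below are made up for illustration only.

centers = [np.array([0.2, 0.1]), np.array([0.6, 0.5])]       # c_k
metrics = [np.eye(2) * 20.0, np.eye(2) * 10.0]                # D_k
offsets = [0.3, 0.8]                                          # b_k^0
slopes  = [np.array([1.0, -0.5]), np.array([0.2, 0.4])]       # b_k

def activation(z, c, D):
    d = z - c
    return np.exp(-0.5 * d @ D @ d)                            # Eq. (22)

def predict(z):
    w = np.array([activation(z, c, D) for c, D in zip(centers, metrics)])
    psi = np.array([b0 + b @ (z - c)                           # Eq. (24)
                    for b0, b, c in zip(offsets, slopes, centers)])
    W = w.sum()
    return (w @ psi) / W                                       # Eq. (23)

print(predict(np.array([0.4, 0.3])))
```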

LWPR learning has the desirable property that it can be carried out online, and moreover, the learned model can be adapted to changes in the dynamics in real time. Furthermore, the statistical parameters of LWPR regression models provide access to the confidence intervals, here termed confidence bounds, of new prediction inputs (Vijayakumar et al. 2005). In LWPR the predictive variances are assumed to evolve as an additive combination of the variances within a local model and the variances independent of the local model. The predictive variance estimate \sigma^2_{pred,k} of the kth local model can be computed by analogy with ordinary linear regression. Similarly, one can formulate the global variance \sigma^2 across models. By analogy with Equation (23), LWPR then combines both variances additively to form the confidence bounds given by

\sigma^2_{pred} = \frac{1}{W^2} \left( \sum_{k=1}^{K} w_k(z)\, \sigma^2 + \sum_{k=1}^{K} w_k(z)^2\, \sigma^2_{pred,k} \right).   (25)

The local nature of LWPR leads to the intuitive requirement that only receptive fields that actively contribute to the prediction (e.g. large linear regions) are involved in the actual confidence bounds calculation. Large confidence bound values typically evolve if the training data contains much noise or other sources of variability, such as changing output distributions. Furthermore, regions with sparse or no training data, i.e. unexplored regions, show large confidence bounds compared with densely trained regions. Figure 14 depicts the learning concepts of LWPR graphically for a learned model with one input and one output dimension. The noisy training data was drawn from an example function that becomes more linear and more noisy for larger z values. Furthermore, in the range z = [5..6] no data was sampled for training, to show the effects of sparse data on LWPR learning.

Fig. 14. Typical regression function (blue continuous line) using LWPR. The dots indicate a representative training data set. The receptive fields are shown as ellipses drawn at the bottom of the plot. The shaded region represents the confidence bounds around the prediction function. The confidence bounds grow between z = [5..6] (no training data) and generally towards larger z values (noise grows with larger values).

Appendix B: The ILQG Algorithm

The ILQG algorithm starts with a time-discretized initial guess of an optimal control sequence and then iteratively improves it with respect to the cost function. From the initial control sequence \bar{u}^i at the ith iteration, the corresponding state sequence \bar{x}^i is retrieved using the deterministic forward dynamics f with a standard Euler integration \bar{x}[k+1] = \bar{x}[k] + \Delta t\, f(\bar{x}[k], \bar{u}[k]). Next, the discretized dynamics (Equation (5)) are linearly approximated as

\delta x[k+1] = \left( I + \Delta t\, \frac{\partial f}{\partial x}\Big|_{\bar{x}[k]} \right) \delta x[k] + \Delta t\, \frac{\partial f}{\partial u}\Big|_{\bar{u}[k]}\, \delta u[k].   (26)

Similarly, one can derive a quadratic approximation of the cost function around \bar{x}^i[k] and \bar{u}^i[k]:

cost[k] = q[k] + \delta x[k]^T \mathbf{q}[k] + \tfrac{1}{2}\, \delta x[k]^T Q[k]\, \delta x[k] + \delta u[k]^T r[k] + \tfrac{1}{2}\, \delta u[k]^T R[k]\, \delta u[k] + \delta u[k]^T P[k]\, \delta x[k],   (27)

where

q[k] = \Delta t\, v[k], \quad \mathbf{q}[k] = \Delta t\, \frac{\partial v[k]}{\partial x}\Big|_{\bar{x}[k]}, \quad Q[k] = \Delta t\, \frac{\partial^2 v[k]}{\partial x\, \partial x}\Big|_{\bar{x}[k],\bar{u}[k]},
P[k] = \Delta t\, \frac{\partial^2 v[k]}{\partial u\, \partial x}\Big|_{\bar{x}[k],\bar{u}[k]}, \quad r[k] = \Delta t\, \frac{\partial v[k]}{\partial u}\Big|_{\bar{u}[k]}, \quad R[k] = \Delta t\, \frac{\partial^2 v[k]}{\partial u\, \partial u}\Big|_{\bar{u}[k]}.   (28)

Both approximations are formulated as deviations \delta x^i[k] = x^i[k] - \bar{x}^i[k] and \delta u^i[k] = u^i[k] - \bar{u}^i[k] from the current optimal trajectory and therefore form a local LQG problem. This linear quadratic problem can be solved efficiently via a modified Riccati-like set of equations that yields an affine control law \delta u[k](\delta x) = l[k] + L[k]\, \delta x[k]. This control law has a special form: since it is defined in terms of deviations from a nominal trajectory and since it needs to be implemented iteratively, it consists of an open-loop component l[k] and a feedback component L[k]\, \delta x[k]. The actual optimization in ILQG supports constraints on the control variable u, such as lower and upper bounds. After the optimal control signal correction \delta\bar{u}^i has been obtained, it can be used to improve the current optimal control sequence for the next iteration using \bar{u}^{i+1}[k] = \bar{u}^i[k] + \delta\bar{u}^i[k]. Finally, \bar{u}^{i+1}[k] is applied to the system dynamics (Equation (5)) and the new total cost along the trajectory is computed. The algorithm stops once the cost v cannot be decreased significantly anymore. After convergence, ILQG returns an optimal control sequence \bar{u} and a corresponding state sequence \bar{x} (i.e. the trajectory). Along with the open-loop parameters \bar{x} and \bar{u}, ILQG produces a feedback matrix L which may serve as optimal feedback gains for correcting local deviations from the desired trajectory of the plant (Figure 15). The control law for each time step k is defined as

u[k]_{plant} = \bar{u}[k] + \delta u[k],   (29)
\delta u[k] = L[k]\, (x[k] - \bar{x}[k]),   (30)

where x[k] represents the real plant position and \bar{x}[k] the desired position at time k.

Fig. 15. The optimal feedback control scheme using ILQG. (Blocks in the diagram: cost function (incl. target), dynamics model, ILQG, feedback controller L, and plant (SEA); the feedback correction \delta u is added to the open-loop command \bar{u} before being applied to the plant.)
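As a small illustration of how the ILQG output is used at run time, the sketch below applies the control law of Equations (29) and (30): the stored open-loop command is corrected by the feedback gains acting on the deviation of the measured state from the planned trajectory. The `read_state` and `apply_command` callbacks are hypothetical hardware-interface placeholders, not functions from the paper.

```python
import numpy as np

# Sketch of the run-time control law in Equations (29)-(30): the precomputed
# open-loop sequence u_bar and feedback gains L correct deviations of the
# measured state from the planned trajectory x_bar. read_state/apply_command
# are hypothetical hardware-interface placeholders.

def run_trajectory(u_bar, x_bar, L, read_state, apply_command):
    """u_bar: (N, m) open-loop commands, x_bar: (N, n) planned states,
    L: (N, m, n) feedback gain matrices returned by ILQG."""
    for k in range(len(u_bar)):
        x = read_state()                      # measured plant state x[k]
        du = L[k] @ (x - x_bar[k])            # Eq. (30): feedback correction
        apply_command(u_bar[k] + du)          # Eq. (29): total command to the SEA
```

At 50 Hz this amounts to one matrix-vector product per control step, which is why the feedback path remains cheap even when the ILQG solve itself is expensive.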
