
The Development of a Robot-Assisted Joint Attention Training System for Children with Autism


Autism is a neurodevelopmental disorder that affects how information is processed in the brain. It affects approximately 1 in 68 children in the United States and often leads to struggles with the development of social skills. Autism therapy is most effective at early ages, but is also very expensive. Joint attention, which is the ability of multiple people to share their focus on the same object, is a social skill that children with autism often have difficulty learning. Previous research suggests that children with autism generally prefer to interact with robots rather than humans. In this study, the Nao humanoid robot and the EyeTribe eye tracker were used to develop a joint attention training system. The EyeTribe was used to determine whether the direction of an individual’s gaze was a proper response to Nao’s gestures. It was found that the EyeTribe was capable of determining one’s point of gaze with deviation on the order of millimeters. The development and implementation of such a system could be a viable and relatively low-cost method of providing autism therapy.


Autism spectrum disorder is a term used to describe a range of neurodevelopmental disorders that affect 1 in 68 children in the United States [1]. Autism alters how information is processed in the brain and interferes with the development of social and language skills. The difficulty of communication usually causes children with autism and their families to become stressed and frustrated. The impairment in social skills also makes it hard for children with autism to interact with and learn from those around them, stunting their development.

Autism not only places a lot of stress on the families affected, but is also very costly to them and to their health insurance providers, who are required by law in several states to cover some treatments, such as speech therapy. Various forms of autism therapy aim to stimulate the growth of social skills at an early age, but they are usually infeasible options for many families. The average cost of providing autism therapy or treatment for a child can exceed $60,000 per year [2] and reach $1.4-$2.4 million over a child’s lifetime [3].

Recent studies have suggested that children with autism particularly enjoy interacting with robots as compared to humans [4]. This could be because robots are simple in nature and capable of precise physical performance. Furthermore, small humanoid or toy-like robots are generally not intimidating to children. Such studies also explored the use of robots to train various aspects of social interaction, such as imitation skills [4]. Therefore, robots and other technologies could possibly be used as an alternative method of providing autism therapy [5]. A robotic system could be used to train fundamental social skills such as joint attention, where a child is required to share focus on an object with another being and to communicate through various gestures, such as gazing or pointing.

To develop such a system, Nao, a fully programmable robot from Aldebaran Robotics, was used. Nao has several human-like characteristics and was interfaced with an eye tracking device known as the EyeTribe to train joint attention skills.

The development and use of this system could help a child with autism develop his or her social skills at an early age and help him or her grow as a typically developing child. This system could make autism therapy more accessible to families, relieve some of their stress, and help those affected grow and develop at the pace of their peers.


Developing the Joint Attention Training Program

Nao is a widely used humanoid robot that has 25 degrees of freedom, stands 58 cm tall, and resembles a child. It also has a semi-realistic voice with text-to-speech functions. The EyeTribe eye tracker is a newly developed device that can detect a user’s gaze angle in real time. Both Nao and the EyeTribe have software development kits that support the C# programming language, and Microsoft Visual Studio was used to program the system.

A sequence of movements was linked together to greet the user by making Nao wave his hand and say hello. Since the system is aimed at young children, Nao then introduces his program as a game, which would theoretically improve participation and cooperation. Nao was then programmed to position his body in various pointing positions while instructing the user to look where he pointed.

In this system, Nao can point either arm upwards, downwards, and outwards for a total of six possible positions, i.e., upper left, left, lower left, upper right, right, and lower right.

After each of the methods for determining Nao’s posture was created, the program as depicted by the flowchart in Figure 1 was developed.


Figure 1. Chart depicting program flow and how it operates


A random number from zero to five was generated, with each number associated with one of the six possible positions. After greeting the user, Nao would point in the selected direction for eight seconds. If the user looked in the correct direction within that time, Nao would give positive feedback and say “Good Job!” If the allotted time passed without the user responding to the prompt, Nao would provide some encouragement and say “Let’s try again.” The program also records, in a text file, the number of successful and failed responses, the user’s gaze coordinates, and the time it took to recognize each response (reaction speed).

Nao would then return to a neutral standing position before repeating the process until reaching a user specified limit.
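The prompt-and-feedback loop described above can be sketched as follows. This is only an illustration in Python, not the C# program used in the study: the callables `point`, `say`, `rest`, `read_gaze`, and `is_hit` are hypothetical stand-ins for the Nao and EyeTribe SDK calls, which are not shown in this article, and logging to a text file is left out.

```python
import random
import time
from dataclasses import dataclass
from typing import Optional, Tuple

# The six pointing positions named in the text, indexed 0-5.
POSITIONS = ["upper left", "left", "lower left",
             "upper right", "right", "lower right"]
TIMEOUT_S = 8.0  # Nao holds each pointing pose for eight seconds


@dataclass
class TrialResult:
    position: str
    success: bool
    reaction_ms: Optional[float]          # None when the prompt timed out
    gaze: Optional[Tuple[float, float]]   # last gaze sample seen, if any


def run_trial(point, say, read_gaze, is_hit,
              rng=random, clock=time.monotonic, timeout_s=TIMEOUT_S):
    """Run one prompt. All callables are hypothetical robot/tracker hooks."""
    position = POSITIONS[rng.randint(0, 5)]  # random number from zero to five
    point(position)
    start = clock()
    gaze = None
    while clock() - start < timeout_s:
        gaze = read_gaze()
        if gaze is not None and is_hit(position, gaze):
            reaction_ms = (clock() - start) * 1000.0
            say("Good Job!")
            return TrialResult(position, True, reaction_ms, gaze)
    say("Let's try again.")
    return TrialResult(position, False, None, gaze)


def run_session(n_trials, point, say, rest, read_gaze, is_hit, **kwargs):
    """Greet the user, then repeat trials up to a user-specified limit."""
    say("Hello!")
    results = []
    for _ in range(n_trials):
        results.append(run_trial(point, say, read_gaze, is_hit, **kwargs))
        rest()  # return to a neutral standing position between prompts
    return results
```

Passing the clock and random number generator as parameters is a design choice made here so the loop can be exercised without a robot or an eight-second wait.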

Determining the EyeTribe’s Accuracy

After the system was developed, another program was created to test whether using the EyeTribe to determine gaze location is viable. In this program, a target was drawn at a random location on a computer screen with a pixel density of 93.34 pixels per inch, which the user viewed from 3 feet away. The user would look at the center of the target and then click a button. The program would then store the coordinates of the target location and compare them to the coordinates from the EyeTribe text file.

Five sample sets of data were taken, each consisting of ten randomly drawn targets. The difference in pixels between the actual coordinates and the gaze coordinates was calculated for each set. The pixel difference was converted to a distance based on the monitor’s pixel density. The angular accuracy was calculated as the arctangent of the x and y components of the difference between the gaze and actual coordinates.
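The conversion from a pixel difference to a distance and an angle might look like the sketch below. The 93.34 pixels per inch and the 3-foot viewing distance come from the text; the formula itself is an assumption, since the article does not spell it out. Here the angular error is taken as the arctangent of the on-screen offset divided by the viewing distance, using the combined (Euclidean) x and y offset, whereas the study may have treated the two components separately.

```python
import math

PIXELS_PER_INCH = 93.34                      # screen pixel density from the text
MM_PER_INCH = 25.4
VIEWING_DISTANCE_MM = 3 * 12 * MM_PER_INCH   # 3 feet = 914.4 mm


def pixel_offset_to_mm(dx_px, dy_px, ppi=PIXELS_PER_INCH):
    """Convert an (x, y) pixel difference into an on-screen distance in mm."""
    return math.hypot(dx_px, dy_px) / ppi * MM_PER_INCH


def angular_error_deg(offset_mm, distance_mm=VIEWING_DISTANCE_MM):
    """Assumed formula: visual angle subtended by the offset at the viewer."""
    return math.degrees(math.atan2(offset_mm, distance_mm))


if __name__ == "__main__":
    # Hypothetical sample: gaze landed 30 px right of and 40 px below the target.
    offset = pixel_offset_to_mm(30, 40)
    print(f"offset: {offset:.2f} mm, angle: {angular_error_deg(offset):.2f} deg")
```

Under this assumed formula, an offset of a given size always maps to a smaller angle as the viewing distance grows, which is why the viewing distance has to be fixed for the angular figures to be comparable.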

Testing the System

To test the accuracy of the system, six volunteers were recruited. The program was set to give ten prompts, and the system was run twice with each volunteer. In the first (normal) run, the volunteers were instructed to properly follow the prompts given by Nao. In the second (abnormal) run, the volunteers were instructed to deliberately look away from where Nao pointed, to test for false positives. The volunteers were observed manually, and the number of times the system malfunctioned was recorded for each run.


In the EyeTribe accuracy test, the average angular accuracy was 0.70° and the average discrepancy between the gaze point and the EyeTribe’s coordinates was 12.30 mm. Individual results for each dataset can be found in Table S1.

User          Normal Correct   Normal Incorrect   Avg. Reaction Speed (ms)   Abnormal Correct   Abnormal Incorrect
User 1        10               0                  1826.7                     9                  1
User 2 (G)    9                1                  1485.1                     9                  1
User 3        9                1                  2648.9                     10                 0
User 4 (G)    6                4                  1694.5                     10                 0
User 5        7                3                  3200.3                     9                  1
User 6 (G)    5                5                  1555.8                     10                 0
Average       76.67%           23.33%             2068.6                     95.00%             5.00%

Table 1. Results of the system test. “(G)” marks volunteers who were wearing glasses.


In the normal test, the system responded properly to the user 76.67% of the time. Each user’s reaction speed was recorded during the normal test; the average across all volunteers was 2068.6 ms.

In the abnormal test, the system did not report a false positive 95% of the time.
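The averages reported above can be reproduced from the per-user counts in Table 1. The short sketch below does only that arithmetic; the variable and function names are chosen here for illustration, and "abnormal correct" follows the table's usage, meaning a run prompt on which no false positive was reported.

```python
# Per-user values copied from Table 1 (Users 1-6, ten prompts per run).
NORMAL_CORRECT = [10, 9, 9, 6, 7, 5]
ABNORMAL_CORRECT = [9, 9, 10, 10, 9, 10]
REACTION_MS = [1826.7, 1485.1, 2648.9, 1694.5, 3200.3, 1555.8]
PROMPTS_PER_RUN = 10


def percent_correct(counts, prompts_per_run=PROMPTS_PER_RUN):
    """Share of prompts handled correctly across all users, in percent."""
    return 100.0 * sum(counts) / (len(counts) * prompts_per_run)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


if __name__ == "__main__":
    print(f"normal run correct:   {percent_correct(NORMAL_CORRECT):.2f}%")
    print(f"abnormal run correct: {percent_correct(ABNORMAL_CORRECT):.2f}%")
    print(f"mean reaction speed:  {mean(REACTION_MS):.2f} ms")
```

The normal-run figure corresponds to 46 correct responses out of 60 prompts, and the abnormal-run figure to 57 out of 60.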


Nao, the humanoid robot, and the EyeTribe were successfully interfaced to create a program to train joint attention skills. The results of the EyeTribe accuracy test show that the device can determine the location of one’s gaze with little error: the average discrepancy was 12.30 mm, and the average angular accuracy was 0.70°, which falls within the 0.5°-1.0° range claimed on the EyeTribe product website [6]. This means that the EyeTribe’s error was approximately the width of a fingertip.

The system was generally able to respond properly to a user’s gaze, but occasionally failed to register a target hit even though the user was looking in the correct direction. One possible cause is that some volunteers were wearing glasses. The EyeTribe uses infrared illumination to track gaze, and the product website states that it can have issues with bifocals or special coatings, such as polarized lenses [6]. Users 4 and 6, who both wore glasses during the test, had the lowest success rates (6 and 5 correct responses out of 10), although User 2 also wore glasses and had 9 correct responses.

The system rarely reported false positives during the abnormal tests. False positives may have been caused by users extending their gaze too far beyond the EyeTribe’s tracking range, causing it to return sporadic coordinates.

The reaction speeds were recorded for each volunteer and appear to be reasonable response times for the prompts. The recorded times may be slightly slower or faster than the true values because of small amounts of latency that could occur while the program is running.

The accuracy of the EyeTribe, in combination with the program developed, suggests that this system could possibly be used to help train joint attention in children with autism. The system needs to be improved to reduce the possibility of falsely giving positive feedback when a child does not properly respond to a prompt, while also ensuring that correct responses are always identified. The system also needs to be used with children with autism to reveal whether or not it is capable of improving joint attention skills.

If the system does help improve joint attention skills in these children, the question arises as to whether those improvements will translate from a robotic, experimental setup to a real-world setting.


Autism is a disorder that is very detrimental to the development of children and creates immense amounts of frustration and stress within families. Autism treatment and therapies are very expensive and the overall cost of care for individuals with autism is massive because they often need special supervision/attention throughout their entire lifetime.

Since other studies suggest children with autism generally show an interest in robots, the use of technology and robots to provide autism therapy could possibly be a way to address this issue.

This study introduced the preliminary stages of what could become a full-fledged robotic system for providing autism therapy. The system could be improved by making Nao respond to prompts and instructions given by the user, which would emulate a more realistic social environment.

In addition, different difficulties of the system could be designed by creating more possible directions for Nao to point in whilst making the range of the EyeTribe coordinates that trigger a target hit stricter. These modifications might cause the system to be more effective in training joint attention skills.
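One way to picture such difficulty levels is sketched below. The article does not describe how a target hit is decided, so everything here is an assumption made for illustration: each pointing direction is given a rectangular region of gaze coordinates, and a difficulty setting shrinks that region around its center. The `Region` type, the `DIFFICULTY_SCALE` values, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular hit region in gaze coordinates."""
    cx: float
    cy: float
    half_width: float
    half_height: float

    def scaled(self, factor):
        """Return a region with the same center and scaled extents."""
        return Region(self.cx, self.cy,
                      self.half_width * factor, self.half_height * factor)

    def contains(self, x, y):
        return (abs(x - self.cx) <= self.half_width
                and abs(y - self.cy) <= self.half_height)


# Assumed scale factors: a smaller factor means a stricter hit range.
DIFFICULTY_SCALE = {"easy": 1.0, "medium": 0.75, "hard": 0.5}


def is_hit(region, gaze, difficulty="easy"):
    """True if the gaze point falls inside the difficulty-adjusted region."""
    x, y = gaze
    return region.scaled(DIFFICULTY_SCALE[difficulty]).contains(x, y)


if __name__ == "__main__":
    upper_left = Region(cx=100.0, cy=100.0, half_width=40.0, half_height=40.0)
    for level in DIFFICULTY_SCALE:
        print(level, is_hit(upper_left, (135.0, 100.0), level))
```

In this sketch the same gaze sample counts as a hit on the easiest setting and as a miss once the region shrinks, which is the trade-off the stricter range would introduce.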

Since this program would likely be used by autism treatment specialists, a clear and concise graphical user interface should also be created for the program to make it easy to use and manipulate by people who might not have much experience with computers.

Nao has several more capabilities including speech recognition. It might be possible to expand upon this system by training speech, perhaps by attempting to get the user to engage in conversation with Nao.

If given more time, the system would be tested with several individuals to further ensure that it will perform similarly with any user. This would increase the validity and viability of actually using this system to train joint attention skills in children with autism.


Robotics and Autonomous Systems Laboratory

Dr. Nilanjan Sarkar

Zhi Zheng

School for Science and Math at Vanderbilt

Dr. Mary Loveless

Dr. Chris Vanags




Table S1. EyeTribe Accuracy Test



  1. “Autism Spectrum Disorders,” Internet: [Sep. 2014].
  2. “What is Autism? Facts and Figures,” Internet: [Sep. 2014].
  3. A. Buescher, Z. Cidav, M. Knapp, D. Mandell, “Costs of Autism Spectrum Disorders in the United Kingdom and the United States,” Internet: Aug. 2014 [Sep. 2014].
  4. B. Robins, K. Dautenhahn, R. Te Boekhorst, A. Billard, “Robotic assistants in therapy and education of children with autism: Can a small humanoid robot help encourage social interaction skills?” Universal Access in the Information Society, vol. 4, iss. 2, pp. 105-120, Jul. 2005, [Sep. 2014].
  5. M. Goodwin, “Enhancing and Accelerating the Pace of Autism Research and Treatment,” Focus on Autism and Other Developmental Disabilities, vol. 23, iss. 2, pp. 125-128, Jun. 2008, [Sep. 2014].
  6. “The Eye Tribe Tracker,” Internet: [Feb. 2015].

Posted on Wednesday, April 29, 2015 in May 2015.