O1 : Document Navigability: A Need for Print-Impaired
Tata Consultancy Services
O2 : Real time Indian sign language recognition using Manual and Non Manual features
Real time Indian sign language recognition using Manual and Non Manual features
Sign language, like other widely used languages, has its own linguistics and is thus a language in its own right. Today we have various APIs and proven systems to translate between spoken languages, but despite these advances we are yet to have a mature system that converts sign language to English and vice versa. Much research is under way in the area of sign language recognition, most of it pertaining to American Sign Language (ASL), each effort with its own approach. A robust recognition system could be leveraged in classrooms, banks, television broadcasting, and conferences. The chief challenges for such systems are accuracy and the large vocabulary of sign language. This work surveys existing approaches and challenges and presents our methodology and planned improvements in the vast domain of sign language recognition, targeting Indian Sign Language (ISL) in particular. At the sub-unit level, signs are described using phonemes; unlike in speech, where phonemes combine sequentially, in signing they are combined in parallel. Signs consist of manual features, which include hand gestures, and non-manual features such as lip shape, facial expressions, and body posture. Our experiments incorporate manual features together with non-manual feature data consisting of facial expression and eyebrow movement to generate signs. A combination of three machine learning models working simultaneously detects words, fingerspelling, and facial features. Normalization, a dataset of more than 4,000 video samples from 5 individuals, Y-axis flipping to account for signs made with either hand, and next-word prediction help improve real-time performance across 30 gestures and 37 character classes. Accuracy of 98.4% was observed in a lab environment. Work is ongoing on additional facial-expression factors and a wider range of datasets.
An implemented and well-tested model would offer immense value to accessibility research and to businesses seeking such tools for their workforce.
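The Y-axis flipping described above is a simple keypoint-level augmentation. A minimal numpy sketch, assuming pose/hand keypoints normalized to [0, 1] (the function name and layout here are illustrative, not the authors' implementation):

```python
import numpy as np

def mirror_keypoints(keypoints: np.ndarray) -> np.ndarray:
    """Mirror keypoints about the vertical (Y) axis.

    `keypoints` is assumed to be an (N, 2) array of (x, y) coordinates
    normalized to [0, 1]; mapping x -> 1 - x turns a right-handed sign
    sample into its left-handed counterpart (and vice versa).
    """
    mirrored = keypoints.copy()
    mirrored[:, 0] = 1.0 - mirrored[:, 0]
    return mirrored

# Doubling a training set with mirrored copies:
sample = np.array([[0.2, 0.5], [0.8, 0.1]])
augmented = np.stack([sample, mirror_keypoints(sample)])
```

Applied over whole video clips, this lets one set of recorded signers cover both left- and right-hand-dominant signing without extra data collection.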
Microsoft Research, India
O3 : Disability Rights Advocacy on Social Media: Gendered Perspectives from India
Disability Rights Advocacy on Social Media: Gendered Perspectives from India
As social media gains global impetus, People with Disabilities (PwDs) are taking to participatory platforms such as Twitter and YouTube to advocate for a range of disability-related rights. Emergent scholarship from the West outlines how such platforms enable uniquely empowering and marginalizing conditions for PwDs, shaping in turn new methods (Pearson & Trevisan, 2015; Trevisan, 2017; Mann, 2018), cultures (Sweet et al., 2020; Andrews et al., 2019; Richter, 2019), and socio-politics (Egner, 2022; Ellis et al., 2015; Burke & Crow, 2016) of disability-related advocacy. On the other hand, little is known about this phenomenon in India, where access to Information and Communication Technologies is severely limited among PwDs (Khetarpal, 2014). However, social media platforms have witnessed rapid growth in the last decade (Statista, 2021a, 2021b). Indian PwDs make up a small but vocal portion of this user base. Among them are disabled activists, influencers, public figures, and ordinary users who vocalize their opinions on inclusion and accessibility, engage in online community organization, and assert their identities as a form of reclamation in a digital public sphere.
We aim to shed light on major focus areas and experiences of disability-related social media advocacy by Indian PwDs. We center the voices of Indian Women with Disabilities (WwDs), who contend with dual marginalization in a society marked by structural ableism and patriarchy (Thomas & Thomas, 2002; Ghai, 2002). As a preliminary step in this IRB-approved study, we attempt to provide a broad overview of themes, methods, and tools of online disability-rights advocacy, the engagement of PwDs with networks of disabled users and the broader digital publics, and the myriad effects of positioning the disabled self as hypervisible in an able-majoritied Indian cyberspace. We further delineate gender differentials in advocacy-related online experiences of Indian PwDs.
We adopt a mixed-methods approach (Snelson, 2016), engaging in the following techniques: (a) Quantitative analyses: We analyze Twitter data of Indian PwDs using standard natural language processing (NLP) tools (Farzindar et al., 2015) including topic modeling, sentiment classification, and emotion detection. Further, we analyze survey data of users with disabilities across platforms. (b) Qualitative analyses: We draw insights from semi-structured interviews via inductive open-coding. We supplement our findings with content analyses of social media posts published by study participants. This methodological approach offers us a holistic view of experiential and cultural nuances, allowing us to provide a much-needed overview of the motivations, needs, and challenges faced by Indian PwDs on social media.
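As a toy illustration of the per-post labeling step such NLP tools perform before aggregation, here is a minimal lexicon-based sentiment scorer; the word lists are hypothetical, standing in for a real sentiment lexicon or trained classifier:

```python
# Hypothetical lexicons; real pipelines use validated lexicons or trained models.
POSITIVE = {"accessible", "inclusive", "support", "win", "proud"}
NEGATIVE = {"inaccessible", "discrimination", "barrier", "denied"}

def sentiment(post: str) -> str:
    """Label a post positive/negative/neutral by counting lexicon hits."""
    tokens = post.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Per-post labels like these can then be aggregated by user group (e.g., by gender or by advocacy topic) to surface the kinds of differentials the study probes.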
Our work sheds light on three major dimensions. First, we expound on how social media is redefining potentials for disabled agency within a contemporary Indian context, and its long-term impacts on a broader Indian disability advocacy framework. Second, we probe gender differences in online advocacy-related behavior. Preliminary insights from Twitter data show us that male PwDs are more likely to be vocal about structural discrimination than WwDs. However, female Paralympians are more likely to advocate for inclusion rights than their male counterparts - even dissenting against a norm of positive content publication more frequently. Qualitative insights will help us achieve a more in-depth understanding of such phenomena. Third, that this transformation is both platform-mediated and situated in a patriarchal Indian context poses unique safety and accessibility challenges. The voices of PwDs will reveal how social media platforms can better serve disability advocacy, organization, and mobilization needs in India.
Indian Institute of Technology Delhi
O4 : Indoor Navigation and Accessibility - A Retrofit Perspective in Low-income Settings
Indoor Navigation and Accessibility - A Retrofit Perspective in Low-income Settings
Indoor navigation and accessibility are fundamental requirements for independent living but remain challenging due to limited information support. Access to the right information at the right time is critical: finding a consultant's room in a hospital, the exit of a shopping mall, or the right gate out of a metro station is challenging for people with visual impairments. This is further amplified by the lack of accessible signage, poor infrastructure, missing accessibility information, and few mediums to communicate this information. In short, information disability is common in low-resource settings and can have serious consequences; in a healthcare facility, for instance, people who cannot access healthcare information often end up with sub-optimal treatment or even a complete lack of treatment. Our motivation is the requirement to access the right information at the right time. To achieve this, we need a robust information sourcing, validation, and communication ecosystem that enhances spatial awareness and information accessibility for people with visual disabilities. With increasing indoor complexity, a number of services and stakeholders demand more inclusive information to improve access to these spaces. The first mega challenge obstructing indoor accessibility is “last meter navigation” in various contexts, especially challenging for people with physical and visual disabilities. The second mega challenge is “seamless access to required information”, common in urban public spaces like hospitals, shopping centers, and airports. In these situations, easy access can considerably elevate the user experience; information about a nearby washroom or water point, for example, can be very helpful for inclusive accessibility. In our earlier surveys and studies, we found that the built environment provides enough cues to support wayfinding and accessibility.
Technology can capture, highlight, or redefine these cues more inclusively and transparently for people with disabilities. Hence, we developed two baseline solutions: (i) an easy and scalable mechanism to annotate inclusive information related to urban public spaces, and (ii) an accessible interface that provides seamless access to the annotated information, supporting users in reaching their destination and accessing the associated services independently. We believe additional choices of medium, language, and interface preferences can enhance the usability experience.
Deepashree Joshi B
Sri Ramachandra Institute of Higher Education and Research
Indian Institute of Technology Madras
O5 : Validation of a tablet based hearing screening device to screen children below 6 years of age in rural communities – Preliminary outcomes
Validation of a tablet based hearing screening device to screen children below 6 years of age in rural communities – Preliminary outcomes
Approximately 34 million children around the world suffer from disabling hearing loss (WHO, 2021). As per data available from the Indian Speech and Hearing Association (ISHA), there are only about 2,500 registered audiologists and speech pathologists serving a population of 1.28 billion (Varshney et al., 2021). Services are therefore largely unavailable or limited. Use of mobile/tablet-based hearing screening in primary health care initiatives is likely to improve access to early identification of disabling hearing loss in underserved regions (Yousuf Hussein et al., 2019). But existing mobile/tablet-based hearing apps pose challenges in the LMIC context: high cost and the non-availability of validated, context-based tests for young children (birth to 6 years).
Therefore, a low-cost mobile/tablet-based hearing screening device was developed as part of a larger project of the SRESHT lab at SRIHER, funded by the DBT/Wellcome Trust India Alliance. The device is intended for use by trained grassroots-level workers in rural districts to screen children at the community level.
To validate the newly developed hearing screening device against the gold standard for children below 6 years of age.
The hearing screening module was developed using a single board computer. Calibrated commercial bluetooth headphones (Sony WHCH 510) and speakers (JBL Go pro 2) were used as transducers. The hearing screening module consists of a checklist of high-risk factors (HRR) for hearing loss, a parental questionnaire (PQ), and age-appropriate hearing screening tests: Behavioral Observation based screening (BOA) (0–1 year); Speech Awareness Task-based screening (SAT) (>1–3 years); and Speech Recognition Task-based screening (SRT) (>3–6 years).
Forty children from birth to 6 years of age were screened: 15 (11 Normal Hearing (NH) & 4 Hearing Impaired (HI)) from birth to 1 year; 10 (7 NH & 3 HI) from 1 to 3 years; and 15 (5 NH & 10 HI) from 3 to 6 years.
Non-tonal auditory stimuli including animal sounds, noise makers, and environmental sounds were developed for BOA screening at specific frequencies (500Hz, 1KHz, 2KHz, and 4KHz). Animal sounds and spondee words were developed for SAT screening on children between 1 to 3 years. For SRT screening, 18 spondee words from a pre-validated list in Tamil based on picture-ability were selected (centered at 1KHz).
To identify the suitable auditory stimuli for screening children, the device with these stimuli underwent a beta validation. The responses of children to non-tonal stimuli were compared with the standardised warble tones. The suitable stimuli for each test were identified.
The samples were subdivided into three groups (0-1 year; 1-3 years; and 3-6 years) based on the screening test used for each age group. 244 children were screened from 0 to 6 years of age: 86 (74 NH & 12 HI) from 0 to 1 year; 58 (40 NH & 18 HI) from 1 to 3 years; and 100 (47 NH & 53 HI) from 3 to 6 years.
Stimuli were presented at an intensity of 60dB HL (500Hz, 1KHz, 2KHz and 4KHz) via bluetooth speakers at 0 degree azimuth behind the child, at a distance of 1 foot from the child's ear level, with the child seated on the mother's lap. The behavioral responses were observed and noted by an audiologist (other than the investigator). The pass criterion was responses observed at 3/4 frequencies. The same child then underwent objective gold standard auditory brainstem response (ABR) screening.
The finalised stimuli from beta validation at 500Hz, 1KHz, 2KHz and 4KHz were presented at 60dB HL via bluetooth headphones to each child. The behavioral responses were observed and noted by an audiologist. The pass criterion was responses observed at 3/4 frequencies in each ear. The same child then underwent objective gold standard automated auditory brainstem response (AABR) screening.
Finalized spondee words were presented at 60dB HL through headphones to each child. A standard closed-set picture recognition task was used as the response mode: each child was instructed to tap the picture on the tablet screen when the corresponding word was heard through the headphones. Scores were automatically calculated from the child's responses in the four-choice picture paradigm on the screening device. The pass criterion was 5/6 correct responses for each ear separately. Each child was then tested using standard/conditioning audiometry (500Hz to 4KHz) through headphones.
Results obtained from both devices for each age group were compared and analysed. The accuracy of each module (BOA/SAT/SRT) was validated against gold standard tests.
The Friedman test showed no significant difference (P = 0.001) between standardised warble tones and newly developed non-tonal auditory stimuli. Noise makers were found to be appropriate for BOA (0 to 1 year); animal sounds were appropriate for SAT screening (1 to 3 years) and bisyllabic picturable words were appropriate for SRT screening (3 to 6 years).
The sensitivity and specificity of BOA-based screening (0 to 1 year) were 76% and 79% respectively; of SAT (1 to 3 years), 76% and 69%; and of SRT (3 to 6 years), 95% and 88%.
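These metrics follow the standard definitions against the gold-standard outcome. A small sketch, with hypothetical confusion-matrix counts chosen only to reproduce the SRT figures (not the study's raw data):

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    where positives are ears the gold standard marks as hearing impaired."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts: 19 of 20 impaired ears flagged, 44 of 50 normal ears passed.
sens, spec = sensitivity_specificity(tp=19, fn=1, tn=44, fp=6)  # -> 0.95, 0.88
```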
The non-tonal auditory stimuli developed for the tablet-based hearing screening device were found to be accurate to screen hearing among children below 6 years of age.
The rolling data analysis suggests that the validity of SRT is the highest, identifying losses even up to 50 dB HL; SAT and BOA are comparable to the HRR checklist and PQ, so for the younger age groups a checklist-based screening may be adequate.
But since the current precision is 15%, we intend to increase the sample size towards a precision of 12% and reevaluate the validity data to confirm these trends.
O6 : Hexis-Antara: A Refreshable Braille Reader and Content Management System to Improve Literacy
Hexis-Antara: A Refreshable Braille Reader and Content Management System to Improve Literacy
Even a general study of the literacy capacities of visually impaired students produces alarming results. Most fail to achieve the reading-competency standard set for their age and schooling level by the various governmental educational boards; most are unable to read even words properly at the middle-school level. This has serious and highly debilitating consequences for their further education. An investigation into the causes of this predicament points in three directions. First, many parents, concerned because many schools for the blind are residential, keep their wards at home during the early years, delaying their formal education. Many such students join straightaway at the first standard or even later, losing crucial years of early schooling. This causes immediate underperformance and eventually a cumulative lag in their learning. Second is the lack of proper competency and commitment on the part of teachers. The particular difficulties faced by a visually impaired student in acquiring early literacy remain to be explored and communicated to teachers. Further, the lack of academic prospects induces a lacklustre commitment on the part of teachers to invest properly in the student's education. The biggest challenge, however, is the low availability of and inadequate access to Braille books. The high cost of printing and logistical issues skew the distribution of Braille books; most schools have limited copies of textbooks and even fewer non-academic books. The bulky size of such texts further restricts portability, decreasing usage. A lack of Braille books in the Indian vernaculars further alienates children, most of whom have only a passing acquaintance with English or none at all. A swift remedy on these three fronts is required: proper pedagogical and technological interventions need to be planned, combined with a strong social support base.
On the technology side, Vision Empower and Vembi Technologies have come up with a refreshable braille reader named Hexis (a reference to the six-dot braille sign system) and a corresponding online platform for content management called Antara. Each Hexis device has a unique three-digit serial number, using which it can be registered on the Antara platform under a particular teacher account and then monitored. Content is uploaded to the platform under the same teacher account and is automatically converted to Braille. The content can be books, school notes, assessment worksheets, etc., and can be downloaded to the device over a wireless internet connection. Salient features of the Hexis-Antara solution are:
1. Hexis is a battery-powered standalone product offering 5-6 days of usage without interim charging at a rate of 4-5 hours of daily reading.
2. Hexis is designed specifically for the school-going child to read any content in any Indian language.
3. Teachers/caregivers using the Antara platform can create virtual classrooms and disseminate accessible content seamlessly to their children's devices.
4. Teachers/caregivers can track their child’s performance using analytics provided by the Antara Platform.
5. Hexis-Antara is the most affordable solution in the Global South priced at USD 300.
Creation of a technology, however, does not by itself solve the problem; a good solution demands deft use of that technology in appropriately planned interventions. We plan to proceed in two steps: first, a proper assessment and representation of the children's current literacy abilities using appropriate indices; second, interventions to improve upon those indices, demonstrating the success or viability of particular educational strategies. The literacy ability of children will thus be mapped into several levels defined by increasing competencies, and the children will be tested to determine the level into which each should be sorted. Once such a sorting has been performed, we will have a clear picture of where the children stand. These levels are as follows –
- Level I – can identify braille letters.
- Level II – can identify and pronounce braille phonics and syllables.
- Level III – can read short words properly.
- Level IV – can read short sentences fluently without having to pronounce each syllable separately.
- Level V – can read simple rhyming couplets word by word.
- Level VI – can read short passages/paragraphs or stories clearly.
These levels range from early pre-literacy to basic functional literacy. Upward mobility through them as a result of effective educational interventions is aimed at equipping a child with foundational literacy capacities. The range represents the basic building steps of these capacities: the ability to read letters, followed by phonics and syllables, then words, short sentences, rhyming couplets, and finally short passages. Rhyming couplets have been included to develop in children a sensitivity towards the cadences of sound, a sensitivity which may prove crucial and rewarding for individuals who otherwise do not have access to visual stimuli.
The Hexis device will be used for these purposes. Once the students have been evaluated and appropriately graded, texts corresponding to the same six primary levels will be used as practice drills to foster the improvement in the child’s reading skills. After the primary literacy skills have been developed or strengthened, the students will be provided general creative content to read. The development of personal reading habits can prove extremely beneficial not just for future academic endeavors but also in the personal progress of a young adult. It is hoped that the Hexis device will address this need.
Divya Prabha J
Indian Institute of Technology
O7 : Design development and manufacture of Junior Braille Playing cards
O8 : Towards Optimizing OCR for Accessibility
O9 : Experimental validation of Motor Imagery-based Brain-Computer Interface (MI-BCI) on movement-impaired patients
Experimental validation of Motor Imagery-based Brain-Computer Interface (MI-BCI) on movement-impaired patients
Objective: Motor impairment is a disability that causes weakness or inability of movement, arising from muscular dysfunction or motor neuron diseases (MND) like Amyotrophic lateral sclerosis (ALS). MND affects the patient's hands, legs and facial muscles, causing loss of balance, speech impairment and movement impairment. A Brain-Computer Interface (BCI) system provides such patients with an alternate means of communication and control through a non-muscular pathway: it captures and converts the patient's brain neuronal activity into commands to control external software, such as speech generation, or hardware, such as a robotic wheelchair. In motor imagery-based BCIs (MI-BCIs) in particular, the imagination of movements causes changes in the brain's neuronal rhythms, which are accessed and decoded to interpret the user's intention. In this regard, we collected Electroencephalography (EEG) signals from three motor-impaired patients during right-hand and left-hand movement imagination tasks and analysed their performance for BCI classification. For ease of implementation and structural integrity, we used a reduced 8-electrode configuration with a dry headset. Finally, we compared our results with those of healthy subjects to assess the disabled patients' performance for practical BCI usage.
Experimental study: The study encompasses two stages: a) MI-EEG data acquisition from patients and b) Performance analysis of the acquired data using benchmarked classifiers to check for BCI capability.
a) MI-EEG data acquisition from patients:
EEG acquisition headset: In the first stage, EEG captures the brain patterns of MND patients while they imagine left-hand and right-hand movements. Eight dry electrodes at positions C3, Cz, C4, CPz, FC5, FC6, CP5 and CP6 of the OPENBCI headset and sensing board were used for real-time acquisition and sampling of the EEG signals. Patients' details: We worked with three movement-restricted patients: two males and one female. The first patient is a 72-year-old right-handed male who is hemiparetic, with his right side completely disabled. The second patient is a 52-year-old right-handed male affected by quadriparesis, which weakens all four limbs (both legs and arms). The third is a 30-year-old right-handed female diagnosed with a cerebral vascular accident (CVA), commonly referred to as a stroke, resulting from the interruption of blood flow to brain cells.
MI paradigm: We focussed on the Motor Imagery (MI) paradigm for BCI as it is independent of external stimuli and quite intuitive for movement. Instructions are displayed to the patients, and based on the given cue the patients imagine left- and right-hand movements. Each trial begins with a fixation cross to enable focus, followed by the cue, after which the patient imagines the movement for 5 sec. The imagination of movement results in an Event-Related Desynchronisation (ERD), a decrease in power in the mu (8-14 Hz) and beta (13-30 Hz) rhythms in specific cortical areas; for example, the left parietal region for right-hand imagination and vice versa, as our brain is cross-wired. This ERD phenomenon occurring in specific cortical areas is the key to identifying the user's imagined activity.
b) Performance analysis:
The EEG data are then analysed to check the discriminability between the two tasks and thus the feasibility of a Brain-Computer Interface (BCI) system. The relevant frequency information for MI lies in the 8-30 Hz range, which includes the SMR and beta rhythms; hence the data were filtered in this range before feature extraction. We evaluated the performance of all the patients' motor imagery experiments with seven baseline classification pipelines, comprising various Riemannian and Common Spatial Pattern (CSP) based features with KNN, SVM and LDA classifiers.
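A numpy-only sketch of one such pipeline: CSP spatial filtering followed by log-variance features (a nearest-mean rule on these features can stand in for the LDA/SVM/KNN classifiers named above). The whitening-based CSP below is one common formulation, not necessarily the exact implementation used in the study:

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_filters=2):
    """Common Spatial Patterns via whitening + eigendecomposition.

    trials_a / trials_b: lists of (channels, samples) EEG trials, one list
    per imagery class (e.g. left vs right hand), assumed already band-pass
    filtered to 8-30 Hz."""
    cov = lambda x: x @ x.T / x.shape[1]
    c_a = np.mean([cov(t) for t in trials_a], axis=0)
    c_b = np.mean([cov(t) for t in trials_b], axis=0)
    d, u = np.linalg.eigh(c_a + c_b)          # whiten the composite covariance
    p = np.diag(d ** -0.5) @ u.T
    d2, v = np.linalg.eigh(p @ c_a @ p.T)     # diagonalise whitened class-A covariance
    w = (v.T @ p)[np.argsort(d2)]             # spatial filters sorted by eigenvalue
    # keep the filters at both extremes: most discriminative for each class
    return np.vstack([w[:n_filters // 2], w[-(n_filters // 2):]])

def logvar_features(w, trial):
    """Log-variance of the spatially filtered signal: the classic MI feature."""
    return np.log(np.var(w @ trial, axis=1))
```

On trials where one class raises variance in one cortical region and the other class in another (the ERD pattern described above), the extreme CSP filters project each trial onto directions where this variance contrast is maximal, making the two imagery classes separable with even a very simple classifier.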
Main Results: The analysis results using seven different classifier pipelines show that motor-impaired patients can produce distinct MI-EEG patterns that can be decoded for controlling a BCI system. Although the patients suffered from neuronal diseases, they were able to provide satisfactory results during MI-BCI performance compared to the healthy subjects.
Conclusion: Research shows that the reduced 8-channel dry electrode MI-BCI system could be effectively used as a communication and control paradigm for building assistive technologies in the case of movement-impaired patients. Also, our testing on disabled patients in addition to healthy subjects avoids the implementation bias as most BCIs are designed to substitute the central nervous system functionality.
Significance: Compared with other BCI works, our proposed reduced channel dry data acquisition setup lowers cost, imparts convenience and results in a compact design. Further, our tested MI-BCI paradigm on both disabled and healthy subjects can be incorporated into assistive technologies such as BCI-controlled wheelchairs and robotic arms for the motor impaired and elderly.
Indian Institute of Science Bangalore
O10 : Enabling and Supporting Persons with Severe Speech and Motor Impairment through Inclusive, Safe, and Gaze Controlled Human-Robot Interaction
Enabling and Supporting Persons with Severe Speech and Motor Impairment through Inclusive, Safe, and Gaze Controlled Human-Robot Interaction
A significant proportion of the world's population lives with some form of disability. Persons with disabilities have a range of functional and cognitive capabilities depending on their underlying medical condition. A congenital disorder, damage to the brain or spinal cord caused by an accident, or a neurodevelopmental disease can lead to involuntary muscle contraction referred to as spasticity. Persons with severe speech and motor impairment (SSMI) belong to an isolated part of the disability spectrum: the complexity of their physical conditions renders them unable to interact naturally with their environments, even for activities of daily living (ADL). Advances in robotics and technology have provided opportunities to design systems to enable and support persons with disabilities in everyday activities, communication, education, and fun. This research work proposes an inclusive and safe eye-gaze-controlled human-robot interaction (HRI) system for persons with SSMI, evaluated through various user trials performed at the Vidyasagar school for special children (formerly the Spastic Society of India), Chennai. A user-centric design approach was taken, iterating over possible solutions based on physical interaction with the end users and interviews with special educators, trainers, and parents of persons with SSMI. The initial user studies with the proposed eye-gaze-controlled robotic arm, on tasks like picking and placing objects and reaching a randomly assigned target, show comparable performance between persons with SSMI and able-bodied persons. These results motivated the research aim of an affordable and safe HRI system to be used by persons with SSMI for everyday activities like painting, drawing, grabbing and moving objects, and engaging with playful and educational activities, toys, and games like their able-bodied counterparts.
The subsequent user trials with a webcam-based eye-gaze-controlled robotic arm and the proposed collision avoidance algorithms ensured safe and smooth interactions for pick-and-place and reachability tasks. The results showed improved performance of persons with SSMI over subsequent sessions while removing the need for costly commercial off-the-shelf eye-trackers. The proposed safety algorithms, tested in simulation and with actual users, tracked and avoided any collision of the robot with humans using a webcam. Further user trials focused on supporting the education and training of users with SSMI through engaging, playful activities like driving a toy car. A generic joystick controller mechanism was designed to be used with the proposed augmented reality-based eye-gaze-controlled interface to drive a toy to a target location. The performance of the persons with SSMI was measured on a representative pointing and selection task. The users with SSMI found the car-driving activity engaging and interesting, as opposed to the dull pointing and selection task; however, their performance on the pointing and selection task improved through practicing the car-driving task. The proposed interface also safeguards the privacy of individuals by incorporating person detection and a blurring effect in the live video see-through interface. The proposed system, as evaluated in the user studies, shows potential in supporting the education and training of persons with SSMI through affordable and safe eye-gaze-controlled HRI systems.
Indian Institute of Technology Madras & Christian Medical college, Vellore
Indian Institute of Technology Madras
Christian Medical college, Vellore
O11 : Plug-and-train robot for hand rehabilitation: Version 2
Plug-and-train robot for hand rehabilitation: Version 2
In India, approximately 2 million people are affected by hand impairments every year. Conventional hand rehabilitation has been proven effective but requires long hours of one-to-one therapy sessions with a physical or occupational therapist. Due to the dearth of trained therapists and high healthcare costs, most in-clinic rehabilitation prioritises balance and mobility over hand function. Hence, home-based rehabilitation programs are prescribed with the intent of increasing the dosage of therapy for patients. However, the conventional approach to home-based hand therapy, performing exercises from a printed handout, can be boring, leading to poor adherence and high dropout rates.
Rehabilitation robots can address some of these problems by providing intense, engaging, semi-automated therapy with a therapist's intermittent, direct/remote supervision. The device would better utilise the clinician's time for high-level therapy planning, attend to more patients simultaneously, and provide personalised therapy in the hospital. It can also help the patient adhere to/continue with therapy protocol at home by providing constant feedback and motivation.
To this end, we have developed PLUTO- plug-and-train robot for hand rehabilitation. Based on the feedback from the different stakeholders on the first version of PLUTO, we have refined the robot for better usability – PLUTO V2 (Version 2). This paper describes the improvements made to the previous version of PLUTO and the results from the in-clinic and in-home usability study.
Satish Chandra Jain
O12 : Personalized Wearable Gesture Vocalizer
Personalized Wearable Gesture Vocalizer
0.79% of India’s population (~1 crore) is speech impaired (1). People with Speech Impairment (PSI) face difficulty communicating with others who do not know sign language; the biggest limitation of sign languages is that both parties need to know them. To overcome this difficulty, we have designed a Personalized Wearable Gesture Vocalizer through which PSI can easily communicate with others. The device is worn around the neck; it detects the user's hand gestures through a camera and vocalizes them as phrases and sentences of basic utility. For example, a PSI student can easily communicate with their peers and instructor in a regular classroom setting. Similarly, the device can be trained for use in commercial settings such as offices or businesses to help PSI communicate and interact easily. The device will thus be helpful in the education and integration of PSI into society and can be personalized by training according to individual needs. It will also be much cheaper than the wearable glove vocalizers being developed, as no sensory glove is needed. Different hand gestures were recorded by the device's camera sensor and fed to an ML model, which predicts the interpretation of these gestures. The device then reads the message out loud through a speaker and also presents it on an LCD screen. Any PSI can therefore train the device with gestures of their own choice, aiding communication. The device is portable and features a safety mechanism. Its main processing unit is a Raspberry Pi microprocessor, which increases its versatility and extensibility. The overall motivation is to bridge the gap between speech-impaired people and the rest of the world by introducing a personalized interface.
Materials and Methods: Software
The gesture-detection software is activated either by the push buttons on the device or via the display screen. The touch display screen is also used for adding and removing gestures. A Graphical User Interface (GUI), implemented in Python, provides the Add Gesture, Remove Gesture and Detect Gesture functions. The Add Gesture feature employs an on-screen keyboard, enabling the user to easily enter text without any extra hardware.

ML Model: The classification ML models are built on top of an open-source model licensed under the Apache License 2.0, which allows reproduction and distribution of the work. The ML model for single-hand gestures is separate from the model for double-hand gestures. Each model involves two main processes: collecting and preprocessing data from the processed images (using MediaPipe), and training the model on the collected data. The heavy lifting of the image processing is done by Google's MediaPipe library, which provides the key points of the hand(s); we save these as a tuple, with each tuple representing one snapshot of the hand. To recognize a gesture, MediaPipe is again used to obtain such a tuple dynamically, which is then classified by the ML model as one of the stored gestures. The classifier is augmented to reject false classifications.

Prototype: The proposed design is a portable, wearable device hung around the neck of the PSI. It features an LCD touchscreen panel and a camera module, both connected to a Raspberry Pi 4 microprocessor. All the controls required to operate the device are accessible from the touch screen alone. The camera module has been appropriately configured to capture the gestures of the beneficiary. When needed, the PSI can press a button mounted on the device chassis to switch on the screen.
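The snapshot-tuple classification can be sketched as follows: a minimal, hypothetical Python example that normalizes a tuple of hand key points and matches it against stored gesture templates with a rejection threshold. In the real system the key points come from MediaPipe (21 landmarks per hand); here any list of (x, y) points stands in, and the function names, template format and threshold value are illustrative assumptions, not the device's actual implementation.

```python
import math

def normalize_landmarks(points):
    """Translate the landmarks so the wrist (point 0) is the origin and
    scale by the farthest landmark's distance, making the snapshot
    invariant to the hand's position and size in the frame."""
    wx, wy = points[0]
    shifted = [(x - wx, y - wy) for x, y in points]
    scale = max(math.hypot(x, y) for x, y in shifted) or 1.0
    return [(x / scale, y / scale) for x, y in shifted]

def classify(snapshot, templates, reject_threshold=0.25):
    """Nearest-template classification with rejection: return the label of
    the closest stored gesture, or None if nothing is close enough
    (standing in for the classifier's false-classification rejection)."""
    snap = normalize_landmarks(snapshot)
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        tmpl = normalize_landmarks(template)
        dist = math.sqrt(sum((a - c) ** 2 + (b - d) ** 2
                             for (a, b), (c, d) in zip(snap, tmpl)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= reject_threshold else None
```

The device's actual classifier is a trained ML model rather than this nearest-template matcher; the sketch only illustrates how a normalized key-point tuple can be mapped to a stored gesture or rejected.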
The device also features a quick-access button to instantly start the camera, which captures images of the gestures shown. The captured images are processed and classified by the microprocessor, and after successful detection the gesture is vocalized through the connected speaker. Gestures of choice can be added or deleted on demand, and the LCD panel is of sufficiently high resolution to let the PSI browse through their trained gestures and remove them.
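The on-demand add/remove/vocalize workflow can be illustrated with a minimal sketch. All names here are hypothetical: the actual device persists its vocabulary on the Raspberry Pi and drives a speaker and LCD rather than returning strings.

```python
class GestureStore:
    """Toy in-memory gesture vocabulary, standing in for the device's
    persistent store. Maps a gesture label (as produced by the
    classifier) to the phrase spoken for it."""

    def __init__(self):
        self._phrases = {}

    def add_gesture(self, label, phrase):
        # Mirrors the GUI's "Add Gesture" action (on-screen keyboard input).
        self._phrases[label] = phrase

    def remove_gesture(self, label):
        # Mirrors "Remove Gesture"; removing an unknown label is a no-op.
        self._phrases.pop(label, None)

    def phrase_for(self, label):
        # Returns the phrase to vocalize, or None when the classifier
        # rejected the snapshot or the gesture is not in the vocabulary.
        return None if label is None else self._phrases.get(label)
```

In the actual device, the returned phrase would be sent to a text-to-speech engine and shown on the LCD panel rather than returned to a caller.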
Results and Discussion
The device was trained to detect the gestures a PSI student needs in a classroom setting. It could be trained on 20 such gestures, vocalized into phrases such as "May I go to the toilet", "Can I sit here", "Can you please repeat this" and "Can I borrow a pen". The device could then recognize a gesture and vocalize it as the corresponding phrase. Gestures can be added to and deleted from the device as needed, and a different set of gestures can be trained for each social setting. The Raspberry Pi 4 improves system efficiency, and the attached camera sensor enables gesture-to-speech communication (Fig 1). The technology automatically translates sign language or personalized gestures to speech, enabling people with speech impairment to express themselves independently. The approach helps persons with disabilities enhance their quality of life by overcoming day-to-day challenges. The prototype can be further miniaturized to improve the product's ergonomics. As a result, we have created an efficient and adaptable gesture-vocalization solution.