As of 2016, there were 1 million job openings in home care and 35 million people working as unpaid home-care workers (children taking care of parents or grandparents, uncles or aunts). The baby boomers are about to enter the market, and 90% prefer staying at home over entering a care facility.
By 2030, the number of older adults (65+) will grow by 81%.
Older homebound people often complain of feeling isolated and marginalized. But thanks to innovative services and products that help them feel connected, such as virtual reading groups, the barriers created by aging are starting to come down.
By 2020, the nursing workforce will drop 20% below projected needs. Home care is among the highest-growth occupations for 2012-2022 and will add one million jobs. We will modernize and grow this workforce into the future of home care.
My name is Andy Cramer, and I'm Al's best friend, husband, and partner. For the past 20 years, we have been creating technology for various for-profit and non-profit industries. We have become experts in assembling communities with common interests and supplying innovative ways to serve unfulfilled needs and enhance experiences.
We feel that storytelling is intrinsic to our society and can bring joy and education to people of all ages. Storytelling is 10,000 years old and most likely the oldest form of emotional communication and learning. Remember that storytelling predates writing and requires a storyteller and a good listener.
We are currently working on a children's book that uses AI technology to let the audience communicate with an IoT device and choose different paths that lead to different endings. Although it is designed for middle school children, we imagine people of all ages being able to set up their own stories and add content.
Our primary focus is eldercare. It's particularly close to me since I am in my late 60s and my mother has been in assisted living for years. She's had back surgery, has advanced arthritis, and is unable to walk more than a few steps. Added to that, she has late-stage macular degeneration and is hard of hearing. Her isolation is painful, so Al and I bought her an Echo, hoping it would provide company and allow her to listen to the books she enjoys. She enjoys listening to music, but at 90 and almost blind, she is unable to remember or read the commands to wake up Alexa, ask for specific books, or explore new forms of entertainment. My mother was an avid reader her whole life and even learned to convert books into Braille 30 years ago. Now, being unable to get out and around has left her feeling isolated. Last week she told me that she is lonely and bored and that any voice would be a blessing, even one that is AI generated.
I'm going to continue to write about our progress. As a baby boomer, I can see what is ahead. Forbes reported, "By 2020, 117 million Americans are expected to need the assistance of some kind, yet the overall number of unpaid caregivers is only projected to reach 45 million." Advances in AI will bring capabilities such as facial recognition, and older Americans will be better able to exercise their minds, speak with their families, remember to take medication, and have an assistant who provides choices and companionship.
AI voice system lesson one: call out the attention command, wait or repeat until you get the system's attention, then make your request. It works perfectly when you use it exactly that way.
Why is this the case? For some systems, the local hardware listens for this attention command by processing all the audio it hears, sorting through sounds and looking for the "Attention Command." Once the correct command is recognized, the system goes into cloud mode, where it sends your audio to the cloud to be processed. Herein lies the rub: for privacy, we don't have a cloud system listening to our every conversation, so audio is sent to the cloud only after you get the system's attention.
Given that the "Attention Command" is the only way a voice can activate the system, the user must already know the particular "Attention Command" and must understand that the word itself is a command to start listening. A user who doesn't know this might speak the first three words of a sentence before listening has been activated. That's why the "Attention Command" followed by a pause is more effective.
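To make the effect of the pause concrete, here is a minimal Python sketch of the behavior described above. It is a simplification, not how any real device is implemented: the function name `words_sent_to_cloud`, the wake word constant, and the `ack_delay_words` parameter (the "first three words" a hurried speaker might lose) are all assumptions made for illustration.

```python
WAKE_WORD = "alexa"  # assumed attention command for this illustration


def words_sent_to_cloud(utterance: str, paused_after_wake: bool,
                        ack_delay_words: int = 3) -> list:
    """Return the words that would reach the cloud under this toy model.

    ack_delay_words is a hypothetical stand-in for how many words a user
    might say before the device has finished acknowledging the wake word.
    """
    # Normalize: lowercase and drop simple punctuation around each word.
    words = [w.strip(".,?!") for w in utterance.lower().split()]
    if WAKE_WORD not in words:
        return []  # without the attention command, nothing leaves the device
    after_wake = words[words.index(WAKE_WORD) + 1:]
    if paused_after_wake:
        return after_wake  # the user waited, so the whole request is captured
    return after_wake[ack_delay_words:]  # the first few words are lost


print(words_sent_to_cloud("Alexa, turn on the kitchen lights", paused_after_wake=True))
print(words_sent_to_cloud("Alexa, turn on the kitchen lights", paused_after_wake=False))
```

In this toy model the paused request arrives whole, while the rushed one arrives as only "kitchen lights", which is not enough for the system to act on.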
The prevailing idea in the industry is to hide this complexity from the user, seamlessly providing a simulation of human dialogue. Example: "Alexa, what's the weather" produces the correct response if the system is set up correctly. As we grow accustomed to asking the same thing each day, in such a nonchalant manner, we forget the uniqueness of this "weather command." It's just a single series of words that activates the request. But after the command has executed, Alexa is back to listening for "Alexa," no longer focused on you. The reason is that this AI is not 'an AI'; it is many AI systems acting as a single AI. It's just a big set of systems waiting for the next command. When we break down our example, "Alexa, what's the weather?", we can see that this simple question is a complicated process.
First, the hardware in Amazon's Echo contains an AI that listens to and processes audio, searching all the sounds around it for the word "Alexa" the entire time it is on. This listening and processing is the first AI that helps with this question. The Local Command Word Processor (local speech to command) is AI #1.
Second, the spoken words "what's the weather" are turned from audio into text. This Voice to Text AI uses multitudes of recorded utterances to piece together the sounds into letters and words. In this case, it needed to understand "what's" as a contraction of "what is." Voice to Text is AI #2.
Third, an AI that specializes in the meaning of text turns "what's the weather?" into commands. This Natural Language Processing is AI #3.
When the intention of the user is determined, programs use a locally defined zip code to request the weather for that location from an internet weather source, then organize the results into reply text. This text is, in turn, passed to AI #4, Text to Speech. It renders the reply text as audio, drawing on multitudes of recorded utterances to construct the sounds into words, and the words into sentences. It also adds a cadence and creates an audio file, which is then passed to the Alexa speaker to play.
With four AI systems working together, the complexity of what is involved in delivering a spoken audio response from Alexa is enormous.
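The hand-off between the four systems can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the function names, the dictionary used for the intent, and the canned reply are assumptions for illustration, and each stub replaces what is in reality a large trained model or cloud service.

```python
from typing import Optional

WAKE_WORD = "alexa"  # assumed wake word


def detect_wake_word(heard: str) -> Optional[str]:
    """AI #1 (local): return the request after the wake word, else None."""
    words = heard.strip().split(maxsplit=1)
    if not words or words[0].strip(",.!?").lower() != WAKE_WORD:
        return None
    return words[1] if len(words) > 1 else ""


def speech_to_text(audio: str) -> str:
    """AI #2: stand-in for voice to text; also expands the contraction."""
    text = audio.lower().replace("\u2019", "'").strip(" ?.!")
    return text.replace("what's", "what is")


def parse_intent(text: str) -> dict:
    """AI #3: stand-in for natural language processing."""
    if "weather" in text:
        return {"intent": "get_weather"}
    return {"intent": "unknown"}


def fulfill(intent: dict) -> str:
    """Ordinary program code, not an AI: build the reply text."""
    if intent["intent"] == "get_weather":
        return "Here is today's weather."  # placeholder, not real data
    return "Sorry, I don't know that one."


def text_to_speech(reply: str) -> bytes:
    """AI #4: stand-in that 'renders' the reply as bytes of audio."""
    return reply.encode("utf-8")


def handle(heard: str) -> Optional[bytes]:
    """Run one utterance through all four stages in order."""
    request = detect_wake_word(heard)
    if request is None:
        return None  # no wake word, so nothing is processed at all
    return text_to_speech(fulfill(parse_intent(speech_to_text(request))))


print(handle("Alexa, what's the weather?"))
print(handle("What's the weather?"))
```

Note that `handle` keeps no memory between calls. Once a reply is produced, the sketch, like the system it imitates, simply goes back to waiting for the wake word.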
Teaching users how to use an AI system is a problem for the entire AI industry. That is because we don't address each other by first calling out a name and waiting for an acknowledgment before speaking a command. When we want something from someone, we say it all together. For example, 'Al turn on the lights' will come out in one breath without a pause. It's not natural for humans to pause after calling someone's name, especially when we're feeling physically comfortable ourselves. This is the industry problem I spoke of: these AIs all exist in the environment where we feel comfortable, so they must adapt to us for the best adoption and a lower attrition rate. The AI systems are, in part, built for human comfort, so the nuance of inserting a simple pause stands in the way of faster adoption. For now, the reliable approach is to know what you want and see the sentence in your mind before saying it, call out the AI start command (say, 'Alexa'), wait for the system to acknowledge your request for attention, and then speak the sentence. Then, and only then, do you almost always get what you want, and quickly.
Currently, a user must think about what information they want from our AI systems before speaking to them to ensure an accurate response. Understanding how these systems work and how we communicate with each other is an important first step towards full adoption.
For example, understanding command word requirements is essential when using a voice-based AI platform. Usually the local hardware listens for the attention command and processes all the audio it hears, sorting through sounds and looking for that command. Given that the attention command is the only way to use your voice to activate the system, you must already know the command and understand that the word itself is a command to start listening. If you don't know that, you might speak the first three words of the sentence before listening has been activated. That's why using a pause after the attention command is best. We see bridging this gap as a critical need for our industry.
The underlying assumption of AI assistants seems to be that the assistant is the equivalent of a person you can speak with naturally, one that uses answers from the Internet or access to personal data to form natural speech responses. Even if we feel this way, we also feel that the comparison isn't very accurate.
Current AI systems, such as Amazon's wireless speaker, Echo, with the Alexa assistant, are connected to your Amazon data, which they use to generate responses. Alexa consists of groups of cloud-based and local AI systems that work together to perform a single command, or at most two or three at a time*, and only after being activated correctly. (* josh.ai and Google can process multiple commands given at one time.)
The industry is hiding this complexity from the user, seamlessly providing a simulation of human dialogue. For example, with Alexa, "Alexa, what's the weather" produces the correct response if the system is set up correctly. As we grow accustomed to asking the same thing each day, in such a nonchalant manner, we forget the uniqueness of this 'weather command.' It's just a single command and a single way to activate the response. But after the command has executed, Alexa is back to listening for "Alexa," and no longer focused on the user. This medium is not a single AI; it is just a big set of deep learning AI systems acting as a single responder and waiting for the next command.
When we break down the example, "Alexa, what's the weather?", a complicated process takes place. First, the hardware in Amazon's Echo contains an AI that listens to and processes audio, searching all the sounds around it for the word "Alexa" the entire time it is on. This listening and processing is the first AI that helps with this question. Let's call it AI #1, the Local Command Word Processor (local speech to command).
Second, the spoken words "what's the weather" are turned from audio into text. This Voice to Text AI uses multitudes of recorded utterances to piece together the sounds into letters and words. In this case, it needed to understand "what's" as a contraction of "what is." The Voice to Text AI is AI #2.
Third, another AI that specializes in the meaning of the text is involved. This Natural Language Processing AI turns "what’s the weather" into commands, and we can call it AI #3.
When the intention of the user is determined, programs use a locally defined zip code to request the weather for that location from an Internet weather source, then organize the results into reply text. This text is, in turn, passed to AI #4, the last one that helps with this question. Text to Speech renders the reply text as audio, drawing on multitudes of recorded utterances to construct the sounds into words, and the words into sentences. It also adds a cadence and creates an audio file, which is passed to the Alexa speaker to play.
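The step between AI #3 and AI #4 is ordinary program code, and a rough Python sketch shows how little of it is "intelligent." The names `fake_weather_source` and `build_weather_reply`, the intent string, the sample zip code, and the weather values are all assumptions for illustration; a real system would call an actual weather service.

```python
from typing import Callable, Dict


def fake_weather_source(zip_code: str) -> Dict[str, object]:
    """Hypothetical stand-in for an Internet weather service."""
    # Placeholder values for illustration only, not real data.
    return {"condition": "sunny", "high_f": 75}


def build_weather_reply(
    intent: str,
    zip_code: str,
    source: Callable[[str], Dict[str, object]] = fake_weather_source,
) -> str:
    """Turn a recognized intent into reply text for Text to Speech."""
    if intent != "get_weather":
        return "Sorry, I can't help with that yet."
    report = source(zip_code)  # look up weather for the locally defined zip code
    return (
        f"The weather in {zip_code} is {report['condition']} "
        f"with a high of {report['high_f']} degrees."
    )


# "12345" is only a sample zip code for this sketch.
print(build_weather_reply("get_weather", "12345"))
```

Passing the weather source in as a parameter is one possible design choice; it keeps the reply-building logic testable without a network connection.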
With four AI systems working together, the complexity of what is involved in delivering a spoken audio response from Alexa is enormous. At this point, the ease of use these AI systems are trying to deliver is itself becoming a problem for the AI industry as a whole: how do we teach users to manage an AI system?
Making the system so easy to use that there is no thought involved is premature. It is still critical that we think about what we want from our AI systems before speaking to one if we want to ensure an accurate response.
Amazon's release of the Echo gives users a device that can be spoken to from any direction in the house and can carry out commands like reading audiobooks, playing music, or answering questions from Wikipedia. "It has been a wonderful addition to my life as boredom nearly overtook me! Now I can listen to my books and hear a knock-knock joke once in a while," says Caryl Burgess of McAllen, TX.
- Jibo's release will make way for video calls that Jibo can initiate from a user's voice across the room, with Jibo turning a full 360 degrees to aim the camera at the caller's face.
Social robotics to the rescue! (Pictured, Pepper and Jibo)
That's Really Possible
- Ready or not, here they come. Social robotics is now a real thing! Inexpensive access to these internet-enabled devices will revolutionize eldercare and relieve the real problem of isolation for housebound people everywhere.