I am sure we are all aware of how language models like GPT have taken off in recent years. They have opened up a world of possibilities through the medium of natural language, with plenty of opportunity to rework or replace software-defined natural language tools.

The idea of voice assistants has always interested me. We have all seen movies like Iron Man, where advanced, human-like voice assistants interact with people very naturally.

What is DARS?

DARS is a home assistant that combines the natural language interpretation capabilities of large language models with discrete, programmatic output, so that it can control or activate functions in a room.

This includes:

  • Managing home appliances and lights

  • Interacting with local files for a note-taking / to-do list system

  • Routing your song requests to Spotify for playback

  • Managing software, like a web server or site

Beyond its functionality, DARS also features:

  • Voice synthesis to clone the voice of TARS from Interstellar

  • Custom, fine-tuned personality via a local model

  • Local speech recognition

  • Humor setting (and maybe a discretion one!)

How Does It Work?

  • Voice synthesis, speech recognition, and (parts of) the local LLM run on a Coral TPU AI accelerator

    • What can’t be run on-device is run in the cloud

  • The language model generates formatted JSON to run functions

    • It provides the arguments and is pre-prompted on when to call each function

  • Function calls are either run on the Pi (note taker, humor setter) or sent to the master ESP32

    • The master can connect to several other ESP32s to control power to devices in the room
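
The JSON function-call step and the Pi-versus-ESP32 routing described above could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the actual DARS code: the JSON shape (`function` and `arguments` keys), the function names (`set_humor`, `add_note`, `set_power`), and the `send_to_master` stand-in are all hypothetical, and a real build would likely let Langroid handle parsing and validation.

```python
import json
from typing import Callable, Dict, List

# Hypothetical state and handlers; none of these names come from DARS itself.
state = {"humor": 75, "notes": []}
outbox: List[dict] = []  # stand-in for messages sent to the master ESP32


def set_humor(level: int) -> str:
    """Clamp the humor setting to 0..100 and store it."""
    state["humor"] = max(0, min(100, int(level)))
    return f"humor set to {state['humor']}"


def add_note(text: str) -> str:
    """Append a note to the in-memory list (a real version would write a file)."""
    state["notes"].append(text)
    return f"saved note {len(state['notes'])}"


# Functions that run directly on the Pi.
LOCAL_FUNCTIONS: Dict[str, Callable[..., str]] = {
    "set_humor": set_humor,
    "add_note": add_note,
}

# Functions that are forwarded to the master ESP32 instead.
REMOTE_FUNCTIONS = {"set_power"}


def send_to_master(message: dict) -> str:
    """Assumed transport hook: here it only records the message."""
    outbox.append(message)
    return "forwarded"


def dispatch(raw: str) -> str:
    """Parse one JSON function call from the model and route it."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: expected a JSON object"
    name = call.get("function")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"
    if name in LOCAL_FUNCTIONS:
        try:
            return LOCAL_FUNCTIONS[name](**args)
        except (TypeError, ValueError) as exc:
            return f"error: bad arguments for {name}: {exc}"
    if name in REMOTE_FUNCTIONS:
        return send_to_master({"function": name, "arguments": args})
    return f"error: unknown function {name!r}"
```

Keeping the local and remote tables separate means the model only ever has to emit a function name and arguments; where a call runs is decided entirely on the Pi, and malformed output degrades to an error string rather than an exception.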

Hardware Architecture

flowchart TD

subgraph DARS Hardware
  B[Raspberry Pi]
  L[Coral Neural TPU]
  I[GPIO & ESP32]
  Z[SSD - Network Attached Storage]
end

A[Microphone Input] --> B
B -->|Audio Input| C[Speech Recognition Model Convert to Text]
C --> B
B --> Z
B --> L[Coral Neural TPU Run LLM Inference]
L -->|Text Output| E[Generate JSON Output]
L -->|Audio Response| F[Speech Synthesis Model Convert to Audio]
F --> B
B --> G[Speaker Output to User]
E -->|JSON Data| H[Langroid Stack Process JSON for Function Calls]
H --> B
B -->|GPIO Signal| I
I -->|ESP32 #1| J[Transmit to ESP32 #2]
J --> K[Control Device e.g., Lights]

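
Read as software, the flow in the diagram above is one request/response turn: audio in, text, model output, an optional function call, and audio back out. The sketch below wires those stages together with placeholder callables. The `run_turn` name, its parameters, and the stub stages are assumptions made for illustration only; the real models and the Langroid step would replace the lambdas.

```python
from typing import Callable, List, Optional, Tuple


def run_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    infer: Callable[[str], Tuple[str, Optional[str]]],
    synthesize: Callable[[str], bytes],
    dispatch: Callable[[str], object],
) -> bytes:
    """Run one turn: microphone audio in, speaker audio out."""
    text = transcribe(audio_in)        # speech recognition model
    reply, call_json = infer(text)     # LLM: spoken reply plus optional JSON call
    if call_json is not None:
        dispatch(call_json)            # run on the Pi or forward to the ESP32
    return synthesize(reply)           # speech synthesis model


# Stub stages, purely illustrative.
calls: List[str] = []
audio_out = run_turn(
    b"<mic samples>",
    transcribe=lambda audio: "turn on the lamp",
    infer=lambda text: ("Lamp is on.", '{"function": "set_power"}'),
    synthesize=lambda reply: reply.encode("utf-8"),
    dispatch=calls.append,
)
```

Passing each stage in as a callable keeps the turn logic independent of whether a given model runs on the Coral TPU or in the cloud.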

See DARS Media Gallery for some photos through the development process!

See DARS for a video demonstration!