Optic: An App Assistant for the Blind

Optic is a personal assistant app made for blind people. It is easy to use and helps blind users ‘see’ their surroundings using a phone’s camera and computer vision. With the current version of the app, blind users can read books and signs, and search the web for any information they need.

The app was built as part of a hackathon where I led my team to 3rd place out of 25 teams across Bangalore, India.

Demo

Usage

To have the app read your surroundings out to you, simply point the phone in the general direction of the text, press the large button to activate the assistant, and say ‘take a picture’. Within a few seconds, Optic will be reading out whatever is in front of you. To search the web for information that would normally involve peering at a screen full of tiny letters, press the large button, say ‘search for <topic>’, and Optic will read out information about <topic> to you.

User Interface

The UI is extremely simple. With the help of a trusted person, a blind user can log into Optic with their Google account, and Optic will keep that account linked to the device until they decide to sign out. Once the account is set up, the user is completely independent. There is only a single large button, which can easily be pressed to activate the assistant, after which the user can speak commands. Pressing the button activates the listener, so privacy is maintained whenever the user is not speaking to the app.
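For illustration, here is a hypothetical sketch of what such a one-button screen could look like in Flutter. The widget name and the onActivate callback are illustrative placeholders, not Optic’s actual code:

import 'package:flutter/material.dart';

class BigButtonPage extends StatelessWidget {
  final VoidCallback onActivate;

  const BigButtonPage({required this.onActivate});

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: SafeArea(
        // A single button that fills the screen, so it is easy to find by
        // touch. Pressing it would activate the voice assistant's listener.
        child: SizedBox.expand(
          child: ElevatedButton(
            onPressed: onActivate,
            child: const Icon(Icons.mic, size: 120),
          ),
        ),
      ),
    );
  }
}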

Tech Stack

Let’s take a look at the technology and some of the code used to build this application.

CAUTION:

Some of the functions and APIs used in this blog have since been changed. Optic was developed in 2021; as such, many libraries have gone through breaking changes since then.

This blog is BEST USED as a guide for how such an application can be structured, and how the code could be written.

Authentication

The app uses Firebase Authentication and Riverpod to switch between screens based on user authentication state.

Below is the application root, followed by the Wrapper widget that it displays.

Application Root

// Global navigator key, used later to push routes (like the TakePicturePage)
// from outside the widget tree. Assumed to be defined roughly like this.
final appNavigatorKey = GlobalKey<NavigatorState>();

void main() async {
  WidgetsFlutterBinding.ensureInitialized();
  await Firebase.initializeApp();

  runApp(
    // ProviderScope makes Riverpod providers available to the whole app.
    ProviderScope(
      child: RootWidget(),
    ),
  );
}

class RootWidget extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      debugShowCheckedModeBanner: false,
      home: Wrapper(),
      navigatorKey: appNavigatorKey,
    );
  }
}

Wrapper Widget

class Wrapper extends ConsumerWidget {
  @override
  Widget build(BuildContext context, ScopedReader watch) {
    return watch(authStateProvider).when(
      data: (user) {
        if (user == null) {
          // User is not logged in
          return LoginPage();
        } else {
          // User is logged in
          return HomePage();
        }
      },
      loading: () => Scaffold(
        body: Center(
          child: CircularProgressIndicator(), // Loading Circle
        ),
      ),
      error: (error, stackTrace) => Text(
        '$error',
      ),
    );
  }
}
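Two things referenced above but not shown are the authStateProvider that the Wrapper watches and the Google sign-in flow itself. Here is a minimal sketch of both, assuming a Riverpod StreamProvider over Firebase’s auth state and the standard google_sign_in flow; it is not necessarily Optic’s exact code:

import 'package:firebase_auth/firebase_auth.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:google_sign_in/google_sign_in.dart';

// Emits the current Firebase user, or null when signed out.
final authStateProvider = StreamProvider<User?>(
  (ref) => FirebaseAuth.instance.authStateChanges(),
);

// Signs the user in with their Google account and returns the Firebase
// credential, or null if the sign-in flow was cancelled.
Future<UserCredential?> signInWithGoogle() async {
  final googleUser = await GoogleSignIn().signIn();
  if (googleUser == null) return null;

  final googleAuth = await googleUser.authentication;
  final credential = GoogleAuthProvider.credential(
    accessToken: googleAuth.accessToken,
    idToken: googleAuth.idToken,
  );
  return FirebaseAuth.instance.signInWithCredential(credential);
}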

Database

Google’s Cloud Firestore was used as the database for this project. Here is a snippet from my databaseService.dart, which provides the rest of the codebase with utility functions for interacting with Firestore:

import 'package:cloud_firestore/cloud_firestore.dart';
import 'package:firebase_auth/firebase_auth.dart';
import 'package:optic/models/userData.dart';

// Reference to the 'users' collection in Firestore.
final usersRef = FirebaseFirestore.instance.collection('users');

/// Creates a Firestore document for [user] if one doesn't already exist.
Future<void> addUser(User user) async {
  final documentSnapshot = await usersRef.doc(user.uid).get();
  if (!documentSnapshot.exists) {
    await usersRef.doc(user.uid).set(
      {
        'uid': user.uid,
        'photoURL': user.photoURL,
        'email': user.email,
        'displayName': user.displayName,
      },
    );
  }
}

/// Fetches the user document for [uid] once.
Future<UserData> getUserData(String uid) async {
  final documentSnapshot = await usersRef.doc(uid).get();
  return UserData.fromMap(documentSnapshot.data()!);
}

/// Streams the user document for [uid], emitting on every change.
Stream<UserData> userDataStream(String uid) {
  final docSnapshotStream = usersRef.doc(uid).snapshots();
  return docSnapshotStream.map(
    (ds) {
      return UserData.fromMap(
        ds.data()!,
      );
    },
  );
}

List<UserData> _userDataListFromQuerySnapshot(
    QuerySnapshot<Map<String, dynamic>> qs) {
  return qs.docs.map((ds) {
    return UserData.fromMap(
      ds.data(),
    );
  }).toList();
}

/// Lists every user in the collection.
Future<List<UserData>> listUsers() async {
  final usersQuerySnapshot = await usersRef.get();
  final userDataList = _userDataListFromQuerySnapshot(usersQuerySnapshot);
  return userDataList;
}
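The UserData model imported at the top of this file isn’t shown in the post; here is a minimal sketch of what it could look like, assuming its fields mirror the document written in addUser (the real model lives in optic/models/userData.dart):

class UserData {
  final String uid;
  final String? photoURL;
  final String? email;
  final String? displayName;

  UserData({
    required this.uid,
    this.photoURL,
    this.email,
    this.displayName,
  });

  factory UserData.fromMap(Map<String, dynamic> map) {
    return UserData(
      uid: map['uid'] as String,
      photoURL: map['photoURL'] as String?,
      email: map['email'] as String?,
      displayName: map['displayName'] as String?,
    );
  }
}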

Speech to Text

The app used the Flutter speech_to_text package. Here is a snippet demonstrating how it was used in the app to enable voice commands.
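In the snippet, speechToTextService is an instance of the package’s SpeechToText class, and streamController pushes recognition results to the rest of the app. A minimal setup sketch, assuming those names map onto the package like this (not Optic’s exact code):

import 'dart:async';

import 'package:speech_to_text/speech_recognition_result.dart';
import 'package:speech_to_text/speech_to_text.dart';

// A single SpeechToText instance, reused for every voice command.
final speechToTextService = SpeechToText();

// Broadcasts recognition results so the UI can react to what was heard.
final streamController =
    StreamController<SpeechRecognitionResult>.broadcast();

With that in place, the voice-command handler looks like this: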

if (!speechToTextService.isAvailable) {
  await speechToTextService.initialize(
    onStatus: (status) {
      print('log: listening status is: $status');
    },
    onError: (val) => print('onError: $val'),
  );
}

if (!speechToTextService.isListening) {
  await speechToTextService.listen(
    onDevice: true,
    onResult: (result) async {
      streamController.add(result);
      if (result.finalResult) {
        // No more words are being spoken
        final words = result.recognizedWords.trim().toLowerCase();

        if (words != '') {
          if (words.startsWith('search for')) {
            // User wants to search something on Wikipedia

            // Everything after 'search for' is the topic to look up
            var res = await Wikidart.searchQuery(
              words.substring('search for'.length).trim(),
            );
            var pageid = res?.results?.first.pageId;

            if (pageid != null) {
              var info = await Wikidart.summary(pageid);
              if (info != null) {
                print(info.title);
                print(info.description);
                print(info.extract);
                if (info.extract != null) {
                  // text to speech - recite the Wikipedia page's extract
                  await tts.speak(info.extract!);
                }
              }
            }
          } else if (words.startsWith('take a picture')) {
            // User wants to perform Optical Character Recognition (OCR)

            // Move to the TakePicturePage
            appNavigatorKey.currentState?.push(TakePicturePage.route());

            final image = await cameraController.takePicture();

            final recognisedText = await textDetector.processImage(
              InputImage.fromFilePath(
                image.path,
              ),
            );

            await tts.speak('Reading text in image');
            await Future.delayed(
              Duration(
                seconds: 2,
              ),
            );

            if (recognisedText.text.trim().isEmpty) {
              await tts.speak('No text found in picture');
            } else {
              await tts.speak(recognisedText.text.toLowerCase());
            }
            appNavigatorKey.currentState?.pop();
          } else if (words == 'stop') {
            // stop text-to-speech
            tts.stop();
          }
        }
      }
    },
  );
} else {
  await speechToTextService.stop();
}

// This code snippet is within a provider - when the provider is disposed,
// we want to stop the speech-to-text service (we want to stop listening)
ref.onDispose(() async {
  if (speechToTextService.isListening) {
    await speechToTextService.stop();
  }
  streamController.close();
});
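The tts object used throughout the snippet is the text-to-speech engine that reads results back to the user. Its setup isn’t shown in this post, but a minimal sketch assuming the flutter_tts package could look like this:

import 'package:flutter_tts/flutter_tts.dart';

// Text-to-speech engine used to read results back to the user.
final tts = FlutterTts();

Future<void> configureTts() async {
  await tts.setLanguage('en-US');
  // Slow the speech down slightly so longer passages are easier to follow.
  await tts.setSpeechRate(0.5);
  // Make speak() complete only when speaking finishes, so consecutive
  // announcements don't talk over each other.
  await tts.awaitSpeakCompletion(true);
}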

INFO:

Google’s ML Kit was used for all image text recognition. The google_ml_kit Flutter package was used to interface with ML Kit’s APIs.
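For reference, here is roughly how the textDetector used in the snippet above could be created with the 2021-era google_ml_kit API; newer releases have since split text recognition into separate packages with a different interface:

import 'package:google_ml_kit/google_ml_kit.dart';

Future<String> readTextFromImage(String imagePath) async {
  // On-device text detector from the 2021-era google_ml_kit package.
  final textDetector = GoogleMlKit.vision.textDetector();

  final recognisedText = await textDetector.processImage(
    InputImage.fromFilePath(imagePath),
  );

  // Release the native detector once we're done with it.
  await textDetector.close();

  return recognisedText.text;
}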

Conclusion and Source Code

In this blog, we talked about Optic, a simple app assistant designed for people who are blind.

We delved into Optic’s tech stack, which includes Firebase for authentication, Google’s Cloud Firestore for the database, and the Flutter speech_to_text package for voice commands. We also highlighted the use of Google’s ML Kit for image text recognition.

The complete source code for Optic can be found here ⭐.