Roles: 1 = greeter; 2 = facilitator; 3 = computer; 4 = observer
Project site: http://www.cs.washington.edu/education/courses/cse490f/06au/project_files/camera/
Report URL: http://www.cs.washington.edu/education/courses/cse490f/06au/project_files/camera/proto/
We report here the results of user tests of an initial prototype of a panlingual mobile camera (PMC). A PMC would capture written and printed expressions needing translation, perform part of the translation processing itself, obtain the rest, and deliver translations to the customer.
World-wide mobility across the boundaries of thousands of languages creates a potential mass demand for ubiquitous rapid translation from any language into any language. The increasingly prevalent camera-equipped mobile telephone could, with appropriate software, be a PMC.
A successful PMC would need an effective user interface. Of course, it would also need to deploy automatic, human, and hybrid human-machine services: text recognition, language identification, and translation. These are computationally, scientifically, and organizationally difficult, but they do not render the user interface secondary or trivial. And the very complexity of the PMC's back end and the diversity of the targeted user population make the design of a user interface particularly challenging.
The mission of this project is to design an interface suitable for the diverse users of a system that panlingually translates texts in images, installed on mobile camera phones that are expected to be generally available within five years, so that its users can interact effectively and satisfyingly, in any language, with the system and, through it, with the world around them.
Our empirical study of the need for a PMC revealed unexpectedly diverse and complex needs. We found that the potential clientele includes people of all occupations, ages, educational levels, and social origins. We also found evidence that users would want:
These findings persuaded us to embark on incremental prototyping. We had originally envisioned designing a comprehensive prototype of a user interface that would cover the anticipated functionality of a complete PMC. We modified that notion. Our new goal was to design an initial prototype. This would cover a testable fraction of the complete functionality. By a testable fraction, we mean a fraction that permits us to define plausible tasks that the prototyped device could support, but omits support for likely extensions of these tasks or for other tasks.
Our initial prototype represents a software application for mobile camera phones of a kind that we expect to be common. We evaluated the required camera resolution with realistic pre-prototype tests, using a 0.3-megapixel camera phone, and found the resolution often inadequate for text recognition. On this basis, we estimate that a 3-megapixel resolution, which is now becoming common in mobile phones, would suffice. In our pre-prototype tests of user needs, we found that users would commonly need to select particular passages of text for enhancement, metadata access, or annotation. We concluded that such detailed interaction with the image would require a device with a touch-sensitive display. Such mobile phones, and overlays for other mobile phones, are already in production and development (e.g., http://www.synaptics.com/onyx/). Strategy Analytics has estimated that "as many as 40%" of mobile phones "could be using some form of touch sensitive technology" by 2012 (http://immr.client.shareholder.com/ReleaseDetail.cfm?ReleaseID=202035). In addition, language identification and the selection of target languages for translation would be enhanced if the underlying device were location-aware.
The prototype that we tested supports the following fractional functionality:
Thus, the initial prototype omits support for the translation of texts in archived images, translation performed entirely or partly by humans, spoken translations, and user annotation of images for translators, among other actions.
In addition to supporting only a fractional functionality, the initial prototype also omits any integrated help or documentation for those actions that it does support.
The prototype is a low-fidelity, paper-based, hand-sketched preview of the initial interface. Changeable elements are represented with slips of paper that we place onto and remove from the background sheet as required. The camera mode is simulated with a sheet containing a rectangular hole, which the subject maneuvers over a sheet showing a scene, so as to frame a photograph. The low-fidelity-first strategy has been found to help subjects focus on strategic aspects of interface designs and help designers modify designs as often as necessary.

Figure 1. Panlingual Mobile Camera, Initial Interface Low-Fidelity Prototype
The interface contains controls labeled "camera" and "panlingual camera", the latter intended to launch our application, which during its operation provides a control labeled "exit" for the termination of the application process. After it is launched, the application provides a control labeled "capture and translate", allowing the user to take a photograph and have it automatically translated. During the translation process, if it is not apparently instantaneous, the captured image is displayed, with a horizontal-bar progress indicator labeled "translating..." covering part of it. Then the translation appears below the scaled-down image. The image is labeled above with the list of languages identified as appearing in text in the image. The translation is labeled above with its language, and next to the language name is a point-down triangle, intended to tell the user that this language is the selected language in a drop-down list. (The default language is the user's main language except when the original is in that language, in which case it is the dominant language of the current location.) If the user touches the language name, a list of alternative languages appears with an "other..." option at the bottom, so the user can select any other target language, whereupon the selected language name replaces the original name and the translation changes to a translation into the selected language (with a progress indicator if necessary). Metadata about any translated word or phrase in the original text or its translation are revealed when the user touches the word or phrase. If the user touches either the original text or the translation, the applicable word or phrase in both the original text and the translation are highlighted, and the OCR output from the original word or phrase is displayed next to it.
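The default target-language rule stated in the parenthesis above can be written out as a short sketch. This is only an illustration in Python, not part of the paper prototype: the function name, the language codes, and the fallback used when the location's language is unknown are our assumptions, and the sketch treats the image as containing a single source language.

```python
from typing import Optional


def default_target_language(source_language: str,
                            user_main_language: str,
                            location_language: Optional[str]) -> str:
    """Pick the target language shown before the user changes it.

    Hypothetical helper: the names and language codes are illustrative only.
    """
    # Usual case: the text is foreign to the user, so translate it
    # into the user's main language.
    if source_language != user_main_language:
        return user_main_language
    # The text is already in the user's main language (e.g., the user's
    # own note), so default to the dominant language of the current location.
    # Assumption: if the location's language is unknown, keep the main
    # language; the report does not specify this case.
    return location_language or user_main_language


# Task 1: a foreign-language bus sign is translated into English.
print(default_target_language("ru", "en", "ru"))
# Task 2: an English note written in France defaults to French; the user
# must then pick Chinese from the drop-down list.
print(default_target_language("en", "en", "fr"))
```

The second call shows why Task 2 requires an extra step: the rule produces a French translation first, which the user must replace by choosing another language from the list.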
The sketches comprising the prototype, including special sketches for the scenarios to be described below, are shown in Figure 1. The basic system elements are labeled "Common", and the elements belonging to the four scenarios are labeled "Demo", "1", "2", and "3".
After informal tests of individual design elements on one another, we conducted a complete informal pretest on one colleague-friend, then tests on four subjects, all with native fluency in English. We recruited the test subjects by posting invitations on local Internet activity and discussion fora (the Seattle International Club and the LiveJournal University of Washington Community). By self-report, the pretestee and the subjects had the following profiles, indicating some diversity in foreign-travel, foreign-language, and information-technology experience:
| | Pretestee ("SC") | Sub 1 ("DE") | Sub 2 ("ES") | Sub 3 ("GM") | Sub 4 ("DS") |
|---|---|---|---|---|---|
| Other languages known | 2 | 1.5 | 1 | 3.5 | 1 |
| Continents visited outside USA | 2 | 0 | 3 | 3 | 2 |
| Mobile camera experience | yes | yes | yes | no | yes |
| Computer literacy (0-3) | 1 | 3 | 1.5 | 1 | 1 |
We conducted the pretest in a private residence and the tests in study rooms of the undergraduate library of the University of Washington. We selected the latter venue for its prominence and the expectation that subjects would feel safe there.
We defined three user tasks for this study.
In task 1, the user is sightseeing, waiting at a bus stop, and a bus approaches, labeled with a foreign-language sign. The user wants an English translation of the sign, so as to decide whether to take the bus.
In task 2, the user is traveling in a country that uses a foreign language and is at a restaurant with a menu in, and waiters who know, a different foreign language. The user has written a note in English asking for the vegetarian dishes on the menu to be pointed out. The user wants this note translated into the restaurant's language, so the user can show it to a waiter.
In task 3, the user is visiting a cemetery where an ancestor is believed to be buried and has found a tombstone with an inscription in a foreign language that makes it seem likely that this is the ancestor's tomb. The user wants an English translation of the inscription and wants to know what, in the translation, corresponds to a passage that is prominently marked in the inscription.
The pretest informant and the test subjects were asked four questions about their knowledge and experiences (see above) when recruited. At the start of the test sessions, we obtained written consent to be studied (with a copy for each party), described to subjects the purposes of the study, and answered any questions.
We then demonstrated the paper prototype to show its user-interface principles on tasks differing from the user tasks and to show how the paper would simulate an electronic device. In the demonstration, we showed a user touching sketches of an "Email" button and an "Exit" button and thereby causing an email menu to appear and disappear. We also showed a user moving a sketched frame with a rectangular cutout and positioning it over a drawing, to simulate aiming the camera before taking a photograph. Finally, we showed a user looking at a map on the display and touching a place on the map, causing more information about that place to appear next to it. We described each action (button pressing, camera aiming, and object touching for deeper information) as an example of a general functionality of the device.
After the demonstration, we consecutively read each task description to the subject and placed the applicable props before the subject. We asked the subject to show what the subject might do and explain verbally the subject's intent. We altered the props as appropriate after each subject action. We then discussed the subject's interpretations, our intent, and the subject's ideas about alternative design elements. When the subject took an unanticipated action for which we had no responsive props, we explained our expectations, solicited the subject's ideas about how better to communicate those in the interface, and resumed.
We completed each session by soliciting other suggestions, questions, and comments.
We kept the procedure invariant except when the results of any session made it appear substantially beneficial to modify the procedure. We modified the procedure twice. After the pretest, we made numerous modifications to the demonstration script, task descriptions, and prototype to overcome confusions attributable to instructions and design elements. After the first test, we made further modifications to the task descriptions in order to clarify our expectations.
This study was a qualitative exploration of dimensions of usability. We observed user behaviors and statements to infer any notable successes and failures in the achievement of the common usability goals: learnability, memorability, flexibility, efficiency, robustness, satisfaction, and enjoyment.
The initial prototype's testing generally produced encouraging and satisfying results. Subjects were unanimously optimistic about the value of a device with the prototyped features. They showed growing awareness of its possibilities as the tasks exposed them to applications of it that they had not considered (such as having their own notes translated into a local language, or guessing what language they wanted it translated into on the basis of their current location). Subjects either knew how to exercise the prototype on the basis of its adherence to standards with which they were already familiar or showed or asserted that they could learn to do so in a few trials or a day. They generally navigated through the anticipated sequences without resistance or objection, as if each next step were the natural one to take. We saw no cases of a subject forgetting any learned technique from one task to a later one, something that might have occurred if an action sequence had conflicted with their experience-based expectations. Subjects did not complain about the poverty of the interface, even though it did not offer integrated help, image manipulation, archiving, retrieval, etc. Finally, subjects frequently smiled and indicated that they were enjoying the experience of using the prototype and imagining themselves using the actual system.
The "easy" task produced little difficulty, even though it was the first task performed. We detected, though, a bit of confusion about the existence of two camera modes on the device, a regular one and one that translates text in photographs taken. The "moderate" task produced some difficulties for some subjects, relating to the recognition of a drop-down list as something affording choice, and to the surprise of having to discard a translation into an automatically selected target language and replace it with one into the desired target language. The "difficult" task produced difficulty for all subjects. No subject learned from the demonstration to try touching any displayed item about which they wanted more detailed information. This one-step solution did not occur to them, so they assumed that the only solution would be to take another photograph of just the passage of particular interest, thereby tricking the device into giving them just the translation of that passage.
The typical completion time for our tests was 30 minutes. The typical response time per action (step) was about 2 seconds, lengthened by about 5 seconds for the requested vocalization of the subject's intent. All response times in excess of 5 seconds appeared to be due to conceptual gaps between designer and user models, as discussed above, not to any complexity or counterintuitivity of the model realization.
The testing in this project produced a rapid asymptotic approach to an apparently practical initial interface design. Our early drafts had contained fatal inconsistencies and gaps, which we had detected with our own mental analyses of utilization scenarios and about which we were warned by expert readers. By the pretest, we had adopted an incremental-prototyping strategy and drastically simplified the prototype. This created a risk that we would be testing a trivially easy prototype. We decided to face this risk, on the working assumption that everything is more complex than it seems. The final testing supported this assumption.
By including only a small set of features, the initial prototype allowed us to focus on a few basic design issues and resolve them before we proceed to elaborate the design. We can thereby protect the more elaborate features from a need to redesign them because of defects in the initial features discovered too late.
Our testing revealed two main issues for resolution: target-language selection and metadata access. Each of these provoked competing ideas among the designers and the subjects, and we believe they merit further prototyping and testing.
Target-language selection relates to cases in which the device is used to translate texts that are in the user's main language (the default target language). We anticipate that the device would be used for such translation only in a minority of cases, but in such cases the user might resist the automatic production of a translation into the current location's principal language. Should users be educated to treat this first translation as a harmless, free scrap, and thereby to get the benefit of having immediate access to the translation that they will want most of the time, without any intervening steps? Or should the conventional notion of "ordering" translation as a "job", with the target language as one of the "specifications", be reflected in the application's action sequence, by a step in which the user confirms or changes the target language before translation is performed? Or should each of these sequences be a user-preference option, and, if so, which sequence should be the default? This is an application-specific question.
Metadata access relates to the fact that translation produces too much information for automatic simultaneous display. Metadata access is important with automatic translation, because of the intrinsic imperfection of the process, especially when performed on texts that are discovered in photographed images of miscellaneous scenes. This imperfection will lead users to want to inquire into the translations they get, such as seeing in print the character strings that the original texts are recognized as. If, for example, the Russian word "шина" (vehicle wheel tire) is misrecognized as "глина" (clay), the user may figure out that this is the reason for the appearance of a strange word in the translation and may recover from the error by rephotographing the passage more closely or at higher resolution. Our solution concept for metadata access was an invitation to the user to touch any object on the display to get more information about it. Users showed no familiarity with this potential convention and didn't learn it from a single demonstration. This made Task 3 difficult for them, though the difficulty is paradoxical, since, once we reminded them of the possibility of probing by touching, they did this with more ease than the attempted alternative procedure. Should we abandon this feature, or conclude that it is an empowerment that we should help users learn? Would they have learned it from a single example if, instead of watching us demonstrate it, they had performed the action themselves? Would a goal that couldn't be reached with a work-around, such as inspecting the OCR for an item, more easily trigger the subject's awareness of the feature? Should we adopt a subject's suggestion to mark all touchable objects in the display, making them look like hyperlinks? In contrast to target-language selection, this issue is platform-specific, not application-specific.
We expect that all applications on a touch-screen platform may benefit their users by offering deep information that is revealed by a "probe" action, executed by direct touching. The map example that we used in the demonstration, contact lists, Web search, and other applications come to mind. We are not sure about the best answer to this question, but we believe that the answer should be a platform-wide, cross-application standardization answer, one which we should seek to arrive at by examination of, or participation in, evolving interface standards. In principle, our application could pioneer a platform-wide convention, but users would then need to learn this convention for our application alone, until (and unless) other applications adopted it. Meanwhile, discussion suggested to us that some metadata can be provided on the fly, in ways that suggest touching for more, hints about touch-probing can be offered until no longer needed, and the texture (e.g., matte) or apparent shape (e.g., inset) of the display might also provide a suggestion of touchability.
While supporting our decision to do incremental prototyping, the test results also supported the low-fidelity prototype-testing paradigm. Subjects showed no resistance to, or skepticism of, the primitive sketches that we used in lieu of working equipment. Subjects seemed to see how easy it was going to be for us to modify the prototype and therefore seemed to understand that their suggestions would make a difference. We also noticed no difference in subject behavior attributable to the change from our initial terse task descriptions to the more verbose ones of Version 3 (see Appendix). This suggests that subjects in low-fidelity testing exhibit robust imaginations that are not highly dependent on particular task formulations.
These considerations lead to a basic decision about further prototyping, once we have dealt with the issues described above. Do we maintain the initial fractional functionality and test at a higher level of fidelity, or do we maintain the existing low fidelity and incrementally add to the prototype's functionality? If we do the latter, we are aware that some possible solutions to some observed problems, such as animated hints, are difficult to simulate in low fidelity. This seems to be our next strategic choice.
Consent Form
The Panlingual Mobile Camera application is being produced as part of the coursework for Computer Science course CSE 490F at the University of Washington. Participants in experimental evaluation of the application provide data that are used to evaluate and modify the interface of Panlingual Mobile Camera. Data will be collected by interview, observation and questionnaire.
Participation in this experiment is voluntary. Participants may withdraw themselves and their data at any time without fear of consequences. Any concerns about the experiment may be discussed with the researchers Jonathan Pool, Neb Tadesse, Tim Wong, and Luke Woods, or with Professor James Landay, the instructor of CSE 490F:
James A. Landay
CSE Department
University of Washington
206-685-9139
landay at cs.washington.edu
Participant anonymity will be provided by the separate storage of names from data. Data will only be identified by participant number. No identifying information about the participants will be available to anyone except the researchers and their supervisors.
I hereby acknowledge that I have been given an opportunity to ask questions about the nature of the experiment and my participation in it. I give my consent to have data collected on my behavior and opinions in relation to the Panlingual Mobile Camera experiment. I understand I may withdraw my permission at any time.
Name ______________________________________________
Date _______________________________________________
Signature ___________________________________________
Witness name ________________________________________
Witness signature ____________________________________

This is a very rough initial sketch of a new feature for a mobile camera phone.
With this feature, the phone would work as a regular mobile phone with a built-in camera. But, in addition, the user could also take pictures of texts and have the texts translated. Also, this phone would have a touch-sensitive display, so the user could do things by touching and drawing on the screen.
First, I'm going to show you how to interact with the sketch, as if it were a camera phone.
To press any button, I just touch the button on the sketch, like this:.....
When I press a button that does something, my colleague here, who's pretending to be the brain inside the phone, changes the sketch accordingly. So, for example, if I press the "Email" button, like this--.....--the sketch changes accordingly. [Email menu appears.] Then, if I press the "Exit" button to get out of the email mode, like this--.....--the sketch changes again. [Email menu disappears.]
To press a button that's on the side of the phone, I touch it just the same, like this:.....
To select any item that is displayed, I can just touch the item, like this:.....
As for the camera, to show that I'm aiming the camera to take a picture, I pick up the sketch with the hole in it and look through it, then put it back down, like this:.....
OK, that's the demonstration. Any questions before we start?
In the next step, I'm going to give you three tasks to perform, and you'll show me how you would perform them. Our purpose here is not to test your abilities. It's to test whether the interface that we have sketched is understandable enough so people can use it without training or documentation. Ready?
This is a very rough initial sketch of a new feature for a mobile camera phone.
With this feature, the phone would work as a regular mobile phone with a built-in camera. But, in addition, the user could also take pictures of texts and have the texts translated. Also, this phone would have a touch-sensitive display, so the user could do things by touching and drawing on the screen.
First, I'm going to show you how to interact with the sketch, as if it were a camera phone.
To press any button, just touch the button on the sketch, like this:.....
When you press a button that does something, my colleague here, who's pretending to be the brain inside the phone, changes the sketch accordingly. So, for example, if you press the "Email" button, like this--.....--the sketch changes accordingly. [Email menu appears.] Then, if you press the "Exit" button to get out of the email mode, like this--.....--the sketch changes again. [Email menu disappears.]
If you want more information about anything in the display, you can touch it. If more information is available, it will be shown. For example, if you're looking at a map and you touch a place, like this--.....--more information about the place may be shown. [City info window appears next to city.]
As for the camera, to show that you're aiming the camera to take a picture, you move the sketch with the hole in it over the background sketch, like this:.....
Finally, let me mention that the phone would be location-aware. It would know where it is, and sometimes it would use that knowledge to make guesses about what its user needs.
OK, that's the demonstration. Any questions before we start?
In the next step, I'm going to give you three tasks to perform, and you'll show me how you would perform them. Our purpose here is not to test your abilities. It's to test whether the interface that we have sketched is understandable enough so people can use it without training or documentation. Ready?

Task 1
Waiting for the bus, you see one approach and wonder if you can take it. The sign on the bus is written in a foreign language, which you would like to have translated into English so you can decide whether to take the bus or not.
Task one will be complete when you read the English translation of the bus sign to the facilitator. Show us what you might do.
Task 2
Traveling alone in France, you are a strict vegetarian browsing the menu at a Chinese restaurant. Your server speaks Chinese, but no English or French. You decide to write down the question, "Which dishes are vegetarian?", and have it translated into Chinese. You then show the Chinese translation to the server so that he/she can point you to the vegetarian dishes.
Task two will be complete when you show the Chinese translation of your question to the facilitator. Show us what you might do.
Task 3
While tracing your family heritage, you are visiting a cemetery where one of your ancestors may have been buried. You find a tombstone with an inscription that includes your family's surname, a few lines of text, and a small passage surrounded by a decorative border. You wish to have the entire inscription translated into English, but are particularly interested in the translation of the small, ornate passage.
Task three will be complete when you show the facilitator which part of the translation corresponds to the ornate passage. Show us what you might do.
Task 1

Task 2

Task 3

The detailed test data in the next five sections include identifications of the roles played by the investigators in each test and each observer's observation notes, with severity ratings on the scale proposed by Jakob Nielsen, "Severity Ratings for Usability Problems" (http://www.useit.com/papers/heuristic/severityrating.html), ranging from 0 (not a problem) to 4 (catastrophic). We apply severity ratings only to interface elements, not to problems in our demonstration script, task descriptions, or simulation instructions, and only to negative, not to positive, incidents.
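As an illustration of how such ratings can be tabulated, the following Python sketch records the four rated incidents from the Test 1 notes below and summarizes them. The data structure and function names are our own assumptions, the incident descriptions are paraphrases of the observer's notes, and the labels for ratings 1 to 3 follow Nielsen's scale rather than anything stated in this report.

```python
from collections import Counter
from dataclasses import dataclass

# Labels on Nielsen's 0-4 severity scale (intermediate labels are taken
# from Nielsen's article, not from this report).
SEVERITY_LABELS = {
    0: "not a problem",
    1: "cosmetic",
    2: "minor",
    3: "major",
    4: "catastrophic",
}


@dataclass(frozen=True)
class Incident:
    task: int          # task number during which the incident was observed
    description: str   # paraphrase of the observer's note
    severity: int      # rating on the 0-4 scale


# Rated negative incidents from the Test 1 observation notes.
TEST1_INCIDENTS = [
    Incident(1, "no zoom function while framing the photo", 1),
    Incident(1, "translation starts without an explicit request", 2),
    Incident(3, "displayed OCR text not recognized as such", 2),
    Incident(3, "touch-to-probe not discovered unprompted", 3),
]


def summarize(incidents):
    """Return the count of incidents per severity and the worst incident."""
    counts = Counter(incident.severity for incident in incidents)
    worst = max(incidents, key=lambda incident: incident.severity)
    return counts, worst


counts, worst = summarize(TEST1_INCIDENTS)
for severity in sorted(counts):
    print(severity, SEVERITY_LABELS[severity], counts[severity])
print("worst:", worst.description)
```

A tabulation like this makes it easy to see which tasks concentrate the higher-severity problems; in Test 1, the only rating of 3 concerns touch-probing in Task 3.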
We summarize here the "critical incidents" noted by our observers:
Positive incidents:
Negative incidents:
The pretest took place on 10 November 2006. Pool played all roles. The observation notes by Pool follow:
First-person singular ("I") in the demo was found confusing, so I changed it to second-person ("you").
The process of lifting a paper with a big hole in it and panning with it, then pressing a button on it, was cumbersome and hard for multiple observers to watch, so I replaced it with sliding the window around over a larger image, while leaving it on the table.
There was nothing in the demo telling the subject that one can always touch something in the display to try to get more information about it. So it wouldn't occur to the subject to touch the tombstone text to get more info. I added a legend to explain this, but it wasn't understood well, so I removed it again and added a scene to the demo for this, with appropriate props. Severity: 3.
The subject tended to interpret the demo as relating to translation even when it wasn't. This could lead to misunderstandings.
The bus-sign translation was unsatisfactory when represented only as lines, since the subject couldn't understand that the translation was adequate as-is. So I replaced the lines with English that clearly showed the bus sign's meaning.
The menu language selection response was inadequately detailed. There was no change in the lines representing the translation, so it looked as if only the specified target language had changed without the translation itself changing. So I changed this so the original looked like French and the new "Chinese" replacement replaced the language name and the translation, with the translation looking like Chinese.
The graveyard scenario gave inadequate motivation to get more information about a part of the original text and its translation. So I changed the scenario and the image to provide a stronger motivation.
The duplicative small slips of paper were hard to handle and keep organized. They slowed the manipulations. So I moved one copy of each to the "Common" set.
Test 1 took place on 11 November 2006. Woods played the roles of facilitator and observer. Pool played the roles of greeter and computer. The observation notes by Woods follow:
Task 1
Okay, frame the photo. Need to get close; is there a zoom function? (Should we incorporate this? Severity: 1.) Not there. This will work. Snap.
Watches as Jonathan shuffles the proceeding screens.
Pause, hmmmm, probably there is nothing else I have to do. Could be a result of the slow computer. (Do things happen too quickly? Should users have to request a translation? Or is it simply a disconnect between the paper prototype and the speed of the software? Severity: 2.)
Unclear that he was finished with the task. We made the task descriptions more clear in this regard.
Task 2
Activate camera, press capture (not capture and translate); faster than before.
I've learned that it will translate automatically.
Does the person speak French? No, so I'll press this 'drop down'. He does speak Chinese.
Positive reaction to the idea of translating from (rather than to) native language. Not sure if he would think to do that without the script. Perhaps users could upload their manipulations for others to see.
Very enthusiastic reaction to this 'manipulation' of the capabilities of the phone.
Task 3
Go to camera, capture. Repetition: very fast.
So we see the outline, hmmm (pause), could we actually touch? Wow, yeah. That's what the phrase says.
No recognition that the displayed text above is the OCR. (Should this be indicated? Severity: 2.)
Ability-to-touch issues: could be part of the phone which the user would quickly learn to expect and be comfortable with. Once learned, reminders take up valuable screen space. With no reminders, however, the user must keep this knowledge in their head. Hardware could function to encourage touch or activity (animation) on the screen of the dynamic OCR while taking a photo and afterwards. Would reveal image as more than just a static photograph. (Severity: 3.)
Overall, positive, enthusiastic reaction to the prototype. Expressed interest in using it. Commented that it was a very good/interesting idea.
Test 2 took place on 11 November 2006. Woods played the roles of facilitator and observer. Pool played the roles of greeter and computer. The observation notes by Woods follow:
Task 1
Explained a comparison of the available buttons in order to select the panlingual camera button.
Again, comparison of possible options in order to select the capture and translation button.
Pause, okay, so I'm finished. Again, somewhat surprised/disoriented by the shuffle of three screens: see the photo, see the translation progress, see the translation and new menu. Could add a confirmation step before the translation request is sent. (Severity: 3.)
Task 2
Quicker routine, panlingual camera button, capture and translate. Translated into French; the arrow probably opens a list of more options. Here's Chinese.
It is assumed that she knows what language he speaks. What if that is not known? What if she has to specify a language that is not offered by the location-aware algorithm? (Severity: 2.)
Task 3
I think that these (the greeking lines) would be the writing. I'll take a picture (now more natural than before).
See the translation, long pause, point toward the image (but did not touch the screen). "Is that the part that I wanted?" Do I need to take another photo? (Somewhat confused; does not realize she could touch to select text.) If I need to take another photo, I'll press the Panlingual Camera button again. Moves beyond the paper prototype (but would, presumably, be able to repeat task 1 and take a photo of that smaller chunk of text to get the specific translation). Should we support both routes in the test?
Can I zoom in on this part of the photo and narrow my translation? (Severity: 3.)
Thought that the functionality would be useful, specifically for comparison, for example while looking at a menu. A cue could appear the first few times the feature is used and then go away? (Severity: 2.)
Interested in the device and happy to participate.
Test 3 took place on 12 November 2006. Woods played the roles of facilitator and observer. Pool played the roles of greeter and computer. Wong played the role of observer.
The observation notes by Woods follow:
Task 1
Switch to camera mode, rather than panlingual camera mode. (Should the camera button appear on the first screen of the prototype or should we assume it is activated by some other means? Severity: 1.)
Intervention necessary; no supporting screen for the camera mode; need to start at the beginning.
Notices and presses Panlingual camera button, frame picture, press capture & translate.
Task 2
Touch "the camera" (in reality the panlingual camera button; severity: 1) to take a picture of the note.
Capture and translate.
So, it was translated into French, but I wanted Chinese. Should I have specified that before? Severity: 2. Pause; could I press this arrow? Okay, now I can make it Chinese.
Confusion; did it translate the original or translate the already translated French? Severity: 1.
Task 3
Go to the Panlingual Camera (now correct terminology).
I guess capture and translate, so, pause (possibly confused by the lack of clarity of the sketch). Seemingly thinking about zooming in on the particular part of the scene.
Perhaps I should take a second photo of the scene?
Touching the screen as she explained what she was thinking, but seemingly it was not deliberate. We chose not to react to the touches and she did not complete the task.
She thought that any reminder that the screen could be touched should only appear temporarily because of the limited screen space. She expressed some interest in a tutorial, but indicated that she thought it would be helpful for others. She said that she would read the manual in order to understand the features available.
Zooming is perhaps a better metaphor for isolating text than highlighting or circling, which could still be useful for revealing the OCR that was used in making the translation. Severity: 3.
Desire expressed for dictionary functionality (rather than a translation) so that she could form the sentences on her own.
Expressed interest in using the device in her travels (after we work out the bugs). Spoke a lot about difficulties with, and shortcomings of, current computer translation software.
The observation notes by Wong follow:
Task 1
Pressed Camera mode (seemed slightly unsure whether this was the right button or not). (There is a panlingual camera mode button as well.) Severity: 2.
Wrong button (no screens for this); told her; she pressed panlingual camera mode.
Framed the photo, pressed capture & translate.
Question: This is a bus sign; when the bus moves and the sign rotates, how is the camera going to catch that? Severity: 1.
E.g., Seattle buses have lit displays that change frequently (speed of camera).
Translation complete screen:
Task completed! Recognized the language being translated from, and was able to identify the translation to English on the screen.
Task 2
Presses panlingual camera button.
Frames question on piece of paper.
Presses capture and translate.
Similar to before; much faster and more confident; repetition in the task.
Waits as Jonathan changes papers.
Similar screen to first task; recognizes that it went from English to French.
Realizes that it should be in Chinese, not French.
Seems unsure of where to proceed next. Severity: 3.
Thinks that we have not made a Chinese screen card (our revealing in task 1 that we had no screen for the camera mode might have thrown her off). Severity: 1.
Selects language; feels unsure about what she's doing. Severity: 1.
Presses "to Chinese".
Menu disappears.
Wonders about the underlining of the Chinese characters.
Exits.
Follow-up:
Notes that it would be nice if you could select the translation TO and translation FROM options prior to pressing "capture and translate".
Wants to make sure that it's not going from English to French to Chinese, rather than English to Chinese directly. The mobile phone currently gives the sense that it's going through multiple translations, rather than assuring her that the translation was made directly from the English. This gives the sense that it's less accurate. Severity: 3.
Task 3
Panlingual camera mode.
Frames the tombstone.
Capture and translate (SEEMS like she wants to zoom in, but sees no option for that; "so I guess, just capture and translate?"). Pauses briefly. Severity: 2.
Translation complete.
Looking at screen, thinking; pause; wondering whether the highlighted portion in the top half of the screen signals what is translated in the bottom half.
Subject asks if camera has zoom-in capabilities, in order to cut out extraneous information.
Would translate the entire tombstone, and THEN go back and take another photo of the smaller section and translate that. Severity: 3.
Doesn't recognize the ability to press the screen to zoom in on or highlight part of the image. Severity: 2.
Need a brief popup dialog that informs the user (on first translate use?) that he/she can press the screen to zoom in on or highlight part of the image.
Follow-up to test:
Feels it would take a day to recognize that there are touch-screen abilities.
Would read manual.
Frustrated with tutorials; doesn't pay attention to the details, just keeps clicking "Next".
Test subject notes that, if there is too much text on the small screen, it can get cluttered, or it can be overlooked by the user, since there is too much information to concentrate on.
Typically uses language translation programs to translate the root word, and will transform the word herself to correct gender, tense, and form.
Test 4 took place on 12 November 2006. Tadesse played the roles of facilitator and observer. Pool played the roles of greeter and computer. Wong played the role of observer.
The observation notes by Wong follow:
Task 1
Pushes panlingual camera button.
Places phone on the bus.
Hits capture and translate.
Seems to have a good understanding of the system.
Positive: Reads the system quickly, understands the system quickly, identified all items quickly, and did not seem to have any problems or to need to pause and think.
Task 2
Presses panlingual camera.
Capture and translate; placed camera onto prewritten question.
Waits as Jonathan finds papers.
Recognizes that the system has translated from English to French.
Recognizes that a GPS feature auto-selected ("wanted") French because of location awareness, but subject knows she now needs it to be in Chinese instead.
Sees the dropdown arrow, and presses it.
Selects "to Chinese".
Knows to show this to server.
Positive again: very quick to recognize elements on the screen, doesn't require much time to figure out what things mean, isn't afraid of the system.
Task 3
Presses panlingual camera mode.
Captures and translates.
"First I'm going to do all of it, because that's what I'm interested in."
"Since I want just this little part here (points to the little area we highlighted), I'm going to press the panlingual camera button again." Severity: 3.
Show subject tombstone + camera mode again; she asks if she can zoom in on the capture point (severity: 3). Jonathan tells her that we have not provided that ability and asks the subject "Is there any other way you might go about doing this?"
Pauses for a long time; subject doesn't realize that there is the ability to touch the screen for tool-tip information. Jonathan reminds her of the map demonstration; "Ohhh, so I can touch it and it'll show me information?"
Subject's suggestion: Make the text look like a hyperlink, so that the user knows it can be "clicked". Severity: 1.
On the image, put some sort of detail around the box to indicate that it is a link that could be clicked.
"If it were more clear that you could tap on it for extra information, then it might be better." Severity: 2.
Final thoughts:
Liked the feature that auto-selected French because of the GPS system knowing that the phone was in France.
Not sure how you would determine that "to Chinese" should be placed at the top of the menu.
Minor detail: What about different types of Chinese?
Couldn't see herself using it out on the street taking a picture of a sign, but could see herself using it on a menu at a restaurant. Outdoor photography: worried more about the fonts; would rather ask someone instead (speed/time issues?).
Question: What is the difference between "camera" and "panlingual camera" on the main device? Severity: 2.
Jonathan: How would you get people to start using the touch-screen ability?
Answer: Demos during in-store presentations. Rollovers? So that people know that the ability is available.
"I'm thinking, let's say this is your line of text, and it has some sort of marking that lets you know, if you had your finger over it, it would display, and when you take it away, it would disappear or you could click an 'x'."
The observation notes by Tadesse follow:
Task 1
Doesn't see herself using the tool for task 1.
Worried about the fonts.
Task 2
Liked the GPS; impressed.
Seemed to want more language choices in the dropdown.
Also was wondering about the different types of Chinese.
Also wondering why/how Chinese got to be at the top of the list.
Task 3
Suggested hyperlinks to indicate to the user that the text can be clicked or hovered over.
Zoom-in: first thing that came into her mind, since it was in a "camera" mode. Severity: 2.
Low-Fidelity Prototype Sketches during Development

Low-Fidelity Prototype Sketch Detail Being Drawn

Low-Fidelity Prototype in User Testing
