Appium Drivers Family


     In this article, I would like to introduce the different Appium drivers and explain how to choose which one to use in a given automation situation. Appium is not just one “thing”. It can automate multiple platforms, from iOS to Android and beyond. The way that Appium organizes itself around this multi-platform model is by means of various “drivers”. This is more or less the same architecture as was first adopted by Selenium/WebDriver, which also utilizes a number of independent “drivers” in order to support the automation of multiple browsers.

     There is one Appium driver per underlying automation technology. This almost means one Appium driver per platform (one for iOS, one for Android, etc…), but not quite. This is because some platforms (like Android) have multiple automation technologies that Appium targets to support automation of that platform. Android actually has 3 Appium drivers: one based on UiAutomator, one based on UiAutomator 2, and one based on Espresso.

How Do Drivers Work?

     The driver is arguably the most important concept in all of Appium. It’s ultimately the driver’s responsibility to turn the Appium API (known as the WebDriver Protocol) into automation for a particular platform. For architectural simplicity among other reasons, each individual driver is itself a standalone WebDriver-compatible server (though it doesn’t have all the options the main Appium server does). Drivers themselves can have quite a complex internal architecture, sometimes relying on a whole stack of technologies. Here’s a diagram showing the full stack of technologies involved in the XCUITest driver (the current iOS driver):

XCUITest Flow

     The XCUITest driver is made available as part of Appium, and is brought to life whenever someone starts an iOS session. Internally, it spins up another bit of technology known as WebDriverAgent, which is responsible for turning WebDriver protocol commands into XCUITest library calls.

Different Drivers

     So far there are 11 drivers available. A list in the Appium source code defines which strings are allowed as values for the automationName capability. Of course, each driver typically supports only one platform. Here’s a brief description of each of the drivers, by their automationName:

  • Appium: this automation name really means “just give me the default driver for the platform I’ve chosen.” It’s not actually a separate driver on its own.
  • UiAutomator2: this is the current default Android driver, based on Google’s UiAutomator technology.
  • UiAutomator1: this is the older Android driver, based on an older version of UiAutomator.
  • XCUITest: this is the current iOS driver, based on Apple’s XCUITest technology.
  • YouiEngine: this is a driver produced by You.i Labs, to support automation of apps on many different platforms built using their SDK.
  • Espresso: this is the newest Android driver, based on Google’s Espresso technology.
  • Tizen: this is a driver produced by Samsung to assist in automation of Xamarin apps built for the Tizen OS.
  • Fake: the “fake” driver is used internally by Appium for the purpose of testing, and you shouldn’t need to ever use it.
  • Instruments: this is an older iOS driver based on an Apple technology which was removed after iOS 9. Basically, don’t use this.
  • Windows: Microsoft put together an Appium-compatible server called WinAppDriver, and this is the driver that connects it up with the main Appium server. You can use this driver to automate Windows apps.
  • Mac: this is a driver which enables automation of Mac desktop apps.

     As mentioned above, each of these drivers has its own internal architecture, as you can see in this detailed diagram:

Appium Driver Architecture

     You can choose your driver based on the platform and the type of application. In practice today, most teams use UiAutomator2 or Espresso for Android, XCUITest for iOS, the Windows driver for Windows desktop applications, and the Mac driver for Mac desktop applications. Learn about the different automation drivers before starting the actual development.
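As a rough sketch of that rule of thumb, the hypothetical helper below maps a target platform to an automationName value and wraps it in a minimal capability map. The class and method names are made up for illustration; the only real conventions it relies on are the W3C platformName capability and the appium: prefix Appium uses for vendor-specific capabilities such as automationName.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class DriverChooser {
    // Hypothetical helper: suggests an automationName for a target platform,
    // following the rule of thumb described above
    public static String automationNameFor(String platformName, boolean preferEspresso) {
        switch (platformName.toLowerCase(Locale.ROOT)) {
            case "android":
                return preferEspresso ? "Espresso" : "UiAutomator2";
            case "ios":
                return "XCUITest";
            case "windows":
                return "Windows";
            case "mac":
                return "Mac";
            default:
                throw new IllegalArgumentException("No driver suggestion for " + platformName);
        }
    }

    // Minimal capability map; "appium:" is the vendor prefix Appium uses for
    // non-standard W3C capabilities such as automationName
    public static Map<String, Object> capabilitiesFor(String platformName, boolean preferEspresso) {
        Map<String, Object> caps = new LinkedHashMap<>();
        caps.put("platformName", platformName);
        caps.put("appium:automationName", automationNameFor(platformName, preferEspresso));
        return caps;
    }

    public static void main(String[] args) {
        System.out.println(capabilitiesFor("iOS", false));
    }
}
```

Passing false for Android yields UiAutomator2, the current default Android driver; whether Espresso fits better depends on your app and team, so the flag simply leaves that choice to you.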

Reference: Appium Pro

make it perfect!

Toggle Between iOS Applications During Automation


     According to one recent report, smartphone users use on average 9 apps per day and 30 apps per month. Many apps today provide value by integrating with other aspects of the mobile phone experience, even if it’s just as simple as taking photos using the device’s camera. Testing multi-app integrations on iOS has been challenging, because Apple’s automation model allowed automation only as long as the app under test (AUT) was running. The second you switched to another app to automate another part of your user flow, the automation would die. Thankfully, with Appium’s XCUITest driver (available with iOS 9.3+), we’re no longer limited to this kind of sandbox.

     There are so many kinds of multi-app flows we could experiment with, but let’s imagine a scenario involving the camera. Let’s say your app allows the user to take photos, and modifies them in some way before finally saving them to the system-wide photo library (the Camera Roll). One key verification for testing your app will be ensuring that, after performing the necessary steps in your app, the photo is newly available in the Camera Roll album of the Photos app.

     This is all possible to encode into your Appium scripts using two handy commands: mobile: launchApp and mobile: activateApp. mobile: launchApp launches an app given its bundle ID. mobile: activateApp is just the same, only the app must already have been launched; activateApp merely brings it back to the foreground. Here are examples of how we’d use these commands, by incorporating the bundle IDs of the apps we’re dealing with (having these bundle IDs available is a prerequisite):

import java.util.HashMap;

// launch the Photos app (with the special bundle id seen below)
HashMap<String, Object> args = new HashMap<>();
args.put("bundleId", "");
driver.executeScript("mobile: launchApp", args);

// re-activate the AUT (in this case The App)
args.put("bundleId", "io.cloudgrey.the-app");
driver.executeScript("mobile: activateApp", args);

      With Appium this is simple enough that we can really pretend to be a user and do all the things a user would do in order to check the appropriate conditions. And chances are, you have a different requirement than verifying photo saving. So the thing to take away is the principle of using mobile: launchApp and mobile: activateApp. What’s nice is that this strategy works not only on simulators but also on real devices.

Try using this app-switching approach in your own iOS Appium automation.
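To reuse the two commands across tests, they can be wrapped in a small helper. The sketch below is an assumption, not part of the Appium client: AppSwitcher and its ScriptExecutor interface are hypothetical names, and the interface simply mirrors the shape of executeScript(String, Object...) so that a real driver can be passed in as driver::executeScript, while the demo uses a fake executor that only records the commands.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AppSwitcher {
    // Hypothetical interface with the same shape as executeScript(String, Object...),
    // so a real driver can be supplied as driver::executeScript
    public interface ScriptExecutor {
        Object execute(String script, Object... args);
    }

    private final ScriptExecutor executor;

    public AppSwitcher(ScriptExecutor executor) {
        this.executor = executor;
    }

    // Launches the app with the given bundle ID
    public void launch(String bundleId) {
        executor.execute("mobile: launchApp", bundleArgs(bundleId));
    }

    // Brings an already-launched app back to the foreground
    public void activate(String bundleId) {
        executor.execute("mobile: activateApp", bundleArgs(bundleId));
    }

    private static Map<String, Object> bundleArgs(String bundleId) {
        Map<String, Object> args = new HashMap<>();
        args.put("bundleId", bundleId);
        return args;
    }

    public static void main(String[] args) {
        // Fake executor that records each command, so the flow runs without a device
        List<String> log = new ArrayList<>();
        AppSwitcher switcher = new AppSwitcher((script, scriptArgs) -> {
            log.add(script + " " + scriptArgs[0]);
            return null;
        });
        switcher.launch("com.example.other"); // hypothetical bundle ID
        switcher.activate("io.cloudgrey.the-app");
        System.out.println(log);
    }
}
```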

Reference: Appium Pro


Start Appium Server – Mac and Windows


     This article introduces the AppiumServiceBuilder functionality built into the Appium Java client, which lets you start the Appium server programmatically on both Mac and Windows.

Start Your Appium Server…

        Using this approach, you can start as many Appium server instances as your automation needs. For parallel execution in particular, you may need to start several Appium servers, each on different ports. Here I find free ports with the ServerSocket class and pass them to the AppiumServiceBuilder used by AppiumDriverLocalService.buildService. Once the Appium service is running, you can load the desired capabilities and send them as a request to the Appium server. I defined the following functions for this utility:

  • getPort – returns a free port number chosen dynamically by the operating system. It is called once each for the Appium server port, the ChromeDriver port, and the bootstrap port.
  • getNodePath – returns the path of the installed Node.js executable on both Windows and Mac.
  • getJSPath – returns the path of Appium’s main.js entry script on both Windows and Mac.
  • startAppiumServer – builds and starts the Appium driver service on both Windows and Mac.

Below is the implementation of the getPort function:

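import java.net.ServerSocket;

private static int getPort() throws Exception {
    int port = 0;
    // Binding to port 0 asks the OS for any free port; try-with-resources closes the socket
    try (ServerSocket socket = new ServerSocket(0)) {
        port = socket.getLocalPort();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return port;
}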
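For parallel servers, calling getPort several times in a row opens and closes a socket each time, so in principle the operating system could hand back the same port twice. One way around that, sketched below with a hypothetical FreePorts class, is to keep every socket open until all the ports you need have been picked, and only then release them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

public class FreePorts {
    // Returns `count` distinct free ports. Every socket stays open until all
    // ports have been picked, so the OS cannot hand out the same port twice.
    public static List<Integer> freePorts(int count) {
        List<ServerSocket> sockets = new ArrayList<>();
        List<Integer> ports = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                ServerSocket socket = new ServerSocket(0); // port 0 = "any free port"
                sockets.add(socket);
                ports.add(socket.getLocalPort());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Release the ports so the Appium servers can bind to them
            for (ServerSocket socket : sockets) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
        return ports;
    }

    public static void main(String[] args) {
        // e.g. server port, bootstrap port and ChromeDriver port for one Appium instance
        System.out.println(freePorts(3));
    }
}
```

There is still a short window between closing the sockets and the Appium server binding to a port, during which another process could take it, so treat this as a best-effort approach rather than a guarantee.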

Below is the implementation of the getNodePath function:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

private static String getNodePath() throws IOException, InterruptedException {
    String nodePath = null;
    String line;
    Process p;
    BufferedReader reader;
    String operatingSystem = System.getProperty("os.name");
    if (operatingSystem.contains("Win")) {
        // On Windows, "where node" prints the full path of node.exe
        p = Runtime.getRuntime().exec("where node");
    } else {
        // On Mac, "which node" prints the path of the node binary
        p = Runtime.getRuntime().exec("which node");
    }
    reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
    // Keep the first match reported by the OS
    if ((line = reader.readLine()) != null) {
        nodePath = line.trim();
    }
    p.waitFor();
    if (nodePath == null) {
        throw new IllegalStateException("Node.js executable not found on the PATH");
    }
    return nodePath;
}

Below is the implementation of the getJSPath function:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

private static String getJSPath() throws IOException, InterruptedException {
    String actualJSPath = null;
    String line;
    String operatingSystem = System.getProperty("os.name");
    if (operatingSystem.contains("Win")) {
        // "where appium" prints the path of the appium launcher in the global npm folder
        Process p = Runtime.getRuntime().exec("where appium");
        BufferedReader stdInput = new BufferedReader(new InputStreamReader(p.getInputStream()));
        if ((line = stdInput.readLine()) != null) {
            // Swap only the trailing "appium" launcher name for the path to Appium's main.js
            String found = line.trim();
            actualJSPath = found.substring(0, found.lastIndexOf("appium"))
                    + "node_modules\\appium\\build\\lib\\main.js";
        }
        p.waitFor();
    } else {
        // Default global npm install location on Mac
        actualJSPath = "/usr/local/lib/node_modules/appium/build/lib/main.js";
    }
    return actualJSPath;
}

Below is the implementation that starts the Appium driver service:

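import java.io.File;
import io.appium.java_client.service.local.AppiumDriverLocalService;
import io.appium.java_client.service.local.AppiumServiceBuilder;
import io.appium.java_client.service.local.flags.AndroidServerFlag;

public static void startAppiumServer() throws Exception {
    String IP_ADDRESS = ""; // set this to the IP address the server should listen on
    String bootStrapPort;
    String chromePort;
    int port;
    AppiumDriverLocalService service;
    // A separate free port for the server, the bootstrap and ChromeDriver
    port = getPort();
    bootStrapPort = Integer.toString(getPort());
    chromePort = Integer.toString(getPort());
    service = AppiumDriverLocalService.buildService(new AppiumServiceBuilder()
            .withAppiumJS(new File(getJSPath()))
            .usingDriverExecutable(new File(getNodePath()))
            .withIPAddress(IP_ADDRESS)
            .usingPort(port)
            .withArgument(AndroidServerFlag.BOOTSTRAP_PORT_NUMBER, bootStrapPort)
            .withArgument(AndroidServerFlag.CHROME_DRIVER_PORT, chromePort));
    // buildService only configures the service; start() actually launches the server
    service.start();
    if (service.isRunning()) {
        // Load the Desired Capabilities
    } else {
        System.out.println("Server startup failed");
    }
}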

Try using the logic above to start the Appium driver service before loading the desired capabilities.


Automate Complex Touch Gestures With Appium


     Mobile apps often involve the use of touch gestures, sometimes in a very complex fashion. Compared to web apps, touch gestures are common and crucial for mobile apps. Even navigating a list of items requires a flick or a swipe on mobile, and often the difference between these two actions can make a meaningful difference in the behavior of an app.

  The story of Appium’s support for these gestures is as complex as the gestures themselves. Because Appium has the goal of compatibility with WebDriver, we’ve also had to evolve in step with some of the changes in the WebDriver protocol. When Appium was first developed, there were two different ways of getting at actions using WebDriver’s existing JSON Wire Protocol. These APIs were designed for automating web browsers, driven by a mouse, so needless to say they didn’t map well to the more generally gestural world of mobile app use. To make things worse, the iOS and Android automation technologies Appium was built on top of did not expose useful general gesture primitives. They each exposed their own platform-specific API, with commands like swipe that took different parameters and behaved differently from each other, as well as from the intention of the JSON Wire Protocol.

       Appium thus faced two challenges: an inadequate protocol spec, and an inadequate and variable set of basic mobile APIs provided by Apple and Google. Our approach was to implement the JSON Wire Protocol as faithfully as possible, but also to provide direct access to the platform-specific APIs, via the executeScript command. We defined a special prefix, mobile:, and implemented access to these non-standard APIs behind that prefix. So, users could run commands like driver.executeScript("mobile: swipe", args) to directly activate the iOS-specific swipe method provided by Apple. It was a bit hacky, but gave users control over whether they wanted to stick to Appium’s implementation of the standard JSON Wire Protocol, or gain direct access to the platform-specific methods.

     Meanwhile, the Appium team learned that the Selenium project was working on a better, more general specification for touch actions. This new API was proposed as part of the new W3C WebDriver Spec, which was under heavy development at the time. The Appium team then implemented this new API, and gave Appium clients another way to automate touch actions which we thought would be the standard moving forward. Unfortunately, this was an erroneous assumption. Appium was too quick to implement this new Actions spec: the spec itself changed and was recently ratified in a different incarnation than what Appium had originally supported. At least the spec is better now!

   That brings us to today. Over the years the mobile automation technologies Appium uses have themselves evolved, and clever people have uncovered new APIs that allow Appium to perform totally (or almost totally) arbitrary gestures. The W3C WebDriver Spec is also now an official thing, including the most recent incarnation of the Actions API. The confluence of these two factors means that, since Appium 1.8, it’s been possible for Appium to support the W3C Actions API for complex and general gestures, for example in drawing a picture.

      Why do we care about the W3C API specifically? Apart from Appium’s desire to match the official WebDriver standard, the Appium clients are built directly on top of the Selenium WebDriver clients. As the Selenium clients change to accommodate only the W3C APIs, that means Appium will need to support them or risk getting out of phase with the updated clients.

The Actions API

     The W3C Actions API is very general, which also makes it abstract and a bit hard to understand. Basically, it has the concept of input sources, many of which can exist, and each of which must be of a certain type (like key or pointer), potentially a subtype (like mouse, pen, touch), and have a certain id (like “default mouse”). Pointer inputs can register actions like pointerMove, pointerUp, pointerDown, and pause. By defining one (or more) pointer inputs, each with a set of actions and corresponding parameters, we can define pretty much any gesture we like.

    Conceptually, for example, a “zoom” gesture consists of two pointer input sources, each of which would register a series of actions:

Pointer 1 (type touch, id "forefinger")
          - pointerMove to zoom origin coordinate, with no duration
          - pointerDown
          - pointerMove to a coordinate diagonally up and right, with duration X
          - pointerUp
Pointer 2 (type touch, id "thumb")
          - pointerMove to zoom origin coordinate, with no duration
          - pointerDown
          - pointerMove to a coordinate diagonally down and left, with duration X
          - pointerUp

     These input sources, along with their actions, get bundled up into one JSON object and sent to the server when you call driver.perform(). The server then unpacks the input sources and actions and interprets them appropriately, each input source’s actions being played at the same time (each action taking up one “tick” of virtual time, to keep actions synchronized across input sources).
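To make the bundling concrete, the sketch below builds the zoom gesture from the outline above as a W3C Actions JSON payload by hand, using only the standard library. In practice the Java client produces this JSON for you from Sequence objects when you call driver.perform(); the ZoomPayload class, the coordinates, and the choice to move both fingers by the same diagonal spread are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class ZoomPayload {
    // Serialises one touch pointer's action list in W3C Actions JSON form
    static String pointer(String id, int startX, int startY, int endX, int endY, int durationMs) {
        List<String> actions = new ArrayList<>();
        actions.add(move(0, startX, startY));          // jump to the zoom origin, no duration
        actions.add("{\"type\":\"pointerDown\",\"button\":0}");
        actions.add(move(durationMs, endX, endY));     // slide outwards over durationMs
        actions.add("{\"type\":\"pointerUp\",\"button\":0}");
        return "{\"type\":\"pointer\",\"id\":\"" + id + "\","
                + "\"parameters\":{\"pointerType\":\"touch\"},"
                + "\"actions\":[" + String.join(",", actions) + "]}";
    }

    static String move(int durationMs, int x, int y) {
        return "{\"type\":\"pointerMove\",\"duration\":" + durationMs
                + ",\"origin\":\"viewport\",\"x\":" + x + ",\"y\":" + y + "}";
    }

    // Both pointers start at (ox, oy) and move diagonally apart by `spread` pixels
    public static String zoom(int ox, int oy, int spread, int durationMs) {
        String forefinger = pointer("forefinger", ox, oy, ox + spread, oy - spread, durationMs);
        String thumb = pointer("thumb", ox, oy, ox - spread, oy + spread, durationMs);
        return "{\"actions\":[" + forefinger + "," + thumb + "]}";
    }

    public static void main(String[] args) {
        System.out.println(zoom(200, 400, 100, 500));
    }
}
```

Each pointer has four actions, so both fingers press, slide, and lift in the same ticks. Screen y coordinates grow downwards, which is why moving up and right means adding to x and subtracting from y.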

Example: Let’s Draw a Surprised Face

     Let’s take a look at some actual Java code. Because the W3C Actions API is so new, there aren’t a whole lot of helper methods in the Java client we can use to make our life easier. The helper methods which do exist are pretty boring, basically implementing moving to and tapping on elements, with code like:

Actions actions = new Actions(driver);
actions.moveToElement(element).click().perform(); // element: a WebElement located earlier

     But this is the kind of thing we can pretty much do already, without the Actions API. What about something cool, like drawing arbitrary shapes? Let’s teach Appium to draw some circles so we can play around with a “surprised face” picture (just to keep things simple—as an exercise to the reader it would be interesting to augment the drawing methods to be able to also draw half-circles, so that our face could be more smiley and less surprised). If we’re going to draw some circles, the first thing we’ll need is some math, so we can get the coordinates for points along a circle:

private Point getPointOnCircle(int step, int totalSteps, Point origin, double radius) {
    // Angle (in radians) of this step around the full circle
    double theta = 2 * Math.PI * ((double) step / totalSteps);
    int x = (int) Math.floor(Math.cos(theta) * radius);
    int y = (int) Math.floor(Math.sin(theta) * radius);
    return new Point(origin.x + x, origin.y + y);
}

     The idea here is that we’re going to define a circle by an origin coordinate, a radius, and a number of “steps”, which controls how fine-grained our circle should be. If we pass in a value of 4 for totalSteps, for example, our circle will actually be a square! The greater the number of steps, the more perfect a circle it will appear. Then we use the magic of trigonometry to determine, for a given iteration (“step”), which point our “finger” should be on. Now we need to use this method to actually do some drawing with Appium:

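private void drawCircle(AppiumDriver driver, Point origin, double radius, int steps) {
    Point firstPoint = getPointOnCircle(0, steps, origin, radius);
    PointerInput finger = new PointerInput(Kind.TOUCH, "finger");
    Sequence circle = new Sequence(finger, 0);
    // Move to the first point instantly, then touch down (NO_TIME, STEP_DURATION, VIEW defined elsewhere)
    circle.addAction(finger.createPointerMove(NO_TIME, VIEW, firstPoint.x, firstPoint.y));
    circle.addAction(finger.createPointerDown(MouseButton.LEFT.asArg()));
    // Trace the circle one step at a time, ending back at the starting angle
    for (int i = 1; i < steps + 1; i++) {
        Point point = getPointOnCircle(i, steps, origin, radius);
        circle.addAction(finger.createPointerMove(STEP_DURATION, VIEW, point.x, point.y));
    }
    // Lift the finger and send the whole sequence to the server
    circle.addAction(finger.createPointerUp(MouseButton.LEFT.asArg()));
    driver.perform(Arrays.asList(circle));
}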

     In this drawCircle method we see the use of the low-level Actions API in the Java client. Using the PointerInput class we create a virtual “finger” to do the drawing, and a Sequence of actions corresponding to that input, which we will populate as we go on. From here on out we’re just calling methods on our input to create specific actions, for example moving, touching the pointer to the screen, and lifting the pointer up. (In doing this we utilize some timing constants defined elsewhere). Finally, we hand the sequence off to the driver to perform! This method is a perfectly general way of drawing a circle with Appium using the W3C Actions API. But it is not yet enough to draw a surprised face. For that, we need to specify which circles we want to draw, at which coordinates:

public void drawFace() {
    Point head = new Point(220, 450);
    Point leftEye = head.moveBy(-50, -50);
    Point rightEye = head.moveBy(50, -50);
    Point mouth = head.moveBy(0, 50);
    drawCircle(driver, head, 150, 30);
    drawCircle(driver, leftEye, 20, 20);
    drawCircle(driver, rightEye, 20, 20);
    drawCircle(driver, mouth, 40, 20);
}

     Here we simply define the center points of our various face components (head, eyes, and mouth), and then draw a circle of an appropriate size and with an appropriate arrangement of parts so that it kind of looks like a face.

Try drawing your own shapes using the Actions API logic above.
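If you want to check the geometry before running anything on a device, the circle math can be exercised on its own. The sketch below, with a hypothetical CirclePoints class, reuses the getPointOnCircle arithmetic but returns plain {x, y} arrays instead of Selenium Point objects, and collects every point the finger would visit.

```java
import java.util.ArrayList;
import java.util.List;

public class CirclePoints {
    // Same arithmetic as getPointOnCircle, returning {x, y} instead of a Selenium Point
    static int[] pointOnCircle(int step, int totalSteps, int originX, int originY, double radius) {
        double theta = 2 * Math.PI * ((double) step / totalSteps);
        int x = (int) Math.floor(Math.cos(theta) * radius);
        int y = (int) Math.floor(Math.sin(theta) * radius);
        return new int[] {originX + x, originY + y};
    }

    // Every point the finger visits: step 0 plus steps 1..totalSteps, as in drawCircle
    static List<int[]> path(int totalSteps, int originX, int originY, double radius) {
        List<int[]> points = new ArrayList<>();
        for (int i = 0; i <= totalSteps; i++) {
            points.add(pointOnCircle(i, totalSteps, originX, originY, radius));
        }
        return points;
    }

    public static void main(String[] args) {
        for (int[] p : path(4, 0, 0, 10)) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

With 4 steps the path visits the four compass points and closes back near the start, which is the square mentioned earlier. Because of floating-point rounding, Math.floor can occasionally shift a point by a pixel; that is harmless for drawing, but worth knowing if you compare coordinates exactly.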

Reference: Appium Pro


Pick the Right Locator Strategy during Mobile Automation


   Here we focus only on the locator strategies Appium provides for native iOS and Android testing with the UiAutomator2 and XCUITest drivers. Here is a prioritized list of locator strategies:

  1. accessibility id
  2. id
  3. XPath
  4. Class name
  5. Locators interpreted by the underlying automation frameworks, such as: -android uiautomator, -ios predicate string, -ios class chain
  6. -image

1. accessibility id
That this is the top choice should surprise nobody. If you have the option of using accessibility IDs, use them. Normally an app developer needs to add these specifically to UI elements in the code. The major benefit of accessibility IDs over the plain id locator strategy is that while app developers add these IDs for testing, users with disabilities or accessibility needs also benefit: people who use screen readers or other assistive devices, and algorithms which inspect UIs, can navigate your application more easily. On Android, this locator strategy uses the accessibility contentDescription property. On iOS, it uses the accessibility identifier. Here’s something surprising: in the XCUITest driver, the accessibility id, id, and name locator strategies are all identical.

  They are implemented the same way. Go ahead and try switching your locator strategies; you will get the same results. This may change in the future, but for now you can find an element by its name or the text in it, because iOS has many ways of setting a default accessibility identifier if the developer does not provide one.

2. id
Element IDs need to be added by a developer, but they allow us to pinpoint the exact element in the app we care about, even if the UI changes appearance. The drawback is you need to be able to talk to your developers. Many testing teams do not have this luxury.

   This locator strategy is pretty similar to accessibility id except that you don’t get the added benefit of accessibility. As noted above, on iOS, this is actually identical to accessibility id. On Android, the id locator strategy is implemented using Resource IDs. These are usually added to UI elements manually by app developers, though they are not required, so many app developers omit them if they don’t think they’re important.

3. XPath
Now this is contentious. XPath is the most expressive and commonly accepted locator strategy. Despite Appium developers warning against XPath’s low performance for years, it still seems to be the most popularly used locator strategy. This is probably because there are many selections that can’t easily be made any other way. For example, there’s no way to select the parent of an element using the simple id selectors. The benefit of being able to express more complicated queries must outweigh the cost to performance for all but the testers whose apps have such large XML element hierarchies that XPath is completely unusable.

     XPath selectors can be very brittle, but they can be responsibly wielded to great effect. Being intentional and carefully picking selectors rather than taking whatever an inspector provides can mitigate the brittleness.

    I think part of the popularity of XPath stems from its use with Selenium and web development, as well as its being the default in many tutorials and inspection tools. When working on Appium I always expected our XPath handling to break more often, but I remember few bugs, probably thanks to the open source XPath libraries built for more general use.

    The Android OS provides a useful dumpWindowHierarchy which gives us an XML document of all the elements on the screen. From there we apply the XPath query and find elements.

    iOS does not supply a method of getting the entire hierarchy. Appium’s implementation starts at the root application element, recurses through each element’s children, and populates an XML document to which we can then apply the XPath query. I still think XPath is unintuitive, especially for those new to programming, but at least it’s a well-accepted industry standard.
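The “apply the XPath query to an XML document” step can be tried with nothing but the JDK. The sketch below is not Appium’s implementation: it parses a hypothetical, heavily simplified page source with javax.xml and evaluates an XPath expression against it, including a parent selection (/..) of the kind the simple id selectors cannot express.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HierarchyXPath {
    // Hypothetical, heavily simplified page source; real hierarchy dumps carry many more attributes
    public static final String SOURCE =
            "<hierarchy>"
            + "<android.widget.LinearLayout resource-id=\"com.example:id/row\">"
            + "<android.widget.TextView text=\"Username\"/>"
            + "<android.widget.EditText resource-id=\"com.example:id/username\"/>"
            + "</android.widget.LinearLayout>"
            + "</hierarchy>";

    // Parses the XML and returns every node matching the XPath expression
    public static NodeList query(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select the parent of the username field, which a plain id lookup cannot express
        NodeList parents = query(SOURCE, "//*[@resource-id='com.example:id/username']/..");
        System.out.println(((Element) parents.item(0)).getAttribute("resource-id"));
    }
}
```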

4. -android uiautomator, -ios predicate string or -ios class chain
These are the “native” locator strategies because they are provided by Appium as a means of creating selectors in the native automation frameworks supported by the device. These locator strategies have many fans, who love the fine-grained expression and great performance (equal to, or just slightly below, that of accessibility id or id).

    These locator strategies are crucial for those who have UIs which escape the grasp of the other locator strategies or have an element tree which is too large to allow the use of XPath. In my view, they have several drawbacks. These native locator strategies require a more detailed understanding of the underlying automation frameworks. UiAutomator and XCUITest can be hard to use, especially for those less familiar with Android and iOS specifics. These locator strategies are not cross-platform, and knowing the ins and outs of both iOS and Android is challenging.

   In addition, the selectors passed to these native locator strategies are not directly evaluated by the mobile OS. Java, Kotlin, Objective C and Swift all lack an eval function which would allow interpreting a string of text as code. When you send an android uiautomator selector to Appium, the text passes through a very simplistic parser and uses Reflection to reconstruct the objects referenced in the text. Because of this, small mistakes in syntax can throw off the entire selector and only the most common methods are supported. This system is unreliable and often encounters difficult bugs.
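To illustrate why string-based selectors are fragile, here is a toy parser in the same spirit: it reads a chain of method calls from a string and invokes each method by name through reflection. This is a deliberately tiny sketch, not Appium’s actual parser; the Selector class stands in for something like UiSelector, and the grammar (only single string arguments) is an assumption chosen to keep it short.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToySelectorParser {
    // Hypothetical stand-in for a selector class such as UiSelector
    public static class Selector {
        private final StringBuilder description = new StringBuilder("Selector");

        public Selector text(String value) {
            description.append(".text=").append(value);
            return this;
        }

        public Selector className(String value) {
            description.append(".className=").append(value);
            return this;
        }

        @Override
        public String toString() {
            return description.toString();
        }
    }

    private static final String PREFIX = "new Selector()";
    // One chained call with a single double-quoted string argument, e.g. .text("Login")
    private static final Pattern CALL = Pattern.compile("\\.(\\w+)\\(\"([^\"]*)\"\\)");

    public static Selector parse(String source) {
        if (!source.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Selector must start with " + PREFIX);
        }
        String rest = source.substring(PREFIX.length());
        Selector selector = new Selector();
        Matcher m = CALL.matcher(rest);
        int position = 0;
        // Consume calls only while they follow each other with no gaps
        while (m.find() && m.start() == position) {
            try {
                // Look the method up by name at runtime and invoke it reflectively
                Method method = Selector.class.getMethod(m.group(1), String.class);
                selector = (Selector) method.invoke(selector, m.group(2));
            } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
                throw new IllegalArgumentException("Unsupported call: " + m.group(), e);
            }
            position = m.end();
        }
        if (position != rest.length()) {
            throw new IllegalArgumentException("Could not parse: " + rest.substring(position));
        }
        return selector;
    }

    public static void main(String[] args) {
        System.out.println(parse("new Selector().text(\"Login\").className(\"Button\")"));
    }
}
```

A single missing quote or parenthesis, or a misspelled method name, makes the whole selector fail, which mirrors the brittleness described above.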

    If Android or iOS change the testing classes in new updates, your selectors might need to be updated. Using XPath, Appium will keep track of the OS updates and your selectors should keep working.

    A personal quibble I have with these locator strategies is that you are essentially writing a different programming language (such as Java) inside of a string in your test code. Your text editor will not offer syntax highlighting or semantic analysis inside of these queries which makes them harder to maintain. On the other hand, sometimes there’s just no other way, and for those who are proficient in these methods XPath can seem clumsy in comparison.

5. -image
This locator strategy is pretty nifty. Supported on all platforms and drivers, you can pass an image file to Appium and it will try to locate the matching elements on the screen. This supports fuzzy matching, and certainly has its applications, but you should only use it when the other locator strategies aren’t working. It may not always behave deterministically, which means it can be a source of test flakiness. Also, unless you have a special code editor, image selectors may be a little more difficult to work with and maintain. Of course, visual UI changes will always break -image selectors, whereas the other selectors only break when the element hierarchy changes.
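The fuzzy-matching idea behind -image can be pictured with a toy template match. The sketch below is only an illustration with hypothetical names, not how Appium compares images: it slides a small grayscale template over a grayscale “screen”, scores each position by mean absolute pixel difference, and accepts the best position only if its score reaches a threshold.

```java
public class ToyImageMatch {
    // Returns {row, col, scorePercent} of the best match of `template` inside `screen`,
    // or null if no position reaches `threshold` (0.0 to 1.0).
    // Score = 1 - mean absolute pixel difference / 255, on grayscale values 0..255.
    public static int[] find(int[][] screen, int[][] template, double threshold) {
        int th = template.length;
        int tw = template[0].length;
        double bestScore = -1;
        int bestRow = -1;
        int bestCol = -1;
        for (int r = 0; r + th <= screen.length; r++) {
            for (int c = 0; c + tw <= screen[0].length; c++) {
                long diff = 0;
                for (int i = 0; i < th; i++) {
                    for (int j = 0; j < tw; j++) {
                        diff += Math.abs(screen[r + i][c + j] - template[i][j]);
                    }
                }
                double score = 1.0 - (double) diff / (255.0 * th * tw);
                if (score > bestScore) {
                    bestScore = score;
                    bestRow = r;
                    bestCol = c;
                }
            }
        }
        return bestScore >= threshold
                ? new int[] {bestRow, bestCol, (int) Math.round(bestScore * 100)}
                : null;
    }

    public static void main(String[] args) {
        int[][] screen = {{0, 0, 0, 0}, {0, 200, 210, 0}, {0, 190, 200, 0}, {0, 0, 0, 0}};
        int[][] template = {{200, 200}, {200, 200}};
        int[] hit = find(screen, template, 0.9);
        System.out.println(hit == null ? "no match" : hit[0] + "," + hit[1] + " score " + hit[2]);
    }
}
```

The threshold is what makes the match fuzzy, and also why results can flip: a slightly different rendering can push the best score just above or below the cutoff.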

Reference: Appium Pro
