iOS Specific Touch Gestures With Appium


     iOS automation is very popular nowadays, especially with the help of Appium. Here I would like to explain some important iOS-specific touch gestures. Because these are not part of the WebDriver spec, Appium provides access to them by overloading the executeScript command, as you’ll see in the examples below.

mobile: swipe

     This command ultimately calls the XCUIElement.swipe* family of methods provided by XCUITest, and thus takes two parameters: a direction (whether to swipe up, down, left, or right), and the ID of an element within which the swipe is to take place (Appium defaults to the entire Application element if no element is specified). Following is an example:

// swipe up then down
Map<String, Object> args = new HashMap<>();
args.put("direction", "up");
driver.executeScript("mobile: swipe", args);
args.put("direction", "down");
driver.executeScript("mobile: swipe", args);

mobile: scroll

     If you want to try and make sure that each movement of your gesture moves a view by the height of the scrollable content, or if you want to scroll until a particular element is visible, try mobile: scroll. It works similarly to mobile: swipe but takes more parameters:

  • element: the id of the element to scroll within (the application element by default). Call this the “bounding element”
  • direction: the opposite of how direction is used in mobile: swipe. A swipe “up” scrolls the view contents down, which is the same result as a scroll “down”.
  • name: the accessibility ID of an element to scroll to within the bounding element
  • predicateString: the NSPredicate of an element to scroll to within the bounding element
  • toVisible: if true, and if element is set to a custom element, then simply scroll to the first visible child of element

Following are some examples:

// scroll down then up
Map<String, Object> args = new HashMap<>();
args.put("direction", "down");
driver.executeScript("mobile: scroll", args);
args.put("direction", "up");
driver.executeScript("mobile: scroll", args);

// scroll to the last item in the list by accessibility id
args.put("direction", "down");
args.put("name", "Stratus");
driver.executeScript("mobile: scroll", args);

// scroll back to the first item in the list
MobileElement list = (MobileElement) driver.findElement(By.className("XCUIElementTypeScrollView"));
args.put("direction", "up");
args.put("name", null);
args.put("element", list.getId());
driver.executeScript("mobile: scroll", args);
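
The examples above cover direction, name, and element. The predicateString and toVisible parameters could be prepared in the same way. The following is only a minimal sketch: ScrollArgs and its method names are hypothetical helpers (not part of the Appium client), and the predicate text and element ID are made-up illustrations. Each map it returns would be passed to driver.executeScript("mobile: scroll", args).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Appium client: builds argument maps
// for "mobile: scroll" using the parameters described above.
class ScrollArgs {

    // Scroll to the element matching an NSPredicate string.
    // elementId may be null to use the default application element.
    static Map<String, Object> byPredicate(String predicate, String elementId) {
        Map<String, Object> args = new HashMap<>();
        args.put("predicateString", predicate);
        if (elementId != null) {
            args.put("element", elementId);
        }
        return args;
    }

    // toVisible only makes sense together with a custom bounding element,
    // so this sketch rejects a missing element ID.
    static Map<String, Object> toVisible(String elementId) {
        if (elementId == null) {
            throw new IllegalArgumentException("toVisible needs a bounding element");
        }
        Map<String, Object> args = new HashMap<>();
        args.put("element", elementId);
        args.put("toVisible", true);
        return args;
    }

    public static void main(String[] argv) {
        // In a real test: driver.executeScript("mobile: scroll", args);
        System.out.println(byPredicate("label == 'Stratus'", null));
        System.out.println(toVisible("some-element-id"));
    }
}
```

Keeping the map construction separate from the driver call makes it easy to check the arguments without a device attached.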

mobile: pinch

     To pinch (described by a two-finger gesture where the fingers start far apart and come together) or to zoom (described by the inverse gesture where fingers start together and expand outward), use mobile: pinch, which calls XCUIElement.pinch under the hood. As with the other methods described so far, you can pass in an element parameter defining the element in which the pinch will take place (the entire application by default).

The only required parameter is scale:

  • Values between 0 and 1 refer to a “pinch”
  • Values greater than 1 refer to a “zoom”

An additional optional parameter velocity can be sent, which corresponds to “the velocity of the pinch in scale factor per second”. Following is an example:

// zoom in on something
Map<String, Object> args = new HashMap<>();
args.put("scale", 5);
driver.executeScript("mobile: pinch", args);
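
The example above only zooms. For a pinch (scale below 1) with an explicit velocity, a sketch like the following might help. PinchArgs is a hypothetical helper, not an Appium API. It also assumes that the velocity sign should follow the gesture (negative when pinching, positive when zooming), since XCUITest has been reported to reject mismatched values; treat that as an assumption to verify against your own setup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Appium client.
class PinchArgs {

    // scale: between 0 and 1 pinches, greater than 1 zooms.
    // speed: magnitude of the velocity in scale factor per second.
    // Assumption: velocity is negative for a pinch and positive for a zoom,
    // so the sign is derived from scale rather than taken from the caller.
    static Map<String, Object> of(double scale, double speed) {
        if (scale <= 0) {
            throw new IllegalArgumentException("scale must be greater than 0");
        }
        double magnitude = Math.abs(speed);
        Map<String, Object> args = new HashMap<>();
        args.put("scale", scale);
        args.put("velocity", scale < 1 ? -magnitude : magnitude);
        return args;
    }

    public static void main(String[] argv) {
        // In a real test: driver.executeScript("mobile: pinch", args);
        System.out.println(of(0.5, 1.0)); // pinch to half size
        System.out.println(of(2.0, 1.0)); // zoom to double size
    }
}
```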

mobile: tap

The best way to tap on an element is using element.click(). So why do we have mobile: tap? This method allows for extra parameters x and y signifying the coordinate at which to click. The nice thing is that this coordinate is either screen-relative (if an element parameter is not included, the default), or element-relative (if an element parameter is included).

This means that you can tap at the very top left corner of an element rather than dead center. Following is an example:

// tap an element very near its top left corner
Map<String, Object> args = new HashMap<>();
args.put("element", ((MobileElement) element).getId());
args.put("x", 2);
args.put("y", 2);
driver.executeScript("mobile: tap", args);

mobile: doubleTap

     There’s more to tapping than single-tapping! And while you can certainly build a double-tap option using the Actions API, XCUITest provides an XCUIElement.doubleTap method for this purpose, and it could presumably have greater reliability than synthesizing your own action. In terms of parameters, you should send in either an element parameter, with the ID of the element you want to tap, or both an x and y value representing the screen coordinate you wish to tap. Following is an example:

// double-tap the screen at a specific point
Map<String, Object> args = new HashMap<>();
args.put("x", 100);
args.put("y", 200);
driver.executeScript("mobile: doubleTap", args);

mobile: twoFingerTap

    Not to be confused with a double-tap, a two-finger-tap is a single tap using two fingers! This method has only one parameter, which is required: good old element (it only works in the context of an element, not a point on the screen). Following is an example:

// two-finger-tap an element (assume element object already exists)
Map<String, Object> args = new HashMap<>();
args.put("element", ((MobileElement) element).getId());
driver.executeScript("mobile: twoFingerTap", args);

mobile: touchAndHold

    Many iOS apps allow a user to trigger special behavior by tapping and holding the finger down on a certain UI element. You can specify all the same parameters as for doubleTap (element, x, and y) with the same semantics. In addition, you must set the duration parameter to specify how many seconds you want the touch to be held. Following is an example:

// touch and hold an element
Map<String, Object> args = new HashMap<>();
args.put("element", ((MobileElement) element).getId());
args.put("duration", 1.5);
driver.executeScript("mobile: touchAndHold", args);

mobile: dragFromToForDuration

     Another commonly-implemented app gesture is “drag-and-drop”. As with all of these gestures, it’s possible to build a respectable drag-and-drop using the Actions API, but if for some reason this doesn’t work, XCUITest has provided a method directly for this purpose. It’s a method on the XCUICoordinate class. Really, what’s going on is that we’re defining a start and an end coordinate, and also the duration of the hold on the start coordinate. In other words, we have no control over the drag duration itself, only on how long the first coordinate is held before the drag happens. Following are the required parameters:

  • element: an element ID, which if provided will cause Appium to treat the coordinates as relative to this element. Absolute screen coordinates otherwise.
  • duration: the number of seconds (between 0.5 and 6.0) that the start coordinates should be held
  • fromX: the x-coordinate of the start position
  • fromY: the y-coordinate of the start position
  • toX: the x-coordinate of the end position
  • toY: the y-coordinate of the end position

Following is an example:

// touch, hold, and drag based on coordinates
Map<String, Object> args = new HashMap<>();
args.put("duration", 1.5);
args.put("fromX", 100);
args.put("fromY", 100);
args.put("toX", 300);
args.put("toY", 600);
driver.executeScript("mobile: dragFromToForDuration", args);
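
In practice the start and end coordinates usually come from two elements rather than from hard-coded numbers. The sketch below computes the center of a source and a target rectangle and checks the hold duration against the 0.5 to 6.0 second range listed above. DragArgs is a hypothetical helper, and the {x, y, width, height} arrays stand in for values you would read from each element's rectangle in a real test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Appium client.
class DragArgs {

    // from and to are rectangles given as {x, y, width, height} in absolute
    // screen coordinates. The drag goes from the center of the first
    // rectangle to the center of the second.
    static Map<String, Object> betweenCenters(int[] from, int[] to, double holdSeconds) {
        if (holdSeconds < 0.5 || holdSeconds > 6.0) {
            throw new IllegalArgumentException("duration must be between 0.5 and 6.0 seconds");
        }
        Map<String, Object> args = new HashMap<>();
        args.put("duration", holdSeconds);
        args.put("fromX", from[0] + from[2] / 2);
        args.put("fromY", from[1] + from[3] / 2);
        args.put("toX", to[0] + to[2] / 2);
        args.put("toY", to[1] + to[3] / 2);
        return args;
    }

    public static void main(String[] argv) {
        int[] source = {50, 50, 100, 100};
        int[] target = {250, 550, 100, 100};
        // In a real test: driver.executeScript("mobile: dragFromToForDuration", args);
        System.out.println(betweenCenters(source, target, 1.5));
    }
}
```

With these sample rectangles the computed coordinates are the same as in the hard-coded example: from (100, 100) to (300, 600).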

Please try to practice all of these touch gestures in your own iOS automation with Appium.

Reference: Appium Pro

Automate Complex Touch Gestures With Appium


     Mobile apps often involve the use of touch gestures, sometimes in a very complex fashion. Compared to web apps, touch gestures are common and crucial for mobile apps. Even navigating a list of items requires a flick or a swipe on mobile, and often the difference between these two actions can make a meaningful difference in the behavior of an app.

  The story of Appium’s support for these gestures is as complex as the gestures themselves. Because Appium has the goal of compatibility with WebDriver, we’ve also had to evolve in step with some of the changes in the WebDriver protocol. When Appium was first developed, there were two different ways of getting at actions using WebDriver’s existing JSON Wire Protocol. These APIs were designed for automating web browsers, driven by a mouse, so needless to say they didn’t map well to the more generally gestural world of mobile app use. To make things worse, the iOS and Android automation technologies Appium was built on top of did not expose useful general gesture primitives. They each exposed their own platform-specific API, with commands like swipe that took different parameters and behaved differently to each other, as well as to the intention of the JSON Wire Protocol.

       Appium thus faced two challenges: an inadequate protocol spec, and an inadequate and variable set of basic mobile APIs provided by Apple and Google. Our approach was to implement the JSON Wire Protocol as faithfully as possible, but also to provide direct access to the platform-specific APIs, via the executeScript command. We defined a special prefix, mobile:, and implemented access to these non-standard APIs behind that prefix. So, users could run commands like driver.executeScript("mobile: swipe", args) to directly activate the iOS-specific swipe method provided by Apple. It was a bit hacky, but gave users control over whether they wanted to stick to Appium’s implementation of the standard JSON Wire Protocol, or gain direct access to the platform-specific methods.

     Meanwhile, the Appium team learned that the Selenium project was working on a better, more general specification for touch actions. This new API was proposed as part of the new W3C WebDriver Spec, which was under heavy development at the time. The Appium team then implemented this new API, and gave Appium clients another way to automate touch actions, which we thought would be the standard moving forward. Unfortunately, this was an erroneous assumption. Appium was too quick to implement this new Actions spec: the spec itself changed and was recently ratified in a different incarnation than what Appium had originally supported. At least the spec is better now!

   That brings us to today. Over the years the mobile automation technologies Appium uses have themselves evolved, and clever people have uncovered new APIs that allow Appium to perform totally (or almost totally) arbitrary gestures. The W3C WebDriver Spec is also now an official thing, including the most recent incarnation of the Actions API. The confluence of these two factors means that, since Appium 1.8, it’s been possible for Appium to support the W3C Actions API for complex and general gestures, for example in drawing a picture.

      Why do we care about the W3C API specifically? Apart from Appium’s desire to match the official WebDriver standard, the Appium clients are built directly on top of the Selenium WebDriver clients. As the Selenium clients change to accommodate only the W3C APIs, that means Appium will need to support them or risk getting out of phase with the updated clients.

The Actions API

     The W3C Actions API is very general, which also makes it abstract and a bit hard to understand. Basically, it has the concept of input sources, many of which can exist, and each of which must be of a certain type (like key or pointer), potentially a subtype (like mouse, pen, touch), and have a certain id (like “default mouse”). Pointer inputs can register actions like pointerMove, pointerUp, pointerDown, and pause. By defining one (or more) pointer inputs, each with a set of actions and corresponding parameters, we can define pretty much any gesture you like.

    Conceptually, for example, a “zoom” gesture consists of two pointer input sources, each of which would register a series of actions:

Pointer 1 (type touch, id "forefinger")
          - pointerMove to zoom origin coordinate, with no duration
          - pointerDown
          - pointerMove to a coordinate diagonally up and right, with duration X
          - pointerUp
Pointer 2 (type touch, id "thumb")
          - pointerMove to zoom origin coordinate, with no duration
          - pointerDown
          - pointerMove to a coordinate diagonally down and left, with duration X
          - pointerUp

     These input sources, along with their actions, get bundled up into one JSON object and sent to the server when you call driver.perform(). The server then unpacks the input sources and actions and interprets them appropriately, each input source’s actions being played at the same time (each action taking up one “tick” of virtual time, to keep actions synchronized across input sources).
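
To make the bundling step more concrete, here is a rough sketch of the structure the two zoom pointers would turn into, built with plain Java maps and lists so it runs without any client library. ZoomPayload and its method names are hypothetical. The field names (type, id, parameters, pointerType, duration, origin, button) follow the W3C Actions wire format as I understand it, so verify them against the spec before relying on them. A real client builds and sends this for you.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the actions payload for a two-finger zoom.
class ZoomPayload {

    private static Map<String, Object> move(int durationMs, int x, int y) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("type", "pointerMove");
        a.put("duration", durationMs);
        a.put("origin", "viewport");
        a.put("x", x);
        a.put("y", y);
        return a;
    }

    private static Map<String, Object> press(String type) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("type", type); // "pointerDown" or "pointerUp"
        a.put("button", 0);
        return a;
    }

    // One touch input source: move to start, press, drag to end, release.
    static Map<String, Object> finger(String id, int startX, int startY,
                                      int endX, int endY, int durationMs) {
        List<Map<String, Object>> actions = new ArrayList<>();
        actions.add(move(0, startX, startY));
        actions.add(press("pointerDown"));
        actions.add(move(durationMs, endX, endY));
        actions.add(press("pointerUp"));

        Map<String, Object> parameters = new LinkedHashMap<>();
        parameters.put("pointerType", "touch");

        Map<String, Object> source = new LinkedHashMap<>();
        source.put("type", "pointer");
        source.put("id", id);
        source.put("parameters", parameters);
        source.put("actions", actions);
        return source;
    }

    // Both fingers start at (x, y) and move apart diagonally.
    // Screen y grows downward, so "up and right" is (x + d, y - d).
    static Map<String, Object> zoom(int x, int y, int distance, int durationMs) {
        List<Map<String, Object>> sources = new ArrayList<>();
        sources.add(finger("forefinger", x, y, x + distance, y - distance, durationMs));
        sources.add(finger("thumb", x, y, x - distance, y + distance, durationMs));
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("actions", sources);
        return payload;
    }

    public static void main(String[] argv) {
        System.out.println(zoom(200, 300, 100, 500));
    }
}
```

Both input sources contain the same number of actions, which is what keeps them aligned tick by tick when the server plays them back.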

Example: Let’s Draw a Surprised Face

     Let’s take a look at some actual Java code. Because the W3C Actions API is so new, there aren’t a whole lot of helper methods in the Java client we can use to make our life easier. The helper methods which do exist are pretty boring, basically implementing moving to and tapping on elements, with code like:

Actions actions = new Actions(driver);
actions.click(element).perform(); // assumes `element` was already located

     But this is the kind of thing we can pretty much do already, without the Actions API. What about something cool, like drawing arbitrary shapes? Let’s teach Appium to draw some circles so we can play around with a “surprised face” picture (just to keep things simple—as an exercise to the reader it would be interesting to augment the drawing methods to be able to also draw half-circles, so that our face could be more smiley and less surprised). If we’re going to draw some circles, the first thing we’ll need is some math, so we can get the coordinates for points along a circle:

private Point getPointOnCircle (int step, int totalSteps, Point origin, double radius) {
    // angle of this step, as a fraction of the full circle
    double theta = 2 * Math.PI * ((double)step / totalSteps);
    int x = (int)Math.floor(Math.cos(theta) * radius);
    int y = (int)Math.floor(Math.sin(theta) * radius);
    return new Point(origin.x + x, origin.y + y);
}
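
A quick way to get a feel for this method is to run the same formula on its own, without a driver. The sketch below is a standalone copy that uses plain int arrays instead of Selenium's Point class (CirclePoints is a hypothetical name), and prints the points produced for a 4-step circle of radius 100 around (0, 0).

```java
// Standalone copy of the circle math, using int[] {x, y} instead of Point.
class CirclePoints {

    static int[] pointOnCircle(int step, int totalSteps, int originX, int originY, double radius) {
        // angle of this step, as a fraction of the full circle
        double theta = 2 * Math.PI * ((double) step / totalSteps);
        int x = (int) Math.floor(Math.cos(theta) * radius);
        int y = (int) Math.floor(Math.sin(theta) * radius);
        return new int[] {originX + x, originY + y};
    }

    public static void main(String[] args) {
        // Steps 0..4 inclusive: the last step returns to the starting side,
        // which is what closes the shape.
        for (int i = 0; i <= 4; i++) {
            int[] p = pointOnCircle(i, 4, 0, 0, 100);
            System.out.println("step " + i + ": (" + p[0] + ", " + p[1] + ")");
        }
    }
}
```

With only 4 steps, the points sit roughly at the right, bottom, left, and top of the circle (screen y grows downward), so joining them gives a diamond rather than a round shape. Because of floating-point rounding combined with Math.floor, a coordinate can land one pixel off the ideal value, which does not matter for drawing.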

     The idea here is that we’re going to define a circle by an origin coordinate, a radius, and a number of “steps”—how fine-grained our circle should be. If we pass in a value of 4 for totalSteps, for example, our circle will actually be a square! The greater the number of steps, the more perfect a circle it will appear. Then we use the magic of Trigonometry to determine, for a given iteration (“step”), which point our “finger” should be on. Now we need to use this method to actually do some drawing with Appium:

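private void drawCircle (AppiumDriver driver, Point origin, double radius, int steps) {
    Point firstPoint = getPointOnCircle(0, steps, origin, radius);
    PointerInput finger = new PointerInput(Kind.TOUCH, "finger");
    Sequence circle = new Sequence(finger, 0);
    // move to the first point instantly, then touch the screen
    circle.addAction(finger.createPointerMove(NO_TIME, VIEW, firstPoint.x, firstPoint.y));
    circle.addAction(finger.createPointerDown(MouseButton.LEFT.asArg()));
    for (int i = 1; i < steps + 1; i++) {
        Point point = getPointOnCircle(i, steps, origin, radius);
        circle.addAction(finger.createPointerMove(STEP_DURATION, VIEW, point.x, point.y));
    }
    // lift the finger and hand the whole sequence to the driver
    circle.addAction(finger.createPointerUp(MouseButton.LEFT.asArg()));
    driver.perform(Arrays.asList(circle));
}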

     In this drawCircle method we see the use of the low-level Actions API in the Java client. Using the PointerInput class we create a virtual “finger” to do the drawing, and a Sequence of actions corresponding to that input, which we will populate as we go on. From here on out we’re just calling methods on our input to create specific actions, for example moving, touching the pointer to the screen, and lifting the pointer up. (In doing this we utilize some timing constants defined elsewhere). Finally, we hand the sequence off to the driver to perform! This method is a perfectly general way of drawing a circle with Appium using the W3C Actions API. But it is not yet enough to draw a surprised face. For that, we need to specify which circles we want to draw, at which coordinates:

public void drawFace() {
    // center points of the face components, relative to the head
    Point head = new Point(220, 450);
    Point leftEye = head.moveBy(-50, -50);
    Point rightEye = head.moveBy(50, -50);
    Point mouth = head.moveBy(0, 50);
    drawCircle(driver, head, 150, 30);
    drawCircle(driver, leftEye, 20, 20);
    drawCircle(driver, rightEye, 20, 20);
    drawCircle(driver, mouth, 40, 20);
}

     Here we simply define the center points of our various face components (head, eyes, and mouth), and then draw a circle of an appropriate size at each one, arranged so that the result kind of looks like a face.
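
For the half-circle exercise mentioned earlier, one possible starting point is to generalize the circle math to an arc with a start angle and a sweep angle. This is only a sketch of one approach, not the article's solution: ArcPoints and pointOnArc are hypothetical names, plain int arrays stand in for Point, and Math.round is used instead of Math.floor as a design choice to keep points centered.

```java
// Hypothetical generalization of the circle math to partial arcs.
class ArcPoints {

    // startAngle and sweepAngle are in radians. Screen y grows downward,
    // so angles from 0 to PI trace the lower half of the circle (a smile).
    static int[] pointOnArc(int step, int totalSteps, int originX, int originY,
                            double radius, double startAngle, double sweepAngle) {
        double theta = startAngle + sweepAngle * ((double) step / totalSteps);
        int x = (int) Math.round(Math.cos(theta) * radius);
        int y = (int) Math.round(Math.sin(theta) * radius);
        return new int[] {originX + x, originY + y};
    }

    public static void main(String[] args) {
        // A smile around the mouth position used in drawFace: (220, 500), radius 40.
        int steps = 20;
        for (int i = 0; i <= steps; i++) {
            int[] p = pointOnArc(i, steps, 220, 500, 40, 0, Math.PI);
            System.out.println("(" + p[0] + ", " + p[1] + ")");
        }
    }
}
```

A drawing method built on this would follow the same pattern as drawCircle: move to the first point, press down, move through the remaining points, and lift up.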

Please try drawing circles with the Actions API using the logic above.

Reference: Appium Pro
