AI-based Object Detection Anywhere On The Screen


      Object identification is one of the most tedious tasks in automation. Some objects are dynamic in nature, so we rely on relative XPath expressions to make their locators robust. Even then, there are situations where no locator strategy reliably identifies the element we need on a given screen or page. For mobile automation, Appium offers an AI-based method to find objects anywhere on the screen. I have already explained most of the implementation details of the Appium Classifier Plugin in one of my earlier articles. In this article, we will see how AI-based object detection helps in automation. It is made possible by the test-ai-classifier plugin working together with Appium.

     First, let’s look at the plugin’s two modes. The default mode is called element lookup mode. Let’s say we are trying to find an icon that looks like a clock on the screen. In this mode, Appium finds every leaf-node element in the view, generates an image of how each one appears on the screen, and then checks whether that image looks like a clock. If it does, the plugin remembers the element associated with that image and returns it as the result of the call to findElement.

     This mode works great for finding many icons. However, it can run into some big problems. Let’s say we are working with the Files app on iOS and we want to find and tap the Recents button at the bottom of the screen. There is an icon that looks like a clock just above the Recents label, so we might be tempted to find the element using the clock label:

driver.findElement(MobileBy.custom("ai:clock")).click();

     This might work, depending on the confidence level set, but it will probably fail. On iOS especially, the clock icon is not its own element; it is grouped together with the accompanying text into a single element. When a screenshot of that element is fed into the classifier model, the arbitrary text included in the image can confuse the model and lead to a bad result.

     The second mode is called object detection mode. Object detection is a feature of many image-based machine learning models: it takes an input image and outputs a list of regions within that image that count as distinct objects. In object detection mode, the plugin takes a screenshot of the entire screen, runs it through an object detection model, and then cuts the screen into regions based on the objects the model detected. Those cut-up images are then sent through the same classifier used in element lookup mode, so we can see whether any of them match the type of icon we are looking for.
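     The two-stage flow described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the plugin’s actual code: the Region and Prediction classes, the findMatches method, the stub classifier, and the minConfidence parameter are all hypothetical names, and the list of detected regions stands in for the output of a real object detection model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of "detect regions, then classify each region".
class ObjectDetectionSketch {

    // A rectangular screen region, in pixels.
    static final class Region {
        final int x, y, width, height;
        Region(int x, int y, int width, int height) {
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
        int centerX() { return x + width / 2; }
        int centerY() { return y + height / 2; }
    }

    // A classifier verdict for one cropped region.
    static final class Prediction {
        final String label;
        final double confidence;
        Prediction(String label, double confidence) {
            this.label = label; this.confidence = confidence;
        }
    }

    // Stage 1 output (the detected regions) is passed in. Stage 2 classifies
    // each region and keeps those matching the wanted label confidently enough.
    static List<Region> findMatches(List<Region> detectedRegions,
                                    Function<Region, Prediction> classifier,
                                    String wantedLabel,
                                    double minConfidence) {
        List<Region> matches = new ArrayList<>();
        for (Region region : detectedRegions) {
            Prediction p = classifier.apply(region);
            if (p.label.equals(wantedLabel) && p.confidence >= minConfidence) {
                matches.add(region);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Pretend the detector found two objects on the screen.
        List<Region> regions = Arrays.asList(
                new Region(10, 700, 40, 40),    // e.g. a clock-like icon
                new Region(120, 700, 40, 40));  // e.g. a folder-like icon

        // Stub classifier: a real one would run a model on the cropped image.
        Function<Region, Prediction> stub = r ->
                r.x == 10 ? new Prediction("clock", 0.95)
                          : new Prediction("folder", 0.97);

        // A match is only a screen location, so the natural action is a tap
        // at the center of the region.
        for (Region match : findMatches(regions, stub, "clock", 0.9)) {
            System.out.println("Tap at " + match.centerX() + "," + match.centerY());
        }
    }
}
```

     Note that the result is a list of screen regions rather than UI elements, which is exactly the trade-off this mode makes.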

     The main advantage of object detection mode is that it no longer matters whether there is a standalone clock element somewhere on the screen. This is also its main disadvantage: in this mode we are not finding actual elements, only locations on the screen where we believe an image element to be. Below is an excerpt of the complete script, showing the desired capabilities and settings required when working with image elements:

DesiredCapabilities caps = new DesiredCapabilities();
caps.setCapability("platformName", "iOS");
caps.setCapability("automationName", "XCUITest");
caps.setCapability("platformVersion", "11.4");
caps.setCapability("deviceName", "iPhone X");
caps.setCapability("bundleId", BUNDLE_ID);
HashMap<String, String> customFindModules = new HashMap<>();
customFindModules.put("ai", "test-ai-classifier");
caps.setCapability("customFindModules", customFindModules);
caps.setCapability("testaiFindMode", "object_detection");
caps.setCapability("testaiObjectDetectionThreshold", "0.9");
caps.setCapability("shouldUseCompactResponses", false);

     Once all the required desired capabilities are defined, you can use object detection mode in your test case, as in the sample snippet below:

driver.setSetting(Setting.CHECK_IMAGE_ELEMENT_STALENESS, false);
driver.findElement(MobileBy.custom("ai:clock")).click();

     Try the test-ai-classifier model to detect objects anywhere on the screen, and experiment with the sample snippet above in your own mobile automation scripts.

Reference: Appium Pro

make it perfect!


Find Element By Image Locator Strategy in Appium


      Picking the right locator strategy helps stabilize a mobile automation flow. In one of my previous articles, I discussed how to select the right locator strategy, and there I mentioned the -image selector.

    The -image locator strategy is pretty nifty. Supported on all platforms and drivers, it lets you pass an image file to Appium, which will try to locate matching elements on the screen. It supports fuzzy matching and certainly has its applications, but you should use it only when the other locator strategies aren’t working. It may not always behave deterministically, which means it can be a source of test flakiness. Also, unless you have a special code editor, image selectors may be a little more difficult to work with and maintain. And of course, visual UI changes will always break -image selectors, whereas the other selectors break only when the element hierarchy changes. In this article, I would like to explain how to find elements by image.

     The Appium team finally decided to bite the bullet and support a small set of visual detection features, which are available from Appium 1.9.0 onward. An image element looks to your client code exactly like any other element, except that you’ve found it via the new -image locator strategy. Instead of a typical selector (like “foo”), the strings used with this locator strategy are Base64-encoded image files. The image file used for a particular find action is a template image that Appium matches against regions of the screen in order to find the most likely occurrence of the element you’re looking for.
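     To make the idea of matching a template against regions of the screen concrete, here is a deliberately naive sketch in plain Java. It is not how Appium implements image matching; it assumes tiny grayscale pixel grids and scores each candidate position by the sum of absolute pixel differences, keeping the lowest score. The class and method names are hypothetical.

```java
// Naive template matching over grayscale pixel grids; illustration only.
class TemplateMatchSketch {

    // Returns {row, col} of the top-left corner where the template fits best,
    // i.e. where the sum of absolute pixel differences is smallest.
    static int[] findBestMatch(int[][] screen, int[][] template) {
        int th = template.length;
        int tw = template[0].length;
        int bestRow = -1;
        int bestCol = -1;
        long bestScore = Long.MAX_VALUE;
        // Slide the template over every position where it fits entirely.
        for (int r = 0; r + th <= screen.length; r++) {
            for (int c = 0; c + tw <= screen[0].length; c++) {
                long score = 0;
                for (int i = 0; i < th; i++) {
                    for (int j = 0; j < tw; j++) {
                        score += Math.abs(screen[r + i][c + j] - template[i][j]);
                    }
                }
                if (score < bestScore) {
                    bestScore = score;
                    bestRow = r;
                    bestCol = c;
                }
            }
        }
        return new int[] {bestRow, bestCol};
    }

    public static void main(String[] args) {
        int[][] screen = {
            {0, 0, 0, 0},
            {0, 0, 9, 8},
            {0, 0, 7, 9},
        };
        // Close to, but not identical with, the bottom-right patch of the
        // screen, so the match is fuzzy rather than exact.
        int[][] template = {
            {9, 9},
            {8, 9},
        };
        int[] pos = findBestMatch(screen, template);
        System.out.println("Best match at row " + pos[0] + ", col " + pos[1]);
    }
}
```

     Because the best position is chosen by score rather than by exact equality, small visual differences are tolerated, which is also why image matching can occasionally pick the wrong region.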

     Suppose you have a scenario where you need to tap a particular image in your gallery and validate it. None of the images have any identifying information in the UI tree, and their order changes every time we load the view, so we can’t hardcode an element index if we want to tap a particular image. Find-by-image to the rescue. Using this strategy is the same as finding an element with any other strategy:

WebElement element = driver.findElementByImage(base64EncodedImageFile);

     Of course, for this to work we need a Base64-encoded version of the image file. In Java 8 this is pretty straightforward:

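// Assume refImgUrl is the URL of the reference image, e.g. fetched via the ClassLoader class
File myImageFile = Paths.get(refImgUrl.toURI()).toFile();
// Read the file's bytes and encode them as a Base64 string
String base64EncodedImageFile = Base64.getEncoder().encodeToString(Files.readAllBytes(myImageFile.toPath()));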
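     For readers who want to try the encoding step on its own, here is a small self-contained sketch that uses only the Java standard library. The class name, the helper methods, and the temporary stand-in file are assumptions for illustration; in a real test you would point encodeFile at your actual reference image.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

class Base64ImageSketch {

    // Encodes raw bytes as a Base64 string.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Reads a file and returns its contents as a Base64 string.
    static String encodeFile(Path imagePath) throws IOException {
        return encode(Files.readAllBytes(imagePath));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real image: a temp file holding the 8-byte PNG signature.
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        Path tmp = Files.createTempFile("reference", ".png");
        try {
            Files.write(tmp, fakeImage);
            String encoded = encodeFile(tmp);
            // Decoding should give back exactly the bytes that were written.
            boolean roundTrips =
                    Arrays.equals(fakeImage, Base64.getDecoder().decode(encoded));
            System.out.println(encoded + " (round trip ok: " + roundTrips + ")");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

     The resulting string is what gets passed to the image locator in place of a conventional selector.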

     Another great thing is that finding elements by image supports both implicit and explicit wait strategies, so your tests can robustly wait until your reference image matches something on the screen:

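By image = MobileBy.image(base64EncodedImageFile);
// Wait up to 10 seconds for the reference image to match, then tap it
new WebDriverWait(driver, 10)
    .until(ExpectedConditions.presenceOfElementLocated(image)).click();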
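     The general idea behind an explicit wait is a polling loop: retry the lookup until it succeeds or a deadline passes. The sketch below shows that idea in plain Java with a stubbed lookup. It is not Selenium’s WebDriverWait implementation, and the waitFor helper and its parameters are hypothetical names.

```java
import java.util.function.Supplier;

class PollingWaitSketch {

    // Calls the lookup repeatedly until it returns non-null or time runs out.
    static <T> T waitFor(Supplier<T> lookup, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            T result = lookup.get();
            if (result != null) {
                return result;
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new IllegalStateException(
                        "Timed out after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Stub lookup: pretend the image only matches on the third attempt.
        String found = waitFor(
                () -> ++attempts[0] >= 3 ? "image-element" : null, 2000, 50);
        System.out.println(found + " after " + attempts[0] + " attempts");
    }
}
```

     With image elements this matters more than usual, since the screen may still be animating or loading when the first match is attempted.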

     With the simple snippets above, you can navigate to the gallery, tap your particular image, and validate it. Try the find-element-by-image locator strategy in your own Appium automation scripts and enjoy automating.

Reference: Appium Pro

make it perfect!