AI-based Object Detection Anywhere On The Screen

AI based object detection mode Appium

     Object identification is one of the more tedious tasks in test automation. Some objects are dynamic in nature, and we often use relative XPath to make their locators more robust. Even so, in some situations we may fail to find a reliable locator strategy for the screens or pages under test. In the mobile automation world, Appium introduced an AI-based method to find objects anywhere on the screen. I have already explained most of the implementation details of the Appium Classifier Plugin in one of my articles. In this article, we will see how AI-based object detection helps in automation. This object detection is made possible by the test-ai-classifier plugin working together with Appium.

     First, let’s look at the plugin’s two modes. The default mode is called element lookup mode. Let’s say we are trying to find an icon that looks like a clock on the screen. In this mode, Appium finds every leaf-node element in the view, generates an image of how each one appears on the screen, and then checks whether that image looks like a clock. If so, the plugin remembers the element associated with that image and returns it as the result of the call to findElement.

     This mode works great for finding many icons. However, it can run into some big problems. Let’s say we are working with the Files app on iOS and we want to find and tap the Recents button at the bottom of the screen. There is an icon that looks like a clock just above the Recents label, so we might be tempted to find the element using the clock label:

driver.findElement(MobileBy.custom("ai:clock")).click();

     This might work depending on the confidence level set, but it will probably fail because, on iOS especially, the clock icon is not its own element; it is grouped together with the accompanying text into one element. When a screenshot of this element is taken and fed into the classifier model, the arbitrary text included in the image can confuse the model and lead to a bad result.
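     To make the role of the confidence level concrete, here is a minimal, self-contained sketch of the element lookup idea. It is not the plugin’s actual code: the class and method names are hypothetical, the classifier scores are made-up values standing in for a real model, and returning the most confident match is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ElementLookupSketch {

    /** A leaf element plus the label and confidence a classifier gave its screenshot. */
    static final class ClassifiedElement {
        final String elementId;
        final String predictedLabel;
        final double confidence;

        ClassifiedElement(String elementId, String predictedLabel, double confidence) {
            this.elementId = elementId;
            this.predictedLabel = predictedLabel;
            this.confidence = confidence;
        }
    }

    /** Returns the ID of the most confident element with the wanted label, if any reaches minConfidence. */
    static Optional<String> findByLabel(List<ClassifiedElement> leaves, String wantedLabel, double minConfidence) {
        ClassifiedElement best = null;
        for (ClassifiedElement candidate : leaves) {
            boolean matches = candidate.predictedLabel.equals(wantedLabel)
                    && candidate.confidence >= minConfidence;
            if (matches && (best == null || candidate.confidence > best.confidence)) {
                best = candidate;
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        return Optional.of(best.elementId);
    }

    public static void main(String[] args) {
        // Hypothetical scores: the Recents tab bundles the clock icon with text,
        // so the classifier is assumed to be unsure about it.
        List<ClassifiedElement> leaves = Arrays.asList(
                new ClassifiedElement("recents-tab", "clock", 0.35),
                new ClassifiedElement("browse-tab", "folder", 0.40));

        System.out.println(findByLabel(leaves, "clock", 0.2)); // lenient threshold: the grouped element is accepted
        System.out.println(findByLabel(leaves, "clock", 0.9)); // strict threshold: nothing qualifies
    }
}
```

     With a lenient threshold the grouped element is still returned, while a strict one rejects it, which is why the same find call can pass or fail depending on the configured confidence.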

     The second mode is called object detection mode. Object detection is a feature of many image-based machine learning models that aims to take an input image and generate as its output a list of regions within that image that count as distinct objects. In object detection mode, the plugin takes a screenshot of the entire screen, runs this screenshot through an object detection model, and then cuts up screen regions based on the objects detected by the model. Those cut-up images are then sent through the same classifier as in element lookup mode, so we can see whether any of them match the type of icon that we are looking for.
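     The pipeline described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the Region type, the findTapPoint method, and the sample coordinates and scores are all hypothetical, and the detector and classifier are represented by their already-computed outputs rather than by real models.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ObjectDetectionSketch {

    /** A screen region proposed by the detector, with the classifier's verdict on its crop. */
    static final class Region {
        final int x;
        final int y;
        final int width;
        final int height;
        final String label;      // classifier label for the cropped image
        final double confidence; // classifier confidence in that label

        Region(int x, int y, int width, int height, String label, double confidence) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
            this.label = label;
            this.confidence = confidence;
        }
    }

    /** Returns the center {x, y} of the most confident region with the wanted label, if any reaches minConfidence. */
    static Optional<int[]> findTapPoint(List<Region> regions, String wantedLabel, double minConfidence) {
        Region best = null;
        for (Region region : regions) {
            boolean matches = region.label.equals(wantedLabel) && region.confidence >= minConfidence;
            if (matches && (best == null || region.confidence > best.confidence)) {
                best = region;
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        // The result is a screen location, not a UI element.
        return Optional.of(new int[] {best.x + best.width / 2, best.y + best.height / 2});
    }

    public static void main(String[] args) {
        // Hypothetical detector output for a tab bar: each icon is its own region,
        // without the text label that sits below it.
        List<Region> regions = Arrays.asList(
                new Region(20, 800, 60, 60, "clock", 0.95),
                new Region(120, 800, 60, 60, "folder", 0.97));

        Optional<int[]> tap = findTapPoint(regions, "clock", 0.9);
        if (tap.isPresent()) {
            System.out.println("Tap at " + tap.get()[0] + "," + tap.get()[1]);
        }
    }
}
```

     The sketch returns coordinates rather than an element, which mirrors the trade-off of this mode: the icon can be found even when it is not a separate element, but what comes back is only a place on the screen to tap.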

     The main advantage of object detection mode is that it won’t matter whether or not there is a unique clock element all by itself somewhere on the screen. This is also its main disadvantage, because in this mode we are not finding actual elements; we are just finding locations on the screen where we believe an image element to be. Below is an excerpt of the complete script, showing the desired capabilities and settings required when using image elements:

// Assumes imports of java.util.HashMap and org.openqa.selenium.remote.DesiredCapabilities.
DesiredCapabilities caps = new DesiredCapabilities();
caps.setCapability("platformName", "iOS");
caps.setCapability("automationName", "XCUITest");
caps.setCapability("platformVersion", "11.4");
caps.setCapability("deviceName", "iPhone X");
caps.setCapability("bundleId", BUNDLE_ID);
// Register the classifier plugin under the shortcut "ai".
HashMap<String, String> customFindModules = new HashMap<>();
customFindModules.put("ai", "test-ai-classifier");
caps.setCapability("customFindModules", customFindModules);
// Switch the plugin from element lookup mode to object detection mode.
caps.setCapability("testaiFindMode", "object_detection");
caps.setCapability("testaiObjectDetectionThreshold", "0.9");
caps.setCapability("shouldUseCompactResponses", false);

     Once you have defined all the required desired capabilities, you can use object detection mode in your test case with the sample snippet below:

// Tell Appium not to check image elements for staleness before interacting with them.
driver.setSetting(Setting.CHECK_IMAGE_ELEMENT_STALENESS, false);
driver.findElement(MobileBy.custom("ai:clock")).click();

     Try using the test-ai-classifier model to detect objects anywhere on the screen, and experiment with the sample snippet above in your own mobile automation scripts.

Reference: Appium Pro



AI for Appium Test Automation


     Perhaps the most buzzy of the buzzwords in tech these days is “AI” (Artificial Intelligence), or “AI/ML” (throwing in Machine Learning). To most of us, these phrases seem like magical fairy dust that promises to make the hard parts of our tech jobs go away. To be sure, AI is largely over-hyped, or at least its methods and applications are largely misunderstood and therefore assumed to be much more magical than they are.

     So how can you use AI with Appium? It’s a bit surprising, but the Appium project has developed an AI-powered element-finding plugin for use specifically with Appium.

     First, let’s discuss element-finding plugins. A recent addition to Appium gives third-party developers the ability to create “plugins” that can use an Appium driver together with their own unique capabilities to find elements. As we’ll see below, users can access these plugins simply by installing the plugin as an NPM module in their Appium directory, and then using the customFindModules capability to register the plugin with the Appium server.

     The first plugin worked on within this new structure was one that incorporates a machine learning model designed to classify app icons, the training data for which was just open-sourced. Given an icon as input, this model can tell us what sort of thing the icon represents (for example, a shopping cart button or a back arrow button). The application we developed with this model was the Appium Classifier Plugin, which conforms to the new element-finding plugin format.

      Basically, we can use this plugin to find icons on the screen based on their appearance, rather than knowing anything about the structure of our app or needing to ask developers for internal identifiers to use as selectors. For the time being the plugin is limited to finding elements by their visual appearance, so it really only works for elements which display a single icon. Luckily, these kinds of elements are pretty common in mobile apps.

         This approach is more flexible than existing locator strategies (like accessibility id, or image) in many cases, because the AI model is trained to recognize icons without needing any context, and without requiring them to match only one precise image style. What this means is that using the plugin to find a “cart” icon will work across apps and across platforms, without needing to worry about minor differences.

         So let’s take a look at a concrete example, demonstrating the simplest possible use case. If you fire up an iOS simulator you have access to the Photos application, which looks something like this:

The Photos app with search icon

        Notice the little magnifying glass icon near the top which, when clicked, opens up a search bar:

The Photos app with search bar and cancel text

             Let’s write a test that uses the new plugin to find and click that icon. First, we need to follow the setup instructions to make sure everything will work. Then, we can set up our Desired Capabilities for running a test against the Photos app:

DesiredCapabilities caps = new DesiredCapabilities();
caps.setCapability("platformName", "iOS");
caps.setCapability("platformVersion", "11.4");
caps.setCapability("deviceName", "iPhone 6");
caps.setCapability("bundleId", "");

Now we need to add some new capabilities: customFindModules (to tell Appium about the AI plugin we want to use), and shouldUseCompactResponses (because the plugin itself told us we need to set this capability in its setup instructions):

HashMap<String, String> customFindModules = new HashMap<>();
customFindModules.put("ai", "test-ai-classifier");
caps.setCapability("customFindModules", customFindModules);
caps.setCapability("shouldUseCompactResponses", false);

         You can see that customFindModules is a capability which has some internal structure: in this case “ai” is the shortcut name for the plugin that we can use internally in our test, and “test-ai-classifier” is the fully-qualified reference that Appium will need to be able to find and require the plugin when we request elements with it.
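     To illustrate the shortcut idea, here is a minimal sketch of how a selector such as “ai:cart” could be routed to a registered plugin. This is not Appium’s actual implementation: the class, the Route type, and the route method are hypothetical, and the rule that the prefix may be dropped when exactly one plugin is registered is an assumption based on the behavior described in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CustomFindRoutingSketch {

    /** Which module handles the request, and the selector handed on to it. */
    static final class Route {
        final String module;
        final String selector;

        Route(String module, String selector) {
            this.module = module;
            this.selector = selector;
        }
    }

    /** Resolves "shortcut:selector" against the registered modules (hypothetical logic). */
    static Route route(Map<String, String> customFindModules, String rawSelector) {
        int colon = rawSelector.indexOf(':');
        if (colon > 0) {
            String shortcut = rawSelector.substring(0, colon);
            String module = customFindModules.get(shortcut);
            if (module != null) {
                // Known shortcut: strip the prefix and pass the rest to that module.
                return new Route(module, rawSelector.substring(colon + 1));
            }
        }
        if (customFindModules.size() == 1) {
            // Only one plugin registered: treat the whole string as the selector.
            return new Route(customFindModules.values().iterator().next(), rawSelector);
        }
        throw new IllegalArgumentException(
                "Selector '" + rawSelector + "' needs the shortcut of a registered plugin");
    }

    public static void main(String[] args) {
        Map<String, String> modules = new LinkedHashMap<>();
        modules.put("ai", "test-ai-classifier");

        Route prefixed = route(modules, "ai:cart");
        Route bare = route(modules, "cart");
        System.out.println(prefixed.module + " <- " + prefixed.selector);
        System.out.println(bare.module + " <- " + bare.selector);
    }
}
```

     In this sketch both calls end up at the same module with the same selector, since only one plugin is registered. With two or more registered plugins, the unprefixed form would be rejected as ambiguous.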

Once we’ve done all this, finding the element is super simple:

driver.findElement(MobileBy.custom("ai:search")).click();

     Here we’re using a new custom locator strategy so that Appium knows we want a plugin, not one of its supported locator strategies. Then, we’re prefixing our selector with ai: to let Appium know which plugin specifically we want to use for this request (because there could be multiple). Of course, since we are in fact only using one plugin for this test, we could do away with the prefix (and for good measure we could use the different find command style, too):

driver.findElementByCustom("search").click();

           And that’s it! As mentioned above, this technology has some significant limitations at the current time, for example that it can really only reliably find elements which are one of the icons that the model has been trained to detect. On top of that, the process is fairly slow, both in the plugin code (since it has to retrieve every element on screen in order to send information into the model), and in the model itself. All of these areas will see improvement in the future, however. And even if this particular plugin isn’t useful for your day-to-day, it demonstrates that concrete applications of AI in the testing space are not only possible, but actual!

Please try to implement the AI component described above in your automation scripts.


Reference: Appium Pro