Guide: Image analysis in Python with Azure AI Vision

This article assumes little to no prior knowledge of Azure AI Services; it does, however, assume basic knowledge of Azure itself, as well as of AI in general.

Microsoft’s Azure AI platform offers various solutions and tools for rapidly developing Artificial Intelligence (AI) projects, i.e., any projects that involve some form of AI. It provides tooling for common problem spaces, such as vision, speech and translation. [1]

One key feature of Azure AI is that it can be used either through SDKs in various languages — such as Python, Java or C# — or through regular HTTP calls to a REST endpoint. Through either of these methods, AI can easily be incorporated into both existing and new projects.

In this blog post we will look at setting up a simple Azure AI-powered application in Python that downloads, caches and then analyzes an image. In particular, we will look at Azure AI Vision, which provides access to image processing algorithms. This can be useful for purposes such as automatically assigning tags to images. [2]

For simplicity, let us make the following assumptions:

  • All input, other than ‘quit’, is a valid URL. There is no explicit URL validation.
  • Errors can occur in a very limited number of situations, meaning that we only explicitly handle exceptions when absolutely needed.

CREATING AZURE RESOURCES

There are various ways of setting up an Azure AI Vision service. It is possible to create a multi-service AI resource, which offers access to multiple AI services at a given time, or a single-service AI resource, which offers access to just one, such as vision or speech. Which option to pick can differ based on factors such as whether the billing should be separate for each AI service. For now, let us proceed by creating a multi-service resource.

To create such a resource, open the Azure Portal and create a new resource. In the marketplace, search for azure ai services and select the corresponding resource. Figure 1 shows the resource you should pick.

Figure 1: The Azure AI Services resource in the marketplace.

Click on the create button. Let us ignore the network configuration for now and simply specify a resource group, pricing tier and a name. For example:

  • Subscription: armonden-main
  • Resource group: armonden-platform-dev-rg
  • Name: armonden-platform-dev-ai
  • Pricing tier: Standard S0

On the resource overview page, go to Keys and Endpoint. This page lists two keys and an endpoint. Both the key and endpoint are required to set up the client. It does not matter which key you select, since both can be used. One typical use case would be to use one key for a development environment and the other for production.1 Figure 2 shows an example of the keys and endpoint page of a multi-service account.

Figure 2: The keys and endpoint page of an Azure AI Services resource.

SETTING UP THE AZURE AI VISION CLIENT

To create an Azure AI Vision client using Python, first create a new project folder containing a .env file. For now, it suffices to use this file to store the configuration. Its contents should look similar to the following:

AI_SERVICE_ENDPOINT=https://ai-services-resource.cognitiveservices.azure.com/
AI_SERVICE_KEY=ai_services_key

The exact configuration is determined by the values retrieved from the keys and endpoint page. Now, set up a Python virtual environment by running the following commands in a terminal:

$ python3 -m venv .venv
$ source .venv/bin/activate

Now that a virtual environment has been set up and activated, it is time to install the dependencies. For this simple example, install the following packages using pip (a combined install command is shown after the list):

  • dotenv (at version 0.9.9 or higher)
  • azure-ai-vision-imageanalysis (at version 1.0.0 or higher)
  • requests (at version 2.32.3 or higher)
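For reference, the dependencies can be installed in a single command. A minimal example is shown below; it assumes the python-dotenv distribution on PyPI, which provides the dotenv module imported later in this guide:

$ pip install python-dotenv azure-ai-vision-imageanalysis requests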

Once these packages have been installed, let us move on to the implementation of the client itself. The main loop of the client is very simple. Assuming a happy path, it behaves in the following manner:

  1. Read some input \(x\) from the user.
  2. If \(x=\)'quit', then stop the program.
  3. Calculate some hash \(x_\text{hashed}\) based on the input.
  4. If \(x_\text{hashed}\notin\text{cache}\), then download the image and store it in the cache.
  5. Perform image analysis on file \(x_\text{hashed}\).
  6. Print the results to console.
  7. Return to 1.

IMPLEMENTING THE CLIENT

Now, it is time to move on to the code. Create a new Python file, e.g., client.py. Then add the following imports, such that the top of the file looks as follows:

#!/usr/bin/env python3

from os import getenv, path, makedirs

from dotenv import load_dotenv
from requests import get

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

Note that I have included a shebang to avoid the need to type python3 client.py; however, this is merely personal preference and is not required.2 If you include it, ensure that it points to the correct binary: on Windows, for example, python3 may not exist and may instead be referred to as simply python. In this case, update the shebang accordingly.3

Now let us implement a basic loop that reads user input and calls a function that, given a vision client and a URL, runs an image analysis. For good measure, we also set up a stub for the vision client loading function. The code, as you would expect, is quite trivial:

def run_analysis_on_url(client: ImageAnalysisClient, url: str):
    pass


def load_vision_client() -> ImageAnalysisClient:
    pass


if __name__ == '__main__':
    client = load_vision_client()

    while True:
        user_input = input('Please enter a URL or type \'quit\' to exit: ')

        if user_input.lower() == 'quit':
            exit()

        run_analysis_on_url(client, user_input)

The next step is to implement the code responsible for setting up the actual Azure AI client. We can do this by simply loading the configuration from the .env file and using the parameters as arguments to create a new instance of the ImageAnalysisClient class. If we assume that nothing can possibly go wrong and, hence, do not have to catch any exceptions, we end up with the following code:

def load_vision_client() -> ImageAnalysisClient:
    load_dotenv()
    endpoint = getenv('AI_SERVICE_ENDPOINT')
    key = getenv('AI_SERVICE_KEY')

    return ImageAnalysisClient(
        endpoint,
        AzureKeyCredential(key)
    )

Although we can also run image analysis on a URL directly, in this case we will implement a caching mechanism. The procedure is quite straightforward. Given, again, some URL \(x\):

  1. Calculate \(x_\text{hashed}\).
  2. If \(x_\text{hashed}\notin\text{cache}\), then download the image and store it in the cache.
  3. Open the cached image.

To calculate the hash, let us use the built-in hash() function. Note that, for strings, hash() is salted per interpreter run, so the cache is only reused within a single run of the program (a persistent alternative is sketched after the code). Then, again, assuming that \(x\) points to a valid image, we can implement the entire caching mechanism as follows:

def load_image(url: str):
    # Use the (per-run) hash of the URL as the cache filename.
    filename = f'cache/{str(hash(url))}'

    # Create the cache directory on first use.
    if not path.exists('cache'):
        makedirs('cache')

    # Download the image only if it has not been cached yet.
    if not path.exists(filename):
        image_data = get(url).content

        with open(filename, 'wb') as handler:
            handler.write(image_data)

    return open(filename, 'rb')
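If a cache that survives across program runs is desired, a stable digest from the standard library could be used for the filename instead of hash(). A minimal sketch (only the filename computation changes):

from hashlib import sha256


def cache_filename(url: str) -> str:
    # sha256 is deterministic across runs, unlike hash() on strings.
    return f'cache/{sha256(url.encode("utf-8")).hexdigest()}'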

Now, let us move on to the actual image analysis part. There are several attributes we can read out, e.g., tags, caption and objects; however, we will focus on the first two. Since the analysis method requires the desired attributes to be specified explicitly, we can simply pass a list in the form of visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS].

The call to Azure AI Vision then becomes trivial:

def run_analysis_on_url(client: ImageAnalysisClient, url: str):
    image_data = load_image(url)

    result = client.analyze(
        image_data=image_data,
        visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS]
    )

We can now read out the detected attributes. For the tags, let us apply a filter, keeping only tags with a confidence above an arbitrary threshold \(\theta_c = 0.8\). We then end up with the following:

def run_analysis_on_url(client: ImageAnalysisClient, url: str):
    image_data = load_image(url)

    result = client.analyze(
        image_data=image_data,
        visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS]
    )

    tags = [tag.name for tag in result.tags.list if tag.confidence > 0.8]
    caption = result.caption.text if result.caption is not None else 'No description'

    print(f"{caption}. Tags: [{', '.join(tags)}]")

TESTING THE CLIENT

The client has now been implemented. To verify that the client functions correctly, start the script and provide the URL to an image. Figure 3 shows the result of running the client on a screenshot of the Azure Marketplace. You can see that it correctly generates a caption and a number of tags.

Figure 3: The result of analyzing a screenshot of the Azure Marketplace.

CONCLUSION

In this guide, we have seen how simple it is to set up a working client for Azure AI Vision. We have implemented the client in such a way that it can easily be used within other Python code. By applying a caching mechanism, we only have to download each image once, so if we wish to rerun the analysis on a large number of images, we do not have to download them all again.

This guide has also demonstrated that it is fairly trivial to apply a threshold to the results of an analysis. In this example the threshold has only been applied to the tags; however, we can apply such a threshold to the caption in a similar manner, as sketched below.
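As a minimal sketch of that idea (the caption result returned by the SDK also carries a confidence score), the caption line inside run_analysis_on_url could become:

    # Only keep the generated caption if the model is sufficiently confident.
    caption = 'No description'
    if result.caption is not None and result.caption.confidence > 0.8:
        caption = result.caption.text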

EXERCISES

  1. Modify the program such that it takes in a path to a file containing comma-separated URLs. Each of these images is then processed, with the final result exported to an output file. If the interactive loop is removed as well, the program becomes a batch program.
  2. Introduce color detection: what are the primary colors of the image?
  3. Extend the code to support detection of objects within the image. Save each detected object as a dedicated image, based on the coordinates of the corresponding bounding box.

REFERENCES

[1] https://learn.microsoft.com/en-us/azure/ai-services/what-are-ai-services

[2] https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/overview

[3] https://learn.microsoft.com/en-us/training/modules/fundamentals-azure-ai-services/3-create-azure-ai-resource

FOOTNOTES

  1. Though one could question whether these should not be separate environments all together. ↩︎
  2. In fact, depending on the shell you are using this may do nothing at all. ↩︎
  3. On Windows, for example, the shebang could instead be #!/usr/bin/env python. ↩︎

Quantum Mechanics Part I: The Wave Function Ψ

INTRODUCING THE BLOG SERIES

This is the first article in a blog series about introductory quantum mechanics. These articles are mainly based on my notes, which are in turn based on the book Introduction to Quantum Mechanics by David J. Griffiths. It is the book that was used during the Quantum Physics 1 and 2 courses at the University of Groningen when I took these courses.

Before moving on, allow me to make an important disclaimer: I am not a physicist. I am not a mathematician. Nor is Quantum Physics one of my strongest areas. But I do find it fascinating, and I hope that by sharing these notes, you will come to find it fascinating too. I aim to have each of these articles checked by people much more experienced in physics (and mathematics).

In this blog series I assume little to no prior physics knowledge. I will try to elaborate on concepts in classical physics as needed, but if there is something you do not understand, don’t be scared to look things up online! A basic knowledge of linear algebra and integral and differential calculus is assumed, however.

Let’s get started then!

ATOMS

Everything around us is made up of atoms, which are very, very tiny objects that are themselves made up of even smaller objects. From the bones in our body to the water we swim in. From the planes we use to go on holiday to the chairs we sit in (for most of us, a bit too much!). The word atom comes from ancient Greek, in which, ironically, it had a meaning that can be translated as indivisible or uncuttable.

All atoms consist of a center, which is referred to as the nucleus, and one or more electrons, ‘orbiting’ around the nucleus. Nuclei are much tinier than the atoms themselves: if we consider the nucleus to be an ant inside a soccer stadium, then the electrons would be approximately at the location of the stands. Not all atoms are the same, however. Nuclei consist of smaller objects (particles) which we call protons and neutrons. A more general way to refer to protons and neutrons is to call them nucleons.

We can go even further. In fact, protons and neutrons are in turn made up of quarks, even tinier particles. Although not quite relevant to the course, you may be interested in knowing that nucleons owe their charge to their quark contents. Protons, for example, are made up of two up-type quarks and one down-type quark, which have charges \(+\dfrac{2}{3}e\) and \(-\dfrac{1}{3}e\) respectively. This means that \(\dfrac{2}{3}e+\dfrac{2}{3}e-\dfrac{1}{3}e=1e\), which is precisely the charge of a proton.
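Similarly, the neutron is made up of one up-type and two down-type quarks, so its total charge is

$$\dfrac{2}{3}e-\dfrac{1}{3}e-\dfrac{1}{3}e=0,$$

which is exactly why the neutron is electrically neutral.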

A popular approach to visualizing atoms is to draw them as satellites orbiting a planet, as in the image below.

An (incorrect) representation of an atom. Source: sciencefacts.net.

However, as you will read later in this post, a slightly more accurate representation would be the following:

A more accurate representation of the atom, with a central nucleus surrounded by an electron cloud. Source: https://time.graphics/period/539933.

The lightest atom is the hydrogen-1 atom, which consists of just a single proton and a single electron. One of the heaviest naturally occurring elements is uranium, which has 92 protons. The number of neutrons may vary. Each variant of a given element (i.e., with a different number of neutrons) is known as an isotope. For example, a hydrogen atom with one neutron is known as hydrogen-2 or deuterium. A future course on nuclear physics will cover these more extensively; for now we will just focus on hydrogen-1, since it makes our calculations a lot easier.

CLASSICAL MECHANICS, QUANTUM MECHANICS AND RELATIVITY

The field of quantum mechanics describes the dynamics of things at incredibly small scales: at the scale of molecules, atoms and smaller. Physicists found that, at this level, the laws of classical mechanics no longer seem sufficient. The transition from classical mechanics is governed by a single fundamental constant, \(\hbar\) (the reduced Planck constant).

In a similar fashion we have the extension of special relativity, which is governed by \(c\), the speed of light. Special relativity allows for relativistic effects to be taken into account, which occur when objects travel at very large speeds.

A field referred to as quantum field theory includes both relativistic and quantum effects. Consider the standard model of elementary particles, which covers particles such as electrons. The term elementary refers to the fact that these particles are considered to be unbreakable: they are not made up of smaller particles. Hence, one would not find the proton in the standard model, but rather its constituent quarks, \(u\) and \(d\).

The diagram below displays the relation between these theories.

The fields of classical mechanics, relativistic mechanics, quantum mechanics and quantum field theory, visualized. Source: YassineMrabet at Wikipedia: https://commons.wikimedia.org/wiki/File:Modernphysicsfields.svg

The third extension is gravity, governed by the gravitational constant \(G\). Combining gravity and special relativity results in the theory called general relativity.

A unified theory considering relativistic, quantum and gravitational effects is referred to as quantum gravity. Two popular theories are string theory and loop quantum gravity. Yet, at the time of writing these theories are still very poorly understood.

In this course we will focus on non-relativistic quantum mechanics.

THE WAVE FUNCTION

Imagine a tennis ball in one-dimensional space (for simplicity we will assume the one-dimensional case for the time being, i.e., assuming that the ball can only move left or right). Because the tennis ball is big enough, we can describe it using the laws of classical mechanics. These laws tell us that the position of the ball at a time \(t\) is given by \(x(t)\) and that its velocity is given by \(\dfrac{dx}{dt}\). Seems simple enough, right?
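For example, for a ball that starts at position \(x_0\) and moves with a constant velocity \(v\), we have

$$x(t)=x_0+vt, \qquad \dfrac{dx}{dt}=v,$$

so knowing \(x_0\) and \(v\) tells us exactly where the ball is at any time \(t\).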

Now let us shrink the tennis ball down in size, such that it reaches the size of an electron inside an atom. 

At this level, objects (particles) are no longer described by these functions, but instead by a wave function \(\Psi(\vec{r}, t)\), which is obtained by solving the Schrödinger equation. In the one-dimensional case, the equation takes the following form:

$$i\hbar\dfrac{\partial\Psi}{\partial t} = -\dfrac{\hbar^2}{2m}\dfrac{\partial^2\Psi}{\partial x^2}+V\Psi$$

Rather than giving a specific position of a particle, the wave function gives the probability that the particle can be found at a certain point. We can use the wave function for other things as well, such as calculating the momentum of a particle.

The Schrödinger equation looks intimidating, so let us decompose it. It simply tells us how the wave function of a particle evolves over time. Notice that it is a partial differential equation with both a spatial and a temporal dependence. Also notice that it contains the imaginary unit \(i\) (the square root of \(-1\)), which means the wave function is in general complex-valued. Does that mean that the results of a quantum mechanical calculation will be imaginary? No, not at all. The imaginary part drops out once we attempt to calculate measurable quantities, such as the probability of finding the particle at a certain position (you will see why later).
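To give away a small piece of the ‘later’: the probability of finding the particle between \(x\) and \(x+dx\) at time \(t\) is

$$|\Psi(x,t)|^2\,dx,$$

and the modulus squared of a complex number is always a real, non-negative number, so no imaginary quantities survive.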

The symbol \(\hbar\) is known as the reduced Planck constant. It can be seen everywhere in quantum mechanics, and can be thought of as some sort of conversion factor that governs the transition from classical to quantum mechanics. Its value is the normal Planck constant divided by \(2\pi\):

$$\hbar = \dfrac{h}{2\pi} = 1.054572 \cdot 10^{-34}\ \mathrm{J\cdot s}$$

On the left-hand side of the equation we have \(i\hbar\) multiplying a first-order time derivative of the wave function. On the right-hand side we see the Hamiltonian operator acting on the wave function \(\Psi\): a second-order spatial derivative (the kinetic energy term) plus the potential energy term. We will discuss operators a lot more later, but for now let us just say that the Hamiltonian operator gives us the total energy of the particle. In quantum mechanics, the Hamiltonian represents the same thing as in classical mechanics: the sum of the kinetic and potential energies.
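In symbols (anticipating the operator discussion in later posts), the classical Hamiltonian of a single particle is

$$H=\dfrac{p^2}{2m}+V(x),$$

and in quantum mechanics the momentum \(p\) is replaced by the operator \(-i\hbar\dfrac{\partial}{\partial x}\), which turns the kinetic term \(\dfrac{p^2}{2m}\) into the \(-\dfrac{\hbar^2}{2m}\dfrac{\partial^2}{\partial x^2}\) appearing in the Schrödinger equation above.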

If you have not studied classical mechanics before, imagine a tennis ball on the edge of a high cliff. While it is stationary, it has an incredible amount of potential energy. As the ball gets pushed off the cliff, it gains kinetic energy. The total energy is the same however: as the ball gains kinetic energy, it loses potential energy.

For simplicity we will assume a conservative system. What that means is that we consider the potential energy to be independent of time (it may still depend on position). A temporal dependence becomes relevant if we are, for example, dealing with electric fields that change over time.

Solving the Schrödinger equation gives us the wave function. Over the course of this blog series we will actually try to solve the equation.

SOME FINAL THOUGHTS

In the next few posts we go into much more detail on the concept of a wave function in quantum mechanics. For now let us end this post with some final thoughts.

Quantum mechanics is inherently probabilistic. In classical mechanics we know where an object will be at any time \(t\) if we know its trajectory. In quantum mechanics we do not know exactly where the object is. We just know where it is most likely to be. This probability distribution is given by the wave function. This is one of the most fundamental concepts of quantum mechanics. And it is also one of the hardest concepts to truly accept.

EXERCISES

  1. What is an isotope?
  2. Why are particles such as protons and neutrons not part of the standard model, while particles such as electrons are in the standard model?
  3. What is the difference between the Planck constant \(h\) and the reduced Planck constant \(\hbar\)?

REFERENCES

  1. Griffiths, D.J. and Schroeter, D.F., 2018. Introduction to quantum mechanics. Cambridge university press.
  2. De Sanctis, E., Monti, S. and Ripani, M., 2016. Energy from Nuclear Fission. Undergraduate Lecture Notes in Physics. Springer International Publishing Switzerland.

Automated Accessibility Assessment

In the European Union, the number of citizens with some form of disability is rising rapidly. An article published by the European Parliament states that this number was estimated to reach 120 million by 2020. [1] It is therefore evident that making web applications accessible is becoming increasingly important.

One way to test accessibility is through the Web Content Accessibility Guidelines (WCAG), a set of guidelines developed by the World Wide Web Consortium (W3C). These guidelines were developed with the goal of providing a shared accessibility standard for content on the web. [2] The WCAG guidelines can therefore not only be used for developing accessible applications, but also for assessing these applications.

Tooling for Automated Accessibility Assessments

In many projects, continuous integration is used in combination with linters or code analyzers, which automatically test code on a set of criteria, such as requiring explicit access modifiers on methods. A similar approach can be used to assess accessibility. The W3C provides a list of tools that can be used for this purpose. [3]

A bit of background: during my final semester at the Hanzehogeschool in Groningen, the Netherlands, I did my graduation project at the RDW (the Dutch vehicle authority). The project involved finding a way to automate accessibility assessments based on WCAG version 2.1. The tool I ended up using was Axe Core, developed by Deque Labs.

Let us consider a tool like Pa11y. During the research phase of my graduation project, this was one of the tools I tested. Given a URL, Pa11y assesses accessibility and returns the results. For simple web applications, such as this website, this works really well: no authentication to worry about, little to no user feedback.
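For reference, the basic command-line usage looks roughly as follows (assuming Pa11y has been installed globally through npm; the URL is just a placeholder):

$ pa11y https://example.com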

Now, let us consider a situation in which a tool like Pa11y would be less helpful. Consider a large forum with over 1,000,000 members and twice that many messages. Each member belongs to one or more member groups. Each member group has a set of permissions. Can you see how this could become a problem? If we decide to use crawling, the time it would take to run a ‘simple’ accessibility assessment would be enormous, since every single page would have to be scanned.

Yet we should also consider that, in the hypothetical scenario in which we actually run an accessibility assessment on the forum described above, the assessment will be incomplete. Guests do not have access to the administration panel. Members that do have access to the administration panel will not be able to see the login page. How, then, can we get a complete assessment?

An Alternative Approach

Rather than running recursive accessibility assessments on a URL, an alternative approach is to integrate them into automated tests. Whenever the UI changes, an accessibility assessment is run. As long as there is a unit test for a component, it can be tested for accessibility. Assessments are then no longer limited by component visibility, but rather by the availability of tests.

One of the reasons I ended up using Axe Core in my final product was that it provided a way to run accessibility tests without the need to provide a URL. [4] Adding accessibility assessment calls to automated tests can even be done automatically, which I did in my final product.

Advantages and Limitations of Using Tests (for A11Y)

A major advantage of this approach is that any component can be assessed, regardless of its visibility level. Assessments only need to be rerun when UI components are added or changed. Therefore, in the case of a large forum like the one described before, we would only need to scan the component that displays the messages in a topic once.

However, using automated tests to perform accessibility assessments relies, by definition, on the existence of automated tests. If a UI element is not exposed by any tests, then it cannot be tested for accessibility. It could be argued that any sufficiently large project should have sufficient automated tests, but in some projects this simply might not be an option. In these cases, it would be more practical to run assessments through URLs.

Limitations of Automated Assessments

It must be noted that completely automating accessibility assessments is unrealistic. For example, success criterion 2.4.2 (Page Titled) requires that all web pages have a descriptive title. [5] However, automated tools most likely will not recognize that ‘About us’ is not a descriptive title for a download page.

Automated tools should therefore be used in conjunction with manual testing.

Conclusions

With online accessibility becoming an increasingly important concern, measures must be taken to ensure that web applications are accessible. One way to accomplish this is by using automated tools, which can be integrated into build processes.

Using a tool like Axe Core, we can use existing tests for automated accessibility assessment. This allows assessment of each component that is exposed by at least one test. Therefore, even those components restricted to users with certain permissions can be assessed. The availability of sufficient tests is crucial for this approach.

Although automated testing cannot realistically fully replace manual testing, development time could be shortened when automated testing is used in conjunction with manual testing.

Footnotes

[1] https://www.europarl.europa.eu/news/en/press-room/20181108IPR18560/european-accessibility-act-parliament-and-council-negotiators-strike-a-deal

[2] https://www.w3.org/TR/WCAG21/#background-on-wcag-2

[3] https://www.w3.org/WAI/ER/tools

[4] https://github.com/dequelabs/axe-core

[5] https://www.w3.org/TR/WCAG21/#page-titled