Tutorials

Tutorial 1

Large-Scale Entity Extraction from Enterprise Data

 

Speakers:

  • Rajeev Gupta, Microsoft, India
  • Ranganath Kondapally, Microsoft, India
     

Abstract:

Adoption of cloud computing by enterprises has exploded in the last decade, and most of the applications used by enterprise users have moved to the cloud. These applications include collaboration software (e.g., Word, Excel), instant messaging (e.g., chat), asynchronous communication (e.g., email), etc. This has resulted in an exponential increase in the volume of data arising from users' interactions with these online applications (such as documents edited, people interacted with, meetings attended, etc.). A user's activities provide strong insights about them: the meetings a user attends indicate the set of people they closely work with, and the documents they edit indicate the topics they work on. Typically, this data is private and confidential to the enterprise, to a part of the enterprise, or to the individual employee. To provide a better experience and assist employees in their activities, it is critical to mine certain entities from this data. In this tutorial, we explain the various entities that can be extracted from enterprise data to assist employees and improve their productivity.
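
As a small illustration of the kind of entity extraction discussed above (not part of the tutorial materials), the sketch below uses spaCy's general-purpose English model to pull people, organizations, and dates out of a message; the example sentence and model choice are assumptions for illustration, not the enterprise-specific pipeline described in the abstract.

    # Generic named-entity extraction sketch with spaCy (illustrative only).
    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    text = ("Please review the Q3 budget document before the sync "
            "with Priya from the finance team on Friday.")
    doc = nlp(text)

    # Each extracted entity has a surface form and a coarse type label.
    for ent in doc.ents:
        print(ent.text, "->", ent.label_)

An enterprise setting would replace the off-the-shelf model with domain-specific extractors and, as the abstract notes, apply them only under the enterprise's privacy and confidentiality constraints.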

Tutorial 2

Advances in exploratory data analysis, visualisation and quality for data centric AI systems

 

Speakers:

  • Hima Patel, IBM Research
  • Shanmukh Guttula, IBM Research
  • Ruhi Sharma Mittal, IBM Research
  • Naresh Manwani, IIIT Hyderabad, India
  • Laure Berti-Equille, IRD
  • Abhijit Manatkar, IIIT Hyderabad, India
     

Abstract:

It is widely accepted that data preparation is one of the most time-consuming steps of the machine learning (ML) lifecycle. It is also one of the most important steps, as the quality of data directly influences the quality of a model. In this tutorial, we will discuss the importance and role of exploratory data analysis (EDA) and data visualisation techniques in finding data quality issues and in data preparation, as relevant to building ML pipelines. We will also discuss the latest advances in these fields and highlight areas that need innovation. To make the tutorial actionable for practitioners, we will also discuss the most popular open-source packages that one can get started with, along with their strengths and weaknesses. Finally, we will discuss the challenges posed by industry workloads and the gaps that must be addressed to make data-centric AI real in industry settings.
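
As a minimal illustration of the kind of data-quality checks discussed above (the file name, column names, and assumed "label" column are placeholders, not part of the tutorial), a first EDA pass with pandas might look like the following sketch.

    # Minimal EDA / data-quality sketch with pandas (illustrative only).
    import pandas as pd

    df = pd.read_csv("training_data.csv")  # placeholder dataset

    # 1. Missing values per column, as a fraction of rows.
    missing_ratio = df.isna().mean().sort_values(ascending=False)
    print("Columns with missing values:\n", missing_ratio[missing_ratio > 0])

    # 2. Exact duplicate rows, which often indicate collection errors.
    print("Duplicate rows:", df.duplicated().sum())

    # 3. Constant (zero-variance) columns carry no signal for a model.
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    print("Constant columns:", constant_cols)

    # 4. Class balance of the target column (assumed here to be named "label").
    if "label" in df.columns:
        print("Label distribution:\n", df["label"].value_counts(normalize=True))

Dedicated open-source profilers automate and extend checks like these; the tutorial compares several such packages along with their strengths and weaknesses.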

Tutorial 3

Neuro-symbolic AI for Mental Healthcare

 

Speakers:

  • Amit Sheth, University of South Carolina
  • Kaushik Roy, University of South Carolina
  • Manas Gaur, University of Maryland Baltimore County
  • Usha Lokala, University of South Carolina
     

Abstract:

In today’s data-driven world, organizations derive insights from massive amounts of data through large-scale statistical machine learning models. However, statistical techniques can be easy to fool with adversarial instances (a neural network can predict a non-extremist as an extremist by the mere presence of the word Jihad), which raises questions about data quality. In high-stakes decision-making problems, such as cyber-social threats, classifying a non-extremist as an extremist, and vice versa, is highly sensitive. Data quality is good if the data possesses adequate domain coverage and the labels carry adequate semantics. For example, are the semantics of an extremist vs. a non-extremist vis-a-vis the word Jihad captured in the label (adequate semantics in labels)? Also, are there enough non-extremists with the word Jihad in the training data from the perspectives of religion, hate, or ideology? Thus, semantic annotation of the data, beyond mere labels attached to data instances, can significantly improve the robustness of model outcomes and ensure that the model has learned from trustworthy, knowledge-guided data standards. It is important to note that knowledge-guided standards, if specified correctly, help de-bias the data (contextualized de-biasing of extremist-behavior data from bias towards the word Jihad). Therefore, in addition to trust in the robustness of outcomes, knowledge-guided data creation also enables fair and ethical practices during real-world deployment of machine learning in high-stakes decision making. We denote such data as Explainable Data. In this course-and-case-studies tutorial, we detail how to construct Explainable Data using various expert resources and knowledge graphs.

Tutorial Webpage: https://aiisc.ai/neurone/
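
As a toy illustration of annotating data with semantics beyond bare labels (the miniature concept dictionary below is hand-written for this example and is not one of the expert resources or knowledge graphs used in the tutorial), an instance can carry both its label and the contextual concepts its tokens map to:

    # Toy sketch: enriching a labelled instance with concept-level annotations
    # drawn from a hand-written, purely illustrative "knowledge base".
    CONCEPT_KB = {
        "jihad": ["religion", "ideology"],   # context-dependent concept
        "attack": ["violence"],
        "prayer": ["religion"],
    }

    def annotate(text: str, label: str) -> dict:
        tokens = text.lower().split()
        concepts = sorted({c for t in tokens for c in CONCEPT_KB.get(t, [])})
        # The instance now carries semantics beyond its bare label, which
        # downstream training or auditing can use to check domain coverage.
        return {"text": text, "label": label, "concepts": concepts}

    print(annotate("He spoke about jihad and prayer at the gathering", "non-extremist"))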

Tutorial 4

Malware Analysis and Detection

 

Speakers:

  • Hemant Rathore, BITS Pilani Goa Campus, India
  • Mohit Sewak, Microsoft
     

Abstract:

Computer and mobile users often call everything that disturbs or corrupts their system a VIRUS, without being aware of what that term actually means or what such software accomplishes. This tutorial systematically introduces the different malware varieties, their distinctive properties, the different methods of analyzing malware, and techniques for detecting it.

Tutorial 5

Identification of causal dependencies in multivariate time series

 

Speakers:

  • Sujoy Roy Chowdhury, Ericsson
  • Serene Banerjee, Ericsson
  • Ranjani H.G, Ericsson
  • Chaitanya Kapoor, Ericsson
     

Abstract:

Telecommunications networks operate on enormous amounts of time-series data and often exhibit anomalous trends in their behaviour. These anomalies arise from increased latency and reduced throughput in the network, which inevitably lead to poor customer experience. One of the common machine learning problems in the telecom domain is predicting anomalous behaviour ahead of time. Whilst this is a well-researched, though still open, problem, there is far less work on identifying causal structures from the temporal patterns of the various Key Performance Indicators (KPIs) in the telecom network. The ability to identify causal structures behind anomalous behaviours would allow more effective intervention and better generalisation to different environments and networks. The tutorial focuses on existing frameworks for causal discovery on time-series data sets. In this hands-on tutorial, we will cover at least three state-of-the-art (SOTA) methods for causal time-series analysis, including Granger causality, convergent cross-mapping (CCM), Peter-Clark Momentary Conditional Independence (PC-MCI), and the Temporal Causal Discovery Framework (TCDF). The need for causation analysis, beyond correlation, will also be explained using publicly available datasets, such as the double pendulum dataset. The state-of-the-art methods are chosen to cover various aspects of causal time-series analysis: modelling non-linearity (non-linear Granger causality), approaching the problem from chaos and dynamical systems (CCM), information-theoretic approaches (PC-MCI), and data-driven approaches (TCDF). State-of-the-art survey papers show that none of these methods is ideal for all possible time series, and each has relative advantages and shortcomings.
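
As a small illustration of one of the methods listed above (synthetic data, an assumed lag order, and the statsmodels implementation of Granger causality; the other methods require their own libraries), the sketch below tests whether one series Granger-causes another.

    # Granger-causality sketch on synthetic data (illustrative only).
    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(0)
    n = 500
    x = rng.normal(size=n)
    y = np.zeros(n)
    # y depends on x with a lag of 2, so x should Granger-cause y.
    for t in range(2, n):
        y[t] = 0.8 * x[t - 2] + 0.1 * rng.normal()

    # grangercausalitytests expects a two-column array and tests whether
    # the SECOND column Granger-causes the FIRST.
    data = np.column_stack([y, x])
    results = grangercausalitytests(data, maxlag=3)

    for lag, res in results.items():
        f_pvalue = res[0]["ssr_ftest"][1]
        print(f"lag={lag}: ssr F-test p-value={f_pvalue:.4f}")

A low p-value at the true lag indicates that past values of x improve the prediction of y beyond y's own history; this is predictive rather than interventional causality, which is one reason the tutorial also covers CCM, PC-MCI, and TCDF.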

Tutorial 6

TinyML Techniques for running Machine Learning models on Edge Devices

 

Speakers:

  • Arijit Mukherjee, TCS Research
  • Arijit Ukil, TCS Research
  • Swarnava Dey, TCS Research
  • Gitesh Kulkarni, TCS Research
     

Abstract:

Resource-constrained platforms such as micro-controllers are the workhorses of embedded systems, deployed to capture data from sensors and send the collected data to the cloud for processing. Recently, there has been great interest in the research community and industry in using these devices to perform Artificial Intelligence/Machine Learning (AI/ML) inference tasks in areas such as computer vision, natural language processing, and machine monitoring, leading to the realization of embedded intelligence at the edge. This task is challenging and requires significant knowledge of AI/ML applications, algorithms, and computer architecture, and their interactions, to achieve the desired performance. In this tutorial, we cover aspects that will help embedded-systems designers and AI/ML engineers and scientists deploy AI/ML models on Tiny Edge Devices at an optimal level of performance.
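
As a brief illustration of one common TinyML workflow (the toy model architecture and output file name below are assumptions, and this is only one of several optimisation techniques relevant to the tutorial), the sketch below applies post-training quantization with the TensorFlow Lite converter to shrink a small Keras model for deployment on a micro-controller runtime.

    # Post-training quantization sketch with TensorFlow Lite (illustrative only).
    import tensorflow as tf

    # A toy placeholder model; a real edge workload would use a task-specific network.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(4, activation="softmax"),
    ])

    # Convert to a TFLite flatbuffer with default (dynamic-range) quantization.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)
    print("Quantized model size:", len(tflite_model), "bytes")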