IJCAI 19 Program Schedule

#11051

CRSRL: Customer Routing System Using Reinforcement Learning
Chong Long, Zining Liu, Xiaolu Lu, Zehong Hu, Yafang Wang

Demo Booths 1

Allocating resources to customers in customer service is a difficult problem, because designing a strategy that achieves an optimal trade-off between available resources and customers' satisfaction is non-trivial. In this paper, we formalize the customer routing problem and propose a novel framework based on deep reinforcement learning (RL) to address it. To make it more practical, we provide a demo that shows and compares different models and visualizes the entire decision process; in particular, the system shows how the optimal strategy is reached. In addition, our demo system ships with a variety of models that users can choose from based on their needs.
#11054

ATTENet: Detecting and Explaining Suspicious Tax Evasion Groups
Qinghua Zheng, Yating Lin, Huan He, Jianfei Ruan, Bo Dong

Demo Booths 1

In this demonstration, we present ATTENet, a novel visual analytic system for detecting and explaining suspicious affiliated-transaction-based tax evasion (ATTE) groups. First, the system constructs a taxpayer interest interacted network, which contains economic behaviors and social relationships between taxpayers. Then, the system combines the basic features and structure features of each group in the network using the network embedding method structure2Vec, and detects suspicious ATTE groups with a random forest algorithm. Last, to explore and explain the detection results, the system provides an ATTENet visualization with three coordinated views and interactive tools. To verify the usefulness of our system, we demonstrate ATTENet on a non-confidential dataset that contains two years of real tax data obtained from our cooperating tax authorities.
#11033

DeepRec: An Open-source Toolkit for Deep Learning based Recommendation
Shuai Zhang, Yi Tay, Lina Yao, Bin Wu, Aixin Sun

Demo Booths 1

Deep learning based recommender systems have been extensively explored in recent years. However, the large number of models proposed each year poses a big challenge for both researchers and practitioners in reproducing the results for further comparisons. Although a portion of papers provide source code, they adopt different programming languages or different deep learning packages, which also raises the bar for grasping the ideas. To alleviate this problem, we released the open source project DeepRec. In this toolkit, we have implemented a number of deep learning based recommendation algorithms using Python and the widely used deep learning package TensorFlow. Three major recommendation scenarios are considered: rating prediction, top-N recommendation (item ranking) and sequential recommendation. Meanwhile, DeepRec maintains good modularity and extensibility so that new models can be easily incorporated into the framework. It is distributed under the terms of the GNU General Public License. The source code is available on GitHub: https://github.com/cheungdaven/DeepRec
#11034

Agent-based Decision Support for Pain Management in Primary Care Settings
Xu Guo, Han Yu, Chunyan Miao, Yiqiang Chen

Demo Booths 1

The lack of systematic pain management training and support among primary care physicians (PCPs) limits their ability to provide quality care for patients with pain. Here, we demonstrate an Agent-based Clinical Decision Support System to empower PCPs to leverage knowledge from pain specialists. The system learns a general-purpose representation space on patients, automatically diagnoses pain, recommends therapy and medicine, and suggests a referral program to PCPs in their decision-making tasks.
#11036

A Mobile Application for Sound Event Detection
Yingwei Fu, Kele Xu, Haibo Mi, Huaimin Wang, Dezhi Wang, Boqing Zhu

Demo Booths 1

Sound event detection is intended to analyze and recognize the sound events in audio streams, and it has widespread applications in real life. Recently, deep neural networks such as convolutional recurrent neural networks have shown state-of-the-art performance in this task. However, previous methods were designed and implemented on devices with rich computing resources, and there are few applications on mobile devices. This paper focuses on a solution for sound event detection on the mobile platform. The architecture of the solution includes offline training and online detection. During the offline training process, a multi-model-based distillation method is used to compress the model to enable real-time detection. The online detection process includes acquiring sensor data, processing audio signals, and detecting and recording sound events. Finally, we implement an application on the mobile device that can detect sound events in near real time.
#11039

Demonstration of PerformanceNet: A Convolutional Neural Network Model for Score-to-Audio Music Generation
Yu-Hua Chen, Bryan Wang, Yi-Hsuan Yang

Demo Booths 1

We present in this paper PerformanceNet, a neural network model we proposed recently to achieve score-to-audio music generation. The model learns to convert a music piece from the symbolic domain to the audio domain, assigning performance-level attributes such as changes in velocity automatically to the music and then synthesizing the audio. The model is therefore not just a neural audio synthesizer, but an AI performer that learns to interpret a musical score in its own way. The code and sample outputs of the model can be found online at https://github.com/bwang514/PerformanceNet.
#11045

A Quantitative Analysis Platform for PD-L1 Immunohistochemistry based on Point-level Supervision Model
Haibo Mi, Kele Xu, Yang Xiang, Yulin He, Dawei Feng, Huaimin Wang, Chun Wu, Yanming Song, Xiaolei Sun

Demo Booths 1

Recently, deep learning has witnessed dramatic progress in the medical image analysis field. In the precise treatment of cancer immunotherapy, the quantitative analysis of PD-L1 immunohistochemistry is of great importance. It is quite common for pathologists to quantify the cell nuclei manually, a process that is very time-consuming and error-prone. In this paper, we describe the development of a platform for PD-L1 pathological image quantitative analysis using deep learning approaches. As point-level annotations can provide a rough estimate of the object locations and classifications, this platform adopts a point-level supervision model to classify, localize, and count the PD-L1 cell nuclei. Presently, this platform has achieved an accurate quantitative analysis of PD-L1 for two types of carcinoma, and it is deployed in one of the first-class hospitals in China.
#11052

Explainable Deep Neural Networks for Multivariate Time Series Predictions
Roy Assaf, Anika Schumann

Demo Booths 1

We demonstrate that CNN deep neural networks can be used not only for making predictions based on multivariate time series data, but also for explaining these predictions. This is important for a number of applications where predictions are the basis for decisions and actions. Hence, confidence in the prediction result is crucial. We design a two-stage convolutional neural network architecture which uses particular kernel sizes. This allows us to utilise gradient-based techniques for generating saliency maps for both the time dimension and the features. These are then used for explaining which features during which time interval are responsible for a given prediction, as well as during which time intervals the joint contribution of all features was most important for that prediction. We demonstrate our approach for predicting the average energy production of photovoltaic power plants and for explaining these predictions.
#11025

Neural Discourse Segmentation
Jing Li

Demo Booths 1

Identifying discourse structures and coherence relations in a piece of text is a fundamental task in natural language processing. The first step of this process is segmenting sentences into clause-like units called elementary discourse units (EDUs). Traditional solutions to discourse segmentation rely heavily on carefully designed features. In this demonstration, we present SegBot, a system that splits a given piece of text into a sequence of EDUs by using an end-to-end neural segmentation model. Our model does not require hand-crafted features or external knowledge except word embeddings, yet it outperforms state-of-the-art solutions to discourse segmentation.
#11040

Design and Implementation of a Disambiguity Framework for Smart Voice Controlled Devices
Kehua Lei, Tianyi Ma, Jia Jia, Cunjun Zhang, Zhihan Yang

Demo Booths 1

With about 100 million recent users, Smart Voice Controlled Devices (SVCDs) are becoming commonplace. Whether at home or in an office, multiple appliances are usually under the control of a single SVCD, and several people may operate an SVCD simultaneously. However, present SVCDs fail to handle these situations appropriately. In this paper, we propose a novel framework for SVCDs to eliminate the ambiguity of orders from a single user or from multiple users. We also design an algorithm combining Word2Vec and emotion detection that lets the device remove ambiguity. Finally, we apply our framework to a virtual smart home scene, and its performance indicates that our strategy resolves the problems effectively.
#11049

AntProphet: an Intention Mining System behind Alipay's Intelligent Customer Service Bot
Cen Chen, Xiaolu Zhang, Sheng Ju, Chilin Fu, Caizhi Tang, Jun Zhou, Xiaolong Li

Demo Booths 1

We create an intention mining system, named AntProphet, for Alipay's intelligent customer service bot, to alleviate the burden of customer service. Whenever users have questions, AntProphet is the first stop for answering them. Our system gathers users' profiles and their historical behavioral trajectories, together with contextual information, to predict users' intention, i.e., the potential questions that users want to resolve. AntProphet takes care of more than 90% of the customer service demands in the Alipay APP and resolves most of the users' problems on the spot, thus significantly reducing the burden on manpower. With its help, the overall satisfaction rate of our customer service bot exceeds 85%.

Tuesday 13 10:50 - 11:50 MTA|AM - Art and Music (2705-2706)

Chair: Wenwu Wang

#3626

Temporal Pyramid Pooling Convolutional Neural Network for Cover Song Identification
Zhesong Yu, Xiaoshuo Xu, Xiaoou Chen, Deshun Yang

Art and Music

Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment methods, and further breakthrough is hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning toward this task. We show that they could be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is utilized to extract information on different scales and transform songs with different lengths into fixed-dimensional representations. Furthermore, a training scheme is designed to enhance the robustness of our model. Extensive experiments demonstrate that combined with these techniques, our approach is robust against musical variations existing in cover songs and outperforms state-of-the-art methods on several datasets with low time complexity.
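
As a rough illustration of how pooling over a pyramid of temporal scales can map songs of different lengths to a fixed-dimensional representation, here is a minimal pure-Python sketch. It is not the authors' implementation: the function name `temporal_pyramid_pool`, the choice of max pooling, and the pyramid levels are assumptions made for this example.

```python
def temporal_pyramid_pool(features, levels=(1, 2, 4)):
    """Pool a list of per-frame feature vectors into one fixed-length vector.

    For each pyramid level n, the time axis is cut into n contiguous
    segments and every channel is max-pooled within each segment.
    The output length is sum(levels) * channels, regardless of how many
    frames the input has. (Hypothetical helper, for illustration only.)
    """
    time_steps = len(features)
    if time_steps < max(levels):
        raise ValueError("need at least max(levels) frames")
    channels = len(features[0])
    pooled = []
    for n in levels:
        for k in range(n):
            # Integer segment boundaries; non-empty because time_steps >= n.
            start = k * time_steps // n
            end = (k + 1) * time_steps // n
            segment = features[start:end]
            pooled.extend(
                max(frame[c] for frame in segment) for c in range(channels)
            )
    return pooled


# Two toy "songs" with 2 channels and different numbers of frames.
short_song = [[t, -t] for t in range(5)]
long_song = [[t, -t] for t in range(12)]
short_vec = temporal_pyramid_pool(short_song, levels=(1, 2))
long_vec = temporal_pyramid_pool(long_song, levels=(1, 2))
```

Both toy inputs yield vectors of the same length (here 6 values), which is the property the abstract relies on when comparing songs of different durations.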
#4170

Dilated Convolution with Dilated GRU for Music Source Separation
Jen-Yu Liu, Yi-Hsuan Yang

Art and Music

Stacked dilated convolutions used in WaveNet have been shown to be effective for generating high-quality audio. By replacing pooling/striding with dilation in convolution layers, they can preserve high-resolution information and still reach distant locations. Producing high-resolution predictions is also crucial in music source separation, whose goal is to separate different sound sources while maintaining the quality of the separated sounds. Therefore, in this paper, we use stacked dilated convolutions as the backbone for music source separation. Although stacked dilated convolutions can reach a wider context than standard convolutions do, their effective receptive fields are still fixed and might not be wide enough for complex music audio signals. To reach even further information at remote locations, we propose to combine a dilated convolution with a modified GRU called Dilated GRU to form a block. A Dilated GRU receives information from k steps before instead of from the previous step, for a fixed k. This modification allows a GRU unit to reach a location with fewer recurrent steps and to run faster, because it can partially execute in parallel. We show that the proposed model with a stack of such blocks performs as well as or better than the state of the art for separating both vocals and accompaniment.
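
The k-step recurrence described above can be sketched in a few lines of pure Python. This is an illustrative toy, not the authors' model: it uses a scalar GRU cell in one common gate convention, and the weight names and values in `weights` are hand-picked assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_cell(x, h_prev, w):
    """Scalar GRU update; `w` holds illustrative, hand-picked weights."""
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)  # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)  # reset gate
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))
    return (1.0 - z) * h_prev + z * h_tilde


def dilated_gru(xs, k, w):
    """Run the cell so that step t reads the hidden state from step t - k."""
    hs = []
    for t, x in enumerate(xs):
        h_prev = hs[t - k] if t >= k else 0.0  # zero initial state
        hs.append(gru_cell(x, h_prev, w))
    return hs


weights = {"wz": 0.5, "uz": -0.3, "wr": 0.8, "ur": 0.1, "wh": 1.2, "uh": 0.7}
inputs = [0.1 * t for t in range(12)]
k = 3
dilated = dilated_gru(inputs, k, weights)

# With dilation k, the sequence splits into k independent chains: chain i
# sees only inputs i, i + k, i + 2k, ... and behaves like a standard GRU.
chains = [dilated_gru(inputs[i::k], 1, weights) for i in range(k)]
```

Because the k chains do not depend on each other, they could be advanced side by side, which is one way to read the abstract's remark that the unit can partially execute in parallel.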
#6280

Musical Composition Style Transfer via Disentangled Timbre Representations
Yun-Ning Hung, I-Tung Chiang, Yi-An Chen, Yi-Hsuan Yang

Art and Music

Music creation involves not only composing the different parts (e.g., melody, chords) of a musical work but also arranging/selecting the instruments to play the different parts. While the former has received increasing attention, the latter has not been much investigated. This paper presents, to the best of our knowledge, the first deep learning models for rearranging music of arbitrary genres. Specifically, we build encoders and decoders that take a piece of polyphonic musical audio as input, and predict as output its musical score. We investigate disentanglement techniques such as adversarial training to separate latent factors that are related to the musical content (pitch) of different parts of the piece, and that are related to the instrumentation (timbre) of the parts per short-time segment. By disentangling pitch and timbre, our models have an idea of how each piece was composed and arranged. Moreover, the models can realize “composition style transfer” by rearranging a musical piece without much affecting its pitch content. We validate the effectiveness of the models by experiments on instrument activity detection and composition style transfer. To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle.
#1469

SynthNet: Learning to Synthesize Music End-to-End
Florin Schimbinschi, Christian Walder, Sarah M. Erfani, James Bailey

Art and Music

We consider the problem of learning a mapping directly from annotated music to waveforms, bypassing traditional single note synthesis. We propose a specific architecture based on WaveNet, a convolutional autoregressive generative model designed for text to speech. We investigate the representations learned by these models on music and conclude that mappings between musical notes and the instrument timbre can be learned directly from the raw audio coupled with the musical score, in binary piano roll format. Our model requires minimal training data (9 minutes), is substantially better in quality and converges 6 times faster in comparison to strong baselines in the form of powerful text to speech models. The quality of the generated waveforms (generation accuracy) is sufficiently high that they are almost identical to the ground truth. Our evaluations are based on both the RMSE of the Constant-Q transform, and mean opinion scores from human subjects. We validate our work using 7 distinct synthetic instrument timbres, real cello music and also provide visualizations and links to all generated audio.

Tuesday 13 10:50 - 12:20 ML|AL - Active Learning 1 (J)

Chair: Giuseppe Riccardi

#392

Deeper Connections between Neural Networks and Gaussian Processes Speed-up Active Learning
Evgenii Tsymbalov, Sergei Makarychev, Alexander Shapeev, Maxim Panov

Active Learning 1

Active learning methods for neural networks are usually based on greedy criteria, which ultimately give a single new design point for the evaluation. Such an approach requires either some heuristics to sample a batch of design points at one active learning iteration, or retraining the neural network after adding each data point, which is computationally inefficient. Moreover, uncertainty estimates for neural networks sometimes are overconfident for the points lying far from the training sample. In this work, we propose to approximate Bayesian neural networks (BNN) by Gaussian processes (GP), which allows us to update the uncertainty estimates of predictions efficiently without retraining the neural network while avoiding overconfident uncertainty prediction for out-of-sample points. In a series of experiments on real-world data, including large-scale problems of chemical and physical modeling, we show the superiority of the proposed approach over the state-of-the-art methods.
#3209

Multi-View Active Learning for Video Recommendation
Jia-Jia Cai, Jun Tang, Qing-Guo Chen, Yao Hu, Xiaobo Wang, Sheng-Jun Huang

Active Learning 1

On many video websites, the recommendation is implemented as a prediction problem of video-user pairs, where the videos are represented by text features extracted from the metadata. However, the metadata is manually annotated by users and is usually missing for online videos. To train an effective recommender system with lower annotation cost, we propose an active learning approach to fully exploit the visual view of videos, while querying as few annotations as possible from the text view. On one hand, a joint model is proposed to learn the mapping from visual view to text view by simultaneously aligning the two views and minimizing the classification loss. On the other hand, a novel strategy based on prediction inconsistency and watching frequency is proposed to actively select the most important videos for metadata querying. Experiments on both classification datasets and real video recommendation tasks validate that the proposed approach can significantly reduce the annotation cost.
#6159

Active Learning within Constrained Environments through Imitation of an Expert Questioner
Kalesha Bullard, Yannick Schroecker, Sonia Chernova

Active Learning 1

Active learning agents typically employ a query selection algorithm which solely considers the agent's learning objectives. However, this may be insufficient in more realistic human domains. This work uses imitation learning to enable an agent in a constrained environment to concurrently reason about both its internal learning goals and environmental constraints externally imposed, all within its objective function. Experiments are conducted on a concept learning task to test generalization of the proposed algorithm to different environmental conditions and analyze how time and resource constraints impact efficacy of solving the learning problem. Our findings show the environmentally-aware learning agent is able to statistically outperform all other active learners explored under most of the constrained conditions. A key implication is adaptation for active learning agents to more realistic human environments, where constraints are often externally imposed on the learner.
#1644

Mindful Active Learning
Zhila Esna Ashari, Hassan Ghasemzadeh

Active Learning 1

We propose a novel active learning framework for activity recognition using wearable sensors. Our work is unique in that it takes physical and cognitive limitations of the oracle into account when selecting sensor data to be annotated by the oracle. Our approach is inspired by human beings' limited capacity to respond to external stimuli, such as responding to a prompt on their mobile devices. This capacity constraint is manifested not only in the number of queries that a person can respond to in a given time-frame but also in the lag between the time that a query is made and when it is responded to. We introduce the notion of mindful active learning and propose a computational framework, called EMMA, to maximize the active learning performance taking informativeness of sensor data, query budget, and human memory into account. We formulate this optimization problem, propose an approach to model memory retention, discuss complexity of the problem, and propose a greedy heuristic to solve the problem. We demonstrate the effectiveness of our approach on three publicly available datasets and by simulating oracles with various memory strengths. We show that the activity recognition accuracy ranges from 21% to 97% depending on memory strength, query budget, and difficulty of the machine learning task. Our results also indicate that EMMA achieves an accuracy level that is, on average, 13.5% higher than the case when only informativeness of the sensor data is considered for active learning. Additionally, we show that the performance of our approach is at most 20% less than the experimental upper bound and up to 80% higher than the experimental lower bound. We observe that mindful active learning is most beneficial when the query budget is small and/or the oracle's memory is weak, thus emphasizing contributions of our work in human-centered mobile health settings and for elderly people with cognitive impairments.
#4697

ActiveHNE: Active Heterogeneous Network Embedding
Xia Chen, Guoxian Yu, Jun Wang, Carlotta Domeniconi, Zhao Li, Xiangliang Zhang

Active Learning 1

Heterogeneous network embedding (HNE) is a challenging task due to the diverse node types and/or diverse relationships between nodes. Existing HNE methods are typically unsupervised. To maximize the profit of utilizing the rare and valuable supervised information in HNEs, we develop a novel Active Heterogeneous Network Embedding (ActiveHNE) framework, which includes two components: Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN). In DHNE, we introduce a novel semi-supervised heterogeneous network embedding method based on graph convolutional neural networks. In AQHN, we first introduce three active selection strategies based on uncertainty and representativeness, and then derive a batch selection method that assembles these strategies using a multi-armed bandit mechanism. ActiveHNE aims at improving the performance of HNE by feeding the most valuable supervision obtained by AQHN into DHNE. Experiments on public datasets demonstrate the effectiveness of ActiveHNE and its advantage in reducing the query cost.
#2628

Deep Active Learning with Adaptive Acquisition
Manuel Haussmann, Fred Hamprecht, Melih Kandemir

Active Learning 1

Model selection is treated as a standard performance boosting step in many machine learning applications. Once all other properties of a learning problem are fixed, the model is selected by grid search on a held-out validation set. This is strictly inapplicable to active learning. Within the standardized workflow, the acquisition function is chosen among available heuristics a priori, and its success is observed only after the labeling budget is already exhausted. More importantly, none of the earlier studies reports an acquisition heuristic that is so consistently successful that it stands out as the unique best choice. We present a method to break this vicious circle by defining the acquisition function as a learning predictor and training it by reinforcement feedback collected from each labeling round. As active learning is a scarce data regime, we bootstrap from a well-known heuristic that filters the bulk of data points on which all heuristics would agree, and learn a policy to warp the top portion of this ranking in the most beneficial way for the character of a specific data distribution. Our system consists of a Bayesian neural net, the predictor, a bootstrap acquisition function, a probabilistic state definition, and another Bayesian policy network that can effectively incorporate this input distribution. We observe on three benchmark data sets that our method always manages to either invent a new superior acquisition function or to adapt itself to the a priori unknown best performing heuristic for each specific data set.

Tuesday 13 10:50 - 12:35 Panel (K)

Chair: Marie desJardins

Diversity in AI: Where Are We and Where Are We Headed
Marie desJardins

Panel

Tuesday 13 10:50 - 12:35 ML|DL - Deep Learning 1 (L)

Chair: Tengfei Ma

#1601

FakeTables: Using GANs to Generate Functional Dependency Preserving Tables with Bounded Real Data
Haipeng Chen, Sushil Jajodia, Jing Liu, Noseong Park, Vadim Sokolov, V. S. Subrahmanian

Deep Learning 1

In many cases, an organization wishes to release some data, but is restricted in the amount of data to be released due to legal, privacy and other concerns. For instance, the US Census Bureau releases only 1% of its table of records every year, along with statistics about the entire table. However, the machine learning (ML) models trained on the released sub-table are usually sub-optimal. In this paper, our goal is to find a way to augment the sub-table by generating a synthetic table from the released sub-table, under the constraints that the generated synthetic table (i) has similar statistics as the entire table, and (ii) preserves the functional dependencies of the released sub-table. We propose a novel generative adversarial network framework called ITS-GAN, where both the generator and the discriminator are specifically designed to satisfy these two constraints. By evaluating the augmentation performance of ITS-GAN on two representative datasets, the US Census Bureau data and US Bureau of Transportation Statistics (BTS) data, we show that ITS-GAN yields high quality classification results, and significantly outperforms various state-of-the-art data augmentation approaches.
#1954

Boundary Perception Guidance: A Scribble-Supervised Semantic Segmentation Approach
Bin Wang, Guojun Qi, Sheng Tang, Tianzhu Zhang, Yunchao Wei, Linghui Li, Yongdong Zhang

Deep Learning 1

Semantic segmentation suffers from the fact that densely annotated masks are expensive to obtain. To tackle this problem, we aim at learning to segment by only leveraging scribbles, which are much easier to collect, for supervision. To fully explore the limited pixel-level annotations from scribbles, we present a novel Boundary Perception Guidance (BPG) approach, which consists of two basic components, i.e., prediction refinement and boundary regression. Specifically, the prediction refinement progressively produces a better segmentation by adopting an iterative upsampling and a semantic feature enhancement strategy. In the boundary regression, we employ class-agnostic edge maps for supervision to effectively guide the segmentation network in localizing the boundaries between different semantic regions, which yields a finer-grained representation of feature maps for semantic segmentation. The experimental results on PASCAL VOC 2012 demonstrate that the proposed BPG achieves an mIoU of 73.2% without a fully connected Conditional Random Field (CRF) and 76.0% with CRF, setting a new state of the art in the literature.
#2829

Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks
Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Xiaofei He

Deep Learning 1

Open-ended video question answering aims to automatically generate a natural-language answer from referenced video contents according to the given question. Currently, most existing approaches focus on short-form video question answering with multi-modal recurrent encoder-decoder networks. Although these works have achieved promising performance, they may still be ineffective when applied to long-form video question answering, because they lack long-range dependency modeling and suffer from heavy computational cost. To tackle these problems, we propose a fast hierarchical convolutional self-attention encoder-decoder network. Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context. We then devise a multi-scale attentive decoder to incorporate multi-layer video representations for answer generation, which avoids the information loss of the top encoder layer. Extensive experiments show the effectiveness and efficiency of our method.
#3298

Attribute Aware Pooling for Pedestrian Attribute Recognition
Kai Han, Yunhe Wang, Han Shu, Chuanjian Liu, Chunjing Xu, Chang Xu

Deep Learning 1

This paper expands the strength of deep convolutional neural networks (CNNs) to the pedestrian attribute recognition problem by devising a novel attribute aware pooling algorithm. Existing vanilla CNNs cannot be straightforwardly applied to handle multi-attribute data because of the larger label space as well as the attribute entanglement and correlations. We tackle these challenges, which hamper the development of CNNs for multi-attribute classification, by fully exploiting the correlation between different attributes. A multi-branch architecture is adopted for focusing on attributes at different regions. Besides the prediction based on each branch itself, context information of each branch is employed for the decision as well. The attribute aware pooling is developed to integrate both kinds of information. Therefore, attributes which are indistinct or tangled with others can be accurately recognized by exploiting the context information. Experiments on benchmark datasets demonstrate that the proposed pooling method appropriately explores and exploits the correlations between attributes for pedestrian attribute recognition.
#365

MUSICAL: Multi-Scale Image Contextual Attention Learning for Inpainting
Ning Wang, Jingyuan Li, Lefei Zhang, Bo Du

Deep Learning 1

We study the task of image inpainting, where an image with a missing region is recovered with plausible context. Recent approaches based on deep neural networks have exhibited potential for producing elegant detail and are able to take advantage of background information, which gives texture information about the missing region in the image. These methods often perform pixel/patch level replacement on the deep feature maps of the missing region and therefore enable the generated content to have a texture similar to that of the background region. However, this kind of replacement is a local strategy and often performs poorly when the background information is misleading. To this end, in this study, we propose to use a multi-scale image contextual attention learning (MUSICAL) strategy that helps to flexibly handle richer background information while avoiding misuse of it. However, such a strategy may not be promising in generating context of reasonable style. To address this issue, both the style loss and the perceptual loss are introduced into the proposed method to achieve style consistency of the generated image. Furthermore, we have also noticed that replacing some of the downsampling layers in the baseline network with stride 1 dilated convolution layers is beneficial for producing sharper and more finely detailed results. Experiments on the Paris Street View, Places, and CelebA datasets indicate the superior performance of our approach compared to state-of-the-art methods.
#1516

Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing
Chaoyou Fu, Liangchen Song, Xiang Wu, Guoli Wang, Ran He

Deep Learning 1

Deep supervised hashing has become an active topic in information retrieval. It generates hashing bits by the output neurons of a deep hashing network. During binary discretization, there often exists much redundancy between hashing bits that degenerates retrieval performance in terms of both storage and accuracy. This paper proposes a simple yet effective Neurons Merging Layer (NMLayer) for deep supervised hashing. A graph is constructed to represent the redundancy relationship between hashing bits that is used to guide the learning of a hashing network. Specifically, it is dynamically learned by a novel mechanism defined in our active and frozen phases. According to the learned relationship, the NMLayer merges the redundant neurons together to balance the importance of each output neuron. Moreover, multiple NMLayers are progressively trained for a deep hashing network to learn a more compact hashing code from a long redundant code. Extensive experiments on four datasets demonstrate that our proposed method outperforms state-of-the-art hashing methods.
#1498

Deeply-learned Hybrid Representations for Facial Age Estimation
Zichang Tan, Yang Yang, Jun Wan, Guodong Guo, Stan Z. Li
Details | PDF

Deep Learning 1

In this paper, we propose a novel unified network named Deep Hybrid-Aligned Architecture for facial age estimation. It contains global, local and global-local branches. They are jointly optimized and thus can capture multiple types of features with complementary information. In each branch, we employ a separate loss for each sub-network to extract the independent features and use a recurrent fusion to explore correlations among those region features. Considering that pose variations may lead to misalignment in different regions, we design an Aligned Region Pooling operation to generate aligned region features. Moreover, a new large age dataset named Web-FaceAge, containing more than 120K samples, is collected under diverse scenes and spans a large age range. Experiments on five age benchmark datasets, including Web-FaceAge, Morph, FG-NET, CACD and Chalearn LAP 2015, show that the proposed method outperforms the state-of-the-art approaches significantly.

Tuesday 13 10:50 - 12:35 ML|RS - Recommender Systems 1 (2701-2702)

Chair: William K. Cheung

#663

Matrix Completion in the Unit Hypercube via Structured Matrix Factorization
Emanuele Bugliarello, Swayambhoo Jain, Vineeth Rakesh
Details | PDF

Recommender Systems 1

Several complex tasks that arise in organizations can be simplified by mapping them into a matrix completion problem. In this paper, we address a key challenge faced by our company: predicting the efficiency of artists in rendering visual effects (VFX) in film shots. We tackle this challenge by using a two-fold approach: first, we transform this task into a constrained matrix completion problem with entries bounded in the unit interval [0,1]; second, we propose two novel matrix factorization models that leverage our knowledge of the VFX environment. Our first approach, expertise matrix factorization (EMF), is an interpretable method that structures the latent factors as weighted user-item interplay. The second one, survival matrix factorization (SMF), is instead a probabilistic model for the underlying process defining employees' efficiencies. We show the effectiveness of our proposed models by extensive numerical tests on our VFX dataset and two additional datasets with values that are also bounded in the [0,1] interval.
#704

Modeling Multi-Purpose Sessions for Next-Item Recommendations via Mixture-Channel Purpose Routing Networks
Shoujin Wang, Liang Hu, Yan Wang, Quan Z. Sheng, Mehmet Orgun, Longbing Cao
Details | PDF

Recommender Systems 1

A session-based recommender system (SBRS) suggests the next item by modeling the dependencies between items in a session. Most existing SBRSs assume the items inside a session are associated with one (implicit) purpose. However, this may not always be true in reality, and a session may often consist of multiple subsets of items for different purposes (e.g., breakfast and decoration). Specifically, items (e.g., bread and milk) in a subset have strong purpose-specific dependencies, whereas items (e.g., bread and vase) from different subsets have much weaker or even no dependencies due to the difference of purposes. Therefore, we propose a mixture-channel model to accommodate the multi-purpose item subsets for more precisely representing a session. Filling gaps in existing SBRSs, this model recommends more diverse items to satisfy different purposes. Accordingly, we design effective mixture-channel purpose routing networks (MCPRN) with a purpose routing network to detect the purposes of each item and assign it to the corresponding channels. Moreover, a purpose-specific recurrent network is devised to model the dependencies between items within each channel for a specific purpose. The experimental results show the superiority of MCPRN over the state-of-the-art methods in terms of both recommendation accuracy and diversity.
#1488

A Review-Driven Neural Model for Sequential Recommendation
Chenliang Li, Xichuan Niu, Xiangyang Luo, Zhenzhong Chen, Cong Quan
Details | PDF

Recommender Systems 1

Writing a review for a purchased item is a unique channel for expressing a user's opinion in E-Commerce. Recently, many deep learning based solutions have been proposed that exploit user reviews for rating prediction. In contrast, there have been few attempts to enlist the semantic signals covered by user reviews for the task of collaborative filtering. In this paper, we propose a novel review-driven neural sequential recommendation model (named RNS) by considering a user's intrinsic preference (long-term) and sequential patterns (short-term). In detail, RNS is devised to encode each user or item with the aspect-aware representations extracted from the reviews. Given a sequence of historical purchased items for a user, we devise a novel hierarchical attention over attention mechanism to capture sequential patterns at both union-level and individual-level. Extensive experiments on three real-world datasets of different domains demonstrate that RNS obtains significant performance improvement over up-to-date state-of-the-art sequential recommendation models.
#3150

PD-GAN: Adversarial Learning for Personalized Diversity-Promoting Recommendation
Qiong Wu, Yong Liu, Chunyan Miao, Binqiang Zhao, Yin Zhao, Lu Guan
Details | PDF

Recommender Systems 1

This paper proposes Personalized Diversity-promoting GAN (PD-GAN), a novel recommendation model to generate diverse, yet relevant recommendations. Specifically, for each user, a generator recommends a set of diverse and relevant items by sequentially sampling from a personalized Determinantal Point Process (DPP) kernel matrix. This kernel matrix is constructed from two learnable components: the general co-occurrence of diverse items and the user's personal preference for items. To learn the first component, we propose a novel pairwise learning paradigm using training pairs, where each training pair consists of a set of diverse items and a set of similar items randomly sampled from the observed data of all users. The second component is learnt through adversarial training against a discriminator which strives to distinguish between recommended items and the ground-truth sets randomly sampled from the observed data of the target user. Experimental results show that PD-GAN is superior in generating recommendations that are both diverse and relevant.
#4290

DARec: Deep Domain Adaptation for Cross-Domain Recommendation via Transferring Rating Patterns
Feng Yuan, Lina Yao, Boualem Benatallah
Details | PDF

Recommender Systems 1

Cross-domain recommendation has long been one of the major topics in recommender systems. Recently, various deep models have been proposed to transfer the learned knowledge across domains, but most of them focus on extracting abstract transferable features from auxiliary contents, e.g., images and review texts, and the patterns in the rating matrix itself are rarely touched. In this work, inspired by the concept of domain adaptation, we propose a deep domain adaptation model (DARec) that is capable of extracting and transferring patterns from rating matrices only, without relying on any auxiliary information. We empirically demonstrate on public datasets that our method achieves the best performance among several state-of-the-art alternative cross-domain recommendation models.
#5182

STAR-GCN: Stacked and Reconstructed Graph Convolutional Networks for Recommender Systems
Jiani Zhang, Xingjian Shi, Shenglin Zhao, Irwin King
Details | PDF

Recommender Systems 1

We propose a new STAcked and Reconstructed Graph Convolutional Networks (STAR-GCN) architecture to learn node representations for boosting the performance in recommender systems, especially in the cold start scenario. STAR-GCN employs a stack of GCN encoder-decoders combined with intermediate supervision to improve the final prediction performance. Unlike the graph convolutional matrix completion model with one-hot encoding node inputs, our STAR-GCN learns low-dimensional user and item latent factors as the input to restrain the model space complexity. Moreover, our STAR-GCN can produce node embeddings for new nodes by reconstructing masked input node embeddings, which essentially tackles the cold start problem. Furthermore, we discover a label leakage issue when training GCN-based models for link prediction tasks and propose a training strategy to avoid the issue. Empirical results on multiple rating prediction benchmarks demonstrate that our model achieves state-of-the-art performance in four out of five real-world datasets and significant improvements in predicting ratings in the cold start scenario. The code implementation is available at https://github.com/jennyzhang0215/STAR-GCN.
#6424

Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation
Shengze Yu, Xin Wang, Wenwu Zhu, Peng Cui, Jingdong Wang
Details | PDF

Recommender Systems 1

Cross-platform recommendation aims to improve recommendation accuracy through associating information from different platforms. Existing cross-platform recommendation approaches assume that all cross-platform information is mutually consistent and can be aligned. However, there remain two unsolved challenges: i) there exist inconsistencies in cross-platform association due to platform-specific disparity, and ii) data from distinct platforms may have different semantic granularities. In this paper, we propose a cross-platform association model for cross-platform video recommendation, i.e., Disparity-preserved Deep Cross-platform Association (DCA), taking platform-specific disparity and granularity difference into consideration. The proposed DCA model employs a partially-connected multi-modal autoencoder, which is capable of explicitly capturing platform-specific information, as well as utilizing nonlinear mapping functions to handle granularity differences. We then present a cross-platform video recommendation approach based on the proposed DCA model. Extensive experiments for our cross-platform recommendation framework on a real-world dataset demonstrate that the proposed DCA model significantly outperforms existing cross-platform recommendation methods in terms of various evaluation metrics.

Tuesday 13 10:50 - 12:35 AMS|NG - Noncooperative Games 1 (2703-2704)

Chair: Fei Fang

#2776

Be a Leader or Become a Follower: The Strategy to Commit to with Multiple Leaders
Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
Details | PDF

Noncooperative Games 1

We study the problem of computing correlated strategies to commit to in games with multiple leaders and followers. To the best of our knowledge, this problem is largely unexplored so far, as the majority of the works in the literature focus on games with a single leader and one or more followers. The fundamental ingredient of our model is that a leader can decide whether to participate in the commitment or to defect from it by taking on the role of follower. This introduces a preliminary stage where, before the underlying game is played, the leaders make their decisions to reach an agreement on the correlated strategy to commit to. We distinguish three solution concepts on the basis of the constraints that they enforce on the agreement reached by the leaders. Then, we provide a comprehensive study of the properties of our solution concepts, in terms of existence, relation with other solution concepts, and computational complexity.
#2814

Civic Crowdfunding for Agents with Negative Valuations and Agents with Asymmetric Beliefs
Sankarshan Damle, Moin Hussain Moti, Praphul Chandra, Sujit Gujar
Details | PDF

Noncooperative Games 1

In the last decade, civic crowdfunding has proved to be effective in generating funds for the provision of public projects. However, the existing literature deals only with citizens with positive valuation and symmetric belief towards the project's provision. In this work, we present novel mechanisms which break these two barriers, i.e., mechanisms which incorporate negative valuation and asymmetric belief, independently. For negative valuation, we present a methodology for converting existing mechanisms to mechanisms that incorporate agents with negative valuations. In particular, we adapt the existing PPR and PPS mechanisms to present novel PPRN and PPSN mechanisms which incentivize strategic agents to contribute to the project based on their true preference. With respect to asymmetric belief, we propose a reward scheme, Belief Based Reward (BBR), based on the Robust Bayesian Truth Serum mechanism. With BBR, we propose a general mechanism for civic crowdfunding which incorporates asymmetric agents. We leverage PPR and PPS to present PPRx and PPSx. We prove that in PPRx and PPSx, agents with greater belief towards the project's provision contribute more than agents with lesser belief. Further, we also show that contributions are such that the project is provisioned at equilibrium.
#4500

Network Formation under Random Attack and Probabilistic Spread
Yu Chen, Shahin Jabbari, Michael Kearns, Sanjeev Khanna, Jamie Morgenstern
Details | PDF

Noncooperative Games 1

We study a network formation game where agents receive benefits by forming connections to other agents but also incur both direct and indirect costs from the formed connections. Specifically, once the agents have purchased their connections, an attack starts at a randomly chosen vertex in the network and spreads according to the independent cascade model with a fixed probability, destroying any infected agents. The utility or welfare of an agent in our game is defined to be the expected size of the agent's connected component post-attack minus her expenditure in forming connections. Our goal is to understand the properties of the equilibrium networks formed in this game. Our first result concerns the edge density of equilibrium networks. A network connection increases both the likelihood of remaining connected to other agents after an attack as well as the likelihood of getting infected by a cascading spread of infection. We show that the latter concern primarily prevails and any equilibrium network in our game contains only $O(n\log n)$ edges, where $n$ denotes the number of agents. On the other hand, there are equilibrium networks that contain $\Omega(n)$ edges, showing that our edge density bound is tight up to a logarithmic factor. Our second result shows that the presence of attack and its spread through a cascade does not significantly lower social welfare as long as the network is not too dense. We show that any non-trivial equilibrium network with $O(n)$ edges has $\Theta(n^2)$ social welfare, asymptotically similar to the social welfare guarantee in the game without any attacks.
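The utility described above can be estimated by simulation. The following is a minimal sketch, not the authors' code: it assumes an undirected graph given as an adjacency dictionary, a uniform random attack start, and a linear connection cost. The function names, the `edge_cost` and `edges_bought` parameters, and the example `path` graph are all hypothetical.

```python
import random
from collections import deque


def independent_cascade(adj, start, p, rng):
    """Return the set of vertices destroyed by an attack starting at `start`.

    Each newly infected vertex gets one chance to infect each of its
    still-uninfected neighbours, succeeding with probability `p`.
    """
    infected = {start}
    frontier = deque([start])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in infected and rng.random() < p:
                infected.add(v)
                frontier.append(v)
    return infected


def surviving_component_size(adj, agent, destroyed):
    """Size of the agent's connected component after removing destroyed vertices."""
    if agent in destroyed:
        return 0
    seen = {agent}
    stack = [agent]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in destroyed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)


def estimate_utility(adj, agent, p, edge_cost, edges_bought, trials=2000, seed=0):
    """Monte Carlo estimate: expected post-attack component size minus expenditure.

    The linear cost `edge_cost * edges_bought` is an assumption of this sketch.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    total = 0
    for _ in range(trials):
        start = rng.choice(vertices)  # attack starts at a uniformly random vertex
        destroyed = independent_cascade(adj, start, p, rng)
        total += surviving_component_size(adj, agent, destroyed)
    return total / trials - edge_cost * edges_bought


# Hypothetical example: a path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

With spread probability 0 only the attacked vertex is destroyed, and with probability 1 its whole connected component is, which gives two easy sanity checks for the simulation.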
#4515

Equilibrium Characterization for Data Acquisition Games
Jinshuo Dong, Hadi Elzayn, Shahin Jabbari, Michael Kearns, Zachary Schutzman
Details | PDF

Noncooperative Games 1

We study a game between two firms which each provide a service based on machine learning. The firms are presented with the opportunity to purchase a new corpus of data, which will allow them to potentially improve the quality of their products. The firms can decide whether or not they want to buy the data, as well as which learning model to build on that data. We demonstrate a reduction from this potentially complicated action space to a one-shot, two-action game in which each firm only decides whether or not to buy the data. The game admits several regimes which depend on the relative strength of the two firms at the outset and the price at which the data is being offered. We analyze the game's Nash equilibria in all parameter regimes and demonstrate that, in expectation, the outcome of the game is that the initially stronger firm's market position weakens whereas the initially weaker firm's market position becomes stronger. Finally, we consider the perspective of the users of the service and demonstrate that the expected outcome at equilibrium is not the one which maximizes the welfare of the consumers.
#4907

Compact Representation of Value Function in Partially Observable Stochastic Games
Karel Horák, Branislav Bošanský, Christopher Kiekintveld, Charles Kamhoua
Details | PDF

Noncooperative Games 1

Value methods for solving stochastic games with partial observability model the uncertainty of the players as a probability distribution over possible states, where the dimension of the belief space is the number of states. For many practical problems, there are exponentially many states which causes scalability problems. We propose an abstraction technique that addresses this curse of dimensionality by projecting the high-dimensional beliefs onto characteristic vectors of significantly lower dimension (e.g., marginal probabilities). Our main contributions are (1) a novel compact representation of the uncertainty in partially observable stochastic games and (2) a novel algorithm using this representation that is based on existing state-of-the-art algorithms for solving stochastic games with partial observability. Experimental evaluation confirms that the new algorithm using the compact representation dramatically increases scalability compared to the state of the art.
#6310

Temporal Information Design in Contests
Priel Levy, David Sarne, Yonatan Aumann
Details | PDF

Noncooperative Games 1

We study temporal information design in contests, wherein the organizer may, possibly incrementally, disclose information about the participation and performance of some contestants to other (later) contestants. We show that such incremental disclosure can increase the organizer's profit. The expected profit, however, depends on the exact information disclosure structure, and the optimal structure depends on the parameters of the problem. We provide a game-theoretic analysis of such information disclosure schemes as they apply to two common models of contests: (a) simple contests, wherein contestants' decisions concern only their participation; and (b) Tullock contests, wherein contestants choose the effort levels to expend. For each of these we analyze and characterize the equilibrium strategy, and exhibit the potential benefits of information design.
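For readers unfamiliar with Tullock contests, the sketch below shows the standard textbook (lottery) form with complete information, in which a contestant's winning probability is her share of total effort. It does not model the information disclosure schemes studied in the paper, and all function names are hypothetical.

```python
import math


def win_probability(efforts, i):
    """Lottery contest success function: effort share of contestant i."""
    total = sum(efforts)
    if total == 0:
        return 1.0 / len(efforts)  # convention assumed here: uniform tie-break
    return efforts[i] / total


def expected_payoff(efforts, i, prize):
    """Expected prize winnings minus the effort expended."""
    return prize * win_probability(efforts, i) - efforts[i]


def best_response(prize, others_total):
    """Effort maximizing prize * e / (e + S) - e for S = others_total > 0.

    The first-order condition gives e = sqrt(prize * S) - S, floored at zero.
    The case S = 0 has no maximizer and is simply mapped to 0.0 here.
    """
    return max(0.0, math.sqrt(prize * others_total) - others_total)


def symmetric_equilibrium_effort(n, prize):
    """Symmetric Nash equilibrium effort with n identical contestants."""
    return (n - 1) * prize / n ** 2


# Example: four contestants competing for a prize of 16.
e_star = symmetric_equilibrium_effort(4, 16)
```

At the symmetric equilibrium each contestant's effort is a best response to the combined effort of the others, which can be checked by calling `best_response(16, 3 * e_star)`.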
#886

Possibilistic Games with Incomplete Information
Nahla Ben Amor, Helene Fargier, Régis Sabbadin, Meriem Trabelsi
Details | PDF

Noncooperative Games 1

Bayesian games offer a suitable framework for games where the utility degrees are additive in essence. Nevertheless, this approach does not apply to ordinal games, where the utility degrees do not capture more than a ranking, nor to situations of decision under qualitative uncertainty. This paper proposes a representation framework for ordinal games under possibilistic incomplete information (π-games) and extends the fundamental notion of Nash equilibrium (NE) to this framework. We show that deciding whether an NE exists is a difficult problem (NP-hard) and propose a Mixed Integer Linear Programming (MILP) encoding. Experiments on variants of the GAMUT problems confirm the feasibility of this approach.

Tuesday 13 10:50 - 12:35 HAI|PUM - Personalization and User Modeling (2601-2602)

Chair: Li Chen

#254

Deep Adversarial Social Recommendation
Wenqi Fan, Tyler Derr, Yao Ma, Jianping Wang, Jiliang Tang, Qing Li
Details | PDF

Personalization and User Modeling

Recent years have witnessed rapid developments in social recommendation techniques for improving the performance of recommender systems, due to the growing influence of social networks on our daily life. The majority of existing social recommendation methods unify user representation for the user-item interactions (item domain) and user-user connections (social domain). However, this may restrain user representation learning in each respective domain, since users behave and interact differently in the two domains, which makes their representations heterogeneous. In addition, most traditional recommender systems cannot efficiently optimize these objectives, since they utilize a negative sampling technique which is unable to provide enough informative guidance for training during the optimization process. In this paper, to address the aforementioned challenges, we propose a novel deep adversarial social recommendation framework, DASO. It adopts a bidirectional mapping method to transfer users' information between the social domain and the item domain using adversarial learning. Comprehensive experiments on two real-world datasets show the effectiveness of the proposed framework.
#1863

Minimizing Time-to-Rank: A Learning and Recommendation Approach
Haoming Li, Sujoy Sikdar, Rohit Vaish, Junming Wang, Lirong Xia, Chaonan Ye
Details | PDF

Personalization and User Modeling

Consider the following problem faced by an online voting platform: A user is provided with a list of alternatives, and is asked to rank them in order of preference using only drag-and-drop operations. The platform's goal is to recommend an initial ranking that minimizes the time spent by the user in arriving at her desired ranking. We develop the first optimization framework to address this problem, and make theoretical as well as practical contributions. On the practical side, our experiments on the Amazon Mechanical Turk platform provide two interesting insights about user behavior: First, that users' ranking strategies closely resemble selection or insertion sort, and second, that the time taken for a drag-and-drop operation depends linearly on the number of positions moved. These insights directly motivate our theoretical model of the optimization problem. We show that computing an optimal recommendation is NP-hard, and provide exact and approximation algorithms for a variety of special cases of the problem. Experimental evaluation on MTurk shows that, compared to a random recommendation strategy, the proposed approach reduces the (average) time-to-rank by up to 50%.
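The two behavioral insights above suggest a simple cost model. The sketch below is an illustration under stated assumptions, not the paper's model: the user is assumed to follow insertion sort exactly, and each drag-and-drop is assumed to take `fixed_cost + per_position_cost * positions_moved` time units. The function name and both cost parameters are hypothetical.

```python
def insertion_sort_cost(initial, target, fixed_cost=1.0, per_position_cost=0.5):
    """Simulate a user turning `initial` into `target` by insertion sort.

    Returns (number of drag-and-drop operations, total positions moved,
    estimated time), with the time of each operation linear in the
    number of positions the item is moved.
    """
    rank = {item: i for i, item in enumerate(target)}
    order = list(initial)
    moves = 0
    positions = 0
    for i in range(1, len(order)):
        item = order[i]
        # Find where the item belongs within the already sorted prefix.
        j = i
        while j > 0 and rank[order[j - 1]] > rank[item]:
            j -= 1
        if j < i:
            order.insert(j, order.pop(i))  # one drag-and-drop operation
            moves += 1
            positions += i - j
    return moves, positions, moves * fixed_cost + positions * per_position_cost


# A fully reversed recommendation costs more than a nearly correct one.
reversed_cost = insertion_sort_cost(["c", "b", "a"], ["a", "b", "c"])
near_cost = insertion_sort_cost(["b", "a", "c"], ["a", "b", "c"])
```

Under this model, a platform would prefer the initial ranking with the lowest estimated time over the user's likely target rankings.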
#2398

DeepAPF: Deep Attentive Probabilistic Factorization for Multi-site Video Recommendation
Huan Yan, Xiangning Chen, Chen Gao, Yong Li, Depeng Jin
Details | PDF

Personalization and User Modeling

Existing web video systems recommend videos according to users' viewing history from their own website. However, since many users watch videos on multiple websites, this approach fails to capture these users' interests across sites. In this paper, we investigate user viewing behavior across multiple sites based on a large-scale real dataset. We find that user interests comprise a cross-site consistent part and a site-specific part with different degrees of importance. Existing linear matrix factorization recommendation models are limited in modeling such complicated interactions. Thus, we propose a Deep Attentive Probabilistic Factorization (DeepAPF) model that exploits deep learning to approximate such complex user-video interactions. DeepAPF captures both cross-site common interests and site-specific interests with non-uniform importance weights learned by the attentional network. Extensive experiments show that our proposed model outperforms three state-of-the-art baselines by 17.62%, 7.9% and 8.1%, respectively. Our study provides insight into integrating user viewing records from multiple sites via a trusted third party, which brings mutual benefits in video recommendation.
#3165

Personalized Multimedia Item and Key Frame Recommendation
Le Wu, Lei Chen, Yonghui Yang, Richang Hong, Yong Ge, Xing Xie, Meng Wang
Details | PDF

Personalization and User Modeling

When recommending or advertising items to users, an emerging trend is to present each multimedia item with a key frame image (e.g., the poster of a movie). As each multimedia item can be represented as multiple fine-grained visual images (e.g., related images of the movie), personalized key frame recommendation is necessary in these applications to appeal to users' unique visual preferences. However, previous personalized key frame recommendation models relied on users' fine-grained image behavior of multimedia items (e.g., user-image interaction behavior), which is often not available in real scenarios. In this paper, we study the general problem of joint multimedia item and key frame recommendation in the absence of the fine-grained user-image behavior. We argue that the key challenge of this problem lies in discovering users' visual profiles for key frame recommendation, as most recommendation models would fail without any users' fine-grained image behavior. To tackle this challenge, we leverage users' item behavior by projecting users (items) into two latent spaces: a collaborative latent space and a visual latent space. We further design a model to discern both the collaborative and visual dimensions of users, and model how users make decisive item preferences from these two spaces. As a result, the learned user visual profiles can be directly applied to key frame recommendation. Finally, experimental results on a real-world dataset clearly show the effectiveness of our proposed model on the two recommendation tasks.
#3571

Discrete Trust-aware Matrix Factorization for Fast Recommendation
Guibing Guo, Enneng Yang, Li Shen, Xiaochun Yang, Xiaodong He
Details | PDF

Personalization and User Modeling

Trust-aware recommender systems have received much attention recently for their ability to capture the influence among connected users. However, they suffer from efficiency issues due to the large amount of data and time-consuming real-valued operations. Although existing discrete collaborative filtering may alleviate this issue to some extent, it is unable to accommodate social influence. In this paper we propose a discrete trust-aware matrix factorization (DTMF) model to take advantage of both social relations and discrete techniques for fast recommendation. Specifically, we map the latent representation of users and items into a joint Hamming space by recovering the rating and trust interactions between users and items. We adopt a sophisticated discrete coordinate descent (DCD) approach to optimize our proposed model. In addition, experiments on two real-world datasets demonstrate the superiority of our approach against other state-of-the-art approaches in terms of ranking accuracy and efficiency.
#3677

An Input-aware Factorization Machine for Sparse Prediction
Yantao Yu, Zhen Wang, Bo Yuan
Details | PDF

Personalization and User Modeling

Factorization machines (FMs) are a class of general predictors working effectively with sparse data, which represent features using factorized parameters and weights. However, the accuracy of FMs can be adversely affected by the fixed representation trained for each feature, as the same feature is usually not equally predictive and useful in different instances. In fact, the inaccurate representation of features may even introduce noise and degrade the overall performance. In this work, we improve FMs by explicitly considering the impact of each individual input upon the representation of features. We propose a novel model named Input-aware Factorization Machine (IFM), which learns a unique input-aware factor for the same feature in different instances via a neural network. Comprehensive experiments on three real-world recommendation datasets are used to demonstrate the effectiveness and mechanism of IFM. Empirical results indicate that IFM is significantly better than the standard FM model and consistently outperforms four state-of-the-art deep learning based methods.
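As background for the abstract above, the following is a minimal sketch of the standard second-order FM predictor that IFM builds on. It is not the IFM model itself: the input-aware factors learned by a neural network are not shown. The function names and the example parameters are hypothetical.

```python
def fm_predict(x, w0, w, V):
    """Standard second-order FM prediction, computed in O(k * n) time.

    x  : feature vector (length n)
    w0 : global bias
    w  : per-feature weights (length n)
    V  : per-feature factor vectors (n rows, k columns)
    """
    n = len(x)
    k = len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Reference O(k * n^2) version that sums over all feature pairs directly."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


# Hypothetical sparse instance with three features and two latent factors.
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
prediction = fm_predict(x, w0, w, V)
```

In this standard form each row of `V` is fixed once training ends, regardless of the instance being scored. This is the "fixed representation" the abstract refers to.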
#4234

Dynamic Item Block and Prediction Enhancing Block for Sequential Recommendation
Guibing Guo, Shichang Ouyang, Xiaodong He, Fajie Yuan, Xiaohua Liu
Details | PDF

Personalization and User Modeling

Sequential recommendation systems have recently become a research hotspot for suggesting the next item of interest (to interact with) to users. However, existing approaches suffer from two limitations: (1) The representation of an item is relatively static and fixed for all users. We argue that even the same item should be represented distinctively with respect to different users and time steps. (2) The generation of a prediction for a user over an item is computed in a single scale (e.g., by their inner product), ignoring the nature of multi-scale user preferences. To resolve these issues, in this paper we propose two enhancing building blocks for sequential recommendation. Specifically, we devise a Dynamic Item Block (DIB) to learn dynamic item representation by aggregating the embeddings of those who rated the same item before that time step. Then, we come up with a Prediction Enhancing Block (PEB) to project user representation into multiple scales, based on which many predictions can be made and attentively aggregated for enhanced learning. Each prediction is generated by a softmax over a sampled itemset rather than the whole item space for efficiency. We conduct a series of experiments on four real datasets, and show that even a basic model can be greatly enhanced with the involvement of DIB and PEB in terms of ranking accuracy. The code and datasets can be obtained from https://github.com/ouououououou/DIB-PEB-Sequential-RS

Tuesday 13 10:50 - 12:35 KRR|ACC - Action, Change and Causality (2603-2604)

Chair: Ruichu Cai

#921

Estimating Causal Effects of Tone in Online Debates
Dhanya Sridhar, Lise Getoor
Details | PDF

Action, Change and Causality

Statistical methods applied to social media posts shed light on the dynamics of online dialogue. For example, users' wording choices predict their persuasiveness and users adopt the language patterns of other dialogue participants. In this paper, we estimate the causal effect of reply tones in debates on linguistic and sentiment changes in subsequent responses. The challenge for this estimation is that a reply's tone and subsequent responses are confounded by the users' ideologies on the debate topic and their emotions. To overcome this challenge, we learn representations of ideology using generative models of text. We study debates from 4Forums.com and compare annotated tones of replying such as emotional versus factual, or reasonable versus attacking. We show that our latent confounder representation reduces bias in ATE estimation. Our results suggest that factual and asserting tones affect dialogue and provide a methodology for estimating causal effects from text.
#2116

Automatic Verification of FSA Strategies via Counterexample-Guided Local Search for Invariants
Kailun Luo, Yongmei Liu
Details | PDF

Action, Change and Causality

Strategy representation and reasoning have received much attention over the past years. In this paper, we consider the representation of general strategies that solve a class of (possibly infinitely many) games with similar structures, and their automatic verification, which is an undecidable problem. We propose to represent a general strategy by an FSA (Finite State Automaton) with edges labelled by restricted Golog programs. We formalize the semantics of FSA strategies in the situation calculus. Then we propose an incomplete method for verifying whether an FSA strategy is a winning strategy by counterexample-guided local search for appropriate invariants. We implemented our method and ran experiments on combinatorial games and also single-agent domains. Experimental results showed that our system can successfully verify most of them within a reasonable amount of time.
#2213

Causal Discovery with Cascade Nonlinear Additive Noise Model
Ruichu Cai, Jie Qiao, Kun Zhang, Zhenjie Zhang, Zhifeng Hao
Details | PDF

Action, Change and Causality

Identification of causal direction between a causal-effect pair from observed data has recently attracted much attention. Various methods based on functional causal models have been proposed to solve this problem, by assuming the causal process satisfies some (structural) constraints and showing that the reverse direction violates such constraints. The nonlinear additive noise model has been demonstrated to be effective for this purpose, but the model class is not transitive--even if each direct causal relation follows this model, indirect causal influences, which result from omitted intermediate causal variables and are frequently encountered in practice, do not necessarily follow the model constraints; as a consequence, the nonlinear additive noise model may fail to correctly discover causal direction. In this work, we propose a cascade nonlinear additive noise model to represent such causal influences--each direct causal relation follows the nonlinear additive noise model but we observe only the initial cause and final effect. We further propose a method to estimate the model, including the unmeasured intermediate variables, from data, under the variational auto-encoder framework. Our theoretical results show that with our model, causal direction is identifiable under suitable technical conditions on the data generation process. Simulation results illustrate the power of the proposed method in identifying indirect causal relations across various settings, and experimental results on real data suggest that the proposed model and method greatly extend the applicability of causal discovery based on functional causal models in nonlinear cases.
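The non-transitivity mentioned above can be seen in the simplest cascade with a single unobserved intermediate variable. The derivation below is a sketch; the symbols $z$, $f_1$, $f_2$, $\varepsilon_1$, $\varepsilon_2$ are notation assumed here, not taken from the paper.

```latex
\begin{align}
z &= f_1(x) + \varepsilon_1, \qquad \varepsilon_1 \perp x, \\
y &= f_2(z) + \varepsilon_2, \qquad \varepsilon_2 \perp z, \\
y &= f_2\bigl(f_1(x) + \varepsilon_1\bigr) + \varepsilon_2 .
\end{align}
```

Each of the first two relations is an additive noise model, but in the third the noise $\varepsilon_1$ sits inside $f_2$. Unless $f_2$ is affine, $y$ cannot in general be written as $g(x) + \varepsilon$ with $\varepsilon$ independent of $x$, so the observed pair $(x, y)$ need not satisfy the additive noise constraint.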
#3621

Boosting Causal Embeddings via Potential Verb-Mediated Causal Patterns
Zhipeng Xie, Feiteng Mu
Details | PDF

Action, Change and Causality

Existing approaches to causal embeddings rely heavily on hand-crafted high-precision causal patterns, leading to limited coverage. To solve this problem, this paper proposes a method to boost causal embeddings by exploring potential verb-mediated causal patterns. It first constructs a seed set of causal word pairs, then uses them as supervision to characterize the causal strengths of extracted verb-mediated patterns, and finally exploits the weighted extractions by those verb-mediated patterns in the construction of boosted causal embeddings. Experimental results show that the boosted causal embeddings significantly outperform several state-of-the-art methods on both English and Chinese. As a by-product, the top-ranked patterns coincide with human intuition about causality.
#5703

From Statistical Transportability to Estimating the Effect of Stochastic Interventions
Juan D. Correa, Elias Bareinboim
Details | PDF

Action, Change and Causality

Learning systems often face a critical challenge when applied to settings that differ from those under which they were initially trained. In particular, the assumption that both the source/training and the target/deployment domains follow the same causal mechanisms and observed distributions is commonly violated. This implies that the robustness and convergence guarantees usually expected from these methods are no longer attainable. In this paper, we study these violations through a causal lens using the formalism of statistical transportability [Pearl and Bareinboim, 2011] (PB, for short). We start by proving sufficient and necessary graphical conditions under which a probability distribution observed in the source domain can be extrapolated to the target one, where strictly less data is available. We develop the first sound and complete procedure for statistical transportability, which formally closes the problem introduced by PB. Further, we tackle the general challenge of identification of stochastic interventions from observational data [Sec. 4.4, Pearl, 2000]. This problem has been solved in the context of atomic interventions using Pearl's do-calculus, which lacks complete treatment in the stochastic case. We prove completeness of stochastic identification by constructing a reduction of any instance of this problem to an instance of statistical transportability, closing the problem.
#6337

ASP-based Discovery of Semi-Markovian Causal Models under Weaker Assumptions
Zhalama, Jiji Zhang, Frederick Eberhardt, Wolfgang Mayer, Mark Junjie Li
Details | PDF

Action, Change and Causality

In recent years the possibility of relaxing the so-called Faithfulness assumption in automated causal discovery has been investigated. The investigation showed (1) that the Faithfulness assumption can be weakened in various ways that in an important sense preserve its power, and (2) that weakening of Faithfulness may help to speed up methods based on Answer Set Programming. However, this line of work has so far only considered the discovery of causal models without latent variables. In this paper, we study weakenings of Faithfulness for constraint-based discovery of semi-Markovian causal models, which accommodate the possibility of latent variables, and show that both (1) and (2) remain the case in this more realistic setting.
#10961

(Sister Conferences Best Papers Track) On Causal Identification under Markov Equivalence
Amin Jaber, Jiji Zhang, Elias Bareinboim
Details | PDF

Action, Change and Causality

In this work, we investigate the problem of computing an experimental distribution from a combination of the observational distribution and a partial qualitative description of the causal structure of the domain under investigation. This description is given by a partial ancestral graph (PAG) that represents a Markov equivalence class of causal diagrams, i.e., diagrams that entail the same conditional independence model over observed variables, and is learnable from the observational data. Accordingly, we develop a complete algorithm to compute the causal effect of an arbitrary set of intervention variables on an arbitrary outcome set.

Tuesday 13 10:50 - 12:35 NLP|NLP - Natural Language Processing 1 (2605-2606)

Chair: Tianyong Hao

#1128

Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization
Ting Huang, Gehui Shen, Zhi-Hong Deng
Details | PDF

Natural Language Processing 1

Recurrent Neural Networks (RNNs) are widely used in the field of natural language processing (NLP), ranging from text categorization to question answering and machine translation. However, RNNs generally read the whole text from beginning to end or vice versa sometimes, which makes it inefficient to process long texts. When reading a long document for a categorization task, such as topic categorization, large quantities of words are irrelevant and can be skipped. To this end, we propose Leap-LSTM, an LSTM-enhanced model which dynamically leaps between words while reading texts. At each step, we utilize several feature encoders to extract messages from preceding texts, following texts and the current word, and then determine whether to skip the current word. We evaluate Leap-LSTM on several text categorization tasks: sentiment analysis, news categorization, ontology classification and topic classification, with five benchmark data sets. The experimental results show that our model reads faster and predicts better than standard LSTM. Compared to previous models which can also skip words, our model achieves better trade-offs between performance and efficiency.
#1860

Deep Mask Memory Network with Semantic Dependency and Context Moment for Aspect Level Sentiment Classification
Peiqin Lin, Meng Yang, Jianhuang Lai
Details | PDF

Natural Language Processing 1

Aspect level sentiment classification aims at identifying the sentiment of each aspect term in a sentence. Deep memory networks often use location information between context word and aspect to generate the memory. Although improved results are achieved, the relation information among aspects in the same sentence is ignored, and word location alone cannot provide sufficient and accurate information for analyzing the aspect sentiment. In this paper, we propose a novel framework for aspect level sentiment classification, deep mask memory network with semantic dependency and context moment (DMMN-SDCM), which integrates semantic parsing information of the aspect and the inter-aspect relation information into deep memory network. With the designed attention mechanism based on semantic dependency information, different parts of the context memory in different computational layers are selected and useful inter-aspect information in the same sentence is exploited for the desired aspect. To make full use of the inter-aspect relation information, we also jointly learn a context moment learning task, which aims to learn the sentiment distribution of the entire sentence for providing a background for the desired aspect. We examined the merit of our model on SemEval 2014 Datasets, and the experimental results show that our model achieves a state-of-the-art performance.
#3512

Robust Embedding with Multi-Level Structures for Link Prediction
Zihan Wang, Zhaochun Ren, Chunyu He, Peng Zhang, Yue Hu
Details | PDF

Natural Language Processing 1

Knowledge Graph (KG) embedding has become crucial for the task of link prediction. Recent work applies encoder-decoder models to tackle this problem, where an encoder is formulated as a graph neural network (GNN) and a decoder is represented by an embedding method. These approaches enforce embedding techniques with structure information. Unfortunately, existing GNN-based frameworks still confront 3 severe problems: low representational power, stacking in a flat way, and poor robustness to noise. In this work, we propose a novel multi-level graph neural network (M-GNN) to address the above challenges. We first identify an injective aggregate scheme and design a powerful GNN layer using multi-layer perceptrons (MLPs). Then, we define graph coarsening schemes for various kinds of relations, and stack GNN layers on a series of coarsened graphs, so as to model hierarchical structures. Furthermore, attention mechanisms are adopted so that our approach can make predictions accurately even on the noisy knowledge graph. Results on WN18 and FB15k datasets show that our approach is effective in the standard link prediction task, significantly and consistently outperforming competitive baselines. Furthermore, robustness analysis on FB15k-237 dataset demonstrates that our proposed M-GNN is highly robust to sparsity and noise.
#5698

Medical Concept Representation Learning from Multi-source Data
Tian Bai, Brian L. Egleston, Richard Bleicher, Slobodan Vucetic
Details | PDF

Natural Language Processing 1

Representing words as low dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To be able to properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for the word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.
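For orientation, the standard (unmodified) PMI measure named in this abstract can be sketched as below. This is only a minimal illustration: the function name `pmi_table`, the choice of a whole claim as the co-occurrence window, and the toy code labels are assumptions made here, and the paper's modified PMI and negative sampling method are not reproduced.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(claims):
    """Standard PMI for pairs of codes that co-occur in the same claim.

    claims: list of lists of code strings (hypothetical toy input).
    Returns {(code_a, code_b): pmi} for every pair seen together,
    with each pair stored in sorted order.
    """
    n = len(claims)
    single = Counter()  # number of claims containing each code
    pair = Counter()    # number of claims containing each pair of codes
    for claim in claims:
        codes = sorted(set(claim))
        single.update(codes)
        pair.update(combinations(codes, 2))
    table = {}
    for (a, b), count_ab in pair.items():
        p_ab = count_ab / n
        p_a = single[a] / n
        p_b = single[b] / n
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) )
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table

# Toy claims with placeholder labels, not real ICD-9 or CPT codes.
toy_claims = [["a", "b"], ["a", "b"], ["c", "d"], ["a", "d"]]
toy_pmi = pmi_table(toy_claims)
```

Pairs with positive PMI co-occur more often than independence would predict; pairs that never co-occur are simply absent from the table in this sketch.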
#3757

Graph-based Neural Sentence Ordering
Yongjing Yin, Linfeng Song, Jinsong Su, Jiali Zeng, Chulun Zhou, Jiebo Luo
Details | PDF

Natural Language Processing 1

Sentence ordering is to restore the original paragraph from a set of sentences. It involves capturing global dependencies among sentences regardless of their input order. In this paper, we propose a novel and flexible graph-based neural sentence ordering model, which adopts graph recurrent network [Zhang et al., 2018] to accurately learn semantic representations of the sentences. Instead of assuming connections between all pairs of input sentences, we use entities that are shared among multiple sentences to make more expressive graph representations with less noise. Experimental results show that our proposed model outperforms the existing state-of-the-art systems on several benchmark datasets, demonstrating the effectiveness of our model. We also conduct a thorough analysis on how entities help the performance. Our code is available at https://github.com/DeepLearnXMU/NSEG.git.
#3096

Incorporating Structural Information for Better Coreference Resolution
Fang Kong, Fu Jian
Details | PDF

Natural Language Processing 1

Coreference resolution plays an important role in text understanding. In the literature, various neural approaches have been proposed and achieved considerable success. However, structural information, which has been proven useful in coreference resolution, has been largely ignored in previous neural approaches. In this paper, we focus on effectively incorporating structural information to neural coreference resolution from three aspects. Firstly, nodes in the parse trees are employed as a constraint to filter out impossible text spans (i.e., mention candidates) in reducing the computational complexity. Secondly, contextual information is encoded in the traversal node sequence instead of the word sequence to better capture hierarchical information for text span representation. Lastly, additional structural features (e.g., the path, siblings, degrees, category of the current node) are encoded to enhance the mention representation. Experimentation on the data-set of the CoNLL 2012 Shared Task shows the effectiveness of our proposed approach in incorporating structural information into neural coreference resolution.
#3033

Adapting BERT for Target-Oriented Multimodal Sentiment Classification
Jianfei Yu, Jing Jiang
Details | PDF

Natural Language Processing 1

As an important task in Sentiment Analysis, Target-oriented Sentiment Classification (TSC) aims to identify sentiment polarities over each opinion target in a sentence. However, existing approaches to this task primarily rely on the textual content while ignoring other increasingly popular multimodal data sources (e.g., images), which can enhance the robustness of these text-based models. Motivated by this observation and inspired by the recently proposed BERT architecture, we study Target-oriented Multimodal Sentiment Classification (TMSC) and propose a multimodal BERT architecture. To model intra-modality dynamics, we first apply BERT to obtain target-sensitive textual representations. We then borrow the idea from self-attention and design a target attention mechanism to perform target-image matching to derive target-sensitive visual representations. To model inter-modality dynamics, we further propose to stack a set of self-attention layers to capture multimodal interactions. Experimental results show that our model can outperform several highly competitive approaches for TSC and TMSC.

Tuesday 13 10:50 - 12:35 ML|KM - Kernel Methods (2501-2502)

Chair: Junchi Yan

#248

Exchangeability and Kernel Invariance in Trained MLPs
Russell Tsuchida, Fred Roosta, Marcus Gallagher
Details | PDF

Kernel Methods

In the analysis of machine learning models, it is often convenient to assume that the parameters are IID. This assumption is not satisfied when the parameters are updated through training processes such as Stochastic Gradient Descent. A relaxation of the IID condition is a probabilistic symmetry known as exchangeability. We show the sense in which the weights in MLPs are exchangeable. This yields the result that in certain instances, the layer-wise kernel of fully-connected layers remains approximately constant during training. Our results shed light on such kernel properties throughout training while limiting the use of unrealistic assumptions.
#1350

Deep Spectral Kernel Learning
Hui Xue, Zheng-Fan Wu, Wei-Xiang Sun
Details | PDF

Kernel Methods

Recently, spectral kernels have attracted wide attention in complex dynamic environments. These advanced kernels mainly focus on breaking through the crucial limitation on locality, that is, the stationarity and the monotonicity. But actually, owing to the inefficiency of shallow models in computational elements, they are more likely unable to accurately reveal dynamic and potential variations. In this paper, we propose a novel deep spectral kernel network (DSKN) to naturally integrate non-stationary and non-monotonic spectral kernels into elegant deep architectures in an interpretable way, which can be further generalized to cover most kernels. Concretely, we firstly deal with the general form of spectral kernels by the inverse Fourier transform. Secondly, DSKN is constructed by embedding the preeminent spectral kernels into each layer to boost the efficiency in computational elements, which can effectively reveal the dynamic input-dependent characteristics and potential long-range correlations by compactly representing complex advanced concepts. Thirdly, detailed analyses of DSKN are presented. Owing to its universality, we propose a unified spectral transform technique to flexibly extend and reasonably initialize domain-related DSKN. Furthermore, the representer theorem of DSKN is given. Systematical experiments demonstrate the superiority of DSKN compared to state-of-the-art relevant algorithms on varieties of standard real-world tasks.
#2312

GCN-LASE: Towards Adequately Incorporating Link Attributes in Graph Convolutional Networks
Ziyao Li, Liang Zhang, Guojie Song
Details | PDF

Kernel Methods

Graph Convolutional Networks (GCNs) have proved to be a most powerful architecture in aggregating local neighborhood information for individual graph nodes. Low-rank proximities and node features are successfully leveraged in existing GCNs. However, attributes that graph links may carry are commonly ignored, as almost all of these models simplify graph links into binary or scalar values describing node connectedness. In our paper instead, links are reverted to hypostatic relationships between entities with descriptional attributes. We propose GCN-LASE (GCN with Link Attributes and Sampling Estimation), a novel GCN model taking both node and link attributes as inputs. To adequately capture the interactions between link and node attributes, their tensor product is used as neighbor features, based on which we define several graph kernels and further develop according architectures for LASE. Besides, to accelerate the training process, the sum of features in entire neighborhoods is estimated through the Monte Carlo method, with novel sampling strategies designed for LASE to minimize the estimation variance. Our experiments show that LASE outperforms strong baselines over various graph datasets, and further experiments corroborate the informativeness of link attributes and our model's ability to adequately leverage them.
#4091

High Dimensional Bayesian Optimization via Supervised Dimension Reduction
Miao Zhang, Huiqi Li, Steven Su
Details | PDF

Kernel Methods

Bayesian optimization (BO) has been broadly applied to computationally expensive problems, but it is still challenging to extend BO to high dimensions. Existing works usually rely on the strict assumption of an additive or a linear embedding structure for objective functions. This paper directly introduces a supervised dimension reduction method, Sliced Inverse Regression (SIR), to high dimensional Bayesian optimization, which could effectively learn the intrinsic sub-structure of the objective function during the optimization. Furthermore, a kernel trick is developed to reduce computational complexity and learn a nonlinear subset of the unknown function when applying SIR to extremely high dimensional BO. We present several computational benefits and derive theoretical regret bounds of our algorithm. Extensive experiments on synthetic examples and two real applications demonstrate the superiority of our algorithms for high dimensional Bayesian optimization.
#4362

Graph Space Embedding
João Pereira, Albert K. Groen, Erik S. G. Stroes, Evgeni Levin
Details | PDF

Kernel Methods

We propose the Graph Space Embedding (GSE), a technique that maps the input into a space where interactions are implicitly encoded, with little computation required. We provide theoretical results on an optimal regime for the GSE, namely a feasibility region for its parameters, and demonstrate the experimental relevance of our findings. Next, we introduce a strategy to gain insight into which interactions are responsible for certain predictions, paving the way for a far more transparent model. In an empirical evaluation on a real-world clinical cohort containing patients with suspected coronary artery disease, the GSE achieves far better performance than traditional algorithms.
#5315

Entangled Kernels
Riikka Huusari, Hachem Kadri
Details | PDF

Kernel Methods

We consider the problem of operator-valued kernel learning and investigate the possibility of going beyond the well-known separable kernels. Borrowing tools and concepts from the field of quantum computing, such as partial trace and entanglement, we propose a new view on operator-valued kernels and define a general family of kernels that encompasses previously known operator-valued kernels, including separable and transformable kernels. Within this framework, we introduce another novel class of operator-valued kernels called entangled kernels that are not separable. We propose an efficient two-step algorithm for this framework, where the entangled kernel is learned based on a novel extension of kernel alignment to operator-valued kernels. The utility of the algorithm is illustrated on both artificial and real data.
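As background for the kernel alignment mentioned in this abstract, the classical alignment between two scalar-valued Gram matrices can be sketched as follows. This is the standard, uncentered definition only; the paper's extension to operator-valued kernels is not shown, and the function name `kernel_alignment` and the toy matrices are assumptions introduced for illustration.

```python
import numpy as np

def kernel_alignment(k1, k2):
    """Classical (uncentered) alignment between two Gram matrices.

    A(K1, K2) = <K1, K2>_F / (||K1||_F * ||K2||_F), a value in [-1, 1]
    that equals 1 when the matrices are positive multiples of each other.
    """
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    inner = np.sum(k1 * k2)  # Frobenius inner product
    return float(inner / (np.linalg.norm(k1) * np.linalg.norm(k2)))

# Toy Gram matrices (hypothetical): identity vs. all-ones on two points.
identity_gram = np.eye(2)
ones_gram = np.ones((2, 2))
toy_alignment = kernel_alignment(identity_gram, ones_gram)
```

In alignment-based kernel learning, a candidate kernel is typically scored by its alignment with a target matrix built from the labels, and the kernel parameters are chosen to increase that score.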
#5808

Multi-view Clustering via Late Fusion Alignment Maximization
Siwei Wang, Xinwang Liu, En Zhu, Chang Tang, Jiyuan Liu, Jingtao Hu, Jingyuan Xia, Jianping Yin
Details | PDF

Kernel Methods

Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance. Although they demonstrate promising performance in many applications, we observe that most existing methods directly combine multiple views to learn an optimal similarity for clustering. These methods incur intensive computational complexity and over-complicated optimization. In this paper, we theoretically uncover the connection between existing k-means clustering and the alignment between base partitions and consensus partition. Based on this observation, we propose a simple but effective multi-view algorithm termed Multi-view Clustering via Late Fusion Alignment Maximization (MVC-LFA). Specifically, MVC-LFA proposes to maximally align the consensus partition with the weighted base partitions. Such a criterion significantly reduces the computational complexity and simplifies the optimization procedure. Furthermore, we design a three-step iterative algorithm to solve the new resultant optimization problem with theoretically guaranteed convergence. Extensive experiments on five multi-view benchmark datasets demonstrate the effectiveness and efficiency of the proposed MVC-LFA.

Tuesday 13 10:50 - 12:35 ML|C - Classification 1 (2503-2504)

Chair: Min-Ling Zhang

#755

Learning Topic Models by Neighborhood Aggregation
Ryohei Hisano
Details | PDF

Classification 1

Topic models are frequently used in machine learning owing to their high interpretability and modular structure. However, extending a topic model to include a supervisory signal, to incorporate pre-trained word embedding vectors and to include a nonlinear output function is not an easy task because one has to resort to a highly intricate approximate inference procedure. The present paper shows that topic modeling with pre-trained word embedding vectors can be viewed as implementing a neighborhood aggregation algorithm where messages are passed through a network defined over words. From the network view of topic models, nodes correspond to words in a document and edges correspond to either a relationship describing co-occurring words in a document or a relationship describing the same word in the corpus. The network view allows us to extend the model to include supervisory signals, incorporate pre-trained word embedding vectors and include a nonlinear output function in a simple manner. In experiments, we show that our approach outperforms the state-of-the-art supervised Latent Dirichlet Allocation implementation in terms of held-out document classification tasks.
#2676

Partial Label Learning with Unlabeled Data
Qian-Wei Wang, Yu-Feng Li, Zhi-Hua Zhou
Details | PDF

Classification 1

Partial label learning deals with training examples each associated with a set of candidate labels, among which only one label is valid. Previous studies typically assume that the candidate label sets are provided for all training examples. In many real-world applications such as video character classification, however, it is generally difficult to label a large number of instances, and much data is left unlabeled. We call this kind of problem semi-supervised partial label learning. In this paper, we propose the SSPL method to address this problem. Specifically, an iterative label propagation procedure between partial label examples and unlabeled instances is employed to disambiguate the candidate label sets of partial label examples as well as assign valid labels to unlabeled instances. The importance of unlabeled instances increases adaptively as the number of iterations increases, since they carry richer labeling information. Finally, unseen instances are classified based on the minimum reconstruction error on both partial label and unlabeled instances. Experiments on real-world data sets clearly validate the effectiveness of the proposed SSPL method.
#3265

Zero-shot Learning with Many Classes by High-rank Deep Embedding Networks
Yuchen Guo, Guiguang Ding, Jungong Han, Hang Shao, Xin Lou, Qionghai Dai
Details | PDF

Classification 1

Zero-shot learning (ZSL) is a recently emerging research topic which aims to build classification models for unseen classes with knowledge from auxiliary seen classes. Though many ZSL works have shown promising results on small-scale datasets by utilizing a bilinear compatibility function, the ZSL performance on large-scale datasets with many classes (say, ImageNet) is still unsatisfactory. We argue that the bilinear compatibility function is a low-rank approximation of the true compatibility function such that it is not expressive enough especially when there are a large number of classes because of the rank limitation. To address this issue, we propose a novel approach, termed as High-rank Deep Embedding Networks (GREEN), for ZSL with many classes. In particular, we propose a feature-dependent mixture of softmaxes as the image-class compatibility function, which is a simple extension of the bilinear compatibility function, but yields much better results. It utilizes a mixture of non-linear transformations with feature-dependent latent variables to approximate the true function in a high-rank way, which makes GREEN more expressive. Experiments on several datasets including ImageNet demonstrate GREEN significantly outperforms the state-of-the-art approaches.
#4379

Submodular Batch Selection for Training Deep Neural Networks
K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian
Details | PDF

Classification 1

Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and diversity of the whole subset. We design an efficient, greedy algorithm which can give high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics.
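The general idea of greedy submodular maximization referred to in this abstract can be sketched as below. The objective used here is a generic facility-location function over a similarity matrix, chosen only because it is a well-known monotone submodular function; it is not the paper's formulation, and the names `facility_location` and `greedy_select` and the toy similarity matrix are assumptions made for illustration.

```python
import numpy as np

def facility_location(sim, selected):
    """f(S) = sum_i max_{j in S} sim[i, j].

    Monotone submodular when all similarities are non-negative.
    """
    if not selected:
        return 0.0
    return float(sim[:, selected].max(axis=1).sum())

def greedy_select(sim, k):
    """Pick k items, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = set(range(sim.shape[0]))
    for _ in range(min(k, sim.shape[0])):
        base = facility_location(sim, selected)
        best, best_gain = None, -1.0
        for j in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = facility_location(sim, selected + [j]) - base
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy similarity matrix (hypothetical): items 0-1 and items 2-3 form two groups.
toy_sim = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.25],
    [0.0, 0.0, 0.25, 1.0],
])
toy_batch = greedy_select(toy_sim, 2)
```

On the toy matrix, the second pick comes from the group not yet represented, which is the diversity effect a submodular objective is meant to encourage. For monotone submodular objectives under a cardinality constraint, this greedy rule carries the classical (1 - 1/e) approximation guarantee.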
#6379

Extensible Cross-Modal Hashing
Tian-yi Chen, Lan Zhang, Shi-cong Zhang, Zi-long Li, Bai-chuan Huang
Details | PDF

Classification 1

Cross-modal hashing (CMH) models are introduced to significantly reduce the cost of large-scale cross-modal data retrieval systems. In many real-world applications, however, data of new categories arrive continuously, which requires the model to have good extensibility. That is, the model should be updated to accommodate data of new categories but still retain good performance for the old categories with minimum computation cost. Unfortunately, existing CMH methods fail to satisfy the extensibility requirements. In this work, we propose a novel extensible cross-modal hashing (ECMH) to enable highly efficient and low-cost model extension. Our proposed ECMH has several desired features: 1) it has good forward compatibility, so there is no need to update old hash codes; 2) the ECMH model is extended to support new data categories using only new data by a well-designed "weak constraint incremental learning" algorithm, which saves up to 91% time cost compared with retraining the model with both new and old data; 3) the extended model achieves high precision and recall on both old and new tasks. Our extensive experiments show the effectiveness of our design.
#4735

Semi-supervised User Profiling with Heterogeneous Graph Attention Networks
Weijian Chen, Yulong Gu, Zhaochun Ren, Xiangnan He, Hongtao Xie, Tong Guo, Dawei Yin, Yongdong Zhang
Details | PDF

Classification 1

Aiming to represent user characteristics and personal interests, the task of user profiling is playing an increasingly important role for many real-world applications, e.g., e-commerce and social networks platforms. By exploiting the data like texts and user behaviors, most existing solutions address user profiling as a classification task, where each user is formulated as an individual data instance. Nevertheless, a user's profile is not only reflected from her/his affiliated data, but also can be inferred from other users, e.g., the users that have similar co-purchase behaviors in e-commerce, the friends in social networks, etc. In this paper, we approach user profiling in a semi-supervised manner, developing a generic solution based on heterogeneous graph learning. On the graph, nodes represent the entities of interest (e.g., users, items, attributes of items, etc.), and edges represent the interactions between entities. Our heterogeneous graph attention networks (HGAT) method learns the representation for each entity by accounting for the graph structure, and exploits the attention mechanism to discriminate the importance of each neighbor entity. Through such a learning scheme, HGAT can leverage both unsupervised information and limited labels of users to build the predictor. Extensive experiments on a real-world e-commerce dataset verify the effectiveness and rationality of our HGAT for user profiling.
#307

Deterministic Routing between Layout Abstractions for Multi-Scale Classification of Visually Rich Documents
Ritesh Sarkhel, Arnab Nandi
Details | PDF

Classification 1

Classifying heterogeneous visually rich documents is a challenging task. The difficulty of this task increases even more if the maximum allowed inference turnaround time is constrained by a threshold. The increased overhead in inference cost, compared to the limited gain in classification capabilities, makes current multi-scale approaches infeasible in such scenarios. There are two major contributions of this work. First, we propose a spatial pyramid model to extract highly discriminative multi-scale feature descriptors from a visually rich document by leveraging the inherent hierarchy of its layout. Second, we propose a deterministic routing scheme for accelerating end-to-end inference by utilizing the spatial pyramid model. A depth-wise separable multi-column convolutional network is developed to enable our method. We evaluated the proposed approach on four publicly available, benchmark datasets of visually rich documents. Results suggest that our proposed approach demonstrates robust performance compared to the state-of-the-art methods in both classification accuracy and total inference turnaround.

Tuesday 13 10:50 - 12:35 ML|DM - Data Mining 1 (2505-2506)

Chair: Junming Shao

#708

Inferring Substitutable Products with Deep Network Embedding
Shijie Zhang, Hongzhi Yin, Qinyong Wang, Tong Chen, Hongxu Chen, Quoc Viet Hung Nguyen
Details | PDF

Data Mining 1

On E-commerce platforms, understanding the relationships (e.g., substitute and complement) among products from user's explicit feedback, such as users' online transactions, is of great importance to boost extra sales. However, the significance of such relationships is usually neglected by existing recommender systems. In this paper, we propose a semisupervised deep embedding model, namely, Substitute Products Embedding Model (SPEM), which models the substitutable relationships between products by preserving the second-order proximity, negative first-order proximity and semantic similarity in a product co-purchasing graph based on user's purchasing behaviours. With SPEM, the learned representations of two substitutable products align closely in the latent embedding space. Extensive experiments on real-world datasets are conducted, and the results verify that our model outperforms state-of-the-art baselines.
#1663

Low-Bit Quantization for Attributed Network Representation Learning
Hong Yang, Shirui Pan, Ling Chen, Chuan Zhou, Peng Zhang
Details | PDF

Data Mining 1

Attributed network embedding plays an important role in transferring network data into compact vectors for effective network analysis. Existing attributed network embedding models are designed either in continuous Euclidean spaces which introduce data redundancy or in binary coding spaces which incur significant loss of representation accuracy. To this end, we present a new Low-Bit Quantization for Attributed Network Representation Learning model (LQANR for short) that can learn compact node representations with low bitwidth values while preserving high representation accuracy. Specifically, we formulate a new representation learning function based on matrix factorization that can jointly learn the low-bit node representations and the layer aggregation weights under the low-bit quantization constraint. Because the new learning function falls into the category of mixed integer optimization, we propose an efficient mixed-integer based alternating direction method of multipliers (ADMM) algorithm as the solution. Experiments on real-world node classification and link prediction tasks validate the promising results of the proposed LQANR model.
#2991

iDev: Enhancing Social Coding Security by Cross-platform User Identification Between GitHub and Stack Overflow
Yujie Fan, Yiming Zhang, Shifu Hou, Lingwei Chen, Yanfang Ye, Chuan Shi, Liang Zhao, Shouhuai Xu
Details | PDF

Data Mining 1

As modern social coding platforms such as GitHub and Stack Overflow become increasingly popular, their potential security risks increase as well (e.g., risky or malicious codes could be easily embedded and distributed). To enhance the social coding security, in this paper, we propose to automate cross-platform user identification between GitHub and Stack Overflow to combat the attackers who attempt to poison the modern software programming ecosystem. To solve this problem, an important insight brought by this work is to leverage social coding properties in addition to user attributes for cross-platform user identification. To depict users in GitHub and Stack Overflow (attached with attributed information), projects, questions and answers as well as the rich semantic relations among them, we first introduce an attributed heterogeneous information network (AHIN) for modeling. Then, we propose a novel AHIN representation learning model AHIN2Vec to efficiently learn node (i.e., user) representations in AHIN for cross-platform user identification. Comprehensive experiments on the data collections from GitHub and Stack Overflow are conducted to validate the effectiveness of our developed system iDev integrating our proposed method in cross-platform user identification by comparisons with other baselines.
#3985

Outlier-Robust Multi-Aspect Streaming Tensor Completion and Factorization
Mehrnaz Najafi, Lifang He, Philip S. Yu
Details | PDF

Data Mining 1

With the increasing popularity of streaming tensor data such as video and audio, tensor factorization and completion have attracted much attention recently in this area. Existing work usually assumes that streaming tensors only grow in one mode. However, in many real-world scenarios, tensors may grow in multiple modes (or dimensions), i.e., multi-aspect streaming tensors. Standard streaming methods cannot directly handle this type of data elegantly. Moreover, due to inevitable system errors, data may be contaminated by outliers, which cause significant deviations from real data values and make such research particularly challenging. In this paper, we propose a novel method for Outlier-Robust Multi-Aspect Streaming Tensor Completion and Factorization (OR-MSTC), which is a technique capable of dealing with missing values and outliers in multi-aspect streaming tensor data. The key idea is to decompose the tensor structure into an underlying low-rank clean tensor and a structured-sparse error (outlier) tensor, along with a weighting tensor to mask missing data. We also develop an efficient algorithm to solve the non-convex and non-smooth optimization problem of OR-MSTC. Experimental results on various real-world datasets show the superiority of the proposed method over the baselines and its robustness against outliers.
#5069

Convolutional Gaussian Embeddings for Personalized Recommendation with Uncertainty
Junyang Jiang, Deqing Yang, Yanghua Xiao, Chenlu Shen
Details | PDF

Data Mining 1

Most existing embedding-based recommendation models use embeddings (vectors) to represent users and items, which contain latent features of users and items. Each such embedding corresponds to a single fixed point in low-dimensional space, and thus fails to precisely represent the users/items with uncertainty that are often observed in recommender systems. Addressing this problem, we propose a unified deep recommendation framework employing Gaussian embeddings, which are proven adaptive to uncertain preferences exhibited by some users, resulting in better user representations and recommendation performance. Furthermore, our framework adopts Monte-Carlo sampling and convolutional neural networks to compute the correlation between the objective user and the candidate item, based on which precise recommendations are achieved. Our extensive experiments on two benchmark datasets not only justify that our proposed Gaussian embeddings capture the uncertainty of users very well, but also demonstrate the superior performance of our framework over the state-of-the-art recommendation models.
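The Monte-Carlo step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes diagonal Gaussians for the user and the item, and it uses a plain dot product as a stand-in for the convolutional network of the paper; the function name `mc_preference_score` and all parameters are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def mc_preference_score(mu_u, sigma_u, mu_i, sigma_i, n_samples=1000, seed=0):
    """Monte-Carlo estimate of E[u . v] for u ~ N(mu_u, diag(sigma_u^2)) and
    v ~ N(mu_i, diag(sigma_i^2)).

    Assumption: the dot product replaces the CNN-based correlation module
    described in the abstract; only the sampling idea is illustrated here.
    """
    rng = np.random.default_rng(seed)
    mu_u, sigma_u = np.asarray(mu_u, float), np.asarray(sigma_u, float)
    mu_i, sigma_i = np.asarray(mu_i, float), np.asarray(sigma_i, float)
    # Draw n_samples embeddings from each Gaussian (one sample per row).
    u = rng.normal(mu_u, sigma_u, size=(n_samples, mu_u.size))
    v = rng.normal(mu_i, sigma_i, size=(n_samples, mu_i.size))
    # Average the per-sample scores to approximate the expectation.
    return float(np.mean(np.sum(u * v, axis=1)))
```

With zero standard deviations the estimate reduces to the ordinary dot product of the two mean vectors, which is the fixed-point embedding case the abstract contrasts against.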
#609

Fine-grained Event Categorization with Heterogeneous Graph Convolutional Networks
Hao Peng, Jianxin Li, Qiran Gong, Yangqiu Song, Yuanxin Ning, Kunfeng Lai, Philip S. Yu
Details | PDF

Data Mining 1

Events happen in the real world and in real time, and can be planned and organized occasions involving multiple people and objects. Social media platforms publish a lot of text messages containing public events with comprehensive topics. However, mining social events is challenging due to the heterogeneous event elements in texts and explicit and implicit social network structures. In this paper, we design an event meta-schema to characterize the semantic relatedness of social events and build an event-based heterogeneous information network (HIN) integrating information from an external knowledge base, and propose a novel Pairwise Popularity Graph Convolutional Network (PP-GCN) based fine-grained social event categorization model. We propose a Knowledgeable meta-paths Instances based social Event Similarity (KIES) between events and build a weighted adjacency matrix as input to the PP-GCN model. Comprehensive experiments on real data collections are conducted to compare various social event detection and clustering tasks. Experimental results demonstrate that our proposed framework outperforms other alternative social event categorization techniques.
#1716

Topology Optimization based Graph Convolutional Network
Liang Yang, Zesheng Kang, Xiaochun Cao, Di Jin, Bo Yang, Yuanfang Guo
Details | PDF

Data Mining 1

In the past few years, semi-supervised node classification in attributed network has been developed rapidly. Inspired by the success of deep learning, researchers adopt the convolutional neural network to develop the Graph Convolutional Networks (GCN), and they have achieved surprising classification accuracy by considering the topological information and employing the fully connected network (FCN). However, the given network topology may also induce a performance degradation if it is directly employed in classification, because it may possess high sparsity and certain noises. Besides, the lack of learnable filters in GCN also limits the performance. In this paper, we propose a novel Topology Optimization based Graph Convolutional Networks (TO-GCN) to fully utilize the potential information by jointly refining the network topology and learning the parameters of the FCN. According to our derivations, TO-GCN is more flexible than GCN, in which the filters are fixed and only the classifier can be updated during the learning process. Extensive experiments on real attributed networks demonstrate the superiority of the proposed TO-GCN against the state-of-the-art approaches.
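For readers unfamiliar with the baseline that the abstract above builds on, the following is a minimal sketch of one standard GCN propagation step with the usual symmetrically normalized adjacency (the fixed filter the abstract refers to). It is not TO-GCN itself; the function name `gcn_layer` and the dense NumPy representation are illustrative assumptions.

```python
import numpy as np


def gcn_layer(adj, features, weights):
    """One standard GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj      : (n, n) symmetric adjacency matrix without self-loops
    features : (n, d_in) node feature matrix X
    weights  : (d_in, d_out) learnable weight matrix W
    """
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)         # deg >= 1, so no division by zero
    # Symmetric normalization: scale row i and column j by deg^-1/2.
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weights, 0.0)  # ReLU
```

In this baseline only `weights` is learned while `norm_adj` stays fixed, which is the limitation that refining the topology is meant to address.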

Tuesday 13 10:50 - 12:35 AMS|MP - Multi-agent Planning (2401-2402)

Chair: Sven Koenig

#344

Multi-Agent Pathfinding with Continuous Time
Anton Andreychuk, Konstantin Yakovlev, Dor Atzmon, Roni Stern
Details | PDF

Multi-agent Planning

Multi-Agent Pathfinding (MAPF) is the problem of finding paths for multiple agents such that every agent reaches its goal and the agents do not collide. Most prior work on MAPF was on grids, and assumed that agents' actions have uniform duration and that time is discretized into timesteps. In this work, we propose a MAPF algorithm that does not make any of these assumptions, is complete, and provides provably optimal solutions. This algorithm is based on a novel combination of Safe Interval Path Planning (SIPP), a continuous-time single-agent planning algorithm, and Conflict-Based Search (CBS). We analyze this algorithm, discuss its pros and cons, and evaluate it experimentally on several standard benchmarks.
#2201

Priority Inheritance with Backtracking for Iterative Multi-agent Path Finding
Keisuke Okumura, Manao Machida, Xavier Défago, Yasumasa Tamura
Details | PDF

Multi-agent Planning

The Multi-agent Path Finding (MAPF) problem consists in all agents having to move to their own destinations while avoiding collisions. In practical applications to the problem, such as for navigation in an automated warehouse, MAPF must be solved iteratively. We present here a novel approach to iterative MAPF, that we call Priority Inheritance with Backtracking (PIBT). PIBT gives a unique priority to each agent every timestep, so that all movements are prioritized. Priority inheritance, which aims at dealing effectively with priority inversion in path adjustment within a small time window, can be applied iteratively and a backtracking protocol prevents agents from being stuck. We prove that, regardless of their number, all agents are guaranteed to reach their destination within finite time, when the environment is a graph such that all pairs of adjacent nodes belong to a simple cycle of length 3 or more (e.g., biconnected). Our implementation of PIBT can be fully decentralized without global communication. Experimental results over various scenarios confirm that PIBT is adequate both for finding paths in large environments with many agents, as well as for conveying packages in an automated warehouse.
#2971

Improved Heuristics for Multi-Agent Path Finding with Conflict-Based Search
Jiaoyang Li, Ariel Felner, Eli Boyarski, Hang Ma, Sven Koenig
Details | PDF

Multi-agent Planning

Conflict-Based Search (CBS) and its enhancements are among the strongest algorithms for Multi-Agent Path Finding. Recent work introduced an admissible heuristic to guide the high-level search of CBS. In this work, we prove the limitation of this heuristic, as it is based on cardinal conflicts only. We then introduce two new admissible heuristics by reasoning about the pairwise dependencies between agents. Empirically, CBS with either new heuristic significantly improves the success rate over CBS with the recent heuristic and reduces the number of expanded nodes and runtime by up to a factor of 50.
#5401

Multi-Robot Planning Under Uncertain Travel Times and Safety Constraints
Masoumeh Mansouri, Bruno Lacerda, Nick Hawes, Federico Pecora
Details | PDF

Multi-agent Planning

We present a novel modelling and planning approach for multi-robot systems under uncertain travel times. The approach uses generalised stochastic Petri nets (GSPNs) to model desired team behaviour, and allows safety constraints and rewards to be specified. The GSPN is interpreted as a Markov decision process (MDP) for which we can generate policies that optimise the requirements. This representation is more compact than the equivalent multi-agent MDP, allowing us to scale better. Furthermore, it naturally allows for asynchronous execution of the generated policies across the robots, yielding smoother team behaviour. We also describe how the integration of the GSPN with a lower-level team controller allows for accurate expectations on team performance. We evaluate our approach on an industrial scenario, showing that it outperforms hand-crafted policies used in current practice.
#10985

(Journal track) Implicitly Coordinated Multi-Agent Path Finding under Destination Uncertainty: Success Guarantees and Computational Complexity
Bernhard Nebel, Thomas Bolander, Thorsten Engesser, Robert Mattmüller
Details | PDF

Multi-agent Planning

In multi-agent path finding, it is usually assumed that planning is performed centrally and that the destinations of the agents are common knowledge. We will drop both assumptions and analyze under which conditions it can be guaranteed that the agents reach their respective destinations using implicitly coordinated plans without communication.
#2916

Unifying Search-based and Compilation-based Approaches to Multi-agent Path Finding through Satisfiability Modulo Theories
Pavel Surynek
Details | PDF

Multi-agent Planning

We unify search-based and compilation-based approaches to multi-agent path finding (MAPF) through satisfiability modulo theories (SMT). The task in MAPF is to navigate agents in an undirected graph to given goal vertices so that they do not collide. We rephrase Conflict-Based Search (CBS), one of the state-of-the-art algorithms for optimal MAPF solving, in the terms of SMT. This idea combines SAT-based solving known from MDD-SAT, a SAT-based optimal MAPF solver, at the low-level with conflict elimination of CBS at the high-level. Where the standard CBS branches the search after a conflict, we refine the propositional model with a disjunctive constraint. Our novel algorithm called SMT-CBS hence does not branch at the high-level but incrementally extends the propositional model. We experimentally compare SMT-CBS with CBS, ICBS, and MDD-SAT.
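The conflict on which standard CBS branches (and which SMT-CBS instead turns into a disjunctive constraint) can be made concrete with a small sketch. It assumes discrete timesteps, paths given as lists of vertices, and agents that wait at their last vertex; the function name `first_conflict` and the tuple layout of the result are hypothetical.

```python
def first_conflict(path_a, path_b):
    """Return the earliest conflict between two discrete-time paths, or None.

    A vertex conflict is reported as ("vertex", t, v) when both agents occupy
    vertex v at time t. An edge (swap) conflict is reported as
    ("edge", t, (u, v)) when the agents exchange positions between t and t + 1.
    """
    horizon = max(len(path_a), len(path_b))

    def at(path, t):
        # Assumption: an agent stays at its final vertex after finishing.
        return path[min(t, len(path) - 1)]

    for t in range(horizon):
        if at(path_a, t) == at(path_b, t):
            return ("vertex", t, at(path_a, t))
        if (t + 1 < horizon
                and at(path_a, t) == at(path_b, t + 1)
                and at(path_a, t + 1) == at(path_b, t)):
            return ("edge", t, (at(path_a, t), at(path_a, t + 1)))
    return None
```

For example, `first_conflict(["A", "B", "C"], ["C", "B", "A"])` returns `("vertex", 1, "B")`, since both agents occupy `B` at timestep 1.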
#2472

Reachability Games in Dynamic Epistemic Logic
Bastien Maubert, Sophie Pinchinat, François Schwarzentruber
Details | PDF

Multi-agent Planning

We define reachability games based on Dynamic Epistemic Logic (DEL), where the players' actions are finely described as DEL action models. We first consider the setting where a controller with perfect information interacts with an environment and aims at reaching some desired state of knowledge regarding the observers of the system. We study the problem of existence of a strategy for the controller, which generalises the classic epistemic planning problem, and we solve it for several types of actions such as public announcements and public actions. We then consider a yet richer setting where observers themselves are players, whose strategies must be based on their observations. We establish several decidability and undecidability results for the problem of existence of a distributed strategy, depending on the type of actions the players can use, and relate them to results from the literature on multiplayer games with imperfect information.

Tuesday 13 10:50 - 12:35 DemoT1 - Demo Talks 1 (2403-2404)

Chair: Matjaz Gams

#11022

Fair and Explainable Dynamic Engagement of Crowd Workers
Han Yu, Yang Liu, Xiguang Wei, Chuyu Zheng, Tianjian Chen, Qiang Yang, Xiong Peng
Details | PDF

Demo Talks 1

Years of rural-urban migration have resulted in a significant population in China seeking ad-hoc work in large urban centres. At the same time, many businesses face large fluctuations in demand for manpower and require more efficient ways to satisfy such demands. This paper outlines AlgoCrowd, an artificial intelligence (AI)-empowered algorithmic crowdsourcing platform. Equipped with an efficient explainable task-worker matching optimization approach designed to focus on fair treatment of workers while maximizing collective utility, the platform provides explainable task recommendations to workers' personal work management mobile apps, which are becoming popular, with the aim of addressing the above societal challenge.
#11024

Multi-Agent Visualization for Explaining Federated Learning
Xiguang Wei, Quan Li, Yang Liu, Han Yu, Tianjian Chen, Qiang Yang
Details | PDF

Demo Talks 1

As an alternative decentralized training approach, Federated Learning enables distributed agents to collaboratively learn a machine learning model while keeping personal/private information on local devices. However, one significant issue of this framework is the lack of transparency, which obscures understanding of the working mechanism of Federated Learning systems. This paper proposes a multi-agent visualization system that illustrates what Federated Learning is and how it supports multi-agent coordination. To be specific, it allows users to participate in the Federated Learning empowered multi-agent coordination. The input and output of Federated Learning are visualized simultaneously, which provides an intuitive explanation of Federated Learning for users in order to help them gain a deeper understanding of the technology.
#11028

AiD-EM: Adaptive Decision Support for Electricity Markets Negotiations
Tiago Pinto, Zita Vale
Details | PDF

Demo Talks 1

This paper presents the Adaptive Decision Support for Electricity Markets Negotiations (AiD-EM) system. AiD-EM is a multi-agent system that provides decision support to market players by incorporating multiple sub-(agent-based) systems, directed to the decision support of specific problems. These sub-systems make use of different artificial intelligence methodologies, such as machine learning and evolutionary computing, to enable players' adaptation in the planning phase and in actual negotiations in auction-based markets and bilateral negotiations. The AiD-EM demonstration is enabled by its connection to MASCEM (Multi-Agent Simulator of Competitive Electricity Markets).
#11032

Embodied Conversational AI Agents in a Multi-modal Multi-agent Competitive Dialogue
Rahul R. Divekar, Xiangyang Mou, Lisha Chen, Maíra Gatti de Bayser, Melina Alberio Guerra, Hui Su
Details | PDF

Demo Talks 1

In a setting where two AI agents embodied as animated humanoid avatars are engaged in a conversation with one human and each other, we see two challenges. One, determination by the AI agents about which one of them is being addressed. Two, determination by the AI agents if they may/could/should speak at the end of a turn. In this work we bring these two challenges together and explore the participation of AI agents in multi-party conversations. Particularly, we show two embodied AI shopkeeper agents who sell similar items aiming to get the business of a user by competing with each other on the price. In this scenario, we solve the first challenge by using headpose (estimated by deep learning techniques) to determine who the user is talking to. For the second challenge we use deontic logic to model rules of a negotiation conversation.
#11043

Multi-Agent Path Finding on Ozobots
Roman Barták, Ivan Krasičenko, Jiří Švancara
Details | PDF

Demo Talks 1

Multi-agent path finding (MAPF) is the problem of finding collision-free paths for a set of agents (mobile robots) moving on a graph. There exist several abstract models describing the problem with various types of constraints. The demo presents software to evaluate the abstract models when the plans are executed on Ozobots, small mobile robots developed for teaching programming. The software allows users to design grid-like maps, to specify initial and goal locations of robots, to generate plans using various abstract models implemented in the Picat programming language, to simulate and visualise the execution of these plans, and to translate the plans to command sequences for Ozobots.
#11050

Reagent: Converting Ordinary Webpages into Interactive Software Agents
Matthew Peveler, Jeffrey O. Kephart, Hui Su
Details | PDF

Demo Talks 1

We introduce Reagent, a technology that can be used in conjunction with automated speech recognition to allow users to query and manipulate ordinary webpages via speech and pointing. Reagent can be used out-of-the-box with third-party websites, as it requires neither special instrumentation from website developers nor special domain knowledge to capture semantically-meaningful mouse interactions with structured elements such as tables and plots. When it is unable to infer mappings between domain vocabulary and visible webpage content on its own, Reagent proactively seeks help by engaging in a voice-based interaction with the user.
#11029

Deep Reinforcement Learning for Ride-sharing Dispatching and Repositioning
Zhiwei (Tony) Qin, Xiaocheng Tang, Yan Jiao, Fan Zhang, Chenxi Wang, Qun (Tracy) Li
Details | PDF

Demo Talks 1

In this demo, we will present a simulation-based human-computer interaction of deep reinforcement learning in action on order dispatching and driver repositioning for ride-sharing. Specifically, we will demonstrate through several specially designed domains how we use deep reinforcement learning to train agents (drivers) to have longer optimization horizon and to cooperate to achieve higher objective values collectively.
#11041

Contextual Typeahead Sticker Suggestions on Hike Messenger
Mohamed Hanoosh, Abhishek Laddha, Debdoot Mukherjee
Details | PDF

Demo Talks 1

In this demonstration, we present Hike's sticker recommendation system, which helps users choose the right sticker to substitute the next message that they intend to send in a chat. We describe how the system addresses the issue of numerous orthographic variations for chat messages and operates under 20 milliseconds with low CPU and memory footprint on device.
#11023

InterSpot: Interactive Spammer Detection in Social Media
Kaize Ding, Jundong Li, Shivam Dhar, Shreyash Devan, Huan Liu
Details | PDF

Demo Talks 1

Spammer detection in social media has recently received increasing attention due to the rocketing growth of user-generated data. Despite the empirical success of existing systems, spammers may continuously evolve over time to impersonate normal users while new types of spammers may also emerge to combat with the current detection system, leading to the fact that a built system will gradually lose its efficacy in spotting spammers. To address this issue, grounded on the contextual bandit model, we present a novel system for conducting interactive spammer detection. We demonstrate our system by showcasing the interactive learning process, which allows the detection model to keep optimizing its detection strategy through incorporating the feedback information from human experts.
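The abstract above does not say which contextual bandit algorithm the system uses. As a generic illustration of how expert feedback can update such a model, here is a minimal sketch of a single LinUCB arm, a well-known linear contextual bandit method chosen here purely as an assumption; the class name `LinUCBArm` and its parameters are hypothetical.

```python
import numpy as np


class LinUCBArm:
    """One arm of LinUCB: ridge-regression reward estimate plus an
    exploration bonus that shrinks as similar contexts are observed."""

    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)      # regularized Gram matrix of seen contexts
        self.b = np.zeros(dim)    # reward-weighted sum of seen contexts
        self.alpha = alpha        # exploration strength

    def ucb(self, x):
        """Upper confidence bound on the expected reward for context x."""
        x = np.asarray(x, float)
        a_inv = np.linalg.inv(self.A)
        theta = a_inv @ self.b    # current coefficient estimate
        return float(theta @ x + self.alpha * np.sqrt(x @ a_inv @ x))

    def update(self, x, reward):
        """Incorporate feedback (e.g. an expert's label) for context x."""
        x = np.asarray(x, float)
        self.A += np.outer(x, x)
        self.b += reward * x
```

In an interactive setting of this kind, one would query the expert on the account whose context has the highest `ucb` value and then call `update` with the expert's answer encoded as a reward.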

Tuesday 13 10:50 - 12:35 Early Career 1 - Early Career Spotlight 1 (2405-2406)

Chair: Louise Trave

#11058

From Data to Knowledge Engineering for Cybersecurity
Gerardo I. Simari
Details | PDF

Early Career Spotlight 1

Data present in a wide array of platforms that are part of today's information systems lies at the foundation of many decision making processes, as we have now come to depend on social media, videos, news, forums, chats, ads, maps, and many other data sources for our daily lives. In this article, we first discuss how such data sources are involved in threats to systems' integrity, and then how they can be leveraged along with knowledge-based tools to tackle a set of challenges in the cybersecurity domain. Finally, we present a brief discussion of our roadmap for research and development in the near future to address the set of ever-evolving cyber threats that our systems face every day.
#11060

The Quest For "Always-On" Autonomous Mobile Robots
Joydeep Biswas
Details | PDF

Early Career Spotlight 1

Building "always-on" robots to be deployed over extended periods of time in real human environments is challenging for several reasons. Some fundamental questions that arise in the process include: 1) How can the robot reconcile unexpected differences between its observations and its outdated map of the world? 2) How can we scalably test robots for long-term autonomy? 3) Can a robot learn to predict its own failures, and their corresponding causes? 4) When the robot fails and is unable to recover autonomously, can it utilize partially specified, approximate human corrections to overcome its failures? We summarize our research towards addressing all of these questions. We present 1) Episodic non-Markov Localization to maintain the belief of the robot's location while explicitly reasoning about unmapped observations; 2) a 1,000km challenge to test for long-term autonomy; 3) feature-based and learning-based approaches to predicting failures; and 4) human-in-the-loop SLAM to overcome robot mapping errors, and SMT-based robot transition repair to overcome state machine failures.
#11055

Multiagent Decision Making and Learning in Urban Environments
Akshat Kumar
Details | PDF

Early Career Spotlight 1

Our increasingly interconnected urban environments provide several opportunities to deploy intelligent agents (from self-driving cars and ships to aerial drones) that promise to radically improve productivity and safety. Achieving coordination among agents in such urban settings presents several algorithmic challenges: the ability to scale to thousands of agents, and addressing uncertainty and partial observability in the environment. In addition, accurate domain models need to be learned from data that is often noisy and available only at an aggregate level. In this paper, I will overview some of our recent contributions towards developing planning and reinforcement learning strategies to address several such challenges present in large-scale urban multiagent systems.
#11056

What Does the Evidence Say? Models to Help Make Sense of the Biomedical Literature
Byron C. Wallace
Details | PDF

Early Career Spotlight 1

Ideally, decisions regarding medical treatments would be informed by the totality of the available evidence. The best evidence we currently have is in published natural language articles describing the conduct and results of clinical trials. Because these are unstructured, it is difficult for domain experts (e.g., physicians) to sort through and appraise the evidence pertaining to a given clinical question. Natural language technologies have the potential to improve access to the evidence via semi-automated processing of the biomedical literature. In this brief paper I highlight work on developing tasks, corpora, and models to support semi-automated evidence retrieval and extraction. The aim is to design models that can consume articles describing clinical trials and automatically extract from these key clinical variables and findings, and estimate their reliability. Completely automating 'machine reading' of evidence remains a distant aim given current technologies; the more immediate hope is to use such technologies to help domain experts access and make sense of unstructured biomedical evidence more efficiently, with the ultimate aim of improving patient care. Aside from their practical importance, these tasks pose core NLP challenges that directly motivate methodological innovation.

Tuesday 13 14:00 - 14:50 Invited Talk (D-I)

Chair: Thomas Eiter

Reasoning About The Behavior of AI Systems
Adnan Darwiche

Invited Talk

Tuesday 13 15:00 - 16:00 IJCAI-JAIR Best Paper Prize Session (K)

IJCAI-JAIR

IJCAI-JAIR Best Paper Prize Session

Tuesday 13 15:00 - 16:00 ST: Human AI & ML 1 - Special Track on Human AI and Machine Learning 1 (J)

Chair: Chen Gong

#1462

Playgol: Learning Programs Through Play
Andrew Cropper
Details | PDF

Special Track on Human AI and Machine Learning 1

Children learn through play. We introduce the analogous idea of learning programs through play. In this approach, a program induction system (the learner) is given a set of user-supplied build tasks and initial background knowledge (BK). Before solving the build tasks, the learner enters an unsupervised playing stage where it creates its own play tasks to solve, tries to solve them, and saves any solutions (programs) to the BK. After the playing stage is finished, the learner enters the supervised building stage where it tries to solve the build tasks and can reuse solutions learnt whilst playing. The idea is that playing allows the learner to discover reusable general programs on its own which can then help solve the build tasks. We claim that playing can improve learning performance. We show that playing can reduce the textual complexity of target concepts which in turn reduces the sample complexity of a learner. We implement our idea in Playgol, a new inductive logic programming system. We experimentally test our claim on two domains: robot planning and real-world string transformations. Our experimental results suggest that playing can substantially improve learning performance.
#1545

EL Embeddings: Geometric Construction of Models for the Description Logic EL++
Maxat Kulmanov, Wang Liu-Wei, Yuan Yan, Robert Hoehndorf
Details | PDF

Special Track on Human AI and Machine Learning 1

An embedding is a function that maps entities from one algebraic structure into another while preserving certain characteristics. Embeddings are being used successfully for mapping relational data or text into vector spaces where they can be used for machine learning, similarity search, or similar tasks. We address the problem of finding vector space embeddings for theories in the Description Logic EL++ that are also models of the TBox. To find such embeddings, we define an optimization problem that characterizes the model-theoretic semantics of the operators in EL++ within ℝⁿ, thereby solving the problem of finding an interpretation function for an EL++ theory given a particular domain Δ. Our approach is mainly relevant to large EL++ theories and knowledge bases such as the ontologies and knowledge graphs used in the life sciences. We demonstrate that our method can be used for improved prediction of protein-protein interactions when compared to semantic similarity measures or knowledge graph embeddings.
#5010

A Comparative Study of Distributional and Symbolic Paradigms for Relational Learning
Sebastijan Dumancic, Alberto Garcia-Duran, Mathias Niepert
Details | PDF

Special Track on Human AI and Machine Learning 1

Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs has traditionally been addressed by symbolic approaches such as Statistical relational learning, recent methods in (deep) representation learning have shown promising results for specialised tasks such as knowledge base completion. These approaches, also known as distributional, abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities and little is known about their respective strengths and weaknesses. In this work, we compare distributional and symbolic relational learning approaches on various standard relational classification and knowledge base completion tasks. Furthermore, we analyse the properties of the datasets and relate them to the performance of the methods in the comparison. The results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs.
#5666

Synthesizing Datalog Programs using Numerical Relaxation
Xujie Si, Mukund Raghothaman, Kihong Heo, Mayur Naik
Details | PDF

Special Track on Human AI and Machine Learning 1

The problem of learning logical rules from examples arises in diverse fields, including program synthesis, logic programming, and machine learning. Existing approaches either involve solving computationally difficult combinatorial problems, or performing parameter estimation in complex statistical models. In this paper, we present Difflog, a technique to extend the logic programming language Datalog to the continuous setting. By attaching real-valued weights to individual rules of a Datalog program, we naturally associate numerical values with individual conclusions of the program. Analogous to the strategy of numerical relaxation in optimization problems, we can now first determine the rule weights which cause the best agreement between the training labels and the induced values of output tuples, and subsequently recover the classical discrete-valued target program from the continuous optimum. We evaluate Difflog on a suite of 34 benchmark problems from recent literature in knowledge discovery, formal verification, and database query-by-example, and demonstrate significant improvements in learning complex programs with recursive rules, invented predicates, and relations of arbitrary arity.
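To make the idea of weighted rules inducing values on conclusions concrete, here is a minimal sketch for a two-rule reachability program, `path(x, y) :- edge(x, y)` and `path(x, z) :- path(x, y), edge(y, z)`. It assumes that a conclusion's value is the maximum, over its derivations, of the product of the weights of the rules used, and that weights lie in [0, 1]; this semantics and the name `weighted_reachability` are assumptions made for illustration and should be checked against the paper.

```python
def weighted_reachability(edges, w_base, w_step):
    """Compute values of path(x, y) tuples under max-product semantics.

    edges  : set of (x, y) input facts, each taken to have value 1
    w_base : weight of the rule  path(x, y) :- edge(x, y)
    w_step : weight of the rule  path(x, z) :- path(x, y), edge(y, z)
    Weights are assumed to lie in [0, 1], which guarantees termination.
    """
    assert 0.0 <= w_base <= 1.0 and 0.0 <= w_step <= 1.0
    value = {e: w_base for e in edges}          # base rule applied once
    changed = True
    while changed:                               # naive fixpoint iteration
        changed = False
        for (x, y), v in list(value.items()):
            for (y2, z) in edges:
                if y2 != y:
                    continue
                cand = v * w_step                # one more recursive rule use
                if cand > value.get((x, z), 0.0):
                    value[(x, z)] = cand         # keep the best derivation
                    changed = True
    return value
```

Under these assumptions, each extra application of the recursive rule multiplies the value by `w_step`, so tuples that need longer derivations receive smaller values, and setting a rule's weight to 0 removes its conclusions entirely.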

Tuesday 13 15:00 - 16:00 ML|DL - Deep Learning 2 (L)

Chair: Yahong Han

#2033

Position Focused Attention Network for Image-Text Matching
Yaxiong Wang, Hao Yang, Xueming Qian, Lin Ma, Jing Lu, Biao Li, Xin Fan
Details | PDF

Deep Learning 2

Image-text matching tasks have recently attracted a lot of attention in the computer vision field. The key point of this cross-domain problem is how to accurately measure the similarity between the visual and the textual contents, which demands a fine understanding of both modalities. In this paper, we propose a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views. In this work, we integrate the object position clue to enhance the visual-text joint-embedding learning. We first split the images into blocks, by which we infer the relative position of each region in the image. Then, an attention mechanism is proposed to model the relations between the image region and blocks and generate a valuable position feature, which is further utilized to enhance the region expression and model a more reliable relationship between the visual image and the textual sentence. Experiments on the popular datasets Flickr30K and MS-COCO show the effectiveness of the proposed method. Besides the public datasets, we also conduct experiments on our collected practical news dataset (Tencent-News) to validate the practical application value of the proposed method. As far as we know, this is the first attempt to test the performance on the practical application. Our method achieves state-of-the-art performance on all three datasets.
#3819

Hierarchical Representation Learning for Bipartite Graphs
Chong Li, Kunyang Jia, Dan Shen, C.J. Richard Shi, Hongxia Yang
Details | PDF

Deep Learning 2

Recommender systems on E-Commerce platforms track users' online behaviors and recommend relevant items according to each user's interests and needs. Bipartite graphs that capture both user/item features and user-item interactions have been demonstrated to be highly effective for this purpose. Recently, graph neural networks (GNNs) have been successfully applied to the representation of bipartite graphs in industrial recommender systems. Providing individualized recommendation on a dynamic platform with billions of users is extremely challenging. A key observation is that the users of an online E-Commerce platform can be naturally clustered into a set of communities. We propose to cluster the users into a set of communities and make recommendations based on the information of the users in the community collectively. More specifically, embeddings are assigned to the communities and the user embedding is decomposed into two parts, which capture community-level generalizations and individualized preferences, respectively. The community embedding can be considered as an enhancement to the GNN methods that are inherently flat and do not learn hierarchical representations of graphs. The performance of the proposed algorithm is demonstrated on a public dataset and a world-leading E-Commerce company dataset.
#4010

COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning
Wenxiao Wang, Cong Fu, Jishun Guo, Deng Cai, Xiaofei He
Details | PDF

Deep Learning 2

Neural network compression empowers the effective yet unwieldy deep convolutional neural networks (CNN) to be deployed in resource-constrained scenarios. Most state-of-the-art approaches prune the model at the filter level according to the "importance" of filters. Despite their success, we notice they suffer from at least two of the following problems: 1) The redundancy among filters is not considered because the importance is evaluated independently. 2) Cross-layer filter comparison is unachievable since the importance is defined locally within each layer. Consequently, we must manually specify layer-wise pruning ratios. 3) They are prone to generate sub-optimal solutions because they neglect the inequality between reducing parameters and reducing computational cost. Removing the same number of parameters at different positions in the network may reduce the computational cost by different amounts. To address the above problems, we develop a novel algorithm named COP (correlation-based pruning), which can detect the redundant filters efficiently. We enable the cross-layer filter comparison through global normalization. We add parameter-quantity and computational-cost regularization terms to the importance, which enables the users to customize the compression according to their preference (smaller or faster). Extensive experiments have shown COP outperforms the others significantly. The code is released at https://github.com/ZJULearning/COP.

Tuesday 13 15:00 - 16:00 MTA|RS - Recommender Systems 2 (2701-2702)

Chair: Yong Li

#3499

Explainable Fashion Recommendation: A Semantic Attribute Region Guided Approach
Min Hou, Le Wu, Enhong Chen, Zhi Li, Vincent W. Zheng, Qi Liu
Details | PDF

Recommender Systems 2

In fashion recommender systems, each product usually consists of multiple semantic attributes (e.g., sleeves, collar). When making clothing decisions, people usually show preferences for different semantic attributes (e.g., clothes with a v-neck collar). Nevertheless, most previous fashion recommendation models comprehend the clothing images with a global content representation and lack detailed understanding of users' semantic preferences, which usually leads to inferior recommendation performance. To bridge this gap, we propose a novel Semantic Attribute Explainable Recommender System (SAERS). Specifically, we first introduce a fine-grained interpretable semantic space. We then develop a Semantic Extraction Network (SEN) and Fine-grained Preferences Attention (FPA) module to project users and items into this space, respectively. With SAERS, we are capable of not only providing clothing recommendations for users, but also explaining why we recommend an item through intuitive visual attribute semantic highlights in a personalized manner. Extensive experiments conducted on real-world datasets clearly demonstrate the effectiveness of our approach compared with the state-of-the-art methods.
#10969

(Sister Conferences Best Papers Track) Impact of Consuming Suggested Items on the Assessment of Recommendations in User Studies on Recommender Systems
Benedikt Loepp, Tim Donkers, Timm Kleemann, Jürgen Ziegler
Details | PDF

Recommender Systems 2

User studies are increasingly considered important in research on recommender systems. Although participants typically cannot consume any of the recommended items, they are often asked to assess the quality of recommendations and of other aspects related to user experience by means of questionnaires. Not being able to listen to recommended songs or to watch suggested movies might, however, limit the validity of the obtained results. Consequently, we have investigated the effect of consuming suggested items. In two user studies conducted in different domains, we showed that consumption may lead to differences in the assessment of recommendations and in questionnaire answers. Apparently, adequately measuring user experience is in some cases not possible without allowing users to consume items. On the other hand, participants sometimes seem to approximate the actual value of recommendations reasonably well depending on domain and provided information.
#5041

Binarized Collaborative Filtering with Distilling Graph Convolutional Network
Haoyu Wang, Defu Lian, Yong Ge
Details | PDF

Recommender Systems 2

The efficiency of top-K item recommendation based on implicit feedback is vital to recommender systems in the real world, but it is very challenging due to the lack of negative samples and the large number of candidate items. To address these challenges, we first introduce an improved Graph Convolutional Network (GCN) model with high-order feature interaction considered. Then we distill the ranking information derived from GCN into binarized collaborative filtering, which makes use of binary representation to improve the efficiency of online recommendation. However, binary codes are not only hard to optimize but also likely to incur a loss of information during the training process. Therefore, we propose a novel framework to convert the binary constrained optimization problem into an equivalent continuous optimization problem with a stochastic penalty. The binarized collaborative filtering model is then easily optimized by many popular solvers like SGD and Adam. The proposed algorithm is finally evaluated on three real-world datasets and shown to be superior to the competing baselines.
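
To illustrate why binary representations make online recommendation cheap, here is a minimal sketch (not the authors' model). It assumes users and items are already encoded as codes with entries in {+1, -1}; the helper names `pack`, `hamming`, `inner_product`, and `top_k` are hypothetical. For codes of length r, the inner product equals r minus twice the Hamming distance, so ranking by Hamming distance on packed integers gives the same order as ranking by inner product.

```python
def pack(code):
    """Pack a +1/-1 code into an integer, one bit per entry (+1 -> 1, -1 -> 0)."""
    value = 0
    for bit in code:
        value = (value << 1) | (1 if bit > 0 else 0)
    return value


def hamming(a, b):
    """Hamming distance between two packed codes via XOR and a bit count."""
    return bin(a ^ b).count("1")


def inner_product(u, v):
    """Plain inner product of two +1/-1 codes, used here only as a reference."""
    return sum(x * y for x, y in zip(u, v))


def top_k(user_code, item_codes, k):
    """Return indices of the k items closest to the user in Hamming distance."""
    u = pack(user_code)
    ranked = sorted(
        range(len(item_codes)),
        key=lambda i: hamming(u, pack(item_codes[i])),
    )
    return ranked[:k]


# Illustrative codes (made up for this example).
user = [1, -1, 1, 1]
items = [[1, -1, 1, 1], [-1, 1, -1, -1], [1, -1, -1, 1]]
best_two = top_k(user, items, 2)
```

The sketch covers only the lookup step; learning the codes (the distillation and the continuous reformulation described in the abstract) is outside its scope.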
#3878

Co-Attentive Multi-Task Learning for Explainable Recommendation
Zhongxia Chen, Xiting Wang, Xing Xie, Tong Wu, Guoqing Bu, Yining Wang, Enhong Chen
Details | PDF

Recommender Systems 2

Despite widespread adoption, recommender systems remain mostly black boxes. Recently, providing explanations about why items are recommended has attracted increasing attention due to its capability to enhance user trust and satisfaction. In this paper, we propose a co-attentive multi-task learning model for explainable recommendation. Our model improves both prediction accuracy and explainability of recommendation by fully exploiting the correlations between the recommendation task and the explanation task. In particular, we design an encoder-selector-decoder architecture inspired by the human information-processing model in cognitive psychology. We also propose a hierarchical co-attentive selector to effectively model the cross knowledge transferred for both tasks. Our model not only enhances prediction accuracy of the recommendation task, but also generates linguistic explanations that are fluent, useful, and highly personalized. Experiments on three public datasets demonstrate the effectiveness of our model.

Tuesday 13 15:00 - 16:00 HAI|HCC - Human Computation and Crowdsourcing (2703-2704)

Chair: Chengqi Zhang

#630

MiSC: Mixed Strategies Crowdsourcing
Ching Yun Ko, Rui Lin, Shu Li, Ngai Wong
Details | PDF

Human Computation and Crowdsourcing

Popular crowdsourcing techniques mostly focus on evaluating workers' labeling quality before adjusting their weights during label aggregation. Recently, another cohort of models regard crowdsourced annotations as incomplete tensors and recover unfilled labels by tensor completion. However, mixed strategies of the two methodologies have never been comprehensively investigated, leaving them as rather independent approaches. In this work, we propose MiSC (Mixed Strategies Crowdsourcing), a versatile framework integrating arbitrary conventional crowdsourcing and tensor completion techniques. In particular, we propose a novel iterative Tucker label aggregation algorithm that outperforms state-of-the-art methods in extensive experiments.
#1477

Multiple Noisy Label Distribution Propagation for Crowdsourcing
Hao Zhang, Liangxiao Jiang, Wenqiang Xu
Details | PDF

Human Computation and Crowdsourcing

Crowdsourcing services provide a fast, efficient, and cost-effective means of obtaining large labeled data for supervised learning. Ground truth inference, also called label integration, designs proper aggregation strategies to infer the unknown true label of each instance from the multiple noisy label set provided by ordinary crowd workers. However, to the best of our knowledge, nearly all existing label integration methods focus solely on the multiple noisy label set itself of the individual instance while totally ignoring the intercorrelation among multiple noisy label sets of different instances. To solve this problem, a multiple noisy label distribution propagation (MNLDP) method is proposed in this study. MNLDP first transforms the multiple noisy label set of each instance into its multiple noisy label distribution and then propagates its multiple noisy label distribution to its nearest neighbors. Consequently, each instance absorbs a fraction of the multiple noisy label distributions from its nearest neighbors and yet simultaneously maintains a fraction of its own original multiple noisy label distribution. Promising experimental results on simulated and real-world datasets validate the effectiveness of our proposed method.
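
The two steps described above (turning each noisy label set into a distribution, then mixing it with the neighbors' distributions) can be sketched as follows. This is a simplified illustration rather than the authors' MNLDP implementation: the Euclidean k-nearest-neighbor search, the single propagation pass, and the parameters `k` and `alpha` (the fraction absorbed from neighbors) are all assumptions made for the example.

```python
import math
from collections import Counter


def label_distribution(noisy_labels, classes):
    """Turn one instance's multiple noisy label set into a distribution."""
    counts = Counter(noisy_labels)
    return [counts[c] / len(noisy_labels) for c in classes]


def nearest_neighbors(features, index, k):
    """Indices of the k instances closest to `index` (Euclidean distance)."""
    dists = sorted(
        (math.dist(features[index], f), j)
        for j, f in enumerate(features)
        if j != index
    )
    return [j for _, j in dists[:k]]


def propagate(features, noisy_label_sets, classes, k=2, alpha=0.5):
    """Each instance keeps (1 - alpha) of its own distribution and absorbs
    alpha of the average distribution of its k nearest neighbors."""
    own = [label_distribution(ls, classes) for ls in noisy_label_sets]
    mixed = []
    for i in range(len(features)):
        nbrs = nearest_neighbors(features, i, k)
        row = []
        for c in range(len(classes)):
            nbr_avg = sum(own[j][c] for j in nbrs) / len(nbrs)
            row.append((1 - alpha) * own[i][c] + alpha * nbr_avg)
        mixed.append(row)
    return mixed


def integrate(distributions, classes):
    """Pick the most probable class for each instance."""
    return [classes[max(range(len(classes)), key=d.__getitem__)]
            for d in distributions]


# Toy data (made up): two well-separated clusters of three instances each.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
label_sets = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 0, 1]]
labels = integrate(propagate(features, label_sets, [0, 1]), [0, 1])
```

In the toy data, the third instance has a noisy majority for class 1, but its neighbors pull the integrated label to class 0, which is the effect of exploiting correlations between instances.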
#10968

(Sister Conferences Best Papers Track) Quality Control Attack Schemes in Crowdsourcing
Alessandro Checco, Jo Bates, Gianluca Demartini
Details | PDF

Human Computation and Crowdsourcing

An important precondition to build effective AI models is the collection of training data at scale. Crowdsourcing is a popular methodology to achieve this goal. Its adoption introduces novel challenges in data quality control, to deal with under-performing and malicious annotators. One of the most popular quality assurance mechanisms, especially in paid micro-task crowdsourcing, is the use of a small set of pre-annotated tasks as gold standard, to assess in real time the annotators' quality. In this paper, we highlight a set of vulnerabilities this scheme suffers from: a group of colluding crowd workers can easily implement and deploy a decentralised machine learning inferential system to detect and signal which parts of the task are more likely to be gold questions, making them ineffective as a quality control tool. Moreover, we demonstrate how the most common countermeasures against this attack are ineffective in practical scenarios. The basic architecture of the inferential system is composed of a browser plug-in and an external server where the colluding workers can share information. We implement and validate the attack scheme, by means of experiments on real-world data from a popular crowdsourcing platform.
#4194

Boosting for Comparison-Based Learning
Michael Perrot, Ulrike von Luxburg
Details | PDF

Human Computation and Crowdsourcing

We consider the problem of classification in a comparison-based setting: given a set of objects, we only have access to triplet comparisons of the form "object A is closer to object B than to object C." In this paper we introduce TripletBoost, a new method that can learn a classifier just from such triplet comparisons. The main idea is to aggregate the triplet information into weak classifiers, which can subsequently be boosted to a strong classifier. Our method has two main advantages: (i) it is applicable to data from any metric space, and (ii) it can deal with large scale problems using only passively obtained and noisy triplets. We derive theoretical generalization guarantees and a lower bound on the number of necessary triplets, and we empirically show that our method is both competitive with state-of-the-art approaches and resistant to noise.

Tuesday 13 15:00 - 16:00 AMS|TR - Trust and Reputation (2705-2706)

Chair: Catholijn Jonker

#486

A Value-based Trust Assessment Model for Multi-agent Systems
Kinzang Chhogyal, Abhaya Nayak, Aditya Ghose, Hoa K. Dam
Details | PDF

Trust and Reputation

An agent's assessment of its trust in another agent is commonly taken to be a measure of the reliability/predictability of the latter's actions. It is based on the trustor's past observations of the behaviour of the trustee and requires no knowledge of the inner workings of the trustee. However, in situations that are new or unfamiliar, past observations are of little help in assessing trust. In such cases, knowledge about the trustee can help. A particular type of knowledge is that of values - things that are important to the trustor and the trustee. In this paper, based on the premise that the more values two agents share, the more they should trust one another, we propose a simple approach to trust assessment between agents based on values, taking into account whether agents trust cautiously or boldly, and whether they depend on others in carrying out a task.
#4579

Spotting Collective Behaviour of Online Frauds in Customer Reviews
Sarthika Dhawan, Siva Charan Reddy Gangireddy, Shiv Kumar, Tanmoy Chakraborty
Details | PDF

Trust and Reputation

Online reviews play a crucial role in deciding the quality before purchasing any product. Unfortunately, spammers often take advantage of online review forums by writing fraud reviews to promote/demote certain products. It may turn out to be more detrimental when such spammers collude and collectively inject spam reviews as they can take complete control of users' sentiment due to the volume of fraud reviews they inject. Group spam detection is thus more challenging than individual-level fraud detection due to the unclear definition of a group, variation of inter-group dynamics, scarcity of labeled group-level spam data, etc. Here, we propose DeFrauder, an unsupervised method to detect online fraud reviewer groups. It first detects candidate fraud groups by leveraging the underlying product review graph and incorporating several behavioral signals which model multi-faceted collaboration among reviewers. It then maps reviewers into an embedding space and assigns a spam score to each group such that groups comprising spammers with highly similar behavioral traits achieve a high spam score. Compared with five baselines on four real-world datasets (two of them were curated by us), DeFrauder shows superior performance by outperforming the best baseline with 17.11% higher NDCG@50 (on average) across datasets.
#5274

FaRM: Fair Reward Mechanism for Information Aggregation in Spontaneous Localized Settings
Moin Hussain Moti, Dimitris Chatzopoulos, Pan Hui, Sujit Gujar
Details | PDF

Trust and Reputation

Although peer prediction markets are widely used in crowdsourcing to aggregate information from agents, they often fail to reward the participating agents equitably. Honest agents can be wrongly penalized if randomly paired with dishonest ones. In this work, we introduce selective and cumulative fairness. We characterize a mechanism as fair if it satisfies both notions and present FaRM, a representative mechanism we designed. FaRM is a Nash incentive mechanism that focuses on information aggregation for spontaneous local activities which are accessible to a limited number of agents without assuming any prior knowledge of the event. All the agents in the vicinity observe the same information. FaRM uses (i) a report strength score to remove the risk of random pairing with dishonest reporters, (ii) a consistency score to measure an agent's history of accurate reports and distinguish valuable reports, (iii) a reliability score to estimate the probability that an agent colludes with nearby agents and to prevent agents from getting swayed, and (iv) a location robustness score to filter agents who try to participate without being present in the considered setting. Together, report strength, consistency, and reliability represent a fair reward given to agents based on their reports.
#5745

Identifying vulnerabilities in trust and reputation systems
Taha D. Güneş, Long Tran-Thanh, Timothy J. Norman
Details | PDF

Trust and Reputation

Online communities use trust and reputation systems to assist their users in evaluating other parties. Due to the preponderance of these systems, malicious entities have a strong incentive to attempt to influence them, and strategies employed are increasingly sophisticated. Current practice is to evaluate trust and reputation systems against known attacks, and hence is heavily reliant on expert analysts. We present a novel method for automatically identifying vulnerabilities in such systems by formulating the problem as a derivative-free optimisation problem and applying efficient sampling methods. We illustrate the application of this method for attacks that involve the injection of false evidence, and identify vulnerabilities in existing trust models. In this way, we provide reliable and objective means to assess how robust trust and reputation systems are to different kinds of attacks.

Tuesday 13 15:00 - 16:00 PS|S - Scheduling (2601-2602)

Chair: Gerhard Friedrich

#253

Faster Dynamic Controllability Checking in Temporal Networks with Integer Bounds
Nikhil Bhargava, Brian C. Williams
Details | PDF

Scheduling

Simple Temporal Networks with Uncertainty (STNUs) provide a useful formalism with which to reason about events and the temporal constraints that apply to them. STNUs are in particular notable because they facilitate reasoning over stochastic, or uncontrollable, actions and their corresponding durations. To evaluate the feasibility of a set of constraints associated with an STNU, one checks the network's dynamic controllability, which determines whether an adaptive schedule can be constructed on-the-fly. Our work improves the runtime of checking the dynamic controllability of STNUs with integer bounds to O(min(mn, m sqrt(n) log N) + km + k^2n + kn log n). Our approach pre-processes the STNU using an existing O(n^3) dynamic controllability checking algorithm and provides tighter bounds on its runtime. This makes our work easily adaptable to other algorithms that rely on checking variants of dynamic controllability.
#2658

Scheduling Jobs with Stochastic Processing Time on Parallel Identical Machines
Richard Stec, Antonin Novak, Premysl Sucha, Zdenek Hanzalek
Details | PDF

Scheduling

Many real-world scheduling problems are characterized by uncertain parameters. In this paper, we study a classical parallel machine scheduling problem where the processing time of jobs is given by a normal distribution. The objective is to maximize the probability that jobs are completed before a given common due date. This study focuses on the computational aspect of this problem, and it proposes a Branch-and-Price approach for solving it. The advantage of our method is that it scales very well with the increasing number of machines and is easy to implement. Furthermore, we propose an efficient lower-bound heuristic. The experimental results show that our method outperforms the existing approaches.
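
To make the objective concrete, the sketch below evaluates it for a fixed assignment of jobs to machines. It is only an evaluation helper, not the Branch-and-Price method, and it assumes that processing times are mutually independent (an assumption not stated in the abstract). Under that assumption the load of a machine is normal with summed means and variances, and the probability that every machine finishes by the due date is a product of normal CDF values. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def on_time_probability(assignment, due_date):
    """Probability that all machines finish by `due_date`.

    `assignment` is a list of machines; each machine is a list of
    (mean, std) pairs describing independent normal processing times.
    """
    prob = 1.0
    for jobs in assignment:
        if not jobs:
            continue  # an empty machine always finishes on time
        mean = sum(m for m, _ in jobs)
        var = sum(s * s for _, s in jobs)
        if var == 0:
            prob *= 1.0 if mean <= due_date else 0.0
        else:
            prob *= normal_cdf((due_date - mean) / math.sqrt(var))
    return prob


# Four identical jobs (mean 4, std 1) on two machines, due date 10.
job = (4.0, 1.0)
balanced = on_time_probability([[job, job], [job, job]], 10.0)
unbalanced = on_time_probability([[job, job, job], [job]], 10.0)
```

With these made-up numbers the balanced assignment has a much higher on-time probability than the unbalanced one, which is the quantity a solver for this problem would try to maximize over all assignments.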
#4311

Fair Online Allocation of Perishable Goods and its Application to Electric Vehicle Charging
Enrico H. Gerding, Alvaro Perez-Diaz, Haris Aziz, Serge Gaspers, Antonia Marcu, Nicholas Mattei, Toby Walsh
Details | PDF

Scheduling

We consider mechanisms for the online allocation of perishable resources such as energy or computational power. A main application is electric vehicle charging where agents arrive and leave over time. Unlike previous work, we consider mechanisms without money, and a range of objectives including fairness and efficiency. In doing so, we extend the concept of envy-freeness to online settings. Furthermore, we explore the trade-offs between different objectives and analyse their theoretical properties both in online and offline settings. We then introduce novel online scheduling algorithms and compare them in terms of both their theoretical properties and empirical performance.
#10979

(Journal track) Complexity Bounds for the Controllability of Temporal Networks with Conditions, Disjunctions, and Uncertainty
Nikhil Bhargava, Brian C. Williams
Details | PDF

Scheduling

In temporal planning, many different temporal network formalisms are used to model real world situations. Each of these formalisms has different features which affect how easy it is to determine whether the underlying network of temporal constraints is consistent. While many of the simpler models have been well-studied from a computational complexity perspective, the algorithms developed for advanced models which combine features have very loose complexity bounds. In this work, we provide tight completeness bounds for strong, weak, and dynamic controllability checking of temporal networks that have conditions, disjunctions, and temporal uncertainty. Our work exposes some of the subtle differences between these different structures and, remarkably, establishes a guarantee that all of these problems are computable in PSPACE.

Tuesday 13 15:00 - 16:00 KRR|RKB - Reasoning about Knowledge and Belief (2603-2604)

Chair: Gerhard Lakemeyer

#4209

The Complexity of Model Checking Knowledge and Time
Laura Bozzelli, Bastien Maubert, Aniello Murano
Details | PDF

Reasoning about Knowledge and Belief

We establish the precise complexity of the model checking problem for the main logics of knowledge and time. While this problem was known to be non-elementary for agents with perfect recall, with a number of exponentials that increases with the alternation of knowledge operators, the precise complexity of the problem when the maximum alternation is fixed has been an open problem for twenty years. We close it by establishing improved upper bounds for CTL* with knowledge, and providing matching lower bounds that also apply for epistemic extensions of LTL and CTL.
#4563

Converging on Common Knowledge
Dominik Klein, Rasmus Kræmmer Rendsvig
Details | PDF

Reasoning about Knowledge and Belief

Common knowledge, as is well known, is not attainable in finite time by unreliable communication, thus hindering perfect coordination. Focusing on the coordinated attack problem modeled using dynamic epistemic logic, this paper discusses unreliable communication protocols from a topological perspective and asks "If the generals may communicate indefinitely, will they then *converge* to a state of common knowledge?" We answer by making precise and showing the following: *common knowledge is attainable if, and only if, we do not care about common knowledge*.
#2601

A Modal Characterization Theorem for a Probabilistic Fuzzy Description Logic
Paul Wild, Lutz Schröder, Dirk Pattinson, Barbara König
Details | PDF

Reasoning about Knowledge and Belief

The fuzzy modality "probably" is interpreted over probabilistic type spaces by taking expected truth values. The arising probabilistic fuzzy description logic is invariant under probabilistic bisimilarity; more informatively, it is non-expansive with respect to a suitable notion of behavioural distance. In the present paper, we provide a characterization of the expressive power of this logic based on this observation: We prove a probabilistic analogue of the classical van Benthem theorem, which states that modal logic is precisely the bisimulation-invariant fragment of first-order logic. Specifically, we show that every formula in probabilistic fuzzy first-order logic that is non-expansive with respect to behavioural distance can be approximated by concepts of bounded rank in probabilistic fuzzy description logic.
#4270

Accelerated Inference Framework of Sparse Neural Network Based on Nested Bitmask Structure
Yipeng Zhang, Bo Du, Lefei Zhang, Rongchun Li, Yong Dou
Details | PDF

Reasoning about Knowledge and Belief

In order to satisfy the ever-growing demand for high-performance processors for neural networks, the state-of-the-art processing units tend to use application-oriented circuits to replace the Processing Engine (PE) on the GPU under circumstances where low-power solutions are required. The application-oriented PE is fully optimized in terms of the circuit architecture and eliminates incorrect data dependency and instructional redundancy. In this paper, we propose a novel encoding approach on a sparse neural network after pruning. We partition the weight matrix into numerous blocks and use a low-rank binary map to represent the validation of these blocks. Furthermore, the elements in each nonzero block are also encoded into two submatrices: one is the binary stream discriminating the zero/nonzero position, while the other is the pure nonzero elements stored in the FIFO. In the experimental part, we implement a well pre-trained sparse neural network on the Xilinx FPGA VC707. Experimental results show that our algorithm outperforms the other benchmarks. Our approach has successfully optimized the throughput and the energy efficiency to deal with a single frame. Accordingly, we contend that the Nested Bitmask Neural Network (NBNN) is an efficient neural network structure with only minor accuracy loss on the SoC system.
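
The two-level encoding described above can be illustrated in software with a minimal sketch. It reads the "binary map" as one bit per block that marks whether the block contains any nonzero weight, which is an interpretation of the abstract rather than a confirmed detail, and it assumes the matrix dimensions are multiples of the block size. The functions `encode` and `decode` are hypothetical names; the actual design targets hardware, not Python lists.

```python
def encode(matrix, block):
    """Encode a sparse matrix as (block map, per-block bitmasks, value FIFO).

    Assumes both matrix dimensions are multiples of `block`.
    """
    rows, cols = len(matrix), len(matrix[0])
    block_map = []      # outer level: 1 if the block holds any nonzero element
    element_masks = []  # inner level: zero/nonzero bitmask per nonzero block
    values = []         # nonzero elements in first-in, first-out order
    for br in range(0, rows, block):
        map_row = []
        for bc in range(0, cols, block):
            elems = [matrix[r][c]
                     for r in range(br, br + block)
                     for c in range(bc, bc + block)]
            if any(e != 0 for e in elems):
                map_row.append(1)
                element_masks.append([1 if e != 0 else 0 for e in elems])
                values.extend(e for e in elems if e != 0)
            else:
                map_row.append(0)
        block_map.append(map_row)
    return block_map, element_masks, values


def decode(block_map, element_masks, values, block):
    """Rebuild the dense matrix from the nested bitmask encoding."""
    rows = len(block_map) * block
    cols = len(block_map[0]) * block
    matrix = [[0] * cols for _ in range(rows)]
    masks, fifo = iter(element_masks), iter(values)
    for i, map_row in enumerate(block_map):
        for j, bit in enumerate(map_row):
            if not bit:
                continue  # all-zero block: nothing stored for it
            for k, flag in enumerate(next(masks)):
                if flag:
                    matrix[i * block + k // block][j * block + k % block] = next(fifo)
    return matrix


# Made-up 4x4 pruned weight matrix with 2x2 blocks.
W = [[0, 0, 1.5, 0],
     [0, 0, 0, -2.0],
     [0, 0, 0, 0],
     [3.0, 0, 0, 0]]
encoded = encode(W, 2)
```

All-zero blocks cost a single bit in the outer map, so the storage saving grows with the number of blocks that pruning empties completely.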

Tuesday 13 15:00 - 16:00 NLP|MT - Machine Translation (2605-2606)

Chair: Lemao Liu

#1653

Sharing Attention Weights for Fast Transformer
Tong Xiao, Yinqiao Li, Jingbo Zhu, Zhengtao Yu, Tongran Liu
Details | PDF

Machine Translation

Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding. In this paper we speed up Transformer via a fast and lightweight attention model. More specifically, we share attention weights in adjacent layers and enable the efficient re-use of hidden states in a vertical manner. Moreover, the sharing policy can be jointly learned with the MT model. We test our approach on ten WMT and NIST OpenMT tasks. Experimental results show that it yields an average of 1.3X speed-up (with almost no decrease in BLEU) on top of a state-of-the-art implementation that has already adopted a cache for fast inference. Also, our approach obtains a 1.8X speed-up when it works with the AAN model. This is even 16 times faster than the baseline with no use of the attention cache.
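
The idea of sharing attention weights across adjacent layers can be sketched with a toy example. This is not the authors' implementation: it uses fixed queries and keys for all layers and a hypothetical `run_layers` helper, whereas a real Transformer has separate projections per layer, so sharing there is an approximation. The sketch only shows that the dot-product and softmax step is computed once and then reused with each layer's values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights, one row per query."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def apply_weights(weights, values):
    """Weighted sum of value vectors for each query."""
    dim = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
        for row in weights
    ]


def run_layers(queries, keys, values_per_layer, share):
    """Run a stack of attention layers; with share=True the weights computed
    in the first layer are reused by the layers above it."""
    outputs, computed, weights = [], 0, None
    for values in values_per_layer:
        if weights is None or not share:
            weights = attention_weights(queries, keys)
            computed += 1  # count how often the expensive step runs
        outputs.append(apply_weights(weights, values))
    return outputs, computed


# Made-up inputs: two positions, two-dimensional vectors, three layers.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
layer_values = [[[1.0, 2.0], [3.0, 4.0]],
                [[0.5, 0.5], [1.0, 0.0]],
                [[2.0, 1.0], [0.0, 1.0]]]
shared_out, shared_count = run_layers(q, k, layer_values, share=True)
```

In this toy setting the shared and unshared runs give identical outputs because the queries and keys never change; the only difference is how many times the weights are computed.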
#1859

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu
Details | PDF

Machine Translation

The neural machine translation model has suffered from the lack of large-scale parallel corpora. In contrast, we humans can learn multi-lingual translations even without parallel texts by referring our languages to the external world. To mimic such human learning behavior, we employ images as pivots to enable zero-resource translation learning. However, a picture tells a thousand words, which makes multi-lingual sentences pivoted by the same image noisy as mutual translations and thus hinders the translation model learning. In this work, we propose a progressive learning approach for image-pivoted zero-resource machine translation. Since words are less diverse when grounded in the image, we first learn word-level translation with image pivots, and then progress to learn the sentence-level translation by utilizing the learned word translation to suppress noise in image-pivoted multi-lingual sentences. Experimental results on two widely used image-pivot translation datasets, IAPR-TC12 and Multi30k, show that the proposed approach significantly outperforms other state-of-the-art methods.
#3742

Polygon-Net: A General Framework for Jointly Boosting Multiple Unsupervised Neural Machine Translation Models
Chang Xu, Tao Qin, Gang Wang, Tie-Yan Liu
Details | PDF

Machine Translation

Neural machine translation (NMT) has achieved great success. However, collecting large-scale parallel data for training is costly and laborious. Recently, unsupervised neural machine translation has attracted more and more attention, because it requires only monolingual corpora, which are common and easy to obtain, and because of its great potential for low-resource or even zero-resource machine translation. In this work, we propose a general framework called Polygon-Net, which leverages multiple auxiliary languages for jointly boosting unsupervised neural machine translation models. Specifically, we design a novel loss function for multi-language unsupervised neural machine translation. In addition, unlike prior work that updates only one or two models individually, Polygon-Net is the first to enable multiple unsupervised models in the framework to update in turn and enhance each other. In this way, multiple unsupervised translation models are associated with each other for training to achieve better performance. Experiments on the benchmark datasets including UN Corpus and WMT show that our approach significantly improves over the two-language based methods, and achieves better performance with more languages introduced to the framework.
#4441

Correct-and-Memorize: Learning to Translate from Interactive Revisions
Rongxiang Weng, Hao Zhou, Shujian Huang, Lei Li, Yifan Xia, Jiajun Chen
Details | PDF

Machine Translation

State-of-the-art machine translation models are still not on a par with human translators. Previous work incorporates human interactions into the neural machine translation process to obtain improved results in target languages. However, not all model translation errors are equal: some are critical while others are minor. Meanwhile, the same translation mistakes occur repeatedly in similar contexts. To solve both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our proposed method works with critical revision instructions, and therefore allows humans to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on the context, alleviating the issue of repeating mistakes. Experiments in both ideal and real interactive translation settings demonstrate that our proposed CAMIT enhances machine translation results significantly while requiring fewer revision instructions from humans compared to previous methods.

Tuesday 13 15:00 - 16:00 CV|RDCIMRSI - Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1 (2501-2502)

Chair: Yan Shuicheng

#3159

Binarized Neural Networks for Resource-Efficient Hashing with Minimizing Quantization Loss
Feng Zheng, Cheng Deng, Heng Huang
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1

In order to reduce memory consumption and computational requirements, this paper proposes a novel framework for learning binary neural networks to achieve resource-efficient deep hashing. In contrast to floating-point (32-bit) full-precision networks, the proposed method achieves a 32x model compression rate. At the same time, the computational burden of convolution is greatly reduced due to efficient Boolean operations. To this end, in our framework, a new quantization loss defined between the binary weights and the learned real values is minimized to reduce the model distortion, while, by minimizing a binary entropy function, the discrete optimization is successfully avoided and the stochastic gradient descent method can be used smoothly. More importantly, we provide two theories to demonstrate the necessity and effectiveness of minimizing the quantization losses for both weights and activations. Numerous experiments show that the proposed method can achieve fast code generation without sacrificing accuracy.
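As a rough illustration of the two quantities mentioned in the abstract above, the sketch below binarizes a small weight vector, measures a squared-error quantization loss between the real weights and their scaled binary version, and computes the storage ratio between 32-bit and 1-bit weights. The scale `alpha` (mean absolute weight) and the function names are assumptions chosen for illustration; the abstract does not give the exact form of the loss used in the paper.

```python
def binarize(weights):
    """Return (signs, alpha): the sign of each weight and one shared scale.

    Assumption: alpha is the mean absolute weight, a common choice that is
    not taken from the abstract.
    """
    signs = [1 if w >= 0 else -1 for w in weights]
    alpha = sum(abs(w) for w in weights) / len(weights)
    return signs, alpha


def quantization_loss(weights, signs, alpha):
    """Squared error between the real weights and alpha * signs."""
    return sum((w - alpha * s) ** 2 for w, s in zip(weights, signs))


def compression_ratio(float_bits=32, binary_bits=1):
    """Storage ratio when each 32-bit weight is replaced by a single bit."""
    return float_bits / binary_bits


weights = [0.5, -1.5, 1.0, -1.0]
signs, alpha = binarize(weights)
loss = quantization_loss(weights, signs, alpha)
print(signs, alpha, loss, compression_ratio())
```

The ratio ignores the small overhead of storing the scale itself, which is negligible for realistic layer sizes.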
#3518

DSRN: A Deep Scale Relationship Network for Scene Text Detection
Yuxin Wang, Hongtao Xie, Zilong Fu, Yongdong Zhang
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1

Nowadays, scene text detection has become increasingly important and popular. However, the large variance of text scale remains the main challenge and limits the detection performance of most previous methods. To address this problem, we propose an end-to-end architecture called Deep Scale Relationship Network (DSRN) to map multi-scale convolution features onto a scale-invariant space to obtain uniform activation of multi-size text instances. First, we develop a Scale-transfer module to transfer the multi-scale feature maps to a unified dimension. Due to the heterogeneity of features, simply concatenating feature maps with multi-scale information would limit the detection performance. Thus we propose a Scale Relationship module to aggregate the multi-scale information through bi-directional convolution operations. Finally, to further reduce the number of missed instances, a novel Recall Loss is proposed to force the network to pay more attention to missed text instances by up-weighting poorly classified examples. Compared with previous approaches, DSRN efficiently handles the large-variance scale problem without complex hand-crafted hyperparameter settings (e.g. scale of default boxes) or complicated post-processing. On standard datasets including ICDAR2015 and MSRA-TD500, the proposed algorithm achieves state-of-the-art performance with impressive speed (8.8 FPS on ICDAR2015 and 13.3 FPS on MSRA-TD500).
#4861

Detecting Robust Co-Saliency with Recurrent Co-Attention Neural Network
Bo Li, Zhengxing Sun, Lv Tang, Yunhan Sun, Jinlong Shi
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1

Effective feature representations, which should not only express the images' individual properties but also reflect the interaction among group images, are crucial for robust co-saliency detection. This paper proposes a novel deep learning co-saliency detection approach which simultaneously learns single image properties and robust group features in a recurrent manner. Specifically, our network first extracts the semantic features of each image. Then, a specially designed Recurrent Co-Attention Unit (RCAU) explores all images in the group recurrently to generate the final group representation using the co-attention between images, while suppressing noisy information. The group feature, which contains complementary synergetic information, is later merged with the single image features, which express the unique properties, to infer robust co-saliency. We also propose a novel co-perceptual loss to make full use of the interactive relationships of all images in the training group as the supervision in our end-to-end training process. Extensive experimental results demonstrate the superiority of our approach in comparison with the state-of-the-art methods.
#3859

Deep Recurrent Quantization for Generating Sequential Binary Codes
Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1

Quantization has been an effective technology in ANN (approximate nearest neighbour) search due to its high accuracy and fast search speed. To meet the requirements of different applications, there is always a trade-off between retrieval accuracy and speed, reflected by variable code lengths. However, to encode the dataset into different code lengths, existing methods need to train several models, where each model can only produce a specific code length. This incurs a considerable training time cost, and largely reduces the flexibility of quantization methods to be deployed in real applications. To address this issue, we propose a Deep Recurrent Quantization (DRQ) architecture which can generate sequential binary codes. To this end, once the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations. A shared codebook and a scalar factor are designed to be the learnable weights in the deep recurrent quantization block, and the whole framework can be trained in an end-to-end manner. As far as we know, this is the first quantization method that can be trained once and generate sequential binary codes. Experimental results on the benchmark datasets show that our model achieves comparable or even better performance compared with the state-of-the-art for image retrieval, while requiring significantly fewer parameters and less training time. Our code is published online: https://github.com/cfm-uestc/DRQ.

Tuesday 13 15:00 - 16:00 ML|C - Classification 2 (2503-2504)

Chair: Lianhua Chi

#222

Learning Sound Events from Webly Labeled Data
Anurag Kumar, Ankit Shah, Alexander Hauptmann, Bhiksha Raj
Details | PDF

Classification 2

In the last couple of years, weakly labeled learning has turned out to be an exciting approach for audio event detection. In this work, we introduce webly labeled learning for sound events which aims to remove human supervision altogether from the learning process. We first develop a method of obtaining labeled audio data from the web (albeit noisy), in which no manual labeling is involved. We then describe methods to efficiently learn from these webly labeled audio recordings. In our proposed system, WeblyNet, two deep neural networks co-teach each other to robustly learn from webly labeled data, leading to around 17% relative improvement over the baseline method. The method also involves transfer learning to obtain efficient representations.
#2957

Persistence Bag-of-Words for Topological Data Analysis
Bartosz Zieliński, Michał Lipiński, Mateusz Juda, Matthias Zeppelzauer, Paweł Dłotko
Details | PDF

Classification 2

Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs exhibit, however, complex structure and are difficult to integrate in today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables the seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond in much less time than alternative approaches.
#4977

Improving the Robustness of Deep Neural Networks via Adversarial Training with Triplet Loss
Pengcheng Li, Jinfeng Yi, Bowen Zhou, Lijun Zhang
Details | PDF

Classification 2

Recent studies have highlighted that deep neural networks (DNNs) are vulnerable to adversarial examples. In this paper, we improve the robustness of DNNs by utilizing techniques of Distance Metric Learning. Specifically, we incorporate Triplet Loss, one of the most popular Distance Metric Learning methods, into the framework of adversarial training. Our proposed algorithm, Adversarial Training with Triplet Loss (AT2L), substitutes the adversarial example against the current model for the anchor of triplet loss to effectively smooth the classification boundary. Furthermore, we propose an ensemble version of AT2L, which aggregates different attack methods and model structures for better defense effects. Our empirical studies verify that the proposed approach can significantly improve the robustness of DNNs without sacrificing accuracy. Finally, we demonstrate that our specially designed triplet loss can also be used as a regularization term to enhance other defense methods.
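For readers unfamiliar with triplet loss, a minimal sketch follows. It uses the standard hinge form with Euclidean distance, and, as the abstract above describes, treats the adversarial example as the anchor. The margin value, the toy embeddings, and the function names are assumptions for illustration; the paper's specially designed variant may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss.

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical embeddings: the anchor stands in for an adversarial example.
adversarial_anchor = [0.0, 0.0]
same_class = [3.0, 4.0]    # distance 5 from the anchor
other_class = [0.0, 1.0]   # distance 1 from the anchor

violated = triplet_loss(adversarial_anchor, same_class, other_class)
satisfied = triplet_loss(adversarial_anchor, other_class, same_class)
print(violated, satisfied)
```

In the first call the adversarial anchor sits closer to the wrong class, so the loss is positive; swapping the roles satisfies the margin and the loss vanishes.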
#537

Graph and Autoencoder Based Feature Extraction for Zero-shot Learning
Yang Liu, Deyan Xie, Quanxue Gao, Jungong Han, Shujian Wang, Xinbo Gao
Details | PDF

Classification 2

Zero-shot learning (ZSL) aims to build models to recognize novel visual categories that have no associated labelled training samples. The basic framework is to transfer knowledge from seen classes to unseen classes by learning the visual-semantic embedding. However, most approaches do not preserve the underlying sub-manifold of samples in the embedding space. In addition, whether the mapping can precisely reconstruct the original visual feature has not been investigated in depth. In order to solve these problems, we formulate a novel framework named Graph and Autoencoder Based Feature Extraction (GAFE) to seek a low-rank mapping that preserves the sub-manifold of samples. Following the encoder-decoder paradigm, the encoder part learns a mapping from the visual feature to the semantic space, while the decoder part reconstructs the original features with the learned mapping. In addition, a graph is constructed to guarantee that the learned mapping preserves the local intrinsic structure of the data. To this end, an L21 norm sparsity constraint is imposed on the mapping to identify features relevant to the target domain. Extensive experiments on five attribute datasets demonstrate the effectiveness of the proposed model.
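The L21 norm mentioned above is commonly defined as the sum of the Euclidean norms of a matrix's rows, which encourages entire rows (features) to shrink to zero together. The sketch below computes it under that row-wise convention, which is an assumption here since some works sum over columns instead; the matrix values are made up for illustration.

```python
import math


def l21_norm(matrix):
    """Sum of the L2 norms of the rows of `matrix` (row-wise convention)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


# Hypothetical mapping matrix: the all-zero second row contributes nothing,
# which is how the penalty favours dropping whole features.
mapping = [
    [3.0, 4.0],
    [0.0, 0.0],
    [5.0, 12.0],
]
value = l21_norm(mapping)
print(value)
```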

Tuesday 13 15:00 - 16:00 ML|DM - Data Mining 2 (2505-2506)

Chair: Ming Li

#653

DeepCU: Integrating both Common and Unique Latent Information for Multimodal Sentiment Analysis
Sunny Verma, Chen Wang, Liming Zhu, Wei Liu
Details | PDF

Data Mining 2

Multimodal sentiment analysis combines information available from visual, textual, and acoustic representations for sentiment prediction. Recent multimodal fusion schemes combine multiple modalities as a tensor and obtain either the common information, by utilizing neural networks, or the unique information, by modeling a low-rank representation of the tensor. However, both kinds of information are essential, as they render inter-modal and intra-modal relationships of the data. In this research, we first propose a novel deep architecture to extract the common information from the multi-mode representations. Furthermore, we propose unique networks to obtain the modality-specific information that enhances the generalization performance of our multimodal system. Finally, we integrate these two aspects of information via a fusion layer and propose a novel multimodal data fusion architecture, which we call DeepCU (Deep network with both Common and Unique latent information). The proposed DeepCU consolidates the two networks for joint utilization and discovery of all-important latent information. Comprehensive experiments are conducted to demonstrate the effectiveness of utilizing both common and unique information discovered by DeepCU on multiple real-world datasets. The source code of the proposed DeepCU is available at https://github.com/sverma88/DeepCU-IJCAI19.
#3577

Commit Message Generation for Source Code Changes
Shengbin Xu, Yuan Yao, Feng Xu, Tianxiao Gu, Hanghang Tong, Jian Lu
Details | PDF

Data Mining 2

Commit messages, which summarize the source code changes in natural language, are essential for program comprehension and software evolution understanding. Unfortunately, due to the lack of direct motivation, commit messages are sometimes neglected by developers, making it necessary to automatically generate such messages. State-of-the-art work adopts learning-based approaches such as neural machine translation models for the commit message generation problem. However, these approaches tend to ignore the code structure information and suffer from the out-of-vocabulary issue. In this paper, we propose CoDiSum to address the above two limitations. In particular, we first extract both code structure and code semantics from the source code changes, and then jointly model these two sources of information so as to better learn the representations of the code changes. Moreover, we augment the model with a copying mechanism to further mitigate the out-of-vocabulary issue. Experimental evaluations on real data demonstrate that the proposed approach significantly outperforms the state-of-the-art in terms of accurately generating the commit messages.
#6385

Recommending Links to Maximize the Influence in Social Networks
Federico Corò, Gianlorenzo D'Angelo, Yllka Velaj
Details | PDF

Data Mining 2

Social link recommendation systems, like "People-you-may-know" on Facebook, "Who-to-follow" on Twitter, and "Suggested-Accounts" on Instagram, assist the users of a social network in establishing new connections with other users. While these systems are becoming more and more important in the growth of social media, they tend to increase the popularity of users that are already popular. Indeed, since link recommenders aim at predicting users' behavior, they accelerate the creation of links that are likely to be created in the future, and, as a consequence, they reinforce social biases by suggesting a few (popular) users, while giving few chances to the majority of users to build new connections and increase their popularity. In this paper we measure the popularity of a user by means of their social influence, that is, their capability to influence other users' opinions, and we propose a link recommendation algorithm that evaluates the links to suggest according to their increment in social influence instead of their likelihood of being created. In detail, we give a constant factor approximation algorithm for the problem of maximizing the social influence of a given set of target users by suggesting a fixed number of new connections. We experimentally show that, with few new links and small computational time, our algorithm is able to substantially increase the social influence of the target users. We compare our algorithm with several baselines and show that it is the most effective one in terms of increased influence.
#5247

Fairwalk: Towards Fair Graph Embedding
Tahleen Rahman, Bartlomiej Surma, Michael Backes, Yang Zhang
Details | PDF

Data Mining 2

Graph embeddings have gained huge popularity in recent years as a powerful tool to analyze social networks. However, no prior work has studied potential bias issues inherent within graph embedding. In this paper, we make a first attempt in this direction. In particular, we concentrate on the fairness of node2vec, a popular graph embedding method. Our analyses on two real-world datasets demonstrate the existence of bias in node2vec when used for friendship recommendation. We, therefore, propose a fairness-aware embedding method, namely Fairwalk, which extends node2vec. Experimental results demonstrate that Fairwalk reduces bias under multiple fairness metrics while still preserving the utility.
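The abstract above does not spell out how the random walk is made fairness-aware, so the following is only a sketch of one plausible transition rule: group the current node's neighbors by a sensitive attribute, pick a group uniformly at random, then pick a node uniformly within that group. The function name, the attribute labels, and the toy neighborhood are all assumptions for illustration and should not be read as the authors' exact procedure.

```python
import random


def fair_step(neighbors, group_of, rng):
    """Choose the next node of a walk with equal probability per group.

    `group_of` maps each node to a (hypothetical) sensitive-attribute value.
    """
    groups = {}
    for node in neighbors:
        groups.setdefault(group_of[node], []).append(node)
    chosen_group = rng.choice(sorted(groups))
    return rng.choice(groups[chosen_group])


# Toy neighborhood: three neighbors in group "x" and one in group "y".
neighbors = ["a", "b", "c", "d"]
group_of = {"a": "x", "b": "x", "c": "x", "d": "y"}

rng = random.Random(0)
draws = [fair_step(neighbors, group_of, rng) for _ in range(10000)]
minority_share = draws.count("d") / len(draws)
print(round(minority_share, 2))
```

Under a plain uniform walk the minority neighbor "d" would be visited about a quarter of the time; with the group-first rule its share moves to roughly one half.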

Tuesday 13 15:00 - 16:00 AMS|ML - Multi-agent Learning 1 (2401-2402)

Chair: Jianye Hao

#1063

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns
Yong Liu, Yujing Hu, Yang Gao, Yingfeng Chen, Changjie Fan
Details | PDF

Multi-agent Learning 1

Many real-world problems, such as robot control and soccer games, are naturally modeled as sparse-interaction multi-agent systems. Reusing single-agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process. Previous works rely on the bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, the bisimulation metric is costly to compute and is not suitable for high-dimensional state space problems. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining the MDP similarity based on the N-step return (NSR) values of an MDP. Then, we propose two knowledge transfer methods based on deep neural networks called direct value function transfer and NSR-based value function transfer. We conduct experiments in an image-based grid world, the multi-agent particle environment (MPE) and the Ms. Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning while also achieving better asymptotic performance.
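As background for the N-step return mentioned above, the sketch below computes the usual discounted sum of the next N rewards, optionally adding a discounted bootstrap value for the state reached afterwards. The function name, the bootstrap option, and the example numbers are assumptions for illustration; how the paper turns these values into an MDP similarity is not described in the abstract.

```python
def n_step_return(rewards, gamma, n, bootstrap_value=0.0):
    """Discounted sum of the first n rewards plus an optional bootstrap term.

    If fewer than n rewards are available, the sum stops at the last reward.
    """
    horizon = min(n, len(rewards))
    total = sum((gamma ** k) * rewards[k] for k in range(horizon))
    return total + (gamma ** horizon) * bootstrap_value


rewards = [1.0, 1.0, 1.0]
plain = n_step_return(rewards, gamma=0.5, n=2)
bootstrapped = n_step_return(rewards, gamma=0.5, n=2, bootstrap_value=4.0)
print(plain, bootstrapped)
```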
#2168

Decentralized Optimization with Edge Sampling
Chi Zhang, Qianxiao Li, Peilin Zhao
Details | PDF

Multi-agent Learning 1

In this paper, we propose a decentralized distributed algorithm with stochastic communication among nodes, building on a sampling method called "edge sampling". Such a sampling algorithm allows us to avoid the heavy peer-to-peer communication cost when combining neighboring weights on dense networks while still maintaining a comparable convergence rate. In particular, we quantitatively analyze its theoretical convergence properties, as well as the optimal sampling rate over the underlying network. When compared with previous methods, our solution is shown to be unbiased and communication-efficient, and to suffer from lower sampling variance. These theoretical findings are validated by numerical experiments on both the mixing rates of Markov chains and distributed machine learning problems.
#2581

Exploring the Task Cooperation in Multi-goal Visual Navigation
Yuechen Wu, Zhenhuan Rao, Wei Zhang, Shijian Lu, Weizhi Lu, Zheng-Jun Zha
Details | PDF

Multi-agent Learning 1

Learning to adapt to a series of different goals in visual navigation is challenging. In this work, we present a model-embedded actor-critic architecture for the multi-goal visual navigation task. To enhance the task cooperation in multi-goal learning, we introduce two new designs to the reinforcement learning scheme: inverse dynamics model (InvDM) and multi-goal co-learning (MgCl). Specifically, InvDM is proposed to capture the navigation-relevant association between state and goal, and to provide additional training signals to relieve the sparse reward issue. MgCl aims at improving the sample efficiency and helps the agent learn from unintentional positive experiences. Extensive results on the interactive platform AI2-THOR demonstrate that the proposed method converges faster than state-of-the-art methods while producing more direct routes to the goal. The video demonstration is available at: https://youtube.com/channel/UCtpTMOsctt3yPzXqe_JMD3w/videos.
#2679

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent
Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls
Details | PDF

Multi-agent Learning 1

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this optimization, the joint policies converge to a Nash equilibrium. Unlike fictitious play (XFP) and counterfactual regret minimization (CFR), our convergence result pertains to the policies being optimized rather than the average policies. Our experiments demonstrate convergence rates comparable to XFP and CFR in four benchmark games in the tabular case. Using function approximation, we find that our algorithm outperforms the tabular version in two of the games, which, to the best of our knowledge, is the first such result in imperfect information games among this class of algorithms.
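To make the notion of exploitability concrete, the sketch below evaluates it in a two-player zero-sum matrix game, which is a much simpler setting than the extensive-form games the paper addresses. It sums each player's gain from switching to a best response, so the value is zero exactly at a Nash equilibrium; some definitions average the two terms instead. The function name and the rock-paper-scissors payoffs are assumptions for illustration, and this is not the exploitability descent algorithm itself.

```python
def exploitability(payoff, row_strategy, col_strategy):
    """Total best-response gain of both players in a zero-sum matrix game.

    `payoff[i][j]` is the row player's payoff; the column player receives
    its negation.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Best payoff the row player can reach against the column strategy.
    best_row_value = max(
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    )
    # Payoff the row strategy is held to by a best-responding column player.
    worst_row_value = min(
        sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )
    return best_row_value - worst_row_value


# Rock-paper-scissors payoffs for the row player.
rps = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

at_equilibrium = exploitability(rps, uniform, uniform)
both_rock = exploitability(rps, always_rock, always_rock)
print(at_equilibrium, both_rock)
```

The uniform strategy cannot be exploited, while two players who always choose rock can each gain one unit by deviating, giving a total of two.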

Tuesday 13 15:00 - 16:00 NLP|NLPAT - NLP Applications and Tools (2403-2404)

Chair: Vincent Ng

#3015

Learning Assistance from an Adversarial Critic for Multi-Outputs Prediction
Yue Deng, Yilin Shen, Hongxia Jin
Details | PDF

NLP Applications and Tools

We introduce an adversarial-critic-and-assistant (ACA) learning framework to improve the performance of existing supervised learning with multiple outputs. The core contribution of our ACA is the introduction of two novel modules, i.e. an 'adversarial critic' and a 'collaborative assistant', that are jointly designed to provide augmenting information for facilitating general learning tasks. Our approach is not intended to be regarded as an emerging competitor to the many well-established algorithms in the field. In fact, most existing approaches, while implemented with different learning objectives, can all be adopted as building blocks seamlessly integrated in the ACA framework to accomplish various real-world tasks. We show the performance and generalization ability of ACA on diverse learning tasks including multi-label classification, attribute prediction and sequence-to-sequence generation.
#3373

Answering Binary Causal Questions Through Large-Scale Text Mining: An Evaluation Using Cause-Effect Pairs from Human Experts
Oktie Hassanzadeh, Debarun Bhattacharjya, Mark Feblowitz, Kavitha Srinivas, Michael Perrone, Shirin Sohrabi, Michael Katz
Details | PDF

NLP Applications and Tools

In this paper, we study the problem of answering questions of type "Could X cause Y?" where X and Y are general phrases without any constraints. Answering such questions will assist with various decision analysis tasks such as verifying and extending presumed causal associations used for decision making. Our goal is to analyze the ability of an AI agent built using state-of-the-art unsupervised methods in answering causal questions derived from collections of cause-effect pairs from human experts. We focus only on unsupervised and weakly supervised methods due to the difficulty of creating a large enough training set with a reasonable quality and coverage. The methods we examine rely on a large corpus of text derived from news articles, and include methods ranging from large-scale application of classic NLP techniques and statistical analysis to the use of neural network based phrase embeddings and state-of-the-art neural language models.
#6048

Aligning Learning Outcomes to Learning Resources: A Lexico-Semantic Spatial Approach
Swarnadeep Saha, Malolan Chetlur, Tejas Indulal Dhamecha, W M Gayathri K Wijayarathna, Red Mendoza, Paul Gagnon, Nabil Zary, Shantanu Godbole
Details | PDF

NLP Applications and Tools

Aligning Learning Outcomes (LO) to relevant portions of Learning Resources (LR) is necessary to help students quickly navigate within the recommended learning material. In general, the problem can be viewed as finding the sections of a document (LR) that are pertinent to a broad question (LO). In this paper, we introduce the novel problem of aligning LOs (an LO is usually a sentence-long text) to relevant pages of LRs (LRs are in the form of slide decks). We observe that the set of relevant pages can be composed of multiple chunks (a chunk is a contiguous set of pages) and the same page of an LR might be relevant to multiple LOs. To this end, we develop a novel Lexico-Semantic Spatial approach that captures the lexical, semantic, and spatial aspects of the task, and also alleviates the limited availability of training data. Our approach first identifies the relevancy of a page to an LO by using lexical and semantic features from each page independently. The spatial model at a later stage exploits the dependencies between the sequence of pages in the LR to further improve the alignment task. We empirically establish the importance of the lexical, semantic, and spatial models within the proposed approach. We show that, on average, a student can navigate to a relevant page from the first predicted page in about four clicks within a 38-page slide deck, as compared to two clicks by human experts.
#5271

Modeling Noisy Hierarchical Types in Fine-Grained Entity Typing: A Content-Based Weighting Approach
Junshuang Wu, Richong Zhang, Yongyi Mao, Hongyu Guo, Jinpeng Huai
Details | PDF

NLP Applications and Tools

Fine-grained entity typing (FET), which annotates the entities in a sentence with a set of finely specified type labels, often serves as the first and critical step towards many natural language processing tasks. Although great progress has been made, current FET methods have difficulty coping with the noisy labels which naturally come with the data acquisition processes. Existing FET approaches either pre-process to clean the noise or simply focus on one of the noisy labels, sidestepping the fact that the noisy labels are related and content dependent. In this paper, we directly model the structured, noisy labels with a novel content-sensitive weighting schema. Coupled with a newly devised cost function and a hierarchical type embedding strategy, our method leverages a random walk process to effectively weight out noisy labels during training. Experiments on several benchmark datasets validate the effectiveness of the proposed framework and establish it as a new state-of-the-art strategy for the noisy entity typing problem.

Tuesday 13 15:00 - 16:00 MLA|BM - Bio;Medicine (2405-2406)

Chair: Guoxian Yu

#1408

FSM: A Fast Similarity Measurement for Gene Regulatory Networks via Genes' Influence Power
Zhongzhou Liu, Wenbin Hu
Details | PDF

Bio;Medicine

The problem of graph similarity measurement is fundamental in both complex networks and bioinformatics research. Gene regulatory networks (GRNs) describe the interactions between the molecules in organisms, and are widely studied in the field of medical AI. By measuring the similarity between GRNs, significant information can be obtained to assist applications like gene function prediction, drug development and medical diagnosis. Most of the existing similarity measurements focus on graph isomorphisms and are usually NP-hard problems. Thus, they are not suitable for applications in biology and clinical research due to the complexity and large scale of real-world GRNs. In this paper, a fast similarity measurement method called FSM for GRNs is proposed. Unlike the conventional measurements, it pays more attention to the differences between influential genes. For convenience and reliability, a new index defined as influence power is adopted to describe the influential genes which hold a more important position in a GRN. FSM was applied to nine datasets of various scales and compared with state-of-the-art methods. The results demonstrated that it ran significantly faster than other methods without sacrificing measurement performance.
#2106

MLRDA: A Multi-Task Semi-Supervised Learning Framework for Drug-Drug Interaction Prediction
Xu Chu, Yang Lin, Yasha Wang, Leye Wang, Jiangtao Wang, Jingyue Gao
Details | PDF

Bio;Medicine

Drug-drug interactions (DDIs) are a major cause of preventable hospitalizations and deaths. Recently, researchers in the AI community have tried to improve DDI prediction in two directions: incorporating multiple drug features to better model the pharmacodynamics, and adopting multi-task learning to exploit associations among DDI types. However, these two directions are challenging to reconcile due to the sparse nature of the DDI labels, which inflates the risk of overfitting of multi-task learning models when incorporating multiple drug features. In this paper, we propose a multi-task semi-supervised learning framework MLRDA for DDI prediction. MLRDA effectively exploits information that is beneficial for DDI prediction in unlabeled drug data by leveraging a novel unsupervised disentangling loss CuXCov. The CuXCov loss cooperates with the classification loss to disentangle the DDI prediction relevant part from the irrelevant part in a representation learnt by an autoencoder, which helps to ease the difficulty in mining useful information for DDI prediction in both labeled and unlabeled drug data. Moreover, MLRDA adopts a multi-task learning framework to exploit associations among DDI types. Experimental results on real-world datasets demonstrate that MLRDA significantly outperforms state-of-the-art DDI prediction methods by up to 10.3% in AUPR.
#3567

Medical Concept Embedding with Multiple Ontological Representations
Lihong Song, Chin Wang Cheong, Kejing Yin, William K. Cheung, Benjamin C. M. Fung, Jonathan Poon
Details | PDF

Bio;Medicine

Learning representations of medical concepts from the Electronic Health Records (EHR) has been shown effective for predictive analytics in healthcare. Incorporation of medical ontologies has also been explored to further enhance the accuracy and to ensure better alignment with the known medical knowledge. Most of the existing work assumes that medical concepts under the same ontological category should share similar representations, which however does not always hold. In particular, the categorizations in medical ontologies were established with various factors being considered. Medical concepts even under the same ontological category may not follow similar occurrence patterns in the EHR data, leading to contradicting objectives for the representation learning. In this paper, we propose a deep learning model called MMORE which alleviates this conflicting objective issue by allowing multiple representations to be inferred for each ontological category via an attention mechanism. We apply MMORE to diagnosis prediction and our experimental results show that the representations obtained by MMORE can achieve better predictive accuracy and result in clinically meaningful sub-categorization of the existing ontological categories.
#568

Two-Stage Generative Models of Simulating Training Data at The Voxel Level for Large-Scale Microscopy Bioimage Segmentation
Deli Wang, Ting Zhao, Nenggan Zheng, Zhefeng Gong
Details | PDF

Bio;Medicine

Bioimage Informatics is a growing area that aims to automatically extract biological knowledge from microscope images of biomedical samples. Its mission is vastly challenging, however, due to the complexity of diverse imaging modalities and the large scale of multi-dimensional images. One major challenge is automatic image segmentation, an essential step towards high-level modeling and analysis. While progress in deep learning has brought the goal of automation much closer to reality, creating training data for producing powerful neural networks is often laborious. To provide a shortcut for this costly step, we propose a novel two-stage generative model for simulating voxel level training data based on a specially designed objective function of preserving foreground labels. Using neuron segmentation from LM (Light Microscopy) image stacks as a test example, we showed that segmentation networks trained on our synthetic data were able to produce satisfactory results. Unlike other simulation methods available in the field, our method can be easily extended to many other applications because it does not involve sophisticated cell models and imaging mechanisms.

Tuesday 13 15:00 - 17:00 Competition (2305)

Macao AI Challenge for High School Students

Competition

Tuesday 13 16:30 - 17:30 ST: Human AI & ML 1 - Special Track on Human AI and Machine Learning 2 (J)

Chair: Chao Yu

#4017

How Well Do Machines Perform on IQ tests: a Comparison Study on a Large-Scale Dataset
Yusen Liu, Fangyuan He, Haodi Zhang, Guozheng Rao, Zhiyong Feng, Yi Zhou
Details | PDF

Special Track on Human AI and Machine Learning 2

AI benchmarking is becoming an increasingly important task. As suggested by many researchers, Intelligence Quotient (IQ) tests, which are widely regarded as one of the predominant benchmarks for measuring human intelligence, raise an interesting challenge for AI systems. To solve IQ tests automatically, machines need to use, combine and advance many areas in AI, including knowledge representation and reasoning, machine learning, natural language processing and image understanding. Also, automated IQ tests provide an ideal testbed for integrating symbolic and sub-symbolic approaches, as both are found useful here. Hence, we argue that IQ tests, although not suitable for testing machine intelligence, provide an excellent benchmark for the current development of AI research. Nevertheless, most existing IQ test datasets are not comprehensive enough for this purpose. As a result, the conclusions obtained are not representative. To address this issue, we create IQ10k, a large-scale dataset that contains more than 10,000 IQ test questions. We also conduct a comparison study on IQ10k with a number of state-of-the-art approaches.
#4970

Learning Relational Representations with Auto-encoding Logic Programs
Sebastijan Dumancic, Tias Guns, Wannes Meert, Hendrik Blockeel
Details | PDF

Special Track on Human AI and Machine Learning 2

Deep learning methods capable of handling relational data have proliferated over the past years. In contrast to traditional relational learning methods that leverage first-order logic for representing such data, these methods aim at re-representing symbolic relational data in Euclidean space. They offer better scalability, but can only approximate rich relational structures and are less flexible in terms of reasoning. This paper introduces a novel framework for relational representation learning that combines the best of both worlds. This framework, inspired by the auto-encoding principle, uses first-order logic as a data representation language, and the mapping between the original and latent representation is done by means of logic programs instead of neural networks. We show how learning can be cast as a constraint optimisation problem for which existing solvers can be used. The use of logic as a representation language makes the proposed framework more accurate (as the representation is exact, rather than approximate), more flexible, and more interpretable than deep learning methods. We experimentally show that these latent representations are indeed beneficial in relational learning tasks.
#5902

Learning Hierarchical Symbolic Representations to Support Interactive Task Learning and Knowledge Transfer
James R. Kirk, John E. Laird
Details | PDF

Special Track on Human AI and Machine Learning 2

Interactive Task Learning (ITL) focuses on learning the definition of tasks through online natural language instruction in real time. Learning the correct grounded meaning of the instructions is difficult due to ambiguous words, lack of common ground, and the presence of distractors in the environment and the agent’s knowledge. We present a learning strategy embodied in an ITL agent that interactively learns in one shot the meaning of task concepts for 40 games and puzzles in ambiguous scenarios. Our approach learns hierarchical symbolic representations of task knowledge rather than learning a mapping directly from perceptual representations. These representations enable the agent to transfer and compose knowledge, analyze and debug multiple interpretations, and communicate efficiently with the teacher to resolve ambiguity. We evaluate the efficiency of the learning by examining the number of words required to teach tasks across cases of no transfer, positive transfer, and interference from prior tasks. Our results show that the agent can correctly generalize, disambiguate, and transfer concepts within variations in language descriptions and world representations of the same task, and across variations in different tasks.
#6050

LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning
Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Sheila A. McIlraith
Details | PDF

Special Track on Human AI and Machine Learning 2

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
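As a rough illustration of the reward machine idea described above, the following minimal sketch models a reward function as a finite-state machine whose transitions fire on abstract events and emit rewards. The class name `RewardMachine`, the state labels `u0` to `u2`, and the "coffee, then office" task are hypothetical choices made for this example, not the authors' implementation; the translation from LTL and the tailored q-learning algorithms are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on events and emit rewards."""
    initial: str
    terminal: set
    # (state, event) -> (next_state, reward); all names here are hypothetical.
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        # Unlisted (state, event) pairs leave the state unchanged with zero reward.
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        # Feed a sequence of events through the machine and sum the rewards.
        state, total = self.initial, 0.0
        for event in events:
            if state in self.terminal:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Toy task (an assumption for illustration): "get coffee, then reach the office".
rm = RewardMachine(
    initial="u0",
    terminal={"u2"},
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u2", 1.0),
    },
)

final_state, total_reward = rm.run(["office", "coffee", "office"])
```

Because the machine state records how far the task has progressed, a learner could condition its value estimates on that state, which is one way the exposed structure might be exploited.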

Tuesday 13 16:30 - 18:00 ML|DL - Deep Learning 3 (L)

Chair: Vineeth N Balasubramanian

#61

Heterogeneous Graph Matching Networks for Unknown Malware Detection
Shen Wang, Zhengzhang Chen, Xiao Yu, Ding Li, Jingchao Ni, Lu-An Tang, Jiaping Gui, Zhichun Li, Haifeng Chen, Philip S. Yu
Details | PDF

Deep Learning 3

Information systems have widely been the target of malware attacks. Traditional signature-based malicious program detection algorithms can only detect known malware and are prone to evasion techniques such as binary obfuscation, while behavior-based approaches rely heavily on the malware training samples and incur prohibitively high training cost. To address the limitations of existing techniques, we propose MatchGNet, a heterogeneous Graph Matching Network model to learn the graph representation and similarity metric simultaneously based on the invariant graph modeling of the program's execution behaviors. We conduct a systematic evaluation of our model and show that it is accurate in detecting malicious program behavior and can help detect malware attacks with fewer false positives. MatchGNet outperforms the state-of-the-art algorithms in malware detection by generating 50% fewer false positives while keeping zero false negatives.
#3225

On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Minimization
Yi Xu, Zhuoning Yuan, Sen Yang, Rong Jin, Tianbao Yang
Details | PDF

Deep Learning 3

Extrapolation is a well-known technique for solving convex optimization and variational inequalities and has recently attracted some attention for non-convex optimization. Several recent works have empirically shown its success in some machine learning tasks. However, it has not been analyzed for non-convex minimization and there still remains a gap between the theory and the practice. In this paper, we analyze gradient descent and stochastic gradient descent with extrapolation for finding an approximate first-order stationary point in smooth non-convex optimization problems. Our convergence upper bounds show that the algorithms with extrapolation can converge faster than those without extrapolation.
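To make the notion of extrapolation concrete, here is a minimal sketch of one common scheme: the gradient is evaluated at a point extrapolated along the previous displacement. The abstract does not give the exact update analyzed in the paper, so the update rule, the function name `gd_with_extrapolation`, the default parameters, and the one-dimensional test function are all assumptions for illustration.

```python
def gd_with_extrapolation(grad, x0, step=0.1, beta=0.5, iters=200):
    """Gradient descent where the gradient is taken at an extrapolated point.

    Assumed update (one common form, not necessarily the paper's):
        y_t     = x_t + beta * (x_t - x_{t-1})
        x_{t+1} = y_t - step * grad(y_t)
    """
    x_prev, x = x0, x0
    for _ in range(iters):
        y = x + beta * (x - x_prev)          # extrapolation step
        x_prev, x = x, y - step * grad(y)    # gradient step from y
    return x


def grad_f(x):
    # Gradient of the smooth non-convex function f(x) = x**4 / 4 - x**2 / 2,
    # whose stationary points are -1, 0 and 1.
    return x**3 - x


x_star = gd_with_extrapolation(grad_f, x0=2.0)
```

Setting `beta=0` recovers plain gradient descent, which makes it easy to compare the two on the same problem.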
#4207

Learning Instance-wise Sparsity for Accelerating Deep Models
Chuanjian Liu, Yunhe Wang, Kai Han, Chunjing Xu, Chang Xu
Details | PDF

Deep Learning 3

Exploring deep convolutional neural networks of high efficiency and low memory usage is essential for a wide variety of machine learning tasks. Most existing approaches accelerate deep models by manipulating parameters or filters without reference to data, e.g., pruning and decomposition. In contrast, we study this problem from a different perspective by respecting the differences between data instances. An instance-wise feature pruning method is developed by identifying informative features for different instances. Specifically, by investigating a feature decay regularization, we expect intermediate feature maps of each instance in deep neural networks to be sparse while preserving the overall network performance. During online inference, subtle features of input images extracted by intermediate layers of a well-trained neural network can be eliminated to accelerate the subsequent calculations. We further take the coefficient of variation as a measure to select the layers that are appropriate for acceleration. Extensive experiments conducted on benchmark datasets and networks demonstrate the effectiveness of the proposed method.
#4385

Quaternion Collaborative Filtering for Recommendation
Shuai Zhang, Lina Yao, Lucas Vinh Tran, Aston Zhang, Yi Tay
Details | PDF

Deep Learning 3

This paper proposes Quaternion Collaborative Filtering (QCF), a novel representation learning method for recommendation. Our proposed QCF relies on and exploits computation with Quaternion algebra, benefiting from the expressiveness and rich representation learning capability of Hamilton products. Quaternion representations, based on hypercomplex numbers, enable rich inter-latent dependencies between imaginary components. This encourages intricate relations to be captured when learning user-item interactions, serving as a strong inductive bias as compared with the real-space inner product. All in all, we conduct extensive experiments on six real-world datasets, demonstrating the effectiveness of Quaternion algebra in recommender systems. The results exhibit that QCF outperforms a wide spectrum of strong neural baselines on all datasets. Ablative experiments confirm the effectiveness of Hamilton-based composition over multi-embedding composition in real space.
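For readers unfamiliar with the Hamilton product mentioned above, the sketch below implements it for quaternions stored as `(a, b, c, d)` tuples, meaning a + b·i + c·j + d·k. It shows only the algebraic operation; how QCF builds user and item embeddings or scores interactions is not stated in the abstract and is not modeled here. The function name is a hypothetical choice.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


# Basis elements, used to check the defining identities.
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

ij = hamilton_product(i, j)  # equals k
ji = hamilton_product(j, i)  # equals -k, so the product is not commutative
```

Each output component mixes all four components of both inputs, which is the kind of dependency between imaginary components the abstract refers to.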
#10977

(Sister Conferences Best Papers Track) A Walkthrough for the Principle of Logit Separation
Gil Keren, Sivan Sabato, Björn Schuller
Details | PDF

Deep Learning 3

We consider neural network training, in applications in which there are many possible classes, but at test-time, the task is a binary classification task of determining whether the given example belongs to a specific class. We define the Single Logit Classification (SLC) task: training the network so that at test-time, it would be possible to accurately identify whether the example belongs to a given class in a computationally efficient manner, based only on the output logit for this class. We propose a natural principle, the Principle of Logit Separation, as a guideline for choosing and designing loss functions that are suitable for SLC. We show that the Principle of Logit Separation is a crucial ingredient for success in the SLC task, and that SLC results in considerable speedups when the number of classes is large.
#613

Dense Transformer Networks for Brain Electron Microscopy Image Segmentation
Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji
Details | PDF

Deep Learning 3

The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods.

Tuesday 13 16:30 - 18:00 ML|RL - Reinforcement Learning 1 (2701-2702)

Chair: Regis Sabbadin

#132

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning
Wenjie Shi, Shiji Song, Cheng Wu
Details | PDF

Reinforcement Learning 1

Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality such as 3D humanoid locomotion. Besides, the optimality of the desired Boltzmann policy set for a non-optimal soft value function is not persuasive enough. In this paper, we first derive a soft policy gradient based on an entropy regularized expected reward objective for RL with continuous actions. Then, we present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG) by combining the soft policy gradient with the soft Bellman equation. To ensure stable learning while eliminating the need for two separate critics for soft value functions, we leverage a double sampling approach to make the soft Bellman equation tractable. The experimental results demonstrate that our method outperforms prior off-policy methods.
#628

Incremental Learning of Planning Actions in Model-Based Reinforcement Learning
Jun Hao Alvin Ng, Ronald P. A. Petrick
Details | PDF

Reinforcement Learning 1

The soundness and optimality of a plan depend on the correctness of the domain model. Specifying complete domain models can be difficult when interactions between an agent and its environment are complex. We propose a model-based reinforcement learning (MBRL) approach to solve planning problems with unknown models. The model is learned incrementally over episodes using only experiences from the current episode, which suits non-stationary environments. We introduce the novel concept of reliability as an intrinsic motivation for MBRL, and a method to learn from failure to prevent repeated instances of similar failures. Our motivation is to improve the learning efficiency and goal-directedness of MBRL. We evaluate our work with experimental results for three planning domains.
#2825

Autoregressive Policies for Continuous Control Deep Reinforcement Learning
Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra
Details | PDF

Reinforcement Learning 1

Reinforcement learning algorithms rely on exploration to discover new behaviors, which is typically achieved by following a stochastic policy. In continuous control tasks, policies with a Gaussian distribution have been widely adopted. Gaussian exploration however does not result in smooth trajectories that generally correspond to safe and rewarding behaviors in practical tasks. In addition, Gaussian policies do not result in an effective exploration of an environment and become increasingly inefficient as the action rate increases. This contributes to a low sample efficiency often observed in learning continuous control tasks. We introduce a family of stationary autoregressive (AR) stochastic processes to facilitate exploration in continuous control domains. We show that proposed processes possess two desirable features: subsequent process observations are temporally coherent with continuously adjustable degree of coherence, and the process stationary distribution is standard normal. We derive an autoregressive policy (ARP) that implements such processes maintaining the standard agent-environment interface. We show how ARPs can be easily used with the existing off-the-shelf learning algorithms. Empirically we demonstrate that using ARPs results in improved exploration and sample efficiency in both simulated and real world domains, and, furthermore, provides smooth exploration trajectories that enable safe operation of robotic hardware.
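The two properties named above (temporal coherence with an adjustable degree, and a standard normal stationary distribution) can be illustrated with the simplest first-order autoregressive process. This is only a sketch: the paper introduces a whole family of processes, and restricting to order one, the parameter name `alpha`, and the function name `ar1_noise` are assumptions made here.

```python
import math
import random


def ar1_noise(alpha, steps, seed=0):
    """Sample x_t = alpha * x_{t-1} + sqrt(1 - alpha**2) * eps_t, eps_t ~ N(0, 1).

    For 0 <= alpha < 1 the stationary distribution is N(0, 1): if x_{t-1} has
    unit variance, then Var(x_t) = alpha**2 + (1 - alpha**2) = 1.
    alpha = 0 gives independent Gaussian noise; alpha near 1 gives smooth paths.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
    samples = [x]
    for _ in range(steps - 1):
        x = alpha * x + math.sqrt(1.0 - alpha**2) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


xs = ar1_noise(alpha=0.9, steps=50000, seed=1)
mean = sum(xs) / len(xs)
variance = sum((x - mean) ** 2 for x in xs) / len(xs)
```

In a policy, such a sequence could replace the independent Gaussian draws that perturb each action, so that consecutive exploration steps stay correlated.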
#4369

Sharing Experience in Multitask Reinforcement Learning
Tung-Long Vuong, Do-Van Nguyen, Tai-Long Nguyen, Cong-Minh Bui, Hai-Dang Kieu, Viet-Cuong Ta, Quoc-Long Tran, Thanh-Ha Le
Details | PDF

Reinforcement Learning 1

In multitask reinforcement learning, tasks often have sub-tasks that share the same solution, even though the overall tasks are different. If the shared portions could be effectively identified, then the learning process could be improved since all the samples between tasks in the shared space could be used. In this paper, we propose a Sharing Experience Framework (SEF) for simultaneous training of multiple tasks. In SEF, a confidence sharing agent uses task-specific rewards from the environment to identify similar parts that should be shared across tasks and defines those parts as shared-regions between tasks. The shared-regions are expected to guide task-policies in sharing their experience during the learning process. The experiments highlight that our framework improves the performance and the stability of learning task-policies, and may help task-policies avoid local optima.
#4757

Adversarial Imitation Learning from Incomplete Demonstrations
Mingfei Sun, Xiaojuan Ma
Details | PDF

Reinforcement Learning 1

Imitation learning targets deriving a mapping from states to actions, a.k.a. policy, from expert demonstrations. Existing methods for imitation learning typically require any actions in the demonstrations to be fully available, which is hard to ensure in real applications. Though algorithms for learning with unobservable actions have been proposed, they focus solely on state information and overlook the fact that the action sequence could still be partially available and provide useful information for policy deriving. In this paper, we propose a novel algorithm called Action-Guided Adversarial Imitation Learning (AGAIL) that learns a policy from demonstrations with incomplete action sequences, i.e., incomplete demonstrations. The core idea of AGAIL is to separate demonstrations into state and action trajectories, and train a policy with state trajectories while using actions as auxiliary information to guide the training whenever applicable. Built upon the Generative Adversarial Imitation Learning, AGAIL has three components: a generator, a discriminator, and a guide. The generator learns a policy with rewards provided by the discriminator, which tries to distinguish state distributions between demonstrations and samples generated by the policy. The guide provides additional rewards to the generator when demonstrated actions for specific states are available. We compare AGAIL to other methods on benchmark tasks and show that AGAIL consistently delivers comparable performance to the state-of-the-art methods even when the action sequence in demonstrations is only partially available.
#4947

A Restart-based Rank-1 Evolution Strategy for Reinforcement Learning
Zefeng Chen, Yuren Zhou, Xiao-yu He, Siyu Jiang
Details | PDF

Reinforcement Learning 1

Evolution strategies have been demonstrated to have a strong ability to roughly train deep neural networks and to accomplish reinforcement learning tasks well. However, existing evolution strategies designed specially for deep reinforcement learning only involve plain variants, which cannot realize the adaptation of mutation strength or other advanced techniques. Applying advanced and effective evolution strategies to reinforcement learning in an efficient way remains an open gap. To this end, this paper proposes a restart-based rank-1 evolution strategy for reinforcement learning. When training the neural network, it adapts the mutation strength and updates the principal search direction in a way similar to the momentum method, which is an ameliorated version of stochastic gradient ascent. Besides, two mechanisms, i.e., the adaptation of the number of elitists and the restart procedure, are integrated to deal with the issue of local optima. Experimental results on classic control problems and Atari games show that the proposed algorithm is superior to or competitive with state-of-the-art algorithms for reinforcement learning, demonstrating the effectiveness of the proposed algorithm.

Tuesday 13 16:30 - 18:00 AMS|CSC - Computational Social Choice 1 (2703-2704)

Chair: Tamir Tassa

#748

Flexible Representative Democracy: An Introduction with Binary Issues
Ben Abramowitz, Nicholas Mattei
Details | PDF

Computational Social Choice 1

We introduce Flexible Representative Democracy (FRD), a novel hybrid of Representative Democracy (RD) and Direct Democracy (DD), in which voters can alter the issue-dependent weights of a set of elected representatives. In line with the literature on Interactive Democracy, our model allows the voters to actively determine the degree to which the system is direct versus representative. However, unlike Liquid Democracy, FRD uses strictly non-transitive delegations, making delegation cycles impossible, preserving privacy and anonymity, and maintaining a fixed set of accountable elected representatives. We present FRD and analyze it using a computational approach with issues that are independent, binary, and symmetric; we compare the outcomes of various democratic systems using Direct Democracy with majority voting and full participation as an ideal baseline. We find through theoretical and empirical analysis that FRD can yield significant improvements over RD for emulating DD with full participation.
#1527

On Strategyproof Conference Peer Review
Yichong Xu, Han Zhao, Xiaofei Shi, Nihar B. Shah
Details | PDF

Computational Social Choice 1

We consider peer review under a conference setting where there are conflicts between the reviewers and the submissions. Under such conflicts, reviewers can manipulate their reviews in a strategic manner to influence the final rankings of their own papers. Present-day peer-review systems are not designed to guard against such strategic behavior, beyond minimal (and insufficient) checks such as not assigning a paper to a conflicted reviewer. In this work, we address this problem through the lens of social choice, and present a theoretical framework for strategyproof and efficient peer review. Given the conflict graph which satisfies a simple property, we first present and analyze a flexible framework for reviewer-assignment and aggregation for the reviews that guarantees not only strategyproofness but also a natural efficiency property (unanimity). Our framework is based on the so-called partitioning method, and can be treated as a generalization of this type of method to conference peer review settings. We then empirically show that the requisite property on the (authorship) conflict graph is indeed satisfied in the ICLR-17 submissions data, and further demonstrate a simple trick to make the partitioning method more practically appealing under conference peer-review settings. Finally, we complement our positive results with negative theoretical results where we prove that under slightly stronger requirements, it is impossible for any algorithm to be both strategyproof and efficient.
#5143

A Contribution to the Critique of Liquid Democracy
Ioannis Caragiannis, Evi Micha
Details | PDF

Computational Social Choice 1

Liquid democracy, which combines features of direct and representative democracy, has been proposed as a modern practice for collective decision making. Its advocates argue that allowing voters to delegate their vote to more informed voters can result in better decisions. In an attempt to evaluate the validity of such claims, we study liquid democracy as a means to discover an underlying ground truth. We revisit a recent model by Kahng et al. [2018] and conclude with three negative results, criticizing an important assumption of their modeling, as well as liquid democracy more generally. In particular, we first identify cases where natural local mechanisms are much worse than either direct voting or the other extreme of full delegation to a common dictator. We then show that delegating to less informed voters may considerably increase the chance of discovering the ground truth. Finally, we show that deciding delegations that maximize the probability to find the ground truth is a computationally hard problem.
#5250

Protecting Elections by Recounting Ballots
Edith Elkind, Jiarui Gan, Svetlana Obraztsova, Zinovi Rabinovich, Alexandros A. Voudouris
Details | PDF

Computational Social Choice 1

Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a two-stage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.
#59

Fair Allocation of Indivisible Goods and Chores
Haris Aziz, Ioannis Caragiannis, Ayumi Igarashi, Toby Walsh
Details | PDF

Computational Social Choice 1

We consider the problem of fairly dividing a set of items. Much of the fair division literature assumes that the items are "goods", i.e., they yield positive utility for the agents. There is also some work where the items are "chores" that yield negative utility for the agents. In this paper, we consider a more general scenario where an agent may have negative or positive utility for each item. This framework captures, e.g., fair task assignment, where agents can have both positive and negative utilities for each task. We show that whereas some of the positive axiomatic and computational results extend to this more general setting, others do not. We present several new and efficient algorithms for finding fair allocations in this general setting. We also point out several gaps in the literature regarding the existence of allocations satisfying certain fairness and efficiency properties and further study the complexity of computing such allocations.
#255

Weighted Maxmin Fair Share Allocation of Indivisible Chores
Haris Aziz, Hau Chan, Bo Li
Details | PDF

Computational Social Choice 1

We initiate the study of indivisible chore allocation for agents with asymmetric shares. The fairness concept we focus on is the weighted natural generalization of maxmin share: WMMS fairness and OWMMS fairness. We first highlight the fact that commonly-used algorithms that work well for allocation of goods to asymmetric agents, and even for chores to symmetric agents do not provide good approximations for allocation of chores to asymmetric agents under WMMS. As a consequence, we present a novel polynomial-time constant-approximation algorithm, via linear program, for OWMMS. For two special cases: the binary valuation case and the 2-agent case, we provide exact or better constant-approximation algorithms.

Tuesday 13 16:30 - 18:00 KRR|NR - Non-monotonic Reasoning (2705-2706)

Chair: Eduardo Ferme

#2953

Rational Inference Relations from Maximal Consistent Subsets Selection
Sébastien Konieczny, Pierre Marquis, Srdjan Vesic
Details | PDF

Non-monotonic Reasoning

When one wants to draw non-trivial inferences from an inconsistent belief base, a very natural approach is to take advantage of the maximal consistent subsets of the base. But few inference relations from maximal consistent subsets exist. In this paper we point out new such relations based on selection of some of the maximal consistent subsets, leading thus to inference relations with a stronger inferential power. The selection process must obey some principles to ensure that it leads to an inference relation which is rational. We define a general class of monotonic selection relations for comparing maximal consistent sets. And we show that it corresponds to the class of rational inference relations.
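The starting point of the abstract, the maximal consistent subsets of an inconsistent base, can be computed by brute force for small propositional bases. The sketch below does only that; the selection relations and the resulting inference relations proposed in the paper are not modeled. Representing formulas as Python callables over truth assignments, and the names `is_consistent` and `maximal_consistent_subsets`, are assumptions for illustration.

```python
from itertools import combinations, product


def is_consistent(formulas, variables):
    """True if some truth assignment satisfies every formula (brute force)."""
    return any(
        all(f(dict(zip(variables, values))) for f in formulas)
        for values in product([False, True], repeat=len(variables))
    )


def maximal_consistent_subsets(base, variables):
    """base maps a label to a formula, given as a callable on an assignment dict."""
    names = sorted(base)
    found = []
    # Visit larger subsets first, so anything strictly inside a set that is
    # already known to be consistent cannot be maximal and is skipped.
    for size in range(len(names), -1, -1):
        for subset in combinations(names, size):
            candidate = set(subset)
            if any(candidate < m for m in found):
                continue
            if is_consistent([base[n] for n in subset], variables):
                found.append(candidate)
    return found


# Hypothetical inconsistent base: {p, not p, p -> q}.
base = {
    "p": lambda v: v["p"],
    "not_p": lambda v: not v["p"],
    "p_implies_q": lambda v: (not v["p"]) or v["q"],
}
mcs = maximal_consistent_subsets(base, ["p", "q"])
```

The enumeration is exponential in both the number of formulas and the number of variables, so it is suitable only for toy examples like this one.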
#4262

On the Integration of CP-nets in ASPRIN
Mario Alviano, Javier Romero, Torsten Schaub
Details | PDF

Non-monotonic Reasoning

Conditional preference networks (CP-nets) express qualitative preferences over features of interest. A Boolean CP-net can express that a feature is preferable under some conditions, as long as all other features have the same value. This is often a convenient representation, but sometimes one would also like to express a preference for maximizing a set of features, or some other objective function on the features of interest. ASPRIN is a flexible framework for preferences in ASP, where one can mix heterogeneous preference relations, and this paper reports on the integration of Boolean CP-nets. In general, we extend ASPRIN with a preference program for CP-nets in order to compute most preferred answer sets via an iterative algorithm. For the specific case of acyclic CP-nets, we provide an approximation by partially ordered set preferences, which are in turn normalized by ASPRIN to take advantage of several highly optimized algorithms implemented by ASP solvers for computing optimal solutions. Finally, we take advantage of a linear-time computable function to address dominance testing for tree-shaped CP-nets.
#4681

Simple Conditionals with Constrained Right Weakening
Giovanni Casini, Thomas Meyer, Ivan Varzinczak
Details | PDF

Non-monotonic Reasoning

In this paper we introduce and investigate a very basic semantics for conditionals that can be used to define a broad class of conditional reasoning. We show that it encompasses the most popular kinds of conditional reasoning developed in logic-based KR. It turns out that the semantics we propose is appropriate for a structural analysis of those conditionals that do not satisfy the property of Right Weakening. We show that it can be used for the further development of an analysis of the notion of relevance in conditional reasoning.
#5424

Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving
Jakob Suchan, Mehul Bhatt, Srikrishna Varadarajan
Details | PDF

Non-monotonic Reasoning

We demonstrate the need for, and potential of, systematically integrated vision and semantics solutions for visual sensemaking (against the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates state of the art in visual computing, and is developed as a modular framework usable within hybrid architectures for perception & control. We evaluate and demo with the community-established benchmarks KITTIMOD and MOT. As a use case, we focus on the significance of human-centred visual sensemaking (e.g., semantic representation and explainability, question-answering, commonsense interpolation) in safety-critical autonomous driving situations.
#6324

What Has Been Said? Identifying the Change Formula in a Belief Revision Scenario
Nicolas Schwind, Katsumi Inoue, Sébastien Konieczny, Jean-Marie Lagniez, Pierre Marquis
Details | PDF

Non-monotonic Reasoning

We consider the problem of identifying the change formula in a belief revision scenario: given that an unknown announcement (a formula mu) led a set of agents to revise their beliefs and given the prior beliefs and the revised beliefs of the agents, what can be said about mu? We show that under weak conditions about the rationality of the revision operators used by the agents, the set of candidate formulae has the form of a logical interval. We explain how the bounds of this interval can be tightened when the revision operators used by the agents are known and/or when mu is known to be independent from a given set of variables. We also investigate the completeness issue, i.e., whether mu can be exactly identified. We present some sufficient conditions for it, identify its computational complexity, and report the results of some experiments about it.
#10975

(Sister Conferences Best Papers Track) Meta-Interpretive Learning Using HEX-Programs
Tobias Kaminski, Thomas Eiter, Katsumi Inoue
Details | PDF

Non-monotonic Reasoning

Meta-Interpretive Learning (MIL) is a recent approach for Inductive Logic Programming (ILP) implemented in Prolog. Alternatively, MIL-problems can be solved by using Answer Set Programming (ASP), which may result in performance gains due to efficient conflict propagation. However, a straightforward MIL-encoding results in a huge size of the ground program and search space. To address these challenges, we encode MIL in the HEX-extension of ASP, which mitigates grounding issues, and we develop novel pruning techniques.

Tuesday 13 16:30 - 18:00 CV|BFGR - Biometrics, Face and Gesture Recognition (2601-2602)

Chair: Jingyi Yu

#2347

Face Photo-Sketch Synthesis via Knowledge Transfer
Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Li, Zhifeng Li
Details | PDF

Biometrics, Face and Gesture Recognition

Although deep neural networks have demonstrated strong power in the face photo-sketch synthesis task, their performance is still limited by the lack of training data (photo-sketch pairs). Knowledge Transfer (KT), which aims at training a smaller and faster student network with the information learned from a larger and more accurate teacher network, has attracted much attention recently due to its superior performance in the acceleration and compression of deep neural networks. This work has inspired us to train a relatively small student network on very little training data by transferring knowledge from a larger teacher model trained on enough training data for other tasks. Therefore, we propose a novel knowledge transfer framework to synthesize face photos from face sketches or synthesize face sketches from face photos. Particularly, we utilize two teacher networks trained on a large amount of data in a related task to learn the knowledge of face photos and face sketches separately and transfer them to two student networks simultaneously. In addition, the two student networks, one for the photo → sketch task and the other for the sketch → photo task, can transfer their knowledge mutually. With the proposed method, we can train a model with superior performance using a small set of photo-sketch pairs. We validate the effectiveness of our method across several datasets. Quantitative and qualitative evaluations illustrate that our model outperforms other state-of-the-art methods in generating face sketches (or photos) with high visual quality and recognition ability.
#2386

Pose-preserving Cross Spectral Face Hallucination
Junchi Yu, Jie Cao, Yi Li, Xiaofei Jia, Ran He
Details | PDF

Biometrics, Face and Gesture Recognition

To narrow the inherent sensing gap in heterogeneous face recognition (HFR), recent methods have resorted to generative models and explored the "recognition via generation" framework. Even so, it remains a very challenging task to synthesize photo-realistic visible faces (VIS) from near-infrared (NIR) images, especially when paired training data are unavailable. We present an approach to avert the data misalignment problem and faithfully preserve pose, expression and identity information during cross-spectral face hallucination. At the pixel level, we introduce an unsupervised attention mechanism to warping that is jointly learned with the generator to derive pixel-wise correspondence from unaligned data. At the image level, an auxiliary generator is employed to facilitate the learning of mapping from NIR to VIS domain. At the domain level, we first apply the mutual information constraint to explicitly measure the correlation between domains and thus benefit synthesis. Extensive experiments on three heterogeneous face datasets demonstrate that our approach not only outperforms current state-of-the-art HFR methods but also produces visually appealing results at a high resolution.
#2439

Multi-Margin based Decorrelation Learning for Heterogeneous Face Recognition
Bing Cao, Nannan Wang, Xinbo Gao, Jie Li, Zhifeng Li
Details | PDF

Biometrics, Face and Gesture Recognition

Heterogeneous face recognition (HFR) refers to matching face images acquired from different domains, with wide applications in security scenarios. However, HFR is still a challenging problem due to the significant cross-domain discrepancy and the lack of sufficient training data in different domains. This paper presents a deep neural network approach, namely Multi-Margin based Decorrelation Learning (MMDL), to extract decorrelation representations in a hyperspherical space for cross-domain face images. The proposed framework can be divided into two components: heterogeneous representation network and decorrelation representation learning. First, we employ a large scale of accessible visual face images to train the heterogeneous representation network. The decorrelation layer projects the output of the first component into a decorrelation latent subspace and obtains the decorrelation representation. In addition, we design a multi-margin loss (MML), which consists of tetrad margin loss (TML) and heterogeneous angular margin loss (HAML), to constrain the proposed framework. Experimental results on two challenging heterogeneous face databases show that our approach achieves superior performance on both verification and recognition tasks, compared with state-of-the-art methods.
#3952

High Performance Gesture Recognition via Effective and Efficient Temporal Modeling
Yang Yi, Feng Ni, Yuexin Ma, Xinge Zhu, Yuankai Qi, Riming Qiu, Shijie Zhao, Feng Li, Yongtao Wang
Details | PDF

Biometrics, Face and Gesture Recognition

State-of-the-art hand gesture recognition methods have investigated the spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from the inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on the 1D convolutional neural networks and propose a simple and efficient architectural unit, Multi-Kernel Temporal Block (MKTB), that models the multi-scale temporal responses by explicitly applying different temporal kernels. Then, we present a Global Refinement Block (GRB), which is an attention module for shaping the global temporal features based on the cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore the spatiotemporal features within tolerable computational cost. Extensive experiments conducted on public datasets demonstrate that our proposed model achieves the state-of-the-art with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules and the experiments on other tasks, like video understanding and video-based person re-identification, also display their good performance in efficiency and capability of generalization.
#5053

Attribute-Aware Convolutional Neural Networks for Facial Beauty Prediction
Luojun Lin, Lingyu Liang, Lianwen Jin, Weijie Chen
Details | PDF

Biometrics, Face and Gesture Recognition

Facial beauty prediction (FBP) aims to develop a machine that automatically assesses facial attractiveness. To a large extent, human perception of facial beauty involves the attributes of facial appearance, which provide significant visual cues for FBP. Deep convolutional neural networks (CNNs) have shown their power for FBP, but convolution filters with fixed parameters cannot take full advantage of the facial attributes for FBP. To address this problem, we propose an Attribute-aware Convolutional Neural Network (AaNet) that modulates the filters of the main network, adaptively, using parameter generators that take beauty-related attributes as extra inputs. The parameter generators update the filters in the main network in two different manners: filter tuning or filter rebirth. However, AaNet takes attribute information as prior knowledge, which is ill-suited to datasets that have only task-oriented labels. Therefore, imitating the design of AaNet, we further propose a Pseudo Attribute-aware Convolutional Neural Network (P-AaNet) that modulates filters conditioned on global context embeddings (pseudo attributes) of input faces learnt by a lightweight pseudo attribute distiller. Extensive ablation studies show that the AaNet and P-AaNet improve the performance of FBP when compared to conventional convolution and attention schemes, which validates the effectiveness of our method.
#30

Dense Temporal Convolution Network for Sign Language Translation
Dan Guo, Shuo Wang, Qi Tian, Meng Wang
Details | PDF

Biometrics, Face and Gesture Recognition

Sign language translation (SLT), which aims at translating a sign language video into natural language, is a weakly supervised task, given that there is no exact mapping relationship between visual actions and textual words in a sentence label. To align the sign language actions and translate them into the respective words automatically, this paper proposes a dense temporal convolution network, termed DenseTCN, which captures the actions in hierarchical views. Within this network, a temporal convolution (TC) is designed to learn the short-term correlation among adjacent features and further extended to a dense hierarchical structure. In the kth TC layer, we integrate the outputs of all preceding layers together: (1) The TC in a deeper layer essentially has larger receptive fields, which captures long-term temporal context by the hierarchical content transition. (2) The integration addresses the SLT problem by different views, including embedded short-term and extended long-term sequential learning. Finally, we adopt the CTC loss and a fusion strategy to learn the feature-wise classification and generate the translated sentence. The experimental results on two popular sign language benchmarks, i.e. PHOENIX and USTCConSents, demonstrate the effectiveness of our proposed method in terms of various measurements.
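The CTC loss mentioned in this abstract is what lets frame-level predictions be trained without an exact frame-to-word alignment. As a minimal sketch, the function below shows the standard CTC collapse rule that maps a per-frame label sequence to an output sentence: merge consecutive repeats, then drop blanks. It is only the generic CTC mapping, not the authors' decoder or fusion strategy; the blank symbol `"-"` and the toy glosses are assumptions chosen for this example.

```python
def ctc_collapse(frame_labels, blank="-"):
    """Apply the CTC mapping: merge consecutive repeats, then drop blanks."""
    output = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


# Hypothetical per-frame predictions for a short clip.
frames = ["-", "HELLO", "HELLO", "-", "WORLD", "WORLD", "-"]
sentence = ctc_collapse(frames)
```

Because many frame sequences collapse to the same sentence, the CTC loss sums the probability of all of them, which suits the weakly supervised setting described above. A blank between two identical labels keeps them as two separate outputs.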

Tuesday 13 16:30 - 18:00 KRR|DLO - Description Logics and Ontologies 1 (2603-2604)

Chair: Gerardo I. Simari

#2773

Augmenting Transfer Learning with Semantic Reasoning
Freddy Lécué, Jiaoyan Chen, Jeff Z. Pan, Huajun Chen
Details | PDF

Description Logics and Ontologies 1

Transfer learning aims at building robust prediction models by transferring knowledge gained from one problem to another. In the semantic Web, learning tasks are enhanced with semantic representations. We exploit their semantics to augment transfer learning by dealing with when to transfer with semantic measurements and what to transfer with semantic embeddings. We further present a general framework that integrates the above measurements and embeddings with existing transfer learning algorithms for higher performance. It has been demonstrated to be robust in two real-world applications: bus delay forecasting and air quality forecasting.
#2902

Oblivious and Semi-Oblivious Boundedness for Existential Rules
Pierre Bourhis, Michel Leclère, Marie-Laure Mugnier, Sophie Tison, Federico Ulliana, Lily Gallois
Details | PDF

Description Logics and Ontologies 1

We study the notion of boundedness in the context of positive existential rules, that is, whether there exists an upper bound on the depth of the chase procedure that is independent of the initial instance. By focusing our attention on the oblivious and the semi-oblivious chase variants, we give a characterization of boundedness in terms of FO-rewritability and chase termination. We show that it is decidable to recognize if a set of rules is bounded for several classes of rules and outline the complexity of the problem.
#4008

Semantic Characterization of Data Services through Ontologies
Gianluca Cima, Maurizio Lenzerini, Antonella Poggi
Details | PDF

Description Logics and Ontologies 1

We study the problem of associating formal semantic descriptions to data services. We base our proposal on the Ontology-based Data Access paradigm, where a domain ontology is used to provide a semantic layer mapped to the data sources of an organization. The basic idea is to explain the semantics of a data service in terms of a query over the ontology. We illustrate a formal framework for this problem, based on the notion of source-to-ontology (s-to-o) rewriting, which comes in three variants, called sound, complete and perfect, respectively. We present a thorough complexity analysis of two computational problems, namely verification (checking whether a query is an s-to-o rewriting of a given data service), and computation (computing an s-to-o rewriting of a data service).
#5437

Revisiting Controlled Query Evaluation in Description Logics
Domenico Lembo, Riccardo Rosati, Domenico Fabio Savo
Details | PDF

Description Logics and Ontologies 1

Controlled Query Evaluation (CQE) is a confidentiality-preserving framework in which private information is protected through a policy, and a (optimal) censor guarantees that answers to queries are maximized without violating the policy. CQE has been recently studied in the context of ontologies, where the focus has been mainly on the problem of the existence of an optimal censor. In this paper we instead consider query answering over all possible optimal censors. We study data complexity of this problem for ontologies specified in the Description Logics DL-LiteR and EL_bottom and for variants of the censor language, which is the language used by the censor to enforce the policy. In our investigation we also analyze the relationship between CQE and the problem of Consistent Query Answering (CQA). Some of the complexity results we provide are indeed obtained through mutual reduction between CQE and CQA.
#925

Satisfaction and Implication of Integrity Constraints in Ontology-based Data Access
Charalampos Nikolaou, Bernardo Cuenca Grau, Egor V. Kostylev, Mark Kaminski, Ian Horrocks
Details | PDF

Description Logics and Ontologies 1

We extend ontology-based data access with integrity constraints over both the source and target schemas. The relevant reasoning problems in this setting are constraint satisfaction—to check whether a database satisfies the target constraints given the mappings and the ontology—and source-to-target (resp., target-to-source) constraint implication, which is to check whether a target constraint (resp., a source constraint) is satisfied by each database satisfying the source constraints (resp., the target constraints). We establish decidability and complexity bounds for all these problems in the case where ontologies are expressed in DL-LiteR and constraints range from functional dependencies to disjunctive tuple-generating dependencies.
#2486

Learning Description Logic Concepts: When can Positive and Negative Examples be Separated?
Maurice Funk, Jean Christoph Jung, Carsten Lutz, Hadrien Pulcini, Frank Wolter
Details | PDF

Description Logics and Ontologies 1

Learning description logic (DL) concepts from positive and negative examples given in the form of labeled data items in a KB has received significant attention in the literature. We study the fundamental question of when a separating DL concept exists and provide useful model-theoretic characterizations as well as complexity results for the associated decision problem. For expressive DLs such as ALC and ALCQI, our characterizations show a surprising link to the evaluation of ontology-mediated conjunctive queries. We exploit this to determine the combined complexity (between ExpTime and NExpTime) and data complexity (second level of the polynomial hierarchy) of separability. For the Horn DL EL, separability is ExpTime-complete both in combined and in data complexity while for its modest extension ELI it is even undecidable. Separability is also undecidable when the KB is formulated in ALC and the separating concept is required to be in EL or ELI.

Tuesday 13 16:30 - 18:00 NLP|NLS - Natural Language Semantics (2605-2606)

Chair: Wenya Wang

#1326

Aspect-Based Sentiment Classification with Attentive Neural Turing Machines
Qianren Mao, Jianxin Li, Senzhang Wang, Yuanning Zhang, Hao Peng, Min He, Lihong Wang
Details | PDF

Natural Language Semantics

Aspect-based sentiment classification aims to identify sentiment polarity expressed towards a given opinion target in a sentence. The sentiment polarity of the target is not only highly determined by sentiment semantic context but also correlated with the concerned opinion target. Existing works cannot effectively capture and store the inter-dependence between the opinion target and its context. To solve this issue, we propose a novel model of Attentive Neural Turing Machines (ANTM). Via interactive read-write operations between an external memory storage and a recurrent controller, ANTM can learn the dependable correlation of the opinion target to context and concentrate on crucial sentiment information. Specifically, ANTM separates the information of storage and computation, which extends the capabilities of the controller to learn and store sequential features. The read and write operations enable ANTM to adaptively keep track of the interactive attention history between memory content and controller state. Moreover, we append target entity embeddings into both input and output of the controller in order to augment the integration of target information. We evaluate our model on SemEval2014 dataset which contains reviews of Laptop and Restaurant domains and Twitter review dataset. Experimental results verify that our model achieves state-of-the-art performance on aspect-based sentiment classification.
#2826

A Latent Variable Model for Learning Distributional Relation Vectors
Jose Camacho-Collados, Luis Espinosa-Anke, Shoaib Jameel, Steven Schockaert
Details | PDF

Natural Language Semantics

Recently a number of unsupervised approaches have been proposed for learning vectors that capture the relationship between two words. Inspired by word embedding models, these approaches rely on co-occurrence statistics that are obtained from sentences in which the two target words appear. However, the number of such sentences is often quite small, and most of the words that occur in them are not relevant for characterizing the considered relationship. As a result, standard co-occurrence statistics typically lead to noisy relation vectors. To address this issue, we propose a latent variable model that aims to explicitly determine what words from the given sentences best characterize the relationship between the two target words. Relation vectors then correspond to the parameters of a simple unigram language model which is estimated from these words.
#4815

Dual-View Variational Autoencoders for Semi-Supervised Text Matching
Zhongbin Xie, Shuai Ma
Details | PDF

Natural Language Semantics

Semantically matching two text sequences (usually two sentences) is a fundamental problem in NLP. Most previous methods either encode each of the two sentences into a vector representation (sentence-level embedding) or leverage word-level interaction features between the two sentences. In this study, we propose to take the sentence-level embedding features and the word-level interaction features as two distinct views of a sentence pair, and unify them with a framework of Variational Autoencoders such that the sentence pair is matched in a semi-supervised manner. The proposed model is referred to as Dual-View Variational AutoEncoder (DV-VAE), where the optimization of the variational lower bound can be interpreted as an implicit Co-Training mechanism for two matching models over distinct views. Experiments on SNLI, Quora and a Community Question Answering dataset demonstrate the superiority of our DV-VAE over several strong semi-supervised and supervised text matching models.
#296

TransMS: Knowledge Graph Embedding for Complex Relations by Multidirectional Semantics
Shihui Yang, Jidong Tian, Honglun Zhang, Junchi Yan, Hao He, Yaohui Jin
Details | PDF

Natural Language Semantics

Knowledge graph embedding, which projects the symbolic relations and entities onto low-dimension continuous spaces, is essential to knowledge graph completion. Recently, translation-based embedding models (e.g. TransE) have attracted increasing attention for their simplicity and effectiveness. These models attempt to translate semantics from head entities to tail entities with the relations and infer richer facts outside the knowledge graph. In this paper, we propose a novel knowledge graph embedding method named TransMS, which translates and transmits multidirectional semantics: i) the semantics of head/tail entities and relations to tail/head entities with nonlinear functions and ii) the semantics from entities to relations with linear bias vectors. Our model has merely one more parameter α than TransE for each triplet, which results in better scalability on large-scale knowledge graphs. Experiments show that TransMS achieves substantial improvements against state-of-the-art baselines; in particular, the Hit@10s of head entity prediction for N-1 relations and tail entity prediction for 1-N relations improve by about 27.1% and 24.8% on the FB15K database, respectively.
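For readers unfamiliar with the translation-based baseline named in this abstract, the sketch below shows the TransE idea: a triple (head, relation, tail) is plausible when head + relation lands close to tail in the embedding space. It illustrates TransE only, not TransMS, whose nonlinear functions and bias vectors are not specified in the abstract. The 2-dimensional embeddings, entity names, and helper functions are made up for this example.

```python
import math


def transe_score(head, relation, tail):
    """L2 distance ||h + r - t||; lower means the triple is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Return candidate tail names sorted from most to least plausible."""
    return sorted(candidates, key=lambda name: transe_score(head, relation, candidates[name]))


def hits_at_k(ranking, true_tail, k):
    """1 if the true tail is among the top-k ranked candidates, else 0."""
    return int(true_tail in ranking[:k])


# Toy embeddings (assumed values, for illustration only).
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
tails = {"france": [1.0, 1.0], "japan": [3.0, -1.0]}

ranking = rank_tails(paris, capital_of, tails)
hit = hits_at_k(ranking, "france", k=1)
```

Averaging `hits_at_k` over all test triples with k = 10 gives the Hit@10 figure that the abstract reports.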
#3445

CNN-Based Chinese NER with Lexicon Rethinking
Tao Gui, Ruotian Ma, Qi Zhang, Lujun Zhao, Yu-Gang Jiang, Xuanjing Huang
Details | PDF

Natural Language Semantics

Character-level Chinese named entity recognition (NER) that applies long short-term memory (LSTM) to incorporate lexicons has achieved great success. However, this method fails to fully exploit GPU parallelism and candidate lexicons can conflict. In this work, we propose a faster alternative to Chinese NER: a convolutional neural network (CNN)-based method that incorporates lexicons using a rethinking mechanism. The proposed method can model all the characters and potential words that match the sentence in parallel. In addition, the rethinking mechanism can address the word conflict by feeding back the high-level features to refine the networks. Experimental results on four datasets show that the proposed method can achieve better performance than both word-level and character-level baseline methods. In addition, the proposed method performs up to 3.21 times faster than state-of-the-art methods, while realizing better performance.
#3652

Learning Task-Specific Representation for Novel Words in Sequence Labeling
Minlong Peng, Qi Zhang, Xiaoyu Xing, Tao Gui, Jinlan Fu, Xuanjing Huang
Details | PDF

Natural Language Semantics

Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually poor for appreciable performance. This is commonly referred to as the out-of-vocabulary (OOV) problem. In this work, we address the OOV problem in sequence labeling using only training data of the task. To this end, we propose a novel method to predict representations for OOV words from their surface-forms (e.g., character sequence) and contexts. The method is specifically designed to avoid the error propagation problem suffered by existing approaches in the same paradigm. To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech tagging (POS) tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method can achieve better or competitive performance on the OOV problem compared with existing state-of-the-art methods.

Tuesday 13 16:30 - 18:00 CV|RDCIMRSI - Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2 (2501-2502)

Chair: Mang Ye

#416

Resolution-invariant Person Re-Identification
Shunan Mao, Shiliang Zhang, Ming Yang
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Exploiting resolution invariant representation is critical for person Re-Identification (ReID) in real applications, where the resolutions of captured person images may vary dramatically. This paper learns person representations robust to resolution variance through jointly training a Foreground-Focus Super-Resolution (FFSR) module and a Resolution-Invariant Feature Extractor (RIFE) by end-to-end CNN learning. FFSR upscales the person foreground using a fully convolutional auto-encoder with skip connections learned with a foreground focus training loss. RIFE adopts two feature extraction streams weighted by a dual-attention block to learn features for low and high resolution images, respectively. These two complementary modules are jointly trained, leading to a strong resolution invariant representation. We evaluate our methods on five datasets containing person images at a large range of resolutions, where our methods show substantial superiority to existing solutions. For instance, we achieve Rank-1 accuracy of 36.4% and 73.3% on CAVIAR and MLR-CUHK03, outperforming the state-of-the art by 2.9% and 2.6%, respectively.
#690

Deep Light-field-driven Saliency Detection from a Single View
Yongri Piao, Zhengkun Rong, Miao Zhang, Xiao Li, Huchuan Lu
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Previous 2D saliency detection methods extract salient cues from a single view and directly predict the expected results. Both traditional and deep-learning-based 2D methods do not consider geometric information of 3D scenes. Therefore the relationship between scene understanding and salient objects cannot be effectively established. This limits the performance of 2D saliency detection in challenging scenes. In this paper, we show for the first time that saliency detection problem can be reformulated as two sub-problems: light field synthesis from a single view and light-field-driven saliency detection. We propose a high-quality light field synthesis network to produce reliable 4D light field information. Then we propose a novel light-field-driven saliency detection network with two purposes, that is, i) richer saliency features can be produced for effective saliency detection; ii) geometric information can be considered for integration of multi-view saliency maps in a view-wise attention fashion. The whole pipeline can be trained in an end-to-end fashion. For training our network, we introduce the largest light field dataset for saliency detection, containing 1580 light fields that cover a wide variety of challenging scenes. With this new formulation, our method is able to achieve state-of-the-art performance.
#1361

Generalized Zero-Shot Vehicle Detection in Remote Sensing Imagery via Coarse-to-Fine Framework
Hong Chen, Yongtan Luo, Liujuan Cao, Baochang Zhang, Guodong Guo, Cheng Wang, Jonathan Li, Rongrong Ji
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Vehicle detection and recognition in remote sensing images are challenging, especially when only limited training data are available to accommodate various target categories. In this paper, we introduce a novel coarse-to-fine framework, which decomposes vehicle detection into segmentation-based vehicle localization and generalized zero-shot vehicle classification. Particularly, the proposed framework can well handle the problem of generalized zero-shot vehicle detection, which is challenging due to the requirement of recognizing vehicles that are even unseen during training. Specifically, a hierarchical DeepLab v3 model is proposed in the framework, which fully exploits fine-grained features to locate the target on a pixel-wise level, then recognizes vehicles in a coarse-grained manner. Additionally, the hierarchical DeepLab v3 model is readily compatible with generalized zero-shot recognition. To the best of our knowledge, there is no publicly available dataset to test comparative methods; we therefore construct a new dataset to fill this gap in evaluation. The experimental results show that the proposed framework yields promising results on the imperative yet difficult task of zero-shot vehicle detection and recognition.
#6134

MSR: Multi-Scale Shape Regression for Scene Text Detection
Chuhui Xue, Shijian Lu, Wei Zhang
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

State-of-the-art scene text detection techniques predict quadrilateral boxes that are prone to localization errors while dealing with straight or curved text lines of different orientations and lengths in scenes. This paper presents a novel multi-scale shape regression network (MSR) that is capable of locating text lines of different lengths, shapes and curvatures in scenes. The proposed MSR detects scene texts by predicting dense text boundary points that inherently capture the location and shape of text lines accurately and are also more tolerant to the variation of text line length as compared with state-of-the-art methods using proposals or segmentation. Additionally, the multi-scale network extracts and fuses features at different scales, which demonstrates superb tolerance to text scale variation. Extensive experiments over several public datasets show that the proposed MSR obtains superior detection performance for both curved and straight text lines of different lengths and orientations.
#3846

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval
Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Product Quantization (PQ) has long been a mainstream approach for generating an exponentially large codebook at very low memory/time cost. Despite its success, PQ is still tricky for the decomposition of high-dimensional vector space, and retraining the model is usually unavoidable when the code length changes. In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval. DPQ learns the quantization codes sequentially and approximates the original feature space progressively. Therefore, we can train the quantization codes with different code lengths simultaneously. Specifically, we first utilize the label information for guiding the learning of visual features, and then apply several quantization blocks to progressively approach the visual features. Each quantization block is designed to be a layer of a convolutional neural network, and the whole framework can be trained in an end-to-end manner. Experimental results on the benchmark datasets show that our model significantly outperforms the state-of-the-art for image retrieval. Our model is trained once for different code lengths and therefore requires less computation time. An additional ablation study demonstrates the effect of each component of our proposed model. Our code is released at https://github.com/cfm-uestc/DPQ.
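To make the "exponentially large codebook at very low memory cost" claim concrete, the sketch below shows plain product quantization, the baseline that DPQ is positioned against: a vector is split into M sub-vectors and each is replaced by the index of its nearest codeword in a small per-subspace codebook. With K codewords per subspace, only M × K codewords are stored but K^M distinct reconstructions are possible. This is not the DPQ model; the codebooks are hand-picked rather than learned, and all names are assumptions for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Return one nearest-codeword index per sub-vector."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for m, codebook in enumerate(codebooks):
        sub = vector[m * sub_dim:(m + 1) * sub_dim]
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(sub, codebook[k]))
        codes.append(nearest)
    return codes


def pq_decode(codes, codebooks):
    """Concatenate the selected codewords to approximate the original vector."""
    approx = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx


# Two subspaces (M = 2) with two hand-picked codewords each (K = 2).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
codes = pq_encode([0.9, 1.2, 0.1, 0.8], codebooks)
approx = pq_decode(codes, codebooks)
```

Changing the code length in plain PQ means changing M or K and relearning the codebooks, which is the retraining cost the abstract refers to.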
#1370

LRDNN: Local-refining based Deep Neural Network for Person Re-Identification with Attribute Discerning
Qinqin Zhou, Bineng Zhong, Xiangyuan Lan, Gan Sun, Yulun Zhang, Mengran Gou
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Recently, pose or attribute information has been widely used to solve the person re-identification (re-ID) problem. However, the inaccurate output from pose or attribute modules will impair the final person re-ID performance. Since re-ID, pose estimation and attribute recognition are all based on the person appearance information, we propose a Local-refining based Deep Neural Network (LRDNN) to aggregate pose estimation and attribute recognition to improve the re-ID performance. To this end, we add a pose branch to extract the local spatial information and optimize the whole network on both person identity and attribute objectives. To diminish the negative effect of unstable pose estimation, a novel structure called channel parse block (CPB) is introduced to learn weights on different feature channels in the pose branch. Then the two branches are combined with compact bilinear pooling. Experimental results on Market1501 and DukeMTMC-reid datasets illustrate the effectiveness of the proposed method.

Tuesday 13 16:30 - 18:00 ML|C - Classification 3 (2503-2504)

Chair: Shao-Wen Yang

#183

Margin Learning Embedded Prediction for Video Anomaly Detection with A Few Anomalies
Wen Liu, Weixin Luo, Zhengxin Li, Peilin Zhao, Shenghua Gao
Details | PDF

Classification 3

Classical semi-supervised video anomaly detection assumes that only normal data are available in the training set because of the rare and unbounded nature of anomalies. It is obvious, however, that these infrequently observed abnormal events can actually help with the detection of identical or similar abnormal events, a line of thinking that motivates us to study open-set supervised anomaly detection with only a few types of observed abnormal events and many normal events available. Under the assumption that normal events can be well predicted, we propose a Margin Learning Embedded Prediction (MLEP) framework. There are three features in MLEP-based open-set supervised video anomaly detection: i) we customize a video prediction framework that favors the prediction of normal events and distorts the prediction of abnormal events; ii) the margin learning framework learns a more compact normal data distribution and enlarges the margin between normal and abnormal events. Since abnormal events are unbounded, our framework consequently helps with the detection of abnormal events, even for anomalies that have never been previously observed. Therefore, our framework is suitable for the open-set supervised anomaly detection setting; iii) our framework can readily handle both frame-level and video-level anomaly annotations. Considering that video-level anomaly detection is more easily annotated in practice and that anomaly detection with a few anomalies is a more practical setting, our work thus pushes the application of anomaly detection towards real scenarios. Extensive experiments validate the effectiveness of our framework for anomaly detection.
#1214

Comprehensive Semi-Supervised Multi-Modal Learning
Yang Yang, Ke-Tao Wang, De-Chuan Zhan, Hui Xiong, Yuan Jiang
Details | PDF

Classification 3

Multi-modal learning refers to the process of learning a precise model to represent the joint representations of different modalities. Despite its promise for multi-modal learning, the co-regularization method is based on the consistency principle with a sufficient assumption, which usually does not hold for real-world multi-modal data. Indeed, due to the modal insufficiency in real-world applications, there are divergences among heterogeneous modalities. This imposes a critical challenge for multi-modal learning. To this end, in this paper, we propose a novel Comprehensive Multi-Modal Learning (CMML) framework, which can strike a balance between the consistency and divergency modalities by considering the insufficiency in one unified framework. Specifically, we utilize an instance level attention mechanism to weight the sufficiency for each instance on different modalities. Moreover, novel diversity regularization and robust consistency metrics are designed for discovering insufficient modalities. Our empirical studies show the superior performances of CMML on real-world data in terms of various criteria.
#5482

Exploiting Interaction Links for Node Classification with Deep Graph Neural Networks
Hogun Park, Jennifer Neville
Details | PDF

Classification 3

Node classification is an important problem in relational machine learning. However, in scenarios where graph edges represent interactions among the entities (e.g., over time), the majority of current methods either summarize the interaction information into link weights or aggregate the links to produce a static graph. In this paper, we propose a neural network architecture that jointly captures both temporal and static interaction patterns, which we call Temporal-Static-Graph-Net (TSGNet). Our key insight is that leveraging both a static neighbor encoder, which can learn aggregate neighbor patterns, and a graph neural network-based recurrent unit, which can capture complex interaction patterns, improves the performance of node classification. In our experiments on node classification tasks, TSGNet produces significant gains compared to state-of-the-art methods, reducing classification error by up to 24% and by an average of 10% compared to the best competitor on four real-world networks and one synthetic dataset.
#5854

Automated Machine Learning with Monte-Carlo Tree Search
Herilalaina Rakotoarison, Marc Schoenauer, Michèle Sebag
Details | PDF

Classification 3

The AutoML approach aims to deliver peak performance from a machine learning portfolio on the dataset at hand. A Monte-Carlo Tree Search Algorithm Selection and Configuration (Mosaic) approach is presented to tackle this mixed (combinatorial and continuous) expensive optimization problem on the structured search space of ML pipelines. Extensive lesion studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian Optimization or Monte Carlo Tree Search (MCTS); ii) its warm-start initialization based on meta-features or random runs; iii) the ensembling of the solutions gathered along the search. Mosaic is assessed on the OpenML 100 benchmark and the Scikit-learn portfolio, with statistically significant gains over AutoSkLearn, winner of all former AutoML challenges.
#2178

Multi-Class Learning using Unlabeled Samples: Theory and Algorithm
Jian Li, Yong Liu, Rong Yin, Weiping Wang
Details | PDF

Classification 3

In this paper, we investigate the generalization performance of multi-class classification, for which we obtain a sharper error bound by using the notion of local Rademacher complexity and additional unlabeled samples, substantially improving the state-of-the-art bounds in existing multi-class learning methods. The statistical learning analysis motivates us to devise an efficient multi-class learning framework with the local Rademacher complexity and Laplacian regularization. Coinciding with the theoretical analysis, experimental results demonstrate that the stated approach achieves better performance.
#3777

Accelerated Incremental Gradient Descent using Momentum Acceleration with Scaling Factor
Yuanyuan Liu, Fanhua Shang, Licheng Jiao
Details | PDF

Classification 3

Recently, research on variance reduced incremental gradient descent methods (e.g., SAGA) has made exciting progress (e.g., linear convergence for strongly convex (SC) problems). However, existing accelerated methods (e.g., point-SAGA) suffer from drawbacks such as inflexibility. In this paper, we design a novel and simple momentum to accelerate the classical SAGA algorithm, and propose a direct accelerated incremental gradient descent algorithm. In particular, our theoretical result shows that our algorithm attains a best known oracle complexity for strongly convex problems and an improved convergence rate for the case of n>=L/\mu. We also give experimental results justifying our theoretical results and showing the effectiveness of our algorithm.
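For readers unfamiliar with the baseline being accelerated, the following is a minimal sketch of the classical SAGA update on a least-squares problem. It is not the accelerated algorithm proposed in the abstract (no momentum or scaling factor is included), and the function name, the problem, and the step size 1/(3L) are assumptions chosen for illustration.

```python
import random

def saga_least_squares(A, b, steps, seed=0):
    """Classical SAGA on f(w) = (1/n) * sum_i 0.5 * (a_i . w - b_i)^2."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    # Smoothness constant of each f_i is ||a_i||^2; use the largest one.
    L = max(sum(x * x for x in row) for row in A)
    step = 1.0 / (3.0 * L)  # a common SAGA step size (assumption for this sketch)
    w = [0.0] * d

    def grad(i, w):
        # Gradient of f_i at w: residual_i * a_i.
        r = sum(x * y for x, y in zip(A[i], w)) - b[i]
        return [r * x for x in A[i]]

    table = [grad(i, w) for i in range(n)]  # last stored gradient per sample
    avg = [sum(g[k] for g in table) / n for k in range(d)]  # mean of the table

    for _ in range(steps):
        j = rng.randrange(n)
        g_new = grad(j, w)
        # Variance-reduced direction: new gradient - stored gradient + table mean.
        w = [wk - step * (gn - go + av)
             for wk, gn, go, av in zip(w, g_new, table[j], avg)]
        # Keep the running mean consistent with the updated table entry.
        avg = [av + (gn - go) / n for av, gn, go in zip(avg, g_new, table[j])]
        table[j] = g_new
    return w
```

The gradient table is what gives SAGA its linear convergence on strongly convex problems; the accelerated variant described above modifies how the iterate is updated, not this bookkeeping.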

Tuesday 13 16:30 - 18:00 ML|DM - Data Mining 3 (2505-2506)

Chair: Decebal Constantin Mocanu

#3254

Learning Network Embedding with Community Structural Information
Yu Li, Ying Wang, Tingting Zhang, Jiawei Zhang, Yi Chang
Details | PDF

Data Mining 3

Network embedding is an effective approach to learn the low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods exclusively focus on vertex proximity of networks, while ignoring the network internal community structure. However, the homophily principle indicates that vertices within the same community are more similar to each other than those from different communities, thus vertices within the same community should have similar vertex representations. Motivated by this, we propose a novel network embedding framework NECS to learn the Network Embedding with Community Structural information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem into a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks including network reconstruction, link prediction and vertex classification.
#5016

Unified Embedding Model over Heterogeneous Information Network for Personalized Recommendation
Zekai Wang, Hongzhi Liu, Yingpeng Du, Zhonghai Wu, Xing Zhang
Details | PDF

Data Mining 3

Most heterogeneous information network (HIN) based recommendation models rely on user and item modeling with meta-paths. However, they always model users and items in isolation under each meta-path, which may mislead information extraction. In addition, they only consider structural features of HINs when modeling users and items while exploring HINs, which may cause information useful for recommendation to be lost irreversibly. To address these problems, we propose a HIN based unified embedding model for recommendation, called HueRec. We assume there exist some common characteristics under different meta-paths for each user or item, and use data from all meta-paths to learn unified representations of users and items. Thus the interrelations between meta-paths are utilized to alleviate the problems of data sparsity and noise on any single meta-path. Different from existing models which first explore HINs then make recommendations, we combine these two parts into an end-to-end model to avoid losing useful information in the initial phases. In addition, we embed all users, items and meta-paths into related latent spaces. Therefore, we can measure users’ preferences on meta-paths to improve the performance of personalized recommendation. Extensive experiments show HueRec consistently outperforms state-of-the-art methods.
#5394

Metric Learning on Healthcare Data with Incomplete Modalities
Qiuling Suo, Weida Zhong, Fenglong Ma, Ye Yuan, Jing Gao, Aidong Zhang
Details | PDF

Data Mining 3

Utilizing multiple modalities to learn a good distance metric is of vital importance for various clinical applications. However, it is common that modalities are incomplete for some patients due to various technical and practical reasons in healthcare datasets. Existing metric learning methods cannot directly learn the distance metric on such data with missing modalities. Nevertheless, the incomplete data contain valuable information to characterize patient similarity and modality relationships, and they should not be ignored during the learning process. To tackle the aforementioned challenges, we propose a metric learning framework to perform missing modality completion and multi-modal metric learning simultaneously. Employing generative adversarial networks, we incorporate both complete and incomplete data to learn the mapping relationship between modalities. After completing the missing modalities, we use the nonlinear representations extracted by the discriminator to learn the distance metric among patients. By jointly training the adversarial generation part and metric learning, the similarity among patients can be learned on data with missing modalities. Experimental results show that the proposed framework learns a more accurate distance metric on real-world healthcare datasets with incomplete modalities, compared with the state-of-the-art approaches. Meanwhile, the quality of the generated modalities can be preserved.
#316

Joint Link Prediction and Network Alignment via Cross-graph Embedding
Xingbo Du, Junchi Yan, Hongyuan Zha
Details | PDF

Data Mining 3

Link prediction and network alignment are two important problems in social network analysis and other network related applications. Considerable efforts have been devoted to these two problems, though often independently of each other. In this paper we argue that these two tasks are related and present a joint link prediction and network alignment framework, whereby a novel cross-graph node embedding technique is devised to allow for information propagation. Our approach can work either with a few initial vertex correspondences as seeds, or from scratch. Through extensive experiments on public benchmarks, we show that link prediction and network alignment can benefit each other, especially for improving the recall of both tasks.
#1709

Masked Graph Convolutional Network
Liang Yang, Fan Wu, Yingkui Wang, Junhua Gu, Yuanfang Guo
Details | PDF

Data Mining 3

Semi-supervised classification is a fundamental technology to process structured and unstructured data in the machine learning field. The traditional attribute-graph based semi-supervised classification methods propagate labels over the graph which is usually constructed from the data features, while the graph convolutional neural networks smooth the node attributes, i.e., propagate the attributes, over the real graph topology. In this paper, they are interpreted from the perspective of propagation, and accordingly categorized into symmetric and asymmetric propagation based methods. From the perspective of propagation, both the traditional and network based methods are propagating certain objects over the graph. However, different from label propagation, the intuition "the connected data samples tend to be similar in terms of the attributes" in attribute propagation is only partially valid. Therefore, a masked graph convolution network (Masked GCN) is proposed by only propagating a certain portion of the attributes to the neighbours according to a masking indicator, which is learned for each node by jointly considering the attribute distributions in local neighbourhoods and the impact on the classification results. Extensive experiments on transductive and inductive node classification tasks have demonstrated the superiority of the proposed method.
#3778

Improving Cross-lingual Entity Alignment via Optimal Transport
Shichao Pei, Lu Yu, Xiangliang Zhang
Details | PDF

Data Mining 3

Cross-lingual entity alignment identifies entity pairs that share the same meanings but are located in different language knowledge graphs (KGs). The study in this paper addresses two limitations that widely exist in current solutions: 1) the alignment loss functions defined at the entity level serve well the purpose of aligning labeled entities but fail to match the whole picture of labeled and unlabeled entities in different KGs; 2) the translation from one domain to the other has been considered (e.g., X to Y by M1 or Y to X by M2). However, the important duality of alignment between different KGs (X to Y by M1 and Y to X by M2) is ignored. We propose a novel entity alignment framework (OTEA), which dually optimizes the entity-level loss and group-level loss via optimal transport theory. We also impose a regularizer on the dual translation matrices to mitigate the effect of noise during transformation. Extensive experimental results show that our model consistently outperforms state-of-the-art methods with significant improvements in alignment accuracy.

Tuesday 13 16:30 - 18:00 ML|LT - Learning Theory (2401-2402)

Chair: Colin de la Higuera

#531

Approximate Optimal Transport for Continuous Densities with Copulas
Jinjin Chi, Jihong Ouyang, Ximing Li, Yang Wang, Meng Wang
Details | PDF

Learning Theory

Optimal Transport (OT) formulates a powerful framework by comparing probability distributions, and it has increasingly attracted great attention within the machine learning community. However, it suffers from severe computational burden, due to the intractable objective with respect to the distributions of interest. Especially, there still exist very few attempts for continuous OT, i.e., OT for comparing continuous densities. To this end, we develop a novel continuous OT method, namely Copula OT (Cop-OT). The basic idea is to transform the primal objective of continuous OT into a tractable form with respect to the copula parameter, which can be efficiently solved by stochastic optimization with less time and memory requirements. Empirical results on real applications of image retrieval and synthetic data demonstrate that our Cop-OT can gain more accurate approximations to continuous OT values than the state-of-the-art baselines.
#848

Improved Algorithm on Online Clustering of Bandits
Shuai Li, Wei Chen, Shuai Li, Kwong-Sak Leung
Details | PDF

Learning Theory

We generalize the setting of online clustering of bandits by allowing non-uniform distribution over user frequencies. A more efficient algorithm is proposed with simple set structures to represent clusters. We prove a regret bound for the new algorithm which is free of the minimal frequency over users. The experiments on both synthetic and real datasets consistently show the advantage of the new algorithm over existing methods.
#1736

Heavy-ball Algorithms Always Escape Saddle Points
Tao Sun, Dongsheng Li, Zhe Quan, Hao Jiang, Shengguo Li, Yong Dou
Details | PDF

Learning Theory

Nonconvex optimization algorithms with random initialization have attracted increasing attention recently. It has been shown that many first-order methods always avoid saddle points with random starting points. In this paper, we answer a question: can the nonconvex heavy-ball algorithms with random initialization avoid saddle points? The answer is yes! Directly applying the existing proof technique to the heavy-ball algorithms is hard because each iteration of the heavy-ball algorithm involves both the current and the previous point. It is impossible to formulate the algorithms as an iteration of the form x_{k+1} = g(x_k) under some mapping g. To this end, we design a new mapping on a new space. With some transformations, the heavy-ball algorithm can be interpreted as iterations of this mapping. Theoretically, we prove that heavy-ball gradient descent enjoys a larger stepsize than gradient descent for escaping saddle points. The heavy-ball proximal point algorithm is also considered; we prove that this algorithm can also always escape saddle points.
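The abstract does not spell out its mapping, but the obstacle it describes is commonly handled by lifting the iteration to pairs of points, so that one step depends only on the current pair. The sketch below illustrates that lifting under stated assumptions: the test function f(x, y) = 0.5 x^2 + 0.25 y^4 - 0.5 y^2 (a strict saddle at the origin, minima at (0, 1) and (0, -1)), the step size, the momentum value, and all names are hypothetical choices, not taken from the paper.

```python
import random

def grad_f(z):
    """Gradient of f(x, y) = 0.5*x^2 + 0.25*y^4 - 0.5*y^2."""
    x, y = z
    return (x, y ** 3 - y)

def heavy_ball_map(state, step=0.1, momentum=0.5):
    """One heavy-ball step written as a map on the product space.

    state = (z_k, z_{k-1}); the result is (z_{k+1}, z_k), so the whole
    method becomes state_{k+1} = G(state_k) for a single mapping G.
    """
    z, z_prev = state
    g = grad_f(z)
    z_next = tuple(zi - step * gi + momentum * (zi - pi)
                   for zi, gi, pi in zip(z, g, z_prev))
    return (z_next, z)

def run_heavy_ball(steps, seed):
    """Iterate the lifted map from a random start in [-1, 1]^2."""
    rng = random.Random(seed)
    z0 = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    state = (z0, z0)  # start with zero momentum
    for _ in range(steps):
        state = heavy_ball_map(state)
    return state[0]
```

From a random start the iterates move away from the saddle at the origin and settle near one of the two minima; only starts with y exactly zero, a set of measure zero, stay on the saddle's stable manifold.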
#1738

BN-invariant Sharpness Regularizes the Training Model to Better Generalization
Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Details | PDF

Learning Theory

It is arguably believed that flatter minima can generalize better. However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of loss over a delta ball of parameters around minima, cannot give consistent measurement for scale invariant neural networks, e.g., networks with batch normalization layer. In this paper, we first propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN. It achieves the property of scale invariance by connecting the integral diameter with the scale of parameter. Then we present a computation-efficient way to calculate the BN-sharpness approximately i.e., one dimensional integral along the "sharpest" direction. Furthermore, we use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective. Our algorithm achieves considerably better performance than vanilla SGD over various experiment settings.
#2451

Conditions on Features for Temporal Difference-Like Methods to Converge
Marcus Hutter, Samuel Yang-Zhao, Sultan Javed Majeed
Details | PDF

Learning Theory

The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution. In this paper, we provide a complete characterization of non-uniqueness issues for a large class of reinforcement learning algorithms, simultaneously unifying many counter-examples to convergence in a theoretical framework. We achieve this by proving a new condition on features that can determine whether the convergence assumptions are valid or non-uniqueness holds. We consider a general class of RL methods, which we call natural algorithms, whose solutions are characterized as the fixed point of a projected Bellman equation. Our main result proves that natural algorithms converge to the correct solution if and only if all the value functions in the approximation space satisfy a certain shape. This implies that natural algorithms are, in general, inherently prone to converge to the wrong solution for most feature choices even if the value function can be represented exactly. Given our results, we show that state aggregation-based features are a safe choice for natural algorithms and also provide a condition for finding convergent algorithms under other feature constructions.
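To make "the fixed point of a projected Bellman equation" concrete, the sketch below solves that fixed point directly for a small Markov reward process with linear features. The four-state chain, the uniform state weighting, the feature matrices, and all function names are assumptions made for illustration; this is the textbook linear-system form of the fixed point, not the characterization proved in the paper.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def projected_bellman_solution(Phi, P, r, gamma, d):
    """Weights theta with Phi @ theta equal to the projected Bellman fixed point.

    Solves  Phi^T D (Phi - gamma * P Phi) theta = Phi^T D r,
    where D = diag(d) weights the states in the projection.
    """
    n, m = len(Phi), len(Phi[0])
    PPhi = [[sum(P[s][t] * Phi[t][j] for t in range(n)) for j in range(m)]
            for s in range(n)]
    A = [[sum(d[s] * Phi[s][i] * (Phi[s][j] - gamma * PPhi[s][j]) for s in range(n))
          for j in range(m)] for i in range(m)]
    b = [sum(d[s] * Phi[s][i] * r[s] for s in range(n)) for i in range(m)]
    return solve(A, b)

# Hypothetical 4-state chain with a reward only in the last state.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.0, 0.0, 0.5]]
r = [0.0, 0.0, 0.0, 1.0]
gamma = 0.9
d = [0.25] * 4

tabular = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
aggregated = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # states {0,1} and {2,3}

v_exact = projected_bellman_solution(tabular, P, r, gamma, d)
theta_agg = projected_bellman_solution(aggregated, P, r, gamma, d)
```

With one-hot (tabular) features the projection is the identity and `v_exact` is the true value function; with the aggregated features, `theta_agg` holds one value per group of states.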
#5172

Motion Invariance in Visual Environments
Alessandro Betti, Marco Gori, Stefano Melacci
Details | PDF

Learning Theory

The puzzle of computer vision might find new challenging solutions when we realize that most successful methods are working at image level, which is remarkably more difficult than processing directly visual streams, just as it happens in nature. In this paper, we claim that the processing of a stream of frames naturally leads to formulate the motion invariance principle, which enables the construction of a new theory of visual learning based on convolutional features. The theory addresses a number of intriguing questions that arise in natural vision, and offers a well-posed computational scheme for the discovery of convolutional filters over the retina. They are driven by the Euler-Lagrange differential equations derived from the principle of least cognitive action, that parallels the laws of mechanics. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario in which feature learning takes place by unsupervised processing of video signals. An experimental report of the theory is presented where we show that features extracted under motion invariance yield an improvement that can be assessed by measuring information-based indexes.

Tuesday 13 16:30 - 18:00 CV|LV - Language and Vision 1 (2403-2404)

Chair: Sheng Tang

#801

Talking Face Generation by Conditional Recurrent Adversarial Network
Yang Song, Jingwen Zhu, Dawei Li, Andy Wang, Hairong Qi
Details | PDF

Language and Vision 1

Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate the talking face video with accurate lip synchronization. Existing works either do not consider temporal dependency across video frames, thus yielding abrupt facial and lip movement, or are limited to the generation of talking face video for a specific person, thus lacking generalization capacity. We propose a novel conditional recurrent generation network that incorporates both image and audio features in the recurrent unit for temporal dependency. To achieve both image- and video-realism, a pair of spatial-temporal discriminators are included in the network for better image/video quality. Since accurate lip synchronization is essential to the success of talking face video generation, we also construct a lip-reading discriminator to boost the accuracy of lip synchronization. We also extend the network to model the natural pose and expression of talking face on the Obama Dataset. Extensive experimental results demonstrate the superiority of our framework over state-of-the-art methods in terms of visual quality, lip sync accuracy, and smooth transition pertaining to both lip and facial movement.
#2791

Dynamically Visual Disambiguation of Keyword-based Image Search
Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Limin Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao
Details | PDF

Language and Vision 1

Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits their performance is the problem of visual polysemy. To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation. Compared to existing methods, the primary advantage of our approach lies in that our approach can adapt to the dynamic changes in the search results. Our proposed framework consists of two major steps: we first discover and dynamically select the text queries according to the image search results, then we employ the proposed saliency-guided deep multi-instance learning network to remove outliers and learn classification models for visual disambiguation. Extensive experiments demonstrate the superiority of our proposed approach.
#3549

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei
Details | PDF

Language and Vision 1

Image paragraph generation is the task of producing a coherent story (usually a paragraph) that describes the visual content of an image. The problem nevertheless is not trivial especially when there are multiple descriptive and diverse gists to be considered for paragraph generation, which often happens in real images. A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure. In this paper, we present a new design, Convolutional Auto-Encoding (CAE), that purely employs a convolutional and deconvolutional auto-encoding framework for topic modeling on the region-level features of an image. Furthermore, we propose an architecture, namely CAE plus Long Short-Term Memory (dubbed as CAE-LSTM), that novelly integrates the learnt topics in support of paragraph generation. Technically, CAE-LSTM capitalizes on a two-level LSTM-based paragraph generation framework with attention mechanism. The paragraph-level LSTM captures the inter-sentence dependency in a paragraph, while the sentence-level LSTM generates one sentence conditioned on each learnt topic. Extensive experiments are conducted on the Stanford image paragraph dataset, and superior results are reported in comparison to state-of-the-art approaches. More remarkably, CAE-LSTM increases CIDEr performance from 20.93% to 25.15%.
#3674

Multi-Level Visual-Semantic Alignments with Relation-Wise Dual Attention Network for Image and Text Matching
Zhibin Hu, Yongsheng Luo, Jiong Lin, Yan Yan, Jian Chen
Details | PDF

Language and Vision 1

Image-text matching is central to visual-semantic cross-modal retrieval and has been attracting extensive attention recently. Previous studies have been devoted to finding the latent correspondence between image regions and words, e.g., connecting key words to specific regions of salient objects. However, existing methods usually focus on handling concrete objects rather than abstract ones, e.g., a description of some action, which in fact are also ubiquitous in real-world description texts. The main challenge in dealing with abstract objects is that there are no explicit connections between them, unlike their concrete counterparts. One therefore has to find the implicit and intrinsic connections between them instead. In this paper, we propose a relation-wise dual attention network (RDAN) for image-text matching. Specifically, we maintain an over-complete set that contains pairs of regions and words. Then, built upon this set, we encode the local correlations and the global dependencies between regions and words by training a visual-semantic network. Then a dual pathway attention network is presented to infer the visual-semantic alignments and image-text similarity. Extensive experiments validate the efficacy of our method, which achieves state-of-the-art performance on several public benchmark datasets.
#4765

Densely Connected Attention Flow for Visual Question Answering
Fei Liu, Jing Liu, Zhiwei Fang, Richang Hong, Hanqing Lu
Details | PDF

Language and Vision 1

Learning effective interactions between multi-modal features is at the heart of visual question answering (VQA). A common defect of the existing VQA approaches is that they only consider a very limited number of interactions, which may not be enough to model latent complex image-question relations that are necessary for accurately answering questions. Therefore, in this paper, we propose a novel DCAF (Densely Connected Attention Flow) framework for modeling dense interactions. It densely connects all pairwise layers of the network via Attention Connectors, capturing fine-grained interplay between image and question across all hierarchical levels. The proposed Attention Connector efficiently connects the multi-modal features at any two layers with symmetric co-attention, and produces interaction-aware attention features. Experimental results on three publicly available datasets show that the proposed method achieves state-of-the-art performance.
#2695

Video Interactive Captioning with Human Prompts
Aming Wu, Yahong Han, Yi Yang
Details | PDF

Language and Vision 1

Video captioning aims at generating a proper sentence to describe the video content. As a video often includes rich visual content and semantic details, different people may be interested in different views. Thus the generated sentence always fails to meet the ad hoc expectations. In this paper, we make a new attempt: we launch a round of interaction between a human and a captioning agent. After generating an initial caption, the agent asks for a short prompt from the human as a clue to their expectation. Then, based on the prompt, the agent can generate a more accurate caption. We name this process a new task of video interactive captioning (ViCap). Taking a video and an initial caption as input, we devise the ViCap agent which consists of a video encoder, an initial caption encoder, and a refined caption generator. We show that the ViCap agent can be trained with either full supervision (with ground-truth) or weak supervision (with only prompts). For the evaluation of ViCap, we first extend the MSRVTT with interaction ground-truth. Experimental results not only show the prompts can help generate more accurate captions, but also demonstrate the good performance of the proposed method.

Tuesday 13 16:30 - 18:00 ML|LGM - Learning Graphical Models (2405-2406)

Chair: Satoshi Oyama

#1256

Dynamic Hypergraph Neural Networks
Jianwen Jiang, Yuxuan Wei, Yifan Feng, Jingxuan Cao, Yue Gao
Details | PDF

Learning Graphical Models

In recent years, graph/hypergraph-based deep learning methods have attracted much attention from researchers. These deep learning methods take graph/hypergraph structure as prior knowledge in the model. However, hidden and important relations are not directly represented in the inherent structure. To tackle this issue, we propose a dynamic hypergraph neural networks framework (DHGNN), which is composed of the stacked layers of two modules: dynamic hypergraph construction (DHG) and hypergraph convolution (HGC). Considering that the initially constructed hypergraph is probably not a suitable representation for the data, the DHG module dynamically updates the hypergraph structure on each layer. Then hypergraph convolution is introduced to encode high-order data relations in a hypergraph structure. The HGC module includes two phases: vertex convolution and hyperedge convolution, which are designed to aggregate features among vertices and hyperedges, respectively. We have evaluated our method on standard datasets, the Cora citation network and Microblog dataset. Our method outperforms state-of-the-art methods. More experiments are conducted to demonstrate the effectiveness and robustness of our method to diverse data distributions.
#3145

Neural Network based Continuous Conditional Random Field for Fine-grained Crime Prediction
Fei Yi, Zhiwen Yu, Fuzhen Zhuang, Bin Guo
Details | PDF

Learning Graphical Models

Crime prediction has always been a crucial issue for public safety, and recent works have shown the effectiveness of taking spatial correlation, such as region similarity or interaction, into account for fine-grained crime modeling. In our work, we seek to reveal the relationship across regions for crime prediction using Continuous Conditional Random Field (CCRF). However, conventional CCRF becomes impractical when facing a dense graph that considers all relationships between regions. To deal with this, we propose a Neural Network based CCRF (NN-CCRF) model that formulates CCRF into an end-to-end neural network framework, which could reduce the complexity in model training and improve the overall performance. We integrate CCRF with NN by introducing a Long Short-Term Memory (LSTM) component to learn the non-linear mapping from inputs to outputs of each region, and a modified Stacked Denoising AutoEncoder (SDAE) component for modeling pairwise interactions between regions. Experiments conducted on two different real-world datasets demonstrate the superiority of our proposed model over the state-of-the-art methods.
#5018

Efficient Regularization Parameter Selection for Latent Variable Graphical Models via Bi-Level Optimization
Joachim Giesen, Frank Nussbaum, Christopher Schneider
Details | PDF

Learning Graphical Models

Latent variable graphical models are an extension of Gaussian graphical models that decompose the precision matrix into a sparse and a low-rank component. These models can be learned with theoretical guarantees from data via a semidefinite program. This program features two regularization terms, one for promoting sparsity and one for promoting a low rank. In practice, however, it is not straightforward to learn a good model since the model highly depends on the regularization parameters that control the relative weight of the loss function and the two regularization terms. Selecting good regularization parameters can be modeled as a bi-level optimization problem, where the upper level optimizes some form of generalization error and the lower level provides a description of the solution gamut. The solution gamut is the set of feasible solutions for all possible values of the regularization parameters. In practice, it is often not feasible to describe the solution gamut efficiently. Hence, algorithmic schemes for approximating solution gamuts have been devised. One such scheme is Benson's generic vector optimization algorithm that comes with approximation guarantees. So far Benson's algorithm has not been used in conjunction with semidefinite programs like the latent variable graphical Lasso. Here, we develop an adaptive variant of Benson's algorithm for the semidefinite case and show that it keeps the known approximation and run time guarantees. Furthermore, Benson's algorithm turns out to be practically more efficient for the latent variable graphical model than the existing solution gamut approximation scheme on a wide range of data sets.
#1125

Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers
Jingwen Ye, Xinchao Wang, Yixin Ji, Kairi Ou, Mingli Song
Details | PDF

Learning Graphical Models

Many well-trained Convolutional Neural Network (CNN) models have now been released online by developers for the sake of effortless reproduction. In this paper, we treat such pre-trained networks as teachers and explore how to learn a target student network for customized tasks, using multiple teachers that handle different tasks. We assume no human-labelled annotations are available, and each teacher model can be either a single- or multi-task network, where the former is a degenerate case of the latter. The student model, depending on the customized tasks, learns the related knowledge filtered from the multiple teachers, and eventually masters the complete or a subset of expertise from all teachers. To this end, we adopt a layer-wise training strategy, which entangles the student's network block to be learned with the corresponding teachers. As demonstrated on several benchmarks, the learned student network achieves very promising results, even outperforming the teachers on the customized tasks.
#3279

Large Scale Evolving Graphs with Burst Detection
Yifeng Zhao, Xiangwei Wang, Hongxia Yang, Le Song, Jie Tang
Details | PDF

Learning Graphical Models

Analyzing large-scale evolving graphs is crucial for understanding the dynamic and evolutionary nature of social networks. Most existing works focus on discovering repeated and consistent temporal patterns; however, such patterns cannot fully explain the complexity observed in dynamic networks. For example, in recommendation scenarios, users sometimes purchase products on a whim while window shopping. Thus, in this paper, we design and implement a novel framework called BurstGraph which can capture both recurrent and consistent patterns, and especially unexpected bursty network changes. The performance of the proposed algorithm is demonstrated on both a simulated dataset and a world-leading E-Commerce company dataset, showing that it is able to discriminate recurrent events from extremely bursty events in terms of action propensity.
#3194

Parametric Manifold Learning of Gaussian Mixture Models
Ziquan Liu, Lei Yu, Janet H. Hsiao, Antoni B. Chan
Details | PDF

Learning Graphical Models

The Gaussian Mixture Model (GMM) is among the most widely used parametric probability distributions for representing data. However, it is complicated to analyze the relationship among GMMs since they lie on a high-dimensional manifold. Previous works either perform clustering of GMMs, which learns a limited discrete latent representation, or kernel-based embedding of GMMs, which is not interpretable due to difficulty in computing the inverse mapping. In this paper, we propose Parametric Manifold Learning of GMMs (PML-GMM), which learns a parametric mapping from a low-dimensional latent space to a high-dimensional GMM manifold. Similar to PCA, the proposed mapping is parameterized by the principal axes for the component weights, means, and covariances, which are optimized to minimize the reconstruction loss measured using Kullback-Leibler divergence (KLD). As the KLD between two GMMs is intractable, we approximate the objective function by a variational upper bound, which is optimized by an EM-style algorithm. Moreover, we derive an efficient solver by alternating optimization of subproblems and exploit Monte Carlo sampling to escape from local minima. We demonstrate the effectiveness of PML-GMM through experiments on synthetic, eye-fixation, flow cytometry, and social check-in data.

Tuesday 13 17:30 - 18:00 Early Career 2 - Early Career Spotlight 2 (J)

Chair: Dengji Zhao

#11057

Integrating Learning with Game Theory for Societal Challenges
Fei Fang
Details | PDF

Early Career Spotlight 2

Real-world problems often involve more than one decision maker, each with their own goals or preferences. While game theory is an established paradigm for reasoning about strategic interactions between multiple decision makers, its applicability in practice is often limited by the intractability of computing equilibria in large games, and the fact that the game parameters are sometimes unknown and the players are often not perfectly rational. On the other hand, machine learning and reinforcement learning have led to huge successes in various domains and can be leveraged to overcome the limitations of the game-theoretic analysis. In this paper, we introduce our work on integrating learning with computational game theory for addressing societal challenges such as security and sustainability.

Wednesday 14 08:30 - 09:20 Invited Talk (D-I)

Chair: Christian Bessiere

Empirical Model Learning: merging knowledge-based and data-driven decision models through machine learning
Michela Milano

Invited Talk

Wednesday 14 09:30 - 09:35 Industry days (D-I)

Chair: Yu Zheng

Opening Remarks

Industry days

Wednesday 14 09:30 - 10:30 AI-HWB - ST: AI for Improving Human Well-Being 1 (J)

Chair: Maria Gini

#1213

Truly Batch Apprenticeship Learning with Deep Successor Features
Donghun Lee, Srivatsan Srinivasan, Finale Doshi-Velez
Details | PDF

ST: AI for Improving Human Well-Being 1

We introduce a novel apprenticeship learning algorithm to learn an expert's underlying reward structure in off-policy model-free batch settings. Unlike existing methods that require hand-crafted features, on-policy evaluation, further data acquisition for evaluation policies or the knowledge of model dynamics, our algorithm requires only batch data (demonstrations) of the observed expert behavior. Such settings are common in many real-world tasks (health care, finance, or industrial process control) where accurate simulators do not exist and additional data acquisition is costly. We develop a transition-regularized imitation learning model to learn a rich feature representation and a near-expert initial policy that makes the subsequent batch inverse reinforcement learning process viable. We also introduce deep successor feature networks that perform off-policy evaluation to estimate feature expectations of candidate policies. Under the batch setting, our method achieves superior results on control benchmarks as well as a real clinical task of sepsis management in the Intensive Care Unit.
#1239

Automatic Grassland Degradation Estimation Using Deep Learning
Xiyu Yan, Yong Jiang, Shuai Chen, Zihao He, Chunmei Li, Shu-Tao Xia, Tao Dai, Shuo Dong, Feng Zheng
Details | PDF

ST: AI for Improving Human Well-Being 1

Grassland degradation estimation is essential to prevent global land desertification and sandstorms. Typically, the key to such estimation is to measure the coverage of indicator plants. However, traditional methods of estimation rely heavily on human eyes and manual labor, thus inevitably leading to subjective results and high labor costs. In contrast, deep learning-based image segmentation algorithms are potentially capable of automatic assessment of the coverage of indicator plants. Nevertheless, a suitable image dataset comprising grassland images is not publicly available. To this end, we build an original Automatic Grassland Degradation Estimation Dataset (AGDE-Dataset), with a large number of grassland images captured from the wild. Based on AGDE-Dataset, we are able to propose a brand new scheme to automatically estimate grassland degradation, which mainly consists of two components. 1) Semantic segmentation: we design a deep neural network with an improved encoder-decoder structure to implement semantic segmentation of grassland images. In addition, we propose a novel Focal-Hinge Loss to alleviate the class imbalance of semantics in the training stage. 2) Degradation estimation: we provide the estimation of grassland degradation based on the results of semantic segmentation. Experimental results show that the proposed method achieves satisfactory accuracy in grassland degradation estimation.
#5979

DDL: Deep Dictionary Learning for Predictive Phenotyping
Tianfan Fu, Trong Nghia Hoang, Cao Xiao, Jimeng Sun
Details | PDF

ST: AI for Improving Human Well-Being 1

Predictive phenotyping is about accurately predicting what phenotypes will occur in the next clinical visit based on longitudinal Electronic Health Record (EHR) data. Several deep learning (DL) models have demonstrated great performance in predictive phenotyping. However, these DL-based phenotyping models require access to a large amount of labeled data, which are often expensive to acquire. To address this label-insufficient challenge, we propose a deep dictionary learning framework (DDL) for phenotyping, which utilizes unlabeled data as a complementary source of information to generate a better, more succinct data representation. With extensive experiments on multiple real-world EHR datasets, we demonstrate that DDL can outperform the state-of-the-art predictive phenotyping methods on a wide variety of clinical tasks that require patient phenotyping such as heart failure classification, mortality prediction, and sequential prediction. All empirical results consistently show that unlabeled data can indeed be used to generate better data representation, which helps improve DDL's phenotyping performance over existing baseline methods that only use labeled data.
#203

Bidirectional Active Learning with Gold-Instance-Based Human Training
Feilong Tang
Details | PDF

ST: AI for Improving Human Well-Being 1

Active learning was proposed to improve learning performance and reduce labeling cost. However, traditional relabeling-based schemes seriously limit the ability of active learning because humans may repeatedly make similar mistakes, without improving their expertise. In this paper, we propose a Bidirectional Active Learning with human Training (BALT) model that can enhance human related expertise during labeling and improve relabeling quality accordingly. We quantitatively capture how gold instances can be used to both estimate labelers' previous performance and improve their future correctness ratio. Then, we propose the backward relabeling scheme that actively selects the most likely incorrectly labeled instances for relabeling. Experimental results on three real datasets demonstrate that our BALT algorithm significantly outperforms representative related proposals.

Wednesday 14 09:30 - 10:30 ML|AML - Adversarial Machine Learning 1 (L)

Chair: Jen-Tzung Chien

#2984

Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization
Feihu Huang, Shangqian Gao, Songcan Chen, Heng Huang
Details | PDF

Adversarial Machine Learning 1

Alternating direction method of multipliers (ADMM) is a popular optimization tool for the composite and constrained problems in machine learning. However, in many machine learning problems such as black-box learning and bandit feedback, ADMM could fail because the explicit gradients of these problems are difficult or even infeasible to obtain. Zeroth-order (gradient-free) methods can effectively solve these problems because they require only objective function values in the optimization. Although a few zeroth-order ADMM methods have recently been proposed, they rely on the convexity of the objective function, which limits them in many applications. In this paper, we thus propose a class of fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth penalties, based on the coordinate smoothing gradient estimator. Moreover, we prove that both ZO-SVRG-ADMM and ZO-SAGA-ADMM have a convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In particular, our methods not only reach the best convergence rate of $O(1/T)$ for nonconvex optimization, but are also able to effectively solve many complex machine learning problems with multiple regularized penalties and constraints. Finally, we conduct experiments on black-box binary classification and structured adversarial attacks on black-box deep neural networks to validate the efficiency of our algorithms.
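To make the idea of a gradient-free estimator concrete, the following minimal sketch approximates a gradient coordinate by coordinate using only function evaluations (central finite differences). It illustrates the general flavor of a coordinate-wise zeroth-order estimator and is not the ZO-SVRG-ADMM or ZO-SAGA-ADMM algorithm; the names `zo_coordinate_gradient`, `quadratic`, and the smoothing parameter `mu` are assumptions made for this example.

```python
import numpy as np

def zo_coordinate_gradient(f, x, mu=1e-4):
    """Estimate the gradient of f at a 1-D point x from function values only.

    For each coordinate j, uses the central difference
    (f(x + mu * e_j) - f(x - mu * e_j)) / (2 * mu).
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = 1.0                       # j-th standard basis vector
        grad[j] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return grad

def quadratic(x):
    """Black-box objective for the demo; its true gradient is 2 * x."""
    return float(np.sum(x ** 2))

x0 = np.array([1.0, -2.0, 0.5])
g = zo_coordinate_gradient(quadratic, x0)
```

Each estimate costs two function evaluations per coordinate, which is the usual price of avoiding explicit gradients.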
#5611

On the Effectiveness of Low Frequency Perturbations
Yash Sharma, Gavin Weiguang Ding, Marcus A. Brubaker
Details | PDF

Adversarial Machine Learning 1

Carefully crafted, often imperceptible, adversarial perturbations have been shown to cause state-of-the-art models to yield extremely inaccurate outputs, rendering them unsuitable for safety-critical application domains. In addition, recent work has shown that constraining the attack space to a low frequency regime is particularly effective. Yet, it remains unclear whether this is due to generally constraining the attack search space or specifically removing high frequency components from consideration. By systematically controlling the frequency components of the perturbation, evaluating against the top-placing defense submissions in the NeurIPS 2017 competition, we empirically show that performance improvements in both the white-box and black-box transfer settings are yielded only when low frequency components are preserved. In fact, the defended models based on adversarial training are roughly as vulnerable to low frequency perturbations as undefended models, suggesting that the purported robustness of state-of-the-art ImageNet defenses is reliant upon adversarial perturbations being high frequency in nature. We do find that under L-inf-norm constraint 16/255, the competition distortion bound, low frequency perturbations are indeed perceptible. This questions the use of the L-inf-norm, in particular, as a distortion metric, and, in turn, suggests that explicitly considering the frequency space is promising for learning robust models which better align with human perception.
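One way to picture "constraining the attack space to a low frequency regime" is to project a perturbation onto its low-frequency components. The sketch below does this with a 2-D FFT mask purely for illustration; the abstract does not state which transform or mask shape the authors use, and the function name `low_pass` and the `cutoff` parameter are assumptions.

```python
import numpy as np

def low_pass(perturbation, cutoff=0.25):
    """Keep only frequency components with |f| <= cutoff (cycles per pixel)
    along both axes of a 2-D perturbation."""
    h, w = perturbation.shape
    fy = np.abs(np.fft.fftfreq(h))[:, None]   # vertical frequencies
    fx = np.abs(np.fft.fftfreq(w))[None, :]   # horizontal frequencies
    mask = (fy <= cutoff) & (fx <= cutoff)
    spectrum = np.fft.fft2(perturbation)
    # The mask is symmetric in frequency, so the inverse is real up to round-off.
    return np.fft.ifft2(spectrum * mask).real

# A constant offset is purely low frequency; a checkerboard is purely high frequency.
constant = np.full((8, 8), 0.1)
ii, jj = np.indices((8, 8))
checkerboard = np.where((ii + jj) % 2 == 0, 1.0, -1.0)
filtered_constant = low_pass(constant)
filtered_checkerboard = low_pass(checkerboard)
```

In this toy case the constant perturbation passes through unchanged while the checkerboard is removed entirely.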
#5955

Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models
Nupur Kumari, Mayank Singh, Abhishek Sinha, Harshitha Machiraju, Balaji Krishnamurthy, Vineeth N Balasubramanian
Details | PDF

Adversarial Machine Learning 1

Neural networks are vulnerable to adversarial attacks: small, visually imperceptible crafted noise which, when added to the input, drastically changes the output. The most effective method of defending against adversarial attacks is to use the methodology of adversarial training. We analyze the adversarially trained robust models to study their vulnerability against adversarial attacks at the level of the latent layers. Our analysis reveals that, contrary to the input layer, which is robust to adversarial attack, the latent layers of these robust models are highly susceptible to adversarial perturbations of small magnitude. Leveraging this information, we introduce a new technique, Latent Adversarial Training (LAT), which comprises fine-tuning the adversarially trained models to ensure robustness at the feature layers. We also propose Latent Attack (LA), a novel algorithm for constructing adversarial examples. LAT results in a minor improvement in test accuracy and leads to state-of-the-art adversarial accuracy against the universal first-order adversarial PGD attack, which is shown for the MNIST, CIFAR-10, CIFAR-100, SVHN and Restricted ImageNet datasets.
#10963

(Sister Conferences Best Papers Track) Adversarial Attacks on Neural Networks for Graph Data
Daniel Zügner, Amir Akbarnejad, Stephan Günnemann
Details | PDF

Adversarial Machine Learning 1

Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, currently there is no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this extended abstract we summarize the key findings and contributions of our work, in which we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which focus on the training phase of a machine learning model. We generate adversarial perturbations targeting the node's features and the graph structure, thus taking the dependencies between instances into account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain we propose an efficient algorithm, Nettack, exploiting incremental computations. Our experimental study shows that the accuracy of node classification drops significantly even when performing only a few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise are successful given only limited knowledge about the graph.

Wednesday 14 09:30 - 10:30 ML|RL - Reinforcement Learning 2 (2701-2702)

Chair: I-Chen Wu

#977

Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control
Kenny Young, Baoxiang Wang, Matthew E. Taylor
Details | PDF

Reinforcement Learning 2

Reinforcement learning (RL) has had many successes, but significant hyperparameter tuning is commonly required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this, most notably experience replay or the use of parallel actors. These techniques stabilize learning by making the RL problem more similar to the supervised setting. However, they come at the cost of moving away from the RL problem as it is typically formulated, that is, a single agent learning online without maintaining a large database of training examples. To address these issues, we propose Metatrace, a meta-gradient descent based algorithm to tune the step-size online. Metatrace leverages the structure of eligibility traces, and works for both tuning a scalar step-size and a respective step-size for each parameter. We empirically evaluate Metatrace for actor-critic on the Arcade Learning Environment. Results show Metatrace can speed up learning, and improve performance in non-stationary settings.
#5320

Successor Options: An Option Discovery Framework for Reinforcement Learning
Rahul Ramesh, Manan Tomar, Balaraman Ravindran
Details | PDF

Reinforcement Learning 2

The options framework in reinforcement learning models the notion of a skill or a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. In this work, we instead adopt a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. We propose Successor Options, which leverages Successor representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward and the model scales to high-dimensional spaces since it does not construct an explicit graph of the entire state space. Additionally, we propose an Incremental Successor Options model that iterates between constructing Successor representations and building options, which is useful when robust Successor representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds, and on the high-dimensional robotic control environment of Fetch.
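For readers unfamiliar with successor representations, the sketch below computes the standard tabular successor representation of a fixed policy, M = (I - gamma * P)^(-1), where P is the state-to-state transition matrix under that policy. It shows only this textbook quantity, not the Successor Options method; the three-state chain and the names `successor_representation`, `P`, and `gamma` are assumptions chosen for the example.

```python
import numpy as np

def successor_representation(P, gamma=0.9):
    """Tabular successor representation for transition matrix P.

    M[s, s'] is the expected discounted number of visits to s'
    when starting from s and following the policy that induces P.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

# Random walk on a 3-state chain; each row of P sums to 1.
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
M = successor_representation(P, gamma=0.9)
```

Because each row of P sums to one, each row of M sums to 1 / (1 - gamma), which is a quick sanity check on the computation.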
#5368

An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman
Details | PDF

Reinforcement Learning 2

Much human and computational effort has aimed to improve how deep reinforcement learning (DRL) algorithms perform on benchmarks such as the Atari Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of DRL algorithms. Sources of friction include the onerous computational requirements, and general logistical and architectural complications for running DRL algorithms at scale. We lessen this friction, by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous DRL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several DRL algorithms, highlighting interesting and previously unknown distinctions between them.
#5528

Unobserved Is Not Equal to Non-existent: Using Gaussian Processes to Infer Immediate Rewards Across Contexts
Hamoon Azizsoltani, Yeo Jin Kim, Markel Sanz Ausin, Tiffany Barnes, Min Chi
Details | PDF

Reinforcement Learning 2

Learning optimal policies in real-world domains with delayed rewards is a major challenge in Reinforcement Learning. We address the credit assignment problem by proposing a Gaussian Process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in 4 contexts where rewards can be delayed for long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, our results showed that on 7 out of 9 games, the proposed GP-inferred reward policy performed at least as well as the immediate reward policy and significantly outperformed the corresponding delayed reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts.

Wednesday 14 09:30 - 10:30 AMS|EPAMBS - Economic Paradigms, Auctions and Market-Based Systems (2703-2704)

Chair: Reyhan Aydogan

#1844

Dispatching Through Pricing: Modeling Ride-Sharing and Designing Dynamic Prices
Mengjing Chen, Weiran Shen, Pingzhong Tang, Song Zuo
Details | PDF

Economic Paradigms, Auctions and Market-Based Systems

Over the past few years, ride-sharing has emerged as an effective way to relieve traffic congestion. A key problem for the ride-sharing platforms is to come up with a revenue-optimal (or GMV-optimal) pricing scheme and a vehicle dispatching policy that incorporate geographic and temporal information. In this paper, we aim to tackle this problem via an economic approach. Modeled naively, the underlying optimization problem may be non-convex and thus hard to solve. To this end, we use a so-called "ironing" technique to convert the problem into an equivalent convex optimization one via a clean Markov decision process (MDP) formulation, where the states are the driver distributions and the decision variables are the prices for each pair of locations. Our main finding is an efficient algorithm that computes the exact revenue-optimal (or GMV-optimal) randomized pricing scheme, which naturally induces the accompanying vehicle dispatching policy. We also conduct empirical evaluations of our solution through real data of a major ride-sharing platform and show its advantages over fixed pricing schemes as well as several prevalent surge-based pricing schemes.
#5005

Strategic Signaling for Selling Information Goods
Shani Alkoby, David Sarne, Igal Milchtaich
Details | PDF

Economic Paradigms, Auctions and Market-Based Systems

This paper studies the benefit of using signaling by an information seller holding information that can completely disambiguate some uncertainty concerning the state of the world for the information buyer. We show that a necessary condition for having the information seller benefit from signaling in this model is having some "seed of truth" in the signaling scheme used. We then introduce two natural signaling mechanisms that adhere to this condition, one where the seller pre-commits to the signaling scheme to be used and the other where she commits to use a signaling scheme that contains a "seed of truth". Finally, we analyze the equilibrium resulting from each and show that, somewhat counter-intuitively, despite the inherent differences between the two mechanisms, they are equivalent in the sense that for any equilibrium associated with the maximum revenue in one there is an equilibrium offering the seller the same revenue in the other.
#5490

Explore Truthful Incentives for Tasks with Heterogenous Levels of Difficulty in the Sharing Economy
Pengzhan Zhou, Xin Wei, Cong Wang, Yuanyuan Yang
Details | PDF

Economic Paradigms, Auctions and Market-Based Systems

Incentives are explored in the sharing economy to inspire users for better resource allocation. Previous works build a budget-feasible incentive mechanism to learn users' cost distribution. However, they only consider the special case in which all tasks are treated as the same. The general problem asks for a solution when the cost varies across tasks. In this paper, we investigate this general problem by considering a system with k levels of difficulty. We present two incentivizing strategies for offline and online implementation, and formally derive the ratio of utility between them in different scenarios. We propose a regret-minimizing mechanism to decide incentives by dynamically adjusting budget assignment and learning from users' cost distributions. Our experiment demonstrates a utility improvement of about 7 times and a time saving of 54% in meeting a utility objective compared to the previous works.
#6397

On the Problem of Assigning PhD Grants
Katarína Cechlárová, Laurent Gourvès, Julien Lesca
Details | PDF

Economic Paradigms, Auctions and Market-Based Systems

In this paper, we study the problem of assigning PhD grants. Master students apply for PhD grants on different topics and the number of available grants is limited. In this problem, students have preferences over topics they applied to and the university has preferences over possible matchings of student/topic that satisfy the limited number of grants. The particularity of this framework is the uncertainty about a student's decision to accept or reject a topic offered to them. Without using probability to model uncertainty, we study the possibility of designing protocols of exchanges between the students and the university in order to construct a matching which is as close as possible to the optimal one, i.e., the best achievable matching without uncertainty.

Wednesday 14 09:30 - 10:30 AMS|ATM - Agent Theories and Models (2705-2706)

Chair: Martin Caminada

#63

The Interplay of Emotions and Norms in Multiagent Systems
Anup K. Kalia, Nirav Ajmeri, Kevin S. Chan, Jin-Hee Cho, Sibel Adalı, Munindar P. Singh
Details | PDF

Agent Theories and Models

We study how emotions influence norm outcomes in decision-making contexts. Following the literature, we provide baseline Dynamic Bayesian models to capture an agent's two perspectives on a directed norm. Unlike the literature, these models are holistic in that they incorporate not only norm outcomes and emotions but also trust and goals. We obtain data from an empirical study involving game play with respect to the above variables. We provide a step-wise process to discover two new Dynamic Bayesian models based on maximizing log-likelihood scores with respect to the data. We compare the new models with the baseline models to discover new insights into the relevant relationships. Our empirically supported models are thus holistic and characterize how emotions influence norm outcomes better than previous approaches.
#4198

Strategy Logic with Simple Goals: Tractable Reasoning about Strategies
Francesco Belardinelli, Wojciech Jamroga, Damian Kurpiewski, Vadim Malvone, Aniello Murano
Details | PDF

Agent Theories and Models

In this paper we introduce Strategy Logic with simple goals (SL[SG]), a fragment of Strategy Logic that strictly extends the well-known Alternating-time Temporal Logic ATL by introducing arbitrary quantification over the agents' strategies. Our motivation comes from game-theoretic applications, such as expressing Stackelberg equilibria in games, coercion in voting protocols, as well as module checking for simple goals. Most importantly, we prove that the model checking problem for SL[SG] is PTIME-complete, the same as ATL. Thus, the extra expressive power comes at no computational cost as far as verification is concerned.
#4774

Average-case Analysis of the Assignment Problem with Independent Preferences
Yansong Gao, Jie Zhang
Details | PDF

Agent Theories and Models

The fundamental assignment problem is in search of welfare maximization mechanisms to allocate items to agents when the private preferences over indivisible items are provided by self-interested agents. The mainstream mechanism Random Priority is asymptotically the best mechanism for this purpose, when comparing its welfare to the optimal social welfare using the canonical worst-case approximation ratio. Surprisingly, the efficiency loss indicated by the worst-case ratio does not have a constant bound \cite{FFZ:14}. Recently, \cite{DBLP:conf/mfcs/DengG017} showed that when the agents' preferences are drawn from a uniform distribution, its average-case approximation ratio is upper bounded by 3.718. They left open the question of whether a constant ratio holds for general scenarios. In this paper, we offer an affirmative answer to this question by showing that the ratio is bounded by $1/\mu$ when the preference values are independent and identically distributed random variables, where $\mu$ is the expectation of the value distribution. This upper bound improves the results in \cite{DBLP:conf/mfcs/DengG017} for the Uniform distribution as well. Moreover, under mild conditions, the ratio has a constant bound for any independent random values. En route to these results, we develop powerful tools showing that, for most valuation inputs, the efficiency loss is small.
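For readers unfamiliar with the mechanism named in this abstract, the following is a minimal sketch of Random Priority (also known as random serial dictatorship): agents are ordered uniformly at random and each in turn takes their favorite remaining item. The function names, the value-matrix layout, and the tie-breaking rule are illustrative assumptions; this is not the authors' code and it does not reproduce their analysis.

```python
import random


def random_priority(values, rng=None):
    """Run Random Priority on an n-by-n value matrix.

    values[i][j] is agent i's value for item j (layout is an assumption
    made for this sketch). Returns a dict mapping agent -> item.
    """
    rng = rng or random.Random()
    n = len(values)
    order = list(range(n))
    rng.shuffle(order)  # uniformly random priority order over agents
    remaining = set(range(n))
    assignment = {}
    for agent in order:
        # Pick the most valued remaining item; ties go to the lowest index.
        best = max(remaining, key=lambda j: (values[agent][j], -j))
        assignment[agent] = best
        remaining.remove(best)
    return assignment


def social_welfare(values, assignment):
    """Sum of the values agents obtain from their assigned items."""
    return sum(values[i][j] for i, j in assignment.items())


if __name__ == "__main__":
    vals = [[1.0, 0.0], [1.0, 0.5]]
    result = random_priority(vals, random.Random(0))
    print(result, social_welfare(vals, result))
```

In this two-agent instance the welfare is 1.5 when agent 0 picks first and 1.0 otherwise, which illustrates how the realized welfare depends on the random order.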
#4962

On Computational Tractability for Rational Verification
Julian Gutierrez, Muhammad Najib, Giuseppe Perelli, Michael Wooldridge
Details | PDF

Agent Theories and Models

Rational verification involves checking which temporal logic properties hold of a concurrent and multiagent system, under the assumption that agents in the system choose strategies in game theoretic equilibrium. Rational verification can be understood as a counterpart of model checking for multiagent systems, but while model checking can be done in polynomial time for some temporal logic specification languages such as CTL, and polynomial space with LTL specifications, rational verification is much more intractable: it is 2EXPTIME-complete with LTL specifications, even when using explicit-state system representations. In this paper we show that the complexity of rational verification can be greatly reduced by restricting specifications to GR(1), a fragment of LTL that can represent most response properties of reactive systems. We also provide improved complexity results for rational verification when considering players' goals given by mean-payoff utility functions -- arguably the most widely used quantitative objective for agents in concurrent and multiagent systems. In particular, we show that for a number of relevant settings, rational verification can be done in polynomial space or even in polynomial time.

Wednesday 14 09:30 - 10:30 ML|TDS - Time-series;Data Streams 1 (2601-2602)

Chair: Qianli Ma

#279

E²GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation
Yonghong Luo, Ying Zhang, Xiangrui Cai, Xiaojie Yuan
Details | PDF

Time-series;Data Streams 1

Missing values, which appear in most multivariate time series, prevent advanced analysis of multivariate time series data. Existing imputation approaches try to deal with missing values by deletion, statistical imputation, machine learning based imputation and generative imputation. However, these methods are either incapable of dealing with temporal information or multi-stage. This paper proposes an end-to-end generative model E²GAN to impute missing values in multivariate time series. With the help of the discriminative loss and the squared error loss, E²GAN can impute the incomplete time series by the nearest generated complete time series at one stage. Experiments on multiple real-world datasets show that our model outperforms the baselines on the imputation accuracy and achieves state-of-the-art classification/regression results on the downstream applications. Additionally, our method also gains better time efficiency than multi-stage methods on the training of neural networks.
#607

CLVSA: A Convolutional LSTM Based Variational Sequence-to-Sequence Model with Attention for Predicting Trends of Financial Markets
Jia Wang, Tong Sun, Benyuan Liu, Yu Cao, Hongwei Zhu
Details | PDF

Time-series;Data Streams 1

Financial markets are a complex dynamical system. The complexity comes from the interaction between a market and its participants; in other words, the integrated outcome of the activities of all participants determines the market's trend, while the market's trend affects the activities of participants. These interwoven interactions make financial markets keep evolving. Inspired by stochastic recurrent models that successfully capture variability observed in natural sequential data such as speech and video, we propose CLVSA, a hybrid model that consists of stochastic recurrent networks, the sequence-to-sequence architecture, the self- and inter-attention mechanism, and convolutional LSTM units to capture variationally underlying features in raw financial trading data. Our model outperforms basic models, such as convolutional neural network, vanilla LSTM network, and sequence-to-sequence model with attention, based on backtesting results of six futures from January 2010 to December 2017. Our experimental results show that, by introducing an approximate posterior, CLVSA takes advantage of an extra regularizer based on the Kullback-Leibler divergence to prevent itself from overfitting traps.
#4502

Confirmatory Bayesian Online Change Point Detection in the Covariance Structure of Gaussian Processes
Jiyeon Han, Kyowoon Lee, Anh Tong, Jaesik Choi
Details | PDF

Time-series;Data Streams 1

In the analysis of sequential data, the detection of abrupt changes is important in predicting future events. In this paper, we propose statistical hypothesis tests for detecting covariance structure changes in locally smooth time series modeled by Gaussian Processes (GPs). We provide theoretically justified thresholds for the tests, and use them to improve Bayesian Online Change Point Detection (BOCPD) by confirming statistically significant changes and non-changes. Our Confirmatory BOCPD (CBOCPD) algorithm finds multiple structural breaks in GPs even when hyperparameters are not tuned precisely. We also provide conditions under which CBOCPD achieves a lower prediction error than BOCPD. Experimental results on synthetic and real-world datasets show that our proposed algorithm outperforms existing methods for the prediction of nonstationarity in terms of both regression error and log-likelihood.
#5504

Linear Time Complexity Time Series Clustering with Symbolic Pattern Forest
Xiaosheng Li, Jessica Lin, Liang Zhao
Details | PDF

Time-series;Data Streams 1

With the increasing power of data storage and advances in data generation and collection technologies, large volumes of time series data become available and the content is changing rapidly. This requires the data mining methods to have low time complexity to handle the huge and fast-changing data. This paper presents a novel time series clustering algorithm that has linear time complexity. The proposed algorithm partitions the data by checking some randomly selected symbolic patterns in the time series. Theoretical analysis is provided to show that group structures in the data can be revealed from this process. We evaluate the proposed algorithm extensively on all 85 datasets from the well-known UCR time series archive, and compare with the state-of-the-art approaches with statistical analysis. The results show that the proposed method is faster, and achieves better accuracy compared with other rival methods.

Wednesday 14 09:30 - 10:30 KRR|ARTP - Automated Reasoning and Theorem Proving (2603-2604)

Chair: Roni Stern

#2513

An ASP Approach to Generate Minimal Countermodels in Intuitionistic Propositional Logic
Camillo Fiorentini
Details | PDF

Automated Reasoning and Theorem Proving

Intuitionistic Propositional Logic is complete w.r.t. Kripke semantics: if a formula is not intuitionistically valid, then there exists a finite Kripke model falsifying it. The problem of obtaining concise models has been scarcely investigated in the literature. We present a procedure to generate minimal models in the number of worlds relying on Answer Set Programming (ASP).
#3164

Approximating Integer Solution Counting via Space Quantification for Linear Constraints
Cunjing Ge, Feifei Ma, Xutong Ma, Fan Zhang, Pei Huang, Jian Zhang
Details | PDF

Automated Reasoning and Theorem Proving

Solution counting or solution space quantification (i.e., volume computation and volume estimation) for linear constraints (LCs) has found interesting applications in various fields. Experimental data shows that integer solution counting is usually more expensive than quantifying the volume of the solution space, while their output values are close. So it is helpful to approximate the number of integer solutions by the volume if the error is acceptable. In this paper, we present and prove a bound of such error for LCs. It is the first bound that can be used to approximate the integer solution counts. Based on this result, an approximate integer solution counting method for LCs is proposed. Experiments show that our approach is over 20x faster than the state-of-the-art integer solution counters. Moreover, such advantage increases with the problem scale.
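The idea of approximating an integer solution count by a volume can be illustrated on a toy instance. The sketch below counts the integer points of the triangle x >= 0, y >= 0, x + y <= n by brute force and compares the count with the triangle's area n²/2. The helper names and the choice of polytope are assumptions made for illustration; the error bound proved in the paper is not reproduced here.

```python
from itertools import product


def count_integer_points(A, b, lo, hi):
    """Brute-force count of integer x in [lo, hi]^d satisfying A x <= b."""
    dim = len(A[0])
    count = 0
    for x in product(range(lo, hi + 1), repeat=dim):
        if all(sum(a * xi for a, xi in zip(row, x)) <= bi
               for row, bi in zip(A, b)):
            count += 1
    return count


def triangle_constraints(n):
    """Encode x >= 0, y >= 0, x + y <= n in the form A x <= b."""
    A = [[-1, 0], [0, -1], [1, 1]]
    b = [0, 0, n]
    return A, b


def relative_gap(n):
    """Relative difference between the integer count and the area n^2 / 2."""
    A, b = triangle_constraints(n)
    count = count_integer_points(A, b, 0, n)
    volume = n * n / 2
    return count, volume, abs(count - volume) / count


if __name__ == "__main__":
    for n in (10, 100):
        print(n, relative_gap(n))
```

For this family the exact count is (n + 1)(n + 2) / 2, so the relative gap to the volume shrinks as n grows, while the brute-force enumeration cost grows with the size of the bounding box.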
#2076

Solving the Satisfiability Problem of Modal Logic S5 Guided by Graph Coloring
Pei Huang, Minghao Liu, Ping Wang, Wenhui Zhang, Feifei Ma, Jian Zhang
Details | PDF

Automated Reasoning and Theorem Proving

Modal logic S5 has found various applications in artificial intelligence. With the advances in modern SAT solvers, SAT-based approaches have shown great potential in solving the satisfiability problem of S5. The scale of the SAT encoding for S5 is strongly influenced by the upper bound on the number of possible worlds. In this paper, we present a novel SAT-based approach for the S5 satisfiability problem. We show a normal form for S5 formulas. Based on this normal form, a conflict graph can be derived whose chromatic number provides an upper bound on the number of possible worlds, and a lot of unnecessary search space can be eliminated in this process. A heuristic graph coloring algorithm is adopted to balance efficiency and optimality. The number of possible worlds can be significantly reduced for many practical instances. Extensive experiments demonstrate that our approach outperforms state-of-the-art S5-SAT solvers.
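Any proper coloring of a graph uses at least as many colors as its chromatic number, so a heuristic coloring yields an upper bound cheaply. The sketch below is a generic largest-degree-first greedy coloring, chosen here as an assumption; the abstract does not say which heuristic the authors use, and the construction of the conflict graph from an S5 formula is not shown.

```python
def greedy_coloring(adjacency):
    """Color vertices greedily in order of decreasing degree.

    adjacency maps each vertex to the set of its neighbors.
    Returns a dict vertex -> color index (0, 1, 2, ...).
    """
    order = sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)
    color = {}
    for v in order:
        # Colors already taken by neighbors that have been colored.
        used = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def num_colors(color):
    """Number of distinct colors, an upper bound on the chromatic number."""
    return len(set(color.values()))


if __name__ == "__main__":
    # A 5-cycle as a hypothetical conflict graph; its chromatic number is 3.
    cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
    coloring = greedy_coloring(cycle)
    print(coloring, num_colors(coloring))
```

The greedy result is not guaranteed to be optimal in general, which is the efficiency versus optimality trade-off the abstract alludes to.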
#888

Guarantees for Sound Abstractions for Generalized Planning
Blai Bonet, Raquel Fuentetaja, Yolanda E-Martín, Daniel Borrajo
Details | PDF

Automated Reasoning and Theorem Proving

Generalized planning is about finding plans that solve collections of planning instances, often infinite collections, rather than single instances. Recently it has been shown how to reduce the planning problem for generalized planning to the planning problem for a qualitative numerical problem; the latter being a reformulation that simultaneously captures all the instances in the collection. An important thread of research thus consists in finding such reformulations, or abstractions, automatically. A recent proposal learns the abstractions inductively from a finite and small sample of transitions from instances in the collection. However, as in all inductive processes, the learned abstraction is not guaranteed to be correct for the whole collection. In this work we address this limitation by performing an analysis of the abstraction with respect to the collection, and show how to obtain formal guarantees for generalization. These guarantees, in the form of first-order formulas, may be used to 1) define subcollections of instances on which the abstraction is guaranteed to be sound, 2) obtain necessary conditions for generalization under certain assumptions, and 3) do automated synthesis of complex invariants for planning problems. Our framework is general, it can be extended or combined with other approaches, and it has applications that go beyond generalized planning.

Wednesday 14 09:30 - 10:30 NLP|IE - Information Extraction 1 (2605-2606)

Chair: Shourya Roy

#1684

End-to-End Multi-Perspective Matching for Entity Resolution
Cheng Fu, Xianpei Han, Le Sun, Bo Chen, Wei Zhang, Suhui Wu, Hao Kong
Details | PDF

Information Extraction 1

Entity resolution (ER) aims to identify data records referring to the same real-world entity. Due to the heterogeneity of entity attributes and the diversity of similarity measures, one main challenge of ER is how to select appropriate similarity measures for different attributes. Previous ER methods usually employ heuristic similarity selection algorithms, which are highly specialized to specific ER problems and are hard to generalize to other situations. Furthermore, previous studies usually perform similarity learning and similarity selection independently, which often results in error propagation and is hard to optimize globally. To resolve the above problems, this paper proposes an end-to-end multi-perspective entity matching model, which can adaptively select optimal similarity measures for heterogeneous attributes by jointly learning and selecting similarity measures in an end-to-end way. Experiments on two real-world datasets show that our method significantly outperforms previous ER methods.
#3048

Improving Cross-Domain Performance for Relation Extraction via Dependency Prediction and Information Flow Control
Amir Pouran Ben Veyseh, Thien Nguyen, Dejing Dou
Details | PDF

Information Extraction 1

Relation Extraction (RE) is one of the fundamental tasks in Information Extraction and Natural Language Processing. Dependency trees have been shown to be a very useful source of information for this task. The current deep learning models for relation extraction have mainly exploited this dependency information by guiding their computation along the structures of the dependency trees. One potential problem with this approach is that it might prevent the models from capturing important context information beyond syntactic structures and cause poor cross-domain generalization. This paper introduces a novel method to use dependency trees in RE for deep learning models that jointly predicts dependency and semantics relations. We also propose a new mechanism to control the information flow in the model based on the input entity mentions. Our extensive experiments on benchmark datasets show that the proposed model outperforms the existing methods for RE significantly.
#3319

Neural Collective Entity Linking Based on Recurrent Random Walk Network Learning
Mengge Xue, Weiming Cai, Jinsong Su, Linfeng Song, Yubin Ge, Yubao Liu, Bin Wang
Details | PDF

Information Extraction 1

Benefiting from the excellent ability of neural networks on learning semantic representations, existing studies for entity linking (EL) have resorted to neural networks to exploit both the local mention-to-entity compatibility and the global interdependence between different EL decisions for target entity disambiguation. However, most neural collective EL methods depend entirely upon neural networks to automatically model the semantic dependencies between different EL decisions, and thus lack guidance from external knowledge. In this paper, we propose a novel end-to-end neural network with recurrent random-walk layers for collective EL, which introduces external knowledge to model the semantic interdependence between different EL decisions. Specifically, we first establish a model based on local context features, and then stack random-walk layers to reinforce the evidence for related EL decisions into high-probability decisions, where the semantic interdependence between candidate entities is mainly induced from an external knowledge base. Finally, a semantic regularizer that preserves the consistency of collective EL decisions is incorporated into the conventional objective function, so that the external knowledge base can be fully exploited in collective EL decisions. Experimental results and in-depth analysis on various datasets show that our model achieves better performance than other state-of-the-art models. Our code and data are released at https://github.com/DeepLearnXMU/RRWEL.
#5335

Coreference Aware Representation Learning for Neural Named Entity Recognition
Zeyu Dai, Hongliang Fei, Ping Li
Details | PDF

Information Extraction 1

Recent neural network models have achieved state-of-the-art performance on the task of named entity recognition (NER). However, previous neural network models typically treat the input sentences as a linear sequence of words but ignore rich structural information, such as the coreference relations among non-adjacent words, phrases or entities. In this paper, we propose a novel approach to learn coreference-aware word representations for the NER task at the document level. In particular, we enrich the well-known neural architecture "CNN-BiLSTM-CRF" with a coreference layer on top of the BiLSTM layer to incorporate coreferential relations. Furthermore, we introduce a coreference regularization to ensure that coreferential entities share similar representations and consistent predictions within the same coreference cluster. Our proposed model achieves new state-of-the-art performance on two NER benchmarks: CoNLL-2003 and OntoNotes v5.0. More importantly, we demonstrate that our framework does not rely on gold coreference knowledge, and can still work well even when the coreferential relations are generated by a third-party toolkit.

Wednesday 14 09:30 - 10:30 CV|VEAS - Video: Events, Activities and Surveillance (2501-2502)

Chair: Wenbing Huang

#1593

Supervised Set-to-Set Hashing in Visual Recognition
I-Hong Jhuo
Details | PDF

Video: Events, Activities and Surveillance

Visual data, such as an image or a sequence of video frames, is often naturally represented as a point set. In this paper, we consider the fundamental problem of finding a nearest set from a collection of sets, to a query set. This problem has obvious applications in large-scale visual retrieval and recognition, and also in applied fields beyond computer vision. One challenge stands out in solving the problem: set representation and measure of similarity. Particularly, the query set and the sets in the dataset collection can have varying cardinalities. The training collection is large enough such that linear scan is impractical. We propose a simple representation scheme that encodes both statistical and structural information of the sets. The derived representations are integrated in a kernel framework for flexible similarity measurement. For the query set process, we adopt a learning-to-hash pipeline that turns the kernel representations into hash bits based on simple learners, using multiple kernel learning. Experiments on two visual retrieval datasets show unambiguously that our set-to-set hashing framework outperforms prior methods that do not take the set-to-set search setting into account.
#4203

Variation Generalized Feature Learning via Intra-view Variation Adaptation
Jiawei Li, Mang Ye, Andy Jinhua Ma, Pong C Yuen
Details | PDF

Video: Events, Activities and Surveillance

This paper addresses the variation generalized feature learning problem in unsupervised video-based person re-identification (re-ID). With advanced tracking and detection algorithms, large-scale intra-view positive samples can be easily collected by assuming that the image frames within the tracking sequence belong to the same person. Existing methods either directly use the intra-view positives to model cross-view variations or simply minimize the intra-view variations to capture the invariant component with some discriminative information loss. In this paper, we propose a Variation Generalized Feature Learning (VGFL) method to learn adaptable feature representation with intra-view positives. The proposed method can learn a discriminative re-ID model without any manually annotated cross-view positive sample pairs. It could address the unseen testing variations with a novel variation generalized feature learning algorithm. In addition, an Adaptability-Discriminability (AD) fusion method is introduced to learn adaptable video-level features. Extensive experiments on different datasets demonstrate the effectiveness of the proposed method.
#4243

DBDNet: Learning Bi-directional Dynamics for Early Action Prediction
Guoliang Pang, Xionghui Wang, Jian-Fang Hu, Qing Zhang, Wei-Shi Zheng
Details | PDF

Video: Events, Activities and Surveillance

Predicting future actions from observed partial videos is very challenging as the missing future is uncertain and sometimes has multiple possibilities. To obtain a reliable future estimation, a novel encoder-decoder architecture is proposed for integrating the tasks of synthesizing future motions from observed videos and reconstructing observed motions from synthesized future motions in a unified framework, which can capture the bi-directional dynamics depicted in partial videos along the temporal (past-to-future) direction and reverse chronological (future-back-to-past) direction. We then employ a bi-directional long short-term memory (Bi-LSTM) architecture to exploit the learned bi-directional dynamics for predicting early actions. Our experiments on two benchmark action datasets show that learning bi-directional dynamics benefits the early action prediction and our system clearly outperforms the state-of-the-art methods.
#3056

Predicting dominance in multi-person videos
Chongyang Bai, Maksim Bolonkin, Srijan Kumar, Jure Leskovec, Judee Burgoon, Norah Dunbar, V. S. Subrahmanian
Details | PDF

Video: Events, Activities and Surveillance

We consider the problems of predicting (i) the most dominant person in a group of people, and (ii) the more dominant of a pair of people, from videos depicting group interactions. We introduce a novel family of variables called Dominance Rank. We combine features not previously used for dominance prediction (e.g., facial action units, emotions), with a novel ensemble-based approach to solve these two problems. We test our models against four competing algorithms in the literature on two datasets and show that our results improve past performance. We show 2.4% to 16.7% improvement in AUC compared to baselines on one dataset, and a gain of 0.6% to 8.8% in accuracy on the other. Ablation testing shows that Dominance Rank features play a key role.

Wednesday 14 09:30 - 10:30 R|MPP - Motion and Path Planning (2503-2504)

Chair: Masoumeh Mansouri

#1466

The Parameterized Complexity of Motion Planning for Snake-Like Robots
Siddharth Gupta, Guy Sa'ar, Meirav Zehavi
Details | PDF

Motion and Path Planning

We study a motion-planning problem inspired by the game Snake that models scenarios ranging from the transportation of linked wagons towed by a locomotor to the movement of a group of agents that travel in an "ant-like" fashion. Given a "snake-like" robot with initial and final positions in an environment modeled by a graph, our goal is to decide whether the robot can reach the final position from the initial position without intersecting itself. Already on grid graphs, this problem is PSPACE-complete [Biasi and Ophelders, 2018]. Nevertheless, we prove that even on general graphs, it is solvable in time k^{O(k)}|I|^{O(1)} where k is the size of the robot, and |I| is the input size. Towards this, we give a novel application of color-coding to sparsify the configuration graph of the problem. We also show that the problem is unlikely to have a polynomial kernel even on grid graphs, but it admits a treewidth-reduction procedure. To the best of our knowledge, the study of the parameterized complexity of motion problems has been largely neglected, thus our work is pioneering in this regard.
#10955

(Sister Conferences Best Papers Track) The Provable Virtue of Laziness in Motion Planning
Nika Haghtalab, Simon Mackenzie, Ariel D. Procaccia, Oren Salzman, Siddhartha Srinivasa
Details | PDF

Motion and Path Planning

The Lazy Shortest Path (LazySP) class consists of motion-planning algorithms that only evaluate edges along candidate shortest paths between the source and target. These algorithms were designed to minimize the number of edge evaluations in settings where edge evaluation dominates the running time of the algorithm such as manipulation in cluttered environments and planning for robots in surgical settings; but how close to optimal are LazySP algorithms in terms of this objective? Our main result is an analytical upper bound, in a probabilistic model, on the number of edge evaluations required by LazySP algorithms; a matching lower bound shows that these algorithms are asymptotically optimal in the worst case.
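The lazy evaluation loop described in this abstract can be sketched as follows: compute a shortest path while optimistically treating unevaluated edges as valid, evaluate one unevaluated edge on that path, and repeat until a path with only validated edges is found. The graph encoding, the `is_valid` callback, and the "first unevaluated edge" selection rule are assumptions for illustration, not the authors' implementation.

```python
import heapq


def shortest_path(graph, source, target, blocked):
    """Dijkstra that skips edges known to be invalid; returns a vertex list or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            if frozenset((u, v)) in blocked:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def lazy_sp(graph, source, target, is_valid):
    """Return (path, number of edge evaluations); path is None if none exists."""
    blocked, known_valid = set(), set()
    evaluations = 0
    while True:
        path = shortest_path(graph, source, target, blocked)
        if path is None:
            return None, evaluations
        edges = [frozenset(e) for e in zip(path, path[1:])]
        pending = [e for e in edges if e not in known_valid]
        if not pending:
            return path, evaluations  # every edge on the path is validated
        edge = pending[0]  # evaluate only the first unevaluated edge
        evaluations += 1
        (known_valid if is_valid(edge) else blocked).add(edge)


if __name__ == "__main__":
    # Hypothetical undirected graph with six edges; edge a-t turns out invalid.
    g = {
        "s": {"a": 1, "b": 2, "c": 5},
        "a": {"s": 1, "t": 1},
        "b": {"s": 2, "t": 2},
        "c": {"s": 5, "t": 5},
        "t": {"a": 1, "b": 2, "c": 5},
    }
    print(lazy_sp(g, "s", "t", lambda e: e != frozenset(("a", "t"))))
```

In this example the two edges through vertex c are never evaluated, which is the saving that lazy evaluation aims for when edge checks are expensive.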
#656

Energy-Efficient Slithering Gait Exploration for a Snake-Like Robot Based on Reinforcement Learning
Zhenshan Bing, Christian Lemke, Zhuangyi Jiang, Kai Huang, Alois Knoll
Details | PDF

Motion and Path Planning

Similar to their counterparts in nature, the flexible bodies of snake-like robots enhance their movement capability and adaptability in diverse environments. However, this flexibility corresponds to a complex control task involving highly redundant degrees of freedom, where traditional model-based methods usually fail to propel the robots energy-efficiently. In this work, we present a novel approach for designing an energy-efficient slithering gait for a snake-like robot using a model-free reinforcement learning (RL) algorithm. Specifically, we present an RL-based controller for generating locomotion gaits at a wide range of velocities, which is trained using the proximal policy optimization (PPO) algorithm. Meanwhile, a traditional parameterized gait controller is presented and the parameter sets are optimized using the grid search and Bayesian optimization algorithms for the purposes of reasonable comparisons. Based on the analysis of the simulation results, we demonstrate that this RL-based controller exhibits very natural and adaptive movements, which are also substantially more energy-efficient than the gaits generated by the parameterized controller. Videos are shown at https://videoviewsite.wixsite.com/rlsnake .
#10966

(Sister Conferences Best Papers Track) Differentiable Physics and Stable Modes for Tool-Use and Manipulation Planning - Extended Abstract
Marc Toussaint, Kelsey R. Allen, Kevin A. Smith, Joshua B. Tenenbaum
Details | PDF

Motion and Path Planning

We propose to formulate physical reasoning and manipulation planning as an optimization problem that integrates first order logic, which we call Logic-Geometric Programming.

Wednesday 14 09:30 - 10:30 ML|DM - Data Mining 4 (2505-2506)

Chair: Decebal Constantin Mocanu

#938

A Degeneracy Framework for Scalable Graph Autoencoders
Guillaume Salha, Romain Hennequin, Viet Anh Tran, Michalis Vazirgiannis
Details | PDF

Data Mining 4

In this paper, we present a general framework to scale graph autoencoders (AE) and graph variational autoencoders (VAE). This framework leverages graph degeneracy concepts to train models only from a dense subset of nodes instead of using the entire graph. Together with a simple yet effective propagation mechanism, our approach significantly improves scalability and training speed while preserving performance. We evaluate and discuss our method on several variants of existing graph AE and VAE, providing the first application of these models to large graphs with up to millions of nodes and edges. We achieve empirically competitive results w.r.t. several popular scalable node embedding methods, which emphasizes the relevance of pursuing further research towards more scalable graph AE and VAE.
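The graph degeneracy concepts mentioned in this abstract centre on the k-core: the largest subgraph in which every node has at least k neighbors inside the subgraph. The sketch below computes core numbers by repeatedly peeling a minimum-degree node and then extracts a k-core as the dense node subset. It is a simple quadratic-time illustration with assumed function names; how the authors train on the core and propagate embeddings to the remaining nodes is not shown.

```python
def core_numbers(adjacency):
    """Core number of each node, computed by min-degree peeling.

    adjacency maps each node to the set of its neighbors (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)  # node of minimum remaining degree
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def k_core(adjacency, k):
    """Nodes whose core number is at least k."""
    return {v for v, c in core_numbers(adjacency).items() if c >= k}


def degeneracy(adjacency):
    """Largest k for which the k-core is non-empty."""
    return max(core_numbers(adjacency).values(), default=0)


if __name__ == "__main__":
    # Triangle a-b-c with a pendant node d attached to a.
    g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    print(core_numbers(g), k_core(g, 2), degeneracy(g))
```

Here the 2-core keeps only the triangle, so a model restricted to it would see three of the four nodes.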
#2479

Community Detection and Link Prediction via Cluster-driven Low-rank Matrix Completion
Junming Shao, Zhong Zhang, Zhongjing Yu, Jun Wang, Yi Zhao, Qinli Yang
Details | PDF

Data Mining 4

Community detection and link prediction are highly dependent since knowing the cluster structure a priori will help identify missing links, and in return, clustering on networks with supplemented missing links will improve community detection performance. In this paper, we propose Cluster-driven Low-rank Matrix Completion (CLMC) for performing community detection and link prediction simultaneously in a unified framework. To this end, CLMC decomposes the adjacency matrix of a target network into three additive matrices: a clustering matrix, a noise matrix and a supplement matrix. The community-structure and low-rank constraints are imposed on the clustering matrix, such that the noisy edges between communities are removed and the resulting matrix is an ideal block-diagonal matrix. Missing edges are further learned via low-rank matrix completion. Extensive experiments show that CLMC achieves state-of-the-art performance.
#4127

Graph Convolutional Networks on User Mobility Heterogeneous Graphs for Social Relationship Inference
Yongji Wu, Defu Lian, Shuowei Jin, Enhong Chen
Details | PDF

Data Mining 4

Inferring social relations from user trajectory data is of great value in real-world applications such as friend recommendation and ride-sharing. Most existing methods predict relationships based on a pairwise approach using some hand-crafted features or rely on a simple skip-gram based model to learn embeddings on graphs. Using hand-crafted features often fails to capture the complex dynamics in human social relations, while the graph embedding based methods only use random walks to propagate information and cannot incorporate external semantic data provided. We propose a novel model that utilizes Graph Convolutional Networks (GCNs) to learn user embeddings on the User Mobility Heterogeneous Graph in an unsupervised manner. This model is capable of propagating relations layer-wise as well as combining both the rich structural information in the heterogeneous graph and predictive node features provided. Our method can also be extended to a semi-supervised setting if a part of the social network is available. The evaluation on three real-world datasets demonstrates that our method outperforms the state-of-the-art approaches.
#10973

(Sister Conferences Best Papers Track) Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms
Panagiotis Mandros, Mario Boley, Jilles Vreeken
Details | PDF

Data Mining 4

The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, justifying worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of the time needed for complete branch-and-bound style search.

Wednesday 14 09:30 - 10:30 ML|TAML - Transfer, Adaptation, Multi-task Learning 1 (2401-2402)

Chair: Michael Perrot

#2102

Weak Supervision Enhanced Generative Network for Question Generation
Yutong Wang, Jiyuan Zheng, Qijiong Liu, Zhou Zhao, Jun Xiao, Yueting Zhuang
Details | PDF

Transfer, Adaptation, Multi-task Learning 1

Automatic question generation according to an answer within the given passage is useful for many applications, such as question answering systems and dialogue systems. Current neural-based methods mostly take two steps: they extract several important sentences based on the candidate answer through manual rules or supervised neural networks, and then use an encoder-decoder framework to generate questions about these sentences. These approaches still require two steps and neglect the semantic relations between the answer and the context of the whole passage, which is sometimes necessary for answering the question. To address this problem, we propose the Weak Supervision Enhanced Generative Network (WeGen), which automatically discovers relevant features of the passage given the answer span in a weakly supervised manner to improve the quality of generated questions. More specifically, we devise a discriminator, Relation Guider, to capture the relations between the passage and the associated answer, and then the Multi-Interaction mechanism is deployed to transfer the knowledge dynamically for our question generation system. Experiments show the effectiveness of our method in both automatic evaluations and human evaluations.
#2519

Fast and Robust Multi-View Multi-Task Learning via Group Sparsity
Lu Sun, Canh Hao Nguyen, Hiroshi Mamitsuka
Details | PDF

Transfer, Adaptation, Multi-task Learning 1

Multi-view multi-task learning has recently attracted more and more attention due to its dual-heterogeneity, i.e., each task has heterogeneous features from multiple views, and probably correlates with other tasks via common views. Existing methods usually suffer from three problems: 1) lack the ability to eliminate noisy features, 2) hold a strict assumption on view consistency and 3) ignore the possible existence of task-view outliers. To overcome these limitations, we propose a robust method with joint group-sparsity by decomposing feature parameters into a sum of two components, in which one saves relevant features (for Problem 1) and flexible view consistency (for Problem 2), while the other detects task-view outliers (for Problem 3). With a global convergence property, we develop a fast algorithm to solve the optimization problem in a linear time complexity w.r.t. the number of features and labeled samples. Extensive experiments on various synthetic and real-world datasets demonstrate its effectiveness.
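One common way to obtain group sparsity is an L2,1 penalty on the rows of a parameter matrix, whose proximal step shrinks each row and zeroes out rows with small norm. The sketch below illustrates only that generic technique; the abstract does not state the paper's exact regularizer or decomposition, and the names `l21_norm` and `prox_l21` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def l21_norm(W: Matrix) -> float:
    """Sum of the Euclidean norms of the rows (one row per feature group)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)


def prox_l21(W: Matrix, lam: float) -> Matrix:
    """Row-wise soft-thresholding: the proximal operator of lam * ||W||_{2,1}.

    Rows whose norm is at most lam are set to zero, which is what removes
    an entire feature group at once.
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out


# Illustrative values (assumed, not from the paper).
W = [[3.0, 4.0], [0.3, 0.4]]
shrunk = prox_l21(W, lam=1.0)
```

In this toy input the first row (norm 5) is shrunk but kept, while the second row (norm 0.5) is eliminated entirely.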
#5280

A Principled Approach for Learning Task Similarity in Multitask Learning
Changjian Shui, Mahdieh Abbasi, Louis-Émile Robitaille, Boyu Wang, Christian Gagné
Details | PDF

Transfer, Adaptation, Multi-task Learning 1

Multitask learning aims at solving a set of related tasks simultaneously, by exploiting the shared knowledge for improving the performance on individual tasks. Hence, an important aspect of multitask learning is to understand the similarities within a set of tasks. Previous works have incorporated this similarity information explicitly (e.g., weighted loss for each task) or implicitly (e.g., adversarial loss for feature adaptation), for achieving good empirical performances. However, the theoretical motivations for adding task similarity knowledge are often missing or incomplete. In this paper, we give a different perspective from a theoretical point of view to understand this practice. We first provide an upper bound on the generalization error of multitask learning, showing the benefit of explicit and implicit task similarity knowledge. We systematically derive the bounds based on two distinct task similarity metrics: H divergence and Wasserstein distance. From these theoretical results, we revisit the Adversarial Multi-task Neural Network, proposing a new training algorithm to learn the task relation coefficients and neural network parameters iteratively. We assess our new algorithm empirically on several benchmarks, showing not only that we find interesting and robust task relations, but that the proposed approach outperforms the baselines, reaffirming the benefits of theoretical insight in algorithm design.
#843

Node Embedding over Temporal Graphs
Uriel Singer, Ido Guy, Kira Radinsky
Details | PDF

Transfer, Adaptation, Multi-task Learning 1

In this work, we present a method for node embedding in temporal graphs. We propose an algorithm that learns the evolution of a temporal graph's nodes and edges over time and incorporates this dynamics in a temporal node embedding framework for different graph prediction tasks. We present a joint loss function that creates a temporal embedding of a node by learning to combine its historical temporal embeddings, such that it optimizes per given task (e.g., link prediction). The algorithm is initialized using static node embeddings, which are then aligned over the representations of a node at different time points, and eventually adapted for the given task in a joint optimization. We evaluate the effectiveness of our approach over a variety of temporal graphs for the two fundamental tasks of temporal link prediction and multi-label node classification, comparing to competitive baselines and algorithmic alternatives. Our algorithm shows performance improvements across many of the datasets and baselines and is found particularly effective for graphs that are less cohesive, with a lower clustering coefficient.

Wednesday 14 09:30 - 10:30 AMS|RA - Resource Allocation (2403-2404)

Chair: Iannis Caragiannis

#119

Almost Envy-Freeness in Group Resource Allocation
Maria Kyropoulou, Warut Suksompong, Alexandros A. Voudouris
Details | PDF

Resource Allocation

We study the problem of fairly allocating indivisible goods between groups of agents using the recently introduced relaxations of envy-freeness. We consider the existence of fair allocations under different assumptions on the valuations of the agents. In particular, our results cover cases of arbitrary monotonic, responsive, and additive valuations, while for the case of binary valuations we fully characterize the cardinalities of two groups of agents for which a fair allocation can be guaranteed with respect to both envy-freeness up to one good (EF1) and envy-freeness up to any good (EFX). Moreover, we introduce a new model where the agents are not partitioned into groups in advance, but instead the partition can be chosen in conjunction with the allocation of the goods. In this model, we show that for agents with arbitrary monotonic valuations, there is always a partition of the agents into two groups of any given sizes along with an EF1 allocation of the goods. We also provide an extension of this result to any number of groups.
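As a rough illustration of the EF1 notion mentioned above, the following sketch checks EF1 in the simpler setting of individual agents with additive, non-negative valuations. The group setting studied in the paper is more general; the function name `is_ef1` and the data layout are assumptions made for this example.

```python
from typing import Dict, List


def is_ef1(valuations: List[Dict[str, float]], bundles: List[List[str]]) -> bool:
    """Check envy-freeness up to one good for additive, non-negative valuations.

    valuations[i][g] is agent i's value for good g; bundles[i] lists the
    goods allocated to agent i.
    """
    n = len(valuations)
    for i in range(n):
        own = sum(valuations[i][g] for g in bundles[i])
        for j in range(n):
            if i == j:
                continue
            other_vals = [valuations[i][g] for g in bundles[j]]
            # Removing the good that agent i values most from j's bundle
            # is the removal that reduces i's envy the most.
            best_removal = max(other_vals, default=0.0)
            if own < sum(other_vals) - best_removal:
                return False
    return True


# Illustrative values (assumed).
vals = [{"a": 3, "b": 1, "c": 1}, {"a": 1, "b": 2, "c": 2}]
balanced = is_ef1(vals, [["a"], ["b", "c"]])
lopsided = is_ef1(vals, [[], ["a", "b", "c"]])
```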
#438

The Price of Fairness for Indivisible Goods
Xiaohui Bei, Xinhang Lu, Pasin Manurangsi, Warut Suksompong
Details | PDF

Resource Allocation

We investigate the efficiency of fair allocations of indivisible goods using the well-studied price of fairness concept. Previous work has focused on classical fairness notions such as envy-freeness, proportionality, and equitability. However, these notions cannot always be satisfied for indivisible goods, leading to certain instances being ignored in the analysis. In this paper, we focus instead on notions with guaranteed existence, including envy-freeness up to one good (EF1), balancedness, maximum Nash welfare (MNW), and leximin. We mostly provide tight or asymptotically tight bounds on the worst-case efficiency loss for allocations satisfying these notions.
#5104

Reallocating Multiple Facilities on the Line
Dimitris Fotakis, Loukas Kavouras, Panagiotis Kostopanagiotis, Philip Lazos, Stratis Skoulakis, Nikos Zarifis
Details | PDF

Resource Allocation

We study the multistage K-facility reallocation problem on the real line, where we maintain K facility locations over T stages, based on the stage-dependent locations of n agents. Each agent is connected to the nearest facility at each stage, and the facilities may move from one stage to another, to accommodate different agent locations. The objective is to minimize the connection cost of the agents plus the total moving cost of the facilities, over all stages. The K-facility reallocation problem was introduced by B.D. Kaijzer and D. Wojtczak (IJCAI 2018), who mostly focused on the special case of a single facility. Using an LP-based approach, we present a polynomial time algorithm that computes the optimal solution for any number of facilities. We also consider online K-facility reallocation, where the algorithm becomes aware of agent locations in a stage-by-stage fashion. By exploiting an interesting connection to the classical K-server problem, we present a constant-competitive algorithm for K = 2 facilities.
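The objective described above can be evaluated for a given solution with a few lines of code. This is only a sketch of the cost function, not of the LP-based algorithm; it assumes that facilities keep a fixed index across stages, that moving cost is the distance travelled between consecutive stages, and that no cost is charged for the initial placement. The name `total_cost` is hypothetical.

```python
from typing import List


def total_cost(facilities: List[List[float]], agents: List[List[float]]) -> float:
    """Connection cost plus moving cost over all stages.

    facilities[t][k] is the position of facility k at stage t;
    agents[t][i] is the position of agent i at stage t.
    """
    cost = 0.0
    for t, (fac, ag) in enumerate(zip(facilities, agents)):
        # Each agent connects to its nearest facility at this stage.
        cost += sum(min(abs(a - f) for f in fac) for a in ag)
        if t > 0:
            # Facility k moves from its previous position to its new one.
            cost += sum(abs(f - p) for f, p in zip(fac, facilities[t - 1]))
    return cost


# Illustrative two-stage instance with K = 2 facilities (assumed values).
example_cost = total_cost(
    facilities=[[0.0, 10.0], [1.0, 10.0]],
    agents=[[0.0, 9.0], [2.0, 10.0]],
)
```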
#1823

Equitable Allocations of Indivisible Goods
Rupert Freeman, Sujoy Sikdar, Rohit Vaish, Lirong Xia
Details | PDF

Resource Allocation

In fair division, equitability dictates that each participant receives the same level of utility. In this work, we study equitable allocations of indivisible goods among agents with additive valuations. While prior work has studied (approximate) equitability in isolation, we consider equitability in conjunction with other well-studied notions of fairness and economic efficiency. We show that the Leximin algorithm produces an allocation that satisfies equitability up to any good and Pareto optimality. We also give a novel algorithm that guarantees Pareto optimality and equitability up to one good in pseudopolynomial time. Our experiments on real-world preference data reveal that approximate envy-freeness, approximate equitability, and Pareto optimality can often be achieved simultaneously.

Wednesday 14 09:30 - 10:30 HAI|CM - Cognitive Modeling (2405-2406)

Chair: Guibing Guo

#3037

A Semantics-based Model for Predicting Children's Vocabulary
Ishaan Grover, Hae Won Park, Cynthia Breazeal
Details | PDF

Cognitive Modeling

Intelligent tutoring systems (ITS) provide educational benefits through one-on-one tutoring by assessing children's existing knowledge and providing tailored educational content. In the domain of language acquisition, several studies have shown that children often learn new words by forming semantic relationships with words they already know. In this paper, we present a model that uses word semantics (semantics-based model) to make inferences about a child's vocabulary from partial information about their existing vocabulary knowledge. We show that the proposed semantics-based model outperforms models that do not use word semantics (semantics-free models) on average. A subject-level analysis of results reveals that different models perform well for different children, thus motivating the need to combine predictions. To this end, we use two methods to combine predictions from semantics-based and semantics-free models and show that these methods yield better predictions of a child's vocabulary knowledge. Our results motivate the use of semantics-based models to assess children's vocabulary knowledge and build ITS that maximizes children's semantic understanding of words.
#3647

Fast and Accurate Classification with a Multi-Spike Learning Algorithm for Spiking Neurons
Rong Xiao, Qiang Yu, Rui Yan, Huajin Tang
Details | PDF

Cognitive Modeling

The formulation of efficient supervised learning algorithms for spiking neurons is complicated and remains challenging. Most existing learning methods based on the precise firing times of spikes often result in relatively low efficiency and poor robustness to noise. To address these limitations, we propose a simple and effective multi-spike learning rule to train neurons to match their output spike number with a desired one. The proposed method quickly finds a local maximum value (directly related to the embedded feature) as the relevant signal for synaptic updates based on the membrane potential trace of a neuron, and constructs an error function defined as the difference between the local maximum membrane potential and the firing threshold. With the presented rule, a single neuron can be trained to learn multi-category tasks, and can successfully mitigate the impact of the input noise and discover embedded features. Experimental results show the proposed algorithm has higher precision, lower computation cost, and better noise robustness than current state-of-the-art learning methods under a wide range of learning tasks.
#6329

STCA: Spatio-Temporal Credit Assignment with Delayed Feedback in Deep Spiking Neural Networks
Pengjie Gu, Rong Xiao, Gang Pan, Huajin Tang
Details | PDF

Cognitive Modeling

The temporal credit assignment problem, which aims to discover the predictive features hidden in distracting background streams with delayed feedback, remains a core challenge in biological and machine learning. To address this issue, we propose a novel spatio-temporal credit assignment algorithm called STCA for training deep spiking neural networks (DSNNs). We present a new spatiotemporal error backpropagation policy by defining a temporal based loss function, which is able to credit the network losses to spatial and temporal domains simultaneously. Experimental results on MNIST dataset and a music dataset (MedleyDB) demonstrate that STCA can achieve comparable performance with other state-of-the-art algorithms with simpler architectures. Furthermore, STCA successfully discovers predictive sensory features and shows the highest performance in the unsegmented sensory event detection tasks.
#10962

(Sister Conferences Best Papers Track) Trust Dynamics and Transfer across Human-Robot Interaction Tasks: Bayesian and Neural Computational Models
Harold Soh, Shu Pan, Min Chen, David Hsu
Details | PDF

Cognitive Modeling

This work contributes both experimental findings and novel computational human-robot trust models for multi-task settings. We describe Bayesian non-parametric and neural models, and compare their performance on data collected from a real-world human-subjects study. Our study spans two distinct task domains: household tasks performed by a Fetch robot, and a virtual reality driving simulation of an autonomous vehicle performing a variety of maneuvers. We find that human trust changes and transfers across tasks in a structured manner based on perceived task characteristics. Our results suggest that task-dependent functional trust models capture human trust in robot capabilities more accurately, and trust transfer across tasks can be inferred to a good degree. We believe these models are key for enabling trust-based robot decision-making for natural human-robot interaction.

Wednesday 14 09:30 - 10:30 DemoT2 - Demo Talks 2 (2306)

Chair: Andrew Perrault

#11026

An Online Intelligent Visual Interaction System
Anxiang Zeng, Han Yu, Xin Gao, Kairi Ou, Zhenchuan Huang, Peng Hou, Mingli Song, Jingshu Zhang, Chunyan Miao
Details | PDF

Demo Talks 2

This paper proposes an Online Intelligent Visual Interactive System (OIVIS), which can be applied to various live video broadcast and short video scenes to provide an interactive user experience. In a live video broadcast, the anchor can issue various commands by using pre-defined gestures, and can trigger real-time background replacement to create an immersive atmosphere. To support such dynamic interactivity, we implemented algorithms including real-time gesture recognition and real-time video portrait segmentation, and developed a deep network inference framework and a real-time rendering framework, AI Gender, at the front end to create a complete set of visual interaction solutions for use on resource-constrained mobile devices.
#11030

ERICA and WikiTalk
Divesh Lala, Graham Wilcock, Kristiina Jokinen, Tatsuya Kawahara
Details | PDF

Demo Talks 2

The demo shows ERICA, a highly realistic female android robot, and WikiTalk, an application that helps robots to talk about thousands of topics using information from Wikipedia. The combination of ERICA and WikiTalk results in more natural and engaging human-robot conversations.
#11038

Hintikka's World: Scalable Higher-order Knowledge
Tristan Charrier, Sébastien Gamblin, Alexandre Niveau, François Schwarzentruber
Details | PDF

Demo Talks 2

Hintikka's World is a graphical and pedagogical tool that shows how artificial agents can reason about higher-order knowledge. In this demonstration paper, we present the implementation of symbolic models in Hintikka's World. They enable the tool to scale, by helping it to face the state explosion, which makes it possible to provide examples featuring real card games, such as Hanabi.
#11044

DISPUTool -- A tool for the Argumentative Analysis of Political Debates
Shohreh Haddadan, Elena Cabrio, Serena Villata
Details | PDF

Demo Talks 2

Political debates are the means used by political candidates to put forward and justify their positions in front of the electors with respect to the issues at stake. Argument mining is a novel research area in Artificial Intelligence, aiming at analyzing discourse on the pragmatics level and applying a certain argumentation theory to model and automatically analyze textual data. In this paper, we present DISPUTool, a tool designed to ease the work of historians and social science scholars in analyzing the argumentative content of political speeches. More precisely, DISPUTool allows users to explore and automatically identify argumentative components over the 39 political debates from the last 50 years of US presidential campaigns (1960-2016).
#11048

Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination
Ruixue Liu, Baoyang Chen, Meng Chen, Youzheng Wu, Zhijie Qiu, Xiaodong He
Details | PDF

Demo Talks 2

We present a novel real-time, collaborative, and interactive AI painting system, Mappa Mundi, for artistic Mind Map creation. The system consists of a voice-based input interface, an automatic topic expansion module, and an image projection module. The key innovation is to inject Artificial Imagination into painting creation by considering lexical and phonological similarities of language, learning and inheriting artist’s original painting style, and applying the principles of Dadaism and impossibility of improvisation. Our system indicates that AI and artist can collaborate seamlessly to create imaginative artistic painting and Mappa Mundi has been applied in art exhibition in UCCA, Beijing.
#11027

The Open Vault Challenge - Learning How to Build Calibration-Free Interactive Systems by Cracking the Code of a Vault
Jonathan Grizou
Details | PDF

Demo Talks 2

This demo takes the form of a challenge to the IJCAI community. A physical vault, secured by a 4-digit code, will be placed in the demo area. The author will publicly open the vault by entering the code on a touch-based interface, and as many times as requested. The challenge to the IJCAI participants will be to crack the code, open the vault, and collect its content. The interface is based on previous work on calibration-free interactive systems that enables a user to start instructing a machine without the machine knowing how to interpret the user’s actions beforehand. The intent and the behavior of the human are simultaneously learned by the machine. An online demo and videos are available for readers to participate in the challenge. An additional interface using vocal commands will be revealed on the demo day, demonstrating the scalability of our approach to continuous input signals.

Wednesday 14 09:30 - 18:00 Competition (2305)

AIBIRDS 2019: The 8th Angry Birds AI Competition

Competition

Wednesday 14 09:30 - 18:00 DB2 - Demo Booths 2 (Hall A)

Chair: TBA

#11022

Fair and Explainable Dynamic Engagement of Crowd Workers
Han Yu, Yang Liu, Xiguang Wei, Chuyu Zheng, Tianjian Chen, Qiang Yang, Xiong Peng
Details | PDF

Demo Booths 2

Years of rural-urban migration have resulted in a significant population in China seeking ad-hoc work in large urban centres. At the same time, many businesses face large fluctuations in demand for manpower and require more efficient ways to satisfy such demands. This paper outlines AlgoCrowd, an artificial intelligence (AI)-empowered algorithmic crowdsourcing platform. Equipped with an efficient explainable task-worker matching optimization approach designed to focus on fair treatment of workers while maximizing collective utility, the platform provides explainable task recommendations to workers' personal work management mobile apps, which are becoming popular, with the aim of addressing the above societal challenge.
#11024

Multi-Agent Visualization for Explaining Federated Learning
Xiguang Wei, Quan Li, Yang Liu, Han Yu, Tianjian Chen, Qiang Yang
Details | PDF

Demo Booths 2

As an alternative decentralized training approach, Federated Learning enables distributed agents to collaboratively learn a machine learning model while keeping personal/private information on local devices. However, one significant issue of this framework is the lack of transparency, which obscures understanding of the working mechanism of Federated Learning systems. This paper proposes a multi-agent visualization system that illustrates what Federated Learning is and how it supports multi-agent coordination. To be specific, it allows users to participate in the Federated Learning empowered multi-agent coordination. The input and output of Federated Learning are visualized simultaneously, which provides an intuitive explanation of Federated Learning for users in order to help them gain a deeper understanding of the technology.
#11028

AiD-EM: Adaptive Decision Support for Electricity Markets Negotiations
Tiago Pinto, Zita Vale
Details | PDF

Demo Booths 2

This paper presents the Adaptive Decision Support for Electricity Markets Negotiations (AiD-EM) system. AiD-EM is a multi-agent system that provides decision support to market players by incorporating multiple sub-(agent-based) systems, directed to the decision support of specific problems. These sub-systems make use of different artificial intelligence methodologies, such as machine learning and evolutionary computing, to enable players' adaptation in the planning phase and in actual negotiations in auction-based markets and bilateral negotiations. The AiD-EM demonstration is enabled by its connection to MASCEM (Multi-Agent Simulator of Competitive Electricity Markets).
#11038

Hintikka's World: Scalable Higher-order Knowledge
Tristan Charrier, Sébastien Gamblin, Alexandre Niveau, François Schwarzentruber
Details | PDF

Demo Booths 2

Hintikka's World is a graphical and pedagogical tool that shows how artificial agents can reason about higher-order knowledge. In this demonstration paper, we present the implementation of symbolic models in Hintikka's World. They enable the tool to scale, by helping it to face the state explosion, which makes it possible to provide examples featuring real card games, such as Hanabi.
#11032

Embodied Conversational AI Agents in a Multi-modal Multi-agent Competitive Dialogue
Rahul R. Divekar, Xiangyang Mou, Lisha Chen, Maíra Gatti de Bayser, Melina Alberio Guerra, Hui Su
Details | PDF

Demo Booths 2

In a setting where two AI agents embodied as animated humanoid avatars are engaged in a conversation with one human and each other, we see two challenges. One, determination by the AI agents about which one of them is being addressed. Two, determination by the AI agents if they may/could/should speak at the end of a turn. In this work we bring these two challenges together and explore the participation of AI agents in multi-party conversations. Particularly, we show two embodied AI shopkeeper agents who sell similar items aiming to get the business of a user by competing with each other on the price. In this scenario, we solve the first challenge by using headpose (estimated by deep learning techniques) to determine who the user is talking to. For the second challenge we use deontic logic to model rules of a negotiation conversation.
#11043

Multi-Agent Path Finding on Ozobots
Roman Barták, Ivan Krasičenko, Jiří Švancara
Details | PDF

Demo Booths 2

Multi-agent path finding (MAPF) is the problem of finding collision-free paths for a set of agents (mobile robots) moving on a graph. There exist several abstract models describing the problem with various types of constraints. The demo presents software to evaluate the abstract models when the plans are executed on Ozobots, small mobile robots developed for teaching programming. The software allows users to design the grid-like maps, to specify initial and goal locations of robots, to generate plans using various abstract models implemented in the Picat programming language, to simulate and to visualise execution of these plans, and to translate the plans to command sequences for Ozobots.
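To make "collision-free" concrete, the sketch below checks a set of time-indexed paths for the two conflict types used in many abstract MAPF models: two agents on the same vertex at the same time, and two agents swapping vertices along an edge. It is written in Python for illustration rather than Picat, assumes that agents wait at their last vertex once their path ends, and does not reflect the demo's actual models; `has_conflict` is a hypothetical name.

```python
from typing import Dict, Hashable, List


def has_conflict(paths: Dict[str, List[Hashable]]) -> bool:
    """Return True if any two agents have a vertex or swap conflict.

    paths[agent][t] is the vertex occupied by the agent at time step t.
    Assumes at least one non-empty path.
    """
    horizon = max(len(p) for p in paths.values())

    def at(path: List[Hashable], t: int) -> Hashable:
        # Agents stay at their final vertex after finishing.
        return path[min(t, len(path) - 1)]

    agents = list(paths)
    for t in range(horizon):
        for i, a in enumerate(agents):
            for b in agents[i + 1:]:
                pa, pb = paths[a], paths[b]
                if at(pa, t) == at(pb, t):
                    return True  # vertex conflict
                if at(pa, t) == at(pb, t + 1) and at(pa, t + 1) == at(pb, t):
                    return True  # swap (edge) conflict
    return False


# Illustrative grid paths (assumed).
swap = has_conflict({"r1": [(0, 0), (0, 1)], "r2": [(0, 1), (0, 0)]})
safe = has_conflict({"r1": [(0, 0), (0, 1)], "r2": [(1, 0), (1, 1)]})
```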
#11050

Reagent: Converting Ordinary Webpages into Interactive Software Agents
Matthew Peveler, Jeffrey O. Kephart, Hui Su
Details | PDF

Demo Booths 2

We introduce Reagent, a technology that can be used in conjunction with automated speech recognition to allow users to query and manipulate ordinary webpages via speech and pointing. Reagent can be used out-of-the-box with third-party websites, as it requires neither special instrumentation from website developers nor special domain knowledge to capture semantically-meaningful mouse interactions with structured elements such as tables and plots. When it is unable to infer mappings between domain vocabulary and visible webpage content on its own, Reagent proactively seeks help by engaging in a voice-based interaction with the user.
#11029

Deep Reinforcement Learning for Ride-sharing Dispatching and Repositioning
Zhiwei (Tony) Qin, Xiaocheng Tang, Yan Jiao, Fan Zhang, Chenxi Wang, Qun (Tracy) Li
Details | PDF

Demo Booths 2

In this demo, we will present a simulation-based human-computer interaction of deep reinforcement learning in action on order dispatching and driver repositioning for ride-sharing. Specifically, we will demonstrate through several specially designed domains how we use deep reinforcement learning to train agents (drivers) to have a longer optimization horizon and to cooperate to achieve higher objective values collectively.
#11035

Intelligent Decision Support for Improving Power Management
Yongqing Zheng, Han Yu, Kun Zhang, Yuliang Shi, Cyril Leung, Chunyan Miao
Details | PDF

Demo Booths 2

With the development and adoption of the electricity information tracking system in China, real-time electricity consumption big data have become available to enable artificial intelligence (AI) to help power companies and the urban management departments to make demand side management decisions. We demonstrate the Power Intelligent Decision Support (PIDS) platform, which can generate Orderly Power Utilization (OPU) decision recommendations and perform Demand Response (DR) implementation management based on a short-term load forecasting model. It can also provide different users with query and application functions to facilitate explainable decision support.
#11041

Contextual Typeahead Sticker Suggestions on Hike Messenger
Mohamed Hanoosh, Abhishek Laddha, Debdoot Mukherjee
Details | PDF

Demo Booths 2

In this demonstration, we present Hike's sticker recommendation system, which helps users choose the right sticker to substitute the next message that they intend to send in a chat. We describe how the system addresses the issue of numerous orthographic variations for chat messages and operates under 20 milliseconds with low CPU and memory footprint on device.

Wednesday 14 09:35 - 09:40 2019 ACM SIGAI Industry Award (D-I)

Real World Reinforcement Learning Team (Microsoft)

2019 ACM SIGAI Industry Award

Wednesday 14 09:40 - 10:30 Industry Days (D-I)

Chair: Yu Zheng

A Real World Reinforcement Learning Service
John Langford and Tyler Clintworth, Principal Research Scientist and Lead Developer, Microsoft Research

Industry Days

Wednesday 14 11:00 - 12:00 MTA|SP - Security and Privacy 1 (2705-2706)

Chair: Wang Pinghui

#1230

DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks
Huili Chen, Cheng Fu, Jishen Zhao, Farinaz Koushanfar
Details | PDF

Security and Privacy 1

Deep Neural Networks (DNNs) are vulnerable to Neural Trojan (NT) attacks where the adversary injects malicious behaviors during DNN training. This type of ‘backdoor’ attack is activated when the input is stamped with the trigger pattern specified by the attacker, resulting in an incorrect prediction of the model. Due to the wide application of DNNs in various critical fields, it is indispensable to inspect whether a pre-trained DNN has been trojaned before employing the model. Our goal in this paper is to address the security concern about the vulnerability of unknown DNNs to NT attacks and ensure safe model deployment. We propose DeepInspect, the first black-box Trojan detection solution with minimal prior knowledge of the model. DeepInspect learns the probability distribution of potential triggers from the queried model using a conditional generative model, thus retrieving the footprint of backdoor insertion. In addition to NT detection, we show that DeepInspect’s trigger generator enables effective Trojan mitigation by model patching. We corroborate the effectiveness, efficiency, and scalability of DeepInspect against the state-of-the-art NT attacks across various benchmarks. Extensive experiments show that DeepInspect offers superior detection performance and lower runtime overhead than the prior work.
#3546

VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities
Xu Duan, Jingzheng Wu, Shouling Ji, Zhiqing Rui, Tianyue Luo, Mutian Yang, Yanjun Wu
Details | PDF

Security and Privacy 1

With the explosive development of information technology, vulnerabilities have become one of the major threats to computer security. Most vulnerabilities with similar patterns can be detected effectively by static analysis methods. However, some vulnerable and non-vulnerable code is hardly distinguishable, resulting in low detection accuracy. In this paper, we define the accurate identification of vulnerabilities in similar code as a fine-grained vulnerability detection problem. We propose VulSniper which is designed to detect fine-grained vulnerabilities more effectively. In VulSniper, attention mechanism is used to capture the critical features of the vulnerabilities. Especially, we use bottom-up and top-down structures to learn the attention weights of different areas of the program. Moreover, in order to fully extract the semantic features of the program, we generate the code property graph, design a 144-dimensional vector to describe the relation between the nodes, and finally encode the program as a feature tensor. VulSniper achieves F1-scores of 80.6% and 73.3% on the two benchmark datasets, the SARD Buffer Error dataset and the SARD Resource Management Error dataset respectively, which are significantly higher than those of the state-of-the-art methods.
#5689

Data Poisoning against Differentially-Private Learners: Attacks and Defenses
Yuzhe Ma, Xiaojin Zhu, Justin Hsu
Details | PDF

Security and Privacy 1

Data poisoning attacks aim to manipulate the model produced by a learning algorithm by adversarially modifying the training set. We consider differential privacy as a defensive measure against this type of attack. We show that private learners are resistant to data poisoning attacks when the adversary is only able to poison a small number of items. However, this protection degrades as the adversary is allowed to poison more data. We empirically evaluate this protection by designing attack algorithms targeting objective and output perturbation learners, two standard approaches to differentially-private machine learning. Experiments show that our methods are effective when the attacker is allowed to poison sufficiently many training items.
#5758

Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space
Linyi Li, Zexuan Zhong, Bo Li, Tao Xie
Details | PDF

Security and Privacy 1

Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications. However, DNNs have recently been found to be vulnerable to adversarial examples, i.e., maliciously perturbed inputs that can mislead the models to make arbitrary prediction errors. Empirical defenses have been studied, but many of them can be adaptively attacked again. Provable defenses provide provable error bounds for DNNs, but the bounds obtained so far are far from satisfactory. To address this issue, in this paper, we present our approach named Robustra for effectively improving the provable error bound of DNNs. We leverage the adversarial space of a reference model as the feasible region to solve the min-max game between the attackers and defenders. We solve its dual problem by linearly approximating the attackers' best strategy and utilizing the monotonicity of the slack variables introduced by the reference model. The evaluation results show that our approach can provide significantly better provable adversarial error bounds on MNIST and CIFAR10 datasets, compared to the state-of-the-art results. In particular, for L∞-bounded perturbations with epsilon = 0.1, on MNIST we reduce the error bound from 2.74% to 2.09%; with epsilon = 0.3, we reduce the error bound from 24.19% to 16.91%.

Wednesday 14 11:00 - 12:15 ML|RL - Reinforcement Learning 3 (2701-2702)

Chair: Marc Toussaint

#149

Experience Replay Optimization
Daochen Zha, Kwei-Herng Lai, Kaixiong Zhou, Xia Hu
Details | PDF

Reinforcement Learning 3

Experience replay enables reinforcement learning agents to memorize and reuse past experiences, just as humans replay memories for the situation at hand. Contemporary off-policy algorithms either replay past experiences uniformly or utilize a rule-based replay strategy, which may be sub-optimal. In this work, we consider learning a replay policy to optimize the cumulative reward. Replay learning is challenging because the replay memory is noisy and large, and the cumulative reward is unstable. To address these issues, we propose a novel experience replay optimization (ERO) framework which alternately updates two policies: the agent policy, and the replay policy. The agent is updated to maximize the cumulative reward based on the replayed data, while the replay policy is updated to provide the agent with the most useful experiences. The conducted experiments on various continuous control tasks demonstrate the effectiveness of ERO, empirically showing promise in experience replay learning to improve the performance of off-policy reinforcement learning algorithms.
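For context, the uniform replay baseline that the abstract contrasts with a learned replay policy can be sketched as follows. This is a minimal illustration, not the ERO framework itself; the class name `UniformReplayBuffer` and its method names are assumptions made for this example.

```python
import random
from collections import deque


class UniformReplayBuffer:
    """Fixed-capacity memory that replays stored transitions uniformly at random."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently discards the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Every stored transition is equally likely to be replayed.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

A learned replay policy, as described in the abstract, would replace the uniform draw in `sample` with a selection that is itself trained.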
#354

Interactive Teaching Algorithms for Inverse Reinforcement Learning
Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla
Details | PDF

Reinforcement Learning 3

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: How could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework where a teacher adaptively chooses the next demonstration based on the learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting where a teacher has full knowledge about the learner's dynamics and a blackbox setting where the teacher has minimal knowledge. Then, we study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees of our teaching algorithm in the omniscient setting. Extensive experiments with a car driving simulator environment show that the learning progress can be sped up drastically as compared to an uninformative teacher.
#619

Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human and Agent Demonstrations
Zhaodong Wang, Matthew E. Taylor
Details | PDF

Reinforcement Learning 3

Reinforcement learning has enjoyed multiple impressive successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper focuses on a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper leverages demonstrations, allowing an agent to quickly achieve high performance. This paper introduces the Dynamic Reuse of Prior (DRoP) algorithm, which combines the offline knowledge (demonstrations recorded before learning) with online confidence-based performance analysis. DRoP leverages the demonstrator's knowledge by automatically balancing between reusing the prior knowledge and the current learned policy, allowing the agent to outperform the original demonstrations. We compare with multiple state-of-the-art learning algorithms and empirically show that DRoP can achieve superior performance in two domains. Additionally, we show that this confidence measure can be used to selectively request additional demonstrations, significantly improving the learning performance of the agent.
#1363

Meta Reinforcement Learning with Task Embedding and Shared Policy
Lin Lan, Zhenguo Li, Xiaohong Guan, Pinghui Wang
Details | PDF

Reinforcement Learning 3

Despite significant progress, deep reinforcement learning (RL) suffers from data-inefficiency and limited generalization. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks such that a novel but related task could be solved quickly. Though specific in some ways, different tasks in meta-RL are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the specific and shared information among different tasks, which limits their ability to learn training tasks and to generalize to novel tasks. In this paper, we propose to capture the shared information on the one hand and meta-learn how to quickly abstract the specific information about a task on the other hand. Methodologically, we train an SGD meta-learner to quickly optimize a task encoder for each task, which generates a task embedding based on past experience. Meanwhile, we learn a policy which is shared across all tasks and conditioned on task embeddings. Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns compared to baselines.
#5384

Planning with Expectation Models
Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton
Details | PDF

Reinforcement Learning 3

Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are relatively easier to learn due to their compactness and have also been widely used for deterministic environments. For stochastic environments, it is not obvious how expectation models can be used for planning as they only partially characterize a distribution. In this paper, we propose a sound way of using approximate expectation models for MBRL. In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.
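The first claim follows from linearity of expectation. A short sketch, using notation assumed here rather than taken from the paper: let x(s) be the feature vector of state s, let v(s) = w^T x(s) be a value function that is linear in those features, and let S' be the random next state after taking action a in state s.

```latex
\mathbb{E}\left[ v(S') \mid s, a \right]
  = \mathbb{E}\left[ \mathbf{w}^\top \mathbf{x}(S') \mid s, a \right]
  = \mathbf{w}^\top \, \mathbb{E}\left[ \mathbf{x}(S') \mid s, a \right]
```

Under this assumption, the expected next-state value depends on the next-state distribution only through the expected next feature vector, which is the quantity an expectation model predicts.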

Wednesday 14 11:00 - 12:15 Survey 1 - Survey Session 1 (2405-2406)

Chair: Virginia Dignum

#10893

A Survey on Hierarchical Planning – One Abstract Idea, Many Concrete Realizations
Pascal Bercher, Ron Alford, Daniel Höller
Details | PDF

Survey Session 1

Hierarchical planning has attracted renewed interest in the last couple of years, which led to numerous novel formalisms, problem classes, and theoretical investigations. Yet it is important to differentiate between the various formalisms and problem classes, since they show -- sometimes fundamental -- differences with regard to their expressivity and computational complexity: Some of them can be regarded equivalent to non-hierarchical formalisms while others are clearly more expressive. We survey the most important hierarchical problem classes and explain their differences and similarities. We furthermore give pointers to some of the best-known planning systems capable of solving the respective problem classes.
#10895

Integrating Knowledge and Reasoning in Image Understanding
Somak Aditya, Yezhou Yang, Chitta Baral
Details | PDF

Survey Session 1

Deep learning based data-driven approaches have been successfully applied in various image understanding applications ranging from object recognition and semantic segmentation to visual question answering. However, the lack of knowledge integration as well as higher-level reasoning capabilities in these methods still poses a hindrance. In this work, we present a brief survey of a few representative reasoning mechanisms, knowledge integration methods and their corresponding image understanding applications developed by various groups of researchers, approaching the problem from a variety of angles. Furthermore, we discuss key efforts on integrating external knowledge with neural networks. Taking cues from these efforts, we conclude by discussing potential pathways to improve reasoning capabilities.
#10909

Counterfactuals in Explainable Artificial Intelligence (XAI): Evidence from Human Reasoning
Ruth M. J. Byrne
Details | PDF

Survey Session 1

Counterfactuals about what could have happened are increasingly used in an array of Artificial Intelligence (AI) applications, and especially in explainable AI (XAI). Counterfactuals can aid the provision of interpretable models to make the decisions of inscrutable systems intelligible to developers and users. However, not all counterfactuals are equally helpful in assisting human comprehension. Discoveries about the nature of the counterfactuals that humans create are a helpful guide to maximize the effectiveness of counterfactual use in AI.
#10953

A Replication Study of Semantics in Argumentation
Leila Amgoud
Details | PDF

Survey Session 1

Argumentation aims at increasing acceptability of claims by supporting them with arguments. Roughly speaking, an argument is a set of premises intended to establish a definite claim. Its strength depends on the plausibility of the premises, the nature of the link between the premises and claim, and the prior acceptability of the claim. It may generally be weakened by other arguments that undermine one or more of its three components. Evaluation of arguments is a crucial task, and a sizable number of methods, called semantics, have been proposed in the literature. This paper discusses two classifications of the existing semantics: the first one is based on the type of semantics' outcomes (sets of arguments, weighting, and preorder), the second is based on the goals pursued by the semantics (acceptability, strength, coalitions).
#10954

Automated Essay Scoring: A Survey of the State of the Art
Zixuan Ke, Vincent Ng
Details | PDF

Survey Session 1

Despite being investigated for over 50 years, the task of automated essay scoring is far from being solved. Nevertheless, it continues to draw a lot of attention in the natural language processing community in part because of its commercial and educational values as well as the associated research challenges. This paper presents an overview of the major milestones made in automated essay scoring research since its inception.

Wednesday 14 11:00 - 12:30 Industry Days (D-I)

Chair: Anand Rao (PwC)

AI x Robotics in Sony as Creative Entertainment Company
Masahiro Fujita, Senior Chief Researcher, AI Collaboration Office, Sony Corporation; Michael Spranger, Senior Research Scientist, Sony Corporation AND Researcher, Sony Computer Science Laboratories Inc.

Industry Days

Wednesday 14 11:00 - 12:30 Panel (K)

Chair: Ray Perrault

50 years of IJCAI

Panel

Wednesday 14 11:00 - 12:30 AI-HWB - ST: AI for Improving Human Well-Being 2 (J)

Chair: Christophe Marsala

#457

Safe Contextual Bayesian Optimization for Sustainable Room Temperature PID Control Tuning
Marcello Fiducioso, Sebastian Curi, Benedikt Schumacher, Markus Gwerder, Andreas Krause
Details | PDF

ST: AI for Improving Human Well-Being 2

We tune one of the most common heating, ventilation, and air conditioning (HVAC) control loops, namely the temperature control of a room. For economical and environmental reasons, it is of prime importance to optimize the performance of this system. Buildings account for 20 to 40 % of a country's energy consumption, and almost 50 % of it comes from HVAC systems. Scenario projections predict a 30 % decrease in heating consumption by 2050 due to efficiency increase. Advanced control techniques can improve performance; however, the proportional-integral-derivative (PID) control is typically used due to its simplicity and overall performance. We use Safe Contextual Bayesian Optimization to optimize the PID parameters without human intervention. We reduce costs by 32 % compared to the current PID controller setting while assuring safety and comfort to people in the room. The results of this work have an immediate impact on the room control loop performances and its related commissioning costs. Furthermore, this successful attempt paves the way for further use at different levels of HVAC systems, with promising energy, operational, and commissioning costs savings, and it is a practical demonstration of the positive effects that Artificial Intelligence can have on environmental sustainability.
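To make the tuned object concrete, a textbook discrete-time PID controller can be sketched as below. The gains `kp`, `ki`, `kd` are the parameters that an optimizer such as the one in the abstract would search over; the class name, the time step `dt`, and the example values are assumptions for illustration, and the safe Bayesian optimization loop itself is not shown.

```python
class PID:
    """Textbook discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        # Integral term accumulates error over time (rectangle rule).
        self.integral += error * self.dt
        # Derivative term uses a backward difference; zero on the first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For instance, `PID(2.0, 0.5, 0.1, dt=1.0).step(21.0, 19.0)` yields a control signal of 5.0 for a room that is 2 degrees below its setpoint.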
#1168

Protecting Neural Networks with Hierarchical Random Switching: Towards Better Robustness-Accuracy Trade-off for Stochastic Defenses
Xiao Wang, Siyue Wang, Pin-Yu Chen, Yanzhi Wang, Brian Kulis, Xue Lin, Sang Chin
Details | PDF

ST: AI for Improving Human Well-Being 2

Despite achieving remarkable success in various domains, recent studies have uncovered the vulnerability of deep neural networks to adversarial perturbations, creating concerns on model generalizability and new threats such as prediction-evasive misclassification or stealthy reprogramming. Among different defense proposals, stochastic network defenses such as random neuron activation pruning or random perturbation to layer inputs are shown to be promising for attack mitigation. However, one critical drawback of current defenses is that the robustness enhancement is at the cost of noticeable performance degradation on legitimate data, e.g., large drop in test accuracy. This paper is motivated by the pursuit of a better trade-off between adversarial robustness and test accuracy for stochastic network defenses. We propose Defense Efficiency Score (DES), a comprehensive metric that measures the gain in unsuccessful attack attempts at the cost of drop in test accuracy of any defense. To achieve a better DES, we propose hierarchical random switching (HRS), which protects neural networks through a novel randomization scheme. An HRS-protected model contains several blocks of randomly switching channels to prevent adversaries from exploiting fixed model structures and parameters for their malicious purposes. Extensive experiments show that HRS is superior in defending against state-of-the-art white-box and adaptive adversarial misclassification attacks. We also demonstrate the effectiveness of HRS in defending adversarial reprogramming, which is the first defense against adversarial programs. Moreover, in most settings the average DES of HRS is at least 5X higher than current stochastic network defenses, validating its significantly improved robustness-accuracy trade-off.
#707

KitcheNette: Predicting and Ranking Food Ingredient Pairings using Siamese Neural Network
Donghyeon Park, Keonwoo Kim, Yonggyu Park, Jungwoon Shin, Jaewoo Kang
Details | PDF

ST: AI for Improving Human Well-Being 2

As a vast number of ingredients exist in the culinary world, there are countless food ingredient pairings, but only a small number of pairings have been adopted by chefs and studied by food researchers. In this work, we propose KitcheNette which is a model that predicts food ingredient pairing scores and recommends optimal ingredient pairings. KitcheNette employs Siamese neural networks and is trained on our annotated dataset containing 300K scores of pairings generated from numerous ingredients in food recipes. As the results demonstrate, our model not only outperforms other baseline models, but also can recommend complementary food pairings and discover novel ingredient pairings.
#1303

SparseSense: Human Activity Recognition from Highly Sparse Sensor Data-streams Using Set-based Neural Networks
Alireza Abedin, S. Hamid Rezatofighi, Qinfeng Shi, Damith C. Ranasinghe
Details | PDF

ST: AI for Improving Human Well-Being 2

Batteryless or so-called passive wearables are providing new and innovative methods for human activity recognition (HAR), especially in healthcare applications for older people. Passive sensors are low cost, lightweight, unobtrusive and desirably disposable; attractive attributes for healthcare applications in hospitals and nursing homes. Despite the compelling propositions for sensing applications, the data streams from these sensors are characterised by high sparsity---the time intervals between sensor readings are irregular while the number of readings per unit time is often limited. In this paper, we rigorously explore the problem of learning activity recognition models from temporally sparse data. We describe how to learn directly from sparse data using a deep learning paradigm in an end-to-end manner. We demonstrate significant classification performance improvements on real-world passive sensor datasets from older people over the state-of-the-art deep learning human activity recognition models. Further, we provide insights into the model's behaviour through complementary experiments on a benchmark dataset and visualisation of the learned activity feature spaces.
#3424

MNN: Multimodal Attentional Neural Networks for Diagnosis Prediction
Zhi Qiao, Xian Wu, Shen Ge, Wei Fan
Details | PDF

ST: AI for Improving Human Well-Being 2

Diagnosis prediction plays a key role in the clinical decision support process and has attracted extensive research attention recently. Existing studies mainly utilize discrete medical codes (e.g., the ICD codes and procedure codes) as the primary features in prediction. However, in real clinical settings, such medical codes could be either incomplete or erroneous. For example, a missed diagnosis will omit some codes which should be included, and a misdiagnosis will generate incorrect medical codes. To increase the robustness towards noisy data, we introduce textual clinical notes in addition to medical codes. Combining information from both sides will lead to improved understanding of clinical health conditions. To accommodate both the textual notes and discrete medical codes in the same framework, we propose Multimodal Attentional Neural Networks (MNN), which integrates multi-modal data in a collaborative manner. Experimental results on real world EHR datasets demonstrate the advantages of MNN in terms of both robustness and accuracy.
#4465

Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the Hamming Distance
Wenjie Ruan, Min Wu, Youcheng Sun, Xiaowei Huang, Daniel Kroening, Marta Kwiatkowska
Details | PDF

ST: AI for Improving Human Well-Being 2

Deployment of deep neural networks (DNNs) in safety-critical systems requires provable guarantees for their correct behaviours. We compute the maximal radius of a safe norm ball around a given input, within which there are no adversarial examples for a trained DNN. We define global robustness as an expectation of the maximal safe radius over a test dataset, and develop an algorithm to approximate the global robustness measure by iteratively computing its lower and upper bounds. Our algorithm is the first efficient method for the Hamming (L0) distance, and we hypothesise that this norm is a good proxy for a certain class of physical attacks. The algorithm is anytime, i.e., it returns intermediate bounds and robustness estimates that are gradually, but strictly, improved as the computation proceeds; tensor-based, i.e., the computation is conducted over a set of inputs simultaneously to enable efficient GPU computation; and has provable guarantees, i.e., both the bounds and the robustness estimates can converge to their optimal values. Finally, we demonstrate the utility of our approach by applying the algorithm to a set of challenging problems.

Wednesday 14 11:00 - 12:30 ML|DL - Deep Learning 4 (L)

Chair: Longbing Cao

#2022

Towards Robust ResNet: A Small Step but a Giant Leap
Jingfeng Zhang, Bo Han, Laura Wynter, Bryan Kian Hsiang Low, Mohan Kankanhalli
Details | PDF

Deep Learning 4

This paper presents a simple yet principled approach to boosting the robustness of the residual network (ResNet) that is motivated by a dynamical systems perspective. Namely, a deep neural network can be interpreted using a partial differential equation, which naturally inspires us to characterize ResNet based on an explicit Euler method. This consequently allows us to exploit the step factor h in the Euler method to control the robustness of ResNet in both its training and generalization. In particular, we prove that a small step factor h can benefit its training and generalization robustness during backpropagation and forward propagation, respectively. Empirical evaluation on real-world datasets corroborates our analytical findings that a small h can indeed improve both its training and generalization robustness.
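The explicit Euler view mentioned in the abstract can be sketched as a forward pass in which each residual block contributes one step x_{k+1} = x_k + h * f_k(x_k). The function below is an illustrative sketch only: `residual_forward` and the toy block are hypothetical names, the blocks are plain Python callables rather than trained layers, and h = 1 recovers the usual residual connection.

```python
def residual_forward(x, blocks, h=1.0):
    """Apply residual blocks as explicit Euler steps: x <- x + h * f(x)."""
    for f in blocks:
        update = f(x)
        # The step factor h scales how far each block moves the state.
        x = [xi + h * ui for xi, ui in zip(x, update)]
    return x


def decay_block(x):
    # Toy residual function f(x) = -x, standing in for a learned block.
    return [-xi for xi in x]
```

With two such toy blocks and h = 0.5, the input `[1.0]` is mapped to `[0.25]`, while h = 1.0 maps it to `[0.0]`, which shows how h controls the size of each update.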
#2847

Reparameterizable Subset Sampling via Continuous Relaxations
Sang Michael Xie, Stefano Ermon
Details | PDF

Deep Learning 4

Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item, and allows for low-variance reparameterized gradients with respect to the parameters of the underlying distribution. However, stochastic optimization involving subset sampling is typically not reparameterizable. To overcome this limitation, we define a continuous relaxation of subset sampling that provides reparameterization gradients by generalizing the Gumbel-max trick. We use this approach to sample subsets of features in an instance-wise feature selection task for model interpretability, subsets of neighbors to implement a deep stochastic k-nearest neighbors model, and sub-sequences of neighbors to implement parametric t-SNE by directly comparing the identities of local neighbors. We improve performance in all these tasks by incorporating subset sampling in end-to-end training.
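As background, the discrete Gumbel-max trick and its top-k extension to subsets can be sketched as below. This shows only the non-differentiable sampling step, not the continuous relaxation proposed in the paper; the function names and the clamping constant are assumptions made for this example.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one standard Gumbel sample via the inverse CDF."""
    # Clamp u away from 0 and 1 so that both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_top_k(logits, k, rng=None):
    """Sample k distinct indices by perturbing logits with Gumbel noise.

    With k = 1 this is the Gumbel-max trick: the argmax of the perturbed
    logits is distributed according to softmax(logits).
    """
    rng = rng or random.Random()
    perturbed = [logit + gumbel_noise(rng) for logit in logits]
    order = sorted(range(len(logits)), key=lambda i: perturbed[i], reverse=True)
    return order[:k]
```

The hard top-k selection at the end is what blocks gradients; a relaxation replaces it with a smooth approximation.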
#2949

Image Captioning with Compositional Neural Module Networks
Junjiao Tian, Jean Oh
Details | PDF

Deep Learning 4

In image captioning where fluency is an important factor in evaluation, e.g., n-gram metrics, sequential models are commonly used; however, sequential models generally result in overgeneralized expressions that lack the details that may be present in an input image. Inspired by the idea of the compositional neural module networks in the visual question answering task, we introduce a hierarchical framework for image captioning that explores both compositionality and sequentiality of natural language. Our algorithm learns to compose a detail-rich sentence by selectively attending to different modules corresponding to unique aspects of each object detected in an input image to include specific descriptions such as counts and color. In a set of experiments on the MSCOCO dataset, the proposed model outperforms a state-of-the-art model across multiple evaluation metrics and, more importantly, presents visually interpretable results. Furthermore, the breakdown of subcategory f-scores of the SPICE metric and human evaluation on Amazon Mechanical Turk show that our compositional module networks effectively generate accurate and detailed captions.
#5573

Extrapolating Paths with Graph Neural Networks
Jean-Baptiste Cordonnier, Andreas Loukas
Details | PDF

Deep Learning 4

We consider the problem of path inference: given a path prefix, i.e., a partially observed sequence of nodes in a graph, we want to predict which nodes are in the missing suffix. In particular, we focus on natural paths occurring as a by-product of the interaction of an agent with a network---a driver on the transportation network, an information seeker in Wikipedia, or a client in an online shop. Our interest is sparked by the realization that, in contrast to shortest-path problems, natural paths are usually not optimal in any graph-theoretic sense, but might still follow predictable patterns. Our main contribution is a graph neural network called Gretel. Conditioned on a path prefix, this network can efficiently extrapolate path suffixes, evaluate path likelihood, and sample from the future path distribution. Our experiments with GPS traces on a road network and user-navigation paths in Wikipedia confirm that Gretel is able to adapt to graphs with very different properties, while also comparing favorably to previous solutions.
#3760

Ornstein Auto-Encoders
Youngwon Choi, Joong-Ho Won
Details | PDF

Deep Learning 4

We propose the Ornstein auto-encoder (OAE), a representation learning model for correlated data. In many interesting applications, data have nested structures. Examples include the VGGFace and MNIST datasets. We view such data as consisting of i.i.d. copies of a stationary random process, and seek a latent space representation of the observed sequences. This viewpoint necessitates a distance measure between two random processes. We propose to use Ornstein's d-bar distance, a process extension of Wasserstein's distance. We first show that the theorem by Bousquet et al. (2017) for Wasserstein auto-encoders extends to stationary random processes. This result, however, requires both encoder and decoder to map an entire sequence to another. We then show that, when exchangeability within a process, valid for VGGFace and MNIST, is assumed, these maps reduce to univariate ones, resulting in a much simpler, tractable optimization problem. Our experiments show that OAEs successfully separate individual sequences in the latent space, and can generate new variations of unknown, as well as known, identity. The latter has not been possible with other existing methods.
#3138

Variational Graph Embedding and Clustering with Laplacian Eigenmaps
Zitai Chen, Chuan Chen, Zong Zhang, Zibin Zheng, Qingsong Zou
Details | PDF

Deep Learning 4

As a fundamental machine learning problem, graph clustering has facilitated various real-world applications, and tremendous efforts have been devoted to it in the past few decades. However, most of the existing methods like spectral clustering suffer from problems of sparsity, scalability, robustness, and handling of high-dimensional raw information in clustering. To address this issue, we propose a deep probabilistic model, called Variational Graph Embedding and Clustering with Laplacian Eigenmaps (VGECLE), which learns node embeddings and assigns node clusters simultaneously. It represents each node as a Gaussian distribution to disentangle the true embedding position and the uncertainty from the graph. With a Mixture of Gaussian (MoG) prior, VGECLE is capable of learning an interpretable clustering by the variational inference and generative process. In order to learn the pairwise relationships better, we propose a Teacher-Student mechanism encouraging each node to learn a better Gaussian from its immediate neighbors in the stochastic gradient descent (SGD) training fashion. By optimizing the graph embedding and the graph clustering problem as a whole, our model can take full advantage of their correlation. To the best of our knowledge, we are the first to tackle graph clustering from a deep probabilistic viewpoint. We perform extensive experiments on both synthetic and real-world networks to corroborate the effectiveness and efficiency of the proposed framework.

Wednesday 14 11:00 - 12:30 AMS|AGT - Algorithmic Game Theory 1 (2703-2704)

Chair: Bei Xiaohui

#166

Achieving a Fairer Future by Changing the Past
Jiafan He, Ariel D. Procaccia, Alexandros Psomas, David Zeng
Details | PDF

Algorithmic Game Theory 1

We study the problem of allocating T indivisible items that arrive online to agents with additive valuations. The allocation must satisfy a prominent fairness notion, envy-freeness up to one item (EF1), at each round. To make this possible, we allow the reallocation of previously allocated items, but aim to minimize these so-called adjustments. For the case of two agents, we show that algorithms that are informed about the values of future items can get by without any adjustments, whereas uninformed algorithms require Theta(T) adjustments. For the general case of three or more agents, we prove that even informed algorithms must use Omega(T) adjustments, and design an uninformed algorithm that requires only O(T^(3/2)).
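The fairness notion can be made concrete with a small checker for envy-freeness up to one item under additive valuations. This is a sketch of the standard EF1 definition for non-negative item values, not of the paper's online algorithms; the function name `is_ef1` and the data layout (a value matrix indexed by agent and item, and one list of item indices per agent) are assumptions.

```python
def is_ef1(valuations, bundles):
    """Return True if the allocation is envy-free up to one item (EF1).

    valuations[i][g] is agent i's (non-negative) value for item g.
    bundles[i] is the list of item indices currently held by agent i.
    """
    n = len(bundles)
    for i in range(n):
        own = sum(valuations[i][g] for g in bundles[i])
        for j in range(n):
            if i == j or not bundles[j]:
                continue  # an empty bundle cannot be envied
            other = sum(valuations[i][g] for g in bundles[j])
            # Removing the item that agent i values most is the best case for EF1.
            best_item = max(valuations[i][g] for g in bundles[j])
            if own < other - best_item:
                return False
    return True
```

An online algorithm in the setting of the abstract would need this check to pass after every arrival, reallocating earlier items when it does not.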
#3032

Preferred Deals in General Environments
Yuan Deng, Sébastien Lahaie, Vahab Mirrokni
Details | PDF

Algorithmic Game Theory 1

A preferred deal is a special contract for selling impressions of display ad inventory. By accepting a deal, a buyer agrees to buy a minimum amount of impressions at a fixed price per impression, and is granted priority access to the impressions before they are sent to an open auction on an ad exchange. We consider the problem of designing preferred deals (inventory, price, quantity) in the presence of general convex constraints, including budget constraints, and propose an approximation algorithm to maximize the revenue obtained from the deals. We then evaluate our algorithm using auction data from a major advertising exchange and our empirical results show that the algorithm achieves around 95% of the optimal revenue.
#3679

On the Efficiency and Equilibria of Rich Ads
MohammadAmin Ghiasi, MohammadTaghi Hajiaghayi, Sébastien Lahaie, Hadi Yami
Details | PDF

Algorithmic Game Theory 1

Search ads have evolved in recent years from simple text formats to rich ads that allow deep site links, rating, images and videos. In this paper, we consider a model where several slots are available on the search results page, as in the classic generalized second-price auction (GSP), but now a bidder can be allocated several consecutive slots, which are interpreted as a rich ad. As in the GSP, each bidder submits a bid-per-click, but the click-through rate (CTR) function is generalized from a simple CTR for each slot to a general CTR function over sets of consecutive slots. We study allocation and pricing in this model under subadditive and fractionally subadditive CTRs. We design and analyze a constant-factor approximation algorithm for the efficient allocation problem under fractionally subadditive CTRs, and a log-approximation algorithm for the subadditive case. Building on these results, we show that approximate competitive equilibrium prices exist and can be computed for subadditive and fractionally subadditive CTRs, with the same guarantees as for allocation.
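For reference, the classic generalized second-price auction that this model extends can be sketched as follows, with one slot per bidder and no quality scores. This is the textbook baseline only, not the rich-ads allocation studied in the paper; the function name `gsp` and the tuple layout of the result are assumptions.

```python
def gsp(bids, slot_ctrs):
    """Classic GSP: rank bidders by bid; each winner pays the next-highest bid per click.

    bids: dict mapping bidder name to bid-per-click.
    slot_ctrs: click-through rates of the slots, best slot first.
    Returns a list of (bidder, slot_ctr, price_per_click), one entry per filled slot.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for k, ctr in enumerate(slot_ctrs):
        if k >= len(ranked):
            break  # more slots than bidders
        bidder = ranked[k][0]
        # The price is the bid of the bidder ranked immediately below, or 0 if none.
        price = ranked[k + 1][1] if k + 1 < len(ranked) else 0.0
        results.append((bidder, ctr, price))
    return results
```

In the rich-ads model, a bidder may instead receive several consecutive slots, so the per-slot CTR list would be replaced by a CTR function over sets of consecutive slots.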
#4855

Neural Networks for Predicting Human Interactions in Repeated Games
Yoav Kolumbus, Gali Noti
Details | PDF

Algorithmic Game Theory 1

We consider the problem of predicting human players' actions in repeated strategic interactions. Our goal is to predict the dynamic step-by-step behavior of individual players in previously unseen games. We study the ability of neural networks to perform such predictions and the information that they require. We show on a dataset of normal-form games from experiments with human participants that standard neural networks are able to learn functions that provide more accurate predictions of the players' actions than established models from behavioral economics. The networks outperform the other models in terms of prediction accuracy and cross-entropy, and yield higher economic value. We show that if the available input is only a short sequence of play, economic information about the game is important for predicting behavior of human agents. However, interestingly, we find that when the networks are trained with long enough sequences of history of play, action-based networks do well and additional economic details about the game do not improve their performance, indicating that the sequence of actions encodes sufficient information for success in the prediction task.
#6034

Ridesharing with Driver Location Preferences
Duncan Rheingans-Yoo, Scott Duke Kominers, Hongyao Ma, David C. Parkes
Details | PDF

Algorithmic Game Theory 1

We study revenue-optimal pricing and driver compensation in ridesharing platforms when drivers have heterogeneous preferences over locations. If a platform ignores drivers' location preferences, it may make inefficient trip dispatches; moreover, drivers may strategize so as to route towards their preferred locations. In a model with stationary and continuous demand and supply, we present a mechanism that incentivizes drivers to both (i) report their location preferences truthfully and (ii) always provide service. In settings with unconstrained driver supply or symmetric demand patterns, our mechanism achieves (full-information) first-best revenue. Under supply constraints and unbalanced demand, we show via simulation that our mechanism improves over existing mechanisms and has performance close to the first-best.
#10967

(Sister Conferences Best Papers Track) The Power of Context in Networks: Ideal Point Models with Social Interactions
Mohammad T. Irfan, Tucker Gordon
Details | PDF

Algorithmic Game Theory 1

Game theory has been widely used for modeling strategic behaviors in networked multiagent systems. However, the context within which these strategic behaviors take place has received limited attention. We present a model of strategic behavior in networks that incorporates the behavioral context, focusing on the contextual aspects of congressional voting. One salient predictive model in political science is the ideal point model, which assigns each senator and each bill a number on the real line of political spectrum. We extend the classical ideal point model with network-structured interactions among senators. In contrast to the ideal point model's prediction of individual voting behavior, we predict joint voting behaviors in a game-theoretic fashion. The consideration of context allows our model to outperform previous models that solely focus on the networked interactions with no contextual parameters. We focus on two fundamental questions: learning the model using real-world data and computing stable outcomes of the model with a view to predicting joint voting behaviors and identifying most influential senators. We demonstrate the effectiveness of our model through experiments using data from the 114th U.S. Congress.

Wednesday 14 11:00 - 12:30 ML|OL - Online Learning 1 (2601-2602)

Chair: Asim Munawar

#2197

A Practical Semi-Parametric Contextual Bandit
Yi Peng, Miao Xie, Jiahao Liu, Xuying Meng, Nan Li, Cheng Yang, Tao Yao, Rong Jin
Details | PDF

Online Learning 1

Classic multi-armed bandit algorithms are inefficient for a large number of arms. On the other hand, contextual bandit algorithms are more efficient, but they suffer from a large regret due to the bias of reward estimation with finite dimensional features. Although recent studies proposed semi-parametric bandits to overcome these defects, they assume arms' features are constant over time. However, this assumption rarely holds in practice, since real-world problems often involve underlying processes that evolve over time, especially during special promotions such as Singles' Day sales. In this paper, we formulate a novel Semi-Parametric Contextual Bandit Problem to relax this assumption. For this problem, a novel Two-Steps Upper-Confidence Bound framework, called Semi-Parametric UCB (SPUCB), is presented. It can be flexibly applied to linear parametric function problems with a satisfactory gap-free bound on the n-step regret. Moreover, to make our method more practical in online systems, an optimization is proposed for dealing with high dimensional features of a linear function. Extensive experiments on synthetic data as well as a real dataset from one of the largest e-commerce platforms demonstrate the superior performance of our algorithm.
#3401

Marginal Posterior Sampling for Slate Bandits
Maria Dimakopoulou, Nikos Vlassis, Tony Jebara
Details | PDF

Online Learning 1

We introduce a new Thompson sampling-based algorithm, called marginal posterior sampling, for online slate bandits, that is characterized by three key ideas. First, it postulates that the slate-level reward is a monotone function of the marginal unobserved rewards of the base actions selected in the slate's slots, but it does not attempt to estimate this function. Second, instead of maintaining a slate-level reward posterior, the algorithm maintains posterior distributions for the marginal reward of each slot's base actions and uses the samples from these marginal posteriors to select the next slate. Third, marginal posterior sampling optimizes at the slot-level rather than the slate-level, which makes the approach computationally efficient. Simulation results establish substantial advantages of marginal posterior sampling over alternative Thompson sampling-based approaches that are widely used in the domain of web services.
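
The slot-level sampling idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the Beta(1, 1) priors, the binary slate reward, and in particular the rule of crediting the observed slate-level reward to every base action in the slate are assumptions made for the sketch.

```python
import random


class MarginalPosteriorSampler:
    """Hypothetical sketch: one Beta posterior per (slot, base action) pair."""

    def __init__(self, n_slots, n_actions, seed=0):
        self.rng = random.Random(seed)
        # Assumed Beta(1, 1) prior for every (slot, base action) pair.
        self.alpha = [[1.0] * n_actions for _ in range(n_slots)]
        self.beta = [[1.0] * n_actions for _ in range(n_slots)]

    def select_slate(self):
        """Sample each marginal posterior and take the argmax slot by slot."""
        slate = []
        for a_row, b_row in zip(self.alpha, self.beta):
            samples = [self.rng.betavariate(a, b) for a, b in zip(a_row, b_row)]
            slate.append(max(range(len(samples)), key=samples.__getitem__))
        return slate

    def update(self, slate, reward):
        """Assumed update: credit the binary slate reward to each chosen action."""
        for slot, action in enumerate(slate):
            self.alpha[slot][action] += reward
            self.beta[slot][action] += 1 - reward


sampler = MarginalPosteriorSampler(n_slots=3, n_actions=4)
slate = sampler.select_slate()
sampler.update(slate, reward=1)
```

Because the argmax is taken per slot, selecting a slate costs time linear in the number of slots times the number of base actions, rather than enumerating all possible slates.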
#3855

Learning Multi-Objective Rewards and User Utility Function in Contextual Bandits for Personalized Ranking
Nirandika Wanigasekara, Yuxuan Liang, Siong Thye Goh, Ye Liu, Joseph Jay Williams, David S. Rosenblum
Details | PDF

Online Learning 1

This paper tackles the problem of providing users with ranked lists of relevant search results, by incorporating contextual features of the users and search results, and learning how a user values multiple objectives. For example, to recommend a ranked list of hotels, an algorithm must learn which hotels are the right price for users, as well as how users vary in their weighting of price against the location. In our paper, we formulate the context-aware, multi-objective, ranking problem as a Multi-Objective Contextual Ranked Bandit (MOCR-B). To solve the MOCR-B problem, we present a novel algorithm, named Multi-Objective Utility-Upper Confidence Bound (MOU-UCB). The goal of MOU-UCB is to learn how to generate a ranked list of resources that maximizes the rewards in multiple objectives to give relevant search results. Our algorithm learns to predict rewards in multiple objectives based on contextual information (combining the Upper Confidence Bound algorithm for multi-armed contextual bandits with neural network embeddings), as well as learns how a user weights the multiple objectives. Our empirical results reveal that the ranked lists generated by MOU-UCB lead to better click-through rates, compared to approaches that do not learn the utility function over multiple reward objectives.
#5278

Perturbed-History Exploration in Stochastic Multi-Armed Bandits
Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier
Details | PDF

Online Learning 1

We propose an online algorithm for cumulative regret minimization in a stochastic multi-armed bandit. The algorithm adds O(t) i.i.d. pseudo-rewards to its history in round t and then pulls the arm with the highest average reward in its perturbed history. Therefore, we call it perturbed-history exploration (PHE). The pseudo-rewards are carefully designed to offset potentially underestimated mean rewards of arms with a high probability. We derive near-optimal gap-dependent and gap-free bounds on the n-round regret of PHE. The key step in our analysis is a novel argument that shows that randomized Bernoulli rewards lead to optimism. Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.
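
The loop described above (perturb each arm's history with pseudo-rewards, then pull the arm with the highest perturbed average) can be sketched as follows. This is a rough illustration under stated assumptions: the Bernoulli(1/2) pseudo-rewards, the parameter `a` (pseudo-rewards added per observed reward), the initial round-robin, and the simulated Bernoulli arms are all choices made for the sketch and are not taken from the abstract.

```python
import math
import random


def phe(means, n_rounds, a=2.0, seed=0):
    """Perturbed-history exploration sketch on simulated Bernoulli arms.

    means: true success probabilities (hypothetical test environment).
    a: assumed number of pseudo-rewards added per observed reward.
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k      # how often each arm was pulled
    total = [0.0] * k    # cumulative real reward of each arm

    for t in range(n_rounds):
        if t < k:
            arm = t  # pull every arm once to initialise its history
        else:
            scores = []
            for i in range(k):
                m = math.ceil(a * pulls[i])
                # Sum of m i.i.d. Bernoulli(1/2) pseudo-rewards.
                pseudo = sum(rng.random() < 0.5 for _ in range(m))
                # Average reward in the perturbed history of arm i.
                scores.append((total[i] + pseudo) / (pulls[i] + m))
            arm = max(range(k), key=scores.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        total[arm] += reward
    return pulls


pulls = phe([0.2, 0.8], n_rounds=1000)
```

The total number of pseudo-rewards drawn in round t grows linearly with the number of past pulls, which matches the O(t) description in the abstract.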
#5548

Unifying the Stochastic and the Adversarial Bandits with Knapsack
Anshuka Rangi, Massimo Franceschetti, Long Tran-Thanh
Details | PDF

Online Learning 1

This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding to the best fixed action and the total expected rewards per unit cost of the learning algorithm. We propose a novel algorithm EXP3.BwK and show that the expected regret of the algorithm is order optimal in the budget. We then propose another algorithm EXP3++.BwK, which is order optimal in the adversarial BwK setting, and incurs an almost optimal expected regret in the stochastic BwK setting where the rewards and the costs are drawn from unknown underlying distributions. These results are then extended to a more general online learning setting, by designing another algorithm EXP3++.LwK and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget. We show that for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm, and are significantly worse in comparison to the case of having costs bounded by a constant, which is a common assumption in the BwK literature.
#2097

Multi-Objective Generalized Linear Bandits
Shiyin Lu, Guanghui Wang, Yao Hu, Lijun Zhang
Details | PDF

Online Learning 1

In this paper, we study the multi-objective bandits (MOB) problem, where a learner repeatedly selects one arm to play and then receives a reward vector consisting of multiple objectives. MOB has found many real-world applications as varied as online recommendation and network routing. On the other hand, these applications typically contain contextual information that can guide the learning process which, however, is ignored by most existing work. To utilize this information, we associate each arm with a context vector and assume the reward follows the generalized linear model (GLM). We adopt the notion of Pareto regret to evaluate the learner's performance and develop a novel algorithm for minimizing it. The essential idea is to apply a variant of the online Newton step to estimate model parameters, based on which we utilize the upper confidence bound (UCB) policy to construct an approximation of the Pareto front, and then uniformly at random choose one arm from the approximate Pareto front. Theoretical analysis shows that the proposed algorithm achieves an \tilde O(d\sqrt{T}) Pareto regret, where T is the time horizon and d is the dimension of contexts, which matches the optimal result for the single-objective contextual bandit problem. Numerical experiments demonstrate the effectiveness of our method.
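
The final selection step described above (build an approximate Pareto front from per-arm UCB vectors, then pick one arm uniformly at random) can be sketched as follows. This is only an illustration: the function names are hypothetical, the UCB vectors are taken as given inputs rather than computed from a GLM estimate, and the standard dominance relation is assumed.

```python
import random


def dominates(u, v):
    """True if u is at least as good as v in every objective and better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(ucb_vectors):
    """Indices of arms whose UCB vector is not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(ucb_vectors)
        if not any(dominates(v, u) for j, v in enumerate(ucb_vectors) if j != i)
    ]


def choose_arm(ucb_vectors, rng=random):
    """Pick one arm uniformly at random from the approximate Pareto front."""
    return rng.choice(pareto_front(ucb_vectors))


# Hypothetical UCB vectors for three arms and two objectives.
ucbs = [[1.0, 2.0], [2.0, 1.0], [0.5, 0.5]]
arm = choose_arm(ucbs, random.Random(0))
```

In this example the third arm is dominated by both others, so only the first two arms can be chosen.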

Wednesday 14 11:00 - 12:30 ML|C - Classification 4 (2603-2604)

Chair: Miao Xu

#2994

SPAGAN: Shortest Path Graph Attention Network
Yiding Yang, Xinchao Wang, Mingli Song, Junsong Yuan, Dacheng Tao
Details | PDF

Classification 4

Graph convolutional networks (GCN) have recently demonstrated their potential in analyzing non-grid structure data that can be represented as graphs. The core idea is to encode the local topology of a graph, via convolutions, into the feature of a center node. In this paper, we propose a novel GCN model, which we term Shortest Path Graph Attention Network (SPAGAN). Unlike conventional GCN models that carry out node-based attentions, on either first-order neighbors or random higher-order ones, the proposed SPAGAN conducts path-based attention that explicitly accounts for the influence of a sequence of nodes yielding the minimum cost, or shortest path, between the center node and its higher-order neighbors. SPAGAN therefore allows for a more informative and intact exploration of the graph structure and, further, a more effective aggregation of information from distant neighbors, as compared to node-based GCN methods. We test SPAGAN for the downstream classification task on several standard datasets, and achieve performances superior to the state of the art.
#3128

Learn Smart with Less: Building Better Online Decision Trees with Fewer Training Examples
Ariyam Das, Jin Wang, Sahil M. Gandhi, Jae Lee, Wei Wang, Carlo Zaniolo
Details | PDF

Classification 4

Online decision tree models are extensively used in many industrial machine learning applications for real-time classification tasks. These models are highly accurate, scalable and easy to use in practice. The Very Fast Decision Tree (VFDT) is the classic online decision tree induction model that has been widely adopted due to its theoretical guarantees as well as competitive performance. However, VFDT and its variants solely rely on conservative statistical measures like Hoeffding bound to incrementally grow the tree. This makes these models extremely circumspect and limits their ability to learn fast. In this paper, we efficiently employ statistical resampling techniques to build an online tree faster using fewer examples. We first theoretically show that a naive implementation of resampling techniques like non-parametric bootstrap does not scale due to large memory and computational overheads. We mitigate this by proposing a robust memory-efficient bootstrap simulation heuristic (Mem-ES) that successfully expedites the learning process. Experimental results on both synthetic data and large-scale real world datasets demonstrate the efficiency and effectiveness of our proposed technique.
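
For context, the Hoeffding-bound split test that makes VFDT-style learners conservative can be sketched as follows. This shows the standard test from the VFDT literature, not the Mem-ES heuristic proposed in the paper; the function names and the example numbers are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean is within epsilon of the sample mean
    with probability at least 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split only when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical numbers: a gain difference of 0.05 is not enough after
# 1,000 examples, but it is after 10,000.
early = should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=1_000)
late = should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=10_000)
```

Since epsilon shrinks only as 1/sqrt(n), a leaf may have to wait for many examples before a split is accepted, which is the slow-learning behavior the abstract refers to.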
#3915

Discrete Binary Coding based Label Distribution Learning
Ke Wang, Xin Geng
Details | PDF

Classification 4

Label Distribution Learning (LDL) is a general learning paradigm in machine learning, which includes both single-label learning (SLL) and multi-label learning (MLL) as its special cases. Recently, many LDL algorithms have been proposed to handle different application tasks such as facial age estimation, head pose estimation and visual sentiment distribution prediction. However, the training time complexity of most existing LDL algorithms is too high, which makes them inapplicable to large-scale LDL. In this paper, we propose a novel LDL method to address this issue, termed Discrete Binary Coding based Label Distribution Learning (DBC-LDL). Specifically, we design an efficient discrete coding framework to learn binary codes for instances. Furthermore, both the pair-wise semantic similarities and the original label distributions are integrated into this framework to learn highly discriminative binary codes. In addition, a fast approximate nearest neighbor (ANN) search strategy is utilized to predict label distributions for testing instances. Experimental results on five real-world datasets demonstrate its superior performance over several state-of-the-art LDL methods with lower time cost.
#3982

Learning for Tail Label Data: A Label-Specific Feature Approach
Tong Wei, Wei-Wei Tu, Yu-Feng Li
Details | PDF

Classification 4

Tail label data (TLD) is prevalent in real-world tasks, and large-scale multi-label learning (LMLL) is its major learning scheme. Previous LMLL studies typically need to additionally take into account extensive head label data (HLD), and thus fail to guide the learning behavior of TLD. In many applications such as recommender systems, however, the prediction of tail labels is essential, since it provides very important supplementary information. We call this kind of problem tail label learning. In this paper, we propose a novel method for the tail label learning problem. Based on the observation that the raw feature representation in LMLL data usually benefits HLD, which may not be suitable for TLD, we construct effective and rich label-specific features through exploring labeled data distribution and leveraging label correlations. Specifically, we employ clustering analysis to explore discriminative features for each tail label replacing the original high-dimensional and sparse features. In addition, due to the scarcity of positive examples of TLD, we encode knowledge from HLD by exploiting label correlations to enhance the label-specific features. Experimental results verify the superiority of the proposed method in terms of performance on TLD.
#4952

Spatio-Temporal Attentive RNN for Node Classification in Temporal Attributed Graphs
Dongkuan Xu, Wei Cheng, Dongsheng Luo, Xiao Liu, Xiang Zhang
Details | PDF

Classification 4

Node classification in graph-structured data aims to classify the nodes where labels are only available for a subset of nodes. This problem has attracted considerable research efforts in recent years. In real-world applications, both graph topology and node attributes evolve over time. Existing techniques, however, mainly focus on static graphs and lack the capability to simultaneously learn both temporal and spatial/structural features. Node classification in temporal attributed graphs is challenging for two major aspects. First, effectively modeling the spatio-temporal contextual information is hard. Second, as temporal and spatial dimensions are entangled, to learn the feature representation of one target node, it’s desirable and challenging to differentiate the relative importance of different factors, such as different neighbors and time periods. In this paper, we propose STAR, a spatio-temporal attentive recurrent network model, to deal with the above challenges. STAR extracts the vector representation of neighborhood by sampling and aggregating local neighbor nodes. It further feeds both the neighborhood representation and node attributes into a gated recurrent unit network to jointly learn the spatio-temporal contextual information. On top of that, we take advantage of the dual attention mechanism to perform a thorough analysis on the model interpretability. Extensive experiments on real datasets demonstrate the effectiveness of the STAR model.
#2572

Worst-Case Discriminative Feature Selection
Shuangli Liao, Quanxue Gao, Feiping Nie, Yang Liu, Xiangdong Zhang
Details | PDF

Classification 4

Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems. In this paper, we propose a new criterion for discriminative feature selection, worst-case discriminative feature selection (WDFS). Unlike Fisher Score and other methods based on the discriminative criteria considering the overall (or average) separation of data, WDFS adopts a new perspective called worst-case view which arguably is more suitable for classification applications. Specifically, WDFS directly maximizes the ratio of the minimum of between-class variance of all class pairs over the maximum of within-class variance, and thus it duly considers the separation of all classes. In addition, we adopt a greedy strategy that finds one feature at a time, which is very easy to implement. Moreover, we also utilize the correlation between features to help reduce the redundancy and extend WDFS to uncorrelated WDFS (UWDFS). To evaluate the effectiveness of the proposed algorithm, we conduct classification experiments on many real data sets. In the experiments, we use the original features and the score vectors of features over all class pairs, respectively, to calculate the correlation coefficients, and analyze the experimental results in these two ways. Experimental results demonstrate the effectiveness of WDFS and UWDFS.
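
The worst-case criterion described above can be illustrated for a single feature as follows. This is a simplified sketch, not the authors' formulation: between-class separation is assumed to be the squared difference of class means for each class pair, within-class spread is assumed to be the per-class population variance, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import fmean, pvariance


def worst_case_score(values, labels):
    """Ratio of the smallest pairwise between-class separation to the largest
    within-class variance for one feature (higher is better)."""
    by_class = {}
    for x, y in zip(values, labels):
        by_class.setdefault(y, []).append(x)
    means = {c: fmean(xs) for c, xs in by_class.items()}
    # Worst (smallest) separation over all class pairs.
    min_between = min((means[a] - means[b]) ** 2 for a, b in combinations(means, 2))
    # Worst (largest) spread within any single class.
    max_within = max(pvariance(xs) for xs in by_class.values())
    return float("inf") if max_within == 0 else min_between / max_within


labels = [0, 0, 1, 1, 2, 2]
good = worst_case_score([0, 1, 4, 5, 10, 11], labels)  # classes well separated
bad = worst_case_score([0, 5, 1, 4, 2, 3], labels)     # class means coincide
```

A greedy selection in the spirit of the abstract would score every feature this way and pick the highest-scoring one at each step; an average-based criterion could rate the second feature higher than this worst-case score does whenever only some class pairs are separated.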

Wednesday 14 11:00 - 12:30 NLP|D - Dialogue (2605-2606)

Chair: Magnini Bernardo

#28

Exploiting Persona Information for Diverse Generation of Conversational Responses
Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, Ting Liu
Details | PDF

Dialogue

In human conversations, people can easily carry out and maintain a conversation because they have their own personalities in mind. Given a conversational context with persona information, how a chatbot can exploit that information to generate diverse and sustainable conversations is still a non-trivial task. Previous work on persona-based conversational models successfully makes use of predefined persona information and has shown great promise in delivering more realistic responses. However, these models all learn under the assumption that, given a source input, there is only one target response, whereas in human conversations there are many appropriate responses to a given input message. In this paper, we propose a memory-augmented architecture to exploit persona information from context and incorporate a conditional variational autoencoder model to generate diverse and sustainable conversations. We evaluate the proposed model on a benchmark persona-chat dataset. Both automatic and human evaluations show that our model can deliver more diverse and more engaging persona-based responses than baseline approaches.
#1946

Generating Multiple Diverse Responses with Multi-Mapping and Posterior Mapping Selection
Chaotao Chen, Jinhua Peng, Fan Wang, Jun Xu, Hua Wu
Details | PDF

Dialogue

In human conversation an input post is open to multiple potential responses, which is typically regarded as a one-to-many problem. Promising approaches mainly incorporate multiple latent mechanisms to build the one-to-many relationship. However, without accurate selection of the latent mechanism corresponding to the target response during training, these methods suffer from a rough optimization of latent mechanisms. In this paper, we propose a multi-mapping mechanism to better capture the one-to-many relationship, where multiple mapping modules are employed as latent mechanisms to model the semantic mappings from an input post to its diverse responses. For accurate optimization of latent mechanisms, a posterior mapping selection module is designed to select the corresponding mapping module according to the target response for further optimization. We also introduce an auxiliary matching loss to facilitate the optimization of posterior mapping selection. Empirical results demonstrate the superiority of our model in generating multiple diverse and informative responses over the state-of-the-art methods.
#1987

Learning to Select Knowledge for Response Generation in Dialog Systems
Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, Hua Wu
Details | PDF

Dialogue

End-to-end neural models for intelligent dialogue systems suffer from the problem of generating uninformative responses. Various methods were proposed to generate more informative responses by leveraging external knowledge. However, few previous works have focused on selecting appropriate knowledge in the learning process. The inappropriate selection of knowledge could prevent the model from learning to make full use of the knowledge. Motivated by this, we propose an end-to-end neural model which employs a novel knowledge selection mechanism where both prior and posterior distributions over knowledge are used to facilitate knowledge selection. Specifically, a posterior distribution over knowledge is inferred from both utterances and responses, and it ensures the appropriate selection of knowledge during the training process. Meanwhile, a prior distribution, which is inferred from utterances only, is used to approximate the posterior distribution so that appropriate knowledge can be selected even without responses during the inference process. Compared with previous work, our model can better incorporate appropriate knowledge in response generation. Experiments on both automatic and human evaluation verify the superiority of our model over previous baselines.
#2334

GSN: A Graph-Structured Network for Multi-Party Dialogues
Wenpeng Hu, Zhangming Chan, Bing Liu, Dongyan Zhao, Jinwen Ma, Rui Yan
Details | PDF

Dialogue

Existing neural models for dialogue response generation assume that utterances are sequentially organized. However, many real-world dialogues involve multiple interlocutors (i.e., multi-party dialogues), where the assumption does not hold as utterances from different interlocutors can occur "in parallel." This paper generalizes existing sequence-based models to a Graph-Structured neural Network (GSN) for dialogue modeling. The core of GSN is a graph-based encoder that can model the information flow along the graph-structured dialogues (two-party sequential dialogues are a special case). Experimental results show that GSN significantly outperforms existing sequence-based models.
#2504

Dual Visual Attention Network for Visual Dialog
Dan Guo, Hui Wang, Meng Wang
Details | PDF

Dialogue

Visual dialog is a challenging task, which involves multi-round semantic transformations between vision and language. This paper aims to address cross-modal semantic correlation for visual dialog. Motivated by the observation that Vg (global vision), Vl (local vision), Q (question) and H (history) are inseparably related, the paper proposes a novel Dual Visual Attention Network (DVAN) to realize (Vg, Vl, Q, H) --> A. DVAN is a three-stage query-adaptive attention model. In order to acquire an accurate A (answer), it first explores the textual attention, which imposes the question on the history to pick out related context H'. Then, based on Q and H', it implements respective visual attentions to discover related global image visual hints Vg' and local object-based visual hints Vl'. Next, a dual crossing visual attention is proposed. Vg' and Vl' are mutually embedded to learn the complementarity of visual semantics. Finally, the attended textual and visual features are combined to infer the answer. Experimental results on the VisDial v0.9 and v1.0 datasets validate the effectiveness of the proposed approach.
#2665

A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots
Xueliang Zhao, Chongyang Tao, Wei Wu, Can Xu, Dongyan Zhao, Rui Yan
Details | PDF

Dialogue

We present a document-grounded matching network (DGMN) for response selection that can power a knowledge-aware retrieval-based chatbot system. The challenges of building such a model lie in how to ground conversation contexts with background documents and how to recognize important information in the documents for matching. To overcome the challenges, DGMN fuses information in a document and a context into representations of each other, and dynamically determines whether grounding is necessary and the importance of different parts of the document and the context through hierarchical interaction with a response at the matching step. Empirical studies on two public data sets indicate that DGMN can significantly improve upon state-of-the-art methods and at the same time enjoys good interpretability.

Wednesday 14 11:00 - 12:30 CV|RDCIMRSI - Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3 (2501-2502)

Chair: Shiliang Zhang

#2079

Pedestrian Attribute Recognition by Joint Visual-semantic Reasoning and Knowledge Distillation
Qiaozhe Li, Xin Zhao, Ran He, Kaiqi Huang
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Pedestrian attribute recognition in surveillance is a challenging task in computer vision due to significant pose variation, viewpoint change and poor image quality. To achieve effective recognition, this paper presents a graph-based global reasoning framework to jointly model potential visual-semantic relations of attributes and distill auxiliary human parsing knowledge to guide the relational learning. The reasoning framework models attribute groups on a graph and learns a projection function to adaptively assign local visual features to the nodes of the graph. After feature projection, graph convolution is utilized to perform global reasoning between the attribute groups to model their mutual dependencies. Then, the learned node features are projected back to visual space to facilitate knowledge transfer. An additional regularization term is proposed by distilling human parsing knowledge from a pre-trained teacher model to enhance feature representations. The proposed framework is verified on three large scale pedestrian attribute datasets including PETA, RAP, and PA-100k. Experiments show that our method achieves state-of-the-art results.
#3776

Low Shot Box Correction for Weakly Supervised Object Detection
Tianxiang Pan, Bin Wang, Guiguang Ding, Jungong Han, Junhai Yong
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Weakly supervised object detection (WSOD) has been widely studied but the accuracy of state-of-the-art methods remains far lower than that of strongly supervised methods. One major reason for this huge gap is the incomplete box detection problem, which arises because most previous WSOD models are structured on classification networks and therefore tend to recognize the most discriminative parts instead of complete bounding boxes. To solve this problem, we define a low-shot weakly supervised object detection task and propose a novel low-shot box correction network to address it. The proposed task enables training object detectors on a large data set in which all images have image-level annotations, but only a small portion, or a few shots, have box annotations. Given the low-shot box annotations, we use a novel box correction network to transform the incomplete boxes into complete ones. Extensive empirical evidence shows that our proposed method yields state-of-the-art detection accuracy under various settings on the PASCAL VOC benchmark.
#3835

Transferable Adversarial Attacks for Image and Video Object Detection
Xingxing Wei, Siyuan Liang, Ning Chen, Xiaochun Cao
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Identifying adversarial examples is beneficial for understanding deep networks and developing robust models. However, existing attacking methods for image object detection have two limitations: weak transferability (the generated adversarial examples often have a low success rate when attacking other kinds of detection methods) and high computation cost (they need much time to deal with video data, where many frames need polluting). To address these issues, we present a generative method to obtain adversarial images and videos, thereby significantly reducing the processing time. To enhance transferability, we manipulate the feature maps extracted by a feature network, which usually constitutes the basis of object detectors. Our method is based on the Generative Adversarial Network (GAN) framework, where we combine a high-level class loss and a low-level feature loss to jointly train the adversarial example generator. Experimental results on PASCAL VOC and ImageNet VID datasets show that our method efficiently generates image and video adversarial examples, and more importantly, these adversarial examples have better transferability, therefore being able to simultaneously attack two kinds of representative object detection models: proposal based models like Faster-RCNN and regression based models like SSD.
#4376

Equally-Guided Discriminative Hashing for Cross-modal Retrieval
Yufeng Shi, Xinge You, Feng Zheng, Shuo Wang, Qinmu Peng
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Cross-modal hashing intends to project data from two modalities into a common Hamming space to perform cross-modal retrieval efficiently. Despite satisfactory performance achieved on real applications, existing methods are incapable of simultaneously preserving semantic structure to maintain inter-class relationships and improving discriminability to aggregate intra-class samples, which limits retrieval performance. To handle this problem, we propose Equally-Guided Discriminative Hashing (EGDH), which jointly takes into consideration semantic structure and discriminability. Specifically, we discover the connection between semantic structure preserving and discriminative methods. Based on it, we directly encode multi-label annotations that act as high-level semantic features to build a common semantic structure preserving classifier. With the common classifier to guide the learning of different modal hash functions equally, hash codes of samples are intra-class aggregated and inter-class relationship preserving. Experimental results on two benchmark datasets demonstrate the superiority of EGDH compared with the state of the art.
#151

Color-Sensitive Person Re-Identification
Guan'an Wang, Yang Yang, Jian Cheng, Jinqiao Wang, Zengguang Hou
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Recent deep Re-ID models mainly focus on learning high-level semantic features, while failing to explicitly explore color information which is one of the most important cues for person Re-ID. In this paper, we propose a novel Color-Sensitive Re-ID to take full advantage of color information. On one hand, we train our model with real and fake images. By using the extra fake images, more color information can be exploited and it can avoid overfitting during training. On the other hand, we also train our model with images of the same person with different colors. By doing so, features can be forced to focus on the color difference in regions. To generate fake images with specified colors, we propose a novel Color Translation GAN (CTGAN) to learn mappings between different clothing colors and preserve identity consistency among the same clothing color. Extensive evaluations on two benchmark datasets show that our approach significantly outperforms state-of-the-art Re-ID models.
#168

Graph Convolutional Network Hashing for Cross-Modal Retrieval
Ruiqing Xu, Chao Li, Junchi Yan, Cheng Deng, Xianglong Liu
Details | PDF

Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Deep network based cross-modal retrieval has recently made significant progress. However, bridging the modality gap to further enhance retrieval accuracy remains a crucial bottleneck. In this paper, we propose a Graph Convolutional Hashing (GCH) approach, which learns modality-unified binary codes via an affinity graph. An end-to-end deep architecture is constructed with three main components: a semantic encoder module, two feature encoding networks, and a graph convolutional network (GCN). We design a semantic encoder as a teacher module to guide the feature encoding process, a.k.a. the student module, in exploiting semantic information. Furthermore, the GCN is utilized to explore the inherent similarity structure among data points, which helps to generate discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate that the proposed GCH outperforms the state-of-the-art methods.

Wednesday 14 11:00 - 12:30 PS|TFP - Theoretical Foundations of Planning (2503-2504)

Chair: Sylvie Thiebaux

#2816

Partitioning Techniques in LTLf Synthesis
Lucas Martinelli Tabajara, Moshe Y. Vardi
Details | PDF

Theoretical Foundations of Planning

Decomposition is a general principle in computational thinking, aiming at decomposing a problem instance into easier subproblems. Indeed, decomposing a transition system into a partitioned transition relation was critical to scaling BDD-based model checking to large state spaces. Since then, it has become a standard technique for dealing with related problems, such as Boolean synthesis. More recently, partitioning has begun to be explored in the synthesis of reactive systems. LTLf synthesis, a finite-horizon version of reactive synthesis with applications in areas such as robotics, seems like a promising candidate for partitioning techniques. After all, the state of the art is based on a BDD-based symbolic algorithm similar to those from model checking, and partitioning could be a potential solution to the current bottleneck of this approach, which is the construction of the state space. In this work, however, we expose fundamental limitations of partitioning that hinder its effective application to symbolic LTLf synthesis. We not only provide evidence for this fact through an extensive experimental evaluation, but also perform an in-depth analysis to identify the reason for these results. We trace the issue to an overall increase in the size of the explored state space, caused by an inability of partitioning to fully exploit state-space minimization, which has a crucial effect on performance. We conclude that more specialized decomposition techniques are needed for LTLf synthesis which take into account the effects of minimization.
#6582

Dynamic logic of parallel propositional assignments and its applications to planning
Andreas Herzig, Frédéric Maris, Julien Vianey
Details | PDF

Theoretical Foundations of Planning

We introduce a dynamic logic with parallel composition and two kinds of nondeterministic composition, exclusive and inclusive. We show PSPACE completeness of both the model checking and the satisfiability problem and apply our logic to sequential and parallel classical planning where actions have conditional effects.
#733

Planning for LTLf/LDLf Goals in Non-Markovian Fully Observable Nondeterministic Domains
Ronen I. Brafman, Giuseppe De Giacomo
Details | PDF

Theoretical Foundations of Planning

In this paper, we investigate non-Markovian Nondeterministic Fully Observable Planning Domains (NMFONDs), variants of Nondeterministic Fully Observable Planning Domains (FONDs) where the next state is determined by the full history leading to the current state. In particular, we introduce TFONDs which are NMFONDs where conditions on the history are succinctly and declaratively specified using the linear-time temporal logic on finite traces LTLf and its extension LDLf. We provide algorithms for planning in TFONDs for general LTLf/LDLf goals, and establish tight complexity bounds w.r.t. the domain representation and the goal, separately. We also show that TFONDs are able to capture all NMFONDs in which the dependency on the history is "finite state". Finally, we show that TFONDs also capture Partially Observable Nondeterministic Planning Domains (PONDs), but without referring to unobservable variables.
#1561

Steady-State Policy Synthesis for Verifiable Control
Alvaro Velasquez
Details | PDF

Theoretical Foundations of Planning

In this paper, we introduce the Steady-State Policy Synthesis (SSPS) problem which consists of finding a stochastic decision-making policy that maximizes expected rewards while satisfying a set of asymptotic behavioral specifications. These specifications are determined by the steady-state probability distribution resulting from the Markov chain induced by a given policy. Since such distributions necessitate recurrence, we propose a solution which finds policies that induce recurrent Markov chains within possibly non-recurrent Markov Decision Processes (MDPs). The SSPS problem functions as a generalization of steady-state control, which has been shown to be in PSPACE. We improve upon this result by showing that SSPS is in P via linear programming. Our results are validated using CPLEX simulations on MDPs with over 10000 states. We also prove that the deterministic variant of SSPS is NP-hard.
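The steady-state distribution mentioned in the abstract above is the vector pi satisfying pi P = pi and sum(pi) = 1, where P is the Markov chain induced by a policy. The following is only a minimal sketch of that notion, not the linear-programming method of the paper: it builds P from a small hypothetical two-state, two-action MDP (all numbers are invented for illustration) and approximates pi by power iteration, which assumes the induced chain is irreducible and aperiodic.

```python
def induced_chain(transitions, policy):
    """Return P with P[s][t] = sum over a of policy[s][a] * transitions[s][a][t]."""
    n = len(transitions)
    return [
        [
            sum(policy[s][a] * transitions[s][a][t] for a in range(len(policy[s])))
            for t in range(n)
        ]
        for s in range(n)
    ]


def steady_state(P, iterations=10_000, tol=1e-12):
    """Approximate pi with pi P = pi by power iteration.

    Assumes the chain is irreducible and aperiodic; otherwise the
    iteration may not converge to a unique distribution.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        nxt = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical MDP: transitions[state][action] is a distribution over next states.
transitions = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
# Hypothetical stochastic policy: policy[state][action] is a probability.
policy = [
    [0.5, 0.5],
    [1.0, 0.0],
]

P = induced_chain(transitions, policy)  # [[0.55, 0.45], [0.5, 0.5]]
pi = steady_state(P)                    # approximately [10/19, 9/19]
```

A steady-state specification in this setting would be a constraint on entries of `pi`, for example requiring that the long-run fraction of time spent in state 0 stays above some threshold.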
#10960

(Sister Conferences Best Papers Track) A Refined Understanding of Cost-optimal Planning with Polytree Causal Graphs
Christer Bäckström, Peter Jonsson, Sebastian Ordyniak
Details | PDF

Theoretical Foundations of Planning

Complexity analysis based on the causal graphs of planning instances is a highly important research area. In particular, tractability results have led to new methods for constructing domain-independent heuristics. Important early examples of such results were presented by, for instance, Brafman & Domshlak and Katz & Keyder. More general results based on polytrees and bounding certain parameters were subsequently derived by Aghighi et al. and Ståhlberg. We continue this line of research by analyzing cost-optimal planning for instances with a polytree causal graph, bounded domain size and bounded depth. We show that no further restrictions are necessary for tractability, thus generalizing the previous results. Our approach is based on a novel method of closely analysing optimal plans: we recursively decompose the causal graph in a way that allows for bounding the number of variable changes as a function of the depth, using a reordering argument and a comparison with prefix trees of known size. We then transform the planning instances into tree-structured constraint satisfaction instances.
#4601

Reachability and Coverage Planning for Connected Agents
Tristan Charrier, Arthur Queffelec, Ocan Sankur, François Schwarzentruber
Details | PDF

Theoretical Foundations of Planning

Motivated by the increasing appeal of robots in information-gathering missions, we study multi-agent path planning problems in which the agents must remain interconnected. We model an area by a topological graph specifying the movement and the connectivity constraints of the agents. We study the theoretical complexity of the reachability and the coverage problems of a fleet of connected agents on various classes of topological graphs. We establish the complexity of these problems on known classes, and introduce a new class called sight-moveable graphs which admit efficient algorithms.

Wednesday 14 11:00 - 12:30 ML|DM - Data Mining 5 (2505-2506)

Chair: Hau Chan

#1007

RecoNet: An Interpretable Neural Architecture for Recommender Systems
Francesco Fusco, Michalis Vlachos, Vasileios Vasileiadis, Kathrin Wardatzky, Johannes Schneider
Details | PDF

Data Mining 5

Neural systems offer high predictive accuracy but are plagued by long training times and low interpretability. We present a simple neural architecture for recommender systems that lifts several of these shortcomings. Firstly, the approach has a high predictive power that is comparable to state-of-the-art recommender approaches. Secondly, owing to its simplicity, the trained model can be interpreted easily because it provides the individual contribution of each input feature to the decision. Our method is three orders of magnitude faster than general-purpose explanatory approaches, such as LIME. Finally, thanks to its design, our architecture addresses cold-start issues, and therefore the model does not require retraining in the presence of new users.
#1044

GSTNet: Global Spatial-Temporal Network for Traffic Flow Prediction
Shen Fang, Qi Zhang, Gaofeng Meng, Shiming Xiang, Chunhong Pan
Details | PDF

Data Mining 5

Predicting traffic flow on traffic networks is a very challenging task, due to the complicated and dynamic spatial-temporal dependencies between different nodes on the network. The traffic flow renders two types of temporal dependencies, including short-term neighboring and long-term periodic dependencies. What's more, the spatial correlations over different nodes are both local and non-local. To capture the global dynamic spatial-temporal correlations, we propose a Global Spatial-Temporal Network (GSTNet), which consists of several layers of spatial-temporal blocks. Each block contains a multi-resolution temporal module and a global correlated spatial module in sequence, which can simultaneously extract the dynamic temporal dependencies and the global spatial correlations. Extensive experiments on the real world datasets verify the effectiveness and superiority of the proposed method on both the public transportation network and the road network.
#1050

Graph Contextualized Self-Attention Network for Session-based Recommendation
Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Fuzhen Zhuang, Junhua Fang, Xiaofang Zhou
Details | PDF

Data Mining 5

Session-based recommendation, which aims to predict the user's immediate next action based on anonymous sessions, is a key task in many online services (e.g., e-commerce, media streaming). Recently, Self-Attention Network (SAN) has achieved significant success in various sequence modeling tasks without using either recurrent or convolutional network. However, SAN lacks local dependencies that exist over adjacent items and limits its capacity for learning contextualized representations of items in sequences. In this paper, we propose a graph contextualized self-attention model (GC-SAN), which utilizes both graph neural network and self-attention mechanism, for session-based recommendation. In GC-SAN, we dynamically construct a graph structure for session sequences and capture rich local dependencies via graph neural network (GNN). Then each session learns long-range dependencies by applying the self-attention mechanism. Finally, each session is represented as a linear combination of the global preference and the current interest of that session. Extensive experiments on two real-world datasets show that GC-SAN outperforms state-of-the-art methods consistently.
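The abstract above does not spell out how the session graph is built. As a rough sketch under a common convention for graph-based session recommenders (an assumption here, not a detail taken from the abstract), each distinct item becomes a node and each pair of consecutive clicks becomes a directed edge, with outgoing edge weights normalized per node. The function names are hypothetical.

```python
from collections import defaultdict


def build_session_graph(session):
    """Return (nodes, edges); edges[(u, v)] counts transitions u -> v."""
    nodes = list(dict.fromkeys(session))  # unique items in first-seen order
    edges = defaultdict(int)
    for u, v in zip(session, session[1:]):
        edges[(u, v)] += 1
    return nodes, dict(edges)


def normalized_out_adjacency(nodes, edges):
    """Row-normalized adjacency matrix over outgoing edges."""
    index = {item: i for i, item in enumerate(nodes)}
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    for (u, v), weight in edges.items():
        A[index[u]][index[v]] = float(weight)
    for row in A:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return A


# Hypothetical anonymous session: item "b" is visited twice.
session = ["a", "b", "c", "b", "d"]
nodes, edges = build_session_graph(session)
A = normalized_out_adjacency(nodes, edges)
```

A graph neural network layer would then propagate item embeddings along `A` to capture local dependencies between adjacent items, before any self-attention over the sequence.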
#2999

Outlier Detection for Time Series with Recurrent Autoencoder Ensembles
Tung Kieu, Bin Yang, Chenjuan Guo, Christian S. Jensen
Details | PDF

Data Mining 5

We propose two solutions to outlier detection in time series based on recurrent autoencoder ensembles. The solutions exploit autoencoders built using sparsely-connected recurrent neural networks (S-RNNs). Such networks make it possible to generate multiple autoencoders with different neural network connection structures. The two solutions are ensemble frameworks, specifically an independent framework and a shared framework, both of which combine multiple S-RNN based autoencoders to enable outlier detection. This ensemble-based approach aims to reduce the effects of some autoencoders being overfitted to outliers, this way improving overall detection quality. Experiments with two large real-world time series data sets, including univariate and multivariate time series, offer insight into the design properties of the proposed frameworks and demonstrate that the resulting solutions are capable of outperforming both baselines and the state-of-the-art methods.
#5978

Similarity Preserving Representation Learning for Time Series Clustering
Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, Inderjit S. Dhillon
Details | PDF

Data Mining 5

A considerable number of clustering algorithms take instance-feature matrices as their inputs. As such, they cannot directly analyze time series data due to its temporal nature, usually unequal lengths, and complex properties. This is a great pity since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that is able to convert a set of time series with various lengths to an instance-feature matrix. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation, thus the learned feature representation is particularly suitable for the time series clustering task. Given a set of $n$ time series, we first construct an $n\times n$ partially-observed similarity matrix by randomly sampling $\mathcal{O}(n \log n)$ pairs of time series and computing their pairwise similarities. We then propose an efficient algorithm that solves a non-convex and NP-hard problem to learn new features based on the partially-observed similarity matrix. By conducting extensive empirical studies, we demonstrate that the proposed framework is much more effective, efficient, and flexible compared to other state-of-the-art clustering methods.
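The first step described above, sampling on the order of n log n pairs and computing only those similarities, can be sketched as follows. This is an illustrative sketch only: the constant `c`, the function names, and the stand-in similarity (negative gap between series means) are assumptions made here, since the abstract does not fix them, and the later feature-learning step is not shown.

```python
import math
import random
from statistics import fmean


def sample_pairs(n, c=1.0, seed=0):
    """Sample about c * n * log(n) distinct unordered pairs (i, j) with i < j."""
    rng = random.Random(seed)
    total = n * (n - 1) // 2  # number of distinct pairs available
    target = min(total, math.ceil(c * n * math.log(n)))
    pairs = set()
    while len(pairs) < target:
        i, j = rng.sample(range(n), 2)
        pairs.add((min(i, j), max(i, j)))
    return pairs


def partial_similarity(series, similarity, pairs):
    """Sparse, partially observed similarity matrix as a dict keyed by (i, j)."""
    return {(i, j): similarity(series[i], series[j]) for i, j in pairs}


def mean_gap_similarity(a, b):
    """Stand-in similarity that tolerates unequal lengths (an assumption)."""
    return -abs(fmean(a) - fmean(b))


# Eight hypothetical series of different lengths; series i is constant at value i.
series = [[float(i)] * (3 + i) for i in range(8)]
pairs = sample_pairs(len(series))
observed = partial_similarity(series, mean_gap_similarity, pairs)
```

Only `len(pairs)` similarity evaluations are needed instead of all n(n-1)/2, which matters when each evaluation is expensive, as alignment-based time series similarities tend to be.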
#6128

DyAt Nets: Dynamic Attention Networks for State Forecasting in Cyber-Physical Systems
Nikhil Muralidhar, Sathappan Muthiah, Naren Ramakrishnan
Details | PDF

Data Mining 5

Multivariate time series forecasting is an important task in state forecasting for cyber-physical systems (CPS). State forecasting in CPS is imperative for optimal planning of system energy utility and understanding normal operational characteristics of the system, thus enabling anomaly detection. Forecasting models can also be used to identify sub-optimal or worn out components and are thereby useful for overall system monitoring. Most existing work only performs single step forecasting, but in CPS it is imperative to forecast the next sequence of system states (i.e., curve forecasting). In this paper, we propose DyAt (Dynamic Attention) networks, a novel deep learning sequence to sequence (Seq2Seq) model with a novel hierarchical attention mechanism for long-term time series state forecasting. We evaluate our method on several CPS state forecasting and electric load forecasting tasks and find that our proposed DyAt models yield a performance improvement of at least 13.69% for the CPS state forecasting task and a performance improvement of at least 18.83% for the electric load forecasting task over other state-of-the-art forecasting baselines. We perform rigorous experimentation with several variants of the DyAt model and demonstrate that the DyAt models indeed learn better representations over the entire course of the long term forecast as compared to their counterparts with or without traditional attention mechanisms. All data and source code have been made available online.

Wednesday 14 11:00 - 12:30 ML|TAML - Transfer, Adaptation, Multi-task Learning 2 (2401-2402)

Chair: Boyu Wang

#2810

Metadata-driven Task Relation Discovery for Multi-task Learning
Zimu Zheng, Yuqi Wang, Quanyu Dai, Huadi Zheng, Dan Wang
Details | PDF

Transfer, Adaptation, Multi-task Learning 2

Task Relation Discovery (TRD), i.e., revealing the relations among tasks, has notable value: it is the key concept underlying Multi-task Learning (MTL) and provides a principled way of identifying redundancies across tasks. However, task relations are usually determined manually by data scientists, resulting in additional human effort for TRD, while transfer based on brute-force methods or mere training samples may cause negative effects which degrade the learning performance. To avoid negative transfer in an automatic manner, our idea is to leverage context attributes that are commonly available in today's systems, i.e., the metadata. In this paper, we, for the first time, introduce metadata into TRD for MTL and propose a novel Metadata Clustering method, which jointly uses historical samples and additional metadata to automatically exploit the true relatedness. It also avoids negative transfer by identifying reusable samples between related tasks. Experimental results on five real-world datasets demonstrate that the proposed method is effective for MTL with TRD, and particularly useful in complicated systems with diverse metadata but insufficient data samples. In general, this study helps in automatic relation discovery among partially related tasks and sheds new light on the development of TRD in MTL through the use of metadata as a priori information.
#4660

Group LASSO with Asymmetric Structure Estimation for Multi-Task Learning
Saullo H. G. Oliveira, André R. Gonçalves, Fernando J. Von Zuben
Details | PDF

Transfer, Adaptation, Multi-task Learning 2

Group LASSO is a widely used regularization that imposes sparsity considering groups of covariates. When used in Multi-Task Learning (MTL) formulations, it makes an underlying assumption that if one group of covariates is not relevant for one or a few tasks, it is also not relevant for all tasks, thus implicitly assuming that all tasks are related. This implication can easily lead to negative transfer if this assumption does not hold for all tasks. Since for most practical applications we hardly know a priori how the tasks are related, several approaches have been conceived in the literature to (i) properly capture the transference structure, (ii) improve interpretability of the task interplay, and (iii) penalize potential negative transfer. Recently, the automatic estimation of asymmetric structures inside the learning process was capable of effectively avoiding negative transfer. Our proposal is the first attempt in the literature to conceive a Group LASSO with asymmetric transference formulation, looking for the best of both worlds in a framework that admits the overlap of groups. The resulting optimization problem is solved by an alternating procedure with fast methods. We performed experiments using synthetic and real datasets to compare our proposal with state-of-the-art approaches, evidencing the promising predictive performance and distinguished interpretability of our proposal. The real case study involves the prediction of cognitive scores for Alzheimer's disease progression assessment. The source code is available on GitHub.
#5047

Meta-Learning for Low-resource Natural Language Generation in Task-oriented Dialogue Systems
Fei Mi, Minlie Huang, Jiyong Zhang, Boi Faltings
Details | PDF

Transfer, Adaptation, Multi-task Learning 2

Natural language generation (NLG) is an essential component of task-oriented dialogue systems. Despite the recent success of neural approaches for NLG, they are typically developed for particular domains with rich annotated training examples. In this paper, we study NLG in a low-resource setting to generate sentences in new scenarios with a handful of training examples. We formulate the problem from a meta-learning perspective, and propose a generalized optimization-based approach (Meta-NLG) based on the well-recognized model-agnostic meta-learning (MAML) algorithm. Meta-NLG defines a set of meta tasks, and directly incorporates the objective of adapting to new low-resource NLG tasks into the meta-learning optimization process. Extensive experiments are conducted on a large multi-domain dataset (MultiWoz) with diverse linguistic variations. We show that Meta-NLG significantly outperforms other training procedures in various low-resource configurations. We analyze the results, and demonstrate that Meta-NLG adapts extremely fast and well to low-resource situations.
#5499

One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Networks
Tao He, Yuan-Fang Li, Lianli Gao, Dongxiang Zhang, Jingkuan Song
Details | PDF

Transfer, Adaptation, Multi-task Learning 2

With the recent explosive increase of digital data, image recognition and retrieval have become a critical practical application. Hashing is an effective solution to this problem, due to its low storage requirement and high query speed. However, most past works focus on hashing in a single (source) domain. Thus, the learned hash function may not adapt well to a new (target) domain that has a large distributional difference from the source domain. In this paper, we explore an end-to-end domain adaptive learning framework that simultaneously and precisely generates discriminative hash codes and classifies target domain images. Our method encodes images from two domains into a common semantic space, followed by two independent generative adversarial networks aiming at crosswise reconstruction of the two domains’ images, reducing domain disparity and improving alignment in the shared space. We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to the other state-of-the-art methods on the tasks of object recognition and image retrieval.
#1358

Progressive Transfer Learning for Person Re-identification
Zhengxu Yu, Zhongming Jin, Long Wei, Jishun Guo, Jianqiang Huang, Deng Cai, Xiaofei He, Xian-Sheng Hua
Details | PDF

Transfer, Adaptation, Multi-task Learning 2

Model fine-tuning is a widely used transfer learning approach in person Re-identification (ReID) applications, which fine-tunes a pre-trained feature extraction model for the target scenario instead of training a model from scratch. It is challenging due to the significant variations inside the target scenario, e.g., different camera viewpoints, illumination changes, and occlusion. These variations result in a gap between the distribution of each mini-batch and the distribution of the whole dataset when using mini-batch training. In this paper, we study model fine-tuning from the perspective of the aggregation and utilization of the global information of the dataset when using mini-batch training. Specifically, we introduce a novel network structure called Batch-related Convolutional Cell (BConv-Cell), which progressively collects the global information of the dataset into a latent state and uses this latent state to rectify the extracted feature. Based on BConv-Cells, we further propose the Progressive Transfer Learning (PTL) method to facilitate the model fine-tuning process by jointly training the BConv-Cells and the pre-trained ReID model. Empirical experiments show that our proposal can improve the performance of the ReID model greatly on MSMT17, Market-1501, CUHK03 and DukeMTMC-reID datasets. The code will be released later at https://github.com/ZJULearning/PTL
#2913

Complementary Learning for Overcoming Catastrophic Forgetting Using Experience Replay
Mohammad Rostami, Soheil Kolouri, Praveen K. Pilly
Details | PDF

Transfer, Adaptation, Multi-task Learning 2

Despite huge success, deep networks are unable to learn effectively in sequential multitask learning settings as they forget the past learned tasks after learning new tasks. Inspired by complementary learning systems theory, we address this challenge by learning a generative model that couples the current task to the past learned tasks through a discriminative embedding space. We learn an abstract generative distribution in the embedding that allows generation of data points to represent past experience. We sample from this distribution and utilize experience replay to avoid forgetting and simultaneously accumulate new knowledge to the abstract distribution in order to couple the current task with past experience. We demonstrate theoretically and empirically that our framework learns a distribution in the embedding, which is shared across all tasks, and as a result tackles catastrophic forgetting.

Wednesday 14 11:00 - 12:30 HSGP|HS - Heuristic Search 1 (2403-2404)

Chair: Ariel Felner

#762

Depth-First Memory-Limited AND/OR Search and Unsolvability in Cyclic Search Spaces
Akihiro Kishimoto, Adi Botea, Radu Marinescu
Details | PDF

Heuristic Search 1

Computing cycle-free solutions in cyclic AND/OR search spaces is an important AI problem. Previous work on optimal depth-first search strongly assumes the use of consistent heuristics, the need to keep all examined states in a transposition table, and the existence of solutions. We give a new theoretical analysis under relaxed assumptions where previous results no longer hold. We then present a generic approach to proving unsolvability, and apply it to RBFAOO and BLDFS, two state-of-the-art algorithms. We demonstrate the performance in domain-independent nondeterministic planning.
#1195

Conditions for Avoiding Node Re-expansions in Bounded Suboptimal Search
Jingwei Chen, Nathan R. Sturtevant
Details | PDF

Heuristic Search 1

Many practical problems are too difficult to solve optimally, motivating the need to find suboptimal solutions, particularly those with bounds on the final solution quality. Algorithms like Weighted A*, A*-epsilon, Optimistic Search, EES, and DPS have been developed to find suboptimal solutions with solution quality that is within a constant bound of the optimal solution. However, with the exception of weighted A*, all of these algorithms require performing node re-expansions during search. This paper explores the properties of priority functions that can find bounded suboptimal solutions without requiring node re-expansions. After general bounds are developed, two new convex priority functions are developed that can outperform weighted A*.
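As a point of reference for the baseline named above, here is a minimal sketch of Weighted A* with the priority function f(n) = g(n) + w * h(n) and no node re-expansions. It is not one of the new priority functions from the paper. The grid, the Manhattan heuristic, and all function names are hypothetical choices for illustration; the sketch assumes a consistent heuristic, under which the returned cost is at most w times the optimal cost.

```python
import heapq

# Hypothetical 5x5 grid: '#' is a wall, '.' is free; moves are 4-connected, unit cost.
GRID = [
    "...#.",
    ".#.#.",
    ".#...",
    ".##.#",
    ".....",
]
START, GOAL = (0, 0), (4, 3)


def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield (nr, nc), 1


def manhattan(cell):
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])


def weighted_a_star(start, goal, neighbors, h, w=1.5):
    """Return (cost, path) or None; each node is expanded at most once."""
    open_heap = [(w * h(start), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry for an already expanded node
        closed.add(node)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g, path[::-1]
        for nxt, cost in neighbors(node):
            if nxt in closed:
                continue  # never re-open an expanded node
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(open_heap, (new_g + w * h(nxt), new_g, nxt))
    return None


optimal_cost, _ = weighted_a_star(START, GOAL, neighbors, manhattan, w=1.0)
bounded_cost, bounded_path = weighted_a_star(START, GOAL, neighbors, manhattan, w=1.5)
```

Setting `w=1.0` recovers plain A*, so comparing the two calls gives a quick check that the weighted result stays within the factor `w` of the optimum.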
#5894

A*+IDA*: A Simple Hybrid Search Algorithm
Zhaoxing Bu, Richard E. Korf
Details | PDF

Heuristic Search 1

We present a simple combination of A* and IDA*, which we call A*+IDA*. It runs A* until memory is almost exhausted, then runs IDA* below each frontier node without duplicate checking. It is widely believed that this algorithm is called MREC, but MREC is just IDA* with a transposition table. A*+IDA* is the first algorithm to run significantly faster than IDA* on the 24-Puzzle, by a factor of almost 5. A complex algorithm called dual search was reported to significantly outperform IDA* on the 24-Puzzle, but the original version does not. We made improvements to dual search and our version combined with A*+IDA* outperforms IDA* by a factor of 6.7 on the 24-Puzzle. Our disk-based A*+IDA* shows further improvement on several hard 24-Puzzle instances. We also found optimal solutions to a subset of random 27 and 29-Puzzle problems. A*+IDA* does not outperform IDA* on Rubik’s Cube, for reasons we explain.
#5795

Heuristic Search for Homology Localization Problem and Its Application in Cardiac Trabeculae Reconstruction
Xudong Zhang, Pengxiang Wu, Changhe Yuan, Yusu Wang, Dimitris Metaxas, Chao Chen
Details | PDF

Heuristic Search 1

Cardiac trabeculae are fine rod-like muscles whose ends are attached to the inner walls of ventricles. Accurate extraction of trabeculae is important yet challenging, due to the background noise and limited resolution of cardiac images. Existing works proposed to handle this task by modeling the trabeculae as topological handles for better extraction. Computing optimal representation of these handles is essential yet very expensive. In this work, we formulate the problem as a heuristic search problem, and propose novel heuristic functions based on advanced topological techniques. We show in experiments that the proposed heuristic functions improve the computation in both time and memory.
#1078

Iterative Budgeted Exponential Search
Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant
Details | PDF

Heuristic Search 1

We tackle two long-standing problems related to re-expansions in heuristic search algorithms. For graph search, A* can require Ω(2ⁿ) expansions, where n is the number of states within the final f bound. Existing algorithms that address this problem like B and B’ improve this bound to Ω(n²). For tree search, IDA* can also require Ω(n²) expansions. We describe a new algorithmic framework that iteratively controls an expansion budget and solution cost limit, giving rise to new graph and tree search algorithms for which the number of expansions is O(n log C*), where C* is the optimal solution cost. Our experiments show that the new algorithms are robust in scenarios where existing algorithms fail. In the case of tree search, our new algorithms have no overhead over IDA* in scenarios to which IDA* is well suited and can therefore be recommended as a general replacement for IDA*.

Wednesday 14 11:00 - 12:30 DemoT3 - Demo Talks 3 (2306)

Chair: Han Yu

#11020

Crowd View: Converting Investors' Opinions into Indicators
Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen
Details | PDF

Demo Talks 3

This paper demonstrates an opinion indicator (OI) generation system, named Crowd View, with which traders can refer to the fine-grained opinions, beyond the market sentiment (bullish/bearish), from crowd investors when trading financial instruments. We collect the real-time textual information from Twitter, and convert it into five kinds of OIs, including the support price level, resistance price level, price target, buy-side cost, and sell-side cost. The OIs for all component stocks in Dow Jones Industrial Average Index (DJI) are provided, and shown with the real-time stock price for comparison and analysis. The information embedding in the OIs and the application scenarios are introduced.
#11021

OpenMarkov, an Open-Source Tool for Probabilistic Graphical Models
Manuel Arias, Jorge Pérez-Martín, Manuel Luque, Francisco J. Díez
Details | PDF

Demo Talks 3

OpenMarkov is a Java open-source tool for creating and evaluating probabilistic graphical models, including Bayesian networks, influence diagrams, and some Markov models. With more than 100,000 lines of code, it offers some features for interactive learning, explanation of reasoning, and cost-effectiveness analysis, which are not available in any other tool. OpenMarkov has been used at universities, research centers, and large companies in more than 30 countries on four continents. Several models, some of them for real-world medical applications, built with OpenMarkov, are publicly available on the Internet.
#11031

ACTA A Tool for Argumentative Clinical Trial Analysis
Tobias Mayer, Elena Cabrio, Serena Villata
Details | PDF

Demo Talks 3

Argumentative analysis of textual documents of various kinds (e.g., persuasive essays, online discussion blogs, scientific articles) makes it possible to detect the main argumentative components (i.e., premises and claims) present in the text and to predict whether these components are connected to each other by argumentative relations (e.g., support and attack), leading to the identification of (possibly complex) argumentative structures. Given the importance of argument-based decision making in medicine, in this demo paper we introduce ACTA, a tool for automating the argumentative analysis of clinical trials. The tool is designed to support doctors and clinicians in identifying the document(s) of interest about a certain disease, and in analyzing the main argumentative content and PICO elements.
#11047

VEST: A System for Vulnerability Exploit Scoring & Timing
Haipeng Chen, Jing Liu, Rui Liu, Noseong Park, V. S. Subrahmanian
Details | PDF

Demo Talks 3

Knowing if/when a cyber-vulnerability will be exploited and how severe the vulnerability is can help enterprise security officers (ESOs) come up with appropriate patching schedules. Today, this ability is severely compromised: our study of data from Mitre and NIST shows that on average there is a 132-day gap between the announcement of a vulnerability by Mitre and the time NIST provides an analysis with severity score estimates and 8 important severity attributes. Many attacks happen during this very 132-day window. We present Vulnerability Exploit Scoring & Timing (VEST), a system for (early) prediction and visualization of if/when a vulnerability will be exploited, and its estimated severity attributes and score.
#11037

The pywmi Framework and Toolbox for Probabilistic Inference using Weighted Model Integration
Samuel Kolb, Paolo Morettin, Pedro Zuidberg Dos Martires, Francesco Sommavilla, Andrea Passerini, Roberto Sebastiani, Luc De Raedt
Details | PDF

Demo Talks 3

Weighted Model Integration (WMI) is a popular technique for probabilistic inference that extends Weighted Model Counting (WMC), the standard technique for inference in discrete domains, to domains with both discrete and continuous variables. However, existing WMI solvers each have different interfaces and use different formats for representing WMI problems. Therefore, we introduce pywmi (http://pywmi.org), an open source framework and toolbox for probabilistic inference using WMI, to address these shortcomings. Crucially, pywmi fixes a common internal format for WMI problems and introduces a common interface for WMI solvers. To assist users in modeling WMI problems, pywmi introduces modeling languages based on SMT-LIB.v2 or MiniZinc and parsers for both. To assist users in comparing WMI solvers, pywmi includes implementations of several state-of-the-art solvers, a fast approximate WMI solver, and a command-line interface to solve WMI problems. Finally, to assist developers in implementing new solvers, pywmi provides Python implementations of commonly used subroutines.
#11046

CoTrRank: Trust Evaluation of Users and Tweets
Peiyao Li, Weiliang Zhao, Jian Yang, Jia Wu
Details | PDF

Demo Talks 3

Trust evaluation of people and information on Twitter is critical for maintaining a healthy online social environment. How to evaluate the trustworthiness of users and tweets has become a challenging question. In this demo, we show how our proposed CoTrRank approach deals with this problem. This approach models users and tweets in two coupled networks and calculates their trust values in different trust spaces. In particular, our solution provides a configurable way of mapping the calculated raw evidence to the trust values. The CoTrRank demo system has an interactive interface to show how our proposed approach produces more effective and adaptive trust evaluation results compared with baseline methods.
#11035

Intelligent Decision Support for Improving Power Management
Yongqing Zheng, Han Yu, Kun Zhang, Yuliang Shi, Cyril Leung, Chunyan Miao
Details | PDF

Demo Talks 3

With the development and adoption of the electricity information tracking system in China, real-time electricity consumption big data have become available to enable artificial intelligence (AI) to help power companies and the urban management departments to make demand side management decisions. We demonstrate the Power Intelligent Decision Support (PIDS) platform, which can generate Orderly Power Utilization (OPU) decision recommendations and perform Demand Response (DR) implementation management based on a short-term load forecasting model. It can also provide different users with query and application functions to facilitate explainable decision support.
#11053

GraspSnooker: Automatic Chinese Commentary Generation for Snooker Videos
Zhaoyue Sun, Jiaze Chen, Hao Zhou, Deyu Zhou, Lei Li, Mingmin Jiang
Details | PDF

Demo Talks 3

We demonstrate a web-based software system, GraspSnooker, which is able to automatically generate Chinese text commentaries for snooker game videos. It consists of a video analyzer, a strategy predictor and a commentary generator. As far as we know, it is the first attempt at snooker commentary generation, which might help snooker learners understand the game.
#11042

SAGE: A Hybrid Geopolitical Event Forecasting System
Fred Morstatter, Aram Galstyan, Gleb Satyukov, Daniel Benjamin, Andres Abeliuk, Mehrnoosh Mirtaheri, KSM Tozammel Hossain, Pedro Szekely, Emilio Ferrara, Akira Matsui, Mark Steyvers, Stephen Bennet, David Budescu, Mark Himmelstein, Michael Ward, Andreas Beger, Michele Catasta, Rok Sosic, Jure Leskovec, Pavel Atanasov, Regina Joseph, Rajiv Sethi, Ali Abbas
Details | PDF

Demo Talks 3

Forecasting of geopolitical events is a notoriously difficult task, with experts failing to significantly outperform a random baseline across many types of forecasting events. One successful way to increase the performance of forecasting tasks is to turn to crowdsourcing: leveraging many forecasts from non-expert users. Simultaneously, advances in machine learning have led to models that can produce reasonable, although not perfect, forecasts for many tasks. Recent efforts have shown that forecasts can be further improved by "hybridizing" human forecasters: pairing them with the machine models in an effort to combine the unique advantages of both. In this demonstration, we present Synergistic Anticipation of Geopolitical Events (SAGE), a platform for human/computer interaction that facilitates human reasoning with machine models.

Wednesday 14 14:00 - 14:30 Industry Days (K)

Chair: Quan Lu (Alibaba Group)

Lenovo Enterprise Analytics Platform: LEAP AI and Case Studies
Jun Luo, Director and Principal Researcher of Lenovo Machine Intelligence Center, Lenovo

Industry Days

Wednesday 14 14:00 - 14:50 Invited Talk (D-I)

Chair: Raz Lin

Deep Learning: Why deep and is it only doable for neural networks?
Zhi-Hua Zhou

Invited Talk

Wednesday 14 14:30 - 15:00 Industry Days (K)

Chair: Quan Lu (Alibaba Group)

Automation AI - Infusing AI into an Enterprise
Yonghua Lin, Director, IBM Research China

Industry Days

Wednesday 14 15:00 - 15:30 Industry Days (K)

Chair: Quan Lu (Alibaba Group)

AI in Digital Banking
Tianjian Chen, Deputy General Manager of WeBank AI Department, WeBank

Industry Days

Wednesday 14 15:00 - 16:00 AI-HWB - ST: AI for Improving Human Well-Being 3 (J)

Chair: Gerald Steinbauer

#2209

A Decomposition Approach for Urban Anomaly Detection Across Spatiotemporal Data
Mingyang Zhang, Tong Li, Hongzhi Shi, Yong Li, Pan Hui
Details | PDF

ST: AI for Improving Human Well-Being 3

Urban anomalies such as abnormal flow of crowds and traffic accidents could result in loss of life or property if not handled properly. Detecting urban anomalies at an early stage is important to minimize the adverse effects. However, urban anomaly detection is difficult due to two challenges: a) the criteria for urban anomalies vary with location and time; b) urban anomalies of different types may show different signs. In this paper, we propose a decomposition approach to address these two challenges. Specifically, we decompose urban dynamics into the normal component and the abnormal component. The normal component is determined solely by spatiotemporal features, while the abnormal component is caused by anomalous events. Then, we extract spatiotemporal features and estimate the normal component accordingly. Finally, we derive the abnormal component to identify anomalies. We evaluate our method using both real-world and synthetic datasets. The results show our method can detect meaningful events and outperforms state-of-the-art anomaly detection methods by a large margin.
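As a rough illustration of the decomposition idea in this abstract, the minimal sketch below splits observed values into a normal and an abnormal component. It is not the authors' method: the per-(location, time slot) median as the estimate of the normal component, the fixed residual threshold, and all names (`decompose`, `flag_anomalies`, `flows`) are assumptions made only for this example.

```python
from statistics import median


def decompose(observations):
    """Split observations into a normal and an abnormal component.

    observations maps a (location, time_slot) key to a list of values,
    one per day. Assumption for this sketch: the normal component is the
    median over days, and the abnormal component is the residual.
    """
    normal = {key: median(values) for key, values in observations.items()}
    abnormal = {
        key: [value - normal[key] for value in values]
        for key, values in observations.items()
    }
    return normal, abnormal


def flag_anomalies(abnormal, threshold):
    """Return sorted (key, day_index) pairs whose residual magnitude exceeds threshold."""
    return sorted(
        (key, day)
        for key, residuals in abnormal.items()
        for day, residual in enumerate(residuals)
        if abs(residual) > threshold
    )


# Hypothetical crowd-flow counts at 08:00 for two locations over five days.
flows = {
    ("station_a", 8): [100, 102, 98, 101, 180],
    ("station_b", 8): [20, 22, 19, 21, 20],
}
normal, abnormal = decompose(flows)
anomalies = flag_anomalies(abnormal, threshold=30)
print(anomalies)
```

In this toy data only the fifth day at `station_a` deviates strongly from its usual level, so it is the only flagged entry. A learned estimator over spatiotemporal features would replace the median in a realistic setting.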
#3473

RDPD: Rich Data Helps Poor Data via Imitation
Shenda Hong, Cao Xiao, Trong Nghia Hoang, Tengfei Ma, Hongyan Li, Jimeng Sun
Details | PDF

ST: AI for Improving Human Well-Being 3

In many situations, we need to build and deploy separate models in related environments with different data qualities. For example, an environment with strong observation equipment (e.g., intensive care units) often provides high-quality multi-modal data, which are acquired from multiple sensory devices and have rich-feature representations. On the other hand, an environment with poor observation equipment (e.g., at home) only provides low-quality, uni-modal data with poor-feature representations. To deploy a competitive model in a poor-data environment without requiring direct access to multi-modal data acquired from a rich-data environment, this paper develops and presents a knowledge distillation (KD) method (RDPD) to enhance a predictive model trained on poor data using knowledge distilled from a high-complexity model trained on rich, private data. We evaluated RDPD on three real-world datasets and showed that its distilled model consistently outperformed all baselines across all datasets, achieving its greatest performance improvement over a model trained only on low-quality data of 24.56% on PR-AUC and 12.21% on ROC-AUC, and over a state-of-the-art KD model of 5.91% on PR-AUC and 4.44% on ROC-AUC.
#215

Systematic Conservation Planning for Sustainable Land-use Policies: A Constrained Partitioning Approach to Reserve Selection and Design
Dimitri Justeau-Allaire, Philippe Vismara, Philippe Birnbaum, Xavier Lorca
Details | PDF

ST: AI for Improving Human Well-Being 3

Faced with natural habitat degradation, fragmentation, and destruction, it is a major challenge for environmental managers to implement sustainable land use policies promoting socioeconomic development and natural habitat conservation in a balanced way. Relying on artificial intelligence and operational research, reserve selection and design models can be of assistance. This paper introduces a partitioning approach based on Constraint Programming (CP) for the reserve selection and design problem, dealing with both coverage and complex spatial constraints. Moreover, it introduces the first CP formulation of the buffer zone constraint, which can be reused to compose more complex spatial constraints. This approach has been evaluated on a real-world dataset addressing the problem of forest fragmentation in New Caledonia, a biodiversity hotspot where managers are gaining interest in integrating these methods into their decision processes. Through several scenarios, it showed expressiveness, flexibility, and the ability to quickly find solutions to complex questions.
#4746

Balanced Ranking with Diversity Constraints
Ke Yang, Vasilis Gkatzelis, Julia Stoyanovich
Details | PDF

ST: AI for Improving Human Well-Being 3

Many set selection and ranking algorithms have recently been enhanced with diversity constraints that aim to explicitly increase representation of historically disadvantaged populations, or to improve the overall representativeness of the selected set. An unintended consequence of these constraints, however, is reduced in-group fairness: the selected candidates from a given group may not be the best ones, and this unfairness may not be well-balanced across groups. In this paper we study this phenomenon using datasets that comprise multiple sensitive attributes. We then introduce additional constraints, aimed at balancing the in-group fairness across groups, and formalize the induced optimization problems as integer linear programs. Using these programs, we conduct an experimental evaluation with real datasets, and quantify the feasible trade-offs between balance and overall performance in the presence of diversity constraints.

Wednesday 14 15:00 - 16:00 ML|AML - Adversarial Machine Learning 2 (L)

Chair: Yuan-Fang Li

#5340

Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective
Kaidi Xu, Hongge Chen, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Mingyi Hong, Xue Lin
Details | PDF

Adversarial Machine Learning 2

Graph neural networks (GNNs), which apply deep neural networks to graph data, have achieved significant performance on the task of semi-supervised node classification. However, only a few works have addressed the adversarial robustness of GNNs. In this paper, we first present a novel gradient-based attack method that alleviates the difficulty of tackling discrete graph data. Compared with current adversarial attacks on GNNs, the results show that by perturbing only a small number of edges, including addition and deletion, our optimization-based attack can lead to a noticeable decrease in classification performance. Moreover, leveraging our gradient-based attack, we propose the first optimization-based adversarial training for GNNs. Our method yields higher robustness against both gradient-based and greedy attack methods without sacrificing classification accuracy on the original graph.
#5480

Feature Prioritization and Regularization Improve Standard Accuracy and Adversarial Robustness
Chihuang Liu, Joseph JaJa
Details | PDF

Adversarial Machine Learning 2

Adversarial training has been successfully applied to build robust models at a certain cost. While the robustness of a model increases, the standard classification accuracy declines. This phenomenon is suggested to be an inherent trade-off. We propose a model that employs feature prioritization by a nonlinear attention module and L2 feature regularization to improve the adversarial robustness and the standard accuracy relative to adversarial training. The attention module encourages the model to rely heavily on robust features by assigning larger weights to them while suppressing non-robust features. The regularizer encourages the model to extract similar features for the natural and adversarial images, effectively ignoring the added perturbation. In addition to evaluating the robustness of our model, we provide justification for the attention module and propose a novel experimental strategy that quantitatively demonstrates that our model is almost ideally aligned with salient data characteristics. Additional experimental results illustrate the power of our model relative to state-of-the-art methods.
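One plausible way to write the combination of adversarial training with an L2 feature regularizer described in this abstract is sketched below. The notation is an assumption introduced for illustration and is not taken from the paper: f is a feature extractor, h a classifier head, x_adv an adversarial version of the natural input x, and λ a weighting coefficient. The paper's exact objective may differ.

```latex
\mathcal{L}(x, y) \;=\;
  \mathcal{L}_{\mathrm{CE}}\bigl(h(f(x_{\mathrm{adv}})),\, y\bigr)
  \;+\; \lambda \,\bigl\lVert f(x) - f(x_{\mathrm{adv}}) \bigr\rVert_2^2
```

The second term is small when the features of the natural and adversarial inputs are close, which matches the stated goal of extracting similar features for both and effectively ignoring the perturbation.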
#5913

Interpreting and Evaluating Neural Network Robustness
Fuxun Yu, Zhuwei Qin, Chenchen Liu, Liang Zhao, Yanzhi Wang, Xiang Chen
Details | PDF

Adversarial Machine Learning 2

Recently, adversarial deception has become one of the most considerable threats to deep neural networks. However, compared to extensive research on new designs of various adversarial attacks and defenses, the intrinsic robustness property of neural networks still lacks thorough investigation. This work aims to qualitatively interpret the adversarial attack and defense mechanisms through loss visualization, and establish a quantitative metric to evaluate the model's intrinsic robustness. The proposed robustness metric identifies the upper bound of a model's prediction divergence in the given domain and thus indicates whether the model can maintain a stable prediction. With extensive experiments, our metric demonstrates several advantages over conventional testing-accuracy-based robustness estimation: (1) it provides a uniform evaluation for models with different structures and parameter scales; (2) it outperforms conventional accuracy-based robustness evaluation and provides a more reliable evaluation that is invariant to different test settings; (3) it can be generated quickly without considerable testing cost.
#4224

DiffChaser: Detecting Disagreements for Deep Neural Networks
Xiaofei Xie, Lei Ma, Haijun Wang, Yuekang Li, Yang Liu, Xiaohong Li
Details | PDF

Adversarial Machine Learning 2

Platform migration and customization have become an indispensable part of the deep neural network (DNN) development lifecycle. A high-precision but complex DNN trained in the cloud on massive data and powerful GPUs often goes through an optimization phase (e.g., quantization, compression) before deployment to a target device (e.g., mobile device). A test set that effectively uncovers the disagreements between a DNN and its optimized variant provides useful feedback to debug and further enhance the optimization procedure. However, the minor inconsistency between a DNN and its optimized version is often hard to detect and easily bypasses the original test set. This paper proposes DiffChaser, an automated black-box testing framework to detect untargeted/targeted disagreements between version variants of a DNN. We demonstrate 1) its effectiveness by comparing with the state-of-the-art techniques, and 2) its usefulness in real-world DNN product deployment involving quantization and optimization.

Wednesday 14 15:00 - 16:00 MLA|ARL - Applications of Reinforcement Learning (2701-2702)

Chair: Yanan Sui

#1508

Dynamic Electronic Toll Collection via Multi-Agent Deep Reinforcement Learning with Edge-Based Graph Convolutional Networks
Wei Qiu, Haipeng Chen, Bo An
Details | PDF

Applications of Reinforcement Learning

Over the past decades, Electronic Toll Collection (ETC) systems have proven capable of alleviating traffic congestion in urban areas. Dynamic Electronic Toll Collection (DETC) was recently proposed to further improve the efficiency of ETC, where tolls are dynamically set based on traffic dynamics. However, computing the optimal DETC scheme is computationally difficult and existing approaches are limited to small scale or partial road networks, which significantly restricts the adoption of DETC. To this end, we propose a novel multi-agent reinforcement learning (RL) approach for DETC. We make several key contributions: i) an enhancement over the state-of-the-art RL-based method with a deep neural network representation of the policy and value functions and a temporal difference learning framework to accelerate the update of target values, ii) a novel edge-based graph convolutional neural network (eGCN) to extract the spatio-temporal correlations of the road network state features, iii) a novel cooperative multi-agent reinforcement learning (MARL) approach which divides the whole road network into partitions according to their geographic and economic characteristics and trains a tolling agent for each partition. Experimental results show that our approach can scale up to realistic-sized problems with robust performance and significantly outperform the state-of-the-art method.
#3513

Randomized Adversarial Imitation Learning for Autonomous Driving
MyungJae Shin, Joongheon Kim
Details | PDF

Applications of Reinforcement Learning

With the evolution of various advanced driver assistance system (ADAS) platforms, the design of autonomous driving systems is becoming more complex and safety-critical. An autonomous driving system simultaneously activates multiple ADAS functions, and thus it is essential to coordinate these functions. This paper proposes a randomized adversarial imitation learning (RAIL) method that imitates the coordination of an autonomous vehicle equipped with advanced sensors. The RAIL policies are trained through derivative-free optimization for the decision maker that coordinates the proper ADAS functions, e.g., smart cruise control and lane keeping system. In particular, the proposed method is also able to deal with LIDAR data and make decisions in complex multi-lane highways and multi-agent environments.
#10983

(Journal track) Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration
Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, Kush R. Varshney, Murray Campbell, Moninder Singh, Francesca Rossi
Details | PDF

Applications of Reinforcement Learning

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
#2445

Building Personalized Simulator for Interactive Search
Qianlong Liu, Baoliang Cui, Zhongyu Wei, Baolin Peng, Haikuan Huang, Hongbo Deng, Jianye Hao, Xuanjing Huang, Kam-Fai Wong
Details | PDF

Applications of Reinforcement Learning

Interactive search, where a set of tags is recommended to users together with search results at each turn, is an effective way to guide users to identify their information need. It is a classical sequential decision problem and a reinforcement learning based agent can be introduced as a solution. The training of the agent can be divided into two stages, i.e., offline and online. Existing reinforcement learning based systems tend to perform the offline training in a supervised way based on historical labeled data while the online training is performed via reinforcement learning algorithms based on interactions with real users. The mismatch between online and offline training leads to a cold-start problem for the online usage of the agent. To address this issue, we propose to employ a simulator to mimic the environment for the offline training of the agent. Users' profiles are considered in order to build a personalized simulator. In addition, a model-based approach is used to train the simulator so that it can use the data efficiently. Experimental results based on a real-world dataset demonstrate the effectiveness of our agent and personalized simulator.

Wednesday 14 15:00 - 16:00 AMS|NG - Noncooperative Games 2 (2703-2704)

Chair: Ehud Shapiro

#724

Multi-Population Congestion Games With Incomplete Information
Charlotte Roman, Paolo Turrini
Details | PDF

Noncooperative Games 2

Congestion games have many important applications to systems where only limited knowledge may be available to players. Here we study traffic networks with multiple origin-destination pairs, relaxing the simplifying assumption of agents having complete knowledge of the network structure. We identify a ubiquitous class of networks, i.e., rings, for which we can safely increase the agents’ knowledge without affecting their own overall performance - known as immunity to Informational Braess’ Paradox - closing a gap in the literature. By extension of this performance measure to include the welfare of all agents, i.e., minimisation of social cost, we show that IBP is a widespread phenomenon and no network is immune to it.
#1706

Imitative Attacker Deception in Stackelberg Security Games
Thanh Nguyen, Haifeng Xu
Details | PDF

Noncooperative Games 2

To address the challenge of uncertainty regarding the attacker’s payoffs, capabilities, and other characteristics, recent work in security games has focused on learning the optimal defense strategy from observed attack data. This raises a natural concern that the strategic attacker may mislead the defender by deceptively reacting to the learning algorithms. This paper focuses on understanding how such attacker deception affects the game equilibrium. We examine a basic deception strategy termed imitative deception, in which the attacker simply pretends to have a different payoff, assuming his true payoff is unknown to the defender. We provide a clean characterization of the game equilibrium as well as optimal algorithms to compute the equilibrium. Our experiments illustrate significant defender loss due to imitative attacker deception, suggesting the potential side effect of learning from the attacker.
#4919

Optimality and Nash Stability in Additive Separable Generalized Group Activity Selection Problems
Vittorio Bilò, Angelo Fanelli, Michele Flammini, Gianpiero Monaco, Luca Moscardelli
Details | PDF

Noncooperative Games 2

The generalized group activity selection problem (GGASP) consists in assigning agents to activities according to their preferences, which depend on both the activity and the set of its participants. We consider additively separable GGASPs, where every agent has a separate valuation for each activity as well as for any other agent, and her overall utility is given by the sum of the valuations she has for the selected activity and its participants. Depending on the nature of the agents' valuations, nine different variants of the problem arise. We completely characterize the complexity of computing a social optimum and provide approximation algorithms for the NP-hard cases. We also focus on Nash stable outcomes, for which we give some complexity results and a full picture of the related performance by providing tight bounds on both the price of anarchy and the price of stability.
#5162

Leadership in Congestion Games: Multiple User Classes and Non-Singleton Actions
Alberto Marchesi, Matteo Castiglioni, Nicola Gatti
Details | PDF

Noncooperative Games 2

We study the problem of finding Stackelberg equilibria in games with a massive number of players. So far, the only known game instances in which the problem is solved in polynomial time are some particular congestion games. However, a complete characterization of hard and easy instances is still lacking. In this paper, we extend the state of the art along two main directions. First, we focus on games where players' actions are made of multiple resources, and we prove that the problem is NP-hard and not in Poly-APX unless P = NP, even in the basic case in which players are symmetric, their actions are made of only two resources, and the cost functions are monotonic. Second, we focus on games with singleton actions where the players are partitioned into classes, depending on which actions they have available. In this case, we provide a dynamic programming algorithm that finds an equilibrium in polynomial time, when the number of classes is fixed and the leader plays pure strategies. Moreover, we prove that, if we allow for leader's mixed strategies, then the problem becomes NP-hard even with only four classes and monotonic costs. Finally, for both settings, we provide mixed-integer linear programming formulations, and we experimentally evaluate their scalability on both random game instances and worst-case instances based on our hardness reductions.

Wednesday 14 15:00 - 16:00 AMS|CC - Coordination and Cooperation (2705-2706)

Chair: Pavel Surynek

#1084

AsymDPOP: Complete Inference for Asymmetric Distributed Constraint Optimization Problems
Yanchen Deng, Ziyu Chen, Dingding Chen, Wenxin Zhang, Xingqiong Jiang
Details | PDF

Coordination and Cooperation

Asymmetric distributed constraint optimization problems (ADCOPs) are an emerging model for coordinating agents with personal preferences. However, the existing inference-based complete algorithms which use local eliminations cannot be applied to ADCOPs, as the parent agents are required to transfer their private functions to their children. Rather than disclosing private functions explicitly to facilitate local eliminations, we solve the problem by enforcing delayed eliminations and propose AsymDPOP, the first inference-based complete algorithm for ADCOPs. To solve the severe scalability problems incurred by delayed eliminations, we propose to reduce the memory consumption by propagating a set of smaller utility tables instead of a joint utility table, and to reduce the computation efforts by sequential optimizations instead of joint optimizations. The empirical evaluation indicates that AsymDPOP significantly outperforms the state-of-the-art, as well as the vanilla DPOP with PEAV formulation.
#1894

ATSIS: Achieving the Ad hoc Teamwork by Sub-task Inference and Selection
Shuo Chen, Ewa Andrejczuk, Athirai A. Irissappane, Jie Zhang
Details | PDF

Coordination and Cooperation

In an ad hoc teamwork setting, the team needs to coordinate their activities to perform a task without prior agreement on how to achieve it. The ad hoc agent cannot communicate with its teammates but it can observe their behaviour and plan accordingly. To do so, the existing approaches rely on the teammates' behaviour models. However, the models may not be accurate, which can compromise teamwork. For this reason, we present the Ad Hoc Teamwork by Sub-task Inference and Selection (ATSIS) algorithm, which uses sub-task inference without relying on teammates' models. First, the ad hoc agent observes its teammates to infer which sub-tasks they are handling. Based on that, it selects its own sub-task using a partially observable Markov decision process that handles the uncertainty of the sub-task inference. Last, the ad hoc agent uses Monte Carlo tree search to find the set of actions to perform the sub-task. Our experiments show the benefits of ATSIS for robust teamwork.
#5598

Ad Hoc Teamwork With Behavior Switching Agents
Manish Ravula, Shani Alkoby, Peter Stone
Details | PDF

Coordination and Cooperation

As autonomous AI agents proliferate in the real world, they will increasingly need to cooperate with each other to achieve complex goals without always being able to coordinate in advance. This kind of cooperation, in which agents have to learn to cooperate on the fly, is called ad hoc teamwork. Many previous works investigating this setting assumed that teammates behave according to one of many predefined types that is fixed throughout the task. This assumption of stationarity in behaviors is a strong one that cannot be guaranteed in many real-world settings. In this work, we relax this assumption and investigate settings in which teammates can change their types during the course of the task. This adds complexity to the planning problem, as an agent now needs to recognize that a change has occurred in addition to figuring out the new type of the teammate it is interacting with. In this paper, we present a novel Convolutional-Neural-Network-based Change Point Detection (CPD) algorithm for ad hoc teamwork. When evaluating our algorithm on the modified predator-prey domain, we find that it outperforms existing Bayesian CPD algorithms.
#6462

Integrating Decision Sharing with Prediction in Decentralized Planning for Multi-Agent Coordination under Uncertainty
Minglong Li, Wenjing Yang, Zhongxuan Cai, Shaowu Yang, Ji Wang
Details | PDF

Coordination and Cooperation

The performance of decentralized multi-agent systems tends to benefit from information sharing and its effective utilization. However, too much or unnecessary sharing may hinder the performance due to the delay, instability and additional overhead of communications. To achieve satisfactory coordination performance, one would prefer the communication cost to be as low as possible. In this paper, we propose an approach for improving the sharing utilization by integrating information sharing with prediction in decentralized planning. We present a novel planning algorithm called Dec-MCTS-SP, which combines decision sharing and prediction based on decentralized Monte Carlo Tree Search. Each agent grows a search tree guided by the rewards calculated from the joint actions, which can not only be sampled from the shared probability distributions over action sequences, but also be predicted by a sufficiently accurate and computationally cheap heuristics-based method. In addition, several policies including sparse and discounted UCT and DIY-bonus are leveraged for performance improvement. We have implemented Dec-MCTS-SP in a case study on multi-agent information gathering under threat and uncertainty, which is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). The factored belief vectors are integrated into Dec-MCTS-SP to handle the uncertainty. Compared with the random algorithm, the auction-based algorithm and Dec-MCTS, the evaluation shows that Dec-MCTS-SP can reduce communication cost significantly while still achieving higher coordination performance.

Wednesday 14 15:00 - 16:00 ML|TDS - Time-series;Data Streams 2 (2601-2602)

Chair: Xiaoyang Liu

#4435

ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network
Zhangheng Li, Jia-Xing Zhong, Jingjia Huang, Tao Zhang, Thomas Li, Ge Li
Details | PDF

Time-series;Data Streams 2

In recent years, memory-augmented neural networks (MANNs) have shown promising power to enhance the memory ability of neural networks for sequential processing tasks. However, previous MANNs suffer from complex memory addressing mechanisms, making them relatively hard to train and causing computational overheads. Moreover, many of them reuse the classical RNN structure such as LSTM for memory processing, causing inefficient exploitation of memory information. In this paper, we introduce a novel MANN, the Auto-addressing and Recurrent Memory Integrating Network (ARMIN) to address these issues. The ARMIN only utilizes hidden state h_t for automatic memory addressing, and uses a novel RNN cell for refined integration of memory information. Empirical results on a variety of experiments demonstrate that the ARMIN is more light-weight and efficient compared to existing memory networks. Moreover, we demonstrate that the ARMIN can achieve much lower computational overhead than vanilla LSTM while keeping similar performances. Codes are available on github.com/zoharli/armin.
#5764

AddGraph: Anomaly Detection in Dynamic Graph Using Attention-based Temporal GCN
Li Zheng, Zhenpeng Li, Jian Li, Zhao Li, Jun Gao
Details | PDF

Time-series;Data Streams 2

Anomaly detection in dynamic graphs has become critical in many application scenarios, e.g., recommender systems, while it also raises huge challenges due to the highly flexible nature of anomalies and the lack of sufficient labelled data. It is better to learn the anomaly patterns by considering all possible features including the structural, content and temporal features, rather than utilizing heuristic rules over partial features. In this paper, we propose AddGraph, a general end-to-end anomalous edge detection framework using an extended temporal GCN (Graph Convolutional Network) with an attention model, which can capture both long-term and short-term patterns in dynamic graphs. In order to cope with insufficient explicit labelled data, we employ negative sampling and margin loss in the training of AddGraph in a semi-supervised fashion. We conduct extensive experiments on real-world datasets, and illustrate that AddGraph can outperform the state-of-the-art competitors in anomaly detection significantly.
#5985

Patent Citation Dynamics Modeling via Multi-Attention Recurrent Networks
Taoran Ji, Zhiqian Chen, Nathan Self, Kaiqun Fu, Chang-Tien Lu, Naren Ramakrishnan
Details | PDF

Time-series;Data Streams 2

Modeling and forecasting forward citations to a patent is a central task for the discovery of emerging technologies and for measuring the pulse of inventive progress. Conventional methods for forecasting these forward citations cast the problem as analysis of temporal point processes which rely on the conditional intensity of previously received citations. Recent approaches model the conditional intensity as a chain of recurrent neural networks to capture memory dependency in hopes of reducing the restrictions of the parametric form of the intensity function. For the problem of patent citations, we observe that forecasting a patent's chain of citations benefits from not only the patent's history itself but also from the historical citations of assignees and inventors associated with that patent. In this paper, we propose a sequence-to-sequence model which employs an attention-of-attention mechanism to capture the dependencies of these multiple time sequences. Furthermore, the proposed model is able to forecast both the timestamp and the category of a patent's next citation. Extensive experiments on a large patent citation dataset collected from USPTO demonstrate that the proposed model outperforms state-of-the-art models at forward citation forecasting.
#4437

Monitoring of a Dynamic System Based on Autoencoders
Aomar Osmani, Massinissa Hamidi, Salah Bouhouche
Details | PDF

Time-series;Data Streams 2

The monitoring of industrial infrastructures is undergoing a critical transformation with industry 4.0. Monitoring solutions must follow the system behavior in real time and must adapt to its continuous change. We propose in this paper an autoencoder model-based approach for tracking abnormalities in industrial applications. A set of sensors collects data from turbo-compressors and an original two-level machine learning LSTM autoencoder architecture defines a continuous nominal vibration model. Normalized thresholds (ISO 20816) between the model and the system generate a possible abnormal situation to diagnose. Experimental results, including hyper-parameter optimization on large real data and domain expert analysis, show that our proposed solution gives promising results.

Wednesday 14 15:00 - 16:00 KRR|BC - Belief Change (2603-2604)

Chair: Peter Struss

#1357

Observations on Darwiche and Pearl's Approach for Iterated Belief Revision
Theofanis Aravanis, Pavlos Peppas, Mary-Anne Williams
Details | PDF

Belief Change

Notwithstanding the extensive work on iterated belief revision, there is, still, no fully satisfactory solution within the classical AGM paradigm. The seminal work of Darwiche and Pearl (DP approach, for short) remains the most dominant, despite its well-documented shortcomings. In this article, we make further observations on the DP approach. Firstly, we prove that the DP postulates are, in a strong sense, inconsistent with Parikh's relevance-sensitive axiom (P), extending previous initial conflicts. Immediate consequences of this result are that an entire class of intuitive revision operators, which includes Dalal's operator, violates the DP postulates, as well as that the Independence postulate and Spohn's conditionalization are inconsistent with (P). Lastly, we show that the DP postulates allow for more revision policies than the ones that can be captured by identifying belief states with total preorders over possible worlds, a fact implying that a preference ordering (over possible worlds) is an insufficient representation for a belief state.
#4693

Belief Revision Operators with Varying Attitudes Towards Initial Beliefs
Adrian Haret, Stefan Woltran
Details | PDF

Belief Change

Classical axiomatizations of belief revision include a postulate stating that if new information is consistent with initial beliefs, then revision amounts to simply adding the new information to the original knowledge base. This postulate assumes a conservative attitude towards initial beliefs, in the sense that an agent faced with the need to revise them will seek to preserve initial beliefs as much as possible. In this work we look at operators that can assume different attitudes towards original beliefs. We provide axiomatizations of these operators by varying the aforementioned postulate and obtain representation results that characterize the new types of operators using preorders on possible worlds. We also present concrete examples for each new type of operator, adapting notions from decision theory.
#5451

Belief Update without Compactness in Non-finitary Languages
Jandson S Ribeiro, Abhaya Nayak, Renata Wassermann
Details | PDF

Belief Change

The main paradigms of belief change require the background logic to be Tarskian and finitary. We look at belief update when the underlying logic is not necessarily finitary. We show that in this case the classical construction for KM update does not capture all the rationality postulates for KM belief update. Indeed, this construction, being fully characterised by a subset of the KM update postulates, is weaker. We explore the reason behind this, and subsequently provide an alternative constructive account of belief update which is characterised by the full set of KM postulates in this more general framework.
#10980

(Journal track) Shielded Base Contraction
Marco Garapa, Eduardo Fermé, Maurício D. L. Reis
Details | PDF

Belief Change

In this paper we study a kind of non-prioritized contraction operator on belief bases, known as shielded base contractions. We propose twenty different classes of shielded base contractions and obtain axiomatic characterizations for each one of them. Additionally we thoroughly investigate the interrelations (in the sense of inclusion) among all those classes.

Wednesday 14 15:00 - 16:00 NLP|IE - Information Extraction 2 (2605-2606)

Chair: Wang Wenya

#3104

Beyond Word Attention: Using Segment Attention in Neural Relation Extraction
Bowen Yu, Zhenyu Zhang, Tingwen Liu, Bin Wang, Sujian Li, Quangang Li
Details | PDF

Information Extraction 2

Relation extraction studies the issue of predicting semantic relations between pairs of entities in sentences. Attention mechanisms are often used in this task to alleviate the inner-sentence noise by performing soft selections of words independently. Based on the observation that information pertinent to relations is usually contained within segments (continuous words in a sentence), it is possible to make use of this phenomenon for better extraction. In this paper, we aim to incorporate such segment information into a neural relation extractor. Our approach views the attention mechanism as linear-chain conditional random fields over a set of latent variables whose edges encode the desired structure, and regards attention weight as the marginal distribution of each word being selected as a part of the relational expression. Experimental results show that our method can attend to continuous relational expressions without explicit annotations, and achieves state-of-the-art performance on the large-scale TACRED dataset.
#3126

Relation Extraction Using Supervision from Topic Knowledge of Relation Labels
Haiyun Jiang, Li Cui, Zhe Xu, Deqing Yang, Jindong Chen, Chenguang Li, Jingping Liu, Jiaqing Liang, Chao Wang, Yanghua Xiao, Wei Wang
Details | PDF

Information Extraction 2

Explicitly exploring the semantics of a relation is significant for high-accuracy relation extraction, which is, however, not fully studied in previous work. In this paper, we mine the topic knowledge of a relation to explicitly represent the semantics of this relation, and model relation extraction as a matching problem. That is, the matching score between a sentence and a candidate relation is predicted for an entity pair. To this end, we propose a deep matching network to precisely model the semantic similarity between a sentence-relation pair. Besides, the topic knowledge also allows us to derive the importance information of samples as well as two knowledge-guided negative sampling strategies in the training process. We conduct extensive experiments to evaluate the proposed framework and observe improvements in AUC of 11.5% and max F1 of 5.4% over the baselines with state-of-the-art performance.
#3304

Extracting Entities and Events as a Single Task Using a Transition-Based Neural Model
Junchi Zhang, Yanxia Qin, Yue Zhang, Mengchi Liu, Donghong Ji
Details | PDF

Information Extraction 2

The task of event extraction contains subtasks including detections for entity mentions, event triggers and argument roles. Traditional methods solve them as a pipeline, which does not make use of task correlation for their mutual benefits. There have been recent efforts towards building a joint model for all tasks. However, due to technical challenges, there has not been work predicting the joint output structure as a single task. We build a first model to this end using a neural transition-based framework, incrementally predicting complex joint structures in a state-transition process. Results on standard benchmarks show the benefits of the joint model, which gives the best result in the literature.
#5076

Early Discovery of Emerging Entities in Microblogs
Satoshi Akasaki, Naoki Yoshinaga, Masashi Toyoda
Details | PDF

Information Extraction 2

Keeping up to date on emerging entities that appear every day is indispensable for various applications, such as social-trend analysis and marketing research. Previous studies have attempted to detect unseen entities that are not registered in a particular knowledge base as emerging entities and consequently find non-emerging entities since the absence of entities in knowledge bases does not guarantee their emergence. We therefore introduce a novel task of discovering truly emerging entities when they have just been introduced to the public through microblogs and propose an effective method based on time-sensitive distant supervision, which exploits distinctive early-stage contexts of emerging entities. Experimental results with a large-scale Twitter archive show that the proposed method achieves 83.2% precision of the top 500 discovered emerging entities, which outperforms baselines based on unseen entity recognition with burst detection. Besides notable emerging entities, our method can discover massive long-tail and homographic emerging entities. An evaluation of relative recall shows that the method detects 80.4% emerging entities newly registered in Wikipedia; 92.8% of them are discovered earlier than their registration in Wikipedia, and the average lead-time is more than one year (578 days).

Wednesday 14 15:00 - 16:00 CV|DDCV - 2D and 3D Computer Vision 1 (2501-2502)

Chair: Xiaochun Cao

#4594

Mutually Reinforced Spatio-Temporal Convolutional Tube for Human Action Recognition
Haoze Wu, Jiawei Liu, Zheng-Jun Zha, Zhenzhong Chen, Xiaoyan Sun
Details | PDF

2D and 3D Computer Vision 1

Recent works use 3D convolutional neural networks to explore spatio-temporal information for human action recognition. However, they either ignore the correlation between spatial and temporal features or suffer from high computational cost by spatio-temporal features extraction. In this work, we propose a novel and efficient Mutually Reinforced Spatio-Temporal Convolutional Tube (MRST) for human action recognition. It decomposes 3D inputs into spatial and temporal representations, mutually enhances both of them by exploiting the interaction of spatial and temporal information and selectively emphasizes informative spatial appearance and temporal motion, meanwhile reducing the complexity of structure. Moreover, we design three types of MRSTs according to the different order of spatial and temporal information enhancement, each of which contains a spatio-temporal decomposition unit, a mutually reinforced unit and a spatio-temporal fusion unit. An end-to-end deep network, MRST-Net, is also proposed based on the MRSTs to better explore spatio-temporal information in human actions. Extensive experiments show MRST-Net yields the best performance, compared to state-of-the-art approaches.
#5743

Structure-Aware Residual Pyramid Network for Monocular Depth Estimation
Xiaotian Chen, Xuejin Chen, Zheng-Jun Zha
Details | PDF

2D and 3D Computer Vision 1

Monocular depth estimation is an essential task for scene understanding. The underlying structure of objects and stuff in a complex scene is critical to recovering accurate and visually-pleasing depth maps. Global structure conveys scene layouts, while local structure reflects shape details. Recently developed approaches based on convolutional neural networks (CNNs) significantly improve the performance of depth estimation. However, few of them take into account multi-scale structures in complex scenes. In this paper, we propose a Structure-Aware Residual Pyramid Network (SARPN) to exploit multi-scale structures for accurate depth prediction. We propose a Residual Pyramid Decoder (RPD) which expresses global scene structure in upper levels to represent layouts, and local structure in lower levels to present shape details. At each level, we propose Residual Refinement Modules (RRM) that predict residual maps to progressively add finer structures on the coarser structure predicted at the upper level. In order to fully exploit multi-scale image features, an Adaptive Dense Feature Fusion (ADFF) module, which adaptively fuses effective features from all scales for inferring structures of each scale, is introduced. Experiment results on the challenging NYU-Depth v2 dataset demonstrate that our proposed approach achieves state-of-the-art performance in both qualitative and quantitative evaluation. The code is available at https://github.com/Xt-Chen/SARPN.
#6580

Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity
Liang Liu, Guangyao Zhai, Wenlong Ye, Yong Liu
Details | PDF

2D and 3D Computer Vision 1

Scene flow estimation in dynamic scenes remains a challenging task. Computing scene flow by a combination of 2D optical flow and depth has been shown to be considerably faster with acceptable performance. In this work, we present a unified framework for joint unsupervised learning of stereo depth and optical flow with explicit local rigidity to estimate scene flow. We estimate camera motion directly by a Perspective-n-Point method from the optical flow and depth predictions, with a RANSAC outlier rejection scheme. In order to disambiguate the object motion and the camera motion in the scene, we distinguish the rigid region by the reprojection error and the photometric similarity. By joint learning with the local rigidity, both the depth and optical flow networks can be refined. This framework boosts all four tasks: depth, optical flow, camera motion estimation, and object motion segmentation. Through the evaluation on the KITTI benchmark, we show that the proposed framework achieves state-of-the-art results amongst unsupervised methods. Our models and code are available at https://github.com/lliuz/unrigidflow.
#301

Unsupervised Learning of Monocular Depth and Ego-Motion using Conditional PatchGANs
Madhu Vankadari, Swagat Kumar, Anima Majumder, Kaushik Das
Details | PDF

2D and 3D Computer Vision 1

This paper presents a new GAN-based deep learning framework for estimating absolute scale aware depth and ego motion from monocular images using a completely unsupervised mode of learning. The proposed architecture uses two separate generators to learn the distribution of depth and pose data for a given input image sequence. The depth and pose data, thus generated, are then evaluated by a patch-based discriminator using the reconstructed image and its corresponding actual image. The patch-based GAN (or PatchGAN) is shown to detect high frequency local structural defects in the reconstructed image, thereby improving the accuracy of overall depth and pose estimation. Unlike conventional GANs, the proposed architecture uses a conditioned version of input and output of the generator for training the whole network. The resulting framework is shown to outperform all existing deep networks in this field, beating the current state-of-the-art method by 8.7% in absolute error and 5.2% in RMSE metric. To the best of our knowledge, this is the first deep network based model to estimate both depth and pose simultaneously using a conditional patch-based GAN paradigm. The efficacy of the proposed approach is demonstrated through rigorous ablation studies and exhaustive performance comparison on the popular KITTI outdoor driving dataset.

Wednesday 14 15:00 - 16:00 HSGP|CSO - Combinatorial Search and Optimisation 1 (2503-2504)

Chair: Xiangfu Zhao

#2056

Graph Mining Meets Crowdsourcing: Extracting Experts for Answer Aggregation
Yasushi Kawase, Yuko Kuroki, Atsushi Miyauchi
Details | PDF

Combinatorial Search and Optimisation 1

Aggregating responses from crowd workers is a fundamental task in the process of crowdsourcing. In cases where a few experts are overwhelmed by a large number of non-experts, most answer aggregation algorithms such as the majority voting fail to identify the correct answers. Therefore, it is crucial to extract reliable experts from the crowd workers. In this study, we introduce the notion of "expert core", which is a set of workers that is very unlikely to contain a non-expert. We design a graph-mining-based efficient algorithm that exactly computes the expert core. To answer the aggregation task, we propose two types of algorithms. The first one incorporates the expert core into existing answer aggregation algorithms such as the majority voting, whereas the second one utilizes information provided by the expert core extraction algorithm pertaining to the reliability of workers. We then give a theoretical justification for the first type of algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our proposed answer aggregation algorithms outperform state-of-the-art algorithms.
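The failure mode described in the abstract above (a few experts outvoted by many non-experts) can be illustrated with a small Python sketch. It only shows majority voting with and without restriction to a trusted subset of workers. The function name `majority_vote`, the worker labels, and the hand-picked subset are assumptions for illustration; the sketch does not implement the authors' expert core extraction algorithm.

```python
from collections import Counter


def majority_vote(answers, workers=None):
    """Return the most frequent answer among the selected workers.

    answers: dict mapping a worker id to that worker's answer.
    workers: optional iterable of worker ids to restrict the vote to
             (hypothetical stand-in for an extracted set of experts).
    Ties are broken by first-seen order, following Counter.most_common.
    """
    if workers is None:
        selected = list(answers.values())
    else:
        selected = [answers[w] for w in workers if w in answers]
    if not selected:
        return None
    return Counter(selected).most_common(1)[0][0]


# Two experts know the correct answer "B"; three non-experts agree on "A".
answers = {"e1": "B", "e2": "B", "n1": "A", "n2": "A", "n3": "A"}
all_workers_choice = majority_vote(answers)                 # vote over everyone
trusted_choice = majority_vote(answers, ["e1", "e2"])       # vote over the trusted subset
```

In this toy example the unrestricted vote follows the non-experts, while the vote restricted to the trusted subset recovers the experts' answer.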
#1017

Predict+Optimise with Ranking Objectives: Exhaustively Learning Linear Functions
Emir Demirovic, Peter J. Stuckey, James Bailey, Jeffrey Chan, Christopher Leckie, Kotagiri Ramamohanarao, Tias Guns
Details | PDF

Combinatorial Search and Optimisation 1

We study the predict+optimise problem, where machine learning and combinatorial optimisation must interact to achieve a common goal. These problems are important when optimisation needs to be performed on input parameters that are not fully observed but must instead be estimated using machine learning. Our contributions are two-fold: 1) we provide theoretical insight into the properties and computational complexity of predict+optimise problems in general, and 2) develop a novel framework that, in contrast to related work, guarantees to compute the optimal parameters for a linear learning function given any ranking optimisation problem. We illustrate the applicability of our framework for the particular case of the unit-weighted knapsack predict+optimise problem and evaluate on benchmarks from the literature.
#4709

Multiple Policy Value Monte Carlo Tree Search
Li-Cheng Lan, Wei Li, Ting-Han Wei, I-Chen Wu
Details | PDF

Combinatorial Search and Optimisation 1

Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-play phase of AlphaZero (AZ) training, a balance needs to be reached between accurate state estimation and more MCTS simulations, both of which are critical for a strong game playing agent. Typically, larger DNNs are better at generalization and accurate evaluation, while smaller DNNs are less costly, and therefore can lead to more MCTS simulations and bigger search trees with the same budget. This paper introduces a new method called the multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain advantages of each network, where two PV-NNs f_S and f_L are used in this paper. We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms single PV-NN with policy value MCTS, called PV-MCTS. Additionally, MPV-MCTS also outperforms PV-MCTS for AZ training.
#5485

Model-Based Diagnosis with Multiple Observations
Alexey Ignatiev, Antonio Morgado, Georg Weissenbacher, Joao Marques-Silva
Details | PDF

Combinatorial Search and Optimisation 1

Existing automated testing frameworks require multiple observations to be jointly diagnosed with the purpose of identifying common fault locations. This is the case for example with continuous integration tools. This paper shows that existing solutions fail to compute the set of minimal diagnoses, and as a result run times can increase by orders of magnitude. The paper proposes not only solutions to correct existing algorithms, but also conditions for improving their run times. Nevertheless, the diagnosis of multiple observations raises a number of important computational challenges, which even the corrected algorithms are often unable to cope with. As a result, the paper devises a novel algorithm for diagnosing multiple observations, which is shown to enable significant performance improvements in practice.

Wednesday 14 15:00 - 16:00 ML|DM - Data Mining 6 (2505-2506)

Chair: Xiangliang Zhang

#1579

Deep Metric Learning: The Generalization Analysis and an Adaptive Algorithm
Mengdi Huai, Hongfei Xue, Chenglin Miao, Liuyi Yao, Lu Su, Changyou Chen, Aidong Zhang
Details | PDF

Data Mining 6

As an effective way to learn a distance metric between pairs of samples, deep metric learning (DML) has drawn significant attention in recent years. The key idea of DML is to learn a set of hierarchical nonlinear mappings using deep neural networks, and then project the data samples into a new feature space for comparing or matching. Although DML has achieved practical success in many applications, there is no existing work that theoretically analyzes the generalization error bound for DML, which can measure how well a learned DML model is able to perform on unseen data. In this paper, we try to fill this research gap and derive the generalization error bound for DML. Additionally, based on the derived generalization bound, we propose a novel DML method (called ADroDML), which can adaptively learn the retention rates for the DML models with dropout in a theoretically justified way. Compared with existing DML works that require predefined retention rates, ADroDML can learn the retention rates in an optimal way and achieve better performance. We also conduct experiments on real-world datasets to verify the findings derived from the generalization error bound and demonstrate the effectiveness of the proposed adaptive DML method.
#6322

RobustTrend: A Huber Loss with a Combined First and Second Order Difference Regularization for Time Series Trend Filtering
Qingsong Wen, Jingkun Gao, Xiaomin Song, Liang Sun, Jian Tan
Details | PDF

Data Mining 6

Extracting the underlying trend signal is a crucial step to facilitate time series analysis like forecasting and anomaly detection. Besides noise signal, time series can contain not only outliers but also abrupt trend changes in real-world scenarios. To deal with these challenges, we propose a robust trend filtering algorithm based on robust statistics and sparse learning. Specifically, we adopt the Huber loss to suppress outliers, and utilize a combination of the first order and second order difference on the trend component as regularization to capture both slow and abrupt trend changes. Furthermore, an efficient method is designed to solve the proposed robust trend filtering based on majorization minimization (MM) and alternating direction method of multipliers (ADMM). We compared our proposed robust trend filter with nine other state-of-the-art trend filtering algorithms on both synthetic and real-world datasets. The experiments demonstrate that our algorithm outperforms existing methods.
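As a rough illustration of the kind of objective the abstract above describes, the following Python sketch evaluates a Huber data-fit term plus penalties on the first and second order differences of a candidate trend. The use of absolute-value (L1) penalties, the weights `lam1` and `lam2`, the threshold `delta`, and all function names are assumptions made for illustration, not the authors' notation. The sketch only evaluates the objective; it does not implement the MM/ADMM solver.

```python
def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def diff(x, order=1):
    """Return the order-th difference of a sequence as a list."""
    out = list(x)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def trend_objective(y, trend, lam1=1.0, lam2=1.0, delta=1.0):
    """Huber fit term plus (assumed) L1 penalties on trend differences."""
    fit = sum(huber(yi - ti, delta) for yi, ti in zip(y, trend))
    reg1 = lam1 * sum(abs(d) for d in diff(trend, 1))  # penalizes any change in level
    reg2 = lam2 * sum(abs(d) for d in diff(trend, 2))  # penalizes changes in slope
    return fit + reg1 + reg2
```

Because the Huber term grows only linearly for large residuals, a single outlier in `y` contributes far less to the objective than it would under a squared loss.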
#6358

Geometric Understanding for Unsupervised Subspace Learning
Shihui Ying, Lipeng Cai, Changzhou He, Yaxin Peng
Details | PDF

Data Mining 6

In this paper, we address the unsupervised subspace learning from a geometric viewpoint. First, we formulate the subspace learning as an inverse problem on Grassmannian manifold by considering all subspaces as points on it. Then, to make the model computable, we parameterize the Grassmannian manifold by using an orbit of rotation group action on all standard subspaces, which are spanned by the orthonormal basis. Further, to improve the robustness, we introduce a low-rank regularizer which makes the dimension of subspace as low as possible. Thus, the subspace learning problem is transferred to a minimization problem with variables of rotation and dimension. Then, we adopt the alternately iterative strategy to optimize the variables, where a structure-preserving method, based on the geodesic structure of the rotation group, is designed to update the rotation. Finally, we compare the proposed approach with six state-of-the-art methods on three different kinds of real datasets. The experimental results validate that our proposed method outperforms all compared methods.
#3755

Matching User with Item Set: Collaborative Bundle Recommendation with Deep Attention Network
Liang Chen, Yang Liu, Xiangnan He, Lianli Gao, Zibin Zheng
Details | PDF

Data Mining 6

Most recommendation research has been concentrated on recommending single items to users, such as the considerable work on collaborative filtering that models the interaction between a user and an item. However, in many real-world scenarios, the platform needs to show users a set of items, e.g., the marketing strategy that offers multiple items for sale as one bundle. In this work, we consider recommending a set of items to a user, i.e., the Bundle Recommendation task, which concerns the interaction modeling between a user and a set of items. We contribute a neural network solution named DAM, short for Deep Attentive Multi-Task model, which is featured with two special designs: 1) We design a factorized attention network to aggregate the item embeddings in a bundle to obtain the bundle's representation; 2) We jointly model user-bundle interactions and user-item interactions in a multi-task manner to alleviate the scarcity of user-bundle interactions. Extensive experiments on a real-world dataset show that DAM outperforms the state-of-the-art solution, verifying the effectiveness of our attention design and multi-task learning in DAM.

Wednesday 14 15:00 - 16:00 ML|SSL - Semi-Supervised Learning 1 (2401-2402)

Chair: Yaniv Shmueli

#1457

Quadruply Stochastic Gradients for Large Scale Nonlinear Semi-Supervised AUC Optimization
Wanli Shi, Bin Gu, Xiang Li, Xiang Geng, Heng Huang
Details | PDF

Semi-Supervised Learning 1

Semi-supervised learning is pervasive in real-world applications, where only a few labeled data are available and large amounts of instances remain unlabeled. Since AUC is an important model evaluation metric in classification, directly optimizing AUC in semi-supervised learning scenarios has drawn much attention in the machine learning community. Recently, it has been shown that one could find an unbiased solution for the semi-supervised AUC maximization problem without knowing the class prior distribution. However, this method is hardly scalable for nonlinear classification problems with kernels. To address this problem, in this paper, we propose a novel scalable quadruply stochastic gradient algorithm (QSG-S2AUC) for nonlinear semi-supervised AUC optimization. In each iteration of the stochastic optimization process, our method randomly samples a positive instance, a negative instance, an unlabeled instance and their random features to compute the gradient and then updates the model by using this quadruply stochastic gradient to approach the optimal solution. More importantly, we prove that QSG-S2AUC can converge to the optimal solution in O(1/t), where t is the iteration number. Extensive experimental results on a variety of benchmark datasets show that QSG-S2AUC is far more efficient than the existing state-of-the-art algorithms for semi-supervised AUC maximization, while retaining similar generalization performance.
#3403

Interpolation Consistency Training for Semi-supervised Learning
Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio, David Lopez-Paz
Details | PDF

Semi-Supervised Learning 1

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
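The consistency idea stated in the abstract above can be sketched in a few lines of Python: compare the prediction at an interpolated point with the interpolation of the two predictions. This is a simplified illustration under stated assumptions: the same `model` callable is used on both sides, mean squared error is assumed as the discrepancy measure, and the names `mix` and `ict_consistency` are hypothetical. It is not the authors' training procedure.

```python
def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def ict_consistency(model, u1, u2, lam):
    """Mean squared gap between f(mix(u1, u2)) and mix(f(u1), f(u2)).

    model: any callable mapping a list of floats to a list of floats.
    u1, u2: two unlabeled input points.
    lam: interpolation coefficient in [0, 1].
    """
    pred_at_mix = model(mix(u1, u2, lam))
    mix_of_preds = mix(model(u1), model(u2), lam)
    gaps = [(p - q) ** 2 for p, q in zip(pred_at_mix, mix_of_preds)]
    return sum(gaps) / len(gaps)


# A linear model is perfectly consistent under interpolation; a curved one is not.
linear = lambda x: [2.0 * x[0] + x[1], x[0] - x[1]]
square = lambda x: [x[0] ** 2]
linear_gap = ict_consistency(linear, [1.0, 2.0], [3.0, 0.0], 0.3)
square_gap = ict_consistency(square, [0.0], [2.0], 0.5)
```

Minimizing such a term pushes the model toward behaving linearly between unlabeled points, which is one way to read the abstract's remark about moving the decision boundary to low-density regions.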
#5405

Learning Robust Distance Metric with Side Information via Ratio Minimization of Orthogonally Constrained L21-Norm Distances
Kai Liu, Lodewijk Brand, Hua Wang, Feiping Nie
Details | PDF

Semi-Supervised Learning 1

Metric Learning, which aims at learning a distance metric for a given data set, plays an important role in measuring the distance or similarity between data objects. Due to its broad usefulness, it has attracted a lot of interest in machine learning and related areas in the past few decades. This paper proposes to learn the distance metric from the side information in the forms of must-links and cannot-links. Given the pairwise constraints, our goal is to learn a Mahalanobis distance that minimizes the ratio of the distances of the data pairs in the must-links to those in the cannot-links. Different from many existing papers that use the traditional squared L2-norm distance, we develop a robust model that is less sensitive to data noise or outliers by using the not-squared L2-norm distance. In our objective, the orthonormal constraint is enforced to avoid degenerate solutions. To solve our objective, we have derived an efficient iterative solution algorithm. We have conducted extensive experiments, which demonstrated the superiority of our method over the state of the art.
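To make the ratio objective in the abstract above concrete, the following Python sketch evaluates the sum of not-squared L2 distances over must-link pairs divided by the same sum over cannot-link pairs, after a linear projection. Representing the Mahalanobis metric through a projection matrix `W` (given here as a list of column vectors), and the function names, are assumptions for illustration. The sketch only evaluates the ratio; it neither enforces the orthonormal constraint nor implements the authors' iterative solver.

```python
import math


def project(W, x):
    """Compute W^T x, where W is a list of column vectors."""
    return [sum(w_i * x_i for w_i, x_i in zip(col, x)) for col in W]


def projected_distance(W, a, b):
    """Not-squared L2 distance between a and b after projection by W."""
    delta = [a_i - b_i for a_i, b_i in zip(a, b)]
    return math.sqrt(sum(v * v for v in project(W, delta)))


def ratio_objective(W, must_links, cannot_links):
    """Must-link distance sum divided by cannot-link distance sum."""
    num = sum(projected_distance(W, a, b) for a, b in must_links)
    den = sum(projected_distance(W, a, b) for a, b in cannot_links)
    if den == 0.0:
        return math.inf  # degenerate projection collapses all cannot-link pairs
    return num / den


must = [([0.0, 0.0], [0.0, 5.0])]     # pair that should end up close
cannot = [([0.0, 0.0], [3.0, 0.0])]   # pair that should end up far apart
identity = [[1.0, 0.0], [0.0, 1.0]]
first_axis = [[1.0, 0.0]]             # keeps only the first coordinate
ratio_identity = ratio_objective(identity, must, cannot)
ratio_first_axis = ratio_objective(first_axis, must, cannot)
```

In this toy example, projecting onto the first axis drives the ratio to zero because the must-link pair differs only along the discarded coordinate.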
#1319

Robust Learning from Noisy Side-information by Semidefinite Programming
En-Liang Hu, Quanming Yao
Details | PDF

Semi-Supervised Learning 1

Robustness has recently become one of the major concerns in the machine learning community, since learning algorithms are usually vulnerable to outliers or corruptions. Motivated by this trend and need, we pursue robustness in semi-definite programming (SDP) in this paper. Specifically, this is done by replacing the commonly used squared loss with the more robust L1-loss in the low-rank SDP. However, the resulting objective becomes neither convex nor smooth. As no existing algorithms can be applied, we design an efficient algorithm, based on majorization-minimization, to optimize the objective. The proposed algorithm not only has cheap iterations and low space complexity but also theoretically converges to some critical points. Finally, an empirical study shows that the new objective, armed with the proposed algorithm, outperforms the state-of-the-art in terms of both speed and accuracy.

Wednesday 14 15:00 - 16:00 AMS|ABSE - Agent-Based Simulation and Emergence (2403-2404)

Chair: Dave de Jonge

#2874

Swarm Engineering Through Quantitative Measurement of Swarm Robotic Principles in a 10,000 Robot Swarm
John Harwell, Maria Gini
Details | PDF

Agent-Based Simulation and Emergence

When designing swarm-robotic systems, systematic comparison of algorithms from different domains is necessary to determine which is capable of scaling up to handle the target problem size and target operating conditions. We propose a set of quantitative metrics for scalability, flexibility, and emergence which are capable of addressing these needs during the system design process. We demonstrate the applicability of our proposed metrics as a design tool by solving a large object gathering problem in temporally varying operating conditions using iterative hypothesis evaluation. We provide experimental results obtained in simulation for swarms of over 10,000 robots.
#5073

Cap-and-Trade Emissions Regulation: A Strategic Analysis
Frank Cheng, Yagil Engel, Michael P. Wellman
Details | PDF

Agent-Based Simulation and Emergence

Cap-and-trade schemes are designed to achieve target levels of regulated emissions in a socially efficient manner. These schemes work by issuing regulatory credits and allowing firms to buy and sell them according to their relative compliance costs. Analyzing the efficacy of such schemes in concentrated industries is complicated by the strategic interactions among firms producing heterogeneous products. We tackle this complexity via an agent-based microeconomic model of the US market for personal vehicles. We calculate Nash equilibria among credits-trading strategies in a variety of scenarios and regulatory models. We find that while cap-and-trade improves efficiency overall, consumers bear a disproportionate share of regulation cost, as firms use credit trading to segment the vehicle market. Credits trading volume decreases when firms behave more strategically, which weakens the segmentation effect.
#422

The Price of Governance: A Middle Ground Solution to Coordination in Organizational Control
Chao Yu, Guozhen Tan
Details | PDF

Agent-Based Simulation and Emergence

Achieving coordination is crucial in organizational control. This paper investigates a middle ground solution between decentralized interactions and centralized administrations for coordinating agents beyond inefficient behavior. We first propose the price of governance (PoG) to evaluate how such a middle ground solution performs in terms of effectiveness and cost. We then propose a hierarchical supervision framework to explicitly model the PoG, and define step by step how to realize the core principle of the framework and compute the optimal PoG for a control problem. Two illustrative case studies are carried out to exemplify the applications of the proposed framework and its methodology. Results show that the hierarchical supervision framework is capable of promoting coordination among agents while bounding administrative cost to a minimum in different kinds of organizational control problems.
#5627

Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning
Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, Danny Lange
Details | PDF

Agent-Based Simulation and Emergence

The rapid pace of recent research in AI has been driven in part by the presence of fast and challenging simulation environments. These environments often take the form of games, with tasks ranging from simple board games to competitive video games. We propose a new benchmark, Obstacle Tower: a high fidelity, 3D, 3rd person, procedurally generated environment. An agent in Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent's ability to perform well on unseen instances of the environment. In this paper we outline the environment and provide a set of baseline results produced by current state-of-the-art Deep RL methods as well as human players. These algorithms fail to produce agents capable of performing near human level.

Wednesday 14 15:00 - 16:00 HAI|HAIC - Human-AI Collaboration (2405-2406)

Chair: Katia P Sycara

#4728

Exploring Computational User Models for Agent Policy Summarization
Isaac Lage, Daphna Lifschitz, Finale Doshi-Velez, Ofra Amir
Details | PDF

Human-AI Collaboration

AI agents support high stakes decision-making processes from driving cars to prescribing drugs, making it increasingly important for human users to understand their behavior. Policy summarization methods aim to convey strengths and weaknesses of such agents by demonstrating their behavior in a subset of informative states. Some policy summarization methods extract a summary that optimizes the ability to reconstruct the agent's policy under the assumption that users will deploy inverse reinforcement learning. In this paper, we explore the use of different models for extracting summaries. We introduce an imitation learning-based approach to policy summarization; we demonstrate through computational simulations that a mismatch between the model used to extract a summary and the model used to reconstruct the policy results in worse reconstruction quality; and we demonstrate through a human-subject study that people use different models to reconstruct policies in different contexts, and that matching the summary extraction model to these can improve performance. Together, our results suggest that it is important to carefully consider user models in policy summarization.
#5120

Why Can’t You Do That HAL? Explaining Unsolvability of Planning Tasks
Sarath Sreedharan, Siddharth Srivastava, David Smith, Subbarao Kambhampati
Details | PDF

Human-AI Collaboration

Explainable planning is widely accepted as a prerequisite for autonomous agents to successfully work with humans. While there has been a lot of research on generating explanations of solutions to planning problems, explaining the absence of solutions remains an open and under-studied problem, even though such situations can be the hardest to understand or debug. In this paper, we show that hierarchical abstractions can be used to efficiently generate reasons for unsolvability of planning problems. In contrast to related work on computing certificates of unsolvability, we show that these methods can generate compact, human-understandable reasons for unsolvability. Empirical analysis and user studies show the validity of our methods as well as their computational efficacy on a number of benchmark planning domains.
#5910

Balancing Explicability and Explanations in Human-Aware Planning
Tathagata Chakraborti, Sarath Sreedharan, Subbarao Kambhampati
Details | PDF

Human-AI Collaboration

Human-aware planning involves generating plans that are explicable as well as providing explanations when such plans cannot be found. In this paper, we bring these two concepts together and show how an agent can achieve a trade-off between these two competing characteristics of a plan. In order to achieve this, we conceive a first of its kind planner MEGA that can augment the possibility of explaining a plan in the plan generation process itself. We situate our discussion in the context of recent work on explicable planning and explanation generation and illustrate these concepts in two well-known planning domains, as well as in a demonstration of a robot in a typical search and reconnaissance task. Human factor studies in the latter highlight the usefulness of the proposed approach.
#5305

Model-Free Model Reconciliation
Sarath Sreedharan, Alberto Olmo Hernandez, Aditya Prasad Mishra, Subbarao Kambhampati
Details | PDF

Human-AI Collaboration

Designing agents capable of explaining complex sequential decisions remains a significant open problem in human-AI interaction. Recently, there has been a lot of interest in developing approaches for generating such explanations for various decision-making paradigms. One such approach has been the idea of explanation as model-reconciliation. The framework hypothesizes that one of the common reasons for a user's confusion could be the mismatch between the user's model of the agent's task model and the model used by the agent to generate the decisions. While this is a general framework, most works that have been explicitly built on this explanatory philosophy have focused on classical planning settings where the model of the user's knowledge is available in a declarative form. Our goal in this paper is to adapt the model reconciliation approach to a more general planning paradigm and discuss how such methods could be used when user models are no longer explicitly available. Specifically, we present a simple and easy to learn labeling model that can help an explainer decide what information could help achieve model reconciliation between the user and the agent within the context of planning with MDPs.

Wednesday 14 15:00 - 16:00 EurAI AI Dissertation Award (2306)

EurAI AI Dissertation Award

EurAI AI Dissertation Award

Wednesday 14 15:30 - 16:00 Industry Days (K)

Chair: Quan Lu (Alibaba Group)

Smart Finance: AI Meets Risk
Lingyun Gu, Chairman of the Board of Director, IceKredit

Industry Days

Wednesday 14 16:30 - 17:00 Industry Days (K)

Chair: Quan Lu (Alibaba Group)

Transform data into intelligence
Shanchuan Xu, Architect, OpenBayes

Industry Days

Wednesday 14 16:30 - 17:45 AI-HWB - ST: AI for Improving Human Well-Being 4 (J)

Chair: Dagmar Monett Diaz

#5915

Controllable Neural Story Plot Generation via Reward Shaping
Pradyumna Tambwekar, Murtaza Dhuliawala, Lara J. Martin, Animesh Mehta, Brent Harrison, Mark O. Riedl
Details | PDF

ST: AI for Improving Human Well-Being 4

Language-modeling-based approaches to story plot generation attempt to construct a plot by sampling from a language model (LM) to predict the next character, word, or sentence to add to the story. LM techniques lack the ability to receive guidance from the user to achieve a specific goal, resulting in stories that don't have a clear sense of progression and lack coherence. We present a reward-shaping technique that analyzes a story corpus and produces intermediate rewards that are backpropagated into a pre-trained LM in order to guide the model toward a given goal. Automated evaluations show our technique can create a model that generates story plots which consistently achieve a specified goal. Human-subject studies show that the generated stories have more plausible event ordering than baseline plot generation techniques.
#5423

Governance by Glass-Box: Implementing Transparent Moral Bounds for AI Behaviour
Andrea Aler Tubella, Andreas Theodorou, Frank Dignum, Virginia Dignum
Details | PDF

ST: AI for Improving Human Well-Being 4

Artificial Intelligence (AI) applications are being used to predict and assess behaviour in multiple domains which directly affect human well-being. However, if AI is to improve people’s lives, then people must be able to trust it, by being able to understand what the system is doing and why. Although transparency is often seen as the requirement in this case, realistically it might not always be possible, whereas the need to ensure that the system operates within set moral bounds remains. In this paper, we present an approach to evaluate the moral bounds of an AI system based on the monitoring of its inputs and outputs. We place a ‘Glass-Box’ around the system by mapping moral values into explicit verifiable norms that constrain inputs and outputs, in such a way that if these remain within the box we can guarantee that the system adheres to the value. The focus on inputs and outputs allows for the verification and comparison of vastly different intelligent systems; from deep neural networks to agent-based systems. The explicit transformation of abstract moral values into concrete norms brings great benefits in terms of explainability; stakeholders know exactly how the system is interpreting and employing relevant abstract moral human values and calibrate their trust accordingly. Moreover, by operating at a higher level we can check the compliance of the system with different interpretations of the same value.
#4679

AI-powered Posture Training: Application of Machine Learning in Sitting Posture Recognition Using the LifeChair Smart Cushion
Katia Bourahmoune, Toshiyuki Amagasa
Details | PDF

ST: AI for Improving Human Well-Being 4

Humans spend on average more than half of their day sitting down. The ill-effects of poor sitting posture and prolonged sitting on physical and mental health have been extensively studied, and solutions for curbing this sedentary epidemic have received special attention in recent years. With the recent advances in sensing technologies and Artificial Intelligence (AI), sitting posture monitoring and correction is one of the key problems to address for enhancing human well-being using AI. We present the application of a sitting posture training smart cushion called LifeChair that combines a novel pressure sensing technology, a smartphone app interface and machine learning (ML) for real-time sitting posture recognition and seated stretching guidance. We present our experimental design for sitting posture and stretch pose data collection using our posture training system. We achieved an accuracy of 98.93% in detecting more than 13 different sitting postures using a fast and robust supervised learning algorithm. We also establish the importance of taking into account the divergence in user body mass index in posture monitoring. Additionally, we present the first ML-based human stretch pose recognition system for pressure sensor data and show its performance in classifying six common chair-bound stretches.
#4282

Improving Customer Satisfaction in Bike Sharing Systems through Dynamic Repositioning
Supriyo Ghosh, Jing Yu Koh, Patrick Jaillet
Details | PDF

ST: AI for Improving Human Well-Being 4

In bike sharing systems (BSSs), the uncoordinated movements of customers using bikes lead to empty or congested stations, which causes a significant loss in customer demand. In order to reduce the lost demand, a wide variety of existing research has employed a fixed set of historical demand patterns to design efficient bike repositioning solutions. However, the progress remains slow in understanding the underlying uncertainties in demand and designing proactive robust bike repositioning solutions. To bridge this gap, we propose a dynamic bike repositioning approach based on a probabilistic satisficing method which uses the uncertain demand parameters that are learnt from historical data. We develop a novel and computationally efficient mixed integer linear program for maximizing the probability of satisfying the uncertain demand so as to improve the overall customer satisfaction and efficiency of the system. Extensive experimental results from a simulation model built on a real-world bike sharing data set demonstrate that our approach is not only robust to uncertainties in customer demand, but also outperforms the existing state-of-the-art repositioning approaches in terms of reducing the expected lost demand.
#3024

Evaluating the Interpretability of the Knowledge Compilation Map: Communicating Logical Statements Effectively
Serena Booth, Christian Muise, Julie Shah
Details | PDF

ST: AI for Improving Human Well-Being 4

Knowledge compilation techniques translate propositional theories into equivalent forms to increase their computational tractability. But, how should we best present these propositional theories to a human? We analyze the standard taxonomy of propositional theories for relative interpretability across three model domains: highway driving, emergency triage, and the chopsticks game. We generate decision-making agents which produce logical explanations for their actions and apply knowledge compilation to these explanations. Then, we evaluate how quickly, accurately, and confidently users comprehend the generated explanations. We find that domain, formula size, and negated logical connectives significantly affect comprehension while formula properties typically associated with interpretability are not strong predictors of human ability to comprehend the theory.

Wednesday 14 16:30 - 18:00 Panel (D-I)

Chair: Qiang Yang

AI in China

Panel

Wednesday 14 16:30 - 18:00 ML|DL - Deep Learning 5 (L)

Chair: Zhouchen Lin

#4173

One-Shot Texture Retrieval with Global Context Metric
Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao
Details | PDF

Deep Learning 5

In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image. To address this problem, we present an OS-TR network that encodes both the reference patch and the query image in order to segment the texture of the reference category. Unlike the existing texture encoding methods that integrate CNN with orderless pooling, we propose a directionality-aware network to capture the texture variations in each direction, resulting in a spatially invariant representation. To segment new categories given only a few examples, we incorporate a self-gating mechanism into the relation network to exploit global context information for adjusting per-channel modulation weights of local relation features. Extensive experiments on benchmark texture datasets and real scenarios demonstrate the above-par segmentation performance and robust generalization across domains of our proposed method.
#4320

STG2Seq: Spatial-Temporal Graph to Sequence Model for Multi-step Passenger Demand Forecasting
Lei Bai, Lina Yao, Salil S. Kanhere, Xianzhi Wang, Quan Z. Sheng
Details | PDF

Deep Learning 5

Multi-step passenger demand forecasting is a crucial task in on-demand vehicle sharing services. However, predicting passenger demand is generally challenging due to the nonlinear and dynamic spatial-temporal dependencies. In this work, we propose to model multi-step citywide passenger demand prediction based on a graph and use a hierarchical graph convolutional structure to capture both spatial and temporal correlations simultaneously. Our model consists of three parts: 1) a long-term encoder to encode historical passenger demands; 2) a short-term encoder to derive the next-step prediction for generating multi-step prediction; 3) an attention-based output module to model the dynamic temporal and channel-wise information. Experiments on three real-world datasets show that our model consistently outperforms many baseline methods and state-of-the-art models.
#5314

Omnidirectional Scene Text Detection with Sequential-free Box Discretization
Yuliang Liu, Sheng Zhang, Lianwen Jin, Lele Xie, Yaqiang Wu, Zhepeng Wang
Details | PDF

Deep Learning 5

Scene text in the wild commonly exhibits highly variable characteristics. Using a quadrilateral bounding box to localize the text instance is nearly indispensable for detection methods. However, recent research reveals that introducing quadrilateral bounding boxes for scene text detection brings a label confusion issue which is easily overlooked, and this issue may significantly undermine the detection performance. To address this issue, in this paper, we propose a novel method called Sequential-free Box Discretization (SBD), which discretizes the bounding box into key edges (KE) from which more effective methods to improve detection performance can be derived. Experiments showed that the proposed method can outperform state-of-the-art methods on many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. An ablation study also showed that simply integrating SBD into the Mask R-CNN framework substantially improves detection performance. Furthermore, an experiment on the general object dataset HRSC2016 (multi-oriented ships) showed that our method can outperform recent state-of-the-art methods by a large margin, demonstrating its powerful generalization ability.
#6146

Multi-Group Encoder-Decoder Networks to Fuse Heterogeneous Data for Next-Day Air Quality Prediction
Yawen Zhang, Qin Lv, Duanfeng Gao, Si Shen, Robert Dick, Michael Hannigan, Qi Liu
Details | PDF

Deep Learning 5

Accurate next-day air quality prediction is essential to enable warning and prevention measures for cities and individuals to cope with potential air pollution, such as vehicle restriction, factory shutdown, and limiting outdoor activities. The problem is challenging because air quality is affected by a diverse set of complex factors. There has been prior work on short-term (e.g., next 6 hours) prediction; however, there is limited research on modeling local weather influences or fusing heterogeneous data for next-day air quality prediction. This paper tackles this problem through three key contributions: (1) we leverage multi-source data, especially high-frequency grid-based weather data, to model air pollutant dynamics at station-level; (2) we add convolution operators on grid weather data to capture the impacts of various weather parameters on air pollutant variations; and (3) we automatically group (cross-domain) features based on their correlations, and propose multi-group Encoder-Decoder networks (MGED-Net) to effectively fuse multiple feature groups for next-day air quality prediction. The experiments with real-world data demonstrate the improved prediction performance of MGED-Net over state-of-the-art solutions (4.2% to 9.6% improvement in MAE and 9.2% to 16.4% improvement in RMSE).
#110

Multi-Prototype Networks for Unconstrained Set-based Face Recognition
Jian Zhao, Jianshu Li, Xiaoguang Tu, Fang Zhao, Yuan Xin, Junliang Xing, Hengzhu Liu, Shuicheng Yan, Jiashi Feng
Details | PDF

Deep Learning 5

In this paper, we address the challenging unconstrained set-based face recognition problem where each subject face is instantiated by a set of media (images and videos) instead of a single image. Naively aggregating information from all the media within a set would suffer from the large intra-set variance caused by heterogeneous factors (e.g., varying media modalities, poses and illumination) and fail to learn discriminative face representations. A novel Multi-Prototype Network (MPNet) model is thus proposed to learn multiple prototype face representations adaptively from the media sets. Each learned prototype is representative of the subject face under certain conditions in terms of pose, illumination and media modality. Instead of handcrafting the set partition for prototype learning, MPNet introduces a Dense SubGraph (DSG) learning sub-net that implicitly untangles inconsistent media and learns a number of representative prototypes. Qualitative and quantitative experiments clearly demonstrate the superiority of the proposed model over state-of-the-art methods.
#420

A Part Power Set Model for Scale-Free Person Retrieval
Yunhang Shen, Rongrong Ji, Xiaopeng Hong, Feng Zheng, Xiaowei Guo, Yongjian Wu, Feiyue Huang
Details | PDF

Deep Learning 5

Person re-identification (re-ID), which has broad application prospects in video surveillance and beyond, has recently attracted increasing research attention. To this end, most existing methods rely heavily on well-aligned pedestrian images and hand-engineered part-based models on the coarsest feature map. In this paper, to lighten the restriction of such fixed and coarse input alignment, an end-to-end part power set model with multi-scale features is proposed, which captures the discriminative parts of pedestrians from global to local, and from coarse to fine, enabling part-based scale-free person re-ID. In particular, we first factorize the visual appearance by enumerating $k$-combinations for all $k$ of $n$ body parts to exploit rich global and partial information to learn discriminative feature maps. Then, a combination ranking module is introduced to guide the model training with all combinations of body parts, which alternates between ranking combinations and estimating an appearance model. To enable scale-free input, we further exploit the pyramid architecture of deep networks to construct multi-scale feature maps with a feasible amount of extra cost in terms of memory and time. Extensive experiments on the mainstream evaluation datasets, including Market-1501, DukeMTMC-reID and CUHK03, validate that our method achieves state-of-the-art performance.
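
The enumeration step mentioned in this abstract, taking $k$-combinations of $n$ body parts for every $k$, can be sketched as follows. The part names are hypothetical, and excluding the empty combination ($k = 0$) is an assumption rather than something the abstract states.

```python
from itertools import combinations

def part_power_set(parts):
    # All non-empty k-combinations of the given parts, for k = 1..n,
    # ordered from single parts up to the full set.
    subsets = []
    for k in range(1, len(parts) + 1):
        subsets.extend(combinations(parts, k))
    return subsets

body_parts = ["head", "torso", "legs"]  # hypothetical part labels
all_combinations = part_power_set(body_parts)
```

With $n$ parts this yields $2^n - 1$ combinations, so the number of candidates that a ranking module has to score grows exponentially in $n$.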
#2668

Deliberation Learning for Image-to-Image Translation
Tianyu He, Yingce Xia, Jianxin Lin, Xu Tan, Di He, Tao Qin, Zhibo Chen
Details | PDF

Deep Learning 5

Image-to-image translation, which transfers an image from a source domain to a target one, has attracted much attention in both academia and industry. The major approach is to adopt an encoder-decoder based framework, where the encoder extracts features from the input image and then the decoder decodes the features and generates an image in the target domain as the output. In this paper, we go beyond this learning framework by considering an additional polishing step on the output image. Polishing an image is very common in daily life, such as editing and beautifying a photo in Photoshop after taking/generating it with a digital camera. Such a deliberation process is shown to be very helpful and important in practice and thus we believe it will also be helpful for image translation. Inspired by the success of deliberation networks in natural language processing, we extend the deliberation process to the field of image translation. We verify our proposed method on four two-domain translation tasks and one multi-domain translation task. Both the qualitative and quantitative results demonstrate the effectiveness of our method.

Wednesday 14 16:30 - 18:00 ML|RL - Reinforcement Learning 4 (2701-2702)

Chair: Zhenghua Xu

#1900

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces
Haotian Fu, Hongyao Tang, Jianye Hao, Zihan Lei, Yingfeng Chen, Changjie Fan
Details | PDF

Reinforcement Learning 4

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has ever succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
#2930

Imitation Learning from Video by Leveraging Proprioception
Faraz Torabi, Garrett Warnell, Peter Stone
Details | PDF

Reinforcement Learning 4

Classically, imitation learning algorithms have been developed for idealized situations, e.g., the demonstrations are often required to be collected in the exact same environment and usually include the demonstrator's actions. Recently, however, the research community has begun to address some of these shortcomings by offering algorithmic solutions that enable imitation learning from observation (IfO), e.g., learning to perform a task from visual demonstrations that may be in a different environment and do not include actions. Motivated by the fact that agents often also have access to their own internal states (i.e., proprioception), we propose and study an IfO algorithm that leverages this information in the policy learning process. The proposed architecture learns policies over proprioceptive state representations and compares the resulting trajectories visually to the demonstration data. We experimentally test the proposed technique on several MuJoCo domains and show that it outperforms other imitation from observation algorithms by a large margin.
#4404

Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning
Shihong Song, Jiayi Weng, Hang Su, Dong Yan, Haosheng Zou, Jun Zhu
Details | PDF

Reinforcement Learning 4

Learning rational behaviors in First-person-shooter (FPS) games is a challenging task for Reinforcement Learning (RL) with the primary difficulties of huge action space and insufficient exploration. To address this, we propose a hierarchical agent based on combined options with intrinsic rewards to drive exploration. Specifically, we present a hierarchical model that works in a manager-worker fashion over two levels of hierarchy. The high-level manager learns a policy over options, and the low-level workers, motivated by intrinsic reward, learn to execute the options. Performance is further improved with environmental signals appropriately harnessed. Extensive experiments demonstrate that our trained bot significantly outperforms the alternative RL-based models on FPS games requiring maze solving and combat skills, etc. Notably, we achieved first place in VDAIC 2018 Track(1).
#5088

DeepMellow: Removing the Need for a Target Network in Deep Q-Learning
Seungchan Kim, Kavosh Asadi, Michael Littman, George Konidaris
Details | PDF

Reinforcement Learning 4

Deep Q-Network (DQN) is an algorithm that achieves human-level performance in complex domains like Atari games. One of the important elements of DQN is its use of a target network, which is necessary to stabilize learning. We argue that using a target network is incompatible with online reinforcement learning, and it is possible to achieve faster and more stable learning without a target network when we use Mellowmax, an alternative softmax operator. We derive novel properties of Mellowmax, and empirically show that the combination of DQN and Mellowmax, but without a target network, outperforms DQN with a target network.
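
For readers unfamiliar with Mellowmax, the sketch below uses its standard definition, mm_ω(x) = log((1/n) Σ exp(ω x_i)) / ω, computed in a numerically stable way for ω > 0. The bootstrapped target function is only an illustration of where the operator could replace the max in a Q-learning update; its name, the parameter values, and the omission of a network are assumptions, not the authors' implementation.

```python
import math

def mellowmax(values, omega):
    # mm_omega(x) = log(mean(exp(omega * x_i))) / omega, assuming omega > 0.
    # Subtracting the largest exponent avoids overflow.
    n = len(values)
    shift = max(omega * v for v in values)
    total = sum(math.exp(omega * v - shift) for v in values)
    return (shift + math.log(total / n)) / omega

def mellowmax_target(reward, gamma, next_q_values, omega):
    # Hypothetical bootstrapped target: Mellowmax over the next state's
    # action values takes the place of the usual max.
    return reward + gamma * mellowmax(next_q_values, omega)

q_next = [0.0, 1.0]
soft_value = mellowmax(q_next, 10.0)
target = mellowmax_target(1.0, 0.9, q_next, 10.0)
```

The operator interpolates between the mean of the values (as ω approaches 0) and their maximum (as ω grows), which is why ω acts as a temperature-like parameter.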
#5236

On Principled Entropy Exploration in Policy Optimization
Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller
Details | PDF

Reinforcement Learning 4

In this paper, we investigate Exploratory Conservative Policy Optimization (ECPO), a policy optimization strategy that improves exploration behavior while assuring monotonic progress in a principled objective. ECPO conducts maximum entropy exploration within a mirror descent framework, but updates policies using reversed KL projection. This formulation bypasses undesirable mode seeking behavior and avoids premature convergence to sub-optimal policies, while still supporting strong theoretical properties such as guaranteed policy improvement. Experimental evaluations demonstrate that the proposed method significantly improves practical exploration and surpasses the empirical performance of state-of-the-art policy optimization methods in a set of benchmark tasks.
#1874

Automatic Successive Reinforcement Learning with Multiple Auxiliary Rewards
Zhao-Yang Fu, De-Chuan Zhan, Xin-Chun Li, Yi-Xing Lu
Details | PDF

Reinforcement Learning 4

Reinforcement learning has played an important role in decision-making applications, e.g., robotic motion, self-driving, and recommendation. The reward function, as a crucial component, affects the efficiency and effectiveness of reinforcement learning to a large extent. In this paper, we focus on the investigation of reinforcement learning with more than one auxiliary reward. It is found that different auxiliary rewards can boost the learning rate and effectiveness in different stages, and consequently we propose the Automatic Successive Reinforcement Learning (ASR) for auxiliary rewards grading selection for efficient reinforcement learning by stages. Experiments and simulations have shown the superiority of our proposed ASR on a range of environments, including OpenAI classical control domains and video games (Freeway and Catcher).

Wednesday 14 16:30 - 18:00 AMS|FVVS - Formal Verification, Validation and Synthesis (2703-2704)

Chair: Wojtek Jamroga

#758

Probabilistic Strategy Logic
Benjamin Aminof, Marta Kwiatkowska, Bastien Maubert, Aniello Murano, Sasha Rubin
Details | PDF

Formal Verification, Validation and Synthesis

We introduce Probabilistic Strategy Logic, an extension of Strategy Logic for stochastic systems. The logic has probabilistic terms that allow it to express many standard solution concepts, such as Nash equilibria in randomised strategies, as well as constraints on probabilities, such as independence. We study the model-checking problem for agents with perfect- and imperfect-recall. The former is undecidable, while the latter is decidable in space exponential in the system and triple-exponential in the formula. We identify a natural fragment of the logic, in which every temporal operator is immediately preceded by a probabilistic operator, and show that it is decidable in space exponential in the system and the formula, and double-exponential in the nesting depth of the probabilistic terms. Taking a fixed nesting depth, this gives a fragment that still captures many standard solution concepts, and is decidable in exponential space.
#2732

A Probabilistic Logic for Resource-Bounded Multi-Agent Systems
Hoang Nga Nguyen, Abdur Rakib
Details | PDF

Formal Verification, Validation and Synthesis

Resource-bounded alternating-time temporal logic (RB-ATL), an extension of Coalition Logic (CL) and Alternating-time Temporal Logic (ATL), allows reasoning about resource requirements of coalitions in concurrent systems. However, many real-world systems are inherently probabilistic as well as resource-bounded, and there is no straightforward way of reasoning about their unpredictable behaviours. In this paper, we propose a logic for reasoning about coalitional power under resource constraints in the probabilistic setting. We extend RB-ATL with probabilistic reasoning and provide a standard algorithm for the model-checking problem of the resulting logic Probabilistic Resource-Bounded ATL (pRB-ATL).
#4929

Reasoning about Quality and Fuzziness of Strategic Behaviours
Patricia Bouyer, Orna Kupferman, Nicolas Markey, Bastien Maubert, Aniello Murano, Giuseppe Perelli
Details | PDF

Formal Verification, Validation and Synthesis

We introduce and study SL[F], a quantitative extension of SL (Strategy Logic), one of the most natural and expressive logics describing strategic behaviours. The satisfaction value of an SL[F] formula is a real value in [0,1], reflecting "how much" or "how well" the strategic on-going objectives of the underlying agents are satisfied. We demonstrate the applications of SL[F] in quantitative reasoning about multi-agent systems, by showing how it can express concepts of stability in multi-agent systems, and how it generalises some fuzzy temporal logics. We also provide a model-checking algorithm for our logic, based on a quantitative extension of Quantified CTL*.
#2420

Decidability of Model Checking Multi-Agent Systems with Regular Expressions against Epistemic HS Specifications
Jakub Michaliszyn, Piotr Witkowski
Details | PDF

Formal Verification, Validation and Synthesis

Epistemic Halpern-Shoham logic (EHS) is an interval temporal logic defined to verify properties of Multi-Agent Systems. In this paper we show that model checking Multi-Agent Systems with regular expressions against EHS specifications is decidable. We achieve this by reducing the model checking problem to the satisfiability problem of Monadic Second-Order Logic on trees.
#6364

Demystifying the Combination of Dynamic Slicing and Spectrum-based Fault Localization
Sofia Reis, Rui Abreu, Marcelo d'Amorim
Details | PDF

Formal Verification, Validation and Synthesis

Several approaches have been proposed to reduce debugging costs through automated software fault diagnosis. Dynamic Slicing (DS) and Spectrum-based Fault Localization (SFL) are popular fault diagnosis techniques and normally seen as complementary. This paper reports on a comprehensive study to reassess the effects of combining DS with SFL. With this combination, components that are often involved in failing but seldom in passing test runs could be located and their suspiciousness reduced. Results show that the DS-SFL combination, coined as Tandem-FL, improves the diagnostic accuracy up to 73.7% (13.4% on average). Furthermore, results indicate that the risk of missing faulty statements, which is a key limitation of DS, is not high: DS misses faulty statements in 9% of the 260 cases. To sum up, we found that the DS-SFL combination was practical and effective and encourage new SFL techniques to be evaluated against that optimization.
#4521

Best Answers over Incomplete Data: Complexity and First-Order Rewritings
Amélie Gheerbrant, Cristina Sirangelo
Details | PDF

Formal Verification, Validation and Synthesis

Answering queries over incomplete data is ubiquitous in data management and in many AI applications that use query rewriting to take advantage of relational database technology. In these scenarios one lacks full information on the data but queries still need to be answered with certainty. The certainty aspect often makes query answering infeasible except for restricted classes, such as unions of conjunctive queries. In addition, often there are no, or very few, certain answers, so expensive computation is in vain. Therefore we study a relaxation of certain answers called best answers. They are defined as those answers for which there is no better one (that is, no answer true in more possible worlds). When certain answers exist the two notions coincide. We compare different ways of casting query answering as a decision problem and characterise its complexity for first-order queries, showing significant differences in the behavior of best and certain answers. We then restrict attention to best answers for unions of conjunctive queries and produce a practical algorithm for finding them based on query rewriting techniques.
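The definition of best answers in this abstract can be illustrated with a toy sketch. It assumes each candidate answer comes with an explicitly enumerated, finite set of possible worlds in which it is true, and it reads "true in more possible worlds" as strict set inclusion; both are assumptions made for illustration. The function and variable names are hypothetical, and this is not the rewriting-based algorithm the abstract refers to.

```python
def best_answers(support):
    """Return the answers whose support is not strictly contained in another's.

    `support` maps each candidate answer to the set of possible worlds
    (here just world identifiers) in which that answer is true.
    """
    return {
        answer
        for answer, worlds in support.items()
        # `worlds < other` is Python's strict-subset test on sets
        if not any(worlds < other for other in support.values())
    }

def certain_answers(support, all_worlds):
    """Return the answers that are true in every possible world."""
    return {answer for answer, worlds in support.items() if worlds == all_worlds}
```

In this toy setting, whenever `certain_answers` is non-empty it returns the same set as `best_answers`, matching the abstract's remark that the two notions then coincide.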

Wednesday 14 16:30 - 18:00 HSGP|GPML - Game Playing and Machine Learning (2705-2706)

Chair: Daniel Harabor

#779

DeltaDou: Expert-level Doudizhu AI through Self-play
Qiqi Jiang, Kuangzheng Li, Boyao Du, Hao Chen, Hai Fang
Details | PDF

Game Playing and Machine Learning

Artificial Intelligence has seen several breakthroughs in two-player perfect information games. Nevertheless, Doudizhu, a three-player imperfect information game, is still quite challenging. In this paper, we present a Doudizhu AI by applying deep reinforcement learning from games of self-play. The algorithm combines an asymmetric MCTS on nodes of information set of each player, a policy-value network that approximates the policy and value on each decision node, and inference on unobserved hands of other players by given policy. Our results show that self-play can significantly improve the performance of our agent in this multi-agent imperfect information game. Even starting with a weak AI, our agent can achieve human expert level after days of self-play and training.
#3831

An Evolution Strategy with Progressive Episode Lengths for Playing Games
Lior Fuks, Noor Awad, Frank Hutter, Marius Lindauer
Details | PDF

Game Playing and Machine Learning

Recently, Evolution Strategies (ES) have been successfully applied to solve problems commonly addressed by reinforcement learning (RL). Due to the simplicity of ES approaches, their runtime is often dominated by the RL-task at hand (e.g., playing a game). In this work, we introduce Progressive Episode Lengths (PEL) as a new technique and incorporate it with ES. The main objective is to allow the agent to play short and easy tasks with limited lengths, and then use the gained knowledge to further solve long and hard tasks with progressive lengths. This allows the agent to perform many function evaluations and find a good solution for short time horizons before adapting the strategy to tackle larger time horizons. We evaluated PEL on a subset of Atari games from OpenAI Gym, showing that it can substantially improve the optimization speed, stability and final score of canonical ES. Specifically, we show average improvements of 80% (32%) after 2 hours (10 hours) compared to canonical ES.
#5042

Playing Card-Based RTS Games with Deep Reinforcement Learning
Tianyu Liu, Zijie Zheng, Hongchang Li, Kaigui Bian, Lingyang Song
Details | PDF

Game Playing and Machine Learning

Game AI is of great importance as games are simulations of reality. Recent research on game AI has shown much progress in various kinds of games, such as console games, board games and MOBA games. However, the exploration in RTS games remains a challenge because of their huge state space, imperfect information, sparse rewards and various strategies. Besides, the typical card-based RTS games have complex card features and are still lacking solutions. We present a deep model SEAT (selection-attention) to play card-based RTS games. The SEAT model includes two parts, a selection part for card choice and an attention part for card usage, and it learns from scratch via deep reinforcement learning. Comprehensive experiments are performed on Clash Royale, a popular mobile card-based RTS game. Empirical results show that the SEAT model agent reaches a high winning rate against rule-based and decision-tree-based agents.
#3965

Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game
Peixi Peng, Junliang Xing, Lili Cao, Lisen Mu, Chang Huang
Details | PDF

Game Playing and Machine Learning

The task of a real-time combat game is to coordinate multiple units to defeat their enemies controlled by the given opponent in a real-time combat scenario. It is difficult to design a high-level Artificial Intelligence (AI) program for such a task due to its extremely large state-action space and real-time requirements. This paper formulates this task as a collective decentralized partially observable Markov decision process, and designs a Deep Decentralized Policy Network (DDPN) to model the policies. To train DDPN effectively, a novel two-stage learning algorithm is proposed which combines imitation learning from the opponent and reinforcement learning by no-regret dynamics. Extensive experimental results on various combat scenarios indicate that the proposed method can defeat different opponent models and significantly outperforms many state-of-the-art approaches.
#751

Procedural Generation of Initial States of Sokoban
Dâmaris S. Bento, André G. Pereira, Levi H. S. Lelis
Details | PDF

Game Playing and Machine Learning

Procedural generation of initial states of state-space search problems has applications in human and machine learning as well as in the evaluation of planning systems. In this paper we deal with the task of generating hard and solvable initial states of Sokoban puzzles. We propose hardness metrics based on pattern database heuristics and the use of novelty to improve the exploration of search methods in the task of generating initial states. We then present a system called Beta that uses our hardness metrics and novelty to generate initial states. Experiments show that Beta is able to generate initial states that are harder to solve by a specialized solver than those designed by human experts.
#2866

The Expected-Length Model of Options
David Abel, John Winder, Marie desJardins, Michael Littman
Details | PDF

Game Playing and Machine Learning

Effective options can make reinforcement learning easier by enhancing an agent's ability to both explore in a targeted manner and plan further into the future. However, learning an appropriate model of an option's dynamics is hard, requiring estimating a highly parameterized probability distribution. This paper introduces and motivates the Expected-Length Model (ELM) for options, an alternate model for transition dynamics. We prove ELM is a (biased) estimator of the traditional Multi-Time Model (MTM), but provide a non-vacuous bound on their deviation. We further prove that, in stochastic shortest path problems, ELM induces a value function that is sufficiently similar to the one induced by MTM, and is thus capable of supporting near-optimal behavior. We explore the practical utility of this option model experimentally, finding consistent support for the thesis that ELM is a suitable replacement for MTM. In some cases, we find ELM leads to more sample efficient learning, especially when options are arranged in a hierarchy.

Wednesday 14 16:30 - 18:00 ML|OL - Online Learning 2 (2601-2602)

Chair: Ying Wei

#744

Indirect Trust is Simple to Establish
Elham Parhizkar, Mohammad Hossein Nikravan, Sandra Zilles
Details | PDF

Online Learning 2

In systems with multiple potentially deceptive agents, any single agent may have to assess the trustworthiness of other agents in order to decide with which agents to interact. In this context, indirect trust refers to trust established through third-party advice. Since the advisers themselves may be deceptive or unreliable, agents need a mechanism to assess and properly incorporate advice. We evaluate existing state-of-the-art methods for computing indirect trust in numerous simulations, demonstrating that the best ones tend to be of prohibitively large complexity. We propose a new and easy to implement method for computing indirect trust, based on a simple prediction with expert advice strategy as is often used in online learning. This method either competes with or outperforms all tested systems in the vast majority of the settings we simulated, while scaling substantially better. Our results demonstrate that existing systems for computing indirect trust are overly complex; the problem can be solved much more efficiently than the literature suggests.
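To make the phrase "prediction with expert advice" concrete, the sketch below shows a generic exponentially weighted forecaster (often called Hedge) applied to advisers. It is a textbook strategy, not necessarily the one used in the paper; the learning rate `eta`, the loss values in [0, 1], and all function names are assumptions made for illustration.

```python
import math

def update_adviser_weights(weights, losses, eta=0.5):
    """One exponential-weights step: shrink each adviser's weight by its loss.

    `losses[i]` is assumed to lie in [0, 1] and to measure how wrong
    adviser i's latest advice turned out to be.
    """
    reweighted = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(reweighted)
    return [w / total for w in reweighted]  # renormalise so weights sum to 1

def combined_trust_estimate(weights, advice):
    """Weighted average of the advisers' reported trust values."""
    return sum(w * a for w, a in zip(weights, advice))
```

Starting from uniform weights, an adviser whose advice is repeatedly wrong loses influence geometrically, so a deceptive adviser contributes little to the combined estimate after a few rounds.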
#972

Optimal Exploitation of Clustering and History Information in Multi-armed Bandit
Djallel Bouneffouf, Srinivasan Parthasarathy, Horst Samulowitz, Martin Wistuba
Details | PDF

Online Learning 2

We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effectively during the online exploration phase and derive their regret bounds. In particular, we develop the META algorithm which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations. The former outperforms the latter when the clustering quality is good, and vice-versa. Extensive experiments on synthetic and real world datasets on Warfarin drug dosage and web server selection for latency minimization validate our theoretical insights and demonstrate that META is a robust strategy for optimally exploiting the pre-clustering information.
#2702

Unsupervised Hierarchical Temporal Abstraction by Simultaneously Learning Expectations and Representations
Katherine Metcalf, David Leake
Details | PDF

Online Learning 2

This paper presents ENHAnCE, an algorithm that simultaneously learns a predictive model of the input stream and generates representations of the concepts being observed. Following cognitively-inspired models of event segmentation, ENHAnCE uses expectation violations to identify boundaries between temporally extended patterns. It applies its expectation-driven process at multiple levels of temporal granularity to produce a hierarchy of predictive models that enable it to identify concepts at multiple levels of temporal abstraction. Evaluations show that the temporal abstraction hierarchies generated by ENHAnCE closely match hand-coded hierarchies for the test data streams. Given language data streams, ENHAnCE learns a hierarchy of predictive models that capture basic units of both spoken and written language: morphemes, lexemes, phonemes, syllables, and words.
#1113

Online Learning from Capricious Data Streams: A Generative Approach
Yi He, Baijun Wu, Di Wu, Ege Beyazit, Sheng Chen, Xindong Wu
Details | PDF

Online Learning 2

Learning with streaming data has received extensive attention during the past few years. Existing approaches assume the feature space is fixed or changes by following explicit regularities, limiting their applicability in dynamic environments where the data streams are described by an arbitrarily varying feature space. To handle such capricious data streams, in this paper we develop a novel algorithm, named OCDS (Online learning from Capricious Data Streams), which does not make any assumption on feature space dynamics. OCDS trains a learner on a universal feature space that establishes relationships between old and new features, so that the patterns learned in the old feature space can be used in the new feature space. Specifically, the universal feature space is constructed by leveraging the relatedness among features. We propose a generative graphical model to model the construction process, and show that learning from the universal feature space can effectively improve performance with theoretical analysis. The experimental results demonstrate that OCDS achieves conspicuous performance on synthetic and real datasets.
#1667

Efficient Non-parametric Bayesian Hawkes Processes
Rui Zhang, Christian Walder, Marian-Andrei Rizoiu, Lexing Xie
Details | PDF

Online Learning 2

In this paper, we develop an efficient non-parametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the stationarity of the Hawkes process, we efficiently sample random branching structures and thus, we split the Hawkes process into clusters of Poisson processes. We derive two algorithms (a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization) and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show our methods to be able to infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity for different content types such as music or pet videos.
#5712

Sketched Iterative Algorithms for Structured Generalized Linear Models
Qilong Gu, Arindam Banerjee
Details | PDF

Online Learning 2

Recent years have seen advances in optimizing large scale statistical estimation problems. In statistical learning settings iterative optimization algorithms have been shown to enjoy geometric convergence. While powerful, such results only hold for the original dataset, and may face computational challenges when the sample size is large. In this paper, we study sketched iterative algorithms, in particular sketched-PGD (projected gradient descent) and sketched-SVRG (stochastic variance reduced gradient) for structured generalized linear models, and illustrate that these methods continue to have geometric convergence to the statistical error under suitable assumptions. Moreover, the sketching dimension is allowed to be even smaller than the ambient dimension, which can lead to significant speed-ups. The sketched iterative algorithms introduced provide an additional dimension to study the trade-offs between statistical accuracy and time.

Wednesday 14 16:30 - 18:00 ML|C - Classification 5 (2603-2604)

Chair: Yue Zhu

#1598

ATTAIN: Attention-based Time-Aware LSTM Networks for Disease Progression Modeling
Yuan Zhang, Xi Yang, Julie Ivy, Min Chi
Details | PDF

Classification 5

Modeling patient disease progression using Electronic Health Records (EHRs) is critical to assist clinical decision making. Long-Short Term Memory (LSTM) is an effective model to handle sequential data, such as EHRs, but it encounters two major limitations when applied to EHRs: it is unable to interpret the prediction results and it ignores the irregular time intervals between consecutive events. To tackle these limitations, we propose Attention-based Time-Aware LSTM Networks (ATTAIN) to improve the interpretability of LSTM and to identify the critical previous events for current diagnosis by modeling the inherent time irregularity. We validate ATTAIN on modeling the progression of an extremely challenging disease, septic shock, by using real-world EHRs. Our results demonstrate that the proposed framework outperforms the state-of-the-art models such as RETAIN and T-LSTM. Also, the generated interpretative time-aware attention weights shed some light on the progression behaviors of septic shock.
#2324

Positive and Unlabeled Learning with Label Disambiguation
Chuang Zhang, Dexin Ren, Tongliang Liu, Jian Yang, Chen Gong
Details | PDF

Classification 5

Positive and Unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. The state-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is simultaneously treated as positive and negative with different class weights. However, the ground-truth label of an unlabeled example should be unique, so the existing models inadvertently introduce label noise, which may lead to a biased classifier and deteriorated performance. To solve this problem, this paper proposes a novel algorithm dubbed "Positive and Unlabeled learning with Label Disambiguation" (PULD). We first regard all the unlabeled examples in PU learning as ambiguously labeled as positive and negative, and then employ the margin-based label disambiguation strategy, which enlarges the margin of classifier response between the most likely label and the less likely one, to find the unique ground-truth label of each unlabeled example. Theoretically, we derive the generalization error bound of the proposed method by analyzing its Rademacher complexity. Experimentally, we conduct intensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the superiority of the proposed PULD to the existing PU learning approaches.
#2920

Prediction of Mild Cognitive Impairment Conversion Using Auxiliary Information
Xiaofeng Zhu
Details | PDF

Classification 5

In this paper, we propose a new feature selection method to address the issue of High Dimension Low Sample Size (HDLSS) for the prediction of Mild Cognitive Impairment (MCI) conversion. Specifically, by regarding the Magnetic Resonance Imaging (MRI) information of MCI subjects as the target data, this paper proposes to integrate auxiliary information with the target data in a unified feature selection framework for distinguishing progressive MCI (pMCI) subjects from stable MCI (sMCI) subjects, i.e., the MCI conversion classification for short in this paper, based on their MRI information. The auxiliary information includes the Positron Emission Tomography (PET) information of the target data, the MRI information of Alzheimer’s Disease (AD) subjects and Normal Control (NC) subjects, and the ages of the target data and the AD and NC subjects. As a result, the proposed method jointly selects features from the auxiliary data and the target data by taking into account the influence of outliers and aging of these two kinds of data. Experimental results on the public data of Alzheimer’s Disease Neuroimaging Initiative (ADNI) verified the effectiveness of our proposed method, compared to three state-of-the-art feature selection methods, in terms of four classification evaluation metrics.
#5102

Recurrent Generative Networks for Multi-Resolution Satellite Data: An Application in Cropland Monitoring
Xiaowei Jia, Mengdie Wang, Ankush Khandelwal, Anuj Karpatne, Vipin Kumar
Details | PDF

Classification 5

Effective and timely monitoring of croplands is critical for managing food supply. While remote sensing data from earth-observing satellites can be used to monitor croplands over large regions, this task is challenging for small-scale croplands as they cannot be captured precisely using coarse-resolution data. On the other hand, the remote sensing data in higher resolution are collected less frequently and contain missing or disturbed data. Hence, traditional sequential models cannot be directly applied on high-resolution data to extract temporal patterns, which are essential to identify crops. In this work, we propose a generative model to combine multi-scale remote sensing data to detect croplands at high resolution. During the learning process, we leverage the temporal patterns learned from coarse-resolution data to generate missing high-resolution data. Additionally, the proposed model can track classification confidence in real time and potentially lead to an early detection. The evaluation in an intensively cultivated region demonstrates the effectiveness of the proposed method in cropland detection.
#6256

MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions
Nuo Xu, Pinghui Wang, Long Chen, Jing Tao, Junzhou Zhao
Details | PDF

Classification 5

Predicting interactions between structured entities lies at the core of numerous tasks such as drug regimen and new material design. In recent years, graph neural networks have become attractive. They represent structured entities as graphs, and then extract features from each individual graph using graph convolution operations. However, these methods have some limitations: i) their networks only extract features from a fixed-size subgraph structure (i.e., a fixed-size receptive field) of each node, and ignore features in substructures of different sizes, and ii) features are extracted by considering each entity independently, which may not effectively reflect the interaction between two entities. To resolve these problems, we present MR-GNN, an end-to-end graph neural network with the following features: i) it uses a multi-resolution based architecture to extract node features from different neighborhoods of each node, and ii) it uses dual graph-state long short-term memory networks (LSTMs) to summarize local features of each graph and extracts the interaction features between pairwise graphs. Experiments conducted on real-world datasets show that MR-GNN improves the prediction of state-of-the-art methods.
#535

Learning Low-precision Neural Networks without Straight-Through Estimator (STE)
Zhi-Gang Liu, Matthew Mattina
Details | PDF

Classification 5

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). Our AB method avoids STE approximation by replacing the quantized weight in the loss function by an affine combination of the quantized weight w_q and the corresponding full-precision weight w with non-trainable scalar coefficient alpha and (1 - alpha). During training, alpha is gradually increased from 0 to 1; the gradient updates to the weights are through the full precision term, (1 - alpha) * w, of the affine combination; the model is converted from full-precision to low precision progressively. To evaluate the AB method, a 1-bit BinaryNet on CIFAR10 dataset and 8-bit, 4-bit MobileNet v1, ResNet_50 v1/2 on ImageNet are trained using the alpha-blending approach, and the evaluation indicates that AB improves top-1 accuracy by 0.9%, 0.82% and 2.93% respectively compared to the results of STE based quantization.
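The affine combination described in this abstract can be sketched for a single scalar weight as follows. The sign-based 1-bit quantizer, the linear schedule for alpha, and all function names are assumptions made for illustration; the abstract only states that alpha is increased gradually from 0 to 1 and that gradients flow through the (1 - alpha) * w term.

```python
def binarize(w):
    """Hypothetical 1-bit quantizer: map a weight to +1.0 or -1.0."""
    return 1.0 if w >= 0 else -1.0

def alpha_blend(w, alpha, quantize=binarize):
    """Blended weight: (1 - alpha) * w + alpha * w_q, with w_q = quantize(w)."""
    return (1.0 - alpha) * w + alpha * quantize(w)

def grad_wrt_full_precision(upstream_grad, alpha):
    """Gradient reaching w when w_q is treated as a constant.

    Only the (1 - alpha) * w term depends on w, so the upstream gradient
    is simply scaled by (1 - alpha).
    """
    return (1.0 - alpha) * upstream_grad

def linear_alpha(step, total_steps):
    """Assumed schedule: alpha rises linearly from 0 to 1 over training."""
    return min(1.0, step / total_steps)
```

At alpha = 0 the blended weight equals the full-precision weight, and at alpha = 1 it equals the quantized weight, so the forward pass moves from full precision to low precision as training proceeds.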

Wednesday 14 16:30 - 18:00 NLP|NLG - Natural Language Generation 1 (2605-2606)

Chair: Kaisong Song

#327

Difficulty Controllable Generation of Reading Comprehension Questions
Yifan Gao, Lidong Bing, Wang Chen, Michael Lyu, Irwin King
Details | PDF

Natural Language Generation 1

We investigate the difficulty levels of questions in reading comprehension datasets such as SQuAD, and propose a new question generation setting, named Difficulty-controllable Question Generation (DQG). Taking as input a sentence in the reading comprehension paragraph and some of its text fragments (i.e., answers) that we want to ask questions about, a DQG method needs to generate questions each of which has a given text fragment as its answer, and meanwhile the generation is under the control of specified difficulty labels: the output questions should satisfy the specified difficulty as much as possible. To solve this task, we propose an end-to-end framework to generate questions of designated difficulty levels by exploring a few important intuitions. For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the questions generated by our framework not only have better quality under metrics like BLEU, but also comply with the specified difficulty labels.
#835

HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning
Shiyang Yan, Jun Xu, Yuai Liu, Lin Xu
Details | PDF

Natural Language Generation 1

Person re-identification (re-ID) aims to recognize a person-of-interest across different cameras with notable appearance variance. Existing research works focused on the capability and robustness of visual representation. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to fulfill domain transfer and language descriptions generation. Then the proposed HorNet can learn the visual and language representation from both the images and captions jointly, and thus enhance the performance of person re-ID. Extensive experiments are conducted on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and Duke-MTMC, demonstrating the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.
#1898

T-CVAE: Transformer-Based Conditioned Variational Autoencoder for Story Completion
Tianming Wang, Xiaojun Wan
Details | PDF

Natural Language Generation 1

Story completion is a very challenging task of generating the missing plot for an incomplete story, which requires not only understanding but also inference of the given contextual clues. In this paper, we present a novel conditional variational autoencoder based on Transformer for missing plot generation. Our model uses shared attention layers for encoder and decoder, which make the most of the contextual clues, and a latent variable for learning the distribution of coherent story plots. Through drawing samples from the learned distribution, diverse reasonable plots can be generated. Both automatic and manual evaluations show that our model generates better story plots than state-of-the-art models in terms of readability, diversity and coherence.
#2108

A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer
Fuli Luo, Peng Li, Jie Zhou, Pengcheng Yang, Baobao Chang, Xu Sun, Zhifang Sui
Details | PDF

Natural Language Generation 1

Unsupervised text style transfer aims to transfer the underlying style of text but keep its main content unchanged without parallel data. Most existing methods typically follow two steps: first separating the content from the original style, and then fusing the content with the desired style. However, the separation in the first step is challenging because the content and style interact in subtle ways in natural language. Therefore, in this paper, we propose a dual reinforcement learning framework to directly transfer the style of the text via a one-step mapping model, without any separation of content and style. Specifically, we consider the learning of the source-to-target and target-to-source mappings as a dual task, and two rewards are designed based on such a dual structure to reflect the style accuracy and content preservation, respectively. In this way, the two one-step mapping models can be trained via reinforcement learning, without any use of parallel data. Automatic evaluations show that our model outperforms the state-of-the-art systems by a large margin, with an average improvement of more than 10 BLEU points on two benchmark datasets. Human evaluations also validate the effectiveness of our model in terms of style accuracy, content preservation and fluency. Our code and data, including outputs of all baselines and our model, are available at https://github.com/luofuli/DualRL.
#3504

Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling
Pengcheng Yang, Fuli Luo, Peng Chen, Lei Li, Zhiyi Yin, Xiaodong He, Xu Sun
Details | PDF

Natural Language Generation 1

The visual storytelling (VST) task aims at generating a reasonable and coherent paragraph-level story with the image stream as input. Unlike a caption, which is a direct and literal description of image content, the story in the VST task tends to contain plenty of imaginary concepts that do not appear in the image. This requires the AI agent to reason and associate with the imaginary concepts based on implicit commonsense knowledge to generate a reasonable story describing the image stream. Therefore, in this work, we present a commonsense-driven generative model, which aims to introduce crucial commonsense from the external knowledge base for visual storytelling. Our approach first extracts a set of candidate knowledge graphs from the knowledge base. Then, an elaborately designed vision-aware directional encoding schema is adopted to effectively integrate the most informative commonsense. Besides, we strive to maximize the semantic similarity within the output during decoding to enhance the coherence of the generated text. Results show that our approach can outperform the state-of-the-art systems by a large margin, achieving a 29% relative improvement in CIDEr score. With additional commonsense and a semantic-relevance based objective, the generated stories are more diverse and coherent.
#6196

Utilizing Non-Parallel Text for Style Transfer by Making Partial Comparisons
Di Yin, Shujian Huang, Xin-Yu Dai, Jiajun Chen
Details | PDF

Natural Language Generation 1

Text style transfer aims to rephrase a given sentence into a different style without changing its original content. Since parallel corpora (i.e. sentence pairs with the same content but different styles) are usually unavailable, most previous works solely guide the transfer process with distributional information, i.e. using style-related classifiers or language models, which neglect the correspondence of instances, leading to poor transfer performance, especially for the content preservation. In this paper, we propose making partial comparisons to explicitly model the content and style correspondence of instances, respectively. To train the partial comparators, we propose methods to extract partial-parallel training instances automatically from the non-parallel data, and to further enhance the training process by using data augmentation. We perform experiments that compare our method to other existing approaches on two review datasets. Both automatic and manual evaluations show that our approach can significantly improve the performance of existing adversarial methods, and outperforms most state-of-the-art models. Our code and data will be available on Github.

Wednesday 14 16:30 - 18:00 CV|DDCV - 2D and 3D Computer Vision 2 (2501-2502)

Chair: Xuejin Chen

#487

3DViewGraph: Learning Global Features for 3D Shapes from A Graph of Unordered Views with Attention
Zhizhong Han, Xiyang Wang, Chi Man Vong, Yu-Shen Liu, Matthias Zwicker, C. L. Philip Chen
Details | PDF

2D and 3D Computer Vision 2

Learning global features by aggregating information over multiple views has been shown to be effective for 3D shape analysis. For view aggregation in deep learning models, pooling has been applied extensively. However, pooling leads to a loss of the content within views, and the spatial relationship among views, which limits the discriminability of learned features. We propose 3DViewGraph to resolve this issue, which learns 3D global features by more effectively aggregating unordered views with attention. Specifically, unordered views taken around a shape are regarded as view nodes on a view graph. 3DViewGraph first learns a novel latent semantic mapping to project low-level view features into meaningful latent semantic embeddings in a lower dimensional space, which is spanned by latent semantic patterns. Then, the content and spatial information of each pair of view nodes are encoded by a novel spatial pattern correlation, where the correlation is computed among latent semantic patterns. Finally, all spatial pattern correlations are integrated with attention weights learned by a novel attention mechanism. This further increases the discriminability of learned features by highlighting the unordered view nodes with distinctive characteristics and depressing the ones with appearance ambiguity. We show that 3DViewGraph outperforms state-of-the-art methods under three large-scale benchmarks.
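The contrast between pooling and attention-based view aggregation can be sketched as follows. This is a generic attention-weighted sum, not the spatial pattern correlation of 3DViewGraph; the function names and the `query` vector are hypothetical stand-ins for learned components.

```python
import numpy as np

def max_pool_views(view_feats):
    """Element-wise max over views: order-invariant, but forgets which view contributed."""
    return view_feats.max(axis=0)

def attention_pool_views(view_feats, query):
    """Softmax-weighted sum of view features; one weight per view."""
    scores = view_feats @ query
    scores = scores - scores.max()            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ view_feats, weights

views = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # 3 views, 2-dim features
query = np.array([2.0, 0.0])                              # hypothetical learned vector
pooled, weights = attention_pool_views(views, query)
```

Both aggregators are invariant to the order of the views, but the attention variant exposes per-view weights that can emphasize distinctive views.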
#497

Parts4Feature: Learning 3D Global Features from Generally Semantic Parts in Multiple Views
Zhizhong Han, Xinhai Liu, Yu-Shen Liu, Matthias Zwicker
Details | PDF

2D and 3D Computer Vision 2

Deep learning has achieved remarkable results in 3D shape analysis by learning global shape features from the pixel-level over multiple views. Previous methods, however, compute low-level features for entire views without considering part-level information. In contrast, we propose a deep neural network, called Parts4Feature, to learn 3D global features from part-level information in multiple views. We introduce a novel definition of generally semantic parts, which Parts4Feature learns to detect in multiple views from different 3D shape segmentation benchmarks. A key idea of our architecture is that it transfers the ability to detect semantically meaningful parts in multiple views to learn 3D global features. Parts4Feature achieves this by combining a local part detection branch and a global feature learning branch with a shared region proposal module. The global feature learning branch aggregates the detected parts in terms of learned part patterns with a novel multi-attention mechanism, while the region proposal module enables locally and globally discriminative information to be promoted by each other. We demonstrate that Parts4Feature outperforms the state-of-the-art under three large-scale 3D shape benchmarks.
#1852

Rethinking Loss Design for Large-scale 3D Shape Retrieval
Zhaoqun Li, Cheng Xu, Biao Leng
Details | PDF

2D and 3D Computer Vision 2

Learning discriminative shape representations is a crucial issue for large-scale 3D shape retrieval. In this paper, we propose the Collaborative Inner Product Loss (CIP Loss) to obtain an ideal shape embedding that is discriminative among different categories and clustered within the same class. Using a simple inner product operation, CIP loss explicitly enforces the features of the same class to be clustered in a linear subspace, while inter-class subspaces are constrained to be at least orthogonal. Compared to previous metric loss functions, CIP loss provides a clearer geometric interpretation for the embedding than Euclidean margin and, unlike cosine margin, is easy to implement without a normalization operation. Moreover, our proposed loss term can be combined with other commonly used loss functions and can be easily plugged into existing off-the-shelf architectures. Extensive experiments conducted on two public 3D object retrieval datasets, ModelNet and ShapeNetCore 55, demonstrate the effectiveness of our proposal, and our method has achieved state-of-the-art results on both datasets.
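The idea of an inner-product-based loss can be sketched as below. The exact CIP Loss formula is not given in the abstract, so the hinge terms and the `margin` parameter here are assumptions chosen only to reflect the two stated goals: same-class features should have large inner products, and features from different classes should have non-positive inner products (at least orthogonal).

```python
import numpy as np

def inner_product_loss(features, labels, margin=1.0):
    """Hypothetical hinge-style loss on the Gram matrix of a batch."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    gram = features @ features.T                      # all pairwise inner products
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    # Same class: penalize inner products below the margin.
    intra = np.maximum(0.0, margin - gram)[same & off_diag]
    # Different classes: penalize any positive inner product.
    inter = np.maximum(0.0, gram)[~same]
    return float(intra.sum() + inter.sum())

separated = inner_product_loss([[1, 0], [2, 0], [0, 1], [0, 3]], [0, 0, 1, 1])
collapsed = inner_product_loss([[1, 0], [1, 0]], [0, 1])
```

No feature normalization is needed in this form, which matches the implementation convenience the abstract mentions.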
#3897

MAT-Net: Medial Axis Transform Network for 3D Object Recognition
Jianwei Hu, Bin Wang, Lihui Qian, Yiling Pan, Xiaohu Guo, Lingjie Liu, Wenping Wang
Details | PDF

2D and 3D Computer Vision 2

3D deep learning performance depends on object representation and local feature extraction. In this work, we present MAT-Net, a neural network which captures local and global features from the Medial Axis Transform (MAT). Different from K-Nearest-Neighbor method which extracts local features by a fixed number of neighbors, our MAT-Net exploits effective modules Group-MAT and Edge-Net to process topological structure. Experimental results illustrate that MAT-Net demonstrates competitive or better performance on 3D shape recognition than state-of-the-art methods, and prove that MAT representation has excellent capacity in 3D deep learning, even in the case of low resolution.
#873

Semi-supervised Three-dimensional Reconstruction Framework with GAN
Chong Yu
Details | PDF

2D and 3D Computer Vision 2

Because of the intrinsic complexity in computation, three-dimensional (3D) reconstruction is an essential and challenging topic in computer vision research and applications. The existing methods for 3D reconstruction often produce holes, distortions and obscure parts in the reconstructed 3D models, or can only reconstruct voxelized 3D models for simple isolated objects. They are therefore not adequate for real-world use. Since 2014, the Generative Adversarial Network (GAN) has been widely used for generating synthetic data and for semi-supervised learning. The focus of this paper is therefore to achieve high-quality 3D reconstruction by adopting the GAN principle. We propose a novel semi-supervised 3D reconstruction framework, namely SS-3D-GAN, which can iteratively improve any raw 3D reconstruction models by training the GAN models to converge. This new model only takes real-time 2D observation images as the weak supervision, and doesn't rely on prior knowledge of shape models or any referenced observations. Finally, through qualitative and quantitative experiments and analysis, this new method shows compelling advantages over the current state-of-the-art methods on the Tanks & Temples reconstruction benchmark dataset.
#1325

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos
Haofei Xu, Jianmin Zheng, Jianfei Cai, Juyong Zhang
Details | PDF

2D and 3D Computer Vision 2

While learning based depth estimation from images/videos has achieved substantial progress, there still exist intrinsic limitations. Supervised methods are limited by a small amount of ground truth or labeled data and unsupervised methods for monocular videos are mostly based on the static scene assumption, not performing well on real world scenarios with the presence of dynamic objects. In this paper, we propose a new learning based method consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate depth from unconstrained monocular videos without ground truth supervision. The core contribution lies in RDN for proper handling of rigid and non-rigid motions of various objects such as rigidly moving cars and deformable humans. In particular, a deformation based motion representation is proposed to model individual object motion on 2D images. This representation enables our method to be applicable to diverse unconstrained monocular videos. Our method can not only achieve the state-of-the-art results on standard benchmarks KITTI and Cityscapes, but also show promising results on a crowded pedestrian tracking dataset, which demonstrates the effectiveness of the deformation based motion representation. Code and trained models are available at https://github.com/haofeixu/rdn4depth.

Wednesday 14 16:30 - 18:00 PS|SPS - Search in Planning and Scheduling (2503-2504)

Chair: Alvaro Torralba

#950

Pattern Selection for Optimal Classical Planning with Saturated Cost Partitioning
Jendrik Seipp
Details | PDF

Search in Planning and Scheduling

Pattern databases are the foundation of some of the strongest admissible heuristics for optimal classical planning. Experiments showed that the most informative way of combining information from multiple pattern databases is to use saturated cost partitioning. Previous work selected patterns and computed saturated cost partitionings over the resulting pattern database heuristics in two separate steps. We introduce a new method that uses saturated cost partitioning to select patterns and show that it outperforms all existing pattern selection algorithms.
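Saturated cost partitioning itself can be sketched on small explicit abstractions. This is a simplified illustration, not the pattern selection method of the paper: abstractions are given as hypothetical `(num_states, transitions, goals)` tuples, and saturated costs are clamped at zero, which is an assumption made here to keep costs non-negative.

```python
import heapq
import math

def goal_distances(num_states, transitions, goals, cost):
    """Backward Dijkstra: cheapest cost from each abstract state to a goal state."""
    dist = [math.inf] * num_states
    incoming = {s: [] for s in range(num_states)}
    for src, op, dst in transitions:
        incoming[dst].append((src, op))
    queue = []
    for g in goals:
        dist[g] = 0
        heapq.heappush(queue, (0, g))
    while queue:
        d, s = heapq.heappop(queue)
        if d > dist[s]:
            continue
        for src, op in incoming[s]:
            if d + cost[op] < dist[src]:
                dist[src] = d + cost[op]
                heapq.heappush(queue, (dist[src], src))
    return dist

def saturated_cost_partitioning(abstractions, cost):
    """Give each abstraction, in order, only the costs it needs; pass on the rest."""
    remaining = dict(cost)
    heuristics = []
    for num_states, transitions, goals in abstractions:
        h = goal_distances(num_states, transitions, goals, remaining)
        saturated = {op: 0 for op in remaining}
        for src, op, dst in transitions:
            if math.isfinite(h[src]) and math.isfinite(h[dst]):
                saturated[op] = max(saturated[op], h[src] - h[dst])
        for op in remaining:
            remaining[op] -= saturated[op]
        heuristics.append(h)
    return heuristics

# Two toy abstractions: each tracks one operator and sees the other as a self-loop.
abs_a = (2, [(0, "a", 1), (0, "b", 0), (1, "b", 1)], [1])
abs_b = (2, [(0, "b", 1), (0, "a", 0), (1, "a", 1)], [1])
h_a, h_b = saturated_cost_partitioning([abs_a, abs_b], {"a": 2, "b": 3})
```

In this toy case the summed estimate for the initial abstract states is `h_a[0] + h_b[0]`, which is larger than the maximum of the two heuristics computed under the full costs, while each operator's cost is still spent only once.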
#1640

Bayesian Inference of Linear Temporal Logic Specifications for Contrastive Explanations
Joseph Kim, Christian Muise, Ankit Shah, Shubham Agarwal, Julie Shah
Details | PDF

Search in Planning and Scheduling

Temporal logics are useful for providing concise descriptions of system behavior, and have been successfully used as a language for goal definitions in task planning. Prior works on inferring temporal logic specifications have focused on "summarizing" the input dataset - i.e., finding specifications that are satisfied by all plan traces belonging to the given set. In this paper, we examine the problem of inferring specifications that describe temporal differences between two sets of plan traces. We formalize the concept of providing such contrastive explanations, then present BayesLTL - a Bayesian probabilistic model for inferring contrastive explanations as linear temporal logic (LTL) specifications. We demonstrate the robustness and scalability of our model for inferring accurate specifications from noisy data and across various benchmark planning domains.
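The notion of a contrastive specification can be illustrated with a brute-force sketch. It does not perform the Bayesian inference of BayesLTL: it only enumerates two hypothetical templates ("eventually p" and "globally p") over finite traces and keeps the formulas satisfied by every trace in one set and by no trace in the other.

```python
def holds(formula, trace):
    """Evaluate a template formula on a finite trace (a list of sets of propositions)."""
    kind, prop = formula
    if kind == "eventually":
        return any(prop in step for step in trace)
    if kind == "globally":
        return all(prop in step for step in trace)
    raise ValueError(f"unknown template: {kind}")

def contrastive_formulas(positive, negative, props):
    """Formulas true on all positive traces and false on all negative traces."""
    candidates = [(kind, p) for kind in ("eventually", "globally") for p in props]
    return [
        f for f in candidates
        if all(holds(f, t) for t in positive) and not any(holds(f, t) for t in negative)
    ]

positive = [[{"a"}, {"a", "b"}], [{"a"}, {"a"}, {"a", "b"}]]
negative = [[{"a"}, {"c"}], [{"c"}, {"a"}]]
explanations = contrastive_formulas(positive, negative, ["a", "b", "c"])
```

A probabilistic model would instead score candidate formulas, which is what allows robustness to noisy traces; exact enumeration as above fails as soon as one trace is mislabeled.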
#10956

(Sister Conferences Best Papers Track) On Guiding Search in HTN Planning with Classical Planning Heuristics
Daniel Höller, Pascal Bercher, Gregor Behnke, Susanne Biundo
Details | PDF

Search in Planning and Scheduling

Planning is the task of finding a sequence of actions that achieves the goal(s) of an agent. It is solved based on a model describing the environment and how to change it. There are several approaches to solving planning tasks; two of the most popular are classical planning and hierarchical planning. Solvers are often based on heuristic search, but especially regarding domain-independent heuristics, techniques in classical planning are more sophisticated. However, due to the different problem classes, it is difficult to use them in hierarchical planning. In this paper we describe how to use arbitrary classical heuristics in hierarchical planning and show that the resulting system outperforms the state of the art in hierarchical planning.
#5204

Earliest-Completion Scheduling of Contract Algorithms with End Guarantees
Spyros Angelopoulos, Shendan Jin
Details | PDF

Search in Planning and Scheduling

We consider the setting in which executions of contract algorithms are scheduled in a processor so as to produce an interruptible system. Such algorithms offer a trade off between the quality of output and the available computation time, provided that the latter is known in advance. Previous work on this setting has provided strict performance guarantees for several variants of this setting, assuming that an interruption can occur arbitrarily ahead in the future. In practice, however, one expects that the schedule will reach a point beyond which further progress will only be marginal, hence it can be deemed complete. In this work we show how to optimize the time at which the system reaches a desired performance objective, while maintaining interruptible guarantees throughout the entire execution. The resulting schedule is provably optimal, and it guarantees that upon completion each individual contract algorithm has attained a predefined end guarantee.
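The standard interruptible setting can be sketched numerically. The code below does not implement the earliest-completion schedule of the paper; it only evaluates a given sequence of contract lengths under the usual assumption that, on interruption at time t, the system returns the result of the longest contract finished by t. The function name is hypothetical.

```python
def acceleration_ratio(lengths):
    """Worst-case t / (longest finished contract), taken just before each completion."""
    worst = 0.0
    elapsed = 0.0
    longest_done = None
    for length in lengths:
        finish = elapsed + length
        if longest_done is not None:
            # An interruption just before `finish` can only rely on `longest_done`.
            worst = max(worst, finish / longest_done)
        elapsed = finish
        longest_done = length if longest_done is None else max(longest_done, length)
    return worst

doubling = acceleration_ratio([2.0 ** i for i in range(20)])
tripling = acceleration_ratio([3.0 ** i for i in range(20)])
```

For the doubling schedule the ratio approaches 4 from below as the schedule grows, while tripling the lengths approaches 4.5, which illustrates why the growth rate of contract lengths matters.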
#3582

A Novel Distribution-Embedded Neural Network for Sensor-Based Activity Recognition
Hangwei Qian, Sinno Jialin Pan, Bingshui Da, Chunyan Miao
Details | PDF

Search in Planning and Scheduling

Feature-engineering-based machine learning models and deep learning models have been explored for wearable-sensor-based human activity recognition. For both types of methods, one crucial research issue is how to extract proper features from the partitioned segments of multivariate sensor readings. Existing methods have different drawbacks: 1) feature-engineering-based methods are able to extract meaningful features, such as statistical or structural information underlying the segments, but usually require manual designs of features for different applications, which is time consuming, and 2) deep learning models are able to learn temporal and/or spatial features from the sensor data automatically, but fail to capture statistical information. In this paper, we propose a novel deep learning model to automatically learn meaningful features including statistical features, temporal features and spatial correlation features for activity recognition in a unified framework. Extensive experiments are conducted on four datasets to demonstrate the effectiveness of our proposed method compared with state-of-the-art baselines.
#3244

Online Probabilistic Goal Recognition over Nominal Models
Ramon Fraga Pereira, Mor Vered, Felipe Meneguzzi, Miquel Ramírez
Details | PDF

Search in Planning and Scheduling

This paper revisits probabilistic, model-based goal recognition to study the implications of the use of nominal models to estimate the posterior probability distribution over a finite set of hypothetical goals. Existing model-based approaches rely on expert knowledge to produce symbolic descriptions of the dynamic constraints domain objects are subject to, and these are assumed to produce correct predictions. We abandon this assumption to consider the use of nominal models that are learnt from observations on transitions of systems with unknown dynamics. Leveraging existing work on the acquisition of domain models via learning for Hybrid Planning, we adapt and evaluate existing goal recognition approaches to analyze how prediction errors, inherent to system dynamics identification and model learning techniques, affect recognition error rates.

Wednesday 14 16:30 - 18:00 ML|DM - Data Mining 7 (2505-2506)

Chair: Xiangliang Zhang

#1779

DMRAN: A Hierarchical Fine-Grained Attention-Based Network for Recommendation
Huizhao Wang, Guanfeng Liu, An Liu, Zhixu Li, Kai Zheng
Details | PDF

Data Mining 7

The conventional methods for next-item recommendation are generally based on RNNs or one-dimensional attention with time encoding. They either struggle to preserve the long-term dependencies between different interactions or fail to capture fine-grained user preferences. In this paper, we propose a Double Most Relevant Attention Network (DMRAN) that contains two layers, i.e., Item Level Attention and Feature Level Self-attention, which pick out the most relevant items from the sequence of the user’s historical behaviors and extract the most relevant aspects of the relevant items, respectively. We can then capture the fine-grained user preferences to better support next-item recommendation. Extensive experiments on two real-world datasets illustrate that DMRAN can improve the efficiency and effectiveness of the recommendation compared with the state-of-the-art methods.
#2422

ProNE: Fast and Scalable Network Representation Learning
Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, Ming Ding
Details | PDF

Data Mining 7

Recent advances in network embedding have revolutionized the field of graph and network mining. However, (pre-)training embeddings for very large-scale networks is computationally challenging for most existing methods. In this work, we present ProNE, a fast, scalable, and effective model whose single-thread version is 10-400x faster than efficient network embedding benchmarks with 20 threads, including LINE, DeepWalk, node2vec, GraRep, and HOPE. As a concrete example, the single-thread version of ProNE requires only 29 hours to embed a network of hundreds of millions of nodes, while LINE takes weeks and DeepWalk takes months using 20 threads. To achieve this, ProNE first initializes network embeddings efficiently by formulating the task as sparse matrix factorization. The second step of ProNE is to enhance the embeddings by propagating them in the spectrally modulated space. Extensive experiments on networks of various scales and types demonstrate that ProNE achieves both effectiveness and significant efficiency superiority when compared to the aforementioned baselines. In addition, ProNE's embedding enhancement step can also be generalized to improve other models at speed, e.g., offering >10% relative gains for the used baselines.
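The two-step structure (factorize, then propagate) can be sketched on a tiny dense graph. This is a heavily simplified stand-in, not ProNE: it assumes a plain truncated SVD of the row-normalized adjacency matrix in place of the sparse factorization, and a simple averaging filter in place of the spectral modulation. All function names are hypothetical.

```python
import numpy as np

def row_normalize(adj):
    """Divide each row by its degree (isolated nodes are left as zero rows)."""
    deg = adj.sum(axis=1)
    return adj / np.maximum(deg, 1.0)[:, None]

def initial_embedding(adj, dim):
    """Step 1 (simplified): truncated SVD of the normalized adjacency matrix."""
    u, s, _ = np.linalg.svd(row_normalize(adj))
    return u[:, :dim] * np.sqrt(s[:dim])

def propagate(adj, emb, steps=2):
    """Step 2 (simplified): smooth embeddings by averaging with neighbours."""
    norm_adj = row_normalize(adj)
    for _ in range(steps):
        emb = 0.5 * (emb + norm_adj @ emb)
    return emb

# Path graph on four nodes: 0 - 1 - 2 - 3.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
emb = initial_embedding(adj, dim=2)
enhanced = propagate(adj, emb)
```

A scalable implementation would keep the adjacency matrix sparse and avoid a full SVD; the dense version above is only meant to show how the two steps compose.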
#2793

Hi-Fi Ark: Deep User Representation via High-Fidelity Archive Network
Zheng Liu, Yu Xing, Fangzhao Wu, Mingxiao An, Xing Xie
Details | PDF

Data Mining 7

Deep learning techniques have been widely applied to modern recommendation systems, bringing in flexible and effective ways of user representation. Conventionally, user representations are generated purely in the offline stage. Without referencing to the specific candidate item for recommendation, it is difficult to fully capture user preference from the perspective of interest. More recent algorithms tend to generate user representation at runtime, where user's historical behaviors are attentively summarized w.r.t. the presented candidate item. In spite of the improved efficacy, it is too expensive for many real-world scenarios because of the repetitive access to user's entire history. In this work, a novel user representation framework, Hi-Fi Ark, is proposed. With Hi-Fi Ark, user history is summarized into highly compact and complementary vectors in the offline stage, known as archives. Meanwhile, user preference towards a specific candidate item can be precisely captured via the attentive aggregation of such archives. As a result, both deployment feasibility and superior recommendation efficacy are achieved by Hi-Fi Ark. The effectiveness of Hi-Fi Ark is empirically validated on three real-world datasets, where remarkable and consistent improvements are made over a variety of well-recognized baseline methods.
#4239

Deep Active Learning for Anchor User Prediction
Anfeng Cheng, Chuan Zhou, Hong Yang, Jia Wu, Lei Li, Jianlong Tan, Li Guo
Details | PDF

Data Mining 7

Predicting pairs of anchor users plays an important role in cross-network analysis. Due to the expensive costs of labeling anchor users for training prediction models, we consider in this paper the problem of minimizing the number of user pairs across multiple networks for labeling so as to improve the accuracy of the prediction. To this end, we present a deep active learning model for anchor user prediction (DALAUP for short). However, active learning for anchor user sampling meets the challenges of non-i.i.d. user pair data caused by network structures and the correlation among anchor or non-anchor user pairs. To solve the challenges, DALAUP uses a pair of neural networks with shared parameters to obtain the vector representations of user pairs, and ensembles three query strategies to select the most informative user pairs for labeling and model training. Experiments on real-world social network data demonstrate that DALAUP outperforms the state-of-the-art approaches.
#3364

On the Estimation of Treatment Effect with Text Covariates
Liuyi Yao, Sheng Li, Yaliang Li, Hongfei Xue, Jing Gao, Aidong Zhang
Details | PDF

Data Mining 7

Estimating the treatment effect benefits decision making in various domains as it can provide the potential outcomes of different choices. Existing work mainly focuses on covariates with numerical values, while how to handle covariates with textual information for treatment effect estimation is still an open question. One major challenge is how to filter out the nearly instrumental variables which are the variables more predictive to the treatment than the outcome. Conditioning on those variables to estimate the treatment effect would amplify the estimation bias. To address this challenge, we propose a conditional treatment-adversarial learning based matching method (CTAM). CTAM incorporates the treatment-adversarial learning to filter out the information related to nearly instrumental variables when learning the representations, and then it performs matching among the learned representations to estimate the treatment effects. The conditional treatment-adversarial learning helps reduce the bias of treatment effect estimation, which is demonstrated by our experimental results on both semi-synthetic and real-world datasets.
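The matching step can be sketched as follows, assuming the learned representations are already available as vectors. The treatment-adversarial representation learning of CTAM is not shown; this is plain one-nearest-neighbour matching, and the function name and toy data are hypothetical.

```python
import numpy as np

def matching_ate(reps, treated, outcomes):
    """Average treatment effect via 1-nearest-neighbour matching across groups."""
    reps = np.asarray(reps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    outcomes = np.asarray(outcomes, dtype=float)
    effects = []
    for i in range(len(reps)):
        pool = np.flatnonzero(treated != treated[i])      # units in the other group
        dists = np.linalg.norm(reps[pool] - reps[i], axis=1)
        match = pool[np.argmin(dists)]
        diff = outcomes[i] - outcomes[match]
        effects.append(diff if treated[i] else -diff)     # always treated minus control
    return float(np.mean(effects))

# Toy data: two treated/control pairs that are close in representation space.
ate = matching_ate(
    reps=[[0.0], [0.1], [1.0], [1.1]],
    treated=[1, 0, 1, 0],
    outcomes=[3.0, 1.0, 5.0, 3.0],
)
```

The quality of the estimate depends entirely on the representation: if it encodes variables that predict treatment but not outcome, the matches become less comparable, which is the bias the abstract aims to reduce.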
#3561

Noise-Resilient Similarity Preserving Network Embedding for Social Networks
Zhenyu Qiu, Wenbin Hu, Jia Wu, ZhongZheng Tang, Xiaohua Jia
Details | PDF

Data Mining 7

Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the structure and inherent properties of the network. Most existing network embedding methods do not consider network noise. However, it is almost impossible to observe the actual structure of a real-world network without noise, and this noise can dramatically affect the performance of network embedding. In this paper, we aim to exploit node similarity to address the problem of social network embedding with noise and propose a node similarity preserving (NSP) embedding method. NSP exploits a comprehensive similarity index to quantify the authenticity of the observed network structure. Then we propose an algorithm to construct a correction matrix to reduce the influence of noise. Finally, an objective function for accurate network embedding is proposed and an efficient algorithm to solve the optimization problem is provided. Extensive experimental results on a variety of applications of real-world networks with noise show the superior performance of the proposed method over the state-of-the-art methods.

Wednesday 14 16:30 - 18:00 ML|TAML - Transfer, Adaptation, Multi-task Learning 3 (2401-2402)

Chair: Shalini Ghosh

#4195

Differentially Private Optimal Transport: Application to Domain Adaptation
Nam LeTien, Amaury Habrard, Marc Sebban
Details | PDF

Transfer, Adaptation, Multi-task Learning 3

Optimal transport has received much attention during the past few years to deal with domain adaptation tasks. The goal is to transfer knowledge from a source domain to a target domain by finding a transportation of minimal cost moving the source distribution to the target one. In this paper, we address the challenging task of privacy preserving domain adaptation by optimal transport. Using the Johnson-Lindenstrauss transform together with some noise, we present the first differentially private optimal transport model and show how it can be directly applied to both unsupervised and semi-supervised domain adaptation scenarios. Our theoretically grounded method allows the optimization of the transportation plan and the Wasserstein distance between the two distributions while protecting the data of both domains. We perform an extensive series of experiments on various benchmarks (VisDA, Office-Home and Office-Caltech datasets) that demonstrate the efficiency of our method compared to non-private strategies.
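The combination of a Johnson-Lindenstrauss projection with added noise can be sketched as below. This is an illustration only: the noise scale is an arbitrary assumption that is not calibrated to any privacy budget, so the code provides no differential privacy guarantee, and the function names are hypothetical.

```python
import numpy as np

def make_projection(input_dim, target_dim, seed=0):
    """Random Gaussian Johnson-Lindenstrauss matrix shared by both domains."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((input_dim, target_dim)) / np.sqrt(target_dim)

def release(data, projection, noise_std, rng):
    """Project the data and add Gaussian noise before it is shared."""
    projected = data @ projection
    return projected + rng.normal(0.0, noise_std, size=projected.shape)

def squared_cost_matrix(source, target):
    """Pairwise squared Euclidean costs, the usual input to an optimal transport solver."""
    diff = source[:, None, :] - target[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(1)
source = rng.standard_normal((5, 50))    # toy source-domain samples
target = rng.standard_normal((7, 50))    # toy target-domain samples
projection = make_projection(input_dim=50, target_dim=10)
cost = squared_cost_matrix(
    release(source, projection, noise_std=0.1, rng=rng),
    release(target, projection, noise_std=0.1, rng=rng),
)
```

The resulting `cost` matrix would then be passed to an optimal transport solver in place of the cost computed on raw features.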
#5546

Learning to Learn Gradient Aggregation by Gradient Descent
Jinlong Ji, Xuhui Chen, Qianlong Wang, Lixing Yu, Pan Li
Details | PDF

Transfer, Adaptation, Multi-task Learning 3

In the big data era, distributed machine learning emerges as an important learning paradigm to mine large volumes of data by taking advantage of distributed computing resources. In this work, motivated by learning to learn, we propose a meta-learning approach to coordinate the learning process in the master-slave type of distributed systems. Specifically, we utilize a recurrent neural network (RNN) in the parameter server (the master) to learn to aggregate the gradients from the workers (the slaves). We design a coordinatewise preprocessing and postprocessing method to make the neural network based aggregator more robust. Besides, to address fault tolerance, especially under Byzantine attacks, in distributed machine learning systems, we propose an RNN aggregator with additional loss information (ARNN) to improve the system resilience. We conduct extensive experiments to demonstrate the effectiveness of the RNN aggregator, and also show that it can be easily generalized and achieve remarkable performance when transferred to other distributed systems. Moreover, under majoritarian Byzantine attacks, the ARNN aggregator outperforms Krum, the state-of-the-art fault-tolerant aggregation method, by 43.14%. In addition, our RNN aggregator enables the server to aggregate gradients from variant local models, which significantly improves the scalability of distributed learning.
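For context, the Krum baseline named above can be sketched in a few lines. This is the baseline aggregation rule, not the RNN aggregator proposed in the paper; the function and parameter names are hypothetical, and the toy gradients are made up for illustration.

```python
import numpy as np

def krum(gradients, num_byzantine):
    """Select the gradient whose nearest neighbours are closest in squared distance."""
    grads = np.asarray(gradients, dtype=float)
    n = len(grads)
    num_neighbors = n - num_byzantine - 2
    if num_neighbors < 1:
        raise ValueError("Krum needs more than num_byzantine + 2 workers")
    sq_dists = ((grads[:, None, :] - grads[None, :, :]) ** 2).sum(axis=-1)
    scores = []
    for i in range(n):
        others = np.delete(sq_dists[i], i)                 # drop distance to itself
        scores.append(np.sort(others)[:num_neighbors].sum())
    return grads[int(np.argmin(scores))]

honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2]]
byzantine = [[100.0, -100.0]]
chosen = krum(honest + byzantine, num_byzantine=1)
averaged = np.mean(honest + byzantine, axis=0)
```

Plain averaging is pulled far away by the single corrupted gradient, whereas the selected gradient stays within the honest cluster.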
#6042

Efficient Protocol for Collaborative Dictionary Learning in Decentralized Networks
Tsuyoshi Idé, Rudy Raymond, Dzung T. Phan
Details | PDF

Transfer, Adaptation, Multi-task Learning 3

This paper is concerned with the task of collaborative density estimation in the distributed multi-task setting. Major application scenarios include collaborative anomaly detection among distributed industrial assets owned by different companies competing with each other. Of critical importance here is to achieve two conflicting goals at once: data privacy and collaboration. To this end, we propose a new framework for collaborative dictionary learning. By using a mixture of the exponential family, we show that collaborative learning can be nicely separated into three steps: local updates, global consensus, and optimization. For the critical step of consensus building, we propose a new algorithm that does not rely on expensive encryption-based multi-party computation. Our theoretical and experimental analysis shows that our method is several orders of magnitude faster than the alternative.
#3575

Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning
Sihui Luo, Xinchao Wang, Gongfan Fang, Yao Hu, Dapeng Tao, Mingli Song
Details | PDF

Transfer, Adaptation, Multi-task Learning 3

An increasing number of well-trained deep networks have been released online by researchers and developers, enabling the community to reuse them in a plug-and-play way without accessing the training annotations. However, due to the large number of network variants, such publicly available trained models often have different architectures, each tailored for a specific task or dataset. In this paper, we study a deep-model reusing task, where we are given as input pre-trained networks of heterogeneous architectures specializing in distinct tasks, as teacher models. We aim to learn a multitalented and light-weight student model that is able to grasp the integrated knowledge from all such heterogeneous-structure teachers, again without accessing any human annotation. To this end, we propose a common feature learning scheme, in which the features of all teachers are transformed into a common space and the student is enforced to imitate them all so as to amalgamate the intact knowledge. We test the proposed approach on a list of benchmarks and demonstrate that the learned student is able to achieve very promising performance, superior to that of the teachers in their specialized tasks.
#6456

Landmark Selection for Zero-shot Learning
Yuchen Guo, Guiguang Ding, Jungong Han, Chenggang Yan, Jiyong Zhang, Qionghai Dai
Details | PDF

Transfer, Adaptation, Multi-task Learning 3

Zero-shot learning (ZSL) is an emerging research topic whose goal is to build recognition models for previously unseen classes. The basic idea of ZSL is based on heterogeneous feature matching, which learns a compatibility function between image and class features using seen classes. The function is constructed based on one-vs-all training in which each class has only one class feature and many image features. Existing ZSL works mostly treat all image features equivalently. However, in this paper we argue that it is more reasonable to use some representative cross-domain data instead of all. Motivated by this idea, we propose a novel approach, termed Landmark Selection (LAST), for ZSL. LAST is able to identify representative cross-domain features, which in turn lead to a better image-class compatibility function. Experiments on several ZSL datasets including ImageNet demonstrate the superiority of LAST to the state of the art.
#501

Bayesian Uncertainty Matching for Unsupervised Domain Adaptation
Jun Wen, Nenggan Zheng, Junsong Yuan, Zhefeng Gong, Changyou Chen
Details | PDF

Transfer, Adaptation, Multi-task Learning 3

Domain adaptation is an important technique to alleviate performance degradation caused by domain shift, e.g., when training and test data come from different domains. Most existing deep adaptation methods focus on reducing domain shift by matching marginal feature distributions through deep transformations on the input features, due to the unavailability of target domain labels. We show that domain shift may still exist via label distribution shift at the classifier, thus deteriorating model performance. To alleviate this issue, we propose an approximate joint distribution matching scheme by exploiting prediction uncertainty. Specifically, we use a Bayesian neural network to quantify prediction uncertainty of a classifier. By imposing distribution matching on both features and labels (via uncertainty), label distribution mismatching in source and target data is effectively alleviated, encouraging the classifier to produce consistent predictions across domains. We also propose a few techniques to improve our method by adaptively reweighting domain adaptation loss to achieve nontrivial distribution matching and stable training. Comparisons with state-of-the-art unsupervised domain adaptation methods on three popular benchmark datasets demonstrate the superiority of our approach, especially in alleviating negative transfer.

Wednesday 14 16:30 - 18:00 UAI|SDM - Sequential Decision Making (2403-2404)

Chair: Tran-Thanh Long

#280

Thompson Sampling on Symmetric Alpha-Stable Bandits
Abhimanyu Dubey, Alex 'Sandy' Pentland
Details | PDF

Sequential Decision Making

Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards drawn from symmetric alpha-stable distributions, which are a class of heavy-tailed probability distributions utilized in finance and economics, in problems such as modeling stock prices and human behavior. We present an efficient framework for posterior inference, which leads to two algorithms for Thompson Sampling in this setting. We prove finite-time regret bounds for both algorithms, and demonstrate through a series of experiments the stronger performance of Thompson Sampling in this setting. With our results, we provide an exposition of symmetric alpha-stable distributions in sequential decision-making, and enable sequential Bayesian inference in applications from diverse fields in finance and complex systems that operate on heavy-tailed features.
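As background for the abstract above, the following is a minimal sketch of classical Thompson Sampling in the much simpler Beta-Bernoulli setting. It is not the paper's alpha-stable algorithm; the function name `thompson_sampling`, the arm means, and the uniform Beta(1, 1) prior are assumptions chosen only for illustration.

```python
import random


def thompson_sampling(true_means, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling (illustrative; NOT the alpha-stable variant)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta(1, 1) prior: alpha parameter per arm
    failures = [1] * n_arms   # Beta(1, 1) prior: beta parameter per arm
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one sample from each arm's posterior and play the arm with the largest sample.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate a Bernoulli reward from the (hypothetical) true mean of that arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.8], rounds=2000)
print(pulls)  # the arm with mean 0.8 should receive most of the pulls
```

With heavy-tailed rewards the conjugate update used here is no longer available, which is the gap the posterior inference framework in the abstract addresses.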
#3937

ISLF: Interest Shift and Latent Factors Combination Model for Session-based Recommendation
Jing Song, Hong Shen, Zijing Ou, Junyi Zhang, Teng Xiao, Shangsong Liang
Details | PDF

Sequential Decision Making

Session-based recommendation is a challenging problem due to the inherent uncertainty of user behavior and the limited historical click information. Latent factors and the complex dependencies within the user’s current session have an important impact on the user's main intention, but the existing methods do not explicitly consider this point. In this paper, we propose a novel model, Interest Shift and Latent Factors Combination Model (ISLF), which can capture the user's main intention by taking into account the user’s interest shift (i.e. long-term and short-term interest) and latent factors simultaneously. In addition, we experimentally give an explicit explanation of this combination in our ISLF. Our experimental results on three benchmark datasets show that our model achieves state-of-the-art performance on all test datasets.
#6516

Automated Negotiation with Gaussian Process-based Utility Models
Haralambie Leahu, Michael Kaisers, Tim Baarslag
Details | PDF

Sequential Decision Making

Designing agents that can efficiently learn and integrate user's preferences into decision making processes is a key challenge in automated negotiation. While accurate knowledge of user preferences is highly desirable, eliciting the necessary information might be rather costly, since frequent user interactions may cause inconvenience. Therefore, efficient elicitation strategies (minimizing elicitation costs) for inferring relevant information are critical. We introduce a stochastic, inverse-ranking utility model compatible with the Gaussian Process preference learning framework and integrate it into a (belief) Markov Decision Process paradigm which formalizes automated negotiation processes with incomplete information. Our utility model, which naturally maps ordinal preferences (inferred from the user) into (random) utility values (with the randomness reflecting the underlying uncertainty), provides the basic quantitative modeling ingredient for automated (agent-based) negotiation.
#759

AdaLinUCB: Opportunistic Learning for Contextual Bandits
Xueying Guo, Xiaoxiao Wang, Xin Liu
Details | PDF

Sequential Decision Making

In this paper, we propose and study opportunistic contextual bandits - a special case of contextual bandits where the exploration cost varies under different environmental conditions, such as network load or return variation in recommendations. When the exploration cost is low, so is the actual regret of pulling a sub-optimal arm (e.g., trying a suboptimal recommendation). Therefore, intuitively, we could explore more when the exploration cost is relatively low and exploit more when the exploration cost is relatively high. Inspired by this intuition, for opportunistic contextual bandits with Linear payoffs, we propose an Adaptive Upper-Confidence-Bound algorithm (AdaLinUCB) to adaptively balance the exploration-exploitation trade-off for opportunistic learning. We prove that AdaLinUCB achieves an O((log T)^2) problem-dependent regret upper bound, which has a smaller coefficient than that of the traditional LinUCB algorithm. Moreover, based on both synthetic and real-world datasets, we show that AdaLinUCB significantly outperforms other contextual bandit algorithms under large exploration cost fluctuations.
#1724

Exact Bernoulli Scan Statistics using Binary Decision Diagrams
Masakazu Ishihata, Takanori Maehara
Details | PDF

Sequential Decision Making

In combinatorial statistics, we are interested in a statistical test of combinatorial correlation, i.e., the existence of a subset from an underlying combinatorial structure such that the observation is large on the subset. The combinatorial scan statistic has been proposed for such a statistical test; however, it is not commonly used in practice because of its high computational cost. In this study, we restrict our attention to the case where the number of data points is moderately small (e.g., 50), the outcome is binary, and the underlying combinatorial structure is represented by a zero-suppressed binary decision diagram (ZDD), and consider the problem of computing the p-value of the combinatorial scan statistic exactly. First, we prove that this problem is #P-hard. Then, we propose a practical algorithm that solves the problem. Here, the algorithm constructs a binary decision diagram (BDD) for a set of realizations of the random variables by dynamic programming on the ZDD, and computes the p-value by dynamic programming on the BDD. We conducted experiments to evaluate the performance of the proposed algorithm using real-world datasets.
#4003

Cascaded Algorithm-Selection and Hyper-Parameter Optimization with Extreme-Region Upper Confidence Bound Bandit
Yi-Qi Hu, Yang Yu, Jun-Da Liao
Details | PDF

Sequential Decision Making

An automatic machine learning (AutoML) task is to select the best algorithm and its hyper-parameters simultaneously. Previously, the hyper-parameters of all algorithms were joined into a single search space, which is not only huge but also redundant, because many dimensions of hyper-parameters are irrelevant to the selected algorithms. In this paper, we propose a cascaded approach for algorithm selection and hyper-parameter optimization. While a search procedure is employed at the level of hyper-parameter optimization, a bandit strategy runs at the level of algorithm selection to allocate the budget based on the search feedback. Since the bandit is required to select the algorithm with the maximum performance, instead of the average performance, we propose the extreme-region upper confidence bound (ER-UCB) strategy, which focuses on the extreme region of the underlying feedback distribution. We show theoretically that the ER-UCB has a regret upper bound O(K ln n) with independent feedback, which is as efficient as the classical UCB bandit. We also conduct experiments on a synthetic problem as well as a set of AutoML tasks. The results verify the effectiveness of the proposed method.

Wednesday 14 16:30 - 18:00 CSAT|CS 1 - Constraint Satisfaction 1 (2405-2406)

Chair: Kuldeep Meel

#2450

Phase Transition Behavior of Cardinality and XOR Constraints
Yash Pote, Saurabh Joshi, Kuldeep S. Meel
Details | PDF

Constraint Satisfaction 1

The runtime performance of modern SAT solvers is deeply connected to the phase transition behavior of CNF formulas. While CNF solving has witnessed significant runtime improvement over the past two decades, the same does not hold for several other classes such as the conjunction of cardinality and XOR constraints, denoted as CARD-XOR formulas. The problem of determining satisfiability of CARD-XOR formulas is a fundamental problem with a wide variety of applications ranging from discrete integration in the field of artificial intelligence to maximum likelihood decoding in coding theory. The runtime behavior of random CARD-XOR formulas is unexplored in prior work. In this paper, we present the first rigorous empirical study to characterize the runtime behavior of 1-CARD-XOR formulas. We show empirical evidence of a surprising phase-transition that follows a non-linear tradeoff between CARD and XOR constraints.
#3226

DoubleLex Revisited and Beyond
Xuming Huang, Jimmy Lee
Details | PDF

Constraint Satisfaction 1

The paper proposes Maximum Residue (MR) as a notion to evaluate the strength of a symmetry breaking method. We give a proof to improve the best known DoubleLex MR upper bound from m!n! - (m!+n!) to min(m!,n!) for an m x n matrix model. Our result implies that DoubleLex works well on matrix models where min(m, n) is relatively small. We further study the MR bounds of SwapNext and SwapAny, which are extensions to DoubleLex breaking further a small number of composition symmetries. Such theoretical comparisons suggest general principles on selecting Lex-based symmetry breaking methods based on the dimensions of the matrix models. Our experiments confirm the theoretical predictions as well as efficiency of these methods.
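To give a sense of scale for the two Maximum Residue upper bounds quoted in the abstract above, the short sketch below evaluates both expressions for a few small matrix dimensions. The function names `old_mr_bound` and `new_mr_bound` are hypothetical labels; only the two formulas come from the abstract.

```python
from math import factorial


def old_mr_bound(m, n):
    """Previously best known DoubleLex MR upper bound: m!n! - (m! + n!)."""
    return factorial(m) * factorial(n) - (factorial(m) + factorial(n))


def new_mr_bound(m, n):
    """Improved DoubleLex MR upper bound from the abstract: min(m!, n!)."""
    return min(factorial(m), factorial(n))


for m, n in [(2, 5), (3, 4), (4, 4)]:
    print(f"{m} x {n}: old bound = {old_mr_bound(m, n)}, new bound = {new_mr_bound(m, n)}")
```

For a 3 x 4 matrix model the bound drops from 114 to 6, which illustrates why the result is most informative when min(m, n) is small.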
#4365

Constraint-Based Scheduling with Complex Setup Operations: An Iterative Two-Layer Approach
Adriana Pacheco, Cédric Pralet, Stéphanie Roussel
Details | PDF

Constraint Satisfaction 1

In this paper, we consider scheduling problems involving resources that must perform complex setup operations between the tasks they realize. To deal with such problems, we introduce a simple yet efficient iterative two-layer decision process that alternates between the fast synthesis of high-level schedules based on a coarse-grain model of setup operations, and the production of detailed schedules based on a fine-grain model. Experiments realized on representative benchmarks of a multi-robot application show the efficiency of the approach.
#5034

How to Tame Your Anticipatory Algorithm
Allegra De Filippo, Michele Lombardi, Michela Milano
Details | PDF

Constraint Satisfaction 1

Sampling-based anticipatory algorithms can be very effective at solving online optimization problems under uncertainty, but their computational cost may be prohibitive in some cases. Given an arbitrary anticipatory algorithm, we present three methods that retain its solution quality at a fraction of the online computational cost, via a substantial degree of offline preparation. Our approaches are obtained by combining: 1) a simple technique to identify likely future outcomes based on past observations; 2) the (expensive) offline computation of a "contingency table"; and 3) an efficient solution-fixing heuristic. We ground our techniques on two case studies: an energy management system with uncertain renewable generation and load demand, and a traveling salesman problem with uncertain travel times. In both cases, our techniques achieve high solution quality, while substantially reducing the online computation time.
#4738

Constraint Programming for Mining Borders of Frequent Itemsets
Mohamed-Bachir Belaid, Christian Bessiere, Nadjib Lazaar
Details | PDF

Constraint Satisfaction 1

Frequent itemset mining is one of the most studied tasks in knowledge discovery. It is often reduced to mining the positive border of frequent itemsets, i.e. maximal frequent itemsets. Infrequent itemset mining, on the other hand, can be reduced to mining the negative border, i.e. minimal infrequent itemsets. We propose a generic framework based on constraint programming to mine both borders of frequent itemsets. One can easily decide which border to mine by setting a simple parameter. For this, we introduce two new global constraints, FREQUENTSUBS and INFREQUENTSUPERS, with complete polynomial propagators. We then consider the problem of mining borders with additional constraints. We prove that this problem is coNP-hard, ruling out the hope for the existence of a single CSP solving this problem (unless coNP ⊆ NP).
#11071

(Sister Conferences Best Papers Track) Clause Learning and New Bounds for Graph Coloring
Emmanuel Hebrard, George Katsirelos
Details | PDF

Constraint Satisfaction 1

Graph coloring is a major component of numerous allocation and scheduling problems. We introduce a hybrid CP/SAT approach to graph coloring based on exploring Zykov’s tree: for two non-neighbors, either they take a different color and there might as well be an edge between them, or they take the same color and we might as well merge them. Branching on whether two neighbors get the same color yields a symmetry-free tree with complete graphs as leaves, which correspond to colorings of the original graph. We introduce a new lower bound for this problem based on Mycielskian graphs; a method to produce a clausal explanation of this bound for use in a CDCL algorithm; and a branching heuristic emulating Brelaz on the Zykov tree. The combination of these techniques in a branch-and-bound search outperforms Dsatur and other SAT-based approaches on standard benchmarks both for finding upper bounds and for proving lower bounds.
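The Zykov branching rule described in the abstract above (for two non-adjacent vertices, either add an edge or merge them) can be sketched as a naive recursion for the chromatic number. This is only an illustration of the tree being explored: it has none of the paper's bounds, clause learning, or heuristics, and the function name `chromatic_number` is an assumption.

```python
from itertools import combinations


def chromatic_number(vertices, edges):
    """Naive Zykov recursion: chi(G) = min(chi(G + uv), chi(G / uv)) for non-adjacent u, v."""
    vertices = frozenset(vertices)
    edges = frozenset(frozenset(e) for e in edges)
    for u, v in combinations(sorted(vertices), 2):
        if frozenset((u, v)) not in edges:
            # Branch 1: u and v get different colors, so we may as well add the edge uv.
            with_edge = chromatic_number(vertices, edges | {frozenset((u, v))})
            # Branch 2: u and v get the same color, so we merge v into u.
            merged_edges = set()
            for e in edges:
                renamed = frozenset(u if x == v else x for x in e)
                if len(renamed) == 2:
                    merged_edges.add(renamed)
            merged = chromatic_number(vertices - {v}, merged_edges)
            return min(with_edge, merged)
    # No non-adjacent pair is left: the graph is complete, a leaf of the Zykov tree.
    return len(vertices)


# A 5-cycle is an odd cycle, so it needs 3 colors.
five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(chromatic_number(range(5), five_cycle))
```

The recursion is exponential and only practical for very small graphs, which is why bounding and learning techniques matter in a real solver.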

Wednesday 14 17:00 - 17:30 Industry Days (K)

Chair: Quan Lu (Alibaba Group)

Empowering Urban Refined Management: AI practice in Shanghai Lingang
Lu Xiaoyuan, Director, Virtual Lingang Development Center

Industry Days

Wednesday 14 19:00 - 22:00 IJCAI 2019 Conference Dinner (The Parisian Macao, Level 5, the Parisian Ballroom)

IJCAI 2019 Conference Dinner

Wednesday 14 19:00 - 22:00 IJCAI 2019 Student Reception (Banyan Tree Macau @Galaxy Macau)

IJCAI 2019 Student Reception

Thursday 15 08:30 - 09:20 Invited Talk (D-I)

Chair: Jeff Rosenschein

Formal Synthesis for Robots
Hadas Kress-Gazit

Invited Talk

Thursday 15 09:30 - 10:15 KRR|GSTR - Geometric, Spatial, and Temporal Reasoning 1 (2401-2402)

Chair: Egor Kostylev

#1122

Graph WaveNet for Deep Spatial-Temporal Graph Modeling
Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Chengqi Zhang
Details | PDF

Geometric, Spatial, and Temporal Reasoning 1

Spatial-temporal graph modeling is an important task to analyze the spatial relations and temporal trends of components in a system. Existing approaches mostly capture the spatial dependency on a fixed graph structure, assuming that the underlying relation between entities is pre-determined. However, the explicit graph structure (relation) does not necessarily reflect the true dependency, and genuine relations may be missing due to incomplete connections in the data. Furthermore, existing methods are ineffective at capturing temporal trends, as the RNNs or CNNs employed in these methods cannot capture long-range temporal sequences. To overcome these limitations, we propose in this paper a novel graph neural network architecture, Graph WaveNet, for spatial-temporal graph modeling. By developing a novel adaptive dependency matrix and learning it through node embedding, our model can precisely capture the hidden spatial dependency in the data. With a stacked dilated 1D convolution component whose receptive field grows exponentially as the number of layers increases, Graph WaveNet is able to handle very long sequences. These two components are integrated seamlessly in a unified framework and the whole framework is learned in an end-to-end manner. Experimental results on two public traffic network datasets, METR-LA and PEMS-BAY, demonstrate the superior performance of our algorithm.
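The claim above that the receptive field of stacked dilated 1D convolutions grows exponentially with depth can be checked with a small calculation. The sketch assumes stride 1, a kernel size of 2, and dilation factors that double at each layer; these settings and the function name `receptive_field` are illustrative assumptions, not details taken from the abstract.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated 1D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d time steps.
    """
    return 1 + (kernel_size - 1) * sum(dilations)


for num_layers in (1, 2, 4, 8):
    dilations = [2 ** i for i in range(num_layers)]  # assumed schedule: 1, 2, 4, ...
    print(num_layers, "layers ->", receptive_field(2, dilations), "time steps")
```

Under these assumptions the receptive field is 2^L for L layers, whereas the same stack without dilation would cover only L + 1 time steps.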
#2786

Cross-City Transfer Learning for Deep Spatio-Temporal Prediction
Leye Wang, Xu Geng, Xiaojuan Ma, Feng Liu, Qiang Yang
Details | PDF

Geometric, Spatial, and Temporal Reasoning 1

Spatio-temporal prediction is a key type of task in urban computing, e.g., traffic flow and air quality prediction. Adequate data is usually a prerequisite, especially when deep learning is adopted. However, the development levels of different cities are unbalanced, and many cities still suffer from data scarcity. To address the problem, we propose a novel cross-city transfer learning method for deep spatio-temporal prediction tasks, called RegionTrans. RegionTrans aims to effectively transfer knowledge from a data-rich source city to a data-scarce target city. More specifically, we first learn an inter-city region matching function to match each target city region to a similar source city region. A neural network is designed to effectively extract region-level representation for spatio-temporal prediction. Finally, an optimization algorithm is proposed to transfer learned features from the source city to the target city with the region matching function. Using citywide crowd flow prediction as a demonstration experiment, we verify the effectiveness of RegionTrans. Results show that RegionTrans can outperform the state-of-the-art fine-tuning deep spatio-temporal prediction models, reducing prediction error by up to 10.7%.
#5150

Data Complexity and Rewritability of Ontology-Mediated Queries in Metric Temporal Logic under the Event-Based Semantics
Vladislav Ryzhikov, Przemyslaw Andrzej Walega, Michael Zakharyaschev
Details | PDF

Geometric, Spatial, and Temporal Reasoning 1

We investigate the data complexity of answering queries mediated by metric temporal logic ontologies under the event-based semantics assuming that data instances are finite timed words timestamped with binary fractions. We identify classes of ontology-mediated queries answering which can be done in AC0, NC1, L, NL, P, and coNP for data complexity, provide their rewritings to first-order logic and its extensions with primitive recursion, transitive closure or datalog, and establish lower complexity bounds.

Thursday 15 09:30 - 10:30 Industry Days (D-I)

Chair: Tiger Qie (Didi)

Building Intelligent Cities with Big Data and AI
Yu Zheng, Vice President, JD.com

Industry Days

Thursday 15 09:30 - 10:30 AI-HWB - ST: AI for Improving Human Well-Being 5 (J)

Chair: Frank Dignum

#5070

The Price of Local Fairness in Multistage Selection
Vitalii Emelianov, George Arvanitakis, Nicolas Gast, Krishna Gummadi, Patrick Loiseau
Details | PDF

ST: AI for Improving Human Well-Being 5

The rise of algorithmic decision making has led to active research on how to define and guarantee fairness, mostly focusing on one-shot decision making. In several important applications such as hiring, however, decisions are made in multiple stages with additional information at each stage. In such cases, fairness issues remain poorly understood. In this paper we study fairness in k-stage selection problems where additional features are observed at every stage. We first introduce two fairness notions, local (per stage) and global (final stage) fairness, that extend the classical fairness notions to the k-stage setting. We propose a simple model based on a probabilistic formulation and show that the locally and globally fair selections that maximize precision can be computed via a linear program. We then define the price of local fairness to measure the loss of precision induced by local constraints, and investigate this quantity theoretically and empirically. In particular, our experiments show that the price of local fairness is generally smaller when the sensitive attribute is observed at the first stage; but globally fair selections are more locally fair when the sensitive attribute is observed at the second stage – hence in both cases it is often possible to have a selection that has a small price of local fairness and is close to locally fair.
#5318

mdfa: Multi-Differential Fairness Auditor for Black Box Classifiers
Xavier Gitiaux, Huzefa Rangwala
Details | PDF

ST: AI for Improving Human Well-Being 5

Machine learning algorithms are increasingly involved in sensitive decision-making processes with adversarial implications on individuals. This paper presents a new tool, mdfa, that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier widely used in the United States and demonstrate that for individuals with little criminal history, identified African-Americans are three times more likely to be considered at high risk of violent recidivism than similar non-African-Americans.
#2973

Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies
Muhammad Masood, Finale Doshi-Velez
Details | PDF

ST: AI for Improving Human Well-Being 5

Standard reinforcement learning methods aim to master one way of solving a task whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently explore the space of reasonable solutions. Unfortunately, existing approaches that quantify uncertainty over policies are not ultimately relevant to finding policies with qualitatively distinct behaviors. In this work, we formalize the difference between policies as a difference between the distribution of trajectories induced by each policy, which encourages diversity with respect to both state visitation and action choices. We derive a gradient-based optimization technique that can be combined with existing policy gradient methods to now identify diverse collections of well-performing policies. We demonstrate our approach on benchmarks and a healthcare task.
#3171

Group-Fairness in Influence Maximization
Alan Tsang, Bryan Wilder, Eric Rice, Milind Tambe, Yair Zick
Details | PDF

ST: AI for Improving Human Well-Being 5

Influence maximization is a widely used model for information dissemination in social networks. Recent work has employed such interventions across a wide range of social problems, spanning public health, substance abuse, and international development (to name a few examples). A critical but understudied question is whether the benefits of such interventions are fairly distributed across different groups in the population; e.g., avoiding discrimination with respect to sensitive attributes such as race or gender. Drawing on legal and game-theoretic concepts, we introduce formal definitions of fairness in influence maximization. We provide an algorithmic framework to find solutions which satisfy fairness constraints, and in the process improve the state of the art for general multi-objective submodular maximization problems. Experimental results on real data from an HIV prevention intervention for homeless youth show that standard influence maximization techniques oftentimes neglect smaller groups which contribute less to overall utility, resulting in a disparity which our proposed algorithms substantially reduce.

Thursday 15 09:30 - 10:30 ML|EM - Ensemble Methods 1 (L)

Chair: Xiangyuan Lan

#1559

AugBoost: Gradient Boosting Enhanced with Step-Wise Feature Augmentation
Philip Tannor, Lior Rokach
Details | PDF

Ensemble Methods 1

Gradient Boosted Decision Trees (GBDT) is a widely used machine learning algorithm, which obtains state-of-the-art results on many machine learning tasks. In this paper we introduce a method for obtaining better results, by augmenting the features in the dataset between the iterations of GBDT. We explore a number of augmentation methods: training an Artificial Neural Network (ANN) and extracting features from its last hidden layer (supervised), and rotating the feature-space using unsupervised methods such as PCA or Random Projection (RP). These variations on GBDT were tested on 20 classification tasks, on which all of them outperformed GBDT and previous related work.
#5434

Ensemble-based Ultrahigh-dimensional Variable Screening
Wei Tu, Dong Yang, Linglong Kong, Menglu Che, Qian Shi, Guodong Li, Guangjian Tian
Details | PDF

Ensemble Methods 1

Since the sure independence screening (SIS) method by Fan and Lv, many different variable screening methods have been proposed based on different measures under different models. However, most of these methods are designed for specific models. In practice, we often have very little information about the data generating process and different methods can result in very different sets of features. The heterogeneity presented here motivates us to combine various screening methods simultaneously. In this paper, we introduce a general ensemble-based framework to efficiently combine results from multiple variable screening methods. The consistency and sure screening properties of the proposed framework have been established. Extensive simulation studies confirm our intuition that the proposed ensemble-based method is more robust against model specification than using a single variable screening method. The proposed ensemble-based method is used to predict attention deficit hyperactivity disorder (ADHD) status using brain function connectivity (FC).
#5661

Deep Variational Koopman Models: Inferring Koopman Observations for Uncertainty-Aware Dynamics Modeling and Control
Jeremy Morton, Freddie D. Witherden, Mykel J. Kochenderfer
Details | PDF

Ensemble Methods 1

Koopman theory asserts that a nonlinear dynamical system can be mapped to a linear system, where the Koopman operator advances observations of the state forward in time. However, the observable functions that map states to observations are generally unknown. We introduce the Deep Variational Koopman (DVK) model, a method for inferring distributions over observations that can be propagated linearly in time. By sampling from the inferred distributions, we obtain a distribution over dynamical models, which in turn provides a distribution over possible outcomes as a modeled system advances in time. Experiments show that the DVK model is effective at long-term prediction for a variety of dynamical systems. Furthermore, we describe how to incorporate the learned models into a control framework, and demonstrate that accounting for the uncertainty present in the distribution over dynamical models enables more effective control.
#2704

Gradient Boosting with Piece-Wise Linear Regression Trees
Yu Shi, Jian Li, Zhize Li
Details | PDF

Ensemble Methods 1

Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in some very popular open-source toolkits including XGBoost, LightGBM and CatBoost. In this paper, we show that both the accuracy and efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise linear regression trees (PL Trees), instead of piecewise constant regression trees, as base learners. We show that PL Trees can accelerate convergence of GBDT and improve the accuracy. We also propose some optimization tricks to substantially reduce the training time of PL Trees, with little sacrifice of accuracy. Moreover, we propose several implementation techniques to speed up our algorithm on modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism. The experimental results show that GBDT with PL Trees can provide very competitive testing accuracy with comparable or less training time.

Thursday 15 09:30 - 10:30 UAI|API - Approximate Probabilistic Inference (2701-2702)

Chair: Joao Marques Silva

#1206

On Constrained Open-World Probabilistic Databases
Tal Friedman, Guy Van den Broeck
Details | PDF

Approximate Probabilistic Inference

Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently answering many interesting queries. Recent work on open-world probabilistic databases strengthens the semantics of these probabilistic databases by discarding the assumption that any information not present in the data must be false. While intuitive, these semantics are not sufficiently precise to give reasonable answers to queries. We propose overcoming these issues by using constraints to restrict this open world. We provide an algorithm for one class of queries, and establish a basic hardness result for another. Finally, we propose an efficient and tight approximation for a large class of queries.
#5342

Lifted Message Passing for Hybrid Probabilistic Inference
Yuqiao Chen, Nicholas Ruozzi, Sriraam Natarajan
Details | PDF

Approximate Probabilistic Inference

Lifted inference algorithms for first-order logic models, e.g., Markov logic networks (MLNs), have been of significant interest in recent years. Lifted inference methods exploit model symmetries in order to reduce the size of the model and, consequently, the computational cost of inference. In this work, we consider the problem of lifted inference in MLNs with continuous or both discrete and continuous groundings. Existing work on lifting with continuous groundings has mostly been limited to special classes of models, e.g., Gaussian models, for which variable elimination or message-passing updates can be computed exactly. Here, we develop approximate lifted inference schemes based on particle sampling. We demonstrate empirically that our approximate lifting schemes perform comparably to the existing state of the art for Gaussian MLNs, while having the flexibility to be applied to models with arbitrary potential functions.
#5427

Bayesian Parameter Estimation for Nonlinear Dynamics Using Sensitivity Analysis
Yi Chou, Sriram Sankaranarayanan
Details | PDF

Approximate Probabilistic Inference

We investigate approximate Bayesian inference techniques for nonlinear systems described by ordinary differential equation (ODE) models. In particular, the approximations will be based on set-valued reachability analysis approaches, yielding approximate models for the posterior distribution. Nonlinear ODEs are widely used to mathematically describe physical and biological models. However, these models are often described by parameters that are not directly measurable and have an impact on the system behaviors. Often, noisy measurement data combined with physical/biological intuition serve as the means for finding appropriate values of these parameters. Our approach operates under a Bayesian framework, given a prior distribution over the parameter space and noisy observations under a known sampling distribution. We explore subsets of the space of model parameters, computing bounds on the likelihood for each subset. This is performed using nonlinear set-valued reachability analysis that is made faster by means of linearization around a reference trajectory. The tiling of the parameter space can be adaptively refined to make bounds on the likelihood tighter. We evaluate our approach on a variety of nonlinear benchmarks and compare our results with Markov Chain Monte Carlo and Sequential Monte Carlo approaches.

Thursday 15 09:30 - 10:30 AMS|CSC - Computational Social Choice 2 (2703-2704)

Chair: Edith Elkind

#90

Approval-Based Elections and Distortion of Voting Rules
Grzegorz Pierczyński, Piotr Skowron
Details | PDF

Computational Social Choice 2

We consider elections where both voters and candidates can be associated with points in a metric space and voters prefer candidates that are closer to those that are farther away. It is often assumed that the optimal candidate is the one that minimizes the total distance to the voters. Yet, the voting rules often do not have access to the metric space M and only see preference rankings induced by M. Consequently, they often are incapable of selecting the optimal candidate. The distortion of a voting rule measures the worst-case loss of quality that results from having access only to preference rankings. We extend the idea of distortion to approval-based preferences. First, we compute the distortion of Approval Voting. Second, we introduce the concept of acceptability-based distortion; the main idea behind it is that the optimal candidate is the one that is acceptable to most voters. We determine acceptability-distortion for a number of rules, including Plurality, Borda, k-Approval, Veto, Copeland, Ranked Pairs, Schulze's method, and STV.
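
To make the notion of distortion concrete, the following minimal Python sketch places voters and candidates on a line (a one-dimensional metric, chosen here purely for illustration), derives each voter's top choice from the distances, and compares the total distance of the Plurality winner with that of the optimal candidate. The positions, function names, and tie-breaking rule are hypothetical and not taken from the paper. The ratio is computed for a single instance only, whereas distortion is the worst case over all metrics consistent with the rankings.

```python
from collections import Counter


def total_distance(candidate, voters):
    """Sum of distances from one candidate position to all voter positions."""
    return sum(abs(candidate - v) for v in voters)


def plurality_winner(candidates, voters):
    """Index of the candidate ranked first by the most voters.

    A voter's top choice is the closest candidate. Ties (both in a voter's
    top choice and in the final count) go to the smaller index; this
    tie-breaking is an assumption made for the sketch.
    """
    tops = Counter(
        min(range(len(candidates)), key=lambda c: abs(candidates[c] - v))
        for v in voters
    )
    return max(range(len(candidates)), key=lambda c: (tops[c], -c))


def cost_ratio(candidates, voters):
    """Total distance of the Plurality winner divided by the optimal total distance.

    Assumes the optimal total distance is non-zero.
    """
    winner = plurality_winner(candidates, voters)
    costs = [total_distance(c, voters) for c in candidates]
    return costs[winner] / min(costs)


# Hypothetical instance: three voters slightly prefer candidate 0,
# two voters sit exactly on candidate 1.
candidates = [0.0, 1.0]
voters = [0.4, 0.4, 0.4, 1.0, 1.0]
ratio = cost_ratio(candidates, voters)
```

In this instance Plurality elects candidate 0 although candidate 1 has the smaller total distance, so the ratio exceeds 1. A rule that only sees rankings cannot tell how weak the majority's preference is.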
#4615

Complexity of Manipulating and Controlling Approval-Based Multiwinner Voting
Yongjie Yang
Details | PDF

Computational Social Choice 2

We study the complexity of several manipulation and control problems for six prevalent approval-based multiwinner voting rules. We show that these rules generally resist the proposed strategic types. In addition, we give fixed-parameter tractability results for these problems with respect to several natural parameters and derive polynomial-time algorithms for certain special cases.
#216

A Quantitative Analysis of Multi-Winner Rules
Martin Lackner, Piotr Skowron
Details | PDF

Computational Social Choice 2

Choosing a suitable multi-winner voting rule is a hard and ambiguous task. What constitutes the choice of an "optimal" subset varies widely with the context. In this paper, we offer a new perspective on measuring the quality of such subsets and, consequently, of multi-winner rules. We provide a quantitative analysis using methods from the theory of approximation algorithms and estimate how well multi-winner rules approximate two extreme objectives: diversity as captured by the Approval Chamberlin-Courant rule and individual excellence as captured by Multi-winner Approval Voting. With both theoretical and experimental methods we classify multi-winner rules in terms of their quantitative alignment with these two opposing objectives.
#715

Aggregating Incomplete Pairwise Preferences by Weight
Zoi Terzopoulou, Ulle Endriss
Details | PDF

Computational Social Choice 2

We develop a model for the aggregation of preferences that do not need to be either complete or transitive. Our focus is on the normative characterisation of aggregation rules under which each agent has a weight that depends only on the size of her ballot, i.e., on the number of pairs of alternatives for which she chooses to report a relative ranking. We show that for rules that satisfy a restricted form of majoritarianism these weights in fact must be constant, while for rules that are invariant under agents with compatible preferences forming pre-election pacts it must be the case that an agent's weight is inversely proportional to the size of her ballot.
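
As an illustration of how ballot-size-dependent weights can change an outcome, here is a minimal Python sketch of a weighted pairwise-majority rule over incomplete ballots. It is not necessarily one of the rules characterised in the paper; the ballot encoding, the function names, and the example ballots are assumptions made for this sketch.

```python
from collections import defaultdict


def aggregate(ballots, weight):
    """Weighted pairwise majority over possibly incomplete ballots.

    ballots: list of sets of (a, b) pairs, each meaning "a is ranked above b".
    weight:  function mapping a ballot's size to that agent's weight.
    Returns the set of pairs whose weighted support strictly exceeds
    the support of the reversed pair.
    """
    support = defaultdict(float)
    for ballot in ballots:
        if not ballot:
            continue  # an empty ballot reports nothing and contributes nothing
        w = weight(len(ballot))
        for a, b in ballot:
            support[(a, b)] += w
    return {
        (a, b)
        for (a, b), s in support.items()
        if s > support.get((b, a), 0.0)
    }


# Hypothetical ballots of sizes 1, 2 and 3 over alternatives x, y, z.
ballots = [
    {("x", "y")},
    {("y", "x"), ("y", "z")},
    {("y", "x"), ("x", "z"), ("y", "z")},
]

constant = aggregate(ballots, lambda size: 1.0)        # every agent counts equally
inverse = aggregate(ballots, lambda size: 1.0 / size)  # weight inversely proportional to ballot size
```

With constant weights the pair (y, x) wins two to one. With inverse weights the single-pair ballot carries weight 1 against 1/2 + 1/3, so (x, y) wins instead.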

Thursday 15 09:30 - 10:30 PS|POMDP - POMDPs (2705-2706)

Chair: Nicola Gigante

#2577

Approximability of Constant-horizon Constrained POMDP
Majid Khonji, Ashkan Jasour, Brian Williams
Details | PDF

POMDPs

Partially Observable Markov Decision Process (POMDP) is a fundamental framework for planning and decision making under uncertainty. POMDP is known to be intractable to solve or even approximate when the planning horizon is long (i.e., within a polynomial number of time steps). Constrained POMDP (C-POMDP) allows constraints to be specified on some aspects of the policy in addition to the objective function. When the constraints involve bounding the probability of failure, the problem is called Chance-Constrained POMDP (CC-POMDP). Our first contribution is a reduction from CC-POMDP to C-POMDP and a novel Integer Linear Programming (ILP) formulation. Thus, any algorithm for the latter problem can be utilized to solve any instance of the former. Second, we show that unlike POMDP, when the length of the planning horizon is constant, (C)C-POMDP is NP-Hard. Third, we present the first Fully Polynomial Time Approximation Scheme (FPTAS) that computes (near) optimal deterministic policies for constant-horizon (C)C-POMDP in polynomial time.
#4610

Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning
Alberto Castellini, Georgios Chalkiadakis, Alessandro Farinelli
Details | PDF

POMDPs

Online planning methods for partially observable Markov decision processes (POMDPs) have recently gained much interest. In this paper, we propose the introduction of prior knowledge in the form of (probabilistic) relationships among discrete state-variables, for online planning based on the well-known POMCP algorithm. In particular, we propose the use of hard constraint networks and probabilistic Markov random fields to formalize state-variable constraints and we extend the POMCP algorithm to take advantage of these constraints. Results on a case study based on Rocksample show that the usage of this knowledge provides significant improvements to the performance of the algorithm. The extent of this improvement depends on the amount of knowledge encoded in the constraints and reaches 50% of the average discounted return in the most favorable cases that we analyzed.
#5680

Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks
Steven Carr, Nils Jansen, Ralf Wimmer, Alexandru Serban, Bernd Becker, Ufuk Topcu
Details | PDF

POMDPs

We study strategy synthesis for partially observable Markov decision processes (POMDPs). The particular problem is to determine strategies that provably adhere to (probabilistic) temporal logic constraints. This problem is computationally intractable and theoretically hard. We propose a novel method that combines techniques from machine learning and formal verification. First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP. Secondly, we restrict the RNN-based strategy to represent a finite-memory strategy and implement it on a specific POMDP. For the resulting finite Markov chain, efficient formal verification techniques provide provable guarantees against temporal logic specifications. If the specification is not satisfied, counterexamples supply diagnostic information. We use this information to improve the strategy by iteratively training the RNN. Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.
#1193

Regular Decision Processes: A Model for Non-Markovian Domains
Ronen I. Brafman, Giuseppe De Giacomo
Details | PDF

POMDPs

We introduce and study Regular Decision Processes (RDPs), a new, compact, factored model for domains with non-Markovian dynamics and rewards. In RDPs, transition and reward functions are specified using formulas in linear dynamic logic over finite traces, a language with the expressive power of regular expressions. This allows specifying complex dependence on the past using intuitive and compact formulas, and provides a model that generalizes MDPs and k-order MDPs. RDPs can also approximate POMDPs without having to postulate the existence of hidden variables, and, in principle, can be learned from observations only.

Thursday 15 09:30 - 10:30 MLA|AUL - Applications of Unsupervised Learning (2601-2602)

Chair: Lingfei Wu

#913

Pseudo Supervised Matrix Factorization in Discriminative Subspace
Jiaqi Ma, Yipeng Zhang, Lefei Zhang, Bo Du, Dapeng Tao
Details | PDF

Applications of Unsupervised Learning

Non-negative Matrix Factorization (NMF) and spectral clustering have been proved to be efficient and effective for data clustering tasks and have been applied to various real-world scenarios. However, there are still some drawbacks in traditional methods: (1) most existing algorithms only consider high-dimensional data directly while neglecting the intrinsic data structure in the low-dimensional subspace; (2) the pseudo-information obtained in the optimization process is not relevant to most spectral clustering and manifold regularization methods. In this paper, a novel unsupervised matrix factorization method, Pseudo Supervised Matrix Factorization (PSMF), is proposed for data clustering. The main contributions are threefold: (1) to cluster in the discriminant subspace, Linear Discriminant Analysis (LDA) is combined with NMF into a unified framework; (2) we propose a pseudo supervised manifold regularization term which utilizes the pseudo-information to instruct the regularization term in order to find a subspace that discriminates different classes; (3) an efficient optimization algorithm is designed to solve the proposed problem with proved convergence. Extensive experiments on multiple benchmark datasets illustrate that the proposed model outperforms other state-of-the-art clustering algorithms.
#4390

A Quantum-inspired Classical Algorithm for Separable Non-negative Matrix Factorization
Zhihuai Chen, Yinan Li, Xiaoming Sun, Pei Yuan, Jialin Zhang
Details | PDF

Applications of Unsupervised Learning

Non-negative Matrix Factorization (NMF) asks to decompose an (entry-wise) non-negative matrix into the product of two smaller-sized non-negative matrices, which has been shown to be intractable in general. To overcome this issue, a separability assumption is introduced, which assumes all data points are in a conical hull. This assumption makes NMF tractable and widely used in text analysis and image processing, but still impractical for huge-scale datasets. In this paper, inspired by recent developments in dequantizing techniques, we propose a new classical algorithm for the separable NMF problem. Our new algorithm runs in time polynomial in the rank and logarithmic in the size of the input matrices, which achieves an exponential speedup in the low-rank setting.
#3094

Learning a Generative Model for Fusing Infrared and Visible Images via Conditional Generative Adversarial Network with Dual Discriminators
Han Xu, Pengwei Liang, Wei Yu, Junjun Jiang, Jiayi Ma
Details | PDF

Applications of Unsupervised Learning

In this paper, we propose a new end-to-end model, called dual-discriminator conditional generative adversarial network (DDcGAN), for fusing infrared and visible images of different resolutions. Unlike the pixel-level methods and existing deep learning-based methods, the fusion task is accomplished through the adversarial process between a generator and two discriminators, in addition to the specially designed content loss. The generator is trained to generate real-like fused images to fool discriminators. The two discriminators are trained to calculate the JS divergence between the probability distribution of downsampled fused images and infrared images, and the JS divergence between the probability distribution of gradients of fused images and gradients of visible images, respectively. Thus, the fused images can compensate for the features that are not constrained by the single content loss. Consequently, the prominence of thermal targets in the infrared image and the texture details in the visible image can be preserved or even enhanced in the fused image simultaneously. Moreover, by constraining and distinguishing between the downsampled fused image and the low-resolution infrared image, DDcGAN can be preferably applied to the fusion of different resolution images. Qualitative and quantitative experiments on publicly available datasets demonstrate the superiority of our method over the state-of-the-art.
#909

LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs
Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, Rong Zhou
Details | PDF

Applications of Unsupervised Learning

Recording runtime status via logs is common for almost every computer system, and detecting anomalies in logs is crucial for timely identification of system malfunctions. However, manually detecting anomalies in logs is time-consuming, error-prone, and infeasible. Existing automatic log anomaly detection approaches, using indexes rather than semantics of log templates, tend to cause false alarms. In this work, we propose LogAnomaly, a framework to model an unstructured log stream as a natural language sequence. Empowered by template2vec, a novel, simple yet effective method to extract the semantic information hidden in log templates, LogAnomaly can detect both sequential and quantitative log anomalies simultaneously, which no previous work has done. Moreover, LogAnomaly can avoid the false alarms caused by the newly appearing log templates between periodic model retrainings. Our evaluation on two public production log datasets shows that LogAnomaly outperforms existing log-based anomaly detection methods.

Thursday 15 09:30 - 10:30 KRR|CCR - Computational Complexity of Reasoning 1 (2603-2604)

Chair: Stefan Borgwardt

#2282

Reasoning about Disclosure in Data Integration in the Presence of Source Constraints
Michael Benedikt, Pierre Bourhis, Louis Jachiet, Michaël Thomazo
Details | PDF

Computational Complexity of Reasoning 1

Data integration systems allow users to access data sitting in multiple sources by means of queries over a global schema, related to the sources via mappings. Data sources often contain sensitive information, and thus an analysis is needed to verify that a schema satisfies a privacy policy, given as a set of queries whose answers should not be accessible to users. Such an analysis should take into account not only knowledge that an attacker may have about the mappings, but also what they may know about the semantics of the sources. In this paper, we show that source constraints can have a dramatic impact on disclosure analysis. We study the problem of determining whether a given data integration system discloses a source query to an attacker in the presence of constraints, providing both lower and upper bounds on source-aware disclosure analysis.
#4211

Enriching Ontology-based Data Access with Provenance
Diego Calvanese, Davide Lanti, Ana Ozaki, Rafael Penaloza, Guohui Xiao
Details | PDF

Computational Complexity of Reasoning 1

Ontology-based data access (OBDA) is a popular paradigm for querying heterogeneous data sources by connecting them through mappings to an ontology. In OBDA, it is often difficult to reconstruct why a tuple occurs in the answer of a query. We address this challenge by enriching OBDA with provenance semirings, taking inspiration from database theory. In particular, we investigate the problems of (i) deciding whether a provenance annotated OBDA instance entails a provenance annotated conjunctive query, and (ii) computing a polynomial representing the provenance of a query entailed by a provenance annotated OBDA instance. Unlike in pure databases, in our case these polynomials may be infinite. To regain finiteness, we consider idempotent semirings, and study the complexity in the case of DL-Lite_R ontologies. We implement Task (ii) in a state-of-the-art OBDA system and show the practical feasibility of the approach through an extensive evaluation against two popular benchmarks.
#4223

Mixed-World Reasoning with Existential Rules under Active-Domain Semantics
Meghyn Bienvenu, Pierre Bourhis
Details | PDF

Computational Complexity of Reasoning 1

In this paper, we study reasoning with existential rules in a setting where some of the predicates may be closed (i.e., their content is fully specified by the data instance) and the remaining open predicates are interpreted under active-domain semantics. We show, unsurprisingly, that the main reasoning tasks (satisfiability and certainty / possibility of Boolean queries) are all intractable in data complexity in the general case. However, several positive (PTIME data) results are obtained for the linear fragment, and interestingly, these tractability results hold also for various extensions, e.g., with negated closed atoms and disjunctive rule heads. This motivates us to take a closer look at the linear fragment, exploring its expressivity and defining a fixpoint extension to approximate non-linear rules.
#3536

On Computational Complexity of Pickup-and-Delivery Problems with Precedence Constraints or Time Windows
Xing Tan, Jimmy Xiangji Huang
Details | PDF

Computational Complexity of Reasoning 1

Pickup-and-Delivery (PD) problems consider routing vehicles to achieve a set of tasks related to "Pickup" and to "Delivery". These tasks might also be subject to Precedence Constraints (PDPC) or Time Windows (PDTW). PD is a variant of Vehicle Routing Problems (VRP), which have been extensively studied for decades. In recent years, PD has demonstrated its closer relevance to AI. Aware that little work has so far addressed where the tractability boundary can be drawn for PD problems, we identify in this paper a set of highly restricted PD problems and prove their NP-completeness. Many problems from a multitude of applications and industry domains are general versions of PDPC. Thus this new NP-hardness result for PDPC not only clarifies the computational complexity of these problems, but also sets up a firm basis for requiring the use of approximation or heuristics, as opposed to looking for exact but intractable algorithms for solving them. We move on to perform an empirical study to locate sources of intractability in PD problems. That is, we propose a local-search formalism and algorithm for solving PDPC problems in particular. Experimental results strongly support the effectiveness and efficiency of the local search. Using the local search as a solver for randomly generated PDPC problem instances, we obtained interesting and potentially useful insights regarding the computational hardness of PDPC and PD.

Thursday 15 09:30 - 10:30 NLP|IR - Information Retrieval (2605-2606)

Chair: Xiaojun Quan

#417

Knowledge Aware Semantic Concept Expansion for Image-Text Matching
Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan
Details | PDF

Information Retrieval

Image-text matching is a vital cross-modality task in artificial intelligence and has attracted increasing attention in recent years. Existing works have shown that learning semantic concepts is useful to enhance image representation and can significantly improve the performance of both image-to-text and text-to-image retrieval. However, existing models simply detect semantic concepts from a given image, and are therefore less able to deal with long-tail and occluded concepts. Frequently co-occurring concepts in the same scene, e.g. bedroom and bed, can provide common-sense knowledge to discover other semantically related concepts. In this paper, we develop a Scene Concept Graph (SCG) by aggregating image scene graphs and extracting frequently co-occurring concept pairs as scene common-sense knowledge. Moreover, we propose a novel model to incorporate this knowledge to improve image-text matching. Specifically, semantic concepts are detected from images and then expanded by the SCG. After learning to select relevant contextual concepts, we fuse their representations with the image embedding feature to feed into the matching module. Extensive experiments are conducted on Flickr30K and MSCOCO datasets, and prove that our model achieves state-of-the-art results due to the effectiveness of incorporating the external SCG.
#1744

Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax
Yinfei Yang, Gustavo Hernandez Abrego, Steve Yuan, Mandy Guo, Qinlan Shen, Daniel Cer, Yun-hsuan Sung, Brian Strope, Ray Kurzweil
Details | PDF

Information Retrieval

In this paper, we present an approach to learn multilingual sentence embeddings using a bi-directional dual-encoder with additive margin softmax. The embeddings are able to achieve state-of-the-art results on the United Nations (UN) parallel corpus retrieval task. In all the languages tested, the system achieves P@1 of 86% or higher. We use pairs retrieved by our approach to train NMT models that achieve similar performance to models trained on gold pairs. We explore simple document-level embeddings constructed by averaging our sentence embeddings. On the UN document-level retrieval task, document embeddings achieve around 97% on P@1 for all experimented language pairs. Lastly, we evaluate the proposed model on the BUCC mining task. The learned embeddings with raw cosine similarity scores achieve competitive results compared to current state-of-the-art models, and with a second-stage scorer we achieve a new state-of-the-art level on this task.
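
The abstract does not spell out the loss, so the following Python sketch is only one plausible reading of a bi-directional dual-encoder objective with additive margin softmax: an in-batch softmax over a similarity matrix, with a margin subtracted from the score of each true pair, summed over both retrieval directions. The similarity values, the margin, and all function names are assumptions made for illustration.

```python
import math


def _log_softmax_at(row, target):
    """Log of the softmax probability of row[target], computed stably."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return row[target] - log_z


def additive_margin_loss(sim, margin):
    """One-directional in-batch ranking loss with an additive margin.

    sim[i][j] is the similarity between source sentence i and target
    sentence j; the pair (i, i) is assumed to be the true translation.
    The margin is subtracted from the true-pair score only.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        row = [sim[i][j] - margin if j == i else sim[i][j] for j in range(n)]
        total -= _log_softmax_at(row, i)
    return total / n


def bidirectional_loss(sim, margin):
    """Sum of the source-to-target and target-to-source losses."""
    transposed = [list(col) for col in zip(*sim)]
    return additive_margin_loss(sim, margin) + additive_margin_loss(transposed, margin)


# Hypothetical cosine similarities for a batch of two sentence pairs.
sim = [[0.9, 0.1],
       [0.2, 0.8]]
loss_without_margin = bidirectional_loss(sim, 0.0)
loss_with_margin = bidirectional_loss(sim, 0.3)
```

Subtracting the margin lowers the true-pair score, so the same similarity matrix yields a larger loss. In a setup of this kind, that pushes the encoders to separate true pairs from in-batch negatives by at least the margin.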
#2165

RLTM: An Efficient Neural IR Framework for Long Documents
Chen Zheng, Yu Sun, Shengxian Wan, Dianhai Yu
Details | PDF

Information Retrieval

Deep neural networks have achieved significant improvements in information retrieval (IR). However, most existing models are computationally costly and cannot efficiently scale to long documents. This paper proposes a novel end-to-end neural ranking framework called Reinforced Long Text Matching (RLTM), which matches a query with long documents efficiently and effectively. The core idea behind the framework is analogous to the human judgment process, which first locates the relevant parts quickly from the whole document and then matches these parts with the query carefully to obtain the final label. Firstly, we select relevant sentences from the long documents by a coarse and efficient matching model. Secondly, we generate a relevance score by a more sophisticated matching model based on the sentences selected. The whole model is trained jointly with reinforcement learning in a pairwise manner by maximizing the expected score gaps between positive and negative examples. Experimental results demonstrate that RLTM greatly improves on the efficiency and effectiveness of state-of-the-art models.
#6207

Revealing Semantic Structures of Texts: Multi-grained Framework for Automatic Mind-map Generation
Yang Wei, Honglei Guo, Jinmao Wei, Zhong Su
Details | PDF

Information Retrieval

A mind-map is a diagram used to represent ideas linked to and arranged around a central concept. Converting a text to a mind-map makes its knowledge and ideas easier to access visually. However, highlighting the semantic skeleton of an article remains a challenge. The key issue is to detect the relations among concepts beyond the sentence level. In this paper, we propose a multi-grained framework for automatic mind-map generation. First, a novel neural network detects the relations, employing multi-hop self-attention and a gated recurrence network to reveal the directed semantic relations between sentences. A recursive algorithm is then designed to select the most salient sentences to constitute the hierarchy. The human-like mind-map is automatically constructed with the key phrases in the salient sentences. Promising results have been achieved in comparison with manual mind-maps. The case studies demonstrate that the generated mind-maps reveal the underlying semantic structures of the articles.

Thursday 15 09:30 - 10:30 CV|LV - Language and Vision 2 (2501-2502)

Chair: Chao Ma

#4960

Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation
Siying Wu, Zheng-Jun Zha, Zilei Wang, Houqiang Li, Feng Wu
Details | PDF

Language and Vision 2

Image paragraph generation aims to describe an image with a paragraph in natural language. Compared to image captioning with a single sentence, paragraph generation provides more expressive and fine-grained description for storytelling. Existing approaches mainly optimize the paragraph generator by minimizing the word-wise cross-entropy loss, which neglects the linguistic hierarchy of the paragraph and results in "sparse" supervision for generator learning. In this paper, we propose a novel Densely Supervised Hierarchical Policy-Value (DHPV) network for effective paragraph generation. We design new hierarchical supervisions consisting of hierarchical rewards and values at both sentence and word levels. The joint exploration of hierarchical rewards and values provides dense supervision cues for learning an effective paragraph generator. We propose a new hierarchical policy-value architecture which exploits compositionality at token-to-token and sentence-to-sentence levels simultaneously and can preserve the semantic and syntactic constituent integrity. Extensive experiments on the Stanford image-paragraph benchmark have demonstrated the effectiveness of the proposed DHPV approach with performance improvements over multiple state-of-the-art methods.
#6309

Generative Visual Dialogue System via Weighted Likelihood Estimation
Heming Zhang, Shalini Ghosh, Larry Heck, Stephen Walsh, Junting Zhang, Jie Zhang, C.-C. Jay Kuo
Details | PDF

Language and Vision 2

The key challenge of generative Visual Dialogue (VD) systems is to respond to human queries with informative answers in natural and contiguous conversation flow. Traditional Maximum Likelihood Estimation-based methods only learn from positive responses but ignore the negative responses, and consequently tend to yield safe or generic responses. To address this issue, we propose a novel training scheme in conjunction with a weighted likelihood estimation method. Furthermore, an adaptive multi-modal reasoning module is designed to accommodate various dialogue scenarios automatically and select relevant information accordingly. The experimental results on the VisDial benchmark demonstrate the superiority of our proposed algorithm over other state-of-the-art approaches, with an improvement of 5.81% on recall@10.
#4479

Swell-and-Shrink: Decomposing Image Captioning by Transformation and Summarization
Hanzhang Wang, Hanli Wang, Kaisheng Xu
Details | PDF

Language and Vision 2

Image captioning is currently viewed as a problem analogous to machine translation. However, it often suffers from poor interpretability and from coarse or even incorrect descriptions of regional details. Moreover, information abstraction and compression, as essential characteristics of captioning, are usually overlooked and seldom discussed. To overcome these shortcomings, a swell-shrink method is proposed to redefine image captioning as a compositional task which consists of two separate modules: modality transformation and text compression. The former is intended to accurately transform adequate visual content into textual form, while the latter consists of a hierarchical LSTM which particularly emphasizes removing the redundancy among multiple phrases and organizing the final abstractive caption. Additionally, the order and quality of regions of interest and modality processing are studied to give insight into the influence of regional visual cues on language forming. Experiments demonstrate the effectiveness of the proposed method.
#323

Connectionist Temporal Modeling of Video and Language: a Joint Model for Translation and Sign Labeling
Dan Guo, Shengeng Tang, Meng Wang
Details | PDF

Language and Vision 2

Online sign interpretation suffers from challenges presented by hybrid semantics learning among sequential variations of visual representations, sign linguistics, and textual grammars. This paper proposes a Connectionist Temporal Modeling (CTM) network for sentence translation and sign labeling. To acquire short-term temporal correlations, a Temporal Convolution Pyramid (TCP) module is applied to 2D CNN features to produce (2D+1D) = "pseudo 3D" CNN features. CTM aligns the "pseudo 3D" features with the original 3D CNN clip features and fuses them. Next, we implement a connectionist decoding scheme for long-term sequential learning. Here, we embed dynamic programming into the decoding scheme, which learns temporal mapping among features, sign labels, and the generated sentence directly. The sign labeling produced by dynamic programming is treated as pseudo labels. Finally, we utilize the pseudo supervision cues in an end-to-end framework. A joint objective function is designed to measure feature correlation, entropy regularization on sign labeling, and probability maximization on sentence decoding. The experimental results using the RWTH-PHOENIX-Weather and USTC-CSL datasets demonstrate the effectiveness of the proposed approach.

Thursday 15 09:30 - 10:30 ML|SSL - Semi-Supervised Learning 2 (2503-2504)

Chair: Zhao Kang

#2799

Sequential and Diverse Recommendation with Long Tail
Yejin Kim, Kwangseob Kim, Chanyoung Park, Hwanjo Yu
Details | PDF

Semi-Supervised Learning 2

Sequential recommendation is a task that learns the temporal dynamics of user behavior in sequential data and predicts items that a user would like afterward. However, diversity has rarely been emphasized in the context of sequential recommendation. Sequential and diverse recommendation must learn temporal preference on diverse items as well as on general items. Thus, we propose a sequential and diverse recommendation model that predicts a ranked list containing general items and also diverse items without significantly compromising accuracy. To learn temporal preference on diverse items as well as on general items, we cluster and relocate consumed long tail items to make a pseudo ground truth for diverse items and learn the preference on the long tail using a recurrent neural network, which enables us to directly learn a ranking function. Extensive online and offline experiments deployed on a commercial platform demonstrate that our models significantly increase diversity while preserving accuracy compared to the state-of-the-art sequential recommendation model, and consequently our models improve user satisfaction.
#3603

Belief Propagation Network for Hard Inductive Semi-Supervised Learning
Jaemin Yoo, Hyunsik Jeon, U Kang
Details | PDF

Semi-Supervised Learning 2

Given graph-structured data, how can we train a robust classifier in a semi-supervised setting that performs well without neighborhood information? In this work, we propose belief propagation networks (BPN), a novel approach to train a deep neural network in a hard inductive setting, where the test data are given without neighborhood information. BPN uses a differentiable classifier to compute the prior distributions of nodes, and then diffuses the priors through the graphical structure, independently from the prior computation. This separable structure improves the generalization performance of BPN for isolated test instances, compared with previous approaches that jointly use the feature and neighborhood without distinction. As a result, BPN outperforms state-of-the-art methods in four datasets with an average margin of 2.4% points in accuracy.
#3866

Deep Correlated Predictive Subspace Learning for Incomplete Multi-View Semi-Supervised Classification
Zhe Xue, Junping Du, Dawei Du, Wenqi Ren, Siwei Lyu
Details | PDF

Semi-Supervised Learning 2

Incomplete view information often results in failure cases of the conventional multi-view methods. To address this problem, we propose a Deep Correlated Predictive Subspace Learning (DCPSL) method for incomplete multi-view semi-supervised classification. Specifically, we integrate semi-supervised deep matrix factorization, correlated subspace learning, and multi-view label prediction into a unified framework to jointly learn the deep correlated predictive subspace and multi-view shared and private label predictors. DCPSL is able to learn proper subspace representation that is suitable for class label prediction, which can further improve the performance of classification. Extensive experimental results on various practical datasets demonstrate that the proposed method performs favorably against the state-of-the-art methods.
#2203

Approximate Manifold Regularization: Scalable Algorithm and Generalization Analysis
Jian Li, Yong Liu, Rong Yin, Weiping Wang
Details | PDF

Semi-Supervised Learning 2

Graph-based semi-supervised learning is one of the most popular and successful semi-supervised learning approaches. Unfortunately, it suffers from high time and space complexity, at least quadratic with the number of training samples. In this paper, we propose an efficient graph-based semi-supervised algorithm with a sound theoretical guarantee. The proposed method combines Nystrom subsampling and preconditioned conjugate gradient descent, substantially improving computational efficiency and reducing memory requirements. Extensive empirical results reveal that our method achieves the state-of-the-art performance in a short time even with limited computing resources.

Thursday 15 09:30 - 10:30 ML|DM - Data Mining 8 (2505-2506)

Chair: Senzhang Wang

#1024

Finding Statistically Significant Interactions between Continuous Features
Mahito Sugiyama, Karsten Borgwardt
Details | PDF

Data Mining 8

The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regarding this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm which overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value for each interaction, which enables us to massively prune interactions that can never reach significance and to thereby gain more statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets.
#1850

MEGAN: A Generative Adversarial Network for Multi-View Network Embedding
Yiwei Sun, Suhang Wang, Tsung-Yu Hsieh, Xianfeng Tang, Vasant Honavar
Details | PDF

Data Mining 8

Data from many real-world applications can be naturally represented by multi-view networks where the different views encode different types of relationships (e.g., friendship, shared interests in music, etc.) between real-world individuals or entities. There is an urgent need for methods to obtain low-dimensional, information preserving and typically nonlinear embeddings of such multi-view networks. However, most of the work on multi-view learning focuses on data that lack a network structure, and most of the work on network embeddings has focused primarily on single-view networks. Against this background, we consider the multi-view network representation learning problem, i.e., the problem of constructing low-dimensional information preserving embeddings of multi-view networks. Specifically, we investigate a novel Generative Adversarial Network (GAN) framework for Multi-View Network Embedding, namely MEGAN, aimed at preserving the information from the individual network views, while accounting for connectivity across (and hence complementarity of and correlations between) different views. The results of our experiments on two real-world multi-view data sets show that the embeddings obtained using MEGAN outperform the state-of-the-art methods on node classification, link prediction and visualization tasks.
#2893

Privacy-aware Synthesizing for Crowdsourced Data
Mengdi Huai, Di Wang, Chenglin Miao, Jinhui Xu, Aidong Zhang
Details | PDF

Data Mining 8

Although releasing crowdsourced data brings many benefits to data analyzers conducting statistical analysis, it may violate crowd users' data privacy. A potential way to address this problem is to employ traditional differential privacy (DP) mechanisms and perturb the data with some noise before releasing them. However, considering that there usually exist conflicts among the crowdsourced data and these data are usually large in volume, directly using these mechanisms cannot guarantee good utility in the setting of releasing crowdsourced data. To address this challenge, in this paper, we propose a novel privacy-aware synthesizing method (i.e., PrisCrowd) for crowdsourced data, based on which the data collector can release users' data with strong privacy protection for their private information, while at the same time, the data analyzer can achieve good utility from the released data. Both theoretical analysis and extensive experiments on real-world datasets demonstrate the desired performance of the proposed method.
#3237

Fast Algorithm for K-Truss Discovery on Public-Private Graphs
Soroush Ebadian, Xin Huang
Details | PDF

Data Mining 8

In public-private graphs, users share one public graph and have their own private graphs. A private graph consists of personal private contacts that are visible only to its owner, e.g., hidden friend lists on Facebook and secret following on Sina Weibo. However, existing public-private analytic algorithms have not yet investigated the dense subgraph discovery of k-truss, where each edge is contained in at least k-2 triangles. This paper aims at finding k-truss efficiently in public-private graphs. The core of our solution is a novel algorithm to update k-truss with node insertions. We develop a classification-based hybrid strategy of node insertions and edge insertions to incrementally compute k-truss in public-private graphs. Extensive experiments validate the superiority of our proposed algorithms against state-of-the-art methods on real-world datasets.
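
As an illustration of the k-truss definition used in this abstract (every edge lies in at least k-2 triangles), the following is a minimal sketch of the textbook static peeling procedure. It is not the incremental public-private algorithm of the paper; the function name `k_truss` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict


def k_truss(edges, k):
    """Return the edges of the k-truss of an undirected graph.

    Hypothetical helper for illustration: repeatedly delete every edge
    that is contained in fewer than k-2 triangles until none remain.
    Nodes are assumed to be mutually comparable (e.g. integers).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # The number of common neighbours equals the number of
                # triangles that contain the edge (u, v).
                if u < v and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {(u, v) for u in adj for v in adj[u] if u < v}
```

On a 4-clique with one pendant edge attached, the 4-truss keeps only the six clique edges, because the pendant edge lies in no triangle. This simple loop recomputes supports from scratch on every pass, so it is meant to clarify the definition rather than to be efficient.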

Thursday 15 09:30 - 10:30 HAI|HCI - Human-Computer Interaction (2403-2404)

Chair: Timothy Miller

#2824

DeepFlow: Detecting Optimal User Experience From Physiological Data Using Deep Neural Networks
Marco Maier, Daniel Elsner, Chadly Marouane, Meike Zehnle, Christoph Fuchs
Details | PDF

Human-Computer Interaction

Flow is an affective state of optimal experience, total immersion and high productivity. While often associated with (professional) sports, it is valuable information in several scenarios ranging from work environments to user experience evaluations, and we expect it to be a potential reward signal for human-in-the-loop reinforcement learning systems. Traditionally, flow has been assessed through questionnaires, which prevents its use in online, real-time environments. In this work, we present our findings towards estimating a user's flow state based on physiological signals measured using wearable devices. We conducted a study with participants playing the game Tetris at varying difficulty levels, leading to boredom, stress, and flow. Using an end-to-end deep learning architecture, we achieve an accuracy of 67.50% in recognizing high flow vs. low flow states and 49.23% in distinguishing all three affective states: boredom, flow, and stress.
#3446

Explaining Reinforcement Learning to Mere Mortals: An Empirical Study
Andrew Anderson, Jonathan Dodge, Amrita Sadarangani, Zoe Juozapaitis, Evan Newman, Jed Irvine, Souti Chattopadhyay, Alan Fern, Margaret Burnett
Details | PDF

Human-Computer Interaction

We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124-participant, four-treatment experiment to compare participants' mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars was needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.
#3526

Multi-agent Attentional Activity Recognition
Kaixuan Chen, Lina Yao, Dalin Zhang, Bin Guo, Zhiwen Yu
Details | PDF

Human-Computer Interaction

Multi-modality is an important feature of sensor based activity recognition. In this work, we consider two inherent characteristics of human activities, the spatially-temporally varying salience of features and the relations between activities and corresponding body part motions. Based on these, we propose a multi-agent spatial-temporal attention model. The spatial-temporal attention mechanism helps intelligently select informative modalities and their active periods. And the multiple agents in the proposed model represent activities with collective motions across body parts by independently selecting modalities associated with single motions. With a joint recognition goal, the agents share gained information and coordinate their selection policies to learn the optimal recognition model. The experimental results on four real-world datasets demonstrate that the proposed model outperforms the state-of-the-art methods.
#2537

Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets
Homayun Afrabandpey, Tomi Peltola, Samuel Kaski
Details | PDF

Human-Computer Interaction

Learning predictive models from small high-dimensional data sets is a key problem in high-dimensional statistics. Expert knowledge elicitation can help, and a strong line of work focuses on directly eliciting informative prior distributions for parameters. This either requires considerable statistical expertise or is laborious, as the emphasis has been on accuracy and not on efficiency of the process. Another line of work queries about importance of features one at a time, assuming them to be independent and hence missing covariance information. In contrast, we propose eliciting expert knowledge about pairwise feature similarities, to borrow statistical strength in the predictions, and using sequential decision making techniques to minimize the effort of the expert. Empirical results demonstrate improvement in predictive performance on both simulated and real data, in high-dimensional linear regression tasks, where we learn the covariance structure with a Gaussian process, based on sequential elicitation.

Thursday 15 09:30 - 10:30 CSAT|CA - Constraint Satisfaction 2 (2405-2406)

Chair: Mohamed Siala

#10978

(Sister Conferences Best Papers Track) Constraint Games for Stable and Optimal Allocation of Demands in SDN
Anthony Palmieri, Arnaud Lallouet, Luc Pons
Details | PDF

Constraint Satisfaction 2

Software Defined Networking (or SDN) allows centralized control to be applied over a network of computers in order to provide better global throughput. One of the problems to solve is multi-commodity flow routing, where a set of demands (or commodities) have to be routed at minimum cost. In contrast with other versions of this problem, we consider here problems with congestion that change the cost of a link according to the capacity used. We propose here to study centralized routing with Constraint Programming and selfish routing with Constraint Games. Selfish routing reaches a Nash equilibrium and is important for the perceived quality of the solution since no user is able to improve his cost by changing only his own path. We present real and synthetic benchmarks with hundreds or thousands of players and we show that for this problem the worst selfish routing is often close to the optimal centralized solution.
#10974

(Sister Conferences Best Papers Track) Not All FPRASs are Equal: Demystifying FPRASs for DNF-Counting
Kuldeep S. Meel, Aditya A. Shrotri, Moshe Y. Vardi
Details | PDF

Constraint Satisfaction 2

The problem of counting the number of solutions of a DNF formula, also called #DNF, is a fundamental problem in AI with wide-ranging applications. Owing to the intractability of the exact variant, efforts have focused on the design of approximate techniques. Consequently, several Fully Polynomial Randomized Approximation Schemes (FPRASs) based on Monte Carlo techniques have been proposed. Recently, it was discovered that hashing-based techniques too lend themselves to FPRASs for #DNF. Despite significant improvements, the complexity of the hashing-based FPRAS is still worse than that of the best Monte Carlo FPRAS by polylog factors. Two questions were left unanswered in previous works: Can the complexity of the hashing-based techniques be improved? How do these approaches compare empirically? In this paper, we first propose a new search procedure for the hashing-based FPRAS that removes the polylog factors from its time complexity. We then present the first empirical study of runtime behavior of different FPRASs for #DNF, which produces a nuanced picture. We observe that there is no single best algorithm for all formulas and that the algorithm with one of the worst time complexities solves the largest number of benchmarks.
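
For readers unfamiliar with Monte Carlo FPRASs for #DNF, the following is a minimal sketch of one classical estimator, the Karp-Luby coverage scheme. It is offered only as background and is not the hashing-based procedure proposed in the paper; the function name, the clause encoding (a dict mapping a variable index to its required truth value), and the fixed sample count are assumptions for this example.

```python
import random


def karp_luby_dnf_count(clauses, n_vars, samples, seed=0):
    """Estimate the number of satisfying assignments of a DNF formula.

    clauses: list of dicts {variable_index: required_bool}, each dict
             being one conjunction of literals (hypothetical encoding).
    """
    rng = random.Random(seed)
    # A clause with j literals is satisfied by 2^(n_vars - j) assignments.
    weights = [2 ** (n_vars - len(c)) for c in clauses]
    total = sum(weights)

    def satisfies(assignment, clause):
        return all(assignment[var] == val for var, val in clause.items())

    hits = 0
    for _ in range(samples):
        # Sample a clause proportionally to its number of solutions,
        # then a uniform assignment among that clause's solutions.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        assignment = [rng.random() < 0.5 for _ in range(n_vars)]
        for var, val in clauses[i].items():
            assignment[var] = val
        # Count the sample only if clause i is the first clause it
        # satisfies, so each solution is counted exactly once.
        first = next(j for j, c in enumerate(clauses)
                     if satisfies(assignment, c))
        if first == i:
            hits += 1

    return total * hits / samples
```

For the formula (x0 AND x1) OR (x1 AND x2) over three variables, which has three satisfying assignments, the estimate concentrates around 3 as the number of samples grows. A full FPRAS would additionally choose the sample count from the desired error and confidence parameters.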
#4522

Entropy-Penalized Semidefinite Programming
Mikhail Krechetov, Jakub Marecek, Yury Maximov, Martin Takac
Details | PDF

Constraint Satisfaction 2

Low-rank methods for semi-definite programming (SDP) have gained a lot of interest recently, especially in machine learning applications. Their analysis often involves determinant-based or Schatten-norm penalties, which are difficult to implement in practice due to high computational efforts. In this paper, we propose Entropy-Penalized Semi-Definite Programming (EP-SDP), which provides a unified framework for a broad class of penalty functions used in practice to promote a low-rank solution. We show that EP-SDP problems admit an efficient numerical algorithm, having (almost) linear time complexity of the gradient computation; this makes it useful for many machine learning and optimization problems. We illustrate the practical efficiency of our approach on several combinatorial optimization and machine learning problems.
#6520

Acquiring Integer Programs from Data
Mohit Kumar, Stefano Teso, Luc De Raedt
Details | PDF

Constraint Satisfaction 2

Integer programming (IP) is widely used within operations research to model and solve complex combinatorial problems such as personnel rostering and assignment problems. Modelling such problems is difficult for non-experts and expensive when hiring domain experts to perform the modelling. For many tasks, however, examples of working solutions are readily available. We propose ARNOLD, an approach that partially automates the modelling step by learning an integer program from example solutions. Contrary to existing alternatives, ARNOLD natively handles multi-dimensional quantities and non-linear operations, which are at the core of IP problems, and it only requires examples of feasible solutions. The main challenge is to efficiently explore the space of possible programs. Our approach pairs a general-to-specific traversal strategy with a nested lexicographic ordering in order to prune large portions of the space of candidate constraints while avoiding visiting the same candidate multiple times. Our empirical evaluation shows that ARNOLD can acquire models for a number of realistic benchmark problems.

Thursday 15 09:30 - 10:30 Early Career 3 - Early Career Spotlight 3 (2306)

Chair: Arne Jonsson

#11068

End-User Programming of General Purpose Robots
Maya Cakmak

Early Career Spotlight 3

Robots that can assist humans in everyday tasks have the potential to improve people’s quality of life and bring independence to persons with disabilities. A key challenge in realizing such robots is programming them to meet the unique and changing needs of users and to robustly function in their unique environments. Most research in robotics targets this challenge by attempting to develop universal or adaptive robotic capabilities. This approach has had limited success because it is extremely difficult to anticipate all possible scenarios and use-cases for general-purpose robots or collect massive amounts of data that represent each scenario and use-case. Instead, we aim to develop robots that can be programmed in-context and by end-users after they are deployed, tailoring them to the specific environment and user preferences. To that end, we have been developing new techniques and tools that allow intuitive and rapid programming of robots to do useful tasks. Here, we describe some of these techniques and tools and review observations from a number of user studies with potential users and real world deployments. We also discuss opportunities for automating parts of the programming process to reduce the burden on end-user programmers and improve the quality of robot programs they develop.

Thursday 15 09:30 - 18:00 DB3 - Demo Booths 3 (Hall A)

Chair: TBA

#11047

VEST: A System for Vulnerability Exploit Scoring & Timing
Haipeng Chen, Jing Liu, Rui Liu, Noseong Park, V. S. Subrahmanian
Details | PDF

Demo Booths 3

Knowing if/when a cyber-vulnerability will be exploited and how severe the vulnerability is can help enterprise security officers (ESOs) come up with appropriate patching schedules. Today, this ability is severely compromised: our study of data from Mitre and NIST shows that on average there is a 132-day gap between the announcement of a vulnerability by Mitre and the time NIST provides an analysis with severity score estimates and 8 important severity attributes. Many attacks happen during this very 132-day window. We present Vulnerability Exploit Scoring & Timing (VEST), a system for (early) prediction and visualization of if/when a vulnerability will be exploited, and its estimated severity attributes and score.
#11020

Crowd View: Converting Investors' Opinions into Indicators
Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen
Details | PDF

Demo Booths 3

This paper demonstrates an opinion indicator (OI) generation system, named Crowd View, with which traders can refer to the fine-grained opinions, beyond the market sentiment (bullish/bearish), from crowd investors when trading financial instruments. We collect the real-time textual information from Twitter, and convert it into five kinds of OIs, including the support price level, resistance price level, price target, buy-side cost, and sell-side cost. The OIs for all component stocks in Dow Jones Industrial Average Index (DJI) are provided, and shown with the real-time stock price for comparison and analysis. The information embedding in the OIs and the application scenarios are introduced.
#11030

ERICA and WikiTalk
Divesh Lala, Graham Wilcock, Kristiina Jokinen, Tatsuya Kawahara
Details | PDF

Demo Booths 3

The demo shows ERICA, a highly realistic female android robot, and WikiTalk, an application that helps robots to talk about thousands of topics using information from Wikipedia. The combination of ERICA and WikiTalk results in more natural and engaging human-robot conversations.
#11026

An Online Intelligent Visual Interaction System
Anxiang Zeng, Han Yu, Xin Gao, Kairi Ou, Zhenchuan Huang, Peng Hou, Mingli Song, Jingshu Zhang, Chunyan Miao
Details | PDF

Demo Booths 3

This paper proposes an Online Intelligent Visual Interactive System (OIVIS), which can be applied to various live video broadcast and short video scenes to provide an interactive user experience. In the live video broadcast, the anchor can issue various commands by using pre-defined gestures, and can trigger real-time background replacement to create an immersive atmosphere. To support such dynamic interactivity, we implemented algorithms including real-time gesture recognition and real-time video portrait segmentation, developed a deep network inference framework, and a real-time rendering framework AI Gender at the front end to create a complete set of visual interaction solutions for use in resource constrained mobile.
#11044

DISPUTool -- A tool for the Argumentative Analysis of Political Debates
Shohreh Haddadan, Elena Cabrio, Serena Villata
Details | PDF

Demo Booths 3

Political debates are the means used by political candidates to put forward and justify their positions in front of the electors with respect to the issues at stake. Argument mining is a novel research area in Artificial Intelligence, aiming at analyzing discourse on the pragmatics level and applying a certain argumentation theory to model and automatically analyze textual data. In this paper, we present DISPUTool, a tool designed to ease the work of historians and social science scholars in analyzing the argumentative content of political speeches. More precisely, DISPUTool allows users to explore and automatically identify argumentative components over the 39 political debates from the last 50 years of US presidential campaigns (1960-2016).
#11031

ACTA A Tool for Argumentative Clinical Trial Analysis
Tobias Mayer, Elena Cabrio, Serena Villata
Details | PDF

Demo Booths 3

Argumentative analysis of textual documents of various kinds (e.g., persuasive essays, online discussion blogs, scientific articles) makes it possible to detect the main argumentative components (i.e., premises and claims) present in the text and to predict whether these components are connected to each other by argumentative relations (e.g., support and attack), leading to the identification of (possibly complex) argumentative structures. Given the importance of argument-based decision making in medicine, in this demo paper we introduce ACTA, a tool for automating the argumentative analysis of clinical trials. The tool is designed to support doctors and clinicians in identifying the document(s) of interest about a certain disease, and in analyzing the main argumentative content and PICO elements.
#11048

Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination
Ruixue Liu, Baoyang Chen, Meng Chen, Youzheng Wu, Zhijie Qiu, Xiaodong He
Details | PDF

Demo Booths 3

We present a novel real-time, collaborative, and interactive AI painting system, Mappa Mundi, for artistic Mind Map creation. The system consists of a voice-based input interface, an automatic topic expansion module, and an image projection module. The key innovation is to inject Artificial Imagination into painting creation by considering lexical and phonological similarities of language, learning and inheriting the artist’s original painting style, and applying the principles of Dadaism and impossibility of improvisation. Our system indicates that AI and artists can collaborate seamlessly to create imaginative artistic paintings, and Mappa Mundi has been applied in an art exhibition at UCCA, Beijing.
#11027

The Open Vault Challenge - Learning How to Build Calibration-Free Interactive Systems by Cracking the Code of a Vault
Jonathan Grizou
Details | PDF

Demo Booths 3

This demo takes the form of a challenge to the IJCAI community. A physical vault, secured by a 4-digit code, will be placed in the demo area. The author will publicly open the vault by entering the code on a touch-based interface, and as many times as requested. The challenge to the IJCAI participants will be to crack the code, open the vault, and collect its content. The interface is based on previous work on calibration-free interactive systems that enables a user to start instructing a machine without the machine knowing how to interpret the user’s actions beforehand. The intent and the behavior of the human are simultaneously learned by the machine. An online demo and videos are available for readers to participate in the challenge. An additional interface using vocal commands will be revealed on the demo day, demonstrating the scalability of our approach to continuous input signals.
#11053

GraspSnooker: Automatic Chinese Commentary Generation for Snooker Videos
Zhaoyue Sun, Jiaze Chen, Hao Zhou, Deyu Zhou, Lei Li, Mingmin Jiang
Details | PDF

Demo Booths 3

We demonstrate a web-based software system, GraspSnooker, which is able to automatically generate Chinese text commentaries for snooker game videos. It consists of a video analyzer, a strategy predictor and a commentary generator. As far as we know, it is the first attempt on snooker commentary generation, which might be helpful for snooker learners to understand the game.
#11042

SAGE: A Hybrid Geopolitical Event Forecasting System
Fred Morstatter, Aram Galstyan, Gleb Satyukov, Daniel Benjamin, Andres Abeliuk, Mehrnoosh Mirtaheri, KSM Tozammel Hossain, Pedro Szekely, Emilio Ferrara, Akira Matsui, Mark Steyvers, Stephen Bennet, David Budescu, Mark Himmelstein, Michael Ward, Andreas Beger, Michele Catasta, Rok Sosic, Jure Leskovec, Pavel Atanasov, Regina Joseph, Rajiv Sethi, Ali Abbas
Details | PDF

Demo Booths 3

Forecasting of geopolitical events is a notoriously difficult task, with experts failing to significantly outperform a random baseline across many types of forecasting events. One successful way to increase the performance of forecasting tasks is to turn to crowdsourcing: leveraging many forecasts from non-expert users. Simultaneously, advances in machine learning have led to models that can produce reasonable, although not perfect, forecasts for many tasks. Recent efforts have shown that forecasts can be further improved by "hybridizing" human forecasters: pairing them with the machine models in an effort to combine the unique advantages of both. In this demonstration, we present Synergistic Anticipation of Geopolitical Events (SAGE), a platform for human/computer interaction that facilitates human reasoning with machine models.

Thursday 15 11:00 - 12:00 Industry Days (D-I)

Chair: Richard Tong (Squirrel AI Learning)

Large Scale End-to-End Deep Learning using PaddlePaddle - Baidu's Practice
Yanjun Ma, Director of Deep Learning Platform, Baidu

Industry Days

Thursday 15 11:00 - 12:30 Panel (K)

Chair: Carles Sierra

Highly refereed AI conferences

Panel

Thursday 15 11:00 - 12:30 AI-HWB - ST: AI for Improving Human Well-Being 6 (J)

Chair: Odd Erik Gundersen

#3353

Three-quarter Sibling Regression for Denoising Observational Data
Shiv Shankar, Daniel Sheldon, Tao Sun, John Pickering, Thomas G. Dietterich
Details | PDF

ST: AI for Improving Human Well-Being 6

Many ecological studies and conservation policies are based on field observations of species, which can be affected by systematic variability introduced by the observation process. A recently introduced causal modeling technique called 'half-sibling regression' can detect and correct for systematic errors in measurements of multiple independent random variables. However, it will remove intrinsic variability if the variables are dependent, and therefore does not apply to many situations, including modeling of species counts that are controlled by common causes. We present a technique called 'three-quarter sibling regression' to partially overcome this limitation. It can filter the effect of systematic noise when the latent variables have observed common causes. We provide theoretical justification of this approach, demonstrate its effectiveness on synthetic data, and show that it reduces systematic detection variability due to moon brightness in moth surveys.
#2908

CounterFactual Regression with Importance Sampling Weights
Negar Hassanpour, Russell Greiner
Details | PDF

ST: AI for Improving Human Well-Being 6

Perhaps the most pressing concern of a patient diagnosed with cancer is her life expectancy under various treatment options. For a binary-treatment case, this translates into estimating the difference between the outcomes (e.g., survival time) of the two available treatment options – i.e., her Individual Treatment Effect (ITE). This is especially challenging to estimate from observational data, as that data has selection bias: the treatment assigned to a patient depends on that patient's attributes. In this work, we borrow ideas from domain adaptation to address the distributional shift between the source (outcome of the administered treatment, appearing in the observed training data) and target (outcome of the alternative treatment) that exists due to selection bias. We propose a context-aware importance sampling re-weighting scheme, built on top of a representation learning module, for estimating ITEs. Empirical results on two publicly available benchmarks demonstrate that the proposed method significantly outperforms the state of the art.
#223

Risk Assessment for Networked-guarantee Loans Using High-order Graph Attention Representation
Dawei Cheng, Yi Tu, Zhenwei Ma, Zhibin Niu, Liqing Zhang
Details | PDF

ST: AI for Improving Human Well-Being 6

Assessing and predicting the default risk of networked-guarantee loans is critical for commercial banks and financial regulatory authorities. The guarantee relationships between the loan companies are usually modeled as directed networks. Learning an informative low-dimensional representation of the networks is important for the default risk prediction of loan companies, and even for the assessment of systematic financial risk level. In this paper, we propose a high-order graph attention representation method (HGAR) to learn the embedding of guarantee networks. Because this financial network is different from other complex networks, such as social, language, or citation networks, we set the binary roles of vertices and define high-order adjacent measures based on financial domain characteristics. We design objective functions in addition to a graph attention layer to capture the importance of nodes. We implement a productive learning strategy and prove that the complexity is near-linear with the number of edges, which could scale to large datasets. Extensive experiments demonstrate the superiority of our model over state-of-the-art methods. We also evaluate the model in a real-world loan risk control system, and the results validate the effectiveness of our proposed approaches.
#295

Scribble-to-Painting Transformation with Multi-Task Generative Adversarial Networks
Jinning Li, Yexiang Xue
Details | PDF

ST: AI for Improving Human Well-Being 6

We propose the Dual Scribble-to-Painting Network (DSP-Net), which is able to produce artistic paintings based on user-generated scribbles. In scribble-to-painting transformation, a neural net has to infer additional details of the image, given relatively sparse information contained in the outlines of the scribble. Therefore, it is more challenging than classical image style transfer, in which the information content is reduced from photos to paintings. Inspired by the human cognitive process, we propose a multi-task generative adversarial network, which consists of two jointly trained neural nets -- one for generating artistic images and the other one for semantic segmentation. We demonstrate that joint training on these two tasks brings additional benefit. Experimental results show that DSP-Net outperforms state-of-the-art models both visually and quantitatively. In addition, we publish a large dataset for scribble-to-painting transformation.
#2506

MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography Signals
Shenda Hong, Cao Xiao, Tengfei Ma, Hongyan Li, Jimeng Sun
Details | PDF

ST: AI for Improving Human Well-Being 6

Electrocardiography (ECG) signals are commonly used to diagnose various cardiac abnormalities. Recently, deep learning models have shown initial success in modeling ECG data; however, they are mostly black-box and thus lack the interpretability needed for clinical usage. In this work, we propose MultIlevel kNowledge-guided Attention networks (MINA) that predict heart diseases from ECG signals with intuitive explanation aligned with medical knowledge. By extracting multilevel (beat-, rhythm- and frequency-level) domain knowledge features separately, MINA combines the medical knowledge and ECG data via a multilevel attention model, making the learned models highly interpretable. Our experiments showed MINA achieved PR-AUC 0.9436 (outperforming the best baseline by 5.51%) on a real-world ECG dataset. Finally, MINA also demonstrated robust performance and strong interpretability against signal distortion and noise contamination.
#3057

Learning Interpretable Relational Structures of Hinge-loss Markov Random Fields
Yue Zhang, Arti Ramesh
Details | PDF

ST: AI for Improving Human Well-Being 6

Statistical relational models such as Markov logic networks (MLNs) and hinge-loss Markov random fields (HL-MRFs) are specified using templated weighted first-order logic clauses, leading to the creation of complex, yet easy to encode models that effectively combine uncertainty and logic. Learning the structure of these models from data reduces the human effort of identifying the right structures. In this work, we present an asynchronous deep reinforcement learning algorithm to automatically learn HL-MRF clause structures. Our algorithm possesses the ability to learn semantically meaningful structures that appeal to human intuition and understanding, while simultaneously being able to learn structures from data, thus learning structures that have both the desirable qualities of interpretability and good prediction performance. The asynchronous nature of our algorithm further provides the ability to learn diverse structures via exploration, while remaining scalable. We demonstrate the ability of the models to learn semantically meaningful structures that also achieve better prediction performance when compared with a greedy search algorithm, a path-based algorithm, and manually defined clauses on two computational social science applications: i) modeling recovery in alcohol use disorder, and ii) detecting bullying.

Thursday 15 11:00 - 12:30 ML|DL - Deep Learning 6 (L)

Chair: Yiwen Guo

#665

Nostalgic Adam: Weighting More of the Past Gradients When Designing the Adaptive Learning Rate
Haiwen Huang, Chang Wang, Bin Dong
Details | PDF

Deep Learning 6

First-order optimization algorithms have been proven prominent in deep learning. In particular, algorithms such as RMSProp and Adam are extremely popular. However, recent works have pointed out the lack of “long-term memory” in Adam-like algorithms, which could hamper their performance and lead to divergence. In our study, we observe that there are benefits of weighting more of the past gradients when designing the adaptive learning rate. We therefore propose an algorithm called the Nostalgic Adam (NosAdam) with theoretically guaranteed convergence at the best known convergence rate. NosAdam can be regarded as a fix to the non-convergence issue of Adam, as an alternative to the recent work of [Reddi et al., 2018]. Our preliminary numerical experiments show that NosAdam is a promising alternative algorithm to Adam. The proofs, code and other supplementary materials are already released.
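As a rough illustration of what "weighting more of the past gradients" can mean for the second-moment estimate behind an adaptive learning rate, the sketch below keeps a weighted average of squared gradients in which older steps receive larger weights. The weight sequence b_k = k^(-gamma), the function name, and the scalar-gradient setting are assumptions made for this example; the abstract does not specify them, and this is not presented as the authors' exact algorithm.

```python
def nostalgic_second_moment(grads, gamma=0.1):
    """Weighted average of squared gradients with decreasing weights b_k = k**(-gamma).

    Hypothetical helper for illustration only. With gamma = 0 every step gets
    equal weight (a plain running mean); with gamma > 0 earlier gradients
    weigh more than later ones, unlike an exponential moving average.
    """
    v = 0.0
    weight_sum = 0.0  # B_{t-1}: sum of weights seen so far
    for step, g in enumerate(grads, start=1):
        b = step ** (-gamma)            # weight for the current gradient
        new_sum = weight_sum + b        # B_t
        # v_t = (B_{t-1} / B_t) * v_{t-1} + (b_t / B_t) * g_t^2
        v = (weight_sum / new_sum) * v + (b / new_sum) * g * g
        weight_sum = new_sum
    return v
```

An optimizer built on such an estimate would divide its step by the square root of this value, in the same place where Adam uses its exponential moving average.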
#1899

Coarse-to-Fine Image Inpainting via Region-wise Convolutions and Non-Local Correlation
Yuqing Ma, Xianglong Liu, Shihao Bai, Lei Wang, Dailan He, Aishan Liu
Details | PDF

Deep Learning 6

Recently, deep neural networks have achieved promising performance for filling large missing regions in image inpainting tasks. They usually adopt the standard convolutional architecture over the corrupted image, where the same convolution filters try to restore the diverse information on both existing and missing regions, while ignoring the long-distance correlation among the regions. Only relying on the surrounding areas inevitably leads to meaningless contents and artifacts, such as color discrepancy and blur. To address these problems, we first propose region-wise convolutions to locally deal with the different types of regions, which can help exactly reconstruct existing regions and roughly infer the missing ones from existing regions at the same time. Then, a non-local operation is introduced to globally model the correlation among different regions, promising visual consistency between missing and existing regions. Finally, we integrate the region-wise convolutions and non-local correlation in a coarse-to-fine framework to restore semantically reasonable and visually realistic images. Extensive experiments on three widely-used datasets for image inpainting tasks have been conducted, and both qualitative and quantitative experimental results demonstrate that the proposed model significantly outperforms the state-of-the-art approaches, especially for the large irregular missing regions.
#2830

Localizing Unseen Activities in Video via Image Query
Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai
Details | PDF

Deep Learning 6

Action localization in untrimmed videos is an important topic in the field of video understanding. However, existing action localization methods are restricted to a pre-defined set of actions and cannot localize unseen activities. Thus, we consider a new task to localize unseen activities in videos via image queries, named Image-Based Activity Localization. This task faces three inherent challenges: (1) how to eliminate the influence of semantically inessential contents in image queries; (2) how to deal with the fuzzy localization of inaccurate image queries; (3) how to determine the precise boundaries of target segments. We then propose a novel self-attention interaction localizer to retrieve unseen activities in an end-to-end fashion. Specifically, we first devise a region self-attention method with relative position encoding to learn fine-grained image region representations. Then, we employ a local transformer encoder to build multi-step fusion and reasoning of image and video contents. We next adopt an order-sensitive localizer to directly retrieve the target segment. Furthermore, we construct a new dataset ActivityIBAL by reorganizing the ActivityNet dataset. The extensive experiments show the effectiveness of our method.
#3365

Light-Weight Hybrid Convolutional Network for Liver Tumor Segmentation
Jianpeng Zhang, Yutong Xie, Pingping Zhang, Hao Chen, Yong Xia, Chunhua Shen
Details | PDF

Deep Learning 6

Automated segmentation of liver tumors in contrast-enhanced abdominal computed tomography (CT) scans is essential in assisting medical professionals to evaluate tumor development and make fast therapeutic schedules. Although deep convolutional neural networks (DCNNs) have contributed many breakthroughs in image segmentation, this task remains challenging, since 2D DCNNs are incapable of exploring the inter-slice information and 3D DCNNs are too complex to be trained with the available small dataset. In this paper, we propose the light-weight hybrid convolutional network (LW-HCN) to segment the liver and its tumors in CT volumes. Instead of combining a 2D and a 3D network for coarse-to-fine segmentation, LW-HCN has an encoder-decoder structure, in which 2D convolutions used at the bottom of the encoder decrease the complexity and 3D convolutions used in other layers explore both spatial and temporal information. To further reduce the complexity, we design the depthwise and spatiotemporal separate (DSTS) factorization for 3D convolutions, which not only reduces parameters dramatically but also improves the performance. We evaluated the proposed LW-HCN model against several recent methods on the LiTS and 3D-IRCADb datasets and achieved, respectively, the Dice per case of 73.0% and 94.1% for tumor segmentation, setting a new state of the art.
#10964

(Sister Conferences Best Papers Track) A Dual Approach to Verify and Train Deep Networks
Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Timothy Mann, Pushmeet Kohli
Details | PDF

Deep Learning 6

This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (e.g., robustness to bounded norm adversarial perturbations). Most previous work on this topic was limited in its applicability by the size of the network, network architecture and the complexity of properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can be obtained. Finally, we highlight how this approach can be used to train models that are amenable to verification.
#10982

(Journal track) Learning in the Machine: Random Backpropagation and the Deep Learning Channel
Pierre Baldi, Peter Sadowski, Zhiqin Lu
Details | PDF

Deep Learning 6

Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the requirement of maintaining symmetric weights in a physical neural system. To better understand RBP, we compare different algorithms in terms of the information available locally to each neuron. In the process, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), and sparse RBP, and study their behavior through simulations. These simulations show that many variants are also robust deep learning algorithms, but that the derivative of the transfer function is important in the learning rule. Finally, we prove several mathematical results including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.

Thursday 15 11:00 - 12:30 ML|RL - Reinforcement Learning 5 (2701-2702)

Chair: Balaraman Ravindran

#3006

Using Natural Language for Reward Shaping in Reinforcement Learning
Prasoon Goyal, Scott Niekum, Raymond J. Mooney
Details | PDF

Reinforcement Learning 5

Recent reinforcement learning (RL) approaches have shown strong performance in complex domains, such as Atari games, but are highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent intermediate rewards for progress towards the goal. Designing such rewards remains a challenge, though. In this work, we use natural language instructions to perform reward shaping. We propose a framework that maps free-form natural language instructions to intermediate rewards, which can be seamlessly integrated into any standard reinforcement learning algorithm. We experiment with Montezuma's Revenge from the Atari video games domain, a popular benchmark in RL. Our experiments on a diverse set of 15 tasks demonstrate that for the same number of interactions with the environment, using language-based rewards can successfully complete the task 60% more often, averaged across all tasks, compared to learning without language.
#5233

Monte Carlo Tree Search for Policy Optimization
Xiaobai Ma, Katherine Driggs-Campbell, Zongzhang Zhang, Mykel J. Kochenderfer
Details | PDF

Reinforcement Learning 5

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
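The upper confidence bound heuristic mentioned above can be sketched with the standard UCB1 formula, which adds an exploration bonus to each child's mean return. This is a generic tree-search selection rule under assumed names (`ucb1_score`, `select_child`) and an assumed exploration constant; it is not taken from the MCTSPO implementation.

```python
import math


def ucb1_score(mean_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1: mean value plus an exploration bonus that shrinks as visits grow."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children):
    """Pick the child index with the highest UCB1 score.

    children: list of (total_return, visits) pairs (hypothetical layout).
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [
        ucb1_score(total / visits if visits else 0.0, visits, parent_visits)
        for total, visits in children
    ]
    return max(range(len(children)), key=scores.__getitem__)
```

A larger `c` favors rarely visited children (exploration); a smaller `c` favors children with high average return (exploitation).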
#6078

Hill Climbing on Value Estimates for Search-control in Dyna
Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White
Details | PDF

Reinforcement Learning 5

Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search control, the mechanism to generate the state and action from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions that the agent is likely to visit next. We derive a noisy projected natural gradient algorithm for hill climbing, and highlight a connection to Langevin dynamics. We provide an empirical demonstration on four classical domains that our algorithm, HC Dyna, can obtain significant sample efficiency improvements. We study the properties of different sampling distributions for search control, and find that there appears to be a benefit specifically from using the samples generated by climbing on current value estimates from low-value to high-value regions.
#6239

Recurrent Existence Determination Through Policy Optimization
Baoxiang Wang
Details | PDF

Reinforcement Learning 5

Binary determination of the presence of objects is one of the problems where humans perform extraordinarily better than computer vision systems, in terms of both speed and precision. One of the possible reasons is that humans can skip most of the clutter and attend only to salient regions. Recurrent attention models (RAM) are the first computational models to imitate the way humans process images via the REINFORCE algorithm. Although RAM was originally designed for image recognition, we extend it and present recurrent existence determination, an attention-based mechanism to solve the existence determination problem. Our algorithm employs a novel $k$-maximum aggregation layer and a new reward mechanism to address the issue of delayed rewards, which would have caused the instability of the training process. The experimental analysis demonstrates significant efficiency and accuracy improvement over existing approaches, on both synthetic and real-world datasets.
#6265

Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
Zhou Fan, Rui Su, Weinan Zhang, Yong Yu
Details | PDF

Reinforcement Learning 5

In this paper we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks to decompose the structured action space into simpler action spaces along with a critic network to guide the training of all sub-actor networks. While this paper is mainly focused on parameterized action space, the proposed architecture, which we call hybrid actor-critic, can be extended to more general action spaces that have a hierarchical structure. We present an instance of the hybrid actor-critic architecture based on proximal policy optimization (PPO), which we refer to as hybrid proximal policy optimization (H-PPO). Our experiments test H-PPO on a collection of tasks with parameterized action space, where H-PPO demonstrates superior performance over previous methods of parameterized action reinforcement learning.
#763

Transfer of Temporal Logic Formulas in Reinforcement Learning
Zhe Xu, Ufuk Topcu
Details | PDF

Reinforcement Learning 5

Transferring high-level knowledge from a source task to a target task is an effective way to expedite reinforcement learning (RL). For example, propositional logic and first-order logic have been used as representations of such knowledge. We study the transfer of knowledge between tasks in which the timing of the events matters. We call such tasks temporal tasks. We concretize similarity between temporal tasks through a notion of logical transferability, and develop a transfer learning approach between different yet similar temporal tasks. We first propose an inference technique to extract metric interval temporal logic (MITL) formulas in sequential disjunctive normal form from labeled trajectories collected in RL of the two tasks. If logical transferability is identified through this inference, we construct a timed automaton for each sequential conjunctive subformula of the inferred MITL formulas from both tasks. We perform RL on the extended state which includes the locations and clock valuations of the timed automata for the source task. We then establish mappings between the corresponding components (clocks, locations, etc.) of the timed automata from the two tasks, and transfer the extended Q-functions based on the established mappings. Finally, we perform RL on the extended state for the target task, starting with the transferred extended Q-functions. Our implementation results show, depending on how similar the source task and the target task are, that the sampling efficiency for the target task can be improved by up to one order of magnitude by performing RL in the extended state space, and further improved by up to another order of magnitude using the transferred extended Q-functions.

Thursday 15 11:00 - 12:30 AMS|CSC - Computational Social Choice 3 (2703-2704)

Chair: Piotr Faliszewski

#1422

Sybil-Resilient Reality-Aware Social Choice
Gal Shahaf, Ehud Shapiro, Nimrod Talmon
Details | PDF

Computational Social Choice 3

Sybil attacks, in which fake or duplicate identities (a.k.a., Sybils) infiltrate an online community, pose a serious threat to such communities, as they might tilt community-wide decisions in their favor. While the extensive research on sybil identification may help keep the fraction of sybils in such communities low, it cannot however ensure their complete eradication. Thus, our goal here is to enhance social choice theory with effective group decision mechanisms for communities with bounded sybil penetration. Inspired by Reality-Aware Social Choice, we use the status quo as the anchor of Sybil Resilience, characterized by Sybil Safety -- the inability of sybils to change the status quo against the will of the genuine agents, and Sybil Liveness -- the ability of the genuine agents to change the status quo against the will of the sybils. We consider the social choice settings of deciding on a single proposal, on multiple proposals, and on updating a parameter. For each, we present social choice rules that are sybil-safe and, under certain conditions, satisfy sybil-liveness.
#2251

Graphical One-Sided Markets
Sagar Massand, Sunil Simon
Details | PDF

Computational Social Choice 3

We study the problem of allocating indivisible objects to a set of rational agents where each agent's final utility depends on the intrinsic valuation of the allocated item as well as the allocation within the agent's local neighbourhood. We specify agents' local neighbourhood in terms of a weighted graph. This extends the model of one-sided markets to incorporate neighbourhood externalities. We consider the solution concept of stability and show that, unlike in the case of one-sided markets, stable allocations may not always exist. When the underlying local neighbourhood graph is symmetric, a 2-stable allocation is guaranteed to exist and any decentralised mechanism where pairs of rational players agree to exchange objects terminates in such an allocation. We show that computing a 2-stable allocation is PLS-complete and further identify subclasses which are tractable. In the case of asymmetric neighbourhood structures, we show that it is NP-complete to check if a 2-stable allocation exists. We then identify structural restrictions where stable allocations always exist and can be computed efficiently. Finally, we study the notion of envy-freeness in this framework.
#2779

Maximin-Aware Allocations of Indivisible Goods
Hau Chan, Jing Chen, Bo Li, Xiaowei Wu
Details | PDF

Computational Social Choice 3

We study envy-free allocations of indivisible goods to agents in settings where each agent is unaware of the goods allocated to other agents. In particular, we propose the maximin aware (MMA) fairness measure, which guarantees that every agent, given the bundle allocated to her, is aware that she does not envy at least one other agent, even if she does not know how the other goods are distributed among other agents. We also introduce two of its relaxations, and discuss their egalitarian guarantee and existence. Finally, we present a polynomial-time algorithm, which computes an allocation that approximately satisfies MMA or its relaxations. Interestingly, the returned allocation is also 1/2-approximate EFX when all agents have subadditive valuations, which improves the algorithm in [Plaut and Roughgarden, 2018].
#4626

On the Tree Representations of Dichotomous Preferences
Yongjie Yang
Details | PDF

Computational Social Choice 3

We study numerous restricted domains of dichotomous preferences with respect to some tree structures. Particularly, we study the relationships among these domains and the ones proposed by Elkind and Lackner [2015]. We also show that recognizing all the restricted domains proposed in this paper is polynomial-time solvable. Finally, we explore the complexity of winner determination for several important approval-based multiwinner voting rules when restricted to these domains.
#2331

Exploiting Social Influence to Control Elections Based on Scoring Rules
Federico Corò, Emilio Cruciani, Gianlorenzo D'Angelo, Stefano Ponziani
Details | PDF

Computational Social Choice 3

We consider the election control problem in social networks, which consists in exploiting social influence in a network of voters to change their opinion about a target candidate with the aim of increasing his chances to win (constructive control) or lose (destructive control) the election. Previous works on this problem focus on plurality voting systems and on an influence model in which the opinion of the voters about the target candidate can only change by shifting its ranking by one position, regardless of the amount of influence that a voter receives. We introduce Linear Threshold Ranking, a natural extension of Linear Threshold Model, which models the change of opinions taking into account the amount of exercised influence. In this general model, we are able to approximate the maximum score that a target candidate can achieve up to a factor of 1-1/e by showing submodularity of the objective function. We exploit this result to provide a 1/3(1-1/e)-approximation algorithm for the constructive election control problem and a 1/2(1-1/e)-approximation ratio in the destructive scenario. The algorithm can be used in arbitrary scoring rule voting systems, including plurality rule and Borda count.
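For readers unfamiliar with scoring rules, the sketch below shows how plurality and Borda count are both instances of one positional scheme: each ballot position carries a weight, and a candidate's score is the sum of weights over all ballots. The function names and the ballot encoding are assumptions for this illustration and do not come from the paper.

```python
from collections import Counter


def scores_under_rule(ballots, weights):
    """Sum positional weights per candidate.

    ballots: list of rankings, best candidate first.
    weights: weights[i] is the score awarded for ballot position i.
    """
    scores = Counter()
    for ranking in ballots:
        for position, candidate in enumerate(ranking):
            scores[candidate] += weights[position]
    return dict(scores)


def plurality_weights(num_candidates):
    """Only the top-ranked candidate on each ballot scores a point."""
    return [1] + [0] * (num_candidates - 1)


def borda_weights(num_candidates):
    """Position i earns (num_candidates - 1 - i) points."""
    return list(range(num_candidates - 1, -1, -1))


# Three voters rank a > b > c, two voters rank b > c > a.
ballots = [["a", "b", "c"]] * 3 + [["b", "c", "a"]] * 2
plurality = scores_under_rule(ballots, plurality_weights(3))
borda = scores_under_rule(ballots, borda_weights(3))
```

In this small profile candidate a wins under plurality while candidate b wins under Borda count, which is why a control strategy has to account for the specific scoring rule in use.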
#5559

Fairness Towards Groups of Agents in the Allocation of Indivisible Items
Nawal Benabbou, Mithun Chakraborty, Edith Elkind, Yair Zick
Details | PDF

Computational Social Choice 3

In this paper, we study the problem of matching a set of items to a set of agents partitioned into types so as to balance fairness towards the types against overall utility/efficiency. We extend multiple desirable properties of indivisible goods allocation to our model and investigate the possibility and hardness of achieving combinations of these properties, e.g. we prove that maximizing utilitarian social welfare under constraints of typewise envy-freeness up to one item (TEF1) is computationally intractable. We also define a new concept of waste for this setting, show experimentally that augmenting an existing algorithm with a marginal utility maximization heuristic can produce a TEF1 solution with reduced waste, and also provide a polynomial-time algorithm for computing a non-wasteful TEF1 allocation for binary agent-item utilities.

Thursday 15 11:00 - 12:30 MTA|SP - Security and Privacy 2 (2705-2706)

Chair: Venkatramanan Siva Subrahmanian

#162

Data Poisoning Attack against Knowledge Graph Embedding
Hengtong Zhang, Tianhang Zheng, Jing Gao, Chenglin Miao, Lu Su, Yaliang Li, Kui Ren
Details | PDF

Security and Privacy 2

Knowledge graph embedding (KGE) is a technique for learning continuous embeddings for entities and relations in the knowledge graph. Due to its benefit to a variety of downstream tasks such as knowledge graph completion, question answering and recommendation, KGE has gained significant attention recently. Despite its effectiveness in a benign environment, KGE's robustness to adversarial attacks is not well-studied. Existing attack methods on graph data cannot be directly applied to attack the embeddings of knowledge graph due to its heterogeneity. To fill this gap, we propose a collection of data poisoning attack strategies, which can effectively manipulate the plausibility of arbitrary targeted facts in a knowledge graph by adding or deleting facts on the graph. The effectiveness and efficiency of the proposed attack strategies are verified by extensive evaluations on two widely-used benchmarks.
#361

Heterogeneous Gaussian Mechanism: Preserving Differential Privacy in Deep Learning with Provable Robustness
NhatHai Phan, Minh N. Vu, Yang Liu, Ruoming Jin, Dejing Dou, Xintao Wu, My T. Thai
Details | PDF

Security and Privacy 2

In this paper, we propose a novel Heterogeneous Gaussian Mechanism (HGM) to preserve differential privacy in deep neural networks, with provable robustness against adversarial examples. We first relax the constraint of the privacy budget in the traditional Gaussian Mechanism from (0, 1] to (0, ∞), with a new bound of the noise scale to preserve differential privacy. The noise in our mechanism can be arbitrarily redistributed, offering a distinctive ability to address the trade-off between model utility and privacy loss. To derive provable robustness, our HGM is applied to inject Gaussian noise into the first hidden layer. Then, a tighter robustness bound is proposed. Theoretical analysis and thorough evaluations show that our mechanism notably improves the robustness of differentially private deep neural networks, compared with baseline approaches, under a variety of model attacks.
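For context, the traditional Gaussian Mechanism that the abstract starts from can be sketched as follows, using the textbook noise calibration sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon. The code restricts epsilon to the (0, 1] range the abstract attributes to the traditional mechanism. It does not implement HGM or its new noise bound, and the function names are assumptions for this example.

```python
import math
import random


def classical_gaussian_sigma(epsilon, delta, sensitivity):
    """Textbook noise scale for the (epsilon, delta) Gaussian mechanism."""
    if not 0 < epsilon <= 1:
        raise ValueError("the classical calibration assumes a small privacy budget")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon


def gaussian_mechanism(value, epsilon, delta, sensitivity, rng=random):
    """Release value plus zero-mean Gaussian noise at the classical scale."""
    sigma = classical_gaussian_sigma(epsilon, delta, sensitivity)
    return value + rng.gauss(0.0, sigma)
```

Halving epsilon doubles the noise scale, which is the utility versus privacy trade-off the abstract refers to.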
#3785

Model-Agnostic Adversarial Detection by Random Perturbations
Bo Huang, Yi Wang, Wei Wang
Details | PDF

Security and Privacy 2

Adversarial examples induce model classification errors on purpose, which has raised concerns on the security aspect of machine learning techniques. Many existing countermeasures are compromised by adaptive adversaries and transferred examples. We propose a model-agnostic approach to resolve the problem by analysing the model responses to an input under random perturbations, and study the robustness of detecting norm-bounded adversarial distortions in a theoretical framework. Extensive evaluations are performed on the MNIST, CIFAR-10 and ImageNet datasets. The results demonstrate that our detection method is effective and resilient against various attacks including black-box attacks and the powerful CW attack with four adversarial adaptations.
#5108

Real-Time Adversarial Attacks
Yuan Gong, Boyang Li, Christian Poellabauer, Yiyu Shi
Details | PDF

Security and Privacy 2

In recent years, many efforts have demonstrated that modern machine learning algorithms are vulnerable to adversarial attacks, where small, but carefully crafted, perturbations on the input can make them fail. While these attack methods are very effective, they only focus on scenarios where the target model takes static input, i.e., an attacker can observe the entire original sample and then add a perturbation at any point of the sample. These attack approaches are not applicable to situations where the target model takes streaming input, i.e., an attacker is only able to observe past data points and add perturbations to the remaining (unobserved) data points of the input. In this paper, we propose a real-time adversarial attack scheme for machine learning models with streaming inputs.
#965

Privacy-Preserving Obfuscation of Critical Infrastructure Networks
Ferdinando Fioretto, Terrence W.K. Mak, Pascal Van Hentenryck
Details | PDF

Security and Privacy 2

The paper studies how to release data about a critical infrastructure network (e.g., a power network or a transportation network) without disclosing sensitive information that can be exploited by malevolent agents, while preserving the realism of the network. It proposes a novel obfuscation mechanism that combines several privacy-preserving building blocks with a bi-level optimization model to significantly improve accuracy. The obfuscation is evaluated for both realism and privacy properties on real energy and transportation networks. Experimental results show the obfuscation mechanism substantially reduces the potential damage of an attack exploiting the released data to harm the real network.
#821

Robust Audio Adversarial Example for a Physical Attack
Hiromu Yakura, Jun Sakuma
Details | PDF

Security and Privacy 2

We propose a method to generate audio adversarial examples that can attack a state-of-the-art speech recognition model in the physical world. Previous work assumes that generated adversarial examples are directly fed to the recognition model, and is not able to perform such a physical attack because of reverberation and noise from playback environments. In contrast, our method obtains robust adversarial examples by simulating transformations caused by playback or recording in the physical world and incorporating the transformations into the generation process. Evaluation and a listening experiment demonstrated that our adversarial examples are able to attack without being noticed by humans. This result suggests that audio adversarial examples generated by the proposed method may become a real threat.

Thursday 15 11:00 - 12:30 ML|UL - Unsupervised Learning 1 (2601-2602)

Chair: Xiao Wang

#245

Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity
Yunsheng Bai, Hao Ding, Yang Qiao, Agustin Marinovic, Ken Gu, Ting Chen, Yizhou Sun, Wei Wang
Details | PDF

Unsupervised Learning 1

We introduce a novel approach to graph-level representation learning, which is to embed an entire graph into a vector space where the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGraphEmb, is a general framework that provides a novel means of performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered as a function that receives any graph as input, either seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism called Multi-Scale Node Attention (MSNA) is proposed. Experiments on five real graph datasets show that UGraphEmb achieves competitive accuracy in the tasks of graph classification, similarity ranking, and graph visualization.
#2485

SPINE: Structural Identity Preserved Inductive Network Embedding
Junliang Guo, Linli Xu, Jingchang Liu
Details | PDF

Unsupervised Learning 1

Recent advances in the field of network embedding have shown that low-dimensional network representation is playing a critical role in network analysis. Most existing network embedding methods encode the local proximity of a node, such as the first- and second-order proximities. While efficient, these methods fall short of leveraging the global structural information between nodes distant from each other. In addition, most existing methods learn embeddings on one single fixed network, and thus cannot be generalized to unseen nodes or networks without retraining. In this paper we present SPINE, a method that can jointly capture the local proximity and proximities at any distance, while being inductive to efficiently deal with unseen nodes or networks. Extensive experimental results on benchmark datasets demonstrate the superiority of the proposed framework over the state of the art.
#2567

Improving representation learning in autoencoders via multidimensional interpolation and dual regularizations
Sheng Qian, Guanyue Li, Wen-Ming Cao, Cheng Liu, Si Wu, Hau San Wong
Details | PDF

Unsupervised Learning 1

Autoencoders enjoy a remarkable ability to learn data representations. Research on autoencoders shows that the effectiveness of data interpolation can reflect the performance of representation learning. However, existing interpolation methods in autoencoders cannot adequately traverse a possible region between two datapoints on a data manifold, and the distribution of interpolated latent representations is not considered. To address these issues, we aim to fully exert the potential of data interpolation and further improve representation learning in autoencoders. Specifically, we propose multidimensional interpolation to increase the capability of data interpolation by randomly setting interpolation coefficients for each dimension of latent representations. In addition, we regularize autoencoders in both the latent and the data spaces by imposing a prior on latent representations in the Maximum Mean Discrepancy (MMD) framework and encouraging generated datapoints to be realistic in the Generative Adversarial Network (GAN) framework. Experiments show that, compared to representative models, the representations learned by our proposed model perform better on downstream tasks on multiple benchmarks.
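The per-dimension interpolation idea in the abstract above can be illustrated with a minimal sketch. It assumes latent codes are plain lists of floats and that each coefficient is drawn uniformly from [0, 1); the function names and the sampling distribution are illustrative assumptions, not details taken from the paper.

```python
import random

def linear_interpolation(z1, z2, alpha):
    """Standard interpolation: one shared coefficient for every dimension."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(z1, z2)]

def multidimensional_interpolation(z1, z2, rng):
    """Draw an independent coefficient for each latent dimension (assumed uniform)."""
    alphas = [rng.random() for _ in z1]
    return [al * a + (1.0 - al) * b for al, a, b in zip(alphas, z1, z2)]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
z1 = [0.0, 0.0, 0.0]            # hypothetical latent code of datapoint 1
z2 = [1.0, 2.0, 4.0]            # hypothetical latent code of datapoint 2

z_line = linear_interpolation(z1, z2, 0.5)
z_multi = multidimensional_interpolation(z1, z2, rng)
```

With a shared coefficient, the result always lies on the straight segment between the two codes. With independent coefficients, it can land anywhere in the axis-aligned box spanned by them, which is one way to read "traversing a possible region between two datapoints".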
#3490

Learning Unsupervised Visual Grounding Through Semantic Self-Supervision
Syed Ashar Javed, Shreyas Saxena, Vineet Gandhi
Details | PDF

Unsupervised Learning 1

Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, lack of supervisory signals exacerbates this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding which uses concept learning as a proxy task to obtain self-supervision. The intuition behind this idea is to encourage the model to localize to regions which can explain some semantic property in the data, in our case, the property being the presence of a concept in a set of images. We present thorough quantitative and qualitative experiments to demonstrate the efficacy of our approach and show a 5.6% improvement over the current state of the art on the Visual Genome dataset, a 5.8% improvement on the ReferItGame dataset, and performance comparable to the state of the art on the Flickr30k dataset.
#5604

Attributed Subspace Clustering
Jing Wang, Linchuan Xu, Feng Tian, Atsushi Suzuki, Changqing Zhang, Kenji Yamanishi
Details | PDF

Unsupervised Learning 1

Existing methods for representation-based subspace clustering mainly treat all features of data as a whole to learn a single self-representation and get one clustering solution. Real data, however, are often complex and consist of multiple attributes or sub-features; for example, a face image has an expression and a gender. Each attribute is distinct and complementary in depicting the data. Failing to explore attributes and capture the complementary information among them may lead to an inaccurate representation. Moreover, a single clustering solution is rather limited in depicting data, which can often be interpreted from different aspects and grouped into multiple clusters according to attributes. Therefore, we propose an innovative model called attributed subspace clustering (ASC). It simultaneously learns multiple self-representations on latent representations derived from original data. By utilizing the Hilbert-Schmidt Independence Criterion as a co-regularizing term, ASC enforces that each self-representation is independent and corresponds to a specific attribute. A more comprehensive self-representation is then established by adding these self-representations. Experiments on several benchmark image datasets have demonstrated the effectiveness of ASC not only in terms of clustering accuracy achieved by the integrated representation, but also the diverse interpretation of data, which is beyond what current approaches can offer.
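For readers unfamiliar with the Hilbert-Schmidt Independence Criterion mentioned above, the following is a minimal sketch of the standard biased empirical estimator, tr(K H L H) / (n - 1)^2, using linear kernels. The kernel choice, the function names and the toy data are assumptions for illustration; how ASC applies the criterion to its self-representations is not specified here.

```python
def gram_linear(rows):
    """Linear-kernel Gram matrix: K[i][j] = <rows[i], rows[j]>."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]

def center(K):
    """Double-center a symmetric Gram matrix, i.e. compute H K H."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + total for j in range(n)]
            for i in range(n)]

def hsic(X, Y):
    """Biased empirical HSIC; equals tr(K H L H) / (n - 1)^2."""
    n = len(X)
    Kc = center(gram_linear(X))
    Lc = center(gram_linear(Y))
    trace = sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

X = [[1.0], [2.0], [3.0], [4.0]]
Y_uncorrelated = [[1.0], [-1.0], [-1.0], [1.0]]   # zero covariance with X

hsic_dependent = hsic(X, X)
hsic_uncorrelated = hsic(X, Y_uncorrelated)
```

With a linear kernel the estimator only detects linear dependence, so it is zero for the uncorrelated pair and positive when a variable is compared with itself. Used as a penalty, a small value pushes two representations towards independence.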
#6178

Learning Strictly Orthogonal p-Order Nonnegative Laplacian Embedding via Smoothed Iterative Reweighted Method
Haoxuan Yang, Kai Liu, Hua Wang, Feiping Nie
Details | PDF

Unsupervised Learning 1

Laplacian Embedding (LE) is a powerful method to reveal the intrinsic geometry of high-dimensional data by using graphs. Imposing the orthogonal and nonnegative constraints onto the LE objective has proved to be effective to avoid degenerate and negative solutions, which, though, are challenging to achieve simultaneously because they are nonlinear and nonconvex. In addition, recent studies have shown that using the p-th order of the L2-norm distances in LE can find the best solution for clustering and promote the robustness of the embedding model against outliers, although this makes the optimization objective nonsmooth and difficult to efficiently solve in general. In this work, we study LE that uses the p-th order of the L2-norm distances and satisfies both orthogonal and nonnegative constraints. We introduce a novel smoothed iterative reweighted method to tackle this challenging optimization problem and rigorously analyze its convergence. We demonstrate the effectiveness and potential of our proposed method by extensive empirical studies on both synthetic and real data sets.

Thursday 15 11:00 - 12:30 ML|LGM2 - Learning Generative Models (2603-2604)

Chair: Zizhao Zhang

#572

Image-to-Image Translation with Multi-Path Consistency Regularization
Jianxin Lin, Yingce Xia, Yijun Wang, Tao Qin, Zhibo Chen
Details | PDF

Learning Generative Models

Image translation across different domains has attracted much attention in both machine learning and computer vision communities. Taking the translation from a source domain to a target domain as an example, existing algorithms mainly rely on two kinds of loss for training: one is the discrimination loss, which is used to differentiate images generated by the models and natural images; the other is the reconstruction loss, which measures the difference between an original image and the reconstructed version. In this work, we introduce a new kind of loss, multi-path consistency loss, which evaluates the differences between direct translation from source domain to target domain and indirect translation from source domain to an auxiliary domain to target domain, to regularize training. For multi-domain translation (at least three domains), which focuses on building translation models between any two domains, at each training iteration we randomly select three domains, set them respectively as the source, auxiliary and target domains, build the multi-path consistency loss and optimize the network. For two-domain translation, we need to introduce an additional auxiliary domain and construct the multi-path consistency loss. We conduct various experiments to demonstrate the effectiveness of our proposed methods, including face-to-face translation, paint-to-photo translation, and de-raining/de-noising translation.
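The multi-path consistency idea described above can be sketched as follows. The sketch assumes images are flat lists of pixel values, translators are plain functions, and the discrepancy is a mean absolute difference; all names and the choice of distance are illustrative assumptions rather than the authors' implementation.

```python
def l1_distance(u, v):
    """Mean absolute difference between two equally sized images."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)

def multi_path_consistency_loss(x, g_source_to_target, g_source_to_aux, g_aux_to_target):
    """Compare the direct path with the path that goes through the auxiliary domain."""
    direct = g_source_to_target(x)
    indirect = g_aux_to_target(g_source_to_aux(x))
    return l1_distance(direct, indirect)

# Hypothetical toy "translators" standing in for trained generators.
def shift_by_3(img):
    return [p + 3.0 for p in img]

def shift_by_1(img):
    return [p + 1.0 for p in img]

def shift_by_2(img):
    return [p + 2.0 for p in img]

image = [0.0, 0.5, 1.0]
consistent_loss = multi_path_consistency_loss(image, shift_by_3, shift_by_1, shift_by_2)
inconsistent_loss = multi_path_consistency_loss(image, shift_by_3, shift_by_1, shift_by_1)
```

The loss is zero when the two-hop path reproduces the direct translation and grows as the paths disagree. In training it would be added to the discrimination and reconstruction losses as a regularizer.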
#614

GAN-EM: GAN Based EM Learning Framework
Wentian Zhao, Shaojie Wang, Zhihuai Xie, Jing Shi, Chenliang Xu
Details | PDF

Learning Generative Models

The expectation maximization (EM) algorithm finds maximum likelihood solutions for models with latent variables. A typical example is the Gaussian Mixture Model (GMM), which requires a Gaussian assumption. However, natural images are highly non-Gaussian, so GMM cannot be applied to perform the image clustering task in pixel space. To overcome this limitation, we propose a GAN based EM learning framework that can maximize the likelihood of images and estimate the latent variables. We call this model GAN-EM, which is a framework for image clustering, semi-supervised classification and dimensionality reduction. In the M-step, we design a novel loss function for the discriminator of the GAN to perform maximum likelihood estimation (MLE) on data with soft class label assignments. Specifically, a conditional generator captures the data distribution for K classes, and a discriminator tells whether a sample is real or fake for each class. Since our model is unsupervised, the class label of real data is regarded as a latent variable, which is estimated by an additional network (E-net) in the E-step. The proposed GAN-EM achieves state-of-the-art clustering and semi-supervised classification results on MNIST, SVHN and CelebA, as well as comparable quality of generated images to other recently developed generative models.
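As background for the E-step and M-step terminology above, here is a minimal sketch of classical EM for a one-dimensional, two-component Gaussian mixture. It shows the GMM baseline that the abstract contrasts against, not GAN-EM itself; the data, initial values and variance floor are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, variances):
    """Soft assignments: posterior probability of each component for each point."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

def m_step(data, resp, min_var=1e-6):
    """Weighted maximum likelihood update of the mixture parameters."""
    n, k = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for c in range(k):
        mass = sum(r[c] for r in resp)
        mean = sum(r[c] * x for r, x in zip(resp, data)) / mass
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, data)) / mass
        weights.append(mass / n)
        means.append(mean)
        variances.append(max(var, min_var))   # floor avoids a degenerate component
    return weights, means, variances

data = [-2.1, -1.9, -2.0, -1.8, -2.2, 1.9, 2.1, 2.0, 2.2, 1.8]
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(20):
    resp = e_step(data, weights, means, variances)
    weights, means, variances = m_step(data, resp)
```

In GAN-EM, as the abstract describes it, the soft assignments come from a network (E-net) and the parameter update is carried out by training a conditional GAN instead of the closed-form formulas used here.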
#1334

IRC-GAN: Introspective Recurrent Convolutional GAN for Text-to-video Generation
Kangle Deng, Tianyi Fei, Xin Huang, Yuxin Peng
Details | PDF

Learning Generative Models

Automatically generating videos according to the given text is a highly challenging task, where visual quality and semantic consistency with captions are two critical issues. In existing methods, when generating a specific frame, the information in the frames generated before it is not fully exploited. In addition, an effective way to measure the semantic accordance between videos and captions remains to be established. To address these issues, we present a novel Introspective Recurrent Convolutional GAN (IRC-GAN) approach. First, we propose a recurrent transconvolutional generator, where LSTM cells are integrated with 2D transconvolutional layers. As 2D transconvolutional layers put more emphasis on the details of each frame than 3D ones, our generator takes both the definition of each video frame and temporal coherence across the whole video into consideration, and thus can generate videos with better visual quality. Second, we propose mutual information introspection to semantically align the generated videos to text. Unlike other methods that simply judge whether the video and the text match or not, we further use mutual information to concretely measure the semantic consistency. In this way, our model is able to introspect the semantic distance between the generated video and the corresponding text, and try to minimize it to boost the semantic consistency. We conduct experiments on 3 datasets and compare with state-of-the-art methods. Experimental results demonstrate the effectiveness of our IRC-GAN in generating plausible videos from given text.
#3586

Learning Generative Adversarial Networks from Multiple Data Sources
Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui, Dinh Phung
Details | PDF

Learning Generative Models

Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs' formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN's effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.
#5090

Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis
Yogesh Balaji, Martin Renqiang Min, Bing Bai, Rama Chellappa, Hans Peter Graf
Details | PDF

Learning Generative Models

Developing conditional generative models for text-to-video synthesis is an extremely challenging yet important topic of research in machine learning. In this work, we address this problem by introducing Text-Filter conditioning Generative Adversarial Network (TFGAN), a conditional GAN model with a novel multi-scale text-conditioning scheme that improves text-video associations. By combining the proposed conditioning scheme with a deep GAN architecture, TFGAN generates high quality videos from text on challenging real-world video datasets. In addition, we construct a synthetic dataset of text-conditioned moving shapes to systematically evaluate our conditioning scheme. Extensive experiments demonstrate that TFGAN significantly outperforms existing approaches, and can also generate videos of novel categories not seen during training.
#5731

Three-Player Wasserstein GAN via Amortised Duality
Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui, Dinh Phung
Details | PDF

Learning Generative Models

We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be trained reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method.

Thursday 15 11:00 - 12:30 NLP|NLG - Natural Language Generation 2 (2605-2606)

Chair: Chengqing Zong

#533

Network Embedding with Dual Generation Tasks
Jie Liu, Na Li, Zhicheng He
Details | PDF

Natural Language Generation 2

We study the problem of Network Embedding (NE) for content-rich networks. NE models aim to learn efficient low-dimensional dense vectors for network vertices which are crucial to many network analysis tasks. The core problem of content-rich network embedding is to learn and integrate the semantic information conveyed by network structure and node content. In this paper, we propose a general end-to-end model, Dual GEnerative Network Embedding (DGENE), to leverage the complementary information of network structure and content. In this model, each vertex is regarded as an object with two modalities: node identity and textual content. Then we formulate two dual generation tasks. One is Node Identification (NI) which recognizes nodes’ identities given their contents. Inversely, the other one is Content Generation (CG) which generates textual contents given the nodes’ identities. We develop specific Content2Node and Node2Content models for the two tasks. Under the DGENE framework, the two dual models are learned by sharing and integrating intermediate layers, with which they mutually enhance each other. Extensive experimental results show that our model yields a significant performance gain compared to the state-of-the-art NE methods. Moreover, our model has an interesting and useful byproduct, that is, a component of our model can generate texts, which is potentially useful for many tasks.
#1269

Learning towards Abstractive Timeline Summarization
Xiuying Chen, Zhangming Chan, Shen Gao, Meng-Hsuan Yu, Dongyan Zhao, Rui Yan
Details | PDF

Natural Language Generation 2

Timeline summarization aims at concisely summarizing the evolution trajectory along the timeline, and existing timeline summarization approaches are all based on extractive methods. In this paper, we propose the task of abstractive timeline summarization, which aims to concisely paraphrase the information in the time-stamped events. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, we propose a memory-based timeline summarization model (MTS). Concretely, we propose a time-event memory to establish a timeline, and use the time position of events on this timeline to guide the generation process. Besides, in each decoding step, we incorporate event-level information into word-level attention to avoid confusion between events. Extensive experiments are conducted on a large-scale real-world dataset, and the results show that MTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
#1405

Sequence Generation: From Both Sides to the Middle
Long Zhou, Jiajun Zhang, Chengqing Zong, Heng Yu
Details | PDF

Natural Language Generation 2

The encoder-decoder framework has achieved promising progress for many sequence generation tasks, such as neural machine translation and text summarization. Such a framework usually generates a sequence token by token from left to right, hence (1) this autoregressive decoding procedure is time-consuming when the output sentence becomes longer, and (2) it lacks the guidance of future context which is crucial to avoid under-translation. To alleviate these issues, we propose a synchronous bidirectional sequence generation (SBSG) model which predicts its outputs from both sides to the middle simultaneously. In the SBSG model, we enable the left-to-right (L2R) and right-to-left (R2L) generation to help and interact with each other by leveraging an interactive bidirectional attention network. Experiments on neural machine translation (En-De, Ch-En, and En-Ro) and text summarization tasks show that the proposed model significantly speeds up decoding while improving the generation quality compared to the autoregressive Transformer.
#2328

Mask and Infill: Applying Masked Language Model for Sentiment Transfer
Xing Wu, Tao Zhang, Liangjun Zang, Jizhong Han, Songlin Hu
Details | PDF

Natural Language Generation 2

This paper focuses on the task of sentiment transfer on non-parallel text, which modifies sentiment attributes (e.g., positive or negative) of sentences while preserving their attribute-independent contents. Existing methods adopt an RNN encoder-decoder structure to generate a new sentence of a target sentiment word by word, which is trained on a particular dataset from scratch and has limited ability to produce satisfactory sentences. When people convert the sentiment attribute of a given sentence, a simple but effective approach is to only replace the sentiment tokens of the sentence with other expressions indicative of the target sentiment, instead of building a new sentence from scratch. Such a process is very similar to the task of Text Infilling or Cloze. With this intuition, we propose a two-step approach: Mask and Infill. In the mask step, we identify and mask the sentiment tokens of a given sentence. In the infill step, we utilize a pre-trained Masked Language Model (MLM) to infill the masked positions by predicting words or phrases conditioned on the context and target sentiment (in this paper, content and context are equivalent, as are style, attribute and label). We evaluate our model on two review datasets, Yelp and Amazon, by quantitative, qualitative, and human evaluations. Experimental results demonstrate that our model achieves state-of-the-art performance on both accuracy and BLEU scores.
#2690

Sentiment-Controllable Chinese Poetry Generation
Huimin Chen, Xiaoyuan Yi, Maosong Sun, Wenhao Li, Cheng Yang, Zhipeng Guo
Details | PDF

Natural Language Generation 2

Expressing diverse sentiments is one of the main purposes of human poetry creation. Existing Chinese poetry generation models have made great progress in poetry quality, but they all neglect to endow generated poems with specific sentiments. This defect leads to strong sentiment collapse or bias and thus hurts the diversity and semantics of generated poems. Meanwhile, there are few sentimental Chinese poetry resources available for study. To address this problem, we first collect a manually-labelled sentimental poetry corpus with fine-grained sentiment labels. Then we propose a novel semi-supervised conditional Variational Auto-Encoder model for sentiment-controllable poetry generation. Besides, since poetry is discourse-level text where the polarity and intensity of sentiment could transfer among lines, we incorporate a temporal module to capture sentiment transition patterns among different lines. Experimental results show our model can control the sentiment of not only a whole poem but also each line, and improve the poetry diversity against the state-of-the-art models without losing quality.
#5988

A Deep Generative Model for Code Switched Text
Bidisha Samanta, Sharmila Reddy, Hussain Jagirdar, Niloy Ganguly, Soumen Chakrabarti
Details | PDF

Natural Language Generation 2

Code-switching, the interleaving of two or more languages within a sentence or discourse, is pervasive in multilingual societies. Accurate language models for code-switched text are critical for NLP tasks. State-of-the-art data-intensive neural language models are difficult to train well from scarce language-labeled code-switched text. A potential solution is to use deep generative models to synthesize large volumes of realistic code-switched text. Although generative adversarial networks and variational autoencoders can synthesize plausible monolingual text from continuous latent space, they cannot adequately address code-switched text, owing to its informal style and the complex interplay between the constituent languages. We introduce VACS, a novel variational autoencoder architecture specifically tailored to code-switching phenomena. VACS encodes to and decodes from a two-level hierarchical representation, which models syntactic contextual signals in the lower level, and language switching signals in the upper level. Sampling representations from the prior and decoding them produced well-formed, diverse code-switched sentences. Extensive experiments show that using synthetic code-switched text with natural monolingual data results in a significant (33.06%) drop in perplexity.

Thursday 15 11:00 - 12:30 CV|CV - Computer Vision (2501-2502)

Chair: Yu-Shen Liu

#1328

Dynamic Feature Fusion for Semantic Edge Detection
Yuan Hu, Yunpeng Chen, Xiang Li, Jiashi Feng
Details | PDF

Computer Vision

Features from multiple scales can greatly benefit the semantic edge detection task if they are well fused. However, the prevalent semantic edge detection methods apply a fixed weight fusion strategy where images with different semantics are forced to share the same weights, resulting in universal fusion weights for all images and locations regardless of their different semantics or local context. In this work, we propose a novel dynamic feature fusion strategy that assigns different fusion weights for different input images and locations adaptively. This is achieved by a proposed weight learner to infer proper fusion weights over multi-level features for each location of the feature map, conditioned on the specific input. In this way, the heterogeneity in contributions made by different locations of feature maps and input images can be better considered and thus help produce more accurate and sharper edge predictions. We show that our model with the novel dynamic feature fusion is superior to fixed weight fusion and also the naïve location-invariant weight fusion methods, via comprehensive experiments on benchmarks Cityscapes and SBD. In particular, our method outperforms all existing well established methods and achieves a new state of the art.
#2881

CoSegNet: Image Co-segmentation using a Conditional Siamese Convolutional Network
Sayan Banerjee, Avik Hati, Subhasis Chaudhuri, Rajbabu Velmurugan
Details | PDF

Computer Vision

The objective in image co-segmentation is to jointly segment unknown common objects from a given set of images. In this paper, we propose a novel deep convolution neural network based end-to-end co-segmentation model. It is composed of a metric learning and decision network leading to a novel conditional siamese encoder-decoder network for estimating a co-segmentation mask. The role of the metric learning network is to find an optimum latent feature space where objects of the same class are closer and those of different classes are separated by a certain margin. Depending on the extracted features, the decision network decides whether input images have common objects or not and the encoder-decoder network produces a co-segmentation mask accordingly. Key aspects of the architecture are as follows. First, it is completely class agnostic and does not require any semantic information. Second, in addition to producing masks, the decoder network also learns similarity across image pairs that improves co-segmentation significantly. Experimental results reflect an excellent performance of our method compared to state-of-the-art methods on challenging co-segmentation datasets.
#3696

Learning to Draw Text in Natural Images with Conditional Adversarial Networks
Shancheng Fang, Hongtao Xie, Jianjun Chen, Jianlong Tan, Yongdong Zhang
Details | PDF

Computer Vision

In this work, we propose an entirely learning-based method to automatically synthesize text sequence in natural images leveraging conditional adversarial networks. As vanilla GANs are clumsy to capture structural text patterns, directly employing GANs for text image synthesis typically results in illegible images. Therefore, we design a two-stage architecture to generate repeated characters in images. Firstly, a character generator attempts to synthesize local character appearance independently, so that the legible characters in sequence can be obtained. To achieve style consistency of characters, we propose a novel style loss based on variance-minimization. Secondly, we design a pixel-manipulation word generator constrained by self-regularization, which learns to convert local characters to plausible word image. Experiments on SVHN dataset and ICDAR, IIIT5K datasets demonstrate our method is able to synthesize visually appealing text images. Besides, we also show the high-quality images synthesized by our method can be used to boost the performance of a scene text recognition algorithm.
#5933

Hallucinating Optical Flow Features for Video Classification
Yongyi Tang, Lin Ma, Lianqiang Zhou
Details | PDF

Computer Vision

Appearance and motion are two key components to depict and characterize the video content. Currently, the two-stream models have achieved state-of-the-art performances on video classification. However, extracting motion information, specifically in the form of optical flow features, is extremely computationally expensive, especially for large-scale video classification. In this paper, we propose a motion hallucination network, namely MoNet, to imagine the optical flow features from the appearance features, with no reliance on the optical flow computation. Specifically, MoNet models the temporal relationships of the appearance features and exploits the contextual relationships of the optical flow features with concurrent connections. Extensive experimental results demonstrate that the proposed MoNet can effectively and efficiently hallucinate the optical flow features, which together with the appearance features consistently improve the video classification performances. Moreover, MoNet can help cut down almost half of the computational and data-storage burdens for the two-stream video classification. Our code is available at: https://github.com/YongyiTang92/MoNet-Features
#6443

Generative Image Inpainting with Submanifold Alignment
Ang Li, Jianzhong Qi, Rui Zhang, Xingjun Ma, Kotagiri Ramamohanarao
Details | PDF

Computer Vision

Image inpainting aims at restoring missing regions of corrupted images, which has many applications such as image restoration and object removal. However, current GAN-based generative inpainting models do not explicitly exploit the structural or textural consistency between restored contents and their surrounding contexts. To address this limitation, we propose to enforce the alignment (or closeness) between the local data submanifolds (subspaces) around restored images and those around the original (uncorrupted) images during the learning process of GAN-based inpainting models. We exploit Local Intrinsic Dimensionality (LID) to measure, in deep feature space, the alignment between data submanifolds learned by a GAN model and those of the original data, from a perspective of both images (denoted as iLID) and local patches (denoted as pLID) of images. We then apply iLID and pLID as regularizations for GAN-based inpainting models to encourage two different levels of submanifold alignments: 1) an image-level alignment to improve structural consistency, and 2) a patch-level alignment to improve textural details. Experimental results on four benchmark datasets show that our proposed model can generate more accurate results than state-of-the-art models.
#3173

ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs
Amir Gholaminejad, Kurt Keutzer, George Biros
Details | PDF

Computer Vision

Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. Recently, a method proposed in arXiv:1806.07366 claimed that this memory overhead can be reduced from O(LNt), where Nt is the number of time steps, down to O(L) by solving the forward ODE backwards in time, where L is the depth of the network. However, we will show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems, and to address them we propose ANODE, a neural ODE framework which avoids the numerical instability related problems noted above. ANODE has a memory footprint of O(L) + O(Nt), with the same computational cost as reversing the ODE solve. Furthermore, we discuss a memory efficient algorithm which can further reduce this footprint with a tradeoff of additional computational cost. We show results on CIFAR-10/100 datasets using ResNet and SqueezeNext neural networks.
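The opening claim of the abstract above, that a residual block is a forward Euler step with unit time step, can be checked with a small sketch. The residual branch f is a hypothetical stand-in (an elementwise tanh) for a learned layer, and states are plain lists of floats; none of this is ANODE's actual code.

```python
import math

def f(z):
    """Hypothetical residual branch; stands in for a learned layer."""
    return [math.tanh(v) for v in z]

def residual_block(z):
    """ResNet update: z_next = z + f(z)."""
    return [zi + fi for zi, fi in zip(z, f(z))]

def forward_euler_step(z, dt):
    """One forward Euler step for dz/dt = f(z): z_next = z + dt * f(z)."""
    return [zi + dt * fi for zi, fi in zip(z, f(z))]

def integrate(z, n_steps, dt):
    """Apply n_steps Euler steps; smaller dt gives a finer discretization."""
    for _ in range(n_steps):
        z = forward_euler_step(z, dt)
    return z

z0 = [0.1, -0.3, 0.7]
after_block = residual_block(z0)
after_euler = forward_euler_step(z0, 1.0)
after_two_blocks = residual_block(residual_block(z0))
after_two_steps = integrate(z0, 2, 1.0)
```

Replacing the unit step with Nt smaller steps per block is what makes the number of stored intermediate states, and hence the memory cost the abstract discusses, depend on both L and Nt.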

Thursday 15 11:00 - 12:30 ML|C - Classification 6 (2503-2504)

Chair: Marie-Jeanne Lesot

#1349

Supervised Short-Length Hashing
Xingbo Liu, Xiushan Nie, Quan Zhou, Xiaoming Xi, Lei Zhu, Yilong Yin
Details | PDF

Classification 6

Hashing can compress high-dimensional data into compact binary codes, while preserving the similarity, to facilitate efficient retrieval and storage. However, when retrieving using an extremely short length hash code learned by the existing methods, the performance cannot be guaranteed because of severe information loss. To address this issue, in this study, we propose a novel supervised short-length hashing (SSLH). In the proposed SSLH, mutual reconstruction between the short-length hash codes and original features is performed to reduce semantic loss. Furthermore, to enhance the robustness and accuracy of the hash representation, a robust estimator term is added to fully utilize the label information. Extensive experiments conducted on four image benchmarks demonstrate the superior performance of the proposed SSLH with short-length hash codes. In addition, the proposed SSLH outperforms the existing methods with long-length hash codes. To the best of our knowledge, this is the first linear-based hashing method that focuses on both short and long-length hash codes for maintaining high precision.
#1665

KCNN: Kernel-wise Quantization to Remarkably Decrease Multiplications in Convolutional Neural Network
Linghua Zeng, Zhangcheng Wang, Xinmei Tian
Details | PDF

Classification 6

Convolutional neural networks (CNNs) have demonstrated state-of-the-art performance in computer vision tasks. However, the high computational power that recent CNNs demand of the devices running them has hampered many of their applications. Recently, many methods have quantized the floating-point weights and activations to fixed-points or binary values to convert fractional arithmetic to integer or bit-wise arithmetic. However, since the distributions of values in CNNs are extremely complex, fixed-points or binary values lead to numerical information loss and cause performance degradation. On the other hand, convolution is composed of multiplications and accumulation, but the implementation of multiplications in hardware is more costly compared with accumulation. We can preserve the rich information of floating-point values on dedicated low power devices by considerably decreasing the multiplications. In this paper, we quantize the floating-point weights in each kernel separately to multiple bit planes to remarkably decrease multiplications. We obtain a closed-form solution via an aggressive Lloyd algorithm, and fine-tuning is adopted to optimize the bit planes. Furthermore, we propose dual normalization to solve the pathological curvature problem during fine-tuning. Our quantized networks show negligible performance loss compared to their floating-point counterparts.
#3038

Advocacy Learning: Learning through Competition and Class-Conditional Representations
Ian Fox, Jenna Wiens
Details | PDF

Classification 6

We introduce advocacy learning, a novel supervised training scheme for attention-based classification problems. Advocacy learning relies on a framework consisting of two connected networks: 1) N Advocates (one for each class), each of which outputs an argument in the form of an attention map over the input, and 2) a Judge, which predicts the class label based on these arguments. Each Advocate produces a class-conditional representation with the goal of convincing the Judge that the input example belongs to their class, even when the input belongs to a different class. Applied to several different classification tasks, we show that advocacy learning can lead to small improvements in classification accuracy over an identical supervised baseline. Through a series of follow-up experiments, we analyze when and how such class-conditional representations improve discriminative performance. Though somewhat counter-intuitive, a framework in which subnetworks are trained to competitively provide evidence in support of their class shows promise, in many cases performing on par with standard learning approaches. This provides a foundation for further exploration into competition and class-conditional representations in supervised learning.
#4747

Taming the Noisy Gradient: Train Deep Neural Networks with Small Batch Sizes
Yikai Zhang, Hui Qu, Chao Chen, Dimitris Metaxas
Details | PDF

Classification 6

Deep learning architectures are usually proposed with millions of parameters, resulting in a memory issue when training deep neural networks with stochastic gradient descent type methods using large batch sizes. However, training with small batch sizes tends to produce low-quality solutions due to the large variance of stochastic gradients. In this paper, we tackle this problem by proposing a new framework for training deep neural networks with small batches/noisy gradients. During optimization, our method iteratively applies a proximal type regularizer to make the loss function strongly convex. Such a regularizer stabilizes the gradient, leading to better training performance. We prove that our algorithm achieves a convergence rate comparable to vanilla SGD even with small batch sizes. Our framework is simple to implement and can be potentially combined with many existing optimization algorithms. Empirical results show that our method outperforms SGD and Adam when the batch size is small. Our implementation is available at https://github.com/huiqu18/TRAlgorithm.
#678

Play and Prune: Adaptive Filter Pruning for Deep Model Compression
Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri
Details | PDF

Classification 6

While convolutional neural networks (CNN) have achieved impressive performance on various classification/recognition tasks, they typically consist of a massive number of parameters. This results in significant memory requirement as well as computational overheads. Consequently, there is a growing need for filter-level pruning approaches for compressing CNN based models that not only reduce the total number of parameters but reduce the overall computation as well. We present a new min-max framework for filter-level pruning of CNNs. Our framework, called Play and Prune (PP), jointly prunes and fine-tunes CNN model parameters, with an adaptive pruning rate, while maintaining the model's predictive performance. Our framework consists of two modules: (1) An adaptive filter pruning (AFP) module, which minimizes the number of filters in the model; and (2) A pruning rate controller (PRC) module, which maximizes the accuracy during pruning. Moreover, unlike most previous approaches, our approach allows directly specifying the desired error tolerance instead of pruning level. Our compressed models can be deployed at run-time, without requiring any special libraries or hardware. Our approach reduces the number of parameters of VGG-16 by an impressive factor of 17.5X, and number of FLOPS by 6.43X, with no loss of accuracy, significantly outperforming other state-of-the-art filter pruning methods.
#3101

Partial Label Learning by Semantic Difference Maximization
Lei Feng, Bo An
Details | PDF

Classification 6

Partial label learning is a weakly supervised learning framework, in which each instance is provided with multiple candidate labels while only one of them is correct. Most of the existing approaches focus on leveraging the instance relationships to disambiguate the given noisy label space, while it is still unclear whether we can exploit potentially useful information in label space to alleviate the label ambiguities. This paper gives a positive answer to this question for the first time. Specifically, if two instances do not share any common candidate labels, they cannot have the same ground-truth label. By exploiting such dissimilarity relationships from label space, we propose a novel approach that aims to maximize the latent semantic differences of the two instances whose ground-truth labels are definitely different, while training the desired model simultaneously, thereby continually enlarging the gap of label confidences between two instances of different classes. Extensive experiments on artificial and real-world partial label datasets show that our approach significantly outperforms state-of-the-art counterparts.
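The observation at the heart of this abstract (two instances with no shared candidate label cannot share a ground-truth label) can be checked mechanically. The following is only a minimal sketch of that pair-finding step, not the authors' method; the function name `disjoint_candidate_pairs` and the toy label sets are hypothetical.

```python
from itertools import combinations


def disjoint_candidate_pairs(candidate_sets):
    """Return index pairs (i, j), i < j, whose candidate label sets share no label.

    Such pairs are guaranteed to have different ground-truth labels, because
    each instance's true label lies inside its own candidate set.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(candidate_sets), 2)
        if a.isdisjoint(b)
    ]


# Hypothetical toy data: one candidate label set per instance.
candidates = [{"cat", "dog"}, {"dog", "fox"}, {"owl"}, {"fox", "owl"}]
pairs = disjoint_candidate_pairs(candidates)
print(pairs)
```

The pairwise scan is quadratic in the number of instances, which is fine for a toy example; the abstract does not say how the paper handles this at scale.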

Thursday 15 11:00 - 12:30 ML|DM - Data Mining 9 (2505-2506)

Chair: Shoujin Wang

#1514

Hierarchical Diffusion Attention Network
Zhitao Wang, Wenjie Li
Details | PDF

Data Mining 9

A series of recent studies formulated the diffusion prediction problem as a sequence prediction task and proposed several sequential models based on recurrent neural networks. However, non-sequential properties exist in real diffusion cascades, which do not strictly follow the sequential assumptions of previous work. In this paper, we propose a hierarchical diffusion attention network (HiDAN), which adopts a non-sequential framework and two-level attention mechanisms, for diffusion prediction. At the user level, a dependency attention mechanism is proposed to dynamically capture historical user-to-user dependencies and extract the dependency-aware user information. At the cascade (i.e., sequence) level, a time-aware influence attention is designed to infer possible future user's dependencies on historical users by considering both inherent user importance and time decay effects. Significantly higher effectiveness and efficiency of HiDAN over state-of-the-art sequential models are demonstrated when evaluated on three real diffusion datasets. The further case studies illustrate that HiDAN can accurately capture diffusion dependencies.
#1726

HDI-Forest: Highest Density Interval Regression Forest
Lin Zhu, Jiaxing Lu, Yihong Chen
Details | PDF

Data Mining 9

By seeking the narrowest prediction intervals (PIs) that satisfy the specified coverage probability requirements, the recently proposed quality-based PI learning principle can extract high-quality PIs that better summarize the predictive certainty in regression tasks, and has been widely applied to solve many practical problems. Currently, the state-of-the-art quality-based PI estimation methods are based on deep neural networks or linear models. In this paper, we propose Highest Density Interval Regression Forest (HDI-Forest), a novel quality-based PI estimation method that is instead based on Random Forest. HDI-Forest does not require additional model training, and directly reuses the trees learned in a standard Random Forest model. By utilizing the special properties of Random Forest, HDI-Forest could efficiently and more directly optimize the PI quality metrics. Extensive experiments on benchmark datasets show that HDI-Forest significantly outperforms previous approaches, reducing the average PI width by over 20% while achieving the same or better coverage probability.
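To illustrate what "the narrowest interval that satisfies a coverage requirement" means, here is a minimal sketch that finds the shortest interval containing a given fraction of a set of samples. It is not the HDI-Forest algorithm, which operates on the trees of a Random Forest; the function name `narrowest_interval` and the sample values are assumptions made for this example.

```python
import math


def narrowest_interval(samples, coverage):
    """Shortest [lo, hi] that contains at least `coverage` of the samples."""
    if not samples or not 0 < coverage <= 1:
        raise ValueError("need at least one sample and 0 < coverage <= 1")
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(coverage * n)  # number of samples the interval must contain
    # Slide a window of k consecutive sorted samples and keep the tightest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Hypothetical samples of a predicted quantity.
print(narrowest_interval([1, 5, 6, 7, 20, 30], 0.5))
```

Only windows of consecutive sorted samples need to be considered, since any interval covering k samples can be shrunk to one whose endpoints are samples.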
#2035

Trend-Aware Tensor Factorization for Job Skill Demand Analysis
Xunxian Wu, Tong Xu, Hengshu Zhu, Le Zhang, Enhong Chen, Hui Xiong
Details | PDF

Data Mining 9

Given a job position, how to identify the right job skill demand and its evolving trend becomes critically important for both job seekers and employers in the fast-paced job market. Along this line, there still exist various challenges due to the lack of holistic understanding on skills related factors, e.g., the dynamic validity periods of skill trend, as well as the constraints from overlapped business and skill co-occurrence. To address these challenges, in this paper, we propose a trend-aware approach for fine-grained skill demand analysis. Specifically, we first construct a tensor for each timestamp based on the large-scale recruitment data, and then reveal the aggregations among companies and skills by heuristic solutions. Afterwards, the Trend-Aware Tensor Factorization (TATF) framework is designed by integrating multiple confounding factors, i.e., aggregation-based and temporal constraints, to provide more fine-grained representation and evolving trend of job demand for specific job positions. Finally, validations on large-scale real-world data clearly validate the effectiveness of our approach for skill demand analysis.
#3772

Legal Judgment Prediction via Multi-Perspective Bi-Feedback Network
Wenmian Yang, Weijia Jia, Xiaojie Zhou, Yutao Luo
Details | PDF

Data Mining 9

Legal Judgment Prediction (LJP) aims to determine judgment results based on the fact descriptions of the cases. LJP usually consists of multiple subtasks, such as applicable law articles prediction, charges prediction, and the term of the penalty prediction. These multiple subtasks have topological dependencies, the results of which affect and verify each other. However, existing methods use dependencies of results among multiple subtasks inefficiently. Moreover, for cases with similar descriptions but different penalties, current methods cannot predict accurately because the word collocation information is ignored. In this paper, we propose a Multi-Perspective Bi-Feedback Network with the Word Collocation Attention mechanism based on the topology structure among subtasks. Specifically, we design a multi-perspective forward prediction and backward verification framework to utilize result dependencies among multiple subtasks effectively. To distinguish cases with similar descriptions but different penalties, we integrate word collocation features of fact descriptions into the network via an attention mechanism. The experimental results show our model achieves significant improvements over baselines on all prediction tasks.
#4102

DANE: Domain Adaptive Network Embedding
Yizhou Zhang, Guojie Song, Lun Du, Shuwen Yang, Yilun Jin
Details | PDF

Data Mining 9

Recent works reveal that network embedding techniques enable many machine learning models to handle diverse downstream tasks on graph structured data. However, as previous methods usually focus on learning embeddings for a single network, they cannot learn representations transferable across multiple networks. Hence, it is important to design a network embedding algorithm that supports downstream model transferring on different networks, known as domain adaptation. In this paper, we propose a novel Domain Adaptive Network Embedding (DANE) framework, which applies a graph convolutional network to learn transferable embeddings. In DANE, nodes from multiple networks are encoded to vectors via a shared set of learnable parameters so that the vectors share an aligned embedding space. The distributions of embeddings on different networks are further aligned by adversarial learning regularization. In addition, DANE's advantage in learning transferable network embedding can be guaranteed theoretically. Extensive experiments reflect that the proposed framework outperforms other state-of-the-art network embedding baselines in cross-network domain adaptation tasks.
#1669

Discovering Regularities from Traditional Chinese Medicine Prescriptions via Bipartite Embedding Model
Chunyang Ruan, Jiangang Ma, Ye Wang, Yanchun Zhang, Yun Yang
Details | PDF

Data Mining 9

Regularities analysis for prescriptions is a significant task for traditional Chinese medicine (TCM), both in inheritance of clinical experience and in improvement of clinical quality. Recently, many methods have been proposed for regularities discovery, but this task is challenging due to the quantity, sparsity and free-style of prescriptions. In this paper, we address the specific problem of regularities discovery and propose a graph embedding based framework for regularities discovery for massive prescriptions. We model this task as a relation prediction in which the correlations between two herbs or between a herb and a symptom are incorporated to characterize the different relationships. Specifically, we first establish a heterogeneous network with herbs and symptoms as its nodes. We develop a bipartite embedding model termed HS2Vec to detect regularities, which explores multiple herb-herb and herb-symptom relations based on the heterogeneous network. Experiments on four real-world datasets demonstrate that the proposed framework is very effective for regularities discovery.

Thursday 15 11:00 - 12:30 UAI|UAI - Uncertainty in AI (2401-2402)

Chair: Luigi Portinale

#5235

Ranked Programming
Tjitze Rienstra
Details | PDF

Uncertainty in AI

While probabilistic programming is a powerful tool, uncertainty is not always of a probabilistic kind. Some types of uncertainty are better captured using ranking theory, which is an alternative to probability theory where uncertainty is measured using degrees of surprise on the integer scale from 0 to ∞. In this paper we combine probabilistic programming methodology with ranking theory and develop a ranked programming language. We use the Scheme programming language as a basis and extend it with the ability to express both normal and exceptional behaviour of a model, and perform inference on such models. Like probabilistic programming, our approach provides a simple and flexible way to represent and reason with models involving uncertainty, but using a coarser grained and computationally simpler kind of uncertainty.
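For readers unfamiliar with ranking theory, the sketch below shows the two basic operations on degrees of surprise: the rank of an event is the minimum rank of its worlds, and a conditional rank is a difference of ranks. This reflects standard ranking theory rather than the paper's Scheme-based language; the function names and the toy fault model are assumptions made for illustration.

```python
import math


def rank(event, kappa):
    """Rank of an event: the rank of its least surprising world (inf if empty)."""
    return min((kappa[w] for w in event), default=math.inf)


def conditional_rank(b, a, kappa):
    """Rank of event b given event a, defined as rank(a and b) - rank(a)."""
    rank_a = rank(a, kappa)
    if math.isinf(rank_a):
        raise ValueError("cannot condition on an event with infinite rank")
    return rank(a & b, kappa) - rank_a


# Hypothetical model: rank 0 is normal behaviour, higher ranks are more surprising.
kappa = {"ok": 0, "minor_fault": 1, "major_fault": 3}
any_fault = {"minor_fault", "major_fault"}

print(rank(any_fault, kappa))                             # surprise of any fault
print(conditional_rank({"major_fault"}, any_fault, kappa))  # given that a fault occurred
```

Where probabilities would be summed and divided, ranks are combined with min and subtraction, which is one way to read the abstract's phrase "computationally simpler kind of uncertainty".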
#5411

Hyper-parameter Tuning under a Budget Constraint
Zhiyun Lu, Liyu Chen, Chao-Kai Chiang, Fei Sha
Details | PDF

Uncertainty in AI

Hyper-parameter tuning is of crucial importance for real-world machine learning applications. While existing works mainly focus on speeding up the tuning process, we propose to study the problem of hyper-parameter tuning under a budget constraint, which is a more realistic scenario in developing large-scale systems. We formulate the task as a sequential decision making problem and propose a solution, which uses a Bayesian belief model to predict future performances, and an action-value function to plan and select the next configuration to run. With long term prediction and planning capability, our method is able to stop unpromising configurations early, and adapt the tuning behaviors to different constraints. Experimental results show that our method outperforms existing algorithms, including the state-of-the-art one, on real-world tuning tasks across a range of different budgets.
#10986

(Journal track) Complexity of Fundamental Problems in Probabilistic Abstract Argumentation: Beyond Independence
Bettina Fazzinga, Sergio Flesca, Filippo Furfaro
Details | PDF

Uncertainty in AI

The complexity of the probabilistic counterparts of the verification and acceptance problems is investigated over probabilistic Abstract Argumentation Frameworks (prAAFs), in a setting more general than the literature, where the complexity has been characterized only under independence between arguments/defeats. The complexity of these problems is shown to depend on the semantics of the extensions, the way of encoding the prAAF, and the correlations between arguments/defeats. In this regard, in order to study the impact of different correlations between arguments/defeats on the complexity, a new form of prAAF is introduced, called gen. It is based on the well-known paradigm of world-set sets, and it allows the correlations to be easily distinguishable.
#3212

Statistical Guarantees for the Robustness of Bayesian Neural Networks
Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, Nicola Paoletti, Andrea Patane, Matthew Wicker
Details | PDF

Uncertainty in AI

We introduce a probabilistic robustness measure for Bayesian Neural Networks (BNNs), defined as the probability that, given a test point, there exists a point within a bounded set such that the BNN prediction differs between the two. Such a measure can be used, for instance, to quantify the probability of the existence of adversarial examples. Building on statistical verification techniques for probabilistic models, we develop a framework that allows us to estimate probabilistic robustness for a BNN with statistical guarantees, i.e., with a priori error and confidence bounds. We provide experimental comparison for several approximate BNN inference techniques on image classification tasks associated to MNIST and a two-class subset of the GTSRB dataset. Our results enable quantification of uncertainty of BNN predictions in adversarial settings.
#4065

On Privacy Protection of Latent Dirichlet Allocation Model Training
Fangyuan Zhao, Xuebin Ren, Shusen Yang, Xinyu Yang
Details | PDF

Uncertainty in AI

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovery of hidden semantic architecture of text datasets, and plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training an LDA model may leak sensitive information about the training datasets and bring significant privacy risks. To mitigate the privacy issues in LDA, we focus on studying privacy-preserving algorithms of LDA model training in this paper. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. Then, we further propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. The experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms.
#3478

Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting
Longyuan Li, Junchi Yan, Xiaokang Yang, Yaohui Jin
Details | PDF

Uncertainty in AI

Probabilistic time series forecasting involves estimating the distribution of the future based on its history, which is essential for risk management in downstream decision-making. We propose a deep state space model for probabilistic time series forecasting whereby the non-linear emission model and transition model are parameterized by networks and the dependency is modeled by recurrent neural nets. We take the automatic relevance determination (ARD) view and devise a network to exploit the exogenous variables in addition to time series. In particular, our ARD network can incorporate the uncertainty of the exogenous variables and eventually helps identify useful exogenous variables and suppress those irrelevant for forecasting. The distribution of multi-step ahead forecasts is approximated by Monte Carlo simulation. We show in experiments that our model produces accurate and sharp probabilistic forecasts. The estimated uncertainty of our forecasting also realistically increases over time, in a spontaneous manner.

Thursday 15 11:00 - 12:30 HSGP|HS - Heuristic Search 2 (2403-2404)

Chair: Yingqian Zhang

#956

Direction-Optimizing Breadth-First Search with External Memory Storage
Shuli Hu, Nathan R. Sturtevant
Details | PDF

Heuristic Search 2

While computing resources have continued to grow, methods for building and using large heuristics have not seen significant advances in recent years. We have observed that direction-optimizing breadth-first search, developed for and used broadly in the Graph 500 competition, can also be applied for building heuristics. However, the algorithm cannot run efficiently using external memory -- when the heuristics being built are larger than RAM. This paper shows how to modify direction-optimizing breadth-first search to build external-memory heuristics. We show that the new approach is not effective in state spaces with low asymptotic branching factors, but in other domains we are able to achieve up to a 3x reduction in runtime when building an external-memory heuristic. The approach is then used to build a 2.6TiB Rubik's Cube heuristic with 5.8 trillion entries, the largest pattern database heuristic ever built.
#4235

Branch-and-Cut-and-Price for Multi-Agent Pathfinding
Edward Lam, Pierre Le Bodic, Daniel D. Harabor, Peter J. Stuckey
Details | PDF

Heuristic Search 2

There are currently two broad strategies for optimal Multi-agent Pathfinding (MAPF): (1) search-based methods, which model and solve MAPF directly, and (2) compilation-based solvers, which reduce MAPF to instances of well-known combinatorial problems, and thus, can benefit from advances in solver techniques. In this work, we present an optimal algorithm, BCP, that hybridizes both approaches using Branch-and-Cut-and-Price, a decomposition framework developed for mathematical optimization. We formalize BCP and compare it empirically against CBSH and CBSH-RM, two leading search-based solvers. Conclusive results on standard benchmarks indicate that its performance exceeds the state-of-the-art: solving more instances on smaller grids and scaling reliably to 100 or more agents on larger game maps.
#4345

Local Search with Efficient Automatic Configuration for Minimum Vertex Cover
Chuan Luo, Holger H. Hoos, Shaowei Cai, Qingwei Lin, Hongyu Zhang, Dongmei Zhang
Details | PDF

Heuristic Search 2

Minimum vertex cover (MinVC) is a prominent NP-hard problem in artificial intelligence, with considerable importance in applications. Local search solvers define the state of the art in solving MinVC. However, there is no single MinVC solver that works best across all types of MinVC instances, and finding the most suitable solver for a given application poses considerable challenges. In this work, we present a new local search framework for MinVC called MetaVC, which is highly parametric and incorporates many effective local search techniques. Using an automatic algorithm configurator, the performance of MetaVC can be optimized for particular types of MinVC instances. Through extensive experiments, we demonstrate that MetaVC significantly outperforms previous solvers on medium-size hard MinVC instances, and shows competitive performance on large MinVC instances. We further introduce a neural-network-based approach for enhancing the automatic configuration process, by identifying and terminating unpromising configuration runs. Our results demonstrate that MetaVC, when automatically configured using this method, can achieve improvements in the best known solutions for 16 large MinVC instances.
#6181

Path Planning with CPD Heuristics
Massimo Bono, Alfonso E. Gerevini, Daniel D. Harabor, Peter J. Stuckey
Details | PDF

Heuristic Search 2

Compressed Path Databases (CPDs) are a leading technique for optimal pathfinding in graphs with static edge costs. In this work we investigate CPDs as admissible heuristic functions and we apply them in two distinct settings: problems where the graph is subject to dynamically changing costs, and anytime settings where deliberation time is limited. Conventional heuristics derive cost-to-go estimates by reasoning about a tentative and usually infeasible path, from the current node to the target. CPD-based heuristics derive cost-to-go estimates by computing a concrete and usually feasible path. We exploit such paths to bound the optimal solution, not just from below but also from above. We demonstrate the benefit of this approach in a range of experiments on standard gridmaps and in comparison to Landmarks, a popular alternative also developed for searching in explicit state-spaces.
#10958

(Sister Conferences Best Papers Track) Optimally Efficient Bidirectional Search
Eshed Shaham, Ariel Felner, Nathan R. Sturtevant, Jeffrey S. Rosenschein
Details | PDF

Heuristic Search 2

A* is optimally efficient with regard to node expansions among unidirectional admissible algorithms — those that only assume that the heuristic used is admissible. This paper studies algorithms that are optimally efficient for bidirectional search algorithms. We present the Fractional MM algorithm and its sibling, the MT algorithm, which is simpler to analyze. We then develop variants of these algorithms that are optimally efficient, each under different assumptions on the information available to the algorithm.
#6188

Regarding Jump Point Search and Subgoal Graphs
Daniel D. Harabor, Tansel Uras, Peter J. Stuckey, Sven Koenig
Details | PDF

Heuristic Search 2

In this paper, we define Jump Point Graphs (JP), a preprocessing-based path-planning technique similar to Subgoal Graphs (SG). JP allows for the first time the combination of Jump Point Search style pruning in the context of abstraction-based speedup techniques, such as Contraction Hierarchies. We compare JP with SG and its variants and report new state-of-the-art results for grid-based pathfinding.

Thursday 15 11:00 - 12:30 Community Meeting (2405-2406)

CLAIRE (Confederation of Laboratories for AI Research in Europe)

Community Meeting

Thursday 15 11:00 - 12:30 Early Career 4 - Early Career Spotlight 4 (2306)

Chair: Michael Spranger

#11061

Budgeted Sequential Decision Making in Human-Aware AI Systems
Long Tran-Thanh

Early Career Spotlight 4

In this talk I will summarise my research in the topic of sequential decision making under uncertainty with budget limits, with an application to the domain of human-aware AI systems.
#11070

CyberAI: Innovation, Research and Education for a Better World
Yanfang (Fanny) Ye

Early Career Spotlight 4

Nowadays, techniques in artificial intelligence (AI) are changing our view of the world every day. As the Internet and computing devices become increasingly ubiquitous, their security has become more and more important. Cyber attackers and defenders are engaged in a never-ending arms race. At each round, both attackers and defenders analyze the vulnerabilities of each other, and develop their own optimal strategies to overcome the opponents, which has led to considerable countermeasures of variability and sophistication between them. In the AI age, next-generation cybersecurity systems may need to automatically characterize attackers’ behaviors and ever-changing environments to enable self-adaptive defenses. In this talk, I will first introduce the development of the cybersecurity industry; then I will present the techniques we have proposed and developed against the evolving cyberattacks; later I will further discuss the arms race between adversarial cyberattacks and defenses in the AI age to facilitate the design and development of next-generation cybersecurity systems. At the end of the talk, I will explore the integration of research and education in this field to inspire K-12 students, especially young women, to pursue STEM careers.
#11063

Map Synchronization: from Object Correspondences to Neural Networks
Qixing Huang

Early Career Spotlight 4

Thursday 15 12:00 - 12:30 Industry Days (D-I)

Chair: Richard Tong (Squirrel AI Learning)

Smart Mobility: Redefining Transportation with AI
Tiger Qie, Vice President, Didi

Industry Days

Thursday 15 12:00 - 17:00 Competition (Hall A)

Angry Birds - Human vs machine - Can you beat the best AI?

Competition

Thursday 15 14:00 - 14:50 Invited Talk (D-I)

Chair: Fahiem Bacchus

Queryable Self-Deliberating Dynamic Systems
Giuseppe de Giacomo

Invited Talk

Thursday 15 14:00 - 15:00 Industry Days (K)

Chair: Dou Shen (Baidu)

Embracing international service via AI
Hui Wang, Knowledge Scientist, Xiaoi

Industry Days

Thursday 15 14:00 - 18:00 Competition (2304)

The Tenth International Automated Negotiating Agent Competition

Competition

Thursday 15 14:00 - 18:00 Competition (2305)

IJCAI-2019 AI Alibaba Adversarial AI Challenge

Competition

Thursday 15 15:00 - 16:00 Industry Days (K)

Chair: Dou Shen (Baidu)

Panel: AI Challenges in Industry

Industry Days

Thursday 15 15:00 - 16:00 AI-HWB - ST: AI for Improving Human Well-Being 7 (J)

Chair: Jonathan Schaeffer

#5733

Simultaneous Prediction Intervals for Patient-Specific Survival Curves
Samuel Sokota, Ryan D'Orazio, Khurram Javed, Humza Haider, Russell Greiner
Details | PDF

ST: AI for Improving Human Well-Being 7

Accurate models of patient survival probabilities provide important information to clinicians prescribing care for life-threatening and terminal ailments. A recently developed class of models -- known as individual survival distributions (ISDs) -- produces patient-specific survival functions that offer greater descriptive power of patient outcomes than was previously possible. Unfortunately, at the time of writing, ISD models almost universally lack uncertainty quantification. In this paper we demonstrate that an existing method for estimating simultaneous prediction intervals from samples can easily be adapted for patient-specific survival curve analysis and yields accurate results. Furthermore, we introduce both a modification to the existing method and a novel method for estimating simultaneous prediction intervals and show that they offer competitive performance. It is worth emphasizing that these methods are not limited to survival analysis and can be applied in any context in which sampling the distribution of interest is tractable. Code is available at https://github.com/ssokota/spie.
#3659

Daytime Sleepiness Level Prediction Using Respiratory Information
Kazuhiko Shinoda, Masahiko Yoshii, Hayato Yamaguchi, Hirotaka Kaji

ST: AI for Improving Human Well-Being 7

Daytime sleepiness is not only the cause of productivity decline and accidents, but also an important metric of health risks. Despite its importance, the long-term quantitative analysis of sleepiness in daily living has hardly been done due to time and effort required for the continuous tracking of sleepiness. Although a number of sleepiness detection technologies have been proposed, most of them focused only on driver’s drowsiness. In this paper, we present the first step towards the continuous sleepiness tracking in daily living situations. We explore a methodology for predicting subjective sleepiness levels utilizing respiration and acceleration data obtained from a novel wearable sensor. A class imbalance handling technique and hidden Markov model are combined with supervised classifiers to overcome the difficulties in learning from an imbalanced and time series dataset. We evaluate the performance of our models through a comprehensive experiment.
#3539

Pre-training of Graph Augmented Transformers for Medication Recommendation
Junyuan Shang, Tengfei Ma, Cao Xiao, Jimeng Sun

ST: AI for Improving Human Well-Being 7

Medication recommendation is an important healthcare application. It is commonly formulated as a temporal prediction task. Hence, most existing works only utilize longitudinal electronic health records (EHRs) from a small number of patients with multiple visits ignoring a large number of patients with a single visit (selection bias). Moreover, important hierarchical knowledge such as diagnosis hierarchy is not leveraged in the representation learning process. Despite the success of deep learning techniques in computational phenotyping, most previous approaches have two limitations: task-oriented representation and ignoring hierarchies of medical codes. To address these challenges, we propose G-BERT, a new model to combine the power of Graph Neural Networks (GNNs) and BERT (Bidirectional Encoder Representations from Transformers) for medical code representation and medication recommendation. We use GNNs to represent the internal hierarchical structures of medical codes. Then we integrate the GNN representation into a transformer-based visit encoder and pre-train it on EHR data from patients only with a single visit. The pre-trained visit encoder and representation are then fine-tuned for downstream predictive tasks on longitudinal EHRs from patients with multiple visits. G-BERT is the first to bring the language model pre-training schema into the healthcare domain and it achieved state-of-the-art performance on the medication recommendation task.
#6482

K-margin-based Residual-Convolution-Recurrent Neural Network for Atrial Fibrillation Detection
Yuxi Zhou, Shenda Hong, Junyuan Shang, Meng Wu, Qingyun Wang, Hongyan Li, Junqing Xie

ST: AI for Improving Human Well-Being 7

Atrial Fibrillation (AF) is an abnormal heart rhythm which can trigger cardiac arrest and sudden death. Nevertheless, its interpretation is mostly done by medical experts due to high error rates of computerized interpretation. One study found that only about 66% of AF were correctly recognized from noisy ECGs. This is in part due to insufficient training data, class skewness, as well as semantical ambiguities caused by noisy segments in an ECG record. In this paper, we propose a K-margin-based Residual-Convolution-Recurrent neural network (K-margin-based RCR-net) for AF detection from noisy ECGs. In detail, a skewness-driven dynamic augmentation method is employed to handle the problems of data inadequacy and class imbalance. A novel RCR-net is proposed to automatically extract both long-term rhythm-level and local heartbeat-level characters. Finally, we present a K-margin-based diagnosis model to automatically focus on the most important parts of an ECG record and handle noise by naturally exploiting expected consistency among the segments associated for each record. The experimental results demonstrate that the proposed method with 0.8125 F1NAOP score outperforms all state-of-the-art deep learning methods for AF detection task by 6.8%.

Thursday 15 15:00 - 16:00 ML|EM - Ensemble Methods 2 (L)

Chair: Yanfang Ye

#3201

The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning
Bonggun Shin, Hao Yang, Jinho D. Choi

Ensemble Methods 2

Recent advances in deep learning have increased the demand for neural models in real applications. In practice, these applications often need to be deployed with limited resources while keeping high accuracy. This paper touches the core of neural models in NLP, word embeddings, and presents an embedding distillation framework that remarkably reduces the dimension of word embeddings without compromising accuracy. A new distillation ensemble approach is also proposed that trains a highly efficient student model using multiple teacher models. In our approach, the teacher models play roles only during training, so the student model operates on its own without support from the teacher models during decoding, which makes it run as fast and light as any single model. All models are evaluated on seven document classification datasets and show significant advantage over the teacher models for most cases. Our analysis depicts insightful transformation of word embeddings from distillation and suggests a future direction to ensemble approaches using neural models.
#3477

Privacy-Preserving Stacking with Application to Cross-organizational Diabetes Prediction
Quanming Yao, Xiawei Guo, James Kwok, Weiwei Tu, Yuqiang Chen, Wenyuan Dai, Qiang Yang

Ensemble Methods 2

To meet the standard of differential privacy, noise is usually added into the original data, which inevitably deteriorates the predicting performance of subsequent learning algorithms. In this paper, motivated by the success of improving predicting performance by ensemble learning, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done either by sample-based or feature-based partitioning. However, we prove that when privacy-budgets are the same, feature-based partitioning requires fewer samples than the sample-based one, and thus likely has better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction from the RUIJIN data set, where privacy is of significant concern.
#4491

Hybrid Item-Item Recommendation via Semi-Parametric Embedding
Peng Hu, Rong Du, Yao Hu, Nan Li

Ensemble Methods 2

Nowadays, item-item recommendation plays an important role in modern recommender systems. Traditionally, this is either solved by behavior-based collaborative filtering or content-based methods. However, both kinds of methods often suffer from cold-start problems, or poor performance due to scarce behavior supervision; hybrid methods which can leverage the strength of both kinds of methods are needed. In this paper, we propose a semi-parametric embedding framework for this problem. Specifically, the embedding of an item is composed of two parts, i.e., the parametric part from content information and the non-parametric part designed to encode behavior information; meanwhile, a deep learning algorithm is proposed to learn the two parts simultaneously. Extensive experiments on real-world datasets demonstrate the effectiveness and robustness of the proposed method.
#1560

Adversarial Graph Embedding for Ensemble Clustering
Zhiqiang Tao, Hongfu Liu, Jun Li, Zhaowen Wang, Yun Fu

Ensemble Methods 2

Ensemble clustering generally integrates basic partitions into a consensus one through a graph partitioning method, which, however, has two limitations: 1) it neglects to reuse original features; 2) obtaining consensus partition with learnable graph representations is still under-explored. In this paper, we propose a novel Adversarial Graph Auto-Encoders (AGAE) model to incorporate ensemble clustering into a deep graph embedding process. Specifically, graph convolutional network is adopted as probabilistic encoder to jointly integrate the information from feature content and consensus graph, and a simple inner product layer is used as decoder to reconstruct graph with the encoded latent variables (i.e., embedding representations). Moreover, we develop an adversarial regularizer to guide the network training with an adaptive partition-dependent prior. Experiments on eight real-world datasets are presented to show the effectiveness of AGAE over several state-of-the-art deep embedding and ensemble clustering methods.

Thursday 15 15:00 - 16:00 ML|EML - Explainable Machine Learning (2701-2702)

Chair: Fosca Giannotti

#1177

Twin-Systems to Explain Artificial Neural Networks using Case-Based Reasoning: Comparative Tests of Feature-Weighting Methods in ANN-CBR Twins for XAI
Eoin M. Kenny, Mark T. Keane

Explainable Machine Learning

In this paper, twin-systems are described to address the eXplainable artificial intelligence (XAI) problem, where a black box model is mapped to a white box “twin” that is more interpretable, with both systems using the same dataset. The framework is instantiated by twinning an artificial neural network (ANN; black box) with a case-based reasoning system (CBR; white box), and mapping the feature weights from the former to the latter to find cases that explain the ANN’s outputs. Using a novel evaluation method, the effectiveness of this twin-system approach is demonstrated by showing that nearest neighbor cases can be found to match the ANN predictions for benchmark datasets. Several feature-weighting methods are competitively tested in two experiments, including our novel, contributions-based method (called COLE) that is found to perform best. The tests consider the “twinning” of traditional multilayer perceptron (MLP) networks and convolutional neural networks (CNN) with CBR systems. For the CNNs trained on image data, qualitative evidence shows that cases provide plausible explanations for the CNN’s classifications.
#2604

The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations
Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, Marcin Detyniecki

Explainable Machine Learning

Post-hoc interpretability approaches have been proven to be powerful tools to generate explanations for the predictions made by a trained black-box model. However, they create the risk of having explanations that are a result of some artifacts learned by the model instead of actual knowledge from the data. This paper focuses on the case of counterfactual explanations and asks whether the generated instances can be justified, i.e. continuously connected to some ground-truth data. We evaluate the risk of generating unjustified counterfactual examples by investigating the local neighborhoods of instances whose predictions are to be explained and show that this risk is quite high for several datasets. Furthermore, we show that most state of the art approaches do not differentiate justified from unjustified counterfactual examples, leading to less useful explanations.
#2678

A Gradient-Based Split Criterion for Highly Accurate and Transparent Model Trees
Klaus Broelemann, Gjergji Kasneci

Explainable Machine Learning

Machine learning algorithms aim at minimizing the number of false decisions and increasing the accuracy of predictions. However, the high predictive power of advanced algorithms comes at the cost of transparency. State-of-the-art methods, such as neural networks and ensemble methods, result in highly complex models with little transparency. We propose shallow model trees as a way to combine simple and highly transparent predictive models for higher predictive power without losing the transparency of the original models. We present a novel split criterion for model trees that allows for significantly higher predictive power than state-of-the-art model trees while maintaining the same level of simplicity. This novel approach finds split points which allow the underlying simple models to make better predictions on the corresponding data. In addition, we introduce multiple mechanisms to increase the transparency of the resulting trees.
#3000

Non-smooth Optimization over Stiefel Manifolds with Applications to Dimensionality Reduction and Graph Clustering
Fariba Zohrizadeh, Mohsen Kheirandishfard, Farhad Kamangar, Ramtin Madani

Explainable Machine Learning

This paper is concerned with the class of non-convex optimization problems with orthogonality constraints. We develop computationally efficient relaxations that transform non-convex orthogonality constrained problems into polynomial-time solvable surrogates. A novel penalization technique is used to enforce feasibility and derive certain conditions under which the constraints of the original non-convex problem are guaranteed to be satisfied. Moreover, we extend our approach to a feasibility-preserving sequential scheme that solves penalized relaxation to obtain near-globally optimal points. Experimental results on synthetic and real datasets demonstrate the effectiveness of the proposed approach on two practical applications in machine learning.

Thursday 15 15:00 - 16:00 AMS|CSC - Computational Social Choice 4 (2703-2704)

Chair: Thanh Nguyen

#2795

An Experimental View on Committees Providing Justified Representation
Robert Bredereck, Piotr Faliszewski, Andrzej Kaczmarczyk, Rolf Niedermeier

Computational Social Choice 4

We provide an experimental study of committees that achieve (proportional/extended) justified representation (JR/PJR/EJR). In particular, we ask how many such committees exist and how varied they are in terms of voter satisfaction and coverage. We find that under many natural distributions of preferences a large fraction of randomly selected JR committees also provide PJR and EJR. Further, we find that the sets of JR committees for our elections are very varied and include both high-quality ones and not-so-appealing ones.
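
As a reminder of what the basic JR condition asks for, here is a minimal sketch of a JR check for approval ballots. It assumes the standard definition: a committee of size k fails JR if some group of at least n/k voters all approve a common candidate while none of them approves any committee member. The function name `provides_jr` and the data layout are illustrative assumptions, not taken from the paper.

```python
def provides_jr(ballots, committee):
    """Check justified representation (JR) for approval ballots.

    ballots: list of sets of approved candidates, one set per voter.
    committee: iterable of selected candidates (its size is k).
    """
    committee = set(committee)
    n, k = len(ballots), len(committee)
    # Voters who approve nobody in the committee.
    unrepresented = [set(b) for b in ballots if not (set(b) & committee)]
    candidates = set()
    for b in ballots:
        candidates |= set(b)
    for c in candidates:
        supporters = sum(1 for b in unrepresented if c in b)
        # supporters >= n / k, written without division.
        if supporters * k >= n:
            return False
    return True

# Usage: four voters, committee size two, so the group-size threshold is 2.
ballots = [{"a"}, {"a"}, {"b"}, {"c"}]
jr_ok = provides_jr(ballots, {"a", "b"})
jr_violated = not provides_jr(ballots, {"b", "c"})
```

In the second committee the two voters who approve only "a" form a cohesive group of size n/k that is left without any representative, which is exactly the situation JR forbids.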
#3088

Correlating Preferences and Attributes: Nearly Single-Crossing Profiles
Foram Lakhani, Dominik Peters, Edith Elkind

Computational Social Choice 4

We use social choice theory to develop correlation coefficients between ranked preferences and an ordinal attribute such as educational attainment or income level. For example, such correlations could be used to formalise statements such as "voters' preferences over parties are better explained by age than by income level". In the literature, preferences that are perfectly explained by a single-dimensional agent attribute are commonly taken to be single-crossing preferences. Thus, to quantify how well an attribute explains preferences, we can order the voters by the value of the attribute and compute how far the resulting ordered profile is from being single-crossing, for various commonly studied distance measures (Kendall tau distance, voter/alternative deletion, etc.). The goal of this paper is to evaluate the computational feasibility of this approach. To this end, we investigate the complexity of computing these distances, obtaining an essentially complete picture for the distances we consider.
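
The single-crossing condition with respect to a fixed voter order can be sketched as follows. This is an illustrative check only, assuming that voters are already sorted by the attribute and that each voter gives a strict ranking over the same alternatives; it does not compute any of the distance measures studied in the paper, and `is_single_crossing` is a hypothetical name.

```python
from itertools import combinations

def is_single_crossing(profile):
    """profile: list of rankings (best first), ordered by the voter attribute.

    The profile is single-crossing in this order if, for every pair of
    alternatives, the relative preference flips at most once along the order.
    """
    alternatives = profile[0]
    for a, b in combinations(alternatives, 2):
        # True where the voter ranks a above b.
        prefers_a = [r.index(a) < r.index(b) for r in profile]
        switches = sum(1 for x, y in zip(prefers_a, prefers_a[1:]) if x != y)
        if switches > 1:
            return False
    return True

# Usage: voters sorted by, say, age.
by_age = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
explained_by_age = is_single_crossing(by_age)
```

A distance to single-crossingness then asks how much such a profile must be changed (for example, how many voters or alternatives must be deleted) before this check succeeds.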
#2867

Multigoal Committee Selection
Maciej Kocot, Anna Kolonko, Edith Elkind, Piotr Faliszewski, Nimrod Talmon

Computational Social Choice 4

We study the problem of computing committees that perform well according to several different criteria, which are expressed as committee scoring rules. We analyze the computational complexity of computing such committees and provide an experimental evaluation of the compromise levels that can be achieved between several well-known rules, including k-Borda, SNTV, Bloc, and the Chamberlin--Courant rule.
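
To make the tension between rules concrete, here is a small brute-force sketch that scores committees under k-Borda, SNTV, and Bloc, using their usual definitions (Borda points, top-choice points, and top-k points, summed over committee members). It is only an illustration on a toy profile, not the paper's algorithms; the function names are assumptions, and the Chamberlin--Courant rule is omitted for brevity.

```python
from itertools import combinations

def candidate_scores(profile, rule, k):
    """Per-candidate scores for separable committee scoring rules."""
    m = len(profile[0])
    scores = {c: 0 for c in profile[0]}
    for ranking in profile:
        for pos, c in enumerate(ranking):
            if rule == "k-Borda":
                scores[c] += m - 1 - pos          # Borda points
            elif rule == "SNTV":
                scores[c] += 1 if pos == 0 else 0  # plurality points
            elif rule == "Bloc":
                scores[c] += 1 if pos < k else 0   # k-approval points
            else:
                raise ValueError(f"unknown rule: {rule}")
    return scores

def committee_score(profile, committee, rule):
    scores = candidate_scores(profile, rule, len(committee))
    return sum(scores[c] for c in committee)

def best_committee(profile, k, rule):
    """Exhaustive search; only sensible for tiny instances."""
    candidates = sorted(profile[0])
    return max(combinations(candidates, k),
               key=lambda w: committee_score(profile, w, rule))

profile = [["a", "b", "c"], ["a", "b", "c"],
           ["c", "b", "a"], ["c", "b", "a"],
           ["b", "a", "c"]]
winners = {rule: best_committee(profile, 2, rule)
           for rule in ("k-Borda", "SNTV", "Bloc")}
```

On this profile SNTV selects a different committee than k-Borda and Bloc, so any single committee has to compromise on at least one of the criteria.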
#108

Strategyproof and Approximately Maxmin Fair Share Allocation of Chores
Haris Aziz, Bo Li, Xiaowei Wu

Computational Social Choice 4

We initiate the work on fair and strategyproof allocation of indivisible chores. The fairness concept we consider in this paper is maxmin share (MMS) fairness. We consider three previously studied models of information elicited from the agents: the ordinal model, the cardinal model, and the public ranking model in which the ordinal preferences are publicly known. We present both positive and negative results on the level of MMS approximation that can be guaranteed if we require the algorithm to be strategyproof. Our results uncover some interesting contrasts between the approximation ratios achieved for chores versus goods.
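
For chores, an agent's maxmin share is usually defined as the smallest worst-bundle cost the agent can guarantee by splitting all chores into n bundles herself. The following brute-force sketch computes that value for additive costs on tiny instances; it illustrates the fairness benchmark only, not the strategyproof mechanisms of the paper, and `mms_chores` is a hypothetical name.

```python
from itertools import product

def mms_chores(costs, n_agents):
    """Maxmin share for chores with additive costs (brute force).

    costs: one agent's cost for each chore.
    Returns the minimum, over all partitions into n_agents bundles,
    of the cost of the most expensive bundle.
    """
    best = float("inf")
    for assignment in product(range(n_agents), repeat=len(costs)):
        loads = [0] * n_agents
        for cost, bundle in zip(costs, assignment):
            loads[bundle] += cost
        best = min(best, max(loads))
    return best

# Usage: an allocation is alpha-MMS for this agent if her bundle
# costs at most alpha * share.
share = mms_chores([3, 3, 2, 2, 2], n_agents=2)
```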

Thursday 15 15:00 - 16:00 KRR|KRDUT - Knowledge Representation and Decision ; Utility Theory (2705-2706)

Chair: Ozaki Ana

#1597

How to Handle Missing Values in Multi-Criteria Decision Aiding?
Christophe Labreuche, Sébastien Destercke

Knowledge Representation and Decision ; Utility Theory

It is often the case in the applications of Multi-Criteria Decision Making that the values of alternatives are unknown on some attributes. An interesting situation arises when the attributes having missing values are actually not relevant and shall thus be removed from the model. Given a model that has been elicited on the complete set of attributes, we are looking thus for a way -- called restriction operator -- to automatically remove the missing attributes from this model. Axiomatic characterizations are proposed for three classes of models. For general quantitative models, the restriction operator is characterized by linearity, recursivity and decomposition on variables. The second class is the set of monotone quantitative models satisfying normalization conditions. The linearity axiom is changed to fit with these conditions. Adding recursivity and symmetry, the restriction operator takes the form of a normalized average. For the last class of models -- namely the Choquet integral, we obtain a simpler expression. Finally, a very intuitive interpretation is provided.
#2110

Neighborhood-Aware Attentional Representation for Multilingual Knowledge Graphs
Qiannan Zhu, Xiaofei Zhou, Jia Wu, Jianlong Tan, Li Guo

Knowledge Representation and Decision ; Utility Theory

Multilingual knowledge graphs constructed by entity alignment are indispensable resources for numerous AI-related applications. Most existing entity alignment methods only use triplet-based knowledge to find the aligned entities across multilingual knowledge graphs; they usually ignore the neighborhood subgraph knowledge of entities, which carries richer information for aligning entities. In this paper, we incorporate neighborhood subgraph-level information of entities, and propose a neighborhood-aware attentional representation method NAEA for multilingual knowledge graphs. NAEA devises an attention mechanism to learn neighbor-level representation by aggregating neighbors' representations with a weighted combination. The attention mechanism enables entities not only to capture different impacts of their neighbors on themselves, but also to attend over their neighbors' feature representations with different importance. We evaluate our model on two real-world datasets DBP15K and DWY100K, and the experimental results show that the proposed model NAEA significantly and consistently outperforms state-of-the-art entity alignment models.
#4971

BiOWA for Preference Aggregation with Bipolar Scales: Application to Fair Optimization in Combinatorial Domains
Hugo Martin, Patrice Perny

Knowledge Representation and Decision ; Utility Theory

We study the biOWA model for preference aggregation and multicriteria decision making from bipolar rating scales. A biOWA is an ordered doubly weighted averaging extending standard ordered weighted averaging (OWA) and allowing a finer control of the importance attached to positive and negative evaluations in the aggregation. After establishing some useful properties of biOWA to generate balanced Pareto-optimal solutions, we address fair biOWA-optimization problems in combinatorial domains. We first consider the use of biOWA in multi-winner elections for aggregating graded approval and disapproval judgements. Then we consider the use of biOWA for solving robust path problems with costs expressing gains and losses. A linearization of biOWA is proposed, allowing both problems to be solved by MIP. A path-ranking algorithm for biOWA optimization is also proposed. Numerical tests are provided to show the practical efficiency of our models.
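
For readers unfamiliar with the baseline that biOWA extends, here is a minimal sketch of a standard ordered weighted average. It assumes the common convention of sorting values in decreasing order before applying the weights (some fairness papers sort in increasing order instead). The doubly weighted biOWA itself is not reproduced here, since its definition is not given in the abstract.

```python
def owa(values, weights):
    """Standard OWA: weights apply to ranks, not to specific criteria."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)  # largest value first (assumed convention)
    return sum(w * v for w, v in zip(weights, ordered))

# Usage: putting all weight on the last rank yields the minimum,
# which is the most egalitarian special case.
worst_case = owa([1, 5, 3], [0, 0, 1])
balanced = owa([1, 5, 3], [0.5, 0.3, 0.2])
```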
#1888

Unit Selection Based on Counterfactual Logic
Ang Li, Judea Pearl

Knowledge Representation and Decision ; Utility Theory

The unit selection problem aims to identify a set of individuals who are most likely to exhibit a desired mode of behavior, which is defined in counterfactual terms. A typical example is that of selecting individuals who would respond one way if encouraged and a different way if not encouraged. Unlike previous works on this problem, which rely on ad-hoc heuristics, we approach this problem formally, using counterfactual logic, to properly capture the nature of the desired behavior. This formalism enables us to derive an informative selection criterion which integrates experimental and observational data. We demonstrate the superiority of this criterion over A/B-test-based approaches.

Thursday 15 15:00 - 16:00 ML|UL - Unsupervised Learning 2 (2601-2602)

Chair: Gao Quanxue

#683

Affine Equivariant Autoencoder
Xifeng Guo, En Zhu, Xinwang Liu, Jianping Yin

Unsupervised Learning 2

Existing deep neural networks mainly focus on learning transformation invariant features. However, it is the equivariant features that are more adequate for general purpose tasks. Unfortunately, little work has been devoted to learning equivariant features. To fill this gap, in this paper, we propose an affine equivariant autoencoder to learn features that are equivariant to the affine transformation in an unsupervised manner. The objective consists of the self-reconstruction of the original example and affine transformed example, and the approximation of the affine transformation function, where the reconstruction makes the encoder a valid feature extractor and the approximation encourages the equivariance. Extensive experiments are conducted to validate the equivariance and discriminative ability of the features learned by our affine equivariant autoencoder.
#4890

Object Detection based Deep Unsupervised Hashing
Rong-Cheng Tu, Xian-Ling Mao, Bo-Si Feng, Shu-ying Yu

Unsupervised Learning 2

Recently, similarity-preserving hashing methods have been extensively studied for large-scale image retrieval. Compared with unsupervised hashing, supervised hashing methods for labeled data usually perform better by utilizing semantic label information. Intuitively, for unlabeled data, the performance of unsupervised hashing methods will improve if we can first mine some supervised semantic 'label information' from unlabeled data and then incorporate the 'label information' into the training process. Thus, in this paper, we propose a novel Object Detection based Deep Unsupervised Hashing method (ODDUH). Specifically, a pre-trained object detection model is utilized to mine supervised 'label information', which is used to guide the learning process to generate high-quality hash codes. Extensive experiments on two public datasets demonstrate that the proposed method outperforms the state-of-the-art unsupervised hashing methods in the image retrieval task.
#5589

Learning K-way D-dimensional Discrete Embedding for Hierarchical Data Visualization and Retrieval
Xiaoyuan Liang, Martin Renqiang Min, Hongyu Guo, Guiling Wang

Unsupervised Learning 2

Traditional embedding approaches associate a real-valued embedding vector with each symbol or data point, which is equivalent to applying a linear transformation to "one-hot" encoding of discrete symbols or data objects. Despite their simplicity, these methods generate storage-inefficient representations and fail to effectively encode the internal semantic structure of data, especially when the number of symbols or data points and the dimensionality of the real-valued embedding vectors are large. In this paper, we propose a regularized autoencoder framework to learn compact Hierarchical K-way D-dimensional (HKD) discrete embedding of symbols or data points, aiming at capturing essential semantic structures of data. Experimental results on synthetic and real-world datasets show that our proposed HKD embedding can effectively reveal the semantic structure of data via hierarchical data visualization and greatly reduce the search space of nearest neighbor retrieval while preserving high accuracy.
#6538

Robust Low-Tubal-Rank Tensor Completion via Convex Optimization
Qiang Jiang, Michael Ng

Unsupervised Learning 2

This paper considers the problem of recovering a multidimensional array, in particular a third-order tensor, from a random subset of its arbitrarily corrupted entries. Our study is based on a recently proposed algebraic framework in which the tensor-SVD is introduced to capture the low-tubal-rank structure in tensors. We analyze the performance of a convex program, which minimizes a weighted combination of the tensor nuclear norm, a convex surrogate for the tensor tubal rank, and the tensor l1 norm. We prove that under certain incoherence conditions, this program can recover the tensor exactly with overwhelming probability, provided that its tubal rank is not too large and that the corruptions are reasonably sparse. The number of required observations is order optimal (up to a logarithmic factor) compared with the degrees of freedom of the low-tubal-rank tensor. Numerical experiments verify our theoretical results and real-world applications demonstrate the effectiveness of our algorithm.

Thursday 15 15:00 - 16:00 KRR|CCR - Computational Complexity of Reasoning 2 (2603-2604)

Chair: Rakib Abdur

#2442

DatalogMTL: Computational Complexity and Expressive Power
Przemysław A. Wałęga, Bernardo Cuenca Grau, Mark Kaminski, Egor V. Kostylev

Computational Complexity of Reasoning 2

We study the complexity and expressive power of DatalogMTL - a knowledge representation language that extends Datalog with operators from metric temporal logic (MTL) and which has found applications in ontology-based data access and stream reasoning. We establish tight PSpace data complexity bounds and also show that DatalogMTL extended with negation on input predicates can express all queries in PSpace; this implies that MTL operators add significant expressive power to Datalog. Furthermore, we provide tight combined complexity bounds for the forward-propagating fragment of DatalogMTL, which was proposed in the context of stream reasoning, and show that it is possible to express all PSpace queries in the fragment extended with the falsum predicate.
#2859

On Division Versus Saturation in Pseudo-Boolean Solving
Stephan Gocht, Jakob Nordström, Amir Yehudayoff

Computational Complexity of Reasoning 2

The conflict-driven clause learning (CDCL) paradigm has revolutionized SAT solving over the last two decades. Extending this approach to pseudo-Boolean (PB) solvers doing 0-1 linear programming holds the promise of further exponential improvements in theory, but intriguingly such gains have not materialized in practice. Also intriguingly, most PB extensions of CDCL use not the division rule in cutting planes as defined in [Cook et al., '87] but instead the so-called saturation rule. To the best of our knowledge, there has been no study comparing the strengths of division and saturation in the context of conflict-driven PB learning, when all linear combinations of inequalities are required to cancel variables. We show that PB solvers with division instead of saturation can be exponentially stronger. In the other direction, we prove that simulating a single saturation step can require an exponential number of divisions. We also perform some experiments to see whether these phenomena can be observed in actual solvers. Our conclusion is that a careful combination of division and saturation seems to be crucial to harness more of the power of cutting planes.
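
The two rules compared in this abstract can be stated concretely for a pseudo-Boolean constraint in normalized form, that is, a sum of literals with positive integer coefficients that must reach at least a given degree. The sketch below assumes this normalized form and represents a constraint as a dictionary plus an integer; the representation and function names are illustrative and not taken from any solver.

```python
def divide(coeffs, degree, c):
    """Division rule: divide by a positive integer c and round everything up.

    coeffs: dict mapping literal name -> positive integer coefficient.
    degree: right-hand side of 'sum(coeff * literal) >= degree'.
    """
    ceil_div = lambda a: -(-a // c)  # integer ceiling division
    return {lit: ceil_div(a) for lit, a in coeffs.items()}, ceil_div(degree)

def saturate(coeffs, degree):
    """Saturation rule: no coefficient needs to exceed the degree."""
    return {lit: min(a, degree) for lit, a in coeffs.items()}, degree

# Usage on 6x + 4y + 2z >= 5.
constraint = ({"x": 6, "y": 4, "z": 2}, 5)
divided = divide(*constraint, c=2)   # 3x + 2y + z >= 3
saturated = saturate(*constraint)    # 5x + 4y + 2z >= 5
```

Both derived constraints are implied by the original one over 0-1 values: division is sound because the rounded-up left-hand side is an integer that is at least degree / c, and saturation is sound because a single literal with coefficient at least the degree already satisfies the constraint on its own.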
#4218

Measuring the Likelihood of Numerical Constraints
Marco Console, Matthias Hofer, Leonid Libkin

Computational Complexity of Reasoning 2

Our goal is to measure the likelihood of the satisfaction of numerical constraints in the absence of prior information. We study expressive constraints, involving arithmetic and complex numerical functions, and even quantification over numbers. Such problems arise in processing incomplete data, or analyzing conditions in programs without a priori bounds on variables. We show that for constraints on n variables, the proper way to define such a measure is as the limit of the part of the n-dimensional ball that consists of points satisfying the constraints, when the radius increases. We prove that the existence of such a limit is closely related to the notion of o-minimality from model theory. Thus, for constraints definable with the usual arithmetic and exponentiation, the likelihood is well defined, but adding trigonometric functions is problematic. We look at computing and approximating such likelihoods for order and linear constraints, and prove an impossibility result for approximating with multiplicative error. However, as the likelihood is a number between 0 and 1, an approximation scheme with additive error is acceptable, and we give it for arbitrary linear constraints.
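
The ball-based definition can be explored numerically. The sketch below is a plain Monte Carlo estimate, assuming uniform sampling inside an n-dimensional ball of a fixed radius via rejection from the enclosing cube; it is not the approximation scheme from the paper, and `ball_fraction` is a hypothetical helper. Evaluating it for growing radii gives an empirical impression of the limit.

```python
import random

def ball_fraction(constraint, dim, radius, samples=20000, seed=0):
    """Estimate the fraction of the ball of the given radius whose
    points satisfy `constraint` (a function from a point to bool)."""
    rng = random.Random(seed)
    hits = accepted = 0
    while accepted < samples:
        point = [rng.uniform(-radius, radius) for _ in range(dim)]
        if sum(x * x for x in point) > radius * radius:
            continue  # outside the ball: reject and resample
        accepted += 1
        hits += bool(constraint(point))
    return hits / accepted

# Usage: the constraint y < x + 1 is not scale-invariant, so its
# satisfied fraction depends on the radius and settles as the ball grows.
estimates = [ball_fraction(lambda p: p[1] < p[0] + 1, 2, r)
             for r in (1.0, 10.0, 1000.0)]
```

Rejection sampling from the cube becomes inefficient in high dimensions, so this form is only practical for a handful of variables.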
#5122

Some Things are Easier for the Dumb and the Bright Ones (Beware the Average!)
Wojciech Jamroga, Michał Knapik

Computational Complexity of Reasoning 2

Model checking strategic abilities in multi-agent systems is hard, especially for agents with partial observability of the state of the system. In that case, it ranges from NP-complete to undecidable, depending on the precise syntax and the semantic variant. That, however, is the worst case complexity, and the problem might as well be easier when restricted to particular subclasses of inputs. In this paper, we look at the verification of models with "extreme" epistemic structure, and identify several special cases for which model checking is easier than in general. We also prove that, in the other cases, no gain is possible even if the agents have almost full (or almost nil) observability. To prove the latter kind of results, we develop generic techniques that may be useful also outside of this study.

Thursday 15 15:00 - 16:00 NLP|E - Embeddings (2605-2606)

Chair: Zhu Zichen

#1036

Multi-view Knowledge Graph Embedding for Entity Alignment
Qingheng Zhang, Zequn Sun, Wei Hu, Muhao Chen, Lingbing Guo, Yuzhong Qu

Embeddings

We study the problem of embedding-based entity alignment between knowledge graphs (KGs). Previous works mainly focus on the relational structure of entities. Some further incorporate another type of features, such as attributes, for refinement. However, a vast number of entity features are still unexplored or not equally treated together, which impairs the accuracy and robustness of embedding-based entity alignment. In this paper, we propose a novel framework that unifies multiple views of entities to learn embeddings for entity alignment. Specifically, we embed entities based on the views of entity names, relations and attributes, with several combination strategies. Furthermore, we design some cross-KG inference methods to enhance the alignment between two KGs. Our experiments on real-world datasets show that the proposed framework significantly outperforms the state-of-the-art embedding-based entity alignment methods. The selected views, cross-KG inference and combination strategies all contribute to the performance improvement.
#5038

Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities
Geewook Kim, Akifumi Okuno, Kazuki Fukui, Hidetoshi Shimodaira
Details | PDF

Embeddings

We propose weighted inner product similarity (WIPS) for neural network-based graph embedding. In addition to the parameters of neural networks, we optimize the weights of the inner product by allowing positive and negative values. Despite its simplicity, WIPS can approximate arbitrary general similarities including positive definite, conditionally positive definite, and indefinite kernels. WIPS is free from similarity model selection, since it can learn any similarity model, such as cosine similarity, negative Poincaré distance and negative Wasserstein distance. Our experiments show that the proposed method can learn high-quality distributed representations of nodes from real datasets, leading to an accurate approximation of similarities as well as high performance in inductive tasks.
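
As a rough illustration of the idea in the abstract above, a weighted inner product can be sketched as a sum of per-dimension products scaled by weights that may be positive or negative. This is a minimal sketch only: the function name, the example vectors and the fixed weights are assumptions made for illustration, and the paper learns the weights jointly with a neural network rather than fixing them by hand.

```python
from typing import Sequence


def weighted_inner_product(
    u: Sequence[float], v: Sequence[float], weights: Sequence[float]
) -> float:
    """Return sum_k weights[k] * u[k] * v[k]; weights may be negative."""
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("u, v and weights must have the same length")
    return sum(w * a * b for w, a, b in zip(weights, u, v))


# Hypothetical embedding vectors, chosen only for illustration.
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]

# All-ones weights reduce to the ordinary inner product: 4 + 10 + 18.
plain = weighted_inner_product(u, v, [1.0, 1.0, 1.0])

# A negative weight on the last dimension gives an indefinite form: 4 + 10 - 18.
mixed = weighted_inner_product(u, v, [1.0, 1.0, -1.0])
```

With all weights equal to one the similarity is the standard inner product; mixing signs is what lets the form go beyond positive definite similarities.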
#5878

Triplet Enhanced AutoEncoder: Model-free Discriminative Network Embedding
Yao Yang, Haoran Chen, Junming Shao
Details | PDF

Embeddings

Deep autoencoders are widely used in dimensionality reduction because of the expressive power of neural networks. They are therefore naturally suited to embedding tasks, which essentially compress high-dimensional information into a low-dimensional latent space. In terms of network representation, autoencoder-based methods such as SDNE and DNGR have achieved results comparable with the state of the art. However, none of them leverages label information, so the resulting embeddings lack discriminative power. In this paper, we present Triplet Enhanced AutoEncoder (TEA), a new deep network embedding approach from the perspective of metric learning. Equipped with the triplet-loss constraint, the proposed approach not only captures the topological structure but also preserves the discriminative information. Moreover, unlike existing discriminative embedding techniques, TEA is independent of any specific classifier; we call this the model-free property. Extensive empirical results on three public datasets (i.e., Cora, Citeseer and BlogCatalog) show that TEA is stable and achieves state-of-the-art performance compared with both supervised and unsupervised network embedding approaches on various percentages of labeled data. The source code can be obtained from https://github.com/yybeta/TEA.
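
The triplet-loss constraint mentioned in the abstract above is commonly written as a hinge on the gap between an anchor-positive distance and an anchor-negative distance. The sketch below shows that generic form only; the Euclidean distance, the margin value and the example points are assumptions for illustration, and the exact loss used in TEA may differ (see the authors' repository).

```python
import math
from typing import Sequence


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Generic triplet hinge loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = math.dist(anchor, positive)  # distance to a same-label sample
    d_neg = math.dist(anchor, negative)  # distance to a different-label sample
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, chosen only for illustration.
anchor = [0.0, 0.0]
positive = [1.0, 0.0]

# The negative is far away, so the margin is already satisfied.
loss_far = triplet_loss(anchor, positive, [3.0, 4.0])

# The negative is almost as close as the positive, so the loss is positive.
loss_near = triplet_loss(anchor, positive, [1.0, 1.0])
```

The loss is zero once the negative sample is at least `margin` farther from the anchor than the positive sample, which is what pushes differently labeled nodes apart in the embedding space.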
#5385

Unsupervised Embedding Enhancements of Knowledge Graphs using Textual Associations
Neil Veira, Brian Keng, Kanchana Padmanabhan, Andreas Veneris
Details | PDF

Embeddings

Knowledge graph embeddings are instrumental for representing and learning from multi-relational data, with recent embedding models showing high effectiveness for inferring new facts from existing databases. However, such precisely structured data is usually limited in quantity and in scope. Therefore, to fully optimize the embeddings it is important to also consider more widely available sources of information such as text. This paper describes an unsupervised approach to incorporate textual information by augmenting entity embeddings with embeddings of associated words. The approach does not modify the optimization objective for the knowledge graph embedding, which allows it to be integrated with existing embedding models. Two distinct forms of textual data are considered, with different embedding enhancements proposed for each case. In the first case, each entity has an associated text document that describes it. In the second case, a text document is not available, and instead entities occur as words or phrases in an unstructured corpus of text fragments. Experiments show that both methods can offer improvement on the link prediction task when applied to many different knowledge graph embedding models.

Thursday 15 15:00 - 16:00 CV|MT - Motion and Tracking (2501-2502)

Chair: Chao Ma

#161

A Deep Bi-directional Attention Network for Human Motion Recovery
Qiongjie Cui, Huaijiang Sun, Yupeng Li, Yue Kong
Details | PDF

Motion and Tracking

Human motion capture (mocap) data, recording the movement of markers attached to specific joints, has gradually become the most popular solution for animation production. However, the raw motion data are often corrupted due to joint occlusion, marker shedding and the lack of equipment precision, which severely limits the performance in real-world applications. Since human motion is essentially sequential data, the latest methods resort to variants of the long short-term memory network (LSTM) to solve related problems, but most of them tend to obtain visually unreasonable results. This is mainly because these methods hardly capture long-term dependencies and cannot explicitly utilize relevant context, especially in long sequences. To address these issues, we propose a deep bi-directional attention network (BAN) which can not only capture the long-term dependencies but also adaptively extract relevant information at each time step. Moreover, the proposed model, which embeds an attention mechanism in the bi-directional LSTM (BLSTM) structure at the encoding and decoding stages, can decide where to borrow information and use it to recover corrupted frames effectively. Extensive experiments on the CMU database demonstrate that the proposed model consistently outperforms other state-of-the-art methods in terms of recovery accuracy and visualization.
#1710

Rectified Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs
Chunlei Liu, Wenrui Ding, Xin Xia, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Bohan Zhuang, Guodong Guo
Details | PDF

Motion and Tracking

Binarized convolutional neural networks (BCNNs) are widely used to improve memory and computation efficiency of deep convolutional neural networks (DCNNs) for mobile and AI-chip-based applications. However, current BCNNs are not able to fully explore their corresponding full-precision models, causing a significant performance gap between them. In this paper, we propose rectified binary convolutional networks (RBCNs), towards optimized BCNNs, by combining full-precision kernels and feature maps to rectify the binarization process in a unified framework. In particular, we use a GAN to train the 1-bit binary network with the guidance of its corresponding full-precision model, which significantly improves the performance of BCNNs. The rectified convolutional layers are generic and flexible, and can be easily incorporated into existing DCNNs such as WideResNets and ResNets. Extensive experiments demonstrate the superior performance of the proposed RBCNs over state-of-the-art BCNNs. In particular, our method shows strong generalization on the object tracking task.
#4451

Capturing Spatial and Temporal Patterns for Facial Landmark Tracking through Adversarial Learning
Shi Yin, Shangfei Wang, Guozhu Peng, Xiaoping Chen, Bowen Pan
Details | PDF

Motion and Tracking

The spatial and temporal patterns inherent in facial feature points are crucial for facial landmark tracking, but have not been thoroughly explored yet. In this paper, we propose a novel deep adversarial framework to explore the shape and temporal dependencies from both appearance level and target label level. The proposed deep adversarial framework consists of a deep landmark tracker and a discriminator. The deep landmark tracker is composed of a stacked Hourglass network as well as a convolutional neural network and a long short-term memory network, and thus implicitly captures spatial and temporal patterns from facial appearance for facial landmark tracking. The discriminator is adopted to distinguish the tracked facial landmarks from ground truth ones. It explicitly models shape and temporal dependencies existing in ground truth facial landmarks through another convolutional neural network and another long short-term memory network. The deep landmark tracker and the discriminator compete with each other. Through adversarial learning, the proposed deep adversarial landmark tracking approach leverages inherent spatial and temporal patterns to facilitate facial landmark tracking from both appearance level and target label level. Experimental results on two benchmark databases demonstrate the superiority of the proposed approach over state-of-the-art work.
#5588

On Retrospecting Human Dynamics with Attention
Minjing Dong, Chang Xu
Details | PDF

Motion and Tracking

Deep recurrent neural networks have achieved impressive success in forecasting human motion with a sequence-to-sequence architecture. However, forecasting over longer time horizons often leads to implausible human poses or converges to mean poses, because of error accumulation and difficulties in keeping track of longer-term information. To address these challenges, we propose to retrospect human dynamics with attention. A retrospection module is designed upon RNN to regularly retrospect past frames and correct mistakes in time. This significantly improves the memory of RNN and provides sufficient information for the decoder networks to generate longer-term predictions. Moreover, we present a spatial attention module to explore and exploit cooperation among joints in performing a particular motion. Residual connections are also included to guarantee the performance of short-term prediction. We evaluate the proposed algorithm on Human 3.6M, the largest and most challenging dataset in the field. Experimental results demonstrate the necessity of investigating motion prediction in a self-audit manner and the effectiveness of the proposed algorithm in both short-term and long-term predictions.

Thursday 15 15:00 - 16:00 NLP|TC - Text Classification (2503-2504)

Chair: Sira Ferradans

#1998

GANs for Semi-Supervised Opinion Spam Detection
Gray Stanton, Athirai A. Irissappane
Details | PDF

Text Classification

Online reviews have become a vital source of information in purchasing a service (product). Opinion spammers manipulate reviews, affecting the overall perception of the service. A key challenge in detecting opinion spam is obtaining ground truth. Though there exists a large set of reviews, only a few of them have been labeled spam or non-spam. We propose spamGAN, a generative adversarial network which relies on limited labeled data as well as unlabeled data for opinion spam detection. spamGAN improves the state-of-the-art GAN based techniques for text classification. Experiments on TripAdvisor data show that spamGAN outperforms existing techniques when labeled data is limited. spamGAN can also generate reviews with reasonable perplexity.
#2158

Reading selectively via Binary Input Gated Recurrent Unit
Zhe Li, Peisong Wang, Hanqing Lu, Jian Cheng
Details | PDF

Text Classification

Recurrent Neural Networks (RNNs) have shown great promise in sequence modeling tasks. Gated Recurrent Unit (GRU) is one of the most used recurrent structures, which makes a good trade-off between performance and time spent. However, its practical implementation based on soft gates only partially achieves the goal of controlling information flow. We can hardly explain what the network has learnt internally. Inspired by human reading, we introduce binary input gated recurrent unit (BIGRU), a GRU based model using a binary input gate instead of the reset gate in GRU. By doing so, our model can read selectively during inference. In our experiments, we show that BIGRU mainly ignores the conjunctions, adverbs and articles that do not make a big difference to the document understanding, which is meaningful for us to further understand how the network works. In addition, due to reduced interference from redundant information, our model achieves better performance than the baseline GRU in all the testing tasks.
#4837

Dynamically Route Hierarchical Structure Representation to Attentive Capsule for Text Classification
Wanshan Zheng, Zibin Zheng, Hai Wan, Chuan Chen
Details | PDF

Text Classification

Representation learning and feature aggregation are usually the two key intermediate steps in natural language processing. Although deep neural networks have shown strong performance in the text classification task, they are unable to learn adaptive structure features automatically and lack a method for fully utilizing the extracted features. In this paper, we propose a novel architecture that dynamically routes hierarchical structure features to an attentive capsule, named HAC. Specifically, we first adopt intermediate information of a well-designed deep dilated CNN to form hierarchical structure features. Different levels of structure representations correspond to various linguistic units such as word, phrase and clause, respectively. Furthermore, we design a capsule module using dynamic routing and equip it with an attention mechanism. The attentive capsule implements an effective aggregation strategy for feature clustering and selection. Extensive results on eleven benchmark datasets demonstrate that the proposed model obtains competitive performance against several state-of-the-art baselines. Our code is available at https://github.com/zhengwsh/HAC.
#2304

Earlier Attention? Aspect-Aware LSTM for Aspect-Based Sentiment Analysis
Bowen Xing, Lejian Liao, Dandan Song, Jingang Wang, Fuzheng Zhang, Zhongyuan Wang, Heyan Huang
Details | PDF

Text Classification

Aspect-based sentiment analysis (ABSA) aims to predict fine-grained sentiments of comments with respect to given aspect terms or categories. In previous ABSA methods, the importance of aspect has been realized and verified. Most existing LSTM-based models take aspect into account via the attention mechanism, where the attention weights are calculated after the context is modeled in the form of contextual vectors. However, aspect-related information may already be discarded and aspect-irrelevant information may be retained in classic LSTM cells in the context modeling process, which can be improved to generate more effective context representations. This paper proposes a novel variant of LSTM, termed aspect-aware LSTM (AA-LSTM), which incorporates aspect information into LSTM cells in the context modeling stage before the attention mechanism. Therefore, our AA-LSTM can dynamically produce aspect-aware contextual representations. We experiment with several representative LSTM-based models by replacing the classic LSTM cells with the AA-LSTM cells. Experimental results on SemEval-2014 datasets demonstrate the effectiveness of AA-LSTM.

Thursday 15 15:00 - 16:00 MTA|D - Databases (2505-2506)

Chair: Xin Huang

#1811

K-Core Maximization: An Edge Addition Approach
Zhongxin Zhou, Fan Zhang, Xuemin Lin, Wenjie Zhang, Chen Chen
Details | PDF

Databases

A popular model to measure the stability of a network is k-core - the maximal induced subgraph in which every vertex has at least k neighbors. Many studies maximize the number of vertices in k-core to improve the stability of a network. In this paper, we study the edge k-core problem: Given a graph G, an integer k and a budget b, add b edges to non-adjacent vertex pairs in G such that the k-core is maximized. We prove the problem is NP-hard and APX-hard. A heuristic algorithm is proposed on general graphs with effective optimization techniques. Comprehensive experiments on 9 real-life datasets demonstrate the effectiveness and the efficiency of our proposed methods.
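
To make the k-core definition in the abstract above concrete, the sketch below computes a k-core by repeatedly peeling vertices with fewer than k remaining neighbors, then shows how adding one edge can enlarge it. It illustrates the definition only and is not the paper's heuristic; the function names and the toy graph are assumptions.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def k_core(adj: Graph, k: int) -> Set[Hashable]:
    """Return the vertex set of the k-core of an undirected graph."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed: Set[Hashable] = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue  # a vertex can be queued more than once
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed


def with_edge(adj: Graph, a: Hashable, b: Hashable) -> Graph:
    """Return a copy of the graph with the undirected edge (a, b) added."""
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    new_adj[a].add(b)
    new_adj[b].add(a)
    return new_adj


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

core_before = k_core(graph, 2)                      # d has only one neighbor
core_after = k_core(with_edge(graph, "d", "a"), 2)  # one added edge pulls d in
```

Here a budget of a single edge between the non-adjacent pair (d, a) grows the 2-core from three vertices to four, which is the kind of gain the edge k-core problem asks to maximize.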
#4409

Toward Efficient Navigation of Massive-Scale Geo-Textual Streams
Chengcheng Yang, Lisi Chen, Shuo Shang, Fan Zhu, Li Liu, Ling Shao
Details | PDF

Databases

With the popularization of portable devices, numerous applications continuously produce huge streams of geo-tagged textual data, thus posing challenges to indexing geo-textual streaming data efficiently, which is an important task in both data management and AI applications, e.g., real-time data stream mining and targeted advertising. This, however, is not possible with the state-of-the-art indexing methods as they focus on search optimizations of static datasets, and have high index maintenance cost. In this paper, we present NQ-tree, which combines new structure designs and self-tuning methods to navigate between update and search efficiency. Our contributions include: (1) the design of multiple stores each with a different emphasis on write-friendliness and read-friendliness; (2) utilizing data compression techniques to reduce the I/O cost; (3) exploiting both spatial and keyword information to improve the pruning efficiency; (4) proposing an analytical cost model, and using an online self-tuning method to achieve efficient accesses to different workloads. Experiments on two real-world datasets show that NQ-tree outperforms two well-designed baselines by up to 10×.
#5947

Pivotal Relationship Identification: The K-Truss Minimization Problem
Weijie Zhu, Mengqi Zhang, Chen Chen, Xiaoyang Wang, Fan Zhang, Xuemin Lin
Details | PDF

Databases

In a social network, the strength of relationships between users can significantly affect the stability of the network. In this paper, we use the k-truss model to measure the stability of a social network. To identify critical connections, we propose a novel problem, named k-truss minimization. Given a social network G and a budget b, it aims to find b edges for deletion which can lead to the maximum number of edge breaks in the k-truss of G. We show that the problem is NP-hard. To accelerate the computation, novel pruning rules are developed to reduce the candidate size. In addition, we propose an upper bound based strategy to further reduce the searching space. Comprehensive experiments are conducted over real social networks to demonstrate the efficiency and effectiveness of the proposed techniques.

Thursday 15 15:00 - 16:00 KRR|GSTR - Geometric, Spatial, and Temporal Reasoning 2 (2401-2402)

Chair: Yong Gao

#723

Graph Convolutional Networks using Heat Kernel for Semi-supervised Learning
Bingbing Xu, Huawei Shen, Qi Cao, Keting Cen, Xueqi Cheng
Details | PDF

Geometric, Spatial, and Temporal Reasoning 2

Graph convolutional networks have achieved remarkable success in semi-supervised learning on graph-structured data. The key to graph-based semi-supervised learning is capturing the smoothness of labels or features over nodes exerted by graph structure. Previous methods, both spectral and spatial, are devoted to defining graph convolution as a weighted average over neighboring nodes, and then learn graph convolution kernels to leverage the smoothness to improve the performance of graph-based semi-supervised learning. One open challenge is how to determine an appropriate neighborhood that reflects relevant information of smoothness manifested in graph structure. In this paper, we propose GraphHeat, leveraging the heat kernel to enhance low-frequency filters and enforce smoothness in the signal variation on the graph. GraphHeat leverages the local structure of the target node under heat diffusion to determine its neighboring nodes flexibly, without the constraint of order suffered by previous methods. GraphHeat achieves state-of-the-art results in the task of graph-based semi-supervised classification across three benchmark datasets: Cora, Citeseer and Pubmed.
#1837

Geo-ALM: POI Recommendation by Fusing Geographical Information and Adversarial Learning Mechanism
Wei Liu, Zhi-Jie Wang, Bin Yao, Jian Yin
Details | PDF

Geometric, Spatial, and Temporal Reasoning 2

Learning users' preferences from check-in data is important for POI recommendation. Yet, a user usually has visited only some POIs while most POIs are unvisited (i.e., negative samples). To leverage these “no-behavior” POIs, a typical approach is pairwise ranking, which constructs ranking pairs for the user and POIs. Although this approach is generally effective, the negative samples in ranking pairs are obtained randomly, which may fail to leverage “critical” negative samples in the model training. On the other hand, previous studies also utilized geographical features to improve the recommendation quality. Nevertheless, most previous works did not exploit geographical information comprehensively, which may also affect the performance. To alleviate these issues, we propose a geographical information based adversarial learning model (Geo-ALM), which can be viewed as a fusion of geographic features and generative adversarial networks. Its core idea is to learn the discriminator and generator interactively, by exploiting two granularities of geographic features (i.e., region and POI features). Experimental results show that Geo-ALM can achieve competitive performance compared to several state-of-the-art methods.
#3328

Profit-driven Task Assignment in Spatial Crowdsourcing
Jinfu Xia, Yan Zhao, Guanfeng Liu, Jiajie Xu, Min Zhang, Kai Zheng
Details | PDF

Geometric, Spatial, and Temporal Reasoning 2

In Spatial Crowdsourcing (SC) systems, mobile users are enabled to perform spatio-temporal tasks by physically traveling to specified locations with the SC platforms. SC platforms manage the systems and recruit mobile users to contribute to the SC systems, whose commercial success depends on the profit attained from the task requesters. In order to maximize its profit, an SC platform needs an online management mechanism to assign the tasks to suitable workers. How to assign the tasks to workers more cost-effectively with the spatio-temporal constraints is one of the most difficult problems in SC. To deal with this challenge, we propose a novel Profit-driven Task Assignment (PTA) problem, which aims to maximize the profit of the platform. Specifically, we first establish a task reward pricing model with tasks' temporal constraints (i.e., expected completion time and deadline). Then we adopt an optimal algorithm based on tree decomposition to achieve the optimal task assignment and propose greedy algorithms to improve the computational efficiency. Finally, we conduct extensive experiments using real and synthetic datasets, verifying the practicability of our proposed methods.
#3352

Aggressive Driving Saves More Time? Multi-task Learning for Customized Travel Time Estimation
Ruipeng Gao, Xiaoyu Guo, Fuyong Sun, Lin Dai, Jiayan Zhu, Chenxi Hu, Haibo Li
Details | PDF

Geometric, Spatial, and Temporal Reasoning 2

Estimating the origin-destination travel time is a fundamental problem in many location-based services for vehicles, e.g., ride-hailing, vehicle dispatching, and route planning. Recent work has made significant progress in accuracy, but it largely relies on GPS traces, which are too coarse to model many personalized driving events. In this paper, we propose Customized Travel Time Estimation (CTTE) that fuses GPS traces, smartphone inertial data, and road network within a deep recurrent neural network. It constructs a link traffic database with topology representation, speed statistics, and query distribution. It also uses inertial data to estimate the phone's arbitrary pose in the car, and detects fine-grained driving events. The multi-task learning structure predicts both traffic speed at public level and customized travel time at personal level. Extensive experiments on two real-world traffic datasets from Didi Chuxing have demonstrated the effectiveness of our approach.

Thursday 15 15:00 - 16:00 HSGP|CSO - Combinatorial Search and Optimisation 2 (2403-2404)

Chair: Takanori Maehara

#14

Deanonymizing Social Networks Using Structural Information
Ioannis Caragiannis, Evanthia Tsitsoka
Details | PDF

Combinatorial Search and Optimisation 2

We study the following fundamental graph problem that models the important task of deanonymizing social networks. We are given a graph representing an eponymous social network and another graph, representing an anonymous social network, which has been produced from the original one by removing some of its nodes and adding some noise on the links. Our objective is to correctly associate as many nodes of the anonymous network as possible with their corresponding nodes in the eponymous network. We present two algorithms that attack the problem by exploiting only the structure of the two graphs. The first one exploits bipartite matching computations and is relatively fast. The second one is a local search heuristic which can use the outcome of our first algorithm as an initial solution and further improve it. We have applied our algorithms on inputs that have been produced by well-known random models for the generation of social networks as well as on inputs that use real social networks. Our algorithms can tolerate noise at the level of up to 10%. Interestingly, our results provide further evidence as to which graph generation models are most suitable for modeling social networks and distinguish them from unrealistic ones.
#2921

An Efficient Evolutionary Algorithm for Minimum Cost Submodular Cover
Victoria G. Crawford
Details | PDF

Combinatorial Search and Optimisation 2

In this paper, the Minimum Cost Submodular Cover problem is studied, which is to minimize a modular cost function such that the monotone submodular benefit function is above a threshold. For this problem, an evolutionary algorithm EASC is introduced that achieves a constant, bicriteria approximation in expected polynomial time; this is the first polynomial-time evolutionary approximation algorithm for Minimum Cost Submodular Cover. To achieve this running time, ideas motivated by submodularity and monotonicity are incorporated into the evolutionary process, which likely will extend to other submodular optimization problems. In a practical application, EASC is demonstrated to outperform the greedy algorithm and converge faster than competing evolutionary algorithms for this problem.
#5149

Integrating Pseudo-Boolean Constraint Reasoning in Multi-Objective Evolutionary Algorithms
Miguel Terra-Neves, Inês Lynce, Vasco Manquinho
Details | PDF

Combinatorial Search and Optimisation 2

Constraint-based reasoning methods thrive in solving problem instances with a tight solution space. On the other hand, evolutionary algorithms are usually effective when it is not hard to satisfy the problem constraints. This dichotomy has been observed in many optimization problems. In the particular case of Multi-Objective Combinatorial Optimization (MOCO), recently proposed constraint-based algorithms have been shown to outperform more established evolutionary approaches when a given problem instance is hard to satisfy. In this paper, we propose the integration of constraint-based procedures in evolutionary algorithms for solving MOCO. First, a new core-based smart mutation operator is applied to individuals that do not satisfy all problem constraints. Additionally, a new smart improvement operator based on Minimal Correction Subsets is used to improve the quality of the population. Experimental results clearly show that the integration of these operators greatly improves the multi-objective evolutionary algorithms MOEA/D and NSGA-II. Moreover, even on problem instances with a tight solution space, the newly proposed algorithms outperform the state-of-the-art constraint-based approaches for MOCO.
#2975

Stochastic Constraint Propagation for Mining Probabilistic Networks
Anna Louise D. Latour, Behrouz Babaki, Siegfried Nijssen
Details | PDF

Combinatorial Search and Optimisation 2

A number of data mining problems on probabilistic networks can be modeled as Stochastic Constraint Optimization and Satisfaction Problems, i.e., problems that involve objectives or constraints with a stochastic component. Earlier methods for solving these problems used Ordered Binary Decision Diagrams (OBDDs) to represent constraints on probability distributions, which were decomposed into sets of smaller constraints and solved by Constraint Programming (CP) or Mixed Integer Programming (MIP) solvers. For the specific case of monotonic distributions, we propose an alternative method: a new propagator for a global OBDD-based constraint. We show that this propagator is (sub-)linear in the size of the OBDD, and maintains domain consistency. We experimentally evaluate the effectiveness of this global constraint in comparison to existing decomposition-based approaches, and show how this propagator can be used in combination with another data mining specific constraint present in CP systems. As test cases we use problems from the data mining literature.

Thursday 15 15:00 - 16:00 ML|MMM - Multi-instance;Multi-label;Multi-view learning 1 (2405-2406)

Chair: Xin Geng

#2145

Accelerating Extreme Classification via Adaptive Feature Agglomeration
Ankit Jalan, Purushottam Kar
Details | PDF

Multi-instance;Multi-label;Multi-view learning 1

Extreme classification seeks to assign to each data point the most relevant labels from a universe of a million or more labels. This task is faced with the dual challenge of high precision and scalability, with millisecond level prediction times being a benchmark. We propose DEFRAG, an adaptive feature agglomeration technique to accelerate extreme classification algorithms. Despite past works on feature clustering and selection, DEFRAG distinguishes itself in being able to scale to millions of features, and is especially beneficial when feature sets are sparse, which is typical of recommendation and multi-label datasets. The method comes with provable performance guarantees and performs efficient task-driven agglomeration to reduce feature dimensionalities by an order of magnitude or more. Experiments show that DEFRAG can not only reduce training and prediction times of several leading extreme classification algorithms by as much as 40%, but also be used for feature reconstruction to address the problem of missing features, as well as offer superior coverage on rare labels.
#4179

Latent Semantics Encoding for Label Distribution Learning
Suping Xu, Lin Shang, Furao Shen
Details | PDF

Multi-instance;Multi-label;Multi-view learning 1

Label distribution learning (LDL) is a newly arisen learning paradigm to deal with label ambiguity problems, which can explore the relative importance of different labels in the description of a particular instance. Although some existing LDL algorithms have achieved good effectiveness in real applications, most of them typically emphasize improving the learning ability by manipulating the label space, while ignoring the fact that irrelevant and redundant features exist in most practical classification learning tasks, which increase not only storage requirements but also computational overheads. Furthermore, noise in data acquisition has negative effects on the generalization performance of LDL algorithms. In this paper, we propose a novel algorithm, i.e., Latent Semantics Encoding for Label Distribution Learning (LSE-LDL), which learns the label distribution and implements feature selection simultaneously under the guidance of latent semantics. Specifically, to alleviate noise disturbances, we seek and encode discriminative original physical/chemical features into advanced latent semantic features, and then construct a mapping from the encoded semantic space to the label space via empirical risk minimization. Empirical studies on 15 real-world data sets validate the effectiveness of the proposed algorithm.
#5021

Discriminative and Correlative Partial Multi-Label Learning
Haobo Wang, Weiwei Liu, Yang Zhao, Chen Zhang, Tianlei Hu, Gang Chen
Details | PDF

Multi-instance;Multi-label;Multi-view learning 1

In partial multi-label learning (PML), each instance is associated with a candidate label set that contains multiple relevant labels and other false positive labels. The most challenging issue for PML is that the training procedure is prone to being affected by labeling noise. We observe that state-of-the-art PML methods are either powerless to disambiguate the correct labels from the candidate labels or incapable of extracting the label correlations sufficiently. To fill this gap, a two-stage DiscRiminative and correlAtive partial Multi-label leArning (DRAMA) algorithm is presented in this work. In the first stage, a confidence value is learned for each label by utilizing the feature manifold, which indicates how likely a label is correct. In the second stage, a gradient boosting model is induced to fit the label confidences. Specifically, to explore the label correlations, we augment the feature space by the previously elicited labels on each boosting round. Extensive experiments on various real-world datasets clearly validate the superiority of our proposed method.
#5457

Spectral Perturbation Meets Incomplete Multi-view Data
Hao Wang, Linlin Zong, Bing Liu, Yan Yang, Wei Zhou
Details | PDF

Multi-instance;Multi-label;Multi-view learning 1

Beyond existing multi-view clustering, this paper studies a more realistic clustering scenario, referred to as incomplete multi-view clustering, where a number of data instances are missing in certain views. To tackle this problem, we explore spectral perturbation theory. In this work, we show a strong link between perturbation risk bounds and incomplete multi-view clustering. That is, as the similarity matrix fed into spectral clustering is a quantity bounded in magnitude O(1), we transfer the missing problem from data to similarity and tailor a matrix completion method for the incomplete similarity matrix. Moreover, we show that the minimization of perturbation risk bounds among different views maximizes the final fusion result across all views. This provides a solid fusion criterion for multi-view data. We motivate and propose a Perturbation-oriented Incomplete multi-view Clustering (PIC) method. Experimental results demonstrate the effectiveness of the proposed method.

Thursday 15 15:00 - 16:00 Early Career 5 - Early Career Spotlight 5 (2306)

Chair: Qingsong Wen

#11066

AI Planning for Enterprise: Putting Theory Into Practice
Shirin Sohrabi
Details | PDF

Early Career Spotlight 5

In this paper, I overview a number of AI Planning applications for Enterprise and discuss a number of challenges in applying AI Planning in that setting. I will also summarize the progress made to date in addressing these challenges.
#11064

Domain-Dependent and Domain-Independent Problem Solving Techniques
Roni Stern
Details | PDF

Early Career Spotlight 5

Heuristic search is a general problem-solving method. Some heuristic search algorithms, like the well-known A* algorithm, are domain-independent, in the sense that their knowledge of the problem at hand is limited to the (1) initial state, (2) state transition operators and their costs, (3) goal-test function, and (4) black-box heuristic function that estimates the value of a state. Prominent examples are A* and Weighted A*. Other heuristic search algorithms are domain-dependent, that is, customized to solve problems from a specific domain. A well-known example is conflict-directed A*, which is specifically designed to solve model-based diagnosis problems. In this paper, we review our recent advancements in both domain-independent and domain-dependent heuristic search, and outline several challenging open questions.
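To make the notion of domain independence concrete, the following is a minimal Python sketch of textbook A* whose only knowledge of the problem arrives through the four items listed in the abstract. The function name `a_star`, its argument names, and the toy graph are assumptions made for illustration; they are not taken from the paper.

```python
import heapq
from itertools import count


def a_star(initial, successors, is_goal, heuristic):
    """Textbook A*. The search sees the problem only through its arguments:
    (1) initial state, (2) successors(state) -> iterable of (next_state, cost),
    (3) is_goal(state) -> bool, (4) heuristic(state) -> estimated cost to go.
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    frontier = [(heuristic(initial), next(tie), 0, initial, [initial])]
    best_cost = {initial: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return g, path
        if g > best_cost.get(state, float("inf")):
            continue  # stale queue entry: a cheaper path was found later
        for nxt, step_cost in successors(state):
            new_g = g + step_cost
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + heuristic(nxt), next(tie), new_g, nxt, path + [nxt]),
                )
    return None  # no goal state is reachable


# Hypothetical toy domain: a small weighted graph with an admissible heuristic.
GRAPH = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
H = {"A": 2, "B": 2, "C": 1, "D": 0}

result = a_star("A", lambda s: GRAPH[s], lambda s: s == "D", lambda s: H[s])
```

Swapping in a different domain only requires different `successors`, `is_goal`, and `heuristic` callables; the search routine itself stays unchanged, which is the sense of "domain-independent" used above.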

Thursday 15 16:30 - 17:00 Industry Days (K)

Chair: Jun Luo (Lenovo)

Meet security with AI: Alibaba's Security and Management in Practice
Quan Lu, Principal Engineer, Alibaba Group

Industry Days

Thursday 15 16:30 - 18:00 AI-HWB - ST: AI for Improving Human Well-Being 8 (J)

Chair: Dino Pedreschi

#1854

Who Should Pay the Cost: A Game-theoretic Model for Government Subsidized Investments to Improve National Cybersecurity
Xinrun Wang, Bo An, Hau Chan
Details | PDF

ST: AI for Improving Human Well-Being 8

Due to the recent cyber attacks, cybersecurity is becoming more critical in modern society. A single attack (e.g., WannaCry ransomware attack) can cause as much as $4 billion in damage. However, the cybersecurity investment by companies is far from satisfactory. Therefore, governments (e.g., in the UK) launch grants and subsidies to help companies to boost their cybersecurity to create a safer national cyber environment. The allocation problem is hard due to limited subsidies and the interdependence between self-interested companies and the presence of a strategic cyber attacker. To tackle the government's allocation problem, we introduce a Stackelberg game-theoretic model where the government first commits to an allocation and the companies/users and attacker simultaneously determine their protection and attack (pure or mixed) strategies, respectively. For the pure-strategy case, while there may not be a feasible allocation in general, we prove that computing an optimal allocation is NP-hard and propose a linear reverse convex program when the attacker can attack all users. For the mixed-strategy case, we show that there is a polynomial time algorithm to find an optimal allocation when the attacker has a single-attack capability. We then provide a heuristic algorithm, based on best-response-gradient dynamics, to find an effective allocation in the general setting. Experimentally, we show that our heuristic is effective and outperforms other baselines on synthetic and real data.
#6038

Improving Law Enforcement Daily Deployment Through Machine Learning-Informed Optimization under Uncertainty
Jonathan Chase, Duc Thien Nguyen, Haiyang Sun, Hoong Chuin Lau
Details | PDF

ST: AI for Improving Human Well-Being 8

Urban law enforcement agencies are under great pressure to respond to emergency incidents effectively while operating within restricted budgets. Minutes saved on emergency response times can save lives and catch criminals, and a responsive police force can deter crime and bring peace of mind to citizens. To efficiently minimize the response times of a law enforcement agency operating in a dense urban environment with limited manpower, we consider in this paper the problem of optimizing the spatial and temporal deployment of law enforcement agents to predefined patrol regions in a real-world scenario informed by machine learning. To this end, we develop a mixed integer linear optimization formulation (MIP) to minimize the risk of failing response time targets. Given the stochasticity of the environment in terms of incident numbers, location, timing, and duration, we use Sample Average Approximation (SAA) to find a robust deployment plan. To overcome the sparsity of real data, samples are provided by an incident generator that learns the spatio-temporal distribution and demand parameters of incidents from a real world historical dataset and generates sets of training incidents accordingly. To improve runtime performance across multiple samples, we implement a heuristic based on Iterated Local Search (ILS), as the solution is intended to create deployment plans quickly on a daily basis. Experimental results demonstrate that ILS performs well against the integer model while offering substantial gains in execution time.
#2718

PI-Bully: Personalized Cyberbullying Detection with Peer Influence
Lu Cheng, Jundong Li, Yasin Silva, Deborah Hall, Huan Liu
Details | PDF

ST: AI for Improving Human Well-Being 8

Cyberbullying has become one of the most pressing online risks for adolescents and has raised serious concerns in society. Recent years have witnessed a surge in research aimed at developing principled learning models to detect cyberbullying behaviors. These efforts have primarily focused on building a single generic classification model to differentiate bullying content from normal (non-bullying) content among all users. These models treat users equally and overlook idiosyncratic information about users that might facilitate the accurate detection of cyberbullying. In this paper, we propose a personalized cyberbullying detection framework, PI-Bully, that draws on empirical findings from psychology highlighting unique characteristics of victims and bullies and peer influence from like-minded users as predictors of cyberbullying behaviors. Our framework is novel in its ability to model peer influence in a collaborative environment and tailor cyberbullying prediction for each individual user. Extensive experimental evaluations on real-world datasets corroborate the effectiveness of the proposed framework.
#1914

Failure-Scenario Maker for Rule-Based Agent using Multi-agent Adversarial Reinforcement Learning and its Application to Autonomous Driving
Akifumi Wachi
Details | PDF

ST: AI for Improving Human Well-Being 8

We examine the problem of adversarial reinforcement learning for multi-agent domains including a rule-based agent. Rule-based algorithms are required in safety-critical applications for them to work properly in a wide range of situations. Hence, every effort is made to find failure scenarios during the development phase. However, as the software becomes complicated, finding failure cases becomes difficult. Especially in multi-agent domains, such as autonomous driving environments, it is much harder to find useful failure scenarios that help us improve the algorithm. We propose a method for efficiently finding failure scenarios; this method trains the adversarial agents using multi-agent reinforcement learning such that the tested rule-based agent fails. We demonstrate the effectiveness of our proposed method using a simple environment and autonomous driving simulator.
#6085

Decision Making for Improving Maritime Traffic Safety Using Constraint Programming
Saumya Bhatnagar, Akshat Kumar, Hoong Chuin Lau
Details | PDF

ST: AI for Improving Human Well-Being 8

Maritime navigational safety is of utmost importance to prevent vessel collisions in heavily trafficked ports, and avoid environmental costs. In case of a likely near miss among vessels, port traffic controllers provide assistance for safely navigating the waters, often at very short lead times. A better strategy is to prevent such situations from happening at all. To achieve this, we a) formalize the decision model for traffic hotspot mitigation including realistic maritime navigational features and constraints through consultations with domain experts; and b) develop a constraint programming based scheduling approach to mitigate hotspots. We model the problem as a variant of the resource constrained project scheduling problem to adjust vessel movement schedules such that the average delay is minimized and navigational safety constraints are also satisfied. We conduct a thorough evaluation on key performance indicators using real world data, and demonstrate the effectiveness of our approach in mitigating high-risk situations.
#6461

Enhancing Stock Movement Prediction with Adversarial Training
Fuli Feng, Huimin Chen, Xiangnan He, Ji Ding, Maosong Sun, Tat-Seng Chua
Details | PDF

ST: AI for Improving Human Well-Being 8

This paper contributes a new machine learning solution for stock movement prediction, which aims to predict whether the price of a stock will be up or down in the near future. The key novelty is that we propose to employ adversarial training to improve the generalization of a neural network prediction model. The rationale for adversarial training here is that the input features to stock prediction are typically based on stock price, which is essentially a stochastic variable that changes continuously over time. As such, normal training with static price-based features (e.g. the close price) can easily overfit the data, being insufficient to obtain reliable models. To address this problem, we propose to add perturbations to simulate the stochasticity of the price variable, and train the model to work well under small yet intentional perturbations. Extensive experiments on two real-world stock datasets show that our method outperforms the state-of-the-art solution [Xu and Cohen, 2018] with 3.11% relative improvements on average w.r.t. accuracy, validating the usefulness of adversarial training for the stock prediction task.

Thursday 15 16:30 - 18:00 ML|DL - Deep Learning 7 (L)

Chair: Lianli Gao

#720

Deep Cascade Generation on Point Sets
Kaiqi Wang, Ke Chen, Kui Jia
Details | PDF

Deep Learning 7

This paper proposes a deep cascade network to generate 3D geometry of an object on a point cloud, consisting of a set of permutation-insensitive points. Such a surface representation is easy to learn from, but inhibits exploiting rich low-dimensional topological manifolds of the object shape due to lack of geometric connectivity. To benefit from its simple structure while utilizing rich neighborhood information across points, this paper proposes a two-stage cascade model on point sets. Specifically, our method adopts the state-of-the-art point set autoencoder to generate a sparsely coarse shape first, and then locally refines it by encoding neighborhood connectivity on a graph representation. An ensemble of sparse refined surfaces is designed to alleviate the suffering from local minima caused by modeling complex geometric manifolds. Moreover, our model develops a dynamically-weighted loss function for jointly penalizing the generation output of cascade levels at different training stages in a coarse-to-fine manner. Comparative evaluation on the public ShapeNet benchmark dataset demonstrates superior performance of the proposed model to the state-of-the-art methods on both single-view shape reconstruction and shape autoencoding applications.
#766

Hypergraph Induced Convolutional Manifold Networks
Taisong Jin, Liujuan Cao, Baochang Zhang, Xiaoshuai Sun, Cheng Deng, Rongrong Ji
Details | PDF

Deep Learning 7

Deep convolutional neural networks (DCNN) with manifold embedding have attracted considerable attention in computer vision. However, prior art is usually based on neighborhood-based graphs that model only the pairwise relationship between two samples, which fail to fully capture intra-class variations and thus suffer from severe performance loss on noisy data. Since such intra-class variations can be well captured by a sophisticated hypergraph structure, we are motivated to propose a hypergraph induced Convolutional Manifold Network (H-CMN) to significantly improve the representation capacity of DCNN for complex data. Specifically, two innovative designs are provided: 1) our manifold preserving method is implemented based on a mini-batch, which can be efficiently plugged into existing DCNN training pipelines and scales to large datasets; 2) a robust hypergraph is built for each mini-batch, which not only offers strong robustness against typical noise, but also captures the variances from multiple features. Extensive experiments on the image classification task on large benchmarking datasets demonstrate that our model achieves much better performance than the state-of-the-art.
#1247

Theoretical Investigation of Generalization Bound for Residual Networks
Hao Chen, Zhanfeng Mo, Zhouwang Yang, Xiao Wang
Details | PDF

Deep Learning 7

This paper presents a framework for norm-based capacity control with respect to an lp,q-norm in weight-normalized Residual Neural Networks (ResNets). We first formulate the representation of each residual block. For the regression problem, we analyze the Rademacher Complexity of the ResNets family. We also establish a tighter generalization upper bound for weight-normalized ResNets in a more general setting. Using the lp,q-norm weight normalization in which 1/p+1/q >= 1, we discuss the properties of a width-independent capacity control, which only relies on the depth according to a square root term. Several comparisons suggest that our result is tighter than previous work. Parallel results for Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN) are included by introducing the lp,q-norm weight normalization for DNN and the lp,q-norm kernel normalization for CNN. Numerical experiments also verify that ResNet structures contribute to better generalization properties.
#1606

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks
Qiuqiang Kong, Yong Xu, Philip J. B. Jackson, Wenwu Wang, Mark D. Plumbley
Details | PDF

Deep Learning 7

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture. Single-channel signal separation and deconvolution is a challenging problem in which no prior knowledge of the mixing filters is available. Both individual sources and mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise which is unseen in the training set. We propose a synthesizing-decomposition (S-D) approach to solve the single-channel separation and deconvolution problem. In synthesizing, a generative model for sources is built using a generative adversarial network (GAN). In decomposition, both mixing filters and sources are optimized to minimize the reconstruction error of the mixture. The proposed S-D approach achieves a peak signal-to-noise ratio (PSNR) of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming a baseline convolutional neural network PSNR of 15.3 dB and 12.2 dB, respectively, and achieves a PSNR of 13.2 dB in source separation together with deconvolution, outperforming a convolutive non-negative matrix factorization (NMF) baseline of 10.1 dB.
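For readers unfamiliar with the PSNR figures quoted above, here is a minimal Python sketch of the standard definition, 10 log10(peak^2 / MSE), in decibels. The function name `psnr`, the flat-list signal representation, and the default peak value of 1.0 are assumptions for illustration; the abstract does not state which peak value or normalization the authors used.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals.

    `peak` is the maximum possible signal value (assumed 1.0 here; 255 would
    be the usual choice for 8-bit images).
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: error-free reconstruction
    return 10.0 * math.log10(peak ** 2 / mse)


# A constant error of 0.1 on a unit-peak signal gives an MSE of 0.01.
example_db = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.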
#2342

Group Reconstruction and Max-Pooling Residual Capsule Network
Xinpeng Ding, Nannan Wang, Xinbo Gao, Jie Li, Xiaoyu Wang
Details | PDF

Deep Learning 7

In capsule networks, the mapping of low-level capsules to high-level capsules is achieved by a routing-by-agreement algorithm. Since the capsule is made up of collections of neurons and the routing mechanism involves all the capsules instead of simply discarding some of the neurons like Max-Pooling, the capsule network has stronger representation ability than the traditional neural network. However, considering too much information from low-level capsules causes the corresponding upper-layer capsules to be disturbed by irrelevant information or noise capsules. Therefore, the original capsule network does not perform well on complex data structures. What's worse, computational complexity becomes a bottleneck in dealing with large data networks. To address these shortcomings, this paper proposes a group reconstruction and max-pooling residual capsule network (GRMR-CapsNet). We build a block in which all capsules are divided into different groups and perform a group reconstruction routing algorithm to obtain the corresponding high-level capsules. Between the lower and higher layers, Capsule Max-Pooling is adopted to prevent overfitting. We conduct experiments on CIFAR-10/100 and SVHN datasets and the results show that our method performs better than state-of-the-art methods.
#2851

AttnSense: Multi-level Attention Mechanism For Multimodal Human Activity Recognition
HaoJie Ma, Wenzhong Li, Xiao Zhang, Songcheng Gao, Sanglu Lu
Details | PDF

Deep Learning 7

Sensor-based human activity recognition is a fundamental research problem in ubiquitous computing, which uses the rich sensing data from multimodal embedded sensors such as accelerometers and gyroscopes to infer human activities. The existing activity recognition approaches either rely on domain knowledge or fail to address the spatial-temporal dependencies of the sensing signals. In this paper, we propose a novel attention-based multimodal neural network model called AttnSense for multimodal human activity recognition. AttnSense introduces a framework that combines an attention mechanism with a convolutional neural network (CNN) and a Gated Recurrent Unit (GRU) network to capture the dependencies of sensing signals in both spatial and temporal domains, which shows advantages in prioritized sensor selection and improves comprehensibility. Extensive experiments based on three public datasets show that AttnSense achieves a competitive performance in activity recognition compared with several state-of-the-art methods.

Thursday 15 16:30 - 18:00 ML|RL - Reinforcement Learning 6 (2701-2702)

Chair: Yang Yu

#2230

Measuring Structural Similarities in Finite MDPs
Hao Wang, Shaokang Dong, Ling Shao
Details | PDF

Reinforcement Learning 6

In this paper, we investigate the structural similarities within a finite Markov decision process (MDP). We view a finite MDP as a heterogeneous directed bipartite graph and propose novel measures for state similarity and action similarity in a mutual reinforcement manner. We prove that the state similarity is a metric and the action similarity is a pseudometric. We also establish the connection between the proposed similarity measures and the optimal values of the MDP. Extensive experiments show that the proposed measures are effective.
#3799

Solving Continual Combinatorial Selection via Deep Reinforcement Learning
Hyungseok Song, Hyeryung Jang, Hai H. Tran, Se-eun Yoon, Kyunghwan Son, Donggyu Yun, Hyoju Chung, Yung Yi
Details | PDF

Reinforcement Learning 6

We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable to solve with typical reinforcement learning (RL) algorithms especially when the number of items is huge. In this paper, we present a deep RL algorithm to solve this issue by adopting the following key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. IS-MDP decomposes a joint action of selecting K items simultaneously into K iterative selections resulting in the decrease of actions at the expense of an exponential increase of states. Second, we overcome this state space explosion by exploiting a special symmetry in IS-MDPs with novel weight shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments with item spaces different from those used in training.
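The reduction in action count obtained by decomposing one joint selection into K iterative selections can be illustrated with a little arithmetic. The sketch below assumes that a joint action is an unordered subset of K distinct items out of N and that each iterative step picks one item not yet chosen; both assumptions and the function names are illustrative and not taken from the paper.

```python
from math import comb


def joint_action_count(num_items, k):
    """Number of ways to pick k distinct items at once (unordered subset)."""
    return comb(num_items, k)


def iterative_action_count(num_items, step):
    """Choices available at a given iterative step (0-based), assuming
    previously selected items cannot be selected again."""
    return num_items - step


N, K = 100, 5
joint = joint_action_count(N, K)
per_step = [iterative_action_count(N, step) for step in range(K)]
```

With these toy numbers a single joint decision ranges over tens of millions of subsets, whereas each iterative decision ranges over at most N items; the price, as the abstract notes, is that the partially built selection has to be tracked as part of the state.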
#5227

Reinforcement Learning Experience Reuse with Policy Residual Representation
WenJi Zhou, Yang Yu, Yingfeng Chen, Kai Guan, Tangjie Lv, Changjie Fan, Zhi-Hua Zhou
Details | PDF

Reinforcement Learning 6

Experience reuse is key to sample-efficient reinforcement learning. One of the critical issues is how the experience is represented and stored. Previously, the experience can be stored in the forms of features, individual models, and the average model, each lying at a different granularity. However, new tasks may require experience across multiple granularities. In this paper, we propose the policy residual representation (PRR) network, which can extract and store multiple levels of experience. PRR network is trained on a set of tasks with a multi-level architecture, where a module in each level corresponds to a subset of the tasks. Therefore, the PRR network represents the experience in a spectrum-like way. When training on a new task, PRR can provide different levels of experience for accelerating the learning. We experiment with the PRR network on a set of grid world navigation tasks, locomotion tasks, and fighting tasks in a video game. The results show that the PRR network leads to better reuse of experience and thus outperforms some state-of-the-art approaches.
#5663

Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains
Matthieu Zimmer, Paul Weng
Details | PDF

Reinforcement Learning 6

In the context of learning deterministic policies in continuous domains, we revisit an approach, which was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent performance empirically, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation to motivate this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss in depth the properties of these updates to get a deeper understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate in several classic control problems that it surpasses the state-of-the-art algorithms to learn deterministic policies.
#5837

Assumed Density Filtering Q-learning
Heejin Jeong, Clark Zhang, George J. Pappas, Daniel D. Lee
Details | PDF

Reinforcement Learning 6

While off-policy temporal difference (TD) methods have widely been used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have not been utilized as frequently. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods, called ADFQ, which updates beliefs on state-action values, Q, through an online Bayesian inference method known as Assumed Density Filtering. We formulate an efficient closed-form solution for the value update by approximately estimating analytic parameters of the posterior of the Q-beliefs. Uncertainty measures in the beliefs not only are used in exploration but also provide a natural regularization for the value update considering all next available actions. ADFQ converges to Q-learning as the uncertainty measures of the Q-beliefs decrease and improves common drawbacks of other Bayesian RL algorithms such as computational complexity. We extend ADFQ with a neural network. Our empirical results demonstrate that ADFQ outperforms comparable algorithms on various Atari 2600 games, with drastic improvements in highly stochastic domains or domains with a large action space.
#6657

Curriculum Learning for Cumulative Return Maximization
Francesco Foglino, Christiano Coletto Christakou, Ricardo Luna Gutierrez, Matteo Leonetti
Details | PDF

Reinforcement Learning 6

Curriculum learning has been successfully used in reinforcement learning to accelerate the learning process, through knowledge transfer between tasks of increasing complexity. Critical tasks, in which suboptimal exploratory actions must be minimized, can benefit from curriculum learning, and its ability to shape exploration through transfer. We propose a task sequencing algorithm maximizing the cumulative return, that is, the return obtained by the agent across all the learning episodes. By maximizing the cumulative return, the agent not only aims at achieving high rewards as fast as possible, but also at doing so while limiting suboptimal actions. We experimentally compare our task sequencing algorithm to several popular metaheuristic algorithms for combinatorial optimization, and show that it achieves significantly better performance on the problem of cumulative return maximization. Furthermore, we validate our algorithm on a critical task, optimizing a home controller for a micro energy grid.

Thursday 15 16:30 - 18:00 AMS|CSC - Computational Social Choice 5 (2703-2704)

Chair: Sujit Gujar

#980

Election with Bribe-Effect Uncertainty: A Dichotomy Result
Lin Chen, Lei Xu, Shouhuai Xu, Zhimin Gao, Weidong Shi
Details | PDF

Computational Social Choice 5

We consider the electoral bribery problem in computational social choice. In this context, extensive studies have been carried out to analyze the computational vulnerability of various voting (or election) rules. However, essentially all prior studies assume a deterministic model where each voter has an associated threshold value, which is used as follows. A voter will take a bribe and vote according to the attacker's (i.e., briber's) preference when the amount of the bribe is above the threshold, and a voter will not take a bribe when the amount of the bribe is not above the threshold (in this case, the voter will vote according to its own preference, rather than the attacker's). In this paper, we initiate the study of a more realistic model where each voter is associated with a willingness function, rather than a fixed threshold value. The willingness function characterizes the likelihood a bribed voter would vote according to the attacker's preference; we call this bribe-effect uncertainty. We characterize the computational complexity of the electoral bribery problem in this new model. In particular, we discover a dichotomy result: a certain mathematical property of the willingness function dictates whether or not the computational hardness can serve as a deterrence to bribery attackers.
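The contrast between the deterministic threshold model and the willingness-function model can be sketched in a few lines of Python. The function names and the particular linear willingness curve are hypothetical choices for illustration; the abstract does not specify the form of the willingness function.

```python
import random


def follows_attacker_threshold(bribe, threshold):
    """Deterministic model: the voter complies only if the bribe is
    strictly above the voter's threshold."""
    return bribe > threshold


def linear_willingness(bribe, saturation=10.0):
    """Hypothetical willingness function: the probability of complying grows
    linearly with the bribe and is capped at 1 once `saturation` is reached."""
    return max(0.0, min(1.0, bribe / saturation))


def follows_attacker_willingness(bribe, willingness, rng):
    """Bribe-effect uncertainty: a bribed voter complies with probability
    willingness(bribe) instead of with certainty."""
    return rng.random() < willingness(bribe)


def expected_compliant_voters(bribes, willingness):
    """Expected number of voters who vote as the attacker wishes."""
    return sum(willingness(b) for b in bribes)


rng = random.Random(0)  # seeded only to make the sketch reproducible
outcome = follows_attacker_willingness(5.0, linear_willingness, rng)
```

Under the threshold model the attacker knows exactly what each payment buys; under the willingness model only the expected number of compliant voters is known, so the attacker's optimization is over a probabilistic outcome.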
#1030

A Parameterized Perspective on Protecting Elections
Palash Dey, Neeldhara Misra, Swaprava Nath, Garima Shakya
Details | PDF

Computational Social Choice 5

We study the parameterized complexity of the optimal defense and optimal attack problems in voting. In both the problems, the input is a set of voter groups (every voter group is a set of votes) and two integers k_a and k_d corresponding to respectively the number of voter groups the attacker can attack and the number of voter groups the defender can defend. A voter group gets removed from the election if it is attacked but not defended. In the optimal defense problem, we want to know if it is possible for the defender to commit to a strategy of defending at most k_d voter groups such that, no matter which k_a voter groups the attacker attacks, the outcome of the election does not change. In the optimal attack problem, we want to know if it is possible for the attacker to commit to a strategy of attacking k_a voter groups such that, no matter which k_d voter groups the defender defends, the outcome of the election is always different from the original (without any attack) one. We show that both the optimal defense problem and the optimal attack problem are computationally intractable for every scoring rule and the Condorcet voting rule even when we have only 3 candidates. We also show that the optimal defense problem for every scoring rule and the Condorcet voting rule is W[2]-hard for both the parameters k_a and k_d, while it admits a fixed parameter tractable algorithm parameterized by the combined parameter (k_a, k_d). The optimal attack problem for every scoring rule and the Condorcet voting rule turns out to be much harder – it is W[1]-hard even for the combined parameter (k_a, k_d). We propose two greedy algorithms for the OPTIMAL DEFENSE problem and empirically show that they perform effectively on reasonable voting profiles.
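The optimal defense problem as stated above can be checked by brute force on tiny instances. The sketch below uses plurality (one particular scoring rule) with lexicographic tie-breaking; the tie-breaking rule, the function names, and the toy profile are assumptions for illustration. It enumerates all defenses and attacks and is not one of the greedy algorithms proposed in the paper.

```python
from collections import Counter
from itertools import combinations


def plurality_winner(groups):
    """Winner by first-place counts; ties broken lexicographically (assumed)."""
    tally = Counter(vote[0] for group in groups for vote in group)
    return min(tally, key=lambda c: (-tally[c], c)) if tally else None


def surviving(groups, attacked, defended):
    """A voter group is removed only if it is attacked but not defended."""
    return [g for i, g in enumerate(groups) if i not in attacked or i in defended]


def defense_exists(groups, k_a, k_d):
    """True if some set of k_d defended groups preserves the original winner
    against every attack on k_a groups. Defending more groups never hurts,
    so checking sets of size exactly min(k_d, n) covers 'at most k_d'."""
    original = plurality_winner(groups)
    indices = range(len(groups))
    for defended in combinations(indices, min(k_d, len(groups))):
        if all(
            plurality_winner(surviving(groups, set(attack), set(defended))) == original
            for attack in combinations(indices, min(k_a, len(groups)))
        ):
            return True
    return False


# Toy profile: each vote is a ranking over candidates a, b, c.
groups = [
    [("a", "b", "c")] * 3,
    [("b", "a", "c")] * 2,
    [("b", "c", "a")] * 2,
    [("c", "a", "b")],
]
```

In this toy profile candidate b wins with 4 first places against 3 for a, so removing either of the two b-supporting groups flips the result; one defended group is therefore not enough against a single attack, while two are. The enumeration is exponential in k_a and k_d and is meant only to pin down the problem definition.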
#3675

On Succinct Encodings for the Tournament Fixing Problem
Sushmita Gupta, Saket Saurabh, Ramanujan Sridharan, Meirav Zehavi
Details | PDF

Computational Social Choice 5

Single-elimination tournaments are a popular format in competitive environments. The Tournament Fixing Problem (TFP), which is the problem of finding a seeding of the players such that a certain player wins the resulting tournament, is known to be NP-hard in general and fixed-parameter tractable when parameterized by the feedback arc set number of the input tournament (an oriented complete graph) of expected wins/losses. However, the existence of polynomial kernelizations (efficient preprocessing) for TFP has remained open. In this paper, we present the first polynomial kernelization for TFP parameterized by the feedback arc set number of the input tournament. We achieve this by providing a polynomial-time routine that computes a SAT encoding where the number of clauses is bounded polynomially in the feedback arc set number.
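The problem definition itself is easy to state in code, which may help before considering the kernelization result. The following is a minimal Python sketch that simulates a single-elimination bracket for a given seeding and searches all seedings by brute force. The representation of expected results as a set of (winner, loser) pairs, the function names, and the four-player example are assumptions for illustration; the exhaustive search is factorial in the number of players and is unrelated to the paper's SAT-based routine.

```python
from itertools import permutations


def play(seeding, beats):
    """Simulate a single-elimination bracket. `seeding` lists the players in
    bracket order (its length is assumed to be a power of two) and `beats`
    is a set of (winner, loser) pairs covering every pair of players."""
    current = list(seeding)
    while len(current) > 1:
        current = [
            a if (a, b) in beats else b
            for a, b in zip(current[::2], current[1::2])
        ]
    return current[0]


def find_winning_seeding(players, beats, favourite):
    """Brute-force TFP: return a seeding under which `favourite` wins, or None."""
    for seeding in permutations(players):
        if play(seeding, beats) == favourite:
            return seeding
    return None


# Hypothetical instance: a > b > c > a form a cycle, and everyone beats d.
players = ("a", "b", "c", "d")
beats = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("b", "d"), ("c", "d")}

seeding_for_a = find_winning_seeding(players, beats, "a")
seeding_for_d = find_winning_seeding(players, beats, "d")
```

In this instance player a loses the bracket a, b, c, d (c wins the final) but wins a, d, b, c, which shows how the seeding alone can decide the champion; player d cannot win under any seeding.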
#4898

How Hard Is the Manipulative Design of Scoring Systems?
Dorothea Baumeister, Tobias Hogrebe
Details | PDF

Computational Social Choice 5

In an election, votes are often given as ordered lists over candidates. A common way of determining the winner is then to apply some scoring system, where each position is associated with a specific score. This setting is also transferable to other situations, such as sports tournaments. The design of such systems, i.e., the choice of the score values, may have a crucial influence on the outcome. We study the computational complexity of two related decision problems. In addition, we provide a case study of data from Formula 1 using ILP formulations. Our results show that under some mild conditions there are cases where the actual scoring system has no influence, whereas in other cases very small changes may lead to a different winner. This may be seen as a measure of robustness of the winning candidate.
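The sensitivity of the outcome to the chosen score values can be seen on a very small example. The Python sketch below applies a positional scoring system to ranked votes; the function name, the lexicographic tie-breaking, and the toy votes are assumptions for illustration and do not come from the paper or its Formula 1 case study.

```python
from collections import defaultdict


def scoring_winner(votes, scores):
    """Apply a positional scoring system: the candidate in position i of a
    vote receives scores[i]. Ties are broken lexicographically (assumed)."""
    totals = defaultdict(int)
    for ranking in votes:
        for position, candidate in enumerate(ranking):
            totals[candidate] += scores[position]
    return min(totals, key=lambda c: (-totals[c], c))


# Seven hypothetical votes over candidates a, b, c.
votes = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2 + [("c", "b", "a")] * 2

plurality_result = scoring_winner(votes, (1, 0, 0))
borda_result = scoring_winner(votes, (2, 1, 0))
```

With the same seven votes, the score vector (1, 0, 0) elects a on first places alone, while (2, 1, 0) elects b, who is ranked first or second by every voter. The votes are unchanged; only the design of the scoring system differs.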
#5230

Preferences Single-Peaked on a Tree: Sampling and Tree Recognition
Jakub Sliwinski, Edith Elkind
Details | PDF

Computational Social Choice 5

In voting theory, impossibility results and computational hardness results are often circumvented by recognising that voters' preferences are not arbitrary, but lie within a restricted domain. Uncovering the structure of the underlying domain often provides useful insights about the nature of the alternative space, and may be helpful in identifying a collective choice. Preferences single-peaked on a tree are an example of a relatively broad domain that nonetheless exhibits several desirable properties. We consider the setting where voters' preferences are independently sampled from rankings that are single-peaked on a given tree, and study the problem of reliably identifying the tree that generated the observed votes. We test our algorithm empirically; to this end, we develop an algorithm to uniformly sample preferences that are single-peaked on a given tree.
#2666

Answer Set Programming for Judgment Aggregation
Ronald de Haan, Marija Slavkovik
Details | PDF

Computational Social Choice 5

Judgment aggregation (JA) studies how to aggregate truth valuations on logically related issues. Computing the outcome of aggregation procedures is notoriously computationally hard, which is the likely reason that no implementation of them exists as of yet. However, even hard problems sometimes need to be solved. The worst-case computational complexity of answer set programming (ASP) matches that of most problems in judgment aggregation. We take advantage of this and propose a natural and modular encoding of various judgment aggregation procedures and related problems in JA into ASP. With these encodings, we achieve two results: (1) paving the way towards constructing a wide range of new benchmark instances (from JA) for answer set solving algorithms; and (2) providing an automated tool for researchers in the area of judgment aggregation.

Thursday 15 16:30 - 18:00 MTA|SP - Security and Privacy 3 (2705-2706)

Chair: Wang Pinghui

#238

Novel Collaborative Filtering Recommender Friendly to Privacy Protection
Jun Wang, Qiang Tang, Afonso Arriaga, Peter Y. A. Ryan
Details | PDF

Security and Privacy 3

Nowadays, recommender systems are an indispensable tool in many information services, and a large number of algorithms have been designed and implemented. However, fed with very large datasets, state-of-the-art recommendation algorithms often face an efficiency bottleneck, i.e., it takes a huge amount of computing resources to train a recommendation model. In order to satisfy the needs of privacy-savvy users who do not want to disclose their information to the service provider, the complexity of most existing solutions becomes prohibitive. As such, it is an interesting research question to design simple and efficient recommendation algorithms that achieve reasonable accuracy and facilitate privacy protection at the same time. In this paper, we propose an efficient recommendation algorithm, named CryptoRec, which has two nice properties: (1) it can estimate a new user's preferences by directly using a model pre-learned from an expert dataset, and the new user's data is not required to train the model; (2) it can compute recommendations with only addition and multiplication operations. As to the evaluation, we first test the recommendation accuracy on three real-world datasets and show that CryptoRec is competitive with state-of-the-art recommenders. Then, we evaluate the performance of the privacy-preserving variants of CryptoRec and show that predictions can be computed in seconds on a PC. In contrast, existing solutions will need tens or hundreds of hours on more powerful computers.
#1533

FABA: An Algorithm for Fast Aggregation against Byzantine Attacks in Distributed Neural Networks
Qi Xia, Zeyi Tao, Zijiang Hao, Qun Li
Details | PDF

Security and Privacy 3

Training a large-scale deep neural network on a single machine becomes increasingly difficult as network models grow more complex. Distributed training provides an efficient solution, but Byzantine attacks may occur on participating workers, which may be compromised or suffer from hardware failures. If they upload poisonous gradients, the training will become unstable or even converge to a saddle point. In this paper, we propose FABA, a Fast Aggregation algorithm against Byzantine Attacks, which removes the outliers in the uploaded gradients and obtains gradients that are close to the true gradients. We show the convergence of our algorithm. The experiments demonstrate that our algorithm can achieve similar performance to the non-Byzantine case and higher efficiency as compared to previous algorithms.
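The idea of removing outliers from uploaded gradients before averaging can be sketched as follows. This is one plausible outlier-removal scheme written for illustration, under the assumption that an upper bound on the number of bad workers is known; it should not be read as the exact FABA procedure. The names `mean_vector` and `robust_aggregate` are hypothetical.

```python
import math

def mean_vector(vectors):
    """Coordinate-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def robust_aggregate(gradients, num_to_remove):
    """Repeatedly drop the gradient farthest (Euclidean) from the current mean,
    then return the mean of the gradients that remain."""
    kept = [list(g) for g in gradients]
    # Always keep at least one gradient.
    for _ in range(min(num_to_remove, len(kept) - 1)):
        center = mean_vector(kept)
        farthest = max(range(len(kept)), key=lambda i: math.dist(kept[i], center))
        kept.pop(farthest)
    return mean_vector(kept)

# Three honest workers and one worker uploading a poisonous gradient.
gradients = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [100.0, -100.0]]
print(mean_vector(gradients))          # plain averaging is dominated by the outlier
print(robust_aggregate(gradients, 1))  # close to the honest mean
```

Plain averaging yields roughly [25.75, -24.25] on this input, while removing one outlier first recovers a vector near [1.0, 1.0].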
#1853

Locate-Then-Detect: Real-time Web Attack Detection via Attention-based Deep Neural Networks
Tianlong Liu, Yu Qi, Liang Shi, Jianan Yan
Details | PDF

Security and Privacy 3

Web attacks such as Cross-Site Scripting and SQL Injection are serious Web threats that lead to catastrophic data leaking and loss. Because attack payloads are often short segments hidden in URL requests/posts that can be very long, classical machine learning approaches have difficulties in learning useful patterns from them. In this study, we propose a novel Locate-Then-Detect (LTD) system that can precisely detect Web threats in real-time by using attention-based deep neural networks. Firstly, an efficient Payload Locating Network (PLN) is employed to propose most suspicious regions from large URL requests/posts. Then a Payload Classification Network (PCN) is adopted to accurately classify malicious regions from suspicious candidates. In this way, PCN can focus more on learning malicious segments and highly increase detection accuracy. The noise induced by irrelevant background strings can be largely eliminated. Besides, LTD can greatly reduce computational costs (82.6% less) by ignoring large irrelevant URL content. Experiments are carried out on both benchmarks and real Web traffic. The LTD outperforms an HMM-based approach, the Libinjection system, and a leading commercial rule-based Web Application Firewall. Our method can be efficiently implemented on GPUs with an average detection time of about 5ms and well qualified for real-time applications.
#2179

A Privacy Preserving Collusion Secure DCOP Algorithm
Tamir Tassa, Tal Grinshpoun, Avishai Yanay
Details | PDF

Security and Privacy 3

In recent years, several studies proposed privacy-preserving algorithms for solving Distributed Constraint Optimization Problems (DCOPs). All of those studies assumed that agents do not collude. In this study we propose the first privacy-preserving DCOP algorithm that is immune against coalitions, under the assumption of honest majority. Our algorithm -- PC-SyncBB -- is based on the classical Branch and Bound DCOP algorithm. It offers constraint, topology and decision privacy. We evaluate its performance on different benchmarks, problem sizes, and constraint densities. We show that achieving security against coalitions is feasible. As all existing privacy-preserving DCOP algorithms base their security on assuming solitary conduct of the agents, we view this study as an essential first step towards lifting this potentially harmful assumption in all those algorithms.
#4072

Adversarial Examples for Graph Data: Deep Insights into Attack and Defense
Huijun Wu, Chen Wang, Yuriy Tyshetskiy, Andrew Docherty, Kai Lu, Liming Zhu
Details | PDF

Security and Privacy 3

Graph deep learning models, such as graph convolutional networks (GCN), achieve state-of-the-art performance for tasks on graph data. However, similar to other deep learning models, graph deep learning models are susceptible to adversarial attacks. Compared with non-graph data, the discrete nature of the graph connections and features provides unique challenges and opportunities for adversarial attacks and defenses. In this paper, we propose techniques for both an adversarial attack and a defense against adversarial attacks. Firstly, we show that the problem of discrete graph connections and the discrete features of common datasets can be handled by using the integrated gradient technique that accurately determines the effect of changing selected features or edges while still benefiting from parallel computations. In addition, we show that an adversarially manipulated graph using a targeted attack statistically differs from un-manipulated graphs. Based on this observation, we propose a defense approach which can detect and recover a potential adversarial perturbation. Our experiments on a number of datasets show the effectiveness of the proposed techniques.
#6491

BAYHENN: Combining Bayesian Deep Learning and Homomorphic Encryption for Secure DNN Inference
Peichen Xie, Bingzhe Wu, Guangyu Sun
Details | PDF

Security and Privacy 3

Recently, deep learning as a service (DLaaS) has emerged as a promising way to facilitate the employment of deep neural networks (DNNs) for various purposes. However, using DLaaS also causes potential privacy leakage from both clients and cloud servers. This privacy issue has fueled research interest in the privacy-preserving inference of DNN models in the cloud service. In this paper, we present a practical solution named BAYHENN for secure DNN inference. It can protect both the client's privacy and the server's privacy at the same time. The key strategy of our solution is to combine homomorphic encryption and Bayesian neural networks. Specifically, we use homomorphic encryption to protect a client's raw data and use Bayesian neural networks to protect the DNN weights in a cloud server. To verify the effectiveness of our solution, we conduct experiments on MNIST and a real-life clinical dataset. Our solution achieves consistent latency decreases on both tasks. In particular, our method can outperform the best existing method (GAZELLE) by about 5x, in terms of end-to-end latency.

Thursday 15 16:30 - 18:00 ML|UL - Unsupervised Learning 3 (2601-2602)

Chair: Bin Yang

#917

Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering
Bertrand Charpentier, Thomas Bonald
Details | PDF

Unsupervised Learning 3

We introduce the tree sampling divergence (TSD), an information-theoretic metric for assessing the quality of the hierarchical clustering of a graph. Any hierarchical clustering of a graph can be represented as a tree whose nodes correspond to clusters of the graph. The TSD is the Kullback-Leibler divergence between two probability distributions over the nodes of this tree: those induced respectively by sampling at random edges and node pairs of the graph. A fundamental property of the proposed metric is that it is interpretable in terms of graph reconstruction. Specifically, it quantifies the ability to reconstruct the graph from the tree in terms of information loss. In particular, the TSD is maximum when perfect reconstruction is feasible, i.e., when the graph has a complete hierarchical structure. Another key property of TSD is that it applies to any tree, not necessarily binary. In particular, the TSD can be used to compress a binary tree while minimizing the information loss in terms of graph reconstruction, so as to get a compact representation of the hierarchical structure of a graph. We illustrate the behavior of TSD compared to existing metrics on experiments based on both synthetic and real datasets.
#1523

Latent Distribution Preserving Deep Subspace Clustering
Lei Zhou, Xiao Bai, Dong Wang, Xianglong Liu, Jun Zhou, Edwin Hancock
Details | PDF

Unsupervised Learning 3

Subspace clustering is a useful technique for many computer vision applications in which the intrinsic dimension of high-dimensional data is smaller than the ambient dimension. Traditional subspace clustering methods often rely on the self-expressiveness property, which has proven effective for linear subspace clustering. However, they perform unsatisfactorily on real data with complex nonlinear subspaces. More recently, deep autoencoder based subspace clustering methods have achieved success owing to the more powerful representation extracted by the autoencoder network. Unfortunately, these methods, which consider only the reconstruction of the original input data, can hardly guarantee the latent representation for data distributed in subspaces, which inevitably limits the performance in practice. In this paper, we propose a novel deep subspace clustering method based on a latent distribution-preserving autoencoder, which introduces a distribution consistency loss to guide the learning of a distribution-preserving latent representation, and consequently enables a strong capacity for characterizing real-world data for subspace clustering. Experimental results on several public databases show that our method achieves significant improvement compared with the state-of-the-art subspace clustering methods.
#1822

Distributed Collaborative Feature Selection Based on Intermediate Representation
Xiucai Ye, Hongmin Li, Akira Imakura, Tetsuya Sakurai
Details | PDF

Unsupervised Learning 3

Feature selection is an efficient dimensionality reduction technique for artificial intelligence and machine learning. Many feature selection methods learn the data structure to select the most discriminative features for distinguishing different classes. However, the data is sometimes distributed across multiple parties, and sharing the original data is difficult due to privacy requirements. As a result, the data in one party may lack the information needed to learn the most discriminative features. In this paper, we propose a novel distributed method which allows collaborative feature selection for multiple parties without revealing their original data. In the proposed method, each party finds the intermediate representations from the original data, and shares the intermediate representations for collaborative feature selection. Based on the shared intermediate representations, the original data from multiple parties are transformed to the same low dimensional space. The feature ranking of the original data is learned by imposing row sparsity on the transformation matrix simultaneously. Experimental results on real-world datasets demonstrate the effectiveness of the proposed method.
#3480

Attributed Graph Clustering: A Deep Attentional Embedding Approach
Chun Wang, Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Chengqi Zhang
Details | PDF

Unsupervised Learning 3

Graph clustering is a fundamental task which discovers communities or groups in networks. Recent studies have mostly focused on developing deep learning approaches to learn a compact graph embedding, upon which classic clustering methods like k-means or spectral clustering algorithms are applied. These two-step frameworks are difficult to manipulate and usually lead to suboptimal performance, mainly because the graph embedding is not goal-directed, i.e., designed for the specific clustering task. In this paper, we propose a goal-directed deep learning approach, Deep Attentional Embedded Graph Clustering (DAEGC for short). Our method focuses on attributed graphs to sufficiently explore the two sides of information in graphs. By employing an attention network to capture the importance of the neighboring nodes to a target node, our DAEGC algorithm encodes the topological structure and node content in a graph to a compact representation, on which an inner product decoder is trained to reconstruct the graph structure. Furthermore, soft labels from the graph embedding itself are generated to supervise a self-training graph clustering process, which iteratively refines the clustering results. The self-training process is jointly learned and optimized with the graph embedding in a unified framework, to mutually benefit both components. Experimental results compared with state-of-the-art algorithms demonstrate the superiority of our method.
#3867

Simultaneous Representation Learning and Clustering for Incomplete Multi-view Data
Wenzhang Zhuge, Chenping Hou, Xinwang Liu, Hong Tao, Dongyun Yi
Details | PDF

Unsupervised Learning 3

Incomplete multi-view clustering has attracted attention from diverse fields. Most existing methods factorize data to learn a unified representation linearly. Their performance may degrade when the relations between the unified representation and data of different views are nonlinear. Moreover, they need post-processing on the unified representations to extract the clustering indicators, which separates the consensus learning and subsequent clustering. To address these issues, in this paper, we propose a Simultaneous Representation Learning and Clustering (SRLC) method. Concretely, SRLC constructs similarity matrices to measure the relations between pairs of instances, and learns low-dimensional representations of present instances on each view and a common probability label matrix simultaneously. Thus, the nonlinear information can be reflected by these representations and the clustering results can be obtained from the label matrix directly. An efficient iterative algorithm with guaranteed convergence is presented for optimization. Experiments on several datasets demonstrate the advantages of the proposed approach.
#6362

Adversarial Incomplete Multi-view Clustering
Cai Xu, Ziyu Guan, Wei Zhao, Hongchang Wu, Yunfei Niu, Beilei Ling
Details | PDF

Unsupervised Learning 3

Multi-view clustering aims to leverage information from multiple views to improve clustering. Most previous works assumed that each view has complete data. However, in real-world datasets, it is often the case that a view may contain some missing data, resulting in the incomplete multi-view clustering problem. Previous methods for this problem have at least one of the following drawbacks: (1) employing shallow models, which cannot well handle the dependence and discrepancy among different views; (2) ignoring the hidden information of the missing data; (3) dedicated to the two-view case. To eliminate all these drawbacks, in this work we present an Adversarial Incomplete Multi-view Clustering (AIMC) method. Unlike most existing methods which only learn a new representation with existing views, AIMC seeks the common latent space of multi-view data and performs missing data inference simultaneously. In particular, the element-wise reconstruction and the generative adversarial network (GAN) are integrated to infer the missing data. They aim to capture overall structure and get a deeper semantic understanding respectively. Moreover, an aligned clustering loss is designed to obtain a better clustering structure. Experiments conducted on three datasets show that AIMC performs well and outperforms baseline methods.

Thursday 15 16:30 - 18:00 KRR|DLO - Description Logics and Ontologies 2 (2603-2604)

Chair: Przemyslaw Walega

#694

Ontology Approximation in Horn Description Logics
Anneke Bötcher, Carsten Lutz, Frank Wolter
Details | PDF

Description Logics and Ontologies 2

We study the approximation of a description logic (DL) ontology in a less expressive DL, focusing on the case of Horn DLs. It is common to construct such approximations in an ad hoc way in practice and the resulting incompleteness is typically neither analyzed nor understood. In this paper, we show how to construct complete approximations. These are typically infinite or of excessive size and thus cannot be used directly in applications, but our results provide an important theoretical foundation that enables informed decisions when constructing incomplete approximations in practice.
#2925

Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting
Bartosz Bednarczyk, Sebastian Rudolph
Details | PDF

Description Logics and Ontologies 2

Among the most expressive knowledge representation formalisms are the description logics of the Z family. For well-behaved fragments of ZOIQ, entailment of positive two-way regular path queries is well known to be 2EXPTIME-complete under the proviso of unary encoding of numbers in cardinality constraints. We show that this assumption can be dropped without an increase in complexity and EXPTIME-completeness can be achieved when bounding the number of query atoms, using a novel reduction from query entailment to knowledge base satisfiability. These findings allow us to strengthen other results regarding query entailment and query containment problems in very expressive description logics. Our results also carry over to GC2, the two-variable guarded fragment of first-order logic with counting quantifiers, for which hitherto only conjunctive query entailment has been investigated.
#5472

Explanations for Query Answers under Existential Rules
İsmail İlkan Ceylan, Thomas Lukasiewicz, Enrico Malizia, Andrius Vaicenavičius
Details | PDF

Description Logics and Ontologies 2

Ontology-mediated query answering is an extensively studied paradigm, which aims at improving query answers with the use of a logical theory. As a form of logical entailment, ontology-mediated query answering is fully interpretable, which makes it possible to derive explanations for query answers. Surprisingly, however, explaining answers for ontology-mediated queries has received little attention for ontology languages based on existential rules. In this paper, we close this gap, and study the problem of explaining query answers in terms of minimal subsets of database facts. We provide a thorough complexity analysis for several decision problems associated with minimal explanations under existential rules.
#6523

Chasing Sets: How to Use Existential Rules for Expressive Reasoning
David Carral, Irina Dragoste, Markus Krötzsch, Christian Lewe
Details | PDF

Description Logics and Ontologies 2

We propose that modern existential rule reasoners can enable fully declarative implementations of rule-based inference methods in knowledge representation, in the sense that a particular calculus is captured by a fixed set of rules that can be evaluated on varying inputs (encoded as facts). We introduce Datalog(S) -- Datalog with support for sets -- as a surface language for such translations, and show that it can be captured in a decidable fragment of existential rules. We then implement several known inference methods in Datalog(S), and empirically show that an existing existential rule reasoner can thus be used to solve practical reasoning problems.
#10957

(Sister Conferences Best Papers Track) Closed-World Semantics for Conjunctive Queries with Negation over ELH-bottom Ontologies
Stefan Borgwardt, Walter Forkel
Details | PDF

Description Logics and Ontologies 2

Ontology-mediated query answering is a popular paradigm for enriching answers to user queries with background knowledge. For querying the absence of information, however, there exist only few ontology-based approaches. Moreover, these proposals conflate the closed-domain and closed-world assumption, and therefore are not suited to deal with the anonymous objects that are common in ontological reasoning. We propose a new closed-world semantics for answering conjunctive queries with negation over ontologies formulated in the description logic ELH-bottom, based on the minimal canonical model. We propose a rewriting strategy for dealing with negated query atoms, which shows that query answering is possible in polynomial time in data complexity.
#2741

Do You Need Infinite Time?
Alessandro Artale, Andrea Mazzullo, Ana Ozaki
Details | PDF

Description Logics and Ontologies 2

Linear temporal logic over finite traces is used as a formalism for temporal specification in automated planning, process modelling and (runtime) verification. In this paper, we investigate first-order temporal logic over finite traces, lifting some known results to a more expressive setting. Satisfiability in the two-variable monodic fragment is shown to be EXPSPACE-complete, as for the infinite trace case, while it decreases to NEXPTIME when we consider finite traces bounded in the number of instants. This leads to new complexity results for temporal description logics over finite traces. We further investigate satisfiability and equivalences of formulas under a model-theoretic perspective, providing a set of semantic conditions that characterise when the distinction between reasoning over finite and infinite traces can be blurred. Finally, we apply these conditions to planning and verification.

Thursday 15 16:30 - 18:00 NLP|NLP - Natural Language Processing 2 (2605-2606)

Chair: Jose Camacho Collados

#783

Modeling both Context- and Speaker-Sensitive Dependence for Emotion Detection in Multi-speaker Conversations
Dong Zhang, Liangqing Wu, Changlong Sun, Shoushan Li, Qiaoming Zhu, Guodong Zhou
Details | PDF

Natural Language Processing 2

Recently, emotion detection in conversations has become a hot research topic in the Natural Language Processing community. In this paper, we focus on emotion detection in multi-speaker conversations instead of the traditional two-speaker conversations of existing studies. Different from non-conversation text, emotion detection in conversation text has one specific challenge in modeling the context-sensitive dependence. Besides, emotion detection in multi-speaker conversations poses another specific challenge in modeling the speaker-sensitive dependence. To address the above two challenges, we propose a conversational graph-based convolutional neural network. On the one hand, our approach represents each utterance and each speaker as a node. On the other hand, the context-sensitive dependence is represented by an undirected edge between two utterance nodes from the same conversation and the speaker-sensitive dependence is represented by an undirected edge between an utterance node and its speaker node. In this way, the entire conversational corpus can be symbolized as a large heterogeneous graph and the emotion detection task can be recast as a classification problem of the utterance nodes in the graph. The experimental results on a multi-modal and multi-speaker conversation corpus demonstrate the great effectiveness of the proposed approach.
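The graph construction described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code: it assumes every pair of utterances in the same conversation is connected (the paper may restrict this, for example to a context window), and the node encoding and the name `build_conversation_graph` are hypothetical.

```python
from itertools import combinations

def build_conversation_graph(conversations):
    """Build a heterogeneous graph from conversations.

    `conversations` is a list of conversations, each a list of (speaker, text) pairs.
    Returns (nodes, edges), where each undirected edge is a frozenset of two nodes.
    """
    nodes, edges = set(), set()
    for c_idx, conversation in enumerate(conversations):
        utterance_nodes = []
        for u_idx, (speaker, _text) in enumerate(conversation):
            utterance = ("utt", c_idx, u_idx)
            speaker_node = ("spk", speaker)
            nodes.update([utterance, speaker_node])
            # Speaker-sensitive dependence: utterance node to its speaker node.
            edges.add(frozenset([utterance, speaker_node]))
            utterance_nodes.append(utterance)
        # Context-sensitive dependence: utterance nodes of the same conversation.
        for a, b in combinations(utterance_nodes, 2):
            edges.add(frozenset([a, b]))
    return nodes, edges

corpus = [
    [("A", "hi"), ("B", "hello"), ("A", "bye")],
    [("B", "yo"), ("C", "hey")],
]
nodes, edges = build_conversation_graph(corpus)
print(len(nodes), len(edges))  # 8 nodes (5 utterances + 3 speakers), 9 edges
```

Because speaker B appears in both conversations, its speaker node links the two conversations into one graph, which is how the whole corpus becomes a single heterogeneous graph for node classification.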
#1936

Exploring and Distilling Cross-Modal Information for Image Captioning
Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu Sun
Details | PDF

Natural Language Processing 2

Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet there is still great difficulty for the current methods to achieve deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. To perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach that explores and distills the source information in vision and language. It globally provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations, and locally extracts the fine-grained regions and attributes in reference to the aspect vector for word selection. Our fully-attentive model achieves a CIDEr score of 129.3 in offline COCO evaluation with remarkable efficiency in terms of accuracy, speed, and parameter budget.
#4147

Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs
Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, Dongyan Zhao
Details | PDF

Natural Language Processing 2

Entity alignment is the task of linking entities with the same real-world identity from different knowledge graphs (KGs), which has been recently dominated by embedding-based methods. Such approaches work by learning KG representations so that entity alignment can be performed by measuring the similarities between entity embeddings. While promising, prior works in the field often fail to properly capture complex relation information that commonly exists in multi-relational KGs, leaving much room for improvement. In this paper, we propose a novel Relation-aware Dual-Graph Convolutional Network (RDGCN) to incorporate relation information via attentive interactions between the knowledge graph and its dual relation counterpart, and further capture neighboring structures to learn better entity representations. Experiments on three real-world cross-lingual datasets show that our approach delivers better and more robust results over the state-of-the-art alignment methods by learning better KG representations.
#5127

PRoFET: Predicting the Risk of Firms from Event Transcripts
Christoph Kilian Theil, Samuel Broscheit, Heiner Stuckenschmidt
Details | PDF

Natural Language Processing 2

Financial risk, defined as the chance to deviate from return expectations, is most commonly measured with volatility. Due to its value for investment decision making, volatility prediction is probably among the most important tasks in finance and risk management. Although evidence exists that enriching purely financial models with natural language information can improve predictions of volatility, this task is still comparably underexplored. We introduce PRoFET, the first neural model for volatility prediction jointly exploiting both semantic language representations and a comprehensive set of financial features. As language data, we use transcripts from quarterly recurring events, so-called "earnings calls"; in these calls, the performance of publicly traded companies is summarized and prognosticated by their management. We show that our proposed architecture, which models verbal context with an attention mechanism, significantly outperforms the previous state-of-the-art and other strong baselines. Finally, we visualize this attention mechanism on the token-level, thus aiding interpretability and providing a use case of PRoFET as a tool for investment decision support.
#3529

A Goal-Driven Tree-Structured Neural Model for Math Word Problems
Zhipeng Xie, Shichao Sun
Details | PDF

Natural Language Processing 2

Most existing neural models for math word problems exploit Seq2Seq models to generate solution expressions sequentially from left to right, whose results are far from satisfactory due to the lack of the goal-driven mechanism commonly seen in human problem solving. This paper proposes a tree-structured neural model to generate an expression tree in a goal-driven manner. Given a math word problem, the model first identifies and encodes its goal to achieve, and then the goal gets decomposed into sub-goals combined by an operator in a top-down recursive way. The whole process is repeated until the goal is simple enough to be realized by a known quantity as a leaf node. During the process, two-layer gated-feedforward networks are designed to implement each step of goal decomposition, and a recursive neural network is used to encode fulfilled subtrees into subtree embeddings, which provides a better representation of subtrees than the simple goals of subtrees. Experimental results on the dataset Math23K have shown that our tree-structured model significantly outperforms several state-of-the-art models.
#1298

Learn to Select via Hierarchical Gate Mechanism for Aspect-Based Sentiment Analysis
Xiangying Ran, Yuanyuan Pan, Wei Sun, Chongjun Wang
Details | PDF

Natural Language Processing 2

Aspect-based sentiment analysis (ABSA) is a fine-grained task. A Recurrent Neural Network (RNN) model armed with an attention mechanism seems a natural fit for this task, and it has indeed achieved state-of-the-art performance recently. However, previous attention mechanisms proposed for ABSA may attend to irrelevant words and thus degrade the performance, especially when dealing with long and complex sentences with multiple aspects. In this paper, we propose a novel architecture named Hierarchical Gate Memory Network (HGMN) for ABSA: firstly, we employ the proposed hierarchical gate mechanism to learn to select the part related to the given aspect, which keeps the original sequence structure of the sentence at the same time. After that, we apply a Convolutional Neural Network (CNN) on the final aspect-specific memory. We conduct extensive experiments on the SemEval 2014 and Twitter datasets, and results demonstrate that our model outperforms attention-based state-of-the-art baselines.

Thursday 15 16:30 - 18:00 MLA|ASL - Applications of Supervised Learning (2501-2502)

Chair: Tony Qin

#621

Learning Shared Vertex Representation in Heterogeneous Graphs with Convolutional Networks for Recommendation
Yanan Xu, Yanmin Zhu, Yanyan Shen, Jiadi Yu
Details | PDF

Applications of Supervised Learning

Collaborative Filtering (CF) is among the most successful techniques in recommendation tasks. Recent works have shown a boost of performance of CF when introducing the pairwise relationships between users and items or among items (users) using interaction data. However, these works usually only utilize one kind of information, i.e., user preference in a user-item interaction matrix or item dependency in interaction sequences which can limit the recommendation performance. In this paper, we propose to mine three kinds of information (user preference, item dependency, and user similarity on behaviors) by converting interaction sequence data into multiple graphs (i.e., a user-item graph, an item-item graph, and a user-subseq graph). We design a novel graph convolutional network (PGCN) to learn shared representations of users and items with the three heterogeneous graphs. In our approach, a neighbor pooling and a convolution operation are designed to aggregate features of neighbors. Extensive experiments on two real-world datasets demonstrate that our graph convolution approaches outperform various competitive methods in terms of two metrics, and the heterogeneous graphs are proved effective for improving recommendation performance.
#1077

Representation Learning-Assisted Click-Through Rate Prediction
Wentao Ouyang, Xiuwu Zhang, Shukui Ren, Chao Qi, Zhaojie Liu, Yanlong Du
Details | PDF

Applications of Supervised Learning

Click-through rate (CTR) prediction is a critical task in online advertising systems. Most existing methods mainly model the feature-CTR relationship and suffer from the data sparsity issue. In this paper, we propose DeepMCP, which models other types of relationships in order to learn more informative and statistically reliable feature representations, and in consequence to improve the performance of CTR prediction. In particular, DeepMCP contains three parts: a matching subnet, a correlation subnet and a prediction subnet. These subnets model the user-ad, ad-ad and feature-CTR relationship respectively. When these subnets are jointly optimized under the supervision of the target labels, the learned feature representations have both good prediction powers and good representation abilities. Experiments on two large-scale datasets demonstrate that DeepMCP outperforms several state-of-the-art models for CTR prediction.
#2982

Predicting the Visual Focus of Attention in Multi-Person Discussion Videos
Chongyang Bai, Srijan Kumar, Jure Leskovec, Miriam Metzger, Jay F. Nunamaker, V. S. Subrahmanian
Details | PDF

Applications of Supervised Learning

Visual focus of attention in multi-person discussions is a crucial nonverbal indicator in tasks such as inter-personal relation inference, speech transcription, and deception detection. However, predicting the focus of attention remains a challenge because the focus changes rapidly, the discussions are highly dynamic, and people's behaviors are inter-dependent. Here we propose ICAF (Iterative Collective Attention Focus), a collective classification model to jointly learn the visual focus of attention of all people. Every person is modeled using a separate classifier. ICAF models the people collectively: the predictions of all other people's classifiers are used as inputs to each person's classifier. This explicitly incorporates inter-dependencies between all people's behaviors. We evaluate ICAF on a novel dataset of 5 videos (35 people, 109 minutes, 7604 labels in all) of the popular Resistance game and a widely-studied meeting dataset with supervised prediction. See our demo at https://cs.dartmouth.edu/dsail/demos/icaf. ICAF outperforms the strongest baseline by 1%-5% accuracy in predicting people's visual focus of attention. Further, we propose a lightly supervised technique to train models in the absence of training labels. We show that lightly supervised ICAF performs similarly to supervised ICAF, thus showing its effectiveness and generality to previously unseen videos.
#3936

Dual-Path in Dual-Path Network for Single Image Dehazing
Aiping Yang, Haixin Wang, Zhong Ji, Yanwei Pang, Ling Shao
Details | PDF

Applications of Supervised Learning

Recently, deep learning-based methods have become a popular approach to single image dehazing. However, existing dehazing approaches operate directly on the original hazy image, which easily results in image blurring and noise amplification. To address this issue, the paper proposes the DPDP-Net (Dual-Path in Dual-Path network) framework, which employs a hierarchical dual-path network. Specifically, the first-level dual-path network consists of a Dehazing Network and a Denoising Network, where the Dehazing Network is responsible for haze removal in the structural layer and the Denoising Network deals with noise in the textural layer. The second-level dual-path network lies in the Dehazing Network, which has an AL-Net (Atmospheric Light Network) and a TM-Net (Transmission Map Network). Concretely, the AL-Net aims to learn the non-uniform atmospheric light, while the TM-Net aims to learn the transmission map that reflects the visibility of the image. The final dehazed image is obtained by nonlinearly fusing the outputs of the Denoising Network and the Dehazing Network. Extensive experiments demonstrate that our proposed DPDP-Net achieves competitive performance against the state-of-the-art methods on both synthetic and real-world images.
#2931

FireCast: Leveraging Deep Learning to Predict Wildfire Spread
David Radke, Anna Hessler, Dan Ellsworth
Details | PDF

Applications of Supervised Learning

Destructive wildfires result in billions of dollars in damage each year and are expected to increase in frequency, duration, and severity due to climate change. The current state-of-the-art wildfire spread models rely on mathematical growth predictions and physics-based models, which are difficult and computationally expensive to run. We present and evaluate a novel system, FireCast. FireCast combines artificial intelligence (AI) techniques with data collection strategies from geographic information systems (GIS). FireCast predicts which areas surrounding a burning wildfire are at high risk of near-future wildfire spread, based on historical fire data and using modest computational resources. FireCast is compared to a random prediction model and a commonly used wildfire spread model, Farsite, and outperforms both with respect to total accuracy, recall, and F-score.
#5811

Nuclei Segmentation via a Deep Panoptic Model with Semantic Feature Fusion
Dongnan Liu, Donghao Zhang, Yang Song, Chaoyi Zhang, Fan Zhang, Lauren O'Donnell, Weidong Cai
Details | PDF

Applications of Supervised Learning

Automated detection and segmentation of individual nuclei in histopathology images is important for cancer diagnosis and prognosis. Due to the high variability of nuclei appearances and numerous overlapping objects, this task still remains challenging. Deep learning based semantic and instance segmentation models have been proposed to address the challenges, but these methods tend to concentrate on either the global or local features and hence still suffer from information loss. In this work, we propose a panoptic segmentation model which incorporates an auxiliary semantic segmentation branch with the instance branch to integrate global and local features. Furthermore, we design a feature map fusion mechanism in the instance branch and a new mask generator to prevent information loss. Experimental results on three different histopathology datasets demonstrate that our method outperforms the state-of-the-art nuclei segmentation methods and popular semantic and instance segmentation models by a large margin.

Thursday 15 16:30 - 18:00 ML|C - Classification 7 (2503-2504)

Chair: Hau San Wong

#984

Out-of-sample Node Representation Learning for Heterogeneous Graph in Real-time Android Malware Detection
Yanfang Ye, Shifu Hou, Lingwei Chen, Jingwei Lei, Wenqiang Wan, Jiabin Wang, Qi Xiong, Fudong Shao
Details | PDF

Classification 7

The increasingly sophisticated Android malware calls for new defensive techniques that are capable of protecting mobile users against novel threats. In this paper, we first extract the runtime Application Programming Interface (API) call sequences from Android apps, and then analyze higher-level semantic relations within the ecosystem to comprehensively characterize the apps. To model different types of entities (i.e., app, API, device, signature, affiliation) and rich relations among them, we present a structured heterogeneous graph (HG) for modeling. To efficiently classify nodes (e.g., apps) in the constructed HG, we propose the HG-Learning method to first obtain in-sample node embeddings and then learn representations of out-of-sample nodes without rerunning/adjusting HG embeddings at the first attempt. We later design a deep neural network classifier taking the learned HG representations as inputs for real-time Android malware detection. Comprehensive experiments on large-scale and real sample collections from Tencent Security Lab are performed to compare various baselines. Promising results demonstrate that our developed system AiDroid which integrates our proposed method outperforms others in real-time Android malware detection.
#2515

Scalable Semi-Supervised SVM via Triply Stochastic Gradients
Xiang Geng, Bin Gu, Xiang Li, Wanli Shi, Guansheng Zheng, Heng Huang
Details | PDF

Classification 7

Semi-supervised learning (SSL) plays an increasingly important role in the big data era because a large number of unlabeled samples can be used effectively to improve the performance of the classifier. Semi-supervised support vector machine (S3VM) is one of the most appealing methods for SSL, but scaling up S3VM for kernel learning is still an open problem. Recently, a doubly stochastic gradient (DSG) algorithm has been proposed to achieve efficient and scalable training for kernel methods. However, the algorithm and theoretical analysis of DSG are developed under a convexity assumption, which makes them unsuitable for non-convex problems such as S3VM. To address this problem, in this paper, we propose a triply stochastic gradient algorithm for S3VM, called TSGS3VM. Specifically, to handle the two types of data instances involved in S3VM, TSGS3VM samples a labeled instance and an unlabeled instance, together with random features, in each iteration to compute a triply stochastic gradient. We use the approximated gradient to update the solution. More importantly, we establish a new theoretical analysis for TSGS3VM which guarantees that TSGS3VM can converge to a stationary point. Extensive experimental results on a variety of datasets demonstrate that TSGS3VM is much more efficient and scalable than existing S3VM algorithms.
#2706

Classification with Label Distribution Learning
Jing Wang, Xin Geng
Details | PDF

Classification 7

Label Distribution Learning (LDL) is a novel learning paradigm whose aim is to minimize the distance between the model output and the ground-truth label distribution. We notice that, in real-world applications, the learned label distribution model is generally treated as a classification model, with the label corresponding to the highest model output as the predicted label, which unfortunately introduces an inconsistency between the training phase and the test phase. To resolve the inconsistency, we propose in this paper a new Label Distribution Learning algorithm for Classification (LDL4C). First, instead of KL-divergence, absolute loss is applied as the measure for LDL4C. Second, samples are re-weighted with information entropy. Third, a large margin classifier is adapted to boost discrimination precision. We then show theoretically that LDL4C seeks a balance between generalization and discrimination. Finally, we compare LDL4C with existing LDL algorithms on 17 real-world datasets, and experimental results demonstrate the effectiveness of LDL4C in classification.
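The first two ingredients named in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact entropy weighting used by LDL4C, so the weight 1 / (1 + entropy) and all function names below are hypothetical choices made purely for illustration, not the authors' formulation.

```python
import math


def entropy(dist):
    """Shannon entropy (natural log) of a discrete label distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def absolute_loss(pred, target):
    """L1 distance between a predicted and a ground-truth label distribution."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def weighted_absolute_loss(preds, targets):
    """Average absolute loss with a hypothetical entropy-based sample weight."""
    total = 0.0
    for pred, target in zip(preds, targets):
        # Assumption: samples with low-entropy (less ambiguous) targets weigh more.
        weight = 1.0 / (1.0 + entropy(target))
        total += weight * absolute_loss(pred, target)
    return total / len(preds)


def predict_label(pred):
    """Classification rule at test time: index of the highest model output."""
    return max(range(len(pred)), key=lambda i: pred[i])
```

The `predict_label` rule is the test-time behaviour that the abstract contrasts with distribution-matching training.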
#3036

Learning Image-Specific Attributes by Hyperbolic Neighborhood Graph Propagation
Xiaofeng Xu, Ivor W. Tsang, Xiaofeng Cao, Ruiheng Zhang, Chuancai Liu
Details | PDF

Classification 7

As a kind of semantic representation of visual object descriptions, attributes are widely used in various computer vision tasks. In most existing attribute-based research, class-specific attributes (CSA), which are class-level annotations, are usually adopted because they require only one low-cost annotation per class instead of one per individual image. However, class-specific attributes are usually noisy because of annotation errors and the diversity of individual images. Therefore, it is desirable to obtain image-specific attributes (ISA), which are image-level annotations, from the original class-specific attributes. In this paper, we propose to learn image-specific attributes by graph-based attribute propagation. Considering the intrinsic property of hyperbolic geometry that its distance expands exponentially, a hyperbolic neighborhood graph (HNG) is constructed to characterize the relationship between samples. Based on the HNG, we define neighborhood consistency for each sample to identify inconsistent samples. Subsequently, inconsistent samples are refined based on their neighbors in the HNG. Extensive experiments on five benchmark datasets demonstrate the significant superiority of the learned image-specific attributes over the original class-specific attributes in the zero-shot object classification task.
#5341

Success Prediction on Crowdfunding with Multimodal Deep Learning
Chaoran Cheng, Fei Tan, Xiurui Hou, Zhi Wei
Details | PDF

Classification 7

We consider the problem of project success prediction on crowdfunding platforms. Although the information in a project profile can be of different modalities such as text, images, and metadata, most existing prediction approaches leverage only the text-dominated modality. Although rich visual images are now used in more and more project profiles to attract backers, little work has been conducted to evaluate their effects on success prediction. Moreover, meta information has been exploited in many existing approaches for improving prediction accuracy. However, such meta information is usually limited to the dynamics after projects are posted, e.g., funding dynamics such as comments and updates. This requirement of using after-posting information prevents both project creators and platforms from predicting the outcome in a timely manner. In this work, we designed and evaluated advanced neural network schemes that combine information from different modalities to study the influence of sophisticated interactions among textual, visual, and metadata on project success prediction. To make pre-posting prediction possible, our approach requires only information collected from the pre-posting profile. Our extensive experimental results show that the image features could improve success prediction performance significantly, particularly for project profiles with little text information. Furthermore, we identified contributing elements.
#5399

What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features
Pasha Khosravi, Yitao Liang, YooJung Choi, Guy Van den Broeck
Details | PDF

Classification 7

While discriminative classifiers often yield strong predictive performance, missing feature values at prediction time can still be a challenge. Classifiers may not behave as expected under certain ways of substituting the missing values, since they inherently make assumptions about the data distribution they were trained on. In this paper, we propose a novel framework that classifies examples with missing features by computing the expected prediction with respect to a feature distribution. Moreover, we use geometric programming to learn a naive Bayes distribution that embeds a given logistic regression classifier and can efficiently take its expected predictions. Empirical evaluations show that our model achieves the same performance as the logistic regression with all features observed, and outperforms standard imputation techniques when features go missing during prediction time. Furthermore, we demonstrate that our method can be used to generate "sufficient explanations" of logistic regression classifications, by removing features that do not affect the classification.
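The idea of an expected prediction can be sketched in a few lines of Python. This sketch assumes binary features that are mutually independent with known marginals, and it enumerates every completion of the missing features. That is a simplification for illustration only: it is not the naive Bayes construction or the geometric programming step described in the abstract, and all names are hypothetical.

```python
import itertools
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_prediction(weights, bias, x, p_one):
    """Expected logistic regression output over the missing binary features.

    x holds 0, 1, or None (missing); p_one[i] is the assumed P(x_i = 1).
    """
    missing = [i for i, value in enumerate(x) if value is None]
    total = 0.0
    # Enumerate every completion of the missing features (exponential cost,
    # acceptable only for a toy example).
    for values in itertools.product([0, 1], repeat=len(missing)):
        filled = list(x)
        prob = 1.0
        for i, value in zip(missing, values):
            filled[i] = value
            prob *= p_one[i] if value == 1 else 1.0 - p_one[i]
        z = bias + sum(w * f for w, f in zip(weights, filled))
        total += prob * sigmoid(z)
    return total
```

Because the logistic function is nonlinear, this expectation generally differs from the prediction obtained by plugging in the feature mean, which is one reason simple imputation can behave unexpectedly.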

Thursday 15 16:30 - 18:00 ML|DM - Data Mining 10 (2505-2506)

Chair: Ying Wei

#1329

Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation
Zeping Yu, Jianxun Lian, Ahmad Mahmoody, Gongshen Liu, Xing Xie
Details | PDF

Data Mining 10

User modeling is an essential task for online recommender systems. In the past few decades, collaborative filtering (CF) techniques have been well studied to model users' long-term preferences. Recently, recurrent neural networks (RNN) have shown a great advantage in modeling users' short-term preferences. A natural way to improve the recommender is to combine both long-term and short-term modeling. Previous approaches neglect the importance of dynamically integrating these two user modeling paradigms. Moreover, users' behaviors are much more complex than sentences in language modeling or images in visual computing, so the classical structures of RNN such as Long Short-Term Memory (LSTM) need to be upgraded for better user modeling. In this paper, we improve the traditional RNN structure by proposing a time-aware controller and a content-aware controller, so that contextual information can be well considered to control the state transition. We further propose an attention-based framework to combine users' long-term and short-term preferences, so that users' representations can be generated adaptively according to the specific context. We conduct extensive experiments on both public and industrial datasets. The results demonstrate that our proposed method consistently outperforms several state-of-the-art methods.
#1800

Neural News Recommendation with Attentive Multi-View Learning
Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, Xing Xie
Details | PDF

Data Mining 10

Personalized news recommendation is very important for online news platforms to help users find news of interest and improve user experience. News and user representation learning is critical for news recommendation. Existing news recommendation methods usually learn these representations based on a single kind of news information, e.g., the title, which may be insufficient. In this paper we propose a neural news recommendation approach which can learn informative representations of users and news by exploiting different kinds of news information. The core of our approach is a news encoder and a user encoder. In the news encoder we propose an attentive multi-view learning model to learn unified news representations from titles, bodies and topic categories by regarding them as different views of news. In addition, we apply both word-level and view-level attention mechanisms to the news encoder to select important words and views for learning informative news representations. In the user encoder we learn the representations of users based on their browsed news and apply an attention mechanism to select informative news for user representation learning. Extensive experiments on a real-world dataset show our approach can effectively improve the performance of news recommendation.
#1820

Dual Self-Paced Graph Convolutional Network: Towards Reducing Attribute Distortions Induced by Topology
Liang Yang, Zhiyang Chen, Junhua Gu, Yuanfang Guo
Details | PDF

Data Mining 10

The success of graph convolutional neural network (GCNN) based semi-supervised node classification is credited to attribute smoothing (propagation) over the topology. However, the attributes may be distorted by the use of the topology information. This distortion induces a certain number of misclassifications of nodes that can be correctly predicted with only the attributes. An analysis of the impact of edges on attribute propagation shows that, following curriculum learning, the simple edges, which connect two nodes with similar attributes, should be given priority over the complex ones during the training process. To reduce the distortions induced by the topology while exploiting more of the potential of the attribute information, the Dual Self-Paced Graph Convolutional Network (DSP-GCN) is proposed in this paper. Specifically, the unlabelled nodes with confidently predicted labels are gradually added into the training set in the node-level self-paced learning, while edges are gradually added into the graph during the training process, from the simple edges to the complex ones, in the edge-level self-paced learning. These two learning strategies are designed to mutually reinforce each other by coupling the selections of the edges and unlabelled nodes. Experimental results of transductive semi-supervised node classification on many real networks indicate that the proposed DSP-GCN successfully reduces the attribute distortions induced by the topology while giving superior performance with only one graph convolutional layer.
#5942

Discriminative Sample Generation for Deep Imbalanced Learning
Ting Guo, Xingquan Zhu, Yang Wang, Fang Chen
Details | PDF

Data Mining 10

In this paper, we propose a discriminative variational autoencoder (DVAE) to assist deep learning from data with imbalanced class distributions. DVAE is designed to alleviate the class imbalance by explicitly learning class boundaries between training samples, and uses the learned class boundaries to guide feature learning and sample generation. To learn class boundaries, DVAE learns a latent two-component mixture distributor, conditioned on the class labels, so that the latent features can help differentiate minority class vs. majority class samples. In order to balance the training data so that deep learning emphasizes the minority class, we combine DVAE and generative adversarial networks (GAN) to form a unified model, DVAAN, which generates synthetic instances close to the class boundaries as training data to learn latent features and update the model. Experiments and comparisons confirm that DVAAN significantly alleviates the class imbalance and delivers accurate models for deep learning from imbalanced data.
#6202

Tag2Gauss: Learning Tag Representations via Gaussian Distribution in Tagged Networks
Yun Wang, Lun Du, Guojie Song, Xiaojun Ma, Lichen Jin, Wei Lin, Fei Sun
Details | PDF

Data Mining 10

Keyword-based tags (referred to as tags) are used to represent additional attributes of nodes in addition to what is explicitly stated in their contents, like the hashtags in YouTube. Aside from being auxiliary information for node representation, tags can also be used for retrieval, recommendation, content organization, and event analysis. Therefore, tag representation learning is of great importance. However, learning satisfactory tag representations is challenging because 1) traditional representation methods generally fail when it comes to representing tags, and 2) bidirectional interactions between nodes and tags should be modeled, which are generally not dealt with in existing research works. In this paper, we propose a tag representation learning model which takes tag-related node interaction into consideration, named Tag2Gauss. Specifically, since tags represent node communities with intricate overlapping relationships, we propose that Gaussian distributions would be appropriate for modeling tags. Considering the bidirectional interactions between nodes and tags, we propose a tag representation learning model mapping tags to distributions consisting of two embedding tasks, namely Tag-view embedding and Node-view embedding. Extensive evidence demonstrates the effectiveness of representing a tag as a distribution, and the advantages of the proposed architecture in many applications, such as node classification and network visualization.
#2307

Generalized Majorization-Minimization for Non-Convex Optimization
Hu Zhang, Pan Zhou, Yi Yang, Jiashi Feng
Details | PDF

Data Mining 10

Majorization-Minimization (MM) algorithms optimize an objective function by iteratively minimizing its majorizing surrogate and offer an attractively fast convergence rate for convex problems. However, their convergence behaviors for non-convex problems remain unclear. In this paper, we propose a novel MM surrogate function that moves from strictly upper bounding the objective to bounding the objective in expectation. With this generalized surrogate conception, we develop a new optimization algorithm, termed SPI-MM, that leverages the recently proposed SPIDER for more efficient non-convex optimization. We prove that for finite-sum problems, the SPI-MM algorithm converges to a stationary point within deterministic and lower stochastic gradient complexity. To the best of our knowledge, this work gives the first non-asymptotic convergence analysis for MM-like algorithms in general non-convex optimization. Extensive empirical studies on non-convex logistic regression and sparse PCA demonstrate the advantageous efficiency of the proposed algorithm and validate our theoretical results.
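For readers unfamiliar with MM, the classical scheme (strict upper-bounding surrogates, not the SPI-MM algorithm of this paper) can be sketched on a textbook convex problem: minimizing the sum of absolute deviations, whose minimizer is the median. The function names and the small `eps` safeguard are assumptions added for illustration.

```python
def l1_objective(x, data):
    """Sum of absolute deviations; it is minimized at a median of the data."""
    return sum(abs(x - a) for a in data)


def mm_median(data, iterations=100, eps=1e-9):
    """Classical MM iteration for the sum of absolute deviations."""
    x = sum(data) / len(data)  # start from the mean
    for _ in range(iterations):
        # Majorize each |x - a| at the current iterate x_k by the quadratic
        # (x - a)^2 / (2 |x_k - a|) + |x_k - a| / 2, which touches it at x_k.
        # eps only guards against division by zero when x_k hits a data point.
        weights = [1.0 / max(abs(x - a), eps) for a in data]
        # Minimizing the sum of these quadratics gives a weighted mean.
        x = sum(w * a for w, a in zip(weights, data)) / sum(weights)
    return x


data = [1.0, 2.0, 3.0, 4.0, 100.0]
estimate = mm_median(data)
```

Each step minimizes a surrogate that lies above the objective and touches it at the current point, so the objective cannot increase. This is the strict upper-bound requirement that the abstract proposes to relax to a bound in expectation.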

Thursday 15 16:30 - 18:00 KRR|LKR - Logics for Knowledge Representation (2401-2402)

Chair: Davide Lanti

#1117

Stratified Evidence Logics
Philippe Balbiani, David Fernández-Duque, Andreas Herzig, Emiliano Lorini
Details | PDF

Logics for Knowledge Representation

Evidence logics model agents' belief revision process as they incorporate and aggregate information obtained from multiple sources. This information is captured using neighbourhood structures, where individual neighbourhoods represent pieces of evidence. In this paper we propose an extended framework which allows one to explicitly quantify either the number of evidence sets, or effort, needed to justify a given proposition, provide a complete deductive calculus and a proof of decidability, and show how existing frameworks can be embedded into ours.
#10970

(Sister Conferences Best Papers Track) Do We Need Many-valued Logics for Incomplete Information?
Marco Console, Paolo Guagliardo, Leonid Libkin
Details | PDF

Logics for Knowledge Representation

One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene's logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene's logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it.
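Kleene's three-valued propositional connectives, which the abstract refers to, are easy to state in code. The sketch below uses Python's `None` for the unknown truth value; that encoding and the function names are illustrative assumptions, not taken from the paper.

```python
def k_not(a):
    """Kleene negation: unknown stays unknown."""
    return None if a is None else (not a)


def k_and(a, b):
    """Kleene conjunction: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene disjunction: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


# With an unknown operand, "x OR NOT x" evaluates to unknown rather than true.
excluded_middle = k_or(None, k_not(None))
```

The last line shows a well-known consequence: a two-valued tautology such as `x OR NOT x` evaluates to unknown when `x` is unknown.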
#10984

(Journal track) A Core Method for the Weak Completion Semantics with Skeptical Abduction
Emmanuelle-Anna Dietz Saldanha, Steffen Hölldobler, Carroline Dewi Puspa Kencana Ramli, Luis Palacios Medinacelli
Details | PDF

Logics for Knowledge Representation

The Weak Completion Semantics is a novel cognitive theory which has been successfully applied, among others, to the suppression task, the selection task and syllogistic reasoning. It is based on logic programming with skeptical abduction. Each weakly completed program admits a least model under the three-valued Lukasiewicz logic which can be computed as the least fixed point of an appropriate semantic operator. The operator can be represented by a three-layer feed-forward network using the Core method. Its least fixed point is the unique stable state of a recursive network which is obtained from the three-layer feed-forward core by mapping the activation of the output layer back to the input layer. The recursive network is embedded into a novel network to compute skeptical abduction. This extended abstract outlines a fully connectionist realization of the Weak Completion Semantics.
#5555

A Tractable, Expressive, and Eventually Complete First-Order Logic of Limited Belief
Gerhard Lakemeyer, Hector J. Levesque
Details | PDF

Logics for Knowledge Representation

In knowledge representation, obtaining a notion of belief which is tractable, expressive, and eventually complete has been a somewhat elusive goal. Expressivity here means that an agent should be able to hold arbitrary beliefs in a very expressive language like that of first-order logic, but without being required to perform full logical reasoning on those beliefs. Eventual completeness means that any logical consequence of what is believed will eventually come to be believed, given enough reasoning effort. Tractability in a first-order setting has been a research topic for many years, but in most cases limitations were needed on the form of what was believed, and eventual completeness was so far restricted to the propositional case. In this paper, we propose a novel logic of limited belief, which has all three desired properties.
#6493

On Finite and Unrestricted Query Entailment beyond SQ with Number Restrictions on Transitive Roles
Tomasz Gogacz, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Jean Christoph Jung, Filip Murlak
Details | PDF

Logics for Knowledge Representation

We study the description logic SQ with number restrictions applicable to transitive roles, extended with either nominals or inverse roles. We show tight 2EXPTIME upper bounds for unrestricted entailment of regular path queries for both extensions and finite entailment of positive existential queries for nominals. For inverses, we establish 2EXPTIME-completeness for unrestricted and finite entailment of instance queries (the latter under restriction to a single, transitive role).
#2548

Learning Semantic Annotations for Tabular Data
Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton
Details | PDF

Logics for Knowledge Representation

The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any metadata. Unlike traditional lexical matching-based methods, we propose a deep prediction model that can fully exploit a table's contextual semantics, including table locality features learned by a Hybrid Neural Network (HNN), and inter-column semantics features learned by a knowledge base (KB) lookup and query answering algorithm. It exhibits good performance not only on individual table sets, but also when transferring from one table set to another.

Thursday 15 16:30 - 18:00 PS|PA - Planning Algorithms (2403-2404)

Chair: Roman Bartak

#941

Finding Optimal Solutions in HTN Planning - A SAT-based Approach
Gregor Behnke, Daniel Höller, Susanne Biundo
Details | PDF

Planning Algorithms

Over the last years, several new approaches to Hierarchical Task Network (HTN) planning have been proposed that increased the overall performance of HTN planners. However, the focus has been on agile planning - on finding a solution as quickly as possible. Little work has been done on finding optimal plans. We show how the currently best-performing approach to HTN planning - the translation into propositional logic - can be utilised to find optimal plans. Such SAT-based planners usually bound the HTN problem to a certain depth of decomposition and then translate the problem into a propositional formula. To generate optimal plans, the length of the solution has to be bounded instead of the decomposition depth. We show the relationship between these bounds and how it can be handled algorithmically. Based on this, we propose an optimal SAT-based HTN planner and show that it performs favourably on a benchmark set.
#2050

Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning
Thomy Phan, Thomas Gabor, Robert Müller, Christoph Roch, Claudia Linnhoff-Popien
Details | PDF

Planning Algorithms

We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL in four large POMDP benchmark problems to demonstrate its effectiveness and robustness w.r.t. the choice of hyperparameters and evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.
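
As a rough illustration of the Thompson Sampling bandits mentioned in the abstract above, the sketch below shows a single Beta-Bernoulli bandit. It is not the SYMBOL algorithm: the class name, the binary reward model, and the Beta(1, 1) prior are assumptions made for this example, and the stacking and adaptation of bandits along the planning horizon is not shown.

```python
import random


class BernoulliThompsonBandit:
    """Hypothetical single Thompson Sampling bandit with binary rewards."""

    def __init__(self, n_arms, rng=None):
        # Beta(1, 1) prior for every arm (an assumption for this sketch).
        self.successes = [1] * n_arms
        self.failures = [1] * n_arms
        self.rng = rng or random.Random()

    def select_arm(self):
        # Draw one sample per arm from its Beta posterior, then act greedily
        # with respect to the sampled values.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Binary reward: 1 counts as a success, 0 as a failure.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1
```

In an open-loop planner, one such bandit would typically be kept per time step of the plan, with `select_arm` choosing the action at that step and `update` receiving feedback from a simulated rollout.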
#4182

Merge-and-Shrink Task Reformulation for Classical Planning
Álvaro Torralba, Silvan Sievers
Details | PDF

Planning Algorithms

The performance of domain-independent planning systems heavily depends on how the planning task has been modeled. This makes task reformulation an important tool to get rid of unnecessary complexity and increase the robustness of planners with respect to the model chosen by the user. In this paper, we represent tasks as factored transition systems (FTS), and use the merge-and-shrink (M&S) framework for task reformulation for optimal and satisficing planning. We prove that the flexibility of the underlying representation makes the M&S reformulation methods more powerful than the counterparts based on the more popular finite-domain representation. We adapt delete-relaxation and M&S heuristics to work on the FTS representation and evaluate the impact of our reformulation.
#5701

Subgoal-Based Temporal Abstraction in Monte-Carlo Tree Search
Thomas Gabor, Jan Peter, Thomy Phan, Christian Meyer, Claudia Linnhoff-Popien
Details | PDF

Planning Algorithms

We propose an approach to general subgoal-based temporal abstraction in MCTS. Our approach approximates a set of available macro-actions locally for each state only requiring a generative model and a subgoal predicate. For that, we modify the expansion step of MCTS to automatically discover and optimize macro-actions that lead to subgoals. We empirically evaluate the effectiveness, computational efficiency and robustness of our approach w.r.t. different parameter settings in two benchmark domains and compare the results to standard MCTS without temporal abstraction.
#6055

Strong Fully Observable Non-Deterministic Planning with LTL and LTLf Goals
Alberto Camacho, Sheila A. McIlraith
Details | PDF

Planning Algorithms

We are concerned with the synthesis of strategies for sequential decision-making in non-deterministic dynamical environments where the objective is to satisfy a prescribed temporally extended goal. We frame this task as a Fully Observable Non-Deterministic planning problem with the goal expressed in Linear Temporal Logic (LTL), or LTL interpreted over finite traces (LTLf). While the problem is well-studied theoretically, existing algorithmic solutions typically compute so-called strong-cyclic solutions, which are predicated on an assumption of fairness. In this paper we introduce novel algorithms to compute so-called strong solutions, that guarantee goal satisfaction even in the absence of fairness. Our strategy generation algorithms are complemented with novel mechanisms to obtain proofs of unsolvability. We implemented and evaluated the performance of our approaches in a selection of domains with LTL and LTLf goals.
#6446

Generalized Potential Heuristics for Classical Planning
Guillem Francès, Augusto B. Corrêa, Cedric Geissmann, Florian Pommerening
Details | PDF

Planning Algorithms

Generalized planning aims at computing solutions that work for all instances of the same domain. In this paper, we show that several interesting planning domains possess compact generalized heuristics that can guide a greedy search in guaranteed polynomial time to the goal, and which work for any instance of the domain. These heuristics are weighted sums of state features that capture the number of objects satisfying a certain first-order logic property in any given state. These features have a meaningful interpretation and generalize naturally to the whole domain. Additionally, we present an approach based on mixed integer linear programming to compute such heuristics automatically from the observation of small training instances. We develop two variations of the approach that progressively refine the heuristic as new states are encountered. We illustrate the approach empirically on a number of standard domains, where we show that the generated heuristics will correctly generalize to all possible instances.

Thursday 15 16:30 - 18:00 Survey 2 - Survey Session 2 (2405-2406)

Chair: Fangzhen Lin

#10921

Leveraging Human Guidance for Deep Reinforcement Learning Tasks
Ruohan Zhang, Faraz Torabi, Lin Guan, Dana H. Ballard, Peter Stone
Details | PDF

Survey Session 2

Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment. Human knowledge of how to solve these tasks can be incorporated using imitation learning, where the agent learns to imitate human demonstrated decisions. However, human guidance is not limited to the demonstrations. Other types of guidance could be more suitable for certain tasks and require less human effort. This survey provides a high-level overview of five recent learning frameworks that primarily rely on human guidance other than conventional, step-by-step action demonstrations. We review the motivation, assumption, and implementation of each framework. We then discuss possible future research directions.
#10929

Learning and Inference for Structured Prediction: A Unifying Perspective
Aryan Deshwal, Janardhan Rao Doppa, Dan Roth
Details | PDF

Survey Session 2

In a structured prediction problem, one needs to learn a predictor that, given a structured input, produces a structured object, such as a sequence, tree, or clustering output. Prototypical structured prediction tasks include part-of-speech tagging (predicting the POS tag sequence for an input sentence) and semantic segmentation of images (predicting semantic labels for pixels of an input image). Unlike simple classification problems, here there is a need to assign values to multiple output variables while accounting for the dependencies between them. Consequently, the prediction step itself (aka "inference" or "decoding") is computationally expensive, and so is the learning process, which typically requires making predictions as part of it. The key learning and inference challenge is due to the exponential size of the structured output space and depends on its complexity. In this paper, we present a unifying perspective of the different frameworks that address structured prediction problems and compare them in terms of their strengths and weaknesses. We also discuss important research directions including integration of deep learning advances into structured prediction, and learning from weakly supervised signals and active querying to overcome the challenges of building structured predictors from a small amount of labeled data.
#10944

Deep Learning for Video Captioning: A Review
Shaoxiang Chen, Ting Yao, Yu-Gang Jiang
Details | PDF

Survey Session 2

Deep learning has recently achieved great success in solving specific artificial intelligence problems. Substantial progress has been made in Computer Vision (CV) and Natural Language Processing (NLP). As a connection between the two worlds of vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. The task is naturally decomposed into two sub-tasks. One is to encode a video via a thorough understanding and learn a visual representation. The other is caption generation, which decodes the learned representation into a sequential sentence, word by word. In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, followed by a summary of standard datasets and representative approaches. Finally, we highlight the challenges which are not yet fully understood in this task and present future research directions.
#10945

Recent Advances in Imitation Learning from Observation
Faraz Torabi, Garrett Warnell, Peter Stone
Details | PDF

Survey Session 2

Imitation learning is the process by which one agent tries to learn how to perform a certain task using information generated by another, often more-expert agent performing that same task. Conventionally, the imitator has access to both state and action information generated by an expert performing the task (e.g., the expert may provide a kinesthetic demonstration of object placement using a robotic arm). However, requiring the action information prevents imitation learning from a large number of existing valuable learning resources such as online videos of humans performing tasks. To overcome this issue, the specific problem of imitation from observation (IfO) has recently garnered a great deal of attention, in which the imitator only has access to the state information (e.g., video frames) generated by the expert. In this paper, we provide a literature review of methods developed for IfO, and then point out some open research problems and potential future work.
#10939

A Survey of Reinforcement Learning Informed by Natural Language
Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel
Details | PDF

Survey Session 2

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular. We survey the state of the field, including work on instruction following, text games, and learning from textual domain knowledge. Finally, we call for the development of new environments as well as further investigation into the potential uses of recent Natural Language Processing (NLP) techniques for such tasks.
#10948

Sequential Recommender Systems: Challenges, Progress and Prospects
Shoujin Wang, Liang Hu, Yan Wang, Longbing Cao, Quan Z. Sheng, Mehmet Orgun
Details | PDF

Survey Session 2

The emerging topic of sequential recommender systems (SRSs) has attracted increasing attention in recent years. Different from the conventional recommender systems (RSs) including collaborative filtering and content-based filtering, SRSs try to understand and model the sequential user behaviors, the interactions between users and items, and the evolution of users’ preferences and item popularity over time. SRSs involve the above aspects for more precise characterization of user contexts, intent and goals, and item consumption trend, leading to more accurate, customized and dynamic recommendations. In this paper, we provide a systematic review on SRSs. We first present the characteristics of SRSs, and then summarize and categorize the key challenges in this research area, followed by the corresponding research progress consisting of the most recent and representative developments on this topic. Finally, we discuss the important research directions in this vibrant area.

Thursday 15 17:00 - 17:20 Competition (Hall A)

Angry Birds - AI's turn -- will it beat all humans?

Competition

Thursday 15 17:00 - 17:30 Industry Days (K)

Chair: Jun Luo (Lenovo)

Responsible AI - from theory to practice
Anand Rao, Global Artificial Intelligence Leader, PwC

Industry Days

Friday 16 08:30 - 09:20 Invited Talk (D-I)

Chair: Jonathan Schaeffer

Creating the Engine for Scientific Discovery: Nobel Turing Challenge as a grand challenge project in AI and Systems Biology
Hiroaki Kitano

Invited Talk

Friday 16 09:30 - 10:30 Industry Days (D-I)

Chair: Masahiro Fujita (Sony)

Using AI to Democratize Video and Music Content Creation
Lei Li, Leader of AI Lab, Bytedance

Industry Days

Friday 16 09:30 - 10:30 MLA|N - Networks (L)

Chair: Liu Yong

#340

Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification
Fenyu Hu, Yanqiao Zhu, Shu Wu, Liang Wang, Tieniu Tan
Details | PDF

Networks

Graph convolutional networks (GCNs) have been successfully applied in node classification tasks of network mining. However, most of these models based on neighborhood aggregation are usually shallow and lack the “graph pooling” mechanism, which prevents the model from obtaining adequate global information. In order to increase the receptive field, we propose a novel deep Hierarchical Graph Convolutional Network (H-GCN) for semi-supervised node classification. H-GCN first repeatedly aggregates structurally similar nodes to hyper-nodes and then refines the coarsened graph to the original to restore the representation for each node. Instead of merely aggregating one- or two-hop neighborhood information, the proposed coarsening procedure enlarges the receptive field for each node, hence more global information can be captured. The proposed H-GCN model shows strong empirical performance on various public benchmark graph datasets, outperforming state-of-the-art methods and acquiring up to 5.9% performance improvement in terms of accuracy. In addition, when only a few labeled samples are provided, our model gains substantial improvements.
#2045

Scaling Fine-grained Modularity Clustering for Massive Graphs
Hiroaki Shiokawa, Toshiyuki Amagasa, Hiroyuki Kitagawa
Details | PDF

Networks

Modularity clustering is an essential tool to understand complicated graphs. However, existing methods are not applicable to massive graphs due to two serious weaknesses. (1) It is difficult to fully reproduce ground-truth clusters due to the resolution limit problem. (2) They are computationally expensive because all nodes and edges must be computed iteratively. This paper proposes gScarf, which outputs fine-grained clusters within a short running time. To overcome the aforementioned weaknesses, gScarf dynamically prunes unnecessary nodes and edges, ensuring that it captures fine-grained clusters. Experiments show that gScarf outperforms existing methods in terms of running time while finding clusters with high accuracy.
#443

An End-to-End Community Detection Model: Integrating LDA into Markov Random Field via Factor Graph
Dongxiao He, Wenze Song, Di Jin, Zhiyong Feng, Yuxiao Huang
Details | PDF

Networks

Markov Random Field (MRF) has been successfully used in community detection recently. However, existing MRF methods only utilize the network topology while ignoring the semantic attributes. A straightforward way to combine the two types of information is to first use a topic clustering model (e.g., LDA) to derive group membership of nodes from the semantic attributes, and then take this result as a prior to define the MRF model. In this way, however, the parameters of the two models cannot be adjusted by each other, which prevents the approach from combining the complementary advantages of the two. This paper integrates LDA into MRF to form an end-to-end learning system where their parameters can be trained jointly. However, LDA is a directed graphical model whereas MRF is undirected, making their integration a challenge. To handle this problem, we first transform LDA and MRF into a unified factor graph framework, allowing the two models to share parameters. We then derive an efficient belief propagation algorithm to train their parameters simultaneously, enabling our approach to take advantage of the strengths of both LDA and MRF. Empirical results show that our approach compares favorably with the state-of-the-art methods.
#3012

Story Ending Prediction by Transferable BERT
Zhongyang Li, Xiao Ding, Ting Liu
Details | PDF

Networks

Recent advances, such as GPT and BERT, have shown success in incorporating a pre-trained transformer language model and fine-tuning operation to improve downstream NLP systems. However, this framework still has some fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task. Particularly, we propose utilizing three kinds of transfer tasks, including natural language inference, sentiment classification, and next action prediction, to further train BERT based on a pre-trained model. This enables the model to get a better initialization for the target task. We take story ending prediction as the target task to conduct experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments give some helpful suggestions on how to select transfer tasks to improve BERT.

Friday 16 09:30 - 10:30 ML|TAML - Transfer, Adaptation, Multi-task Learning 4 (J)

Chair: Tsuyoshi Ide

#1245

Hierarchical Inter-Attention Network for Document Classification with Multi-Task Learning
Bing Tian, Yong Zhang, Jin Wang, Chunxiao Xing
Details | PDF

Transfer, Adaptation, Multi-task Learning 4

Document classification is an essential task in many real world applications. Existing approaches adopt both text semantics and document structure to obtain the document representation. However, these models usually require a large collection of annotated training instances, which are not always feasible, especially in low-resource settings. In this paper, we propose a multi-task learning framework to jointly train multiple related document classification tasks. We devise a hierarchical architecture to make use of the shared knowledge from all tasks to enhance the document representation of each task. We further propose an inter-attention approach to improve the task-specific modeling of documents with global information. Experimental results on 15 public datasets demonstrate the benefits of our proposed model.
#2526

Multiplicative Sparse Feature Decomposition for Efficient Multi-View Multi-Task Learning
Lu Sun, Canh Hao Nguyen, Hiroshi Mamitsuka
Details | PDF

Transfer, Adaptation, Multi-task Learning 4

Multi-view multi-task learning refers to dealing with dual-heterogeneous data, where each sample has multi-view features, and multiple tasks are correlated via common views. Existing methods do not sufficiently address three key challenges: (a) saving task correlation efficiently, (b) building a sparse model and (c) learning view-wise weights. In this paper, we propose a new method to directly handle these challenges based on multiplicative sparse feature decomposition. For (a), the weight matrix is decomposed into two components via low-rank constraint matrix factorization, which saves task correlation by learning a reduced number of model parameters. For (b) and (c), the first component is further decomposed into two sub-components, to select topic-specific features and learn view-wise importance, respectively. Theoretical analysis reveals its equivalence with a general form of joint regularization, and motivates us to develop a fast optimization algorithm in a linear complexity w.r.t. the data size. Extensive experiments on both simulated and real-world datasets validate its efficiency.
#2707

Learning Shared Knowledge for Deep Lifelong Learning using Deconvolutional Networks
Seungwon Lee, James Stokes, Eric Eaton
Details | PDF

Transfer, Adaptation, Multi-task Learning 4

Current mechanisms for knowledge transfer in deep networks tend to either share the lower layers between tasks, or build upon representations trained on other tasks. However, existing work in non-deep multi-task and lifelong learning has shown success with using factorized representations of the model parameter space for transfer, permitting more flexible construction of task models. Inspired by this idea, we introduce a novel architecture for sharing latent factorized representations in convolutional neural networks (CNNs). The proposed approach, called a deconvolutional factorized CNN, uses a combination of deconvolutional factorization and tensor contraction to perform flexible transfer between tasks. Experiments on two computer vision data sets show that the DF-CNN achieves superior performance in challenging lifelong learning settings, resists catastrophic forgetting, and exhibits reverse transfer to improve previously learned tasks from subsequent experience without retraining.
#4932

Multi-scale Information Diffusion Prediction with Reinforced Recurrent Networks
Cheng Yang, Jian Tang, Maosong Sun, Ganqu Cui, Zhiyuan Liu
Details | PDF

Transfer, Adaptation, Multi-task Learning 4

Information diffusion prediction is an important task which studies how information items spread among users. With the success of deep learning techniques, recurrent neural networks (RNNs) have shown their powerful capability in modeling information diffusion as sequential data. However, previous works focused on either microscopic diffusion prediction which aims at guessing the next influenced user or macroscopic diffusion prediction which estimates the total numbers of influenced users during the diffusion process. To the best of our knowledge, no previous works have suggested a unified model for both microscopic and macroscopic scales. In this paper, we propose a novel multi-scale diffusion prediction model based on reinforcement learning (RL). RL incorporates the macroscopic diffusion size information into the RNN-based microscopic diffusion model by addressing the non-differentiable problem. We also employ an effective structural context extraction strategy to utilize the underlying social graph information. Experimental results show that our proposed model outperforms state-of-the-art baseline models on both microscopic and macroscopic diffusion predictions on three real-world datasets.

Friday 16 09:30 - 10:30 ML|AL - Active Learning 2 (2701-2702)

Chair: Francisco Javier Diez

#2884

Adaptive Ensemble Active Learning for Drifting Data Stream Mining
Bartosz Krawczyk, Alberto Cano
Details | PDF

Active Learning 2

Learning from data streams is among the most vital contemporary fields in machine learning and data mining. Streams pose new challenges to learning systems, due to their volume and velocity, as well as their ever-changing nature caused by concept drift. The vast majority of works on data streams assume a fully supervised learning scenario with unrestricted access to class labels. This assumption does not hold in real-world applications, where obtaining ground truth is costly and time-consuming. Therefore, we need to carefully select which instances should be labeled, as we are usually working under a strict label budget. In this paper, we propose a novel active learning approach based on ensemble algorithms that is capable of using multiple base classifiers during the label query process. It is a plug-in solution, capable of working with most existing streaming ensemble classifiers. We realize this process as a Multi-Armed Bandit problem, obtaining an efficient and adaptive ensemble active learning procedure by selecting the most competent classifier from the pool for each query. In order to better adapt to concept drifts, we guide our instance selection by measuring the generalization capabilities of our classifiers. This adaptive solution leads not only to better instance selection under sparse access to class labels, but also to improved adaptation to various types of concept drift and increased diversity of the underlying ensemble classifier.
#2998

Fully Distributed Bayesian Optimization with Stochastic Policies
Javier Garcia-Barcos, Ruben Martinez-Cantin
Details | PDF

Active Learning 2

Bayesian optimization has become a popular method for applications, like the design of computer experiments or hyperparameter tuning of expensive models, where sample efficiency is mandatory. These situations often involve high-throughput computing, where distributed and scalable architectures are a necessity. However, Bayesian optimization is mostly sequential. Even parallel variants require certain computations between samples, limiting the parallelization bandwidth. Thompson sampling has been previously applied for distributed Bayesian optimization. Yet, when compared with other acquisition functions in the sequential setting, Thompson sampling is known to perform suboptimally. In this paper, we present a new method for fully distributed Bayesian optimization, which can be combined with any acquisition function. Our approach considers Bayesian optimization as a partially observable Markov decision process. In this context, stochastic policies, such as the Boltzmann policy, have some interesting properties which can also be studied for Bayesian optimization. Furthermore, the Boltzmann policy trivially allows a distributed Bayesian optimization implementation with a high level of parallelism and scalability. We present results in several benchmarks and applications that show the performance of our method.
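
To make the Boltzmann policy mentioned in the abstract above concrete, the following minimal sketch samples a candidate index with probability proportional to the exponential of its score. It is not the authors' implementation: the function names, the fixed list of candidate scores (standing in for acquisition function values), and the temperature parameter are assumptions made for this example.

```python
import math
import random


def boltzmann_probabilities(scores, temperature=1.0):
    """Softmax over scores; a higher temperature gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_candidate(scores, temperature=1.0, rng=random):
    """Draw one candidate index from the Boltzmann distribution over scores."""
    probs = boltzmann_probabilities(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]
```

Because each draw is independent given the scores, several workers could call `sample_candidate` at the same time without coordinating with one another, which is the intuition behind the parallelism claim in the abstract.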
#5962

Perception-Aware Point-Based Value Iteration for Partially Observable Markov Decision Processes
Mahsa Ghasemi, Ufuk Topcu
Details | PDF

Active Learning 2

In conventional partially observable Markov decision processes, the observations that the agent receives originate from fixed known distributions. However, in a variety of real-world scenarios, the agent has an active role in its perception by selecting which observations to receive. We avoid the combinatorial expansion of the action space that results from integrating planning and perception decisions, through a greedy strategy for observation selection that minimizes an information-theoretic measure of the state uncertainty. We develop a novel point-based value iteration algorithm that incorporates this greedy strategy to pick perception actions for each sampled belief point in each iteration. As a result, the solver not only requires fewer belief points to approximate the reachable subspace of the belief simplex, but also requires less computation per iteration. Further, we prove that the proposed algorithm achieves a near-optimal guarantee on the value function with respect to an optimal perception strategy, and demonstrate its performance empirically.
#2104

Rapid Performance Gain through Active Model Reuse
Feng Shi, Yu-Feng Li
Details | PDF

Active Learning 2

Model reuse aims at reducing the need for learning resources for a new target task. In previous model reuse studies, the target task usually receives labeled data passively, which results in slow performance improvement. However, learning models for target tasks are often required to achieve good enough performance rapidly for practical usage. In this paper, we propose the AcMR (Active Model Reuse) method for the rapid performance improvement problem. Firstly, we construct queries through pre-trained models to facilitate the active learner when labeled examples are insufficient in the target task. Secondly, we consider that pre-trained models are able to filter out less necessary queries, so that AcMR can save a considerable number of queries compared with direct active learning. Theoretical analysis verifies that AcMR requires fewer queries than direct active learning. Experimental results validate the effectiveness of AcMR.

Friday 16 09:30 - 10:30 AMS|CG - Cooperative Games (2703-2704)

Chair: Timothy Norman

#397

Portioning Using Ordinal Preferences: Fairness and Efficiency
Stéphane Airiau, Haris Aziz, Ioannis Caragiannis, Justin Kruger, Jérôme Lang, Dominik Peters
Details | PDF

Cooperative Games

A public divisible resource is to be divided among projects. We study rules that decide on a distribution of the budget when voters have ordinal preference rankings over projects. Examples of such portioning problems are participatory budgeting, time shares, and parliament elections. We introduce a family of rules for portioning, inspired by positional scoring rules. Rules in this family are given by a scoring vector (such as plurality or Borda) associating a positive value with each rank in a vote, and an aggregation function such as leximin or the Nash product. Our family contains well-studied rules, but most are new. We discuss computational and normative properties of our rules. We focus on fairness, and introduce the SD-core, a group fairness notion. Our Nash rules are in the SD-core, and the leximin rules satisfy individual fairness properties. Both are Pareto-efficient.
#1534

An Ordinal Banzhaf Index for Social Ranking
Hossein Khani, Stefano Moretti, Meltem Öztürk
Details | PDF

Cooperative Games

We introduce a new method to rank single elements given an order over their sets. For this purpose, we extend the game theoretic notion of marginal contribution and of Banzhaf index to our ordinal framework. Furthermore, we characterize the resulting ordinal Banzhaf solution by means of a set of properties inspired from those used to axiomatically characterize another solution from the literature: the ceteris paribus majority. Finally, we show that the computational procedure for these two social ranking solutions boils down to a weighted combination of comparisons over the same subsets of elements.
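
For readers unfamiliar with the classical notion that the abstract above extends, the sketch below computes the standard (cardinal) Banzhaf index of each player as the average marginal contribution over all coalitions of the other players. It is not the ordinal solution proposed in the paper: the function name and the weighted voting game in the usage note are assumptions chosen for this example.

```python
from itertools import combinations


def banzhaf_index(players, value):
    """Classical Banzhaf index: mean marginal contribution of each player.

    `value` maps a frozenset of players to a number (the coalition's worth).
    """
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        # Enumerate every coalition that does not contain `player`.
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += value(s | {player}) - value(s)
        result[player] = total / 2 ** len(others)
    return result
```

For a hypothetical weighted voting game with weights a=2, b=1, c=1 and quota 3, calling `banzhaf_index(["a", "b", "c"], v)` with `v` returning 1 for winning coalitions and 0 otherwise gives 0.75 for `a` and 0.25 each for `b` and `c`.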
#3137

Stable and Envy-free Partitions in Hedonic Games
Nathanaël Barrot, Makoto Yokoo
Details | PDF

Cooperative Games

In this paper, we study coalition formation in hedonic games through the fairness criterion of envy-freeness. Since the grand coalition is always envy-free, we focus on the conjunction of envy-freeness with stability notions. We first show that, in symmetric and additively separable hedonic games, an individually stable and justified envy-free partition may not exist and deciding its existence is NP-complete. Then, we prove that the top responsiveness property guarantees the existence of a Pareto optimal, individually stable, and envy-free partition, but it is not sufficient for the conjunction of core stability and envy-freeness. Finally, under bottom responsiveness, we show that deciding the existence of an individually stable and envy-free partition is NP-complete, but a Pareto optimal and justified envy-free partition always exists.
#5840

Robustness against Agent Failure in Hedonic Games
Ayumi Igarashi, Kazunori Ota, Yuko Sakurai, Makoto Yokoo
Details | PDF

Cooperative Games

We study how stability can be maintained even after any set of at most k players leave their groups, in the context of hedonic games. While stability properties ensure that an outcome is robust against players' deviations, it has not been considered how an unexpected change caused by a sudden deletion of players affects stable outcomes. In this paper we propose a novel criterion that reshapes stability from a robustness aspect. We observe that some stability properties can no longer be preserved even when a single agent is removed. However, we obtain positive results by focusing on symmetric friend-oriented hedonic games. We prove that we can efficiently decide the existence of robust outcomes with respect to Nash stability under deletion of any number of players or contractual individual stability under deletion of a single player. We also prove that symmetric additively separable games always admit an individually stable outcome that is robust with respect to individual rationality.

Friday 16 09:30 - 10:30 ML|LPR - Learning Preferences or Rankings 1 (2705-2706)

Chair: Weike Pan

#1232

Reinforced Negative Sampling for Recommendation with Exposure Data
Jingtao Ding, Yuhan Quan, Xiangnan He, Yong Li, Depeng Jin
Details | PDF

Learning Preferences or Rankings 1

In implicit feedback-based recommender systems, user exposure data, which record whether or not a recommended item has been interacted with by a user, provide an important clue on selecting negative training samples. In this work, we improve the negative sampler by integrating the exposure data. We propose to generate high-quality negative instances by adversarial training to favour the difficult instances, and by optimizing an additional objective to favour the real negatives in exposure data. However, this idea is non-trivial to implement since the distribution of exposure data is latent and the item space is discrete. To this end, we design a novel RNS method (short for Reinforced Negative Sampler) that generates exposure-alike negative instances through a feature matching technique instead of directly choosing from exposure data. Optimized under the reinforcement learning framework, RNS is able to integrate user preference signals in exposure data and hard negatives. Extensive experiments on two real-world datasets demonstrate the effectiveness and rationality of our RNS method. Our implementation is available at: https://github.com/dingjingtao/ReinforceNS.
#1855

Correlation-Sensitive Next-Basket Recommendation
Duc-Trong Le, Hady W. Lauw, Yuan Fang
Details | PDF

Learning Preferences or Rankings 1

Items adopted by a user over time are indicative of the underlying preferences. We are concerned with learning such preferences from observed sequences of adoptions for recommendation. As multiple items are commonly adopted concurrently, e.g., a basket of grocery items or a sitting of media consumption, we deal with a sequence of baskets as input, and seek to recommend the next basket. Intuitively, a basket tends to contain groups of related items that support particular needs. Instead of recommending items independently for the next basket, we hypothesize that incorporating information on pairwise correlations among items would help to arrive at more coherent basket recommendations. Towards this objective, we develop a hierarchical network architecture codenamed Beacon to model basket sequences. Each basket is encoded taking into account the relative importance of items and correlations among item pairs. This encoding is utilized to infer sequential associations along the basket sequence. Extensive experiments on three public real-life datasets showcase the effectiveness of our approach for the next-basket recommendation problem.
#4486

Feature Evolution Based Multi-Task Learning for Collaborative Filtering with Social Trust
Qitian Wu, Lei Jiang, Xiaofeng Gao, Xiaochun Yang, Guihai Chen
Details | PDF

Learning Preferences or Rankings 1

Social recommendation could address the data sparsity and cold-start problems for collaborative filtering by leveraging user trust relationships as auxiliary information for recommendation. However, most existing methods tend to consider the trust relationship as preference similarity in a static way and model the representations for user preference and social trust via a common feature space. In this paper, we propose TrustEV and take the view of multi-task learning to unite collaborative filtering for recommendation and network embedding for user trust. We design a special feature evolution unit that enables the embedding vectors for two tasks to exchange their features in a probabilistic manner, and further harness a meta-controller to globally explore proper settings for the feature evolution units. The training process contains two nested loops, where in the outer loop, we optimize the meta-controller by Bayesian optimization, and in the inner loop, we train the feedforward model with given feature evolution units. Experiment results show that TrustEV could make better use of social information and greatly improve recommendation MAE over state-of-the-art approaches.
#10971

(Sister Conferences Best Papers Track) Causal Embeddings for Recommendation: An Extended Abstract
Flavian Vasile, Stephen Bonner
Details | PDF

Learning Preferences or Rankings 1

Recommendations are commonly used to modify users' natural behavior, for example, increasing product sales or the time spent on a website. This results in a gap between the ultimate business objective and the classical setup where recommendations are optimized to be coherent with past user behavior. To bridge this gap, we propose a new learning setup for recommendation that optimizes for the Incremental Treatment Effect (ITE) of the policy. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy and propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements.

Friday 16 09:30 - 10:30 HAI|EIAI - Ethical Issues in AI (2601-2602)

Chair: Mary-Anne Williams

#3224

Counterfactual Fairness: Unidentification, Bound and Algorithm
Yongkai Wu, Lu Zhang, Xintao Wu
Details | PDF

Ethical Issues in AI

Fairness-aware learning studies the problem of building machine learning models that are subject to fairness requirements. Counterfactual fairness is a notion of fairness derived from Pearl's causal model, which considers a model fair if, for a particular individual or group, its prediction in the real world is the same as that in the counterfactual world where the individual(s) had belonged to a different demographic group. However, an inherent limitation of counterfactual fairness is that it cannot be uniquely quantified from the observational data in certain situations, due to the unidentifiability of the counterfactual quantity. In this paper, we address this limitation by mathematically bounding the unidentifiable counterfactual quantity, and develop a theoretically sound algorithm for constructing counterfactually fair classifiers. We evaluate our method in the experiments using both synthetic and real-world datasets, as well as compare with existing methods. The results validate our theory and show the effectiveness of our method.
#3313

Achieving Causal Fairness through Generative Adversarial Networks
Depeng Xu, Yongkai Wu, Shuhan Yuan, Lu Zhang, Xintao Wu
Details | PDF

Ethical Issues in AI

Achieving fairness in learning models is currently an imperative task in machine learning. Meanwhile, recent research showed that fairness should be studied from the causal perspective, and proposed a number of fairness criteria based on Pearl's causal modeling framework. In this paper, we investigate the problem of building causal fairness-aware generative adversarial networks (CFGAN), which can learn a close distribution from a given dataset, while also ensuring various causal fairness criteria based on a given causal graph. CFGAN adopts two generators, whose structures are purposefully designed to reflect the structures of causal graph and interventional graph. Therefore, the two generators can respectively simulate the underlying causal model that generates the real data, as well as the causal model after the intervention. On the other hand, two discriminators are used for producing a close-to-real distribution, as well as for achieving various fairness criteria based on causal quantities simulated by generators. Experiments on a real-world dataset show that CFGAN can generate high quality fair data.
#4812

FAHT: An Adaptive Fairness-aware Decision Tree Classifier
Wenbin Zhang, Eirini Ntoutsi
Details | PDF

Ethical Issues in AI

Automated data-driven decision-making systems are ubiquitous across a wide spread of online as well as offline services. These systems depend on sophisticated learning algorithms and available data to optimize the service function for decision support assistance. However, there is a growing concern about the accountability and fairness of the employed models due to the fact that often the available historic data is intrinsically discriminatory, i.e., the proportion of members sharing one or more sensitive attributes is higher than the proportion in the population as a whole when receiving positive classification, which leads to a lack of fairness in decision support systems. A number of fairness-aware learning methods have been proposed to handle this concern. However, these methods tackle fairness as a static problem and do not take the evolution of the underlying stream population into consideration. In this paper, we introduce a learning mechanism to design a fair classifier for online stream based decision-making. Our learning model, FAHT (Fairness-Aware Hoeffding Tree), is an extension of the well-known Hoeffding Tree algorithm for decision tree induction over streams, that also accounts for fairness. Our experiments show that our algorithm is able to deal with discrimination in streaming environments, while maintaining a moderate predictive performance over the stream.
#10965

(Sister Conferences Best Papers Track) Delayed Impact of Fair Machine Learning
Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, Moritz Hardt
Details | PDF

Ethical Issues in AI

Static classification has been the predominant focus of the study of fairness in machine learning. While most models do not consider how decisions change populations over time, it is conventional wisdom that fairness criteria promote the long-term well-being of groups they aim to protect. This work studies the interaction of static fairness criteria with temporal indicators of well-being. We show a simple one-step feedback model in which common criteria do not generally promote improvement over time, and may in fact cause harm. Our results highlight the importance of temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.

Friday 16 09:30 - 10:30 KRR|CMA - Computational Models of Argument (2603-2604)

Chair: Stefano Bistarelli

#10981

(Journal track) On the Responsibility for Undecisiveness in Preferred and Stable Labellings in Abstract Argumentation
Claudia Schulz, Francesca Toni
Details | PDF

Computational Models of Argument

Different semantics of abstract Argumentation Frameworks (AFs) provide different levels of decisiveness for reasoning about the acceptability of conflicting arguments. The stable semantics is useful for applications requiring a high level of decisiveness, as it assigns to each argument the label "accepted" or the label "rejected". Unfortunately, stable labellings are not guaranteed to exist, thus raising the question as to which parts of AFs are responsible for the non-existence. In this paper, we address this question by investigating a more general question concerning preferred labellings (which may be less decisive than stable labellings but are always guaranteed to exist), namely why a given preferred labelling may not be stable and thus undecided on some arguments. In particular, (1) we give various characterisations of parts of an AF, based on the given preferred labelling, and (2) we show that these parts are indeed responsible for the undecisiveness if the preferred labelling is not stable. We then use these characterisations to explain the non-existence of stable labellings.
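
The non-existence of stable labellings can be seen on tiny examples. The sketch below enumerates stable extensions by brute force, using the standard correspondence in which the accepted arguments of a stable labelling form a conflict-free set that attacks every argument outside it. The function name `stable_extensions` and the toy frameworks are illustrative assumptions, not material from the paper, and the enumeration is exponential in the number of arguments.

```python
from itertools import combinations


def stable_extensions(arguments, attacks):
    """Enumerate all conflict-free sets that attack every argument outside them."""
    args = sorted(arguments)
    result = []
    for r in range(len(args) + 1):
        for subset in combinations(args, r):
            s = set(subset)
            conflict_free = not any((a, b) in attacks for a in s for b in s)
            attacked = {b for (a, b) in attacks if a in s}
            if conflict_free and attacked >= set(args) - s:
                result.append(s)
    return result


# Mutual attack: each argument on its own is a stable extension.
print(stable_extensions({"a", "b"}, {("a", "b"), ("b", "a")}))

# Odd cycle a -> b -> c -> a: no stable extension exists.
print(stable_extensions({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("c", "a")}))
```

The odd cycle is the classic case where every labelling must leave some argument undecided, which is the kind of structure the abstract's "responsible parts" are meant to pinpoint.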
#4410

An Efficient Algorithm for Skeptical Preferred Acceptance in Dynamic Argumentation Frameworks
Gianvincenzo Alfano, Sergio Greco, Francesco Parisi
Details | PDF

Computational Models of Argument

Though there has been an extensive body of work on efficiently solving computational problems for static Dung's argumentation frameworks (AFs), little work has been done for handling dynamic AFs and in particular for deciding the skeptical acceptance of a given argument. In this paper we devise an efficient algorithm for computing the skeptical preferred acceptance in dynamic AFs. More specifically, we investigate how the skeptical acceptance of an argument (goal) evolves when the given AF is updated and propose an efficient algorithm for solving this problem. Our algorithm, called SPA, relies on two main ideas: i) computing a small portion of the input AF, called "context-based" AF, which is sufficient to determine the status of the goal in the updated AF, and ii) incrementally computing the ideal extension to further restrict the context-based AF. We experimentally show that SPA significantly outperforms the computation from scratch, and that the overhead of incrementally maintaining the ideal extension pays off as it speeds up the computation.
#1591

Compilation of Logical Arguments
Leila Amgoud, Dragan Doder
Details | PDF

Computational Models of Argument

Several argument-based logics have been defined for handling inconsistency in propositional knowledge bases. We show that they may miss intuitive consequences, and discuss two sources of this drawback: the definition of logical argument i) may prevent formulas from being justified, and ii) may allow irrelevant information in an argument's support. We circumvent these two issues by considering a general definition of argument and compiling each argument. A compilation amounts to forgetting in an argument's support any irrelevant variable. This operation returns zero, one or several concise arguments, which we then use in an instance of Dung's abstract framework. We show that the resulting logic satisfies existing rationality postulates, namely consistency and closure under deduction. Furthermore, it is more productive than the existing argument-based and coherence-based logics.
#5606

Comparing Options with Argument Schemes Powered by Cancellation
Khaled Belahcene, Christophe Labreuche, Nicolas Maudet, Vincent Mousseau, Wassila Ouerdane
Details | PDF

Computational Models of Argument

We introduce a way of reasoning about preferences represented as pairwise comparative statements, based on a very simple yet appealing principle: cancelling out common values across statements. We formalize and streamline this procedure with argument schemes. As a result, any conclusion drawn by means of this approach comes along with a justification. It turns out that the statements which can be inferred through this process form a proper preference relation. More precisely, it corresponds to a necessary preference relation under the assumption of additive utilities. We show the inference task can be performed in polynomial time in this setting, but that finding a minimal length explanation is NP-complete.

Friday 16 09:30 - 10:30 NLP|QA - Question Answering (2605-2606)

Chair: Parisa Kordjamshidi

#2858

Neural Program Induction for KBQA Without Gold Programs or Query Annotations
Ghulam Ahmed Ansari, Amrita Saha, Vishwajeet Kumar, Mohan Bhambhani, Karthik Sankaranarayanan, Soumen Chakrabarti
Details | PDF

Question Answering

Neural Program Induction (NPI) is a paradigm for decomposing high-level tasks such as complex question-answering over knowledge bases (KBQA) into executable programs by employing neural models. Typically, this involves two key phases: i) inferring input program variables from the high-level task description, and ii) generating the correct program sequence involving these variables. Here we focus on NPI for Complex KBQA with only the final answer as supervision, and not gold programs. This raises major challenges; namely, i) noisy query annotation in the absence of any supervision can lead to catastrophic forgetting while learning, ii) reward becomes extremely sparse owing to the noise. To deal with these, we propose a noise-resilient NPI model, Stable Sparse Reward based Programmer (SSRP) that evades noise-induced instability through continual retrospection and its comparison with current learning behavior. On complex KBQA datasets, SSRP performs at par with hand-crafted rule-based models when provided with gold program input, and in the noisy settings outperforms state-of-the-art models by a significant margin even with a noisier query annotator.
#6281

AmazonQA: A Review-Based Question Answering Task
Mansi Gupta, Nitish Kulkarni, Raghuveer Chanda, Anirudha Rayasam, Zachary C. Lipton
Details | PDF

Question Answering

Every day, thousands of customers post questions on Amazon product pages. After some time, if they are fortunate, a knowledgeable customer might answer their question. Observing that many questions can be answered based upon the available product reviews, we propose the task of review-based QA. Given a corpus of reviews and a question, the QA system synthesizes an answer. To this end, we introduce a new dataset and propose a method that combines information retrieval techniques for selecting relevant reviews (given a question) and "reading comprehension" models for synthesizing an answer (given a question and review). Our dataset consists of 923k questions, 3.6M answers and 14M reviews across 156k products. Building on the well-known Amazon dataset, we additionally collect annotations marking each question as either answerable or unanswerable based on the available reviews. A deployed system could first classify a question as answerable before attempting to generate a provisional answer. Notably, unlike many popular QA datasets, here the questions, passages, and answers are extracted from real human interactions. We evaluate a number of models for answer generation and propose strong baselines, demonstrating the challenging nature of this new task.
#4304

Knowledge Base Question Answering with Topic Units
Yunshi Lan, Shuohang Wang, Jing Jiang
Details | PDF

Question Answering

Knowledge base question answering (KBQA) is an important task in natural language processing. Existing methods for KBQA usually start with entity linking, which considers mostly named entities found in a question as the starting points in the KB to search for answers to the question. However, relying only on entity linking to look for answer candidates may not be sufficient. In this paper, we propose to perform topic unit linking where topic units cover a wider range of units of a KB. We use a generation-and-scoring approach to gradually refine the set of topic units. Furthermore, we use reinforcement learning to jointly learn the parameters for topic unit linking and answer candidate ranking in an end-to-end manner. Experiments on three commonly used benchmark datasets show that our method consistently works well and outperforms the previous state of the art on two datasets.
#4498

Knowledge-enhanced Hierarchical Attention for Community Question Answering with Multi-task and Adaptive Learning
Min Yang, Lei Chen, Xiaojun Chen, Qingyao Wu, Wei Zhou, Ying Shen
Details | PDF

Question Answering

In this paper, we propose a Knowledge-enhanced Hierarchical Attention for community question answering with Multi-task learning and Adaptive learning (KHAMA). First, we propose a hierarchical attention network to fully fuse knowledge from input documents and knowledge base (KB) by exploiting the semantic compositionality of the input sequences. The external factual knowledge helps recognize background knowledge (entity mentions and their relationships) and eliminate noise information from long documents that have sophisticated syntactic and semantic structures. In addition, we build multiple CQA models with adaptive boosting and then combine these models to learn a more effective and robust CQA system. Furthermore, KHAMA is a multi-task learning model. It regards CQA as the primary task and question categorization as the auxiliary task, aiming at learning a category-aware document encoder and enhancing the quality of identifying essential information from long questions. Extensive experiments on two benchmarks demonstrate that KHAMA achieves substantial improvements over the compared methods.

Friday 16 09:30 - 10:30 ML|RL2 - Relational Learning (2501-2502)

Chair: Jia Xiuyi

#2295

Attributed Graph Clustering via Adaptive Graph Convolution
Xiaotong Zhang, Han Liu, Qimai Li, Xiao-Ming Wu
Details | PDF

Relational Learning

Attributed graph clustering is challenging as it requires joint modelling of graph structures and node attributes. Recent progress on graph convolutional networks has proved that graph convolution is effective in combining structural and content information, and several recent methods based on it have achieved promising clustering performance on some real attributed networks. However, there is limited understanding of how graph convolution affects clustering performance and how to properly use it to optimize performance for different graphs. Existing methods essentially use graph convolution of a fixed and low order that only takes into account neighbours within a few hops of each node, which underutilizes node relations and ignores the diversity of graphs. In this paper, we propose an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs. We establish the validity of our method by theoretical analysis and extensive experiments on benchmark datasets. Empirical results show that our method compares favourably with state-of-the-art methods.
#2972

CensNet: Convolution with Edge-Node Switching in Graph Neural Networks
Xiaodong Jiang, Pengsheng Ji, Sheng Li
Details | PDF

Relational Learning

In this paper, we present CensNet, Convolution with Edge-Node Switching graph neural network, for semi-supervised classification and regression in graph-structured data with both node and edge features. CensNet is a general graph embedding framework, which embeds both nodes and edges to a latent feature space. By using the line graph of the original undirected graph, the roles of nodes and edges are switched, and two novel graph convolution operations are proposed for feature propagation. Experimental results on real-world academic citation networks and quantum chemistry graphs show that our approach has achieved or matched the state-of-the-art performance.
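
The edge-node switching idea rests on the line graph, in which every edge of the original undirected graph becomes a node and two such nodes are adjacent when the original edges share an endpoint. The sketch below builds only that structure; `line_graph` is a hypothetical helper, and the CensNet convolution operations themselves are not reproduced here.

```python
def line_graph(edges):
    """Adjacency of the line graph of an undirected graph given as an edge list."""
    # Normalise each undirected edge to a sorted tuple and drop duplicates.
    norm = sorted({tuple(sorted(e)) for e in edges})
    adj = {e: set() for e in norm}
    for i, e in enumerate(norm):
        for f in norm[i + 1:]:
            if set(e) & set(f):  # the two edges share an endpoint
                adj[e].add(f)
                adj[f].add(e)
    return adj


# Star with centre 0 and leaves 1, 2, 3: its line graph is a triangle.
for edge, neighbours in line_graph([(0, 1), (0, 2), (0, 3)]).items():
    print(edge, sorted(neighbours))
```

Edge features can then be propagated over this adjacency in the same way node features are propagated over the original graph.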
#3812

A Vectorized Relational Graph Convolutional Network for Multi-Relational Network Alignment
Rui Ye, Xin Li, Yujie Fang, Hongyu Zang, Mingzhong Wang
Details | PDF

Relational Learning

Alignment of multiple multi-relational networks, such as knowledge graphs, is vital for AI applications. Different from the conventional alignment models, we apply the graph convolutional network (GCN) to achieve more robust network embedding for the alignment task. In comparison with existing GCNs which cannot fully utilize multi-relation information, we propose a vectorized relational graph convolutional network (VR-GCN) to learn the embeddings of both graph entities and relations simultaneously for multi-relational networks. The role discrimination and translation property of knowledge graphs are adopted in the convolutional process. Thereafter, AVR-GCN, the alignment framework based on VR-GCN, is developed for multi-relational network alignment tasks. Anchors are used to supervise the objective function which aims at minimizing the distances between anchors, and to generate new cross-network triplets to build a bridge between different knowledge graphs at the level of triplet to improve the performance of alignment. Experiments on real-world datasets show that the proposed solutions outperform the state-of-the-art methods in terms of network embedding, entity alignment, and relation alignment.
#4998

Anytime Bottom-Up Rule Learning for Knowledge Graph Completion
Christian Meilicke, Melisachew Wudage Chekol, Daniel Ruffinelli, Heiner Stuckenschmidt
Details | PDF

Relational Learning

We propose an anytime bottom-up technique for learning logical rules from large knowledge graphs. We apply the learned rules to predict candidates in the context of knowledge graph completion. Our approach outperforms other rule-based approaches and it is competitive with the current state of the art, which is based on latent representations. Besides, our approach is significantly faster, requires fewer computational resources, and yields an explanation in terms of the rules that propose a candidate.

Friday 16 09:30 - 10:30 ML|C2 - Clustering (2503-2504)

Chair: Furao Shen

#530

Multiple Partitions Aligned Clustering
Zhao Kang, Zipeng Guo, Shudong Huang, Siying Wang, Wenyu Chen, Yuanzhang Su, Zenglin Xu
Details | PDF

Clustering

Multi-view clustering is an important yet challenging task due to the difficulty of integrating the information from multiple representations. Most existing multi-view clustering methods explore the heterogeneous information in the space where the data points lie. Such common practice may cause significant information loss because of unavoidable noise or inconsistency among views. Since different views admit the same cluster structure, the natural space should be all partitions. Orthogonal to existing techniques, in this paper, we propose to leverage the multi-view information by fusing partitions. Specifically, we align each partition to form a consensus cluster indicator matrix through a distinct rotation matrix. Moreover, a weight is assigned for each view to account for the clustering capacity differences of views. Finally, the basic partitions, weights, and consensus clustering are jointly learned in a unified framework. We demonstrate the effectiveness of our approach on several real datasets, where significant improvement is found over other state-of-the-art multi-view clustering methods.
#2535

Deep Adversarial Multi-view Clustering Network
Zhaoyang Li, Qianqian Wang, Zhiqiang Tao, Quanxue Gao, Zhaohua Yang
Details | PDF

Clustering

Multi-view clustering has attracted increasing attention in recent years by exploiting common clustering structure across multiple views. Most existing multi-view clustering algorithms use shallow and linear embedding functions to learn the common structure of multi-view data. However, these methods cannot fully utilize the non-linear property of multi-view data, which is important to reveal complex cluster structure underlying multi-view data. In this paper, we propose a novel multi-view clustering method, named Deep Adversarial Multi-view Clustering (DAMC) network, to learn the intrinsic structure embedded in multi-view data. Specifically, our model adopts deep auto-encoders to learn latent representations shared by multiple views, and meanwhile leverages adversarial training to further capture the data distribution and disentangle the latent space. Experimental results on several real-world datasets demonstrate that the proposed method outperforms the state-of-the-art methods.
#4798

Multi-View Multiple Clustering
Shixin Yao, Guoxian Yu, Jun Wang, Carlotta Domeniconi, Xiangliang Zhang
Details | PDF

Clustering

Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhancing the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse clusterings of high-quality. We further extend multiple co-clustering on multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with superior performance to the state-of-the-art methods.
#6035

Balanced Clustering: A Uniform Model and Fast Algorithm
Weibo Lin, Zhu He, Mingyu Xiao
Details | PDF

Clustering

Clustering is a fundamental research topic in data mining and machine learning. In addition, many specific applications demand that the clusters obtained be balanced. In this paper, we present a balanced clustering model that is to minimize the sum of squared distances to cluster centers, with uniform regularization functions to control the balance degree of the clustering results. To solve the model, we adopt the idea of the k-means method. We show that the k-means assignment step has an equivalent minimum cost flow formulation when the regularization functions are all convex. By using a novel and simple acceleration technique for the k-means and network simplex methods our model can be solved quite efficiently. Experimental results over benchmarks validate the advantage of our algorithm compared to the state-of-the-art balanced clustering algorithms. On most datasets, our algorithm runs more than 100 times faster than previous algorithms with a better solution.

Friday 16 09:30 - 10:30 MLA|BS - Big data ; Scalability (2505-2506)

Chair: Mingkui Tan

#915

Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent
Shuheng Shen, Linli Xu, Jingchang Liu, Xianfeng Liang, Yifei Cheng
Details | PDF

Big data ; Scalability

With the increase in the amount of data and the expansion of model scale, distributed parallel training becomes an important and successful technique to address the optimization challenges. Nevertheless, although distributed stochastic gradient descent (SGD) algorithms can achieve a linear iteration speedup, they are limited significantly in practice by the communication cost, making it difficult to achieve a linear time speedup. In this paper, we propose a computation and communication decoupled stochastic gradient descent (CoCoD-SGD) algorithm to run computation and communication in parallel to reduce the communication cost. We prove that CoCoD-SGD has a linear iteration speedup with respect to the total computation capability of the hardware resources. In addition, it has a lower communication complexity and better time speedup compared with traditional distributed SGD algorithms. Experiments on deep neural network training demonstrate the significant improvements of CoCoD-SGD: when training ResNet18 and VGG16 with 16 GeForce GTX 1080Ti GPUs, CoCoD-SGD is up to 2-3x faster than traditional synchronous SGD.
#4729

Combining ADMM and the Augmented Lagrangian Method for Efficiently Handling Many Constraints
Joachim Giesen, Soeren Laue
Details | PDF

Big data ; Scalability

Many machine learning methods entail minimizing a loss-function that is the sum of the losses for each data point. The form of the loss function is exploited algorithmically, for instance in stochastic gradient descent (SGD) and in the alternating direction method of multipliers (ADMM). However, there are also machine learning methods where the entailed optimization problem features the data points not in the objective function but in the form of constraints, typically one constraint per data point. Here, we address the problem of solving convex optimization problems with many convex constraints. Our approach is an extension of ADMM. The straightforward implementation of ADMM for solving constrained optimization problems in a distributed fashion solves constrained subproblems on different compute nodes that are aggregated until a consensus solution is reached. Hence, the straightforward approach has three nested loops: one for reaching consensus, one for the constraints, and one for the unconstrained problems. Here, we show that solving the costly constrained subproblems can be avoided. In our approach, we combine the ability of ADMM to solve convex optimization problems in a distributed setting with the ability of the augmented Lagrangian method to solve constrained optimization problems. Consequently, our algorithm only needs two nested loops. We prove that it inherits the convergence guarantees of both ADMM and the augmented Lagrangian method. Experimental results corroborate our theoretical findings.
#2367

Asynchronous Stochastic Frank-Wolfe Algorithms for Non-Convex Optimization
Bin Gu, Wenhan Xian, Heng Huang
Details | PDF

Big data ; Scalability

Asynchronous parallel stochastic optimization for non-convex problems is becoming increasingly important in machine learning, especially due to the popularity of deep learning. The Frank-Wolfe (a.k.a. conditional gradient) algorithm has regained much interest because of its projection-free property and its ability to handle structured constraints. However, our understanding of asynchronous stochastic Frank-Wolfe algorithms is extremely limited, especially in the non-convex setting. To address this challenging problem, in this paper, we propose our asynchronous stochastic Frank-Wolfe algorithm (AsySFW) and its variance reduction version (AsySVFW) for solving constrained non-convex optimization problems. More importantly, we prove the fast convergence rates of AsySFW and AsySVFW in the non-convex setting. To the best of our knowledge, AsySFW and AsySVFW are the first asynchronous parallel stochastic algorithms with convergence guarantees for solving constrained non-convex optimization problems. The experimental results on real high-dimensional gray-scale images not only confirm the fast convergence of our algorithms, but also show a near-linear speedup on a parallel system with shared memory due to the lock-free implementation.
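To illustrate the projection-free property referred to in this abstract, here is a minimal sketch of the classic deterministic, single-threaded Frank-Wolfe iteration over the probability simplex. It is not AsySFW or AsySVFW; the function name, the quadratic objective, and the vector `b` are assumptions made for illustration.

```python
def frank_wolfe_simplex(grad, dim, iters=200):
    """Minimize a smooth convex function over the probability simplex (sketch)."""
    x = [1.0 / dim] * dim  # start at the centre of the simplex, which is feasible
    for t in range(iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, <g, s> is minimized
        # at the vertex whose coordinate has the smallest gradient entry.
        i = min(range(dim), key=lambda k: g[k])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        # A convex combination of feasible points stays feasible,
        # so no projection step is needed.
        x = [(1.0 - gamma) * xk for xk in x]
        x[i] += gamma
    return x


# Hypothetical objective: f(x) = 0.5 * ||x - b||^2 with b inside the simplex.
b = [0.7, 0.2, 0.1]
x_fw = frank_wolfe_simplex(lambda x: [xk - bk for xk, bk in zip(x, b)], dim=3)
```

Each iterate is a convex combination of simplex vertices, which is why the constraint is satisfied throughout without ever computing a projection.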
#6530

Scalable Bayesian Non-linear Matrix Completion
Xiangju Qin, Paul Blomstedt, Samuel Kaski
Details | PDF

Big data ; Scalability

Matrix completion aims to predict missing elements in a partially observed data matrix which in typical applications, such as collaborative filtering, is large and extremely sparsely observed. A standard solution is matrix factorization, which predicts unobserved entries as linear combinations of latent variables. We generalize to non-linear combinations in massive-scale matrices. Bayesian approaches have been proven beneficial in linear matrix completion, but not applied in the more general non-linear case, due to limited scalability. We introduce a Bayesian non-linear matrix completion algorithm, which is based on a recent Bayesian formulation of Gaussian process latent variable models. To solve the challenges regarding scalability and computation, we propose a data-parallel distributed computational approach with a restricted communication scheme. We evaluate our method on challenging out-of-matrix prediction tasks using both simulated and real-world data.
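The linear baseline described in this abstract, matrix factorization, can be sketched as follows. This is plain stochastic gradient descent on squared error, not the Bayesian non-linear method of the paper; the function names, the toy matrix, and the hyperparameters are assumptions for illustration.

```python
import random


def predict(P, Q, i, j):
    """Predicted entry (i, j): inner product of row and column latent vectors."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def mse(P, Q, observed):
    """Mean squared error over the observed entries only."""
    return sum((v - predict(P, Q, i, j)) ** 2 for (i, j), v in observed.items()) / len(observed)


def factorize(observed, n_rows, n_cols, rank=2, lr=0.02, epochs=500, seed=0):
    """Fit latent factors to the observed entries by SGD (illustrative sketch)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            err = value - predict(P, Q, i, j)
            for k in range(rank):
                p, q = P[i][k], Q[j][k]  # read both before updating either
                P[i][k] += lr * err * q
                Q[j][k] += lr * err * p
    return P, Q


# Hypothetical 3x3 matrix with six observed entries; the other three are missing.
observed = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 2): 0.25, (2, 1): 3.0, (2, 2): 0.75}
P0, Q0 = factorize(observed, 3, 3, epochs=0)  # initial factors, no training
P, Q = factorize(observed, 3, 3)              # trained factors, same seed
missing_estimate = predict(P, Q, 0, 2)        # prediction for an unobserved entry
```

Only observed entries contribute to the updates; a missing entry is then predicted from the learned factors of its row and column.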

Friday 16 09:30 - 10:30 NLP|SATM - Sentiment Analysis and Text Mining 1 (2401-2402)

Chair: Junhui Li

#1031

Quantum-Inspired Interactive Networks for Conversational Sentiment Analysis
Yazhou Zhang, Qiuchi Li, Dawei Song, Peng Zhang, Panpan Wang
Details | PDF

Sentiment Analysis and Text Mining 1

Conversational sentiment analysis is an emerging, yet challenging Artificial Intelligence (AI) subtask. It aims to discover the affective state of each participant in a conversation. There exists a wealth of interaction information that affects the sentiment of speakers. However, the existing sentiment analysis approaches are insufficient in dealing with this task due to ignoring the interactions and dependency relationships between utterances. In this paper, we aim to address this issue by modeling intra-utterance and inter-utterance interaction dynamics. We propose an approach called quantum-inspired interactive networks (QIN), which leverages the mathematical formalism of quantum theory (QT) and the long short-term memory (LSTM) network, to learn such interaction dynamics. Specifically, a density matrix based convolutional neural network (DM-CNN) is proposed to capture the interactions within each utterance (i.e., the correlations between words), and a strong-weak influence model inspired by quantum measurement theory is developed to learn the interactions between adjacent utterances (i.e., how one speaker influences another). Extensive experiments are conducted on the MELD and IEMOCAP datasets. The experimental results demonstrate the effectiveness of the QIN model.
#3485

RTHN: A RNN-Transformer Hierarchical Network for Emotion Cause Extraction
Rui Xia, Mengran Zhang, Zixiang Ding
Details | PDF

Sentiment Analysis and Text Mining 1

The emotion cause extraction (ECE) task aims at discovering the potential causes behind a certain emotion expression in a document. Techniques including rule-based methods, traditional machine learning methods and deep neural networks have been proposed to solve this task. However, most of the previous work considered ECE as a set of independent clause classification problems and ignored the relations between multiple clauses in a document. In this work, we propose a joint emotion cause extraction framework, named RNN-Transformer Hierarchical Network (RTHN), to encode and classify multiple clauses synchronously. RTHN is composed of a lower word-level encoder based on RNNs to encode multiple words in each clause, and an upper clause-level encoder based on Transformer to learn the correlation between multiple clauses in a document. We furthermore propose ways to encode the relative position and global predication information into Transformer that can capture the causality between clauses and make RTHN more efficient. We finally achieve the best performance among 12 compared systems and improve the F1 score of the state-of-the-art from 72.69% to 76.77%.
#4165

Cold-Start Aware Deep Memory Network for Multi-Entity Aspect-Based Sentiment Analysis
Kaisong Song, Wei Gao, Lujun Zhao, Jun Lin, Changlong Sun, Xiaozhong Liu
Details | PDF

Sentiment Analysis and Text Mining 1

Various types of target information have been considered in aspect-based sentiment analysis, such as entities and aspects. Existing research has realized the importance of targets and developed methods with the goal of precisely modeling their contexts via generating target-specific representations. However, all these methods ignore that these representations cannot be learned well due to the lack of sufficient human-annotated target-related reviews, which leads to the data sparsity challenge, a.k.a. the cold-start problem here. In this paper, we focus on a more general multiple entity aspect-based sentiment analysis (ME-ABSA) task which aims at identifying the sentiment polarity of different aspects of multiple entities in their context. Faced with a severe cold-start scenario, we develop a novel and extensible deep memory network framework with cold-start aware computational layers which use a frequency-guided attention mechanism to accentuate the most related targets, and then compose their representations into a complementary vector for enhancing the representations of cold-start entities and aspects. To verify the effectiveness of the framework, we instantiate it with a concrete context encoding method and then apply the model to the ME-ABSA task. Experimental results on two public datasets demonstrate that the proposed approach outperforms state-of-the-art baselines on the ME-ABSA task.
#10959

(Sister Conferences Best Papers Track) Addressing Age-Related Bias in Sentiment Analysis
Mark Díaz, Isaac Johnson, Amanda Lazar, Anne Marie Piper, Darren Gergle
Details | PDF

Sentiment Analysis and Text Mining 1

Recent studies have identified various forms of bias in language-based models, raising concerns about the risk of propagating social biases against certain groups based on sociodemographic factors (e.g., gender, race, geography). In this study, we analyze the treatment of age-related terms across 15 sentiment analysis models and 10 widely-used GloVe word embeddings and attempt to alleviate bias through a method of processing model training data. Our results show significant age bias is encoded in the outputs of many sentiment analysis algorithms and word embeddings, and we can alleviate this bias by manipulating training data.

Friday 16 09:45 - 10:30 ML|MMM - Multi-instance;Multi-label;Multi-view learning 2 (2403-2404)

Chair: Dejing Dou

#1447

Label Distribution Learning with Label Correlations via Low-Rank Approximation
Tingting Ren, Xiuyi Jia, Weiwei Li, Shu Zhao
Details | PDF

Multi-instance;Multi-label;Multi-view learning 2

Label distribution learning (LDL) can be viewed as the generalization of multi-label learning. This novel paradigm focuses on the relative importance of different labels to a particular instance. Most previous LDL methods either ignore the correlation among labels, or only exploit the label correlations in a global way. In this paper, we utilize both the global and local relevance among labels to provide more information for model training and propose a novel label distribution learning algorithm. In particular, a label correlation matrix based on low-rank approximation is applied to capture the global label correlations. In addition, the label correlation among local samples is adopted to modify the label correlation matrix. The experimental results on real-world data sets show that the proposed algorithm outperforms state-of-the-art LDL methods.
#3198

Multi-view Spectral Clustering Network
Zhenyu Huang, Joey Tianyi Zhou, Xi Peng, Changqing Zhang, Hongyuan Zhu, Jiancheng Lv
Details | PDF

Multi-instance;Multi-label;Multi-view learning 2

Multi-view clustering aims to cluster data from diverse sources or domains, which has drawn considerable attention in recent years. In this paper, we propose a novel multi-view clustering method named multi-view spectral clustering network (MvSCN) which could be the first deep version of multi-view spectral clustering to the best of our knowledge. To deeply cluster multi-view data, MvSCN incorporates the local invariance within every single view and the consistency across different views into a novel objective function, where the local invariance is defined by a deep metric learning network rather than the Euclidean distance adopted by traditional approaches. In addition, we enforce and reformulate an orthogonal constraint as a novel layer stacked on an embedding network for two advantages, i.e. jointly optimizing the neural network and performing matrix decomposition and avoiding trivial solutions. Extensive experiments on four challenging datasets demonstrate the effectiveness of our method compared with 10 state-of-the-art approaches in terms of three evaluation metrics.
#3320

Flexible Multi-View Representation Learning for Subspace Clustering
Ruihuang Li, Changqing Zhang, Qinghua Hu, Pengfei Zhu, Zheng Wang
Details | PDF

Multi-instance;Multi-label;Multi-view learning 2

In recent years, numerous multi-view subspace clustering methods have been proposed to exploit the complementary information from multiple views. Most of them perform data reconstruction within each single view, which makes the subspace representation unpromising and thus unable to identify the underlying relationships among data well. In this paper, we propose to conduct subspace clustering based on Flexible Multi-view Representation (FMR) learning, which avoids using partial information for data reconstruction. The latent representation is flexibly constructed by enforcing it to be close to different views, which implicitly makes it more comprehensive and well-adapted to subspace clustering. With the introduction of a kernel dependence measure, the latent representation can flexibly encode complementary information from different views and explore nonlinear, high-order correlations among these views. We employ the Alternating Direction Minimization (ADM) method to solve our problem. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.

Friday 16 11:00 - 11:30 Industry Days (D-I)

Chair: Yu Zheng

Xiaowei: A voice assistant at Wechat AI
Yik-Cheung (Wilson) Tam, Principal Research Scientist Wechat AI, Tencent

Industry Days

Friday 16 11:00 - 12:00 Early Career 6 - Early Career Spotlight 6 (2403-2404)

Chair: Gudong Long

#11067

OCR in the Wild: Recent Developments, Challenges and Future Trends
Xiang Bai

Early Career Spotlight 6

Reading text in the wild, consisting of two main steps: scene text detection and scene text recognition, is a general OCR technology that attracts wide attention from academia and industry. Recently, remarkable progress has been achieved in scene text reading due to the successes of deep neural networks. In this talk, I will give a thorough overview of the state-of-the-art deep learning methods for scene text reading, and summarize its urgent challenges in real applications. Last, the future trends of this area will be predicted.

Friday 16 11:00 - 12:15 AMS|ML - Multi-agent Learning 2 (2703-2704)

Chair: Timothy Norman

#2159

Large-Scale Home Energy Management Using Entropy-Based Collective Multiagent Deep Reinforcement Learning Framework
Yaodong Yang, Jianye Hao, Yan Zheng, Chao Yu
Details | PDF

Multi-agent Learning 2

Smart grids are contributing to the demand-side management by integrating electronic equipment, distributed energy generation and storage and advanced meters and controllers. With the increasing adoption of electric vehicles and distributed energy generation and storage systems, residential energy management is drawing more and more attention, which is regarded as being critical to demand-supply balancing and peak load reduction. In this paper, we focus on a microgrid scenario in which modern homes interact together under a large-scale setting to better optimize their electricity cost. We first make households form a group with an economic stimulus. Then we formulate the energy expense optimization problem of the household community as a multi-agent coordination problem and present an Entropy-Based Collective Multiagent Deep Reinforcement Learning (EB-C-MADRL) framework to address it. Experiments with various real-world data demonstrate that EB-C-MADRL can reduce both the long-term group power consumption cost and daily peak demand effectively compared with existing approaches.
#2352

Towards Efficient Detection and Optimal Response against Sophisticated Opponents
Tianpei Yang, Jianye Hao, Zhaopeng Meng, Chongjie Zhang, Yan Zheng, Ze Zheng
Details | PDF

Multi-agent Learning 2

Multiagent algorithms often aim to accurately predict the behaviors of other agents and find a best response accordingly. Previous works usually assume an opponent uses a stationary strategy or randomly switches among several stationary ones. However, an opponent may exhibit more sophisticated behaviors by adopting more advanced reasoning strategies, e.g., using a Bayesian reasoning strategy. This paper proposes a novel approach called Bayes-ToMoP which can efficiently detect the strategy of opponents using either stationary or higher-level reasoning strategies. Bayes-ToMoP also supports the detection of previously unseen policies and learning a best-response policy accordingly. We provide a theoretical guarantee of the optimality on detecting the opponent's strategies. We also propose a deep version of Bayes-ToMoP by extending Bayes-ToMoP with DRL techniques. Experimental results show both Bayes-ToMoP and deep Bayes-ToMoP outperform the state-of-the-art approaches when faced with different types of opponents in two-agent competitive games.
#2501

Anytime Heuristic for Weighted Matching Through Altruism-Inspired Behavior
Panayiotis Danassis, Aris Filos-Ratsikas, Boi Faltings
Details | PDF

Multi-agent Learning 2

We present a novel anytime heuristic (ALMA), inspired by the human principle of altruism, for solving the assignment problem. ALMA is decentralized, completely uncoupled, and requires no communication between the participants. We prove an upper bound on the convergence speed that is polynomial in the desired number of resources and competing agents per resource; crucially, in the realistic case where the aforementioned quantities are bounded independently of the total number of agents/resources, the convergence time remains constant as the total problem size increases. We have evaluated ALMA under three test cases: (i) an anti-coordination scenario where agents with similar preferences compete over the same set of actions, (ii) a resource allocation scenario in an urban environment, under a constant-time constraint, and finally, (iii) an on-line matching scenario using real passenger-taxi data. In all of the cases, ALMA was able to reach high social welfare, while being orders of magnitude faster than the centralized, optimal algorithm. The latter allows our algorithm to scale to realistic scenarios with hundreds of thousands of agents, e.g., vehicle coordination in urban environments.
#2782

A Regularized Opponent Model with Maximum Entropy Objective
Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, Jun Wang
Details | PDF

Multi-agent Learning 2

In a single-agent setting, reinforcement learning (RL) tasks can be cast into an inference problem by introducing a binary random variable o, which stands for the "optimality". In this paper, we redefine the binary random variable o in multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound of the likelihood of achieving the optimality and name it as Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show how it can improve the performance of training agents theoretically and empirically in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method ROMMEO-Q with proof of convergence. We extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on the challenging iterated matrix game and differential game respectively and show that they can outperform strong MARL baselines.
#3162

Explicitly Coordinated Policy Iteration
Yujing Hu, Yingfeng Chen, Changjie Fan, Jianye Hao
Details | PDF

Multi-agent Learning 2

Coordination on an optimal policy between independent learners in fully cooperative stochastic games is difficult due to problems such as relative overgeneralization and miscoordination. Most state-of-the-art algorithms apply fusion heuristics on agents' optimistic and average rewards, by which coordination between agents can be achieved implicitly. However, such implicit coordination faces practical issues such as tedious parameter-tuning in real world applications. The lack of an explicit coordination mechanism may also lead to a low likelihood of coordination in problems with multiple optimal policies. Based on the necessary conditions of an optimal policy, we propose the explicitly coordinated policy iteration (EXCEL) algorithm which always forces agents to coordinate by comparing the agents' separated optimistic and average value functions. We also propose three solutions for deep reinforcement learning extensions of EXCEL. Extensive experiments in matrix games (from 2-agent 2-action games to 5-agent 20-action games) and stochastic games (from 2-agent games to 5-agent games) show that EXCEL has better performance than the state-of-the-art algorithms (such as faster convergence and better coordination).

Friday 16 11:00 - 12:15 NLP|NLP - Natural Language Processing 3 (2605-2606)

Chair: Parisa Kordjamshidi

#1001

Getting in Shape: Word Embedding SubSpaces
Tianyuan Zhou, João Sedoc, Jordan Rodu
Details | PDF

Natural Language Processing 3

Many tasks in natural language processing require the alignment of word embeddings. Embedding alignment relies on the geometric properties of the manifold of word vectors. This paper focuses on supervised linear alignment and studies how the shape of the target embedding relates to alignment performance. We assess the performance of aligned word vectors on semantic similarity tasks and find that the isotropy of the target embedding is critical to the alignment. Furthermore, aligning with an isotropic noise can deliver satisfactory results. We provide a theoretical framework and guarantees which aid in the understanding of empirical results.
#1052

Adversarial Transfer for Named Entity Boundary Detection with Pointer Networks
Jing Li, Deheng Ye, Shuo Shang
Details | PDF

Natural Language Processing 3

In this paper, we focus on named entity boundary detection, which aims to detect the start and end boundaries of an entity mention in text, without predicting its type. A more accurate and robust detection approach is desired to alleviate error propagation in downstream applications, such as entity linking and fine-grained typing systems. Here, we first develop a novel entity boundary labeling approach with pointer networks, where the output dictionary size depends on the input, which is variable. Furthermore, we propose AT-Bdry, which incorporates adversarial transfer learning into an end-to-end sequence labeling model to encourage domain-invariant representations. More importantly, AT-Bdry can reduce domain difference in data distributions between the source and target domains, via an unsupervised transfer learning approach (i.e., no annotated target-domain data is necessary). We conduct Formal Text to Formal Text, Formal Text to Informal Text and ablation evaluations on five benchmark datasets. Experimental results show that AT-Bdry achieves state-of-the-art transferring performance against recent baselines.
#3384

Randomized Greedy Search for Structured Prediction: Amortized Inference and Learning
Chao Ma, F A Rezaur Rahman Chowdhury, Aryan Deshwal, Md Rakibul Islam, Janardhan Rao Doppa, Dan Roth
Details | PDF

Natural Language Processing 3

In a structured prediction problem, we need to learn a predictor that can produce a structured output given a structured input (e.g., part-of-speech tagging). The key learning and inference challenge is due to the exponential size of the structured output space. This paper makes four contributions towards the goal of a computationally-efficient inference and training approach for structured prediction that allows employing complex models and optimizing for non-decomposable loss functions. First, we define a simple class of randomized greedy search (RGS) based inference procedures that leverage classification algorithms for simple outputs. Second, we develop an RGS-specific learning approach for amortized inference that can quickly produce high-quality outputs for a given set of structured inputs. Third, we plug our amortized RGS inference solver inside the inner loop of parameter-learning algorithms (e.g., structured SVM) to improve the speed of training. Fourth, we perform extensive experiments on diverse structured prediction tasks. Results show that our proposed approach is competitive with or better than many state-of-the-art approaches in spite of its simplicity.
#4228

Refining Word Representations by Manifold Learning
Chu Yonghe, Hongfei Lin, Liang Yang, Yufeng Diao, Shaowu Zhang, Fan Xiaochao
Details | PDF

Natural Language Processing 3

Pre-trained distributed word representations have been proven useful in various natural language processing (NLP) tasks. However, the effect of words’ geometric structure on word representations has not been carefully studied yet. The existing word representation methods underestimate the words whose distances are close in the Euclidean space, while overestimating words with a much greater distance. In this paper, we propose a word vector refinement model to correct the pre-trained word embedding, which brings the similarity of words in Euclidean space closer to word semantics by using manifold learning. This approach is theoretically founded in the metric recovery paradigm. Our word representations have been evaluated on a variety of lexical-level intrinsic tasks (semantic relatedness, semantic similarity) and the experimental results show that the proposed model outperforms several popular word representation approaches.
#5974

Recurrent Neural Network for Text Classification with Hierarchical Multiscale Dense Connections
Yi Zhao, Yanyan Shen, Junjie Yao
Details | PDF

Natural Language Processing 3

Text classification is a fundamental task in many Natural Language Processing applications. While recurrent neural networks have achieved great success in performing text classification, they fail to capture the hierarchical structure and long-term semantic dependencies which are common features of text data. Inspired by the advent of the dense connection pattern in advanced convolutional neural networks, we propose a simple yet effective recurrent architecture, named Hierarchical Multiscale Densely Connected RNNs (HM-DenseRNNs), which: 1) enables direct access to the hidden states of all preceding recurrent units via dense connections, and 2) organizes multiple densely connected recurrent units into a hierarchical multi-scale structure, where the layers are updated at different scales. HM-DenseRNNs can effectively capture long-term dependencies among words in long text data, and a dense recurrent block is further introduced to reduce the number of parameters and enhance training efficiency. We evaluate the performance of our proposed architecture on three text datasets and the results verify the advantages of HM-DenseRNNs over the baseline methods in terms of classification accuracy.

Friday 16 11:00 - 12:30 Panel (K)

Chair: Qiang Yang

AI and User Privacy

Panel

Friday 16 11:00 - 12:30 ML|TAML - Transfer, Adaptation, Multi-task Learning 5 (J)

Chair: Zhouchen Lin

#1917

Deep Multi-Task Learning with Adversarial-and-Cooperative Nets
Pei Yang, Qi Tan, Jieping Ye, Hanghang Tong, Jingrui He
Details | PDF

Transfer, Adaptation, Multi-task Learning 5

In this paper, we propose a deep multi-Task learning model based on Adversarial-and-COoperative nets (TACO). The goal is to use an adversarial-and-cooperative strategy to decouple the task-common and task-specific knowledge, facilitating the fine-grained knowledge sharing among tasks. TACO accommodates multiple game players, i.e., feature extractors, domain discriminator, and tri-classifiers. They play the MinMax games adversarially and cooperatively to distill the task-common and task-specific features, while respecting their discriminative structures. Moreover, it adopts a divide-and-combine strategy to leverage the decoupled multi-view information to further improve the generalization performance of the model. The experimental results show that our proposed method significantly outperforms the state-of-the-art algorithms on the benchmark datasets in both multi-task learning and semi-supervised domain adaptation scenarios.
#2177

Learning Disentangled Semantic Representation for Domain Adaptation
Ruichu Cai, Zijian Li, Pengfei Wei, Jie Qiao, Kun Zhang, Zhifeng Hao
Details | PDF

Transfer, Adaptation, Multi-task Learning 5

Domain adaptation is an important but challenging task. Most of the existing domain adaptation methods struggle to extract the domain-invariant representation on the feature space with entangling domain information and semantic information. Different from previous efforts on the entangled feature space, we aim to extract the domain invariant semantic information in the latent disentangled semantic representation (DSR) of the data. In DSR, we assume the data generation process is controlled by two independent sets of variables, i.e., the semantic latent variables and the domain latent variables. Under the above assumption, we employ a variational auto-encoder to reconstruct the semantic latent variables and domain latent variables behind the data. We further devise a dual adversarial network to disentangle these two sets of reconstructed latent variables. The disentangled semantic latent variables are finally adapted across the domains. Experimental studies testify that our model yields state-of-the-art performance on several domain adaptation benchmark datasets.
#2426

Cooperative Pruning in Cross-Domain Deep Neural Network Compression
Shangyu Chen, Wenya Wang, Sinno Jialin Pan
Details | PDF

Transfer, Adaptation, Multi-task Learning 5

The advancement of deep models poses great challenges to real-world deployment because of the limited computational ability and storage space on edge devices. To solve this problem, existing works have made progress in compressing deep models by pruning or quantization. However, most existing methods rely on a large amount of training data and a pre-trained model in the same domain. When only limited in-domain training data is available, these methods fail to perform well. This prompts the idea of transferring knowledge from a resource-rich source domain to a target domain with limited data to perform model compression. In this paper, we propose a method to perform cross-domain pruning by cooperatively training in both domains: taking advantage of data and a pre-trained model from the source domain to assist pruning in the target domain. Specifically, source and target pruned models are trained simultaneously and interactively, with source information transferred through the construction of a cooperative pruning mask. Our method significantly improves pruning quality in the target domain, and sheds light on model compression in the cross-domain setting.
#2901

Learning to Interpret Satellite Images using Wikipedia
Burak Uzkent, Evan Sheehan, Chenlin Meng, Zhongyi Tang, Marshall Burke, David Lobell, Stefano Ermon
Details | PDF

Transfer, Adaptation, Multi-task Learning 5

Despite recent progress in computer vision, fine-grained interpretation of satellite images remains challenging because of a lack of labeled training data. To overcome this limitation, we construct a novel dataset called WikiSatNet by pairing geo-referenced Wikipedia articles with satellite imagery of their corresponding locations. We then propose two strategies to learn representations of satellite images by predicting properties of the corresponding articles from the images. Leveraging this new multi-modal dataset, we can drastically reduce the quantity of human-annotated labels and time required for downstream tasks. On the recently released fMoW dataset, our pre-training strategies can boost the performance of a model pre-trained on ImageNet by up to 4.5% in F1 score.
#4313

Weakly Supervised Multi-task Learning for Semantic Parsing
Bo Shao, Yeyun Gong, Junwei Bao, Jianshu Ji, Guihong Cao, Xiaola Lin, Nan Duan
Details | PDF

Transfer, Adaptation, Multi-task Learning 5

Semantic parsing is a challenging and important task which aims to convert a natural language sentence to a logical form. Existing neural semantic parsing methods mainly use <question, logical form> (Q-L) pairs to train a sequence-to-sequence model. However, the amount of existing Q-L labeled data is limited and hard to obtain. We propose an effective method which substantially utilizes labeling information from other tasks to enhance the training of a semantic parser. We design a multi-task learning model to train question type classification and entity mention detection together with question semantic parsing using a shared encoder. We propose a weakly supervised learning method to enhance our multi-task learning model with paraphrase data, based on the idea that the paraphrased questions should have the same logical form and question type information. Finally, we integrate the weakly supervised multi-task learning method into an encoder-decoder framework. Experiments on a newly constructed dataset and ComplexWebQuestions show that our proposed method outperforms state-of-the-art methods, which demonstrates the effectiveness and robustness of our method.
#10972

(Sister Conferences Best Papers Track) Taskonomy: Disentangling Task Transfer Learning
Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese
Details | PDF

Transfer, Adaptation, Multi-task Learning 5

Do visual tasks have relationships, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying the existence of a certain structure among visual tasks. Knowing this structure has notable value; it provides a principled way for identifying relationships across tasks, for instance, in order to reuse supervision among tasks with redundancies or solve many tasks in one system without piling up the complexity. We propose a fully computational approach for modeling the transfer learning structure of the space of visual tasks. This is done via finding transfer learning dependencies across tasks in a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks. The product is a computational taxonomic map among tasks for transfer learning, and we exploit it to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and visualizing this taxonomical structure at http://taskonomy.vision.

Friday 16 11:00 - 12:30 ML|DL - Deep Learning 8 (L)

Chair: Dejing Dou

#825

Group-based Learning of Disentangled Representations with Generalizability for Novel Contents
Haruo Hosoya
Details | PDF

Deep Learning 8

Sensory data are often composed of independent content and transformation factors. For example, face images may have shapes as content and poses as transformation. To infer these factors separately from given data, various "disentangling" models have been proposed. However, many of these are supervised or semi-supervised, either requiring attribute labels that are often unavailable or disallowing generalization over new contents. In this study, we introduce a novel deep generative model, called group-based variational autoencoders. Here, we assume no explicit labels, but a weaker form of structure that groups together data instances having the same content but transformed differently; we thereby separately estimate a group-common factor as content and an instance-specific factor as transformation. This approach allows for learning to represent a general continuous space of contents, which can accommodate unseen contents. Despite its simplicity, our model succeeded in learning, from five datasets, content representations that are highly separate from the transformation representation and generalizable to data with novel contents. We further provide detailed analysis of the latent content code and show insight into how our model obtains the notable transformation invariance and content generalizability.
#1999

Crafting Efficient Neural Graph of Large Entropy
Minjing Dong, Hanting Chen, Yunhe Wang, Chang Xu
Details | PDF

Deep Learning 8

Network pruning is widely applied to deep CNN models due to their heavy computation costs and achieves high performance by keeping important weights while removing the redundancy. Pruning redundant weights directly may hurt global information flow, which suggests that an efficient sparse network should take graph properties into account. Thus, instead of paying more attention to preserving important weights, we focus on the pruned architecture itself. We propose to use graph entropy as the measurement, which shows useful properties to craft high-quality neural graphs and enables us to propose an efficient algorithm to construct them as the initial network architecture. Our algorithm can be easily implemented and deployed to different popular CNN models and achieves better trade-offs.
#2818

A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification
Shaohuai Shi, Kaiyong Zhao, Qiang Wang, Zhenheng Tang, Xiaowen Chu
Details | PDF

Deep Learning 8

Gradient sparsification is a promising technique to significantly reduce the communication overhead in decentralized synchronous stochastic gradient descent (S-SGD) algorithms. Yet, many existing gradient sparsification schemes (e.g., Top-k sparsification) have a communication complexity of O(kP), where k is the number of gradients selected by each worker and P is the number of workers. Recently, the gTop-k sparsification scheme has been proposed to reduce the communication complexity from O(kP) to O(k log P), which significantly boosts the system scalability. However, it remains unclear whether the gTop-k sparsification scheme can converge in theory. In this paper, we first provide theoretical proofs on the convergence of the gTop-k scheme for non-convex objective functions under certain analytic assumptions. We then derive the convergence rate of gTop-k S-SGD, which is of the same order as the vanilla mini-batch SGD. Finally, we conduct extensive experiments on different machine learning models and data sets to verify the soundness of the assumptions and theoretical results, and discuss the impact of the compression ratio on the convergence performance.
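
As a rough illustration of the Top-k sparsification mentioned in this abstract, the sketch below keeps the k largest-magnitude entries of each worker's gradient and sums the results. It is not the gTop-k scheme analyzed in the paper; the function names `top_k_sparsify` and `naive_aggregate` are hypothetical, and the plain summation only shows why up to k times P non-zero entries may need to be communicated.

```python
def top_k_sparsify(gradient, k):
    """Keep the k largest-magnitude entries of a gradient and zero the rest."""
    if k <= 0:
        return [0.0] * len(gradient)
    # Indices ordered by absolute value, largest first.
    order = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(gradient)]


def naive_aggregate(worker_gradients, k):
    """Sum the Top-k sparsified gradients of all workers (illustrative only)."""
    total = [0.0] * len(worker_gradients[0])
    for grad in worker_gradients:
        for i, g in enumerate(top_k_sparsify(grad, k)):
            total[i] += g
    return total
```

With P workers each selecting k entries at different positions, the aggregate can contain up to k times P non-zero values, which is the source of the O(kP) cost that gTop-k is reported to reduce.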
#6190

Parallel Wasserstein Generative Adversarial Nets with Multiple Discriminators
Yuxin Su, Shenglin Zhao, Xixian Chen, Irwin King, Michael Lyu
Details | PDF

Deep Learning 8

Wasserstein Generative Adversarial Nets (GANs) are newly proposed GAN algorithms and widely used in computer vision, web mining, information retrieval, etc. However, the existing algorithms with approximated Wasserstein loss converge slowly due to heavy computation cost and usually generate unstable results as well. In this paper, we solve the computation cost problem by speeding up the Wasserstein GANs with a well-designed communication-efficient parallel architecture. Specifically, we develop a new problem formulation targeting the accurate evaluation of Wasserstein distance and propose an easily parallelizable optimization algorithm to train the Wasserstein GANs. Compared to traditional parallel architecture, our proposed framework is designed explicitly for the skew parameter updates between the generator network and discriminator network. Rigorous experiments reveal that our proposed framework achieves a significant improvement regarding convergence speed with comparable stability on generating images, compared to the state-of-the-art Wasserstein GAN algorithms.
#412

Network-Specific Variational Auto-Encoder for Embedding in Attribute Networks
Di Jin, Bingyi Li, Pengfei Jiao, Dongxiao He, Weixiong Zhang
Details | PDF

Deep Learning 8

Network embedding (NE) maps a network into a low-dimensional space while preserving intrinsic features of the network. Variational Auto-Encoder (VAE) has been actively studied for NE. These VAE-based methods typically utilize both network topologies and node semantics and treat these two types of data in the same way. However, the information of network topology and information of node semantics are orthogonal and are often from different sources; the former quantifies coupling relationships among nodes, whereas the latter represents node-specific properties. Ignoring this difference affects NE. To address this issue, we develop a network-specific VAE for NE, named NetVAE. In the encoding phase of our new approach, compression of network structures and compression of node attributes share the same encoder in order to perform co-training to achieve transfer learning and information integration. In the decoding phase, a dual decoder is introduced to reconstruct network topologies and node attributes separately. Specifically, as a part of the dual decoder, we develop a novel method based on a Gaussian mixture model and the block model to reconstruct network structures. Extensive experiments on large real-world networks demonstrate a superior performance of the new approach over the state-of-the-art methods.
#1102

Incremental Few-Shot Learning for Pedestrian Attribute Recognition
Liuyu Xiang, Xiaoming Jin, Guiguang Ding, Jungong Han, Leida Li
Details | PDF

Deep Learning 8

Pedestrian attribute recognition has received increasing attention due to its important role in video surveillance applications. However, most existing methods are designed for a fixed set of attributes. They are unable to handle the incremental few-shot learning scenario, i.e. adapting a well-trained model to newly added attributes with scarce data, which commonly exists in the real world. In this work, we present a meta learning based method to address this issue. The core of our framework is a meta architecture capable of disentangling multiple attribute information and generalizing rapidly to newly arriving attributes. By conducting extensive experiments on the benchmark datasets PETA and RAP under the incremental few-shot setting, we show that our method is able to perform the task with competitive performance and low resource requirements.

Friday 16 11:00 - 12:30 ML|RL - Reinforcement Learning 7 (2701-2702)

Chair: Paul Weng

#2292

A Strongly Asymptotically Optimal Agent in General Environments
Michael K. Cohen, Elliot Catt, Marcus Hutter
Details | PDF

Reinforcement Learning 7

Reinforcement Learning agents are expected to eventually perform well. Typically, this takes the form of a guarantee about the asymptotic behavior of an algorithm given some assumptions about the environment. We present an algorithm for a policy whose value approaches the optimal value with probability 1 in all computable probabilistic environments, provided the agent has a bounded horizon. This is known as strong asymptotic optimality, and it was previously unknown whether it was possible for a policy to be strongly asymptotically optimal in the class of all computable probabilistic environments. Our agent, Inquisitive Reinforcement Learner (Inq), is more likely to explore the more it expects an exploratory action to reduce its uncertainty about which environment it is in, hence the term inquisitive. Exploring inquisitively is a strategy that can be applied generally; for more manageable environment classes, inquisitiveness is tractable. We conducted experiments in "grid-worlds" to compare the Inquisitive Reinforcement Learner to other weakly asymptotically optimal agents.
#2343

Structure Learning for Safe Policy Improvement
Thiago D. Simão, Matthijs T. J. Spaan
Details | PDF

Reinforcement Learning 7

We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables.
#5277

An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments
Elaheh Barati, Xuewen Chen
Details | PDF

Reinforcement Learning 7

In reinforcement learning algorithms, leveraging multiple views of the environment can improve the learning of complicated policies. In multi-view environments, because the views may frequently suffer from partial observability, their levels of importance often differ. In this paper, we propose a deep reinforcement learning method and an attention mechanism in a multi-view environment. Each view can provide various representative information about the environment. Through our attention mechanism, our method generates a single feature representation of the environment given its multiple views. It learns a policy to dynamically attend to each view based on its importance in the decision-making process. Through experiments, we show that our method outperforms its state-of-the-art baselines on the TORCS racing car simulator and three other complex 3D environments with obstacles. We also provide experimental results to evaluate the performance of our method under noisy conditions and partial observation settings.
#5522

Advantage Amplification in Slowly Evolving Latent-State Environments
Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier
Details | PDF

Reinforcement Learning 7

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle called advantage amplification that can overcome these hurdles through the use of temporal abstraction. We propose several aggregation methods and prove they induce amplification in certain settings. We also bound the loss in optimality incurred by our methods in environments where latent state evolves slowly and demonstrate their performance empirically in a stylized user-modeling task.
#5872

SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets
Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Tushar Chandra, Craig Boutilier
Details | PDF

Reinforcement Learning 7

Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items (which may have interacting effects on user choice), methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
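
One way to picture the decomposition described in this abstract is to weight each item-wise LTV by the probability that the user picks that item from the slate. The sketch below assumes exactly that form, with a simple score-proportional choice model and a no-click option; the abstract does not spell out the formula, so the choice model and the names `choice_probabilities` and `slate_value` are assumptions made for illustration.

```python
def choice_probabilities(scores, no_click_score=1.0):
    """Probability of choosing each slate item, proportional to its score.

    Assumed choice model: the remaining probability mass goes to 'no click'.
    """
    denom = no_click_score + sum(scores)
    return [s / denom for s in scores]


def slate_value(item_scores, item_ltvs, no_click_score=1.0):
    """Slate LTV as the choice-probability-weighted sum of item-wise LTVs."""
    probs = choice_probabilities(item_scores, no_click_score)
    return sum(p * q for p, q in zip(probs, item_ltvs))
```

Under this assumed form, evaluating a slate needs only one LTV per item rather than one value per combination of items.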
#6308

MineRL: A Large-Scale Dataset of Minecraft Demonstrations
William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, Ruslan Salakhutdinov
Details | PDF

Reinforcement Learning 7

The sample inefficiency of standard deep reinforcement learning methods precludes their application to many real-world problems. Methods which leverage human demonstrations require fewer samples but have been researched less. As demonstrated in the computer vision and natural language processing communities, large-scale datasets have the capacity to facilitate research by serving as an experimental and benchmarking platform for new methods. However, existing datasets compatible with reinforcement learning simulators do not have sufficient scale, structure, and quality to enable the further development and evaluation of methods focused on using human examples. Therefore, we introduce a comprehensive, large-scale, simulator-paired dataset of human demonstrations: MineRL. The dataset consists of over 60 million automatically annotated state-action pairs across a variety of related tasks in Minecraft, a dynamic, 3D, open-world environment. We present a novel data collection scheme which allows for the ongoing introduction of new tasks and the gathering of complete state information suitable for a variety of methods. We demonstrate the hierarchality, diversity, and scale of the MineRL dataset. Further, we show the difficulty of the Minecraft domain along with the potential of MineRL in developing techniques to solve key research challenges within it.

Friday 16 11:00 - 12:30 ML|LPR - Learning Preferences or Rankings 2 (2705-2706)

Chair: Bin Gu

#1194

CFM: Convolutional Factorization Machines for Context-Aware Recommendation
Xin Xin, Bo Chen, Xiangnan He, Dong Wang, Yue Ding, Joemon Jose
Details | PDF

Learning Preferences or Rankings 2

Factorization Machine (FM) is an effective solution for context-aware recommender systems (CARS) which models second-order feature interactions by inner product. However, it is insufficient to capture high-order and nonlinear interaction signals. While several recent efforts have enhanced FM with neural networks, they assume the embedding dimensions are independent from each other and model high-order interactions in a rather implicit manner. In this paper, we propose Convolutional Factorization Machine (CFM) to address the above limitations. Specifically, CFM models second-order interactions with outer product, resulting in "images" which capture correlations between embedding dimensions. Then all generated "images" are stacked, forming an interaction cube. 3D convolution is applied on top of it to learn high-order interaction signals in an explicit manner. Besides, we also leverage a self-attention mechanism to perform the pooling of features to reduce time complexity. We conduct extensive experiments on three real-world datasets, demonstrating significant improvement of CFM over competing methods for context-aware top-k recommendation.
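
The outer-product step in this abstract can be sketched in a few lines: each pair of embedding vectors yields a two-dimensional "image", and stacking the images gives the interaction cube. This is only a minimal sketch assuming that every pair of embeddings is combined; the self-attention pooling and the 3D convolution are omitted, and the function names are hypothetical.

```python
from itertools import combinations


def outer_product(u, v):
    """Return the outer product of two vectors as a list of rows."""
    return [[a * b for b in v] for a in u]


def interaction_cube(embeddings):
    """Stack the outer products of all embedding pairs into a 3D structure."""
    return [outer_product(u, v) for u, v in combinations(embeddings, 2)]
```

For n embeddings of dimension d, the cube has n(n-1)/2 layers, each of size d by d; entry (i, j) of a layer is the product of dimension i of one embedding and dimension j of the other, which is what lets it capture correlations between embedding dimensions.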
#1374

VAEGAN: A Collaborative Filtering Framework based on Adversarial Variational Autoencoders
Xianwen Yu, Xiaoning Zhang, Yang Cao, Min Xia
Details | PDF

Learning Preferences or Rankings 2

Recently, Variational Autoencoders (VAEs) have been successfully applied to collaborative filtering for implicit feedback. However, the performance of the resulting model depends a lot on the expressiveness of the inference model, and the latent representation is often too constrained to be expressive enough to capture the true posterior distribution. In this paper, a novel framework named VAEGAN is proposed to address the above issue. In VAEGAN, we first introduce Adversarial Variational Bayes (AVB) to train Variational Autoencoders with an arbitrarily expressive inference model. By utilizing Generative Adversarial Networks (GANs) for implicit variational inference, the inference model provides a better approximation to the posterior and maximum-likelihood assignment. Then the performance of our model is further improved by introducing an auxiliary discriminative network using adversarial training to achieve high accuracy in recommendation. Furthermore, contractive loss is added to the classical reconstruction cost function as a penalty term to yield robust features and improve the generalization performance. Finally, we show that our proposed VAEGAN significantly outperforms state-of-the-art baselines on several real-world datasets.
#2743

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation
Yang Gao, Christian M. Meyer, Mohsen Mesgar, Iryna Gurevych
Details | PDF

Learning Preferences or Rankings 2

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
#3077

Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
Chang Li, Maarten de Rijke
Details | PDF

Learning Preferences or Rankings 2

Non-stationarity appears in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences change abruptly at an unknown moment in time. We consider the problem of identifying the K most attractive items and propose cascading non-stationary bandits, an online learning variant of the cascading model, where a user browses a ranked list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving this non-stationary problem: CascadeDUCB and CascadeSWUCB. We analyze their performance and derive gap-dependent upper bounds on the n-step regret of these algorithms. We also establish a lower bound on the regret for cascading non-stationary bandits and show that both algorithms match the lower bound up to a logarithmic factor. Finally, we evaluate their performance on a real-world web search click dataset.
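
The browsing behavior described in this abstract (scan from top to bottom, click the first attractive item) can be sketched as follows. The code covers only the click model and the feedback it reveals, not CascadeDUCB or CascadeSWUCB; treating attraction as an already-realized set of items is a simplifying assumption, and the function names are hypothetical.

```python
def cascade_click(ranked_items, attractive):
    """Return the position of the first attractive item, or None if no click."""
    for position, item in enumerate(ranked_items):
        if item in attractive:
            return position
    return None


def cascade_feedback(ranked_items, attractive):
    """Map each examined item to True (clicked) or False (skipped).

    Items ranked below the click are never examined, so they are absent.
    """
    click = cascade_click(ranked_items, attractive)
    observed = {}
    for position, item in enumerate(ranked_items):
        if click is not None and position > click:
            break
        observed[item] = (position == click)
    return observed
```

The missing entries for items below the click are what make the feedback partial: a learner sees nothing about those items in that round.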
#3547

Action Space Learning for Heterogeneous User Behavior Prediction
Dongha Lee, Chanyoung Park, Hyunjun Ju, Junyoung Hwang, Hwanjo Yu
Details | PDF

Learning Preferences or Rankings 2

Users' behaviors observed in many web-based applications are usually heterogeneous, so modeling their behaviors considering the interplay among multiple types of actions is important. However, recent collaborative filtering (CF) methods based on a metric learning approach cannot learn multiple types of user actions, because they are developed for only a single type of user actions. This paper proposes a novel metric learning method, called METAS, to jointly model heterogeneous user behaviors. Specifically, it learns two distinct spaces: 1) action space which captures the relations among all observed and unobserved actions, and 2) entity space which captures high-level similarities among users and among items. Each action vector in the action space is computed using a non-linear function and its corresponding entity vectors in the entity space. In addition, METAS adopts an efficient triplet mining algorithm to effectively speed up the convergence of metric learning. Experimental results show that METAS outperforms the state-of-the-art methods in predicting users' heterogeneous actions, and its entity space represents the user-user and item-item similarities more clearly than the space trained by the other methods.
#4873

Incremental Elicitation of Rank-Dependent Aggregation Functions based on Bayesian Linear Regression
Nadjet Bourdache, Patrice Perny, Olivier Spanjaard
Details | PDF

Learning Preferences or Rankings 2

We introduce a new model-based incremental choice procedure for multicriteria decision support, that interleaves the analysis of the set of alternatives and the elicitation of weighting coefficients that specify the role of criteria in rank-dependent models such as ordered weighted averages (OWA) and Choquet integrals. Starting from a prior distribution on the set of weighting parameters, we propose an adaptive elicitation approach based on the minimization of the expected regret to iteratively generate preference queries. The answers of the Decision Maker are used to revise the current distribution until a solution can be recommended with sufficient confidence. We present numerical tests showing the interest of the proposed approach.
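
For readers unfamiliar with ordered weighted averages, the sketch below shows the basic idea: the weights attach to ranks of the criterion values rather than to specific criteria. Sorting in decreasing order is one common convention and is an assumption here; the paper may use a different ordering, and the elicitation procedure itself is not shown.

```python
def owa(weights, values):
    """Ordered weighted average: weight i applies to the i-th largest value."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    ranked = sorted(values, reverse=True)
    return sum(w * x for w, x in zip(weights, ranked))
```

With all weight on the first rank the function returns the maximum, with all weight on the last rank it returns the minimum, and with equal weights it returns the arithmetic mean.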

Friday 16 11:00 - 12:30 AMS|AGT - Algorithmic Game Theory 2 (2601-2602)

Chair: Rohit Vaish

#555

Schelling Games on Graphs
Edith Elkind, Jiarui Gan, Ayumi Igarashi, Warut Suksompong, Alexandros A. Voudouris
Details | PDF

Algorithmic Game Theory 2

We consider strategic games that are inspired by Schelling's model of residential segregation. In our model, the agents are partitioned into k types and need to select locations on an undirected graph. Agents can be either stubborn, in which case they will always choose their preferred location, or strategic, in which case they aim to maximize the fraction of agents of their own type in their neighborhood. We investigate the existence of equilibria in these games, study the complexity of finding an equilibrium outcome or an outcome with high social welfare, and also provide upper and lower bounds on the price of anarchy and stability. Some of our results extend to the setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.
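
The utility of a strategic agent in this model, the fraction of same-type agents among its neighbors, can be sketched as below. The data layout (an adjacency dictionary, a map from occupied nodes to agents, and a map from agents to types) and the choice of utility 0 for an agent with no occupied neighboring node are assumptions made for illustration.

```python
def utility(agent, location, graph, occupant, agent_type):
    """Fraction of neighbouring agents that share the given agent's type.

    graph: node -> list of adjacent nodes
    occupant: node -> agent placed there (unoccupied nodes are absent)
    agent_type: agent -> type label
    """
    neighbours = [
        occupant[node]
        for node in graph[location]
        if node in occupant and occupant[node] != agent
    ]
    if not neighbours:
        return 0.0  # assumed convention for an isolated agent
    same = sum(1 for other in neighbours if agent_type[other] == agent_type[agent])
    return same / len(neighbours)
```

An outcome is an equilibrium for the strategic agents when none of them can raise this value by moving to an available location.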
#615

Approximately Maximizing the Broker's Profit in a Two-sided Market
Jing Chen, Bo Li, Yingkai Li
Details | PDF

Algorithmic Game Theory 2

We study how to maximize the broker's (expected) profit in a two-sided market, where she buys items from a set of sellers and resells them to a set of buyers. Each seller has a single item to sell and holds a private value on her item, and each buyer has a valuation function over the bundles of the sellers' items. We consider the Bayesian setting where the agents' values/valuations are independently drawn from prior distributions, and aim at designing dominant-strategy incentive-compatible (DSIC) mechanisms that are approximately optimal. Production-cost markets, where each item has a publicly-known cost to be produced, provide a platform for us to study two-sided markets. Briefly, we show how to convert a mechanism for production-cost markets into a mechanism for the broker, whenever the former satisfies cost-monotonicity. This reduction holds even when buyers have general combinatorial valuation functions. When the buyers' valuations are additive, we generalize an existing mechanism to production-cost markets in an approximation-preserving way. We then show that the resulting mechanism is cost-monotone and thus can be converted into an 8-approximation mechanism for two-sided markets.
#867

Diffusion and Auction on Graphs
Bin Li, Dong Hao, Dengji Zhao, Makoto Yokoo
Details | PDF

Algorithmic Game Theory 2

Auction is the common paradigm for resource allocation, which is a fundamental problem in human society. Existing research indicates that the two primary objectives, the seller's revenue and the allocation efficiency, are generally conflicting in auction design. For the first time, we expand the domain of the classic auction to a social graph and formally identify a new class of auction mechanisms on graphs. All mechanisms in this class are incentive-compatible and also encourage all buyers to diffuse the auction information to others, whereby both the seller's revenue and the allocation efficiency are significantly improved compared with the Vickrey auction. It is found that the recently proposed information diffusion mechanism is an extreme case with the lowest revenue in this new class. Our work could potentially inspire a new perspective for efficient and optimal auction design and could be applied to the prevalent online social and economic networks.
#4916

Computational Aspects of Equilibria in Discrete Preference Games
Phani Raj Lolakapuri, Umang Bhaskar, Ramasuri Narayanam, Gyana R Parija, Pankaj S Dayama
Details | PDF

Algorithmic Game Theory 2

We study the complexity of equilibrium computation in discrete preference games. These games were introduced by Chierichetti, Kleinberg, and Oren (EC '13, JCSS '18) to model decision-making by agents in a social network that choose a strategy from a finite, discrete set, balancing between their intrinsic preferences for the strategies and their desire to choose a strategy that is 'similar' to their neighbours. There are thus two components: a social network with the agents as vertices, and a metric space of strategies. These games are potential games, and hence pure Nash equilibria exist. Since their introduction, a number of papers have studied various aspects of this model, including the social cost at equilibria, and arrival at a consensus. We show that in general, equilibrium computation in discrete preference games is PLS-complete, even in the simple case where each agent has a constant number of neighbours. If the edges in the social network are weighted, then the problem is PLS-complete even if each agent has a constant number of neighbours, the metric space has constant size, and every pair of strategies is at distance 1 or 2. Further, if the social network is directed, modelling asymmetric influence, an equilibrium may not even exist. On the positive side, we show that if the metric space is a tree metric, or is the product of path metrics, then the equilibrium can be computed in polynomial time.
#5419

Improving Nash Social Welfare Approximations
Jugal Garg, Peter McGlaughlin
Details | PDF

Algorithmic Game Theory 2

We consider the problem of fairly allocating a set of indivisible goods among n agents. Various fairness notions have been proposed within the rapidly growing field of fair division, but the Nash social welfare (NSW) serves as a focal point. In part, this follows from the 'unreasonable' fairness guarantees provided, in the sense that a max NSW allocation meets multiple other fairness metrics simultaneously, all while satisfying a standard economic concept of efficiency, Pareto optimality. However, existing approximation algorithms fail to satisfy all of the remarkable fairness guarantees offered by a max NSW allocation, instead targeting only the specific NSW objective. We address this issue by presenting a 2 max NSW, Prop-1, 1/(2n) MMS, and Pareto optimal allocation in strongly polynomial time. Our techniques are based on a market interpretation of a fractional max NSW allocation. We present novel definitions of fairness concepts in terms of market prices, and design a new scheme to round a market equilibrium into an integral allocation that provides most of the fairness properties of an integral max NSW allocation.
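
The Nash social welfare objective referred to in this abstract is commonly defined as the geometric mean of the agents' utilities. The sketch below computes it under the assumption of additive valuations, which the abstract does not state explicitly; the function name and data layout are hypothetical, and the paper's market-based rounding scheme is not shown.

```python
import math


def nash_social_welfare(valuations, allocation):
    """Geometric mean of agent utilities under additive valuations.

    valuations[a][g]: value of good g to agent a
    allocation[a]: list of goods assigned to agent a
    """
    utilities = [
        sum(valuations[agent][good] for good in bundle)
        for agent, bundle in enumerate(allocation)
    ]
    return math.prod(utilities) ** (1.0 / len(utilities))
```

Because the objective is a product, any agent left with zero utility drives the whole value to zero, which is one intuition for why maximizing it tends to favor balanced allocations.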
#5610

An Asymptotically Optimal VCG Redistribution Mechanism for the Public Project Problem
Mingyu Guo
Details | PDF

Algorithmic Game Theory 2

We study the classic public project problem, where a group of agents need to decide whether or not to build a non-excludable public project. We focus on efficient, strategy-proof, and weakly budget-balanced mechanisms (VCG redistribution mechanisms). Our aim is to maximize the worst-case efficiency ratio, that is, the worst-case ratio between the achieved total utility and the first-best maximum total utility. Previous studies have identified the optimal mechanism for 3 agents. It was also conjectured that the worst-case efficiency ratio approaches 1 asymptotically as the number of agents approaches infinity. Unfortunately, no optimal mechanisms have been identified for cases with more than 3 agents. We propose an asymptotically optimal mechanism, which achieves a worst-case efficiency ratio of 1, under a minor technical assumption: we assume the agents' valuations are rational numbers with bounded denominators. We also show that if the agents' valuations are drawn from identical and independent distributions, our mechanism's efficiency ratio equals 1 with probability approaching 1 asymptotically. Our results significantly improve on previous results. The best previously known asymptotic worst-case efficiency ratio is 0.102. For non-asymptotic cases, our mechanisms also achieve better ratios than all previous results.

Friday 16 11:00 - 12:30 CSAT|SATEA - SAT: Evaluation and Analysis (2603-2604)

Chair: Roberto Sebastiani

#2620

Enumerating Potential Maximal Cliques via SAT and ASP
Tuukka Korhonen, Jeremias Berg, Matti Järvisalo
Details | PDF

SAT: Evaluation and Analysis

The Bouchitté-Todinca algorithm (BT), operating dynamic programming over the so-called potential maximal cliques (PMCs), yields a practically efficient approach to treewidth and generalized hypertreewidth. The enumeration of PMCs is a scalability bottleneck for BT in practice. We propose the use of declarative solvers for PMC enumeration as a substitute for the specialized PMC enumeration algorithms employed in current BT implementations. The presented Boolean satisfiability (SAT) and answer set programming (ASP) based PMC enumeration approaches open up new possibilities for improving the efficiency of BT in practice.
#6075

GANAK: A Scalable Probabilistic Exact Model Counter
Shubham Sharma, Subhajit Roy, Mate Soos, Kuldeep S. Meel
Details | PDF

SAT: Evaluation and Analysis

Given a Boolean formula F, the problem of model counting, also referred to as #SAT, seeks to compute the number of solutions of F. Model counting is a fundamental problem with a wide variety of applications ranging from planning and quantified information flow to probabilistic reasoning and the like. The modern #SAT solvers tend to be either based on static decomposition, dynamic decomposition, or a hybrid of the two. Despite dynamic decomposition-based #SAT solvers sharing much of their architecture with SAT solvers, the core design and heuristics of dynamic decomposition-based #SAT solvers have remained constant for over a decade. In this paper, we revisit the architecture of the state-of-the-art dynamic decomposition-based #SAT tool, sharpSAT, and demonstrate that introducing a new notion of probabilistic component caching and using universal hashing for exact model counting, along with the development of several new heuristics, can lead to significant performance improvement over state-of-the-art model counters. In particular, we develop GANAK, a new scalable probabilistic exact model counter that outperforms state-of-the-art exact and approximate model counters sharpSAT and ApproxMC3 respectively, both in terms of PAR-2 score and the number of instances solved. Furthermore, in our experiments, the model count returned by GANAK was equal to the exact model count for all the benchmarks. Finally, we observe that recently proposed preprocessing techniques for model counting benefit exact model counters while hurting the performance of approximate model counters.
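
To make the #SAT problem statement concrete, the sketch below counts the satisfying assignments of a CNF formula by exhaustive enumeration. It is a definition-level illustration only, exponential in the number of variables, and unrelated to the component caching and hashing used by GANAK or sharpSAT. The clause encoding (signed integers, as in the DIMACS convention) and the function name are choices made here.

```python
from itertools import product


def count_models(clauses, num_vars):
    """Count assignments over num_vars variables that satisfy every clause.

    Each clause is a list of non-zero integers: literal v means variable v
    is true, and -v means variable v is false (variables are 1-indexed).
    """
    count = 0
    for assignment in product([False, True], repeat=num_vars):
        satisfied = all(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        if satisfied:
            count += 1
    return count
```

For example, the single clause (x1 OR x2) over two variables excludes only the all-false assignment, so three of the four assignments are models.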
#10976

(Sister Conferences Best Papers Track) Sharpness of the Satisfiability Threshold for Non-Uniform Random k-SAT
Tobias Friedrich, Ralf Rothenberger
Details | PDF

SAT: Evaluation and Analysis

We study a more general model to generate random instances of Propositional Satisfiability (SAT) with n Boolean variables, m clauses, and exactly k variables per clause. Additionally, our model is given an arbitrary probability distribution (p_1, ..., p_n) on the variable occurrences. Therefore, we call it non-uniform random k-SAT. The number m of randomly drawn clauses at which random formulas go from asymptotically almost surely (a.a.s.) satisfiable to a.a.s. unsatisfiable is called the satisfiability threshold. Such a threshold is called sharp if it approaches a step function as n increases. We identify conditions on the variable probability distribution (p_1, ..., p_n) under which the satisfiability threshold is sharp if its position is already known asymptotically. This result generalizes Friedgut’s sharpness result from uniform to non-uniform random k-SAT and implies sharpness for thresholds of a wide range of random k-SAT models with heterogeneous probability distributions, for example such models where the variable probabilities follow a power-law.
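The generative model described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact construction: the function name `non_uniform_k_sat` is hypothetical, repeated variables within a clause are handled by rejection, and each literal's sign is chosen uniformly at random.

```python
import random


def non_uniform_k_sat(n, m, k, probs, seed=None):
    """Draw m clauses over variables 1..n, each with exactly k distinct variables.

    probs: non-negative weights (p_1, ..., p_n) for the variable occurrences.
    At least k variables must have positive weight, otherwise the
    rejection loop below never terminates.
    """
    rng = random.Random(seed)
    variables = list(range(1, n + 1))
    clauses = []
    while len(clauses) < m:
        chosen = rng.choices(variables, weights=probs, k=k)
        if len(set(chosen)) < k:
            continue  # assumption: reject draws that repeat a variable
        # Assumption: each literal is negated with probability 1/2.
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses


# Power-law-like weights: variable i gets weight proportional to 1 / i.
weights = [1.0 / i for i in range(1, 11)]
instance = non_uniform_k_sat(n=10, m=20, k=3, probs=weights, seed=0)
```

With all weights equal, the same routine produces ordinary uniform random k-SAT instances.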
#2809

Resolution and Domination: An Improved Exact MaxSAT Algorithm
Chao Xu, Wenjun Li, Yongjie Yang, Jianer Chen, Jianxin Wang
Details | PDF

SAT: Evaluation and Analysis

We study the Maximum Satisfiability problem (MaxSAT). Particularly, we derive a branching algorithm of running time O*(1.2989^m) for the MaxSAT problem, where m denotes the number of clauses in the given CNF formula. Our algorithm considerably improves the previous best result O*(1.3248^m) by Chen and Kanj [2004] published 15 years ago. For our purpose, we derive improved branching strategies for variables of degrees 3, 4, and 5. The worst case of our branching algorithm is at variables of degree 4 which occur twice both positively and negatively in the given CNF formula. To serve the branching rules and shrink the size of the CNF formula, we also propose a variety of reduction rules which can be exhaustively applied in polynomial time and, moreover, some of them solve a bottleneck of the previous best algorithm.
#5462

Athanor: High-Level Local Search Over Abstract Constraint Specifications in Essence
Saad Attieh, Nguyen Dang, Christopher Jefferson, Ian Miguel, Peter Nightingale
Details | PDF

SAT: Evaluation and Analysis

This paper presents Athanor, a novel local search solver that operates on abstract constraint specifications of combinatorial problems in the Essence language. It is unique in that it operates directly on the high level, nested types in Essence, such as set of partitions or multiset of sequences, without refining such types into low level representations. This approach has two main advantages. First, the structure present in the high level types allows high quality neighbourhoods for local search to be automatically derived. Second, it allows Athanor to scale much better than solvers that operate on the equivalent, but much larger, low-level representations. The paper details how Athanor operates, covering incremental evaluation, dynamic unrolling of quantified expressions and neighbourhood construction. A series of case studies show the performance of Athanor, benchmarked against several local search solvers on a range of problem classes.
#2465

Optimizing Constraint Solving via Dynamic Programming
Shu Lin, Na Meng, Wenxin Li
Details | PDF

SAT: Evaluation and Analysis

Constraint optimization problems (COP) on finite domains are typically solved via search. Many problems (e.g., 0-1 knapsack) involve redundant search, making a general constraint solver revisit the same subproblems again and again. Existing approaches use caching, symmetry breaking, subproblem dominance, or search with decomposition to prune the search space of constraint problems. In this paper we present a different approach--DPSolver--which uses dynamic programming (DP) to efficiently solve certain types of constraint optimization problems (COPs). Given a COP modeled with MiniZinc, DPSolver first analyzes the model to decide whether the problem is efficiently solvable with DP. If so, DPSolver refactors the constraints and objective functions to model the problem as a DP problem. Finally, DPSolver feeds the refactored model to Gecode--a widely used constraint solver--for the optimal solution. Our evaluation shows that DPSolver significantly improves the performance of constraint solving.
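The redundant-search issue mentioned above for 0-1 knapsack can be illustrated with a plain memoized recursion. This is a generic dynamic programming sketch, not DPSolver and not a MiniZinc model; the function names are chosen for illustration.

```python
from functools import lru_cache


def knapsack(values, weights, capacity):
    """Maximum total value of items that fit in the given capacity (0-1 knapsack)."""

    @lru_cache(maxsize=None)
    def best(i, remaining):
        # Subproblem: best value using items i.. with `remaining` capacity left.
        if i == len(values):
            return 0
        skip = best(i + 1, remaining)
        if weights[i] > remaining:
            return skip
        take = values[i] + best(i + 1, remaining - weights[i])
        return max(skip, take)

    return best(0, capacity)


optimum = knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
```

The cache ensures each (item index, remaining capacity) pair is solved once, which is exactly the revisiting that a naive search would repeat.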

Friday 16 11:00 - 12:30 ML|FSLSM - Feature Selection ; Learning Sparse Models (2501-2502)

Chair: Alberto Castellini

#700

HMLasso: Lasso with High Missing Rate
Masaaki Takada, Hironori Fujisawa, Takeichiro Nishikawa
Details | PDF

Feature Selection ; Learning Sparse Models

Sparse regression such as the Lasso has achieved great success in handling high-dimensional data. However, one of the biggest practical problems is that high-dimensional data often contain large amounts of missing values. Convex Conditioned Lasso (CoCoLasso) has been proposed for dealing with high-dimensional data with missing values, but it performs poorly when there are many missing values, so that the high missing rate problem has not been resolved. In this paper, we propose a novel Lasso-type regression method for high-dimensional data with high missing rates. We effectively incorporate mean imputed covariance, overcoming its inherent estimation bias. The result is an optimally weighted modification of CoCoLasso according to missing ratios. We theoretically and experimentally show that our proposed method is highly effective even when there are many missing values.
#1387

Scalable Block-Diagonal Locality-Constrained Projective Dictionary Learning
Zhao Zhang, Weiming Jiang, Zheng Zhang, Sheng Li, Guangcan Liu, Jie Qin
Details | PDF

Feature Selection ; Learning Sparse Models

We propose a novel structured discriminative block-diagonal dictionary learning method, referred to as scalable Locality-Constrained Projective Dictionary Learning (LC-PDL), for efficient representation and classification. To improve the scalability by saving both training and testing time, our LC-PDL aims at learning a structured discriminative dictionary and a block-diagonal representation without using costly l0/l1-norm. Besides, it avoids the extra time-consuming sparse reconstruction process with the well-trained dictionary for new samples that many existing models require. More importantly, LC-PDL avoids using the complementary data matrix to learn the sub-dictionary over each class. To enhance the performance, we incorporate a locality constraint of atoms into the DL procedures to keep local information and obtain the codes of samples over each class separately. A block-diagonal discriminative approximation term is also derived to learn a discriminative projection to bridge data with their codes by extracting the special block-diagonal features from data, which can ensure that the approximate coefficients are clearly associated with their label information. Then, a robust multiclass classifier is trained over extracted block-diagonal codes for accurate label predictions. Experimental results verify the effectiveness of our algorithm.
#4463

InteractionNN: A Neural Network for Learning Hidden Features in Sparse Prediction
Xiaowang Zhang, Qiang Gao, Zhiyong Feng
Details | PDF

Feature Selection ; Learning Sparse Models

In this paper, we present a neural network (InteractionNN) for sparse predictive analysis where hidden features of sparse data can be learned by multilevel feature interaction. To characterize multilevel interaction of features, InteractionNN consists of three modules, namely, nonlinear interaction pooling, layer-lossing, and embedding. Nonlinear interaction pooling (NI pooling) is a hierarchical structure and, by shortcut connection, constructs low-level feature interactions from basic dense features to elementary features. Layer-lossing is a feed-forward neural network where high-level feature interactions can be learned from low-level feature interactions via correlation of all layers with target. Moreover, embedding extracts basic dense features from sparse features of data, which helps reduce the computational complexity of our proposed model. Finally, we evaluate on two benchmark datasets, and the experimental results show that InteractionNN performs better than most state-of-the-art models in sparse regression.
#5540

Robust Flexible Feature Selection via Exclusive L21 Regularization
Di Ming, Chris Ding
Details | PDF

Feature Selection ; Learning Sparse Models

Recently, exclusive lasso has demonstrated its promising results in selecting discriminative features for each class. The sparsity is enforced on each feature across all the classes via L12-norm. However, the exclusive sparsity of L12-norm could not screen out a large amount of irrelevant and redundant noise features in high-dimensional data space, since each feature belongs to at least one class. Thus, in this paper, we introduce a novel regularization called "exclusive L21", which is short for "L21 with exclusive lasso", towards robust flexible feature selection. The exclusive L21 regularization is the mix of L21-norm and L12-norm, which brings out joint sparsity at inter-group level and exclusive sparsity at intra-group level simultaneously. An efficient augmented Lagrange multipliers based optimization algorithm is proposed to iteratively solve the exclusive L21 regularization in a row-wise fashion. Extensive experiments on twelve benchmark datasets demonstrate the effectiveness of the proposed regularization and the optimization algorithm as compared to state-of-the-arts.
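The two norms named above can be written out on a small matrix. This is a sketch under an assumed convention, not the paper's exact definition: rows of W are taken to be features and columns classes, the L21-norm is the sum of row-wise Euclidean norms, and the L12 (exclusive lasso) term is the sum of squared row-wise L1 norms. The function names are hypothetical.

```python
import math


def l21_norm(W):
    """Sum over rows of the Euclidean norm of each row (assumed convention)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)


def l12_norm_squared(W):
    """Sum over rows of the squared L1 norm of each row (assumed convention)."""
    return sum(sum(abs(x) for x in row) ** 2 for row in W)


# Two features (rows), two classes (columns); the second feature is unused.
W = [[3.0, 4.0],
     [0.0, 0.0]]
joint_sparsity = l21_norm(W)
exclusive_sparsity = l12_norm_squared(W)
```

Under this convention the L21 term is smallest when whole rows vanish, while the L12 term penalizes a feature that is active for several classes at once.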
#5655

Differentially Private Iterative Gradient Hard Thresholding for Sparse Learning
Lingxiao Wang, Quanquan Gu
Details | PDF

Feature Selection ; Learning Sparse Models

We consider the differentially private sparse learning problem, where the goal is to estimate the underlying sparse parameter vector of a statistical model in the high-dimensional regime while preserving the privacy of each training example. We propose a generic differentially private iterative gradient hard thresholding algorithm with a linear convergence rate and strong utility guarantee. We demonstrate the superiority of our algorithm through two specific applications: sparse linear regression and sparse logistic regression. Specifically, for sparse linear regression, our algorithm can achieve the best known utility guarantee without any extra support selection procedure used in previous work [Kifer et al., 2012]. For sparse logistic regression, our algorithm can obtain the utility guarantee with a logarithmic dependence on the problem dimension. Experiments on both synthetic data and real world datasets verify the effectiveness of our proposed algorithm.
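For readers unfamiliar with iterative gradient hard thresholding, a minimal non-private sketch for sparse least squares follows. It deliberately omits the privacy mechanism, which the abstract does not specify, so it is not the proposed algorithm; the names `hard_threshold` and `iht_least_squares` and all parameter values are assumptions for illustration.

```python
def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and set the rest to zero."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:s])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def iht_least_squares(X, y, s, step, iters):
    """Gradient step on 0.5 * ||Xw - y||^2 followed by hard thresholding."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        residual = [sum(X[i][j] * w[j] for j in range(d)) - y[i]
                    for i in range(n)]
        grad = [sum(X[i][j] * residual[i] for i in range(n))
                for j in range(d)]
        w = hard_threshold([w[j] - step * grad[j] for j in range(d)], s)
    return w


# Toy problem: identity design matrix, so y equals the 2-sparse target vector.
X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
y = [0.0, 2.0, 0.0, -1.0]
w_hat = iht_least_squares(X, y, s=2, step=1.0, iters=3)
```

A differentially private variant would perturb the computation before thresholding; how and where that perturbation is applied is left to the paper.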
#1409

Label distribution learning with label-specific features
Tingting Ren, Xiuyi Jia, Weiwei Li, Lei Chen, Zechao Li
Details | PDF

Feature Selection ; Learning Sparse Models

Label distribution learning (LDL) is a novel machine learning paradigm to deal with label ambiguity issues by placing more emphasis on how relevant each label is to a particular instance. Many LDL algorithms have been proposed and most of them concentrate on the learning models, while few of them focus on the feature selection problem. All existing LDL models are built on a simple feature space in which all features are shared by all the class labels. However, this kind of traditional data representation strategy tends to select features that are distinguishable for all labels, but ignores label-specific features that are pertinent and discriminative for each class label. In this paper, we propose a novel LDL algorithm by leveraging label-specific features. The common features for all labels and specific features for each label are simultaneously learned to enhance the LDL model. Moreover, we also exploit the label correlations in the proposed LDL model. The experimental results on several real-world data sets validate the effectiveness of our method.

Friday 16 11:00 - 12:30 ML|C - Classification 8 (2503-2504)

Chair: Qianqian Wang

#1449

Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph
Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Lina Yao, Chengqi Zhang
Details | PDF

Classification 8

A variety of machine learning applications expect to achieve rapid learning from a limited amount of labeled data. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, existing meta-learning methods do not fully explore weakly-supervised information, which is usually free or cheap to collect. In this paper, we show that weakly-labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose prototype propagation network (PPN) trained on few-shot tasks together with data annotated by coarse-label. Given a category graph of the targeted fine-classes and some weakly-labeled coarse-classes, PPN learns an attention mechanism which propagates the prototype of one class to another on the graph, so that the K-nearest neighbor (KNN) classifier defined on the propagated prototypes results in high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. On two benchmarks, PPN significantly outperforms most recent few-shot learning methods in different settings, even when they are also allowed to train on weakly-labeled data.
#1504

Multi-View Multi-Label Learning with View-Specific Information Extraction
Xuan Wu, Qing-Guo Chen, Yao Hu, Dengbao Wang, Xiaodong Chang, Xiaobo Wang, Min-Ling Zhang
Details | PDF

Classification 8

Multi-view multi-label learning serves as an important framework to learn from objects with diverse representations and rich semantics. Existing multi-view multi-label learning techniques focus on exploiting shared subspace for fusing multi-view representations, where helpful view-specific information for discriminative modeling is usually ignored. In this paper, a novel multi-view multi-label learning approach named SIMM is proposed which leverages shared subspace exploitation and view-specific information extraction. For shared subspace exploitation, SIMM jointly minimizes confusion adversarial loss and multi-label loss to utilize shared information from all views. For view-specific information extraction, SIMM enforces an orthogonal constraint w.r.t. the shared subspace to utilize view-specific discriminative information. Extensive experiments on real-world data sets clearly show the favorable performance of SIMM against other state-of-the-art multi-view multi-label learning approaches.
#3620

Closed-Loop Memory GAN for Continual Learning
Amanda Rios, Laurent Itti
Details | PDF

Classification 8

Sequential learning of tasks using gradient descent leads to an unremitting decline in the accuracy of tasks for which training data is no longer available, termed catastrophic forgetting. Generative models have been explored as a means to approximate the distribution of old tasks and bypass storage of real data. Here we propose a cumulative closed-loop memory replay GAN (CloGAN) provided with external regularization by a small memory unit selected for maximum sample diversity. We evaluate incremental class learning using a notoriously hard paradigm, single-headed learning, in which each task is a disjoint subset of classes in the overall dataset, and performance is evaluated on all previous classes. First, we show that when constructing a dynamic memory unit to preserve sample heterogeneity, model performance asymptotically approaches training on the full dataset. We then show that using a stochastic generator to continuously output fresh new images during training increases performance significantly further while also generating quality images. We compare our approach to several baselines including fine-tuning by gradient descent (FGD), Elastic Weight Consolidation (EWC), Deep Generative Replay (DGR) and Memory Replay GAN (MeRGAN). Our method has a very low long-term memory cost (the memory unit) as well as negligible intermediate memory storage.
#3771

Deep Session Interest Network for Click-Through Rate Prediction
Yufei Feng, Fuyu Lv, Weichen Shen, Menghan Wang, Fei Sun, Yu Zhu, Keping Yang
Details | PDF

Classification 8

Click-Through Rate (CTR) prediction plays an important role in many industrial applications, such as online advertising and recommender systems. How to capture users' dynamic and evolving interests from their behavior sequences remains a continuous research topic in CTR prediction. However, most existing studies overlook the intrinsic structure of the sequences: the sequences are composed of sessions, where sessions are user behaviors separated by their occurring time. We observe that user behaviors are highly homogeneous in each session, and heterogeneous across sessions. Based on this observation, we propose a novel CTR model named Deep Session Interest Network (DSIN) that leverages users' multiple historical sessions in their behavior sequences. We first use a self-attention mechanism with bias encoding to extract users' interests in each session. Then we apply Bi-LSTM to model how users' interests evolve and interact among sessions. Finally, we employ the local activation unit to adaptively learn the influences of various session interests on the target item. Experiments are conducted on both advertising and production recommender datasets and DSIN outperforms other state-of-the-art models on both datasets.
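The notion of sessions as behaviors separated by their occurring time can be sketched with a simple time-gap rule. This is only an illustration of the preprocessing idea, not DSIN itself; the function name `split_into_sessions` and the gap threshold of 30 time units are assumptions.

```python
def split_into_sessions(events, max_gap):
    """Group (timestamp, item) pairs into sessions.

    events must be sorted by timestamp. A new session starts whenever the
    gap to the previous event exceeds max_gap.
    """
    sessions = []
    for timestamp, item in events:
        if sessions and timestamp - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((timestamp, item))
        else:
            sessions.append([(timestamp, item)])
    return sessions


# Hypothetical click log: timestamps in minutes, items as short labels.
clicks = [(0, "shoes"), (10, "socks"), (100, "phone"), (105, "case"), (400, "book")]
sessions = split_into_sessions(clicks, max_gap=30)
```

In this toy log the items within each resulting session are related, which mirrors the within-session homogeneity the abstract describes.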
#5164

Inter-node Hellinger Distance based Decision Tree
Pritom Saha Akash, Md. Eusha Kadir, Amin Ahsan Ali, Mohammad Shoyaib
Details | PDF

Classification 8

This paper introduces a new splitting criterion called Inter-node Hellinger Distance (iHD) and a weighted version of it (iHDw) for constructing decision trees. iHD measures the distance between the parent and each of the child nodes in a split using Hellinger distance. We prove that this ensures the mutual exclusiveness between the child nodes. The weight term in iHDw is concerned with the purity of individual child node considering the class imbalance problem. The combination of the distance and weight term in iHDw thus favors a partition where child nodes are purer and mutually exclusive, and skew insensitive. We perform an experiment over twenty balanced and twenty imbalanced datasets. The results show that decision trees based on iHD win against six other state-of-the-art methods on at least 14 balanced and 10 imbalanced datasets. We also observe that adding the weight to iHD improves the performance of decision trees on imbalanced datasets. Moreover, according to the result of the Friedman test, this improvement is statistically significant compared to other methods.
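As background for the criterion above, the standard Hellinger distance between two discrete class distributions can be computed as follows. How iHD and iHDw combine and weight these distances across child nodes is defined in the paper and is not reproduced here; the helper names and the toy labels are assumptions.

```python
import math


def class_distribution(labels, classes):
    """Relative frequency of each class among the labels at a node."""
    total = len(labels)
    return [labels.count(c) / total for c in classes]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0
    )


classes = ["pos", "neg"]
parent = ["pos", "pos", "neg", "neg"]
left_child = ["pos", "pos"]
right_child = ["neg", "neg"]

d_left = hellinger(class_distribution(parent, classes),
                   class_distribution(left_child, classes))
d_children = hellinger(class_distribution(left_child, classes),
                       class_distribution(right_child, classes))
```

The distance is 0 for identical distributions and reaches 1 when the two distributions have disjoint support, as with the two pure child nodes here.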
#6080

Weakly Supervised Multi-Label Learning via Label Enhancement
JiaQi Lv, Ning Xu, RenYi Zheng, Xin Geng
Details | PDF

Classification 8

Weakly supervised multi-label learning (WSML) concentrates on a more challenging multi-label classification problem, where some labels in the training set are missing. Existing approaches make multi-label prediction by exploiting the incomplete logical labels directly without considering the relative importance of each label to an instance. In this paper, a novel two-stage strategy named Weakly Supervised Multi-label Learning via Label Enhancement (WSMLLE) is proposed to learn from weakly supervised data via label enhancement. Firstly, the relative importance of each label, i.e., the description degrees, is recovered by leveraging the structural information in the feature space and local correlations learned from the label space. Then, a tailored multi-label predictive model is induced by learning from the training instances with the recovered description degrees. To the best of our knowledge, it is the first attempt to unify the completion of the missing labels and the recovery of the description degrees into the same framework. Extensive experiments across a wide range of real-world datasets clearly validate the superiority of the proposed approach.

Friday 16 11:00 - 12:30 ML|DM - Data Mining 11 (2505-2506)

Chair: Nirmalie Wiratunga

#2175

BeatGAN: Anomalous Rhythm Detection using Adversarially Generated Time Series
Bin Zhou, Shenghua Liu, Bryan Hooi, Xueqi Cheng, Jing Ye
Details | PDF

Data Mining 11

Given a large-scale rhythmic time series containing mostly normal data segments (or 'beats'), can we learn how to detect anomalous beats in an effective yet efficient way? For example, how can we detect anomalous beats from electrocardiogram (ECG) readings? Existing approaches either require excessively high amounts of labeled and balanced data for classification, or rely on less regularized reconstructions, resulting in lower accuracy in anomaly detection. Therefore, we propose BeatGAN, an unsupervised anomaly detection algorithm for time series data. BeatGAN outputs explainable results to pinpoint the anomalous time ticks of an input beat, by comparing them to adversarially generated beats. Its robustness is guaranteed by its regularization of reconstruction error using an adversarial generation approach, as well as data augmentation using time series warping. Experiments show that BeatGAN accurately and efficiently detects anomalous beats in ECG time series, and routes doctors' attention to anomalous time ticks, achieving accuracy of nearly 0.95 AUC, and very fast inference (2.6 ms per beat). In addition, we show that BeatGAN accurately detects unusual motions from multivariate motion-capture time series data, illustrating its generality.
#2587

Collaborative Metric Learning with Memory Network for Multi-Relational Recommender Systems
Xiao Zhou, Danyang Liu, Jianxun Lian, Xing Xie
Details | PDF

Data Mining 11

The success of recommender systems in modern online platforms is inseparable from the accurate capture of users' personal tastes. In everyday life, large amounts of user feedback data are created along with user-item online interactions in a variety of ways, such as browsing, purchasing, and sharing. These multiple types of user feedback provide us with tremendous opportunities to detect individuals' fine-grained preferences. Different from most existing recommender systems that rely on a single type of feedback, we advocate incorporating multiple types of user-item interactions for better recommendations. Based on the observation that the underlying spectrum of user preferences is reflected in various types of interactions with items and can be uncovered by latent relational learning in metric space, we propose a unified neural learning framework, named Multi-Relational Memory Network (MRMN). It can not only model fine-grained user-item relations but also enable us to discriminate between feedback types in terms of the strength and diversity of user preferences. Extensive experiments show that the proposed MRMN model outperforms competitive state-of-the-art algorithms in a wide range of scenarios, including e-commerce, local services, and job recommendations.
#3521

BPAM: Recommendation Based on BP Neural Network with Attention Mechanism
Wu-Dong Xi, Ling Huang, Chang-Dong Wang, Yin-Yu Zheng, Jianhuang Lai
Details | PDF

Data Mining 11

Inspired by the significant success of deep learning, some attempts have been made to introduce deep neural networks (DNNs) in recommendation systems to learn users' preferences for items. Since DNNs are well suited for representation learning, they enable recommendation systems to generate more accurate predictions. However, they inevitably result in high computational and storage costs. Worse still, due to the relatively small number of ratings that can be fed into DNNs, they may easily lead to over-fitting. To tackle these problems, we propose a novel recommendation algorithm based on Back Propagation (BP) neural network with Attention Mechanism (BPAM). In particular, the BP neural network is utilized to learn the complex relationship of the target users and their neighbors. Compared with deep neural networks, the shallow neural network, i.e., the BP neural network, can not only reduce the computational and storage costs, but also prevent the model from over-fitting. In addition, an attention mechanism is designed to capture the global impact on all nearest target users for each user. Extensive experiments on eight benchmark datasets have been conducted to evaluate the effectiveness of the proposed model.
#4060

Network Embedding under Partial Monitoring for Evolving Networks
Yu Han, Jie Tang, Qian Chen
Details | PDF

Data Mining 11

Network embedding has been extensively studied in recent years. In addition to the works on static networks, some researchers have proposed new models for evolving networks. However, most of these dynamic network embedding models are still not in line with real-world conditions, since they strongly assume that all changes in the whole network can be observed, which is not possible in some real-world networks such as the web and some large social networks. So in this paper, we study a novel and challenging problem, i.e., network embedding under partial monitoring for evolving networks. We propose a model on dynamic networks in which we cannot perceive all the changes of the structure. We analyze our model theoretically, and give a bound to the error between the results of our model and the potential optimal cases. We evaluate the performance of our model from two aspects. The experimental results on real world datasets show that our model outperforms the baseline models by a large margin.
#2277

Learning Multiple Maps from Conditional Ordinal Triplets
Dung D. Le, Hady W. Lauw
Details | PDF

Data Mining 11

Ordinal embedding seeks a low-dimensional representation of objects based on relative comparisons of their similarities. This low-dimensional representation lends itself to visualization on a Euclidean map. Classical assumptions admit only one valid aspect of similarity. However, there are increasing scenarios involving ordinal comparisons that inherently reflect multiple aspects of similarity, which would be better represented by multiple maps. We formulate this problem as conditional ordinal embedding, which learns a distinct low-dimensional representation conditioned on each aspect, yet allows collaboration across aspects via a shared representation. Our geometric approach is novel in its use of a shared spherical representation and multiple aspect-specific projection maps on tangent hyperplanes. Experiments on public datasets showcase the utility of collaborative learning over baselines that learn multiple maps independently.
#1083

Feature-level Deeper Self-Attention Network for Sequential Recommendation
Tingting Zhang, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Deqing Wang, Guanfeng Liu, Xiaofang Zhou
Details | PDF

Data Mining 11

Sequential recommendation, which aims to recommend the next item that the user will likely interact with in the near future, has become essential in various Internet applications. Existing methods usually consider the transition patterns between items, but ignore the transition patterns between features of items. We argue that item-level sequences alone cannot reveal the full sequential patterns, while explicit and implicit feature-level sequences can help extract the full sequential patterns. In this paper, we propose a novel method named Feature-level Deeper Self-Attention Network (FDSA) for sequential recommendation. Specifically, FDSA first integrates various heterogeneous features of items into feature sequences with different weights through a vanilla mechanism. After that, FDSA applies separated self-attention blocks on item-level sequences and feature-level sequences, respectively, to model item transition patterns and feature transition patterns. Then, we integrate the outputs of these two blocks to a fully-connected layer for next item recommendation. Finally, comprehensive experimental results demonstrate that considering the transition relationships between features can significantly improve the performance of sequential recommendation.

Friday 16 11:00 - 12:30 NLP|SATM - Sentiment Analysis and Text Mining 2 (2401-2402)

Chair: Athirai Aravazhi Irissappane

#1532

Unsupervised Neural Aspect Extraction with Sememes
Ling Luo, Xiang Ao, Yan Song, Jinyao Li, Xiaopeng Yang, Qing He, Dong Yu
Details | PDF

Sentiment Analysis and Text Mining 2

Aspect extraction relies on identifying aspects by discovering coherence among words, which is challenging when word meanings are diversified and when processing short texts. To enhance the performance on aspect extraction, leveraging lexical semantic resources is a possible solution to such a challenge. In this paper, we present an unsupervised neural framework that leverages sememes to enhance lexical semantics. The overall framework is analogous to an autoencoder which reconstructs sentence representations and learns aspects by latent variables. Two models that form sentence representations are proposed by exploiting sememes via (1) a hierarchical attention; (2) a context-enhanced attention. Experiments on two real-world datasets demonstrate the validity and the effectiveness of our models, which significantly outperform existing baselines.
#2539

A Span-based Joint Model for Opinion Target Extraction and Target Sentiment Classification
Yan Zhou, Longtao Huang, Tao Guo, Jizhong Han, Songlin Hu
Details | PDF

Sentiment Analysis and Text Mining 2

Target-Based Sentiment Analysis aims at extracting opinion targets and classifying the sentiment polarities expressed on each target. Recently, token based sequence tagging methods have been successfully applied to jointly solve the two tasks, which aims to predict a tag for each token. Since they do not treat a target containing several words as a whole, it might be difficult to make use of the global information to identify that opinion target, leading to incorrect extraction. Independently predicting the sentiment for each token may also lead to sentiment inconsistency for different words in an opinion target. In this paper, inspired by span-based methods in NLP, we propose a simple and effective joint model to conduct extraction and classification at span level rather than token level. Our model first emulates spans with one or more tokens and learns their representation based on the tokens inside. And then, a span-aware attention mechanism is designed to compute the sentiment information towards each span. Extensive experiments on three benchmark datasets show that our model consistently outperforms the state-of-the-art methods.
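Operating at span level rather than token level presupposes a set of candidate spans. A minimal sketch of generating all spans of one or more tokens up to a maximum length follows; the function name `enumerate_spans`, the half-open index convention and the length limit are assumptions, and the representation learning and span-aware attention described above are not shown.

```python
def enumerate_spans(tokens, max_len):
    """Return (start, end, words) for every span of 1..max_len tokens.

    Indices are half-open: the span covers tokens[start:end].
    """
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end, tokens[start:end]))
    return spans


sentence = ["the", "battery", "life"]
candidates = enumerate_spans(sentence, max_len=2)
```

Treating "battery life" as one candidate lets a model assign a single sentiment to the whole target instead of one tag per word.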
#5849

Multi-Domain Sentiment Classification Based on Domain-Aware Embedding and Attention
Yitao Cai, Xiaojun Wan
Details | PDF

Sentiment Analysis and Text Mining 2

Sentiment classification is a fundamental task in NLP. However, as revealed by many studies, sentiment classification models are highly domain-dependent. It is worth investigating how to leverage data from different domains to improve the classification performance in each domain. In this work, we propose a novel completely-shared multi-domain neural sentiment classification model to learn domain-aware word embeddings and make use of a domain-aware attention mechanism. Our model first utilizes BiLSTM for domain classification and extracts domain-specific features for words, which are then combined with general word embeddings to form domain-aware word embeddings. Domain-aware word embeddings are fed into another BiLSTM to extract sentence features. The domain-aware attention mechanism is used for selecting significant features, by using the domain-aware sentence representation as the query vector. Evaluation results on public datasets with 16 different domains demonstrate the efficacy of our proposed model. Further experiments show the generalization ability and the transferability of our model.
#3120

Towards Discriminative Representation Learning for Speech Emotion Recognition
Runnan Li, Zhiyong Wu, Jia Jia, Yaohua Bu, Sheng Zhao, Helen Meng

Sentiment Analysis and Text Mining 2

In intelligent speech interaction, automatic speech emotion recognition (SER) plays an important role in understanding user intention. Since sentimental speech has different speaker characteristics but similar acoustic attributes, one vital challenge in SER is how to learn robust and discriminative representations for emotion inference. In this paper, inspired by human emotion perception, we propose a novel representation learning component (RLC) for SER systems, which is constructed with Multi-head Self-attention and a Global Context-aware Attention Long Short-Term Memory Recurrent Neural Network (GCA-LSTM). With the ability of the Multi-head Self-attention mechanism to model element-wise correlative dependencies, RLC can exploit the common patterns of sentimental speech features to enhance the emotion-salient information used in representation learning. By employing GCA-LSTM, RLC can selectively focus on emotion-salient factors while considering the entire utterance context, and gradually produce discriminative representations for emotion inference. Experiments on the public emotional benchmark database IEMOCAP and a large realistic interaction database demonstrate that the proposed SER framework outperforms state-of-the-art techniques, with 6.6% to 26.7% relative improvement in unweighted accuracy.
#3444

Modeling Source Syntax and Semantics for Neural AMR Parsing
DongLai Ge, Junhui Li, Muhua Zhu, Shoushan Li

Sentiment Analysis and Text Mining 2

Sequence-to-sequence (seq2seq) approaches formalize Abstract Meaning Representation (AMR) parsing as a translation task from a source sentence to a target AMR graph. However, previous studies generally model a source sentence as a word sequence but ignore the inherent syntactic and semantic information in the sentence. In this paper, we propose two effective approaches to explicitly model source syntax and semantics in neural seq2seq AMR parsing. The first approach linearizes source syntactic and semantic structure into a mixed sequence of words, syntactic labels, and semantic labels, while in the second approach we propose a syntactic and semantic structure-aware encoding scheme through a self-attentive model to explicitly capture syntactic and semantic relations between words. Experimental results on an English benchmark dataset show that our two approaches achieve significant improvements of 3.1% and 3.4% in F1 score over a strong seq2seq baseline.
#1053

Self-attentive Biaffine Dependency Parsing
Ying Li, Zhenghua Li, Min Zhang, Rui Wang, Sheng Li, Luo Si

Sentiment Analysis and Text Mining 2

The current state-of-the-art dependency parsing approaches employ BiLSTMs to encode input sentences. Motivated by the success of transformer-based machine translation, this work for the first time applies the self-attention mechanism to dependency parsing as a replacement for the BiLSTM-based encoders, leading to competitive performance on both English and Chinese benchmark data. Based on a detailed error analysis, we then combine the power of both BiLSTM and self-attention via model ensembles, demonstrating their complementary capability of capturing contextual information. Finally, we explore the recently proposed contextualized word representations as extra input features, and further improve the parsing performance.

Friday 16 11:30 - 12:00 Industry Days (D-I)

Chair: Yu Zheng

From research to engineering: How AI-powered agents are reshaping the adaptive education system architecture
Richard Tong, Chief Research Architect and General Manager of US Operations, Squirrel AI Learning

Industry Days

Friday 16 12:00 - 12:05 Industry Days (D-I)

Chair: Yu Zheng

Closing remarks

Industry Days

Friday 16 14:00 - 14:45 Computers & Thought Award (D-I)

Chair: Qiang Yang

Guy van den Broeck

Computers & Thought Award

Friday 16 14:45 - 15:30 J. McCarthy Award (D-I)

Chair: Qiang Yang

Pedro Domingos

J. McCarthy Award

Friday 16 15:30 - 16:15 Research Excellence Award (D-I)

Chair: Qiang Yang

Yoav Shoham

Research Excellence Award

Friday 16 16:45 - 17:45 Closing (D-I)

Closing

Closing

Friday 16 17:45 - 19:15 Closing Reception (D-I)

Closing Reception

Closing Reception