### Monday 1219:00 - 22:00IJCAI 2019 Welcome Reception (Welcome)

• IJCAI 2019 Welcome Reception
IJCAI 2019 Welcome Reception
• ### Tuesday 1308:30 - 09:30Opening (R1)

• Opening Session
Opening
• ### Tuesday 1309:30 - 10:20Invited Talk (R1)

• Doing for our robots what evolution did for us
Leslie Kaelbling
Invited Talk
• ### Tuesday 1309:30 - 18:00DB1 - Demo Booths 1 (Z1)

Chair: TBA
• #11051
CRSRL: Customer Routing System using Reinforcement Learning
Chong Long, Zining Liu, Xiaolu Lu, Zehong Hu, Yafang Wang
Demo Booths 1

Allocating resources to customers in the customer service is a difficult problem, because designing an optimal strategy to achieve an optimal trade-off between available resources and customers' satisfaction is non-trivial. In this paper, we formalize the customer routing problem, and propose a novel framework based on deep reinforcement learning (RL) to address this problem. To make it more practical, a demo is provided to show and compare different models, which visualizes all decision process, and in particular, the system shows how the optimal strategy is reached. Besides, our demo system also ships with a variety of models that users can choose based on their needs.

• #11054
ATTENet: Detecting and Explaining Suspicious Tax Evasion Groups
Qinghua Zheng, Yating Lin, Huan He, Jianfei Ruan, Bo Dong
Demo Booths 1

In this demonstration, we present ATTENet, a novel visual analytic system for detecting and explaining suspicious affiliated-transaction-based tax evasion (ATTE) groups. First, the system constructs a taxpayer interest interacted network, which contains economic behaviors and social relationships between taxpayers. Then, the system combines basic features and structure features of each group in the network with network embedding method structure2Vec, and then detects suspicious ATTE groups with random forest algorithm. Last, to explore and explain the detection results, the system provides an ATTENet visualization with three coordinated views and interactive tools. We demonstrate ATTENet on a non-confidential dataset which contains two years of real tax data obtained by our cooperative tax authorities to verify the usefulness of our system.

• #11033
DeepRec: An Open-source Toolkit for Deep Learning based Recommendation
Shuai Zhang, Yi Tay, Lina Yao, Bin Wu, Aixin Sun
Demo Booths 1

Deep learning based recommender systems have been extensively explored in recent years. However, the large number of models proposed each year poses a big challenge for both researchers and practitioners in reproducing the results for further comparisons. Although a portion of papers provides source code, they adopted different programming languages or different deep learning packages, which also raises the bar in grasping the ideas. To alleviate this problem, we released the open source project: \textbf{DeepRec}. In this toolkit, we have implemented a number of deep learning based recommendation algorithms using Python and the widely used deep learning package - Tensorflow. Three major recommendation scenarios: rating prediction, top-N recommendation (item ranking) and sequential recommendation, were considered. Meanwhile, DeepRec maintains good modularity and extensibility to easily incorporate new models into the framework. It is distributed under the terms of the GNU General Public License. The source code is available at github: \url{https://github.com/cheungdaven/DeepRec}

• #11034
Agent-based Decision Support for Pain Management in Primary Care Settings
Xu Guo, Han Yu, Chunyan Miao, Yiqiang Chen
Demo Booths 1

The lack of systematic pain management training and support among primary care physicians (PCPs) limits their ability to provide quality care for pain patients. Here, we demonstrate an Agent-based Clinical Decision Support System to empower PCPs to leverage knowledge from pain specialists. The system analyses patients' health conditions related to pain, automatically generates treatment plans, and recommends a set of evidence-based pain cases to PCPs, with an embodied agent explaining the recommendations.

• #11036
A Mobile Application for Sound Event Detection
Yingwei Fu, Kele Xu, Haibo Mi, Huaimin Wang, Dezhi Wang, Boqing Zhu
Demo Booths 1

Sound event detection is intended to analyze and recognize the sound events in audio streams and it has widespread applications in real life. Recently, deep neural networks such as convolutional recurrent neural networks have shown state-of-the-art performance in this task. However, the previous methods were designed and implemented on devices with rich computing resources, and there are few applications on mobile devices. This paper focuses on the solution on the mobile platform for sound event detection. The architecture of the solution includes offline training and online detection. During offline training process, multi model-based distillation method is used to compress model to enable real-time detection. The online detection process includes acquisition of sensor data, processing of audio signals, and detecting and recording of sound events. Finally, we implement an application on the mobile device that can detect sound events in near real time.

• #11039
Demonstration of PerformanceNet: A Convolutional Neural Network Model for Score-to-Audio Music Generation
Yu-Hua Chen, Bryan Wang, Yi-Hsuan Yang
Demo Booths 1

We present in this paper PerformacnceNet, a neural network model we proposed recently to achieve score-to-audio music generation. The model learns to convert a music piece from the symbolic domain to the audio domain, assigning performance-level attributes such as changes in velocity automatically to the music and then synthesizing the audio. The model is therefore not just a neural audio synthesizer, but an AI performer that learns to interpret a musical score in its own way. The code and sample outputs of the model can be found online at https://github.com/bwang514/PerformanceNet.

• #11045
A Quantitative Analysis Platform for PD-L1 Immunohistochemistry based on Point-level Supervision Model
Haibo Mi, Kele Xu, Huaimin Wang, Dawei Feng, Yang Xiang, Yulin He, Chun Wu, Yanming Song, Xiaolei Sun
Demo Booths 1

Recently, deep learning has witnessed dramatic progress in the medical image analysis field. In the precise treatment of cancer immunotherapy, the quantitative analysis of PD-L1 immunohistochemistry is of great importance. It is quite common that pathologists manually quantify the cell nuclei. This process is very time-consuming and error-prone. In this paper, we describe the development of a platform for PD-L1 pathological image quantitative analysis using deep learning approaches. As point-level annotations can provide a rough estimate of the object locations and classifications, this platform adopts a point-level supervision model to classify, localize, and count the PD-L1 cells nuclei. Presently, this platform has achieved an accurate quantitative analysis of PD-L1 for two types of carcinoma, and it is deployed in one of the first-class hospitals in China.

• #11052
Explainable Deep Neural Networks for Multivariate Time Series Predictions
Roy Assaf, Anika Schumann
Demo Booths 1

We demonstrate that deep neural networks can not only be used for making predictions based on multivariate time series data, but also for explaining these predictions. This is important for a number of applications where predictions are the basis for decisions and actions. Hence, confidence in the prediction result is crucial.  We design a two stage convolutional neural network architecture which uses particular kernel sizes. This allows us to utilise gradient based techniques for generating saliency maps for both the time dimension and the features. These are then used for explaining which features during which time interval are responsible for a given prediction, as well as explaining during which time intervals was the joint contribution of all features most important for that prediction. We demonstrate our approach for the task of predicting the average energy production of photovoltaic power plants and for explaining these predictions.

• #11025
Neural Discourse Segmentation
Jing Li
Demo Booths 1

Identifying discourse structures and coherence relations in a piece of text is a fundamental task in natural language processing. The first step of this process is segmenting sentences into clause-like units called elementary discourse units (EDUs). Traditional solutions to discourse segmentation heavily rely on carefully designed features. In this demonstration, we present SegBot, a system to split a given piece of text into sequence of EDUs by using an end-to-end neural segmentation model. Our model does not require hand-crafted features or external knowledge except word embeddings, yet it outperforms state-of-the-art solutions to discourse segmentation.

• #11040
Design and Implementation of a Disambiguity Framework for Smart Voice Controlled Devices
Kehua Lei, Tianyi Ma, Jia Jia, Cunjun Zhang, Zhihan Yang
Demo Booths 1

With about 100 million people using it recently, SVCD(Smart Voice Controlled Device) are becoming demotic. Whether at home or in an office, usually, multiple appliances are under the control of a single SVCD and several people may manipulate an SVCD simultaneously. However, present SVCD fails to handle them appropriately. In this paper, we propose a novel framework for SVCD to eliminate orders’ ambiguity for single user or multi-user. We also design an algorithm combining Word2Vec and emotion detection for the device to wipe off ambiguity. Finally, we apply our framework into a virtual smart home scene and the performance of it indicates that our strategy resolves the problems commendably.

• #11049
AntProphet: an Intention Mining System behind Alipay's Intelligent Customer Service Bot
Cen Chen, Xiaolu Zhang, Sheng Ju, Chilin Fu, Caizhi Tang, Jun Zhou, Xiaolong Li
Demo Booths 1

We create an intention mining system, named AntProphet, for Alipay's intelligent customer service bot, to alleviate the burden of customer service. Whenever users have any questions, AntProphet is the first stop to help users to answer their questions. Our system gathers users' profile and their historical behavioral trajectories, together with contextual information to predict users' intention, i.e., the potential questions that users want to resolve. AntProphet takes care of more than 90% of the customer service demands in the Alipay APP and resolves most of the users' problems on the spot, thus significantly reduces the burden of manpower. With the help of it, the overall satisfaction rate of our customer service bot exceeds 85%.

### Tuesday 1310:50 - 11:50MTA|AM - Art and Music (R6)

Chair: TBD
• #3626
Temporal Pyramid Pooling Convolutional Neural Network for Cover Song Identification
Zhesong Yu, Xiaoshuo Xu, Xiaoou Chen, Deshun Yang
Art and Music

Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment methods, and further breakthrough is hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning toward this task. We show that they could be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is utilized to extract information on different scales and transform songs with different lengths into fixed-dimensional representations. Furthermore, a training scheme is designed to enhance the robustness of our model. Extensive experiments demonstrate that combined with these techniques, our approach is robust against musical variations existing in cover songs and outperforms state-of-the-art methods on several datasets with low time complexity.

• #4170
Dilated Convolution with Dilated GRU for Music Source Separation
Jen-Yu Liu, Yi-Hsuan Yang
Art and Music

Stacked dilated convolutions used in Wavenet have been shown effective for generating high-quality audios. By replacing pooling/striding with dilation in convolution layers, they can preserve high-resolution information and still reach distant locations. Producing high-resolution predictions is also crucial in music source separation, whose goal is to separate different sound sources while maintain the quality of the separated sounds. Therefore, in this paper, we use stacked dilated convolutions as the backbone for music source separation. Although stacked dilated convolutions can reach wider context than standard convolutions do, their effective receptive fields are still fixed and might not be wide enough for complex music audio signals. To reach even further information at remote locations, we propose to combine a dilated convolution with a modified GRU called Dilated GRU to form a block. A Dilated GRU receives information from k-step before instead of the previous step for a fixed k. This modification allows a GRU unit to reach a location with fewer recurrent steps and run faster because it can execute in parallel partially. We show that the proposed model with a stack of such blocks performs equally well or better than the state-of-the-art for separating both vocals and accompaniment.

• #6280
Musical Composition Style Transfer via Disentangled Timbre Representations
Yun-Ning Hung, I-Tung Chiang, Yi-An Chen, Yi-Hsuan Yang
Art and Music

Music creation involves not only composing the different parts (e.g., melody, chords) of a musical work but also arranging/selecting the instruments to play the different parts. While the former has received increasing attention, the latter has not been much investigated. This paper presents, to the best of our knowledge, the first deep learning models for rearranging music of arbitrary genres. Specifically, we build encoders and decoders that take a piece of polyphonic musical audio as input, and predict as output its musical score. We investigate disentanglement techniques such as adversarial training to separate latent factors that are related to the musical content (pitch) of different parts of the piece, and that are related to the instrumentation (timbre) of the parts per short-time segment. By disentangling pitch and timbre, our models have an idea of how each piece was composed and arranged. Moreover, the models can realize “composition style transfer” by rearranging a musical piece without much affecting its pitch content. We validate the effectiveness of the models by experiments on instrument activity detection and composition style transfer. To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle.

• #1469
SynthNet: Learning to Synthesize Music End-to-End
Florin Schimbinschi, Christian Walder, Sarah M. Erfani, James Bailey
Art and Music

We consider the problem of learning a mapping directly from annotated music to waveforms, bypassing traditional single note synthesis. We propose a specific architecture based on WaveNet, a convolutional autoregressive generative model designed for text to speech. We investigate the representations learned by these models on music and concludethat mappings between musical notes and the instrument timbre can be learned directly from the raw audio coupled with the musical score, in binary piano roll format.Our model requires minimal training data (9 minutes), is substantially better in quality and converges 6 times faster in comparison to strong baselines in the form of powerful text to speech models.The quality of the generated waveforms (generation accuracy) is sufficiently high,that they are almost identical to the ground truth.Our evaluations are based on both the RMSE of the Constant-Q transform, and mean opinion scores from human subjects.We validate our work using 7 distinct synthetic instrument timbres, real cello music and also provide visualizations and links to all generated audio.

### Tuesday 1310:50 - 12:35Panel Diversity in AI (R1)

Chair: Marie desJardins
• Panel: Diversity in AI: Where Are We and Where Are We Headed
Marie desJardins
Panel Diversity in AI
• ### Tuesday 1310:50 - 12:35ML|AL - Active Learning 1 (R2)

Chair: TBD
• #392
Deeper Connections between Neural Networks and Gaussian Processes Speed-up Active Learning
Evgenii Tsymbalov, Sergei Makarychev, Alexander Shapeev, Maxim Panov
Active Learning 1

Active learning methods for neural networks are usually based on greedy criteria, which ultimately give a single new design point for the evaluation. Such an approach requires either some heuristics to sample a batch of design points at one active learning iteration, or retraining the neural network after adding each data point, which is computationally inefficient. Moreover, uncertainty estimates for neural networks sometimes are overconfident for the points lying far from the training sample. In this work, we propose to approximate Bayesian neural networks (BNN) by Gaussian processes (GP), which allows us to update the uncertainty estimates of predictions efficiently without retraining the neural network while avoiding overconfident uncertainty prediction for out-of-sample points. In a series of experiments on real-world data, including large-scale problems of chemical and physical modeling, we show the superiority of the proposed approach over the state-of-the-art methods.

• #2537
Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets
Homayun Afrabandpey, Tomi Peltola, Samuel Kaski
Active Learning 1

Learning predictive models from small high-dimensional data sets is a key problem in high-dimensional statistics. Expert knowledge elicitation can help, and a strong line of work focuses on directly eliciting informative prior distributions for parameters. This either requires considerable statistical expertise or is laborious, as the emphasis has been on accuracy and not on efficiency of the process. Another line of work queries about importance of features one at a time, assuming them to be independent and hence missing covariance information. In contrast, we propose eliciting expert knowledge about pairwise feature similarities, to borrow statistical strength in the predictions, and using sequential decision making techniques to minimize the effort of the expert. Empirical results demonstrate improvement in predictive performance on both simulated and real data, in high-dimensional linear regression tasks, where we learn the covariance structure with a Gaussian process, based on sequential elicitation.

• #3209
Multi-View Active Learning for Video Recommendation
Jia-Jia Cai, Jun Tang, Qing-Guo Chen, Yao Hu, Xiaobo Wang, Sheng-Jun Huang
Active Learning 1

On many video websites, the recommendation is implemented as a prediction problem of video-user pairs, where the videos are represented by text features extracted from the metadata. However, the metadata is manually annotated by users and is usually missing for online videos. To train an effective recommender system with lower annotation cost, we propose an active learning approach to fully exploit the visual view of videos, while querying as few annotations as possible from the text view. On one hand, a joint model is proposed to learn the mapping from visual view to text view by simultaneously aligning the two views and minimizing the classification loss. On the other hand, a novel strategy based on prediction inconsistency and watching frequency is proposed to actively select the most important videos for metadata querying. Experiments on both classification datasets and real video recommendation tasks validate that the proposed approach can significantly reduce the annotation cost.

• #6159
Active Learning within Constrained Environments through Imitation of an Expert Questioner
Kalesha Bullard, Yannick Schroecker, Sonia Chernova
Active Learning 1

Active learning agents typically employ a query selection algorithm which solely considers the agent's learning objectives.  However, this may be insufficient in more realistic human domains.  This work uses imitation learning to enable an agent in a constrained environment to concurrently reason about both its internal learning goals and environmental constraints externally imposed, all within its objective function.  Experiments are conducted on a concept learning task to test generalization of the proposed algorithm to different environmental conditions and analyze how time and resource constraints impact efficacy of solving the learning problem.  Our findings show the environmentally-aware learning agent is able to statistically outperform all other active learners explored under most of the constrained conditions.  A key implication is adaptation for active learning agents to more realistic human environments, where constraints are often externally imposed on the learner. p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 10.0px Helvetica}

• #1644
Mindful Active Learning
Zhila Esna Ashari, Hassan Ghasemzadeh
Active Learning 1

We propose a novel active learning framework for activity recognition using wearable sensors. Our work is unique in that it takes physical and cognitive limitations of the oracle into account when selecting sensor data to be annotated by the oracle. Our approach is inspired by human-beings' limited capacity to respond to external stimulus such as responding to a prompt on their mobile devices. This capacity constraint is manifested not only in the number of queries that a person can respond to in a given time-frame but also in the lag between the time that a query is made and when it is responded to. We introduce the notion of mindful active learning and propose a computational framework, called EMMA, to maximize the active learning performance taking informativeness of sensor data, query budget, and human memory into account. We formulate this optimization problem, propose an approach to model memory retention, discuss complexity of the problem, and propose a greedy heuristic to solve the problem. We demonstrate the effectiveness of our approach on three publicly available datasets and by simulating oracles with various memory strengths. We show that the activity recognition accuracy ranges from 21% to 97% depending on memory strength, query budget, and difficulty of the machine learning task. Our results also indicate that EMMA achieves an accuracy level that is, on average, 13.5% higher than the case when only informativeness of the sensor data is considered for active learning. Additionally, we show that the performance of our approach is at most 20% less than experimental upper-bound and up to 80% higher than experimental lower-bound. We observe that mindful active learning is most beneficial when query budget is small and/or oracle's memory is weak, thus emphasizing contributions of our work in human-centered mobile health settings and for elderly with cognitive impairments.

• #4697
ActiveHNE: Active Heterogeneous Network Embedding
Xia Chen, Guoxian Yu, Jun Wang, Carlotta Domeniconi, Zhao Li, Xiangliang Zhang
Active Learning 1

Heterogeneous network embedding (HNE) is a challenging task due to the diverse node types and/or diverse relationships between nodes. Existing HNE methods are typically  unsupervised.  To maximize the profit of utilizing the rare and valuable supervised information in HNEs, we develop a novel Active Heterogeneous Network Embedding (ActiveHNE) framework, which includes two components: Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN).In DHNE, we introduce a novel semi-supervised heterogeneous network embedding method based on graph convolutional neural network. In AQHN, we first introduce three active selection strategies based on uncertainty and representativeness, and then derive a batch selection method that assembles these strategies using a multi-armed bandit mechanism. ActiveHNE aims at improving the performance of HNE by feeding the most valuable supervision obtained by AQHN into DHNE. Experiments on public datasets demonstrate the effectiveness of ActiveHNE and its advantage on reducing the query cost.

• #2628
Deep Active Learning with Adaptive Acquisition
Manuel Haussmann, Fred Hamprecht, Melih Kandemir
Active Learning 1

Model selection is treated as a standard performance boosting step in many machine learning applications. Once all other properties of a learning problem are fixed, the model is selected by grid search on a held-out validation set. This is strictly inapplicable to active learning. Within the standardized workflow, the acquisition function is chosen among available heuristics a priori, and its success is observed only after the labeling budget is already exhausted. More importantly, none of the earlier studies report a unique consistently successful acquisition heuristic to the extent to stand out as the unique best choice. We present a method to break this vicious circle by defining the acquisition function as a learning predictor and training it by reinforcement feedback collected from each labeling round. As active learning is a scarce data regime, we bootstrap from a well-known heuristic that filters the bulk of data points on which all heuristics would agree, and learn a policy to warp the top portion of this ranking in the most beneficial way for the character of a specific data distribution. Our system consists of a Bayesian neural net, the predictor, a bootstrap acquisition function, a probabilistic state definition, and another Bayesian policy network that can effectively incorporate this input distribution. We observe on three benchmark data sets that our method always manages to either invent a new superior acquisition function or to adapt itself to the a priori unknown best performing heuristic for each specific data set.

### Tuesday 1310:50 - 12:35ML|DL - Deep Learning 1 (R3)

Chair: TBD
• #1601
FakeTables: Using GANs to Generate Functional Dependency Preserving Tables with Bounded Real Data
Haipeng Chen, Sushil Jajodia, Jing Liu, Noseong Park, Vadim Sokolov, V. S. Subrahmanian
Deep Learning 1

In many cases, an organization wishes to release some data, but is restricted in the amount of data to be released due to legal, privacy and other concerns. For instance, the US Census Bureau releases only 1% of its table of records every year, along with statistics about the entire table. However, the machine learning (ML) models trained on the released sub-table are usually sub-optimal. In this paper, our goal is to find a way to augment the sub-table by generating a synthetic table from the released sub-table, under the constraints that the generated synthetic table (i) has similar statistics as the entire table, and (ii) preserves the functional dependencies of the released sub-table. We propose a novel generative adversarial network framework called ITS-GAN, where both the generator and the discriminator are specifically designed to satisfy these two constraints. By evaluating the augmentation performance of ITS-GAN on two representative datasets, the US Census Bureau data and US Bureau of Transportation Statistics (BTS) data, we show that ITS-GAN yields high quality classification results, and significantly outperforms various state-of-the-art data augmentation approaches.

• #1954
Boundary Perception Guidance: A Scribble-Supervised Semantic Segmentation Approach
Bin Wang, Guojun Qi, Sheng Tang, Tianzhu Zhang, Yunchao Wei, Linghui Li, Yongdong Zhang
Deep Learning 1

Semantic segmentation suffers from the fact that densely annotated masks are expensive to obtain. To tackle this problem, we aim at learning to segment by only leveraging scribbles that are much easier to collect for supervision. To fully explore the limited pixel-level annotations from scribbles, we present a novel Boundary Perception Guidance (BPG) approach, which consists of two basic components, i.e., prediction refinement and boundary regression. Specifically, the prediction refinement progressively makes a better segmentation by adopting an iterative upsampling and a semantic feature  enhancement strategy. In the boundary regression, we employ class-agnostic edge maps for supervision to effectively guide the segmentation network in localizing the boundaries between different semantic regions, leading to producing finer-grained representation of feature maps for semantic segmentation. The experiment results on the PASCAL VOC 2012 demonstrate the proposed BPG achieves mIoU of 73.2% without fully connected Conditional Random Field (CRF) and 76.0% with CRF, setting up the new state-of-the-art in literature.

• #2829
Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks
Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Xiaofei He
Deep Learning 1

Open-ended video question answering aims to automatically generate the natural-language answer from referenced video contents according to the given question. Currently, most existing approaches focus on short-form video question answering with multi-modal recurrent encoder-decoder networks. Although these works have achieved promising performance, they may still be ineffectively applied to long-form video question answering due to the lack of long-range dependency modeling and the suffering from the heavy computational cost. To tackle these problems, we propose a fast hierarchical convolutional self-attention encoder-decoder network. Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context. We then devise a multi-scale attentive decoder to incorporate multi-layer video representations for answer generation, which avoids the information missing of the top encoder layer. The extensive experiments show the effectiveness and efficiency of our method.

• #3298
Attribute Aware Pooling for Pedestrian Attribute Recognition
Kai Han, Yunhe Wang, Han Shu, Chuanjian Liu, Chunjing Xu, Chang Xu
Deep Learning 1

This paper expands the strength of deep convolutional neural networks (CNNs) to the pedestrian attribute recognition problem by devising a novel attribute aware pooling algorithm. Existing vanilla CNNs cannot be straightforwardly applied to handle multi-attribute data because of the larger label space as well as the attribute entanglement and correlations. We tackle these challenges that hampers the development of CNNs for multi-attribute classification by fully exploiting the correlation between different attributes. The multi-branch architecture is adopted for fucusing on attributes at different regions. Besides the prediction based on each branch itself, context information of each branch are employed for decision as well. The attribute aware pooling is developed to integrate both kinds of information. Therefore, attributes which are indistinct or tangled with others can be accurately recognized by exploiting the context information. Experiments on benchmark datasets demonstrate that the proposed pooling method appropriately explores and exploits the correlations between attributes for the pedestrian attribute recognition.

• #365
MUSICAL: Multi-Scale Image Contextual Attention Learning for Inpainting
Ning Wang, Jingyuan Li, Lefei Zhang, Bo Du
Deep Learning 1

We study the task of image inpainting, where an image with missing region is recovered with plausible context. Recent approaches based on deep neural networks have exhibited potential for producing elegant detail and are able to take advantage of background information, which gives texture information about missing region in the image. These methods often perform pixel/patch level replacement on the deep feature maps of missing region and therefore enable the generated content to have similar texture as background region. However, this kind of replacement is a local strategy and often performs poorly when the background information is misleading. To this end, in this study, we propose to use a multi-scale image contextual attention learning (MUSICAL) strategy that helps to flexibly handle richer background information while avoid to misuse of it. However, such strategy may not promising in generating context of reasonable style. To address this issue, both of the style loss and the perceptual loss are introduced into the proposed method to achieve the style consistency of the generated image. Furthermore, we have also noticed that replacing some of the down sampling layers in the baseline network with the stride 1 dilated convolution layers is beneficial for producing sharper and fine-detailed results. Experiments on the Paris Street View, Places, and CelebA datasets indicate the superior performance of our approach compares to the state-of-the-arts.

• #1516
Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing
Chaoyou Fu, Liangchen Song, Xiang Wu, Guoli Wang, Ran He
Deep Learning 1

Deep supervised hashing has become an active topic in information retrieval. It generates hashing bits by the output neurons of a deep hashing network. During binary discretization, there often exists much redundancy between hashing bits that degenerates retrieval performance in terms of both storage and accuracy. This paper proposes a simple yet effective Neurons Merging Layer (NMLayer) for deep supervised hashing. A graph is constructed to represent the redundancy relationship between hashing bits that is used to guide the learning of a hashing network. Specifically, it is dynamically learned by a novel mechanism defined in our active and frozen phases. According to the learned relationship, the NMLayer merges the redundant neurons together to balance the importance of each output neuron. Moreover, multiple NMLayers are progressively trained for a deep hashing network to learn a more compact hashing code from a long redundant code. Extensive experiments on four datasets demonstrate that our proposed method outperforms state-of-the-art hashing methods.

• #1498
Deeply-learned Hybrid Representations for Facial Age Estimation
Zichang Tan, Yang Yang, Jun Wan, Guodong Guo, Stan Z. Li
Deep Learning 1

In this paper, we propose a novel unified network named Deep Hybrid-Aligned Architecture for facial age estimation. It contains global, local and global-local branches. They are jointly optimized and thus can capture multiple types of features with complementary information. In each branch, we employ a separate loss for each sub-network to extract the independent features and use a recurrent fusion to explore correlations among those region features. Considering that the pose variations may lead to misalignment in different regions, we design an Aligned Region Pooling operation to generate aligned region features. Moreover, a new large age dataset named Web-FaceAge owning more than 120K samples is collected under diverse scenes and spanning a large age range. Experiments on five age benchmark datasets, including Web-FaceAge, Morph, FG-NET, CACD and Chalearn LAP 2015, show that the proposed method outperforms the state-of-the-art approaches significantly.

### Tuesday 1310:50 - 12:35ML|RS - Recommender Systems 1 (R4)

Chair: TBD
• #663
Matrix Completion in the Unit Hypercube via Structured Matrix Factorization
Emanuele Bugliarello, Swayambhoo Jain, Vineeth Rakesh
Recommender Systems 1

Several complex tasks that arise in organizations can be simplified by mapping them into a matrix completion problem. In this paper, we address a key challenge faced by our company: predicting the efficiency of artists in rendering visual effects (VFX) in film shots. We tackle this challenge by using a two-fold approach: first, we transform this task into a constrained matrix completion problem with entries bounded in the unit interval [0,1]; second, we propose two novel matrix factorization models that leverage our knowledge of the VFX environment. Our first approach, expertise matrix factorization (EMF), is an interpretable method that structures the latent factors as weighted user-item interplay. The second one, survival matrix factorization (SMF), is instead a probabilistic model for the underlying process defining employees' efficiencies. We show the effectiveness of our proposed models by extensive numerical tests on our VFX dataset and two additional datasets with values that are also bounded in the [0,1] interval.

• #704
Modeling Multi-Purpose Sessions for Next-Item Recommendations via Mixture-Channel Purpose Routing Networks
Shoujin Wang, Liang Hu, Yan Wang, Quan Z. Sheng, Mehmet Orgun, Longbing Cao
Recommender Systems 1

A session-based recommender system (SBRS) suggests the next item by modeling the dependencies between items in a session. Most of existing SBRSs assume the items inside a session are associated with one (implicit) purpose. However, this may not always be true in reality, and a session may often consist of multiple subsets of items for different purposes (e.g., breakfast and decoration). Specifically, items (e.g., bread and milk) in a subsethave strong purpose-specific dependencies whereas items (e.g., bread and vase) from different subsets have much weaker or even no dependencies due to the difference of purposes. Therefore, we propose a mixture-channel model to accommodate the multi-purpose item subsets for more precisely representing a session. Filling gaps in existing SBRSs, this model recommends more diverse items to satisfy different purposes. Accordingly, we design effective mixture-channel purpose routing networks (MCPRN) with a purpose routing network to detect the purposes of each item and assign it into the corresponding channels. Moreover, a purpose specific recurrent network is devised to model the dependencies between items within each channel for a specific purpose. The experimental results show the superiority of MCPRN over the state-of-the-art methods in terms of both recommendation accuracy and diversity.

• #1488
A Review-Driven Neural Model for Sequential Recommendation
Chenliang Li, Xichuan Niu, Xiangyang Luo, Zhenzhong Chen, Cong Quan
Recommender Systems 1

Writing review for a purchased item is a unique channel to express a user's opinion in E-Commerce. Recently, many deep learning based solutions have been proposed by exploiting user reviews for rating prediction. In contrast, there has been few attempt to enlist the semantic signals covered by user reviews for the task of collaborative filtering. In this paper, we propose a novel review-driven neural sequential recommendation model (named RNS) by considering user's intrinsic preference (long-term) and sequential patterns (short-term). In detail, RNS is devised to encode each user or item with the aspect-aware representations extracted from the reviews. Given a sequence of historical purchased items for a user, we devise a novel hierarchical attention over attention mechanism to capture sequential patterns at both union-level and individual-level. Extensive experiments on three real-world datasets of different domains demonstrate that RNS obtains significant performance improvement over uptodate state-of-the-art sequential recommendation models.

• #3150
PD-GAN: Adversarial Learning for Personalized Diversity-Promoting Recommendation
Qiong Wu, Yong Liu, Chunyan Miao, Binqiang Zhao, Yin Zhao, Lu Guan
Recommender Systems 1

This paper proposes Personalized Diversity-promoting GAN (PD-GAN), a novel recommendation model to generate diverse, yet relevant recommendations. Specifically, for each user, a generator recommends a set of diverse and relevant items by sequentially sampling from a personalized Determinantal Point Process (DPP) kernel matrix. This kernel matrix is constructed by two learnable components: the general co-occurrence of diverse items and the user's personal preference to items. To learn the first component, we propose a novel pairwise learning paradigm using training pairs, and each training pair consists of a set of diverse items and a set of similar items randomly sampled from the observed data of all users. The second component is learnt through adversarial training against a discriminator which strives to distinguish between recommended items and the ground-truth sets randomly sampled from the observed data of the target user. Experimental results show that PD-GAN is superior to generate recommendations that are both diverse and relevant.

• #4290
DARec: Deep Domain Adaptation for Cross-Domain Recommendation via Transferring Rating Patterns
Feng Yuan, Lina Yao, Boualem Benatallah
Recommender Systems 1

Cross-domain recommendation has long been one of the major topics in recommender systems.Recently, various deep models have been proposed to transfer the learned knowledge across domains, but most of them focus on extracting abstract transferable features from auxilliary contents, e.g., images and review texts, and the patterns in the rating matrix itself is rarely touched. In this work, inspired by the concept of domain adaptation, we proposed a deep domain adaptation model (DARec) that is capable of extracting and transferring patterns from rating matrices only without relying on any auxillary information. We empirically demonstrate on public datasets that our method achieves the best performance among several state-of-the-art alternative cross-domain recommendation models.

• #5182
STAR-GCN: Stacked and Reconstructed Graph Convolutional Networks for Recommender Systems
Jiani Zhang, Xingjian Shi, Shenglin Zhao, Irwin King
Recommender Systems 1

We propose a new STAcked and Reconstructed Graph Convolutional Networks (STAR-GCN) architecture to learn node representations for boosting the performance in recommender systems, especially in the cold start scenario. STAR-GCN employs a stack of GCN encoder-decoders combined with intermediate supervision to improve the final prediction performance. Unlike the graph convolutional matrix completion model with one-hot encoding node inputs, our STAR-GCN learns low-dimensional user and item latent factors as the input to restrain the model space complexity. Moreover, our STAR-GCN can produce node embeddings for new nodes by reconstructing masked input node embeddings, which essentially tackles the cold start problem. Furthermore, we discover a label leakage issue when training GCN-based models for link prediction tasks and propose a training strategy to avoid the issue. Empirical results on multiple rating prediction benchmarks demonstrate our model achieves state-of-the-art performance in four out of five real-world datasets and significant improvements in predicting ratings in the cold start scenario. The code implementation is available in https://github.com/jennyzhang0215/STAR-GCN.

• #6424
Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation
Shengze Yu, Xin Wang, Wenwu Zhu, Peng Cui, Jingdong Wang
Recommender Systems 1

Cross-platform recommendation aims to improve recommendation accuracy through associating information from different platforms. Existing cross-platform recommendation approaches assume all cross-platform information to be consistent with each other and can be aligned. However, there remain two unsolved challenges: i) there exist inconsistencies in cross-platform association due to platform-specific disparity, and ii) data from distinct platforms may have different semantic granularities. In this paper, we propose a cross-platform association model for cross-platform video recommendation, i.e., Disparity-preserved Deep Cross-platform Association (DCA), taking platform-specific disparity and granularity difference into consideration. The proposed DCA model employs a partially-connected multi-modal autoencoder, which is capable of explicitly capturing platform-specific information, as well as utilizing nonlinear mapping functions to handle granularity differences. We then present a cross-platform video recommendation approach based on the proposed DCA model. Extensive experiments for our cross-platform recommendation framework on real-world dataset demonstrate that the proposed DCA model significantly outperform existing cross-platform recommendation methods in terms of various evaluation metrics.

### Tuesday 1310:50 - 12:35AMS|NG - Noncooperative Games 1 (R5)

Chair: TBD
• #2776
Be a Leader or Become a Follower: The Strategy to Commit to with Multiple Leaders
Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
Noncooperative Games 1

We study the problem of computing correlated strategies to commit to in games with multiple leaders and followers. To the best of our knowledge, this problem is widely unexplored so far, as the majority of the works in the literature focus on games with a single leader and one or more followers. The fundamental ingredient of our model is that a leader can decide whether to participate in the commitment or to defect from it by taking on the role of follower. This introduces a preliminary stage where, before the underlying game is played, the leaders make their decisions to reach an agreement on the correlated strategy to commit to. We distinguish three solution concepts on the basis of the constraints that they enforce on the agreement reached by the leaders. Then, we provide a comprehensive study of the properties of our solution concepts, in terms of existence, relation with other solution concepts, and computational complexity.

• #2814
Civic Crowdfunding for Agents with Negative Valuations and Agents with Asymmetric Beliefs
Sankarshan Damle, Moin Hussain Moti, Praphul Chandra, Sujit Gujar
Noncooperative Games 1

In the last decade, civic crowdfunding has proved to be effective in generating funds for the provision of public projects. However, the existing literature deals only with citizen's with positive valuation and symmetric belief towards the project's provision.  In this work, we present novel mechanisms which break these two barriers, i.e., mechanisms which incorporate negative valuation and asymmetric belief, independently. For negative valuation, we present a methodology for converting existing mechanisms to mechanisms that incorporate agents with negative valuations. Particularly, we adapt existing PPR and PPS mechanisms, to present novel PPRN and PPSN mechanisms which incentivize strategic agents to contribute to the project based on their true preference. With respect to asymmetric belief, we propose a reward scheme Belief Based Reward (BBR) based on Robust Bayesian Truth Serum mechanism. With BBR, we propose a general mechanism for civic crowdfunding which incorporates asymmetric agents. We leverage PPR and PPS, to present PPRx and PPSx. We prove that in PPRx and PPSx, agents with greater belief towards the project's provision contribute more than agents with lesser belief. Further, we also show that contributions are such that the project is provisioned at equilibrium.

• #4500
Network Formation under Random Attack and Probabilistic Spread
Yu Chen, Shahin Jabbari, Michael Kearns, Sanjeev Khanna, Jamie Morgenstern
Noncooperative Games 1

We study a network formation game where agents receive benefits by forming connections to other agents but also incur both direct and indirect costs from the formed connections. Specifically, once the agents have purchased their connections, an attack starts at a randomly chosen vertex in the network and spreads according to the independent cascade model with a fixed probability, destroying any infected agents. The utility or welfare of an agent in our game is defined to be the expected size of the agent's connected component post-attack minus her expenditure in forming connections. Our goal is to understand the properties of the equilibrium networks formed in this game. Our first result concerns the edge density of equilibrium networks. A network connection increases both the likelihood of remaining connected to other agents after an attack as well the likelihood of getting infected by a cascading spread of infection. We show that the latter concern primarily prevails and any equilibrium network in our game contains only $O(n\log n)$ edges where $n$ denotes the number of agents. On the other hand, there are equilibrium networks that contain $\Omega(n)$ edges showing that our edge density bound is tight up to a logarithmic factor. Our second result shows that the presence of attack and its spread through a cascade does not significantly lower social welfare as long as the network is not too dense. We show that any non-trivial equilibrium network with $O(n)$ edges has $\Theta(n^2)$ social welfare, asymptotically similar to the social welfare guarantee in the game without any attacks.

• #4515
Equilibrium Characterization for Data Acquisition Games
Jinshuo Dong, Hadi Elzayn, Shahin Jabbari, Michael Kearns, Zachary Schutzman
Noncooperative Games 1

We study a game between two firms which each provide a service based on machine learning.  The firms are presented with the opportunity to purchase a new corpus of data, which will allow them to potentially improve the quality of their products. The firms can decide whether or not they want to buy the data, as well as which learning model to build on that data. We demonstrate a reduction from this potentially complicated action space  to a one-shot, two-action game in which each firm only decides whether or not to buy the data. The game admits several regimes which depend on the relative strength of the two firms at the outset and the price at which the data is being offered. We analyze the game's Nash equilibria in all parameter regimes and demonstrate that, in expectation, the outcome of the game is that the initially stronger firm's market position weakens whereas the initially weaker firm's market position becomes stronger. Finally, we consider the perspective of the users of the service and demonstrate that the expected outcome at equilibrium is not the one which maximizes the welfare of the consumers.

• #4907
Compact Representation of Value Function in Partially Observable Stochastic Games
Karel Horák, Branislav Bošanský, Christopher Kiekintveld, Charles Kamhoua
Noncooperative Games 1

Value methods for solving stochastic games with partial observability model the uncertainty of the players as a probability distribution over possible states, where the dimension of the belief space is the number of states. For many practical problems, there are exponentially many states which causes scalability problems. We propose an abstraction technique that addresses this curse of dimensionality by projecting the high-dimensional beliefs onto characteristic vectors of significantly lower dimension (e.g., marginal probabilities). Our main contributions are (1) a novel compact representation of the uncertainty in partially observable stochastic games and (2) a novel algorithm using this representation that is based on existing state-of-the-art algorithms for solving stochastic games with partial observability. Experimental evaluation confirms that the new algorithm using the compact representation dramatically increases scalability compared to the state of the art.

• #6310
Temporal Information Design in Contests
Priel Levy, David Sarne, Yonatan Aumann
Noncooperative Games 1

We study temporal information design in contests, wherein the organizer may, possibly incrementally, disclose information about the participation and performance of some contestants to other (later) contestants. We show that such incremental disclosure can increase the organizer's profit. The expected profit, however, depends on the exact information disclosure structure, and the optimal structure depends on the parameters of the problem. We provide a game-theoretic analysis of such information disclosure schemes as they apply to two common models of contests: (a) simple contests, wherein contestants' decisions concern only their participation; and (b) Tullock contests, wherein contestants choose the effort levels to expend. For each of these we analyze and characterize the equilibrium strategy, and exhibit the potential benefits of information design.

• #886
Possibilistic Games with Incomplete Information
Nahla Ben Amor, Helene Fargier, Régis Sabbadin, Meriem Trabelsi
Noncooperative Games 1

Bayesian games offer a suitable framework for games where the utility degrees are additive in essence. This approach does nevertheless not apply to ordinal games, where the utility degrees do not capture more than a ranking, nor to situations of decision under qualitative uncertainty. This paper proposes a representation framework for ordinal games under possibilistic incomplete information (π-games) and extends the fundamental notion of Nash equilibrium (NE) to this framework. We show that deciding whether a NE exists is a difficult problem (NP-hard) and propose a  Mixed Integer Linear Programming  (MILP) encoding. Experiments on variants of the GAMUT problems confirm the feasibility of this approach.

### Tuesday 1310:50 - 12:35HAI|PUM - Personalization and User Modeling (R7)

Chair: TBD
• #254
Deep Adversarial Social Recommendation
Wenqi Fan, Tyler Derr, Yao Ma, Jianping Wang, Jiliang Tang, Qing Li
Personalization and User Modeling

Recent years have witnessed rapid developments on social recommendation techniques for improving the performance of recommender systems due to the growing influence of social networks to our daily life. The majority of existing social recommendation methods unify user representation for the user-item interactions (item domain) and user-user connections (social domain). However, it may restrain user representation learning in each respective domain, since users behave and interact differently in the two domains, which makes their representations to be heterogeneous. In addition, most of traditional recommender systems can not efficiently optimize these objectives, since they utilize negative sampling technique which is unable to provide enough informative guidance towards the training during the optimization process. In this paper, to address the aforementioned challenges, we propose a novel deep adversarial social recommendation framework DASO. It adopts a bidirectional mapping method to transfer users' information between social domain and item domain using adversarial learning. Comprehensive experiments on two real-world datasets show the effectiveness of the proposed framework.

• #1863
Minimizing Time-to-Rank: A Learning and Recommendation Approach
Haoming Li, Sujoy Sikdar, Rohit Vaish, Junming Wang, Lirong Xia, Chaonan Ye
Personalization and User Modeling

Consider the following problem faced by an online voting platform: A user is provided with a list of alternatives, and is asked to rank them in order of preference using only drag-and-drop operations. The platform's goal is to recommend an initial ranking that minimizes the time spent by the user in arriving at her desired ranking. We develop the first optimization framework to address this problem, and make theoretical as well as practical contributions. On the practical side, our experiments on the Amazon Mechanical Turk platform provide two interesting insights about user behavior: First, that users' ranking strategies closely resemble selection or insertion sort, and second, that the time taken for a drag-and-drop operation depends linearly on the number of positions moved. These insights directly motivate our theoretical model of the optimization problem. We show that computing an optimal recommendation is NP-hard, and provide exact and approximation algorithms for a variety of special cases of the problem. Experimental evaluation on MTurk shows that, compared to a random recommendation strategy, the proposed approach reduces the (average) time-to-rank by up to 50%.

• #2398
DeepAPF: Deep Attentive Probabilistic Factorization for Multi-site Video Recommendation
Huan Yan, Xiangning Chen, Chen Gao, Yong Li, Depeng Jin
Personalization and User Modeling

Existing web video systems recommend videos according to users' viewing history from its own website. However, since many users watch videos in multiple websites, this approach fails to capture these users' interests across sites. In this paper, we investigate the user viewing behavior in multiple sites based on a large scale real dataset. We find that user interests are comprised of cross-site consistent part and site-specific part with different degrees of the importance. Existing linear matrix factorization recommendation model has limitation in modeling such complicated interactions. Thus, we propose a model of Deep Attentive Probabilistic Factorization (DeepAPF) to exploit deep learning method to approximate such complex user-video interaction. DeepAPF captures both cross-site common interests and site-specific interests with non-uniform importance weights learned by the attentional network. Extensive experiments show that our proposed model outperforms by 17.62%, 7.9% and 8.1% with the comparison of three state-of-the-art baselines. Our study provides insight to integrate user viewing records from multiple sites via the trusted third party, which gains mutual benefits in video recommendation.

• #3165
Personalized Multimedia Item and Key Frame Recommendation
Le Wu, Lei Chen, Yonghui Yang, Richang Hong, Yong Ge, Xing Xie, Meng Wang
Personalization and User Modeling

When recommending or advertising items to users, an emerging trend is to present each multimedia item with  a key frame image (e.g., the poster of a movie). As each multimedia item can be represented as  multiple fine-grained  visual images (e.g., related images of the movie), personalized key frame recommendation is necessary in these applications to attract users' unique visual preferences. However, previous personalized key frame recommendation models relied on users' fine grained image  behavior of  multimedia items (e.g., user-image interaction behavior), which is often not available in real scenarios.  In this paper, we study the general problem of joint multimedia item and key frame recommendation in the absence of the fine-grained user-image behavior. We argue that the key challenge of this problem lies in discovering users' visual profiles for key frame recommendation, as most recommendation models  would fail without any users' fine-grained image behavior. To tackle this challenge, we leverage users' item behavior by projecting users(items) in two latent spaces: a collaborative latent space and a visual latent space. We further design a model to discern both the collaborative and  visual dimensions of users, and model how users make decisive item preferences from these two spaces. As a result, the learned user visual profiles could be directly applied for key frame recommendation. Finally, experimental results on a real-world dataset clearly show the effectiveness of our proposed model on the two recommendation tasks.

• #3571
Discrete Trust-aware Matrix Factorization for Fast Recommendation
Guibing Guo, Enneng Yang, Li Shen, Xiaochun Yang, Xiaodong He
Personalization and User Modeling

Trust-aware recommender systems have received much attention recently for their abilities to capture the influence among connected users. However, they suffer from the efficiency issue due to large amount of data and time-consuming real-valued operations. Although existing discrete collaborative filtering may alleviate this issue to some extent, it is unable to accommodate social influence. In this paper we propose a discrete trust-aware matrix factorization (DTMF) model to take dual advantages of both social relations and discreting technique for fast recommendation. Specifically, we map the latent representation of users and items into a joint hamming space by recovering the rating and trust interactions between users and items. We adopt a sophisticated discrete coordinate descent (DCD) approach to optimize our proposed model. In addition, experiments on two real-world datasets demonstrate the superiority of our approach against other state-of-the-art approaches in terms of ranking accuracy and efficiency.

• #3677
An Input-aware Factorization Machine for Sparse Prediction
Yantao Yu, Zhen Wang, Bo Yuan
Personalization and User Modeling

Factorization machines (FMs) are a class of general predictors working effectively with sparse data, which represents features using factorized parameters and weights. However, the accuracy of FMs can be adversely affected by the fixed representation trained for each feature, as the same feature is usually not equally predictive and useful in different instances. In fact, the inaccurate representation of features may even introduce noise and degrade the overall performance. In this work, we improve FMs by explicitly considering the impact of individual input upon the representation of features. We propose a novel model named \textit{Input-aware Factorization Machine} (IFM), which learns a unique input-aware factor for the same feature in different instances via a neural network. Comprehensive experiments on three real-world recommendation datasets are used to demonstrate the effectiveness and mechanism of IFM. Empirical results indicate that IFM is significantly better than the standard FM model and consistently outperforms four state-of-the-art deep learning based methods.

• #4234
Dynamic Item Block and Prediction Enhancing Block for Sequential Recommendation
Guibing Guo, Shichang Ouyang, Xiaodong He, Fajie Yuan, Xiaohua Liu
Personalization and User Modeling

Sequential recommendation systems have become a research hotpot recently to suggest users with the next item of interest (to interact with). However, existing approaches suffer from two limitations: (1) The representation of an item is relatively static and fixed for all users. We argue that even a same item should be represented distinctively with respect to different users and time steps. (2) The generation of a prediction for a user over an item is computed in a single scale (e.g., by their inner product), ignoring the nature of multi-scale user preferences. To resolve these issues, in this paper we propose two enhancing building blocks for sequential recommendation. Specifically, we devise a Dynamic Item Block (DIB) to learn dynamic item representation by aggregating the embeddings of those who rated the same item before that time step. Then, we come up with a Prediction Enhancing Block (PEB) to project user representation into multiple scales, based on which many predictions can be made and attentively aggregated for enhanced learning. Each prediction is generated by a softmax over a sampled itemset rather than the whole item space for efficiency. We conduct a series of experiments on four real datasets, and show that even a basic model can be greatly enhanced with the involvement of DIB and PEB in terms of ranking accuracy. The code and datasets can be obtained from https://github.com/ouououououou/DIB-PEB-Sequential-RS

### Tuesday 1310:50 - 12:35KRR|ACC - Action, Change and Causality (R8)

Chair: TBD
• #921
Estimating Causal Effects of Tone in Online Debates
Dhanya Sridhar, Lise Getoor
Action, Change and Causality

Statistical methods applied to social media posts shed light on the dynamics of online dialogue. For example, users' wording choices predict their persuasiveness and users adopt the language patterns of other dialogue participants. In this paper, we estimate the causal effect of reply tones in debates on linguistic and sentiment changes in subsequent responses. The challenge for this estimation is that a reply's tone and subsequent responses are confounded by the users' ideologies on the debate topic and their emotions. To overcome this challenge, we learn representations of ideology using generative models of text. We study debates from 4Forums.com and compare annotated tones of replying such as emotional versus factual, or reasonable versus attacking. We show that our latent confounder representation reduces bias in ATE estimation. Our results suggest that factual and asserting tones affect dialogue and provide a methodology for estimating causal effects from text.

• #2116
Automatic Verification of FSA Strategies via Counterexample-Guided Local Search for Invariants
Kailun Luo, Yongmei Liu
Action, Change and Causality

Strategy representation and reasoning has received much attention over the past years. In this paper, we consider the representation of general strategies that solve a class of (possibly infinitely many) games with similar structures, and their automatic verification, which is an undecidable problem. We propose to represent a general strategy by an FSA (Finite State Automaton) with edges labelled by restricted Golog programs. We formalize the semantics of FSA strategies in the situation calculus. Then we propose an incomplete method for verifying whether an FSA strategy is a winning strategy by counterexample-guided local search for appropriate invariants. We implemented our method and did experiments on combinatorial game and also single-agent domains. Experimental results showed that our system can successfully verify most of them within a reasonable amount of time.

• #2213
Causal Discovery with Cascade Nonlinear Additive Noise Model
Ruichu Cai, Jie Qiao, Kun Zhang, Zhenjie Zhang, Zhifeng Hao
Action, Change and Causality

Identification of causal direction between a causal-effect pair from observed data has recently attracted much attention. Various methods based on functional causal models have been proposed to solve this problem, by assuming the causal process satisfies some (structural) constraints and showing that the reverse direction violates such constraints. The nonlinear additive noise model has been demonstrated to be effective for this purpose, but the model class is not transitive--even if each direct causal relation follows this model, indirect causal influences, which result from omitted intermediate causal variables and are frequently encountered in practice, do not necessarily follow the model constraints; as a consequence, the nonlinear additive noise model may fail to correctly discover causal direction. In this work, we propose a cascade nonlinear additive noise model to represent such causal influences--each direct causal relation follows the nonlinear additive noise model but we observe only the initial cause and final effect. We further propose a method to estimate the model, including the unmeasured intermediate variables, from data, under the variational auto-encoder framework. Our theoretical results show that with our model, causal direction is identifiable under suitable technical conditions on the data generation process. Simulation results illustrate the power of the proposed method in identifying indirect causal relations across various settings, and experimental results on real data suggest that the proposed model and method greatly extend the applicability of causal discovery based on functional causal models in nonlinear cases.

• #3621
Boosting Causal Embeddings via Potential Verb-Mediated Causal Patterns
Zhipeng Xie, Feiteng Mu
Action, Change and Causality

Existing approaches to causal embeddings rely heavily on hand-crafted high-precision causal patterns, leading to limited coverage. To solve this problem, this paper proposes a method to boost causal embeddings by exploring potential verb-mediated causal patterns. It first constructs a seed set of causal word pairs, then uses them as supervision to characterize the causal strengths of extracted verb-mediated patterns, and finally exploits the weighted extractions by those verb-mediated patterns in the construction of boosted causal embeddings. Experimental results have shown that the boosted causal embeddings outperform several state-of-the-arts significantly on both English and Chinese. As by-products, the top-ranked patterns coincide with human intuition about causality.

• #5703
From Statistical Transportability to Estimating the Effect of Stochastic Interventions
Juan D. Correa, Elias Bareinboim
Action, Change and Causality

Learning systems often face a critical challenge when applied to settings that differ from those under which they were initially trained. In particular, the assumption that both the source/training and the target/deployment domains follow the same causal mechanisms and observed distributions is commonly violated. This implies that the robustness and convergence guarantees usually expected from these methods are no longer attainable. In this paper, we study these violations through causal lens using the formalism of statistical transportability [Pearl and Bareinboim, 2011] (PB, for short). We start by proving sufficient and necessary graphical conditions under which a probability distribution observed in the source domain can be extrapolated to the target one, where strictly less data is available. We develop the first sound and complete procedure for statistical transportability, which formally closes the problem introduced by PB. Further, we tackle the general challenge of identification of stochastic interventions from observational data [Sec.~4.4, Pearl, 2000]. This problem has been solved in the context of atomic interventions using Pearl's do-calculus, which lacks complete treatment in the stochastic case. We prove completeness of stochastic identification by constructing a reduction of any instance of this problem to an instance of statistical transportability, closing the problem.

• #6337
ASP-based Discovery of Semi-Markovian Causal Models under Weaker Assumptions
Zhalama Zhalama, Jiji Zhang, Frederick Eberhardt, Wolfgang Mayer, Mark Junjie Li
Action, Change and Causality

In recent years the possibility of relaxing the so-called Faithfulness assumption in automated causal discovery has been investigated. The investigation showed (1) that the Faithfulness assumption can be weakened in various ways that in an important sense preserve its power, and (2) that weakening of Faithfulness may help to speed up methods based on Answer Set Programming. However, this line of work has so far only considered the discovery of causal models without latent variables. In this paper, we study weakenings of Faithfulness for constraint-based discovery of semi-Markovian causal models, which accommodate the possibility of latent variables, and show that both (1) and (2) remain the case in this more realistic setting.

• #10961
(Sister Conferences Best Papers Track) On Causal Identification under Markov Equivalence
Amin Jaber, Jiji Zhang, Elias Bareinboim
Action, Change and Causality

In this work, we investigate the problem of computing an experimental distribution from a combination of the observational distribution and a partial qualitative description of the causal structure of the domain under investigation. This description is given by a partial ancestral graph (PAG) that represents a Markov equivalence class of causal diagrams, i.e., diagrams that entail the same conditional independence model over observed variables, and is learnable from the observational data. Accordingly, we develop a complete algorithm to compute the causal effect of an arbitrary set of intervention variables on an arbitrary outcome set.

### Tuesday 1310:50 - 12:35NLP|NLP - Natural Language Processing 1 (R9)

Chair: TBD
• #1128
Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization
Ting Huang, Gehui Shen, Zhi-Hong Deng
Natural Language Processing 1

Recurrent Neural Networks (RNNs) are widely used in the field of natural language processing (NLP), ranging from text categorization to question answering and machine translation. However, RNNs generally read the whole text from beginning to end or vice versa sometimes, which makes it inefficient to process long texts. When reading a long document for a categorization task, such as topic categorization, large quantities of words are irrelevant and can be skipped. To this end, we propose Leap-LSTM, an LSTM-enhanced model which dynamically leaps between words while reading texts. At each step, we utilize several feature encoders to extract messages from preceding texts, following texts and the current word, and then determine whether to skip the current word. We evaluate Leap-LSTM on several text categorization tasks: sentiment analysis, news categorization, ontology classification and topic classification, with five benchmark data sets. The experimental results show that our model reads faster and predicts better than standard LSTM. Compared to previous models which can also skip words, our model achieves better trade-offs between performance and efficiency.

• #1860
Deep Mask Memory Network with Semantic Dependency and Context Moment for Aspect Level Sentiment Classification
Peiqin Lin, Meng Yang, Jianhuang Lai
Natural Language Processing 1

Aspect level sentiment classification aims at identifying the sentiment of each aspect term in a sentence. Deep memory networks often use location information between context word and aspect to generate the memory. Although improved results are achieved, the relation information among aspects in the same sentence is ignored and the word location can't bring enough and accurate information for the analysis on the aspect sentiment. In this paper, we propose a novel framework for aspect level sentiment classification, deep mask memory network with semantic dependency and context moment (DMMN-SDCM), which integrates semantic parsing information of the aspect and the inter-aspect relation information into deep memory network. With the designed attention mechanism based on semantic dependency information, different parts of the context memory in different computational layers are selected and useful inter-aspect information in the same sentence is exploited for the desired aspect. To make full use of the inter-aspect relation information, we also jointly learn a context moment learning task, which aims to learn the sentiment distribution of the entire sentence for providing a background for the desired aspect. We examined the merit of our model on SemEval 2014 Datasets, and the experimental results show that our model achieves a state-of-the-art performance.

• #3512
Robust Embedding with Multi-Level Structures for Link Prediction
Zihan Wang, Zhaochun Ren, Chunyu He, Peng Zhang, Yue Hu
Natural Language Processing 1

Knowledge Graph (KG) embedding has become crucial for the task of link prediction. Recent work applies encoder-decoder models to tackle this problem, where an encoder is formulated as a graph neural network (GNN) and a decoder is represented by an embedding method. These approaches enforce embedding techniques with structure information. Unfortunately, existing GNN-based frameworks still confront 3 severe problems: low representational power, stacking in a flat way, and poor robustness to noise. In this work, we propose a novel multi-level graph neural network (M-GNN) to address the above challenges. We first identify an injective aggregate scheme and design a powerful GNN layer using multi-layer perceptrons (MLPs). Then, we define graph coarsening schemes for various kinds of relations, and stack GNN layers on a series of coarsened graphs, so as to model hierarchical structures. Furthermore, attention mechanisms are adopted so that our approach can make predictions accurately even on the noisy knowledge graph. Results on WN18 and FB15k datasets show that our approach is effective in the standard link prediction task, significantly and consistently outperforming competitive baselines. Furthermore, robustness analysis on FB15k-237 dataset demonstrates that our proposed M-GNN is highly robust to sparsity and noise.

• #5698
Medical Concept Representation Learning from Multi-source Data
Tian Bai, Brian L. Egleston, Richard Bleicher, Slobodan Vucetic
Natural Language Processing 1

Representing words as low dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to medical domain where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on a type of a medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To be able to properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.

• #3757
Graph-based Neural Sentence Ordering
Yongjing Yin, Linfeng Song, Jinsong Su, Jiali Zeng, Chulun Zhou, Jiebo Luo
Natural Language Processing 1

Sentence ordering is to restore the original paragraph from a set of sentences. It involves capturing global dependencies among sentences regardless of their input order. In this paper, we propose a novel and flexible graph-based neural sentence ordering model, which adopts graph recurrent network \citep{Zhang:acl18} to accurately learn semantic representations of the sentences. Instead of assuming connections between all pairs of input sentences, we use entities that are shared among multiple sentences to make more expressive graph representations with less noise. Experimental results show that our proposed model outperforms the existing state-of-the-art systems on several benchmark datasets, demonstrating the effectiveness of our model. We also conduct a thorough analysis on how entities help the performance. Our code is available at https://github.com/DeepLearnXMU/NSEG.git.

• #3096
Incorporating Structural Information for Better Coreference Resolution
Fang Kong, Fu Jian
Natural Language Processing 1

Coreference resolution plays an important role in text understanding. In the literature, various neural approaches have been proposed and achieved considerable success. However, structural information, which has been proven useful in coreference resolution, has been largely ignored in previous neural approaches. In this paper, we focus on effectively incorporating structural information to neural coreference resolution from three aspects. Firstly, nodes in the parse trees are employed as a constraint to filter out impossible text spans (i.e., mention candidates) in reducing the computational complexity. Secondly, contextual information is encoded in the traversal node sequence instead of the word sequence to better capture hierarchical information for text span representation. Lastly, additional structural features (e.g., the path, siblings, degrees, category of the current node) are encoded to enhance the mention representation. Experimentation on the data-set of the CoNLL 2012 Shared Task shows the effectiveness of our proposed approach in incorporating structural information into neural coreference resolution.

• #3033
Adapting BERT for Target-Oriented Multimodal Sentiment Classification
Jianfei Yu, Jing Jiang
Natural Language Processing 1

As an important task in Sentiment Analysis, Target-oriented Sentiment Classification (TSC) aims to identify sentiment polarities over each opinion target in a sentence. However, existing approaches to this task primarily rely on the textual content, but ignoring the other increasingly popular multimodal data sources (e.g., images), which can enhance the robustness of these text-based models. Motivated by this observation and inspired by the recently proposed BERT architecture, we study Target-oriented Multimodal Sentiment Classification (TMSC) and propose a multimodal BERT architecture. To model intra-modality dynamics, we first apply BERT to obtain target-sensitive textual representations. We then borrow the idea from self-attention and design a target attention mechanism to perform target-image matching to derive target-sensitive visual representations. To model inter-modality dynamics, we further propose to stack a set of self-attention layers to capture multimodal interactions. Experimental results show that our model can outperform several highly competitive approaches for TSC and TMSC.

### Tuesday 1310:50 - 12:35ML|KM - Kernel Methods (R10)

Chair: TBD
• #248
Exchangeability and Kernel Invariance in Trained MLPs
Russell Tsuchida, Fred Roosta, Marcus Gallagher
Kernel Methods

In the analysis of machine learning models, it is often convenient to assume that the parameters are IID. This assumption is not satisfied when the parameters are updated through training processes such as Stochastic Gradient Descent. A relaxation of the IID condition is a probabilistic symmetry known as exchangeability. We show the sense in which the weights in MLPs are exchangeable. This yields the result that in certain instances, the layer-wise kernel of fully-connected layers remains approximately constant during training. Our results shed light on such kernel properties throughout training while limiting the use of unrealistic assumptions.

• #1350
Deep Spectral Kernel Learning
Hui Xue, Zheng-Fan Wu, Wei-Xiang Sun
Kernel Methods

Recently, spectral kernels have attracted wide attention in complex dynamic environments. These advanced kernels mainly focus on breaking through the crucial limitation on locality, that is, the stationarity and the monotonicity. But actually, owing to the inefficiency of shallow models in computational elements, they are more likely unable to accurately reveal dynamic and potential variations. In this paper, we propose a novel deep spectral kernel network (DSKN) to naturally integrate non-stationary and non-monotonic spectral kernels into elegant deep architectures in an interpretable way, which can be further generalized to cover most kernels. Concretely, we firstly deal with the general form of spectral kernels by the inverse Fourier transform. Secondly, DSKN is constructed by embedding the preeminent spectral kernels into each layer to boost the efficiency in computational elements, which can effectively reveal the dynamic input-dependent characteristics and potential long-range correlations by compactly representing complex advanced concepts. Thirdly, detailed analyses of DSKN are presented. Owing to its universality, we propose a unified spectral transform technique to flexibly extend and reasonably initialize domain-related DSKN. Furthermore, the representer theorem of DSKN is given. Systematical experiments demonstrate the superiority of DSKN compared to state-of-the-art relevant algorithms on varieties of standard real-world tasks.

• #2312
GCN-LASE: Towards Adequately Incorporating Link Attributes in Graph Convolutional Networks
Ziyao Li, Liang Zhang, Guojie Song
Kernel Methods

Graph Convolutional Networks (GCNs) have proved to be a most powerful architecture in aggregating local neighborhood information for individual graph nodes. Low-rank proximities and node features are successfully leveraged in existing GCNs, however, attributes that graph links may carry are commonly ignored, as almost all of these models simplify graph links into binary or scalar values describing node connectedness. In our paper instead, links are reverted to hypostatic relationships between entities with descriptional attributes. We propose GCN-LASE (GCN with Link Attributes and Sampling Estimation), a novel GCN model taking both node and link attributes as inputs. To adequately captures the interactions between link and node attributes, their tensor product is used as neighbor features, based on which we define several graph kernels and further develop according architectures for LASE. Besides, to accelerate the training process, the sum of features in entire neighborhoods are estimated through Monte Carlo method, with novel  sampling strategies designed for LASE to minimize the estimation variance. Our experiments show that LASE outperforms strong baselines over various graph datasets, and further experiments corroborate the informativeness of link attributes and our model's ability of adequately leveraging them.

• #4091
High Dimensional Bayesian Optimization via Supervised Dimension Reduction
Miao Zhang, Huiqi Li, Steven Su
Kernel Methods

Bayesian optimization (BO) has been broadly applied to computational expensive problems, but it is still challenging to extend BO to high dimensions. Existing works are usually under strict assumption of an additive or a linear embedding structure for objective functions. This paper directly introduces a supervised dimension reduction method, Sliced Inverse Regression (SIR), to high dimensional Bayesian optimization, which could effectively learn the intrinsic sub-structure of objective function during the optimization. Furthermore, a kernel trick is developed to reduce computational complexity and learn nonlinear subset of the unknowing function when applying SIR to extremely high dimensional BO. We present several computational benefits and derive theoretical regret bounds of our algorithm. Extensive experiments on synthetic examples and two real applications demonstrate the superiority of our algorithms for high dimensional Bayesian optimization.

• #4362
Graph Space Embedding
João Pereira, Albert K. Groen, Erik S. G. Stroes, Evgeni Levin
Kernel Methods

We propose the \textit{Graph Space Embedding} (GSE), a technique that maps the input into a space where interactions are implicitly encoded, with little computations required. We provide theoretical results on an optimal regime for the GSE, namely a feasibility region for its parameters, and demonstrate the experimental relevance of our findings. Next, we introduce a strategy to gain insight on which interactions are responsible for the certain predictions, paving the way for a far more transparent model. In an empirical evaluation on a real-world clinical cohort containing patients with suspected coronary artery disease, the GSE achieves far better performance than traditional algorithms.

• #5315
Entangled Kernels
Riikka Huusari, Hachem Kadri
Kernel Methods

We consider the problem of operator-valued kernel learning and investigate the possibility of going beyond the well-known separable kernels. Borrowing tools and concepts from the field of quantum computing, such as partial trace and entanglement, we propose a new view on operator-valued kernels and define a general family of kernels that encompasses previously known operator-valued kernels, including separable and transformable kernels. Within this framework, we introduce another novel class of operator-valued kernels called entangled kernels that are not separable. We propose an efficient two-step algorithm for this framework, where the entangled kernel is learned based on a novel extension of kernel alignment to operator-valued kernels. The utility of the algorithm is illustrated on both artificial and real data.

• #5808
Multi-view Clustering via Late Fusion Alignment Maximization
Siwei Wang, Xinwang Liu, En Zhu, Chang Tang, Jiyuan Liu, Jingtao Hu, Jingyuan Xia, Jianping Yin
Kernel Methods

Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance. Although demonstrating promising performance in many applications, we observe that most of existing methods directly combine multiple views to learn an optimal similarity for clustering. These methods would cause intensive computational complexity and over-complicated optimization. In this paper, we theoretically uncover the connection between existing k-means clustering and the alignment between base partitions and consensus partition. Based on this observation, we propose a simple but effective multi-view algorithm termed {Multi-view Clustering via Late Fusion Alignment Maximization (MVC-LFA)}. In specific, MVC-LFA proposes to maximally align the consensus partition with the weighted base partitions. Such a criterion is beneficial to significantly reduce the computational complexity and simplify the optimization procedure. Furthermore, we design a three-step iterative algorithm to solve the new resultant optimization problem with theoretically guaranteed convergence. Extensive experiments on five multi-view benchmark datasets demonstrate the effectiveness and efficiency of the proposed MVC-LFA.

### Tuesday 1310:50 - 12:35ML|C - Classification 1 (R11)

Chair: TBD
• #755
Learning Topic Models by Neighborhood Aggregation
Ryohei Hisano
Classification 1

Topic models are frequently used in machine learning owing to their high interpretability and modular structure. However, extending a topic model to include a supervisory signal, to incorporate pre-trained word embedding vectors and to include a nonlinear output function is not an easy task because one has to resort to a highly intricate approximate inference procedure. The present paper shows that topic modeling with pre-trained word embedding vectors can be viewed as implementing a neighborhood aggregation algorithm where messages are passed through a network defined over words. From the network view of topic models, nodes correspond to words in a document and edges correspond to either a relationship describing co-occurring words in a document or a relationship describing the same word in the corpus. The network view allows us to extend the model to include supervisory signals, incorporate pre-trained word embedding vectors and include a nonlinear output function in a simple manner. In experiments, we show that our approach outperforms the state-of-the-art supervised Latent Dirichlet Allocation implementation in terms of held-out document classification tasks.

• #2676
Partial Label Learning with Unlabeled Data
Qian-Wei Wang, Yu-Feng Li, Zhi-Hua Zhou
Classification 1

Partial label learning deals with training examples each associated with a set of candidate labels, among which only one label is valid. Previous studies typically assume that the candidate label sets are provided for all training examples. In many real-world applications such as video character classification, however, it is generally difficult to label a large number of instances and there exists much data left to be unlabeled. We call this kind of problem semi-supervised partial label learning. In this paper, we propose the SSPL method to address this problem. Specifically, an iterative label propagation procedure between partial label examples and unlabeled instances is employed to disambiguate the candidate label sets of partial label examples as well as assign valid labels to unlabeled instances. The importance of unlabeled instances increases adaptively as the number of iteration increases, since they carry richer labeling information. Finally, unseen instances are classified based on the minimum reconstruction error on both partial label and unlabeled instances. Experiments on real-world data sets clearly validate the effectiveness of the proposed SSPL method.

• #3265
Zero-shot Learning with Many Classes by High-rank Deep Embedding Networks
Yuchen Guo, Guiguang Ding, Jungong Han, Hang Shao, Xin Lou, Qionghai Dai
Classification 1

Zero-shot learning (ZSL) is a recently emerging research topic which aims to build classification models for unseen classes with knowledge from auxiliary seen classes. Though many ZSL works have shown promising results on small-scale datasets by utilizing a bilinear compatibility function, the ZSL performance on large-scale datasets with many classes (say, ImageNet) is still unsatisfactory. We argue that the bilinear compatibility function is a low-rank approximation of the true compatibility function such that it is not expressive enough especially when there are a large number of classes because of the rank limitation. To address this issue, we propose a novel approach, termed as High-rank Deep Embedding Networks (GREEN), for ZSL with many classes. In particular, we propose a feature-dependent mixture of softmaxes as the image-class compatibility function, which is a simple extension of the bilinear compatibility function, but yields much better results. It utilizes a mixture of non-linear transformations with feature-dependent latent variables to approximate the true function in a high-rank way, which makes GREEN more expressive. Experiments on several datasets including ImageNet demonstrate GREEN significantly outperforms the state-of-the-art approaches.

• #4379
Submodular Batch Selection for Training Deep Neural Networks
K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian
Classification 1

Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today.We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and diversity of the whole subset. We design an efficient, greedy algorithm which can give high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics.

• #6379
Extensible Cross-Modal Hashing
Tian-yi Chen, Lan Zhang, Shi-cong Zhang, Zi-long Li, Bai-chuan Huang
Classification 1

Cross-modal hashing (CMH) models are introduced to significantly reduce the cost of large-scale cross-modal data retrieval systems. In many real-world applications, however, data of new categories arrive continuously, which requires the model has good extensibility. That is the model should be updated to accommodate data of new categories but still retain good performance for the old categories with minimum computation cost. Unfortunately, existing CMH methods fail to satisfy the extensibility requirements. In this work, we propose a novel extensible cross-modal hashing (ECMH) to enable highly efficient and low-cost model extension. Our proposed ECMH has several desired features: 1) it has good forward compatibility, so there is no need to update old hash codes; 2) the ECMH model is extended to support new data categories using only new data by a well-designed weak constraint incremental learning'' algorithm, which saves up to 91\% time cost comparing with retraining the model with both new and old data; 3) the extended model achieves high precision and recall on both old and new tasks. Our extensive experiments show the effectiveness of our design.

• #4735
Semi-supervised User Profiling with Heterogeneous Graph Attention Networks
Weijian Chen, Yulong Gu, Zhaochun Ren, Xiangnan He, Hongtao Xie, Tong Guo, Dawei Yin, Yongdong Zhang
Classification 1

Aiming to represent user characteristics and personal interests, the task of user profiling is playing an increasingly important role for many real-world applications, e.g., e-commerce and social networks platforms. By exploiting the data like texts and user behaviors, most existing solutions address user profiling as a classification task, where each user is formulated as an individual data instance. Nevertheless, a user's profile is not only reflected from her/his affiliated data, but also can be inferred from other users, e.g., the users that have similar co-purchase behaviors in e-commerce, the friends in social networks, etc. In this paper, we approach user profiling in a semi-supervised manner, developing a generic solution based on heterogeneous graph learning. In the graph, nodes represent the entities of interest (e.g., users, items, attributes of items, etc.), and edges represent the interactions between entities. Our heterogeneous graph attention network (HGAT) method learns the representation for each entity by accounting for the graph structure, and exploits the attention mechanism to discriminate the importance of each neighbor entity. Through such a learning scheme, HGAT can leverage both unsupervised information and limited labels of users to build the predictor. Extensive experiments on a real-world e-commerce dataset verify the effectiveness and rationality of our HGAT for user profiling.

• #307
Deterministic Routing between Layout Abstractions for Multi-Scale Classification of Visually Rich Documents
Ritesh Sarkhel, Arnab Nandi
Classification 1

Classifying heterogeneous visually rich documents is a challenging task. Difficulty of this task increases even more if the maximum allowed inference turnaround time is constrained by a threshold. The increased overhead in inference cost, compared to the limited gain in classification capabilities make current multi-scale approaches infeasible in such scenarios. There are two major contributions of this work. First, we propose a spatial pyramid model to extract highly discriminative multi-scale feature descriptors from a visually rich document by leveraging the inherent hierarchy of its layout. Second, we propose a deterministic routing scheme for accelerating end-to-end inference by utilizing the spatial pyramid model. A depth-wise separable multi-column convolutional network is developed to enable our method. We evaluated the proposed approach on four publicly available, benchmark datasets of visually rich documents. Results suggest that our proposed approach demonstrates robust performance compared to the state-of-the-art methods in both classification accuracy and total inference turnaround.

### Tuesday 1310:50 - 12:35ML|DM - Data Mining 1 (R12)

Chair: TBD
• #708
Inferring Substitutable Products with Deep Network Embedding
Shijie Zhang, Hongzhi Yin, Qinyong Wang, Tong Chen, Hongxu Chen, Quoc Viet Hung Nguyen
Data Mining 1

On E-commerce platforms, understanding the relationships (e.g., substitute and complement) among products from user's explicit feedback, such as users' online transactions, is of great importance to boost extra sales. However, the significance of such relationships is usually neglected by existing recommender systems. In this paper, we propose a semisupervised deep embedding model, namely, Substitute Products Embedding Model (SPEM), which models the substitutable relationships between products by preserving the second-order proximity, negative first-order proximity and semantic similarity in a product co-purchasing graph based on user's purchasing behaviours. With SPEM, the learned representations of two substitutable products align closely in the latent embedding space. Extensive experiments on real-world datasets are conducted, and the results verify that our model outperforms state-of-the-art baselines.

• #1663
Low-Bit Quantization for Attributed Network Representation Learning
Hong Yang, Shirui Pan, Ling Chen, Chuan Zhou, Peng Zhang
Data Mining 1

Attributed network embedding plays an important role in transferring network data into compact vectors for effective network analysis. Existing attributed network embedding models are designed either in continuous Euclidean spaces which introduce data redundancy or in binary coding spaces which incur significant loss of representation accuracy. To this end, we present a new Low-Bit Quantization for Attributed Network Representation Learning model (LQANR for short) that can learn compact node representations with low bitwidth values while preserving high representation accuracy. Specifically, we formulate a new representation learning function based on matrix factorization that can jointly learn the low-bit node representations and the layer aggregation weights under the low-bit quantization constraint. Because the new learning function falls into the category of mixed integer optimization, we propose an efficient mixed-integer based alternating direction method of multipliers (ADMM) algorithm as the solution. Experiments on real-world node classification and link prediction tasks validate the promising results of the proposed LQANR model.

• #2991
iDev: Enhancing Social Coding Security by Cross-platform User Identification Between GitHub and Stack Overflow
Yujie Fan, Yiming Zhang, Shifu Hou, Lingwei Chen, Yanfang Ye, Chuan Shi, Liang Zhao, Shouhuai Xu
Data Mining 1

As modern social coding platforms such as GitHub and Stack Overflow become increasingly popular, their potential security risks increase as well (e.g., risky or malicious codes could be easily embedded and distributed). To enhance the social coding security, in this paper, we propose to automate cross-platform user identification between GitHub and Stack Overflow to combat the attackers who attempt to poison the modern software programming ecosystem. To solve this problem, an important insight brought by this work is to leverage social coding properties in addition to user attributes for cross-platform user identification. To depict users in GitHub and Stack Overflow (attached with attributed information), projects, questions and answers as well as the rich semantic relations among them, we first introduce an attributed heterogeneous information network (AHIN) for modeling. Then, we propose a novel AHIN representation learning model AHIN2Vec to efficiently learn node (i.e., user) representations in AHIN for cross-platform user identification. Comprehensive experiments on the data collections from GitHub and Stack Overflow are conducted to validate the effectiveness of our developed system iDev integrating our proposed method in cross-platform user identification by comparisons with other baselines.

• #3985
Outlier-Robust Multi-Aspect Streaming Tensor Completion and Factorization
Mehrnaz Najafi, Lifang He, Philip S. Yu
Data Mining 1

With the increasing popularity of streaming tensor data such as videos and audios, tensor factorization and completion have attracted much attention recently in this area. Existing work usually assume that streaming tensors only grow in one mode. However, in many real-world scenarios, tensors may grow in multiple modes (or dimensions), i.e., multi-aspect streaming tensors. Standard streaming methods cannot directly handle this type of data elegantly. Moreover, due to inevitable system errors, data may be contaminated by outliers, which cause significant deviations from real data values and make such research particularly challenging. In this paper, we propose a novel method for Outlier-Robust Multi-Aspect Streaming Tensor Completion and Factorization (OR-MSTC), which is a technique capable of dealing with missing values and outliers in multi-aspect streaming tensor data. The key idea is to decompose the tensor structure into an underlying low-rank clean tensor and a structured-sparse error (outlier) tensor, along with a weighting tensor to mask missing data. We also develop an efficient algorithm to solve the non-convex and non-smooth optimization problem of OR-MSTC. Experimental results on various real-world datasets show the superiority of the proposed method over the baselines and its robustness against outliers.

• #5069
Convolutional Gaussian Embeddings for Personalized Recommendation with Uncertainty
Junyang Jiang, Deqing Yang, Yanghua Xiao, Chenlu Shen
Data Mining 1

Most of existing embedding based recommendation models use embeddings (vectors) to represent users and items which contain latent features of users and items. Each of such embeddings corresponds to a single fixed point in low-dimensional space, thus fails to precisely represent the users/items with uncertainty which are often observed in recommender systems. Addressing this problem, we propose a unified deep recommendation framework employing Gaussian embeddings, which are proven adaptive to uncertain preferences exhibited by some users, resulting in better user representations and recommendation performance. Furthermore, our framework adopts Monte-Carlo sampling and convolutional neural networks to compute the correlation between the objective user and the candidate item, based on which precise recommendations are achieved. Our extensive experiments on two benchmark datasets not only justify that our proposed Gaussian embeddings capture the uncertainty of users very well, but also demonstrate its superior performance over the state-of-the-art recommendation models.

• #609
Fine-grained Event Categorization with Heterogeneous Graph Convolutional Networks
Hao Peng, Jianxin Li, Qiran Gong, Yangqiu Song, Yuanxin Ning, Kunfeng Lai, Philip S. Yu
Data Mining 1

Events are happening in real-world and real-time, which can be planned and organized occasions involving multiple people and objects. Social media platforms publish a lot of text messages containing public events with comprehensive topics. However, mining social events is challenging due to the heterogeneous event elements in texts and explicit and implicit social network structures. In this paper, we design an event meta-schema to characterize the semantic relatedness of social events and build an event-based heterogeneous information network (HIN) integrating information from external knowledge base, and propose a novel Pairwise Popularity Graph Convolutional Network (PP-GCN) based fine-grained social event categorization model. We propose a Knowledgeable meta-paths Instances based social Event Similarity (KIES) between events and build a weighted adjacent matrix as input to the PP-GCN model. Comprehensive experiments on real data collections are conducted to compare various social event detection and clustering tasks. Experimental results demonstrate that our proposed framework outperforms other alternative social event categorization techniques.

• #1716
Topology Optimization based Graph Convolutional Network
Liang Yang, Zesheng Kang, Xiaochun Cao, Di Jin, Bo Yang, Yuanfang Guo
Data Mining 1

In the past few years, semi-supervised node classification in attributed network has been developed rapidly. Inspired by the success of deep learning, researchers adopt the convolutional neural network to develop the Graph Convolutional Networks (GCN), and they have achieved surprising classification accuracy by considering the topological information and employing the fully connected network (FCN). However, the given network topology may also induce a performance degradation if it is directly employed in classification, because it may possess high sparsity and certain noises. Besides, the lack of learnable filters in GCN also limits the performance. In this paper, we propose a novel Topology Optimization based Graph Convolutional Networks (TO-GCN) to fully utilize the potential information by jointly refining the network topology and learning the parameters of the FCN. According to our derivations, TO-GCN is more flexible than GCN, in which the filters are fixed and only the classifier can be updated during the learning process. Extensive experiments on real attributed networks demonstrate the superiority of the proposed TO-GCN against the state-of-the-art approaches.

### Tuesday 1310:50 - 12:35AMS|MP - Multi-agent Planning (R13)

Chair: TBD
• #344
Multi-Agent Pathfinding with Continuous Time
Anton Andreychuk, Konstantin Yakovlev, Dor Atzmon, Roni Stern
Multi-agent Planning

Multi-Agent Pathfinding (MAPF) is the problem of finding paths for multiple agents such that every agent reaches its goal and the agents do not collide. Most prior work on MAPF were on grids, assumed agents' actions have uniform duration, and that time is discretized into timesteps. In this work, we propose a MAPF algorithm that do not assume any of these assumptions, is complete, and provides provably optimal solutions. This algorithm is based on a novel combination of Safe Interval Path Planning (SIPP), a continuous time single agent planning algorithms, and Conflict-Based Search (CBS). We analyze this algorithm, discuss its pros and cons, and evaluate it experimentally on several standard benchmarks.

• #2201
Priority Inheritance with Backtracking for Iterative Multi-agent Path Finding
Keisuke Okumura, Manao Machida, Xavier Défago, Yasumasa Tamura
Multi-agent Planning

The Multi-agent Path Finding (MAPF) problem consists in all agents having to move to their own destinations while avoiding collisions. In practical applications to the problem, such as for navigation in an automated warehouse, MAPF must be solved iteratively. We present here a novel approach to iterative MAPF, that we call Priority Inheritance with Backtracking (PIBT). PIBT gives a unique priority to each agent every timestep, so that all movements are prioritized. Priority inheritance, which aims at dealing effectively with priority inversion in path adjustment within a small time window, can be applied iteratively and a backtracking protocol prevents agents from being stuck. We prove that, regardless of their number, all agents are guaranteed to reach their destination within finite time, when the environment is a graph such that all pairs of adjacent nodes belong to a simple cycle of length 3 or more (e.g., biconnected). Our implementation of PIBT can be fully decentralized without global communication. Experimental results over various scenarios confirm that PIBT is adequate both for finding paths in large environments with many agents, as well as for conveying packages in an automated warehouse.

• #2971
Improved Heuristics for Multi-Agent Path Finding with Conflict-Based Search
Jiaoyang Li, Ariel Felner, Eli Boyarski, Hang Ma, Sven Koenig
Multi-agent Planning

Conflict-Based Search (CBS) and its enhancements are among the strongest algorithms for Multi-Agent Path Finding. Recent work introduced an admissible heuristic to guide the high-level search of CBS. In this work, we prove the limitation of this heuristic, as it is based on cardinal conflicts only. We then introduce two new admissible heuristics by reasoning about the pairwise dependencies between agents. Empirically, CBS with either new heuristic significantly improves the success rate over CBS with the recent heuristic and reduces the number of expanded nodes and runtime by up to a factor of 50.

• #5401
Multi-Robot Planning Under Uncertain Travel Times and Safety Constraints
Masoumeh Mansouri, Bruno Lacerda, Nick Hawes, Federico Pecora
Multi-agent Planning

We present a novel modelling and planning approach for multi-robot systems under uncertain travel times. The approach uses generalised stochastic Petri nets (GSPNs) to model desired team behaviour, and allows to specify safety constraints and rewards. The GSPN is interpreted as a Markov decision process (MDP) for which we can generate policies that optimise the requirements. This representation is more compact than the equivalent multi-agent MDP, allowing us to scale better. Furthermore, it naturally allows for asynchronous execution of the generated policies across the robots, yielding smoother team behaviour. We also describe how the integration of the GSPN with a lower-level team controller allows for accurate expectations on team performance. We evaluate our approach on an industrial scenario, showing that it outperforms hand-crafted policies used in current practice.

• #10985
(Journal track) Implicitly Coordinated Multi-Agent Path Finding under Destination Uncertainty: Success Guarantees and Computational Complexity
Bernhard Nebel, Thomas Bolander, Thorsten Engesser, Robert Mattmüller
Multi-agent Planning

In multi-agent path finding, it is usually assumed that planning is performed centrally and that the destinations of the agents are common knowledge. We will drop both assumptions and analyze under which conditions it can be guaranteed that the agents reach their respective destinations using implicitly coordinated plans without communication.

• #2916
Unifying Search-based and Compilation-based Approaches to Multi-agent Path Finding through Satisfiability Modulo Theories
Pavel Surynek
Multi-agent Planning

We unify search-based and compilation-based approaches to multi-agent path finding (MAPF) through satisfiability modulo theories (SMT). The task in MAPF is to navigate agents in an undirected graph to given goal vertices so that they do not collide. We rephrase Conflict-Based Search (CBS), one of the state-of-the-art algorithms for optimal MAPF solving, in the terms of SMT. This idea combines SAT-based solving known from MDD-SAT, a SAT-based optimal MAPF solver, at the low-level with conflict elimination of CBS at the high-level. Where the standard CBS branches the search after a conflict, we refine the propositional model with a disjunctive constraint. Our novel algorithm called SMT-CBS hence does not branch at the high-level but incrementally extends the propositional model. We experimentally compare SMT-CBS with CBS, ICBS, and MDD-SAT.

• #2472
Reachability Games in Dynamic Epistemic Logic
Bastien Maubert, Sophie Pinchinat, François Schwarzentruber
Multi-agent Planning

We define reachability games based on Dynamic Epistemic Logic (DEL), where the players? actions are finely described as DEL action models. We first consider the setting where a controller with perfect information interacts with an environment and aims at reaching some desired state of knowledge regarding the observers of the system. We study the problem of existence of a strategy for the controller, which generalises the classic epistemic planning problem, and we solve it for several types of actions such as public announcements and public actions. We then consider a yet richer setting where observers themselves are players, whose strategies must be based on their observations. We establish several decidability and undecidability results for the problem of existence of a distributed strategy, depending on the type of actions the players can use, and relate them to results from the literature on multiplayer games with imperfect information.

### Tuesday 1310:50 - 12:35DemoT1 - Demo Talks 1 (R14)

Chair: TBA
• #11022
Fair and Explainable Dynamic Engagement of Crowd Workers
Han Yu, Yang Liu, Xiguang Wei, Chuyu Zheng, Tianjian Chen, Qiang Yang, Xiong Peng
Demo Talks 1

Years of rural-urban migration has resulted in a significant population in China seeking ad-hoc work in large urban centres. At the same time, many businesses face large fluctuations in demand for manpower and require more efficient ways to satisfy such demands. This paper outlines AlgoCrowd, an artificial intelligence (AI)-empowered algorithmic crowdsourcing platform. Equipped with an efficient explainable task-worker matching optimization approach designed to focus on fair treatment of workers while maximizing collective utility, the platform provides explicable task recommendations to workers' personal work management mobile apps which are becoming popular, with the aim to address the above societal challenge.

• #11024
Multi-Agent Visualization for Explaining Federated Learning
Xiguang Wei, Quan Li, Yang Liu, Han Yu, Tianjian Chen, Qiang Yang
Demo Talks 1

As an alternative decentralized training approach, Federated Learning enables distributed agents to collaboratively learn a machine learning model while keeping personal/private information on local devices. However, one significant issue of this framework is the lack of transparency, thus obscuring understanding of the working mechanism of Federated Learning systems. This paper proposes a multi-agent visualization system that illustrates what is Federated Learning and how it supports multi-agents coordination. To be specific, it allows users to participate in the Federated Learning empowered multi-agent coordination. The input and output of Federated Learning are visualized simultaneously, which provides an intuitive explanation of Federated Learning for users in order to help them gain deeper understanding of the technology.

• #11028
AiD-EM: Adaptive Decision Support for Electricity Markets Negotiations
Tiago Pinto, Zita Vale
Demo Talks 1

This paper presents the Adaptive Decision Support for Electricity Markets Negotiations (AiD-EM) system. AiD-EM is a multi-agent system that provides decision support to market players by incorporating multiple sub-(agent-based) systems, directed to the decision support of specific problems. These sub-systems make use of different artificial intelligence methodologies, such as machine learning and evolutionary computing, to enable players adaptation in the planning phase and in actual negotiations in auction-based markets and bilateral negotiations. AiD-EM demonstration is enabled by its connection to MASCEM (Multi-Agent Simulator of Competitive Electricity Markets).

• #11032
Embodied Conversational AI Agents in a Multi-modal Multi-agent Competitive Dialogue
Rahul Divekar, Xiangyang Mou, Lisha Chen, Maíra Gatti de Bayser, Melina Alberio Guerra, Hui Su
Demo Talks 1

In a setting where two AI agents embodied as animated humanoid avatars are engaged in a conversation with one human and each other, we see two challenges. One, determination by the AI agents about which one of them is being addressed. Two, determination by the AI agents if they may/could/should speak at the end of a turn. In this work we bring these two challenges together and explore the participation of AI agents in multi-party conversations. Particularly, we show two embodied AI shopkeeper agents who sell similar items aiming to get the business of a user by competing with each other on the price. In this scenario, we solve the first challenge by using headpose (estimated by deep learning techniques) to determine who the user is talking to. For the second challenge we use deontic logic to model rules of a negotiation conversation.

• #11043
Multi-Agent Path Finding on Ozobots
Roman Barták, Ivan Krasičenko, Jiří Švancara
Demo Talks 1

Multi-agent path finding (MAPF) is the problem to find collision-free paths for a set of agents (mobile robots) moving on a graph. There exists several abstract models describing the problem with various types of constraints. The demo presents software to evaluate the abstract models when the plans are executed on Ozobots, small mobile robots developed for teaching programming. The software allows users to design the grid-like maps, to specify initial and goal locations of robots, to generate plans using various abstract models implemented in the Picat programming language, to simulate and to visualise execution of these plans, and to translate the plans to command sequences for Ozobots.

• #11050
Reagent: Converting Ordinary Webpages into Interactive Software Agents
Matthew Peveler, Jeffrey Kephart, Hui Su
Demo Talks 1

Reagent is a technology that readily converts ordinary webpages containing structured content into software agents, allowing for natural interaction through a combination of speech and pointing. Without needing special instrumentation on the part of website operators, Reagent does this through injection of a transparent layer that automatically captures semantic details and mouse interactions. It can then combine this content and interactions with text transcriptions of user speech to derive and execute parameterized commands representing human intent. For content that the system cannot automatically understand, it gives an easy interface for a human user to assist it in helping to build a mapping between the content and domain specific vocabulary.

• #11029
Deep Reinforcement Learning for Ride-sharing Dispatching and Repositioning
Zhiwei (Tony) Qin, Xiaocheng Tang, Yan Jiao, Fan Zhang, Chenxi Wang, Qun (Tracy) Li
Demo Talks 1

In this demo, we will present a simulation-based human-computer interaction of deep reinforcement learning in action on order dispatching and driver repositioning for ride-sharing.  Specifically, we will demonstrate through several specially designed domains how we use deep reinforcement learning to train agents (drivers) to have longer optimization horizon and to cooperate to achieve higher objective values collectively.

• #11041
Contextual Typeahead Sticker Suggestions on Hike Messenger
Mohamed Hanoosh, Abhishek Laddha, Debdoot Mukherjee
Demo Talks 1

In this demonstration, we present Hike's sticker recommendation system, which helps users choose the right sticker to substitute the next message that they intend to send in a chat. We describe how the system addresses the issue of numerous orthographic variations for chat messages and operates under 20 milliseconds with low CPU and memory footprint on device.

• #11023
InterSpot: Interactive Spammer Detection in Social Media
Kaize Ding, Jundong Li, Shivam Dhar, Shreyash Devan, Huan Liu
Demo Talks 1

Spammer detection in social media has recently received increasing attention due to the rocketing growth of user-generated data. Despite the empirical success of existing systems, spammers may continuously evolve over time to impersonate normal users while new types of spammers may also emerge to combat with the current detection system, leading to the fact that a built system will gradually lose its efficacy in spotting spammers. To address this issue, grounded on the contextual bandit model, we present a novel system for conducting interactive spammer detection. We demonstrate our system by showcasing the interactive learning process, which allows the detection model to keep optimizing its detection strategy through incorporating the feedback information from human experts.

### Tuesday 1310:50 - 12:35Early Career 1 - Early Career Spotlight 1 (R15)

Chair: TBD
• #11058
From Data to Knowledge Engineering for Cybersecurity
Gerardo I. Simari
Early Career Spotlight 1

Data present in a wide array of platforms that are part of today's information systems lies at the foundation of many decision making processes, as we have now come to depend on social media, videos, news, forums, chats, ads, maps, and many other data sources for our daily lives. In this article, we first discuss how such data sources are involved in threats to systems' integrity, and then how they can be leveraged along with knowledge-based tools to tackle a set of challenges in the cybersecurity domain. Finally, we present a brief discussion of our roadmap for research and development in the near future to address the set of ever-evolving cyber threats that our systems face every day.

• #11060
The Quest For "Always-On" Autonomous Mobile Robots
Joydeep Biswas
Early Career Spotlight 1

Building always-on'' robots to be deployed over extended periods of time in real human environments is challenging for several reasons. Some fundamental questions that arise in the process include: 1) How can the robot reconcile unexpected differences between its observations and its outdated map of the world? 2) How can we scalably test robots for long-term autonomy? 3) Can a robot learn to predict its own failures, and their corresponding causes? 4) When the robot fails and is unable to recover autonomously, can it utilize partially specified, approximate human corrections to overcome its failures? We summarize our research towards addressing all of these questions. We present 1) Episodic non-Markov Localization to maintain the belief of the robot's location while explicitly reasoning about unmapped observations; 2) a 1,000km challenge to test for long-term autonomy; 3) feature-based and learning-based approaches to predicting failures; and 4) human-in-the-loop SLAM to overcome robot mapping errors, and SMT-based robot transition repair to overcome state machine failures.

• #11055
Multiagent Decision Making and Learning in Urban Environments
Akshat Kumar
Early Career Spotlight 1

Our increasingly interconnected urban environments provide several opportunities to deploy intelligent agents---from self-driving cars, ships to aerial drones---that promise to radically improve productivity and safety. Achieving coordination among agents in such urban settings presents several algorithmic challenges---ability to scale to thousands of agents, addressing uncertainty, and partial observability in the environment. In addition, accurate domain models need to be learned from data that is often noisy and available only at an aggregate level. In this paper, I will overview some of our recent contributions towards developing planning and reinforcement learning strategies to address several such challenges present in large-scale urban multiagent systems.

• #11056
What does the evidence say? Models to help make sense of the biomedical literature
Byron Wallace
Early Career Spotlight 1

How do we know if a particular medical intervention actually works better than the alternatives for a given condition and outcome? Ideally one would consult all available evidence from relevant trials that have been conducted to answer this question. Unfortunately, however, such results are primarily disseminated in natural language articles that describe the conduct and results of clinical trials. This imposes substantial burden on physicians and other domain experts trying to make sense of the evidence. In this talk I will discuss work on designing tasks, corpora, and models that aim to realize natural language technologies that can extract key attributes of clinical trials from articles describing them, and infer the reported findings regarding these. The hope is to use such methods to help domain experts (such as physicians) access and make sense of unstructured biomedical evidence. More specifically, I will discuss models to automatically extract the trial population characteristics (e.g., conditions), interventions/comparators (treatments), and outcomes studied in a given clinical trial; together these "PICO" elements compose well-formed clinical questions. I will also introduce models for automatically assessing the reliability of the evidence reported in a particular article, namely by predicting the statistical "risks of bias" of the underlying trial (e.g., due to poor design) described by a report, and extracting snippets (rationales') supporting these. Finally, I will present ongoing work on approaches to inferring the comparative effectiveness of a given treatment, as compared to a specified comparator, and with respect to a particular outcome of interest.

### Tuesday 1314:00 - 14:50Invited Talk (R1)

• Reasoning About The Behavior of AI Systems
Adnan Darwiche
Invited Talk
• ### Tuesday 1315:00 - 16:00ST: Human AI & ML 1 - Special Track on Human AI and Machine Learning 1 (R2)

Chair: TBD
• #1462
Playgol: Learning Programs Through Play
Andrew Cropper
Special Track on Human AI and Machine Learning 1

Children learn though play. We introduce the analogous idea of learning programs through play. In this approach, a program induction system (the learner) is given a set of user-supplied build tasks and initial background knowledge (BK). Before solving the build tasks, the learner enters an unsupervised playing stage where it creates its own play tasks to solve, tries to solve them, and saves any solutions (programs) to the BK. After the playing stage is finished, the learner enters the supervised building stage where it tries to solve the build tasks and can reuse solutions learnt whilst playing. The idea is that playing allows the learner to discover reusable general programs on its own which can then help solve the build tasks. We claim that playing can improve learning performance. We show that playing can reduce the textual complexity of target concepts which in turn reduces the sample complexity of a learner. We implement our idea in Playgol, a new inductive logic programming system. We experimentally test our claim on two domains: robot planning and real-world string transformations. Our experimental results suggest that playing can substantially improve learning performance.

• #1545
EL Embeddings: Geometric Construction of Models for the Description Logic EL++
Maxat Kulmanov, Wang Liu-Wei, Yuan Yan, Robert Hoehndorf
Special Track on Human AI and Machine Learning 1

An embedding is a function that maps entities from one algebraic structure into another while preserving certain characteristics. Embeddings are being used successfully for mapping relational data or text into vector spaces where they can be used for machine learning, similarity search, or similar tasks. We address the problem of finding vector space embeddings for theories in the Description Logic 𝓔𝓛⁺⁺ that are also models of the TBox. To find such embeddings, we define an optimization problem that characterizes the model-theoretic semantics of the operators in 𝓔𝓛⁺⁺ within ℝⁿ, thereby solving the problem of finding an interpretation function for an 𝓔𝓛⁺⁺ theory given a particular domain Δ. Our approach is mainly relevant to large 𝓔𝓛⁺⁺ theories and knowledge bases such as the ontologies and knowledge graphs used in the life sciences. We demonstrate that our method can be used for improved prediction of protein--protein interactions when compared to semantic similarity measures or knowledge graph embeddings.

• #5010
A comparative study of distributional and symbolic paradigms for relational learning
Sebastijan Dumancic, Alberto Garcia-Duran, Mathias Niepert
Special Track on Human AI and Machine Learning 1

Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs has traditionally been addressed by symbolic approaches such as Statistical relational learning, recent methods in (deep) representation learning have shown promising results for specialised tasks such as knowledge base completion. These approaches, also known as distributional, abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities and little is known about their respective strengths and weaknesses. In this work, we compare distributional and symbolic relational learning approaches on various standard relational classification and knowledge base completion tasks. Furthermore, we analyse the properties of the datasets and relate them to the performance of the methods in the comparison. The results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs.

• #5666
Synthesizing Datalog Programs using Numerical Relaxation
Xujie Si, Mukund Raghothaman, Kihong Heo, Mayur Naik
Special Track on Human AI and Machine Learning 1

The problem of learning logical rules from examples arises in diverse fields, including program synthesis, logic programming, and machine learning. Existing approaches either involve solving computationally difficult combinatorial problems, or performing parameter estimation in complex statistical models. In this paper, we present Difflog, a technique to extend the logic programming language Datalog to the continuous setting. By attaching real-valued weights to individual rules of a Datalog program, we naturally associate numerical values with individual conclusions of the program. Analogous to the strategy of numerical relaxation in optimization problems, we can now first determine the rule weights which cause the best agreement between the training labels and the induced values of output tuples, and subsequently recover the classical discrete-valued target program from the continuous optimum. We evaluate Difflog on a suite of 34~benchmark problems from recent literature in knowledge discovery, formal verification, and database query-by-example, and demonstrate significant improvements in learning complex programs with recursive rules, invented predicates, and relations of arbitrary arity.

### Tuesday 1315:00 - 16:00ML|DL - Deep Learning 2 (R3)

Chair: TBD
• #2033
Position Focused Attention Network for Image-Text Matching
Yaxiong Wang, Hao Yang, Xueming Qian, Lin Ma, Jing Lu, Biao Li, Xin Fan
Deep Learning 2

Image-text matching tasks have recently attracted a lot of attention in the computer vision field. The key point of this cross-domain problem is how to accurately measure the similarity between the visual and the textual contents, which demands a fine understanding of both modalities. In this paper, we propose a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views. In this work, we integrate the object position clue to enhance the visual-text joint-embedding learning. We first split the images into blocks, by which we infer the relative position of region in the image. Then, an attention mechanism is proposed to model the relations between the image region and blocks and generate the valuable position feature, which will be further utilized to enhance the region expression and model a more reliable relationship between the visual image and the textual sentence. Experiments on the popular datasets Flickr30K and MS-COCO show the effectiveness of the proposed method. Besides the public datasets, we also conduct experiments on our collected practical news dataset (Tencent-News) to validate the practical application value of proposed method. As far as we know, this is the first attempt to test the performance on the practical application. Our method can achieve the state-of-art performance on all of these three datasets.

• #3819
Hierarchical Representation Learning for Bipartite Graphs
Chong Li, Kunyang Jia, Dan Shen, C.J. Richard Shi, Hongxia Yang
Deep Learning 2

Recommender systems on E-Commerce platforms track users' online behaviors and recommend relevant items according to each user’s interests and needs. Bipartite graphs that capture both user/item feature and use-item interactions have been demonstrated to be highly effective for this purpose. Recently, graph neural network (GNN) has been successfully applied in representation of bipartite graphs in industrial recommender systems. Providing individualized recommendation on a dynamic platform with billions of users is extremely challenging. A key observation is that the users of an online E-Commerce platform can be naturally clustered into a set of communities. We propose to cluster the users into a set of communities and make recommendations based on the information of the users in the community collectively. More specifically, embeddings are assigned to the communities and the user embedding is decomposed into two parts, each of which captures the community-level generalizations and individualized preferences respectively. The community embedding can be considered as an enhancement to the GNN methods that are inherently flat and do not learn hierarchical representations of graphs. The performance of the proposed algorithm is demonstrated on a public dataset and a world-leading E-Commerce company dataset.

• #4010
COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning
Wenxiao Wang, Cong Fu, Jishun Guo, Deng Cai, Xiaofei He
Deep Learning 2

Neural network compression empowers the effective yet unwieldy deep convolutional neural networks (CNN) to be deployed in resource-constrained scenarios. Most state-of-the-art approaches prune the model in filter-level according to the "importance" of filters. Despite their success, we notice they suffer from at least two of the following problems: 1) The redundancy among filters is not considered because the importance is evaluated independently. 2) Cross-layer filter comparison is unachievable since the importance is defined locally within each layer. Consequently, we must manually specify layer-wise pruning ratios. 3) They are prone to generate sub-optimal solutions because they neglect the inequality between reducing parameters and reducing computational cost. Reducing the same number of parameters in different positions in the network may reduce different computational cost. To address the above problems, we develop a novel algorithm named as COP (correlation-based pruning), which can detect the redundant filters efficiently. We enable the cross-layer filter comparison through global normalization. We add parameter-quantity and computational-cost regularization terms to the importance, which enables the users to customize the compression according to their preference (smaller or faster). Extensive experiments have shown COP outperforms the others significantly. The code is released at https://github.com/ZJULearning/COP.

### Tuesday 1315:00 - 16:00MTA|RS - Recommender Systems 2 (R4)

Chair: TBD
• #3499
Explainable Fashion Recommendation: A Semantic Attribute Region Guided Approach
Min Hou, Le Wu, Enhong Chen, Zhi Li, Vincent W. Zheng, Qi Liu
Recommender Systems 2

In fashion recommender systems, each product usually consists of multiple semantic attributes (e.g., sleeves, collar, etc). When making cloth decisions, people usually show preferences for different semantic attributes (e.g., the clothes with v-neck collar). Nevertheless, most previous fashion recommendation models comprehend the clothing images with a global content representation and lack detailed understanding of users' semantic preferences, which usually leads to inferior recommendation performance. To bridge this gap, we propose a novel Semantic Attribute Explainable Recommender System (SAERS). Specifically, we first introduce a fine-grained interpretable semantic space. We then develop a Semantic Extraction Network (SEN) and Fine-grained Preferences Attention (FPA) module to project users and items into this space, respectively. With SAERS, we are capable of not only providing cloth recommendations for users, but also explaining the reason why we recommend the cloth through intuitive visual attribute semantic highlights in a personalized manner. Extensive experiments conducted on real-world datasets clearly demonstrate the effectiveness of our approach compared with the state-of-the-art methods.

• #10969
(Sister Conferences Best Papers Track) Impact of Consuming Suggested Items on the Assessment of Recommendations in User Studies on Recommender Systems
Benedikt Loepp, Tim Donkers, Timm Kleemann, Jürgen Ziegler
Recommender Systems 2

User studies are increasingly considered important in research on recommender systems. Although participants typically cannot consume any of the recommended items, they are often asked to assess the quality of recommendations and of other aspects related to user experience by means of questionnaires. Not being able to listen to recommended songs or to watch suggested movies, might however limit the validity of the obtained results. Consequently, we have investigated the effect of consuming suggested items. In two user studies conducted in different domains, we showed that consumption may lead to differences in the assessment of recommendations and in questionnaire answers. Apparently, adequately measuring user experience is in some cases not possible without allowing users to consume items. On the other hand, participants sometimes seem to approximate the actual value of recommendations reasonably well depending on domain and provided information.

• #5041
Binarized Collaborative Filtering with Distilling Graph Convolutional Network
Haoyu Wang, Defu Lian, Yong Ge
Recommender Systems 2

The efficiency of top-K item recommendation based on implicit feedback are vital to recommender systems in real world, but it is very challenging due to the lack of negative samples and the large number of candidate items. To address the challenges, we firstly introduce an improved Graph Convolutional Network~(GCN) model with high-order feature interaction considered. Then we distill the ranking information derived from GCN into binarized collaborative filtering, which makes use of binary representation to improve the efficiency of online recommendation. However, binary codes are not only hard to be optimized but also likely to incur the loss of information during the training processing. Therefore, we propose a novel framework to convert the binary constrained optimization problem into an equivalent continuous optimization problem with a stochastic penalty. The binarized collaborative filtering model is then easily optimized by many popular solvers like SGD and Adam. The proposed algorithm is finally evaluated on three real-world datasets and shown the superiority to the competing baselines.

• #3878
Co-Attentive Multi-Task Learning for Explainable Recommendation
Zhongxia Chen, Xiting Wang, Xing Xie, Tong Wu, Guoqing Bu, Yining Wang, Enhong Chen
Recommender Systems 2

Despite widespread adoption, recommender systems remain mostly black boxes. Recently, providing explanations about why items are recommended has attracted increasing attention due to its capability to enhance user trust and satisfaction. In this paper, we propose a co-attentive multi-task learning model for explainable recommendation. Our model improves both prediction accuracy and explainability of recommendation by fully exploiting the correlations between the recommendation task and the explanation task. In particular, we design an encoder-selector-decoder architecture inspired by human's information-processing model in cognitive psychology. We also propose a hierarchical co-attentive selector to effectively model the cross knowledge transferred for both tasks. Our model not only enhances prediction accuracy of the recommendation task, but also generates linguistic explanations that are fluent, useful, and highly personalized. Experiments on three public datasets demonstrate the effectiveness of our model.

### Tuesday 1315:00 - 16:00HAI|HCC - Human Computation and Crowdsourcing (R5)

Chair: TBD
• #630
MiSC: Mixed Strategies Crowdsourcing
Ching Yun Ko, Rui Lin, Shu Li, Ngai Wong
Human Computation and Crowdsourcing

Popular crowdsourcing techniques mostly focus on evaluating workers' labeling quality before adjusting their weights during label aggregation. Recently, another cohort of models regard crowdsourced annotations as incomplete tensors and recover unfilled labels by tensor completion. However, mixed strategies of the two methodologies have never been comprehensively investigated, leaving them as rather independent approaches. In this work, we propose MiSC ( Mixed Strategies Crowdsourcing), a versatile framework integrating arbitrary conventional crowdsourcing and tensor completion techniques. In particular, we propose a novel iterative Tucker label aggregation algorithm that outperforms state-of-the-art methods in extensive experiments.

• #1477
Multiple Noisy Label Distribution Propagation for Crowdsourcing
Hao Zhang, Liangxiao Jiang, Wenqiang Xu
Human Computation and Crowdsourcing

Crowdsourcing services provide a fast, efficient, and cost-effective means of obtaining large labeled data for supervised learning. Ground truth inference, also called label integration, designs proper aggregation strategies to infer the unknown true label of each instance from the multiple noisy label set provided by ordinary crowd workers. However, to the best of our knowledge, nearly all existing label integration methods focus solely on the multiple noisy label set itself of the individual instance while totally ignoring the intercorrelation among multiple noisy label sets of different instances. To solve this problem, a multiple noisy label distribution propagation (MNLDP) method is proposed in this study. MNLDP first transforms the multiple noisy label set of each instance into its multiple noisy label distribution and then propagates its multiple noisy label distribution to its nearest neighbors. Consequently, each instance absorbs a fraction of the multiple noisy label distributions from its nearest neighbors and yet simultaneously maintains a fraction of its own original multiple noisy label distribution. Promising experimental results on simulated and real-world datasets validate the effectiveness of our proposed method.

• #10968
(Sister Conferences Best Papers Track) Quality Control Attack Schemes in Crowdsourcing
Alessandro Checco, Jo Bates, Gianluca Demartini
Human Computation and Crowdsourcing

An important precondition to build effective AI models is the collection of training data at scale. Crowdsourcing is a popular methodology to achieve this goal. Its adoption  introduces novel challenges in data quality control, to deal with under-performing and malicious annotators. One of the most popular quality assurance mechanisms, especially in paid micro-task crowdsourcing, is the use of a small set of pre-annotated tasks as gold standard, to assess in real time the annotators quality. In this paper, we highlight a set of vulnerabilities this scheme suffers: a group of colluding crowd workers can easily implement and deploy a decentralised machine learning inferential system to  detect and signal which parts of the task are more likely to be gold questions, making them ineffective as a quality control tool. Moreover, we demonstrate how the most common countermeasures against this attack are ineffective in practical scenarios. The basic architecture of the inferential system is composed of a browser plug-in and an external server where the colluding workers can share information. We implement and validate the attack scheme, by means of experiments on real-world data from a popular crowdsourcing platform.

• #4194
Boosting for Comparison-Based Learning
Michael Perrot, Ulrike von Luxburg
Human Computation and Crowdsourcing

We consider the problem of classification in a comparison-based setting: given a set of objects, we only have access to triplet comparisons of the form object A is closer to object B than to object C.'' In this paper we introduce TripletBoost, a new method that can learn a classifier just from such triplet comparisons. The main idea is to aggregate the triplets information into weak classifiers, which can subsequently be boosted to a strong classifier. Our method has two main advantages: (i) it is applicable to data from any metric space, and (ii) it can deal with large scale problems using only passively obtained and noisy triplets. We derive theoretical generalization guarantees and a lower bound on the number of necessary triplets, and we empirically show that our method is both competitive with state of the art approaches and resistant to noise.

### Tuesday 1315:00 - 16:00AMS|TR - Trust and Reputation (R6)

Chair: TBD
• #486
A Value-based Trust Assessment Model for Multi-agent Systems
Kinzang Chhogyal, Abhaya Nayak, Aditya Ghose, Hoa K. Dam
Trust and Reputation

An agent's assessment of its trust in another agent is commonly taken to be a measure of the reliability/predictability of the latter's actions. It is based on the trustor's past observations of the behaviour of the trustee and requires no knowledge of the inner-workings of the trustee. However, in situations that are new or unfamiliar, past observations are of little help in assessing trust. In such cases, knowledge about the trustee can help. A particular type of knowledge is that of values - things that are important to the trustor and the trustee. In this paper, based on the premise that the more values two agents share, the more they should trust one another, we propose a simple approach to trust assessment between agents based on values, taking into account if agents trust cautiously or boldly, and if they depend on others in carrying out a task.

• #4579
Spotting Collective Behaviour of Online Frauds in Customer Reviews
Sarthika Dhawan, Siva Charan Reddy Gangireddy, Shiv Kumar, Tanmoy Chakraborty
Trust and Reputation

Online reviews play a crucial role in deciding the quality before purchasing any product. Unfortunately, spammers often take advantage of online review forums by writing fraud reviews to promote/demote certain products. It may turn out to be more detrimental when such spammers collude and collectively inject spam reviews as they can take complete control of users' sentiment due to the volume of fraud reviews they inject. Group spam detection is thus more challenging than individual-level fraud detection due to unclear definition of a group, variation of inter-group dynamics, scarcity of labeled group-level spam data, etc. Here, we propose DeFrauder, an unsupervised method to detect online fraud reviewer groups. It first detects candidate fraud groups by leveraging the underlying product review graph and incorporating several behavioral signals which model multi-faceted collaboration among reviewers. It then maps reviewers into an embedding space and assigns a spam score to each group such that groups comprising spammers with highly similar behavioral traits achieve high spam score. While comparing with five baselines on four real-world datasets (two of them were curated by us), DeFrauder shows superior performance by outperforming the best baseline with 17.64% higher NDCG@50 (on average) across datasets.

• #5274
FaRM: Fair Reward Mechanism for Information Aggregation in Spontaneous Localized Settings
Moin Hussain Moti, Dimitris Chatzopoulos, Pan Hui, Sujit Gujar
Trust and Reputation

Although peer prediction markets are widely used in crowdsourcing to aggregate information from agents, they often fail to reward the participating agents equitably. Honest agents can be wrongly penalized if randomly paired with dishonest ones. In this work, we introduce selective and cumulative fairness. We characterize a mechanism as fair if it satisfies both notions and present FaRM, a representative mechanism we designed. FaRM is a Nash incentive mechanism that focuses on information aggregation for spontaneous local activities which are accessible to a limited number of agents without assuming any prior knowledge of the event. All the agents in the vicinity observe the same information. FaRM uses (i) a report strength score to remove the risk of random pairing with dishonest reporters, (ii) a consistency score to measure an agent's history of accurate reports and distinguish valuable reports, (iii) a reliability score to estimate the probability of an agent to collude with nearby agents and prevents agents from getting swayed, and (iv) a location robustness score to filter agents who try to participate without being present in the considered setting. Together, report strength, consistency, and reliability represent a fair reward given to agents based on their reports.

• #5745
Identifying vulnerabilities in trust and reputation systems
Taha D. Güneş, Long Tran-Thanh, Timothy J. Norman
Trust and Reputation

Online communities use trust and reputation systems to assist their users in evaluating other parties. Due to the preponderance of these systems, malicious entities have a strong incentive to attempt to influence them, and strategies employed are increasingly sophisticated. Current practice is to evaluate trust and reputation systems against known attacks, and hence are heavily reliant on expert analysts. We present a novel method for automatically identifying vulnerabilities in such systems by formulating the problem as a derivative-free optimisation problem and applying efficient sampling methods. We illustrate the application of this method for attacks that involve the injection of false evidence, and identify vulnerabilities in existing trust models. In this way, we provide reliable and objective means to assess how robust trust and reputation systems are to different kinds of attacks.

### Tuesday 1315:00 - 16:00PS|S - Scheduling (R7)

Chair: TBD
• #253
Faster Dynamic Controllability Checking in Temporal Networks with Integer Bounds
Nikhil Bhargava, Brian C. Williams
Scheduling

Simple Temporal Networks with Uncertainty (STNUs) provide a useful formalism with which to reason about events and the temporal constraints that apply to them. STNUs are in particular notable because they facilitate reasoning over stochastic, or uncontrollable, actions and their corresponding durations. To evaluate the feasibility of a set of constraints associated with an STNU, one checks the network's \textit{dynamic controllability}, which determines whether an adaptive schedule can be constructed on-the-fly. Our work improves the runtime of checking the dynamic controllability of STNUs with integer bounds to O(min(mn, m sqrt(n) log N) + km + k^2n + kn log n). Our approach pre-processes the STNU using an existing O(n^3) dynamic controllability checking algorithm and provides tighter bounds on its runtime. This makes our work easily adaptable to other algorithms that rely on checking variants of dynamic controllability.

• #2658
Scheduling Jobs with Stochastic Processing Time on Parallel Identical Machines
Richard Stec, Antonin Novak, Premysl Sucha, Zdenek Hanzalek
Scheduling

Many real-world scheduling problems are characterized by uncertain parameters. In this paper, we study a classical parallel machine scheduling problem where the processing time of jobs is given by a normal distribution. The objective is to maximize the probability that jobs are completed before a given common due date. This study focuses on the computational aspect of this problem, and it proposes a Branch-and-Price approach for solving it. The advantage of our method is that it scales very well with the increasing number of machines and is easy to implement. Furthermore, we propose an efficient lower bound heuristics. The experimental results show that our method outperforms the existing approaches.

• #4311
Fair Online Allocation of Perishable Goods and its Application to Electric Vehicle Charging
Enrico H. Gerding, Alvaro Perez-Diaz, Haris Aziz, Serge Gaspers, Antonia Marcu, Nicholas Mattei, Toby Walsh
Scheduling

We consider mechanisms for the online allocation of perishable resources such as energy or computational power. A main application is electric vehicle charging where agents arrive and leave over time. Unlike previous work, we consider mechanisms without money, and a range of objectives including fairness and efficiency. In doing so, we extend the concept of envy-freeness to online settings. Furthermore, we explore the trade-offs between different objectives and analyse their theoretical properties both in online and offline settings. We then introduce novel online scheduling algorithms and compare them in terms of both their theoretical properties and empirical performance.

• #10979
(Journal track) Complexity Bounds for the Controllability of Temporal Networks with Conditions, Disjunctions, and Uncertainty
Nikhil Bhargava, Brian C. Williams
Scheduling

In temporal planning, many different temporal network formalisms are used to model real world situations. Each of these formalisms has different features which affect how easy it is to determine whether the underlying network of temporal constraints is consistent. While many of the simpler models have been well-studied from a computational complexity perspective, the algorithms developed for advanced models which combine features have very loose complexity bounds. In this work, we provide tight completeness bounds for strong, weak, and dynamic controllability checking of temporal networks that have conditions, disjunctions, and temporal uncertainty. Our work exposes some of the subtle differences between these different structures and, remarkably, establishes a guarantee that all of these problems are computable in PSPACE.

### Tuesday 1315:00 - 16:00KRR|RKB - Reasoning about Knowlege and Belief (R8)

Chair: TBD
• #4209
The Complexity of Model Checking Knowledge and Time
Laura Bozzelli, Bastien Maubert, Aniello Murano
Reasoning about Knowlege and Belief

We establish the precise complexity of the model checking problem for the main logics of knowledge and time. While this problem was known to be non-elementary for agents with perfect recall, with a number of exponentials that increases with the alternation of knowledge operators, the precise complexity of the problem when the maximum alternation is fixed has been an open problem for twenty years. We close it by establishing improved upper bounds for CTL* with knowledge, and providing matching lower bounds that also apply for epistemic extensions of LTL and CTL.

• #4563
Converging on Common Knowledge
Dominik Klein, Rasmus Kræmmer Rendsvig
Reasoning about Knowlege and Belief

Common knowledge, as is well known, is not attainable in finite time by unreliable communication, thus hindering perfect coordination. Focusing on the coordinated attack problem modeled using dynamic epistemic logic, this paper discusses unreliable communication protocols from a topological perspective and asks "If the generals may communicate indefinitely, will they then *converge* to a state of common knowledge?" We answer by making precise and showing the following: *common knowledge is attainable if, and only if, we do not care about common knowledge*.

• #2601
A Modal Characterization Theorem for a Probabilistic Fuzzy Description Logic
Paul Wild, Lutz Schröder, Dirk Pattinson, Barbara König
Reasoning about Knowlege and Belief

The fuzzy modality probably is interpreted over probabilistic type spaces by taking expected truth values. The arising probabilistic fuzzy description logic is invariant under probabilistic bisimilarity; more informatively, it is non-expansive wrt. a suitable notion of behavioural distance. In the present paper, we provide a characterization of the expressive power of this logic based on this observation: We prove a probabilistic analogue of the classical van Benthem theorem, which states that modal logic is precisely the bisimulation-invariant fragment of first-order logic. Specifically, we show that every formula in probabilistic fuzzy first-order logic that is non-expansive wrt. behavioural distance can be approximated by concepts of bounded rank in probabilistic fuzzy description logic.

• #4270
Accelerated Inference Framework of Sparse Neural Network Based on Nested Bitmask Structure
Yipeng Zhang, Bo Du, Lefei Zhang, Rongchun Li, Yong Dou
Reasoning about Knowlege and Belief

In order to satisfy the ever-growing demand for high-performance processors for neural networks, the state-of-the-art processing units tend to use application-oriented circuits to replace Processing Engine (PE) on the GPU under circumstances where low-power solutions are required. The application-oriented PE is fully optimized in terms of the circuit architecture and eliminates incorrect data dependency and instructional redundancy. In this paper, we propose a novel encoding approach on a sparse neural network after pruning. We partition the weight matrix into numerous blocks and use a low-rank binary map to represent the validation of these blocks. Furthermore, the elements in each nonzero block are also encoded into two submatrices: one is the binary stream discriminating the zero/nonzero position, while the other is the pure nonzero elements stored in the FIFO. In the experimental part, we implement a well pre-trained sparse neural network on the Xilinx FPGA VC707. Experimental results show that our algorithm outperforms the other benchmarks. Our approach has successfully optimized the throughput and the energy efficiency to deal with a single frame. Accordingly, we contend that Nested Bitmask Neural Network (NBNN), is an efficient neural network structure with only minor accuracy loss on the SoC system.

### Tuesday 1315:00 - 16:00NLP|MT - Machine Translation (R9)

Chair: TBD
• #1653
Sharing Attention Weights for Fast Transformer
Tong Xiao, Yinqiao Li, Jingbo Zhu, Zhengtao Yu, Tongran Liu
Machine Translation

Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding. In this paper we speed up Transformer via a fast and lightweight attention model. More specifically, we share attention weights in adjacent layers and enable the efficient re-use of hidden states in a vertical manner. Moreover, the sharing policy can be jointly learned with the MT model. We test our approach on ten WMT and NIST OpenMT tasks. Experimental results show that it yields an average of 1.3X speed-up (with almost no decrease in BLEU) on top of a state-of-the-art implementation that has already adopted a cache for fast inference. Also, our approach obtains a 1.8X speed-up when it works with the AAN model. This is even 16 times faster than the baseline with no use of the attention cache.

• #1859
From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu
Machine Translation

The neural machine translation model has suffered from the lack of large-scale parallel corpora. In contrast, we humans can learn multi-lingual translations even without parallel texts by referring our languages to the external world. To mimic such human learning behavior, we employ images as pivots to enable zero-resource translation learning. However, a picture tells a thousand words, which makes multi-lingual sentences pivoted by the same image noisy as mutual translations and thus hinders the translation model learning. In this work, we propose a progressive learning approach for image-pivoted zero-resource machine translation. Since words are less diverse when grounded in the image, we first learn word-level translation with image pivots, and then progress to learn the sentence-level translation by utilizing the learned word translation to suppress noises in image-pivoted multi-lingual sentences. Experimental results on two widely used image-pivot translation datasets, IAPR-TC12 and Multi30k, show that the proposed approach significantly outperforms other state-of-the-art methods.

• #3742
Polygon-Net: A General Framework for Jointly Boosting Multiple Unsupervised Neural Machine Translation Models
Chang Xu, Tao Qin, Gang Wang, Tie-Yan Liu
Machine Translation

Neural machine translation (NMT) has achieved great success. However, collecting large-scale parallel data for training is costly and laborious.  Recently, unsupervised neural machine translation has attracted more and more attention, due to its demand for monolingual corpus only, which is common and easy to obtain, and its great potentials for the low-resource or even zero-resource machine translation. In this work, we propose a general framework called Polygon-Net, which leverages multi auxiliary languages for jointly boosting unsupervised neural machine translation models. Specifically, we design a novel loss function for multi-language unsupervised neural machine translation. In addition, different from the literature that just updating one or two models individually, Polygon-Net enables multiple unsupervised models in the framework to update in turn and enhance each other for the first time. In this way, multiple unsupervised translation models are associated with each other for training to achieve better performance. Experiments on the benchmark datasets including UN Corpus and WMT show that our approach significantly improves over the two-language based methods, and achieves better performance with more languages introduced to the framework.

• #4441
Correct-and-Memorize: Learning to Translate from Interactive Revisions
Rongxiang Weng, Hao Zhou, Shujian Huang, Lei Li, Yifan Xia, Jiajun Chen
Machine Translation

State-of-the-art machine translation models are still not on a par with human translators. Previous work takes human interactions into the neural machine translation process to obtain improved results in target languages. However, not all model--translation errors are equal -- some are critical while others are minor. In the meanwhile, same translation mistakes occur repeatedly in similar context. To solve both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our proposed method works with critical revision instructions, therefore allows human to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on the context, alleviating the issue of repeating mistakes. Experiments in both ideal and real interactive translation settings demonstrate that our proposed CAMIT enhances machine translation results significantly while requires fewer revision instructions from human compared to previous methods.

### Tuesday 1315:00 - 16:00CV|RDCIMRSI - Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1 (R10)

Chair: TBD
• #3159
Binarized Neural Networks for Resource-Efficient Hashing with Minimizing Quantization Loss
Feng Zheng, Cheng Deng, Heng Huang
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1

In order to solve the problem of memory consumption and computational requirements, this paper proposes a novel learning binary neural network framework to achieve a resource-efficient deep hashing. In contrast to floating-point (32-bit) full-precision networks, the proposed method achieves a 32x model compression rate. At the same time, computational burden in convolution is greatly reduced due to efficient Boolean operations. To this end, in our framework, a new quantization loss defined between the binary weights and the learned real values is minimized to reduce the model distortion, while, by minimizing a binary entropy function, the discrete optimization is successfully avoided and the stochastic gradient descend method can be used smoothly. More importantly, we provide two theories to demonstrate the necessity and effectiveness of minimizing the quantization losses for both weights and activations. Numerous experiments show that the proposed method can achieve fast code generation without sacrificing accuracy.

• #3518
DSRN: A Deep Scale Relationship Network for Scene Text Detection
Yuxin Wang, Hongtao Xie, Zilong Fu, Yongdong Zhang
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1

Nowadays, scene text detection has become increasingly important and popular. However, the large variance of text scale remains the main challenge and limits the detection performance in most previous methods. To address this problem, we propose an end-to-end architecture called Deep Scale Relationship Network (DSRN) to map multi-scale convolution features onto a scale invariant space to obtain uniform activation of multi-size text instances. Firstly, we develop a Scale-transfer module to transfer the multi-scale feature maps to a unified dimension. Due to the heterogeneity of features, simply concatenating feature maps with multi-scale information would limit the detection performance. Thus we propose a Scale Relationship module to aggregate the multi-scale information through bi-directional convolution operations. Finally, to further reduce the miss-detected instances, a novel Recall Loss is proposed to force the network to concern more about miss-detected text instances by up-weighting poor-classified examples. Compared with previous approaches, DSRN efficiently handles the large-variance scale problem without complex hand-crafted hyperparameter settings (e.g. scale of default boxes) and complicated post processing. On standard datasets including ICDAR2015 and MSRA-TD500, the proposed algorithm achieves the state-of-art performance with impressive speed (8.8 FPS on ICDAR2015 and 13.3 FPS on MSRA-TD500).

• #4861
Detecting Robust Co-Saliency with Recurrent Co-Attention Neural Network
Bo Li, Zhengxing Sun, Lv Tang, Yunhan Sun, Jinlong Shi
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1

Effective feature representations which should not only express the images individual properties, but also reflect the interaction among group images are essentially crucial for robust co-saliency detection. This paper proposes a novel deep learning co-saliency detection approach which simultaneously learns single image properties and robust group feature in a recurrent manner. Specifically, our network first extracts the semantic features of each image. Then, a specially designed Recurrent Co-Attention Unit (RCAU) will explore all images in the group recurrently to generate the final group representation using the co-attention between images, and meanwhile suppresses noisy information. The group feature which contains complementary synergetic information is later merged with the single image features which express the unique properties to infer robust co-saliency. We also propose a novel co-perceptual loss to make full use of interactive relationships of whole images in the training group as the supervision in our end-to-end training process. Extensive experimental results demonstrate the superiority of our approach in comparison with the state-of-the-art methods.

• #3859
Deep Recurrent Quantization for Generating Sequential Binary Codes
Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 1

Quantization has been an effective technology in ANN (approximate nearest neighbour) search due to its high accuracy and fast search speed. To meet the requirement of different applications, there is always a trade-off between retrieval accuracy and speed, reflected by variable code lengths. However, to encode the dataset into different code lengths, existing methods need to train several models, where each model can only produce a specific code length. This incurs a considerable training time cost, and largely reduces the flexibility of quantization methods to be deployed in real applications. To address this issue, we propose a Deep Recurrent Quantization (DRQ) architecture which can generate sequential binary codes. To the end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations. A shared codebook and a scalar factor is designed to be the learnable weights in the deep recurrent quantization block, and the whole framework can be trained in an end-to-end manner. As far as we know, this is the first quantization method that can be trained once and generate sequential binary codes. Experimental results on the benchmark datasets show that our model achieves comparable or even better performance compared with the state-of-the-art for image retrieval. But it requires significantly less number of parameters and training times. Our code is published online: https://github.com/cfm-uestc/DRQ.

### Tuesday 1315:00 - 16:00ML|C - Classification 2 (R11)

Chair: TBD
• #222
Learning Sound Events from Webly Labeled data
Anurag Kumar, Ankit Parag Shah, Alexander Hauptmann, Bhiksha Raj
Classification 2

In the last couple of years, weakly labeled learning has turned out to be an exciting approach for audio event detection. In this work, we introduce webly labeled learning for sound events which aims to remove human supervision altogether from the learning process. We first develop a method of obtaining labeled audio data from the web (albeit noisy), in which no manual labeling is involved. We then describe methods to efficiently learn from these webly labeled audio recordings. In our proposed system, WeblyNet, two deep neural networks co-teach each other to robustly learn from webly labeled data, leading to around 17\% relative improvement over the baseline method. The method also involves transfer learning to obtain efficient representations.

• #2957
Persistence Bag-of-Words for Topological Data Analysis
Bartosz Zieliński, Michał Lipiński, Mateusz Juda, Matthias Zeppelzauer, Paweł Dłotko
Classification 2

Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs exhibit, however, complex structure and are difficult to integrate in today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables the seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond in much less time than alternative approaches.

• #4977
Improving the Robustness of Deep Neural Networks via Adversarial Training with Triplet Loss
Pengcheng Li, Jinfeng Yi, Bowen Zhou, Lijun Zhang
Classification 2

Recent studies have highlighted that deep neural networks (DNNs) are vulnerable to adversarial examples. In this paper, we improve the robustness of DNNs by utilizing techniques of Distance Metric Learning. Specifically, we incorporate Triplet Loss, one of the most popular Distance Metric Learning methods, into the framework of adversarial training. Our proposed algorithm, Adversarial Training with Triplet Loss (AT2L), substitutes the adversarial example against the current model for the anchor of triplet loss to effectively smooth the classification boundary. Furthermore, we propose an ensemble version of AT2L, which aggregates different attack methods and model structures for better defense effects. Our empirical studies verify that the proposed approach can significantly improve the robustness of DNNs without sacrificing accuracy. Finally, we demonstrate that our specially designed triplet loss can also be used as a regularization term to enhance other defense methods.

• #537
Graph and Autoencoder Based Feature Extraction for Zero-shot Learning
Yang Liu, Deyan Xie, Quanxue Gao, Jungong Han, Shujian Wang, Xinbo Gao
Classification 2

Zero-shot learning (ZSL) aims to build models to recognize novel visual categories that have no associated labelled training samples. The basic framework is to transfer knowledge from seen classes to unseen classes by learning the visual-semantic embedding. However, most of approaches do not preserve the underlying sub-manifold of samples in the embedding space. In addition, whether the mapping can precisely reconstruct the original visual feature is not investigated in-depth. In order to solve these problems, we formulate a novel framework named Graph and Autoencoder Based Feature Extraction (GAFE) to seek a low-rank mapping to preserve the sub-manifold of samples. Taking the encoder-decoder paradigm, the encoder part learns a mapping from the visual feature to the semantic space, while decoder part reconstructs the original features with the learned mapping. In addition, a graph is constructed to guarantee the learned mapping can preserve the local intrinsic structure of the data. To this end, an L21 norm sparsity constraint is imposed on the mapping to identify features relevant to the target domain. Extensive experiments on five attribute datasets demonstrate the effectiveness of the proposed model.

### Tuesday 1315:00 - 16:00ML|DM - Data Mining 2 (R12)

Chair: TBD
• #653
DeepCU: Integrating both Common and Unique Latent Information for Multimodal Sentiment Analysis
Sunny Verma, Chen Wang, Liming Zhu, Wei Liu
Data Mining 2

Multimodal sentiment analysis combines information available from visual, textual, and acoustic representations for sentiment prediction. The recent multimodal fusion schemes combine multiple modalities as a tensor and obtain either; the common information by utilizing neural networks, or the unique information by modeling low-rank representation of the tensor. However, both of these information are essential as they render inter-modal and intra-modal relationships of the data. In this research, we first propose a novel deep architecture to extract the common information from the multi-mode representations. Furthermore, we propose unique networks to obtain the modality-specific information that enhances the generalization performance of our multimodal system. Finally, we integrate these two aspects of information via a fusion layer and propose a novel multimodal data fusion architecture, which we call DeepCU (Deep network with both Common and Unique latent information). The proposed DeepCU consolidates the two networks for joint utilization and discovery of all-important latent information. Comprehensive experiments are conducted to demonstrate the effectiveness of utilizing both common and unique information discovered by DeepCU on multiple real-world datasets. The source code of proposed DeepCU is available at https://github.com/sverma88/DeepCU-IJCAI19.

• #3577
Commit Message Generation for Source Code Changes
Shengbin Xu, Yuan Yao, Feng Xu, Tianxiao Gu, Hanghang Tong, Jian Lu
Data Mining 2

Commit messages, which summarize the source code changes in natural language, are essential for program comprehension and software evolution understanding. Unfortunately, due to the lack of direct motivation, commit messages are sometimes neglected by developers, making it necessary to automatically generate such messages. State-of-the-art adopts learning based approaches such as neural machine translation models for the commit message generation problem. However, they tend to ignore the code structure information and suffer from the out-of-vocabulary issue. In this paper, we propose CoDiSum to address the above two limitations. In particular, we first extract both code structure and code semantics from the source code changes, and then jointly model these two sources of information so as to better learn the representations of the code changes. Moreover, we augment the model with copying mechanism to further mitigate the out-of-vocabulary issue. Experimental evaluations on real data demonstrate that the proposed approach significantly outperforms the state-of-the-art in terms of accurately generating the commit messages.

• #6385
Recommending Links to Maximize the Influence in Social Networks
Federico Corò, Gianlorenzo D'Angelo, Yllka Velaj
Data Mining 2

Social link recommendation systems, like "People-you-may-know" on Facebook, "Who-to-follow" on Twitter, and "Suggested-Accounts" on Instagram assist the users of a social network in establishing new connections with other users. While these systems are becoming more and more important in the growth of social media, they tend to increase the popularity of users that are already popular. Indeed, since link recommenders aim at predicting users' behavior, they accelerate the creation of links that are likely to be created in the future, and, as a consequence, they reinforce social biases by suggesting few (popular) users, while giving few chances to the majority of users to build new connections and increase their popularity.In this paper we measure the popularity of a user by means of its social influence, which is its capability to influence other users' opinions, and we propose a link recommendation algorithm that evaluates the links to suggest according to their increment in social influence instead of their likelihood of being created. In detail, we give a constant factor approximation algorithm for the problem of maximizing the social influence of a given set of target users by suggesting a fixed number of new connections. We experimentally show that, with few new links and small computational time, our algorithm is able to increase by far the social influence of the target users. We compare our algorithm with several baselines and show that it is the most effective one in terms of increased influence.

• #5247
Fairwalk: Towards Fair Graph Embedding
Tahleen Rahman, Bartlomiej Surma, Michael Backes, Yang Zhang
Data Mining 2

Graph embeddings have gained huge popularity in the recent years as a powerful tool to analyze social networks. However, no prior works have studied potential bias issues inherent within graph embedding. In this paper, we make a first attempt in this direction. In particular, we concentrate on the fairness of node2vec, a popular graph embedding method. Our analyses on two real-world datasets demonstrate the existence of bias in node2vec when used for friendship recommendation. We, therefore, propose a fairness-aware embedding method, namely Fairwalk, which extends node2vec. Experimental results demonstrate that Fairwalk reduces bias under multiple fairness metrics while still preserving the utility.

### Tuesday 1315:00 - 16:00AMS|ML - Multi-agent Learning 1 (R13)

Chair: TBD
• #1063
Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns
Yong Liu, Yujing Hu, Yang Gao, Yingfeng Chen, Changjie Fan
Multi-agent Learning 1

Many real-world problems, such as robot control and soccer game, are naturally modeled as sparse-interactions multi-agent systems. Reutilizing single agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process, several previous works rely on the bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, the bisimulation metric is costly to compute and is not suitable for high dimensional state space. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining the MDP similarity based on the N-step return (NSR) models of MDPs. Then, we propose two knowledge transfer methods based on deep neural network called direct value function transfer and NSR-based value function transfer. We conduct experiments in image-based grid world, multi-agent particle environment (MPE) and Ms.Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning and meanwhile get better performance.

• #2168
Decentralized Optimization with Edge Sampling
Chi Zhang, Qianxiao Li, Peilin Zhao
Multi-agent Learning 1

In this paper, we propose a decentralized distributed algorithm with stochastic communication among nodes, building on a sampling method called "edge sampling''. Such a sampling algorithm allows us to avoid the heavy peer-to-peer communication cost when combining neighboring weights on dense networks while still maintains a comparable convergence rate. In particular, we quantitatively analyze its theoretical convergence properties, as well as the optimal sampling rate over the underlying network. When compared with previous methods, our solution is shown to be unbiased, communication-efficient and suffers from lower sampling variances. These theoretical findings are validated by both numerical experiments on the mixing rates of Markov Chains and distributed machine learning problems.

• #2581
Exploring the Task Cooperation in Multi-goal Visual Navigation
Yuechen Wu, Zhenhuan Rao, Wei Zhang, Shijian Lu, Weizhi Lu, Zheng-Jun Zha
Multi-agent Learning 1

Learning to adapt to a series of different goals in visual navigation is challenging. In this work, we present a model-embedded actor-critic architecture for the multi-goal visual navigation task. To enhance the task cooperation in multi-goal learning, we introduce two new designs to the reinforcement learning scheme: inverse dynamics model (InvDM) and multi-goal co-learning (MgCl). Specifically, InvDM is proposed to capture the navigation-relevant association between state and goal, and provide additional training signals to relieve the sparse reward issue. MgCl aims at improving the sample efficiency and supports the agent to learn from unintentional positive experiences. Extensive results on the interactive platform AI2-THOR demonstrate that the proposed method converges faster than state-of-the-art methods while producing more direct routes to navigate to the goal. The video demonstration is available at: https://youtube.com/channel/UCtpTMOsctt3yPzXqe_JMD3w/videos.

• #2679
Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent
Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr TImbers, Karl Tuyls
Multi-agent Learning 1

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player?s strategy converges asymptotically to zero, and hence when both players employ this optimization, the joint policies converge to a Nash equilibrium. Unlike fictitious play (XFP) and counterfactual regret minimization (CFR), our convergence result pertains to the policies being optimized rather than the average policies: with certainty in Markov games and perfect information games, and high probability in imperfect information games. Our experiments demonstrate convergence rates comparable to XFP and CFR in four benchmark games in the tabular case. Using function approximation, we find that our algorithm outperforms the tabular version in two of the games, which, to the best of our knowledge, is the first such result in imperfect information games among this class of algorithms.

### Tuesday 1315:00 - 16:00NLP|NLPAT - NLP Applications and Tools (R14)

Chair: TBD
• #3015
Learning Assistance from An Adversarial Critic for Multi-output Prediction
Yue Deng, Yilin Shen, Hongxia Jin
NLP Applications and Tools

We introduce an adversarial-critic-andassistant(ACA) learning framework toimprove the performance of existing supervisedlearning with multiple outputs. The corecontribution of our ACA is the innovationof two novel modules, i.e. an ?adversarialcritic? and a ?collaborative assistant?, that arejointly designed to provide augmenting informationfor facilitating general multi-outputpredictions. Our approach is not intendedto be regarded as an competitor for manywell-established algorithms in the field. Infact, many existing structured prediction algorithmscan all be adopted as building blocksintegrated in the ACA framework. We showthe performance and generalization abilityof ACA on diverse learning tasks includingmulti-label classification, attributes predictionand sequence-to-sequence generation.

• #3373
Answering Binary Causal Questions Through Large-Scale Text Mining: An Evaluation Using Cause-Effect Pairs from Human Experts
Oktie Hassanzadeh, Debarun Bhattacharjya, Mark Feblowitz, Kavitha Srinivas, Michael Perrone, Shirin Sohrabi, Michael Katz
NLP Applications and Tools

In this paper, we study the problem of answering questions of type "Could X cause Y?" where X and Y are general phrases without any constraints. Answering such questions will assist with various decision analysis tasks such as verifying and extending presumed causal associations used for decision making. Our goal is to analyze the ability of an AI agent built using state-of-the-art unsupervised methods in answering causal questions derived from collections of cause-effect pairs from human experts. We focus only on unsupervised and weakly supervised methods due to the difficulty of creating a large enough training set with a reasonable quality and coverage. The methods we examine rely on a large corpus of text derived from news articles, and include methods ranging from large-scale application of classic NLP techniques and statistical analysis to the use of neural network based phrase embeddings and state-of-the-art neural language models.

• #6048
Aligning Learning Outcomes to Learning Resources: A Lexico-Semantic Spatial Approach
Swarnadeep Saha, Malolan Chetlur, Tejas Indulal Dhamecha, W M Gayathri K Wijayarathna, Red Mendoza, Paul Gagnon, Nabil Zary, Shantanu Godbole
NLP Applications and Tools

Aligning Learning Outcomes (LO) to relevant portions of Learning Resources (LR) is necessary to help students quickly navigate within the recommended learning material. In general, the problem can be viewed as finding the relevant sections of a document (LR) that is pertinent to a broad question (LO). In this paper, we introduce the novel problem of aligning LOs (LO is usually a sentence long text) to relevant pages of LRs (LRs are in the form of slide decks). We observe that the set of relevant pages can be composed of multiple chunks (a chunk is a contiguous set of pages) and the same page of an LR might be relevant to multiple LOs. To this end, we develop a novel Lexico-Semantic Spatial approach that captures the lexical, semantic, and spatial aspects of the task, and also alleviates the limited availability of training data. Our approach first identifies the relevancy of a page to an LO by using lexical and semantic features from each page independently. The spatial model at a later stage exploits the dependencies between the sequence of pages in the LR to further improve the alignment task. We empirically establish the importance of the lexical, semantic, and spatial models within the proposed approach. We show that, on average, a student can navigate to a relevant page from the first predicted page by about four clicks within a 38 page slide deck, as compared to two clicks by human experts.

• #5271
Modeling Noisy Hierarchical Types in Fine-Grained Entity Typing: A Content-Based Weighting Approach
Junshuang Wu, Richong Zhang, Yongyi Mao, Hongyu Guo, Jinpeng Huai
NLP Applications and Tools

Fine-grained entity typing (FET), which annotates the entities in a sentence with a set of finely specified type labels, often serves as the first and critical step towards many natural language processing tasks. Despite great processes have been made, current FET methods have difficulty to cope with the noisy labels which naturally come with the data acquisition processes. Existing FET approaches either pre-process to clean the noise or simply focus on one of the noisy labels, sidestepping the fact that those noises are related and content dependent. In this paper, we directly model the structured, noisy labels with a novel content-sensitive weighting schema. Coupled with a newly devised cost function and a hierarchical type embedding strategy, our method leverages a random walk process to effectively weight out noisy labels during training. Experiments on several benchmark datasets validate the effectiveness of the proposed framework and establish it as a new state of the art strategy for noisy entity typing problem.

### Tuesday 1315:00 - 16:00MLA|BM - Bio;Medicine (R15)

Chair: TBD
• #1408
FSM: A Fast Similarity Measurement for Gene Regulatory Networks via Genes' Influence Power
Zhongzhou Liu, Wenbin Hu
Bio;Medicine

The problem of graph similarity measurement is fundamental in both complex networks and bioinformatics researches. Gene regulatory networks (GRNs) describe the interactions between the molecules in organisms, and are widely studied in the fields of medical AI. By measuring the similarity between GRNs, significant information can be obtained to assist the applications like gene functions prediction, drug development and medical diagnosis. Most of the existing similarity measurements have been focusing on the graph isomorphisms and are usually NP-hard problems. Thus, they are not suitable for applications in biology and clinical research due to the complexity and large-scale features of real-world GRNs. In this paper, a fast similarity measurement method called FSM for GRNs is proposed. Unlike the conventional measurements, it pays more attention to the differences between those influential genes. For the convenience and reliability, a new index defined as influence power is adopted to describe the influential genes which have greater position in a GRN. FSM was applied in nine datasets of various scales and is compared with state-of-art methods. The results demonstrated that it ran significantly faster than other methods without sacrificing measurement performance.

• #2106
MLRDA: A Multi-Task Semi-Supervised Learning Framework for Drug-Drug Interaction Prediction
Xu Chu, Yang Lin, Yasha Wang, Leye Wang, Jiangtao Wang, Jingyue Gao
Bio;Medicine

Drug-drug interactions (DDIs) are a major cause of preventable hospitalizations and deaths. Recently, researchers in the AI community try to improve DDI prediction in two directions, incorporating multiple drug features to better model the pharmacodynamics and adopting multi-task learning to exploit associations among DDI types. However, these two directions are challenging to reconcile due to the sparse nature of the DDI labels which inflates the risk of overfitting of multi-task learning models when incorporating multiple drug features. In this paper, we propose a multi-task semi-supervised learning framework MLRDA for DDI prediction. MLRDA effectively exploits information that is beneficial for DDI prediction in unlabeled drug data by leveraging a novel unsupervised disentangling loss CuXCov. The CuXCov loss cooperates with the classification loss to disentangle the DDI prediction relevant part from the irrelevant part in a representation learnt by an autoencoder, which helps to ease the difficulty in mining useful information for DDI prediction in both labeled and unlabeled drug data. Moreover, MLRDA adopts a multi-task learning framework to exploit associations among DDI types. Experimental results on real-world datasets demonstrate that MLRDA significantly outperforms state-of-the-art DDI prediction methods by up to 10.3% in AUPR.

• #3567
Medical Concept Embedding with Multiple Ontological Representations
Lihong Song, Chin Wang Cheong, Kejing Yin, William K. Cheung, Benjamin C. M. Fung, Jonathan Poon
Bio;Medicine

Learning representations of medical concepts from the Electronic Health Records (EHR) has been shown effective for predictive analytics in healthcare. Incorporation of medical ontologies has also been explored to further enhance the accuracy and to ensure better alignment with the known medical knowledge. Most of the existing work assumes that medical concepts under the same ontological category should share similar representations, which however does not always hold. In particular, the categorizations in medical ontologies were established with various factors being considered. Medical concepts even under the same ontological category may not follow similar occurrence patterns in the EHR data, leading to contradicting objectives for the representation learning. In this paper, we propose a deep learning model called MMORE which alleviates this conflicting objective issue by allowing multiple representations to be inferred for each ontological category via an attention mechanism. We apply MMORE to diagnosis prediction and our experimental results show that the representations obtained by MMORE can achieve better predictive accuracy and result in clinically meaningful sub-categorization of the existing ontological categories.

• #568
Two-Stage Generative Models of Simulating Training Data at The Voxel Level for Large-Scale Microscopy Bioimage Segmentation
Deli Wang, Ting Zhao, Nenggan Zheng, Zhefeng Gong
Bio;Medicine

Bioimage Informatics is a growing area that aims to extract biological knowledge from microscope images of biomedical samples automatically. Its mission is vastly challenging, however, due to the complexity of diverse imaging modalities and big scales of multi-dimensional images. One major challenge is automatic image segmentation, an essential step towards high-level modeling and analysis. While progresses in deep learning have brought the goal of automation much closer to reality, creating training data for producing powerful neural networks is often laborious. To provide a shortcut for this costly step, we propose a novel two-stage generative model for simulating voxel level training data based on a specially designed objective function of preserving foreground labels. Using segmenting neurons from LM (Light Microscopy) image stacks as a testing example, we showed that segmentation networks trained by our synthetic data were able to produce satisfactory results. Unlike other simulation methods available in the field, our method can be easily extended to many other applications because it does not involve sophisticated cell models and imaging mechanisms.

### Tuesday 1316:30 - 17:30ST: Human AI & ML 1 - Special Track on Human AI and Machine Learning 2 (R2)

Chair: TBD
• #4017
How Well Do Machines Perform on IQ tests: a Comparison Study on a Large-Scale Dataset
Yusen Liu, Fangyuan He, Haodi Zhang, Guozheng Rao, Zhiyong Feng, Yi Zhou
Special Track on Human AI and Machine Learning 2

AI benchmarking becomes an increasingly important task. As suggested by many researchers, Intelligence Quotient (IQ) tests, which is widely regarded as one of the predominant benchmarks for measuring human intelligence, raises an interesting challenge for AI systems. For better solving IQ tests automatedly by machines, one needs to use, combine and advance many areas in AI including knowledge representation and reasoning, machine learning, natural language processing and image understanding. Also, automated IQ tests provides an ideal testbed for integrating symbolic and subsymbolic approaches as both are found useful here. Hence, we argue that IQ tests, although not suitable for testing machine intelligence, provides an excellent benchmark for the current development of AI research. Nevertheless, most existing IQ test datasets are not comprehensive enough for this purpose. As a result, the conclusions obtained are not representative. To address this issue, we create IQ10k, a large-scale dataset that contains more than 10,000 IQ test questions. We also conduct a comparison study on IQ10k with a number of state-of-the-art approaches.

• #4970
Learning Relational Representations with Auto-encoding Logic Programs
Sebastijan Dumancic, Tias Guns, Wannes Meert, Hendrik Blockeel
Special Track on Human AI and Machine Learning 2

Deep learning methods capable of handling relational data have proliferated over the past years. In contrast to traditional relational learning methods that leverage first-order logic for representing such data, these methods aim at re-representing symbolic relational data in Euclidean space. They offer better scalability, but can only approximate rich relational structures and are less flexible in terms of reasoning. This paper introduces a novel framework for relational representation learning that combines the best of both worlds. This framework, inspired by the auto-encoding principle, uses first-order logic as a data representation language, and the mapping between the the original and latent representation is done by means of logic programs instead of neural networks. We show how learning can be cast as a constraint optimisation problem for which existing solvers can be used. The use of logic as a representation language makes the proposed framework more accurate (as the representation is exact, rather than approximate), more flexible, and more interpretable than deep learning methods. We experimentally show that these latent representations are indeed beneficial in relational learning tasks.

• #5902
Learning Hierarchical Symbolic Representations to Support Interactive Task Learning and Knowledge Transfer
James R. Kirk, John E. Laird
Special Track on Human AI and Machine Learning 2

Interactive Task Learning (ITL) focuses on learning the definition of tasks through online natural language instruction in real time. Learning the correct grounded meaning of the instructions is difficult due to ambiguous words, lack of common ground, and the presence of distractors in the environment and the agent’s knowledge. We present a learning strategy embodied in an ITL agent that interactively learns in one shot the meaning of task concepts for 40 games and puzzles in ambiguous scenarios. Our approach learns hierarchical symbolic representations of task knowledge rather than learning a mapping directly from perceptual representations. These representations enable the agent to transfer and compose knowledge, analyze and debug multiple interpretations, and communicate efficiently with the teacher to resolve ambiguity. We evaluate the efficiency of the learning by examining the number of words required to teach tasks across cases of no transfer, positive transfer, and interference from prior tasks. Our results show that the agent can correctly generalize, disambiguate, and transfer concepts within variations in language descriptions and world representations of the same task, and across variations in different tasks.

• #6050
LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning
Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Sheila A. McIlraith
Special Track on Human AI and Machine Learning 2

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.

### Tuesday 1316:30 - 18:00ML|DL - Deep Learning 3 (R3)

Chair: TBD
• #61
Heterogeneous Graph Matching Networks for Unknown Malware Detection
Shen Wang, Zhengzhang Chen, Xiao Yu, Ding Li, Jingchao Ni, Lu-An Tang, Jiaping Gui, Zhichun Li, Haifeng Chen, Philip S. Yu
Deep Learning 3

Information systems have widely been the target of malware attacks. Traditional signature-based malicious program detection algorithms can only detect known malware and are prone to evasion techniques such as binary obfuscation, while behavior-based approaches highly rely on the malware training samples and incur prohibitively high training cost. To address the limitations of existing techniques, we propose MatchGNet, a heterogeneous Graph Matching Network model to learn the graph representation and similarity metric simultaneously based on the invariant graph modeling of the program's execution behaviors. We conduct a systematic evaluation of our model and show that it is accurate in detecting malicious program behavior and can help detect malware attacks with less false positives. MatchGNet outperforms the state-of-the-art algorithms in malware detection by generating 50% less false positives while keeping zero false negatives.

• #3225
On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Minimization
Yi Xu, Zhuoning Yuan, Sen Yang, Rong Jin, Tianbao Yang
Deep Learning 3

Extrapolation is a well-known technique for solving convex optimization and variational inequalities and recently attracts some attention for non-convex optimization. Several recent works have empirically shown its success in some machine learning tasks. However, it has not been analyzed for non-convex minimization and there still remains a gap between the theory and the practice. In this paper, we analyze gradient descent  and stochastic gradient descent with extrapolation for finding an approximate first-order stationary point in smooth non-convex optimization problems. Our convergence upper bounds show that the algorithms with extrapolation can be accelerated than without extrapolation.

• #4207
Learning Instance-wise Sparsity for Accelerating Deep Models
Chuanjian Liu, Yunhe Wang, Kai Han, Chunjing Xu, Chang Xu
Deep Learning 3

Exploring deep convolutional neural networks of high efficiency and low memory usage is very essential for a wide variety of machine learning tasks. Most of existing approaches used to accelerate deep models by manipulating parameters or filters without data, e.g., pruning and decomposition. In contrast, we study this problem from a different perspective by respecting the difference between data. An instance-wise feature pruning is developed by identifying informative features for different instances. Specifically, by investigating a feature decay regularization, we expect intermediate feature maps of each instance in deep neural networks to be sparse while preserving the overall network performance. During online inference, subtle features of input images extracted by intermediate layers of a well-trained neural network can be eliminated to accelerate the subsequent calculations. We further take coefficient of variation as a measure to select the layers that are appropriate for acceleration. Extensive experiments conducted on benchmark datasets and networks demonstrate the effectiveness of the proposed method.

• #4385
Quaternion Collaborative Filtering for Recommendation
Shuai Zhang, Lina Yao, Lucas Vinh Tran, Aston Zhang, Yi Tay
Deep Learning 3

This paper proposes Quaternion Collaborative Filtering (QCF), a novel representation learning method for recommendation. Our proposed QCF relies on and exploits computation with Quaternion algebra, benefiting from the expressiveness and rich representation learning capability of Hamilton products. Quaternion representations, based on hypercomplex numbers, enable rich inter-latent dependencies between imaginary components. This encourages intricate relations to be captured when learning user-item interactions, serving as a strong inductive bias  as compared with the real-space inner product. All in all, we conduct extensive experiments on six real-world datasets, demonstrating the effectiveness of Quaternion algebra in recommender systems. The results exhibit that QCF outperforms a wide spectrum of strong neural baselines on all datasets. Ablative experiments confirm the effectiveness of Hamilton-based composition over multi-embedding composition in real space.

• #10977
(Sister Conferences Best Papers Track) A Walkthrough for the Principle of Logit Separation
Gil Keren, Sivan Sabato, Björn Schuller
Deep Learning 3

We consider neural network training, in applications in which there are many possible classes, but at test-time, the task is a binary classification task of determining whether the given example belongs to a specific class. We define the Single Logit Classification (SLC) task: training the network so that at test-time, it would be possible to accurately identify whether the example belongs to a given class in a computationally efficient manner, based only on the output logit for this class. We propose a natural principle, the Principle of Logit Separation, as a guideline for choosing and designing loss functions that are suitable for SLC. We show that the Principle of Logit Separation is a crucial ingredient for success in the SLC task, and that SLC results in considerable speedups when the number of classes is large.

• #613
Dense Transformer Networks for Brain Electron Microscopy Image Segmentation
Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji
Deep Learning 3

The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods.

### Tuesday 1316:30 - 18:00ML|RL - Reinforcement Learning 1 (R4)

Chair: TBD
• #132
Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning
Wenjie Shi, Shiji Song, Cheng Wu
Reinforcement Learning 1

Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality such as 3D humanoid locomotion. Besides, the optimality of desired Boltzmann policy set for non-optimal soft value function is not persuasive enough. In this paper, we first derive soft policy gradient based on entropy regularized expected reward objective for RL with continuous actions. Then, we present an off-policy actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG) by combining soft policy gradient with soft Bellman equation. To ensure stable learning while eliminating the need of two separate critics for soft value functions, we leverage double sampling approach to making the soft Bellman equation tractable. The experimental results demonstrate that our method outperforms in performance over off-policy prior methods.

• #628
Incremental Learning of Planning Actions in Model-Based Reinforcement Learning
Jun Hao Alvin Ng, Ron P. A.Ronald P. A. Petrick
Reinforcement Learning 1

The soundness and optimality of a plan depends on the correctness of the domain model. Specifying complete domain models can be difficult when interactions between an agent and its environment are complex. We propose a model-based reinforcement learning (MBRL) approach to solve planning problems with unknown models. The model is learned incrementally over episodes using only experiences from the current episode which suits non-stationary environments. We introduce the novel concept of reliability as an intrinsic motivation for MBRL, and a method to learn from failure to prevent repeated instances of similar failures. Our motivation is to improve the learning efficiency and goal-directedness of MBRL. We evaluate our work with experimental results for three planning domains.

• #2825
Autoregressive Policies for Continuous Control Deep Reinforcement Learning
Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra
Reinforcement Learning 1

Reinforcement learning algorithms rely on exploration to discover new behaviors, which is typically achieved by following a stochastic policy. In continuous control tasks, policies with a Gaussian distribution have been widely adopted. Gaussian exploration however does not result in smooth trajectories that generally correspond to safe and rewarding behaviors in practical tasks. In addition, Gaussian policies do not result in an effective exploration of an environment and become increasingly inefficient as the action rate increases. This contributes to a low sample efficiency often observed in learning continuous control tasks. We introduce a family of stationary autoregressive (AR) stochastic processes to facilitate exploration in continuous control domains. We show that proposed processes possess two desirable features: subsequent process observations are temporally coherent with continuously adjustable degree of coherence, and the process stationary distribution is standard normal. We derive an autoregressive policy (ARP) that implements such processes maintaining the standard agent-environment interface. We show how ARPs can be easily used with the existing off-the-shelf learning algorithms. Empirically we demonstrate that using ARPs results in improved exploration and sample efficiency in both simulated and real world domains, and, furthermore, provides smooth exploration trajectories that enable safe operation of robotic hardware.

• #4369
Sharing Experience in Multitask Reinforcement Learning
Tung-Long Vuong, Do-Van Nguyen, Tai-Long Nguyen, Cong-Minh Bui, Hai-Dang Kieu, Viet-Cuong Ta, Quoc-Long Tran, Thanh-Ha Le
Reinforcement Learning 1

In multitask reinforcement learning, tasks often have sub-tasks that share the same solution, even though the overall tasks are different. If the shared-portions could be effectively identified, then the learning process could be improved since all the samples between tasks in the shared space could be used. In this paper, we propose a Sharing Experience Framework (SEF) for simultaneously training of multiple tasks. In SEF, a confidence sharing agent uses task-specific rewards from the environment to identify similar parts that should be shared across tasks and defines those parts as shared-regions between tasks. The shared-regions are expected to guide task-policies sharing their experience during the learning process. The experiments highlight that our framework improves the performance and the stability of learning task-policies, and is possible to help task-policies avoid local optimums.

• #4757
Adversarial Imitation Learning from Incomplete Demonstrations
Mingfei Sun, Xiaojuan Ma
Reinforcement Learning 1

Imitation learning targets deriving a mapping from states to actions, a.k.a. policy, from expert demonstrations. Existing methods for imitation learning typically require any actions in the demonstrations to be fully available, which is hard to ensure in real applications. Though algorithms for learning with unobservable actions have been proposed, they focus solely on state information and over- look the fact that the action sequence could still be partially available and provide useful information for policy deriving. In this paper, we propose a novel algorithm called Action-Guided Adversarial Imitation Learning (AGAIL) that learns a pol- icy from demonstrations with incomplete action sequences, i.e., incomplete demonstrations. The core idea of AGAIL is to separate demonstrations into state and action trajectories, and train a policy with state trajectories while using actions as auxiliary information to guide the training whenever applicable. Built upon the Generative Adversarial Imitation Learning, AGAIL has three components: a generator, a discriminator, and a guide. The generator learns a policy with rewards provided by the discriminator, which tries to distinguish state distributions between demonstrations and samples generated by the policy. The guide provides additional rewards to the generator when demonstrated actions for specific states are available. We com- pare AGAIL to other methods on benchmark tasks and show that AGAIL consistently delivers com- parable performance to the state-of-the-art methods even when the action sequence in demonstrations is only partially available.

• #4947
A Restart-based Rank-1 Evolution Strategy for Reinforcement Learning
Zefeng Chen, Yuren Zhou, Xiao-yu He, Siyu Jiang
Reinforcement Learning 1

Evolution strategies have been demonstrated to have the strong ability to roughly train deep neural networks and well accomplish reinforcement learning tasks. However, existing evolution strategies designed specially for deep reinforcement learning only involve the plain variants which can not realize the adaptation of mutation strength or other advanced techniques. The research of applying advanced and effective evolution strategies to reinforcement learning in an efficient way is still a gap. To this end, this paper proposes a restart-based rank-1 evolution strategy for reinforcement learning. When training the neural network, it adapts the mutation strength and updates the principal search direction in a way similar to the momentum method, which is an ameliorated version of stochastic gradient ascent. Besides, two mechanisms, i.e., the adaptation of the number of elitists and the restart procedure, are integrated to deal with the issue of local optima. Experimental results on classic control problems and Atari games show that the proposed algorithm is superior to or competitive with state-of-the-art algorithms for reinforcement learning, demonstrating the effectiveness of the proposed algorithm.

### Tuesday 1316:30 - 18:00AMS|CSC - Computational Social Choice 1 (R5)

Chair: TBD
• #748
Flexible Representative Democracy: An Introduction with Binary Issues
Ben Abramowitz, Nicholas Mattei
Computational Social Choice 1

We introduce Flexible Representative Democracy (FRD), a novel hybrid of Representative Democracy (RD) and Direct Democracy (DD), in which voters can alter the issue-dependent weights of a set of elected representatives. In line with the literature on Interactive Democracy, our model allows the voters to actively determine the degree to which the system is direct versus representative. However, unlike Liquid Democracy, FRD uses strictly non-transitive delegations, making delegation cycles impossible, preserving privacy and anonymity, and maintaining a fixed set of accountable elected representatives. We present FRD and analyze it using a computational approach with issues that are independent, binary, and symmetric; we compare the outcomes of various democratic systems using Direct Democracy with majority voting and full participation as an ideal baseline. We find through theoretical and empirical analysis that FRD can yield significant improvements over RD for emulating DD with full participation.

• #1527
On Strategyproof Conference Peer Review
Yichong Xu, Han Zhao, Xiaofei Shi, Nihar Shah
Computational Social Choice 1

We consider peer review under a conference setting where there are conflicts between the reviewers and the submissions. Under such conflicts, reviewers can manipulate their reviews in a strategic manner to influence the final rankings of their own papers. Present-day peer-review systems are not designed to guard against such strategic behavior, beyond minimal (and insufficient) checks such as not assigning a paper to a conflicted reviewer. In this work, we address this problem through the lens of social choice, and present a theoretical framework for strategyproof and efficient peer review. Given the conflict graph which satisfies a simple property, we first present and analyze a flexible framework for reviewer-assignment and aggregation for the reviews that guarantees not only strategyproofness but also a natural efficiency property (unanimity). Our framework is based on the so-called partitioning method, and can be treated as a generalization of this type of method to conference peer review settings. We then empirically show that the requisite property on the (authorship) conflict graph is indeed satisfied in the ICLR-17 submissions data, and further demonstrate a simple trick to make the partitioning method more practically appealing under conference peer-review settings. Finally, we complement our positive results with negative theoretical results where we prove that under slightly stronger requirements, it is impossible for any algorithm to be both strategyproof and efficient.

• #5143
A Contribution to the Critique of Liquid Democracy
Ioannis Caragiannis, Evi Micha
Computational Social Choice 1

Liquid democracy, which combines features of direct and representative democracy has been proposed as a modern practice for collective decision making. Its advocates support that by allowing voters to delegate their vote to more informed voters can result in better decisions. In an attempt to evaluate the validity of such claims, we study liquid democracy as a means to discover an underlying ground truth. We revisit a recent model by Kahng et al. [2018] and conclude with three negative results, criticizing an important assumption of their modeling, as well as liquid democracy more generally. In particular, we first identify cases where natural local mechanisms are much worse than either direct voting or the other extreme of full delegation to a common dictator. We then show that delegating to less informed voters may considerably increase the chance of discovering the ground truth. Finally, we show that deciding delegations that maximize the probability to find the ground truth is a computationally hard problem.

• #5250
Protecting Elections by Recounting Ballots
Edith Elkind, Jiarui Gan, Svetlana Obraztsova, Zinovi Rabinovich, Alexandros A. Voudouris
Computational Social Choice 1

Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a two-stage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.

• #59
Fair Allocation of Indivisible Goods and Chores
Haris Aziz, Ioannis Caragiannis, Ayumi Igarashi, Toby Walsh
Computational Social Choice 1

We consider the problem of fairly dividing a set of items. Much of the fair division literature assumes that the items are goods'' i.e., they yield positive utility for the agents. There is also some work where the items are chores'' that yield negative utility for the agents. In this paper, we consider a more general scenario where an agent may have negative or positive utility for each item. This framework captures, e.g., fair task assignment, where agents can have both positive and negative utilities for each task. We show that whereas some of the positive axiomatic and computational results extend to this more general setting, others do not. We present several new and efficient algorithms for finding fair allocations in this general setting. We also point out several gaps in the literature regarding the existence of allocations satisfying certain fairness and efficiency properties and further study the  complexity of computing such allocations.

• #255
Weighted Maxmin Fair Share Allocation of Indivisible Chores
Haris Aziz, Hau Chan, Bo Li
Computational Social Choice 1

We initiate the study of indivisible chore allocation for agents with asymmetric shares. The fairness concept we focus on is the weighted natural generalization of maxmin share: WMMS fairness and OWMMS fairness. We first highlight the fact that commonly-used algorithms that work well for allocation of goods to asymmetric agents, and even for chores to symmetric agents do not provide good approximations for allocation of chores to asymmetric agents under WMMS. As a consequence, we present a novel polynomial-time constant-approximation algorithm, via linear program, for OWMMS. For two special cases: the binary valuation case and the 2-agent case, we provide exact or better constant-approximation algorithms.

### Tuesday 1316:30 - 18:00KRR|NR - Non-monotonic Reasoning (R6)

Chair: TBD
• #2953
Rational Inference Relations from Maximal Consistent Subsets Selection
Sébastien Konieczny, Pierre Marquis, Srdjan Vesic
Non-monotonic Reasoning

When one wants to draw non-trivial inferences from an inconsistent belief base, a very  natural approach is to take advantage of the maximal consistent subsets of the base. But few inference relations from maximal consistent subsets exist. In this paper we point out new such relations based on selection of some of the maximal consistent subsets, leading thus to inference relations with a stronger inferential power. The selection process must obey some principles to ensure that it leads to an inference relation which is rational. We define a general class of monotonic selection relations for comparing maximal consistent sets. And we show that it corresponds to the class of rational inference relations.

• #4262
On the Integration of CP-nets in ASPRIN
Mario Alviano, Javier Romero, Torsten Schaub
Non-monotonic Reasoning

Conditional preference networks (CP-nets) express qualitative preferences over features of interest.A Boolean CP-net can express that a feature is preferable under some conditions, as long as all other features have the same value.This is often a convenient representation, but sometimes one would also like to express a preference for maximizing a set of features, or some other objective function on the features of interest.ASPRIN is a flexible framework for preferences in ASP, where one can mix heterogeneous preference relations, and this paper reports on the integration of Boolean CP-nets.In general, we extend ASPRIN with a preference program for CP-nets in order to compute most preferred answer sets via an iterative algorithm.For the specific case of acyclic CP-nets, we provide an approximation by partially ordered set preferences, which are in turn normalized by ASPRIN to take advantage of several highly optimized algorithms implemented by ASP solvers for computing optimal solutions.Finally, we take advantage of a linear-time computable function to address dominance testing for tree-shaped CP-nets.

• #4681
Simple Conditionals with Constrained Right Weakening
Giovanni Casini, Thomas Meyer, Ivan Varzinczak
Non-monotonic Reasoning

In this paper we introduce and investigate a very basic semantics for conditionals that can be used to define a broad class of conditional reasoning. We show that it encompasses the most popular kinds of conditional reasoning developed in logic-based KR. It turns out that the semantics we propose is appropriate for a structural analysis of those conditionals that do not satisfy the property of Right Weakening. We show that it can be used for the further development of an analysis of the notion of relevance in conditional reasoning.

• #5424
Out of Sight But Not Out of Mind: An ASP Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving
Jakob Suchan, Mehul Bhatt, Srikrishna Varadarajan
Non-monotonic Reasoning

We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates state of the art in (deep learning based) visual computing, and is developed as a modular framework usable within hybrid architectures for perception & control. We evaluate and demo with community established benchmarks KITTIMOD and MOT. As use-case, we focus on the significance of human-centred visual sensemaking - e.g., semantic representation and explainability, question-answering, commonsense interpolation - in safety-critical autonomous driving situations.

• #6324
What Has Been Said? Identifying the Change Formula in a Belief Revision Scenario
Nicolas Schwind, Katsumi Inoue, Sébastien Konieczny, Jean-Marie Lagniez, Pierre Marquis
Non-monotonic Reasoning

We consider the problem of identifying the change formula in a belief revision scenario: given that an unknown announcement (a formula $\mu$) led a set of agents to revise their beliefs and given the prior beliefs and the revised beliefs of the agents, what can be said about $\mu$? We show that under weak conditions about the rationality of the revision operators used by the agents, the set of candidate formulae has the form of a logical interval. We explain how the bounds of this interval can be tightened when the revision operators used by the agents are known and/or when $\mu$ is known to be independent from a given set of variables. We also investigate the completeness issue, i.e., whether $\mu$ can be exactly identified. We present some sufficient conditions for it, identify its computational complexity, and report the results of some experiments about it.

• #10975
(Sister Conferences Best Papers Track) Meta-Interpretive Learning Using HEX-Programs
Tobias Kaminski, Thomas Eiter, Katsumi Inoue
Non-monotonic Reasoning

Meta-Interpretive Learning (MIL) is a recent approach for Inductive Logic Programming (ILP) implemented in Prolog. Alternatively, MIL-problems can be solved by using Answer Set Programming (ASP), which may result in performance gains due to efficient conflict propagation. However, a straightforward MIL-encoding results in a huge size of the ground program and search space. To address these challenges, we encode MIL in the HEX-extension of ASP, which mitigates grounding issues, and we develop novel pruning techniques.

### Tuesday 1316:30 - 18:00CV|BFGR - Biometrics, Face and Gesture Recognition (R7)

Chair: TBD
• #2347
Face Photo-Sketch Synthesis via Knowledge Transfer
Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Li, Zhifeng Li
Biometrics, Face and Gesture Recognition

Despite deep neural networks have demonstrated strong power in face photo-sketch synthesis task, their performance, however, are still limited by the lack of training data (photo-sketch pairs). Knowledge Transfer (KT), which aims at training a smaller and fast student network with the information learned from a larger and accurate teacher network, has attracted much attention recently due to its superior performance in the acceleration and compression of deep neural networks. This work has brought us great inspiration that we can train a relatively small student network on very few training data by transferring knowledge from a larger teacher model trained on enough training data for other tasks. Therefore, we propose a novel knowledge transfer framework to synthesize face photos from face sketches or synthesize face sketches from face photos. Particularly, we utilize two teacher networks trained on large amount of data in related task to learn the knowledge of face photos and face sketches separately and transfer them to two student networks simultaneously. In addition, the two student networks, one for photo ? sketch task and the other for sketch ? photo task, can transfer their knowledge mutually. With the proposed method, we can train our model which has superior performance using a small set of photo-sketch pairs. We validate the effectiveness of our method across several datasets. Quantitative and qualitative evaluations illustrate that our model outperforms other state-of-the-art methods in generating face sketches (or photos) with high visual quality and recognition ability.

• #2386
Pose-preserving Cross Spectral Face Hallucination
Junchi Yu, Jie Cao, Yi Li, Xiaofei Jia, Ran He
Biometrics, Face and Gesture Recognition

To narrow the inherent sensing gap in heterogeneous face recognition (HFR), recent methods have resorted to generative models and explored the ?recognition via generation? framework. Even though, it remains a very challenging task to synthesize photo-realistic visible faces (VIS) from near-infrared (NIR) images especially when paired training data are unavailable. We present an approach to avert the data misalignment problem and faithfully preserve pose, expression and identity information during cross-spectral face hallucination. At the pixel level, we introduce an unsupervised attention mechanism to warping that is jointly learned with the generator to derive pixel-wise correspondence from unaligned data. At the image level, an auxiliary generator is employed to facilitate the learning of mapping from NIR to VIS domain. At the domain level, we first apply the mutual information constraint to explicitly measure the correlation between domains and thus benefit synthesis. Extensive experiments on three heterogeneous face datasets demonstrate that our approach not only outperforms current state-of-the-art HFR methods but also produce visually appealing results at a high resolution.

• #2439
Multi-Margin based Decorrelation Learning for Heterogeneous Face Recognition
Bing Cao, Nannan Wang, Xinbo Gao, Jie Li, Zhifeng Li
Biometrics, Face and Gesture Recognition

Heterogeneous face recognition (HFR) refers to matching face images acquired from different domains with wide applications in security scenarios. However, HFR is still a challenging problem due to the significant cross-domain discrepancy and the lacking of sufficient training data in different domains. This paper presents a deep neural network approach namely Multi-Margin based Decorrelation Learning (MMDL) to extract decorrelation representations in a hyperspherical space for cross-domain face images. The proposed framework can be divided into two components: heterogeneous representation network and decorrelation representation learning. First, we employ a large scale of accessible visual face images to train heterogeneous representation network. The decorrelation layer projects the output of the first component into decorrelation latent subspace and obtain decorrelation representation. In addition, we design a multi-margin loss (MML), which consists of tetradmargin loss (TML) and heterogeneous angular margin loss (HAML), to constrain the proposed framework. Experimental results on two challenging heterogeneous face databases show that our approach achieves superior performance on both verification and recognition tasks, comparing with state-of-the-art methods.

• #3952
High Performance Gesture Recognition via Effective and Efficient Temporal Modeling
Yang Yi, Feng Ni, Yuexin Ma, Xinge Zhu, Yuankai Qi, Riming Qiu, Shijie Zhao, Feng Li, Yongtao Wang
Biometrics, Face and Gesture Recognition

State-of-the-art hand gesture recognition methods have investigated the spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from the inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on the 1D convolutional neural networks and propose a simple and efficient architectural unit, Multi-Kernel Temporal Block (MKTB), that models the multi-scale temporal responses by explicitly applying different temporal kernels. Then, we present a Global Refinement Block (GRB), which is an attention module for shaping the global temporal features based on the cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore the spatiotemporal features within tolerable computational cost. Extensive experiments conducted on public datasets demonstrate that our proposed model achieves the state-of-the-art with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules and the experiments on other tasks, like video understanding and video-based person re-identification, also display their good performance in efficiency and capability of generalization.

• #5053
Attribute-Aware Convolutional Neural Networks for Facial Beauty Prediction
Luojun Lin, Lingyu Liang, Lianwen Jin, Weijie Chen
Biometrics, Face and Gesture Recognition

Facial beauty prediction (FBP) aims to develop a machine that automatically makes facial attractiveness assessment. To a large extent, the perception of facial beauty for a human is involved with the attributes of facial appearance, which provides some significant visual cues for FBP. Deep convolution neural networks (CNNs) have shown its power for FBP, but convolution filters with fixed parameters cannot take full advantage of the facial attributes for FBP. To address this problem, we propose an Attribute-aware Convolutional Neural Network (AaNet) that modulates the filters of the main network, adaptively, using parameter generators that take beauty-related attributes as extra inputs. The parameter generators update the filters in the main network in two different manners: filter tuning or filter rebirth. However, AaNet takes attributes information as prior knowledge, that is ill-suited to those datasets merely with task-oriented labels. Therefore, imitating the design of AaNet, we further propose a Pseudo Attribute-aware Convolutional Neural Network (P-AaNet) that modulates filters conditioned on global context embeddings (pseudo attributes) of input faces learnt by a lightweight pseudo attribute distiller. Extensive ablation studies show that the AaNet and P-AaNet improve the performance of FBP when compared to conventional convolution and attention scheme, which validates the effectiveness of our method.

• #30
Dense Temporal Convolution Network for Sign Language Translation
Dan Guo, Shuo Wang, Qi Tian, Meng Wang
Biometrics, Face and Gesture Recognition

The sign language translation (SLT) which aims at translating a sign language video into natural language is a weakly supervised task, given that there is no exact mapping relationship between visual actions and textual words in a sentence label. To align the sign language actions and translate them into the respective words automatically, this paper proposes a dense temporal convolution network, termed DenseTCN which captures the actions in hierarchical views. Within this network, a temporal convolution (TC) is designed to learn the short-term correlation among adjacent features and further extended to a dense hierarchical structure. In the kth TC layer, we integrate the outputs of all preceding layers together: (1) The TC in a deeper layer essentially has larger receptive fields, which captures long-term temporal context by the hierarchical content transition. (2) The integration addresses the SLT problem by different views, including embedded short-term and extended longterm sequential learning. Finally, we adopt the CTC loss and a fusion strategy to learn the featurewise classification and generate the translated sentence. The experimental results on two popular sign language benchmarks, i.e. PHOENIX and USTCConSents, demonstrate the effectiveness of our proposed method in terms of various measurements.

### Tuesday 1316:30 - 18:00KRR|DLO - Description Logics and Ontologies 1 (R8)

Chair: TBD
• #2773
Augmenting Transfer Learning with Semantic Reasoning
Freddy Lécué, Jiaoyan Chen, Jeff Z. Pan, Huajun Chen
Description Logics and Ontologies 1

Transfer learning aims at building robust prediction models by transferring knowledge gained from one problem to another. In the semantic Web, learn- ing tasks are enhanced with semantic representations. We exploit their semantics to augment transfer learning by dealing with when to transfer with semantic measurements and what to transfer with semantic embeddings. We further present a general framework that integrates the above measurements and embeddings with existing transfer learning algorithms for higher performance. It has demonstrated to be robust in two real-world applications: bus delay forecasting and air quality forecasting.

• #2902
Oblivious and Semi-Oblivious Boundedness for Existential Rules
Pierre Bourhis, Michel Leclère, Marie-Laure Mugnier, Sophie Tison, Federico Ulliana, Lily Gallois
Description Logics and Ontologies 1

We study the notion of boundedness in the context positive existential rules, that is, wether there exists an upper bound to the depth of the chase procedure, that is independent from the initial instance. By focussing our attention on the oblivious and the semi-oblivious chase variants, we give a characterization of boundedness in terms of FO-rewritability and chase termination. We show that it is decidable to recognize if a set of rules is bounded for several classes of rules and outline the complexity of the problem.

• #4008
Semantic characterization of data services through ontologies
Gianluca Cima, Maurizio Lenzerini, Antonella Poggi
Description Logics and Ontologies 1

We study the problem of associating formal semantic descriptions to data services. We base our proposal on the Ontology-based Data Access paradigm, where a domain ontology is used to provide a semantic layer mapped to the data sources of an organization. The basic idea is to explain the semantics of a data service in terms of a query over the ontology. We illustrate a formal framework for this problem, based on the notion of source-to-ontology (s-to-o) rewriting, which comes in three variants, called sound, complete and perfect, respectively. We present a thorough complexity analysis of two computational problems, namely verification (checking whether a query is an s-to-o rewriting of a given data service), and computation (computing an s-to-o rewriting of a data service).

• #5437
Revisiting Controlled Query Evaluation in Description Logics
Domenico Lembo, Riccardo Rosati, Domenico Fabio Savo
Description Logics and Ontologies 1

Controlled Query Evaluation (CQE) is a confidentiality-preserving framework in which private information is protected through a policy, and a (optimal) censor guarantees that answers to queries are maximized without violating the policy. CQE has been recently studied in the context of ontologies, where the focus has been mainly on the problem of the existence of an optimal censor. In this paper we instead study query answering over all possible optimal censors. We establish data complexity of this problem for ontologies specified in the Description Logics DL-Lite_R and EL_bottom and for variants of the censor language, which is the language used by the censor to enforce the policy. In our investigation we also analyze the relationship between CQE and the problem of Consistent Query Answering (CQA). Some of the complexity results we provide are indeed proved through mutual reduction between CQE and CQA.

• #925
Satisfaction and Implication of Integrity Constraints in Ontology-based Data Access
Charalampos Nikolaou, Bernardo Cuenca Grau, Egor V. Kostylev, Mark Kaminski, Ian Horrocks
Description Logics and Ontologies 1

We extend ontology-based data access with integrity constraints over both the source and target schemas. The relevant reasoning problems in this setting are constraint satisfaction—to check whether a database satisfies the target constraints given the mappings and the ontology—and source-to-target (resp., target-to-source) constraint implication, which is to check whether a target constraint (resp., a source constraint) is satisfied by each database satisfying the source constraints (resp., the target constraints). We establish decidability and complexity bounds for all these problems in the case where ontologies are expressed in DL-LiteR and constraints range from functional dependencies to disjunctive tuple-generating dependencies.

• #2486
Learning Description Logic Concepts: When can Positive and Negative Examples be Separated?
Maurice Funk, Jean Christoph Jung, Carsten Lutz, Hadrien Pulcini, Frank Wolter
Description Logics and Ontologies 1

Learning description logic (DL) concepts from positive and negative examples given in the form of labeled data items in a KB has received significant attention in the literature. We study the fundamental question of when a separating DL concept exists and provide useful model-theoretic characterizations as well as complexity results for the associated decision problem. For expressive DLs such as ALC and ALCQI, our characterizations show a surprising link to the evaluation of ontology-mediated (unions of) conjunctive queries. We exploit this to determine the combined complexity (between ExpTime and NExpTime) and data complexity (second level of the polynomial hierarchy) of separability. For the Horn DL EL, separability is ExpTime-complete both in combined and in data complexity while for its modest extension ELI it is even undecidable. Separability is also undecidable when the KB is formulated in ALC and the separating concept is required to be in EL or ELI.

### Tuesday 1316:30 - 18:00NLP|NLS - Natural Language Semantics (R9)

Chair: TBD
• #1326
Aspect-Based Sentiment Classification with Attentive Neural Turing Machines
Qianren Mao, Jianxin Li, Senzhang Wang, Yuanning Zhang, Hao Peng, Min He, Lihong Wang
Natural Language Semantics

Aspect-based sentiment classification aims to identify sentiment polarity expressed towards a given opinion target in a sentence. The sentiment polarity of the target is not only highly determined by sentiment semantic context but also correlated with the concerned opinion target. Existing works cannot effectively capture and store the inter-dependence between the opinion target and its context.To solve this issue, we propose a novel model of Attentive Neural Turing Machines (ANTM). Via interactive read-write operations between an external memory storage and a recurrent controller, ANTM can learn the dependable correlation of the opinion target to context and concentrate on crucial sentiment information. Specifically, ANTM separates the information of storage and computation, which extends the capabilities of the controller to learn and store sequential features. The read and write operations enable ANTM to adaptively keep track of the interactive attention history between memory content and controller state. We append target entity embeddings into both input and output of the controller in order to augment the integration of target information. We evaluate our model on SemEval2014 dataset which contains reviews of Laptop and Restaurant domains and Twitter review dataset. Experimental results verify that our model achieves state-of-the-art performance on aspect-based sentiment classification.

• #2826
A Latent Variable Model for Learning Distributional Relation Vectors
Jose Camacho-Collados, Luis Espinosa-Anke, Shoaib Jameel, Steven Schockaert
Natural Language Semantics

Recently a number of unsupervised approaches have been proposed for learning vectors that capture the relationship between two words. Inspired by word embedding models, these approaches rely on co-occurrence statistics that are obtained from sentences in which the two target words appear. However, the number of such sentences is often quite small, and most of the words that occur in them are not relevant for characterizing the considered relationship. As a result, standard co-occurrence statistics typically lead to noisy relation vectors. To address this issue, we propose a latent variable model that aims to explicitly determine what words from the given sentences best characterize the relationship between the two target words. Relation vectors then correspond to the parameters of a simple unigram language model which is estimated from these words.

• #4815
Dual-View Variational Autoencoders for Semi-Supervised Text Matching
Zhongbin Xie, Shuai Ma
Natural Language Semantics

Semantically matching two text sequences (usually two sentences) is a fundamental problem in NLP. Most previous methods either encode each of the two sentences into a vector representation (sentence-level embedding) or leverage word-level interaction features between the two sentences. In this study, we propose to take the sentence-level embedding features and the word-level interaction features as two distinct views of a sentence pair, and unify them with a framework of Variational Autoencoders such that the sentence pair is matched in a semi-supervised manner. The proposed model is referred to as Dual-View Variational AutoEncoder (DV-VAE), where the optimization of the variational lower bound can be interpreted as an implicit Co-Training mechanism for two matching models over distinct views. Experiments on SNLI, Quora and a Community Question Answering dataset demonstrate the superiority of our DV-VAE over several strong semi-supervised and supervised text matching models.

• #296
TransMS: Knowledge Graph Embedding for Complex Relations by Multidirectional Semantics
Shihui Yang, Jidong Tian, Honglun Zhang, Junchi Yan, Hao He, Yaohui Jin
Natural Language Semantics

Knowledge graph embedding, which projects the symbolic relations and entities onto low-dimension continuous spaces, is essential to knowledge graph completion. Recently, translation-based embedding models (e.g. TransE) have aroused increasing attention for their simplicity and effectiveness. These models attempt to translate semantics from head entities to tail entities with the relations and infer richer facts outside the knowledge graph. In this paper, we propose a novel knowledge graph embedding method named TransMS, which translates and transmits multidirectional semantics: i) the semantics of head/tail entities and relations to tail/head entities with nonlinear functions and ii) the semantics from entities to relations with linear bias vectors. Our model has merely one additional parameter α than TransE for each triplet, which results in its better scalability in large-scale knowledge graph. Experiments show that TransMS achieves substantial improvements against state-of-the-art baselines, especially the Hit@10s of head entity prediction for N-1 relations and tail entity prediction for 1-N relations improved by about 27.1% and 24.8% on FB15K database respectively.

• #3445
CNN-Based Chinese NER with Lexicon Rethinking
Tao Gui, Ruotian Ma, Qi Zhang, Lujun Zhao, Yu-Gang Jiang, Xuanjing Huang
Natural Language Semantics

Character-level Chinese named entity recognition (NER) that applies long short-term memory (LSTM) to incorporate lexicons has achieved great success. However, this method fails to fully exploit GPU parallelism and candidate lexicons can conflict. In this work, we propose a faster alternative to Chinese NER: a convolutional neural network (CNN)-based method that incorporates lexicons using a rethinking mechanism. The proposed method can model all the characters and potential words that match the sentence in parallel. In addition, the rethinking mechanism can address the word conflict by feeding back the high-level features to refine the networks. Experimental results on four datasets show that the proposed method can achieve better performance than both word-level and character-level baseline methods. In addition, the proposed method performs up to 3.21 times faster than state-of-the-art methods, while realizing better performance.

• #3652
Learning Task-specific Representation for Novel Words in Sequence Labeling
Minlong Peng, Qi Zhang, Xiaoyu Xing, Tao Gui, Jinlan Fu, Xuanjing Huang
Natural Language Semantics

Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually poor for appreciable performance. This is commonly referred to as the out-of-vocabulary (OOV) problem. In this work, we address the OOV problem in sequence labeling using only training data of the task. To this end, we propose a novel method to predict representations for OOV words from their surface-forms (e.g., character sequence) and contexts. The method is specifically designed to avoid the error propagation problem suffered by existing approaches in the same paradigm. To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech tagging (POS) tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method can achieve better or competitive performance on the OOV problem compared with existing state-of-the-art methods.

### Tuesday 1316:30 - 18:00CV|RDCIMRSI - Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2 (R10)

Chair: TBD
• #416
Resolution-invariant Person Re-Identification
Shunan Mao, Shiliang Zhang, Ming Yang
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Exploiting resolution invariant representation is critical for person Re-Identification (ReID) in real applications, where the resolutions of captured person images may vary dramatically. This paper learns person representations robust to resolution variance through jointly training a Foreground-Focus Super-Resolution (FFSR) module and a Resolution-Invariant Feature Extractor (RIFE) by end-to-end CNN learning. FFSR upscales the person foreground using a fully convolutional auto-encoder with skip connections learned with a foreground focus training loss. RIFE adopts two feature extraction streams weighted by a dual-attention block to learn features for low and high resolution images, respectively. These two complementary modules are jointly trained, leading to a strong resolution invariant representation. We evaluate our methods on five datasets containing person images at a large range of resolutions, where our methods show substantial superiority to existing solutions. For instance, we achieve Rank-1 accuracy of 36.4% and 73.3% on CAVIAR and MLR-CUHK03, outperforming the state-of-the art by 2.9% and 2.6%, respectively.

• #690
Deep Light-field-driven Saliency Detection from a Single View
Yongri Piao, Zhengkun Rong, Miao Zhang, Xiao Li, Huchuan Lu
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Previous 2D saliency detection methods extract salient cues from a single view and directly predict the expected results. Both traditional and deep-learning-based 2D methods do not consider geometric information of 3D scenes. Therefore the relationship between scene understanding and salient objects cannot be effectively established. This limits the performance of 2D saliency detection in challenging scenes. In this paper, we show for the first time that saliency detection problem can be reformulated as two sub-problems: light field synthesis from a single view and light-field-driven saliency detection. We propose a high-quality light field synthesis network to produce reliable 4D light field information. Then we propose a novel light-field-driven saliency detection network with two purposes, that is, i) richer saliency features can be produced for effective saliency detection; ii) geometric information can be considered for integration of multi-view saliency maps in a view-wise attention fashion. The whole pipeline can be trained in an end-to-end fashion. For training our network, we introduce the largest light field dataset for saliency detection, containing 1580 light fields that cover a wide variety of challenging scenes. With this new formulation, our method is able to achieve state-of-the-art performance.

• #1361
Generalized Zero-Shot Vehicle Detection in Remote Sensing Imagery via Coarse-to-Fine Framework
Hong Chen, Yongtan Luo, Liujuan Cao, Baochang Zhang, Guodong Guo, Cheng Wang, Jonathan Li, Rongrong Ji
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Vehicle detection and recognition in remote sensing images are challenging, especially when only limited training data are available to accommodate various target categories. In this paper, we introduce a novel coarse-to-fine framework, which decomposes vehicle detection into segmentation-based vehicle localization and generalized zero-shot vehicle classification. Particularly, the proposed framework can well handle the problem of generalized zero-shot vehicle detection, which is challenging due to the requirement of recognizing vehicles that are even unseen during training. Specifically, a hierarchical DeepLab v3 model is proposed in the framework, which fully exploits fine-grained features to locate the target on a pixel-wise level, then recognizes vehicles in a coarse-grained manner. Additionally, the hierarchical DeepLab v3 model is beneficially compatible to combine the generalized zero-shot recognition. To the best of our knowledge, there is no publically available dataset to test comparative methods, we therefore construct a new dataset to fill this gap of evaluation. The experimental results show that the proposed framework yields promising results on the imperative yet difficult task of zero-shot vehicle detection and recognition.

• #6134
MSR: Multi-Scale Shape Regression for Scene Text Detection
Chuhui Xue, Shijian Lu, Wei Zhang
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

State-of-the-art scene text detection techniques predict quadrilateral boxes that are prone to localization errors while dealing with straight or curved text lines of different orientations and lengths in scenes. This paper presents a novel multi-scale shape regression network (MSR) that is capable of locating text lines of different lengths, shapes and curvatures in scenes. The proposed MSR detects scene texts by predicting dense text boundary points that inherently capture the location and shape of text lines accurately and are also more tolerant to the variation of text line length as compared with the state of the arts using proposals or segmentation. Additionally, the multi-scale network extracts and fuses features at different scales which demonstrates superb tolerance to the text scale variation. Extensive experiments over several public datasets show that the proposed MSR obtains superior detection performance for both curved and straight text lines of different lengths and orientations.

• #3846
Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval
Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Product Quantization (PQ) has long been a mainstream for generating an exponentially large codebook at very low memory/time cost. Despite its success, PQ is still tricky for the decomposition of high-dimensional vector space, and the retraining of model is usually unavoidable when the code length changes. In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval. DPQ learns the quantization codes sequentially and approximates the original feature space progressively. Therefore, we can train the quantization codes with different code lengths simultaneously. Specifically, we first utilize the label information for guiding the learning of visual features, and then apply several quantization blocks to progressively approach the visual features. Each quantization block is designed to be a layer of a convolutional neural network, and the whole framework can be trained in an end-to-end manner. Experimental results on the benchmark datasets show that our model significantly outperforms the state-of-the-art for image retrieval. Our model is trained once for different code lengths and therefore requires less computation time. Additional ablation study demonstrates the effect of each component of our proposed model. Our code is released at https://github.com/cfm-uestc/DPQ.

• #1370
LRDNN: Local-refining based Deep Neural Network for Person Re-Identification with Attribute Discerning
Qinqin Zhou, Bineng Zhong, Xiangyuan Lan, Gan Sun, Yulun Zhang, Mengran Gou
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 2

Recently, pose or attribute information has been widely used to solve person re-identification (re-ID) problem. However, the inaccurate output from pose or attribute modules will impair the final person re-ID performance. Since re-ID, pose estimation and attribute recognition are all based on the person appearance information, we propose a Local-refining based Deep Neural Network (LRDNN) to aggregate pose estimation and attribute recognition to improve the re-ID performance. To this end, we add a pose branch to extract the local spatial information and optimize the whole network on both person identity and attribute objectives. To diminish the negative affect from unstable pose estimation, a novel structure called channel parse block (CPB) is introduced to learn weights on different feature channels in pose branch. Then two branches are combined with compact bilinear pooling. Experimental results on Market1501 and DukeMTMC-reid datasets illustrate the effectiveness of the proposed method.

### Tuesday 1316:30 - 18:00ML|C - Classification 3 (R11)

Chair: TBD
• #183
Margin Learning Embedded Prediction for Video Anomaly Detection with A Few Anomalies
Wen Liu, Weixin Luo, Zhengxin Li, Peilin Zhao, Shenghua Gao
Classification 3

Classical semi-supervised video anomaly detection assumes that only normal data are available in the training set because of the rare and unbounded nature of anomalies. It is obviously, however, these infrequently observed abnormal events can actually help with the detection of identical or similar abnormal events, a line of thinking that motivates us to study open-set supervised anomaly detection with only a few types of abnormal observed events and many normal events available. Under the assumption that normal events can be well predicted, we propose a Margin Learning Embedded Prediction (MLEP) framework. There are three features in MLEP- based open-set supervised video anomaly detection: i) we customize a video prediction framework that favors the prediction of normal events and distorts the prediction of abnormal events; ii) The margin learning framework learns a more compact normal data distribution and enlarges the margin between normal and abnormal events. Since abnormal events are unbounded, our framework consequently helps with the detection of abnormal events, even for anomalies that have never been previously observed. Therefore, our framework is suitable for the open-set supervised anomaly detection setting; iii) our framework can readily handle both frame-level and video-level anomaly annotations. Considering that video-level anomaly detection is more easily annotated in practice and that anomaly detection with a few anomalies is a more practical setting, our work thus pushes the application of anomaly detection towards real scenarios. Extensive experiments validate the effectiveness of our framework for anomaly detection.

• #1214
Comprehensive Semi-Supervised Multi-Modal Learning
Yang Yang, Ke-Tao Wang, De-Chuan Zhan, Hui Xiong, Yuan Jiang
Classification 3

Multi-modal learning refers to the process of learning a precise model to represent the joint representations of different modalities. Despite its promise for multi-modal learning, the co-regularization method is based on the consistency principle with a sufficient assumption, which usually does not hold for real-world multi-modal data. Indeed, due to the modal insufficiency in real-world applications, there are divergences among heterogeneous modalities. This imposes a critical challenge for multi-modal learning. To this end, in this paper, we propose a novel Comprehensive Multi-Modal Learning (CMML) framework, which can strike a balance between the consistency and divergency modalities by considering the insufficiency in one unified framework. Specifically, we utilize an instance level attention mechanism to weight the sufficiency for each instance on different modalities. Moreover, novel diversity regularization and robust consistency metrics are designed for discovering insufficient modalities. Our empirical studies show the superior performances of CMML on real-world data in terms of various criteria.

• #5482
Exploiting Interaction Links for Node Classification with Deep Graph Neural Networks
Hogun Park, Jennifer Neville
Classification 3

Node classification is an important problem in relational machine learning. However, in scenarios where graph edges represent interactions among the entities (e.g., over time), the majority of current methods either summarize the interaction information into link weights or aggregate the links to produce a static graph. In this paper, we propose a neural network architecture that jointly captures both temporal and static interaction patterns, which we call Temporal-Static-Graph-Net (TSGNet). Our key insight is that leveraging both a static neighbor encoder, which can learn aggregate neighbor patterns, and a graph neural network-based recurrent unit, which can capture complex interaction patterns, improve the performance of node classification. In our experiments on node classification tasks, TSGNet produces significant gains compared to state-of-the-art methods—reducing classification error up to 24% and an average of 10% compared to the best competitor on four real-world networks and one synthetic dataset.

• #5854
Automated Machine Learning with Monte-Carlo Tree Search
Herilalaina Rakotoarison, Marc Schoenauer, Michèle Sebag
Classification 3

The AutoML approach aims to deliver peak performance from a machine learning  portfolio on the dataset at hand. A Monte-Carlo Tree Search Algorithm Selection and Configuration (Mosaic) approach is presented to tackle this mixed (combinatorial and continuous) expensive optimization problem on the structured search space of ML pipelines. Extensive lesion studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian Optimization or Monte Carlo Tree Search (MCTS); ii) its warm-start initialization based on meta-features or random runs; iii) the ensembling of the solutions gathered along the search. Mosaic is assessed on the OpenML 100 benchmark and the Scikit-learn portfolio, with statistically significant gains over AutoSkLearn, winner of all former AutoML challenges.

• #2178
Multi-Class Learning using Unlabeled Samples: Theory and Algorithm
Jian Li, Yong Liu, Rong Yin, Weiping Wang
Classification 3

In this paper, we investigate the generalization performance of multi-class classification, for which we obtain a shaper error bound by using the notion of local Rademacher complexity and additional unlabeled samples, substantially improving the state-of-the-art bounds in existing multi-class learning methods. The statistical learning motivates us to devise an efficient multi-class learning framework with the local Rademacher complexity and Laplacian regularization. Coinciding with the theoretical analysis, experimental results demonstrate that the stated approach achieves better performance.

• #3777
Accelerated Incremental Gradient Descent using Momentum Acceleration with Scaling Factor
Yuanyuan Liu, Fanhua Shang, Licheng Jiao
Classification 3

Recently, research on variance reduced incremental gradient descent methods (e.g., SAGA) has made exciting progress (e.g., linear convergence for strongly convex (SC) problems). However, existing accelerated methods (e.g., point-SAGA) suffer from drawbacks such as inflexibility. In this paper, we design a novel and simple momentum to accelerate the classical SAGA algorithm, and propose a direct accelerated incremental gradient descent algorithm. In particular, our theoretical result shows that our algorithm attains a best known oracle complexity for strongly convex problems and an improved convergence rate for the case of n>=L/\mu. We also give experimental results justifying our theoretical results and showing the effectiveness of our algorithm.

### Tuesday 1316:30 - 18:00ML|DM - Data Mining 3 (R12)

Chair: TBD
• #3254
Learning Network Embedding with Community Structural Information
Yu Li, Ying Wang, Tingting Zhang, Jiawei Zhang, Yi Chang
Data Mining 3

Network embedding is an effective approach to learn the low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods exclusively focus on vertex proximity of networks, while ignoring the network internal community structure. However, the homophily principle indicates that vertices within the same community are more similar to each other than those from different communities, thus vertices within the same community should have similar vertex representations. Motivated by this, we propose a novel network embedding framework NECS to learn the Network Embedding with Community Structural information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem into a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks including network reconstruction, link prediction and vertex classification.

• #5016
Unified Embedding Model over Heterogeneous Information Network for Personalized Recommendation
Zekai Wang, Hongzhi Liu, Yingpeng Du, Zhonghai Wu, Xing Zhang
Data Mining 3

Most of heterogeneous information network (HIN) based recommendation models are based on the user and item modeling with meta-paths. However, they always model users and items in isolation under each meta-path, which may lead to information extraction misled. In addition, they only consider structural features of HINs when modeling users and items during exploring HINs, which may lead to useful information for recommendation lost irreversibly. To address these problems, we propose a HIN based unified embedding model for recommendation, called HueRec. We assume there exist some common characteristics under different meta-paths for each user or item, and use data from all meta-paths to learn unified users’ and items’ representations. So the interrelation between meta-paths are utilized to alleviate the problems of data sparsity and noises on one meta-path. Different from existing models which first explore HINs then make recommendations, we combine these two parts into an end-to-end model to avoid useful information lost in initial phases. In addition, we embed all users, items and meta-paths into related latent spaces. Therefore, we can measure users’ preferences on meta-paths to improve the performances of personalized recommendation. Extensive experiments show HueRec consistently outperforms state-of-the-art methods.

• #5394
Metric Learning on Healthcare Data with Incomplete Modalities
Qiuling Suo, Weida Zhong, Fenglong Ma, Ye Yuan, Jing Gao, Aidong Zhang
Data Mining 3

Utilizing multiple modalities to learn a good distance metric is of vital importance for various clinical applications. However, it is common that modalities are incomplete for some patients due to various technical and practical reasons in healthcare datasets. Existing metric learning methods cannot directly learn the distance metric on such data with missing modalities. Nevertheless, the incomplete data contains valuable information to characterize patient similarity and modality relationships, and they should not be ignored during the learning process. To tackle the aforementioned challenges, we propose a metric learning framework to perform missing modality completion and multi-modal metric learning simultaneously. Employing the generative adversarial networks, we incorporate both complete and incomplete data to learn the mapping relationship between modalities. After completing the missing modalities, we use the nonlinear representations extracted by the discriminator to learn the distance metric among patients. Through jointly training the adversarial generation part and metric learning, the similarity among patients can be learned on data with missing modalities. Experimental results show that the proposed framework learns more accurate distance metric on real-world healthcare datasets with incomplete modalities, comparing with the state-of-the-art approaches. Meanwhile, the quality of the generated modalities can be preserved.

• #316
Joint Link Prediction and Network Alignment via Cross-graph Embedding
Xingbo Du, Junchi Yan, Hongyuan Zha
Data Mining 3

Link prediction and network alignment are two important problems in social network analysis and other network related applications. Considerable efforts have been devoted to these two problems while often in an independent way to each other. In this paper we argue that these two tasks are relevant and present a joint link prediction and network alignment framework, whereby a novel cross-graph node embedding technique is devised to allow for information propagation. Our approach can either work with a few initial vertex correspondence as seeds, or from scratch. By extensive experiments on public benchmark, we show that link prediction and network alignment can benefit to each other especially for improving the recall for both tasks.

• #1709
Masked Graph Convolutional Network
Liang Yang, Fan Wu, Yingkui Wang, Junhua Gu, Yuanfang Guo
Data Mining 3

Semi-supervised classification is a fundamental technology to process the structured and unstructured data in machine learning field. The traditional attribute-graph based semi-supervised classification methods propagate labels over the graph which is usually constructed from the data features, while the graph convolutional neural networks smooth the node attributes, i.e., propagate the attributes, over the real graph topology. In this paper, they are interpreted from the perspective of propagation, and accordingly categorized into symmetric and asymmetric propagation based methods. From the perspective of propagation, both the traditional and network based methods are propagating certain objects over the graph. However, different from the label propagation, the intuition the connected data samples tend to be similar in terms of the attributes", in attribute propagation is only partially valid. Therefore, a masked graph convolution network (Masked GCN) is proposed by only propagating a certain portion of the attributes to the neighbours according to a masking indicator, which is learned for each node by jointly considering the attribute distributions in local neighbourhoods and the impact on the classification results. Extensive experiments on transductive and inductive node classification tasks have demonstrated the superiority of the proposed method.

• #3778
Improving Cross-lingual Entity Alignment via Optimal Transport
Shichao Pei, Lu Yu, Xiangliang Zhang
Data Mining 3

Cross-lingual entity alignment identifies entity pairs that share the same meanings but locate in different language knowledge graphs (KGs). The study in this paper is to address two limitations that widely exist in current solutions:  1) the alignment loss functions defined at the entity level serve well the purpose of aligning labeled entities but fail to match the whole picture of labeled and unlabeled entities in different KGs;  2) the translation from one domain to the other has been considered (e.g., X to Y by M1 or Y to X by M2). However, the important duality of alignment between different KGs  (X to Y by M1 and Y to X by M2) is ignored. We propose a novel entity alignment framework (OTEA), which dually optimizes the entity-level loss and group-level loss via optimal transport theory. We also impose a regularizer on the dual translation matrices to mitigate the effect of noise during transformation. Extensive experimental results show that our model consistently outperforms the state-of-the-arts with significant improvements on alignment accuracy.

### Tuesday 1316:30 - 18:00ML|LT - Learning Theory (R13)

Chair: TBD
• #531
Approximate Optimal Transport for Continuous Densities with Copulas
Jinjin Chi, Jihong Ouyang, Ximing Li, Yang Wang, Meng Wang
Learning Theory

Optimal Transport (OT) formulates a powerful framework by comparing probability distributions, and it has increasingly attracted great attention within the machine learning community. However, it suffers from severe computational burden, due to the intractable objective with respect to the distributions of interest. Especially, there still exist very few attempts for continuous OT, i.e., OT for comparing continuous densities. To this end, we develop a novel continuous OT method, namely Copula OT (Cop-OT). The basic idea is to transform the primal objective of continuous OT into a tractable form with respect to the copula parameter, which can be efficiently solved by stochastic optimization with less time and memory requirements. Empirical results on real applications of image retrieval and synthetic data demonstrate that our Cop-OT can gain more accurate approximations to continuous OT values than the state-of-the-art baselines.

• #848
Improved Algorithm on Online Clustering of Bandits
Shuai Li, Wei Chen, Shuai Li, Kwong-Sak Leung
Learning Theory

We generalize the setting of online clustering of bandits by allowing non-uniform distribution over user frequencies. A more efficient algorithm is proposed with simple set structures to represent clusters. We prove a regret bound for the new algorithm which is free of the minimal frequency over users. The experiments on both synthetic and real datasets consistently show the advantage of the new algorithm over existing methods.

• #1736
Heavy-ball Algorithms Always Escape Saddle Points
Tao Sun, Dongsheng Li, Zhe Quan, Hao Jiang, Shengguo Li, Yong Dou
Learning Theory

Nonconvex optimization algorithms with random initialization have attracted increasing attention recently. It has been showed that many first-order methods always avoid saddle points with random starting points. In this paper, we answer a question: can the nonconvex heavy-ball algorithms with random initialization avoid saddle points? The answer is yes! Direct using the existing proof technique for the heavy-ball algorithms is hard due to that each iteration of the heavy-ball algorithm consists of current and last points. It is impossible to formulate the algorithms as iteration like xk+1= g(xk) under some mapping g. To this end, we design a new mapping on a new space. With some transfers, the heavy-ball algorithm can be interpreted as iterations after this mapping. Theoretically, we prove that heavy-ball gradient descent enjoys larger stepsize than the gradient descent to escape saddle points to escape the saddle point. And the heavy-ball proximal point algorithm is also considered; we also proved that the algorithm can always escape the saddle point.

• #1738
BN-invariant Sharpness Regularizes the Training Model to Better Generalization
Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Learning Theory

It is arguably believed that flatter minima can generalize better. However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of loss over a \delta ball of parameters around minima, cannot give consistent measurement for scale invariant neural networks, e.g., networks with batch normalization layer. In this paper, we first propose a measure of sharpness, BN-sharpness, which gives consistent value for equivalent networks under BN. BN-sharpness achieves the property of scale invariance by scaling the integral diameter according to the magnitude of parameter. We then present a computation-efficient way to calculate the BN-sharpness approximately i.e., one dimensional integral along the sharpest" direction. Furthermore, we use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective. Our algorithm achieves considerably better performance than vanilla SGD over various  experiment settings.

• #2451
Conditions on Features for Temporal Difference-Like Methods to Converge
Marcus Hutter, Samuel Yang-Zhao, Sultan Javed Majeed
Learning Theory

The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution. In this paper, we provide a complete characterization of non-uniqueness issues for a large class of reinforcement learning algorithms, simultaneously unifying many counter-examples to convergence in a theoretical framework. We achieve this by proving a new condition on features that can determine whether the convergence assumptions are valid or non-uniqueness holds. We consider a general class of RL methods, which we call natural algorithms, whose solutions are characterized as the fixed point of a projected Bellman equation. Our main result proves that natural algorithms converge to the correct solution if and only if all the value functions in the approximation space satisfy a certain shape. This implies that natural algorithms are, in general, inherently prone to converge to the wrong solution for most feature choices even if the value function can be represented exactly. Given our results, we show that state aggregation-based features are a safe choice for natural algorithms and also provide a condition for finding convergent algorithms under other feature constructions.

• #5172
Motion Invariance in Visual Environments
Alessandro Betti, Marco Gori, Stefano Melacci
Learning Theory

The puzzle of computer vision might find new challenging solutions when we realize that most successful methods are working at image level, which is remarkably more difficult than processing directly visual streams, just as it happens in nature. In this paper, we claim that the processing of a stream of frames naturally leads to formulate the motion invariance principle, which enables the construction of a new theory of visual learning based on convolutional features. The theory addresses a number of intriguing questions that arise in natural vision, and offers a well-posed computational scheme for the discovery of convolutional filters over the retina. They are driven by the Euler- Lagrange differential equations derived from the principle of least cognitive action, that parallels the laws of mechanics. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario in which feature learning takes place by unsupervised processing of video signals. An experimental report of the theory is presented where we show that features extracted under motion invariance yield an improvement that can be assessed by measuring information-based indexes.

### Tuesday 1316:30 - 18:00CV|LV - Language and Vision 1 (R14)

Chair: TBD
• #801
Talking Face Generation by Conditional Recurrent Adversarial Network
Yang Song, Jingwen Zhu, Dawei Li, Andy Wang, Hairong Qi
Language and Vision 1

Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate the talking face video with accurate lip synchronization. Existing works either do not consider temporal dependency across video frames thus yielding abrupt facial and lip movement or are limited to the generation of talking face video for a specific person thus lacking generalization capacity. We propose a novel conditional recurrent generation network that incorporates both image and audio features in the recurrent unit for  temporal dependency. To achieve both image- and video-realism, a pair of spatial-temporal discriminators are included in the network for better image/video quality. Since accurate lip synchronization is essential to the success of talking face video generation, we also construct a lip-reading discriminator to boost the accuracy of lip synchronization. We also extend the network to model the natural pose and expression of talking face on the Obama Dataset. Extensive experimental results demonstrate the superiority of our framework over the state-of-the-arts in terms of visual quality, lip sync accuracy, and smooth transition pertaining to both lip and facial movement.

• #2791
Dynamically Visual Disambiguation of Keyword-based Image Search
Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Limin Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao
Language and Vision 1

Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits their performance is the problem of visual polysemy. To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation. Compared to existing methods, the primary advantage of our approach lies in that our approach can adapt to the dynamic changes in the search results. Our proposed framework consists of two major steps: we first discover and dynamically select the text queries according to the image search results, then we employ the proposed saliency-guided deep multi-instance learning network to remove outliers and learn classification models for visual disambiguation. Extensive experiments demonstrate the superiority of our proposed approach.

• #3549
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei
Language and Vision 1

Image paragraph generation is the task of producing a coherent story (usually a paragraph) that describes the visual content of an image. The problem nevertheless is not trivial especially when there are multiple descriptive and diverse gists to be considered for paragraph generation, which often happens in real images. A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure. In this paper, we present a new design --- Convolutional Auto-Encoding (CAE) that purely employs convolutional and deconvolutional auto-encoding framework for topic modeling on the region-level features of an image. Furthermore, we propose an architecture, namely CAE plus Long Short-Term Memory (dubbed as CAE-LSTM), that novelly integrates the learnt topics in support of paragraph generation. Technically, CAE-LSTM capitalizes on a two-level LSTM-based paragraph generation framework with attention mechanism. The paragraph-level LSTM captures the inter-sentence dependency in a paragraph, while sentence-level LSTM is to generate one sentence which is conditioned on each learnt topic. Extensive experiments are conducted on Stanford image paragraph dataset, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, CAE-LSTM increases CIDEr performance from 20.93% to 25.15%.

• #3674
Multi-Level Visual-Semantic Alignments with Relation-Wise Dual Attention Network for Image and Text Matching
Zhibin Hu, Yongsheng Luo, Jiong Lin, Yan Yan, Jian Chen
Language and Vision 1

Image-text matching is central to visual-semantic cross-modal retrieval and has been attracting extensive attention recently. Previous studies have been devoted to finding the latent correspondence between image regions and words, e.g., connecting key words to specific regions of salient objects. However, existing methods are usually committed to handle concrete objects, rather than abstract ones, e.g., a description of some action, which in fact are also ubiquitous in description texts of real-world. Two main challenges then arise with abstract objects. The first one is the absence of explicit connections between them, unlike their concrete counterparts. The second is that image regions and key words are in different spaces, so that not comparable to derive similarity. One therefore has to alternatively find the implicit and intrinsic connections between them. To deal with abstract objects, in this paper, we propose a relation-wise dual attention network (RDAN) for image-text matching. Specifically, we maintain an over-complete set that contains pairs of image regions and key words. Then built upon this set, we encode the local correlations and the global dependencies between regions and words by training a visual-semantic network. Then a dual pathway attention network is presented to infer the visual-semantic alignments and image-text similarity. Extensive experiments validate the efficacy of our method, by achieving the state-of-the-art performance on several public benchmark datasets.

• #4765
Densely Connected Attention Flow for Visual Question Answering
Fei Liu, Jing Liu, Zhiwei Fang, Richang Hong, Hanqing Lu
Language and Vision 1

Learning effective interactions between multi-modal features is at the heart of visual question answering (VQA). A common defect of the existing VQA approaches is that they only consider a very limited amount of interactions, which may be not enough to model latent complex image-question relations that are necessary for accurately answering questions. Therefore, in this paper, we propose a novel DCAF (Densely Connected Attention Flow) framework for modeling dense interactions. It densely connects all pairwise layers of the network via Attention Connectors, capturing fine-grained interplay between image and question across all hierarchical levels. The proposed Attention Connector efficiently connects the multi-modal features at any two layers with symmetric co-attention, and produces interaction-aware attention features. Experimental results on three publicly available datasets show that the proposed method achieves state-of-the-art performance.

• #2695
Video Interactive Captioning with Human Prompts
Aming Wu, Yahong Han, Yi Yang
Language and Vision 1

Video captioning aims at generating a proper sentence to describe the video content. As a video often includes rich visual content and semantic details, different people may be interested in different views. Thus the generated sentence always fails to meet the ad hoc expectations. In this paper, we make a new attempt that, we launch a round of interaction between a human and a captioning agent. After generating an initial caption, the agent asks for a short prompt from the human as a clue of his expectation. Then, based on the prompt, the agent could generate a more accurate caption. We name this process a new task of video interactive captioning (ViCap). Taking a video and an initial caption as input, we devise the ViCap agent which consists of a video encoder, an initial caption encoder, and a refined caption generator. We show that the ViCap can be trained via a full supervision (with ground-truth) way or a weak supervision (with only prompts) way. For the evaluation of ViCap, we first extend the MSRVTT with interaction ground-truth. Experimental results not only show the prompts can help generate more accurate captions, but also demonstrate the good performance of the proposed method.

### Tuesday 1316:30 - 18:00ML|LGM - Learning Graphical Models (R15)

Chair: TBD
• #1256
Dynamic Hypergraph Neural Networks
Jianwen Jiang, Yuxuan Wei, Yifan Feng, Jingxuan Cao, Yue Gao
Learning Graphical Models

In recent years, graph/hypergraph-based deep learning methods have attracted much attention from researchers. These deep learning methods take graph/hypergraph structure as prior knowledge in the model. However, hidden and important relations are not directly represented in the inherent structure. To tackle this issue, we propose a dynamic hypergraph neural networks framework (DHGNN), which is composed of the stacked layers of two modules: dynamic hypergraph construction (DHG) and hypergrpah convolution (HGC). Considering initially constructed hypergraph is probably not a suitable representation for data, the DHG module dynamically updates hypergraph structure on each layer. Then hypergraph convolution is introduced to encode high-order data relations in a hypergraph structure. The HGC module includes two phases: vertex convolution and hyperedge convolution, which are designed to aggregate feature among vertices and hyperedges, respectively. We have evaluated our method on standard datasets, the Cora citation network and Microblog dataset. Our method outperforms state-of-the-art methods. More experiments are conducted to demonstrate the effectiveness and robustness of our method to diverse data distributions.

• #3145
Neural Network based Continuous Conditional Random Field for Fine-grained Crime Prediction
Fei Yi, Zhiwen Yu, Fuzhen Zhuang, Bin Guo
Learning Graphical Models

Crime prediction has always been a crucial issue for public safety, and recent works have shown the effectiveness of taking spatial correlation, such as region similarity or interaction, for fine-grained crime modeling. In our work, we seek to reveal the relationship across regions for crime prediction using Continuous Conditional Random Field (CCRF). However, conventional CCRF would become impractical when facing a dense graph considering all relationship between regions. To deal with it, in this paper, we propose a Neural Network based CCRF (NN-CCRF) model that formulates CCRF into an end-to-end neural network framework, which could reduce the complexity in model training and improve the overall performance. We integrate CCRF with NN by introducing a Long Short-Term Memory (LSTM) component to learn the non-linear mapping from inputs to outputs of each region, and a modified Stacked Denoising AutoEncoder (SDAE) component for pairwise interactions modeling between regions. Experiments conducted on two different real-world datasets demonstrate the superiority of our proposed model over the state-of-the-art methods.

• #5018
Efficient Regularization Parameter Selection for Latent Variable Graphical Models via Bi-Level Optimization
Joachim Giesen, Frank Nussbaum, Christopher Schneider
Learning Graphical Models

Latent variable graphical models are an extension of Gaussian graphical models that decompose the precision matrix into a sparse and a low-rank component. These models can be learned with theoretical guarantees from data via a semidefinite program. This program features two regularization terms, one for promoting sparsity and one for promoting a low rank. In practice, however, it is not straightforward to learn a good model since the model highly depends on the regularization parameters that control the relative weight of the loss function and the two regularization terms. Selecting good regularization parameters can be modeled as a bi-level optimization problem, where the upper level optimizes some form of generalization error and the lower level provides a description of the solution gamut. The solution gamut is the set of feasible solutions for all possible values of the regularization parameters. In practice, it is often not feasible to describe the solution gamut efficiently. Hence, algorithmic schemes for approximating solution gamuts have been devised. One such scheme is Benson's generic vector optimization algorithm that comes with approximation guarantees. So far Benson's algorithm has not been used in conjunction with semidefinite programs like the latent variable graphical Lasso. Here, we develop an adaptive variant of Benson's algorithm for the semidefinite case and show that it keeps the known approximation and run time guarantees. Furthermore, Benson's algorithm turns out to be practically more efficient for the latent variable graphical model than the existing solution gamut approximation scheme on a wide range of data sets.

• #1125
Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers
Jingwen Ye, Xinchao Wang, Yixin Ji, Kairi Ou, Mingli Song
Learning Graphical Models

Many well-trained Convolutional Neural Network~(CNN) models have now been released online by developers for the sake of effortless reproducing. In this paper, we treat such pre-trained networks as teachers and explore how to learn a target student network for customized tasks, using multiple teachers that handle different tasks. We assume no human-labelled annotations are available, and each teacher model can be either single- or multi-task network, where the former is a degenerated case of the latter. The student model, depending on the customized tasks, learns the related knowledge filtered from the multiple teachers, and eventually masters the complete or a subset of expertise from all teachers. To this end, we adopt a layer-wise training strategy, which entangles the student's network block to be learned with the corresponding teachers. As demonstrated on several benchmarks, the learned student network achieves very promising results, even outperforming the teachers on the customized tasks.

• #3279
Large Scale Evolving Graphs with Burst Detection
Yifeng Zhao, Xiangwei Wang, Hongxia Yang, Le Song, Jie Tang
Learning Graphical Models

Analyzing large-scale evolving graphs are crucial for understanding the dynamic and evolutionary nature of social networks. Most existing works focus on discovering repeated and consistent temporal patterns, however, such patterns cannot fully explain the complexity observed in dynamic networks. For example, in recommendation scenarios, users sometimes purchase products on a whim during a window shopping.Thus, in this paper, we design and implement a novel framework called BurstGraph which can capture both recurrent and consistent patterns, and especially unexpected bursty network changes. The performance of the proposed algorithm is demonstrated on both a simulated dataset and a world-leading E-Commerce company dataset, showing that they are able to discriminate recurrent events from extremely bursty events in terms of action propensity.

• #3194
Parametric Manifold Learning of Gaussian Mixture Models
Ziquan Liu, Lei Yu, Janet H. Hsiao, Antoni B. Chan
Learning Graphical Models

The Gaussian Mixture Model (GMM) is among the most widely used parametric probability distributions for representing data. However, it is complicated to analyze the relationship among GMMs since they lie on a high-dimensional manifold. Previous works either perform clustering of GMMs, which learns a limited discrete latent representation, or kernel-based embedding of GMMs, which is not interpretable due to difficulty in computing the inverse mapping. In this paper, we propose Parametric Manifold Learning of GMMs (PML-GMM), which learns a parametric mapping from a low-dimensional latent space to a high-dimensional GMM manifold. Similar to PCA, the proposed mapping is parameterized by the principal axes for the component weights, means, and covariances, which are optimized to minimize the reconstruction loss measured using Kullback-Leibler divergence (KLD). As the KLD between two GMMs is intractable, we approximate the objective function by a variational upper bound, which is optimized by an EM-style algorithm. Moreover, We derive an efficient solver by alternating optimization of subproblems and exploit Monte Carlo sampling to escape from local minima. We demonstrate the effectiveness of PML-GMM through experiments on synthetic, eye-fixation, flow cytometry, and social check-in data.

### Tuesday 1317:30 - 18:00Early Career 2 - Early Career Spotlight 2 (R2)

Chair: TBD
• #11057
Integrating Learning with Game Theory for Societal Challenges
Fei Fang
Early Career Spotlight 2

Real-world problems often involve more than one decision makers, each with their own goals or preferences. While game theory is an established paradigm for reasoning strategic interactions between multiple decision-makers, its applicability in practice is often limited by the intractability of computing equilibria in large games, and the fact that the game parameters are sometimes unknown and the players are often not perfectly rational. On the other hand, machine learning and reinforcement learning have led to huge successes in various domains and can be leveraged to overcome the limitations of the game-theoretic analysis. In this paper, we introduce our work on integrating learning with computational game theory for addressing societal challenges such as security and sustainability.

### Wednesday 1408:30 - 09:20Invited Talk (R1)

• Empirical Model Learning: merging knowledge-based and data-driven decision models through machine learning
Michela Milano
Invited Talk
• ### Wednesday 1409:30 - 10:30AI-HWB - ST: AI for Improving Human Well-Being 1 (R2)

Chair: TBD
• #1213
Truly Batch Apprenticeship Learning with Deep Successor Features
Srivatsan Srinivasan, Donghun Lee, Finale Doshi-Velez
ST: AI for Improving Human Well-Being 1

We introduce a novel apprenticeship learning algorithm to learn an expert's underlying reward structure in off-policy model-free batch settings. Unlike existing methods that require hand-crafted features, on-policy evaluation, further data acquisition for evaluation policies or the knowledge of model dynamics, our algorithm requires only batch data (demonstrations) of the observed expert behavior.  Such settings are common in many real-world tasks---health care, finance, or industrial process control---where accurate simulators do not exist and additional data acquisition is costly.  We develop a transition-regularized imitation learning model to learn a rich feature representation and a near-expert initial policy that makes the subsequent batch inverse reinforcement learning process viable. We also introduce deep successor feature networks that perform off-policy evaluation to estimate feature expectations of candidate policies. Under the batch setting, our method achieves superior results on control benchmarks as well as a real clinical task of sepsis management in the Intensive Care Unit.

• #1239
Automatic Grassland Degradation Estimation Using Deep Learning
Xiyu Yan, Yong Jiang, Shuai Chen, Zihao He, Chunmei Li, Shu-Tao Xia, Tao Dai, Shuo Dong, Feng Zheng
ST: AI for Improving Human Well-Being 1

Grassland degradation estimation is essential to prevent global land desertification and sandstorms. Typically, the key to such estimation is to measure the coverage of indicator plants. However, traditional methods of estimation rely heavily on human eyes and manual labor, thus inevitably leading to subjective results and high labor costs. In contrast, deep learning-based image segmentation algorithms are potentially capable of automatic assessment of the coverage of indicator plants. Nevertheless, a suitable image dataset comprising grassland images is not publicly available. To this end, we build an original Automatic Grassland Degradation Estimation Dataset (AGDE-Dataset), with a large number of grassland images captured from the wild. Based on AGDE-Dataset, we are able to propose a brand new scheme to automatically estimate grassland degradation, which mainly consists of two components. 1) Semantic segmentation: we design a deep neural network with an improved encoder-decoder structure to implement semantic segmentation of grassland images. In addition, we propose a novel Focal-Hinge Loss to alleviate the class imbalance of semantics in the training stage.  2) Degradation estimation: we provide the estimation of grassland degradation based on the results of semantic segmentation. Experimental results show that the proposed method achieves satisfactory accuracy in grassland degradation estimation.

• #5979
DDL: Deep Dictionary Learning for Predictive Phenotyping
Tianfan Fu, Trong Nghia Hoang, Cao Xiao, Jimeng Sun
ST: AI for Improving Human Well-Being 1

Predictive phenotyping is about accurately predicting what phenotypes will occur in the next clinical visit based on longitudinal Electronic Health Record (EHR) data. Several deep learning (DL) models have demonstrated great performance in predictive phenotyping. However, these DL-based phenotyping models requires access to a large amount of labeled data, which are often  expensive to acquire. To address this label-insufficient challenge, we propose a deep dictionary learning framework (DDL) for phenotyping, which utilizes unlabeled data as a complementary source of information to generate a better, more succinct data representation. With extensive experiments on multiple real-world EHR datasets, we demonstrated DDL can outperform the state of the art predictive phenotyping methods on a wide variety of clinical tasks that require patient phenotyping such as heart failure classification, mortality prediction, and sequential prediction. All empirical results consistently show that unlabeled data can indeed be used to generate better data representation, which helps improve DDL's phenotyping performance over existing baseline methods that only uses labeled data.

• #203
Bidirectional Active Learning with Gold-Instance-Based Human Training
Feilong Tang
ST: AI for Improving Human Well-Being 1

Active learning was proposed to improve learning performance and reduce labeling cost. However, traditional relabeling-based schemes seriously limit the ability of active learning because human may repeatedly make similar mistakes, without improving their expertise. In this paper, we propose a Bidirectional Active Learning with human Training (BALT) model that can enhance human related expertise during labeling and improve relabelingquality accordingly. We quantitatively capture how gold instances can be used to both estimate labelers? previous performance and improve their future correctness ratio. Then, we propose the backward relabeling scheme that actively selects the most likely incorrectly labeled instances for relabeling. Experimental results on three real datasets demonstrate that our BALT algorithm significantly outperforms representative related proposals.

### Wednesday 1409:30 - 10:30ML|AML - Adversarial Machine Learning 1 (R3)

Chair: TBD
• #2984
Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization
Feihu Huang, Shangqian Gao, Songcan Chen, Heng Huang
Adversarial Machine Learning 1

Alternating direction method of multipliers (ADMM) is a popular optimization tool for the composite and constrained problems in machine learning. However, in many machine learning problems such as black-box attacks and bandit feedback, ADMM could fail because the explicit gradients of these problems are difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can effectively solve these problems due to that the objective function values are only required in the optimization. Recently, though there exist a few zeroth-order ADMM methods, they build on the convexity of objective function. Clearly, these existing zeroth-order methods are limited in many applications. In the paper, thus, we propose a class of fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth penalties, based on the coordinate smoothing gradient estimator. Moreover, we prove that both the ZO-SVRG-ADMM and ZO-SAGA-ADMM have convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In particular, our methods not only reach the best convergence rate $O(1/T)$ for the nonconvex optimization, but also are able to effectively solve many complex machine learning problems with multiple regularized penalties and constraints. Finally, we conduct the experiments of black-box binary classification and structured adversarial attack on black-box deep neural network to validate the efficiency of our algorithms.

• #5611
On the Effectiveness of Low Frequency Perturbations
Yash Sharma, Gavin Weiguang Ding, Marcus A. Brubaker
Adversarial Machine Learning 1

Carefully crafted, often imperceptible, adversarial perturbations have been shown to cause state-of-the-art models to yield extremely inaccurate outputs, rendering them unsuitable for safety-critical application domains. In addition, recent work has shown that constraining the attack space to a low frequency regime is particularly effective. Yet, it remains unclear whether this is due to generally constraining the attack search space or specifically removing high frequency components from consideration. By systematically controlling the frequency components of the perturbation, evaluating against the top-placing defense submissions in the NeurIPS 2017 competition, we empirically show that performance improvements in both the white-box and black-box transfer settings are yielded only when low frequency components are preserved. In fact, the defended models based on adversarial training are roughly as vulnerable to low frequency perturbations as undefended models, suggesting that the purported robustness of state-of-the-art ImageNet defenses is reliant upon adversarial perturbations being high frequency in nature. We do find that under $\ell_\infty$ $\epsilon=16/255$, the competition distortion bound, low frequency perturbations are indeed perceptible. This questions the use of the $\ell_\infty$-norm, in particular, as a distortion metric, and, in turn, suggests that explicitly considering the frequency space is promising for learning robust models which better align with human perception.

• #5955
Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models
Nupur Kumari, Mayank Singh, Abhishek Sinha, Harshitha Machiraju, Balaji Krishnamurthy, Vineeth N Balasubramanian
Adversarial Machine Learning 1

Neural networks are vulnerable to adversarial attacks - small visually imperceptible crafted noise which when added to the input drastically changes the output. The most effective method of defending against adversarial attacks is to use the methodology of adversarial training. We analyze the adversarially trained robust models to study their vulnerability against adversarial attacks at the level of the latent layers. Our analysis reveals that contrary to the input layer which is robust to adversarial attack, the latent layer of these robust models are highly susceptible to adversarial perturbations of small magnitude. Leveraging this information, we introduce a new technique Latent Adversarial Training (LAT) which comprises of fine-tuning the adversarially trained models to ensure the robustness at the feature layers. We also propose Latent Attack (LA), a novel algorithm for constructing adversarial examples. LAT results in a minor improvement in test accuracy and leads to a state-of-the-art adversarial accuracy against the universal first-order adversarial PGD attack which is shown for the MNIST, CIFAR-10, CIFAR-100, SVHN and Restricted ImageNet datasets.

• #10963
(Sister Conferences Best Papers Track) Adversarial Attacks on Neural Networks for Graph Data
Daniel Zügner, Amir Akbarnejad, Stephan Günnemann
Adversarial Machine Learning 1

Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, currently there is no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this extended abstract we summarize the key findings and contributions of our work, in which we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which focus on the training phase of a machine learning model. We generate adversarial perturbations targeting the node's features and the graph structure, thus, taking the dependencies between instances in account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain we propose an efficient algorithm Nettack exploiting incremental computations. Our experimental study shows that accuracy of node classification significantly drops even when performing only few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise are successful given only limited knowledge about the graph.

### Wednesday 1409:30 - 10:30ML|RL - Reinforcement Learning 2 (R4)

Chair: TBD
• #977
Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control
Kenny Young, Baoxiang Wang, Matthew E. Taylor
Reinforcement Learning 2

Reinforcement learning (RL) has had many successes, but significant hyperparameter tuning is commonly required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this --- most notably experience replay or the use of parallel actors. These techniques stabilize learning by making the RL problem more similar to the supervised setting. However, they come at the cost of moving away from the RL problem as it is typically formulated, that is, a single agent learning online without maintaining a large database of training examples. To address these issues, we propose Metatrace, a meta-gradient descent based algorithm to tune the step-size online. Metatrace leverages the structure of eligibility traces, and works for both tuning a scalar step-size and a respective step-size for each parameter. We empirically evaluate Metatrace for actor-critic on the Arcade Learning Environment. Results show Metatrace can speed up learning, and improve performance in non-stationary settings.

• #5320
Successor Options: An Option Discovery Framework for Reinforcement Learning
Rahul Ramesh, Manan Tomar, Balaraman Ravindran
Reinforcement Learning 2

The options framework in reinforcement learning models the notion of a skill or a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options, that navigate to bottleneck states. In this work, we instead adopt a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. In this work, we propose Successor Options, which leverages Successor representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward and the model scales to high-dimensional spaces since it does not construct an explicit graph of the entire state space. Additionally, we also propose an Incremental Successor Options model that iterates between constructing Successor representations and building options, which is useful when robust Successor representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds, and on the high-dimensional robotic control environment of Fetch.

• #5368
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Felipe Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman
Reinforcement Learning 2

Much human and computational effort has aimed to improve how deep reinforcement learning (DRL) algorithms perform on benchmarks such as the Atari Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of DRL algorithms. Sources of friction include the onerous computational requirements, and general logistical and architectural complications for running DRL algorithms at scale. We lessen this friction, by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous DRL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several DRL algorithms, highlighting interesting and previously unknown distinctions between them.

• #5528
Unobserved Is Not Equal to Non-existent: Using Gaussian Processes to Infer Immediate Rewards Across Contexts
Hamoon Azizsoltani, Yeo Jin Kim, Markel Sanz Ausin, Tiffany Barnes, Min Chi
Reinforcement Learning 2

Learning optimal policies in real-world domains with delayed rewards is a major challenge in Reinforcement Learning. We address the credit assignment problem by proposing a Gaussian Process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in 4 contexts where rewards can be delayed for long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, our results showed that on 7 out 9 games, the proposed GP-inferred reward policy performed at least as well as the immediate reward policy and significantly outperformed the corresponding delayed reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts.

### Wednesday 1409:30 - 10:30AMS|EPAMBS - Economic Paradigms, Auctions and Market-Based Systems (R5)

Chair: TBD
• #1844
Dispatching Through Pricing: Modeling Ride-Sharing and Designing Dynamic Prices
Mengjing Chen, Weiran Shen, Pingzhong Tang, Song Zuo
Economic Paradigms, Auctions and Market-Based Systems

Over the past few years, ride-sharing has emerged as an effective way to relieve traffic congestion. A key problem for the ride-sharing platforms is to come up with a revenue-optimal (or GMV-optimal) pricing scheme and a vehicle dispatching policy that incorporate geographic and temporal information. In this paper, we aim to tackle this problem via an economic approach. Modeled naively, the underlying optimization problem may be non-convex and thus hard to solve. To this end, we use a so-called ironing'' technique to convert the problem into an equivalent convex optimization one via a clean Markov decision process (MDP) formulation, where the states are the driver distributions and the decision variables are the prices for each pair of locations. Our main finding is an efficient algorithm that computes the exact revenue-optimal (or GMV-optimal) randomized pricing scheme, which naturally induces the accompany vehicle dispatching policy. We also conduct empirical evaluations of our solution through real data of a major ride-sharing platform and show its advantages over fixed pricing schemes as well as several prevalent surge-based pricing schemes.

• #5005
Strategic Signaling for Selling Information Goods
Shani Alkoby, David Sarne, Igal Milchtaich
Economic Paradigms, Auctions and Market-Based Systems

This paper studies the benefit in using signaling by an information seller holding information that can completely disambiguate some uncertainty concerning the state of the world for the information buyer. We show that a necessary condition for having the information seller benefit from signaling in this model is having some seed of truth" in the signaling scheme used. We then introduce two natural signaling mechanisms that adhere to this condition, one where the seller pre-commits to the signaling scheme to be used and the other where she commits to use a signaling scheme that contains a seed of truth". Finally, we analyze the equilibrium resulting from each and show that, somehow counter-intuitively, despite the inherent differences between the two mechanisms, they are equivalent in the sense that for any equilibrium associated with the maximum revenue in one there is an equilibrium offering the seller the same revenue in the other.

• #5490
Explore Truthful Incentives for Tasks with Heterogenous Levels of Difficulty in the Sharing Economy
Pengzhan Zhou, Xin Wei, Cong Wang, Yuanyuan Yang
Economic Paradigms, Auctions and Market-Based Systems

Incentives are explored in the sharing economy to inspire users for better resource allocation. Previous works build a budget-feasible incentive mechanism to learn users' cost distribution. However, they only consider a special case that all tasks are considered as the same. The general problem asks for finding a solution when the cost for different tasks varies. In this paper, we investigate this general problem by considering a system with k levels of difficulty. We present two incentivizing strategies for offline and online implementation, and formally derive the ratio of utility between them in different scenarios. We propose a regret-minimizing mechanism to decide incentives by dynamically adjusting budget assignment and learning from users' cost distributions. Our experiment demonstrates utility improvement about 7 times and time saving of 54% to meet a utility objective compared to the previous works.

• #6397
On the Problem of Assigning PhD Grants
Katarína Cechlárová, Laurent Gourvès, Julien Lesca
Economic Paradigms, Auctions and Market-Based Systems

In this paper, we study the problem of assigning PhD grants. Master students apply for PhD grants on different topics and the number of available grants is limited. In this problem, students have preferences over topics they applied to and the university has preferences over possible matchings of student/topic that satisfy the limited number of grants. The particularity of this framework is the uncertainty on a student's decision to accept or reject a topic offered to him. Without using probability to model uncertainty, we study the possibility of designing protocols of exchanges between the students and the university in order to construct a matching which is as close as possible to the optimal one i.e., the best achievable matching without uncertainty.

### Wednesday 1409:30 - 10:30AMS|ATM - Agent Theories and Models (R6)

Chair: TBD
• #63
The Interplay of Emotions and Norms in Multiagent Systems
Anup K. Kalia, Nirav Ajmeri, Kevin S. Chan, Jin-Hee Cho, Sibel Adali, Munindar Singh
Agent Theories and Models

We study how emotions influence norm outcomes in decision-making contexts. Following the literature, we provide baseline Dynamic Bayesian models to capture an agent's two perspectives on a directed norm. Unlike the literature, these models are holistic in that they incorporate not only norm outcomes and emotions but also trust and goals. We obtain data from an empirical study involving game play with respect to the above variables. We provide a step-wise process to discover two new Dynamic Bayesian models based on maximizing log-likelihood scores with respect to the data. We compare the new models with baseline models to discover new insights into the existing relationships. Our empirically supported models are thus holistic and characterize how emotions influence norm outcomes better than previous approaches.

• #4198
Strategy Logic with Simple Goals: Tractable Reasoning about Strategies
Francesco Belardinelli, Wojciech Jamroga, Damian Kurpiewski, Vadim Malvone, Aniello Murano
Agent Theories and Models

In this paper we introduce Strategy Logic with simple goals (SL[SG]), a fragment of Strategy Logic that strictly extends the well-known Alternating-time Temporal Logic ATL by introducing arbitrary quantification over the agents' strategies.  Our motivation comes from game-theoretic applications, such as expressing Stackelberg equilibria in games, coercion in voting protocols, as well as module checking for simple goals. Most importantly, we prove that the model checking problem for SL[SG] is PTIME-complete, the same as ATL. Thus, the extra expressive power comes at no computational cost as far as verification is concerned.

• #4774
Average-case Analysis of the Assignment Problem with Independent Preferences
Yansong Gao, Jie Zhang
Agent Theories and Models

The fundamental assignment problem is in search of welfare maximization mechanisms to allocate items to agents when the private preferences over indivisible items are provided by self-interested agents. The mainstream mechanism \textit{Random Priority} is asymptotically the best mechanism for this purpose, when comparing its welfare  to the optimal social welfare using the canonical \textit{worst-case approximation ratio}.  Surprisingly, the efficiency loss indicated by the worst-case ratio does not have a constant bound \cite{FFZ:14}.Recently, \cite{DBLP:conf/mfcs/DengG017} shows that when the agents' preferences are drawn from a uniform distribution, its \textit{average-case approximation ratio} is upper bounded by 3.718. They left it as an open question of whether a constant ratio holds for general scenarios. In this paper, we offer an affirmative answer to this question by showing that the ratio is bounded by $1/\mu$ when the preference values are independent and identically distributed random variables, where $\mu$ is the expectation of the value distribution. This upper bound improves the results in \cite{DBLP:conf/mfcs/DengG017} for the Uniform distribution as well. Moreover, under mild conditions, the ratio has a \textit{constant} bound for any independent  random values. En route to these results, we develop powerful tools to show the insights that for most valuation inputs, the efficiency loss is small.

• #4962
On Computational Tractability for Rational Verification
Julian Gutierrez, Muhammad Najib, Giuseppe Perelli, Michael Wooldridge
Agent Theories and Models

Rational verification involves checking which temporal logic properties hold of a concurrent and multiagent system, under the assumption that agents in the system choose strategies in game theoretic equilibrium. Rational verification can be understood as a counterpart of model checking for multiagent systems, but while model checking can be done in polynomial time for some temporal logic specification languages such as CTL, and polynomial space with LTL specifications, rational verification is much more intractable: it is 2EXPTIME-complete with LTL specifications, even when using explicit-state system representations.  In this paper we show that the complexity of rational verification can be greatly reduced by restricting specifications to GR(1), a fragment of LTL that can represent most response properties of reactive systems. We also provide improved complexity results for rational verification when considering players' goals given by mean-payoff utility functions -- arguably the most widely used quantitative objective for agents in concurrent and multiagent systems. In particular, we show that for a number of relevant settings, rational verification can be done in polynomial space or even in polynomial time.

### Wednesday 1409:30 - 10:30ML|TDS - Time-series;Data Streams 1 (R7)

Chair: TBD
• #279
E²GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation
Yonghong Luo, Ying Zhang, Xiangrui Cai, Xiaojie Yuan
Time-series;Data Streams 1

The missing values, appear in most of multivariate time series, prevent advanced analysis of multivariate time series data. Existing imputation approaches try to deal with missing values by deletion, statistical imputation, machine learning based imputation and generative imputation. However, these methods are either incapable of dealing with temporal information or multi-stage. This paper proposes an end-to-end generative model {$E^2$}GAN to impute missing values in multivariate time series. With the help of the discriminative loss and the squared error loss, {$E^2$}GAN can impute the incomplete time series by the nearest generated complete time series at one stage. Experiments on multiple real-world datasets show that our model outperforms the baselines on the imputation accuracy and achieves state-of-the-art classification/regression results on the downstream applications. Additionally, our method also gains better time efficiency than multi-stage method on the training of neural networks.

• #607
CLVSA: A Convolutional LSTM Based Variational Sequence-to-Sequence Model with Attention for Predicting Trends of Financial Markets
Jia Wang, Tong Sun, Benyuan Liu, Yu Cao, Hongwei Zhu
Time-series;Data Streams 1

Financial markets are a complex dynamical system. The complexity comes from the interaction between a market and its participants, in other words, the integrated outcome of activities of the entire participants determines the markets trend, while the markets trend affects activities of participants. These interwoven interactions make financial markets keep evolving. Inspired by stochastic recurrent models that successfully capture variability observed in natural sequential data such as speech and video, we propose CLVSA, a hybrid model that consists of stochastic recurrent networks, the sequence-to-sequence architecture, the self- and inter-attention mechanism, and convolutional LSTM units to capture variationally underlying features in raw financial trading data. Our model outperforms basic models, such as convolutional neural network, vanilla LSTM network, and sequence-to-sequence model with attention, based on backtesting results of six futures from January 2010 to December 2017. Our experimental results show that, by introducing an approximate posterior, CLVSA takes advantage of an extra regularizer based on the Kullback-Leibler divergence to prevent itself from overfitting traps.

• #4502
Confirmatory Bayesian Online Change Point Detection in the Covariance Structure of Gaussian Processes
Jiyeon Han, Kyowoon Lee, Anh Tong, Jaesik Choi
Time-series;Data Streams 1

Discovery of abrupt changes in sequential data is important to predict future changes. Modeling non-stationarity is often a key factor in many real-world examples. In this paper, we propose new statistical hypothesis tests to detect covariance structure changes in locally smooth time series modeled by Gaussian Processes (GPs). We provide theoretically justified thresholds of the tests. Furthermore, we apply our tests to improve Bayesian Online Change Point Detection (BOCPD) by confirming statistically significant changes and non-changes. Our Confirmatory-BOCPD (CBOCPD) algorithm finds multiple structural breaks in GPs. Experimental results on synthetic datasets and simulation datasets show our new tests correctly detect changes in covariance structure. Our CBOCPD also outperforms existing methods for nonstationary prediction in terms of both regression error and log likelihood.

• #5504
Linear Time Complexity Time Series Clustering with Symbolic Pattern Forest
Xiaosheng Li, Jessica Lin, Liang Zhao
Time-series;Data Streams 1

With increasing powering of data storage and advances in data generation and collection technologies, large volumes of time series data become available and the content is changing rapidly. This requires the data mining methods to have low time complexity to handle the huge and fast-changing data. This paper presents a novel time series clustering algorithm that has linear time complexity. The proposed algorithm partitions the data by checking some randomly selected symbolic patterns in the time series. Theoretical analysis is provided to show that group structures in the data can be revealed from this process. We evaluate the proposed algorithm extensively on all 85 datasets from the well-known UCR time series archive, and compare with the state-of-the-art approaches with statistical analysis. The results show that the proposed method is faster, and achieves better accuracy compared with other rival methods.

### Wednesday 1409:30 - 10:30KRR|ARTP - Automated Reasoning and Theorem Proving (R8)

Chair: TBD
• #2513
An ASP Approach to Generate Minimal Countermodels in Intuitionistic Propositional Logic
Camillo Fiorentini
Automated Reasoning and Theorem Proving

Intuitionistic Propositional Logic is complete w.r.t. Kripke semantics: if a formula is not intuitionistically valid, then there exists a finite Kripke model falsifying it. The problem of obtaining concise models has been scarcely investigated in the literature. We present a procedure to generate minimal models in the number of worlds relying on Answer Set Programming (ASP).

• #3164
Approximating Integer Solution Counting via Space Quantification for Linear Constraints
Cunjing Ge, Feifei Ma, Xutong Ma, Fan Zhang, Pei Huang, Jian Zhang
Automated Reasoning and Theorem Proving

Solution counting or solution space quantification (means volume computation and volume estimation) for linear constraints (LCs) has found interesting applications in various fields. Experimental data shows that integer solution counting is usually more expensive than quantifying volume of solution space while their output values are close. So it is helpful to approximate the number of integer solutions by the volume if the error is acceptable. In this paper, we present and prove a bound of such error for LCs. It is the first bound that can be used to approximate the integer solution counts. Based on this result, an approximate integer solution counting method for LCs is proposed. Experiments show that our approach is over 20x faster than the state-of-the-art integer solution counters. Moreover, such advantage increases with the problem scale.

• #2076
Solving the Satisfiability Problem of Modal Logic S5 Guided by Graph Coloring
Pei Huang, Minghao Liu, Ping Wang, Wenhui Zhang, Feifei Ma, Jian Zhang
Automated Reasoning and Theorem Proving

Modal logic S5 has found various applications in artificial intelligence. With the advances in modern SAT solvers, SAT-based approach has shown great potential in solving the satisfiability problem of S5. The scale of the SAT encoding for S5 is strongly influenced by the upper bound on the number of possible worlds. In this paper, we present a novel SAT-based approach for S5 satisfiability problem. We show a normal form for S5 formulas. Based on this normal form, a conflict graph can be derived whose chromatic number provides an upper bound of the possible worlds and a lot of unnecessary search spaces can be eliminated in this process. A heuristic graph coloring algorithm is adopted to balance the efficiency and optimality. The number of possible worlds can be significantly reduced for many practical instances. Extensive experiments demonstrate that our approach outperforms state-of-the-art S5-SAT solvers.

• #888
Guarantees for Sound Abstractions for Generalized Planning
Blai Bonet, Raquel Fuentetaja, Yolanda E-Martín, Daniel Borrajo
Automated Reasoning and Theorem Proving

Generalized planning is about finding plans that solve collections of planning instances, often infinite collections, rather than single instances. Recently it has been shown how to reduce the planning problem for generalized planning to the planning problem for a qualitative numerical problem; the latter being a reformulation that simultaneously captures all the instances in the collection. An important thread of research thus consists in finding such reformulations, or abstractions, automatically. A recent proposal learns the abstractions inductively from a finite and small sample of transitions from instances in the collection. However, as in all inductive processes, the learned abstraction is not guaranteed to be correct for the whole collection. In this work we address this limitation by performing an analysis of the abstraction with respect to the collection, and show how to obtain formal guarantees for generalization. These guarantees, in the form of first-order formulas, may be used to 1) define subcollections of instances on which the abstraction is guaranteed to be sound, 2) obtain necessary conditions for generalization under certain assumptions, and 3) do automated synthesis of complex invariants for planning problems. Our framework is general, it can be extended or combined with other approaches, and it has applications that go beyond generalized planning.

### Wednesday 1409:30 - 10:30NLP|IE - Information Extraction 1 (R9)

Chair: TBD
• #1684
End-to-End Multi-Perspective Matching for Entity Resolution
Cheng Fu, Xianpei Han, Le Sun, Bo Chen, Wei Zhang, Suhui Wu, Hao Kong
Information Extraction 1

Entity resolution (ER) aims to identify data records referring to the same real-world entity. Due to the heterogeneity of entity attributes and the diversity of similarity measures, one main challenge of ER is how to select appropriate similarity measures for different attributes. Previous ER methods usually employ heuristic similarity selection algorithms, which are highly specialized to specific ER problems and are hard to be generalized to other situations. Furthermore, previous studies usually perform similarity learning and similarity selection independently, which often result in error propagation and are hard to be optimized globally. To resolve the above problems, this paper proposes an end-to-end multi-perspective entity matching model, which can adaptively select optimal similarity measures for heterogenous attributes by jointly learning and selecting similarity measures in an end-to-end way. Experiments on two real-world datasets show that our method significantly outperforms previous ER methods.

• #3048
Improving Cross-Domain Performance for Relation Extraction via Dependency Prediction and Information Flow Control
Amir Pouran Ben Veyseh, Thien Nguyen, Dejing Dou
Information Extraction 1

Relation Extraction (RE) is one of the fundamental tasks in Information Extraction and Natural Language Processing. Dependency trees have been shown to be a very useful source of information for this task. The current deep learning models for relation extraction has mainly exploited this dependency information by guiding their computation along the structures of the dependency trees. One potential problem with this approach is it might prevent the models from capturing important context information beyond syntactic structures and cause the poor cross-domain generalization. This paper introduces a novel method to use dependency trees in RE for deep learning models that jointly predicts dependency and semantics relations. We also propose a new mechanism to control the information flow in the model based on the input entity mentions. Our extensive experiments on benchmark datasets show that the proposed model outperforms the existing methods for RE significantly.

• #3319
Neural Collective Entity Linking Based on Recurrent Random Walk Network Learning
Mengge Xue, Weiming Cai, Jinsong Su, Linfeng Song, Yubin Ge, Yubao Liu, Bin Wang
Information Extraction 1

Beneﬁting from the excellent ability of neural networks on learning semantic representations, existing studies for entity linking (EL) have resorted to neural networks to exploit both the local mention-to-entity compatibility and the global interdependence between different EL decisions for target entity disambiguation. However, most neural collective EL methods depend entirely upon neural networks to automatically model the semantic dependencies between different EL decisions, which lack of the guidance from external knowledge. In this paper, we propose a novel end-to-end neural network with recurrent random-walk layers for collective EL, which introduces external knowledge to model the semantic interdependence between different EL decisions. Speciﬁcally, we ﬁrst establish a model based on local context features, and then stack random-walk layers to reinforce the evidence for related EL decisions into high-probability decisions, where the semantic interdependence between candidate entities is mainly induced from an external knowledge base. Finally, a semantic regularizer that preserves the collective EL decisions consistency is incorporated into the conventional objective function, so that the external knowledge base can be fully exploited in collective EL decisions. Experimental results and in-depth analysis on various datasets show that our model achieves better performance than other state-of-the-art models. Our code and data are released at https://github.com/DeepLearnXMU/RRWEL.

• #5335
Coreference Aware Representation Learning for Neural Named Entity Recognition
Zeyu Dai, Hongliang Fei, Ping Li
Information Extraction 1

Recent neural network models have achieved state-of-the-art performance on the task of named entity recognition (NER). However, previous neural network models typically treat the input sentences as a linear sequence of words but ignore rich structural information, such as the coreference relations among non-adjacent words, phrases or entities. In this paper, we propose a novel approach to learn coreference-aware word representations for the NER task at the document level. In particular, we enrich the well-known neural architecture CNN-BiLSTM-CRF'' with a coreference layer on top of the BiLSTM layer to incorporate coreferential relations. Furthermore, we introduce the coreference regularization to ensure the coreferential entities to share similar representations and consistent predictions within the same coreference cluster. Our proposed model achieves new state-of-the-art performance on two NER benchmarks: CoNLL-2003 and OntoNotes v5.0. More importantly, we demonstrate that our framework does not rely on gold coreference knowledge, and can still work well even when the coreferential relations are generated by a third-party toolkit.

### Wednesday 1409:30 - 10:30CV|VEAS - Video: Events, Activities and Surveillance (R10)

Chair: TBD
• #1593
Supervised Set-to-Set Hashing in Visual Recognition
I-Hong Jhuo
Video: Events, Activities and Surveillance

Visual data, such as an image or a sequence of video frames, is often naturally represented as a point set. In this paper, we consider the fundamental problem of finding a nearest set from a collection of sets, to a query set. This problem has obvious applications in large-scale visual retrieval and recognition, and also in applied fields beyond computer vision. One challenge stands out in solving the problem---set representation and measure of similarity. Particularly, the query set and the sets in dataset collection can have varying cardinalities. The training collection is large enough such that linear scan is impractical. We propose a simple representation scheme that encodes both statistical and structural information of the sets. The derived representations are integrated in a kernel framework for flexible similarity measurement. For the query set process, we adopt a learning-to-hash pipeline that turns the kernel representations into hash bits based on simple learners, using multiple kernel learning. Experiments on two visual retrieval datasets show unambiguously that our set-to-set hashing framework outperforms prior methods that do not take the set-to-set search setting.

• #4203
Variation Generalized Feature Learning via Intra-view Variation Adaptation
Jiawei Li, Mang Ye, Andy Jinhua Ma, Pong C Yuen
Video: Events, Activities and Surveillance

This paper addresses the variation generalized feature learning problem in unsupervised video-based person re-identification (re-ID). With advanced tracking and detection algorithms, large-scale intra-view positive samples can be easily collected by assuming that the image frames within the tracking sequence belong to the same person. Existing methods either directly use the intra-view positives to model cross-view variations or simply minimize the intra-view variations to capture the invariant component with some discriminative information loss. In this paper, we propose a Variation Generalized Feature Learning (VGFL) method to learn adaptable feature representation with intra-view positives. The proposed method can learn a discriminative re-ID model without any manually annotated cross-view positive sample pairs. It could address the unseen testing variations with a novel variation generalized feature learning algorithm. In addition, an Adaptability-Discriminability (AD) fusion method is introduced to learn adaptable video-level features. Extensive experiments on different datasets demonstrate the effectiveness of the proposed method.

• #4243
DBDNet: Learning Bi-directional Dynamics for Early Action Prediction
Guoliang Pang, Xionghui Wang, Jian-Fang Hu, Qing Zhang, Wei-Shi Zheng
Video: Events, Activities and Surveillance

Predicting future actions from observed partial videos is very challenging as the missing future is uncertain and sometimes has multiple possibilities. To obtain a reliable future estimation, a novel encoder-decoder architecture is proposed for integrating the tasks of synthesizing future motions from observed videos and reconstructing observed motions from synthesized future motions in an unified framework, which can capture the bi-directional dynamics depicted in partial videos along the temporal (past-to-future) direction and reverse chronological (future-back-to-past) direction. We then employ a bi-directional long short-term memory (Bi-LSTM) architecture to exploit the learned bi-directional dynamics for predicting early actions. Our experiments on two benchmark action datasets show that learning bi-directional dynamics benefits the early action prediction and our system clearly outperforms the state-of-the-art methods.

• #3056
Predicting dominance in multi-person videos
Chongyang Bai, Maksim Bolonkin, Srijan Kumar, Jure Leskovec, Judee Burgoon, Norah Dunbar, V. S. Subrahmanian
Video: Events, Activities and Surveillance

We consider the problems of predicting (i) the most dominant person in a group of people, and (ii) the more dominant of a pair of people, from videos depicting group interactions. We introduce a novel family of variables called Dominance Rank. We combine features not previously used for dominance prediction (e.g., facial action units, emotions), with a novel ensemble-based approach to solve these two problems. We test our models against four competing algorithms in the literature on two datasets and show that our results improve past performance. We show 2.4% to 16.7% improvement in AUC compared to baselines on one dataset, and a gain of 0.6% to 8.8% in accuracy on the other. Ablation testing shows that Dominance Rank features play a key role.

### Wednesday 1409:30 - 10:30R|MPP - Motion and Path Planning (R11)

Chair: TBD
• #1466
The Parameterized Complexity of Motion Planning for Snake-Like Robots
Siddharth Gupta, Guy Sa'ar, Meirav Zehavi
Motion and Path Planning

We study a motion-planning problem inspired by the game Snake that models scenarios like the transportation of linked wagons towed by a locomotor to the movement of a group of agents that travel in an ant-like'' fashion. Given a snake-like'' robot with initial and final positions in an environment modeled by a graph, our goal is to decide whether the robot can reach the final position from the initial position without intersecting itself. Already on grid graphs, this problem is PSPACE-complete [Biasi and Ophelders, 2018]. Nevertheless, we prove that even on general graphs, it is solvable in time k^{O(k)}|I|^{O(1)} where k is the size of the robot, and |I| is the input size. Towards this, we give a novel application of color-coding to sparsify the configuration graph of the problem. We also show that the problem is unlikely to have a polynomial kernel even on grid graphs, but it admits a treewidth-reduction procedure. To the best of our knowledge, the study of the parameterized complexity of motion problems has been~largely~neglected, thus our work is pioneering in this regard.

• #10955
(Sister Conferences Best Papers Track) The Provable Virtue of Laziness in Motion Planning [Extended Abstract]
Nika Haghtalab, Simon Mackenzie, Ariel D. Procaccia, Oren Salzman, Siddhartha Srinivasa
Motion and Path Planning

The Lazy Shortest Path (LazySP) class consists of motion-planning algorithms that only evaluate edges along candidate shortest paths between the source and target. These algorithms were designed to minimize the number of edge evaluations in settings where edge evaluation dominates the running time of the algorithm such as manipulation in cluttered environments and planning for robots in surgical settings; but how close to optimal are LazySP algorithms in terms of this objective? Our main result is an analytical upper bound, in a probabilistic model, on the number of edge evaluations required by LazySP algorithms; a matching lower bound shows that these algorithms are asymptotically optimal in the worst case.

• #656
Energy-Efficient Slithering Gait Exploration for a Snake-Like Robot Based on Reinforcement Learning
Zhenshan Bing, Christian Lemke, Zhuangyi Jiang, Kai Huang, Alois Knoll
Motion and Path Planning

Similar to their counterparts in nature, the flexible bodies of snake-like robots enhance their movement capability and adaptability in diverse environments. However, this flexibility corresponds to a complex control task involving highly redundant degrees of freedom, where traditional model-based methods usually fail to propel the robots energy-efficiently. In this work, we present a novel approach for designing an energy-efficient slithering gait for a snake-like robot using a model-free reinforcement learning (RL) algorithm. Specifically, we present an RL-based controller for generating locomotion gaits at a wide range of velocities, which is trained using the proximal policy optimization (PPO) algorithm. Meanwhile, a traditional parameterized gait controller is presented and the parameter sets are optimized using the grid search and Bayesian optimization algorithms for the purposes of reasonable comparisons. Based on the analysis of the simulation results, we demonstrate that this RL-based controller exhibits very natural and adaptive movements, which are also substantially more energy-efficient than the gaits generated by the parameterized controller. Videos are shown at https://videoviewsite.wixsite.com/rlsnake .

• #10966
(Sister Conferences Best Papers Track) Differentiable Physics and Stable Modes for Tool-Use and Manipulation Planning -- Extended Abtract
Marc Toussaint, Kelsey R. Allen, Kevin A. Smith, Joshua B. Tenenbaum
Motion and Path Planning

We consider the problem of sequential manipulation and tool-use planning in domains that include physical interactions such as hitting and throwing. The approach integrates a Task And Motion Planning formulation with primitives that either impose stable kinematic constraints or differentiable dynamical and impulse exchange constraints at the path optimization level. We demonstrate our approach on a variety of physical puzzles that involve tool use and dynamic interactions. We then compare manipulation sequences generated by our approach to human actions on analogous tasks, suggesting future directions and illuminating current limitations.

### Wednesday 1409:30 - 10:30ML|DM - Data Mining 4 (R12)

Chair: TBD
• #938
A Degeneracy Framework for Scalable Graph Autoencoders
Guillaume Salha, Romain Hennequin, Viet Anh Tran, Michalis Vazirgiannis
Data Mining 4

In this paper, we present a general framework to scale graph autoencoders (AE) and graph variational autoencoders (VAE). This framework leverages graph degeneracy concepts to train models only from a dense subset of nodes instead of using the entire graph. Together with a simple yet effective propagation mechanism, our approach significantly improves scalability and training speed while preserving performance. We evaluate and discuss our method on several variants of existing graph AE and VAE, providing the first application of these models to large graphs with up to millions of nodes and edges. We achieve empirically competitive results w.r.t. several popular scalable node embedding methods, which emphasizes the relevance of pursuing further research towards more scalable graph AE and VAE.

• #2479
Community Detection and Link Prediction via Cluster-driven Low-rank Matrix Completion
Junming Shao, Zhong Zhang, Zhongjing Yu, Jun Wang, Yi Zhao, Qinli Yang
Data Mining 4

Community detection and link prediction are highly dependent since knowing cluster structure as a priori will help identify missing links, and in return, clustering on networks with supplemented missing links will improve community detection performance. In this paper, we propose a Cluster-driven Low-rank Matrix Completion (CLMC), for performing community detection and link prediction simultaneously in a unified framework. To this end, CLMC decomposes the adjacent matrix of a target network as three additive matrices: clustering matrix, noise matrix and supplement matrix. The community-structure and low-rank constraints are imposed on the clustering matrix, such that the noisy edges between communities are removed and the resulting matrix is an ideal block-diagonal matrix. Missing edges are further learned via low-rank matrix completion. Extensive experiments show that CLMC achieves state-of-the-art performance.

• #4127
Graph Convolutional Networks on User Mobility Heterogeneous Graphs for Social Relationship Inference
Yongji Wu, Defu Lian, Shuowei Jin, Enhong Chen
Data Mining 4

Inferring social relations from user trajectory data is of great value in real-world applications such as friend recommendation and ride-sharing. Most existing methods predict relationship based on a pairwise approach using some hand-crafted features or rely on a simple skip-gram based model to learn embeddings on graphs. Using hand-crafted features often fails to capture the complex dynamics in human social relations, while the graph embedding based methods only use random walks to propagate information and cannot incorporate external semantic data provided. We propose a novel model that utilizes Graph Convolutional Networks (GCNs) to learn user embeddings on the User Mobility Heterogeneous Graph in an unsupervised manner. This model is capable of propagating relation layer-wisely as well as combining both the rich structural information in the heterogeneous graph and predictive node features provided. Our method can also be extended to a semi-supervised setting if a part of the social network is available. The evaluation on three real-world datasets demonstrates that our method outperforms the state-of-the-art approaches.

• #10973
(Sister Conferences Best Papers Track) Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms
Panagiotis Mandros, Mario Boley, Jilles Vreeken
Data Mining 4

<p>The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, justifying worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style search.<br></p>

### Wednesday 1409:30 - 10:30ML|TAML - Transfer, Adaptation, Multi-task Learning 1 (R13)

Chair: TBD
• #2102
Weak Supervision Enhanced Generative Network for Question Generation
Yutong Wang, Jiyuan Zheng, Qijiong Liu, Zhou Zhao, Jun Xiao, Yueting Zhuang
Transfer, Adaptation, Multi-task Learning 1

Automatic question generation according to an answer within the given passage is useful for many applications, such as question answering system, dialogue system, etc. Current neural-based methods mostly take two steps which extract several important sentences based on the candidate answer through manual rules or supervised neural networks and then use an encoder-decoder framework to generate questions about these sentences. These approaches still acquire two steps and neglect the semantic relations between the answer and the context of the whole passage which is sometimes necessary for answering the question. To address this problem, we propose the Weakly Supervision Enhanced Generative Network (WeGen) which automatically discovers relevant features of the passage given the answer span in a weakly supervised manner to improve the quality of generated questions. More specifically, we devise a discriminator, Relation Guider, to capture the relations between the passage and the associated answer and then the Multi-Interaction mechanism is deployed to transfer the knowledge dynamically for our question generation system. Experiments show the effectiveness of our method in both automatic evaluations and human evaluations.

• #2519
Fast and Robust Multi-View Multi-Task Learning via Group Sparsity
Lu Sun, Canh Hao Nguyen, Hiroshi Mamitsuka
Transfer, Adaptation, Multi-task Learning 1

Multi-view multi-task learning has recently attracted more and more attention due to its dual-heterogeneity, i.e.,each task has heterogeneous features from multiple views, and probably correlates with other tasks via common views.Existing methods usually suffer from three problems: 1) lack the ability to eliminate noisy features, 2) hold a strict assumption on view consistency and 3) ignore the possible existence of task-view outliers.To overcome these limitations, we propose a robust method with joint group-sparsity by decomposing feature parameters into a sum of two components,in which one saves relevant features (for Problem 1) and flexible view consistency (for Problem 2),while the other detects task-view outliers (for Problem 3).With a global convergence property, we develop a fast algorithm to solve the optimization problem in a linear time complexity w.r.t. the number of features and labeled samples.Extensive experiments on various synthetic and real-world datasets demonstrate its effectiveness.

• #4003
Cascaded Algorithm-Selection and Hyper-Parameter Optimization with Extreme-Region Upper Confidence Bound Bandit
Yi-Qi Hu, Yang Yu, Jun-Da Liao
Transfer, Adaptation, Multi-task Learning 1

An automatic machine learning (AutoML) task is to select the best algorithm and its hyper-parameters simultaneously. Previously, the hyper-parameters of all algorithms are joint as a single search space, which is not only huge but also redundant, because many dimensions of hyper-parameters are irrelevant with the selected algorithms. In this paper, we propose a cascaded approach for algorithm selection and hyper-parameter optimization. While a search procedure is employed at the level of hyper-parameter optimization, a bandit strategy runs at the level of algorithm selection to allocate the budget based on the search feedbacks. Since the bandit is required to select the algorithm with the maximum performance, instead of the average performance, we thus propose the extreme-region upper confidence bound (ER-UCB) strategy, which focuses on the extreme region of the underlying feedback distribution. We show theoretically that the ER-UCB has a regret upper bound O(K ln n) with independent feedbacks, which is as efficient as the classical UCB bandit. We also conduct experiments on a synthetic problem as well as a set of AutoML tasks. The results verify the effectiveness of the proposed method.

• #5280
A Principled Approach for Learning Task Similarity in Multitask Learning
Changjian Shui, Mahdieh Abbasi, Louis-Émile Robitaille, Boyu Wang, Christian Gagné
Transfer, Adaptation, Multi-task Learning 1

Multitask learning aims at solving a set of related tasks simultaneously, by exploiting the shared knowledge for improving the performance on individual tasks. Hence, an important aspect of multitask learning is to understand the similarities within a set of tasks. Previous works have incorporated this similarity information explicitly (e.g., weighted loss for each task) or implicitly (e.g., adversarial loss for feature adaptation), for achieving good empirical performances. However, the theoretical motivations for adding task similarity knowledge are often missing or incomplete. In this paper, we give a different perspective from a theoretical point of view to understand this practice. We first provide an upper bound on the generalization error of multitask learning, showing the benefit of explicit and implicit task similarity knowledge. We systematically derive the bounds based on two distinct task similarity metrics: H divergence and Wasserstein distance. From these theoretical results, we revisit the Adversarial Multi-task Neural Network, proposing a new training algorithm to learn the task relation coefficients and neural network parameters iteratively. We assess our new algorithm empirically on several benchmarks, showing not only that we find interesting and robust task relations, but that the proposed approach outperforms the baselines, reaffirming the benefits of theoretical insight in algorithm design.

### Wednesday 1409:30 - 10:30AMS|RA - Resource Allocation (R14)

Chair: TBD
• #119
Almost Envy-Freeness in Group Resource Allocation
Maria Kyropoulou, Warut Suksompong, Alexandros A. Voudouris
Resource Allocation

We study the problem of fairly allocating indivisible goods between groups of agents using the recently introduced relaxations of envy-freeness. We consider the existence of fair allocations under different assumptions on the valuations of the agents. In particular, our results cover cases of arbitrary monotonic, responsive, and additive valuations, while for the case of binary valuations we fully characterize the cardinalities of two groups of agents for which a fair allocation can be guaranteed with respect to both envy-freeness up to one good (EF1) and envy-freeness up to any good (EFX). Moreover, we introduce a new model where the agents are not partitioned into groups in advance, but instead the partition can be chosen in conjunction with the allocation of the goods. In this model, we show that for agents with arbitrary monotonic valuations, there is always a partition of the agents into two groups of any given sizes along with an EF1 allocation of the goods. We also provide an extension of this result to any number of groups.

• #438
The Price of Fairness for Indivisible Goods
Xiaohui Bei, Xinhang Lu, Pasin Manurangsi, Warut Suksompong
Resource Allocation

We investigate the efficiency of fair allocations of indivisible goods using the well-studied price of fairness concept. Previous work has focused on classical fairness notions such as envy-freeness, proportionality, and equitability. However, these notions cannot always be satisfied for indivisible goods, leading to certain instances being ignored in the analysis. In this paper, we focus instead on notions with guaranteed existence, including envy-freeness up to one good (EF1), balancedness, maximum Nash welfare (MNW), and leximin. We mostly provide tight or asymptotically tight bounds on the worst-case efficiency loss for allocations satisfying these notions.

• #5104
Reallocating Multiple Facilities on the Line
Dimitris Fotakis, Loukas Kavouras, Panagiotis Kostopanagiotis, Philip Lazos, Stratis Skoulakis, Nikos Zarifis
Resource Allocation

We study the multistage K-facility reallocation problem on the real line, where we maintain K facility locations over T stages, based on the stage-dependent locations of n agents. Each agent is connected to the nearest facility at each stage, and the facilities may move from one stage to another, to accommodate different agent locations. The objective is to minimize the connection cost of the agents plus the total moving cost of the facilities, over all stages. K-facility reallocation problem was introduced by (B.D. Kaijzer and D. Wojtczak, IJCAI 2018), where they mostly focused on the special case of a single facility. Using an LP-based approach, we present a polynomial time algorithm that computes the optimal solution for any number of facilities. We also consider online K-facility reallocation, where the algorithm becomes aware of agent locations in a stage-by stage fashion. By exploiting an interesting connection to the classical K-server problem, we present a constant-competitive algorithm for K = 2 facilities.

• #1823
Equitable Allocations of Indivisible Goods
Rupert Freeman, Sujoy Sikdar, Rohit Vaish, Lirong Xia
Resource Allocation

In fair division, equitability dictates that each participant receives the same level of utility. In this work, we study equitable allocations of indivisible goods among agents with additive valuations. While prior work has studied (approximate) equitability in isolation, we consider equitability in conjunction with other well-studied notions of fairness and economic efficiency. We show that the Leximin algorithm produces an allocation that satisfies equitability up to any good and Pareto optimality. We also give a novel algorithm that guarantees Pareto optimality and equitability up to one good in pseudopolynomial time.  Our experiments on real-world preference data reveal that approximate envy-freeness, approximate equitability, and Pareto optimality can often be achieved simultaneously.

### Wednesday 1409:30 - 10:30HAI|CM - Cognitive Modeling (R15)

Chair: TBD
• #3037
A Semantics-based Model for Predicting Children's Vocabulary
Ishaan Grover, Hae Won Park, Cynthia Breazeal
Cognitive Modeling

Intelligent tutoring systems (ITS) provide educational benefits through one-on-one tutoring by assessing children's existing knowledge and providing tailored educational content. In the domain of language acquisition, several studies have shown that children often learn new words by forming semantic relationships with words they already know. In this paper, we present a model that uses word semantics (semantics-based model) to make inferences about a child's vocabulary from partial information about their existing vocabulary knowledge. We show that the proposed semantics-based model outperforms models that do not use word semantics (semantics-free models) on average. A subject-level analysis of results reveals that different models perform well for different children, thus motivating the need to combine predictions. To this end, we use two methods to combine predictions from semantics-based and semantics-free models and show that these methods yield better predictions of a child's vocabulary knowledge. Our results motivate the use of semantics-based models to assess children's vocabulary knowledge and build ITS that maximizes children's semantic understanding of words.

• #3647
Fast and Accurate Classification with a Multi-Spike Learning Algorithm for Spiking Neurons
Rong Xiao, Qiang Yu, Rui Yan, Huajin Tang
Cognitive Modeling

The formulation of efficient supervised learning algorithms for spiking neurons is complicated and remains challenging. Most existing learning methods with the precisely firing times of spikes often result in relatively low efficiency and poor robustness to noise. To address these limitations, we propose a simple and effective multi-spike learning rule to train neurons to match their output spike number with a desired one. The proposed method will quickly find a suitable local maximum value (directly related to the embedded feature) as the relevant signal for synaptic updates based on membrane potential trace of a neuron, and constructs an error function defined as the difference between the local maximum membrane potential of a neuron and its firing threshold. With the presented rule, a single neuron can be trained to learn multi-category tasks, and can successfully mitigate the impact of the input noise and discover embedded features. Experimental results show the proposed algorithm has higher precision, lower computation cost, and better noise robustness than current state-of-the-art learning methods under a wide range of learning tasks.

• #6329
STCA: Spatio-Temporal Credit Assignment with Delayed Feedback in Deep Spiking Neural Networks
Pengjie Gu, Rong Xiao, Gang Pan, Huajin Tang
Cognitive Modeling

The temporal credit assignment problem, which aims to discover the predictive features hidden in distracting background streams with delayed feedback, remains a core challenge in biological and machine learning. To address this issue, we propose a novel spatio-temporal credit assignment algorithm called STCA for training deep spiking neural networks (DSNNs). We present a new spatiotemporal error backpropagation policy by defining a temporal based loss function, which is able to credit the network losses to spatial and temporal domains simultaneously. Experimental results on MNIST dataset and a music dataset (MedleyDB) demonstrate that STCA can achieve comparable performance with other state-of-the-art algorithms with simpler architectures. Furthermore, STCA successfully discovers predictive sensory features and shows the highest performance in the unsegmented sensory event detection tasks.

• #10962
(Sister Conferences Best Papers Track) Trust Dynamics and Transfer across Human-Robot Interaction Tasks: Bayesian and Neural Computational Models
Harold Soh, Shu Pan, Min Chen, David Hsu
Cognitive Modeling

This work contributes both experimental findings and novel computational human-robot trust models for multi-task settings. We describe Bayesian non-parametric and neural models, and compare their performance on data collected from real-world human-subjects study. Our study spans two distinct task domains: household tasks performed by a Fetch robot, and a virtual reality driving simulation of an autonomous vehicle performing a variety of maneuvers. We find that human trust changes and transfers across tasks in a structured manner based on perceived task characteristics. Our results suggest that task-dependent functional trust models capture human trust in robot capabilities more accurately, and trust transfer across tasks can be inferred to a good degree. We believe these models are key for enabling trust-based robot decision-making for natural human-robot interaction.

### Wednesday 1409:30 - 10:30DemoT2 - Demo Talks 2 (R16)

Chair: TBA
• #11026
An Online Intelligent Visual Interaction System
Anxiang Zeng, Han Yu, Xin Gao, Kairi Ou, Zhenchuan Huang, Peng Hou, Mingli Song, Jingshu Zhang, Chunyan Miao
Demo Talks 2

This paper proposes an Online Intelligent Visual Interactive System (OIVIS), which can be applied to various live video broadcast and short video scenes to provide an interactive user experience. In the live video broadcast, the anchor can issue various commands by using pre-defined gestures, and can trigger real-time background replacement to create an immersive atmosphere. To support such dynamic interactivity, we implemented algorithms including real-time gesture recognition and real-time video portrait segmentation, developed a deep network inference framework, and a real-time rendering framework AI Gender at the front end to create a complete set of visual interaction solutions for use in resource constrained mobile.

• #11027
The Open Vault Challenge - Learning how to build calibration-free interactive systems by cracking the code of a vault.
Jonathan Grizou
Demo Talks 2

This demo takes the form of a challenge to the IJCAI community. A physical vault, secured by a 4-digit code, will be placed in the demo area. The author will publicly open the vault by entering the code on a touch-based interface, and as many times as requested. The challenge to the IJCAI participants will be to crack the code, open the vault, and collect its content. The interface is based on previous work on calibration-free interactive systems that enables a user to start instructing a machine without the machine knowing how to interpret the user’s actions beforehand. The intent and the behavior of the human are simultaneously learned by the machine. An online demo and videos are available for readers to participate in the challenge. An additional interface using vocal commands will be revealed on the demo day, demonstrating the scalability of our approach to continuous input signals.

• #11030
ERICA and WikiTalk
Divesh Lala, Graham Wilcock, Kristiina Jokinen, Tatsuya Kawahara
Demo Talks 2

The demo shows ERICA, a highly realistic female android robot, and WikiTalk, an application that helps robots to talk about thousands of topics using information from Wikipedia. The combination of ERICA and WikiTalk results in more natural and engaging human-robot conversations.

• #11038
Hintikka's World: scalable higher-order knowledge
Tristan Charrier, Sébastien Gamblin, Alexandre Niveau, François Schwarzentruber
Demo Talks 2

Hintikka's World is a graphical and pedagogical tool that shows how artificial agents can reason about higher-order knowledge (agent $a$ knows that agent $b$ knows that...).    In this demonstration paper, we present the implementation of symbolic models in Hintikka's World. They enable to face the state explosion and to provide examples such as real card games, such as Hanabi.

• #11044
DISPUTool -- A tool for the Argumentative Analysis of Political Debates
Shohreh Haddadan, Elena Cabrio, Serena Villata
Demo Talks 2

Political debates are the means used by political candidates to put forward and justify their positions in front of the electors with respect to the issues at stake. Argument mining is a novel research area in Artificial Intelligence, aiming at analyzing discourse on the pragmatics level and applying a certain argumentation theory to model and automatically analyze textual data. In this paper, we present DISPUTool, a tool designed to ease the work of historians and social science scholars in analyzing the argumentative content of political speeches. More precisely, DISPUTool allows to explore and automatically identify argumentative components over the 39 political debates from the last 50 years of US presidential campaigns (1960-2016).

• #11048
Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination
Ruixue Liu, Baoyang Chen, Meng Chen, Youzheng Wu, Zhijie Qiu, Xiaodong He
Demo Talks 2

We present a novel real-time, collaborative, and interactive AI painting system, Mappa Mundi, for artistic Mind Map creation. The system consists  of a voice-based input interface, an automatic topic expansion module, and an image projection module. The key innovation is to inject Artificial Imagination into painting creation by considering lexical and phonological similarities of language, learning and inheriting artist’s original painting style, and applying the principles of Dadaism and impossibility of improvisation. Our system indicates that AI and artist can collaborate seamlessly to create imaginative artistic painting and Mappa Mundi has been applied in art exhibition in UCCA, Beijing.

### Wednesday 1409:30 - 18:00DB2 - Demo Booths 2 (Z1)

Chair: TBA
• #11022
Fair and Explainable Dynamic Engagement of Crowd Workers
Han Yu, Yang Liu, Xiguang Wei, Chuyu Zheng, Tianjian Chen, Qiang Yang, Xiong Peng
Demo Booths 2

Years of rural-urban migration has resulted in a significant population in China seeking ad-hoc work in large urban centres. At the same time, many businesses face large fluctuations in demand for manpower and require more efficient ways to satisfy such demands. This paper outlines AlgoCrowd, an artificial intelligence (AI)-empowered algorithmic crowdsourcing platform. Equipped with an efficient explainable task-worker matching optimization approach designed to focus on fair treatment of workers while maximizing collective utility, the platform provides explicable task recommendations to workers' personal work management mobile apps which are becoming popular, with the aim to address the above societal challenge.

• #11024
Multi-Agent Visualization for Explaining Federated Learning
Xiguang Wei, Quan Li, Yang Liu, Han Yu, Tianjian Chen, Qiang Yang
Demo Booths 2

As an alternative decentralized training approach, Federated Learning enables distributed agents to collaboratively learn a machine learning model while keeping personal/private information on local devices. However, one significant issue of this framework is the lack of transparency, thus obscuring understanding of the working mechanism of Federated Learning systems. This paper proposes a multi-agent visualization system that illustrates what is Federated Learning and how it supports multi-agents coordination. To be specific, it allows users to participate in the Federated Learning empowered multi-agent coordination. The input and output of Federated Learning are visualized simultaneously, which provides an intuitive explanation of Federated Learning for users in order to help them gain deeper understanding of the technology.

• #11028
AiD-EM: Adaptive Decision Support for Electricity Markets Negotiations
Tiago Pinto, Zita Vale
Demo Booths 2

This paper presents the Adaptive Decision Support for Electricity Markets Negotiations (AiD-EM) system. AiD-EM is a multi-agent system that provides decision support to market players by incorporating multiple sub-(agent-based) systems, directed to the decision support of specific problems. These sub-systems make use of different artificial intelligence methodologies, such as machine learning and evolutionary computing, to enable players adaptation in the planning phase and in actual negotiations in auction-based markets and bilateral negotiations. AiD-EM demonstration is enabled by its connection to MASCEM (Multi-Agent Simulator of Competitive Electricity Markets).

• #11032
Embodied Conversational AI Agents in a Multi-modal Multi-agent Competitive Dialogue
Rahul Divekar, Xiangyang Mou, Lisha Chen, Maíra Gatti de Bayser, Melina Alberio Guerra, Hui Su
Demo Booths 2

In a setting where two AI agents embodied as animated humanoid avatars are engaged in a conversation with one human and each other, we see two challenges. One, determination by the AI agents about which one of them is being addressed. Two, determination by the AI agents if they may/could/should speak at the end of a turn. In this work we bring these two challenges together and explore the participation of AI agents in multi-party conversations. Particularly, we show two embodied AI shopkeeper agents who sell similar items aiming to get the business of a user by competing with each other on the price. In this scenario, we solve the first challenge by using headpose (estimated by deep learning techniques) to determine who the user is talking to. For the second challenge we use deontic logic to model rules of a negotiation conversation.

• #11038
Hintikka's World: scalable higher-order knowledge
Tristan Charrier, Sébastien Gamblin, Alexandre Niveau, François Schwarzentruber
Demo Booths 2

Hintikka's World is a graphical and pedagogical tool that shows how artificial agents can reason about higher-order knowledge (agent $a$ knows that agent $b$ knows that...).    In this demonstration paper, we present the implementation of symbolic models in Hintikka's World. They enable to face the state explosion and to provide examples such as real card games, such as Hanabi.

• #11043
Multi-Agent Path Finding on Ozobots
Roman Barták, Ivan Krasičenko, Jiří Švancara
Demo Booths 2

Multi-agent path finding (MAPF) is the problem to find collision-free paths for a set of agents (mobile robots) moving on a graph. There exists several abstract models describing the problem with various types of constraints. The demo presents software to evaluate the abstract models when the plans are executed on Ozobots, small mobile robots developed for teaching programming. The software allows users to design the grid-like maps, to specify initial and goal locations of robots, to generate plans using various abstract models implemented in the Picat programming language, to simulate and to visualise execution of these plans, and to translate the plans to command sequences for Ozobots.

• #11050
Reagent: Converting Ordinary Webpages into Interactive Software Agents
Matthew Peveler, Jeffrey Kephart, Hui Su
Demo Booths 2

Reagent is a technology that readily converts ordinary webpages containing structured content into software agents, allowing for natural interaction through a combination of speech and pointing. Without needing special instrumentation on the part of website operators, Reagent does this through injection of a transparent layer that automatically captures semantic details and mouse interactions. It can then combine this content and interactions with text transcriptions of user speech to derive and execute parameterized commands representing human intent. For content that the system cannot automatically understand, it gives an easy interface for a human user to assist it in helping to build a mapping between the content and domain specific vocabulary.

• #11029
Deep Reinforcement Learning for Ride-sharing Dispatching and Repositioning
Zhiwei (Tony) Qin, Xiaocheng Tang, Yan Jiao, Fan Zhang, Chenxi Wang, Qun (Tracy) Li
Demo Booths 2

In this demo, we will present a simulation-based human-computer interaction of deep reinforcement learning in action on order dispatching and driver repositioning for ride-sharing.  Specifically, we will demonstrate through several specially designed domains how we use deep reinforcement learning to train agents (drivers) to have longer optimization horizon and to cooperate to achieve higher objective values collectively.

• #11035
Intelligent Decision Support for Improving Power Management
Yongqing Zheng, Han Yu, Kun Zhang, Yuliang Shi, Cyril Leung, Chunyan Miao
Demo Booths 2

With the development and adoption of the electricity information tracking system in China, real-time electricity consumption big data have become available to enable artificial intelligence (AI) to help power companies and the urban management departments to make demand side management decisions. We demonstrate the Power Intelligent Decision Support (PIDS) platform, which can generate Orderly Power Utilization (OPU) decision recommendations and perform Demand Response (DR) implementation management based on a short-term load forecasting model. It can also provide different users with query and application functions to facilitate explainable decision support.

• #11041
Contextual Typeahead Sticker Suggestions on Hike Messenger
Mohamed Hanoosh, Abhishek Laddha, Debdoot Mukherjee
Demo Booths 2

In this demonstration, we present Hike's sticker recommendation system, which helps users choose the right sticker to substitute the next message that they intend to send in a chat. We describe how the system addresses the issue of numerous orthographic variations for chat messages and operates under 20 milliseconds with low CPU and memory footprint on device.

### Wednesday 1411:00 - 12:30Panel: IJCAI at 50: is AI pivoting or refactoring? (R1)

Chair: Ray Perrault
• 50 years of IJCAI
Ray Perrault
Panel: IJCAI at 50: is AI pivoting or refactoring?
• ### Wednesday 1411:00 - 12:30AI-HWB - ST: AI for Improving Human Well-Being 2 (R2)

Chair: TBD
• #457
Safe Contextual Bayesian Optimization for Sustainable Room Temperature PID Control Tuning
Marcello Fiducioso, Sebastian Curi, Benedikt Schumacher, Markus Gwerder, Andreas Krause
ST: AI for Improving Human Well-Being 2

We tune one of the most common heating, ventilation, and air conditioning (HVAC) control loops, namely the temperature control of a room. For economical and environmental reasons, it is of prime importance to optimize the performance of this system. Buildings account from 20 to 40 % of a country energy consumption, and almost 50 % of it comes from HVAC systems. Scenario projections predict a 30 % decrease in heating consumption by 2050 due to efficiency increase. Advanced control techniques can improve performance; however, the proportional-integral-derivative (PID) control is typically used due to its simplicity and overall performance. We use Safe Contextual Bayesian Optimization to optimize the PID parameters without human intervention. We reduce costs by 32 % compared to the current PID controller setting while assuring safety and comfort to people in the room. The results of this work have an immediate impact on the room control loop performances and its related commissioning costs. Furthermore, this successful attempt paves the way for further use at different levels of HVAC systems, with promising energy, operational, and commissioning costs savings, and it is a practical demonstration of the positive effects that Artificial Intelligence can have on environmental sustainability.

• #1168
Protecting Neural Networks with Hierarchical Random Switching: Towards Better Robustness-Accuracy Trade-off for Stochastic Defenses
Xiao Wang, Siyue Wang, Pin-Yu Chen, Yanzhi Wang, Brian Kulis, Xue Lin, Sang Chin
ST: AI for Improving Human Well-Being 2

Despite achieving remarkable success in various domains, recent studies have uncovered the vulnerability of deep neural networks to adversarial perturbations, creating concerns on model generalizability and new threats such as prediction-evasive misclassification or stealthy reprogramming. Among different defense proposals, stochastic network defenses such as random neuron activation pruning or random perturbation to layer inputs are shown to be promising for attack mitigation. However, one critical drawback of current defenses is that the robustness enhancement is at the cost of noticeable performance degradation on legitimate data, e.g., large drop in test accuracy.This paper is motivated by pursuing for a better trade-off between adversarial robustness and test accuracy for stochastic network defenses. We propose Defense Efficiency Score (DES), a comprehensive metric that measures the gain in unsuccessful attack attempts at the cost of drop in test accuracy of any defense. To achieve a better DES, we propose hierarchical random switching (HRS), which protects neural networks through a novel randomization scheme. A HRS-protected model contains several blocks of randomly switching channels to prevent adversaries from exploiting fixed model structures and parameters for their malicious purposes. Extensive experiments show that HRS is superior in defending against state-of-the-art white-box and adaptive adversarial misclassification attacks. We also demonstrate the effectiveness of HRS in defending adversarial reprogramming, which is the first defense against adversarial programs. Moreover, in most settings the average DES of HRS is at least 5X higher than current stochastic network defenses, validating its significantly improved robustness-accuracy trade-off. p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 10.0px Helvetica} p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 10.0px Helvetica} p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 10.0px Helvetica} p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 10.0px Helvetica} p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 10.0px Helvetica}

• #707
KitcheNette: Predicting and Ranking Food Ingredient Pairings based on Siamese Neural Network
Donghyeon Park, Keonwoo Kim, Yonggyu Park, Jungwoon Shin, Jaewoo Kang
ST: AI for Improving Human Well-Being 2

As a vast number of ingredients exist in the culinary world, there are countless food ingredient pairings, but only a small number of pairings have been adopted by chefs and studied by food researchers. In this work, we propose KitcheNette which is a model that predicts food ingredient pairing scores and recommends optimal ingredient pairings. KitcheNette employs Siamese neural networks and is trained on our annotated dataset containing 300K scores of pairings generated from numerous ingredients in food recipes. As the results demonstrate, our model not only outperforms other baseline models but also can recommend complementary food pairings and discover novel ingredient pairings.

• #1303
SparseSense: Human Activity Recognition from Highly Sparse Sensor Data-streams Using Set-based Neural Networks
Alireza Abedin, S. Hamid Rezatofighi, Qinfeng Shi, Damith C. Ranasinghe
ST: AI for Improving Human Well-Being 2

Batteryless or so called passive wearables are providing new and innovative methods for human activity recognition (HAR), especially in healthcare applications for older people. Passive sensors are low cost, lightweight, unobtrusive and desirably disposable; attractive attributes for healthcare applications in hospitals and nursing homes. Despite the compelling propositions for sensing applications, the data streams from these sensors are characterised by high sparsity---the time intervals between sensor readings are irregular while the number of readings per unit time are often limited. In this paper, we rigorously explore the problem of learning activity recognition models from temporally sparse data. We describe how to learn directly from sparse data using a deep learning paradigm in an end-to-end manner. We demonstrate significant classification performance improvements on real-world passive sensor datasets from older people over the state-of-the-art deep learning human activity recognition models. Further, we provide insights into the model's behaviour through complementary experiments on a benchmark dataset and visualisation of the learned activity feature spaces.

• #3424
MNN: Multimodal Attentional Neural Networks for Diagnosis Prediction
Zhi Qiao, Xian Wu, Shen Ge, Wei Fan
ST: AI for Improving Human Well-Being 2

Diagnosis prediction plays a key role in clinical decision supporting process, which attracted extensive research attention recently. Existing studies mainly utilize discrete medical codes (e.g., the ICD codes and procedure codes) as the primary features in prediction. However, in real clinical settings, such medical codes could be either incomplete or erroneous. For example, missed diagnosis will neglect some codes which should be included, mis-diagnosis will generate incorrect medical codes. To increase the robustness towards noisy data, we introduce textual clinical notes in addition to medical codes. Combining information from both sides will lead to improved understanding towards clinical health conditions. To accommodate both the textual notes and discrete medical codes in the same framework, we propose Multimodal Attentional Neural Networks (MNN), which integrates multi-modal data in a collaborative manner. Experimental results on real world EHR datasets demonstrate the advantages of MNN in terms of both robustness and accuracy.

• #4465
Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the Hamming Distance
Wenjie Ruan, Min Wu, Youcheng Sun, Xiaowei Huang, Daniel Kroening, Marta Kwiatkowska
ST: AI for Improving Human Well-Being 2

Deployment of deep neural networks (DNNs) in safety-critical systems requires provable guarantees for their correct behaviours. We compute the maximal radius of a safe norm ball around a given input, within which there are no adversarial examples for a trained DNN. We define global robustness as an expectation of the maximal safe radius over a test dataset, and develop an algorithm to approximate the global robustness measure by iteratively computing its lower and upper bounds. Our algorithm is the first efficient method for the Hamming (L0) distance, and we hypothesise that this norm is a good proxy for a certain class of physical attacks. The algorithm is anytime, i.e., it returns intermediate bounds and robustness estimates that are gradually, but strictly, improved as the computation proceeds; tensor-based, i.e., the computation is conducted over a set of inputs simultaneously to enable efficient GPU computation; and has provable guarantees, i.e., both the bounds and the robustness estimates can converge to their optimal values. Finally, we demonstrate the utility of our approach by applying the algorithm to a set of challenging problems.

### Wednesday 1411:00 - 12:30ML|DL - Deep Learning 4 (R3)

Chair: TBD
• #2022
Towards Robust ResNet: A Small Step but a Giant Leap
Jingfeng Zhang, Bo Han, Laura Wynter, Bryan Kian Hsiang Low, Mohan Kankanhalli
Deep Learning 4

This paper presents a simple yet principled approach to boosting the robustness of the residual network (ResNet) that is motivated by a dynamical systems perspective. Namely, a deep neural network can be interpreted using a partial differential equation, which naturally inspires us to characterize ResNet based on an explicit Euler method. This consequently allows us to exploit the step factor h in the Euler method to control the robustness of ResNet in both its training and generalization. In particular, we prove that a small step factor h can benefit its training and generalization robustness during backpropagation and forward propagation, respectively. Empirical evaluation on real-world datasets corroborates our analytical findings that a small h can indeed improve both its training and generalization robustness.

• #2847
Reparameterizable Subset Sampling via Continuous Relaxations
Sang Michael Xie, Stefano Ermon
Deep Learning 4

Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item, and allows for low-variance reparameterized gradients with respect to the parameters of the underlying distribution. However, stochastic optimization involving subset sampling is typically not reparameterizable. To overcome this limitation, we define a continuous relaxation of subset sampling that provides reparameterization gradients by generalizing the Gumbel-max trick. We use this approach to sample subsets of features in an instance-wise feature selection task for model interpretability, subsets of neighbors to implement a deep stochastic k-nearest neighbors model, and sub-sequences of neighbors to implement parametric t-SNE by directly comparing the identities of local neighbors. We improve performance in all these tasks by incorporating subset sampling in end-to-end training.

• #2949
Image Captioning with Compositional Neural Module Networks
Junjiao Tian, Jean Oh
Deep Learning 4

In image captioning, sequential models are preferred where fluency is an important factor in evaluation, n-gram metrics; however, sequential models generally result in over-generalized expressions that lack the details which may be present in an input image. Inspired by the idea of compositional neural module networks in the visual question answering task, we introduce a hierarchical framework for image captioning that explores both compositionality and sequentiality of natural language. Our algorithm learns to compose a sentence by selectively attending to different modules corresponding to unique aspects of each object detected in an input image such as counts and color.In a set of experiments on the MSCOCO dataset, the proposed model outperforms a state-of-the art model across multiple evaluation metrics, more importantly, presenting visually interpretable results. Furthermore, the breakdown of subcategories f-scores of the SPICE metric and human evaluation on Amazon Mechanical Turk  show that our compositional module networks effectively generate  accurate and detailed captions.

• #5573
Extrapolating Paths with Graph Neural Networks
Jean-Baptiste Cordonnier, Andreas Loukas
Deep Learning 4

We consider the problem of path inference: given a path prefix, i.e., a partially observed sequence of nodes in a graph, we want to predict which nodes are in the missing suffix. In particular, we focus on natural paths occurring as a by-product of the interaction of an agent with a network---a driver on the transportation network, an information seeker in Wikipedia, or a client in an online shop. Our interest is sparked by the realization that, in contrast to shortest-path problems, natural paths are usually not optimal in any graph-theoretic sense, but might still follow predictable patterns. Our main contribution is a graph neural network called Gretel. Conditioned on a path prefix, this network can efficiently extrapolate path suffixes, evaluate path likelihood, and sample from the future path distribution. Our experiments with GPS traces on a road network and user-navigation paths in Wikipedia confirm that Gretel is able to adapt to graphs with very different properties, while also comparing favorably to previous solutions.

• #3760
Ornstein Auto-Encoders
Youngwon Choi, Joong-Ho Won
Deep Learning 4

We propose the Ornstein auto-encoder (OAE), a representation learning model for correlated data. In many interesting applications, data have nested structures. Examples include the VGGFace and MNIST datasets. We view such data consist of i.i.d. copies of a stationary random process, and seek a latent space representation of the observed sequences. This viewpoint necessitates a distance measure between two random processes. We propose to use Orstein's d-bar distance, a process extension of Wasserstein's distance. We first show that the theorem by Bousquet et al. (2017) for Wasserstein auto-encoders extends to stationary random processes. This result, however, requires both encoder and decoder to map an entire sequence to another. We then show that, when exchangeability within a process, valid for VGGFace and MNIST, is assumed, these maps reduce to univariate ones, resulting in a much simpler, tractable optimization problem. Our experiments show that OAEs successfully separate individual sequences in the latent space, and can generate new variations of unknown, as well as known, identity. The latter has not been possible with other existing methods.

• #3138
Variational Graph Embedding and Clustering with Laplacian Eigenmaps
Zitai Chen, Chuan Chen, Zong Zhang, Zibin Zheng, Qingsong Zou
Deep Learning 4

As a fundamental machine learning problem, graph clustering has facilitated various real-world applications, and tremendous efforts had been devoted to it in the past few decades. However, most of the existing methods like spectral clustering suffer from the sparsity, scalability, robustness and handling high dimensional raw information in clustering. To address this issue, we propose a deep probabilistic model, called Variational Graph Embedding and Clustering with Laplacian Eigenmaps (VGECLE), which learns node embeddings and assigns node clusters simultaneously. It represents each node as a Gaussian distribution to disentangle the true embedding position and the uncertainty from the graph. With a Mixture of Gaussian (MoG) prior, VGECLE is capable of learning an interpretable clustering by the variational inference and generative process. In order to learn the pairwise relationships better, we propose a Teacher-Student mechanism encouraging node to learn a better Gaussian from its instant neighbors in the stochastic gradient descent (SGD) training fashion. By optimizing the graph embedding and the graph clustering problem as a whole, our model can fully take the advantages in their correlation. To our best knowledge, we are the first to tackle graph clustering in a deep probabilistic viewpoint. We perform extensive experiments on both synthetic and real-world networks to corroborate the effectiveness and efficiency of the proposed framework.

### Wednesday 1411:00 - 12:30ML|RL - Reinforcement Learning 3 (R4)

Chair: TBD
• #149
Experience Replay Optimization
Daochen Zha, Kwei-Herng Lai, Kaixiong Zhou, Xia Hu
Reinforcement Learning 3

Experience replay enables reinforcement learning agents to memorize and reuse past experiences, just as humans replay memories for the situation at hand. Contemporary off-policy algorithms either replay past experiences uniformly or utilize a rule-based replay strategy, which may be sub-optimal. In this work, we consider learning a replay policy to optimize the cumulative reward. Replay learning is challenging because the replay memory is noisy and large, and the cumulative reward is unstable. To address these issues, we propose a novel experience replay optimization (ERO) framework which alternately updates two policies: the agent policy, and the replay policy. The agent is updated to maximize the cumulative reward based on the replayed data, while the replay policy is updated to provide the agent with the most useful experiences. The conducted experiments on various continuous control tasks demonstrate the effectiveness of ERO, empirically showing promise in experience replay learning to improve the performance of off-policy reinforcement learning algorithms.

• #354
Interactive Teaching Algorithms for Inverse Reinforcement Learning
Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla
Reinforcement Learning 3

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: How could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework where a teacher adaptively chooses the next demonstration based on learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting where a teacher has full knowledge about the learner's dynamics and a blackbox setting where the teacher has minimal knowledge. Then, we study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees of our teaching algorithm in the omniscient setting. Extensive experiments with a car driving simulator environment show that the learning progress can be speeded up drastically as compared to an uninformative teacher.

• #619
Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human/Agent's Demonstration
Zhaodong Wang, Matt Taylor
Reinforcement Learning 3

Reinforcement learning has enjoyed multiple impressive successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper focuses on a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper leverages demonstrations, allowing an agent to quickly achieve high performance. This paper introduces the Dynamic Reuse of Prior (DRoP) algorithm, which combines the offline knowledge (demonstrations recorded before learning) with online confidence-based performance analysis. DRoP leverages the demonstrator's knowledge by automatically balancing between reusing the prior knowledge and the current learned policy, allowing the agent to outperform the original demonstrations. We compare with multiple state-of-the-art learning algorithms and empirically show that DRoP can achieve superior performance in two domains. Additionally, we show that this confidence measure can be used to selectively request additional demonstrations, significantly improving the learning performance of the agent.

• #1363
Meta Reinforcement Learning with Task Embedding and Shared Policy
Lin Lan, Zhenguo Li, Xiaohong Guan, Pinghui Wang
Reinforcement Learning 3

Despite significant progress, deep reinforcement learning (RL) suffers from data-inefficiency and limited generalization. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks such that a novel but related task could be solved quickly. Though specific in some ways, different tasks in meta-RL are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the specific and shared information among different tasks, which limits their ability to learn training tasks and to generalize to novel tasks. In this paper, we propose to capture the shared information on the one hand and meta-learn how to quickly abstract the specific information about a task on the other hand. Methodologically, we train an SGD meta-learner to quickly optimize a task encoder for each task, which generates a task embedding based on past experience. Meanwhile, we learn a policy which is shared across all tasks and conditioned on task embeddings. Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns compared to baselines.

• #2866
The Expected-Length Model of Options
David Abel, John Winder, Marie desJardins, Michael Littman
Reinforcement Learning 3

Effective options can make reinforcement learning easier by enhancing an agent's ability to both explore in a targeted manner and plan further into the future. However, learning an appropriate model of an option's dynamics in hard, requiring estimating a highly parameterized probability distribution. This paper introduces and motivates the Expected-Length Model (ELM) for options, an alternate model for transition dynamics. We prove ELM is a (biased) estimator of the traditional Multi-Time Model (MTM), but provide a non-vacuous bound on their deviation. We further prove that, in stochastic shortest path problems, ELM induces a value function that is sufficiently similar to the one induced by MTM, and is thus capable of supporting near-optimal behavior. We explore the practical utility of this option model experimentally, finding consistent support for the thesis that ELM is a suitable replacement for MTM. In some cases, we find ELM leads to more sample efficient learning, especially when options are arranged in a hierarchy.

• #5384
Planning with Expectation Models
Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton
Reinforcement Learning 3

p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; color: #000000} Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are relatively easier to learn due to their compactness and have also been widely used for deterministic environments. For stochastic environments, it is not obvious how expectation models can be used for planning as they only partially characterize a distribution. In this paper, we propose a sound way of using approximate expectation models for MBRL. In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.

### Wednesday 1411:00 - 12:30AMS|AGT - Algorithmic Game Theory 1 (R5)

Chair: TBD
• #166
Achieving a Fairer Future by Changing the Past
Jiafan He, Ariel D. Procaccia, Alexandros Psomas, David Zeng
Algorithmic Game Theory 1

We study the problem of allocating T indivisible items that arrive online to agents with additive valuations. The allocation must satisfy a prominent fairness notion, envy-freeness up to one item (EF1), at each round. To make this possible, we allow the reallocation of previously allocated items, but aim to minimize these so-called adjustments. For the case of two agents, we show that algorithms that are informed about the values of future items can get by without any adjustments, whereas uninformed algorithms require Theta(T) adjustments. For the general case of three or more agents, we prove that even informed algorithms must use Omega(T) adjustments, and design an uninformed algorithm that requires only O(T^(3/2)).

• #3032
Preferred Deals in General Environments
Yuan Deng, Sébastien Lahaie, Vahab Mirrokni
Algorithmic Game Theory 1

A preferred deal is a special contract for selling impressions of display ad inventory. By accepting a deal, a buyer agrees to buy a minimum amount of impressions at a fixed price per impression, and is granted priority access to the impressions before they are sent to an open auction on an ad exchange. We consider the problem of designing preferred deals (inventory, price, quantity) in the presence of general convex constraints, including budget constraints, and propose an approximation algorithm to maximize the revenue obtained from the deals. We then evaluate our algorithm using auction data from a major advertising exchange and our empirical results show that the algorithm achieves around 95% of the optimal revenue.

• #3679
On the Efficiency and Equilibria of Rich Ads
MohammadAmin Ghiasi, MohammadTaghi Hajiaghayi, Sébastien Lahaie, Hadi Yami
Algorithmic Game Theory 1

Search ads have evolved in recent years from simple text formats to rich ads that allow deep site links, rating, images and videos. In this paper, we consider a model where several slots are available on the search results page, as in the classic generalized second-price auction (GSP), but now a bidder can be allocated several consecutive slots, which are interpreted as a rich ad. As in the GSP, each bidder submits a bid-per-click, but the click-through rate (CTR) function is generalized from a simple CTR for each slot to a general CTR function over sets of consecutive slots. We study allocation and pricing in this model under subadditive and fractionally subadditive CTRs. We design and analyze a constant-factor approximation algorithm for the efficient allocation problem under fractionally subadditive CTRs, and a log-approximation algorithm for the subadditive case. Building on these results, we show that approximate competitive equilibrium prices exist and can be computed for subadditive and fractionally subadditive CTRs, with the same guarantees as for allocation.

• #4855
Neural Networks for Predicting Human Interactions in Repeated Games
Yoav Kolumbus, Gali Noti
Algorithmic Game Theory 1

We consider the problem of predicting human players' actions in repeated strategic interactions. Our goal is to predict the dynamic step-by-step behavior of individual players in previously unseen games. We study the ability of neural networks to perform such predictions and the information that they require. We show on a dataset of normal-form games from experiments with human participants that standard neural networks are able to learn functions that provide more accurate predictions of the players' actions than established models from behavioral economics. The networks outperform the other models in terms of prediction accuracy and cross-entropy, and yield higher economic value. We show that if the available input is only of a short sequence of play, economic information about the game is important for predicting behavior of human agents. However, interestingly, we find that when the networks are trained with long enough sequences of history of play, action-based networks do well and additional economic details about the game do not improve their performance, indicating that the sequence of actions encode sufficient information for the success in the prediction task.

• #6034
Ridesharing with Driver Location Preferences
Duncan Rheingans-Yoo, Scott Duke Kominers, Hongyao Ma, David C. Parkes
Algorithmic Game Theory 1

We study revenue-optimal pricing and driver compensation in ridesharing platforms when drivers have heterogeneous preferences over locations. If a platform ignores drivers' location preferences, it may make inefficient trip dispatches; moreover, drivers may strategize so as to route towards their preferred locations. In a model with stationary and continuous demand and supply, we present a mechanism that incentivizes drivers to both (i) report their location preferences truthfully and (ii) always provide service. In settings with unconstrained driver supply or symmetric demand patterns, our mechanism achieves (full-information) first-best revenue. Under supply constraints and unbalanced demand, we show via simulation that our mechanism improves over existing mechanisms and has performance close to the first-best.

• #10967
(Sister Conferences Best Papers Track) The Power of Context in Networks: Ideal Point Models with Social Interactions
Mohammad T. Irfan, Tucker Gordon
Algorithmic Game Theory 1

Game theory has been widely used for modeling strategic behaviors in networked multiagent systems. However, the context within which these strategic behaviors take place has received limited attention. We present a model of strategic behavior in networks that incorporates the behavioral context, focusing on the contextual aspects of congressional voting. One salient predictive model in political science is the ideal point model, which assigns each senator and each bill a number on the real line of political spectrum. We extend the classical ideal point model with network-structured interactions among senators. In contrast to the ideal point model's prediction of individual voting behavior, we predict joint voting behaviors in a game-theoretic fashion. The consideration of context allows our model to outperform previous models that solely focus on the networked interactions with no contextual parameters. We focus on two fundamental questions: learning the model using real-world data and computing stable outcomes of the model with a view to predicting joint voting behaviors and identifying most influential senators. We demonstrate the effectiveness of our model through experiments using data from the 114th U.S. Congress.

### Wednesday 1411:00 - 12:30MTA|SP - Security and Privacy 1 (R6)

Chair: TBD
• #606
Principal Component Analysis in the Local Differential Privacy Model
Di Wang, Jinhui Xu
Security and Privacy 1

In this paper, we study the Principal Component Analysis (PCA) problem under the (distributed) non-interactive local differential privacy model. For the low dimensional case, we show the optimal rate for the private minimax risk of the k-dimensional PCA using the squared subspace distance as the measurement. For the high dimensional row sparse case, we first give a lower bound on the private minimax risk, . Then we provide an efficient algorithm to achieve a near optimal upper bound. Experiments on both synthetic and real world datasets confirm the theoretical guarantees of our algorithms.

• #1230
DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks
Huili Chen, Cheng Fu, Jishen Zhao, Farinaz Koushanfar
Security and Privacy 1

Deep Neural Networks (DNNs) are vulnerable to Neural Trojan (NT) attacks where the adversary injects malicious behaviors during DNN training. This type of ‘backdoor’ attack is activated when the input is stamped with the trigger pattern specified by the attacker, resulting in an incorrect prediction of the model. Due to the wide application of DNNs in various critical fields, it is indispensable to inspect whether the pre-trained DNN has been trojaned before employing a model. Our goal in this paper is to address the security concern on unknown DNN to NT attacks and ensure safe model deployment. We propose DeepInspect, the first black-box Trojan detection solution with minimal prior knowledge of the model. DeepInspect learns the probability distribution of potential triggers from the queried model using a conditional generative model, thus retrieves the footprint of backdoor insertion. In addition to NT detection, we show that DeepInspect’s trigger generator enables effective Trojan mitigation by model patching. We corroborate the effectiveness, efficiency, and scalability of DeepInspect against the state-of-the-art NT attacks across various benchmarks. Extensive experiments show that DeepInspect offers superior detection performance and lower runtime overhead than the prior work.

• #3546
VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities
Xu Duan, Jingzheng Wu, Shouling Ji, Zhiqing Rui, Tianyue Luo, Mutian Yang, Yanjun Wu
Security and Privacy 1

With the explosive development of information technology, vulnerabilities have become one of the major threats to computer security. Most vulnerabilities with similar patterns can be detected effectively by static analysis methods. However, some vulnerable and non-vulnerable code is hardly distinguishable, resulting in low detection accuracy. In this paper, we define the accurate identification of vulnerabilities in similar code as a fine-grained vulnerability detection problem. We propose VulSniper which is designed to detect fine-grained vulnerabilities more effectively. In VulSniper, attention mechanism is used to capture the critical features of the vulnerabilities. Especially, we use bottom-up and top-down structures to learn the attention weights of different areas of the program. Moreover, in order to fully extract the semantic features of the program, we generate the code property graph, design a 144-dimensional vector to describe the relation between the nodes, and finally encode the program as a feature tensor. VulSniper achieves F1-scores of 80.6% and 73.3% on the two benchmark datasets, the SARD Buffer Error dataset and the SARD Resource Management Error dataset respectively, which are significantly higher than those of the state-of-the-art methods.

• #4851
Lower Bound of Locally Differentially Private Sparse Covariance Matrix Estimation
Di Wang, Jinhui Xu
Security and Privacy 1

In this paper, we study the sparse covariance matrix estimation problem in the local differential privacy model, and give a non-trivial lower bound on the non-interactive private minimax risk in the metric of squared spectral norm. We show that the lower bound is actually tight, as it matches a previous upper bound. Our main technique for achieving this lower bound is a general framework, called General Private Assouad Lemma, which is a considerable generalization of the previous private Assouad lemma and can be used as a general method for bounding the private minimax risk of matrix-related estimation problems.

• #5689
Data Poisoning against Differentially-Private Learners: Attacks and Defenses
Yuzhe Ma, Xiaojin Zhu, Justin Hsu
Security and Privacy 1

Data poisoning attacks aim to manipulate the model produced by a learning algorithm by adversarially modifying the training set. We consider differential privacy as a defensive measure against this type of attack. We show that private learners are resistant to data poisoning attacks when the adversary is only able to poison a small number of items. However, this protection degrades as the adversary is allowed to poison more data. We emprically evaluate this protection by designing attack algorithms targeting objective and output perturbation learners, two standard approaches to differentially-private machine learning. Experiments show that our methods are effective when the attacker is allowed to poison sufficiently many training items.

• #5758
Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space
Linyi Li, Zexuan Zhong, Bo Li, Tao Xie
Security and Privacy 1

Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications. However, DNNs are recently found to be vulnerable against adversarial examples, i.e., maliciously perturbed inputs that can mislead the models to make arbitrary prediction errors. Empirical defenses have been studied, but many of them can be adaptively attacked again. Provable defenses provide provable error bound of DNNs, while such bound so far is far from satisfaction. To address this issue, in this paper, we present our approach named Robustra for effectively improving the provable error bound of DNNs. We leverage the adversarial space of a reference model as the feasible region to solve the min-max game between the attackers and defenders. We solve its dual problem by linearly approximating the attackers' best strategy and utilizing the monotonicity of the slack variables introduced by the reference model. The evaluation results show that our approach can provide significantly better provable adversarial error bounds on MNIST and CIFAR10 datasets, compared to the state-of-the-art results. In particular, bounded by L^infty, with epsilon = 0.1, on MNIST we reduce the error bound from 2.74% to 2.09%; with epsilon = 0.3, we reduce the error bound from 24.19% to 16.91%.

### Wednesday 1411:00 - 12:30ML|OL - Online Learning 1 (R7)

Chair: TBD
• #2197
A Practical Semi-Parametric Contextual Bandit
Yi Peng, Miao Xie, Jiahao Liu, Xuying Meng, Nan Li, Cheng Yang, Tao Yao, Rong Jin
Online Learning 1

Classic multi-armed bandit algorithms are inefficient for a large number of arms. On the other hand, contextual bandit algorithms are more efficient, but they suffer from a large regret due to the bias of reward estimation with finite dimensional features. Although recent studies proposed semi-parametric bandits to overcome these defects, they assume arms' features are constant over time. However, this assumption rarely holds in practice, since real-world problems often involve underlying processes that are dynamically evolving over time especially for the special promotions like Singles' Day sales. In this paper, we formulate a novel Semi-Parametric Contextual Bandit Problem to relax this assumption. For this problem, a novel Two-Steps Upper-Confidence Bound framework, called Semi-Parametric UCB (SPUCB), is presented. It can be flexibly applied to linear parametric function problem with a satisfied gap-free bound on the n-step regret. Moreover, to make our method more practical in online system, an optimization is proposed for dealing with high dimensional features of a linear function. Extensive experiments on synthetic data as well as a real dataset from one of the largest e-commercial platforms demonstrate the superior performance of our algorithm.

• #3401
Marginal Posterior Sampling for Slate Bandits
Maria Dimakopoulou, Nikos Vlassis, Tony Jebara
Online Learning 1

We introduce a new Thompson sampling-based algorithm, called marginal posterior sampling, for online slate bandits, that is characterized by three key ideas. First, it postulates that the slate-level reward is a monotone function of the marginal unobserved rewards of the base actions selected in the slates's slots, but it does not attempt to estimate this function. Second, instead of maintaining a slate-level reward posterior, the algorithm maintains posterior distributions for the marginal reward of each slot's base actions and uses the samples from these marginal posteriors to select the next slate. Third, marginal posterior sampling optimizes at the slot-level rather than the slate-level, which makes the approach computationally efficient. Simulation results establish substantial advantages of marginal posterior sampling over alternative Thompson sampling-based approaches that are widely used in the domain of web services.

• #3855
Learning Multi-Objective Rewards and User Utility Function in Contextual Bandits for Personalized Ranking
Nirandika Wanigasekara, Yuxuan Liang, Siong Thye Goh, Ye Liu, Joseph Jay Williams, David S. Rosenblum
Online Learning 1

This paper tackles the problem of providing users with ranked lists of relevant search results, by incorporating contextual features of the users and search results, and learning how a user values multiple objectives. For example, to recommend a ranked list of hotels, an algorithm needs to learn which hotels are the right price for particular users, as well as how users vary in their weighting of price against the location. In our paper, we formulate the context-aware, multi-objective, ranking problem as a Multi-Objective Contextual Ranked Bandit (MOCR-B). To solve the MOCR-B problem, we present a novel algorithm, named Multi-Objective Utility-Upper Confidence Bound (MOU-UCB). The goal of MOU-UCB is to learn how to generate a ranked list of resources that maximizes the rewards in multiple objectives to give relevant search results. Our algorithm learns to predict rewards in multiple objectives based on contextual information (combining the Upper Confidence Bound algorithm for multi-armed contextual bandits with neural network embeddings), as well as learns how a user weights the multiple objectives. Our empirical results reveal that the ranked lists generated by MOU-UCB lead to better click-through rates, compared to approaches that do not learn the utility function over multiple reward objectives.

• #5278
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier
Online Learning 1

We propose an online algorithm for cumulative regret minimization in a stochastic multi-armed bandit. The algorithm adds $O(t)$ i.i.d. pseudo-rewards to its history in round $t$ and then pulls the arm with the highest average reward in its perturbed history. Therefore, we call it perturbed-history exploration (PHE). The pseudo-rewards are designed to offset the underestimated mean rewards of arms in round $t$ with a high probability. We analyze PHE in a $K$-armed bandit with $[0, 1]$ rewards, and derive both $O(K \Delta^{-1} \log n)$ and $O(\sqrt{K n \log n})$ bounds on its $n$-round regret, where $\Delta$ is the minimum gap between the mean rewards of the optimal and suboptimal arms. The key step in our analysis is a novel argument that shows that randomized Bernoulli rewards lead to optimism. We empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.

• #5548
Unifying the Stochastic and the Adversarial Bandits with Knapsack
Anshuka Rangi, Massimo Franceschetti, Long Tran-Thanh
Online Learning 1

This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget $B$ that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret defined as the difference between the total expected rewards per unit cost corresponding the best fixed action and the total expected rewards per unit cost of learning algorithm. We propose a novel algorithm EXP3.BwK and show that its expected regret is $O(\sqrt{B})$. We also show that the expected regret of any learning algorithm is at least $\Omega(\sqrt{B})$, thus our algorithm is order optimal. We then propose another algorithm EXP3++.BwK, which is order optimal in the adversarial BwK setting, and incurs an almost optimal expected regret with an additional factor of $O(\log(B))$ in the stochastic BwK setting where the rewards and the costs are drawn from unknown underlying distributions.These results are then extended to a more general online learning setting, by designing another algorithm EXP3++.LwK and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget $B$. We show that for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm, and are significantly worse in comparison to the case of having costs bounded by a constant, which is a common assumption in the BwK literature.

• #2097
Multi-Objective Generalized Linear Bandits
Shiyin Lu, Guanghui Wang, Yao Hu, Lijun Zhang
Online Learning 1

In this paper, we study the multi-objective bandits (MOB) problem, where a learner repeatedly selects one arm to play and then receives a reward vector consisting of multiple objectives. MOB has found many real-world applications as varied as online recommendation and network routing. On the other hand, these applications typically contain contextual information that can guide the learning process which, however, is ignored by most of existing work. To utilize this information, we associate each arm with a context vector and assume the reward follows the generalized linear model (GLM). We adopt the notion of Pareto regret to evaluate the learner's performance and develop a novel algorithm for minimizing it. The essential idea is to apply a variant of the online Newton step to estimate model parameters, based on which we utilize the upper confidence bound (UCB) policy to construct an approximation of the Pareto front, and then uniformly at random choose one arm from the approximate Pareto front. Theoretical analysis shows that the proposed algorithm achieves an $\tilde O(d\sqrt{T})$ Pareto regret, where $T$ is the time horizon and $d$ is the dimension of contexts, which matches the optimal result for single objective contextual bandits problem. Numerical experiments demonstrate the effectiveness of our method.

### Wednesday 1411:00 - 12:30ML|C - Classification 4 (R8)

Chair: TBD
• #2994
SPAGAN: Shortest Path Graph Attention Network
Yiding Yang, Xinchao Wang, Mingli Song, Junsong Yuan, Dacheng Tao
Classification 4

Graph convolutional networks (GCN) have recently demonstrated their potential in analyzing non-grid structure data that can be represented as graphs. The core idea is to encode the local topology of a graph, via convolutions, into the feature of a center node. In this paper, we propose a novel GCN model, which we term as Shortest Path Graph Attention Network (SPAGAN). Unlike conventional GCN models that carry out node-based attentions, on either first-order neighbors or random higher-order ones, the proposed SPAGAN conducts path-based attention that explicitly accounts for the influence of a sequence of nodes yielding the minimum cost, or shortest path, between the center node and its higher-order neighbors. SPAGAN therefore allows for a more informative and intact exploration of the graph structure and further the more effective aggregation of information from distant neighbors, as compared to node-based GCN methods. We test SPAGAN for the downstream classification task on several standard datasets, and achieve performances superior to the state of the art.

• #3128
Learn Smart with Less: Building Better Online Decision Trees with Fewer Training Examples
Ariyam Das, Jin Wang, Sahil M. Gandhi, Jae Lee, Wei Wang, Carlo Zaniolo
Classification 4

Online  decision  tree  models  are  extensively  used in  many  industrial  machine  learning  applications for real-time classification tasks. These models are highly accurate, scalable and easy to use in practice. The Very Fast Decision Tree (VFDT) is the classic  online  decision  tree  induction  model  that has been widely adopted due to its theoretical guarantees  as  well  as  competitive  performance.  However, VFDT and its variants solely rely on conservative statistical measures like Hoeffding bound to incrementally grow the tree. This makes these models  extremely  circumspect  and  limits  their  ability to  learn  fast.  In  this  paper,  we  efficiently  employ statistical  resampling  techniques  to  build  an online tree faster using fewer examples. We first theoretically show that a naive implementation of resampling techniques like non-parametric bootstrap does not scale due to large memory and computational overheads. We  mitigate  this  by  proposing  a  robust  memory-efficient bootstrap simulation heuristic (Mem-ES) that  successfully  expedites  the  learning  process. Experimental  results  on  both  synthetic  data  and large-scale real world datasets demonstrate the efficiency  and  effectiveness  of  our  proposed  technique.

• #3915
Discrete Binary Coding based Label Distribution Learning
Ke Wang, Xin Geng
Classification 4

Label Distribution Learning (LDL) is a general learning paradigm in machine learning, which includes both single-label learning (SLL) and multi-label learning (MLL) as its special cases. Recently, many LDL algorithms have been proposed to handle different application tasks such as facial age estimation, head pose estimation and visual sentiment distributions prediction. However, the training time complexity of most existing LDL algorithms is too high, which makes them unapplicable to large-scale LDL. In this paper, we propose a novel LDL method to address this issue, termed Discrete Binary Coding based Label Distribution Learning (DBC-LDL). Specifically, we design an efficiently discrete coding framework to learn binary codes for instances. Furthermore, both the pair-wise semantic similarities and the original label distributions are integrated into this framework to learn highly discriminative binary codes. In addition, a fast approximate nearest neighbor (ANN) search strategy is utilized to predict label distributions for testing instances. Experimental results on five real-world datasets demonstrate its superior performance over several state-of-the-art LDL methods with the lower time cost.

• #3982
Learning for Tail Label Data: A Label-Specific Feature Approach
Tong Wei, Wei-Wei Tu, Yu-Feng Li
Classification 4

Tail label data (TLD) is prevalent in real-world tasks, and large-scale multi-label learning (LMLL) is its major learning scheme. Previous LMLL studies typically need to additionally take into account extensive head label data (HLD), and thus fail to guide the learning behavior of TLD. In many applications such as recommender systems, however, the prediction of tail label is very necessary, since it provides very important supplementary information. We call this kind of problem as \emph{tail label learning}. In this paper, we propose a novel method for the tail label learning problem. Based on the observation that the raw feature representation in LMLL data usually benefits HLD, which may not be suitable for TLD, we construct effective and rich label-specific features through exploring labeled data distribution and leveraging label correlations. Specifically, we employ clustering analysis to explore discriminative features for each tail label replacing the original high-dimensional and sparse features. In addition, due to the scarcity of positive examples of TLD, we encode knowledge from HLD by exploiting label correlations to enhance the label-specific features. Experimental results verify the superiority of the proposed method in terms of performance on TLD.

• #4952
Spatio-Temporal Attentive RNN for Node Classification in Temporal Attributed Graphs
Dongkuan Xu, Wei Cheng, Dongsheng Luo, Xiao Liu, Xiang Zhang
Classification 4

Node classification in graph-structured data aims to classify the nodes where labels are only available for a subset of nodes. This problem has attracted considerable research efforts in recent years. In real-world applications, both graph topology and node attributes evolve over time. Existing techniques, however, mainly focus on static graphs and lack the capability to simultaneously learn both temporal and spatial/structural features. Node classification in temporal attributed graphs is challenging for two major aspects. First, effectively modeling the spatio-temporal contextual information is hard. Second, as temporal and spatial dimensions are entangled, to learn the feature representation of one target node, it's desirable and challenging to differentiate the relative importance of different factors, such as different neighbors and time periods. In this paper, we propose STAR, a spatio-temporal attentive recurrent network model, to deal with the above challenges. STAR embeds the sampling and aggregation of local neighbor nodes into a gated recurrent unit network to jointly learn the spatio-temporal contextual information. On top of that, we take advantage of the dual attention mechanism to perform a thorough analysis on the model interpretability. Extensive experiments on real datasets demonstrate the effectiveness of the STAR model.

• #2572
Worst-Case Discriminative Feature Selection
Shuangli Liao, Quanxue Gao, Feiping Nie, Yang Liu, Xiangdong Zhang
Classification 4

Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems. In this paper, we propose a new criterion for discriminative feature selection, worst-case discriminative feature selection (WDFS). Unlike Fisher Score and other methods based on the discriminative criteria considering the overall (or average) separation of data, WDFS adopts a new perspective called worst-case view which arguably is more suitable for classification applications. Specifically, WDFS directly maximizes the ratio of the minimum of between-class variance of all class pairs over the maximum of within-class variance, and thus it duly considers the separation of all classes. Otherwise, we take a greedy strategy by finding one feature at a time, but it is very easy to implement. Moreover, we also utilize the correlation between features to help reduce the redundancy and extend WDFS to uncorrelated WDFS (UWDFS). To evaluate the effectiveness of the proposed algorithm, we conduct classification experiments on many real data sets. In the experiment, we respectively use the original features and the score vectors of features over all class pairs to calculate the correlation coefficients, and analyze the experimental results in these two ways. Experimental results demonstrate the effectiveness of WDFS and UWDFS.

### Wednesday 1411:00 - 12:30NLP|D - Dialogue (R9)

Chair: TBD
• #28
Exploiting Persona Information for Diverse Generation of Conversational Responses
Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, Ting Liu
Dialogue

In human conversations, due to their personalities in mind, people can easily carry out and maintain the conversations. Giving conversational context with persona information to a chatbot, how to exploit the information to generate diverse and sustainable conversations is still a non-trivial task. Previous work on persona-based conversational models successfully make use of predefined persona information and have shown great promise in delivering more realistic responses. And they all learn with the assumption that given a source input, there is only one target response. However, in human conversations, there are massive appropriate responses to a given input message. In this paper, we propose a memory-augmented architecture to exploit persona information from context and incorporate a conditional variational autoencoder model together to generate diverse and sustainable conversations. We evaluate the proposed model on a benchmark persona-chat dataset. Both automatic and human evaluations show that our model can deliver more diverse and more engaging persona-based responses than baseline approaches.

• #1946
Generating Multiple Diverse Responses with Multi-Mapping and Posterior Mapping Selection
Chaotao Chen, Jinhua Peng, Fan Wang, Jun Xu, Hua Wu
Dialogue

In human conversation an input post is open to multiple potential responses, which is typically regarded as a one-to-many problem. Promising approaches mainly incorporate multiple latent mechanisms to build the one-to-many relationship. However, without accurate selection of the latent mechanism corresponding to the target response during training, these methods suffer from a rough optimization of latent mechanisms. In this paper, we propose a multi-mapping mechanism to better capture the one-to-many relationship, where multiple mapping modules are employed as latent mechanisms to model the semantic mappings from an input post to its diverse responses. For accurate optimization of latent mechanisms, a posterior mapping selection module is designed to select the corresponding mapping module according to the target response for further optimization. We also introduce an auxiliary matching loss to facilitate the optimization of posterior mapping selection. Empirical results demonstrate the superiority of our model in generating multiple diverse and informative responses over the state-of-the-art methods.

• #1987
Learning to Select Knowledge for Response Generation in Dialog Systems
Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, Hua Wu
Dialogue

End-to-end neural models for intelligent dialogue systems suffer from the problem of generating uninformative responses. Various methods were proposed to generate more informative responses by leveraging external knowledge. However,  few previous work has focused on selecting appropriate knowledge in the learning process. The inappropriate selection of knowledge could prohibit the model from learning to make full use of the knowledge. Motivated by this, we propose an end-to-end neural model which employs a novel knowledge selection mechanism where both prior and posterior distributions over knowledge are used to facilitate knowledge selection. Specifically, a posterior distribution over knowledge is inferred from both utterances and responses, and it ensures the appropriate selection of knowledge during the training process. Meanwhile, a prior distribution, which is inferred from utterances only,  is used to approximate the posterior distribution so that appropriate knowledge can be selected even without responses during the inference process. Compared with the previous work, our model can better incorporate appropriate knowledge in response generation. Experiments on both automatic and human evaluation verify the superiority of our model over previous baselines.

• #2334
GSN: A Graph-Structured Network for Multi-Party Dialogues
Wenpeng Hu, Zhangming Chan, Bing Liu, Dongyan Zhao, Jinwen Ma, Rui Yan
Dialogue

Existing neural models for dialogue response generation assume that utterances are sequentially organized. However, many real-world dialogues involve multiple interlocutors (i.e., multi-party dialogues), where the assumption does not hold as utterances from different interlocutors can occur in parallel.'' This paper generalizes existing sequence-based models to a Graph-Structured neural Network (GSN) for dialogue modeling. The core of GSN is a graph-based encoder that can model the information flow along the graph-structured dialogues (two-party sequential dialogues are a special case). Experimental results show that GSN significantly outperforms existing sequence-based models.

• #2504
Dual Visual Attention Network for Visual Dialog
Dan Guo, Hui Wang, Meng Wang
Dialogue

Visual dialog is a challenging task, which involves multi-round semantic transformations between vision and language. This paper aims to address cross-modal semantic correlation for visual dialog. Motivated by that Vg (global vision), Vl (local vision), Q (question) and H (history) have inseparable relevances, the paper proposes a novel Dual Visual Attention Network (DVAN) to realize (Vg, Vl, Q, H)--> A. DVAN is a three-stage query-adaptive attention model. In order to acquire accurate A (answer), it first explores the textual attention, which imposes the question on history to pick out related context H'. Then, based on Q and H', it implements respective visual attentions to discover related global image visual hints Vg' and local object-based visual hints Vl'. Next, a dual crossing visual attention is proposed. Vg' and Vl' are mutually embedded to learn the complementary of visual semantics. Finally, the attended textual and visual features are combined to infer the answer. Experimental results on the VisDial v0.9 and v1.0 datasets validate the effectiveness of the proposed approach.

• #2665
A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots
Xueliang Zhao, Chongyang Tao, Wei Wu, Can Xu, Dongyan Zhao, Rui Yan
Dialogue

We present a document-grounded matching network (DGMN) for response selection that can power a knowledge-aware retrieval-based chatbot system. The challenges of building such a model lie in how to ground conversation contexts with background documents and how to recognize important information in the documents for matching. To overcome the challenges, DGMN fuses information in a document and a context into representations of each other, and dynamically determines if grounding is necessary and importance of different parts of the document and the context through hierarchical interaction with a response at the matching step. Empirical studies on two public data sets indicate that DGMN can significantly improve upon state-of-the-art methods and at the same time enjoys good interpretability.

### Wednesday 1411:00 - 12:30CV|RDCIMRSI - Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3 (R10)

Chair: TBD
• #2079
Pedestrian Attribute Recognition by Joint Visual-semantic Reasoning and Knowledge Distillation
Qiaozhe Li, Xin Zhao, Ran He, Kaiqi Huang
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Pedestrian attribute recognition in surveillance is a challenging task in computer vision due to significant pose variation, viewpoint change and poor image quality. To achieve effective recognition, this paper presents a graph-based global reasoning framework to jointly model potential visual-semantic relations of attributes and distill auxiliary human parsing knowledge to guide the relational learning. The reasoning framework models attribute groups on a graph and learns a projection function to adaptively assign local visual features to the nodes of the graph. After feature projection, graph convolution is utilized to perform global reasoning between the attribute groups to model their mutual dependencies. Then, the learned node features are projected back to visual space to facilitate knowledge transfer. An additional regularization term is proposed by distilling human parsing knowledge from a pre-trained teacher model to enhance feature representations. The proposed framework is verified on three large scale pedestrian attribute datasets including PETA, RAP, and PA-100k. Experiments show that our method achieves state-of-the-art results.

• #3776
Low Shot Box Correction for Weakly Supervised Object Detection
Tianxiang Pan, Bin Wang, Guiguang Ding, Jungong Han, Junhai Yong
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Weakly supervised object detection (WSOD) has been widely studied but the accuracy of state-of-art methods remains far lower than strongly supervised methods. One major reason for this huge gap is the incomplete box detection problem which arises because most previous WSOD models are structured on classification networks and therefore tend to recognize the most discriminative parts instead of complete bounding boxes. To solve this problem, we define a low-shot weakly supervised object detection task and propose a novel low-shot box correction network to address it. The proposed task enables to train object detectors on a large data set all of which have image-level annotations, but only a small portion or few shots have box annotations. Given the low-shot box annotations, we use a novel box correction network to transfer the incomplete boxes into complete ones. Extensive empirical evidence shows that our proposed method yields state-of-art detection accuracy under various settings on the PASCAL VOC benchmark.

• #3835
Transferable Adversarial Attacks for Image and Video Object Detection
Xingxing Wei, Siyuan Liang, Ning Chen, Xiaochun Cao
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Identifying adversarial examples is beneficial for understanding deep networks and developing robust models. However, existing attacking methods for image object detection have two limitations: weak transferability---the generated adversarial examples often have a low success rate to attack other kinds of detection methods, and high computation cost---they need much time to deal with video data, where many frames need polluting. To address these issues, we present a generative method to obtain adversarial images and videos, thereby significantly reducing the processing time. To enhance transferability, we manipulate the feature maps extracted by a feature network, which usually constitutes the basis of object detectors. Our method is based on the Generative Adversarial Network (GAN) framework, where we combine a high-level class loss and a low-level feature loss to jointly train the adversarial example generator. Experimental results on PASCAL VOC and ImageNet VID datasets show that our method efficiently generates image and video adversarial examples, and more importantly, these adversarial examples have better transferability, therefore being able to simultaneously attack two kinds of  representative object detection models: proposal based models like Faster-RCNN and regression based models like SSD.

• #4376
Equally-Guided Discriminative Hashing for Cross-modal Retrieval
Yufeng Shi, Xinge You, Feng Zheng, Shuo Wang, Qinmu Peng
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Cross-modal hashing intends to project data from two modalities into a common hamming space to perform cross-modal retrieval efficiently. Despite satisfactory performance achieved on real applications, existing methods are incapable of effectively preserving semantic structure to maintain inter-class relationship and improving discriminability to make intra-class samples aggregated simultaneously, which thus limits the higher retrieval performance. To handle this problem, we propose Equally-Guided Discriminative Hashing (EGDH), which jointly takes into consideration semantic structure and discriminability. Specifically, we discover the connection between semantic structure preserving and discriminative methods. Based on it, we directly encode multi-label annotations that act as high-level semantic features to build a common semantic structure preserving classifier. With the common classifier to guide the learning of different modal hash functions equally, hash codes of samples are intra-class aggregated and inter-class relationship preserving. Experimental results on two benchmark datasets demonstrate the superiority of EGDH compared with the state-of-the-arts.

• #151
Color-Sensitive Person Re-Identification
Guan'an Wang, Yang Yang, Jian Cheng, Jinqiao Wang, Zengguang Hou
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Recent deep Re-ID models mainly focus on learning high-level semantic features, while failing to explicitly explore color information which is one of the most important cues for person Re-ID. In this paper, we propose a novel Color-Sensitive Re-ID to take full advantage of color information. On one hand, we train our model with real and fake images. By using the extra fake images, more color information can be exploited and it can avoid overfitting during training. On the other hand, we also train our model with images of the same person with different colors. By doing so, features can be forced to focus on the color difference in regions. To generate fake images with specified colors, we propose a novel Color Translation GAN (CTGAN) to learn mappings between different clothing colors and preserve identity consistency among the same clothing color. Extensive evaluations on two benchmark datasets show that our approach significantly outperforms state-of-the-art  Re-ID models.

• #168
Graph Convolutional Network Hashing for Cross-Modal Retrieval
Ruiqing Xu, Chao Li, Junchi Yan, Cheng Deng, Xianglong Liu
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation 3

Deep network based cross-modal retrieval has recently made significant progress. However, bridging modality gap to further enhance the retrieval accuracy still remains a crucial bottleneck. In this paper, we propose a Graph Convolutional Hashing (GCH) approach, which learns modality-unified binary codes via an affinity graph. An end-to-end deep architecture is constructed with three main components: a semantic encoder module, two feature encoding networks, and a graph convolutional network (GCN). We design a semantic encoder as a teacher module to guide the feature encoding process, a.k.a. student module, for semantic information exploiting. Furthermore, GCN is utilized to explore the inherent similarity structure among data points, which will help to generate discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate that the proposed GCH outperforms the state-of-the-art methods.

### Wednesday 1411:00 - 12:30PS|TFP - Theoretical Foundations of Planning (R11)

Chair: TBD
• #2816
Partitioning Techniques in LTLf Synthesis
Lucas Martinelli Tabajara, Moshe Y. Vardi
Theoretical Foundations of Planning

Decomposition is a general principle in computational thinking, aiming at decomposing a problem instance into easier subproblems. Indeed, decomposing a transition system into a partitioned transition relation was critical to scaling BDD-based model checking to large state spaces. Since then, it has become a standard technique for dealing with related problems, such as Boolean synthesis. More recently, partitioning has begun to be explored in the synthesis of reactive systems. LTLf synthesis, a finite-horizon version of reactive synthesis with applications in areas such as robotics, seems like a promising candidate for partitioning techniques. After all, the state of the art is based on a BDD-based symbolic algorithm similar to those from model checking, and partitioning could be a potential solution to the current bottleneck of this approach, which is the construction of the state space. In this work, however, we expose fundamental limitations of partitioning that hinder its effective application to symbolic LTLf synthesis. We not only provide evidence for this fact through an extensive experimental evaluation, but also perform an in-depth analysis to identify the reason for these results. We trace the issue to an overall increase in the size of the explored state space, caused by an inability of partitioning to fully exploit state-space minimization, which has a crucial effect on performance. We conclude that more specialized decomposition techniques are needed for LTLf synthesis which take into account the effects of minimization.

• #6582
Dynamic logic of parallel propositional assignments and its applications to planning
Andreas Herzig, Frédéric Maris, Julien Vianey
Theoretical Foundations of Planning

We introduce a dynamic logic with parallel composition and two kinds of nondeterministic composition, exclusive and inclusive. We show PSPACE completeness of both the model checking and the satisfiability problem and apply our logic to sequential and parallel classical planning where actions have conditional effects.

• #733
Planning for LTLf /LDLf Goals in Non-Markovian Fully Observable Nondeterministic Domains
Ronen I. Brafman, Giuseppe De Giacomo
Theoretical Foundations of Planning

In this paper, we investigate non-Markovian Nondeterministic Fully Observable Planning Domains (NMFONDs), variants of Nondeterministic Fully Observable Planning Domains (FONDs) where the next state is determined by the full history leading to the current state. In particular, we introduce TFONDs which are NMFONDs where conditions on the history are succinctly and declaratively specified using the linear-time temporal logic on finite traces LTLf and its extension LDLf. We provide algorithms for planning in TFONDs for general LTLf/LDLf goals, and establish tight complexity bounds w.r.t. the domain representation and the goal, separately. We also show that TFONDs are able to capture all NMFONDs in which the dependency on the history is "finite state". Finally, we show that TFONDs also capture Partially Observable Nondeterministic Planning Domains (PONDs), but without referring to unobservable variables.

• #1561
Steady-State Policy Synthesis for Verifiable Control
Alvaro Velasquez
Theoretical Foundations of Planning

In this paper, we introduce the Steady-State Policy Synthesis (SSPS) problem which consists of finding a stochastic decision-making policy that maximizes expected rewards while satisfying a set of asymptotic behavioral specifications. These specifications are determined by the steady-state probability distribution resulting from the Markov chain induced by a given policy. Since such distributions necessitate recurrence, we propose a solution which finds policies that induce recurrent Markov chains within possibly non-recurrent Markov Decision Processes (MDPs). The SSPS problem functions as a generalization of steady-state control, which has been shown to be in PSPACE. We improve upon this result by showing that SSPS is in P via linear programming. Our results are validated using CPLEX simulations on MDPs with over 10000 states. We also prove that the deterministic variant of SSPS is NP-hard.

• #10960
(Sister Conferences Best Papers Track) A Refined Understanding of Cost-optimal Planning with Polytree Causal Graphs
Christer Bäckström, Peter Jonsson, Sebastian Ordyniak
Theoretical Foundations of Planning

<p>Complexity analysis based on the causal graphs of planning instances is a highly important research area. In particular, tractability results have led to new methods for constructing domain-independent heuristics. Important early examples of such results were presented by, for instance, Brafman & Domshlak and Katz & Keyder. More general results based on polytrees and bounding certain parameters were subsequently derived by Aghighi et al. and Ståhlberg. We continue this line of research by analyzing cost-optimal planning for instances with a polytree causal graph, bounded domain size and bounded depth. We show that no further restrictions are necessary for tractability, thus generalizing the previous results. Our approach is based on a novel method of closely analysing optimal plans: we recursively decompose the causal graph in a way that allows for bounding the number of variable changes as a function of the depth, using a reording argument and a comparison with prefix trees of known size. We then transform the planning instances into tree-structured constraint satisfaction instances.<br><br></p>

• #4601
Reachability and Coverage Planning for Connected Agents
Tristan Charrier, Arthur Queffelec, Ocan Sankur, François Schwarzentruber
Theoretical Foundations of Planning

Motivated by the increasing appeal of robots in information-gathering missions, we study multi-agent path planning problems in which the agents must remain interconnected. We model an area by a topological graph specifying the movement and the connectivity constraints of the agents. We study the theoretical complexity of the reachability and the coverage problems of a fleet of connected agents on various classes of topological graphs. We establish the complexity of these problems on known classes, and introduce a new class called sight-moveable graphs which admit efficient algorithms.

### Wednesday 1411:00 - 12:30ML|DM - Data Mining 5 (R12)

Chair: TBD
• #1007
RecoNet: An Interpretable Neural Architecture for Recommender Systems
Francesco Fusco, Michalis Vlachos, Vasileios Vasileiadis, Kathrin Wardatzky, Johannes Schneider
Data Mining 5

Neural systems offer high predictive accuracy but are plagued by long training times and low interpretability. We present a simple neural architecture for recommender systems that lifts several of these shortcomings. Firstly, the approach has a high predictive power that is comparable to state-of-the-art recommender approaches. Secondly, owing to its simplicity, the trained model can be interpreted easily because it provides the individual contribution of each input feature to the decision. Our method is three orders of magnitude faster than general-purpose explanatory approaches, such as LIME. Finally, thanks to its design, our architecture addresses cold-start issues, and therefore the model does not require retraining in the presence of new users.

• #1044
GSTNet: Global Spatial-Temporal Network for Traffic Flow Prediction
Shen Fang, Qi Zhang, Gaofeng Meng, Shiming Xiang, Chunhong Pan
Data Mining 5

Predicting traffic flow on traffic networks is a very challenging task, due to the complicated and dynamic spatial-temporal dependencies between different nodes on the network. The traffic flow renders two types of temporal dependencies, including short-term neighboring and long-term periodic dependencies. What's more, the spatial correlations over different nodes are both local and non-local. To capture the global dynamic spatial-temporal correlations, we propose a Global Spatial-Temporal Network (GSTNet), which consists of several layers of spatial-temporal blocks. Each block contains a multi-resolution temporal module and a global correlated spatial module in sequence, which can simultaneously extract the dynamic temporal dependencies and the global spatial correlations. Extensive experiments on the real world datasets verify the effectiveness and superiority of the proposed method on both the public transportation network and the road network.

• #1050
Graph Contextualized Self-Attention Network for Session-based Recommendation
Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Fuzhen Zhuang, Junhua Fang, Xiaofang Zhou
Data Mining 5

Session-based recommendation, which aims to predict the user's immediate next action based on anonymous sessions, is a key task in many online services (e.g., e-commerce, media streaming).  Recently, Self-Attention Network (SAN) has achieved significant success in various sequence modeling tasks without using either recurrent or convolutional network. However, SAN lacks local dependencies that exist over adjacent items and limits its capacity for learning contextualized representations of items in sequences.  In this paper, we propose a graph contextualized self-attention model (GC-SAN), which utilizes both graph neural network and self-attention mechanism, for session-based recommendation. In GC-SAN, we dynamically construct a graph structure for session sequences and capture rich local dependencies via graph neural network (GNN).  Then each session learns long-range dependencies by applying the self-attention mechanism. Finally, each session is represented as a linear combination of the global preference and the current interest of that session. Extensive experiments on two real-world datasets show that GC-SAN outperforms state-of-the-art methods consistently.

• #2999
Outlier Detection for Time Series with Recurrent Autoencoder Ensembles
Tung Kieu, Bin Yang, Chenjuan Guo, Christian S. Jensen
Data Mining 5

We propose two solutions to outlier detection in time series based on recurrent autoencoder ensembles. The solutions exploit autoencoders built using sparsely-connected recurrent neural networks (S-RNNs). Such networks make it possible to generate multiple autoencoders with different neural network connection structures. The two solutions are ensemble frameworks, specifically an independent framework and a shared framework, both of which combine multiple S-RNN based autoencoders to enable outlier detection.  This ensemble-based approach aims to reduce the effects of some autoencoders being overfitted to outliers, this way improving overall detection quality. Experiments with two large real-world time series data sets, including univariate and multivariate time series, offer insight into the design properties of the proposed frameworks and demonstrate that the resulting solutions are capable of outperforming both baselines and the state-of-the-art methods.

• #5978
Similarity Preserving Representation Learning for Time Series Clustering
Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, Inderjit Dhillon
Data Mining 5

A considerable amount of clustering algorithms take instance-feature matrices as their inputs. As such, they cannot directly analyze time series data due to its temporal nature, usually unequal lengths, and complex properties. This is a great pity since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that is able to convert a set of time series with various lengths to an instance-feature matrix. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation , thus the learned feature representation is particularly suitable for the time series clustering task.&nbsp;Given a set of $n$ time series, we first construct an $n\times n$ partially-observed similarity matrix by randomly sampling $\mathcal{O}(n \log n)$ pairs of time series and computing their pairwise similarities. We then propose an efficient algorithm that solves a non-convex and NP-hard problem to learn new features based on the partially-observed similarity matrix. By conducting extensive empirical studies, we demonstrate that the proposed framework is much more effective, efficient, and flexible compared to other state-of-the-art clustering methods.

• #6128
DyAt Nets: Dynamic Attention Networks for State Forecasting in Cyber-Physical Systems
Nikhil Muralidhar, Sathappan Muthiah, Naren Ramakrishnan
Data Mining 5

Multivariate time series forecasting is an important task in state forecasting for cyber-physical systems (CPS). State forecasting in CPS is imperative for optimal planning of system energy utility and understanding normal operational characteristics of the system thus enabling anomaly detection. Forecasting models can also be used to identify sub-optimal or worn out components and are thereby useful for overall system monitoring. Most existing work only performs single step forecasting but in CPS it is imperative to forecast the next sequence of system states (i.e curve forecasting). In this paper, we propose DyAt (Dynamic Attention) networks, a novel deep learning sequence to sequence (Seq2Seq) model with a novel hierarchical attention mechanism for long-term time series state forecasting. We evaluate our method on several CPS state forecasting and electric load forecasting tasks and find that our proposed DyAt models yield a performance improvement of at least 13.69% for the CPS state forecasting task and a performance improvement of at least 18.83% for the electric load forecasting task over other state-of-the-art forecasting baselines. We perform rigorous experimentation with several variants of the DyAt model and demonstrate that the DyAt models indeed learn better representations over the entire course of the long term forecast as compared to their counterparts with or without traditional attention mechanisms. All data and source code has been made available online.

### Wednesday 1411:00 - 12:30ML|TAML - Transfer, Adaptation, Multi-task Learning 2 (R13)

Chair: TBD
• #2810
Metadata-driven Task Relation Discovery for Multi-task Learning
Zimu Zheng, Yuqi Wang, Quanyu Dai, Huadi Zheng, Dan Wang
Transfer, Adaptation, Multi-task Learning 2

Task Relation Discovery (TRD), i.e., reveal the relation of tasks, has notable value: it is the key concept underlying Multi-task Learning (MTL) and provides a principled way for identifying redundancies across tasks. However, task relation is usually specifically determined by data scientist resulting in the additional human effort for TRD, while transfer based on brute-force methods or mere training samples may cause negative effects which degrade the learning performance. To avoid negative transfer in an automatic manner, our idea is to leverage commonly available context attributes in nowadays systems, i.e., the metadata. In this paper, we, for the first time, introduce metadata into TRD for MTL and propose a novel Metadata Clustering method, which jointly uses historical samples and additional metadata to automatically exploit the true relatedness. It also avoids the negative transfer by identifying reusable samples between related tasks. Experimental results on five real-world datasets demonstrate that the proposed method is effective for MTL with TRD, and particularly useful in complicated systems with diverse metadata but insufficient data samples. In general, this study helps in automatic relation discovery among partially related tasks and sheds new light on the development of TRD in MTL through the use of metadata as apriori information.

• #4660
Group LASSO with Asymmetric Structure Estimation for Multi-Task Learning
Saullo H. G. Oliveira, André R. Gonçalves, Fernando J. Von Zuben
Transfer, Adaptation, Multi-task Learning 2

Group LASSO is a widely used regularization that imposes sparsity considering groups of covariates. When used in Multi-Task Learning (MTL) formulations, it makes an underlying assumption that if one group of covariates is not relevant for one or a few tasks, it is also not relevant for all tasks, thus implicitly assuming that all tasks are related. This implication can easily lead to negative transfer if this assumption does not hold for all tasks. Since for most practical applications we hardly know a priori how the tasks are related, several approaches have been conceived in the literature to (i) properly capture the transference structure, (ii) improve interpretability of the tasks interplay, and (iii) penalize potential negative transfer. Recently, the automatic estimation of asymmetric structures inside the learning process was capable of effectively avoiding negative transfer. Our proposal is the first attempt in the literature to conceive a Group LASSO with asymmetric transference formulation, looking for the best of both worlds in a framework that admits the overlap of groups. The resulting optimization problem is solved by an alternating procedure with fast methods. We performed experiments using synthetic and real datasets to compare our proposal with state-of-the-art approaches, evidencing the promising predictive performance and distinguished interpretability of our proposal. The real case study involves the prediction of cognitive scores for Alzheimer's disease progression assessment. The source codes are available at GitHub.

• #5047
Meta-Learning for Low-resource Natural Language Generation in Task-oriented Dialogue Systems
Fei Mi, Minlie Huang, Jiyong Zhang, Boi Faltings
Transfer, Adaptation, Multi-task Learning 2

Natural language generation (NLG) is an essential component of task-oriented dialogue systems. Despite the recent success of neural approaches for NLG, they are typically developed for particular domains with rich annotated training examples. In this paper, we study NLG in a low-resource setting to generate sentences in new scenarios with handful training examples. We formulate the problem from a meta-learning perspective, and propose a generalized optimization-based approach (Meta-NLG) based on the well-recognized model-agnostic meta-learning (MAML) algorithm. Meta-NLG defines a set of meta tasks, and directly incorporates the objective of adapting to new low-resource NLG tasks into the meta-learning optimization process. Extensive experiments are conducted on a large multi-domain dataset (MultiWoz) with diverse linguistic variations. We show that Meta-NLG significantly outperforms other training procedures in various low-resource configurations. We analyze the results, and demonstrate that Meta-NLG adapts extremely fast and well to low-resource situations.

• #5499
One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Networks
Tao He, Yuan-Fang Li, Lianli Gao, Dongxiang Zhang, Jingkuan Song
Transfer, Adaptation, Multi-task Learning 2

With the recent explosive increase of digital data,image recognition and retrieval has become a critical practical application. Hashing is an effective solution to this problem, due to its low storage requirement and high query speed. However, most of past works focus on hashing in a single (source) domain. Thus, the learned hash function may not adapt well in a new (target) domain that has a large distributional difference with the source domain. In this paper, we explore an endto-end domain adaptive learning framework that simultaneously and precisely generates discriminative hash codes and classifies target domain images. Our method encodes images into a semantic common space, followed by two independent generative adversarial networks aiming at crosswise reconstructing two domain? images, reducing domain disparity and improving alignment in the shared space. We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to the other state-of-the-art methods on the tasks of object recognition and image retrieval. Our anonymous code is released at https://github.com/htlsn/igan.

• #1358
Progressive Transfer Learning for Person Re-identification
Zhengxu Yu, Zhongming Jin, Long Wei, Jishun Guo, Jianqiang Huang, Deng Cai, Xiaofei He, Xian-Sheng Hua
Transfer, Adaptation, Multi-task Learning 2

Model fine-tuning is a widely used transfer learning approach in person Re-identification (ReID) applications, which fine-tuning a pre-trained feature extraction model into the target scenario instead of training a model from scratch. It is challenging due to the significant variations inside the target scenario, e.g., different camera viewpoint, illumination changes, and occlusion. These variations result in a gap between the distribution of each mini-batch and the distribution of the whole dataset when using mini-batch training. In this paper, we study model fine-tuning from the perspective of the aggregation and utilization of the global information of the dataset when using mini-batch training. Specifically, we introduce a novel network structure called Batch-related Convolutional Cell (BConv-Cell), which progressively collects the global information of the dataset into a latent state and uses this latent state to rectify the extracted feature. Based on BConv-Cells, we further proposed the Progressive Transfer Learning (PTL) method to facilitate the model fine-tuning process by joint training the BConv-Cells and the pre-trained ReID model. Empirical experiments show that our proposal can improve the performance of the ReID model greatly on MSMT17, Market-1501, CUHK03 and DukeMTMC-reID datasets. The code will be released later on at \url{https://github.com/ZJULearning/PTL}

• #2913
Complementary Learning for Overcoming Catastrophic Forgetting Using Experience Replay
Mohammad Rostami, Soheil Kolouri, Praveen Pilly
Transfer, Adaptation, Multi-task Learning 2

Despite huge success, deep networks are unable to learn effectively in sequential multitask learning settings as they forget the past learned tasks after learning new tasks. Inspired from complementary learning systems theory, we address this challenge by learning a generative model that couples the current task to the past learned tasks through a discriminative embedding space. We learn an abstract generative distribution in the embedding that allows generation of data points to represent past experience. We sample from this distribution and utilize experience replay to avoid forgetting and simultaneously accumulate new knowledge to the abstract distribution in order to couple the current task with past experience. We demonstrate theoretically and empirically that our framework learns a distribution in the embedding, which is shared across all tasks, and as a result tackles catastrophic forgetting.

### Wednesday 1411:00 - 12:30HSGP|HS - Heuristic Search 1 (R14)

Chair: TBD
• #762
Depth-First Memory-Limited AND/OR Search and Unsolvability in Cyclic Search Spaces
Akihiro Kishimoto, Adi Botea, Radu Marinescu
Heuristic Search 1

Computing cycle-free solutions in cyclic AND/OR search spaces is an important AI problem.  Previous work on optimal depth-first search strongly assumes the use of consistent heuristics, the need to keep all examined states in a transposition table, and the existence of solutions.  We give a new theoretical analysis under relaxed assumptions where previous results no longer hold.  We then present a generic approachto proving unsolvability, and apply it to RBFAOO and BLDFS, two state-of-the-art algorithms. We demonstrate the performance in domain-independent nondeterministic planning

• #1078
Iterative Budgeted Exponential Search
Malte Helmert, Tor Lattimore, Levi Lelis, Laurent Orseau, Nathan R. Sturtevant
Heuristic Search 1

We tackle two long-standing problems related to re-expansions in heuristic search algorithms. For graph search, A* can require $\Omega(2^{n_*})$ expansions, where $n_*$ is the number of states within the final $f$ bound. Existing algorithms that address this problem like B and B' improve this bound to $\Omega(n_*^2)$. For tree search, IDA* can also require $\Omega(n_*^2)$ expansions. We describe a new algorithmic framework that iteratively controls an expansion budget and solution cost limit, giving rise to new graph and tree search algorithms for which the number of expansions is $O(n_* \log C^*)$, where $C^*$ is the optimal solution cost. Our experiments show that the new algorithms are robust in scenarios where existing algorithms fail. In the case of tree search, our new algorithms have no overhead over IDA* in scenarios to which IDA* is well suited and can therefore be recommended as a general replacement for IDA*.

• #1195
Conditions for Avoiding Node Re-expansions in Bounded Suboptimal Search
Jingwei Chen, Nathan R. Sturtevant
Heuristic Search 1

Many practical problems are too difficult to solve optimally, motivating the need to found suboptimal solutions, particularly those with bounds on the final solution quality. Algorithms like Weighted A*, A*-epsilon, Optimistic Search, EES, and DPS have been developed to find suboptimal solutions with solution quality that is within a constant bound of the optimal solution. However, with the exception of weighted A*, all of these algorithms require performing node re-expansions during search. This paper explores the properties of priority functions that can find bounded suboptimal solution without requiring node re-expansions. After general bounds are developed, two new convex priority functions are developed that can outperform weighted A*.

• #5894
A*+IDA*: A Simple Hybrid Search Algorithm
Zhaoxing Bu, Richard E. Korf
Heuristic Search 1

We present a simple combination of A* and IDA*, which we call A*+IDA*. It runs A* until memory is almost exhausted, then runs IDA* below each frontier node without duplicate checking. It is widely believed that this algorithm is called MREC, but MREC is just IDA* with a transposition table. A*+IDA* is the first algorithm to run significantly faster than IDA* on the 24-Puzzle, by a factor of almost 5. A complex algorithm called dual search was reported to significantly outperform IDA* on the 24-Puzzle, but the original version does not. We made improvements to dual search and our version combined with A*+IDA* outperforms IDA* by a factor of 6.7 on the 24-Puzzle. Our disk-based A*+IDA* shows further improvement on several hard 24-Puzzle instances. We also found optimal solutions to a subset of random 27 and 29-Puzzle problems. A*+IDA* does not outperform IDA* on Rubik’s Cube, for reasons we explain.

• #5795
Heuristic Search for Homology Localization Problem and Its Application in Cardiac Trabeculae Reconstruction
Xudong Zhang, Pengxiang Wu, Changhe Yuan, Yusu Wang, Dimitris Metaxas, Chao Chen
Heuristic Search 1

Cardiac trabeculae are fine rod-like muscles whose ends are attached to the inner walls of ventricles. Accurate extraction of trabeculae is important yet challenging, due to the background noise and limited resolution of cardiac images. Existing works proposed to handle this task by modeling the trabeculae as topological handles for better extraction. Computing optimal representation of these handles is essential yet very expensive. In this work, we formulate the problem as a heuristic search problem, and propose novel heuristic functions based on advanced topological techniques. We show in experiments that the proposed heuristic functions improve the computation in both time and memory.

### Wednesday 1411:00 - 12:30Survey 1 - Survey Session 1 (R15)

Chair: TBD
• #10893
A Survey on Hierarchical Planning – One Abstract Idea, Many Concrete Realizations
Pascal Bercher, Ron Alford, Daniel Höller
Survey Session 1

Hierarchical planning has attracted renewed interest in the last couple of years, which led to numerous novel formalisms, problem classes, and theoretical investigations. Yet it is important to differentiate between the various formalisms and problem classes, since they show -- sometimes fundamental -- differences with regard to their expressivity and computational complexity: Some of them can be regarded equivalent to non-hierarchical formalisms while others are clearly more expressive. We survey the most important hierarchical problem classes and explain their differences and similarities. We furthermore give pointers to some of the best-known planning systems capable of solving the respective problem classes.

• #10895
Integrating Knowledge and Reasoning in Image Understanding
Somak Aditya, Yezhou Yang, Chitta Baral
Survey Session 1

Deep learning based data-driven approaches have been successfully applied in various image understanding applications ranging from object recognition, semantic segmentation to visual question answering. However, the lack of knowledge integration as well as higher-level reasoning capabilities with the methods still pose a hindrance. In this work, we present a brief survey of a few representative reasoning mechanisms, knowledge integration methods and their corresponding image understanding applications developed by various groups of researchers, approaching the problem from a variety of angles. Furthermore, we discuss upon key efforts on integrating external knowledge with neural networks. Taking cues from these efforts, we conclude by discussing potential pathways to improve reasoning capabilities.

• #10909
Counterfactuals in Explainable Artificial Intelligence (XAI): Evidence from Human Reasoning
Ruth M. J. Byrne
Survey Session 1

Counterfactuals about what could have happened are increasingly used in an array of Artificial Intelligence (AI) applications, and especially in explainable AI (XAI). Counterfactuals can aid the provision of interpretable models to make the decisions of inscrutable systems intelligible to developers and users. However, not all counterfactuals are equally helpful in assisting human comprehension. Discoveries about the nature of the counterfactuals that humans create are a helpful guide to maximize the effectiveness of counterfactual use in AI.

• #10953
A Replication Study of Semantics in Argumentation
Leila Amgoud
Survey Session 1

Argumentation is an activity of reason whose aim is to increase or decrease acceptability of claims. Its backbone is the notion of argument, a set of propositions that justifies a claim. An argument may be weakened by arguments attacking it or strengthened by arguments supporting it. Hence, its evaluation is crucial for assessing the plausibility of the claim it sustains. Consequently, a sizable amount of evaluation methods, called semantics, have been proposed in the literature. This paper discusses two classifications of the existing semantics: one is based on the type of semantics' outcomes (sets of arguments, weighting, and preorder), and the second is based on the goals pursued by the semantics (acceptability, strength, coalitions).

• #10954
Automated Essay Scoring: A Survey of the State of the Art
Zixuan Ke, Vincent Ng
Survey Session 1

Despite being investigated for over 50 years, the task of automated essay scoring is far from being solved. Nevertheless, it continues to draw a lot of attention in the natural language processing community in part because of its commercial and educational values as well as the research challenges it brings about. This paper presents an overview of the major milestones made in automated essay scoring research since its inception.

• #10947
Social Media-based User Embedding: A Literature Review
Shimei Pan, Tao Ding
Survey Session 1

Automated representation learning is behind many recent success stories in machine learning. It is often used to transfer knowledge learned from a large dataset (e.g., raw text) to tasks for which only a small number of training examples are available. In this paper, we review recent advance in learning to represent social media users in low-dimensional embeddings. The technology is critical for creating high performance social media-based human traits and behavior models since the ground truth for assessing latent human traits and behavior is often expensive to acquire at a large scale. In this survey, we review typical methods for learning a unified user embeddings from heterogeneous user data (e.g., combines social media texts with images to learn a unified user representation). Finally we point out some current issues and future directions.

### Wednesday 1411:00 - 12:30DemoT3 - Demo Talks 3 (R16)

Chair: TBA
• #11020
Crowd View: Converting Investors' Opinions into Indicators
Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen
Demo Talks 3

This paper demonstrates an opinion indicator (OI) generation system, named Crowd View, with which traders can refer to the fine-grained opinions, beyond the market sentiment (bullish/bearish), from crowd investors when trading financial instruments. We collect the real-time textual information from Twitter, and convert it into five kinds of OIs, including the support price level, resistance price level, price target, buy-side cost, and sell-side cost. The OIs for all component stocks in Dow Jones Industrial Average Index (DJI) are provided, and shown with the real-time stock price for comparison and analysis. The information embedding in the OIs and the application scenarios are introduced.

• #11031
ACTA: A Tool for Argumentative Clinical Trial Analysis
Tobias Mayer, Elena Cabrio, Serena Villata
Demo Talks 3

Argumentative analysis of textual documents of various nature (e.g., persuasive essays, online discussion blogs, scientific articles) allows to detect the main argumentative components (i.e., premises and claims) present in the text and to predict whether these components are connected to each other by argumentative relations (e.g., support and attack), leading to the identification of (possibly complex) argumentative structures. Given the importance of argument-based decision making in medicine, in this demo paper we introduce ACTA, a tool for automating the argumentative analysis of clinical trials. The tool is designed to support doctors and clinicians in identifying the document(s) of interest about a certain disease, and in analyzing the main argumentative content and PICO elements.

• #11021
OpenMarkov, an open-source tool for probabilistic graphical models
Manuel Arias, Jorge Pérez-Martín, Manuel Luque, Francisco Javier Diez
Demo Talks 3

OpenMarkov is a Java open-source tool for creating and evaluating probabilistic graphical models, including Bayesian networks, influence diagrams, and some Markov models. With more than 100,000 lines of code, it offers some features for interactive learning, explanation of reasoning, and cost-effectiveness analysis, which are not available in any other tool. OpenMarkov has been used at universities, research centers, and large companies in more than 30 countries on four continents. Several models, some of them for real-world medical applications, built with OpenMarkov, are publicly available on Internet.

• #11042
SAGE: A Hybrid Geopolitical Event Forecasting System
Fred Morstatter, Aram Galstyan, Gleb Satyukov, Daniel Benjamin, Andres Abeliuk, Mehrnoosh Mirtaheri, KSM Tozammel Hossain, Pedro Szekely, Emilio Ferrara, Akira Matsui, Mark Steyvers, Stephen Bennet, David Budescu, Mark Himmelstein, Michael Ward, Andreas Beger, Michele Catasta, Rok Sosic, Jure Leskovec, Pavel Atanasov, Regina Joseph, Rajiv Sethi, Ali Abbas
Demo Talks 3

Forecasting of geopolitical events is a notoriously difficult task, with experts failing to significantly outperform a random baseline across many types of forecasting events. One successful way to increase the performance of forecasting tasks is to turn to crowdsourcing: leveraging many forecasts from non-expert users. Simultaneously, advances in machine learning have led to models that can produce reasonable, although not perfect, forecasts for many tasks. Recent efforts have shown that forecasts can be further improved by hybridizing'' human forecasters: pairing them with the machine models in an effort to combine the unique advantages of both. In this demonstration, we present Synergistic Anticipation of Geopolitical Events (SAGE), a platform for human/computer interaction that facilitates human reasoning with machine models.

• #11047
VEST: A System for Vulnerability Exploit Scoring & Timing
Haipeng Chen, Jing Liu, Rui Liu, Noseong Park, V. S. Subrahmanian
Demo Talks 3

Knowing if/when a cyber-vulnerability will be exploited and how severe the vulnerability is can help enterprise security officers (ESOs) come up with appropriate patching schedules. Today, this ability is severely compromised: our study of data from Mitre and NIST shows that on average there is a 132 day gap between the announcement of a vulnerability by Mitre and the time NIST provides an analysis with severity score estimates and 8 important severity attributes. Many attacks happen during this very 132-day window. We present Vulnerability Exploit Scoring \& Timing (VEST), a system for (early) prediction and visualization of if/when a vulnerability will be exploited, and its estimated severity attributes and score.

• #11037
The pywmi Framework and Toolbox for Probabilistic Inference using Weighted Model Integration
Samuel Kolb, Paolo Morettin, Pedro Zuidberg Dos Martires, Francesco Sommavilla, Andrea Passerini, Roberto Sebastiani, Luc De Raedt
Demo Talks 3

Weighted Model Integration (WMI) is a popular technique for probabilistic inference that extends Weighted Model Counting (WMC) -- the standard inference technique for inference in discrete domains -- to domains with both discrete and continuous variables.  However, existing WMI solvers each have different interfaces and use different formats for representing WMI problems.  Therefore, we introduce pywmi (http://pywmi.org), an open source framework and toolbox for probabilistic inference using WMI, to address these shortcomings.  Crucially, pywmi fixes a common internal format for WMI problems and introduces a common interface for WMI solvers.  To assist users in modeling WMI problems, pywmi introduces modeling languages based on SMT-LIB.v2 or MiniZinc and parsers for both.  To assist users in comparing WMI solvers, pywmi includes implementations of several state-of-the-art solvers, a fast approximate WMI solver, and a command-line interface to solve WMI problems.  Finally, to assist developers in implementing new solvers, pywmi provides Python implementations of commonly used subroutines.

• #11046
CoTrRank: Trust Evaluation of Users and Tweets
Peiyao Li, Weiliang Zhao, Jian Yang, Jia Wu
Demo Talks 3

Trust evaluation of people and information on Twitter is critical for maintaining a healthy online social environment. How to evaluate the trustworthiness of users and tweets becomes a challenging question. In this demo, we show how our proposed CoTrRank approach deal with this problem. This approach models users and tweets in two coupled networks and calculate their trust values in different trust spaces. In particular, our solution provides a configurable way when mapping the calculated raw evidences to the trust values. The CoTrRank demo system has an interactive interface to show how our proposed approach produces more effective and adaptive trust evaluation results comparing with baseline methods.

• #11035
Intelligent Decision Support for Improving Power Management
Yongqing Zheng, Han Yu, Kun Zhang, Yuliang Shi, Cyril Leung, Chunyan Miao
Demo Talks 3

With the development and adoption of the electricity information tracking system in China, real-time electricity consumption big data have become available to enable artificial intelligence (AI) to help power companies and the urban management departments to make demand side management decisions. We demonstrate the Power Intelligent Decision Support (PIDS) platform, which can generate Orderly Power Utilization (OPU) decision recommendations and perform Demand Response (DR) implementation management based on a short-term load forecasting model. It can also provide different users with query and application functions to facilitate explainable decision support.

• #11053
GraspSnooker: Automatic Chinese Commentary Generation for Snooker Videos
Zhaoyue Sun, Jiaze Chen, Hao Zhou, Deyu Zhou, Lei Li, Mingmin Jiang
Demo Talks 3

We demonstrate a web-based software system, GraspSnooker, which is able to automatically generate Chinese text commentaries for snooker game videos. It consists of a video analyzer, a strategy predictor and a commentary generator. As far as we know, it is the first attempt on snooker commentary generation, which might be helpful for snooker learners to understand the game.

### Wednesday 1414:00 - 14:50Invited Talk (R1)

• Deep Learning: Why deep and is it only doable for neural networks?
Zhi-Hua Zhou
Invited Talk
• ### Wednesday 1415:00 - 16:00AI-HWB - ST: AI for Improving Human Well-Being 3 (R2)

Chair: TBD
• #2209
A Decomposition Approach for Urban Anomaly Detection Across Spatiotemporal Data
Mingyang Zhang, Tong Li, Hongzhi Shi, Yong Li, Pan Hui
ST: AI for Improving Human Well-Being 3

Urban anomalies such as abnormal flow of crowds and traffic accidents could result in loss of life or property if not handled properly. Detecting urban anomalies at the early stage is important to minimize the adverse effects. However, urban anomaly detection is difficult due to two challenges: a) the criteria of urban anomalies varies with different locations and time; b) urban anomalies of different types may show different signs. In this paper, we propose a decomposing approach to address these two challenges. Specifically, we decompose urban dynamics into the normal component and the abnormal component. The normal component is merely decided by spatiotemporal features, while the abnormal component is caused by anomalous events. Then, we extract spatiotemporal features and estimate the normal component accordingly. At last, we derive the abnormal component to identify anomalies. We evaluate our method using both real-world and synthetic datasets. The results show our method can detect meaningful events and outperforms state-of-the-art anomaly detecting methods by a large margin.

• #3473
RDPD: Rich Data Helps Poor Data via Imitation
Shenda Hong, Cao Xiao, Trong Nghia Hoang, Tengfei Ma, Hongyan Li, Jimeng Sun
ST: AI for Improving Human Well-Being 3

In many situations, we need to build and deploy separate models in related environments with different data qualities. For example, an environment with strong observation equipments (e.g., intensive care units) often provides high-quality multi-modal data, which are acquired from multiple sensory devices and have rich-feature representations. On the other hand, an environment with poor observation equipment (e.g., at home) only provides low-quality, uni-modal data with poor-feature representations. To deploy a competitive model in a poor-data environment without requiring direct access to multi-modal data acquired from a rich-data environment, this paper develops and presents a knowledge distillation (KD) method (RDPD) to enhance a predictive model trained on poor data using knowledge distilled from a high-complexity model trained on rich, private data. We evaluated RDPD on three real-world datasets and shown that its distilled model consistently outperformed all baselines across all datasets, especially achieving the greatest performance improvement over a model trained only on low-quality data by 24.56% on PR-AUC and 12.21% on ROC-AUC, and over that of a state-of-the-art KD model by 5.91% on PR-AUC and 4.44% on ROC-AUC.

• #215
Systematic Conservation Planning for Sustainable Land-use Policies: A Constrained Partitioning Approach to Reserve Selection and Design.
Dimitri Justeau-Allaire, Philippe Vismara, Xavier Lorca, Philippe Birnbaum
ST: AI for Improving Human Well-Being 3

Faced with natural habitat degradation, fragmentation, and destruction, it is a major challenge for environmental managers to implement sustainable land use policies promoting socioeconomic development and natural habitat conservation in a balanced way. Relying on artificial intelligence and operational research, reserve selection and design models can be of assistance. This paper introduces a partitioning approach based on Constraint Programming (CP) for the reserve selection and design problem, dealing with both coverage and complex spatial constraints. Moreover, it introduces the first CP formulation of the buffer zone constraint, which can be reused to compose more complex spatial constraints. This approach has been evaluated in a real-world dataset addressing the problem of forest fragmentation in New Caledonia, a biodiversity hotspot where managers are gaining interest in integrating these methods into their decisional processes. Through several scenarios, it showed expressiveness, flexibility, and ability to quickly find solutions to complex questions.

• #4746
Balanced Ranking with Diversity Constraints
Ke Yang, Vasilis Gkatzelis, Julia Stoyanovich
ST: AI for Improving Human Well-Being 3

Many set selection and ranking algorithms have recently been enhanced with diversity constraints that aim to explicitly increase representation of historically disadvantaged populations, or to improve the over-all representativeness of the selected set. An unintended consequence of these constraints, however, is reduced in-group fairness: the selected candidates from a given group may not be the best ones, and this unfairness may not be well-balanced across groups. In this paper we study this phenomenon using datasets that comprise multiple sensitive attributes. We then introduce additional constraints, aimed at balancing the in-group fairness across groups, and formalize the induced optimization problems as integer linear programs. Using these programs, we conduct an experimental evaluation with real datasets, and quantify the feasible trade-offs between balance and overall performance in the presence of diversity constraints.

### Wednesday 1415:00 - 16:00ML|AML - Adversarial Machine Learning 2 (R3)

Chair: TBD
• #5340
Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective
Kaidi Xu, Hongge Chen, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Mingyi Hong, Xue Lin
Adversarial Machine Learning 2

Graph neural networks (GNNs) which apply the deep neural networks to graph data have achieved significant performance for the task of semi-supervised node classification. However, only few work has addressed the adversarial robustness of GNNs. In this paper, we first present a novel gradient-based attack method that facilitates the difficulty of tackling discrete graph data. When comparing to current adversarial attacks on GNNs, the results show that by only perturbing a small number of edge perturbations, including addition and deletion, our optimization-based attack can lead to a noticeable decrease in classification performance. Moreover, leveraging our gradient-based attack, we propose the first optimization-based adversarial training for GNNs. Our method yields higher robustness against both different gradient based and greedy attack methods without sacrifice classification accuracy on original graph.

• #5480
Feature Prioritization and Regularization Improve Standard Accuracy and Adversarial Robustness
Chihuang Liu, Joseph JaJa
Adversarial Machine Learning 2

Adversarial training has been successfully applied to build robust models at a certain cost. While the robustness of a model increases, the standard classification accuracy declines. This phenomenon is suggested to be an inherent trade-off. We propose a model that employs feature prioritization by a nonlinear attention module and L2 feature regularization to improve the adversarial robustness and the standard accuracy relative to adversarial training. The attention module encourages the model to rely heavily on robust features by assigning larger weights to them while suppressing non-robust features. The regularizer encourages the model to extract similar features for the natural and adversarial images, effectively ignoring the added perturbation. In addition to evaluating the robustness of our model, we provide justification for the attention module and propose a novel experimental strategy that quantitatively demonstrates that our model is almost ideally aligned with salient data characteristics. Additional experimental results illustrate the power of our model relative to the state of the art methods.

• #5913
Interpreting and Evaluating Neural Network Robustness
Fuxun Yu, Zhuwei Qin, Chenchen Liu, Liang Zhao, Yanzhi Wang, Xiang Chen
Adversarial Machine Learning 2

Recently, adversarial deception becomes one of the most considerable threats to deep neural networks. However, compared to extensive research in new designs of various adversarial attacks and defenses, the neural networks? intrinsic robustness property is still lack of thorough investigation. This work aims to qualitatively interpret the adversarial attack and defense mechanisms through loss visualization, and establish a quantitative metric to evaluate the model?s intrinsic robustness. The proposed robustness metric identifies the upper bound of a model?s prediction divergence in the given domain and thus indicates whether the model can maintain a stable prediction. With extensive experiments, our metric demonstrates several advantages over conventional testing accuracy based robustness estimation: (1) it provides a uniformed evaluation to models with different structures and parameter scales; (2) it over-performs conventional accuracy based robustness evaluation and provides a more reliable evaluation that is invariant to different test settings; (3) it can be fast generated without considerable testing cost.

• #4224
DiffChaser: Detecting Disagreements for Deep Neural Networks
Xiaofei Xie, Lei Ma, Haijun Wang, Yuekang Li, Yang Liu, Xiaohong Li
Adversarial Machine Learning 2

The platform migration and customization have be-come an indispensable process of deep neural net-work (DNN) development.  A high-precision but complex DNN trained in the cloud on massive data and powerful GPUs often goes through an optimization phase (e.g., quantization, compression) before deployment to an environment with hardware constraints (e.g.,  mobile devices).  A test set that effectively uncovers the behavior disagreement of a DNN and its optimized variant provides certain feedback to debug and further enhance the optimization procedure. However, the minor behavior inconsistency between a DNN and its optimized version could be hard to detect, which often easily bypasses the original test set.  This paper proposes Deetector, an automated black-box testing framework to detect untargeted/targeted disagreements between version variants of a DNN. We first define the uncertainty of a DNN against an input with the metrics to quantify the measurement. A key observation is that the uncertainty inputs cover the major sources that cause the disagreements between a DNN and its variants.  Based on this, we propose a genetic algorithm based test generation technique to systematically detect the disagreement of DNN behaviors.We demonstrate 1) the effectiveness of Deetector by comparing with the state-of-the-art techniques,and 2) the usefulness on real-world DNN products during DNN quantization and optimization.

### Wednesday 1415:00 - 16:00MLA|ARL - Applications of Reinforcement Learning (R4)

Chair: TBD
• #1508
Dynamic Electronic Toll Collection via Multi-Agent Deep Reinforcement Learning with Edge-Based Graph Convolutional Network Representation
Wei Qiu, Haipeng Chen, Bo An
Applications of Reinforcement Learning

Over the past decades, Electronic Toll Collection (ETC) systems have been proved the capability of alleviating traffic congestion in urban areas. Dynamic Electronic Toll Collection (DETC) was recently proposed to further improve the efficiency of ETC, where tolls are dynamically set based on traffic dynamics. However, computing the optimal DETC scheme is computationally difficult and existing approaches are limited to small scale or partial road networks, which significantly restricts the adoption of DETC. To this end, we propose a novel multi-agent reinforcement learning (RL) approach for DETC. We make several key contributions: i) an enhancement over the state-of-the-art RL-based method with a deep neural network  representation of the policy and value functions and a temporal difference learning framework to accelerate the update of target values, ii) a novel edge-based graph convolutional neural network (eGCN) to extract the spatio-temporal correlations of the road network state features, iii) a novel cooperative multi-agent reinforcement learning (MARL) which divides the whole road network into partitions according to their geographic and economic characteristics and trains a tolling agent for each partition. Experimental results show that our approach can scale up to realistic-sized problems with robust performance and significantly outperform the state-of-the-art method.

• #3513
Randomized Adversarial Imitation Learning for Autonomous Driving
MyungJae Shin, Joongheon Kim
Applications of Reinforcement Learning

With the evolution of various advanced driver assistance system (ADAS) platforms, the design of autonomous driving system is becoming more complex and safety-critical. The autonomous driving system simultaneously activates multiple ADAS functions; and thus it is essential to coordinate various ADAS functions. This paper proposes a randomized adversarial imitation learning (RAIL) method that imitates the coordination of autonomous vehicle equipped with advanced sensors. The RAIL policies are trained through derivative-free optimization for the decision maker that coordinates the proper ADAS functions, e.g., smart cruise control and lane keeping system. Especially, the proposed method is also able to deal with the LIDAR data and makes decisions in complex multi-lane highways and multi-agent environments.

• #10983
(Journal track) Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration
Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, Kush R. Varshney, Murray Campbell, Moninder Singh, Francesca Rossi
Applications of Reinforcement Learning

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.

• #2445
Building Personalized Simulator for Interactive Search
Qianlong Liu, Baoliang Cui, Zhongyu Wei, Baolin Peng, Haikuan Huang, Hongbo Deng, Jianye Hao, Xuanjing Huang, Kam-Fai Wong
Applications of Reinforcement Learning

Interactive search, where a set of tags is recommended to users together with search results at each turn, is an effective way to guide users to identify their information need. It is a classical sequential decision problem and the reinforcement learning based agent can be introduced as a solution. The training of the agent can be divided into two stages, i.e., offline and online. Existing reinforcement learning based systems tend to perform the offline training in a supervised way based on historical labeled data while the online training is performed via reinforcement learning algorithms based on interactions with real users. The mis-match between online and offline training leads to a cold-start problem for the online usage of the agent. To address this issue, we propose to employ a simulator to mimic the environment for the offline training of the agent. Users' profiles are considered to build a personalized simulator, besides, model-based approach is used to train the simulator and is able to use the data efficiently. Experimental results based on real-world dataset demonstrate the effectiveness of our agent and personalized simulator.

### Wednesday 1415:00 - 16:00AMS|NG - Noncooperative Games 2 (R5)

Chair: TBD
• #724
Multi-Population Congestion Games With Incomplete Information
Charlotte Roman, Paolo Turrini
Noncooperative Games 2

Congestion games have many important applications to systems where only limited knowledge may be available to players. Here we study traffic networks with multiple origin-destination pairs, relaxing the simplifying assumption of agents having complete knowledge of the network structure. We identify a ubiquitous class of networks, i.e., rings, for which we can safely increase the agents’ knowledge without affecting their own overall performance - known as immunity to Informational Braess’ Paradox - closing a gap in the literature. By extension of this performance measure to include the welfare of all agents, i.e., minimisation of social cost, we show that IBP is a widespread phenomenon and no network is immune to it.

• #1706
The Imitative Attacker Deception in Stackelberg Security Games
Thanh Nguyen, Haifeng Xu
Noncooperative Games 2

To address the challenge of uncertainty regarding the attacker’s payoffs, capabilities, and other characteristics, recent work in security games has focused on learning the optimal defense strategy from observed attack data. This raises a natural concern that the strategic attacker may mislead the defender by deceptively reacting to the learning algorithms. This paper focuses on understanding how such attacker deception affects the game equilibrium. We examine a basic deception strategy termed imitative deception, in which the attacker simply pretends to have a different payoff assuming his true payoff is unknown to the defender. We provide a clean characterization about the game equilibrium as well as optimal algorithms to compute the equilibrium. Our experiments illustrate significant defender loss due to imitative attacker deception, suggesting the potential side effect of learning from the attacker.

• #4919
Optimality and Nash Stability in Additive Separable Generalized Group Activity Selection Problems
Vittorio Bilò, Angelo Fanelli, Michele Flammini, Gianpiero Monaco, Luca Moscardelli
Noncooperative Games 2

The generalized group activity selection problem (GGASP) consists in assigning agents to activities according to their preferences, which depend on both the activity and the set of its participants. We consider additively separable GGASPs, where every agent has a separate valuation for each activity as well as for any other agent, and her overall utility is given by the sum of the valuations she has for the selected activity and its participants. Depending on the nature of the agents' valuations, nine different variants of the problem arise. We completely characterize the complexity of computing a social optimum and provide approximation algorithms for the NP-hard cases. We also focus on Nash stable outcomes, for which we give some complexity results and a full picture of the related performance by providing tights bounds on both the price of anarchy and the price of stability.

• #5162
Leadership in Congestion Games: Multiple User Classes and Non-Singleton Actions
Alberto Marchesi, Matteo Castiglioni, Nicola Gatti
Noncooperative Games 2

We study the problem of finding Stackelberg equilibria in games with a massive number of players. So far, the only known game instances in which the problem is solved in polynomial time are some particular congestion games. However, a complete characterization of hard and easy instances is still lacking. In this paper, we extend the state of the art along two main directions. First, we focus on games where players' actions are made of multiple resources, and we prove that the problem is NP-hard and not in Poly-APX unless P = NP, even in the basic case in which players are symmetric, their actions are made of only two resources, and the cost functions are monotonic. Second, we focus on games with singleton actions where the players are partitioned into classes, depending on which actions they have available. In this case, we provide a dynamic programming algorithm that finds an equilibrium in polynomial time, when the number of classes is fixed and the leader plays pure strategies. Moreover, we prove that, if we allow for leader's mixed strategies, then the problem becomes NP-hard even with only four classes and monotonic costs. Finally, for both settings, we provide mixed-integer linear programming formulations, and we experimentally evaluate their scalability on both random game instances and worst-case instances based on our hardness reductions.

### Wednesday 1415:00 - 16:00AMS|CC - Coordination and Cooperation (R6)

Chair: TBD
• #1084
AsymDPOP: Complete Inference for Asymmetric Distributed Constraint Optimization Problems
Yanchen Deng, Ziyu Chen, Dingding Chen, Wenxin Zhang, Xingqiong Jiang
Coordination and Cooperation

Asymmetric distributed constraint optimization problems (ADCOPs) are an emerging model for coordinating agents with personal preferences. However, the existing inference-based complete algorithms which use local eliminations cannot be applied to ADCOPs, as the parent agents are required to transfer their private functions to their children. Rather than disclosing private functions explicitly to facilitate local eliminations, we solve the problem by enforcing delayed eliminations and propose AsymDPOP, the first inference-based complete algorithm for ADCOPs. To solve the severe scalability problems incurred by delayed eliminations, we propose to reduce the memory consumption by propagating a set of smaller utility tables instead of a joint utility table, and to reduce the computation efforts by sequential optimizations instead of joint optimizations. The empirical evaluation indicates that AsymDPOP significantly outperforms the state-of-the-art, as well as the vanilla DPOP with PEAV formulation.

• #1894
ATSIS: Achieving the Ad hoc Teamwork by Sub-task Inference and Selection
Shuo Chen, Ewa Andrejczuk, Athirai A. Irissappane, Jie Zhang
Coordination and Cooperation

In an ad hoc teamwork setting, the team needs to coordinate their activities to perform a task without prior agreement on how to achieve it. The ad hoc agent cannot communicate with its teammates but it can observe their behaviour and plan accordingly. To do so, the existing approaches rely on the teammates' behaviour models. However, the models may not be accurate, which can compromise teamwork. For this reason, we present Ad Hoc Teamwork by Sub-task Inference and Selection (ATSIS) algorithm that uses a sub-task inference without relying on teammates' models. First, the ad hoc agent observes its teammates to infer which sub-tasks they are handling. Based on that, it selects its own sub-task using a partially observable Markov decision process that handles the uncertainty of the sub-task inference. Last, the ad hoc agent uses the Monte Carlo tree search to find the set of actions to perform the sub-task. Our experiments show the benefits of ATSIS for robust teamwork.

• #5598
Ad hoc Teamwork With Behavior Switching Agents
Manish Ravula, Shani Alkoby, Peter Stone
Coordination and Cooperation

As autonomous AI agents proliferate in the real world, they will increasingly need to cooperate with each other to achieve complex goals without always being able to coordinate in advance. This kind of cooperation, in which agents have to learn to cooperate on the fly, is called ad hoc teamwork. Many previous works investigating this setting assumed that teammates behave according to one of many predefined types that is fixed throughout the task. This assumption of stationarity in behaviors, is a strong assumption which cannot be guaranteed in many real-world settings. In this work, we relax this assumption and investigate settings in which teammates can change their types during the course of the task. This adds complexity to the planning problem as now an agent needs to recognize that a change has occurred in addition to figuring out what is the new type of the teammate it is interacting with. In this paper, we present a novel Convolutional-Neural-Network-based Change point Detection (CPD) algorithm for ad hoc teamwork. When evaluating our algorithm on the modified predator prey domain, we find that it outperforms existing Bayesian CPD algorithms.

• #6462
Integrating Decision Sharing with Prediction in Decentralized Planning for Multi-Agent Coordination under Uncertainty
Minglong Li, Wenjing Yang, Zhongxuan Cai, Shaowu Yang, Ji Wang
Coordination and Cooperation

The performance of decentralized multi-agent systems tends to benefit from information sharing and its effective utilization. However, too much or unnecessary sharing may hinder the performance due to the delay, instability and additional overhead of communications. Aiming to a satisfiable coordination performance, one would prefer the cost of communications as less as possible. In this paper, we propose an approach for improving the sharing utilization by integrating information sharing with prediction in decentralized planning. We present a novel planning algorithm by combining decision sharing and prediction based on decentralized Monte Carlo Tree Search called Dec-MCTS-SP. Each agent grows a search tree guided by the rewards calculated by the joint actions, which can not only be sampled from the shared probability distributions over action sequences, but also be predicted by a sufficiently-accurate and computationally-cheap heuristics-based method. Besides, several policies including sparse and discounted UCT and DIY-bonus are leveraged for performance improvement. We have implemented Dec-MCTS-SP in the case study on multi-agent information gathering under threat and uncertainty, which is formulated as Decentralized Partially Observable Markov Decision Process (Dec-POMDP). The factored belief vectors are integrated into Dec-MCTS-SP to handle the uncertainty. Comparing with the random, auction-based algorithm and Dec-MCTS, the evaluation shows that Dec-MCTS-SP can reduce communication cost significantly while still achieving a surprisingly higher coordination performance.

### Wednesday 1415:00 - 16:00ML|TDS - Time-series;Data Streams 2 (R7)

Chair: TBD
• #4435
ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network
Zhangheng Li, Jia-Xing Zhong, Jingjia Huang, Tao Zhang, Thomas Li, Ge Li
Time-series;Data Streams 2

In recent years, memory-augmented neural networks(MANNs) have shown promising power to enhance the memory ability of neural networks for sequential processing tasks. However, previous MANNs suffer from complex memory addressing mechanism, making them relatively hard to train and causing computational overheads. Moreover, many of them reuse the classical RNN structure such as LSTM for memory processing, causing inefficient exploitations of memory information. In this paper, we introduce a novel MANN, the Auto-addressing and Recurrent Memory Integrating Network (ARMIN) to address these issues. The ARMIN only utilizes hidden state h_t for automatic memory addressing, and uses a novel RNN cell for refined integration of memory information. Empirical results on a variety of experiments demonstrate that the ARMIN is more light-weight and efficient compared to existing memory networks. Moreover, we demonstrate that the ARMIN can achieve much lower computational overhead than vanilla LSTM while keeping similar performances. Codes are available on github.com/zoharli/armin.

• #5764
AddGraph: Anomaly Detection in Dynamic Graph Using Attention-based Temporal GCN
Li Zheng, Zhenpeng Li, Jian Li, Zhao Li, Jun Gao
Time-series;Data Streams 2

Anomaly detection in dynamic graphs becomes very critical in many different application scenarios, e.g., recommender systems, while it also raises huge challenges due to the high flexible nature of anomaly and lack of sufficient labelled data. It is better to learn the anomaly patterns by considering all possible features including the structural, content and temporal features, rather than utilizing heuristic rules over the partial features. In this paper, we propose AddGraph, a general end-to-end anomalous edge detection framework using an extended temporal GCN (Graph Convolutional Network) with an attention model, which can capture both long-term patterns and the short-term patterns in dynamic graphs. In order to cope with insufficient explicit labelled data, we employ the negative sampling and margin loss in training of AddGraph in a semi-supervised fashion. We conduct extensive experiments on real-world datasets, and illustrate that AddGraph can outperform the state-of-the-art competitors in anomaly detection significantly.

• #5985
Patent Citation Dynamics Modeling via Multi-Attention Recurrent Networks
Taoran Ji, Zhiqian Chen, Nathan Self, Kaiqun Fu, Chang-Tien Lu, Naren Ramakrishnan
Time-series;Data Streams 2

Modeling and forecasting forward citations to a patent is a central task for the discovery of emerging technologies and for measuring the pulse of inventive progress. Conventional methods for forecasting these forward citations cast the problem as analysis of temporal point processes which rely on the conditional intensity of previously received citations. Recent approaches model the conditional intensity as a chain of recurrent neural networks to capture memory dependency in hopes of reducing the restrictions of the parametric form of the intensity function. For the problem of patent citations, we observe that forecasting a patent's chain of citations benefits from not only the patent's history itself but also from the historical citations of assignees and inventors associated with that patent. In this paper, we propose a sequence-to-sequence model which employs an attention-of-attention mechanism to capture the dependencies of these multiple time sequences. Furthermore, the proposed model is able to forecast both the timestamp and the category of a patent's next citation. Extensive experiments on a large patent citation dataset collected from USPTO demonstrate that the proposed model outperforms state-of-the-art models at forward citation forecasting.

• #4437
Monitoring of a Dynamic System Based on Autoencoders
Aomar Osmani, Massinissa Hamidi, Salah Bouhouche
Time-series;Data Streams 2

Monitoring industrial infrastructures are undergoing a critical transformation with industry 4.0.  Monitoring solutions must follow the system behavior in real time and must adapt to its continuous change. We propose in this paper an autoencoder model-based approach for tracking abnormalities in industrial application. A set of sensors collects data from turbo-compressors and an original two-level machine learning LSTM autoencoder architecture defines a continuous nominal vibration model. Normalized thresholds (ISO 20816) between the model and the system generates a possible abnormal situation to diagnose. Experimental results, including hyper-parameter optimization on large real data and domain expert analysis, show that our proposed solution gives promising results.

### Wednesday 1415:00 - 16:00KRR|BC - Belief Change (R8)

Chair: TBD
• #1357
Observations on Darwiche and Pearl's Approach for Iterated Belief Revision
Theofanis Aravanis, Pavlos Peppas, Mary-Anne Williams
Belief Change

Notwithstanding the extensive work on iterated belief revision, there is, still, no fully satisfactory solution within the classical AGM paradigm. The seminal work of Darwiche and Pearl (DP approach, for short) remains the most dominant, despite its well-documented shortcomings. In this article, we make further observations on the DP approach. Firstly, we prove that the DP postulates are, in a strong sense, inconsistent with Parikh's relevance-sensitive axiom (P), extending previous initial conflicts. Immediate consequences of this result are that an entire class of intuitive revision operators, which includes Dalal's operator, violates the DP postulates, as well as that the Independence postulate and Spohn's conditionalization are inconsistent with (P). Lastly, we show that the DP postulates allow for more revision polices than the ones that can be captured by identifying belief states with total preorders over possible worlds, a fact implying that a preference ordering (over possible worlds) is an insufficient representation for a belief state.

• #4693
Belief Revision Operators with Varying Attitudes Towards Initial Beliefs
Adrian Haret, Stefan Woltran
Belief Change

Classical axiomatizations of belief revision include a postulate stating that if new information is consistent with initial beliefs, then revision amounts to simply adding the new information to the original knowledge base. This postulate assumes a conservative attitude towards initial beliefs, in the sense that an agent faced with the need to revise them will seek to preserve initial beliefs as much as possible. In this work we look at operators that can assume different attitudes towards original beliefs. We provide axiomatizations of these operators by varying the aforementioned postulate and obtain representation results that characterize the new types of operators using preorders on possible worlds. We also present concrete examples for each new type of operator, adapting notions from decision theory.

• #5451
Belief Update without Compactness in Non-finitary Languages
Jandson S Ribeiro, Abhaya Nayak, Renata Wassermann
Belief Change

The main paradigms of belief change require the background logic to be Tarskian and finitary. We look at belief update when the underlying logic is not necessarily finitary. We show that in this case the classical construction for KM update does not capture all the rationality postulates for KM belief update. Indeed, this construction, being fully characterised by a subset of the KM update postulates, is weaker. We explore the reason behind this, and subsequently provide an alternative constructive accounts of belief update which is characterised by the full set of KM postulates in this more general framework.

• #10980
(Journal track) Shielded Base Contraction
Marco Garapa, Eduardo Fermé, Maurício D. L. Reis
Belief Change

In this paper we study a kind of non-prioritized contraction operator on belief bases -known as shielded base contractions. We propose twenty different classes of shielded base contractions and obtain axiomatic characterizations for each one of them. Additionally we thoroughly investigate the interrelations (in the sense of inclusion) among all those classes.

### Wednesday 1415:00 - 16:00NLP|IE - Information Extraction 2 (R9)

Chair: TBD
• #3104
Beyond Word Attention: Using Segment Attention in Neural Relation Extraction
Bowen Yu, Zhenyu Zhang, Tingwen Liu, Bin Wang, Sujian Li, Quangang Li
Information Extraction 2

Relation extraction studies the issue of predicting semantic relations between pairs of entities in sentences. Attention mechanisms are often used in this task to alleviate the inner-sentence noise by performing soft selections of words independently. Based on the observation that information pertinent to relations is usually contained within segments (continuous words in a sentence), it is possible to make use of this phenomenon for better extraction. In this paper, we aim to incorporate such segment information into neural relation extractor. Our approach views the attention mechanism as linear-chain conditional random fields over a set of latent variables whose edges encode the desired structure, and regards attention weight as the marginal distribution of each word being selected as a part of the relational expression. Experimental results show that our method can attend to continuous relational expressions without explicit annotations, and achieve the state-of-the-art performance on the large-scale TACRED dataset.

• #3126
Relation Extraction Using Supervision from Topic Knowledge of Relation Labels
Haiyun Jiang, Li Cui, Zhe Xu, Deqing Yang, Jindong Chen, Chenguang Li, Jingping Liu, Jiaqing Liang, Chao Wang, Yanghua Xiao, Wei Wang
Information Extraction 2

Explicitly exploring the semantics of a relation is significant for high-accuracy relation extraction, which is, however, not fully studied in previous works. In this paper, we mine the topic knowledge of a relation to explicitly represent the semantics of this relation, and model relation extraction as a matching problem. That is, the matching score between a sentence and a candidate relation is predicted for an entity pair. To this end, we propose a deep matching network to precisely model the semantic similarity between a sentence-relation pair. Besides, the topic knowledge also allows us to derive the importance information of samples as well as two knowledge-guided negative sampling strategies in the training process. We conduct extensive experiments to evaluate the proposed framework and observe improvements in AUC of 11.5% and max F1 of 5.4% over the baselines with state-of-the-art performance.

• #3304
Extracting Entities and Events as a Single Task Using a Transition-Based Neural Model
Junchi Zhang, Yanxia Qin, Yue Zhang, Mengchi Liu, Donghong Ji
Information Extraction 2

The task of event extraction contains subtasks including detections for entity mentions, event triggers and argument roles. Traditional methods solve them as a pipeline, which does not make use of task correlation for their mutual benefits. There have been recent efforts towards building a joint model for all tasks. However, due to technical challenges, there has not been work predicting the joint output structure as a single task. We build a first model to this end using a neural transition-based framework, incrementally predicting complex joint structures in a state-transition process. Results on standard benchmarks show the benefits of the joint model, which gives the best result in the literature.

• #5076
Early Discovery of Emerging Entities in Microblogs
Satoshi Akasaki, Naoki Yoshinaga, Masashi Toyoda
Information Extraction 2

Keeping up to date on emerging entities that appear every day is indispensable for various applications, such as social-trend analysis and marketing research. Previous studies have attempted to detect unseen entities that are not registered in a particular knowledge base as emerging entities and consequently find non-emerging entities since the absence of entities in knowledge bases does not guarantee their emergence. We therefore introduce a novel task of discovering truly emerging entities when they have just been introduced to the public through microblogs and propose an effective method based on time-sensitive distant supervision, which exploits distinctive early-stage contexts of emerging entities. Experimental results with a large-scale Twitter archive show that the proposed method achieves 83.2% precision of the top 500 discovered emerging entities, which outperforms baselines based on unseen entity recognition with burst detection. Besides notable emerging entities, our method can discover massive long-tail and homographic emerging entities. An evaluation of relative recall shows that the method detects 80.4% emerging entities newly registered in Wikipedia; 92.8% of them are discovered earlier than their registration in Wikipedia, and the average lead-time is more than one year (578 days).

### Wednesday 1415:00 - 16:00CV|DDCV - 2D and 3D Computer Vision 1 (R10)

Chair: TBD
• #4594
Mutually Reinforced Spatio-Temporal Convolutional Tube for Human Action Recognition
Haoze Wu, Jiawei Liu, Zheng-Jun Zha, Zhenzhong Chen, Xiaoyan Sun
2D and 3D Computer Vision 1

Recent works use 3D convolutional neural networks to explore spatio-temporal information for human action recognition. However, they either ignore the correlation between spatial and temporal features or suffer from high computational cost by spatio-temporal features extraction. In this work, we propose a novel and efficient Mutually Reinforced Spatio-Temporal Convolutional Tube (MRST) for human action recognition. It decomposes 3D inputs into spatial and temporal representations, mutually enhances both of them by exploiting the interaction of spatial and temporal information and selectively emphasizes informative spatial appearance and temporal motion, meanwhile reducing the complexity of structure. Moreover, we design three types of MRSTs according to the different order of spatial and temporal information enhancement, each of which contains a spatio-temporal decomposition unit, a mutually reinforced unit and a spatio-temporal fusion unit. An end-to-end deep network, MRST-Net, is also proposed based on the MRSTs to better explore spatio-temporal information in human actions. Extensive experiments show MRST-Net yields the best performance, compared to state-of-the-art approaches.

• #5743
Structure-Aware Residual Pyramid Network for Monocular Depth Estimation
Xiaotian Chen, Xuejin Chen, Zheng-Jun Zha
2D and 3D Computer Vision 1

Monocular depth estimation is an essential task for scene understanding. The underlying structure of objects and stuff in a complex scene is critical to recovering accurate and visually-pleasing depth maps. Global structure conveys scene layouts, while local structure reflects shape details. Recently developed approaches based on convolutional neural networks (CNNs) significantly improve the performance of depth estimation. However, few of them take into account multi-scale structures in complex scenes. In this paper, we propose a Structure-Aware Residual Pyramid Network (SARPN) to exploit multi-scale structures for accurate depth prediction. We propose a Residual Pyramid Decoder (RPD) which expresses global scene structure in upper levels to represent layouts, and local structure in lower levels to present shape details. At each level, we propose Residual Refinement Modules (RRM) that predict residual maps to progressively add finer structures on the coarser structure predicted at the upper level. In order to fully exploit multi-scale image features, an Adaptive Dense Feature Fusion (ADFF) module, which adaptively fuses effective features from all scales for inferring structures of each scale, is introduced. Experiment results on the challenging NYU-Depth v2 dataset demonstrate that our proposed approach achieves state-of-the-art performance in both qualitative and quantitative evaluation. The code is available at https://github.com/Xt-Chen/SARPN.

• #6580
Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity
Liang Liu, Guangyao Zhai, Wenlong Ye, Yong Liu
2D and 3D Computer Vision 1

Scene flow estimation in the dynamic scene remains a challenging task. Computing scene flow by a combination of 2D optical flow and depth has shown to be considerably faster with acceptable performance. In this work, we present a unified framework for joint unsupervised learning of stereo depth and optical flow with explicit local rigidity to estimate scene flow. We estimate camera motion directly by a Perspective-n-Point method from the optical flow and depth predictions, with RANSAC outlier rejection scheme. In order to disambiguate the object motion and the camera motion in the scene, we distinguish the rigid region by the re-project error and the photometric similarity. By joint learning with the local rigidity, both depth and optical networks can be refined. This framework boosts all four tasks: depth, optical flow, camera motion estimation, and object motion segmentation. Through the evaluation on the KITTI benchmark, we show that the proposed framework achieves state-of-the-art results amongst unsupervised methods. Our models and code are available at https://github.com/lliuz/unrigidflow.

• #301
Unsupervised Learning of Monocular Depth and Ego-Motion using Conditional PatchGANs
Madhu Vankadari, Swagat Kumar, Anima Majumder, Kaushik Das
2D and 3D Computer Vision 1

This paper presents a new GAN-based deep learning framework for estimating absolute scale awaredepth and ego motion from monocular images using a completely unsupervised mode of learning.The proposed architecture uses two separate generators to learn the distribution of depth and posedata for a given input image sequence. The depth and pose data, thus generated, are then evaluated bya patch-based discriminator using the reconstructed image and its corresponding actual image. Thepatch-based GAN (or PatchGAN) is shown to detect high frequency local structural defects in thereconstructed image, thereby improving the accuracy of overall depth and pose estimation. Unlikeconventional GANs, the proposed architecture uses a conditioned version of input and output of thegenerator for training the whole network. The resulting framework is shown to outperform all existing deep networks in this field and beating the current state-of-the-art method by 8.7% in absoluteerror and 5.2% in RMSE metric. To the best of our knowledge, this is first deep network based modelto estimate both depth and pose simultaneously using a conditional patch-based GAN paradigm.The efficacy of the proposed approach is demonstrated through rigorous ablation studies and exhaustive performance comparison on the popular KITTI outdoor driving dataset.

### Wednesday 1415:00 - 16:00HSGP|CSO - Combinatorial Search and Optimisation 1 (R11)

Chair: TBD
• #2056
Graph Mining Meets Crowdsourcing: Extracting Experts for Answer Aggregation
Yasushi Kawase, Yuko Kuroki, Atsushi Miyauchi
Combinatorial Search and Optimisation 1

Aggregating responses from crowd workers is a fundamental task in the process of crowdsourcing. In cases where a few experts are overwhelmed by a large number of non-experts, most answer aggregation algorithms such as the majority voting fail to identify the correct answers. Therefore, it is crucial to extract reliable experts from the crowd workers. In this study, we introduce the notion of "expert core", which is a set of workers that is very unlikely to contain a non-expert. We design a graph-mining-based efficient algorithm that exactly computes the expert core. To answer the aggregation task, we propose two types of algorithms. The first one incorporates the expert core into existing answer aggregation algorithms such as the majority voting, whereas the second one utilizes information provided by the expert core extraction algorithm pertaining to the reliability of workers. We then give a theoretical justification for the first type of algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our proposed answer aggregation algorithms outperform state-of-the-art algorithms.

• #1017
Predict+Optimise with Ranking Objectives: Exhaustively Learning Linear Functions
Emir Demirovic, Peter J. Stuckey, James Bailey, Jeffrey Chan, Christopher Leckie, Kotagiri Ramamohanarao, Tias Guns
Combinatorial Search and Optimisation 1

We study the predict+optimise problem, where machine learning and combinatorial optimisation must interact to achieve a common goal. These problems are important when optimisation needs to be performed on input parameters that are not fully observed but must instead be estimated using machine learning. Our contributions are two-fold: 1) we provide theoretical insight into the properties and computational complexity of predict+optimise problems in general, and 2) develop a novel framework that, in contrast to related work, guarantees to compute the optimal parameters for a linear learning function given any ranking optimisation problem. We illustrate the applicability of our framework for the particular case of the unit-weighted knapsack predict+optimise problem and evaluate on benchmarks from the literature.

• #4709
Multiple Policy Value Monte Carlo Tree Search
Li-Cheng Lan, Wei Li, Ting-Han Wei, I-Chen Wu
Combinatorial Search and Optimisation 1

Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-play phase of AlphaZero (AZ) training, a balance needs to be reached between accurate state estimation and more MCTS simulations, both of which are critical for a strong game playing agent. Typically, larger DNNs are better at generalization and accurate evaluation, while smaller DNNs are less costly, and therefore can lead to more MCTS simulations and bigger search trees with the same budget. This paper introduces a new method called the multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain advantages of each network, where two PV-NNs f_S and f_L are used in this paper. We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms single PV-NN with policy value MCTS, called PV-MCTS. Additionally, MPV-MCTS also outperforms PV-MCTS for AZ training.

• #5485
Model-Based Diagnosis with Multiple Observations
Alexey Ignatiev, Antonio Morgado, Georg Weissenbacher, Joao Marques-Silva
Combinatorial Search and Optimisation 1

Existing automated testing frameworks require multiple observations to be jointly diagnosed with the purpose of identifying common fault locations. This is the case for example with continuous integration tools. This paper shows that existing solutions fail to compute the set of minimal diagnoses, and as a result run times can increase by orders of magnitude. The paper proposes not only solutions to correct existing algorithms, but also conditions for improving their run times. Nevertheless, the diagnosis of multiple observations raises a number of important computational challenges, which even the corrected algorithms are often unable to cope with. As a result, the paper devises a novel algorithm for diagnosing multiple observations, which is shown to enable significant performance improvements in practice.

### Wednesday 1415:00 - 16:00ML|DM - Data Mining 6 (R12)

Chair: TBD
• #1579
Deep Metric Learning: The Generalization Analysis and an Adaptive Algorithm
Mengdi Huai, Hongfei Xue, Chenglin Miao, Liuyi Yao, Lu Su, Changyou Chen, Aidong Zhang
Data Mining 6

As an effective way to learn a distance metric between pairs of samples, deep metric learning (DML) has drawn significant attention in recent years. The key idea of DML is to learn a set of hierarchical nonlinear mappings using deep neural networks, and then project the data samples into a new feature space for comparing or matching. Although DML has achieved practical success in many applications, there is no existing work that theoretically analyzes the generalization error bound for DML, which can measure how good a learned DML model is able to perform on unseen data. In this paper, we try to fill up this research gap and derive the generalization error bound for DML. Additionally, based on the derived generalization bound, we propose a novel DML method (called ADroDML), which can adaptively learn the retention rates for the DML models with dropout in a theoretically justified way. Compared with existing DML works that require predefined retention rates, ADroDML can learn the retention rates in an optimal way and achieve better performance. We also conduct experiments on real-world datasets to verify the findings derived from the generalization error bound and demonstrate the effectiveness of the proposed adaptive DML method.

• #6322
RobustTrend: A Huber Loss with a Combined First and Second Order Difference Regularization for Time Series Trend Filtering
Qingsong Wen, Jingkun Gao, Xiaomin Song, Liang Sun, Jian Tan
Data Mining 6

Extracting the underlying trend signal is a crucial step to facilitate time series analysis like forecasting and anomaly detection. Besides noise signal, time series can contain not only outliers but also abrupt trend changes in real-world scenarios. To deal with these challenges, we propose a robust trend filtering algorithm based on robust statistics and sparse learning. Specifically, we adopt the Huber loss to suppress outliers, and utilize a combination of the first order and second order difference on the trend component as regularization to capture both slow and abrupt trend changes. Furthermore, an efficient method is designed to solve the proposed robust trend filtering based on majorization minimization (MM) and alternative direction method of multipliers (ADMM). We compared our proposed robust trend filter with other nine state-of-the-art trend filtering algorithms on both synthetic and real-world datasets. The experiments demonstrate that our algorithm outperforms existing methods.

• #6358
Geometric Understanding for Unsupervised Subspace Learning
Shihui Ying, Lipeng Cai, Changzhou He, Yaxin Peng
Data Mining 6

In this paper, we address the unsupervised subspace learning from a geometric viewpoint. First, we formulate the subspace learning as an inverse problem on Grassmannian manifold by considering all subspaces as points on it. Then, to make the model computable, we parameterize the Grassmannian manifold by using an orbit of rotation group action on all standard subspaces, which are spanned by the orthonormal basis. Further, to improve the robustness, we introduce a low-rank regularizer which makes the dimension of subspace as low as possible. Thus, the subspace learning problem is transferred to a minimization problem with variables of rotation and dimension. Then, we adopt the alternately iterative strategy to optimize the variables, where a structure-preserving method, based on the geodesic structure of the rotation group, is designed to update the rotation. Finally, we compare the proposed approach with six state-of-the-art methods on three different kinds of real datasets. The experimental results validate that our proposed method outperforms all compared methods.

• #3755
Matching User with Item Set: Collaborative Bundle Recommendation with Attention Network
Liang Chen, Yang Liu, Xiangnan He, Lianli Gao, Zibin Zheng
Data Mining 6

Most recommendation research has been concentrated on recommending single items to users, such as the considerable work on collaborative filtering that models the interaction between a user and an item. However, in many real-world scenarios, the platform needs to show users a set of items, e.g., the marketing strategy that offers multiple items for sale as one bundle.In this work, we consider recommending a set of items to a user, i.e., the Bundle Recommendation task, which concerns the interaction modeling between a user and a set of items. We contribute a neural network solution named DAM, short for Deep Attentive Multi-Task model, which is featured with two special designs: 1) We design a factorized attention network to aggregate the item embeddings in a bundle to obtain the bundle's representation; 2) We jointly model user-bundle interactions and user-item interactions in a multi-task manner to alleviate the scarcity of user-bundle interactions. Extensive experiments on a real-world dataset show that DAM outperforms the state-of-the-art solution, verifying the effectiveness of our attention design and multi-task learning in DAM.

### Wednesday 1415:00 - 16:00ML|SSL - Semi-Supervised Learning 1 (R13)

Chair: TBD
• #1457
Quadruply Stochastic Gradients for Large Scale Nonlinear Semi-Supervised AUC Optimization
Wanli Shi, Bin Gu, Xiang Li, Xiang Geng, Heng Huang
Semi-Supervised Learning 1

Semi-supervised learning is pervasive in real-world applications, where only a few labeled data are available and large amounts of instances remain unlabeled. Since AUC is an important model evaluation metric in classification, directly optimizing AUC in semi-supervised learning scenario has drawn much attention in the machine learning community. Recently, it has been shown that one could find an unbiased solution for the semi-supervised AUC maximization problem without knowing the class prior distribution. However, this method is hardly scalable for nonlinear classification problems with kernels. To address this problem, in this paper, we propose a novel scalable quadruply stochastic gradient algorithm (QSG-S2AUC) for nonlinear semi-supervised AUC optimization. In each iteration of the stochastic optimization process, our method randomly samples a positive instance, a negative instance, an unlabeled instance and their random features to compute the gradient and then update the model by using this quadruply stochastic gradient to approach the optimal solution. More importantly, we prove that QSG-S2AUC can converge to the optimal solution in O(1/t), where t is the iteration number. Extensive experimental results on  a variety of benchmark datasets show that QSG-S2AUC is far more efficient than the existing state-of-the-art algorithms for semi-supervised AUC maximization, while retaining the similar generalization performance.

• #3403
Interpolation Consistency Training for Semi-supervised Learning
Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio, David Lopez-Paz
Semi-Supervised Learning 1

We introduce Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark dataset.

• #5405
Learning Robust Distance Metric with Side Information via Ratio Minimization of Orthogonally Constrained L21-Norm Distances
Kai Liu, Lodewijk Brand, Hua Wang, Feiping Nie
Semi-Supervised Learning 1

Traditional distance metric learning with side information usually formulates the objectives using covariance matrices of the data point pairs in the two constraint sets of must-links and cannot-links. Because the covariance matrix computes the sum of the squared $\ell_{2}$-norm distances, it is sensitive to data noise or outliers. To develop a robust distance metric learning method, we propose a new objective for distance metric learning using $\ell_{2,1}$-norm distances. The resulting objective is challenging to solve, because it simultaneously minimizes and maximizes a number of non-smooth terms. As an important theoretical contribution of this paper, we systematically derive an efficient iterative algorithm to solve the general minmax objectives that use $\ell_{2,1}$-norms. We performed extensive empirical evaluations, where our new distance metric learning method outperforms related state-of-the-art methods in a variety of experimental settings.

• #1319
Robust Learning from Noisy Side-information by Semidefinite Programming
En-Liang Hu, Quanming Yao
Semi-Supervised Learning 1

Robustness recently becomes one of the major concerns among machine learning community, since learning algorithms are usually vulnerable to outliers or corruptions. Motivated by such a trend and needs, we pursue robustness in semi-definite programming (SDP) in this paper. Specifically, this is done by replacing the commonly used squared loss with the more robust L1-loss in the low-rank SDP. However, the resulting objective becomes neither convex nor smooth. As no existing algorithms can be applied, we design an efficient algorithm, based on majorization-minimization, to optimize the objective. The proposed algorithm not only has cheap iterations and low space complexity but also theoretically converges to some critical points. Finally, empirical study shows that the new objective armed with proposed algorithm outperforms state-of-the-art in terms of both speed and accuracy.

### Wednesday 1415:00 - 16:00AMS|ABSE - Agent-Based Simulation and Emergence (R14)

Chair: TBD
• #2874
Swarm Engineering Through Quantitative Measurement of Swarm Robotic Principles in a 10,000 Robot Swarm
John Harwell, Maria Gini
Agent-Based Simulation and Emergence

When designing swarm-robotic systems, system- atic comparison of algorithms from different do- mains is necessary to determine which is capa- ble of scaling up to handle the target problem size and target operating conditions. We propose a set of quantitative metrics for scalability, flexibility, and emergence which are capable of addressing these needs during the system design process. We demonstrate the applicability of our proposed met- rics as a design tool by solving a large object gath- ering problem in temporally varying operating con- ditions using iterative hypothesis evaluation. We provide experimental results obtained in simulation for swarms of over 10,000 robots.

• #5073
Cap-and-Trade Emissions Regulation: A Strategic Analysis
Frank Cheng, Yagil Engel, Michael Wellman
Agent-Based Simulation and Emergence

Cap-and-trade schemes are designed to achieve target levels of regulated emissions in a socially efficient manner. These schemes work by issuing regulatory credits and allowing firms to buy and sell them according to their relative compliance costs. Analyzing the efficacy of such schemes in concentrated industries is complicated by the strategic interactions among firms producing heterogeneous products. We tackle this complexity via an agent-based microeconomic model of the US market for personal vehicles. We calculate Nash equilibria among credits-trading strategies in a variety of scenarios and regulatory models. We find that while cap-and-trade results improves efficiency overall, consumers bear a disproportionate share of regulation cost, as firms use credit trading to segment the vehicle market. Credits trading volume decreases when firms behave more strategically, which weakens the segmentation effect.

• #422
The Price of Governance: A Middle Ground Solution to Coordination in Organizational Control
Chao Yu, Guozhen Tan
Agent-Based Simulation and Emergence

Achieving coordination is crucial in organizational control.  This paper investigates a middle ground solution between decentralized interactions and centralized administrations for coordinating agents beyond inefficient behavior. We first propose the price of governance (PoG) to evaluate how such a middle ground solution performs in terms of effectiveness and cost. We then propose a hierarchical supervision framework to explicitly model the PoG, and define step by step how to realize the core principle of the framework and compute the optimal PoG for a control problem. Two illustrative case studies are carried out to exemplify the applications of the proposed framework and its methodology. Results show that by properly formulating and implementing each step, the hierarchical supervision framework is capable of promoting coordination among agents while bounding administrative cost to a minimum in different kinds of organizational control problems.

• #5627
Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning
Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, Danny Lange
Agent-Based Simulation and Emergence

The rapid pace of recent research in AI has been driven in part by the presence of fast and challenging simulation environments. These environments often take the form of games; with tasks ranging from simple board games, to competitive video games. We propose a new benchmark - Obstacle Tower: a high fidelity, 3D, 3rd person, procedurally generated environment. An agent in Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent's ability to perform well on unseen instances of the environment. In this paper we outline the environment and provide a set of baseline results produced by current state-of-the-art Deep RL methods as well as human players. These algorithms fail to produce agents capable of performing near human level.

### Wednesday 1415:00 - 16:00HAI|HAIC - Human-AI Collaboration (R15)

Chair: TBD
• #4728
Exploring Computational User Models for Agent Policy Summarization
Isaac Lage, Daphna Lifschitz, Finale Doshi-Velez, Ofra Amir
Human-AI Collaboration

AI agents are being developed to support high stakes decision-making processes from driving cars to prescribing drugs, making it increasingly important for human users to understand their behavior. Policy summarization methods aim to convey strengths and weaknesses of such agents by demonstrating their behavior in a subset of informative states. Some policy summarization methods extract a summary that optimizes the ability to reconstruct the agent's policy under the assumption that users will deploy inverse reinforcement learning. In this paper, we explore the use of different models for extracting summaries. We introduce an imitation learning-based approach to policy summarization; we demonstrate through computational simulations that a mismatch between the model used to extract a summary and the model used to reconstruct the policy results in worse reconstruction quality; and we demonstrate through a human-subject study that people use different models to reconstruct policies in different contexts, and that matching the summary extraction model to these can improve performance. Together, our results suggest that it is important to carefully consider user models in policy summarization.

• #5120
Why Can’t You Do That HAL? Explaining Unsolvability of Planning Tasks
Sarath Sreedharan, Siddharth Srivastava, David Smith, Subbarao Kambhampati
Human-AI Collaboration

Explainable planning is widely accepted as a prerequisite for autonomous agents to successfully work with humans. While there has been a lot of research on generating explanations of solutions to planning problems, explaining the absence of solutions remains an open and under-studied problem, even though such situations can be the hardest to understand or debug. In this paper, we show that hierarchical abstractions can be used to efficiently generate reasons for unsolvability of planning problems. In contrast to related work on computing certificates of unsolvability, we show that these methods can generate compact, human-understandable reasons for unsolvability. Empirical analysis and user studies show the validity of our methods  as well as their computational efficacy on a number of benchmark planning domains.

• #5910
Balancing Explicability and Explanations in Human-Aware Planning
Tathagata Chakraborti, Sarath Sreedharan, Subbarao Kambhampati
Human-AI Collaboration

Human-aware planning involves generating plans that are explicable as well as providing explanations when such plans cannot be found. In this paper, we bring these two concepts together and show how an agent can achieve a trade-off between these two competing characteristics of a plan. In order to achieve this, we conceive a first of its kind planner MEGA that can augment the possibility of explaining a plan in the plan generation process itself. We situate our discussion in the context of recent work on explicable planning and explanation generation and illustrate these concepts in two well-known planning domains, as well as in a demonstration of a robot in a typical search and reconnaissance task. Human factor studies in the latter highlight the usefulness of the proposed approach.

• #5305
Model-Free Model Reconciliation
Sarath Sreedharan, Alberto Hernandez, Aditya Mishra, Subbarao Kambhampati
Human-AI Collaboration

Designing agents capable of explaining complex sequential decisions remains a significant open problem in human-AI interaction. Recently, there has been a lot of interest in developing approaches for generating such explanations for various decision-making paradigms. One such approach has been the idea of explanation as model-reconciliation. The framework hypothesizes that one of the common reasons for a user's confusion could be the mismatch between the user's model of the agent's task model and the model used by the agent to generate the decisions. While this is a general framework, most works that have been explicitly built on this explanatory philosophy have focused on classical planning settings where the model of user's knowledge is available in a declarative form. Our goal in this paper is to adapt the model reconciliation approach to a more general planning paradigm and discuss how such methods could be used when user models are no longer explicitly available. Specifically, we present a simple and easy to learn labeling model that can help an explainer decide what information could help achieve model reconciliation between the user and the agent with in the context of planning with MDPs.

### Wednesday 1416:30 - 17:45AI-HWB - ST: AI for Improving Human Well-Being 4 (R2)

Chair: TBD
• #5915
Controllable Neural Story Plot Generation via Reward Shaping
Pradyumna Tambwekar, Murtaza Dhuliawala, Lara J. Martin, Animesh Mehta, Brent Harrison, Mark O. Riedl
ST: AI for Improving Human Well-Being 4

Language-modeling--based approaches to story plot generation attempt to construct a plot by sampling from a language model (LM) to predict the next character, word, or sentence to add to the story. LM techniques lack the ability to receive guidance from the user to achieve a specific goal, resulting in stories that don't have a clear sense of progression and lack coherence. We present a reward-shaping technique that analyzes a story corpus and produces intermediate rewards that are backpropagated into a pre-trained LM in order to guide the model toward a given goal. Automated evaluations show our technique can create a model that generates story plots which consistently achieve a specified goal. Human-subject studies show that the generated stories have more plausible event ordering than baseline plot generation techniques.

• #5423
Governance by Glass-Box: Implementing Transparent Moral Bounds for AI Behaviour
Andrea Aler Tubella, Andreas Theodorou, Frank Dignum, Virginia Dignum
ST: AI for Improving Human Well-Being 4

Artificial Intelligence (AI) applications are being used to predict and assess behaviour in multiple domains which directly affect human well-being. However, if AI is to improve people’s lives, then people must be able to trust it, by being able to understand what the system is doing and why. Although transparency is often seen as the requirement in this case, realistically it might not always be possible, whereas the need to ensure that the system operates within set moral bounds remains. In this paper, we present an approach to evaluate the moral bounds of an AI system based on the monitoring of its inputs and outputs. We place a ‘Glass-Box’ around the system by mapping moral values into explicit verifiable norms that constrain inputs and outputs, in such a way that if these remain within the box we can guarantee that the system adheres to the value. The focus on inputs and outputs allows for the verification and comparison of vastly different intelligent systems; from deep neural networks to agent-based systems. The explicit transformation of abstract moral values into concrete norms brings great benefits in terms of explainability; stakeholders know exactly how the system is interpreting and employing relevant abstract moral human values and calibrate their trust accordingly. Moreover, by operating at a higher level we can check the compliance of the system with different interpretations of the same value.

• #4679
AI-powered Posture Training: Application of Machine Learning in Sitting Posture Recognition Using the LifeChair Smart Cushion
Katia Bourahmoune, Toshiyuki Amagasa
ST: AI for Improving Human Well-Being 4

Humans spend on average more than half of their day sitting down. The ill-effects of poor sitting posture and prolonged sitting on physical and mental health have been extensively studied, and solutions for curbing this sedentary epidemic have received special attention in recent years. With the recent advances in sensing technologies and Artificial Intelligence (AI), sitting posture monitoring and correction is one of the key problems to address for enhancing human well-being using AI. We present the application of a sitting posture training smart cushion called LifeChair that combines a novel pressure sensing technology, a smartphone app interface and machine learning (ML) for real-time sitting posture recognition and seated stretching guidance. We present our experimental design for sitting posture and stretch pose data collection using our posture training system. We achieved an accuracy of 98.93% in detecting more than 13 different sitting postures using a fast and robust supervised learning algorithm. We also establish the importance of taking into account the divergence in user body mass index in posture monitoring. Additionally, we present the first ML-based human stretch pose recognition system for pressure sensor data and show its performance in classifying six common chair-bound stretches.

• #4282
Improving Customer Satisfaction in Bike Sharing Systems through Dynamic Repositioning
Supriyo Ghosh, Jing Yu Koh, Patrick Jaillet
ST: AI for Improving Human Well-Being 4

In bike sharing systems (BSSs), the uncoordinated movements of customers using bikes lead to empty or congested stations, which causes a significant loss in customer demand. In order to reduce the lost demand, a wide variety of existing research has employed a fixed set of historical demand patterns to design efficient bike repositioning solutions. However, the progress remains slow in understanding the underlying uncertainties in demand and designing proactive robust bike repositioning solutions. To bridge this gap, we propose a dynamic bike repositioning approach based on a probabilistic satisficing method which uses the uncertain demand parameters that are learnt from historical data. We develop a novel and computationally efficient mixed integer linear program for maximizing the probability of satisfying the uncertain demand so as to improve the overall customer satisfaction and efficiency of the system. Extensive experimental results from a simulation model built on a real-world bike sharing data set demonstrate that our approach is not only robust to uncertainties in customer demand, but also outperforms the existing state-of-the-art repositioning approaches in terms of reducing the expected lost demand.

• #3024
Evaluating the Interpretability of the Knowledge Compilation Map: Communicating Logical Statements Effectively
Serena Booth, Christian Muise, Julie Shah
ST: AI for Improving Human Well-Being 4

Knowledge compilation techniques translate propositional theories into equivalent forms to increase their computational tractability. But, how should we best present these propositional theories to a human? We analyze the standard taxonomy of propositional theories for relative interpretability across three model domains: highway driving, emergency triage, and the chopsticks game. We generate decision-making agents which produce logical explanations for their actions and apply knowledge compilation to these explanations. Then, we evaluate how quickly, accurately, and confidently users comprehend the generated explanations. We find that domain, formula size, and negated logical connectives significantly affect comprehension while formula properties typically associated with interpretability are not strong predictors of human ability to comprehend the theory.

### Wednesday 1416:30 - 17:45AMS|FVVS - Formal Verification, Validation and Synthesis (R5)

Chair: TBD
• #758
Probabilistic Strategy Logic
Benjamin Aminof, Marta Kwiatkowska, Bastien Maubert, Aniello Murano, Sasha Rubin
Formal Verification, Validation and Synthesis

We introduce Probabilistic Strategy Logic, an extension of Strategy Logic for stochastic systems. The logic has probabilistic terms that allow it to express many standard solution concepts, such as Nash equilibria in randomised strategies, as well as constraints on probabilities, such as independence. We study the model-checking problem for agents with perfect- and imperfect-recall. The former is undecidable, while the latter is decidable in space exponential in the system and triple-exponential in the formula. We identify a natural fragment of the logic, in which every temporal operator is immediately preceded by a probabilistic operator, and show that it is decidable in space exponential in the system and the formula, and double-exponential in the nesting depth of the probabilistic terms. Taking a fixed nesting depth, this gives a fragment that still captures many standard solution concepts, and is decidable in exponential space.

• #2732
A Probabilistic Logic for Resource-Bounded Multi-Agent Systems
Hoang Nga Nguyen, Abdur Rakib
Formal Verification, Validation and Synthesis

Resource-bounded alternating-time temporal logic (RB-ATL), an extension of Coalition Logic (CL) and Alternating-time Temporal Logic (ATL), allows reasoning about resource requirements of coalitions in concurrent systems. However, many real-world systems are inherently probabilistic as well as resource-bounded, and there is no straightforward way of reasoning about their unpredictable behaviours. In this paper, we propose a logic for reasoning about coalitional power under resource constraints in the probabilistic setting. We extend RB-ATL with probabilistic reasoning and provide a standard algorithm for the model-checking problem of the resulting logic Probabilistic Resource-Bounded ATL (pRB-ATL).

• #4929
Reasoning about Quality and Fuzziness of Strategic Behaviours
Patricia Bouyer, Orna Kupferman, Nicolas Markey, Bastien Maubert, Aniello Murano, Giuseppe Perelli
Formal Verification, Validation and Synthesis

We introduce and study SL[F], a quantitative extension of SL (Strategy Logic), one of the most natural and expressive logics describing strategic behaviours. The satisfaction value of an SL[F] formula is a real value in [0,1], reflecting how much'' or how well'' the strategic on-going objectives of the underlying agents are satisfied. We demonstrate the applications of SL[F] in quantitative reasoning about multi-agent systems, by showing how it can express concepts of stability in multi-agent systems, and how it generalises some fuzzy temporal logics. We also provide a model-checking algorithm for ourlogic, based on a quantitative extension of Quantified CTL*.

• #2420
Decidability of Model Checking Multi-Agent Systems with Regular Expressions against Epistemic HS Specifications
Jakub Michaliszyn, Piotr Witkowski
Formal Verification, Validation and Synthesis

Epistemic Halpern-Shoham logic (EHS) is an interval temporal logic defined to verify properties of Multi-Agent Systems. In this paper we show that the model checking Multi-Agent Systems with regular expressions against the EHS specifications is decidable. We achieve this by reducing the model checking problem to the satisfiability problem of Monadic Second-Order Logic on trees.

• #6364
Demystifying the Combination of Dynamic Slicing and Spectrum-based Fault Localization
Sofia Reis, Rui Abreu, Marcelo d'Amorim
Formal Verification, Validation and Synthesis

Several approaches have been proposed to reduce debugging costs through automated software fault diagnosis. Dynamic Slicing (DS) and Spectrum-based Fault Localization (SFL) are popular fault diagnosis techniques and normally seen as complementary. This paper reports on a comprehensive study to reassess the effects of combining DS with SFL. With this combination, components that are often involved in failing but seldom in passing test runs could be located and their suspiciousness reduced. Results show that the DS-SFL combination, coined as Tandem-FL, improves the diagnostic accuracy up to 73.7% (13.4% on average). Furthermore, results indicate that the risk of missing faulty statements, which is a DS?s key limitation, is not high ? DS misses faulty statements in 9% of the 260 cases. To sum up, we found that the DS-SFL combination was practical and effective and encourage new SFL techniques to be evaluated against that optimization.

### Wednesday 1416:30 - 17:45HSGP|GPML - Game Playing and Machine Learning (R6)

Chair: TBD
• #779
DeltaDou: Expert-level Doudizhu AI through Self-play
Qiqi Jiang, Kuangzheng Li, Boyao Du, Hao Chen, Hai Fang
Game Playing and Machine Learning

Artificial Intelligence has seen several breakthroughs in two-player perfect information game.  Nevertheless, Doudizhu, a three-player imperfect information game, is still quite challenging.  In this paper, we present a Doudizhu AI by applying deep reinforcement learning from games of self-play.  The algorithm combines an asymmetric MCTS on nodes of information set of each player, a policy-value network that approximates the policy and value on each decision node, and inference on unobserved hands of other players by given policy.  Our results show that self-play can significantly improve the performance of our agent in this multi-agent imperfect information game.  Even starting with a weak AI, our agent can achieve human expert level after days of self-play and training.

• #3831
An Evolution Strategy with Progressive Episode Lengths for Playing Games
Lior Fuks, Noor Awad, Frank Hutter, Marius Lindauer
Game Playing and Machine Learning

Recently, Evolution Strategies (ES) have been successfully applied to solve problems commonly addressed by reinforcement learning (RL). Due to the simplicity of ES approaches, their runtime is often dominated by the RL-task at hand (e.g., playing a game). In this work, we introduce Progressive Episode Lengths (PEL) as a new technique and incorporate it with ES. The main objective is to allow the agent to play short and easy tasks with limited lengths, and then use the gained knowledge to further solve long and hard tasks with progressive lengths. Hence allowing the agent to perform many function evaluations and find a good solution for short time horizons before adapting the strategy to tackle larger time horizons. We evaluated PEL on a subset of Atari games from OpenAI Gym, showing that it can substantially improve the optimization speed, stability and final score of canonical ES. Specifically, we show average improvements of 80% (32%) after 2 hours (10 hours) compared to canonical ES.

• #5042
Playing Card-Based RTS Games with Deep Reinforcement Learning
Tianyu Liu, Zijie Zheng, Hongchang Li, Kaigui Bian, Lingyang Song
Game Playing and Machine Learning

Game AI is of great importance as games are simulations of reality. Recent research on game AI has shown much progress in various kinds of games, such as console games, board games and MOBA games. However, the exploration in RTS games remains a challenge for their huge state space, imperfect information, sparse rewards and various strategies. Besides, the typical card-based RTS games have complex card features and are still lacking solutions. We present a deep model SEAT (selection-attention) to play card-based RTS games. The SEAT model includes two parts, a selection part for card choice and an attention part for card usage, and it learns from scratch via deep reinforcement learning. Comprehensive experiments are performed on Clash Royale, a popular mobile card-based RTS game. Empirical results show that the SEAT model agent makes it to reach a high winning rate against rule-based agents and decision-tree-based agent.

• #3965
Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game
Peixi Peng, Junliang Xing, Lili Cao, Lisen Mu, Chang Huang
Game Playing and Machine Learning

The task of real-time combat game is to coordinate multiple units to defeat their enemies controlled by the given opponent in a real-time combat scenario. It is difficult to design a high-level Artificial Intelligence (AI) program for such a task due to its extremely large state-action space and real-time requirements. This paper formulates this task as a collective decentralized partially observable Markov decision process, and designs a Deep Decentralized Policy Network (DDPN) to model the polices. To train DDPN effectively, a novel two-stage learning algorithm is proposed which combines imitation learning from opponent and reinforcement learning by no-regret dynamics. Extensive experimental results on various combat scenarios indicate that proposed method can defeat different opponent models and significantly outperforms many state-of-the-art approaches.

• #751
Procedural Generation of Initial States of Sokoban
Dâmaris Bento, Andre Grahl Pereira, Levi Lelis
Game Playing and Machine Learning

Procedural generation of initial states of state-space search problems have applications in human and machine learning as well as in the evaluation of planning systems. In this paper we deal with the task of generating hard and solvable initial states of Sokoban puzzles. We propose hardness metrics based on pattern database heuristics and the use of novelty to improve the exploration of search methods in the task of generating initial states. We then present a system called Beta that uses our hardness metrics and novelty to generate initial states. Experiments show that Beta is able to generate initial states that are harder to solve by a specialized solver than initial states designed by human experts.

### Wednesday 1416:30 - 17:45UAI|SDM - Sequential Decision Making (R14)

Chair: TBD
• #280
Thompson Sampling on Symmetric Alpha-Stable Bandits
Abhimanyu Dubey, Alex Sandy' Pentland
Sequential Decision Making

Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards drawn from symmetric alpha-stable distributions, which are a class of heavy-tailed probability distributions utilized in finance and economics, in problems such as modeling stock prices and human behavior. We present an efficient framework for posterior inference, which leads to two algorithms for Thompson Sampling in this setting. We prove finite-time regret bounds for both algorithms, and demonstrate through a series of experiments the stronger performance of Thompson Sampling in this setting. With our results, we provide an exposition of symmetric alpha-stable distributions in sequential decision-making, and enable sequential Bayesian inference in applications from diverse fields in finance and complex systems that operate on heavy-tailed features.

• #3937
ISLF: Interest Shift and Latent Factors Combination Model for Session-based Recommendation
Jing Song, Hong Shen, Zijing Ou, Junyi Zhang, Teng Xiao, Shangsong Liang
Sequential Decision Making

Session-based recommendation is a challenging problem due to the inherent uncertainty of user behavior and the limited historical click information. Latent factors and the complex dependencies within the user’s current session have an important impact on the user's main intention, but the existing methods do not explicitly consider this point. In this paper, we propose a novel model, Interest Shift and Latent Factors Combination Model (ISLF), which can capture the user's main intention by taking into account the user’s interest shift (i.e. long-term and short-term interest) and latent factors simultaneously. In addition, we experimentally give an explicit explanation of this combination in our ISLF. Our experimental results on three benchmark datasets show that our model achieves state-of-the-art performance on all test datasets.

• #6516
Automated Negotiation with Gaussian Process-based Utility Models
Haralambie Leahu, Michael Kaisers, Tim Baarslag
Sequential Decision Making

Designing agents that can efficiently learn and integrate user's preferences into decision making processes is a key challenge in automated negotiation. While accurate knowledge of user preferences is highly desirable, eliciting the necessary information might be rather costly, since frequent user interactions may cause inconvenience. Therefore, efficient elicitation strategies (minimizing elicitation costs) for inferring relevant information are critical. We introduce a stochastic, inverse-ranking utility model compatible with the Gaussian Process preference learning framework and integrate it into a (belief) Markov Decision Process paradigm which formalizes automated negotiation processes with incomplete information. Our utility model, which naturally maps ordinal preferences (inferred from the user) into (random) utility values (with the randomness reflecting the underlying uncertainty), provides the basic quantitative modeling ingredient for automated (agent-based) negotiation.

• #759
AdaLinUCB: Opportunistic Learning for Contextual Bandits
Xueying Guo, Xiaoxiao Wang, Xin Liu
Sequential Decision Making

In this paper, we propose and study opportunistic contextual bandits - a special case of contextual bandits where the exploration cost varies under different environmental conditions, such as network load or return variation in recommendations. When the exploration cost is low, so is the actual regret of pulling a sub-optimal arm (e.g., trying a suboptimal recommendation). Therefore, intuitively, we could explore more when the exploration cost is relatively low and exploit more when the exploration cost is relatively high. Inspired by this intuition, for opportunistic contextual bandits with Linear payoffs, we propose an Adaptive Upper-Confidence-Bound algorithm (AdaLinUCB) to adaptively balance the exploration-exploitation trade-off for opportunistic learning. We prove that AdaLinUCB achieves O((log T)^2) problem-dependent regret upper bound, which has a smaller coefficient than that of the traditional LinUCB algorithm. Moreover, based on both synthetic and real-world dataset, we show that AdaLinUCB significantly outperforms other contextual bandit algorithms, under large exploration cost fluctuations.

• #1724
Exact Bernoulli Scan Statistics using Binary Decision Diagrams
Masakazu Ishihata, Takanori Maehara
Sequential Decision Making

In combinatorial statistics, we are interested in a statistical test of combinatorial correlation, i.e., existence a subset from an underlying combinatorial structure such that the observation is large on the subset. The combinatorial scan statistics has been proposed for such a statistical test; however, it is not commonly used in practice because of its high computational cost. In this study, we restrict our attention to the case that the number of data points is moderately small (e.g., 50), the outcome is binary, and the underlying combinatorial structure is represented by a zero-suppressed binary decision diagram (ZDD), and consider the problem of computing the p-value of the combinatorial scan statistics exactly. First, we prove that this problem is a #P-hard problem. Then, we propose a practical algorithm that solves the problem. Here, the algorithm constructs a binary decision diagram (BDD) for a set of realizations of the random variables by a dynamic programming on the ZDD, and computes the p-value by a dynamic programming on the BDD. We conducted experiments to evaluate the performance of the proposed algorithm using real-world datasets.

### Wednesday 1416:30 - 17:45CSAT|CS 1 - Constraint Satisfaction 1 (R15)

Chair: TBD
• #2450
Phase Transition Behavior of Cardinality and XOR Constraints
Yash Pote, Saurabh Joshi, Kuldeep S. Meel
Constraint Satisfaction 1

The runtime performance of modern SAT solvers is deeply connected to the phase transition behavior of CNF formulas. While CNF solving has witnessed significant runtime improvement over the past two decades, the same does not hold for several other classes such as the conjunction of cardinality and XOR constraints, denoted as CARD-XOR formulas. The problem of determining satisfiability of CARD-XOR formulas is a fundamental problem with wide variety of applications ranging from discrete integration in the field of artificial intelligence to maximum likelihood decoding in coding theory. The runtime behavior of random CARD-XOR formulas is unexplored in prior work. In this paper, we present the first rigorous empirical study to characterize the runtime behavior of 1-CARD-XOR formulas. We show empirical evidence of a surprising phase-transition that follows a non-linear tradeoff between CARD and XOR constraints.

• #3226
DoubleLex Revisited and Beyond
Xuming Huang, Jimmy Lee
Constraint Satisfaction 1

The paper proposes Maximum Residue (MR) as a notion to evaluate the strength of a symmetry breaking method. We give a proof to improve the best known DoubleLex MR upper bound from m!n! - (m!+n!) to min(m!,n!) for an m x n matrix model. Our result implies that DoubleLex works well on matrix models where min(m, n) is relatively small. We further study the MR bounds of SwapNext and SwapAny, which are extensions to DoubleLex breaking further a small number of composition symmetries. Such theoretical comparisons suggest general principles on selecting Lex-based symmetry breaking methods based on the dimensions of the matrix models. Our experiments confirm the theoretical predictions as well as efficiency of these methods.

• #4365
Constraint-Based Scheduling With Complex Setup Operations: An Iterative Two-Layer Approach
Adriana Pacheco, Cédric Pralet, Stéphanie Roussel
Constraint Satisfaction 1

In this paper, we consider scheduling problems involving resources that must perform complex setup operations between the tasks they realize. To deal with such problems, we introduce a simple yet efficient iterative two-layer decision process that alternates between the fast synthesis of high-level schedules based on a coarse-grain model of setup operations, and the production of detailed schedules based on a fine-grain model. Experiments realized on representative benchmarks of a multi-robot application show the efficiency of the approach.

• #5034
How to Tame Your Anticipatory Algorithm
Allegra De Filippo, Michele Lombardi, Michela Milano
Constraint Satisfaction 1

Sampling-based anticipatory algorithms can be very effective at solving online optimization problems under uncertainty, but their computational cost may be prohibitive in some cases. Given an arbitrary anticipatory algorithm, we present three methods that allow to retain its solution quality at a fraction of the online computational cost, via a substantial degree of offline preparation. Our approaches are obtained by combining: 1) a simple technique to identify likely future outcomes based on past observations; 2) the (expensive) offline computation of a "contingency table"; and 3) an efficient solution-fixing heuristic. We ground our techniques on two case studies: an energy management system with uncertain renewable generation and load demand, and a traveling salesman problem with uncertain travel times. In both cases, our techniques achieve high solution quality, while substantially reducing the online computation time.

• #4738
Constraint Programming for Mining Borders of Frequent Itemsets
Mohamed-Bachir Belaid, Christian Bessiere, Nadjib Lazaar
Constraint Satisfaction 1

Frequent itemset mining is one of the most studied tasks in knowledge discovery. It is often reduced to mining the positive border of frequent itemsets, i.e. maximal frequent itemsets. Infrequent itemset mining, on the other hand, can be reduced to mining the negative border, i.e. minimal infrequent itemsets. We propose a generic framework based on constraint programming to mine both borders of frequent itemsets.One can easily decide which border to mine by setting a simple parameter. For this, we introduce two new global constraints, FREQUENTSUBS and INFREQUENTSUPERS, with complete polynomial propagators. We then consider the problem of mining borders with additional constraints. We prove that this problem is coNP-hard, ruling out the hope for the existence of a single CSP solving this problem (unless coNP ⊆ NP).

### Wednesday 1416:30 - 18:00ML|DL - Deep Learning 5 (R3)

Chair: TBD
• #4173
One-Shot Texture Retrieval with Global Context Metric
Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao
Deep Learning 5

In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image. To address this problem, we present an OS-TR network to encoding both reference patch and query image, leading to achieve texture segmentation towards the reference category. Unlike the existing texture encoding methods that integrate CNN with orderless pooling, we propose a directionality-aware network to capture the texture variations at each direction, resulting in spatially invariant representation. To segment new categories given only few examples, we incorporate a self-gating mechanism into relation network to exploit global context information for adjusting per-channel modulation weights of local relation features. Extensive experiments on benchmark texture datasets and real scenarios demonstrate the above-par segmentation performance and robust generalization across domains of our proposed method.

• #4320
STG2Seq: Spatial-Temporal Graph to Sequence Model for Multi-step Passenger Demand Forecasting
Lei Bai, Lina Yao, Salil S. Kanhere, Xianzhi Wang, Quan Z. Sheng
Deep Learning 5

Multi-step passenger demand forecasting is a crucial task in on-demand vehicle sharing services. However, predicting passenger demand is generally challenging due to the nonlinear and dynamic spatial-temporal dependencies. In this work, we propose to model multi-step citywide passenger demand prediction based on a graph and use a hierarchical graph convolutional structure to capture both spatial and temporal correlations simultaneously. Our model consists of three parts: 1) a long-term encoder to encode historical passenger demands; 2) a short-term encoder to derive the next-step prediction for generating multi-step prediction; 3) an attention-based output module to model the dynamic temporal and channel-wise information. Experiments on three real-world datasets show that our model consistently outperforms many baseline methods and state-of-the-art models.

• #5314
Omnidirectional Scene Text Detection with Sequential-free Box Discretization
Yuliang Liu, Sheng Zhang, Lianwen Jin, Lele Xie, Yaqiang Wu, Zhepeng Wang
Deep Learning 5

Scene text in the wild is commonly presented with high variant characteristics. Using quadrilateral bounding box to localize the text instance is nearly indispensable for detection methods. However, recent researches reveal that introducing quadrilateral bounding box for scene text detection will bring a label confusion issue which is easily overlooked, and this issue may significantly undermine the detection performance. To address this issue, in this paper, we propose a novel method called Sequential-free Box Discretization (SBD) by discretizing the bounding box into key edges (KE) which can further derive more effective methods to improve detection performance. Experiments showed that the proposed method can outperform state-of-the-art methods in many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. Ablation study also showed that simply integrating the SBD into Mask R-CNN framework, the detection performance can be substantially improved. Furthermore, an experiment on the general object dataset HRSC2016 (multi-oriented ships) showed that our method can outperform recent state-of-the-art methods by a large margin, demonstrating its powerful generalization ability.

• #6146
Multi-Group Encoder-Decoder Networks to Fuse Heterogeneous Data for Next-Day Air Quality Prediction
Yawen Zhang, Qin Lv, Duanfeng Gao, Si Shen, Robert Dick, Michael Hannigan, Qi Liu
Deep Learning 5

Accurate next-day air quality prediction is essential to enable warning and prevention measures for cities and individuals to cope with potential air pollution, such as vehicle restriction, factory shutdown, and limiting outdoor activities. The problem is challenging because air quality is affected by a diverse set of complex factors. There has been prior work on short-term (e.g., next 6 hours) prediction, however, there is limited research on modeling local weather influences or fusing heterogeneous data for next-day air quality prediction. This paper tackles this problem through three key contributions: (1) we leverage multi-source data, especially high-frequency grid-based weather data, to model air pollutant dynamics at station-level; (2) we add convolution operators on grid weather data to capture the impacts of various weather parameters on air pollutant variations; and (3) we automatically group (cross-domain) features based on their correlations, and propose multi-group Encoder-Decoder networks (MGED-Net) to effectively fuse multiple feature groups for next-day air quality prediction. The experiments with real-world data demonstrate the improved prediction performance of MGED-Net over state-of-the-art solutions (4.2% to 9.6% improvement in Mean Absolute Error and 9.2% to 16.4% improvement in Root Mean Squared Error).

• #110
Multi-Prototype Networks for Unconstrained Set-based Face Recognition
Jian Zhao, Jianshu Li, Xiaoguang Tu, Fang Zhao, Yuan Xin, Junliang Xing, Hengzhu Liu, Shuicheng Yan, Jiashi Feng
Deep Learning 5

In this paper, we address the challenging unconstrained set-based face recognition problem where each subject face is instantiated by a set of media (images and videos) instead of a single image. Naively aggregating information from all the media within a set would suffer from the large intra-set variance caused by heterogeneous factors (e.g., varying media modalities, poses and illumination) and fail to learn discriminative face representations. A novel Multi-Prototype Network (MP- Net) model is thus proposed to learn multiple prototype face representations adaptively from the media sets. Each learned prototype is representative for the subject face under certain condition in terms of pose, illumination and media modality. Instead of handcrafting the set partition for prototype learn- ing, MPNet introduces a Dense SubGraph (DSG) learning sub-net that implicitly untangles inconsistent media and learns a number of representative prototypes. Qualitative and quantitative experiments clearly demonstrate the superiority of the proposed model over state-of-the-arts.

• #420
A Part Power Set Model for Scale-Free Person Retrieval
Yunhang Shen, Rongrong Ji, Xiaopeng Hong, Feng Zheng, Xiaowei Guo, Yongjian Wu, Feiyue Huang
Deep Learning 5

Recently, person re-identification (re-ID) has attracted increasing research attention, which has broad application prospects in video surveillance and beyond. To this end, most existing methods highly relied on well-aligned pedestrian images and hand-engineered part-based model on the coarsest feature map. In this paper, to lighten the restriction of such fixed and coarse input alignment, an end-to-end part power set model with multi-scale features is proposed, which captures the discriminative parts of pedestrians from global to local, and from coarse to fine, enabling part-based scale-free person re-ID. In particular, we first factorize the visual appearance by enumerating $k$-combinations for all $k$ of $n$ body parts to exploit rich global and partial information to learn discriminative feature maps. Then, a combination ranking module is introduced to guide the model training with all combinations of body parts, which alternates between ranking combinations and estimating an appearance model. To enable scale-free input, we further exploit the pyramid architecture of deep networks to construct multi-scale feature maps with a feasible amount of extra cost in term of memory and time. Extensive experiments on the mainstream evaluation datasets, including Market-1501, DukeMTMC-reID and CUHK03, validate that our method achieves the state-of-the-art performance.

• #2668
Deliberation Learning for Image-to-Image Translation
Tianyu He, Yingce Xia, Jianxin Lin, Xu Tan, Di He, Tao Qin, Zhibo Chen
Deep Learning 5

Image-to-image translation, which transfers an image from a source domain to a target one, has attracted much attention in both academia and industry. The major approach is to adopt an encoder-decoder based framework, where the encoder extracts features from the input image and then the decoder decodes the features and generates an image in the target domain as the output. In this paper, we go beyond this learning framework by considering an additional polishing step on the output image. Polishing an image is very common in human's daily life, such as editing and beautifying a photo in Photoshop after taking/generating it by a digital camera. Such a deliberation process is shown to be very helpful and important in practice and thus we believe it will also be helpful for image translation. Inspired by the success of deliberation network in natural language processing, we extend deliberation process to the field of image translation. We verify our proposed method on four two-domain translation tasks and one multi-domain translation task. Both the qualitative and quantitative results demonstrate the effectiveness of our method.

### Wednesday 1416:30 - 18:00ML|RL - Reinforcement Learning 4 (R4)

Chair: TBD
• #1900
Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces
Haotian Fu, Hongyao Tang, Jianye Hao, Zihan Lei, Yingfeng Chen, Changjie Fan
Reinforcement Learning 4

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has ever succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces which is very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform existing independent deep parameterized Q-learning method.

• #2930
Imitation Learning from Video by Leveraging Proprioception
Faraz Torabi, Garrett Warnell, Peter Stone
Reinforcement Learning 4

Classically, imitation learning algorithms have been developed for idealized situations, e.g., the demonstrations are often required to be collected in the exact same environment and usually include the demonstrator's actions. Recently, however, the research community has begun to address some of these shortcomings by offering algorithmic solutions that enable imitation learning from observation (IfO), e.g., learning to perform a task from visual demonstrations that may be in a different environment and do not include actions. Motivated by the fact that agents often also have access to their own internal states (i.e., proprioception), we propose and study an IfO algorithm that leverages this information in the policy learning process. The proposed architecture learns policies over proprioceptive state representations and compares the resulting trajectories visually to the demonstration data. We experimentally test the proposed technique on several MuJoCo domains and show that it outperforms other imitation from observation algorithms by a large margin.

• #4404
Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning
Shihong Song, Jiayi Weng, Hang Su, Dong Yan, Haosheng Zou, Jun Zhu
Reinforcement Learning 4

Learning rational behaviors in First-person-shooter (FPS) games is a challenging task for Reinforcement Learning (RL) with the primary difficulties of huge action space and insufficient exploration. To address this, we propose a hierarchical agent based on combined options with intrinsic rewards to drive exploration. Specifically, we present a hierarchical model that works in a manager-worker fashion over two levels of hierarchy. The high-level manager learns a policy over options, and the low-level workers, motivated by intrinsic reward, learn to execute the options. Performance is further improved with environmental signals appropriately harnessed. Extensive experiments demonstrate that our trained bot significantly outperforms the alternative RL-based models on FPS games requiring maze solving and combat skills, etc. Notably, we achieved first place in VDAIC 2018 Track(1).

• #5088
DeepMellow: Removing the Need for a Target Network in Deep Q-Learning
Seungchan Kim, Kavosh Asadi, Michael Littman, George Konidaris
Reinforcement Learning 4

Deep Q-Network (DQN) is an algorithm that achieves human-level performance in complex domains like Atari games. One of the important elements of DQN is its use of a target network, which is necessary to stabilize learning. We argue that using a target network is incompatible with online reinforcement learning, and it is possible to achieve faster and more stable learning without a target network when we use Mellowmax, an alternative softmax operator. We derive novel properties of Mellowmax, and empirically show that the combination of DQN and Mellowmax, but without a target network, outperforms DQN with a target network.

• #5236
On Principled Entropy Exploration in Policy Optimization
Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller
Reinforcement Learning 4

In this paper, we investigate Exploratory Conservative Policy Optimization (ECPO), a policy optimization strategy that improves exploration behavior while assuring monotonic progress in a principled objective. ECPO conducts maximum entropy exploration within a mirror descent framework, but updates policies using reversed KL projection. This formulation bypasses undesirable mode seeking behavior and avoids premature convergence to sub-optimal policies, while still supporting strong theoretical properties such as guaranteed policy improvement. Experimental evaluations demonstrate that the proposed method significantly improves practical exploration and surpasses the empirical performance of state-of-the art policy optimization methods in a set of benchmark tasks.

• #1874
Automatic Successive Reinforcement Learning with Multiple Auxiliary Rewards
Zhao-Yang Fu, De-Chuan Zhan, Xin-Chun Li, Yi-Xing Lu
Reinforcement Learning 4

Reinforcement learning has played an important role in decision making related applications, e.g., robotics motion, self-driving, recommendation, etc. The reward function, as a crucial component, affects the efficiency and effectiveness of reinforcement learning to a large extent. In this paper, we focus on the investigation of reinforcement learning with more than one auxiliary reward. It is found that different auxiliary rewards can boost up the learning rate and effectiveness in different stages, and consequently we propose the Automatic Successive Reinforcement Learning (ASR) for auxiliary rewards grading selection for efficient reinforcement learning by stages. Experiments and simulations have shown the superiority of our proposed ASR on a range of environments, including OpenAI classical control domains and video games; Freeway and Catcher.

### Wednesday 1416:30 - 18:00ML|OL - Online Learning 2 (R7)

Chair: TBD
• #744
Indirect Trust is Simple to Establish
Elham Parhizkar, Mohammad Hossein Nikravan, Sandra Zilles
Online Learning 2

In systems with multiple potentially deceptive agents, any single agent may have to assess the trustworthiness of other agents in order to decide with which agents to interact. In this context, indirect trust refers to trust established through third-party advice. Since the advisers themselves may be deceptive or unreliable, agents need a mechanism to assess and properly incorporate advice. We evaluate existing state-of-the-art methods for computing indirect trust in numerous simulations, demonstrating that the best ones tend to be of prohibitively large complexity. We propose a new and easy to implement method for computing indirect trust, based on a simple prediction with expert advice strategy as is often used in online learning. This method either competes with or outperforms all tested systems in the vast majority of the settings we simulated, while scaling substantially better. Our results demonstrate that existing systems for computing indirect trust are overly complex; the problem can be solved much more efficiently than the literature suggests.

• #972
Optimal Exploitation of Clustering and History Information in Multi-armed Bandit
Djallel Bouneffouf, Srinivasan Parthasarathy, Horst Samulowitz, Martin Wistuba
Online Learning 2

We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effectively during the online exploration phase and derive their regret bounds. In particular, we develop the META algorithm which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations. The former outperforms the latter when the clustering quality is good, and vice-versa. Extensive experiments on synthetic and real world datasets on Warafin drug dosage and web server selectionfor latency minimization validate our theoretical insights and demonstrate that META is a robust strategy for optimally exploiting the pre-clustering information.

• #2702
Unsupervised Hierarchical Temporal Abstraction by Simultaneously Learning Expectations and Representations
Katherine Metcalf, David Leake
Online Learning 2

This paper presents a new approach to unsupervised extraction and hierarchical abstraction of concepts from data streams. The algorithm presented, ENHAnCE, simultaneously learns a predictive model of the input stream and generates representations of the concepts being observed. Following cognitively-inspired models of event segmentation, ENHAnCE uses expectation violations to identify boundaries between temporally extended patterns. ENHAnCE applies its expectation-driven process at multiple levels of temporal granularity; each level takes as input the concept embeddings from the previous level. ENHAnCE produces a hierarchy of predictive models that enable it to identify concepts at multiple levels of temporal abstraction. An evaluation shows that the temporal abstraction hierarchies generated by ENHAnCE closely match hand-coded hierarchies for the test data streams. Given language data streams, ENHAnCE learns a hierarchy of predictive models that capture basic units of both spoken and written language: morphemes, lexemes, phonemes, syllables, and words.

• #1113
Online Learning from Capricious Data Streams: A Generative Approach
Yi He, Baijun Wu, Di Wu, Ege Beyazit, Sheng Chen, Xindong Wu
Online Learning 2

Learning with streaming data has received extensive attention during the past few years. Existing approaches assume the feature space is fixed or changes by following explicit regularities, limiting their applicability in dynamic environments where the data streams are described by an arbitrarily varying feature space. To handle such capricious data streams, we in this paper develop a novel algorithm, named OCDS (Online learning from Capricious Data Streams), which does not make any assumption on feature space dynamics. OCDS trains a learner on a universal feature space that establishes relationships between old and new features, so that the patterns learned in the old feature space can be used in the new feature space. Specifically, the universal feature space is constructed by leveraging the relatednesses among features. We propose a generative graphical model to model the construction process, and show that learning from the universal feature space can effectively improve performance with theoretical analysis. The experimental results demonstrate that OCDS achieves conspicuous performance on synthetic and real datasets.

• #1667
Efficient Non-parametric Bayesian Hawkes Processes
Rui Zhang, Christian Walder, Marian-Andrei Rizoiu, Lexing Xie
Online Learning 2

In this paper, we develop an efficient non-parametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the stationarity of the Hawkes process, we efficiently sample random branching structures and thus, we split the Hawkes process into clusters of Poisson processes. We derive two algorithms --- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization --- and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show our methods to be able to infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity for different content types such as music or pets videos.

• #5712
Sketched Iterative Algorithms for Structured Generalized Linear Models
Qilong Gu, Arindam Banerjee
Online Learning 2

Recent years have seen advances in optimizing large scale statistical estimation problems. In statistical learning settings iterative optimization algorithms have been shown to enjoy geometric convergence. While powerful, such results only hold for the original dataset, and may face computational challenges when the sample size is large. In this paper, we study sketched iterative algorithms, in particular sketched-PGD (projected gradient descent) and sketched-SVRG (stochastic variance reduced gradient) for structured generalized linear model, and illustrate that these methods continue to have geometric convergence to the statistical error under suitable assumptions. Moreover, the sketching dimension is allowed to be even smaller than the ambient dimension, thus can lead to significant speed-ups. The sketched iterative algorithms introduced provide an additional dimension to study the trade-offs between statistical accuracy and time.

### Wednesday 1416:30 - 18:00ML|C - Classification 5 (R8)

Chair: TBD
• #1598
ATTAIN: Attention-based Time-Aware LSTM Networks for Disease Progression Modeling
Yuan Zhang, Xi Yang, Julie Ivy, Min Chi
Classification 5

Modeling patient disease progression using Electronic Health Records (EHRs) is critical to assist clinical decision making. Long-Short Term Memory (LSTM) is an effective model to handle sequential data, such as EHRs, but it encounters two major limitations when applied to EHRs: it is unable to interpret the prediction results and it ignores the irregular time intervals between consecutive events. To tackle these limitations, we propose an attention-based time-aware LSTM Networks (ATTAIN), to improve the interpretability of LSTM and to identify the critical previous events for current diagnosis by modeling the inherent time irregularity. We validate ATTAIN on modeling the progression of an extremely challenging disease, septic shock, by using real-world EHRs. Our results demonstrate that the proposed framework outperforms the state-of-the-art models such as RETAIN and T-LSTM. Also, the generated interpretative time-aware attention weights shed some lights on the progression behaviors of septic shock.

• #2324
Positive and Unlabeled Learning with Label Disambiguation
Chuang Zhang, Dexin Ren, Tongliang Liu, Jian Yang, Chen Gong
Classification 5

Positive and Unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. The state-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is simultaneously treated as positive and negative with different class weights. However, the ground-truth label of an unlabeled example should be unique, so the existing models inadvertently introduce the label noise which may lead to the biased classifier and deteriorated performance. To solve this problem, this paper  proposes a novel algorithm dubbed as "Positive and Unlabeled learning with Label Disambiguation'' (PULD). We first regard all the unlabeled examples in PU learning as ambiguously labeled as positive and negative, and then employ the margin-based label disambiguation strategy, which enlarges the margin of classifier response between the most likely label and the less likely one, to find the unique ground-truth label of each unlabeled example. Theoretically, we derive the generalization error bound of the proposed method by analyzing its Rademacher complexity. Experimentally, we conduct intensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the superiority of the proposed PULD to the existing PU learning approaches.

• #2920
Prediction of Mild Cognitive Impairment Conversion Using Auxiliary Information
Xiaofeng Zhu
Classification 5

In this paper, we propose a new feature selection method to exploit the issue of High Dimension Low Sample Size (HDLSS) for the prediction of Mild Cognitive Impairment (MCI) conversion. Specially, by regarding the Magnetic Resonance Imaging (MRI) information of MCI subjects as the target data, this paper proposes to integrate auxiliary information with the target data in a unified feature selection framework for distinguishing progressive MCI (pMCI) subjects from stable MCI (sMCI) subjects, i.e., the MCI conversion classification for short in this paper, based on their MRI information. The auxiliary information includes the Positron Emission Tomography (PET) information of the target data, the MRI information of Alzheimer’s Disease (AD) subjects and Normal Control (NC) subjects, and the ages of the target data and the AD and NC subjects. As a result, the proposed method jointly selects features from the auxiliary data and the target data by taking into account the influence of outliers and aging of these two kinds of data. Experimental results on the public data of Alzheimer’s Disease Neuroimaging Initiative (ADNI) verified the effectiveness of our proposed method, compared to three state-of-the-art feature selection methods, in terms of four classification evaluation metrics.

• #5102
Recurrent Generative Networks for Multi-Resolution Satellite Data: An Application in Cropland Monitoring
Xiaowei Jia, Mengdie Wang, Ankush Khandelwal, Anuj Karpatne, Vipin Kumar
Classification 5

Effective and timely monitoring of croplands is critical for managing food supply, water quality and energy consumption. While remote sensing data from earth-observing satellites can be used to monitor croplands over large regions, this task is challenging for small-scale croplands as they cannot be captured precisely using coarse-resolution data. On the other hand, the remote sensing data with higher resolution are collected less frequently and contain missing or disturbed data. Hence, traditional sequential models cannot be directly applied on high-resolution data to extract temporal patterns, which are essential to identify crops. In this work, we propose a generative model to combine multi-scale remote sensing data to detect croplands at high resolution. Also, we leverage the temporal patterns learned from coarse-resolution data to generate missing high-resolution data. Additionally, we progressively classify crop types using the outputs from the generative model. This enables the tracking of classification confidence in real time and potentially leads to an early detection. The evaluation in an intensively cultivated region demonstrates the superiority of the proposed method in cropland detection over multiple baselines.

• #6256
MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions
Nuo Xu, Pinghui Wang, Long Chen, Jing Tao, Junzhou Zhao
Classification 5

Predicting interactions between structured entities lies at the core of numerous tasks such as drug regimen and new material design. In recent years, graph neural networks have become attractive. They represent structured entities as graphs, and then extract features from each individual graph using graph convolution operations. However, these methods have some limitations: i) their networks only extract features from a fix-sized subgraph structure (i.e., a fix-sized receptive field) of each node, and ignore features in substructures of different sizes, and ii) features are extracted by considering each entity independently, which may not effectively reflect the interaction between two entities. To resolve these problems, we present {\em MR-GNN}, an end-to-end graph neural network with the following features: i) it uses a multi-resolution based architecture to extract node features from different neighborhoods of each node, and, ii) it uses dual graph-state long short-term memory networks (LSTMs) to summarize local features of each graph and extracts the interaction features between pairwise graphs. Experiments conducted on real-world datasets show that MR-GNN improves the prediction of state-of-the-art methods.

• #535
Learning Low-precision Neural Networks without Straight-Through Estimator (STE)
Zhi-Gang Liu, Matthew Mattina
Classification 5

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). Our AB method avoids STE approximation by replacing the quantized weight in the loss function by an affine combination of the quantized weight w_q and the corresponding full-precision weight w with non-trainable scalar coefficient alpha and (1- alpha). During training, alpha is gradually increased from 0 to 1; the gradient updates to the weights are through the full precision term, (1-alpha) * w, of the affine combination; the model is converted from full-precision to low precision progressively. To evaluate the AB method, a 1-bit BinaryNet on CIFAR10 dataset and 8-bits, 4-bits MobileNet v1, ResNet_50 v1/2 on ImageNet are trained using the alpha-blending approach, and the evaluation indicates that AB improves top-1 accuracy by 0.9\%, 0.82\% and 2.93\% respectively compared to the results of STE based quantization.

### Wednesday 1416:30 - 18:00NLP|NLG - Natural Language Generation 1 (R9)

Chair: TBD
• #327
Difficulty Controllable Generation of Reading Comprehension Questions
Yifan Gao, Lidong Bing, Wang Chen, Michael Lyu, Irwin King
Natural Language Generation 1

We investigate the difficulty levels of questions in reading comprehension datasets such as SQuAD, and propose a new question generation setting, named Difficulty-controllable Question Generation (DQG). Taking as input a sentence in the reading comprehension paragraph and some of its text fragments (i.e., answers) that we want to ask questions about, a DQG method needs to generate questions each of which has a given text fragment as its answer, and meanwhile the generation is under the control of specified difficulty labels---the output questions should satisfy the specified difficulty as much as possible. To solve this task, we propose an end-to-end framework to generate questions of designated difficulty levels by exploring a few important intuitions. For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the question generated by our framework not only have better quality under the metrics like BLEU, but also comply with the specified difficulty labels.

• #835
HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning
Shiyang Yan, Jun Xu, Yuai Liu, Lin Xu
Natural Language Generation 1

Person re-identification (re-ID) aims to recognize a person-of-interest across different cameras with notable appearance variance. Existing research works focused on the capability and robustness of visual representation. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to fulfill domain transfer and language descriptions generation. Then the proposed HorNet can learn the visual and language representation from both the images and captions jointly, and thus enhance the performance of person re-ID. Extensive experiments are conducted on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and Duke-MTMC, demonstrating the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.

• #1898
T-CVAE: Transformer-Based Conditioned Variational Autoencoder for Story Completion
Tianming Wang, Xiaojun Wan
Natural Language Generation 1

Story completion is a very challenging task of generating the missing plot for an incomplete story, which requires not only understanding but also inference of the given contextual clues. In this paper, we present a novel conditional variational autoencoder based on Transformer for missing plot generation. Our model uses shared attention layers for encoder and decoder, which make the most of the contextual clues, and a latent variable for learning the distribution of coherent story plots. Through drawing samples from the learned distribution, diverse reasonable plots can be generated. Both automatic and manual evaluations show that our model generates better story plots than state-of-the-art models in terms of readability, diversity and coherence.

• #2108
A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer
Fuli Luo, Peng Li, Jie Zhou, Pengcheng Yang, Baobao Chang, Xu Sun, Zhifang Sui
Natural Language Generation 1

Unsupervised text style transfer aims to transfer the underlying style of text but keep its main content unchanged without parallel data. Most existing methods typically follow two steps: first separating the content from the original style, and then fusing the content with the desired style. However, the separation in the first step is challenging because the content and style interact in subtle ways in natural language. Therefore, in this paper, we propose a dual reinforcement learning framework to directly transfer the style of the text via a one-step mapping model, without any separation of content and style. Specifically, we consider the learning of the source-to-target and target-to-source mappings as a dual task, and two rewards are designed based on such a dual structure to reflect the style accuracy and content preservation, respectively. In this way, the two one-step mapping models can be trained via reinforcement learning, without any use of parallel data. Automatic evaluations show that our model outperforms the state-of-the-art systems by a large margin, especially with more than 10 BLEU points improvement averaged on two benchmark datasets. Human evaluations also validate the effectiveness of our model in terms of style accuracy, content preservation and fluency.

• #3504
Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling
Pengcheng Yang, Fuli Luo, Peng Chen, Lei Li, Zhiyi Yin, Xiaodong He, Xu Sun
Natural Language Generation 1

The visual storytelling (VST) task aims at generating a reasonable and coherent paragraph-level story with the image stream as input. Different from caption that is a direct and literal description of image content, the story in the VST task tends to contain plenty of imaginary concepts that do not appear in the image. This requires the AI agent to reason and associate with the imaginary concepts based on implicit commonsense knowledge to generate a reasonable story describing the image stream. Therefore, in this work, we present a commonsense-driven generative model, which aims to introduce crucial commonsense from the external knowledge base for visual storytelling. Our approach first extracts a set of candidate knowledge graphs from the knowledge base. Then, an elaborately designed vision-aware directional encoding schema is adopted to effectively integrate the most informative commonsense. Besides, we strive to maximize the semantic similarity within the output during decoding to enhance the coherence of the generated text. Results show that our approach can outperform the state-of-the-art systems by a large margin, which achieves a 35% relative improvement of CIDEr score. With additional commonsense and semantic-relevance based objective, the generated stories are more diverse and coherent.

• #6196
Utilizing Non-Parallel Text for Style Transfer by Making Partial Comparisons
Di Yin, Shujian Huang, Xin-Yu Dai, Jiajun Chen
Natural Language Generation 1

Text style transfer aims to rephrase a given sentence into a different style without changing its original content. Since parallel corpora (i.e. sentence pairs with the same content but different styles) are usually unavailable, most previous works solely guide the transfer process with distributional information, i.e. using style-related classifiers or language models, which neglect the correspondence of instances, leading to poor transfer performance, especially for the content preservation. In this paper, we propose making partial comparisons to explicitly model the content and style correspondence of instances, respectively. To train the partial comparators, we propose methods to extract partial-parallel training instances automatically from the non-parallel data, and to further enhance the training process by using data augmentation. We perform experiments that compare our method to other existing approaches on two review datasets. Both automatic and manual evaluations show that our approach can significantly improve the performance of existing adversarial methods, and outperforms most state-of-the-art models. Our code and data will be available on Github.

### Wednesday 1416:30 - 18:00CV|DDCV - 2D and 3D Computer Vision 2 (R10)

Chair: TBD
• #487
3DViewGraph: Learning Global Features for 3D Shapes from A Graph of Unordered Views with Attention
Zhizhong Han, Xiyang Wang, Chi Man Vong, Yu-Shen Liu, Matthias Zwicker, C. L. Philip Chen
2D and 3D Computer Vision 2

Learning global features by aggregating information over multiple views has been shown to be effective for 3D shape analysis. For view aggregation in deep learning models, pooling has been applied extensively. However, pooling leads to a loss of the content within views, and the spatial relationship among views, which limits the discriminability of learned features. We propose 3DViewGraph to resolve this issue, which learns 3D global features by more effectively aggregating unordered views with attention. Specifically, unordered views taken around a shape are regarded as view nodes on a view graph. 3DViewGraph first learns a novel latent semantic mapping to project low-level view features into meaningful latent semantic embeddings in a lower dimensional space, which is spanned by latent semantic patterns. Then, the content and spatial information of each pair of view nodes are encoded by a novel spatial pattern correlation, where the correlation is computed among latent semantic patterns. Finally, all spatial pattern correlations are integrated with attention weights learned by a novel attention mechanism. This further increases the discriminability of learned features by highlighting the unordered view nodes with distinctive characteristics and depressing the ones with appearance ambiguity. We show that 3DViewGraph outperforms state-of-the-art methods under three large-scale benchmarks.

• #497
Parts4Feature: Learning 3D Global Features from Generally Semantic Parts in Multiple Views
Zhizhong Han, Xinhai Liu, Yu-Shen Liu, Matthias Zwicker
2D and 3D Computer Vision 2

Deep learning has achieved remarkable results in 3D shape analysis by learning global shape features from the pixel-level over multiple views. Previous methods, however, compute low-level features for entire views without considering part-level information. In contrast, we propose a deep neural network, called Parts4Feature, to learn 3D global features from part-level information in multiple views. We introduce a novel definition of generally semantic parts, which Parts4Feature learns to detect in multiple views from different 3D shape segmentation benchmarks. A key idea of our architecture is that it transfers the ability to detect semantically meaningful parts in multiple views to learn 3D global features. Parts4Feature achieves this by combining a local part detection branch and a global feature learning branch with a shared region proposal module. The global feature learning branch aggregates the detected parts in terms of learned part patterns with a novel multi-attention mechanism, while the region proposal module enables locally and globally discriminative information to be promoted by each other. We demonstrate that Parts4Feature outperforms the state-of-the-art under three large-scale 3D shape benchmarks.

• #1852
Rethinking Loss Design for Large-scale 3D Shape Retrieval
Zhaoqun Li, Cheng Xu, Biao Leng
2D and 3D Computer Vision 2

Learning discriminative shape representations is a crucial issue for large-scale 3D shape retrieval. In this paper, we propose the Collaborative Inner Product Loss (CIP Loss) to obtain ideal shape embedding that discriminative among different categories and clustered within the same class. Utilizing simple inner product operation, CIP loss explicitly enforces the features of the same class to be clustered in a linear subspace, while inter-class subspaces are constrained to be at least orthogonal. Compared to previous metric loss functions, CIP loss could provide more clear geometric interpretation for the embedding than Euclidean margin, and is easy to implement without normalization operation referring to cosine margin. Moreover, our proposed loss term can combine with other commonly used loss functions and can be easily plugged into existing off-the-shelf architectures. Extensive experiments conducted on the two public 3D object retrieval datasets, ModelNet and ShapeNetCore 55, demonstrate the effectiveness of our proposal, and our method has achieved state-of-the-art results on both datasets.

• #3897
MAT-Net: Medial Axis Transform Network for 3D Object Recognition
Jianwei Hu, Bin Wang, Lihui Qian, Yiling Pan, Xiaohu Guo, Lingjie Liu, Wenping Wang
2D and 3D Computer Vision 2

3D deep learning performance depends on object representation and local feature extraction. In this work, we present MAT-Net, a neural network which captures local and global features from the Medial Axis Transform (MAT). Different from K-Nearest-Neighbor method which extracts local features by a fixed number of neighbors, our MAT-Net exploits effective modules Group-MAT and Edge-Net to process topological structure. Experimental results illustrate that MAT-Net demonstrates competitive or better performance on 3D shape recognition than state-of-the-art methods, and prove that MAT representation has excellent capacity in 3D deep learning, even in the case of low resolution.

• #873
Semi-supervised Three-dimensional Reconstruction Framework with GAN
Chong Yu
2D and 3D Computer Vision 2

Because of the intrinsic complexity in computation, three-dimensional (3D) reconstruction is an essential and challenging topic in computer vision research and applications. The existing methods for 3D reconstruction often produce holes, distortions and obscure parts in the reconstructed 3D models, or can only reconstruct voxelized 3D models for simple isolated objects. So they are not adequate for real usage. From 2014, the Generative Adversarial Network (GAN) is widely used in generating unreal dataset and semi-supervised learning. So the focus of this paper is to achieve high quality 3D reconstruction performance by adopting GAN principle. We propose a novel semi-supervised 3D reconstruction framework, namely SS-3D-GAN, which can iteratively improve any raw 3D reconstruction models by training the GAN models to converge. This new model only takes real-time 2D observation images as the weak supervision, and doesn't rely on prior knowledge of shape models or any referenced observations. Finally, through the qualitative and quantitative experiments & analysis, this new method shows compelling advantages over the current state-of-the-art methods on Tanks & Temples reconstruction benchmark dataset.

• #1325
Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos
Haofei Xu, Jianmin Zheng, Jianfei Cai, Juyong Zhang
2D and 3D Computer Vision 2

While learning based depth estimation from images/videos has achieved substantial progress, there still exist intrinsic limitations. Supervised methods are limited by a small amount of ground truth or labeled data and unsupervised methods for monocular videos are mostly based on the static scene assumption, not performing well on real world scenarios with the presence of dynamic objects. In this paper, we propose a new learning based method consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate depth from unconstrained monocular videos without ground truth supervision. The core contribution lies in RDN for proper handling of rigid and non-rigid motions of various objects such as rigidly moving cars and deformable humans. In particular, a deformation based motion representation is proposed to model individual object motion on 2D images. This representation enables our method to be applicable to diverse unconstrained monocular videos. Our method can not only achieve the state-of-the-art results on standard benchmarks KITTI and Cityscapes, but also show promising results on a crowded pedestrian tracking dataset, which demonstrates the effectiveness of the deformation based motion representation. Code and trained models are available at https://github.com/haofeixu/rdn4depth.

### Wednesday 1416:30 - 18:00PS|SPS - Search in Planning and Scheduling (R11)

Chair: TBD
• #950
Pattern Selection for Optimal Classical Planning with Saturated Cost Partitioning
Jendrik Seipp
Search in Planning and Scheduling

Pattern databases are the foundation of some of the strongest admissible heuristics for optimal classical planning. Experiments showed that the most informative way of combining information from multiple pattern databases is to use saturated cost partitioning. Previous work selected patterns and computed saturated cost partitionings over the resulting pattern database heuristics in two separate steps. We introduce a new method that uses saturated cost partitioning to select patterns and show that it outperforms all existing pattern selection algorithms.

• #1640
Bayesian Inference of Linear Temporal Logic Specifications for Contrastive Explanations
Joseph Kim, Christian Muise, Ankit Shah, Shubham Agarwal, Julie Shah
Search in Planning and Scheduling

Temporal logics are useful for describing dynamic system behavior, and have been successfully used as a language for goal definitions in task planning. Prior works on inferring temporal logic specifications have focused on "summarizing" the input dataset - i.e., finding specifications that are satisfied by all plan traces belonging to the given set. In this paper, we examine the problem of inferring specifications that describe temporal differences between two sets of plan traces. We formalize the concept of providing such contrastive explanations, then present a Bayesian probabilistic model for inferring contrastive explanations as Linear Temporal Logic (LTL) specifications. We demonstrate the efficacy, scalability, and robustness of our model for inferring accurate specifications from noisy data and across various benchmark domains.

• #10956
(Sister Conferences Best Papers Track) On Guiding Search in HTN Planning with Classical Planning Heuristics
Daniel Höller, Pascal Bercher, Gregor Behnke, Susanne Biundo
Search in Planning and Scheduling

Planning is the task of finding a sequence of actions that achieves the goal(s) of an agent. It is solved based on a model describing the environment and how to change it. There are several approaches to solve planning tasks, two of the most popular are classical planning and hierarchical planning. Solvers are often based on heuristic search, but especially regarding domain-independent heuristics, techniques in classical planning are more sophisticated. However, due to the different problem classes, it is difficult to use them in hierarchical planning. In this paper we describe how to use arbitrary classical heuristics in hierarchical planning and show that the resulting system outperforms the state of the art in hierarchical planning.

• #5204
Earliest-Completion Scheduling of Contract Algorithms with End Guarantees
Spyros Angelopoulos, Shendan Jin
Search in Planning and Scheduling

We consider the setting in which executions of contract algorithms are scheduled in a processor so as to produce an interruptible system. Such algorithms offer a trade off between the quality of output and the available computation time, provided that the latter is known in advance. Previous work on this setting has provided strict performance guarantees for several variants of this setting, assuming that an interruption can occur arbitrarily ahead in the future. In practice, however, one expects that the schedule will reach a point beyond which further progress will only be marginal, hence it can be deemed complete. In this work we show how to optimize the time at which the system reaches a desired performance objective, while maintaining interruptible guarantees throughout the entire execution. The resulting schedule is provably optimal, and it guarantees that upon completion each individual contract algorithm has attained a predefined end guarantee.

• #3582
A Novel Distribution-Embedded Neural Network for Sensor-Based Activity Recognition
Hangwei Qian, Sinno Jialin Pan, Bingshui Da, Chunyan Miao
Search in Planning and Scheduling

Feature-engineering-based machine learning models and deep learning models have been explored for wearable-sensor-based human activity recognition. For both types of methods, one crucial research issue is how to extract proper features from the partitioned segments of multivariate sensor readings. Existing methods have different drawbacks: 1) feature-engineering-based methods are able to extract meaningful features, such as statistical or structural information underlying the segments, but usually require manual designs of features for different applications, which is time consuming, and 2) deep learning models are able to learn temporal and/or spatial features from the sensor data automatically, but fail to capture statistical information. In this paper, we propose a novel deep learning model to automatically learn meaningful features including statistical features, temporal features and spatial correlation features for activity recognition in a unified framework. Extensive experiments are conducted on four datasets to demonstrate the effectiveness of our proposed method compared with state-of-the-art baselines.

• #3244
Online Probabilistic Goal Recognition over Nominal Models
Ramon Fraga Pereira, Mor Vered, Felipe Meneguzzi, Miquel Ramírez
Search in Planning and Scheduling

This paper revisits probabilistic, model-based goal recognition to study the implications of the use of nominal models to estimate the posterior probability distribution over a finite set of hypothetical goals. Existing model-based approaches rely on expert knowledge to produce symbolic descriptions of the dynamic constraints domain objects are subject to, and these are assumed to produce correct predictions. We abandon this assumption to consider the use of nominal models that are learnt from observations on transitions of systems with unknown dynamics. Leveraging existing work on the acquisition of domain models via learning for Hybrid Planning we adapt and evaluate existing goal recognition approaches to analyze how prediction errors, inherent to system dynamics identification and model learning techniques have an impact over recognition error rates.

### Wednesday 1416:30 - 18:00ML|DM - Data Mining 7 (R12)

Chair: TBD
• #1779
DMRAN:A Hierarchical Fine-Grained Attention-Based Network for Recommendation
Huizhao Wang, Guanfeng Liu, An Liu, Zhixu Li, Kai Zheng
Data Mining 7

The conventional methods for the next-item recommendation are generally based on RNN or one- dimensional attention with time encoding. They are either hard to preserve the long-term dependencies between different interactions, or hard to capture fine-grained user preferences. In this paper, we propose a Double Most Relevant Attention Network (DMRAN) that contains two layers, i.e., Item level Attention and Feature Level Self- attention, which are to pick out the most relevant items from the sequence of user’s historical behaviors, and extract the most relevant aspects of relevant items, respectively. Then, we can capture the fine-grained user preferences to better support the next-item recommendation. Extensive experiments on two real-world datasets illustrate that DMRAN can improve the efficiency and effectiveness of the recommendation compared with the state-of-the-art methods.

• #2422
ProNE: Fast and Scalable Network Representation Learning
Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, Ming Ding
Data Mining 7

Recent advances in network embedding has revolutionized the field of graph and network mining. However, (pre-)training embeddings for very large-scale networks is computationally challenging for most existing methods. In this work, we present ProNE---a fast, scalable, and effective model, whose single-thread version is 10--400x faster than efficient network embedding benchmarks with 20 threads, including LINE, DeepWalk, node2vec, GraRep, and HOPE. As a concrete example, the single-version ProNE requires only 29 hours to embed a network of hundreds of millions of nodes while it takes LINE weeks and DeepWalk months by using 20 threads. To achieve this, ProNE first initializes network embeddings efficiently by formulating the task as sparse matrix factorization. The second step of ProNE is to enhance the embeddings by propagating them in the spectrally modulated space. Extensive experiments on networks of various scales and types demonstrate that ProNE achieves both effectiveness and significant efficiency superiority when compared to the aforementioned baselines. In addition, ProNE's embedding enhancement step can be also generalized for improving other models at speed, e.g., offering >10% relative gains for the used baselines.

• #2793
Hi-Fi Ark: Deep User Representation via High-Fidelity Archive Network
Zheng Liu, Yu Xing, Fangzhao Wu, Mingxiao An, Xing Xie
Data Mining 7

Deep learning techniques have been widely applied to modern recommendation systems, bringing in flexible and effective ways of user representation. Conventionally, user representations are generated purely in the offline stage. Without referencing to the specific candidate item for recommendation, it is difficult to fully capture user preference from the perspective of interest. More recent algorithms tend to generate user representation at runtime, where user's historical behaviors are attentively summarized w.r.t. the presented candidate item. In spite of the improved efficacy, it is too expensive for many real-world scenarios because of the repetitive access to user's entire history. In this work, a novel user representation framework, Hi-Fi Ark, is proposed. With Hi-Fi Ark, user history is summarized into highly compact and complementary vectors in the offline stage, known as archives. Meanwhile, user preference towards a specific candidate item can be precisely captured via the attentive aggregation of such archives. As a result, both deployment feasibility and superior recommendation efficacy are achieved by Hi-Fi Ark. The effectiveness of Hi-Fi Ark is empirically validated on three real-world datasets, where remarkable and consistent improvements are made over a variety of well-recognized baseline methods.

• #4239
Deep Active Learning for Anchor User Prediction
Anfeng Cheng, Chuan Zhou, Hong Yang, Jia Wu, Lei Li, Jianlong Tan, Li Guo
Data Mining 7

Predicting pairs of anchor users plays an important role in the cross-network analysis. Due to the expensive costs of labeling anchor users for training prediction models, we consider in this paper the problem of minimizing the number of user pairs across multiple networks for labeling as to improve the accuracy of the prediction. To this end, we present a deep active learning model for anchor user prediction (DALAUP for short). However, active learning for anchor user sampling meets the challenges of non-i.i.d. user pair data caused by network structures and the correlation among anchor or non-anchor user pairs. To solve the challenges, DALAUP uses a couple of neural networks with shared-parameter to obtain the vector representations of user pairs, and ensembles three query strategies to select the most informative user pairs for labeling and model training. Experiments on real-world social network data demonstrate that DALAUP outperforms the state-of-the-art approaches.

• #3364
On the Estimation of Treatment Effect with Text Covariates
Liuyi Yao, Sheng Li, Yaliang Li, Hongfei Xue, Jing Gao, Aidong Zhang
Data Mining 7

Estimating the treatment effect benefits decision making in various domains as it can provide the potential outcomes of different choices. Existing work mainly focuses on covariates with numerical values, while how to handle covariates with textual information for treatment effect estimation is still an open question. One major challenge is how to filter out the nearly instrumental variables which are the variables more predictive to the treatment than the outcome. Conditioning on those variables to estimate the treatment effect would amplify the estimation bias. To address this challenge, we propose a conditional treatment-adversarial learning based matching method (CTAM). CTAM incorporates the treatment-adversarial learning to filter out the information related to nearly instrumental variables when learning the representations, and then it performs matching among the learned representations to estimate the treatment effects. The conditional treatment-adversarial learning helps reduce the bias of treatment effect estimation, which is demonstrated by our experimental results on both semi-synthetic and real-world datasets.

• #3561
Noise-Resilient Similarity Preserving Network Embedding for Social Networks
Zhenyu Qiu, Wenbin Hu, Jia Wu, ZhongZheng Tang, Xiaohua Jia
Data Mining 7

Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the structure and inherent properties of the network. Most existing network embedding methods didn't consider network noise. However, it is almost impossible to observe the actual structure of a real-world network without noise.  The noise in the network will affect the performance of network embedding dramatically. In this paper, we aim to exploit node similarity to address the problem of social network embedding with noise and propose a node similarity preserving (NSP) embedding method. NSP exploits a comprehensive similarity index to quantify the authenticity of the observed network structure. Then we propose an algorithm to construct a correction matrix to reduce the influence of noise. Finally, an objective function for accurate network embedding is proposed and an efficient algorithm to solve the optimization problem is provided. Extensive experimental results on a variety of applications of real-world networks with noise show the superior performance of the proposed method over the state-of-the-art methods.

### Wednesday 1416:30 - 18:00ML|TAML - Transfer, Adaptation, Multi-task Learning 3 (R13)

Chair: TBD
• #4195
Differentially Private Optimal Transport: Application to Domain Adaptation
Nam LeTien, Amaury Habrard, Marc Sebban
Transfer, Adaptation, Multi-task Learning 3

Optimal transport has received much attention during the past few years to deal with domain adaptation tasks. The goal is to transfer knowledge from a source domain to a target domain by finding a transportation of minimal cost moving the source distribution to the target one. In this paper, we address the challenging task of privacy preserving domain adaptation by optimal transport. Using the Johnson-Lindenstrauss transform together with some noise, we present the first differentially private optimal transport model and show how it can be directly applied on both unsupervised and semi-supervised domain adaptation scenarios. Our theoretically grounded method allows the optimization of the transportation plan and the Wasserstein distance between the two distributions while protecting the data of both domains.We perform an extensive series of experiments on various benchmarks (VisDA, Office-Home and Office-Caltech datasets) that demonstrates the efficiency of our method compared to non-private strategies.

• #5546
Learning to Learn Gradient Aggregation by Gradient Descent
Jinlong Ji, Xuhui Chen, Qianlong Wang, Lixing Yu, Pan Li
Transfer, Adaptation, Multi-task Learning 3

In the big data era, distributed machine learning emerges as an important learning paradigm to mine large volumes of data by taking advantage of distributed computing resources. In this work, motivated by learning to learn, we propose a meta-learning approach to coordinate the learning process in the master-slave type of distributed systems. Specifically, we utilize a recurrent neural network (RNN) in the parameter server (the master) to learn to aggregate the gradients from the workers (the slaves). We design a coordinatewise preprocessing and postprocessing method to make the neural network based aggregator more robust. Besides, to address the fault tolerance, especially the Byzantine attack, in distributed machine learning systems, we propose an RNN aggregator with additional loss information (ARNN) to improve the system resilience. We conduct extensive experiments to demonstrate the effectiveness of the RNN aggregator, and also show that it can be easily generalized and achieve remarkable performance when transferred to other distributed systems. Moreover, under majoritarian Byzantine attacks, the ARNN aggregator outperforms the Krum, the state-of-art fault tolerance aggregation method, by 43.14%. In addition, our RNN aggregator enables the server to aggregate gradients from variant local models, which significantly improve the scalability of distributed learning.

• #6042
Efficient Protocol for Collaborative Dictionary Learning in Decentralized Networks
Tsuyoshi Idé, Rudy Raymond, Dzung T. Phan
Transfer, Adaptation, Multi-task Learning 3

This paper is concerned with the task of collaborative density estimation in the distributed multi-task setting. Major application scenarios include collaborative anomaly detection among distributed industrial assets owned by different companies competing with each other. Of critical importance here is to achieve two conflicting goals at once: data privacy and collaboration. To this end, we propose a new framework for collaborative dictionary learning. By using a mixture of the exponential family, we show that collaborative learning can be nicely separated into three steps: local updates, global consensus, and optimization. For the critical step of consensus building, we propose a new algorithm that does not rely on expensive encryption-based multi-party computation. Our theoretical and experimental analysis shows that our method is several orders of magnitude faster than the alternative.

• #3575
Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning
Sihui Luo, Xinchao Wang, Gongfan Fang, Yao Hu, Dapeng Tao, Mingli Song
Transfer, Adaptation, Multi-task Learning 3

An increasing number of well-trained deep networks have been released online by researchers and developers, enabling the community to reuse them in a plug-and-play way without accessing the training annotations. However, due to the large number of network variants, such public-available trained models are often of different architectures, each of which being tailored for a specific task or dataset. In this paper, we study a deep-model reusing task, where we are given as input pre-trained networks of heterogeneous architectures specializing in distinct tasks, as teacher models. We aim to learn a multitalented and light-weight student model that is able to grasp the integrated knowledge from all such heterogeneous-structure teachers, again without accessing any human annotation. To this end, we propose a common feature learning scheme, in which the features of all teachers are transformed into a common space and the student is enforced to imitate them all so as to amalgamate the intact knowledge. We test the proposed approach on a list of benchmarks and demonstrate that the learned student is able to achieve very promising performance, superior to those of the teachers in their specialized tasks.

• #6456
Landmark Selection for Zero-shot Learning
Yuchen Guo, Guiguang Ding, Jungong Han, Chenggang Yan, Jiyong Zhang, Qionghai Dai
Transfer, Adaptation, Multi-task Learning 3

Zero-shot learning (ZSL) is an emerging research topic whose goal is to build recognition models for previously unseen classes. The basic idea of ZSL is based on heterogeneous feature matching which learns a compatibility function between image and class features using seen classes. The function is constructed based on one-vs-all training in which each class has only one class feature and many image features. Existing ZSL works mostly treat all image features equivalently. However, in this paper we argue that it is more reasonable to use some representative cross-domain data instead of all. Motivated by this idea, we propose a novel approach, termed as Landmark Selection(LAST) for ZSL. LAST is able to identify representative cross-domain features which further lead to better image-class compatibility function. Experiments on several ZSL datasets including ImageNet demonstrate the superiority of LAST to the state-of-the-arts.

• #501
Bayesian Uncertainty Matching for Unsupervised Domain Adaptation
Jun Wen, Nenggan Zheng, Junsong Yuan, Zhefeng Gong, Changyou Chen
Transfer, Adaptation, Multi-task Learning 3

Domain adaptation is an important technique to alleviate performance degradation caused by domain shift, e.g., when training and test data come from different domains. Most existing deep adaptation methods focus on reducing domain shift by matching marginal feature distributions through deep transformations on the input features, due to the unavailability of target domain labels. We show that domain shift may still exist via label distribution shift at the classifier, thus deteriorating model performances. To alleviate this issue, we propose an approximate joint distribution matching scheme by exploiting prediction uncertainty. Specifically, we use a Bayesian neural network to quantify prediction uncertainty of a classifier. By imposing distribution matching on both features and labels (via uncertainty), label distribution mismatching in source and target data is effectively alleviated, encouraging the classifier to produce consistent predictions across domains. We also propose a few techniques to improve our method by adaptively reweighting domain adaptation loss to achieve nontrivial distribution matching and stable training. Comparisons with state of the art unsupervised domain adaptation methods on three popular benchmark datasets demonstrate the superiority of our approach, especially on the effectiveness of alleviating negative transfer.

### Wednesday 1419:00 - 22:00IJCAI 2019 Conference Dinner (GD)

• IJCAI 2019 Conference Dinner
IJCAI 2019 Conference Dinner
• ### Wednesday 1419:00 - 22:00IJCAI 2019 Student Reception (SR)

• IJCAI 2019 Student Reception
IJCAI 2019 Student Reception
• ### Thursday 1508:30 - 09:20Invited Talk (R1)

• Formal Synthesis for Robots
Hadas Kress-Gazit
Invited Talk
• ### Thursday 1509:30 - 10:30AI-HWB - ST: AI for Improving Human Well-Being 5 (R2)

Chair: TBD
• #5070
The Price of Local Fairness in Multistage Selection
Vitalii Emelianov, George Arvanitakis, Nicolas Gast, Krishna Gummadi, Patrick Loiseau
ST: AI for Improving Human Well-Being 5

The rise of algorithmic decision making led to active researches on how to define and guarantee fairness, mostly focusing on one-shot decision making. In several important applications such as hiring, however, decisions are made in multiple stage with additional information at each stage. In such cases, fairness issues remain poorly understood. In this paper we study fairness in k-stage selection problems where additional features are observed at every stage. We first introduce two fairness notions, local (per stage) and global (final stage) fairness, that extend the classical fairness notions to the k-stage setting. We propose a simple model based on a probabilistic formulation and show that the locally and globally fair selections that maximize precision can be computed via a linear program. We then define the price of local fairness to measure the loss of precision induced by local constraints; and investigate theoretically and empirically this quantity. In particular, our experiments show that the price of local fairness is generally smaller when the sensitive attribute is observed at the first stage; but globally fair selections are more locally fair when the sensitive attribute is observed at the second stage – hence in both cases it is often possible to have a selection that has a small price of local fairness and is close to locally fair.

• #5318
mdfa: Multi-Differential Fairness Auditor for Black Box Classifiers
Xavier Gitiaux, Huzefa Rangwala
ST: AI for Improving Human Well-Being 5

Machine learning algorithms are increasingly involved in sensitive decision-making processes with adversarial implications on individuals. This paper presents a new tool,  mdfa that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness.  Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier widely used in the United States and demonstrate that for individuals with little criminal history, identified African-Americans are three-times more likely to be considered at high risk of violent recidivism than similar non-African-Americans.

• #2973
Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies
Muhammad Masood, Finale Doshi-Velez
ST: AI for Improving Human Well-Being 5

Standard reinforcement learning methods aim to master one way of solving a task whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently explore the space of reasonable solutions.  Unfortunately, existing approaches that quantify uncertainty over policies are not ultimately relevant to finding policies with qualitatively distinct behaviors.  In this work, we formalize the difference between policies as a difference between the distribution of trajectories induced by each policy, which encourages diversity with respect to both state visitation and action choices.  We derive a gradient-based optimization technique that can be combined with existing policy gradient methods to now identify diverse collections of well-performing policies.  We demonstrate our approach on benchmarks and a healthcare task.

• #3171
Group-Fairness in Influence Maximization
Alan Tsang, Bryan Wilder, Eric Rice, Milind Tambe, Yair Zick
ST: AI for Improving Human Well-Being 5

Influence maximization is a widely used model for information dissemination in social networks. Recent work has employed such interventions across a wide range of social problems, spanning public health, substance abuse, and international development (to name a few examples). A critical but understudied question is whether the benefits of such interventions are fairly distributed across different groups in the population; e.g., avoiding discrimination with respect to sensitive attributes such as race or gender. Drawing on legal and game-theoretic concepts, we introduce formal definitions of fairness in influence maximization. We provide an algorithmic framework to find solutions which satisfy fairness constraints, and in the process improve the state of the art for general multi-objective submodular maximization problems. Experimental results on real data from an HIV prevention intervention for homeless youth show that standard influence maximization techniques oftentimes neglect smaller groups which contribute less to overall utility, resulting in a disparity which our proposed algorithms substantially reduce.

### Thursday 1509:30 - 10:30ML|EM - Ensemble Methods 1 (R3)

Chair: TBD
• #1559
AugBoost: Gradient Boosting Enhanced with Step-Wise Feature Augmentation
Philip Tannor, Lior Rokach
Ensemble Methods 1

Gradient Boosted Decision Trees (GBDT) is a widely used machine learning algorithm, which obtains state-of-the-art results on many machine learning tasks. In this paper we introduce a method for obtaining better results, by augmenting the features in the dataset between the iterations of GBDT. We explore a number of augmentation methods: training an Artificial Neural Network (ANN) and extracting features from it's last hidden layer (supervised), and rotating the feature-space using unsupervised methods such as PCA or Random Projection (RP). These variations on GBDT were tested on 20 classification tasks, on which all of them outperformed GBDT and previous related work.

• #5434
Ensemble-based Ultrahigh-dimensional Variable Screening
Wei Tu, Dong Yang, Linglong Kong, Menglu Che, Qian Shi, Guodong Li, Guangjian Tian
Ensemble Methods 1

Since the sure independence screening (SIS) method by Fan and Lv, many different variable screening methods have been proposed based on different measures under different models. However, most of these methods are designed for specific models. In practice, we often have very little information about the data generating process and different methods can result in very different sets of features. The heterogeneity presented here motivates us to combine various screening methods simultaneously. In this paper, we introduce a general ensemble-based framework to efficiently combine results from multiple variable screening methods. The consistency and sure screening property of proposed framework has been established. Extensive simulation studies confirm our intuition that the proposed ensemble-based method is more robust against model specification than using single variable screening method. The proposed ensemble-based method is used to predict attention deficit hyperactivity disorder (ADHD) status using brain function connectivity (FC).

• #5661
Deep Variational Koopman Models: Inferring Koopman Observations for Uncertainty-Aware Dynamics Modeling and Control
Jeremy Morton, Freddie D. Witherden, Mykel J. Kochenderfer
Ensemble Methods 1

Koopman theory asserts that a nonlinear dynamical system can be mapped to a linear system, where the Koopman operator advances observations of the state forward in time. However, the observable functions that map states to observations are generally unknown. We introduce the Deep Variational Koopman (DVK) model, a method for inferring distributions over observations that can be propagated linearly in time. By sampling from the inferred distributions, we obtain a distribution over dynamical models, which in turn provides a distribution over possible outcomes as a modeled system advances in time. Experiments show that the DVK model is effective at long-term prediction for a variety of dynamical systems. Furthermore, we describe how to incorporate the learned models into a control framework, and demonstrate that accounting for the uncertainty present in the distribution over dynamical models enables more effective control.

• #2704
Gradient Boosting with Piece-Wise Linear Regression Trees
Yu Shi, Jian Li, Zhize Li
Ensemble Methods 1

Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in some very popular open sourced toolkits including XGBoost, LightGBM and CatBoost. In this paper, we show that both the accuracy and efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise linear regression trees (PL Trees), instead of piecewise constant regression trees, as base learners. We show that PL Trees can accelerate convergence of GBDT and improve the accuracy. We also propose some optimization tricks to substantially reduce the training time of PL Trees, with little sacrifice of accuracy. Moreover, we propose several implementation techniques to speedup our algorithm on modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism. The experimental results show that GBDT with PL Trees can provide very competitive testing accuracy with comparable or less training time.

### Thursday 1509:30 - 10:30UAI|API - Approximate Probabilistic Inference (R4)

Chair: TBD
• #1206
On Constrained Open-World Probabilistic Databases
Tal Friedman, Guy Van den Broeck
Approximate Probabilistic Inference

Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently answering many interesting queries. Recent work on open-world probabilistic databases strengthens the semantics of these probabilistic databases by discarding the assumption that any information not present in the data must be false. While intuitive, these semantics are not sufficiently precise to give reasonable answers to queries. We propose overcoming these issues by using constraints to restrict this open world. We provide an algorithm for one class of queries, and establish a basic hardness result for another. Finally, we propose an efficient and tight approximation for a large class of queries.

• #3643
Cutset Bayesian Networks: A New Representation for Learning Rao-Blackwellised Graphical Models
Tahrima Rahman, Shasha Jin, Vibhav Gogate
Approximate Probabilistic Inference

Recently there has been growing interest in learning probabilistic models that admit poly-time inference called tractable probabilistic models from data. Although they generalize poorly as compared to intractable models, they often yield more accurate estimates at prediction time. In this paper, we seek to further explore this trade-off between generalization performance and inference accuracy by proposing a novel, partially tractable representation called cutset Bayesian networks (CBNs). The main idea in CBNs is to partition the variables into two subsets X and Y, learn a (intractable) Bayesian network that represents P(X) and a tractable conditional model that represents P(Y|X). The hope is that the intractable model will help improve generalization while the tractable model, by leveraging Rao-Blackwellised sampling which combines exact inference and sampling, will help improve the prediction accuracy. To compactly model P(Y|X), we introduce a novel tractable representation called conditional cutset networks (CCNs) in which all conditional probability distributions are represented using calibrated classifiers—classifiers which typically yield higher quality probability estimates than conventional classifiers. We show via a rigorous experimental evaluation that CBNs and CCNs yield more accurate posterior estimates than their tractable as well as intractable counterparts.

• #5342
Lifted Message Passing for Hybrid Probabilistic Inference
Yuqiao Chen, Nicholas Ruozzi, Sriraam Natarajan
Approximate Probabilistic Inference

Lifted inference algorithms for first-order logic models, e.g., Markov logic networks (MLNs), have been of significant interest in recent years.  Lifted inference methods exploit model symmetries in order to reduce the size of the model and, consequently, the computational cost of inference.  In this work, we consider the problem of lifted inference in MLNs with continuous or both discrete and continuous groundings. Existing work on lifting with continuous groundings has mostly been limited to special classes of models, e.g., Gaussian models, for which variable elimination or message-passing updates can be computed exactly.  Here, we develop approximate lifted inference schemes based on particle sampling.  We demonstrate empirically that our approximate lifting schemes perform comparably to existing state-of-the-art for models for Gaussian MLNs, while having the flexibility to be applied to models with arbitrary potential functions.

• #5427
Bayesian Parameter Estimation for Nonlinear Dynamics Using Sensitivity Analysis
Yi Chou, Sriram Sankaranarayanan
Approximate Probabilistic Inference

We investigate approximate Bayesian inference techniques for nonlinear systems described by ordinary differential equation (ODE) models. In particular, the approximations will be based on set-valued reachability analysis approaches, yielding approximate models for the posterior distribution. Nonlinear ODEs are widely used to mathematically describe physical and biological models. However, these models are often described by parameters that are not directly measurable and have an impact on the system behaviors. Often, noisy measurement data combined with physical/biological intuition serve as the means for finding appropriate values of these parameters. Our approach operates under a Bayesian framework, given prior distribution over the parameter space and noisy observations under a known sampling distribution. We explore subsets of the space of model parameters, computing bounds on the likelihood for each subset. This is performed using nonlinear set-valued reachability analysis that is made faster by means of linearization around a reference trajectory. The tiling of the parameter space can be adaptively refined to make bounds on the likelihood tighter. We evaluate our approach on a variety of nonlinear benchmarks and compare our results with Markov Chain Monte Carlo and Sequential Monte Carlo approaches.

### Thursday 1509:30 - 10:30AMS|CSC - Computational Social Choice 2 (R5)

Chair: TBD
• #90
Approval-Based Elections and Distortion of Voting Rules
Grzegorz Pierczyński, Piotr Skowron
Computational Social Choice 2

We consider elections where both voters and candidates can be associated with points in a metric space and voters prefer candidates that are closer to those that are farther away. It is often assumed that the optimal candidate is the one that minimizes the total distance to the voters. Yet, the voting rules often do not have access to the metric space M and only see preference rankings induced by M. Consequently, they often are incapable of selecting the optimal candidate. The distortion of a voting rule measures the worst-case loss of the quality being the result of having access only to preference rankings. We extend the idea of distortion to approval-based preferences. First, we compute the distortion of Approval Voting. Second, we introduce the concept of acceptability-based distortion---the main idea behind is that the optimal candidate is the one that is acceptable to most voters. We determine acceptability-distortion for a number of rules, including Plurality, Borda, k-Approval, Veto, Copeland, Ranked Pairs, the Schulze's method, and STV.

• #4615
Complexity of Manipulating and Controlling Approval-Based Multiwinner Voting
Yongjie Yang
Computational Social Choice 2

We study the complexity of several manipulation and control problems for six prevalent approval based multiwinner voting rules. We show that these rules generally resist the proposed strategic types. In addition, we also give fixed-parameter tractability results for these problems with respect to several natural parameters and derive polynomial-time algorithms for certain special cases.

• #216
A Quantitative Analysis of Multi-Winner Rules
Martin Lackner, Piotr Skowron
Computational Social Choice 2

To choose a suitable multi-winner voting rule is a hard and ambiguous task. Depending on the context, it varies widely what constitutes the choice of an "optimal" subset.In this paper, we offer a new perspective on measuring the quality of such subsets and---consequently---of multi-winner rules. We provide a quantitative analysis using methods from the theory of approximation algorithms and estimate how well multi-winner rules approximate two extreme objectives: diversity as captured by the Approval Chamberlin--Courant rule and individual excellence as captured by Multi-winner Approval Voting. With both theoretical and experimental methods we classify multi-winner rules in terms of their quantitative alignment with these two opposing objectives.

• #715
Aggregating Incomplete Pairwise Preferences by Weight
Zoi Terzopoulou, Ulle Endriss
Computational Social Choice 2

We develop a model for the aggregation of preferences that do not need to be either complete or transitive. Our focus is on the normative characterisation of aggregation rules under which each agent has a weight that depends only on the size of her ballot, i.e., on the number of pairs of alternatives for which she chooses to report a relative ranking. We show that for rules that satisfy a restricted form of majoritarianism these weights in fact must be constant, while for rules that are invariant under agents with compatible preferences forming pre-election pacts it must be the case that an agent's weight is inversely proportional to the size of her ballot.

### Thursday 1509:30 - 10:30PS|POMDP - POMDPs (R6)

Chair: TBD
• #2577
Approximability of Constant-horizon Constrained POMDP
Majid Khonji, Ashkan Jasour, Brian Williams
POMDPs

Partially Observable Markov Decision Process (POMDP) is a fundamental framework for planning and decision making under uncertainty. POMDP is known to be intractable to solve, or even approximate, when the planning horizon is long (i.e., within a polynomial number of time steps). Constrained POMDP (C-POMDP) allows constraints to be specified on some aspects of the policy in addition to the objective function. When constraints involve uncertainty, the problem is called Chance-Constrained POMDP (CC-POMDP). Our first contribution is a reduction from CC-POMDP to C-POMDP and a novel integer linear programming problem formulation. Thus, any algorithm for the later problem can be utilized to solve any instance of the former. Second, we show that unlike POMDP, when the length of the planning horizon is constant, (C)C-POMDP is NP-Hard. Third, we present the first Fully Polynomial Time Approximation Scheme (FPTAS) that computes (near) optimal deterministic policies for constant-horizon (C)C-POMDP in polynomial time.

• #4610
Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning
Alberto Castellini, Georgios Chalkiadakis, Alessandro Farinelli
POMDPs

Online planning methods for partially observable Markov decision processes (POMDPs) have recently gained much interest. In this paper, we propose the introduction of prior knowledge in the form of (probabilistic) relationships among discrete state-variables, for online planning based on the well-known POMCP algorithm. In particular, we propose the use of hard constraint networks and probabilistic Markov random fields to formalize state-variable constraints and we extend the POMCP algorithm to take advantage of these constraints. Results on a case study based on Rocksample show that the usage of this knowledge provides significant improvements to the performance of the algorithm. The extent of this improvement depends on the amount of knowledge encoded in the constraints and reaches the 50% of the average discounted return in the most favorable cases that we analyzed.

• #5680
Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks
Steven Carr, Nils Jansen, Alexandru Serban, Ralf Wimmer, Bernd Becker, Ufuk Topcu
POMDPs

We study strategy synthesis for partially observable Markov decision processes (POMDPs). The particular problem is to determine strategies that provably adhere to (probabilistic) temporal logic constraints. This problem is computationally intractable and theoretically hard. We propose a novel method that combines techniques from machine learning and formal verification. First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP. Secondly, we restrict the RNN-based strategy to represent a finite-memory strategy and implement it on a specific POMDP. For the resulting finite Markov chain, efficient formal verification techniques provide provable guarantees against temporal logic specifications. If the specification is not satisfied, counterexamples supply diagnostic information. We use this information to improve the strategy by iteratively training the RNN. Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.

• #1193
Regular Decision Processes: A Model for Non-Markovian Domains
Ronen I. Brafman, Giuseppe De Giacomo
POMDPs

We introduce and study Regular Decision Processes (RDPs), a new, compact, factored model for domains with non-Markovian dynamics and rewards. In RDPs, transition and reward functions are specified using formulas in linear dynamic logic over finite traces, a language with the expressive power of regular expressions. This allows specifying complex dependence on the past using intuitive and compact formulas, and provides a model that generalizes MDPs and k-order MDPs. RDPs can also approximate POMDPs without having to postulate the existence of hidden variables, and, in principle, can be learned from observations only.

### Thursday 1509:30 - 10:30MLA|AUL - Applications of Unsupervised Learning (R7)

Chair: TBD
• #913
Pseudo Supervised Matrix Factorization in Discriminative Subspace
Jiaqi Ma, Yipeng Zhang, Lefei Zhang, Bo Du, Dapeng Tao
Applications of Unsupervised Learning

Non-negative Matrix Factorization (NMF) and spectral clustering have been proved to be efficient and effective for data clustering tasks and have been applied to various real-world scenes. However, there are still some drawbacks in traditional methods: (1) most existing algorithms only consider high-dimensional data directly while neglect the intrinsic data structure in the low-dimensional subspace; (2) the pseudo-information got in the optimization process is not relevant to most spectral clustering and manifold regularization methods. In this paper, a novel unsupervised matrix factorization method, Pseudo Supervised Matrix Factorization (PSMF), is proposed for data clustering. The main contributions are threefold: (1) to cluster in the discriminant subspace, Linear Discriminant Analysis (LDA) combines with NMF to become a unified framework; (2) we propose a pseudo supervised manifold regularization term which utilizes the pseudo-information to instruct the regularization term in order to find subspace that discriminates different classes; (3) an efficient optimization algorithm is designed to solve the proposed problem with proved convergence. Extensive experiments on multiple benchmark datasets illustrate that the proposed model outperforms other state-of-the-art clustering algorithms.

• #4390
A Quantum-inspired Classical Algorithm for Separable Non-negative Matrix Factorizations
Zhihuai Chen, Yinan Li, Xiaoming Sun, Pei Yuan, Jialin Zhang
Applications of Unsupervised Learning

Non-negative Matrix Factorization (NMF) asks to decompose a (entry-wise) non-negative matrix into the product of two smaller-sized nonnegative matrices, which has been shown intractable in general. In order to overcome this issue, separability assumption is introduced which assumes all data points are in a conical hull. This assumption makes NMF tractable and widely used in text analysis and image processing, but still impractical for huge-scale datasets. In this paper, inspired by recent development on dequantizing techniques, we propose a new classical algorithm for separable NMF problem. Our new algorithm runs in time polynomial in the rank and logarithmic in the size of input matrices, which achieves an exponential speedup in the low-rank setting.

• #3094
Learning a Generative Model for Fusing Infrared and Visible Images via Conditional Generative Adversarial Network with Dual Discriminators
Han Xu, Pengwei Liang, Wei Yu, Junjun Jiang, Jiayi Ma
Applications of Unsupervised Learning

In this paper, we propose a new end-to-end model, called dual-discriminator conditional generative adversarial network (DDcGAN), for fusing infrared and visible images of different resolutions. Unlike the pixel-level methods and existing deep learning-based methods, the fusion task is accomplished through the adversarial process between a generator and two discriminators, in addition to the specially designed content loss. The generator is trained to generate real-like fused images to fool discriminators. The two discriminators are trained to calculate the JS divergence between the probability distribution of downsampled fused images and infrared images, and the JS divergence between the probability distribution of gradients of fused images and gradients of visible images, respectively. Thus, the fused images can compensate for the features that are not constrained by the single content loss. Consequently, the prominence of thermal targets in the infrared image and the texture details in the visible image can be preserved or even enhanced in the fused image simultaneously. Moreover, by constraining and distinguishing between the downsampled fused image and the low-resolution infrared image, DDcGAN can be preferably applied to the fusion of different resolution images. Qualitative and quantitative experiments on publicly available datasets demonstrate the superiority of our method over the state-of-the-art.

• #909
LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs
Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, Rong Zhou
Applications of Unsupervised Learning

Recording runtime status via logs is common for almost every computer system, and detecting anomalies in logs is crucial for timely identifying malfunctions of systems. However, manually detecting anomalies for logs is time-consuming, error-prone, and infeasible. Existing automatic log anomaly detection approaches, using indexes rather than semantics of log templates, tend to cause false alarms. In this work, we propose LogAnomaly, a framework to model unstructured a log stream as a natural language sequence. Empowered by template2vec, a novel, simple yet effective method to extract the semantic information hidden in log templates, LogAnomaly can detect both sequential and quantitive log anomalies simultaneously, which were not done by any previous work. Moreover, LogAnomaly can avoid the false alarms caused by the newly appearing log templates between periodic model retrainings. Our evaluation on two public production log datasets show that LogAnomaly outperforms existing log-based anomaly detection methods.

### Thursday 1509:30 - 10:30KRR|CCR - Computational Complexity of Reasoning 1 (R8)

Chair: TBD
• #2282
Reasoning about Disclosure in Data Integration in the Presence of Source Constraints
Michael Benedikt, Pierre Bourhis, Louis Jachiet, Michaël Thomazo
Computational Complexity of Reasoning 1

Data integration systems allow users to access data sitting in multiple sources by means of queries over a global schema, related to the sources via mappings. Datasources often contain sensitive information, and thus an analysis is needed to verify that a schema satisfies a privacy policy, given as a set of queries whose answers should not be accessible to users. Such an analysis should take into account not only knowledge that an attacker may have about the mappings, but also what they may know about the semantics of the sources.In this paper, we show that source constraints can have a dramatic impact on disclosure analysis. We study the problem of determining whether a given data integration system discloses a source query to an attacker in the presence of constraints, providing both lower and upper bounds on source-aware disclosure analysis.

• #4211
Enriching Ontology-based Data Access with Provenance
Diego Calvanese, Davide Lanti, Ana Ozaki, Rafael Penaloza, Guohui Xiao
Computational Complexity of Reasoning 1

Ontology-based data access (OBDA) is a popular paradigm for querying heterogeneous data sources by connecting them through mappings to an ontology. In OBDA, it is often difficult to reconstruct why a tuple occurs in the answer of a query. We address this challenge by enriching OBDA with provenance semirings, taking inspiration from database theory. In particular, we investigate the problems of (i) deciding whether a provenance annotated OBDA instance entails a provenance annotated conjunctive query, and (ii) computing a polynomial representing the provenance of a query entailed by a provenance annotated OBDA instance. Differently from pure databases, in our case, these polynomials may be infinite. To regain finiteness, we consider idempotent semirings, and study the complexity in the case of DL-LiteR ontologies. We implement Task (ii) in a state-of-the-art OBDA system and show the practical feasibility of the approach through an extensive evaluation against two popular benchmarks.

• #4223
Mixed-World Reasoning with Existential Rules under Active-Domain Semantics
Meghyn Bienvenu, Pierre Bourhis
Computational Complexity of Reasoning 1

In this paper, we study reasoning with existential rules in a setting where some of the predicates may be closed (i.e., their content is fully specified by the data instance) and the remaining open predicates are interpreted under active-domain semantics. We show, unsurprisingly, that the main reasoning tasks (satisfiability and certainty / possibility of Boolean queries) are all intractable in data complexity in the general case. However, several positive (PTIME data) results are obtained for the linear fragment, and interestingly, these tractability results hold also for various extensions, e.g., with negated closed atoms and disjunctive rule heads. This motivates us to take a closer look at the linear fragment, exploring its expressivity and defining a fixpoint extension to approximate non-linear rules.

• #3536
On Computational Complexity of Pickup-and-Delivery Problems with Precedence Constraints or Time Windows
Xing Tan, Jimmy Huang
Computational Complexity of Reasoning 1

Pickup-and-Delivery (PD) problems consider routing vehicles to achieve a set of tasks related to Pickup'', and to Delivery''. Meanwhile these tasks might subject to Precedence Constraints (PDPC) or Time Windows (PDTW). PD is a variant to Vehicle Routing Problems (VRP), which have been extensively studied for decades. In the recent years, PD demonstrates its closer relevance to AI. With an awareness that few work has been dedicated so far in addressing where the tractability boundary line can be drawn for PD problems, we identify in this paper a set of highly restricted PD problems and prove their NP-completeness. Many problems from a multitude of applications and industry domains are general versions of PDPC. Thus this new result of NP-hardness, of PDPC, not only clarifies the computational complexity of these problems, but also sets up a firm base for the requirement on use of approximation or heuristics, as opposed to looking for exact but intractable algorithms for solving them. We move on to perform an empirical study to locate sources of intractability in PD problems. That is, we propose a local-search formalism and algorithm for solving PDPC problems in particular. Experimental results support strongly effectiveness and efficiency of the local-search. Using the local-search as a solver for randomly generated PDPC problem instances, we obtained interesting and potentially useful insights regarding computational hardness of PDPC and PD.

### Thursday 1509:30 - 10:30NLP|IR - Information Retrieval (R9)

Chair: TBD
• #417
Knowledge Aware Semantic Concept Expansion for Image-Text Matching
Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan
Information Retrieval

Image-text matching is a vital cross-modality task in artificial intelligence and has attracted increasing attention in recent years. Existing works have shown that learning semantic concepts is useful to enhance image representation and can significantly improve the performance of both image-to-text and text-to-image retrieval. However, existing models simply detect semantic concepts from a given image, which are less likely to deal with long-tail and occlusion concepts. Frequently co-occurred concepts in the same scene, e.g. bedroom and bed, can provide common-sense knowledge to discover other semantic-related concepts. In this paper, we develop a Scene Concept Graph (SCG) by aggregating image scene graphs and extracting frequently co-occurred concept pairs as scene common-sense knowledge. Moreover, we propose a novel model to incorporate this knowledge to improve image-text matching. Specifically, semantic concepts are detected from images and then expanded by the SCG. After learning to select relevant contextual concepts, we fuse their representations with the image embedding feature to feed into the matching module. Extensive experiments are conducted on Flickr30K and MSCOCO datasets, and prove that our model achieves state-of-the-art results due to the effectiveness of incorporating the external SCG.

• #1744
Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax
Yinfei Yang, Gustavo Hernandez Abrego, Steve Yuan, Mandy Guo, Daniel Cer, Qinlan Shen, Yun-hsuan Sung, Brian Strope, Ray Kurzweil
Information Retrieval

In this paper, we present an approach to learn multilingual sentence embeddings using a bi-directional dual-encoder with additive margin softmax. The embeddings are able to achieve state-of-the-art results on the United Nations (UN) parallel corpus retrieval task. In all the languages tested, the system achieves P@1 of 86% or higher. We use pairs retrieved by our approach to train NMT models that achieve similar performance to models trained on gold pairs. We explore simple document-level embeddings constructed by averaging our sentence embeddings. On the UN document-level retrieval task, document embeddings achieve around 97% on P@1 for all experimented language pairs. Lastly, we evaluate the proposed model on the BUCC mining task. The learned embeddings with raw cosine similarity scores achieve competitive results compared to current state-of-the-art models, and with a second-stage scorer we achieve a new state-of-the-art level on this task.

• #2165
RLTM: An Efficient Neural IR Framework for Long Documents
Chen Zheng, Yu Sun, Shengxian Wan, Dianhai Yu
Information Retrieval

Deep neural networks have achieved significant improvements in information retrieval (IR). However, most existing models are computational costly and can not efficiently scale to long documents. This paper proposes a novel End-to-End neural ranking framework called Reinforced Long Text Matching (RLTM) which matches a query with long documents efficiently and effectively. The core idea behind the framework can be analogous to the human judgment process which firstly locates the relevance parts quickly from the whole document and then matches these parts with the query carefully to obtain the final label. Firstly, we select relevant sentences from the long documents by a coarse and efficient matching model. Secondly, we generate a relevance score by a more sophisticated matching model based on the sentence selected. The whole model is trained jointly with reinforcement learning in a pairwise manner by maximizing the expected score gaps between positive and negative examples. Experimental results demonstrate that RLTM has greatly improved the efficiency and effectiveness of the states-of-the-art models.

• #6207
Revealing Semantic Structures of Texts: Multi-grained Framework for Automatic Mind-map Generation
Yang Wei, Honglei Guo, Jinmao Wei, Zhong Su
Information Retrieval

A mind-map is a diagram used to represent ideas linked to and arranged around a central concept. It’s easier to visually access the knowledge and ideas by converting a text to a mind-map. However, highlighting the semantic skeleton of an article remains a challenge. The key issue is to detect the relations amongst concepts beyond intra-sentence. In this paper, we propose a multi-grained framework for automatic mind-map generation. That is, a novel neural network is taken to detect the relations at first, which employs multi-hop self-attention and gated recurrence network to reveal the directed semantic relations via sentences. A recursive algorithm is then designed to select the most salient sentences to constitute the hierarchy. The human-like mind-map is automatically constructed with the key phrases in the salient sentences. Promising results have been achieved on the comparison with manual mind-maps. The case studies demonstrate that the generated mind-maps reveal the underlying semantic structures of the articles.

### Thursday 1509:30 - 10:30CV|LV - Language and Vision 2 (R10)

Chair: TBD
• #4960
Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation
Siying Wu, Zheng-Jun Zha, Zilei Wang, Houqiang Li, Feng Wu
Language and Vision 2

Image paragraph generation aims to describe an image with a paragraph in natural language. Compared to image captioning with a single sentence, paragraph generation provides more expressive and fine-grained description for storytelling. Existing approaches mainly optimize paragraph generator towards minimizing word-wise cross entropy loss, which neglects linguistic hierarchy of paragraph and results in sparse" supervision for generator learning. In this paper, we propose a novel Densely Supervised Hierarchical Policy-Value (DHPV) network for effective paragraph generation. We design new hierarchical supervisions consisting of hierarchical rewards and values at both sentence and word levels. The joint exploration of hierarchical rewards and values provides dense supervision cues for learning effective paragraph generator. We propose a new hierarchical policy-value architecture which exploits compositionality at token-to-token and sentence-to-sentence levels simultaneously and can preserve the semantic and syntactic constituent integrity. Extensive experiments on the Stanford image-paragraph benchmark have demonstrated the effectiveness of the proposed DHPV approach with performance improvements over multiple state-of-the-art methods.

• #6309
Generative Visual Dialogue System via Weighted Likelihood Estimation
Heming Zhang, Shalini Ghosh, Larry Heck, Stephen Walsh, Junting Zhang, Jie Zhang, C.-C. Jay Kuo
Language and Vision 2

The key challenge of generative Visual Dialogue (VD) systems is to respond to human queries with informative answers in natural and contiguous conversation flow. Traditional Maximum Likelihood Estimation-based methods only learn from positive responses but ignore the negative responses, and consequently tend to yield safe or generic responses. To address this issue, we propose a novel training scheme in conjunction with weighted likelihood estimation method. Furthermore, an adaptive multi-modal reasoning module is designed, to accommodate various dialogue scenarios automatically and select relevant information accordingly. The experimental results on the VisDial benchmark demonstrate the superiority of our proposed algorithm over other state-of-the-art approaches, with an improvement of 5.81% on recall@10.

• #4479
Swell-and-Shrink: Decomposing Image Captioning by Transformation and Summarization
Hanzhang Wang, Hanli Wang, Kaisheng Xu
Language and Vision 2

Image captioning is currently viewed as a problem analogous to machine translation. However, it always suffers from poor interpretability, coarse or even incorrect descriptions on regional details. Moreover, information abstraction and compression, as essential characteristics of captioning, are always overlooked and seldom discussed. To overcome the shortcomings, a swell-shrink method is proposed to redefine image captioning as a compositional task which consists of two separated modules: modality transformation and text compression. The former is guaranteed to accurately transform adequate visual content into textual form while the latter consists of a hierarchical LSTM which particularly emphasizes on removing the redundancy among multiple phrases and organizing the final abstractive caption. Additionally, the order and quality of region of interest and modality processing are studied to give insights of better understanding the influence of regional visual cues on language forming. Experiments demonstrate the effectiveness of the proposed method.

• #323
Connectionist Temporal Modeling of Video and Language: a Joint Model for Translation and Sign Labeling
Dan Guo, Shengeng Tang, Meng Wang
Language and Vision 2

Online sign interpretation suffers from challenges presented by hybrid semantics learning among sequential variations of visual representations, sign linguistics, and textual grammars. This paper proposes a Connectionist Temporal Modeling (CTM) network for sentence translation and sign labeling. To acquire short-term temporal correlations, a Temporal Convolution Pyramid (TCP) module is performed on 2D CNN features to realize (2D+1D)=pseudo 3D' CNN features. CTM aligns the pseudo 3D' with the original 3D CNN clip features and fuses them. Next, we implement a connectionist decoding scheme for long-term sequential learning. Here, we embed dynamic programming into the decoding scheme, which learns temporal mapping among features, sign labels, and the generated sentence directly. The solution using dynamic programming to sign labeling is considered as pseudo labels. Finally, we utilize the pseudo supervision cues in an end-to-end framework. A joint objective function is designed to measure feature correlation, entropy regularization on sign labeling, and probability maximization on sentence decoding. The experimental results using the RWTH-PHOENIX-Weather and USTC-CSL datasets demonstrate the effectiveness of the proposed approach.

### Thursday 1509:30 - 10:30ML|SSL - Semi-Supervised Learning 2 (R11)

Chair: TBD
• #2799
Sequential and Diverse Recommendation with Long Tail
Yejin Kim, Kwangseob Kim, Chanyoung Park, Hwanjo Yu
Semi-Supervised Learning 2

Sequential recommendation is a task that learns a temporal dynamic of a user behavior in sequential data and predicts items that a user would like afterward. However, diversity has been rarely emphasized in the context of sequential recommendation. Sequential and diverse recommendation must learn temporal preference on diverse items as well as on general items. Thus, we propose a sequential and diverse recommendation model that predicts a ranked list containing general items and also diverse items without compromising significant accuracy.To learn temporal preference on diverse items as well as on general items, we cluster and relocate consumed long tail items to make a pseudo ground truth for diverse items and learn the preference on long tail using recurrent neural network, which enables us to directly learn a ranking function. Extensive online and offline experiments deployed on a commercial platform demonstrate that our models significantly increase diversity while preserving accuracy compared to the state-of-the-art sequential recommendation model, and consequently our models improve user satisfaction.

• #3603
Belief Propagation Network for Hard Inductive Semi-Supervised Learning
Jaemin Yoo, Hyunsik Jeon, U Kang
Semi-Supervised Learning 2

Given graph-structured data, how can we train a robust classifier in a semi-supervised setting that performs well without neighborhood information? In this work, we propose belief propagation networks (BPN), a novel approach to train a deep neural network in a hard inductive setting, where the test data are given without neighborhood information. BPN uses a differentiable classifier to compute the prior distributions of nodes, and then diffuses the priors through the graphical structure, independently from the prior computation. This separable structure improves the generalization performance of BPN for isolated test instances, compared with previous approaches that jointly use the feature and neighborhood without distinction. As a result, BPN outperforms state-of-the-art methods in four datasets with an average margin of 2.4% points in accuracy.

• #3866
Deep Correlated Predictive Subspace Learning for Incomplete Multi-View Semi-Supervised Classification
Zhe Xue, Junping Du, Dawei Du, Wenqi Ren, Siwei Lyu
Semi-Supervised Learning 2

Incomplete view information often results in failure cases of the conventional multi-view methods. To address this problem, we propose a Deep Correlated Predictive Subspace Learning (DCPSL) method for incomplete multi-view semi-supervised classification. Specifically, we integrate semi-supervised deep matrix factorization, correlated subspace learning, and multi-view label prediction into a unified framework to jointly learn the deep correlated predictive subspace and multi-view shared and private label predictors. DCPSL is able to learn proper subspace representation that is suitable for class label prediction, which can further improve the performance of classification. Extensive experimental results on various practical datasets demonstrate that the proposed method performs favorably against the state-of-the-art methods.

• #2203
Approximate Manifold Regularization: Scalable Algorithm and Generalization Analysis
Jian Li, Yong Liu, Rong Yin, Weiping Wang
Semi-Supervised Learning 2

Graph-based semi-supervised learning is one of the most popular and successful semi-supervised learning approaches. Unfortunately, it suffers from high time and space complexity, at least quadratic with the number of training samples. In this paper, we propose an efficient graph-based semi-supervised algorithm with a sound theoretical guarantee. The proposed method combines Nystrom subsampling and preconditioned conjugate gradient descent, substantially improving computational efficiency and reducing memory requirements. Extensive empirical results reveal that our method achieves the state-of-the-art performance in a short time even with limited computing resources.

### Thursday 1509:30 - 10:30ML|DM - Data Mining 8 (R12)

Chair: TBD
• #1024
Finding Statistically Significant Interactions between Continuous Features
Mahito Sugiyama, Karsten Borgwardt
Data Mining 8

The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regarding this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm which overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value for each interaction, which enables us to massively prune interactions that can never reach significance and to thereby gain more statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets.

• #1850
MEGAN: A Generative Adversarial Network for Multi-View Network Embedding
Yiwei Sun, Suhang Wang, Tsung-Yu Hsieh, Xianfeng Tang, Vasant Honavar
Data Mining 8

Data from many real-world applications can be naturally represented by multi-view networks where the different views encode different types of relationships (e.g., friendship, shared interests in music, etc.) between real-world individuals or entities. There is an urgent need for methods to obtain low-dimensional, information preserving and typically nonlinear embeddings of such multi-view networks. However, most of the work on multi-view learning focuses on data that lack a network structure, and most of the work on network embeddings has focused primarily on single-view networks. Against this background, we consider the multi-view network representation learning problem, i.e., the problem of constructing low-dimensional information preserving embeddings of multi-view networks. Specifically, we investigate a novel Generative Adversarial Network (GAN) framework for Multi-View Network Embedding, namely MEGAN, aimed at preserving the information from the individual network views, while accounting for connectivity across (and hence complementarity of and correlations between) different views. The results of our experiments on two real-world multi-view data sets show that the embeddings obtained using MEGAN outperform the state-of-the-art methods on node classification, link prediction and visualization tasks.

• #2893
Privacy-aware Synthesizing for Crowdsourced Data
Mengdi Huai, Di Wang, Chenglin Miao, Jinhui Xu, Aidong Zhang
Data Mining 8

Although releasing crowdsourced data brings many benefits to the data analyzers to conduct statistical analysis, it may violate crowd users' data privacy. A potential way to address this problem is to employ traditional differential privacy (DP) mechanisms and perturb the data with some noise before releasing them. However, considering that there usually exist conflicts among the crowdsourced data and these data are usually large in volume, directly using these mechanisms can not guarantee good utility in the setting of releasing crowdsourced data. To address this challenge, in this paper, we propose a novel privacy-aware synthesizing method (i.e., PrisCrowd) for crowdsourced data, based on which the data collector can release users' data with strong privacy protection for their private information, while at the same time, the data analyzer can achieve good utility from the released data. Both theoretical analysis and extensive experiments on real-world datasets demonstrate the desired performance of the proposed method.

• #3237
Fast Algorithm for K-Truss Discovery on Public-Private Graphs
Soroush Ebadian, Xin Huang
Data Mining 8

In public-private graphs, users share one public graph and have their own private graphs. A private graph consists of personal private contacts that only can be visible to its owner, e.g., hidden friend lists on Facebook and secret following on Sina Weibo. However, existing public-private analytic algorithms have not yet investigated the dense subgraph discovery of k-truss, where each edge is contained in at least k-2 triangles. This paper aims at finding k-truss efficiently in public-private graphs. The core of our solution is a novel algorithm to update k-truss with node insertions. We develop a classification-based hybrid strategy of node insertions and edge insertions to incrementally compute k-truss in public-private graphs. Extensive experiments validate the superiority of our proposed algorithms against state-of-the-art methods on real-world datasets.

### Thursday 1509:30 - 10:30KRR|GSTR - Geometric, Spatial, and Temporal Reasoning 1 (R13)

Chair: TBD
• #1122
Graph WaveNet for Deep Spatial-Temporal Graph Modeling
Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Chengqi Zhang
Geometric, Spatial, and Temporal Reasoning 1

Spatial-temporal graph modeling is an important task to analyze the spatial relations and temporal trends of components in a system. Existing approaches mostly capture the spatial dependency on a fixed graph structure, assuming that the underlying relation between entities is pre-determined. However, the explicit graph structure (relation) does not necessarily reflect the true dependency and genuine relation may be missing due to the incomplete connections in the data. Furthermore, existing methods are ineffective to capture the temporal trends as the RNNs or CNNs employed in these methods cannot capture long-range temporal sequences. To overcome these limitations, we propose in this paper a novel graph neural network architecture, {Graph WaveNet}, for spatial-temporal graph modeling. By developing a novel adaptive dependency matrix and learn it through node embedding, our model can precisely capture the hidden spatial dependency in the data. With a stacked dilated 1D convolution component whose receptive field grows exponentially as the number of layers increases, Graph WaveNet is able to handle very long sequences. These two components are integrated seamlessly in a unified framework and the whole framework is learned in an end-to-end manner. Experimental results on two public traffic network datasets, METR-LA and PEMS-BAY, demonstrate the superior performance of our algorithm.

• #1619
Travel Time Estimation without Road Networks: An Urban Morphological Layout Representation Approach
Wuwei Lan, Yanyan Xu, Bin Zhao
Geometric, Spatial, and Temporal Reasoning 1

Travel time estimation is a crucial task for not only personal travel scheduling but also city planning. Previous methods focus on modeling toward road segments or sub-paths, then summing up for a final prediction, which have been recently replaced by deep neural models with end-to-end training. Usually, these methods are based on explicit feature representations, including spatio-temporal features, traffic states, etc. Here, we argue that the local traffic condition is closely tied up with the land-use and built environment, i.e., metro stations, arterial roads, intersections, commercial area, residential area, and etc, yet the relation is time-varying and too complicated to model explicitly and efficiently. Thus, this paper proposes an end-to-end multi-task deep neural model, named Deep Image to Time (DeepI2T), to learn the travel time mainly from the built environment images, a.k.a. the morphological layout images, and showoff the new state-of-the-art performance on real-world datasets in two cities. Moreover, our model is designed to tackle both path-aware and path-blind scenarios in the testing phase. This work opens up new opportunities of using the publicly available morphological layout images as considerable information in multiple geography-related smart city applications.

• #2786
Cross-City Transfer Learning for Deep Spatio-Temporal Prediction
Leye Wang, Xu Geng, Xiaojuan Ma, Feng Liu, Qiang Yang
Geometric, Spatial, and Temporal Reasoning 1

Spatio-temporal prediction is a key type of tasks in urban computing, e.g., traffic flow and air quality. Adequate data is usually a prerequisite, especially when deep learning is adopted. However, the development levels of different cities are unbalanced, and still many cities suffer from data scarcity. To address the problem, we propose a novel cross-city transfer learning method for deep spatio-temporal prediction tasks, called RegionTrans. RegionTrans aims to effectively transfer knowledge from a data-rich source city to a data-scarce target city. More specifically, we first learn an inter-city region matching function to match each target city region to a similar source city region. A neural network is designed to effectively extract region-level representation for spatio-temporal prediction. Finally, an optimization algorithm is proposed to transfer learned features from the source city to the target city with the region matching function. Using citywide crowd flow prediction as a demonstration experiment, we verify the effectiveness of RegionTrans. Results show that RegionTrans can outperform the state-of-the-art fine-tuning deep spatio-temporal prediction models by reducing up to 10.7% prediction error.

• #5150
Data Complexity and Rewritability of Ontology-Mediated Queries in Metric Temporal Logic under the Event-Based Semantics
Vladislav Ryzhikov, Przemyslaw Andrzej Walega, Michael Zakharyaschev
Geometric, Spatial, and Temporal Reasoning 1

We investigate the data complexity of answering queries mediated by metric temporal logic ontologies under the event-based semantics assuming that data instances are finite timed words timestamped with binary fractions. We identify classes of ontology-mediated queries answering which can be done in AC0, NC1, L, NL, P, and coNP for data complexity, provide their rewritings to first-order logic and its extensions with primitive recursion, transitive closure or datalog, and establish lower complexity bounds.

### Thursday 1509:30 - 10:30HAI|HCI - Human-Computer Interaction (R14)

Chair: TBD
• #1153
Decoding EEG by Visual-guided Deep Neural Networks
Zhicheng Jiao, Haoxuan You, Fan Yang, Xin Li, Han Zhang, Dinggang Shen
Human-Computer Interaction

Decoding visual stimuli from brain activities is an interdisciplinary study of neuroscience and computer vision. With the emerging of Human-AI Collaboration, Human-Computer Interaction, and the development of advanced machine learning models, brain decoding based on deep learning attracts more attention. Electroencephalogram (EEG) is a widely used neurophysiology tool. Inspired by the success of deep learning on image representation and neural decoding, we proposed a visual-guided EEG decoding method that contains a decoding stage and a generation stage. In the classiﬁcation stage, we designed a visual-guided convolutional neural network (CNN) to obtain more discriminative representations from EEG, which are applied to achieve the classiﬁcation results. In the generation stage, the visual-guided EEG features are input to our improved deep generative model with a visual consistence module to generate corresponding visual stimuli. With the help of our visual-guided strategies, the proposed method outperforms traditional machine learning methods and deep learning models in the EEG decoding task.

• #2824
DeepFlow: Detecting Optimal User Experience From Physiological Data Using Deep Neural Networks
Marco Maier, Daniel Elsner, Chadly Marouane, Meike Zehnle, Christoph Fuchs
Human-Computer Interaction

Flow is an affective state of optimal experience, total immersion and high productivity. While often associated with (professional) sports, it is a valuable information in several scenarios ranging from work environments to user experience evaluations, and we expect it to be a potential reward signal for human-in-the-loop reinforcement learning systems. Traditionally, flow has been assessed through questionnaires which prevents its use in online, real-time environments. In this work, we present our findings towards estimating a user's flow state based on physiological signals measured using wearable devices. We conducted a study with participants playing the game Tetris in varying difficulty levels, leading to boredom, stress, and flow. Using an end-to-end deep learning architecture, we achieve an accuracy of 67.50% in recognizing high flow vs. low flow states and 49.23% in distinguishing all three affective states boredom, flow, and stress.

• #3446
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study
Andrew Anderson, Jonathan Dodge, Amrita Sadarangani, Zoe Juozapaitis, Evan Newman, Jed Irvine, Souti Chattopadhyay, Alan Fern, Margaret Burnett
Human-Computer Interaction

We present a user study to investigate the impact of explanations on non-experts? understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124 participant, four-treatment experiment to compare participants? mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars were needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.

• #3526
Multi-agent Attentional Activity Recognition
Kaixuan Chen, Lina Yao, Dalin Zhang, Bin Guo, Zhiwen Yu
Human-Computer Interaction

Multi-modality is an important feature of sensor based activity recognition. In this work, we consider two inherent characteristics of human activities, the spatially-temporally varying salience of features and the relations between activities and corresponding body part motions. Based on these, we propose a multi-agent spatial-temporal attention model. The spatial-temporal attention mechanism helps intelligently select informative modalities and their active periods. And the multiple agents in the proposed model represent activities with collective motions across body parts by independently selecting modalities associated with single motions. With a joint recognition goal, the agents share gained information and coordinate their selection policies to learn the optimal recognition model. The experimental results on four real-world datasets demonstrate that the proposed model outperforms the state-of-the-art methods.

### Thursday 1509:30 - 10:30CSAT|CA - Constraint Satisfaction 2 (R15)

Chair: TBD
• #10978
(Sister Conferences Best Papers Track) Constraint Games for stable and optimal allocation of demands in SDN
Anthony Palmieri, Arnaud Lallouet, Luc Pons
Constraint Satisfaction 2

Software Defined Networking (or SDN) allows to apply a centralized control over a network of computers in order to provide better global throughput. One of the problem to solve is the multi-commodity flow routing where a set of demands (or commodities) have to be routed at minimum cost. In contrast with other versions of this problem, we consider here problems with congestion that change the cost of a link according to the capacity used. We propose here to study centralized routing with Constraint Programming and selfish routing with Constraint Games. Selfish routing reaches a Nash equilibrium and is important for the perceived quality of the solution since no user is able to improve his cost by changing only his own path. We present real and synthetic benchmarks with hundreds or thousands players and we show that for this problem the worst selfish routing is often close to the optimal centralized solution.

• #10974
(Sister Conferences Best Papers Track) Not All FPRASs are Equal: Demystifying FPRASs for DNF-Counting
Kuldeep S. Meel, Aditya A. Shrotri, Moshe Y. Vardi
Constraint Satisfaction 2

The problem of counting the number of solutions of a DNF formula, also called #DNF, is a fundamental problem in AI with wide-ranging applications. Owing to the intractability of the exact variant, efforts have focused on the design of approximate techniques. Consequently, several  Fully Polynomial Randomized Approximation Schemes (FPRASs) based on Monte Carlo techniques have been proposed. Recently, it was discovered that hashing-based techniques too lend themselves to FPRASs for #DNF. Despite significant improvements, the complexity of the hashing-based FPRAS is still worse than that of the best Monte Carlo FPRAS by polylog factors. Two questions were left unanswered in previous works: Can the complexity of the hashing-based techniques be improved? How do these approaches compare empirically?  In this paper, we first propose a new search procedure for the hashing-based FPRAS that removes the polylog factors from its time complexity. We then present the first empirical study of runtime behavior of different FPRASs for #DNF, which produces a nuanced picture. We observe that there is no single best algorithm for all formulas and that the algorithm with one of the worst time complexities solves the largest number of benchmarks.

• #4522
Entropy-Penalized Semidefinite Programming
Mikhail Krechetov, Jakub Marecek, Yury Maximov, Martin Takac
Constraint Satisfaction 2

Low-rank methods for semi-definite programming (SDP) have gained a lot of interest recently, especially in machine learning applications. Their analysis often involves determinant-based or Schatten- norm penalties, which are difficult to implement in practice due to high computational efforts. In this paper, we propose Entropy-Penalized Semi-Definite Programming (EP-SDP), which provides a unified framework for a broad class of penalty functions used in practice to promote a low-rank solution. We show that EP-SDP problems admit an efficient numerical algorithm, having (almost) linear time complexity of the gradient computation; this makes it useful for many machine learning and optimization problems. We illustrate the practical efficiency of our approach on several combinatorial optimization and machine learning problems.

• #6520
Acquiring Integer Programs from Data
Mohit Kumar, Stefano Teso, Luc De Raedt
Constraint Satisfaction 2

Integer programming (IP) is widely used within operations research to model and solve complex combinatorial problems such as personnel rostering and assignment problems. Modelling such problems is difficult for non-experts and expensive when hiring domain experts to perform the modelling. For many tasks, however, examples of working solutions are readily available. We propose ARNOLD, an approach that partially automates the modelling step by learning an integer program from example solutions. Contrary to existing alternatives, ARNOLD natively handles multi-dimensional quantities and non-linear operations, which are at the core of IP problems, and it only requires examples of feasible solution. The main challenge is to efficiently explore the space of possible programs. Our approach pairs a general-to-specific traversal strategy with a nested lexicographic ordering in order to prune large portions of the space of candidate constraints while avoiding visiting the same candidate multiple times. Our empirical evaluation shows that ARNOLD can acquire models for a number of realistic benchmark problems

### Thursday 1509:30 - 10:30Early Career 3 - Early Career Spotlight 3 (R16)

Chair: TBD
• #11068
End-User Programming of General Purpose Robots
Maya Cakmak
Early Career Spotlight 3

Robots that can assist humans in everyday tasks have the potential to improve people’s quality of life and bring independence to persons with disabilities. A key challenge in realizing such robots is programming them to meet the unique and changing needs of users and to robustly function in their unique environments. Most research in robotics targets this challenge by attempting to develop universal or adaptive robotic capabilities. This approach has had limited success because it is extremely difficult to anticipate all possible scenarios and use-cases for general-purpose robots or collect massive amounts of data that represent each scenario and use-case. Instead, we aim to develop robots that can be programmed in-context and by end-users after they are deployed, tailoring it for the specific environment and user preferences. To that end, we have been developing new techniques and tools that allow intuitive and rapid programming of robots to do useful tasks. Here, we describe some of these techniques and tools and review observations from a number of user studies with potential users and real world deployments. We also discuss opportunities for automating parts of the programming process to reduce the burden on end-user programmers and improve the quality of robot programs they develop.

• #11069
EdgeML: Machine Learning in 2KB of RAM
Prateek Jain
Early Career Spotlight 3

Several critical applications require ML inference on resource-constrained devices, especially in the domain of Internet of Things like smartcity, smarthouse etc. Furthermore, many of these problems reduce to time-series classification. Unfortunately, existing techniques for time-series classification like recurrent neural networks are very difficult to deploy on the tiny devices due to computation and memory bottleneck. In this talk, we will discuss two new methods FastGRNN and EMI-RNN that can enable time-series inference on devices as small as Arduino Uno that have 2KB of RAM. Our methods can provide as much as 70x speed-up and compression over state-of-the-art methods like LSTM, GRU, while also providing strong theoretical guarantees. This talk is based on joint works with Manik Varma, Harsha Simhadri, Kush Bhatia, Don Dennis, Ashish Kumar, Aditya Kusupati, Manish Singh, and Shishir Patil.

### Thursday 1509:30 - 18:00DB3 - Demo Booths 3 (Z1)

Chair: TBA
• #11047
VEST: A System for Vulnerability Exploit Scoring & Timing
Haipeng Chen, Jing Liu, Rui Liu, Noseong Park, V. S. Subrahmanian
Demo Booths 3

Knowing if/when a cyber-vulnerability will be exploited and how severe the vulnerability is can help enterprise security officers (ESOs) come up with appropriate patching schedules. Today, this ability is severely compromised: our study of data from Mitre and NIST shows that on average there is a 132 day gap between the announcement of a vulnerability by Mitre and the time NIST provides an analysis with severity score estimates and 8 important severity attributes. Many attacks happen during this very 132-day window. We present Vulnerability Exploit Scoring \& Timing (VEST), a system for (early) prediction and visualization of if/when a vulnerability will be exploited, and its estimated severity attributes and score.

• #11020
Crowd View: Converting Investors' Opinions into Indicators
Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen
Demo Booths 3

This paper demonstrates an opinion indicator (OI) generation system, named Crowd View, with which traders can refer to the fine-grained opinions, beyond the market sentiment (bullish/bearish), from crowd investors when trading financial instruments. We collect the real-time textual information from Twitter, and convert it into five kinds of OIs, including the support price level, resistance price level, price target, buy-side cost, and sell-side cost. The OIs for all component stocks in Dow Jones Industrial Average Index (DJI) are provided, and shown with the real-time stock price for comparison and analysis. The information embedding in the OIs and the application scenarios are introduced.

• #11026
An Online Intelligent Visual Interaction System
Anxiang Zeng, Han Yu, Xin Gao, Kairi Ou, Zhenchuan Huang, Peng Hou, Mingli Song, Jingshu Zhang, Chunyan Miao
Demo Booths 3

This paper proposes an Online Intelligent Visual Interactive System (OIVIS), which can be applied to various live video broadcast and short video scenes to provide an interactive user experience. In the live video broadcast, the anchor can issue various commands by using pre-defined gestures, and can trigger real-time background replacement to create an immersive atmosphere. To support such dynamic interactivity, we implemented algorithms including real-time gesture recognition and real-time video portrait segmentation, developed a deep network inference framework, and a real-time rendering framework AI Gender at the front end to create a complete set of visual interaction solutions for use in resource constrained mobile.

• #11027
The Open Vault Challenge - Learning how to build calibration-free interactive systems by cracking the code of a vault.
Jonathan Grizou
Demo Booths 3

This demo takes the form of a challenge to the IJCAI community. A physical vault, secured by a 4-digit code, will be placed in the demo area. The author will publicly open the vault by entering the code on a touch-based interface, and as many times as requested. The challenge to the IJCAI participants will be to crack the code, open the vault, and collect its content. The interface is based on previous work on calibration-free interactive systems that enables a user to start instructing a machine without the machine knowing how to interpret the user’s actions beforehand. The intent and the behavior of the human are simultaneously learned by the machine. An online demo and videos are available for readers to participate in the challenge. An additional interface using vocal commands will be revealed on the demo day, demonstrating the scalability of our approach to continuous input signals.

• #11030
ERICA and WikiTalk
Divesh Lala, Graham Wilcock, Kristiina Jokinen, Tatsuya Kawahara
Demo Booths 3

The demo shows ERICA, a highly realistic female android robot, and WikiTalk, an application that helps robots to talk about thousands of topics using information from Wikipedia. The combination of ERICA and WikiTalk results in more natural and engaging human-robot conversations.

• #11031
ACTA: A Tool for Argumentative Clinical Trial Analysis
Tobias Mayer, Elena Cabrio, Serena Villata
Demo Booths 3

Argumentative analysis of textual documents of various nature (e.g., persuasive essays, online discussion blogs, scientific articles) allows to detect the main argumentative components (i.e., premises and claims) present in the text and to predict whether these components are connected to each other by argumentative relations (e.g., support and attack), leading to the identification of (possibly complex) argumentative structures. Given the importance of argument-based decision making in medicine, in this demo paper we introduce ACTA, a tool for automating the argumentative analysis of clinical trials. The tool is designed to support doctors and clinicians in identifying the document(s) of interest about a certain disease, and in analyzing the main argumentative content and PICO elements.

• #11044
DISPUTool -- A tool for the Argumentative Analysis of Political Debates
Shohreh Haddadan, Elena Cabrio, Serena Villata
Demo Booths 3

Political debates are the means used by political candidates to put forward and justify their positions in front of the electors with respect to the issues at stake. Argument mining is a novel research area in Artificial Intelligence, aiming at analyzing discourse on the pragmatics level and applying a certain argumentation theory to model and automatically analyze textual data. In this paper, we present DISPUTool, a tool designed to ease the work of historians and social science scholars in analyzing the argumentative content of political speeches. More precisely, DISPUTool allows to explore and automatically identify argumentative components over the 39 political debates from the last 50 years of US presidential campaigns (1960-2016).

• #11048
Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination
Ruixue Liu, Baoyang Chen, Meng Chen, Youzheng Wu, Zhijie Qiu, Xiaodong He
Demo Booths 3

We present a novel real-time, collaborative, and interactive AI painting system, Mappa Mundi, for artistic Mind Map creation. The system consists  of a voice-based input interface, an automatic topic expansion module, and an image projection module. The key innovation is to inject Artificial Imagination into painting creation by considering lexical and phonological similarities of language, learning and inheriting artist’s original painting style, and applying the principles of Dadaism and impossibility of improvisation. Our system indicates that AI and artist can collaborate seamlessly to create imaginative artistic painting and Mappa Mundi has been applied in art exhibition in UCCA, Beijing.

• #11053
GraspSnooker: Automatic Chinese Commentary Generation for Snooker Videos
Zhaoyue Sun, Jiaze Chen, Hao Zhou, Deyu Zhou, Lei Li, Mingmin Jiang
Demo Booths 3

We demonstrate a web-based software system, GraspSnooker, which is able to automatically generate Chinese text commentaries for snooker game videos. It consists of a video analyzer, a strategy predictor and a commentary generator. As far as we know, it is the first attempt on snooker commentary generation, which might be helpful for snooker learners to understand the game.

• #11042
SAGE: A Hybrid Geopolitical Event Forecasting System
Fred Morstatter, Aram Galstyan, Gleb Satyukov, Daniel Benjamin, Andres Abeliuk, Mehrnoosh Mirtaheri, KSM Tozammel Hossain, Pedro Szekely, Emilio Ferrara, Akira Matsui, Mark Steyvers, Stephen Bennet, David Budescu, Mark Himmelstein, Michael Ward, Andreas Beger, Michele Catasta, Rok Sosic, Jure Leskovec, Pavel Atanasov, Regina Joseph, Rajiv Sethi, Ali Abbas
Demo Booths 3

Forecasting of geopolitical events is a notoriously difficult task, with experts failing to significantly outperform a random baseline across many types of forecasting events. One successful way to increase the performance of forecasting tasks is to turn to crowdsourcing: leveraging many forecasts from non-expert users. Simultaneously, advances in machine learning have led to models that can produce reasonable, although not perfect, forecasts for many tasks. Recent efforts have shown that forecasts can be further improved by hybridizing'' human forecasters: pairing them with the machine models in an effort to combine the unique advantages of both. In this demonstration, we present Synergistic Anticipation of Geopolitical Events (SAGE), a platform for human/computer interaction that facilitates human reasoning with machine models.

### Thursday 1511:00 - 12:30Panel: Highly refereed AI conferences (R1)

Chair: TBD
• Highly refereed AI conferences
Carles Sierra
Panel: Highly refereed AI conferences
• ### Thursday 1511:00 - 12:30AI-HWB - ST: AI for Improving Human Well-Being 6 (R2)

Chair: TBD
• #3353
Three-quarter Sibling Regression for Denoising Observational Data
Shiv Shankar, Daniel Sheldon, Tao Sun, John Pickering, Thomas G. Dietterich
ST: AI for Improving Human Well-Being 6

Many ecological studies and conservation policies are based on field observations of species, which can be affected by systematic variability introduced by the observation process. A recently introduced causal modeling technique called 'half-sibling regression' can detect and correct for systematic errors in measurements of multiple independent random variables. However, it will remove intrinsic variability if the variables are dependent, and therefore does not apply to many situations, including modeling of species counts that are controlled by common causes. We present a technique called 'three-quarter sibling regression' to partially overcome this limitation. It can filter the effect of systematic noise when the latent variables have observed common causes. We provide theoretical justification of this approach, demonstrate its effectiveness on synthetic data, and show that it reduces systematic detection variability due to moon brightness in moth surveys.

• #2908
CounterFactual Regression with Importance Sampling Weights
Negar Hassanpour, Russell Greiner
ST: AI for Improving Human Well-Being 6

Perhaps the most pressing concern of a patient diagnosed with cancer is her life expectancy under various treatment options. For a binary-treatment case, this translates into estimating the difference between the outcomes (e.g., survival time) of the two available treatment options -- i.e., her Individual Treatment Effect (ITE). This is especially challenging to estimate from observational data, as that data has selection bias: the treatment assigned to a patient depends on that patient's attributes. In this work, we borrow ideas from domain adaptation to address the distributional shift between the source (outcome of the administered treatment, appearing in the observation training data) and target (outcome of the alternative treatment) that exists due to selection bias. We propose a context-aware importance sampling re-weighing scheme, built on top of a representation learning module, for estimating ITEs. Empirical results on two publicly available benchmarks demonstrate that the proposed method significantly outperforms state-of-the-art.

• #223
Risk Assessment for Networked-guarantee Loans Using High-order Graph Attention Representation
Dawei Cheng, Yi Tu, Zhenwei Ma, Zhibin Niu, Liqing Zhang
ST: AI for Improving Human Well-Being 6

Assessing and predicting the default risk of networked-guarantee loans is critical for the commercial banks and financial regulatory authorities. The guarantee relationships between the loan companies are usually modeled as directed networks. Learning the informative low-dimensional representation of the networks is important for the default risk prediction of loan companies, even for the assessment of systematic financial risk level. In this paper, we propose a high-order graph attention representation method (HGAR) to learn the embedding of guarantee networks. Because this financial network is different from other complex networks, such as social, language, or citation networks, we set the binary roles of vertices and define high-order adjacent measures based on financial domain characteristics. We design objective functions in addition to a graph attention layer to capture the importance of nodes. We implement a productive learning strategy and prove that the complexity is near-linear with the number of edges, which could scale to large datasets. Extensive experiments demonstrate the superiority of our model over state-of-the-art method. We also evaluate the model in a real-world loan risk control system, and the results validate the effectiveness of our proposed approaches.

• #295
Scribble-to-Painting Transformation with Multi-Task Generative Adversarial Networks
Jinning Li, Yexiang Xue
ST: AI for Improving Human Well-Being 6

We propose the Dual Scribble-to-Painting Network (DSP-Net), which is able to produce artistic paintings based on user-generated scribbles. In scribble-to-painting transformation, a neural net has to infer additional details of the image, given relatively sparse information contained in the outlines of the scribble. Therefore, it is more challenging than classical image style transfer, in which the information content is reduced from photos to paintings. Inspired by the human cognitive process, we propose a multi-task generative adversarial network, which consists of two jointly trained neural nets -- one for generating artistic images and the other one for semantic segmentation. We demonstrate that joint training on these two tasks brings in additional benefit. Experimental result shows that DSP-Net outperforms state-of-the-art models both visually and quantitatively. In addition, we publish a large dataset for scribble-to-painting transformation.

• #2506
MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography Signals
Shenda Hong, Cao Xiao, Tengfei Ma, Hongyan Li, Jimeng Sun
ST: AI for Improving Human Well-Being 6

Electrocardiography (ECG) signals are commonly used to diagnose various cardiac abnormalities. Recently, deep learning models showed initial success on modeling ECG data, however they are mostly black-box, thus lack interpretability needed for clinical usage. In this work, we propose MultIlevel kNowledge-guided Attention networks (MINA) that predict heart diseases from ECG signals with intuitive explanation aligned with medical knowledge. By extracting multilevel (beat-, rhythm- and frequency-level) domain knowledge features separately, MINA combines the medical knowledge and ECG data via a multilevel attention model, making the learned models highly interpretable. Our experiments showed MINA achieved PR-AUC 0.9436 (outperforming the best baseline by 5.51%) in real world ECG dataset. Finally, MINA also demonstrated robust performance and strong interpretability against signal distortion and noise contamination.

• #3057
Learning Interpretable Relational Structures of Hinge-loss Markov Random Fields
Yue Zhang, Arti Ramesh
ST: AI for Improving Human Well-Being 6

p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 24.0px Helvetica; color: #000000} Statistical relational models such as Markov logic networks (MLNs) and hinge-loss Markov random fields (HL-MRFs) are specified using templated weighted first-order logic clauses, leading to the creation of complex, yet easy-to-encode models that effectively combine uncertainty and logic. Learning the structure of these models from data reduces the human effort of identifying the right structures. In this work, we present an asynchronous deep reinforcement learning algorithm to automatically learn HL-MRF clause structures. Our algorithm possesses the ability to learn semantically meaningful structures that appeal to human intuition and understanding, while simultaneously being able to learn structures from data, thus learning structures that have both the desirable qualities of interpretability and good prediction performance. The asynchronous nature of our algorithm further provides the ability to learn diverse structures via exploration, while remaining scalable. We demonstrate the ability of the models to learn semantically meaningful structures that also achieve better prediction performance when compared with a greedy search algorithm, a path-based algorithm with L-1 regularization, and manually defined rules on two computational social science applications: i) modeling recovery in alcohol use disorder, and ii) detecting bullying.

### Thursday 1511:00 - 12:30ML|DL - Deep Learning 6 (R3)

Chair: TBD
• #665
Nostalgic Adam: Weighting More of the Past Gradients When Designing the Adaptive Learning Rate
Haiwen Huang, Chang Wang, Bin Dong
Deep Learning 6

First-order optimization algorithms have been proven prominent in deep learning. In particu- lar, algorithms such as RMSProp and Adam are extremely popular. However, recent works have pointed out the lack of “long-term memory” in Adam-like algorithms, which could hamper their performance and lead to divergence. In our study, we observe that there are benefits of weighting more of the past gradients when designing the adaptive learning rate. We therefore propose an algorithm called the Nostalgic Adam (NosAdam) with theoretically guaranteed convergence at the best known convergence rate. NosAdam can be regarded as a fix to the non-convergence issue of Adam in alternative to the recent work of [Reddi et al., 2018]. Our preliminary numerical experiments show that NosAdam is a promising alternative al- gorithm to Adam. The proofs, code and other supplementary materials are already released.

• #1899
Coarse-to-Fine Image Inpainting via Region-wise Convolutions and Non-Local Correlation
Yuqing Ma, Xianglong Liu, Shihao Bai, Lei Wang, Dailan He, Aishan Liu
Deep Learning 6

Recently deep neural networks have achieved promising performance for filling large missing regions in image inpainting tasks. They usually adopted the standard convolutional architecture over the corrupted image, where the same convolution filters try to restore the diverse information on both existing and missing regions, and meanwhile ignores the long-distance correlation among the regions. Only relying on the surrounding areas inevitably leads to meaningless contents and artifacts, such as color discrepancy and blur. To address these problems, we first propose region-wise convolutions to locally deal with the different types of regions, which can help exactly reconstruct existing regions and roughly infer the missing ones from existing regions at the same time. Then, a non-local operation is introduced to globally model the correlation among different regions, promising visual consistency between missing and existing regions. Finally, we integrate the region-wise convolutions and non-local correlation in a coarse-to-fine framework to restore semantically reasonable and visually realistic images. Extensive experiments on three widely-used datasets for image inpainting tasks have been conducted, and both qualitative and quantitative experimental results demonstrate that the proposed model significantly outperforms the state-of-the-art approaches, especially for the large irregular missing regions.

• #2830
Localizing Unseen Activities in Video via Image Query
Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai
Deep Learning 6

Action localization in untrimmed videos is an important topic in the field of video understanding. However, existing action localization methods are restricted to a pre-defined set of actions and cannot localize unseen activities. Thus, we consider a new task to localize unseen activities in videos via image queries, named Image-Based Activity Localization. This task faces three inherent challenges: (1) how to eliminate the influence of semantically inessential contents in image queries; (2) how to deal with the fuzzy localization of inaccurate image queries; (3) how to determine the precise boundaries of target segments. We then propose a novel self-attention interaction localizer to retrieve unseen activities in an end-to-end fashion. Specifically, we first devise a region self-attention method with relative position encoding to learn fine-grained image region representations. Then, we employ a local transformer encoder to build multi-step fusion and reasoning of image and video contents. We next adopt an order-sensitive localizer to directly retrieve the target segment. Furthermore, we construct a new dataset ActivityIBAL by reorganizing the ActivityNet dataset. The extensive experiments show the effectiveness of our method.

• #3365
Light-Weight Hybrid Convolutional Network for Liver Tumor Segmentation
Jianpeng Zhang, Yutong Xie, Pingping Zhang, Hao Chen, Yong Xia, Chunhua Shen
Deep Learning 6

Automated segmentation of liver tumors in contrast-enhanced abdominal computed tomography (CT) scans is essential in assisting medical professionals to evaluate tumor development and make fast therapeutic schedule. Although deep convolutional neural networks (DCNNs) have contributed many breakthroughs in image segmentation, this task remains challenging, since 2D DCNNs are incapable of exploring the inter-slice information and 3D DCNNs are too complex to be trained with the available small dataset. In this paper, we propose the light-weight hybrid convolutional network (LW-HCN) to segment the liver and its tumors in CT volumes. Instead of combining a 2D and a 3D networks for coarse-to-fine segmentation, LW-HCN has a encoder-decoder structure, in which 2D convolutions used at the bottom of the encoder decreases the complexity and 3D convolutions used in other layers explore both spatial and temporal information. To further reduce the complexity, we design the depthwise and spatiotemporal separate (DSTS) factorization for 3D convolutions, which not only reduces parameters dramatically but also improves the performance. We evaluated the proposed LW-HCN model against several recent methods on the LiTS and 3D-IRCADb datasets and achieved, respectively, the Dice per case of 73.0% and 94.1% for tumor segmentation, setting a new state of the art.

• #10964
(Sister Conferences Best Papers Track) A Dual Approach to Verify and Train Deep Networks
Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Timothy Mann, Pushmeet Kohli
Deep Learning 6

This paper (published in 2018) addressed the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, network architecture and the complexity of properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime i.e. it can be stopped at any time and a valid bound on the maximum violation can be obtained. Finally, we highlight how this approach can be used to train models that are amenable to verification.

• #10982
(Journal track) Learning in the Machine: Random Backpropagation and the Deep Learning Channel
Pierre Baldi, Peter Sadowski, Zhiqin Lu
Deep Learning 6

Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transpose of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the requirement of maintaining symmetric weights in a physical neural system. To better understand RBP, we compare different algorithms in terms of the information available locally to each neuron. In the process, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), sparse RBP, and study their behavior through simulations. These simulations show that many variants are also robust deep learning algorithms, but that the derivative of the transfer function is important in the learning rule. Finally, we prove several mathematical results including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.

### Thursday 1511:00 - 12:30ML|RL - Reinforcement Learning 5 (R4)

Chair: TBD
• #3006
Using Natural Language for Reward Shaping in Reinforcement Learning
Prasoon Goyal, Scott Niekum, Raymond J. Mooney
Reinforcement Learning 5

Recent reinforcement learning (RL) approaches have shown strong performance in complex domains, such as Atari games, but are highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent intermediate rewards for progress towards the goal. Designing such rewards remains a challenge, though. In this work, we use natural language instructions to perform reward shaping. We propose a framework that maps free-form natural language instructions to intermediate rewards, that can seamlessly be integrated into any standard reinforcement learning algorithm. We experiment with Montezuma's Revenge from the Atari video games domain, a popular benchmark in RL. Our experiments on a diverse set of 15 tasks demonstrate that for the same number of interactions with the environment, using language-based rewards can successfully complete the task 60% more often, averaged across all tasks, compared to learning without language.

• #5233
Monte Carlo Tree Search for Policy Optimization
Xiaobai Ma, Katherine Driggs-Campbell, Zongzhang Zhang, Mykel J. Kochenderfer
Reinforcement Learning 5

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.

• #6078
Hill Climbing on Value Estimates for Search-control in Dyna
Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White
Reinforcement Learning 5

Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is \emph{search-control}, the mechanism to generate the state and action from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions that the agent is likely to visit next. We derive a noisy stochastic projected gradient ascent algorithm for hill climbing, and highlight a connection to Langevin dynamics. We provide an empirical demonstration on four classical domains that our algorithm, HC-Dyna, can obtain significant sample efficiency improvements. We study the properties of different sampling distributions for search-control, and find that there appears to be a benefit specifically from using the samples generated by climbing on current value estimates from low-value to high-value region.

• #6239
Recurrent Existence Determination Through Policy Optimization
Baoxiang Wang
Reinforcement Learning 5

Binary determination of the presence of objects is one of the problems where humans perform extraordinarily better than computer vision systems, in terms of both speed and preciseness. One of the possible reasons is that humans can skip most of the clutter and attend only on salient regions. Recurrent attention models (RAM) are the first computational models to imitate the way humans process images via the REINFORCE algorithm. Despite that RAM is originally designed for image recognition, we extend it and present recurrent existence determination, an attention-based mechanism to solve the existence determination. Our algorithm employs a novel $k$-maximum aggregation layer and a new reward mechanism to address the issue of delayed rewards, which would have caused the instability of the training process. The experimental analysis demonstrates significant efficiency and accuracy improvement over existing approaches, on both synthetic and real-world datasets.

• #6265
Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
Zhou Fan, Rui Su, Weinan Zhang, Yong Yu
Reinforcement Learning 5

In this paper we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks to decompose the structured action space into simpler action spaces along with a critic network to guide the training of all sub-actor networks. While this paper is mainly focused on parameterized action space, the proposed architecture, which we call hybrid actor-critic, can be extended for more general action spaces which has a hierarchical structure. We present an instance of the hybrid actor-critic architecture based on proximal policy optimization (PPO), which we refer to as hybrid proximal policy optimization (H-PPO). Our experiments test H-PPO on a collection of tasks with parameterized action space, where H-PPO demonstrates superior performance over previous methods of parameterized action reinforcement learning.

• #763
Transfer of Temporal Logic Formulas in Reinforcement Learning
Zhe Xu, Ufuk Topcu
Reinforcement Learning 5

Transferring high-level knowledge from a source task to a target task is an effective way to expedite reinforcement learning (RL). For example, propositional logic and first-order logic have been used as representations of such knowledge. We study the transfer of knowledge between tasks in which the timing of the events matters. We call such tasks temporal tasks. We concretize similarity between temporal tasks through a notion of logical transferability, and develop a transfer learning approach between different yet similar temporal tasks. We first propose an inference technique to extract metric interval temporal logic (MITL) formulas in sequential disjunctive normal form from labeled trajectories collected in RL of the two tasks. If logical transferability is identified through this inference, we construct a timed automaton for each sequential conjunctive subformula of the inferred MITL formulas from both tasks. We perform RL on the extended state which includes the locations and clock valuations of the timed automata for the source task. We then establish mappings between the corresponding components (clocks, locations, etc.) of the timed automata from the two tasks, and transfer the extended Q-functions based on the established mappings. Finally, we perform RL on the extended state for the target task, starting with the transferred extended Q-functions. Our implementation results show, depending on how similar the source task and the target task are, that the sampling efficiency for the target task can be improved by up to one order of magnitude by performing RL in the extended state space, and further improved by up to another order of magnitude using the transferred extended Q-functions.

Chair: TBD
• #1422