### Monday 21, 19:00 - 22:30, Social Event (Melbourne Cricket Ground)

• Opening Reception
Social Event
### Tuesday 22, 08:00 - 09:00, Opening (Plenary 2)

Chair: Fahiem Bacchus
• Opening Remarks
Opening
### Tuesday 22, 09:00 - 10:00, Keynote (Plenary 2)

Chair: Carles Sierra
• Provably beneficial AI
Stuart Russell
Keynote
### Tuesday 22, 10:30 - 12:00, ML-CL1 - Classification 1 (204)

Chair: James Kwok
• #912
Locality Adaptive Discriminant Analysis
Xuelong Li, Mulin Chen, Feiping Nie, Qi Wang
Classification 1

Linear Discriminant Analysis (LDA) is a popular technique for supervised dimensionality reduction, and its performance is satisfying when dealing with Gaussian distributed data. However, the neglect of local data structure makes LDA inapplicable to many real-world situations. So some works focus on the discriminant analysis between neighbor points, which can be easily affected by the noise in the original data space. In this paper, we propose a new supervised dimensionality reduction method, Locality Adaptive Discriminant Analysis (LADA), to learn a representative subspace of the data. Compared to LDA and its variants, the proposed method has three salient advantages: (1) it finds the principal projection directions without imposing any assumption on the data distribution; (2) it is able to exploit the local manifold structure of data in the desired subspace; (3) it exploits the points’ neighbor relationship automatically without introducing any additional parameter to be tuned. Performance on synthetic datasets and real-world benchmark datasets demonstrates the superiority of the proposed method.

• #1160
Interactive Image Segmentation via Pairwise Likelihood Learning
Tao Wang, Quansen Sun, Qi Ge, Zexuan Ji, Qiang Chen, Guiyu Xia
Classification 1

This paper presents an interactive image segmentation approach where the segmentation problem is formulated as a probabilistic estimation manner. Instead of measuring the distances between unseeded pixels and seeded pixels, we measure the similarities between pixel pairs and seed pairs to improve the robustness to the seeds. The unary prior probability of each pixel belonging to the foreground F and background B can be effectively estimated based on the similarities with label pairs (F, F),(F, B),(B, F) and (B, B). Then a likelihood learning framework is proposed to fuse the region and boundary information of the image by imposing the smoothing constraint on the unary potentials. Experiments on challenging data sets demonstrate that the proposed method can obtain better performance than state-of-the-art methods.

• #2367
Unsupervised Deep Video Hashing with Balanced Rotation
Gengshen Wu, Li Liu, Yuchen Guo, Guiguang Ding, Jungong Han, Jialie Shen, Ling Shao
Classification 1

Recently, hashing video contents for fast retrieval has received increasing attention due to the enormous growth of online videos. As the extension of image hashing techniques, traditional video hashing methods mainly focus on seeking the appropriate video features but pay little attention to how the video-specific features can be leveraged to achieve optimal binarization. In this paper, an end-to-end hashing framework, namely Unsupervised Deep Video Hashing (UDVH), is proposed, where feature extraction, balanced code learning and hash function learning are integrated and optimized in a self-taught manner. Particularly, distinguished from previous work, our framework enjoys two novelties: 1) an unsupervised hashing method that integrates the feature clustering and feature binarization, enabling the neighborhood structure to be preserved in the binary space; 2) a smart rotation applied to the video-specific features that are widely spread in the low-dimensional space such that the variance of dimensions can be balanced, thus generating more effective hash codes. Extensive experiments have been performed on two real-world datasets and the results demonstrate its superiority, compared to the state-of-the-art video hashing methods. To bootstrap further developments, the source code will be made publicly available.

• #2473
MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning
Xuelong Li, Bin Zhao, Xiaoqiang Lu
Classification 1

Visual information is quite important for the task of video captioning. However, a video contains a lot of uncorrelated content, which may interfere with generating a correct caption. Based on this point, we attempt to exploit the visual features which are most correlated to the caption. In this paper, a Multi-level Attention Model based Recurrent Neural Network (MAM-RNN) is proposed, where MAM is utilized to encode the visual feature and RNN works as the decoder to generate the video caption. During generation, the proposed approach is able to adaptively attend to the salient regions in the frame and the frames correlated to the caption. The experimental results on two benchmark datasets, i.e., MSVD and Charades, show the excellent performance of the proposed approach.

• #3338
JM-Net and Cluster-SVM for Aerial Scene Classification
Xiaoqiang Lu, Yuan Yuan, Jie Fang
Classification 1

Aerial scene classification, which is a fundamental problem for remote sensing imagery, can automatically label an aerial image with a specific semantic category. Although deep learning has achieved competitive performance for aerial scene classification, training conventional neural networks on aerial datasets easily gets stuck in overfitting and local minima, because the aerial datasets contain only a few hundred or a few thousand images, while conventional networks usually contain millions of parameters to be trained. To address the problem, a novel convolutional neural network named JM-Net is proposed in this paper, which has convolution kernels of different sizes in the same layer and ignores the fully convolution layer, so it has fewer parameters and can be trained well on aerial datasets. Additionally, Cluster-SVM, a strategy to improve the accuracy and speed up the classification, is used in the specific task. Finally, our method surpassed the state-of-the-art result on the challenging AID dataset while costing less time and using less storage space.

• #3645
Multi-Class Support Vector Machine via Maximizing Multi-Class Margins
Jie Xu, Xianglong Liu, Zhouyuan Huo, Cheng Deng, Feiping Nie, Heng Huang
Classification 1

Support Vector Machine (SVM) was originally proposed as a binary classification model, and it has already achieved great success in different applications. In reality, problems more often have more than two classes, so it is natural to extend SVM to a multi-class classifier. Many works have been proposed to construct a multi-class classifier based on binary SVM, such as the one versus all strategy, the one versus one strategy and Weston's multi-class SVM. The one versus all and one versus one strategies split the multi-class problem into multiple binary classification subproblems, so multiple binary classifiers need to be trained. Weston's multi-class SVM is formed by ensuring risk constraints and imposing a specific regularization, like the Frobenius norm. It is not derived by maximizing the margin between the hyperplane and the training data, which is the motivation of SVM. In this paper, we propose a multi-class SVM model from the perspective of maximizing the margin between training points and hyperplane, and analyze the relation between our model and other related methods. The experiments show that our model obtains better or comparable results compared with other related methods.
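To make the two decomposition strategies mentioned in this abstract concrete, the following minimal sketch enumerates the binary subproblems each one creates. It only illustrates the standard one versus all and one versus one splits, not the model proposed in the paper; the function names and the example class labels are hypothetical.

```python
from itertools import combinations


def one_vs_all_tasks(classes):
    """One binary task per class: that class against all remaining classes."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


# Hypothetical label set, used only for illustration.
classes = ["cat", "dog", "bird", "fish"]
ova = one_vs_all_tasks(classes)
ovo = one_vs_one_tasks(classes)

# K classes give K one-vs-all tasks and K * (K - 1) / 2 one-vs-one tasks.
print(len(ova), len(ovo))
```

With K classes, one versus all trains K binary classifiers and one versus one trains K(K-1)/2, which is why both strategies require training multiple classifiers.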

### Tuesday 22, 10:30 - 12:00, ML-FSC1 - Feature Selection and Construction 1 (210)

Chair: Chang Xu
• #1236
TUCH: Turning Cross-view Hashing into Single-view Hashing via Generative Adversarial Nets
Xin Zhao, Guiguang Ding, Yuchen Guo, Jungong Han, Yue Gao
Feature Selection and Construction 1

Cross-view retrieval, which focuses on searching images as response to text queries or vice versa, has received increasing attention recently. Cross-view hashing is to efficiently solve the cross-view retrieval problem with binary hash codes. Most existing works on cross-view hashing exploit multi-view embedding method to tackle this problem, which inevitably causes the information loss in both image and text domains. Inspired by the Generative Adversarial Nets (GANs), this paper presents a new model that is able to Turn Cross-view Hashing into single-view hashing (TUCH), thus enabling the information of image to be preserved as much as possible. TUCH is a novel deep architecture that integrates a language model network T for text feature extraction, a generator network G to generate fake images from text feature and a hashing network H for learning hashing functions to generate compact binary codes. Our architecture effectively unifies joint generative adversarial learning and cross-view hashing. Extensive empirical evidence shows that our TUCH approach achieves state-of-the-art results, especially on text to image retrieval, based on image-sentences datasets, i.e. standard IAPRTC-12 and large-scale Microsoft COCO.

• #2966
Predicting Human Interaction via Relative Attention Model
Yichao Yan, Bingbing Ni, Xiaokang Yang
Feature Selection and Construction 1

Predicting human interaction is challenging as the on-going activity has to be inferred based on a partially observed video. Essentially, a good algorithm should effectively model the mutual influence between the two interacting subjects. Also, only a small region in the scene is discriminative for identifying the on-going interaction. In this work, we propose a relative attention model to explicitly address these difficulties. Built on a tri-coupled deep recurrent structure representing both interacting subjects and global interaction status, the proposed network collects spatio-temporal information from each subject, rectified with global interaction information, yielding effective interaction representation. Moreover, the proposed network also unifies an attention module to assign higher importance to the regions which are relevant to the on-going action. Extensive experiments have been conducted on two public datasets, and the results demonstrate that the proposed relative attention network successfully predicts informative regions between interacting subjects, which in turn yields superior human interaction prediction accuracy.

• #4040
Optimal Feature Selection for Decision Robustness in Bayesian Networks
YooJung Choi, Adnan Darwiche, Guy Van den Broeck
Feature Selection and Construction 1

In many applications, one can define a large set of features to support the classification task at hand. At test time, however, these become prohibitively expensive to evaluate, and only a small subset of features is used, often selected for their information-theoretic value. For threshold-based, Naive Bayes classifiers, recent work has suggested selecting features that maximize the expected robustness of the classifier, that is, the expected probability it maintains its decision after seeing more features. We propose the first algorithm to compute this expected same-decision probability for general Bayesian network classifiers, based on compiling the network into a tractable circuit representation. Moreover, we develop a search algorithm for optimal feature selection that utilizes efficient incremental circuit modifications. Experiments on Naive Bayes, as well as more general networks, show the efficacy and distinct behavior of this decision-making approach.

• #1185
Semi-supervised Feature Selection via Rescaled Linear Regression
Xiaojun Chen, Guowen Yuan, Feiping Nie, Joshua Zhexue Huang
Feature Selection and Construction 1

With the rapid increase of complex and high-dimensional sparse data, demands for new methods to select features by exploiting both labeled and unlabeled data have increased. Least square regression based feature selection methods usually learn a projection matrix and evaluate the importance of features using the projection matrix, which lacks theoretical explanation. Moreover, these methods cannot find a solution of the projection matrix that is both global and sparse. In this paper, we propose a novel semi-supervised feature selection method which can learn a solution of the projection matrix that is both global and sparse. The new method extends the least square regression model by rescaling the regression coefficients in the least square regression with a set of scale factors, which are used for ranking the features. It is shown that the new model can learn a global and sparse solution. Moreover, the introduction of scale factors provides a theoretical explanation for why we can use the projection matrix to rank the features. A simple yet effective algorithm with proved convergence is proposed to optimize the new model. Experimental results on eight real-life data sets show the superiority of the method.

• #1739
Multimodal Linear Discriminant Analysis via Structural Sparsity
Yu Zhang, Yuan Jiang
Feature Selection and Construction 1

Linear discriminant analysis (LDA) is a widely used supervised dimensionality reduction technique. Even though the LDA method has many real-world applications, it has some limitations such as the single-modal problem that each class follows a normal distribution. To solve this problem, we propose a method called multimodal linear discriminant analysis (MLDA). By generalizing the between-class and within-class scatter matrices, the MLDA model can allow each data point to have its own class mean which is called the instance-specific class mean. Then in each class, data points which share the same or similar instance-specific class means are considered to form one cluster or modal. In order to learn the instance-specific class means, we use the ratio of the proposed generalized between-class scatter measure over the proposed generalized within-class scatter measure, which encourages the class separability, as a criterion. The observation that each class will have a limited number of clusters inspires us to use a structural sparse regularizor to control the number of unique instance-specific class means in each class. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed MLDA method.

• #2240
Learning Mahalanobis Distance Metric: Considering Instance Disturbance Helps
Han-Jia Ye, De-Chuan Zhan, Xue-Min Si, Yuan Jiang
Feature Selection and Construction 1

The Mahalanobis distance metric takes feature weights and correlation into account in the distance computation, which can improve the performance of many similarity/dissimilarity based methods, such as kNN. Most existing distance metric learning methods obtain the metric based on the raw features and side information but neglect their reliability. Noise or disturbances on instances change their relationships and thus affect the learned metric. In this paper, we claim that considering disturbance of instances may help the distance metric learning approach get a robust metric, and propose the Distance metRIc learning Facilitated by disTurbances (DRIFT) approach. In DRIFT, the noise or the disturbance of each instance is learned. Therefore, the distance between each pair of (noisy) instances can be better estimated, which facilitates side information utilization and metric learning. Experiments on prediction and visualization clearly indicate the effectiveness of the proposed approach.
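As a brief illustration of the distance this abstract builds on, the sketch below computes a Mahalanobis distance sqrt((x - y)^T M (x - y)) for a given positive semidefinite matrix M. It does not implement DRIFT or any metric learning step; the function name and the example matrices are assumptions chosen for illustration.

```python
import math


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff, written out in plain Python.
    m_diff = [sum(M[i][j] * diff[j] for j in range(len(diff)))
              for i in range(len(diff))]
    return math.sqrt(sum(d * v for d, v in zip(diff, m_diff)))


identity = [[1, 0], [0, 1]]   # reduces to the Euclidean distance
weighted = [[4, 0], [0, 1]]   # hypothetical metric that up-weights feature 0

print(mahalanobis([0, 0], [3, 4], identity))
print(mahalanobis([0, 0], [3, 4], weighted))
```

With the identity matrix the distance is the ordinary Euclidean one; a learned M reweights and correlates features, which is the part a metric learning method would estimate.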

### Tuesday 22, 10:30 - 12:00, ML-DM1 - Data Mining 1 (211)

Chair: Jingrui He
• #1655
Enhancing Campaign Design in Crowdfunding: A Product Supply Optimization Perspective
Qi Liu, Guifeng Wang, Hongke Zhao, Chuanren Liu, Tong Xu, Enhong Chen
Data Mining 1

Crowdfunding is an emerging Internet application for creators designing campaigns (projects) to collect funds from public investors. Usually, the limited budget of the creator is manually divided into several perks (reward options), that should fit various market demand and further bring different monetary contributions for the campaign. Therefore, it is very challenging for each creator to design an effective campaign. To this end, in this paper, we aim to enhance the funding performance of the newly proposed campaigns, with a focus on optimizing the product supply of perks. Specifically, given the expected budget and the perks of a campaign, we propose a novel solution to automatically recommend the optimal product supply to every perk for balancing the expected return of this campaign against the risk. Along this line, we define it as a constrained portfolio selection problem, where the risk of each campaign is measured by a multi-task learning method. Finally, experimental results on the real-world crowdfunding data clearly prove that the optimized product supply can help improve the campaign performance significantly, and meanwhile, our multi-task learning method could more precisely estimate the risk of each campaign.

• #1203
Video Question Answering via Hierarchical Spatio-Temporal Attention Networks
Zhou Zhao, Qifan Yang, Deng Cai, Xiaofei He, Yueting Zhuang
Data Mining 1

Open-ended video question answering is a challenging problem in visual information retrieval, which automatically generates the natural language answer from the referenced video content according to the question. However, the existing visual question answering works only focus on the static image, which may be ineffectively applied to video question answering due to the temporal dynamics of video contents. In this paper, we consider the problem of open-ended video question answering from the viewpoint of spatio-temporal attentional encoder-decoder learning framework. We propose the hierarchical spatio-temporal attention network for learning the joint representation of the dynamic video contents according to the given question. We then develop the encoder-decoder learning method with reasoning recurrent neural networks for open-ended video question answering. We construct a large-scale video question answering dataset. The extensive experiments show the effectiveness of our method.

• #1514
Link Prediction via Ranking Metric Dual-Level Attention Network Learning
Zhou Zhao, Ben Gao, Vincent W. Zheng, Deng Cai, Xiaofei He, Yueting Zhuang
Data Mining 1

Link prediction is a challenging problem for complex network analysis, arising in many disciplines such as social networks and telecommunication networks. Currently, many existing approaches estimate the proximity of the link endpoints for link prediction from their feature or the local neighborhood around them, which suffer from the localized view of network connections and insufficiency of discriminative feature representation. In this paper, we consider the problem of link prediction from the viewpoint of learning discriminative path-based proximity ranking metric embedding. We propose a novel ranking metric network learning framework by jointly exploiting both node-level and path-level attentional proximity of the endpoints for link prediction. We then develop the path-based dual-level reasoning attentional learning method with recurrent neural network for proximity ranking metric embedding. The extensive experiments on two large-scale datasets show that our method achieves better performance than other state-of-the-art solutions to the problem.

• #2730
Deep Matrix Factorization Models for Recommender Systems
Hong-Jian Xue, Xinyu Dai, Jianbing Zhang, Shujian Huang, Jiajun Chen
Data Mining 1

Recommender systems usually make personalized recommendation with user-item interaction ratings, implicit feedback and auxiliary information. Matrix factorization is the basic idea to predict a personalized ranking over a set of items for an individual user with the similarities among users and items. In this paper, we propose a novel matrix factorization model with neural network architecture. Firstly, we construct a user-item matrix with explicit ratings and non-preference implicit feedback. With this matrix as the input, we present a deep structure learning architecture to learn a common low dimensional space for the representations of users and items. Secondly, we design a new loss function based on binary cross entropy, in which we consider both explicit ratings and implicit feedback for a better optimization. The experimental results show the effectiveness of both our proposed model and the loss function. On several benchmark datasets, our model outperformed other state-of-the-art methods. We also conduct extensive experiments to evaluate the performance within different experimental settings.

• #2816
Image-embodied Knowledge Representation Learning
Ruobing Xie, Zhiyuan Liu, Huanbo Luan, Maosong Sun
Data Mining 1

Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.

• #2848
Two dimensional Large Margin Nearest Neighbor for Matrix Classification
Kun Song, Feiping Nie, Junwei Han
Data Mining 1

Matrices are common forms of data that are encountered in a wide range of real applications. How to classify this kind of data is an important research topic. In this paper, we propose a novel distance metric learning method named two dimensional large margin nearest neighbor (2DLMNN), for improving the performance of the k nearest neighbor (KNN) classifier in matrix classification. In the proposed method, left and right projection matrices are employed to define the matrix-based Mahalanobis distance, which is used to construct the objective aimed at separating points in different classes by a large margin. The parameters in those two projection matrices are far fewer than those in the vector-based counterpart, thus our method reduces the risk of overfitting. We also introduce a framework for solving the proposed 2DLMNN. The convergence behavior, initialization, and parameter determination are also analyzed. Compared with vector-based methods, 2DLMNN performs better for matrix data classification. Promising experimental results on several data sets are provided to demonstrate the effectiveness of our method.

### Tuesday 22, 10:30 - 12:00, ML-LGM - Learning Graphical Models (212)

Chair: Liz Sonenberg
• #1339
Deep Graphical Feature Learning for Face Sketch Synthesis
Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Li
Learning Graphical Models

The exemplar-based face sketch synthesis method generally contains two steps: neighbor selection and reconstruction weight representation. Pixel intensities are widely used as features by most of the existing exemplar-based methods, but they lack representation ability and robustness to light variations and cluttered backgrounds. We present a novel face sketch synthesis method combining the generative exemplar-based method and discriminatively trained deep convolutional neural networks (dCNNs) via a deep graphical feature learning framework. Our method works in both steps by using deep discriminative representations derived from dCNNs. Instead of using them directly, we boost their representation capability by a deep graphical feature learning framework. Finally, the optimal weights of deep representations and optimal reconstruction weights for face sketch synthesis can be obtained simultaneously. With the optimal reconstruction weights, we can synthesize high quality sketches that are robust against light variations and cluttered backgrounds. Extensive experiments on public face sketch databases show that our method outperforms state-of-the-art methods, in terms of both synthesis quality and recognition ability.

• #2538
Locally Consistent Bayesian Network Scores for Multi-Relational Data
Oliver Schulte, Sajjad Gholami
Learning Graphical Models

An important task for relational learning is Bayesian network (BN) structure learning. A fundamental component of structure learning is a model selection score that measures how well a model fits a dataset. We describe a new method that upgrades, for multi-relational databases, a log-linear BN score designed for single-table i.i.d. data. Chickering and Meek showed that for i.i.d. data, standard BN scores are locally consistent, meaning that their maxima converge to an optimal model that represents the data generating distribution and contains no redundant edges. Our main theorem establishes that if a model selection score is locally consistent for i.i.d. data, then our upgraded gain function is locally consistent for relational data as well. To our knowledge this is the first consistency result for relational structure learning. A novel aspect of our approach is employing a gain function that compares two models: a current vs. an alternative BN structure. In contrast, previous approaches employed a score that is a function of a single model only. Empirical evaluation on six benchmark relational databases shows that our gain function is also practically useful: on realistic size data sets, it selects informative BN structures with a better data fit than those selected by baseline single-model scores.

• #2743
Deep-dense Conditional Random Fields for Object Co-segmentation
Zehuan Yuan, Tong Lu, Yirui Wu
Learning Graphical Models

We address the problem of object co-segmentation in images. Object co-segmentation aims to segment common objects in images and has promising applications in AI agents. We solve it by proposing a co-occurrence map, which measures how likely an image region belongs to an object and also appears in other images. The co-occurrence map of an image is calculated by combining two parts: objectness scores of image regions and similarity evidences from object proposals across images. We introduce a deep-dense conditional random field framework to infer co-occurrence maps. Both similarity metric and objectness measure are learned end-to-end in a single deep network. We evaluate our method on two benchmarks and achieve competitive performance.

• #2824
Discriminative Bayesian Nonparametric Clustering
Vu Nguyen, Dinh Phung, Trung Le, Hung Bui
Learning Graphical Models

We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulted from probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models.

• #3236
A Density-based Nonparametric Model for Online Event Discovery from the Social Media Data
Jinjin Guo, Zhiguo Gong
Learning Graphical Models

In this paper, we propose a novel online event discovery model, DP-density, to capture various events from social media data. The proposed model can flexibly accommodate the incremental arrival of social documents in an online manner by leveraging the Dirichlet Process, and a density based technique is exploited to deduce the temporal dynamics of events. The spatial patterns of events are also incorporated in the model by a mixture of Gaussians. To remove the bias caused by the streaming process of the documents, Sequential Monte Carlo is used for the parameter inference. Our extensive experiments over two different real datasets show that the proposed model is capable of extracting interpretable events effectively in terms of perplexity and coherence.

• #3743
Inverse Covariance Estimation with Structured Groups
Shaozhe Tao, Yifan Sun, Daniel Boley
Learning Graphical Models

Estimating the inverse covariance matrix of p variables from n observations is challenging when n is much less than p, since the sample covariance matrix is singular and cannot be inverted. A popular solution is to optimize for the L1 penalized estimator; however, this does not incorporate structure domain knowledge and can be expensive to optimize. We consider finding inverse covariance matrices with group structure, defined as potentially overlapping principal submatrices, determined from domain knowledge (e.g. categories or graph cliques). We propose a new estimator for this problem setting that can be derived efficiently via the conditional gradient method, leveraging chordal decomposition theory for scalability. Simulation results show significant improvement in sample complexity when the correct group structure is known. We also apply these estimators to 14,910 stock closing prices, with noticeable improvement when group sparsity is exploited.
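The opening claim of this abstract, that the sample covariance matrix is singular when n is much less than p, can be checked on a toy example. The sketch below is only an illustration of that fact, not the estimator proposed in the paper; the helper names and the toy observations are assumptions, and exact fractions are used so the rank computation is free of rounding error.

```python
from fractions import Fraction


def sample_covariance(rows):
    """Unbiased sample covariance of n observations (rows) of p variables."""
    n, p = len(rows), len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def matrix_rank(matrix):
    """Rank via exact Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Toy data (hypothetical): n = 3 observations of p = 4 variables.
observations = [[1, 2, 0, 4], [2, 0, 1, 3], [0, 1, 5, 2]]
S = sample_covariance(observations)
print(matrix_rank(S), "out of", len(S))
```

Because centering removes one degree of freedom, the rank of the p x p sample covariance is at most n - 1, so it cannot be inverted whenever n is not larger than p.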

### Tuesday 22, 10:30 - 12:00, ML-AL - Active Learning (213)

Chair: Sarah Erfani
• #1321
On Gleaning Knowledge from Multiple Domains for Active Learning
Zengmao Wang, Bo Du, Lefei Zhang, Liangpei Zhang, Ruimin Hu, Dacheng Tao
Active Learning

How can a doctor diagnose new diseases, which emerge over time, with little historical knowledge? Active learning is a promising way to address the problem by querying the most informative samples. Since the diagnosed cases for new diseases are very limited, gleaning knowledge from other domains (classical prescriptions) to prevent the bias of active learning would be vital for accurate diagnosis. In this paper, we propose a framework that attempts to glean knowledge from multiple domains for active learning by querying the most uncertain and representative samples from the target domain and calculating the importance weights for re-weighting the source data in a single unified formulation. The weights are optimized by both a supervised classifier and distribution matching between the source domain and target domain with maximum mean discrepancy. Besides, a multiple domains active learning method is designed based on the proposed framework as an example. The proposed method is verified with newsgroups and handwritten digits data recognition tasks, where it outperforms the state-of-the-art methods.

• #2865
High Dimensional Bayesian Optimization using Dropout
Cheng Li, Sunil Gupta, Santu Rana, Vu Nguyen, Svetha Venkatesh, Alistair Shilton
Active Learning

Scaling Bayesian optimization to high dimensions is a challenging task, as the global optimization of a high-dimensional acquisition function can be expensive and often infeasible. Existing methods depend either on limited “active” variables or the additive form of the objective function. We propose a new method for high-dimensional Bayesian optimization that uses a drop-out strategy to optimize only a subset of variables at each iteration. We derive theoretical bounds for the regret and show how it can inform the derivation of our algorithm. We demonstrate the efficacy of our algorithms for optimization on two benchmark functions and two real-world applications - training cascade classifiers and optimizing alloy composition.

• #3183
Cost-Effective Active Learning from Diverse Labelers
Sheng-Jun Huang, Jia-Lve Chen, Xin Mu, Zhi-Hua Zhou
Active Learning

In traditional active learning, there is only one labeler that always returns the ground truth of queried labels. However, in many applications, multiple labelers are available to offer diverse qualities of labeling with different costs. In this paper, we perform active selection on both instances and labelers, aiming to improve the classification model as much as possible at the lowest cost. While the cost of a labeler is proportional to its overall labeling quality, we also observe that different labelers usually have diverse expertise, and thus it is likely that labelers with a low overall quality can provide accurate labels on some specific instances. Based on this fact, we propose a novel active selection criterion to evaluate the cost-effectiveness of instance-labeler pairs, which ensures that the selected instance is helpful for improving the classification model, while the selected labeler can provide an accurate label for the instance at a relatively low cost. Experiments on both UCI and real crowdsourcing data sets demonstrate the superiority of our proposed approach in selecting cost-effective queries.

• #3192
Multi-instance multi-label active learning
Sheng-Jun Huang, Nengneng Gao, Songcan Chen
Active Learning

Multi-instance multi-label learning (MIML) has been successfully applied to many real-world applications. As its expressive power increases, the cost of labelling a MIML example increases significantly. It thus becomes an important task to train an effective MIML model with as few labelled examples as possible. Active learning, which actively selects the most valuable data to query their labels, is a main approach to reducing labeling cost. Existing active methods have achieved great success in traditional learning tasks, but cannot be directly applied to MIML problems. In this paper, we propose a MIML active learning algorithm, which exploits diversity and uncertainty in both the input and output space to query the most valuable information. This algorithm designs a novel query strategy specifically for MIML objects and acquires more precise information from the oracle without additional cost. Based on the queried information, the MIML model is then effectively trained by simultaneously optimizing the relative rank among instances and labels.

• #3228
COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints
Toon Van Craenendonck, Sebastijan Dumancic, Hendrik Blockeel
Active Learning

Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a $K$ that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance.
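
The idea of saving queries through constraint transitivity and entailment can be sketched as follows. This is not the COBRA implementation: the `ConstraintStore` class, the `query` helper, and the toy oracle are hypothetical names introduced for illustration, and the sketch assumes the oracle answers consistently. A must-link between a and b together with a must-link between b and c implies a must-link between a and c; a must-link between a and b together with a cannot-link between b and c implies a cannot-link between a and c.

```python
class ConstraintStore:
    """Tracks pairwise constraints and answers what they already imply.

    Must-links are kept in a union-find structure; cannot-links are stored
    between group representatives. Assumes a consistent oracle.
    """

    def __init__(self, n):
        self.parent = list(range(n))
        self.cannot = {i: set() for i in range(n)}  # keyed by group root

    def find(self, i):
        # Union-find lookup with path halving.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def implied(self, a, b):
        """Return True (must-link), False (cannot-link), or None (unknown)."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if rb in self.cannot[ra]:
            return False
        return None

    def add(self, a, b, must_link):
        ra, rb = self.find(a), self.find(b)
        if must_link:
            if ra == rb:
                return
            self.parent[rb] = ra
            # Re-point the absorbed group's cannot-links at the new root.
            for rc in self.cannot.pop(rb):
                self.cannot[rc].discard(rb)
                self.cannot[rc].add(ra)
                self.cannot[ra].add(rc)
        else:
            self.cannot[ra].add(rb)
            self.cannot[rb].add(ra)


def query(store, a, b, oracle):
    """Ask the oracle only when the answer is not already entailed."""
    known = store.implied(a, b)
    if known is not None:
        return known
    answer = oracle(a, b)
    store.add(a, b, answer)
    return answer


# Toy oracle backed by hidden labels; `asked` records the real queries.
hidden_labels = [0, 0, 0, 1, 1]
asked = []


def toy_oracle(a, b):
    asked.append((a, b))
    return hidden_labels[a] == hidden_labels[b]


store = ConstraintStore(len(hidden_labels))
pairs = [(0, 1), (1, 2), (0, 2), (0, 3), (2, 3), (3, 4), (1, 4)]
answers = [query(store, a, b, toy_oracle) for a, b in pairs]
```

In this toy run, seven pairwise questions are resolved with four oracle calls; the remaining three answers follow from constraints already collected.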

• #3879
Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces
Yanan Sui, Joel W. Burdick
Active Learning

We consider sequential decision making under uncertainty, specifically optimization over a large decision space with noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper studies the dueling bandits problem with a large number of dependent arms. Our problem is motivated by a clinical decision making process in a large decision space. We propose an efficient algorithm, CorrDuel, for the problem, which makes decisions to simultaneously deliver effective therapy and explore the decision space. Many sequential decision making problems with large and structured decision spaces could be facilitated by our algorithm. After evaluating the fast convergence of CorrDuel in analysis and simulation experiments, we applied it to a live clinical trial of therapeutic spinal cord stimulation. It is the first algorithm applied to spinal cord injury treatments, and experimental results show the effectiveness and efficiency of our algorithm.

### Tuesday 22 10:30 - 12:00 CS-CS1 - Constraint Satisfaction 1 (216)

Chair: Felip Manyà
• #2075
On Neighborhood Singleton Consistencies
Anastasia Paparrizou, Kostas Stergiou
Constraint Satisfaction 1

CP solvers predominantly use arc consistency (AC) as the default propagation method. Many stronger consistencies, such as triangle consistencies (e.g. RPC and maxRPC), exist, but their use is limited despite results showing that they outperform AC on many problems. This is due to the intricacies involved in incorporating them into solvers. On the other hand, singleton consistencies such as SAC can be easily incorporated into solvers, but they are too expensive. We seek a balance between the efficiency of triangle consistencies and the ease of implementation of singleton ones. Using the recently proposed variant of SAC called Neighborhood SAC as a basis, we propose a family of weaker singleton consistencies. We study them theoretically, comparing their pruning power to existing consistencies. We make a detailed experimental study using a very simple algorithm for their implementation. Results demonstrate that they outperform the existing propagation techniques, often by orders of magnitude, on a wide range of problems.

• #2485
Automatic Synthesis of Smart Table Constraints by Abstraction of Table Constraints
Baudouin Le Charlier, Minh Thanh Khong, Christophe Lecoutre, Yves Deville
Constraint Satisfaction 1

The smart table constraint represents a powerful modeling tool that has been recently introduced. This constraint allows the user to represent compactly a number of well-known (global) constraints and more generally any arbitrarily structured constraints, especially when disjunction is at stake. In many problems, some constraints are given under the basic and simple form of tables explicitly listing the allowed combinations of values. In this paper, we propose an algorithm to convert automatically any (ordinary) table into a compact smart table. Its theoretical time complexity is shown to be quadratic in the size of the input table. Experimental results demonstrate its compression efficiency on many constraint cases while showing its reasonable execution time. It is then shown that using filtering algorithms on the resulting smart table is more efficient than using state of the art filtering algorithms on the initial table.

• #2568
Learning to Run Heuristics in Tree Search
Elias B. Khalil, Bistra Dilkina, George L. Nemhauser, Shabbir Ahmed, Yufen Shao
Constraint Satisfaction 1

"Primal heuristics" are a key contributor to the improved performance of exact branch-and-bound solvers for combinatorial optimization and integer programming. Perhaps the most crucial question concerning primal heuristics is at which nodes they should run, to which the typical answer is via hard-coded rules or fixed solver parameters tuned, offline, by trial-and-error. Alternatively, a heuristic should be run when it is most likely to succeed, based on the problem instance's characteristics, the state of the search, etc. In this work, we study the problem of deciding at which node a heuristic should be run, such that the overall (primal) performance of the solver is optimized. To our knowledge, this is the first attempt at formalizing and systematically addressing this problem. Central to our approach is the use of Machine Learning (ML) for predicting whether a heuristic will succeed at a given node. We give a theoretical framework for analyzing this decision-making process in a simplified setting, propose an ML approach for modeling heuristic success likelihood, and design practical rules that leverage the ML models to dynamically decide whether to run a heuristic at each node of the search tree. Experimentally, our approach improves the primal performance of a state-of-the-art Mixed Integer Programming solver by up to 6% on a set of benchmark instances, and by up to 60% on a family of hard Independent Set instances.

• #3557
Learning-Based Abstractions for Nonlinear Constraint Solving
Sumanth Dathathri, Nikos Arechiga, Sicun Gao, Richard M. Murray
Constraint Satisfaction 1

We propose a new abstraction refinement procedure based on machine learning to improve the performance of nonlinear constraint solving algorithms on large-scale problems. The proposed approach decomposes the original set of constraints into smaller subsets, and uses learning algorithms to propose sequences of abstractions that take the form of conjunctions of classifiers. The core procedure is a refinement loop that keeps improving the learned results based on counterexamples that are obtained from partial constraints that are easy to solve. Experiments show that the proposed techniques significantly improve the performance of state-of-the-art constraint solvers on many challenging benchmarks. The mechanism is capable of producing intermediate symbolic abstractions that are also important for many applications and for understanding the internal structures of hard constraint solving problems.

• #3941
The Hard Problems Are Almost Everywhere For Random CNF-XOR Formulas
Jeffrey M. Dudek, Kuldeep S. Meel, Moshe Y. Vardi
Constraint Satisfaction 1

Recent universal-hashing based approaches to sampling and counting crucially depend on the runtime performance of SAT solvers on formulas expressed as the conjunction of both CNF constraints and variable-width XOR constraints (known as CNF-XOR formulas). In this paper, we present the first study of the runtime behavior of SAT solvers equipped with XOR-reasoning techniques on random CNF-XOR formulas. We empirically demonstrate that a state-of-the-art SAT solver scales exponentially on random CNF-XOR formulas across a wide range of XOR-clause densities, peaking around the empirical phase-transition location. On the theoretical front, we prove that the solution space of a random CNF-XOR formula 'shatters' at all nonzero XOR-clause densities into well-separated components, similar to the behavior seen in random CNF formulas known to be difficult for many SAT algorithms.

• #4190
Personnel Scheduling as Satisfiability Modulo Theories
Christoph Erkinger, Nysret Musliu
Constraint Satisfaction 1

Rotating workforce scheduling (RWS) is an important real-life personnel rostering problem that appears in a large number of different business areas. In this paper, we propose a new exact approach to RWS that exploits the recent advances in Satisfiability Modulo Theories (SMT). While solving can be automated by using a number of so-called SMT-solvers, the most challenging task is to find an efficient formulation of the problem in first-order logic. We propose two new modeling techniques for RWS that encode the problem using formulas over different background theories. The first encoding provides an elegant approach based on linear integer arithmetic. Furthermore, we developed a new formulation based on bitvectors in order to achieve a more compact representation of the constraints and a reduced number of variables. These two modeling approaches were experimentally evaluated on benchmark instances from the literature using different state-of-the-art SMT-solvers. Compared to other exact methods, the results of this approach showed an important improvement in the number of found solutions.

### Tuesday 22 10:30 - 12:00 UAI-API1 - Approximate Probabilistic Inference 1 (217)

Chair: Vanina Martinez
• #1803
Nonlinear Maximum Margin Multi-View Learning with Adaptive Kernel
Jia He, Changying Du, Changde Du, Fuzhen Zhuang, Qing He, Guoping Long
Approximate Probabilistic Inference 1

Existing multi-view learning methods based on kernel functions either require the user to select and tune a single predefined kernel or have to compute and store many Gram matrices to perform multiple kernel learning. Apart from the huge consumption of manpower, computation and memory resources, most of these models seek point estimation of their parameters, and are prone to overfitting to small training data. This paper presents an adaptive kernel nonlinear max-margin multi-view learning model under the Bayesian framework. Specifically, we regularize the posterior of an efficient multi-view latent variable model by explicitly mapping the latent representations extracted from multiple data views to a random Fourier feature space where max-margin classification constraints are imposed. Assuming these random features are drawn from Dirichlet process Gaussian mixtures, we can adaptively learn shift-invariant kernels from data according to Bochner's theorem. For inference, we employ the data augmentation idea for hinge loss, and design an efficient gradient-based MCMC sampler in the augmented space. Having no need to compute the Gram matrix, our algorithm scales linearly with the size of the training set. Extensive experiments on real-world datasets demonstrate that our method has superior performance.

• #1969
Variational Mixtures of Gaussian Processes for Classification
Chen Luo, Shiliang Sun
Approximate Probabilistic Inference 1

Gaussian Processes (GPs) are powerful tools for machine learning which have been applied to both classification and regression. The mixture models of GPs were later proposed to further improve GPs for data modeling. However, these models are formulated for regression problems. In this work, we propose a new Mixture of Gaussian Processes for Classification (MGPC). Instead of the Gaussian likelihood for regression, MGPC employs the logistic function as likelihood to obtain the class probabilities, which is suitable for classification problems. The posterior distribution of latent variables is approximated through variational inference. The hyperparameters are optimized through the variational EM method and a greedy algorithm. Experiments are performed on multiple real-world datasets which show improvements over five widely used methods on predictive performance. The results also indicate that for classification MGPC is significantly better than the regression model with mixtures of GPs, different from the existing consensus that their single model counterparts are comparable.

• #2477
Order Statistics for Probabilistic Graphical Models
David Smith, Sara Rouhani, Vibhav Gogate
Approximate Probabilistic Inference 1

We consider the problem of computing r-th order statistics, namely finding an assignment having rank r in a probabilistic graphical model. We show that the problem is NP-hard even when the graphical model has no edges (zero-treewidth models) via a reduction from the partition problem. We use this reduction, specifically a pseudo-polynomial time algorithm for number partitioning, to yield a pseudo-polynomial time approximation algorithm for solving the r-th order statistics problem in zero-treewidth models. We then extend this algorithm to arbitrary graphical models by generalizing it to tree decompositions, and demonstrate via experimental evaluation on various datasets that our proposed algorithm is more accurate than sampling algorithms.

• #3042
Dynamic Programming Bipartite Belief Propagation For Hyper Graph Matching
Zhen Zhang, Julian McAuley, Yong Li, Wei Wei, Yanning Zhang, Qinfeng Shi
Approximate Probabilistic Inference 1

Hyper graph matching problems have drawn attention recently due to their ability to embed higher order relations between nodes. In this paper, we formulate hyper graph matching problems as constrained MAP inference problems in graphical models. Whereas previous discrete approaches introduce several global correspondence vectors, we introduce only one global correspondence vector, but several local correspondence vectors. This allows us to decompose the problem into a (linear) bipartite matching problem and several belief propagation sub-problems. Bipartite matching can be solved by traditional approaches, while the belief propagation sub-problem is further decomposed as two sub-problems with optimal substructure. Then a newly proposed dynamic programming procedure is used to solve the belief propagation sub-problem. Experiments show that the proposed methods outperform state-of-the-art techniques for hyper graph matching.

• #3305
Coarse-to-Fine Lifted MAP Inference in Computer Vision
Haroun Habeeb, Ankit Anand, Mausam, Parag Singla
Approximate Probabilistic Inference 1

There is a vast body of theoretical research on lifted inference in probabilistic graphical models (PGMs). However, few demonstrations exist where lifting is applied in conjunction with top of the line applied algorithms. We pursue the applicability of lifted inference for computer vision (CV), with the insight that a globally optimal (MAP) labeling will likely have the same label for two symmetric pixels. The success of our approach lies in efficiently handling a distinct unary potential on every node (pixel), typical of CV applications. This allows us to lift the large class of algorithms that model a CV problem via PGM inference. We propose a generic template for coarse-to-fine (C2F) inference in CV, which progressively refines an initial coarsely lifted PGM for varying quality-time trade-offs. We demonstrate the performance of C2F inference by developing lifted versions of two near state-of-the-art CV algorithms for stereo vision and interactive image segmentation. We find that, against flat algorithms, the lifted versions have a much superior anytime performance, without any loss in final solution quality.

• #3911
Efficient Inference for Untied MLNs
Somdeb Sarkhel, Deepak Venugopal, Nicholas Ruozzi, Vibhav Gogate
Approximate Probabilistic Inference 1

We address the problem of scaling up local-search or sampling-based inference in Markov logic networks (MLNs) that have large shared sub-structures but no (or few) tied weights. Such untied MLNs are ubiquitous in practical applications. However, they have very few symmetries, and as a result lifted inference algorithms--the dominant approach for scaling up inference--perform poorly on them. The key idea in our approach is to reduce the hard, time-consuming sub-task in sampling algorithms, computing the sum of weights of features that satisfy a full assignment, to the problem of computing a set of partition functions of graphical models, each defined over the logical variables in a first-order formula. The importance of this reduction is that when the treewidth of all the graphical models is small, it yields an order of magnitude speedup. When the treewidth is large, we propose an over-symmetric approximation and experimentally demonstrate that it is both fast and accurate.

### Tuesday 22 10:30 - 12:00 ROB-VP - Vision and Perception (218)

Chair: Qi Wang
• #1935
Learning to Hallucinate Face Images via Component Generation and Enhancement
Yibing Song, Jiawei Zhang, Shengfeng He, Linchao Bao, Qingxiong Yang
Vision and Perception

We propose a two-stage method for face hallucination. First, we generate facial components of the input image using CNNs. These components represent the basic facial structures. Second, we synthesize fine-grained facial structures from high resolution training images. The details of these structures are transferred into facial components for enhancement. Therefore, we generate facial components to approximate ground truth global appearance in the first stage and enhance them through recovering details in the second stage. The experiments demonstrate that our method performs favorably against state-of-the-art methods.

• #4018
Single-Image 3D Scene Parsing Using Geometric Commonsense
Chengcheng Yu, Xiaobai Liu, Song-Chun Zhu
Vision and Perception

This paper presents a unified grammatical framework capable of reconstructing a variety of scene types (e.g., urban, campus, county etc.) from a single input image. The key idea of our approach is to study a novel commonsense reasoning framework that mainly exploits two types of prior knowledge: (i) prior distributions over a single dimension of objects, e.g., that the length of a sedan is about 4.5 meters; (ii) pair-wise relationships between the dimensions of scene entities, e.g., that a sedan is shorter than a bus. This unary or relative geometric knowledge, once extracted, is fairly stable across different types of natural scenes, and is informative for enhancing the understanding of various scenes in both 2D images and the 3D world. Methodologically, we propose to construct a hierarchical graph representation as a unified representation of the input image and related geometric knowledge. We formulate these objectives with a unified probabilistic formula and develop a data-driven Monte Carlo method to infer the optimal solution with both bottom-up and top-down computations. Results with comparisons on public datasets show that our method clearly outperforms the alternative methods.

• #1810
Image Gradient-based Joint Direct Visual Odometry for Stereo Camera
Jianke Zhu
Vision and Perception

Visual odometry is an important research problem for computer vision and robotics. In general, feature-based visual odometry methods rely heavily on accurate correspondences between local salient points, while direct approaches can make full use of the whole image and perform dense 3D reconstruction simultaneously. However, direct visual odometry usually suffers from the drawback of getting stuck at a local optimum, especially with large displacement, which may lead to inferior results. To tackle this critical problem, we propose a novel scheme for stereo odometry in this paper, which is able to improve the convergence with a more accurate pose. The key to our approach is a dual Jacobian optimization that is fused into a multi-scale pyramid scheme. Moreover, we introduce a gradient-based feature representation, which enjoys the merit of being robust to illumination changes. Furthermore, a joint direct odometry approach is proposed to incorporate the information from the last frame and previous keyframes. We have conducted an experimental evaluation on the challenging KITTI odometry benchmark, whose promising results show that the proposed algorithm is very effective for stereo visual odometry.

• #2522
Salient Object Detection with Semantic Priors
Tam V. Nguyen, Luoqi Liu
Vision and Perception

Salient object detection has increasingly become a popular topic in cognitive and computational sciences, including computer vision and artificial intelligence research. In this paper, we propose integrating semantic priors into the salient object detection process. Our algorithm consists of three basic steps. Firstly, the explicit saliency map is obtained based on the semantic segmentation refined by the explicit saliency priors learned from the data. Next, the implicit saliency map is computed based on a trained model which maps the implicit saliency priors embedded into regional features with the saliency values. Finally, the explicit semantic map and the implicit map are adaptively fused to form a pixel-accurate saliency map which uniformly covers the objects of interest. We further evaluate the proposed framework on two challenging datasets, namely, ECSSD and HKUIS. The extensive experimental results demonstrate that our method outperforms other state-of-the-art methods.

• #1361
Large-scale Subspace Clustering by Fast Regression Coding
Jun Li, Handong Zhao, Zhiqiang Tao, Yun Fu
Vision and Perception

Large-Scale Subspace Clustering (LSSC) is an interesting and important problem in the big data era. However, most existing methods (i.e., sparse or low-rank subspace clustering) cannot be directly used for solving LSSC because they suffer from high time complexity, quadratic or cubic in n (the number of data points). To overcome this limitation, we propose Fast Regression Coding (FRC) to optimize regression codes, and simultaneously train a non-linear function to approximate the codes. By using FRC, we develop an efficient Regression Coding Clustering (RCC) framework to solve the LSSC problem. It consists of sampling, FRC and clustering. RCC randomly samples a small number of data points, quickly calculates the codes of all data points by using the non-linear function learned from FRC, and employs a large-scale spectral clustering method to cluster the codes. In addition, we provide a theoretical guarantee that the non-linear function has a first-order approximation ability and a group effect. The theorem shows that the codes are easily used to construct a dividable similarity graph. Compared with the state-of-the-art LSSC methods, our model achieves better clustering results on large-scale datasets.

• #1432
Projective Low-rank Subspace Clustering via Learning Deep Encoder
Jun Li, Liu Hongfu, Handong Zhao, Yun Fu
Vision and Perception

Low-rank subspace clustering (LRSC) has been considered the state-of-the-art method on small datasets. LRSC constructs a desired similarity graph by low-rank representation (LRR), and employs spectral clustering to segment the data samples. However, effectively applying LRSC to clustering big data becomes a challenge because both LRR and spectral clustering suffer from high computational cost. To address this challenge, we create a projective low-rank subspace clustering (PLrSC) scheme for large scale clustering problems. First, a small dataset is randomly sampled from the big dataset. Second, our proposed predictive low-rank decomposition (PLD) is applied to train a deep encoder by using the small dataset, and the deep encoder is used to quickly compute the low-rank representations of all data samples. Third, fast spectral clustering is employed to segment the representations. As a non-trivial contribution, we theoretically prove the deep encoder can universally approximate the exact (or bounded) recovery of the row space. Experiments verify that our scheme outperforms the related methods on large scale datasets in a small amount of time. We achieve state-of-the-art clustering accuracy of 95.8% on MNIST using scattering convolution features.

### Tuesday 22 10:30 - 12:00 MAS-ATM - Agent Theories and Models (219)

Chair: Virginia Dignum
• #1295
Plan Explanations as Model Reconciliation: Moving Beyond Explanation as Soliloquy
Tathagata Chakraborti, Sarath Sreedharan, Yu Zhang, Subbarao Kambhampati
Agent Theories and Models

When AI systems interact with humans in the loop, they are often called on to provide explanations for their plans and behavior. Past work on plan explanations primarily involved the AI system explaining the correctness of its plan and the rationale for its decision in terms of its own model. Such soliloquy is wholly inadequate in most realistic scenarios where the humans have domain and task models that differ significantly from that used by the AI system. We posit that the explanations are best studied in light of these differing models. In particular, we show how explanation can be seen as a "model reconciliation problem" (MRP), where the AI system in effect suggests changes to the human's model, so as to make its plan be optimal with respect to that changed human model. We will study the properties of such explanations, present algorithms for automatically computing them, and evaluate the performance of the algorithms.

• #1930
Don't Bury your Head in Warnings: A Game-Theoretic Approach for Intelligent Allocation of Cyber-security Alerts
Aaron Schlenker, Haifeng Xu, Mina Guirguis, Christopher Kiekintveld, Arunesh Sinha, Milind Tambe, Solomon Sonya, Darryl Balderas, Noah Dunstatter
Agent Theories and Models

In recent years, there have been a number of successful cyber attacks on enterprise networks by malicious actors which have caused severe damage. These networks have Intrusion Detection and Prevention Systems in place to protect them, but they are notorious for producing a high volume of alerts. These alerts must be investigated by cyber analysts to determine whether they are an attack or benign. Unfortunately, there are orders of magnitude more alerts generated than there are cyber analysts to investigate them. This trend is expected to continue into the future, creating a need for tools which find optimal assignments of the incoming alerts to analysts in the presence of a strategic adversary. We address this challenge with the following four contributions: (1) a cyber screening game (CSG) model for the cyber network protection domain, (2) an NP-hardness proof for computing the optimal strategy for the defender, (3) an algorithm that finds the optimal allocation of experts to alerts in the CSG, and (4) heuristic improvements for computing allocations in CSGs that accomplish significant scale-up, which we show empirically to closely match the solution quality of the optimal algorithm.

• #3043
Pure Nash Equilibria in Online Fair Division
Martin Aleksandrov, Toby Walsh
Agent Theories and Models

We consider a fair division setting in which items arrive one by one and are allocated to agents via two existing mechanisms: LIKE and BALANCED LIKE. The LIKE mechanism is strategy-proof whereas the BALANCED LIKE mechanism is not. Whilst LIKE is strategy-proof, we show that it is not group strategy-proof. Indeed, our first main result is that no online mechanism is group strategy-proof. We then focus on pure Nash equilibria of these two mechanisms. Our second main result is that computing a pure Nash equilibrium is tractable for LIKE and intractable for BALANCED LIKE. Our third main result is that there could be multiple such profiles and counting them is also intractable even when we restrict our attention to equilibria with a specific property (e.g. envy-freeness, Pareto efficiency).

• #3293
Synchronisation Games on Hypergraphs
Sunil Simon, Dominik Wojtczak
Agent Theories and Models

We study a strategic game model on hypergraphs where players, modelled by nodes, try to coordinate or anti-coordinate their choices within certain groups of players, modelled by hyperedges. We show this model to be a strict generalisation of symmetric additively separable hedonic games to the hypergraph setting and that such games always have a pure Nash equilibrium, which can be computed in pseudo-polynomial time. Moreover, in the pure coordination setting, we show that a strong equilibrium exists and can be computed in polynomial time when the game possesses a certain acyclic structure.

• #3739
The Off-Switch Game
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
Agent Theories and Models

It is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off. As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching the system off. This is a challenge because many formulations of rational agents create strong incentives for self-preservation. This is not caused by a built-in instinct, but because a rational agent will maximize expected utility and cannot achieve whatever objective it has been given if it is dead. Our goal is to study the incentives an agent has to allow itself to be switched off. We analyze a simple game between a human H and a robot R, where H can press R’s off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H’s actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.
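
The role of uncertainty described above can be made concrete with a small numeric sketch. It is a simplification, not the paper's full model: R holds a discrete belief over the utility U of its proposed action, switching itself off is worth 0, and deferring lets H decide. The constant `human_error` rate used to model an imperfectly rational H, and all function names, are assumptions made for illustration.

```python
def expected_utility(belief):
    """Expected utility of acting directly; belief is a list of (utility, probability)."""
    return sum(p * u for u, p in belief)


def value_of_deferring(belief, human_error=0.0):
    """Expected value to R of leaving the off switch in H's hands.

    With probability 1 - human_error, H decides correctly: the action proceeds
    when its utility is positive and R is switched off (worth 0) otherwise.
    With probability human_error, H makes the opposite decision.
    """
    total = 0.0
    for u, p in belief:
        correct = max(u, 0.0)   # H allows good actions, blocks bad ones
        mistaken = min(u, 0.0)  # H blocks good actions, allows bad ones
        total += p * ((1.0 - human_error) * correct + human_error * mistaken)
    return total


def incentive_to_defer(belief, human_error=0.0):
    """Gain from deferring over R's best unilateral option (act or switch off)."""
    best_alone = max(expected_utility(belief), 0.0)
    return value_of_deferring(belief, human_error) - best_alone


# R is unsure whether its action helps or harms.
uncertain_belief = [(-1.0, 0.5), (1.0, 0.5)]
# R is certain its action is good, as an agent that takes its reward for granted would be.
certain_belief = [(1.0, 1.0)]

gain_uncertain = incentive_to_defer(uncertain_belief)
gain_certain_rational_h = incentive_to_defer(certain_belief)
gain_certain_noisy_h = incentive_to_defer(certain_belief, human_error=0.1)
```

Under these assumptions, the uncertain robot strictly gains from keeping the switch available, the certain robot gains nothing when H is perfectly rational, and the certain robot loses by deferring to an error-prone H, which matches the qualitative claims in the abstract.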

• #3924
Score Aggregation via Spectral Method
Mingyu Xiao, Yuqing Wang
Agent Theories and Models

The score aggregation problem is to find an aggregate scoring over all candidates given individual scores provided by different agents. This is a fundamental problem with a broad range of applications in social choice and many other areas. The simple and commonly used method is to sum up all scores of each candidate, which is called the sum-up method. In this paper, we give good algebraic and geometric explanations for score aggregation, and develop a spectral method for it. If we view the original scores as 'noise data', our method can find an 'optimal' aggregate scoring by minimizing the 'noise information'. We also suggest a signal-to-noise indicator to evaluate the validity of the aggregation or the consistency of the agents.
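
For reference, the sum-up baseline mentioned above can be sketched as follows. This shows only the baseline, not the paper's spectral method; the function name `sum_up` and the score matrix are hypothetical.

```python
def sum_up(scores):
    """scores[a][c] is the score agent a gives to candidate c.

    Returns the total score of each candidate (the sum-up method).
    """
    n_candidates = len(scores[0])
    return [sum(row[c] for row in scores) for c in range(n_candidates)]


# Hypothetical scores: three agents (rows) rate three candidates (columns).
scores = [
    [3, 1, 2],
    [2, 1, 3],
    [3, 2, 1],
]
totals = sum_up(scores)
# Candidates ordered from highest to lowest total score.
ranking = sorted(range(len(totals)), key=lambda c: -totals[c])
print(totals, ranking)
```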

### Tuesday 22, 10:30 - 12:00: NLP-NLS - Natural Language Semantics (220)

Chair: Yanghua Xiao
• #3061
Understanding and Exploiting Language Diversity
Fausto Giunchiglia, Khuyagbaatar Batsuren, Gabor Bella
Natural Language Semantics

The main goal of this paper is to describe a general approach to the problem of understanding linguistic phenomena, as they appear in lexical semantics, through the analysis of large-scale resources, while exploiting these results to improve the quality of the resources themselves. The main contributions are: the approach itself; a formal quantitative measure of language diversity; a set of formal quantitative measures of resource incompleteness; and a large-scale resource, called the Universal Knowledge Core (UKC), built following the proposed methodology. As a concrete example of an application, we provide an algorithm for distinguishing polysemes from homonyms, as stored in the UKC.

• #1487
Entity Suggestion with Conceptual Explanation
Yi Zhang, Yanghua Xiao, Seung-won Hwang, Haixun Wang, X. Sean Wang, Wei Wang
Natural Language Semantics

Entity Suggestion with Conceptual Explanation (ESC) refers to a type of entity acquisition query in which a user provides a set of example entities as the query and obtains in return not only some related entities but also concepts which can best explain the query and the result. ESC is useful in many applications such as related-entity recommendation and query expansion. Many example-based entity suggestion solutions are available in the existing literature. However, they are generally not aware of the concepts of query entities and thus cannot be used for conceptual explanation. In this paper, we propose two probabilistic entity suggestion models and their computation solutions. Our models and solutions take full advantage of large-scale taxonomies, which consist of isA relations between entities and concepts. With our models and solutions, we can not only find the best entities to suggest but also derive the best concepts to explain the suggestion. Extensive evaluations on real data sets justify the accuracy of our models and the efficiency of our solutions.

• #1858
Learning Sentence Representation with Guidance of Human Attention
Shaonan Wang, Jiajun Zhang, Chengqing Zong
Natural Language Semantics

Recently, much progress has been made in learning general-purpose sentence representations that can be used across domains. However, most of the existing models treat each word in a sentence equally. In contrast, extensive studies have shown that humans read sentences efficiently by making a sequence of fixations and saccades. This motivates us to improve sentence representations by assigning different weights to the vectors of the component words, which can be treated as an attention mechanism on single sentences. To that end, we propose two novel attention models, in which the attention weights are derived using significant predictors of human reading time, i.e., Surprisal, POS tags and CCG supertags. Extensive experiments demonstrate that the proposed methods significantly improve upon the state-of-the-art sentence representation models.

• #3027
Dynamic Compositional Neural Networks over Tree Structure
Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Natural Language Semantics

Tree-structured neural networks have proven to be effective in learning semantic representations by exploiting syntactic information. In spite of their success, most existing models suffer from the underfitting problem: they recursively use the same shared compositional function throughout the whole compositional process and lack expressive power due to their inability to capture the richness of compositionality. In this paper, we address this issue by introducing the dynamic compositional neural networks over tree structure (DC-TreeNN), in which the compositional function is dynamically generated by a meta network. The role of the meta network is to capture the metaknowledge across the different compositional rules and formulate them. Experimental results on two typical tasks show the effectiveness of the proposed models.

• #4054
Lexical Sememe Prediction via Word Embeddings and Matrix Factorization
Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, Maosong Sun
Natural Language Semantics

Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words and formed linguistic knowledge bases. However, manual construction is time-consuming and labor-intensive, with significant annotation inconsistency and noise. In this paper, we explore, for the first time, automatically predicting lexical sememes based on the semantic meanings of words encoded by word embeddings. Moreover, we apply matrix factorization to learn semantic relations between sememes and words. In experiments, we take a real-world sememe knowledge base, HowNet, for training and evaluation, and the results reveal the effectiveness of our method for lexical sememe prediction. Our method will be of great use for annotation verification of existing noisy sememe knowledge bases and annotation suggestion of new words and phrases.

• #3660
Cognitive-Inspired Conversational-Strategy Reasoner for Socially-Aware Agents
Oscar J. Romero, Ran Zhao, Justine Cassell
Natural Language Semantics

In this work we propose a novel module for a dialogue system that allows a conversational agent to utter phrases that do not just meet the system's task intentions, but also work towards achieving the system's social intentions. The module - a Social Reasoner - takes the task goals the system must achieve and decides the appropriate conversational style and strategy with which the dialogue system describes the information the user desires so as to boost the strength of the relationship between the user and system (rapport), and therefore the user's engagement and willingness to divulge the information the agent needs to efficiently and effectively achieve the user's goals. Our Social Reasoner is inspired both by analysis of empirical data of friend and stranger dyads engaged in a task, and by prior literature in fields as diverse as reasoning processes in cognitive and social psychology, decision-making, sociolinguistics and conversational analysis. Our experiments demonstrated that, when using the Social Reasoner in a Dialogue System, the rapport level between the user and system increases by more than 35% in comparison with those cases where no Social Reasoner is used.

### Tuesday 22, 10:30 - 12:30: EAR-1 - Early Career 1 (Plenary 2)

Chair: Edith Elkind
• #21
Game Theoretic Analysis of Security and Sustainability
Bo An
Early Career 1

Computational game theory has become a powerful tool to address critical issues in security and sustainability. Casting the security resource allocation problem as a Stackelberg game, novel algorithms have been developed to provide randomized security resource allocations. These algorithms have led to deployed security-game based decision aids for many real-world security domains including infrastructure security and wildlife protection. We contribute to this community by addressing several major research challenges in complex security resource allocation, including dynamic payoffs, uncertainty, protection externality, games on networks, and strategic secrecy. We also analyze optimal security resource allocation in many potential application domains including cyber security. Furthermore, we apply game theory to reasoning about optimal policies for taxi pricing schemes and for EV charging placement and pricing.

• #24
Committee Scoring Rules: A Call to Arms
Piotr Faliszewski
Early Career 1

Committee scoring rules are a class of voting rules used to select sets of candidates based on the preferences of the voters. The goal of this paper is to present this class and to invite researchers to study its properties (computational and axiomatic alike).

• #30
Reinforcement mechanism design
Pingzhong Tang
Early Career 1

We put forward a modeling and algorithmic framework to design and optimize mechanisms in dynamic industrial environments where a designer can make use of the data generated in the process to automatically improve future design. Our solution, coined reinforcement mechanism design, is rooted in game theory but incorporates recent AI techniques to get rid of nonrealistic modeling assumptions and to make automated optimization feasible. We instantiate our framework on the key application scenarios of Baidu and Taobao, two of the largest mobile app companies in China. For the Taobao case, our framework automatically designs mechanisms that allocate buyer impressions for the e-commerce website; for the Baidu case, our framework automatically designs dynamic reserve pricing schemes of advertisement auctions of the search engine. Experiments show that our solutions outperform the state-of-the-art alternatives and those currently deployed, under both scenarios.

• #33
Securing and scaling cryptocurrencies
Aviv Zohar
Early Career 1

Bitcoin, a protocol for a new permissionless decentralized digital currency, hailed the arrival of a new application domain for computer science. Following Bitcoin's arrival, a series of innovations derived from the state of the art in several fields has been applied to cryptocurrencies, and has been slowly reshaping monetary and financial instruments on public distributed ledgers. It was soon clear, however, that Bitcoin and similar cryptocurrencies still require additional improvements. This challenging domain presents researchers in the field with new and exciting questions. I provide examples from two main research threads, related to the scalability of the protocol and to its underlying incentives.

### Tuesday 22, 10:30 - 12:30: SIS-KR - Sister Conference Track: Knowledge Representation (203)

Chair: Michael Thielscher
• #1330
A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality
Jasmin Christian Blanchette, Mathias Fleury, Christoph Weidenbach
Sister Conference Track: Knowledge Representation

We developed a formal framework for SAT solving using the Isabelle/HOL proof assistant. Through a chain of refinements, an abstract CDCL (conflict-driven clause learning) calculus is connected to a SAT solver that always terminates with correct answers. The framework offers a convenient way to prove theorems about the SAT solver and experiment with variants of the calculus. Compared with earlier verifications, the main novelties are the inclusion of the CDCL rules for forget, restart, and incremental solving and the use of refinement.

• #4215
Unsatisfiable Core Shrinking for Anytime Answer Set Optimization
Mario Alviano, Carmine Dodaro
Sister Conference Track: Knowledge Representation

Efficient algorithms for the computation of optimum stable models are based on unsatisfiable core analysis. However, these algorithms essentially run to completion, providing few or even no suboptimal stable models. This drawback can be circumvented by shrinking unsatisfiable cores. Interestingly, the resulting anytime algorithm can solve more instances than the original algorithm.

• #4234
KSP: A Resolution-based Prover for Multimodal K, Abridged Report
Cláudia Nalon, Ullrich Hustadt, Clare Dixon
Sister Conference Track: Knowledge Representation

In this paper, we briefly describe an implementation of a hyper-resolution-based calculus for the propositional basic multimodal logic, Kn. The prover, KSP, is designed to support experimentation with different combinations of refinements for its basic calculus. The prover allows for both local and global reasoning. We present an experimental evaluation that compares KSP with a range of existing reasoners for Kn.

• #4240
Concerning Referring Expressions in Query Answers
Alexander Borgida, David Toman, Grant Weddell
Sister Conference Track: Knowledge Representation

A referring expression in linguistics is a noun phrase that identifies individuals to listeners. In the context of a query over a first order knowledge base, referring expressions to answers are usually constant symbols. This paper motivates and initiates the exploration of allowing more general formulas, called singular referring expressions, to replace constants in this role. Referring expression types play a novel and significant role in analyzing the properties of candidate expressions.

• #4242
First-Order Modular Logic Programs and their Conservative Extensions (Extended Abstract)
Amelia Harrison, Yuliya Lierler
Sister Conference Track: Knowledge Representation

This paper introduces first-order modular logic programs, which provide a way of viewing answer set programs as consisting of many independent, meaningful modules. We also present conservative extensions of such programs. This concept helps to identify strong relationships between modular programs as well as between traditional programs. For example, we illustrate how the notion of a conservative extension can be used to justify the common projection rewriting. This is a short version of a paper that was presented at the 32nd International Conference on Logic Programming (Harrison and Lierler, 2016).

• #4271
nanoCoP: Natural Non-clausal Theorem Proving
Jens Otten
Sister Conference Track: Knowledge Representation

Most efficient fully automated theorem provers implement proof search calculi that require the input formula to be in a clausal form, i.e. disjunctive or conjunctive normal form. The translation into clausal form introduces a significant overhead to the proof search and modifies the structure of the original formula. Translating a proof in clausal form back into a more readable non-clausal proof of the original formula is not straightforward. This paper presents a non-clausal automated theorem prover for classical first-order logic. It is based on a non-clausal connection calculus and implemented with a few lines of Prolog code. Working entirely on the original structure of the input formula yields not only a speed up of the proof search, but the resulting non-clausal proofs are also shorter.

### Tuesday 22, 10:30 - 12:30: Competition (206)

Chair: Shuang (Catherine) Wu
• Alibaba
Competition
### Tuesday 22, 14:00 - 15:00: Invited Talk (Plenary 2)

Chair: Craig Knoblock
• As We Train the AI, so the AI Can Train Us
Marti Hearst
Invited Talk
### Tuesday 22, 14:00 - 15:00: Invited Talk (203-204)

Chair: Thomas Eiter
• Swift Logic for Big Data and Knowledge Graphs
Georg Gottlob
Invited Talk
### Tuesday 22, 15:00 - 16:00: Panel (Plenary 2)

Chair: Michael Luck
• AI and Autonomy: Current Opportunity or Future Threat?
Panelists: David Danks, Marta Poblet, Toby Walsh.
Panel
### Tuesday 22, 15:00 - 16:00: ML-RL - Relational Learning (203-204)

Chair: Parag Singla
• #3270
Clustering-Based Relational Unsupervised Representation Learning with an Explicit Distributed Representation
Sebastijan Dumancic, Hendrik Blockeel
Relational Learning

The goal of unsupervised representation learning is to extract a new representation of data, such that solving many different tasks becomes easier. Existing methods typically focus on vectorized data and offer little support for relational data, which additionally describes relationships among instances. In this work we introduce an approach for relational unsupervised representation learning. Viewing a relational dataset as a hypergraph, new features are obtained by clustering vertices and hyperedges. To find a representation suited for many relational learning tasks, a wide range of similarities between relational objects is considered, e.g. feature and structural similarities. We experimentally evaluate the proposed approach and show that models learned on such latent representations perform better, have lower complexity, and outperform the existing approaches on classification tasks.

• #1162
Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment
Muhao Chen, Yingtao Tian, Mohan Yang, Carlo Zaniolo
Relational Learning

Many recent works have demonstrated the benefits of knowledge graph embeddings in completing monolingual knowledge graphs. Inasmuch as related knowledge bases are built in several different languages, achieving cross-lingual knowledge alignment will help people in constructing a coherent knowledge base, and assist machines in dealing with different expressions of entity relationships across diverse human languages. Unfortunately, achieving this highly desirable cross-lingual alignment by human labor is very costly and error-prone. Thus, we propose MTransE, a translation-based model for multilingual knowledge graph embeddings, to provide a simple and automated solution. By encoding entities and relations of each language in a separated embedding space, MTransE provides transitions for each embedding vector to its cross-lingual counterparts in other spaces, while preserving the functionalities of monolingual embeddings. We deploy three different techniques to represent cross-lingual transitions, namely axis calibration, translation vectors, and linear transformations, and derive five variants for MTransE using different loss functions. Our models can be trained on partially aligned graphs, where just a small portion of triples are aligned with their cross-lingual counterparts. The experiments on cross-lingual entity matching and triple-wise alignment verification show promising results, with some variants consistently outperforming others on different tasks. We also explore how MTransE preserves the key properties of its monolingual counterpart.

• #1326
When Does Label Propagation Fail? A View from a Network Generative Model
Yuto Yamaguchi, Kohei Hayashi
Relational Learning

What kinds of data does Label Propagation (LP) work best on? Can we justify the solution of LP from a theoretical standpoint? LP is a semi-supervised learning algorithm that is widely used to predict unobserved node labels on a network (e.g., a user's gender on an SNS). Despite its importance, its theoretical properties remain mostly unexplored. In this paper, we answer the above questions by interpreting LP from a statistical viewpoint. As our main result, we identify the network generative model behind the discretized version of LP (DLP), and we show that under specific conditions the solution of DLP is equal to the maximum a posteriori estimate of that generative model. Our main result reveals the critical limitations of LP. Specifically, we discover that LP would not work best on networks with (1) disassortative node labels, (2) clusters having different edge densities, (3) non-uniform label distributions, or (4) unreliable node labels provided. Our experiments under a variety of settings support our theoretical results.
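
As background, label propagation on a small network can be sketched as below. This is a common textbook formulation (labelled nodes are clamped, every other node repeatedly takes the mean score of its neighbours), offered as an assumption for illustration; it is not the discretized variant (DLP) analysed in the paper, and the function name and graph are hypothetical.

```python
def label_propagation(adj, seeds, iters=100):
    """Propagate binary labels over an undirected graph.

    adj:   dict mapping each node to a list of its neighbours
    seeds: dict mapping labelled nodes to 0.0 or 1.0
    Returns a dict mapping each node to a score in [0, 1].
    """
    # Unlabelled nodes start from an uninformative score of 0.5.
    score = {v: seeds.get(v, 0.5) for v in adj}
    for _ in range(iters):
        new = {}
        for v in adj:
            if v in seeds:
                new[v] = seeds[v]  # clamp observed labels
            elif adj[v]:
                # Average the current scores of the neighbours.
                new[v] = sum(score[u] for u in adj[v]) / len(adj[v])
            else:
                new[v] = score[v]  # isolated node keeps its score
        score = new
    return score


# Hypothetical path graph 0-1-2-3 with both endpoints labelled.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = label_propagation(adj, {0: 0.0, 3: 1.0})
print(scores)
```

In this sketch the scores vary smoothly along edges, so neighbouring nodes end up with similar labels. That built-in smoothness is one intuition for why the disassortative case listed above is problematic.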

• #1818
Tensor Decomposition with Missing Indices
Yuto Yamaguchi, Kohei Hayashi
Relational Learning

How can we decompose a data tensor if the indices are partially missing? Tensor decomposition is a fundamental tool for analyzing tensor data. Suppose, for example, we have a 3rd-order tensor X where each element X_ijk takes the value 1 if user i posts word j at location k on Twitter. Standard tensor decomposition expects all the indices to be observed but, in some tweets, location k can be missing. In this paper, we study a tensor decomposition problem where the indices (i, j, or k) of some observed elements are partially missing. To address this problem, we propose a probabilistic tensor decomposition model that handles missing indices as latent variables. To infer them, we derive an algorithm based on stochastic variational inference, which makes it possible to leverage the information from the incomplete data scalably. The experiments on both synthetic and real datasets show that the proposed method achieves higher accuracy in the tensor completion task than baselines that cannot handle missing indices.

### Tuesday 22, 15:00 - 16:00: MAS-CG - Cooperative Games (210)

Chair: Nicholas Mattei
• #1629
How to Form Winning Coalitions in Mixed Human-Computer Settings
Yair Zick, Kobi Gal, Yoram Bachrach, Moshe Mash
Cooperative Games

Despite the prevalence of weighted voting in the real world, there has been relatively little work studying real people's behavior in such settings. This paper proposes a new negotiation game, based on the weighted voting paradigm in cooperative games, where players need to form coalitions and agree on how to share the gains. We show that solution concepts from cooperative game theory (in particular, an extension of the Deegan-Packel Index) provide a good prediction of people's decisions to join a given coalition. With this insight in mind, we design an agent that combines predictive analytics with decision theory to make offers to people in the game. We show that the agent was able to obtain higher shares from coalitions than did people playing other people, without reducing the acceptance rate of its offers. These results demonstrate the potential of incorporating concepts from cooperative game theory in the design of negotiating agents.

• #2290
Attachment Centrality for Weighted Graphs
Jadwiga Sosnowska, Oskar Skibski
Cooperative Games

Measuring how central nodes are in terms of connecting a network has recently received increasing attention in the literature. While a few dedicated centrality measures have been proposed, Skibski et al. [2016] showed that the Attachment Centrality is the only one that satisfies certain natural axioms desirable for connectivity. Unfortunately, the Attachment Centrality is defined only for unweighted graphs which makes this measure ill-fitted for various applications. For instance, covert networks are typically weighted, where the weights carry additional intelligence available about criminals or terrorists and the links between them. To analyse such settings, in this paper we extend the Attachment Centrality to node-weighted and edge-weighted graphs. By an axiomatic analysis, we show that the Attachment Centrality is closely related to the Degree Centrality in weighted graphs.

• #1753
The Condorcet Principle for Multiwinner Elections: From Shortlisting to Proportionality
Haris Aziz, Edith Elkind, Piotr Faliszewski, Martin Lackner, Piotr Skowron
Cooperative Games

We study two notions of stability in multiwinner elections that are based on the Condorcet criterion. The first notion was introduced by Gehrlein and is majoritarian in spirit. The second one, local stability, is introduced in this paper, and focuses on voter representation. The goal of this paper is to explore these two notions, their implications on restricted domains, and the computational complexity of rules that are consistent with them.

• #3857
Core Stability in Hedonic Games among Friends and Enemies: Impact of Neutrals
Kazunori Ohta, Nathanaël Barrot, Anisse Ismaili, Yuko Sakurai, Makoto Yokoo
Cooperative Games

We investigate hedonic games under enemies aversion and friends appreciation, where every agent considers other agents as either a friend or an enemy. We extend these simple preferences by allowing each agent to also consider other agents to be neutral. Neutrals have no impact on her preference, as in a graphical hedonic game. Surprisingly, we discover that neutral agents do not simplify matters, but cause complexity. We prove that the core can be empty under enemies aversion and the strict core can be empty under friends appreciation. Furthermore, we show that under both preferences, deciding whether the strict core is non-empty is NP^NP-complete. This complexity extends to the core under enemies aversion. We also show that under friends appreciation, we can always find a core stable coalition structure in polynomial time.

### Tuesday 22, 15:00 - 16:00: MT-CBH - Computational Biology and eHealth (211)

Chair: Daniel Boley
• #1840
The DNA Word Design Problem: A New Constraint Model and New Results
Michael Codish, Michael Frank, Vitaly Lagoon
Computational Biology and eHealth

A fundamental problem in coding theory concerns the computation of the maximum cardinality of a set S of length n code words over an alphabet of size q, such that every pair of code words has Hamming distance at least d, and the set of additional constraints U on S is satisfied. This problem has application in several areas, one of which is the design of DNA codes where q=4 and the alphabet is {A,C,G,T}. We describe a new constraint model for this problem and demonstrate that it improves on previous solutions (computes better lower bounds) for various instances of the problem. Our approach is based on a clustering of DNA words into small sets of words. Solutions are then obtained as the union of such clusters. Our approach is SAT based: we specify constraints on clusters of DNA words and solve these using a Boolean satisfiability solver.
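
The distance constraint in this problem can be made concrete with a small sketch. It checks the pairwise Hamming-distance condition and builds a code greedily, which yields a lower bound on the maximum cardinality. The greedy search is a stand-in chosen for illustration, not the paper's SAT-based cluster model, and the additional constraints U are omitted; all function names are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def is_code(words, d):
    """True if every pair of words has Hamming distance at least d."""
    return all(hamming(a, b) >= d for a, b in combinations(words, 2))


def greedy_code(n, d, alphabet="ACGT"):
    """Scan all length-n words in lexicographic order and keep each word
    that is at distance >= d from every word kept so far."""
    code = []
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        if all(hamming(word, c) >= d for c in code):
            code.append(word)
    return code


code = greedy_code(3, 2)
print(len(code), code)
```

Any code returned this way is valid, so its size is a lower bound for the instance (n, d) without the extra constraints. Better bounds generally require a stronger search, such as the SAT-based approach the abstract describes.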

• #3199
Deep Neural Networks for High Dimension, Low Sample Size Data
Bo Liu, Ying Wei, Yu Zhang, Qiang Yang
Computational Biology and eHealth

Deep neural networks (DNN) have achieved breakthroughs in applications with large sample size. However, when facing high dimension, low sample size (HDLSS) data, such as the phenotype prediction problem using genetic data in bioinformatics, DNN suffers from overfitting and high-variance gradients. In this paper, we propose a DNN model tailored for the HDLSS data, named Deep Neural Pursuit (DNP). DNP selects a subset of high dimensional features for the alleviation of overfitting and takes the average over multiple dropouts to calculate gradients with low variance. As the first DNN method applied on the HDLSS data, DNP enjoys the advantages of the high nonlinearity, the robustness to high dimensionality, the capability of learning from a small number of samples, the stability in feature selection, and the end-to-end training. We demonstrate these advantages of DNP via empirical results on both synthetic and real-world biological datasets.

• #3693
Fast Sparse Gaussian Markov Random Fields Learning Based on Cholesky Factorization
Ivan Stojkovic, Vladisav Jelisavcic, Veljko Milutinovic, Zoran Obradovic
Computational Biology and eHealth

Learning the sparse Gaussian Markov Random Field, or conversely, estimating the sparse inverse covariance matrix is an approach to uncover the underlying dependency structure in data. Most of the current methods solve the problem by optimizing the maximum likelihood objective with a Laplace prior L1 on entries of a precision matrix. We propose a novel objective with a regularization term which penalizes an approximate product of the Cholesky decomposed precision matrix. This new reparametrization of the penalty term allows efficient coordinate descent optimization, which in synergy with an active set approach results in a very fast and efficient method for learning the sparse inverse covariance matrix. We evaluated the speed and solution quality of the newly proposed SCHL method on problems consisting of up to 24,840 variables. Our approach was several times faster than three state-of-the-art approaches. We also demonstrate that SCHL can be used to discover interpretable networks, by applying it to a high impact problem from the health informatics domain.

• #3636
Predicting Alzheimer's Disease Cognitive Assessment via Robust Low-Rank Structured Sparse Model
Jie Xu, Cheng Deng, Xinbo Gao, Dinggang Shen, Heng Huang
Computational Biology and eHealth

Alzheimer's disease (AD) is a neurodegenerative disorder with slow onset, which could result in the deterioration of the duration of persistent neurological dysfunction. Identifying the informative longitudinal phenotypic neuroimaging markers and predicting cognitive measures are crucial for recognizing AD at an early stage. Many existing models relate imaging measures to cognitive status using regression models, but they do not take full account of the interaction between cognitive scores. In this paper, we propose a robust low-rank structured sparse regression method (RLSR) to address this issue. The proposed model simultaneously selects effective features and learns the underlying structure between cognitive scores by utilizing novel mixed structured sparsity inducing norms and low-rank approximation. In addition, an efficient algorithm is derived to solve the proposed non-smooth objective function with proven convergence. Empirical studies on cognitive data of the ADNI cohort demonstrate the superior performance of the proposed method.

### Tuesday 22, 15:00 - 16:00: ML-TSDS1 - Time Series and Data Streams 1 (212)

Chair: Maria Gini
• #1719
Retaining Data from Streams of Social Platforms with Minimal Regret
Nguyen Thanh Tam, Matthias Weidlich, Duong Chi Thang, Hongzhi Yin, Nguyen Quoc Viet Hung
Time Series and Data Streams 1

Today's social platforms, such as Twitter and Facebook, continuously generate massive volumes of data. The resulting data streams exceed any reasonable limit for permanent storage, especially since data is often redundant, overlapping, sparse, and generally of low value. This calls for means to retain solely a small fraction of the data in an online manner. In this paper, we propose techniques to effectively decide which data to retain, such that the induced loss of information, the regret of neglecting certain data, is minimized. These techniques not only enable efficient processing of massive streaming data, but are also adaptive and address the dynamic nature of social media. Experiments on large-scale real-world datasets illustrate the feasibility of our approach in terms of both runtime and information quality.

• #3592
Disambiguating Energy Disaggregation: A Collective Probabilistic Approach
Sabina Tomkins, Jay Pujara, Lise Getoor
Time Series and Data Streams 1

Reducing household energy usage is a priority for improving the resiliency and stability of the power grid and decreasing the negative impact of energy consumption on the environment and public health. Relevant and timely feedback about the power consumption of specific appliances can help household residents to reduce their energy demand. Given only a total energy reading, such as that collected from a residential meter, energy disaggregation strives to discover the consumption of individual appliances. Existing disaggregation algorithms are computationally inefficient and rely heavily on high-resolution ground truth data. We introduce a probabilistic framework which infers the energy consumption of individual appliances using a hinge-loss Markov random field (HL-MRF), which admits highly scalable inference. To further enhance efficiency, we introduce a temporal representation which leverages state duration. We also explore how contextual information impacts solution quality with low-resolution data. Our framework is flexible in its ability to incorporate additional constraints; by constraining appliance usage with context and duration we can better disambiguate appliances with similar energy consumption profiles. We demonstrate the effectiveness of our framework on two public real-world datasets, reducing the error relative to a previous state-of-the-art method by as much as 50%.

• #896
Modelling the Working Week for Multi-Step Forecasting using Gaussian Process Regression
Pasan Karunaratne, Masud Moshtaghi, Shanika Karunasekera, Aaron Harwood, Trevor Cohn
Time Series and Data Streams 1

In time-series forecasting, regression is a popular method, with Gaussian Process Regression widely held to be the state of the art. The versatility of Gaussian Processes has led to them being used in many varied application domains. However, though many real-world applications involve data which follows a working-week structure, where weekends exhibit substantially different behavior to weekdays, methods for explicit modelling of working-week effects in Gaussian Process Regression models have not been proposed. Not explicitly modelling the working week fails to incorporate a significant source of information which can be invaluable in forecasting scenarios. In this work we provide novel kernel-combination methods to explicitly model working-week effects in time-series data for more accurate predictions using Gaussian Process Regression. Further, we demonstrate that prediction accuracy can be improved by constraining the non-convex optimization process of finding optimal hyperparameter values. We validate the effectiveness of our methods by performing multi-step prediction on two real-world publicly available time-series datasets: one relating to electricity Smart Meter data of the University of Melbourne, and the other relating to the counts of pedestrians in the City of Melbourne.

• #1941
Stochastic Online Anomaly Analysis for Streaming Time Series
Zhao Xu, Kristian Kersting, Lorenzo von Ritter
Time Series and Data Streams 1

Identifying patterns in time series that exhibit anomalous behavior is of increasing importance in many domains, such as financial and Web data analysis. In real applications, time series data often arrive continuously, and usually only a single scan is allowed through the data. Batch learning and retrospective segmentation methods are not well suited to such scenarios. In this paper, we present an online nonparametric Bayesian method OLAD for anomaly analysis in streaming time series. Moreover, we develop a novel and efficient online learning approach for the OLAD model based on stochastic gradient descent. The proposed method can effectively learn the underlying dynamics of anomaly-contaminated heavy-tailed time series and identify potential anomalous events. Empirical analysis on real-world datasets demonstrates the effectiveness of our method.

### Tuesday 22 15:00 - 16:00 KR-CMR - Common Sense Reasoning (213)

Chair: Anthony Cohn
• #1257
Explicit Knowledge-based Reasoning for Visual Question Answering
Peng Wang, Qi Wu, Chunhua Shen, Anthony Dick, Anton van den Hengel
Common Sense Reasoning

We describe a method for visual question answering which is capable of reasoning about an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in the image, but can explain the reasoning by which it developed its answer. It is capable of answering far more complex questions than the predominant long short-term memory-based approach, and outperforms it significantly in testing. We also provide a dataset and a protocol by which to evaluate general visual question answering methods.

• #2214
Induction of Interpretable Possibilistic Logic Theories from Relational Data
Ondrej Kuzelka, Jesse Davis, Steven Schockaert
Common Sense Reasoning

The field of statistical relational learning (SRL) is concerned with learning probabilistic models from relational data. Learned SRL models are typically represented using some kind of weighted logical formulas, which makes them considerably more interpretable than those obtained by e.g. neural networks. In practice, however, these models are often still difficult to interpret correctly, as they can contain many formulas that interact in non-trivial ways and weights do not always have an intuitive meaning. To address this, we propose a new SRL method which uses possibilistic logic to encode relational models. Learned models are then essentially stratified classical theories, which explicitly encode what can be derived with a given level of certainty. Compared to Markov Logic Networks (MLNs), our method is faster and produces considerably more interpretable models.

• #2450
What Can You Do with a Rock? Affordance Extraction via Word Embeddings
Nancy Fulda, Daniel Ricks, Ben Murdoch, David Wingate
Common Sense Reasoning

Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance extraction is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a tagged Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance in most cases. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.
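As a rough illustration of what querying word vectors "using linear algebra" can look like, the sketch below averages verb-minus-noun offsets over a few exemplar pairs and ranks candidate verbs by cosine similarity to the shifted query noun. The three-dimensional vectors, the exemplar pairs, and the function names are all hypothetical; this is not the authors' extraction procedure, and real embeddings would come from a trained model.

```python
import math

# Hypothetical toy embeddings; real vectors would come from a trained model.
EMBEDDINGS = {
    "sword": [1.0, 0.0, 0.0],
    "apple": [0.0, 1.0, 0.0],
    "axe":   [0.9, 0.1, 0.0],
    "swing": [1.0, 0.0, 1.0],
    "eat":   [0.0, 1.0, 1.0],
}
VERBS = ["swing", "eat"]
# Hypothetical exemplar (noun, verb) pairs that define the "affordance" direction.
EXEMPLARS = [("sword", "swing"), ("apple", "eat")]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def affordance_offset(pairs):
    """Mean of (verb - noun) over the exemplar pairs."""
    dim = len(EMBEDDINGS[pairs[0][0]])
    offset = [0.0] * dim
    for noun, verb in pairs:
        for i in range(dim):
            offset[i] += (EMBEDDINGS[verb][i] - EMBEDDINGS[noun][i]) / len(pairs)
    return offset


def rank_verbs(noun):
    """Rank candidate verbs by similarity to noun + affordance offset."""
    offset = affordance_offset(EXEMPLARS)
    target = [x + d for x, d in zip(EMBEDDINGS[noun], offset)]
    return sorted(VERBS, key=lambda v: cosine(target, EMBEDDINGS[v]), reverse=True)


print(rank_verbs("axe"))  # ['swing', 'eat']
```

An agent could use such a ranking to keep only the top few verbs per object, which is one way to prune a large action space.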

• #3792
How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval
Rodrigo Toro Icarte, Jorge A. Baier, Cristian Ruz, Alvaro Soto
Common Sense Reasoning

The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: "a ball is used by a football player", "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies—specifically, MIT's ConceptNet ontology—can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations.

### Tuesday 22 15:00 - 16:00 CS-CS2 - Constraint Satisfaction 2 (216)

Chair: Mateu Villaret
• #3049
Solving Integer Linear Programs with a Small Number of Global Variables and Constraints
Pavel Dvořák, Eduard Eiben, Robert Ganian, Dušan Knop, Sebastian Ordyniak
Constraint Satisfaction 2

Integer Linear Programming (ILP) has a broad range of applications in various areas of artificial intelligence. Yet in spite of recent advances, we still lack a thorough understanding of which structural restrictions make ILP tractable. Here we study ILP instances consisting of a small number of "global" variables and/or constraints such that the remaining part of the instance consists of small and otherwise independent components; this is captured in terms of a structural measure we call fracture backdoors which generalizes, for instance, the well-studied class of N-fold ILP instances. Our main contributions can be divided into three parts. First, we formally develop fracture backdoors and obtain exact and approximation algorithms for computing these. Second, we exploit these backdoors to develop several new parameterized algorithms for ILP; the performance of these algorithms will naturally scale based on the number of global variables or constraints in the instance. Finally, we complement the developed algorithms with matching lower bounds. Altogether, our results paint a near-complete complexity landscape of ILP with respect to fracture backdoors.

• #4013
Efficiency Through Procrastination: Approximately Optimal Algorithm Configuration with Runtime Guarantees
Robert Kleinberg, Kevin Leyton-Brown, Brendan Lucier
Constraint Satisfaction 2

Algorithm configuration methods have achieved much practical success, but to date have not been backed by meaningful performance guarantees. We address this gap with a new algorithm configuration framework, Structured Procrastination. With high probability and nearly as quickly as possible in the worst case, our framework finds an algorithm configuration that provably achieves near optimal performance. Moreover, its running time requirements asymptotically dominate those of existing methods.

• #2619
An Effective Learnt Clause Minimization Approach for CDCL SAT Solvers
Mao Luo, Chu-Min Li, Fan Xiao, Felip Manyà, Zhipeng Lü
Constraint Satisfaction 2

Learnt clauses in CDCL SAT solvers often contain redundant literals. This may have a negative impact on performance because redundant literals may deteriorate both the effectiveness of Boolean constraint propagation and the quality of subsequent learnt clauses. To overcome this drawback, we define a new inprocessing SAT approach which eliminates redundant literals from learnt clauses by applying Boolean constraint propagation. Learnt clause minimization is activated before the SAT solver triggers some selected restarts, and affects only some learnt clauses during the search process. Moreover, we conducted an empirical evaluation on instances coming from the hard combinatorial and application categories of recent SAT competitions. The results show that a remarkable number of additional instances are solved when the approach is incorporated into five of the best performing CDCL SAT solvers (Glucose, TC_Glucose, COMiniSatPS, MapleCOMSPS and MapleCOMSPS_LRB).
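The general idea of removing redundant literals with Boolean constraint propagation can be sketched as follows. This is a generic, vivification-style illustration under simplifying assumptions (a naive propagation loop, DIMACS-style integer literals, hypothetical function names), not the authors' exact procedure or the code of any of the solvers named above.

```python
def propagate(clauses, assignment):
    """Naive unit propagation; extends `assignment` (a set of true literals).

    Returns False if some clause becomes falsified, True otherwise.
    """
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return False  # every literal is false: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # unit clause forces the literal
                changed = True
    return True


def minimize_clause(clauses, learnt):
    """Shorten `learnt`, assumed to be implied by `clauses`."""
    assignment = set()
    kept = []
    for lit in learnt:
        if lit in assignment:
            # lit follows from negating the kept literals: kept + [lit] is implied.
            kept.append(lit)
            return kept
        if -lit in assignment:
            continue  # lit is already false under the assumptions: redundant
        kept.append(lit)
        assignment.add(-lit)
        if not propagate(clauses, assignment):
            return kept  # conflict: the kept literals alone form an implied clause
    return kept


# Resolving the last two clauses on variable 4 yields the learnt clause [1, 2, 3].
formula = [[1, -2], [2, 3, 4], [1, 3, -4]]
print(minimize_clause(formula, [1, 2, 3]))  # [1, 3]
```

In the example, negating literal 1 forces literal 2 to be false, so 2 contributes nothing to the clause and is dropped.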

• #4108
Efficient Weighted Model Integration via SMT-Based Predicate Abstraction
Paolo Morettin, Andrea Passerini, Roberto Sebastiani
Constraint Satisfaction 2

Weighted model integration (WMI) is a recent formalism generalizing weighted model counting (WMC) to run probabilistic inference over hybrid domains, characterized by both discrete and continuous variables and relationships between them. Albeit powerful, the original formulation of WMI suffers from some theoretical limitations, and it is computationally very demanding as it requires explicitly enumerating all possible models to be integrated over. In this paper we present a novel general notion of WMI, which fixes the theoretical limitations and allows for exploiting the power of SMT-based predicate abstraction techniques. A novel algorithm combines a strong reduction in the number of models to be integrated over with their efficient enumeration. Experimental results on synthetic and real-world data show drastic computational improvements over the original WMI formulation as well as existing alternatives for hybrid inference.

### Tuesday 22 15:00 - 16:00 UAI-API2 - Approximate Probabilistic Inference 2 (217)

Chair: Guy Van den Broeck
• #3667
The Mixing of Markov Chains on Linear Extensions in Practice
Topi Talvitie, Teppo Niinimäki, Mikko Koivisto
Approximate Probabilistic Inference 2

We investigate almost uniform sampling from the set of linear extensions of a given partial order. The most efficient schemes stem from Markov chains whose mixing time bounds are polynomial, yet impractically large. We show that, on instances one encounters in practice, the actual mixing times can be much smaller than the worst-case bounds, and particularly so for a novel Markov chain we put forward. We circumvent the inherent hardness of estimating standard mixing times by introducing a refined notion, which admits estimation for moderate-size partial orders. Our empirical results suggest that the Markov chain approach to sample linear extensions can be made to scale well in practice, provided that the actual mixing times can be realized by instance-sensitive upper bounds or termination rules. Examples of the latter include existing perfect simulation algorithms, whose running times in our experiments follow the actual mixing times of certain chains, albeit with significant overhead.
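For readers unfamiliar with this family of samplers, here is a minimal sketch of a classic lazy adjacent-transposition chain on linear extensions (often attributed to Karzanov and Khachiyan). It is not the novel chain the abstract refers to; the poset, the function names, and the step count are illustrative assumptions.

```python
import random


def chain_step(order, precedes, rng):
    """One lazy adjacent-transposition step, applied in place.

    `precedes` is a set of pairs (a, b) meaning a must come before b; it is
    assumed to contain at least every cover relation of the partial order.
    """
    if rng.random() < 0.5:
        return  # lazy step: stay put with probability 1/2
    i = rng.randrange(len(order) - 1)
    a, b = order[i], order[i + 1]
    if (a, b) not in precedes:
        order[i], order[i + 1] = b, a  # swap only if a and b are incomparable


def is_linear_extension(order, precedes):
    """Check that every required precedence is respected."""
    position = {x: i for i, x in enumerate(order)}
    return all(position[a] < position[b] for a, b in precedes)


def sample(start, precedes, steps, seed=0):
    """Run the chain for `steps` steps from a valid starting extension."""
    rng = random.Random(seed)
    order = list(start)
    for _ in range(steps):
        chain_step(order, precedes, rng)
    return order


# Hypothetical diamond poset: a < b, a < c, b < d, c < d.
PRECEDES = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
result = sample(["a", "b", "c", "d"], PRECEDES, steps=1000)
print(result, is_linear_extension(result, PRECEDES))
```

How many steps are needed before the output is close to uniform is exactly the mixing-time question the abstract studies; the value 1000 here is arbitrary.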

• #1335
Approximating Discrete Probability Distribution of Image Emotions by Multi-Modal Features Fusion
Sicheng Zhao, Guiguang Ding, Yue Gao, Jungong Han
Approximate Probabilistic Inference 2

Existing works on image emotion recognition mainly assigned the dominant emotion category or average dimension values to an image based on the assumption that viewers can reach a consensus on the emotion of images. However, the image emotions perceived by viewers are subjective by nature and highly related to the personal and situational factors. On the other hand, image emotions can be conveyed by different features, such as semantics and aesthetics. In this paper, we propose a novel machine learning approach that formulates the categorical image emotions as a discrete probability distribution (DPD). To associate emotions with the extracted visual features, we present a weighted multi-modal shared sparse learning to learn the combination coefficients, with which the DPD of an unseen image can be predicted by linearly integrating the DPDs of the training images. The representation abilities of different modalities are jointly explored and the optimal weight of each modality is automatically learned. Extensive experiments on three datasets verify the superiority of the proposed method, as compared to the state-of-the-art.

• #3347
Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data
Ruohui Wang, Dahua Lin
Approximate Probabilistic Inference 2

We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that they allow new components to be introduced on the fly as needed. This, however, poses an important challenge to distributed estimation: how to handle new components efficiently and consistently. To tackle this problem, we propose a new estimation method, which allows new components to be created locally in individual computing nodes. Components corresponding to the same cluster will be identified and merged via a probabilistic consolidation scheme. In this way, we can maintain the consistency of estimation with very low communication cost. Experiments on large real-world data sets show that the proposed method can achieve high scalability in distributed and asynchronous environments without compromising the mixing performance.

• #3862
Compressed Nonparametric Language Modelling
Ehsan Shareghi, Gholamreza Haffari, Trevor Cohn
Approximate Probabilistic Inference 2

Hierarchical Pitman-Yor Process priors are compelling for learning language models, outperforming point-estimate based methods. However, these models remain unpopular due to computational and statistical inference issues, such as memory and time usage, as well as poor mixing of the sampler. In this work we propose a novel framework which represents the HPYP model compactly using compressed suffix trees. Then, we develop an efficient approximate inference scheme in this framework that has a much lower memory footprint compared to full HPYP and is fast at inference time. The experimental results illustrate that our model can be built on significantly larger datasets compared to previous HPYP models, while being several orders of magnitude smaller, fast for training and inference, and outperforming the perplexity of the state-of-the-art Modified Kneser-Ney count-based LM smoothing by up to 15%.

### Tuesday 22 15:00 - 16:00 ROB-MPP - Motion and Path Planning (218)

Chair: Chris Amato
• #1618
On the Power and Limitations of Deception in Multi-Robot Adversarial Patrolling
Noga Talmor, Noa Agmon
Motion and Path Planning

Multi-robot adversarial patrolling is a well-studied problem, investigating how defenders can optimally use all given resources to maximize the probability of detecting penetrations that are controlled by an adversary. It is commonly assumed that the adversary in this problem is rational, and thus uses the knowledge it has on the patrolling robots (namely, the number of robots, their location, characteristics and strategy) to optimize its own chances to penetrate successfully. In this paper we present a novel defending approach which manipulates the adversarial (possibly partial) knowledge on the patrolling robots, so that it will believe the robots have more power than they actually have. We describe two different ways of deceiving the adversary: Window Deception, in which it is assumed that the adversary has partial observability of the perimeter, and Scarecrow Deception, in which some of the patrolling robots only appear as real robots, though they have no ability to actually detect the adversary. We analyze the limitations of both models, and suggest a random-based approach for optimally deceiving the adversary that considers both the resources of the defenders and the adversarial knowledge.

• #2822
Compromise-free Pathfinding on a Navigation Mesh
Michael Cui, Daniel D. Harabor, Alban Grastien
Motion and Path Planning

We want to compute geometric shortest paths in a collection of convex traversable polygons, also known as a navigation mesh. Simple to compute and easy to update, navigation meshes are widely used for pathfinding in computer games. When the mesh is static, shortest path problems can be solved exactly and very fast but only after a costly preprocessing step. When the mesh is dynamic, practitioners turn to online methods which typically compute only approximately shortest paths. In this work we present a new pathfinding algorithm which is compromise-free; i.e. it is simultaneously fast, online and optimal. Our method, Polyanya, extends and generalises Anya, a recent and related interval-based search technique developed for computing geometric shortest paths in grids. We show how that algorithm can be modified to support search over arbitrary sets of convex polygons and then evaluate its performance on a range of realistic and synthetic benchmark problems.

• #2991
Switched Linear Multi-Robot Navigation Using Hierarchical Model Predictive Control
Chao Huang, Xin Chen, Yifan Zhang, Shengchao Qin, Yifeng Zeng, Xuandong Li
Motion and Path Planning

Multi-robot navigation control in the absence of a reference trajectory is rather challenging, as it is expected to ensure stability and feasibility while still offering fast computation of control decisions. The intrinsic high complexity of switched linear dynamical robots makes the problem even more challenging. In this paper, we propose a novel HMPC-based method to address the navigation problem of multiple robots with switched linear dynamics. We develop a new technique to compute the reachable sets of switched linear systems and use them to enable the parallel computation of control parameters. We present theoretical results on stability, feasibility and complexity of the proposed approach, and demonstrate its empirical advance in performance against other approaches.

• #2950
Maintaining Communication in Multi-Robot Tree Coverage
Mor Sinay, Noa Agmon, Oleg Maksimov, Sarit Kraus, David Peleg
Motion and Path Planning

Area coverage is an important task for mobile robots, mainly due to its applicability in many domains, such as search and rescue. In this paper we study the problem of multi-robot coverage, in which the robots must obey a strong communication restriction: they should maintain connectivity between teammates throughout the coverage. We formally describe the Multi-Robot Connected Tree Coverage problem, and an algorithm for covering perfect N-ary trees while adhering to the communication requirement. The algorithm is analyzed theoretically, providing guarantees for coverage time by the notion of speedup factor. We enhance the theoretically-proven solution with a dripping heuristic algorithm, and show in extensive simulations that it significantly decreases the coverage time. The algorithm is then adjusted to general (not necessarily perfect) N-ary trees and additional experiments prove its efficiency. Furthermore, we show the use of our solution in a simulated office-building scenario. Finally, we deploy our algorithm on real robots in a real office building setting, showing efficient coverage time in practice.

### Tuesday 22 15:00 - 16:00 PL-MDP - Markov Decision Processes (219)

Chair: Eyal Shlomo Shimony
• #2921
Improved Strong Worst-case Upper Bounds for MDP Planning
Anchit Gupta, Shivaram Kalyanakrishnan
Markov Decision Processes

The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP PLANNING, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved STRONG WORST-CASE upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of states n and the number of actions k in the specified MDP; they have no dependence on affiliated variables such as the discount factor and the number of bits needed to represent the MDP. Worst-case bounds apply to EVERY run of an algorithm; randomised algorithms can typically yield faster EXPECTED running times. While the special case of 2-action MDPs (that is, k = 2) has recently received some attention, bounds for general k have remained to be improved for several decades. Our contributions are to this general case. For k >= 3, the tightest strong upper bound shown to date for MDP planning belongs to a family of algorithms called Policy Iteration. This bound is only a polynomial improvement over a trivial bound of poly(n, k) k^{n} [Mansour and Singh, 1999]. In this paper, we generalise a contrasting algorithm called the Fibonacci Seesaw, and derive a bound of poly(n, k) k^{0.6834n}. The key construct we use is a template to map algorithms for the 2-action setting to the general setting. Interestingly, this idea can also be used to design Policy Iteration algorithms with a running time upper bound of poly(n, k) k^{0.7207n}. Both our results improve upon bounds that have stood for several decades.

• #3464
Proactive and Reactive Coordination of Non-dedicated Agent Teams Operating in Uncertain Environments
Pritee Agrawal, Pradeep Varakantham
Markov Decision Processes

Domains such as disaster rescue and security patrolling often feature dynamic environments where allocations of tasks to agents become ineffective due to unforeseen conditions that may require agents to leave the team. Agents leave the team either due to the arrival of high-priority tasks (e.g., emergency, accident or violation) or due to some damage to the agent. Existing research in task allocation has only considered a fixed number of agents and, in some instances, the arrival of new agents on the team. However, there is little or no literature that considers situations where agents leave the team after task allocation. To that end, we first provide a general model to represent non-dedicated teams. Second, we provide a proactive approach based on sample average approximation to generate a strategy that works well across different feasible scenarios of agents leaving the team. Furthermore, we also provide a 2-stage approach that provides a 2-stage policy that changes allocation based on the observed state of the team. Third, we provide a reactive approach that rearranges the allocated tasks to better adapt to leaving agents. Finally, we provide a detailed evaluation of our approaches on existing benchmark problems.

• #1909
Equi-Reward Utility Maximizing Design in Stochastic Environments
Sarah Keren, Luis Pineda, Avigdor Gal, Erez Karpas, Shlomo Zilberstein
Markov Decision Processes

We present the Equi Reward Utility Maximizing Design (ER-UMD) problem for redesigning stochastic environments to maximize agent performance. ER-UMD fits well contemporary applications that require offline design of environments where robots and humans act and cooperate. To find an optimal modification sequence we present two novel solution techniques: a compilation that embeds design into a planning problem, allowing use of off-the-shelf solvers to find a solution, and a heuristic search in the modifications space, for which we present an admissible heuristic. Evaluation shows the feasibility of the approach using standard benchmarks from the probabilistic planning competition and a benchmark we created for a vacuum cleaning robot setting.

• #2530
Reduction Techniques for Model Checking and Learning in MDPs
Suda Bharadwaj, Stephane Le Roux, Guillermo Perez, Ufuk Topcu
Markov Decision Processes

Omega-regular objectives in Markov decision processes (MDPs) reduce to reachability: find a policy which maximizes the probability of reaching a target set of states. Given an MDP, an initial distribution, and a target set of states, such a policy can be computed by most probabilistic model checking tools. If the MDP is only partially specified, i.e., some probabilities are unknown, then model-learning techniques can be used to statistically approximate the probabilities and enable the computation of the desired policy. For fully specified MDPs, reducing the size of the MDP translates into faster model checking; for partially specified MDPs, into faster learning. We provide reduction techniques that allow us to remove irrelevant transition probabilities: transition probabilities (known, or to be learned) that do not influence the maximal reachability probability. Among other applications, these reductions can be seen as a pre-processing of MDPs before model checking or as a way to reduce the number of experiments required to obtain a good approximation of an unknown MDP.
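To make the reachability objective concrete, the sketch below computes maximal reachability probabilities on a tiny, fully specified MDP with plain value iteration. It illustrates only the objective, not the reduction techniques of the paper; the MDP, the state and action names, and the function name are hypothetical.

```python
def max_reach_prob(transitions, targets, sweeps=1000, tol=1e-12):
    """Value iteration for the maximal probability of reaching `targets`.

    `transitions[state][action]` is a list of (probability, next_state) pairs.
    States with no actions are treated as absorbing.
    """
    values = {s: (1.0 if s in targets else 0.0) for s in transitions}
    for _ in range(sweeps):
        delta = 0.0
        for state, actions in transitions.items():
            if state in targets or not actions:
                continue  # target and absorbing states keep their value
            best = max(
                sum(p * values[nxt] for p, nxt in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break  # values have stopped changing
    return values


# Hypothetical four-state MDP: from s0, action "b" leads to s1, which reaches
# the goal with probability 0.8; action "a" reaches it with probability 0.5.
MDP = {
    "s0": {"a": [(0.5, "goal"), (0.5, "sink")], "b": [(1.0, "s1")]},
    "s1": {"c": [(0.8, "goal"), (0.2, "sink")]},
    "goal": {},
    "sink": {},
}
print(max_reach_prob(MDP, {"goal"})["s0"])  # 0.8
```

In this toy instance the optimal policy never takes action "a" in s0, so the exact probabilities attached to "a" could vary within a range without changing the result, which gives some intuition for why not every transition probability needs to be known precisely.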

### Tuesday 22 15:00 - 16:00 NLP-NLP - Natural Language Processing (220)

Chair: Jiajun Zhang
• #1204
Microblog Sentiment Classification via Recurrent Random Walk Network Learning
Zhou Zhao, Hanqing Lu, Deng Cai, Xiaofei He, Yueting Zhuang
Natural Language Processing

Microblog Sentiment Classification (MSC) is a challenging task in microblog mining, arising in many applications such as stock price prediction and crisis management. Currently, most of the existing approaches learn the user sentiment model from their posted tweets in microblogs, which suffer from the insufficiency of discriminative tweet representation. In this paper, we consider the problem of microblog sentiment classification from the viewpoint of heterogeneous MSC network embedding. We propose a novel recurrent random walk network learning framework for the problem by exploiting both users’ posted tweets and their social relations in microblogs. We then introduce the deep recurrent neural networks with random-walk layer for heterogeneous MSC network embedding, which can be trained end-to-end from scratch. We employ the back-propagation method for training the proposed recurrent random walk network model. The extensive experiments on the large-scale public datasets from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.

• #2094
A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings
Liangchen Wei, Zhi-Hong Deng
Natural Language Processing

Cross-language learning allows one to use training data from one language to build models for another language. Many traditional approaches require word-level aligned sentences from parallel corpora; in this paper we define a general bilingual training objective function requiring only a sentence-level parallel corpus. We propose a variational autoencoding approach for training bilingual word embeddings. The variational model introduces a continuous latent variable to explicitly model the underlying semantics of the parallel sentence pairs and to guide the generation of the sentence pairs. Our model restricts the bilingual word embeddings to represent words in exactly the same continuous vector space. Empirical results on the task of cross-lingual document classification show that our method is effective.

• #2132
Automatic Assessment of Absolute Sentence Complexity
Sanja Stajner, Simone Paolo Ponzetto, Heiner Stuckenschmidt
Natural Language Processing

Lexically and syntactically simpler sentences result in shorter reading time and better understanding in many people. However, no reliable systems for automatic assessment of absolute sentence complexity have been proposed so far. Instead, the assessment is usually done manually, requiring expert human annotators. To address this problem, we first define the sentence complexity assessment as a five-level classification task, and build a ‘gold standard’ dataset. Next, we propose robust systems for sentence complexity assessment, using a novel set of features based on leveraging lexical properties of freely available corpora, and investigate the impact of the feature type and corpus size on the classification performance.

• #4164
Why Can't You Convince Me? Modeling Weaknesses in Unpersuasive Arguments
Isaac Persing, Vincent Ng
Natural Language Processing

Recent work on argument persuasiveness has focused on determining how persuasive an argument is. Oftentimes, however, it is equally important to understand why an argument is unpersuasive, as it is difficult for an author to make her argument more persuasive unless she first knows what errors made it unpersuasive. Motivated by this practical concern, we (1) annotate a corpus of debate comments with not only their persuasiveness scores but also the errors they contain, (2) propose an approach to persuasiveness scoring and error identification that outperforms competing baselines, and (3) show that the persuasiveness scores computed by our approach can indeed be explained by the errors it identifies.

### Tuesday 22 15:00 - 16:00 Competition (206)

Chair: Jochen Renz
• Angry Birds
Competition
### Tuesday 22 16:30 - 18:00 AUT-SEC - AI & Autonomy: Security (Plenary 2)

Chair: Frank Dignum
• #2255
Context-Based Reasoning on Privacy in Internet of Things
Nadin Kokciyan, Pinar Yolum
AI & Autonomy: Security

More and more, devices around us are being connected to each other in the realm of the Internet of Things (IoT). Their communication and especially collaboration promise useful services for end users. However, the same communication channels raise important privacy concerns. It is not clear which information will be shared with whom, for which intents, under which conditions. Existing approaches to privacy advocate policies to be in place to regulate privacy. However, the scale and heterogeneity of the IoT entities make it infeasible to maintain policies among each and every entity in the system. Conversely, it is best if each entity can reason on the privacy using norms and context autonomously. Accordingly, this paper proposes an approach where each entity finds out which contexts it is in based on information it gathers from other entities in the system. The proposed approach uses argumentation to enable IoT entities to reason about their context and decide to reveal information based on it. We demonstrate the applicability of the approach over an IoT scenario.

• #3510
Privacy and Autonomous Systems
Jose M. Such
AI & Autonomy: Security

We discuss the problem of privacy in autonomous systems, introducing different conceptualizations and perspectives on privacy to assess the threats that autonomous systems may pose to privacy. After this, we outline socio-technical and legal measures that should be put in place to mitigate these threats. Beyond privacy threats and countermeasures, we also argue how autonomous systems may be, at the same time, the key to address some of the most challenging and pressing privacy problems nowadays and in the near future.

• #3518
Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning
Rowan McAllister, Yarin Gal, Alex Kendall, Mark van der Wilk, Amar Shah, Roberto Cipolla, Adrian Weller
AI & Autonomy: Security

Autonomous vehicle (AV) software is typically composed of a pipeline of individual components, linking sensor inputs to motor outputs. Erroneous component outputs propagate downstream, hence safe AV software must consider the ultimate effect of each component’s errors. Further, improving safety alone is not sufficient. Passengers must also feel safe to trust and use AV systems. To address such concerns, we investigate three under-explored themes for AV research: safety, interpretability, and compliance. Safety can be improved by quantifying the uncertainties of component outputs and propagating them forward through the pipeline. Interpretability is concerned with explaining what the AV observes and why it makes the decisions it does, building reassurance with the passenger. Compliance refers to maintaining some control for the passenger. We discuss open challenges for research within these themes. We highlight the need for concrete evaluation metrics, propose example problems, and highlight possible solutions.

• #3889
Algorithmic Bias in Autonomous Systems
David Danks, Alex John London
AI & Autonomy: Security

Algorithms play a key role in the functioning of autonomous systems, and so concerns have periodically been raised about the possibility of algorithmic bias. However, debates in this area have been hampered by different meanings and uses of the term, "bias." It is sometimes used as a purely descriptive term, sometimes as a pejorative term, and such variations can promote confusion and hamper discussions about when and how to respond to algorithmic bias. In this paper, we first provide a taxonomy of different types and sources of algorithmic bias, with a focus on their different impacts on the proper functioning of autonomous systems. We then use this taxonomy to distinguish between algorithmic biases that are neutral or unobjectionable, and those that are problematic in some way and require a response. In some cases, there are technological or algorithmic adjustments that developers can use to compensate for problematic bias. In other cases, however, responses require adjustments by the agent, whether human or autonomous system, who uses the results of the algorithm. There is no "one size fits all" solution to algorithmic bias.

### Tuesday 22, 16:30 - 18:00, ML-CL2 - Classification 2 (204)

Chair: Georg Dorffner
• #1569
Convolutional 2D LDA for Nonlinear Dimensionality Reduction
Qi Wang, Zequn Qin, Feiping Nie, Yuan Yuan
Classification 2

Representing high-volume and high-order data is an essential problem, especially in the machine learning field. Although existing two-dimensional (2D) discriminant analysis achieves promising performance, the single and linear projection features make it difficult to analyze more complex data. In this paper, we propose a novel convolutional two-dimensional linear discriminant analysis (2D LDA) method for data representation. In order to deal with nonlinear data, a specially designed convolutional neural network (CNN) is presented, which can be proved to have an objective function equivalent to that of common 2D LDA. In this way, the discriminant ability can benefit not only from the nonlinearity of convolutional neural networks, but also from their powerful learning process. Experimental results on several datasets show that the proposed method performs better than other state-of-the-art methods in terms of classification accuracy.

• #1766
Hierarchical Feature Selection with Recursive Regularization
Hong Zhao, Pengfei Zhu, Ping Wang, Qinghua Hu
Classification 2

In the big data era, the sizes of datasets have increased dramatically in terms of the number of samples, features, and classes. In particular, there usually exists a hierarchical structure among the classes. This kind of task is called hierarchical classification. Various algorithms have been developed to select informative features for flat classification. However, these algorithms ignore the semantic hyponymy in the directory of hierarchical classes, and select a uniform subset of the features for all classes. In this paper, we propose a new technique for hierarchical feature selection based on recursive regularization. This algorithm takes the hierarchical information of the class structure into account. As opposed to flat feature selection, we select different feature subsets for each node in a hierarchical tree structure using the parent-children relationships and the sibling relationships for hierarchical regularization. By imposing $\ell_{2,1}$-norm regularization on different parts of the hierarchical classes, we can learn a sparse matrix for the feature ranking of each node. Extensive experiments on public datasets demonstrate the effectiveness of the proposed algorithm.
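
For readers unfamiliar with the $\ell_{2,1}$-norm mentioned in this abstract, the following is a minimal sketch of how it is computed and how row norms of a weight matrix can be read as a feature ranking. It is not the authors' recursive regularization algorithm; the function names and the toy matrix are illustrative assumptions.

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def rank_features(W):
    """Feature indices (rows of W) sorted by decreasing row norm."""
    return np.argsort(-np.linalg.norm(W, axis=1))

# Toy weight matrix (hypothetical): 3 features, 2 outputs.
W = np.array([[3.0, 4.0],   # feature 0: row norm 5
              [0.0, 0.0],   # feature 1: row norm 0 (feature effectively unused)
              [1.0, 0.0]])  # feature 2: row norm 1

print(l21_norm(W))       # 6.0
print(rank_features(W))  # [0 2 1]
```

Penalizing this norm pushes whole rows toward zero, which is why it is commonly used to obtain row-sparse weight matrices for feature selection.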

• #1808
Classification and Representation Joint Learning via Deep Networks
Ya Li, Xinmei Tian, Xu Shen, Dacheng Tao
Classification 2

Deep learning has been proven to be effective for classification problems. However, the majority of previous works trained classifiers by considering only class label information and ignoring the local information from the spatial distribution of training samples. In this paper, we propose a deep learning framework that considers both class label information and local spatial distribution information between training samples. A two-channel network with shared weights is used to measure the local distribution. The classification performance can be improved with more detailed information provided by the local distribution, particularly when the training samples are insufficient. Additionally, the class label information can help to learn better feature representations compared with other feature learning methods that use only local distribution information between samples. The local distribution constraint between sample pairs can also be viewed as a regularization of the network, which can efficiently prevent the overfitting problem. Extensive experiments are conducted on several benchmark image classification datasets, and the results demonstrate the effectiveness of our proposed method.

• #2228
Discriminant Tensor Dictionary Learning with Neighbor Uncorrelation for Image Set Based Classification
Fei Wu, Xiao-Yuan Jing, Wangmeng Zuo, Ruiping Wang, Xiaoke Zhu
Classification 2

Image set based classification (ISC) has attracted considerable research interest in recent years. Several ISC methods have been developed, and methods based on dictionary learning obtain state-of-the-art performance. However, existing ISC methods usually transform the image samples of a set into vectors for subsequent processing, which breaks the inherent spatial structure of the image samples and the set. In this paper, we use a tensor to model an image set with two spatial modes and one set mode, which can fully explore the intrinsic structure of the image set. We propose a novel ISC approach, named discriminant tensor dictionary learning with neighbor uncorrelation (DTDLNU), which jointly learns two spatial dictionaries and one set dictionary. The spatial and set dictionaries are composed of set-specific sub-dictionaries corresponding to the class labels, such that the reconstruction error is discriminative. To obtain dictionaries with favorable discriminative power, DTDLNU designs a neighbor-uncorrelated discriminant tensor dictionary term, which minimizes the within-class scatter of the training sets in the projected tensor space and reduces tensor dictionary correlation among set-specific sub-dictionaries corresponding to neighbor sets from different classes. Experiments on three challenging datasets demonstrate the effectiveness of DTDLNU.

• #2774
Learning Feature Engineering for Classification
Fatemeh Nargesian, Horst Samulowitz, Udayan Khurana, Elias B. Khalil, Deepak Turaga
Classification 2

Feature engineering is the task of improving predictive modelling performance on a dataset by transforming its feature space. Existing approaches to automate this process rely on either transformed feature space exploration through evaluation-guided search, or explicit expansion of datasets with all transformed features followed by feature selection. Such approaches incur high computational costs in runtime and/or memory. We present a novel technique, called Learning Feature Engineering (LFE), for automating feature engineering in classification tasks. LFE is based on learning the effectiveness of applying a transformation (e.g., arithmetic or aggregate operators) on numerical features, from past feature engineering experiences. Given a new dataset, LFE recommends a set of useful transformations to be applied on features without relying on model evaluation or explicit feature expansion and selection. Using a collection of datasets, we train a set of neural networks, which aim at predicting the transformation that impacts classification performance positively. Our empirical results show that LFE outperforms other feature engineering approaches for an overwhelming majority (89%) of the datasets from various sources while incurring a substantially lower computational cost.

• #2884
Instability Prediction in Power Systems using Recurrent Neural Networks
Ankita Gupta, Gurunath Gurrala, Pidaparthy S Sastry
Classification 2

Recurrent Neural Networks (RNNs) can model temporal dependencies in time series well. In this paper we present an interesting application of a stacked Gated Recurrent Unit (GRU) based RNN for early prediction of imminent instability in a power system based on normal measurements of power system variables over time. In a power system, disturbances like a fault can result in transient instability, which may lead to blackouts. Early prediction of any such contingency can help the operator take timely preventive control actions. In recent times some machine learning techniques such as SVMs have been proposed to predict such instability. However, these approaches assume the availability of accurate fault information, such as its occurrence and clearance instants, which is impractical. In this paper we propose an Online Monitoring System (OMS), which is a GRU based RNN that continuously predicts the current status based on past measurements. Through extensive simulations using a standard 118-bus system, the effectiveness of the proposed system is demonstrated. We also show how we can use PCA and predictions from the RNN to identify the most critical generator that leads to transient instability.

### Tuesday 22, 16:30 - 18:00, ML-FSC2 - Feature Selection and Construction 2 (210)

Chair: Min-Ling Zhang
• #1359
Self-Paced Multitask Learning with Shared Knowledge
Keerthiram Murugesan, Jaime Carbonell
Feature Selection and Construction 2

This paper introduces self-paced task selection to multitask learning, where instances from more closely related tasks are selected in a progression of easier-to-harder tasks, to emulate an effective human education strategy, but applied to multitask machine learning. We develop the mathematical foundation for the approach based on iterative selection of the most appropriate task, learning the task parameters, and updating the shared knowledge, optimizing a new bi-convex loss function. This proposed method applies quite generally, including to multitask feature learning, multitask learning with alternating structure optimization, etc. Results show that in each of the above formulations self-paced (easier-to-harder) task selection outperforms the baseline version of these methods in all the experiments.

• #1541
Adaptive Hypergraph Learning for Unsupervised Feature Selection
Xiaofeng Zhu, Yonghua Zhu, Shichao Zhang, Rongyao Hu, Wei He
Feature Selection and Construction 2

Current unsupervised feature selection (UFS) methods learn the similarity matrix using a simple graph that is learnt from the original data and is independent of the feature selection process, and are thus unable to efficiently remove redundant or irrelevant features. To address these issues, we propose a new UFS method that jointly learns the similarity matrix and conducts both subspace learning (via learning a dynamic hypergraph) and feature selection (via a sparsity constraint). As a result, we reduce the feature dimensions using different methods (i.e., subspace learning and feature selection) from different feature spaces, which makes our method select the informative features effectively and robustly. We tested our method on benchmark datasets by conducting clustering tasks with the selected features, and the experimental results show that our proposed method outperforms all the comparison methods.

• #2634
Data-driven Random Fourier Features using Stein Effect
Wei-Cheng Chang, Chun-Liang Li, Yiming Yang, Barnabás Póczos
Feature Selection and Construction 2

Large-scale kernel approximation is an important problem in machine learning research. Approaches using random Fourier features have become increasingly popular [Rahimi and Recht, 2007], where kernel approximation is treated as empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC) integration [Yang et al., 2014]. A limitation of the current approaches is that all the features receive an equal weight summing to 1. In this paper, we propose a novel shrinkage estimator from "Stein effect", which provides a data-driven weighting strategy for random features and enjoys theoretical justifications in terms of lowering the empirical risk. We further present an efficient randomized algorithm for large-scale applications of the proposed method. Our empirical results on six benchmark data sets demonstrate the advantageous performance of this approach over representative baselines in both kernel approximation and supervised learning tasks.
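
As background for this abstract, here is a minimal sketch of the standard equally weighted random Fourier feature approximation of a Gaussian kernel, i.e. the Monte Carlo baseline the abstract refers to, not the proposed shrinkage estimator. The kernel parametrization `exp(-gamma * ||x - y||^2)`, the function names, and the toy data are assumptions made for illustration.

```python
import numpy as np

def rff_features(X, n_features, gamma, rng):
    """Map X (n, d) to random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # In Z @ Z.T every random feature contributes with the same weight 1 / n_features.
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rbf_kernel(X, gamma):
    """Exact Gaussian kernel matrix, used only to measure the approximation error."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # toy data (hypothetical)
Z = rff_features(X, n_features=5000, gamma=0.1, rng=rng)
max_err = np.max(np.abs(Z @ Z.T - rbf_kernel(X, gamma=0.1)))
print(max_err)  # shrinks as n_features grows
```

A data-driven weighting scheme of the kind described in the abstract would replace the uniform `1 / n_features` weights with learned ones.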

• #3088
Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning
Wenhao Jiang, Cheng Deng, Wei Liu, Feiping Nie, Fu-lai Chung, Heng Huang
Feature Selection and Construction 2

Domain adaptation problems arise in a variety of applications, where a training dataset from the source domain and a test dataset from the target domain typically follow different distributions. The primary difficulty in designing effective learning models to solve such problems lies in how to bridge the gap between the source and target distributions. In this paper, we provide a comprehensive analysis of feature learning algorithms used in conjunction with linear classifiers for domain adaptation. Our analysis shows that in order to achieve good adaptation performance, the second moments of the source domain distribution and target domain distribution should be similar. Based on our new analysis, a novel extremely easy feature learning algorithm for domain adaptation is proposed. Furthermore, our algorithm is extended by leveraging multiple layers, leading to another feature learning algorithm. We evaluate the effectiveness of the proposed algorithms in terms of domain adaptation tasks on Amazon review and spam datasets from the ECML/PKDD 2006 discovery challenge.
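
To make the second-moment condition in this abstract concrete, the sketch below computes empirical second-moment matrices for a source and a target sample and measures how far apart they are. This is only an illustrative diagnostic, not the authors' feature learning algorithm; the Frobenius norm as the gap measure, the function names, and the toy data are assumptions.

```python
import numpy as np

def second_moment(X):
    """Empirical second-moment matrix E[x x^T], estimated from the rows of X."""
    return X.T @ X / X.shape[0]

def second_moment_gap(Xs, Xt):
    """Frobenius distance between source and target second-moment matrices."""
    return float(np.linalg.norm(second_moment(Xs) - second_moment(Xt), ord="fro"))

# Toy source/target features (hypothetical): the target is a rescaled source.
Xs = np.array([[1.0, 0.0], [0.0, 1.0]])
Xt = 2.0 * Xs

print(second_moment_gap(Xs, Xs))  # 0.0
print(second_moment_gap(Xs, Xt))  # 1.5 * sqrt(2), about 2.12
```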

• #3486
Multiple Indefinite Kernel Learning for Feature Selection
Hui Xue, Yu Song, Hai-Ming Xu
Feature Selection and Construction 2

Multiple kernel learning for feature selection (MKL-FS) utilizes kernels to explore complex properties of features and performs better in embedded methods. However, the kernels in MKL-FS are generally limited to be positive definite. In fact, indefinite kernels often emerge in actual applications and can achieve better empirical performance. But due to the non-convexity of indefinite kernels, existing MKL-FS methods are usually inapplicable, and the corresponding research is relatively scarce. In this paper, we propose a novel multiple indefinite kernel feature selection method (MIK-FS) based on the primal framework of the indefinite kernel support vector machine (IKSVM), which applies an indefinite base kernel for each feature and then imposes an $\ell_1$-norm constraint on the kernel combination coefficients to select features automatically. A two-stage algorithm is further presented to optimize the coefficients of IKSVM and the kernel combination alternately. In the algorithm, we reformulate the non-convex optimization problem of primal IKSVM as a difference of convex functions (DC) program and transform the non-convex problem into a convex one with the affine minorization approximation. Experiments on real-world datasets demonstrate that MIK-FS is superior to some related state-of-the-art methods in both feature selection and classification performance.

• #3797
Learning Sparse Representations in Reinforcement Learning with Sparse Coding
Lei Le, Raksha Kumaraswamy, Martha White
Feature Selection and Construction 2

A variety of representation learning approaches have been investigated for reinforcement learning; much less attention, however, has been given to investigating the utility of sparse coding. Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations. In this work, we develop a supervised sparse coding objective for policy evaluation. Despite the non-convexity of this objective, we prove that all local minima are global minima, making the approach amenable to simple optimization strategies. We empirically show that it is key to use a supervised objective, rather than the more straightforward unsupervised sparse coding approach. We then compare the learned representations to a canonical fixed sparse representation, called tile-coding, demonstrating that the sparse coding representation outperforms a wide variety of tile-coding representations.

### Tuesday 22, 16:30 - 18:00, ML-DM2 - Data Mining 2 (211)

Chair: Jeffrey Chan
• #1285
Doubly Sparsifying Network
Zhangyang Wang, Shuai Huang, Jiayu Zhou, Thomas S. Huang
Data Mining 2

We propose the doubly sparsifying network (DSN), drawing inspiration from the double sparsity model for dictionary learning. DSN emphasizes the joint utilization of both the problem structure and the parameter structure. It simultaneously sparsifies the output features and the learned model parameters under one unified framework. DSN enjoys intuitive model interpretation, compact model size and low complexity. We compare DSN against a few carefully designed baselines to verify its consistently superior performance in a wide range of settings. Encouraged by its robustness to insufficient training data, we explore the applicability of DSN in brain signal processing, which has been a challenging interdisciplinary area. DSN is evaluated on two mainstream tasks, electroencephalographic (EEG) signal classification and blood oxygenation level dependent (BOLD) response prediction, both achieving promising results.

• #1325
Improved Bounded Matrix Completion for Large-Scale Recommender Systems
Huang Fang, Zhang Zhen, Yiqun Shao, Cho-Jui Hsieh
Data Mining 2

Matrix completion is a widely used technique for personalized recommender systems. In this paper, we focus on the idea of Bounded Matrix Completion (BMC), which imposes a bounded constraint on the original matrix completion problem. It has been shown that BMC works well for several real world datasets, and an efficient coordinate descent solver called BMA has been proposed in prior work. However, we observe that the BMA algorithm sometimes fails to converge to a stationary point, resulting in relatively poor accuracy in those cases. To overcome this issue, we propose a new approach for solving BMC under the ADMM framework. The proposed algorithm is guaranteed to converge to stationary points. Experimental results on real world datasets show that our algorithm can reach a lower objective value, obtain a higher prediction accuracy and scale better than BMA. We also show that our method outperforms state-of-the-art standard matrix factorization in most cases.

• #2317
Multi-view Feature Learning with Discriminative Regularization
Jinglin Xu, Junwei Han, Feiping Nie
Data Mining 2

Multi-view data, which can capture rich information from heterogeneous features, are increasingly used in real world applications. How to integrate different types of features, and how to learn low dimensional and discriminative information from high dimensional data, are two main challenges. To address these challenges, this paper proposes a novel multi-view feature learning framework, which is regularized by discriminative information and obtains a feature learning model that contains multiple discriminative feature weighting matrices for different views, and then yields multiple low dimensional features used for subsequent multi-view clustering. To optimize the formulated objective function, we transform the proposed framework into a trace optimization problem which obtains the global solution in a closed form. Experimental evaluations on four widely used datasets and comparisons with a number of state-of-the-art multi-view clustering algorithms demonstrate the superiority of the proposed work.

• #2724
LoCaTe: Influence Quantification for Location Promotion in Location-based Social Networks
Ankita Likhyani, Srikanta Bedathur, Deepak P
Data Mining 2

Location-based social networks (LBSNs) such as Foursquare offer a platform for users to share and be aware of each other’s physical movements. As a result of sharing check-in information with each other, users can be influenced to visit (or check in at) the locations visited by their friends. Quantifying such influences in these LBSNs is useful in various settings such as location promotion, personalized recommendations, mobility pattern prediction, etc. In this paper, we focus on the problem of location promotion and develop a model to quantify the influence specific to a location between a pair of users. Specifically, we develop a joint model called LoCaTe, consisting of (i) a user mobility model estimated using kernel density estimates; (ii) a model of the semantics of the location using topic models; and (iii) a model of the time gap between check-ins using an exponential distribution. We validate our model on a long-term crawl of Foursquare data collected between Jan 2015 and Feb 2016, as well as on publicly available LBSN datasets. Our experiments demonstrate that LoCaTe significantly outperforms state-of-the-art models for the same task.

• #3355
Effective Representing of Information Network by Variational Autoencoder
Hang Li, Haozheng Wang, Zhenglu Yang, Haochen Liu
Data Mining 2

Network representation is the basis of many applications and of extensive interest in various fields, such as information retrieval, social network analysis, and recommendation systems. Most previous methods for network representation only consider the incomplete aspects of a problem, including link structure, node information, and partial integration. The present study proposes a deep network representation model that seamlessly integrates the text information and structure of a network. Our model captures highly non-linear relationships between nodes and complex features of a network by exploiting the variational autoencoder (VAE), which is a deep unsupervised generation algorithm. We also merge the representation learned with a paragraph vector model and that learned with the VAE to obtain the network representation that preserves both structure and text information. We conduct comprehensive empirical experiments on benchmark datasets and find our model performs better than state-of-the-art techniques by a large margin.

• #1455
Cross-Domain Recommendation: An Embedding and Mapping Approach
Tong Man, Huawei Shen, Xiaolong Jin, Xueqi Cheng
Data Mining 2

Data sparsity is one of the most challenging problems for recommender systems. One promising solution to this problem is cross-domain recommendation, i.e., leveraging feedback or ratings from multiple domains to improve recommendation performance in a collective manner. In this paper, we propose an Embedding and Mapping framework for Cross-Domain Recommendation, called EMCDR. The proposed EMCDR framework distinguishes itself from existing cross-domain recommendation models in two aspects. First, a multi-layer perceptron is used to capture the nonlinear mapping function across domains, which offers high flexibility for learning domain-specific features of entities in each domain. Second, only the entities with sufficient data are used to learn the mapping function, guaranteeing its robustness to noise caused by data sparsity in a single domain. Extensive experiments on two cross-domain recommendation scenarios demonstrate that EMCDR significantly outperforms state-of-the-art cross-domain recommendation methods.

### Tuesday 22, 16:30 - 18:00, ML-TSDS2 - Time Series and Data Streams 2 (212)

Chair: Albert Bifet
• #1369
A Functional Dynamic Boltzmann Machine
Hiroshi Kajino
Time Series and Data Streams 2

Dynamic Boltzmann machines (DyBMs) are recently developed generative models of a time series. They are designed to learn a time series by efficient online learning algorithms, whilst taking long-term dependencies into account with the help of eligibility traces, recursively updatable memory units storing descriptive statistics of all the past data. The current DyBMs assume a finite-dimensional time series and cannot be applied to a functional time series, in which the dimension goes to infinity (e.g., spatiotemporal data on a continuous space). In this paper, we present a functional dynamic Boltzmann machine (F-DyBM) as a generative model of a functional time series. A technical challenge is to devise an online learning algorithm with which F-DyBM, consisting of functions and integrals, can learn a functional time series using only finite observations of it. We rise to this challenge by combining a kernel-based function approximation method with a statistical interpolation method, and finally derive closed-form update rules. We design numerical experiments to empirically confirm the effectiveness of our solutions. The experimental results demonstrate consistent error reductions compared to baseline methods, from which we conclude the effectiveness of F-DyBM for functional time series prediction.

• #1824
Bayesian Dynamic Mode Decomposition
Naoya Takeishi, Yoshinobu Kawahara, Yasuo Tabei, Takehisa Yairi
Time Series and Data Streams 2

Dynamic mode decomposition (DMD) is a data-driven method for calculating a modal representation of a nonlinear dynamical system, and it has been utilized in various fields of science and engineering. In this paper, we propose Bayesian DMD, which provides a principled way to transfer the advantages of the Bayesian formulation into DMD. To this end, we first develop a probabilistic model corresponding to DMD, and then, provide the Gibbs sampler for the posterior inference in Bayesian DMD. Moreover, as a specific example, we discuss the case of using a sparsity-promoting prior for an automatic determination of the number of dynamic modes. We investigate the empirical performance of Bayesian DMD using synthetic and real-world datasets.
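
For context, the sketch below shows the classical (non-Bayesian) exact DMD computation via a truncated SVD, which is the deterministic method this abstract builds on rather than the proposed Bayesian formulation. The function name, the `rank` parameter, and the toy linear system are illustrative assumptions.

```python
import numpy as np

def dmd(X, Y, rank):
    """Exact DMD: eigenvalues and modes of the best-fit linear operator A with Y ~ A X.

    X and Y hold snapshots as columns, with Y shifted one time step ahead of X.
    """
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V = Vh.conj().T
    # Reduced operator A_tilde = U^H Y V S^{-1} (dividing columns by s applies S^{-1}).
    A_tilde = (U.conj().T @ Y @ V) / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((Y @ V) / s) @ W
    return eigvals, modes

# Toy system (hypothetical): a damped rotation x_{t+1} = A x_t.
theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
snapshots = [np.array([1.0, 0.0])]
for _ in range(20):
    snapshots.append(A @ snapshots[-1])
S = np.stack(snapshots, axis=1)  # shape (2, 21)

eigvals, modes = dmd(S[:, :-1], S[:, 1:], rank=2)
print(np.abs(eigvals))  # both moduli are close to 0.9, the damping factor
```

On this noise-free linear example the DMD eigenvalues coincide with the eigenvalues of `A`; a sparsity-promoting prior of the kind mentioned in the abstract addresses the choice of `rank`, which this sketch leaves to the user.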

• #2135
Hybrid Neural Networks for Learning the Trend in Time Series
Tao Lin, Tian Guo, Karl Aberer
Time Series and Data Streams 2

The trend of a time series characterizes its intermediate upward and downward behaviour. Learning and forecasting the trend in time series data play an important role in many real applications, such as resource allocation in data centers and load scheduling in smart grids. Inspired by the recent successes of neural networks, in this paper we propose TreNet, a novel end-to-end hybrid neural network to learn local and global contextual features for predicting the trend of time series. TreNet leverages convolutional neural networks (CNNs) to extract salient features from local raw data of time series. Meanwhile, considering the long-range dependency existing in the sequence of historical trends of time series, TreNet uses a long short-term memory recurrent neural network (LSTM) to capture such dependency. Then, a feature fusion layer learns a joint representation for predicting the trend. TreNet demonstrates its effectiveness by outperforming CNN, LSTM, the cascade of CNN and LSTM, a Hidden Markov Model based method and various kernel based baselines on real datasets.

• #2749
A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, Garrison W. Cottrell
Time Series and Data Streams 2

The nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades. Although various NARX models have been developed, few of them can capture the long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a. input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively, but can also be easily interpreted. Thorough empirical studies based upon the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.
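
The temporal attention step described in this abstract (selecting relevant encoder hidden states across time steps) can be illustrated with a generic softmax attention sketch. This is a simplification under stated assumptions: it scores each hidden state with a plain dot product against a query vector, which is not necessarily the scoring function used in DA-RNN, and all names and toy values are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def temporal_attention(encoder_states, query):
    """Weight encoder hidden states (T, m) by their similarity to a query (m,).

    Returns the attention weights (T,) and the context vector (m,).
    """
    scores = encoder_states @ query      # one score per time step (assumed dot-product scoring)
    weights = softmax(scores)            # non-negative, sums to 1
    context = weights @ encoder_states   # weighted sum of hidden states
    return weights, context

# Toy encoder states for T = 3 time steps (hypothetical values).
encoder_states = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [2.0, 0.0]])
query = np.array([1.0, 0.0])

weights, context = temporal_attention(encoder_states, query)
print(weights.argmax())  # 2: the last time step aligns best with the query
```

Because the weights are explicit, they can be inspected per time step, which is one reason attention-based models are often described as easier to interpret.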

• #3934
CHARDA: Causal Hybrid Automata Recovery via Dynamic Analysis
Adam Summerville, Joseph Osborn, Michael Mateas
Time Series and Data Streams 2

We propose and evaluate a new technique for learning hybrid automata automatically by observing the runtime behavior of a dynamical system. Working from a sequence of continuous state values and predicates about the environment, CHARDA recovers the distinct dynamic modes, learns a model for each mode from a given set of templates, and postulates causal guard conditions which trigger transitions between modes. Our main contribution is the use of information-theoretic measures (1) as a cost function for data segmentation and model selection to penalize over-fitting and (2) to determine the likely causes of each transition. CHARDA is easily extended with different classes of model templates, fitting methods, or predicates. In our experiments on a complex videogame character, CHARDA successfully discovers a reasonable over-approximation of the character's true behaviors. Our results also compare favorably against recent work in automatically learning probabilistic timed automata in an aircraft domain: CHARDA exactly learns the modes of these simpler automata.

• #4019
Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks
Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Qiushi Huang, Jintao Li, Tao Mei
Time Series and Data Streams 2

Prediction of popularity has a profound impact on social media, since it offers opportunities to reveal individual preference and public attention from evolutionary social systems. Previous research, although it achieves promising results, neglects one distinctive characteristic of social data, i.e., sequentiality. For example, the popularity of online content is generated over time with sequential post streams of social media. To investigate the sequential prediction of popularity, we propose a novel prediction framework called Deep Temporal Context Networks (DTCN) that takes both temporal context and temporal attention into account. Our DTCN contains three main components: embedding, learning, and predicting. With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space. Then, based on the embedded data sequence over time, temporal context learning attempts to recurrently learn two adaptive temporal contexts for sequential popularity. Finally, a novel temporal attention is designed to predict new popularity (the popularity of a new user-post pair) with temporal coherence across multiple time-scales. Experiments on our released image dataset with about 600K Flickr photos demonstrate that DTCN outperforms state-of-the-art deep prediction algorithms, with an average of 21.51% relative performance improvement in the popularity prediction (Spearman Ranking Correlation).

### Tuesday 22 16:30 - 18:00 ML-KM - Kernel Methods (213)

Chair: James Kwok
• #1292
Large-scale Online Kernel Learning with Random Feature Reparameterization
Tu Dinh Nguyen, Trung Le, Hung Bui, Dinh Phung
Kernel Methods

A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. The random Fourier feature is a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner's theorem, and allows the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called "reparameterization trick" [Kingma et al., 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conduct extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency.
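
As an illustration of the random Fourier feature idea named in this abstract, the following minimal sketch approximates a Gaussian (RBF) kernel with a fixed number of random features. It is not the RRF method itself: the function names, the choice of the RBF kernel, and the bandwidth parameter `sigma` are assumptions made for the example. The frequencies are drawn as `eps / sigma` with `eps` standard normal, which mirrors the reparameterization intuition of separating the source of randomness from the kernel parameter.

```python
import math
import random


def rbf_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_rff(dim, n_features, sigma, rng):
    """Draw random frequencies and phases for the RBF kernel.

    Each frequency is eps / sigma with eps ~ N(0, I), so the randomness
    (eps) is sampled independently of the kernel parameter (sigma).
    """
    omegas = [[rng.gauss(0.0, 1.0) / sigma for _ in range(dim)]
              for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return omegas, phases


def rff_map(x, omegas, phases):
    """Map x to the random feature space: sqrt(2/D) * cos(omega . x + b)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(omegas, phases)]


def approx_kernel(x, y, omegas, phases):
    """Inner product of random features, an estimate of the exact kernel."""
    zx = rff_map(x, omegas, phases)
    zy = rff_map(y, omegas, phases)
    return sum(a * b for a, b in zip(zx, zy))
```

Because the feature dimension is fixed in advance, a linear model trained on `rff_map` outputs keeps a constant size no matter how many points are observed, which is the property the abstract relies on.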

• #1501
Multiple Kernel Clustering Framework with Improved Kernels
Yueqing Wang, Xinwang Liu, Yong Dou, Rongchun Li
Kernel Methods

Multiple kernel clustering (MKC) algorithms have been successfully applied to various applications. However, these successes are largely dependent on the quality of pre-defined base kernels, which cannot be guaranteed in practical applications. This may adversely affect the clustering performance. To address this issue, we propose a simple yet effective framework to adaptively improve the quality of these base kernels. Under our framework, we instantiate three MKC algorithms based on the widely used multiple kernel $k$-means clustering (MKKM), MKKM with matrix-induced regularization (MKKM-MR) and co-regularized multi-view spectral clustering (CRSC). After that, we design the corresponding algorithms with proved convergence to solve the resultant optimization problems. To the best of our knowledge, our framework fills the gap between kernel adaptation and the clustering procedure for the first time in the literature and is readily extendable. Extensive experimental research has been conducted on 7 MKC benchmarks. As is shown, our algorithms consistently and significantly improve the performance of the base MKC algorithms, indicating the effectiveness of the proposed framework. Meanwhile, our framework shows better performance than the compared ones with imperfect kernels.

• #1502
Approximate Large-scale Multiple Kernel k-means Using Deep Neural Network
Yueqing Wang, Xinwang Liu, Yong Dou, Rongchun Li
Kernel Methods

Multiple kernel clustering (MKC) algorithms have been extensively studied and applied to various applications. Although they demonstrate great success in both the theoretical aspects and applications, existing MKC algorithms cannot be applied to large-scale clustering tasks due to: i) the heavy computational cost to calculate the base kernels; and ii) insufficient memory to load the kernel matrices. In this paper, we propose an approximate algorithm to overcome these issues, and to make it applicable to large-scale applications. Specifically, our algorithm trains a deep neural network to regress the indicating matrix generated by MKC algorithms on a small subset, then obtains the approximate indicating matrix of the whole data set using the trained network, and finally performs $k$-means on the output of our network. By mapping features into the indicating matrix directly, our algorithm avoids computing the full kernel matrices, which dramatically decreases the memory requirement. Extensive experiments show that our algorithm consumes less time than most comparatively similar algorithms, while it achieves performance comparable to that of MKC algorithms.

• #3931
Learning Co-Substructures by Kernel Dependence Maximization
Sho Yokoi, Daichi Mochihashi, Ryo Takahashi, Naoaki Okazaki, Kentaro Inui
Kernel Methods

Modeling associations between items in a dataset is a problem that is frequently encountered in data and knowledge mining research. Most previous studies have simply applied a predefined fixed pattern for extracting the substructure of each item pair and then analyzed the associations between these substructures. Using such fixed patterns may not, however, capture the significant association. We, therefore, propose the novel machine learning task of extracting a strongly associated substructure pair (co-substructure) from each input item pair. We call this task dependent co-substructure extraction (DCSE), and formalize it as a dependence maximization problem. Then, we discuss critical issues with this task: the data sparsity problem and a huge search space. To address the data sparsity problem, we adopt the Hilbert-Schmidt independence criterion as an objective function. To improve search efficiency, we adopt the Metropolis-Hastings algorithm. We report the results of empirical evaluations, in which the proposed method is applied for acquiring and predicting narrative event pairs, an active task in the field of natural language processing.

• #3359
Student-t Process Regression with Student-t Likelihood
Qingtao Tang, Li Niu, Yisen Wang, Tao Dai, Wangpeng An, Jianfei Cai, Shu-Tao Xia
Kernel Methods

Gaussian Process Regression (GPR) is a powerful Bayesian method. However, the performance of GPR can be significantly degraded when the training data are contaminated by outliers, including target outliers and input outliers. Although there are some variants of GPR (e.g., GPR with Student-t likelihood (GPRT)) aiming to handle outliers, most of the variants focus on handling the target outliers while little effort has been made to deal with the input outliers. In contrast, in this work, we aim to handle both the target outliers and the input outliers at the same time. Specifically, we replace the Gaussian noise in GPR with independent Student-t noise to cope with the target outliers. Moreover, to enhance the robustness w.r.t. the input outliers, we use a Student-t Process prior instead of the common Gaussian Process prior, leading to Student-t Process Regression with Student-t Likelihood (TPRT). We theoretically show that TPRT is more robust to both input and target outliers than GPR and GPRT, and prove that both GPR and GPRT are special cases of TPRT. Various experiments demonstrate that TPRT outperforms GPR and its variants on both synthetic and real datasets.

• #2898
Feature Selection via Scaling Factor Integrated Multi-Class Support Vector Machines
Jinglin Xu, Feiping Nie, Junwei Han
Kernel Methods

In data mining, we often encounter high-dimensional and noisy features, which may not only increase the load on computational resources but also result in model overfitting. Feature selection is often adopted to address this issue. In this paper, we propose a novel feature selection method based on multi-class SVM, which introduces a scaling factor with a flexible parameter to readjust the distribution of feature weights and select the most discriminative features. Concretely, the proposed method designs a scaling factor with p/2 power to control the distribution of weights adaptively and search for the optimal sparsity of the weighting matrix. In addition, to solve the proposed model, we provide an alternating, iterative optimization method. It not only solves for the weighting matrix and the scaling factor independently, but also provides a better way to address the problem of solving the L2,0-norm. Comprehensive experiments are conducted on six datasets to demonstrate that this work can obtain better performance compared with a number of existing state-of-the-art multi-class feature selection methods.

### Tuesday 22 16:30 - 18:00 CS-ST - Solvers and Tools (216)

Chair: Jordi Levy
• #1775
Scalable Constraint-based Virtual Data Center Allocation
Sam Bayless, Nodir Kodirov, Ivan Beschastnikh, Holger H. Hoos, Alan J. Hu
Solvers and Tools

Constraint-based techniques can solve challenging problems arising from highly diverse applications. This paper considers the problem of virtual data center (VDC) allocation, an important, emerging challenge for modern data center operators. To solve this problem, we introduce NETSOLVER, which is based on the general-purpose constraint solver MONOSAT. NETSOLVER represents a major improvement over existing approaches: it is sound, complete, and scalable, providing support for end-to-end, multi-path bandwidth guarantees across all the layers of hosting infrastructure, from servers to top-of-rack switches to aggregation switches to access routers. NETSOLVER scales to realistic data center sizes and VDC topologies, typically requiring just seconds to allocate VDCs of 5–15 virtual machines to physical data centers with 1000+ servers, maintaining this efficiency even when the data center is nearly saturated. In many cases, NETSOLVER can allocate 150%−300% as many total VDCs to the same physical data center as previous methods. Essential to our solution efficiency is our formulation of VDC allocation using monotonic theories, illustrating the practical value of the recently proposed SAT modulo monotonic theories approach.

• #2495
On Computing World Views of Epistemic Logic Programs
Tran Cao Son, Tiep Le, Patrick Kahl, Anthony Leclerc
Solvers and Tools

This paper presents a novel algorithm for computing world views of different semantics of epistemic logic programs (ELP) and two of its realizations, called Ep-asp (for an older semantics) and Ep-asp^{se} (for the newest semantics), whose implementation builds on the theoretical advancement in the study of ELPs and takes advantage of the multi-shot computation paradigm of the answer set solver Clingo. The new algorithm differs from the majority of earlier algorithms in its strategy. Specifically, it computes one world view at a time and utilizes properties of world views to reduce its search space. It starts by computing an answer set and then determines whether or not a world view containing this answer set exists. In addition, it allows for the computation to focus on world views satisfying certain properties. The paper includes an experimental analysis of the performance of the two solvers compared against a recently developed solver. It also contains an analysis of their performance in goal-directed computing against a logic programming based conformant planning system, dlv-k. It concludes with some final remarks and a discussion of future work.

• #2513
Stochastic Constraint Programming with And-Or Branch-and-Bound
Behrouz Babaki, Tias Guns, Luc de Raedt
Solvers and Tools

Complex multi-stage decision making problems often involve uncertainty, for example, regarding demand or processing times. Stochastic constraint programming was proposed as a way to formulate and solve such decision problems, involving arbitrary constraints over both decision and random variables. What stochastic constraint programming still lacks is support for the use of factorized probabilistic models that are popular in the graphical model community. We show how a state-of-the-art probabilistic inference engine can be integrated into standard constraint solvers. The resulting approach searches over the And-Or search tree directly, and we investigate tight bounds on the expected utility objective. This significantly improves search efficiency and outperforms scenario-based methods that ground out the possible worlds.

• #3279
An Improved Decision-DNNF Compiler
Jean-Marie Lagniez, Pierre Marquis
Solvers and Tools

We present and evaluate a new compiler, called d4, targeting the Decision-DNNF language. Like the state-of-the-art compilers C2D and Dsharp targeting the same language, d4 is a top-down tree-search algorithm exploring the space of propositional interpretations. d4 is based on the same ingredients as those considered in C2D and Dsharp (mainly, disjoint component analysis, conflict analysis and non-chronological backtracking, component caching). d4 takes advantage of a dynamic decomposition approach based on hypergraph partitioning, used sparingly. Some simplification rules are also used to minimize the time spent in the partitioning steps and to promote the quality of the decompositions. Experiments show that the compilation times and the sizes of the Decision-DNNF representations computed by d4 are in many cases significantly lower than the ones obtained by C2D and Dsharp.

• #3369
Solving Stochastic Boolean Satisfiability under Random-Exist Quantification
Nian-Ze Lee, Yen-Shi Wang, Jie-Hong R. Jiang
Solvers and Tools

Stochastic Boolean Satisfiability (SSAT) is a powerful formalism to represent computational problems with uncertainty, such as belief network inference and propositional probabilistic planning. Solving SSAT formulas lies in the same complexity class (PSPACE-complete) as solving Quantified Boolean Formulas (QBF). While many endeavors have been made to enhance QBF solving, SSAT has drawn relatively less attention in recent years. This paper focuses on random-exist quantified SSAT formulas, and proposes an algorithm combining binary decision diagrams (BDD), logic synthesis, and modern SAT techniques to improve computational efficiency. Unlike prior exact SSAT algorithms, the proposed method can be easily modified to solve approximate SSAT by deriving upper and lower bounds of the satisfying probability. Experimental results show that our method outperforms the state-of-the-art algorithm on random k-CNF formulas and has effective application to approximate SSAT on circuit benchmarks.

• #2648
SVD-free Convex-Concave Approaches for Nuclear Norm Regularization
Yichi Xiao, Zhe Li, Tianbao Yang, Lijun Zhang
Solvers and Tools

Minimizing a convex function of matrices regularized by the nuclear norm arises in many applications such as collaborative filtering and multi-task learning. In this paper, we study the general setting where the convex function could be non-smooth. When the size of the data matrix, denoted by m x n, is very large, existing optimization methods are inefficient because in each iteration, they need to perform a singular value decomposition (SVD) which takes O(m^2 n) time. To reduce the computation cost, we exploit the dual characterization of the nuclear norm to introduce a convex-concave optimization problem and design a subgradient-based algorithm without performing SVD. In each iteration, the proposed algorithm only computes the largest singular vector, reducing the time complexity from O(m^2 n) to O(mn). To the best of our knowledge, this is the first SVD-free convex optimization approach for nuclear-norm regularized problems that does not rely on the smoothness assumption. Theoretical analysis shows that the proposed algorithm converges at an optimal O(1/\sqrt{T}) rate where T is the number of iterations. We also extend our algorithm to the stochastic case where only stochastic subgradients of the convex function are available and a special case that contains an additional non-smooth regularizer (e.g., L1 norm regularizer). We conduct experiments on robust low-rank matrix approximation and link prediction to demonstrate the efficiency of our algorithms.
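
To make the cost comparison in this abstract concrete, the sketch below computes only the leading singular triplet of a matrix by power iteration, where each iteration consists of two matrix-vector products and therefore costs O(mn). This is a generic textbook technique shown for illustration, not the authors' algorithm; the function name `top_singular_triplet`, the fixed iteration count, and the list-of-lists matrix format are assumptions, and the input is assumed to be nonzero.

```python
import math


def _norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in vec))


def top_singular_triplet(X, iters=200):
    """Return (sigma, u, v): the largest singular value and its vectors.

    X is an m x n matrix given as a list of rows and is assumed nonzero.
    Each loop iteration does two matrix-vector products, i.e. O(mn) work.
    """
    m, n = len(X), len(X[0])
    v = [1.0 / math.sqrt(n)] * n  # simple deterministic starting vector
    sigma = 0.0
    u = [0.0] * m
    for _ in range(iters):
        # u <- X v, normalized
        u = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]
        u_norm = _norm(u)
        u = [a / u_norm for a in u]
        # v <- X^T u, whose norm converges to the top singular value
        v = [sum(X[i][j] * u[i] for i in range(m)) for j in range(n)]
        sigma = _norm(v)
        v = [b / sigma for b in v]
    return sigma, u, v
```

A full SVD would return every singular pair, whereas this routine returns just one, which is where the per-iteration saving comes from. The number of power iterations needed depends on the gap between the two largest singular values.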

### Tuesday 22 16:30 - 18:00 KR-ARTP - Automated Reasoning and Theorem Proving (217)

Chair: Alessio Lomuscio
• #1842
The Impact of Treewidth on ASP Grounding and Solving
Bernhard Bliem, Marius Moldovan, Michael Morak, Stefan Woltran
Automated Reasoning and Theorem Proving

In this paper, we aim to study how the performance of modern answer set programming (ASP) solvers is influenced by the treewidth of the input program and to investigate the consequences of this relationship. We first perform an experimental evaluation that shows that the solving performance is heavily influenced by the treewidth, given ground input programs that are otherwise uniform, both in size and construction. This observation leads to an important question for ASP, namely, how to design encodings such that the treewidth of the resulting ground program remains small. To this end, we define the class of connection-guarded programs, which guarantees that the treewidth of the program after grounding only depends on the treewidth (and the degree) of the input instance. In order to obtain this result, we formalize the grounding process using MSO transductions.

• #1931
ATL Strategic Reasoning Meets Correlated Equilibrium
Xiaowei Huang, Ji Ruan
Automated Reasoning and Theorem Proving

This paper is motivated by analysing a Google self-driving car accident, i.e., the car hit a bus, with the framework and the tools of strategic reasoning by model checking. First of all, we find that existing ATL model checking may find a solution to the accident with an *irrational* joint strategy of the bus and the car. This leads to a restriction of treating both the bus and the car as rational agents, by which their joint strategy is an equilibrium of certain solution concepts. Second, we find that a randomly-selected joint strategy from the set of equilibria may result in the collision of the two agents, i.e., the accident. Based on these, we suggest taking Correlated Equilibrium (CE) as the agents' joint strategy and optimising over the utilitarian value, which is the expected sum of the agents' total rewards. The language ATL is extended with two new modalities to express the existence of a CE and a unique CE, respectively. We implement the extension into a software model checker and use the tool to analyse the examples in the paper. We also study the complexity of the model checking problems.

• #2292
Query Conservative Extensions in Horn Description Logics with Inverse Roles
Jean Christoph Jung, Carsten Lutz, Mauricio Martel, Thomas Schneider
Automated Reasoning and Theorem Proving

We investigate the decidability and computational complexity of query conservative extensions in Horn description logics (DLs) with inverse roles. This is more challenging than without inverse roles because characterizations in terms of unbounded homomorphisms between universal models fail, blocking the standard approach to establishing decidability. We resort to a combination of automata and mosaic techniques, proving that the problem is 2EXPTIME-complete in Horn-ALCHIF (and also in Horn-ALC and in ELI). We obtain the same upper bound for deductive conservative extensions, for which we also prove a coNEXPTIME lower bound.

• #3260
Efficient and Complete FD-solving for extended array constraints
Quentin Plazar, Mathieu Acher, Sébastien Bardin, Arnaud Gotlieb
Automated Reasoning and Theorem Proving

Array constraints are essential for handling data structures in automated reasoning and software verification. Unfortunately, the use of a typical finite domain (FD) solver based on local consistency-based filtering has strong limitations when constraints on indexes are combined with constraints on array elements and size. This paper proposes an efficient and complete FD-solving technique for extended constraints over (possibly unbounded) arrays. We describe a simple but particularly powerful transformation for building an equisatisfiable formula that can be efficiently solved using standard FD reasoning over arrays, even in the unbounded case. Experiments show that the proposed solver significantly outperforms FD solvers, and successfully competes with the best SMT-solvers.

• #3504
Symbolic LTLf Synthesis
Shufang Zhu, Lucas M. Tabajara, Jianwen Li, Geguang Pu, Moshe Y. Vardi
Automated Reasoning and Theorem Proving

LTLf synthesis is the process of finding a strategy that satisfies a linear temporal specification over finite traces. An existing solution to this problem relies on a reduction to a DFA game. In this paper, we propose a symbolic framework for LTLf synthesis based on this technique, by performing the computation over a representation of the DFA as a boolean formula rather than as an explicit graph. This approach enables strategy generation by utilizing the mechanism of boolean synthesis. We implement this symbolic synthesis method in a tool called Syft, and demonstrate by experiments on scalable benchmarks that the symbolic approach scales better than the explicit one.
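
The DFA game mentioned in this abstract can be illustrated with an explicit-state fixpoint, i.e., the kind of baseline that a symbolic approach is compared against. This is a minimal sketch under stated assumptions: the function `winning_states`, the dictionary encoding of the transition function, and the move order (the agent commits to an output before the environment chooses an input) are illustrative choices and are not taken from the Syft implementation.

```python
def winning_states(states, accepting, inputs, outputs, delta):
    """Compute the states from which the agent can force acceptance.

    delta maps (state, env_input, agent_output) to the next state.
    Assumed move order: the agent picks an output first, then the
    environment picks any input. A state is winning if it is accepting,
    or if some output leads into the winning set for every input.
    """
    win = set(accepting)
    changed = True
    while changed:  # least fixpoint: stop when no new state is added
        changed = False
        for s in states:
            if s in win:
                continue
            if any(all(delta[(s, x, y)] in win for x in inputs)
                   for y in outputs):
                win.add(s)
                changed = True
    return win


def build_example():
    """Tiny hypothetical game with four states, used only for illustration."""
    states, inputs, outputs = [0, 1, 2, 3], [0, 1], [0, 1]
    delta = {}
    for x in inputs:
        for y in outputs:
            delta[(0, x, y)] = 1 if y == 1 else 0  # output 1 moves forward
            delta[(1, x, y)] = 2 if y == 0 else 0  # output 0 reaches the goal
            delta[(2, x, y)] = 2                   # accepting sink
            delta[(3, x, y)] = 3 if x == 0 else 2  # environment can stall
    return states, {2}, inputs, outputs, delta
```

In the example, states 0, 1, and 2 are winning, while state 3 is not, because the environment can keep the game there forever by always choosing input 0. A symbolic implementation would represent `win` and `delta` as Boolean formulas instead of enumerating states one by one.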

• #3551
Classical Generalized Probabilistic Satisfiability
Carlos Caleiro, Filipe Casal, Andreia Mordido
Automated Reasoning and Theorem Proving

We analyze a classical generalized probabilistic satisfiability problem (GGenPSAT) which consists in deciding the satisfiability of Boolean combinations of linear inequalities involving probabilities of classical propositional formulas. GGenPSAT coincides precisely with the satisfiability problem of the probabilistic logic of Fagin et al. and was proved to be NP-complete. Here, we present a polynomial reduction of GGenPSAT to SMT over the quantifier-free theory of linear integer and real arithmetic. Capitalizing on this translation, we implement and test a solver for the GGenPSAT problem. As previously observed for many other NP-complete problems, we are able to detect a phase transition behavior for GGenPSAT.

### Tuesday 22 16:30 - 18:00 ROB-RV - Robotics and Vision (218)

Chair: Arnau Ramisa
• #832
Locality Preserving Matching
Jiayi Ma, Ji Zhao, Hanqi Guo, Junjun Jiang, Huabing Zhou, Yuan Gao
Robotics and Vision

Seeking reliable correspondences between two feature sets is a fundamental and important task in computer vision. This paper attempts to remove mismatches from given putative image feature correspondences. To achieve this goal, an efficient approach, termed locality preserving matching (LPM), is designed, the principle of which is to maintain the local neighborhood structures of those potential true matches. We formulate the problem as a mathematical model, and derive a closed-form solution with linearithmic time and linear space complexities. More specifically, our method can accomplish the mismatch removal from thousands of putative correspondences in only a few milliseconds. Experiments on various real image pairs for general feature matching, as well as for visual homing and image retrieval, demonstrate the generality of our method for handling different types of image deformations, and it is more than two orders of magnitude faster than state-of-the-art methods with comparable or better accuracy.

• #1217
Fast Preprocessing for Robust Face Sketch Synthesis
Yibing Song, Jiawei Zhang, Linchao Bao, Qingxiong Yang
Robotics and Vision

Exemplar-based face sketch synthesis methods usually meet the challenging problem that input photos are captured in different lighting conditions from training photos. The critical step causing the failure is the search of similar patch candidates for an input photo patch. Conventional illumination invariant patch distances are adopted rather than directly relying on pixel intensity difference, but they will fail when local contrast within a patch changes. In this paper, we propose a fast preprocessing method named Bidirectional Luminance Remapping (BLR), which interactively adjusts the lighting of training and input photos. Our method can be directly integrated into state-of-the-art exemplar-based methods to improve their robustness with negligible computational cost.

• #1510
Is My Object in This Video? Reconstruction-based Object Search in Videos
Tan Yu, Jingjing Meng, Junsong Yuan
Robotics and Vision

This paper addresses the problem of video-level object instance search, which aims to retrieve the videos in the database that contain a given query object instance. Without prior knowledge about "when" and "where" an object of interest may appear in a video, determining "whether" a video contains the target object is computationally prohibitive, as it requires exhaustively matching the query against all possible spatial-temporal locations in each video where an object may appear. To alleviate the computational and memory cost, we propose the Reconstruction-based Object SEarch (ROSE) method. It characterizes a huge corpus of features of possible spatial-temporal locations in the video into the parameters of the reconstruction model. Since the memory cost of storing the reconstruction model is much less than that of storing features of possible spatial-temporal locations in the video, the efficiency of the search is significantly boosted. Comprehensive experiments on three benchmark datasets demonstrate the promising performance of the proposed ROSE method.

• #1773
Combining Models from Multiple Sources for RGB-D Scene Recognition
Xinhang Song, Shuqiang Jiang, Luis Herranz
Robotics and Vision

Depth can complement RGB with useful cues about object volumes and scene layout. However, RGB-D image datasets are still too small for directly training deep convolutional neural networks (CNNs), in contrast to the massive monomodal RGB datasets. Previous works in RGB-D recognition typically combine two separate networks for RGB and depth data, pretrained with a large RGB dataset and then fine-tuned to the respective target RGB and depth datasets. These approaches have several limitations: 1) they only use low-level filters learned from RGB data, and are thus unable to properly exploit depth-specific patterns, and 2) RGB and depth features are only combined at high levels but rarely at lower levels. In this paper, we propose a framework that leverages both knowledge acquired from large RGB datasets and depth-specific cues learned from the limited depth data, obtaining more effective multi-source and multi-modal representations. We propose a multi-modal combination method that selects discriminative combinations of layers from the different source models and target modalities, capturing both high-level properties of the task and intrinsic low-level properties of both modalities.

• #2315
Cross-Granularity Graph Inference for Semantic Video Object Segmentation
Huiling Wang, Tinghuai Wang, Ke Chen, Joni-Kristian Kämäräinen
Robotics and Vision

We address semantic video object segmentation via a novel cross-granularity hierarchical graphical model to integrate tracklet and object proposal reasoning with superpixel labeling. Tracklets characterize varying spatial-temporal relations of video objects but quite often suffer from sporadic local outliers. In order to acquire high-quality tracklets, we propose a transductive inference model which is capable of calibrating short-range noisy object tracklets with respect to long-range dependencies and high-level context cues. At the center of this work lies a new paradigm of semantic video object segmentation beyond modeling appearance and motion of objects locally, where the semantic label is inferred by jointly exploiting multi-scale contextual information and spatial-temporal relations of video objects. We evaluate our method on two popular semantic video object segmentation benchmarks and demonstrate that it advances the state-of-the-art by achieving higher accuracy than other leading methods.

• #2625
Synthesizing Samples for Zero-shot Learning
Yuchen Guo, Guiguang Ding, Jungong Han, Yue Gao
Robotics and Vision

Zero-shot learning (ZSL) aims to construct recognition models for unseen target classes that have no labeled samples for training. It utilizes the class attributes or semantic vectors as side information and transfers supervision information from related source classes with abundant labeled samples. Existing ZSL approaches adopt an intermediary embedding space to measure the similarity between a sample and the attributes of a target class to perform zero-shot classification. However, this approach may suffer from the information loss caused by the embedding process, and the similarity measure cannot fully make use of the data distribution. In this paper, we propose a novel approach which turns the ZSL problem into a conventional supervised learning problem by synthesizing samples for the unseen classes. Firstly, the probability distribution of an unseen class is estimated by using the knowledge from seen classes and the class attributes. Secondly, the samples are synthesized based on the distribution for the unseen class. Finally, we can train any supervised classifiers based on the synthesized samples. Extensive experiments on benchmarks demonstrate the superiority of the proposed approach to the state-of-the-art ZSL approaches.

### Tuesday 22 16:30 - 18:00 MAS-NCG - Noncooperative Games (219)

Chair: Pingzhong Tang
• #2699
Playing Repeated Network Interdiction Games with Semi-Bandit Feedback
Qingyu Guo, Bo An, Long Tran-Thanh
Noncooperative Games

We study repeated network interdiction games with no prior knowledge of the adversary and the environment, which can model many real world network security domains. Existing works often require plenty of available information for the defender and neglect the frequent interactions between both players, which are unrealistic and impractical, and thus, are not suitable for our settings. As such, we provide the first defender strategy that enjoys nice theoretical and practical performance guarantees, by applying the adversarial online learning approach. In particular, we model the repeated network interdiction game with no prior knowledge as an online linear optimization problem, for which a novel and efficient online learning algorithm, SBGA, is proposed, which exploits the unique semi-bandit feedback in network security domains. We prove that SBGA achieves sublinear regret against an adaptive adversary, compared with both the best fixed strategy in hindsight and a near optimal adaptive strategy. Extensive experiments also show that SBGA significantly outperforms existing approaches with a fast convergence rate.

• #2705
Comparing Strategic Secrecy and Stackelberg Commitment in Security Games
Qingyu Guo, Bo An, Branislav Bošanský, Christopher Kiekintveld
Noncooperative Games

The Strong Stackelberg Equilibrium (SSE) has drawn extensive attention recently in several security domains. However, the SSE concept neglects the advantage of the defender's strategic revelation of her private information, and overestimates the observation ability of the adversaries. In this paper, we overcome these restrictions and analyze the tradeoff between strategic secrecy and commitment in security games. We propose a Disguised-resource Security Game (DSG) where the defender strategically disguises some of her resources. We compare strategic information revelation with public commitment and formally show that they have different advantages depending on the payoff structure. To compute the Perfect Bayesian Equilibrium (PBE), several novel approaches are provided, including a novel algorithm based on support set enumeration, and an approximation algorithm for \epsilon-PBE. Extensive experimental evaluation shows that both strategic secrecy and Stackelberg commitment are critical measures in security domains, and our approaches can efficiently solve PBEs for realistic-sized problems.

• #3136
Mechanism Design for Strategic Project Scheduling
Pradeep Varakantham, Na Fu
Noncooperative Games

Organizing large scale projects (e.g., Conferences, IT Shows, F1 race) requires precise scheduling of multiple dependent tasks on common resources where multiple selfish entities are competing to execute the individual tasks. In this paper, we consider a well studied and rich scheduling model referred to as RCPSP (Resource Constrained Project Scheduling Problem). The key change to this model that we consider in this paper is the presence of selfish entities competing to perform individual tasks with the aim of maximizing their own utility. Due to the selfish entities in play, the goal of the scheduling problem is no longer only to minimize makespan for the entire project, but rather, to maximize social welfare while ensuring incentive compatibility and economic efficiency. We show that the traditional VCG mechanism is not incentive compatible in this context and hence we provide two new practical mechanisms that extend VCG. These new mechanisms, referred to as Individual Completion based Payments (ICP) and Social Completion based Payments (SCP), provide strong theoretical properties including strategy-proofness.

• #3581
Posted Pricing sans Discrimination
Shreyas Sekar
Noncooperative Games

In the quest for market mechanisms that are easy to implement, yet close to optimal, few seem as viable as posted pricing. Despite the growing body of impressive results, however, the performance of most posted price mechanisms relies crucially on "price discrimination" when multiple copies of a good are available. For the more general case with non-linear production costs on each good, hardly anything is known for general multi-good markets. With this in mind, we study the problem of social welfare maximization in a Bayesian setting where the seller can produce any number of copies of a good but faces convex production costs for the same. Our central contribution is a structured framework for decision making and static item pricing in the face of uncertainty and production costs, i.e., the seller decides how much to produce and posts a single price per good that is common to all buyers, and the buyers arrive sequentially and purchase utility maximizing bundles of goods. The framework yields constant factor approximations to the optimum welfare when buyer valuations are fractionally subadditive, and extends to more general valuations and also to settings where the seller is completely oblivious to buyer valuations. Our work presents the first known results for non-discriminatory pricing in environments with non-linear costs where we only have access to stochastic information regarding buyer preferences. At a high level, our results imply that it is often possible to obtain good guarantees without discriminating against buyers, i.e., charging them differently for the same good.

• #1836
Equilibria in Ordinal Games: A Framework based on Possibility Theory.
Nahla Ben Amor, Helene Fargier, Régis Sabbadin
Noncooperative Games

The present paper proposes the first definition of mixed equilibrium for ordinal games. This definition naturally extends possibilistic (single agent) decision theory. This allows us to provide a unifying view of single and multi-agent qualitative decision theory. Our first contribution is to show that ordinal games always admit a possibilistic mixed equilibrium, which can be seen as a qualitative counterpart to mixed (probabilistic) equilibrium. Then, we show that a possibilistic mixed equilibrium can be computed in polynomial time (wrt the size of the game), which contrasts with pure Nash or mixed probabilistic equilibrium computation in cardinal game theory. The definition we propose is thus operational in two ways: (i) it tackles the case when no pure Nash equilibrium exists in an ordinal game; and (ii) it allows an efficient computation of a mixed equilibrium.

• #3675
Convergence and Quality of Iterative Voting Under Non-Scoring Rules
Aaron Koolyk, Tyrone Strangway, Omer Lev, Jeffrey S. Rosenschein
Noncooperative Games

Iterative voting is a social choice mechanism that assumes all voters are strategic, and allows voters to change their stated preferences as the vote progresses until an equilibrium is reached (at which point no player wishes to change their vote). Previous research established that this process converges to an equilibrium for the plurality and veto voting methods and for no other scoring rule. We consider iterative voting for non-scoring rules, examining the major ones, and show that none of them converge when assuming (as most research has so far) that voters pursue a best response strategy. We investigate other potential voter strategies, with a more heuristic flavor (since for most of these voting rules, calculating the best response is NP-hard); we show that they also do not converge. We then conduct an empirical analysis of the iterative voting winners for these non-scoring rules, and compare the winner quality of various strategies.
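
The best-response dynamic described in this abstract can be illustrated with a minimal sketch. It uses plurality, a scoring rule for which the abstract reports that convergence is already established, and not the non-scoring rules the paper studies. The function names, the tie-breaking by candidate order, and the start from truthful ballots are all assumptions made for illustration.

```python
from collections import Counter


def plurality_winner(ballots, candidates):
    """Return the plurality winner; ties go to the earlier candidate (an assumption)."""
    counts = Counter(ballots)
    return max(candidates, key=lambda c: (counts[c], -candidates.index(c)))


def iterative_plurality(preferences, candidates, max_rounds=100):
    """Run best-response dynamics, starting from truthful ballots.

    preferences[i] is voter i's ranking, most preferred candidate first.
    Returns (winner, ballots, converged).
    """
    ballots = [ranking[0] for ranking in preferences]
    for _ in range(max_rounds):
        changed = False
        for i, ranking in enumerate(preferences):
            current = plurality_winner(ballots, candidates)
            best_vote, best_rank = ballots[i], ranking.index(current)
            # Try every possible ballot and keep the one whose outcome
            # this voter ranks highest (a strict improvement is required).
            for c in candidates:
                trial = ballots[:i] + [c] + ballots[i + 1:]
                rank = ranking.index(plurality_winner(trial, candidates))
                if rank < best_rank:
                    best_vote, best_rank = c, rank
            if best_vote != ballots[i]:
                ballots[i] = best_vote
                changed = True
        if not changed:
            # No voter wants to deviate: an equilibrium has been reached.
            return plurality_winner(ballots, candidates), ballots, True
    return plurality_winner(ballots, candidates), ballots, False
```

In a five-voter profile where two voters rank a first, two rank b first, and one ranks c first with b second, the c supporter has an incentive to switch to b, after which no voter can improve the outcome.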

### Tuesday 22 16:30 - 18:00 NLP-AT1 - NLP Applications and Tools 1 (220)

Chair: Jiajun Zhang
• #1711
Multi-Modal Word Synset Induction
Jesse Thomason, Raymond J. Mooney
NLP Applications and Tools 1

A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (synsets) in an unsupervised fashion. Given pairs of images and text with noun phrase labels, we perform synset induction to produce collections of underlying concepts described by one or more noun phrases. We find that considering multi-modal features from both visual and textual context yields better induced synsets than using either context alone. Human evaluations show that our unsupervised, multi-modally induced synsets are comparable in quality to annotation-assisted ImageNet synsets, achieving about 84% of ImageNet synsets' approval.

• #1448
DDoS Event Forecasting using Twitter Data
Zhongqing Wang, Yue Zhang
NLP Applications and Tools 1

Distributed Denial of Service (DDoS) attacks have been significant threats to the Internet. Traditional research in cyber security focuses on detecting emerging DDoS attacks by tracing network packet flow. A characteristic of DDoS defense is that rescue time is limited once an attack is launched. More resilient detection and defence models are typically more costly. We aim at predicting the likelihood of DDoS attacks by monitoring relevant text streams in social media, so that the level of defense can be adjusted dynamically to maximize cost-effectiveness. To our knowledge, this is a novel and challenging research question for DDoS rescue. Because the input of this task is a text stream rather than a document, information should be collected both on the textual content of individual posts. We propose a fine-grained hierarchical stream model to capture semantic information over infinitely long history, and reveal burstiness and trends. Empirical evaluation shows that social text streams are indeed informative for DDoS forecasting, and our proposed hierarchical model is more effective compared to strong baseline text stream models and discrete bag-of-words models.

• #1449
A Neural Model for Joint Event Detection and Summarization
Zhongqing Wang, Yue Zhang
NLP Applications and Tools 1

Twitter new event detection aims to identify first stories in a tweet stream. Typical approaches consider two sub tasks. First, it is necessary to filter out mundane or irrelevant tweets. Second, tweets are grouped automatically into event clusters. Traditionally, these two sub tasks are processed separately, and integrated under a pipeline setting, despite the inter-dependence between the two tasks. In addition, one further related task is summarization, which is to extract a succinct summary for representing a large group of tweets. Summarization is related to detection under the new event setting, in that salient information is universal between event representing tweets and informative event summaries. In this paper, we build a joint model to filter, cluster, and summarize the tweets for new events. In particular, deep representation learning is used to vectorize tweets, which serves as the basis that connects the tasks. A neural stacking model is used for integrating a pipeline of different sub tasks, and for better sharing between the predecessor and successors. Experiments show that our proposed neural joint model is more effective compared to its pipeline baseline.

• #1841
Fast Parallel Training of Neural Language Models
Tong Xiao, Jingbo Zhu, Tongran Liu, Chunliang Zhang
NLP Applications and Tools 1

Training neural language models (NLMs) is very time consuming and we need parallelization for system speedup. However, standard training methods have poor scalability across multiple devices (e.g., GPUs) due to the huge time cost required to transmit data for gradient sharing in the back-propagation process. In this paper we present a sampling-based approach to reducing data transmission for better scaling of NLMs. As a "bonus", the resulting model also improves the training speed on a single device. Our approach yields significant speed improvements on a recurrent neural network-based language model. On four NVIDIA GTX1080 GPUs, it achieves a speedup of 2.1+ times over the standard asynchronous stochastic gradient descent baseline, yet with no increase in perplexity. This is even 4.2 times faster than the naive single GPU counterpart.

• #2677
Joint Learning on Relevant User Attributes in Micro-blog
Jingjing Wang, Shoushan Li, Guodong Zhou
NLP Applications and Tools 1

User attribute classification aims to identify users’ attributes (e.g., gender, age and profession) by leveraging user generated content. However, conventional approaches to user attribute classification focus on single attribute classification involving only one user attribute, which completely ignores the relationship among various user attributes. In this paper, we confront a novel scenario in user attribute classification where relevant user attributes are jointly learned, attempting to make the relevant attribute classification tasks help each other. Specifically, we propose a joint learning approach, namely Aux-LSTM, which first learns a proper auxiliary representation between the related tasks and then leverages the auxiliary representation to integrate the learning process in both tasks. Empirical studies demonstrate the effectiveness of our proposed approach to joint learning on relevant user attributes.

• #3802
Active Learning for Black-Box Semantic Role Labeling with Neural Factors
Chenguang Wang, Laura Chiticariu, Yunyao Li
NLP Applications and Tools 1

Active learning is a useful technique for tasks for which unlabeled data is abundant but manual labeling is expensive. One example of such a task is semantic role labeling (SRL), which relies heavily on labels from trained linguistic experts. One challenge in applying active learning algorithms for SRL is that the complete knowledge of the SRL model is often unavailable, against the common assumption that active learning methods are aware of the details of the underlying models. In this paper, we present an active learning framework for black-box SRL models (i.e., models whose details are unknown). In lieu of a query strategy based on model details, we propose a neural query strategy model that embeds both language and semantic information to automatically learn the query strategy from predictions of an SRL model alone. Our experimental results demonstrate the effectiveness of both this new active learning framework and the neural query strategy model.

### Tuesday 22 16:30 - 18:30 SIS-PL - Sister Conference Track: Planning (203)

Chair: Shivaram Kalyanakrishnan
• #4208
Dynamical System-Based Motion Planning for Multi-Arm Systems: Reaching for Moving Objects
Seyed Sina Mirrazavi Salehian, Nadia Figueroa, Aude Billard
Sister Conference Track: Planning

The use of coordinated multi-arm robotic systems makes it possible to perform manipulations of heavy or bulky objects that would otherwise be infeasible for a single-arm robot. This paper concisely introduces our work on coordinated multi-arm control [Salehian et al., 2016a], where we proposed a virtual object based dynamical systems (DS) control law to generate autonomous and synchronized motions for a multi-arm robot system. We show theoretically and empirically that the multi-arm + virtual object system converges asymptotically to a moving object. The proposed framework is validated on a dual-arm robotic system. We demonstrate that it can re-synchronize and adapt the motion of each arm in a fraction of a second, even when the object’s motion is fast and not accurately predictable.

• #4223
Lessons from the Amazon Picking Challenge: Four Aspects of Building Robotic Systems
Clemens Eppner, Sebastian Höfer, Rico Jonschkowski, Roberto Martín-Martín, Arne Sieverling, Vincent Wall, Oliver Brock
Sister Conference Track: Planning

We describe the winning entry to the Amazon Picking Challenge 2015. From the experience of building this system and competing, we derive several conclusions: (1) We suggest characterizing robotic system building along four key aspects, each of them spanning a spectrum of solutions - modularity vs. integration, generality vs. assumptions, computation vs. embodiment, and planning vs. feedback. (2) To understand which region of each spectrum most adequately addresses which robotic problem, we must explore the full spectrum of possible approaches. (3) For manipulation problems in unstructured environments, certain regions of each spectrum match the problem most adequately, and should be exploited further. This is supported by the fact that our solution deviated from the majority of the other challenge entries along each of the spectra. This is an abridged version of a conference publication.

• #4232
Maximizing Awareness about HIV in Social Networks of Homeless Youth with Limited Information
Amulya Yadav, Hau Chan, Albert Xin Jiang, Haifeng Xu, Eric Rice, Milind Tambe
Sister Conference Track: Planning

This paper presents HEALER, a software agent that recommends sequential intervention plans for use by homeless shelters, who organize these interventions to raise awareness about HIV among homeless youth. HEALER's sequential plans (built using knowledge of social networks of homeless youth) choose intervention participants strategically to maximize influence spread, while reasoning about uncertainties in the network. While previous work presents influence maximizing techniques to choose intervention participants, they do not address two real-world issues: (i) they completely fail to scale up to real-world sizes; and (ii) they do not handle deviations in execution of intervention plans. HEALER handles these issues via two major contributions: (i) HEALER casts this influence maximization problem as a POMDP and solves it using a novel planner which scales up to previously unsolvable real-world sizes; and (ii) HEALER allows shelter officials to modify its recommendations, and updates its future plans in a deviation-tolerant manner. HEALER was deployed in the real world in Spring 2016 with considerable success.

• #4241
I-dual: Solving Constrained SSPs via Heuristic Search in the Dual Space
Felipe Trevizan, Sylvie Thiebaux, Pedro Santana, Brian Williams
Sister Conference Track: Planning

We consider the problem of generating optimal stochastic policies for Constrained Stochastic Shortest Path problems, which are a natural model for planning under uncertainty for resource-bounded agents with multiple competing objectives. While unconstrained SSPs enjoy a multitude of efficient heuristic search solution methods with the ability to focus on promising areas reachable from the initial state, the state of the art for constrained SSPs revolves around linear and dynamic programming algorithms which explore the entire state space. In this paper, we present i-dual, the first heuristic search algorithm for constrained SSPs. To concisely represent constraints and efficiently decide their violation, i-dual operates in the space of dual variables describing the policy occupation measures. It does so while retaining the ability to use standard value function heuristics computed by well-known methods. Our experiments show that these features enable i-dual to achieve up to two orders of magnitude improvement in run-time and memory over linear programming algorithms.

• #4257
An End-to-End System for Accomplishing Tasks with Modular Robots: Perspectives for the AI community
Gangyuan Jing, Tarik Tosun, Mark Yim, Hadas Kress-Gazit
Sister Conference Track: Planning

The advantage of modular robot systems lies in their flexibility, but this advantage can only be realized if there exists some reliable, effective way of generating configurations (shapes) and behaviors (controlling programs) appropriate for a given task. In this paper, we present an end-to-end system for addressing tasks with modular robots, and demonstrate that it is capable of accomplishing challenging multi-part tasks in hardware experiments. The system consists of four tightly integrated components: (1) A high-level mission planner, (2) A design library spanning a wide set of functionality, (3) A design and simulation tool for populating the library with new configurations and behaviors, and (4) Modular robot hardware. This paper condenses the material originally presented in Jing et al. 2016 into a shorter format suitable for a broad audience.

• #4304
Value Iteration Networks
Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel
Sister Conference Track: Planning

We introduce the value iteration network (VIN): a fully differentiable neural network with a "planning module" embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN based policies on discrete and continuous path-planning domains, and on a natural-language based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains. This paper is a significantly abridged and IJCAI audience targeted version of the original NIPS 2016 paper with the same title, available here: https://arxiv.org/abs/1602.02867
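
For readers unfamiliar with the value-iteration algorithm that this abstract builds on, here is a minimal tabular sketch on a grid. It is the classical algorithm, not the differentiable VIN module; the grid world, the reward-on-entry convention, and the function name are assumptions for illustration. Each sweep updates a cell from its immediate neighbours and then takes a maximum over actions, which is the kind of local computation the abstract says can be represented as a convolutional network.

```python
def value_iteration(reward, gamma=0.9, sweeps=50):
    """Tabular value iteration on a deterministic 4-connected grid.

    reward[r][c] is the reward received on entering cell (r, c).
    A move that would leave the grid keeps the agent in place.
    """
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(sweeps):
        new_value = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                q_values = []
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        nr, nc = r, c  # bump into the wall and stay put
                    q_values.append(reward[nr][nc] + gamma * value[nr][nc])
                # Bellman backup: best action value over the local neighbourhood.
                new_value[r][c] = max(q_values)
        value = new_value
    return value
```

On a one-row grid such as `[[0.0, 0.0, 1.0]]`, the computed values increase towards the rewarding cell, so a greedy policy over them moves right.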

### Tuesday 22 16:30 - 18:30 Competition (206)

Chair: Jochen Renz
• Angry Birds
Competition
### Tuesday 22 18:00 - 19:00 Demonstrations (Lobby)

Chair: John Thangarajah
• Demonstrations
Demonstrations
### Wednesday 23 08:30 - 10:00 EAR-2 - Early Career 2 (Plenary 2)

Chair: Wai Kiang Yeap
• #28
Multimodal News Article Analysis
Arnau Ramisa
Early Career 2

The intersection of Computer Vision and Natural Language Processing has been a hot topic of research in recent years, with results that were unthinkable only a few years ago. In view of this progress, we want to highlight online news articles as a potential next step for this area of research. The rich interrelations of text, tags, images or videos, as well as a vast corpus of general knowledge, are an exciting benchmark for high-capacity models such as deep neural networks. In this paper we present a series of tasks and baseline approaches to leverage corpora such as the BreakingNews dataset.

• #25
Towards understanding stories in videos
Sanja Fidler
Early Career 2

• #20
Robotic Strategic Behavior in Adversarial Environments
Noa Agmon
Early Career 2

The presence of robots in areas containing threats is becoming more prevalent, due to their ability to perform missions accurately, efficiently, and with little risk to humans. Having the robots handle adversarial forces in missions such as search and rescue, intelligence gathering, border protection and humanitarian assistance, raises many new, exciting research challenges. This paper describes recent research achievements in areas related to robotic mission planning in adversarial environments, including multi-robot patrolling, robotic coverage, multi-robot formation, and navigation, and suggests possible future research directions.

### Wednesday 23 08:30 - 10:00 SIS-SECO - Sister Conference Track: Search and Constraints (203)

Chair: Christian Bessiere
• #4207
Using Constraint Programming to solve a Cryptanalytic Problem
David Gerault, Marine Minier, Christine Solnon
Sister Conference Track: Search and Constraints

We describe Constraint Programming (CP) models to solve a cryptanalytic problem: the chosen key differential attack against the standard block cipher AES. We show that CP solvers are able to solve these problems quicker than dedicated cryptanalysis tools, and we prove that a solution claimed to be optimal in two recent cryptanalysis papers is not optimal by providing a better solution.

• #4246
A SAT Approach to Branchwidth
Neha Lodha, Sebastian Ordyniak, Stefan Szeider
Sister Conference Track: Search and Constraints

Branch decomposition is a prominent method for structurally decomposing a graph, hypergraph or CNF formula. The width of a branch decomposition provides a measure of how well the object is decomposed. For many applications it is crucial to compute a branch decomposition whose width is as small as possible. We propose a SAT approach to finding branch decompositions of small width. The core of our approach is an efficient SAT encoding which determines with a single SAT-call whether a given hypergraph admits a branch decomposition of certain width. For our encoding we develop a novel partition-based characterization of branch decompositions. The encoding size imposes a limit on the size of the given hypergraph. In order to break through this barrier and to scale the SAT approach to larger instances, we develop a new heuristic approach where the SAT encoding is used to locally improve a given candidate decomposition until a fixed-point is reached. This new method scales now to instances with several thousands of vertices and edges.

• #4262
Blockedness in Propositional Logic: Are You Satisfied With Your Neighborhood?
Benjamin Kiesl, Martina Seidl, Hans Tompits, Armin Biere
Sister Conference Track: Search and Constraints

Clause-elimination techniques that simplify formulas by removing redundant clauses play an important role in modern SAT solving. Among the types of redundant clauses, blocked clauses are particularly popular. For checking whether a clause C is blocked in a formula F, one only needs to consider the so-called resolution neighborhood of C, i.e., the set of clauses that can be resolved with C. Because of this, blocked clauses are referred to as being locally redundant. In this paper, we discuss powerful generalizations of blocked clauses that are still locally redundant, viz. set-blocked clauses and super-blocked clauses. We furthermore present complexity results for deciding whether a clause is set-blocked or super-blocked.
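
The locality of the blockedness check described in this abstract can be made concrete with a short sketch of the classical blocked-clause test: a clause C is blocked if it contains a literal l such that every resolvent of C on l with a clause from its resolution neighborhood is a tautology. The set-blocked and super-blocked generalizations are not covered. The encoding of literals as non-zero integers (negative for negation, as in DIMACS files) and the function names are assumptions for illustration.

```python
def is_tautology(clause):
    """A clause is a tautology if it contains a literal and its negation."""
    return any(-lit in clause for lit in clause)


def is_blocked(clause, formula):
    """Check whether `clause` is blocked in `formula`.

    Literals are non-zero integers; -x denotes the negation of x.
    Only clauses containing the complement of a literal of `clause`
    (its resolution neighborhood) are ever inspected.
    """
    clause = frozenset(clause)
    for lit in clause:
        neighborhood = [frozenset(d) for d in formula if -lit in d]
        # Resolve on `lit` with every neighbor; all resolvents must be tautologies.
        if all(
            is_tautology((clause - {lit}) | (d - {-lit}))
            for d in neighborhood
        ):
            return True
    return False
```

A literal whose complement occurs nowhere in the formula has an empty neighborhood, so any clause containing such a pure literal is trivially blocked under this test.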

• #4270
Solving Very Hard Problems: Cube-and-Conquer, a Hybrid SAT Solving Method
Marijn J.H. Heule, Oliver Kullmann, Victor W. Marek
Sister Conference Track: Search and Constraints

A recent success of SAT solving has been the solution of the boolean Pythagorean Triples problem [Heule et al., 2016], delivering the largest proof yet, of 200 terabytes in size. We present this and the underlying paradigm Cube-and-Conquer, a powerful general method to solve big SAT problems, based on integrating the “old” and “new” methods of SAT solving.

### Wednesday 23 08:30 - 10:00 ML-CL3 - Classification 3 (204)

Chair: Freddy Lecue
• #2163
Rescale-Invariant SVM for Binary Classification
Mojtaba Montazery, Nic Wilson
Classification 3

Support Vector Machines (SVM) are among the most well-known machine learning methods, with broad use in different scientific areas. However, one necessary pre-processing phase for SVM is normalization (scaling) of features, since SVM is not invariant to the scales of the features’ spaces, i.e., different ways of scaling may lead to different results. We define a more robust decision-making approach for binary classification, in which one sample strongly belongs to a class if it belongs to that class for all possible rescalings of features. We derive a way of characterising the approach for binary SVM that allows determining when an instance strongly belongs to a class and when the classification is invariant to rescaling. The characterisation leads to a computation method to determine whether one sample is strongly positive, strongly negative or neither. Our experimental results back up the intuition that being strongly positive suggests stronger confidence that an instance really is positive.

• #2410
Analogy-preserving functions: A way to extend Boolean samples
Miguel Couceiro, Nicolas Hug, Henri Prade, Gilles Richard
Classification 3

Training set extension is an important issue in machine learning. Indeed when the examples at hand are in a limited quantity, the performances of standard classifiers may significantly decrease and it can be helpful to build additional examples. In this paper, we consider the use of analogical reasoning, and more particularly of analogical proportions for extending training sets. Here the ground truth labels are considered to be given by a (partially known) function. We examine the conditions that are required for such functions to ensure an error-free extension in a Boolean setting. To this end, we introduce the notion of Analogy Preserving (AP) functions, and we prove that their class is the class of affine Boolean functions. This noteworthy theoretical result is complemented with an empirical investigation of approximate AP functions, which suggests that they remain suitable for training set extension.

• #2563
Further Results on Predicting Cognitive Abilities for Adaptive Visualizations
Cristina Conati, Sébastien Lallé, Md. Abed Rahman, Dereck Toker
Classification 3

Previous work has shown that some user cognitive abilities relevant for processing information visualizations can be predicted from eye tracking data. Performing this type of user modeling is important for devising user-adaptive visualizations that can adapt to a user’s abilities as needed during the interaction. In this paper, we contribute to previous work by extending the type of visualizations considered and the set of cognitive abilities that can be predicted from gaze data, thus providing evidence on the generality of these findings. We also evaluate how quality of gaze data impacts prediction.

• #3530
Logistic Markov Decision Processes
Martin Mladenov, Craig Boutilier, Dale Schuurmans, Ofer Meshi, Gal Elidan, Tyler Lu
Classification 3

User modeling in advertising and recommendation has typically focused on myopic predictors of user responses. In this work, we consider the long-term decision problem associated with user interaction. We propose a concise specification of long-term interaction dynamics by combining factored dynamic Bayesian networks with logistic predictors of user responses, allowing state-of-the-art prediction models to be seamlessly extended. We show how to solve such models at scale by providing a constraint generation approach for approximate linear programming that overcomes the variable coupling and non-linearity induced by the logistic regression predictor. The efficacy of the approach is demonstrated on advertising domains with up to 2^54 states and 2^39 actions.

• #1200
Fast SVM Trained by Divide-and-Conquer Anchors
Meng Liu, Chang Xu, Chao Xu, Dacheng Tao
Classification 3

Support vector machine (SVM) is the most frequently used classifier for machine learning tasks. However, its training time could become cumbersome when the size of training data is very large. Thus, many kinds of representative subsets are chosen from the original dataset to reduce the training complexity. In this paper, we propose to choose the representative points, denoted as anchors, obtained from non-negative matrix factorization (NMF) in a divide-and-conquer framework, and then use the anchors to train an approximate SVM. Our theoretical analysis shows that solving the DCA-SVM can yield an approximate solution close to the primal SVM. Experimental results on multiple datasets demonstrate that our DCA-SVM is faster than the state-of-the-art algorithms without notably decreasing the accuracy of classification results.

• #3171
Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization
Zebang Shen, Hui Qian, Tongzhou Mu, Chao Zhang
Classification 3

Nowadays, algorithms with fast convergence, small memory footprints, and low per-iteration complexity are particularly favorable for artificial intelligence applications. In this paper, we propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large-scale empirical risk minimization problems for learning tasks. While enjoying a provably superior convergence rate, in each iteration the algorithm accesses only a mini-batch of samples and updates a small block of variable coordinates, which substantially reduces the amount of memory reference when both a massive sample size and ultra-high dimensionality are involved. Specifically, to obtain an ε-accurate solution, our algorithm requires only O(log(1/ε)/√ε) overall computation for the general convex case and O((n+√(nκ))log(1/ε)) for the strongly convex case. Empirical studies on huge-scale datasets are conducted to illustrate the efficiency of our method in practice.

### Wednesday 23 08:30 - 10:00 ML-DL3 - Deep Learning 3 (210)

Chair: Longbing Cao
• #2139
Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering
Zhuxi Jiang, Yin Zheng, Huachun Tan, Bangsheng Tang, Hanning Zhou
Deep Learning 3

Clustering is among the most fundamental tasks in machine learning and artificial intelligence. In this paper, we propose Variational Deep Embedding (VaDE), a novel unsupervised generative clustering approach within the framework of Variational Auto-Encoder (VAE). Specifically, VaDE models the data generative procedure with a Gaussian Mixture Model (GMM) and a deep neural network (DNN): 1) the GMM picks a cluster; 2) from which a latent embedding is generated; 3) then the DNN decodes the latent embedding into an observable. Inference in VaDE is done in a variational way: a different DNN is used to encode observables to latent embeddings, so that the evidence lower bound (ELBO) can be optimized using the Stochastic Gradient Variational Bayes (SGVB) estimator and the reparameterization trick. Quantitative comparisons with strong baselines are included in this paper, and experimental results show that VaDE significantly outperforms the state-of-the-art clustering methods on 5 benchmarks from various modalities. Moreover, by VaDE's generative nature, we show its capability of generating highly realistic samples for any specified cluster, without using supervised information during training.
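
The three-step generative procedure listed in this abstract (pick a cluster, generate a latent embedding, decode it) can be sketched as ancestral sampling. This is a toy illustration only: the mixture parameters are hand-set, `toy_decoder` is a hypothetical fixed affine map standing in for the trained DNN decoder, and no inference or training is shown.

```python
import random


def sample_observation(weights, means, stds, decode, rng):
    """Ancestral sampling: cluster -> latent embedding -> observable.

    weights[k] is the mixing weight of cluster k; means[k] and stds[k]
    give the per-dimension mean and standard deviation of its Gaussian.
    """
    # 1) The mixture picks a cluster.
    cluster = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # 2) A latent embedding is drawn from that cluster's Gaussian.
    latent = [rng.gauss(m, s) for m, s in zip(means[cluster], stds[cluster])]
    # 3) The decoder maps the latent embedding to an observable.
    return cluster, latent, decode(latent)


def toy_decoder(latent):
    """Hypothetical stand-in for the decoder network: a fixed affine map."""
    return [2.0 * z + 1.0 for z in latent]
```

Calling `sample_observation` with a seeded `random.Random` instance gives reproducible draws, and fixing the cluster index instead of sampling it would correspond to generating samples for a specified cluster.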

• #2397
Convolutional-Match Networks for Question Answering
Spyridon Samothrakis, Tom Vodopivec, Michael Fairbank, Maria Fasli
Deep Learning 3

In this paper, we present a simple, yet effective, attention and memory mechanism that is reminiscent of Memory Networks and we demonstrate it in question-answering scenarios. Our mechanism is based on four simple premises: a) memories can be formed from word sequences by using convolutional networks; b) distance measurements can be taken at a neuronal level; c) a recursive softmax function can be used for attention; d) extensive weight sharing can help profoundly. We achieve state-of-the-art results in the bAbI tasks, outperforming both Memory Networks and the Differentiable Neural Computer, both in terms of accuracy and stability (i.e. variance) of results.

• #2919
Improved Deep Embedded Clustering with Local Structure Preservation
Xifeng Guo, Long Gao, Xinwang Liu, Jianping Yin
Deep Learning 3

Deep clustering learns deep feature representations that favor the clustering task using neural networks. Some pioneering work proposes to simultaneously learn embedded features and perform clustering by explicitly defining a clustering-oriented loss. Though promising performance has been demonstrated in various applications, we observe that a vital ingredient has been overlooked by these works: the defined clustering loss may corrupt the feature space, which leads to non-representative, meaningless features and in turn hurts clustering performance. To address this issue, in this paper, we propose the Improved Deep Embedded Clustering (IDEC) algorithm to take care of data structure preservation. Specifically, we manipulate the feature space to scatter data points using a clustering loss as guidance. To constrain the manipulation and maintain the local structure of the data generating distribution, an under-complete autoencoder is applied. By integrating the clustering loss and the autoencoder's reconstruction loss, IDEC can jointly optimize cluster label assignment and learn features that are suitable for clustering with local structure preservation. The resultant optimization problem can be effectively solved by mini-batch stochastic gradient descent and backpropagation. Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm.

• #2988
Modeling Hebb Learning Rule for Unsupervised Learning
Jia Liu, Maoguo Gong, Qiguang Miao
Deep Learning 3

This paper models the Hebb learning rule and proposes a neuron learning machine (NLM). The Hebb learning rule describes the plasticity of the connection between presynaptic and postsynaptic neurons and is itself unsupervised. It formulates the updating gradient of the connecting weight in artificial neural networks. In this paper, we construct an objective function via modeling the Hebb rule. We make a hypothesis to simplify the model and introduce a correlation-based constraint according to the hypothesis and the stability of solutions. By analysis from the perspectives of maintaining abstract information and increasing the energy-based probability of observed data, we find that this biologically inspired model has the capability of learning useful features. NLM can also be stacked to learn hierarchical features and reformulated into a convolutional version to extract features from 2-dimensional data. Experiments on single-layer and deep networks demonstrate the effectiveness of NLM in unsupervised feature learning.

• #3002
DRLnet: Deep Difference Representation Learning Network and An Unsupervised Optimization Framework
Puzhao Zhang, Maoguo Gong, Hui Zhang, Jia Liu
Deep Learning 3

Change detection and analysis (CDA) is an important research topic in the joint interpretation of spatial-temporal remote sensing images. The core of CDA is to effectively represent the difference and measure the difference degree between bi-temporal images. In this paper, we propose a novel difference representation learning network (DRLnet) and an effective optimization framework without any supervision. Difference measurement, difference representation learning and unsupervised clustering are combined into a single model, i.e., DRLnet, which is driven to learn clustering-friendly and discriminative difference representations (DRs) for different types of changes. Further, DRLnet is extended into a recurrent learning framework to update and reuse limited training samples and prevent the semantic gaps caused by the saltation in the number of change types from the over-clustering stage to the desired one. Experimental results demonstrate the effectiveness of the proposed framework.

• #3596
SEVEN: Deep Semi-supervised Verification Networks
Vahid Noroozi, Lei Zheng, Sara Bahaadini, Sihong Xie, Philip S. Yu
Deep Learning 3

Verification determines whether two samples belong to the same class or not, and has important applications such as face and fingerprint verification, where thousands or millions of categories are present but each category has scarce labeled examples, presenting two major challenges for existing deep learning models. We propose a deep semi-supervised model named SEmi-supervised VErification Network (SEVEN) to address these challenges. The model consists of two complementary components. The generative component addresses the lack of supervision within each category by learning general salient structures from a large amount of data across categories. The discriminative component exploits the learned general features to mitigate the lack of supervision within categories, and also directs the generative component to find more informative structures of the whole data manifold. The two components are tied together in SEVEN to allow end-to-end training of the two components. Extensive experiments on four verification tasks demonstrate that SEVEN significantly outperforms other state-of-the-art deep semi-supervised techniques when labeled data are in short supply. Furthermore, SEVEN is competitive with fully supervised baselines trained with a larger amount of labeled data. This indicates the importance of the generative component in SEVEN.

### Wednesday 23 08:30 - 10:00 ML-DMP - Data Mining and Personalization (211)

Chair: Jose Such
• #3308
FolkPopularityRank: Tag Recommendation for Enhancing Social Popularity using Text Tags in Content Sharing Services
Toshihiko Yamasaki, Jiani Hu, Shumpei Sano, Kiyoharu Aizawa
Data Mining and Personalization

In this study, we address two emerging yet challenging problems in social media: (1) scoring the text tags in terms of their influence on the numbers of views, comments, and favorite ratings of images and videos on content sharing services, and (2) recommending additional tags to increase such popularity-related numbers. For these purposes, we present the FolkPopularityRank algorithm, which can score text tags based on their ability to influence the popularity-related numbers. The FolkPopularityRank algorithm is inspired by the PageRank and FolkRank algorithms, but the scores of the tags are calculated not only from the co-occurrence of the tags but also by considering the popularity-related numbers of the content. To the best of our knowledge, this is the first attempt at recommending tags that can enhance popularity attributes of social media. We conducted extensive experiments with about 1,000 images. We uploaded the photos with the recommended tags along with the original tags to Flickr as a real test, and obtained very promising results.

• #1512
Sampling for Approximate Maximum Search in Factorized Tensor
Zhi Lu, Yang Hu, Bing Zeng
Data Mining and Personalization

Factorization models have been extensively used for recovering the missing entries of a matrix or tensor. However, directly computing all of the entries using the learned factorization models is prohibitive when the size of the matrix/tensor is large. On the other hand, in many applications, such as collaborative filtering, we are only interested in a few entries that are the largest among them. In this work, we propose a sampling-based approach for finding the top entries of a tensor which is decomposed by the CANDECOMP/PARAFAC model. We develop an algorithm to sample the entries with probabilities proportional to their values. We further extend it to make the sampling proportional to the $k$-th power of the values, amplifying the focus on the top ones. We provide theoretical analysis of the sampling algorithm and evaluate its performance on several real-world data sets. Experimental results indicate that the proposed approach is orders of magnitude faster than exhaustive computing. When applied to the special case of searching in a matrix, it also requires fewer samples than the other state-of-the-art method.
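
As an illustration of sampling entries "with probabilities proportional to their values", here is a minimal sketch under the assumption that all factor matrices are non-negative. It first picks a rank-one component and then picks each mode's index independently within that component; the paper's actual algorithm and its $k$-th power extension may differ, and the function names are hypothetical.

```python
import random

def sample_entry(factors, rng):
    """Draw one index tuple with probability proportional to the entry's value.

    factors: list of N non-negative matrices (lists of rows), each of shape I_n x R.
    """
    rank = len(factors[0][0])
    # Weight of component r is the product, over modes, of its column sum.
    col_sums = [[sum(row[r] for row in A) for r in range(rank)] for A in factors]
    comp_weights = []
    for r in range(rank):
        weight = 1.0
        for sums in col_sums:
            weight *= sums[r]
        comp_weights.append(weight)
    r = rng.choices(range(rank), weights=comp_weights)[0]
    # Given component r, each mode's index is drawn proportional to column r.
    return tuple(
        rng.choices(range(len(A)), weights=[row[r] for row in A])[0]
        for A in factors
    )

def entry_value(factors, index):
    """Value of one tensor entry under the CP model: sum over r of the product of factors."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for A, i in zip(factors, index):
            term *= A[i][r]
        total += term
    return total
```

Because the sampler never materializes the full tensor, its cost per sample depends on the factor sizes rather than on the number of entries, which is the motivation the abstract gives for avoiding exhaustive computation.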

• #2183
Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua
Data Mining and Personalization

Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating second-order feature interactions. Despite its effectiveness, FM can be hindered by its modelling of all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. For example, the interactions with useless features may even introduce noise and adversely degrade the performance. In this work, we improve FM by discriminating the importance of different feature interactions. We propose a novel model named Attentional Factorization Machine (AFM), which learns the importance of each feature interaction from data via a neural attention network. Extensive experiments on two real-world datasets demonstrate the effectiveness of AFM. Empirically, it is shown that on a regression task AFM betters FM with an 8.6% relative improvement, and consistently outperforms the state-of-the-art deep learning methods Wide&Deep [Cheng et al., 2016] and DeepCross [Shan et al., 2016] with a much simpler structure and fewer model parameters. Our implementation of AFM is publicly available at: https://github.com/hexiangnan/attentional_factorization_machine
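
To make the contrast concrete, the sketch below computes a standard second-order FM prediction and an attention-weighted variant. It is illustrative only: the attention scores are passed in as a plain dictionary (in AFM they would be produced by the attention network), and the function names and the projection vector `p` are assumptions, not the API of the linked repository.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM: bias + linear terms + sum over i<j of <V[i], V[j]> * x[i] * x[j]."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise

def attentional_fm_predict(x, w0, w, V, attention, p):
    """Attention-weighted variant: each pair (i, j) is scaled by attention[(i, j)].

    The element-wise products of the embeddings are pooled with the attention
    weights and then projected to a scalar with the vector p.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pooled = [0.0] * len(p)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            a = attention[(i, j)]
            for f in range(len(p)):
                pooled[f] += a * V[i][f] * V[j][f] * x[i] * x[j]
    return linear + sum(pf * hf for pf, hf in zip(p, pooled))
```

With every attention score equal to 1 and `p` a vector of ones, the second function reduces to the first, which corresponds to the "same weight for all interactions" behaviour the abstract criticizes.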

• #2665
Learning User's Intrinsic and Extrinsic Interests for Point-of-Interest Recommendation: A Unified Approach
Huayu Li, Yong Ge, Defu Lian, Hao Liu
Data Mining and Personalization

Point-of-Interest (POI) recommendation has been an important service on location-based social networks. However, it is very challenging to generate accurate recommendations due to the complex nature of users' interest in POIs and data sparseness. In this paper, we propose a novel unified approach that can effectively learn fine-grained and interpretable user interests, and adaptively model the missing data. Specifically, a user's general interest in a POI is modeled as a mixture of her intrinsic and extrinsic interests, upon which we formulate the ranking constraints in our unified recommendation approach. Furthermore, a self-adaptive location-oriented method is proposed to capture the inherent property of missing data, which is formulated as a squared-error-based loss in our unified optimization objective. Extensive experiments on real-world datasets demonstrate the effectiveness and advantage of our approach.

• #2708
Tracking the Evolution of Customer Purchase Behavior Segmentation via a Fragmentation-Coagulation Process
Ling Luo, Bin Li, Irena Koprinska, Shlomo Berkovsky, Fang Chen
Data Mining and Personalization

Customer behavior modeling is important for businesses in order to understand, attract and retain customers. It is critical that the models are able to track the dynamics of customer behavior over time. We propose FC-CSM, a Customer Segmentation Model based on a Fragmentation-Coagulation process, which can track the evolution of customer segmentation, including the splitting and merging of customer groups. We conduct a case study using transaction data from a major Australian supermarket chain, where we: 1) show that our model achieves high fitness of purchase rate, outperforming models using mixture of Poisson processes; 2) compare the impact of promotions on customers for different products; and 3) track how customer groups evolve over time and how individual customers shift across groups. Our model provides valuable information to stakeholders about the different types of customers, how they change purchase behavior, and which customers are more receptive to promotion campaigns.

• #2737
Life-Stage Modeling by Customer-Manifold Embedding
Jing-Wen Yang, Yang Yu, Xiao-Peng Zhang
Data Mining and Personalization

A person experiences different stages throughout life, causing dramatically varying behavior patterns. In applications such as online shopping, it has been observed that customer behaviors are largely affected by their stages and evolve over time. Although this phenomenon has been recognized previously, very few studies have tried to model the life-stage and make use of it. In this paper, we propose to discover a latent space, called customer-manifold, on which a position corresponds to a customer stage. The customer-manifold allows us to train a static prediction model that captures dynamic customer behavior patterns. We further embed the learned customer-manifold into a neural network model as a hidden layer output, resulting in an efficient and accurate customer behavior prediction system. We apply this system to online shopping recommendation. Experiments on real-world data show that taking the customer-manifold into account can improve the performance of the recommender system. Moreover, visualization of the customer-manifold space may also be helpful to understand the evolutionary customer behaviors.

### Wednesday 23 08:30 - 10:00 ML-SSL1 - Semi-Supervised Learning 1 (212)

Chair: Ming Li
• #1846
Storage Fit Learning with Unlabeled Data
Bo-Jian Hou, Lijun Zhang, Zhi-Hua Zhou
Semi-Supervised Learning 1

By using abundant unlabeled data, semi-supervised learning approaches have been found very useful in various tasks. Existing approaches, however, neglect the fact that the storage available for the learning process is different under different situations, and thus, the learning approaches should be flexible subject to the storage budget limit. In this paper, we focus on graph-based semi-supervised learning and propose two storage fit learning approaches which can adjust their behaviors to different storage budgets. Specifically, we utilize techniques of low-rank matrix approximation to find a low-rank approximator of the similarity matrix so as to reduce the space complexity. The first approach is based on stochastic optimization, which is an iterative approach that converges to the optimal low-rank approximator globally. The second approach is based on the Nyström method, which can find a good low-rank approximator efficiently and is suitable for real-time applications. Experiments on classification tasks show that the proposed methods can dynamically fit different storage budgets and obtain good performance in different scenarios.

• #1856
Multi-Positive and Unlabeled Learning
Yixing Xu, Chang Xu, Chao Xu, Dacheng Tao
Semi-Supervised Learning 1

The positive and unlabeled (PU) learning problem focuses on learning a classifier from positive and unlabeled data. Some methods have been developed to solve the PU learning problem. However, they are often limited in practical applications, since they involve only binary classes and cannot easily be adapted to multi-class data. Here we propose a one-step method that directly enables a multi-class model to be trained using the given input multi-class data and that predicts the label based on the model decision. Specifically, we construct different convex loss functions for labeled and unlabeled data to learn a discriminant function F. The theoretical analysis on the generalization error bound shows that it is no worse than k√k times that of the fully supervised multi-class classification methods when the size of the data in k classes is of the same order. Finally, our experimental results demonstrate the significance and effectiveness of the proposed algorithm on synthetic and real-world datasets.

• #1414
Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval
Liang Zhang, Bingpeng Ma, Jianfeng He, Guorong Li, Qingming Huang, Qi Tian
Semi-Supervised Learning 1

Motivated by the fact that both the relevancy of class labels and unlabeled data can help to strengthen multi-modal correlation, this paper proposes a novel method for cross-modal retrieval. To move each sample in the direction of its relevant label while keeping it far away from its irrelevant ones, a novel dragging technique is fused into a unified linear regression model. In this way, not only the relation between embedded features and relevant class labels but also the relation between embedded features and irrelevant class labels can be exploited. Moreover, considering that some unlabeled data contain specific semantic information, a weighted regression model is designed to adaptively enlarge their contribution while weakening that of the unlabeled data with non-specific semantic information. Hence, unlabeled data can supply semantic information to enhance the discriminant ability of the classifier. Finally, we integrate the constraints into a joint minimization formulation and develop an efficient optimization algorithm to learn a discriminative common subspace for different modalities. Experimental results on the Wiki, Pascal and NUS-WIDE datasets show that the proposed method outperforms the state-of-the-art methods even when we set 20% of the samples without class labels.

• #1457
Instance-Level Label Propagation with Multi-Instance Learning
Qifan Wang, Gal Chechik, Chen Sun, Bin Shen
Semi-Supervised Learning 1

Label propagation is a popular semi-supervised learning technique that transfers information from labeled examples to unlabeled examples through a graph. Most label propagation methods construct a graph based on example-to-example similarity, assuming that the resulting graph connects examples that share similar labels. Unfortunately, example-level similarity is sometimes badly defined. For instance, two images may contain two different objects, but have a similar overall appearance due to a large, similar background. In this case, computing similarities based on the whole image would fail to propagate information to the right labels. This paper proposes a novel Instance-Level Label Propagation (ILLP) approach that integrates label propagation with multi-instance learning. Each example is treated as containing multiple instances, as in the case of an image consisting of multiple regions. We first construct a graph based on instance-level similarity and then simultaneously identify the instances carrying the labels and propagate the labels across instances in the graph. Optimization is based on an iterative Expectation Maximization (EM) algorithm. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed approach over several state-of-the-art methods.
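
For readers unfamiliar with the baseline, the following is a minimal sketch of plain example-level label propagation with clamped labeled nodes. It is not ILLP: the multi-instance step and the EM optimization are not shown, and the function name and iteration count are hypothetical choices.

```python
def propagate_labels(weights, labels, num_classes, iterations=100):
    """Propagate labels over a graph by repeated neighborhood averaging.

    weights: symmetric n x n similarity matrix (list of lists).
    labels: list with a class id for labeled nodes and None for unlabeled ones.
    Returns the predicted class id for every node.
    """
    n = len(weights)
    scores = [[0.0] * num_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            scores[i][y] = 1.0
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            degree = sum(weights[i])
            if labels[i] is not None or degree == 0:
                # Labeled nodes stay clamped; isolated nodes keep their scores.
                new_scores.append(scores[i])
                continue
            new_scores.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / degree
                for c in range(num_classes)
            ])
        scores = new_scores
    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]
```

On a chain of four nodes whose end points carry different labels, each unlabeled node ends up with the label of the nearer end point, which is exactly the behaviour that breaks down when whole-example similarity is misleading.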

• #2727
Learning Discriminative Recommendation Systems with Side Information
Feipeng Zhao, Yuhong Guo
Semi-Supervised Learning 1

Top-N recommendation systems are useful in many real world applications such as E-commerce platforms. Most previous methods produce top-N recommendations based on the observed user purchase or recommendation activities. Recently, it has been noticed that side information that describes the items can be produced from auxiliary sources and help to improve the performance of top-N recommendation systems; e.g., side information of the items can be collected from the item reviews. In this paper, we propose a joint discriminative prediction model that exploits both the partially observed user-item recommendation matrix and the item-based side information to build top-N recommendation systems. This joint model aggregates observed user-item recommendation activities to produce the missing user-item recommendation scores while simultaneously training a linear regression model to predict the user-item recommendation scores from auxiliary item features. We evaluate the proposed approach on a number of recommendation datasets. The experimental results show that the proposed joint model is very effective for producing top-N recommendation systems.

• #1276
Adaptive Semi-Supervised Learning with Discriminative Least Squares Regression
Minnan Luo, Lingling Zhang, Feiping Nie, Xiaojun Chang, Buyue Qian, Qinghua Zheng
Semi-Supervised Learning 1

Semi-supervised learning plays a significant role in multi-class classification, where a small number of labeled data are more deterministic while substantial unlabeled data might cause large uncertainties and potential threats. In this paper, we distinguish the label fitting of labeled and unlabeled training data through a probabilistic vector with an adaptive parameter, which always ensures the significant importance of labeled data and characterizes the contribution of each unlabeled instance according to its uncertainty. Instead of using traditional least squares regression (LSR) for classification, we develop a new discriminative LSR by equipping each label with an adjustment vector. This strategy avoids incorrect penalization on samples that are far away from the boundary and simultaneously facilitates multi-class classification by enlarging the geometrical distance of instances belonging to different classes. An efficient alternative algorithm is exploited to solve the proposed model with a closed-form solution for each updating rule. We also analyze the convergence and complexity of the proposed algorithm theoretically. Experimental results on several benchmark datasets demonstrate the effectiveness and superiority of the proposed model for multi-class classification tasks.

### Wednesday 23 08:30 - 10:00 MT-CG - Computer Games (213)

Chair: Sven Koenig
• #1222
Real-Time Navigation in Classical Platform Games via Skill Reuse
Michael Dann, Fabio Zambetta, John Thangarajah
Computer Games

In platform videogames, players are frequently tasked with solving medium-term navigation problems in order to gather items or powerups. Artificial agents must generally obtain some form of direct experience before they can solve such tasks. Experience is gained either through training runs, or by exploiting knowledge of the game's physics to generate detailed simulations. Human players, on the other hand, seem to look ahead in high-level, abstract steps. Motivated by human play, we introduce an approach that leverages not only abstract "skills", but also knowledge of what those skills can and cannot achieve. We apply this approach to Infinite Mario, where despite facing randomly generated, maze-like levels, our agent is capable of deriving complex plans in real-time, without relying on perfect knowledge of the game's physics.

• #2404
Player Movement Models for Video Game Level Generation
Sam Snodgrass, Santiago Ontañón
Computer Games

The use of statistical and machine learning approaches, such as Markov chains, for procedural content generation (PCG) has been growing in recent years in the field of Game AI. However, there has been little work in learning to generate content, specifically levels, accounting for player movement within those levels. We are interested in extracting player models automatically from play traces and using those learned models, paired with a machine learning-based generator to create levels that allow the same types of movements observed in the play traces. We test our approach by generating levels for Super Mario Bros. We compare our results against the original levels, a previous constrained sampling approach, and a previous approach that learned a combined player and level model.

• #1293
Stratified Strategy Selection for Unit Control in Real-Time Strategy Games
Levi H. S. Lelis
Computer Games

In this paper we introduce Stratified Strategy Selection (SSS), a novel search algorithm for micromanaging units in real-time strategy (RTS) games. SSS uses a type system to partition the player's units into types and assumes that units of the same type must follow the same strategy. SSS searches in the state space induced by the type system to select, from a pool of options, a strategy for each unit. Empirical results on a simulator of an RTS game show that SSS employing either fixed or adaptive type systems is able to substantially outperform state-of-the-art search-based algorithms in combat scenarios with up to 100 units.

• #2034
Focused Depth-first Proof Number Search using Convolutional Neural Networks for the Game of Hex
Chao Gao, Martin Müller, Ryan Hayward
Computer Games

Proof Number search (PNS) is an effective algorithm for searching theoretical values on games with non-uniform branching factors. Focused depth-first proof number search (FDFPN) with dynamic widening was proposed for Hex where the branching factor is nearly uniform. However, FDFPN is fragile to its heuristic move ordering function. The recent advances of Convolutional Neural Networks (CNNs) have led to considerable progress in game playing. We investigate how to incorporate the strength of CNNs into solving, with application to the game of Hex. We describe FDFPN-CNN, a new focused DFPN search that uses convolutional neural networks. FDFPN-CNN integrates two CNNs trained from games played by expert players. The value approximation CNN provides reliable information for defining the widening size by estimating the value of the node to expand, while the policy CNN selects promising child nodes for the search. On 8x8 Hex, experimental results show FDFPN-CNN performs notably better than FDFPN, suggesting a promising direction for better solving Hex positions where learning from strong players is possible.
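
As background for the search family discussed here, the sketch below computes the standard proof and disproof numbers of an AND/OR tree, the quantities that proof number search uses to pick the next node to expand. It does not implement FDFPN, dynamic widening, or the CNN components, and the dictionary-based tree layout is a hypothetical choice made for brevity.

```python
INF = float("inf")

def proof_numbers(node):
    """Return (proof number, disproof number) for a node of an AND/OR tree.

    A leaf is {"value": True}, {"value": False}, or {"value": None} (unknown).
    An internal node is {"type": "OR" or "AND", "children": [...]}.
    """
    if "value" in node:
        if node["value"] is True:
            return (0, INF)   # already proven
        if node["value"] is False:
            return (INF, 0)   # already disproven
        return (1, 1)         # unknown leaf: one node left to prove or disprove
    pairs = [proof_numbers(child) for child in node["children"]]
    pns = [pn for pn, _ in pairs]
    dns = [dn for _, dn in pairs]
    if node["type"] == "OR":
        # Proving one child suffices; disproving requires all children.
        return (min(pns), sum(dns))
    # AND node: proving requires all children; disproving one suffices.
    return (sum(pns), min(dns))
```

When children have very different subtree sizes, these numbers differ sharply and guide the search well; with a nearly uniform branching factor, as the abstract notes for Hex, they carry less information, which motivates the extra move-ordering heuristics.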

• #3929
Interactive Narrative Personalization with Deep Reinforcement Learning
Pengcheng Wang, Jonathan Rowe, Wookhee Min, Bradford Mott, James Lester
Computer Games

Data-driven techniques for interactive narrative generation are the subject of growing interest. Reinforcement learning (RL) offers significant potential for devising data-driven interactive narrative generators that tailor players’ story experiences by inducing policies from player interaction logs. A key open question in RL-based interactive narrative generation is how to model complex player interaction patterns to learn effective policies. In this paper we present a deep RL-based interactive narrative generation framework that leverages synthetic data produced by a bipartite simulated player model. Specifically, the framework involves training a set of Q-networks to control adaptable narrative event sequences with long short-term memory network-based simulated players. We investigate the deep RL framework’s performance with an educational interactive narrative, Crystal Island. Results suggest that the deep RL-based narrative generation framework yields effective personalized interactive narratives.

• #4152
Game Engine Learning from Video
Matthew Guzdial, Boyang Li, Mark O. Riedl
Computer Games

Intelligent agents need to be able to make predictions about their environment. In this work we present a novel approach to learn a forward simulation model via simple search over pixel input. We make use of a video game, Super Mario Bros., as an initial test of our approach as it represents a physics system that is significantly less complex than reality. We demonstrate the significant improvement of our approach in predicting future states compared with a baseline CNN and apply the learned model to train a game playing agent. Thus we evaluate the algorithm in terms of the accuracy and value of its output model.

### Wednesday 23 08:30 - 10:00 CS-MOTR - Modeling and Formulation (216)

Chair: Mark Wallace
• #3673
Cardinality Encodings for Graph Optimization Problems
Alexey Ignatiev, Antonio Morgado, Joao Marques-Silva
Modeling and Formulation

Different optimization problems defined on graphs find application in complex network analysis. Existing propositional encodings render impractical the use of propositional satisfiability (SAT) and maximum satisfiability (MaxSAT) solvers for solving a variety of these problems on large graphs. This paper has two main contributions. First, the paper identifies sources of inefficiency in existing encodings for different optimization problems in graphs. Second, for the concrete case of the maximum clique problem, the paper develops a novel encoding which is shown to be far more compact than existing encodings for large sparse graphs. More importantly, the experimental results show that the proposed encoding enables existing SAT solvers to compute a maximum clique for large sparse networks, often more efficiently than the state of the art.

• #3037
Temporal Planning with Clock-Based SMT Encodings
Jussi Rintanen
Modeling and Formulation

We propose more scalable encodings of temporal planning in SMT. The first contribution is practical clock-based encodings of resources and effect delays. Existing encodings of effect delays (Shin and Davis, 2015) have a quadratic size, due to the necessity to determine the time differences between steps for a linear number of steps. Clocks improve this to linear. The second contribution is a new relaxed scheme for steps. Existing schemes require a step for every time point with discontinuous change. This is relaxed, improving scalability.

• #3157
Finding Robust Solutions to Stable Marriage
Begum Genc, Mohamed Siala, Barry O'Sullivan, Gilles Simonin
Modeling and Formulation

We study the notion of robustness in stable matching problems. We first define robustness by introducing (a,b)-supermatches. An (a,b)-supermatch is a stable matching in which if a pairs break up it is possible to find another stable matching by changing the partners of those a pairs and at most b other pairs. In this context, we define the most robust stable matching as a (1,b)-supermatch where b is minimum. We show that checking whether a given stable matching is a (1,b)-supermatch can be done in polynomial time. Next, we use this procedure to design a constraint programming model, a local search approach, and a genetic algorithm to find the most robust stable matching. Our empirical evaluation on large instances shows that local search outperforms the other approaches.
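
The definitions above build on ordinary stability, which can be checked by looking for blocking pairs. The sketch below shows only that basic check; the polynomial-time (1,b)-supermatch procedure from the paper is not reproduced, and the function names and data layout are hypothetical.

```python
def blocking_pairs(matching, men_prefs, women_prefs):
    """List pairs (m, w) who both prefer each other to their current partners.

    matching: dict mapping each man to his partner.
    men_prefs / women_prefs: dicts mapping a person to a list of the other
    side, most preferred first.
    """
    husband = {w: m for m, w in matching.items()}
    pairs = []
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break  # every later woman is ranked below m's partner
            rank = women_prefs[w].index
            if rank(m) < rank(husband[w]):
                pairs.append((m, w))
    return pairs

def is_stable(matching, men_prefs, women_prefs):
    """A matching is stable when it has no blocking pair."""
    return not blocking_pairs(matching, men_prefs, women_prefs)
```

A robustness check in the spirit of the abstract would call a test like `is_stable` on candidate repaired matchings after one pair is forced apart, counting how many other pairs had to change.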

• #3233
Nonlinear Hybrid Planning with Deep Net Learned Transition Models and Mixed-Integer Linear Programming
Buser Say, Ga Wu, Yu Qing Zhou, Scott Sanner
Modeling and Formulation

In many real-world hybrid (mixed discrete continuous) planning problems such as Reservoir Control, Heating, Ventilation and Air Conditioning (HVAC), and Navigation, it is difficult to obtain a model of the complex nonlinear dynamics that govern state evolution. However, the ubiquity of modern sensors allows us to collect large quantities of data from each of these complex systems and build accurate, nonlinear deep network models of their state transitions. But there remains one major problem for the task of control: how can we plan with deep network learned transition models without resorting to Monte Carlo Tree Search and other black-box transition model techniques that ignore model structure and do not easily extend to mixed discrete and continuous domains? In this paper, we make the critical observation that the popular Rectified Linear Unit (ReLU) transfer function for deep networks not only allows accurate nonlinear deep net model learning, but also permits a direct compilation of the deep network transition model to a Mixed-Integer Linear Program (MILP) encoding in a planner we call Hybrid Deep MILP Planning (HD-MILP-PLAN). We identify deep net specific optimizations and a simple sparsification method for HD-MILP-PLAN that improve performance over a naive encoding, and show that we are able to plan optimally with respect to the learned deep network.
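
One common way to see why a ReLU unit compiles to a MILP is the textbook big-M formulation, sketched below as a feasibility check. The abstract does not give the paper's exact encoding, so this is an assumption for illustration; `relu_bigm_feasible`, the bound `big_m`, and the tolerance are hypothetical names and values.

```python
def relu_bigm_feasible(x, y, z, big_m, tol=1e-9):
    """Check the big-M constraints that force y = max(0, x) when |x| <= big_m.

    z is a binary indicator: z = 1 means the unit is active (y = x),
    z = 0 means it is inactive (y = 0).
    """
    return (
        z in (0, 1)
        and y >= -tol                           # y >= 0
        and y >= x - tol                        # y >= x
        and y <= x + big_m * (1 - z) + tol      # active case: y <= x
        and y <= big_m * z + tol                # inactive case: y <= 0
    )
```

All four inequalities are linear in `x`, `y`, and `z`, so stacking one such group per hidden unit turns a trained ReLU network into constraints that a MILP solver can optimize over directly.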

• #3418
Relaxed Exists-Step Plans in Planning as SMT
Miquel Bofill, Joan Espasa, Mateu Villaret
Modeling and Formulation

Planning Modulo Theories (PMT), inspired by Satisfiability Modulo Theories (SMT), allows the integration of arbitrary first order theories, such as linear arithmetic, with propositional planning. Under this setting, planning as SAT is generalized to planning as SMT. In this paper we introduce a new encoding for planning as SMT, which adheres to the relaxed relaxed ∃-step (R²∃-step) semantics for parallel plans. We show the benefits of relaxing the requirements on the set of actions eligible to be executed at the same time, even though many redundant actions can be introduced. We also show how, by a MaxSMT based post-processing step, redundant actions can be efficiently removed, and provide experimental results showing the benefits of this approach.

• #3676
Compact MDDs for Pseudo-Boolean Constraints with At-Most-One Relations in Resource-Constrained Scheduling Problems
Miquel Bofill, Jordi Coll, Josep Suy, Mateu Villaret
Modeling and Formulation

Pseudo-Boolean (PB) constraints are usually encoded into Boolean clauses using compact Binary Decision Diagram (BDD) representations. Although these constraints appear in many problems, they are particularly useful for representing resource constraints in scheduling problems. Sometimes, the Boolean variables in the PB constraints have implicit at-most-one relations. In this work we introduce a way to take advantage of these implicit relations to obtain a compact Multi-Decision Diagram (MDD) representation for those PB constraints. We provide empirical evidence of the usefulness of this technique for some Resource-Constrained Project Scheduling Problem (RCPSP) variants, namely the Multi-Mode RCPSP (MRCPSP) and the RCPSP with Time-Dependent Resource Capacities and Requests (RCPSP/t). The size reduction of the representation of the PB constraints lets us decrease the number of Boolean variables in the encodings by one order of magnitude. We close/certify the optimum of many instances of these problems.

### Wednesday 23 08:30 - 10:00 KR-DLO1 - Description Logics and Ontologies 1 (217)

Chair: Sebastian Rudolph
• #936
Role Forgetting for ALCOQH(universal role)-Ontologies Using an Ackermann-Based Approach
Yizheng Zhao, Renate A. Schmidt
Description Logics and Ontologies 1

Forgetting refers to a non-standard reasoning problem concerned with eliminating concept and role symbols from description logic-based ontologies while preserving all logical consequences up to the remaining symbols. Whereas previous research has primarily focused on forgetting concept symbols, in this paper, we turn our attention to role symbol forgetting. In particular, we present a practical method for semantic role forgetting for ontologies expressible in the description logic ALCOQH(universal role), i.e., the basic description logic ALC extended with nominals, qualified number restrictions, role inclusions and the universal role. Being based on an Ackermann approach, the method is the only approach so far for forgetting role symbols in description logics with qualified number restrictions. The method is goal-oriented and incremental. It always terminates and is sound in the sense that the forgetting solution is equivalent to the original ontology up to the forgotten symbols possibly with new concept definer symbols. Despite our method not being complete, performance results of an evaluation with a prototypical implementation have shown very good success rates on real-world ontologies.

• #2278
Ontology-Mediated Querying with the Description Logic EL: Trichotomy and Linear Datalog Rewritability
Carsten Lutz, Leif Sabellek
Description Logics and Ontologies 1

We consider ontology-mediated queries (OMQs) based on an EL ontology and an atomic query (AQ), provide an ultimately fine-grained analysis of data complexity, and study rewritability into linear Datalog, aiming to capture linear recursion in SQL. Our main results are that every such OMQ is in AC0, NL-complete or PTime-complete, and that containment in NL coincides with rewritability into linear Datalog (whereas containment in AC0 coincides with rewritability into first-order logic). We establish natural characterizations of the three cases, show that deciding linear Datalog rewritability (as well as the mentioned complexities) is ExpTime-complete, give a way to construct linear Datalog rewritings when they exist, and prove that there is no constant bound on the arity of IDB relations in linear Datalog rewritings.
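As a small illustration of what "linear" means here: a Datalog program is linear when every rule body contains at most one derived (IDB) atom. The sketch below evaluates the textbook linear program for graph reachability; the program, the predicate names `Start`, `Edge`, `Reach`, and the sample facts are assumptions chosen for illustration and are not taken from the paper.

```python
from collections import deque

def eval_linear_reach(start, edges):
    """Evaluate the linear Datalog program
         Reach(x) :- Start(x).
         Reach(y) :- Reach(x), Edge(x, y).
    Each rule body has at most one IDB atom (Reach), so the program is linear.
    """
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)
    reach = set(start)          # facts derived so far
    frontier = deque(start)     # facts whose consequences are still pending
    while frontier:
        x = frontier.popleft()
        for y in succ.get(x, []):
            if y not in reach:
                reach.add(y)
                frontier.append(y)
    return reach

# Hypothetical EDB facts.
facts_edge = [("a", "b"), ("b", "c"), ("d", "e")]
derived = eval_linear_reach({"a"}, facts_edge)
print(sorted(derived))
```

Because each derivation step extends a single previously derived fact, the evaluation only ever needs to follow one fact at a time, which is the intuition behind the connection to NL.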

• #2396
A Characterization Theorem for a Modal Description Logic
Paul Wild, Lutz Schröder
Description Logics and Ontologies 1

Modal description logics feature modalities that capture dependence of knowledge on parameters such as time, place, or the information state of agents. E.g., the logic S5-ALC combines the standard description logic ALC with an S5-modality that can be understood as an epistemic operator or as representing (undirected) change. This logic embeds into a corresponding modal first-order logic S5-FOL. We prove a modal characterization theorem for this embedding, in analogy to results by van Benthem and Rosen relating ALC to standard first-order logic: We show that S5-ALC with only local roles is, both over finite and over unrestricted models, precisely the bisimulation-invariant fragment of S5-FOL, thus giving an exact description of the expressive power of S5-ALC with only local roles.

• #2411
Learning from Ontology Streams with Semantic Concept Drift
Jiaoyan Chen, Freddy Lecue, Jeff Z. Pan, Huajun Chen
Description Logics and Ontologies 1

Data stream learning has been widely studied for extracting knowledge structures from continuous and rapid data records. In the Semantic Web, data is interpreted in ontologies and its ordered sequence is represented as an ontology stream. Our work exploits the semantics of such streams to tackle the problem of concept drift, i.e., unexpected changes in data distribution that cause most models to become less accurate as time passes. To this end we revisit (i) semantic inference in the context of supervised stream learning, and (ii) models with semantic embeddings. The experiments show accurate prediction with data from Dublin and Beijing.

• #2542
The Bag Semantics of Ontology-Based Data Access
Charalampos Nikolaou, Egor V. Kostylev, George Konstantinidis, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks
Description Logics and Ontologies 1

Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show its rewritability to a generalisation of the relational calculus to bags.
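The difference between set and bag semantics for a mapping can be seen in a few lines. This is a minimal sketch under assumed names: the source table `rows`, the predicate `Dept`, and the projection mapping are hypothetical and only illustrate why duplicate retention matters for aggregates.

```python
from collections import Counter

# Hypothetical source table: (employee, department) rows.
rows = [("ann", "sales"), ("bob", "sales"), ("cat", "ops")]

# Hypothetical mapping: the ontology predicate Dept(d) is populated by
# projecting each source row onto its department column.
def dept_view_bag(source_rows):
    """Bag semantics: keep one copy of a tuple per contributing source row."""
    return Counter(dept for _, dept in source_rows)

def dept_view_set(source_rows):
    """Set semantics: duplicates collapse to a single tuple."""
    return {dept for _, dept in source_rows}

bag = dept_view_bag(rows)
as_set = dept_view_set(rows)

# A COUNT-style aggregate over Dept depends on which semantics is used.
count_under_bag = sum(bag.values())
count_under_set = len(as_set)
print(count_under_bag, count_under_set)
```

Under bag semantics the view keeps two copies of `sales`, so a count over it matches what a standard database would report for the same projection.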

• #3549
Ontology-Mediated Query Answering for Key-Value Stores
Meghyn Bienvenu, Pierre Bourhis, Marie-Laure Mugnier, Sophie Tison, Federico Ulliana
Description Logics and Ontologies 1

We propose a novel rule-based ontology language for JSON records and investigate its computational properties. After providing a natural translation into first-order logic, we identify relationships to existing ontology languages, which yield decidability of query answering but only rough complexity bounds. By establishing an interesting and non-trivial connection to word rewriting, we are able to pinpoint the exact combined complexity of query answering in our framework and obtain tractability results for data complexity. The upper bounds are proven using a query reformulation technique, which can be implemented on top of key-value stores, thereby exploiting their querying facilities.

### Wednesday 23 08:30 - 10:00 KR-GSTR - Geometric, Spatial, and Temporal Reasoning (218)

Chair: Pavel Naumov
• #1860
Efficiently Enforcing Path Consistency on Qualitative Constraint Networks by Use of Abstraction
Michael Sioutis, Jean-François Condotta
Geometric, Spatial, and Temporal Reasoning

Partial closure under weak composition, or partial weak path-consistency for short, is essential for tackling fundamental reasoning problems associated with qualitative constraint networks, such as the satisfiability checking problem, and therefore it is crucial to be able to enforce it as fast as possible. To this end, we propose a new algorithm, called PWCα, for efficiently enforcing partial weak path-consistency on qualitative constraint networks, that exploits the notion of abstraction for qualitative constraint networks, utilizes certain properties of partial weak path-consistency, and adapts the functionalities of some state-of-the-art algorithms to its design. It is worth noting that, as opposed to a related approach in the recent literature, algorithm PWCα is complete for arbitrary qualitative constraint networks. The evaluation that we conducted with qualitative constraint networks of the Region Connection Calculus, against a competing state-of-the-art generic algorithm for enforcing partial weak path-consistency, demonstrates the usefulness and efficiency of algorithm PWCα.

• #2788
Inferring Human Attention by Learning Latent Intentions
Ping Wei, Dan Xie, Nanning Zheng, Song-Chun Zhu
Geometric, Spatial, and Temporal Reasoning

This paper addresses the problem of inferring 3D human attention in RGB-D videos at scene scale. 3D human attention describes where a human is looking in 3D scenes. We propose a probabilistic method to jointly model attention, intentions, and their interactions. Latent intentions guide human attention which conversely reveals the intention features. This mutual interaction makes attention inference a joint optimization with latent intentions. An EM-based approach is adopted to learn the latent intentions and model parameters. Given an RGB-D video with 3D human skeletons, a joint-state dynamic programming algorithm is utilized to jointly infer the latent intentions, the 3D attention directions, and the attention voxels in scene point clouds. Experiments on a new 3D human attention dataset prove the strength of our method.

• #3490
Dynamic Logic for Data-aware Systems: Decidability Results
Francesco Belardinelli, Andreas Herzig
Geometric, Spatial, and Temporal Reasoning

We introduce a first-order extension of dynamic logic (FO-DL), suitable to represent and reason about the behaviour of Data-aware Systems (DaS), which are systems whose data content is explicitly exhibited in the system’s description. We illustrate the expressivity of the formal framework by modelling English auctions as DaS, and by specifying relevant properties in FO-DL. Most importantly, we develop an abstraction-based verification procedure, thus proving that the model checking problem for DaS against FO-DL is actually decidable, provided some mild assumptions on the interpretation domain.

• #3604
Temporal Sequences of Qualitative Information: Reasoning about the Topology of Constant-Size Moving Regions
Quentin Cohen-Solal, Maroua Bouzid, Alexandre Niveau
Geometric, Spatial, and Temporal Reasoning

Relying on the recently introduced multi-algebras, we present a general approach for reasoning about temporal sequences of qualitative information that is generally more efficient than existing techniques. Applying our approach to the specific case of sequences of topological information about constant-size regions, we show that the resulting formalism has a complete procedure for deciding consistency, and we identify its three maximal tractable subclasses containing all basic relations.

• #3903
Temporalising Separation Logic for Planning with Search Control Knowledge
Xu Lu, Cong Tian, Zhenhua Duan
Geometric, Spatial, and Temporal Reasoning

Temporal logics are widely adopted in Artificial Intelligence (AI) planning for specifying Search Control Knowledge (SCK). However, traditional temporal logics are limited in expressive power since they are unable to express spatial constraints which are as important as temporal ones in many planning domains. To this end, we propose a two-dimensional (spatial and temporal) logic namely PPTL^SL by temporalising separation logic with Propositional Projection Temporal Logic (PPTL). The new logic is well-suited for specifying SCK containing both spatial and temporal constraints which are useful in AI planning. We show that PPTL^SL is decidable and present a decision procedure. With this basis, a planner namely S-TSolver for computing plans based on the spatio-temporal SCK expressed in PPTL^SL formulas is developed. Evaluation on some selected benchmark domains shows the effectiveness of S-TSolver.

• #4107
Bounded Timed Propositional Temporal Logic with Past Captures Timeline-based Planning with Bounded Constraints
Dario Della Monica, Nicola Gigante, Angelo Montanari, Pietro Sala, Guido Sciavicco
Geometric, Spatial, and Temporal Reasoning

Within the timeline-based framework, planning problems are modeled as sets of independent, but interacting, components whose behavior over time is described by a set of temporal constraints. Timeline-based planning is being used successfully in a number of complex tasks, but its theoretical properties are not so well studied. In particular, while it is known that Linear Temporal Logic (LTL) can capture classical action-based planning, a similar logical characterization was not available for timeline-based planning formalisms. This paper shows that timeline-based planning with bounded temporal constraints can be captured by a bounded version of Timed Propositional Temporal Logic, augmented with past operators, which is an extension of LTL originally designed for the verification of real-time systems. As a byproduct, we get that the proposed logic is expressive enough to capture temporal action-based planning problems.

### Wednesday 23 08:30 - 10:00 MAS-EP1 - Economic Paradigms 1 (219)

Chair: Bo An
• #1271
Why You Should Charge Your Friends for Borrowing Your Stuff
Kijung Shin, Euiwoong Lee, Dhivya Eswaran, Ariel D. Procaccia
Economic Paradigms 1

We consider goods that can be shared with k-hop neighbors (i.e., the set of nodes within k hops from an owner) on a social network. We examine incentives to buy such a good by devising game-theoretic models where each node decides whether to buy the good or free ride. First, we find that social inefficiency, specifically excessive purchase of the good, occurs in Nash equilibria. Second, the social inefficiency decreases as k increases and thus a good can be shared with more nodes. Third, and most importantly, the social inefficiency can also be significantly reduced by charging free riders an access cost and paying it to owners, leading to the conclusion that organizations and system designers should impose such a cost. These findings are supported by our theoretical analysis in terms of the price of anarchy and the price of stability; and by simulations based on synthetic and real social networks.
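The notion of a k-hop neighborhood can be made concrete with a breadth-first search. The following is a minimal sketch, assuming an adjacency-list graph and a single owner; the function name `k_hop_neighbors` and the toy path network are hypothetical and do not reproduce the paper's game-theoretic model.

```python
from collections import deque

def k_hop_neighbors(adj, owner, k):
    """Return the nodes within k hops of `owner` (the owner itself included)."""
    dist = {owner: 0}
    queue = deque([owner])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# Hypothetical path network 0 - 1 - 2 - 3 - 4 with a single owner at node 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
served_k1 = k_hop_neighbors(adj, owner=2, k=1)
served_k2 = k_hop_neighbors(adj, owner=2, k=2)
print(sorted(served_k1), sorted(served_k2))
```

On this toy network a single purchase serves three nodes for k = 1 and all five for k = 2, which illustrates why a larger k lets one good be shared with more nodes.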

• #2063
Representativeness-aware Aspect Analysis for Brand Monitoring in Social Media
Lizi Liao, Xiangnan He, Zhaochun Ren, Liqiang Nie, Huan Xu, Tat-Seng Chua
Economic Paradigms 1

Owing to the fast-responding nature and extreme success of social media, many companies resort to social media sites for monitoring their brands’ reputation and the opinions of the general public. To help companies monitor their brands, in this work, we delve into the task of extracting representative aspects and posts from users’ free-text posts in social media. Previous efforts have treated it as a traditional information extraction task and have neglected the specific properties of social media, such as the possible noise in user-generated posts and the varying impacts. In contrast, we extract aspects by maximizing their representativeness, a new notion that accounts for both the coverage of aspects and the impact of posts. We formalize it as a submodular optimization problem, and develop a FastPAS algorithm to jointly select representative posts and aspects. The FastPAS algorithm optimizes parameters in a greedy way, which is highly efficient and can reach a good solution with theoretical guarantees. We perform extensive experiments on two datasets, showing that our method outperforms the state-of-the-art aspect extraction and summarization methods in identifying representative aspects.
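For readers unfamiliar with greedy submodular optimization, the sketch below shows the classical greedy rule for a monotone submodular objective under a cardinality constraint, which is known to achieve a (1 - 1/e) approximation. It is not the FastPAS algorithm: the coverage objective, the function names, and the toy posts are assumptions made purely for illustration.

```python
def coverage(selected, post_aspects):
    """Number of distinct aspects covered by the selected posts."""
    covered = set()
    for post in selected:
        covered |= post_aspects[post]
    return len(covered)

def greedy_select(post_aspects, budget):
    """Pick up to `budget` posts, each time adding the largest marginal gain."""
    selected = []
    remaining = set(post_aspects)
    while remaining and len(selected) < budget:
        base = coverage(selected, post_aspects)
        # sorted() makes tie-breaking deterministic (lexicographically first).
        best = max(sorted(remaining),
                   key=lambda p: coverage(selected + [p], post_aspects) - base)
        if coverage(selected + [best], post_aspects) == base:
            break  # no remaining post covers anything new
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical posts and the aspects each one mentions.
posts = {
    "p1": {"battery", "screen"},
    "p2": {"screen"},
    "p3": {"price", "camera", "battery"},
    "p4": {"camera"},
}
chosen = greedy_select(posts, budget=2)
print(chosen, coverage(chosen, posts))
```

The first pick is the post with the widest coverage; the second is the post that adds the most aspects not already covered, which is the diminishing-returns behaviour that submodularity formalizes.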

• #2341
Contest Design with Uncertain Performance and Costly Participation
Priel Levy, David Sarne, Igor Rochlin
Economic Paradigms 1

This paper studies the problem of designing contests for settings where a principal seeks to optimize the quality of the best performance obtained, and potential contestants only strategize about whether to participate in the contest, as participation incurs some cost. This type of contest can be mapped to various real-life settings (e.g., an audition, a beauty pageant, technology crowdsourcing). The paper provides a comparative game-theoretic solution to two variants of the above underlying model, the parallel and the sequential contest, enabling a characterization of the equilibrium strategies in each. Special emphasis is placed on the case where the contestants are homogeneous, which is often the case in real life whenever the contestants are basically alike and their ranking in the contest is mostly influenced by some probabilistic factors (e.g., luck). Here, several (somewhat counter-intuitive) properties of the equilibrium are proved, in particular for the sequential contest, leading to a comprehensive characterization of the principal’s preference between the two.

• #2378
Pessimistic Leader-Follower Equilibria with Multiple Followers
Stefano Coniglio, Nicola Gatti, Alberto Marchesi
Economic Paradigms 1

The problem of computing the strategy to commit to has been widely investigated in the scientific literature for the case where a single follower is present. In the multi-follower setting, though, results are only sporadic. In this paper, we address the multi-follower case for normal-form games, assuming that, after observing the leader’s commitment, the followers play pure strategies and reach a Nash equilibrium. We focus on the pessimistic case where, among many equilibria, one minimizing the leader’s utility is chosen (the opposite case is computationally trivial). We show that the problem is NP-hard even with only two followers, and propose an exact exponential-time algorithm which, for any number of followers, either finds an equilibrium when the game admits a finite one or, if not, finds an α-approximation of the supremum of the leader’s utility, for any α > 0.

• #3174
Bounding the Inefficiency of Compromise
Ioannis Caragiannis, Panagiotis Kanellopoulos, Alexandros A. Voudouris
Economic Paradigms 1

Social networks on the Internet have seen an enormous growth recently and play a crucial role in different aspects of today's life. They have facilitated information dissemination in ways that have been beneficial for their users but it is also a common belief that they are often used strategically in order to spread information that only serves the objectives of particular users. These properties have inspired a revision of classical opinion formation models from sociology using game-theoretic notions and tools. We follow the same modeling approach, focusing on scenarios where the opinion expressed by each user is a compromise between her internal belief and the opinions of a small number of neighbors among her social acquaintances. We formulate simple games that capture this behavior and quantify the inefficiency of equilibria using the well-known notion of the price of anarchy. Our results indicate that compromise comes at a cost that strongly depends on the neighborhood size.

• #3402
Computing Bayes-Nash Equilibria in Combinatorial Auctions with Continuous Value and Action Spaces
Vitor Bosshard, Benedikt Bünz, Benjamin Lubin, Sven Seuken
Economic Paradigms 1

Combinatorial auctions (CAs) are widely used in practice, which is why understanding their incentive properties is an important problem. However, finding Bayes-Nash equilibria (BNEs) of CAs analytically is tedious, and prior algorithmic work has only considered limited solution concepts (e.g. restricted action spaces). In this paper, we present a fast, general algorithm for computing symmetric pure ε-BNEs in CAs with continuous values and actions. In contrast to prior work, we separate the search phase (for finding the BNE) from the verification step (for estimating the ε), and always consider the full (continuous) action space in the best response computation. We evaluate our method in the well-studied LLG domain, against a benchmark of 16 CAs for which analytical BNEs are known. In all cases, our algorithm converges quickly, matching the known results with high precision. Furthermore, for CAs with quasi-linear utility functions and independently distributed valuations, we derive a theoretical bound on ε. Finally, we introduce the new Multi-Minded LLLLGG domain with eight goods and six bidders, and apply our algorithm to finding an equilibrium in this domain. Our algorithm is the first to find an accurate BNE in a CA of this size.

### Wednesday 23 08:30 - 10:00 PL-SPS - Search in Planning and Scheduling (220)

Chair: Gabriele Röger
• #3790
Landmarks for Numeric Planning Problems
Enrico Scala, Patrik Haslum, Daniele Magazzeni, Sylvie Thiébaux
Search in Planning and Scheduling

The paper generalises the notion of landmarks for reasoning about planning problems involving propositional and numeric variables. Intuitively, numeric landmarks are regions in the metric space defined by the problem whose crossing is necessary for its resolution. The paper proposes a relaxation-based method for their automated extraction directly from the problem structure, and shows how to exploit them to infer what we call disjunctive and additive hybrid action landmarks. The justification of such a disjunctive representation results from the intertwined propositional and numeric structure of the problem. The paper exercises their use in two novel admissible LP-Based numeric heuristics, and reports experiments on cost-optimal numeric planning problems. Results show the heuristics are more informed and effective than previous work for problems involving a higher number of (sub)goals.

• #3849
Faster Conflict Generation for Dynamic Controllability
Nikhil Bhargava, Tiago Vaquero, Brian Williams
Search in Planning and Scheduling

In this paper, we focus on speeding up the temporal plan relaxation problem for dynamically controllable systems. We take a look at the current best-known algorithm for determining dynamic controllability and augment it to efficiently generate conflicts when the network is deemed uncontrollable. Our work preserves the O(n^3) runtime of the best available dynamic controllability checker and improves on the previous best runtime of O(n^4) for extracting dynamic controllability conflicts. We then turn our attention to temporal plan relaxation tasks and show how we can leverage our work on conflicts and the structure of the network to efficiently make incremental updates intended to restore dynamic controllability by relaxing constraints. Our new algorithm, RelaxIDC, has the same asymptotic runtime as previous algorithms but sees dramatic empirical improvements over the course of repeated dynamic controllability checks.

• #3883
Numeric Planning via Abstraction and Policy Guided Search
León Illanes, Sheila A. McIlraith
Search in Planning and Scheduling

The real-world application of planning techniques often requires models with numeric fluents. However, these fluents are not directly supported by most planners and heuristics. We describe a family of planning algorithms that takes a numeric planning problem and produces an abstracted representation that can be solved using any classical planner. The resulting abstract plan is generalized into a policy and then used to guide the search in the original numeric domain. We prove that our approach is sound, and we evaluate it on a set of standard benchmarks. We show that it can provide competitive performance when compared to other well-known algorithms for numeric planning, and a significant performance improvement in certain domains.

• #1235
Lossy Compression of Pattern Databases Using Acyclic Random Hypergraphs
Mehdi Sadeqi, Howard J. Hamilton
Search in Planning and Scheduling

A domain-independent heuristic function created by an abstraction is usually implemented using a Pattern Database (PDB), which is a lookup table of (abstract state, heuristic value) pairs. PDBs containing high quality heuristic values generally require substantial memory space and therefore need to be compressed. In this paper, we introduce Acyclic Random Hypergraph Compression (ARHC), a domain-independent approach to compressing PDBs using acyclic random r-partite r-uniform hypergraphs. The ARHC algorithm, which comes in Base and Extended versions, provides fast lookup and a high compression rate. ARHC-Extended achieves higher quality heuristics than ARHC-Base by decreasing the heuristic information loss at the cost of some decrease in the compression rate. ARHC shows higher performance than level-by-level Bloom filter PDB compression in all experiments conducted so far.

• #2416
A Scalable Approach to Chasing Multiple Moving Targets with Multiple Agents
Fan Xie, Adi Botea, Akihiro Kishimoto
Search in Planning and Scheduling

Chasing multiple mobile targets with multiple agents is important in several applications, such as computer games and police chasing scenarios. Existing approaches can compute optimal policies. However, they have limited scalability, as they implement expensive minimax searches. We introduce a sub-optimal but scalable approach that assigns individual agents to individual targets and that can dynamically re-compute such assignments. We provide a theoretical analysis, including upper bounds on the number of time steps required to solve an instance. In a detailed empirical evaluation on grid maps, our algorithm scales up very convincingly beyond the limits of previous methods. On small problems, where a comparison to a minimax approach is possible, the results demonstrate a good solution quality for our method.

• #3047
Efficient Optimal Search under Expensive Edge Cost Computation
Masataro Asai, Akihiro Kishimoto, Adi Botea, Radu Marinescu, Elizabeth M. Daly, Spyros Kotoulas
Search in Planning and Scheduling

Optimal heuristic search has been successful in many domains, including journey planning, route planning and puzzle solving. Existing work typically assumes that the cost of each action can easily be obtained. However, in many problems, the exact edge cost is expensive to compute. Existing search algorithms face a significant performance bottleneck, due to an excessive overhead associated with dynamically calculating exact edge costs. We present DEA*, an algorithm for problems with expensive edge cost computations. DEA* combines heuristic edge cost evaluations with delayed node expansions, reducing the number of exact edge computations. We formally prove that DEA* is optimal and it is efficient with respect to the number of exact edge cost computations. We empirically evaluate DEA* on multiple-worker routing problems where the exact edge cost is calculated by invoking an external multi-modal journey planning engine. The results demonstrate the effectiveness of our ideas in reducing the computational time and improving the solving ability. In addition, we show the advantages of DEA* in domain-independent planning, where we simulate that accurate edge costs are expensive to compute.

### Wednesday 23 10:30 - 12:00 ML-CL4 - Classification 4 (204)

Chair: Jennifer Neville
• #1319
Adaptive Manifold Regularized Matrix Factorization for Data Clustering
Lefei Zhang, Qian Zhang, Bo Du, Jane You, Dacheng Tao
Classification 4

Data clustering is the task of grouping data samples into clusters based on the relationships among samples and the structures hidden in the data, and it is a fundamental and important topic in data mining and machine learning. In the literature, spectral clustering is one of the most popular approaches and has gained many variants in recent years. However, the performance of spectral clustering is determined by the affinity matrix, which is usually computed by a predefined model (e.g., Gaussian kernel function) with a carefully tuned combination of parameters, and may be far from optimal in practice. In this paper, we propose to view the clustering of observed data from a robust matrix factorization perspective, and simultaneously learn an affinity matrix to regularize the proposed matrix factorization. The solution of the proposed adaptive manifold regularized matrix factorization (AMRMF) is reached by a novel Augmented Lagrangian Multiplier (ALM) based algorithm. The experimental results on standard clustering datasets demonstrate superior performance over the existing alternatives.

• #1562
Efficient Kernel Selection via Spectral Analysis
Jian Li, Yong Liu, Hailun Lin, Yinliang Yue, Weiping Wang
Classification 4

Kernel selection is a fundamental problem of kernel methods. Existing measures for kernel selection either offer little theoretical guarantee or have high computational complexity. In this paper, we propose a novel kernel selection criterion based on a newly defined spectral measure of a kernel matrix, with a sound theoretical foundation and high computational efficiency. We first show that the spectral measure can be used to derive generalization bounds for some kernel-based algorithms. By minimizing the derived generalization bounds, we propose the kernel selection criterion with spectral measure. Moreover, we demonstrate that the popular minimum graph cut and maximum mean discrepancy are two special cases of the proposed criterion. Experimental results on many data sets show that our proposed criterion not only gives results comparable to the state-of-the-art criterion, but also significantly improves efficiency.

• #1733
Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks
Yasutoshi Ida, Yasuhiro Fujiwara, Sotetsu Iwamura
Classification 4

Adaptive learning rate algorithms such as RMSProp are widely used for training deep neural networks. RMSProp offers efficient training since it uses first order gradients to approximate Hessian-based preconditioning. However, since the first order gradients include noise caused by stochastic optimization, the approximation may be inaccurate. In this paper, we propose a novel adaptive learning rate algorithm called SDProp. Its key idea is effective handling of the noise by preconditioning based on the covariance matrix. For various neural networks, our approach is more efficient and effective than RMSProp and its variant.
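To make the contrast concrete, the sketch below places the standard RMSProp update next to an illustrative variant that scales the step by a running estimate of the gradient's variance instead of its uncentered second moment. The second function is an assumption for illustration only and is not claimed to be the SDProp update rule; the hyperparameter values are likewise arbitrary. Both functions act on a single scalar parameter and would be applied per coordinate in practice.

```python
import math

def rmsprop_step(theta, grad, v, lr=0.01, rho=0.9, eps=1e-8):
    """Standard RMSProp: divide by the root of a running mean of grad**2."""
    v = rho * v + (1 - rho) * grad * grad
    theta = theta - lr * grad / (math.sqrt(v) + eps)
    return theta, v

def centered_step(theta, grad, m, c, lr=0.01, rho=0.9, eps=1e-8):
    """Illustrative variant (assumption): divide by the root of a running
    estimate of the gradient variance, i.e. the diagonal of its covariance."""
    m = rho * m + (1 - rho) * grad             # running mean of the gradient
    c = rho * c + (1 - rho) * (grad - m) ** 2  # running centered second moment
    theta = theta - lr * grad / (math.sqrt(c) + eps)
    return theta, m, c

# One step on f(theta) = theta**2 at theta = 1.0, where the gradient is 2.0.
theta_a, v = rmsprop_step(1.0, 2.0, v=0.0)
theta_b, m, c = centered_step(1.0, 2.0, m=0.0, c=0.0)
print(theta_a, theta_b)
```

Note that when gradients are nearly constant the centered estimate `c` becomes very small, so a practical implementation of this variant would need a larger `eps` or some other safeguard against oversized steps.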

• #2883
Robust Softmax Regression for Multi-class Classification with Self-Paced Learning
Yazhou Ren, Peng Zhao, Yongpan Sheng, Dezhong Yao, Zenglin Xu
Classification 4

Softmax regression, a generalization of logistic regression (LR) to the setting of multi-class classification, has been widely used in many machine learning applications. However, the performance of softmax regression is extremely sensitive to the presence of noisy data and outliers. To address this issue, we propose a model of robust softmax regression (RoSR) derived from the self-paced learning (SPL) paradigm for multi-class classification. Concretely, RoSR equipped with the soft weighting scheme is able to evaluate the importance of each data instance. Then, data instances participate in the classification problem according to their weights. In this way, the influence of noisy data and outliers (which typically have small weights) can be significantly reduced. However, standard SPL may suffer from the imbalanced class influence problem, where some classes may have little influence in the training process if their instances are not sensitive to the loss. To alleviate this problem, we design two novel soft weighting schemes that assign weights and select instances locally for each class. Experimental results demonstrate the effectiveness of the proposed methods.
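The basic self-paced idea can be sketched in a few lines: compute each instance's softmax cross-entropy loss, then down-weight instances whose loss is high. The code below uses the classic hard (0/1) weighting with a single global threshold, which is the simplest member of the SPL family and not the class-local soft weighting schemes proposed in the paper; the scores, labels, and threshold `lam` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(scores)[label])

def hard_spl_weights(losses, lam):
    """Classic hard self-paced weighting: keep instances with loss below lam."""
    return [1.0 if loss < lam else 0.0 for loss in losses]

# Hypothetical class scores for four instances over three classes.
scores = [[2.0, 0.1, 0.1], [0.2, 1.5, 0.1], [0.1, 0.2, 3.0], [3.0, 0.1, 0.1]]
labels = [0, 1, 2, 1]  # the last label disagrees with the scores (likely noise)
losses = [cross_entropy(s, y) for s, y in zip(scores, labels)]
weights = hard_spl_weights(losses, lam=1.0)
print(weights)
```

The suspicious fourth instance incurs a large loss and receives weight zero, so it would not contribute to the next round of parameter updates. With a single global threshold, a class whose instances all have high loss could be excluded entirely, which is the imbalance the abstract refers to.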

• #3955
Recommendation vs Sentiment Analysis: A Text-Driven Latent Factor Model for Rating Prediction with Cold-Start Awareness
Kaisong Song, Wei Gao, Shi Feng, Daling Wang, Kam-Fai Wong, Chengqi Zhang
Classification 4

Review rating prediction is an important research topic. The problem has been approached from the perspective of either recommender systems (RS) or sentiment analysis (SA). Recent SA research using deep neural networks (DNNs) has realized the importance of user and product interaction for better interpreting the sentiment of reviews. However, the complexity of DNN models in terms of the scale of parameters is very high, and the performance is not always satisfactory, especially when user-product interaction is sparse. In this paper, we propose a simple, extensible RS-based model, called Text-driven Latent Factor Model (TLFM), to capture the semantics of reviews, user preferences and product characteristics by jointly optimizing two components, a user-specific LFM and a product-specific LFM, each of which decomposes text into a specific low-dimension representation. Furthermore, we address the cold-start issue by developing a novel Pairwise Rating Comparison strategy (PRC), which utilizes the difference between ratings on common user/product as supplementary information to calibrate parameter estimation. Experiments conducted on IMDB and Yelp datasets validate the advantage of our approach over state-of-the-art baseline methods.

• #4014
Regional Concept Drift Detection and Density Synchronized Drift Adaptation
Anjin Liu, Yiliao Song, Guangquan Zhang, Jie Lu
Classification 4

In data stream mining, the emergence of new patterns or a pattern ceasing to exist is called concept drift. Concept drift makes the learning process complicated because of the inconsistency between existing data and upcoming data. Since concept drift was first proposed, numerous articles have been published to address this issue in terms of distribution analysis. However, most distribution-based drift detection methods assume that a drift happens at an exact time point, and the data that arrived before that time point is considered unimportant. Thus, if a drift only occurs in a small region of the entire feature space, the other non-drifted regions may also be suspended, thereby reducing the learning efficiency of models. To retrieve non-drifted information from suspended historical data, we propose a local drift degree (LDD) measurement that can continuously monitor regional density changes. Instead of suspending all historical data after a drift, we synchronize the regional density discrepancies according to LDD. Experimental evaluations on three public data sets show that our concept drift adaptation algorithm improves accuracy compared to other methods.

### Wednesday 23, 10:30 - 12:00, ML-DLNLP - Deep Learning and NLP (210)

Chair: Yuhong Guo
• #1442
Multimodal Storytelling via Generative Adversarial Imitation Learning
Zhiqian Chen, Xuchao Zhang, Arnold P. Boedihardjo, Jing Dai, Chang-Tien Lu
Deep Learning and NLP

Deriving event storylines is an effective summarization method to succinctly organize extensive information, which can significantly alleviate the pain of information overload. The critical challenge is the lack of a widely recognized definition of a storyline metric. Prior studies have developed various approaches based on different assumptions about users' interests. These works can extract interesting patterns, but their assumptions do not guarantee that the derived patterns will match users' preferences. Moreover, their reliance on a single-modality source misses cross-modality information. This paper proposes a method, multimodal imitation learning via Generative Adversarial Networks (MIL-GAN), to directly model users' interests as reflected by various data. In particular, the proposed model addresses the critical challenge by imitating users' demonstrated storylines. Our proposed model is designed to learn the reward patterns given user-provided storylines and then applies the learned policy to unseen data. A user study demonstrates that the proposed approach is capable of acquiring the user's implicit intent and outperforms competing methods by a substantial margin.

• #1258
Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification
Jin Wang, Zhongyuan Wang, Dawei Zhang, Jun Yan
Deep Learning and NLP

Text classification is a fundamental task in NLP applications. Most existing work relies on either explicit or implicit text representation to address this problem. While these techniques work well for sentences, they cannot easily be applied to short text because of its shortness and sparsity. In this paper, we propose a framework based on convolutional neural networks that combines explicit and implicit representations of short text for classification. We first conceptualize a short text as a set of relevant concepts using a large taxonomy knowledge base. We then obtain the embedding of short text by coalescing the words and relevant concepts on top of pre-trained word vectors. We further incorporate character-level features into our model to capture fine-grained subword information. Experimental results on five commonly used datasets show that our proposed method significantly outperforms state-of-the-art methods.

• #3036
Adaptive Semantic Compositionality for Sentence Modelling
Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Deep Learning and NLP

Representing a sentence with a fixed vector has shown its effectiveness in various NLP tasks. Most of the existing methods are based on neural networks, which recursively apply different composition functions to a sequence of word vectors, thereby obtaining a sentence vector. A hypothesis behind these approaches is that the meaning of any phrase can be composed of the meanings of its constituents. However, many phrases, such as idioms, are apparently non-compositional. To address this problem, we introduce a parameterized compositional switch, which outputs a scalar to adaptively determine whether the meaning of a phrase should be composed of its two constituents. We evaluate our model on five datasets of sentiment classification and demonstrate its efficacy with qualitative and quantitative experimental analysis.

• #3898
Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models
Nan Jiang, Wenge Rong, Min Gao, Yikang Shen, Zhang Xiong
Deep Learning and NLP

Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to neural language modeling and neural machine translation. These neural models can leverage knowledge from massive corpora, but they are extremely slow as they predict candidate words from a large vocabulary during training and inference. As an alternative to gradient approximation and softmax with class decomposition, we explore the tree-based hierarchical softmax method and reform its architecture, making it compatible with modern GPUs and introducing a compact tree-based loss function. When combined with several word hierarchical clustering algorithms, improved performance is achieved in the language modelling task with intrinsic evaluation criteria on the PTB, WikiText-2 and WikiText-103 datasets.
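
To make the idea of a tree-based softmax concrete, here is a minimal sketch of the generic binary hierarchical softmax, not the reformed architecture or loss proposed in the paper. The four-word vocabulary, the tree layout in `PATHS`, and the values in `node_scores` are all made up for illustration. In a real language model each score would come from the hidden state and a per-node parameter vector.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical 4-word vocabulary arranged as a complete binary tree.
# Each word is reached by a path of (inner_node_index, go_left) decisions.
PATHS = {
    "the": [(0, True), (1, True)],
    "cat": [(0, True), (1, False)],
    "sat": [(0, False), (2, True)],
    "mat": [(0, False), (2, False)],
}

def word_probability(word, node_scores):
    """P(word) is the product of the branch probabilities along its path."""
    p = 1.0
    for node, go_left in PATHS[word]:
        p_left = sigmoid(node_scores[node])
        p *= p_left if go_left else 1.0 - p_left
    return p

# Made-up scores for the three inner nodes (root, left child, right child).
node_scores = [0.3, -1.2, 2.0]
probs = {w: word_probability(w, node_scores) for w in PATHS}
```

Each word costs one sigmoid per tree level rather than a normalisation over the whole vocabulary, which is where the speed-up of tree-based methods comes from. The probabilities still sum to one because every inner node splits its mass between two children.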

• #3768
Deep Ordinal Regression Based on Data Relationship for Small Datasets
Yanzhu Liu, Adams Wai Kin Kong, Chi Keong Goh
Deep Learning and NLP

Ordinal regression aims to classify instances into ordinal categories. As with other supervised learning problems, learning an effective deep ordinal model from a small dataset is challenging. This paper proposes a new approach which transforms the ordinal regression problem to binary classification problems and uses triplets with instances from different categories to train deep neural networks such that high-level features describing their ordinal relationship can be extracted automatically. In the testing phase, triplets are formed by a testing instance and other instances with known ranks. A decoder is designed to estimate the rank of the testing instance based on the outputs of the network. Because of the data augmentation by permutation, deep learning can work for ordinal regression even on small datasets. Experimental results on the historical color image benchmark and MSRA image search datasets demonstrate that the proposed algorithm outperforms the traditional deep learning approach and is comparable with other state-of-the-art methods, which rely heavily on prior knowledge to design effective features.

• #3933
Random Shifting for CNN: a Solution to Reduce Information Loss in Down-Sampling Layers
Gangming Zhao, Jingdong Wang, Zhaoxiang Zhang
Deep Learning and NLP

Down-sampling is widely adopted in deep convolutional neural networks (DCNN) for reducing the number of network parameters while preserving the transformation invariance. However, it cannot utilize information effectively because it only adopts a fixed stride strategy, which may result in poor generalization ability and information loss. In this paper, we propose a novel random strategy to alleviate these problems by embedding random shifting in the down-sampling layers during the training process. Random shifting can be universally applied to diverse DCNN models to dynamically adjust receptive fields by shifting kernel centers on feature maps in different directions. Thus, it can generate more robust features in networks and further enhance the transformation invariance of down-sampling operators. In addition, random shifting can not only be integrated in all down-sampling layers, including strided convolutional layers and pooling layers, but can also improve the performance of DCNN with negligible additional computational cost. We evaluate our method in different tasks (e.g., image classification and segmentation) with various network architectures (i.e., AlexNet, FCN and DFN-MR). Experimental results demonstrate the effectiveness of our proposed method.

### Wednesday 23, 10:30 - 12:00, ML-DMFS - Data Mining and Feature Selection (211)

Chair: Qiang Yang
• #1274
Top-k Supervise Feature Selection via ADMM for Integer Programming
Mingyu Fan, Xiaojun Chang, Xiaoqin Zhang, Di Wang, Liang Du
Data Mining and Feature Selection

Recently, structured sparsity inducing based feature selection has become a hot topic in machine learning and pattern recognition. Most of the sparsity inducing feature selection methods are designed to rank all features by a certain criterion and then select the k top ranked features, where k is an integer. However, the k top features are usually not the top k features, so the result may be suboptimal. In this paper, we propose a novel supervised feature selection method to directly identify the top k features. The new method is formulated as a classic regularized least squares regression model with two groups of variables. The problem with respect to one group of the variables turns out to be a 0-1 integer programming problem, which had been considered very hard to solve. To address this, we utilize an efficient optimization method to solve the integer programming problem, which first replaces the discrete 0-1 constraints with two continuous constraints and then utilizes the alternating direction method of multipliers to optimize the equivalent problem. The obtained result is the top subset with k features under the proposed criterion rather than the subset of k top features. Experiments have been conducted on benchmark data sets to show the effectiveness of the proposed method.
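
The distinction between "the k top features" and "the top k features" can be seen on a toy problem. The sketch below is an illustration only: it uses exhaustive search over pairs and a plain least-squares residual, not the paper's regularized model or its ADMM solver. The data in `features` and `y`, and the helper names, are hypothetical.

```python
import math
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def pair_rss(x1, x2, y):
    """Residual sum of squares of the least-squares fit y ~ a*x1 + b*x2 (no intercept)."""
    g11, g12, g22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    b1, b2 = dot(x1, y), dot(x2, y)
    det = g11 * g22 - g12 * g12  # nonzero as long as x1 and x2 are not collinear
    a = (b1 * g22 - b2 * g12) / det
    b = (g11 * b2 - g12 * b1) / det
    return sum((yi - a * p - b * q) ** 2 for p, q, yi in zip(x1, x2, y))

# Toy data: feature 1 is a near-duplicate of feature 0; feature 2 is complementary.
features = [
    [2.0, 2.0, 0.0, 0.0],
    [2.0, 2.0, 0.1, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
y = [2.0, 2.0, 1.0, 1.0]  # equals feature 0 + feature 2

# "k top features": rank features one by one, keep the two best.
ranked = sorted(range(3), key=lambda j: cosine(features[j], y), reverse=True)
k_top_features = sorted(ranked[:2])

# "top k features": pick the best-scoring subset of size two.
top_k_subset = min(
    combinations(range(3), 2),
    key=lambda s: pair_rss(features[s[0]], features[s[1]], y),
)
```

Individually, the two near-duplicate features score highest, so ranking keeps both even though the second adds almost nothing. The subset search instead pairs feature 0 with the weaker but complementary feature 2, which fits `y` exactly.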

• #2165
Symmetric Non-negative Latent Factor Models for Undirected Large Networks
Xin Luo, Ming-Sheng Shang
Data Mining and Feature Selection

Undirected, high dimensional and sparse networks are frequently encountered in industrial applications. They contain rich knowledge regarding various useful patterns. Non-negative latent factor (NLF) models have proven to be effective and efficient in acquiring useful knowledge from asymmetric networks. However, they cannot correctly describe the symmetry of an undirected network. To address this issue, this work analyzes the NLF extraction processes on asymmetric and symmetric matrices respectively, and thereby derives symmetric and non-negative latent factor (SNLF) models for undirected, high dimensional and sparse networks. The proposed SNLF models are equipped with a) high efficiency, b) non-negativity, and c) symmetry. Experimental results on real networks show that they are able to a) represent the symmetry of the target network rigorously; b) maintain the non-negativity of resulting latent factors; and c) achieve high computational efficiency when performing data analysis tasks such as missing data estimation.

• #3144
SitNet: Discrete Similarity Transfer Network for Zero-shot Hashing
Yuchen Guo, Guiguang Ding, Jungong Han, Yue Gao
Data Mining and Feature Selection

Hashing has been widely utilized for fast image retrieval recently. With semantic information as supervision, hashing approaches perform much better, especially when combined with deep convolutional neural networks (CNNs). However, in practice, new concepts emerge every day, making collecting supervised information for re-training the hashing model infeasible. In this paper, we propose a novel zero-shot hashing approach, called Discrete Similarity Transfer Network (SitNet), to preserve the semantic similarity between images from both "seen" concepts and new "unseen" concepts. Motivated by zero-shot learning, the semantic vectors of concepts are adopted to capture the similarity structures among classes, so that the model trained with seen concepts generalizes well to unseen ones, benefiting from the transferability of the semantic vector space. We adopt a multi-task architecture to exploit the supervised information for seen concepts and the semantic vectors simultaneously. Moreover, a discrete hashing layer is integrated into the network for hash code generation to avoid the information loss caused by real-value relaxation in the training phase, which is a critical problem in existing works. Experiments on three benchmarks validate the superiority of SitNet over the state of the art.

• #3368
Handling Noise in Boolean Matrix Factorization
Radim Belohlavek, Martin Trnecka
Data Mining and Feature Selection

We critically examine and point out weaknesses of the existing considerations in Boolean matrix factorization (BMF) regarding noise and the algorithms' ability to deal with noise. We argue that the current understanding is underdeveloped and that the current approaches are missing an important aspect. We provide a new, quantitative way to assess the ability of an algorithm to handle noise. Our approach is based on a common-sense definition of robustness requiring that the computed factorizations should not be affected much by varying the noise in data. We present an experimental evaluation of several existing algorithms and compare the results to the observations available in the literature. In addition to providing justification of some properties claimed in the literature without proper justification, our experiments reveal properties which were not reported as well as properties which counter certain claims made in the literature. Importantly, our approach reveals a line separating robust-to-noise from sensitive-to-noise algorithms, which has not been revealed by the previous approaches.

• #3401
Single-Pass PCA of Large High-Dimensional Data
Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li
Data Mining and Feature Selection

Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the top singular vectors of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or the data generated in a streaming fashion. Experiments with synthetic and real data validate the algorithm's accuracy, which has orders of magnitude smaller error than an existing single-pass algorithm. For a set of high-dimensional data stored as a 150 GB file, the algorithm is able to compute the first 50 principal components in just 24 minutes on a typical 24-core computer, with less than 1 GB memory cost.

• #3770
Learning Homophily Couplings from Non-IID Data for Joint Feature Selection and Noise-Resilient Outlier Detection
Guansong Pang, Longbing Cao, Ling Chen, Huan Liu
Data Mining and Feature Selection

This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent - they bond together) in constructing a noise-resilient outlier scoring function to produce a reliable outlier ranking in each iteration. We show that HOUR (i) retains a 2-approximation outlier ranking to the optimal one; and (ii) significantly outperforms five state-of-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.

### Wednesday 23, 10:30 - 12:00, ML-SSL2 - Semi-Supervised Learning 2 (212)

Chair: Ramon López de Màntaras
• #3684
Scaling Active Search using Linear Similarity Functions
Sibi Venkatesan, James K. Miller, Jeff Schneider, Artur Dubrawski
Semi-Supervised Learning 2

Active Search has become an increasingly useful tool in information retrieval problems where the goal is to discover as many target elements as possible using only limited label queries. With the advent of big data, there is a growing emphasis on the scalability of such techniques to handle very large and very complex datasets. In this paper, we consider the problem of Active Search where we are given a similarity function between data points. We look at an algorithm introduced by Wang et al. [Wang et al., 2013] known as Active Search on Graphs and propose crucial modifications which allow it to scale significantly. Their approach selects points by minimizing an energy function over the graph induced by the similarity function on the data. Our modifications require the similarity function to be a dot-product between feature vectors of data points, equivalent to having a linear kernel for the adjacency matrix. With this, we are able to scale tremendously: for n data points, the original algorithm runs in O(n^2) time per iteration while ours runs in only O(nr + r^2) given r-dimensional features. We also describe a simple alternate approach using a weighted-neighbor predictor which also scales well. In our experiments, we show that our method is competitive with existing semi-supervised approaches. We also briefly discuss conditions under which our algorithm performs well.
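
One reason a dot-product similarity helps with scaling is that the n-by-n similarity matrix never has to be formed. The sketch below shows only that single ingredient, not the authors' full algorithm or its O(nr + r^2) update. The feature matrix `X`, the vector `v`, and the function names are made up for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dense_similarity_times(X, v):
    """Materialise A = X X^T (n x n), then multiply: O(n^2) time and memory per product."""
    A = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
    return matvec(A, v)

def factored_similarity_times(X, v):
    """Compute X (X^T v) without forming A: O(n r) time per product."""
    return matvec(X, matvec(transpose(X), v))

# n = 4 data points with r = 2 features each (arbitrary integers).
X = [[1, 2], [0, 1], [3, -1], [2, 2]]
v = [1, -1, 2, 0]

dense = dense_similarity_times(X, v)
factored = factored_similarity_times(X, v)
```

Both functions return the same vector, but the factored version touches only the n-by-r feature matrix, so its cost grows linearly in n when r is fixed.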

• #1943
Projection Free Rank-Drop Steps
Edward Cheung, Yuying Li
Semi-Supervised Learning 2

The Frank-Wolfe (FW) algorithm has been widely used in solving nuclear norm constrained problems, since it does not require projections. However, FW often yields high rank intermediate iterates, which can be very expensive in time and space costs for large problems. To address this issue, we propose a rank-drop method for nuclear norm constrained problems. The goal is to generate descent steps that lead to rank decreases, maintaining low-rank solutions throughout the algorithm. Moreover, the optimization problems are constrained to ensure that the rank-drop step is also feasible and can be readily incorporated into a projection-free minimization method, e.g., Frank-Wolfe. We demonstrate that by incorporating rank-drop steps into the Frank-Wolfe algorithm, the rank of the solution is greatly reduced compared to the original Frank-Wolfe or its common variants.

• #2114
Semi-Supervised Deep Hashing with a Bipartite Graph
Xinyu Yan, Lijun Zhang, Wu-Jun Li
Semi-Supervised Learning 2

Recently, deep learning has been successfully applied to the problem of hashing, yielding remarkable performance compared to traditional methods with hand-crafted features. However, most existing deep hashing methods are designed for the supervised scenario and require a large amount of labeled data. In this paper, we propose a novel semi-supervised hashing method for image retrieval, named Deep Hashing with a Bipartite Graph (DHBG), to simultaneously learn embeddings, features and hash codes. More specifically, we construct a bipartite graph to discover the underlying structure of data, based on which an embedding is generated for each instance. Then, we feed raw pixels as well as embeddings to a deep neural network, and concatenate the resulting features to determine the hash code. Compared to existing methods, DHBG is a universal framework that is able to utilize various types of graphs and losses. Furthermore, we propose an inductive variant of DHBG to support out-of-sample extensions. Experimental results on real datasets show that our DHBG outperforms state-of-the-art hashing methods.

• #2444
Learning to Learn Programs from Examples: Going Beyond Program Structure
Kevin Ellis, Sumit Gulwani
Semi-Supervised Learning 2

Programming-by-example technologies let end users construct and run new programs by providing examples of the intended program behavior. But, the few provided examples seldom uniquely determine the intended program. Previous approaches to picking a program used a bias toward shorter or more naturally structured programs. Our work here gives a machine learning approach for learning to learn programs that departs from previous work by relying upon features that are independent of the program structure, instead relying upon a learned bias over program behaviors, and more generally over program execution traces. Our approach leverages abundant unlabeled data for semisupervised learning, and incorporates simple kinds of world knowledge for common-sense reasoning during program induction. These techniques are evaluated in two programming-by-example domains, improving the accuracy of program learners.

• #3071
Semi-Supervised Learning for Surface EMG-based Gesture Recognition
Yu Du, Yongkang Wong, Wenguang Jin, Wentao Wei, Yu Hu, Mohan Kankanhalli, Weidong Geng
Semi-Supervised Learning 2

Conventionally, gesture recognition based on non-intrusive muscle-computer interfaces required a strongly-supervised learning algorithm and a large amount of labeled training signals of surface electromyography (sEMG). In this work, we show that the temporal relationship between sEMG signals and data glove readings provides an implicit supervisory signal for learning the gesture recognition model. To demonstrate this, we present a semi-supervised learning framework with a novel Siamese architecture for sEMG-based gesture recognition. Specifically, we employ auxiliary tasks to learn visual representation: predicting the temporal order of two consecutive sEMG frames and, optionally, predicting the statistics of 3D hand pose with a sEMG frame. Experiments on the NinaPro, CapgMyo and csl-hdemg datasets validate the efficacy of our proposed approach, especially when the labeled samples are very scarce.

• #1263
Improving Learning-from-Crowds through Expert Validation
Mengchen Liu, Liu Jiang, Junlin Liu, Xiting Wang, Jun Zhu, Shixia Liu
Semi-Supervised Learning 2

Although several effective learning-from-crowd methods have been developed to infer correct labels from noisy crowdsourced labels, a method for post-processed expert validation is still needed. This paper introduces a semi-supervised learning algorithm that is capable of selecting the most informative instances and maximizing the influence of expert labels. Specifically, we have developed a complete uncertainty assessment to facilitate the selection of the most informative instances. The expert labels are then propagated to similar instances via regularized Bayesian inference. Experiments on both real-world and simulated datasets indicate that, given a specific accuracy goal (e.g., 95%), our method reduces expert effort by 39% to 60% compared with the state-of-the-art method.

### Wednesday 23, 10:30 - 12:00, MT-SP1 - Security and Privacy 1 (213)

Chair: Jose Such
• #1379
Online Reputation Fraud Campaign Detection in User Ratings
Chang Xu, Jie Zhang, Zhu Sun
Security and Privacy 1

Reputation fraud campaigns (RFCs) distort the reputations of rated items by generating fake ratings through multiple spammers. One effective way of detecting RFCs is to characterize their collective behaviors based on rating histories. However, these campaigns are constantly evolving and changing tactics to evade detection. For example, they can launch early attacks on the items to quickly dominate the reputations. They can also whitewash themselves through creating new accounts for subsequent attacks. It is thus challenging for existing approaches working on historical data to promptly react to such emerging fraud activities. In this paper, we conduct RFC detection in online fashion, so as to spot campaign activities as early as possible. This leads to a unified and scalable optimization framework, FraudScan, that can adapt to emerging fraud patterns over time. Empirical analysis on two real-world datasets validates the effectiveness and efficiency of the proposed framework.

• #2010
Defending Against Man-In-The-Middle Attack in Repeated Games
Shuxin Li, Xiaohong Li, Jianye Hao, Bo An, Zhiyong Feng, Kangjie Chen, Chengwei Zhang
Security and Privacy 1

The Man-in-the-Middle (MITM) attack has become widespread in networks nowadays. The MITM attack can cause serious information leakage and result in tremendous loss to users. Previous work applies game theory to analyze the MITM attack-defense problem and computes the optimal defense strategy to minimize the total loss. It assumes that all defenders are cooperative and that the attacker knows defenders' strategies beforehand. However, each individual defender is rational and may not have the incentive to cooperate. Furthermore, the attacker can hardly know defenders' strategies in advance in practice. To this end, we assume that all defenders are self-interested and model the MITM attack-defense scenario as a simultaneous-move game. Nash equilibrium is adopted as the solution concept and is proved to be always unique. Given the impracticability of computing the Nash equilibrium directly, we propose practical adaptive algorithms for the defenders and the attacker to learn towards the unique Nash equilibrium through repeated interactions. Simulation results show that the algorithms are able to converge to the Nash equilibrium strategy efficiently.

• #2752
Staying Ahead of the Game: Adaptive Robust Optimization for Dynamic Allocation of Threat Screening Resources
Sara Marie Mc Carthy, Phebe Vayanos, Milind Tambe
Security and Privacy 1

We consider the problem of dynamically allocating screening resources of different efficacies (e.g., magnetic or X-ray imaging) at checkpoints (e.g., at airports or ports) to successfully avert an attack by one of the screenees. Previously, the Threat Screening Game model was introduced to address this problem under the assumption that screenee arrival times are perfectly known. In reality, arrival times are uncertain, which severely impedes the implementability and performance of this approach. We thus propose a novel framework for dynamic allocation of threat screening resources that explicitly accounts for uncertainty in the screenee arrival times. We model the problem as a multistage robust optimization problem and propose a tractable solution approach using compact linear decision rules combined with robust reformulation and constraint randomization. We perform extensive numerical experiments which showcase that our approach outperforms (a) exact solution methods in terms of tractability, while incurring only a very minor loss in optimality, and (b) methods that ignore uncertainty in terms of both feasibility and optimality.

• #3397
A Monte Carlo Tree Search approach to Active Malware Analysis
Riccardo Sartea, Alessandro Farinelli
Security and Privacy 1

Active Malware Analysis (AMA) focuses on acquiring knowledge about dangerous software by executing actions that trigger a response in the malware. A key problem for AMA is to design strategies that select the most informative actions for the analysis. To devise such actions, we model AMA as a stochastic game between an analyzer agent and a malware sample, and we propose a reinforcement learning algorithm based on Monte Carlo Tree Search. Crucially, our approach does not require a pre-specified malware model; in contrast to most existing analysis techniques, we generate such a model while interacting with the malware. We evaluate our solution using clustering techniques on models generated by analyzing real malware samples. Results show that our approach learns faster than existing techniques even without any prior information on the samples.

• #3497
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun
Security and Privacy 1

We introduce two tactics, namely the strategically-timed attack and the enchanting attack, to attack reinforcement learning agents trained by deep reinforcement learning algorithms using adversarial examples. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the proposed tactics to agents trained by state-of-the-art deep reinforcement learning algorithms, including DQN and A3C. In 5 Atari games, our strategically-timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does while attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Example videos are available at http://yclin.me/adversarial_attack_RL/.

• #2072
Efficient Private ERM for Smooth Objectives
Jiaqi Zhang, Kai Zheng, Wenlong Mou, Liwei Wang
Security and Privacy 1

In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For non-convex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with a privacy guarantee. Besides the expected utility bounds, we also provide guarantees in high probability form. Experiments demonstrate that our algorithm consistently outperforms existing methods in both utility and running time.

### Wednesday 23, 10:30 - 12:00, CS-HS - Heuristic Search (216)

Chair: Yang Yu
• #1967
A Random Model for Argumentation Framework: Phase Transitions, Empirical Hardness, and Heuristics
Yong Gao
Heuristic Search

We propose and study, theoretically and empirically, a new random model for the abstract argumentation framework (AF). Our model overcomes some intrinsic difficulties of the only random model of directed graphs in the literature that is relevant to AFs, and makes it possible to study the typical-case complexity of AF instances in terms of threshold behaviours and phase transitions. We proved that the probability for a random AF instance to have a stable/preferred extension goes through a sudden change (from 1 to 0) at the threshold of the parameters of the new model D(n, p, q), satisfying the equation 4q/(1 + q)^2 = p. We showed, empirically, that in this new model, there is a clear easy-hard-easy pattern of hardness (for typical backtracking-style exact solvers) associated with the phase transition. Our empirical studies indicated that instances from the new model at phase transitions are much harder than those from an Erdos-Renyi-style model with equal edge density. In addition to being an analytically tractable model for understanding the interplay between problem structures and the effectiveness of (branching) heuristics used in practical argumentation solvers, the model can also be used to generate, in a systematic way, non-trivial AF instances with controlled features to evaluate the performance of other AF solvers.
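
As a small worked example, the threshold equation from the abstract can be evaluated directly to see which (p, q) pairs lie on the phase-transition curve. The function name and the sample values of q are arbitrary choices for illustration. The abstract does not say which side of the curve corresponds to probability 1, so the sketch makes no claim about that.

```python
def threshold_p(q):
    """Value of p on the curve 4q / (1 + q)^2 = p for a given q."""
    return 4.0 * q / (1.0 + q) ** 2

# A few points on the curve for arbitrarily chosen q values.
curve = [(q, threshold_p(q)) for q in (0.1, 0.25, 0.5, 1.0)]
```

For instance, q = 0.25 gives p = 1 / 1.5625 = 0.64, and q = 1 gives p = 1.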

• #3239
Beyond Forks: Finding and Ranking Star Factorings for Decoupled Search
Daniel Gnad, Valerie Poser, Jörg Hoffmann
Heuristic Search

Star-topology decoupling is a recent search reduction method for forward state space search. The idea basically is to automatically identify a star factoring, then search only over the center component in the star, avoiding interleavings across leaf components. The framework can handle complex star topologies, yet prior work on decoupled search considered only factoring strategies identifying fork and inverted-fork topologies. Here, we introduce factoring strategies able to detect general star topologies, thereby extending the reach of decoupled search to new factorings and to new domains, sometimes resulting in significant performance improvements. Furthermore, we introduce a predictive portfolio method that reliably selects the most suitable factoring for a given planning task, leading to superior overall performance.

• #2880
Online Bridged Pruning for Real-Time Search with Arbitrary Lookaheads
Carlos Hernandez, Adi Botea, Jorge A. Baier, Vadim Bulitko
Heuristic Search

Real-time search algorithms are relevant to time-sensitive decision-making domains such as video games and robotics. In such settings, the agent is required to decide on each action under a constant time bound, regardless of the search space size. Despite recent progress, poor-quality solutions can be produced, mainly due to state re-visitation. Different techniques have been developed to reduce such re-visitation, with state pruning showing promise. In this paper, we propose a novel pruning approach applicable to the wide class of real-time search algorithms. Given a local search space of arbitrary size, our technique aggressively prunes away all states in its interior, possibly adding new edges to maintain the connectivity of the search space frontier. An experimental evaluation shows that our pruning often improves the performance of a base real-time search algorithm by over an order of magnitude. This allows our implemented system to outperform state-of-the-art real-time search algorithms used in the evaluation.

• #3400
An Admissible HTN Planning Heuristic
Pascal Bercher, Gregor Behnke, Daniel Höller, Susanne Biundo
Heuristic Search

Hierarchical task network (HTN) planning is well-known for being an efficient planning approach. This is mainly due to the success of the HTN planning system SHOP2. However, its performance depends on hand-designed search control knowledge. At present, there are only very few domain-independent heuristics, which are designed for differing hierarchical planning formalisms. Here, we propose an admissible heuristic for standard HTN planning, which makes it possible to find optimal solutions heuristically. It is based on the so-called task decomposition graph (TDG), a data structure reflecting reachable parts of the task hierarchy. We show (both in theory and empirically) that rebuilding it during planning can improve heuristic accuracy, thereby decreasing the explored search space. The evaluation further studies the heuristic both in terms of plan quality and coverage.

• #3431
Optimizing Ratio of Monotone Set Functions
Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang, Zhi-Hua Zhou
Heuristic Search

This paper considers the problem of minimizing the ratio of two set functions, i.e., $f/g$. Previous work assumed monotonicity and submodularity of the two functions, while we consider a more general situation where $g$ is not necessarily submodular. We derive that the greedy approach GreedRatio, as a fixed time algorithm, achieves a $\frac{|X^*|}{(1+(|X^*| - 1)(1 - \kappa_f))\gamma(g)}$ approximation ratio, which also improves the previous bound for submodular $g$. If more time can be spent, we present the PORM algorithm, an anytime randomized iterative approach minimizing $f$ and $-g$ simultaneously. We show that PORM using reasonable time has the same general approximation guarantee as GreedRatio, but can achieve better solutions in cases and applications.

• #4132
On Subset Selection with General Cost Constraints
Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang
Heuristic Search

This paper considers the subset selection problem with a monotone objective function and a monotone cost constraint, which relaxes the submodular property of previous studies. We first show that the approximation ratio of the generalized greedy algorithm is $\frac{\alpha}{2}(1 - \frac{1}{e^{\alpha}})$ (where $\alpha$ is the submodularity ratio); and then propose POMC, an anytime randomized iterative approach that can utilize more time to find better solutions than the generalized greedy algorithm. We show that POMC can obtain the same general approximation guarantee as the generalized greedy algorithm, but can achieve better solutions in cases and applications.
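
To give a feel for the approximation ratio stated in this abstract, the following sketch evaluates $\frac{\alpha}{2}(1 - e^{-\alpha})$ numerically. The function name is hypothetical, and the restriction of $\alpha$ to $[0, 1]$ is an assumption based on the usual definition of the submodularity ratio; the abstract itself does not state the range.

```python
import math


def generalized_greedy_bound(alpha: float) -> float:
    """Evaluate (alpha / 2) * (1 - e^(-alpha)), the ratio quoted in the abstract."""
    # Assumption: the submodularity ratio lies in [0, 1].
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("this sketch assumes 0 <= alpha <= 1")
    return (alpha / 2.0) * (1.0 - math.exp(-alpha))


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 1.0):
        print(f"alpha = {alpha:.2f} -> bound = {generalized_greedy_bound(alpha):.4f}")
```

At $\alpha = 1$ the expression reduces to $\frac{1}{2}(1 - \frac{1}{e})$, and it shrinks towards 0 as $\alpha$ decreases.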

### Wednesday 23 10:30 - 12:00 KR-DLO2 - Description Logics and Ontologies 2 (217)

Chair: Jérôme Euzenat
• #1540
Combining DL-Lite_{bool}^N with Branching Time: A gentle Marriage
Víctor Gutiérrez-Basulto, Jean Christoph Jung
Description Logics and Ontologies 2

We study combinations of the description logic DL-Lite_{bool}^N with the branching temporal logics CTL* and CTL. We analyse two types of combinations, both with rigid roles: (i) temporal operators are applied to concepts and to ABox assertions, and (ii) temporal operators are applied to concepts and Boolean combinations of concept inclusions and ABox assertions. For the resulting logics, we present algorithms for the satisfiability problem and (mostly tight) complexity bounds ranging from ExpTime to 3ExpTime.

• #1667
Query Rewriting for DL-Lite with n-ary Concrete Domains
Franz Baader, Stefan Borgwardt, Marcel Lippmann
Description Logics and Ontologies 2

We investigate ontology-based query answering (OBQA) in a setting where both the ontology and the query can refer to concrete values such as numbers and strings. In contrast to previous work on this topic, the built-in predicates used to compare values are not restricted to being unary. We introduce restrictions on these predicates and on the ontology language that allow us to reduce OBQA to query answering in databases using the so-called combined rewriting approach. Though at first sight our restrictions are different from the ones used in previous work, we show that our results strictly subsume some of the existing first-order rewritability results for unary predicates.

• #3213
Making Cross Products and Guarded Ontology Languages Compatible
Pierre Bourhis, Michael Morak, Andreas Pieris
Description Logics and Ontologies 2

Cross products form a useful modelling tool that allows us to express natural statements such as "elephants are bigger than mice", or, more generally, to define relations that connect every instance in a relation with every instance in another relation. Despite their usefulness, cross products cannot be expressed using existing guarded ontology languages, such as description logics (DLs) and guarded existential rules. The question that comes up is whether cross products are compatible with guarded ontology languages, and, if not, whether there is a way of making them compatible. This has already been studied for DLs, while for guarded existential rules it remains unanswered. Our goal is to give an answer to the above question. To this end, we focus on the guarded fragment of first-order logic (which serves as a unifying framework that subsumes many of the aforementioned ontology languages) extended with cross products, and we investigate the standard tasks of satisfiability and query answering. Interestingly, we isolate relevant fragments that are compatible with cross products.

• #3791
Query Answering in Ontologies under Preference Rankings
İsmail İlkan Ceylan, Thomas Lukasiewicz, Rafael Peñaloza, Oana Tifrea-Marciuska
Description Logics and Ontologies 2

We present an ontological framework, based on preference rankings, that allows users to express their preferences between the knowledge explicitly available in the ontology. Using this formalism, the answers for a given query to an ontology can be ranked by preference, allowing users to retrieve the most preferred answers only. We provide a host of complexity results for the main computational tasks in this framework, for the general case, and for EL and DL-Lite_core as underlying ontology languages.

• #4109
Mapping Repair in Ontology-based Data Access Evolving Systems
Domenico Lembo, Riccardo Rosati, Valerio Santarelli, Domenico Fabio Savo, Evgenij Thorstensen
Description Logics and Ontologies 2

In this paper we study the evolution of ontology-based data access (OBDA) specifications, and focus on the case in which the ontology and/or the data source schema change, which may require a modification to the mapping between them to preserve both consistency and knowledge. Our approach is based on the idea of repairing the mapping according to the usual principle of minimal change and on a recent, mapping-based notion of consistency of the specification. We define and analyze two notions of mapping repair under ontology and source schema update. We then present a set of results on the complexity of query answering in the above framework, when the ontology is expressed in DL-Lite_R.

• #4125
Most Probable Explanations for Probabilistic Database Queries
İsmail İlkan Ceylan, Stefan Borgwardt, Thomas Lukasiewicz
Description Logics and Ontologies 2

Forming the foundations of large-scale knowledge bases, probabilistic databases have been widely studied in the literature. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information encompassed in large-scale knowledge bases. To exploit this potential, we study two inference tasks, namely finding the most probable database and the most probable hypothesis for a given query. As natural counterparts of most probable explanations (MPE) and maximum a posteriori hypotheses (MAP) in probabilistic graphical models, they can be used in a variety of applications that involve prediction or diagnosis tasks. We investigate these problems relative to a variety of query languages, ranging from conjunctive queries to ontology-mediated queries, and provide a detailed complexity analysis.

### Wednesday 23 10:30 - 12:00 MT-SS1 - Social Sciences 1 (218)

Chair: Haris Aziz
• #3439
A Causal Framework for Discovering and Removing Direct and Indirect Discrimination
Lu Zhang, Yongkai Wu, Xintao Wu
Social Sciences 1

In this paper, we investigate the problem of discovering both direct and indirect discrimination from the historical data, and removing the discriminatory effects before the data is used for predictive analysis (e.g., building classifiers). The main drawback of existing methods is that they cannot distinguish the part of influence that is really caused by discrimination from all correlated influences. In our approach, we make use of the causal network to capture the causal structure of the data. Then we model direct and indirect discrimination as the path-specific effects, which accurately identify the two types of discrimination as the causal effects transmitted along different paths in the network. Based on that, we propose an effective algorithm for discovering direct and indirect discrimination, as well as an algorithm for precisely removing both types of discrimination while retaining good data utility. Experiments using the real dataset show the effectiveness of our approaches.

• #2080
Fast Network Embedding Enhancement via High Order Proximity Approximation
Cheng Yang, Maosong Sun, Zhiyuan Liu, Cunchao Tu
Social Sciences 1

Many Network Representation Learning (NRL) methods have recently been proposed to learn vector representations for vertices in a network. In this paper, we summarize most existing NRL methods into a unified two-step framework, consisting of proximity matrix construction and dimension reduction. We focus on the analysis of the proximity matrix construction step and conclude that an NRL method can be improved by exploring higher order proximities when building the proximity matrix. We propose the Network Embedding Update (NEU) algorithm, which implicitly approximates higher order proximities with a theoretical approximation bound and can be applied to any NRL method to enhance its performance. We conduct experiments on multi-label classification and link prediction tasks. Experimental results show that NEU can make a consistent and significant improvement over a number of NRL methods with almost negligible running time on all three publicly available datasets.

• #1770
Cake Cutting: Envy and Truth
Xiaohui Bei, Ning Chen, Guangda Huzhang, Biaoshuai Tao, Jiajun Wu
Social Sciences 1

We study envy-free cake cutting with strategic agents, where each agent may manipulate his private information in order to receive a better allocation. We focus on piecewise constant utility functions and consider two scenarios: the general setting without any restriction on the allocations and the restricted setting where each agent has to receive a connected piece. We show that no deterministic truthful envy-free mechanism exists in the connected piece scenario, and that the same impossibility result holds for the general setting under some additional mild assumptions on the allocations. Finally, we study a large market model where the economy is replicated and demonstrate that truth-telling converges to a Nash equilibrium.

• #2166
Networked Fairness in Cake Cutting
Xiaohui Bei, Youming Qiao, Shengyu Zhang
Social Sciences 1

We introduce a graphical framework for fair division in cake cutting, where comparisons between agents are limited by an underlying network structure. We generalize the classical fairness notions of envy-freeness and proportionality in this graphical setting. An allocation is called envy-free on a graph if no agent envies any of her neighbors' shares, and is called proportional on a graph if every agent values her own share no less than the average among her neighbors, with respect to her own measure. These generalizations enable new research directions in developing simple and efficient algorithms that can produce fair allocations under specific graph structures. On the algorithmic frontier, we first propose a moving-knife algorithm that outputs an envy-free allocation on trees. The algorithm is significantly simpler than the discrete and bounded envy-free algorithm introduced in [Aziz and Mackenzie, 2016] for complete graphs. Next, we give a discrete and bounded algorithm for computing a proportional allocation on transitive closures of trees, a class of graphs obtained by taking a rooted tree and connecting all its ancestor-descendant pairs.
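
The two graph-based fairness notions defined in this abstract can be checked mechanically once an allocation is fixed. The sketch below is only an illustration under stated assumptions: `values[i][j]` is a hypothetical matrix holding agent i's value for the share given to agent j, `neighbors` is a hypothetical adjacency map, and the average is taken over an agent's neighbors only (excluding herself), which is one reading of the abstract and may differ from the paper's formal definition.

```python
from typing import Dict, List, Set


def is_envy_free_on_graph(values: List[List[float]],
                          neighbors: Dict[int, Set[int]]) -> bool:
    """True if no agent values a neighbor's share above her own share."""
    return all(values[i][i] >= values[i][j]
               for i in neighbors for j in neighbors[i])


def is_proportional_on_graph(values: List[List[float]],
                             neighbors: Dict[int, Set[int]]) -> bool:
    """True if each agent values her share at least the neighbor average."""
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue  # an isolated agent has nobody to compare with
        average = sum(values[i][j] for j in nbrs) / len(nbrs)
        if values[i][i] < average:
            return False
    return True


if __name__ == "__main__":
    # Path graph 0 - 1 - 2; row i lists agent i's value for each share.
    values = [[0.4, 0.3, 0.3],
              [0.1, 0.4, 0.5],
              [0.3, 0.3, 0.4]]
    neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
    print(is_envy_free_on_graph(values, neighbors))
    print(is_proportional_on_graph(values, neighbors))
```

In this toy instance agent 1 envies agent 2, so the allocation is not envy-free on the path, yet her own share (0.4) still exceeds the average of her neighbors' shares (0.3), so it is proportional on the path.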

• #3690
Deterministic, Strategyproof, and Fair Cake Cutting
Vijay Menon, Kate Larson
Social Sciences 1

We study the classic cake cutting problem from a mechanism design perspective, in particular focusing on deterministic mechanisms that are strategyproof and fair. We begin by looking at mechanisms that are non-wasteful and primarily show that for even the restricted class of piecewise constant valuations there exists no direct-revelation mechanism that is strategyproof and even approximately proportional. Subsequently, we remove the non-wasteful constraint and show another impossibility result stating that there is no strategyproof and approximately proportional direct-revelation mechanism that outputs contiguous allocations, again, for even the restricted class of piecewise constant valuations. In addition to the above results, we also present some negative results when considering an approximate notion of strategyproofness, show a connection between direct-revelation mechanisms and mechanisms in the Robertson-Webb model when agents have piecewise constant valuations, and finally also present a (minor) modification to the well-known Even-Paz algorithm that has better incentive-compatible properties for the cases when there are two or three agents.

• #1546
Modeling Physicians' Utterances to Explore Diagnostic Decision-making
Xuan Guo, Rui Li, Qi Yu, Anne Haake
Social Sciences 1

Diagnostic error prevention is a long-established but specialized topic in clinical and psychological research. In this paper, we contribute to the field by exploring diagnostic decision-making via modeling physicians' utterances of medical concepts during image-based diagnoses. We conduct experiments to collect verbal narratives from dermatologists while they are examining and describing dermatology images towards diagnoses. We propose a hierarchical probabilistic framework to learn domain-specific patterns from the medical concepts in these narratives. The discovered patterns match the diagnostic units of thought identified by domain experts. These meaningful patterns uncover physicians' diagnostic decision-making processes while parsing the image content. Our evaluation shows that these patterns provide key information to classify narratives by diagnostic correctness levels.

### Wednesday 23 10:30 - 12:00 MAS-EP2 - Economic Paradigms 2 (219)

Chair: Makoto Yokoo
• #1584
Diverse Weighted Bipartite b-Matching
Faez Ahmed, John P. Dickerson, Mark Fuge
Economic Paradigms 2

Bipartite matching, where agents on one side of a market are matched to agents or items on the other, is a classical problem in computer science and economics, with widespread application in healthcare, education, advertising, and general resource allocation. A practitioner's goal is typically to maximize a matching market's economic efficiency, possibly subject to some fairness requirements that promote equal access to resources. A natural balancing act exists between fairness and efficiency in matching markets, and has been the subject of much research. In this paper, we study a complementary goal, balancing diversity and efficiency, in a generalization of bipartite matching where agents on one side of the market can be matched to sets of agents on the other. Adapting a classical definition of the diversity of a set, we propose a quadratic programming-based approach to solving a submodular minimization problem that balances diversity and total weight of the solution. We also provide a scalable greedy algorithm with theoretical performance bounds. We then define the price of diversity, a measure of the efficiency loss due to enforcing diversity, and give a worst-case theoretical bound. Finally, we demonstrate the efficacy of our methods on three real-world datasets, and show that the price of diversity is not bad in practice. Our code is publicly accessible for further research.

• #2042
Online Optimization of Video-Ad Allocation
Hanna Sumita, Yasushi Kawase, Sumio Fujita, Takuro Fukunaga
Economic Paradigms 2

In this paper, we study video advertising in the context of internet advertising. Video advertising is a rapidly growing industry, but its computational aspects have not yet been investigated. A difference between video advertising and traditional display advertising is that the former requires more time to be viewed. In contrast to a traditional display advertisement, a video advertisement has no influence over a user unless the user watches it for a certain amount of time. Previous studies have not considered the length of video advertisements or the time spent by users watching them. Motivated by this observation, we formulate a new online optimization problem for optimizing the allocation of video advertisements, and we develop a nearly (1 − 1/e)-competitive algorithm for finding an envy-free allocation of video advertisements.

• #2191
Near-Feasible Stable Matchings with Budget Constraints
Yasushi Kawase, Atsushi Iwasaki
Economic Paradigms 2

This paper deals with two-sided matching with budget constraints where one side (firm or hospital) can make monetary transfers (offer wages) to the other (worker or doctor). In a standard model, while multiple doctors can be matched to a single hospital, a hospital has a maximum quota: the number of doctors assigned to a hospital cannot exceed a certain limit. In our model, a hospital instead has a fixed budget: the total amount of wages allocated by each hospital to doctors is constrained. With budget constraints, stable matchings may fail to exist and checking the existence is hard. To deal with the nonexistence of stable matchings, we extend the “matching with contracts” model by Hatfield and Milgrom, so that it handles near-feasible matchings that exceed each budget of the hospitals by a certain amount. We then propose two novel mechanisms that efficiently return such a near-feasible matching that is stable with respect to the actual amount of wages allocated by each hospital. In particular, by sacrificing strategy-proofness, our second mechanism achieves the best possible bound.

• #2791
Optimal Posted-Price Mechanism in Microtask Crowdsourcing
Zehong Hu, Jie Zhang
Economic Paradigms 2

Posted-price mechanisms are widely adopted to decide the price of tasks in popular microtask crowdsourcing. In this paper, we propose a novel posted-price mechanism which not only outperforms existing mechanisms but also avoids their need for a finite price range. The advantages are achieved by converting the pricing problem into a multi-armed bandit problem and designing an optimal algorithm to exploit the unique features of microtask crowdsourcing. We theoretically show the optimality of our algorithm and prove that the performance upper bound can be achieved without the need for a prior price range. We also conduct extensive experiments using real price data to verify the advantages and practicability of our mechanism.

• #3476
Learning a Ground Truth Ranking Using Noisy Approval Votes
Ioannis Caragiannis, Evi Micha
Economic Paradigms 2

We consider a voting scenario where agents have opinions that are estimates of an underlying common ground truth ranking of the available alternatives, and each agent is asked to approve a set with her most preferred alternatives. We assume that estimates are implicitly formed using the well-known Mallows model for generating random rankings. We show that k-approval voting (where all agents are asked to approve the same number k of alternatives and the outcome is obtained by sorting the alternatives in terms of their number of approvals) has exponential sample complexity for all values of k. This negative result suggests that an exponential (in terms of the number of alternatives m) number of agents is always necessary in order to recover the ground truth ranking with high probability. In contrast, by just asking each agent to approve a random number of alternatives, the sample complexity improves dramatically: it now depends only polynomially on m. Our results may have implications on the effectiveness of crowdsourcing applications that ask workers to provide their input by approving sets of available alternatives.
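
The k-approval rule described in this abstract can be sketched in a few lines: every agent approves the top k alternatives of her own ranking, and the outcome sorts alternatives by approval count. The function name and the alphabetical tie-breaking are assumptions made for illustration; the abstract does not say how ties are resolved, and the noisy rankings themselves (drawn from a Mallows model in the paper) are simply taken as input here.

```python
from collections import Counter
from typing import List, Sequence


def k_approval_ranking(rankings: Sequence[Sequence[str]], k: int) -> List[str]:
    """Rank alternatives by how many agents place them in their top k."""
    if not rankings:
        raise ValueError("at least one ranking is required")
    alternatives = sorted(rankings[0])
    approvals: Counter = Counter()
    for ranking in rankings:
        approvals.update(ranking[:k])  # the agent approves her k favourites
    # Most approvals first; ties broken alphabetically (an assumption).
    return sorted(alternatives, key=lambda a: (-approvals[a], a))


if __name__ == "__main__":
    votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
    print(k_approval_ranking(votes, k=1))
```

With these three votes and k = 1, alternative a collects two approvals, b one and c none, so the recovered ranking is a, b, c.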

• #3765
Thwarting Vote Buying Through Decoy Ballots
David C. Parkes, Paul Tylkin, Lirong Xia
Economic Paradigms 2

There is increasing interest in promoting participatory democracy, in particular by allowing voting by mail or internet and through random-sample elections. A pernicious concern, though, is that of vote buying, which occurs when a bad actor seeks to buy ballots, paying someone to vote against their own intent. This becomes possible whenever a voter is able to sell evidence of which way she voted. We show how to thwart vote buying through decoy ballots, which are not counted but are indistinguishable from real ballots to a buyer. We show that an Election Authority can significantly reduce the power of vote buying through a small number of optimally distributed decoys, and model societal processes by which decoys could be distributed.

### Wednesday 23 10:30 - 12:00 PL-PA - Planning Algorithms (220)

Chair: Eyal Shlomo (Solomon) Shimony
• #2223
On Creating Complementary Pattern Databases
Santiago Franco, Álvaro Torralba, Levi H. S. Lelis, Mike Barley
Planning Algorithms

A pattern database (PDB) for a planning task is a heuristic function in the form of a lookup table that contains optimal solution costs of a simplified version of the task. In this paper we introduce a method that sequentially creates multiple PDBs which are later combined into a single heuristic function. At a given iteration, our method uses estimates of the A* running time to create a PDB that complements the strengths of the PDBs created in previous iterations. We evaluate our algorithm using explicit and symbolic PDBs. Our results show that the heuristics produced by our approach are able to outperform existing schemes, and that our method is able to create PDBs that complement the strengths of other existing heuristics such as a symbolic perimeter heuristic.

• #2508
Additive Merge-and-Shrink Heuristics for Diverse Action Costs
Gaojian Fan, Martin Müller, Robert Holte
Planning Algorithms

In many planning applications, actions can have highly diverse costs. Recent studies focus on the effects of diverse action costs on search algorithms, but not on their effects on domain-independent heuristics. In this paper, we demonstrate there are negative impacts of action cost diversity on merge-and-shrink (M&S), a successful abstraction method for producing high-quality heuristics for planning problems. We propose a new cost partitioning method for M&S to address the negative effects of diverse action costs. We investigate non-unit cost IPC domains, especially those for which diverse action costs have severe negative effects on the quality of the M&S heuristic. Our experiments demonstrate that in these domains, an additive set of M&S heuristics using the new cost partitioning method produces much more informative and effective heuristics than creating a single M&S heuristic which directly encodes diverse costs.

• #3078
From Qualitative to Quantitative Dominance Pruning for Optimal Planning
Álvaro Torralba
Planning Algorithms

Dominance relations compare states to determine whether one is at least as good as another in terms of their goal distance. We generalize these qualitative yes/no relations to functions that measure by how much a state is better than another. This allows us to distinguish cases where the state is strictly closer to the goal. Moreover, we may obtain a bound on the difference in goal distance between two states even if there is no qualitative dominance. We analyze the multiple advantages that quantitative dominance has, like discovering coarser dominance relations, or trading dominance by g-value. Moreover, quantitative dominance can also be used to prove that an action starts an optimal plan from a given state. We introduce a novel action selection pruning that uses this to prune any other successor. Results show that quantitative dominance pruning greatly reduces the search space, significantly increasing the planners' performance.

• #3251
Search and Learn: On Dead-End Detectors, the Traps they Set, and Trap Learning
Marcel Steinmetz, Jörg Hoffmann
Planning Algorithms

A key technique for proving unsolvability in classical planning is the use of dead-end detectors $\Delta$: effectively testable criteria sufficient for unsolvability, pruning (some) unsolvable states during search. Related to this, a recent proposal is the identification of traps prior to search, compact representations of non-goal state sets $T$ that cannot be escaped. Here, we create new synergy across these ideas. We define a generalized concept of traps, relative to a given dead-end detector $\Delta$, where $T$ can be escaped, but only into dead-end states detected by $\Delta$. We show how to learn compact representations of such $T$ during search, extending the reach of $\Delta$. Our experiments show that this can be quite beneficial. It improves coverage for many unsolvable benchmark planning domains and dead-end detectors $\Delta$, in particular on resource-constrained domains where it outperforms the state of the art.

• #3386
Robust Advertisement Allocation
Shaojie Tang
Planning Algorithms

With the rapid growth of e-commerce and the World Wide Web, internet advertising revenue has surpassed broadcast revenue very recently. As online advertising has become a major source of revenue for online publishers, such as Google and Amazon, one problem facing them is to optimize ad selection and allocation in order to maximize their revenue. Although there is a rich body of work that has been devoted to this field, uncertainty about models and parameter settings is largely ignored in existing algorithm design. To fill this gap, we are the first to formulate and study the Robust Ad Allocation problem, by taking into account the uncertainty about parameter settings. We define a Robust Ad Allocation framework with a set of candidate parameter settings, typically derived from different users or topics. Our main aim is to develop robust ad allocation algorithms, which can provide satisfactory performance across a spectrum of parameter settings, compared to the (parameter-specific) optimum solutions. We study this problem progressively and propose a series of algorithms with bounded approximation ratio.

• #3647
Purely Declarative Action Descriptions are Overrated: Classical Planning with Simulators
Guillem Francès, Miquel Ramírez, Nir Lipovetzky, Hector Geffner
Planning Algorithms

Classical planning is concerned with problems where a goal needs to be reached from a known initial state by doing actions with deterministic, known effects. Classical planners, however, deal only with classical problems that can be expressed in declarative planning languages such as STRIPS or PDDL. This prevents their use on problems that are not easy to model declaratively or whose dynamics are given via simulations. Simulators do not provide a declarative representation of actions, but simply return successor states. The question we address in this paper is: can a planner that has access to the structure of states and goals only, approach the performance of planners that also have access to the structure of actions expressed in PDDL? To answer this, we develop domain-independent, black box planning algorithms that completely ignore action structure, and show that they match the performance of state-of-the-art classical planners on the standard planning benchmarks. Effective black box algorithms open up new possibilities for modeling and for expressing control knowledge, which we also illustrate.

### Wednesday 23 10:30 - 12:30 EAR-3 - Early Career 3 (Plenary 2)

Chair: Gerhard Lakemeyer
• #23
Logic meets Probability: Towards Explainable AI Systems for Uncertain Worlds
Vaishak Belle
Early Career 3

Logical AI is concerned with formal languages to represent and reason with qualitative specifications; statistical AI is concerned with learning quantitative specifications from data. To combine the strengths of these two camps, there has been exciting recent progress on unifying logic and probability. We review the many guises for this union, while emphasizing the need for a formal language to represent a system's knowledge. Formal languages allow their internal properties to be robustly scrutinized, can be augmented by adding new knowledge, and are amenable to abstractions, all of which are vital to the design of intelligent systems that are explainable and interpretable.

• #27
Knowledge Engineering for Intelligent Decision Support
María Vanina Martínez
Early Career 3

Knowledge can be seen as the collection of skills and information an individual (or group) has acquired through experience, while intelligence as the ability to apply such knowledge. In many areas of Artificial Intelligence, we have been focusing for the last 40 years on the formalization and development of automated ways of finding and collecting data, as well as on the construction of models to represent that data adequately in a way that an automated system can make sense of it. However, in order to achieve real artificial intelligence we need to go beyond data and knowledge representation, and deeper into how such a system could, and would, use available knowledge in order to empower and enhance the capabilities of humans in making decisions in real-world applications. From my point of view, an AI should be able to combine automatically acquired data and knowledge together with specific domain expertise from the users that the tool is expected to help.

• #32
Improving Group Decision-Making by Artificial Intelligence
Lirong Xia
Early Career 3

We summarize some of our recent work on using AI to improve group decision-making by taking a unified approach from statistics, economics, and computation. We then discuss a few ongoing and future directions.

• #29
Towards Certified Unsolvability in Classical Planning
Gabriele Röger
Early Career 3

While it is easy to verify that an action sequence is a solution for a classical planning task, there is no such verification capability if a task is reported unsolvable. We are therefore interested in certificates that allow an independent verification of the absence of solutions. We identify promising concepts for certificates that can be generated by a wide range of planning approaches. We present a first proposal of unsolvability certificates and sketch ideas for how the underlying concepts can be used as part of a more flexible unsolvability proof system.
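
The "easy" direction mentioned in this abstract, checking that an action sequence solves a task, can be sketched as follows. This is a minimal illustration assuming a STRIPS-style encoding with sets of propositions; the `Action` type and `validate_plan` function are hypothetical names introduced here, not part of the paper.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Action(NamedTuple):
    """A STRIPS-style action (hypothetical minimal encoding)."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str]


def validate_plan(initial_state: Iterable[str],
                  goal: FrozenSet[str],
                  plan: List[Action]) -> bool:
    """Return True iff applying `plan` from `initial_state` reaches `goal`."""
    state = set(initial_state)
    for action in plan:
        # An action is applicable only if all its preconditions hold.
        if not action.preconditions <= state:
            return False
        # Apply the action: remove deleted facts, then add new ones.
        state = (state - action.delete_effects) | action.add_effects
    return goal <= state


move = Action("move-a-b",
              frozenset({"at-a"}),   # preconditions
              frozenset({"at-b"}),   # add effects
              frozenset({"at-a"}))   # delete effects
```

Verification runs in time linear in the plan length. No comparably simple check exists for a claim of unsolvability, which is the gap the certificates are meant to fill.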

### Wednesday 2310:30 - 12:30SIS-KRNLP - Sister Conference Track: Knowledge Representation and Natural Language Processing (203)

Chair: Mausam
• #4222
User-Based Opinion-based Recommendation
Ruihai Dong, Barry Smyth
Sister Conference Track: Knowledge Representation and Natural Language Processing

User-generated reviews are a plentiful source of user opinions and interests and can play an important role in a range of artificial intelligence contexts, particularly when it comes to recommender systems. In this paper, we describe how natural language processing and opinion mining techniques can be used to automatically mine useful recommendation knowledge from user generated reviews and how this information can be used by recommender systems in a number of classical settings.

• #4231
Predicting Human Similarity Judgments with Distributional Models: The Value of Word Associations
Simon De Deyne, Amy Perfors, Daniel J. Navarro
Sister Conference Track: Knowledge Representation and Natural Language Processing

To represent the meaning of a word, most models use external language resources, such as text corpora, to derive the distributional properties of word usage. In this study, we propose that internal language models, which are more closely aligned to the mental representations of words, can be used to derive new theoretical questions regarding the structure of the mental lexicon. A comparison with internal models also puts into perspective a number of assumptions underlying recently proposed distributional text-based models and could provide important insights into cognitive science, including linguistics and artificial intelligence. We focus on word-embedding models, which have been proposed to learn aspects of word meaning in a manner similar to humans, and contrast them with internal language models derived from a new extensive data set of word associations. An evaluation using relatedness judgments shows that internal language models consistently outperform current state-of-the-art text-based external language models. This suggests alternative approaches to represent word meaning using properties that aren't encoded in text.

• #4238
Lexicons on Demand: Neural Word Embeddings for Large-Scale Text Analysis
Ethan Fast, Binbin Chen, Michael S. Bernstein
Sister Conference Track: Knowledge Representation and Natural Language Processing

Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by learning a neural embedding across billions of words on the web. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated such as neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.
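
The seed-expansion step described in this abstract can be illustrated with a small sketch. It assumes that related terms are found by cosine similarity to the centroid of the seed vectors, which the abstract does not state; the two-dimensional vectors are made-up toy values, and `expand_category` is a hypothetical name. The crowd-powered validation filter is not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_category(seeds, embeddings, top_k=2):
    """Rank non-seed words by similarity to the centroid of the seed vectors."""
    dim = len(next(iter(embeddings.values())))
    centroid = [sum(embeddings[s][i] for s in seeds) / len(seeds)
                for i in range(dim)]
    candidates = [w for w in embeddings if w not in seeds]
    ranked = sorted(candidates,
                    key=lambda w: cosine(embeddings[w], centroid),
                    reverse=True)
    return ranked[:top_k]


# Toy, hand-made vectors for illustration only (not real embeddings).
TOY_EMBEDDINGS = {
    "bleed": [0.9, 0.1],
    "punch": [0.8, 0.2],
    "stab": [0.85, 0.15],
    "kick": [0.7, 0.3],
    "garden": [0.1, 0.9],
    "flower": [0.05, 0.95],
}

print(expand_category(["bleed", "punch"], TOY_EMBEDDINGS))
```

With these toy vectors, the words closest to the seeds "bleed" and "punch" are "stab" and "kick". A real system would use embeddings learned from a large corpus.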

• #4244
Adapting Deep Network Features to Capture Psychological Representations: An Abridged Report
Joshua C. Peterson, Joshua T. Abbott, Thomas L. Griffiths
Sister Conference Track: Knowledge Representation and Natural Language Processing

Deep neural networks have become increasingly successful at solving classic perception problems (e.g., recognizing objects), often reaching or surpassing human-level accuracy. In this abridged report of Peterson et al. [2016], we examine the relationship between the image representations learned by these networks and those of humans. We find that deep features learned in service of object classification account for a significant amount of the variance in human similarity judgments for a set of animal images. However, these features do not appear to capture some key qualitative aspects of human representations. To close this gap, we present a method for adapting deep features to align with human similarity judgments, resulting in image representations that can potentially be used to extend the scope of psychological experiments and inform human-centric AI.

• #4249
Grounding Abstract Spatial Concepts for Language Interaction with Robots
Rohan Paul, Jacob Arkin, Nicholas Roy, Thomas M. Howard
Sister Conference Track: Knowledge Representation and Natural Language Processing

Our goal is to develop models that allow a robot to understand or "ground" natural language instructions in the context of its world model. Contemporary approaches estimate correspondences between an instruction and possible candidate groundings such as objects, regions and goals for a robot's action. However, these approaches are unable to reason about abstract or hierarchical concepts such as rows, columns and groups that are relevant in a manipulation domain. We introduce a probabilistic model that incorporates an expressive space of abstract spatial concepts as well as notions of cardinality and ordinality. Abstract concepts are introduced as explicit hierarchical symbols correlated with concrete groundings. Crucially, the abstract groundings form a Markov boundary over concrete groundings, effectively de-correlating them from the remaining variables in the graph, which reduces the complexity of training and inference in the model. Empirical evaluation demonstrates accurate grounding of abstract concepts embedded in complex natural language instructions commanding a robot manipulator. The proposed inference method leads to significant efficiency gains compared to the baseline, with minimal trade-off in accuracy.

• #4259
Intuitionistic Layered Graph Logic
Simon Docherty, David Pym
Sister Conference Track: Knowledge Representation and Natural Language Processing

Models of complex systems are widely used in the physical and social sciences, and the concept of layering, typically building upon graph-theoretic structure, is a common feature. We describe an intuitionistic substructural logic that gives an account of layering. As in other bunched systems, the logic includes the usual intuitionistic connectives, together with a non-commutative, non-associative conjunction (used to capture layering) and its associated implications. We give a soundness and completeness theorem for a labelled tableaux system with respect to a Kripke semantics on graphs. To demonstrate the utility of the logic, we show how to represent systems and security examples, illuminating the relationship between services/policies and the infrastructures/architectures to which they are applied.

### Wednesday 2314:00 - 15:00Invited Talk (203-204)

Chair: Michael Wooldridge
• Super-Human AI for Strategic Reasoning: Beating Top Pros in Heads-Up No-Limit Texas Hold'em
Tuomas Sandholm
Invited Talk
• ### Wednesday 2314:00 - 15:30Invited Talk (Plenary 2)

Chair: Qiang Yang
• Improving Health-Care: Challenges and Opportunities for Reinforcement Learning
Joelle Pineau
Invited Talk
• ### Wednesday 2315:00 - 16:00Panel (Plenary 2)

• AI and Societal Challenges
Panelists: Toby Walsh, Joelle Pineau, Virginia Dignum, Tuomas Sandholm.
Panel
• ### Wednesday 2315:00 - 16:00ML-SL - Structured Learning (203-204)

Chair: Daniel Boley
• #3671
Parsing Natural Language Conversations using Contextual Cues
Shashank Srivastava, Amos Azaria, Tom Mitchell
Structured Learning

In this work, we focus on semantic parsing of natural language conversations. Most existing methods for semantic parsing are based on understanding the semantics of a single sentence at a time. However, understanding conversations also requires an understanding of conversational context and discourse structure across sentences. We formulate semantic parsing of conversations as a structured prediction task, incorporating structural features that model the 'flow of discourse' across sequences of utterances. We create a dataset for semantic parsing of conversations, consisting of 113 real-life sequences of interactions of human users with an automated email assistant. The data contains 4759 natural language statements paired with annotated logical forms. Our approach yields significant gains in performance over traditional semantic parsing.

• #1328
ROUTE: Robust Outlier Estimation for Low Rank Matrix Recovery
Xiaojie Guo, Zhouchen Lin
Structured Learning

In practice, even very high-dimensional data are typically sampled from low-dimensional subspaces, but with intrusion of outliers and/or noise. Recovering the underlying structure and the pollution from the observations is key to understanding and processing such data. Besides properly modeling the low-rank structure of the subspace, how the pollution is handled is central to recovery performance. Often, the observed data is posed as a superposition of the clean data and a residual, where the residual can be roughly divided into two groups: small dense noise and gross sparse outliers. Compared with small noise, outliers are more likely to ruin the recovery, as they can be arbitrarily large. Considering the above, this paper designs a method for recovering the low-rank matrix with robust outlier estimation, termed ROUTE, in a unified manner. Theoretical analysis of convergence and optimality, and experimental results on both synthetic and real data, are provided to demonstrate the efficacy of the proposed method and show its superiority over other state-of-the-art methods.

• #2115
Sense Beauty by Label Distribution Learning
Yi Ren, Xin Geng
Structured Learning

Beauty has always been an attractive topic in human society: not only artists and psychologists but also scientists have been searching for an answer to the question of what is beautiful. This paper presents an approach to learning the human sense of facial beauty. Unlike previous studies, the human sense is represented by a label distribution, which covers the full range of beauty ratings and indicates the degree to which each beauty rating describes the face. The motivation is that the human sense of beauty is generally quite subjective, so it might be inappropriate to represent it with a single scalar, as most previous work does. Therefore, we propose a method called Beauty Distribution Transformation (BDT) to convert the k-wise ratings to label distributions, and a learning method called Structural Label Distribution Learning (SLDL), based on the structural Support Vector Machine, to learn the human sense of facial beauty.

• #2565
Efficient Inexact Proximal Gradient Algorithm for Nonconvex Problems
Quanming Yao, James T. Kwok, Fei Gao, Wei Chen, Tie-Yan Liu
Structured Learning

While the proximal gradient algorithm was originally designed for convex optimization, several variants have recently been proposed for nonconvex problems. Among them, nmAPG [Li and Lin, 2015] is the state of the art. However, it is inefficient when the proximal step does not have a closed-form solution, or when such a solution exists but is expensive, as it requires more than one proximal step to be solved exactly in each iteration. In this paper, we propose an efficient accelerated proximal gradient (niAPG) algorithm for nonconvex problems. In each iteration, it requires only one inexact (less expensive) proximal step. Convergence to a critical point is still guaranteed, and an O(1/k) convergence rate is derived. Experiments on image inpainting and matrix completion problems demonstrate that the proposed algorithm performs comparably to the state of the art, but is much faster.
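
For readers unfamiliar with the proximal step this abstract refers to, here is a minimal sketch of the plain, exact proximal gradient method on a convex toy problem. It is neither nmAPG nor niAPG. The objective, 0.5·||x − b||² + λ·||x||₁, is an assumption chosen because its proximal operator (soft-thresholding) has a closed form; the function names are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient(b, lam, step=0.5, iters=100):
    """Minimize 0.5 * ||x - b||^2 + lam * ||x||_1 by proximal gradient.

    Each iteration takes a gradient step on the smooth term and then
    applies the proximal operator of the nonsmooth term.
    """
    x = [0.0] * len(b)
    for _ in range(iters):
        grad = [xi - bi for xi, bi in zip(x, b)]  # gradient of the smooth part
        x = [soft_threshold(xi - step * gi, step * lam)
             for xi, gi in zip(x, grad)]
    return x


print(proximal_gradient([3.0, -0.5, 1.5], lam=1.0))
```

For this objective the minimizer is the soft-thresholding of `b` at `lam`, here `[2.0, 0.0, 0.5]`, so the iterates can be checked against a known answer. The setting the abstract addresses is the one where no such closed-form proximal step is available and each step must itself be solved approximately.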

### Wednesday 2315:00 - 16:00ROB-XXX - Robotics, Voting (210)

Chair: Daniel Harabor
• #1987
Integrating Answer Set Programming with Semantic Dictionaries for Robot Task Planning
Dongcai Lu, Yi Zhou, Feng Wu, Zhao Zhang, Xiaoping Chen
Robotics, Voting

In this paper, we propose a novel integrated task planning system for service robots in domestic domains. Given open-ended high-level user instructions in natural language, robots need to generate a plan, i.e., a sequence of low-level executable actions, to complete the required tasks. To address this, we exploit the knowledge on semantic roles of common verbs defined in semantic dictionaries such as FrameNet and integrate it with Answer Set Programming --- a task planning framework with both representation language and solvers. In the experiments, we evaluated our approach using common benchmarks on service tasks and showed that it can successfully handle many more tasks than the state-of-the-art solution. Notably, we deployed the proposed planning system on our service robot for the annual RoboCup@Home competitions and achieved very encouraging results.

• #2064
Dual Track Multimodal Automatic Learning through Human-Robot Interaction
Shuqiang Jiang, Weiqing Min, Xue Li, Huayang Wang, Jian Sun, Jiaqi Zhou
Robotics, Voting

Human beings are constantly improving their cognitive ability via automatic learning from interaction with the environment. Two important aspects of automatic learning are visual perception and knowledge acquisition. The fusion of these two aspects is vital for improving the intelligence and interaction performance of robots. Many automatic knowledge extraction and recognition methods have been widely studied. However, little work focuses on integrating automatic knowledge extraction and recognition into a unified framework to enable joint visual perception and knowledge acquisition. To solve this problem, we propose a Dual Track Multimodal Automatic Learning (DTMAL) system, which consists of two components: Hybrid Incremental Learning (HIL) from the vision track and Multimodal Knowledge Extraction (MKE) from the knowledge track. HIL can incrementally improve the recognition ability of the system by learning new object samples and new object concepts. MKE is capable of constructing and updating the multimodal knowledge items based on the recognized new objects from HIL and other knowledge by exploring the multimodal signals. The fusion of the two tracks is a process of mutual promotion, and the two jointly contribute to dual track learning. We have conducted experiments through human-machine interaction, and the experimental results validated the effectiveness of our proposed system.

• #3644
Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context
Rohan Paul, Andrei Barbu, Sue Felshin, Boris Katz, Nicholas Roy
Robotics, Voting

A robot’s ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. A probabilistic model estimates, from a natural language utterance, the objects, relations, and actions that the utterance refers to, the objectives for future robotic actions it implies, and generates a plan to execute those actions while updating a state representation to include newly acquired knowledge from the visual-linguistic context. Grounding a command necessitates a representation for past observations and interactions; however, maintaining the full context consisting of all possible observed objects, attributes, spatial relations, actions, etc., over time is intractable. Instead, our model, Temporal Grounding Graphs, maintains a learned state representation for a belief over factual groundings, those derived from natural-language interactions, and lazily infers new groundings from visual observations using the context implied by the utterance. This work significantly expands the range of language that a robot can understand by incorporating factual knowledge and observations of its workspace into its inference about the meaning and grounding of natural-language utterances.

• #2353
Voting by sequential elimination with few voters
Sylvain Bouveret, Yann Chevaleyre, François Durand, Jérôme Lang
Robotics, Voting

We define a new class of low-communication voting rules, tailored for contexts with few voters and possibly many candidates. These rules are defined by a predefined sequence of voters: at each stage, the designated voter eliminates a candidate, and the last remaining candidate wins. We study both deterministic (non-anonymous) variants, and randomized (and anonymous) versions of these rules. We focus on a subfamily of these rules defined by "non-interleaved" sequences. We first focus on the axiomatic properties of our rules. Then we focus on the identification of the non-interleaved sequence that gives the best approximation of the Borda score under the impartial culture. Finally, we apply our rules to randomly generated data. Our conclusion is that, in contexts where there are more candidates than voters, elimination-based rules allow for a very low communication complexity (and especially, avoid asking voters to rank alternatives), and yet can be good approximations of common voting rules, while enjoying a number of good properties.
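
The deterministic rule described in this abstract can be sketched as below. The sketch assumes sincere behavior, meaning each designated voter removes their least-preferred remaining candidate; the abstract only says the voter "eliminates a candidate". The function name and the data layout are hypothetical.

```python
def sequential_elimination(rankings, sequence):
    """Run a sequential-elimination rule.

    rankings: dict mapping voter -> list of candidates, most preferred first.
    sequence: list of voters, one per elimination step (length m - 1 for
              m candidates), so exactly one candidate survives.
    """
    remaining = set(next(iter(rankings.values())))
    if len(sequence) != len(remaining) - 1:
        raise ValueError("sequence must have one entry fewer than candidates")
    for voter in sequence:
        # Assumed sincere behavior: drop the voter's worst remaining candidate.
        worst = next(c for c in reversed(rankings[voter]) if c in remaining)
        remaining.remove(worst)
    (winner,) = remaining
    return winner


rankings = {
    "v1": ["a", "b", "c", "d"],
    "v2": ["d", "c", "b", "a"],
}
print(sequential_elimination(rankings, ["v1", "v2", "v1"]))
```

In this run v1 removes d, v2 removes a, and v1 removes c, leaving b as the winner. Each step communicates a single candidate, so voters never have to report a full ranking.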

### Wednesday 2315:00 - 16:00MAS-ABS - Agent-Based Simulation (211)

Chair: Hanna Kurniawati
• #2205
Enhancing Sustainability of Complex Epidemiological Models through a Generic Multilevel Agent-based Approach
Sébastien Picault, Yu-Lin Huang, Vianney Sicard, Pauline Ezanno
Agent-Based Simulation

The development of computational sciences has fostered major advances in life sciences, but also led to reproducibility and reliability issues, which become a crucial stake when simulations are aimed at assessing control measures, as in epidemiology. A broad use of software development methods is a useful remediation to reduce those problems, but preventive approaches, targeting not only implementation but also model design, are essential to sustainable enhancements. Among them, AI techniques, based on the separation between declarative and procedural concerns, and on knowledge engineering, offer promising solutions. Especially, multilevel multi-agent systems, deeply rooted in that culture, provide a generic way to integrate several epidemiological modeling paradigms within a homogeneous interface. We explain in this paper how this approach is used for building more generic, reliable and sustainable simulations, illustrated by real-case applications in cattle epidemiology.

• #2309
Factorized Asymptotic Bayesian Policy Search for POMDPs
Masaaki Imaizumi, Ryohei Fujimaki
Agent-Based Simulation

This paper proposes a novel direct policy search (DPS) method with model selection for partially observed Markov decision processes (POMDPs). DPSs have been standard for learning POMDPs due to their computational efficiency and natural ability to maximize total rewards. An important open challenge for the best use of DPS methods is model selection, i.e., determination of the proper dimensionality of hidden states and complexity of policy functions, to mitigate overfitting in highly-flexible model representations of POMDPs. This paper bridges Bayesian inference and reward maximization and derives marginalized weighted log-likelihood (MWL) for POMDPs, which takes advantage of both Bayesian model selection and DPS. Then we propose factorized asymptotic Bayesian policy search (FABPS) to explore the model and the policy which maximizes MWL by expanding recently-developed factorized asymptotic Bayesian inference. Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, with respect both to model selection and to expected total rewards.

• #2482
Interaction-based ontology alignment repair with expansion and relaxation
Jérôme Euzenat
Agent-Based Simulation

Agents may use ontology alignments to communicate when they represent knowledge with different ontologies: alignments help reclassify objects from one ontology to the other. These alignments may not be perfectly correct, yet agents have to proceed. They can take advantage of their experience in order to evolve alignments: upon communication failure, they will adapt the alignments to avoid reproducing the same mistake. Such repair experiments had been performed in the framework of networks of ontologies related by alignments. They revealed that, by playing simple interaction games, agents can effectively repair random networks of ontologies. Here we repeat these experiments and, using new measures, show that previous results were underestimated. We introduce new adaptation operators that improve those previously considered. We also allow agents to go beyond the initial operators in two ways: they can generate new correspondences when they discard incorrect ones, and they can provide less precise answers. The combination of these modalities satisfies the following properties: (1) Agents still converge to a state in which no mistake occurs. (2) They achieve results far closer to the correct alignments than previously found. (3) They again reach 100% precision and coherent alignments.

• #3826
Aggressive, Tense or Shy? Identifying Personality Traits from Crowd Videos
Aniket Bera, Tanmay Randhavane, Dinesh Manocha
Agent-Based Simulation

We present a real-time algorithm to automatically classify the behavior or personality of a pedestrian based on his or her movements in a crowd video. Our classification criterion is based on Personality Trait theory. We present a statistical scheme that dynamically learns the behavior of every pedestrian and computes its motion model. This model is combined with global crowd characteristics to compute the movement patterns and motion dynamics and use them for crowd prediction. Our learning scheme is general and we highlight its performance in identifying the personality of different pedestrians in low and high density crowd videos. We also evaluate the accuracy by comparing the results with a user study.

### Wednesday 2315:00 - 16:00CS-SAT - Satisfiability (212)

Chair: Jussi Rintanen
• #2254
A Recursive Shortcut for CEGAR: Application To The Modal Logic K Satisfiability Problem
Jean-Marie Lagniez, Daniel Le Berre, Tiago de Lima, Valentin Montmirail
Satisfiability

Counter-Example-Guided Abstraction Refinement (CEGAR) has been very successful in model checking large systems. Since then, it has been applied to many different problems. In particular, it proved to be a highly successful practical approach for solving the PSPACE-complete QBF problem. In this paper, we propose a new CEGAR-like approach for tackling PSPACE-complete problems that we call RECAR (Recursive Explore and Check Abstraction Refinement). We show that this generic approach is sound and complete. Then we propose a specific implementation of the RECAR approach to solve the modal logic K satisfiability problem. We implemented both a CEGAR and a RECAR approach for the modal logic K satisfiability problem within the solver MoSaiC. We compared experimentally those approaches to the state-of-the-art solvers for that problem. The RECAR approach outperforms the CEGAR one for that problem and also compares favorably against the state-of-the-art on the benchmarks considered.

• #2826
Intelligent Belief State Sampling for Conformant Planning
Alban Grastien, Enrico Scala
Satisfiability

We propose a new method for conformant planning based on two ideas. First, given a small sample of the initial belief state, we reduce conformant planning for this sample to a classical planning problem, giving us a candidate solution. Second, we exploit regression as a way to compactly represent necessary conditions for such a solution to be valid for the non-deterministic setting. If necessary, we use the resulting formula to extract a counter-example to populate our next sampling. Our experiments show that this approach is competitive on a class of problems that are hard for traditional planners, and also returns generally shorter plans. We are also able to demonstrate unsatisfiability of some problems.

• #1253
Generating Hard Random Boolean Formulas and Disjunctive Logic Programs
Giovanni Amendola, Francesco Ricca, Miroslaw Truszczynski
Satisfiability

We propose a model of random quantified boolean formulas and their natural random disjunctive logic program counterparts. The model extends the standard models for random SAT and 2QBF. We provide theoretical bounds for the phase transition region in the new model, and show experimentally the presence of the easy-hard-easy pattern. Importantly, we show that the model is well suited for assessing solvers tuned to real-world instances. Moreover, to the best of our knowledge, our model and results on random disjunctive logic programs are the first of their kind.
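
As background for the "standard models for random SAT" that this abstract extends, the fixed-clause-length random k-SAT model can be sketched as follows. This is not the paper's new model for quantified formulas or disjunctive logic programs. The function name is hypothetical, and literals use the DIMACS convention of signed integers.

```python
import random


def random_k_sat(num_vars, num_clauses, k=3, seed=None):
    """Generate a random k-CNF formula in the fixed-clause-length model.

    Each clause picks k distinct variables uniformly at random and negates
    each one independently with probability 1/2. A literal is a signed
    integer: v means variable v, -v means its negation.
    """
    rng = random.Random(seed)
    formula = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        formula.append(clause)
    return formula


# 20 variables at a clause-to-variable ratio of 4.25 gives 85 clauses.
formula = random_k_sat(num_vars=20, num_clauses=85, k=3, seed=0)
print(formula[:3])
```

In this model the clause-to-variable ratio is the control parameter for the easy-hard-easy pattern. For random 3-SAT the hardest instances are empirically found near a ratio of about 4.26.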

• #3123
Locality in Random SAT Instances
Jesús Giráldez-Cru, Jordi Levy
Satisfiability

Despite the success of CDCL SAT solvers in solving industrial problems, there are still many open questions to explain such success. In this context, the generation of random SAT instances having computational properties more similar to real-world problems becomes crucial. Such generators are possibly the best tool to analyze families of instances and solvers' behaviors on them. In this paper, we present a random SAT instance generator based on the notion of locality. We show that this is a decisive dimension of attractiveness among the variables of a formula, and how CDCL SAT solvers take advantage of it. To the best of our knowledge, this is the first random SAT model that generates both scale-free structure and community structure at once.

### Wednesday 2315:00 - 16:00NLP-DIS - Discourse (213)

Chair: Frank Dignum
• #3222
A Deep Neural Network for Chinese Zero Pronoun Resolution
Qingyu Yin, Weinan Zhang, Yu Zhang, Ting Liu
Discourse

Existing approaches for Chinese zero pronoun resolution overlook semantic information. This is because zero pronouns have no descriptive information, which results in difficulty in explicitly capturing their semantic similarities with antecedents. Moreover, when dealing with candidate antecedents, traditional systems simply take advantage of the local information of a single candidate antecedent while failing to consider the underlying information provided by the other candidates from a global perspective. To address these weaknesses, we propose a novel zero pronoun-specific neural network, which is capable of representing zero pronouns by utilizing the contextual information at the semantic level. In addition, when dealing with candidate antecedents, a two-level candidate encoder is employed to explicitly capture both the local and global information of candidate antecedents. We conduct experiments on the Chinese portion of the OntoNotes 5.0 corpus. Experimental results show that our approach substantially outperforms the state-of-the-art method in various experimental settings.

• #4141
Inferring Implicit Event Locations from Context with Distributional Similarities
Jin-Woo Chung, Wonsuk Yang, Jinseon You, Jong C. Park
Discourse

Automatic event location extraction from text plays a crucial role in many applications such as infectious disease surveillance and natural disaster monitoring. The fundamental limitation of previous work such as SpaceEval is the limited scope of extraction, targeting only locations that are explicitly stated in a syntactic structure. This leads to missing a lot of implicit information inferable from context in a document, which amounts to nearly 40% of the entire location information. To overcome this limitation for the first time, we present a system that infers the implicit event locations from a given document. Our system exploits distributional semantics, based on the hypothesis that if two events are described by similar expressions, it is likely that they occur in the same location. For example, if “A bomb exploded causing 30 victims” and “many people died from terrorist attack in Boston” are reported in the same document, it is highly likely that the bomb exploded in Boston. Our system shows good performance with a 0.58 F1-score, where state-of-the-art classifiers for intra-sentential spatiotemporal relations achieve around 0.60 F1-scores.

• #3348
SWIM: A Simple Word Interaction Model for Implicit Discourse Relation Recognition
Wenqiang Lei, Xuancong Wang, Meichun Liu, Ilija Ilievski, Xiangnan He, Min-Yen Kan
Discourse

Capturing the semantic interaction of pairs of words across arguments and proper argument representation are both crucial issues in implicit discourse relation recognition. The current state-of-the-art represents arguments as distributional vectors that are computed via bi-directional Long Short-Term Memory networks (BiLSTMs), known to have significant model complexity. In contrast, we demonstrate that word-weighted averaging can encode argument representation which can incorporate word pair information efficiently. By saving an order of magnitude in parameters, our proposed model achieves equivalent performance, but trains seven times faster.

• #1658
Tosca: Operationalizing Commitments Over Information Protocols
Thomas C. King, Akın Günay, Amit K. Chopra, Munindar P. Singh
Discourse

The notion of commitment is widely studied as a high-level abstraction for modeling multiagent interaction. An important challenge is supporting flexible decentralized enactments of commitment specifications. In this paper, we combine recent advances on specifying commitments and information protocols. Specifically, we contribute Tosca, a technique for automatically synthesizing information protocols from commitment specifications. Our main result is that the synthesized protocols support commitment alignment, which is the idea that agents must make compatible inferences about their commitments despite decentralization.

### Wednesday 2315:00 - 16:00PL-APR - Activity and Plan Recognition (216)

Chair: Noa Agmon
• #1934
New Metrics and Algorithms for Stochastic Goal Recognition Design Problems
Christabel Wayllace, Ping Hou, William Yeoh
Activity and Plan Recognition

Goal Recognition Design (GRD) problems involve identifying the best ways to modify the underlying environment that agents operate in, typically by making a subset of feasible actions infeasible, in such a way that agents are forced to reveal their goals as early as possible. The Stochastic GRD (S-GRD) model is an important extension that introduced stochasticity to the outcome of agent actions. Unfortunately, the worst-case distinctiveness (wcd) metric proposed for S-GRDs has a formal definition that is inconsistent with its intuitive definition, which is the maximal number of actions an agent can take, in the expectation, before its goal is revealed. In this paper, we make the following contributions: (1) We propose a new wcd metric, called all-goals wcd (wcd_ag), that remedies this inconsistency; (2) We introduce a new metric, called expected-case distinctiveness (ecd), that weighs the possible goals based on their importance; (3) We provide theoretical results comparing these different metrics as well as the complexity of computing them optimally; and (4) We describe new efficient algorithms to compute the wcd_ag and ecd values.

• #2709
Deceptive Path-Planning
Peta Masters, Sebastian Sardina
Activity and Plan Recognition

Deceptive path-planning involves finding a path such that the probability of an observer identifying the path's final destination - before it has been reached - is minimised. This paper formalises deception as it applies to path-planning and introduces the notion of a last deceptive point (LDP) which, when measured in terms of 'path completion', can be used to rank paths by their potential to deceive. Building on recent developments in probabilistic goal-recognition, we propose a formula to calculate an optimal LDP and present strategies for the generation of deceptive paths by both simulation ('showing the false') and dissimulation ('hiding the real').

• #3561
Heuristic Online Goal Recognition in Continuous Domains
Mor Vered, Gal A. Kaminka
Activity and Plan Recognition

Goal recognition is the problem of inferring the goal of an agent, based on its observed actions. An inspiring approach, plan recognition by planning (PRP), uses off-the-shelf planners to dynamically generate plans for given goals, eliminating the need for the traditional plan library. However, the existing PRP formulation is inherently inefficient in online recognition, and cannot be used with motion planners for continuous spaces. In this paper, we utilize a different PRP formulation which allows for online goal recognition, and for application in continuous spaces. We present an online recognition algorithm, where two heuristic decision points may be used to improve run-time significantly over existing work. We specify heuristics for continuous domains, prove guarantees on their use, and empirically evaluate the algorithm over hundreds of experiments in both a 3D navigational environment and a cooperative robotic team task.

• #4136
Bridging the Gap between Observation and Decision Making: Goal Recognition and Flexible Resource Allocation in Dynamic Network Interdiction
Kai Xu, Kaiming Xiao, Quanjun Yin, Yabing Zha, Cheng Zhu
Activity and Plan Recognition

Goal recognition, which is the task of inferring an agent’s goals given some or all of the agent’s observed actions, is one of the important approaches to bridging the gap between observation and decision making within an observe-orient-decide-act cycle. Unfortunately, little research focuses on how to improve the utilization of the knowledge produced by a goal recognition system. In this work, we propose a Markov Decision Process-based goal recognition approach tailored to a dynamic shortest-path local network interdiction (DSPLNI) problem. We first introduce a novel DSPLNI model and its solvable dual form so as to incorporate real-time knowledge acquired from the goal recognition system. Then a Markov Decision Process-based goal recognition model, along with its dynamic Bayesian network representation and the applied goal inference method, is proposed to identify the evader’s real goal within the DSPLNI context. Based on that, we further propose an efficient, scalable technique for maintaining the action utility map used in fast goal inference, and develop a flexible resource assignment mechanism in DSPLNI using knowledge from the goal recognition system. Experimental results show the effectiveness and accuracy of our methods in both goal recognition and dynamic network interdiction.

### Wednesday 23, 15:00 - 16:00 KR-CCR - Computational Complexity of Reasoning (217)

Chair: Georg Gottlob
• #3102
On the Kernelization of Global Constraints
Clément Carbonnel, Emmanuel Hebrard
Computational Complexity of Reasoning

Kernelization is a powerful concept from parameterized complexity theory that captures (a certain idea of) efficient polynomial-time preprocessing for hard decision problems. However, exploiting this technique in the context of constraint programming is challenging. Building on recent results for the VertexCover constraint, we introduce novel "loss-less" kernelization variants that are tailored for constraint propagation. We showcase the theoretical interest of our ideas on two constraints, VertexCover and EdgeDominatingSet.
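
To make the idea of polynomial-time preprocessing concrete, here is a minimal sketch of the classic Buss kernelization rule for Vertex Cover with budget `k`. It is the textbook rule, not the "loss-less" variants proposed in the paper, and the function name `buss_kernel` and its return convention are assumptions made for illustration. The sketch assumes a simple undirected graph without self-loops.

```python
def buss_kernel(edges, k):
    """Classic Buss kernelization for Vertex Cover with budget k.

    Returns (forced_vertices, remaining_edges, remaining_budget),
    or None when no vertex cover of size <= k can exist.
    """
    edges = {frozenset(e) for e in edges}
    forced = set()
    changed = True
    while changed and k >= 0:
        changed = False
        # Recompute vertex degrees on the current (reduced) graph.
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        for v, d in degree.items():
            if d > k:
                # v has more than k incident edges, so every cover of
                # size <= k must contain v: take it and shrink the budget.
                forced.add(v)
                edges = {e for e in edges if v not in e}
                k -= 1
                changed = True
                break
    # Once every degree is <= k, k vertices can cover at most k*k edges.
    if k < 0 or len(edges) > k * k:
        return None
    return forced, edges, k


# A star centred on vertex 0 plus one separate edge, with budget 2.
print(buss_kernel([(0, 1), (0, 2), (0, 3), (0, 4), (5, 6)], 2))
```

The reduced instance has at most `k*k` edges, which is what makes it a kernel: its size depends only on the parameter, not on the original graph.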

• #1669
On the Complexity of Enumerating the Extensions of Abstract Argumentation Frameworks
Markus Kröll, Reinhard Pichler, Stefan Woltran
Computational Complexity of Reasoning

Several computational problems of abstract argumentation frameworks (AFs) such as skeptical and credulous reasoning, existence of a non-empty extension, verification, etc. have been thoroughly analyzed for various semantics. In contrast, the enumeration problem of AFs (i.e., the problem of computing all extensions according to some semantics) has been left unexplored so far. The goal of this paper is to fill this gap. We thus investigate the enumeration complexity of AFs for a large collection of semantics and, in addition, consider the most common structural restrictions on AFs.
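
As an illustration of what the enumeration problem asks for, the sketch below lists all stable extensions of a small AF by brute force. It uses the standard definition (a conflict-free set that attacks every argument outside it) and takes exponential time; it is not an algorithm from the paper. The name `stable_extensions` and the encoding of attacks as pairs are assumptions.

```python
from itertools import combinations


def stable_extensions(arguments, attacks):
    """Enumerate stable extensions by checking every subset of arguments."""
    arguments = list(arguments)
    attacks = set(attacks)
    result = []
    for size in range(len(arguments) + 1):
        for subset in combinations(arguments, size):
            s = set(subset)
            # Conflict-free: no member of s attacks another member of s.
            conflict_free = not any((a, b) in attacks for a in s for b in s)
            # Stable: every argument outside s is attacked by some member.
            attacks_rest = all(
                any((a, b) in attacks for a in s)
                for b in arguments
                if b not in s
            )
            if conflict_free and attacks_rest:
                result.append(s)
    return result


# a and b attack each other, and b attacks c.
print(stable_extensions("abc", [("a", "b"), ("b", "a"), ("b", "c")]))
# A three-cycle has no stable extension.
print(stable_extensions("abc", [("a", "b"), ("b", "c"), ("c", "a")]))
```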

• #1829
A General Notion of Equivalence for Abstract Argumentation
Ringo Baumann, Wolfgang Dvořák, Thomas Linsbichler, Stefan Woltran
Computational Complexity of Reasoning

We introduce a parametrized equivalence notion for abstract argumentation that subsumes standard and strong equivalence as corner cases. Under this notion, two argumentation frameworks are equivalent if they deliver the same extensions under any addition of arguments and attacks that do not affect a given set of core arguments. As we will see, this notion of equivalence nicely captures the concept of local simplifications. We provide exact characterizations and complexity results for deciding our new notion of equivalence.

• #3620
On the Computational Complexity of Gossip Protocols
Krzysztof R. Apt, Eryk Kopczyński, Dominik Wojtczak
Computational Complexity of Reasoning

Gossip protocols deal with a group of communicating agents, each holding private information, and aim at arriving at a situation in which all the agents know each other's secrets. Distributed epistemic gossip protocols are particularly simple distributed programs that use formulas from an epistemic logic. Recently, the implementability of these distributed protocols was established (which means that the evaluation of these formulas is decidable), and the problems of their partial correctness and termination were shown to be decidable, but their exact computational complexity was left open. We show that for any monotonic type of calls the implementability of a distributed epistemic gossip protocol is a P^{NP}_{||}-complete problem, while the problems of its partial correctness and termination are in coNP^{NP}.
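
The basic setting can be sketched as a small simulation. The code below assumes one particular call type in which both parties learn everything the other knows, and it replays a fixed call sequence rather than an epistemic protocol; the names `run_calls` and `all_experts` are hypothetical.

```python
def run_calls(n, calls):
    """Replay a sequence of calls among n agents.

    Agent i starts out knowing only secret i. In this sketch a call
    (caller, callee) leaves both agents knowing the union of their secrets.
    """
    known = [{i} for i in range(n)]
    for caller, callee in calls:
        merged = known[caller] | known[callee]
        known[caller] = set(merged)
        known[callee] = set(merged)
    return known


def all_experts(known):
    """True when every agent knows every secret."""
    n = len(known)
    return all(len(secrets) == n for secrets in known)


calls = [(0, 1), (2, 3), (0, 2), (1, 3)]
print(all_experts(run_calls(4, calls)))       # all four agents are experts
print(all_experts(run_calls(4, calls[:3])))   # agents 1 and 3 are not yet
```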

### Wednesday 23, 15:00 - 16:00 MT-KBSE - Knowledge-Based Software Engineering (218)

Chair: Takahira Yamaguchi
• #3019
Leveraging Human Knowledge in Tabular Reinforcement Learning: A Study of Human Subjects
Ariel Rosenfeld, Matthew E. Taylor, Sarit Kraus
Knowledge-Based Software Engineering

Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort on the human designer's part. To date, human factors are generally not considered in the development and evaluation of possible approaches. In this paper, we propose and evaluate a novel method, based on human psychology literature, which we show to be both effective and efficient, for both expert and non-expert designers, in injecting human knowledge for speeding up tabular RL.

• #3230
Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code
Huihui Wei, Ming Li
Knowledge-Based Software Engineering

Software clone detection, which aims at identifying code fragments with similar functionalities, has played an important role in software maintenance and evolution. Many clone detection approaches have been proposed. However, most of them represent source code with hand-crafted features using lexical or syntactical information, or with unsupervised deep features, which makes it difficult to detect functional clone pairs, i.e., pieces of code with similar functionality that differ at both the syntactical and lexical levels. In this paper, we address the software functional clone detection problem by learning supervised deep features. We formulate clone detection as a supervised learning-to-hash problem and propose an end-to-end deep feature learning framework called CDLH for functional clone detection. This framework learns hash codes by exploiting the lexical and syntactical information for fast computation of functional similarity between code fragments. Experiments on software clone detection benchmarks indicate that the CDLH approach is effective and outperforms the state-of-the-art approaches in software functional clone detection.

• #3618
Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code
Xuan Huo, Ming Li
Knowledge-Based Software Engineering

Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source files according to a bug report remains a great challenge in software maintenance. Many previous approaches represent bug reports and source code using lexical and structural information and correlate their relevance by measuring their similarity, and recently a CNN-based model was proposed to learn unified features for bug localization, which overcomes the difficulty in modeling natural and programming languages with different structural semantics. However, previous studies fail to capture the sequential nature of source code, which carries additional semantics beyond the lexical and structural terms; such information is vital in modeling program functionalities and behaviors. In this paper, we propose a novel model, LS-CNN, which enhances the unified features by exploiting the sequential nature of source code. LS-CNN combines CNN and LSTM to extract semantic features for automatically identifying potential buggy source code according to a bug report. Experimental results on widely-used software projects indicate that LS-CNN significantly outperforms the state-of-the-art methods in locating buggy files.

• #3884
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim
Knowledge-Based Software Engineering

Computer programs written in one language are often required to be ported to other languages to support multiple devices and environments. When programs use language specific APIs (Application Programming Interfaces), it is very challenging to migrate these APIs to the corresponding APIs written in other languages. Existing approaches mine API mappings from projects that have corresponding versions in two languages. They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings. In this paper, we propose an intelligent system called DeepAM for automatically mining API mappings from a large-scale code corpus without bilingual projects. The key component of DeepAM is based on the multi-modal sequence to sequence learning architecture that aims to learn joint semantic representations of bilingual API sequences from big source code data. Experimental results indicate that DeepAM significantly increases the accuracy of API mappings as well as the number of API mappings when compared with the state-of-the-art approaches.

### Wednesday 23, 15:00 - 16:00 NLP-SATM - Sentiment Analysis and Text Mining (219)

Chair: Rafal Rzepka
• #1561
Opinion-aware Knowledge Graph for Political Ideology Detection
Wei Chen, Xiao Zhang, Tengjiao Wang, Bishan Yang, Yi Li
Sentiment Analysis and Text Mining

Identifying an individual's political ideology from their speeches and written texts is important for analyzing political opinions and user behavior on social media. Traditional opinion mining methods rely on bag-of-words representations to classify texts into different ideology categories. Such methods are too coarse for understanding political ideologies. The key to identifying different ideologies is to recognize different opinions expressed toward a specific topic. To model this insight, we classify ideologies based on the distribution of opinions expressed towards real-world entities or topics. Specifically, we propose a novel approach to political ideology detection that makes predictions based on an opinion-aware knowledge graph. We show how to construct such a graph by integrating the opinions and targeted entities extracted from text into an existing structured knowledge base, and show how to perform ideology inference by information propagation on the graph. Experimental results demonstrate that our method achieves high accuracy in detecting ideologies compared to baselines including LR, SVM and RNN.

• #2376
End-to-End Adversarial Memory Network for Cross-domain Sentiment Classification
Zheng Li, Yu Zhang, Ying Wei, Yuxiang Wu, Qiang Yang
Sentiment Analysis and Text Mining

Domain adaptation tasks such as cross-domain sentiment classification have raised much attention in recent years. Due to the domain discrepancy, a sentiment classifier trained in a source domain may not work well when directly applied to a target domain. Traditional methods need to manually select pivots, which behave in the same way for discriminative learning in both domains. Recently, deep learning methods have been proposed to learn a representation shared by domains. However, they lack the interpretability to directly identify the pivots. To address the problem, we introduce an end-to-end Adversarial Memory Network (AMN) for cross-domain sentiment classification. Unlike existing methods, our approach can automatically capture the pivots using an attention mechanism. Our framework consists of two parameter-shared memory networks: one is for sentiment classification and the other is for domain classification. The two networks are jointly trained so that the selected features minimize the sentiment classification error and at the same time make the domain classifier indiscriminative between the representations from the source or target domains. Moreover, unlike deep learning methods that cannot tell us which words are the pivots, our approach can offer a direct visualization of them. Experiments on the Amazon review dataset demonstrate that our approach can significantly outperform state-of-the-art methods.

• #2608
Stance Classification with Target-specific Neural Attention
Jiachen Du, Ruifeng Xu, Yulan He, Lin Gui
Sentiment Analysis and Text Mining

Stance classification, which aims at detecting the stance expressed in text towards a specific target, is an emerging problem in sentiment analysis. A major difference between stance classification and traditional aspect-level sentiment classification is that the identification of stance depends on a target that might not be explicitly mentioned in the text. This indicates that, apart from text content, the target information is important to stance detection. To this end, we propose a neural network-based model, which incorporates target-specific information into stance classification through a novel attention mechanism. Specifically, the attention mechanism is expected to locate the critical parts of the text that are related to the target. Our evaluations on both the English and Chinese Stance Detection datasets show that the proposed model achieves state-of-the-art performance.

• #3880
Interactive Attention Networks for Aspect-Level Sentiment Classification
Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng Wang
Sentiment Analysis and Text Mining

Aspect-level sentiment classification aims at identifying the sentiment polarity of a specific target in its context. Previous approaches have realized the importance of targets in sentiment classification and developed various methods with the goal of precisely modeling their contexts via generating target-specific representations. However, these studies always ignore the separate modeling of targets. In this paper, we argue that both targets and contexts deserve special treatment and need their own representations, learned via interactive learning. We then propose the interactive attention networks (IAN) to interactively learn attentions in the contexts and targets, and generate the representations for targets and contexts separately. With this design, the IAN model can well represent a target and its collocative context, which is helpful to sentiment classification. Experimental results on SemEval 2014 Datasets demonstrate the effectiveness of our model.

### Wednesday 23, 15:00 - 16:00 NLP-MT - Machine Translation (220)

Chair: Shujian Huang
• #1749
ME-MD: An Effective Framework for Neural Machine Translation with Multiple Encoders and Decoders
Jinchao Zhang, Qun Liu, Jie Zhou
Machine Translation

The encoder-decoder neural framework is widely employed for Neural Machine Translation (NMT) with a single encoder to represent the source sentence and a single decoder to generate target words. The translation performance heavily relies on the representation ability of the encoder and the generation ability of the decoder. To further enhance NMT, we propose to extend the original encoder-decoder framework to a novel one, which has multiple encoders and decoders (ME-MD). In this way, multiple encoders extract more diverse features to represent the source sequence and multiple decoders capture more complicated translation knowledge. Our proposed ME-MD framework makes it convenient to integrate heterogeneous encoders and decoders with multiple depths and multiple types. Experiments on a Chinese-English translation task show that our ME-MD system surpasses the state-of-the-art NMT system by 2.1 BLEU points and surpasses the phrase-based Moses by 7.38 BLEU points. Our framework is general and can be applied to other sequence to sequence tasks.

• #2011
Joint Training for Pivot-based Neural Machine Translation
Yong Cheng, Qian Yang, Yang Liu, Maosong Sun, Wei Xu
Machine Translation

While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.

• #2573
Improved Neural Machine Translation with Source Syntax
Shuangzhi Wu, Ming Zhou, Dongdong Zhang
Machine Translation

Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently achieved state-of-the-art performance. Researchers have shown that extending word level attention to phrase level attention by incorporating source-side phrase structure can enhance the attention model and achieve promising improvement. However, word dependencies that can be crucial to correctly understanding a source sentence are not always consecutive (i.e., captured by phrase structure); sometimes they span long distances. Phrase structures are not the best way to explicitly model long distance dependencies. In this paper we propose a simple but effective method to incorporate source-side long distance dependencies into NMT. Our method, based on dependency trees, enriches each source state with global dependency structures, which can better capture the inherent syntactic structure of source sentences. Experiments on Chinese-English and English-Japanese translation tasks show that our proposed method outperforms state-of-the-art SMT and NMT baselines.

• #2989
Maximum Expected Likelihood Estimation for Zero-resource Neural Machine Translation
Hao Zheng, Yong Cheng, Yang Liu
Machine Translation

While neural machine translation (NMT) has made remarkable progress in translating a handful of high-resource language pairs recently, parallel corpora are not always available for many zero-resource language pairs. To deal with this problem, we propose an approach to zero-resource NMT via maximum expected likelihood estimation. The basic idea is to maximize the expectation with respect to a pivot-to-source translation model for the intended source-to-target model on a pivot-target parallel corpus. To approximate the expectation, we propose two methods to connect the pivot-to-source and source-to-target models. Experiments on two zero-resource language pairs show that the proposed approach yields substantial gains over baseline methods. We also observe that when trained jointly with the source-to-target model, the pivot-to-source translation model also obtains improvements over independent training.

### Wednesday 23, 15:00 - 16:00 Competition (206)

Chair: Reyhan Aydogan
• ANAC
Competition

### Wednesday 23, 16:30 - 18:00 AUT-TEC - AI & Autonomy: Technical issues (Plenary 2)

Chair: Maria Gini
• #2518
Online Decision-Making for Scalable Autonomous Systems
Kyle Hollins Wray, Stefan J. Witwicki, Shlomo Zilberstein
AI & Autonomy: Technical issues

We present a general formal model called MODIA that can tackle a central challenge for autonomous vehicles (AVs), namely the ability to interact with an unspecified, large number of world entities. In MODIA, a collection of possible decision-problems (DPs), known a priori, are instantiated online and executed as decision-components (DCs), unknown a priori. To combine the individual action recommendations of the DCs into a single action, we propose the lexicographic executor action function (LEAF) mechanism. We analyze the complexity of MODIA and establish LEAF’s relation to regret minimization. Finally, we implement MODIA and LEAF using collections of partially observable Markov decision process (POMDP) DPs, and use them for complex AV intersection decision-making. We evaluate the approach in six scenarios within an industry-standard vehicle simulator, and present its use on an AV prototype.

• #2932
Reinforcement Learning with a Corrupted Reward Channel
Tom Everitt, Victoria Krakovna, Laurent Orseau, Shane Legg
AI & Autonomy: Technical issues

No real-world reward function is perfect. Sensory errors and software bugs may result in agents getting higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.
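
The randomisation idea can be illustrated with a toy example. The sketch below assumes one simple reading of it: rather than always taking the state with the highest observed reward, the agent picks uniformly among all states whose observed reward reaches a threshold `delta`. The states, reward values, and function names are made up for illustration and are not taken from the paper.

```python
import random


def quantilise(observed_reward, delta, rng=random):
    """Pick uniformly among states whose observed reward is at least delta."""
    candidates = sorted(s for s, r in observed_reward.items() if r >= delta)
    if not candidates:
        raise ValueError("no state reaches the threshold")
    return rng.choice(candidates)


def expected_true_reward(observed_reward, true_reward, delta):
    """Average true reward over the states the randomised agent may pick."""
    candidates = [s for s, r in observed_reward.items() if r >= delta]
    return sum(true_reward[s] for s in candidates) / len(candidates)


# Hypothetical example: s2 looks best, but its observed reward is corrupt.
observed = {"s0": 0.9, "s1": 0.8, "s2": 1.0, "s3": 0.2}
true = {"s0": 0.9, "s1": 0.8, "s2": 0.0, "s3": 0.2}

greedy_state = max(observed, key=observed.get)
print(greedy_state, true[greedy_state])            # greedy agent lands on s2
print(expected_true_reward(observed, true, 0.8))   # randomised agent, on average
print(quantilise(observed, 0.8, random.Random(0)))
```

In this toy case the greedy agent always ends up in the corrupt state, while the randomised agent only does so a third of the time, which is the sense in which randomisation blunts the optimisation.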

• #3130
Achieving Coordination in Multi-Agent Systems by Stable Local Conventions under Community Networks
Shuyue Hu, Ho-fung Leung
AI & Autonomy: Technical issues

Recently, the study of social conventions has attracted much attention in the literature. We notice that a type of interesting phenomena, local convention phenomena, may also exist in certain multi-agent systems. When agents are partitioned into compact communities, different local conventions emerge in different communities. In this paper, we provide a definition for local conventions, and propose two metrics measuring their strength and diversity. In our experimental study, we show that agents can achieve coordination via establishing diverse stable local conventions, which indicates a practical way to solve coordination problems other than the traditional global convention emergence. Moreover, we find that with smaller community sizes, denser connections and fewer available actions, diverse local conventions emerge in shorter time.

• #3783
A Goal Reasoning Agent for Controlling UAVs in Beyond-Visual-Range Air Combat
Michael W. Floyd, Justin Karneeb, Philip Moore, David W. Aha
AI & Autonomy: Technical issues

We describe the Tactical Battle Manager (TBM), an intelligent agent that uses several integrated artificial intelligence techniques to control an autonomous unmanned aerial vehicle in simulated beyond-visual-range (BVR) air combat scenarios. The TBM incorporates goal reasoning, automated planning, opponent behavior recognition, state prediction, and discrepancy detection to operate in a real-time, dynamic, uncertain, and adversarial environment. We describe evidence from our empirical study that the TBM significantly outperforms an expert-scripted agent in BVR scenarios. We also report the results of an ablation study which indicates that all components of our agent architecture are needed to maximize mission performance.

### Wednesday 23, 16:30 - 18:00 ML-CL5 - Classification 5 (204)

Chair: Tongliang Liu
• #1231
Exclusivity Regularized Machine: A New Ensemble SVM Classifier
Xiaojie Guo, Xiaobo Wang, Haibin Ling
Classification 5

The diversity of base learners is of utmost importance to a good ensemble. This paper defines a novel measurement of diversity, termed exclusivity. With the designed exclusivity, we further propose an ensemble SVM classifier, namely Exclusivity Regularized Machine (ExRM), to jointly suppress the training error of the ensemble and enhance the diversity between bases. Moreover, an Augmented Lagrange Multiplier based algorithm is customized to effectively and efficiently seek the optimal solution of ExRM. Theoretical analysis on convergence, global optimality and linear complexity of the proposed algorithm, as well as experiments, are provided to reveal the efficacy of our method and show its superiority over state-of-the-art methods in terms of accuracy and efficiency.

• #1312
Vertex-Weighted Hypergraph Learning for Multi-View Object Classification
Lifan Su, Yue Gao, Xibin Zhao, Hai Wan, Ming Gu, Jiaguang Sun
Classification 5

3D object classification with multi-view representation has become very popular, thanks to the progress in computer techniques and graphics hardware, and has attracted much research attention in recent years. Regarding this task, there are mainly two challenging issues, i.e., the complex correlation among multiple views and the possible imbalanced data issue. In this work, we propose to employ the hypergraph structure to formulate the relationship among 3D objects, taking advantage of the hypergraph's ability to model high-order correlation. However, traditional hypergraph learning methods may suffer from the imbalanced data issue. To this end, we propose a vertex-weighted hypergraph learning algorithm for multi-view 3D object classification, introducing an updated hypergraph structure. In our method, the correlation among different objects is formulated in a hypergraph structure and each object (vertex) is associated with a corresponding weight, weighting the importance of each sample in the learning process. The learning process is conducted on the vertex-weighted hypergraph and the estimated object relevance is employed for object classification. The proposed method has been evaluated on two public benchmarks, i.e., the NTU and the PSB datasets. Experimental results and comparison with the state-of-the-art methods and a recent deep learning method demonstrate the effectiveness of our proposed method.

• #2077
Improving the Generalization Performance of Multi-class SVM via Angular Regularization
Jianxin Li, Haoyi Zhou, Pengtao Xie, Yingchun Zhang
Classification 5

In multi-class support vector machine (MSVM) for classification, one core issue is to regularize the coefficient vectors to reduce overfitting. Various regularizers have been proposed such as L2, L1, and trace norm. In this paper, we introduce a new type of regularization approach -- angular regularization, that encourages the coefficient vectors to have larger angles such that class regions can be widened to flexibly accommodate unseen samples. We propose a novel angular regularizer based on the singular values of the coefficient matrix, where the uniformity of singular values reduces the correlation among different classes and drives the angles between coefficient vectors to increase. In generalization error analysis, we show that decreasing this regularizer effectively reduces the generalization error bound. On various datasets, we demonstrate the efficacy of the regularizer in reducing overfitting.

• #2101
Ordinal Zero-Shot Learning
Zengwei Huo, Xin Geng
Classification 5

Zero-shot learning predicts a new class even if no training data is available for that class. The solution to conventional zero-shot learning usually depends on side information such as attributes or text corpora. But such side information is not easy to obtain or use. Fortunately, in many classification tasks, the class labels are ordered, and therefore closely related to each other. This paper deals with zero-shot learning for ordinal classification. The key idea is using label relevance to expand supervision information from seen labels to unseen labels. The proposed method SIDL generates a supervision intensity distribution (SID) that contains each label's supervision intensity, and then learns a mapping from instance to SID. Experiments on two typical ordinal classification problems, i.e., head pose estimation and age estimation, show that SIDL performs significantly better than the compared regression methods. Furthermore, SIDL appears much more robust against the increase of unseen labels than other compared baselines.

• #3935
Distributed Accelerated Proximal Coordinate Gradient Methods
Yong Ren, Jun Zhu
Classification 5

We develop a general accelerated proximal coordinate descent algorithm in distributed settings (Dis-APCG) for the optimization problem that minimizes the sum of two convex functions: the first part f is smooth with a gradient oracle, and the other one Ψ is separable with respect to blocks of coordinates and has a simple known structure (e.g., L1 norm). Our algorithm achieves a new accelerated convergence rate in the case that f is strongly convex by making use of modern parallel structures, and includes the previous non-strongly convex case as a special case. We further present efficient implementations to avoid full-dimensional operations in each step, significantly reducing the computation cost. Experiments on the regularized empirical risk minimization problem demonstrate the effectiveness of our algorithm and match our theoretical findings.
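
For readers unfamiliar with the problem class, the sketch below shows plain proximal coordinate descent on a lasso objective, where f(x) = ½‖Ax − b‖² and Ψ(x) = λ‖x‖₁. It is single-machine and non-accelerated, so it is not Dis-APCG; the choice of lasso, the cyclic sweep order, and all names are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def prox_coordinate_descent(A, b, lam, sweeps=100):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 one coordinate at a time."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    residual = [-bi for bi in b]  # residual = A x - b, starting from x = 0
    # Coordinate-wise Lipschitz constants: squared norm of each column of A.
    lipschitz = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            if lipschitz[j] == 0.0:
                continue  # an all-zero column never affects the smooth part
            # Partial derivative of the smooth part with respect to x[j].
            grad_j = sum(A[i][j] * residual[i] for i in range(m))
            # Gradient step on coordinate j, followed by the L1 prox.
            new_xj = soft_threshold(
                x[j] - grad_j / lipschitz[j], lam / lipschitz[j]
            )
            step = new_xj - x[j]
            if step != 0.0:
                # Update the residual incrementally instead of recomputing A x.
                for i in range(m):
                    residual[i] += A[i][j] * step
                x[j] = new_xj
    return x


# With A equal to the identity, the minimiser is the soft-thresholding of b.
print(prox_coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], 1.0))
```

Because Ψ is separable, each update touches a single coordinate and only needs the scalar prox, which is what makes coordinate-wise methods attractive for this problem class.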

• #3962
Open Category Classification by Adversarial Sample Generation
Yang Yu, Wei-Yang Qu, Nan Li, Zimin Guo
Classification 5

In real-world classification tasks, it is difficult to collect training samples from all possible categories of the environment. Therefore, when an instance of an unseen class appears in the prediction stage, a robust classifier should be able to tell that it is from an unseen class, instead of classifying it as a known category. In this paper, adopting the idea of adversarial learning, we propose the ASG framework for open-category classification. ASG generates positive and negative samples of seen categories in an unsupervised manner via an adversarial learning strategy. With the generated samples, ASG then learns to tell seen from unseen in a supervised manner. Experiments performed on several datasets show the effectiveness of ASG.

### Wednesday 23, 16:30 - 18:00 ML-DLV1 - Deep Learning and Vision 1 (210)

Chair: David Hogg
• #1490
Fashion Style Generator
Shuhui Jiang, Yun Fu
Deep Learning and Vision 1

In this paper, we focus on a new problem: applying artificial intelligence to automatically generate fashion style images. Given a basic clothing image and a fashion style image (e.g., leopard print), we generate a clothing image with that style in real time with a neural fashion style generator. Fashion style generation is related to recent artistic style transfer works, but has its own challenges. The synthetic image should preserve the design of the basic clothing while blending the new style pattern onto the clothing. Neither existing global nor patch-based neural style transfer methods solve these challenges well. In this paper, we propose an end-to-end feed-forward neural network which consists of a fashion style generator and a discriminator. The global and patch-based style and content losses calculated by the discriminator are alternately back-propagated through the generator network to optimize it. The global optimization stage preserves the clothing form and design and the local optimization stage preserves the detailed style pattern. Extensive experiments show that our method outperforms state-of-the-art methods.

• #1170
EigenNet: Towards Fast and Structural Learning of Deep Neural Networks
Ping Luo
Deep Learning and Vision 1

Deep neural networks (DNNs) are difficult to train and prone to overfitting. We address these two issues by introducing EigenNet, an architecture that not only accelerates training but also adjusts the number of hidden neurons to reduce overfitting. Both are achieved by whitening the information flows of DNNs and removing those eigenvectors that may capture noise. The former improves conditioning of the Fisher information matrix, whilst the latter increases generalization capability. These appealing properties of EigenNet can benefit many recent DNN structures, such as network in network and inception, by wrapping their hidden layers into the layers of EigenNet. The modeling capacities of the original networks are preserved. Both the training wall-clock time and number of updates are reduced by using EigenNet, compared to stochastic gradient descent on various datasets, including MNIST, CIFAR-10, and CIFAR-100.

• #2096
DeepFacade: A Deep Learning Approach to Facade Parsing
Hantang Liu, Jialiang Zhang, Jianke Zhu, Steven C. H. Hoi
Deep Learning and Vision 1

The parsing of building facades is a key component of 3D street scene reconstruction, a long-standing goal in computer vision. In this paper, we propose a deep learning based method for segmenting a facade into semantic categories. Man-made structures often exhibit symmetry. Based on this observation, we propose a symmetric regularizer for training the neural network. Our proposed method can make use of both the power of deep neural networks and the structure of man-made architectures. We also propose a method to refine the segmentation results using bounding boxes generated by the Region Proposal Network. We test our method by training an FCN-8s network with the novel loss function. Experimental results show that our method significantly outperforms previous state-of-the-art methods on both the ECP dataset and the eTRIMS dataset. As far as we know, we are the first to employ an end-to-end deep convolutional neural network at full image scale for building facade parsing.

• #2196
Training Group Orthogonal Neural Networks with Privileged Information
Yunpeng Chen, Xiaojie Jin, Jiashi Feng, Shuicheng Yan
Deep Learning and Vision 1

Learning rich and diverse representations is critical for the performance of deep convolutional neural networks (CNNs). In this paper, we consider how to use privileged information to promote inherent diversity of a single CNN model such that the model can learn better representations and offer stronger generalization ability. To this end, we propose a novel group orthogonal convolutional neural network (GoCNN) that learns untangled representations within each layer by exploiting provided privileged information and enhances representation diversity effectively. We take image classification as an example where image segmentation annotations are used as privileged information during the training process. Experiments on two benchmark datasets – ImageNet and PASCAL VOC – clearly demonstrate the strong generalization ability of our proposed GoCNN model. On the ImageNet dataset, GoCNN improves the performance of the state-of-the-art ResNet-152 model by an absolute 1.2% while using privileged information for only 10% of the training images, confirming the effectiveness of GoCNN in utilizing available privileged knowledge to train better CNNs.

• #2409
Forecast the Plausible Paths in Crowd Scenes
Hang Su, Jun Zhu, Yinpeng Dong, Bo Zhang
Deep Learning and Vision 1

Forecasting the plausible future paths of pedestrians in crowd scenes has wide applications, but it remains a challenging task due to the complexities and uncertainties of crowd motions. To address these issues, we propose to explore the inherent crowd dynamics via a social-aware recurrent Gaussian process model, which facilitates path prediction by taking advantage of the interplay between the rich prior knowledge and motion uncertainties. Specifically, we derive a social-aware LSTM to explore the crowd dynamics, resulting in a hidden feature embedding the rich prior in massive data. Afterwards, we integrate the descriptor into deep Gaussian processes with motion uncertainties appropriately harnessed. Crowd motion forecasting is implemented by regressing relative motion against the current positions, yielding the predicted paths based on a functional object associated with a distribution. Extensive experiments on public datasets demonstrate that our method obtains state-of-the-art performance in both structured and unstructured scenes by exploring the complex and uncertain motion patterns, even if the occlusion is serious or the observed trajectories are noisy.

• #2879
Deep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning
Shanshan Zhao, Xi Li, Omar El Farouk Bourahla
Deep Learning and Vision 1

As an important and challenging problem in computer vision, learning based optical flow estimation aims to discover the intrinsic correspondence structure between two adjacent video frames through statistical learning. Therefore, a key issue to solve in this area is how to effectively model the multi-scale correspondence structure properties in an adaptive end-to-end learning fashion. Motivated by this observation, we propose an end-to-end multi-scale correspondence structure learning (MSCSL) approach for optical flow estimation. In principle, the proposed MSCSL approach is capable of effectively capturing the multi-scale inter-image-correlation correspondence structures within a multi-level feature space from deep learning. Moreover, the proposed MSCSL approach builds a spatial Conv-GRU neural network model to adaptively model the intrinsic dependency relationships among these multi-scale correspondence structures. Finally, the above procedures for correspondence structure learning and multi-scale dependency modeling are implemented in a unified end-to-end deep learning framework. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed approach.

### Wednesday 23, 16:30 - 18:00, ML-DMSS - Data Mining and Social Sciences (211)

Chair: Longbing Cao
• #1401
A Robust Noise Resistant Algorithm for POI Identification from Flickr Data
Yiyang Yang, Zhiguo Gong, Qing Li, Leong Hou U, Ruichu Cai, Zhifeng Hao
Data Mining and Social Sciences

Point of Interest (POI) identification using social media data (e.g. Flickr, Microblog) has been one of the most popular research topics in recent years. However, such crowd-contributed collections contain a large amount of noise (POI-irrelevant data). The traditional solution to this problem is to set a global density threshold and remove a data point as noise if its density is lower than the threshold. However, the density values vary significantly among POIs. As a result, some POIs with relatively lower density cannot be identified. To solve the problem, we propose a technique based on the local drastic changes of the data density. First we define the local maxima of the density function as the Urban POIs, and the gradient ascent algorithm is exploited to assign data points to different clusters. To remove noise, we incorporate the Laplacian Zero-Crossing points along the gradient ascent process as the boundaries of the POI. Points located outside the POI region are regarded as noise. Then the technique is extended into the geographical and textual joint space so that it can make use of the heterogeneous features of social media. The experimental results show the significance of the proposed approach in removing noise.
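
The idea of assigning points to clusters by climbing to a local density maximum can be illustrated with a small sketch. This is not the authors' implementation: it assumes one-dimensional points, a Gaussian kernel density estimate, and mean shift as the gradient ascent step, and it leaves out the Laplacian zero-crossing boundaries entirely. The names `mean_shift_mode`, `cluster_by_mode`, `bandwidth` and `merge_dist` are hypothetical.

```python
import math

def mean_shift_mode(x, points, bandwidth=1.0, steps=100, tol=1e-6):
    """Climb from x to a nearby density maximum (mean shift with a Gaussian kernel)."""
    for _ in range(steps):
        # Kernel weight of every data point as seen from the current position.
        weights = [math.exp(-((x - p) ** 2) / (2 * bandwidth ** 2)) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

def cluster_by_mode(points, bandwidth=1.0, merge_dist=0.5):
    """Give each point the index of the density maximum it climbs to."""
    modes, labels = [], []
    for p in points:
        mode = mean_shift_mode(p, points, bandwidth)
        for index, known in enumerate(modes):
            if abs(known - mode) < merge_dist:  # same maximum, up to tolerance
                labels.append(index)
                break
        else:
            modes.append(mode)
            labels.append(len(modes) - 1)
    return modes, labels

# Toy data (assumed): two dense groups of photo locations along a street.
points = [0.0, 0.2, 0.4, 10.0, 10.2, 10.4]
modes, labels = cluster_by_mode(points)
```

Each resulting mode plays the role of a candidate POI; a noise filter such as the one described in the abstract would then decide which points around a mode actually belong to it.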

• #2427
Learning Concise Representations of Users' Influences through Online Behaviors
Shenghua Liu, Houdong Zheng, Huawei Shen, Xueqi Cheng, Xiangwen Liao
Data Mining and Social Sciences

Whereas it is well known that social network users influence each other, a fundamental problem in influence maximization, opinion formation and viral marketing is that users' influences are difficult to quantify. Previous work has directly defined an independent model parameter to capture the interpersonal influence between each pair of users. However, such models do not consider how influences depend on each other if they originate from the same user or if they act on the same user. Moreover, these models need a parameter for each pair of users, which results in high-dimensional models that easily overfit. Given these problems, another way of defining the parameters is needed to consider the dependencies. Thus we propose a model that defines parameters for every user with a latent influence vector and a susceptibility vector. Such low-dimensional and distributed representations naturally cause the interpersonal influences involving the same user to be coupled with each other, thus reducing the model's complexity. Additionally, the model can easily consider the sentiment polarities of users' messages and how sentiment affects users' influences. In this study, we conduct extensive experiments on real Microblog data, showing that our model with distributed representations achieves better accuracy than the state-of-the-art and pair-wise models, and that learning influences on sentiments benefits performance.
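
A rough sketch of why per-user vectors couple influences and shrink the model follows. It assumes, for illustration only, that the influence of one user on another is scored as the inner product of the source's influence vector and the target's susceptibility vector; the abstract does not state the exact form. All names and numbers here are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def pairwise_influence(influence, susceptibility, source, target):
    """Score how strongly `source` influences `target` (assumed inner-product form)."""
    return dot(influence[source], susceptibility[target])

def parameter_counts(n_users, dim):
    """Compare one parameter per ordered user pair with two dim-sized vectors per user."""
    return n_users * (n_users - 1), 2 * n_users * dim

# Toy latent vectors (assumed values, dimension 2).
influence = {"alice": [0.9, 0.1], "bob": [0.2, 0.4]}
susceptibility = {"alice": [0.3, 0.3], "bob": [0.5, 0.8]}

score = pairwise_influence(influence, susceptibility, "alice", "bob")
pairwise_params, vector_params = parameter_counts(n_users=1000, dim=10)
```

Because every score involving "alice" as a source reuses the same influence vector, her influences on different users are tied together, which is the coupling the abstract refers to.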

• #2611
TransNet: Translation-Based Network Representation Learning for Social Relation Extraction
Cunchao Tu, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
Data Mining and Social Sciences

Conventional network representation learning (NRL) models learn low-dimensional vertex representations by simply regarding each edge as a binary or continuous value. However, there exists rich semantic information on edges and the interactions between vertices usually preserve distinct meanings, which are largely neglected by most existing NRL models. In this work, we present a novel Translation-based NRL model, TransNet, by regarding the interactions between vertices as a translation operation. Moreover, we formalize the task of Social Relation Extraction (SRE) to evaluate the capability of NRL methods on modeling the relations between vertices. Experimental results on SRE demonstrate that TransNet significantly outperforms other baseline methods by 10% to 20% on hits@1. The source code and datasets can be obtained from https://github.com/thunlp/TransNet.
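
The translation view of an edge can be sketched as follows. This is a minimal illustration in the style of translation-based embeddings, assuming an L1 distance and hand-picked two-dimensional vectors; it is not taken from the TransNet code, and `translation_distance` and `predict_relation` are hypothetical names.

```python
def translation_distance(head, relation, tail):
    """L1 distance between head + relation and tail; small means the edge fits."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

def predict_relation(head, tail, relations):
    """Return the relation label whose vector best translates head onto tail."""
    return min(relations, key=lambda name: translation_distance(head, relations[name], tail))

# Toy embeddings (assumed values).
vertices = {"u": [0.0, 0.0], "v": [1.0, 0.0]}
relations = {"colleague": [1.0, 0.0], "friend": [0.0, 1.0]}

best = predict_relation(vertices["u"], vertices["v"], relations)
```

A hits@1 evaluation would count how often the top-ranked label, such as `best` here, matches the annotated relation on held-out edges.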

• #2688
Accelerated Local Anomaly Detection via Resolving Attributed Networks
Ninghao Liu, Xiao Huang, Xia Hu
Data Mining and Social Sciences

Attributed networks, in which network connectivity and node attributes are available, have been increasingly used to model real-world information systems, such as social media and e-commerce platforms. While outlier detection has been extensively studied to identify anomalies that deviate from a certain chosen background, existing algorithms cannot be directly applied to attributed networks due to the heterogeneous types of information and the scale of real-world data. Meanwhile, it has been observed that local anomalies, which may align with global conditions, are hard for existing algorithms to detect with interpretability. Motivated by these observations, in this paper, we propose to study the problem of effective and efficient local anomaly detection in attributed networks. In particular, we design a collective way for modeling heterogeneous network and attribute information, and develop a novel and efficient distributed optimization algorithm to handle large-scale data. In the experiments, we compare the proposed framework with the state-of-the-art methods on both real and synthetic datasets, and demonstrate its effectiveness and efficiency through quantitative evaluation and case studies.

• #2984
ContextCare: Incorporating Contextual Information Networks to Representation Learning on Medical Forum Data
Stan Zhao, Meng Jiang, Quan Yuan, Bing Qin, Ting Liu, ChengXiang Zhai
Data Mining and Social Sciences

Online users have generated a large amount of health-related data on medical forums and search engines. However, exploiting these rich data for orienting patients online and assisting medical checkups offline is nontrivial due to the sparseness of existing symptom-disease links, which is caused by the natural and chatty expressions of symptoms. In this paper, we propose a novel and general representation learning method, ContextCare, for human-generated health-related data, which learns the latent relationship between symptoms and diseases from the symptom-disease diagnosis network for disease prediction, disease category prediction and disease clustering. To alleviate the network sparseness, ContextCare adopts regularizations from rich contextual information networks including a symptom co-occurrence network and a disease evolution network. Therefore, our representations of symptoms and diseases incorporate knowledge from these three networks. Extensive experiments on medical forum data demonstrate that ContextCare outperforms the state-of-the-art methods in disease category prediction, disease prediction and disease clustering.

• #3812
SPMC: Socially-Aware Personalized Markov Chains for Sparse Sequential Recommendation
Chenwei Cai, Ruining He, Julian McAuley
Data Mining and Social Sciences

Dealing with sparse, long-tailed datasets, and cold-start problems is always a challenge for recommender systems. These issues can partly be dealt with by making predictions not in isolation, but by leveraging information from related events; such information could include signals from social relationships or from the sequence of recent activities. Both types of additional information can be used to improve the performance of state-of-the-art matrix factorization-based techniques. In this paper, we propose new methods to combine both social and sequential information simultaneously, in order to further improve recommendation performance. We show these techniques to be particularly effective when dealing with sparsity and cold-start issues in several large, real-world datasets.

### Wednesday 23, 16:30 - 18:00, ML-SSL3 - Semi-Supervised Learning 3 (212)

Chair: Ming Li
• #1287
Semi-supervised Max-margin Topic Model with Manifold Posterior Regularization
Wenbo Hu, Jun Zhu, Hang Su, Jingwei Zhuo, Bo Zhang
Semi-Supervised Learning 3

Supervised topic models leverage label information to learn discriminative latent topic representations. As collecting a fully labeled dataset is often time-consuming, semi-supervised learning is of high interest. In this paper, we present an effective semi-supervised max-margin topic model by naturally introducing manifold posterior regularization to a regularized Bayesian topic model, named LapMedLDA. The model jointly learns latent topics and a related classifier with only a small fraction of labeled documents. To perform the approximate inference, we derive an efficient stochastic gradient MCMC method. Unlike the previous semi-supervised topic models, our model adopts a tight coupling between the generative topic model and the discriminative classifier. Extensive experiments demonstrate that such tight coupling brings significant benefits in quantitative and qualitative performance.

• #1426
Learning deep structured network for weakly supervised change detection
Salman Khan, Xuming He, Fatih Porikli, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri
Semi-Supervised Learning 3

Conventional change detection methods require a large number of images to learn background models or depend on tedious pixel-level labeling by humans. In this paper, we present a weakly supervised approach that needs only image-level labels to simultaneously detect and localize changes in a pair of images. To this end, we employ a deep neural network with DAG topology to learn patterns of change from image-level labeled training data. On top of the initial CNN activations, we define a CRF model to incorporate the local differences and context with the dense connections between individual pixels. We apply a constrained mean-field algorithm to estimate the pixel-level labels, and use the estimated labels to update the parameters of the CNN in an iterative EM framework. This enables imposing global constraints on the observed foreground probability mass function. Our evaluations on four benchmark datasets demonstrate superior detection and localization performance.

• #1590
Using Graphs of Classifiers to Impose Declarative Constraints on Semi-supervised Learning
Lidong Bing, William W. Cohen, Bhuwan Dhingra
Semi-Supervised Learning 3

We propose a general approach to modeling semi-supervised learning (SSL) algorithms. Specifically, we present a declarative language for modeling both traditional supervised classification tasks and many SSL heuristics, including both well-known heuristics such as co-training and novel domain-specific heuristics. In addition to representing individual SSL heuristics, we show that multiple heuristics can be automatically combined using Bayesian optimization methods. We experiment with two classes of tasks, link-based text classification and relation extraction. We show modest improvements on well-studied link-based classification benchmarks, and state-of-the-art results on relation-extraction tasks for two realistic domains.

• #1747
Incomplete Attribute Learning with auxiliary labels
Kongming Liang, Yuhong Guo, Hong Chang, Xilin Chen
Semi-Supervised Learning 3

Visual attribute learning is a fundamental and challenging problem for image understanding. Considering the huge semantic space of attributes, it is economically impossible to annotate the presence or absence of all of them for a natural image via crowd-sourcing. In this paper, we tackle the incomplete nature of visual attributes by introducing auxiliary labels into a novel transductive learning framework. By jointly predicting the attributes from the input images and modeling the relationship of attributes and auxiliary labels, the missing attributes can be recovered effectively. In addition, the proposed model can be solved efficiently in an alternating way by optimizing quadratic programming problems and updating parameters in closed-form solutions. Moreover, we propose and investigate different methods for acquiring auxiliary labels. We conduct experiments on three widely used attribute prediction datasets. The experimental results show that our proposed method can achieve state-of-the-art performance with access to partially observed attribute annotations.

• #1959
Decreasing Uncertainty in Planning with State Prediction
Senka Krivic, Michael Cashmore, Daniele Magazzeni, Bram Ridder, Sandor Szedmak, Justus Piater
Semi-Supervised Learning 3

In real world environments the state is almost never completely known. Exploration is often expensive. The application of planning in these environments is consequently more difficult and less robust. In this paper we present an approach for predicting new information about a partially-known state. The state is translated into a partially-known multigraph, which can then be extended using machine-learning techniques. We demonstrate the effectiveness of our approach, showing that it enhances the scalability of our planners, and leads to less time spent on sensing actions.

• #3440
Semi-supervised Learning over Heterogeneous Information Networks by Ensemble of Meta-graph Guided Random Walks
He Jiang, Yangqiu Song, Chenguang Wang, Ming Zhang, Yizhou Sun
Semi-Supervised Learning 3

Heterogeneous information networks (HINs) are a general representation of many real-world applications. The difference between HINs and traditional homogeneous graphs is that the nodes and edges in a HIN have types. In many applications, we need to consider the types to make the approach more semantically meaningful. For applications where annotation is expensive, one natural way is to consider semi-supervised learning over HINs. In this paper, we present a semi-supervised learning algorithm constrained by the types of HINs. We first decompose the original HIN into several semantically meaningful sub-graphs based on the meta-graphs composed of entity and relation types. Then we perform random walks over the sub-graphs to propagate the labels from labeled data to unlabeled data. After we obtain all the labels propagated by different trials of random walks guided by meta-graphs, we use an ensemble algorithm to vote for the final labeling results. We use two publicly available datasets, 20-newsgroups and RCV1, to test our algorithm. Experimental results show that our algorithm is better than the traditional semi-supervised learning algorithms for HINs. One particular by-product of this work is that we show that the previous random walk approach guided by meta-paths can be non-stationary, which is the major reason we propose a meta-graph guided random walk for semi-supervised learning over HINs.
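
The propagate-then-vote pipeline can be sketched on toy data. This is an illustration under stated assumptions, not the paper's algorithm: plain neighbor averaging with clamped seed labels stands in for the meta-graph guided random walk, the three sub-graphs are hand-made, and a simple majority vote stands in for the ensemble step. `propagate_labels` and `ensemble_vote` are hypothetical names.

```python
from collections import Counter

def propagate_labels(adjacency, seeds, iterations=20):
    """Spread seed labels by repeated neighbor averaging; seed nodes stay clamped."""
    classes = sorted(set(seeds.values()))
    scores = {node: {c: 0.0 for c in classes} for node in adjacency}
    for node, label in seeds.items():
        scores[node][label] = 1.0
    for _ in range(iterations):
        updated = {}
        for node, neighbors in adjacency.items():
            if node in seeds or not neighbors:
                updated[node] = scores[node]  # keep seeds and isolated nodes as they are
            else:
                updated[node] = {
                    c: sum(scores[m][c] for m in neighbors) / len(neighbors)
                    for c in classes
                }
        scores = updated
    return {node: max(classes, key=lambda c: scores[node][c]) for node in adjacency}

def ensemble_vote(labelings):
    """Majority vote over the labelings produced on different sub-graphs."""
    return {
        node: Counter(labeling[node] for labeling in labelings).most_common(1)[0][0]
        for node in labelings[0]
    }

# Toy sub-graphs (assumed): each links document d3 to a different labeled document.
seeds = {"d1": "sports", "d2": "politics"}
subgraphs = [
    {"d1": ["d3"], "d2": [], "d3": ["d1"]},
    {"d1": ["d3"], "d2": [], "d3": ["d1"]},
    {"d1": [], "d2": ["d3"], "d3": ["d2"]},
]
final = ensemble_vote([propagate_labels(g, seeds) for g in subgraphs])
```

Two of the three sub-graphs connect `d3` to the sports document, so the vote labels it `sports` while the seed documents keep their given labels.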

### Wednesday 23, 16:30 - 18:00, MT-SP2 - Security and Privacy 2 (213)

Chair: Anika Schumann
• #1787
When Security Games Hit Traffic: Optimal Traffic Enforcement Under One Sided Uncertainty
Ariel Rosenfeld, Sarit Kraus
Security and Privacy 2

Efficient traffic enforcement is an essential, yet complex, component in preventing road accidents. In this paper, we present a novel model and an optimizing algorithm for mitigating some of the computational challenges of real-world traffic enforcement allocation in large road networks. Our approach allows for scalable, coupled and non-Markovian optimization of multiple police units and guarantees optimality. In an extensive empirical evaluation we show that our approach favorably compares to several baseline solutions achieving a significant speed-up, using both synthetic and real-world road networks.

• #1831
A Convolutional Approach for Misinformation Identification
Feng Yu, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan
Security and Privacy 2

The rapid expansion of social media fuels the spread of misinformation, which disrupts people's normal lives. Misinformation identification and early detection in social media are urgently needed. In dynamic and complicated social media scenarios, some conventional methods concentrate mainly on feature engineering, which fails to cover potential features in new scenarios and has difficulty in shaping elaborate high-level interactions among significant features. Moreover, a recent Recurrent Neural Network (RNN) based method is not suited to practical early detection of misinformation and is biased toward the latest input. In this paper, we propose a novel method, Convolutional Approach for Misinformation Identification (CAMI), based on Convolutional Neural Networks (CNNs). CAMI can flexibly extract key features scattered among an input sequence and shape high-level interactions among significant features, which help effectively identify misinformation and achieve practical early detection. Experimental results on two large-scale datasets validate the effectiveness of the CAMI model on both misinformation identification and early detection tasks.

• #2181
Optimal Escape Interdiction on Transportation Networks
Youzhi Zhang, Bo An, Long Tran-Thanh, Zhen Wang, Jiarui Gan, Nicholas R. Jennings
Security and Privacy 2

Preventing crimes or terrorist attacks in urban areas is challenging. Law enforcement officers need to respond quickly to catch the attacker on his escape route, which is subject to time-dependent traffic conditions on transportation networks. The attacker can strategically choose his escape path and driving speed to avoid being captured. Existing work on security resource allocation has not considered such scenarios with time-dependent strategies for both players. Therefore, in this paper, we study the problem of efficiently scheduling security resources for interdicting the escaping attacker. We propose: 1) a new defender-attacker security game model for escape interdiction on transportation networks; and 2) an efficient double oracle algorithm to compute the optimal defender strategy, which combines mixed-integer linear programming formulations for best response problems and effective approximation algorithms for improving the scalability of the algorithms. Experimental evaluation shows that our approach significantly outperforms baselines in solution quality and scales up to realistic-sized transportation networks with hundreds of intersections.

• #2844
A Trust-based Mixture of Gaussian Processes Model for Reliable Regression in Participatory Sensing
Qikun Xiang, Jie Zhang, Ido Nevat, Pengfei Zhang
Security and Privacy 2

Data trustworthiness is a crucial issue in real-world participatory sensing applications. Without considering this issue, different types of worker misbehavior, especially the challenging collusion attacks, can result in biased and inaccurate estimation and decision making. We propose a novel trust-based mixture of Gaussian processes (GP) model for spatial regression to jointly detect such misbehavior and accurately estimate the spatial field. We develop a Markov chain Monte Carlo (MCMC)-based algorithm to efficiently perform Bayesian inference of the model. Experiments using two real-world datasets show the superior robustness of our model compared with existing approaches.

• #3243
A Group-Based Personalized Model for Image Privacy Classification and Labeling
Haoti Zhong, Anna Squicciarini, David Miller, Cornelia Caragea
Security and Privacy 2

We address machine prediction of an individual's label (private or public) for a given image. This problem is difficult due to user subjectivity and inadequate labeled examples to train individual, personalized models. It is also time and space consuming to train a classifier for each user. We propose a Group-Based Personalized Model for image privacy classification in online social media sites, which learns a set of archetypical privacy models (groups), and associates a given user with one of these groups. Our system can be used to provide accurate "early warnings" with respect to a user's privacy awareness level.

• #3477
Efficient Label Contamination Attacks Against Black-Box Learning Models
Mengchen Zhao, Bo An, Wei Gao, Teng Zhang
Security and Privacy 2

Label contamination attack (LCA) is an important type of data poisoning attack where an attacker manipulates the labels of training data to make the learned model beneficial to him. Existing work on LCA assumes that the attacker has full knowledge of the victim learning model, whereas the victim model is usually a black-box to the attacker. In this paper, we develop a Projected Gradient Ascent (PGA) algorithm to compute LCAs on a family of empirical risk minimizations and show that an attack on one victim model can also be effective on other victim models. This makes it possible that the attacker designs an attack against a substitute model and transfers it to a black-box victim model. Based on the observation of the transferability, we develop a defense algorithm to identify the data points that are most likely to be attacked. Empirical studies show that PGA significantly outperforms existing baselines and linear learning models are better substitute models than nonlinear ones.

### Wednesday 23, 16:30 - 18:00, MT-SS2 - Social Sciences 2 (216)

Chair: Mingyu Xiao
• #1509
Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution
Guangyao Shen, Jia Jia, Liqiang Nie, Fuli Feng, Cunjun Zhang, Tianrui Hu, Tat-Seng Chua, Wenwu Zhu
Social Sciences 2

Depression is a major contributor to the overall global burden of diseases. Traditionally, doctors diagnose depressed people face to face by referring to clinical depression criteria. However, more than 70% of the patients would not consult doctors at early stages of depression, which leads to further deterioration of their conditions. Meanwhile, people are increasingly relying on social media to disclose emotions and share their daily lives, and social media have thus been successfully leveraged to help detect physical and mental diseases. Inspired by these observations, our work aims to make timely depression detection via harvesting social media data. We construct well-labeled depression and non-depression datasets on Twitter, and extract six depression-related feature groups covering not only the clinical depression criteria, but also online behaviors on social media. With these feature groups, we propose a multimodal depressive dictionary learning model to detect the depressed users on Twitter. A series of experiments are conducted to validate this model, which outperforms (+3% to +10%) several baselines. Finally, we analyze a large-scale dataset on Twitter to reveal the underlying online behaviors between depressed and non-depressed users.

• #3137
Who to Invite Next? Predicting Invitees of Social Groups
Yu Han, Jie Tang
Social Sciences 2

Social instant messaging services (SMS) such as WhatsApp, Snapchat and WeChat have significantly changed the way people work, live, and communicate, attracting increasing attention from multiple disciplines including computer science, sociology, psychology, and physics. In SMS, social groups play a very important role in supporting communication among multiple users. An interesting question arises: what are the dynamic mechanisms underlying the group evolution? Or more specifically, in an existing group, who should be invited to join? In this paper, we formalize a novel problem of predicting potential invitees of groups. Employing WeChat, the largest social messaging service in China, as the source for our experimental data, we develop a probabilistic graph model to capture the fundamental factors that determine the probability of a user being invited to a specific social group. Our results show that the proposed model indeed leads to statistically significant prediction improvements over several state-of-the-art baseline methods.

• #3232
The Minds of Many: Opponent Modeling in a Stochastic Game
Friedrich Burkhard von der Osten, Michael Kirley, Tim Miller
Social Sciences 2

The Theory of Mind provides a framework for an agent to predict the actions of adversaries by building an abstract model of their strategies using recursive nested beliefs. In this paper, we extend a recently introduced technique for opponent modeling based on Theory of Mind reasoning. Our extended multi-agent Theory of Mind model explicitly considers multiple opponents simultaneously. We introduce a stereotyping mechanism, which segments the agent population into sub-groups of agents with similar behavior. Here, sub-group profiles guide decision making in place of individual agent profiles. We evaluate our model using a multi-player stochastic game, which presents agents with the challenge of unknown adversaries in a partially-observable environment. Simulation results demonstrate that the model performs well under uncertainty and that stereotyping allows larger groups of agents to be modeled robustly. The findings strengthen results showing that Theory of Mind modeling is useful in many artificial intelligence applications.

• #3388
Social Pressure in Opinion Games
Diodato Ferraioli, Carmine Ventre
Social Sciences 2

Motivated by privacy and security concerns in online social networks, we study the role of social pressure in opinion games. These are games, important in economics and sociology, that model the formation of opinions in a social network. We enrich the definition of (noisy) best-response dynamics for opinion games by introducing the pressure, increasing with time, to reach an agreement. We prove that for clique social networks, the dynamics always converges to consensus (no matter the level of noise) if the social pressure is high enough. Moreover, we provide (tight) bounds on the speed of convergence; these bounds are polynomial in the number of players provided that the pressure grows sufficiently fast. We finally look beyond cliques: we characterize the graphs for which consensus is guaranteed, and make some considerations on the computational complexity of checking whether a graph satisfies such a condition.

• #3396
No Time to Observe: Adaptive Influence Maximization with Partial Feedback
Jing Yuan, Shaojie Tang
Social Sciences 2

Although the influence maximization problem has been extensively studied over the past ten years, the majority of existing work adopts one of the following models: the full-feedback model or the zero-feedback model. In the zero-feedback model, we have to commit the seed users all at once in advance; this strategy is also known as a non-adaptive policy. In the full-feedback model, we select one seed at a time and wait until the diffusion completes before selecting the next seed. The full-feedback model has better performance but a potentially huge delay, whereas the zero-feedback model has zero delay but poorer performance, since it does not utilize the observations that may be made during the seeding process. To fill the gap between these two models, we propose a partial-feedback model, which allows us to select a seed at any intermediate stage. We develop a novel alpha-greedy policy that achieves a bounded approximation ratio.

• #3555
Unified Representation and Lifted Sampling for Generative Models of Social Networks
Pablo Robles-Granda, Sebastian Moreno, Jennifer Neville
Social Sciences 2

Statistical models of network structure are widely used in network science to reason about the properties of complex systems—where the nodes and edges represent entities and their relationships. Recently, a number of generative network models (GNM) have been developed that accurately capture characteristics of real world networks, but since they are typically defined in a procedural manner, it is difficult to identify commonalities in their structure. Moreover, procedural definitions make it difficult to develop statistical sampling algorithms that are both efficient and correct. In this paper, we identify a family of GNMs that share a common latent structure and create a Bayesian network (BN) representation that captures their common form. We show how to reduce two existing GNMs to this representation. Then, using the BN representation we develop a generalized, efficient, and provably correct, sampling method that exploits parametric symmetries and deterministic context-specific dependence. Finally, we use the new representation to design a novel GNM and evaluate it empirically.

### Wednesday 23 16:30 - 18:00 KR-KRL - Knowledge Representation Languages (217)

Chair: Mark Kaminski
• #1405
Discriminative Dictionary Learning With Ranking Metric Embedded for Person Re-Identification
De Cheng, Xiaojun Chang, Li Liu, Alexander G. Hauptmann, Yihong Gong, Nanning Zheng
Knowledge Representation Languages

The goal of person re-identification (Re-Id) is to match pedestrians captured from multiple non-overlapping cameras. In this paper, we propose a novel dictionary learning based method with the ranking metric embedded, for person Re-Id. A new and essential ranking graph Laplacian term is introduced, which minimizes the intra-personal compactness and maximizes the inter-personal dispersion in the objective. Unlike the traditional dictionary learning based approaches and their extensions, which use only same-or-not information, our proposed method can explore the ranking relationship among the person images, which is essential for such retrieval-related tasks. Simultaneously, one distance measurement has been explicitly learned in the model to further improve the performance. Since we have reformulated these ranking constraints into the graph Laplacian form, the proposed method is easy to implement but effective. We conduct extensive experiments on three widely used person Re-Id benchmark datasets, and achieve state-of-the-art performance.

• #2248
Knowledge Graph Representation with Jointly Structural and Textual Encoding
Jiacheng Xu, Xipeng Qiu, Kan Chen, Xuanjing Huang
Knowledge Representation Languages

The objective of knowledge graph embedding is to encode both entities and relations of knowledge graphs into continuous low-dimensional vector spaces. Previously, most works focused on symbolic representation of knowledge graphs with structure information, which cannot handle new entities or entities with few facts well. In this paper, we propose a novel deep architecture to utilize both structural and textual information of entities. Specifically, we introduce three neural models to encode the valuable information from the text description of an entity, among which an attentive model can select related information as needed. Then, a gating mechanism is applied to integrate representations of structure and text into a unified architecture. Experiments show that our models outperform the baselines and obtain state-of-the-art results on link prediction and triplet classification tasks.

• #2967
Context-aware Path Ranking for Knowledge Base Completion
Sahisnu Mazumder, Bing Liu
Knowledge Representation Languages

Knowledge base (KB) completion aims to infer missing facts from existing ones in a KB. Among various approaches, path ranking (PR) algorithms have received increasing attention in recent years. PR algorithms enumerate paths between entity-pairs in a KB and use those paths as features to train a model for missing fact prediction. Due to their good performances and high model interpretability, several methods have been proposed. However, most existing methods suffer from scalability (high RAM consumption) and feature explosion (trains on an exponentially large number of features) problems. This paper proposes a Context-aware Path Ranking (C-PR) algorithm to solve these problems by introducing a selective path exploration strategy. C-PR learns global semantics of entities in the KB using word embedding and leverages the knowledge of entity semantics to enumerate contextually relevant paths using bidirectional random walk. Experimental results on three large KBs show that the path features (fewer in number) discovered by C-PR not only improve predictive performance but also are more interpretable than existing baselines.

• #3706
A Model for Accountable Ordinal Sorting
Khaled Belahcene, Christophe Labreuche, Nicolas Maudet, Vincent Mousseau, Wassila Ouerdane
Knowledge Representation Languages

We address the problem of multicriteria ordinal sorting through the lens of accountability, i.e. the ability of a human decision-maker to own a recommendation made by the system. We put forward a number of model features that would favor the capability to support the recommendation with a convincing explanation. To account for that, we design a recommender system implementing and formalizing such features. This system outputs explanations defined under the form of specific argument schemes tailored to represent the specific rules of the model. At the end, we discuss possible and promising argumentative perspectives.

• #3821
Relatedness-based Multi-Entity Summarization
Kalpa Gunaratna, Amir Hossein Yazdavar, Krishnaprasad Thirunarayan, Amit Sheth, Gong Cheng
Knowledge Representation Languages

Representing world knowledge in a machine processable format is important as entities and their descriptions have fueled tremendous growth in knowledge-rich information processing platforms, services, and systems. Prominent applications of knowledge graphs include search engines (e.g., Google Search and Microsoft Bing), email clients (e.g., Gmail), and intelligent personal assistants (e.g., Google Now, Amazon Echo, and Apple's Siri). In this paper, we present an approach that can summarize facts about a collection of entities by analyzing their relatedness in preference to summarizing each entity in isolation. Specifically, we generate informative entity summaries by selecting: (i) inter-entity facts that are similar and (ii) intra-entity facts that are important and diverse. We employ a constrained knapsack problem solving approach to efficiently compute entity summaries. We perform both qualitative and quantitative experiments and demonstrate that our approach yields promising results compared to two other stand-alone state-of-the-art entity summarization approaches.

• #4081
A Reasoning System for a First-Order Logic of Limited Belief
Christoph Schwering
Knowledge Representation Languages

Logics of limited belief aim at enabling computationally feasible reasoning in highly expressive representation languages. These languages are often dialects of first-order logic with a weaker form of logical entailment that keeps reasoning decidable or even tractable. While a number of such logics have been proposed in the past, they tend to remain for theoretical analysis only and their practical relevance is very limited. In this paper, we aim to go beyond the theory. Building on earlier work by Liu, Lakemeyer, and Levesque, we develop a logic of limited belief that is highly expressive but remains decidable in the first-order and tractable in the propositional case and exhibits some characteristics that make it attractive for an implementation. We introduce a reasoning system that employs this logic as representation language and present experimental results that showcase the benefit of limited belief.

### Wednesday 23 16:30 - 18:00 MAS-EPSC - Economic Paradigms and Social Choice (219)

Chair: Thomas Meyer
• #3026
Mechanisms for Online Organ Matching
Nicholas Mattei, Abdallah Saffidine, Toby Walsh
Economic Paradigms and Social Choice

Matching donations from deceased patients to patients on the waiting list accounts for over 85% of all kidney transplants performed in Australia. We propose a simple mechanism to perform this matching and compare this new mechanism with the more complex algorithm currently under consideration by the Organ and Tissue Authority in Australia. We perform a number of experiments using real-world data provided by the Organ and Tissue Authority of Australia. We find that our simple mechanism is more efficient and fairer in practice compared to the other mechanism currently under consideration.

• #1503
Computing an Approximately Optimal Agreeable Set of Items
Pasin Manurangsi, Warut Suksompong
Economic Paradigms and Social Choice

We study the problem of finding a small subset of items that is agreeable to all agents, meaning that all agents value the subset at least as much as its complement. Previous work has shown worst-case bounds, over all instances with a given number of agents and items, on the number of items that may need to be included in such a subset. Our goal in this paper is to efficiently compute an agreeable subset whose size approximates the size of the smallest agreeable subset for a given instance. We consider three well-known models for representing the preferences of the agents: ordinal preferences on single items, the value oracle model, and additive utilities. In each of these models, we establish virtually tight bounds on the approximation ratio that can be obtained by algorithms running in polynomial time.
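To make the notion of agreeability concrete, here is a minimal sketch under the additive-utilities model mentioned in the abstract. It is not the paper's approximation algorithm: it is an exhaustive search, exponential in the number of items, and the function names and the example instance are hypothetical.

```python
from itertools import combinations


def is_agreeable(subset, utilities):
    """True if every agent values `subset` at least as much as its complement."""
    chosen = set(subset)
    for agent_values in utilities:
        inside = sum(v for item, v in enumerate(agent_values) if item in chosen)
        outside = sum(agent_values) - inside
        if inside < outside:
            return False
    return True


def smallest_agreeable_subset(utilities):
    """Exhaustive search by increasing size (illustration only)."""
    num_items = len(utilities[0])
    for size in range(num_items + 1):
        for subset in combinations(range(num_items), size):
            if is_agreeable(subset, utilities):
                return subset
    # With non-negative utilities the full item set is always agreeable,
    # so this line is only reached if some utilities are negative.
    return None


# Hypothetical instance: rows are agents, columns are items.
example_utilities = [
    [5, 1, 1],
    [1, 1, 5],
]
best = smallest_agreeable_subset(example_utilities)
```

In this instance no single item satisfies both agents, so the search settles on a two-item subset.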

• #1704
Recognizing Top-Monotonic Preference Profiles in Polynomial Time
Krzysztof Magiera, Piotr Faliszewski
Economic Paradigms and Social Choice

We provide the first polynomial-time algorithm for recognizing if a profile of (possibly weak) preference orders is top-monotonic. Top-monotonicity is a generalization of the notions of single-peakedness and single-crossingness, defined by Barbera and Moreno. Top-monotonic profiles always have weak Condorcet winners and satisfy a variant of the median voter theorem. Our algorithm proceeds by reducing the recognition problem to the SAT-2CNF problem.

• #1756
Proportional Rankings
Piotr Skowron, Martin Lackner, Markus Brill, Dominik Peters, Edith Elkind
Economic Paradigms and Social Choice

We extend the principle of proportional representation to rankings: given approval preferences, we aim to generate aggregate rankings so that cohesive groups of voters are represented proportionally in each initial segment of the ranking. Such rankings are desirable in situations where initial segments of different lengths may be relevant, e.g., in recommender systems, for hiring decisions, or for the presentation of competing proposals on a liquid democracy platform. We define what it means for rankings to be proportional, provide bounds for well-known aggregation rules, and experimentally evaluate the performance of these rules.

• #1884
Manipulating Gale-Shapley Algorithm: Preserving Stability and Remaining Inconspicuous
Rohit Vaish, Dinesh Garg
Economic Paradigms and Social Choice

We study the problem of manipulation of the men-proposing Gale-Shapley algorithm by a single woman via permutation of her true preference list. Our contribution is threefold: First, we show that the matching induced by an optimal manipulation is stable with respect to the true preferences. Second, we identify a class of optimal manipulations called inconspicuous manipulations which, in addition to preserving stability, are also nearly identical to the true preference list of the manipulator (making the manipulation hard to detect). Third, for optimal inconspicuous manipulations, we strengthen the stability result by showing that the entire stable lattice of the manipulated instance is contained inside the original lattice.
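The setting can be illustrated with a small sketch of the standard men-proposing Gale-Shapley algorithm. The three-by-three instance and all names below are hypothetical and are not taken from the paper; the example only shows one woman improving her partner by swapping two entries of her reported list, and checks that the resulting matching is stable with respect to the true preferences.

```python
from collections import deque


def gale_shapley(men_prefs, women_prefs):
    """Men-proposing deferred acceptance; returns a dict woman -> man.

    Assumes complete preference lists and equally many men and women.
    """
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}
    engaged = {}
    free_men = deque(men_prefs)
    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged.get(w)
        if current is None:
            engaged[w] = m
        elif rank[w][m] < rank[w][current]:
            engaged[w] = m
            free_men.append(current)
        else:
            free_men.append(m)
    return engaged


def is_stable(matching, men_prefs, women_prefs):
    """True if no man and woman both prefer each other to their partners."""
    wife = {m: w for w, m in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == wife[m]:
                break  # every later woman is less preferred by m
            if women_prefs[w].index(m) < women_prefs[w].index(matching[w]):
                return False
    return True


# Hypothetical instance; lists are ordered from most to least preferred.
men = {"m1": ["w2", "w1", "w3"], "m2": ["w1", "w2", "w3"], "m3": ["w1", "w2", "w3"]}
women_true = {"w1": ["m1", "m3", "m2"], "w2": ["m3", "m1", "m2"], "w3": ["m1", "m2", "m3"]}

truthful = gale_shapley(men, women_true)

# w1 misreports by swapping her second and third choices.
women_reported = dict(women_true, w1=["m1", "m2", "m3"])
manipulated = gale_shapley(men, women_reported)
still_stable = is_stable(manipulated, men, women_true)
```

Under truthful reporting w1 is matched to m3, her second choice; after the swap she is matched to m1, her true first choice, and the manipulated matching has no blocking pair under the true lists.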

• #2402
Fair Division of a Graph
Sylvain Bouveret, Katarína Cechlárová, Edith Elkind, Ayumi Igarashi, Dominik Peters
Economic Paradigms and Social Choice

We consider fair allocation of indivisible items under an additional constraint: there is an undirected graph describing the relationship between the items, and each agent's share must form a connected subgraph of this graph. This framework captures, e.g., fair allocation of land plots, where the graph describes the accessibility relation among the plots. We focus on agents that have additive utilities for the items, and consider several common fair division solution concepts, such as proportionality, envy-freeness and maximin share guarantee. While finding good allocations according to these solution concepts is computationally hard in general, we design efficient algorithms for special cases where the underlying graph has simple structure, and/or the number of agents (or, less restrictively, the number of agent types) is small. In particular, despite non-existence results in the general case, we prove that for acyclic graphs a maximin share allocation always exists and can be found efficiently.

### Wednesday 23 16:30 - 18:00 NLP-AT2 - NLP Applications and Tools 2 (220)

Chair: Freddy Lecue
• #3283
A Feature-Enriched Neural Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Xinchi Chen, Xipeng Qiu, Xuanjing Huang
NLP Applications and Tools 2

Recently, neural network models for natural language processing tasks have received increasing attention for their ability to alleviate the burden of manual feature engineering. However, previous neural models cannot extract complicated feature compositions as well as traditional methods with discrete features. In this work, we propose a feature-enriched neural model for the joint Chinese word segmentation and part-of-speech tagging task. Specifically, to simulate the feature templates of traditional discrete feature based models, we use different filters to model the complex compositional features with convolutional and pooling layers, and then utilize long-distance dependency information with a recurrent layer. Experimental results on five different datasets show the effectiveness of our proposed model.

• #3642
Learning Conversational Systems that Interleave Task and Non-Task Content
Zhou Yu, Alexander Rudnicky, Alan Black
NLP Applications and Tools 2

Task-oriented dialog systems have been applied in various tasks, such as automated personal assistants, customer service providers and tutors. These systems work well when users have clear and explicit intentions that are well-aligned to the systems' capabilities. However, they fail if users' intentions are not explicit. To address this shortcoming, we propose a framework to interleave non-task content (i.e., everyday social conversation) into task conversations. When the task content fails, the system can still keep the user engaged with the non-task content. We trained a policy using reinforcement learning algorithms to promote long-turn conversation coherence and consistency, so that the system can have smooth transitions between task and non-task content. To test the effectiveness of the proposed framework, we developed a movie promotion dialog system. Experiments with human users indicate that a system that interleaves social and task content achieves a better task success rate and is also rated as more engaging compared to a pure task-oriented system.

• #3692
Predicting the Quality of Short Narratives from Social Media
Tong Wang, Ping Chen, Boyang Li
NLP Applications and Tools 2

An important and difficult challenge in building computational models for narratives is the automatic evaluation of narrative quality. Quality evaluation connects narrative understanding and generation, as generation systems need to evaluate their own products. To circumvent difficulties in acquiring annotations, we employ upvotes in social media as an approximate measure for story quality. We collected 54,484 answers from a crowd-powered question-and-answer website, Quora, and then used active learning to build a classifier that labeled 28,320 answers as stories. To predict the number of upvotes without the use of social network features, we create neural networks that model textual regions and the interdependence among regions, which serve as strong benchmarks for future research. To the best of our knowledge, this is the first large-scale study for automatic evaluation of narrative quality.

• #2862
AGRA: An Analysis-Generation-Ranking Framework for Automatic Abbreviation from Paper Titles
Jianbing Zhang, Yixin Sun, Shujian Huang, Cam-Tu Nguyen, Xiaoliang Wang, Xinyu Dai, Jiajun Chen, Yang Yu
NLP Applications and Tools 2

People sometimes choose word-like abbreviations to refer to items with a long description. These abbreviations usually come from the descriptive text of the item and are easy to remember and pronounce, while preserving the key idea of the item. Coming up with a nice abbreviation is not an easy job, even for humans. Previous assistant naming systems compose names by applying hand-written rules, which may not perform well. In this paper, we propose to view the naming task as an artificial intelligence problem and create a data set in the domain of academic naming. To generate more delicate names, we propose a three-step framework, including description analysis, candidate generation and abbreviation ranking, each of which is parameterized and optimizable. We conduct experiments to compare different settings of our framework with several analysis approaches from different perspectives. Compared to online or baseline systems, our framework achieves the best results.

• #2888
Learning to Identify Ambiguous and Misleading News Headlines
Wei Wei, Xiaojun Wan
NLP Applications and Tools 2

Accuracy is one of the basic principles of journalism. However, it is increasingly hard to manage due to the diversity of news media. Some editors of online news tend to use catchy headlines which trick readers into clicking. These headlines are either ambiguous or misleading, degrading the reading experience of the audience. Thus, identifying inaccurate news headlines is a task worth studying. Previous work names these headlines "clickbaits" and mainly focuses on the features extracted from the headlines, which limits the performance since the consistency between headlines and news bodies is underappreciated. In this paper, we clearly redefine the problem and identify ambiguous and misleading headlines separately. We utilize class sequential rules to exploit structure information when detecting ambiguous headlines. For the identification of misleading headlines, we extract features based on the congruence between headlines and bodies. To make use of the large unlabeled data set, we apply a co-training method and gain an increase in performance. The experimental results show the effectiveness of our methods. Then we use our classifiers to detect inaccurate headlines crawled from different sources and conduct a data analysis.

• #916
Learning to Explain Entity Relationships by Pairwise Ranking with Convolutional Neural Networks
Jizhou Huang, Wei Zhang, Shiqi Zhao, Shiqiang Ding, Haifeng Wang
NLP Applications and Tools 2

Providing a plausible explanation for the relationship between two related entities is an important task in some applications of knowledge graphs, such as in search engines. However, most existing methods require a large amount of manually labeled training data, and so cannot be applied to large-scale knowledge graphs due to the expensive data annotation. In addition, these methods typically rely on costly handcrafted features. In this paper, we propose an effective pairwise ranking model by leveraging clickthrough data of a Web search engine to address these two problems. We first construct large-scale training data by leveraging the query-title pairs derived from clickthrough data of a Web search engine. Then, we build a pairwise ranking model which employs a convolutional neural network to automatically learn relevant features. The proposed model can be easily trained with backpropagation to perform the ranking task. The experiments show that our method significantly outperforms several strong baselines.

### Wednesday 23 16:30 - 18:30 JOU-MISC - Journal Track: Search, Planning, Uncertainty and applications (203)

Chair: Alessio Lomuscio
• #1370
Local Search for Minimum Weight Dominating Set with Two-Level Configuration Checking and Frequency Based Scoring Function (Extended Abstract)
Yiyuan Wang, Shaowei Cai, Minghao Yin
Journal Track: Search, Planning, Uncertainty and applications

The Minimum Weight Dominating Set (MWDS) problem is an important generalization of the Minimum Dominating Set (MDS) problem with extensive applications. This paper proposes a new local search algorithm for the MWDS problem, which is based on two new ideas. The first idea is a heuristic called two-level configuration checking (CC2), which is a new variant of a recent powerful configuration checking strategy (CC) for effectively avoiding the recent search paths. The second idea is a novel scoring function based on how frequently vertices are left uncovered. Our algorithm is called CC2FS, according to the names of the two ideas. The experimental results show that CC2FS performs much better than some state-of-the-art algorithms in terms of solution quality on a broad range of MWDS benchmarks.

• #2116
Efficient Mechanism Design for Online Scheduling (Extended Abstract)
Xujin Chen, Xiaodong Hu, Tie-Yan Liu, Weidong Ma, Tao Qin, Pingzhong Tang, Changjun Wang, Bo Zheng
Journal Track: Search, Planning, Uncertainty and applications

This work concerns the mechanism design for online scheduling in a strategic setting. In this setting, each job is owned by a self-interested agent who may misreport the release time, deadline, length, and value of her job, while we need to determine not only the schedule of the jobs, but also the payment of each agent. We focus on the design of incentive compatible (IC) mechanisms, and study the maximization of social welfare (i.e., the aggregated value of completed jobs) by competitive analysis. We first derive two lower bounds on the competitive ratio of any deterministic IC mechanism to characterize the landscape of our research: one bound is 5, which holds for equal-length jobs; the other bound is $\frac{\kappa}{\ln\kappa}+1-o(1)$, which holds for unequal-length jobs, where $\kappa$ is the maximum ratio between lengths of any two jobs. We then propose a deterministic IC mechanism and show that such a simple mechanism works very well for two models: (1) In the preemption-restart model, the mechanism can achieve the optimal competitive ratio of 5 for equal-length jobs and a near optimal ratio of $(\frac{1}{(1-\epsilon)^2}+o(1)) \frac{\kappa}{\ln\kappa}$ for unequal-length jobs, where $0<\epsilon<1$ is a small constant; (2) In the preemption-resume model, the mechanism can achieve the optimal competitive ratio of 5 for equal-length jobs and a near optimal competitive ratio (within factor 2) for unequal-length jobs.

• #4194
Some Properties of Batch Value of Information in the Selection Problem (Extended Abstract)
Shahaf S. Shperberg, Solomon Eyal Shimony
Journal Track: Search, Planning, Uncertainty and applications

We examine theoretical properties of value of information (VOI) in the selection problem, and identify cases of submodularity and supermodularity. We use these properties to compute approximately optimal measurement batch policies, implemented on a “wine selection problem” example.

• #4230
A generic approach to planning in the presence of incomplete information: Theory and implementation (Extended Abstract)
Son Thanh To, Tran Cao Son, Enrico Pontelli
Journal Track: Search, Planning, Uncertainty and applications

This paper proposes a generic approach to planning in the presence of incomplete information. The approach builds on an abstract notion of a belief state representation, along with an associated set of basic operations. These operations facilitate the development of a sound and complete transition function, for reasoning about effects of actions in the presence of incomplete information, and a set of abstract algorithms for planning. The paper demonstrates how the abstract definitions and algorithms can be instantiated in three concrete representations (minimal-DNF, minimal-CNF, and prime implicates), resulting in three highly competitive conformant planners: DNF, CNF, and PIP. The paper relates the notion of a representation to that of ordered binary decision diagrams, a well-known belief state representation employed by many conformant planners, and several target compilation languages that have been presented in the literature. The paper also includes an experimental evaluation of the planners DNF, CNF, and PIP and proposes a new set of conformant planning benchmarks that are challenging for state-of-the-art conformant planners.

• #4253
Coherent Predictive Inference under Exchangeability with Imprecise Probabilities (Extended Abstract)
Gert de Cooman, Jasper De Bock, Márcio Alves Diniz
Journal Track: Search, Planning, Uncertainty and applications

Coherent reasoning under uncertainty can be represented in a very general manner by coherent sets of desirable gambles. This leads to a more general foundation for coherent (imprecise-)probabilistic inference that allows for indecision. In this framework, and for a given finite category set, coherent predictive inference under exchangeability can be represented using Bernstein coherent cones of multivariate polynomials on the simplex generated by this category set. We define an inference system as a map that associates a Bernstein coherent cone of polynomials with every finite category set. Inference principles can then be represented mathematically as restrictions on such maps, which allows us to develop a notion of conservative inference under such inference principles. We discuss, as particular examples, representation insensitivity and specificity, and show that there is an infinity of inference systems that satisfy these two principles.

• #4218
Computer Models Solving Intelligence Test Problems: Progress and Implications (Extended Abstract)
José Hernández-Orallo, Fernando Martínez-Plumed, Ute Schmid, Michael Siebers, David Dowe
Journal Track: Search, Planning, Uncertainty and applications

While some computational models of intelligence test problems were proposed throughout the second half of the XXth century, in the first years of the XXIst century we have seen an increasing number of computer systems being able to score well on particular intelligence test tasks. However, despite this increasing trend, there has been no general account of all these works in terms of how they relate to each other and what their real achievements are. In this paper, we provide some insight on these issues by giving a comprehensive account of about thirty computer models, from the 1960s to nowadays, and their relationships, focussing on the range of intelligence test tasks they address, the purpose of the models, how general or specialised these models are, the AI techniques they use in each case, their comparison with human performance, and their evaluation of item difficulty.

### Wednesday 23 16:30 - 18:30 Competition (206)

Chair: Reyhan Aydogan
• ANAC
Competition
### Wednesday 23 19:30 - 23:00 Social event (The Peninsula at Docklands)

• Conference Banquet
Social event
### Thursday 24 08:30 - 10:00 SIS-MAS - Sister Conference Track: Multiagent Systems (203)

Chair: Piotr Faliszewski
• #4201
Which is the Fairest (Rent Division) of Them All? [Extended Abstract]
Ya'akov (Kobi) Gal, Moshe Mash, Ariel D. Procaccia, Yair Zick
Sister Conference Track: Multiagent Systems

What is a fair way to assign rooms to several housemates, and divide the rent between them? This is not just a theoretical question: many people have used the Spliddit website to obtain envy-free solutions to rent division instances. But envy freeness, in and of itself, is insufficient to guarantee outcomes that people view as intuitive and acceptable. We therefore focus on solutions that optimize a criterion of social justice, subject to the envy freeness constraint, in order to pinpoint the “fairest” solutions. We develop a general algorithmic framework that enables the computation of such solutions in polynomial time. We then study the relations between natural optimization objectives, and identify the maximin solution, which maximizes the minimum utility subject to envy freeness, as the most attractive. We demonstrate, in theory and using experiments on real data from Spliddit, that the maximin solution gives rise to significant gains in terms of our optimization objectives. Finally, a user study with Spliddit users as subjects demonstrates that people find the maximin solution to be significantly fairer than arbitrary envy-free solutions; this user study is unprecedented in that it asks people about their real-world rent division instances. Based on these results, the maximin solution has been deployed on Spliddit since April 2015.
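The two ingredients named in the abstract, envy-freeness and the minimum utility, can be sketched as follows. The sketch assumes quasi-linear utilities (a housemate's value for a room minus its price), and it only compares a few hand-picked price vectors; it is not the paper's polynomial-time framework. The instance and all names are hypothetical.

```python
def housemate_utilities(values, assignment, prices):
    """Quasi-linear utility of each housemate for their own room (assumed model)."""
    return [values[i][room] - prices[room] for i, room in enumerate(assignment)]


def is_envy_free(values, assignment, prices):
    """True if nobody strictly prefers another room at its posted price."""
    for i, own_room in enumerate(assignment):
        own = values[i][own_room] - prices[own_room]
        for other_room in assignment:
            if values[i][other_room] - prices[other_room] > own:
                return False
    return True


# Hypothetical instance: total rent 1000, two housemates, two rooms.
values = [
    [600, 400],  # housemate 0's values for rooms 0 and 1
    [500, 500],  # housemate 1's values for rooms 0 and 1
]
assignment = [0, 1]  # housemate i gets room assignment[i]

candidate_prices = [[700, 300], [600, 400], [550, 450], [500, 500]]
envy_free = [p for p in candidate_prices if is_envy_free(values, assignment, p)]

# Among the envy-free candidates, keep the one maximizing the minimum utility.
best_prices = max(
    envy_free,
    key=lambda p: min(housemate_utilities(values, assignment, p)),
)
```

Three of the four candidates are envy-free, but only the 550/450 split gives both housemates a positive utility, which is the kind of distinction the maximin criterion is meant to capture.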

• #4229
Rationalisation of Profiles of Abstract Argumentation Frameworks: Extended Abstract
Stephane Airiau, Elise Bonzon, Ulle Endriss, Nicolas Maudet, Julien Rossit
Sister Conference Track: Multiagent Systems

We review a recently introduced model in which each of a number of agents is endowed with an abstract argumentation framework reflecting her individual views regarding a given set of arguments. A question arising in this context is whether the diversity of views observed in such a situation is consistent with the assumption that every individual argumentation framework is induced by a combination of, first, some basic factual information and, second, the personal preferences of the agent concerned. We treat this question of rationalisability of a profile as an algorithmic problem and identify tractable and intractable cases. This is useful for understanding what types of profiles can reasonably be expected to occur in a multiagent system.

• #4252
Summary: Multi-Agent Path Finding with Kinematic Constraints
Wolfgang Hönig, T. K. Satish Kumar, Liron Cohen, Hang Ma, Hong Xu, Nora Ayanian, Sven Koenig
Sister Conference Track: Multiagent Systems

Multi-Agent Path Finding (MAPF) is well studied in both AI and robotics. Given a discretized environment and agents with assigned start and goal locations, MAPF solvers from AI find collision-free paths for hundreds of agents with user-provided sub-optimality guarantees. However, they ignore that actual robots are subject to kinematic constraints (such as velocity limits) and suffer from imperfect plan-execution capabilities. We therefore introduce MAPF-POST to postprocess the output of a MAPF solver in polynomial time to create a plan-execution schedule that can be executed on robots. This schedule works on non-holonomic robots, considers kinematic constraints, provides a guaranteed safety distance between robots, and exploits slack to avoid time-intensive replanning in many cases. We evaluate MAPF-POST in simulation and on differential-drive robots, showcasing the practicality of our approach.

• #4265
Evaluating Market User Interfaces for Electric Vehicle Charging using Bid2Charge
Sebastian Stein, Enrico H. Gerding, Adrian Nedea, Avi Rosenfeld, Nicholas R. Jennings
Sister Conference Track: Multiagent Systems

We consider settings where electric vehicle drivers participate in a market mechanism to charge their vehicles. Existing work typically assumes that participants are fully rational and can report their charging preferences accurately. However, this may not be reasonable in settings with non-experts. To explore this, we design a novel game called Bid2Charge and compare a fully expressive interface that covers the entire space of preferences to two restricted interfaces that offer fewer possible reports. We show that restricting the users' preferences significantly reduces deliberation times while also leading to an increase in utility by up to 70%.

### Thursday 24 08:30 - 10:00 ML-CLNN - Classification and Neural Networks (204)

Chair: Georg Dorffner
• #1592
Discriminative Deep Hashing for Scalable Face Image Retrieval
Jie Lin, Zechao Li, Jinhui Tang
Classification and Neural Networks

With the explosive growth of images containing faces, scalable face image retrieval has attracted increasing attention. Owing to its effectiveness, deep hashing has recently become a popular hashing method. In this work, we propose a new Discriminative Deep Hashing (DDH) network to learn discriminative and compact hash codes for large-scale face image retrieval. The proposed network incorporates end-to-end learning, a divide-and-encode module and the desired discrete code learning into a unified framework. Specifically, a network with a stack of convolution-pooling layers is proposed to extract multi-scale and robust features by merging the outputs of the third max pooling layer and the fourth convolutional layer. To reduce the redundancy among hash codes and the network parameters simultaneously, a divide-and-encode module is used to generate compact hash codes. Moreover, a loss function is introduced to minimize the prediction errors of the learned hash codes, which can lead to discriminative hash codes. Extensive experiments on two datasets demonstrate that the proposed method achieves superior performance compared with some state-of-the-art hashing methods.

• #1602
Confusion Graph: Detecting Confusion Communities in Large Scale Image Classification
Ruochun Jin, Yong Dou, Yueqing Wang, Xin Niu
Classification and Neural Networks

For deep CNN-based image classification models, we observe that confusions between classes with high visual similarity are much stronger than those between visually dissimilar classes. With these unbalanced confusions, classes can be organized in communities, similar to cliques of people in a social network. Based on this, we propose a graph-based tool named "confusion graph" to quantify these confusions and further reveal the community structure inside the database. With this community structure, we can diagnose the model's weaknesses and improve the classification accuracy using specialized expert sub-nets, an approach that is comparable to other state-of-the-art techniques. Utilizing this community information, we can also employ pre-trained models to automatically identify mislabeled images in a large-scale database. With our method, researchers only need to manually check approximately 3% of the ILSVRC2012 classification database to locate almost all mislabeled samples.
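
The idea of grouping mutually confused classes can be illustrated with a small sketch. This is not the authors' tool: the function name `confusion_communities`, the symmetric edge weight, the threshold, and the use of connected components in place of a proper community-detection algorithm are all assumptions made for illustration.

```python
from collections import defaultdict


def confusion_communities(confusion, threshold):
    """Group classes whose mutual confusion rate exceeds `threshold`.

    `confusion[i][j]` counts class-i samples predicted as class j.
    Connected components stand in for a real community-detection step
    (an assumption; the abstract does not specify the algorithm).
    """
    n = len(confusion)
    neighbours = defaultdict(set)
    for i in range(n):
        total_i = sum(confusion[i]) or 1
        for j in range(i + 1, n):
            total_j = sum(confusion[j]) or 1
            # Symmetric edge weight: mean of the two misclassification rates.
            weight = 0.5 * (confusion[i][j] / total_i + confusion[j][i] / total_j)
            if weight > threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    seen, communities = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        communities.append(sorted(component))
    return communities


# Toy 4-class confusion matrix: classes 0/1 and 2/3 are confused with each other.
toy = [
    [80, 18, 1, 1],
    [15, 83, 1, 1],
    [1, 1, 78, 20],
    [0, 2, 22, 76],
]
print(confusion_communities(toy, threshold=0.1))
```

On real data the threshold would need tuning, and a modularity-based method would likely give finer communities than plain connected components.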

• #2830
Identifying Human Mobility via Trajectory Embeddings
Qiang Gao, Fan Zhou, Kunpeng Zhang, Goce Trajcevski, Xucheng Luo, Fengli Zhang
Classification and Neural Networks

Understanding human trajectory patterns is an important task in many location-based social network (LBSN) applications, such as personalized recommendation and preference-based route planning. Most existing methods classify a trajectory (or its segments), based on spatio-temporal values and activities, into some predefined categories, e.g., walking or jogging. We tackle a novel trajectory classification problem: we identify and link trajectories to the users who generate them in the LBSNs, a problem called Trajectory-User Linking (TUL). Solving the TUL problem is not a trivial task because: (1) the number of classes (i.e., users) is much larger than the number of motion patterns in common trajectory classification problems; and (2) location-based trajectory data, especially check-ins, are often extremely sparse. To address these challenges, a Recurrent Neural Network (RNN) based semi-supervised learning model called TULER (TUL via Embedding and RNN) is proposed, which exploits the spatio-temporal data to capture the underlying semantics of user mobility patterns. Experiments conducted on real-world datasets demonstrate that TULER achieves better accuracy than the existing methods.

• #3311
Name Nationality Classification with Recurrent Neural Networks
Jinhyuk Lee, Hyunjae Kim, Miyoung Ko, Donghee Choi, Jaehoon Choi, Jaewoo Kang
Classification and Neural Networks

Personal names tend to have many variations differing from country to country. Though there exists a large number of personal names on the Web, nationality prediction solely based on names has not been fully studied due to the difficulty of extracting subtle character-level features. We propose a recurrent neural network based model which predicts the nationality of each name using automatic feature extraction. Evaluation on Olympic record data shows that our model achieves greater accuracy than previous feature-based approaches in nationality prediction tasks. We also evaluate our proposed model and baseline models on a name ethnicity classification task, again achieving better or comparable performance. We further investigate the effectiveness of the character embeddings used in our proposed model.

• #3712
Improving Classification Accuracy of Feedforward Neural Networks for Spiking Neuromorphic Chips
Antonio Jimeno Yepes, Jianbin Tang, Benjamin Scott Mashford
Classification and Neural Networks

Deep Neural Networks (DNNs) achieve human-level performance in many image analytics tasks, but DNNs are mostly deployed to GPU platforms that consume a considerable amount of power. New hardware platforms using lower-precision arithmetic achieve drastic reductions in power consumption. More recently, brain-inspired spiking neuromorphic chips have achieved even lower power consumption, on the order of milliwatts, while still offering real-time processing. However, to deploy DNNs to energy-efficient neuromorphic chips, the incompatibility between the continuous neurons and synaptic weights of traditional DNNs and the discrete spiking neurons and synapses of neuromorphic chips needs to be overcome. Previous work has achieved this by training a network to learn continuous probabilities before it is deployed to a neuromorphic architecture, such as the IBM TrueNorth Neurosynaptic System, by randomly sampling these probabilities. The main contribution of this paper is a new learning algorithm that learns a TrueNorth configuration ready for deployment. We achieve this by directly training a binary hardware crossbar that accommodates the TrueNorth axon configuration constraints, and we propose a different neuron model. Results of our approach trained on electroencephalogram (EEG) data show a significant improvement over previous work (76% vs 86% accuracy) while maintaining state-of-the-art performance on the MNIST handwritten data set.

• #3963
Object Recognition with and without Objects
Zhuotun Zhu, Lingxi Xie, Alan Yuille
Classification and Neural Networks

While recent deep neural networks have achieved promising performance on object recognition, they rely implicitly on the visual contents of the whole image. In this paper, we train deep neural networks on the foreground (object) and background (context) regions of images respectively. Compared with human recognition in the same situations, networks trained on the pure background without objects achieve highly reasonable recognition performance that beats humans by a large margin when only context is given. However, humans still outperform networks when only the pure object is available, which indicates that networks and human beings have different mechanisms for understanding an image. Furthermore, we straightforwardly combine multiple trained networks to explore the different visual cues learned by different networks. Experiments show that useful visual hints can be explicitly learned separately and then combined to achieve higher performance, which verifies the advantages of the proposed framework.

### Thursday 24 08:30 - 10:00 ML-DLV2 - Deep Learning and Vision 2 (210)

Chair: Guiguang Ding
• #1520
Importance-Aware Semantic Segmentation for Autonomous Driving System
Bi-ke Chen, Chen Gong, Jian Yang
Deep Learning and Vision 2

Semantic Segmentation (SS) partitions an image into several coherent, semantically meaningful parts, and classifies each part into one of the pre-determined classes. In this paper, we argue that existing SS methods cannot be reliably applied to autonomous driving systems as they ignore the different importance levels of distinct classes for safe driving. For example, pedestrians in the scene are much more important than sky when driving a car, so their segmentations should be as accurate as possible. To incorporate the importance information possessed by various object classes, this paper designs an "Importance-Aware Loss" (IAL) that specifically emphasizes the critical objects for autonomous driving. IAL operates under a hierarchical structure, and classes with different importance are located in different levels so that they are assigned distinct weights. Furthermore, we derive the forward and backward propagation rules for IAL and apply them to deep neural networks for realizing SS in intelligent driving systems. The experiments on the CamVid and Cityscapes datasets reveal that by employing the proposed loss function, existing deep learning models including FCN, SegNet and ENet are able to consistently obtain improved segmentation results on the pre-defined important classes for safe driving.
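
The effect of weighting classes by importance can be seen in a minimal sketch. This is a flat class-weighted cross-entropy, not the hierarchical IAL described in the abstract; the function name `weighted_cross_entropy` and the example weights are assumptions chosen for illustration.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Mean per-pixel cross-entropy where each class carries its own weight.

    probs[i][c]  : predicted probability of class c at pixel i
    labels[i]    : ground-truth class index at pixel i
    class_weights: hypothetical importance weight per class
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -class_weights[y] * math.log(p[y])
    return total / len(labels)


# Two pixels: class 0 = sky, class 1 = pedestrian (illustrative labels only).
probs = [[0.9, 0.1], [0.4, 0.6]]
labels = [0, 1]

uniform = weighted_cross_entropy(probs, labels, [1.0, 1.0])
importance = weighted_cross_entropy(probs, labels, [1.0, 3.0])
print(uniform, importance)
```

With the larger pedestrian weight, the same prediction error on the pedestrian pixel contributes three times as much to the loss, so gradient updates would favor fixing it first.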

• #1737
Multi-Stream Deep Similarity Learning Networks for Visual Tracking
Kunpeng Li, Yu Kong, Yun Fu
Deep Learning and Vision 2

Visual tracking has achieved remarkable success in recent decades, but it remains a challenging problem due to appearance variations over time and complex cluttered backgrounds. In this paper, we adopt a tracking-by-verification scheme to overcome these challenges by determining the patch in the subsequent frame that is most similar to the target template and distinct from the background context. A multi-stream deep similarity learning network is proposed to learn the similarity comparison model. The loss function of our network encourages the distance between a positive patch in the search region and the target template to be smaller than that between the positive patch and the background patches. Within the learned feature space, even if the distance between positive patches becomes large because of appearance change or interference from background clutter, our method can use the relative distance to distinguish the target robustly. Moreover, the learned model is used directly for tracking with no need for model updating or parameter fine-tuning, and it can run at 45 fps on a single GPU. Our tracker achieves state-of-the-art performance on the visual tracking benchmark compared with other recent real-time-speed trackers, and shows better capability in handling background clutter, occlusion and appearance change.
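
The relative-distance objective described above can be sketched as a hinge-style ranking loss on feature vectors. The exact loss used in the paper is not given in the abstract; the margin, the squared Euclidean distance, and the name `relative_distance_loss` are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relative_distance_loss(template, positive, negatives, margin=1.0):
    """Hinge loss that is zero once the positive patch is closer to the
    template than to every background patch by at least `margin`."""
    d_pos = sq_dist(positive, template)
    losses = [max(0.0, margin + d_pos - sq_dist(positive, neg)) for neg in negatives]
    return sum(losses) / len(losses)


template = [0.0, 0.0]
positive = [0.1, 0.0]

# Background patches far from the positive patch: constraint satisfied.
easy = relative_distance_loss(template, positive, [[3.0, 0.0], [0.0, 4.0]])
# A background patch as close as the template: constraint violated.
hard = relative_distance_loss(template, positive, [[0.2, 0.0]])
print(easy, hard)
```

Because only the difference between the two distances matters, the loss stays low even when the positive patch drifts from the template, provided the background patches drift further.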

• #1899
Person Re-Identification by Deep Joint Learning of Multi-Loss Classification
Wei Li, Xiatian Zhu, Shaogang Gong
Deep Learning and Vision 2

Existing person re-identification (re-id) methods rely mostly on either localised or global feature representation. This ignores their joint benefit and mutual complementary effects. In this work, we show the advantages of jointly learning local and global features in a Convolutional Neural Network (CNN) by aiming to discover correlated local and global features in different contexts. Specifically, we formulate a method for joint learning of local and global feature selection losses designed to optimise person re-id when using generic matching metrics such as the L2 distance. We design a novel CNN architecture for Jointly Learning Multi-Loss (JLML) of local and global discriminative feature optimisation subject concurrently to the same re-id labelled information. Extensive comparative evaluations demonstrate the advantages of this new JLML model for person re-id over a wide range of state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03, Market-1501).

• #2039
Locality Constrained Deep Supervised Hashing for Image Retrieval
Hao Zhu, Shenghua Gao
Deep Learning and Vision 2

Deep Convolutional Neural Network (DCNN) based deep hashing has shown its success for fast and accurate image retrieval. However, directly minimizing the quantization error in deep hashing will change the distribution of DCNN features, and consequently change the similarity between the query and the retrieved images in hashing. In this paper, we propose a novel Locality-Constrained Deep Supervised Hashing. By simultaneously learning discriminative DCNN features and preserving the similarity between image pairs, the hash codes of our scheme preserve the distribution of DCNN features and thus favor accurate image retrieval. The contributions of this paper are two-fold: i) our analysis shows that minimizing quantization error in deep hashing makes the features less discriminative, which is not desirable for image retrieval; ii) we propose a Locality-Constrained Deep Supervised Hashing which preserves the similarity between image pairs in hashing. Extensive experiments on the CIFAR-10 and NUS-WIDE datasets show that our method significantly boosts the accuracy of image retrieval; on the CIFAR-10 dataset in particular, the improvement is usually more than 6% in terms of the MAP measurement. Further, our method is 10 times faster than state-of-the-art methods in the training phase.
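
For readers unfamiliar with the term, the quantization error in hashing is the gap between a continuous feature vector and its binarized code. The sketch below uses sign binarization and a squared-error measure, which are common conventions assumed here rather than details taken from the paper; the helper names are hypothetical.

```python
def binarize(features):
    """Map each real-valued feature to a code bit in {-1, +1} by its sign."""
    return [1 if x >= 0 else -1 for x in features]


def quantization_error(features):
    """Squared distance between the features and their binary code."""
    codes = binarize(features)
    return sum((x - c) ** 2 for x, c in zip(features, codes))


def hamming(code_a, code_b):
    """Number of positions where two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


query = [0.9, -0.2, 0.1]
item = [0.8, 0.3, 0.4]

print(binarize(query), quantization_error(query))
print(hamming(binarize(query), binarize(item)))
```

Driving this error to zero pushes every feature toward exactly -1 or +1, which is the distortion of the feature distribution that the abstract warns about.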

• #2627
Deep Supervised Hashing with Nonlinear Projections
Sen Su, Gang Chen, Xiang Cheng, Rong Bi
Deep Learning and Vision 2

Hashing has attracted broad research interest in large-scale image retrieval due to its high search speed and efficient storage. Recently, many deep hashing methods have been proposed to perform simultaneous nonlinear feature learning and hash projection learning, which have shown superior performance compared to hand-crafted feature based hashing methods. Nonlinear projection functions have shown their advantages over linear ones due to their powerful generalization capabilities. To improve the performance of deep hashing methods by generalizing projection functions, we propose the idea of implementing a pure nonlinear deep hashing network architecture. Building on this idea, this paper presents a Deep Supervised Hashing architecture with Nonlinear Projections (DSHNP). In particular, soft decision trees are adopted as the nonlinear projection functions, since they can generate differentiable nonlinear outputs and can be trained with deep neural networks in an end-to-end way. Moreover, to make the hash codes as independent as possible, we design two regularizers imposed on the parameter matrices of the leaves in the soft decision trees. Extensive evaluations on two benchmark image datasets show that the proposed DSHNP outperforms several state-of-the-art hashing methods.

• #2597
Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems
Quan Liu, Hui Jiang, Andrew Evdokimov, Zhen-Hua Ling, Xiaodan Zhu, Si Wei, Yu Hu
Deep Learning and Vision 2

This paper investigates the Winograd Schema (WS), a challenging problem which has been proposed for measuring progress in commonsense reasoning. Due to the lack of commonsense knowledge and training data, very little work has been done on WS problems in recent years. In fact, there is no shortcut to solving this problem except to collect more commonsense knowledge and design suitable models. Therefore, this paper addresses a set of WS problems by proposing a knowledge acquisition method and a general neural association model. To avoid the sparseness issue, the knowledge we aim to collect is the cause-effect relationships between thousands of commonly used words. The knowledge acquisition method allows us to extract hundreds of thousands of cause-effect pairs from a large text corpus automatically. Meanwhile, a neural association model (NAM) is proposed to encode the association relationships between any two discrete events. Based on the extracted knowledge and the NAM models, we successfully build a system for solving WS problems from scratch and achieve 70.0% accuracy. Most importantly, this paper provides a flexible framework to solve WS problems based on event association and neural network methods.

### Thursday 24 08:30 - 10:00 ML-DM3 - Data Mining 3 (211)

Chair: Joao Gama
• #2339
Linear Manifold Regularization with Adaptive Graph for Semi-supervised Dimensionality Reduction
Kai Xiong, Feiping Nie, Junwei Han
Data Mining 3

Many previous graph-based methods perform dimensionality reduction on a pre-defined graph. However, due to the noise and redundant information in the original data, the pre-defined graph has no clear structure and may not be appropriate for the subsequent task. To overcome the drawbacks, in this paper, we propose a novel approach called linear manifold regularization with adaptive graph (LMRAG) for semi-supervised dimensionality reduction. LMRAG directly incorporates the graph construction into the objective function, thus the projection matrix and the optimal graph can be simultaneously optimized. Due to the structure constraint, the learned graph is sparse and has clear structure. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.

• #2350
Dynamic Weighted Majority for Incremental Learning of Imbalanced Data Streams with Concept Drift
Yang Lu, Yiu-ming Cheung, Yuan Yan Tang
Data Mining 3

Concept drifts occurring in data streams will jeopardize the accuracy and stability of the online learning process. If the data stream is imbalanced, it will be even more challenging to detect and handle the concept drift. In the literature, these two problems have been intensively addressed separately, but have yet to be well studied when they occur together. In this paper, we propose a chunk-based incremental learning method called Dynamic Weighted Majority for Imbalance Learning (DWMIL) to deal with data streams that exhibit both concept drift and class imbalance. DWMIL utilizes an ensemble framework by dynamically weighting the base classifiers according to their performance on the current data chunk. Compared with the existing methods, its merits are four-fold: (1) it remains stable for non-drifted streams and quickly adapts to a new concept; (2) it is totally incremental, i.e. no previous data needs to be stored; (3) it keeps a limited number of classifiers to ensure high efficiency; and (4) it is simple and needs only one thresholding parameter. Experiments on both synthetic and real data sets with concept drift show that DWMIL performs better than the state-of-the-art competitors, with less computational cost.
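
A chunk-based weighted ensemble of this kind can be sketched as follows. The update rule (multiply each weight by one minus the chunk error, drop members below a threshold, add one new learner per chunk) is an assumption made for illustration, as are the 1-D stump learner and all function names; the abstract does not give DWMIL's actual formulas, and this sketch ignores class imbalance entirely.

```python
def train_stump(chunk):
    """Hypothetical base learner: 1-D threshold midway between class means."""
    pos = [x for x, y in chunk if y == 1]
    neg = [x for x, y in chunk if y == 0]
    if not pos or not neg:
        label = 1 if pos else 0
        return lambda x: label
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    mid = 0.5 * (mean_pos + mean_neg)
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x - mid) >= 0 else 0


def error_rate(clf, chunk):
    return sum(clf(x) != y for x, y in chunk) / len(chunk)


def update_ensemble(ensemble, chunk, threshold=0.1):
    """Reweight members on the current chunk, prune weak ones, add a new one."""
    updated = []
    for clf, weight in ensemble:
        weight *= 1.0 - error_rate(clf, chunk)  # assumed decay rule
        if weight >= threshold:                  # single pruning parameter
            updated.append((clf, weight))
    updated.append((train_stump(chunk), 1.0))
    return updated


def predict(ensemble, x):
    """Weighted majority vote over the current members."""
    vote = sum(w * (1 if clf(x) == 1 else -1) for clf, w in ensemble)
    return 1 if vote >= 0 else 0


chunk_before = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]
chunk_after = [(0.0, 1), (1.0, 1), (4.0, 0), (5.0, 0)]  # labels flipped: drift

ensemble = update_ensemble([], chunk_before)
before = predict(ensemble, 5.0)
ensemble = update_ensemble(ensemble, chunk_after)
after = predict(ensemble, 5.0)
print(before, after, len(ensemble))
```

Only the current chunk is ever touched, which mirrors the "totally incremental" property claimed in the abstract.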

• #2728
Semi-supervised Orthogonal Graph Embedding with Recursive Projections
Hanyang Liu, Junwei Han, Feiping Nie
Data Mining 3

Many graph-based semi-supervised dimensionality reduction algorithms utilize a projection matrix to linearly map the data matrix from the original feature space to a lower-dimensional representation. But the dimensionality after reduction is inevitably restricted to the number of classes, and the learned non-orthogonal projection matrix usually fails to preserve distances well and to balance the weight on different projection directions. This paper proposes a novel dimensionality reduction method, called semi-supervised orthogonal graph embedding with recursive projections (SOGE). We integrate the manifold smoothness and label fitness as well as the penalization of the linear mapping mismatch, and learn the orthogonal projection on the Stiefel manifold, which empirically demonstrates better performance. Moreover, we recursively update the projection matrix in its orthocomplemented space to continuously learn more projection vectors, so as to better control the dimension of reduction. Comprehensive experiments on several benchmarks demonstrate significant improvement over the existing methods.

• #2438
Self-paced Mixture of Regressions
Longfei Han, Dingwen Zhang, Dong Huang, Xiaojun Chang, Jun Ren, Senlin Luo, Junwei Han
Data Mining 3

Mixture of regressions (MoR) is a well-established and effective approach to modeling discontinuous and heterogeneous data in regression problems. Existing MoR approaches assume a smooth joint distribution for its good analytic properties. However, such an assumption makes existing MoR very sensitive to intra-component outliers (the noisy training data residing in certain components) and to inter-component imbalance (the different amounts of training data in different components). In this paper, we make the earliest effort on Self-paced Learning (SPL) in MoR, i.e., the Self-paced mixture of regressions (SPMoR) model. We propose a novel self-paced regularizer based on the Exclusive LASSO, which improves the inter-component balance of training data. As a robust learning regime, SPL pursues confidence sample reasoning. To demonstrate the effectiveness of SPMoR, we conducted experiments on both synthetic examples and real-world applications to age estimation and glucose estimation. The results show that SPMoR outperforms the state-of-the-art methods.

• #3301
Locally Linear Factorization Machines
Chenghao Liu, Teng Zhang, Peilin Zhao, Jun Zhou, Jianling Sun
Data Mining 3

Factorization Machines (FMs) are a widely used method for efficiently using high-order feature interactions in classification and regression tasks. Unfortunately, despite increasing interest in FMs, existing work only considers high-order information of the input features, which limits their capacity in non-linear problems and fails to capture the underlying structures of more complex data. In this work, we present novel Locally Linear Factorization Machines (LLFM), which overcome this limitation by exploring a local coding technique. Existing local coding classifiers involve a phase of unsupervised anchor point learning and a predefined local coding scheme; this is suboptimal, as the class label information is not exploited in discovering the encoding, and can thus result in a suboptimal encoding for prediction. In contrast, we formulate a joint optimization over the anchor points, local coding coordinates and FM variables to minimize classification or regression risk. Empirically, we demonstrate that our approach achieves much better predictive accuracy than other competitive methods which employ LLFM with unsupervised anchor point learning and a predefined local coding scheme.
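
As background, the prediction rule of a standard second-order factorization machine can be sketched as below. This is the plain FM model, not the locally linear variant proposed in the paper, whose formulation the abstract does not give. The parameter names `w0`, `w` and `V` follow common usage and are assumptions here.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM prediction using the linear-time identity
    sum_{i<j} <V[i], V[j]> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2 ].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)
        pairwise += 0.5 * (s * s - sum(t * t for t in terms))
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model with the explicit double loop over feature pairs."""
    out = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            out += dot * x[i] * x[j]
    return out


x = [1.0, 2.0, 0.5]
w0, w = 0.1, [0.2, -0.1, 0.3]
V = [[1.0, 0.0], [0.5, 1.0], [-1.0, 2.0]]  # one 2-D latent vector per feature

print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

The two functions compute the same value; the first costs time linear in the number of features per latent dimension, which is what makes FMs practical on sparse, high-dimensional inputs.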

• #3336
Robust Survey Aggregation with Student-t Distribution and Sparse Representation
Qingtao Tang, Tao Dai, Li Niu, Yisen Wang, Shu-Tao Xia, Jianfei Cai
Data Mining 3

Most existing survey aggregation methods assume that the sample data follow a Gaussian distribution. However, these methods are sensitive to outliers, due to the thin-tailed property of the Gaussian distribution. To address this issue, we propose a robust survey aggregation method based on the Student-t distribution and sparse representation. Specifically, we assume that the samples follow a Student-t distribution, instead of the common Gaussian distribution. Due to the Student-t distribution, our method is robust to outliers, which can be explained from both a Bayesian and a non-Bayesian point of view. In addition, inspired by the James-Stein estimator (JS) and Compressive Averaging (CAvg), we propose to sparsely represent the global mean vector by an adaptive basis comprising both a data-specific basis and combined generic bases. Theoretically, we prove that JS and CAvg are special cases of our method. Extensive experiments demonstrate that our proposed method achieves significant improvement over the state-of-the-art methods on both synthetic and real datasets.
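
Why a Student-t model resists outliers can be shown with a small one-dimensional sketch. It estimates only a location parameter with the scale and degrees of freedom held fixed, using the standard EM reweighting for the Student-t distribution; it is not the paper's aggregation method, and the values of `nu`, `sigma` and the function name are assumptions.

```python
import statistics


def student_t_location(data, nu=3.0, sigma=1.0, iterations=50):
    """EM-style estimate of a 1-D Student-t location parameter.

    Each point gets weight (nu + 1) / (nu + r^2), with r the standardized
    residual, so points far from the current estimate count for very little.
    Scale `sigma` and degrees of freedom `nu` are fixed (assumed values).
    """
    mu = statistics.median(data)  # robust starting point
    for _ in range(iterations):
        weights = [(nu + 1.0) / (nu + ((x - mu) / sigma) ** 2) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu


samples = [1.0, 1.1, 0.9, 1.0, 50.0]  # last value is an outlier

gaussian_mean = sum(samples) / len(samples)  # maximum likelihood under a Gaussian
robust_mean = student_t_location(samples)
print(gaussian_mean, robust_mean)
```

The Gaussian estimate is pulled far toward the outlier, while the reweighted estimate stays near the bulk of the data because the outlier's weight shrinks roughly with the inverse square of its residual.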

### Thursday 24 08:30 - 10:00 ML-TAML1 - transfer, Adaptation, Multi-Task Learning 1 (212)

Chair: Tongliang Liu
• #1741
Learning Latest Classifiers without Additional Labeled Data
Atsutoshi Kumagai, Tomoharu Iwata
transfer, Adaptation, Multi-Task Learning 1

In various applications such as spam mail classification, the performance of classifiers deteriorates over time. Although retraining classifiers using labeled data helps to maintain the performance, continuously preparing labeled data is quite expensive. In this paper, we propose a method to learn classifiers by using newly obtained unlabeled data, which are easy to prepare, as well as labeled data collected beforehand. A major reason for the performance deterioration is the emergence of new features that do not appear in the training phase. Another major reason is the change of the distribution between the training and test phases. The proposed method learns the latest classifiers that overcome both problems. With the proposed method, the conditional distribution of new features given existing features is learned using the unlabeled data. In addition, the proposed method estimates the density ratio between training and test distributions by using the labeled and unlabeled data. We approximate the classification error of a classifier, which exploits new features as well as existing features, at the test phase by incorporating both the conditional distribution of new features and the density ratio simultaneously. By minimizing the approximated error while integrating out new feature values, we obtain a classifier that exploits new features and fits the test phase. The effectiveness of the proposed method is demonstrated with experiments using synthetic and real-world data sets.

• #1957
Dependency Exploitation: A Unified CNN-RNN Approach for Visual Emotion Recognition
Xinge Zhu, Liang Li, Weigang Zhang, Tianrong Rao, Min Xu, Qingming Huang, Dong Xu
transfer, Adaptation, Multi-Task Learning 1

Visual emotion recognition aims to associate images with appropriate emotions. There are different visual stimuli that can affect human emotion from low-level to high-level, such as color, texture, part, object, etc. However, most existing methods treat different levels of features as independent entities without an effective method for feature fusion. In this paper, we propose a unified CNN-RNN model to predict the emotion based on the fused features from different levels by exploiting the dependency among them. Our proposed architecture leverages a convolutional neural network (CNN) with multiple layers to extract different levels of features within a multi-task learning framework, in which two related loss functions are introduced to learn the feature representation. Considering the dependencies within the low-level and high-level features, a new bidirectional recurrent neural network (RNN) is proposed to integrate the learned features from different layers in the CNN model. Extensive experiments on both Internet images and art photo datasets demonstrate that our method outperforms the state-of-the-art methods with at least 7% performance improvement.

• #2062
Learning Discriminative Correlation Subspace for Heterogeneous Domain Adaptation
Yuguang Yan, Wen Li, Michael Ng, Mingkui Tan, Hanrui Wu, Huaqing Min, Qingyao Wu
transfer, Adaptation, Multi-Task Learning 1

Domain adaptation aims to reduce the effort of collecting and annotating target data by leveraging knowledge from a different source domain. The domain adaptation problem becomes extremely challenging when the feature spaces of the source and target domains are different, which is also known as the heterogeneous domain adaptation (HDA) problem. In this paper, we propose a novel HDA method to find the optimal discriminative correlation subspace for the source and target data. The discriminative correlation subspace is inherited from the canonical correlation subspace between the source and target data, and is further optimized to maximize the discriminative ability for the target domain classifier. We formulate a joint objective in order to simultaneously learn the discriminative correlation subspace and the target domain classifier. We then apply an alternating direction method of multipliers (ADMM) algorithm to address the resulting non-convex optimization problem. Comprehensive experiments on two real-world data sets demonstrate the effectiveness of the proposed method compared to the state-of-the-art methods.

• #2690
AccGenSVM: Selectively Transferring from Previous Hypotheses
Diana Benavides-Prado, Yun Sing Koh, Patricia Riddle
transfer, Adaptation, Multi-Task Learning 1

In our research, we consider transfer learning scenarios where a target learner does not have access to the source data, but instead to hypotheses or models induced from it. This is called the Hypothesis Transfer Learning (HTL) problem. Previous approaches concentrated on transferring source hypotheses as a whole. We introduce a novel method for selectively transferring elements from previous hypotheses learned with Support Vector Machines. The representation of an SVM hypothesis as a set of support vectors allows us to treat this information as privileged to aid learning during a new task. Given a possibly large number of source hypotheses, our approach selects the source support vectors that more closely resemble the target data, and transfers their learned coefficients as constraints on the coefficients to be learned. This strategy increases the importance of relevant target data points based on their similarity to source support vectors, while learning from the target data. Our method shows important improvements on the convergence rate on three classification datasets of varying sizes, decreasing the number of iterations by up to 56% on average compared to learning with no transfer and up to 92% compared to regular HTL, while maintaining similar accuracy levels.

• #2933
Privileged Multi-label Learning
Shan You, Chang Xu, Yunhe Wang, Chao Xu, Dacheng Tao
transfer, Adaptation, Multi-Task Learning 1

This paper presents privileged multi-label learning (PrML) to explore and exploit the relationship between labels in multi-label learning problems. We suggest that each individual label can not only be implicitly connected with other labels via the low-rank constraint over label predictors, but its performance on examples can also receive explicit comments from other labels together acting as an Oracle teacher. We generate a privileged label feature for each example and its individual label, and then integrate it into the framework of low-rank based multi-label learning. The proposed algorithm can therefore comprehensively explore and exploit label relationships by inheriting all the merits of privileged information and low-rank constraints. We show that PrML can be efficiently solved by a dual coordinate descent algorithm using an iterative optimization strategy with cheap updates. Experiments on benchmark datasets show that through privileged label features, the performance can be significantly improved and PrML is superior to several competing methods in most cases.

• #3006
Boosted Zero-Shot Learning with Semantic Correlation Regularization
Te Pi, Xi Li, Zhongfei (Mark) Zhang
transfer, Adaptation, Multi-Task Learning 1

We study zero-shot learning (ZSL) as a transfer learning problem, and focus on the two key aspects of ZSL, model effectiveness and model adaptation. For effective modeling, we adopt the boosting strategy to learn a zero-shot classifier from weak models to a strong model. For adaptable knowledge transfer, we devise a Semantic Correlation Regularization (SCR) approach to regularize the boosted model to be consistent with the inter-class semantic correlations. With SCR embedded in the boosting objective, and with a self-controlled sample selection for learning robustness, we propose a unified framework, Boosted Zero-shot classification with Semantic Correlation Regularization (BZ-SCR). By balancing the SCR-regularized boosted model selection and the self-controlled sample selection, BZ-SCR is capable of capturing both discriminative and adaptable feature-to-class semantic alignments, while ensuring the reliability and adaptability of the learned samples. The experiments on two ZSL datasets show the superiority of BZ-SCR over the state of the art.

### Thursday 24 08:30 - 10:00 ML-REL1 - Reinforcement Learning 1 (213)

Chair: Jianye Hao
• #1444
Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions
Aijun Bai, Stuart Russell
Reinforcement Learning 1

In the context of hierarchical reinforcement learning, the idea of hierarchies of abstract machines (HAMs) is to write a partial policy as a set of hierarchical finite state machines with unspecified choice states, and use reinforcement learning to learn an optimal completion of this partial policy. Given a HAM with potentially deep hierarchical structure, there often exist many internal transitions where a machine calls another machine with the environment state unchanged. In this paper, we propose a new hierarchical reinforcement learning algorithm that discovers such internal transitions automatically, and short-circuits them recursively in the computation of Q values. The resulting HAMQ-INT algorithm outperforms the state of the art significantly on the benchmark Taxi domain and a much more complex RoboCup Keepaway domain.

• #2146
Multi-Task Deep Reinforcement Learning for Continuous Action Control
Zhaoyang Yang, Kathryn Merrick, Hussein Abbass, Lianwen Jin
Reinforcement Learning 1

In this paper, we propose a deep reinforcement learning algorithm to learn multiple tasks concurrently. A new network architecture is proposed in the algorithm which reduces the number of parameters needed by more than 75% per task compared to typical single-task deep reinforcement learning algorithms. The proposed algorithm and network fuse images with sensor data and were tested with up to 12 movement-based control tasks on a simulated Pioneer 3AT robot equipped with a camera and range sensors. Results show that the proposed algorithm and network can learn skills that are as good as the skills learned by a comparable single-task learning algorithm. Results also show that learning performance is consistent even when the number of tasks and the number of constraints on the tasks increased.

• #2268
End-to-end optimization of goal-driven and visually grounded dialogue systems
Florian Strub, Harm de Vries, Jérémie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin
Reinforcement Learning 1

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision may fail to correctly render the planning problem inherent to dialogue as well as its contextual and grounded nature. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on the question generation task from the dataset GuessWhat?! containing 120k dialogues and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.

• #2286
Sequence Prediction with Unlabeled Data by Reward Function Learning
Lijun Wu, Li Zhao, Tao Qin, Jianhuang Lai, Tie-Yan Liu
Reinforcement Learning 1

Reinforcement learning (RL), which has been successfully applied to sequence prediction, introduces reward as a sequence-level supervision signal to evaluate the quality of a generated sequence. Existing RL approaches use the ground-truth sequence to define reward, which limits the application of RL techniques to labeled data. Since labeled data is usually scarce and/or costly to collect, it is desirable to leverage large-scale unlabeled data. In this paper, we extend existing RL methods for sequence prediction to exploit unlabeled data. We propose to learn the reward function from labeled data and use the predicted reward as pseudo reward for unlabeled data so that we can learn from unlabeled data using the pseudo reward. To get good pseudo reward on unlabeled data, we propose an RNN-based reward network with attention mechanism, trained with purposely biased data distribution. Experiments show that the pseudo reward can provide good supervision and guide the learning process on unlabeled data. We observe significant improvements on both neural machine translation and text summarization.

• #2776
Autonomous Task Sequencing for Customized Curriculum Design in Reinforcement Learning
Sanmit Narvekar, Jivko Sinapov, Peter Stone
Reinforcement Learning 1

Transfer learning is a method where an agent reuses knowledge learned in a source task to improve learning on a target task. Recent work has shown that transfer learning can be extended to the idea of curriculum learning, where the agent incrementally accumulates knowledge over a sequence of tasks (i.e. a curriculum). In most existing work, such curricula have been constructed manually. Furthermore, they are fixed ahead of time, and do not adapt to the progress or abilities of the agent. In this paper, we formulate the design of a curriculum as a Markov Decision Process, which directly models the accumulation of knowledge as an agent interacts with tasks, and propose a method that approximates an execution of an optimal policy in this MDP to produce an agent-specific curriculum. We use our approach to automatically sequence tasks for 3 agents with varying sensing and action capabilities in an experimental domain, and show that our method produces curricula customized for each agent that improve performance relative to learning from scratch or using a different agent's curriculum.

• #2855
Improving Reinforcement Learning with Confidence-Based Demonstrations
Zhaodong Wang, Matthew E. Taylor
Reinforcement Learning 1

Reinforcement learning has had many successes, but in practice it often requires significant amounts of data to learn high-performing policies. One common way to improve learning is to allow a trained (source) agent to assist a new (target) agent. The goals in this setting are to 1) improve the target agent's performance, relative to learning unaided, and 2) allow the target agent to outperform the source agent. Our approach leverages source agent demonstrations, removing any requirements on the source agent's learning algorithm or representation. The target agent then estimates the source agent's policy and improves upon it. The key contribution of this work is to show that leveraging the target agent's uncertainty in the source agent's policy can significantly improve learning in two complex simulated domains, Keepaway and Mario.

### Thursday 24 08:30 - 10:00 KR-GT - Game Theory (216)

Chair: Son Tran
• #2787
Smoothing Method for Approximate Extensive-Form Perfect Equilibrium
Christian Kroer, Gabriele Farina, Tuomas Sandholm
Game Theory

Nash equilibrium is a popular solution concept for solving imperfect-information games in practice. However, it has a major drawback: it does not preclude suboptimal play in branches of the game tree that are not reached in equilibrium. Equilibrium refinements can mend this issue, but have experienced little practical adoption. This is largely due to a lack of scalable algorithms. Sparse iterative methods, in particular first-order methods, are known to be among the most effective algorithms for computing Nash equilibria in large-scale two-player zero-sum extensive-form games. In this paper, we provide, to our knowledge, the first extension of these methods to equilibrium refinements. We develop a smoothing approach for behavioral perturbations of the convex polytope that encompasses the strategy spaces of players in an extensive-form game. This enables one to compute an approximate variant of extensive-form perfect equilibria. Experiments show that our smoothing approach leads to solutions with dramatically stronger strategies at information sets that are reached with low probability in approximate Nash equilibria, while retaining the overall convergence rate associated with fast algorithms for Nash equilibrium. This has benefits both in approximate equilibrium finding (such approximation is necessary in practice in large games) where some probabilities are low while possibly heading toward zero in the limit, and exact equilibrium computation where the low probabilities are actually zero.

• #2048
Weakening Covert Networks by Minimizing Inverse Geodesic Length
Haris Aziz, Serge Gaspers, Kamran Najeebullah
Game Theory

We consider the problem of deleting nodes in a covert network to minimize its performance. The inverse geodesic length (IGL) is a well-known and widely used measure of network performance. It equals the sum of the inverse distances of all pairs of vertices. In the MinIGL problem the input is a graph $G$, a budget $k$, and a target IGL $T$, and the question is whether there exists a subset of vertices $X$ with $|X|=k$, such that the IGL of $G-X$ is at most $T$. In network analysis, the IGL is often used to evaluate how well heuristics perform in strengthening or weakening a network. In this paper, we undertake a study of the classical and parameterized complexity of the MinIGL problem. The problem is NP-complete even if $T=0$ and remains both NP-complete and $W[1]$-hard for parameter $k$ on bipartite and on split graphs. On the positive side, we design several multivariate algorithms for the problem. Our main result is an algorithm for MinIGL parameterized by the twin cover number.
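The IGL definition and the MinIGL decision question stated in this abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is an exhaustive check that only makes sense for tiny graphs, and it assumes an unweighted, undirected graph given as an adjacency dictionary, with unreachable pairs contributing zero. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def inverse_geodesic_length(adj):
    """Sum of 1/d(u, v) over unordered vertex pairs (assumed: unreachable pairs add 0)."""
    total = 0.0
    for u in adj:
        dist = bfs_distances(adj, u)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / 2  # every unordered pair was counted from both endpoints


def remove_vertices(adj, removed):
    """Return the graph G - X as a new adjacency dictionary."""
    removed = set(removed)
    return {
        u: [v for v in nbrs if v not in removed]
        for u, nbrs in adj.items()
        if u not in removed
    }


def min_igl_brute_force(adj, k, target):
    """Is there a vertex set X with |X| = k such that IGL(G - X) <= target?

    Tries every k-subset, so the running time is exponential in k.
    """
    return any(
        inverse_geodesic_length(remove_vertices(adj, subset)) <= target
        for subset in combinations(adj, k)
    )


# A path a - b - c - d used as a toy input.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(inverse_geodesic_length(path))
print(min_igl_brute_force(path, k=1, target=1.0))
```

On the toy path, deleting either interior vertex leaves a single edge, which is why a budget of one suffices for a target of 1.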

• #2347
The Tractability of the Shapley Value over Bounded Treewidth Matching Games
Gianluigi Greco, Francesco Lupia, Francesco Scarcello
Game Theory

Matching games form a class of coalitional games that has attracted much attention in the literature. Indeed, several results are known about the complexity of computing solution concepts over them. In particular, it is known that computing the Shapley value is intractable in general, formally #P-hard, and feasible in polynomial time over games defined on trees. In fact, it was an open problem whether or not this tractability result holds over classes of graphs properly including acyclic ones. The main contribution of the paper is to provide a positive answer to this question, by showing that the Shapley value is tractable for matching games defined over graphs having bounded treewidth. The proposed technique has been implemented and tested on classes of graphs having different sizes and treewidth at most three.
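To make the object of study concrete, the following sketch computes the Shapley value of a tiny matching game directly from the permutation definition. It is not the bounded-treewidth technique of the paper and it runs in exponential time. It assumes the common unweighted formulation, which the abstract does not spell out: players are vertices, and the worth of a coalition is the size of a maximum matching in the subgraph it induces. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def max_matching_size(vertices, edges):
    """Size of a maximum matching in the subgraph induced by `vertices` (brute force)."""
    usable = [(u, v) for u, v in edges if u in vertices and v in vertices]

    def best(remaining, used):
        if not remaining:
            return 0
        (u, v), rest = remaining[0], remaining[1:]
        skip = best(rest, used)  # option 1: leave this edge out
        if u in used or v in used:
            return skip
        return max(skip, 1 + best(rest, used | {u, v}))  # option 2: take it

    return best(usable, frozenset())


def shapley_values(players, edges):
    """Average marginal contribution of each player over all arrival orders."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = max_matching_size(coalition, edges)
            coalition.add(p)
            totals[p] += max_matching_size(coalition, edges) - before
    return {p: total / len(orders) for p, total in totals.items()}


# Path a - b - c: the middle vertex is needed for every matched edge.
values = shapley_values(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(values)  # a and c get 1/6 each, b gets 2/3
```

The values sum to the worth of the grand coalition (one matched edge), as the efficiency property of the Shapley value requires.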

• #3377
An Algorithm for Constructing and Solving Imperfect Recall Abstractions of Large Extensive-Form Games
Jiri Cermak, Branislav Bošanský, Viliam Lisý
Game Theory

We solve large two-player zero-sum extensive-form games with perfect recall. We propose a new algorithm based on fictitious play that significantly reduces memory requirements for storing average strategies. The key feature is exploiting imperfect recall abstractions while preserving the convergence rate and guarantees of fictitious play applied directly to the perfect recall game. The algorithm creates a coarse imperfect recall abstraction of the perfect recall game and automatically refines its information set structure only where the imperfect recall might cause problems. Experimental evaluation shows that our novel algorithm is able to solve a simplified poker game with $7 \cdot 10^5$ information sets using an abstracted game with only 1.8% of information sets of the original game. Additional experiments on poker and randomly generated games suggest that the relative size of the abstraction decreases as the size of the solved games increases.

• #3425
Nash Equilibria in Concurrent Games with Lexicographic Preferences
Julian Gutierrez, Aniello Murano, Giuseppe Perelli, Sasha Rubin, Michael Wooldridge
Game Theory

We study concurrent games with finite-memory strategies where players are given a Büchi and a mean-payoff objective, which are related by a lexicographic order: a player first prefers to satisfy its Büchi objective, and then prefers to minimise costs, which are given by a mean-payoff function. In particular, we show that deciding the existence of a strict Nash equilibrium in such games is decidable, even if players' deviations are implemented as infinite memory strategies.
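The lexicographic order described in this abstract can be sketched as a simple comparison of outcomes. This is only an illustration of the preference relation, not of the decision procedure. The `Outcome` record and the function names are hypothetical, and the outcome of a play is assumed to be already summarised as a satisfaction flag and a cost.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    buchi_satisfied: bool    # was the qualitative objective met?
    mean_payoff_cost: float  # long-run average cost, lower is better


def preference_key(outcome):
    """Sort key: satisfied outcomes first (False < True), then lower cost."""
    return (not outcome.buchi_satisfied, outcome.mean_payoff_cost)


def strictly_prefers(a, b):
    """True if a player with lexicographic preferences strictly prefers a to b."""
    return preference_key(a) < preference_key(b)


# Satisfying the first objective dominates any difference in cost.
print(strictly_prefers(Outcome(True, 10.0), Outcome(False, 0.0)))  # True
```

Cost only breaks ties between outcomes that agree on the first objective, which is what distinguishes this order from a weighted combination of the two goals.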

• #1938
Multiple-Profile Prediction-of-Use Games
Andrew Perrault, Craig Boutilier
Game Theory

Prediction-of-use (POU) games (Robu et al., 2017) address the mismatch between energy supplier costs and the incentives imposed on consumers by a fixed-rate electricity tariff. However, the framework does not address how consumers should coordinate to maximize social welfare. To address this, we develop MPOU games, an extension of POU games in which agents report multiple acceptable electricity use profiles. We show that MPOU games share many attractive properties with POU games (e.g., convexity). Despite this, MPOU games introduce new incentive issues that prevent the consequences of convexity from being exploited directly, a problem we analyze and resolve. We validate our approach with experimental results using utility models learned from real electricity use data.

### Thursday 24 08:30 - 10:00 MT-PUM - Personalisation and user Modelling (217)

Chair: Reyhan Aydogan
• #1979
Learning User Dependencies for Recommendation
Yong Liu, Peilin Zhao, Xin Liu, Min Wu, Lixin Duan, Xiao-Li Li
Personalisation and user Modelling

Social recommender systems exploit users' social relationships to improve recommendation accuracy. Intuitively, a user tends to trust different people in different scenarios. Therefore, one main challenge of social recommendation is to exploit the most appropriate dependencies between users for a given recommendation task. Previous social recommendation methods are usually developed based on pre-defined user dependencies. Thus, they may not be optimal for a specific recommendation task. In this paper, we propose a novel recommendation method, named probabilistic relational matrix factorization (PRMF), which can automatically learn the dependencies between users to improve recommendation accuracy. In PRMF, users' latent features are assumed to follow a matrix variate normal (MVN) distribution. Both positive and negative user dependencies can be modeled by the row precision matrix of the MVN distribution. Moreover, we also propose an alternating optimization algorithm to solve the optimization problem of PRMF. Extensive experiments on four real datasets have been performed to demonstrate the effectiveness of the proposed PRMF model.

• #873
Exploiting Music Play Sequence for Music Recommendation
Zhiyong Cheng, Jialie Shen, Lei Zhu, Mohan Kankanhalli, Liqiang Nie
Personalisation and user Modelling

Users leave digital footprints when interacting with various music streaming services. Music play sequence, which contains rich information about personal music preference and song similarity, has been largely ignored in previous music recommender systems. In this paper, we explore the effects of music play sequence on developing effective personalized music recommender systems. Towards the goal, we propose to use word embedding techniques in music play sequences to estimate the similarity between songs. The learned similarity is then embedded into matrix factorization to boost the latent feature learning and discovery. Furthermore, the proposed method only considers the k-nearest songs (e.g., k = 5) in the learning process and thus avoids the increase of time complexity. Experimental results on two public datasets demonstrate that our methods could significantly improve the performance of both rating prediction and top-n recommendation tasks.

• #1349
Beyond Universal Saliency: Personalized Saliency Prediction with Multi-task CNN
Yanyu Xu, Nianyi Li, Junru Wu, Jingyi Yu, Shenghua Gao
Personalisation and user Modelling

Saliency detection is a long-standing problem in computer vision. Tremendous efforts have been focused on exploring a universal saliency model across users despite their differences in gender, race, age, etc. Yet recent psychology studies suggest that saliency is highly specific rather than universal: individuals exhibit heterogeneous gaze patterns when viewing an identical scene containing multiple salient objects. In this paper, we first show that such heterogeneity is common and critical for reliable saliency prediction. Our study also produces the first database of personalized saliency maps (PSMs). We model PSM based on universal saliency map (USM) shared by different participants and adopt a multi-task CNN framework to estimate the discrepancy between PSM and USM. Comprehensive experiments demonstrate that our new PSM model and prediction scheme are effective and reliable.

• #1588
Quantifying Aspect Bias in Ordinal Ratings using a Bayesian Approach
Lahari Poddar, Wynne Hsu, Mong Li Lee
Personalisation and user Modelling

User opinions expressed in the form of ratings can influence an individual's view of an item. However, the true quality of an item is often obfuscated by user biases, and it is not obvious from the observed ratings the importance different users place on different aspects of an item. We propose a probabilistic modeling of the observed aspect ratings to infer (i) each user's aspect bias and (ii) latent intrinsic quality of an item. We model multi-aspect ratings as ordered discrete data and encode the dependency between different aspects by using a latent Gaussian structure. We handle the Gaussian-Categorical non-conjugacy using a stick-breaking formulation coupled with Pólya-Gamma auxiliary variable augmentation for a simple, fully Bayesian inference. On two real world datasets, we demonstrate the predictive ability of our model and its effectiveness in learning explainable user biases to provide insights towards a more reliable product quality estimation.

• #3416
Socialized Word Embeddings
Ziqian Zeng, Yichun Yin, Yangqiu Song, Ming Zhang
Personalisation and user Modelling

Word embeddings have attracted a lot of attention. On social media, each user’s language use can be significantly affected by the user’s friends. In this paper, we propose a socialized word embedding algorithm which can consider both user’s personal characteristics of language use and the user’s social relationship on social media. To incorporate personal characteristics, we propose to use a user vector to represent each user. Then for each user, the word embeddings are trained based on each user’s corpus by combining the global word vectors and local user vector. To incorporate social relationship, we add a regularization term to impose similarity between two friends. In this way, we can train the global word vectors and user vectors jointly. To demonstrate the effectiveness, we used the latest large-scale Yelp data to train our vectors, and designed several experiments to show how user vectors affect the results.

• #3967
Exploring Personalized Neural Conversational Models
Satwik Kottur, Xiaoyu Wang, Vitor Carvalho
Personalisation and user Modelling

Modeling dialog systems is currently one of the most active problems in Natural Language Processing. Recent advancement in Deep Learning has sparked an interest in the use of neural networks in modeling language, particularly for personalized conversational agents that can retain contextual information during dialog exchanges. This work carefully explores and compares several of the recently proposed neural conversation models, and carries out a detailed evaluation on the multiple factors that can significantly affect predictive performance, such as pretraining, embedding training, data cleaning, diversity reranking, evaluation setting, etc. Based on the tradeoffs of different models, we propose a new generative dialogue model conditioned on speakers as well as context history that outperforms all previous models on both retrieval and generative metrics. Our findings indicate that pretraining speaker embeddings on larger datasets, as well as bootstrapping word and speaker embeddings, can significantly improve performance (up to 3 points in perplexity), and that promoting diversity in using Mutual Information based techniques has a very strong effect in ranking metrics.

### Thursday 24 08:30 - 10:00 KR-BC - Belief Change (218)

Chair: Yanjing Wang
• #1804
A General Multi-agent Epistemic Planner Based on Higher-order Belief Change
Xiao Huang, Biqing Fang, Hai Wan, Yongmei Liu
Belief Change

In recent years, multi-agent epistemic planning has received attention from both dynamic logic and planning communities. Existing implementations of multi-agent epistemic planning are based on compilation into classical planning and suffer from various limitations, such as generating only linear plans, restriction to public actions, and incapability to handle disjunctive beliefs. In this paper, we propose a general representation language for multi-agent epistemic planning where the initial KB and the goal, the preconditions and effects of actions can be arbitrary multi-agent epistemic formulas, and the solution is an action tree branching on sensing results. To support efficient reasoning in the multi-agent KD45 logic, we make use of a normal form called alternative cover disjunctive formula (ACDF). We propose basic revision and update algorithms for ACDF formulas. We also handle static propositional common knowledge, which we call constraints. Based on our reasoning, revision and update algorithms, adapting the PrAO algorithm for contingent planning from the literature, we implemented a multi-agent epistemic planner called MAEP. Our experimental results show the viability of our approach.

• #2492
Belief Change in a Preferential Non-monotonic Framework
Giovanni Casini, Thomas Meyer
Belief Change

Belief change and non-monotonic reasoning are usually viewed as two sides of the same coin, with results showing that one can formally be defined in terms of the other. In this paper we show that it also makes sense to analyse belief change within a (preferential) non-monotonic framework. We consider belief change operators in a non-monotonic propositional setting with a view towards preserving consistency. We show that the results obtained can also be applied to the preservation of coherence, an important notion within the field of logic-based ontologies. We adopt the AGM approach to belief change and show that standard AGM can be adapted to a preferential non-monotonic framework, with the definition of expansion, contraction, and revision operators, and corresponding representation results.

• #1250
Strong Syntax Splitting for Iterated Belief Revision
Gabriele Kern-Isberner, Gerhard Brewka
Belief Change

AGM theory is the most influential formal account of belief revision. Nevertheless, there are some issues with the original proposal. In particular, Parikh has pointed out that completely irrelevant information may be affected in AGM revision. To remedy this, he proposed an additional axiom (P) aiming to capture (ir)relevance by a notion of syntax splitting. In this paper we generalize syntax splitting from logical sentences to epistemic states, a step which is necessary to cover iterated revision. The generalization is based on the notion of marginalization of epistemic states. Furthermore, we study epistemic syntax splitting in the context of ordinal conditional functions. Our approach substantially generalizes the semantical treatment of (P) in terms of faithful preorders recently presented by Peppas and colleagues.

• #2184
Non-Determinism and the Dynamics of Knowledge
Davide Grossi, Andreas Herzig, Wiebe van der Hoek, Christos Moyzes
Belief Change

In this paper we attempt to shed light on the concept of an agent’s knowledge after a non-deterministic action is executed. We start by making a comparison between notions of non-deterministic choice, and between notions of sequential composition, of settings with dynamic and/or epistemic character; namely Propositional Dynamic Logic (PDL), Dynamic Epistemic Logic (DEL), and the more recent logic of Semi-Public Environments (SPE). These logics represent two different approaches for defining the aforementioned actions, and in order to provide unified frameworks that encompass both, we define the logics DELVO (DEL+Vision+Ontic change) and PDLVE (PDL+Vision+Epistemic operators). DELVO is given a sound and complete axiomatisation.

• #2960
Belief Manipulation Through Propositional Announcements
Aaron Hunter, François Schwarzentruber, Eric Tsang
Belief Change

Public announcements cause each agent in a group to modify their beliefs to incorporate some new piece of information, while simultaneously being aware that all other agents are doing the same. Given a set of agents and a set of epistemic goals, it is natural to ask if there is a single announcement that will make each agent believe the corresponding goal. This problem is known to be undecidable in a general modal setting, where the presence of nested beliefs can lead to complex dynamics. In this paper, we consider not necessarily truthful public announcements in the setting of AGM belief revision. We prove that announcement finding in this setting is not only decidable, but that it is simpler than the corresponding problem in the most simplified modal logics. We then describe AnnB, an implemented tool that uses announcement finding as the basis for controlling robot behaviour through belief manipulation.

• #2982
Epistemic-entrenchment Characterization of Parikh’s Axiom
Theofanis Aravanis, Pavlos Peppas, Mary-Anne Williams
Belief Change

In this article, we provide the epistemic-entrenchment characterization of the weak version of Parikh’s relevance-sensitive axiom for belief revision — known as axiom (P) — for the general case of incomplete theories. Loosely speaking, axiom (P) states that, if a belief set K can be divided into two disjoint compartments, and the new information φ relates only to the first compartment, then the second compartment should not be affected by the revision of K by φ. The above-mentioned characterization, essentially, constitutes additional constraints on epistemic-entrenchment preorders, that induce AGM revision functions, satisfying the weak version of Parikh’s axiom (P).

### Thursday 24 08:30 - 10:00 ML-MIML - Multi-Instance and Multi-Label Learning (219)

Chair: Andy Song
• #2001
Multi-Instance Learning with Key Instance Shift
Ya-Lin Zhang, Zhi-Hua Zhou
Multi-Instance and Multi-Label Learning

Multi-instance learning (MIL) deals with the tasks where each example is represented by a bag of instances. A bag is positive if it contains at least one positive instance, and negative otherwise. The positive instances are also called key instances. Only bag labels are observed, whereas specific instance labels are not available in MIL. Previous studies typically assume that training and test data follow the same distribution, which may be violated in many real-world tasks. In this paper, we address the problem that the distribution of key instances varies between training and test phase. We refer to this problem as MIL with key instance shift and solve it by proposing an embedding based method MIKI. Specifically, to transform the bags into informative vectors, we propose a weighted multi-class model to select the instances with high positiveness as instance prototypes. Then we learn the importance weights for transformed bag vectors and incorporate original instance weights into them to narrow the gap between training/test distributions. Experimental results validate the effectiveness of our approach when key instance shift occurs.
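The standard MIL labelling rule stated at the start of this abstract can be written down in a few lines. The sketch below only illustrates that rule and the notion of key instances. It is not the MIKI method, and the function names are hypothetical.

```python
def bag_label(instance_labels):
    """A bag is positive (1) if at least one of its instances is positive, else negative (0)."""
    return int(any(instance_labels))


def key_instances(instance_labels):
    """Indices of the positive (key) instances; hidden from the learner in MIL."""
    return [i for i, label in enumerate(instance_labels) if label]


# Hidden instance-level labels for three bags; only bag_label(...) would be observed.
bags = {"bag_1": [0, 0, 1], "bag_2": [0, 0, 0], "bag_3": [1, 1, 0]}
observed = {name: bag_label(labels) for name, labels in bags.items()}
print(observed)  # {'bag_1': 1, 'bag_2': 0, 'bag_3': 1}
```

Key instance shift, in these terms, means that the instances returned by `key_instances` are drawn from different distributions at training and test time, even though the bag-level rule stays the same.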

• #1621
Deep Multiple Instance Hashing for Object-based Image Retrieval
Wanqing Zhao, Ziyu Guan, Hangzai Luo, Jinye Peng, Jianping Fan
Multi-Instance and Multi-Label Learning

Multi-keyword query is widely supported in text search engines. However, an analogue in image retrieval systems, multi-object query, is rarely studied. Meanwhile, traditional object-based image retrieval methods often involve multiple steps separately and need expensive location labeling for detecting objects. In this work, we propose a weakly-supervised Deep Multiple Instance Hashing (DMIH) framework for object-based image retrieval. DMIH integrates object detection and hashing learning on the basis of a popular CNN model to build the end-to-end relation between a raw image and the binary hashing codes of multiple objects in it. Specifically, we cast the object detection of each object class as a binary multiple instance learning problem where instances are object proposals extracted from multi-scale convolutional feature maps. For hashing training, we sample image pairs to learn their semantic relationships in terms of hash codes of the most probable proposals for owned labels as guided by object predictors. The two objectives benefit each other in learning. DMIH outperforms state-of-the-arts on public benchmarks for object-based image retrieval and achieves promising results for multi-object queries.

• #2276
Saliency Guided End-to-End Learning for Weakly Supervised Object Detection
Baisheng Lai, Xiaojin Gong
Multi-Instance and Multi-Label Learning

Weakly supervised object detection (WSOD), which is the problem of learning detectors using only image-level labels, has been attracting more and more interest. However, this problem is quite challenging due to the lack of location supervision. To address this issue, this paper integrates saliency into a deep architecture, in which the location information is explored both explicitly and implicitly. Specifically, we select highly confident object proposals under the guidance of class-specific saliency maps. The location information of the selected proposals, together with their semantic and saliency information, is then used to explicitly supervise the network by imposing two additional losses. Meanwhile, a saliency prediction sub-network is built into the architecture. The prediction results are used to implicitly guide the localization procedure. The entire network is trained end-to-end. Experiments on PASCAL VOC demonstrate that our approach outperforms all state-of-the-arts.

• #2073
Obtaining High-Quality Label by Distinguishing between Easy and Hard Items in Crowdsourcing
Wei Wang, Xiang-Yu Guo, Shao-Yuan Li, Yuan Jiang, Zhi-Hua Zhou
Multi-Instance and Multi-Label Learning

Crowdsourcing systems make it possible to hire voluntary workers to label large-scale data by offering them small monetary payments. Usually, the taskmaster needs to collect high-quality labels, while the quality of labels obtained from the crowd may not satisfy this requirement. In this paper, we study the problem of obtaining high-quality labels from the crowd and present an approach of learning the difficulty of items in crowdsourcing, in which we construct a small training set of items with estimated difficulty and then learn a model to predict the difficulty of future items. With the predicted difficulty, we can distinguish between easy and hard items to obtain high-quality labels. For easy items, the quality of their labels inferred from the crowd could be high enough to satisfy the requirement; while for hard items, for which the crowd cannot provide high-quality labels, it is better to choose a more knowledgeable crowd or employ specialized workers to label them. The experimental results demonstrate that the proposed approach by learning to distinguish between easy and hard items can significantly improve the label quality.

• #2235
Binary Linear Compression for Multi-label Classification
Wen-Ji Zhou, Yang Yu, Min-Ling Zhang
Multi-Instance and Multi-Label Learning

In multi-label classification tasks, labels are commonly related to each other. It has been well recognized that utilizing label relationships is essential to multi-label learning. One way to utilize label relationships is to map labels to a lower-dimensional space of uncorrelated labels, where the relationship can be encoded in the mapping. Previous linear mapping methods commonly result in regression subproblems in the lower-dimensional label space. In this paper, we disclose that mapping to a low-dimensional multi-label regression problem can be worse than mapping to a classification problem, since regression requires a more complex model than classification. We then propose the binary linear compression (BILC) method that results in a binary label space, leading to classification subproblems. Experiments on several multi-label datasets show that employing classification in the embedded space results in much simpler models than regression, leading to smaller structural risk. The proposed methods are also shown to be superior to some state-of-the-art approaches.

• #3379
Incomplete Label Distribution Learning
Miao Xu, Zhi-Hua Zhou
Multi-Instance and Multi-Label Learning

Label distribution learning (LDL) assumes labels can be associated with an instance to some degree, and thus it can learn the relevance of a label to a particular instance. Although LDL has seen successful practical applications, one problem with existing LDL methods is that they are designed for data with complete supervised information, while in reality, annotation information may be incomplete, because assigning each label a real value to indicate its association with a particular instance incurs a large cost in labor and time. In this paper, we solve the LDL problem when given incomplete supervised information. We propose an objective based on trace norm minimization to exploit the correlation between labels. We develop a proximal gradient descent algorithm and an algorithm based on the alternating direction method of multipliers. Experiments validate the effectiveness of our proposal.

### Thursday 24 08:30 - 10:00 MAS-COCO - Coordination and Cooperation (220)

Chair: Stefano Albrecht
• #2466
COG-DICE: An Algorithm for Solving Continuous-Observation Dec-POMDPs
Madison Clark-Turner, Christopher Amato
Coordination and Cooperation

The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need for finding an adequate discretization.

• #1402
Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits
Mithun Chakraborty, Kai Yee Phoebe Chua, Sanmay Das, Brendan Juba
Coordination and Cooperation

In this paper, we introduce a multi-agent multi-armed bandit-based model for ad hoc teamwork with expensive communication. The goal of the team is to maximize the total reward gained from pulling arms of a bandit over a number of epochs. In each epoch, each agent decides whether to pull an arm, or to broadcast the reward it obtained in the previous epoch to the team and forgo pulling an arm. These decisions must be made only on the basis of the agent’s private information and the public information broadcast prior to that epoch. We first benchmark the achievable utility by analyzing an idealized version of this problem where a central authority has complete knowledge of rewards acquired from all arms in all epochs and uses a multiplicative weights update algorithm for allocating arms to agents. We then introduce an algorithm for the decentralized setting that uses a value-of-information based communication strategy and an exploration-exploitation strategy based on the centralized algorithm, and show experimentally that it converges rapidly to the performance of the centralized method.

• #1661
Probability Bounds for Overlapping Coalition Formation
Michail Mamakos, Georgios Chalkiadakis
Coordination and Cooperation

In this work, we provide novel methods which benefit from obtained probability bounds for assessing the ability of teams of agents to accomplish coalitional tasks. To this end, our first method is based on an improvement of the Paley-Zygmund inequality, while the second and the third ones are devised based on manipulations of the two-sided Chebyshev’s inequality and Hoeffding’s inequality, respectively. Agents have no knowledge of the amount of resources others possess, and hold private Bayesian beliefs regarding the potential resource investment of every other agent. Our methods allow agents to demand that certain confidence levels are reached regarding the resource contributions of the various coalitions. In order to tackle real-world scenarios, we allow agents to form overlapping coalitions, so that one can simultaneously be part of a number of coalitions. We thus present a protocol for iterated overlapping coalition formation (OCF), through which agents can complete tasks that grant them utility. Agents lie on a social network and their distance affects their likelihood of cooperation towards the completion of a task. We confirm our methods’ effectiveness by testing them on both a random graph of 300 nodes and a real-world social network of 4039 nodes.
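
As a rough illustration of the kind of confidence level mentioned in this abstract, the sketch below applies the textbook one-sided Hoeffding inequality to the total resources of a coalition. It is not the paper's method (which uses manipulated versions of the inequalities); the function name, the independence assumption, and the example numbers are all hypothetical.

```python
import math


def hoeffding_success_lower_bound(means, ranges, required):
    """Lower bound on P(total contribution >= required).

    means:    believed expected contribution of each agent
    ranges:   (low, high) bounds on each agent's contribution
    required: resources needed to complete the task
    Assumes independent, bounded contributions (textbook Hoeffding setting).
    """
    slack = sum(means) - required
    if slack <= 0:
        return 0.0  # the bound says nothing when the requirement exceeds the mean
    width_sq = sum((high - low) ** 2 for low, high in ranges)
    if width_sq == 0:
        return 1.0  # deterministic contributions that exceed the requirement
    # P(S <= E[S] - slack) <= exp(-2 * slack^2 / sum of squared ranges)
    return 1.0 - math.exp(-2.0 * slack ** 2 / width_sq)


# Hypothetical coalition: four agents, each contributing between 0 and 10 units.
bound = hoeffding_success_lower_bound([6, 6, 6, 6], [(0, 10)] * 4, required=14)
print(round(bound, 4))
```

An agent could compare such a bound with the confidence level it demands before joining a coalition.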

• #2668
Multi-Agent Planning with Baseline Regret Minimization
Feng Wu, Shlomo Zilberstein, Xiaoping Chen
Coordination and Cooperation

We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably better than or at least equivalent to the baseline policy. We also propose an iterative belief generation algorithm to effectively and efficiently minimize the baseline regret, which requires only the necessary iterations to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm its advantage compared to the state-of-the-art approaches.

• #3514
Object Allocation via Swaps along a Social Network
Laurent Gourvès, Julien Lesca, Anaëlle Wilczynski
Coordination and Cooperation

This article deals with object allocation where each agent receives a single item. Starting from an initial endowment, the agents can be better off by exchanging their objects. However, not all trades are likely because some participants are unable to communicate. By considering that the agents are embedded in a social network, we propose to study the allocations emerging from a sequence of simple swaps between pairs of neighbors in the network. This model raises natural questions regarding (i) the reachability of a given assignment, (ii) the ability of an agent to obtain a given object, and (iii) the search of Pareto-efficient allocations. We investigate the complexity of these problems by providing, according to the structure of the social network, polynomial and NP-complete cases.
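
The swap dynamics described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming strict preference lists and that a swap happens only when both neighbors strictly prefer the other's object; the function name and the three-agent instance are hypothetical.

```python
def improving_swaps(edges, prefs, holding):
    """Repeatedly swap objects between neighbors when both strictly gain.

    edges:   list of (agent, agent) pairs forming the social network
    prefs:   prefs[agent] lists objects from most to least preferred
    holding: holding[agent] is the object the agent owns initially
    Returns the final allocation and the swaps performed, in order.
    """
    holding = dict(holding)
    rank = {a: {obj: i for i, obj in enumerate(p)} for a, p in prefs.items()}
    performed = []
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            # A lower rank means a more preferred object.
            a_gains = rank[a][holding[b]] < rank[a][holding[a]]
            b_gains = rank[b][holding[a]] < rank[b][holding[b]]
            if a_gains and b_gains:
                holding[a], holding[b] = holding[b], holding[a]
                performed.append((a, b))
                changed = True
    return holding, performed


# Hypothetical instance: three agents on a path A - B - C.
edges = [("A", "B"), ("B", "C")]
prefs = {"A": ["z", "y", "x"], "B": ["x", "z", "y"], "C": ["y", "x", "z"]}
start = {"A": "x", "B": "y", "C": "z"}
final, swaps = improving_swaps(edges, prefs, start)
print(final, swaps)
```

In this instance the process stops with A holding y and C holding z, although both would prefer to exchange them; they are not neighbors, which illustrates why reachability and Pareto-efficiency depend on the network structure.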

• #3865
Manipulating Opinion Diffusion in Social Networks
Robert Bredereck, Edith Elkind
Coordination and Cooperation

We consider opinion diffusion in binary influence networks, where at each step one or more agents update their opinions so as to be in agreement with the majority of their neighbors. We consider several ways of manipulating the majority opinion in a stable outcome, such as bribing agents, adding/deleting links, and changing the order of updates, and investigate the computational complexity of the associated problems, identifying tractable and intractable cases.
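
The following sketch illustrates why the order of updates can matter in majority-based opinion diffusion. It assumes an undirected network, one agent updating at a time, and that an agent keeps its opinion on a tie; these choices, the function name, and the four-agent path are illustrative assumptions rather than the paper's exact model.

```python
def run_until_stable(neighbors, opinions, order):
    """Asynchronous majority dynamics with binary opinions.

    neighbors: dict mapping each agent to the list of its neighbors
    opinions:  dict mapping each agent to 0 or 1
    order:     sequence in which agents update during each sweep
    Assumption: on a tie among neighbors, an agent keeps its opinion.
    """
    opinions = dict(opinions)
    changed = True
    while changed:
        changed = False
        for agent in order:
            ones = sum(opinions[n] for n in neighbors[agent])
            zeros = len(neighbors[agent]) - ones
            if ones > zeros:
                target = 1
            elif zeros > ones:
                target = 0
            else:
                target = opinions[agent]
            if target != opinions[agent]:
                opinions[agent] = target
                changed = True
    return opinions


# Hypothetical network: a path 0 - 1 - 2 - 3 with alternating opinions.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
start = {0: 1, 1: 0, 2: 1, 3: 0}
left_to_right = run_until_stable(neighbors, start, [0, 1, 2, 3])
right_to_left = run_until_stable(neighbors, start, [3, 2, 1, 0])
print(left_to_right, right_to_left)
```

Starting from the same opinions, one update order ends with everyone holding opinion 0 and the other with everyone holding opinion 1, so controlling the order is a form of manipulation.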

### Thursday 24 08:30 - 10:00 Competition (206)

Chair: Jochen Renz
• Angry Birds
Competition

### Thursday 24 10:30 - 12:00 ML-NN1 - Neural Networks 1 (210)

Chair: Longbing Cao
• #2032
Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks
Meishan Zhang, Guohong Fu, Nan Yu
Neural Networks 1

State-of-the-art Chinese word segmentation systems typically exploit supervised models trained on a standard manually-annotated corpus, achieving performances over 95% on a similar standard testing corpus. However, the performances may drop significantly when the same models are applied to Chinese microtext. One major challenge is the issue of informal words in the microtext. Previous studies show that informal word detection can be helpful for microtext processing. In this work, we investigate it under the neural setting, by proposing a joint segmentation model that integrates the detection of informal words simultaneously. In addition, we generate a training corpus for the joint model automatically from existing corpora. Experimental results show that the proposed model is highly effective for segmentation of Chinese microtext.

• #1341
Privacy Issues Regarding the Application of DNNs to Activity-Recognition using Wearables and Its Countermeasures by Use of Adversarial Training
Yusuke Iwasawa, Kotaro Nakayama, Ikuko Yairi, Yutaka Matsuo
Neural Networks 1

Deep neural networks have been successfully applied to activity recognition with wearables in terms of recognition performance. However, the black-box nature of neural networks could lead to privacy concerns. Namely, it is generally hard to anticipate what neural networks learn from data, and so they may unintentionally learn features that are highly discriminative of user information, which increases the risk of information disclosure. In this study, we analyzed the features learned by conventional deep neural networks when applied to wearable data to confirm this phenomenon. Based on the results of our analysis, we propose the use of an adversarial training framework to suppress the risk of sensitive/unintended information disclosure. Our proposed model considers both an adversarial user classifier and a regular activity classifier during training, which allows the model to learn representations that help the classifier to distinguish the activities but which, at the same time, prevent it from accessing user-discriminative information. This paper provides an empirical validation of the privacy issue and of the efficacy of the proposed method using three activity recognition tasks based on wearable data. The empirical validation shows that our proposed method suppresses the concerns without any significant performance degradation, compared to conventional deep nets on all three tasks.

• #1689
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Jingkuan Song, Lianli Gao, Zhao Guo, Wu Liu, Dongxiang Zhang, Heng Tao Shen
Neural Networks 1

Recent progress has been made in using attention-based encoder-decoder frameworks for video captioning. However, most existing decoders apply the attention mechanism to every generated word, including both visual words (e.g., "gun" and "shooting") and non-visual words (e.g., "the", "a"). These non-visual words can be easily predicted using a natural language model without considering visual signals or attention. Imposing the attention mechanism on non-visual words could mislead the decoder and decrease the overall performance of video captioning. To address this issue, we propose a hierarchical LSTM with adjusted temporal attention (hLSTMat) approach for video captioning. Specifically, the proposed framework utilizes the temporal attention for selecting specific frames to predict related words, while the adjusted temporal attention is for deciding whether to depend on the visual information or the language context information. Also, a hierarchical LSTM is designed to simultaneously consider both low-level visual information and deep semantic information to support the video caption generation. To demonstrate the effectiveness of our proposed framework, we test our method on two prevalent datasets, MSVD and MSR-VTT, and experimental results show that our approach outperforms the state-of-the-art methods on both datasets.

• #3702
Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez
Neural Networks 1

Expressive classifiers such as neural networks are among the most accurate supervised learning methods in use today, but their opaque decision boundaries make them difficult to trust in critical applications. We propose a method to explain the predictions of any differentiable model via the gradient of the class label with respect to the input (which provides a normal to the decision boundary). Not only is this approach orders of magnitude faster at identifying input dimensions of high sensitivity than sample-based perturbation methods (e.g. LIME), but it also lends itself to efficiently discovering multiple qualitatively different decision boundaries as well as decision boundaries that are consistent with expert annotation. On multiple datasets, we show our approach generalizes much better when test conditions differ from those in training.
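
To make the idea of a gradient-based explanation concrete, the sketch below computes the gradient of a class log-probability with respect to the input for a small logistic model and checks it against finite differences. The logistic model, the use of log-probabilities, and all names and numbers are illustrative assumptions, not the paper's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_prob(w, b, x, label):
    """Log-probability of `label` (0 or 1) under a logistic model."""
    p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log(p1 if label == 1 else 1.0 - p1)


def input_gradient(w, b, x, label):
    """Analytic gradient of log_prob with respect to the input x."""
    p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    scale = (1.0 - p1) if label == 1 else -p1
    return [scale * wi for wi in w]


def finite_difference(w, b, x, label, eps=1e-6):
    """Central-difference approximation, used only as a sanity check."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((log_prob(w, b, up, label) - log_prob(w, b, down, label)) / (2 * eps))
    return grad


# Hypothetical model in which the second input dimension is ignored.
w, b, x = [2.0, 0.0, -1.0], 0.1, [0.5, 3.0, 1.0]
analytic = input_gradient(w, b, x, label=1)
numeric = finite_difference(w, b, x, label=1)
print(analytic, numeric)
```

A single gradient evaluation gives the sensitivity of every input dimension at once, whereas a perturbation-based check needs at least one model evaluation per perturbed sample; here the zero entry marks the dimension the model does not use.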

• #2992
Self-paced Convolutional Neural Networks
Hao Li, Maoguo Gong
Neural Networks 1

Convolutional neural networks (CNNs) have achieved breakthrough performance in many pattern recognition tasks. In order to distinguish the reliable data from the noisy and confusing data, we improve CNNs with self-paced learning (SPL) for enhancing the learning robustness of CNNs. In the proposed self-paced convolutional network (SPCN), each sample is assigned a weight to reflect the easiness of the sample. Then a dynamic self-paced function is incorporated into the learning objective of the CNN to jointly learn the parameters of the CNN and the latent weight variable. SPCN learns the samples from easy to complex, and the sample weights can dynamically control the learning rates for converging to better values. To gain more insight into SPCN, theoretical studies are conducted to show that SPCN converges to a stationary solution and is robust to noisy and confusing data. Experimental results on the MNIST and rectangles datasets demonstrate that the proposed method outperforms baseline methods.

• #3638
Exemplar-centered Supervised Shallow Parametric Data Embedding
Martin Renqiang Min, Hongyu Guo, Dongjin Song
Neural Networks 1

Metric learning methods for dimensionality reduction in combination with k-Nearest Neighbors (kNN) have been extensively deployed in many classification, data embedding, and information retrieval applications. However, most of these approaches involve pairwise training data comparisons, and thus have quadratic computational complexity with respect to the size of training set, preventing them from scaling to fairly big datasets. Moreover, during testing, comparing test data against all the training data points is also expensive in terms of both computational cost and resources required. Furthermore, previous metrics are either too constrained or too expressive to be well learned. To effectively solve these issues, we present an exemplar-centered supervised shallow parametric data embedding model, using a Maximally Collapsing Metric Learning (MCML) objective. Our strategy learns a shallow high-order parametric embedding function and compares training/test data only with learned or precomputed exemplars, resulting in a cost function with linear computational complexity for both training and testing. We also empirically demonstrate, using several benchmark datasets, that for classification in two-dimensional embedding space, our approach not only gains speedup of kNN by hundreds of times, but also outperforms state-of-the-art supervised embedding approaches.

### Thursday 24 10:30 - 12:00 ML-DMUL1 - Data Mining and Unsupervised Learning 1 (211)

Chair: Chang Xu
• #1198
Discovering Relevance-Dependent Bicluster Structure from Relational Data
Iku Ohama, Takuya Kida, Hiroki Arimura
Data Mining and Unsupervised Learning 1

In this paper, we propose a statistical model for relevance-dependent biclustering to analyze relational data. The proposed model factorizes relational data into bicluster structure with two features: (1) each object in a cluster has a relevance value, which indicates how strongly the object relates to the cluster and (2) all clusters are related to at least one dense block. These features simplify the task of understanding the meaning of each cluster because only a few highly relevant objects need to be inspected. We introduced the Relevance-Dependent Bernoulli Distribution (R-BD) as a prior for relevance-dependent binary matrices and proposed the novel Relevance-Dependent Infinite Biclustering (R-IB) model, which automatically estimates the number of clusters. Posterior inference can be performed efficiently using a collapsed Gibbs sampler because the parameters of the R-IB model can be fully marginalized out. Experimental results show that the R-IB extracts more essential bicluster structure with better computational efficiency than conventional models. We further observed that the biclustering results obtained by R-IB facilitate interpretation of the meaning of each cluster.

• #2036
Affinity Learning for Mixed Data Clustering
Nan Li, Longin Jan Latecki
Data Mining and Unsupervised Learning 1

In this paper, we propose a novel affinity learning based framework for mixed data clustering, which includes: how to process data with mixed-type attributes, how to learn affinities between data points, and how to exploit the learned affinities for clustering. In the proposed framework, each original data attribute is represented with several abstract objects defined according to the specific data type and values. Each attribute value is transformed into the initial affinities between the data point and the abstract objects of attribute. We refine these affinities and infer the unknown affinities between data points by taking into account the interconnections among the attribute values of all data points. The inferred affinities between data points can be exploited for clustering. Alternatively, the refined affinities between data points and the abstract objects of attributes can be transformed into new data features for clustering. Experimental results on many real world data sets demonstrate that the proposed framework is effective for mixed data clustering.

• #2798
Understanding People Lifestyles: Construction of Urban Movement Knowledge Graph from GPS Trajectory
Chenyi Zhuang, Nicholas Jing Yuan, Ruihua Song, Xing Xie, Qiang Ma
Data Mining and Unsupervised Learning 1

Technologies are increasingly taking advantage of the explosion in the amount of data generated by social multimedia (e.g., web searches, ad targeting, and urban computing). In this paper, we propose a multi-view learning framework for presenting the construction of a new urban movement knowledge graph, which could greatly facilitate the research domains mentioned above. In particular, by viewing GPS trajectory data from temporal, spatial, and spatiotemporal points of view, we construct a knowledge graph of which nodes and edges are their locations and relations, respectively. On the knowledge graph, both nodes and edges are represented in latent semantic space. We verify its utility by subsequently applying the knowledge graph to predict the extent of user attention (high or low) paid to different locations in a city. Experimental evaluations and analysis of a real-world dataset show significant improvements in comparison to state-of-the-art methods.

• #3193
Mining Convex Polygon Patterns with Formal Concept Analysis
Aimene Belfodil, Sergei O. Kuznetsov, Céline Robardet, Mehdi Kaytoue
Data Mining and Unsupervised Learning 1

Pattern mining is an important task in AI for eliciting hypotheses from the data. When it comes to spatial data, the geo-coordinates are often considered independently as two different attributes. Consequently, rectangular patterns are searched for. Such an arbitrary form is not able to capture interesting regions in general. We thus introduce convex polygons, a good trade-off for capturing high density areas in any pattern mining task. Our contribution is threefold: (i) We formally introduce such patterns in Formal Concept Analysis (FCA), (ii) we give all the basic bricks for mining polygons with exhaustive search and pattern sampling, and (iii) we design several algorithms that we compare experimentally.

• #3720
See without looking: joint visualization of sensitive multi-site datasets
Debbrata K. Saha, Vince D. Calhoun, Sandeep R. Panta, Sergey M. Plis
Data Mining and Unsupervised Learning 1

Visualization of high-dimensional large-scale datasets via an embedding into a 2D map is a powerful exploration tool for assessing latent structure in the data and detecting outliers. Many methods have been developed for this task, but most assume that all pairs of samples are available for common computation. Specifically, the distances between all pairs of points need to be directly computable. In contrast, we work with sensitive neuroimaging data, where local sites cannot share their samples and the distances cannot be easily computed across the sites. Yet, the desire is to let all the local data participate in collaborative computation without leaving their respective sites. In this scenario, a quality control tool that visualizes a decentralized dataset in its entirety via global aggregation of local computations is especially important, as it would allow screening of samples that cannot be evaluated otherwise. This paper introduces an algorithm to solve this problem: decentralized data stochastic neighbor embedding (dSNE). Based on the MNIST dataset, we introduce metrics for measuring the embedding quality and use them to compare dSNE to its centralized counterpart. We also apply dSNE to a multi-site neuroimaging dataset with encouraging results.

• #3918
Beyond the Nystrom Approximation: Speeding up Spectral Clustering using Uniform Sampling and Weighted Kernel k-means
Mahesh Mohan, Claire Monteleoni
Data Mining and Unsupervised Learning 1

In this paper we present a framework for spectral clustering based on the following simple scheme: sample a subset of the input points, compute the clusters for the sampled subset using weighted kernel k-means (Dhillon et al. 2004) and use the resulting centers to compute a clustering for the remaining data points. For the case where the points are sampled uniformly at random without replacement, we show that the number of samples required depends mainly on the number of clusters and the diameter of the set of points in the kernel space. Experiments show that the proposed framework outperforms the approaches based on the Nyström approximation both in terms of accuracy and computation time.

### Thursday 24 10:30 - 12:00 ML-TAML2 - Transfer, Adaptation, Multi-Task Learning 2 (212)

Chair: Jingrui He
• #1724
A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task Learning
Honglun Zhang, Liqiang Xiao, Yongkun Wang, Yaohui Jin
Transfer, Adaptation, Multi-Task Learning 2

Multi-task learning leverages potential correlations among related tasks to extract common features and yield performance gains. However, most previous works only consider simple or weak interactions, thereby failing to model complex correlations among three or more tasks. In this paper, we propose a multi-task learning architecture with four types of recurrent neural layers to fuse information across multiple related tasks. The architecture is structurally flexible and considers various interactions among tasks, which can be regarded as a generalized case of many previous works. Extensive experiments on five benchmark datasets for text classification show that our model can significantly improve performances of related tasks with additional information from others.

• #2054
Cross-modal Common Representation Learning by Hybrid Transfer Network
Xin Huang, Yuxin Peng, Mingkuan Yuan
Transfer, Adaptation, Multi-Task Learning 2

DNN-based cross-modal retrieval is a research hotspot that aims to retrieve across different modalities such as image and text, but existing methods often face the challenge of insufficient cross-modal training data. In the single-modal scenario, a similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (such as ImageNet). Knowledge from such single-modal datasets is also very useful for cross-modal retrieval, as it can provide rich general semantic information that can be shared across different modalities. However, it is challenging to transfer useful knowledge from a single-modal (e.g., image) source domain to a cross-modal (e.g., image/text) target domain. Knowledge in the source domain cannot be directly transferred to both modalities in the target domain, and the inherent cross-modal correlation contained in the target domain provides key hints for cross-modal retrieval which should be preserved during the transfer process. This paper proposes the Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: the modal-sharing transfer subnetwork utilizes the modality shared by the source and target domains as a bridge for transferring knowledge to both modalities simultaneously; the layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to the cross-modal retrieval task. Cross-modal data can be converted to a common representation by CHTN for retrieval, and comprehensive experiments on 3 datasets show its effectiveness.

• #3196
Demystifying Neural Style Transfer
Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou
Transfer, Adaptation, Multi-Task Learning 2

Neural Style Transfer has recently demonstrated very exciting results that have caught attention in both academia and industry. Despite the amazing results, the principle of neural style transfer, especially why the Gram matrices could represent style, remains unclear. In this paper, we propose a novel interpretation of neural style transfer by treating it as a domain adaptation problem. Specifically, we theoretically show that matching the Gram matrices of feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with the second order polynomial kernel. Thus, we argue that the essence of neural style transfer is to match the feature distributions between the style images and the generated images. To further support our standpoint, we experiment with several other distribution alignment methods, and achieve appealing results. We believe this novel interpretation connects these two important research fields, and could enlighten future research.
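
The stated equivalence can be checked numerically on toy data. The sketch below compares the squared Frobenius distance between two Gram matrices with the sum-of-kernels form of a (biased, unnormalized) MMD estimate under the kernel k(u, v) = (u · v)^2. The random feature vectors, function names, and sizes are illustrative assumptions; normalization constants used in any particular style loss are omitted.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram(features):
    """Gram matrix of channel co-activations; `features` holds one vector per position."""
    n = len(features[0])
    return [[sum(f[i] * f[j] for f in features) for j in range(n)] for i in range(n)]


def gram_distance_sq(xs, ys):
    """Squared Frobenius distance between the two Gram matrices."""
    gx, gy = gram(xs), gram(ys)
    n = len(gx)
    return sum((gx[i][j] - gy[i][j]) ** 2 for i in range(n) for j in range(n))


def kernel_sum_form(xs, ys):
    """Sum-of-kernels form with the second order polynomial kernel (u . v)^2."""
    k = lambda u, v: dot(u, v) ** 2
    xx = sum(k(a, b) for a in xs for b in xs)
    yy = sum(k(a, b) for a in ys for b in ys)
    xy = sum(k(a, b) for a in xs for b in ys)
    return xx + yy - 2.0 * xy


# Hypothetical feature maps: 5 spatial positions, 3 channels each.
rng = random.Random(0)
style = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
generated = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
lhs = gram_distance_sq(style, generated)
rhs = kernel_sum_form(style, generated)
print(lhs, rhs, rhs / len(style) ** 2)  # the last value is the biased MMD^2 estimate
```

The two quantities agree up to floating-point error, because the inner product of two Gram matrices expands into a sum of squared dot products between individual feature vectors.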

• #3777
Completely Heterogeneous Transfer Learning with Attention - What And What Not To Transfer
Seungwhan Moon, Jaime Carbonell
Transfer, Adaptation, Multi-Task Learning 2

We study a transfer learning framework where source and target datasets are heterogeneous in both feature and label spaces. Specifically, we do not assume explicit relations between source and target tasks a priori, and thus it is crucial to determine what and what not to transfer from source knowledge. Towards this goal, we define a new heterogeneous transfer learning approach that (1) selects and attends to an optimized subset of source samples to transfer knowledge from, and (2) builds a unified transfer network that learns from both source and target knowledge. This method, termed "Attentional Heterogeneous Transfer", along with a newly proposed unsupervised transfer loss, improves upon the previous state-of-the-art approaches on extensive simulations as well as a challenging hetero-lingual text classification task.

• #3956
Dynamic Multi-Task Learning with Convolutional Neural Network
Yuchun Fang, Zhengyan Ma, Zhaoxiang Zhang, Xu-Yao Zhang, Xiang Bai
Transfer, Adaptation, Multi-Task Learning 2

Multi-task learning and deep convolutional neural networks (CNNs) have been successfully used in various fields. This paper considers the integration of CNNs and multi-task learning in a novel way to further improve the performance of multiple related tasks. Existing multi-task CNN models usually empirically combine different tasks into a group which is then trained jointly with a strong assumption of model commonality. Furthermore, traditional approaches usually only consider a small number of tasks with a rigid structure, which is not suitable for large-scale applications. In light of this, we propose a dynamic multi-task CNN model to handle these problems. The proposed model directly learns the task relations from data instead of relying on subjective task grouping. Due to its flexible structure, it supports task-wise incremental training, which is useful for efficient training of massive tasks. Specifically, we add a new task transfer connection (TTC) between the layers of each task. The learned TTC is able to reflect the correlation among different tasks, guiding the model to dynamically adjust the multiplexing of information among different tasks. With the help of TTC, multiple related tasks can further boost each other's performance. Experiments demonstrate that the proposed dynamic multi-task CNN model outperforms traditional approaches.

• #3227
General Heterogeneous Transfer Distance Metric Learning via Knowledge Fragments Transfer
Yong Luo, Yonggang Wen, Tongliang Liu, Dacheng Tao
Transfer, Adaptation, Multi-Task Learning 2

Transfer learning aims to improve the performance of a target learning task by leveraging information (or transferring knowledge) from other related tasks. Recently, transfer distance metric learning (TDML) has attracted a lot of interest, but most of these methods assume that feature representations for the source and target learning tasks are the same. Hence, they are not suitable for applications in which the data are from heterogeneous domains (feature spaces, modalities and even semantics). Although some existing heterogeneous transfer learning (HTL) approaches are able to handle such domains, they lack flexibility in real-world applications, and the learned transformations are often restricted to be linear. We therefore develop a general and flexible heterogeneous TDML (HTDML) framework based on the knowledge fragment transfer strategy. In the proposed HTDML, any (linear or nonlinear) distance metric learning algorithm can be employed to learn the source metric beforehand. Then a set of knowledge fragments is extracted from the pre-learned source metric to help target metric learning. In addition, either a linear or a nonlinear distance metric can be learned for the target domain. Extensive experiments on both scene classification and object recognition demonstrate the superiority of the proposed method.

### Thursday 24 10:30 - 12:00 ML-REL2 - Reinforcement Learning 2 (213)

Chair: Matthew Taylor
• #1446
Weighted Double Q-learning
Zongzhang Zhang, Zhiyuan Pan, Mykel J. Kochenderfer
Reinforcement Learning 2

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. Overestimation is due to the use of a single estimator that uses the maximum action value as an approximation for the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was recently proposed, which uses the double estimator method. It uses two estimators from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. Double Q-learning sometimes underestimates the action values. This paper introduces a weighted double Q-learning algorithm, which is based on the construction of the weighted double estimator, with the goal of balancing between the overestimation in the single estimator and the underestimation in the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
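
The overestimation effect described in this abstract can be reproduced with a small simulation. In the sketch below all actions have true value 0, so the true maximum is 0; the single estimator selects and evaluates on the same samples, while the double estimator selects with one sample set and evaluates with another. The fixed mixing weight `beta` is a hypothetical stand-in and does not reproduce the paper's weighting rule.

```python
import random


def estimate_max(rng, n_actions=10, n_samples=10, beta=0.5):
    """One trial of estimating max_a E[X_a] when every true value is 0."""
    def sample_means():
        return [sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
                for _ in range(n_actions)]

    mean_a, mean_b = sample_means(), sample_means()
    single = max(mean_a)                                 # select and evaluate on set A
    best = max(range(n_actions), key=mean_a.__getitem__)
    double = mean_b[best]                                # select on A, evaluate on B
    weighted = beta * single + (1.0 - beta) * double     # fixed hypothetical weight
    return single, double, weighted


rng = random.Random(0)
trials = [estimate_max(rng) for _ in range(2000)]
avg_single, avg_double, avg_weighted = (sum(t[i] for t in trials) / len(trials)
                                        for i in range(3))
print(avg_single, avg_double, avg_weighted)
```

Averaged over many trials, the single estimate is clearly positive even though the true maximum is 0, the double estimate stays close to 0 in this equal-value setting, and the weighted combination lies between the two.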

• #3824
Sample Efficient Policy Search for Optimal Stopping Domains
Karan Goel, Christoph Dann, Emma Brunskill
Reinforcement Learning 2

Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return. We examine the problem of simultaneously learning and planning in such domains, when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging problem structure. We bound the sample complexity of our approach to guarantee uniform convergence of policy value estimates, tightening existing PAC bounds to achieve logarithmic dependence on horizon length for our setting. We also examine the benefit of our method against prevalent model-based and model-free approaches on 3 domains taken from diverse fields.

• #3966
Learning from Demonstrations with High-Level Side Information
Min Wen, Ivan Papusha, Ufuk Topcu
Reinforcement Learning 2

We consider the problem of learning from demonstration, where extra side information about the demonstration is encoded as a co-safe linear temporal logic formula. We address two known limitations of existing methods that do not account for such side information. First, the policies that result from existing methods, while matching the expected features or likelihood of the demonstrations, may still be in conflict with high-level objectives not explicit in the demonstration trajectories. Second, existing methods fail to provide a priori guarantees on the out-of-sample generalization performance with respect to such high-level goals. This lack of formal guarantees can prevent the application of learning from demonstration to safety-critical systems, especially when inference to state space regions with poor demonstration coverage is required. In this work, we show that side information, when explicitly taken into account, indeed improves the performance and safety of the learned policy with respect to task implementation. Moreover, we describe an automated procedure to systematically generate the features that encode side information expressed in temporal logic.

• #4163
Constrained Bayesian Reinforcement Learning via Approximate Linear Programming
Jongmin Lee, Youngsoo Jang, Pascal Poupart, Kee-Eung Kim
Reinforcement Learning 2

In this paper, we consider the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based Bayesian reinforcement learning (BRL) algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and generates a finite state controller in an off-line manner. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.

• #4185
Universal Reinforcement Learning Algorithms: Survey and Experiments
John Aslanides, Jan Leike, Marcus Hutter
Reinforcement Learning 2

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

• #4188
Count-Based Exploration in Feature Space for Reinforcement Learning
Jarryd Martin, Suraj Narayanan S., Tom Everitt, Marcus Hutter
Reinforcement Learning 2

We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our \phi-pseudocount achieves generalisation by exploiting the same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The \phi-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.

### Thursday 24 10:30 - 12:00 CS-CSO - Combinatorial Search and Optimisation (216)

Chair: Hiroshi Hosobe
• #1315
Weighted Model Integration with Orthogonal Transformations
David Merrell, Aws Albarghouthi, Loris D'Antoni
Combinatorial Search and Optimisation

Weighted model counting and integration (WMC/WMI) are natural problems to which we can reduce many probabilistic inference tasks, e.g., in Bayesian networks, Markov networks, and probabilistic programs. Typically, we are given a first-order formula, where each satisfying assignment is associated with a weight---e.g., a probability of occurrence---and our goal is to compute the total weight of the formula. In this paper, we target exact inference techniques for WMI that leverage the power of satisfiability modulo theories (SMT) solvers to decompose a first-order formula in linear real arithmetic into a set of hyperrectangular regions whose weight is easy to compute. We demonstrate the challenges of hyperrectangular decomposition and present a novel technique that utilizes orthogonal transformations to transform formulas in order to enable efficient inference. Our evaluation demonstrates our technique's ability to improve the time required to achieve exact probability bounds.

• #1445
Contextual Covariance Matrix Adaptation Evolutionary Strategies
Abbas Abdolmaleki, Bob Price, Nuno Lau, Luis Paulo Reis, Gerhard Neumann
Combinatorial Search and Optimisation

Many stochastic search algorithms are designed to optimize a fixed objective function to learn a task, i.e., if the objective function changes slightly, for example, due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for different ball locations. Such relearning is undesired as it is highly inefficient and many applications require a fast adaptation to a new context/situation. Therefore, we investigate contextual stochastic search algorithms that can learn multiple, similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well-known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual tasks. Our new algorithm, called contextual CMA-ES, benefits from contextual learning while preserving all the features of standard CMA-ES such as stability, avoidance of premature convergence, step size control and a minimal amount of parameter tuning.

• #1454
From Decimation to Local Search and Back: A New Approach to MaxSAT
Shaowei Cai, Chuan Luo, Haochen Zhang
Combinatorial Search and Optimisation

Maximum Satisfiability (MaxSAT) is an important NP-hard combinatorial optimization problem with many applications, and MaxSAT solving has attracted much interest. This work proposes a new incomplete approach to MaxSAT. We propose a novel decimation algorithm for MaxSAT, and then combine it with a local search algorithm. Our approach works by interleaving between the decimation algorithm and the local search algorithm, with useful information passed between them. Experiments show that our solver DeciLS achieves state-of-the-art performance on all unweighted benchmarks from the MaxSAT Evaluation 2016. Moreover, compared to SAT-based MaxSAT solvers, which have dominated industrial benchmarks for years, it performs better on industrial benchmarks and significantly better on application formulas from SAT Competition. We also extend this approach to (Weighted) Partial MaxSAT, and the resulting solvers significantly improve local search solvers on crafted and industrial benchmarks, and are complementary (better on WPMS crafted benchmarks) to SAT-based solvers.

• #1471
A Reduction based Method for Coloring Very Large Graphs
Jinkun Lin, Shaowei Cai, Chuan Luo, Kaile Su
Combinatorial Search and Optimisation

The graph coloring problem (GCP) is one of the most studied NP-hard problems and has numerous applications. Despite the practical importance of GCP, there is limited work on solving GCP for very large graphs. This paper explores techniques for solving GCP on very large real-world graphs. We first propose a reduction rule for GCP, which is based on a novel concept called degree bounded independent set. The rule is iteratively executed by interleaving between lower bound computation and graph reduction. Based on this rule, we develop a novel method called FastColor, which also exploits fast clique and coloring heuristics. We carry out experiments to compare our method FastColor with the two best algorithms for coloring large graphs that we could find. Experiments on a broad range of real-world large graphs show the superiority of our method. Additionally, our method maintains both an upper bound and a lower bound on the optimal solution, and thus it proves an optimal solution when the upper bound meets the lower bound. In our experiments, it proves the optimal solution for 97 out of 142 instances.
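
The bound-matching idea mentioned at the end of this abstract can be sketched with two textbook heuristics. This is not FastColor and it omits the reduction rule entirely. It only assumes that any proper coloring gives an upper bound on the chromatic number, that any clique gives a lower bound, and that the graph is given as a dictionary of neighbour sets with no self-loops. The function names and the example graph are hypothetical.

```python
def greedy_coloring(adj):
    """Colour vertices in descending-degree order with the smallest free colour."""
    order = sorted(adj, key=lambda v: -len(adj[v]))
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def greedy_clique(adj):
    """Grow a clique by repeatedly adding the candidate with most candidate neighbours."""
    clique, candidates = [], set(adj)
    while candidates:
        v = max(sorted(candidates), key=lambda u: len(adj[u] & candidates))
        clique.append(v)
        candidates &= adj[v]  # keep only vertices adjacent to every clique member
    return clique

# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
coloring = greedy_coloring(graph)
upper = len(set(coloring.values()))   # colours used by the heuristic
lower = len(greedy_clique(graph))     # size of the clique found
print("optimal proven" if lower == upper else "gap remains", lower, upper)
```

When the two numbers coincide, the coloring found is provably optimal, which is the situation the abstract reports for 97 of 142 instances with the authors' own method.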

• #2551
Front-to-End Bidirectional Heuristic Search with Near-Optimal Node Expansions
Jingwei Chen, Robert C. Holte, Sandra Zilles, Nathan R. Sturtevant
Combinatorial Search and Optimisation

It is well-known that any admissible unidirectional heuristic search algorithm must expand all states whose f-value is smaller than the optimal solution cost when using a consistent heuristic. Such states are called “surely expanded” (s.e.). A recent study characterized s.e. pairs of states for bidirectional search with consistent heuristics: if a pair of states is s.e. then at least one of the two states must be expanded. This paper derives a lower bound, VC, on the minimum number of expansions required to cover all s.e. pairs, and presents a new admissible front-to-end bidirectional heuristic search algorithm, Near-Optimal Bidirectional Search (NBS), that is guaranteed to do no more than 2VC expansions. We further prove that no admissible front-to-end algorithm has a worst case better than 2VC. Experimental results show that NBS competes with or outperforms existing bidirectional search algorithms, and often outperforms A* as well.

• #2694
Estimating the size of search trees by sampling with domain knowledge
Gleb Belov, Samuel Esler, Dylan Fernando, Pierre Le Bodic, George L. Nemhauser
Combinatorial Search and Optimisation

We show how recently-defined abstract models of the Branch-and-Bound algorithm can be used to obtain information on how the nodes are distributed in B&B search trees. This can be directly exploited in the form of probabilities in a sampling algorithm given by Knuth that estimates the size of a search tree. This method reduces the offline estimation error by a factor of two on search trees from Mixed-Integer Programming instances.
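
For readers unfamiliar with the sampling algorithm by Knuth referred to here, the following is a minimal sketch of a single probe. The tree interface (`children`), the optional `probs` hook for non-uniform branching probabilities, and the toy tree are assumptions made for illustration. How the paper derives its probabilities from abstract Branch-and-Bound models is not reproduced.

```python
import random

def knuth_estimate(root, children, rng, probs=None):
    """Follow one random root-to-leaf path and return a tree-size estimate.

    Each visited node contributes 1 / (probability of reaching it), which makes
    the estimate unbiased for the total number of nodes.
    """
    estimate, weight, node = 1.0, 1.0, root
    while True:
        kids = children(node)
        if not kids:
            return estimate
        # Uniform branching unless the caller supplies probabilities (hypothetical hook).
        p = probs(node, kids) if probs else [1.0 / len(kids)] * len(kids)
        i = rng.choices(range(len(kids)), weights=p)[0]
        weight /= p[i]
        estimate += weight
        node = kids[i]

# Toy tree: a complete binary tree of depth 3, with nodes identified by depth.
def binary_children(depth):
    return [depth + 1, depth + 1] if depth < 3 else []

print(knuth_estimate(0, binary_children, random.Random(0)))
```

On a perfectly balanced tree every probe returns the exact size. On unbalanced trees individual probes vary, so estimates are averaged over many probes, and better branching probabilities reduce that variance.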

### Thursday 24 10:30 - 12:00 KR-PREF - Preferences (218)

Chair: Srdjan Vesic
• #3265
Revisiting Unrestricted Rebut and Preferences in Structured Argumentation.
Jesse Heyninck, Christian Straßer
Preferences

In structured argumentation frameworks such as ASPIC+, rebuts are only allowed in conclusions produced by defeasible rules. This has been criticized as counter-intuitive especially in dialectical contexts. In this paper we show that ASPIC-, a system allowing for unrestricted rebuts, suffers from contamination problems. We remedy this shortcoming by generalizing the attack rule of unrestricted rebut. Our resulting system satisfies the usual rationality postulates for prioritized rule bases.

• #1251
Pareto Optimal Allocation under Uncertain Preferences
Haris Aziz, Ronald de Haan, Baharak Rastegari
Preferences

The assignment problem is one of the most well-studied settings in social choice, matching, and discrete allocation. We consider this problem with the additional feature that agents' preferences involve uncertainty. The setting with uncertainty leads to a number of interesting questions including the following ones. How to compute an assignment with the highest probability of being Pareto optimal? What is the complexity of computing the probability that a given assignment is Pareto optimal? Does there exist an assignment that is Pareto optimal with probability one? We consider these problems under two natural uncertainty models: (1) the lottery model in which each agent has an independent probability distribution over linear orders and (2) the joint probability model that involves a joint probability distribution over preference profiles. For both of these models, we present a number of algorithmic and complexity results highlighting the difference and similarities in the complexity of the two models.

• #1282
Fair Allocation based on Diminishing Differences
Erel Segal-Halevi, Haris Aziz, Avinatan Hassidim
Preferences

Ranking alternatives is a natural way for humans to explain their preferences. It is being used in many settings, such as school choice (NY, Boston), course allocation, and the Israeli medical lottery. In some cases (such as the latter two), several items are given to each participant. Without having any information on the underlying cardinal utilities, arguing about fairness of allocation requires extending the ordinal item ranking to ordinal bundle ranking. The most commonly used such extension is stochastic dominance (SD), where a bundle X is preferred over a bundle Y if its score is better according to all additive score functions. SD is a very conservative extension, by which few allocations are necessarily fair while many allocations are possibly fair. We propose to make a natural assumption on the underlying cardinal utilities of the players, namely that the difference between two items at the top is larger than the difference between two items at the bottom. This assumption implies a preference extension which we call diminishing differences (DD), where X is preferred over Y if its score is better according to all additive score functions satisfying the DD assumption. We give a full characterization of allocations that are necessarily-proportional or possibly-proportional according to this assumption. Based on this characterization, we present a polynomial-time algorithm for finding a necessarily-DD-proportional allocation if it exists. Using simulations, we show that with high probability, a necessarily-proportional allocation does not exist but a necessarily-DD-proportional allocation exists, and moreover, that allocation is proportional according to the underlying cardinal utilities.

• #2167
Dominance and Optimisation Based on Scale-Invariant Maximum Margin Preference Learning
Mojtaba Montazery, Nic Wilson
Preferences

In the task of preference learning, there can be natural invariance properties that one might often expect a method to satisfy. These include (i) invariance to scaling of a pair of alternatives, e.g., replacing a pair (a,b) by (2a,2b); and (ii) invariance to rescaling of features across all alternatives. Maximum margin learning approaches satisfy such invariance properties for pairs of test vectors, but not for the preference input pairs, i.e., scaling the inputs in a different way could result in a different preference relation. In this paper we define and analyse more cautious preference relations that are invariant to the scaling of features, or inputs, or both simultaneously; this leads to computational methods for testing dominance with respect to the induced relations, and for generating optimal solutions among a set of alternatives. In our experiments, we compare the relations and their associated optimality sets based on their decisiveness, computation time and cardinality of the optimal set. We also discuss connections with imprecise probability.

• #2418
Efficient Inference and Computation of Optimal Alternatives for Preference Languages Based On Lexicographic Models
Nic Wilson, Anne-Marie George
Preferences

We analyse preference inference, through consistency, for general preference languages based on lexicographic models. We identify a property, which we call strong compositionality, that applies for many natural kinds of preference statement, and that allows a greedy algorithm for determining consistency of a set of preference statements. We also consider different natural definitions of optimality, and their relations to each other, for general preference languages based on lexicographic models. Based on our framework, we show that testing consistency, and thus inference, is polynomial for a specific preference language which allows strict and non-strict statements, comparisons between outcomes and between partial tuples, both ceteris paribus and strong statements, and their combination. Computing different kinds of optimal sets is also shown to be polynomial; this is backed up by our experimental results.

• #3175
Proposing a Highly Accurate Hybrid Component-Based Factorised Preference Model in Recommender Systems
Farhad Zafari, Rasoul Rahmani, Irene Moser
Preferences

Recommender systems play an important role in today's electronic markets due to the large benefits they bring by helping businesses understand their customers' needs and preferences. The major preference components modelled by current recommender systems include user and item biases, feature value preferences, conditional dependencies, temporal preference drifts, and social influence on preferences. In this paper, we introduce a new hybrid latent factor model that achieves great accuracy by integrating all these preference components in a unified model efficiently. The proposed model employs gradient descent to optimise the model parameters, and an evolutionary algorithm to optimise the hyper-parameters and gradient descent learning rates. Using two popular datasets, we investigate the interaction effects of the preference components with each other. We conclude that depending on the dataset, different interactions exist between the preference components. Therefore, understanding these interaction effects is crucial in designing an accurate preference model in every preference dataset and domain. Our results show that on both datasets, different combinations of components result in different accuracies of recommendation, suggesting that some parts of the model interact strongly. Moreover, these effects are highly dataset-dependent, suggesting the need for exploring these effects before choosing the appropriate combination of components.

### Thursday 24 10:30 - 12:00 MAS-FVVS - Formal Verification, Validation and Synthesis (220)

Chair: Michael Winikoff
• #4148
Process Plan Controllers for Non-Deterministic Manufacturing Systems
Paolo Felli, Lavindra de Silva, Brian Logan, Svetan Ratchev
Formal Verification, Validation and Synthesis

Determining the most appropriate means of producing a given product, i.e., which manufacturing and assembly tasks need to be performed in which order and how, is termed process planning. In process planning, abstract manufacturing tasks in a process recipe are matched to available manufacturing resources, e.g., CNC machines and robots, to give an executable process plan. A process plan controller then delegates each operation in the plan to specific manufacturing resources. In this paper we present an approach to the automated computation of process plans and process plan controllers. We extend previous work to support non-deterministic (i.e., partially controllable) resources and to allow operations to be performed in parallel on the same part. We show how implicit fairness assumptions can be captured in this setting, and how this impacts the definition of process plans.

• #2521
Parameterised Verification of Data-aware Multi-Agent Systems
Francesco Belardinelli, Panagiotis Kouvaros, Alessio Lomuscio
Formal Verification, Validation and Synthesis

We introduce parameterised data-aware multi-agent systems, a formalism to reason about the temporal-epistemic properties of arbitrarily large collections of homogeneous agents, each operating on an infinite data domain. We show that their parameterised verification problem is semi-decidable for classes of interest. This is demonstrated by separately addressing the unboundedness of the number of agents and the data domain. In doing so we reduce the parameterised model checking problem for these systems to that of parameterised verification for interleaved interpreted systems. We illustrate the expressivity of the formal model by modelling English auctions with an unbounded number of bidders on unbounded data and show how the technique here introduced can be used to give formal guarantees on the resulting system behaviour.

• #3205
A Novel Symbolic Approach to Verifying Epistemic Properties of Programs
Nikos Gorogiannis, Franco Raimondi, Ioana Boureanu
Formal Verification, Validation and Synthesis

We introduce a framework for the symbolic verification of epistemic properties of programs expressed in a class of general-purpose programming languages. To this end, we reduce the verification problem to that of satisfiability of first-order formulae in appropriate theories. We prove the correctness of our reduction and we validate our proposal by applying it to two examples: the dining cryptographers problem and the ThreeBallot voting protocol. We put forward an implementation using existing solvers, and report experimental results showing that the approach can perform better than state-of-the-art symbolic model checkers for temporal-epistemic logic.

• #3245
Verifying Fault-tolerance in Parameterised Multi-Agent Systems
Panagiotis Kouvaros, Alessio Lomuscio
Formal Verification, Validation and Synthesis

We develop a technique to evaluate the fault-tolerance of a multi-agent system whose number of agents is unknown at design time. We present a method for injecting a variety of non-ideal behaviours, or faults, studied in the safety-analysis literature into the abstract agent templates that are used to generate an unbounded family of multi-agent systems with different sizes. We define the parameterised fault-tolerance problem as the decision problem of establishing whether any concrete system, in which the ratio of faulty versus non-faulty agents is under a given threshold, satisfies a given temporal-epistemic specification. We put forward a sound and complete technique for solving the problem for the semantical set-up considered. We present an implementation and a case study identifying the threshold under which the alpha swarm aggregation algorithm is robust to faults against its temporal-epistemic specifications.

• #3373
Verification of Broadcasting Multi-Agent Systems against an Epistemic Strategy Logic
Francesco Belardinelli, Alessio Lomuscio, Aniello Murano, Sasha Rubin
Formal Verification, Validation and Synthesis

We study a class of synchronous, perfect-recall multi-agent systems with imperfect information and broadcasting (i.e., fully observable actions). We define an epistemic extension of strategy logic with incomplete information and the assumption of uniform and coherent strategies. In this setting, we prove that the model checking problem, and thus rational synthesis, is decidable with non-elementary complexity. We exemplify the applicability of the framework on a rational secret-sharing scenario.

• #3627
An Abstraction-Refinement Methodology for Reasoning about Network Games
Guy Avni, Shibashis Guha, Orna Kupferman
Formal Verification, Validation and Synthesis

Network games (NGs) are played on directed graphs and are extensively used in network design and analysis. Search problems for NGs include finding special strategy profiles such as a Nash equilibrium and a globally optimal solution. The networks modeled by NGs may be huge. In formal verification, abstraction has proven to be an extremely effective technique for reasoning about systems with big and even infinite state spaces. We describe an abstraction-refinement methodology for reasoning about NGs. Our methodology is based on an abstraction function that maps the state space of an NG to a much smaller state space. We search for a global optimum and a Nash equilibrium by reasoning on an under- and an over-approximation defined on top of this smaller state space. When the approximations are too coarse to find such profiles, we refine the abstraction function. Our experimental results demonstrate the efficiency of the methodology.

### Thursday 24 10:30 - 12:30 EAR-4 - Early Career 4 (Plenary 2)

Chair: Craig Knoblock
• #26
Learning from Data Heterogeneity: Algorithms and Applications
Jingrui He
Early Career 4

Nowadays, as an intrinsic property of big data, data heterogeneity can be seen in a variety of real-world applications, ranging from security to manufacturing, from healthcare to crowdsourcing. It refers to any inhomogeneity in the data, and can be present in a variety of forms, corresponding to different types of data heterogeneity, such as task/view/instance/oracle heterogeneity. As shown in previous work as well as our own work, learning from data heterogeneity not only helps people gain a better understanding of the large volume of data, but also provides a means to leverage such data for effective predictive modeling. In this paper, along with multiple real applications, we will briefly review state-of-the-art techniques for learning from data heterogeneity, and demonstrate their performance at addressing these real world problems.

• #31
Unsupervised Learning via Total Correlation Explanation
Greg Ver Steeg
Early Career 4

Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Cor-relation Ex-planation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.
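
Total correlation, the dependence measure named in this abstract, is the sum of the marginal entropies minus the joint entropy. The sketch below computes it for a small discrete joint distribution. It only illustrates the measure itself and is not CorEx. The function names and the two toy distributions are assumptions made for illustration.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a mapping from outcome to probability."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def total_correlation(joint):
    """Sum of marginal entropies minus joint entropy; keys are outcome tuples."""
    n_vars = len(next(iter(joint)))
    marginal_sum = 0.0
    for i in range(n_vars):
        marginal = defaultdict(float)
        for outcome, p in joint.items():
            marginal[outcome[i]] += p
        marginal_sum += entropy(marginal)
    return marginal_sum - entropy(joint)

# Two perfectly redundant bits versus two independent fair bits.
copies = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
print(total_correlation(copies), total_correlation(independent))
```

The redundant pair carries one bit of shared dependence, while the independent pair carries none.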

• #22
Reinforcement learning for high stakes domains
Emma Brunskill
Early Career 4

None

• #34
Playing the Wrong Game
Reshef Meir
Early Career 4

A huge body of work from decision making and experimental economics teaches us that a decision maker facing a problem often acts suboptimally due to various behavioral and cognitive biases: risk attitudes, altruism, computational limitations and so on. In multiplayer games, these biases may change the set of game equilibria, and have a non-linear effect on agents’ utilities. However, explicitly modelling and analyzing every combination of biases is a daunting task. We show that in routing games with arbitrary heterogeneous biases, agents' costs in equilibrium can be bounded based only on their own subjective cost functions, as well as on structural parameters of the underlying network. Time permitting, I will try to give a few insights into the working process, the dead-ends, failures, and unexpected breakthroughs behind this project and my work in general.

### Thursday 24 10:30 - 12:30 JOU-KR1 - Journal Track: Knowledge Representation 1 (203)

Chair: Franz Baader
• #1522
New Canonical Representations by Augmenting OBDDs with Conjunctive Decomposition (Extended Abstract)
Yong Lai, Dayou Liu, Minghao Yin
Journal Track: Knowledge Representation 1

We identify two families of canonical representations called ROBDD[$\wedge_{\hat{i}}$]$_C$ and ROBDD[$\wedge_{\hat{T},i}$]$_T$ by augmenting ROBDD with two types of conjunctive decompositions. These representations cover the three existing languages ROBDD, ROBDD with as many implied literals as possible (ROBDD-$L_\infty$), and AND/OR BDD. We introduce a new time efficiency criterion called rapidity which reflects the idea that exponential operations may be preferable if the language can be exponentially more succinct. Then we demonstrate that the expressivity, succinctness and operation rapidity do not decrease from ROBDD[$\wedge_{\hat{T},i}$]$_T$ to ROBDD[$\wedge_{\hat{i}}$]$_C$, and then to ROBDD[$\wedge_{\widehat{i+1}}$]$_C$. We also demonstrate that ROBDD[$\wedge_{\hat{i}}$]$_C$ ($i > 1$) and ROBDD[$\wedge_{\hat{T},i}$]$_T$ are not less tractable than ROBDD-$L_\infty$ and ROBDD, respectively. Finally, we develop a compiler for ROBDD[$\wedge_{\hat{\infty}}$]$_C$ which significantly advances the compiling efficiency of canonical representations.

• #4202
On the Expressivity of Inconsistency Measures (Extended Abstract)
Matthias Thimm
Journal Track: Knowledge Representation 1

We survey recent approaches to inconsistency measurement in propositional logic and provide a comparative analysis in terms of their expressivity. For that, we introduce four different expressivity characteristics that quantitatively assess the number of different knowledge bases that a measure can distinguish. Our approach aims at complementing ongoing discussions on rationality postulates for inconsistency measures by considering expressivity as a desirable property. We evaluate a large selection of measures on the proposed characteristics and conclude that a distance-based measure from [Grant and Hunter, 2013] has maximal expressivity along all considered characteristics.

• #4205
The Ceteris Paribus Structure of Logics of Game Forms (Extended Abstract)
Davide Grossi, Emiliano Lorini, François Schwarzentruber
Journal Track: Knowledge Representation 1

We present a simple Ceteris Paribus Logic (CP) and study its relationship with existing logics that deal with the representation of choice and power in games in normal form including atemporal STIT, Coalition Logic of Propositional Control (CL-PC) and Dynamic Logic of Propositional Assignments (DL-PA). Thanks to the polynomial reduction of the satisfiability problem for atemporal STIT to the satisfiability problem for CP, we obtain a complexity result for the latter problem.

• #4210
A New Semantics for Overriding in Description Logics (Extended Abstract)
Piero Bonatti, Marco Faella, Iliana M. Petrova, Luigi Sauro
Journal Track: Knowledge Representation 1

Nonmonotonic inferences are not yet supported by Description Logic technology, although their potential usefulness is widely recognized. Lack of support for nonmonotonic reasoning is due to a number of issues related to expressiveness, computational complexity, and optimizations. This work contributes to the practical support of nonmonotonic reasoning in description logics by introducing a new semantics designed to address knowledge engineering needs. The formalism is validated through extensive comparison with the other nonmonotonic DLs, and systematic scalability tests.

• #4228
Automated Conjecturing I: Fajtlowicz's Dalmatian Heuristic Revisited (Extended Abstract)
Craig E. Larson, Nico Van Cleemput
Journal Track: Knowledge Representation 1

This condensed summary highlights the results of a 2016 AIJ paper reporting on a successful general-purpose conjecturing program.

• #4226
Bayesian Network Structure Learning with Integer Programming: Polytopes, Facets and Complexity (Extended Abstract)
James Cussens, Matti Järvisalo, Janne H. Korhonen, Mark Bartlett
Journal Track: Knowledge Representation 1

Developing accurate algorithms for learning structures of probabilistic graphical models is an important problem within modern AI research. Here we focus on score-based structure learning for Bayesian networks as arguably the most central class of graphical models. A successful generic approach to optimal Bayesian network structure learning (BNSL), based on integer programming (IP), is implemented in the Gobnilp system. Despite the recent algorithmic advances, current understanding of foundational aspects underlying the IP based approach to BNSL is still somewhat lacking. In this paper, we provide theoretical contributions towards understanding fundamental aspects of cutting planes and the related separation problem in this context, ranging from NP-hardness results to analysis of polytopes and the related facets in connection to BNSL.

### Thursday 24, 10:30 - 12:30 SIS-MISC - Sister Conference Track: HCI, CBR, Machine Learning, Robotics (204)

Chair: Yair Zick
• #4247
Competence Guided Model for Casebase Maintenance
Ditty Mathew, Sutanu Chakraborti
Sister Conference Track: HCI, CBR, Machine Learning, Robotics

A competence guided casebase maintenance algorithm retains a case in the casebase if it is useful to solve many problems and ensures that the casebase is highly competent. In this paper, we address the compositional adaptation process (of which single case adaptation is a special case) during casebase maintenance by proposing a case competence model, together with a measure called retention score that estimates the retention quality of a case. We also propose a revised algorithm based on the retention score to estimate the competent subset of a casebase. We used synthetic datasets to test the effectiveness of the competent subset obtained from the proposed model. We also applied this model in a tutoring application and analyzed the competent subset of concepts in tutoring resources. Empirical results show that the proposed model is effective and overcomes the limitation of the footprint-based competence model in compositional adaptation applications.

• #4250
Local Topic Discovery via Boosted Ensemble of Nonnegative Matrix Factorization
Sangho Suh, Jaegul Choo, Joonseok Lee, Chandan K. Reddy
Sister Conference Track: HCI, CBR, Machine Learning, Robotics

Nonnegative matrix factorization (NMF) has been increasingly popular for topic modeling of large-scale documents. However, the resulting topics often represent only general, thus redundant information about the data rather than minor, but potentially meaningful information to users. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method leverages the idea of an ensemble model to successively perform NMF given a residual matrix obtained from previous stages and generates a sequence of topic sets. The novelty of our method lies in the fact that it utilizes the residual matrix inspired by a state-of-the-art gradient boosting model and applies a sophisticated local weighting scheme on the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users.

• #4274
Multi-Type Activity Recognition from a Robot's Viewpoint
Ilaria Gori, J. K. Aggarwal, Larry Matthies, Michael S. Ryoo
Sister Conference Track: HCI, CBR, Machine Learning, Robotics

The computer vision literature is rich in works where different types of activities -- single actions, two-person interactions or ego-centric activities, to name a few -- have been analyzed. However, traditional methods treat such types of activities separately, while in real settings detecting and recognizing different types of activities simultaneously is necessary. We first design a new unified descriptor, called Relation History Image (RHI), which can be extracted from all the activity types we are interested in. We then formulate an optimization procedure to detect and recognize activities of different types. We assess our approach on a new dataset recorded from a robot-centric perspective as well as on publicly available datasets, and evaluate its quality compared to multiple baselines.

• #4245
Efficient Techniques for Crowdsourced Top-k Lists
Luca de Alfaro, Vassilis Polychronopoulos, Neoklis Polyzotis
Sister Conference Track: HCI, CBR, Machine Learning, Robotics

We focus on the problem of obtaining top-k lists of items from larger itemsets, using human workers to perform comparisons among items. An example application is short-listing a large set of college applications using advanced students as workers. We describe novel efficient techniques and explore their tolerance to adversarial behavior and the tradeoffs among different measures of performance (latency, expense and quality of results). We empirically evaluate the proposed techniques against prior art using simulations as well as real crowds in Amazon Mechanical Turk. A randomized variant of the proposed algorithms achieves significant budget savings, especially for very large itemsets and large top-k lists, with negligible risk of lowering the quality of the output.

• #4243
The Many Benefits of Annotator Rationales for Relevance Judgments
Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, Matthew Lease
Sister Conference Track: HCI, CBR, Machine Learning, Robotics

When collecting subjective human ratings of items, it can be difficult to measure and enforce data quality due to task subjectivity and lack of insight into how judges arrive at each rating decision. To address this, we propose requiring judges to provide a specific type of rationale underlying each rating decision. We evaluate this approach in the domain of Information Retrieval, where human judges rate the relevance of Webpages. Cost-benefit analysis over 10,000 judgments collected on Mechanical Turk suggests a win-win: experienced crowd workers provide rationales with no increase in task completion time while providing further benefits, including more reliable judgments and greater transparency.

• #4237
Enhancing Crowdworkers' Vigilance
Avshalom Elmalech, David Sarne, Esther David, Chen Hajaj
Sister Conference Track: HCI, CBR, Machine Learning, Robotics

This paper presents methods for improving the attention span of workers in tasks that heavily rely on their attention to the occurrence of rare events. The underlying idea in our approach is to dynamically augment the task with some dummy (artificial) events at different times throughout the task, rewarding the worker upon identifying and reporting them. The proposed approach is an alternative to the traditional approach of exclusively relying on rewarding the worker for successfully identifying the event of interest itself. We propose three methods for timing the dummy events throughout the task. Two of these methods are static and determine the timing of the dummy events at random or uniformly throughout the task. The third method is dynamic and uses the identification (or misidentification) of dummy events as a signal for the worker's attention to the task, adjusting the rate of dummy events generation accordingly.

### Thursday 24, 10:30 - 12:30 Competition (206)

Chair: Jochen Renz
• Angry Birds
Competition
### Thursday 24, 14:00 - 15:00 Invited Talk (Plenary 2)

Chair: Zhi-Hua Zhou
• Deep Learning at Alibaba
Rong Jin
Invited Talk
### Thursday 24, 14:00 - 15:00 Invited Talk (203-204)

Chair: Pompeu Casanovas
• From Automation to Autonomous Systems: A Legal Phenomenology with Problems of Accountability
Ugo Pagallo
Invited Talk
### Thursday 24, 15:00 - 16:00 Panel (Plenary 2)

Chair: TBD
• AI in 2027
Panelists: Noa Agmon, Sven König, Fausto Giunchiglia, Kevin Leyton-Brown.
Panel
### Thursday 24, 15:00 - 16:00 ML-KBL - Knowledge-Based Learning (203-204)

Chair: Freddy Lecue
• #1378
Extracting Visual Knowledge from the Web with Multimodal Learning
Dihong Gong, Daisy Zhe Wang
Knowledge-Based Learning

We consider the problem of automatically extracting visual objects from web images. Despite the extraordinary advancement in deep learning, visual object detection remains a challenging task. To overcome the deficiency of pure visual techniques, we propose to make use of meta text surrounding images on the Web for enhanced detection accuracy. In this paper we present a multimodal learning algorithm to integrate text information into visual knowledge extraction. To demonstrate the effectiveness of our approach, we developed a system that takes raw webpages as input, and automatically extracts visual knowledge (e.g. object bounding boxes) from tens of millions of images crawled from the Web. Experimental results based on 46 object categories show that the extraction precision is improved significantly from 73% (with state-of-the-art deep learning programs) to 81%, which is equivalent to a 31% reduction in error rates.

• #1835
Adversarial Generation of Real-time Feedback with Neural Networks for Simulation-based Training
Xingjun Ma, Sudanthi Wijewickrema, Shuo Zhou, Yun Zhou, Zakaria Mhammedi, Stephen O'Leary, James Bailey
Knowledge-Based Learning

Simulation-based training (SBT) is gaining popularity as a low-cost and convenient training technique in a vast range of applications. However, for an SBT platform to be fully utilized as an effective training tool, it is essential that feedback on performance is provided automatically in real-time during training. It is the aim of this paper to develop an efficient and effective feedback generation method for the provision of real-time feedback in SBT. Existing methods either have low effectiveness in improving novice skills or suffer from low efficiency, resulting in their inability to be used in real-time. In this paper, we propose a neural network based method to generate feedback using the adversarial technique. The proposed method utilizes a bounded adversarial update to minimize an L1 regularized loss via back-propagation. We empirically show that the proposed method can be used to generate simple, yet effective feedback. Also, it was observed to have high effectiveness and efficiency when compared to existing methods, thus making it a promising option for real-time feedback generation in SBT.

• #3101
Object Detection Meets Knowledge Graphs
Yuan Fang, Kingsley Kuan, Jie Lin, Cheston Tan, Vijay Chandrasekhar
Knowledge-Based Learning

Object detection in images is a crucial task in computer vision, with important applications ranging from security surveillance to autonomous vehicles. Existing state-of-the-art algorithms, including deep neural networks, only focus on utilizing features within an image itself, largely neglecting the vast amount of background knowledge about the real world. In this paper, we propose a novel framework of knowledge-aware object detection, which enables the integration of external knowledge such as knowledge graphs into any object detection algorithm. The framework employs the notion of semantic consistency to quantify and generalize knowledge, which improves object detection through a re-optimization process to achieve better consistency with background knowledge. Finally, empirical evaluation on two benchmark datasets shows that our approach can significantly increase recall by up to 6.3 points without compromising mean average precision, when compared to the state-of-the-art baseline.

• #3493
Logic Tensor Networks for Semantic Image Interpretation
Ivan Donadello, Luciano Serafini, Artur d'Avila Garcez
Knowledge-Based Learning

Semantic Image Interpretation (SII) is the task of extracting structured semantic descriptions from images. It is widely agreed that the combined use of visual data and background knowledge is of great importance for SII. Recently, Statistical Relational Learning (SRL) approaches have been developed for reasoning under uncertainty and learning in the presence of data and rich knowledge. Logic Tensor Networks (LTNs) are an SRL framework that integrates neural networks with first-order fuzzy logic to allow (i) efficient learning from noisy data in the presence of logical constraints, and (ii) reasoning with logical formulas describing general properties of the data. In this paper, we develop and apply LTNs to two of the main tasks of SII, namely, the classification of an image's bounding boxes and the detection of the relevant part-of relations between objects. To the best of our knowledge, this is the first successful application of SRL to such SII tasks. The proposed approach is evaluated on a standard image processing benchmark. Experiments show that background knowledge in the form of logical constraints can improve the performance of purely data-driven approaches, including the state-of-the-art Fast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show that the use of logical background knowledge adds robustness to the learning system when errors are present in the labels of the training data.

### Thursday 24, 15:00 - 16:00 UAI-UAI - Uncertainty (210)

Chair: Nic Wilson
• #1408
Plato's Cave in the Dempster-Shafer land--the Link between Pignistic and Plausibility Transformations
Chunlai Zhou, Biao Qin, Xiaoyong Du
Uncertainty

In reasoning under uncertainty in AI, there are (at least) two useful and different ways of understanding beliefs: the first is as absolute belief or degree of belief in propositions and the second is as belief update or measure of change in belief. Pignistic and plausibility transformations are two well-known probability transformations that map belief functions to probability functions in the Dempster-Shafer theory of evidence. In this paper, we establish the link between pignistic and plausibility transformations by devising a belief-update framework for belief functions where plausibility transformation works on belief update while pignistic transformation operates on absolute belief. In this framework, we define a new belief-update operator connecting the two transformations, and interpret the framework in a belief-function model of parametric statistical inference. As a metaphor, these two transformations projecting the belief-update framework for belief functions to that for probabilities are likened to the fire projecting reality into shadows on the wall in Plato's cave.
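As a rough illustration of the two transformations named in this abstract, the sketch below uses their standard textbook definitions rather than anything taken from the paper: the pignistic transformation spreads the mass of each focal set evenly over its elements, and the plausibility transformation normalises the plausibilities of the singletons. The mass function and the function names are hypothetical, and the sketch assumes no mass is assigned to the empty set.

```python
from fractions import Fraction

def pignistic(mass):
    """Spread each focal set's mass evenly over its elements.

    Assumes the empty set carries no mass (its length would be zero).
    """
    betp = {}
    for focal_set, m in mass.items():
        share = m / len(focal_set)
        for x in focal_set:
            betp[x] = betp.get(x, 0) + share
    return betp

def plausibility_transform(mass):
    """Normalise the plausibility of each singleton so the values sum to one."""
    pl = {}
    for focal_set, m in mass.items():
        for x in focal_set:
            # Pl({x}) is the total mass of the focal sets that contain x.
            pl[x] = pl.get(x, 0) + m
    total = sum(pl.values())
    return {x: v / total for x, v in pl.items()}

# Hypothetical mass function on the frame {a, b, c}.
mass = {
    frozenset("a"): Fraction(1, 2),
    frozenset("ab"): Fraction(3, 10),
    frozenset("abc"): Fraction(1, 5),
}
betp = pignistic(mass)
plp = plausibility_transform(mass)
```

On this toy input both results are probability distributions over {a, b, c}, but they differ: the pignistic value of `a` is 43/60, while its normalised plausibility is 10/17.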

• #2328
Adaptive Elicitation of Preferences under Uncertainty in Sequential Decision Making Problems
Nawal Benabbou, Patrice Perny
Uncertainty

This paper aims to introduce an adaptive preference elicitation method for interactive decision support in sequential decision problems. The Decision Maker's preferences are assumed to be representable by an additive utility, initially unknown or imperfectly known. We first study the determination of possibly optimal policies when admissible utilities are imprecisely defined by some linear constraints derived from observed preferences. Then, we introduce a new approach interleaving elicitation of utilities and backward induction to incrementally determine an optimal or near-optimal policy. We propose an interactive algorithm with performance guarantees and describe numerical experiments demonstrating the practical efficiency of our approach.

• #2437
Incremental Decision Making Under Risk with the Weighted Expected Utility Model
Hugo Gilbert, Nawal Benabbou, Patrice Perny, Olivier Spanjaard, Paolo Viappiani
Uncertainty

This paper deals with decision making under risk with the Weighted Expected Utility (WEU) model, which is a model generalizing expected utility and providing stronger descriptive possibilities. We address the problem of identifying, within a given set of lotteries, a (near-)optimal solution for a given decision maker consistent with the WEU theory. The WEU model is parameterized by two real-valued functions. We propose here a new incremental elicitation procedure to progressively reduce the imprecision about these functions until a robust decision can be made. We also give experimental results showing the practical efficiency of our method.

• #2685
Causal Discovery from Nonstationary/Heterogeneous Data: Skeleton Estimation and Orientation Determination
Kun Zhang, Biwei Huang, Jiji Zhang, Clark Glymour, Bernhard Schölkopf
Uncertainty

It is commonplace to encounter nonstationary or heterogeneous data, of which the underlying generating process changes over time or across data sets (the data sets may have different experimental conditions or data collection conditions). Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper we develop a principled framework for causal discovery from such data, called Constraint-based causal Discovery from Nonstationary/heterogeneous Data (CD-NOD), which addresses two important questions. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine causal orientations by making use of independence changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.

### Thursday 24, 15:00 - 16:00 MT-CS - Computational Sustainability (211)

Chair: Zinovi Rabinovich
• #1585
Operation Frames and Clubs in Kidney Exchange
Gabriele Farina, John P. Dickerson, Tuomas Sandholm
Computational Sustainability

A kidney exchange is a centrally-administered barter market where patients swap their willing yet incompatible donors. Modern kidney exchanges use 2-cycles, 3-cycles, and chains initiated by non-directed donors (altruists who are willing to give a kidney to anyone) as the means for swapping. We propose significant generalizations to kidney exchange. We allow more than one donor to donate in exchange for their desired patient receiving a kidney. We also allow for the possibility of a donor willing to donate if any of a number of patients receive kidneys. Furthermore, we combine these notions and generalize them. The generalization is to exchange among organ clubs, where a club is willing to donate organs outside the club if and only if the club receives organs from outside the club according to given specifications. We prove that unlike in the standard model, the uncapped clearing problem is NP-complete. We also present the notion of operation frames that can be used to sequence the operations across batches, and present integer programming formulations for the market clearing problems for these new types of organ exchanges. Experiments show that in the single-donation setting, operation frames improve planning by 34% - 51%. Allowing up to two donors to donate in exchange for one kidney donated to their designated patient yields a further increase in social welfare.
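The standard swap structures mentioned at the start of this abstract, 2-cycles and 3-cycles, can be pictured as short directed cycles in a compatibility graph. The following is a minimal sketch under that assumption. The graph encoding, the function name `short_cycles`, and the sample data are all hypothetical, and the sketch does not cover chains, clubs, or operation frames.

```python
def short_cycles(compatible):
    """Enumerate directed 2-cycles and 3-cycles in a compatibility graph.

    compatible[u] is the set of patient-donor pairs whose patient can
    receive a kidney from the donor of pair u (an edge u -> v).
    Each cycle is reported once, starting from its smallest node.
    """
    cycles = set()
    for u in compatible:
        for v in compatible[u]:
            if v == u:
                continue  # ignore self-loops
            # 2-cycle: u -> v -> u
            if u in compatible.get(v, ()) and u < v:
                cycles.add((u, v))
            # 3-cycle: u -> v -> w -> u
            for w in compatible.get(v, ()):
                if w in (u, v):
                    continue
                if u in compatible.get(w, ()) and u < v and u < w:
                    cycles.add((u, v, w))
    return cycles

# Hypothetical graph with four patient-donor pairs.
compatible = {1: {2}, 2: {1, 3}, 3: {1, 4}, 4: set()}
found = short_cycles(compatible)
```

Here pairs 1 and 2 can swap directly, and pairs 1, 2 and 3 form a three-way exchange. Pair 4 takes part in no cycle.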

• #1708
Contract Design for Energy Demand Response
Reshef Meir, Hongyao Ma, Valentin Robu
Computational Sustainability

Power companies such as Southern California Edison (SCE) use Demand Response (DR) contracts to incentivize consumers to reduce their power consumption during periods when demand forecast exceeds supply. Current mechanisms in use offer contracts to consumers independent of one another, do not take into consideration consumers' heterogeneity in consumption profile or reliability, and fail to achieve high participation. We introduce DR-VCG, a new DR mechanism that offers a flexible set of contracts (which may include the standard SCE contracts) and uses VCG pricing. We prove that DR-VCG elicits truthful bids, incentivizes honest preparation efforts, and enables efficient computation of allocation and prices. With simple fixed-penalty contracts, the optimization goal of the mechanism is an upper bound on the probability that the reduction target is missed. Extensive simulations show that compared to the current mechanism deployed by SCE, the DR-VCG mechanism achieves higher participation, increased reliability, and significantly reduced total expenses.

• #2142
Blue Skies: A Methodology for Data-Driven Clear Sky Modelling
Kartik Palani, Ramachandra Kota, Amar Prakash Azad, Vijay Arya
Computational Sustainability

One of the major challenges confronting the widespread adoption of solar energy is the uncertainty of production. The energy generated by photo-voltaic systems is a function of the received solar irradiance which varies due to atmospheric and weather conditions. A key component required for forecasting irradiance accurately is the clear sky model which estimates the average irradiance at a location at a given time in the absence of clouds. Current methods for modelling clear sky irradiance are either inaccurate or require extensive atmospheric data, which tends to vary with location and is often unavailable. In this paper, we present a data-driven methodology, Blue Skies, for modelling clear sky irradiance solely based on historical irradiance measurements. Using machine learning techniques, Blue Skies is able to generate clear sky models that are more accurate spatio-temporally compared to the state of the art, reducing errors by almost 50%.

• #2383
Deep Multi-species Embedding
Di Chen, Yexiang Xue, Daniel Fink, Shuo Chen, Carla P. Gomes
Computational Sustainability

Understanding how species are distributed across landscapes over time is a fundamental question in biodiversity research. Unfortunately, most species distribution models only target a single species at a time, despite strong ecological evidence that species are not independently distributed. We propose Deep Multi-Species Embedding (DMSE), which jointly embeds vectors corresponding to multiple species as well as vectors representing environmental covariates into a common high-dimensional feature space via a deep neural network. Applied to bird observational data from the citizen science project eBird, we demonstrate how the DMSE model discovers inter-species relationships to outperform single-species distribution models (random forests and SVMs) as well as competing multi-label models. Additionally, we demonstrate the benefit of using a deep neural network to extract features within the embedding and show how they improve the predictive performance of species distribution modelling. An important domain contribution of the DMSE model is the ability to discover and describe species interactions while simultaneously learning the shared habitat preferences among species. As an additional contribution, we provide a graphical embedding of hundreds of bird species in the Northeast US.

### Thursday 24, 15:00 - 16:00 MAS-SC - Social Choice (212)

Chair: Liz Sonenberg
• #1614
Multiwinner Rules on Paths From k-Borda to Chamberlin–Courant
Piotr Faliszewski, Piotr Skowron, Arkadii Slinko, Nimrod Talmon
Social Choice

The classical multiwinner rules are designed for particular purposes. For example, variants of k-Borda are used to find k best competitors in judging contests while the Chamberlin-Courant rule is used to select a diverse set of k products. These rules represent two extremes of the multiwinner world. At times, however, one might need to find an appropriate trade-off between these two extremes. We explore continuous transitions from k-Borda to Chamberlin-Courant and study intermediate rules.
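To make the two extremes concrete, here is a small brute-force sketch. It assumes the usual Borda-based formulation of both rules: k-Borda picks the k candidates with the highest total Borda score, and Chamberlin-Courant picks the committee that maximises the sum, over voters, of the Borda score of each voter's favourite committee member. The profile, the function names, and the alphabetical tie-breaking are illustrative choices, not taken from the paper.

```python
from itertools import combinations

def borda(ranking, candidate):
    """Borda score: m-1 points for the top position, 0 for the last."""
    return len(ranking) - 1 - ranking.index(candidate)

def k_borda(profile, k):
    """Select the k candidates with the highest total Borda score."""
    candidates = sorted(profile[0])
    total = {c: sum(borda(r, c) for r in profile) for c in candidates}
    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    return set(sorted(candidates, key=lambda c: (-total[c], c))[:k])

def chamberlin_courant(profile, k):
    """Select the size-k committee whose best member per voter scores highest."""
    candidates = sorted(profile[0])

    def score(committee):
        return sum(max(borda(r, c) for c in committee) for r in profile)

    # Exhaustive search; only feasible for small candidate sets.
    return set(max(combinations(candidates, k), key=score))

# Hypothetical profile: three voters rank a > b > c > d, two rank d > b > a > c.
profile = [list("abcd")] * 3 + [list("dbac")] * 2
top = k_borda(profile, 2)
diverse = chamberlin_courant(profile, 2)
```

On this profile k-Borda selects {a, b}, the two candidates that score well overall, while Chamberlin-Courant selects {a, d}, giving each group of voters its own top choice.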

• #1964
Fair and Efficient Social Choice in Dynamic Settings
Rupert Freeman, Seyed Majid Zahedi, Vincent Conitzer
Social Choice

We study a dynamic social choice problem in which an alternative is chosen at each round according to the reported valuations of a set of agents. In the interests of obtaining a solution that is both efficient and fair, we aim to maximize the long-term Nash social welfare, which is the product of all agents' utilities. We present and analyze two greedy algorithms for this problem, including the classic Proportional Fair (PF) algorithm. We analyze several versions of the algorithms and how they relate, and provide an axiomatization of PF. Finally, we evaluate the algorithms on data gathered from a computer systems application.
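The objective named in this abstract, Nash social welfare as the product of the agents' utilities, can be contrasted with a plain sum in a few lines. This is only a sketch of the objective itself with made-up utility vectors. It does not implement the greedy or Proportional Fair algorithms studied in the paper.

```python
def nash_social_welfare(utilities):
    """Product of the agents' utilities."""
    product = 1
    for u in utilities:
        product *= u
    return product

def utilitarian_welfare(utilities):
    """Sum of the agents' utilities, shown for comparison."""
    return sum(utilities)

# Two hypothetical outcomes with the same total utility.
lopsided = [9, 1]
balanced = [5, 5]
```

Both outcomes have a utilitarian welfare of 10, but the product is 9 for the lopsided outcome and 25 for the balanced one, which is why maximising the product favours outcomes that are both efficient and fair.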

• #2097
Online Roommate Allocation Problem
Guangda Huzhang, Xin Huang, Shengyu Zhang, Xiaohui Bei
Social Choice

We study the online allocation problem under a roommate market model introduced in [Chan et al., 2016]. Consider a fixed supply of n rooms and a list of 2n applicants arriving sequentially in an online fashion. The problem is to assign a room to each person upon her arrival, such that after the algorithm terminates, each room is shared by exactly two people. We focus on two objectives: (1) maximizing the social welfare, which is defined as the sum of valuations that applicants have for their rooms, plus the happiness value between each pair of roommates; (2) the allocation should satisfy certain stability conditions, such that no group of people would be willing to switch roommates or rooms. We first show a polynomial-time online algorithm that achieves constant competitive ratio for social welfare maximization. We then extend it to the case where each room is assigned to c > 2 people, and achieve a competitive ratio of Ω(1/c^2). Finally, we show both positive and negative results in satisfying different stability conditions in this online setting.

• #3687
On Coalitional Manipulation for Multiwinner Elections: Shortlisting
Robert Bredereck, Andrzej Kaczmarczyk, Rolf Niedermeier
Social Choice

Shortlisting of candidates—selecting a group of “best” candidates—is a special case of multiwinner elections. We provide the first in-depth study of the computational complexity of strategic voting for shortlisting based on the most natural and simple voting rule in this scenario, l-Bloc (every voter approves l candidates). In particular, we investigate the influence of several tie-breaking mechanisms (e.g. pessimistic versus optimistic) and group evaluation functions (e.g. egalitarian versus utilitarian) and conclude that in an egalitarian setting strategic voting may indeed be computationally intractable regardless of the tie-breaking rule. We provide a fairly comprehensive picture of the computational complexity landscape of this neglected scenario.

### Thursday 24, 15:00 - 16:00 MAS-AOSE - Agent-Oriented Software Engineering (213)

Chair: Michael Winikoff
• #3334
Constraint Games revisited
Anthony Palmieri, Arnaud Lallouet
Agent-Oriented Software Engineering

Constraint Games are a recent framework proposed to model and solve static games where Constraint Programming is used to express players' preferences. In this paper, we rethink their solving technique in terms of constraint propagation by considering players' preferences as global constraints. This yields not only a more elegant but also a more efficient framework. Our new complete solver is faster than the previous state of the art and is able to find all pure Nash equilibria for some problems with 200 players. We also show that performance can be greatly improved for graphical games, allowing some games with 2000 players to be solved.

• #1406
Agent Design Consistency Checking via Planning
Nitin Yadav, John Thangarajah, Sebastian Sardina
Agent-Oriented Software Engineering

In this work we present a novel approach to check the consistency of agent designs (prior to any implementation) with respect to the requirements specifications via automated planning. This checking is essentially a search problem, which makes planning technology an appropriate solution. We focus our work on BDI agent systems and the Prometheus design methodology in order to directly compare our approach to previous work. Our experiments on more than 16K random instances prove that the approach is more effective than previous ones proposed: it achieves higher coverage, lower run-time, and importantly, can handle loops in the agent detailed design and unbounded subgoal reasoning.

• #1684
Omniscient Debugging for Cognitive Agent Programs
Vincent J. Koeman, Koen V. Hindriks, Catholijn M. Jonker
Agent-Oriented Software Engineering

For real-time programs, reproducing a bug by rerunning the system is likely to fail, making fault localization a time-consuming process. Omniscient debugging is a technique that stores each run in such a way that it supports going backwards in time. However, the overhead of existing omniscient debugging implementations for languages like Java is so large that it cannot be effectively used in practice. In this paper, we show that for agent-oriented programming practical omniscient debugging is possible. We design a tracing mechanism for efficiently storing and exploring agent program runs. We are the first to demonstrate that this mechanism does not affect program runs by empirically establishing that the same tests succeed or fail. Usability is supported by a trace visualization method aimed at more effectively locating faults in agent programs.

• #2593
No Pizza for You: Value-based Plan Selection in BDI Agents
Stephen Cranefield, Michael Winikoff, Virginia Dignum, Frank Dignum
Agent-Oriented Software Engineering

Autonomous agents are increasingly required to be able to make moral decisions. In these situations, the agent should be able to reason about the ethical bases of the decision and explain its decision in terms of the moral values involved. This is of special importance when the agent is interacting with a user and should understand the value priorities of the user in order to provide adequate support. This paper presents a model of agent behavior that takes into account user preferences and moral values.

### Thursday 24, 15:00 - 16:00 NLP-NLG - Natural Language Generation (216)

Chair: Ingrid Zukerman
• #3333
Human-Centric Justification of Machine Learning Predictions
Or Biran, Kathleen McKeown
Natural Language Generation

Human decision makers in many domains can make use of predictions made by machine learning models in their decision making process, but the usability of these predictions is limited if the human is unable to justify his or her trust in the prediction. We propose a novel approach to producing justifications that is geared towards users without machine learning expertise, focusing on domain knowledge and on human reasoning, and utilizing natural language generation. Through a task-based experiment, we show that our approach significantly helps humans to correctly decide whether or not predictions are accurate, and significantly increases their satisfaction with the justification.

• #1296
MAT: A Multimodal Attentive Translator for Image Captioning
Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille
Natural Language Generation

In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Unlike most existing work, where the whole image is represented by a convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects, which serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated to a sequence of words, as the target sequence of the RNN model. To represent the image in a sequential way, we extract the features of the objects in the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects that are relevant to generating the corresponding words in the sentences. Extensive experiments are conducted to validate the proposed approach on a popular benchmark dataset, i.e., MS COCO, and the proposed model surpasses the state-of-the-art methods in all metrics following the dataset splits of previous work. The proposed approach is also evaluated by the evaluation server of the MS COCO captioning challenge, and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).

• #2061
From Neural Sentence Summarization to Headline Generation: A Coarse-to-Fine Approach
Jiwei Tan, Xiaojun Wan, Jianguo Xiao
Natural Language Generation

Headline generation is a task of abstractive text summarization, and has previously suffered from the immaturity of natural language generation techniques. Recent success of neural sentence summarization models shows the capacity of generating informative, fluent headlines conditioned on selected recapitulative sentences. In this paper, we investigate the extension of sentence summarization models to the document headline generation task. The challenge is that extending the sentence summarization model to consider more document information will mostly confuse the model and hurt the performance. We propose a coarse-to-fine approach, which first identifies the important sentences of a document using document summarization techniques, and then exploits a multi-sentence summarization model with hierarchical attention to leverage the important sentences for headline generation. Experimental results on a large real dataset demonstrate that the proposed approach significantly improves the performance of neural sentence summarization models on the headline generation task.

• #2874
A Correlated Topic Model Using Word Embeddings
Guangxu Xun, Yaliang Li, Wayne Xin Zhao, Jing Gao, Aidong Zhang
Natural Language Generation

Conventional correlated topic models are able to capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been proven to be able to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be directly calculated in the word embedding space, for example, via cosine values. In this paper, we propose a novel correlated topic model using word embeddings. The proposed model enables us to exploit the additional word-level correlation information in word embeddings and directly model topic correlation in the continuous word embedding space. In the model, words in documents are replaced with meaningful word embeddings, topics are modeled as multivariate Gaussian distributions over the word embeddings and topic correlations are learned among the continuous Gaussian topics. A Gibbs sampling solution with data augmentation is given to perform inference. We evaluate our model on the 20 Newsgroups dataset and the Reuters-21578 dataset qualitatively and quantitatively. The experimental results show the effectiveness of our proposed model.
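The abstract above mentions that relatedness between words can be computed directly in the embedding space via cosine values. A minimal sketch of that computation follows; the three-dimensional vectors are made up for illustration and are not taken from the paper or from any trained embedding.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: real embeddings have far more dimensions).
embeddings = {
    "king": [0.8, 0.6, 0.0],
    "queen": [0.6, 0.8, 0.0],
    "apple": [0.0, 0.0, 1.0],
}

related = cosine_similarity(embeddings["king"], embeddings["queen"])    # about 0.96
unrelated = cosine_similarity(embeddings["king"], embeddings["apple"])  # 0.0
```

Values near 1 indicate closely aligned vectors, and values near 0 indicate unrelated directions.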

### Thursday 24 15:00 - 16:00 PL-TFP - Theoretical Foundations of Planning (217)

Chair: Sebastian Sardina
• #3395
Generalized Planning: Non-Deterministic Abstractions and Trajectory Constraints
Blai Bonet, Giuseppe De Giacomo, Hector Geffner, Sasha Rubin
Theoretical Foundations of Planning

We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies mu that solve the abstract problem P have been shown to solve all problems Q that reduce to P provided that mu terminates in Q. In this work, we shed light on why this termination condition is needed and how it can be removed. The key observation is that the abstract problem P captures the common structure among the concrete problems Q that is local (Markovian) but misses common structure that is global. We show how such global structure can be captured by means of trajectory constraints that in many cases can be expressed as LTL formulas, thus reducing generalized planning to LTL synthesis. Moreover, for a broad class of problems that involve integer variables that can be increased or decreased, trajectory constraints can be compiled away, reducing generalized planning to fully observable non-deterministic planning.

• #1365
Efficient, Safe, and Probably Approximately Complete Learning of Action Models
Roni Stern, Brendan Juba
Theoretical Foundations of Planning

In this paper we explore the theoretical boundaries of planning in a setting where no model of the agent's actions is given. Instead of an action model, a set of successfully executed plans is given and the task is to generate a plan that is safe, i.e., guaranteed to achieve the goal without failing. To this end, we show how to learn a conservative model of the world in which actions are guaranteed to be applicable. This conservative model is then given to an off-the-shelf classical planner, resulting in a plan that is guaranteed to achieve the goal. However, this reduction from model-free planning to model-based planning is not complete: in some cases a plan will not be found even when one exists. We analyze the relation between the number of observed plans and the likelihood that our conservative approach will indeed fail to solve a solvable problem. Our analysis shows that the number of trajectories needed scales gracefully.
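One possible reading of "conservative" here is that an action is only considered applicable in states that agree with every state in which it was observed to succeed. The sketch below illustrates that reading under simplifying assumptions that are not taken from the paper: states are sets of true facts, actions are ground names, and the function and field names are hypothetical.

```python
def learn_conservative_model(trajectories):
    """Learn a cautious action model from observed (pre, action, post) steps.

    Preconditions: facts that held in EVERY state where the action was executed.
    Effects: facts that were observed to appear ("add") or disappear ("delete").
    """
    model = {}
    for trajectory in trajectories:
        for pre, action, post in trajectory:
            pre, post = frozenset(pre), frozenset(post)
            if action not in model:
                model[action] = {"pre": set(pre), "add": set(), "delete": set()}
            else:
                # Intersect: keep only facts common to all observed pre-states.
                model[action]["pre"] &= pre
            model[action]["add"] |= post - pre
            model[action]["delete"] |= pre - post
    return model


# Two hypothetical observations of the same action in different contexts.
observed = [
    [({"hand_empty", "on_table", "light_on"}, "pick", {"holding", "light_on"})],
    [({"hand_empty", "on_table"}, "pick", {"holding"})],
]
model = learn_conservative_model(observed)
```

More observations can only shrink the learned precondition set, which is why additional trajectories make such a model less restrictive.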

• #2106
An Improved Approximation Algorithm for the Subpath Planning Problem and Its Generalization
Hanna Sumita, Yuma Yonebayashi, Naonori Kakimura, Ken-ichi Kawarabayashi
Theoretical Foundations of Planning

This paper focuses on a generalization of the traveling salesman problem (TSP), called the subpath planning problem (SPP). Given 2n vertices and n independent edges on a metric space, we aim to find a shortest tour that contains all the edges. SPP is one of the fundamental problems in both artificial intelligence and robotics. Our main result is to design a 1.5-approximation algorithm that runs in polynomial time, improving the currently best approximation algorithm. The idea is direct use of techniques developed for TSP. In addition, we propose a generalization of SPP called the subgroup planning problem (SGPP). In this problem, we are given a set of disjoint groups of vertices, and we aim to find a shortest tour such that all the vertices in each group are traversed sequentially. We propose a 3-approximation algorithm for SGPP. We also conduct numerical experiments. Compared with previous algorithms, our algorithms improve the solution quality by more than 10% for large instances with more than 10,000 vertices.

• #3532
Hierarchical Task Network Planning with Task Insertion and State Constraints
Zhanhao Xiao, Andreas Herzig, Laurent Perrussel, Hai Wan, Xiaoheng Su
Theoretical Foundations of Planning

We extend hierarchical task network planning with task insertion (TIHTN) by introducing state constraints, called TIHTNS. We show that just as for TIHTN planning, all solutions of the TIHTNS planning problem can be obtained by acyclic decomposition and task insertion, entailing that its plan-existence problem is decidable without any restriction on decomposition methods. We also prove that the extension by state constraints does not increase the complexity of the plan-existence problem, which stays 2-NEXPTIME-complete, based on an acyclic progression operator. In addition, we show that TIHTNS planning covers not only the original TIHTN planning but also hierarchy-relaxed hierarchical goal network planning.

### Thursday 24 15:00 - 16:00 PL-APLI - Applications of Planning (218)

Chair: Daniele Magazzeni
• #2500
Generalized Target Assignment and Path Finding Using Answer Set Programming
Van Nguyen, Philipp Obermeier, Tran Cao Son, Torsten Schaub, William Yeoh
Applications of Planning

In Multi-Agent Path Finding (MAPF), a team of agents needs to find collision-free paths from their starting locations to their respective targets. Combined Target Assignment and Path Finding (TAPF) extends MAPF by including the problem of assigning targets to agents as a precursor to the MAPF problem. A limitation of both models is their assumption that the number of agents and targets are equal, which is invalid in some applications such as autonomous warehouse systems. We address this limitation by generalizing TAPF to allow for (1) unequal numbers of agents and tasks; (2) tasks to have deadlines by which they must be completed; (3) ordering of groups of tasks to be completed; and (4) tasks that are composed of a sequence of checkpoints that must be visited in a specific order. Further, we model the problem using answer set programming (ASP) to show that customizing the desired variant of the problem is simple: one only needs to choose the appropriate combination of ASP rules to enforce it. We also demonstrate experimentally that if problem-specific information can be incorporated into the ASP encoding, then ASP-based methods can be efficient and can scale up to solve practical applications.

• #3850
Temporal Planning for Compilation of Quantum Approximate Optimization Circuits
Davide Venturelli, Minh Do, Eleanor Rieffel, Jeremy Frank
Applications of Planning

We investigate the application of temporal planners to the problem of compiling quantum circuits to emerging quantum hardware. While our approach is general, we focus our initial experiments on Quantum Approximate Optimization Algorithm (QAOA) circuits that have few ordering constraints and thus allow highly parallel plans. We report on experiments using several temporal planners to compile circuits of various sizes to a realistic hardware architecture. This early empirical evaluation suggests that temporal planning is a viable approach to quantum circuit compilation.

• #3958
Softpressure: A Schedule-Driven Backpressure Algorithm for Coping with Network Congestion
Hsu-Chieh Hu, Stephen F. Smith
Applications of Planning

We consider the problem of minimizing the delay of jobs moving through a directed graph of service nodes. In this problem, each node may have several links and is constrained to serve one link at a time. As jobs move through the network, they can pass through a node only after they have been serviced by that node. The objective is to minimize the delay jobs incur sitting on queues waiting to be serviced. Two popular approaches to this problem are the backpressure algorithm and schedule-driven control. In this paper, we present a hybrid approach of those two methods that incorporates the stability of queuing theory into the schedule-driven control. We then demonstrate how this hybrid method outperforms the other two in a real-time traffic signal control problem, where the nodes are traffic lights, the links are roads, and the jobs are vehicles. We show through simulations that, in scenarios with heavy congestion, the hybrid method results in 50% and 15% reductions in delay over schedule-driven control and backpressure respectively. A theoretical analysis also justifies our results.
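For readers unfamiliar with backpressure, the textbook form of the rule serves the link with the largest queue differential between its upstream and downstream ends. The sketch below shows only that basic rule, not the hybrid Softpressure method; the function name, queue names, and numbers are hypothetical.

```python
def choose_link(queues, links):
    """Pick the link a node should serve under the basic backpressure rule.

    queues: mapping from queue name to current length.
    links:  (upstream, downstream) pairs competing for the node.
    Returns the link with the largest positive pressure, or None if no link
    has positive pressure (ties go to the first such link).
    """
    best, best_pressure = None, 0
    for upstream, downstream in links:
        pressure = queues[upstream] - queues.get(downstream, 0)
        if pressure > best_pressure:
            best, best_pressure = (upstream, downstream), pressure
    return best


# Hypothetical intersection: two incoming roads, each feeding one outgoing road.
queues = {"north_in": 9, "east_in": 3, "south_out": 5, "west_out": 0}
links = [("north_in", "south_out"), ("east_in", "west_out")]
served = choose_link(queues, links)  # pressures are 4 and 3
```

Because the rule looks only at current queue lengths, it needs no forecast of arrivals, which is the contrast with schedule-driven control drawn in the abstract.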

• #3097
Generating Context-Free Grammars using Classical Planning
Javier Segovia-Aguas, Sergio Jiménez, Anders Jonsson
Applications of Planning

This paper presents a novel approach for generating Context-Free Grammars (CFGs) from small sets of input strings (a single input string in some cases). Our approach is to compile this task into a classical planning problem whose solutions are sequences of actions that build and validate a CFG compliant with the input strings. In addition, we show that our compilation is suitable for implementing the two canonical tasks for CFGs, string production and string recognition.

### Thursday 24 15:00 - 16:00 MAS-ATA - Agreement Technologies: Argumentation (219)

Chair: Matthias Thimm
• #893
Acceptability Semantics for Weighted Argumentation Frameworks
Leila Amgoud, Jonathan Ben-Naim, Dragan Doder, Srdjan Vesic
Agreement Technologies: Argumentation

The paper studies semantics that evaluate arguments in argumentation graphs, where each argument has a basic strength, and may be attacked by other arguments. It starts by defining a set of principles, each of which is a property that a semantics could satisfy. It provides the first formal analysis and comparison of existing semantics. Finally, it defines three novel semantics that satisfy more principles than existing ones.

• #1249
Measuring the Intensity of Attacks in Argumentation Graphs with Shapley Value
Leila Amgoud, Jonathan Ben-Naim, Srdjan Vesic
Agreement Technologies: Argumentation

In an argumentation setting, a semantics evaluates the overall acceptability of arguments. Consequently, it reveals the global loss incurred by each argument due to attacks. However, it does not say anything on the contribution of each attack to that loss. This paper introduces the novel concept of contribution measure which evaluates those contributions. It starts by defining a set of axioms that a reasonable measure would satisfy, then shows that the Shapley value is the unique measure that satisfies them. Finally, it investigates the properties of the latter under existing semantics.
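As background, the Shapley value assigns each player its average marginal contribution over all orderings of the players. The sketch below computes it by brute force for a tiny game in which the "players" are attacks on a single argument. The loss function and attack weights are invented for illustration and are not the semantics studied in the paper.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average marginal contribution of each player over all orderings.

    value: function mapping a frozenset of players to a number,
           with value(frozenset()) == 0.
    """
    players = list(players)
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


# Hypothetical attack strengths; the loss they cause together is capped at 1.
weights = {"a1": 0.6, "a2": 0.3, "a3": 0.3}


def loss(attacks):
    return min(1.0, sum(weights[a] for a in attacks))


phi = shapley_values(weights, loss)
```

The contributions sum to the total loss of 1.0, and the two equally weighted attacks receive equal shares. Enumerating all orderings is factorial in the number of players, so this is only practical for very small examples.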

• #2390
A Bayesian Approach to Argument-Based Reasoning for Attack Estimation
Hiroyuki Kido, Keishi Okamoto
Agreement Technologies: Argumentation

The web is a source of a large number of arguments and their acceptability statuses (e.g., votes for and against the arguments). However, relations existing between the aforementioned arguments are typically not available. This study investigates the utilisation of acceptability semantics to statistically estimate an attack relation between arguments wherein the acceptability statuses of arguments are provided. A Bayesian network model of argument-based reasoning is defined in which Dung's theory of abstract argumentation gives the substance of Bayesian inference. The model correctness is demonstrated by analysing properties of estimated attack relations and illustrating its applicability to online forums.

• #3000
Efficient Computation of Extensions for Dynamic Abstract Argumentation Frameworks: An Incremental Approach
Gianvincenzo Alfano, Sergio Greco, Francesco Parisi
Agreement Technologies: Argumentation

Abstract argumentation frameworks (AFs) are a well-known formalism for modelling and deciding many argumentation problems. Computational issues and evaluation algorithms have been deeply investigated for static AFs, whose structure does not change over time. However, AFs are often dynamic as a consequence of the fact that argumentation is inherently dynamic. In this paper, we tackle the problem of incrementally computing extensions for dynamic AFs: given an initial extension and an update (or a set of updates), we devise a technique for computing an extension of the updated AF under four well-known semantics (i.e., complete, preferred, stable, and grounded). The idea is to identify a reduced (updated) AF sufficient to compute an extension of the whole AF and use state-of-the-art algorithms to recompute an extension of the reduced AF only. The experiments reveal that, for all semantics considered and using different solvers, the incremental technique is on average two orders of magnitude faster than computing the semantics from scratch.
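To make the "from scratch" baseline concrete, the grounded extension of an AF can be computed as the least fixed point of the function that collects all arguments defended by the current set. The sketch below is this standard textbook construction, not the incremental technique of the paper; the function name and the example frameworks are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of an abstract argumentation framework.

    arguments: iterable of argument names.
    attacks:   iterable of (attacker, target) pairs.
    """
    arguments = list(arguments)
    attacks = set(attacks)
    attackers = {a: set() for a in arguments}
    for source, target in attacks:
        attackers[target].add(source)

    extension = set()
    while True:
        # An argument is defended if each of its attackers is itself
        # attacked by some argument already in the extension.
        defended = {
            a
            for a in arguments
            if all(any((d, b) in attacks for d in extension) for b in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended


# Chain a -> b -> c: a is unattacked and defends c against b.
chain = grounded_extension(["a", "b", "c"], [("a", "b"), ("b", "c")])
# Mutual attack a <-> b: nothing is unattacked, so the result is empty.
cycle = grounded_extension(["a", "b"], [("a", "b"), ("b", "a")])
```

Every call repeats the whole iteration, which is the cost an incremental method aims to avoid when only a small part of the framework changes.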

### Thursday 24 15:00 - 16:00 NLP-QA - Question Answering (220)

Chair: Rafal Rzepka
• #3161
Automatic Generation of Grounded Visual Questions
Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang
Question Answering

In this paper, we propose the first model able to generate visually grounded questions with diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on visual input. To the best of our knowledge, there is a lack of automatic methods to generate meaningful questions with various types for the same visual input. To circumvent the problem, we propose a model that automatically generates visually grounded questions with varying types. Our model takes as input both images and the captions generated by a dense caption model, samples the most probable question types, and generates the questions in sequence. The experimental results on two real-world datasets show that our model outperforms the strongest baseline in terms of both correctness and diversity by a wide margin.

• #3544
Symbolic Priors for RNN-based Semantic Parsing
Chunyang Xiao, Marc Dymetman, Claire Gardent
Question Answering

Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior knowledge: the well-formedness of the logical forms is modeled by a weighted context-free grammar; the likelihood that certain entities present in the input utterance are also present in the logical form is modeled by weighted finite-state automata. The grammar and automata are combined through an efficient intersection algorithm to form a soft guide (“background”) to the RNN. We test our method on an extension of the Overnight dataset and show that it not only strongly improves over an RNN baseline, but also outperforms non-RNN models based on rich sets of hand-crafted features.

• #3560
Solving Probability Problems in Natural Language
Anton Dries, Angelika Kimmig, Jesse Davis, Vaishak Belle, Luc de Raedt
Question Answering

The ability to solve probability word problems, such as those found in introductory discrete mathematics textbooks, is an important cognitive and intellectual skill. In this paper, we develop a two-step end-to-end fully automated approach for solving such questions that is able to automatically provide answers to exercises about probability formulated in natural language. In the first step, a question formulated in natural language is analysed and transformed into a high-level model specified in a declarative language. In the second step, a solution to the high-level model is computed using a probabilistic programming system. On a dataset of 2160 probability problems, our solver is able to correctly answer 97.5% of the questions given a correct model. On the end-to-end evaluation, we are able to answer 12.5% of the questions (or 31.1% if we exclude examples not supported by design).

• #3127
Finding Prototypes of Answers for Improving Answer Sentence Selection
Wai Lok Tam, Namgi Han, Juan Ignacio Navarro-Horñiacek, Yusuke Miyao
Question Answering

Answer sentence selection has been widely adopted recently for benchmarking techniques in Question Answering. Previous proposals for the task are essentially general solutions taking the form of neural networks that measure semantic similarity. In contrast, the present paper describes a simple technique to take advantage of such general-purpose tools for dealing with questions and answer sentences without changing the base system. The technique involves replacing wh-words in input questions with a word denoting the prototype of all answers. These transformed questions are passed as input to an existing neural network built for measuring semantic similarity. This technique is evaluated on two different neural network architectures over two datasets: TrecQA and WikiQA. Results of our experiments show improvement in overall accuracy across most question types we are interested in: 'who', 'when' and 'where'-type questions.
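The preprocessing step described above, replacing wh-words with a word standing for the prototype of all answers, can be sketched as a simple text substitution. The abstract does not say which prototype words are used, so the mapping below ("person", "time", "place") and the function name are assumptions made purely for illustration.

```python
import re

# Hypothetical prototype words; the paper's actual choices are not given here.
PROTOTYPES = {"who": "person", "when": "time", "where": "place"}

# Match the wh-words as whole words only, ignoring case.
WH_PATTERN = re.compile(r"\b(" + "|".join(PROTOTYPES) + r")\b", re.IGNORECASE)


def replace_wh_words(question):
    """Replace each wh-word with its (assumed) answer prototype."""
    return WH_PATTERN.sub(lambda m: PROTOTYPES[m.group(0).lower()], question)


transformed = replace_wh_words("Where was Mozart born?")
# transformed == "place was Mozart born?"
```

The transformed question would then be passed unchanged to whatever similarity model is already in place, which is why the base system needs no modification.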

### Thursday 24 15:00 - 16:00 Competition (206)

Chair: Jochen Renz
• Angry Birds
Competition
### Thursday 24 16:30 - 18:00 AUT-ETH - AI & Autonomy: Ethics and Responsibility (Plenary 2)

Chair: Michael Rovatsos
• #4197
Responsible Autonomy
Virginia Dignum
AI & Autonomy: Ethics and Responsibility

As intelligent systems are increasingly making decisions that directly affect society, perhaps the most important upcoming research direction in AI is to rethink the ethical implications of their actions. Means are needed to integrate moral, societal and legal values with technological developments in AI, both during the design process and as part of the deliberation algorithms employed by these systems. In this paper, we describe leading ethics theories and propose alternative ways to ensure ethical behavior by artificial systems. Given that ethics are dependent on the socio-cultural context and are often only implicit in deliberation processes, methodologies are needed to elicit the values held by designers and stakeholders, and to make these explicit, leading to better understanding of and trust in artificial autonomous systems.

• #4204
Should Robots be Obedient?
Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell
AI & Autonomy: Ethics and Responsibility

Intuitively, obedience -- following the order that a human gives -- seems like a good property for a robot to have. But, we humans are not perfect and we may give orders that are not best aligned to our preferences. We show that when a human is not perfectly rational then a robot that tries to infer and act according to the human's underlying preferences can always perform better than a robot that simply follows the human's literal order. Thus, there is a tradeoff between the obedience of a robot and the value it can attain for its owner. We investigate how this tradeoff is impacted by the way the robot infers the human's preferences, showing that some methods err more on the side of obedience than others. We then analyze how performance degrades when the robot has a misspecified model of the features that the human cares about or the level of rationality of the human. Finally, we study how robots can start detecting such model misspecification. Overall, our work suggests that there might be a middle ground in which robots intelligently decide when to obey human orders, but err on the side of obedience.

• #3841
On Automating the Doctrine of Double Effect
Naveen Sundar Govindarajulu, Selmer Bringsjord
AI & Autonomy: Ethics and Responsibility

The doctrine of double effect (DDE) is a long-studied ethical principle that governs when actions that have both positive and negative effects are to be allowed. The goal in this paper is to automate DDE. We briefly present DDE, and use a first-order modal logic, the deontic cognitive event calculus, as our framework to formalize the doctrine. We present formalizations of increasingly strong versions of the principle, including what is known as the doctrine of triple effect. We then use our framework to successfully simulate scenarios that have been used to test the presence of the principle in human subjects. Our framework can be used in two different modes. One can use it to build DDE-compliant autonomous systems from scratch, or one can use it to verify that a given AI system is DDE-compliant, by applying a DDE layer on an existing system or model. For the latter mode, the underlying AI system can be built using any architecture (planners, deep neural networks, Bayesian networks, knowledge-representation systems, or a hybrid); as long as the system exposes a few parameters in its model, such verification is possible. The role of the DDE layer here is akin to a (dynamic or static) software verifier that examines existing software modules. Finally, we end by sketching initial work on how one can apply our DDE layer to the STRIPS-style planning model, and to a modified POMDP model. This is preliminary work to illustrate the feasibility of the second mode, and we hope that our initial sketches can be useful for other researchers in incorporating DDE in their own frameworks.

• #1923
When Will Negotiation Agents Be Able to Represent Us? The Challenges and Opportunities for Autonomous Negotiators
Tim Baarslag, Michael Kaisers, Enrico H. Gerding, Catholijn M. Jonker, Jonathan Gratch
AI & Autonomy: Ethics and Responsibility

Computers that negotiate on our behalf hold great promise for the future and will even become indispensable in emerging application domains such as the smart grid and the Internet of Things. Much research has thus been expended to create agents that are able to negotiate in an abundance of circumstances. However, up until now, truly autonomous negotiators have rarely been deployed in real-world applications. This paper sizes up current negotiating agents and explores a number of technological, societal and ethical challenges that autonomous negotiation systems have brought about. The questions we address are: in what sense are these systems autonomous, what has been holding back their further proliferation, and is their spread something we should encourage? We relate the automated negotiation research agenda to dimensions of autonomy and distill three major themes that we believe will propel autonomous negotiation forward: accurate representation, long-term perspective, and user trust. We argue these orthogonal research directions need to be aligned and advanced in unison to sustain tangible progress in the field.

### Thursday 24 16:30 - 18:00 ML-NN2 - Neural Networks 2 (210)

Chair: Ziyu Guan
• #1958
Stacked Similarity-Aware Autoencoders
Wenqing Chu, Deng Cai
Neural Networks 2

As one of the most popular unsupervised learning approaches, the autoencoder aims at transforming the inputs to the outputs with the least discrepancy. The conventional autoencoder and most of its variants only consider the one-to-one reconstruction, which ignores the intrinsic structure of the data and may lead to overfitting. In order to preserve the latent geometric information in the data, we propose the stacked similarity-aware autoencoders. To train each single autoencoder, we first obtain the pseudo class label of each sample by clustering the input features. Then the hidden codes of those samples sharing the same category label will be required to satisfy an additional similarity constraint. Specifically, the similarity constraint is implemented based on an extension of the recently proposed center loss. With this joint supervision of the autoencoder reconstruction error and the center loss, the learned feature representations not only can reconstruct the original data, but also preserve the geometric structure of the data. Furthermore, a stacked framework is introduced to boost the representation capacity. The experimental results on several benchmark datasets show the remarkable performance improvement of the proposed algorithm compared with other autoencoder based approaches.
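The similarity constraint above builds on the center loss, which in its basic form penalizes the squared distance between each sample's code and the center of its class. The sketch below shows that basic form only, not the extension used in the paper. As a simplification, the centers are taken to be per-class means rather than parameters updated during training, and all names and numbers are hypothetical.

```python
from collections import defaultdict


def class_centers(codes, labels):
    """Mean code vector for each (pseudo) class label."""
    grouped = defaultdict(list)
    for code, label in zip(codes, labels):
        grouped[label].append(code)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in grouped.items()
    }


def center_loss(codes, labels, centers):
    """Half the summed squared distance from each code to its class center."""
    total = 0.0
    for code, label in zip(codes, labels):
        total += sum((x - c) ** 2 for x, c in zip(code, centers[label]))
    return 0.5 * total


# Hypothetical 2-D hidden codes with pseudo labels, e.g. from clustering.
codes = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
labels = [0, 0, 1]
centers = class_centers(codes, labels)      # {0: [1.0, 0.0], 1: [5.0, 5.0]}
loss = center_loss(codes, labels, centers)  # 0.5 * (1 + 1 + 0) = 1.0
```

In a joint objective of the kind the abstract describes, a term like this would be added to the reconstruction error with some weighting factor, pulling codes that share a pseudo label toward each other.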

• #2267
Mention Recommendation for Twitter with End-to-end Memory Network
Haoran Huang, Qi Zhang, Xuanjing Huang
Neural Networks 2

In this study, we investigated the problem of recommending usernames when people attempt to use the "@" sign to mention other people in Twitter-like social media. With the extremely rapid development of social networking services, this problem has received considerable attention in recent years. Previous methods have studied the problem from different aspects. Because most Twitter-like microblogging services limit the length of posts, statistical learning methods may be affected by the problems of word sparseness and synonyms. Although recent progress in neural word embedding methods has advanced the state-of-the-art in many natural language processing tasks, the benefits of word embedding have not been taken into consideration for this problem. In this work, we proposed a novel end-to-end memory network architecture to perform this task. We incorporated the interests of users with external memory. A hierarchical attention mechanism was also applied to better consider the interests of users. The experimental results on a dataset we collected from Twitter demonstrated that the proposed method could outperform state-of-the-art approaches.

• #2319
Hashtag Recommendation for Multimodal Microblog Using Co-Attention Network
Qi Zhang, Jiawen Wang, Haoran Huang, Xuanjing Huang, Yeyun Gong
Neural Networks 2

In microblogging services, authors can use hashtags to mark keywords or topics. Many live social media applications (e.g., microblog retrieval, classification) can gain great benefits from these manually labeled tags. However, only a small portion of microblogs contain hashtags entered by users. Moreover, many microblog posts contain not only textual content but also images. These visual resources also provide valuable information that may not be included in the textual content, and can therefore help to recommend hashtags more accurately. Motivated by the successful use of the attention mechanism, we propose a co-attention network incorporating textual and visual information to recommend hashtags for multimodal tweets. Experimental results on the data collected from Twitter demonstrated that the proposed method can achieve better performance than state-of-the-art methods using textual information only.

• #2986
Encoding and Recall of Spatio-Temporal Episodic Memory in Real Time
Poo-Hee Chang, Ah-Hwee Tan
Neural Networks 2

Episodic memory enables a cognitive system to improve its performance by reflecting upon past events. In this paper, we propose a computational model called STEM for encoding and recall of episodic events together with the associated contextual information in real time. Based on a class of self-organizing neural networks, STEM is designed to learn memory chunks or cognitive nodes, each encoding a set of co-occurring multi-modal activity patterns across multiple pattern channels. We present algorithms for recall of events based on partial and inexact input patterns. Our empirical results based on a public domain data set show that STEM displays a high level of efficiency and robustness in encoding and retrieval with both partial and noisy search cues when compared with a state-of-the-art associative memory model.

• #3679
Deep Context: A Neural Language Model for Large-scale Networked Documents
Hao Wu, Kristina Lerman
Neural Networks 2

We propose a scalable neural language model that leverages the links between documents to learn the deep context of documents. Our model, Deep Context Vector, takes advantage of distributed representations to exploit the word order in document sentences, as well as the semantic connections among linked documents in a document network. We evaluate our model on large-scale data collections that include Wikipedia pages, and scientific and legal citations networks. We demonstrate its effectiveness and efficiency on document classification and link prediction tasks.

• #3211
Earth Mover's Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading
Sachin Kumar, Soumen Chakrabarti, Shourya Roy
Neural Networks 2

Automatic short answer grading (ASAG) can reduce tedium for instructors, but is complicated by free-form student inputs. An important ASAG task is to assign ordinal scores to student answers, given some “model” or ideal answers. Here we introduce a novel framework for ASAG by cascading three neural building blocks: Siamese bidirectional LSTMs applied to a model and a student answer, a novel pooling layer based on earth-mover distance (EMD) across all hidden states from both LSTMs, and a flexible final regression layer to output scores. On standard ASAG data sets, our system shows substantial reduction in grade estimation error compared to competitive baselines. We demonstrate that EMD pooling results in substantial accuracy gains, and that a support vector ordinal regression (SVOR) output layer helps outperform softmax. Our system also outperforms recent attention mechanisms on LSTM states.

### Thursday 24 16:30 - 18:00 ML-DMUL2 - Data Mining and Unsupervised Learning 2 (211)

Chair: Hady W. Lauw
• #1221
Reconstruction-based Unsupervised Feature Selection: An Embedded Approach
Jundong Li, Jiliang Tang, Huan Liu
Data Mining and Unsupervised Learning 2

Feature selection has been proven to be effective and efficient in preparing high-dimensional data for data mining and machine learning problems. Since real-world data is usually unlabeled, unsupervised feature selection has received increasing attention in recent years. Without label information, unsupervised feature selection needs alternative criteria to define feature relevance. Recently, data reconstruction error emerged as a new criterion for unsupervised feature selection, which defines feature relevance as the capability of features to approximate original data via a reconstruction function. Most existing algorithms in this family assume predefined, linear reconstruction functions. However, the reconstruction function should be data dependent and may not always be linear especially when the original data is high-dimensional. In this paper, we investigate how to learn the reconstruction function from the data automatically for unsupervised feature selection, and propose a novel reconstruction-based unsupervised feature selection framework REFS, which embeds the reconstruction function learning process into feature selection. Experiments on various types of real-world datasets demonstrate the effectiveness of the proposed framework REFS.

• #2162
Multiple Medoids based Multi-view Relational Fuzzy Clustering with Minimax Optimization
Yangtao Wang, Lihui Chen, Xiao-Li Li
Data Mining and Unsupervised Learning 2

Multi-view data has become prevalent because more and more data can be collected from various sources. Each data set may be described by a different set of features and hence forms a multi-view data set, or multi-view data for short. To find the underlying pattern embedded in unlabelled multi-view data, many multi-view clustering approaches have been proposed. Fuzzy clustering, in which a data object can belong to several clusters with different memberships, is widely used in many applications. However, in most fuzzy clustering approaches, a single center or medoid is considered as the representative of each cluster at the end of the clustering process. This may not be sufficient to ensure accurate data analysis. In this paper, a new multi-view fuzzy clustering approach for relational data, based on multiple medoids and minimax optimization and called M4-FC, is proposed. In M4-FC, every object is considered as a medoid candidate with a weight. The higher the weight, the more likely the object is to be chosen as a final medoid. At the end of the clustering process, there may be more than one medoid in each cluster. Moreover, minimax optimization is applied to find consensus clustering results across different views, each with its own set of features. Extensive experimental studies on several multi-view data sets, including real-world image and document data sets, demonstrate that M4-FC not only outperforms the single-medoid-based multi-view fuzzy clustering approach, but also performs better than existing multi-view relational clustering approaches.

• #2326
Flexible Orthogonal Neighborhood Preserving Embedding
Tianji Pang, Feiping Nie, Junwei Han
Data Mining and Unsupervised Learning 2

In this paper, we propose a novel linear subspace learning algorithm called Flexible Orthogonal Neighborhood Preserving Embedding (FONPE), which is a linear approximation of the Locally Linear Embedding (LLE) algorithm. Our novel objective function integrates two terms related to manifold smoothness and a flexible penalty defined on the projection fitness. Unlike Neighborhood Preserving Embedding (NPE), we relax the hard constraint by modeling the mismatch between the approximate linear embedding and the original nonlinear embedding instead of enforcing them to be equal, which helps the method cope better with data sampled from a nonlinear manifold. Besides, instead of enforcing orthogonality between the projected points, we enforce the mapping to be orthogonal. In this way, FONPE tends to preserve distances, and thus the overall geometry can be preserved. Unlike LLE, FONPE has an explicit linear mapping between the input and the reduced spaces, so it can handle novel testing data straightforwardly. Moreover, when the projection matrix in our model becomes an identity matrix, our model can be transformed to denoising LLE (DLLE). We demonstrate that DLLE handles noisy data better than the standard LLE. Comprehensive experiments on several benchmark databases demonstrate the effectiveness of our algorithm.

• #2577
User Profile Preserving Social Network Embedding
Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang
Data Mining and Unsupervised Learning 2

This paper addresses social network embedding, which aims to embed social network nodes, including user profile information, into a latent low-dimensional space. Most of the existing works on network embedding only consider network structure, but ignore user-generated content that could be potentially helpful in learning a better joint network representation. Different from rich node content in citation networks, user profile information in social networks is useful but noisy, sparse, and incomplete. To properly utilize this information, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPP-SNE), which incorporates user profile with network structure to jointly learn a vector representation of a social network. The theme of UPP-SNE is to embed user profile information via a nonlinear mapping into a consistent subspace, where network structure is seamlessly encoded to jointly learn informative node representations. Extensive experiments on four real-world social networks show that compared to state-of-the-art baselines, our method learns better social network representations and achieves substantial performance gains in node classification and clustering tasks.

• #2614
Multi-Component Nonnegative Matrix Factorization
Jing Wang, Feng Tian, Xiao Wang, Hongchuan Yu, Chang Hong Liu, Liang Yang
Data Mining and Unsupervised Learning 2

Real data are usually complex and contain various components. For example, face images have expressions and genders. Each component mainly reflects one aspect of data and provides information others do not have. Therefore, exploring the semantic information of multiple components, as well as the diversity among them, is of great benefit for understanding data comprehensively and in depth. However, this cannot be achieved by current nonnegative matrix factorization (NMF)-based methods, even though NMF has shown remarkable competitiveness in learning parts-based representations of data. To overcome this limitation, we propose a novel multi-component nonnegative matrix factorization (MCNMF). Instead of seeking only one representation of data, MCNMF learns multiple representations simultaneously, with the help of the Hilbert Schmidt Independence Criterion (HSIC) as a diversity term. HSIC explores the diverse information among the representations, where each representation corresponds to a component. By integrating the multiple representations, a more comprehensive representation is then established. A new iterative updating optimization scheme is derived to solve the objective function of MCNMF, along with its correctness and convergence guarantees. Extensive experimental results on real-world datasets show that MCNMF not only achieves more accurate performance than state-of-the-art methods using the aggregated representation, but also interprets data from different aspects with the multiple representations, which is beyond what current NMFs can offer.

• #3342
Self-weighted Multiview Clustering with Multiple Graphs
Feiping Nie, Jing Li, Xuelong Li
Data Mining and Unsupervised Learning 2

In multiview learning, it is essential to assign a reasonable weight to each view according to its importance. Thus, for the multiview clustering task, a wise and elegant method should cluster multiview data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be approximated as the centroid of the graphs built for each view with different confidences. We start our work with the natural thought that the weights can be learned by introducing a hyperparameter. By analyzing the weakness of this approach, we further propose a new multiview clustering method which is totally self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign the cluster label to each data point and do not need any postprocessing such as $K$-means in standard spectral clustering. Evaluations on two synthetic datasets demonstrate the effectiveness of our methods. Compared with several representative graph-based multiview clustering approaches on four real-world datasets, experimental results demonstrate that the proposed methods achieve better performance and that our new clustering method is more practical to use.

### Thursday 24, 16:30 - 18:00, ML-TAML3 - Transfer, Adaptation, Multi-Task Learning 3 (212)

Chair: Jingrui He
• #1965
Modal Consistency based Pre-Trained Multi-Model Reuse
Yang Yang, De-Chuan Zhan, Xiang-Yu Guo, Yuan Jiang
Transfer, Adaptation, Multi-Task Learning 3

Multi-Model Reuse is one of the prominent problems in the Learnware framework, and its main issue lies in acquiring the final prediction from the responses of multiple pre-trained models. Unlike in multi-classifier ensembles, only pre-trained models, rather than the whole training sets, are provided in the Multi-Model Reuse configuration. This configuration is closer to real applications, where the reliability of each model cannot be evaluated properly. In this paper, to address the lack of reliability evaluation, the potential consistency spread across different modalities is utilized. With the consistency of pre-trained models on different modalities, we propose a Pre-trained Multi-Model Reuse approach, PM2R, for multi-modal data, which realizes the reusability of multiple models. PM2R can combine multiple pre-trained models efficiently without re-training, and consequently no further training data storage is required. We describe the more realistic Multi-Model Reuse setting comprehensively in our paper, and point out the differences among this setting, classifier ensembles and late fusion in multi-modal learning. Experiments on synthetic and real-world datasets validate the effectiveness of PM2R compared with state-of-the-art ensemble and multi-modal learning methods under this more realistic setting.

• #2712
Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network
Jufeng Yang, Dongyu She, Ming Sun
Transfer, Adaptation, Multi-Task Learning 3

Visual sentiment analysis is attracting more and more attention with the increasing tendency to express emotions through visual content. Recent algorithms based on convolutional neural networks (CNNs) have considerably advanced emotion classification, which aims to distinguish differences among emotional categories and assigns a single dominant label to each image. However, the task is inherently ambiguous since an image usually evokes multiple emotions and its annotation varies from person to person. In this work, we address the problem via label distribution learning (LDL) and develop a multi-task deep framework by jointly optimizing both classification and distribution prediction. While the proposed method prefers distribution datasets annotated by different voters, the majority voting scheme is widely adopted as the ground truth in this area, and few datasets provide multiple affective labels. Hence, we further exploit two weak forms of prior knowledge, which are expressed as similarity information between labels, to generate an emotional distribution for each category. The experiments conducted on the distribution datasets, i.e., Emotion6, Flickr_LDL and Twitter_LDL, and on the largest single-emotion dataset, i.e., Flickr and Instagram, demonstrate that the proposed method outperforms the state-of-the-art approaches.

• #3741
Tensor Based Knowledge Transfer Across Skill Categories for Robot Control
Chenyang Zhao, Timothy M. Hospedales, Freek Stulp, Olivier Sigaud
Transfer, Adaptation, Multi-Task Learning 3

Advances in hardware and learning for control are enabling robots to perform increasingly dextrous and dynamic control tasks. These skills typically require a prohibitive amount of exploration for reinforcement learning, and so are commonly achieved by imitation learning from manual demonstration. The costly, non-scalable nature of manual demonstration has motivated work into skill generalisation, e.g., through contextual policies and options. Despite good results, existing work along these lines is limited to generalising across variants of one skill such as throwing an object to different locations. In this paper we go significantly further and investigate generalisation across qualitatively different classes of control skills. In particular, we introduce a class of neural network controllers that can realise four distinct skill classes: reaching, object throwing, casting, and ball-in-cup. By factorising the weights of the neural network, we are able to extract transferable latent skills that enable dramatic acceleration of learning in cross-task transfer. With a suitable curriculum, this allows us to learn challenging dextrous control tasks like ball-in-cup from scratch with pure reinforcement learning.

• #3822
Learning with Previously Unseen Features
Yuan Shi, Craig A. Knoblock
Transfer, Adaptation, Multi-Task Learning 3

We study the problem of improving a machine learning model by identifying and using features that are not in the training set. This is applicable to machine learning systems deployed in an open environment. For example, a prediction model built on a set of sensors may be improved when it has access to new and relevant sensors at test time. To effectively use new features, we propose a novel approach that learns a model over both the original and new features, with the goal of making the joint distribution of features and predicted labels similar to that in the training set. Our approach can naturally leverage labels associated with these new features when they are accessible. We present an efficient optimization algorithm for learning the model parameters and empirically evaluate the approach on several regression and classification tasks. Experimental results show that our approach can achieve on average 11.2% improvement over baselines.

• #3200
Exploiting High-Order Information in Heterogeneous Multi-Task Feature Learning
Yong Luo, Dacheng Tao, Yonggang Wen
Transfer, Adaptation, Multi-Task Learning 3

Multi-task feature learning (MTFL) aims to improve the generalization performance of multiple related learning tasks by sharing features between them. It has been successfully applied to many pattern recognition and biometric prediction problems. Most current MTFL methods assume that different tasks exploit the same feature representation, and thus are not applicable to scenarios where data are drawn from heterogeneous domains. Existing heterogeneous transfer learning (including multi-task learning) approaches usually handle multiple heterogeneous domains by learning feature transformations across different domains, but they ignore the high-order statistics (correlation information) which can only be discovered by simultaneously exploring all domains. We therefore develop a tensor based heterogeneous MTFL (THMTFL) framework to exploit such high-order information. Specifically, feature transformations of all domains are learned together, and finally used to derive new representations. A connection between all domains is built by using the transformations to project the pre-learned predictive structures of different domains into a common subspace, and minimizing their divergence in the subspace. By exploring the high-order information, the proposed THMTFL can obtain more reliable feature transformations compared with existing heterogeneous transfer learning approaches. Extensive experiments on both text categorization and social image annotation demonstrate the superiority of the proposed method.

• #3404
Adaptive Group Sparse Multi-task Learning via Trace Lasso
Sulin Liu, Sinno Jialin Pan
Transfer, Adaptation, Multi-Task Learning 3

In multi-task learning (MTL), tasks are learned jointly so that information among related tasks is shared and utilized to help improve generalization for each individual task. A major challenge in MTL is how to selectively choose what to share among tasks. Ideally, only related tasks should share information with each other. In this paper, we propose a new MTL method that can adaptively group correlated tasks into clusters and share information among the correlated tasks only. Our method is based on the assumption that each task parameter is a linear combination of other tasks' parameters, and that the coefficients of the linear combination are active only if there is relatedness between the two tasks. By introducing a trace Lasso penalty on these coefficients, our method is able to adaptively select the subset of coefficients corresponding to the tasks that are correlated with the given task. Our model removes the need to determine the task clustering structure separately, as is done in the literature. An efficient optimization method based on the alternating direction method of multipliers (ADMM) is developed to solve the problem. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our method in terms of clustering related tasks and generalization performance.

### Thursday 24, 16:30 - 18:00, ML-EM - Ensemble Methods (213)

Chair: Min-Ling Zhang
• #1300
Positive unlabeled learning via wrapper-based adaptive sampling
Pengyi Yang, Wei Liu, Jean Yang
Ensemble Methods

Learning from positive and unlabeled data frequently occurs in applications where only a subset of positive instances is available while the rest of the data are unlabeled. In such scenarios, often the goal is to create a discriminant model that can accurately classify both positive and negative data by modelling from labeled and unlabeled instances. In this study, we propose an adaptive sampling (AdaSampling) approach that utilises prediction probabilities from a model to iteratively update the training data. Starting with equal prior probabilities for all unlabeled data, our method "wraps" around a predictive model to iteratively update these probabilities to distinguish positive and negative instances in unlabeled data. Subsequently, one or more robust negative set(s) can be drawn from unlabeled data, according to the likelihood of each instance being negative, to train a single classification model or ensemble of models.

• #1418
Integrating Specialized Classifiers Based on Continuous Time Markov Chain
Zhizhong Li, Dahua Lin
Ensemble Methods

Specialized classifiers, namely those dedicated to a subset of classes, are often adopted in real-world recognition systems. However, integrating such classifiers is nontrivial. Existing methods, e.g. weighted average, usually implicitly assume that all constituents of an ensemble cover the same set of classes. Such methods can produce misleading predictions when used to combine specialized classifiers. This work explores a novel approach. Instead of combining predictions from individual classifiers directly, it first decomposes the predictions into sets of pairwise preferences, treating them as transition channels between classes, then constructs a continuous-time Markov chain from them, and uses the equilibrium distribution of this chain as the final prediction. This allows us to form a coherent picture over all specialized predictions. On large public datasets, the proposed method obtains considerable improvement compared to mainstream ensemble methods, especially when the classifier coverage is highly unbalanced.
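
The pipeline in this abstract (pairwise preferences, a continuous-time Markov chain, its equilibrium distribution) can be illustrated with a minimal sketch. The rate construction in `pairwise_rates` is an assumption made here for illustration and is not taken from the paper; both function names are hypothetical.

```python
def pairwise_rates(predictions, num_classes):
    """Accumulate class-to-class transition rates from specialized predictions.

    predictions: list of dicts mapping class index -> probability; each dict
    covers only the classes that one classifier knows about.
    Assumption (illustrative, not the paper's rule): for every pair (i, j)
    covered by a classifier, its probability for j adds to the rate i -> j.
    """
    rates = [[0.0] * num_classes for _ in range(num_classes)]
    for probs in predictions:
        for i in probs:
            for j, p_j in probs.items():
                if i != j:
                    rates[i][j] += p_j
    return rates


def equilibrium_distribution(rates, iters=1000):
    """Stationary distribution of the CTMC with the given off-diagonal rates."""
    n = len(rates)
    outflow = [sum(rates[i][j] for j in range(n) if j != i) for i in range(n)]
    # Uniformization: P = I + Q / lam is a stochastic matrix with the same
    # stationary distribution as the generator Q (lam exceeds every outflow).
    lam = max(outflow) * 1.1 or 1.0
    P = [[rates[i][j] / lam if i != j else 1.0 - outflow[i] / lam
          for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration on the uniformized chain
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Classifier A only knows classes 0 and 1; classifier B only knows 1 and 2.
preds = [{0: 0.8, 1: 0.2}, {1: 0.9, 2: 0.1}]
equilibrium = equilibrium_distribution(pairwise_rates(preds, 3))
```

In this toy example neither classifier compares classes 0 and 2 directly, yet the chain links them through class 1, and the equilibrium places most mass on class 0.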

• #1865
Unsupervised Learning of Deep Feature Representation for Clustering Egocentric Actions
Bharat Lal Bhatnagar, Suriya Singh, Chetan Arora, C.V. Jawahar
Ensemble Methods

The popularity of wearable cameras in life logging, law enforcement, assistive vision and other similar applications is leading to an explosion in the generation of egocentric video content. First person action recognition is an important aspect of automatic analysis of such videos. Annotating such videos is hard, not only because of obvious scalability constraints, but also because of privacy issues often associated with egocentric videos. This motivates the use of unsupervised methods for egocentric video analysis. In this work, we propose a robust and generic unsupervised approach for first person action clustering. Unlike contemporary approaches, our technique is neither limited to any particular class of actions nor does it require priors such as pre-training, fine-tuning, etc. We learn time sequenced visual and flow features from an array of weak feature extractors based on convolutional and LSTM autoencoder networks. We demonstrate that clustering of such features leads to the discovery of semantically meaningful actions present in the video. We validate our approach on four disparate public egocentric actions datasets amounting to approximately 50 hours of videos. We show that our approach surpasses the supervised state-of-the-art accuracies without using the action labels.

• #1903
Bayesian Aggregation of Categorical Distributions with Applications in Crowdsourcing
Alexandry Augustin, Matteo Venanzi, Alex Rogers, Nicholas R. Jennings
Ensemble Methods

A key problem in crowdsourcing is the aggregation of judgments of proportions. For example, workers might be presented with a news article or an image, and be asked to identify the proportion of each topic, sentiment, object, or colour present in it. These varying judgments then need to be aggregated to form a consensus view of the document’s or image’s contents. Often, however, these judgments are skewed by workers who provide judgments randomly. Such spammers increase the cost of acquiring judgments and degrade the accuracy of the aggregation. For such cases, we provide a new Bayesian framework for aggregating these responses (expressed in the form of categorical distributions) that for the first time accounts for spammers. We elicit 796 judgments about proportions of objects and colours in images. Experimental results show that, when 60% of the workers are spammers, our approach achieves aggregation accuracy comparable to that of other state-of-the-art approaches when there are no spammers.

• #2015
Deep Forest: Towards An Alternative to Deep Neural Networks
Zhi-Hua Zhou, Ji Feng
Ensemble Methods

In this paper, we propose gcForest, a decision tree ensemble approach with performance highly competitive to deep neural networks in a broad range of tasks. In contrast to deep neural networks, which require great effort in hyper-parameter tuning, gcForest is much easier to train; even when it is applied to different data across different domains in our experiments, excellent performance can be achieved with almost the same hyper-parameter settings. The training process of gcForest is efficient, and users can control training cost according to the computational resources available. The efficiency may be further enhanced because gcForest is naturally suited to parallel implementation. Furthermore, in contrast to deep neural networks, which require large-scale training data, gcForest can work well even when there are only small-scale training data.

• #2553
Stacking With Auxiliary Features
Nazneen Fatema Rajani, Raymond J. Mooney
Ensemble Methods

Ensembling methods are well known for improving prediction accuracy. However, they are limited in the sense that they cannot effectively discriminate among component models. In this paper, we propose stacking with auxiliary features that learns to fuse additional relevant information from multiple component systems as well as input instances to improve performance. We use two types of auxiliary features -- instance features and provenance features. The instance features enable the stacker to discriminate across input instances and the provenance features enable the stacker to discriminate across component systems. When these features are combined, our algorithm learns to rely on systems that agree not just on an output but also on the provenance of this output, in conjunction with the properties of the input instance. We demonstrate the success of our approach on three very different and challenging natural language and vision problems: Slot Filling, Entity Discovery and Linking, and ImageNet Object Detection. We obtain new state-of-the-art results on the first two tasks and significant improvements on the ImageNet task, thus verifying the power and generality of our approach.

### Thursday 24, 16:30 - 18:00, CS-CO - Constraint Optimisation (216)

Chair: Jordi Levy
• #1939
Constraint-Based Symmetry Detection in General Game Playing
Frédéric Koriche, Sylvain Lagrue, Éric Piette, Sébastien Tabary
Constraint Optimisation

Symmetry detection is a promising approach for reducing the search tree of games. In General Game Playing (GGP), where any game is compactly represented by a set of rules in the Game Description Language (GDL), the state-of-the-art methods for symmetry detection rely on a rule graph associated with the GDL description of the game. Though such rule-based symmetry detection methods can be applied to various tree search algorithms, they cover only a limited number of symmetries which are apparent in the GDL description. In this paper, we develop an alternative approach to symmetry detection in stochastic games that exploits constraint programming techniques. The minimax optimization problem in a GDL game is cast as a stochastic constraint satisfaction problem (SCSP), which can be viewed as a sequence of one-stage SCSPs. Minimax symmetries are inferred according to the microstructure complement of these one-stage constraint networks. Based on a theoretical analysis of this approach, we experimentally show on various games that the recent stochastic constraint solver MAC-UCB, coupled with constraint-based symmetry detection, significantly outperforms the standard Monte Carlo Tree Search algorithms, coupled with rule-based symmetry detection. This constraint-driven approach is also validated by the excellent results obtained by our player during the last GGP competition.

• #2431
A Partitioning Algorithm for Maximum Common Subgraph Problems
Ciaran McCreesh, Patrick Prosser, James Trimble
Constraint Optimisation

We introduce a new branch and bound algorithm for the maximum common subgraph and maximum common connected subgraph problems which is based around vertex labelling and partitioning. Our method in some ways resembles a traditional constraint programming approach, but uses a novel compact domain store and supporting inference algorithms which dramatically reduce the memory and computation requirements during search, and allow better dual viewpoint ordering heuristics to be calculated cheaply. Experiments show a speedup of more than an order of magnitude over the state of the art, and demonstrate that we can operate on much larger graphs without running out of memory.
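
A branch-and-bound search organised around vertex labelling and partitioning can be sketched compactly. The version below is a simplified illustration for the induced, unlabelled, not-necessarily-connected case; the function name, the data layout (adjacency as a list of sets) and the class-selection heuristic are assumptions made here, not details confirmed by the abstract.

```python
def max_common_induced_subgraph(adj_g, adj_h):
    """Return a largest list of (v, w) pairs mapping G-vertices to H-vertices
    such that adjacency is preserved in both directions.

    adj_g, adj_h: adjacency given as a list of sets, indexed by vertex.
    """
    best = []

    def search(classes, mapping):
        nonlocal best
        if len(mapping) > len(best):
            best = list(mapping)
        # Bound: each label class can contribute at most min(|left|, |right|).
        bound = len(mapping) + sum(min(len(l), len(r)) for l, r in classes)
        if bound <= len(best):
            return
        # Heuristic (assumed): branch on the class with the smallest larger side.
        k = min(range(len(classes)),
                key=lambda i: max(len(classes[i][0]), len(classes[i][1])))
        left, right = classes[k]
        v = left[0]
        for w in right:
            # Matching v with w splits every class by adjacency to v (in G)
            # and to w (in H); only like-with-like vertices stay compatible.
            refined = []
            for l, r in classes:
                l_adj = [x for x in l if x != v and x in adj_g[v]]
                r_adj = [x for x in r if x != w and x in adj_h[w]]
                l_non = [x for x in l if x != v and x not in adj_g[v]]
                r_non = [x for x in r if x != w and x not in adj_h[w]]
                if l_adj and r_adj:
                    refined.append((l_adj, r_adj))
                if l_non and r_non:
                    refined.append((l_non, r_non))
            search(refined, mapping + [(v, w)])
        # Alternative branch: leave v unmatched.
        rest = [(l[1:] if i == k else l, r) for i, (l, r) in enumerate(classes)]
        search([(l, r) for l, r in rest if l], mapping)

    search([(list(range(len(adj_g))), list(range(len(adj_h))))], [])
    return best


path4 = [{1}, {0, 2}, {1, 3}, {2}]          # path 0-1-2-3
cycle4 = [{1, 3}, {0, 2}, {1, 3}, {0, 2}]   # cycle 0-1-2-3-0
mapping = max_common_induced_subgraph(path4, cycle4)
```

The search state here is just a list of label classes rather than one domain per vertex, which gives a rough sense of how a compact domain store keeps memory use low.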

• #2534
Robust Quadratic Programming for Price Optimization
Akihiro Yabe, Shinji Ito, Ryohei Fujimaki
Constraint Optimisation

The goal of price optimization is to maximize total revenue by adjusting the prices of products, on the basis of predicted sales numbers that are functions of pricing strategies. Recent advances in demand modeling using machine learning raise a new challenge in price optimization, i.e., how to manage statistical errors in estimation. In this paper, we show that uncertainty in recently-proposed prescriptive price optimization frameworks can be represented by a matrix normal distribution. For this particular uncertainty, we propose novel robust quadratic programming algorithms for conservative lower-bound maximization. We offer an asymptotic probabilistic guarantee of conservativeness of our formulation. Our experiments on both artificial and actual price data show that our robust price optimization allows users to determine best risk-return trade-offs and to explore safe, profitable price strategies.

• #3437
XOR-Sampling for Network Design with Correlated Stochastic Events
Xiaojian Wu, Yexiang Xue, Bart Selman, Carla P. Gomes
Constraint Optimisation

Many network optimization problems can be formulated as stochastic network design problems in which edges are present or absent stochastically. Furthermore, protective actions can guarantee that edges will remain present. We consider the problem of finding the optimal protection strategy under a budget limit in order to maximize some connectivity measurements of the network. Previous approaches rely on the assumption that edges are independent. In this paper, we consider a more realistic setting where multiple edges are not independent due to natural disasters or regional events that make the states of multiple edges stochastically correlated. We use Markov Random Fields to model the correlation and define a new stochastic network design framework. We provide a novel algorithm based on Sample Average Approximation (SAA) coupled with a Gibbs or XOR sampler. The experimental results on real road network data show that the policies produced by SAA with the XOR sampler have higher quality and lower variance compared to SAA with the Gibbs sampler.
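
The Sample Average Approximation step can be illustrated on a toy network. This sketch deliberately uses independent edge sampling in place of the Markov Random Field with a Gibbs or XOR sampler described in the abstract; the function names, the `survival` dictionary and the candidate set are all hypothetical.

```python
import random
from collections import deque


def connected(edges, source, target):
    """Breadth-first search over an undirected edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def saa_best_protection(survival, candidates, source, target,
                        num_samples=2000, seed=0):
    """Pick the candidate protection set with the best sampled connectivity.

    survival: dict edge -> probability that the edge survives unprotected.
    candidates: iterable of frozensets of edges that fit within the budget.
    Assumption: edges fail independently (the paper models correlations).
    """
    rng = random.Random(seed)
    edges = list(survival)
    # Draw the scenarios once and reuse them for every candidate.
    scenarios = [{e: rng.random() < survival[e] for e in edges}
                 for _ in range(num_samples)]

    def score(protected):
        hits = 0
        for scenario in scenarios:
            present = [e for e in edges if e in protected or scenario[e]]
            hits += connected(present, source, target)
        return hits / num_samples

    return max(candidates, key=score)


survival = {("s", "a"): 0.9, ("a", "t"): 0.3}
candidates = [frozenset({("s", "a")}), frozenset({("a", "t")})]
best = saa_best_protection(survival, candidates, "s", "t")
```

With a budget of one edge, protecting the fragile edge `("a", "t")` leaves connectivity dependent only on the reliable edge, so it scores far higher on the sampled scenarios.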

• #1441
Robust Regression via Heuristic Hard Thresholding
Xuchao Zhang, Liang Zhao, Arnold P. Boedihardjo, Chang-Tien Lu
Constraint Optimisation

The presence of data noise and corruptions has recently drawn increasing attention to Robust Least Squares Regression (RLSR), which addresses the fundamental problem of learning reliable regression coefficients when response variables can be arbitrarily corrupted. Until now, several important challenges still cannot be handled concurrently: 1) exact recovery guarantee of regression coefficients; 2) difficulty in estimating the corruption ratio parameter; and 3) scalability to massive datasets. This paper proposes a novel Robust Least squares regression algorithm via Heuristic Hard thresholding (RLHH) that concurrently addresses all the above challenges. Specifically, the algorithm alternately optimizes the regression coefficients and estimates the optimal uncorrupted set via heuristic hard thresholding, without a corruption ratio parameter, until it converges. We also prove that our algorithm benefits from strong guarantees analogous to those of state-of-the-art methods in terms of convergence rates and recovery guarantees. We provide empirical evidence that our new method is more effective than existing methods in the recovery of both regression coefficients and uncorrupted sets, with very competitive efficiency.
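
The alternation between fitting coefficients and re-estimating the uncorrupted set can be sketched for a one-dimensional line fit. Note the key simplification: this sketch takes the number of points to keep (`keep`) as a given parameter, whereas the abstract's heuristic avoids needing the corruption ratio. All names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hard_threshold_regression(xs, ys, keep, max_iters=50):
    """Alternate between a least-squares fit and hard thresholding.

    keep: number of points assumed uncorrupted (a simplification; RLHH
    as described in the abstract does not require this parameter).
    """
    clean = list(range(len(xs)))  # start by trusting every point
    for _ in range(max_iters):
        slope, intercept = fit_line([xs[i] for i in clean],
                                    [ys[i] for i in clean])
        residual = [abs(ys[i] - (slope * xs[i] + intercept))
                    for i in range(len(xs))]
        # Hard thresholding: keep only the points with the smallest residuals.
        new_clean = sorted(sorted(range(len(xs)),
                                  key=residual.__getitem__)[:keep])
        if new_clean == clean:
            break
        clean = new_clean
    slope, intercept = fit_line([xs[i] for i in clean],
                                [ys[i] for i in clean])
    return slope, intercept, clean


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[3] += 50   # two arbitrarily corrupted responses
ys[7] -= 40
slope, intercept, clean = hard_threshold_regression(xs, ys, keep=8)
```

On this toy data the first fit is badly skewed by the two corrupted responses, but they still have the largest residuals, so the second fit uses only clean points.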

• #2600
Restart and Random Walk in Local Search for Maximum Vertex Weight Cliques with Evaluations in Clustering Aggregation
Yi Fan, Nan Li, Chengqian Li, Zongjie Ma, Longin Jan Latecki, Kaile Su
Constraint Optimisation

The Maximum Vertex Weight Clique (MVWC) problem is NP-hard and also important in real-world applications. In this paper we propose to use the restart and the random walk strategies to improve local search for MVWC. If a solution is revisited in some particular situation, the search will restart. In addition, when the local search has no other options except dropping vertices, it will use random walk. Experimental results show that our solver outperforms state-of-the-art solvers in DIMACS and finds a new best-known solution. It is also the only solver that is comparable with state-of-the-art methods on both BHOSLIB and large crafted graphs. Furthermore, we evaluated our solver in clustering aggregation. Experimental results on a number of real data sets demonstrate that our solver outperforms the state of the art for solving the derived MVWC problem and helps improve the final clustering results.

### Thursday 24, 16:30 - 18:00, KR-NMR - Non-Monotonic Reasoning (217)

Chair: Matthias Thimm
• #797
Semantics for Active Integrity Constraints Using Approximation Fixpoint Theory
Bart Bogaerts, Luís Cruz-Filipe
Non-Monotonic Reasoning

Active integrity constraints (AICs) constitute a formalism to associate with a database not just the constraints it should adhere to, but also how to fix the database in case one or more of these constraints are violated. The intuitions regarding which repairs are “good” given such a description are closely related to intuitions that live in various areas of non-monotonic reasoning. In this paper, we apply approximation fixpoint theory, an algebraic framework that unifies semantics of non-monotonic logics, to the field of AICs. This results in a new family of semantics for AICs, of which we study semantics and relationships to existing semantics. We argue that the AFT-well-founded semantics has some desirable properties.

• #846
Safe Inductions: An Algebraic Study
Bart Bogaerts, Joost Vennekens, Marc Denecker
Non-Monotonic Reasoning

In many knowledge representation formalisms, a constructive semantics is defined based on sequential applications of rules or of a semantic operator. These constructions often share the property that rule applications must be delayed until it is safe to do so: until it is known that the condition that triggers the rule will continue to hold. This intuition occurs for instance in the well-founded semantics of logic programs and in autoepistemic logic. In this paper, we formally define the safety criterion algebraically. We study properties of so-called safe inductions and apply our theory to logic programming and autoepistemic logic. For the latter, we show that safe inductions manage to capture the intended meaning of a class of theories on which all classical constructive semantics fail.

• #1838
A Study of Unrestricted Abstract Argumentation Frameworks
Ringo Baumann, Christof Spanring
Non-Monotonic Reasoning

Research in abstract argumentation typically pertains to finite argumentation frameworks (AFs). Actual or potential infinite AFs frequently occur if they are used for the purpose of nonmonotonic entailment, so-called instantiation-based argumentation, or if they are involved as modeling tool for dialogues, n-person-games or action sequences. Apart from these practical cases a profound analysis yields a better understanding of how the nonmonotonic theory of abstract argumentation works in general. In this paper we study a bunch of abstract properties like SCC-recursiveness, expressiveness or intertranslatability for unrestricted AFs.

• #2298
Streaming Multi-Context Systems
Minh Dao-Tran, Thomas Eiter
Non-Monotonic Reasoning

Multi-Context Systems (MCS) are a powerful framework to interlink heterogeneous knowledge bases under equilibrium semantics. Recent extensions of MCS to dynamic data settings either abstract from computing time, or abandon a dynamic equilibrium semantics. We thus present streaming MCS, which have a run-based semantics that accounts for asynchronous, distributed execution and supports obtaining equilibria for contexts in cyclic exchange (avoiding infinite loops); moreover, they equip MCS with native stream reasoning features. Ad-hoc query answering is NP-complete while prediction is PSpace-complete in relevant settings (but undecidable in general); tractability results are given for suitable restrictions.

• #2539
A Unifying Framework for Probabilistic Belief Revision
Zhiqiang Zhuang, James Delgrande, Abhaya Nayak, Abdul Sattar
Non-Monotonic Reasoning

In this paper we provide a general, unifying framework for probabilistic belief revision. We first introduce a probabilistic logic called p-logic that is capable of representing and reasoning with basic probabilistic information. With p-logic as the background logic, we define a revision function called p-revision that resembles partial meet revision in the AGM framework. We provide a representation theorem for p-revision which shows that it can be characterised by the set of basic AGM revision postulates. P-revision represents an "all purpose" method for revising probabilistic information that can be used for, but is not limited to, the revision problems behind Bayesian conditionalisation, Jeffrey conditionalisation, and Lewis's imaging. Importantly, p-revision subsumes all three approaches, indicating that Bayesian conditionalisation, Jeffrey conditionalisation, and Lewis's imaging all obey the basic principles of AGM revision. As well, our investigation sheds light on the corresponding operation of AGM expansion in the probabilistic setting.
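
The three classical operations named in this abstract can be illustrated on a tiny example. The sketch below is not the paper's p-revision; it only shows the textbook definitions of Bayesian conditionalisation, Jeffrey conditionalisation, and Lewis's imaging over a finite set of worlds. The world names, the prior, and the `closest` map (which world counts as "closest" for imaging) are assumptions made up for illustration.

```python
from fractions import Fraction

def bayes_condition(prior, event):
    """Bayesian conditionalisation: keep only the mass inside `event` and renormalise."""
    total = sum(p for w, p in prior.items() if w in event)
    if total == 0:
        raise ValueError("cannot condition on a zero-probability event")
    return {w: (p / total if w in event else Fraction(0)) for w, p in prior.items()}

def jeffrey_condition(prior, event, q):
    """Jeffrey conditionalisation: the event ends up with probability q,
    while relative weights inside and outside the event are preserved."""
    complement = set(prior) - set(event)
    inside = bayes_condition(prior, event)
    outside = bayes_condition(prior, complement)
    return {w: q * inside[w] + (1 - q) * outside[w] for w in prior}

def lewis_imaging(prior, event, closest):
    """Lewis's imaging: every world outside the event shifts its whole mass
    to its closest world inside the event (given by the assumed `closest` map)."""
    posterior = {w: Fraction(0) for w in prior}
    for w, p in prior.items():
        target = w if w in event else closest[w]
        posterior[target] += p
    return posterior

# Hypothetical three-world example.
prior = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4)}
event = {"w1", "w2"}
closest = {"w3": "w2"}  # assumption: w2 is the event-world closest to w3

bayes = bayes_condition(prior, event)
jeffrey = jeffrey_condition(prior, event, Fraction(4, 5))
imaged = lewis_imaging(prior, event, closest)
```

Note how conditioning rescales the surviving worlds (w1 ends at 2/3), whereas imaging leaves w1 at 1/2 and moves the mass of w3 onto w2.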

• #3773
Lazy-Grounding for Answer Set Programs with External Source Access
Thomas Eiter, Tobias Kaminski, Antonius Weinzierl
Non-Monotonic Reasoning

HEX-programs enrich the well-known Answer Set Programming (ASP) paradigm. In HEX, problems are solved using nonmonotonic logic programs with bidirectional access to external sources. ASP evaluation is traditionally based on grounding the input program first, but recent advances in lazy-grounding make the latter also interesting for HEX, as the grounding bottleneck of ASP may be avoided. We explore this issue and present a new evaluation algorithm for HEX-programs based on lazy-grounding solving for ASP. Nonmonotonic dependencies and value invention (i.e., import of new constants) from external sources make an efficient solution nontrivial. However, illustrative benchmarks show a clear advantage of the new algorithm for grounding-intense programs, which is a new perspective to make HEX more suitable for real-world application needs.

### Thursday 24, 16:30 - 18:00, KR-ACC - Action, Change and Causality (218)

Chair: Franz Baader
• #2110
A Core-Guided Approach to Learning Optimal Causal Graphs
Antti Hyttinen, Paul Saikko, Matti Järvisalo
Action, Change and Causality

Discovery of causal relations is an important part of data analysis. Recent exact Boolean optimization approaches enable tackling very general search spaces of causal graphs with feedback cycles and latent confounders, simultaneously obtaining high accuracy by optimally combining conflicting independence information in sample data. We propose several domain-specific techniques and integrate them into a core-guided maximum satisfiability solver, thereby speeding up the current state of the art in exact search for causal graphs with cycles and latent confounders on simulated and real-world data.

• #1278
Budget-Constrained Dynamics in Multiagent Systems
Rui Cao, Pavel Naumov
Action, Change and Causality

The paper introduces a notion of a budget-constrained multiagent transition system that associates two financial parameters with each transition: a pre-transition minimal budget requirement and a post-transition profit. The paper also proposes a new modal language for reasoning about such a system. The language uses a modality labeled by agent as well as by budget and profit constraints. The main technical result is a sound and complete logical system that describes all universal properties of this modality. Among these properties is a form of Transitivity axiom that captures the interplay between the budget and profit constraints.

• #1605
GDL-III: A Description Language for Epistemic General Game Playing
Michael Thielscher
Action, Change and Causality

GDL-III, a description language for general game playing with imperfect information and introspection, supports the specification of epistemic games. These are characterised by rules that depend on the knowledge of players. GDL-III provides a simpler language for representing actions and knowledge than existing formalisms: domain descriptions require neither explicit axioms about the epistemic effects of actions, nor explicit specifications of accessibility relations. We develop a formal semantics for GDL-III and demonstrate that this language, despite its syntactic simplicity, is expressive enough to model the famous Muddy Children domain. We also show that it significantly enhances the expressiveness of its predecessor GDL-II by formally proving that termination of games becomes undecidable, and we present experimental results with a reasoner for GDL-III applied to general epistemic puzzles.

• #2312
Handling non-local dead-ends in Agent Planning Programs
Lukas Chrpa, Nir Lipovetzky, Sebastian Sardina
Action, Change and Causality

We propose an approach to reason about agent planning programs with global information. Agent planning programs can be understood as a network of planning tasks, accommodating long-term goals, non-terminating behaviors, and interactive execution. We provide a technique that relies on reasoning about "global" dead-ends and that can be incorporated into any planning-based approach to agent planning problems. In doing so, we also introduce the notion of online execution of such planning structures. We provide experimental evidence suggesting the technique yields significant benefits.

• #2509
Reasoning about Probabilities in Unbounded First-Order Dynamical Domains
Vaishak Belle, Gerhard Lakemeyer
Action, Change and Causality

When it comes to robotic agents operating in an uncertain world, a major concern in knowledge representation is to better relate high-level logical accounts of belief and action to the low-level probabilistic sensorimotor data. Perhaps the most general formalism for dealing with degrees of belief and, in particular, how such beliefs should evolve in the presence of noisy sensing and acting is the account by Bacchus, Halpern, and Levesque. In this paper, we reconsider that model of belief, and propose a new logical variant that has much of the expressive power of the original, but goes beyond it in novel ways. In particular, by moving to a semantical account of a modal variant of the situation calculus based on possible worlds with unbounded domains and probabilistic distributions over them, we are able to capture the beliefs of a fully introspective knowledge base with uncertainty by way of an only-believing operator. The paper introduces the new logic and discusses key properties as well as examples that demonstrate how the beliefs of a knowledge base change as a result of noisy actions.

• #3753
Transfer Learning in Multi-Armed Bandits: A Causal Approach
Junzhe Zhang, Elias Bareinboim
Action, Change and Causality

Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly, and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods to infer the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully realized in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] and standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arm’s distribution based on structural knowledge; second, incorporating these bounds in a dynamic allocation procedure so as to guide the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms and achieves orders of magnitude faster convergence rates than these algorithms. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.

### Thursday 24, 16:30 - 18:00, ML-LT - Learning Theory (219)

Chair: Tianqing Zhu
• #1637
Understanding How Feature Structure Transfers in Transfer Learning
Tongliang Liu, Qiang Yang, Dacheng Tao
Learning Theory

Transfer learning transfers knowledge across domains to improve the learning performance. Since feature structures generally represent the common knowledge across different domains, they can be transferred successfully even though the labeling functions across domains differ arbitrarily. However, theoretical justification for this success has remained elusive. In this paper, motivated by self-taught learning, we regard a set of bases as a feature structure of a domain if the bases can (approximately) reconstruct any observation in this domain. We propose a general analysis scheme to theoretically justify that if the source and target domains share similar feature structures, the source domain feature structure is transferable to the target domain, regardless of the change of the labeling functions across domains. The transferred structure is interpreted to function as a regularization matrix which benefits the learning process of the target domain task. We prove that such transfer enables the corresponding learning algorithms to be uniformly stable. Specifically, we illustrate the existence of feature structure transfer in two well-known transfer learning settings: domain adaptation and learning to learn.

• #1894
Query-Driven Discovery of Anomalous Subgraphs in Attributed Graphs
Nannan Wu, Feng Chen, Jianxin Li, Jinpeng Huai, Bo Li
Learning Theory

For a detection problem, a user often has some prior knowledge about the structure-specific subgraphs of interest, but few traditional approaches are capable of employing this knowledge. The main technical challenge is that few approaches can efficiently model the space of connected subgraphs that are isomorphic to a query graph. We present a novel, efficient approach for optimizing a generic nonlinear cost function subject to a query-specific structural constraint. Our approach enjoys strong theoretical guarantees on the convergence of a nearly optimal solution and a low time complexity. For the case study, we specialize the nonlinear function to several well-known graph scan statistics for anomalous subgraph discovery. Empirical evidence demonstrates that our method is superior to state-of-the-art methods in several real-world anomaly detection tasks.

• #2784
Thresholding Bandits with Augmented UCB
Subhojyoti Mukherjee, Naveen Kolar Purushothama, Nandan Sudarsanam, Balaraman Ravindran
Learning Theory

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB is that it uses both mean and variance estimates to eliminate arms that have been sufficiently explored; to the best of our knowledge this is the first algorithm to employ such an approach for the considered TBP. Theoretically, we obtain an upper bound on the loss (probability of mis-classification) incurred by AugUCB. Although UCBEV in the literature provides a better guarantee, it is important to emphasize that UCBEV has access to the problem complexity (whose computation requires the arms' means and variances), and hence is not realistic in practice; this is in contrast to AugUCB, whose implementation does not require any such complexity inputs. We conduct extensive simulation experiments to validate the performance of AugUCB. Through our simulation work, we establish that AugUCB, owing to its utilization of variance estimates, performs significantly better than the state-of-the-art APT, CSAR and other non variance-based algorithms.

• #3478
No Learner Left Behind: On the Complexity of Teaching Multiple Learners Simultaneously
Xiaojin Zhu, Ji Liu, Manuel Lopes
Learning Theory

We present a theoretical study of algorithmic teaching in the setting where the teacher must use the same training set to teach multiple learners. This problem is a theoretical abstraction of the real-world classroom setting in which the teacher delivers the same lecture to academically diverse students. We define a minimax teaching criterion to guarantee the performance of the worst learner in the class. We prove that the teaching dimension increases with class diversity in general. For the classes of conjugate Bayesian learners and linear regression learners, respectively, we exhibit the corresponding minimax teaching sets. We then propose a method to enhance teaching by partitioning the class into sections. We present cases where the optimal partition minimizes overall teaching dimension while maintaining the guarantee on all learners. Interestingly, we show personalized education (one learner per section) is not necessarily the optimal partition. Our results generalize algorithmic teaching to multiple learners and offer insight on how to teach large classes.

• #3640
On the Complexity of Learning from Label Proportions
Benjamin Fish, Lev Reyzin
Learning Theory

In the problem of learning with label proportions (also known as the problem of estimating class ratios), the training data is unlabeled, and only the proportions of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is useful in a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we resolve foundational questions regarding the computational complexity of learning in this setting. We formalize a simple version of the setting, and we compare the computational complexity of learning in this model to classical PAC learning. Perhaps surprisingly, we show that what can be learned efficiently in this model is a strict subset of what may be learned efficiently in PAC, under standard complexity assumptions. We give a characterization in terms of VC dimension, and we show that there are non-trivial problems in this model that can be efficiently learned. We also give an algorithm that demonstrates the feasibility of learning under well-behaved distributions.
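
To make the learning setting concrete, here is a minimal sketch in which each training "bag" carries only the fraction of positive examples, never individual labels. It is not the algorithm from the paper: the one-dimensional threshold hypothesis class, the squared-error objective, and the toy data are all assumptions chosen to keep the example small.

```python
def predicted_proportion(threshold, bag):
    """Fraction of points in `bag` that the hypothesis `x >= threshold` labels positive."""
    return sum(x >= threshold for x in bag) / len(bag)

def fit_threshold(bags, proportions, candidates):
    """Return the candidate threshold whose predicted proportions are closest
    (in summed squared error) to the observed label proportions."""
    def loss(t):
        return sum((predicted_proportion(t, bag) - p) ** 2
                   for bag, p in zip(bags, proportions))
    return min(candidates, key=loss)

# Hypothetical data: three unlabeled bags and the known share of positives in each.
bags = [
    [0.1, 0.2, 0.8, 0.9],
    [0.3, 0.7, 0.75, 0.95],
    [0.05, 0.15, 0.25, 0.6],
]
proportions = [0.5, 0.75, 0.25]
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]

best = fit_threshold(bags, proportions, candidates)
```

Here the threshold 0.5 reproduces all three proportions exactly, so it is selected even though no single example was ever labeled.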

• #2220
Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization
Yue Yu, Longbo Huang
Learning Theory

We consider the stochastic composition optimization problem proposed in \cite{wang2017stochastic}, which has applications ranging from estimation to statistical and machine learning. We propose the first ADMM based algorithm named com SVR ADMM, and show that com SVR ADMM converges linearly for strongly convex and Lipschitz smooth objectives, and has a convergence rate of $O(\log S/S)$, which improves upon the $O(S^{-4/9})$ rate in \cite{wang2016accelerating} when the objective is convex and Lipschitz smooth. Moreover, com SVR ADMM possesses a rate of $O(1/\sqrt{S})$ when the objective is convex but without Lipschitz smoothness. We also conduct experiments and show that it outperforms existing algorithms.
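
As a quick check of why the first rate improves on the second in the convex, Lipschitz smooth case, the ratio of the two bounds can be written out directly. This is only an asymptotic comparison of the stated rates, ignoring constants:

```latex
\frac{\log S / S}{S^{-4/9}}
  \;=\; \frac{\log S}{S^{\,1 - 4/9}}
  \;=\; \frac{\log S}{S^{5/9}}
  \;\longrightarrow\; 0
  \qquad \text{as } S \to \infty .
```

Since the ratio vanishes, $O(\log S/S)$ decays strictly faster than $O(S^{-4/9})$ for large $S$.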

### Thursday 24, 16:30 - 18:00, NLP-IE - Information Extraction (220)

Chair: Lidong Bing
• #2206
How to Keep a Knowledge Base Synchronized with Its Encyclopedia Source
Jiaqing Liang, Sheng Zhang, Yanghua Xiao
Information Extraction

Knowledge bases are playing an increasingly important role in many real-world applications. However, most of these knowledge bases tend to be outdated, which limits their utility. In this paper, we investigate how to maintain the freshness of a knowledge base by synchronizing it with its data source (usually encyclopedia websites). A direct solution, used by most existing methods, is to revisit the whole encyclopedia periodically and rerun the entire knowledge base construction pipeline. However, this solution is wasteful and incurs massive overload of the network, which limits the update frequency and leads to knowledge obsolescence. To overcome this weakness, we propose a set of synchronization principles upon which we build an Update System for knowledge Base (USB) with an update frequency predictor of entities as the core component. We also design a set of effective features and realize the predictor. We conduct extensive experiments to justify the effectiveness of the proposed system, model, as well as the underlying principles. Finally, we deploy USB on a Chinese knowledge base to improve its freshness.

• #2164
Iterative Entity Alignment via Joint Knowledge Embeddings
Hao Zhu, Ruobing Xie, Zhiyuan Liu, Maosong Sun
Information Extraction

Entity alignment aims to link entities and their counterparts among multiple knowledge graphs (KGs). Most existing methods typically rely on external information of entities such as Wikipedia links and require costly manual feature construction to complete alignment. In this paper, we present a novel approach for entity alignment via joint knowledge embeddings. Our method jointly encodes both entities and relations of various KGs into a unified low-dimensional semantic space according to a small seed set of aligned entities. During this process, we can align entities according to their semantic distance in this joint semantic space. More specifically, we present an iterative and parameter sharing method to improve alignment performance. Experimental results on real-world datasets show that, as compared to baselines, our method achieves significant improvements on entity alignment, and can further improve knowledge graph completion performance on various KGs with the help of joint knowledge embeddings.
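
The idea of aligning entities by their distance in a shared embedding space can be sketched as a nearest-neighbour lookup. This is an illustration only: the embeddings below are hand-written placeholders rather than learned vectors, the entity names are hypothetical, and Euclidean distance is an assumption (the abstract does not say which distance is used).

```python
import math

def align_entities(source, target):
    """Map each source-KG entity to the target-KG entity whose embedding is nearest.

    `source` and `target` are dicts from entity name to a vector in the same
    joint space; Euclidean distance is an assumed choice of semantic distance.
    """
    return {
        s_name: min(target, key=lambda t_name: math.dist(s_vec, target[t_name]))
        for s_name, s_vec in source.items()
    }

# Hypothetical embeddings for two small knowledge graphs in one joint 2-D space.
kg_a = {"Paris@A": (0.9, 0.1), "Berlin@A": (0.1, 0.8)}
kg_b = {"Paris@B": (1.0, 0.0), "Berlin@B": (0.0, 1.0), "Rome@B": (0.5, 0.5)}

alignment = align_entities(kg_a, kg_b)
```

In an iterative scheme such as the one the abstract describes, confidently aligned pairs would presumably be fed back as additional seeds; that loop is omitted here.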

• #2264
Conditional Generative Adversarial Networks for Commonsense Machine Comprehension
Bingning Wang, Kang Liu, Jun Zhao
Information Extraction

The recently proposed Story Cloze Test [Mostafazadeh et al., 2016] is a commonsense machine comprehension application that deals with the natural language understanding problem. This dataset contains many story tests which require commonsense inference ability. Unfortunately, the training data is almost unsupervised: each context document is followed by only one positive sentence that can be inferred from the context. However, in the testing period, we must make inferences over two candidate sentences. To tackle this problem, we employ generative adversarial networks (GANs) to generate fake sentences. We propose Conditional GANs (CGANs) in which the generator is conditioned on the context. Our experiments show the advantage of CGANs in discriminating sentences, and they achieve state-of-the-art results on the commonsense story reading comprehension task compared with previous feature engineering and deep learning methods.

• #2307
Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel Data
Tengfei Ma, Tetsuya Nasukawa
Information Extraction

Topic models have been successfully applied in lexicon extraction. However, most previous methods are limited to document-aligned data. In this paper, we try to address two challenges of applying topic models to lexicon extraction in non-parallel data: 1) the difficulty of modeling the word relationship and 2) a noisy seed dictionary. To solve these two challenges, we propose two new bilingual topic models to better capture the semantic information of each word while discriminating the multiple translations in a noisy seed dictionary. We extend the scope of topic models by inverting the roles of "word" and "document". In addition, to solve the problem of noise in the seed dictionary, we incorporate the probability of translation selection in our models. Moreover, we also propose an effective measure to evaluate the similarity of words in different languages and select the optimal translation pairs. Experimental results using real world data demonstrate the utility and efficacy of the proposed models.

• #2499
Self-paced Compensatory Deep Boltzmann Machine for Semi-Structured Document Embedding
Shuangyin Li, Rong Pan, Jun Yan
Information Extraction

In the last decade, a huge number of documents with different types of rich metadata information, known as Semi-Structured Documents (SSDs), have appeared in many real applications. It is an interesting research problem to model this type of text data in the way humans understand text with informative metadata. In this paper, we introduce a Self-paced Compensatory Deep Boltzmann Machine (SCDBM) architecture that learns a deep neural network by using metadata information to learn deep structure layer-wise for Semi-Structured Document (SSD) embedding in a self-paced way. Inspired by the way humans understand text, the model defines a deep process of document vector extraction beyond the space of words by joining the metadata, where each layer selects different types of metadata. We present efficient learning and inference algorithms for the SCDBM model and empirically demonstrate that the representation discovered by this model performs better on semi-structured document classification and retrieval, and on tag prediction, compared with state-of-the-art baselines.

• #3282
Effective Deep Memory Networks for Distant Supervised Relation Extraction
Xiaocheng Feng, Jiang Guo, Bing Qin, Ting Liu, Yongjie Liu
Information Extraction

Distant supervised relation extraction (RE) has been an effective way of finding novel relational facts from text without labeled training data. Typically it can be formalized as a multi-instance multi-label problem. In this paper, we introduce a novel neural approach for distant supervised RE with specific focus on attention mechanisms. Unlike the feature-based logistic regression model and compositional neural models such as CNN, our approach includes two major attention-based memory components, which are capable of explicitly capturing the importance of each context word for modeling the representation of the entity pair, as well as the intrinsic dependencies between relations. Such importance degree and dependency relationship are calculated with multiple computational layers, each of which is a neural attention model over an external memory. Experiments on real-world datasets show that our approach performs significantly and consistently better than various baselines.

### Thursday 24, 16:30 - 18:30, JOU-KR2 - Journal Track: Knowledge Representation 2 (203)

Chair: Randy Goebel
• #3506
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks (Extended Abstract)
Stefano V. Albrecht, Subramanian Ramamoorthy
Journal Track: Knowledge Representation 2

Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and uncertain observations. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF is evaluated in both synthetic processes and a simulated multi-robot warehouse, where it outperformed alternative filtering methods by exploiting passivity.

• #4214
Construction of System of Spheres-based Transitively Relational Partial Meet Multiple Contractions: An Impossibility Result (Extended Abstract)
Maurício D. L. Reis, Eduardo Fermé, Pavlos Peppas
Journal Track: Knowledge Representation 2

In this paper we show that, contrary to what is the case in what concerns contractions by a single sentence, there is not a system of spheres-based construction of multiple contractions which generates each and every transitively relational partial meet multiple contraction. Furthermore, we propose two system of spheres-based constructions of multiple contractions which generate (only) transitively relational partial meet multiple contractions.

• #4216
Evaluating Epistemic Negation in Answer Set Programming (Extended Abstract)
Yi-Dong Shen, Thomas Eiter
Journal Track: Knowledge Representation 2

Epistemic negation 'not' along with default negation 'neg' plays a key role in knowledge representation and nonmonotonic reasoning. However, the existing approaches behave unsatisfactorily in that they suffer from the problems of unintended world views due to recursion through the epistemic modal operator K or M (K F and M F are shorthands for (neg not F) and (not neg F), respectively). In this paper we present a general approach to epistemic negation which is free of unintended world views and thus offers a solution to the long-standing problem of epistemic specifications which were introduced by Gelfond 1991 over two decades ago.

• #4221
POPPONENT: Highly accurate, individually and socially efficient opponent preference model in bilateral multi issue negotiations (Extended Abstract)
Farhad Zafari, Faria Nassiri-Mofakham
Journal Track: Knowledge Representation 2

In automated bilateral multi issue negotiations, two intelligent automated agents negotiate on behalf of their owners over many issues in order to reach an agreement. Modeling the opponent can considerably boost the performance of the agents and increase the quality of the negotiation outcome. State of the art models accomplish this by making some assumptions about the opponent, which restricts their applicability in real scenarios. In this paper, a less restricted technique where perceptron units (POPPONENT) are applied in modelling the preferences of the opponent is proposed. This model adopts a Multi Bipartite version of the Standard Gradient Descent search algorithm (MBGD) to find the best hypothesis, which is the best preference profile. In order to evaluate the accuracy and performance of this proposed opponent model, it is compared with the state of the art models available in the Genius repository. The results in the devised setting confirm the higher accuracy of POPPONENT compared to the most accurate state of the art model. Evaluating the model in the real world negotiation scenarios in the Genius framework also confirms its high accuracy in relation to the state of the art models in estimating the utility of offers. The findings here indicate that the proposed model is individually and socially efficient. This proposed MBGD method could also be adopted in similar practical areas of Artificial Intelligence.

• #4227
Relations Between Spatial Calculi About Directions and Orientations (Extended Abstract)
Till Mossakowski, Reinhard Moratz
Journal Track: Knowledge Representation 2

A qualitative representation of space and/or time provides mechanisms which characterize the essential properties of objects or configurations. The advantages over quantitative representations can be: (1) a better match with human concepts related to natural language, and (2) better efficiency for reasoning. The two main trends in qualitative spatial constraint reasoning are topological reasoning about regions and reasoning about directions between points and straight lines and orientations of straight lines or configurations derived from points. In this work, we apply universal algebraic tools to binary qualitative calculi and their relations.

• #4209
On Redundant Topological Constraints (Extended Abstract)
Sanjiang Li, Zhiguo Long, Weiming Liu, Matt Duckham, Alan Both
Journal Track: Knowledge Representation 2

Redundancy checking is an important task in AI subfields such as knowledge representation and constraint solving. This paper considers redundant topological constraints, defined in the region connection calculus RCC8. We say a constraint in a set C of RCC8 constraints is redundant if it is entailed by the rest of C. A prime subnetwork of C is a subset of C which contains no redundant constraints and has the same solution set as C. It is natural to ask how to compute such a prime subnetwork, and when it is unique. While this problem is in general intractable, we show that, if S is a subalgebra of RCC8 in which weak composition distributes over nonempty intersections, then C has a unique prime subnetwork, which can be obtained in cubic time by removing all redundant constraints simultaneously from C. As a by-product, we show that any path-consistent network over such a distributive subalgebra is minimal.

### Thursday 24, 16:30 - 18:30, SIS-ML - Sister Conference Track: Machine Learning (204)

Chair: Yang Yu
• #4217
On Thompson Sampling and Asymptotic Optimality
Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
Sister Conference Track: Machine Learning

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value and (2) given a recoverability assumption regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
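
For readers unfamiliar with Thompson sampling, the sketch below shows the idea in its simplest form: a Bernoulli multi-armed bandit with Beta posteriors. This is a deliberately swapped-in toy setting, far narrower than the general non-Markovian environments the abstract covers; the arm means, horizon, and uniform Beta(1, 1) prior are assumptions for illustration.

```python
import random

def thompson_bernoulli(true_means, steps, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return pull counts per arm.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its mean.
    At every step one value is sampled from each posterior and the arm with the
    largest sample is played, so arms are chosen in proportion to the current
    belief that they are optimal.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(steps):
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical two-armed problem with a large gap between the arms.
pulls = thompson_bernoulli([0.1, 0.9], steps=2000)
```

As the posteriors sharpen, the worse arm is sampled less and less often, which is the informal intuition behind sublinear regret in this simple case.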

• #4235
Self-Adjusting Memory: How to Deal with Diverse Drift Types
Viktor Losing, Barbara Hammer, Heiko Wersing
Sister Conference Track: Machine Learning

Data Mining in non-stationary data streams is particularly relevant in the context of the Internet of Things and Big Data. Its challenges arise from fundamentally different drift types violating assumptions of data independence or stationarity. Available methods often struggle with certain forms of drift or require unavailable a priori task knowledge. We propose the Self-Adjusting Memory (SAM) model for the k Nearest Neighbor (kNN) algorithm. SAM-kNN can deal with heterogeneous concept drift, i.e. different drift types and rates. Its basic idea is to maintain dedicated models for current and former concepts, used according to the demands of the given situation. It can be robustly applied in practice without meta parameter optimization. We conduct an extensive evaluation on various benchmarks, consisting of artificial streams with known drift characteristics and real-world datasets. Highly competitive results throughout all experiments underline the robustness of SAM-kNN as well as its capability to handle heterogeneous concept drift.

• #4239
Learning and Applying Case Adaptation Rules for Classification: An Ensemble Approach
Vahid Jalali, David Leake, Najmeh Forouzandehmehr
Sister Conference Track: Machine Learning

The ability of case-based reasoning systems to solve novel problems depends on their capability to adapt past solutions to new circumstances. However, acquiring the knowledge required for case adaptation is a classic challenge for CBR. This motivates the use of machine learning methods to generate adaptation knowledge. A popular approach uses the case difference heuristic (CDH) to generate adaptation rules from pairs of cases in the case base, based on the premise that the observed differences in case solutions result from the differences in the problems they solve, and so can form the basis of rules to adapt cases with similar problem differences. Extensive research has successfully applied the CDH approach to adaptation rule learning for case-based regression (numerical prediction) tasks. However, classification tasks have been outside of its scope. The work presented in this paper addresses that gap by extending CDH-based learning of adaptation rules to apply to cases with categorical features and solutions. It presents the generalized case value heuristic to assess case and solution differences and applies it in an ensemble-based case-based classification method, ensembles of adaptations for classification (EAC), built on the authors' previous work on ensembles of adaptations for regression (EAR). Experimental results support the effectiveness of EAC.

• #4248
Open-World Probabilistic Databases: An Abridged Report
Ismail Ilkan Ceylan, Adnan Darwiche, Guy Van den Broeck
Sister Conference Track: Machine Learning

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic database semantics, which relaxes the probabilities of open facts to default intervals. For this open-world setting, we lift the existing data complexity dichotomy of probabilistic databases, and propose an efficient evaluation algorithm for unions of conjunctive queries. We also show that query evaluation can become harder for non-monotone queries.

• #4251
Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study
Suyog Gupta, Wei Zhang, Fei Wang
Sister Conference Track: Machine Learning

Deep learning with a large number of parameters requires distributed training, where model accuracy and runtime are two important factors to be considered. However, there has been no systematic study of the tradeoff between these two factors during the model training process. This paper presents Rudra, a parameter server based distributed computing framework tuned for training large-scale deep neural networks. Using variants of the asynchronous stochastic gradient descent algorithm we study the impact of synchronization protocol, stale gradient updates, minibatch size, learning rates, and number of learners on runtime performance and model accuracy. We introduce a new learning rate modulation strategy to counter the effect of stale gradients and propose a new synchronization protocol that can effectively bound the staleness in gradients, improve runtime performance and achieve good model accuracy. Our empirical investigation reveals a principled approach for distributed training of neural networks: the mini-batch size per learner should be reduced as more learners are added to the system to preserve the model accuracy. We validate this approach using commonly used image classification benchmarks: CIFAR10 and ImageNet.

• #4299
Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling
Christopher De Sa, Kunle Olukotun, Christopher Ré
Sister Conference Track: Machine Learning

Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions. To speed up Gibbs sampling, there has recently been interest in parallelizing it by executing asynchronously. While empirical results suggest that many models can be efficiently sampled asynchronously, traditional Markov chain analysis does not apply to the asynchronous case, and thus asynchronous Gibbs sampling is poorly understood. In this paper, we derive a better understanding of the two main challenges of asynchronous Gibbs: bias and mixing time. We show experimentally that our theoretical results match practical outcomes.
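
For readers unfamiliar with the baseline technique, the sketch below is an ordinary sequential Gibbs sampler for a standard bivariate normal with correlation `rho`. It is not the asynchronous variant analysed in the paper; the target distribution, the function name `gibbs_bivariate_normal`, and the burn-in length are assumptions chosen to keep the example small.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Sequential Gibbs sampling from a standard bivariate normal (illustrative target)."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # std. dev. of each full conditional
    x = y = 0.0
    samples = []
    for t in range(burn_in + n_samples):
        # Resample each variable from its conditional given the other's current value.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if t >= burn_in:
            samples.append((x, y))
    return samples


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(0.5, 20000)
    mean_x = sum(x for x, _ in draws) / len(draws)
    mean_xy = sum(x * y for x, y in draws) / len(draws)
    print(f"estimated E[x] = {mean_x:.3f}, estimated E[xy] = {mean_xy:.3f}")
```

Each update here reads the latest value of the other variable. In an asynchronous execution that value may be stale, which is why the bias and mixing time need separate analysis.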

### Thursday 24, 16:30 - 18:30: Competition (206)

Chair: Jochen Renz
• Angry Birds
Competition

### Thursday 24, 18:30 - 19:30: Special Session (218)

Chair: Michael Wooldridge
• Business Meeting
Special Session

### Thursday 24, 18:30 - 20:00: Social event (Boatbuilders Yard)

• Student and Sponsor Reception
Social event

### Thursday 24, 20:00 - 23:00: Social Event (Boatbuilders Yard)

• Student Reception
Social Event

### Friday 25, 08:30 - 10:00: EurAI Award Session (Plenary 2)

Chair: Barry O'Sullivan
• EurAI Artificial Intelligence Dissertation Award 2016
EurAI Award Session

### Friday 25, 08:30 - 10:00: Competition (Lobby)

Chair: Jochen Renz
• Angry Birds
Competition

### Friday 25, 08:30 - 10:00: JOU-NLP - Journal Track: Natural Language Processing (203)

Chair: Mausam
• #1313
News Across Languages - Cross-Lingual Document Similarity and Event Tracking (Extended Abstract)
Jan Rupnik, Andrej Muhič, Gregor Leban, Blaž Fortuna, Marko Grobelnik
Journal Track: Natural Language Processing

In today's world, we follow news which is distributed globally. Significant events are reported by different sources and in different languages. In this work, we address the problem of tracking events in a large multilingual stream. Within the recently developed Event Registry system, we examine two aspects of this problem: how to compare articles in different languages and how to link collections of articles in different languages which refer to the same event. Building on previous work, we show there are methods which scale well and can compute a meaningful similarity between articles from languages with little or no direct overlap in the training data. Using this capability, we then propose an approach to link clusters of articles across languages which represent the same event.

• #4211
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures (Extended Abstract)
Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
Journal Track: Natural Language Processing

Automatic image description generation is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the known approaches based on how they conceptualise this problem and provide a review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image-text datasets and the evaluation measures that have been developed to assess the quality of machine-generated descriptions. Finally, we explore future directions in the area of automatic image description.

• #4220
Text Rewriting Improves Semantic Role Labeling (Extended Abstract)
Kristian Woodsend, Mirella Lapata
Journal Track: Natural Language Processing

Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, often demanding linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.

• #4225
Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (Extended Abstract)
Rodrigo Agerri, German Rigau
Journal Track: Natural Language Processing

We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state-of-the-art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.

### Friday 25, 08:30 - 10:00: ML-DL1 - Deep Learning 1 (204)

Chair: Truyen Tran
• #1327
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim, Min-Oh Heo, Seong-Ho Choi, Byoung-Tak Zhang
Deep Learning 1

Question-answering (QA) on video contents is a significant challenge for achieving human-level intelligence as it involves both vision and language in real-world settings. Here we demonstrate the possibility of an AI agent performing video story QA by learning from a large number of cartoon videos. We develop a video-story learning model, i.e. Deep Embedded Memory Networks (DEMN), to reconstruct stories from a joint scene-dialogue video stream using a latent embedding space of observed data. The video stories are stored in a long-term memory component. For a given question, an LSTM-based attention model uses the long-term memory to recall the best question-story-answer triplet by focusing on specific words containing key information. We trained the DEMN on a novel QA dataset of children’s cartoon video series, Pororo. The dataset contains 16,066 scene-dialogue pairs of 20.5-hour videos, 27,328 fine-grained sentences for scene description, and 8,913 story-related QA pairs. Our experimental results show that the DEMN outperforms other QA models. This is mainly due to 1) the reconstruction of video stories in a scene-dialogue combined form that utilizes the latent embedding and 2) attention. DEMN also achieved state-of-the-art results on the MovieQA benchmark.

• #1597
Learning Multi-level Region Consistency with Dense Multi-label Networks for Semantic Segmentation
Tong Shen, Guosheng Lin, Chunhua Shen, Ian Reid
Deep Learning 1

Semantic image segmentation is a fundamental task in image understanding. Per-pixel semantic labelling of an image benefits greatly from the ability to consider region consistency both locally and globally. However, many Fully Convolutional Network based methods do not impose such consistency, which may give rise to noisy and implausible predictions. We address this issue by proposing a dense multi-label network module that is able to encourage region consistency at different levels. This simple but effective module can be easily integrated into any semantic segmentation system. With comprehensive experiments, we show that the dense multi-label module can successfully remove the implausible labels and clear the confusion so as to boost the performance of semantic segmentation systems.

• #1906
Towards Understanding the Invertibility of Convolutional Neural Networks
Anna Gilbert, Yi Zhang, Kibok Lee, Yuting Zhang, Honglak Lee
Deep Learning 1

Several recent works have empirically observed that Convolutional Neural Nets (CNNs) are (approximately) invertible. To understand this approximate invertibility phenomenon and how to leverage it more effectively, we focus on a theoretical explanation and develop a mathematical model of sparse signal recovery that is consistent with CNNs with random weights. We give an exact connection to a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs. We show empirically that several learned networks are consistent with our mathematical analysis and then demonstrate that with such a simple theoretical framework, we can obtain reasonable reconstruction results on real images. We also discuss gaps between our model assumptions and the CNN trained for classification in practical scenarios.

• #2217
Tag Disentangled Generative Adversarial Network for Object Image Re-rendering
Chaoyue Wang, Chaohui Wang, Chang Xu, Dacheng Tao
Deep Learning 1

In this paper, we propose a principled Tag Disentangled Generative Adversarial Network (TD-GAN) for re-rendering new images of the object of interest from a single image of it by specifying multiple scene properties (such as viewpoint, illumination, expression, etc.). The whole framework consists of a disentangling network, a generative network, a tag mapping net, and a discriminative network, which are trained jointly based on a given set of images that are completely/partially tagged (i.e., supervised/semi-supervised setting). Given an input image, the disentangling network extracts disentangled and interpretable representations, which are then used to generate images by the generative network. In order to boost the quality of disentangled representations, the tag mapping net is integrated to explore the consistency between the image and its tags. Furthermore, the discriminative network is introduced to implement the adversarial training strategy for generating more realistic images. Experiments on two challenging datasets demonstrate the state-of-the-art performance of the proposed framework in the problem of interest.

• #2245
Image Matching via Loopy RNN
Donghao Luo, Bingbing Ni, Yichao Yan, Xiaokang Yang
Deep Learning 1

Most existing matching algorithms are one-off algorithms, i.e., they usually measure the distance between the two image feature representation vectors only once. In contrast, the human vision system achieves this task, i.e., image matching, by recursively looking at specific/related parts of both images and then making the final judgement. Towards this end, we propose a novel loopy recurrent neural network (Loopy RNN), which is capable of aggregating relationship information of two input images in a progressive/iterative manner and outputting the consolidated matching score in the final iteration. A Loopy RNN has two unique features. First, built on conventional long short-term memory (LSTM) nodes, it links the output gate of the tail node to the input gate of the head node, thus providing the symmetry property required for matching. Second, a monotonous loss designed for the proposed network guarantees increasing confidence during the recursive matching process. Extensive experiments on several image matching benchmarks demonstrate the great potential of the proposed method.

• #2470
Dual Inference for Machine Learning
Yingce Xia, Jiang Bian, Tao Qin, Nenghai Yu, Tie-Yan Liu
Deep Learning 1

Recent years have witnessed the rapid development of machine learning in solving artificial intelligence (AI) tasks in many domains, including translation, speech, image, etc. Within these domains, AI tasks are usually not independent. As a specific type of relationship, structural duality does exist between many pairs of AI tasks, such as translation from one language to another vs. its opposite direction, speech recognition vs. speech synthetization, image classification vs. image generation, etc. The importance of such duality has been magnified by some recent studies, which revealed that it can boost the learning of two tasks in the dual form. However, there has been little investigation on how to leverage this invaluable relationship in the inference stage of AI tasks. In this paper, we propose a general framework of dual inference which can take advantage of both existing models from two dual tasks, without re-training, to conduct inference for one individual task. Empirical studies on three pairs of specific dual tasks, including machine translation, sentiment analysis, and image processing, have illustrated that dual inference can significantly improve the performance of each individual task.

### Friday 25, 08:30 - 10:00: ML-NNV - Neural Networks and Vision (210)

Chair: Arnau Ramisa
• #2065
CFNN: Correlation Filter Neural Network for Visual Object Tracking
Yang Li, Zhan Xu, Jianke Zhu
Neural Networks and Vision

Although convolutional neural networks (CNNs) have shown promising capacity in many computer vision tasks, applying them to visual tracking is still far from solved. Existing methods either employ a large external dataset to undertake exhaustive pre-training or suffer from less satisfactory results in terms of accuracy and robustness. To track a single target in a wide range of videos, we present a novel Correlation Filter Neural Network architecture, as well as a complete visual tracking pipeline. The proposed approach is a special case of CNN, whose initialization does not need any pre-training on the external dataset. The initialization of the network enjoys the merits of cyclic sampling to achieve appealing discriminative capability, while the network updating scheme adopts advantages from back-propagation in order to capture new appearance variations. The tracking pipeline integrates both aspects well by making them complementary to each other. We validate our tracker on the OTB-2013 benchmark. The proposed tracker obtains promising results compared to most existing representative trackers.

• #2071
WALKING WALKing walking: Action Recognition from Action Echoes
Qianli Ma, Lifeng Shen, Enhuan Chen, Shuai Tian, Jiabing Wang, Garrison W. Cottrell
Neural Networks and Vision

Recognizing human actions represented by 3D trajectories of skeleton joints is a challenging machine learning task. In this paper, the 3D skeleton sequences are regarded as multivariate time series, and their dynamics and multiscale features are efficiently learned from action echo states. Specifically, first the skeleton data from the limbs and trunk are projected into five high dimensional nonlinear spaces that are randomly generated by five dynamic, training-free recurrent networks, i.e., the reservoirs of echo state networks (ESNs). In this way, the history of the time series is represented as nonlinear echo states of actions. We then use a single multiscale convolutional layer to extract multiscale features from the echo states, and maintain multiscale temporal invariance by a max-over-time pooling layer. We propose two multi-step fusion strategies to integrate the spatial information over the five parts of the human physical structure. Finally, we learn the label distribution using softmax. With one training-free recurrent layer and only one layer of convolution, our Convolutional Echo State Network (ConvESN) is a very efficient end-to-end model, and achieves state-of-the-art performance on four skeleton benchmark data sets.
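
To make the "training-free recurrent layer" idea concrete, the following sketch builds one random reservoir, drives it with a multivariate time series, and applies max-over-time pooling directly to the echo states. It omits the multiscale convolutional layer, the five body-part reservoirs, and the fusion strategies described in the abstract. The helper names (`make_reservoir`, `echo_states`, `max_over_time`), the `tanh` update, and the row-sum rescaling used to keep the reservoir contractive are assumptions for illustration, not details from the paper.

```python
import math
import random


def make_reservoir(n_inputs, n_units, seed=0, scale=0.9):
    """Random, untrained input and recurrent weights (hypothetical helper)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_units)]
    w = [[rng.uniform(-1, 1) for _ in range(n_units)] for _ in range(n_units)]
    # Rescale so every row's absolute sum is at most `scale` < 1. This is a simple
    # sufficient condition for the influence of the initial state to fade over time.
    max_row = max(sum(abs(v) for v in row) for row in w)
    w = [[v * scale / max_row for v in row] for row in w]
    return w_in, w


def echo_states(series, w_in, w, initial=None):
    """Run the reservoir over the series and return one state vector per time step."""
    n_units = len(w)
    state = list(initial) if initial is not None else [0.0] * n_units
    states = []
    for u in series:
        state = [
            math.tanh(
                sum(w_in[i][j] * u[j] for j in range(len(u)))
                + sum(w[i][k] * state[k] for k in range(n_units))
            )
            for i in range(n_units)
        ]
        states.append(state)
    return states


def max_over_time(states):
    """Pool each reservoir unit over the time axis, giving a fixed-length feature vector."""
    return [max(column) for column in zip(*states)]


if __name__ == "__main__":
    series = [[math.sin(0.1 * t), math.cos(0.1 * t), 0.5] for t in range(60)]
    w_in, w = make_reservoir(n_inputs=3, n_units=8)
    print(max_over_time(echo_states(series, w_in, w)))
```

Because the reservoir weights are never trained, only the layers placed on top of the echo states would need gradient updates.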

• #2119
Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions
Rui Zhang, Sheng Tang, Min Lin, Jintao Li, Shuicheng Yan
Neural Networks and Vision

Most existing scene parsing methods suffer from the serious problems of both inconsistent parsing results and object boundary shift. To tackle these problems, we first propose an iterative Global-residual Refinement Network (GRN) that exploits global contextual information to predict the parsing residuals and iteratively smooth the inconsistent parsing labels. Furthermore, we propose a Local-boundary Refinement Network (LRN) to learn the position-adaptive propagation coefficients so that local contextual information from neighbors can be optimally captured for refining object boundaries. Finally, we cascade the proposed two refinement networks after a fully residual convolutional neural network within a uniform framework. Extensive experiments on the ADE20K and Cityscapes datasets demonstrate the effectiveness of the two refinement methods for refining scene parsing predictions.

• #2864
Group-wise Deep Co-saliency Detection
Lina Wei, Shanshan Zhao, Omar El Farouk Bourahla, Xi Li, Fei Wu
Neural Networks and Vision

In this paper, we propose an end-to-end group-wise deep co-saliency detection approach to address the co-salient object discovery problem based on the fully convolutional network (FCN) with group input and group output. The proposed approach captures the group-wise interaction information for group images by learning a semantics-aware image representation based on a convolutional neural network, which adaptively learns the group-wise features for co-saliency detection. Furthermore, the proposed approach discovers the collaborative and interactive relationships between group-wise feature representation and single-image individual feature representation, and models this in a collaborative learning framework. Finally, we set up a unified end-to-end deep learning scheme to jointly optimize the process of group-wise feature representation learning and the collaborative learning, leading to more reliable and robust co-saliency detection results. Experimental results demonstrate the effectiveness of our approach in comparison with the state-of-the-art approaches.

• #2979
A Sequence Labeling Convolutional Network and Its Application to Handwritten String Recognition
Qingqing Wang, Yue Lu
Neural Networks and Vision

Handwritten string recognition has long struggled with connected patterns. Segmentation-free and over-segmentation frameworks are commonly applied to deal with this issue. In recent years, RNNs combined with CTC have dominated segmentation-free handwritten string recognition, while CNNs are employed only as single character recognizers in the over-segmentation framework. The main challenges for a CNN to directly recognize handwritten strings are the appropriate processing of arbitrary input string length, which implies arbitrary input image size, and reasonable design of the output layer. In this paper, we propose a sequence labeling convolutional network for the recognition of handwritten strings, in particular, the connected patterns. We design the structure of the network to predict how many characters are present in the input images and what exactly they are at every position. Spatial pyramid pooling (SPP) is utilized with a new implementation to handle arbitrary string length. Moreover, we propose a more flexible pooling strategy called FSPP to better adapt the network to the straightforward recognition of long strings. Experiments conducted on handwritten digital strings from two benchmark datasets and our own cell-phone number dataset demonstrate the superiority of the proposed network.

• #3831
Learning to Read Irregular Text with Attention Mechanisms
Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, C. Lee Giles
Neural Networks and Vision

We present a robust end-to-end neural-based model to attentively recognize text in natural images. Particularly, we focus on accurately identifying irregular (perspectively distorted or curved) text, which has not been well addressed in the previous literature. Previous research on text reading often works with regular (horizontal and frontal) text and does not adequately generalize to processing text with perspective distortion or curving effects. Our work proposes to overcome this difficulty by introducing two learning components: (1) an auxiliary dense character detection task that helps to learn text-specific visual patterns, (2) an alignment loss that provides guidance to the training of an attention model. We show with experiments that these two components are crucial for achieving fast convergence and high classification accuracy for irregular text recognition. Our model outperforms previous work on two irregular-text datasets: SVT-Perspective and CUTE80, and is also highly competitive on several regular-text datasets containing primarily horizontal and frontal text.

### Friday 25, 08:30 - 10:00: ML-UL1 - Unsupervised Learning 1 (211)

Chair: Kathryn Merrick
• #1219
Radar: Residual Analysis for Anomaly Detection in Attributed Networks
Jundong Li, Harsh Dani, Xia Hu, Huan Liu
Unsupervised Learning 1

Attributed networks are pervasive in different domains, ranging from social networks and gene regulatory networks to financial transaction networks. This kind of rich network representation presents challenges for anomaly detection due to the heterogeneity of two data representations. A vast majority of existing algorithms assume certain properties of anomalies are given a priori. Since various types of anomalies in real-world attributed networks co-exist, the assumption that a priori knowledge regarding anomalies is available does not hold. In this paper, we investigate the problem of anomaly detection in attributed networks generally from a residual analysis perspective, which has been shown to be effective in traditional anomaly detection problems. However, it is a non-trivial task in attributed networks as interactions among instances complicate the residual modeling process. Methodologically, we propose a learning framework to characterize the residuals of attribute information and its coherence with network information for anomaly detection. By learning and analyzing the residuals, we detect anomalies whose behaviors are singularly different from the majority. Experiments on real datasets show the effectiveness and generality of the proposed framework.

• #2009
Online Robust Low-Rank Tensor Learning
Ping Li, Jiashi Feng, Xiaojie Jin, Luming Zhang, Xianghua Xu, Shuicheng Yan
Unsupervised Learning 1

The rapid increase of multidimensional data (a.k.a. tensors) like videos brings new challenges for low-rank data modeling approaches, such as dynamic data size, complex high-order relations, and multiplicity of low-rank structures. Resolving these challenges requires a new tensor analysis method that can perform tensor data analysis online, which, however, is still absent. In this paper, we propose an Online Robust Low-rank Tensor Modeling (ORLTM) approach to address these challenges. ORLTM dynamically explores the high-order correlations across all tensor modes for low-rank structure modeling. To analyze mixture data from multiple subspaces, ORLTM introduces a new dictionary learning component. ORLTM processes data in a streaming fashion and thus has a low memory cost that is independent of data size. This makes ORLTM well suited to processing large-scale tensor data. Empirical studies have validated the effectiveness of the proposed method on both synthetic data and one practical task, i.e., video background subtraction. In addition, we provide theoretical analysis regarding computational complexity and memory cost, demonstrating the efficiency of ORLTM rigorously.

• #1280
From Ensemble Clustering to Multi-View Clustering
Zhiqiang Tao, Hongfu Liu, Sheng Li, Zhengming Ding, Yun Fu
Unsupervised Learning 1

Multi-View Clustering (MVC) aims to find the cluster structure shared by multiple views of a particular dataset. Existing MVC methods mainly integrate the raw data from different views, while ignoring the high-level information. Thus, their performance may degrade due to the conflict between heterogeneous features and the noise existing in each individual view. To overcome this problem, we propose a novel Multi-View Ensemble Clustering (MVEC) framework to solve MVC in an Ensemble Clustering (EC) way, which generates Basic Partitions (BPs) for each view individually and seeks a consensus partition among all the BPs. By this means, we naturally leverage the complementary information of multi-view data in the same partition space. Instead of directly fusing BPs, we employ the low-rank and sparse decomposition to explicitly consider the connection between different views and detect the noise in each view. Moreover, the spectral ensemble clustering task is also incorporated into our framework with a carefully designed constraint, making MVEC a unified optimization framework to achieve the final consensus partition. Experimental results on six real-world datasets show the efficacy of our approach compared with both MVC and EC methods.

• #1173
Angle Principal Component Analysis
Qianqian Wang, Quanxue Gao, Xinbo Gao, Feiping Nie
Unsupervised Learning 1

Recently, many ℓ1-norm based PCA methods have been developed for dimensionality reduction, but they do not explicitly consider the reconstruction error. Moreover, they do not take into account the relationship between reconstruction error and variance of projected data. This reduces the robustness of algorithms. To handle this problem, a novel formulation for PCA, namely angle PCA, is proposed. Angle PCA employs ℓ2-norm to measure reconstruction error and variance of projected data and maximizes the summation of ratio between variance and reconstruction error of each data. Angle PCA not only is robust to outliers but also retains PCA’s desirable property such as rotational invariance. To solve Angle PCA, we propose an iterative algorithm, which has closed-form solution in ea