Tuesday 22 08:00-09:00 Opening: Opening Remarks
Tuesday 22 09:00-10:00 Keynote: Provably Beneficial AI
Stuart Russell
Tuesday 22 10:30-12:00 ROB-VP: Vision and Perception

Learning to Hallucinate Face Images via Component Generation and Enhancement
Yibing Song, Jiawei Zhang, Shengfeng He, Linchao Bao, Qingxiong Yang
We propose a two-stage method for face hallucination. First, we generate facial components of the input image using CNNs. These components represent the basic facial structures. Second, we synthesize fine-grained facial structures from high-resolution training images. The details of these structures are transferred into the facial components for enhancement. In this way, we generate facial components that approximate the ground-truth global appearance in the first stage and enhance them by recovering details in the second stage. The experiments demonstrate that our method performs favorably against state-of-the-art methods.

Single-Image 3D Scene Parsing Using Geometric Commonsense
Chengcheng Yu, Xiaobai Liu, Song-Chun Zhu
This paper presents a unified grammatical framework capable of reconstructing a variety of scene types (e.g., urban, campus, county) from a single input image. The key idea of our approach is a novel commonsense reasoning framework that mainly exploits two types of prior knowledge: (i) prior distributions over a single dimension of objects, e.g., that the length of a sedan is about 4.5 meters; (ii) pairwise relationships between the dimensions of scene entities, e.g., that a sedan is shorter than a bus. This unary and relative geometric knowledge, once extracted, is fairly stable across different types of natural scenes and is informative for enhancing the understanding of various scenes in both 2D images and the 3D world. Methodologically, we construct a hierarchical graph as a unified representation of the input image and the related geometric knowledge. We formulate these objectives in a unified probabilistic formula and develop a data-driven Monte Carlo method to infer the optimal solution with both bottom-up and top-down computations. Results and comparisons on public datasets show that our method clearly outperforms the alternative methods.

Image Gradient-based Joint Direct Visual Odometry for Stereo Camera
Jianke Zhu
Visual odometry is an important research problem in computer vision and robotics. In general, feature-based visual odometry methods rely heavily on accurate correspondences between local salient points, while direct approaches can make full use of the whole image and perform dense 3D reconstruction simultaneously. However, direct visual odometry usually suffers from getting stuck in local optima, especially under large displacements, which may lead to inferior results. To tackle this critical problem, we propose a novel scheme for stereo odometry that improves convergence and yields more accurate poses. The key to our approach is a dual Jacobian optimization fused into a multi-scale pyramid scheme. Moreover, we introduce a gradient-based feature representation, which has the merit of being robust to illumination changes. Furthermore, a joint direct odometry approach is proposed to incorporate information from both the last frame and previous keyframes. We have conducted an experimental evaluation on the challenging KITTI odometry benchmark, whose promising results show that the proposed algorithm is very effective for stereo visual odometry.
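The illumination argument can be illustrated in its simplest form: a finite-difference image gradient is unchanged when a constant brightness offset is added to every pixel. The sketch below assumes a grayscale image stored as a list of rows and uses plain forward differences; it is not the authors' feature representation, and it covers only additive (not multiplicative) illumination changes.

```python
def image_gradients(img):
    """Forward-difference gradients of a grayscale image given as a list of rows.

    Returns (gx, gy). The last column of gx and the last row of gy are set to 0
    because no forward neighbour exists there.
    """
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0 for x in range(w)]
          for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0 for x in range(w)]
          for y in range(h)]
    return gx, gy


img = [[10, 20, 40],
       [10, 25, 45],
       [12, 30, 50]]
brighter = [[v + 30 for v in row] for row in img]  # global additive brightness change

# The constant offset cancels in every difference, so the features match.
assert image_gradients(img) == image_gradients(brighter)
```

A multiplicative change (e.g. doubling every pixel) would scale these gradients, so in practice some normalization would also be needed; how the paper handles this is not stated in the abstract.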

Salient Object Detection with Semantic Priors
Tam V. Nguyen, Luoqi Liu
Salient object detection has increasingly become a popular topic in the cognitive and computational sciences, including computer vision and artificial intelligence research. In this paper, we propose integrating semantic priors into the salient object detection process. Our algorithm consists of three basic steps. First, the explicit saliency map is obtained from the semantic segmentation, refined by explicit saliency priors learned from the data. Next, the implicit saliency map is computed with a trained model that maps the implicit saliency priors embedded in regional features to saliency values. Finally, the explicit and implicit saliency maps are adaptively fused to form a pixel-accurate saliency map that uniformly covers the objects of interest. We further evaluate the proposed framework on two challenging datasets, namely ECSSD and HKU-IS. The extensive experimental results demonstrate that our method outperforms other state-of-the-art methods.

Large-scale Subspace Clustering by Fast Regression Coding
Jun Li, Handong Zhao, Zhiqiang Tao, Yun Fu
Large-Scale Subspace Clustering (LSSC) is an interesting and important problem in the big data era. However, most existing methods (e.g., sparse or low-rank subspace clustering) cannot be directly used to solve LSSC because they suffer from high time complexity, quadratic or cubic in n (the number of data points). To overcome this limitation, we propose Fast Regression Coding (FRC), which optimizes regression codes and simultaneously trains a nonlinear function to approximate the codes. Using FRC, we develop an efficient Regression Coding Clustering (RCC) framework to solve the LSSC problem. It consists of three steps: sampling, FRC, and clustering. RCC randomly samples a small number of data points, quickly calculates the codes of all data points using the nonlinear function learned by FRC, and employs a large-scale spectral clustering method to cluster the codes. In addition, we provide a theoretical guarantee that the nonlinear function has a first-order approximation ability and a group effect. The theorem shows that the codes can easily be used to construct a dividable similarity graph. Compared with state-of-the-art LSSC methods, our model achieves better clustering results on large-scale datasets.

Projective Low-rank Subspace Clustering via Learning Deep Encoder
Jun Li, Liu Hongfu, Handong Zhao, Yun Fu
Low-rank subspace clustering (LRSC) has been considered the state-of-the-art method on small datasets. LRSC constructs a desired similarity graph by low-rank representation (LRR) and employs spectral clustering to segment the data samples. However, effectively applying LRSC to clustering big data is challenging because both LRR and spectral clustering suffer from high computational cost. To address this challenge, we create a projective low-rank subspace clustering (PLrSC) scheme for large-scale clustering problems. First, a small dataset is randomly sampled from the big dataset. Second, our proposed predictive low-rank decomposition (PLD) is applied to train a deep encoder on the small dataset, and the deep encoder is used to quickly compute the low-rank representations of all data samples. Third, fast spectral clustering is employed to segment the representations. As a nontrivial contribution, we theoretically prove that the deep encoder can universally approximate the exact (or bounded) recovery of the row space. Experiments verify that our scheme outperforms the related methods on large-scale datasets in a small amount of time. We achieve a state-of-the-art clustering accuracy of 95.8% on MNIST using scattering convolution features.
Tuesday 22 10:30-12:00 NLP-NLS: Natural Language Semantics

Modeling Physicians' Utterances to Explore Diagnostic Decision-making
Xuan Guo, Rui Li, Qi Yu, Anne Haake
Diagnostic error prevention is a long-established but specialized topic in clinical and psychological research. In this paper, we contribute to the field by exploring diagnostic decision-making via modeling physicians' utterances of medical concepts during image-based diagnoses. We conduct experiments to collect verbal narratives from dermatologists while they examine and describe dermatology images toward diagnoses. We propose a hierarchical probabilistic framework to learn domain-specific patterns from the medical concepts in these narratives. The discovered patterns match the diagnostic units of thought identified by domain experts. These meaningful patterns uncover physicians' diagnostic decision-making processes while they parse the image content. Our evaluation shows that these patterns provide key information for classifying narratives by diagnostic correctness level.

Understanding and Exploiting Language Diversity
Fausto Giunchiglia, Khuyagbaatar Batsuren, Gabor Bella
The main goal of this paper is to describe a general approach to the problem of understanding linguistic phenomena, as they appear in lexical semantics, through the analysis of large-scale resources, while exploiting these results to improve the quality of the resources themselves. The main contributions are: the approach itself; a formal quantitative measure of language diversity; a set of formal quantitative measures of resource incompleteness; and a large-scale resource, called the Universal Knowledge Core (UKC), built following the methodology proposed. As a concrete example of an application, we provide an algorithm for distinguishing polysemes from homonyms, as stored in the UKC.

Entity Suggestion with Conceptual Explanation
Yi Zhang, Yanghua Xiao, Seungwon Hwang, Wei Wang, Haixun Wang, X. Sean Wang
Entity Suggestion with Conceptual Explanation (ESC) refers to a type of entity acquisition query in which a user provides a set of example entities as the query and obtains in return not only related entities but also the concepts that best explain the query and the result. ESC is useful in many applications, such as related-entity recommendation and query expansion. Many example-based entity suggestion solutions are available in the existing literature. However, they are generally not aware of the concepts of the query entities and thus cannot be used for conceptual explanation. In this paper, we propose two probabilistic entity suggestion models and their computation solutions. Our models and solutions take full advantage of large-scale taxonomies, which consist of isA relations between entities and concepts. With our models and solutions, we can not only find the best entities to suggest but also derive the best concepts to explain the suggestion. Extensive evaluations on real datasets justify the accuracy of our models and the efficiency of our solutions.

Learning Sentence Representation with Guidance of Human Attention
Shaonan Wang, Jiajun Zhang, Chengqing Zong
Recently, much progress has been made in learning general-purpose sentence representations that can be used across domains. However, most existing models treat each word in a sentence equally. In contrast, extensive studies have shown that humans read sentences efficiently by making a sequence of fixations and saccades. This motivates us to improve sentence representations by assigning different weights to the vectors of the component words, which can be treated as an attention mechanism on single sentences. To that end, we propose two novel attention models in which the attention weights are derived from significant predictors of human reading time, i.e., surprisal, POS tags, and CCG supertags. Extensive experiments demonstrate that the proposed methods significantly improve upon state-of-the-art sentence representation models.
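The idea of weighting word vectors can be sketched as a softmax-weighted average, where each word's score stands in for a reading-time predictor such as surprisal. The scores and vectors below are hypothetical toy values, and the softmax weighting is an assumption for illustration; the paper's two attention models may combine the predictors differently.

```python
import math


def attention_sentence_vector(word_vectors, scores):
    """Return (sentence_vector, weights): a softmax(scores)-weighted average of word vectors."""
    if len(word_vectors) != len(scores) or not scores:
        raise ValueError("need one score per word vector")
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(word_vectors[0])
    sentence = [sum(w * vec[d] for w, vec in zip(weights, word_vectors))
                for d in range(dim)]
    return sentence, weights


# Hypothetical 2-d embeddings for a three-word sentence and per-word scores
# (e.g. surprisal); the more surprising middle word receives a larger weight.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence, weights = attention_sentence_vector(vectors, [0.5, 2.0, 0.5])
```

With equal scores the weights are uniform and the result reduces to the plain average of word vectors, i.e. the "treat each word equally" baseline the abstract contrasts against.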

Dynamic Compositional Neural Networks over Tree Structure
Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Tree-structured neural networks have proven effective in learning semantic representations by exploiting syntactic information. Despite their success, most existing models suffer from an underfitting problem: they recursively use the same shared compositional function throughout the whole compositional process and lack expressive power because they cannot capture the richness of compositionality. In this paper, we address this issue by introducing dynamic compositional neural networks over tree structure (DC-TreeNN), in which the compositional function is dynamically generated by a meta network. The role of the meta network is to capture the meta-knowledge across the different compositional rules and formulate them. Experimental results on two typical tasks show the effectiveness of the proposed models.

Lexical Sememe Prediction via Word Embeddings and Matrix Factorization
Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, Maosong Sun
Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words to form linguistic knowledge bases. However, manual construction is time-consuming and labor-intensive, with significant annotation inconsistency and noise. In this paper, we are the first to explore automatically predicting lexical sememes based on the semantic meanings of words encoded by word embeddings. Moreover, we apply matrix factorization to learn semantic relations between sememes and words. In experiments, we use the real-world sememe knowledge base HowNet for training and evaluation, and the results reveal the effectiveness of our method for lexical sememe prediction. Our method will be of great use for annotation verification of existing noisy sememe knowledge bases and annotation suggestion for new words and phrases.
Tuesday 22 10:30-12:00 UAI-API1: Approximate Probabilistic Inference 1

Nonlinear Maximum Margin Multi-View Learning with Adaptive Kernel
Jia He, Changying Du, Changde Du, Fuzhen Zhuang, Qing He, Guoping Long
Existing multi-view learning methods based on kernel functions either require the user to select and tune a single predefined kernel or have to compute and store many Gram matrices to perform multiple kernel learning. Apart from the huge consumption of manpower, computation, and memory resources, most of these models seek point estimates of their parameters and are prone to overfitting on small training data. This paper presents an adaptive-kernel nonlinear max-margin multi-view learning model under the Bayesian framework. Specifically, we regularize the posterior of an efficient multi-view latent variable model by explicitly mapping the latent representations extracted from multiple data views to a random Fourier feature space where max-margin classification constraints are imposed. Assuming these random features are drawn from Dirichlet process Gaussian mixtures, we can adaptively learn shift-invariant kernels from data according to Bochner's theorem. For inference, we employ the data augmentation idea for the hinge loss and design an efficient gradient-based MCMC sampler in the augmented space. Because it does not need to compute the Gram matrix, our algorithm scales linearly with the size of the training set. Extensive experiments on real-world datasets demonstrate that our method has superior performance.
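Bochner's theorem is what makes a random Fourier feature space work: a shift-invariant kernel is the Fourier transform of a probability distribution, so sampling frequencies from that distribution gives explicit features whose inner products approximate the kernel, with no Gram matrix needed. The sketch below fixes the spectral distribution to a Gaussian, which recovers the RBF kernel; the paper instead learns that distribution as a Dirichlet process Gaussian mixture, so treat this as the fixed-kernel special case only. Function names and parameter values are illustrative.

```python
import math
import random


def make_rff(dim, num_features, sigma=1.0, seed=0):
    """Random Fourier features z(x) with z(x).z(y) ~= exp(-||x - y||^2 / (2 sigma^2)).

    Frequencies w ~ N(0, sigma^-2 I) (the RBF kernel's spectral density),
    phases b ~ Uniform[0, 2 pi), and z_k(x) = sqrt(2 / D) cos(w_k . x + b_k).
    """
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return features


def rbf(x, y, sigma=1.0):
    """Exact RBF kernel, for comparison."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


phi = make_rff(dim=2, num_features=5000, sigma=1.0, seed=42)
x, y = [0.3, -0.2], [0.1, 0.4]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
print(approx, rbf(x, y))  # the two values should be close
```

Because the features are explicit, a linear (here, max-margin) model can be trained on them at a cost linear in the number of training points, which is the scaling property the abstract refers to.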

Variational Mixtures of Gaussian Processes for Classification
Chen Luo, Shiliang Sun
Gaussian Processes (GPs) are powerful tools for machine learning which have been applied to both classification and regression. The mixture models of GPs were later proposed to further improve GPs for data modeling. However, these models are formulated for regression problems. In this work, we propose a new Mixture of Gaussian Processes for Classification (MGPC). Instead of the Gaussian likelihood for regression, MGPC employs the logistic function as likelihood to obtain the class probabilities, which is suitable for classification problems. The posterior distribution of latent variables is approximated through variational inference. The hyperparameters are optimized through the variational EM method and a greedy algorithm. Experiments are performed on multiple real-world datasets which show improvements over five widely used methods on predictive performance. The results also indicate that for classification MGPC is significantly better than the regression model with mixtures of GPs, different from the existing consensus that their single model counterparts are comparable.

Order Statistics for Probabilistic Graphical Models
David Smith, Sara Rouhani, Vibhav Gogate
We consider the problem of computing r-th order statistics, namely finding an assignment having rank r in a probabilistic graphical model. We show that the problem is NP-hard even when the graphical model has no edges (zero-treewidth models) via a reduction from the partition problem. We use this reduction, specifically a pseudo-polynomial time algorithm for number partitioning, to yield a pseudo-polynomial time approximation algorithm for solving the r-th order statistics problem in zero-treewidth models. We then extend this algorithm to arbitrary graphical models by generalizing it to tree decompositions, and demonstrate via experimental evaluation on various datasets that our proposed algorithm is more accurate than sampling algorithms.
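To give intuition for why a number-partitioning style argument yields a pseudo-polynomial algorithm, the sketch below handles the zero-treewidth case exactly under one simplifying assumption: every log-weight is an integer. A dynamic program counts how many assignments reach each total score, scans totals from the best downward until rank r is reached, and then backtracks one assignment with that total. This is an exact DP for illustration, not the authors' approximation algorithm; real-valued weights would need something like rounding, which is not shown. The function name is hypothetical.

```python
from collections import Counter


def rth_order_statistic(scores, r):
    """Return (total, assignment) for an assignment of rank r (r = 1 is the best).

    scores[i][v] is the integer log-weight of variable i taking value v in an
    edgeless model, so an assignment's score is the sum of its chosen entries.
    Ties between equal totals are broken arbitrarily.
    """
    if r < 1:
        raise ValueError("r must be >= 1")
    # prefix[i] maps partial total -> number of assignments to variables 0..i-1
    prefix = [Counter({0: 1})]
    for var_scores in scores:
        nxt = Counter()
        for total, count in prefix[-1].items():
            for s in var_scores:
                nxt[total + s] += count
        prefix.append(nxt)
    # Walk the distribution of totals from the highest downward until rank r.
    seen = 0
    for total in sorted(prefix[-1], reverse=True):
        seen += prefix[-1][total]
        if seen >= r:
            break
    else:
        raise ValueError("r exceeds the number of assignments")
    # Backtrack one assignment that achieves this total.
    assignment, remaining = [], total
    for i in range(len(scores) - 1, -1, -1):
        for v, s in enumerate(scores[i]):
            if prefix[i][remaining - s] > 0:
                assignment.append(v)
                remaining -= s
                break
    assignment.reverse()
    return total, assignment


# Two binary variables; assignment totals are 1, 3, 3 and 5.
print(rth_order_statistic([[0, 2], [1, 3]], 1))  # (5, [1, 1])
```

The table of partial totals has at most (sum of the largest per-variable scores) distinct keys, so the running time is polynomial in the magnitude of the scores rather than in their bit length, which is exactly what "pseudo-polynomial" means.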

Dynamic Programming Bipartite Belief Propagation For Hyper Graph Matching
Zhen Zhang, Julian McAuley, Yong Li, Wei Wei, Yanning Zhang, Qinfeng Shi
Hyper graph matching problems have drawn attention recently due to their ability to embed higher-order relations between nodes. In this paper, we formulate hyper graph matching problems as constrained MAP inference problems in graphical models. Whereas previous discrete approaches introduce several global correspondence vectors, we introduce only one global correspondence vector but several local correspondence vectors. This allows us to decompose the problem into a (linear) bipartite matching problem and several belief propagation subproblems. Bipartite matching can be solved by traditional approaches, while the belief propagation subproblem is further decomposed into two subproblems with optimal substructure. A newly proposed dynamic programming procedure is then used to solve the belief propagation subproblem. Experiments show that the proposed methods outperform state-of-the-art techniques for hyper graph matching.

Coarse-to-Fine Lifted MAP Inference in Computer Vision
Haroun Habeeb, Ankit Anand, Parag Singla, Mausam
There is a vast body of theoretical research on lifted inference in probabilistic graphical models (PGMs). However, few demonstrations exist where lifting is applied in conjunction with top-of-the-line applied algorithms. We pursue the applicability of lifted inference for computer vision (CV), with the insight that a globally optimal (MAP) labeling will likely assign the same label to two symmetric pixels. This allows us to lift the large class of algorithms that model a CV problem via PGM inference. We propose a generic template for coarse-to-fine (C2F) inference in CV, which progressively refines an initial coarsely lifted PGM for varying quality-time trade-offs. We demonstrate the performance of C2F inference by developing lifted versions of two near state-of-the-art CV algorithms for stereo vision and interactive image segmentation. We find that, compared with flat algorithms, the lifted versions have much better anytime performance, without any loss in final solution quality.

Efficient Inference for Untied MLNs
Somdeb Sarkhel, Deepak Venugopal, Nicholas Ruozzi, Vibhav Gogate
We address the problem of scaling up local-search or sampling-based inference in Markov logic networks (MLNs) that have large shared substructures but no (or few) tied weights. Such untied MLNs are ubiquitous in practical applications. However, they have very few symmetries, and as a result lifted inference algorithms, the dominant approach for scaling up inference, perform poorly on them. The key idea in our approach is to reduce the hard, time-consuming subtask in sampling algorithms, computing the sum of weights of features that satisfy a full assignment, to the problem of computing a set of partition functions of graphical models, each defined over the logical variables in a first-order formula. The importance of this reduction is that when the treewidth of all the graphical models is small, it yields an order-of-magnitude speedup. When the treewidth is large, we propose an over-symmetric approximation and experimentally demonstrate that it is both fast and accurate.
Tuesday 22 10:30-12:00 ML-DM1: Data Mining 1

Enhancing Campaign Design in Crowdfunding: A Product Supply Optimization Perspective
Qi Liu, Guifeng Wang, Hongke Zhao, Chuanren Liu, Tong Xu, Enhong Chen
Crowdfunding is an emerging Internet application in which creators design campaigns (projects) to collect funds from public investors. Usually, the creator's limited budget is manually divided among several perks (reward options), which should fit various market demands and in turn bring different monetary contributions to the campaign. Therefore, it is very challenging for each creator to design an effective campaign. To this end, in this paper, we aim to enhance the funding performance of newly proposed campaigns, with a focus on optimizing the product supply of perks. Specifically, given the expected budget and the perks of a campaign, we propose a novel solution that automatically recommends the optimal product supply for every perk, balancing the expected return of the campaign against its risk. Along this line, we formulate this as a constrained portfolio selection problem, where the risk of each campaign is measured by a multi-task learning method. Finally, experimental results on real-world crowdfunding data clearly show that the optimized product supply can significantly improve campaign performance, and that our multi-task learning method estimates the risk of each campaign more precisely.

Video Question Answering via Hierarchical Spatio-Temporal Attention Networks
Zhou Zhao, Qifan Yang, Deng Cai, Xiaofei He, Yueting Zhuang
Open-ended video question answering is a challenging problem in visual information retrieval, which aims to automatically generate a natural language answer from the referenced video content according to the question. However, existing visual question answering works focus only on static images, which may not be effectively applied to video question answering due to the temporal dynamics of video contents. In this paper, we consider the problem of open-ended video question answering from the viewpoint of a spatio-temporal attentional encoder-decoder learning framework. We propose a hierarchical spatio-temporal attention network for learning the joint representation of the dynamic video contents according to the given question. We then develop an encoder-decoder learning method with reasoning recurrent neural networks for open-ended video question answering. We also construct a large-scale video question answering dataset. Extensive experiments show the effectiveness of our method.

Link Prediction via Ranking Metric Dual-Level Attention Network Learning
Zhou Zhao, Ben Gao, Vincent Zheng, Deng Cai, Xiaofei He, Yueting Zhuang
Link prediction is a challenging problem for complex network analysis, arising in many disciplines such as social networks and telecommunication networks. Currently, many existing approaches estimate the proximity of the link endpoints from their features or the local neighborhood around them, and thus suffer from a localized view of network connections and insufficiently discriminative feature representations. In this paper, we consider the problem of link prediction from the viewpoint of learning a discriminative path-based proximity ranking metric embedding. We propose a novel ranking metric network learning framework that jointly exploits both node-level and path-level attentional proximity of the endpoints for link prediction. We then develop a path-based dual-level reasoning attentional learning method with recurrent neural networks for proximity ranking metric embedding. Extensive experiments on two large-scale datasets show that our method achieves better performance than other state-of-the-art solutions to the problem.

Deep Matrix Factorization Models for Recommender Systems
Hong-Jian Xue, Xin-Yu Dai, Jianbing Zhang, Shujian Huang, Jiajun Chen
Recommender systems usually make personalized recommendations using user-item interaction ratings, implicit feedback, and auxiliary information. Matrix factorization is the basic idea for predicting a personalized ranking over a set of items for an individual user from the similarities among users and items. In this paper, we propose a novel matrix factorization model with a neural network architecture. First, we construct a user-item matrix with explicit ratings and non-preference implicit feedback. With this matrix as the input, we present a deep structure learning architecture to learn a common low-dimensional space for the representations of users and items. Second, we design a new loss function based on binary cross-entropy, in which we consider both explicit ratings and implicit feedback for better optimization. The experimental results show the effectiveness of both our proposed model and the loss function. On several benchmark datasets, our model outperformed other state-of-the-art methods. We also conduct extensive experiments to evaluate the performance under different experimental settings.
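A loss that mixes explicit ratings and implicit feedback in a binary cross-entropy can be sketched by dividing each observed rating by the maximum rating to obtain a soft target in [0, 1], with sampled non-preference entries contributing a target of 0. This normalized form, the function name, and the clipping constant are assumptions made for illustration, not necessarily the exact loss used in the paper.

```python
import math


def normalized_bce(ratings, predictions, max_rating, eps=1e-6):
    """Binary cross-entropy with soft targets rating / max_rating.

    ratings: observed ratings, with 0 for sampled non-preference entries.
    predictions: model outputs, clipped to [eps, 1 - eps] so log() stays finite.
    """
    loss = 0.0
    for y, p in zip(ratings, predictions):
        t = y / max_rating                  # soft target in [0, 1]
        p = min(max(p, eps), 1.0 - eps)
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss


# A 5-star rating and a sampled non-preference entry on a 1-5 scale.
good = normalized_bce([5, 0], [0.99, 0.01], max_rating=5)
poor = normalized_bce([5, 0], [0.5, 0.5], max_rating=5)
```

With 0/1 feedback and max_rating = 1 this reduces to ordinary binary cross-entropy, while intermediate ratings (e.g. 3 of 5) pull the prediction toward 0.6 rather than treating the interaction as a hard positive.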

Image-embodied Knowledge Representation Learning
Ruobing Xie, Zhiyuan Liu, Huanbo Luan, Maosong Sun
Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.

Two dimensional Large Margin Nearest Neighbor for Matrix Classification
Junwei Han, Feiping Nie, Kong Song
Matrices are common forms of data encountered in a wide range of real applications. How to classify this kind of data is an important research topic. In this paper, we propose a novel distance metric learning method named two-dimensional large margin nearest neighbor (2DLMNN) for improving the performance of the k-nearest neighbor (KNN) classifier in matrix classification. In the proposed method, left and right projection matrices are employed to define a matrix-based Mahalanobis distance, which is used to construct an objective aimed at separating points in different classes by a large margin. The two projection matrices contain far fewer parameters than the single matrix of the vector-based counterpart, so our method reduces the risk of overfitting. We also introduce a framework for solving the proposed 2DLMNN. The convergence behavior, initialization, and parameter determination are also analyzed. Compared with vector-based methods, 2DLMNN performs better for matrix data classification. Promising experimental results on several datasets are provided to demonstrate the effectiveness of our method.
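A matrix-based Mahalanobis distance with left and right projections is commonly written in the two-sided form d(X, Y) = ||L^T (X - Y) R||_F^2. The sketch below assumes this form (the abstract does not spell out the exact formula) and uses plain lists so it stays dependency-free; the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matrix_distance(x, y, left, right):
    """Squared distance ||L^T (X - Y) R||_F^2 between same-shaped matrices X and Y.

    left is m x p and right is n x q for m x n inputs X and Y.
    """
    diff = [[xi - yi for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]
    projected = matmul(matmul(transpose(left), diff), right)
    return sum(v * v for row in projected for v in row)


identity = [[1, 0], [0, 1]]
x = [[1, 2], [3, 4]]
zero = [[0, 0], [0, 0]]
print(matrix_distance(x, zero, identity, identity))  # 30, the squared Frobenius norm
```

The parameter saving mentioned above is easy to check with hypothetical sizes: for 28×28 images projected to 10×10, L and R hold 2 × 28 × 10 = 560 entries, whereas a full Mahalanobis matrix on the 784-dimensional vectorized images holds 784 × 784 = 614,656.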
Tuesday 22 10:30-12:00 MAS-ATM: Agent Theories and Models

Plan Explanations as Model Reconciliation: Moving Beyond Explanation as Soliloquy
Tathagata Chakraborti, Sarath Sreedharan, Yu Zhang, Subbarao Kambhampati
When AI systems interact with humans in the loop, they are often called on to provide explanations for their plans and behavior. Past work on plan explanations primarily involved the AI system explaining the correctness of its plan and the rationale for its decision in terms of its own model. Such soliloquy is wholly inadequate in most realistic scenarios where the humans have domain and task models that differ significantly from that used by the AI system. We posit that the explanations are best studied in light of these differing models. In particular, we show how explanation can be seen as a "model reconciliation problem" (MRP), where the AI system in effect suggests changes to the human's model, so as to make its plan be optimal with respect to that changed human model. We will study the properties of such explanations, present algorithms for automatically computing them, and evaluate the performance of the algorithms.

Don't Bury your Head in Warnings: A Game-Theoretic Approach for Intelligent Allocation of Cybersecurity Alerts
Aaron Schlenker, Milind Tambe, Christopher Kiekintveld, Haifeng Xu, Mina Guirguis, Arunesh Sinha, Solomon Sonya, Noah Dunstatter, Darryl Balderas
In recent years, there have been a number of successful cyber attacks on enterprise networks by malicious actors, which have caused severe damage. These networks have Intrusion Detection and Prevention Systems in place to protect them, but such systems are notorious for producing a high volume of alerts. These alerts must be investigated by cyber analysts to determine whether they are an attack or benign. Unfortunately, orders of magnitude more alerts are generated than there are cyber analysts to investigate them. This trend is expected to continue, creating a need for tools that find optimal assignments of incoming alerts to analysts in the presence of a strategic adversary. We address this challenge with the following four contributions: (1) a cyber screening game (CSG) model for the cyber network protection domain, (2) an NP-hardness proof for computing the optimal strategy for the defender, (3) an algorithm that finds the optimal allocation of experts to alerts in the CSG, and (4) heuristic improvements for computing allocations in CSGs that accomplish significant scale-up, which we show empirically to closely match the solution quality of the optimal algorithm.

Pure Nash Equilibria in Online Fair Division
Martin Aleksandrov, Toby Walsh
We consider a fair division setting in which items arrive one by one and are allocated to agents via two existing mechanisms: LIKE and BALANCED LIKE. The LIKE mechanism is strategyproof, whereas the BALANCED LIKE mechanism is not. Whilst LIKE is strategyproof, we show that it is not group strategyproof. Indeed, our first main result is that no online mechanism is group strategyproof. We then focus on pure Nash equilibria of these two mechanisms. Our second main result is that computing a pure Nash equilibrium is tractable for LIKE and intractable for BALANCED LIKE. Our third main result is that there could be multiple such profiles and counting them is also intractable even when we restrict our attention to equilibria with a specific property (e.g. envy-freeness, Pareto efficiency).
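The two mechanisms can be sketched as follows, assuming the usual definitions from the online fair division literature: LIKE gives each arriving item to an agent chosen uniformly at random among those who like it, while BALANCED LIKE restricts that random choice to the likers who have received the fewest items so far. What happens to an item nobody likes is an assumption here (it is left unallocated), and the function names are hypothetical.

```python
import random


def like(likers, counts, rng):
    """LIKE: uniform random choice among the agents who like the item."""
    return rng.choice(sorted(likers)) if likers else None


def balanced_like(likers, counts, rng):
    """BALANCED LIKE: uniform random choice among likers holding the fewest items."""
    if not likers:
        return None
    fewest = min(counts[a] for a in likers)
    return rng.choice(sorted(a for a in likers if counts[a] == fewest))


def run_online(mechanism, items, agents, seed=0):
    """Allocate items one by one; items[t] is the set of agents who like item t."""
    rng = random.Random(seed)
    counts = {a: 0 for a in agents}
    allocation = []
    for likers in items:
        winner = mechanism(likers, counts, rng)
        if winner is not None:
            counts[winner] += 1
        allocation.append(winner)
    return allocation, counts


items = [{"a", "b"}, {"a", "b"}, {"a"}]
print(run_online(balanced_like, items, ["a", "b"], seed=1))
```

Under BALANCED LIKE the second item in this example always goes to whichever of a and b missed the first one; this dependence on past allocations is what gives agents room to misreport their likes, consistent with the abstract's statement that it is not strategyproof while LIKE is.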

Synchronisation Games on Hypergraphs
Sunil Simon, Dominik Wojtczak
We study a strategic game model on hypergraphs where players, modelled by nodes, try to coordinate or anti-coordinate their choices within certain groups of players, modelled by hyperedges. We show this model to be a strict generalisation of symmetric additively separable hedonic games to the hypergraph setting and that such games always have a pure Nash equilibrium, which can be computed in pseudo-polynomial time. Moreover, in the pure coordination setting, we show that a strong equilibrium exists and can be computed in polynomial time when the game possesses a certain acyclic structure.

The Off-Switch Game
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
It is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off. As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching the system off. This is a challenge because many formulations of rational agents create strong incentives for self-preservation. This is not caused by a built-in instinct, but because a rational agent will maximize expected utility and cannot achieve whatever objective it has been given if it is dead. Our goal is to study the incentives an agent has to allow itself to be switched off. We analyze a simple game between a human H and a robot R, where H can press R’s off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H’s actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.
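The inequality behind the key insight can be checked numerically. Suppose R's belief about the utility U of its action is a discrete distribution and H is perfectly rational, so H lets the action proceed exactly when U > 0. Then deferring to H is worth E[max(U, 0)], which is never less than max(E[U], 0), the better of acting immediately or switching off. The belief values below are hypothetical, and the sketch covers only the perfectly rational H case.

```python
def off_switch_values(belief):
    """Expected value to R of each option, given (utility, probability) pairs for U.

    act:    take the action immediately (bypassing H), worth E[U]
    off:    switch itself off, worth 0
    defer:  let a perfectly rational H decide; H blocks exactly when U < 0,
            so deferring is worth E[max(U, 0)]
    """
    act = sum(p * u for u, p in belief)
    defer = sum(p * max(u, 0.0) for u, p in belief)
    return {"act": act, "off": 0.0, "defer": defer}


uncertain = off_switch_values([(-1.0, 0.4), (2.0, 0.6)])
print(uncertain)  # defer (about 1.2) beats act (about 0.8) and off (0.0)

certain = off_switch_values([(2.0, 1.0)])
print(certain)    # with no uncertainty, deferring gains nothing over acting
```

The gap E[max(U, 0)] - max(E[U], 0) is strictly positive only when the belief puts weight on both signs of U, which matches the abstract's point that R must be uncertain about the utility to want to keep its off switch.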

Score Aggregation via Spectral Method
Mingyu Xiao, Yuqing Wang
The score aggregation problem is to find an aggregate scoring over all candidates given individual scores provided by different agents. This is a fundamental problem with a broad range of applications in social choice and many other areas. The simplest and most commonly used method is to sum up all scores of each candidate, which is called the sum-up method. In this paper, we give algebraic and geometric explanations for score aggregation and develop a spectral method for it. If we view the original scores as 'noise data', our method can find an 'optimal' aggregate scoring by minimizing the 'noise information'. We also suggest a signal-to-noise indicator to evaluate the validity of the aggregation or the consistency of the agents.
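The abstract does not spell out the spectral method, so the sketch below assumes one common reading rather than the authors' formulation: treat the agents-by-candidates score matrix as a rank-one signal plus noise, take the leading right singular vector of its SVD as the aggregate scoring, and report the share of squared singular values it captures as a hypothetical signal-to-noise style indicator. The function names and the example scores are made up for illustration.

```python
import numpy as np

def sum_up(scores):
    """Baseline sum-up aggregation: total score of each candidate (column)."""
    return scores.sum(axis=0)

def spectral_aggregate(scores):
    """Rank-one spectral aggregation (an assumed reading, not the paper's exact method).

    scores: array of shape (n_agents, n_candidates).
    Returns the leading right singular vector (the candidate direction of the
    best rank-one approximation) and the fraction of energy it explains.
    """
    _, s, vt = np.linalg.svd(scores, full_matrices=False)
    v = vt[0]
    if v.sum() < 0:            # singular vectors are only defined up to sign
        v = -v
    energy_share = s[0] ** 2 / np.sum(s ** 2)
    return v, energy_share

scores = np.array([[9.0, 7.0, 2.0],
                   [8.0, 6.0, 3.0],
                   [10.0, 6.5, 1.0]])
agg, share = spectral_aggregate(scores)
print("sum-up:", sum_up(scores))
print("spectral ranking:", np.argsort(-agg), "energy share: %.3f" % share)
```

An energy share close to 1 means the agents largely agree up to scale; a low share would flag inconsistent scoring under this assumed indicator.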
Tuesday 22 10:30  12:00 MLLGM  Learning Graphical Models

Deep Graphical Feature Learning for Face Sketch Synthesis
Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Li
The exemplar-based face sketch synthesis method generally contains two steps: neighbor selection and reconstruction weight representation. Most existing exemplar-based methods use pixel intensities as features, which lack representational power and robustness to lighting variations and cluttered backgrounds. We present a novel face sketch synthesis method that combines a generative exemplar-based method with discriminatively trained deep convolutional neural networks (dCNNs) via a deep graphical feature learning framework. Our method uses deep discriminative representations derived from dCNNs in both steps. Instead of using these representations directly, we boost their representation capability with a deep graphical feature learning framework. Finally, the optimal weights of the deep representations and the optimal reconstruction weights for face sketch synthesis can be obtained simultaneously. With the optimal reconstruction weights, we can synthesize high-quality sketches that are robust against lighting variations and cluttered backgrounds. Extensive experiments on public face sketch databases show that our method outperforms state-of-the-art methods in terms of both synthesis quality and recognition ability.

Locally Consistent Bayesian Network Scores for Multi-Relational Data
Oliver Schulte, Sajjad Gholami
An important task for relational learning is Bayesian network (BN) structure learning. A fundamental component of structure learning is a model selection score that measures how well a model fits a dataset. We describe a new method that upgrades a log-linear BN score, designed for single-table i.i.d. data, to multi-relational databases. Chickering and Meek showed that for i.i.d. data, standard BN scores are locally consistent, meaning that their maxima converge to an optimal model that represents the data-generating distribution and contains no redundant edges. Our main theorem establishes that if a model selection score is locally consistent for i.i.d. data, then our upgraded gain function is locally consistent for relational data as well. To our knowledge, this is the first consistency result for relational structure learning. A novel aspect of our approach is employing a gain function that compares two models: a current vs. an alternative BN structure. In contrast, previous approaches employed a score that is a function of a single model only. Empirical evaluation on six benchmark relational databases shows that our gain function is also practically useful: on realistic-size data sets, it selects informative BN structures with a better data fit than those selected by baseline single-model scores.

Deep-dense Conditional Random Fields for Object Co-segmentation
Zehuan Yuan, Tong Lu, Yirui Wu
We address the problem of object co-segmentation in images. Object co-segmentation aims to segment common objects in images and has promising applications in AI agents. We solve it by proposing a co-occurrence map, which measures how likely an image region is to belong to an object and also to appear in other images. The co-occurrence map of an image is calculated by combining two parts: objectness scores of image regions and similarity evidence from object proposals across images. We introduce a deep-dense conditional random field framework to infer co-occurrence maps. Both the similarity metric and the objectness measure are learned end-to-end in a single deep network. We evaluate our method on two benchmarks and achieve competitive performance.

Discriminative Bayesian Nonparametric Clustering
Vu Nguyen, Dinh Phung, Trung Le, Hung Bui
We propose a general framework for discriminative Bayesian nonparametric clustering to promote inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the labels predicted by a probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: discriminative Dirichlet process mixtures, and discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with traditional generative BNP models.

A Density-based Nonparametric Model for Online Event Discovery from the Social Media Data
Jinjin Guo, Zhiguo Gong
In this paper, we propose a novel online event discovery model, DP-density, to capture various events from social media data. The proposed model can flexibly accommodate the incremental arrival of social documents in an online manner by leveraging the Dirichlet process, and a density-based technique is exploited to deduce the temporal dynamics of events. The spatial patterns of events are also incorporated in the model by a mixture of Gaussians. To remove the bias caused by the streaming process of the documents, Sequential Monte Carlo is used for parameter inference. Our extensive experiments over two different real datasets show that the proposed model is capable of extracting interpretable events effectively in terms of perplexity and coherence.

Inverse Covariance Estimation with Structured Groups
Shaozhe Tao, Yifan Sun, Daniel Boley
Estimating the inverse covariance matrix of p variables from n observations is challenging when n is much less than p, since the sample covariance matrix is singular and cannot be inverted. A popular solution is to optimize for the L1-penalized estimator; however, this does not incorporate structural domain knowledge and can be expensive to optimize. We consider finding inverse covariance matrices with group structure, defined as potentially overlapping principal submatrices, determined from domain knowledge (e.g. categories or graph cliques). We propose a new estimator for this problem setting that can be derived efficiently via the conditional gradient method, leveraging chordal decomposition theory for scalability. Simulation results show significant improvement in sample complexity when the correct group structure is known. We also apply these estimators to 14,910 stock closing prices, with noticeable improvement when group sparsity is exploited.
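A quick NumPy check of the opening claim: with n < p, the (centered) sample covariance has rank at most n - 1, so it is singular and cannot be inverted. The sizes and random data below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                        # far fewer observations than variables
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)          # p x p sample covariance (mean-centered)
rank = np.linalg.matrix_rank(S)
print(S.shape, rank)                 # rank is at most n - 1, well below p
```

Any estimator of the inverse therefore needs extra structure, such as the L1 penalty or the group structure described above.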
Tuesday 22 10:30  12:00 MLAL  Active Learning

On Gleaning Knowledge from Multiple Domains for Active Learning
Zengmao Wang, Bo Du, Lefei Zhang, Liangpei Zhang, Ruimin Hu, Dacheng Tao
How can a doctor diagnose newly emerging diseases with little historical knowledge? Active learning is a promising way to address the problem by querying the most informative samples. Since the diagnosed cases for a new disease are very limited, gleaning knowledge from other domains (classical prescriptions) to prevent the bias of active learning would be vital for accurate diagnosis. In this paper, we propose a framework that gleans knowledge from multiple domains for active learning by querying the most uncertain and representative samples from the target domain and calculating importance weights for reweighting the source data, all in a single unified formulation. The weights are optimized by both a supervised classifier and distribution matching between the source domain and target domain with maximum mean discrepancy. In addition, a multiple-domain active learning method is designed based on the proposed framework as an example. The proposed method is verified on newsgroups and handwritten digit recognition tasks, where it outperforms state-of-the-art methods.
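The distribution-matching term relies on maximum mean discrepancy (MMD). As a sketch under stated assumptions (Gaussian kernel, a hand-picked bandwidth, hypothetical function names), the squared MMD between an importance-weighted source sample and the target sample is w'K_ss w - 2 w'K_st v + v'K_tt v, where v holds uniform target weights; with uniform source weights this is the standard biased empirical estimate.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for every row pair of A and B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth ** 2))

def weighted_mmd2(source, target, weights=None, bandwidth=1.0):
    """Squared MMD between a reweighted source sample and the target sample.

    weights: nonnegative source weights summing to 1 (uniform if None).
    """
    m, n = len(source), len(target)
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    v = np.full(n, 1.0 / n)
    return (w @ gaussian_kernel(source, source, bandwidth) @ w
            - 2 * w @ gaussian_kernel(source, target, bandwidth) @ v
            + v @ gaussian_kernel(target, target, bandwidth) @ v)

rng = np.random.default_rng(0)
tgt = rng.normal(0.0, 1.0, size=(200, 2))
near = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as the target
far = rng.normal(2.0, 1.0, size=(200, 2))    # shifted distribution
print(weighted_mmd2(near, tgt), weighted_mmd2(far, tgt))
```

In a reweighting scheme, the source weights would be chosen to shrink this quantity; here they are only an input.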

High Dimensional Bayesian Optimization using Dropout
Cheng Li, Sunil Gupta, Santu Rana, Vu Nguyen, Svetha Venkatesh, Alistair Shilton
Scaling Bayesian optimization to high dimensions is a challenging task, as the global optimization of a high-dimensional acquisition function can be expensive and often infeasible. Existing methods depend either on limited “active” variables or on the additive form of the objective function. We propose a new method for high-dimensional Bayesian optimization that uses a dropout strategy to optimize only a subset of variables at each iteration. We derive theoretical bounds for the regret and show how they can inform the derivation of our algorithm. We demonstrate the efficacy of our algorithms for optimization on two benchmark functions and two real-world applications: training cascade classifiers and optimizing alloy composition.
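The dropout idea can be sketched as a loop, assuming a toy objective and a placeholder proposal rule: at each iteration only d of the D coordinates are chosen at random and proposed afresh, while the remaining coordinates are copied from the best point found so far (one possible fill-in rule, assumed here). In real Bayesian optimization the proposal would maximize a GP acquisition function on the d-dimensional subspace; uniform sampling stands in for that step, so this shows the loop structure only. All names are hypothetical.

```python
import random

def dropout_step(best_x, bounds, d, propose, rng):
    """Build a candidate that changes only d randomly chosen coordinates.

    propose(indices, sub_bounds) returns new values for those coordinates; in
    BO it would optimize the acquisition function on that subspace.
    The other D - d coordinates are copied from best_x (assumed fill-in rule).
    """
    idx = rng.sample(range(len(best_x)), d)
    values = propose(idx, [bounds[i] for i in idx])
    x = list(best_x)
    for i, v in zip(idx, values):
        x[i] = v
    return x

def sphere(x):
    """Toy objective to minimize (stand-in for an expensive black box)."""
    return sum(v * v for v in x)

def run(D=20, d=3, iters=200, seed=0):
    rng = random.Random(seed)
    bounds = [(-5.0, 5.0)] * D
    best = [rng.uniform(lo, hi) for lo, hi in bounds]
    start = sphere(best)
    # Hypothetical proposal: uniform sample on the chosen subspace.
    propose = lambda idx, sub: [rng.uniform(lo, hi) for lo, hi in sub]
    for _ in range(iters):
        x = dropout_step(best, bounds, d, propose, rng)
        if sphere(x) < sphere(best):     # evaluate and keep the incumbent
            best = x
    return start, sphere(best)

print(run())
```

Because only d coordinates are proposed per step, the inner optimization stays d-dimensional however large D is, which is the motivation the abstract gives for dropout.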

Cost-Effective Active Learning from Diverse Labelers
Sheng-Jun Huang, Zhi-Hua Zhou, Jia-Lue Chen, Xin Mu
In traditional active learning, there is only one labeler, which always returns the ground truth of queried labels. However, in many applications, multiple labelers are available, offering labels of diverse quality at different costs. In this paper, we perform active selection on both instances and labelers, aiming to improve the classification model the most at the lowest cost. While the cost of a labeler is proportional to its overall labeling quality, we also observe that different labelers usually have diverse expertise, and thus labelers with a low overall quality may still provide accurate labels on some specific instances. Based on this observation, we propose a novel active selection criterion to evaluate the cost-effectiveness of instance-labeler pairs, which ensures that the selected instance is helpful for improving the classification model and that the selected labeler can provide an accurate label for the instance at a relatively low cost. Experiments on both UCI and real crowdsourcing data sets demonstrate the superiority of our proposed approach in selecting cost-effective queries.

Multi-instance multi-label active learning
Sheng-Jun Huang, Nengneng Gao, Songcan Chen
Multi-instance multi-label learning (MIML) has been successfully applied to many real-world applications. Along with its enhanced expressive power, the cost of labeling an MIML example increases significantly. Thus it becomes an important task to train an effective MIML model with as few labeled examples as possible. Active learning, which actively selects the most valuable data to query their labels, is a main approach to reducing labeling cost. Existing active learning methods have achieved great success in traditional learning tasks, but cannot be directly applied to MIML problems. In this paper, we propose an MIML active learning algorithm that exploits diversity and uncertainty in both the input and output space to query the most valuable information. The algorithm uses a novel query strategy designed specifically for MIML objects and acquires more precise information from the oracle without additional cost. Based on the queried information, the MIML model is then effectively trained by simultaneously optimizing the relative rank among instances and labels.

COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints
Toon Van Craenendonck, Sebastijan Dumancic, Hendrik Blockeel
Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a K that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance.
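The merging step can be sketched as a simple loop under simplifying assumptions: the K-means over-clustering is taken as given, each superinstance is represented by one point, an oracle built from known labels answers the pairwise queries, and each query uses the first superinstance of each cluster instead of the closest pair that COBRA uses. Must-link transitivity means superinstances already in the same cluster are never queried again; cannot-link entailment means a cannot-link between two clusters is remembered and carried over when either is merged. `cobra_merge` and the toy labels are hypothetical.

```python
def cobra_merge(reps, same_cluster):
    """Merge superinstances into clusters using pairwise must-link/cannot-link queries.

    reps: one representative point per superinstance (over-clustering not shown).
    same_cluster: oracle(p, q) -> True (must-link) or False (cannot-link).
    Returns (clusters as lists of superinstance indices, number of queries).
    """
    clusters = {i: [i] for i in range(len(reps))}   # stable cluster ids
    cannot = set()                                   # known cannot-linked (a, b), a < b
    queries = 0
    changed = True
    while changed:
        changed = False
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                if (a, b) in cannot:
                    continue                         # entailed cannot-link: no query
                queries += 1
                if same_cluster(reps[clusters[a][0]], reps[clusters[b][0]]):
                    clusters[a].extend(clusters.pop(b))      # must-link: merge b into a
                    rename = lambda c: a if c == b else c
                    # Cannot-links recorded for b now hold for the merged cluster a.
                    cannot = {tuple(sorted((rename(x), rename(y)))) for x, y in cannot}
                    changed = True
                    break
                cannot.add((a, b))
            if changed:
                break
    return list(clusters.values()), queries

labels = [0, 0, 1, 1, 0, 2]                  # hidden ground truth per superinstance
oracle = lambda p, q: labels[p] == labels[q]
clusters, n_queries = cobra_merge(list(range(6)), oracle)
print(sorted(sorted(c) for c in clusters), n_queries)
```

On this toy input, 7 queries recover the three clusters, against 15 if every pair of the six superinstances were queried.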

Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Space
Yanan Sui, Joel Burdick
We consider sequential decision making under uncertainty, namely optimization over a large decision space with noisy comparative feedback. This problem can be formulated as a K-armed dueling bandits problem, where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper studies the dueling bandits problem with a large number of dependent arms. Our problem is motivated by a clinical decision-making process in a large decision space. We propose an efficient algorithm, CorrDuel, which makes decisions that simultaneously deliver effective therapy and explore the decision space. Many sequential decision-making problems with large and structured decision spaces could benefit from our algorithm. After evaluating the fast convergence of CorrDuel analytically and in simulation experiments, we applied it in a live clinical trial of therapeutic spinal cord stimulation. It is the first algorithm applied to spinal cord injury treatment, and experimental results show the effectiveness and efficiency of our algorithm.
Tuesday 22 10:30  12:00 CSCS1  Constraint Satisfaction 1

On Neighborhood Singleton Consistencies
Kostas Stergiou, Anastasia Paparrizou
CP solvers predominantly use arc consistency (AC) as the default propagation method. Many stronger consistencies, such as triangle consistencies (e.g. RPC and maxRPC), exist, but their use is limited despite results showing that they outperform AC on many problems. This is due to the intricacies involved in incorporating them into solvers. On the other hand, singleton consistencies such as SAC can easily be incorporated into solvers, but they are too expensive. We seek a balance between the efficiency of triangle consistencies and the ease of implementation of singleton ones. Using the recently proposed variant of SAC called Neighborhood SAC as a basis, we propose a family of weaker singleton consistencies. We study them theoretically, comparing their pruning power to that of existing consistencies. We carry out a detailed experimental study using a very simple algorithm for their implementation. Results demonstrate that they outperform the existing propagation techniques, often by orders of magnitude, on a wide range of problems.
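To make the singleton idea concrete, here is a small sketch on binary constraints (our own simplified code, not the authors' algorithms): AC-3 as the propagator, and SAC as "assign x = a, enforce AC, delete a on wipeout". The `neighborhood` flag restricts propagation to x and its neighbors, which is our simplified reading of Neighborhood SAC and should be treated as an assumption. On three variables with pairwise "not equal" and two values each, AC prunes nothing, while SAC shows the problem is unsatisfiable.

```python
from collections import deque

def revise(domains, cons, x, y):
    """Delete values of x with no support in y; return True if any were deleted."""
    rel = cons[(x, y)]
    removed = False
    for a in list(domains[x]):
        if not any(rel(a, b) for b in domains[y]):
            domains[x].discard(a)
            removed = True
    return removed

def ac3(domains, cons, scope=None):
    """AC-3 on arcs whose endpoints lie in `scope` (all arcs if None).
    Modifies `domains` in place; returns False if some domain is wiped out."""
    arcs = [(x, y) for (x, y) in cons if scope is None or (x in scope and y in scope)]
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        if revise(domains, cons, x, y):
            if not domains[x]:
                return False
            # Re-examine arcs pointing at x, except the one just used.
            queue.extend((z, w) for (z, w) in arcs if w == x and z != y)
    return True

def singleton_ac(domains, cons, neighborhood=False):
    """Remove (x, a) if assigning x = a and enforcing AC causes a wipeout."""
    domains = {v: set(d) for v, d in domains.items()}
    changed = True
    while changed:
        changed = False
        for x in domains:
            scope = None
            if neighborhood:                 # x plus its constraint neighbors
                scope = {x} | {y for (z, y) in cons if z == x}
            for a in list(domains[x]):
                trial = {v: set(d) for v, d in domains.items()}
                trial[x] = {a}
                if not ac3(trial, cons, scope):
                    domains[x].discard(a)
                    changed = True
    return domains

neq = lambda a, b: a != b
cons = {}
for u, w in [("x", "y"), ("y", "z"), ("x", "z")]:
    cons[(u, w)] = neq
    cons[(w, u)] = neq

ac_domains = {v: {0, 1} for v in "xyz"}
print(ac3(ac_domains, cons), ac_domains)                     # AC keeps every value
print(singleton_ac({v: {0, 1} for v in "xyz"}, cons, neighborhood=True))
```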

Automatic Synthesis of Smart Table Constraints by Abstraction of Table Constraints
Baudouin Le Charlier, Minh Thanh KHONG, Christophe Lecoutre, Yves Deville
The smart table constraint is a powerful modeling tool that has been introduced recently. This constraint allows the user to compactly represent a number of well-known (global) constraints and, more generally, arbitrarily structured constraints, especially when disjunction is at stake. In many problems, some constraints are given in the basic and simple form of tables explicitly listing the allowed combinations of values. In this paper, we propose an algorithm to automatically convert any (ordinary) table into a compact smart table. Its theoretical time complexity is shown to be quadratic in the size of the input table. Experimental results demonstrate its compression efficiency on many constraint cases, with reasonable execution time. It is then shown that using filtering algorithms on the resulting smart table is more efficient than using state-of-the-art filtering algorithms on the initial table.
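To illustrate the general idea of compressing an ordinary table (not the authors' algorithm, and using only the wildcard `*` rather than the richer unary and binary expressions smart tables allow), the sketch below repeatedly merges tuples that differ in a single column and jointly cover that column's whole domain. The helper `expand` checks that the compressed table allows exactly the original tuples; all names are hypothetical.

```python
from itertools import product

STAR = "*"   # wildcard: any value of the column's domain

def compress(table, domains):
    """Greedy wildcard compression of an ordinary table (simplified illustration)."""
    rows = set(map(tuple, table))
    changed = True
    while changed:
        changed = False
        for i, dom in enumerate(domains):
            groups = {}                          # rest of tuple -> values seen in column i
            for r in rows:
                if r[i] != STAR:
                    groups.setdefault(r[:i] + r[i + 1:], set()).add(r[i])
            for key, vals in groups.items():
                if len(vals) > 1 and vals == set(dom):   # column i fully covered
                    for v in vals:
                        rows.discard(key[:i] + (v,) + key[i:])
                    rows.add(key[:i] + (STAR,) + key[i:])
                    changed = True
    return rows

def expand(rows, domains):
    """All ordinary tuples allowed by a (possibly wildcarded) table."""
    out = set()
    for r in rows:
        choices = [list(d) if v == STAR else [v] for v, d in zip(r, domains)]
        out.update(product(*choices))
    return out

domains = [range(3), range(3)]
table = [(x, y) for x in range(3) for y in range(3) if x <= y]   # constraint x <= y
small = compress(table, domains)
print(len(table), "->", len(small), sorted(small, key=str))
```

For x <= y over {0, 1, 2}, the six allowed tuples shrink to four, with the column-0 wildcard absorbing every tuple where y = 2.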

Learning to Run Heuristics in Tree Search
Elias Khalil, Bistra Dilkina, George Nemhauser, Shabbir Ahmed, Yufen Shao
"Primal heuristics" are a key contributor to the improved performance of exact branch-and-bound solvers for combinatorial optimization and integer programming. Perhaps the most crucial question concerning primal heuristics is at which nodes they should be run, to which the typical answer is hard-coded rules or fixed solver parameters tuned offline by trial and error. Alternatively, a heuristic should be run when it is most likely to succeed, based on the problem instance's characteristics, the state of the search, etc. In this work, we study the problem of deciding at which nodes a heuristic should be run such that the overall (primal) performance of the solver is optimized. To our knowledge, this is the first attempt at formalizing and systematically addressing this problem. Central to our approach is the use of Machine Learning (ML) for predicting whether a heuristic will succeed at a given node. We give a theoretical framework for analyzing this decision-making process in a simplified setting, propose an ML approach for modeling heuristic success likelihood, and design practical rules that leverage the ML models to dynamically decide whether to run a heuristic at each node of the search tree. Experimentally, our approach improves the primal performance of a state-of-the-art Mixed Integer Programming solver by up to 6% on a set of benchmark instances, and by up to 60% on a family of hard Independent Set instances.

Learning-Based Abstractions for Nonlinear Constraint Solving
Sumanth Dathathri, Nikos Arechiga, Sicun Gao, Richard M. Murray
We propose a new abstraction refinement procedure based on machine learning to improve the performance of nonlinear constraint solving algorithms on large-scale problems. The proposed approach decomposes the original set of constraints into smaller subsets, and uses learning algorithms to propose sequences of abstractions that take the form of conjunctions of classifiers. The core procedure is a refinement loop that keeps improving the learned results based on counterexamples that are obtained from partial constraints that are easy to solve. Experiments show that the proposed techniques significantly improve the performance of state-of-the-art constraint solvers on many challenging benchmarks. The mechanism is capable of producing intermediate symbolic abstractions that are also important for many applications and for understanding the internal structures of hard constraint solving problems.

The Hard Problems Are Almost Everywhere For Random CNF-XOR Formulas
Jeffrey M. Dudek, Kuldeep S. Meel, Moshe Y. Vardi
Recent universal-hashing-based approaches to sampling and counting crucially depend on the runtime performance of SAT solvers on formulas expressed as the conjunction of both CNF constraints and variable-width XOR constraints (known as CNF-XOR formulas). In this paper, we present the first study of the runtime behavior of SAT solvers equipped with XOR-reasoning techniques on random CNF-XOR formulas. We empirically demonstrate that a state-of-the-art SAT solver scales exponentially on random CNF-XOR formulas across a wide range of XOR-clause densities, peaking around the empirical phase-transition location. On the theoretical front, we prove that the solution space of a random CNF-XOR formula 'shatters' at all nonzero XOR-clause densities into well-separated components, similar to the behavior seen in random CNF formulas known to be difficult for many SAT algorithms.
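For readers who want to experiment, the sketch below generates random CNF-XOR formulas and checks satisfiability by brute force. The random model is an assumption, though a common one in hashing-based counting: round(alpha * n) random 3-clauses, plus round(beta * n) XOR constraints that each include every variable independently with probability 1/2 and carry a uniformly random parity bit, so XOR widths vary. Brute force only works for a handful of variables and stands in for an XOR-aware SAT solver; all names are hypothetical.

```python
import itertools
import random

def random_cnf_xor(n, alpha, beta, k=3, rng=random):
    """Random k-CNF-XOR formula over variables 0..n-1 (assumed random model)."""
    clauses = [[(v, rng.random() < 0.5) for v in rng.sample(range(n), k)]
               for _ in range(round(alpha * n))]
    xors = [([v for v in range(n) if rng.random() < 0.5], rng.random() < 0.5)
            for _ in range(round(beta * n))]
    return clauses, xors

def satisfies(assign, clauses, xors):
    # A literal (v, positive) is true when assign[v] == positive.
    if not all(any(assign[v] == pos for v, pos in c) for c in clauses):
        return False
    # An XOR holds when the parity of its variables equals its parity bit.
    return all((sum(assign[v] for v in vs) % 2 == 1) == parity for vs, parity in xors)

def is_satisfiable(n, clauses, xors):
    """Brute force over all 2^n assignments (only sensible for tiny n)."""
    return any(satisfies(a, clauses, xors)
               for a in itertools.product([False, True], repeat=n))

rng = random.Random(0)
for beta in (0.0, 0.25, 0.5):
    sat = sum(is_satisfiable(8, *random_cnf_xor(8, 2.0, beta, rng=rng)) for _ in range(20))
    print(f"alpha=2.0 beta={beta}: {sat}/20 satisfiable")
```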

Personnel Scheduling as Satisfiability Modulo Theories
Christoph Erkinger, Nysret Musliu
Rotating workforce scheduling (RWS) is an important real-life personnel rostering problem that appears in a large number of different business areas. In this paper, we propose a new exact approach to RWS that exploits the recent advances in Satisfiability Modulo Theories (SMT). While solving can be automated by using a number of so-called SMT solvers, the most challenging task is to find an efficient formulation of the problem in first-order logic. We propose two new modeling techniques for RWS that encode the problem using formulas over different background theories. The first encoding provides an elegant approach based on linear integer arithmetic. Furthermore, we develop a new formulation based on bit-vectors in order to achieve a more compact representation of the constraints and a reduced number of variables. These two modeling approaches were experimentally evaluated on benchmark instances from the literature using different state-of-the-art SMT solvers. Compared to other exact methods, the results of this approach show a significant improvement in the number of solutions found.
Tuesday 22 10:30  12:00 MLCL1  Classification 1

Locality Adaptive Discriminant Analysis
Qi Wang, Mulin Chen, Feiping Nie, Xuelong Li
Linear Discriminant Analysis (LDA) is a popular technique for supervised dimensionality reduction, and its performance is satisfactory when dealing with Gaussian-distributed data. However, the neglect of local data structure makes LDA inapplicable to many real-world situations. Some works therefore focus on discriminant analysis between neighboring points, which can be easily affected by noise in the original data space. In this paper, we propose a new supervised dimensionality reduction method, Locality Adaptive Discriminant Analysis (LADA), to learn a representative subspace of the data. Compared to LDA and its variants, the proposed method has three salient advantages: (1) it finds the principal projection directions without imposing any assumption on the data distribution; (2) it is able to exploit the local manifold structure of data in the desired subspace; (3) it exploits the neighbor relationships between points automatically without introducing any additional parameter to be tuned. Results on synthetic and real-world benchmark datasets demonstrate the superiority of the proposed method.

Interactive Image Segmentation via Pairwise Likelihood Learning
Tao Wang, Quansen Sun, Qi Ge, Zexuan Ji, Qiang Chen, Guiyu Xia
This paper presents an interactive image segmentation approach in which the segmentation problem is formulated as a probabilistic estimation problem. Instead of measuring the distances between unseeded pixels and seeded pixels, we measure the similarities between pixel pairs and seed pairs to improve robustness to the seeds. The unary prior probability of each pixel belonging to the foreground F and background B can be effectively estimated based on the similarities with label pairs (F, F), (F, B), (B, F) and (B, B). Then a likelihood learning framework is proposed to fuse the region and boundary information of the image by imposing a smoothing constraint on the unary potentials. Experiments on challenging data sets demonstrate that the proposed method obtains better performance than state-of-the-art methods.

Unsupervised Deep Video Hashing with Balanced Rotation
Gengshen Wu, Li Liu, Yuchen Guo, Guiguang Ding, Jungong Han, Jialie Shen, Ling Shao
Recently, hashing video contents for fast retrieval has received increasing attention due to the enormous growth of online videos. As extensions of image hashing techniques, traditional video hashing methods mainly focus on seeking appropriate video features but pay little attention to how the video-specific features can be leveraged to achieve optimal binarization. In this paper, an end-to-end hashing framework, namely Unsupervised Deep Video Hashing (UDVH), is proposed, where feature extraction, balanced code learning and hash function learning are integrated and optimized in a self-taught manner. In particular, distinguished from previous work, our framework enjoys two novelties: 1) an unsupervised hashing method that integrates feature clustering and feature binarization, enabling the neighborhood structure to be preserved in the binary space; 2) a smart rotation applied to the video-specific features that are widely spread in the low-dimensional space such that the variance of dimensions can be balanced, thus generating more effective hash codes. Extensive experiments have been performed on two real-world datasets, and the results demonstrate its superiority over state-of-the-art video hashing methods. To bootstrap further developments, the source code will be made publicly available.

MAM-RNN: Multilevel Attention Model Based RNN for Video Captioning
Xuelong Li, Bin Zhao, Xiaoqiang Lu
Visual information is quite important for the task of video captioning. However, a video contains a lot of uncorrelated content, which may interfere with generating a correct caption. Based on this observation, we attempt to exploit the visual features that are most correlated with the caption. In this paper, a Multilevel Attention Model based Recurrent Neural Network (MAM-RNN) is proposed, where MAM is utilized to encode the visual features and the RNN works as the decoder to generate the video caption. During generation, the proposed approach is able to adaptively attend to the salient regions in each frame and to the frames correlated with the caption. Experimental results on two benchmark datasets, MSVD and Charades, show the excellent performance of the proposed approach.

JM-Net and Cluster-SVM for Aerial Scene Classification
Xiaoqiang Lu, Yuan Yuan, Jie Fang
Aerial scene classification, which is a fundamental problem for remote sensing imagery, automatically labels an aerial image with a specific semantic category. Although deep learning has achieved competitive performance for aerial scene classification, training conventional neural networks on aerial datasets easily leads to overfitting and local minima, because aerial datasets contain only a few hundred or a few thousand images, whereas conventional networks usually contain millions of parameters to be trained. To address this problem, a novel convolutional neural network named JM-Net is proposed in this paper, which has convolution kernels of different sizes in the same layer and omits the fully connected layer, so it has fewer parameters and can be trained well on aerial datasets. Additionally, Cluster-SVM, a strategy to improve the accuracy and speed up the classification, is used for this specific task. Finally, our method surpassed the state-of-the-art result on the challenging AID dataset while taking less time and using less storage space.

Multi-Class Support Vector Machine via Maximizing Multi-Class Margins
Jie Xu, Cheng Deng, Zhouyuan Huo, Xianglong Liu, Feiping Nie, Heng Huang
Support Vector Machine (SVM) was originally proposed as a binary classification model, and it has achieved great success in different applications. In practice, problems more often involve more than two classes, so it is natural to extend SVM to a multi-class classifier. Many works have constructed a multi-class classifier based on binary SVM, such as the one-versus-all strategy, the one-versus-one strategy and Weston's multi-class SVM. The one-versus-all and one-versus-one strategies split the multi-class problem into multiple binary classification subproblems, so multiple binary classifiers must be trained. Weston's multi-class SVM is formed by enforcing risk constraints and imposing a specific regularization, such as the Frobenius norm. It is not derived by maximizing the margin between the hyperplane and the training data, which is the motivation behind SVM. In this paper, we propose a multi-class SVM model from the perspective of maximizing the margin between training points and the hyperplane, and analyze the relation between our model and other related methods. Experiments show that our model achieves better or comparable results relative to other related methods.
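For reference, the sketch below evaluates one common unit-margin form of the Weston-Watkins objective that the abstract contrasts with the proposed model: a Frobenius-norm regularizer plus a hinge penalty for every wrong class whose score comes within 1 of the true class score. It is not the authors' margin-maximizing model, and `C`, the function names, and the toy weights are illustrative assumptions.

```python
import numpy as np

def weston_watkins_objective(W, b, X, y, C=1.0):
    """0.5*||W||_F^2 + C * sum_i sum_{j != y_i} max(0, 1 - (s_{i,y_i} - s_{i,j})),
    where s = X @ W.T + b are the class scores (unit-margin variant, assumed)."""
    y = np.asarray(y)
    scores = X @ W.T + b                              # (n_samples, n_classes)
    correct = scores[np.arange(len(y)), y][:, None]   # true-class score per sample
    margins = np.maximum(0.0, 1.0 - (correct - scores))
    margins[np.arange(len(y)), y] = 0.0               # exclude j == y_i
    return 0.5 * np.sum(W ** 2) + C * margins.sum()

def predict(W, b, X):
    """Assign each sample to the class with the highest score."""
    return np.argmax(X @ W.T + b, axis=1)

X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
y = [0, 1, 2]
W0 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # one prototype row per class
b = np.zeros(3)
print(predict(2 * W0, b, X), weston_watkins_objective(2 * W0, b, X, y))
print(weston_watkins_objective(0.5 * W0, b, X, y))
```

With weights `2 * W0`, every margin exceeds 1, so only the regularizer (8.0) remains; shrinking to `0.5 * W0` lowers the regularizer to 0.5 but activates two hinge terms of 0.5 each, giving 1.5.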
Tuesday 22 10:30  12:00 MLFSC1  Feature Selection and Construction 1

TUCH: Turning Cross-view Hashing into Single-view Hashing via Generative Adversarial Nets
Xin Zhao, Guiguang Ding, Yuchen Guo, Jungong Han, Yue Gao
Cross-view retrieval, which focuses on searching images in response to text queries or vice versa, has received increasing attention recently. Cross-view hashing aims to solve the cross-view retrieval problem efficiently with binary hash codes. Most existing works on cross-view hashing exploit multi-view embedding methods to tackle this problem, which inevitably causes information loss in both the image and text domains. Inspired by Generative Adversarial Nets (GANs), this paper presents a new model that is able to Turn Cross-view Hashing into single-view hashing (TUCH), thus enabling the information of images to be preserved as much as possible. TUCH is a novel deep architecture that integrates a language model network T for text feature extraction, a generator network G to generate fake images from text features and a hashing network H for learning hashing functions to generate compact binary codes. Our architecture effectively unifies joint generative adversarial learning and cross-view hashing. Extensive empirical evidence shows that our TUCH approach achieves state-of-the-art results, especially on text-to-image retrieval, based on image-sentence datasets, i.e. the standard IAPR TC-12 and the large-scale Microsoft COCO.

Predicting Human Interaction via Relative Attention Model
Yichao YAN, Bingbing Ni, Xiaokang Yang
Predicting human interaction is challenging as the ongoing activity has to be inferred from a partially observed video. Essentially, a good algorithm should effectively model the mutual influence between the two interacting subjects. Also, only a small region in the scene is discriminative for identifying the ongoing interaction. In this work, we propose a relative attention model to explicitly address these difficulties. Built on a tri-coupled deep recurrent structure representing both interacting subjects and the global interaction status, the proposed network collects spatiotemporal information from each subject, rectified with global interaction information, yielding an effective interaction representation. Moreover, the proposed network also incorporates an attention module to assign higher importance to the regions that are relevant to the ongoing action. Extensive experiments have been conducted on two public datasets, and the results demonstrate that the proposed relative attention network successfully predicts informative regions between interacting subjects, which in turn yields superior human interaction prediction accuracy.

Optimal Feature Selection for Decision Robustness in Bayesian Networks
YooJung Choi, Adnan Darwiche, Guy Van den Broeck
In many applications, one can define a large set of features to support the classification task at hand. At test time, however, these become prohibitively expensive to evaluate, and only a small subset of features is used, often selected for their information-theoretic value. For threshold-based Naive Bayes classifiers, recent work has suggested selecting features that maximize the expected robustness of the classifier, that is, the expected probability that it maintains its decision after seeing more features. We propose the first algorithm to compute this expected same-decision probability for general Bayesian network classifiers, based on compiling the network into a tractable circuit representation. Moreover, we develop a search algorithm for optimal feature selection that utilizes efficient incremental circuit modifications. Experiments on Naive Bayes, as well as more general networks, show the efficacy and distinct behavior of this decision-making approach.
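To make the same-decision probability (SDP) concrete, here is a brute-force version for a small Naive Bayes classifier with a binary class, binary features, and the decision rule P(C=1 | e) >= T (all assumptions for illustration). It enumerates every instantiation h of the hidden features and adds P(h | e) whenever the decision after also observing h matches the current one. This enumeration is exponential in the number of hidden features; the abstract's circuit compilation targets general networks without it. The function names and the parameters in `cpts` are hypothetical.

```python
import itertools

def posterior(prior, cpts, evidence):
    """P(C=1 | evidence) for Naive Bayes with a binary class and binary features.
    cpts[f] = (P(f=1 | C=0), P(f=1 | C=1)); evidence maps feature -> 0 or 1."""
    w = [1.0 - prior, prior]
    for f, v in evidence.items():
        for c in (0, 1):
            p = cpts[f][c]
            w[c] *= p if v else 1.0 - p
    return w[1] / (w[0] + w[1])

def same_decision_probability(prior, cpts, evidence, hidden, threshold=0.5):
    """Probability, under P(hidden | evidence), that observing the hidden
    features leaves the thresholded decision unchanged (brute force)."""
    q = posterior(prior, cpts, evidence)
    decision = q >= threshold
    sdp = 0.0
    for values in itertools.product((0, 1), repeat=len(hidden)):
        h = dict(zip(hidden, values))
        # P(h | e) = sum_c P(c | e) * prod_f P(h_f | c): features independent given C.
        p_h = 0.0
        for c, pc in ((0, 1.0 - q), (1, q)):
            for f, v in h.items():
                pc *= cpts[f][c] if v else 1.0 - cpts[f][c]
            p_h += pc
        if (posterior(prior, cpts, {**evidence, **h}) >= threshold) == decision:
            sdp += p_h
    return sdp

cpts = {"f1": (0.2, 0.8), "f2": (0.1, 0.9), "noise": (0.5, 0.5)}
print(same_decision_probability(0.5, cpts, {"f1": 1}, ["f2"]))
print(same_decision_probability(0.5, cpts, {"f1": 1}, ["noise"]))
```

An uninformative hidden feature (`noise`) can never flip the decision, so its SDP is 1 up to rounding; the informative `f2` flips the decision whenever it is 0, so its SDP is about 0.74 here.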

Semi-supervised Feature Selection via Rescaled Linear Regression
Xiaojun Chen, Feiping Nie, Guowen Yuan, Joshua Huang
With the rapid increase of complex, high-dimensional sparse data, demand has grown for feature selection methods that exploit both labeled and unlabeled data. Least-squares-regression-based feature selection methods usually learn a projection matrix and rank the importance of features using that matrix, a practice that lacks a theoretical explanation. Moreover, these methods cannot find a solution of the projection matrix that is both global and sparse. In this paper, we propose a novel semi-supervised feature selection method which can learn a global and sparse solution of the projection matrix. The new method extends the least squares regression model by rescaling the regression coefficients with a set of scale factors, which are used to rank the importance of features. We show that the new model is convex and implicitly uses the $\ell_{2,1}$ norm; therefore, it can learn a global and sparse solution. Moreover, the introduction of scale factors provides a theoretical explanation for why the projection matrix can be used to rank the importance of features. A simple yet effective algorithm with proven convergence is proposed to optimize the new model. Experimental results on eight real-life data sets show the superiority of the method.
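The abstract contrasts its scale factors with the common practice of ranking features by the rows of a projection matrix. Below is a minimal sketch of that baseline practice and of the $\ell_{2,1}$ norm it relies on, assuming row i of the matrix `W` holds the weights of feature i. It is not the rescaled regression model itself, and the helper names are hypothetical.

```python
import math

def row_norms(W):
    """Euclidean (l2) norm of each row of W; row i holds the weights of feature i."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """||W||_{2,1}: the sum of the row-wise l2 norms.
    Penalizing it tends to drive entire rows (features) to zero."""
    return sum(row_norms(W))

def rank_features(W):
    """Feature indices ordered from most to least important by row norm."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)

# Toy 3-feature, 2-output projection matrix (illustrative values only).
W = [[3.0, 4.0],
     [0.0, 0.0],
     [1.0, 0.0]]
print(l21_norm(W))       # 6.0
print(rank_features(W))  # [0, 2, 1]
```

The zero row shows why the $\ell_{2,1}$ norm yields feature selection: a feature whose whole row vanishes contributes nothing to the projection and lands at the bottom of the ranking.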

Multimodal Linear Discriminant Analysis via Structural Sparsity
Yu Zhang, Yuan Jiang
Linear discriminant analysis (LDA) is a widely used supervised dimensionality reduction technique. Although LDA has many real-world applications, it has limitations such as the single-modal problem, i.e., the assumption that each class follows a single normal distribution. To solve this problem, we propose a method called multimodal linear discriminant analysis (MLDA). By generalizing the between-class and within-class scatter matrices, the MLDA model allows each data point to have its own class mean, called the instance-specific class mean. Within each class, data points that share the same or similar instance-specific class means are considered to form one cluster, or mode. To learn the instance-specific class means, we use as the criterion the ratio of the proposed generalized between-class scatter measure to the proposed generalized within-class scatter measure, which encourages class separability. The observation that each class has a limited number of clusters motivates a structural sparse regularizer that controls the number of unique instance-specific class means in each class. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed MLDA method.
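For reference, the classical scatter matrices that MLDA generalizes can be computed as below. The optional `instance_means` argument is a hypothetical illustration of the instance-specific idea, in which each point is compared with its own mean instead of its class mean; the paper's exact generalized measures and its structural sparsity regularizer are not reproduced here.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def outer_add(S, u, v):
    """In-place S += u v^T."""
    for a in range(len(u)):
        for b in range(len(v)):
            S[a][b] += u[a] * v[b]

def scatter_matrices(X, y, instance_means=None):
    """Return (S_w, S_b).

    With instance_means=None every point uses its class mean (classical LDA).
    Otherwise instance_means[i] plays the role of a hypothetical
    instance-specific class mean for point i.
    """
    d = len(X[0])
    class_mean = {c: mean([x for x, lab in zip(X, y) if lab == c]) for c in set(y)}
    m = instance_means if instance_means is not None else [class_mean[lab] for lab in y]
    mu = mean(X)  # global mean
    S_w = [[0.0] * d for _ in range(d)]
    S_b = [[0.0] * d for _ in range(d)]
    for x, mi in zip(X, m):
        diff_w = [a - b for a, b in zip(x, mi)]   # spread around the point's own mean
        diff_b = [a - b for a, b in zip(mi, mu)]  # that mean's offset from the global mean
        outer_add(S_w, diff_w, diff_w)
        outer_add(S_b, diff_b, diff_b)
    return S_w, S_b

X = [[0, 0], [2, 0], [0, 2], [2, 2]]
y = [0, 0, 1, 1]
S_w, S_b = scatter_matrices(X, y)
print(S_w)  # [[4.0, 0.0], [0.0, 0.0]]
print(S_b)  # [[0.0, 0.0], [0.0, 4.0]]
```

With class means, the between-class term equals the familiar sum of n_c (mu_c - mu)(mu_c - mu)^T, because every point in class c contributes the same outer product.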

Learning Mahalanobis Distance Metric: Considering Instance Disturbance Helps
Han-Jia Ye, De-Chuan Zhan, Xue-Min Si, Yuan Jiang
The Mahalanobis distance metric takes feature weights and correlations into account in the distance computation, which can improve the performance of many similarity/dissimilarity-based methods, such as kNN. Most existing distance metric learning methods obtain the metric from the raw features and side information but neglect their reliability. Noise or disturbances on instances change the relationships between them and thus affect the learned metric. In this paper, we claim that considering the disturbance of instances may help a distance metric learning approach obtain a robust metric, and we propose the Distance metRIc learning Facilitated by disTurbances (DRIFT) approach. In DRIFT, the noise or disturbance of each instance is learned. Therefore, the distance between each pair of (noisy) instances can be better estimated, which facilitates side information utilization and metric learning. Experiments on prediction and visualization clearly indicate the effectiveness of the proposed approach.
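A minimal sketch of the Mahalanobis distance and its effect on a 1-nearest-neighbour rule. It parameterizes the metric as M = L^T L, a common choice that keeps M positive semidefinite; it does not model the per-instance disturbances that DRIFT learns, and all names and data are hypothetical.

```python
import math

def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semidefinite M given as a list of rows."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * M[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative round-off

def metric_from_factor(L):
    """M = L^T L, which is positive semidefinite for any real matrix L."""
    rows, cols = len(L), len(L[0])
    return [[sum(L[k][i] * L[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]

def nearest_label(query, X, y, M):
    """1-nearest-neighbour prediction under the metric M."""
    dists = [mahalanobis(query, x, M) for x in X]
    return y[dists.index(min(dists))]

X = [[1.0, 2.5], [2.0, 1.0]]
y = ["a", "b"]
identity = [[1.0, 0.0], [0.0, 1.0]]
weighted = metric_from_factor([[2.0, 0.0], [0.0, 1.0]])  # [[4.0, 0.0], [0.0, 1.0]]
print(nearest_label([1.0, 1.0], X, y, identity))  # b
print(nearest_label([1.0, 1.0], X, y, weighted))  # a
```

Up-weighting the first feature flips which training point is nearest, which is how a learned metric changes a kNN decision.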
Tuesday 22 10:30  12:30 SISKR  Sister Conference Track: Knowledge Representation

A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality
Jasmin Blanchette, Mathias Fleury, Christoph Weidenbach
We developed a formal framework for SAT solving using the Isabelle/HOL proof assistant. Through a chain of refinements, an abstract CDCL (conflict-driven clause learning) calculus is connected to a SAT solver that always terminates with correct answers. The framework offers a convenient way to prove theorems about the SAT solver and to experiment with variants of the calculus. Compared with earlier verifications, the main novelties are the inclusion of the CDCL rules for forget, restart, and incremental solving, and the use of refinement.

Unsatisfiable Core Shrinking for Anytime Answer Set Optimization
Mario Alviano, Carmine Dodaro
Efficient algorithms for the computation of optimum stable models are based on unsatisfiable core analysis. However, these algorithms essentially run to completion, providing few or even no suboptimal stable models. This drawback can be circumvented by shrinking unsatisfiable cores. Interestingly, the resulting anytime algorithm can solve more instances than the original algorithm.

KSP: A Resolution-based Prover for Multimodal K, Abridged Report
Cláudia Nalon, Ullrich Hustadt, Clare Dixon
In this paper, we briefly describe an implementation of a hyperresolution-based calculus for the propositional basic multimodal logic K_n. The prover, KSP, is designed to support experimentation with different combinations of refinements for its basic calculus. The prover allows for both local and global reasoning. We present an experimental evaluation that compares KSP with a range of existing reasoners for K_n.

Concerning Referring Expressions in Query Answers
Alexander Borgida, David Toman, Grant Weddell
A referring expression in linguistics is a noun phrase that identifies individuals to listeners. In the context of a query over a first-order knowledge base, referring expressions to answers are usually constant symbols. This paper motivates and initiates the exploration of allowing more general formulas, called singular referring expressions, to replace constants in this role. Referring expression types play a novel and significant role in analyzing the properties of candidate expressions.

First-Order Modular Logic Programs and their Conservative Extensions (Extended Abstract)
Amelia Harrison
This paper introduces first-order modular logic programs, which provide a way of viewing answer set programs as consisting of many independent, meaningful modules. We also present conservative extensions of such programs. This concept helps to identify strong relationships between modular programs as well as between traditional programs. For example, we illustrate how the notion of a conservative extension can be used to justify the common projection rewriting. This is a short version of a paper that was presented at the 32nd International Conference on Logic Programming (Harrison and Lierler, 2016).

nanoCoP: Natural Non-clausal Theorem Proving
Jens Otten
Most efficient fully automated theorem provers implement proof search calculi that require the input formula to be in clausal form, i.e., disjunctive or conjunctive normal form. The translation into clausal form introduces significant overhead to the proof search and modifies the structure of the original formula. Translating a proof in clausal form back into a more readable non-clausal proof of the original formula is not straightforward. This paper presents a non-clausal automated theorem prover for classical first-order logic. It is based on a non-clausal connection calculus and implemented in a few lines of Prolog code. Working entirely on the original structure of the input formula not only speeds up the proof search but also yields shorter non-clausal proofs.
Tuesday 22 10:30  12:30 EAR1  Early Career 1

Game Theoretic Analysis of Security and Sustainability
Bo An
Computational game theory has become a powerful tool for addressing critical issues in security and sustainability. By casting the security resource allocation problem as a Stackelberg game, researchers have developed novel algorithms that provide randomized security resource allocations. These algorithms have led to deployed security-game-based decision aids for many real-world security domains, including infrastructure security and wildlife protection. We contribute to this community by addressing several major research challenges in complex security resource allocation, including dynamic payoffs, uncertainty, protection externality, games on networks, and strategic secrecy. We also analyze optimal security resource allocation in many potential application domains, including cyber security. Furthermore, we apply game theory to reason about optimal policies for taxi pricing schemes and for EV charging placement and pricing.

Committee Scoring Rules: A Call to Arms
Piotr Faliszewski
Committee scoring rules are a class of voting rules used to select sets of candidates based on the preferences of the voters. The goal of this paper is to present this class and to invite researchers to study its properties (computational and axiomatic alike).
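A committee scoring rule gives each voter's view of a committee a score that depends only on the positions of its members in that voter's ranking, and selects a size-k committee with the highest total score. The brute-force sketch below uses four well-known committee scoring functions (k-Borda, Chamberlin-Courant with Borda scores, SNTV, and Bloc). It enumerates all k-subsets, so it is only meant for tiny elections; the function names and the lexicographic tie-breaking are assumptions of the sketch.

```python
from itertools import combinations

def positions_in(committee, vote):
    """Sorted 1-based positions of the committee members in one voter's ranking."""
    return sorted(vote.index(c) + 1 for c in committee)

# Committee scoring functions f(pos, m); m is the number of candidates.
def k_borda(pos, m):             # sum of the Borda scores of all members
    return sum(m - p for p in pos)

def chamberlin_courant(pos, m):  # Borda score of the voter's favourite member only
    return m - pos[0]

def sntv(pos, m):                # 1 if the voter's top choice is in the committee
    return 1 if pos[0] == 1 else 0

def bloc(pos, m):                # members among the voter's top k (k = committee size)
    return sum(1 for p in pos if p <= len(pos))

def winning_committee(candidates, votes, k, f):
    """Exhaustive search; ties go to the first committee in lexicographic order."""
    m = len(candidates)
    return max(combinations(candidates, k),
               key=lambda S: sum(f(positions_in(S, v), m) for v in votes))

candidates = ["a", "b", "c"]
votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "c", "a"]]
print(winning_committee(candidates, votes, 2, k_borda))             # ('a', 'b')
print(winning_committee(candidates, votes, 2, chamberlin_courant))  # ('a', 'b')
```

Swapping the scoring function `f` is all it takes to move between rules, which is what makes the class convenient to study as a whole.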

Reinforcement Mechanism Design
Pingzhong Tang
We put forward a modeling and algorithmic framework to design and optimize mechanisms in dynamic industrial environments, where a designer can use the data generated in the process to automatically improve future designs. Our solution, coined reinforcement mechanism design, is rooted in game theory but incorporates recent AI techniques to remove unrealistic modeling assumptions and to make automated optimization feasible. We instantiate our framework on key application scenarios of Baidu and Taobao, two of the largest mobile app companies in China. For the Taobao case, our framework automatically designs mechanisms that allocate buyer impressions for the e-commerce website; for the Baidu case, it automatically designs dynamic reserve pricing schemes for the search engine's advertisement auctions. Experiments show that, in both scenarios, our solutions outperform the state-of-the-art alternatives and the mechanisms currently deployed.

Securing and Scaling Cryptocurrencies
Aviv Zohar
Bitcoin, a protocol for a new permissionless, decentralized digital currency, heralded the arrival of a new application domain for computer science. Following Bitcoin's arrival, a series of innovations drawn from the state of the art in several fields has been applied to cryptocurrencies and has been slowly reshaping monetary and financial instruments on public distributed ledgers. It soon became clear, however, that Bitcoin and similar cryptocurrencies still require additional improvements. This challenging domain presents researchers with new and exciting questions. I provide examples from two main research threads, related to the scalability of the protocol and to its underlying incentives.
Tuesday 22 14:00  15:00 Invited Talk Explaining to Educate with Natural Language Processing
Marti Hearst
Tuesday 22 14:00  15:00 Invited Talk Swift Logics for Big Data
Georg Gottlob
Tuesday 22 15:00  16:00 Panel Autonomy and AI
Participants: TBD
Tuesday 22 15:00  16:00 Competition Angry Birds
Tuesday 22 15:00  16:00 ROBMPP  Motion and Path Planning

On the Power and Limitations of Deception in Multi-Robot Adversarial Patrolling
Noga Talmor, Noa Agmon
Multi-robot adversarial patrolling is a well-studied problem that investigates how defenders can optimally use all given resources to maximize the probability of detecting penetrations controlled by an adversary. It is commonly assumed that the adversary in this problem is rational and therefore uses its knowledge of the patrolling robots (namely, the number of robots, their locations, characteristics, and strategy) to optimize its own chances of penetrating successfully. In this paper we present a novel defending approach that manipulates the adversary's (possibly partial) knowledge of the patrolling robots, so that it believes the robots have more power than they actually have. We describe two different ways of deceiving the adversary: Window Deception, in which the adversary is assumed to have partial observability of the perimeter, and Scarecrow Deception, in which some of the patrolling robots only appear to be real robots but have no ability to actually detect the adversary. We analyze the limitations of both models and suggest a random-based approach for optimally deceiving the adversary that considers both the defenders' resources and the adversary's knowledge.

Compromise-free Pathfinding on a Navigation Mesh
Michael Cui, Daniel Harabor, Alban Grastien
We want to compute geometric shortest paths in a collection of convex traversable polygons, also known as a navigation mesh. Simple to compute and easy to update, navigation meshes are widely used for pathfinding in computer games. When the mesh is static, shortest path problems can be solved exactly and very fast, but only after a costly preprocessing step. When the mesh is dynamic, practitioners turn to online methods, which typically compute only approximately shortest paths. In this work we present a new pathfinding algorithm which is compromise-free, i.e., it is simultaneously fast, online, and optimal. Our method, Polyanya, extends and generalises Anya, a recent and related interval-based search technique developed for computing geometric shortest paths in grids. We show how that algorithm can be modified to support search over arbitrary sets of convex polygons and then evaluate its performance on a range of realistic and synthetic benchmark problems.

Switched Linear Multi-Robot Navigation Using Hierarchical Model Predictive Control
Chao Huang, Xin Chen, Yifan Zhang, Shengchao Qin, Yifeng Zeng, Xuandong Li
Multi-robot navigation control in the absence of a reference trajectory is challenging, as it must ensure stability and feasibility while still offering fast computation of control decisions. The intrinsically high complexity of switched linear dynamical robots makes the problem even more challenging. In this paper, we propose a novel method based on hierarchical model predictive control (HMPC) to address the navigation problem of multiple robots with switched linear dynamics. We develop a new technique to compute the reachable sets of switched linear systems and use them to enable parallel computation of control parameters. We present theoretical results on the stability, feasibility, and complexity of the proposed approach, and demonstrate its empirical performance advantage over other approaches.

Maintaining Communication in Multi-Robot Tree Coverage
Mor Sinay, Oleg Maximov, Noa Agmon, Sarit Kraus, David Peleg
Area coverage is an important task for mobile robots, mainly due to its applicability in many domains, such as search and rescue. In this paper we study the problem of multi-robot coverage, in which the robots must obey a strong communication restriction: they must maintain connectivity between teammates throughout the coverage. We formally describe the Multi-Robot Connected Tree Coverage problem and present an algorithm for covering perfect N-ary trees while adhering to the communication requirement. The algorithm is analyzed theoretically, providing guarantees for coverage time through the notion of a speedup factor. We enhance the theoretically proven solution with a dripping heuristic algorithm and show in extensive simulations that it significantly decreases the coverage time. The algorithm is then adjusted to general (not necessarily perfect) N-ary trees, and additional experiments demonstrate its efficiency. Furthermore, we show the use of our solution in a simulated office-building scenario. Finally, we deploy our algorithm on real robots in a real office building, showing efficient coverage time in practice.
Tuesday 22 15:00  16:00 NLPNLP  Natural Language Processing

Microblog Sentiment Classification via Recurrent Random Walk Network Learning
Zhou Zhao, Hanqing Lu, Deng Cai, Xiaofei He, Yueting Zhuang
Microblog Sentiment Classification (MSC) is a challenging task in microblog mining, arising in many applications such as stock price prediction and crisis management. Currently, most existing approaches learn the user sentiment model from users' posted tweets, and they suffer from insufficiently discriminative tweet representations. In this paper, we consider the problem of microblog sentiment classification from the viewpoint of heterogeneous MSC network embedding. We propose a novel recurrent random walk network learning framework for the problem that exploits both users' posted tweets and their social relations in microblogs. We then introduce deep recurrent neural networks with a random-walk layer for heterogeneous MSC network embedding, which can be trained end-to-end from scratch. We employ the backpropagation method to train the proposed recurrent random walk network model. Extensive experiments on large-scale public datasets from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.

A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings
Liangchen Wei, Zhi-Hong Deng
Cross-language learning allows one to use training data from one language to build models for another language. Many traditional approaches require word-level alignments between sentences in parallel corpora; in this paper, we define a general bilingual training objective function that requires only a sentence-level parallel corpus. We propose a variational autoencoding approach for training bilingual word embeddings. The variational model introduces a continuous latent variable to explicitly model the underlying semantics of the parallel sentence pairs and to guide the generation of the sentence pairs. Our model restricts the bilingual word embeddings to represent words in exactly the same continuous vector space. Empirical results on the task of cross-lingual document classification show that our method is effective.

Automatic Assessment of Absolute Sentence Complexity
Sanja Stajner, Simone Paolo Ponzetto, Heiner Stuckenschmidt
Lexically and syntactically simpler sentences result in shorter reading time and better understanding in many people. However, no reliable systems for automatic assessment of absolute sentence complexity have been proposed so far. Instead, the assessment is usually done manually, requiring expert human annotators. To address this problem, we first define the sentence complexity assessment as a five-level classification task, and build a ‘gold standard’ dataset. Next, we propose robust systems for sentence complexity assessment, using a novel set of features based on leveraging lexical properties of freely available corpora, and investigate the impact of the feature type and corpus size on the classification performance.

Why Can't You Convince Me? Modeling Weaknesses in Unpersuasive Arguments
Isaac Persing, Vincent Ng
Recent work on argument persuasiveness has focused on determining how persuasive an argument is. Oftentimes, however, it is equally important to understand why an argument is unpersuasive, as it is difficult for an author to make her argument more persuasive unless she first knows what errors made it unpersuasive. Motivated by this practical concern, we (1) annotate a corpus of debate comments with not only their persuasiveness scores but also the errors they contain, (2) propose an approach to persuasiveness scoring and error identification that outperforms competing baselines, and (3) show that the persuasiveness scores computed by our approach can indeed be explained by the errors it identifies.
Tuesday 22 15:00  16:00 UAIAPI2  Approximate Probabilistic Inference 2

The Mixing of Markov Chains on Linear Extensions in Practice
Topi Talvitie, Teppo Niinimäki, Mikko Koivisto
We investigate almost uniform sampling from the set of linear extensions of a given partial order. The most efficient schemes stem from Markov chains whose mixing time bounds are polynomial, yet impractically large. We show that, on instances one encounters in practice, the actual mixing times can be much smaller than the worst-case bounds, and particularly so for a novel Markov chain we put forward. We circumvent the inherent hardness of estimating standard mixing times by introducing a refined notion, which admits estimation for moderate-size partial orders. Our empirical results suggest that the Markov chain approach to sampling linear extensions can be made to scale well in practice, provided that the actual mixing times can be realized by instance-sensitive upper bounds or termination rules. Examples of the latter include existing perfect simulation algorithms, whose running times in our experiments follow the actual mixing times of certain chains, albeit with significant overhead.
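The classic Markov chain for sampling linear extensions, the lazy adjacent-transposition chain of Karzanov and Khachiyan, is simple to state. The sketch below is that baseline chain, not the novel chain the abstract proposes, and it runs for a fixed number of steps instead of using any termination rule. Pairs `(a, b)` in `relations` are assumed to mean that a must precede b, and `start` must already be a linear extension.

```python
import random

def is_linear_extension(order, relations):
    """True if every pair (a, b) in `relations` has a before b in `order`."""
    pos = {x: i for i, x in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in relations)

def adjacent_swap_chain(start, relations, steps, seed=0):
    """Run the lazy adjacent-transposition chain from the linear extension `start`.

    Each step: with probability 1/2 do nothing; otherwise pick a uniformly random
    adjacent pair and swap it unless the partial order forces the current order.
    """
    rng = random.Random(seed)
    order = list(start)
    forced = set(relations)
    if len(order) < 2:
        return order
    for _ in range(steps):
        if rng.random() < 0.5:  # lazy step
            continue
        i = rng.randrange(len(order) - 1)
        a, b = order[i], order[i + 1]
        # If a < b held only transitively (a < c < b), c would sit between a and b,
        # so two adjacent elements can only be forced by a listed pair.
        if (a, b) not in forced:
            order[i], order[i + 1] = b, a
    return order

relations = [(0, 1), (2, 3)]
sample = adjacent_swap_chain([0, 1, 2, 3], relations, steps=1000, seed=1)
print(is_linear_extension(sample, relations))  # True
```

Every allowed swap is proposed with the same probability in both directions, so the chain is symmetric and its stationary distribution is uniform over linear extensions; how many steps are needed in practice is the question the abstract addresses.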

Approximating Discrete Probability Distribution of Image Emotions by Multi-Modal Features Fusion
Sicheng Zhao, Guiguang Ding, Yue Gao, Jungong Han
Existing work on image emotion recognition mainly assigns the dominant emotion category or average dimension values to an image, based on the assumption that viewers can reach a consensus on the emotion of images. However, the image emotions perceived by viewers are subjective by nature and highly related to personal and situational factors. Moreover, image emotions can be conveyed by different features, such as semantics and aesthetics. In this paper, we propose a novel machine learning approach that formulates categorical image emotions as a discrete probability distribution (DPD). To associate emotions with the extracted visual features, we present a weighted multi-modal shared sparse learning method to learn the combination coefficients, with which the DPD of an unseen image can be predicted by linearly integrating the DPDs of the training images. The representation abilities of different modalities are jointly explored, and the optimal weight of each modality is automatically learned. Extensive experiments on three datasets verify the superiority of the proposed method compared to the state of the art.

Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data
Ruohui Wang, Dahua Lin
We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that they allow new components to be introduced on the fly as needed. This, however, poses an important challenge to distributed estimation: how to handle new components efficiently and consistently. To tackle this problem, we propose a new estimation method that allows new components to be created locally in individual computing nodes. Components corresponding to the same cluster are identified and merged via a probabilistic consolidation scheme. In this way, we can maintain the consistency of estimation with very low communication cost. Experiments on large real-world data sets show that the proposed method can achieve high scalability in distributed and asynchronous environments without compromising the mixing performance.
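The property that a DPMM can introduce components on the fly is easiest to see in the Chinese restaurant process, the sequential view of the Dirichlet process prior. The sketch below only simulates cluster assignments under that prior with an assumed concentration parameter `alpha`; it contains no likelihood, no inference, and none of the distributed consolidation scheme described above.

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Assign n items to clusters under a CRP with concentration alpha > 0.

    Item i (0-based) joins existing cluster k with probability size_k / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    sizes = []        # sizes[k] = number of items currently in cluster k
    assignments = []  # assignments[i] = cluster index of item i
    for i in range(n):
        r = rng.random() * (i + alpha)  # the sizes sum to i, the rest is alpha
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                assignments.append(k)
                break
        else:
            sizes.append(1)  # a new component created on the fly
            assignments.append(len(sizes) - 1)
    return assignments, sizes

assignments, sizes = chinese_restaurant_process(100, alpha=1.0, seed=3)
print(len(sizes), sizes)
```

The number of clusters is not fixed in advance; it grows slowly with n (roughly logarithmically for fixed `alpha`), which is exactly what makes keeping component identities consistent across nodes non-trivial.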

Compressed Nonparametric Language Modelling
Ehsan Shareghi, Gholamreza Haffari, Trevor Cohn
Hierarchical Pitman-Yor Process (HPYP) priors are compelling for learning language models, outperforming point-estimate-based methods. However, these models remain unpopular due to computational and statistical inference issues, such as memory and time usage, as well as poor mixing of the sampler. In this work we propose a novel framework which represents the HPYP model compactly using compressed suffix trees. We then develop an efficient approximate inference scheme in this framework that has a much lower memory footprint than the full HPYP and is fast at inference time. The experimental results show that our model can be built on significantly larger datasets than previous HPYP models, while being several orders of magnitude smaller, fast for training and inference, and outperforming the perplexity of the state-of-the-art Modified Kneser-Ney count-based LM smoothing by up to 15%.
Tuesday 22 15:00  16:00 KRCMR  Common Sense Reasoning

Explicit Knowledge-based Reasoning for Visual Question Answering
Peng Wang, Qi Wu, Chunhua Shen, Anthony Dick, Anton van den Hengel
We describe a method for visual question answering which is capable of reasoning about an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in the image, but can explain the reasoning by which it developed its answer. It is capable of answering far more complex questions than the predominant long short-term memory-based approach, and outperforms it significantly in testing. We also provide a dataset and a protocol by which to evaluate general visual question answering methods.

Induction of Interpretable Possibilistic Logic Theories from Relational Data
Ondrej Kuzelka, Jesse Davis, Steven Schockaert
The field of statistical relational learning (SRL) is concerned with learning probabilistic models from relational data. Learned SRL models are typically represented using some kind of weighted logical formulas, which makes them considerably more interpretable than those obtained by e.g. neural networks. In practice, however, these models are often still difficult to interpret correctly, as they can contain many formulas that interact in nontrivial ways and weights do not always have an intuitive meaning. To address this, we propose a new SRL method which uses possibilistic logic to encode relational models. Learned models are then essentially stratified classical theories, which explicitly encode what can be derived with a given level of certainty. Compared to Markov Logic Networks (MLNs), our method is faster and produces considerably more interpretable models.

What Can You Do with a Rock? Affordance Extraction via Word Embeddings
Nancy Fulda, Daniel Ricks, Ben Murdoch, David Wingate
Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance extraction is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a tagged Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance in most cases. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.

How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval
Rodrigo Toro Icarte, Alvaro Soto, Jorge Baier, Cristian Ruz
The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g., "a ball is used by a football player" and "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies, specifically MIT's ConceptNet ontology, can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESP-GAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations.
Tuesday 22 15:00  16:00 MTCBH  Computational Biology and eHealth

The DNA Word Design Problem: A New Constraint Model and New Results
Michael Codish, Michael Frank, Vitaly Lagoon
A fundamental problem in coding theory concerns the computation of A_q^U(n,d), the maximum cardinality of a set S of length-n code words over an alphabet of size q such that every pair of code words has Hamming distance at least d and the set of additional constraints U on S is satisfied. This problem has applications in several areas, one of which is the design of DNA codes, where q=4 and the alphabet is {A,C,G,T}. We describe a new constraint model for this problem and demonstrate that it improves on previous solutions (computes better lower bounds) for various instances of the problem. Our approach is based on clustering DNA words into small sets of words. Solutions are then obtained as the union of such clusters. Our approach is SAT-based: we specify constraints on clusters of DNA words and solve these using a Boolean satisfiability solver.
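The definition of A_q^U(n,d) translates directly into a checker. The sketch below uses a fixed GC content as an example constraint in U (a constraint commonly imposed on DNA codes; the abstract does not list U) and builds a valid code with a simple greedy lexicographic pass. Any valid code gives a lower bound on A_4^U(n,d), but this greedy pass is not the authors' SAT-based clustering approach and is included only to make the definitions concrete; the function names are hypothetical.

```python
from itertools import combinations, product

def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

def gc_content(word):
    """Number of G or C letters in a DNA word."""
    return sum(ch in "GC" for ch in word)

def satisfies(code, d, gc=None):
    """Pairwise Hamming distance >= d and, if gc is given, fixed GC content per word."""
    if gc is not None and any(gc_content(w) != gc for w in code):
        return False
    return all(hamming(u, v) >= d for u, v in combinations(code, 2))

def greedy_code(n, d, gc=None):
    """Greedy lexicographic construction; its size is a lower bound on A_4^U(n,d)."""
    code = []
    for letters in product("ACGT", repeat=n):
        w = "".join(letters)
        if gc is not None and gc_content(w) != gc:
            continue
        if all(hamming(w, c) >= d for c in code):
            code.append(w)
    return code

print(len(greedy_code(3, 1)))  # 64: with d = 1 every distinct word is allowed
code = greedy_code(4, 3, gc=2)
print(satisfies(code, 3, gc=2))  # True
```

Greedy construction only certifies a lower bound; closing the gap to the true maximum is what requires the search-based constraint model described in the abstract.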

Deep Neural Networks for High Dimension, Low Sample Size Data
Liu Bo, Ying WEI, Yu Zhang, Qiang Yang
Deep neural networks (DNN) have achieved breakthroughs in applications with large sample size. However, when facing high dimension, low sample size (HDLSS) data, such as the phenotype prediction problem using genetic data in bioinformatics, DNN suffers from overfitting and high-variance gradients. In this paper, we propose a DNN model tailored for the HDLSS data, named Deep Neural Pursuit (DNP). DNP selects a subset of high dimensional features for the alleviation of overfitting and takes the average over multiple dropouts to calculate gradients with low variance. As the first DNN method applied on the HDLSS data, DNP enjoys the advantages of the high nonlinearity, the robustness to high dimensionality, the capability of learning from a small number of samples, the stability in feature selection, and the end-to-end training. We demonstrate these advantages of DNP via empirical results on both synthetic and real-world biological datasets.

Fast Sparse Gaussian Markov Random Fields Learning Based on Cholesky Factorization
Ivan Stojkovic, Vladisav Jelisavcic, Veljko Milutinovic, Zoran Obradovic
Learning the sparse Gaussian Markov Random Field, or equivalently, estimating the sparse inverse covariance matrix, is an approach to uncovering the underlying dependency structure in data. Most current methods solve the problem by optimizing the maximum likelihood objective with a Laplace (L1) prior on the entries of the precision matrix. We propose a novel objective with a regularization term which penalizes an approximate product of the Cholesky-decomposed precision matrix. This new reparametrization of the penalty term allows efficient coordinate descent optimization, which, in synergy with an active set approach, results in a very fast and efficient method for learning the sparse inverse covariance matrix. We evaluated the speed and solution quality of the newly proposed SCHL method on problems with up to 24,840 variables. Our approach was several times faster than three state-of-the-art approaches. We also demonstrate that SCHL can be used to discover interpretable networks by applying it to a high-impact problem from the health informatics domain.

Predicting Alzheimer's Disease Cognitive Assessment via Robust Low-Rank Structured Sparse Model
Jie Xu, Cheng Deng, Xinbo Gao, Dinggang Shen, Heng Huang
Alzheimer's disease (AD) is a neurodegenerative disorder with slow onset, which could result in the deterioration of the duration of persistent neurological dysfunction. Identifying informative longitudinal phenotypic neuroimaging markers and predicting cognitive measures are crucial for recognizing AD at an early stage. Many existing models relate imaging measures to cognitive status using regression models, but they do not fully consider the interaction between cognitive scores. In this paper, we propose a robust low-rank structured sparse regression method (RLSR) to address this issue. The proposed model simultaneously selects effective features and learns the underlying structure between cognitive scores by utilizing novel mixed structured sparsity-inducing norms and low-rank approximation. In addition, an efficient algorithm with proven convergence is derived to solve the proposed nonsmooth objective function. Empirical studies on cognitive data of the ADNI cohort demonstrate the superior performance of the proposed method.
Tuesday 22 15:00  16:00 CSCS2  Constraint Satisfaction 2

Solving Integer Linear Programs with a Small Number of Global Variables and Constraints
Pavel Dvořák, Eduard Eiben, Robert Ganian, Dušan Knop, Sebastian Ordyniak
Integer Linear Programming (ILP) has a broad range of applications in various areas of artificial intelligence. Yet, in spite of recent advances, we still lack a thorough understanding of which structural restrictions make ILP tractable. Here we study ILP instances consisting of a small number of "global" variables and/or constraints such that the remaining part of the instance consists of small and otherwise independent components; this is captured in terms of a structural measure we call fracture backdoors, which generalizes, for instance, the well-studied class of N-fold ILP instances. Our main contributions can be divided into three parts. First, we formally develop fracture backdoors and obtain exact and approximation algorithms for computing them. Second, we exploit these backdoors to develop several new parameterized algorithms for ILP; the performance of these algorithms naturally scales with the number of global variables or constraints in the instance. Finally, we complement the developed algorithms with matching lower bounds. Altogether, our results paint a near-complete complexity landscape of ILP with respect to fracture backdoors.

Efficiency Through Procrastination: Approximately Optimal Algorithm Configuration with Runtime Guarantees
Kevin Leyton-Brown, Robert Kleinberg, Brendan Lucier
Algorithm configuration methods have achieved much practical success, but to date have not been backed by meaningful performance guarantees. We address this gap with a new algorithm configuration framework, Structured Procrastination. With high probability and nearly as quickly as possible in the worst case, our framework finds an algorithm configuration that provably achieves near-optimal performance. Moreover, its running time requirements asymptotically dominate those of existing methods.

An Effective Learnt Clause Minimization Approach for CDCL SAT Solvers
Mao Luo, Chu-Min Li, Fan Xiao, Felip Manyà, Zhipeng Lü
Learnt clauses in CDCL SAT solvers often contain redundant literals. This may have a negative impact on performance because redundant literals may degrade both the effectiveness of Boolean constraint propagation and the quality of subsequent learnt clauses. To overcome this drawback, we define a new inprocessing SAT approach that eliminates redundant literals from learnt clauses by applying Boolean constraint propagation. Learnt clause minimization is activated before the SAT solver triggers selected restarts, and it affects only some learnt clauses during the search process. We conducted an empirical evaluation on instances from the hard combinatorial and application categories of recent SAT competitions. The results show that a remarkable number of additional instances are solved when the approach is incorporated into five of the best-performing CDCL SAT solvers (Glucose, TC_Glucose, COMiniSatPS, MapleCOMSPS and MapleCOMSPS_LRB).
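
The minimization step can be pictured in the style of clause vivification: falsify the literals of a learnt clause one at a time and run unit propagation over the rest of the clause database. The sketch below is a simplified, hypothetical illustration of that idea, assuming DIMACS-style integer literals and a naive propagation loop; the function names are invented for illustration, and it is not the authors' implementation (it omits their policy for choosing which clauses and restarts to process).

```python
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means not x3


def lit_value(assign: Dict[int, bool], lit: int) -> Optional[bool]:
    """Truth value of a literal under a partial assignment (None if unassigned)."""
    val = assign.get(abs(lit))
    return None if val is None else val == (lit > 0)


def propagate(clauses: List[Clause], assign: Dict[int, bool]) -> bool:
    """Naive unit propagation to a fixpoint; extends `assign` in place.
    Returns False on conflict (some clause has all of its literals false)."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            values = [lit_value(assign, lit) for lit in clause]
            if True in values:
                continue  # clause already satisfied
            unassigned = [lit for lit, v in zip(clause, values) if v is None]
            if not unassigned:
                return False  # conflict
            if len(unassigned) == 1:
                unit = unassigned[0]
                assign[abs(unit)] = unit > 0  # unit clause forces this literal true
                changed = True
    return True


def minimize_clause(learnt: Clause, others: List[Clause]) -> Clause:
    """Vivification-style shortening of `learnt` using the clauses in `others`."""
    assign: Dict[int, bool] = {}
    kept: Clause = []
    for lit in learnt:
        value = lit_value(assign, lit)
        if value is True:
            # others imply (kept or lit): the remaining literals are redundant
            return kept + [lit]
        if value is False:
            # lit is implied false once `kept` is falsified, so it can be dropped
            continue
        kept.append(lit)
        assign[abs(lit)] = lit < 0  # make lit false
        if not propagate(others, assign):
            # conflict: others already imply the shorter clause `kept`
            return kept
    return kept
```

For example, with the database (not x1 or x2), (not x2 or x3), falsifying "not x1" propagates x3, so the learnt clause (not x1 or x3 or x4) shrinks to (not x1 or x3).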

Efficient Weighted Model Integration via SMT-Based Predicate Abstraction
Paolo Morettin, Andrea Passerini, Roberto Sebastiani
Weighted model integration (WMI) is a recent formalism generalizing weighted model counting (WMC) to run probabilistic inference over hybrid domains, characterized by both discrete and continuous variables and relationships between them. Albeit powerful, the original formulation of WMI suffers from some theoretical limitations, and it is computationally very demanding because it requires explicitly enumerating all possible models to be integrated over. In this paper we present a novel general notion of WMI, which fixes the theoretical limitations and allows for exploiting the power of SMT-based predicate abstraction techniques. A novel algorithm combines a strong reduction in the number of models to be integrated over with their efficient enumeration. Experimental results on synthetic and real-world data show drastic computational improvements over the original WMI formulation as well as over existing alternatives for hybrid inference.
Tuesday 22 15:00  16:00 PLMDP  Markov Decision Processes

Improved Strong Worst-case Upper Bounds for MDP Planning
Anchit Gupta, Shivaram Kalyanakrishnan
The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP PLANNING, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved STRONG WORST-CASE upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of states n and the number of actions k in the specified MDP; they have no dependence on affiliated variables such as the discount factor and the number of bits needed to represent the MDP. Worst-case bounds apply to EVERY run of an algorithm; randomised algorithms can typically yield faster EXPECTED running times. While the special case of 2-action MDPs (that is, k = 2) has recently received some attention, bounds for general k have not been improved for several decades. Our contributions are to this general case. For k >= 3, the tightest strong upper bound shown to date for MDP planning belongs to a family of algorithms called Policy Iteration. This bound is only a polynomial improvement over a trivial bound of poly(n, k) k^{n} [Mansour and Singh, 1999]. In this paper, we generalise a contrasting algorithm called the Fibonacci Seesaw, and derive a bound of poly(n, k) k^{0.6834n}. The key construct we use is a template to map algorithms for the 2-action setting to the general setting. Interestingly, this idea can also be used to design Policy Iteration algorithms with a running time upper bound of poly(n, k) k^{0.7207n}. Both our results improve upon bounds that have stood for several decades.
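
For readers unfamiliar with the Policy Iteration family mentioned above, the following is a minimal textbook sketch of Howard's policy iteration (evaluate the current policy, then switch every improvable state to a greedy action) for a small discounted MDP. It assumes tabular lists P[s][a][s'] and R[s][a] and a discount factor gamma, and uses exact evaluation by Gauss-Jordan elimination; the names are illustrative, and it does not reproduce the paper's Fibonacci Seesaw generalisation or its bounds.

```python
def evaluate(policy, P, R, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi by Gauss-Jordan elimination."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][policy[i]][j] for j in range(n)]
         + [R[i][policy[i]]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def policy_iteration(P, R, gamma=0.9, tol=1e-12):
    """Howard's policy iteration: evaluate, then switch every improvable state
    to a greedy action; stop when no state can be strictly improved."""
    n = len(P)
    policy = [0] * n
    while True:
        V = evaluate(policy, P, R, gamma)
        improved = False
        for s in range(n):
            q = [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=lambda a: q[a])
            if q[best] > q[policy[s]] + tol:  # switch only on strict improvement
                policy[s] = best
                improved = True
        if not improved:
            return policy, V
```

Each iteration strictly improves the policy, so the loop terminates; the strong bounds discussed above concern how many such iterations can occur in the worst case.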

Proactive and Reactive Coordination of Non-dedicated Agent Teams Operating in Uncertain Environments
Pritee Agrawal, Pradeep Varakantham
Domains such as disaster rescue and security patrolling often feature dynamic environments where allocations of tasks to agents become ineffective due to unforeseen conditions that may require agents to leave the team. Agents leave the team either due to the arrival of high-priority tasks (e.g., an emergency, accident or violation) or due to damage to the agent. Existing research in task allocation has only considered a fixed number of agents and, in some instances, the arrival of new agents on the team. However, there is little or no literature that considers situations where agents leave the team after task allocation. To that end, we make the following key contributions. First, we provide a general model to represent non-dedicated teams. Second, we provide a proactive approach based on sample average approximation to generate a strategy that works well across different feasible scenarios of agents leaving the team. We also provide a two-stage approach that yields a two-stage closed-loop policy, which changes the allocation based on the observed state of the team. Third, we provide a reactive approach that rearranges the allocated tasks to better adapt to leaving agents. Finally, we provide a detailed evaluation of our approaches on existing benchmark problems.

Equi-Reward Utility Maximizing Design in Stochastic Environments
Sarah Keren, Luis Pineda, Avigdor Gal, Erez Karpas, Shlomo Zilberstein
We present the Equi-Reward Utility Maximizing Design (ER-UMD) problem for redesigning stochastic environments to maximize agent performance. ER-UMD is well suited to contemporary applications that require offline design of environments where robots and humans act and cooperate. To find an optimal modification sequence, we present two novel solution techniques: a compilation that embeds design into a planning problem, allowing the use of off-the-shelf solvers to find a solution, and a heuristic search in the modification space, for which we present an admissible heuristic. Evaluation shows the feasibility of the approach using standard benchmarks from the probabilistic planning competition and a benchmark we created for a vacuum cleaning robot setting.

Reduction Techniques for Model Checking and Learning in MDPs
Suda Bharadwaj, Stephane Le Roux, Guillermo Perez, Ufuk Topcu
Omega-regular objectives in Markov decision processes (MDPs) reduce to reachability: find a policy that maximizes the probability of reaching a target set of states. Given an MDP, an initial distribution, and a target set of states, such a policy can be computed by most probabilistic model checking tools. If the MDP is only partially specified, i.e., some probabilities are unknown, then model-learning techniques can be used to statistically approximate the probabilities and enable the computation of the desired policy. For fully specified MDPs, reducing the size of the MDP translates into faster model checking; for partially specified MDPs, into faster learning. We provide reduction techniques that allow us to remove irrelevant transition probabilities: transition probabilities (known, or to be learned) that do not influence the maximal reachability probability. Among other applications, these reductions can be seen as a preprocessing of MDPs before model checking or as a way to reduce the number of experiments required to obtain a good approximation of an unknown MDP.
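
The maximal reachability probability that these reductions must preserve is commonly approximated by value iteration. A minimal sketch, assuming each action is represented as a dict from successor state to probability; the function name and stopping rule are illustrative choices, and the sketch does not implement the paper's reduction techniques.

```python
def max_reachability(P, targets, tol=1e-12, max_iters=10_000):
    """P[s] is a list of actions; each action maps successor state -> probability.
    Value iteration from below: returns, per state, an approximation of the
    maximal probability of eventually reaching a state in `targets`."""
    x = [1.0 if s in targets else 0.0 for s in range(len(P))]
    for _ in range(max_iters):
        new = [
            1.0 if s in targets
            else max((sum(p * x[t] for t, p in action.items()) for action in P[s]),
                     default=0.0)
            for s in range(len(P))
        ]
        delta = max(abs(u - v) for u, v in zip(new, x))
        x = new
        if delta < tol:  # simple convergence heuristic, not a certified error bound
            break
    return x
```

For instance, if state 0 offers one action reaching the target with probability 0.5 (via an intermediate state) and another reaching it with probability 0.2, the maximal value computed for state 0 is 0.5.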
Tuesday 22 15:00  16:00 MASCG  Cooperative Games

How to Form Winning Coalitions in Mixed Human-Computer Settings
Yair Zick, Kobi Gal, Yoram Bachrach, Moshe Mash
Despite the prevalence of weighted voting in the real world, there has been relatively little work studying real people's behavior in such settings. This paper proposes a new negotiation game, based on the weighted voting paradigm in cooperative games, where players need to form coalitions and agree on how to share the gains. We show that solution concepts from cooperative game theory (in particular, an extension of the Deegan-Packel Index) provide a good prediction of people's decisions to join a given coalition. With this insight in mind, we design an agent that combines predictive analytics with decision theory to make offers to people in the game. We show that the agent was able to obtain higher shares from coalitions than did people playing other people, without reducing the acceptance rate of its offers. These results demonstrate the potential of incorporating concepts from cooperative game theory in the design of negotiating agents.

Attachment Centrality for Weighted Graphs
Jadwiga Sosnowska, Oskar Skibski
Measuring how central nodes are in terms of connecting a network has recently received increasing attention in the literature. While a few dedicated centrality measures have been proposed, Skibski et al. [2016] showed that the Attachment Centrality is the only one that satisfies certain natural axioms desirable for connectivity. Unfortunately, the Attachment Centrality is defined only for unweighted graphs, which makes this measure ill-fitted for various applications. For instance, covert networks are typically weighted, with the weights carrying additional intelligence about criminals or terrorists and the links between them. To analyse such settings, in this paper we extend the Attachment Centrality to node-weighted and edge-weighted graphs. By an axiomatic analysis, we show that the Attachment Centrality is closely related to the Degree Centrality in weighted graphs.

The Condorcet Principle for Multiwinner Elections: From Shortlisting to Proportionality
Haris Aziz, Edith Elkind, Piotr Faliszewski, Martin Lackner, Piotr Skowron
We study two notions of stability in multiwinner elections that are based on the Condorcet criterion. The first notion was introduced by Gehrlein and is majoritarian in spirit. The second one, local stability, is introduced in this paper and focuses on voter representation. The goal of this paper is to explore these two notions, their implications for restricted domains, and the computational complexity of rules that are consistent with them.
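
Gehrlein's notion can be checked directly from a preference profile. A minimal sketch, assuming each voter submits a strict ranking (a list of candidates, most preferred first) and using the strict-majority form of the condition: every committee member is preferred by more than half of the voters to every non-member. The function names are illustrative, and local stability, the paper's new notion, is not shown.

```python
from itertools import combinations


def majority_prefers(profile, a, b):
    """True if strictly more than half of the voters rank a above b.
    Each ranking is a list of candidates, most preferred first."""
    wins = sum(1 for ranking in profile if ranking.index(a) < ranking.index(b))
    return 2 * wins > len(profile)


def is_gehrlein_stable(profile, committee):
    """Every member must beat every non-member in a pairwise majority contest."""
    outside = set(profile[0]) - set(committee)
    return all(majority_prefers(profile, c, d) for c in committee for d in outside)


def gehrlein_stable_committees(profile, k):
    """Brute force over all size-k committees (exponential; for illustration only)."""
    return [set(w) for w in combinations(sorted(profile[0]), k)
            if is_gehrlein_stable(profile, w)]
```

With the three rankings a>b>c>d, b>a>c>d and a>c>b>d, only {a, b} is stable among committees of size two, since c beats b for just one of the three voters.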

Core Stability in Hedonic Games among Friends and Enemies: Impact of Neutrals
Kazunori Ohta, Nathanaël Barrot, Anisse Ismaili, Yuko Sakurai, Makoto Yokoo
We investigate hedonic games under enemies aversion and friends appreciation, where every agent considers other agents as either a friend or an enemy. We extend these simple preferences by allowing each agent to also consider other agents to be neutral. Neutrals have no impact on her preference, as in a graphical hedonic game. Surprisingly, we discover that neutral agents do not simplify matters but instead cause complexity. We prove that the core can be empty under enemies aversion and the strict core can be empty under friends appreciation. Furthermore, we show that under both preferences, deciding whether the strict core is non-empty is NP^NP-complete. This complexity extends to the core under enemies aversion. We also show that under friends appreciation, we can always find a core-stable coalition structure in polynomial time.
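
The two preference extensions can be stated as lexicographic comparisons of friend and enemy counts, with neutral agents simply not counted. A minimal sketch, assuming friends and enemies are given as a set per agent (all remaining agents are neutral); the function names are illustrative, and the sketch only compares two coalitions rather than computing the core.

```python
def friend_enemy_counts(agent, coalition, friends, enemies):
    """Numbers of friends and enemies of `agent` in `coalition`; neutrals are ignored."""
    others = set(coalition) - {agent}
    return len(others & friends[agent]), len(others & enemies[agent])


def strictly_prefers(agent, s, t, friends, enemies, mode):
    """True if `agent` strictly prefers coalition s to coalition t."""
    fs, es = friend_enemy_counts(agent, s, friends, enemies)
    ft, et = friend_enemy_counts(agent, t, friends, enemies)
    if mode == "friends_appreciation":
        # more friends first; fewer enemies only breaks ties
        return (fs, -es) > (ft, -et)
    if mode == "enemies_aversion":
        # fewer enemies first; more friends only breaks ties
        return (-es, fs) > (-et, ft)
    raise ValueError(f"unknown mode: {mode}")
```

For an agent 1 with friend 2, enemy 3 and neutral 4, the coalition {1, 2, 3} beats {1, 4} under friends appreciation but loses to it under enemies aversion, while adding the neutral agent 4 to {1} changes nothing.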
Tuesday 22 15:00  16:00 MLTSDS1  Time Series and Data Streams 1

Retaining Data from Streams of Social Platforms with Minimal Regret
Thanh Tam Nguyen, Matthias Weidlich, Chi Thang Duong, Hongzhi Yin, Quoc Viet Hung Nguyen
Today's social platforms, such as Twitter and Facebook, continuously generate massive volumes of data. The resulting data streams exceed any reasonable limit for permanent storage, especially since data is often redundant, overlapping, sparse, and generally of low value. This calls for means to retain only a small fraction of the data in an online manner. In this paper, we propose techniques to effectively decide which data to retain, such that the induced loss of information, the regret of neglecting certain data, is minimized. These techniques not only enable efficient processing of massive streaming data but are also adaptive, addressing the dynamic nature of social media. Experiments on large-scale real-world datasets illustrate the feasibility of our approach in terms of both runtime and information quality.

Disambiguating Energy Disaggregation: A Collective Probabilistic Approach
Sabina Tomkins, Jay Pujara, Lise Getoor
Reducing household energy usage is a priority for improving the resiliency and stability of the power grid and decreasing the negative impact of energy consumption on the environment and public health. Relevant and timely feedback about the power consumption of specific appliances can help household residents reduce their energy demand. Given only a total energy reading, such as that collected from a residential meter, energy disaggregation strives to discover the consumption of individual appliances. Existing disaggregation algorithms are computationally inefficient and rely heavily on high-resolution ground truth data. We introduce a probabilistic framework that infers the energy consumption of individual appliances using a hinge-loss Markov random field (HL-MRF), which admits highly scalable inference. To further enhance efficiency, we introduce a temporal representation that leverages state duration. We also explore how contextual information impacts solution quality with low-resolution data. Our framework is flexible in its ability to incorporate additional constraints; by constraining appliance usage with context and duration, we can better disambiguate appliances with similar energy consumption profiles. We demonstrate the effectiveness of our framework on two public real-world datasets, reducing the error relative to a previous state-of-the-art method by as much as 50%.

Modelling the Working Week for Multi-Step Forecasting using Gaussian Process Regression
Pasan Karunaratne, Masud Moshtaghi, Shanika Karunasekera, Aaron Harwood, Trevor Cohn
In time-series forecasting, regression is a popular method, with Gaussian Process Regression widely held to be the state of the art. The versatility of Gaussian Processes has led to them being used in many varied application domains. However, though many real-world applications involve data that follows a working-week structure, where weekends exhibit substantially different behavior from weekdays, methods for explicitly modelling working-week effects in Gaussian Process Regression models have not been proposed. Failing to model the working week explicitly ignores a significant source of information that can be invaluable in forecasting scenarios. In this work we provide novel kernel-combination methods to explicitly model working-week effects in time-series data for more accurate predictions using Gaussian Process Regression. Further, we demonstrate that prediction accuracy can be improved by constraining the non-convex optimization process of finding optimal hyperparameter values. We validate the effectiveness of our methods by performing multi-step prediction on two real-world, publicly available time-series datasets: one relating to electricity Smart Meter data of the University of Melbourne, and the other relating to the counts of pedestrians in the City of Melbourne.
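
One hypothetical way to express a working-week effect through kernel combination is to add a shared squared-exponential trend to a second squared-exponential term that is switched off between points of different day types (weekday versus weekend). The sketch below is an illustrative assumption, not one of the paper's kernels: the time origin (t = 0 at midnight on a Monday), the hyperparameter values, and the function names are all invented. The result is a valid kernel because the "same day type" indicator is itself positive semi-definite, and sums and products of such kernels remain so.

```python
import math


def is_weekend(t):
    """t is time in days, assuming t = 0 is midnight at the start of a Monday."""
    return int(math.floor(t)) % 7 >= 5


def se(t1, t2, variance, lengthscale):
    """Squared-exponential kernel."""
    return variance * math.exp(-((t1 - t2) ** 2) / (2.0 * lengthscale ** 2))


def working_week_kernel(t1, t2, shared=(1.0, 30.0), daytype=(1.0, 7.0)):
    """Hypothetical kernel: a slow trend shared by all days plus a term that only
    correlates points of the same day type. Each tuple is (variance, lengthscale in days)."""
    same_type = 1.0 if is_weekend(t1) == is_weekend(t2) else 0.0
    return se(t1, t2, *shared) + same_type * se(t1, t2, *daytype)
```

Under this kernel, midday Friday is more strongly correlated with midday Thursday than with midday Saturday, even though both are one day away.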

Stochastic Online Anomaly Analysis for Streaming Time Series
Zhao Xu, Kristian Kersting, Lorenzo von Ritter
Identifying patterns in time series that exhibit anomalous behavior is of increasing importance in many domains, such as financial and Web data analysis. In real applications, time series data often arrive continuously, and usually only a single scan through the data is allowed. Batch learning and retrospective segmentation methods are not well suited to such scenarios. In this paper, we present an online nonparametric Bayesian method, OLAD, for anomaly analysis in streaming time series. Moreover, we develop a novel and efficient online learning approach for the OLAD model based on stochastic gradient descent. The proposed method can effectively learn the underlying dynamics of anomaly-contaminated heavy-tailed time series and identify potential anomalous events. Empirical analysis on real-world datasets demonstrates the effectiveness of our method.
Tuesday 22 15:00  16:00 MLRL  Relational Learning

Clustering-Based Relational Unsupervised Representation Learning with an Explicit Distributed Representation
Sebastijan Dumancic, Hendrik Blockeel
The goal of unsupervised representation learning is to extract a new representation of data, such that solving many different tasks becomes easier. Existing methods typically focus on vectorized data and offer little support for relational data, which additionally describes relationships among instances. In this work we introduce an approach for relational unsupervised representation learning. Viewing a relational dataset as a hypergraph, new features are obtained by clustering vertices and hyperedges. To find a representation suited for many relational learning tasks, a wide range of similarities between relational objects is considered, e.g. feature and structural similarities. We experimentally evaluate the proposed approach and show that models learned on such latent representations perform better, have lower complexity, and outperform the existing approaches on classification tasks.

Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment
Muhao Chen, Yingtao Tian, Mohan Yang, Carlo Zaniolo
Many recent works have demonstrated the benefits of knowledge graph embeddings in completing monolingual knowledge graphs. Inasmuch as related knowledge bases are built in several different languages, achieving cross-lingual knowledge alignment will help people construct a coherent knowledge base and assist machines in dealing with different expressions of entity relationships across diverse human languages. Unfortunately, achieving this highly desirable cross-lingual alignment by human labor is very costly and error-prone. Thus, we propose MTransE, a translation-based model for multilingual knowledge graph embeddings, to provide a simple and automated solution. By encoding entities and relations of each language in a separate embedding space, MTransE provides transitions for each embedding vector to its cross-lingual counterparts in other spaces, while preserving the functionalities of monolingual embeddings. We deploy three different techniques to represent cross-lingual transitions, namely axis calibration, translation vectors, and linear transformations, and derive five variants of MTransE using different loss functions. Our models can be trained on partially aligned graphs, where just a small portion of triples are aligned with their cross-lingual counterparts. The experiments on cross-lingual entity matching and triple-wise alignment verification show promising results, with some variants consistently outperforming others on different tasks. We also explore how MTransE preserves the key properties of its monolingual counterpart.
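
The monolingual translation score and the three kinds of cross-lingual transition named above can each be written as a distance between embeddings. A minimal sketch of the kind of terms involved, assuming L2 distances and tiny hand-written vectors; the function names are hypothetical, and the actual MTransE variants combine such terms into different loss functions trained on the data, which is not shown here.

```python
import math


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def triple_score(h, r, t):
    """TransE-style monolingual score ||h + r - t|| (lower means more plausible)."""
    return norm(sub(add(h, r), t))


def axis_calibration(e, e_other):
    """Aligned entities are pushed to the same point: ||e - e'||."""
    return norm(sub(e, e_other))


def translation_vector(e, v, e_other):
    """A language-pair vector v translates one space into the other: ||e + v - e'||."""
    return norm(sub(add(e, v), e_other))


def linear_transform(M, e, e_other):
    """A language-pair matrix M maps one space into the other: ||M e - e'||."""
    return norm(sub(matvec(M, e), e_other))
```

Lower scores indicate better-aligned pairs; for example, a 90-degree rotation matrix maps [1, 0] exactly onto [0, 1], giving a linear-transformation score of zero.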

When Does Label Propagation Fail? A View from a Network Generative Model
Yuto Yamaguchi, Kohei Hayashi
What kinds of data does Label Propagation (LP) work best on? Can we justify the solution of LP from a theoretical standpoint? LP is a semi-supervised learning algorithm that is widely used to predict unobserved node labels on a network (e.g., a user's gender on an SNS). Despite its importance, its theoretical properties remain mostly unexplored. In this paper, we answer the above questions by interpreting LP from a statistical viewpoint. As our main result, we identify the network generative model behind the discretized version of LP (DLP), and we show that under specific conditions the solution of DLP is equal to the maximum a posteriori estimate of that generative model. Our main result reveals the critical limitations of LP. Specifically, we discover that LP would not work best on networks with (1) disassortative node labels, (2) clusters having different edge densities, (3) non-uniform label distributions, or (4) unreliable provided node labels. Our experiments under a variety of settings support our theoretical results.
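
A minimal sketch of classic iterative label propagation on an unweighted graph: each unlabelled node repeatedly takes the average of its neighbours' class scores while labelled nodes stay clamped, and the prediction is the argmax. The dictionary representation and function name are assumptions; this is the standard algorithm, not the paper's discretized DLP variant or its generative model.

```python
def label_propagation(adj, seeds, num_classes, iters=200):
    """Classic iterative label propagation on an unweighted, undirected graph.
    adj: node -> list of neighbours; seeds: node -> class index (kept clamped).
    Returns node -> predicted class (argmax of the propagated scores)."""
    scores = {v: [0.0] * num_classes for v in adj}
    for v, c in seeds.items():
        scores[v][c] = 1.0
    for _ in range(iters):
        updated = {}
        for v, nbrs in adj.items():
            if v in seeds or not nbrs:
                updated[v] = scores[v]  # clamp labelled (and isolated) nodes
            else:
                updated[v] = [sum(scores[u][c] for u in nbrs) / len(nbrs)
                              for c in range(num_classes)]
        scores = updated
    return {v: max(range(num_classes), key=lambda c: scores[v][c]) for v in adj}
```

On a path whose two ends carry different labels, each interior node takes the label of the nearer end, which reflects the assortative structure LP relies on.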

Tensor Decomposition with Missing Indices
Yuto Yamaguchi, Kohei Hayashi
How can we decompose a data tensor if the indices are partially missing? Tensor decomposition is a fundamental tool to analyze tensor data. Suppose, for example, we have a 3rd-order tensor $\mathcal{X}$ where each element $\mathcal{X}_{ijk}$ takes $1$ if user $i$ posts word $j$ at location $k$ on Twitter. Standard tensor decomposition assumes that all the indices are observed, but in some tweets, location $k$ can be missing. In this paper, we study a tensor decomposition problem where the indices ($i$, $j$, or $k$) of some observed elements are partially missing. To address this problem, we propose a probabilistic tensor decomposition model that handles missing indices as latent variables. To infer them, we derive an algorithm based on stochastic variational inference, which makes it possible to leverage the information from the incomplete data in a scalable way. The experiments on both synthetic and real datasets show that the proposed method achieves higher accuracy in the tensor completion task than baselines that cannot handle missing indices.
Tuesday 22 16:30  18:00 Competition Angry Birds
Tuesday 22 16:30  18:00 ROBRV  Robotics and Vision

Locality Preserving Matching
Jiayi Ma, Ji Zhao, Hanqi Guo, Junjun Jiang, Huabing Zhou, Yuan Gao
Seeking reliable correspondences between two feature sets is a fundamental and important task in computer vision. This paper attempts to remove mismatches from given putative image feature correspondences. To achieve this goal, an efficient approach, termed locality preserving matching (LPM), is designed, the principle of which is to maintain the local neighborhood structures of the potential true matches. We formulate the problem as a mathematical model and derive a closed-form solution with linearithmic time and linear space complexities. More specifically, our method can accomplish mismatch removal from thousands of putative correspondences in only a few milliseconds. Experiments on various real image pairs for general feature matching, as well as for visual homing and image retrieval, demonstrate the generality of our method for handling different types of image deformations, and show that it is more than two orders of magnitude faster than state-of-the-art methods while achieving comparable or better accuracy.
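
The neighborhood-preservation principle can be illustrated with a deliberately simplified filter: keep a putative match only if enough of its nearest neighbours in the first image are also its nearest neighbours in the second. This is a hypothetical sketch, not LPM's actual cost function or closed-form solution; it uses brute-force neighbour search (quadratic overall rather than the linearithmic complexity stated above), and the neighbourhood size k and threshold min_shared are caller-chosen assumptions.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def filter_matches(src, dst, k, min_shared):
    """Keep putative match i (src[i] -> dst[i]) when at least `min_shared` of its
    k nearest neighbours in the first image are also among its k nearest
    neighbours in the second image, i.e. its local neighbourhood is preserved."""
    return [i for i in range(len(src))
            if len(knn(src, i, k) & knn(dst, i, k)) >= min_shared]
```

For instance, if points keep their relative layout under a translation but one match is sent to the far end of the second image, only that match loses its neighbours and is rejected.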

Fast Preprocessing for Robust Face Sketch Synthesis
Yibing Song, Jiawei Zhang, Linchao Bao, Qingxiong Yang
Exemplar-based face sketch synthesis methods often face a challenging problem when input photos are captured under lighting conditions different from those of the training photos. The critical step causing the failure is the search for similar patch candidates for an input photo patch. Conventional illumination-invariant patch distances are adopted rather than directly relying on pixel intensity differences, but they fail when the local contrast within a patch changes. In this paper, we propose a fast preprocessing method named Bidirectional Luminance Remapping (BLR), which interactively adjusts the lighting of training and input photos. Our method can be directly integrated into state-of-the-art exemplar-based methods to improve their robustness at negligible computational cost.

Is My Object in This Video? Reconstruction-based Object Search in Videos
Tan Yu, Jingjing Meng, Junsong Yuan
This paper addresses the problem of video-level object instance search, which aims to retrieve the videos in the database that contain a given query object instance. Without prior knowledge about "when" and "where" an object of interest may appear in a video, determining "whether" a video contains the target object is computationally prohibitive, as it requires exhaustively matching the query against all possible spatial-temporal locations in each video where an object may appear. To alleviate the computational and memory cost, we propose the Reconstruction-based Object SEarch (ROSE) method. It encodes a huge corpus of features of possible spatial-temporal locations in the video into the parameters of a reconstruction model. Since the memory cost of storing the reconstruction model is much lower than that of storing the features of possible spatial-temporal locations in the video, the efficiency of the search is significantly boosted. Comprehensive experiments on three benchmark datasets demonstrate the promising performance of the proposed ROSE method.

Combining Models from Multiple Sources for RGB-D Scene Recognition
Xinhang Song, Shuqiang Jiang, Luis Herranz
Depth can complement RGB with useful cues about object volumes and scene layout. However, RGB-D image datasets are still too small for directly training deep convolutional neural networks (CNNs), in contrast to the massive monomodal RGB datasets. Previous works in RGB-D recognition typically combine two separate networks for RGB and depth data, pretrained with a large RGB dataset and then fine-tuned to the respective target RGB and depth datasets. These approaches have several limitations: 1) they only use low-level filters learned from RGB data, and thus cannot properly exploit depth-specific patterns, and 2) RGB and depth features are only combined at high levels but rarely at lower levels. In this paper, we propose a framework that leverages both knowledge acquired from large RGB datasets and depth-specific cues learned from the limited depth data, obtaining more effective multi-source and multimodal representations. We propose a multimodal combination method that selects discriminative combinations of layers from the different source models and target modalities, capturing both high-level properties of the task and intrinsic low-level properties of both modalities.

Cross-Granularity Graph Inference for Semantic Video Object Segmentation
Huiling Wang, Tinghuai Wang, Ke Chen, Joni-Kristian Kämäräinen
We address semantic video object segmentation via a novel cross-granularity hierarchical graphical model that integrates tracklet and object proposal reasoning with superpixel labeling. Tracklets characterize varying spatial-temporal relations of video objects but quite often suffer from sporadic local outliers. In order to acquire high-quality tracklets, we propose a transductive inference model that is capable of calibrating short-range noisy object tracklets with respect to long-range dependencies and high-level context cues. At the center of this work lies a new paradigm of semantic video object segmentation beyond modeling the appearance and motion of objects locally, where the semantic label is inferred by jointly exploiting multi-scale contextual information and spatial-temporal relations of video objects. We evaluate our method on two popular semantic video object segmentation benchmarks and demonstrate that it advances the state of the art, achieving higher accuracy than other leading methods.

Synthesizing Samples for Zero-shot Learning
Yuchen Guo, Guiguang Ding, Jungong Han, Yue Gao
Zero-shot learning (ZSL) aims to construct recognition models for unseen target classes that have no labeled samples for training. It utilizes class attributes or semantic vectors as side information and transfers supervision information from related source classes with abundant labeled samples. Existing ZSL approaches adopt an intermediary embedding space to measure the similarity between a sample and the attributes of a target class to perform zero-shot classification. However, this approach may suffer from information loss caused by the embedding process, and the similarity measure cannot fully make use of the data distribution. In this paper, we propose a novel approach that turns the ZSL problem into a conventional supervised learning problem by synthesizing samples for the unseen classes. First, the probability distribution of an unseen class is estimated using the knowledge from seen classes and the class attributes. Second, samples are synthesized based on the distribution for the unseen class. Finally, any supervised classifier can be trained on the synthesized samples. Extensive experiments on benchmarks demonstrate the superiority of the proposed approach over state-of-the-art ZSL approaches.
Tuesday 22 16:30  18:00 NLPAT1  NLP Applications and Tools 1

Multi-Modal Word Synset Induction
Jesse Thomason, Raymond J. Mooney
A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (synsets) in an unsupervised fashion. Given pairs of images and text with noun phrase labels, we perform synset induction to produce collections of underlying concepts described by one or more noun phrases. We find that considering multimodal features from both visual and textual context yields better induced synsets than using either context alone. Human evaluations show that our unsupervised, multimodally induced synsets are comparable in quality to annotation-assisted ImageNet synsets, achieving about 84% of ImageNet synsets' approval.

DDoS Event Forecasting using Twitter Data
Zhongqing Wang, Yue Zhang
Distributed Denial of Service (DDoS) attacks have been significant threats to the Internet. Traditional research in cyber security focuses on detecting emerging DDoS attacks by tracing network packet flows. A characteristic of DDoS defense is that rescue time is limited once an attack is launched. More resilient detection and defense models are typically more costly. We aim to predict the likelihood of DDoS attacks by monitoring relevant text streams in social media, so that the level of defense can be adjusted dynamically to maximize cost-effectiveness. To our knowledge, this is a novel and challenging research question for DDoS rescue. Because the input of this task is a text stream rather than a document, information should be collected both from the textual content of individual posts and from the stream as a whole. We propose a fine-grained hierarchical stream model to capture semantic information over an infinitely long history and to reveal burstiness and trends. Empirical evaluation shows that social text streams are indeed informative for DDoS forecasting, and our proposed hierarchical model is more effective than strong baseline text stream models and discrete bag-of-words models.

A Neural Model for Joint Event Detection and Summarization
Zhongqing Wang, Yue Zhang
Twitter new event detection aims to identify first stories in a tweet stream. Typical approaches consider two subtasks. First, mundane or irrelevant tweets must be filtered out. Second, tweets are grouped automatically into event clusters. Traditionally, these two subtasks are processed separately and integrated in a pipeline, despite the interdependence between them. A further related task is summarization, which extracts a succinct summary to represent a large group of tweets. In the new event setting, summarization is related to detection in that salient information is shared between event-representing tweets and informative event summaries. In this paper, we build a joint model to filter, cluster, and summarize tweets for new events. In particular, deep representation learning is used to vectorize tweets, which serves as the basis connecting the tasks. A neural stacking model is used to integrate the pipeline of subtasks and to improve sharing between predecessor and successor tasks. Experiments show that our proposed neural joint model is more effective than its pipeline baseline.

Fast Parallel Training of Neural Language Models
Tong Xiao, Jingbo Zhu, Tongran Liu, Chunliang Zhang
Training neural language models (NLMs) is very time-consuming, and parallelization is needed to speed it up. However, standard training methods scale poorly across multiple devices (e.g., GPUs) because of the large time cost of transmitting data for gradient sharing during backpropagation. In this paper, we present a sampling-based approach that reduces data transmission for better scaling of NLMs. As a bonus, the resulting model also speeds up training on a single device. Our approach yields significant speed improvements on a recurrent neural network-based language model. On four NVIDIA GTX1080 GPUs, it achieves a speedup of more than 2.1 times over the standard asynchronous stochastic gradient descent baseline, with no increase in perplexity. It is also 4.2 times faster than the naive single-GPU counterpart.

Joint Learning on Relevant User Attributes in Microblog
Jingjing Wang, Shoushan Li, Guodong Zhou
User attribute classification aims to identify users’ attributes (e.g., gender, age and profession) by leveraging user generated content. However, conventional approaches to user attribute classification focus on single attribute classification involving only one user attribute, which completely ignores the relationship among various user attributes. In this paper, we confront a novel scenario in user attribute classification where relevant user attributes are jointly learned, attempting to make the relevant attribute classification tasks help each other. Specifically, we propose a joint learning approach, namely AuxLSTM, which first learns a proper auxiliary representation between the related tasks and then leverages the auxiliary representation to integrate the learning process in both tasks. Empirical studies demonstrate the effectiveness of our proposed approach to joint learning on relevant user attributes.

Active Learning for Black-Box Semantic Role Labeling with Neural Factors
Chenguang Wang, Yunyao Li, Laura Chiticariu
Active learning is a useful technique for tasks for which unlabeled data is abundant but manual labeling is expensive. One example of such a task is semantic role labeling (SRL), which relies heavily on labels from trained linguistic experts. One challenge in applying active learning algorithms to SRL is that complete knowledge of the SRL model is often unavailable, contrary to the common assumption that active learning methods are aware of the details of the underlying models. In this paper, we present an active learning framework for black-box SRL models (i.e., models whose details are unknown). In lieu of a query strategy based on model details, we propose a neural query strategy model that embeds both language and semantic information to automatically learn the query strategy from the predictions of an SRL model alone. Our experimental results demonstrate the effectiveness of both this new active learning framework and the neural query strategy model.
Tuesday 22 16:30  18:00 MLCL2  Classification 2

Convolutional 2D LDA for Nonlinear Dimensionality Reduction
Qi Wang, Zequn Qin, Feiping Nie, Yuan Yuan
Representing high-volume and high-order data is an essential problem, especially in machine learning. Although existing two-dimensional (2D) discriminant analysis achieves promising performance, its single and linear projection features make it difficult to analyze more complex data. In this paper, we propose a novel convolutional two-dimensional linear discriminant analysis (2D LDA) method for data representation. To deal with nonlinear data, a specially designed convolutional neural network (CNN) is presented, which can be proved to have an objective function equivalent to that of common 2D LDA. In this way, the discriminant ability benefits not only from the nonlinearity of CNNs but also from their powerful learning process. Experimental results on several datasets show that the proposed method achieves higher classification accuracy than other state-of-the-art methods.

Hierarchical Feature Selection with Recursive Regularization
Hong Zhao, Pengfei Zhu, Ping Wang, Qinghua Hu
In the big data era, the sizes of datasets have increased dramatically in terms of the number of samples, features, and classes. In particular, there usually exists a hierarchical structure among the classes; this kind of task is called hierarchical classification. Various algorithms have been developed to select informative features for flat classification. However, these algorithms ignore the semantic hyponymy in the hierarchy of classes and select a uniform subset of features for all classes. In this paper, we propose a new technique for hierarchical feature selection based on recursive regularization. This algorithm takes the hierarchical information of the class structure into account. As opposed to flat feature selection, we select a different feature subset for each node in a hierarchical tree structure, using the parent-child and sibling relationships for hierarchical regularization. By imposing $\ell_{2,1}$-norm regularization on different parts of the hierarchical classes, we can learn a sparse matrix for the feature ranking of each node. Extensive experiments on public datasets demonstrate the effectiveness of the proposed algorithm.
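
The $\ell_{2,1}$-norm of a weight matrix is the sum of the Euclidean norms of its rows (one row per feature); penalizing it drives whole rows to zero, and the surviving row norms give a feature ranking. The sketch below computes only these two quantities, under the assumption that each node's weights are stored as a list of feature rows; it is not the authors' recursive-regularization solver, and the matrix values are made up for illustration.

```python
import math

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (a list of feature rows)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def rank_features(W):
    """Feature indices ordered by row norm, largest first.

    Rows that l2,1 regularization has driven to zero rank last (unused features).
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in W]
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)

# Hypothetical weights for one node: 4 features (rows) x 2 child classes (columns).
W = [
    [3.0, 4.0],  # row norm 5.0
    [0.0, 0.0],  # zero row: feature not selected at this node
    [1.0, 0.0],  # row norm 1.0
    [0.0, 2.0],  # row norm 2.0
]
print(l21_norm(W))       # 8.0
print(rank_features(W))  # [0, 3, 2, 1]
```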

Classification and Representation Joint Learning via Deep Networks
Ya Li, Xinmei Tian, Xu Shen, Dacheng Tao
Deep learning has been proven to be effective for classification problems. However, the majority of previous works trained classifiers by considering only class label information, ignoring the local information from the spatial distribution of training samples. In this paper, we propose a deep learning framework that considers both class label information and local spatial distribution information between training samples. A two-channel network with shared weights is used to measure the local distribution. The more detailed information provided by the local distribution can improve classification performance, particularly when training samples are insufficient. Additionally, the class label information helps to learn better feature representations than other feature learning methods that use only local distribution information between samples. The local distribution constraint between sample pairs can also be viewed as a regularization of the network, which can effectively prevent overfitting. Extensive experiments are conducted on several benchmark image classification datasets, and the results demonstrate the effectiveness of our proposed method.

Discriminant Tensor Dictionary Learning with Neighbor Uncorrelation for Image Set Based Classification
Fei Wu, Xiao-Yuan Jing, Wangmeng Zuo, Ruiping Wang, Xiaoke Zhu
Image set based classification (ISC) has attracted considerable research interest in recent years. Several ISC methods have been developed, and methods based on dictionary learning achieve state-of-the-art performance. However, existing ISC methods usually transform each image sample of a set into a vector for subsequent processing, which breaks the inherent spatial structure of the image samples and the set. In this paper, we use a tensor to model an image set with two spatial modes and one set mode, which can fully explore the intrinsic structure of the image set. We propose a novel ISC approach, named discriminant tensor dictionary learning with neighbor uncorrelation (DTDLNU), which jointly learns two spatial dictionaries and one set dictionary. The spatial and set dictionaries are composed of set-specific sub-dictionaries corresponding to the class labels, such that the reconstruction error is discriminative. To obtain dictionaries with favorable discriminative power, DTDLNU designs a neighbor-uncorrelated discriminant tensor dictionary term, which minimizes the within-class scatter of the training sets in the projected tensor space and reduces the tensor dictionary correlation among set-specific sub-dictionaries corresponding to neighbor sets from different classes. Experiments on three challenging datasets demonstrate the effectiveness of DTDLNU.

Learning Feature Engineering for Classification
Fatemeh Nargesian, Horst Samulowitz, Udayan Khurana, Elias Khalil, Deepak Turaga
Feature engineering is the task of improving predictive modelling performance on a dataset by transforming its feature space. Existing approaches to automating this process rely either on exploring the transformed feature space through evaluation-guided search, or on explicitly expanding datasets with all transformed features followed by feature selection. Such approaches incur high computational costs in runtime and/or memory. We present a novel technique, called Learning Feature Engineering (LFE), for automating feature engineering in classification tasks. LFE is based on learning, from past feature engineering experiences, the effectiveness of applying a transformation (e.g., arithmetic or aggregate operators) to numerical features. Given a new dataset, LFE recommends a set of useful transformations to apply to its features without relying on model evaluation or explicit feature expansion and selection. Using a collection of datasets, we train a set of neural networks that aim to predict the transformation that impacts classification performance positively. Our empirical results show that LFE outperforms other feature engineering approaches for an overwhelming majority (89%) of the datasets from various sources while incurring a substantially lower computational cost.

Feature Selection via Scaling Factor Integrated Multi-Class Support Vector Machines
Jinglin Xu, Feiping Nie, Junwei Han
In data mining, we often encounter high-dimensional and noisy features, which may not only increase the load on computational resources but also lead to model overfitting. Feature selection is often adopted to address this issue. In this paper, we propose a novel feature selection method based on multi-class SVM, which introduces a scaling factor with a flexible parameter to readjust the distribution of feature weights and select the most discriminative features. Concretely, the proposed method designs a scaling factor with a p/2 power to control the distribution of weights adaptively and search for the optimal sparsity of the weighting matrix. In addition, to solve the proposed model, we provide an alternating, iterative optimization method. It not only solves for the weighting matrix and the scaling factor independently, but also provides a better way to address the problem of solving the L2,0-norm. Comprehensive experiments on six datasets demonstrate that this work obtains better performance than a number of existing state-of-the-art multi-class feature selection methods.
Tuesday 22 16:30  18:00 MLDM2  Data Mining 2

Doubly Sparsifying Network
Zhangyang Wang, Shuai Huang, Jiayu Zhou, Thomas S. Huang
We propose the doubly sparsifying network (DSN), drawing inspiration from the double sparsity model for dictionary learning. DSN emphasizes the joint utilization of both the problem structure and the parameter structure. It simultaneously sparsifies the output features and the learned model parameters under one unified framework. DSN enjoys intuitive model interpretation, compact model size and low complexity. We compare DSN against a few carefully designed baselines to verify its consistently superior performance in a wide range of settings. Encouraged by its robustness to insufficient training data, we explore the applicability of DSN to brain signal processing, a challenging interdisciplinary area. DSN is evaluated on two mainstream tasks, electroencephalographic (EEG) signal classification and blood oxygenation level dependent (BOLD) response prediction, achieving promising results on both.

Improved Bounded Matrix Completion for Large-Scale Recommender Systems
Huang Fang, Cho-Jui Hsieh, Zhang Zhen, Yiqun Shao
Matrix completion is a widely used technique for personalized recommender systems. In this paper, we focus on Bounded Matrix Completion (BMC), which imposes bound constraints on the original matrix completion problem. BMC has been shown to work well on several real-world datasets, and an efficient coordinate descent solver called BMA has been proposed in \cite{bma}. However, we observe that the BMA algorithm sometimes fails to converge to a stationary point, resulting in relatively poor accuracy in those cases. To overcome this issue, we propose a new approach for solving BMC under the ADMM framework. The proposed algorithm is guaranteed to converge to stationary points. Experimental results on real-world datasets show that, compared with BMA, our algorithm reaches a lower objective value, obtains higher prediction accuracy, and scales better. We also show that our method outperforms state-of-the-art standard matrix factorization in most cases.

Multiview Feature Learning with Discriminative Regularization
Jinglin Xu, Junwei Han, Feiping Nie
Multiview data, which can capture rich information from heterogeneous features, are increasingly used in real-world applications. How to integrate different types of features, and how to learn low-dimensional and discriminative information from high-dimensional data, are two main challenges. To address these challenges, this paper proposes a novel multiview feature learning framework that is regularized by discriminative information. It yields a feature learning model containing a discriminative feature weighting matrix for each view, which produces multiple low-dimensional features for subsequent multiview clustering. To optimize the formulated objective function, we transform the proposed framework into a trace optimization problem whose global solution is obtained in closed form. Experimental evaluations on four widely used datasets and comparisons with a number of state-of-the-art multiview clustering algorithms demonstrate the superiority of the proposed work.

LoCaTe: Influence Quantification for Location Promotion in Location-based Social Networks
Ankita Likhyani, Srikanta Bedathur, Deepak P
Location-based social networks (LBSNs) such as Foursquare offer a platform for users to share and be aware of each other’s physical movements. As a result of sharing check-in information with each other, users can be influenced to visit (or check in at) the locations visited by their friends. Quantifying such influence in LBSNs is useful in various settings such as location promotion, personalized recommendation, and mobility pattern prediction. In this paper, we focus on the problem of location promotion and develop a model to quantify the location-specific influence between a pair of users. Specifically, we develop a joint model called LoCaTe, consisting of (i) a user mobility model estimated using kernel density estimates; (ii) a model of the semantics of the location using topic models; and (iii) a model of the time gap between check-ins using an exponential distribution. We validate our model on a long-term crawl of Foursquare data collected between Jan 2015 and Feb 2016, as well as on publicly available LBSN datasets. Our experiments demonstrate that LoCaTe significantly outperforms state-of-the-art models for the same task.
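
Components (i) and (iii) above rest on two standard building blocks: a kernel density estimate over past check-in locations and an exponential density over time gaps. The sketch below shows only those blocks, under stated assumptions: planar coordinates, an isotropic Gaussian kernel with a fixed bandwidth, and a maximum-likelihood rate. How LoCaTe combines them with the topic model is not reproduced, and all names and data are hypothetical.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Density over 2D locations from past check-ins (isotropic Gaussian kernel).

    points: list of (x, y) coordinates, assumed already projected to a plane.
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
            for px, py in points
        )

    return density

def fit_rate(gaps):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean gap."""
    return len(gaps) / sum(gaps)

def exponential_pdf(gap, rate):
    """Density of a time gap between check-ins under Exp(rate)."""
    return rate * math.exp(-rate * gap) if gap >= 0 else 0.0

# Hypothetical data: a user's past check-ins and gaps (hours) after a friend's visit.
mobility = gaussian_kde_2d([(0.0, 0.0), (1.0, 0.5), (0.2, -0.3)], bandwidth=0.5)
rate = fit_rate([2.0, 5.0, 1.0, 8.0])  # 4 gaps summing to 16 hours -> 0.25 per hour
print(mobility(0.1, 0.0), exponential_pdf(3.0, rate))
```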

Effective Representing Information Network by Variational Autoencoder
Hang Li, Haozheng Wang, Zhenglu Yang, Haochen Liu
Network representation is the basis of many applications and is of extensive interest in various fields, such as information retrieval, social network analysis, and recommendation systems. Most previous methods for network representation consider only incomplete aspects of the problem, including link structure, node information, and partial integration. The present study proposes a deep network representation model that seamlessly integrates the text information and structure of a network. Our model captures highly nonlinear relationships between nodes and complex features of a network by exploiting the variational autoencoder (VAE), a deep unsupervised generative model. We also merge the representation learned with a paragraph vector model and that learned with the VAE to obtain a network representation that preserves both structure and text information. We conduct comprehensive empirical experiments on benchmark datasets and find that our model outperforms state-of-the-art techniques by a large margin.

Cross-Domain Recommendation: An Embedding and Mapping Approach
Tong Man, Huawei Shen, Xueqi Cheng, Xiaolong Jin
Data sparsity is one of the most challenging problems for recommender systems. One promising solution to this problem is cross-domain recommendation, i.e., leveraging feedback or ratings from multiple domains to improve recommendation performance in a collective manner. In this paper, we propose an Embedding and Mapping framework for Cross-Domain Recommendation, called EMCDR. The proposed EMCDR framework distinguishes itself from existing cross-domain recommendation models in two aspects. First, a multilayer perceptron is used to capture the nonlinear mapping function across domains, which offers high flexibility for learning domain-specific features of entities in each domain. Second, only the entities with sufficient data are used to learn the mapping function, guaranteeing its robustness to noise caused by data sparsity in a single domain. Extensive experiments on two cross-domain recommendation scenarios demonstrate that EMCDR significantly outperforms state-of-the-art cross-domain recommendation methods.
Tuesday 22 16:30  18:00 CSST  Solvers and Tools

Scalable Constraint-based Virtual Data Center Allocation
Sam Bayless, Nodir Kodirov, Ivan Beschastnikh, Holger Hoos, Alan Hu
Constraint-based techniques can solve challenging problems arising from highly diverse applications. This paper considers the problem of virtual data center (VDC) allocation, an important, emerging challenge for modern data center operators. To solve this problem, we introduce NETSOLVER, which is based on the general-purpose constraint solver MONOSAT. NETSOLVER represents a major improvement over existing approaches: it is sound, complete, and scalable, providing support for end-to-end, multi-path bandwidth guarantees across all the layers of hosting infrastructure, from servers to top-of-rack switches to aggregation switches to access routers. NETSOLVER scales to realistic data center sizes and VDC topologies, typically requiring just seconds to allocate VDCs of 5–15 virtual machines to physical data centers with 1000+ servers, maintaining this efficiency even when the data center is nearly saturated. In many cases, NETSOLVER can allocate 150%–300% as many total VDCs to the same physical data center as previous methods. Essential to our solution efficiency is our formulation of VDC allocation using monotonic theories, illustrating the practical value of the recently proposed SAT modulo monotonic theories approach.

On Computing World Views of Epistemic Logic Programs
Tran Son, Tiep Le, Patrick Kahl, Anthony Leclerc
This paper presents a novel algorithm for computing world views of different semantics of epistemic logic programs (ELP) and two of its realizations, called Epasp (for an older semantics) and Epasp^{se} (for the newest semantics), whose implementation builds on the theoretical advancement in the study of ELPs and takes advantage of the multi-shot computation paradigm of the answer set solver Clingo. The new algorithm differs from the majority of earlier algorithms in its strategy. Specifically, it computes one world view at a time and utilizes properties of world views to reduce its search space. It starts by computing an answer set and then determines whether or not a world view containing this answer set exists. In addition, it allows the computation to focus on world views satisfying certain properties. The paper includes an experimental analysis of the performance of the two solvers compared against a recently developed solver. It also contains an analysis of their performance in goal-directed computing against a logic programming based conformant planning system, dlvk. It concludes with some final remarks and a discussion of future work.

Stochastic Constraint Programming with And-Or Branch-and-Bound
Behrouz Babaki, Tias Guns, Luc de Raedt
Complex multi-stage decision making problems often involve uncertainty, for example, regarding demand or processing times. Stochastic constraint programming was proposed as a way to formulate and solve such decision problems, involving arbitrary constraints over both decision and random variables. What stochastic constraint programming still lacks is support for the use of factorized probabilistic models that are popular in the graphical model community. We show how a state-of-the-art probabilistic inference engine can be integrated into standard constraint solvers. The resulting approach searches over the And-Or search tree directly, and we investigate tight bounds on the expected utility objective. This significantly improves search efficiency and outperforms scenario-based methods that ground out the possible worlds.

An Improved Decision-DNNF Compiler
Pierre Marquis, Jean-Marie Lagniez
We present and evaluate a new compiler, called d4, targeting the Decision-DNNF language. Like the state-of-the-art compilers C2D and Dsharp targeting the same language, d4 is a top-down tree-search algorithm exploring the space of propositional interpretations. d4 is based on the same ingredients as those considered in C2D and Dsharp (mainly, disjoint component analysis, conflict analysis and non-chronological backtracking, component caching). d4 takes advantage of a dynamic decomposition approach based on hypergraph partitioning, used sparingly. Some simplification rules are also used to minimize the time spent in the partitioning steps and to promote the quality of the decompositions. Experiments show that the compilation times and the sizes of the Decision-DNNF representations computed by d4 are in many cases significantly lower than those obtained by C2D and Dsharp.

Solving Stochastic Boolean Satisfiability under Random-Exist Quantification
Nian-Ze Lee, Yen-Shi Wang, Jie-Hong Jiang
Stochastic Boolean Satisfiability (SSAT) is a powerful formalism for representing computational problems with uncertainty, such as belief network inference and propositional probabilistic planning. Solving SSAT formulas lies in the same complexity class (PSPACE-complete) as solving Quantified Boolean Formulas (QBF). While many efforts have been made to enhance QBF solving, SSAT has drawn relatively little attention in recent years. This paper focuses on random-exist quantified SSAT formulas and proposes an algorithm combining binary decision diagrams (BDDs), logic synthesis, and modern SAT techniques to improve computational efficiency. Unlike prior exact SSAT algorithms, the proposed method can easily be modified to solve approximate SSAT by deriving upper and lower bounds on the satisfying probability. Experimental results show that our method outperforms the state-of-the-art algorithm on random k-CNF formulas and applies effectively to approximate SSAT on circuit benchmarks.
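
The quantity being computed can be pinned down by brute force: for a formula R x1..xn . E y1..ym . φ, sum the probability of every assignment to the random variables for which some assignment to the existential variables satisfies φ. The sketch below does exactly that. It is exponential and serves only as a reference definition, not the paper's BDD and logic-synthesis algorithm; the DIMACS-style clause encoding and the function names are our own choices.

```python
from itertools import product

def satisfies(cnf, assignment):
    """True if every clause contains a literal made true by `assignment`.

    Literals are non-zero ints (DIMACS style): v means "v is True", -v means "v is False".
    """
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf)

def re_ssat_probability(cnf, random_vars, exist_vars):
    """Satisfying probability of  R x1..xn . E y1..ym . cnf  by exhaustive enumeration.

    random_vars maps each randomly quantified variable to its probability of being True;
    exist_vars lists the existentially quantified variables (chosen after seeing x).
    Exponential in n + m: a reference definition, not an efficient solver.
    """
    rvars = list(random_vars)
    total = 0.0
    for rvals in product([False, True], repeat=len(rvars)):
        assignment = dict(zip(rvars, rvals))
        weight = 1.0
        for v, val in assignment.items():
            weight *= random_vars[v] if val else 1.0 - random_vars[v]
        # Count this random assignment if some existential assignment satisfies the CNF.
        if any(satisfies(cnf, {**assignment, **dict(zip(exist_vars, evals))})
               for evals in product([False, True], repeat=len(exist_vars))):
            total += weight
    return total

# (x1 or y3) and (x2 or not y3), with x1, x2 fair coins and y3 existential.
print(re_ssat_probability([[1, 3], [2, -3]], {1: 0.5, 2: 0.5}, [3]))  # 0.75
```

Only the case x1 = x2 = False is unsatisfiable for every choice of y3, so three of the four equally likely random assignments count.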

SVD-free Convex-Concave Approaches for Nuclear Norm Regularization
Yichi Xiao, Zhe Li, Tianbao Yang, Lijun Zhang
Minimizing a convex function of matrices regularized by the nuclear norm arises in many applications such as collaborative filtering and multi-task learning. In this paper, we study the general setting where the convex function could be non-smooth. When the size of the data matrix, denoted by m × n, is very large, existing optimization methods are inefficient because in each iteration, they need to perform a singular value decomposition (SVD), which takes O(m^2 n) time. To reduce the computation cost, we exploit the dual characterization of the nuclear norm to introduce a convex-concave optimization problem and design a subgradient-based algorithm without performing SVD. In each iteration, the proposed algorithm only computes the largest singular vector, reducing the time complexity from O(m^2 n) to O(mn). To the best of our knowledge, this is the first SVD-free convex optimization approach for nuclear-norm regularized problems that does not rely on the smoothness assumption. Theoretical analysis shows that the proposed algorithm converges at an optimal O(1/√T) rate, where T is the number of iterations. We also extend our algorithm to the stochastic case, where only stochastic subgradients of the convex function are available, and to a special case that contains an additional non-smooth regularizer (e.g., an L1-norm regularizer). We conduct experiments on robust low-rank matrix approximation and link prediction to demonstrate the efficiency of our algorithms.
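
The per-iteration saving described above comes from needing only the leading singular vector rather than a full SVD. One standard way to get it with O(mn) work per step is power iteration, which uses one product with A and one with its transpose. The sketch below assumes power iteration for illustration; the abstract does not say which method the authors use, and the helper names are hypothetical.

```python
import math
import random

def matvec(A, x):
    """A @ x for a matrix stored as a list of rows: O(mn) work."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    """A^T @ y: also O(mn) work."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def normalize(x):
    """Return (x / ||x||, ||x||)."""
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x], norm

def top_singular_pair(A, iters=200, seed=0):
    """Approximate the leading singular triple (sigma, u, v) of A by power iteration.

    Each iteration costs two matrix-vector products, i.e. O(mn) time, instead of
    the O(m^2 n) full SVD that the abstract attributes to existing methods.
    """
    rng = random.Random(seed)
    v, _ = normalize([rng.gauss(0.0, 1.0) for _ in range(len(A[0]))])
    u, sigma = [], 0.0
    for _ in range(iters):
        u, _ = normalize(matvec(A, v))
        v, sigma = normalize(rmatvec(A, u))  # ||A^T u|| converges to sigma_max
    return sigma, u, v

A = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
sigma, u, v = top_singular_pair(A)
print(round(sigma, 6))  # 3.0
```

Convergence speed depends on the gap between the two largest singular values; in the example the ratio is 1/3, so a few dozen iterations already suffice.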
Tuesday 22 16:30  18:00 MASNCG  Noncooperative Games

Playing Repeated Network Interdiction Games with Semi-Bandit Feedback
Qingyu Guo, Bo An, Long Tran-Thanh
We study repeated network interdiction games with no prior knowledge of the adversary and the environment, which can model many real-world network security domains. Existing works often require plenty of available information for the defender and neglect the frequent interactions between both players, which is unrealistic and impractical, and thus they are not suitable for our settings. As such, we provide the first defender strategy that enjoys nice theoretical and practical performance guarantees, by applying the adversarial online learning approach. In particular, we model the repeated network interdiction game with no prior knowledge as an online linear optimization problem, for which we propose a novel and efficient online learning algorithm, SBGA, that exploits the unique semi-bandit feedback in network security domains. We prove that SBGA achieves sublinear regret against an adaptive adversary, compared with both the best fixed strategy in hindsight and a near-optimal adaptive strategy. Extensive experiments also show that SBGA significantly outperforms existing approaches with a fast convergence rate.

Comparing Strategic Secrecy and Stackelberg Commitment in Security Games
Qingyu Guo, Bo An, Branislav Bošanský, Christopher Kiekintveld
The Strong Stackelberg Equilibrium (SSE) has drawn extensive attention recently in several security domains. However, the SSE concept neglects the advantage of the defender's strategic revelation of her private information and overestimates the observation ability of the adversaries. In this paper, we overcome these restrictions and analyze the tradeoff between strategic secrecy and commitment in security games. We propose a Disguised-resource Security Game (DSG) where the defender strategically disguises some of her resources. We compare strategic information revelation with public commitment and formally show that they have different advantages depending on the payoff structure. To compute the Perfect Bayesian Equilibrium (PBE), several novel approaches are provided, including a novel algorithm based on support set enumeration and an approximation algorithm for ε-PBE. Extensive experimental evaluation shows that both strategic secrecy and Stackelberg commitment are critical measures in security domains, and that our approaches can efficiently solve PBEs for realistic-sized problems.

Mechanism Design for Strategic Project Scheduling
Pradeep Varakantham, Na Fu
Organizing large-scale projects (e.g., conferences, IT shows, F1 races) requires precise scheduling of multiple dependent tasks on common resources where multiple selfish entities compete to execute the individual tasks. In this paper, we consider a well-studied and rich scheduling model referred to as RCPSP (Resource Constrained Project Scheduling Problem). The key change to this model that we consider is the presence of selfish entities competing to perform individual tasks with the aim of maximizing their own utility. Due to the selfish entities in play, the goal of the scheduling problem is no longer only to minimize the makespan of the entire project, but rather to maximize social welfare while ensuring incentive compatibility and economic efficiency. We show that the traditional VCG mechanism is not incentive compatible in this context, and hence we provide two new practical mechanisms that extend VCG. These new mechanisms, referred to as Individual Completion based Payments (ICP) and Social Completion based Payments (SCP), provide strong theoretical properties including strategy-proofness.
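
As a reference point for the VCG mechanism mentioned above, here is the textbook version (Clarke pivot rule) over a finite set of outcomes: choose the welfare-maximizing outcome and charge each agent the welfare loss its presence imposes on the others. This is the standard mechanism only, not the ICP or SCP mechanisms proposed in the paper, and it does not model RCPSP precedence or resource constraints; the outcome encoding and names are hypothetical.

```python
def welfare(valuations, outcome, excluded=None):
    """Total value of `outcome` to all agents except `excluded`."""
    return sum(v[outcome] for j, v in enumerate(valuations) if j != excluded)

def vcg(outcomes, valuations):
    """Textbook VCG (Clarke pivot rule) over a finite set of outcomes.

    valuations[i][o] is agent i's reported value for outcome o.
    Returns the welfare-maximizing outcome and each agent's payment: the best
    welfare the others could get without agent i minus what they get now.
    """
    chosen = max(outcomes, key=lambda o: welfare(valuations, o))
    payments = []
    for i in range(len(valuations)):
        best_without_i = max(welfare(valuations, o, excluded=i) for o in outcomes)
        payments.append(best_without_i - welfare(valuations, chosen, excluded=i))
    return chosen, payments

# Single item, three bidders: outcome k means "bidder k gets the item".
bids = [[10, 0, 0], [0, 7, 0], [0, 0, 3]]
print(vcg([0, 1, 2], bids))  # (0, [7, 0, 0])
```

With a single item this reduces to a second-price auction: the highest bidder wins and pays the second-highest bid.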

Posted Pricing sans Discrimination
Shreyas Sekar
In the quest for market mechanisms that are easy to implement yet close to optimal, few seem as viable as posted pricing. Despite the growing body of impressive results, however, the performance of most posted-price mechanisms relies crucially on "price discrimination" when multiple copies of a good are available. For the more general case with nonlinear production costs on each good, hardly anything is known for general multi-good markets. With this in mind, we study the problem of social welfare maximization in a Bayesian setting where the seller can produce any number of copies of a good but faces convex production costs for doing so. Our central contribution is a structured framework for decision making and static item pricing in the face of uncertainty and production costs: the seller decides how much to produce and posts a single price per good that is common to all buyers, and the buyers arrive sequentially and purchase utility-maximizing bundles of goods. The framework yields constant-factor approximations to the optimum welfare when buyer valuations are fractionally subadditive, and extends to more general valuations as well as to settings where the seller is completely oblivious to buyer valuations. Our work presents the first known results for non-discriminatory pricing in environments with nonlinear costs where we only have access to stochastic information regarding buyer preferences. At a high level, our results imply that it is often possible to obtain good guarantees without discriminating against buyers, i.e., charging them differently for the same good.

Equilibria in Ordinal Games: A Framework based on Possibility Theory
Régis Sabbadin, Nahla Ben Amor, Hélène Fargier
The present paper proposes the first definition of mixed equilibrium for ordinal games. This definition naturally extends possibilistic (single-agent) decision theory, which allows us to provide a unifying view of single- and multi-agent qualitative decision theory. Our first contribution is to show that ordinal games always admit a possibilistic mixed equilibrium, which can be seen as a qualitative counterpart to mixed (probabilistic) equilibrium. Then, we show that a possibilistic mixed equilibrium can be computed in polynomial time (w.r.t. the size of the game), which contrasts with pure Nash or mixed probabilistic equilibrium computation in cardinal game theory. The definition we propose is thus operational in two ways: (i) it tackles the case when no pure Nash equilibrium exists in an ordinal game; and (ii) it allows an efficient computation of a mixed equilibrium.

Convergence and Quality of Iterative Voting Under Non-Scoring Rules
Aaron Koolyk, Tyrone Strangway, Omer Lev, Jeffrey S. Rosenschein
Iterative voting is a social choice mechanism that assumes all voters are strategic, and allows voters to change their stated preferences as the vote progresses until an equilibrium is reached (at which point no player wishes to change their vote). Previous research established that this process converges to an equilibrium for the plurality and veto voting methods and for no other scoring rule. We consider iterative voting for non-scoring rules, examining the major ones, and show that none of them converge when assuming (as most research so far has) that voters pursue a best-response strategy. We investigate other potential voter strategies with a more heuristic flavor (since for most of these voting rules, calculating the best response is NP-hard), and show that they also do not converge. We then conduct an empirical analysis of the iterative voting winners for these non-scoring rules and compare the winner quality of various strategies.
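The iterative process itself is easy to simulate. The Python sketch below runs best-response dynamics under plurality, the rule for which convergence was previously established, starting from truthful votes; it assumes voters move one at a time in a fixed order, switch only when the switch strictly improves the winner for them, and that ties are broken alphabetically. Swapping in a non-scoring rule would mean replacing the hypothetical `plurality_winner` helper, after which the loop may cycle, which is why a round limit is included.

```python
def plurality_winner(votes, candidates):
    """Plurality winner with alphabetical (lexicographic) tie-breaking."""
    counts = {c: 0 for c in candidates}
    for v in votes:
        counts[v] += 1
    best = max(counts.values())
    return min(c for c in candidates if counts[c] == best)

def iterative_plurality(prefs, candidates, max_rounds=100):
    """Best-response dynamics from truthful votes; prefs[i] ranks candidates best-first."""
    votes = [p[0] for p in prefs]
    changes = 0
    for _ in range(max_rounds):
        moved = False
        for i, pref in enumerate(prefs):
            rank = {c: r for r, c in enumerate(pref)}   # lower rank = more preferred
            current = plurality_winner(votes, candidates)
            best_vote, best_winner = votes[i], current
            for c in candidates:
                trial = votes[:i] + [c] + votes[i + 1:]
                w = plurality_winner(trial, candidates)
                if rank[w] < rank[best_winner]:       # strict improvement only
                    best_vote, best_winner = c, w
            if best_vote != votes[i]:
                votes[i] = best_vote
                moved = True
                changes += 1
        if not moved:                                 # nobody wants to move: equilibrium
            return votes, plurality_winner(votes, candidates), changes
    raise RuntimeError("no equilibrium reached within max_rounds")
```

In the test profile, a voter who ranks b first abandons b to make c win instead of a, after which no voter can improve.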
Tuesday 22 16:30  18:00 KRARTP  Automated Reasoning and Theorem Proving

The Impact of Treewidth on ASP Grounding and Solving
Bernhard Bliem, Marius Moldovan, Michael Morak, Stefan Woltran
In this paper, we study how the performance of modern answer set programming (ASP) solvers is influenced by the treewidth of the input program and investigate the consequences of this relationship. We first perform an experimental evaluation showing that solving performance is heavily influenced by treewidth, given ground input programs that are otherwise uniform in both size and construction. This observation raises an important question for ASP, namely, how to design encodings such that the treewidth of the resulting ground program remains small. To this end, we define the class of connection-guarded programs, which guarantees that the treewidth of the program after grounding depends only on the treewidth (and the degree) of the input instance. To obtain this result, we formalize the grounding process using MSO transductions.
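Treewidth itself is NP-hard to compute exactly, so experiments of this kind typically rely on heuristic upper bounds. The following Python sketch is one such bound, the greedy min-degree elimination ordering, applied to an undirected graph such as the primal graph of a ground program (atoms as vertices, edges between atoms that share a rule); that choice of graph and the helper names are assumptions for illustration, not the paper's tooling.

```python
def graph_from_edges(edges):
    """Symmetric adjacency sets from an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def min_degree_width(adj):
    """Upper bound on treewidth via the greedy min-degree elimination ordering."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    width = 0
    while graph:
        # Eliminate a vertex of minimum degree (ties broken by vertex name).
        v = min(graph, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        width = max(width, len(nbrs))
        # Turn v's neighbourhood into a clique, as vertex elimination requires.
        for a in nbrs:
            graph[a].discard(v)
            graph[a] |= nbrs - {a}
    return width
```

The width of any elimination ordering bounds the treewidth from above, so a small value certifies a small treewidth, while a large one may only reflect a poor ordering.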

ATL Strategic Reasoning Meets Correlated Equilibrium
Xiaowei Huang, Ji Ruan
This paper is motivated by analysing a Google self-driving car accident, in which the car hit a bus, with the framework and tools of strategic reasoning by model checking. First, we find that existing ATL model checking may find a solution to the accident with an irrational joint strategy of the bus and the car. This leads us to treat both the bus and the car as rational agents, so that their joint strategy is an equilibrium under a given solution concept. Second, we find that a joint strategy selected at random from the set of equilibria may result in the collision of the two agents, i.e., the accident. Based on these findings, we suggest taking a Correlated Equilibrium (CE) as the agents' joint strategy and optimising the utilitarian value, which is the expected sum of the agents' total rewards. The language ATL is extended with two new modalities to express the existence of a CE and of a unique CE, respectively. We implement the extension in a software model checker and use the tool to analyse the examples in the paper. We also study the complexity of the model checking problems.
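The incentive constraints behind a correlated equilibrium can be checked mechanically for a one-shot abstraction of the scenario. The Python sketch below, assuming a two-player normal-form game rather than the paper's ATL model-checking setting, verifies that no player gains by deviating from a recommended action and computes the utilitarian value (expected sum of payoffs) of a distribution. The "dare"/"yield" payoffs in the test are the textbook game of chicken, a hypothetical stand-in for the car and the bus.

```python
from itertools import product

def is_correlated_equilibrium(dist, payoffs, actions, tol=1e-12):
    """Check the CE incentive constraints of a two-player normal-form game.

    dist: joint action (a1, a2) -> probability; payoffs: joint action -> (u1, u2);
    actions: (actions of player 1, actions of player 2).
    """
    for player in (0, 1):
        for rec, dev in product(actions[player], repeat=2):
            if rec == dev:
                continue
            # Expected gain from obeying recommendation `rec` instead of playing `dev`.
            gain = 0.0
            for joint, p in dist.items():
                if joint[player] != rec:
                    continue
                deviated = list(joint)
                deviated[player] = dev
                gain += p * (payoffs[joint][player] - payoffs[tuple(deviated)][player])
            if gain < -tol:
                return False
    return True

def utilitarian_value(dist, payoffs):
    """Expected sum of both players' payoffs under the joint distribution."""
    return sum(p * sum(payoffs[j]) for j, p in dist.items())
```

Optimising the utilitarian value over all CEs would add a linear program on top of these constraints; the sketch only checks a given distribution.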

Query Conservative Extensions in Horn Description Logics with Inverse Roles
Jean Christoph Jung, Carsten Lutz, Mauricio Martel, Thomas Schneider
We investigate the decidability and computational complexity of query conservative extensions in Horn description logics (DLs) with inverse roles. This is more challenging than without inverse roles because characterizations in terms of unbounded homomorphisms between universal models fail, blocking the standard approach to establishing decidability. We resort to a combination of automata and mosaic techniques, proving that the problem is 2EXPTIME-complete in Horn-ALCHIF (and also in Horn-ALC and in ELI). We obtain the same upper bound for deductive conservative extensions, for which we also prove a coNEXPTIME lower bound.

Efficient and Complete FD-Solving for Extended Array Constraints
Quentin Plazar, Mathieu Acher, Sébastien Bardin, Arnaud Gotlieb
Array constraints are essential for handling data structures in automated reasoning and software verification. Unfortunately, the use of a typical finite domain (FD) solver based on local consistency-based filtering has strong limitations when constraints on indexes are combined with constraints on array elements and size. This paper proposes an efficient and complete FD-solving technique for extended constraints over (possibly unbounded) arrays. We describe a simple but particularly powerful transformation for building an equisatisfiable formula that can be efficiently solved using standard FD reasoning over arrays, even in the unbounded case. Experiments show that the proposed solver significantly outperforms FD solvers and successfully competes with the best SMT solvers.

Symbolic LTLf Synthesis
Shufang Zhu, Lucas M. Tabajara, Jianwen Li, Geguang Pu, Moshe Y. Vardi
LTLf synthesis is the process of finding a strategy that satisfies a linear temporal specification over finite traces. An existing solution to this problem relies on a reduction to a DFA game. In this paper, we propose a symbolic framework for LTLf synthesis based on this technique, performing the computation over a representation of the DFA as a Boolean formula rather than as an explicit graph. This approach enables strategy generation by utilizing the mechanism of Boolean synthesis. We implement this symbolic synthesis method in a tool called Syft and demonstrate through experiments on scalable benchmarks that the symbolic approach scales better than the explicit one.
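For intuition about what the symbolic method computes, the explicit-state baseline can be written as a least fixpoint over the DFA. The Python sketch below assumes the DFA is given as an explicit transition table indexed by (state, environment input, agent output) and collects the states from which the agent can force a visit to an accepting state; the `agent_first` flag, an assumption of this sketch, selects whether the agent commits to its output before or after seeing the input in each step. The symbolic approach described above performs this kind of fixpoint over a Boolean-formula encoding of the DFA instead of enumerating states.

```python
def winning_region(states, inputs, outputs, delta, accepting, agent_first=False):
    """Least fixpoint of states from which the agent can force reaching `accepting`.

    delta[(s, x, y)] is the successor of state s on environment input x and agent output y.
    Returns the winning states and a strategy: an output (agent_first=True) or an
    input -> output map (agent_first=False) for each newly won state.
    """
    win = set(accepting)
    strategy = {}
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in win:
                continue
            if agent_first:
                # exists y . forall x : delta(s, x, y) in win
                ys = [y for y in outputs if all(delta[(s, x, y)] in win for x in inputs)]
                ok, choice = bool(ys), (ys[0] if ys else None)
            else:
                # forall x . exists y : delta(s, x, y) in win
                ok, choice = True, {}
                for x in inputs:
                    ys = [y for y in outputs if delta[(s, x, y)] in win]
                    if not ys:
                        ok = False
                        break
                    choice[x] = ys[0]
            if ok:
                # Moves only lead to states won earlier, so the strategy makes progress.
                win.add(s)
                strategy[s] = choice
                changed = True
    return win, strategy
```

Each sweep only adds states, so the loop terminates after at most one sweep per state.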

Classical Generalized Probabilistic Satisfiability
Filipe Casal, Andreia Mordido, Carlos Caleiro
We analyze a classical generalized probabilistic satisfiability problem (GGenPSAT), which consists in deciding the satisfiability of Boolean combinations of linear inequalities involving probabilities of classical propositional formulas. GGenPSAT coincides precisely with the satisfiability problem of the probabilistic logic of Fagin et al. and was proved to be NP-complete. Here, we present a polynomial reduction of GGenPSAT to SMT over the quantifier-free theory of linear integer and real arithmetic. Capitalizing on this translation, we implement and test a solver for the GGenPSAT problem. As previously observed for many other NP-complete problems, we are able to detect a phase transition behavior for GGenPSAT.
Tuesday 22 16:30  18:00 MLTSDS2  Time Series and Data Streams 2

A Functional Dynamic Boltzmann Machine
Hiroshi Kajino
Dynamic Boltzmann machines (DyBMs) are recently developed generative models of a time series. They are designed to learn a time series by efficient online learning algorithms, while taking long-term dependencies into account with the help of eligibility traces, recursively updatable memory units storing descriptive statistics of all the past data. Current DyBMs assume a finite-dimensional time series and cannot be applied to a functional time series, in which the dimension goes to infinity (e.g., spatiotemporal data on a continuous space). In this paper, we present a functional dynamic Boltzmann machine (F-DyBM) as a generative model of a functional time series. A technical challenge is to devise an online learning algorithm with which F-DyBM, consisting of functions and integrals, can learn a functional time series using only finite observations of it. We rise to this challenge by combining a kernel-based function approximation method with a statistical interpolation method, and finally derive closed-form update rules. We design numerical experiments to empirically confirm the effectiveness of our solutions. The experimental results demonstrate consistent error reductions compared to baseline methods, from which we conclude that F-DyBM is effective for functional time series prediction.

Bayesian Dynamic Mode Decomposition
Naoya Takeishi, Yoshinobu Kawahara, Yasuo Tabei, Takehisa Yairi
Dynamic mode decomposition (DMD) is a data-driven method for calculating a modal representation of a nonlinear dynamical system, and it has been utilized in various fields of science and engineering. In this paper, we propose Bayesian DMD, which provides a principled way to transfer the advantages of the Bayesian formulation into DMD. To this end, we first develop a probabilistic model corresponding to DMD and then provide a Gibbs sampler for posterior inference in Bayesian DMD. Moreover, as a specific example, we discuss the case of using a sparsity-promoting prior for automatic determination of the number of dynamic modes. We investigate the empirical performance of Bayesian DMD using synthetic and real-world datasets.
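As a point of reference, the deterministic procedure that Bayesian DMD gives a probabilistic counterpart to can be sketched in a few lines of NumPy. This is standard exact DMD, not the paper's Gibbs sampler: it fits a linear operator mapping each snapshot to the next, projects it onto the leading `rank` singular vectors, and returns the eigenvalues and modes. The function name and the toy linear system in the test are illustrative assumptions.

```python
import numpy as np

def exact_dmd(snapshots, rank):
    """Exact DMD of a snapshot matrix whose columns are x_0, x_1, ..., x_m.

    Returns the DMD eigenvalues and modes of the best-fit linear map x_{k+1} ~ A x_k,
    projected onto the leading `rank` left singular vectors of the first m snapshots.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Reduced operator U^* Y V S^{-1}; dividing by s scales column j by 1/s_j.
    A_tilde = (U.conj().T @ Y @ V) / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes Y V S^{-1} W.
    modes = ((Y @ V) / s) @ W
    return eigvals, modes
```

On noise-free data from a linear system with full-rank snapshots, the eigenvalues and modes coincide with those of the underlying operator.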

Hybrid Neural Networks for Learning the Trend in Time Series
Tao Lin, Tian Guo, Karl Aberer
The trend of a time series characterizes its intermediate upward and downward behaviour. Learning and forecasting the trend in time series data play an important role in many real applications, such as resource allocation in data centers and load scheduling in smart grids. Inspired by the recent successes of neural networks, in this paper we propose TreNet, a novel end-to-end hybrid neural network that learns local and global contextual features for predicting the trend of time series. TreNet leverages convolutional neural networks (CNNs) to extract salient features from local raw data of time series. Meanwhile, considering the long-range dependency existing in the sequence of historical trends of time series, TreNet uses a long short-term memory recurrent neural network (LSTM) to capture such dependency. Then, a feature fusion layer learns a joint representation for predicting the trend. TreNet demonstrates its effectiveness by outperforming CNN, LSTM, the cascade of CNN and LSTM, a Hidden Markov Model-based method, and various kernel-based baselines on real datasets.

A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Geoff Jiang, Garrison Cottrell
The nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades. Despite the fact that various NARX models have been developed, few of them can capture long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a., input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively, but can also be easily interpreted. Thorough empirical studies based upon the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.
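The first-stage input attention can be sketched as a scoring-and-softmax step over the driving series. The NumPy sketch below assumes the commonly used additive form, in which each series k is scored by v_e^T tanh(W_e [h; s] + U_e x^k) from the previous encoder hidden state h and cell state s; the parameter names and shapes are assumptions, the weights would be learned in practice, and the recurrent encoder itself is omitted.

```python
import numpy as np

def input_attention(h_prev, s_prev, driving, W_e, U_e, v_e):
    """Attention weights over n driving series for one encoder step.

    h_prev, s_prev: (m,) previous encoder hidden and cell states.
    driving: (n, T) array whose row k is driving series k over the window.
    W_e: (T, 2m), U_e: (T, T), v_e: (T,) attention parameters (learned in practice).
    Returns alpha: (n,) nonnegative weights summing to 1.
    """
    hs = np.concatenate([h_prev, s_prev])                  # (2m,)
    # Row k of `driving @ U_e.T` is U_e @ x^k; W_e @ hs is broadcast across rows.
    scores = np.tanh(W_e @ hs + driving @ U_e.T) @ v_e     # (n,)
    scores = scores - scores.max()                         # numerically stable softmax
    weights = np.exp(scores)
    return weights / weights.sum()
```

The attended input at step t would then be `alpha * driving[:, t]`, which down-weights series judged irrelevant from the current encoder state.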

CHARDA: Causal Hybrid Automata Recovery via Dynamic Analysis
Adam Summerville, Joseph Osborn, Michael Mateas
We propose and evaluate a new technique for learning hybrid automata automatically by observing the runtime behavior of a dynamical system. Working from a sequence of continuous state values and predicates about the environment, CHARDA recovers the distinct dynamic modes, learns a model for each mode from a given set of templates, and postulates causal guard conditions that trigger transitions between modes. Our main contribution is the use of information-theoretic measures (1) as a cost function for data segmentation and model selection to penalize overfitting and (2) to determine the likely causes of each transition. CHARDA is easily extended with different classes of model templates, fitting methods, or predicates. In our experiments on a complex video game character, CHARDA successfully discovers a reasonable over-approximation of the character's true behaviors. Our results also compare favorably against recent work in automatically learning probabilistic timed automata in an aircraft domain: CHARDA exactly learns the modes of these simpler automata.

Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks
Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Qiushi Huang, Jintao Li, Tao Mei
Popularity prediction has a profound impact on social media, since it offers opportunities to reveal individual preference and public attention from evolutionary social systems. Although previous research achieves promising results, it neglects one distinctive characteristic of social data, namely sequentiality. For example, the popularity of online content is generated over time with sequential post streams of social media. To investigate the sequential prediction of popularity, we propose a novel prediction framework called Deep Temporal Context Networks (DTCN), which takes both temporal context and temporal attention into account. DTCN contains three main components, covering embedding, learning, and prediction. With a joint embedding network, we obtain a unified deep representation of multimodal user-post data in a common embedding space. Then, based on the embedded data sequence over time, temporal context learning recurrently learns two adaptive temporal contexts for sequential popularity. Finally, a novel temporal attention is designed to predict new popularity (the popularity of a new user-post pair) with temporal coherence across multiple time scales. Experiments on our released image dataset with about 600K Flickr photos demonstrate that DTCN outperforms state-of-the-art deep prediction algorithms, with an average relative improvement of 21.51% in popularity prediction (Spearman ranking correlation).
Tuesday 22 16:30  18:00 MLKM  Kernel Methods

Large-scale Online Kernel Learning with Random Feature Reparameterization
Tu Dinh Nguyen, Trung Le, Hung Bui, Dinh Phung
A typical online kernel learning method faces two fundamental issues: the complexity of dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty of learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner's theorem; they allow the model to be maintained directly in the random feature space with a fixed dimension, so the model size remains constant w.r.t. data size. In this paper, we further introduce the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both aforementioned challenges. Our initial intuition comes from the so-called "reparameterization trick" [Kingma et al., 2014], which lifts the source of randomness of the Fourier components to another space that can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel and a new, tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model in an online learning setting. We then conduct extensive experiments on several large-scale datasets, demonstrating that our work achieves state-of-the-art performance in both learning efficacy and efficiency.
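The reparameterization idea can be illustrated for a Gaussian kernel with bandwidth sigma, whose spectral density is N(0, sigma^-2 I) by Bochner's theorem: drawing eps ~ N(0, I) once and setting omega = eps / sigma keeps the randomness independent of sigma, so the random-feature estimate becomes differentiable in sigma. The NumPy sketch below is a minimal illustration under that assumption, not the paper's RRF algorithm; it evaluates z(x)^T z(y) for cosine/sine features through the identity cos(a)cos(b) + sin(a)sin(b) = cos(a - b) and returns the pathwise derivative with respect to sigma.

```python
import numpy as np

def rff_kernel_and_grad(x, y, sigma, eps):
    """Random-feature estimate of exp(-||x - y||^2 / (2 sigma^2)) and its d/dsigma.

    eps: (D, d) standard-normal draws, sampled once and reused; the frequencies are
    reparameterized as omega = eps / sigma, so sigma enters only deterministically.
    """
    proj = eps @ (x - y)                            # eps_i^T (x - y), shape (D,)
    # z(x)^T z(y) with [cos, sin] features equals the mean of cos(omega_i^T (x - y)).
    k_hat = np.mean(np.cos(proj / sigma))
    # d/dsigma cos(p / sigma) = sin(p / sigma) * p / sigma^2 (pathwise gradient).
    dk_dsigma = np.mean(np.sin(proj / sigma) * proj / sigma ** 2)
    return k_hat, dk_dsigma
```

Because `eps` is fixed, the returned gradient is the exact derivative of the estimate, which is what makes stochastic gradient steps on sigma possible.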

Multiple Kernel Clustering Framework with Improved Kernels
Yueqing Wang, Xinwang Liu, Yong Dou, Rongchun Li
Multiple kernel clustering (MKC) algorithms have been successfully applied to various applications. However, these successes largely depend on the quality of the predefined base kernels, which cannot be guaranteed in practical applications and may adversely affect clustering performance. To address this issue, we propose a simple yet effective framework to adaptively improve the quality of these base kernels. Under our framework, we instantiate three MKC algorithms based on the widely used multiple kernel $k$-means clustering (MKKM), MKKM with matrix-induced regularization (MKKM-MR), and co-regularized multi-view spectral clustering (CRSC). We then design corresponding algorithms with proven convergence to solve the resulting optimization problems. To the best of our knowledge, our framework fills the gap between kernel adaptation and the clustering procedure for the first time in the literature, and it is readily extendable. Extensive experiments on 7 MKC benchmarks show that our algorithms consistently and significantly improve the performance of the base MKC algorithms, indicating the effectiveness of the proposed framework. Meanwhile, our framework performs better than the compared methods when the kernels are imperfect.

Approximate Large-scale Multiple Kernel k-means using Deep Neuron Network
Yueqing Wang, Xinwang Liu, Yong Dou, Rongchun Li
Multiple kernel clustering algorithms have been extensively studied and applied to various applications. Although they have demonstrated great success in both theory and applications, existing MKKM algorithms cannot be applied to large-scale clustering tasks because of (i) the heavy computational cost of calculating the base kernels and (ii) insufficient memory to load the kernel matrices. In this paper, we propose an approximate algorithm that overcomes these issues and makes MKKM applicable to large-scale applications. Specifically, our algorithm trains a deep neural network to regress the indicator matrix generated by MKC algorithms on a small subset, then obtains the approximate indicator matrix of the whole data set using the trained network, and finally performs $k$-means on the output of the network. In this way, our algorithm avoids computing the full kernel matrices by mapping features directly into the indicator matrix, which dramatically decreases the memory requirement. Extensive experiments show that our algorithm consumes less time than most compared algorithms while achieving performance comparable to MKC algorithms.

Instability Prediction in Power Systems using Recurrent Neural Networks
Ankita Gupta, Gurunath Gurrala, P.S. Sastry
Recurrent Neural Networks (RNNs) can model temporal dependencies in time series well. In this paper, we present an interesting application of a stacked Gated Recurrent Unit (GRU)-based RNN for early prediction of imminent instability in a power system based on normal measurements of power system variables over time. In a power system, disturbances such as a fault can result in transient instability, which may lead to blackouts. Early prediction of any such contingency can help the operator take timely preventive control actions. Recently, some machine learning techniques such as SVMs have been proposed to predict such instability. However, these approaches assume the availability of accurate fault information, such as its occurrence and clearance instants, which is impractical. In this paper, we propose an Online Monitoring System (OMS), a GRU-based RNN that continuously predicts the current status based on past measurements. Through extensive simulations using a standard 118-bus system, we demonstrate the effectiveness of the proposed system. We also show how PCA and predictions from the RNN can be used to identify the most critical generator that leads to transient instability.

Learning Co-Substructures by Kernel Dependence Maximization
Sho Yokoi, Daichi Mochihashi, Ryo Takahashi, Naoaki Okazaki, Kentaro Inui
Modeling associations between items in a dataset is a problem frequently encountered in data and knowledge mining research. Most previous studies have simply applied a predefined fixed pattern for extracting the substructure of each item pair and then analyzed the associations between these substructures. Such fixed patterns, however, may not capture the significant association. We therefore propose the novel machine learning task of extracting a strongly associated substructure pair (co-substructure) from each input item pair. We call this task dependent co-substructure extraction (DCSE) and formalize it as a dependence maximization problem. We then discuss critical issues with this task: the data sparsity problem and a huge search space. To address the data sparsity problem, we adopt the Hilbert-Schmidt independence criterion as an objective function. To improve search efficiency, we adopt the Metropolis-Hastings algorithm. We report the results of empirical evaluations in which the proposed method is applied to acquiring narrative event pairs, a knowledge mining task that is an active area of study in natural language processing.
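The dependence measure adopted here has a simple empirical form. The NumPy sketch below computes the standard biased HSIC estimate tr(KHLH)/(n-1)^2 with Gaussian kernels and centering matrix H = I - 11^T/n; the kernel choice and bandwidth are assumptions for illustration, and the paper's substructure search with Metropolis-Hastings is not shown.

```python
import numpy as np

def gaussian_gram(z, sigma=1.0):
    """Gaussian kernel matrix exp(-||z_i - z_j||^2 / (2 sigma^2)) over the rows of z."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * z @ z.T, 0.0)  # clip round-off
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2; larger means more dependent."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```

Because the statistic only needs kernel matrices, it applies unchanged to structured items once a kernel over substructures is chosen.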

Student-t Process Regression with Student-t Likelihood
Qingtao Tang, Li Niu, Yisen Wang, Tao Dai, Wangpeng An, Jianfei Cai, Shu-Tao Xia
Gaussian Process Regression (GPR) is a powerful Bayesian method. However, the performance of GPR can be significantly degraded when the training data are contaminated by outliers, including target outliers and input outliers. Although there are some variants of GPR (e.g., GPR with Student-t likelihood (GPRT)) aimed at handling outliers, most of them focus on target outliers, while little effort has been devoted to input outliers. In contrast, in this work we aim to handle both target and input outliers at the same time. Specifically, we replace the Gaussian noise in GPR with independent Student-t noise to cope with target outliers. Moreover, to enhance robustness w.r.t. input outliers, we use a Student-t Process prior instead of the common Gaussian Process prior, leading to Student-t Process Regression with Student-t Likelihood (TPRT). We theoretically show that TPRT is more robust to both input and target outliers than GPR and GPRT, and prove that both GPR and GPRT are special cases of TPRT. Various experiments demonstrate that TPRT outperforms GPR and its variants on both synthetic and real datasets.
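The effect of the Student-t likelihood on target outliers is visible in the per-point negative log-likelihood: the Gaussian penalty grows quadratically in the residual, whereas the Student-t penalty grows only logarithmically. The standard-library sketch below compares the two densities; it covers only the likelihood, not the Student-t process prior, and the degrees of freedom `nu` and the `scale` are illustrative parameters.

```python
import math

def gaussian_nll(r, scale=1.0):
    """Negative log-density of a Gaussian residual r with standard deviation `scale`."""
    return 0.5 * math.log(2 * math.pi * scale ** 2) + r ** 2 / (2 * scale ** 2)

def student_t_nll(r, nu, scale=1.0):
    """Negative log-density of a Student-t residual with nu degrees of freedom."""
    return (-math.lgamma((nu + 1) / 2) + math.lgamma(nu / 2)
            + 0.5 * math.log(nu * math.pi * scale ** 2)
            + (nu + 1) / 2 * math.log1p(r ** 2 / (nu * scale ** 2)))
```

A residual of 10 adds 50 to the Gaussian penalty but only about 8 to the Student-t penalty with nu = 4, so a single outlier pulls the fit far less; as nu grows, the Student-t penalty approaches the Gaussian one.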
Tuesday 22 16:30  18:00 AUTSEC  AI & Autonomy: Security

Context-Based Reasoning on Privacy in Internet of Things
Nadin Kokciyan, Pinar Yolum
More and more, devices around us are being connected to each other in the realm of the Internet of Things (IoT). Their communication, and especially their collaboration, promises useful services for end users. However, the same communication channels raise important privacy concerns: it is not clear which information will be shared with whom, for which purposes, and under which conditions. Existing approaches to privacy advocate putting policies in place to regulate privacy. However, the scale and heterogeneity of IoT entities make it infeasible to maintain policies between each and every pair of entities in the system. Instead, it is preferable for each entity to reason about privacy autonomously using norms and context. Accordingly, this paper proposes an approach where each entity finds out which contexts it is in based on information it gathers from other entities in the system. The proposed approach uses argumentation to enable IoT entities to reason about their context and decide whether to reveal information based on it. We demonstrate the applicability of the approach on an IoT scenario.

Privacy and Autonomous Systems
Jose Such
We discuss the problem of privacy in autonomous systems, introducing different conceptualizations and perspectives on privacy to assess the threats that autonomous systems may pose to privacy. We then outline socio-technical and legal measures that should be put in place to mitigate these threats. Beyond privacy threats and countermeasures, we also argue that autonomous systems may, at the same time, be the key to addressing some of the most challenging and pressing privacy problems of today and the near future.

Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning
Rowan McAllister, Yarin Gal, Alex Kendall, Mark van der Wilk, Amar Shah, Roberto Cipolla, Adrian Weller
Autonomous vehicle (AV) software is typically composed of a pipeline of individual components, linking sensor inputs to motor outputs. Erroneous component outputs propagate downstream; hence, safe AV software must consider the ultimate effect of each component’s errors. Further, improving safety alone is not sufficient: passengers must also feel safe to trust and use AV systems. To address such concerns, we investigate three underexplored themes for AV research: safety, interpretability, and compliance. Safety can be improved by quantifying the uncertainties of component outputs and propagating them forward through the pipeline. Interpretability is concerned with explaining what the AV observes and why it makes the decisions it does, building reassurance with the passenger. Compliance refers to maintaining some control for the passenger. We discuss open challenges for research within these themes. We highlight the need for concrete evaluation metrics, propose example problems, and outline possible solutions.

Algorithmic Bias in Autonomous Systems
David Danks, Alex John London
Algorithms play a key role in the functioning of autonomous systems, and so concerns have periodically been raised about the possibility of algorithmic bias. However, debates in this area have been hampered by different meanings and uses of the term "bias." It is sometimes used as a purely descriptive term, sometimes as a pejorative term, and such variations can promote confusion and hamper discussions about when and how to respond to algorithmic bias. In this paper, we first provide a taxonomy of different types and sources of algorithmic bias, with a focus on their different impacts on the proper functioning of autonomous systems. We then use this taxonomy to distinguish between algorithmic biases that are neutral or unobjectionable, and those that are problematic in some way and require a response. In some cases, there are technological or algorithmic adjustments that developers can use to compensate for problematic bias. In other cases, however, responses require adjustments by the agent, whether human or autonomous system, who uses the results of the algorithm. There is no "one size fits all" solution to algorithmic bias.
Tuesday 22 16:30  18:00 MLFSC2  Feature Selection and Construction 2

Self-Paced Multitask Learning with Shared Knowledge
Keerthiram Murugesan, Jaime Carbonell
This paper introduces self-paced task selection to multitask learning, where instances from more closely related tasks are selected in a progression of easier-to-harder tasks, emulating an effective human education strategy in multitask machine learning. We develop the mathematical foundation for the approach based on iterative selection of the most appropriate task, learning the task parameters, and updating the shared knowledge, optimizing a new biconvex loss function. The proposed method applies quite generally, including to multitask feature learning and multitask learning with alternating structure optimization. Results show that in each of these formulations, self-paced (easier-to-harder) task selection outperforms the baseline version of the method in all the experiments.
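Under the commonly used hard self-paced regularizer, the selection weights have a closed form: with a pace parameter `lam`, a task is selected exactly when its current loss is below `lam`, and growing `lam` admits harder tasks over time. The Python sketch below shows only this easier-to-harder selection schedule, assuming fixed per-task losses (in the actual method the losses change as task parameters and shared knowledge are updated); it is not the paper's biconvex objective, and the names are hypothetical.

```python
def self_paced_weights(task_losses, lam):
    """Minimizer over v in {0, 1} of sum(v_t * loss_t) - lam * sum(v_t): pick easy tasks."""
    return [1 if loss < lam else 0 for loss in task_losses]

def curriculum(task_losses, lam0, growth, steps):
    """Active-task indicators at each step while the pace parameter grows geometrically."""
    lam, schedule = lam0, []
    for _ in range(steps):
        schedule.append(self_paced_weights(task_losses, lam))
        lam *= growth
    return schedule
```

For example, with losses [0.2, 1.5, 0.7], starting at lam = 0.5 and doubling each step, the easiest task enters first and the hardest one last.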

Adaptive Hypergraph Learning for Unsupervised Feature Selection
Xiaofeng Zhu, Yonghua Zhu, Shichao Zhang, Rongyao Hu, Wei He
Current unsupervised feature selection (UFS) methods learn the similarity matrix using a simple graph that is learned from the original data and is independent of the feature selection process, so they are unable to efficiently remove redundant and irrelevant features. To address these issues, we propose a new UFS method that jointly learns the similarity matrix and conducts both subspace learning (via learning a dynamic hypergraph) and feature selection (via a sparsity constraint). As a result, we reduce the feature dimensions using different methods (i.e., subspace learning and feature selection) from different feature spaces, which allows our method to select informative features effectively and robustly. We evaluated our method on clustering tasks over benchmark datasets using the selected features, and the experimental results show that it outperforms all the comparison methods.

Data-driven Random Fourier Features using Stein Effect
Wei-Cheng Chang, Chun-Liang Li, Yiming Yang, Barnabas Poczos
Large-scale kernel approximation is an important problem in machine learning research. Approaches using random Fourier features have become increasingly popular \cite{Rahimi_NIPS_07}, where kernel approximation is treated as empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC) integration \cite{Yang_ICML_14}. A limitation of the current approaches is that all the features receive an equal weight, with the weights summing to 1. In this paper, we propose a novel shrinkage estimator based on the "Stein effect", which provides a data-driven weighting strategy for random features and enjoys theoretical justification in terms of lowering the empirical risk. We further present an efficient randomized algorithm for large-scale applications of the proposed method. Our empirical results on six benchmark data sets demonstrate the advantageous performance of this approach over representative baselines in both kernel approximation and supervised learning tasks.

Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning
Wenhao Jiang, Cheng Deng, Wei Liu, Feiping Nie, Fulai Chung, Heng Huang
Domain adaptation problems arise in a variety of applications where a training dataset from the source domain and a test dataset from the target domain typically follow different distributions. The primary difficulty in designing effective learning models for such problems lies in bridging the gap between the source and target distributions. In this paper, we provide a comprehensive analysis of feature learning algorithms used in conjunction with linear classifiers for domain adaptation. Our analysis shows that to achieve good adaptation performance, the second moments of the source and target domain distributions should be similar. Based on this analysis, we propose a novel, extremely easy feature learning algorithm for domain adaptation. Furthermore, we extend our algorithm by leveraging multiple layers, leading to another feature learning algorithm. We evaluate the effectiveness of the proposed algorithms on domain adaptation tasks using the Amazon review dataset and the spam datasets from the ECML/PKDD 2006 discovery challenge.

Multiple Indefinite Kernel Learning for Feature Selection
Hui Xue, Yu Song, Hai-Ming Xu
Multiple kernel learning for feature selection (MKLFS) utilizes kernels to explore complex properties of features and performs better in embedded methods. However, the kernels in MKLFS are generally limited to being positive definite. In fact, indefinite kernels often emerge in actual applications and can achieve better empirical performance. But due to the non-convexity of indefinite kernels, existing MKLFS methods are usually inapplicable, and research on this case remains relatively scarce. In this paper, we propose a novel multiple indefinite kernel feature selection method (MIKFS) based on the primal framework of the indefinite kernel support vector machine (IKSVM), which applies an indefinite base kernel to each feature and then imposes an l1-norm constraint on the kernel combination coefficients to select features automatically. A two-stage algorithm is further presented to optimize the coefficients of IKSVM and the kernel combination alternately. In the algorithm, we reformulate the non-convex optimization problem of primal IKSVM as a difference of convex functions (DC) program and transform the non-convex problem into a convex one with the affine minorization approximation. Experiments on real-world datasets demonstrate that MIKFS is superior to some related state-of-the-art methods in both feature selection and classification performance.

Learning sparse representations in reinforcement learning with sparse coding
Lei Le, Raksha Kumaraswamy, Martha White
A variety of representation learning approaches have been investigated for reinforcement learning; much less attention, however, has been given to investigating the utility of sparse coding. Outside of reinforcement learning, sparse coding representations have been widely used, with nonconvex objectives that result in discriminative representations. In this work, we develop a supervised sparse coding objective for policy evaluation. Despite the nonconvexity of this objective, we prove that all local minima are global minima, making the approach amenable to simple optimization strategies. We empirically show that it is key to use a supervised objective, rather than the more straightforward unsupervised sparse coding approach. We then compare the learned representations to a canonical fixed sparse representation, called tile coding, demonstrating that the sparse coding representation outperforms a wide variety of tile coding representations.
Tuesday 22 16:30-18:30 SISPL - Sister Conference Track: Planning

Dynamical System-Based Motion Planning for Multi-Arm Systems: Reaching for Moving Objects
Seyed Sina Mirrazavi Salehian, Nadia Figueroa, Aude Billard
Coordinated multi-arm robotic systems make it possible to manipulate heavy or bulky objects that would otherwise be infeasible for a single-arm robot. This paper concisely introduces our work on coordinated multi-arm control [Salehian et al., 2016a], in which we proposed a virtual-object-based dynamical systems (DS) control law to generate autonomous and synchronized motions for a multi-arm robot system. We show theoretically and empirically that the multi-arm + virtual object system converges asymptotically to a moving object. The proposed framework is validated on a dual-arm robotic system. We demonstrate that it can resynchronize and adapt the motion of each arm in a fraction of a second, even when the object’s motion is fast and not accurately predictable.

Lessons from the Amazon Picking Challenge: Four Aspects of Building Robotic Systems
Clemens Eppner, Sebastian Höfer, Rico Jonschkowski, Roberto Martín-Martín, Arne Sieverling, Vincent Wall, Oliver Brock
We describe the winning entry to the Amazon Picking Challenge 2015. From the experience of building this system and competing, we derive several conclusions: (1) We suggest characterizing robotic system building along four key aspects, each spanning a spectrum of solutions: modularity vs. integration, generality vs. assumptions, computation vs. embodiment, and planning vs. feedback. (2) To understand which region of each spectrum most adequately addresses which robotic problem, we must explore the full spectrum of possible approaches. (3) For manipulation problems in unstructured environments, certain regions of each spectrum match the problem most adequately and should be exploited further. This is supported by the fact that our solution deviated from the majority of the other challenge entries along each of the spectra. This is an abridged version of a conference publication.

Maximizing Awareness about HIV in Social Networks of Homeless Youth with Limited Information
Amulya Yadav, Hau Chan, Albert Xin Jiang, Haifeng Xu, Eric Rice, Milind Tambe
This paper presents HEALER, a software agent that recommends sequential intervention plans for use by homeless shelters, which organize these interventions to raise awareness about HIV among homeless youth. HEALER's sequential plans, built using knowledge of the social networks of homeless youth, choose intervention participants strategically to maximize influence spread while reasoning about uncertainties in the network. Previous work presents influence-maximizing techniques for choosing intervention participants, but it does not address two real-world issues: (i) it completely fails to scale up to real-world sizes; and (ii) it does not handle deviations in the execution of intervention plans. HEALER handles these issues via two major contributions: (i) HEALER casts this influence maximization problem as a POMDP and solves it using a novel planner that scales up to previously unsolvable real-world sizes; and (ii) HEALER allows shelter officials to modify its recommendations and updates its future plans in a deviation-tolerant manner. HEALER was deployed in the real world in Spring 2016 with considerable success.

i-dual: Solving Constrained SSPs via Heuristic Search in the Dual Space
Felipe Trevizan, Sylvie Thiebaux, Pedro Santana, Brian Williams
We consider the problem of generating optimal stochastic policies for Constrained Stochastic Shortest Path problems, which are a natural model for planning under uncertainty for resource-bounded agents with multiple competing objectives. Unconstrained SSPs enjoy a multitude of efficient heuristic search solution methods that can focus on promising areas reachable from the initial state. In contrast, the state of the art for constrained SSPs revolves around linear and dynamic programming algorithms that explore the entire state space. In this paper, we present i-dual, the first heuristic search algorithm for constrained SSPs. To concisely represent constraints and efficiently decide their violation, i-dual operates in the space of dual variables describing the policy occupation measures. It does so while retaining the ability to use standard value function heuristics computed by well-known methods. Our experiments show that these features enable i-dual to achieve up to two orders of magnitude improvement in runtime and memory over linear programming algorithms.

An End-to-End System for Accomplishing Tasks with Modular Robots: Perspectives for the AI community
Gangyuan Jing, Tarik Tosun, Mark Yim, Hadas Kress-Gazit
The advantage of modular robot systems lies in their flexibility, but this advantage can only be realized if there is a reliable, effective way of generating configurations (shapes) and behaviors (controlling programs) appropriate for a given task. In this paper, we present an end-to-end system for addressing tasks with modular robots and demonstrate in hardware experiments that it can accomplish challenging multi-part tasks. The system consists of four tightly integrated components: (1) a high-level mission planner, (2) a design library spanning a wide set of functionality, (3) a design and simulation tool for populating the library with new configurations and behaviors, and (4) modular robot hardware. This paper condenses the material originally presented in Jing et al. 2016 into a shorter format suitable for a broad audience.

Value Iteration Networks
Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel
We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and they are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains. This paper is a significantly abridged version, targeted at the IJCAI audience, of the original NIPS 2016 paper with the same title, available here: https://arxiv.org/abs/1602.02867
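The classical recurrence that VIN's planning module approximates can be sketched as plain value iteration. This sketch assumes a hypothetical grid world with deterministic moves and a hand-fixed local stencil. In VIN the stencil weights are learned convolution kernels and the module is trained by backpropagation; the sketch does not attempt either.

```python
def grid_value_iteration(reward, gamma=0.9, iters=200):
    """Classical value iteration on a 4-connected grid with deterministic moves.

    Each sweep applies the same local stencil at every cell,
    Q_a(s) = R(s) + gamma * V(neighbour_a(s)), and then takes a max over
    actions. VIN casts this pattern as a convolution followed by a
    channel-wise max-pool. Here the stencil is fixed by hand; in VIN it is
    learned.
    """
    rows, cols = len(reward), len(reward[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up, down, left, right, stay
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        new_value = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # One Q-value per action that stays inside the grid.
                q_values = [
                    reward[r][c] + gamma * value[r + dr][c + dc]
                    for dr, dc in moves
                    if 0 <= r + dr < rows and 0 <= c + dc < cols
                ]
                new_value[r][c] = max(q_values)  # max over actions
        value = new_value
    return value


# Hypothetical 1x3 corridor with the goal (reward 1) in the rightmost cell.
values = grid_value_iteration([[0.0, 0.0, 1.0]])
```

In this corridor the values settle near 8.1, 9 and 10, which are γ²/(1-γ), γ/(1-γ) and 1/(1-γ) for γ = 0.9. The values therefore fall off with distance from the goal, and a greedy policy over them moves toward it.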
Tuesday 22 18:00-19:00 Special Session Demonstrations
Wednesday 23 08:30-10:00 PLSPS - Search in Planning and Scheduling

Landmarks for Numeric Planning Problems
Enrico Scala, Patrik Haslum, Daniele Magazzeni, Sylvie Thiebaux
The paper generalises the notion of landmarks to reasoning about planning problems that involve propositional and numeric variables. Intuitively, numeric landmarks are regions in the metric space defined by the problem that must be crossed in order to solve it. The paper proposes a relaxation-based method for extracting them automatically and directly from the problem structure. It also shows how to exploit them to infer what we call disjunctive and additive hybrid action landmarks. This disjunctive representation is justified by the intertwined propositional and numeric structure of the problem. The paper uses these landmarks in two novel admissible LP-based numeric heuristics and reports experiments on cost-optimal numeric planning problems. Results show that the heuristics are more informed and effective than previous work for problems involving a higher number of (sub)goals.

Faster Conflict Generation for Dynamic Controllability
Nikhil Bhargava, Tiago Vaquero, Brian Williams
In this paper, we focus on speeding up the temporal plan relaxation problem for dynamically controllable systems. We examine the current best-known algorithm for determining dynamic controllability and augment it to efficiently generate conflicts when the network is deemed uncontrollable. Our work preserves the O(n^3) runtime of the best available dynamic controllability checker and improves on the previous best runtime of O(n^4) for extracting dynamic controllability conflicts. We then turn to temporal plan relaxation tasks. We show how to leverage our work on conflicts, together with the structure of the network, to make efficient incremental updates that restore dynamic controllability by relaxing constraints. Our new algorithm, RelaxIDC, has the same asymptotic runtime as previous algorithms but shows dramatic empirical improvements over the course of repeated dynamic controllability checks.

Numeric Planning via Abstraction and Policy Guided Search
León Illanes, Sheila McIlraith
The real-world application of planning techniques often requires models with numeric fluents. However, most planners and heuristics do not directly support these fluents. We describe a family of planning algorithms that take a numeric planning problem and produce an abstracted representation that can be solved using any classical planner. The resulting abstract plan is generalized into a policy and then used to guide the search in the original numeric domain. We prove that our approach is sound, and we evaluate it on a set of standard benchmarks. We show that it provides competitive performance compared to other well-known algorithms for numeric planning, and a significant performance improvement in certain domains.

Lossy Compression of Pattern Databases Using Acyclic Random Hypergraphs
Mehdi Sadeqi, Howard Hamilton
A domain-independent heuristic function created by an abstraction is usually implemented using a Pattern Database (PDB), which is a lookup table of (abstract state, heuristic value) pairs. PDBs containing high-quality heuristic values generally require substantial memory and therefore need to be compressed. In this paper, we introduce Acyclic Random Hypergraph Compression (ARHC), a domain-independent approach to compressing PDBs using acyclic random r-partite r-uniform hypergraphs. The ARHC algorithm comes in Base and Extended versions and provides fast lookup and a high compression rate. ARHC-Extended achieves higher-quality heuristics than ARHC-Base by reducing the loss of heuristic information, at the cost of a somewhat lower compression rate. ARHC outperformed level-by-level Bloom filter PDB compression in all experiments conducted so far.
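The sketch below shows how an acyclic random 3-partite 3-uniform hypergraph can store a lookup table without storing its keys. It uses the generic peeling-based retrieval construction known from Bloomier and XOR filters. It is not ARHC itself: it is exact for stored keys rather than lossy, and the Base/Extended refinements are not reproduced. The helper names (`_edge`, `build`, `lookup`) and the 1.23 space factor are illustrative assumptions. Keys are assumed to be hashable tuples, and heuristic values are assumed to be smaller than `modulus`.

```python
import hashlib
import math
from collections import deque


def _edge(key, seed, part):
    """Hash a key to one vertex in each of three partitions of size `part`
    (an edge of a random 3-partite 3-uniform hypergraph)."""
    digest = hashlib.blake2b(repr((seed, key)).encode(), digest_size=12).digest()
    return [j * part + int.from_bytes(digest[4 * j:4 * j + 4], "little") % part
            for j in range(3)]


def build(table, modulus, max_tries=100):
    """Return (g, seed, part) such that
    sum(g[v] for v in _edge(k, seed, part)) % modulus == table[k]."""
    keys = list(table)
    part = math.ceil(1.23 * len(keys) / 3) + 1
    for seed in range(max_tries):
        edges = [_edge(k, seed, part) for k in keys]
        incident = [set() for _ in range(3 * part)]
        for i, e in enumerate(edges):
            for v in e:
                incident[v].add(i)
        # Peeling: repeatedly remove an edge that owns a vertex of degree 1.
        queue = deque(v for v in range(3 * part) if len(incident[v]) == 1)
        order = []
        while queue:
            v = queue.popleft()
            if len(incident[v]) != 1:
                continue
            i = incident[v].pop()
            order.append((i, v))
            for u in edges[i]:
                if u != v:
                    incident[u].discard(i)
                    if len(incident[u]) == 1:
                        queue.append(u)
        if len(order) < len(keys):
            continue  # peeling got stuck (not acyclic); rehash with a new seed
        g = [0] * (3 * part)
        for i, v in reversed(order):  # fix each edge's free vertex last-peeled-first
            others = sum(g[u] for u in edges[i] if u != v)
            g[v] = (table[keys[i]] - others) % modulus
        return g, seed, part
    raise RuntimeError("no acyclic hypergraph found; raise max_tries or the space factor")


def lookup(g, seed, part, modulus, key):
    """Retrieve a stored value; keys that were never inserted give arbitrary values."""
    return sum(g[v] for v in _edge(key, seed, part)) % modulus
```

For example, `build(pdb, modulus=32)` turns a dictionary from abstract states to heuristic values below 32 into an array `g`. The array has roughly 1.23 entries per state, and each entry needs only 5 bits. The saving comes from never storing the abstract states themselves.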

A Scalable Approach to Chasing Multiple Moving Targets with Multiple Agents
Fan Xie, Adi Botea, Akihiro Kishimoto
Chasing multiple mobile targets with multiple agents is important in several applications, such as computer games and police chasing scenarios. Existing approaches can compute optimal policies. However, they have limited scalability because they rely on expensive minimax searches. We introduce a suboptimal but scalable approach that assigns individual agents to individual targets and can dynamically recompute such assignments. We provide a theoretical analysis, including upper bounds on the number of time steps required to solve an instance. In a detailed empirical evaluation on grid maps, our algorithm scales well beyond the limits of previous methods. On small problems, where a comparison to a minimax approach is possible, the results demonstrate good solution quality for our method.

Efficient Optimal Search under Expensive Edge Cost Computation
Masataro Asai, Akihiro Kishimoto, Adi Botea, Radu Marinescu, Elizabeth M. Daly, Spyros Kotoulas
Optimal heuristic search has been successful in many domains, including journey planning, route planning and puzzle solving. Existing work typically assumes that the cost of each action can easily be obtained. In many problems, however, the exact edge cost is expensive to compute. Existing search algorithms then face a significant performance bottleneck because of the excessive overhead of dynamically calculating exact edge costs. We present DEA*, an algorithm for problems with expensive edge cost computations. DEA* combines heuristic edge cost evaluations with delayed node expansions, reducing the number of exact edge computations. We formally prove that DEA* is optimal and that it is efficient with respect to the number of exact edge cost computations. We empirically evaluate DEA* on multiple-worker routing problems where the exact edge cost is calculated by invoking an external multimodal journey planning engine. The results demonstrate that our ideas reduce computation time and improve solving ability. In addition, we show the advantages of DEA* in domain-independent planning, where we simulate the expensive computation of accurate edge costs.
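The sketch below illustrates the general idea of postponing exact edge-cost computations. It is a generic lazy edge-evaluation A*, not DEA* itself, whose delayed node expansions may be organised differently. It assumes a cheap `lower_bound(u, v)` that never exceeds the exact cost and a consistent heuristic `h`. Successors are queued with their estimated cost, and `exact_cost` (a hypothetical stand-in for the external journey planner) is called only when such an entry reaches the front of the queue.

```python
import heapq
import math


def lazy_edge_astar(graph, start, goal, lower_bound, exact_cost, h):
    """A* that delays exact edge-cost evaluation until an edge is selected.

    graph: dict mapping a node to a list of successors.
    lower_bound(u, v): cheap under-estimate of the edge cost (must be <= exact).
    exact_cost(u, v): the expensive exact edge cost.
    h(v): consistent heuristic.
    Returns (optimal cost or None, number of exact_cost calls).
    """
    best_g = {start: 0.0}
    closed = set()
    tie = 0
    # Entries: (f, tie, g, node, parent, is_exact). For estimated entries, g is
    # the parent's exact cost; the edge estimate enters only through f.
    frontier = [(h(start), tie, 0.0, start, None, True)]
    exact_calls = 0
    while frontier:
        _f, _tie, g, node, parent, is_exact = heapq.heappop(frontier)
        if node in closed:
            continue
        if not is_exact:
            # Delayed evaluation: pay for the exact edge cost only now.
            exact_calls += 1
            g_node = g + exact_cost(parent, node)
            if g_node < best_g.get(node, math.inf):
                best_g[node] = g_node
                tie += 1
                heapq.heappush(frontier, (g_node + h(node), tie, g_node, node, None, True))
            continue
        if g > best_g[node]:
            continue  # stale entry superseded by a cheaper path
        if node == goal:
            return g, exact_calls
        closed.add(node)
        for succ in graph.get(node, []):
            g_est = g + lower_bound(node, succ)
            if succ not in closed and g_est < best_g.get(succ, math.inf):
                tie += 1
                heapq.heappush(frontier, (g_est + h(succ), tie, g, succ, node, False))
    return None, exact_calls


# Hypothetical graph with "expensive" exact costs; every lower bound is 1.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["B", "D"], "D": []}
costs = {("A", "B"): 5, ("A", "C"): 2, ("C", "B"): 1, ("B", "D"): 1, ("C", "D"): 5}
cost, calls = lazy_edge_astar(graph, "A", "D",
                              lower_bound=lambda u, v: 1.0,
                              exact_cost=lambda u, v: costs[(u, v)],
                              h=lambda v: 0.0)
```

On this toy graph the search returns the optimal cost 4 via A, C, B, D. Tighter lower bounds and a stronger `h` would let more queued estimates be discarded before any exact cost is computed for them.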
Wednesday 23 08:30-10:00 MLCL3 - Classification 3

Rescale-Invariant SVM for Binary Classification
Mojtaba Montazery, Nic Wilson
Support Vector Machines (SVM) are among the most well-known machine learning methods and are widely used across scientific areas. However, SVM requires feature normalization (scaling) as a preprocessing step, since SVM is not invariant to the scales of the features’ spaces, i.e., different ways of scaling may lead to different results. We define a more robust decision-making approach for binary classification, in which a sample strongly belongs to a class if it belongs to that class for all possible rescalings of features. For binary SVM, we derive a characterisation of this approach that determines when an instance strongly belongs to a class and when the classification is invariant to rescaling. The characterisation leads to a method for computing whether a sample is strongly positive, strongly negative or neither. Our experimental results support the intuition that being strongly positive indicates stronger confidence that an instance really is positive.

Analogy-preserving functions: A way to extend Boolean samples
Nicolas Hug, Henri Prade, Miguel Couceiro, Gilles Richard
Training set extension is an important issue in machine learning. When only a limited number of examples is available, the performance of standard classifiers may decrease significantly, and it can be helpful to build additional examples. In this paper, we consider the use of analogical reasoning, and more particularly of analogical proportions, for extending training sets. Here the ground truth labels are considered to be given by a (partially known) function. We examine the conditions that such functions must satisfy to ensure an error-free extension in a Boolean setting. To this end, we introduce the notion of Analogy Preserving (AP) functions, and we prove that their class is the class of affine Boolean functions. We complement this theoretical result with an empirical investigation of approximately AP functions, which suggests that they remain suitable for training set extension.
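The characterisation can be checked by brute force for two Boolean variables. This sketch assumes the standard Boolean analogical proportion, which holds exactly for the patterns 0000, 1111, 0011, 1100, 0101 and 1010 and is applied component-wise to vectors. It reads "analogy preserving" as follows: whenever a:b::c:d holds and f(a):f(b)::f(c):y is solvable, the solution y equals f(d). The helper names are hypothetical.

```python
from itertools import product


def proportion(a, b, c, d):
    """Boolean analogical proportion a:b::c:d; true exactly for the patterns
    0000, 1111, 0011, 1100, 0101 and 1010."""
    return (a & (1 - b)) == (c & (1 - d)) and ((1 - a) & b) == ((1 - c) & d)


def solve(a, b, c):
    """Return the x with a:b::c:x, or None when the equation has no solution."""
    if a == b:
        return c
    if a == c:
        return b
    return None


def is_analogy_preserving(f, points):
    """f maps Boolean tuples to {0, 1}. Check that whenever a:b::c:d holds
    component-wise and f(a):f(b)::f(c):y is solvable, the solution is f(d)."""
    for a, b, c, d in product(points, repeat=4):
        if all(proportion(*t) for t in zip(a, b, c, d)):
            y = solve(f[a], f[b], f[c])
            if y is not None and y != f[d]:
                return False
    return True


points = list(product((0, 1), repeat=2))
all_functions = [dict(zip(points, outs)) for outs in product((0, 1), repeat=len(points))]
ap_tables = {tuple(f[x] for x in points) for f in all_functions
             if is_analogy_preserving(f, points)}
# Affine Boolean functions on two variables: c0 XOR (s1 AND x1) XOR (s2 AND x2).
affine_tables = {tuple(c0 ^ (s1 & x[0]) ^ (s2 & x[1]) for x in points)
                 for c0, s1, s2 in product((0, 1), repeat=3)}
```

Exactly 8 of the 16 two-variable functions pass. They coincide with the affine ones: the two constants, x1, x2, x1 XOR x2, and the complements of the last three. Non-affine functions such as AND fail on the proportion 00:10::01:11.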

Further Results on Predicting Cognitive Abilities for Adaptive Visualizations
Cristina Conati, Sébastien Lallé, Md Abed Rahman, Dereck Toker
Previous work has shown that some user cognitive abilities relevant to processing information visualizations can be predicted from eye tracking data. This type of user modeling is important for devising user-adaptive visualizations that can adapt to a user’s abilities as needed during the interaction. In this paper, we extend previous work by broadening the types of visualizations considered and the set of cognitive abilities that can be predicted from gaze data, thus providing evidence on the generality of these findings. We also evaluate how the quality of gaze data affects prediction.

Logistic Markov Decision Processes
Martin Mladenov, Craig Boutilier, Dale Schuurmans, Ofer Meshi, Gal Elidan, Tyler Lu
User modeling in advertising and recommendation has typically focused on myopic predictors of user responses. In this work, we consider the long-term decision problem associated with user interaction. We propose a concise specification of long-term interaction dynamics by combining factored dynamic Bayesian networks with logistic predictors of user responses, allowing state-of-the-art prediction models to be seamlessly extended. We show how to solve such models at scale with a constraint generation approach for approximate linear programming that overcomes the variable coupling and nonlinearity induced by the logistic regression predictor. The efficacy of the approach is demonstrated on advertising domains with up to 2^54 states and 2^39 actions.

Fast SVM Trained by Divide-and-Conquer Anchors
Meng Liu, Chang Xu, Chao Xu, Dacheng Tao
The support vector machine (SVM) is one of the most frequently used classifiers for machine learning tasks. However, its training time can become prohibitive when the training data are very large. Various kinds of representative subsets are therefore chosen from the original dataset to reduce the training complexity. In this paper, we propose to choose representative points, called anchors, obtained from non-negative matrix factorization (NMF) in a divide-and-conquer framework, and then use the anchors to train an approximate SVM. Our theoretical analysis shows that solving the DCA-SVM yields an approximate solution close to the primal SVM. Experimental results on multiple datasets demonstrate that our DCA-SVM is faster than state-of-the-art algorithms without notably decreasing classification accuracy.

Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization
Zebang Shen, Hui Qian, Tongzhou Mu, Chao Zhang
Algorithms with fast convergence, small memory footprints, and low per-iteration complexity are particularly favorable for artificial intelligence applications. In this paper, we propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large-scale empirical risk minimization problems for learning tasks. The algorithm enjoys a provably superior convergence rate. In each iteration, it accesses only a mini-batch of samples and updates only a small block of variable coordinates, which substantially reduces memory references when the sample size is massive and the dimensionality is ultra-high. Specifically, to obtain an $\epsilon$-accurate solution, our algorithm requires only $\mathcal{O}(\log(1/\epsilon)/\sqrt{\epsilon})$ overall computation for the general convex case and $\mathcal{O}((n+\sqrt{n\kappa})\log(1/\epsilon))$ for the strongly convex case. Empirical studies on huge-scale datasets illustrate the efficiency of our method in practice.
Wednesday 23 08:30-10:00 MLDL3 - Deep Learning 3

Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering
Zhuxi Jiang, Yin Zheng, Huachun Tan, Bangsheng Tang, Hanning Zhou
Clustering is among the most fundamental tasks in machine learning and artificial intelligence. In this paper, we propose Variational Deep Embedding (VaDE), a novel unsupervised generative clustering approach within the framework of the Variational AutoEncoder (VAE). Specifically, VaDE models the data generative procedure with a Gaussian Mixture Model (GMM) and a deep neural network (DNN): 1) the GMM picks a cluster; 2) a latent embedding is generated from the chosen cluster; 3) the DNN decodes the latent embedding into an observable. Inference in VaDE is variational: a separate DNN encodes observables into latent embeddings, so that the evidence lower bound (ELBO) can be optimized using the Stochastic Gradient Variational Bayes (SGVB) estimator and the reparameterization trick. This paper includes quantitative comparisons with strong baselines, and experimental results show that VaDE significantly outperforms state-of-the-art clustering methods on 5 benchmarks from various modalities. Moreover, owing to its generative nature, VaDE can generate highly realistic samples for any specified cluster without using supervised information during training.

ConvolutionalMatch Networks for Question Answering
Spyridon Samothrakis, Tom Vodopivec, Michael Fairbank, Maria Fasli
In this paper, we present a simple yet effective attention and memory mechanism that is reminiscent of Memory Networks, and we demonstrate it in question-answering scenarios. Our mechanism is based on four simple premises: a) memories can be formed from word sequences by using convolutional networks; b) distance measurements can be taken at a neuronal level; c) a recursive softmax function can be used for attention; d) extensive weight sharing can help profoundly. We achieve state-of-the-art results on the bAbI tasks, outperforming both Memory Networks and the Differentiable Neural Computer in terms of both accuracy and stability (i.e., variance) of results.

Improved Deep Embedded Clustering with Local Structure Preservation
Xifeng Guo, Long Gao, Xinwang Liu, Jianping Yin
Deep clustering uses neural networks to learn deep feature representations that favor the clustering task. Some pioneering work proposes to learn embedded features and perform clustering simultaneously by explicitly defining a clustering-oriented loss. Although this work has shown promising performance in various applications, we observe that it overlooks a vital ingredient: the defined clustering loss may corrupt the feature space. This leads to non-representative, meaningless features, which in turn hurt clustering performance. To address this issue, we propose the Improved Deep Embedded Clustering (IDEC) algorithm, which preserves data structure. Specifically, we manipulate the feature space to scatter data points using a clustering loss as guidance. To constrain the manipulation and maintain the local structure of the data-generating distribution, we apply an undercomplete autoencoder. By integrating the clustering loss and the autoencoder's reconstruction loss, IDEC jointly optimizes cluster label assignments and learns features that are suitable for clustering while preserving local structure. The resulting optimization problem can be solved effectively by mini-batch stochastic gradient descent and backpropagation. Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm.
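The combined objective can be sketched on precomputed embeddings, with the autoencoder and training loop omitted. The sketch assumes the clustering loss takes the Deep Embedded Clustering (DEC) form: a Student's t soft assignment Q, a sharpened target distribution P, and the loss KL(P || Q). It adds this term to the reconstruction error with a weight γ, whose value of 0.1 here is a hypothetical choice. All function names are illustrative.

```python
import math


def soft_assignment(z, centers, alpha=1.0):
    """Student's t similarity q_ij between embedded point z_i and center mu_j."""
    q = []
    for zi in z:
        w = [(1.0 + sum((a - b) ** 2 for a, b in zip(zi, mu)) / alpha) ** (-(alpha + 1.0) / 2.0)
             for mu in centers]
        total = sum(w)
        q.append([wj / total for wj in w])
    return q


def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    freq = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / freq[j] for j in range(len(row))]
        total = sum(w)
        p.append([wj / total for wj in w])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q) for pij, qij in zip(prow, qrow) if pij > 0)


def idec_style_loss(x, x_rec, z, centers, gamma=0.1):
    """Reconstruction error plus gamma times the clustering loss KL(P || Q)."""
    l_rec = sum(sum((a - b) ** 2 for a, b in zip(xi, ri)) for xi, ri in zip(x, x_rec))
    q = soft_assignment(z, centers)
    return l_rec + gamma * kl_divergence(target_distribution(q), q)
```

The reconstruction term keeps the encoder tied to the input distribution. This is the local structure preservation the abstract argues a clustering loss alone would erode.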

Modeling Hebb Learning Rule for Unsupervised Learning
Jia Liu, Maoguo Gong, Qiguang Miao
This paper models the Hebb learning rule and proposes a neuron learning machine (NLM). The Hebb learning rule describes the plasticity of the connection between presynaptic and postsynaptic neurons and is itself unsupervised. It formulates the update gradient of the connecting weights in artificial neural networks. In this paper, we construct an objective function by modeling the Hebb rule. We make a hypothesis to simplify the model and introduce a correlation-based constraint motivated by the hypothesis and by the stability of solutions. Analysing the model from the perspectives of maintaining abstract information and increasing the energy-based probability of observed data, we find that this biologically inspired model can learn useful features. NLM can also be stacked to learn hierarchical features and reformulated into a convolutional version to extract features from two-dimensional data. Experiments on single-layer and deep networks demonstrate the effectiveness of NLM in unsupervised feature learning.
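The basic Hebbian update can be sketched in a few lines. The plain Hebb rule Δw = η·y·x lets the weight norm grow without bound, so this sketch swaps in Oja's normalised variant, a well-known technique that is not NLM's objective. It runs the rule on hypothetical zero-mean 2-D data stretched along the direction (1, 1).

```python
import math


def oja_update(w, x, lr):
    """One Hebbian step with Oja's decay term: w += lr * y * (x - y * w), y = w . x.
    The plain Hebb rule (w += lr * y * x) would let |w| grow without bound."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # postsynaptic activity
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


# Hypothetical zero-mean data: large spread along (1, 1), small along (1, -1).
data = [(t + e, t - e) for t in (-2.0, -1.0, 1.0, 2.0) for e in (-0.1, 0.1)]
w = [1.0, 0.0]
for _ in range(200):  # epochs
    for x in data:
        w = oja_update(w, x, lr=0.01)
norm = math.sqrt(sum(wi * wi for wi in w))
```

The weight vector converges to roughly unit length along ±(1, 1)/√2, the leading principal direction of the inputs. This is the sense in which a Hebbian rule, by itself and without labels, extracts correlational structure from data.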

DRLnet: Deep Difference Representation Learning Network and An Unsupervised Optimization Framework
Puzhao Zhang, Maoguo Gong, Hui Zhang, Jia Liu
Change detection and analysis (CDA) is an important research topic in the joint interpretation of spatial-temporal remote sensing images. The core of CDA is to represent the difference between bi-temporal images effectively and to measure its degree. In this paper, we propose a novel difference representation learning network (DRLnet) and an effective optimization framework that requires no supervision. Difference measurement, difference representation learning and unsupervised clustering are combined into a single model, DRLnet. DRLnet is driven to learn clustering-friendly and discriminative difference representations (DRs) for different types of changes. Further, DRLnet is extended into a recurrent learning framework that updates and reuses limited training samples. This framework also prevents the semantic gaps that an abrupt jump in the number of change types, from the over-clustering stage to the desired one, would otherwise cause. Experimental results demonstrate the effectiveness of the proposed framework.

SEVEN: Deep Semi-supervised Verification Networks
Vahid Noroozi, Lei Zheng, Sara Bahaadini, Sihong Xie, Philip Yu
Verification determines whether two samples belong to the same class. It has important applications such as face and fingerprint verification, where thousands or millions of categories are present but each category has scarce labeled examples; these two properties present major challenges for existing deep learning models. We propose a deep semi-supervised model named SEmi-supervised VErification Network (SEVEN) to address these challenges. The model consists of two complementary components. The generative component addresses the lack of supervision within each category by learning general salient structures from a large amount of data across categories. The discriminative component exploits the learned general features to mitigate the lack of supervision within categories, and it also directs the generative component to find more informative structures of the whole data manifold. The two components are tied together in SEVEN to allow end-to-end training. Extensive experiments on four verification tasks demonstrate that SEVEN significantly outperforms other state-of-the-art deep semi-supervised techniques when labeled data are in short supply. Furthermore, SEVEN is competitive with fully supervised baselines trained with a larger amount of labeled data, which indicates the importance of the generative component in SEVEN.
Wednesday 23 08:30-10:00 KRDLO1 - Description Logics and Ontologies 1

Role Forgetting for ALCOQH(universal role)-Ontologies Using an Ackermann-Based Approach
Yizheng Zhao, Renate Schmidt
Forgetting refers to a non-standard reasoning problem concerned with eliminating concept and role symbols from description logic-based ontologies while preserving all logical consequences up to the remaining symbols. While previous work has primarily focused on forgetting concept symbols, in this paper we turn our attention to forgetting role symbols. In particular, we present a practical method of semantic role forgetting for ontologies expressible in the description logic ALCOQH(universal role), i.e., the basic description logic ALC extended with nominals, qualified number restrictions, role inclusions and the universal role. Based on an Ackermann approach, the method is so far the only approach to forgetting role symbols in description logics with qualified number restrictions. The method is goal-oriented and incremental. It always terminates and is sound in the sense that the forgetting solution is equivalent to the original ontology up to the forgotten symbols, possibly with new concept definer symbols. Although our method is not complete, an evaluation with a prototypical implementation showed very good success rates on real-world ontologies.

Ontology-Mediated Querying with the Description Logic EL: Trichotomy and Linear Datalog Rewritability
Carsten Lutz, Leif Sabellek
We consider ontology-mediated queries (OMQs) based on an EL ontology and an atomic query (AQ). We provide an ultimately fine-grained analysis of their data complexity and study rewritability into linear Datalog, aiming to capture linear recursion in SQL. Our main results are that every such OMQ is in AC0, NL-complete or PTime-complete, and that containment in NL coincides with rewritability into linear Datalog (whereas containment in AC0 coincides with rewritability into first-order logic). We establish natural characterizations of the three cases and show that deciding linear Datalog rewritability (as well as the mentioned complexities) is ExpTime-complete. We also give a way to construct linear Datalog rewritings when they exist, and prove that there is no constant bound on the arity of IDB relations in linear Datalog rewritings.

A Characterization Theorem for a Modal Description Logic
Paul Wild, Lutz Schröder
Modal description logics feature modalities that capture the dependence of knowledge on parameters such as time, place, or the information state of agents. For example, the logic S5ALC combines the standard description logic ALC with an S5 modality that can be understood either as an epistemic operator or as representing (undirected) change. This logic embeds into a corresponding modal first-order logic, S5FOL. We prove a modal characterization theorem for this embedding, in analogy to results by van Benthem and Rosen relating ALC to standard first-order logic. We show that S5ALC with only local roles is precisely the bisimulation-invariant fragment of S5FOL, both over finite and over unrestricted models. This gives an exact description of the expressive power of S5ALC with only local roles.

Learning from Ontology Streams with Semantic Concept Drift
Jiaoyan Chen, Freddy Lecue, Jeff Z. Pan, Huajun Chen
Data stream learning has been widely studied for extracting knowledge structures from continuous and rapid data records. In the Semantic Web, data is interpreted in ontologies, and an ordered sequence of such data is represented as an ontology stream. Our work exploits the semantics of such streams to tackle the problem of concept drift, i.e., unexpected changes in data distribution that cause most models to become less accurate over time. To this end, we revisit (i) semantic inference in the context of supervised stream learning, and (ii) models with semantic embeddings. The experiments show accurate prediction with data from Dublin and Beijing.

The Bag Semantics of Ontology-Based Data Access
Charalampos Nikolaou, Egor Kostylev, George Konstantinidis, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks
Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems that support database-style aggregate queries, we propose a bag semantics for OBDA. Under this semantics, duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show that it is rewritable into a generalisation of the relational calculus to bags.
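The difference between set and bag semantics can be seen on a single mapping. The sketch assumes a hypothetical source table `Employee(name, dept)` and a hypothetical mapping that asserts `EmployedIn(dept)` for each row, projecting out the name. Under set semantics the duplicate facts collapse, so a COUNT-style aggregate undercounts; under bag semantics the multiplicities survive.

```python
from collections import Counter

# Hypothetical source table Employee(name, dept).
employee = [("alice", "sales"), ("bob", "sales"), ("carol", "hr")]


def mapping_view(rows):
    """Hypothetical mapping: each row asserts EmployedIn(dept), projecting out name."""
    return [dept for _name, dept in rows]


set_facts = set(mapping_view(employee))      # set semantics: duplicates collapse
bag_facts = Counter(mapping_view(employee))  # bag semantics: multiplicities kept


def count_facts(facts, dept):
    """COUNT of EmployedIn(dept) facts under the given semantics."""
    return facts[dept] if isinstance(facts, Counter) else int(dept in facts)
```

Here `count_facts(set_facts, "sales")` is 1, while `count_facts(bag_facts, "sales")` is 2. The bag version matches what an SQL `COUNT` over the source table would report.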

Ontology-Mediated Query Answering for Key-Value Stores
Meghyn Bienvenu, Pierre Bourhis, Marie-Laure Mugnier, Sophie Tison, Federico Ulliana
We propose a novel rule-based ontology language for JSON records and investigate its computational properties. After providing a natural translation into first-order logic, we identify relationships to existing ontology languages, which yield decidability of query answering but only rough complexity bounds. By establishing an interesting and non-trivial connection to word rewriting, we are able to pinpoint the exact combined complexity of query answering in our framework and obtain tractability results for data complexity. The upper bounds are proven using a query reformulation technique, which can be implemented on top of key-value stores, thereby exploiting their querying facilities.
Wednesday 23 08:30-10:00 MTCG - Computer Games

Real-Time Navigation in Classical Platform Games via Skill Reuse
Michael Dann, John Thangarajah, Fabio Zambetta
In platform video games, players are frequently tasked with solving medium-term navigation problems in order to gather items or power-ups. Artificial agents must generally obtain some form of direct experience before they can solve such tasks. Experience is gained either through training runs, or by exploiting knowledge of the game's physics to generate detailed simulations. Human players, on the other hand, seem to look ahead in high-level, abstract steps. Motivated by human play, we introduce an approach that leverages not only abstract "skills", but also knowledge of what those skills can and cannot achieve. We apply this approach to Infinite Mario, where despite facing randomly generated, maze-like levels, our agent is capable of deriving complex plans in real time, without relying on perfect knowledge of the game's physics.

Player Movement Models for Video Game Level Generation
Sam Snodgrass, Santiago Ontañón
The use of statistical and machine learning approaches, such as Markov chains, for procedural content generation (PCG) has been growing in recent years in the field of Game AI. However, there has been little work in learning to generate content, specifically levels, accounting for player movement within those levels. We are interested in extracting player models automatically from play traces and using those learned models, paired with a machine-learning-based generator, to create levels that allow the same types of movements observed in the play traces. We test our approach by generating levels for Super Mario Bros. We compare our results against the original levels, a previous constrained sampling approach, and a previous approach that learned a combined player and level model.

Stratified Strategy Selection for Unit Control in Real-Time Strategy Games
Levi Lelis
In this paper we introduce Stratified Strategy Selection (SSS), a novel search algorithm for micromanaging units in real-time strategy (RTS) games. SSS uses a type system to partition the player's units into types and assumes that units of the same type must follow the same strategy. SSS searches in the state space induced by the type system to select, from a pool of options, a strategy for each unit. Empirical results on a simulator of an RTS game show that SSS employing either fixed or adaptive type systems is able to substantially outperform state-of-the-art search-based algorithms in combat scenarios with up to 100 units.

Focused Depth-first Proof Number Search using Convolutional Neural Networks for the Game of Hex
Chao Gao, Martin Mueller, Ryan Hayward
Proof Number Search (PNS) is an effective algorithm for determining the theoretical values of games with non-uniform branching factors. Focused depth-first proof number search (FDFPN) with dynamic widening was proposed for Hex, where the branching factor is nearly uniform. However, FDFPN is sensitive to its heuristic move ordering function. Recent advances in Convolutional Neural Networks (CNNs) have led to considerable progress in game playing. We investigate how to incorporate the strength of CNNs into solving, with application to the game of Hex. We describe FDFPN-CNN, a new focused DFPN search that uses convolutional neural networks. FDFPN-CNN integrates two CNNs trained from games played by expert players. The value approximation CNN provides reliable information for defining the widening size by estimating the value of the node to expand, while the policy CNN selects promising child nodes for the search. On 8x8 Hex, experimental results show that FDFPN-CNN performs notably better than FDFPN, suggesting a promising direction for better solving Hex positions where learning from strong players is possible.
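For readers less familiar with proof-number search, the Python sketch below recalls the standard backup rules for proof and disproof numbers on an explicit AND/OR tree. It is background only, not FDFPN-CNN: widening and the CNN-guided move selection are not modelled, and the Node class and example tree are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INF = math.inf

@dataclass
class Node:
    is_or: bool                      # True: OR node (prover to move); False: AND node
    value: Optional[bool] = None     # leaf status: True proved, False disproved, None unknown
    children: List["Node"] = field(default_factory=list)

def pn_dn(node: Node) -> Tuple[float, float]:
    """Proof and disproof numbers under the standard PNS backup rules."""
    if not node.children:
        if node.value is True:
            return 0, INF
        if node.value is False:
            return INF, 0
        return 1, 1  # unknown frontier leaf
    pairs = [pn_dn(child) for child in node.children]
    proofs = [p for p, _ in pairs]
    disproofs = [d for _, d in pairs]
    if node.is_or:
        # OR: proving one child suffices; disproving requires all children.
        return min(proofs), sum(disproofs)
    # AND: proving requires all children; disproving one child suffices.
    return sum(proofs), min(disproofs)

# Hypothetical tree: an OR root with two AND children.
root = Node(True, children=[
    Node(False, children=[Node(True), Node(True)]),
    Node(False, children=[Node(True, value=True), Node(True)]),
])
print(pn_dn(root))  # (1, 2)
```

Search then descends from the root to a most-proving node: at OR nodes to a child with minimum proof number, at AND nodes to a child with minimum disproof number.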

Interactive Narrative Personalization with Deep Reinforcement Learning
Pengcheng Wang, Jonathan Rowe, Bradford Mott, James Lester, Wookhee Min
Data-driven techniques for interactive narrative generation are the subject of growing interest. Reinforcement learning (RL) offers significant potential for devising data-driven interactive narrative generators that tailor players’ story experiences by inducing policies from player interaction logs. A key open question in RL-based interactive narrative generation is how to model complex player interaction patterns to learn effective policies. In this paper we present a deep RL-based interactive narrative generation framework that leverages synthetic data produced by a bipartite simulated player model. Specifically, the framework involves training a set of Q-networks to control adaptable narrative event sequences with long short-term memory network-based simulated players. We investigate the deep RL framework’s performance with an educational interactive narrative, Crystal Island. Results suggest that the deep RL-based narrative generation framework yields effective personalized interactive narratives.

Game Engine Learning from Video
Matthew Guzdial, Boyang Li, Mark Riedl
Intelligent agents need to be able to make predictions about their environment. In this work we present a novel approach to learn a forward simulation model via simple search over pixel input. We make use of a video game, Super Mario Bros., as an initial test of our approach as it represents a physics system that is significantly less complex than reality. We demonstrate the significant improvement of our approach in predicting future states compared with a baseline CNN and apply the learned model to train a game-playing agent. Thus we evaluate the algorithm in terms of the accuracy and value of its output model.
Wednesday 23 08:30-10:00 KRGSTR - Geometric, Spatial, and Temporal Reasoning

Efficiently Enforcing Path Consistency on Qualitative Constraint Networks by Use of Abstraction
Michael Sioutis, Jean-François Condotta
Partial closure under weak composition, or partial weak path-consistency for short, is essential for tackling fundamental reasoning problems associated with qualitative constraint networks, such as the satisfiability checking problem, and therefore it is crucial to be able to enforce it as fast as possible. To this end, we propose a new algorithm, called PWCα, for efficiently enforcing partial weak path-consistency on qualitative constraint networks, that exploits the notion of abstraction for qualitative constraint networks, utilizes certain properties of partial weak path-consistency, and adapts the functionalities of some state-of-the-art algorithms to its design. It is worth noting that, as opposed to a related approach in the recent literature, algorithm PWCα is complete for arbitrary qualitative constraint networks. The evaluation that we conducted with qualitative constraint networks of the Region Connection Calculus against a competing state-of-the-art generic algorithm for enforcing partial weak path-consistency demonstrates the usefulness and efficiency of algorithm PWCα.
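Algorithm PWCα is not reproduced here. As background, the Python sketch below enforces path-consistency on a qualitative constraint network over the simple point algebra {<, =, >}, using a queue of modified constraints; for the point algebra, weak composition coincides with composition, and RCC networks would follow the same scheme with a larger composition table. All names are hypothetical.

```python
from collections import deque

BASE = ("<", "=", ">")
CONVERSE = {"<": ">", "=": "=", ">": "<"}
# Composition table of the point algebra: COMP[(a, b)] = a ∘ b
COMP = {
    ("<", "<"): {"<"}, ("<", "="): {"<"}, ("<", ">"): set(BASE),
    ("=", "<"): {"<"}, ("=", "="): {"="}, ("=", ">"): {">"},
    (">", "<"): set(BASE), (">", "="): {">"}, (">", ">"): {">"},
}

def compose(r, s):
    return set().union(*(COMP[(a, b)] for a in r for b in s)) if r and s else set()

def converse(r):
    return {CONVERSE[a] for a in r}

def _tighten(net, a, b, rel, queue):
    """Intersect constraint (a, b) with rel; return False if it becomes empty."""
    updated = net[(a, b)] & rel
    if updated != net[(a, b)]:
        net[(a, b)] = updated
        net[(b, a)] = converse(updated)
        queue.append((a, b))
    return bool(updated)

def path_consistency(n, constraints):
    """Return the path-consistent network, or None if an empty relation arises."""
    net = {(i, j): set(BASE) for i in range(n) for j in range(n) if i != j}
    for (i, j), rel in constraints.items():
        net[(i, j)] &= set(rel)
        net[(j, i)] = converse(net[(i, j)])
        if not net[(i, j)]:
            return None
    queue = deque((i, j) for i in range(n) for j in range(n) if i < j)
    while queue:
        i, j = queue.popleft()
        for k in range(n):
            if k in (i, j):
                continue
            # R_ik := R_ik ∩ (R_ij ∘ R_jk) and R_kj := R_kj ∩ (R_ki ∘ R_ij)
            if not _tighten(net, i, k, compose(net[(i, j)], net[(j, k)]), queue):
                return None
            if not _tighten(net, k, j, compose(net[(k, i)], net[(i, j)]), queue):
                return None
    return net
```

For example, from x0 < x1 and x1 < x2 the sketch derives x0 < x2, and adding x2 < x0 makes it report inconsistency by returning None.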

Inferring Human Attention by Learning Latent Intentions
Ping Wei, Dan Xie, Nanning Zheng, Song-Chun Zhu
This paper addresses the problem of inferring 3D human attention in RGB-D videos at scene scale. 3D human attention describes where a human is looking in 3D scenes. We propose a probabilistic method to jointly model attention, intentions, and their interactions. Latent intentions guide human attention, which in turn reveals the intention features. This mutual interaction makes attention inference a joint optimization with latent intentions. An EM-based approach is adopted to learn the latent intentions and model parameters. Given an RGB-D video with 3D human skeletons, a joint-state dynamic programming algorithm is utilized to jointly infer the latent intentions, the 3D attention directions, and the attention voxels in scene point clouds. Experiments on a new 3D human attention dataset demonstrate the strength of our method.

Dynamic Logic for Data-aware Systems: Decidability Results
Francesco Belardinelli, Andreas Herzig
We introduce a first-order extension of dynamic logic (FODL), suitable to represent and reason about the behaviour of Data-aware Systems (DaS), which are systems whose data content is explicitly exhibited in the system’s description. We illustrate the expressivity of the formal framework by modelling English auctions as DaS, and by specifying relevant properties in FODL. Most importantly, we develop an abstraction-based verification procedure, thus proving that the model checking problem for DaS against FODL is actually decidable, provided some mild assumptions on the interpretation domain.

Temporal Sequences of Qualitative Information: Reasoning about the Topology of Constant-Size Moving Regions
Quentin Cohen-Solal, Maroua Bouzid, Alexandre Niveau
Relying on the recently introduced multi-algebras, we present a general approach for reasoning about temporal sequences of qualitative information that is generally more efficient than existing techniques. Applying our approach to the specific case of sequences of topological information about constant-size regions, we show that the resulting formalism has a complete procedure for deciding consistency, and we identify its three maximal tractable subclasses containing all basic relations.

Temporalising Separation Logic for Planning with Search Control Knowledge
Xu Lu, Cong Tian, Zhenhua Duan
Temporal logics are widely adopted in Artificial Intelligence (AI) planning for specifying Search Control Knowledge (SCK). However, traditional temporal logics are limited in expressive power since they are unable to express spatial constraints, which are as important as temporal ones in many planning domains. To this end, we propose a two-dimensional (spatial and temporal) logic, namely PPTL^SL, by temporalising separation logic with Propositional Projection Temporal Logic (PPTL). The new logic is well-suited for specifying SCK containing both spatial and temporal constraints, which are useful in AI planning. We show that PPTL^SL is decidable and present a decision procedure. On this basis, we develop a planner, STSolver, for computing plans based on spatio-temporal SCK expressed in PPTL^SL formulas. Evaluation on some selected benchmark domains shows the effectiveness of STSolver.

Bounded Timed Propositional Temporal Logic with Past Captures Timeline-based Planning with Bounded Constraints
Nicola Gigante, Dario Della Monica, Angelo Montanari, Pietro Sala, Guido Sciavicco
Within the timeline-based framework, planning problems are modeled as sets of independent, but interacting, components whose behavior over time is described by a set of temporal constraints. Timeline-based planning is being used successfully in a number of complex tasks, but its theoretical properties are not so well studied. In particular, while it is known that Linear Temporal Logic (LTL) can capture classical action-based planning, a similar logical characterization was not available for timeline-based planning formalisms. This paper shows that timeline-based planning with bounded temporal constraints can be captured by a bounded version of Timed Propositional Temporal Logic, augmented with past operators, which is an extension of LTL originally designed for the verification of real-time systems. As a by-product, we get that the proposed logic is expressive enough to capture temporal action-based planning problems.
Wednesday 23 08:30-10:00 CSMOTR - Modeling and Formulation

Cardinality Encodings for Graph Optimization Problems
Alexey Ignatiev, Antonio Morgado, Joao Marques-Silva
Different optimization problems defined on graphs find application in complex network analysis. Existing propositional encodings render impractical the use of propositional satisfiability (SAT) and maximum satisfiability (MaxSAT) solvers for solving a variety of these problems on large graphs. This paper has two main contributions. First, the paper identifies sources of inefficiency in existing encodings for different optimization problems in graphs. Second, for the concrete case of the maximum clique problem, the paper develops a novel encoding which is shown to be far more compact than existing encodings for large sparse graphs. More importantly, the experimental results show that the proposed encoding enables existing SAT solvers to compute a maximum clique for large sparse networks, often more efficiently than the state of the art.

Temporal Planning with Clock-Based SMT Encodings
Jussi Rintanen
We propose more scalable encodings of temporal planning in SMT. The first contribution is practical clock-based encodings of resources and effect delays. Existing encodings of effect delays (Shin and Davis, 2015) have a quadratic size, due to the necessity to determine the time differences between steps for a linear number of steps. Clocks improve this to linear. The second contribution is a new relaxed scheme for steps. Existing schemes require a step for every time point with discontinuous change; relaxing this requirement improves scalability.

Finding Robust Solutions to Stable Marriage
Begum Genc, Mohamed Siala, Barry O'Sullivan, Gilles Simonin
We study the notion of robustness in stable matching problems. We first define robustness by introducing (a,b)-supermatches. An (a,b)-supermatch is a stable matching in which if a pairs break up it is possible to find another stable matching by changing the partners of those a pairs and at most b other pairs. In this context, we define the most robust stable matching as a (1,b)-supermatch where b is minimum. We show that checking whether a given stable matching is a (1,b)-supermatch can be done in polynomial time. Next, we use this procedure to design a constraint programming model, a local search approach, and a genetic algorithm to find the most robust stable matching. Our empirical evaluation on large instances shows that local search outperforms the other approaches.
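To make the (1,b)-supermatch definition concrete, the Python sketch below computes, for a given stable matching, the smallest b for which it is a (1,b)-supermatch. It is a brute-force illustration for tiny instances only (it enumerates all n! perfect matchings), not the polynomial-time check described above; the preference lists and function names are hypothetical. A matching is a tuple where position m holds the woman matched to man m.

```python
import math
from itertools import permutations

def is_stable(match, men_pref, women_pref):
    """True if no man and woman both prefer each other to their partners."""
    husband = {w: m for m, w in enumerate(match)}
    for m, prefs in enumerate(men_pref):
        for w in prefs:
            if w == match[m]:
                break  # every remaining woman is worse for m than his partner
            # m prefers w to his partner; does w prefer m to her husband?
            if women_pref[w].index(m) < women_pref[w].index(husband[w]):
                return False
    return True

def stable_matchings(men_pref, women_pref):
    n = len(men_pref)
    return [p for p in permutations(range(n)) if is_stable(p, men_pref, women_pref)]

def min_b(match, men_pref, women_pref):
    """Smallest b such that `match` is a (1,b)-supermatch (math.inf if none)."""
    pairs = set(enumerate(match))
    alternatives = [set(enumerate(alt)) for alt in stable_matchings(men_pref, women_pref)]
    worst = 0
    for broken in pairs:
        best = math.inf
        for alt_pairs in alternatives:
            if broken in alt_pairs:
                continue  # the broken-up pair may not be re-formed
            # pairs of `match` that change, excluding the broken pair itself
            best = min(best, len(pairs - alt_pairs) - 1)
        worst = max(worst, best)
    return worst

# Two stable matchings exist: men-optimal (0, 1) and women-optimal (1, 0).
men = [[0, 1], [1, 0]]
women = [[1, 0], [0, 1]]
print(min_b((0, 1), men, women))  # 1
```

The most robust stable matching would then be any stable matching with the smallest min_b value.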

Nonlinear Hybrid Planning with Deep Net Learned Transition Models and Mixed-Integer Linear Programming
Buser Say, Ga Wu, Yu Qing Zhou, Scott Sanner
In many real-world hybrid (mixed discrete-continuous) planning problems such as Reservoir Control, Heating, Ventilation and Air Conditioning (HVAC), and Navigation, it is difficult to obtain a model of the complex nonlinear dynamics that govern state evolution. However, the ubiquity of modern sensors allows us to collect large quantities of data from each of these complex systems and build accurate, nonlinear deep network models of their state transitions. But there remains one major problem for the task of control: how can we plan with deep network learned transition models without resorting to Monte Carlo Tree Search and other black-box transition model techniques that ignore model structure and do not easily extend to mixed discrete and continuous domains? In this paper, we make the critical observation that the popular Rectified Linear Unit (ReLU) transfer function for deep networks not only allows accurate nonlinear deep net model learning, but also permits a direct compilation of the deep network transition model to a Mixed-Integer Linear Program (MILP) encoding in a planner we call Hybrid Deep MILP Planning (HD-MILP-PLAN). We identify deep-net-specific optimizations and a simple sparsification method for HD-MILP-PLAN that improve performance over a naive encoding, and show that we are able to plan optimally with respect to the learned deep network.
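Compiling a ReLU network into a MILP commonly relies on the standard big-M encoding of each unit y = max(0, x) with one binary variable z. The Python sketch below states those four constraints for a single unit, assuming known bounds L ≤ x ≤ U with L < 0 < U, and checks that they admit exactly y = max(0, x). It illustrates the general technique; HD-MILP-PLAN's actual encoding and optimizations may differ, and the function names are hypothetical.

```python
def relu_constraints_hold(x, y, z, lower, upper, tol=1e-9):
    """Big-M constraints for y = max(0, x) with binary z and lower <= x <= upper:
    y >= 0, y >= x, y <= x - lower * (1 - z), y <= upper * z."""
    if not (lower < 0 < upper) or z not in (0, 1):
        raise ValueError("need lower < 0 < upper and binary z")
    return (
        y >= -tol
        and y >= x - tol
        and y <= x - lower * (1 - z) + tol   # z = 1 forces y <= x, so y = x
        and y <= upper * z + tol             # z = 0 forces y <= 0, so y = 0
    )

def feasible_outputs(x, lower, upper, candidates):
    """Candidate y values that satisfy the encoding for some choice of z."""
    return sorted({y for y in candidates for z in (0, 1)
                   if relu_constraints_hold(x, y, z, lower, upper)})

grid = [i * 0.5 for i in range(-8, 9)]       # candidate outputs -4.0 .. 4.0
print(feasible_outputs(1.5, -4.0, 4.0, grid))   # [1.5]
print(feasible_outputs(-2.0, -4.0, 4.0, grid))  # [0.0]
```

In a full MILP, x would be an affine expression over the previous layer's variables and z a binary variable handed to the solver; tighter bounds L and U give a stronger LP relaxation.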

Relaxed Exists-Step Plans in Planning as SMT
Joan Espasa Arxer, Miquel Bofill, Mateu Villaret
Planning Modulo Theories (PMT), inspired by Satisfiability Modulo Theories (SMT), allows the integration of arbitrary first-order theories, such as linear arithmetic, with propositional planning. Under this setting, planning as SAT is generalized to planning as SMT. In this paper we introduce a new encoding for planning as SMT, which adheres to the relaxed relaxed ∃-step (R²∃-step) semantics for parallel plans. We show the benefits of relaxing the requirements on the set of actions eligible to be executed at the same time, even though many redundant actions can be introduced. We also show how, by a MaxSMT-based post-processing step, redundant actions can be efficiently removed, and provide experimental results showing the benefits of this approach.

Compact MDDs for Pseudo-Boolean Constraints with At-Most-One Relations in Resource-Constrained Scheduling Problems
Jordi Coll, Miquel Bofill, Josep Suy, Mateu Villaret
Pseudo-Boolean (PB) constraints are usually encoded into Boolean clauses using compact Binary Decision Diagram (BDD) representations. Although these constraints appear in many problems, they are particularly useful for representing resource constraints in scheduling problems. Sometimes, the Boolean variables in the PB constraints have implicit at-most-one relations. In this work we introduce a way to take advantage of these implicit relations to obtain a compact Multi-Decision Diagram (MDD) representation for those PB constraints. We provide empirical evidence of the usefulness of this technique for some Resource-Constrained Project Scheduling Problem (RCPSP) variants, namely the Multi-Mode RCPSP (MRCPSP) and the RCPSP with Time-Dependent Resource Capacities and Requests (RCPSP/t). The size reduction of the representation of the PB constraints lets us decrease the number of Boolean variables in the encodings by one order of magnitude. We close/certify the optimum of many instances of these problems.
Wednesday 23 08:30-10:00 MASEP1 - Economic Paradigms 1

Why You Should Charge Your Friends for Borrowing Your Stuff
Kijung Shin, Euiwoong Lee, Dhivya Eswaran, Ariel Procaccia
We consider goods that can be shared with k-hop neighbors (i.e., the set of nodes within k hops from an owner) on a social network. We examine incentives to buy such a good by devising game-theoretic models where each node decides whether to buy the good or free-ride. First, we find that social inefficiency, specifically excessive purchase of the good, occurs in Nash equilibria. Second, the social inefficiency decreases as k increases and thus a good can be shared with more nodes. Third, and most importantly, the social inefficiency can also be significantly reduced by charging free riders an access cost and paying it to owners, leading to the conclusion that organizations and system designers should impose such a cost. These findings are supported by our theoretical analysis in terms of the price of anarchy and the price of stability, and by simulations based on synthetic and real social networks.
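The k-hop sharing set in this model (the nodes within k hops of an owner) can be computed with a breadth-first search cut off at depth k. The Python sketch below does this on a hypothetical adjacency-list graph and adds a helper that checks whether every node can access some owner's good; it assumes the owner belongs to its own sharing set, which the abstract does not state explicitly.

```python
from collections import deque

def k_hop_set(graph, owner, k):
    """Nodes within k hops of `owner` (including the owner), via depth-limited BFS."""
    dist = {owner: 0}
    queue = deque([owner])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def all_have_access(graph, owners, k):
    """True if every node owns the good or lies within k hops of an owner."""
    covered = set()
    for owner in owners:
        covered |= k_hop_set(graph, owner, k)
    return covered == set(graph)

# Hypothetical path graph 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(k_hop_set(path, 2, 1)))   # [1, 2, 3]
print(all_have_access(path, [2], 2))   # True
```

On this path graph a single purchase at node 2 covers everyone when k = 2, whereas k = 1 needs two owners (for example nodes 1 and 3), which illustrates why fewer purchases are needed as k grows.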

Representativeness-aware Aspect Analysis for Brand Monitoring in Social Media
Lizi Liao, Xiangnan He, Zhaochun Ren, Liqiang Nie, Huan Xu, Tat-Seng Chua
Owing to the fast-responding nature and extreme success of social media, many companies resort to social media sites for monitoring their brands’ reputation and the opinions of the general public. To help companies monitor their brands, in this work, we delve into the task of extracting representative aspects and posts from users’ free-text posts in social media. Previous efforts have treated it as a traditional information extraction task and overlooked the specific properties of social media, such as the possible noise in user-generated posts and the varying impacts of posts. In contrast, we extract aspects by maximizing their representativeness, a new notion we define that accounts for both the coverage of aspects and the impact of posts. We formalize it as a submodular optimization problem, and develop a FastPAS algorithm to jointly select representative posts and aspects. The FastPAS algorithm optimizes parameters in a greedy way, which is highly efficient and can reach a good solution with theoretical guarantees. We perform extensive experiments on two datasets, showing that our method outperforms the state-of-the-art aspect extraction and summarization methods in identifying representative aspects.

Contest Design with Uncertain Performance and Costly Participation
Priel Levy, David Sarne, Igor Rochlin
This paper studies the problem of designing contests for settings where a principal seeks to optimize the quality of the best performance obtained, and potential contestants only strategize about whether to participate in the contest, as participation incurs some cost. This type of contest can be mapped to various real-life settings (e.g., an audition, a beauty pageant, technology crowdsourcing). The paper provides a comparative game-theoretic solution to two variants of the above underlying model, parallel and sequential contests, enabling a characterization of the equilibrium strategies in each. Special emphasis is placed on the case where the contestants are homogeneous, which is often the case in real life whenever the contestants are basically alike and their ranking in the contest is mostly influenced by some probabilistic factors (e.g., luck). Here, several (somewhat counterintuitive) properties of the equilibrium are proved, in particular for the sequential contest, leading to a comprehensive characterization of the principal's preference between the two.

Pessimistic Leader-Follower Equilibria with Multiple Followers
Stefano Coniglio, Alberto Marchesi, Nicola Gatti
The problem of computing the strategy to commit to has been widely investigated in the scientific literature for the case where a single follower is present. In the multi-follower setting, though, results are only sporadic. In this paper, we address the multi-follower case for normal-form games, assuming that, after observing the leader’s commitment, the followers play pure strategies and reach a Nash equilibrium. We focus on the pessimistic case where, among many equilibria, one minimizing the leader’s utility is chosen (the opposite case is computationally trivial). We show that the problem is NP-hard even with only two followers, and propose an exact exponential-time algorithm which, for any number of followers, either finds an equilibrium when the game admits a finite one or, if not, an α-approximation of the supremum of the leader’s utility, for any α > 0.

Bounding the Inefficiency of Compromise
Ioannis Caragiannis, Panagiotis Kanellopoulos, Alexandros Voudouris
Social networks on the Internet have seen enormous growth recently and play a crucial role in different aspects of today's life. They have facilitated information dissemination in ways that have been beneficial for their users, but it is also a common belief that they are often used strategically in order to spread information that only serves the objectives of particular users. These properties have inspired a revision of classical opinion formation models from sociology using game-theoretic notions and tools. We follow the same modeling approach, focusing on scenarios where the opinion expressed by each user is a compromise between her internal belief and the opinions of a small number of neighbors among her social acquaintances. We formulate simple games that capture this behavior and quantify the inefficiency of equilibria using the well-known notion of the price of anarchy. Our results indicate that compromise comes at a cost that strongly depends on the neighborhood size.

Computing Bayes-Nash Equilibria in Combinatorial Auctions with Continuous Value and Action Spaces
Vitor Bosshard, Benedikt Bünz, Benjamin Lubin, Sven Seuken
Combinatorial auctions (CAs) are widely used in practice, which is why understanding their incentive properties is an important problem. However, finding Bayes-Nash equilibria (BNEs) of CAs analytically is tedious, and prior algorithmic work has only considered limited solution concepts (e.g., restricted action spaces). In this paper, we present a fast, general algorithm for computing symmetric pure epsilon-BNEs in CAs with continuous values and actions. In contrast to prior work, we separate the search phase (for finding the BNE) from the verification step (for estimating the epsilon), and always consider the full (continuous) action space in the best-response computation. We evaluate our method in the well-studied LLG domain, against a benchmark of 16 CAs for which BNEs are known analytically. In all cases, our algorithm converges quickly, matching the known results with high precision. Furthermore, for CAs with quasilinear utility functions and independent value distributions, we derive a theoretical bound on epsilon. Finally, we introduce the new multi-minded LLLLGG domain with eight goods and six bidders, and apply our algorithm to finding an equilibrium in this domain. Our algorithm is the first to find an accurate BNE in a CA of this size.
Wednesday 23 08:30-10:00 MLDMP - Data Mining and Personalization

FolkPopularityRank: Tag Recommendation for Enhancing Social Popularity using Text Tags in Content Sharing Services
Toshihiko Yamasaki, Jiani Hu, Shumpei Sano, Kiyoharu Aizawa
In this study, we address two emerging yet challenging problems in social media: (1) scoring the text tags in terms of their influence on the numbers of views, comments, and favorite ratings of images and videos on content sharing services, and (2) recommending additional tags to increase such popularity-related numbers. For these purposes, we present the FolkPopularityRank algorithm, which can score text tags based on their ability to influence the popularity-related numbers. The FolkPopularityRank algorithm is inspired by the PageRank and FolkRank algorithms, but the scores of the tags are calculated not only from the co-occurrence of the tags but also by considering the popularity-related numbers of the content. To the best of our knowledge, this is the first attempt to recommend tags that can enhance popularity attributes of social media. We conducted extensive experiments with about 1,000 images. We uploaded the photos with the recommended tags along with the original tags to Flickr as a real test, and obtained very promising results.

Sampling for Approximate Maximum Search in Factorized Tensor
Zhi Lu, Yang Hu, Bing Zeng
Factorization models have been extensively used for recovering the missing entries of a matrix or tensor. However, directly computing all of the entries using the learned factorization models is prohibitive when the size of the matrix/tensor is large. On the other hand, in many applications, such as collaborative filtering, we are only interested in the few largest entries. In this work, we propose a sampling-based approach for finding the top entries of a tensor which is decomposed by the CANDECOMP/PARAFAC model. We develop an algorithm to sample the entries with probabilities proportional to their values. We further extend it to make the sampling proportional to the $k$-th power of the values, amplifying the focus on the top ones. We provide theoretical analysis of the sampling algorithm and evaluate its performance on several real-world data sets. Experimental results indicate that the proposed approach is orders of magnitude faster than exhaustive computing. When applied to the special case of searching in a matrix, it also requires fewer samples than the other state-of-the-art method.
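One standard way to sample entries of a CP-decomposed tensor with probability proportional to their values, assuming all factor entries are nonnegative, is a two-level draw: first pick a rank-one component r in proportion to its total mass, then pick each mode's index independently in proportion to that component's factor column. The Python sketch below does this for a third-order tensor and ranks the most frequently sampled indices by their exact values. It is a generic illustration, not necessarily the authors' algorithm, and it omits the k-th-power extension; all names are hypothetical.

```python
import random
from collections import Counter

def cp_entry(A, B, C, i, j, k):
    """Exact value T[i, j, k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))

def sample_entry(A, B, C, rng=random):
    """Draw (i, j, k) with probability T[i, j, k] / sum(T); factors must be nonnegative."""
    rank = len(A[0])
    col_sums = [[sum(row[r] for row in F) for r in range(rank)] for F in (A, B, C)]
    # Total mass of rank-one component r is the product of its column sums.
    mass = [col_sums[0][r] * col_sums[1][r] * col_sums[2][r] for r in range(rank)]
    r = rng.choices(range(rank), weights=mass)[0]
    # Given r, each index is drawn in proportion to that factor column.
    return tuple(
        rng.choices(range(len(F)), weights=[row[r] for row in F])[0]
        for F in (A, B, C)
    )

def approx_top_entries(A, B, C, num_samples, top, seed=0):
    """Score only the most frequently sampled indices exactly, then keep the best."""
    rng = random.Random(seed)
    counts = Counter(sample_entry(A, B, C, rng) for _ in range(num_samples))
    candidates = [idx for idx, _ in counts.most_common(10 * top)]
    return sorted(candidates, key=lambda idx: cp_entry(A, B, C, *idx), reverse=True)[:top]

# Hypothetical rank-1 factors: T[0,0,0] = 2 and T[1,0,0] = 1.
A, B, C = [[2.0], [1.0]], [[1.0]], [[1.0]]
print(approx_top_entries(A, B, C, num_samples=200, top=1))  # [(0, 0, 0)]
```

Summing over r, the probability of drawing (i, j, k) is exactly T[i, j, k] divided by the sum of all entries, so large entries are sampled more often without ever materializing the full tensor.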

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua
Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating the second-order feature interactions. Despite its effectiveness, FM can be hindered by its modelling of all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. For example, interactions with useless features may even introduce noise and degrade performance. In this work, we improve FM by discriminating the importance of different feature interactions. We propose a novel model named Attentional Factorization Machine (AFM), which learns the importance of each feature interaction from data via a neural attention network. Extensive experiments on two real-world datasets demonstrate the effectiveness of AFM. Empirically, on the regression task AFM achieves an 8.6% relative improvement over FM, and it consistently outperforms the state-of-the-art deep learning methods Wide&Deep [Cheng et al., 2016] and DeepCross [Shan et al., 2016] with a much simpler structure and fewer model parameters. Our implementation of AFM is publicly available at: https://github.com/hexiangnan/attentional_factorization_machine
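To make the contrast concrete, the pure-Python sketch below scores one sparse input with a plain second-order FM, where every pairwise interaction has weight 1, and with an attention-weighted variant that pools the element-wise interaction vectors using softmax weights. The scorer shape (a one-hidden-layer ReLU network, h^T ReLU(W t + b), followed by a softmax over pairs, then a projection p) is an assumption for illustration; names and shapes are hypothetical and there is no training loop. The official implementation is at the URL above.

```python
import math
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pair_terms(x, V):
    """(v_i ⊙ v_j) x_i x_j for every pair of nonzero features; x maps index -> value."""
    feats = sorted(x)
    return [[a * b * x[i] * x[j] for a, b in zip(V[i], V[j])]
            for i, j in combinations(feats, 2)]

def linear_part(x, w0, w):
    return w0 + sum(w[i] * xi for i, xi in x.items())

def fm_predict(x, w0, w, V):
    """Second-order FM: sum of <v_i, v_j> x_i x_j with equal weight for every pair."""
    return linear_part(x, w0, w) + sum(sum(t) for t in pair_terms(x, V))

def afm_predict(x, w0, w, V, W, b, h, p):
    """Attention-weighted pooling of pairwise interaction vectors (assumed scorer)."""
    terms = pair_terms(x, V)
    if not terms:
        return linear_part(x, w0, w)
    # Score each pair with h^T ReLU(W t + b), then normalize with a softmax.
    scores = [dot(h, [max(0.0, dot(row, t) + bk) for row, bk in zip(W, b)]) for t in terms]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(a * t[f] for a, t in zip(weights, terms)) for f in range(len(V[0]))]
    return linear_part(x, w0, w) + dot(p, pooled)

x = {0: 1.0, 1: 2.0, 2: 0.5}
V = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
w0, w = 0.1, [0.2, -0.3, 0.4]
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
print(fm_predict(x, w0, w, V))
print(afm_predict(x, w0, w, V, W, b, h=[1.0, -1.0], p=[1.0, 1.0]))
```

As a sanity check on the design, with h = 0 the attention weights are uniform (1/P over P pairs), and setting every entry of p to P then recovers the plain FM prediction exactly.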

Learning User's Intrinsic and Extrinsic Interests for Point-of-Interest Recommendation: A Unified Approach
Huayu Li, Yong Ge, Defu Lian, Hao Liu
Point-of-Interest (POI) recommendation has been an important service on location-based social networks. However, it is very challenging to generate accurate recommendations due to the complex nature of users' interest in POIs and the sparseness of the data. In this paper, we propose a novel unified approach that could effectively learn fine-grained and interpretable user interest, and adaptively model the missing data. Specifically, a user's general interest in POIs is modeled as a mixture of her intrinsic and extrinsic interests, upon which we formulate the ranking constraints in our unified recommendation approach. Furthermore, a self-adaptive location-oriented method is proposed to capture the inherent property of missing data, which is formulated as a squared-error-based loss in our unified optimization objective. Extensive experiments on real-world datasets demonstrate the effectiveness and advantage of our approach.

Tracking the Evolution of Customer Purchase Behavior Segmentation via a Fragmentation-Coagulation Process
Ling Luo, Bin Li, Irena Koprinska, Shlomo Berkovsky, Fang Chen
Customer behavior modeling is important for businesses in order to understand, attract and retain customers. It is critical that the models are able to track the dynamics of customer behavior over time. We propose FC-CSM, a Customer Segmentation Model based on a Fragmentation-Coagulation process, which can track the evolution of customer segmentation, including the splitting and merging of customer groups. We conduct a case study using transaction data from a major Australian supermarket chain, where we: 1) show that our model achieves high fitness of purchase rate, outperforming models using a mixture of Poisson processes; 2) compare the impact of promotions on customers for different products; and 3) track how customer groups evolve over time and how individual customers shift across groups. Our model provides valuable information to stakeholders about the different types of customers, how they change purchase behavior, and which customers are more receptive to promotion campaigns.

Life-Stage Modeling by Customer-Manifold Embedding
Jing-Wen Yang, Yang Yu, Xiao-Peng Zhang
A person experiences different stages throughout life, which cause dramatically varying behavior patterns. In applications such as online shopping, it has been observed that customer behaviors are largely affected by their stages and evolve over time. Although this phenomenon has been recognized previously, very few studies have tried to model the life stage and make use of it. In this paper, we propose to discover a latent space, called the customer-manifold, on which a position corresponds to a customer stage. The customer-manifold allows us to train a static prediction model that captures dynamic customer behavior patterns. We further embed the learned customer-manifold into a neural network model as a hidden layer output, resulting in an efficient and accurate customer behavior prediction system. We apply this system to online-shopping recommendation. Experiments on real-world data show that taking the customer-manifold into account can improve the performance of the recommender system. Moreover, visualization of the customer-manifold space may also help in understanding evolving customer behaviors.
Wednesday 23 08:30  10:00 MLSSL1  SemiSupervised Learning 1

Storage Fit Learning with Unlabeled Data
Bo-Jian Hou, Lijun Zhang, Zhi-Hua Zhou
By using abundant unlabeled data, semi-supervised learning approaches have been found very useful in various tasks. Existing approaches, however, neglect the fact that the storage available for the learning process differs across situations, so learning approaches should adapt flexibly to the storage budget. In this paper, we focus on graph-based semi-supervised learning and propose two storage fit learning approaches which can adjust their behaviors to different storage budgets. Specifically, we use low-rank matrix approximation techniques to find a low-rank approximator of the similarity matrix so as to reduce the space complexity. The first approach is based on stochastic optimization; it is an iterative approach that converges globally to the optimal low-rank approximator. The second approach is based on the Nyström method, which can find a good low-rank approximator efficiently and is suitable for real-time applications. Experiments on classification tasks show that the proposed methods can dynamically fit different storage budgets and achieve good performance in different scenarios.
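
A minimal sketch of the Nyström idea, assuming a Gaussian similarity and a hand-picked set of landmark points (both are illustrative choices, not details taken from the paper): only the n x m block C and the m x m landmark block W are computed, and any entry of the approximation C W^-1 C^T is produced on demand.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian similarity exp(-||x - y||^2 / (2 sigma^2)) between two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def invert(matrix):
    """Gauss-Jordan inverse of a small nonsingular matrix given as lists."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def nystrom_factors(points, landmarks, sigma=1.0):
    """Return (Z, C) with Z = C W^-1, so that K is approximated by Z C^T.

    C is n x m (points vs. landmarks), W is m x m (landmarks vs. landmarks).
    """
    m = len(landmarks)
    C = [[gaussian_kernel(p, q, sigma) for q in landmarks] for p in points]
    W_inv = invert([[gaussian_kernel(p, q, sigma) for q in landmarks]
                    for p in landmarks])
    Z = [[sum(c[k] * W_inv[k][j] for k in range(m)) for j in range(m)] for c in C]
    return Z, C


def approx_entry(Z, C, i, j):
    """Entry (i, j) of C W^-1 C^T, computed on demand from the n x m factors."""
    return sum(a * b for a, b in zip(Z[i], C[j]))
```

With m landmarks the stored factors take O(nm) numbers instead of O(n^2), which is the quantity a storage budget would constrain; using every point as a landmark reproduces the exact similarity matrix.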

Multi-Positive and Unlabeled Learning
Yixing Xu, Chang Xu, Chao Xu, Dacheng Tao
The positive and unlabeled (PU) learning problem focuses on learning a classifier from positive and unlabeled data. Several methods have been developed to solve the PU learning problem. However, they are often limited in practical applications, since they involve only binary classes and cannot easily be adapted to multiclass data. Here we propose a one-step method that directly enables a multiclass model to be trained using the given multiclass input data and that predicts the label based on the model decision. Specifically, we construct different convex loss functions for labeled and unlabeled data to learn a discriminant function F. A theoretical analysis of the generalization error bound shows that it is no worse than k√k times that of fully supervised multiclass classification methods when the sizes of the data in the k classes are of the same order. Finally, our experimental results demonstrate the significance and effectiveness of the proposed algorithm on synthetic and real-world datasets.

Adaptively Unified Semi-Supervised Learning for Cross-Modal Retrieval
Liang Zhang, Bingpeng Ma, Jianfeng He, Guorong Li, Qingming Huang, Qi Tian
Motivated by the fact that both the relevance of class labels and unlabeled data can help strengthen multimodal correlation, this paper proposes a novel method for cross-modal retrieval. To make each sample move toward its relevant labels and away from its irrelevant ones, a novel dragging technique is fused into a unified linear regression model. In this way, both the relation between embedded features and relevant class labels and the relation between embedded features and irrelevant class labels can be exploited. Moreover, considering that some unlabeled data contain specific semantic information, a weighted regression model is designed to adaptively enlarge their contribution while weakening that of unlabeled data with nonspecific semantic information. Hence, unlabeled data can supply semantic information that enhances the discriminative ability of the classifier. Finally, we integrate the constraints into a joint minimization formulation and develop an efficient optimization algorithm to learn a discriminative common subspace for different modalities. Experimental results on the Wiki, Pascal and NUS-WIDE datasets show that the proposed method outperforms state-of-the-art methods even when 20% of the samples are left without class labels.
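
A rough sketch of what a dragging step can look like for a single sample, assuming the epsilon-dragging formulation known from discriminative least squares regression (0/1 targets, drag direction +1 for the relevant label and -1 for the others); the paper's exact model may differ. Predictions that already lie beyond their target in the allowed direction are not penalized.

```python
def dragged_targets(predictions, label):
    """Epsilon-dragging targets for one sample (illustrative sketch).

    predictions: model outputs, one per class; label: index of the relevant class.
    The relevant target may be dragged upward from 1 and the irrelevant
    targets downward from 0 by a non-negative amount m.
    """
    targets = []
    for j, p in enumerate(predictions):
        y = 1.0 if j == label else 0.0    # 0/1 target
        b = 1.0 if j == label else -1.0   # allowed drag direction
        m = max(b * (p - y), 0.0)         # optimal non-negative drag for fixed p
        targets.append(y + b * m)
    return targets


def dragged_squared_loss(predictions, label):
    """Squared error against the dragged targets."""
    targets = dragged_targets(predictions, label)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))
```

For predictions [1.3, -0.4, 0.2] with label 0, only the irrelevant output 0.2 is pulled back toward 0; the other two already sit on the correct side of their targets.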

Instance-Level Label Propagation with Multi-Instance Learning
Qifan Wang, Gal Chechik, Chen Sun, Bin Shen
Label propagation is a popular semi-supervised learning technique that transfers information from labeled examples to unlabeled examples through a graph. Most label propagation methods construct a graph based on example-to-example similarity, assuming that the resulting graph connects examples that share similar labels. Unfortunately, example-level similarity is sometimes badly defined. For instance, two images may contain different objects but have a similar overall appearance because of a large, similar background. In this case, computing similarities on whole images would fail to propagate information to the right labels. This paper proposes a novel Instance-Level Label Propagation (ILLP) approach that integrates label propagation with multi-instance learning. Each example is treated as containing multiple instances, as in the case of an image consisting of multiple regions. We first construct a graph based on instance-level similarity and then simultaneously identify the instances carrying the labels and propagate the labels across instances in the graph. Optimization is based on an iterative Expectation-Maximization (EM) algorithm. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed approach over several state-of-the-art methods.
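
For context, a minimal sketch of the classic example-level label propagation that ILLP builds on (not ILLP itself): labeled nodes are clamped, and each unlabeled node repeatedly takes the degree-normalized weighted average of its neighbors' label scores. The dense list-of-lists graph and the fixed iteration count are illustrative assumptions.

```python
def label_propagation(weights, labels, num_classes, iterations=100):
    """Iterative label propagation with clamped seed labels.

    weights: symmetric n x n list of non-negative edge weights.
    labels: dict {node index: class} for the labeled nodes.
    Returns the predicted class of every node.
    """
    n = len(weights)
    F = [[0.0] * num_classes for _ in range(n)]
    for i, c in labels.items():
        F[i][c] = 1.0
    for _ in range(iterations):
        new_F = []
        for i in range(n):
            if i in labels:                  # clamp labeled nodes to their seeds
                new_F.append(F[i])
                continue
            degree = sum(weights[i])
            # Weighted average of neighbor scores, class by class.
            row = [sum(weights[i][j] * F[j][c] for j in range(n)) / degree if degree else 0.0
                   for c in range(num_classes)]
            new_F.append(row)
        F = new_F
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]
```

On a chain 0-1-2-3-4 with node 0 labeled 0 and node 4 labeled 1, node 1 ends up closer to class 0 and node 3 closer to class 1. The failure mode described above arises when the edge weights themselves come from misleading whole-image similarity.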

Learning Discriminative Recommendation Systems with Side Information
Feipeng Zhao, Yuhong Guo
Top-N recommendation systems are useful in many real-world applications such as e-commerce platforms. Most previous methods produce top-N recommendations based on observed user purchase or recommendation activities. Recently, it has been noticed that side information describing the items can be obtained from auxiliary sources and can help improve the performance of top-N recommendation systems; e.g., side information about the items can be collected from item reviews. In this paper, we propose a joint discriminative prediction model that exploits both the partially observed user-item recommendation matrix and the item-based side information to build top-N recommendation systems. This joint model aggregates observed user-item recommendation activities to produce the missing user-item recommendation scores while simultaneously training a linear regression model to predict the user-item recommendation scores from auxiliary item features. We evaluate the proposed approach on a number of recommendation datasets. The experimental results show that the proposed joint model is very effective for producing top-N recommendation systems.

Adaptive Semi-Supervised Learning with Discriminative Least Squares Regression
Minnan Luo, Lingling Zhang, Feiping Nie, Xiaojun Chang, Buyue Qian, Qinghua Zheng
Semi-supervised learning plays a significant role in multiclass classification, where a small amount of labeled data is more deterministic while a large amount of unlabeled data may cause considerable uncertainty and potential risks. In this paper, we distinguish the label fitting of labeled and unlabeled training data through a probabilistic vector with an adaptive parameter, which always preserves the importance of labeled data and characterizes the contribution of each unlabeled instance according to its uncertainty. Instead of using traditional least squares regression (LSR) for classification, we develop a new discriminative LSR by equipping each label with an adjustment vector. This strategy avoids incorrectly penalizing samples that are far from the boundary and simultaneously facilitates multiclass classification by enlarging the geometric distance between instances belonging to different classes. An efficient alternating algorithm is used to solve the proposed model, with a closed-form solution for each update rule. We also analyze the convergence and complexity of the proposed algorithm theoretically. Experimental results on several benchmark datasets demonstrate the effectiveness and superiority of the proposed model for multiclass classification tasks.
Wednesday 23 08:30  10:00 SISSECO  Sister Conference Track: Search and Constraints

Using Constraint Programming to Solve a Cryptanalytic Problem
Christine Solnon, David Gerault, Marine Minier
We describe Constraint Programming (CP) models to solve a cryptanalytic problem: the chosen-key differential attack against the standard block cipher AES. We show that CP solvers are able to solve these problems more quickly than dedicated cryptanalysis tools, and we prove that a solution claimed to be optimal in two recent cryptanalysis papers is not optimal by providing a better solution.

A SAT Approach to Branchwidth
Neha Lodha, Sebastian Ordyniak, Stefan Szeider
Branch decomposition is a prominent method for structurally decomposing a graph, hypergraph or CNF formula. The width of a branch decomposition provides a measure of how well the object is decomposed. For many applications it is crucial to compute a branch decomposition whose width is as small as possible. We propose a SAT approach to finding branch decompositions of small width. The core of our approach is an efficient SAT encoding which determines with a single SAT call whether a given hypergraph admits a branch decomposition of a certain width. For our encoding we develop a novel partition-based characterization of branch decompositions. The encoding size imposes a limit on the size of the given hypergraph. In order to break through this barrier and scale the SAT approach to larger instances, we develop a new heuristic approach where the SAT encoding is used to locally improve a given candidate decomposition until a fixed point is reached. This new method now scales to instances with several thousand vertices and edges.

Blockedness in Propositional Logic: Are You Satisfied With Your Neighborhood?
Benjamin Kiesl, Martina Seidl, Hans Tompits, Armin Biere
Clause-elimination techniques that simplify formulas by removing redundant clauses play an important role in modern SAT solving. Among the types of redundant clauses, blocked clauses are particularly popular. For checking whether a clause C is blocked in a formula F, one only needs to consider the so-called resolution neighborhood of C, i.e., the set of clauses that can be resolved with C. Because of this, blocked clauses are referred to as being locally redundant. In this paper, we discuss powerful generalizations of blocked clauses that are still locally redundant, viz. set-blocked clauses and super-blocked clauses. We furthermore present complexity results for deciding whether a clause is set-blocked or super-blocked.
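
The locality of plain blocked clauses can be sketched as follows: a clause C is blocked on a literal l in C if every resolvent of C on l with another clause of F is a tautology, so only the clauses containing the complement of l need to be inspected. Clauses are represented here as frozensets of non-zero integers in DIMACS style (an assumption of this example); the set-blocked and super-blocked generalizations studied in the paper are not covered.

```python
def resolvent(c, d, lit):
    """Resolvent of clause c (which contains lit) and d (which contains -lit)."""
    return (c - {lit}) | (d - {-lit})


def is_tautology(clause):
    """A clause is a tautology if it contains a literal and its complement."""
    return any(-lit in clause for lit in clause)


def blocking_literal(c, formula):
    """Return a literal on which clause c is blocked in formula, or None.

    Only the resolution neighborhood of c is inspected: the other clauses
    of the formula that contain the complement of a literal of c.
    """
    for lit in sorted(c):
        neighbors = [d for d in formula if d != c and -lit in d]
        if all(is_tautology(resolvent(c, d, lit)) for d in neighbors):
            return lit
    return None
```

In F = {(x1 ∨ x2), (¬x1 ∨ ¬x2)}, the clause (x1 ∨ x2) is blocked on x1, because its only resolvent on x1 is the tautology (x2 ∨ ¬x2); a clause-elimination step may therefore remove it without affecting satisfiability.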

Solving Very Hard Problems: Cube-and-Conquer, a Hybrid SAT Solving Method
Marijn Heule, Oliver Kullmann, Victor Marek
A recent success of SAT solving has been the solution of the Boolean Pythagorean Triples problem [Heule et al., 2016], delivering the largest proof yet, 200 terabytes in size. We present this and the underlying paradigm Cube-and-Conquer, a powerful general method to solve big SAT problems, based on integrating the “old” and “new” methods of SAT solving.
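
A toy sketch of the Cube-and-Conquer split, under simplifying assumptions: the cubes are all assignments to a fixed list of split variables (real cubers choose split variables with look-ahead heuristics), and each cube is conquered by a small DPLL with unit propagation rather than a CDCL solver. Clauses are lists of non-zero integers. The formula is satisfiable if and only if some cube is.

```python
from itertools import product


def simplify(clauses, lit):
    """Make lit true: drop satisfied clauses and remove the falsified literal."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]


def dpll(clauses):
    """Plain recursive DPLL with unit propagation (the 'conquer' solver here)."""
    if not clauses:
        return True
    if any(len(c) == 0 for c in clauses):
        return False
    for c in clauses:                    # unit propagation
        if len(c) == 1:
            return dpll(simplify(clauses, c[0]))
    lit = clauses[0][0]                  # naive branching choice
    return dpll(simplify(clauses, lit)) or dpll(simplify(clauses, -lit))


def cubes(split_vars):
    """All 2^k sign combinations of the split variables."""
    for signs in product((1, -1), repeat=len(split_vars)):
        yield [s * v for s, v in zip(signs, split_vars)]


def cube_and_conquer(clauses, split_vars):
    """Return (True, cube) for the first satisfiable cube, else (False, None)."""
    for cube in cubes(split_vars):
        reduced = clauses
        for lit in cube:
            reduced = simplify(reduced, lit)
        if dpll(reduced):
            return True, cube
    return False, None
```

Because the cubes are independent, they could be handed to separate solver processes, which is what makes the approach attractive for very large problems.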
Wednesday 23 08:30  10:00 EAR2  Early Career 2

Multimodal News Article Analysis
Arnau Ramisa
The intersection of Computer Vision and Natural Language Processing has been a hot topic of research in recent years, with results that were unthinkable only a few years ago. In view of this progress, we want to highlight online news articles as a potential next step for this area of research. The rich interrelations of text, tags, images or videos, as well as a vast corpus of general knowledge, are an exciting benchmark for high-capacity models such as deep neural networks. In this paper we present a series of tasks and baseline approaches to leverage corpora such as the BreakingNews dataset.

Towards Understanding Stories in Videos
Sanja Fidler

Robotic Strategic Behavior in Adversarial Environments
Noa Agmon
The presence of robots in areas containing threats is becoming more prevalent, due to their ability to perform missions accurately, efficiently, and with little risk to humans. Having robots handle adversarial forces in missions such as search and rescue, intelligence gathering, border protection and humanitarian assistance raises many new, exciting research challenges. This paper describes recent research achievements in areas related to robotic mission planning in adversarial environments, including multi-robot patrolling, robotic coverage, multi-robot formation, and navigation, and suggests possible future research directions.
Wednesday 23 10:30  12:00 PLPA  Planning Algorithms

On Creating Complementary Pattern Databases
Santiago Franco, Alvaro Torralba, Levi Lelis, Mike Barley
A pattern database (PDB) for a planning problem is a heuristic function in the form of a lookup table that contains optimal solution costs of a simplified version of the problem. The simplified problem is constructed by accounting for only a subset of the problem’s variables, which is called a pattern. It is known that the effectiveness of a PDB is greatly affected by the choice of its pattern and that PDBs can be more effective when combined. In this paper we introduce a method for sequentially creating multiple PDB heuristics that accounts for the PDBs already created. At each iteration, our method tries to select a pattern that complements the strengths of the patterns already selected. We evaluate our algorithm using explicit and symbolic PDBs. Our results show that the heuristics produced by our approach are able to outperform existing schemes, and that our method is able to create PDBs that complement other existing search enhancements such as perimeter search.
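
A minimal sketch of how a single PDB is built and queried, on a hypothetical toy domain: every variable takes values 0..size-1, each action moves one variable up or down by 1 at unit cost, and the goal fixes every variable. The abstract state space keeps only the pattern variables, and a breadth-first search from the abstract goal fills the lookup table; because every action here is reversible, distances from the goal equal distances to it. Pattern selection, which is the paper's actual contribution, is not shown.

```python
from collections import deque


def build_pdb(sizes, pattern, goal):
    """Map each abstract state (values of the pattern variables) to its goal distance.

    sizes[v] is the domain size of variable v; goal[v] is its goal value.
    Actions on variables outside the pattern become self-loops and are ignored.
    """
    start = tuple(goal[v] for v in pattern)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for k, v in enumerate(pattern):
            for delta in (-1, 1):
                value = state[k] + delta
                if 0 <= value < sizes[v]:
                    succ = state[:k] + (value,) + state[k + 1:]
                    if succ not in dist:      # first visit = shortest distance
                        dist[succ] = dist[state] + 1
                        queue.append(succ)
    return dist


def pdb_heuristic(pdb, pattern, state):
    """Look up the abstract goal distance of a concrete state."""
    return pdb[tuple(state[v] for v in pattern)]


def additive_heuristic(pdbs, state):
    """Sum of PDB values; admissible here only because the patterns are disjoint
    and each action changes exactly one variable."""
    return sum(pdb_heuristic(pdb, pattern, state) for pdb, pattern in pdbs)
```

In general domains, summing PDBs requires additivity conditions or cost partitioning; the point of choosing complementary patterns is that their combination is more informative than any single one.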

Additive Merge-and-Shrink Heuristics for Diverse Action Costs
Gaojian Fan, Martin Mueller, Robert Holte
In many planning applications, actions can have highly diverse costs. Recent studies focus on the effects of diverse action costs on search algorithms, but not on their effects on domain-independent heuristics. In this paper, we demonstrate that action cost diversity has negative impacts on merge-and-shrink (M&S), a successful abstraction method for producing high-quality heuristics for planning problems. We propose a new cost partitioning method for M&S to address these negative effects. We investigate non-unit cost IPC domains, especially those for which diverse action costs have severe negative effects on the quality of the M&S heuristic. Our experiments demonstrate that in these domains, an additive set of M&S heuristics using the new cost partitioning method produces much more informative and effective heuristics than creating a single M&S heuristic which directly encodes diverse costs.

From Qualitative to Quantitative Dominance Pruning for Optimal Planning
Alvaro Torralba
Dominance relations compare states to determine whether one is at least as good as another in terms of their goal distance. We generalize these qualitative yes/no relations to functions that measure by how much a state is better than another. This allows us to distinguish cases where the state is strictly closer to the goal. Moreover, we may obtain a bound on the difference in goal distance between two states even if there is no qualitative dominance. We analyze the multiple advantages of quantitative dominance, such as discovering coarser dominance relations, or trading dominance by g-value. Moreover, quantitative dominance can also be used to prove that an action starts an optimal plan from a given state. We introduce a novel action selection pruning that uses this to prune any other successor. Results show that quantitative dominance pruning greatly reduces the search space, significantly increasing the planners' performance.

Search and Learn: On Dead-End Detectors, the Traps they Set, and Trap Learning
Marcel Steinmetz, Joerg Hoffmann
A key technique for proving unsolvability in classical planning is the use of dead-end detectors Δ: effectively testable criteria sufficient for unsolvability, pruning (some) unsolvable states during search. Related to this, a recent proposal is the identification of traps prior to search: compact representations of non-goal state sets T that cannot be escaped. Here, we create new synergy across these ideas. We define a generalized concept of traps, relative to a given dead-end detector Δ, where T can be escaped, but only into dead-end states detected by Δ. We show how to learn compact representations of such T during search, extending the reach of Δ. Our experiments show that this can be quite beneficial. It improves coverage for many unsolvable benchmark planning domains and dead-end detectors Δ, in particular on resource-constrained domains where it outperforms the state of the art.

Robust Advertisement Allocation
Shaojie Tang
With the rapid growth of e-commerce and the World Wide Web, internet advertising revenue has recently surpassed broadcast revenue. As online advertising has become a major source of revenue for online publishers such as Google and Amazon, one problem facing them is how to optimize ad selection and allocation in order to maximize their revenue. Although a rich body of work has been devoted to this field, uncertainty about models and parameter settings is largely ignored in existing algorithm design. To fill this gap, we are the first to formulate and study the Robust Ad Allocation problem, taking into account the uncertainty about parameter settings. We define a Robust Ad Allocation framework with a set of candidate parameter settings, typically derived from different users or topics. Our main aim is to develop robust ad allocation algorithms that provide satisfactory performance across a spectrum of parameter settings, compared to the (parameter-specific) optimal solutions. We study this problem progressively and propose a series of algorithms with bounded approximation ratios.

Purely Declarative Action Descriptions are Overrated: Classical Planning with Simulators
Guillem Francès, Miquel Ramírez, Nir Lipovetzky, Hector Geffner
Classical planning is concerned with problems where a goal needs to be reached from a known initial state by doing actions with deterministic, known effects. Classical planners, however, deal only with classical problems that can be expressed in declarative planning languages such as STRIPS or PDDL. This prevents their use on problems that are not easy to model declaratively or whose dynamics are given via simulations. Simulators do not provide a declarative representation of actions, but simply return successor states. The question we address in this paper is: can a planner that has access only to the structure of states and goals approach the performance of planners that also have access to the structure of actions expressed in PDDL? To answer this, we develop domain-independent, black-box planning algorithms that completely ignore action structure, and show that they match the performance of state-of-the-art classical planners on the standard planning benchmarks. Effective black-box algorithms open up new possibilities for modeling and for expressing control knowledge, which we also illustrate.
Wednesday 23 10:30  12:00 MLCL4  Classification 4

Adaptive Manifold Regularized Matrix Factorization for Data Clustering
Lefei Zhang, Qian Zhang, Bo Du, Jane You, Dacheng Tao
Data clustering is the task of grouping data samples into clusters based on the relationships among samples and the structures hidden in the data; it is a fundamental and important topic in data mining and machine learning. In the literature, spectral clustering is one of the most popular approaches and has had many variants in recent years. However, the performance of spectral clustering is determined by the affinity matrix, which is always computed by a predefined model (e.g., a Gaussian kernel function) with a carefully tuned combination of parameters, and which may be far from optimal in practice. In this paper, we propose to view data clustering from a robust matrix factorization point of view, and to learn an affinity matrix simultaneously to regularize the proposed matrix factorization. The solution of the proposed adaptive manifold regularized matrix factorization (AMRMF) is obtained by a novel Augmented Lagrangian Multiplier (ALM) based algorithm. Experimental results on standard clustering datasets demonstrate superior performance over existing alternatives.

Efficient Kernel Selection via Spectral Analysis
Jian Li, Yong Liu, Hailun Lin, Yinliang Yue, Weiping Wang
Kernel selection is a fundamental problem of kernel methods. Existing measures for kernel selection either provide weak theoretical guarantees or have high computational complexity. In this paper, we propose a novel kernel selection criterion based on a newly defined spectral measure of a kernel matrix, with a sound theoretical foundation and high computational efficiency. We first show that the spectral measure can be used to derive generalization bounds for some kernel-based algorithms. By minimizing the derived generalization bounds, we obtain the kernel selection criterion with spectral measure. Moreover, we demonstrate that the popular minimum graph cut and maximum mean discrepancy are two special cases of the proposed criterion. Experimental results on many data sets show that our proposed criterion not only gives results comparable to state-of-the-art criteria, but also significantly improves efficiency.

Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks
Yasutoshi Ida, Yasuhiro Fujiwara, Sotetsu Iwamura
Adaptive learning rate algorithms such as RMSProp are widely used for training deep neural networks. RMSProp offers efficient training since it uses first-order gradients to approximate Hessian-based preconditioning. However, since the first-order gradients include noise caused by stochastic optimization, the approximation may be inaccurate. In this paper, we propose a novel adaptive learning rate algorithm called SDProp. Its key idea is to handle the noise effectively by preconditioning based on a covariance matrix. For various neural networks, our approach is more efficient and effective than RMSProp and its variant.
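
For reference, a minimal sketch of the RMSProp baseline on a one-dimensional problem: the running average of squared gradients serves as a diagonal preconditioner that rescales each step. The covariance-based preconditioning of SDProp is not specified in this abstract, so it is not reproduced here; the learning rate, decay and step count are arbitrary illustrative values.

```python
import math


def rmsprop_minimize(grad, x0, lr=0.01, rho=0.9, eps=1e-8, steps=2000):
    """Minimize a 1-D function with RMSProp, given its gradient function.

    avg_sq is an exponential moving average of squared gradients; dividing
    by its square root normalizes the step size, which is why noisy
    gradients directly distort the preconditioner.
    """
    x, avg_sq = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        avg_sq = rho * avg_sq + (1 - rho) * g * g
        x -= lr * g / (math.sqrt(avg_sq) + eps)
    return x
```

Minimizing f(x) = x^2 from x = 5 with this sketch moves x by roughly lr per step regardless of the gradient's magnitude, so it ends close to 0.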

Robust Softmax Regression for Multiclass Classification with Self-Paced Learning
Yazhou Ren, Peng Zhao, Zenglin Xu, Yongpan Sheng, Dezhong Yao
Softmax regression, a generalization of logistic regression (LR) to multiclass classification, has been widely used in many machine learning applications. However, the performance of softmax regression is extremely sensitive to the presence of noisy data and outliers. To address this issue, we propose a model of robust softmax regression (RoSR) originating from the self-paced learning (SPL) paradigm for multiclass classification. Concretely, RoSR, equipped with a soft weighting scheme, is able to evaluate the importance of each data instance. Data instances then participate in the classification problem according to their weights. In this way, the influence of noisy data and outliers (which typically receive small weights) can be significantly reduced. However, standard SPL may suffer from the imbalanced class influence problem, where some classes may have little influence on the training process if their instances are not sensitive to the loss. To alleviate this problem, we design two novel soft weighting schemes that assign weights and select instances locally for each class.  Experimental results demonstrate the effectiveness of the proposed methods.
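
A minimal sketch of self-paced soft weighting on top of softmax cross-entropy, assuming the common linear soft weighting v = max(0, 1 - loss / lambda). The class-wise variant below, which sets each class's threshold to a multiple of that class's mean loss, is a hypothetical illustration of weighting "locally for each class" and not one of the paper's two schemes.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, label):
    """Softmax regression loss of one instance."""
    return -math.log(softmax(scores)[label])


def linear_spl_weights(losses, lam):
    """Global linear soft weights: instances with loss >= lam get weight 0."""
    return [max(0.0, 1.0 - loss / lam) for loss in losses]


def classwise_spl_weights(losses, labels, scale=2.0):
    """Hypothetical per-class variant: lam_c = scale * mean loss of class c."""
    totals, counts = {}, {}
    for loss, y in zip(losses, labels):
        totals[y] = totals.get(y, 0.0) + loss
        counts[y] = counts.get(y, 0) + 1
    weights = []
    for loss, y in zip(losses, labels):
        lam = scale * totals[y] / counts[y]
        weights.append(1.0 if lam <= 0 else max(0.0, 1.0 - loss / lam))
    return weights
```

With a single global threshold of 0.5, a class whose instances all have losses around 0.5 to 0.6 would be dropped entirely; the per-class thresholds keep such a class in training while still zeroing the outlier with loss 3.0 in the other class.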

Recommendation vs Sentiment Analysis: A Text-Driven Latent Factor Model for Rating Prediction with Cold-Start Awareness
Kaisong Song, Wei Gao, Shi Feng, Daling Wang, Kam-Fai Wong, Chengqi Zhang
Review rating prediction is an important research topic. The problem has been approached from either the perspective of recommender systems (RS) or that of sentiment analysis (SA). Recent SA research using deep neural networks (DNNs) has recognized the importance of user and product interaction for better interpreting the sentiment of reviews. However, DNN models have a very large number of parameters, and their performance is not always satisfactory, especially when user-product interaction is sparse. In this paper, we propose a simple, extensible RS-based model, called the Text-driven Latent Factor Model (TLFM), to capture the semantics of reviews, user preferences and product characteristics by jointly optimizing two components, a user-specific LFM and a product-specific LFM, each of which decomposes text into a specific low-dimensional representation. Furthermore, we address the cold-start issue by developing a novel Pairwise Rating Comparison strategy (PRC), which uses the difference between ratings on a common user or product as supplementary information to calibrate parameter estimation. Experiments conducted on IMDB and Yelp datasets validate the advantage of our approach over state-of-the-art baseline methods.

Regional Concept Drift Detection and Density Synchronized Drift Adaptation
Anjin Liu, Yiliao Song, Guangquan Zhang, Jie Lu
In data stream mining, the emergence of new patterns or the disappearance of an existing pattern is called concept drift. Concept drift complicates the learning process because of the inconsistency between existing data and upcoming data. Since concept drift was first proposed, numerous articles have been published to address this issue in terms of distribution analysis. However, most distribution-based drift detection methods assume that a drift happens at an exact time point, and that data which arrived before that time point are no longer important. Thus, if a drift only occurs in a small region of the entire feature space, the other, non-drifted regions may also be suspended, thereby reducing the learning efficiency of models. To retrieve non-drifted information from suspended historical data, we propose a local drift degree (LDD) measurement that can continuously monitor regional density changes. Instead of suspending all historical data after a drift, we synchronize the regional density discrepancies according to LDD. Experimental evaluations on three public data sets show that our concept drift adaptation algorithm improves accuracy compared to other methods.
Wednesday 23 10:30  12:00 MLDLNLP  Deep Learning and NLP

Multimodal Storytelling via Generative Adversarial Imitation Learning
Zhiqian Chen, Xuchao Zhang, Arnold Boedihardjo, Jing Dai, Chang-Tien Lu
Deriving event storylines is an effective summarization method for succinctly organizing extensive information, which can significantly alleviate information overload. The critical challenge is the lack of a widely recognized definition of a storyline metric. Prior studies have developed various approaches based on different assumptions about users' interests. These works can extract interesting patterns, but their assumptions do not guarantee that the derived patterns will match users' preferences. Moreover, because they rely exclusively on a single modality, they miss cross-modality information. This paper proposes a method, multimodal imitation learning via Generative Adversarial Networks (MILGAN), to directly model users' interests as reflected by various data. In particular, the proposed model addresses the critical challenge by imitating users' demonstrated storylines. Our proposed model is designed to learn the reward patterns given user-provided storylines and then apply the learned policy to unseen data. A user study demonstrates that the proposed approach is capable of acquiring the user's implicit intent and outperforms competing methods by a substantial margin.

Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification
Jin Wang, Zhongyuan Wang, Dawei Zhang, Jun Yan
Text classification is a fundamental task in NLP applications. Most existing work relies on either explicit or implicit text representations to address this problem. While these techniques work well for sentences, they cannot easily be applied to short text because of its shortness and sparsity. In this paper, we propose a framework based on convolutional neural networks that combines explicit and implicit representations of short text for classification. We first conceptualize a short text as a set of relevant concepts using a large taxonomy knowledge base. We then obtain the embedding of short text by coalescing the words and relevant concepts on top of pretrained word vectors. We further incorporate character-level features into our model to capture fine-grained subword information. Experimental results on five commonly used datasets show that our proposed method significantly outperforms state-of-the-art methods.

Adaptive Semantic Compositionality for Sentence Modelling
Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Representing a sentence with a fixed vector has shown its effectiveness in various NLP tasks. Most existing methods are based on neural networks, which recursively apply different composition functions to a sequence of word vectors, thereby obtaining a sentence vector. A hypothesis behind these approaches is that the meaning of any phrase can be composed of the meanings of its constituents. However, many phrases, such as idioms, are apparently non-compositional. To address this problem, we introduce a parameterized compositional switch, which outputs a scalar to adaptively determine whether the meaning of a phrase should be composed of its two constituents. We evaluate our model on five sentiment classification datasets and demonstrate its efficacy with qualitative and quantitative experimental analysis.

Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models
Nan Jiang, Wenge Rong, Min Gao, Yikang Shen, Zhang Xiong
Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to language modeling and machine translation. These neural models can learn knowledge from massive corpora, but they are extremely slow because they predict candidate words from a large vocabulary during training and inference. As an alternative to gradient approximation and softmax with class decomposition, we explore the tree-based hierarchical softmax method and redesign its architecture, making it compatible with modern GPUs and introducing a tree-based loss. When combined with several hierarchical word clustering algorithms, it achieves improved performance on the language modelling task under intrinsic evaluation criteria on the PTB, WikiText-2 and WikiText-103 datasets.
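
To see why tree-based hierarchical softmax is cheaper, here is a minimal sketch of the standard binary-tree formulation (not the GPU-oriented architecture or tree-based loss proposed in the paper): a word's probability is the product of sigmoid decisions along its root-to-leaf path, so scoring one word costs O(log V) dot products instead of O(V). The tiny tree and vectors used to exercise it are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def word_probability(hidden, path, node_vectors):
    """P(word | hidden) as a product of binary decisions along the word's path.

    path: list of (node_id, direction) pairs from the root to the word's leaf,
    where direction is +1 for "go left" and -1 for "go right".
    node_vectors: dict mapping internal node ids to parameter vectors.
    """
    prob = 1.0
    for node, direction in path:
        score = sum(h * w for h, w in zip(hidden, node_vectors[node]))
        prob *= sigmoid(direction * score)
    return prob
```

Since sigmoid(z) + sigmoid(-z) = 1 at every internal node, the leaf probabilities always sum to one, so no normalization over the whole vocabulary is needed; the word hierarchy (for example, one produced by word clustering) decides which words share path prefixes.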

Deep Ordinal Regression Based on Data Relationship for Small Datasets
Yanzhu Liu, Adams Wai Kin Kong, Chi Keong Goh
Ordinal regression aims to classify instances into ordinal categories. As with other supervised learning problems, learning an effective deep ordinal model from a small dataset is challenging. This paper proposes a new approach which transforms the ordinal regression problem into binary classification problems and uses triplets with instances from different categories to train deep neural networks, such that high-level features describing their ordinal relationship can be extracted automatically. In the testing phase, triplets are formed by a testing instance and other instances with known ranks. A decoder is designed to estimate the rank of the testing instance based on the outputs of the network. Because of the data augmentation provided by permutation, deep learning can work for ordinal regression even on small datasets. Experimental results on the historical color image benchmark and the MSRA image search datasets demonstrate that the proposed algorithm outperforms the traditional deep learning approach and is comparable with other state-of-the-art methods, which rely heavily on prior knowledge to design effective features.

Random Shifting for CNN: a Solution to Reduce Information Loss in Down-Sampling Layers
Gangming Zhao, Jingdong Wang, Zhaoxiang Zhang
Down-sampling is widely adopted in deep convolutional neural networks (DCNN) for reducing the number of network parameters while preserving transformation invariance. However, it cannot use information effectively because it adopts only a fixed-stride strategy, which may result in poor generalization ability and information loss. In this paper, we propose a novel random strategy to alleviate these problems by embedding random shifting in the down-sampling layers during the training process. Random shifting can be universally applied to diverse DCNN models to dynamically adjust receptive fields by shifting kernel centers on feature maps in different directions. Thus, it can generate more robust features in networks and further enhance the transformation invariance of down-sampling operators. In addition, random shifting can not only be integrated into all down-sampling layers, including strided convolutional layers and pooling layers, but also improve the performance of DCNN with negligible additional computational cost. We evaluate our method on different tasks (e.g., image classification and segmentation) with various network architectures (i.e., AlexNet, FCN and DFNMR). Experimental results demonstrate the effectiveness of our proposed method.
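
A simplified one-dimensional illustration of randomizing where a down-sampling layer samples its input. The particular scheme here (one random offset in [0, stride) for the whole sampling grid during training, border indices clamped, no shift at test time) is an assumption made for this sketch and not necessarily how the authors shift kernel centers.

```python
import random


def max_pool_1d(x, kernel, stride, offset=0):
    """1-D max pooling whose window starts are shifted by `offset`.

    Indices past the end are clamped to the last element, so the output
    length is the same for every offset.
    """
    n_out = (len(x) - kernel) // stride + 1
    out = []
    for i in range(n_out):
        start = i * stride + offset
        window = [x[min(start + k, len(x) - 1)] for k in range(kernel)]
        out.append(max(window))
    return out


def random_shift_pool(x, kernel, stride, rng, training=True):
    """During training draw a random offset in [0, stride); at test time use 0."""
    offset = rng.randrange(stride) if training else 0
    return max_pool_1d(x, kernel, stride, offset)
```

With a fixed stride of 2, positions that never fall inside a window are ignored on every pass; a random offset lets different training passes see different sampling grids at no extra cost beyond drawing the offset.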
Wednesday 23 10:30-12:00 KR-DLO2 - Description Logics and Ontologies 2

Combining DL-Lite_{bool}^N with Branching Time: A Gentle Marriage
Víctor Gutiérrez-Basulto, Jean Christoph Jung
We study combinations of the description logic DL-Lite_{bool}^N with the branching temporal logics CTL* and CTL. We analyse two types of combinations, both with rigid roles: (i) temporal operators are applied to concepts and to ABox assertions, and (ii) temporal operators are applied to concepts and Boolean combinations of concept inclusions and ABox assertions. For the resulting logics, we present algorithms for the satisfiability problem and (mostly tight) complexity bounds ranging from ExpTime to 3ExpTime.

Query Rewriting for DL-Lite with n-ary Concrete Domains
Franz Baader, Stefan Borgwardt, Marcel Lippmann
We investigate ontology-based query answering (OBQA) in a setting where both the ontology and the query can refer to concrete values such as numbers and strings. In contrast to previous work on this topic, the built-in predicates used to compare values are not restricted to being unary. We introduce restrictions on these predicates and on the ontology language that allow us to reduce OBQA to query answering in databases using the so-called combined rewriting approach. Though at first sight our restrictions are different from the ones used in previous work, we show that our results strictly subsume some of the existing first-order rewritability results for unary predicates.

Making Cross Products and Guarded Ontology Languages Compatible
Pierre Bourhis, Michael Morak, Andreas Pieris
Cross products form a useful modelling tool that allows us to express natural statements such as "elephants are bigger than mice", or, more generally, to define relations that connect every instance in a relation with every instance in another relation. Despite their usefulness, cross products cannot be expressed using existing guarded ontology languages, such as description logics (DLs) and guarded existential rules. The question that arises is whether cross products are compatible with guarded ontology languages and, if not, whether there is a way of making them compatible. This question has already been studied for DLs, but remains open for guarded existential rules. Our goal is to answer it. To this end, we focus on the guarded fragment of first-order logic (which serves as a unifying framework that subsumes many of the aforementioned ontology languages) extended with cross products, and we investigate the standard tasks of satisfiability and query answering. Interestingly, we isolate relevant fragments that are compatible with cross products.

Query Answering in Ontologies under Preference Rankings
İsmail İlkan Ceylan, Thomas Lukasiewicz, Rafael Penaloza, Oana Tifrea-Marciuska
We present an ontological framework, based on preference rankings, that allows users to express their preferences between the knowledge explicitly available in the ontology. Using this formalism, the answers for a given query to an ontology can be ranked by preference, allowing users to retrieve the most preferred answers only. We provide a host of complexity results for the main computational tasks in this framework, for the general case, and for EL and DL-Lite_core as underlying ontology languages.

Mapping Repair in Ontology-based Data Access Evolving Systems
Domenico Lembo, Riccardo Rosati, Domenico Fabio Savo, Valerio Santarelli, Evgenij Thorstensen
In this paper we study the evolution of ontology-based data access (OBDA) specifications, and focus on the case in which the ontology and/or the data source schema change, which may require a modification to the mapping between them to preserve both consistency and knowledge. Our approach is based on the idea of repairing the mapping according to the usual principle of minimal change and on a recent, mapping-based notion of consistency of the specification. We define and analyze two notions of mapping repair under ontology and source schema update. We then present a set of results on the complexity of query answering in the above framework, when the ontology is expressed in DL-Lite_R.

Most Probable Explanations for Probabilistic Database Queries
İsmail İlkan Ceylan, Stefan Borgwardt, Thomas Lukasiewicz
Forming the foundations of large-scale knowledge bases, probabilistic databases have been widely studied in the literature. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information encompassed in large-scale knowledge bases. To exploit this potential, we study two inference tasks, namely finding the most probable database and the most probable hypothesis for a given query. As natural counterparts of most probable explanations (MPE) and maximum a posteriori hypotheses (MAP) in probabilistic graphical models, they can be used in a variety of applications that involve prediction or diagnosis tasks. We investigate these problems relative to a variety of query languages, ranging from conjunctive queries to ontology-mediated queries, and provide a detailed complexity analysis.
Wednesday 23 10:30-12:00 MT-SP1 - Security and Privacy 1

Online Reputation Fraud Campaign Detection in User Ratings
Chang Xu, Jie Zhang, Zhu Sun
Reputation fraud campaigns (RFCs) distort the reputations of rated items by generating fake ratings through multiple spammers. One effective way of detecting RFCs is to characterize their collective behaviors based on rating histories. However, these campaigns constantly evolve and change tactics to evade detection. For example, they can launch early attacks on items to quickly dominate their reputations. They can also whitewash themselves by creating new accounts for subsequent attacks. It is thus challenging for existing approaches that work on historical data to react promptly to such emerging fraud activities. In this paper, we conduct RFC detection in an online fashion, so as to spot campaign activities as early as possible. This leads to a unified and scalable optimization framework, FraudScan, that can adapt to emerging fraud patterns over time. Empirical analysis on two real-world datasets validates the effectiveness and efficiency of the proposed framework.

Defending Against Man-In-The-Middle Attack in Repeated Games
Shuxin Li, Xiaohong Li, Jianye Hao, Bo An, Zhiyong Feng, Kangjie Chen, Chengwei Zhang
The Man-in-the-Middle (MITM) attack has become widespread in today's networks. An MITM attack can cause serious information leakage and result in tremendous losses to users. Previous work applies game theory to analyze the MITM attack-defense problem and computes the optimal defense strategy to minimize the total loss. It assumes that all defenders are cooperative and that the attacker knows the defenders' strategies beforehand. However, each individual defender is rational and may not have an incentive to cooperate. Furthermore, in practice the attacker can hardly know the defenders' strategies in advance. To this end, we assume that all defenders are self-interested and model the MITM attack-defense scenario as a simultaneous-move game. We adopt Nash equilibrium as the solution concept and prove that it is always unique. Given that computing the Nash equilibrium directly is impractical, we propose practical adaptive algorithms with which the defenders and the attacker learn towards the unique Nash equilibrium through repeated interactions. Simulation results show that the algorithms converge efficiently to the Nash equilibrium strategy.

Staying Ahead of the Game: Adaptive Robust Optimization for Dynamic Allocation of Threat Screening Resources
Sara Marie Mc Carthy, Phebe Vayanos, Milind Tambe
We consider the problem of dynamically allocating screening resources of different efficacies (e.g., magnetic or X-ray imaging) at checkpoints (e.g., at airports or ports) to successfully avert an attack by one of the screenees. Previously, the Threat Screening Game model was introduced to address this problem under the assumption that screenee arrival times are perfectly known. In reality, arrival times are uncertain, which severely impedes the implementability and performance of this approach. We thus propose a novel framework for dynamic allocation of threat screening resources that explicitly accounts for uncertainty in the screenee arrival times. We model the problem as a multi-stage robust optimization problem and propose a tractable solution approach using compact linear decision rules combined with robust reformulation and constraint randomization. We perform extensive numerical experiments which show that our approach outperforms (a) exact solution methods in terms of tractability, while incurring only a very minor loss in optimality, and (b) methods that ignore uncertainty in terms of both feasibility and optimality.

A Monte Carlo Tree Search Approach to Active Malware Analysis
Riccardo Sartea, Alessandro Farinelli
Active Malware Analysis (AMA) focuses on acquiring knowledge about dangerous software by executing actions that trigger a response in the malware. A key problem for AMA is to design strategies that select the most informative actions for the analysis. To devise such actions, we model AMA as a stochastic game between an analyzer agent and a malware sample, and we propose a reinforcement learning algorithm based on Monte Carlo Tree Search. Crucially, our approach does not require a pre-specified malware model; in contrast to most existing analysis techniques, it generates such a model while interacting with the malware. We evaluate our solution using clustering techniques on models generated by analyzing real malware samples. Results show that our approach learns faster than existing techniques even without any prior information on the samples.
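
The abstract does not name the tree policy. Many MCTS implementations select children with UCB1 (the UCT rule), so a minimal sketch under that assumption might look as follows; `Node`, `uct_select`, `backpropagate`, and the exploration constant are hypothetical names and defaults, not taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    visits: int = 0
    total_reward: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict)


def uct_select(node: Node, c: float = math.sqrt(2)) -> str:
    """Return the child action maximising mean reward plus the UCB1 bonus."""
    log_n = math.log(max(node.visits, 1))

    def score(action: str) -> float:
        child = node.children[action]
        if child.visits == 0:
            return math.inf  # try every action once before exploiting
        mean = child.total_reward / child.visits
        return mean + c * math.sqrt(log_n / child.visits)

    return max(node.children, key=score)


def backpropagate(path: List[Node], reward: float) -> None:
    """Add one visit and the rollout reward to every node on the selected path."""
    for n in path:
        n.visits += 1
        n.total_reward += reward
```

In an analysis setting, an action would correspond to an input sent to the malware sample and the reward to some measure of information gained; both mappings are assumptions of this sketch.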

Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
Lin Yen-Chen, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun
We introduce two tactics, namely the strategically-timed attack and the enchanting attack, to attack reinforcement learning agents trained by deep reinforcement learning algorithms using adversarial examples. In the strategically-timed attack, the adversary aims to minimize the agent's reward by attacking the agent at only a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent the agent from detecting the attack. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims to lure the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent into taking the preferred sequence of actions. We apply the proposed tactics to agents trained by state-of-the-art deep reinforcement learning algorithms, including DQN and A3C. In 5 Atari games, our strategically-timed attack reduces the reward as much as the uniform attack (i.e., attacking at every time step) does, while attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a success rate of more than 70%. Example videos are available at http://yclin.me/adversarial_attack_RL/.
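
The abstract leaves the timing criterion open. One plausible heuristic, assumed here for illustration rather than taken from the paper, is to attack only when the policy strongly prefers one action, measured as the gap between the largest and smallest softmax action probabilities. The helper names and the threshold are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_preference(logits: Sequence[float]) -> float:
    """Gap between the most and least likely action under the policy."""
    probs = softmax(logits)
    return max(probs) - min(probs)


def attack_steps(episode_logits: Sequence[Sequence[float]],
                 threshold: float) -> List[int]:
    """Time steps at which this hypothetical trigger would craft an adversarial example."""
    return [t for t, logits in enumerate(episode_logits)
            if action_preference(logits) > threshold]
```

Raising the threshold attacks fewer time steps, which is the trade-off between attack frequency and reward reduction that the strategically-timed attack targets.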

Efficient Private ERM for Smooth Objectives
Jiaqi Zhang, Kai Zheng, Wenlong Mou, Liwei Wang
In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For nonconvex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with a privacy guarantee. Besides expected utility bounds, we also provide guarantees in high-probability form. Experiments demonstrate that our algorithm consistently outperforms existing methods in both utility and running time.
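
A minimal sketch of gradient descent with output perturbation for an L2-regularized least-squares objective, which is strongly convex and smooth. The noise scale `sigma` is a caller-supplied placeholder: calibrating it to a target $(\epsilon, \delta)$ requires a sensitivity bound that this sketch does not derive, so the code carries no privacy guarantee as written. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def ridge_gradient(w: Vector, X: Sequence[Sequence[float]],
                   y: Sequence[float], lam: float) -> Vector:
    """Gradient of (1/2n) * sum_i (x_i . w - y_i)^2 + (lam/2) * ||w||^2."""
    n = len(X)
    g = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        r = sum(a * b for a, b in zip(xi, w)) - yi
        for j, xij in enumerate(xi):
            g[j] += r * xij / n
    return g


def gd_output_perturbation(X: Sequence[Sequence[float]], y: Sequence[float],
                           lam: float, eta: float, steps: int,
                           sigma: float, rng: random.Random) -> Vector:
    """Plain (non-private) gradient descent followed by one Gaussian perturbation."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = ridge_gradient(w, X, y, lam)
        w = [wj - eta * gj for wj, gj in zip(w, g)]
    # Output perturbation: noise is added once, to the final iterate only.
    return [wj + rng.gauss(0.0, sigma) for wj in w]
```

Because noise touches only the returned model, the optimization loop itself runs at non-private speed, which is the intuition behind the running-time advantage claimed for this approach.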
Wednesday 23 10:30-12:00 MT-SS1 - Social Sciences 1

A Causal Framework for Discovering and Removing Direct and Indirect Discrimination
Lu Zhang, Yongkai Wu, Xintao Wu
In this paper, we investigate the problem of discovering both direct and indirect discrimination from historical data, and of removing the discriminatory effects before the data is used for predictive analysis (e.g., building classifiers). The main drawback of existing methods is that they cannot distinguish the part of the influence that is actually caused by discrimination from all correlated influences. In our approach, we use a causal network to capture the causal structure of the data. We then model direct and indirect discrimination as path-specific effects, which accurately identify the two types of discrimination as the causal effects transmitted along different paths in the network. Based on that, we propose an effective algorithm for discovering direct and indirect discrimination, as well as an algorithm for precisely removing both types of discrimination while retaining good data utility. Experiments using a real dataset show the effectiveness of our approaches.

Fast Network Embedding Enhancement via High Order Proximity Approximation
Cheng Yang, Maosong Sun, Zhiyuan Liu, Cunchao Tu
Many Network Representation Learning (NRL) methods have recently been proposed to learn vector representations for the vertices of a network. In this paper, we summarize most existing NRL methods into a unified two-step framework, consisting of proximity matrix construction and dimension reduction. We focus on analyzing the proximity matrix construction step and conclude that an NRL method can be improved by exploring higher-order proximities when building the proximity matrix. We propose the Network Embedding Update (NEU) algorithm, which implicitly approximates higher-order proximities with a theoretical approximation bound and can be applied to any NRL method to enhance its performance. We conduct experiments on multi-label classification and link prediction tasks. Experimental results show that NEU yields a consistent and significant improvement over a number of NRL methods, with almost negligible running time, on all three publicly available datasets.
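
A sketch of an NEU-style post-processing step in plain Python, assuming the update $R' = R + \lambda_1 A R + \lambda_2 A (A R)$, where $R$ is an embedding matrix produced by any NRL method and $A$ is the row-normalized adjacency matrix. The update form and the default coefficients (0.5 and 0.25) are assumptions for illustration, not details stated in the abstract.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(adj: Matrix) -> Matrix:
    """Divide each row by its sum; isolated vertices keep an all-zero row."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def neu_update(adj: Matrix, emb: Matrix,
               lam1: float = 0.5, lam2: float = 0.25) -> Matrix:
    """Mix each embedding with its 1-hop and 2-hop neighbourhood averages."""
    a = row_normalize(adj)
    ar = matmul(a, emb)   # first-order term A R
    aar = matmul(a, ar)   # second-order term A (A R)
    return [[r + lam1 * x + lam2 * y for r, x, y in zip(r_row, x_row, y_row)]
            for r_row, x_row, y_row in zip(emb, ar, aar)]
```

Computing A(AR) as two successive products avoids ever forming the dense matrix A², which keeps the extra cost small relative to training the base embedding.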

Cognitive-Inspired Conversational-Strategy Reasoner for Socially-Aware Agents
Oscar J. Romero, Ran Zhao, Justine Cassell
In this work, we propose a novel module for a dialogue system that allows a conversational agent to utter phrases that not only meet the system's task intentions but also work towards achieving its social intentions. The module, a Social Reasoner, takes the task goals the system must achieve and decides on the appropriate conversational style and strategy with which the dialogue system describes the information the user desires. The aim is to boost the strength of the relationship between the user and the system (rapport), and therefore the user's engagement and willingness to divulge the information the agent needs to efficiently and effectively achieve the user's goals. Our Social Reasoner is inspired both by analysis of empirical data from friend and stranger dyads engaged in a task, and by prior literature in fields as diverse as reasoning processes in cognitive and social psychology, decision-making, sociolinguistics and conversational analysis. Our experiments demonstrated that, when the Social Reasoner is used in a dialogue system, the rapport level between the user and the system increases by more than 35% compared with cases in which no Social Reasoner is used.

Cake Cutting: Envy and Truth
Xiaohui Bei, Ning Chen, Guangda Huzhang, Biaoshuai Tao, Jiajun Wu
We study envy-free cake cutting with strategic agents, where each agent may manipulate his private information in order to receive a better allocation. We focus on piecewise constant utility functions and consider two scenarios: the general setting without any restriction on the allocations, and the restricted setting where each agent has to receive a connected piece. We show that no deterministic truthful envy-free mechanism exists in the connected piece scenario, and we establish the same impossibility result for the general setting under some additional mild assumptions on the allocations. Finally, we study a large market model where the economy is replicated and demonstrate that truth-telling converges to a Nash equilibrium.

Networked Fairness in Cake Cutting
Xiaohui Bei, Youming Qiao, Shengyu Zhang
We introduce a graphical framework for fair division in cake cutting, where comparisons between agents are limited by an underlying network structure. We generalize the classical fairness notions of envy-freeness and proportionality to this graphical setting. An allocation is called envy-free on a graph if no agent envies any of her neighbors' shares, and is called proportional on a graph if every agent values her own share no less than the average of her neighbors' shares, with respect to her own measure. These generalizations enable new research directions in developing simple and efficient algorithms that can produce fair allocations under specific graph structures. On the algorithmic front, we first propose a moving-knife algorithm that outputs an envy-free allocation on trees. The algorithm is significantly simpler than the discrete and bounded envy-free algorithm introduced in [Aziz and Mackenzie, 2016] for complete graphs. Next, we give a discrete and bounded algorithm for computing a proportional allocation on the transitive closure of trees, a class of graphs obtained by taking a rooted tree and connecting all its ancestor-descendant pairs.
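
The two graph-based fairness notions can be checked directly. A minimal sketch, assuming piecewise constant densities given as (start, end, density) segments on [0, 1] and shares given as lists of disjoint intervals; all names are hypothetical. It uses the fact that valuing one's own share at least as much as the neighbors' average is equivalent to $|N(i)| \cdot v_i(A_i) \ge \sum_{j \in N(i)} v_i(A_j)$, which avoids division.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Interval = Tuple[Fraction, Fraction]
Segment = Tuple[Fraction, Fraction, Fraction]  # (start, end, density)


def value(density: List[Segment], share: List[Interval]) -> Fraction:
    """Value of a union of disjoint intervals under a piecewise constant density."""
    total = Fraction(0)
    for a, b in share:
        for s, e, d in density:
            lo, hi = max(a, s), min(b, e)
            if hi > lo:
                total += d * (hi - lo)
    return total


def envy_free_on_graph(densities: Dict[int, List[Segment]],
                       alloc: Dict[int, List[Interval]],
                       graph: Dict[int, List[int]]) -> bool:
    """No agent values a neighbour's share above her own."""
    for i, nbrs in graph.items():
        own = value(densities[i], alloc[i])
        if any(value(densities[i], alloc[j]) > own for j in nbrs):
            return False
    return True


def proportional_on_graph(densities: Dict[int, List[Segment]],
                          alloc: Dict[int, List[Interval]],
                          graph: Dict[int, List[int]]) -> bool:
    """Each agent values her share at least as much as her neighbours' average."""
    for i, nbrs in graph.items():
        if not nbrs:
            continue  # no neighbours, so nothing to compare against
        own = value(densities[i], alloc[i])
        others = sum((value(densities[i], alloc[j]) for j in nbrs), Fraction(0))
        if len(nbrs) * own < others:
            return False
    return True
```

Because comparisons are restricted to neighbors, an allocation can pass these checks on a sparse graph while failing them on the complete graph.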

Deterministic, Strategyproof, and Fair Cake Cutting
Vijay Menon, Kate Larson
We study the classic cake cutting problem from a mechanism design perspective, in particular focusing on deterministic mechanisms that are strategyproof and fair. We begin by looking at mechanisms that are non-wasteful and primarily show that, even for the restricted class of piecewise constant valuations, there exists no direct-revelation mechanism that is strategyproof and even approximately proportional. Subsequently, we remove the non-wasteful constraint and show another impossibility result stating that there is no strategyproof and approximately proportional direct-revelation mechanism that outputs contiguous allocations, again even for the restricted class of piecewise constant valuations. In addition to the above results, we also present some negative results when considering an approximate notion of strategyproofness, show a connection between direct-revelation mechanisms and mechanisms in the Robertson-Webb model when agents have piecewise constant valuations, and finally present a (minor) modification to the well-known Even-Paz algorithm that has better incentive-compatible properties for the cases when there are two or three agents.
Wednesday 23 10:30-12:00 CS-HS - Heuristic Search

A Random Model for Argumentation Framework: Phase Transitions, Empirical Hardness, and Heuristics
Yong Gao
We propose and study, theoretically and empirically, a new random model for the abstract argumentation framework (AF). Our model overcomes some intrinsic difficulties of the only random model of directed graphs in the literature that is relevant to AFs, and makes it possible to study the typical-case complexity of AF instances in terms of threshold behaviours and phase transitions. We prove that the probability for a random AF instance to have a stable/preferred extension goes through a sudden change (from 1 to 0) at the threshold of the parameters of the new model D(n, p, q), satisfying the equation 4q/((1 + q)(1+q)) = p. We show empirically that, in this new model, there is a clear easy-hard-easy pattern of hardness (for typical backtracking-style exact solvers) associated with the phase transition. Our empirical studies indicate that instances from the new model at phase transitions are much harder than those from an Erdős-Rényi-style model with equal edge density. In addition to being an analytically tractable model for understanding the interplay between problem structures and the effectiveness of (branching) heuristics used in practical argumentation solvers, the model can also be used to generate, in a systematic way, non-trivial AF instances with controlled features to evaluate the performance of other AF solvers.

Beyond Forks: Finding and Ranking Star Factorings for Decoupled Search
Daniel Gnad, Valerie Poser, Joerg Hoffmann
Star-topology decoupling is a recent search reduction method for forward state space search. The basic idea is to automatically identify a star factoring, then search only over the center component of the star, avoiding interleavings across leaf components. The framework can handle complex star topologies, yet prior work on decoupled search considered only factoring strategies that identify fork and inverted-fork topologies. Here, we introduce factoring strategies able to detect general star topologies, thereby extending the reach of decoupled search to new factorings and to new domains, sometimes resulting in significant performance improvements. Furthermore, we introduce a predictive portfolio method that reliably selects the most suitable factoring for a given planning task, leading to superior overall performance.

Online Bridged Pruning for Real-Time Search with Arbitrary Lookaheads
Carlos Hernandez, Adi Botea, Jorge Baier, Vadim Bulitko
Real-time search algorithms are relevant to time-sensitive decision-making domains such as video games and robotics. In such settings, the agent is required to decide on each action under a constant time bound, regardless of the search space size. Despite recent progress, poor-quality solutions can be produced, mainly due to state revisitation. Several techniques have been developed to reduce such revisitation, with state pruning showing promise. In this paper, we propose a novel pruning approach applicable to the wide class of real-time search algorithms. Given a local search space of arbitrary size, our technique aggressively prunes away all states in its interior, possibly adding new edges to maintain the connectivity of the search space frontier. An experimental evaluation shows that our pruning often improves the performance of a base real-time search algorithm by over an order of magnitude. This allows our implemented system to outperform the state-of-the-art real-time search algorithms used in the evaluation.

An Admissible HTN Planning Heuristic
Pascal Bercher, Gregor Behnke, Daniel Höller, Susanne Biundo
Hierarchical task network (HTN) planning is well known for being an efficient planning approach. This is mainly due to the success of the HTN planning system SHOP2. However, its performance depends on hand-designed search control knowledge. At present, there are only very few domain-independent heuristics, and they are designed for differing hierarchical planning formalisms. Here, we propose an admissible heuristic for standard HTN planning, which allows optimal solutions to be found heuristically. It is based on the so-called task decomposition graph (TDG), a data structure reflecting reachable parts of the task hierarchy. We show (both in theory and empirically) that rebuilding it during planning can improve heuristic accuracy, thereby decreasing the explored search space. The evaluation further studies the heuristic both in terms of plan quality and coverage.

Optimizing Ratio of Monotone Set Functions
Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang, Zhi-Hua Zhou
This paper considers the problem of minimizing the ratio of two set functions, i.e., $f/g$. Previous work assumed both functions to be monotone and submodular, while we consider a more general situation where $g$ is not necessarily submodular. We derive that the greedy approach GreedRatio, as a fixed-time algorithm, achieves a $\frac{|X^*|}{(1+(|X^*|-1)(1-\kappa_f))\gamma(g)}$ approximation ratio, which also improves the previous bound for submodular $g$. If more time can be spent, we present the PORM algorithm, an anytime randomized iterative approach that minimizes $f$ and $-g$ simultaneously. We show that PORM using reasonable time has the same general approximation guarantee as GreedRatio, but can achieve better solutions in some cases and in applications.
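
The abstract does not spell out GreedRatio's steps. A minimal sketch, assuming the common formulation in which each step adds the element with the smallest ratio of marginal increases Δf/Δg (among elements that still increase g) and the best prefix of the greedy sequence is returned; `greedy_ratio` is a hypothetical name and f, g are assumed monotone with g(∅) = 0.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Optional, Tuple

SetFunction = Callable[[FrozenSet[Hashable]], float]


def greedy_ratio(ground: Iterable[Hashable], f: SetFunction, g: SetFunction
                 ) -> Tuple[Optional[FrozenSet[Hashable]], float]:
    """Greedy sketch for minimising f(X)/g(X) over non-empty subsets."""
    x: FrozenSet[Hashable] = frozenset()
    remaining = set(ground)
    best: Optional[FrozenSet[Hashable]] = None
    best_ratio = float("inf")
    while True:
        fx, gx = f(x), g(x)
        # Only elements that still increase g keep the ratio of increments defined.
        candidates = [v for v in remaining if g(x | {v}) > gx]
        if not candidates:
            break
        v = min(candidates,
                key=lambda u: (f(x | {u}) - fx) / (g(x | {u}) - gx))
        x = x | {v}
        remaining.discard(v)
        ratio = f(x) / g(x)
        if ratio < best_ratio:  # keep the best prefix of the greedy sequence
            best, best_ratio = x, ratio
    return best, best_ratio
```

The loop makes at most n passes over the ground set, which matches the "fixed time" character the abstract attributes to GreedRatio, in contrast to an anytime method such as PORM.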

On Subset Selection with General Cost Constraints
Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang
This paper considers the subset selection problem with a monotone objective function and a monotone cost constraint, relaxing the submodularity assumed in previous studies. We first show that the approximation ratio of the generalized greedy algorithm is $\frac{\alpha}{2}(1-\frac{1}{e^{\alpha}})$ (where $\alpha$ is the submodularity ratio); we then propose POMC, an anytime randomized iterative approach that can use more time to find better solutions than the generalized greedy algorithm. We show that POMC can obtain the same general approximation guarantee as the generalized greedy algorithm, but can achieve better solutions in some cases and in applications.
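
A minimal sketch of a generalized greedy algorithm of this kind, assuming it repeatedly adds the feasible element with the largest ratio of marginal gain to marginal cost and finally compares the result against the best single feasible element. That final comparison is a common safeguard included here as an assumption; `generalized_greedy` is a hypothetical name.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

SetFunction = Callable[[FrozenSet[Hashable]], float]


def generalized_greedy(ground: Iterable[Hashable], f: SetFunction,
                       c: SetFunction, budget: float) -> FrozenSet[Hashable]:
    """Greedy sketch for maximising monotone f subject to c(X) <= budget."""
    items = list(ground)
    x: FrozenSet[Hashable] = frozenset()
    remaining = set(items)
    while True:
        fx, cx = f(x), c(x)
        feasible = [v for v in remaining if c(x | {v}) <= budget]
        if not feasible:
            break

        def gain_per_cost(u: Hashable) -> float:
            dc = c(x | {u}) - cx
            return (f(x | {u}) - fx) / dc if dc > 0 else float("inf")

        chosen = max(feasible, key=gain_per_cost)
        x = x | {chosen}
        remaining.discard(chosen)
    # Safeguard (assumed): compare against the best single feasible element.
    singles = [frozenset({v}) for v in items if c(frozenset({v})) <= budget]
    best_single = max(singles, key=f, default=frozenset())
    return x if f(x) >= f(best_single) else best_single
```

Without the final comparison, a cheap element with a high gain-to-cost ratio can crowd out a single expensive but much more valuable element, which is the situation the safeguard handles.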
Wednesday 23 10:30-12:00 MAS-EP2 - Economic Paradigms 2

Diverse Weighted Bipartite b-Matching
Faez Ahmed, John Dickerson, Mark Fuge
Bipartite matching, where agents on one side of a market are matched to agents or items on the other, is a classical problem in computer science and economics, with widespread application in healthcare, education, advertising, and general resource allocation. A practitioner's goal is typically to maximize a matching market's economic efficiency, possibly subject to some fairness requirements that promote equal access to resources. A natural balancing act exists between fairness and efficiency in matching markets, and it has been the subject of much research. In this paper, we study a complementary goal, balancing diversity and efficiency, in a generalization of bipartite matching where agents on one side of the market can be matched to sets of agents on the other. Adapting a classical definition of the diversity of a set, we propose a quadratic-programming-based approach to solving a submodular minimization problem that balances diversity and the total weight of the solution. We also provide a scalable greedy algorithm with theoretical performance bounds. We then define the price of diversity, a measure of the efficiency loss due to enforcing diversity, and give a worst-case theoretical bound. Finally, we demonstrate the efficacy of our methods on three real-world datasets, and show that the price of diversity is not bad in practice. Our code is publicly accessible for further research.

Online Optimization of Video-Ad Allocation
Hanna Sumita, Yasushi Kawase, Sumio Fujita, Takuro Fukunaga
In this paper, we study video advertising in the context of Internet advertising. Video advertising is a rapidly growing industry, but its computational aspects have not yet been investigated. One difference between video advertising and traditional display advertising is that the former requires more time to be viewed: in contrast to a traditional display advertisement, a video advertisement has no influence on a user unless the user watches it for a certain amount of time. Previous studies have not considered the length of video advertisements or the time users spend watching them. Motivated by this observation, we formulate a new online optimization problem for optimizing the allocation of video advertisements, and we develop a nearly (1 − 1/e)-competitive algorithm for finding an envy-free allocation of video advertisements.

Near-Feasible Stable Matchings with Budget Constraints
Yasushi Kawase, Atsushi Iwasaki
This paper deals with two-sided matching with budget constraints, where one side (firms or hospitals) can make monetary transfers (offer wages) to the other (workers or doctors). In a standard model, while multiple doctors can be matched to a single hospital, each hospital has a maximum quota: the number of doctors assigned to a hospital cannot exceed a certain limit. In our model, a hospital instead has a fixed budget: the total amount of wages allocated by each hospital to doctors is constrained. With budget constraints, stable matchings may fail to exist, and checking whether one exists is hard. To deal with the nonexistence of stable matchings, we extend the “matching with contracts” model by Hatfield and Milgrom so that it handles near-feasible matchings that exceed each hospital's budget by a certain amount. We then propose two novel mechanisms that efficiently return such a near-feasible matching that is stable with respect to the actual amount of wages allocated by each hospital. In particular, by sacrificing strategyproofness, our second mechanism achieves the best possible bound.

Optimal Posted-Price Mechanism in Microtask Crowdsourcing
Zehong Hu, Jie Zhang
Posted-price mechanisms are widely adopted to set the price of tasks in popular microtask crowdsourcing. In this paper, we propose a novel posted-price mechanism that not only outperforms existing mechanisms in performance but also avoids their need for a finite price range. These advantages are achieved by converting the pricing problem into a multi-armed bandit problem and designing an optimal algorithm that exploits the unique features of microtask crowdsourcing. We theoretically show the optimality of our algorithm and prove that the performance upper bound can be achieved without the need for a prior price range. We also conduct extensive experiments using real price data to verify the advantages and practicability of our mechanism.

Learning a Ground Truth Ranking Using Noisy Approval Votes
Ioannis Caragiannis, Evi Micha
We consider a voting scenario where agents have opinions that are estimates of an underlying common ground truth ranking of the available alternatives, and each agent is asked to approve a set of her most preferred alternatives. We assume that estimates are implicitly formed using the well-known Mallows model for generating random rankings. We show that k-approval voting, where all agents are asked to approve the same number k of alternatives and the outcome is obtained by sorting the alternatives by their number of approvals, has exponential sample complexity for all values of k. This negative result suggests that an exponential (in the number of alternatives m) number of agents is always necessary in order to recover the ground truth ranking with high probability. In contrast, simply asking each agent to approve a random number of alternatives improves the sample complexity dramatically: it then depends only polynomially on m. Our results may have implications for the effectiveness of crowdsourcing applications that ask workers to provide their input by approving sets of available alternatives.
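
A minimal simulation sketch of this setting: rankings are drawn from a Mallows model via the repeated insertion method (item i of the reference ranking is inserted at position j with probability proportional to $\phi^{i-j}$), and alternatives are sorted by approval count. The uniform choice of k in `random_approval_ballot` is an assumption, since the abstract does not specify the distribution; all names are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence, Set


def sample_mallows(reference: Sequence[str], phi: float,
                   rng: random.Random) -> List[str]:
    """Draw a ranking from Mallows(reference, phi) by repeated insertion."""
    ranking: List[str] = []
    for i, item in enumerate(reference, start=1):
        # Position j (1-based) creates i - j new inversions: weight phi**(i - j).
        weights = [phi ** (i - j) for j in range(1, i + 1)]
        j = rng.choices(range(1, i + 1), weights=weights)[0]
        ranking.insert(j - 1, item)
    return ranking


def k_approval_ballot(ranking: Sequence[str], k: int) -> Set[str]:
    """Approve the k top-ranked alternatives."""
    return set(ranking[:k])


def random_approval_ballot(ranking: Sequence[str],
                           rng: random.Random) -> Set[str]:
    """Approve a random number of top alternatives (k uniform in 1..m-1, assumed)."""
    k = rng.randint(1, len(ranking) - 1)
    return set(ranking[:k])


def approval_ranking(ballots: Iterable[Set[str]],
                     alternatives: Sequence[str]) -> List[str]:
    """Sort alternatives by approval count, most approved first (stable ties)."""
    counts = Counter(a for ballot in ballots for a in ballot)
    return sorted(alternatives, key=lambda a: -counts[a])
```

For example, `approval_ranking([random_approval_ballot(sample_mallows(truth, 0.5, rng), rng) for _ in range(2000)], truth)` estimates the ground truth from 2000 simulated voters; no particular recovery rate is claimed for this sketch.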

Thwarting Vote Buying Through Decoy Ballots
David Parkes, Paul Tylkin, Lirong Xia
There is increasing interest in promoting participatory democracy, in particular by allowing voting by mail or internet and through random-sample elections. A pernicious concern, though, is that of vote buying, which occurs when a bad actor seeks to buy ballots, paying someone to vote against their own intent. This becomes possible whenever a voter is able to sell evidence of which way she voted. We show how to thwart vote buying through decoy ballots, which are not counted but are indistinguishable from real ballots to a buyer. We show that an Election Authority can significantly reduce the power of vote buying through a small number of optimally distributed decoys, and model societal processes by which decoys could be distributed.
Wednesday 23 10:30  12:00 MLDMFS  Data Mining and Feature Selection

Top-k Supervise Feature Selection via ADMM for Integer Programming
Mingyu Fan, Xiaojun Chang, Xiaoqin Zhang, Di Wang, Liang Du
Recently, structured-sparsity-inducing feature selection has become a hot topic in machine learning and pattern recognition. Most sparsity-inducing feature selection methods rank all features by a certain criterion and then select the k top-ranked features, where k is an integer. However, the k top features are usually not the top k features (the best subset of size k), and the result may therefore be suboptimal. In this paper, we propose a novel supervised feature selection method that directly identifies the top k features. The new method is formulated as a classic regularized least squares regression model with two groups of variables. The problem with respect to one group of variables turns out to be a 0-1 integer program, which has been considered very hard to solve. To address this, we utilize an efficient optimization method to solve the integer program, which first replaces the discrete 0-1 constraints with two continuous constraints and then uses the alternating direction method of multipliers (ADMM) to optimize the equivalent problem. The obtained result is the top subset of k features under the proposed criterion rather than the subset of k top features. Experiments on benchmark data sets show the effectiveness of the proposed method.

Symmetric Nonnegative Latent Factor Models for Undirected Large Networks
Xin Luo, MingSheng Shang, Zidong Wang
Undirected, high-dimensional and sparse networks are frequently encountered in industrial applications. They contain rich knowledge regarding various useful patterns. Nonnegative latent factor (NLF) models have proven to be effective and efficient in acquiring useful knowledge from asymmetric networks. However, they cannot correctly describe the symmetry of an undirected network. To address this issue, this work analyzes the NLF extraction processes on asymmetric and symmetric matrices, respectively, and thereby derives symmetric and nonnegative latent factor (SNLF) models for undirected, high-dimensional and sparse networks. The proposed SNLF models are equipped with a) high efficiency, b) nonnegativity, and c) symmetry. Experimental results on real networks show that they are able to a) represent the symmetry of the target network rigorously; b) maintain the nonnegativity of the resulting latent factors; and c) achieve high computational efficiency when performing data analysis tasks such as missing data estimation.

SitNet: Discrete Similarity Transfer Network for Zero-shot Hashing
Yuchen Guo, Guiguang Ding, Jungong Han, Yue Gao
Hashing has recently been widely used for fast image retrieval. With semantic information as supervision, hashing approaches perform much better, especially when combined with deep convolutional neural networks (CNNs). However, in practice, new concepts emerge every day, making it infeasible to collect supervised information for retraining the hashing model. In this paper, we propose a novel zero-shot hashing approach, called Discrete Similarity Transfer Network (SitNet), to preserve the semantic similarity between images from both "seen" concepts and new "unseen" concepts. Motivated by zero-shot learning, the semantic vectors of concepts are adopted to capture the similarity structures among classes, so that a model trained with seen concepts generalizes well to unseen ones, benefiting from the transferability of the semantic vector space. We adopt a multi-task architecture to exploit the supervised information for seen concepts and the semantic vectors simultaneously. Moreover, a discrete hashing layer is integrated into the network for hash code generation to avoid the information loss caused by real-value relaxation in the training phase, which is a critical problem in existing works. Experiments on three benchmarks validate the superiority of SitNet over state-of-the-art methods.

Handling Noise in Boolean Matrix Factorization
Martin Trnecka, Radim Belohlavek
We critically examine and point out weaknesses of the existing considerations in Boolean matrix factorization (BMF) regarding noise and the algorithms' ability to deal with noise. We argue that the current understanding is underdeveloped and that the current approaches are missing an important aspect. We provide a new, quantitative way to assess the ability of an algorithm to handle noise. Our approach is based on a common-sense definition of robustness requiring that the computed factorizations should not be affected much by varying the noise in data. We present an experimental evaluation of several existing algorithms and compare the results to the observations available in the literature. In addition to justifying some properties claimed in the literature without proper justification, our experiments reveal properties that were not previously reported, as well as properties that counter certain claims made in the literature. Importantly, our approach reveals a line separating robust-to-noise from sensitive-to-noise algorithms, which has not been revealed by the previous approaches.
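The abstract does not spell out the robustness measure itself, so the Python sketch below only sets up ingredients such an evaluation needs, under assumptions: the Boolean matrix product that BMF uses to reconstruct a 0/1 matrix, independent bit-flip noise, and a Hamming distance. The probe reconstruction_shift and the baseline copy_factorize are hypothetical; the probe compares reconstructions rather than the factors themselves, whereas the paper's measure concerns the computed factorizations.

```python
import random


def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i][j] = OR over l of (A[i][l] AND B[l][j])."""
    inner, cols = len(B), len(B[0])
    return [[int(any(row[l] and B[l][j] for l in range(inner))) for j in range(cols)]
            for row in A]


def hamming(X, Y):
    """Number of entries in which two 0/1 matrices differ."""
    return sum(x != y for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))


def flip_noise(X, rate, rng):
    """Flip each bit independently with probability `rate` (a simple noise model)."""
    return [[1 - x if rng.random() < rate else x for x in row] for row in X]


def reconstruction_shift(factorize, X, rate, rng):
    """Hypothetical robustness probe: how much the reconstruction returned by
    `factorize` changes when bit-flip noise is added to the input."""
    A, B = factorize(X)
    A_noisy, B_noisy = factorize(flip_noise(X, rate, rng))
    return hamming(boolean_product(A, B), boolean_product(A_noisy, B_noisy))


def copy_factorize(M):
    """Degenerate baseline: M = M o I, which reproduces any noise exactly."""
    n = len(M[0])
    return M, [[int(i == j) for j in range(n)] for i in range(n)]
```

Because copy_factorize reproduces every flipped bit, its shift equals the number of flips; under the robustness definition above, a robust algorithm's output should change much less as the noise varies.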

Single-Pass PCA of Large High-Dimensional Data
Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li
Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the top singular vectors of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or data generated in a streaming fashion. Experiments with synthetic and real data validate the algorithm's accuracy: its error is orders of magnitude smaller than that of an existing single-pass algorithm. For a set of high-dimensional data stored as a 150 GB file, the algorithm computes the first 50 principal components in just 24 minutes on a typical 24-core computer, using less than 1 GB of memory.

Learning Homophily Couplings from Non-IID Data for Joint Feature Selection and Noise-Resilient Outlier Detection
Guansong Pang, Longbing Cao, Ling Chen, Huan Liu
This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature-selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent; they bond together) in constructing a noise-resilient outlier scoring function to produce a reliable outlier ranking in each iteration. We show that HOUR (i) retains a 2-approximation outlier ranking to the optimal one; and (ii) significantly outperforms five state-of-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.
Wednesday 23 10:30  12:00 MLSSL2  Semi-Supervised Learning 2

Scaling Active Search using Linear Similarity Functions
Sibi Venkatesan, James Miller, Jeff Schneider, Artur Dubrawski
Active Search has become an increasingly useful tool in information retrieval problems where the goal is to discover as many target elements as possible using only limited label queries. With the advent of big data, there is a growing emphasis on the scalability of such techniques to handle very large and very complex datasets. In this paper, we consider the problem of Active Search where we are given a similarity function between data points. We look at an algorithm introduced by Wang et al. [Wang et al., 2013], known as Active Search on Graphs, and propose crucial modifications which allow it to scale significantly. Their approach selects points by minimizing an energy function over the graph induced by the similarity function on the data. Our modifications require the similarity function to be a dot-product between feature vectors of data points, equivalent to having a linear kernel for the adjacency matrix. With this, we are able to scale tremendously: for $n$ data points, the original algorithm runs in $O(n^2)$ time per iteration while ours runs in only $O(nr + r^2)$ given $r$-dimensional features. We also describe a simple alternate approach using a weighted-neighbor predictor which also scales well. In our experiments, we show that our method is competitive with existing semi-supervised approaches. We also briefly discuss conditions under which our algorithm performs well.
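The scaling argument rests on one identity: with linear similarity s(i, j) = x_i . x_j, a sum over labeled points such as sum_j s(i, j) y_j equals x_i . (sum_j y_j x_j), so the n-by-n similarity matrix never has to be formed. The Python sketch below applies this to a weighted-neighbor predictor. The exact form of the score, the nonnegative-feature assumption, the greedy query loop and the function names are assumptions made for illustration; the sketch does not reproduce the authors' modified Active Search on Graphs.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def active_search(X, oracle, seeds, budget, eps=1e-12):
    """Greedy active search with a weighted-neighbor score and linear similarity.

    score(i) = sum_j s(i, j) * y_j / sum_j s(i, j) over labeled j, with
    s(i, j) = x_i . x_j. Both sums collapse to x_i . (running sum of labeled
    vectors), so scoring all points costs O(n r) per round instead of O(n^2).
    Assumes nonnegative features so that denominators are >= 0.
    """
    r = len(X[0])
    pos_sum = [0.0] * r  # sum of x_j over labeled targets (y_j = 1)
    all_sum = [0.0] * r  # sum of x_j over all labeled points
    labels = {}

    def observe(j):
        # Query the (hypothetical) oracle and update both running sums in O(r).
        labels[j] = oracle(j)
        for k in range(r):
            all_sum[k] += X[j][k]
            if labels[j]:
                pos_sum[k] += X[j][k]

    for j in seeds:
        observe(j)
    for _ in range(budget):
        unlabeled = [i for i in range(len(X)) if i not in labels]
        if not unlabeled:
            break
        best = max(unlabeled,
                   key=lambda i: dot(X[i], pos_sum) / (dot(X[i], all_sum) + eps))
        observe(best)
    # Indices of all targets discovered so far.
    return sorted(j for j, y in labels.items() if y)
```

Seeding with one target and one non-target gives the ratio a meaningful denominator from the first round; with only targets labeled, every point with nonzero similarity would score close to 1.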

Projection-Free Rank-Drop Steps
Edward Cheung, Yuying Li
The Frank-Wolfe (FW) algorithm has been widely used for solving nuclear-norm-constrained problems, since it does not require projections. However, FW often yields high-rank intermediate iterates, which can be very expensive in time and space for large problems. To address this issue, we propose a rank-drop method for nuclear-norm-constrained problems. The goal is to generate descent steps that lead to rank decreases, maintaining low-rank solutions throughout the algorithm. Moreover, the optimization problems are constrained to ensure that the rank-drop step is also feasible and can be readily incorporated into a projection-free minimization method, e.g., Frank-Wolfe. We demonstrate that by incorporating rank-drop steps into the Frank-Wolfe algorithm, the rank of the solution is greatly reduced compared to the original Frank-Wolfe or its common variants.

Semi-Supervised Deep Hashing with a Bipartite Graph
Xinyu Yan, Lijun Zhang, Wu-Jun Li
Recently, deep learning has been successfully applied to the problem of hashing, yielding remarkable performance compared to traditional methods with hand-crafted features. However, most existing deep hashing methods are designed for the supervised scenario and require a large amount of labeled data. In this paper, we propose a novel semi-supervised hashing method for image retrieval, named Deep Hashing with a Bipartite Graph (DHBG), to simultaneously learn embeddings, features and hash codes. More specifically, we construct a bipartite graph to discover the underlying structure of data, based on which an embedding is generated for each instance. Then, we feed raw pixels as well as embeddings to a deep neural network, and concatenate the resulting features to determine the hash code. Compared to existing methods, DHBG is a universal framework that is able to utilize various types of graphs and losses. Furthermore, we propose an inductive variant of DHBG to support out-of-sample extensions. Experimental results on real datasets show that our DHBG outperforms state-of-the-art hashing methods.

Learning to Learn Programs from Examples: Going Beyond Program Structure
Kevin Ellis, Sumit Gulwani
Programming-by-example technologies let end users construct and run new programs by providing examples of the intended program behavior. But the few provided examples seldom uniquely determine the intended program. Previous approaches to picking a program used a bias toward shorter or more naturally structured programs. Our work here gives a machine learning approach for learning to learn programs that departs from previous work by relying upon features that are independent of the program structure, instead relying upon a learned bias over program behaviors, and more generally over program execution traces. Our approach leverages abundant unlabeled data for semi-supervised learning, and incorporates simple kinds of world knowledge for commonsense reasoning during program induction. These techniques are evaluated in two programming-by-example domains, improving the accuracy of program learners.

Semi-Supervised Learning for Surface EMG-based Gesture Recognition
Yu Du, Yongkang Wong, Wenguang Jin, Wentao Wei, Yu Hu, Mohan Kankanhalli, Weidong Geng
Conventionally, gesture recognition based on non-intrusive muscle-computer interfaces required a strongly supervised learning algorithm and a large amount of labeled training signals of surface electromyography (sEMG). In this work, we show that the temporal relationship between sEMG signals and data glove readings provides an implicit supervisory signal for learning the gesture recognition model. To demonstrate this, we present a semi-supervised learning framework with a novel Siamese architecture for sEMG-based gesture recognition. Specifically, we employ auxiliary tasks to learn visual representations: predicting the temporal order of two consecutive sEMG frames and, optionally, predicting the statistics of 3D hand pose from an sEMG frame. Experiments on the NinaPro, CapgMyo and csl-hdemg datasets validate the efficacy of our proposed approach, especially when the labeled samples are very scarce.

Improving Learning-from-Crowds through Expert Validation
Liu Jiang, Mengchen Liu, Junlin Liu, Xiting Wang, Jun Zhu, Shixia Liu
Although several effective learning-from-crowd methods have been developed to infer correct labels from noisy crowdsourced labels, a method for post-processed expert validation is still needed. This paper introduces a semi-supervised learning algorithm that is capable of selecting the most informative instances and maximizing the influence of expert labels. Specifically, we have developed a complete uncertainty assessment to facilitate the selection of the most informative instances. The expert labels are then propagated to similar instances via regularized Bayesian inference. Experiments on both real-world and simulated datasets indicate that, given a specific accuracy goal (e.g., 95%), our method reduces expert effort by 39% to 60% compared with the state-of-the-art method.
Wednesday 23 10:30  12:30 SISKRNLP  Sister Conference Track: Knowledge Representation and Natural Language Processing

User-Based Opinion-based Recommendation
Ruihai Dong, Barry Smyth
User-generated reviews are a plentiful source of user opinions and interests and can play an important role in a range of artificial intelligence contexts, particularly when it comes to recommender systems. In this paper, we describe how natural language processing and opinion mining techniques can be used to automatically mine useful recommendation knowledge from user-generated reviews and how this information can be used by recommender systems in a number of classical settings.

Predicting Human Similarity Judgments with Distributional Models: The Value of Word Associations
Simon De Deyne, Amy Perfors, Dan Navarro
To represent the meaning of a word, most models use external language resources, such as text corpora, to derive the distributional properties of word usage. In this study, we propose that internal language models, which are more closely aligned to the mental representations of words, can be used to derive new theoretical questions regarding the structure of the mental lexicon. A comparison with internal models also puts into perspective a number of assumptions underlying recently proposed distributional text-based models, and could provide important insights into cognitive science, including linguistics and artificial intelligence. We focus on word-embedding models, which have been proposed to learn aspects of word meaning in a manner similar to humans, and contrast them with internal language models derived from a new extensive data set of word associations. An evaluation using relatedness judgments shows that internal language models consistently outperform current state-of-the-art text-based external language models. This suggests alternative approaches to represent word meaning using properties that aren't encoded in text.

Lexicons on Demand: Neural Word Embeddings for Large-Scale Text Analysis
Ethan Fast, Binbin Chen, Michael Bernstein
Human language is colored by a broad range of topics, but existing text analysis tools focus on only a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by learning a neural embedding across billions of words on the web. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated, such as neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r=0.906) with similar categories in LIWC.
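The expansion step can be approximated by a nearest-neighbor search in embedding space. In the Python sketch below, the seed vectors are summed and the remaining vocabulary is ranked by cosine similarity to that sum. This aggregation rule, the function names and the toy 3-dimensional vectors are assumptions for illustration (Empath's own embedding is learned from web text), and the crowd-powered validation step is not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_category(seeds, embeddings, n):
    """Rank non-seed words by cosine similarity to the sum of the seed vectors."""
    dim = len(embeddings[seeds[0]])
    centroid = [sum(embeddings[s][k] for s in seeds) for k in range(dim)]
    candidates = [w for w in embeddings if w not in seeds]
    candidates.sort(key=lambda w: cosine(embeddings[w], centroid), reverse=True)
    return candidates[:n]


# Toy vectors, invented for illustration only.
toy = {
    "bleed": [0.9, 0.1, 0.0],
    "punch": [0.8, 0.2, 0.1],
    "stab": [0.85, 0.15, 0.05],
    "kick": [0.7, 0.3, 0.1],
    "garden": [0.0, 0.2, 0.9],
    "flower": [0.1, 0.1, 0.95],
}
print(expand_category(["bleed", "punch"], toy, 2))  # ['stab', 'kick']
```

In a real pipeline the returned candidates would then go to the validation filter before being added to the category.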

Adapting Deep Network Features to Capture Psychological Representations: An Abridged Report
Joshua Peterson, Joshua Abbott, Thomas Griffiths
Deep neural networks have become increasingly successful at solving classic perception problems (e.g., recognizing objects), often reaching or surpassing human-level accuracy. In this abridged report of Peterson et al. [2016], we examine the relationship between the image representations learned by these networks and those of humans. We find that deep features learned in service of object classification account for a significant amount of the variance in human similarity judgments for a set of animal images. However, these features do not appear to capture some key qualitative aspects of human representations. To close this gap, we present a method for adapting deep features to align with human similarity judgments, resulting in image representations that can potentially be used to extend the scope of psychological experiments and inform human-centric AI.

Adaptive Distributed Correspondence Graphs for Grounding Abstract Spatial Concepts for Natural Language Interaction with Robot Manipulators
Rohan Paul, Jacob Arkin, Nicholas Roy, Thomas Howard
Our goal is to develop models that allow a robot to understand or "ground" natural language instructions in the context of its world model. Contemporary approaches estimate correspondences between an instruction and possible candidate groundings such as objects, regions and goals for a robot's action. However, these approaches are unable to reason about abstract or hierarchical concepts such as rows, columns and groups that are relevant in a manipulation domain. We introduce a probabilistic model that incorporates an expressive space of abstract spatial concepts as well as notions of cardinality and ordinality. Abstract concepts are introduced as explicit hierarchical symbols correlated with concrete groundings. Crucially, the abstract groundings form a Markov boundary over concrete groundings, effectively decorrelating them from the remaining variables in the graph, which reduces the complexity of training and inference in the model. Empirical evaluation demonstrates accurate grounding of abstract concepts embedded in complex natural language instructions commanding a robot manipulator. The proposed inference method leads to significant efficiency gains compared to the baseline, with a minimal trade-off in accuracy.

Intuitionistic layered graph logic
Simon Docherty, David Pym
Models of complex systems are widely used in the physical and social sciences, and the concept of layering, typically building upon graph-theoretic structure, is a common feature. We describe an intuitionistic substructural logic that gives an account of layering. As in other bunched systems, the logic includes the usual intuitionistic connectives, together with a non-commutative, non-associative conjunction (used to capture layering) and its associated implications. We give a soundness and completeness theorem for a labelled tableaux system with respect to a Kripke semantics on graphs. To demonstrate the utility of the logic, we show how to represent systems and security examples, illuminating the relationship between services/policies and the infrastructures/architectures to which they are applied.
Wednesday 23 10:30  12:30 EAR3  Early Career 3

Logic meets Probability: Towards Explainable AI Systems for Uncertain Worlds
Vaishak Belle
Logical AI is concerned with formal languages to represent and reason with qualitative specifications; statistical AI is concerned with learning quantitative specifications from data. To combine the strengths of these two camps, there has been exciting recent progress on unifying logic and probability. We review the many guises for this union, while emphasizing the need for a formal language to represent a system's knowledge. Formal languages allow their internal properties to be robustly scrutinized, can be augmented by adding new knowledge, and are amenable to abstractions, all of which are vital to the design of intelligent systems that are explainable and interpretable.

Knowledge Engineering for Intelligent Decision Support
María Vanina Martínez
Knowledge can be seen as the collection of skills and information an individual (or group) has acquired through experience, while intelligence is the ability to apply such knowledge. In many areas of Artificial Intelligence, we have been focusing for the last 40 years on the formalization and development of automated ways of finding and collecting data, as well as on the construction of models to represent that data adequately in a way that an automated system can make sense of. However, in order to achieve real artificial intelligence we need to go beyond data and knowledge representation, and deeper into how such a system could, and would, use available knowledge in order to empower and enhance the capabilities of humans in making decisions in real-world applications. From my point of view, an AI should be able to combine automatically acquired data and knowledge with specific domain expertise from the users that the tool is expected to help.

Improving Group Decision-Making by Artificial Intelligence
Lirong Xia
We summarize some of our recent work on using AI to improve group decision-making by taking a unified approach from statistics, economics, and computation. We then discuss a few ongoing and future directions.

Towards Certified Unsolvability in Classical Planning
Gabriele Röger
While it is easy to verify that an action sequence is a solution for a classical planning task, there is no such verification capability if a task is reported unsolvable. We are therefore interested in certificates that allow an independent verification of the absence of solutions. We identify promising concepts for certificates that can be generated by a wide range of planning approaches. We present a first proposal of unsolvability certificates and sketch ideas for how the underlying concepts can be used as part of a more flexible unsolvability proof system.
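One natural candidate for such a certificate, assumed here for illustration, is an inductive set of states: a set S that contains the initial state, contains no goal state, and is closed under applying actions. Any plan would have to leave S to reach a goal, which closure rules out, so checking these three conditions proves unsolvability. The Python sketch below verifies such a certificate for an explicitly enumerated set of STRIPS states; the state representation, the Action tuple and the function names are hypothetical, and practical certificates would need compact representations rather than explicit enumeration.

```python
from typing import FrozenSet, Iterable, Iterator, NamedTuple, Set

State = FrozenSet[str]  # a state is the set of atoms that are true in it


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def successors(state: State, actions: Iterable[Action]) -> Iterator[State]:
    """STRIPS progression: apply every action whose precondition holds."""
    for a in actions:
        if a.pre <= state:
            yield (state - a.delete) | a.add


def verify_inductive_certificate(cert: Set[State], init: State,
                                 goal: FrozenSet[str],
                                 actions: Iterable[Action]) -> bool:
    """Accept iff cert contains init, contains no goal state, and is closed
    under successors; then no plan can leave cert, so no plan exists."""
    actions = list(actions)
    if init not in cert:
        return False
    if any(goal <= s for s in cert):
        return False
    return all(t in cert for s in cert for t in successors(s, actions))


# Toy task: an agent can shuttle between a and b but can never reach c.
move_ab = Action("move-a-b", frozenset({"at-a"}), frozenset({"at-b"}), frozenset({"at-a"}))
move_ba = Action("move-b-a", frozenset({"at-b"}), frozenset({"at-a"}), frozenset({"at-b"}))
init, goal = frozenset({"at-a"}), frozenset({"at-c"})
cert = {frozenset({"at-a"}), frozenset({"at-b"})}
print(verify_inductive_certificate(cert, init, goal, [move_ab, move_ba]))  # True
```

The check needs no search: the verifier only applies each action once per certified state, which is what makes such certificates cheap to validate independently of the planner that produced them.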
Wednesday 23 14:00  15:00 Invited Talk Super-Human AI for Strategic Reasoning: Beating Top Pros in Heads-Up No-Limit Texas Hold'em
Tuomas Sandholm
Wednesday 23 14:00  15:00 Invited Talk Improving healthcare: challenges and opportunities for reinforcement learning
Joelle Pineau
Wednesday 23 15:00  16:00 Competition Angry Birds
Wednesday 23 15:00  16:00 NLPMT  Machine Translation

MEMD: An Effective Framework for Neural Machine Translation with Multiple Encoders and Decoders
Jinchao Zhang, Qun Liu, Jie Zhou
The encoder-decoder neural framework is widely employed for Neural Machine Translation (NMT) with a single encoder to represent the source sentence and a single decoder to generate target words. The translation performance heavily relies on the representation ability of the encoder and the generation ability of the decoder. To further enhance NMT, we propose to extend the original encoder-decoder framework to a novel one, which has multiple encoders and decoders (MEMD). In this way, multiple encoders extract more diverse features to represent the source sequence, and multiple decoders capture more complicated translation knowledge. Our proposed MEMD framework makes it convenient to integrate heterogeneous encoders and decoders with multiple depths and multiple types. Experiments on a Chinese-English translation task show that our MEMD system surpasses the state-of-the-art NMT system by 2.1 BLEU points and surpasses the phrase-based Moses by 7.38 BLEU points. Our framework is general and can be applied to other sequence-to-sequence tasks.

Joint Training for Pivot-based Neural Machine Translation
Yong Cheng, Qian Yang, Yang Liu, Maosong Sun, Wei Xu
While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.

Improved Neural Machine Translation with Source Syntax
Shuangzhi Wu, Ming Zhou, Dongdong Zhang
Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently achieved state-of-the-art performance. Researchers have shown that extending word-level attention to phrase-level attention by incorporating source-side phrase structure can enhance the attention model and achieve promising improvement. However, word dependencies that can be crucial to correctly understanding a source sentence are not always consecutive (i.e., phrase structure); sometimes they span long distances. Phrase structures are not the best way to explicitly model long-distance dependencies. In this paper we propose a simple but effective method to incorporate source-side long-distance dependencies into NMT. Our method, based on dependency trees, enriches each source state with global dependency structures, which can better capture the inherent syntactic structure of source sentences. Experiments on Chinese-English and English-Japanese translation tasks show that our proposed method outperforms state-of-the-art SMT and NMT baselines.

Maximum Expected Likelihood Estimation for Zero-resource Neural Machine Translation
Hao Zheng, Yong Cheng, Yang Liu
While neural machine translation (NMT) has made remarkable progress in translating a handful of high-resource language pairs recently, parallel corpora are not always available for many zero-resource language pairs. To deal with this problem, we propose an approach to zero-resource NMT via maximum expected likelihood estimation. The basic idea is to maximize the expectation with respect to a pivot-to-source translation model for the intended source-to-target model on a pivot-target parallel corpus. To approximate the expectation, we propose two methods to connect the pivot-to-source and source-to-target models. Experiments on two zero-resource language pairs show that the proposed approach yields substantial gains over baseline methods. We also observe that when trained jointly with the source-to-target model, the pivot-to-source translation model also obtains improvements over independent training.
Wednesday 23 15:00  16:00 NLPSATM  Sentiment Analysis and Text Mining

Opinion-aware Knowledge Graph for Political Ideology Detection
Wei Chen, Xiao Zhang, Tengjiao Wang, Bishan Yang, Yi Li
Identifying an individual's political ideology from their speeches and written texts is important for analyzing political opinions and user behavior on social media. Traditional opinion mining methods rely on bag-of-words representations to classify texts into different ideology categories. Such methods are too coarse for understanding political ideologies. The key to identifying different ideologies is to recognize different opinions expressed toward a specific topic. To model this insight, we classify ideologies based on the distribution of opinions expressed towards real-world entities or topics. Specifically, we propose a novel approach to political ideology detection that makes predictions based on an opinion-aware knowledge graph. We show how to construct such a graph by integrating the opinions and targeted entities extracted from text into an existing structured knowledge base, and show how to perform ideology inference by information propagation on the graph. Experimental results demonstrate that our method achieves high accuracy in detecting ideologies compared to baselines including LR, SVM and RNN.

End-to-End Adversarial Memory Network for Cross-domain Sentiment Classification
Zheng LI, Yu Zhang, Ying WEI, Yuxiang Wu, Qiang Yang
Domain adaptation tasks such as cross-domain sentiment classification have attracted much attention in recent years. Due to the domain discrepancy, a sentiment classifier trained in a source domain may not work well when directly applied to a target domain. Traditional methods need to manually select pivots, which behave in the same way for discriminative learning in both domains. Recently, deep learning methods have been proposed to learn a representation shared by domains. However, they lack the interpretability to directly identify the pivots. To address the problem, we introduce an end-to-end Adversarial Memory Network (AMN) for cross-domain sentiment classification. Unlike existing methods, our approach can automatically capture the pivots using an attention mechanism. Our framework consists of two parameter-shared memory networks: one is for sentiment classification and the other is for domain classification. The two networks are jointly trained so that the selected features minimize the sentiment classification error and at the same time make the domain classifier unable to discriminate between representations from the source and target domains. Moreover, unlike deep learning methods that cannot tell us which words are the pivots, our approach can offer a direct visualization of them. Experiments on the Amazon review dataset demonstrate that our approach can significantly outperform state-of-the-art methods.

Stance Classification with Target-specific Neural Attention
Jiachen Du, Ruifeng Xu, Yulan He, Lin Gui
Stance classification, which aims at detecting the stance expressed in text towards a specific target, is an emerging problem in sentiment analysis. A major difference between stance classification and traditional aspect-level sentiment classification is that the identification of stance depends on a target that might not be explicitly mentioned in the text. This indicates that, apart from text content, target information is important to stance detection. To this end, we propose a neural network-based model, which incorporates target-specific information into stance classification via a novel attention mechanism. Specifically, the attention mechanism is expected to locate the critical parts of text that are related to the target. Our evaluations on both the English and Chinese Stance Detection datasets show that the proposed model achieves state-of-the-art performance.

Interactive Attention Networks for Aspect-Level Sentiment Classification
Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng Wang
Aspect-level sentiment classification aims at identifying the sentiment polarity of a specific target in its context. Previous approaches have realized the importance of targets in sentiment classification and developed various methods with the goal of precisely modeling their contexts via generating target-specific representations. However, these studies always ignore the separate modeling of targets. In this paper, we argue that both targets and contexts deserve special treatment and need their own representations learned via interactive learning. We then propose the interactive attention networks (IAN) to interactively learn attentions in the contexts and targets, and generate the representations for targets and contexts separately. With this design, the IAN model can well represent a target and its collocative context, which is helpful to sentiment classification. Experimental results on SemEval 2014 datasets demonstrate the effectiveness of our model.
Wednesday 23 15:00-16:00 PL-APR - Activity and Plan Recognition

New Metrics and Algorithms for Stochastic Goal Recognition Design Problems
Christabel Wayllace, Ping Hou, William Yeoh
Goal Recognition Design (GRD) problems involve identifying the best ways to modify the underlying environment that agents operate in, typically by making a subset of feasible actions infeasible, in such a way that agents are forced to reveal their goals as early as possible. The Stochastic GRD (SGRD) model is an important extension that introduces stochasticity into the outcomes of agent actions. Unfortunately, the worst-case distinctiveness (wcd) metric proposed for SGRDs has a formal definition that is inconsistent with its intuitive definition, which is the maximal number of actions an agent can take, in expectation, before its goal is revealed. In this paper, we make the following contributions: (1) we propose a new wcd metric, called all-goals wcd (wcd-ag), that remedies this inconsistency; (2) we introduce a new metric, called expected-case distinctiveness (ecd), that weighs the possible goals based on their importance; (3) we provide theoretical results comparing these different metrics, as well as the complexity of computing them optimally; and (4) we describe new efficient algorithms to compute the wcd-ag and ecd values.

Deceptive path-planning
Peta Masters, Sebastian Sardina
Deceptive path-planning involves finding a path such that the probability of an observer identifying the path's final destination, before it has been reached, is minimised. This paper formalises deception as it applies to path-planning and introduces the notion of a last deceptive point (LDP) which, when measured in terms of 'path completion', can be used to rank paths by their potential to deceive. Building on recent developments in probabilistic goal-recognition, we propose a formula to calculate an optimal LDP and present strategies for the generation of deceptive paths by both simulation ('showing the false') and dissimulation ('hiding the real').

Heuristic Online Goal Recognition in Continuous Domains
Mor Vered, Gal A. Kaminka
Goal recognition is the problem of inferring the goal of an agent based on its observed actions. An inspiring approach, plan recognition by planning (PRP), uses off-the-shelf planners to dynamically generate plans for given goals, eliminating the need for the traditional plan library. However, the existing PRP formulation is inherently inefficient in online recognition and cannot be used with motion planners for continuous spaces. In this paper, we utilize a different PRP formulation that allows for online goal recognition and for application in continuous spaces. We present an online recognition algorithm in which two heuristic decision points may be used to improve runtime significantly over existing work. We specify heuristics for continuous domains, prove guarantees on their use, and empirically evaluate the algorithm over hundreds of experiments in both a 3D navigational environment and a cooperative robotic team task.

Bridging the Gap between Observation and Decision Making: Goal Recognition and Flexible Resource Allocation in Dynamic Network Interdiction
Kai Xu, Kaiming Xiao, Quanjun Yin, Yabing Zha, Cheng Zhu
Goal recognition, which is the task of inferring an agent’s goals given some or all of the agent’s observed actions, is one of the important approaches for bridging the gap between observation and decision making within an observe-orient-decide-act cycle. Unfortunately, little research has focused on how to improve the utilization of the knowledge produced by a goal recognition system. In this work, we propose a Markov Decision Process-based goal recognition approach tailored to a dynamic shortest-path local network interdiction (DSPLNI) problem. We first introduce a novel DSPLNI model and its solvable dual form so as to incorporate real-time knowledge acquired from the goal recognition system. Then a Markov Decision Process-based goal recognition model, along with its dynamic Bayesian network representation and the applied goal inference method, is proposed to identify the evader’s real goal within the DSPLNI context. Based on that, we further propose an efficient, scalable technique for maintaining the action utility map used in fast goal inference, and develop a flexible resource assignment mechanism in DSPLNI using knowledge from the goal recognition system. Experimental results show the effectiveness and accuracy of our methods in both goal recognition and dynamic network interdiction.
Wednesday 23 15:00-16:00 MAS-ATA - Agreement Technologies: Argumentation

Acceptability Semantics for Weighted Argumentation Frameworks
Leila Amgoud, Jonathan Ben-Naim, Srdjan Vesic, Dragan Doder
The paper studies semantics that evaluate arguments in argumentation graphs, where each argument has a basic strength, and may be attacked by other arguments. It starts by defining a set of principles, each of which is a property that a semantics could satisfy. It provides the first formal analysis and comparison of existing semantics. Finally, it defines three novel semantics that satisfy more principles than existing ones.
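As an illustration of a gradual semantics of this kind, the sketch below computes an h-categorizer-style fixpoint in which each argument's basic strength is divided by one plus the sum of its attackers' strengths. This form is an assumption chosen for illustration; it is not claimed to match any of the three semantics defined in the paper, and the names are hypothetical.

```python
from typing import Dict, List, Tuple


def weighted_h_categorizer(weights: Dict[str, float],
                           attacks: List[Tuple[str, str]],
                           iterations: int = 200) -> Dict[str, float]:
    """Approximate the fixpoint Deg(a) = w(a) / (1 + sum of Deg(b) over attackers b of a).

    `weights` maps each argument to its basic strength in [0, 1];
    `attacks` lists (attacker, attacked) pairs.
    """
    attackers: Dict[str, List[str]] = {a: [] for a in weights}
    for src, dst in attacks:
        attackers[dst].append(src)
    degree = dict(weights)  # start the iteration from the basic strengths
    for _ in range(iterations):
        degree = {a: weights[a] / (1.0 + sum(degree[b] for b in attackers[a]))
                  for a in weights}
    return degree


# Hypothetical example: b (basic strength 0.5) attacks a (basic strength 1.0),
# so a's degree drops to 1 / 1.5.
print(weighted_h_categorizer({"a": 1.0, "b": 0.5}, [("b", "a")]))
```

An unattacked argument keeps its basic strength, and every attack can only lower the strength of its target, two of the kinds of properties that principle-based analyses of such semantics examine.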

Measuring the Intensity of Attacks in Argumentation Graphs with Shapley Value
Leila Amgoud, Jonathan Ben-Naim, Srdjan Vesic
In an argumentation setting, a semantics evaluates the overall acceptability of arguments. Consequently, it reveals the global loss incurred by each argument due to attacks. However, it does not say anything about the contribution of each attack to that loss. This paper introduces the novel concept of a contribution measure, which evaluates those contributions. It starts by defining a set of axioms that a reasonable measure would satisfy, then shows that the Shapley value is the unique measure that satisfies them. Finally, it investigates the properties of the latter under existing semantics.
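The sketch below shows the Shapley value computation itself under an assumed setting: the players are attacks, the worth v(S) of a coalition S is the strength a target argument loses when only the attacks in S are present, and the underlying semantics is a weighted h-categorizer. The paper's axioms and its choice of semantics are not reproduced; all names are hypothetical, and the exhaustive enumeration is only practical for a handful of attacks.

```python
from itertools import combinations
from math import factorial
from typing import Dict, List, Sequence, Tuple

Attack = Tuple[str, str]


def h_categorizer(weights: Dict[str, float], attacks: Sequence[Attack],
                  iterations: int = 200) -> Dict[str, float]:
    """Weighted h-categorizer fixpoint, used here only as an example semantics."""
    attackers: Dict[str, List[str]] = {a: [] for a in weights}
    for src, dst in attacks:
        attackers[dst].append(src)
    degree = dict(weights)
    for _ in range(iterations):
        degree = {a: weights[a] / (1.0 + sum(degree[b] for b in attackers[a]))
                  for a in weights}
    return degree


def loss(target: str, weights: Dict[str, float], attacks: Sequence[Attack]) -> float:
    """Characteristic function v(S): strength lost by `target` when only attacks S are present."""
    return weights[target] - h_categorizer(weights, attacks)[target]


def shapley_contributions(target: str, weights: Dict[str, float],
                          attacks: List[Attack]) -> Dict[Attack, float]:
    """Exact Shapley value of every attack in the game v(S) = loss(target, S).

    Enumerates all coalitions, so it is exponential in the number of attacks.
    """
    n = len(attacks)
    phi: Dict[Attack, float] = {}
    for t in attacks:
        others = [a for a in attacks if a != t]
        total = 0.0
        for k in range(len(others) + 1):
            coef = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                # Marginal contribution of attack t to this coalition.
                total += coef * (loss(target, weights, coalition + (t,))
                                 - loss(target, weights, coalition))
        phi[t] = total
    return phi


# Hypothetical example: x is attacked by b and c, all basic strengths 1.
w = {"x": 1.0, "b": 1.0, "c": 1.0}
atts = [("b", "x"), ("c", "x")]
print(shapley_contributions("x", w, atts))
```

By the efficiency property of the Shapley value, the contributions always sum to the target's total loss; in this symmetric example x loses 2/3 and each attack receives half of it, 1/3.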

A Bayesian Approach to Argument-Based Reasoning for Attack Estimation
Hiroyuki Kido, Keishi Okamoto
Although the web is a source of a large number of arguments and their acceptability statuses (e.g., votes for and against the arguments), the relations between those arguments are usually not available. This paper asks, given the acceptability statuses of arguments, how one should utilise acceptability semantics to statistically estimate an attack relation between those arguments. We define a Bayesian network model of argument-based reasoning in which Dung's acceptability semantics gives substance to Bayesian inference. We show the correctness of our model by analysing properties of estimated attack relations and by illustrating its applicability to online forums.

Efficient Computation of Extensions for Dynamic Abstract Argumentation Frameworks: An Incremental Approach
Gianvincenzo Alfano, Sergio Greco, Francesco Parisi
Abstract argumentation frameworks (AFs) are a well-known formalism for modelling and deciding many argumentation problems. Computational issues and evaluation algorithms have been deeply investigated for static AFs, whose structure does not change over time. However, AFs are often dynamic, since argumentation is inherently dynamic. In this paper, we tackle the problem of incrementally computing extensions for dynamic AFs: given an initial extension and an update (or a set of updates), we devise a technique for computing an extension of the updated AF under four well-known semantics (i.e., complete, preferred, stable, and grounded). The idea is to identify a reduced (updated) AF sufficient to compute an extension of the whole AF, and to use state-of-the-art algorithms to recompute an extension of the reduced AF only. The experiments reveal that, for all semantics considered and using different solvers, the incremental technique is on average two orders of magnitude faster than computing the semantics from scratch.
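For reference, the from-scratch computation that the incremental technique is compared against can be sketched, for the grounded semantics only, as the least fixpoint of Dung's characteristic function. This is a generic textbook procedure, not the paper's incremental algorithm, and the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple


def grounded_extension(arguments: Iterable[str],
                       attacks: Iterable[Tuple[str, str]]) -> Set[str]:
    """Least fixpoint of Dung's characteristic function F(S) = {a | S defends a}."""
    args = set(arguments)
    attackers = {a: set() for a in args}
    for src, dst in attacks:
        attackers[dst].add(src)
    extension: Set[str] = set()
    while True:
        # a is defended by `extension` if every attacker b of a is itself
        # attacked by some member of `extension`.
        defended = {a for a in args
                    if all(attackers[b] & extension for b in attackers[a])}
        if defended == extension:
            return extension
        extension = defended


# Hypothetical chain a -> b -> c: a is unattacked and defends c.
print(grounded_extension(["a", "b", "c"], [("a", "b"), ("b", "c")]))
```

Starting from the empty set, each round adds the arguments defended so far; because the characteristic function is monotone, the iteration stops at the least fixpoint. After an update, this whole computation would be rerun on the full framework, which is the cost the incremental technique avoids.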
Wednesday 23 15:00-16:00 MAS-ABS - Agent-Based Simulation

Enhancing Sustainability of Complex Epidemiological Models through a Generic Multilevel Agent-based Approach
Sebastien Picault, Yu-Lin Huang, Vianney Sicard, Pauline Ezanno
The development of computational sciences has fostered major advances in life sciences, but has also led to reproducibility and reliability issues, which become a crucial concern when simulations are aimed at assessing control measures, as in epidemiology. Broad use of software development methods is a useful remedy for reducing those problems, but preventive approaches, targeting not only implementation but also model design, are essential to sustainable improvements. Among them, AI techniques, based on the separation between declarative and procedural concerns and on knowledge engineering, offer promising solutions. In particular, multilevel multi-agent systems, deeply rooted in that culture, provide a generic way to integrate several epidemiological modeling paradigms within a homogeneous interface. In this paper, we explain how this approach is used to build more generic, reliable and sustainable simulations, illustrated by real-case applications in cattle epidemiology.

Factorized Asymptotic Bayesian Policy Search for POMDPs
Masaaki Imaizumi, Ryohei Fujimaki
This paper proposes a novel direct policy search (DPS) method with model selection for partially observed Markov decision processes (POMDPs). DPS methods have been standard for learning POMDPs due to their computational efficiency and natural ability to maximize total rewards. An important open challenge for the best use of DPS methods is model selection, i.e., determining the proper dimensionality of hidden states and the complexity of policy functions, to mitigate overfitting in highly flexible model representations of POMDPs. This paper bridges Bayesian inference and reward maximization and derives the marginalized weighted log-likelihood (MWL) for POMDPs, which combines the advantages of Bayesian model selection and DPS. We then propose factorized asymptotic Bayesian policy search (FABPS) to explore the model and the policy that maximize MWL by extending the recently developed factorized asymptotic Bayesian inference. Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, with respect both to model selection and to expected total rewards.

Interaction-based ontology alignment repair with expansion and relaxation
Jérôme Euzenat
Agents may use ontology alignments to communicate when they represent knowledge with different ontologies: alignments help reclassify objects from one ontology to the other. These alignments may not be perfectly correct, yet agents have to proceed. They can take advantage of their experience in order to evolve alignments: upon communication failure, they adapt the alignments to avoid reproducing the same mistake. Such repair experiments had been performed in the framework of networks of ontologies related by alignments. They revealed that, by playing simple interaction games, agents can effectively repair random networks of ontologies. Here we repeat these experiments and, using new measures, show that previous results were underestimated. We introduce new adaptation operators that improve on those previously considered. We also allow agents to go beyond the initial operators in two ways: they can generate new correspondences when they discard incorrect ones, and they can provide less precise answers. The combination of these modalities satisfies the following properties: (1) agents still converge to a state in which no mistake occurs; (2) they achieve results far closer to the correct alignments than previously found; (3) they again reach 100% precision and coherent alignments.

Aggressive, Tense or Shy? Identifying Personality Traits from Crowd Videos
Aniket Bera, Dinesh Manocha, Tanmay Randhavane
We present a real-time algorithm to automatically classify the behavior or personality of a pedestrian based on his or her movements in a crowd video. Our classification criterion is based on Personality Trait theory. We present a statistical scheme that dynamically learns the behavior of every pedestrian and computes its motion model. This model is combined with global crowd characteristics to compute movement patterns and motion dynamics, which are then used for crowd prediction. Our learning scheme is general, and we highlight its performance in identifying the personality of different pedestrians in low- and high-density crowd videos. We also evaluate its accuracy by comparing the results with a user study.
Wednesday 23 15:00-16:00 MT-KBSE - Knowledge-Based Software Engineering

Leveraging Human Knowledge in Tabular Reinforcement Learning: A Study of Human Subjects
Ariel Rosenfeld, Matt Taylor, Sarit Kraus
Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort on the human designer's part. To date, human factors have generally not been considered in the development and evaluation of possible approaches. In this paper, we propose and evaluate a novel method, based on the human psychology literature, for injecting human knowledge to speed up tabular RL, and we show that it is both effective and efficient for both expert and non-expert designers.

Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code
Huihui Wei, Ming Li
Software clone detection, which aims at identifying code fragments with similar functionalities, has played an important role in software maintenance and evolution. Many clone detection approaches have been proposed. However, most of them represent source code with handcrafted features using lexical or syntactical information, or with unsupervised deep features, which makes it difficult to detect functional clone pairs, i.e., pieces of code with similar functionality but differing at both the syntactical and lexical levels. In this paper, we address the software functional clone detection problem by learning supervised deep features. We formulate clone detection as a supervised learning-to-hash problem and propose an end-to-end deep feature learning framework called CDLH for functional clone detection. This framework learns hash codes by exploiting lexical and syntactical information for fast computation of functional similarity between code fragments. Experiments on software clone detection benchmarks indicate that the CDLH approach is effective and outperforms the state-of-the-art approaches in software functional clone detection.

Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code
Xuan Huo, Ming Li
Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potentially buggy source files according to a bug report remains a great challenge in software maintenance. Many previous approaches represent bug reports and source code by lexical and structural information and correlate their relevance by measuring their similarity; recently, a CNN-based model was proposed to learn unified features for bug localization, which overcomes the difficulty of modeling natural and programming languages with different structural semantics. However, previous studies fail to capture the sequential nature of source code, which carries additional semantics beyond the lexical and structural terms; such information is vital for modeling program functionalities and behaviors. In this paper, we propose a novel model, LS-CNN, which enhances the unified features by exploiting the sequential nature of source code. LS-CNN combines CNN and LSTM to extract semantic features for automatically identifying potentially buggy source code according to a bug report. Experimental results on widely used software projects indicate that LS-CNN significantly outperforms the state-of-the-art methods in locating buggy files.

DeepAM: Migrate APIs with Multimodal Sequence to Sequence Learning
Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim
Computer programs written in one language often need to be ported to other languages to support multiple devices and environments. When programs use language-specific APIs (Application Programming Interfaces), it is very challenging to migrate these APIs to the corresponding APIs written in other languages. Existing approaches mine API mappings from projects that have corresponding versions in two languages. They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings. In this paper, we propose an intelligent system called DeepAM for automatically mining API mappings from a large-scale code corpus without bilingual projects. The key component of DeepAM is based on the multimodal sequence-to-sequence learning architecture, which aims to learn joint semantic representations of bilingual API sequences from big source code data. Experimental results indicate that DeepAM significantly increases both the accuracy and the number of API mappings compared with the state-of-the-art approaches.
Wednesday 23 15:00-16:00 UAI-UAI - Uncertainty

Plato's Cave in the Dempster-Shafer land: the Link between Pignistic and Plausibility Transformations
Chunlai Zhou, Biao Qin, Xiaoyong Du
In reasoning under uncertainty in AI, there are (at least) two useful and different ways of understanding beliefs: the first is as absolute belief, or degree of belief in propositions, and the second is as belief update, or measure of change in belief. Pignistic and plausibility transformations are two well-known probability transformations that map belief functions to probability functions in the Dempster-Shafer theory of evidence. In this paper, we establish the link between pignistic and plausibility transformations by devising a belief-update framework for belief functions in which the plausibility transformation works on belief update while the pignistic transformation operates on absolute belief. In this framework, we define a new belief-update operator connecting the two transformations, and interpret the framework in a belief-function model of parametric statistical inference. As a metaphor, these two transformations, projecting the belief-update framework for belief functions onto that for probabilities, are likened to the fire projecting reality into shadows on the wall in Plato's cave.
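The two transformations being linked have standard closed forms, sketched below with exact fractions: the pignistic transformation shares each focal set's mass equally among its elements, while the plausibility transformation normalises singleton plausibilities. The sketch assumes a normalised mass function with no mass on the empty set and does not implement the paper's belief-update operator; the names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], Fraction]


def pignistic(m: Mass) -> Dict[str, Fraction]:
    """BetP(x) = sum of m(A) / |A| over focal sets A containing x (assumes m(empty set) = 0)."""
    frame = set().union(*m)
    return {x: sum((v / len(A) for A, v in m.items() if x in A), Fraction(0))
            for x in frame}


def plausibility_transform(m: Mass) -> Dict[str, Fraction]:
    """Pl_P(x) = Pl({x}) / sum over y of Pl({y}), where Pl({x}) sums m(A) over A containing x."""
    frame = set().union(*m)
    pl = {x: sum((v for A, v in m.items() if x in A), Fraction(0)) for x in frame}
    total = sum(pl.values())
    return {x: p / total for x, p in pl.items()}


# Hypothetical mass function on the frame {a, b}.
m = {frozenset({"a"}): Fraction(1, 2), frozenset({"a", "b"}): Fraction(1, 2)}
print(pignistic(m), plausibility_transform(m))
```

On this mass function the pignistic transformation gives a probability 3/4 while the plausibility transformation gives it 2/3, so the two generally yield different probability functions from the same belief function.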

Adaptive Elicitation of Preferences under Uncertainty in Sequential Decision Making Problems
Nawal Benabbou, Patrice Perny
This paper aims to introduce an adaptive preference elicitation method for interactive decision support in sequential decision problems. The Decision Maker's preferences are assumed to be representable by an additive utility, initially unknown or imperfectly known. We first study the determination of possibly optimal policies when admissible utilities are imprecisely defined by some linear constraints derived from observed preferences. Then, we introduce a new approach interleaving elicitation of utilities and backward induction to incrementally determine an optimal or near-optimal policy. We propose an interactive algorithm with performance guarantees and describe numerical experiments demonstrating the practical efficiency of our approach.

Incremental Decision Making Under Risk with the Weighted Expected Utility Model
Hugo Gilbert, Nawal Benabbou, Patrice Perny, Olivier Spanjaard, Paolo Viappiani
This paper deals with decision making under risk with the Weighted Expected Utility (WEU) model, which is a model generalizing expected utility and providing stronger descriptive possibilities. We address the problem of identifying, within a given set of lotteries, a (near-)optimal solution for a given decision maker consistent with the WEU theory. The WEU model is parameterized by two real-valued functions. We propose here a new incremental elicitation procedure to progressively reduce the imprecision about these functions until a robust decision can be made. We also give experimental results showing the practical efficiency of our method.
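For concreteness, here is a sketch of how a WEU value is computed once both functions are known, assuming the usual weighted-utility form V(L) = Σ p_i w(x_i) u(x_i) / Σ p_i w(x_i) with a positive weight function w; with w constant this reduces to expected utility. The paper's incremental elicitation of u and w is not shown, and the lotteries and functions below are hypothetical.

```python
from typing import Callable, List, Tuple

Lottery = List[Tuple[float, float]]  # (probability, outcome) pairs


def weighted_expected_utility(lottery: Lottery,
                              u: Callable[[float], float],
                              w: Callable[[float], float]) -> float:
    """WEU(L) = sum p_i w(x_i) u(x_i) / sum p_i w(x_i), assuming w > 0."""
    num = sum(p * w(x) * u(x) for p, x in lottery)
    den = sum(p * w(x) for p, x in lottery)
    return num / den


def best_lottery(lotteries: List[Lottery],
                 u: Callable[[float], float],
                 w: Callable[[float], float]) -> int:
    """Index of a WEU-optimal lottery for fixed, fully known u and w."""
    return max(range(len(lotteries)),
               key=lambda i: weighted_expected_utility(lotteries[i], u, w))


# Hypothetical lotteries and functions.
risky: Lottery = [(0.5, 10.0), (0.5, 0.0)]
safe: Lottery = [(1.0, 4.0)]
identity = lambda x: x
constant = lambda x: 1.0                        # w = 1: plain expected utility
pessimistic = lambda x: 2.0 if x == 0.0 else 1.0  # overweights the bad outcome

print(best_lottery([risky, safe], identity, constant))
print(best_lottery([risky, safe], identity, pessimistic))
```

With w = 1 the risky lottery wins (5 versus 4); doubling the weight of the zero outcome lowers its value to 10/3, so the safe lottery becomes optimal. This is why an imprecisely known w, and not only u, can change the recommended decision.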

Causal Discovery from Nonstationary/Heterogeneous Data: Skeleton Estimation and Orientation Determination
Kun Zhang, Biwei Huang, Jiji Zhang, Bernhard Schölkopf, Clark Glymour
It is commonplace to encounter nonstationary or heterogeneous data, whose underlying generating process changes over time or across data sets (the data sets may have different experimental conditions or data collection conditions). Such distribution shift presents both challenges and opportunities for causal discovery. In this paper, we develop a principled framework for causal discovery from such data, called Constraint-based causal Discovery from Nonstationary/heterogeneous Data (CD-NOD), which addresses two important questions. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and to recover the skeleton of the causal structure over observed variables. Second, we present a way to determine causal orientations by making use of independence changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.
Wednesday 23 15:00-16:00 KR-CCR - Computational Complexity of Reasoning

On the Kernelization of Global Constraints
Clément Carbonnel, Emmanuel Hebrard
Kernelization is a powerful concept from parameterized complexity theory that captures (a certain idea of) efficient polynomial-time preprocessing for hard decision problems. However, exploiting this technique in the context of constraint programming is challenging. Building on recent results for the VertexCover constraint, we introduce novel "lossless" kernelization variants that are tailored for constraint propagation. We showcase the theoretical interest of our ideas on two constraints, VertexCover and EdgeDominatingSet.
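To make the notion concrete, here is the classic Buss kernel for the Vertex Cover decision problem, the textbook example of polynomial-time preprocessing: a vertex of degree greater than k must be in any cover of size at most k, and after exhaustively applying this rule a yes-instance has at most k² edges. This is the standard graph kernel, not the paper's "lossless" variants for constraint propagation; the names are hypothetical and the graph is assumed simple (no self-loops).

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Edge = FrozenSet[int]


def buss_kernel(edges: Iterable[Tuple[int, int]], k: int
                ) -> Optional[Tuple[Set[Edge], int, Set[int]]]:
    """Buss's kernel for 'does the graph have a vertex cover of size <= k?'.

    Returns (remaining edges, remaining budget, vertices forced into the cover),
    or None when the reduction proves that no such cover exists.
    Isolated vertices are dropped implicitly because only edges are stored.
    """
    remaining: Set[Edge] = {frozenset(e) for e in edges}
    forced: Set[int] = set()
    changed = True
    while changed and k >= 0:
        changed = False
        degree: Dict[int, int] = {}
        for e in remaining:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        for v, d in degree.items():
            if d > k:
                # A cover avoiding v would need all d > k neighbours, so v is forced.
                forced.add(v)
                remaining = {e for e in remaining if v not in e}
                k -= 1
                changed = True
                break
    # Every remaining vertex has degree <= k, so k vertices cover at most k * k edges.
    if k < 0 or len(remaining) > k * k:
        return None
    return remaining, k, forced


# Hypothetical star with centre 0 and five leaves, budget k = 2.
print(buss_kernel([(0, i) for i in range(1, 6)], 2))
```

The kernel's size (at most k² edges) depends only on the parameter, not on the input graph, which is exactly what makes it a kernelization.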

On the Complexity of Enumerating the Extensions of Abstract Argumentation Frameworks
Markus Kröll, Reinhard Pichler, Stefan Woltran
Several computational problems of abstract argumentation frameworks (AFs) such as skeptical and credulous reasoning, existence of a non-empty extension, verification, etc. have been thoroughly analyzed for various semantics. In contrast, the enumeration problem of AFs (i.e., the problem of computing all extensions according to some semantics) has been left unexplored so far. The goal of this paper is to fill this gap. We thus investigate the enumeration complexity of AFs for a large collection of semantics and, in addition, consider the most common structural restrictions on AFs.
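For illustration, the enumeration problem asks for output like the one below, here computed by brute force for stable semantics (conflict-free sets that attack every argument outside them). The exponential subset scan is only a sketch of what is being enumerated and says nothing about the enumeration complexity results of the paper; the names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def stable_extensions(arguments: Iterable[str],
                      attacks: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Enumerate all stable extensions by brute force over every subset.

    A set S is stable if it is conflict-free and attacks every argument outside S.
    """
    args = sorted(set(arguments))
    att = set(attacks)
    result: List[Set[str]] = []
    for size in range(len(args) + 1):
        for subset in combinations(args, size):
            s = set(subset)
            conflict_free = not any((a, b) in att for a in s for b in s)
            attacks_rest = all(any((a, b) in att for a in s)
                               for b in args if b not in s)
            if conflict_free and attacks_rest:
                result.append(s)
    return result


# Hypothetical mutual attack between a and b: two stable extensions.
print(stable_extensions(["a", "b"], [("a", "b"), ("b", "a")]))
```

A framework may have many extensions or none (a single self-attacking argument has no stable extension), which is why enumeration complexity is usually measured in terms of the delay between consecutive outputs rather than total running time.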

A General Notion of Equivalence for Abstract Argumentation
Ringo Baumann, Wolfgang Dvořák, Thomas Linsbichler, Stefan Woltran
We introduce a parametrized equivalence notion for abstract argumentation that subsumes standard and strong equivalence as corner cases. Under this notion, two argumentation frameworks are equivalent if they deliver the same extensions under any addition of arguments and attacks that do not affect a given set of core arguments. As we will see, this notion of equivalence nicely captures the concept of local simplifications. We provide exact characterizations and complexity results for deciding our new notion of equivalence.

On the Computational Complexity of Gossip Protocols
Krzysztof Apt, Eryk Kopczyński, Dominik Wojtczak
Gossip protocols deal with a group of communicating agents, each holding a private piece of information, and aim at arriving at a situation in which all the agents know each other's secrets. Distributed epistemic gossip protocols are particularly simple distributed programs that use formulas from an epistemic logic. Recently, the implementability of these distributed protocols was established (which means that the evaluation of these formulas is decidable), and the problems of their partial correctness and termination were shown to be decidable, but their exact computational complexity was left open. We show that for any monotonic type of calls, the implementability of a distributed epistemic gossip protocol is a $P^{NP}_{}$-complete problem, while the problems of its partial correctness and termination are in $coNP^{NP}$.
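For intuition about the underlying setting, the sketch below simulates the basic gossip model in which each call between two agents lets both learn every secret the other currently knows. It ignores the epistemic guards that distributed epistemic gossip protocols add and says nothing about the complexity results; the names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def run_calls(n: int, calls: Sequence[Tuple[int, int]]) -> List[Set[int]]:
    """Agent i starts knowing only secret i; each call (a, b) makes both
    callers learn the union of what they currently know."""
    secrets = [{i} for i in range(n)]
    for a, b in calls:
        merged = secrets[a] | secrets[b]
        secrets[a] = set(merged)
        secrets[b] = set(merged)
    return secrets


def all_experts(n: int, calls: Sequence[Tuple[int, int]]) -> bool:
    """True when every agent knows all n secrets after the call sequence."""
    return all(len(known) == n for known in run_calls(n, calls))


# Four agents become experts after four calls.
print(all_experts(4, [(0, 1), (2, 3), (0, 2), (1, 3)]))  # prints True
```

In an epistemic protocol, whether agent a may call agent b would additionally be guarded by a formula about what a knows, which is where the evaluation (implementability) question studied in the paper arises.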
Wednesday 23 15:00-16:00 ML-SL - Structured Learning

Parsing Natural Language Conversations using Contextual Cues
Shashank Srivastava, Amos Azaria, Tom Mitchell
In this work, we focus on semantic parsing of natural language conversations. Most existing methods for semantic parsing are based on understanding the semantics of a single sentence at a time. However, understanding conversations also requires an understanding of conversational context and discourse structure across sentences. We formulate semantic parsing of conversations as a structured prediction task, incorporating structural features that model the 'flow of discourse' across sequences of utterances. We create a dataset for semantic parsing of conversations, consisting of 113 real-life sequences of interactions of human users with an automated email assistant. The data contains 4759 natural language statements paired with annotated logical forms. Our approach yields significant gains in performance over traditional semantic parsing.

ROUTE: Robust Outlier Estimation for Low Rank Matrix Recovery
Xiaojie Guo, Zhouchen Lin
In practice, even very high-dimensional data are typically sampled from low-dimensional subspaces, but with intrusion of outliers and/or noise. Recovering the underlying structure and the pollution from the observations is key to understanding and processing such data. Besides properly modeling the low-rank structure of the subspace, how the pollution is handled is central to recovery performance. Often, the observed data are posed as a superimposition of the clean data and a residual, where the residual can be roughly divided into two groups: small dense noise and gross sparse outliers. Compared with small noise, outliers are more likely to ruin the recovery, as they can be arbitrarily large. With this in mind, this paper designs a method, termed ROUTE, for recovering the low-rank matrix with robust outlier estimation in a unified manner. Theoretical analysis of convergence and optimality, as well as experimental results on both synthetic and real data, are provided to demonstrate the efficacy of our proposed method and show its superiority over other state-of-the-art methods.

Sense Beauty by Label Distribution Learning
Yi Ren, Xin Geng
Beauty has always been an attractive topic in human society; not only artists and psychologists but also scientists have been searching for an answer to what is beautiful. This paper presents an approach to learning the human sense of facial beauty. Unlike previous studies, the human sense is represented by a label distribution, which covers the full range of beauty ratings and indicates the degree to which each beauty rating describes the face. The motivation is that the human sense of beauty is generally quite subjective, so it might be inappropriate to represent it with a single scalar, as most previous work does. Therefore, we propose a method called Beauty Distribution Transformation (BDT) to convert the k-wise ratings to label distributions, and a learning method called Structural Label Distribution Learning (SLDL), based on the structural Support Vector Machine, to learn the human sense of facial beauty.

Efficient Inexact Proximal Gradient Algorithm for Nonconvex Problems
Quanming Yao, James Kwok, Fei Gao, Wei Chen, Tie-Yan Liu
While the proximal gradient algorithm was originally designed for convex optimization, several variants have recently been proposed for nonconvex problems. Among them, nmAPG \cite{li2015accelerated} is the state of the art. However, it is inefficient when the proximal step does not have a closed-form solution, or such a solution exists but is expensive, as it requires more than one proximal step to be exactly solved in each iteration. In this paper, we propose an efficient inexact accelerated proximal gradient (niAPG) algorithm for nonconvex problems. In each iteration, it requires only one inexact (less expensive) proximal step. Convergence to a critical point is still guaranteed, and an $O(1/k)$ convergence rate is derived. Experiments on image inpainting and matrix completion problems demonstrate that the proposed algorithm has performance comparable to the state of the art, but is much faster.
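As an illustration of the problem class, the sketch below runs the plain (non-accelerated, exact) proximal gradient method on ℓ0-regularised least squares, a nonconvex problem whose proximal step has a closed form (hard thresholding). It does not implement niAPG's acceleration or inexact proximal steps; the problem choice and all names are assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def grad_least_squares(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Gradient of f(x) = 0.5 * ||Ax - b||^2, i.e. A^T (Ax - b)."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]


def prox_l0(z: Vector, step: float, lam: float) -> Vector:
    """Exact prox of step * lam * ||x||_0 (hard thresholding):
    keep z_i iff z_i^2 > 2 * step * lam, otherwise set it to zero."""
    return [zi if zi * zi > 2.0 * step * lam else 0.0 for zi in z]


def proximal_gradient(A: Matrix, b: Vector, lam: float, step: float,
                      iterations: int = 100) -> Vector:
    """Plain proximal gradient for min 0.5 * ||Ax - b||^2 + lam * ||x||_0.

    `step` should not exceed 1 / L, where L is the Lipschitz constant of the gradient.
    """
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        g = grad_least_squares(A, b, x)
        z = [xi - step * gi for xi, gi in zip(x, g)]  # gradient step on f
        x = prox_l0(z, step, lam)                      # proximal step on the penalty
    return x


# Hypothetical instance: identity design, so small entries of b are zeroed out.
identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(proximal_gradient(identity3, [3.0, 0.1, -2.0], lam=0.5, step=1.0))
```

Here every iteration solves the proximal step exactly in closed form; the inexact variant described above targets penalties where this step has no cheap exact solution.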
Wednesday 23 15:00-16:00 Competition ANAC
Wednesday 23 15:00-16:00 NLP-DIS - Discourse

A Deep Neural Network for Chinese Zero Pronoun Resolution
Qingyu Yin, Wei-Nan Zhang, Yu Zhang, Ting Liu
Existing approaches for Chinese zero pronoun resolution overlook semantic information. This is because zero pronouns have no descriptive information, which results in difficulty in explicitly capturing their semantic similarities with antecedents. Moreover, when dealing with candidate antecedents, traditional systems simply take advantage of the local information of a single candidate antecedent while failing to consider the underlying information provided by the other candidates from a global perspective. To address these weaknesses, we propose a novel zero-pronoun-specific neural network, which is capable of representing zero pronouns by utilizing the contextual information at the semantic level. In addition, when dealing with candidate antecedents, a two-level candidate encoder is employed to explicitly capture both the local and global information of candidate antecedents. We conduct experiments on the Chinese portion of the OntoNotes 5.0 corpus. Experimental results show that our approach substantially outperforms the state-of-the-art method in various experimental settings.

Inferring Implicit Event Locations from Context with Distributional Similarities
Jin-Woo Chung, Wonsuk Yang, Jinseon You, Jong Park
Automatic event location extraction from text plays a crucial role in many applications, such as infectious disease surveillance and natural disaster monitoring. The fundamental limitation of previous work, such as SpaceEval, is its limited scope of extraction, targeting only locations that are explicitly stated in a syntactic structure. This leads to missing a lot of implicit information inferable from context in a document, which amounts to nearly 40% of the entire location information. To overcome this limitation for the first time, we present a system that infers implicit event locations from a given document. Our system exploits distributional semantics, based on the hypothesis that if two events are described by similar expressions, they are likely to occur in the same location. For example, if “A bomb exploded causing 30 victims” and “many people died from terrorist attack in Boston” are reported in the same document, it is highly likely that the bomb exploded in Boston. Our system achieves good performance, with an F1-score of 0.58, while state-of-the-art classifiers for intra-sentential spatiotemporal relations achieve F1-scores of around 0.60.

SWIM: A Simple Word Interaction Model for Implicit Discourse Relation Recognition
Wenqiang Lei, Xuangcong Wang, Meichun Liu, Ilija Ilievski, Xiangnan He, Min-Yen Kan
Capturing the semantic interaction of pairs of words across arguments and proper argument representation are both crucial issues in implicit discourse relation recognition. The current state-of-the-art represents arguments as distributional vectors computed via bidirectional Long Short-Term Memory networks (BiLSTMs), which are known to have significant model complexity. In contrast, we demonstrate that word-weighted averaging can encode argument representations that incorporate word pair information efficiently. Using an order of magnitude fewer parameters, our proposed model achieves equivalent performance but trains seven times faster.
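A minimal sketch of word-weighted averaging, assuming fixed non-negative weights over word embeddings; in a trained model the weights would be learned, and how SWIM computes them and incorporates word pairs is not reproduced here. Names and numbers are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_average(embeddings: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Argument vector = sum_i a_i * e_i / sum_i a_i for non-negative word weights a_i."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [sum(a * e[d] for a, e in zip(weights, embeddings)) / total
            for d in range(dim)]


# Hypothetical 2-d embeddings for a two-word argument; the first word is weighted 3:1.
print(weighted_average([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))
```

Compared with a BiLSTM encoder, this representation has one weight per word and no recurrent parameters, which illustrates why such models can be much smaller and faster to train.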

Tosca: Operationalizing Commitments Over Information Protocols
Thomas C. King, Akın Günay, Amit K. Chopra, Munindar Singh
The notion of commitment is widely studied as a high-level abstraction for modeling multiagent interaction. An important challenge is supporting flexible, decentralized enactments of commitment specifications. In this paper, we combine recent advances in specifying commitments and information protocols. Specifically, we contribute Tosca, a technique for automatically synthesizing information protocols from commitment specifications. Our main result is that the synthesized protocols support commitment alignment, which is the idea that agents must make compatible inferences about their commitments despite decentralization.
Wednesday 23 15:00  16:00 Panel AI and Societal Challenges
Wednesday 23 16:30  18:00 NLPAT2  NLP Applications and Tools 2

A Feature-Enriched Neural Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Xinchi Chen, Xipeng Qiu, Xuanjing Huang
Recently, neural network models for natural language processing tasks have received increasing attention for their ability to alleviate the burden of manual feature engineering. However, previous neural models cannot extract complicated feature compositions the way traditional methods with discrete features can. In this work, we propose a feature-enriched neural model for the joint Chinese word segmentation and part-of-speech tagging task. Specifically, to simulate the feature templates of traditional discrete-feature-based models, we use different filters to model complex compositional features with convolutional and pooling layers, and then capture long-distance dependency information with a recurrent layer. Experimental results on five different datasets show the effectiveness of our proposed model.

Learning Conversational Systems that Interleave Task and Non-Task Content
Zhou Yu, Alexander Rudnicky, Alan Black
Task-oriented dialog systems have been applied in various tasks, such as automated personal assistants, customer service providers and tutors. These systems work well when users have clear and explicit intentions that are well aligned with the systems' capabilities. However, they fail if users' intentions are not explicit. To address this shortcoming, we propose a framework to interleave non-task content (i.e., everyday social conversation) into task conversations. When the task content fails, the system can still keep the user engaged with the non-task content. We trained a policy using reinforcement learning algorithms to promote long-turn conversation coherence and consistency, so that the system can transition smoothly between task and non-task content. To test the effectiveness of the proposed framework, we developed a movie promotion dialog system. Experiments with human users indicate that a system that interleaves social and task content achieves a better task success rate and is also rated as more engaging than a pure task-oriented system.

Predicting the Quality of Short Narratives from Social Media
Tong Wang, Ping Chen, Boyang Li
An important and difficult challenge in building computational models for narratives is the automatic evaluation of narrative quality. Quality evaluation connects narrative understanding and generation, as generation systems need to evaluate their own products. To circumvent difficulties in acquiring annotations, we employ upvotes in social media as an approximate measure of story quality. We collected 54,484 answers from a crowd-powered question-and-answer website, Quora, and then used active learning to build a classifier that labeled 28,320 answers as stories. To predict the number of upvotes without using social network features, we create neural networks that model textual regions and the interdependence among regions, which serve as strong benchmarks for future research. To the best of our knowledge, this is the first large-scale study of automatic evaluation of narrative quality.

AGRA: An Analysis-Generation-Ranking Framework for Automatic Abbreviation from Paper Titles
Jianbing Zhang, Yixin Sun, Shujian Huang, Camtu Nguyen, Xiaoliang Wang, Xin-Yu Dai, Jiajun Chen, Yang Yu
People sometimes choose word-like abbreviations to refer to items with a long description. These abbreviations usually come from the descriptive text of the item and are easy to remember and pronounce, while preserving the key idea of the item. Coming up with a good abbreviation is not easy, even for humans. Previous assistant naming systems compose names by applying handwritten rules, which may not perform well. In this paper, we propose to view the naming task as an artificial intelligence problem and create a dataset in the domain of academic naming. To generate better names, we propose a three-step framework, comprising description analysis, candidate generation and abbreviation ranking, each of which is parameterized and optimizable. We conduct experiments to compare different settings of our framework with several analysis approaches from different perspectives. Compared to online and baseline systems, our framework achieves the best results.

Learning to Identify Ambiguous and Misleading News Headlines
Wei Wei, Xiaojun Wan
Accuracy is one of the basic principles of journalism. However, it is increasingly hard to maintain due to the diversity of news media. Some editors of online news tend to use catchy headlines that trick readers into clicking. These headlines are either ambiguous or misleading, degrading the reading experience of the audience. Thus, identifying inaccurate news headlines is a task worth studying. Previous work calls these headlines "clickbaits" and mainly focuses on features extracted from the headlines, which limits performance because the consistency between headlines and news bodies is underappreciated. In this paper, we clearly redefine the problem and identify ambiguous and misleading headlines separately. We utilize class sequential rules to exploit structural information when detecting ambiguous headlines. For the identification of misleading headlines, we extract features based on the congruence between headlines and bodies. To make use of the large unlabeled dataset, we apply a co-training method, which improves performance. The experimental results show the effectiveness of our methods. We then use our classifiers to detect inaccurate headlines crawled from different sources and conduct a data analysis.

Learning to Explain Entity Relationships by Pairwise Ranking with Convolutional Neural Networks
Jizhou Huang, Wei Zhang, Shiqi Zhao, Shiqiang Ding, Haifeng Wang
Providing a plausible explanation for the relationship between two related entities is an important task in some applications of knowledge graphs, such as search engines. However, most existing methods require a large amount of manually labeled training data, which makes them impractical for large-scale knowledge graphs given the cost of data annotation. In addition, these methods typically rely on costly handcrafted features. In this paper, we propose an effective pairwise ranking model that leverages clickthrough data of a Web search engine to address these two problems. We first construct large-scale training data from the query-title pairs derived from the clickthrough data. Then, we build a pairwise ranking model that employs a convolutional neural network to automatically learn relevant features. The proposed model can be easily trained with backpropagation to perform the ranking task. The experiments show that our method significantly outperforms several strong baselines.
Wednesday 23 16:30  18:00 MTSS2  Social Sciences 2

Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution
Guangyao Shen, Jia Jia, Liqiang Nie, Fuli Feng, Cunjun Zhang, Tianrui Hu, Tat-Seng Chua, Wenwu Zhu
Depression is a major contributor to the overall global burden of disease. Traditionally, doctors diagnose depressed people face to face by referring to clinical depression criteria. However, more than 70% of patients do not consult doctors in the early stages of depression, which leads to further deterioration of their conditions. Meanwhile, people increasingly rely on social media to disclose emotions and share their daily lives, and social media have been successfully leveraged to help detect physical and mental diseases. Inspired by this, our work aims at timely depression detection by harvesting social media data. We construct well-labeled depression and non-depression datasets on Twitter, and extract six depression-related feature groups covering not only the clinical depression criteria but also online behaviors on social media. With these feature groups, we propose a multimodal depressive dictionary learning model to detect depressed users on Twitter. A series of experiments validates this model, which outperforms several baselines (+3% to +10%). Finally, we analyze a large-scale dataset on Twitter to reveal the differences in online behavior between depressed and non-depressed users.

Who to Invite Next? Predicting Invitees of Social Groups
Yu Han, Jie Tang
Social instant messaging services have greatly changed the way people work, live, and communicate, attracting more and more attention from researchers in computer science and sociology. Groups play a very important role in such social networks. Most social messaging services allow users to create groups and invite their friends to join them. So when a user creates a group, how can we predict who will be invited? To address this issue, we propose a framework that formulates the problem of predicting potential invitees of groups in the context of social instant messaging services, and we develop a novel model that captures factors, at three levels, that affect or correlate with users' probabilities of being invited to groups. Experimental results show that our proposed model significantly outperforms the baseline methods. Our entire study is based on real-world data from WeChat, one of the largest standalone messaging communication services.

The minds of many: opponent modeling in a stochastic game
Friedrich Burkhard von der Osten, Michael Kirley, Tim Miller
Theory of Mind provides a framework for an agent to predict the actions of adversaries by building an abstract model of their strategies using recursive nested beliefs. In this paper, we extend a recently introduced technique for opponent modeling based on Theory of Mind reasoning. Our extended multiagent Theory of Mind model explicitly considers multiple opponents simultaneously. We introduce a stereotyping mechanism, which segments the agent population into subgroups of agents with similar behavior. Here, subgroup profiles guide decision making in place of individual agent profiles. We evaluate our model using a multiplayer stochastic game, which presents agents with the challenge of unknown adversaries in a partially observable environment. Simulation results demonstrate that the model performs well under uncertainty and that stereotyping allows larger groups of agents to be modeled robustly. The findings strengthen results showing that Theory of Mind modeling is useful in many artificial intelligence applications.

Social Pressure in Opinion Games
Diodato Ferraioli, Carmine Ventre
Motivated by privacy and security concerns in online social networks, we study the role of social pressure in opinion games. These are games, important in economics and sociology, that model the formation of opinions in a social network. We enrich the definition of (noisy) best-response dynamics for opinion games by introducing a pressure, increasing with time, to reach an agreement. We prove that for clique social networks, the dynamics always converge to consensus (no matter the level of noise) if the social pressure is high enough. Moreover, we provide (tight) bounds on the speed of convergence; these bounds are polynomial in the number of players provided that the pressure grows sufficiently fast. Finally, we look beyond cliques: we characterize the graphs for which consensus is guaranteed, and discuss the computational complexity of checking whether a graph satisfies such a condition.

No Time to Observe: Adaptive Influence Maximization with Partial Feedback
Jing Yuan, Shaojie Tang
Although the influence maximization problem has been extensively studied over the past ten years, the majority of existing work adopts one of two models: the full-feedback model or the zero-feedback model. In the zero-feedback model, we have to commit all seed users at once in advance; this strategy is also known as a non-adaptive policy. In the full-feedback model, we select one seed at a time and wait until the diffusion completes before selecting the next seed. The full-feedback model has better performance but potentially huge delay; the zero-feedback model has zero delay but poorer performance, since it does not utilize the observations that may be made during the seeding process. To fill the gap between these two models, we propose the partial-feedback model, which allows us to select a seed at any intermediate stage. We develop a novel α-greedy policy that achieves a bounded approximation ratio.

Unified Representation and Lifted Sampling for Generative Models of Social Networks
Pablo Robles, Sebastian Moreno, Jennifer Neville
Statistical models of network structure are widely used in network science to reason about the properties of complex systems, where the nodes and edges represent entities and their relationships. Recently, a number of generative network models (GNMs) have been developed that accurately capture characteristics of real-world networks, but since they are typically defined procedurally, it is difficult to identify commonalities in their structure. Moreover, procedural definitions make it difficult to develop statistical sampling algorithms that are both efficient and correct. In this paper, we identify a family of GNMs that share a common latent structure and create a Bayesian network (BN) representation that captures their common form. We show how to reduce two existing GNMs to this representation. Then, using the BN representation, we develop a generalized, efficient, and provably correct sampling method that exploits parametric symmetries and deterministic context-specific dependence. Finally, we use the new representation to design a novel GNM and evaluate it empirically.
Wednesday 23 16:30  18:00 MLCL5  Classification 5

Exclusivity Regularized Machine: A New Ensemble SVM Classifier
Xiaojie Guo, Xiaobo Wang, Haibin Ling
The diversity of base learners is of utmost importance to a good ensemble. This paper defines a novel measure of diversity, termed exclusivity. Using exclusivity, we further propose an ensemble SVM classifier, the Exclusivity Regularized Machine (ExRM), which jointly suppresses the training error of the ensemble and enhances the diversity between bases. Moreover, an Augmented Lagrange Multiplier based algorithm is customized to effectively and efficiently seek the optimal solution of ExRM. Theoretical analysis of the convergence, global optimality and linear complexity of the proposed algorithm, together with experiments, reveals the efficacy of our method and shows its superiority over the state of the art in terms of accuracy and efficiency.

Vertex-Weighted Hypergraph Learning for Multi-View Object Classification
Lifan Su, Yue Gao, Xibin Zhao, Hai Wan, Ming Gu, Jiaguang Sun
3D object classification with multi-view representation has become very popular, thanks to progress in computing techniques and graphics hardware, and has attracted much research attention in recent years. This task poses two main challenges: the complex correlation among multiple views and possible data imbalance. In this work, we propose to employ a hypergraph structure to formulate the relationship among 3D objects, taking advantage of the hypergraph's ability to model high-order correlation. However, traditional hypergraph learning methods may suffer from the data imbalance issue. To this end, we propose a vertex-weighted hypergraph learning algorithm for multi-view 3D object classification, introducing an updated hypergraph structure. In our method, the correlation among different objects is formulated in a hypergraph structure, and each object (vertex) is associated with a weight reflecting the importance of that sample in the learning process. The learning process is conducted on the vertex-weighted hypergraph, and the estimated object relevance is employed for object classification. The proposed method has been evaluated on two public benchmarks, the NTU and PSB datasets. Experimental results and comparisons with state-of-the-art methods and a recent deep learning method demonstrate the effectiveness of our proposed method.

Improving the Generalization Performance of Multiclass SVM via Angular Regularization
Jianxin Li, Haoyi Zhou, Pengtao Xie, Yingchun Zhang
In multiclass support vector machines (MSVMs) for classification, one core issue is regularizing the coefficient vectors to reduce overfitting. Various regularizers have been proposed, such as the L2 norm, the L1 norm, and the trace norm. In this paper, we introduce a new type of regularization approach, angular regularization, which encourages the coefficient vectors to have larger angles so that class regions can be widened to flexibly accommodate unseen samples. We propose a novel angular regularizer based on the singular values of the coefficient matrix, where the uniformity of singular values reduces the correlation among different classes and drives the angles between coefficient vectors to increase. In a generalization error analysis, we show that decreasing this regularizer effectively reduces the generalization error bound. On various datasets, we demonstrate the efficacy of the regularizer in reducing overfitting.

Ordinal Zero-Shot Learning
Zengwei Huo, Xin Geng
Zero-shot learning predicts new classes even if no training data is available for them. Conventional zero-shot learning usually depends on side information such as attributes or text corpora, but such side information is not easy to obtain or use. Fortunately, in many classification tasks the class labels are ordered, and therefore closely related to each other. This paper deals with zero-shot learning for ordinal classification. The key idea is to use label relevance to expand supervision information from seen labels to unseen labels. The proposed method, SIDL, generates a supervision intensity distribution (SID) that contains each label's supervision intensity, and then learns a mapping from instances to SIDs. Experiments on two typical ordinal classification problems, head pose estimation and age estimation, show that SIDL performs significantly better than the compared regression methods. Furthermore, SIDL appears much more robust than the other compared baselines as the number of unseen labels increases.
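
To make the idea of a supervision intensity distribution concrete, here is a sketch that assumes a Gaussian profile over the ordered labels, centered on an instance's true label. The Gaussian form, the width `sigma`, and the name `supervision_intensity` are illustrative assumptions, not SIDL's actual construction. The point is that neighboring labels, including unseen ones, receive non-zero supervision because the labels are ordered:

```python
import math
from typing import List


def supervision_intensity(true_label: int, labels: List[int],
                          sigma: float = 1.0) -> List[float]:
    """Hypothetical SID: Gaussian weights over ordered labels, normalized to sum to 1."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    scores = [math.exp(-((lab - true_label) ** 2) / (2.0 * sigma ** 2))
              for lab in labels]
    total = sum(scores)
    return [s / total for s in scores]


# Ages 20..24: suppose an instance is labeled 22 and age 23 never appears in training.
ages = [20, 21, 22, 23, 24]
sid = supervision_intensity(22, ages)
for age, p in zip(ages, sid):
    print(age, round(p, 3))
```

A smaller `sigma` concentrates supervision on the observed label; a larger one spreads more of it to unseen neighbors.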

Distributed Accelerated Proximal Coordinate Gradient Methods
Yong Ren, Jun Zhu
We develop a general accelerated proximal coordinate descent algorithm for distributed settings (Dis-APCG) for the optimization problem of minimizing the sum of two convex functions: the first, f, is smooth with a gradient oracle, and the second, Ψ, is separable with respect to blocks of coordinates and has a simple known structure (e.g., the L1 norm). By making use of modern parallel structures, our algorithm obtains a new accelerated convergence rate when f is strongly convex, and it includes the previous non-strongly convex case as a special case. We further present efficient implementations that avoid full-dimensional operations in each step, significantly reducing the computation cost. Experiments on the regularized empirical risk minimization problem demonstrate the effectiveness of our algorithm and match our theoretical findings.
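
When Ψ is an L1 penalty, as in the example above, its proximal step has a closed form: soft-thresholding. The sketch below is a plain, single-machine, non-accelerated proximal coordinate descent on a toy objective. It illustrates only the proximal coordinate update, not the accelerated or distributed parts of the algorithm. The objective, step size, and function names are assumptions chosen for illustration:

```python
from typing import List


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.| evaluated at v (assumes t >= 0)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_coordinate_descent(b: List[float], lam: float, step: float = 1.0,
                            sweeps: int = 10) -> List[float]:
    """Minimize f(x) + Psi(x) with f(x) = 0.5 * ||x - b||^2 and Psi(x) = lam * ||x||_1.

    Each update takes a gradient step on f along one coordinate, then applies
    the proximal operator of Psi restricted to that coordinate.
    """
    x = [0.0] * len(b)
    for _ in range(sweeps):
        for i in range(len(x)):
            grad_i = x[i] - b[i]  # partial derivative of f with respect to x_i
            x[i] = soft_threshold(x[i] - step * grad_i, step * lam)
    return x


print(prox_coordinate_descent([3.0, -0.5, -2.0], lam=1.0))  # [2.0, 0.0, -1.0]
```

Because this toy f is separable, one sweep already reaches the exact minimizer soft_threshold(b_i, λ). With a coupled smooth term, such as a least-squares loss, repeated sweeps and step sizes tied to the coordinate-wise Lipschitz constants would be needed.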

Open Category Classification by Adversarial Sample Generation
Yang Yu, Wei-Yang Qu, Nan Li, Zimin Guo
In real-world classification tasks, it is difficult to collect samples of all possible categories in the environment during the training stage. Therefore, the classifier should be prepared for unseen classes. When an instance of an unseen class appears in the prediction stage, a robust classifier should be able to tell that it is unseen, instead of assigning it to a known category. In this paper, adopting the idea of adversarial learning, we propose the ASG framework for open-category classification. ASG generates positive and negative samples of seen categories in an unsupervised manner via an adversarial learning strategy. With the generated samples, ASG then learns to tell seen from unseen in a supervised manner. Experiments performed on several datasets show the effectiveness of ASG.
Wednesday 23 16:30  18:00 MLDLV1  Deep Learning and Vision 1

Fashion Style Generator
Shuhui Jiang, Yun Fu
In this paper, we focus on a new problem: applying artificial intelligence to automatically generate fashion style images. Given a basic clothing image and a fashion style image (e.g., leopard print), we generate a clothing image in that style in real time with a neural fashion style generator. Fashion style generation is related to recent artistic style transfer work, but has its own challenges. The synthetic image should preserve a design similar to the basic clothing while blending the new style pattern onto the clothing. Neither existing global nor patch-based neural style transfer methods solve these challenges well. In this paper, we propose an end-to-end feed-forward neural network that consists of a fashion style generator and a discriminator. The global and patch-based style and content losses calculated by the discriminator are alternately backpropagated through the generator network to optimize it. The global optimization stage preserves the clothing form and design, and the local optimization stage preserves the detailed style pattern. Extensive experiments show that our method outperforms the state of the art.

EigenNet: Towards Fast and Structural Learning of Deep Neural Networks
Luo Ping
Deep Neural Networks (DNNs) are difficult to train and prone to overfitting. We address these two issues by introducing EigenNet, an architecture that not only accelerates training but also adjusts the number of hidden neurons to reduce overfitting. This is achieved by whitening the information flows of DNNs and removing the eigenvectors that may capture noise. The former improves the conditioning of the Fisher information matrix, whilst the latter increases generalization capability. These appealing properties of EigenNet can benefit many recent DNN structures, such as network in network and inception, by wrapping their hidden layers into the layers of EigenNet. The modeling capacities of the original networks are preserved. Compared to stochastic gradient descent, EigenNet reduces both the training wall-clock time and the number of updates on various datasets, including MNIST, CIFAR-10, and CIFAR-100.
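
One standard way to whiten a layer's activations and drop noisy directions is truncated eigendecomposition (PCA whitening). The derivation below is offered as background under that assumption and is not necessarily EigenNet's exact parameterization. Let h be zero-mean hidden activations with covariance Σ = UΛUᵀ, and let U_k and Λ_k keep the k leading eigenvectors and eigenvalues. Since U_kᵀU = [I_k 0]:

```latex
\hat{h} = \Lambda_k^{-1/2} U_k^\top h, \qquad
\operatorname{Cov}(\hat{h})
  = \Lambda_k^{-1/2} U_k^\top \left(U \Lambda U^\top\right) U_k \Lambda_k^{-1/2}
  = \Lambda_k^{-1/2} \Lambda_k \Lambda_k^{-1/2}
  = I_k .
```

Choosing k smaller than the layer width discards the trailing eigenvectors. This is one possible reading of removing eigenvectors that may capture noise.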

DeepFacade: A Deep Learning Approach to Facade Parsing
Hantang Liu, Jialiang Zhang, Jianke Zhu, Steven C. H. Hoi
The parsing of building facades is a key component of 3D street scene reconstruction, a long-standing goal in computer vision. In this paper, we propose a deep learning based method for segmenting a facade into semantic categories. Man-made structures often exhibit symmetry. Based on this observation, we propose a symmetric regularizer for training the neural network. Our proposed method exploits both the power of deep neural networks and the structure of man-made architecture. We also propose a method to refine the segmentation results using bounding boxes generated by the Region Proposal Network. We test our method by training an FCN-8s network with the novel loss function. Experimental results show that our method significantly outperforms previous state-of-the-art methods on both the ECP dataset and the eTRIMS dataset. As far as we know, we are the first to employ an end-to-end deep convolutional neural network at full image scale for building facade parsing.

Training Group Orthogonal Neural Networks with Privileged Information
Yunpeng Chen, Xiaojie Jin, Jiashi Feng, Shuicheng Yan
Learning rich and diverse representations is critical for the performance of deep convolutional neural networks (CNNs). In this paper, we consider how to use privileged information to promote the inherent diversity of a single CNN model so that the model learns better representations and offers stronger generalization ability. To this end, we propose a novel group orthogonal convolutional neural network (GoCNN) that learns untangled representations within each layer by exploiting the provided privileged information, and that effectively enhances representation diversity. We take image classification as an example, where image segmentation annotations are used as privileged information during training. Experiments on two benchmark datasets, ImageNet and PASCAL VOC, clearly demonstrate the strong generalization ability of our proposed GoCNN model. On the ImageNet dataset, GoCNN improves the performance of the state-of-the-art ResNet-152 model by an absolute 1.2% while using privileged information for only 10% of the training images, confirming the effectiveness of GoCNN in utilizing available privileged knowledge to train better CNNs.

Forecast the Plausible Paths in Crowd Scenes
Hang Su, Jun Zhu, Yinpeng Dong, Bo Zhang
Forecasting the plausible future paths of pedestrians in crowd scenes has wide applications, but it remains a challenging task due to the complexity and uncertainty of crowd motion. To address these issues, we propose to explore the inherent crowd dynamics via a social-aware recurrent Gaussian process model, which facilitates path prediction by taking advantage of the interplay between rich prior knowledge and motion uncertainties. Specifically, we derive a social-aware LSTM to explore the crowd dynamics, yielding a hidden feature that embeds the rich prior learned from massive data. We then integrate this descriptor into deep Gaussian processes, with motion uncertainties appropriately harnessed. Crowd motion forecasting is implemented by regressing relative motion against the current positions, yielding predicted paths based on a functional object associated with a distribution. Extensive experiments on public datasets demonstrate that our method obtains state-of-the-art performance in both structured and unstructured scenes by exploring complex and uncertain motion patterns, even when occlusion is serious or the observed trajectories are noisy.

Deep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning
Shanshan Zhao, Xi Li, Omar El Farouk Bourahla
As an important and challenging problem in computer vision, learning-based optical flow estimation aims to discover the intrinsic correspondence structure between two adjacent video frames through statistical learning. Therefore, a key issue in this area is how to effectively model multi-scale correspondence structure properties in an adaptive, end-to-end learning fashion. Motivated by this observation, we propose an end-to-end multi-scale correspondence structure learning (MSCSL) approach for optical flow estimation. In principle, the proposed MSCSL approach can effectively capture multi-scale inter-image-correlation correspondence structures within a multi-level feature space from deep learning. Moreover, the proposed MSCSL approach builds a spatial ConvGRU neural network model to adaptively model the intrinsic dependency relationships among these multi-scale correspondence structures. Finally, the above procedures for correspondence structure learning and multi-scale dependency modeling are implemented in a unified end-to-end deep learning framework. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed approach.
Wednesday 23 16:30  18:00 KRKRL  Knowledge Representation Languages

Discriminative Dictionary Learning With Ranking Metric Embedded for Person Re-Identification
De Cheng, Xiaojun Chang, Li Liu, Alexander G. Hauptmann, Yihong Gong, Nanning Zheng
The goal of person re-identification (Re-Id) is to match pedestrians captured by multiple non-overlapping cameras. In this paper, we propose a novel dictionary learning based method with an embedded ranking metric for person Re-Id. A new and essential ranking graph Laplacian term is introduced into the objective, which promotes intra-personal compactness and inter-personal dispersion. Unlike traditional dictionary learning based approaches and their extensions, which use only same/not-same information, our proposed method explores the ranking relationships among person images, which is essential for such retrieval-related tasks. Simultaneously, a distance metric is explicitly learned in the model to further improve performance. Since we reformulate these ranking constraints into graph Laplacian form, the proposed method is easy to implement yet effective. We conduct extensive experiments on three widely used person Re-Id benchmark datasets and achieve state-of-the-art performance.
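
Graph Laplacian terms of this kind usually rest on a standard identity, shown here as background. For codes x_i stacked as the columns of X, a symmetric weight matrix W, a degree matrix D with D_ii = Σ_j W_ij, and the Laplacian L = D − W:

```latex
\sum_{i,j} W_{ij}\,\lVert x_i - x_j \rVert_2^2
  = 2\sum_i D_{ii}\,\lVert x_i \rVert_2^2 - 2\sum_{i,j} W_{ij}\, x_i^\top x_j
  = 2\operatorname{tr}\!\left(X D X^\top\right) - 2\operatorname{tr}\!\left(X W X^\top\right)
  = 2\operatorname{tr}\!\left(X L X^\top\right).
```

Minimizing tr(XLXᵀ) pulls together codes joined by large positive weights. The abstract does not give how the ranking relationships (same-person versus different-person pairs) are turned into weights, so any particular weighting scheme would be an assumption.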

Knowledge Graph Representation with Jointly Structural and Textual Encoding
Jiacheng Xu, Xipeng Qiu, Kan Chen, Xuanjing Huang
The objective of knowledge graph embedding is to encode both the entities and the relations of knowledge graphs into continuous low-dimensional vector spaces. Previously, most work focused on symbolic representations of knowledge graphs with structural information, which cannot handle new entities or entities with few facts well. In this paper, we propose a novel deep architecture that utilizes both structural and textual information about entities. Specifically, we introduce three neural models to encode the valuable information in an entity's text description, among which an attentive model can select related information as needed. Then, a gating mechanism is applied to integrate the structural and textual representations into a unified architecture. Experiments show that our models outperform the baselines and obtain state-of-the-art results on link prediction and triplet classification tasks.
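
A minimal sketch of an element-wise gate that mixes a structural embedding s and a text-derived embedding d as g ⊙ s + (1 − g) ⊙ d, with g = σ(z) computed per dimension. This specific form, the gate logits z, and the names `gated_combine` and `sigmoid` are assumptions for illustration. The abstract does not specify the gate:

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function mapping a gate logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_combine(structural: List[float], textual: List[float],
                  gate_logits: List[float]) -> List[float]:
    """Per-dimension convex mix: g * s + (1 - g) * d with g = sigmoid(z)."""
    if not (len(structural) == len(textual) == len(gate_logits)):
        raise ValueError("dimension mismatch")
    out = []
    for s, d, z in zip(structural, textual, gate_logits):
        g = sigmoid(z)
        out.append(g * s + (1.0 - g) * d)
    return out


# A neutral gate (logit 0, so g = 0.5) averages the two views.
print(gated_combine([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))  # [2.0, 3.0]
```

With a large positive logit a dimension trusts the structural embedding; with a large negative one it trusts the text. This is how a gate could let entities with few facts lean on their descriptions.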

Context-aware Path Ranking for Knowledge Base Completion
Sahisnu Mazumder, Bing Liu
Knowledge base (KB) completion aims to infer missing facts from existing ones in a KB. Among various approaches, path ranking (PR) algorithms have received increasing attention in recent years. PR algorithms enumerate paths between entity pairs in a KB and use those paths as features to train a model for missing fact prediction. Owing to their good performance and high model interpretability, several such methods have been proposed. However, most existing methods suffer from scalability problems (high RAM consumption) and feature explosion (training on an exponentially large number of features). This paper proposes a Context-aware Path Ranking (CPR) algorithm that solves these problems by introducing a selective path exploration strategy. CPR learns global semantics of entities in the KB using word embeddings and leverages this knowledge of entity semantics to enumerate contextually relevant paths using bidirectional random walk. Experimental results on three large KBs show that the path features discovered by CPR (fewer in number) not only improve predictive performance but are also more interpretable than those of existing baselines.

A Model for Accountable Ordinal Sorting
Khaled Belahcene, Christophe Labreuche, Nicolas Maudet, Vincent Mousseau, Wassila Ouerdane
We address the problem of multicriteria ordinal sorting through the lens of accountability, i.e., the ability of a human decision-maker to own a recommendation made by the system. We put forward a number of model features that would favor the capability to support the recommendation with a convincing explanation. To account for that, we design a recommender system implementing and formalizing such features. This system outputs explanations defined in the form of specific argument schemes tailored to represent the specific rules of the model. Finally, we discuss possible and promising argumentative perspectives.

Relatedness-based Multi-Entity Summarization
Kalpa Gunaratna, Krishnaprasad Thirunarayan, Amit Sheth, Amir Yazdavar, Gong Cheng
Representing world knowledge in a machine-processable format is important as entities and their descriptions have fueled tremendous growth in knowledge-rich information processing platforms, services, and systems. Prominent applications of knowledge graphs include search engines (e.g., Google Search and Microsoft Bing), email clients (e.g., Gmail), and intelligent personal assistants (e.g., Google Now, Amazon Echo, and Apple's Siri). In this paper, we present an approach that can summarize facts about a collection of entities by analyzing their relatedness in preference to summarizing each entity in isolation. Specifically, we generate informative entity summaries by selecting: (i) inter-entity facts that are similar and (ii) intra-entity facts that are important and diverse. We employ a constrained knapsack problem solving approach to efficiently compute entity summaries. We perform both qualitative and quantitative experiments and demonstrate that our approach yields promising results compared to two other stand-alone state-of-the-art entity summarization approaches.
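
As a rough illustration of the knapsack view of summarization, the sketch below solves a plain 0/1 knapsack: each candidate fact has a score and a cost, and we pick the subset with the highest total score whose total cost fits a budget. The helper name, scores and costs are hypothetical, and the paper's additional constraints (similar inter-entity facts, diverse intra-entity facts) are deliberately left out.

```python
def select_facts(facts, budget):
    """Pick (name, score, cost) facts maximizing total score with total cost <= budget.

    Plain 0/1 knapsack dynamic programming over integer costs; a hypothetical
    illustration, not the paper's constrained formulation.
    """
    # best[c] = (best total score, chosen fact names) using capacity at most c
    best = [(0.0, [])] * (budget + 1)
    for name, score, cost in facts:
        # Iterate capacities downward so that each fact is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = best[c - cost][0] + score
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + [name])
    return best[budget]
```

For example, with facts `("type", 3.0, 1)`, `("birthPlace", 2.0, 1)` and `("award", 4.0, 2)` and a budget of 2, two cheap facts (total score 5.0) beat the single expensive one (4.0).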

A Reasoning System for a First-Order Logic of Limited Belief
Christoph Schwering
Logics of limited belief aim at enabling computationally feasible reasoning in highly expressive representation languages. These languages are often dialects of first-order logic with a weaker form of logical entailment that keeps reasoning decidable or even tractable. While a number of such logics have been proposed in the past, they tend to remain objects of theoretical analysis only, and their practical relevance is very limited. In this paper, we aim to go beyond the theory. Building on earlier work by Liu, Lakemeyer, and Levesque, we develop a logic of limited belief that is highly expressive but remains decidable in the first-order case and tractable in the propositional case, and that exhibits some characteristics that make it attractive for an implementation. We introduce a reasoning system that employs this logic as its representation language and present experimental results that showcase the benefit of limited belief.
Wednesday 23 16:30-18:00 MTSP2 - Security and Privacy 2

When Security Games Hit Traffic: Optimal Traffic Enforcement Under One-Sided Uncertainty
Ariel Rosenfeld, Sarit Kraus
Efficient traffic enforcement is an essential, yet complex, component in preventing road accidents. In this paper, we present a novel model and an optimizing algorithm for mitigating some of the computational challenges of real-world traffic enforcement allocation in large road networks. Our approach allows for scalable, coupled and non-Markovian optimization of multiple police units and guarantees optimality. In an extensive empirical evaluation using both synthetic and real-world road networks, we show that our approach compares favorably to several baseline solutions, achieving a significant speedup.

A Convolutional Approach for Misinformation Identification
Feng Yu, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan
The rapid expansion of social media fuels the spread of misinformation, which disrupts people's normal lives. Misinformation identification and early detection in social media are therefore urgent goals. In dynamic and complicated social media scenarios, some conventional methods concentrate mainly on feature engineering; they fail to capture potential features in new scenarios and have difficulty shaping elaborate high-level interactions among significant features. Moreover, a recent Recurrent Neural Network (RNN) based method has two deficiencies: it is not suited to practical early detection of misinformation, and it is biased toward the latest input. In this paper, we propose a novel method, Convolutional Approach for Misinformation Identification (CAMI), based on Convolutional Neural Networks (CNN). CAMI can flexibly extract key features scattered among an input sequence and shape high-level interactions among significant features, which help effectively identify misinformation and achieve practical early detection. Experimental results on two large-scale datasets validate the effectiveness of the CAMI model on both misinformation identification and early detection tasks.

Optimal Escape Interdiction on Transportation Networks
Youzhi Zhang, Bo An, Long Tran-Thanh, Zhen Wang, Jiarui Gan, Nicholas R. Jennings
Preventing crimes or terrorist attacks in urban areas is challenging. Law enforcement officers need to respond quickly to catch the attacker on his escape route, which is subject to time-dependent traffic conditions on transportation networks. The attacker can strategically choose his escape path and driving speed to avoid being captured. Existing work on security resource allocation has not considered such scenarios with time-dependent strategies for both players. Therefore, in this paper, we study the problem of efficiently scheduling security resources for interdicting the escaping attacker. We propose: 1) a new defender-attacker security game model for escape interdiction on transportation networks; and 2) an efficient double oracle algorithm to compute the optimal defender strategy, which combines mixed-integer linear programming formulations for best response problems and effective approximation algorithms for improving the scalability of the algorithms. Experimental evaluation shows that our approach significantly outperforms baselines in solution quality and scales up to realistic-sized transportation networks with hundreds of intersections.

A Trust-based Mixture of Gaussian Processes Model for Reliable Regression in Participatory Sensing
Qikun Xiang, Ido Nevat, Pengfei Zhang, Jie Zhang
Data trustworthiness is a crucial issue in real-world participatory sensing applications. Without considering this issue, different types of worker misbehavior, especially the challenging collusion attacks, can result in biased and inaccurate estimation and decision making. We propose a novel trust-based mixture of Gaussian processes (GP) model for spatial regression to jointly detect such misbehavior and accurately estimate the spatial field. We develop a Markov chain Monte Carlo (MCMC)-based algorithm to efficiently perform Bayesian inference of the model. Experiments using two real-world datasets show the superior robustness of our model compared with existing approaches.

A Group-Based Personalized Model for Image Privacy Classification and Labeling
Haoti Zhong, Anna Squicciarini, David Miller, Cornelia Caragea
Machine labeling of image content as private or public is a notoriously difficult problem, with the usual image processing challenges compounded by the highly personal, subjective, and contextual nature of access control decision making. In general, a user's privacy expectation for a given image depends on the specific contents therein as well as on the user's personal characteristics and privacy awareness. In this paper, we propose a stochastic Group-Based Personalized Model for image privacy classification in online social media sites. Our model relies on the concept of privacy groups, each of which models a subset of users, and treats each user's group membership as a latent variable. Our experimental results show that our model performs well regardless of the amount of data used for training, and consistently outperforms several baselines, including the obvious approach of a separate personalized model for each user.

Efficient Label Contamination Attacks Against Black-Box Learning Models
Mengchen Zhao, Bo An, Wei Gao, Teng Zhang
Label contamination attack (LCA) is an important type of data poisoning attack where an attacker manipulates the labels of training data to make the learned model beneficial to him. Existing work on LCA assumes that the attacker has full knowledge of the victim learning model, whereas the victim model is usually a black box to the attacker. In this paper, we develop a Projected Gradient Ascent (PGA) algorithm to compute LCAs on a family of empirical risk minimization models and show that an attack on one victim model can also be effective on other victim models. This makes it possible for the attacker to design an attack against a substitute model and transfer it to a black-box victim model. Based on this observation of transferability, we develop a defense algorithm to identify the data points that are most likely to be attacked. Empirical studies show that PGA significantly outperforms existing baselines and that linear learning models are better substitute models than nonlinear ones.
Wednesday 23 16:30-18:00 MASEPSC - Economic Paradigms and Social Choice

Computing an Approximately Optimal Agreeable Set of Items
Pasin Manurangsi, Warut Suksompong
We study the problem of finding a small subset of items that is agreeable to all agents, meaning that all agents value the subset at least as much as its complement. Previous work has shown worst-case bounds, over all instances with a given number of agents and items, on the number of items that may need to be included in such a subset. Our goal in this paper is to efficiently compute an agreeable subset whose size approximates the size of the smallest agreeable subset for a given instance. We consider three well-known models for representing the preferences of the agents: ordinal preferences on single items, the value oracle model, and additive utilities. In each of these models, we establish virtually tight bounds on the approximation ratio that can be obtained by algorithms running in polynomial time.
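
Under additive utilities, the agreeability condition is easy to state in code. The sketch below checks whether a given subset is agreeable and, for tiny instances only, finds a smallest agreeable subset by exhaustive search. The helper names and the list-of-lists utility layout are hypothetical; the search is exponential and is not one of the paper's polynomial-time approximation algorithms.

```python
from itertools import combinations

def is_agreeable(subset, utilities):
    """True if every agent values `subset` at least as much as its complement.

    utilities[i][j] is agent i's (non-negative, additive) value for item j.
    """
    chosen = set(subset)
    for values in utilities:
        inside = sum(v for j, v in enumerate(values) if j in chosen)
        outside = sum(v for j, v in enumerate(values) if j not in chosen)
        if inside < outside:
            return False
    return True

def smallest_agreeable(utilities):
    """Exhaustive search by increasing size; only feasible for a handful of items."""
    m = len(utilities[0])
    for k in range(m + 1):
        for subset in combinations(range(m), k):
            if is_agreeable(subset, utilities):
                return set(subset)
    return None  # unreachable with non-negative values: the full set is always agreeable
```

For two agents with values `[5, 1, 1, 1]` and `[1, 1, 1, 4]`, no single item satisfies both, but items 0 and 3 together do.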

Recognizing Top-Monotonic Preference Profiles in Polynomial Time
Krzysztof Magiera, Piotr Faliszewski
We provide the first polynomial-time algorithm for recognizing whether a profile of (possibly weak) preference orders is top-monotonic. Top-monotonicity is a generalization of the notions of single-peakedness and single-crossingness, defined by Barberà and Moreno. Top-monotonic profiles always have weak Condorcet winners and satisfy a variant of the median voter theorem. Our algorithm proceeds by reducing the recognition problem to the SAT-2CNF problem.
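
The abstract does not spell out the reduction itself, so the sketch below only illustrates the target problem: the standard linear-time test for 2-CNF satisfiability. It builds the implication graph and computes strongly connected components with Kosaraju's algorithm. The literal encoding (`+i` for x_i, `-i` for its negation) is an assumption of this sketch.

```python
def two_sat(n, clauses):
    """Decide satisfiability of a 2-CNF formula over variables 1..n.

    Each clause is a pair of nonzero ints: +i means x_i, -i means (not x_i).
    Returns a satisfying assignment {i: bool}, or None if unsatisfiable.
    """
    def node(lit):
        # x_i -> 2*(i-1), (not x_i) -> 2*(i-1)+1
        return 2 * (abs(lit) - 1) + (lit < 0)

    size = 2 * n
    graph = [[] for _ in range(size)]
    rgraph = [[] for _ in range(size)]
    for a, b in clauses:
        # (a or b) is equivalent to (not a -> b) and (not b -> a)
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Kosaraju pass 1: iterative DFS recording vertices in finishing order.
    order, seen = [], [False] * size
    for s in range(size):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop()
                order.append(u)
            elif not seen[nxt]:
                seen[nxt] = True
                stack.append((nxt, iter(graph[nxt])))

    # Pass 2: label components on the reversed graph in decreasing finishing time.
    comp, label = [-1] * size, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    assignment = {}
    for i in range(1, n + 1):
        pos, neg = comp[node(i)], comp[node(-i)]
        if pos == neg:
            return None  # x_i and (not x_i) imply each other: contradiction
        # Labels follow the topological order of the condensation, so a literal is
        # made true when its component comes later than its negation's.
        assignment[i] = pos > neg
    return assignment
```

For instance, `(x1 or x2), (not x1 or x2), (x1 or not x2)` forces both variables to be true, while `(x1 or x1), (not x1 or not x1)` is unsatisfiable.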

Proportional Rankings
Piotr Skowron, Martin Lackner, Edith Elkind, Markus Brill, Dominik Peters
We extend the principle of proportional representation to rankings: given approval preferences, we aim to generate aggregate rankings so that cohesive groups of voters are represented proportionally in each initial segment of the ranking. Such rankings are desirable in situations where initial segments of different lengths may be relevant, e.g., in recommender systems, for hiring decisions, or for the presentation of competing proposals on a liquid democracy platform. We define what it means for rankings to be proportional, provide bounds for well-known aggregation rules, and experimentally evaluate the performance of these rules.

Manipulating Gale-Shapley Algorithm: Preserving Stability and Remaining Inconspicuous
Rohit Vaish, Dinesh Garg
We study the problem of manipulation of the men-proposing Gale-Shapley algorithm by a single woman via permutation of her true preference list. Our contribution is threefold: First, we show that the matching induced by an optimal manipulation is stable with respect to the true preferences. Second, we identify a class of optimal manipulations called inconspicuous manipulations which, in addition to preserving stability, are also nearly identical to the true preference list of the manipulator (making the manipulation hard to detect). Third, for optimal inconspicuous manipulations, we strengthen the stability result by showing that the entire stable lattice of the manipulated instance is contained inside the original lattice.
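
For reference, here is a minimal sketch of the men-proposing Gale-Shapley (deferred acceptance) algorithm being manipulated, assuming equal numbers of men and women and complete, strict preference lists. The dictionary-based input format is an assumption of this sketch. A manipulating woman would simply submit a permuted list as her entry in `women_prefs`.

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs):
    """Men-proposing deferred acceptance with complete, strict preferences.

    men_prefs[m] and women_prefs[w] list partners from most to least preferred.
    Returns a dict mapping each woman to the man she is matched with.
    """
    # rank[w][m] = position of m in w's list (lower means more preferred)
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman m proposes to
    engaged_to = {}  # woman -> man
    free_men = deque(men_prefs)
    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged_to.get(w)
        if current is None:
            engaged_to[w] = m  # w was unmatched and tentatively accepts m
        elif rank[w][m] < rank[w][current]:
            engaged_to[w] = m  # w trades up; her previous partner becomes free
            free_men.append(current)
        else:
            free_men.append(m)  # w rejects m; he proposes to his next choice later
    return engaged_to
```

When both men rank X first and X prefers B, B ends up with X and A falls back to Y.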

Fair Division of a Graph
Sylvain Bouveret, Katarína Cechlárová, Edith Elkind, Ayumi Igarashi, Dominik Peters
We consider fair allocation of indivisible items under an additional constraint: there is an undirected graph describing the relationship between the items, and each agent's share must form a connected subgraph of this graph. This framework captures, e.g., fair allocation of land plots, where the graph describes the accessibility relation among the plots. We focus on agents that have additive utilities for the items, and consider several common fair division solution concepts, such as proportionality, envy-freeness and maximin share guarantee. While finding good allocations according to these solution concepts is computationally hard in general, we design efficient algorithms for special cases where the underlying graph has simple structure, and/or the number of agents (or, less restrictively, the number of agent types) is small. In particular, despite non-existence results in the general case, we prove that for acyclic graphs a maximin share allocation always exists and can be found efficiently.
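
Both requirements of this model, connected bundles and a fairness criterion, can be checked directly for a given allocation. The sketch below tests connectivity with breadth-first search and checks proportionality (each of the n agents receives at least 1/n of their total value). The function names and input layout are hypothetical. The code only verifies an allocation and does not compute one, and an empty bundle is treated as connected here by convention.

```python
from collections import deque

def is_connected(bundle, adjacency):
    """True if the items in `bundle` induce a connected subgraph of the item graph."""
    bundle = set(bundle)
    if not bundle:
        return True  # convention used in this sketch
    start = next(iter(bundle))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v in bundle and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == bundle

def is_connected_proportional(allocation, adjacency, utilities):
    """allocation[i] is agent i's bundle; utilities[i][item] is an additive value."""
    n = len(allocation)
    for i, bundle in enumerate(allocation):
        if not is_connected(bundle, adjacency):
            return False
        total = sum(utilities[i].values())
        # Proportionality: value(bundle) >= total / n, compared without division.
        if n * sum(utilities[i][g] for g in bundle) < total:
            return False
    return True
```

On a path of plots a-b-c-d, giving {a, b} and {c, d} to two agents yields connected bundles, whereas {a, c} is disconnected and is rejected regardless of utilities.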

Mechanisms for Online Organ Matching
Nicholas Mattei, Abdallah Saffidine, Toby Walsh
Matching donations from deceased patients to patients on the waiting list accounts for over 85% of all kidney transplants performed in Australia. We propose a simple mechanism to perform this matching and compare this new mechanism with the more complex algorithm currently under consideration by the Organ and Tissue Authority in Australia. We perform a number of experiments using real-world data provided by the Organ and Tissue Authority of Australia. We find that our simple mechanism is more efficient and fairer in practice than the other mechanism currently under consideration.
Wednesday 23 16:30-18:00 MLDMSS - Data Mining and Social Sciences

A Robust Noise Resistant Algorithm for POI Identification from Flickr Data
Yiyang Yang, Zhiguo Gong, Qing Li, Leong Hou U, Ruichu Cai, Zhifeng Hao
Point of Interest (POI) identification using social media data (e.g., Flickr, Microblog) is one of the most popular research topics in recent years. However, there exist large amounts of noise (POI-irrelevant data) in such crowd-contributed collections. The traditional solution to this problem is to set a global density threshold and remove a data point as noise if its density is lower than the threshold. However, density values vary significantly among POIs. As a result, some POIs with relatively low density cannot be identified. To solve this problem, we propose a technique based on local drastic changes of the data density. First, we define the local maxima of the density function as the urban POIs, and the gradient ascent algorithm is exploited to assign data points to different clusters. To remove noise, we incorporate the Laplacian zero-crossing points along the gradient ascent process as the boundaries of the POI. Points located outside the POI region are regarded as noise. Then the technique is extended into the geographical and textual joint space so that it can make use of the heterogeneous features of social media. The experimental results show the significance of the proposed approach in removing noise.

Learning Concise Representations of Users' Influences through Online Behaviors
Shenghua Liu, Houdong Zheng, Huawei Shen, Xueqi Cheng, Xiangwen Liao
Whereas it is well known that social network users influence each other, a fundamental problem in influence maximization, opinion formation and viral marketing is that users' influences are difficult to quantify. Previous work has directly defined an independent model parameter to capture the interpersonal influence between each pair of users. However, such models do not consider how influences depend on each other if they originate from the same user or if they act on the same user. In addition, these models need a parameter for each pair of users, which results in high-dimensional models that are prone to overfitting. Given these problems, another way of defining the parameters is needed to account for the dependencies. Thus we propose a model that defines parameters for every user with a latent influence vector and a susceptibility vector. Such low-dimensional and distributed representations naturally cause the interpersonal influences involving the same user to be coupled with each other, thus reducing the model's complexity. Additionally, the model can easily consider the sentimental polarities of users' messages and how sentiment affects users' influences. In this study, we conduct extensive experiments on real Microblog data, showing that our model with distributed representations achieves better accuracy than the state-of-the-art and pairwise models, and that learning influences on sentiments benefits performance.

TransNet: Translation-Based Network Representation Learning for Social Relation Extraction
Cunchao Tu, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
Conventional network representation learning (NRL) models learn low-dimensional vertex representations by simply regarding each edge as a binary or continuous value. However, there exists rich semantic information on edges, and the interactions between vertices usually carry distinct meanings, which are largely neglected by most existing NRL models. In this work, we present a novel translation-based NRL model, TransNet, by regarding the interactions between vertices as a translation operation. Moreover, we formalize the task of Social Relation Extraction (SRE) to evaluate the capability of NRL methods on modeling the relations between vertices. Experimental results on SRE demonstrate that TransNet significantly outperforms other baseline methods by 10% to 20% on hits@1. The source code and datasets can be obtained from https://github.com/thunlp/TransNet.
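
The translation idea can be illustrated with a TransE-style score: an edge from vertex u to vertex v carrying relation vector l is plausible when u + l lies close to v. The sketch below computes that distance in plain Python and picks the best candidate relation, in the spirit of hits@1. The vectors and relation names are toy assumptions, and how TransNet actually builds the relation vector for an edge is not shown here.

```python
import math

def translation_distance(u, l, v, norm=1):
    """||u + l - v|| under the L1 (norm=1) or L2 (norm=2) norm; smaller is more plausible."""
    diffs = [a + b - c for a, b, c in zip(u, l, v)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))

def predict_relation(u, v, candidates, norm=1):
    """Return the name of the candidate relation vector that best translates u to v."""
    return min(candidates, key=lambda name: translation_distance(u, candidates[name], v, norm))
```

With u = [0, 1], v = [1, 1] and candidates colleague = [1, 0] and friend = [0, 1], "colleague" translates u exactly onto v (distance 0), while "friend" is at L1 distance 2.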

Accelerated Local Anomaly Detection via Resolving Attributed Networks
Ninghao Liu, Xiao Huang, Xia Hu
Attributed networks, in which network connectivity and node attributes are available, have been increasingly used to model real-world information systems, such as social media and e-commerce platforms. While outlier detection has been extensively studied to identify anomalies that deviate from a chosen background, existing algorithms cannot be directly applied to attributed networks due to the heterogeneous types of information and the scale of real-world data. Meanwhile, it has been observed that local anomalies, which may align with the global condition, are hard for existing algorithms to detect in an interpretable way. Motivated by these observations, in this paper, we propose to study the problem of effective and efficient local anomaly detection in attributed networks. In particular, we design a collective way of modeling heterogeneous network and attribute information, and develop a novel and efficient distributed optimization algorithm to handle large-scale data. In the experiments, we compare the proposed framework with state-of-the-art methods on both real and synthetic datasets, and demonstrate its effectiveness and efficiency through quantitative evaluation and case studies.

ContextCare: Incorporating Contextual Information Networks to Representation Learning on Medical Forum Data
Stan Zhao, Meng Jiang, Quan Yuan, Bing Qin, Ting Liu, ChengXiang Zhai
Online users have generated a large amount of health-related data on medical forums and search engines. However, exploiting these rich data for guiding patients online and assisting medical checkups offline is nontrivial due to the sparseness of existing symptom-disease links, which is caused by the natural and chatty expressions of symptoms. In this paper, we propose ContextCare, a novel and general representation learning method for human-generated health-related data, which learns the latent relationship between symptoms and diseases from the symptom-disease diagnosis network for disease prediction, disease category prediction and disease clustering. To alleviate the network sparseness, ContextCare adopts regularizations from rich contextual information networks, including a symptom co-occurrence network and a disease evolution network. Therefore, our representations of symptoms and diseases incorporate knowledge from these three networks. Extensive experiments on medical forum data demonstrate that ContextCare outperforms the state-of-the-art methods in disease category prediction, disease prediction and disease clustering.

SPMC: Socially-Aware Personalized Markov Chains for Sparse Sequential Recommendation
Chenwei Cai, Ruining He, Julian McAuley
Dealing with sparse, long-tailed datasets and cold-start problems is always a challenge for recommender systems. These issues can partly be dealt with by making predictions not in isolation, but by leveraging information from related events; such information could include signals from social relationships or from the sequence of recent activities. Both types of additional information can be used to improve the performance of state-of-the-art matrix factorization-based techniques. In this paper, we propose new methods to combine both social and sequential information simultaneously, in order to further improve recommendation performance. We show these techniques to be particularly effective when dealing with sparsity and cold-start issues in several large, real-world datasets.
Wednesday 23 16:30-18:00 MLSSL3 - Semi-Supervised Learning 3

Semi-supervised Max-margin Topic Model with Manifold Posterior Regularization
Wenbo Hu, Jun Zhu, Hang Su, Jingwei Zhuo, Bo Zhang
Supervised topic models leverage label information to learn discriminative latent topic representations. As collecting a fully labeled dataset is often time-consuming, semi-supervised learning is of high interest. In this paper, we present an effective semi-supervised max-margin topic model, named LapMedLDA, by naturally introducing manifold posterior regularization to a regularized Bayesian topic model. The model jointly learns latent topics and a related classifier with only a small fraction of labeled documents. To perform the approximate inference, we derive an efficient stochastic gradient MCMC method. Unlike previous semi-supervised topic models, our model adopts a tight coupling between the generative topic model and the discriminative classifier. Extensive experiments demonstrate that such tight coupling brings significant benefits in quantitative and qualitative performance.

Learning deep structured network for weakly supervised change detection
Salman Khan, Xuming He, Fatih Porikli, Ferdous Sohel, Roberto Togneri, Mohammed Bennamoun
Conventional change detection methods require a large number of images to learn background models or depend on tedious pixel-level labeling by humans. In this paper, we present a weakly supervised approach that needs only image-level labels to simultaneously detect and localize changes in a pair of images. To this end, we employ a deep neural network with DAG topology to learn patterns of change from image-level labeled training data. On top of the initial CNN activations, we define a CRF model to incorporate the local differences and context with the dense connections between individual pixels. We apply a constrained mean-field algorithm to estimate the pixel-level labels, and use the estimated labels to update the parameters of the CNN in an iterative EM framework. This enables imposing global constraints on the observed foreground probability mass function. Our evaluations on four benchmark datasets demonstrate superior detection and localization performance.

Using Graphs of Classifiers to Impose Declarative Constraints on Semi-supervised Learning
Lidong Bing, William Cohen, Bhuwan Dhingra
We propose a general approach to modeling semi-supervised learning (SSL) algorithms. Specifically, we present a declarative language for modeling both traditional supervised classification tasks and many SSL heuristics, including both well-known heuristics such as co-training and novel domain-specific heuristics. In addition to representing individual SSL heuristics, we show that multiple heuristics can be automatically combined using Bayesian optimization methods. We experiment with two classes of tasks, link-based text classification and relation extraction. We show modest improvements on well-studied link-based classification benchmarks, and state-of-the-art results on relation-extraction tasks for two realistic domains.

Incomplete Attribute Learning with auxiliary labels
Kongming Liang, Yuhong Guo, Hong Chang, Xilin Chen
Visual attribute learning is a fundamental and challenging problem for image understanding. Considering the huge semantic space of attributes, it is economically infeasible to annotate the presence or absence of all of them for a natural image via crowdsourcing. In this paper, we tackle the incomplete nature of visual attributes by introducing auxiliary labels into a novel transductive learning framework. By jointly predicting the attributes from the input images and modeling the relationship of attributes and auxiliary labels, the missing attributes can be recovered effectively. In addition, the proposed model can be solved efficiently in an alternating way by optimizing quadratic programming problems and updating parameters in closed-form solutions. Moreover, we propose and investigate different methods for acquiring auxiliary labels. We conduct experiments on three widely used attribute prediction datasets. The experimental results show that our proposed method can achieve state-of-the-art performance with access to partially observed attribute annotations.

Decreasing Uncertainty in Planning with State Prediction
Senka Krivic, Michael Cashmore, Daniele Magazzeni, Bram Ridder, Sandor Szedmak, Justus Piater
In real-world environments the state is almost never completely known. Exploration is often expensive. The application of planning in these environments is consequently more difficult and less robust. In this paper we present an approach for predicting new information about a partially-known state. The state is translated into a partially-known multigraph, which can then be extended using machine-learning techniques. We demonstrate the effectiveness of our approach, showing that it enhances the scalability of our planners, and leads to less time spent on sensing actions.

Semi-supervised Learning over Heterogeneous Information Networks by Ensemble of Meta-graph Guided Random Walks
He Jiang, Yangqiu Song, Chenguang Wang, Ming Zhang, Yizhou Sun
Heterogeneous information networks (HINs) are a general representation of many real-world applications. The difference between HINs and traditional homogeneous graphs is that the nodes and edges in HINs are typed. Thus, in many applications, we need to consider the types to make the approach more semantically meaningful. For applications where annotation is expensive, one natural way is to consider semi-supervised learning over HINs. In this paper, we present a semi-supervised learning algorithm constrained by the types of HINs. We first decompose the original HIN into several semantically meaningful subgraphs based on the meta-graphs composed of entity and relation types. Then we perform random walks over the subgraphs to propagate the labels from labeled data to unlabeled data. After we obtain all the labels propagated by different trials of random walks guided by meta-graphs, we use an ensemble algorithm to vote for the final labeling results. We use two publicly available datasets, 20newsgroups and RCV1, to test our algorithm. Experimental results show that our algorithm is better than the traditional semi-supervised learning algorithms for HINs. One particular by-product of this work is that we show that the previous random walk approach guided by meta-paths can be non-stationary, which is the major reason we propose a meta-graph guided random walk for semi-supervised learning over HINs.
Wednesday 23 16:30-18:00 AUTTEC - AI & Autonomy: Technical issues

Online Decision-Making for Scalable Autonomous Systems
Kyle Wray, Stefan Witwicki, Shlomo Zilberstein
We present a general formal model called MODIA that can tackle a central challenge for autonomous vehicles (AVs), namely the ability to interact with an unspecified, large number of world entities. In MODIA, a collection of possible decision-problems (DPs), known a priori, are instantiated online and executed as decision-components (DCs), unknown a priori. To combine the individual action recommendations of the DCs into a single action, we propose the lexicographic executor action function (LEAF) mechanism. We analyze the complexity of MODIA and establish LEAF’s relation to regret minimization. Finally, we implement MODIA and LEAF using collections of partially observable Markov decision process (POMDP) DPs, and use them for complex AV intersection decision-making. We evaluate the approach in six scenarios within an industry-standard vehicle simulator, and present its use on an AV prototype.

Reinforcement learning with corrupted reward channel
Tom Everitt, Victoria Krakovna, Laurent Orseau, Shane Legg
No real-world reward function is perfect. Sensory errors and software bugs may result in agents getting higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.

Achieving Coordination in Multi-Agent Systems by Stable Local Conventions under Community Networks
Shuyue Hu, Ho-fung Leung
Recently, the study of social conventions has attracted much attention in the literature. We notice that an interesting type of phenomenon, the local convention phenomenon, may also exist in certain multi-agent systems: when agents are partitioned into compact communities, different local conventions emerge in different communities. In this paper, we provide a definition for local conventions, and propose two metrics measuring their strength and diversity. In our experimental study, we show that agents can achieve coordination via establishing diverse stable local conventions, which indicates a practical way to solve coordination problems other than the traditional global convention emergence. Moreover, we find that with smaller community sizes, denser connections and fewer available actions, diverse local conventions emerge in shorter time.

A Goal Reasoning Agent for Controlling UAVs in Beyond-Visual-Range Air Combat
Michael W. Floyd, Justin Karneeb, Philip Moore, David W. Aha
We describe the Tactical Battle Manager (TBM), an intelligent agent that uses several integrated artificial intelligence techniques to control an autonomous unmanned aerial vehicle in simulated beyond-visual-range (BVR) air combat scenarios. The TBM incorporates goal reasoning, automated planning, opponent behavior recognition, state prediction, and discrepancy detection to operate in a real-time, dynamic, uncertain, and adversarial environment. We describe evidence from our empirical study that the TBM significantly outperforms an expert-scripted agent in BVR scenarios. We also report the results of an ablation study which indicates that all components of our agent architecture are needed to maximize mission performance.
Wednesday 23 16:30-18:30 Competition Angry Birds
Wednesday 23 16:30-18:30 Competition ANAC
Wednesday 23 16:30-18:30 JOUMISC - Journal Track: Search, Planning, Uncertainty and applications

Local Search for Minimum Weight Dominating Set with Two-Level Configuration Checking and Frequency Based Scoring Function (Extended Abstract)
Yiyuan Wang, Shaowei Cai, Minghao Yin
The Minimum Weight Dominating Set (MWDS) problem is an important generalization of the Minimum Dominating Set (MDS) problem with extensive applications. This paper proposes a new local search algorithm for the MWDS problem, based on two new ideas. The first idea is a heuristic called two-level configuration checking (CC2), a new variant of the recent and powerful configuration checking strategy (CC) that effectively avoids revisiting recent search paths. The second idea is a novel scoring function based on how frequently each vertex has been left uncovered. Our algorithm is called CC2FS, after the names of the two ideas. The experimental results show that CC2FS performs much better than several state-of-the-art algorithms in terms of solution quality on a broad range of MWDS benchmarks.
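To make the problem setting concrete, here is a minimal Python sketch: it checks whether a vertex set dominates a graph, computes the set's total weight, and scores a candidate vertex by a frequency-weighted coverage gain. The scoring rule and all function names are assumptions written in the spirit of a frequency-based score; they are not the exact CC2FS definitions, and the two-level configuration checking rule is omitted.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency list: vertex -> set of neighbours


def closed_neighbourhood(graph: Graph, v: int) -> Set[int]:
    """A vertex covers (dominates) itself and all of its neighbours."""
    return {v} | graph[v]


def covered_by(graph: Graph, candidate: Set[int]) -> Set[int]:
    """All vertices dominated by the candidate set."""
    covered: Set[int] = set()
    for v in candidate:
        covered |= closed_neighbourhood(graph, v)
    return covered


def is_dominating(graph: Graph, candidate: Set[int]) -> bool:
    """True if every vertex is in the candidate set or adjacent to it."""
    return covered_by(graph, candidate) == set(graph)


def total_weight(weights: Dict[int, float], candidate: Set[int]) -> float:
    """MWDS objective: the summed weight of the chosen vertices."""
    return sum(weights[v] for v in candidate)


def frequency_score(graph: Graph, weights: Dict[int, float], freq: Dict[int, float],
                    candidate: Set[int], v: int) -> float:
    """Hypothetical score for adding v: the frequency weights of the vertices it
    would newly cover, divided by v's own weight. In a local search, freq[u]
    would grow while u stays uncovered, pulling the search towards it."""
    newly_covered = closed_neighbourhood(graph, v) - covered_by(graph, candidate)
    return sum(freq[u] for u in newly_covered) / weights[v]


# Toy example: the path 0 - 1 - 2 with a heavy middle vertex.
path: Graph = {0: {1}, 1: {0, 2}, 2: {1}}
weights = {0: 1.0, 1: 3.0, 2: 1.0}
freq = {0: 1.0, 1: 1.0, 2: 1.0}
```

On this toy path, {1} alone dominates the graph with weight 3, while {0, 2} is the minimum weight dominating set with weight 2.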

Efficient Mechanism Design for Online Scheduling (Extended Abstract)
Weidong Ma, Xujin Chen, Xiaodong Hu, Tie-Yan Liu, Tao Qin, Pingzhong Tang, Changjun Wang, Bo Zheng
This work concerns mechanism design for online scheduling in a strategic setting. In this setting, each job is owned by a self-interested agent who may misreport the release time, deadline, length, and value of her job, while we need to determine not only the schedule of the jobs but also the payment of each agent. We focus on the design of incentive compatible (IC) mechanisms and study the maximization of social welfare (i.e., the aggregated value of completed jobs) by competitive analysis. We first derive two lower bounds on the competitive ratio of any deterministic IC mechanism to characterize the landscape of our research: one bound is 5, which holds for equal-length jobs; the other bound is $\frac{\kappa}{\ln\kappa}+1-o(1)$, which holds for unequal-length jobs, where $\kappa$ is the maximum ratio between the lengths of any two jobs. We then propose a deterministic IC mechanism and show that this simple mechanism works very well for two models: (1) in the preemption-restart model, the mechanism achieves the optimal competitive ratio of 5 for equal-length jobs and a near-optimal ratio of $(\frac{1}{(1-\epsilon)^2}+o(1)) \frac{\kappa}{\ln\kappa}$ for unequal-length jobs, where $0<\epsilon<1$ is a small constant; (2) in the preemption-resume model, the mechanism achieves the optimal competitive ratio of 5 for equal-length jobs and a near-optimal competitive ratio (within a factor of 2) for unequal-length jobs.

Some Properties of Batch Value of Information in the Selection Problem (Extended Abstract)
Shahaf S. Shperberg, Solomon Eyal Shimony
We examine theoretical properties of value of information (VOI) in the selection problem, and identify cases of submodularity and supermodularity. We use these properties to compute approximately optimal measurement batch policies, implemented on a “wine selection problem” example.
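Submodularity has a standard definition that can be checked by brute force on small ground sets: for all A ⊆ B and x ∉ B, adding x to A must gain at least as much as adding x to B (supermodularity reverses the inequality). The sketch below implements that check for an arbitrary set function; the coverage and squared-size functions are toy examples chosen for illustration, not the paper's VOI computation.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Iterator

SetFn = Callable[[FrozenSet], float]


def subsets(items: Iterable) -> Iterator[FrozenSet]:
    """Enumerate every subset of `items`, smallest first."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(f: SetFn, ground: Iterable, tol: float = 1e-12) -> bool:
    """Check f(A | {x}) - f(A) >= f(B | {x}) - f(B) for all A <= B, x not in B."""
    ground = frozenset(ground)
    all_sets = list(subsets(ground))
    for b in all_sets:
        for a in all_sets:
            if not a <= b:
                continue
            for x in ground - b:
                gain_a = f(a | {x}) - f(a)
                gain_b = f(b | {x}) - f(b)
                if gain_a < gain_b - tol:
                    return False
    return True


def is_supermodular(f: SetFn, ground: Iterable, tol: float = 1e-12) -> bool:
    """f is supermodular exactly when -f is submodular."""
    return is_submodular(lambda s: -f(s), ground, tol)


# Toy set functions (illustrative only).
coverage_sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}


def coverage(s: FrozenSet) -> float:
    """Number of distinct elements covered by the chosen sets."""
    return len(set().union(*(coverage_sets[i] for i in s)))


def squared_size(s: FrozenSet) -> float:
    return len(s) ** 2
```

Brute force enumerates all pairs of subsets, so it is only practical for a handful of items; it is a way to sanity-check a property, not a way to exploit it.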

A generic approach to planning in the presence of incomplete information: Theory and implementation
Son Thanh To, Tran Cao Son, Enrico Pontelli
This paper proposes a generic approach to planning in the presence of incomplete information. The approach builds on an abstract notion of a belief state representation, along with an associated set of basic operations. These operations facilitate the development of a sound and complete transition function, for reasoning about effects of actions in the presence of incomplete information, and a set of abstract algorithms for planning. The paper demonstrates how the abstract definitions and algorithms can be instantiated in three concrete representations (minimal-DNF, minimal-CNF, and prime implicates), resulting in three highly competitive conformant planners: DNF, CNF, and PIP. The paper relates the notion of a representation to that of ordered binary decision diagrams, a well-known belief state representation employed by many conformant planners, and several target compilation languages that have been presented in the literature. The paper also includes an experimental evaluation of the planners DNF, CNF, and PIP and proposes a new set of conformant planning benchmarks that are challenging for state-of-the-art conformant planners.

Coherent Predictive Inference under Exchangeability with Imprecise Probabilities (Extended Abstract)
Gert de Cooman, Jasper De Bock, Márcio Diniz
Coherent reasoning under uncertainty can be represented in a very general manner by coherent sets of desirable gambles. This leads to a more general foundation for coherent (imprecise-)probabilistic inference that allows for indecision. In this framework, and for a given finite category set, coherent predictive inference under exchangeability can be represented using Bernstein coherent cones of multivariate polynomials on the simplex generated by this category set. We define an inference system as a map that associates a Bernstein coherent cone of polynomials with every finite category set. Inference principles can then be represented mathematically as restrictions on such maps, which allows us to develop a notion of conservative inference under such inference principles. We discuss, as particular examples, representation insensitivity and specificity, and show that there is an infinity of inference systems that satisfy these two principles.

Computer Models Solving Intelligence Test Problems: Progress and Implications
José Hernández-Orallo, Fernando Martínez-Plumed, Ute Schmid, Michael Siebers, David Dowe
While some computational models of intelligence test problems were proposed throughout the second half of the XXth century, in the first years of the XXIst century we have seen an increasing number of computer systems being able to score well on particular intelligence test tasks. However, despite this increasing trend, there has been no general account of all these works in terms of how they relate to each other and what their real achievements are. In this paper, we provide some insight into these issues by giving a comprehensive account of about thirty computer models, from the 1960s to the present day, and their relationships, focussing on the range of intelligence test tasks they address, the purpose of the models, how general or specialised these models are, the AI techniques they use in each case, their comparison with human performance, and their evaluation of item difficulty.
Thursday 24 08:30  10:00 Competition Angry Birds
Thursday 24 08:30  10:00 MTPUM  Personalisation and User Modelling

Learning User Dependencies for Recommendation
Yong Liu, Peilin Zhao, Xin Liu, Min Wu, Lixin Duan, Xiaoli Li
Social recommender systems exploit users' social relationships to improve recommendation accuracy. Intuitively, a user tends to trust different people in different scenarios. Therefore, one main challenge of social recommendation is to exploit the most appropriate dependencies between users for a given recommendation task. Previous social recommendation methods are usually developed based on predefined user dependencies, so they may not be optimal for a specific recommendation task. In this paper, we propose a novel recommendation method, named probabilistic relational matrix factorization (PRMF), which can automatically learn the dependencies between users to improve recommendation accuracy. In PRMF, users' latent features are assumed to follow a matrix variate normal (MVN) distribution. Both positive and negative user dependencies can be modeled by the row precision matrix of the MVN distribution. Moreover, we propose an alternating optimization algorithm to solve the optimization problem of PRMF. Extensive experiments on four real datasets demonstrate the effectiveness of the proposed PRMF model.

Exploiting Music Play Sequence for Music Recommendation
Zhiyong Cheng, Jialie Shen, Lei Zhu, Mohan Kankanhalli, Liqiang Nie
Users leave digital footprints when interacting with various music streaming services. Music play sequence, which contains rich information about personal music preference and song similarity, has been largely ignored in previous music recommender systems. In this paper, we explore the effects of music play sequence on developing effective personalized music recommender systems. Towards this goal, we propose to apply word embedding techniques to music play sequences to estimate the similarity between songs. The learned similarity is then embedded into matrix factorization to boost latent feature learning and discovery. Furthermore, the proposed method only considers the k-nearest songs (e.g., k = 5) in the learning process and thus avoids increasing the time complexity. Experimental results on two public datasets demonstrate that our methods can significantly improve the performance of both rating prediction and top-n recommendation tasks.
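The neighbour-selection step can be sketched as follows, assuming song embeddings have already been learned (for example with a word2vec-style model trained on play sequences, where each song plays the role of a word). The sketch only computes cosine similarity and keeps the k most similar songs; how the paper embeds these similarities into matrix factorization is not reproduced, and the toy vectors are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def k_nearest_songs(emb: Dict[str, Sequence[float]], song: str,
                    k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k (song, similarity) pairs most similar to `song`, excluding itself."""
    scores = [(other, cosine(emb[song], vec))
              for other, vec in emb.items() if other != song]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical 2-d embeddings for four songs.
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
```

Restricting each song to its k neighbours keeps the similarity information sparse, which is consistent with the abstract's point about avoiding extra time complexity.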

Beyond Universal Saliency: Personalized Saliency Prediction with Multi-task CNN
Yanyu Xu, Nianyi Li, Junru Wu, Jingyi Yu, Shenghua Gao
Saliency detection is a long-standing problem in computer vision. Tremendous efforts have been focused on exploring a universal saliency model across users despite their differences in gender, race, age, etc. Yet recent psychology studies suggest that saliency is highly specific rather than universal: individuals exhibit heterogeneous gaze patterns when viewing an identical scene containing multiple salient objects. In this paper, we first show that such heterogeneity is common and critical for reliable saliency prediction. Our study also produces the first database of personalized saliency maps (PSMs). We model the PSM based on a universal saliency map (USM) shared by different participants and adopt a multi-task CNN framework to estimate the discrepancy between PSM and USM. Comprehensive experiments demonstrate that our new PSM model and prediction scheme are effective and reliable.

Quantifying Aspect Bias in Ordinal Ratings using a Bayesian Approach
Lahari Poddar, Wynne Hsu, Mong Li Lee
User opinions expressed in the form of ratings can influence an individual's view of an item. However, the true quality of an item is often obfuscated by user biases, and the observed ratings do not make clear how much importance different users place on different aspects of an item. We propose a probabilistic model of the observed aspect ratings to infer (i) each user's aspect bias and (ii) the latent intrinsic quality of an item. We model multi-aspect ratings as ordered discrete data and encode the dependency between different aspects using a latent Gaussian structure. We handle the Gaussian-Categorical non-conjugacy using a stick-breaking formulation coupled with Pólya-Gamma auxiliary variable augmentation for simple, fully Bayesian inference. On two real-world datasets, we demonstrate the predictive ability of our model and its effectiveness in learning explainable user biases, providing insights towards more reliable product quality estimation.
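The stick-breaking idea mentioned above maps K - 1 unconstrained real values (for example, Gaussian latent variables) to a probability vector over K ordered categories: each step breaks off a sigmoid-sized fraction of the stick that remains. The sketch shows the standard logistic stick-breaking transform only; the paper's exact parameterization may differ, and the Pólya-Gamma augmentation used for inference is not shown.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def stick_breaking(psi: Sequence[float]) -> List[float]:
    """Map K-1 real values to K category probabilities that sum to one.

    pi_k = sigmoid(psi_k) * prod_{j<k} (1 - sigmoid(psi_j)) for k < K,
    and the last category receives whatever is left of the stick.
    """
    probs: List[float] = []
    remaining = 1.0  # length of the stick not yet assigned
    for value in psi:
        piece = sigmoid(value) * remaining
        probs.append(piece)
        remaining -= piece
    probs.append(remaining)
    return probs
```

For example, two zeros split the stick in half and then in half again, giving probabilities 0.5, 0.25 and 0.25 for three ordered ratings.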

Socialized Word Embeddings
Ziqian Zeng, Yichun Yin, Yangqiu Song, Ming Zhang
Word embeddings have attracted a lot of attention. On social media, each user’s language use can be significantly affected by the user’s friends. In this paper, we propose a socialized word embedding algorithm which can consider both user’s personal characteristics of language use and the user’s social relationship on social media. To incorporate personal characteristics, we propose to use a user vector to represent each user. Then for each user, the word embeddings are trained based on each user’s corpus by combining the global word vectors and local user vector. To incorporate social relationship, we add a regularization term to impose similarity between two friends. In this way, we can train the global word vectors and user vectors jointly. To demonstrate the effectiveness, we used the latest large-scale Yelp data to train our vectors, and designed several experiments to show how user vectors affect the results.
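A rough sketch of the two ingredients described above, assuming a simple additive combination: a user-specific word vector is formed as the global word vector plus that user's vector, and the social regularizer sums squared Euclidean distances between the vectors of users who are friends. The function names, the additive form, and the weight `lam` are assumptions for illustration; the full training objective is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def user_word_vector(word_vec: Sequence[float], user_vec: Sequence[float]) -> Vector:
    """Personalized word vector: global word vector plus the user's vector (assumed additive)."""
    return [w + u for w, u in zip(word_vec, user_vec)]


def social_regularizer(user_vecs: Dict[str, Sequence[float]],
                       friendships: List[Tuple[str, str]], lam: float = 1.0) -> float:
    """lam / 2 * sum over friend pairs of ||u_i - u_j||^2; minimizing it pulls friends together."""
    total = 0.0
    for a, b in friendships:
        total += sum((x - y) ** 2 for x, y in zip(user_vecs[a], user_vecs[b]))
    return 0.5 * lam * total
```

Because the word vectors are shared and only the user vectors differ, the regularizer affects how each user's language is personalized without duplicating the vocabulary per user.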

Exploring Personalized Neural Conversational Models
Satwik Kottur, Vitor Carvalho, Xiaoyu Wang
Modeling dialog systems is currently one of the most active problems in Natural Language Processing. Recent advances in Deep Learning have sparked an interest in the use of neural networks in modeling language, particularly for personalized conversational agents that can retain contextual information during dialog exchanges. This work carefully explores and compares several of the recently proposed neural conversation models, and carries out a detailed evaluation of the multiple factors that can significantly affect predictive performance, such as pretraining, embedding training, data cleaning, diversity re-ranking, evaluation setting, etc. Based on the tradeoffs of different models, we propose a new generative dialogue model conditioned on speakers as well as context history that outperforms all previous models on both retrieval and generative metrics. Our findings indicate that pretraining speaker embeddings on larger datasets, as well as bootstrapping word and speaker embeddings, can significantly improve performance (up to 3 points in perplexity), and that promoting diversity using Mutual Information based techniques has a very strong effect on ranking metrics.
Thursday 24 08:30  10:00 MLCLNN  Classification and Neural Networks

Discriminative Deep Hashing for Scalable Face Image Retrieval
Jie Lin, Zechao Li, Jinhui Tang
With the explosive growth of images containing faces, scalable face image retrieval has attracted increasing attention. Owing to its remarkable effectiveness, deep hashing has recently become a popular hashing approach. In this work, we propose a new Discriminative Deep Hashing (DDH) network to learn discriminative and compact hash codes for large-scale face image retrieval. The proposed network incorporates end-to-end learning, a divide-and-encode module, and the desired discrete code learning into a unified framework. Specifically, a network with a stack of convolution-pooling layers is proposed to extract multi-scale and robust features by merging the outputs of the third max pooling layer and the fourth convolutional layer. To reduce the redundancy among hash codes and the network parameters simultaneously, a divide-and-encode module is used to generate compact hash codes. Moreover, a loss function is introduced to minimize the prediction errors of the learned hash codes, which can lead to discriminative hash codes. Extensive experiments on two datasets demonstrate that the proposed method achieves superior performance compared with several state-of-the-art hashing methods.

Confusion Graph: Detecting Confusion Communities in Large Scale Image Classification
Ruochun Jin, Yong Dou, Yueqing Wang, Xin Niu
For deep CNN-based image classification models, we observe that confusions between classes with high visual similarity are much stronger than those between visually dissimilar classes. Because of these unbalanced confusions, classes can be organized into communities, similar to cliques of people in a social network. Based on this observation, we propose a graph-based tool named "confusion graph" to quantify these confusions and further reveal the community structure inside the database. With this community structure, we can diagnose the model's weaknesses and improve classification accuracy using specialized expert subnets, with results comparable to other state-of-the-art techniques. Utilizing this community information, we can also employ pretrained models to automatically identify mislabeled images in the large-scale database. With our method, researchers need to manually check only about 3% of the ILSVRC2012 classification database to locate almost all mislabeled samples.
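The confusion-graph idea can be sketched as follows: classes are nodes, an edge is weighted by how often two classes are confused with each other in a confusion matrix, and strongly linked classes are grouped into communities. For brevity the sketch uses thresholded connected components as a stand-in for a proper community-detection algorithm; this rule, the normalization, and the helper names are assumptions and not the method used in the paper.

```python
from typing import List, Sequence


def confusion_weights(cm: Sequence[Sequence[int]]) -> List[List[float]]:
    """Symmetric confusion weights: row-normalized rates i->j plus j->i, diagonal zeroed."""
    n = len(cm)
    rates = [[cm[i][j] / max(sum(cm[i]), 1) for j in range(n)] for i in range(n)]
    return [[0.0 if i == j else rates[i][j] + rates[j][i] for j in range(n)]
            for i in range(n)]


def confusion_communities(cm: Sequence[Sequence[int]], threshold: float) -> List[List[int]]:
    """Group classes connected by confusion weight above `threshold` (connected components)."""
    w = confusion_weights(cm)
    n = len(w)
    seen = [False] * n
    communities: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:  # depth-first traversal over strong-confusion edges
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if not seen[j] and w[i][j] > threshold:
                    seen[j] = True
                    stack.append(j)
        communities.append(sorted(component))
    return communities


# Hypothetical confusion matrix: rows are true classes (cat, dog, car, truck).
cm = [[80, 18, 1, 1],
      [20, 78, 1, 1],
      [0, 1, 85, 14],
      [1, 0, 16, 83]]
```

With a threshold of 0.1, the animals and the vehicles fall into two separate communities, which is the kind of structure an expert subnet per community could then target.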

Identifying Human Mobility via Trajectory Embeddings
Qiang Gao, Fan Zhou, Kunpeng Zhang, Goce Trajcevski, Xucheng Luo, Fengli Zhang
Understanding human trajectory patterns is an important task in many location-based social network (LBSN) applications, such as personalized recommendation and preference-based route planning. Most existing methods classify a trajectory (or its segments) into predefined categories, e.g., walking or jogging, based on spatiotemporal values and activities. We tackle a novel trajectory classification problem: we identify and link trajectories to the users who generate them in LBSNs, a problem called Trajectory-User Linking (TUL). Solving the TUL problem is not trivial because (1) the number of classes (i.e., users) is much larger than the number of motion patterns in common trajectory classification problems, and (2) location-based trajectory data, especially check-ins, are often extremely sparse. To address these challenges, we propose a Recurrent Neural Network (RNN) based semi-supervised learning model, called TULER (TUL via Embedding and RNN), which exploits the spatiotemporal data to capture the underlying semantics of user mobility patterns. Experiments conducted on real-world datasets demonstrate that TULER achieves better accuracy than existing methods.

Name Nationality Classification with Recurrent Neural Networks
Jinhyuk Lee, Hyunjae Kim, Miyoung Ko, Donghee Choi, Jaehoon Choi, Jaewoo Kang
Personal names tend to have many variations that differ from country to country. Though a large number of personal names exist on the Web, nationality prediction based solely on names has not been fully studied because of the difficulty of extracting subtle character-level features. We propose a recurrent neural network based model that predicts the nationality of each name using automatic feature extraction. Evaluation on Olympic record data shows that our model achieves greater accuracy than previous feature-based approaches in nationality prediction tasks. We also evaluate our proposed model and baseline models on a name ethnicity classification task, again achieving better or comparable performance. We further investigate the effectiveness of the character embeddings used in our proposed model.

Improving classification accuracy of feedforward neural networks for spiking neuromorphic chips
Antonio Jose Jimeno Yepes, Jianbin Tang, Benjamin Mashford
Deep Neural Networks (DNNs) achieve human-level performance in many image analytics tasks, but DNNs are mostly deployed to GPU platforms that consume a considerable amount of power. New hardware platforms using lower-precision arithmetic achieve drastic reductions in power consumption. More recently, brain-inspired spiking neuromorphic chips have achieved even lower power consumption, on the order of milliwatts, while still offering real-time processing. However, to deploy DNNs to energy-efficient neuromorphic chips, the incompatibility between the continuous neurons and synaptic weights of traditional DNNs and the discrete spiking neurons and synapses of neuromorphic chips needs to be overcome. Previous work has achieved this by training a network to learn continuous probabilities before it is deployed to a neuromorphic architecture, such as the IBM TrueNorth Neurosynaptic System, by randomly sampling these probabilities. The main contribution of this paper is a new learning algorithm that learns a TrueNorth configuration ready for deployment. We achieve this by directly training a binary hardware crossbar that accommodates the TrueNorth axon configuration constraints, and we propose a different neuron model. Results of our approach trained on electroencephalogram (EEG) data show a significant improvement over previous work (76% vs 86% accuracy) while maintaining state-of-the-art performance on the MNIST handwritten data set.

Object Recognition with and without Objects
Zhuotun Zhu, Lingxi Xie, Alan Yuille
While recent deep neural networks have achieved promising performance on object recognition, they rely implicitly on the visual contents of the whole image. In this paper, we train deep neural networks on the foreground (object) and background (context) regions of images respectively. Compared with human recognition in the same situations, networks trained on pure background without objects achieve highly reasonable recognition performance, beating humans by a large margin when only context is given. However, humans still outperform networks when only the object is available, which indicates that networks and human beings have different mechanisms for understanding an image. Furthermore, we straightforwardly combine multiple trained networks to explore the different visual cues learned by different networks. Experiments show that useful visual hints can be explicitly learned separately and then combined to achieve higher performance, which verifies the advantages of the proposed framework.
Thursday 24 08:30  10:00 MLDLV2  Deep Learning and Vision 2

Importance-Aware Semantic Segmentation for Autonomous Driving System
Bike Chen, Chen Gong, Jian Yang
Semantic Segmentation (SS) partitions an image into several coherent, semantically meaningful parts and classifies each part into one of the predetermined classes. In this paper, we argue that existing SS methods cannot be reliably applied to autonomous driving systems because they ignore the different importance levels of distinct classes for safe driving. For example, pedestrians in the scene are much more important than sky when driving a car, so their segmentations should be as accurate as possible. To incorporate the importance information possessed by various object classes, this paper designs an "Importance-Aware Loss" (IAL) that specifically emphasizes the critical objects for autonomous driving. IAL operates under a hierarchical structure, and classes with different importance are located at different levels so that they are assigned distinct weights. Furthermore, we derive the forward and backward propagation rules for IAL and apply them to deep neural networks for realizing SS in intelligent driving systems. Experiments on the CamVid and Cityscapes datasets reveal that, by employing the proposed loss function, existing deep learning models including FCN, SegNet, and ENet consistently obtain improved segmentation results on the predefined important classes for safe driving.
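As a simplified illustration of weighting classes by importance, the sketch below computes a per-pixel cross-entropy in which each pixel's loss is scaled by the importance weight of its ground-truth class. This flat weighting is an assumption; the paper's IAL organizes classes hierarchically, and its exact weights and propagation rules are not reproduced. The class names and weight values in the comments are hypothetical.

```python
import math
from typing import Dict, Sequence


def importance_weighted_ce(probs: Sequence[Sequence[float]], labels: Sequence[int],
                           class_weight: Dict[int, float]) -> float:
    """Mean over pixels of -w[y] * log p(y), where probs[i] is pixel i's softmax output.

    A larger weight makes errors on that class cost more, e.g.
    class 0 = sky (weight 1.0), class 1 = pedestrian (weight 3.0).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -class_weight[y] * math.log(max(p[y], 1e-12))  # clamp avoids log(0)
    return total / len(labels)
```

With equal weights this reduces to ordinary cross-entropy, so the weights are the only knob that shifts the model's attention towards safety-critical classes.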

Multi-Stream Deep Similarity Learning Networks for Visual Tracking
Kunpeng Li, Yu Kong, Yun Fu
Visual tracking has achieved remarkable success in recent decades, but it remains a challenging problem due to appearance variations over time and complex cluttered backgrounds. In this paper, we adopt a tracking-by-verification scheme to overcome these challenges by determining the patch in the subsequent frame that is most similar to the target template and distinctive from the background context. A multi-stream deep similarity learning network is proposed to learn the similarity comparison model. The loss function of our network encourages the distance between a positive patch in the search region and the target template to be smaller than that between the positive patch and the background patches. Within the learned feature space, even if the distance between positive patches becomes large because of appearance change or interference from background clutter, our method can use the relative distance to distinguish the target robustly. Besides, the learned model is used directly for tracking, with no need for model updating or parameter fine-tuning, and can run at 45 fps on a single GPU. Our tracker achieves state-of-the-art performance on the visual tracking benchmark compared with other recent real-time-speed trackers, and shows better capability in handling background clutter, occlusion, and appearance change.
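The relative-distance objective described above can be sketched as a hinge (triplet-style) loss: for each background patch, the positive patch should be closer to the target template than to that background patch by at least a margin. The hinge form, the margin value, and the use of squared Euclidean distance on plain feature vectors are assumptions for illustration; the paper's exact loss and multi-stream architecture are not reproduced.

```python
from typing import Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relative_distance_loss(template: Sequence[float], positive: Sequence[float],
                           backgrounds: Sequence[Sequence[float]],
                           margin: float = 1.0) -> float:
    """Sum over background patches of max(0, margin + d(pos, template) - d(pos, bg))."""
    d_pos = sq_dist(positive, template)
    return sum(max(0.0, margin + d_pos - sq_dist(positive, bg)) for bg in backgrounds)
```

Because only the difference of distances matters, the loss is zero whenever every background patch is far enough away, even if the positive patch itself has drifted from the template; this mirrors the abstract's point about relying on relative rather than absolute distance.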

Person Re-Identification by Deep Joint Learning of Multi-Loss Classification
Wei Li, Xiatian Zhu, Shaogang Gong
Existing person re-identification (re-id) methods rely mostly on either localised or global feature representation. This ignores their joint benefit and mutual complementary effects. In this work, we show the advantages of jointly learning local and global features in a Convolutional Neural Network (CNN) by aiming to discover correlated local and global features in different contexts. Specifically, we formulate a method for joint learning of local and global feature selection losses designed to optimise person re-id when using generic matching metrics such as the L2 distance. We design a novel CNN architecture for Jointly Learning Multi-Loss (JLML) of local and global discriminative feature optimisation, subject concurrently to the same re-id labelled information. Extensive comparative evaluations demonstrate the advantages of this new JLML model for person re-id over a wide range of state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03, Market-1501).

Locality Constrained Deep Supervised Hashing for Image Retrieval
Hao Zhu, Shenghua Gao
Deep Convolutional Neural Network (DCNN) based deep hashing has shown its success for fast and accurate image retrieval; however, directly minimizing the quantization error in deep hashing changes the distribution of DCNN features and consequently changes the similarity between the query and the retrieved images in hashing. In this paper, we propose a novel Locality-Constrained Deep Supervised Hashing. By simultaneously learning discriminative DCNN features and preserving the similarity between image pairs, the hash codes of our scheme preserve the distribution of DCNN features and thus favor accurate image retrieval. The contributions of this paper are twofold: i) our analysis shows that minimizing quantization error in deep hashing makes the features less discriminative, which is not desirable for image retrieval; ii) we propose a Locality-Constrained Deep Supervised Hashing which preserves the similarity between image pairs in hashing. Extensive experiments on the CIFAR-10 and NUS-WIDE datasets show that our method significantly boosts the accuracy of image retrieval; especially on the CIFAR-10 dataset, the improvement is usually more than 6% in terms of the MAP measurement. Further, our method is 10 times faster than state-of-the-art methods in the training phase.
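For readers unfamiliar with the quantities involved, the sketch below shows the sign-based binarization commonly used in deep hashing, the corresponding quantization error ||b - h||^2 (the term the abstract argues should not be minimized directly), and Hamming-distance ranking of a database against a query code. It illustrates the generic setting only, not the proposed locality-constrained objective, and the feature values are made up.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float]) -> List[int]:
    """Sign quantization of continuous features h into codes b in {-1, +1} (0 maps to +1)."""
    return [1 if f >= 0 else -1 for f in features]


def quantization_error(features: Sequence[float]) -> float:
    """||b - h||^2 between the continuous features h and their codes b = sign(h)."""
    return sum((b - h) ** 2 for b, h in zip(binarize(features), features))


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """(index, distance) pairs sorted by Hamming distance to the query (stable for ties)."""
    return sorted(((i, hamming(query, code)) for i, code in enumerate(database)),
                  key=lambda pair: pair[1])
```

Driving the quantization error to zero forces every feature towards +1 or -1, which is one way to see why it can distort the feature distribution that retrieval depends on.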

Deep Supervised Hashing with Nonlinear Projections
Sen Su, Gang Chen, Xiang Cheng, Rong Bi
Hashing has attracted broad research interest in large-scale image retrieval due to its high search speed and efficient storage. Recently, many deep hashing methods have been proposed to perform simultaneous nonlinear feature learning and hash projection learning, and they have shown superior performance compared to hashing methods based on hand-crafted features. Nonlinear projection functions have shown advantages over linear ones due to their powerful generalization capabilities. To improve the performance of deep hashing methods by generalizing projection functions, we propose implementing a purely nonlinear deep hashing network architecture. Building on this idea, this paper presents a Deep Supervised Hashing architecture with Nonlinear Projections (DSHNP). In particular, soft decision trees are adopted as the nonlinear projection functions, since they generate differentiable nonlinear outputs and can be trained with deep neural networks in an end-to-end way. Moreover, to make the hash codes as independent as possible, we design two regularizers imposed on the parameter matrices of the leaves in the soft decision trees. Extensive evaluations on two benchmark image datasets show that the proposed DSHNP outperforms several state-of-the-art hashing methods.

Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems
Quan Liu, Hui Jiang, Andrew Evdokimov, Zhen-Hua Ling, Xiaodan Zhu, Si Wei, Yu Hu
This paper investigates the Winograd Schema (WS), a challenging problem that has been proposed for measuring progress in commonsense reasoning. Due to the lack of commonsense knowledge and training data, very little work on WS problems has appeared in recent years. In fact, there is no shortcut to solving this problem other than collecting more commonsense knowledge and designing suitable models. Therefore, this paper addresses a set of WS problems by proposing a knowledge acquisition method and a general neural association model. To avoid the sparseness issue, the knowledge we aim to collect is the cause-effect relationships between thousands of commonly used words. The knowledge acquisition method enables us to extract hundreds of thousands of cause-effect pairs from a large text corpus automatically. Meanwhile, a neural association model (NAM) is proposed to encode the association relationships between any two discrete events. Based on the extracted knowledge and the NAM models, we build a system for solving WS problems from scratch that achieves 70.0% accuracy. Most importantly, this paper provides a flexible framework to solve WS problems based on event association and neural network methods.
Thursday 24 08:30  10:00 MLDM3  Data Mining 3

Linear Manifold Regularization with Adaptive Graph for Semi-supervised Dimensionality Reduction
Kai Xiong, Feiping Nie, Junwei Han
Many previous graph-based methods perform dimensionality reduction on a predefined graph. However, due to the noise and redundant information in the original data, the predefined graph has no clear structure and may not be appropriate for the subsequent task. To overcome these drawbacks, in this paper, we propose a novel approach called linear manifold regularization with adaptive graph (LMRAG) for semi-supervised dimensionality reduction. LMRAG directly incorporates the graph construction into the objective function, so the projection matrix and the optimal graph can be optimized simultaneously. Due to the structure constraint, the learned graph is sparse and has a clear structure. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.

Dynamic Weighted Majority for Incremental Learning of Imbalanced Data Streams with Concept Drift
Yang Lu, Yiu-ming Cheung, Yuan Yan Tang
Concept drift occurring in data streams jeopardizes the accuracy and stability of the online learning process. If the data stream is imbalanced, it is even more challenging to detect and cure the concept drift. In the literature, these two problems have been intensively addressed separately, but have yet to be well studied when they occur together. In this paper, we propose a chunk-based incremental learning method called Dynamic Weighted Majority for Imbalance Learning (DWMIL) to deal with data streams that exhibit both concept drift and class imbalance. DWMIL utilizes an ensemble framework that dynamically weights the base classifiers according to their performance on the current data chunk. Compared with existing methods, its merits are fourfold: (1) it remains stable on non-drifted streams and quickly adapts to new concepts; (2) it is fully incremental, i.e., no previous data needs to be stored; (3) it keeps a limited number of classifiers to ensure high efficiency; and (4) it is simple and needs only one thresholding parameter. Experiments on both synthetic and real data sets with concept drift show that DWMIL performs better than state-of-the-art competitors, with less computational cost.
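The chunk-based ensemble update can be sketched as follows: after each new chunk, every stored classifier's weight is scaled by its performance on that chunk, classifiers whose weight falls below the single threshold are discarded, and a classifier trained on the new chunk joins the ensemble. The multiplicative update, the generic score in [0, 1] (for imbalanced data an imbalance-aware measure such as G-mean would be more appropriate than plain accuracy), and all names are assumptions for this sketch rather than the exact DWMIL rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]  # returns a label in {0, 1}


@dataclass
class Member:
    model: Classifier
    weight: float = 1.0


def update_ensemble(members: List[Member], new_model: Classifier,
                    score: Callable[[Classifier], float], threshold: float) -> List[Member]:
    """Reweight members by their score on the current chunk, prune, then add the new model."""
    kept: List[Member] = []
    for m in members:
        m.weight *= score(m.model)  # score in [0, 1], measured on the current chunk
        if m.weight >= threshold:
            kept.append(m)
    kept.append(Member(new_model))  # the newest classifier starts with weight 1.0
    return kept


def predict(members: List[Member], x: Sequence[float]) -> int:
    """Weighted majority vote over binary labels."""
    vote = sum(m.weight * (1 if m.model(x) == 1 else -1) for m in members)
    return 1 if vote > 0 else 0
```

Pruning by a weight threshold is what keeps the ensemble small: a classifier that keeps performing poorly after a drift decays multiplicatively and is dropped, so no past data has to be stored.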

Semi-supervised Orthogonal Graph Embedding with Recursive Projections
Hanyang Liu, Junwei Han, Feiping Nie
Many graph-based semi-supervised dimensionality reduction algorithms utilize a projection matrix to linearly map the data matrix from the original feature space to a lower-dimensional representation. However, the dimensionality after reduction is inevitably restricted to the number of classes, and the learned non-orthogonal projection matrix usually fails to preserve distances well and to balance the weight on different projection directions. This paper proposes a novel dimensionality reduction method, called semi-supervised orthogonal graph embedding with recursive projections (SOGE). We integrate manifold smoothness and label fitness as well as the penalization of linear mapping mismatch, and learn the orthogonal projection on the Stiefel manifold, which empirically demonstrates better performance. Moreover, we recursively update the projection matrix in its orthocomplemented space to continuously learn more projection vectors, so as to better control the dimension of reduction. Comprehensive experiments on several benchmarks demonstrate significant improvement over existing methods.

Self-paced Mixture of Regressions
Longfei Han, Dingwen Zhang, Dong Huang, Xiaojun Chang, Senlin Luo, Jun Ren, Junwei Han
Mixture of regressions (MoR) is a well-established and effective approach to modeling discontinuous and heterogeneous data in regression problems. Existing MoR approaches assume a smooth joint distribution for its good analytic properties. However, this assumption makes existing MoR very sensitive to intra-component outliers (noisy training data residing in certain components) and inter-component imbalance (different amounts of training data in different components). In this paper, we make the first effort to apply Self-paced Learning (SPL) to MoR, yielding the Self-paced Mixture of Regressions (SPMoR) model. We propose a novel self-paced regularizer based on the Exclusive LASSO, which improves the inter-component balance of training data. As a robust learning regime, SPL pursues reasoning from confident samples. To demonstrate the effectiveness of SPMoR, we conducted experiments on both synthetic examples and real-world applications to age estimation and glucose estimation. The results show that SPMoR outperforms the state-of-the-art methods.
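
The self-paced idea can be illustrated with the textbook hard self-paced regularizer: minimizing Σ v_i·ℓ_i − λ·Σ v_i over binary weights v_i keeps exactly the samples whose current loss ℓ_i is below the pace parameter λ. The sketch below assumes that standard rule; it is not the Exclusive-LASSO regularizer proposed for SPMoR, and the name `spl_weights` is hypothetical.

```python
def spl_weights(losses, lam):
    """Hard self-paced weights (hypothetical helper, standard SPL rule).

    Per sample, v = 1 contributes loss - lam and v = 0 contributes 0,
    so the minimizer selects a sample exactly when loss < lam.
    """
    return [1 if loss < lam else 0 for loss in losses]


# Growing lam across training rounds gradually admits harder samples.
losses = [0.1, 0.5, 2.0]
for lam in (0.3, 1.0, 3.0):
    print(lam, spl_weights(losses, lam))
```

Raising λ between rounds is what gives SPL its easy-to-hard curriculum.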

Locally Linear Factorization Machines
Chenghao Liu, Teng Zhang, Peilin Zhao, Jun Zhou, Jianling Sun
Factorization Machines (FMs) are a widely used method for efficiently exploiting high-order feature interactions in classification and regression tasks. Unfortunately, despite increasing interest in FMs, existing work only considers high-order information of the input features, which limits their capacity in nonlinear problems and fails to capture the underlying structures of more complex data. In this work, we present Locally Linear Factorization Machines (LLFM), which overcome this limitation by exploiting local coding techniques. Existing local coding classifiers involve a phase of unsupervised anchor point learning and a predefined local coding scheme; because class label information is not exploited in discovering the encoding, this can result in a suboptimal encoding for prediction. In contrast, we formulate a joint optimization over the anchor points, local coding coordinates, and FM variables to minimize classification or regression risk. Empirically, we demonstrate that our approach achieves much better predictive accuracy than competing methods that employ LLFM with unsupervised anchor point learning and a predefined local coding scheme.

Robust Survey Aggregation with Student-t Distribution and Sparse Representation
Qingtao Tang, Tao Dai, Li Niu, Yisen Wang, Shu-Tao Xia, Jianfei Cai
Most existing survey aggregation methods assume that the sample data follow a Gaussian distribution. However, these methods are sensitive to outliers due to the thin-tailed property of the Gaussian distribution. To address this issue, we propose a robust survey aggregation method based on the Student-t distribution and sparse representation. Specifically, we assume that the samples follow a Student-t distribution instead of the common Gaussian distribution. Owing to the Student-t distribution, our method is robust to outliers, which can be explained from both Bayesian and non-Bayesian points of view. In addition, inspired by the James-Stein estimator (JS) and Compressive Averaging (CAvg), we propose to sparsely represent the global mean vector by an adaptive basis comprising both a data-specific basis and combined generic bases. Theoretically, we prove that JS and CAvg are special cases of our method. Extensive experiments demonstrate that our proposed method achieves significant improvement over the state-of-the-art methods on both synthetic and real datasets.
Thursday 24 08:30–10:00 KRBC: Belief Change

A General Multi-agent Epistemic Planner Based on Higher-order Belief Change
Xiao Huang, Biqing Fang, Hai Wan, Yongmei Liu
In recent years, multi-agent epistemic planning has received attention from both the dynamic logic and planning communities. Existing implementations of multi-agent epistemic planning are based on compilation into classical planning and suffer from various limitations, such as generating only linear plans, restriction to public actions, and inability to handle disjunctive beliefs. In this paper, we propose a general representation language for multi-agent epistemic planning in which the initial KB, the goal, and the preconditions and effects of actions can be arbitrary multi-agent epistemic formulas, and the solution is an action tree branching on sensing results. To support efficient reasoning in the multi-agent KD45 logic, we make use of a normal form called alternative cover disjunctive formula (ACDF). We propose basic revision and update algorithms for ACDF formulas. We also handle static propositional common knowledge, which we call constraints. Building on our reasoning, revision, and update algorithms, and adapting the PrAO algorithm for contingent planning from the literature, we implemented a multi-agent epistemic planner called MAEP. Our experimental results show the viability of our approach.

Belief Change in a Preferential Non-monotonic Framework
Giovanni Casini, Thomas Meyer
Belief change and non-monotonic reasoning are usually viewed as two sides of the same coin, with results showing that one can formally be defined in terms of the other. In this paper we show that it also makes sense to analyse belief change within a (preferential) non-monotonic framework. We consider belief change operators in a non-monotonic propositional setting with a view towards preserving consistency. We show that the results obtained can also be applied to the preservation of coherence, an important notion within the field of logic-based ontologies. We adopt the AGM approach to belief change and show that standard AGM can be adapted to a preferential non-monotonic framework, with the definition of expansion, contraction, and revision operators, and corresponding representation results.

Strong Syntax Splitting for Iterated Belief Revision
Gabriele Kern-Isberner, Gerhard Brewka
AGM theory is the most influential formal account of belief revision. Nevertheless, there are some issues with the original proposal. In particular, Parikh has pointed out that completely irrelevant information may be affected in AGM revision. To remedy this, he proposed an additional axiom (P) aiming to capture (ir)relevance by a notion of syntax splitting. In this paper we generalize syntax splitting from logical sentences to epistemic states, a step which is necessary to cover iterated revision. The generalization is based on the notion of marginalization of epistemic states. Furthermore, we study epistemic syntax splitting in the context of ordinal conditional functions. Our approach substantially generalizes the semantical treatment of (P) in terms of faithful preorders recently presented by Peppas and colleagues.

Non-Determinism and the Dynamics of Knowledge
Christos Moyzes, Andreas Herzig, Wiebe van der Hoek, Davide Grossi
In this paper we attempt to shed light on the concept of an agent’s knowledge after a non-deterministic action is executed. We start by comparing notions of non-deterministic choice, and notions of sequential composition, in settings with a dynamic and/or epistemic character, namely Propositional Dynamic Logic (PDL), Dynamic Epistemic Logic (DEL), and the more recent logic of Semi-Public Environments (SPE). These logics represent two different approaches to defining the aforementioned actions, and in order to provide unified frameworks that encompass both, we define the logics DELVO (DEL+Vision+Ontic change) and PDLVE (PDL+Vision+Epistemic operators). DELVO is given a sound and complete axiomatisation.

Belief Manipulation Through Propositional Announcements
Aaron Hunter, François Schwarzentruber, Eric Tsang
Public announcements cause each agent in a group to modify their beliefs to incorporate some new piece of information, while simultaneously being aware that all other agents are doing the same. Given a set of agents and a set of epistemic goals, it is natural to ask if there is a single announcement that will make each agent believe the corresponding goal. This problem is known to be undecidable in a general modal setting, where the presence of nested beliefs can lead to complex dynamics. In this paper, we consider not necessarily truthful public announcements in the setting of AGM belief revision. We prove that announcement finding in this setting is not only decidable, but that it is simpler than the corresponding problem in the most simplified modal logics. We then describe AnnB, an implemented tool that uses announcement finding as the basis for controlling robot behaviour through belief manipulation.

Epistemic-Entrenchment Characterization of Parikh’s Axiom
Theofanis Aravanis, Pavlos Peppas, Mary-Anne Williams
In this article, we provide the epistemic-entrenchment characterization of the weak version of Parikh’s relevance-sensitive axiom for belief revision, known as axiom (P), for the general case of incomplete theories. Loosely speaking, axiom (P) states that, if a belief set K can be divided into two disjoint compartments, and the new information φ relates only to the first compartment, then the second compartment should not be affected by the revision of K by φ. Essentially, the above-mentioned characterization consists of additional constraints on epistemic-entrenchment preorders that induce AGM revision functions satisfying the weak version of Parikh’s axiom (P).
Thursday 24 08:30–10:00 KRGT: Game Theory

Smoothing Method for Approximate Extensive-Form Perfect Equilibrium
Christian Kroer, Gabriele Farina, Tuomas Sandholm
Nash equilibrium is a popular solution concept for solving imperfect-information games in practice. However, it has a major drawback: it does not preclude suboptimal play in branches of the game tree that are not reached in equilibrium. Equilibrium refinements can mend this issue, but have seen little practical adoption, largely due to a lack of scalable algorithms. Sparse iterative methods, in particular first-order methods, are known to be among the most effective algorithms for computing Nash equilibria in large-scale two-player zero-sum extensive-form games. In this paper, we provide, to our knowledge, the first extension of these methods to equilibrium refinements. We develop a smoothing approach for behavioral perturbations of the convex polytope that encompasses the strategy spaces of players in an extensive-form game. This enables one to compute an approximate variant of extensive-form perfect equilibria. Experiments show that our smoothing approach leads to solutions with dramatically stronger strategies at information sets that are reached with low probability in approximate Nash equilibria, while retaining the overall convergence rate associated with fast algorithms for Nash equilibrium. This benefits both approximate equilibrium finding (such approximation is necessary in practice in large games), where some probabilities are low while possibly heading toward zero in the limit, and exact equilibrium computation, where the low probabilities are actually zero.

Weakening Covert Networks by Minimizing Inverse Geodesic Length
Haris Aziz, Serge Gaspers, Kamran Najeebullah
We consider the problem of deleting nodes in a covert network to minimize its performance. The inverse geodesic length (IGL) is a well-known and widely used measure of network performance. It equals the sum of the inverse distances of all pairs of vertices. In the MinIGL problem, the input is a graph $G$, a budget $k$, and a target IGL $T$, and the question is whether there exists a subset of vertices $X$ with $|X|=k$ such that the IGL of $G-X$ is at most $T$. In network analysis, the IGL is often used to evaluate how well heuristics perform in strengthening or weakening a network. In this paper, we undertake a study of the classical and parameterized complexity of the MinIGL problem. The problem is NP-complete even if $T=0$, and remains both NP-complete and $W[1]$-hard for parameter $k$ on bipartite and on split graphs. On the positive side, we design several multivariate algorithms for the problem. Our main result is an algorithm for MinIGL parameterized by the twin cover number.
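
A brute-force check makes the MinIGL definition concrete. This minimal sketch assumes an unweighted, undirected graph given as an adjacency dict, counts each unordered pair once, and treats disconnected pairs as contributing 0 (1/∞); the helper names are hypothetical, and exhaustive search over all $k$-subsets is only a baseline, not the paper's parameterized algorithm.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, removed):
    """Unweighted shortest-path distances from source, skipping removed vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def igl(adj, removed=frozenset()):
    """Inverse geodesic length of G - X: sum of 1/d(u, v) over unordered pairs."""
    total = 0.0
    for u in adj:
        if u in removed:
            continue
        dist = bfs_distances(adj, u, removed)
        total += sum(1.0 / d for v, d in dist.items() if v != u)
    return total / 2  # every unordered pair was counted from both endpoints


def min_igl(adj, k, target):
    """Exhaustive MinIGL: return some X with |X| = k and IGL(G - X) <= target, else None."""
    for X in combinations(adj, k):
        if igl(adj, frozenset(X)) <= target:
            return set(X)
    return None


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(igl(path))              # 2.5, i.e. 1 + 1 + 1/2
print(min_igl(path, 1, 0.0))  # {'b'}: deleting the middle vertex disconnects all pairs
```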

The Tractability of the Shapley Value over Bounded Treewidth Matching Games
Gianluigi Greco, Francesco Lupia, Francesco Scarcello
Matching games form a class of coalitional games that has attracted much attention in the literature. Indeed, several results are known about the complexity of computing solution concepts over them. In particular, it is known that computing the Shapley value is intractable in general (formally, #P-hard) but feasible in polynomial time over games defined on trees. However, it was an open problem whether this tractability result holds over classes of graphs properly including acyclic ones. The main contribution of the paper is a positive answer to this question: the Shapley value is tractable for matching games defined over graphs having bounded treewidth. The proposed technique has been implemented and tested on classes of graphs having different sizes and treewidth at most three.
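
To make the object of study concrete, the sketch below computes exact Shapley values by enumerating all player orders, assuming the usual weighted matching game in which a coalition's value is the maximum weight of a matching among its members. Both helpers are hypothetical and exponential in the number of players; they illustrate the definition only, not the bounded-treewidth technique of the paper.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def max_matching_weight(edges, coalition):
    """Maximum-weight matching using only edges inside the coalition (brute force)."""
    inside = [(u, v, w) for (u, v, w) in edges if u in coalition and v in coalition]

    def best(i, used):
        if i == len(inside):
            return 0
        u, v, w = inside[i]
        skip = best(i + 1, used)
        if u in used or v in used:
            return skip
        return max(skip, w + best(i + 1, used | {u, v}))

    return best(0, frozenset())


def shapley_values(players, edges):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: Fraction(0) for p in players}
    for order in permutations(players):
        coalition, before = set(), 0
        for p in order:
            coalition.add(p)
            after = max_matching_weight(edges, coalition)
            totals[p] += after - before
            before = after
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Path a-b-c with unit weights: b is needed for any match, so it earns the most.
print(shapley_values(["a", "b", "c"], [("a", "b", 1), ("b", "c", 1)]))
# b gets 2/3; a and c get 1/6 each
```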

An Algorithm for Constructing and Solving Imperfect Recall Abstractions of Large Extensive-Form Games
Jiri Cermak, Branislav Bošanský, Viliam Lisý
We solve large two-player zero-sum extensive-form games with perfect recall. We propose a new algorithm based on fictitious play that significantly reduces memory requirements for storing average strategies. The key feature is exploiting imperfect recall abstractions while preserving the convergence rate and guarantees of fictitious play applied directly to the perfect recall game. The algorithm creates a coarse imperfect recall abstraction of the perfect recall game and automatically refines its information set structure only where the imperfect recall might cause problems. Experimental evaluation shows that our novel algorithm is able to solve a simplified poker game with $7 \cdot 10^5$ information sets using an abstracted game with only 1.8% of the information sets of the original game. Additional experiments on poker and randomly generated games suggest that the relative size of the abstraction decreases as the size of the solved games increases.
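
Plain fictitious play, the method the algorithm builds on, has each player best-respond to the opponent's empirical average strategy. The sketch below runs simultaneous fictitious play on a small zero-sum matrix game, assuming ties are broken toward the lowest-index action; it omits the imperfect recall abstraction and refinement that are the paper's contribution, and the function name is hypothetical.

```python
def fictitious_play(payoff, iterations):
    """Simultaneous fictitious play for a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives its negation.
    Returns the empirical (average) mixed strategies of both players.
    """
    m, n = len(payoff), len(payoff[0])
    row_counts = [0] * m
    col_counts = [0] * n
    row_counts[0] += 1  # arbitrary initial pure strategies
    col_counts[0] += 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical play so far.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n)) for i in range(m)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(m)) for j in range(n)]
        best_row = max(range(m), key=lambda i: row_values[i])  # row maximizes
        best_col = min(range(n), key=lambda j: col_values[j])  # column minimizes
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = iterations + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


matching_pennies = [[1, -1], [-1, 1]]
row_avg, col_avg = fictitious_play(matching_pennies, 10_000)
print(row_avg, col_avg)  # both should be close to [0.5, 0.5]
```

Only the action counts are stored here; in large extensive-form games it is the storage of such average strategies that the abstraction is designed to shrink.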

Nash Equilibria in Concurrent Games with Lexicographic Preferences
Julian Gutierrez, Aniello Murano, Giuseppe Perelli, Sasha Rubin, Michael Wooldridge
We study concurrent games with finite-memory strategies where players are given a Büchi and a mean-payoff objective, which are related by a lexicographic order: a player first prefers to satisfy its Büchi objective, and then prefers to minimise costs, which are given by a mean-payoff function. In particular, we show that the existence of a strict Nash equilibrium in such games is decidable, even if players' deviations are implemented as infinite-memory strategies.

Multiple-Profile Prediction-of-Use Games
Andrew Perrault, Craig Boutilier
Prediction-of-use (POU) games (Robu et al., 2017) address the mismatch between energy supplier costs and the incentives imposed on consumers by a fixed-rate electricity tariff. However, the framework does not address how consumers should coordinate to maximize social welfare. To address this, we develop MPOU games, an extension of POU games in which agents report multiple acceptable electricity use profiles. We show that MPOU games share many attractive properties with POU games (e.g., convexity). Despite this, MPOU games introduce new incentive issues that prevent the consequences of convexity from being exploited directly, a problem we analyze and resolve. We validate our approach with experimental results using utility models learned from real electricity use data.
Thursday 24 08:30–10:00 MASCOCO: Coordination and Cooperation

COG-DICE: An Algorithm for Solving Continuous-Observation Dec-POMDPs
Madison Clark-Turner, Chris Amato
The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need to find an adequate discretization.

Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits
Mithun Chakraborty, Kai Yee Phoebe Chua, Sanmay Das, Brendan Juba
In this paper, we introduce a multi-agent multi-armed bandit-based model for ad hoc teamwork with expensive communication. The goal of the team is to maximize the total reward gained from pulling arms of a bandit over a number of epochs. In each epoch, each agent decides whether to pull an arm, or to broadcast the reward it obtained in the previous epoch to the team and forgo pulling an arm. These decisions must be made only on the basis of the agent’s private information and the public information broadcast prior to that epoch. We first benchmark the achievable utility by analyzing an idealized version of this problem where a central authority has complete knowledge of rewards acquired from all arms in all epochs and uses a multiplicative weights update algorithm for allocating arms to agents. We then introduce an algorithm for the decentralized setting that uses a value-of-information-based communication strategy and an exploration-exploitation strategy based on the centralized algorithm, and show experimentally that it converges rapidly to the performance of the centralized method.
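
The centralized benchmark relies on a multiplicative weights update. The sketch below shows the textbook exponential-weights (Hedge) rule over arms with full-information rewards, assuming rewards in [0, 1] and a hypothetical learning rate `eta`; how the resulting distribution is used to allocate arms to agents is not specified in the abstract and is not modeled here.

```python
import math


def multiplicative_weights(reward_rounds, eta=0.5):
    """Hedge-style multiplicative weights with full-information rewards in [0, 1].

    reward_rounds: list of per-round reward vectors (one entry per arm).
    Returns the probability distribution over arms after all rounds.
    """
    n_arms = len(reward_rounds[0])
    weights = [1.0] * n_arms
    for rewards in reward_rounds:
        # Multiply each arm's weight by exp(eta * reward): better arms grow faster.
        weights = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


# Arm 1 always pays 1 and arm 0 always pays 0, so probability mass shifts to arm 1.
probs = multiplicative_weights([[0.0, 1.0]] * 10)
print(probs)
```

A larger `eta` concentrates the distribution faster but reacts more strongly to noisy rewards.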

Probability Bounds for Overlapping Coalition Formation
Michail Mamakos, Georgios Chalkiadakis
In this work, we provide novel methods that use probability bounds to assess the ability of teams of agents to accomplish coalitional tasks. Our first method is based on an improvement of the Paley-Zygmund inequality, while the second and third are devised from manipulations of the two-sided Chebyshev inequality and Hoeffding’s inequality, respectively. Agents have no knowledge of the amount of resources others possess, and hold private Bayesian beliefs regarding the potential resource investment of every other agent. Our methods allow agents to demand that certain confidence levels are reached regarding the resource contributions of the various coalitions. To tackle real-world scenarios, we allow agents to form overlapping coalitions, so that one can simultaneously be part of a number of coalitions. We thus present a protocol for iterated overlapping coalition formation (OCF), through which agents can complete tasks that grant them utility. Agents lie on a social network, and their distance affects their likelihood of cooperating towards the completion of a task. We confirm our methods’ effectiveness by testing them on both a random graph of 300 nodes and a real-world social network of 4039 nodes.
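
As an illustration of the third ingredient, Hoeffding's inequality states that for independent X_i ∈ [a_i, b_i] with sum S and any t > 0, P(S ≤ E[S] − t) ≤ exp(−2t² / Σ(b_i − a_i)²). The sketch below turns this into a lower bound on the probability that a coalition's total contribution reaches a task requirement, assuming the contributions are independent, bounded, and have known means (for example, the means of an agent's beliefs); the function name is hypothetical and this is not the paper's exact procedure.

```python
import math


def hoeffding_success_lower_bound(means, lows, highs, requirement):
    """Lower bound on P(sum of contributions >= requirement) via Hoeffding's inequality.

    Assumes independent contributions, where contribution i lies in
    [lows[i], highs[i]] and has expected value means[i].
    """
    t = sum(means) - requirement
    if t <= 0:
        return 0.0  # requirement at or above the mean: no nontrivial guarantee
    spread = sum((h - l) ** 2 for l, h in zip(lows, highs))
    if spread == 0:
        return 1.0  # deterministic contributions whose sum exceeds the requirement
    # P(S < R) <= P(S <= E[S] - t) <= exp(-2 t^2 / spread)
    return 1.0 - math.exp(-2.0 * t * t / spread)


# Two agents, each contributing between 0 and 10 units with mean 5; the task needs 4.
print(hoeffding_success_lower_bound([5, 5], [0, 0], [10, 10], 4))  # about 0.302
```

An agent demanding a confidence level such as 0.9 would accept the coalition only if this bound reaches it.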

Multi-Agent Planning with Baseline Regret Minimization
Feng Wu, Shlomo Zilberstein, Xiaoping Chen
We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least as good as the baseline policy. We also propose an iterative belief generation algorithm to effectively and efficiently minimize the baseline regret, which requires only the iterations necessary to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm its advantage compared to the state-of-the-art approaches.

Object Allocation via Swaps along a Social Network
Anaëlle Wilczynski, Laurent Gourvès, Julien Lesca
This article deals with object allocation where each agent receives a single item. Starting from an initial endowment, the agents can be better off by exchanging their objects. However, not all trades are possible because some participants are unable to communicate. By considering that the agents are embedded in a social network, we propose to study the allocations emerging from a sequence of simple swaps between pairs of neighbors in the network. This model raises natural questions regarding (i) the reachability of a given assignment, (ii) the ability of an agent to obtain a given object, and (iii) the search for Pareto-efficient allocations. We investigate the complexity of these problems by providing, according to the structure of the social network, polynomial and NP-complete cases.
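
Question (i) can be answered on small instances by breadth-first search over allocations. This sketch assumes that a swap between two neighbors happens only when both strictly prefer the object they receive, and that preferences are strict rankings; the function name is hypothetical, and the search space grows exponentially in general.

```python
from collections import deque


def reachable_allocations(initial, edges, prefs):
    """All allocations reachable from `initial` by mutually improving swaps along edges.

    initial: tuple where initial[agent] is the object that agent holds.
    edges: list of (agent, agent) pairs that can trade (the social network).
    prefs: prefs[agent] lists objects from most to least preferred.
    """
    rank = [{obj: pos for pos, obj in enumerate(p)} for p in prefs]
    start = tuple(initial)
    seen = {start}
    queue = deque([start])
    while queue:
        alloc = queue.popleft()
        for a, b in edges:
            oa, ob = alloc[a], alloc[b]
            # A lower rank means more preferred; both agents must strictly gain.
            if rank[a][ob] < rank[a][oa] and rank[b][oa] < rank[b][ob]:
                nxt = list(alloc)
                nxt[a], nxt[b] = ob, oa
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


# Three agents on a path 0-1-2; can agent 0 ever obtain object "z"?
prefs = [["z", "y", "x"], ["x", "z", "y"], ["y", "x", "z"]]
reach = reachable_allocations(("x", "y", "z"), [(0, 1), (1, 2)], prefs)
print(sorted(reach))
print(any(alloc[0] == "z" for alloc in reach))  # True, via ("x","z","y") then ("z","x","y")
```

Question (ii) reduces to the `any(...)` check over the reachable set, as in the last line.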

Manipulating Opinion Diffusion in Social Networks
Robert Bredereck, Edith Elkind
We consider opinion diffusion in binary influence networks, where at each step one or more agents update their opinions so as to be in agreement with the majority of their neighbors. We consider several ways of manipulating the majority opinion in a stable outcome, such as bribing agents, adding/deleting links, and changing the order of updates, and investigate the computational complexity of the associated problems, identifying tractable and intractable cases.
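
The effect of changing the update order can be seen on a three-agent path. The sketch below assumes asynchronous dynamics in which exactly one agent updates per step, chosen as the first unstable agent in a given priority order, and in which ties leave an agent's opinion unchanged; the function name is hypothetical. Each switch to a strict majority reduces the number of disagreeing edges, so the loop terminates.

```python
def diffuse(opinions, neighbors, order):
    """Asynchronous majority dynamics on a binary influence network.

    opinions: list of 0/1 opinions; neighbors[i]: list of agent i's neighbors.
    order: priority list; at each step the first agent in `order` whose opinion
    differs from a strict majority of its neighbors switches. Returns the stable outcome.
    """
    ops = list(opinions)

    def unstable(i):
        ones = sum(ops[j] for j in neighbors[i])
        zeros = len(neighbors[i]) - ones
        # Ties (including agents without neighbors) keep the current opinion.
        majority = 1 if ones > zeros else (0 if zeros > ones else ops[i])
        return majority != ops[i]

    while True:
        movers = [i for i in order if unstable(i)]
        if not movers:
            return ops
        ops[movers[0]] = 1 - ops[movers[0]]


path = [[1], [0, 2], [1]]
print(diffuse([1, 0, 1], path, [1, 0, 2]))  # [1, 1, 1]: the middle agent moves first
print(diffuse([1, 0, 1], path, [0, 1, 2]))  # [0, 0, 0]: an end agent moves first
```

The same initial opinions thus stabilize to opposite consensus outcomes depending only on the update order.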
Thursday 24 08:30–10:00 MLTAML1: Transfer, Adaptation, Multi-Task Learning 1

Learning Latest Classifiers without Additional Labeled Data
Atsutoshi Kumagai, Tomoharu Iwata
In various applications such as spam mail classification, the performance of classifiers deteriorates over time. Although retraining classifiers using labeled data helps to maintain performance, continuously preparing labeled data is quite expensive. In this paper, we propose a method to learn classifiers by using newly obtained unlabeled data, which are easy to prepare, as well as labeled data collected beforehand. A major reason for the performance deterioration is the emergence of new features that do not appear in the training phase. Another major reason is the change of the distribution between the training and test phases. The proposed method learns the latest classifiers that overcome both problems. With the proposed method, the conditional distribution of new features given existing features is learned using the unlabeled data. In addition, the proposed method estimates the density ratio between training and test distributions by using the labeled and unlabeled data. We approximate the test-phase classification error of a classifier that exploits new features as well as existing features by incorporating both the conditional distribution of new features and the density ratio simultaneously. By minimizing the approximated error while integrating out new feature values, we obtain a classifier that exploits new features and fits the test phase. The effectiveness of the proposed method is demonstrated with experiments using synthetic and real-world data sets.

Dependency Exploitation: A Unified CNN-RNN Approach for Visual Emotion Recognition
Xinge Zhu, Liang Li, Weigang Zhang, Tianrong Rao, Min Xu, Qingming Huang, Dong Xu
Visual emotion recognition aims to associate images with appropriate emotions. Different visual stimuli, from low-level to high-level, such as color, texture, part, and object, can affect human emotion. However, most existing methods treat different levels of features as independent entities without an effective method for feature fusion. In this paper, we propose a unified CNN-RNN model to predict emotion based on the fused features from different levels by exploiting the dependency among them. Our proposed architecture leverages a convolutional neural network (CNN) with multiple layers to extract different levels of features within a multi-task learning framework, in which two related loss functions are introduced to learn the feature representation. Considering the dependencies within the low-level and high-level features, a new bidirectional recurrent neural network (RNN) is proposed to integrate the learned features from different layers in the CNN model. Extensive experiments on both Internet images and art photo datasets demonstrate that our method outperforms the state-of-the-art methods with at least 7% performance improvement.

Learning Discriminative Correlation Subspace for Heterogeneous Domain Adaptation
Yuguang Yan, Wen Li, Michael Ng, Mingkui Tan, Qingyao Wu, Hanrui Wu, Huaqing Min
Domain adaptation aims to reduce the effort of collecting and annotating target data by leveraging knowledge from a different source domain. The domain adaptation problem becomes extremely challenging when the feature spaces of the source and target domains are different, which is known as the heterogeneous domain adaptation (HDA) problem. In this paper, we propose a novel HDA method to find the optimal discriminative correlation subspace for the source and target data. The discriminative correlation subspace is inherited from the canonical correlation subspace between the source and target data, and is further optimized to maximize the discriminative ability for the target domain classifier. We formulate a joint objective in order to simultaneously learn the discriminative correlation subspace and the target domain classifier. We then apply an alternating direction method of multipliers (ADMM) algorithm to address the resulting non-convex optimization problem. Comprehensive experiments on two real-world data sets demonstrate the effectiveness of the proposed method compared to the state-of-the-art methods.

AccGenSVM: Selectively Transferring from Previous Hypotheses
Diana Benavides-Prado, Yun Sing Koh, Patricia Riddle
In our research, we consider transfer learning scenarios where a target learner does not have access to the source data, but only to hypotheses or models induced from it. This is called the Hypothesis Transfer Learning (HTL) problem. Previous approaches concentrated on transferring source hypotheses as a whole. We introduce a novel method for selectively transferring elements from previous hypotheses learned with Support Vector Machines. The representation of an SVM hypothesis as a set of support vectors allows us to treat this information as privileged to aid learning during a new task. Given a possibly large number of source hypotheses, our approach selects the source support vectors that most closely resemble the target data, and transfers their learned coefficients as constraints on the coefficients to be learned. This strategy increases the importance of relevant target data points based on their similarity to source support vectors, while learning from the target data. Our method shows substantial improvements in convergence rate on three classification datasets of varying sizes, decreasing the number of iterations by up to 56% on average compared to learning with no transfer and up to 92% compared to regular HTL, while maintaining similar accuracy levels.

Privileged Multi-label Learning
Shan You, Chang Xu, Yunhe Wang, Chao Xu, Dacheng Tao
This paper presents privileged multi-label learning (PrML) to explore and exploit the relationship between labels in multi-label learning problems. We suggest that each individual label can not only be implicitly connected with other labels via the low-rank constraint over label predictors, but can also receive explicit comments on its performance on examples from the other labels, which together act as an Oracle teacher. We generate a privileged label feature for each example and its individual label, and then integrate it into the framework of low-rank based multi-label learning. The proposed algorithm can therefore comprehensively explore and exploit label relationships by inheriting all the merits of privileged information and low-rank constraints. We show that PrML can be efficiently solved by a dual coordinate descent algorithm using an iterative optimization strategy with cheap updates. Experiments on benchmark datasets show that, through privileged label features, performance can be significantly improved, and PrML is superior to several competing methods in most cases.

Boosted Zero-Shot Learning with Semantic Correlation Regularization
Te Pi, Xi Li, Zhongfei (Mark) Zhang
We study zero-shot learning (ZSL) as a transfer learning problem, and focus on two key aspects of ZSL: model effectiveness and model adaptation. For effective modeling, we adopt the boosting strategy to learn a zero-shot classifier from weak models to a strong model. For adaptable knowledge transfer, we devise a Semantic Correlation Regularization (SCR) approach to regularize the boosted model to be consistent with the inter-class semantic correlations. With SCR embedded in the boosting objective, and with self-controlled sample selection for learning robustness, we propose a unified framework, Boosted Zero-shot classification with Semantic Correlation Regularization (BZ-SCR). By balancing the SCR-regularized boosted model selection and the self-controlled sample selection, BZ-SCR is capable of capturing both discriminative and adaptable feature-to-class semantic alignments, while ensuring the reliability and adaptability of the learned samples. The experiments on two ZSL datasets show the superiority of BZ-SCR over the state-of-the-art methods.
Thursday 24 08:30–10:00 MLREL1: Reinforcement Learning 1

Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions
Aijun Bai
In the context of hierarchical reinforcement learning, the idea of hierarchies of abstract machines (HAMs) is to write a partial policy as a set of hierarchical finite state machines with unspecified choice states, and to use reinforcement learning to learn an optimal completion of this partial policy. Given a HAM with a potentially deep hierarchical structure, there often exist many internal transitions where a machine calls another machine with the environment state unchanged. In this paper, we propose a new hierarchical reinforcement learning algorithm that discovers such internal transitions automatically and short-circuits them recursively in the computation of Q values. The resulting HAMQ-INT algorithm significantly outperforms the state of the art on the benchmark Taxi domain and on a much more complex RoboCup Keepaway domain.

Multi-Task Deep Reinforcement Learning for Continuous Action Control
Zhaoyang Yang, Kathryn Merrick, Hussein Abbass, Lianwen Jin
In this paper, we propose a deep reinforcement learning algorithm to learn multiple tasks concurrently. The algorithm uses a new network architecture that reduces the number of parameters needed per task by more than 75% compared to typical single-task deep reinforcement learning algorithms. The proposed algorithm and network fuse images with sensor data and were tested with up to 12 movement-based control tasks on a simulated Pioneer 3-AT robot equipped with a camera and range sensors. Results show that the proposed algorithm and network can learn skills that are as good as the skills learned by a comparable single-task learning algorithm. Results also show that learning performance remains consistent even as the number of tasks and the number of constraints on the tasks increase.

End-to-end optimization of goal-driven and visually grounded dialogue systems
Florian Strub, Harm de Vries, Jérémie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin
End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision may fail to correctly render the planning problem inherent to dialogue as well as its contextual and grounded nature. In this paper, we introduce a Deep Reinforcement Learning method, based on the policy gradient algorithm, to optimize visually grounded task-oriented dialogues. This approach is tested on the question generation task from the GuessWhat?! dataset, which contains 120k dialogues, and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.

Sequence Prediction with Unlabeled Data by Reward Function Learning
Lijun Wu, Li Zhao, Tao Qin, Tie-Yan Liu, Jianhuang Lai
Reinforcement learning (RL), which has been successfully applied to sequence prediction, introduces reward as a sequence-level supervision signal to evaluate the quality of a generated sequence. Existing RL approaches use the ground-truth sequence to define the reward, which limits the application of RL techniques to labeled data. Since labeled data is usually scarce and/or costly to collect, it is desirable to leverage large-scale unlabeled data. In this paper, we extend existing RL methods for sequence prediction to exploit unlabeled data. We propose to learn the reward function from labeled data and use the predicted reward as a pseudo reward for unlabeled data, so that we can learn from unlabeled data using the pseudo reward. To obtain good pseudo rewards on unlabeled data, we propose an RNN-based reward network with an attention mechanism, trained with a purposely biased data distribution. Experiments show that the pseudo reward provides good supervision and guides the learning process on unlabeled data. We observe significant improvements on both neural machine translation and text summarization.

Autonomous Task Sequencing for Customized Curriculum Design in Reinforcement Learning
Sanmit Narvekar, Jivko Sinapov, Peter Stone
Transfer learning is a method where an agent reuses knowledge learned in a source task to improve learning on a target task. Recent work has shown that transfer learning can be extended to the idea of curriculum learning, where the agent incrementally accumulates knowledge over a sequence of tasks (i.e., a curriculum). In most existing work, such curricula have been constructed manually. Furthermore, they are fixed ahead of time and do not adapt to the progress or abilities of the agent. In this paper, we formulate the design of a curriculum as a Markov Decision Process (MDP), which directly models the accumulation of knowledge as an agent interacts with tasks, and propose a method that approximates an execution of an optimal policy in this MDP to produce an agent-specific curriculum. We use our approach to automatically sequence tasks for 3 agents with varying sensing and action capabilities in an experimental domain, and show that our method produces curricula customized for each agent that improve performance relative to learning from scratch or using a different agent's curriculum.

Improving Reinforcement Learning with Confidence-Based Demonstrations
Zhaodong Wang, Matt Taylor
Reinforcement learning has had many successes, but in practice it often requires significant amounts of data to learn high-performing policies. One common way to improve learning is to allow a trained (source) agent to assist a new (target) agent. The goals in this setting are to 1) improve the target agent's performance, relative to learning unaided, and 2) allow the target agent to outperform the source agent. Our approach leverages source agent demonstrations, removing any requirements on the source agent's learning algorithm or representation. The target agent then estimates the source agent's policy and improves upon it. The key contribution of this work is to show that leveraging the target agent's uncertainty in the source agent's policy can significantly improve learning in two complex simulated domains, Keepaway and Mario.
Thursday 24 08:30  10:00 MLMIML  Multi-Instance and Multi-Label Learning

Multi-Instance Learning with Key Instance Shift
Ya-Lin Zhang, Zhi-Hua Zhou
Multi-instance learning (MIL) deals with tasks in which each example is represented by a bag of instances. A bag is positive if it contains at least one positive instance, and negative otherwise. The positive instances are also called key instances. Only bag labels are observed; specific instance labels are not available in MIL. Previous studies typically assume that training and test data follow the same distribution, an assumption that may be violated in many real-world tasks. In this paper, we address the problem that the distribution of key instances varies between the training and test phases. We refer to this problem as MIL with key instance shift and solve it by proposing an embedding-based method, MIKI. Specifically, to transform the bags into informative vectors, we propose a weighted multi-class model to select the instances with high positiveness as instance prototypes. Then we learn the importance weights for the transformed bag vectors and incorporate the original instance weights into them to narrow the gap between the training and test distributions. Experimental results validate the effectiveness of our approach when key instance shift occurs.
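
The bag-labeling rule stated above can be written as a short Python sketch. It illustrates only the standard MIL assumption, not MIKI itself; the function names are hypothetical, and instance labels are visible here purely for illustration, since in MIL they are hidden during training.

```python
from typing import List, Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL rule: 1 if the bag holds at least one positive (key) instance, else 0."""
    return int(any(label == 1 for label in instance_labels))


def key_instances(instance_labels: Sequence[int]) -> List[int]:
    """Indices of the key (positive) instances inside one bag."""
    return [i for i, label in enumerate(instance_labels) if label == 1]


# Hypothetical bags; in practice only the bag labels would be observed.
bags = [[0, 0, 1, 0], [0, 0, 0], [1, 1]]
print([bag_label(b) for b in bags])  # [1, 0, 1]
print(key_instances(bags[0]))        # [2]
```

Under key instance shift, the instances returned by `key_instances` would follow a different distribution at test time than during training, while this labeling rule stays the same.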

Deep Multiple Instance Hashing for Object-based Image Retrieval
Wanqing Zhao, Ziyu Guan, Hangzai Luo, Jinye Peng, Jianping Fan
Multi-keyword queries are widely supported in text search engines. However, their analogue in image retrieval systems, the multi-object query, is rarely studied. Meanwhile, traditional object-based image retrieval methods often involve multiple separate steps and need expensive location labeling to detect objects. In this work, we propose a weakly supervised Deep Multiple Instance Hashing (DMIH) framework for object-based image retrieval. DMIH integrates object detection and hashing learning on the basis of a popular CNN model to build an end-to-end relation between a raw image and the binary hash codes of the multiple objects in it. Specifically, we cast the detection of each object class as a binary multiple instance learning problem where the instances are object proposals extracted from multi-scale convolutional feature maps. For hashing training, we sample image pairs and learn their semantic relationships in terms of the hash codes of the most probable proposals for the labels each image carries, as guided by the object predictors. The two objectives benefit each other during learning. DMIH outperforms state-of-the-art methods on public benchmarks for object-based image retrieval and achieves promising results for multi-object queries.
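
Retrieval with binary hash codes typically ranks database items by Hamming distance. The sketch below assumes each image is described by one integer-encoded code per detected object and scores a multi-object query by summing, for each queried object, the distance to the closest object code in the image. This scoring rule, the 4-bit codes, and all names are illustrative assumptions, not DMIH's actual ranking procedure.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def multi_object_score(query_codes: List[int], image_codes: List[int]) -> int:
    """Hypothetical score: for every queried object, the distance to the closest
    object code in the image, summed over the query (lower is better)."""
    return sum(min(hamming(q, c) for c in image_codes) for q in query_codes)


def rank_images(query_codes: List[int], database: Dict[str, List[int]]) -> List[str]:
    """Image names sorted from best to worst multi-object match."""
    return sorted(database, key=lambda name: multi_object_score(query_codes, database[name]))


# Hypothetical 4-bit codes, one per object found in each image.
database = {"img_a": [0b1010, 0b1100], "img_b": [0b0101], "img_c": [0b1011, 0b0011]}
print(rank_images([0b1010, 0b0011], database))  # ['img_c', 'img_a', 'img_b']
```

XOR plus a bit count keeps each comparison cheap, which is the usual motivation for compact binary codes in retrieval.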

Saliency Guided End-to-End Learning for Weakly Supervised Object Detection
Baisheng Lai, Xiaojin Gong
Weakly supervised object detection (WSOD), the problem of learning detectors using only image-level labels, has been attracting increasing interest. However, this problem is quite challenging due to the lack of location supervision. To address this issue, this paper integrates saliency into a deep architecture in which location information is exploited both explicitly and implicitly. Specifically, we select highly confident object proposals under the guidance of class-specific saliency maps. The location information of the selected proposals, together with their semantic and saliency information, is then used to explicitly supervise the network through two additional losses. Meanwhile, a saliency prediction subnetwork is built into the architecture, and its predictions are used to implicitly guide the localization procedure. The entire network is trained end-to-end. Experiments on PASCAL VOC demonstrate that our approach outperforms all state-of-the-art methods.

Obtaining High-Quality Label by Distinguishing between Easy and Hard Items in Crowdsourcing
Wei Wang, Xiang-Yu Guo, Shao-Yuan Li, Yuan Jiang, Zhi-Hua Zhou
Crowdsourcing systems make it possible to hire voluntary workers to label large-scale data by offering them small monetary payments. Usually, the taskmaster needs to collect high-quality labels, but the quality of labels obtained from the crowd may not satisfy this requirement. In this paper, we study the problem of obtaining high-quality labels from the crowd and present an approach that learns the difficulty of items in crowdsourcing: we construct a small training set of items with estimated difficulty and then learn a model to predict the difficulty of future items. With the predicted difficulty, we can distinguish between easy and hard items to obtain high-quality labels. For easy items, the quality of the labels inferred from the crowd can be high enough to satisfy the requirement; for hard items, on which the crowd may not provide high-quality labels, it is better to choose a more knowledgeable crowd or employ specialized workers. The experimental results demonstrate that learning to distinguish between easy and hard items significantly improves label quality.
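
The easy/hard split described above can be pictured as a routing step once a difficulty predictor exists. In this minimal sketch the difficulty scores are given, the threshold of 0.5 is an arbitrary hypothetical choice, and plain majority voting stands in for whatever label-inference method is actually used; the difficulty-learning model itself is not shown.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def majority_vote(labels: List[Hashable]) -> Hashable:
    """Most frequent crowd label; Counter orders equal counts by first occurrence."""
    return Counter(labels).most_common(1)[0][0]


def route_items(
    items: List[str],
    predicted_difficulty: Dict[str, float],
    crowd_labels: Dict[str, List[Hashable]],
    threshold: float = 0.5,  # hypothetical cut-off between easy and hard
) -> Tuple[Dict[str, Hashable], List[str]]:
    """Easy items keep the crowd's majority label; hard items are set aside for experts."""
    resolved: Dict[str, Hashable] = {}
    for_experts: List[str] = []
    for item in items:
        if predicted_difficulty[item] < threshold:
            resolved[item] = majority_vote(crowd_labels[item])
        else:
            for_experts.append(item)
    return resolved, for_experts


difficulty = {"a": 0.1, "b": 0.9, "c": 0.3}  # hypothetical predictor outputs
labels = {"a": [1, 1, 0], "b": [0, 1], "c": [0, 0, 1]}
print(route_items(["a", "b", "c"], difficulty, labels))  # ({'a': 1, 'c': 0}, ['b'])
```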

Binary Linear Compression for Multi-label Classification
Wen-Ji Zhou, Yang Yu, Min-Ling Zhang
In multi-label classification tasks, labels are commonly related to each other. It is well recognized that utilizing label relationships is essential to multi-label learning. One way of utilizing label relationships is to map labels to a lower-dimensional space of uncorrelated labels, where the relationships are encoded in the mapping. Previous linear mapping methods commonly result in regression subproblems in the lower-dimensional label space. In this paper, we show that mapping to a low-dimensional multi-label regression problem can be worse than mapping to a classification problem, since regression requires a more complex model than classification. We then propose the binary linear compression (BILC) method, which results in a binary label space and hence in classification subproblems. Experiments on several multi-label datasets show that employing classification in the embedded space results in much simpler models than regression, leading to smaller structural risk. The proposed methods are also shown to be superior to some state-of-the-art approaches.

Incomplete Label Distribution Learning
Miao Xu, Zhi-Hua Zhou
Label distribution learning (LDL) assumes that labels can be associated with an instance to some degree, so it can learn the relevance of each label to a particular instance. Although LDL has been applied successfully in practice, one problem with existing LDL methods is that they are designed for data with complete supervised information, while in reality annotation information may be incomplete, because assigning each label a real value indicating its association with a particular instance is costly in labor and time. In this paper, we solve the LDL problem given incomplete supervised information. We propose an objective based on trace norm minimization to exploit the correlation between labels. We develop a proximal gradient descent algorithm and an algorithm based on the alternating direction method of multipliers (ADMM). Experiments validate the effectiveness of our proposal.
Thursday 24 08:30  10:00 SISMAS  Sister Conference Track: Multiagent Systems

Which is the Fairest (Rent Division) of Them All? [Abstract Only]
Ya'akov (Kobi) Gal, Moshe Mash, Ariel Procaccia, Yair Zick
What is a fair way to assign rooms to several housemates and divide the rent between them? This is not just a theoretical question: many people have used the Spliddit website to obtain envy-free solutions to rent division instances. But envy-freeness, in and of itself, is insufficient to guarantee outcomes that people view as intuitive and acceptable. We therefore focus on solutions that optimize a criterion of social justice, subject to the envy-freeness constraint, in order to pinpoint the “fairest” solutions. We develop a general algorithmic framework that enables the computation of such solutions in polynomial time. We then study the relations between natural optimization objectives, and identify the maximin solution, which maximizes the minimum utility subject to envy-freeness, as the most attractive. We demonstrate, in theory and using experiments on real data from Spliddit, that the maximin solution gives rise to significant gains in terms of our optimization objectives. Finally, a user study with Spliddit users as subjects demonstrates that people find the maximin solution to be significantly fairer than arbitrary envy-free solutions; this user study is unprecedented in that it asks people about their real-world rent division instances. Based on these results, the maximin solution has been deployed on Spliddit since April 2015.
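
The envy-freeness constraint and the maximin objective can be made concrete on a toy instance. This sketch assumes quasi-linear utilities (a housemate's utility is her value for her room minus its price), a fixed room assignment, and non-negative integer prices, and it finds the maximin prices by brute force, which is only feasible for tiny instances; it does not reproduce the polynomial-time framework from the abstract, and all names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple


def utilities(values: List[List[int]], assignment: List[int], prices: List[int]) -> List[int]:
    """Quasi-linear utility of agent i: value of her room minus its price."""
    return [values[i][room] - prices[room] for i, room in enumerate(assignment)]


def is_envy_free(values: List[List[int]], assignment: List[int], prices: List[int]) -> bool:
    """No agent strictly prefers someone else's room at that room's price."""
    for i, own in enumerate(assignment):
        own_utility = values[i][own] - prices[own]
        if any(values[i][room] - prices[room] > own_utility for room in assignment):
            return False
    return True


def maximin_integer_prices(
    values: List[List[int]], assignment: List[int], rent: int
) -> Tuple[Optional[int], Optional[List[int]]]:
    """Toy brute force: among envy-free, non-negative integer price vectors summing
    to the rent, return one that maximises the minimum utility."""
    n = len(values)
    best: Optional[int] = None
    best_prices: Optional[List[int]] = None
    for head in product(range(rent + 1), repeat=n - 1):
        last = rent - sum(head)
        if last < 0:
            continue
        prices = list(head) + [last]
        if not is_envy_free(values, assignment, prices):
            continue
        worst = min(utilities(values, assignment, prices))
        if best is None or worst > best:
            best, best_prices = worst, prices
    return best, best_prices


# Two housemates, two rooms, rent 100; agent 0 gets room 0, agent 1 gets room 1.
values = [[60, 40], [50, 50]]
print(maximin_integer_prices(values, [0, 1], 100))  # (5, [55, 45])
```

In this example every price for room 0 between 50 and 60 is envy-free, but only 55 equalizes the two utilities, which is the kind of choice the maximin criterion is meant to make among envy-free solutions. The fixed assignment here is the welfare-maximizing one (60 + 50 = 110 versus 40 + 50 = 90).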

Rationalisation of Profiles of Abstract Argumentation Frameworks: Extended Abstract
Stephane Airiau, Elise Bonzon, Ulle Endriss, Nicolas Maudet, Julien Rossit
We review a recently introduced model in which each of a number of agents is endowed with an abstract argumentation framework reflecting her individual views regarding a given set of arguments. A question arising in this context is whether the diversity of views observed in such a situation is consistent with the assumption that every individual argumentation framework is induced by a combination of, first, some basic factual information and, second, the personal preferences of the agent concerned. We treat this question of rationalisability of a profile as an algorithmic problem and identify tractable and intractable cases. This is useful for understanding what types of profiles can reasonably be expected to occur in a multiagent system.

Summary: Multi-Agent Path Finding with Kinematic Constraints
Wolfgang Hoenig, T. K. Satish Kumar, Liron Cohen, Hang Ma, Hong Xu, Nora Ayanian, Sven Koenig
Multi-Agent Path Finding (MAPF) is well studied in both AI and robotics. Given a discretized environment and agents with assigned start and goal locations, MAPF solvers from AI find collision-free paths for hundreds of agents with user-provided suboptimality guarantees. However, they ignore that actual robots are subject to kinematic constraints (such as velocity limits) and suffer from imperfect plan-execution capabilities. We therefore introduce MAPF-POST, which post-processes the output of a MAPF solver in polynomial time to create a plan-execution schedule that can be executed on robots. This schedule works on non-holonomic robots, considers kinematic constraints, provides a guaranteed safety distance between robots, and exploits slack to avoid time-intensive replanning in many cases. We evaluate MAPF-POST in simulation and on differential-drive robots, showcasing the practicality of our approach.

Evaluating Market User Interfaces for Electric Vehicle Charging using Bid2Charge
Sebastian Stein, Enrico Gerding, Adrian Nedea, Avi Rosenfeld, Nicholas Jennings
We consider settings where electric vehicle drivers participate in a market mechanism to charge their vehicles. Existing work typically assumes that participants are fully rational and can report their charging preferences accurately. However, this may not be reasonable in settings with non-experts. To explore this, we design a novel game called Bid2Charge and compare a fully expressive interface that covers the entire space of preferences with two restricted interfaces that offer fewer possible reports. We show that restricting the preferences users can report significantly reduces deliberation times while also increasing utility by up to 70%.
Thursday 24 08:30  10:00 EAR4  Early Career 4

Learning from Data Heterogeneity: Algorithms and Applications
Jingrui He
Nowadays, as an intrinsic property of big data, data heterogeneity can be seen in a variety of real-world applications, ranging from security to manufacturing and from healthcare to crowdsourcing. It refers to any inhomogeneity in the data and can be present in a variety of forms, corresponding to different types of data heterogeneity, such as task, view, instance, or oracle heterogeneity. As shown in previous work as well as our own, learning from data heterogeneity not only helps people gain a better understanding of large volumes of data, but also provides a means to leverage such data for effective predictive modeling. In this paper, along with multiple real applications, we briefly review state-of-the-art techniques for learning from data heterogeneity and demonstrate their performance at addressing these real-world problems.

Unsupervised Learning via Total Correlation Explanation
Greg Ver Steeg
Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Correlation Explanation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.
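
Total correlation is the sum of the marginal entropies minus the joint entropy, TC(X) = sum_i H(X_i) - H(X_1, ..., X_n); it is zero exactly when the variables are independent. The sketch below computes it for a small discrete joint distribution given as a dictionary; the representation and function names are illustrative choices, and the code measures dependence only, without learning any CorEx representation.

```python
from collections import defaultdict
from math import log2
from typing import Dict, Tuple


def entropy(dist: Dict) -> float:
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def total_correlation(joint: Dict[Tuple, float]) -> float:
    """TC(X) = sum_i H(X_i) - H(X_1, ..., X_n) for a discrete joint distribution."""
    n = len(next(iter(joint)))
    marginals = [defaultdict(float) for _ in range(n)]
    for outcome, p in joint.items():
        for i, x in enumerate(outcome):
            marginals[i][x] += p
    return sum(entropy(m) for m in marginals) - entropy(joint)


copies = {(0, 0): 0.5, (1, 1): 0.5}                          # X2 is a copy of X1
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}  # two fair, independent bits
print(total_correlation(copies))       # 1.0
print(total_correlation(independent))  # 0.0
```

Copying one fair bit produces exactly one bit of redundancy, the kind of dependence that, per the abstract, CorEx tries to explain.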

TBD
Emma Brunskill
None
Thursday 24 10:30  12:00 MLNN1  Neural Networks 1

Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks
Meishan Zhang, Guohong Fu, Nan Yu
State-of-the-art Chinese word segmentation systems typically exploit supervised models trained on a standard manually annotated corpus, achieving performance above 95% on a similar standard test corpus. However, performance may drop significantly when the same models are applied to Chinese microtext. One major challenge is the issue of informal words in microtext. Previous studies show that informal word detection can be helpful for microtext processing. In this work, we investigate it in the neural setting by proposing a joint segmentation model that simultaneously integrates the detection of informal words. In addition, we automatically generate a training corpus for the joint model from existing corpora. Experimental results show that the proposed model is highly effective for the segmentation of Chinese microtext.

Privacy Issues Regarding the Application of DNNs to Activity-Recognition using Wearables and Its Countermeasures by Use of Adversarial Training
Yusuke Iwasawa, Kotaro Nakayama, Ikuko Yairi, Yutaka Matsuo
Deep neural networks have been successfully applied to activity recognition with wearables in terms of recognition performance. However, the black-box nature of neural networks could lead to privacy concerns. Namely, it is generally hard to predict what neural networks learn from data, so they may unintentionally learn features that are highly discriminative of user information, which increases the risk of information disclosure. In this study, we analyzed the features learned by conventional deep neural networks when applied to data from wearables to confirm this phenomenon. Based on the results of our analysis, we propose the use of an adversarial training framework to suppress the risk of sensitive or unintended information disclosure. Our proposed model considers both an adversarial user classifier and a regular activity classifier during training, which allows the model to learn representations that help the classifier distinguish the activities but, at the same time, prevent it from accessing user-discriminative information. This paper provides an empirical validation of the privacy issue and of the efficacy of the proposed method using three activity recognition tasks based on data from wearables. The empirical validation shows that our proposed method suppresses these concerns without any significant performance degradation compared to conventional deep nets on all three tasks.

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Jingkuan Song, Zhao Guo, Wu Liu, Dongxiang Zhang, Heng Tao Shen, Lianli Gao
Recent progress has been made in using attention-based encoder-decoder frameworks for video captioning. However, most existing decoders apply the attention mechanism to every generated word, including both visual words (e.g., "gun" and "shooting") and non-visual words (e.g., "the", "a"). These non-visual words can be easily predicted by a natural language model without considering visual signals or attention. Imposing the attention mechanism on non-visual words could mislead the decoder and decrease the overall performance of video captioning. To address this issue, we propose a hierarchical LSTM with adjusted temporal attention (hLSTMat) for video captioning. Specifically, the proposed framework uses temporal attention to select specific frames for predicting related words, while the adjusted temporal attention decides whether to depend on the visual information or on the language context information. In addition, a hierarchical LSTM is designed to simultaneously consider both low-level visual information and deep semantic information to support video caption generation. To demonstrate the effectiveness of the proposed framework, we test our method on two prevalent datasets, MSVD and MSR-VTT, and experimental results show that our approach outperforms state-of-the-art methods on both datasets.
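
The idea of deciding between visual and language context can be sketched as softmax attention over frame features followed by a scalar gate. In this minimal sketch the attention scores, the gate value `beta`, and the language context vector are all supplied by hand; in a real model they would come from the decoder, and the exact hLSTMat formulation may differ from this simplified version.

```python
from math import exp
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Vector, vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def adjusted_context(frame_feats: List[Vector], frame_scores: Vector,
                     language_ctx: Vector, beta: float) -> Vector:
    """Temporal attention over frames, then a gate beta in [0, 1] that mixes the
    attended visual context with a language context vector.
    beta near 1 leans on the frames (visual words such as "gun");
    beta near 0 leans on the language context (non-visual words such as "the")."""
    alphas = softmax(frame_scores)
    visual_ctx = weighted_sum(alphas, frame_feats)
    return [beta * v + (1.0 - beta) * h for v, h in zip(visual_ctx, language_ctx)]


frames = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-D features for two frames
print(adjusted_context(frames, [0.0, 0.0], [2.0, 2.0], beta=0.5))  # [1.25, 1.25]
```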

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
Andrew Ross, Michael Hughes, Finale Doshi-Velez
Expressive classifiers such as neural networks are among the most accurate supervised learning methods in use today, but their opaque decision boundaries make them difficult to trust in critical applications. We propose a method to explain the predictions of any differentiable model via the gradient of the class label with respect to the input (which provides a normal to the decision boundary). Not only is this approach orders of magnitude faster at identifying input dimensions of high sensitivity than sample-based perturbation methods (e.g., LIME), but it also lends itself to efficiently discovering multiple qualitatively different decision boundaries as well as decision boundaries that are consistent with expert annotation. On multiple datasets, we show that our approach generalizes much better when test conditions differ from those in training.
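
Input-gradient explanations are easy to see on a model where the gradient has a closed form. For binary logistic regression, d/dx log p(y=1 | x) = (1 - p) w. The sketch below computes this, checks it against central finite differences, and adds a penalty on gradients in dimensions an annotator marks as irrelevant. The penalty form, the weights, and the mask are simplifying assumptions made for illustration, not the paper's exact loss.

```python
from math import exp, log
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def prob_positive(w: Vector, b: float, x: Vector) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w: Vector, b: float, x: Vector) -> Vector:
    """Closed-form d/dx log p(y=1 | x) for logistic regression: (1 - p) * w."""
    p = prob_positive(w, b, x)
    return [(1.0 - p) * wi for wi in w]


def explanation_penalty(w: Vector, b: float, x: Vector, mask: Vector, lam: float = 1.0) -> float:
    """lam * squared input gradient summed over dimensions an annotator marked
    as irrelevant (mask entry 1); meant to be added to the usual training loss."""
    grad = input_gradient(w, b, x)
    return lam * sum((m * g) ** 2 for m, g in zip(mask, grad))


def finite_difference(w: Vector, b: float, x: Vector, eps: float = 1e-6) -> Vector:
    """Central-difference check of input_gradient."""
    grads = []
    for d in range(len(x)):
        up, down = list(x), list(x)
        up[d] += eps
        down[d] -= eps
        grads.append((log(prob_positive(w, b, up)) - log(prob_positive(w, b, down))) / (2 * eps))
    return grads


w, b, x = [2.0, -1.0, 0.5], 0.1, [0.3, 0.7, -0.2]  # hypothetical model and input
print(input_gradient(w, b, x))
print(finite_difference(w, b, x))
print(explanation_penalty(w, b, x, mask=[0.0, 1.0, 0.0]))
```

One gradient evaluation per input is what makes this cheaper than sampling-based perturbation methods, which need many model evaluations around each input.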

Self-paced Convolutional Neural Networks
Hao Li, Maoguo Gong
Convolutional neural networks (CNNs) have achieved breakthrough performance in many pattern recognition tasks. To distinguish reliable data from noisy and confusing data, we improve CNNs with self-paced learning (SPL), enhancing their learning robustness. In the proposed self-paced convolutional network (SPCN), each sample is assigned a weight that reflects its easiness. A dynamic self-paced function is then incorporated into the learning objective of the CNN to jointly learn the CNN parameters and the latent weight variables. SPCN learns the samples from easy to complex, and the sample weights dynamically control the learning rates to help converge to better values. To gain more insight into SPCN, theoretical studies are conducted showing that SPCN converges to a stationary solution and is robust to noisy and confusing data. Experimental results on the MNIST and rectangles datasets demonstrate that the proposed method outperforms baseline methods.
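
The easy-to-complex schedule can be illustrated with the classic hard self-paced regularizer, which is not necessarily the dynamic self-paced function used in SPCN. Minimizing sum_i v_i L_i - lam * sum_i v_i over v_i in [0, 1] gives v_i = 1 exactly when L_i < lam, so raising the pace parameter lam admits harder samples. The losses and lam schedule below are hypothetical.

```python
from typing import List


def self_paced_weights(losses: List[float], lam: float) -> List[float]:
    """Closed-form minimiser of sum_i v_i * L_i - lam * sum_i v_i over v_i in [0, 1]:
    a sample is kept (v_i = 1) only if its loss is below the pace parameter lam."""
    return [1.0 if loss < lam else 0.0 for loss in losses]


def weighted_loss(losses: List[float], weights: List[float]) -> float:
    return sum(v * loss for v, loss in zip(weights, losses))


losses = [0.1, 0.5, 2.0, 4.0]  # hypothetical per-sample losses
for lam in (0.3, 1.0, 5.0):    # a growing lam admits progressively harder samples
    v = self_paced_weights(losses, lam)
    print(lam, v, weighted_loss(losses, v))
```

In a full training loop, this weight update would alternate with gradient steps on the weighted loss over the network parameters.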

Exemplar-Centered Supervised Shallow Parametric Data Embedding
Martin Renqiang Min, Hongyu Guo, Dongjin Song
Metric learning methods for dimensionality reduction in combination with k-Nearest Neighbors (kNN) have been extensively deployed in many classification, data embedding, and information retrieval applications. However, most of these approaches involve pairwise comparisons of training data and thus have quadratic computational complexity with respect to the size of the training set, preventing them from scaling to fairly large datasets. Moreover, during testing, comparing test data against all the training data points is also expensive in terms of both computational cost and required resources. Furthermore, previous metrics are either too constrained or too expressive to be learned well. To address these issues, we present an exemplar-centered supervised shallow parametric data embedding model, using a Maximally Collapsing Metric Learning (MCML) objective. Our strategy learns a shallow high-order parametric embedding function and compares training/test data only with learned or precomputed exemplars, resulting in a cost function with linear computational complexity for both training and testing. We also empirically demonstrate, using several benchmark datasets, that for classification in a two-dimensional embedding space, our approach not only speeds up kNN by hundreds of times but also outperforms state-of-the-art supervised embedding approaches.
Thursday 24 10:30  12:00 MLDMUL1  Data Mining and Unsupervised Learning 1

Discovering Relevance-Dependent Bicluster Structure from Relational Data
Iku Ohama, Takuya Kida, Hiroki Arimura
In this paper, we propose a statistical model for relevance-dependent biclustering to analyze relational data. The proposed model factorizes relational data into bicluster structure with two features: (1) each object in a cluster has a relevance value, which indicates how strongly the object relates to the cluster, and (2) all clusters are related to at least one dense block. These features simplify the task of understanding the meaning of each cluster because only a few highly relevant objects need to be inspected. We introduce the Relevance-Dependent Bernoulli Distribution (RBD) as a prior for relevance-dependent binary matrices and propose the novel Relevance-Dependent Infinite Biclustering (RIB) model, which automatically estimates the number of clusters. Posterior inference can be performed efficiently using a collapsed Gibbs sampler because the parameters of the RIB model can be fully marginalized out. Experimental results show that RIB extracts more essential bicluster structure with better computational efficiency than conventional models. We further observe that the biclustering results obtained by RIB facilitate interpretation of the meaning of each cluster.

Affinity Learning for Mixed Data Clustering
Nan Li, Longin Jan Latecki
In this paper, we propose a novel affinity-learning-based framework for mixed data clustering, which addresses how to process data with mixed-type attributes, how to learn affinities between data points, and how to exploit the learned affinities for clustering. In the proposed framework, each original data attribute is represented by several abstract objects defined according to the specific data type and values. Each attribute value is transformed into initial affinities between the data point and the abstract objects of that attribute. We refine these affinities and infer the unknown affinities between data points by taking into account the interconnections among the attribute values of all data points. The inferred affinities between data points can be exploited for clustering. Alternatively, the refined affinities between data points and the abstract objects of attributes can be transformed into new data features for clustering. Experimental results on many real-world data sets demonstrate that the proposed framework is effective for mixed data clustering.

Understanding People Lifestyles: Construction of Urban Movement Knowledge Graph from GPS Trajectory
Chenyi Zhuang, Nicholas Jing Yuan, Ruihua Song, Xing Xie, Qiang Ma
Technologies are increasingly taking advantage of the explosion in the amount of data generated by social multimedia (e.g., web searches, ad targeting, and urban computing). In this paper, we propose a multi-view learning framework for constructing a new urban movement knowledge graph, which could greatly facilitate the research domains mentioned above. In particular, by viewing GPS trajectory data from temporal, spatial, and spatiotemporal points of view, we construct a knowledge graph whose nodes and edges are locations and their relations, respectively. In the knowledge graph, both nodes and edges are represented in a latent semantic space. We verify its utility by applying the knowledge graph to predict the extent of user attention (high or low) paid to different locations in a city. Experimental evaluations and analysis on a real-world dataset show significant improvements in comparison to state-of-the-art methods.

Mining Convex Polygon Patterns with Formal Concept Analysis
Aimene Belfodil, Sergei O. Kuznetsov, Céline Robardet, Mehdi Kaytoue
Pattern mining is an important task in AI for eliciting hypotheses from data. When it comes to spatial data, the geo-coordinates are often considered independently as two different attributes. Consequently, rectangular patterns are searched for. Such an arbitrary shape cannot, in general, capture interesting regions. We thus introduce convex polygons, a good trade-off for capturing high-density areas in any pattern mining task. Our contribution is threefold: (i) we formally introduce such patterns in Formal Concept Analysis (FCA), (ii) we give all the basic building blocks for mining polygons with exhaustive search and pattern sampling, and (iii) we design several algorithms that we compare experimentally.
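
A natural way to describe a set of geo-points by a convex polygon is its convex hull, and the difference from a rectangular pattern is easy to see on a small example. The sketch below uses Andrew's monotone chain algorithm plus a point-in-polygon test. It is an illustrative assumption about how such a description could be computed, not the paper's FCA machinery or mining algorithms, and the points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order, without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop endpoints repeated across the two chains


def covers(polygon: List[Point], q: Point) -> bool:
    """True if q lies inside or on the boundary of a counter-clockwise convex
    polygon with at least three vertices."""
    n = len(polygon)
    return all(cross(polygon[i], polygon[(i + 1) % n], q) >= 0 for i in range(n))


# Three hypothetical points: their bounding rectangle contains (3, 3), their hull does not.
triangle = convex_hull([(0, 0), (4, 0), (0, 4)])
print(triangle)                                            # [(0, 0), (4, 0), (0, 4)]
print(covers(triangle, (1, 1)), covers(triangle, (3, 3)))  # True False
```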

See without looking: joint visualization of sensitive multi-site datasets
Debbrata Kumar Saha, Vince Calhoun, Sandeep Panta, Sergey Plis
Visualization of high-dimensional, large-scale datasets via an embedding into a 2D map is a powerful exploration tool for assessing latent structure in the data and detecting outliers. Many methods have been developed for this task, but most assume that all pairs of samples are available for common computation; specifically, the distances between all pairs of points need to be directly computable. In contrast, we work with sensitive neuroimaging data, where local sites cannot share their samples and the distances cannot be easily computed across sites. Yet the goal is to let all the local data participate in a collaborative computation without leaving their respective sites. In this scenario, a quality control tool that visualizes a decentralized dataset in its entirety via global aggregation of local computations is especially important, as it would allow screening of samples that cannot be evaluated otherwise. This paper introduces an algorithm to solve this problem: decentralized data stochastic neighbor embedding (dSNE). Based on the MNIST dataset, we introduce metrics for measuring embedding quality and use them to compare dSNE to its centralized counterpart. We also apply dSNE to a multi-site neuroimaging dataset with encouraging results.

Beyond The Nyström Approximation: Speeding Up Spectral Clustering Using Uniform Sampling And Weighted Kernel K-means
Mahesh Mohan, Claire Monteleoni
In this paper, we present a framework for spectral clustering based on the following simple scheme: sample a subset of the input points, compute the clusters for the sampled subset using weighted kernel k-means (Dhillon et al. 2004), and use the resulting centers to compute a clustering for the remaining data points. For the case where the points are sampled uniformly at random without replacement, we show that the number of samples required depends mainly on the number of clusters and the diameter of the set of points in the kernel space. Experiments show that the proposed framework outperforms approaches based on the Nyström approximation in terms of both accuracy and computation time.
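
The last step of the scheme above, clustering the remaining points using the centers found on the sample, can be done entirely with kernel evaluations, because the squared feature-space distance from x to a weighted cluster mean expands into kernel terms. In the sketch below, the sampled clusters are given directly (in the described scheme they would come from weighted kernel k-means on a uniform sample), and the RBF kernel with gamma = 1.0 is an arbitrary illustrative choice.

```python
from math import exp
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def rbf(a: Point, b: Point, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel; gamma = 1.0 is an arbitrary illustrative choice."""
    return exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kernel_distance_sq(x: Point, members: List[Point], weights: List[float], kernel: Kernel) -> float:
    """||phi(x) - m_C||^2 for the weighted feature-space mean m_C of a cluster:
    K(x, x) - 2 sum_j w_j K(x, x_j) / W + sum_{j,l} w_j w_l K(x_j, x_l) / W^2."""
    total = sum(weights)
    cross_term = sum(w * kernel(x, m) for w, m in zip(weights, members)) / total
    within = sum(
        wi * wj * kernel(mi, mj)
        for wi, mi in zip(weights, members)
        for wj, mj in zip(weights, members)
    ) / total ** 2
    return kernel(x, x) - 2.0 * cross_term + within


def assign(x: Point, clusters: List[Tuple[List[Point], List[float]]], kernel: Kernel = rbf) -> int:
    """Index of the sampled cluster whose kernel-space center is nearest to x."""
    dists = [kernel_distance_sq(x, members, weights, kernel) for members, weights in clusters]
    return min(range(len(clusters)), key=dists.__getitem__)


# Hypothetical clusters found on the sample: (member points, member weights).
clusters = [
    ([(0.0, 0.0), (0.2, 0.1)], [1.0, 1.0]),
    ([(5.0, 5.0), (5.1, 4.9)], [1.0, 1.0]),
]
print([assign(p, clusters) for p in [(0.1, 0.3), (4.8, 5.2)]])  # [0, 1]
```

Each unsampled point costs one kernel evaluation per sampled member of each cluster (the within-cluster term can be cached), which is why clustering only a sample and then assigning the rest is cheaper than working with the full kernel matrix.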
Thursday 24 10:30  12:00 CSCSO  Combinatorial Search and Optimisation

Weighted Model Integration with Orthogonal Transformations
David Merrell, Aws Albarghouthi, Loris D'Antoni
Weighted model counting and integration (WMC/WMI) are natural problems to which we can reduce many probabilistic inference tasks, e.g., in Bayesian networks, Markov networks, and probabilistic programs. Typically, we are given a first-order formula, where each satisfying assignment is associated with a weight (e.g., a probability of occurrence), and our goal is to compute the total weight of the formula. In this paper, we target exact inference techniques for WMI that leverage the power of satisfiability modulo theories (SMT) solvers to decompose a first-order formula in linear real arithmetic into a set of hyper-rectangular regions whose weight is easy to compute. We demonstrate the challenges of hyper-rectangular decomposition and present a novel technique that uses orthogonal transformations to transform formulas in order to enable efficient inference. Our evaluation demonstrates our technique's ability to reduce the time required to achieve exact probability bounds.

Contextual Covariance Matrix Adaptation Evolutionary Strategies
Abbas Abdolmaleki, Bob Price, Nuno Lau, Luis Paulo Reis, Gerhard Neumann
Many stochastic search algorithms are designed to optimize a fixed objective function to learn a task; i.e., if the objective function changes slightly, for example, due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for different ball locations. Such relearning is undesirable, as it is highly inefficient and many applications require fast adaptation to a new context or situation. Therefore, we investigate contextual stochastic search algorithms that can learn multiple, similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well-known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual tasks. Our new algorithm, called contextual CMA-ES, benefits from contextual learning while preserving all the features of standard CMA-ES, such as stability, avoidance of premature convergence, step-size control, and a minimal amount of parameter tuning.

From Decimation to Local Search and Back: A New Approach to MaxSAT
Shaowei Cai, Chuan Luo, Haochen Zhang
Maximum Satisfiability (MaxSAT) is an important NP-hard combinatorial optimization problem with many applications, and MaxSAT solving has attracted much interest. This work proposes a new incomplete approach to MaxSAT. We propose a novel decimation algorithm for MaxSAT and then combine it with a local search algorithm. Our approach works by alternating between the decimation algorithm and the local search algorithm, with useful information passed between them. Experiments show that our solver DeciLS achieves state-of-the-art performance on all unweighted benchmarks from the MaxSAT Evaluation 2016. Moreover, compared to SAT-based MaxSAT solvers, which have dominated industrial benchmarks for years, it performs better on industrial benchmarks and significantly better on application formulas from the SAT Competition. We also extend this approach to (Weighted) Partial MaxSAT. The resulting solvers significantly improve on local search solvers on crafted and industrial benchmarks, and are complementary to SAT-based solvers (performing better on WPMS crafted benchmarks).

A Reduction-Based Method for Coloring Very Large Graphs
Jinkun Lin, Shaowei Cai, Chuan Luo, Kaile Su
The graph coloring problem (GCP) is one of the most studied NP-hard problems and has numerous applications. Despite the practical importance of GCP, there is limited work on solving GCP for very large graphs. This paper explores techniques for solving GCP on very large real-world graphs. We first propose a reduction rule for GCP, which is based on a novel concept called degree-bounded independent set. The rule is applied iteratively, alternating between lower-bound computation and graph reduction. Based on this rule, we develop a novel method called FastColor, which also exploits fast clique and coloring heuristics. We carry out experiments comparing FastColor with the two best algorithms for coloring large graphs that we could find. Experiments on a broad range of real-world large graphs show the superiority of our method. Additionally, our method maintains both an upper bound and a lower bound on the optimal solution, so it proves a solution optimal whenever the upper bound meets the lower bound. In our experiments, it proves optimality for 97 out of 142 instances.
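The bound-keeping idea in the last two sentences can be illustrated with a small sketch. In this sketch, a greedy clique heuristic supplies the lower bound and a largest-degree-first greedy coloring supplies the upper bound. These heuristics are assumptions. FastColor's reduction rule based on degree-bounded independent sets is not reproduced here, and all function names are hypothetical.

```python
def greedy_clique(adj):
    """Grow a clique greedily from every vertex; its size is a lower bound on the chromatic number."""
    best = []
    for start in adj:
        clique = [start]
        # try neighbours of the start vertex in decreasing degree order
        for v in sorted(adj[start], key=lambda u: -len(adj[u])):
            if all(v in adj[u] for u in clique):
                clique.append(v)
        if len(clique) > len(best):
            best = clique
    return best


def greedy_coloring(adj):
    """Largest-degree-first greedy coloring; its number of colors is an upper bound."""
    colors = {}
    for v in sorted(adj, key=lambda u: -len(adj[u])):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:  # smallest color not used by an already colored neighbour
            c += 1
        colors[v] = c
    return colors


def color_with_bounds(adj):
    """Return (lower bound, upper bound, coloring, optimality proven?)."""
    lb = len(greedy_clique(adj))
    coloring = greedy_coloring(adj)
    ub = max(coloring.values()) + 1 if coloring else 0
    return lb, ub, coloring, lb == ub
```

The graph is a dict mapping each vertex to its set of neighbours. On a 5-cycle the bounds are 2 and 3, so nothing is proven. On a 4-clique with a pendant vertex both bounds equal 4, so the coloring is certified optimal.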

Front-to-End Bidirectional Heuristic Search with Near-Optimal Node Expansions
Nathan Sturtevant, Jingwei Chen, Robert Holte, Sandra Zilles
It is well known that any admissible unidirectional heuristic search algorithm must expand all states whose f-value is smaller than the optimal solution cost when using a consistent heuristic. Such states are called “surely expanded” (s.e.). A recent study characterized s.e. pairs of states for bidirectional search with consistent heuristics: if a pair of states is s.e., then at least one of the two states must be expanded. This paper derives a lower bound, VC, on the minimum number of expansions required to cover all s.e. pairs, and presents a new admissible front-to-end bidirectional heuristic search algorithm, Near-Optimal Bidirectional Search (NBS), that is guaranteed to do no more than 2VC expansions. We further prove that no admissible front-to-end algorithm has a worst case better than 2VC. Experimental results show that NBS competes with or outperforms existing bidirectional search algorithms, and often outperforms A* as well.

Estimating the size of search trees by sampling with domain knowledge
Gleb Belov, Samuel Esler, Dylan Fernando, Pierre Le Bodic, George Nemhauser
We show how recently defined abstract models of the Branch-and-Bound algorithm can be used to obtain information on how the nodes are distributed in B&B search trees. This information can be exploited directly as probabilities in a sampling algorithm given by Knuth that estimates the size of a search tree. This method reduces the offline estimation error by a factor of two on search trees from Mixed-Integer Programming instances.
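For reference, here is a sketch of Knuth's estimator generalized to non-uniform child-selection probabilities. Each probe follows one random root-to-leaf path. At each depth it adds the inverse probability of the path taken so far, which gives an unbiased estimate of the number of nodes. The `probs` hook is a hypothetical placeholder for where distribution information (such as that derived from the B&B models) would be plugged in. With uniform probabilities the estimator reduces to Knuth's original one.

```python
import random


def knuth_estimate(root, children, rng, probs=None):
    """One random probe of Knuth's tree-size estimator.

    children(node) -> list of child nodes.
    probs(node, kids) -> selection probability for each child (uniform if None).
    Returns an unbiased estimate of the number of nodes in the tree.
    """
    estimate, weight, node = 1.0, 1.0, root  # the root is counted once
    while True:
        kids = children(node)
        if not kids:
            return estimate
        p = probs(node, kids) if probs else [1.0 / len(kids)] * len(kids)
        i = rng.choices(range(len(kids)), weights=p)[0]
        weight /= p[i]      # inverse probability of reaching this child
        estimate += weight  # stands in for all nodes at this depth
        node = kids[i]
```

In practice the estimate is the average over many probes. If the probabilities were exactly proportional to subtree sizes, every probe would return the true size. Better domain knowledge about where the nodes lie therefore lowers the variance.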
Thursday 24 10:30  12:00 KRPREF  Preferences

Revisiting Unrestricted Rebut and Preferences in Structured Argumentation
Jesse Heyninck, Christian Straßer
In structured argumentation frameworks such as ASPIC+, rebuts are only allowed on conclusions produced by defeasible rules. This has been criticized as counterintuitive, especially in dialectical contexts. In this paper we show that ASPIC-, a system allowing for unrestricted rebuts, suffers from contamination problems. We remedy this shortcoming by generalizing the attack rule of unrestricted rebut. Our resulting system satisfies the usual rationality postulates for prioritized rule bases.

Pareto Optimal Allocation under Uncertain Preferences
Haris Aziz, Ronald de Haan, Baharak Rastegari
The assignment problem is one of the most well-studied settings in social choice, matching, and discrete allocation. We consider this problem with the additional feature that agents' preferences involve uncertainty. The setting with uncertainty leads to a number of interesting questions, including the following. How can we compute an assignment with the highest probability of being Pareto optimal? What is the complexity of computing the probability that a given assignment is Pareto optimal? Does there exist an assignment that is Pareto optimal with probability one? We consider these problems under two natural uncertainty models: (1) the lottery model, in which each agent has an independent probability distribution over linear orders, and (2) the joint probability model, which involves a joint probability distribution over preference profiles. For both models, we present a number of algorithmic and complexity results highlighting the differences and similarities in the complexity of the two models.
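To make the second question concrete, here is a brute-force sketch for tiny instances under the lottery model. It enumerates every preference profile (the product of the agents' independent lotteries) and sums the probabilities of the profiles in which the assignment is Pareto optimal. It assumes strict rankings and equally many agents and items, and it runs in exponential time. It only illustrates the quantity being computed and is not one of the paper's algorithms. The function names are hypothetical.

```python
from itertools import permutations, product


def is_pareto_optimal(assignment, profile):
    """assignment[i] is agent i's item; profile[i] is agent i's ranking, best first."""
    n = len(assignment)
    rank = [{item: r for r, item in enumerate(order)} for order in profile]
    for other in permutations(assignment):  # every reassignment of the same items
        weakly = all(rank[i][other[i]] <= rank[i][assignment[i]] for i in range(n))
        strictly = any(rank[i][other[i]] < rank[i][assignment[i]] for i in range(n))
        if weakly and strictly:
            return False  # 'other' Pareto-dominates the assignment
    return True


def prob_pareto_optimal(assignment, lotteries):
    """lotteries[i]: list of (probability, ranking) pairs for agent i (independent)."""
    total = 0.0
    for combo in product(*lotteries):
        p = 1.0
        for q, _ in combo:
            p *= q
        if is_pareto_optimal(assignment, [order for _, order in combo]):
            total += p
    return total
```

For example, suppose agent 0 surely ranks x above y, while agent 1 ranks x above y with probability 0.3. Then giving x to agent 0 is Pareto optimal with probability 1. Giving y to agent 0 is Pareto optimal only when agent 1 also prefers x, i.e. with probability 0.3.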

Fair Allocation based on Diminishing Differences
Erel Segal-Halevi, Haris Aziz, Avinatan Hassidim
Ranking alternatives is a natural way for humans to express their preferences. It is used in many settings, such as school choice (NY, Boston), course allocation, and the Israeli medical lottery. In some cases (such as the latter two), several "items" are given to each participant. Without any information on the underlying cardinal utilities, reasoning about the fairness of an allocation requires extending the ordinal item ranking to an ordinal bundle ranking. The most commonly used such extension is stochastic dominance (SD), where a bundle X is preferred over a bundle Y if its score is better according to all additive score functions. SD is a very conservative extension, under which few allocations are necessarily fair while many allocations are possibly fair. We propose a natural assumption on the underlying cardinal utilities of the players: the difference between two items at the top is larger than the difference between two items at the bottom. This assumption implies a preference extension which we call diminishing differences (DD), where a bundle X is preferred over a bundle Y if its score is better according to all additive score functions satisfying the DD assumption. We give a full characterization of allocations that are necessarily-proportional or possibly-proportional according to this assumption. Based on this characterization, we present a polynomial-time algorithm for finding a necessarily-DD-proportional allocation if it exists. Using simulations, we show that with high probability a necessarily-proportional allocation does not exist but a necessarily-DD-proportional allocation does. Moreover, that allocation is proportional according to the underlying cardinal utilities.

Dominance and Optimisation Based on Scale-Invariant Maximum Margin Preference Learning
Mojtaba Montazery, Nic Wilson
In the task of preference learning, there can be natural invariance properties that one might often expect a method to satisfy. These include (i) invariance to scaling of a pair of alternatives, e.g., replacing a pair (a,b) by (2a,2b); and (ii) invariance to rescaling of features across all alternatives. Maximum margin learning approaches satisfy such invariance properties for pairs of test vectors, but not for the preference input pairs, i.e., scaling the inputs in a different way could result in a different preference relation. In this paper we define and analyse more cautious preference relations that are invariant to the scaling of features, or inputs, or both simultaneously; this leads to computational methods for testing dominance with respect to the induced relations, and for generating optimal solutions among a set of alternatives. In our experiments, we compare the relations and their associated optimality sets based on their decisiveness, computation time and cardinality of the optimal set. We also discuss connections with imprecise probability.

Efficient Inference and Computation of Optimal Alternatives for Preference Languages Based On Lexicographic Models
Nic Wilson, Anne-Marie George
We analyse preference inference, through consistency, for general preference languages based on lexicographic models. We identify a property, which we call strong compositionality, that applies to many natural kinds of preference statement and that allows a greedy algorithm for determining consistency of a set of preference statements. We also consider different natural definitions of optimality, and their relations to each other, for general preference languages based on lexicographic models. Based on our framework, we show that testing consistency, and thus inference, is polynomial for a specific preference language which allows strict and non-strict statements, comparisons between outcomes and between partial tuples, both ceteris paribus and strong statements, and their combination. Computing different kinds of optimal sets is also shown to be polynomial; this is backed up by our experimental results.
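A much simplified illustration of the greedy idea follows. The sketch handles only strict comparisons between complete outcomes over binary variables. A lexicographic model is given as an ordered list of (variable, preferred value) pairs. This restricted setting and all names are assumptions for illustration. The paper's richer languages (non-strict statements, partial tuples, ceteris paribus and strong statements) are not covered.

At each step, the sketch appends any unused variable whose preferred value contradicts none of the still-unresolved statements, and it discards the statements that this variable decides. If no such variable exists, the statements are inconsistent.

```python
def lex_prefers(model, a, b):
    """True if outcome a is strictly preferred to b under the lexicographic model."""
    for var, preferred in model:
        if a[var] != b[var]:
            return a[var] == preferred
    return False  # a and b agree on every variable in the model


def pick_next(remaining, unused):
    # a (variable, value) pair is safe if it contradicts no unresolved statement
    for var in unused:
        for val in (0, 1):
            if all(a[var] == b[var] or a[var] == val for a, b in remaining):
                return var, val
    return None


def greedy_lex_model(statements, variables):
    """statements: list of (a, b), meaning a is strictly preferred to b.
    Returns a consistent lexicographic model, or None if none exists."""
    remaining, unused, model = list(statements), list(variables), []
    while remaining:
        choice = pick_next(remaining, unused)
        if choice is None:
            return None  # stuck: the statements are inconsistent
        var, val = choice
        model.append(choice)
        unused.remove(var)
        # statements that differ on var are now decided in favour of a (by safety)
        remaining = [(a, b) for a, b in remaining if a[var] == b[var]]
    return model
```

Getting stuck proves inconsistency in this restricted setting. If some model satisfied every statement, its earliest variable not yet used by the greedy procedure would always be safe to append, so the greedy step could never fail.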

Proposing a Highly Accurate Hybrid Component-Based Factorised Preference Model in Recommender Systems
Farhad Zafari, R. Rahmani, I. Moser
Recommender systems play an important role in today's electronic markets due to the large benefits they bring by helping businesses understand their customers' needs and preferences. The major preference components modelled by current recommender systems include user and item biases, feature value preferences, conditional dependencies, temporal preference drifts, and social influence on preferences. In this paper, we introduce a new hybrid latent factor model that achieves high accuracy by efficiently integrating all these preference components in a unified model. The proposed model employs gradient descent to optimise the model parameters, and an evolutionary algorithm to optimise the hyperparameters and gradient descent learning rates. Using two popular datasets, we investigate the interaction effects of the preference components with each other. We conclude that, depending on the dataset, different interactions exist between the preference components. Therefore, understanding these interaction effects is crucial in designing an accurate preference model for every preference dataset and domain. Our results show that on both datasets, different combinations of components result in different recommendation accuracies, suggesting that some parts of the model interact strongly. Moreover, these effects are highly dataset-dependent, suggesting the need to explore these effects before choosing the appropriate combination of components.
Thursday 24 10:30  12:00 MASFVVS  Formal Verification, Validation and Synthesis

Process Plan Controllers for Non-Deterministic Manufacturing Systems
Paolo Felli, Lavindra de Silva, Brian Logan, Svetan Ratchev
Determining the most appropriate means of producing a given product, i.e., which manufacturing and assembly tasks need to be performed in which order and how, is termed process planning. In process planning, abstract manufacturing tasks in a process recipe are matched to available manufacturing resources, e.g., CNC machines and robots, to give an executable process plan. A process plan controller then delegates each operation in the plan to specific manufacturing resources. In this paper we present an approach to the automated computation of process plans and process plan controllers. We extend previous work both to support non-deterministic (i.e., partially controllable) resources and to allow operations to be performed in parallel on the same part. We show how implicit fairness assumptions can be captured in this setting, and how this impacts the definition of process plans.

Parameterised Verification of Data-aware Multi-agent Systems
Francesco Belardinelli, Panagiotis Kouvaros, Alessio Lomuscio
We introduce parameterised data-aware multi-agent systems, a formalism to reason about the temporal-epistemic properties of arbitrarily large collections of homogeneous agents, each operating on an infinite data domain. We show that their parameterised verification problem is semi-decidable for classes of interest. This is demonstrated by separately addressing the unboundedness of the number of agents and of the data domain. In doing so, we reduce the parameterised model checking problem for these systems to that of parameterised verification for interleaved interpreted systems. We illustrate the expressivity of the formal model by modelling English auctions with an unbounded number of bidders on unbounded data, and show how the technique introduced here can be used to give formal guarantees on the resulting system behaviour.

A Novel Symbolic Approach to Verifying Epistemic Properties of Programs
Nikos Gorogiannis, Franco Raimondi, Ioana Boureanu
We introduce a framework for the symbolic verification of epistemic properties of programs expressed in a class of general-purpose programming languages. To this end, we reduce the verification problem to that of satisfiability of first-order formulae in appropriate theories. We prove the correctness of our reduction and we validate our proposal by applying it to two examples: the dining cryptographers problem and the ThreeBallot voting protocol. We put forward an implementation using existing solvers, and report experimental results showing that the approach can perform better than state-of-the-art symbolic model checkers for temporal-epistemic logic.

Verifying Fault-tolerance in Parameterised Multi-Agent Systems
Panagiotis Kouvaros, Alessio Lomuscio
We develop a technique to evaluate the fault-tolerance of a multi-agent system whose number of agents is unknown at design time. We present a method for injecting a variety of non-ideal behaviours, or faults, studied in the safety-analysis literature into the abstract agent templates that are used to generate an unbounded family of multi-agent systems with different sizes. We define the parameterised fault-tolerance problem as the decision problem of establishing whether any concrete system, in which the ratio of faulty versus non-faulty agents is under a given threshold, satisfies a given temporal-epistemic specification. We put forward a sound and complete technique for solving the problem for the semantical setup considered. We present an implementation and a case study identifying the threshold under which the alpha swarm aggregation algorithm is robust to faults against its temporal-epistemic specifications.

Verification of Broadcasting Multi-Agent Systems against an Epistemic Strategy Logic
Francesco Belardinelli, Alessio Lomuscio, Aniello Murano, Sasha Rubin
We study a class of synchronous, perfect-recall multi-agent systems with imperfect information and broadcasting (i.e., fully observable actions). We define an epistemic extension of strategy logic with incomplete information and the assumption of uniform and coherent strategies. In this setting, we prove that the model checking problem, and thus rational synthesis, is decidable with non-elementary complexity. We exemplify the applicability of the framework on a rational secret-sharing scenario.

An Abstraction-Refinement Methodology for Reasoning about Network Games
Guy Avni, Shibashis Guha, Orna Kupferman
Network games (NGs) are played on directed graphs and are extensively used in network design and analysis. Search problems for NGs include finding special strategy profiles such as a Nash equilibrium and a globally optimal solution. The networks modeled by NGs may be huge. In formal verification, abstraction has proven to be an extremely effective technique for reasoning about systems with large and even infinite state spaces. We describe an abstraction-refinement methodology for reasoning about NGs. Our methodology is based on an abstraction function that maps the state space of an NG to a much smaller state space. We search for a global optimum and a Nash equilibrium by reasoning on an under- and an over-approximation defined on top of this smaller state space. When the approximations are too coarse to find such profiles, we refine the abstraction function. Our experimental results demonstrate the efficiency of the methodology.
Thursday 24 10:30  12:00 MLTAML2  Transfer, Adaptation, MultiTask Learning 2

A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task Learning
Honglun Zhang, Yongkun Wang, Liqiang Xiao, Yaohui Jin
Multi-task learning leverages potential correlations among related tasks to extract common features and yield performance gains. However, most previous works only consider simple or weak interactions, thereby failing to model complex correlations among three or more tasks. In this paper, we propose a multi-task learning architecture with four types of recurrent neural layers to fuse information across multiple related tasks. The architecture is structurally flexible and considers various interactions among tasks, and it can be regarded as a generalization of many previous works. Extensive experiments on five benchmark datasets for text classification show that our model can significantly improve the performance of related tasks with additional information from the others.

Cross-modal Common Representation Learning by Hybrid Transfer Network
Xin Huang, Yuxin Peng, Mingkuan Yuan
DNN-based cross-modal retrieval, which retrieves across different modalities such as image and text, is a research hotspot, but existing methods often face the challenge of insufficient cross-modal training data. In the single-modal scenario, a similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (such as ImageNet). Knowledge from such single-modal datasets is also very useful for cross-modal retrieval, as it can provide rich general semantic information that can be shared across different modalities. However, it is challenging to transfer useful knowledge from a single-modal (e.g., image) source domain to a cross-modal (e.g., image/text) target domain. Knowledge in the source domain cannot be directly transferred to both modalities in the target domain. Moreover, the inherent cross-modal correlation in the target domain provides key hints for cross-modal retrieval and should be preserved during the transfer process. This paper proposes the Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks. The modal-sharing transfer subnetwork uses the modality shared by the source and target domains as a bridge for transferring knowledge to both modalities simultaneously. The layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to the cross-modal retrieval task. Cross-modal data can be converted to a common representation by CHTN for retrieval, and comprehensive experiments on 3 datasets show its effectiveness.

Demystifying Neural Style Transfer
Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou
Neural Style Transfer has recently demonstrated very exciting results that have attracted attention in both academia and industry. Despite the amazing results, the principle of neural style transfer, especially why the Gram matrices could represent style, remains unclear. In this paper, we propose a novel interpretation of neural style transfer by treating it as a domain adaptation problem. Specifically, we theoretically show that matching the Gram matrices of feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with the second-order polynomial kernel. Thus, we argue that the essence of neural style transfer is to match the feature distributions between the style images and the generated images. To further support our standpoint, we experiment with several other distribution alignment methods and achieve appealing results. We believe this novel interpretation connects these two important research fields and could inform future research.
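The equivalence can be checked numerically. Treat the C x M feature map F of the generated image as M samples f_1, ..., f_M in R^C, one per spatial position. Likewise, treat the C x N feature map S of the style image as N samples. With k(x, y) = (x^T y)^2, the sum of k over all sample pairs equals a sum of squared Gram-matrix entries, since sum over i, i' of (f_i^T f_i')^2 = ||F F^T||_F^2. Expanding the biased MMD estimate therefore gives MMD^2 = ||F F^T / M - S S^T / N||_F^2, a squared distance between normalized Gram matrices. The normalization by M and N is an assumption of this sketch, and the paper's constant factors may differ. The sketch uses plain Python lists and hypothetical function names.

```python
def gram(F):
    """F: C rows (channels) of M activations each; returns the C x C matrix F F^T."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in F] for fi in F]


def columns(F):
    # the M feature vectors (one per spatial position), each of length C
    return [list(col) for col in zip(*F)]


def mmd2_poly2(F, S):
    """Biased MMD^2 between the columns of F and S with kernel k(x, y) = (x . y)^2."""
    X, Y = columns(F), columns(S)
    k = lambda x, y: sum(a * b for a, b in zip(x, y)) ** 2
    kxx = sum(k(x, x2) for x in X for x2 in X) / len(X) ** 2
    kyy = sum(k(y, y2) for y in Y for y2 in Y) / len(Y) ** 2
    kxy = sum(k(x, y) for x in X for y in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy


def gram_loss(F, S):
    """Squared Frobenius distance between Gram matrices normalized by position counts."""
    M, N = len(F[0]), len(S[0])
    GF, GS = gram(F), gram(S)
    C = len(F)
    return sum((GF[i][j] / M - GS[i][j] / N) ** 2 for i in range(C) for j in range(C))
```

The two functions agree up to floating-point rounding for any F and S with the same number of channels. This is the sense in which Gram matching is a distribution-matching objective.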

Completely Heterogeneous Transfer Learning with Attention: What and What Not to Transfer
Seungwhan Moon, Jaime Carbonell
We study a transfer learning framework where source and target datasets are heterogeneous in both feature and label spaces. Specifically, we do not assume explicit relations between source and target tasks a priori, and thus it is crucial to determine what and what not to transfer from source knowledge. Towards this goal, we define a new heterogeneous transfer learning approach that (1) selects and attends to an optimized subset of source samples to transfer knowledge from, and (2) builds a unified transfer network that learns from both source and target knowledge. This method, termed "Attentional Heterogeneous Transfer", along with a newly proposed unsupervised transfer loss, improves upon the previous state-of-the-art approaches on extensive simulations as well as a challenging hetero-lingual text classification task.

Dynamic Multi-Task Learning with Convolutional Neural Network
Zhengyan Ma, Yuchun Fang, Zhaoxiang Zhang, Xuyao Zhang, Xiang Bai
Multi-task learning and deep convolutional neural networks (CNNs) have been successfully used in various fields. This paper considers the integration of CNNs and multi-task learning in a novel way to further improve the performance of multiple related tasks. Existing multi-task CNN models usually combine different tasks empirically into a group, which is then trained jointly under a strong assumption of model commonality. Furthermore, traditional approaches usually only consider a small number of tasks with a rigid structure, which is not suitable for large-scale applications. In light of this, we propose a dynamic multi-task CNN model to handle these problems. The proposed model learns the task relations directly from data instead of relying on subjective task grouping. Due to its flexible structure, it supports task-wise incremental training, which is useful for efficient training of massive numbers of tasks. Specifically, we add a new task transfer connection (TTC) between the layers of each task. The learned TTC reflects the correlation among different tasks and guides the model to dynamically adjust how information is multiplexed among them. With the help of TTC, multiple related tasks can further boost each other's performance. Experiments demonstrate that the proposed dynamic multi-task CNN model outperforms traditional approaches.

General Heterogeneous Transfer Distance Metric Learning via Knowledge Fragments Transfer
Yong Luo, Yonggang Wen, Tongliang Liu, Dacheng Tao
Transfer learning aims to improve the performance of a target learning task by leveraging information (or transferring knowledge) from other related tasks. Recently, transfer distance metric learning (TDML) has attracted much interest, but most of these methods assume that the feature representations for the source and target learning tasks are the same. Hence, they are not suitable for applications in which the data are from heterogeneous domains (feature spaces, modalities and even semantics). Although some existing heterogeneous transfer learning (HTL) approaches are able to handle such domains, they lack flexibility in real-world applications, and the learned transformations are often restricted to be linear. We therefore develop a general and flexible heterogeneous TDML (HTDML) framework based on the knowledge fragment transfer strategy. In the proposed HTDML, any (linear or nonlinear) distance metric learning algorithm can be employed to learn the source metric beforehand. Then a set of knowledge fragments is extracted from the pre-learned source metric to help target metric learning. In addition, either a linear or a nonlinear distance metric can be learned for the target domain. Extensive experiments on both scene classification and object recognition demonstrate the superiority of the proposed method.
Thursday 24 10:30  12:00 MLREL2  Reinforcement Learning 2

Weighted Double Q-learning
Zongzhang Zhang, Zhiyuan Pan, Mykel Kochenderfer
Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. Overestimation is due to the use of a single estimator that uses the maximum action value as an approximation for the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was recently proposed, which uses the double estimator method. It uses two estimators from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. Double Q-learning sometimes underestimates the action values. This paper introduces a weighted double Q-learning algorithm, which is based on the construction of the weighted double estimator, with the goal of balancing between the overestimation in the single estimator and the underestimation in the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
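Here is a sketch of how a weighted target interpolates between the two estimators in a tabular setting. One assumption matters most: `beta` is a fixed parameter here, whereas the paper constructs the weighted double estimator adaptively. The sketch therefore only shows the interpolation, not the paper's weighting rule. With `beta = 1` the target is the single-estimator (Q-learning) target, and with `beta = 0` it is the double Q-learning target. All names are hypothetical.

```python
import random
from collections import defaultdict


def weighted_double_q_update(QU, QV, s, a, r, s_next, actions, alpha, gamma, beta):
    """Move QU(s, a) toward a target mixing QU and QV at the action QU rates best."""
    a_star = max(actions, key=lambda b: QU[(s_next, b)])  # QU selects the action
    mixed = beta * QU[(s_next, a_star)] + (1 - beta) * QV[(s_next, a_star)]
    target = r + gamma * mixed
    QU[(s, a)] += alpha * (target - QU[(s, a)])


def double_q_step(QA, QB, transition, actions, rng, alpha=0.1, gamma=0.99, beta=0.5):
    """As in double Q-learning, pick at random which table is updated."""
    s, a, r, s_next = transition
    if rng.random() < 0.5:
        weighted_double_q_update(QA, QB, s, a, r, s_next, actions, alpha, gamma, beta)
    else:
        weighted_double_q_update(QB, QA, s, a, r, s_next, actions, alpha, gamma, beta)


if __name__ == "__main__":
    QA, QB = defaultdict(float), defaultdict(float)
    double_q_step(QA, QB, ("s0", "L", 1.0, "s1"), ["L", "R"], random.Random(0))
    print(dict(QA), dict(QB))
```

Because the maximizing action is chosen by one table and evaluated partly by the other, intermediate values of `beta` trade the upward bias of the single estimator against the downward bias of the double estimator.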

Sample Efficient Policy Search for Optimal Stopping Domains
Karan Goel, Christoph Dann, Emma Brunskill
Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return. We examine the problem of simultaneously learning and planning in such domains, when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging problem structure. We bound the sample complexity of our approach to guarantee uniform convergence of policy value estimates, tightening existing PAC bounds to achieve logarithmic dependence on horizon length for our setting. We also examine the benefit of our method against prevalent model-based and model-free approaches on 3 domains taken from diverse fields.

Learning from Demonstrations with High-Level Side Information
Min Wen, Ivan Papusha, Ufuk Topcu
We consider the problem of learning from demonstration, where extra side information about the demonstration is encoded as a co-safe linear temporal logic formula. We address two known limitations of existing methods that do not account for such side information. First, the policies that result from existing methods, while matching the expected features or likelihood of the demonstrations, may still be in conflict with high-level objectives not explicit in the demonstration trajectories. Second, existing methods fail to provide a priori guarantees on the out-of-sample generalization performance with respect to such high-level goals. This lack of formal guarantees can prevent the application of learning from demonstration to safety-critical systems, especially when inference to state space regions with poor demonstration coverage is required. In this work, we show that side information, when explicitly taken into account, indeed improves the performance and safety of the learned policy with respect to task implementation. Moreover, we describe an automated procedure to systematically generate the features that encode side information expressed in temporal logic.

Constrained Bayesian Reinforcement Learning via Approximate Linear Programming
Jongmin Lee, Youngsoo Jang, Pascal Poupart, Kee-Eung Kim
In this paper, we consider the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning in an environment that is modeled as a constrained MDP (CMDP), where the cost function penalizes undesirable situations. We propose a model-based Bayesian reinforcement learning (BRL) algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and generates a finite state controller in an offline manner. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.

Universal Reinforcement Learning Algorithms: Survey and Experiments
John Aslanides, Jan Leike, Marcus Hutter
Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms, which we hope will facilitate further understanding of, and experimentation with, these ideas.

Count-Based Exploration in Feature Space for Reinforcement Learning
Jarryd Martin, Suraj Narayanan S., Tom Everitt, Marcus Hutter
We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our φ-pseudocount achieves generalisation by exploiting the same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The φ-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.
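Here is a sketch of how a pseudocount can be derived from a density model over features, in the spirit of the φ-pseudocount. The sketch rests on several assumptions. The feature vectors are binary. The density is factorized, modelling each feature independently with a Krichevsky-Trofimov estimator. The pseudocount is N = ρ(1 - ρ') / (ρ' - ρ), where ρ is the probability the model assigns to φ and ρ' is the probability it would assign after observing φ once more. The bonus scale `beta` and the 0.01 added under the square root are hypothetical constants, and the class and function names are illustrative only.

```python
import math


class PhiPseudoCount:
    """Factorized Krichevsky-Trofimov density over binary feature vectors."""

    def __init__(self, n_features):
        self.t = 0                     # number of feature vectors observed
        self.ones = [0] * n_features   # how often each feature was 1

    @staticmethod
    def _prob(phi, t, ones):
        p = 1.0
        for i, v in enumerate(phi):
            n_v = ones[i] if v else t - ones[i]  # times feature i took value v
            p *= (n_v + 0.5) / (t + 1)           # KT estimate for feature i
        return p

    def pseudocount(self, phi):
        rho = self._prob(phi, self.t, self.ones)
        # recoding probability: density after observing phi one more time
        ones_after = [o + v for o, v in zip(self.ones, phi)]
        rho_next = self._prob(phi, self.t + 1, ones_after)
        return rho * (1 - rho_next) / (rho_next - rho)

    def update(self, phi):
        self.t += 1
        self.ones = [o + v for o, v in zip(self.ones, phi)]


def exploration_bonus(counter, phi, beta=0.05):
    # optimism in the face of uncertainty: rarely seen features earn a larger bonus
    return beta / math.sqrt(counter.pseudocount(phi) + 0.01)
```

Every per-feature KT probability strictly increases after an extra observation, so ρ' > ρ and the pseudocount is always well defined. A state whose features were each seen often gets a large pseudocount, even if that exact state was never visited. This is the generalisation over uncertainty that the abstract describes.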
Thursday 24 10:30  12:30 Competition Alibaba
Thursday 24 10:30  12:30 Competition Angry Birds
Thursday 24 10:30  12:30 JOUKR1  Journal Track: Knowledge Representation 1

New Canonical Representations by Augmenting OBDDs with Conjunctive Decomposition (Extended Abstract)
Yong Lai, Minghao Yin, Dayou Liu
We identify two families of canonical representations called ROBDD[∧̂_i]_C and ROBDD[∧̂_{T,i}]_T by augmenting ROBDD with two types of conjunctive decompositions. These representations cover the three existing languages ROBDD, ROBDD with as many implied literals as possible (ROBDD-L_∞), and AND/OR BDD. We introduce a new time efficiency criterion called rapidity, which reflects the idea that exponential operations may be preferable if the language can be exponentially more succinct. Then we demonstrate that the expressivity, succinctness and operation rapidity do not decrease from ROBDD[∧̂_{T,i}]_T to ROBDD[∧̂_i]_C, and then to ROBDD[∧̂_{i+1}]_C. We also demonstrate that ROBDD[∧̂_i]_C (i > 1) and ROBDD[∧̂_{T,i}]_T are not less tractable than ROBDD-L_∞ and ROBDD, respectively. Finally, we develop a compiler for ROBDD[∧̂_∞]_C, which significantly advances the compiling efficiency of canonical representations.

On the Expressivity of Inconsistency Measures (Extended Abstract)
Matthias Thimm
We survey recent approaches to inconsistency measurement in propositional logic and provide a comparative analysis in terms of their expressivity. For that, we introduce four different expressivity characteristics that quantitatively assess the number of different knowledge bases that a measure can distinguish. Our approach aims at complementing ongoing discussions on rationality postulates for inconsistency measures by considering expressivity as a desirable property. We evaluate a large selection of measures on the proposed characteristics and conclude that a distance-based measure from [Grant and Hunter, 2013] has maximal expressivity along all considered characteristics.

The Ceteris Paribus Structure of Logics of Game Forms
Davide Grossi, Emiliano Lorini, François Schwarzentruber
We present a simple Ceteris Paribus Logic (CP) and study its relationship with existing logics that deal with the representation of choice and power in games in normal form, including atemporal STIT, Coalition Logic of Propositional Control (CLPC) and Dynamic Logic of Propositional Assignments (DLPA). Thanks to the polynomial reduction of the satisfiability problem for atemporal STIT to the satisfiability problem for CP, we obtain a complexity result for the latter problem.

A New Semantics for Overriding in Description Logics
Piero Bonatti, Marco Faella, Iliana M. Petrova, Luigi Sauro
Many modern applications of description logics (DLs, for short), such as biomedical ontologies and semantic web policies, provide fresh motivations for extending DLs with nonmonotonic inferences, a topic that has attracted a significant amount of attention over the years. Despite this, nonmonotonic inferences are not yet supported by DL technology due to a number of issues related to expressiveness, computational complexity, and optimizations. This paper contributes to the practical support of nonmonotonic inferences in description logics by introducing a new semantics expressly designed to address knowledge engineering needs. This formalism has appealing expressiveness, enjoys nice computational properties, and constitutes an interesting solution to an ample class of application needs. The formalism is validated through extensive comparison with other nonmonotonic DLs, and systematic scalability tests. The test case generator and its novel validation methodology constitute a further contribution of this paper.

Automated Conjecturing I: Fajtlowicz's Dalmatian Heuristic Revisited (Extended Abstract)
Craig E. Larson, Nico Van Cleemput
This condensed summary highlights the results of a 2016 AIJ paper reporting on a successful general-purpose conjecturing program.

Bayesian Network Structure Learning with Integer Programming: Polytopes, Facets and Complexity (Extended Abstract)
James Cussens, Matti Järvisalo, Janne H. Korhonen, Mark Bartlett
Developing accurate algorithms for learning structures of probabilistic graphical models is an important problem within modern AI research. Here we focus on score-based structure learning for Bayesian networks as arguably the most central class of graphical models. A successful generic approach to optimal Bayesian network structure learning (BNSL), based on integer programming (IP), is implemented in the Gobnilp system. Despite the recent algorithmic advances, current understanding of foundational aspects underlying the IP-based approach to BNSL is still somewhat lacking. In this paper, we provide theoretical contributions towards understanding fundamental aspects of cutting planes and the related separation problem in this context, ranging from NP-hardness results to analysis of polytopes and the related facets in connection to BNSL.
Thursday 24 10:30  12:30 SISMISC  Sister Conference Track: HCI, CBR, Machine Learning, Robotics

Competence Guided Model for Casebase Maintenance
Ditty Mathew, Sutanu Chakraborti
A competence-guided casebase maintenance algorithm retains a case in the casebase if it is useful to solve many problems and ensures that the casebase is highly competent. In this paper, we address the compositional adaptation process (of which single case adaptation is a special case) during casebase maintenance by proposing a case competence model for which we propose a measure called retention score to estimate the retention quality of a case. We also propose a revised algorithm based on the retention score to estimate the competent subset of a casebase. We used synthetic datasets to test the effectiveness of the competent subset obtained from the proposed model. We also applied this model in a tutoring application and analyzed the competent subset of concepts in tutoring resources. Empirical results show that the proposed model is effective and overcomes the limitation of the footprint-based competence model in compositional adaptation applications.

Local Topic Discovery via Boosted Ensemble of Nonnegative Matrix Factorization
Sangho Suh, Jaegul Choo, Joonseok Lee, Chandan Reddy
Nonnegative matrix factorization (NMF) has become increasingly popular for topic modeling of large-scale document collections. However, the resulting topics often represent only general, and thus redundant, information about the data rather than minor but potentially meaningful information to users. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method leverages the idea of an ensemble model to successively perform NMF given a residual matrix obtained from previous stages and generates a sequence of topic sets. The novelty of our method lies in the fact that it utilizes the residual matrix inspired by a state-of-the-art gradient boosting model and applies a sophisticated local weighting scheme on the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users.

Multi-Type Activity Recognition from a Robot's Viewpoint
Ilaria Gori, J. K. Aggarwal, Larry Matthies, Michael S. Ryoo
The computer vision literature is rich in works that analyze different types of activities (single actions, two-person interactions or egocentric activities, to name a few). However, traditional methods treat such types of activities separately, while in real settings detecting and recognizing different types of activities simultaneously is necessary. We first design a new unified descriptor, called Relation History Image (RHI), which can be extracted from all the activity types we are interested in. We then formulate an optimization procedure to detect and recognize activities of different types. We assess our approach on a new dataset recorded from a robot-centric perspective as well as on publicly available datasets, and evaluate its quality compared to multiple baselines.

Efficient Techniques for Crowdsourced Top-k Lists
Luca de Alfaro, Vassilis Polychronopoulos, Neoklis Polyzotis
We focus on the problem of obtaining top-k lists of items from larger itemsets, using human workers to perform comparisons among items. An example application is shortlisting a large set of college applications using advanced students as workers. We describe novel efficient techniques and explore their tolerance to adversarial behavior and the trade-offs among different measures of performance (latency, expense and quality of results). We empirically evaluate the proposed techniques against prior art using simulations as well as real crowds in Amazon Mechanical Turk. A randomized variant of the proposed algorithms achieves significant budget savings, especially for very large itemsets and large top-k lists, with negligible risk of lowering the quality of the output.

The Many Benefits of Annotator Rationales for Relevance Judgments
Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, Matthew Lease
When collecting subjective human ratings of items, it can be difficult to measure and enforce data quality due to task subjectivity and lack of insight into how judges arrive at each rating decision. To address this, we propose requiring judges to provide a specific type of rationale underlying each rating decision. We evaluate this approach in the domain of Information Retrieval, where human judges rate the relevance of webpages. Cost-benefit analysis over 10,000 judgments collected on Mechanical Turk suggests a win-win: experienced crowd workers provide rationales with no increase in task completion time while providing further benefits, including more reliable judgments and greater transparency.

Enhancing Crowdworkers' Vigilance
Avshalom Elmalech, David Sarne, Esther David, Chen Hajaj
This paper presents methods for improving the attention span of workers in tasks that rely heavily on their attention to the occurrence of rare events. The underlying idea of our approach is to dynamically augment the task with some dummy (artificial) events at different times throughout the task, rewarding the worker upon identifying and reporting them. The proposed approach is an alternative to the traditional approach of relying exclusively on rewarding the worker for successfully identifying the event of interest itself. We propose three methods for timing the dummy events throughout the task. Two of these methods are static and determine the timing of the dummy events at random or uniformly throughout the task. The third method is dynamic and uses the identification (or misidentification) of dummy events as a signal for the worker's attention to the task, adjusting the rate of dummy-event generation accordingly.
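The three timing strategies can be sketched as follows. The uniform and random schedules follow the abstract directly. The dynamic rule is a hypothetical stand-in, because the abstract does not say how the rate is adjusted. In this sketch, the gap doubles after a detected dummy and halves after a missed one, within fixed bounds. All function names and parameters are illustrative.

```python
import random


def uniform_schedule(duration, n):
    """Static: n dummy events spaced evenly over (0, duration)."""
    step = duration / (n + 1)
    return [step * (i + 1) for i in range(n)]


def random_schedule(duration, n, rng):
    """Static: n dummy events at uniformly random times, in chronological order."""
    return sorted(rng.uniform(0, duration) for _ in range(n))


def next_gap(current_gap, detected, min_gap, max_gap):
    """Dynamic (hypothetical rule): a missed dummy suggests low attention,
    so dummies are inserted more often; a detected one lets the rate relax."""
    if detected:
        return min(current_gap * 2, max_gap)
    return max(current_gap / 2, min_gap)
```

The static schedules can be fixed before the task starts. The dynamic rule has to be re-evaluated after each dummy event, using the worker's response as the attention signal.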
Thursday 24 14:00  15:00 Invited Talk From Automation to Autonomous Systems: A Legal Phenomenology with Problems of Accountability
Ugo Pagallo
Thursday 24 14:00  15:00 Invited Talk Deep Learning at Alibaba
Rong Jin
Thursday 24 15:00  16:00 Panel TBD
TBD
Thursday 24 15:00  16:00 Competition Angry Birds
Thursday 24 15:00  16:00 NLPQA  Question Answering

Automatic Generation of Grounded Visual Questions
Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang
In this paper, we propose the first model able to generate visually grounded questions of diverse types for a single image. Visual question generation is an emerging topic that aims to ask questions in natural language based on visual input. To the best of our knowledge, there are no automatic methods for generating meaningful questions of various types for the same visual input. To address this problem, we propose a model that automatically generates visually grounded questions of varying types. Our model takes as input both images and the captions generated by a dense caption model, samples the most probable question types, and then generates the questions. The experimental results on two real-world datasets show that our model outperforms the strongest baseline by a wide margin in terms of both correctness and diversity.

Symbolic Priors for RNN-based Semantic Parsing
Chunyang Xiao, Marc Dymetman, Claire Gardent
Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior knowledge: the well-formedness of the logical forms is modeled by a weighted context-free grammar; the likelihood that certain entities present in the input utterance are also present in the logical form is modeled by weighted finite-state automata. The grammar and automata are combined through an efficient intersection algorithm to form a soft guide (“background”) to the RNN. We test our method on an extension of the Overnight dataset and show that it not only strongly improves over an RNN baseline, but also outperforms non-RNN models based on rich sets of handcrafted features.

Solving Probability Problems in Natural Language
Anton Dries, Angelika Kimmig, Jesse Davis, Vaishak Belle, Luc de Raedt
The ability to solve probability word problems, such as those found in introductory discrete mathematics textbooks, is an important cognitive and intellectual skill. In this paper, we develop a two-step, end-to-end, fully automated approach for solving such questions that is able to automatically provide answers to exercises about probability formulated in natural language. In the first step, a question formulated in natural language is analysed and transformed into a high-level model specified in a declarative language. In the second step, a solution to the high-level model is computed using a probabilistic programming system. On a dataset of 2160 probability problems, our solver is able to correctly answer 97.5% of the questions given a correct model. On the end-to-end evaluation, we are able to answer 12.5% of the questions (or 31.1% if we exclude examples not supported by design).

Finding Prototypes of Answers for Improving Answer Sentence Selection
Wailok Tam, Yusuke Miyao, Namgi Han, Juan Horñiacek
Answer sentence selection has recently been widely adopted for benchmarking techniques in Question Answering. Previous proposals for the task are essentially general solutions taking the form of neural networks that measure semantic similarity. In contrast, the present paper describes a simple technique to take advantage of such general-purpose tools for dealing with questions and answer sentences without changing the base system. The technique involves replacing wh-words in input questions with a word denoting the prototype of all answers. These transformed questions are passed as input to an existing neural network built for measuring semantic similarity. This technique is evaluated on two different neural network architectures over two datasets: TrecQA and WikiQA. Results of our experiments show improvement in overall accuracy across most question types we are interested in: 'who'-, 'when'- and 'where'-type questions.
Thursday 24 15:00  16:00 NLPNLG  Natural Language Generation

Human-Centric Justification of Machine Learning Predictions
Or Biran, Kathleen McKeown
Human decision makers in many domains can make use of predictions made by machine learning models in their decision making process, but the usability of these predictions is limited if the human is unable to justify his or her trust in the prediction. We propose a novel approach to producing justifications that is geared towards users without machine learning expertise, focusing on domain knowledge and on human reasoning, and utilizing natural language generation. Through a task-based experiment, we show that our approach significantly helps humans to correctly decide whether or not predictions are accurate, and significantly increases their satisfaction with the justification.

MAT: A Multimodal Attentive Translator for Image Captioning
Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille
In this work, we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Unlike most existing work, where the whole image is represented by a convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects, which serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated to a sequence of words, as the target sequence of the RNN model. To represent the image in a sequential way, we extract the object features in the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects that are related to generating corresponding words in the sentences. Extensive experiments are conducted to validate the proposed approach on the popular benchmark dataset MS COCO, and the proposed model surpasses the state-of-the-art methods in all metrics following the dataset splits of previous work. The proposed approach is also evaluated by the evaluation server of the MS COCO captioning challenge, and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).

From Neural Sentence Summarization to Headline Generation: A Coarse-to-Fine Approach
Jiwei Tan, Xiaojun Wan, Jianguo Xiao
Headline generation is a task of abstractive text summarization, and has previously suffered from the immaturity of natural language generation techniques. The recent success of neural sentence summarization models shows their capacity to generate informative, fluent headlines conditioned on selected recapitulative sentences. In this paper, we investigate the extension of sentence summarization models to the document headline generation task. The challenge is that extending the sentence summarization model to consider more document information will mostly confuse the model and hurt the performance. We propose a coarse-to-fine approach, which first identifies the important sentences of a document using document summarization techniques, and then exploits a multi-sentence summarization model with hierarchical attention to leverage the important sentences for headline generation. Experimental results on a large real dataset demonstrate that the proposed approach significantly improves the performance of neural sentence summarization models on the headline generation task.

A Correlated Topic Model Using Word Embeddings
Guangxu Xun, Yaliang Li, Xin Zhao, Jing Gao, Aidong Zhang
Conventional correlated topic models are able to capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been proven to be able to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be directly calculated in the word embedding space, for example, via cosine values. In this paper, we propose a novel correlated topic model using word embeddings. The proposed model enables us to exploit the additional word-level correlation information in word embeddings and directly model topic correlation in the continuous word embedding space. In the model, words in documents are replaced with meaningful word embeddings, topics are modeled as multivariate Gaussian distributions over the word embeddings and topic correlations are learned among the continuous Gaussian topics. A Gibbs sampling solution with data augmentation is given to perform inference. We evaluate our model on the 20 Newsgroups dataset and the Reuters-21578 dataset qualitatively and quantitatively. The experimental results show the effectiveness of our proposed model.
Thursday 24 15:00  16:00 PLTFP  Theoretical Foundations of Planning

Generalized Planning: Non-Deterministic Abstractions and Trajectory Constraints
Blai Bonet, Giuseppe De Giacomo, Hector Geffner, Sasha Rubin
We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies μ that solve the abstract problem P have been shown to solve all problems Q that reduce to P provided that μ terminates in Q. In this work, we shed light on why this termination condition is needed and how it can be removed. The key observation is that the abstract problem P captures the common structure among the concrete problems Q that is local (Markovian) but misses common structure that is global. We show how such global structure can be captured by means of trajectory constraints that in many cases can be expressed as LTL formulas, thus reducing generalized planning to LTL synthesis. Moreover, for a broad class of problems that involve integer variables that can be increased or decreased, trajectory constraints can be compiled away, reducing generalized planning to fully observable nondeterministic planning.

Efficient, Safe, and Probably Approximately Complete Learning of Action Models
Roni Stern, Brendan Juba
In this paper, we explore the theoretical boundaries of planning in a setting where no model of the agent's actions is given. Instead of an action model, a set of successfully executed plans is given, and the task is to generate a plan that is safe, i.e., guaranteed to achieve the goal without failing. To this end, we show how to learn a conservative model of the world in which actions are guaranteed to be applicable. This conservative model is then given to an off-the-shelf classical planner, resulting in a plan that is guaranteed to achieve the goal. However, this reduction from model-free planning to model-based planning is not complete: in some cases a plan will not be found even when one exists. We analyze the relation between the number of observed plans and the likelihood that our conservative approach will fail to solve a solvable problem. Our analysis shows that the number of trajectories needed scales gracefully.
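The following is a minimal sketch of the conservative idea. It assumes fully observed, grounded (propositional) actions with deterministic effects. An action's learned precondition keeps only the facts that held before every observed execution of that action. Every true precondition must have held in each of those pre-states, so the intersection contains all of them. A state that satisfies the learned precondition therefore also satisfies the real one. The data format and function names are hypothetical, and the paper's full treatment (including its sample-complexity analysis) is not reproduced here.

```python
def learn_conservative_model(transitions):
    """transitions: iterable of (pre_state, action, post_state); states are sets of facts."""
    model = {}
    for pre, action, post in transitions:
        pre, post = set(pre), set(post)
        entry = model.setdefault(action, {"pre": set(pre), "add": set(), "del": set()})
        entry["pre"] &= pre          # keep facts true before *every* observed execution
        entry["add"] |= post - pre   # facts observed to become true
        entry["del"] |= pre - post   # facts observed to become false
    return model


def applicable(model, action, state):
    # Actions never observed are never considered safe.
    return action in model and model[action]["pre"] <= set(state)


def apply_action(model, action, state):
    entry = model[action]
    return (set(state) - entry["del"]) | entry["add"]
```

Handing only such actions to a classical planner makes every step of the resulting plan safe under these assumptions. The cost is incompleteness: an action whose learned precondition is too strict may be rejected in states where it would actually have worked.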

An Improved Approximation Algorithm for the Subpath Planning Problem and Its Generalization
Hanna Sumita, Yuuma Yonebayashi, Naonori Kakimura, Kenichi Kawarabayashi
This paper focuses on a generalization of the traveling salesman problem (TSP), called the subpath planning problem (SPP). Given 2n vertices and n independent edges on a metric space, we aim to find a shortest tour that contains all the edges. SPP is one of the fundamental problems in both artificial intelligence and robotics. Our main result is a polynomial-time 1.5-approximation algorithm, which improves on the best previously known approximation algorithm. The idea is a direct use of techniques developed for TSP. In addition, we propose a generalization of SPP called the subgroup planning problem (SGPP). In this problem, we are given a set of disjoint groups of vertices, and we aim to find a shortest tour such that all the vertices in each group are traversed sequentially. We propose a 3-approximation algorithm for SGPP. We also conduct numerical experiments. Compared with previous algorithms, our algorithms improve the solution quality by more than 10% for large instances with more than 10,000 vertices.

Hierarchical Task Network Planning with Task Insertion and State Constraints
Zhanhao Xiao, Andreas Herzig, Laurent Perrussel, Hai Wan, Xiaoheng Su
We extend hierarchical task network planning with task insertion (TIHTN) by adding state constraints. We show that, just as for TIHTN planning, the solutions of extended TIHTN planning can be obtained by acyclic decomposition, entailing that it is decidable without any restriction on methods. We also prove, based on an acyclic progression operator, that the extension by state constraints does not increase the complexity of the plan-existence problem, which stays 2-NEXPTIME-complete. In addition, we show that this extension of TIHTN planning covers not only the original TIHTN planning but also hierarchy-relaxed hierarchical goal network planning.
Thursday 24 15:00  16:00 PLAPLI  Applications of Planning

Generalized Target Assignment and Path Finding Using Answer Set Programming
Van Nguyen, Philipp Obermeier, Tran Son, Torsten Schaub, William Yeoh
In Multi-Agent Path Finding (MAPF), a team of agents needs to find collision-free paths from their starting locations to their respective targets. Combined Target Assignment and Path Finding (TAPF) extends MAPF by including the problem of assigning targets to agents as a precursor to the MAPF problem. A limitation of both models is their assumption that the number of agents equals the number of targets, which is invalid in some applications such as autonomous warehouse systems. We address this limitation by generalizing TAPF to allow for (1) unequal numbers of agents and tasks; (2) tasks to have deadlines by which they must be completed; (3) ordering of groups of tasks to be completed; and (4) tasks that are composed of a sequence of checkpoints that must be visited in a specific order. Further, we model the problem using answer set programming (ASP) to show that customizing the desired variant of the problem is simple: one only needs to choose the appropriate combination of ASP rules to enforce it. We also demonstrate experimentally that if problem-specific information can be incorporated into the ASP encoding, then the ASP-based method can be efficient and can scale up to solve practical applications.

Temporal Planning for Compilation of Quantum Approximate Optimization Circuits
Davide Venturelli, Minh Do, Eleanor Rieffel, Jeremy Frank
We investigate the application of temporal planners to the problem of compiling quantum circuits to emerging quantum hardware. While our approach is general, we focus our initial experiments on Quantum Approximate Optimization Algorithm (QAOA) circuits that have few ordering constraints and thus allow highly parallel plans. We report on experiments using several temporal planners to compile circuits of various sizes to a realistic hardware architecture. This early empirical evaluation suggests that temporal planning is a viable approach to quantum circuit compilation.

Softpressure: A Schedule-Driven Backpressure Algorithm for Coping with Network Congestion
HsuChieh Hu, Stephen F. Smith
We consider the problem of minimizing the delay of jobs moving through a directed graph of service nodes. In this problem, each node may have several links and is constrained to serve one link at a time. As jobs move through the network, they can pass through a node only after they have been serviced by that node. The objective is to minimize the delay jobs incur sitting on queues waiting to be serviced. Two popular approaches to this problem are the backpressure algorithm and schedule-driven control. In this paper, we present a hybrid of these two methods that incorporates the stability of queuing theory into schedule-driven control. We then demonstrate how this hybrid method outperforms the other two in a real-time traffic signal control problem, where the nodes are traffic lights, the links are roads, and the jobs are vehicles. We show through simulations that, in scenarios with heavy congestion, the hybrid method results in 50% and 15% reductions in delay over schedule-driven control and backpressure, respectively. A theoretical analysis also justifies our results.
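For reference, the backpressure half of the hybrid can be sketched with the classic rule. A node serves the link whose queue differential (upstream queue minus downstream queue), weighted by that link's service rate, is largest. If no link has positive pressure, the node idles. This is plain backpressure under assumed data structures, not the Softpressure algorithm itself, and all names are illustrative.

```python
def backpressure_choice(links, queues, rates=None):
    """Pick the link a node should serve next.

    links:  {link_id: (upstream_queue_id, downstream_queue_id)}
    queues: {queue_id: number of waiting jobs}
    rates:  optional {link_id: service rate}; missing links default to 1.0
    Returns None when no link has positive pressure (the node idles).
    """
    rates = rates or {}
    best, best_pressure = None, 0.0
    for link, (up, down) in links.items():
        pressure = (queues[up] - queues[down]) * rates.get(link, 1.0)
        if pressure > best_pressure:
            best, best_pressure = link, pressure
    return best
```

In the traffic setting, a link corresponds to a signal phase serving one road, and the queues are vehicle counts on the incoming and outgoing roads. Pure backpressure of this kind reacts only to current queue lengths. According to the abstract, the hybrid adds schedule-driven control on top of this queue-based stability.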

Generating Context-Free Grammars using Classical Planning
Sergio Jiménez, Javier Segovia, Anders Jonsson
This paper presents a novel approach for generating Context-Free Grammars (CFGs) from small sets of input strings (a single input string in some cases). Our approach is to compile this task into a classical planning problem whose solutions are sequences of actions that build and validate a CFG compliant with the input strings. In addition, we show that our compilation is suitable for implementing the two canonical tasks for CFGs, string production and string recognition.
Thursday 24 15:00  16:00 CSSAT  Satisfiability

A Recursive Shortcut for CEGAR: Application To The Modal Logic K Satisfiability Problem
Jean-Marie Lagniez, Daniel Le Berre, Tiago de Lima, Valentin Montmirail
CounterExample-Guided Abstraction Refinement (CEGAR) has been very successful in model checking large systems. Since then, it has been applied to many different problems. In particular, it has proved to be a highly successful practical approach for solving the PSPACE-complete QBF problem. In this paper, we propose a new CEGAR-like approach for tackling PSPACE-complete problems that we call RECAR (Recursive Explore and Check Abstraction Refinement). We show that this generic approach is sound and complete. Then we propose a specific implementation of the RECAR approach to solve the modal logic K satisfiability problem. We implemented both a CEGAR and a RECAR approach for the modal logic K satisfiability problem within the solver MoSaiC. We experimentally compared those approaches to the state-of-the-art solvers for that problem. The RECAR approach outperforms the CEGAR one for that problem and also compares favorably against the state of the art on the benchmarks considered.

Intelligent Belief State Sampling for Conformant Planning
Alban Grastien, Enrico Scala
We propose a new method for conformant planning based on two ideas. First, given a small sample of the initial belief state, we reduce conformant planning for this sample to a classical planning problem, giving us a candidate solution. Second, we exploit regression as a way to compactly represent necessary conditions for such a solution to be valid for the nondeterministic setting. If necessary, we use the resulting formula to extract a counterexample to populate our next sample. Our experiments show that this approach is competitive on a class of problems that are hard for traditional planners, and also returns generally shorter plans. We are also able to demonstrate unsatisfiability of some problems.

Generating Hard Random Boolean Formulas and Disjunctive Logic Programs
Giovanni Amendola, Francesco Ricca, Mirek Truszczynski
We propose a model of random quantified Boolean formulas and their natural random disjunctive logic program counterparts. The model extends the standard models for random SAT and 2QBF. We provide theoretical bounds for the phase transition region in the new model, and show experimentally the presence of the easy-hard-easy pattern. Importantly, we show that the model is well suited for assessing solvers tuned to real-world instances. Moreover, to the best of our knowledge, our model and results on random disjunctive logic programs are the first of their kind.
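For context, the sketch below shows the baseline fixed-clause-length random k-SAT model that the new model extends. Each clause picks k distinct variables uniformly at random and negates each one with probability 1/2. It is not the authors' QBF or disjunctive-program generator. The function name and the DIMACS-style signed-integer literals are illustrative choices.

```python
import random


def random_kcnf(n_vars, n_clauses, k=3, rng=None):
    """Fixed-length random k-CNF; literals are DIMACS-style signed integers."""
    rng = rng or random.Random()
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), k)  # k distinct variables
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        formula.append(clause)
    return formula
```

In this baseline model, the easy-hard-easy pattern appears as the clause-to-variable ratio grows. For random 3-SAT, the hardest instances lie near a ratio of about 4.26, so `random_kcnf(100, 426, rng=random.Random(1))` produces an instance close to that region.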

Locality in Random SAT Instances
Jesús Giráldez-Cru, Jordi Levy
Despite the success of CDCL SAT solvers in solving industrial problems, there are still many open questions about the reasons for this success. In this context, the generation of random SAT instances having computational properties more similar to real-world problems becomes crucial. Such generators are possibly the best tool to analyze families of instances and solvers' behavior on them. In this paper, we present a random SAT instance generator based on the notion of locality. We show that this is a decisive dimension of attractiveness among the variables of a formula, and show how CDCL SAT solvers take advantage of it. To the best of our knowledge, this is the first random SAT model that generates both scale-free structure and community structure at once.
Thursday 24 15:00  16:00 MASSC  Social Choice

Multiwinner Rules on Paths From k-Borda to Chamberlin–Courant
Piotr Faliszewski, Piotr Skowron, Arkadii Slinko, Nimrod Talmon
The classical multiwinner rules are designed for particular purposes. For example, variants of k-Borda are used to find the k best competitors in judging contests, while the Chamberlin–Courant rule is used to select a diverse set of k products. These rules represent two extremes of the multiwinner world. At times, however, one might need to find an appropriate trade-off between these two extremes. We explore continuous transitions from k-Borda to Chamberlin–Courant and study intermediate rules.
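One common way to place rules between these two extremes, assumed here for illustration, is OWA-based committee scoring with Borda utilities. For each voter, the Borda scores of the committee members are sorted in decreasing order and combined with a weight vector. The vector (1, 1, ..., 1) gives k-Borda, the vector (1, 0, ..., 0) gives Chamberlin–Courant, and vectors in between yield intermediate rules. The brute-force sketch below is only practical for small elections, and its function names are hypothetical.

```python
from itertools import combinations


def borda_scores(ranking):
    """Borda score of each candidate in one ranking (m-1 for first place, 0 for last)."""
    m = len(ranking)
    return {c: m - 1 - pos for pos, c in enumerate(ranking)}


def owa_committee(profile, k, weights):
    """Brute-force OWA committee scoring; weights apply to sorted member scores."""
    candidates = sorted(profile[0])
    voter_scores = [borda_scores(r) for r in profile]
    best, best_total = None, float("-inf")
    for committee in combinations(candidates, k):
        total = 0.0
        for scores in voter_scores:
            # This voter's scores for the committee members, best first.
            member_scores = sorted((scores[c] for c in committee), reverse=True)
            total += sum(w * s for w, s in zip(weights, member_scores))
        if total > best_total:
            best, best_total = committee, total
    return set(best)
```

Consider a profile in which three voters rank a > b > c > d and two voters rank d > c > b > a. With k = 2, k-Borda picks {a, b}, the two highest-scoring candidates overall. Chamberlin–Courant picks {a, d}, which gives each group its top choice. This illustrates the excellence-versus-diversity trade-off that the intermediate rules interpolate.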

Fair and Efficient Social Choice in Dynamic Settings
Rupert Freeman, Seyed Majid Zahedi, Vincent Conitzer
We study a dynamic social choice problem in which an alternative is chosen at each round according to the reported valuations of a set of agents. To obtain a solution that is both efficient and fair, we aim to maximize the long-term Nash social welfare, which is the product of all agents' utilities. We present and analyze two greedy algorithms for this problem, including the classic Proportional Fair (PF) algorithm. We analyze several versions of the algorithms and how they relate, and provide an axiomatization of PF. Finally, we evaluate the algorithms on data gathered from a computer systems application.
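A minimal sketch of the greedy PF idea, as commonly stated: at each round, pick the alternative that maximizes the sum over agents of value divided by accumulated utility. The small `eps` added to accumulated utility (to avoid division by zero in the first rounds) and the function names are assumptions made for illustration, not details from the paper.

```python
import math

def proportional_fair(rounds, eps=1e-6):
    """rounds: list of dicts mapping alternative -> list of per-agent values.

    Greedy PF: choose the alternative maximizing sum_i v_i / (U_i + eps),
    where U_i is agent i's accumulated utility so far (eps is an assumption).
    """
    n = len(next(iter(rounds[0].values())))
    utility = [0.0] * n
    choices = []
    for options in rounds:
        def pf_gain(alt):
            return sum(v / (u + eps) for v, u in zip(options[alt], utility))
        best = max(sorted(options), key=pf_gain)  # ties go to the first name
        choices.append(best)
        utility = [u + v for u, v in zip(utility, options[best])]
    return choices, utility

def nash_social_welfare(utility):
    # Product of all agents' accumulated utilities.
    return math.prod(utility)

rounds = [{"x": [3, 0], "y": [0, 2]}] * 4
choices, util = proportional_fair(rounds)
```

PF alternates between x and y here, ending with utilities [6.0, 4.0] and a Nash social welfare of 24. A purely utilitarian greedy rule would pick x every round (sum 3 beats sum 2), leaving the second agent with nothing and a Nash social welfare of 0.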

Online Roommate Allocation Problem
Guangda Huzhang, Xin Huang, Shengyu Zhang, Xiaohui Bei
We study the online allocation problem under a roommate market model introduced in [Chan et al., 2016]. Consider a fixed supply of n rooms and a list of 2n applicants arriving sequentially in an online fashion. The problem is to assign a room to each person upon her arrival such that, after the algorithm terminates, each room is shared by exactly two people. We focus on two objectives: (1) maximizing the social welfare, defined as the sum of the applicants' valuations for their rooms plus the happiness value between each pair of roommates; and (2) satisfying certain stability conditions, such that no group of people would be willing to switch roommates or rooms. We first give a polynomial-time online algorithm that achieves a constant competitive ratio for social welfare maximization. We then extend it to the case where each room is assigned to c > 2 people and achieve a competitive ratio of Ω(1/c^2). Finally, we show both positive and negative results for satisfying different stability conditions in this online setting.

On Coalitional Manipulation for Multiwinner Elections: Shortlisting
Robert Bredereck, Andrzej Kaczmarczyk, Rolf Niedermeier
Shortlisting of candidates, that is, selecting a group of “best” candidates, is a special case of multiwinner elections. We provide the first in-depth study of the computational complexity of strategic voting for shortlisting based on the most natural and simple voting rule in this scenario, ℓ-Bloc (every voter approves ℓ candidates). In particular, we investigate the influence of several tie-breaking mechanisms (e.g., pessimistic versus optimistic) and group evaluation functions (e.g., egalitarian versus utilitarian) and conclude that, in an egalitarian setting, strategic voting may indeed be computationally intractable regardless of the tie-breaking rule. We provide a fairly comprehensive picture of the computational complexity landscape of this neglected scenario.
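For orientation, here is a minimal sketch of ℓ-Bloc winner determination: every ballot approves exactly ℓ candidates, and the k candidates with the most approvals are shortlisted. Ties are broken by a fixed priority list `tie_order`. This is a hypothetical stand-in for the pessimistic and optimistic mechanisms the paper analyzes, which depend on the manipulators' preferences.

```python
from collections import Counter

def bloc_winners(ballots, candidates, k, tie_order):
    """l-Bloc shortlisting (hypothetical helper).

    Every ballot approves exactly l distinct candidates; the k candidates
    with the most approvals win. Ties are broken by tie_order, where an
    earlier position is preferred.
    """
    l = len(ballots[0])
    if any(len(set(b)) != l for b in ballots):
        raise ValueError("every ballot must approve exactly l distinct candidates")
    counts = Counter(c for b in ballots for c in b)
    rank = {c: i for i, c in enumerate(tie_order)}
    ordered = sorted(candidates, key=lambda c: (-counts[c], rank[c]))
    return ordered[:k]

ballots = [["a", "b"], ["a", "c"], ["b", "c"], ["d", "a"]]
```

With these ballots, b and c both have two approvals, so the second shortlist slot depends entirely on the tie-breaking order. That sensitivity is exactly what a manipulating coalition can exploit.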
Thursday 24 15:00-16:00 MAS-AOSE - Agent-Oriented Software Engineering

Constraint Games Revisited
Anthony Palmieri, Arnaud Lallouet
Constraint Games are a recent framework for modeling and solving static games in which Constraint Programming is used to express players' preferences. In this paper, we rethink their solving technique in terms of constraint propagation by treating players' preferences as global constraints. This yields a framework that is not only more elegant but also more efficient. Our new complete solver is faster than the previous state of the art and is able to find all pure Nash equilibria for some problems with 200 players. We also show that performance can be greatly improved for graphical games, allowing some games with 2000 players to be solved.
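To show what "find all pure Nash equilibria" means computationally, here is a naive brute-force sketch: it checks every strategy profile for a profitable unilateral deviation. This exponential enumeration is only a baseline for illustration. The paper's solver instead uses constraint propagation with preferences encoded as global constraints. `pure_nash_equilibria` is a hypothetical name.

```python
from itertools import product

def pure_nash_equilibria(strategies, utility):
    """Enumerate pure Nash equilibria by brute force (hypothetical helper).

    strategies: list of strategy lists, one per player.
    utility(i, profile): payoff of player i under a profile (a tuple).
    """
    equilibria = []
    for profile in product(*strategies):
        stable = True
        for i, options in enumerate(strategies):
            current = utility(i, profile)
            # A profile is unstable if some player gains by deviating alone.
            for s in options:
                deviation = profile[:i] + (s,) + profile[i + 1:]
                if utility(i, deviation) > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria

def coordination(i, p):
    # Both players get 1 if they choose the same action, else 0.
    return 1 if p[0] == p[1] else 0
```

For example, `pure_nash_equilibria([[0, 1], [0, 1]], coordination)` returns both coordinated profiles. The number of profiles grows exponentially with the number of players, which is why propagation-based pruning matters at 200 players.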

Agent Design Consistency Checking via Planning
Nitin Yadav, John Thangarajah, Sebastian Sardina
In this work, we present a novel approach to checking the consistency of agent designs (prior to any implementation) with respect to the requirements specifications via automated planning. This checking is essentially a search problem, which makes planning technology an appropriate solution. We focus our work on BDI agent systems and the Prometheus design methodology in order to compare our approach directly to previous work. Our experiments on more than 16K random instances show that the approach is more effective than previously proposed ones: it achieves higher coverage and lower runtime and, importantly, can handle loops in the detailed agent design and unbounded subgoal reasoning.

Omniscient Debugging for Cognitive Agent Programs
Vincent Koeman, Koen Hindriks, Catholijn Jonker
For real-time programs, reproducing a bug by rerunning the system is likely to fail, making fault localization a time-consuming process. Omniscient debugging is a technique that stores each run in a way that supports going backwards in time. However, the overhead of existing omniscient debugging implementations for languages like Java is so large that they cannot be used effectively in practice. In this paper, we show that practical omniscient debugging is possible for agent-oriented programming. We design a tracing mechanism for efficiently storing and exploring agent program runs. We are the first to demonstrate that this mechanism does not affect program runs, by empirically establishing that the same tests succeed or fail. Usability is supported by a trace visualization method aimed at locating faults in agent programs more effectively.

No Pizza for You: Value-based Plan Selection in BDI Agents
Stephen Cranefield, Michael Winikoff, Virginia Dignum, Frank Dignum
Autonomous agents are increasingly required to be able to make moral decisions. In these situations, the agent should be able to reason about the ethical bases of the decision and explain its decision in terms of the moral values involved. This is of special importance when the agent is interacting with a user and should understand the value priorities of the user in order to provide adequate support. This paper presents a model of agent behavior that takes into account user preferences and moral values.
Thursday 24 15:00-16:00 MT-CS - Computational Sustainability

Operation Frames and Clubs in Kidney Exchange
Gabriele Farina, John Dickerson, Tuomas Sandholm
A kidney exchange is a centrally administered barter market where patients swap their willing yet incompatible donors. Modern kidney exchanges use 2-cycles, 3-cycles, and chains initiated by non-directed donors (altruists who are willing to give a kidney to anyone) as the means for swapping. We propose significant generalizations to kidney exchange. We allow more than one donor to donate in exchange for their desired patient receiving a kidney. We also allow for the possibility of a donor willing to donate if any of a number of patients receive kidneys. Furthermore, we combine these notions and generalize them. The generalization is to exchange among organ clubs, where a club is willing to donate organs outside the club if and only if the club receives organs from outside the club according to given specifications. We prove that, unlike in the standard model, the uncapped clearing problem is NP-complete. We also present the notion of operation frames that can be used to sequence the operations across batches, and present integer programming formulations for the market clearing problems for these new types of organ exchanges. Experiments show that in the single-donation setting, operation frames improve planning by 34% to 51%. Allowing up to two donors to donate in exchange for one kidney donated to their designated patient yields a further increase in social welfare.
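In the standard model mentioned above, the first step of clearing is to enumerate the short cycles (2-cycles and 3-cycles) in the compatibility graph. An integer program then selects a disjoint set of them. Here is a minimal sketch of that enumeration step. The graph encoding is an assumption: an edge u -> v means the donor of pair u can give to the patient of pair v. It covers only the standard model, not clubs or operation frames.

```python
def short_cycles(edges, max_len=3):
    """Enumerate directed cycles of length 2..max_len (hypothetical helper).

    edges: dict mapping a pair id to the set of pair ids whose patient can
    receive from that pair's donor. Each cycle is reported once, starting
    at its smallest pair id.
    """
    cycles = []

    def extend(path):
        last = path[-1]
        for nxt in sorted(edges.get(last, ())):
            if nxt == path[0] and len(path) >= 2:
                cycles.append(tuple(path))  # closed a cycle back to the start
            elif nxt > path[0] and nxt not in path and len(path) < max_len:
                extend(path + [nxt])  # only larger ids, so no duplicates

    for start in sorted(edges):
        extend([start])
    return cycles

edges = {1: {2}, 2: {1, 3}, 3: {1}}
```

Here `short_cycles(edges)` finds the 2-cycle (1, 2) and the 3-cycle (1, 2, 3). A clearing IP would pick whichever disjoint subset maximizes the number of transplants.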

Demand Response Contract Design for Energy Markets
Hongyao Ma, Reshef Meir, Valentin Robu
Power companies such as Southern California Edison (SCE) use Demand Response (DR) contracts to incentivize consumers to reduce their power consumption during periods when forecast demand exceeds supply. Current mechanisms offer contracts to consumers independently of one another, do not take into account consumers' heterogeneity in consumption profile or reliability, and fail to achieve high participation. We introduce DR-VCG, a new DR mechanism that offers a flexible set of contracts (which may include the standard SCE contracts) and uses VCG pricing. We prove that DR-VCG elicits truthful bids, incentivizes honest preparation efforts, and enables efficient computation of allocation and prices. With simple fixed-penalty contracts, the optimization goal of the mechanism is an upper bound on the probability that the reduction target is missed. Extensive simulations show that, compared to the mechanism currently deployed by SCE, the DR-VCG mechanism achieves higher participation, increased reliability, and significantly reduced total expenses.

Blue Skies: A Methodology for Data-Driven Clear Sky Modelling
Kartik Palani, Ramachandra Kota, Amar Prakash Azad, Vijay Arya
One of the major challenges confronting the widespread adoption of solar energy is the uncertainty of production. The energy generated by photovoltaic systems is a function of the received solar irradiance, which varies due to atmospheric and weather conditions. A key component required for forecasting irradiance accurately is the clear sky model, which estimates the average irradiance at a location at a given time in the absence of clouds. Current methods for modelling clear sky irradiance are either inaccurate or require extensive atmospheric data, which tends to vary with location and is often unavailable. In this paper, we present a data-driven methodology, Blue Skies, for modelling clear sky irradiance based solely on historical irradiance measurements. Using machine learning techniques, Blue Skies generates clear sky models that are more accurate spatiotemporally than the state of the art, reducing errors by almost 50%.

Deep Multispecies Embedding
Yexiang Xue, Di Chen, Daniel Fink, Shuo Chen, Carla P. Gomes
Understanding how species are distributed across landscapes over time is a fundamental question in biodiversity research. Unfortunately, most species distribution models only target a single species at a time, despite strong ecological evidence that species are not independently distributed. We propose Deep Multi-Species Embedding (DMSE), which jointly embeds vectors corresponding to multiple species as well as vectors representing environmental covariates into a common high-dimensional feature space via a deep neural network. Applied to bird observational data from the citizen science project eBird, we demonstrate how the DMSE model discovers inter-species relationships to outperform single-species distribution models (random forests and SVMs) as well as competing multi-label models. Additionally, we demonstrate the benefit of using a deep neural network to extract features within the embedding and show how they improve the predictive performance of species distribution modelling. An important domain contribution of the DMSE model is the ability to discover and describe species interactions while simultaneously learning the shared habitat preferences among species. As an additional contribution, we provide a graphical embedding of hundreds of bird species in the Northeast US.
Thursday 24 15:00-16:00 ML-KBL - Knowledge-Based Learning

Extracting Visual Knowledge from the Web with Multimodal Learning
Dihong Gong, Daisy Wang
We consider the problem of automatically extracting visual objects from web images. Despite the extraordinary advancement in deep learning, visual object detection remains a challenging task. To overcome the deficiency of purely visual techniques, we propose to make use of the meta text surrounding images on the Web for enhanced detection accuracy. In this paper, we present a multimodal learning algorithm to integrate text information into visual knowledge extraction. To demonstrate the effectiveness of our approach, we developed a system that takes raw webpages as input and automatically extracts visual knowledge (e.g., object bounding boxes) from tens of millions of images crawled from the Web. Experimental results on 46 object categories show that extraction precision improves significantly from 73% (with state-of-the-art deep learning programs) to 81%, which is equivalent to a 31% reduction in error rates.

Adversarial Generation of Real-time Feedback with Neural Networks for Simulation-based Training
Xingjun Ma, Sudanthi Wijewickrema, Shuo Zhou, Yun Zhou, Zakaria Mhammedi, Stephen O'Leary, James Bailey
Simulation-based training (SBT) is gaining popularity as a low-cost and convenient training technique in a vast range of applications. However, for an SBT platform to be fully utilized as an effective training tool, it is essential that feedback on performance be provided automatically in real time during training. This paper aims to develop an efficient and effective method for providing real-time feedback in SBT. Existing methods either have low effectiveness in improving novice skills or suffer from low efficiency, which prevents their use in real time. In this paper, we propose a neural-network-based method that generates feedback using the adversarial technique. The proposed method uses a bounded adversarial update to minimize an L1-regularized loss via backpropagation. We empirically show that the proposed method can generate simple yet effective feedback. It was also observed to be highly effective and efficient compared to existing methods, making it a promising option for real-time feedback generation in SBT.

Object Detection Meets Knowledge Graphs
Yuan Fang, Kingsley Kuan, Jie Lin, Cheston Tan, Vijay Chandrasekhar
Object detection in images is a crucial task in computer vision, with important applications ranging from security surveillance to autonomous vehicles. Existing state-of-the-art algorithms, including deep neural networks, focus only on features within an image itself, largely neglecting the vast amount of background knowledge about the real world. In this paper, we propose a novel framework of knowledge-aware object detection, which enables the integration of external knowledge such as knowledge graphs into any object detection algorithm. The framework employs the notion of semantic consistency to quantify and generalize knowledge, which improves object detection through a re-optimization process to achieve better consistency with background knowledge. Finally, empirical evaluation on two benchmark datasets shows that our approach can significantly increase recall, by up to 6.3 points, without compromising mean average precision, compared to the state-of-the-art baseline.

Logic Tensor Networks for Semantic Image Interpretation
Ivan Donadello, Luciano Serafini, Artur d'Avila Garcez
Semantic Image Interpretation (SII) is the task of extracting structured semantic descriptions from images. It is widely agreed that the combined use of visual data and background knowledge is of great importance for SII. Recently, Statistical Relational Learning (SRL) approaches have been developed for reasoning under uncertainty and learning in the presence of data and rich knowledge. Logic Tensor Networks (LTNs) are an SRL framework that integrates neural networks with first-order fuzzy logic to allow (i) efficient learning from noisy data in the presence of logical constraints, and (ii) reasoning with logical formulas describing general properties of the data. In this paper, we develop and apply LTNs to two of the main tasks of SII, namely, the classification of an image's bounding boxes and the detection of the relevant part-of relations between objects. To the best of our knowledge, this is the first successful application of SRL to such SII tasks. The proposed approach is evaluated on a standard image processing benchmark. Experiments show that background knowledge in the form of logical constraints can improve the performance of purely data-driven approaches, including the state-of-the-art Fast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show that the use of logical background knowledge adds robustness to the learning system when errors are present in the labels of the training data.
Thursday 24 15:00-16:00 ROB-XXX - MISCELLANEOUS: ROBOTICS AND VOTING

Integrating Answer Set Programming with Semantic Dictionaries for Robot Task Planning
Dongcai Lu, Yi Zhou, Feng Wu, Zhao Zhang, Xiaoping Chen
In this paper, we propose a novel integrated task planning system for service robots in domestic domains. Given open-ended, high-level user instructions in natural language, robots need to generate a plan, i.e., a sequence of low-level executable actions, to complete the required tasks. To address this, we exploit the knowledge on semantic roles of common verbs defined in semantic dictionaries such as FrameNet and integrate it with Answer Set Programming, a task planning framework with both a representation language and solvers. In the experiments, we evaluated our approach using common benchmarks on service tasks and showed that it can successfully handle many more tasks than the state-of-the-art solution. Notably, we deployed the proposed planning system on our service robot for the annual RoboCup@Home competitions and achieved very encouraging results.

Dual Track Multimodal Automatic Learning through Human-Robot Interaction
Shuqiang Jiang, Weiqing Min, Xue Li, Huayang Wang, Jian Sun, Jiaqi Zhou
Human beings constantly improve their cognitive ability by automatically learning from interaction with the environment. Two important aspects of automatic learning are visual perception and knowledge acquisition. The fusion of these two aspects is vital for improving the intelligence and interaction performance of robots. Many automatic knowledge extraction and recognition methods have been widely studied. However, little work focuses on integrating automatic knowledge extraction and recognition into a unified framework that enables joint visual perception and knowledge acquisition. To solve this problem, we propose a Dual Track Multimodal Automatic Learning (DTMAL) system, which consists of two components: Hybrid Incremental Learning (HIL) on the vision track and Multimodal Knowledge Extraction (MKE) on the knowledge track. HIL can incrementally improve the recognition ability of the system by learning new object samples and new object concepts. MKE can construct and update multimodal knowledge items based on the new objects recognized by HIL and on other knowledge obtained by exploring the multimodal signals. The fusion of the two tracks is a mutually reinforcing process, and both tracks jointly contribute to dual track learning. We conducted experiments through human-machine interaction, and the experimental results validate the effectiveness of our proposed system.

Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context
Rohan Paul, Andrei Barbu, Sue Felshin, Boris Katz, Nicholas Roy
A robot’s ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. A probabilistic model estimates, from a natural language utterance, the objects, relations, and actions that the utterance refers to, the objectives for future robotic actions it implies, and generates a plan to execute those actions while updating a state representation to include newly acquired knowledge from the visual-linguistic context. Grounding a command necessitates a representation for past observations and interactions; however, maintaining the full context consisting of all possible observed objects, attributes, spatial relations, actions, etc., over time is intractable. Instead, our model, Temporal Grounding Graphs, maintains a learned state representation for a belief over factual groundings, those derived from natural-language interactions, and lazily infers new groundings from visual observations using the context implied by the utterance. This work significantly expands the range of language that a robot can understand by incorporating factual knowledge and observations of its workspace into its inference about the meaning and grounding of natural-language utterances.

Voting by Sequential Elimination with Few Voters
Sylvain Bouveret, Yann Chevaleyre, François Durand, Jérôme Lang
We define a new class of low-communication voting rules, tailored for contexts with few voters and possibly many candidates. These rules are defined by a predefined sequence of voters: at each stage, the designated voter eliminates a candidate, and the last remaining candidate wins. We study both deterministic (non-anonymous) variants and randomized (anonymous) versions of these rules. We focus on a subfamily of these rules defined by “non-interleaved” sequences. We first focus on the axiomatic properties of our rules. Then we focus on identifying the non-interleaved sequence that gives the best approximation of the Borda score under the impartial culture. Finally, we apply our rules to randomly generated data. Our conclusion is that, in contexts where there are more candidates than voters, elimination-based rules allow for very low communication complexity (and, in particular, avoid asking voters to rank alternatives), and yet can be good approximations of common voting rules, while enjoying a number of good properties.
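A minimal sketch of the deterministic rule described above. It assumes sincere voters: each designated voter eliminates their least-preferred remaining candidate, and the sequence has exactly m - 1 entries. The function name is hypothetical.

```python
def sequential_elimination(rankings, sequence):
    """Voting by sequential elimination (hypothetical helper).

    rankings: list indexed by voter; each is a list of candidates,
    most preferred first. sequence: voter indices, one per elimination
    (exactly m - 1 entries). Sincere play is assumed: each designated
    voter removes their least-preferred remaining candidate.
    """
    remaining = set(rankings[0])
    if len(sequence) != len(remaining) - 1:
        raise ValueError("sequence must have exactly m - 1 entries")
    for voter in sequence:
        worst = next(c for c in reversed(rankings[voter]) if c in remaining)
        remaining.remove(worst)
    (winner,) = remaining
    return winner

rankings = [list("abcd"), list("dcba")]
```

Each step requires the designated voter to communicate a single candidate, which keeps communication low. With these two opposed voters, the sequence (0, 1, 0) elects b, while (1, 1, 0) elects c.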
Thursday 24 16:30-18:00 NLP-IE - Information Extraction

How to Keep a Knowledge Base Synchronized with Its Encyclopedia Source
Jiaqing Liang, Sheng Zhang, Yanghua Xiao
Knowledge bases are playing an increasingly important role in many real-world applications. However, most of these knowledge bases tend to be outdated, which limits their utility. In this paper, we investigate how to keep a knowledge base fresh by synchronizing it with its data source (usually encyclopedia websites). A direct solution, used by most existing methods, is to revisit the whole encyclopedia periodically and rerun the entire knowledge base construction pipeline. However, this solution is wasteful and places a massive load on the network, which limits the update frequency and leads to knowledge obsolescence. To overcome this weakness, we propose a set of synchronization principles upon which we build an Update System for knowledge Base (USB) with an update frequency predictor of entities as the core component. We also design a set of effective features and realize the predictor. We conduct extensive experiments to justify the effectiveness of the proposed system, model, and underlying principles. Finally, we deploy USB on a Chinese knowledge base to improve its freshness.

Iterative Entity Alignment via Joint Knowledge Embeddings
Hao Zhu, Ruobing Xie, Zhiyuan Liu, Maosong Sun
Entity alignment aims to link entities with their counterparts across multiple knowledge graphs (KGs). Most existing methods rely on external information about entities, such as Wikipedia links, and require costly manual feature construction to complete alignment. In this paper, we present a novel approach for entity alignment via joint knowledge embeddings. Our method jointly encodes both entities and relations of various KGs into a unified low-dimensional semantic space according to a small seed set of aligned entities. During this process, we can align entities according to their semantic distance in this joint semantic space. More specifically, we present an iterative and parameter-sharing method to improve alignment performance. Experimental results on real-world datasets show that, compared to baselines, our method achieves significant improvements on entity alignment, and can further improve knowledge graph completion performance on various KGs with the help of joint knowledge embeddings.

Conditional Generative Adversarial Networks for Commonsense Machine Comprehension
Bingning Wang, Kang Liu, Jun Zhao
The recently proposed Story Cloze Test [Mostafazadeh et al., 2016] is a commonsense machine comprehension task that addresses natural language understanding. This dataset contains many story tests that require commonsense inference ability. Unfortunately, the training data is almost unsupervised: each context document is followed by only one positive sentence that can be inferred from the context. However, at test time, we must make an inference from two candidate sentences. To tackle this problem, we employ generative adversarial networks (GANs) to generate fake sentences. We propose Conditional GANs (CGANs), in which the generator is conditioned on the context. Our experiments show the advantage of CGANs in discriminating sentences, achieving state-of-the-art results on the commonsense story reading comprehension task compared with previous feature engineering and deep learning methods.

Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel Data
Tengfei Ma, Tetsuya Nasukawa
Topic models have been successfully applied to lexicon extraction. However, most previous methods are limited to document-aligned data. In this paper, we address two challenges of applying topic models to lexicon extraction from non-parallel data: (1) the difficulty of modeling word relationships and (2) a noisy seed dictionary. To solve these two challenges, we propose two new bilingual topic models that better capture the semantic information of each word while discriminating among the multiple translations in a noisy seed dictionary. We extend the scope of topic models by inverting the roles of "word" and "document". In addition, to handle the noise in the seed dictionary, we incorporate the probability of translation selection into our models. Moreover, we propose an effective measure to evaluate the similarity of words in different languages and select the optimal translation pairs. Experimental results on real-world data demonstrate the utility and efficacy of the proposed models.

Self-paced Compensatory Deep Boltzmann Machine for Semi-Structured Document Embedding
Shuangyin Li, Rong Pan, Jun Yan
In the last decade, a huge number of documents with different types of rich metadata, known as Semi-Structured Documents (SSDs), have appeared in many real applications. Modeling this type of text data in the way humans understand text with informative metadata is an interesting research problem. In this paper, we introduce a Self-paced Compensatory Deep Boltzmann Machine (SCDBM) architecture that uses metadata to learn a deep structure layer-wise for embedding SSDs in a self-paced way. Inspired by how humans understand text, the model defines a deep process of document vector extraction beyond the space of words by joining the metadata, where each layer selects a different type of metadata. We present efficient learning and inference algorithms for the SCDBM model and empirically demonstrate that the representation discovered by this model performs better than state-of-the-art baselines on semi-structured document classification, retrieval, and tag prediction.

Effective Deep Memory Networks for Distant Supervised Relation Extraction
Xiaocheng Feng, Bing Qin, Jiang Guo, Ting Liu, Yongjie Liu
Distant supervised relation extraction (RE) has been an effective way of finding novel relational facts from text without labeled training data. Typically, it can be formalized as a multi-instance multi-label problem. In this paper, we introduce a novel neural approach for distant supervised RE with a specific focus on attention mechanisms. Unlike the feature-based logistic regression model and compositional neural models such as CNNs, our approach includes two major attention-based memory components, which are capable of explicitly capturing the importance of each context word for modeling the representation of the entity pair, as well as the intrinsic dependencies between relations. Such importance degrees and dependency relationships are calculated with multiple computational layers, each of which is a neural attention model over an external memory. Experiments on real-world datasets show that our approach performs significantly and consistently better than various baselines.
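Each computational layer described above is "a neural attention model over an external memory." Here is a minimal, generic sketch of one such hop, plus a stack of hops. Several choices are assumptions for illustration, not the paper's exact formulation: dot-product scoring, softmax normalization, and adding each hop's output to the query (a common end-to-end memory network pattern). The function names are hypothetical.

```python
import math

def attend(query, memory):
    """One attention hop over an external memory (hypothetical helper).

    Scores each memory slot by its dot product with the query, normalizes
    the scores with a softmax, and returns the weighted sum of the slots
    together with the attention weights.
    """
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    z = sum(exps)
    weights = [e / z for e in exps]
    output = [sum(w * slot[j] for w, slot in zip(weights, memory))
              for j in range(len(memory[0]))]
    return output, weights

def multi_hop(query, memory, hops=3):
    # Stack several hops; each hop's output is added to the query that the
    # next hop uses (an assumed, common memory-network design).
    for _ in range(hops):
        out, _ = attend(query, memory)
        query = [q + o for q, o in zip(query, out)]
    return query
```

The attention weights returned by `attend` are what "importance of each context word" refers to in this kind of model. Larger weights mean the corresponding memory slot contributes more to the pooled representation.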
Thursday 24 16:30-18:00 ML-NN2 - Neural Networks 2

Stacked Similarity-Aware Autoencoders
Wenqing Chu, Deng Cai
As one of the most popular unsupervised learning approaches, the autoencoder aims to transform inputs into outputs with the least discrepancy. The conventional autoencoder and most of its variants consider only one-to-one reconstruction, which ignores the intrinsic structure of the data and may lead to overfitting. To preserve the latent geometric information in the data, we propose stacked similarity-aware autoencoders. To train each single autoencoder, we first obtain a pseudo class label for each sample by clustering the input features. The hidden codes of samples sharing the same pseudo label are then required to satisfy an additional similarity constraint. Specifically, the similarity constraint is implemented as an extension of the recently proposed center loss. With this joint supervision of the autoencoder reconstruction error and the center loss, the learned feature representations can not only reconstruct the original data but also preserve its geometric structure. Furthermore, a stacked framework is introduced to boost the representation capacity. Experimental results on several benchmark datasets show a remarkable performance improvement of the proposed algorithm over other autoencoder-based approaches.
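A minimal sketch of the center loss term that the similarity constraint builds on: L_C = 1/2 · Σ_i ‖h_i − c_{y_i}‖², where h_i is a hidden code, y_i its pseudo label from clustering, and c_y the center of cluster y. Here the centers are recomputed as class means for clarity; in training they are usually updated per mini-batch. The weight `lam` in the joint objective and both function names are assumptions, and the paper's extension of center loss is not reproduced.

```python
def center_loss(codes, labels):
    """Center loss over hidden codes (hypothetical helper).

    codes: list of hidden-code vectors; labels: pseudo labels from
    clustering. Centers are recomputed as per-label means here.
    """
    dims = len(codes[0])
    sums, counts = {}, {}
    for h, y in zip(codes, labels):
        s = sums.setdefault(y, [0.0] * dims)
        for j in range(dims):
            s[j] += h[j]
        counts[y] = counts.get(y, 0) + 1
    centers = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return 0.5 * sum(
        sum((hj - cj) ** 2 for hj, cj in zip(h, centers[y]))
        for h, y in zip(codes, labels)
    )

def joint_loss(recon_error, codes, labels, lam=0.1):
    # Reconstruction error plus a weighted center-loss term (lam assumed).
    return recon_error + lam * center_loss(codes, labels)
```

For codes [[0, 0], [2, 0], [5, 5]] with pseudo labels [0, 0, 1], the first cluster's center is [1, 0]. Each of its members is at squared distance 1, so the loss is 0.5 · 2 = 1.0.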

Mention Recommendation for Twitter with End-to-end Memory Network
Haoran Huang, Qi Zhang, Xuanjing Huang
In this study, we investigated the problem of recommending usernames when people attempt to use the "@" sign to mention other people in Twitter-like social media. With the extremely rapid development of social networking services, this problem has received considerable attention in recent years. Previous methods have studied the problem from different aspects. Because most Twitter-like microblogging services limit the length of posts, statistical learning methods may be affected by word sparseness and synonyms. Although recent progress in neural word embedding methods has advanced the state of the art in many natural language processing tasks, the benefits of word embeddings have not been taken into consideration for this problem. In this work, we proposed a novel end-to-end memory network architecture to perform this task. We incorporated the interests of users with an external memory. A hierarchical attention mechanism was also applied to better consider the interests of users. Experimental results on a dataset we collected from Twitter demonstrated that the proposed method could outperform state-of-the-art approaches.

Hashtag Recommendation for Multimodal Microblog Using Co-Attention Network
Qi Zhang, Jiawen Wang, Haoran Huang, Xuanjing Huang, Yeyun Gong
In microblogging services, authors can use hashtags to mark keywords or topics. Many live social media applications (e.g., microblog retrieval and classification) can benefit greatly from these manually labeled tags. However, only a small portion of microblogs contain hashtags input by users. Moreover, many microblog posts contain not only textual content but also images. These visual resources also provide valuable information that may not be included in the textual content, and can therefore help recommend hashtags more accurately. Motivated by the successful use of the attention mechanism, we propose a co-attention network incorporating textual and visual information to recommend hashtags for multimodal tweets. Experimental results on data collected from Twitter demonstrated that the proposed method can achieve better performance than state-of-the-art methods that use textual information only.

Encoding and Recall of Spatio-Temporal Episodic Memory in Real Time
Poo-Hee Chang, Ah-Hwee Tan
Episodic memory enables a cognitive system to improve its performance by reflecting upon past events. In this paper, we propose a computational model called STEM for encoding and recall of episodic events together with the associated contextual information in real time. Based on a class of self-organizing neural networks, STEM is designed to learn memory chunks or cognitive nodes, each encoding a set of co-occurring multimodal activity patterns across multiple pattern channels. We present algorithms for recall of events based on partial and inexact input patterns. Our empirical results on a public-domain data set show that STEM displays a high level of efficiency and robustness in encoding and retrieval with both partial and noisy search cues when compared with a state-of-the-art associative memory model.

Deep Context: A Neural Language Model for Large-scale Networked Documents
Hao Wu, Kristina Lerman
We propose a scalable neural language model that leverages the links between documents to learn the deep context of words in documents. Our model, Deep Context Vector, takes advantage of distributed representations to exploit the word order in document sentences, as well as the semantic connections among linked documents in a document network. We evaluate our model on large-scale data collections that include Wikipedia pages and scientific and legal citation networks. We demonstrate its effectiveness and efficiency on document classification and link prediction tasks.

Earth Mover Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading
Sachin Kumar, Soumen Chakrabarti, Shourya Roy
Automatic short answer grading (ASAG) can reduce tedium for instructors, but is complicated by free-form student inputs. An important ASAG task is to assign ordinal scores to student answers, given some “model” or ideal answers. Here we introduce a novel framework for ASAG by cascading three building blocks into an end-to-end differentiable network: Siamese bidirectional LSTMs applied to a model and a student answer, a novel pooling layer based on earth mover's distance (EMD) across all hidden states from both LSTMs, and a flexible final regression layer to output scores. On standard ASAG data sets, our system shows substantial reduction in grade estimation error compared to competitive baselines. We demonstrate that EMD pooling results in substantial accuracy gains, and that a support vector ordinal regression (SVOR) output layer helps outperform softmax. Our system also outperforms recent attention mechanisms on LSTM states.
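
The sketch below illustrates the distance behind the pooling layer. It is not the paper's differentiable layer. It computes EMD between two equal-length sequences of hidden states, assuming each state carries uniform mass 1/n and using Euclidean ground distance. Under those assumptions an optimal transport plan is a permutation, so brute force over matchings is exact but only practical for tiny n. The helper names are hypothetical.

```python
import itertools
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def emd_uniform(states_a, states_b):
    """EMD between two equal-size sets of vectors, each vector carrying mass 1/n."""
    if len(states_a) != len(states_b):
        raise ValueError("this sketch assumes equal-length sequences")
    n = len(states_a)
    best = math.inf
    # With uniform, equal masses an optimal flow is a one-to-one matching.
    for perm in itertools.permutations(range(n)):
        cost = sum(euclidean(states_a[i], states_b[perm[i]]) for i in range(n))
        best = min(best, cost)
    return best / n
```

Because EMD ignores order, two answers whose hidden states are the same set in a different order are at distance zero. This order-insensitive matching is the kind of comparison the pooling layer builds on.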
Thursday 24 16:30  18:00 MLDMUL2  Data Mining and Unsupervised Learning 2

Reconstruction-based Unsupervised Feature Selection: An Embedded Approach
Jundong Li, Jiliang Tang, Huan Liu
Feature selection has been proven to be effective and efficient in preparing high-dimensional data for data mining and machine learning problems. Since real-world data is usually unlabeled, unsupervised feature selection has received increasing attention in recent years. Without label information, unsupervised feature selection needs alternative criteria to define feature relevance. Recently, data reconstruction error has emerged as a new criterion for unsupervised feature selection; it defines feature relevance as the capability of features to approximate the original data via a reconstruction function. Most existing algorithms in this family assume predefined, linear reconstruction functions. However, the reconstruction function should be data-dependent and may not always be linear, especially when the original data is high-dimensional. In this paper, we investigate how to learn the reconstruction function from the data automatically for unsupervised feature selection, and propose a novel reconstruction-based unsupervised feature selection framework, REFS, which embeds the reconstruction function learning process into feature selection. Experiments on various types of real-world datasets demonstrate the effectiveness of the proposed framework REFS.

Multiple Medoids based Multi-view Relational Fuzzy Clustering with Minimax Optimization
Yangtao Wang, Lihui Chen, Xiaoli Li
Multi-view data have become prevalent because more and more data can be collected from various sources. Each data set may be described by different sets of features, thus forming a multi-view data set, or multi-view data for short. To find the underlying patterns embedded in unlabelled multi-view data, many multi-view clustering approaches have been proposed. Fuzzy clustering, in which a data object can belong to several clusters with different memberships, is widely used in many applications. However, in most fuzzy clustering approaches, a single center or medoid is considered the representative of each cluster at the end of the clustering process. This may not be sufficient to ensure accurate data analysis. In this paper, a new multi-view fuzzy clustering approach for relational data, based on multiple medoids and minimax optimization and called M4FC, is proposed. In M4FC, every object is considered a medoid candidate with a weight. The higher the weight, the more likely the object is to be chosen as a final medoid. At the end of the clustering process, there may be more than one medoid in each cluster. Moreover, minimax optimization is applied to find consensus clustering results across the different views, each with its own set of features. Extensive experimental studies on several multi-view data sets, including real-world image and document data sets, demonstrate that M4FC not only outperforms the single-medoid-based multi-view fuzzy clustering approach, but also performs better than existing multi-view relational clustering approaches.

Flexible Orthogonal Neighborhood Preserving Embedding
Tianji Pang, Feiping Nie, Junwei Han
In this paper, we propose a novel linear subspace learning algorithm called Flexible Orthogonal Neighborhood Preserving Embedding (FONPE), which is a linear approximation of the Locally Linear Embedding (LLE) algorithm. Our novel objective function integrates two terms related to manifold smoothness and a flexible penalty defined on the projection fitness. Unlike Neighborhood Preserving Embedding (NPE), we relax the hard constraint by modeling the mismatch between the approximate linear embedding and the original nonlinear embedding instead of enforcing them to be equal, which allows FONPE to cope better with data sampled from a nonlinear manifold. In addition, instead of enforcing orthogonality between the projected points, we enforce the mapping to be orthogonal. As a result, FONPE tends to preserve distances, and thus the overall geometry can be preserved. Unlike LLE, FONPE has an explicit linear mapping between the input and the reduced spaces, so it can handle novel testing data straightforwardly. Moreover, when the projection matrix in our model becomes an identity matrix, our model reduces to denoising LLE (DLLE). Compared with the standard LLE, we demonstrate that DLLE handles noisy data better. Comprehensive experiments on several benchmark databases demonstrate the effectiveness of our algorithm.

User Profile Preserving Social Network Embedding
Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang
This paper addresses social network embedding, which aims to embed social network nodes, including user profile information, into a latent low-dimensional space. Most existing works on network embedding consider only network structure and ignore user-generated content that could be potentially helpful in learning a better joint network representation. Unlike rich node content in citation networks, user profile information in social networks is useful but noisy, sparse, and incomplete. To properly utilize this information, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPPSNE), which incorporates user profiles with network structure to jointly learn a vector representation of a social network. The key idea of UPPSNE is to embed user profile information via a nonlinear mapping into a consistent subspace, where network structure is seamlessly encoded to jointly learn informative node representations. Extensive experiments on four real-world social networks show that, compared to state-of-the-art baselines, our method learns better social network representations and achieves substantial performance gains in node classification and clustering tasks.

Multi-Component Nonnegative Matrix Factorization
Jing Wang, Feng Tian, Xiao Wang, Hongchuan Yu, Chang Hong Liu, Liang Yang
Real data are usually complex and contain various components. For example, face images have expressions and genders. Each component mainly reflects one aspect of the data and provides information that the others do not have. Therefore, exploring the semantic information of multiple components, as well as the diversity among them, is of great benefit for understanding data comprehensively and in depth. However, this cannot be achieved by current nonnegative matrix factorization (NMF)-based methods, even though NMF has shown remarkable competitiveness in learning parts-based representations of data. To overcome this limitation, we propose a novel multi-component nonnegative matrix factorization (MCNMF). Instead of seeking only one representation of the data, MCNMF learns multiple representations simultaneously, with the help of the Hilbert-Schmidt Independence Criterion (HSIC) as a diversity term. HSIC explores the diverse information among the representations, where each representation corresponds to a component. By integrating the multiple representations, a more comprehensive representation is then established. A new iterative updating optimization scheme is derived to solve the objective function of MCNMF, along with its correctness and convergence guarantees. Extensive experimental results on real-world datasets have shown that MCNMF not only achieves more accurate performance than state-of-the-art methods using the aggregated representation, but also interprets data from different aspects with the multiple representations, which is beyond what current NMFs can offer.
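
To make the diversity term concrete, here is a small sketch of the standard empirical HSIC estimator, HSIC(X, Y) = tr(KHLH) / (n - 1)^2, with centering matrix H = I - (1/n)11^T. It uses a linear kernel as an assumption; the abstract does not state which kernel MCNMF uses. Rows of `X` and `Y` are assumed to be paired samples, for example two learned representations of the same data points.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def linear_kernel(rows):
    # Gram matrix K[i][j] = <row_i, row_j> (the linear kernel is an assumption here).
    return [[sum(x * y for x, y in zip(r, s)) for s in rows] for r in rows]

def centering(n):
    # H = I - (1/n) * ones(n, n)
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

def hsic(X, Y):
    """Biased empirical HSIC between paired samples X[i] and Y[i]."""
    n = len(X)
    H = centering(n)
    K, L = linear_kernel(X), linear_kernel(Y)
    M = matmul(matmul(K, H), matmul(L, H))
    return sum(M[i][i] for i in range(n)) / (n - 1) ** 2
```

Values near zero indicate little (linear) dependence between the two representations, which is what a diversity term would encourage between components.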

Self-weighted Multiview Clustering with Multiple Graphs
Feiping Nie, Jing Li, Xuelong Li
In multi-view learning, it is essential to assign a reasonable weight to each view according to its importance. Thus, for the multi-view clustering task, a wise and elegant method should cluster multi-view data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be regarded approximately as the centroid of the graphs built for each view, with different confidences. We start from the natural idea that the weights can be learned by introducing a hyperparameter. After analyzing the weakness of this idea, we further propose a new multi-view clustering method that is fully self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign the cluster label to each data point without any post-processing, such as the $K$-means step in standard spectral clustering. Evaluations on two synthetic datasets demonstrate the effectiveness of our methods. Compared with several representative graph-based multi-view clustering approaches on four real-world datasets, the experimental results demonstrate that the proposed methods achieve better performance and that our new clustering method is more practical to use.
Thursday 24 16:30  18:00 KRACC  Action, Change and Causality

A Core-Guided Approach to Learning Optimal Causal Graphs
Antti Hyttinen, Paul Saikko, Matti Järvisalo
Discovery of causal relations is an important part of data analysis. Recent exact Boolean optimization approaches enable tackling very general search spaces of causal graphs with feedback cycles and latent confounders, simultaneously obtaining high accuracy by optimally combining conflicting independence information in sample data. We propose several domain-specific techniques and integrate them into a core-guided maximum satisfiability solver, thereby speeding up the current state of the art in exact search for causal graphs with cycles and latent confounders on simulated and real-world data.

Budget-Constrained Dynamics in Multiagent Systems
Rui Cao, Pavel Naumov
The paper introduces a notion of a budget-constrained multiagent transition system that associates two financial parameters with each transition: a pre-transition minimal budget requirement and a post-transition profit. The paper also proposes a new modal language for reasoning about such a system. The language uses a modality labeled by an agent as well as by budget and profit constraints. The main technical result is a sound and complete logical system that describes all universal properties of this modality. Among these properties is a form of the Transitivity axiom that captures the interplay between the budget and profit constraints.

GDL-III: A Description Language for Epistemic General Game Playing
Michael Thielscher
GDL-III, a description language for general game playing with imperfect information and introspection, supports the specification of epistemic games. These are characterised by rules that depend on the knowledge of players. GDL-III provides a simpler language for representing actions and knowledge than existing formalisms: domain descriptions require neither explicit axioms about the epistemic effects of actions nor explicit specifications of accessibility relations. We develop a formal semantics for GDL-III and demonstrate that this language, despite its syntactic simplicity, is expressive enough to model the famous Muddy Children domain. We also show that it significantly enhances the expressiveness of its predecessor GDL-II by formally proving that termination of games becomes undecidable, and we present experimental results with a reasoner for GDL-III applied to general epistemic puzzles.

Handling Non-local Dead-ends in Agent Planning Programs
Lukas Chrpa, Sebastian Sardina, Nir Lipovetzky
We propose an approach to reason about agent planning programs with global information. Agent planning programs can be understood as a network of planning tasks, accommodating long-term goals, non-terminating behaviors, and interactive execution. We provide a technique that relies on reasoning about "global" dead-ends and that can be incorporated into any planning-based approach to agent planning problems. In doing so, we also introduce the notion of online execution of such planning structures. We provide experimental evidence suggesting that the technique yields significant benefits.

Reasoning about Probabilities in Unbounded First-Order Dynamical Domains
Vaishak Belle, Gerhard Lakemeyer
When it comes to robotic agents operating in an uncertain world, a major concern in knowledge representation is to better relate high-level logical accounts of belief and action to low-level probabilistic sensorimotor data. Perhaps the most general formalism for dealing with degrees of belief, and in particular with how such beliefs should evolve in the presence of noisy sensing and acting, is the account by Bacchus, Halpern, and Levesque. In this paper, we reconsider that model of belief and propose a new logical variant that has much of the expressive power of the original, but goes beyond it in novel ways. In particular, by moving to a semantical account of a modal variant of the situation calculus based on possible worlds with unbounded domains and probability distributions over them, we are able to capture the beliefs of a fully introspective knowledge base with uncertainty by way of an only-believing operator. The paper introduces the new logic and discusses key properties as well as examples that demonstrate how the beliefs of a knowledge base change as a result of noisy actions.

Transfer Learning in Multi-Armed Bandits: A Causal Approach
Junzhe Zhang, Elias Bareinboim
Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods to infer the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully realized in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] and standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arm's distribution based on structural knowledge; second, incorporating these bounds in a dynamic allocation procedure so as to guide the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms and achieves orders of magnitude faster convergence rates than these algorithms. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
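
The first step, bounding an arm's reward from another agent's data, can be illustrated with the classical natural (Manski-style) bounds for a binary reward: P(x, y=1) <= E[Y | do(x)] <= P(x, y=1) + 1 - P(x). The sketch assumes binary rewards and observational records of (arm, reward) pairs, and then clips a UCB1 index into those bounds. The clipping rule and function names are illustrative assumptions. The paper's own allocation procedure may differ.

```python
import math

def natural_bounds(obs):
    """obs: (arm, reward) pairs logged by another agent, reward in {0, 1}.
    Returns {arm: (low, high)} bounds on E[reward | do(arm)]."""
    n = len(obs)
    bounds = {}
    for arm in {a for a, _ in obs}:
        p_arm = sum(1 for a, _ in obs if a == arm) / n
        p_arm_win = sum(1 for a, r in obs if a == arm and r == 1) / n
        bounds[arm] = (p_arm_win, p_arm_win + 1.0 - p_arm)
    return bounds

def clipped_ucb_index(mean, pulls, t, low, high):
    """UCB1 index forced into the causal bounds [low, high] (illustrative rule)."""
    ucb = mean + math.sqrt(2.0 * math.log(t) / pulls)
    return min(max(ucb, low), high)
```

An arm whose upper bound lies below another arm's lower bound can never win under this rule, which is how structural knowledge can prune exploration. Arms never played by the logging agent get no entry here; the trivial interval [0, 1] would apply to them.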
Thursday 24 16:30  18:00 CSCO  Constraint Optimisation

Constraint-Based Symmetry Detection in General Game Playing
Frederic Koriche, Sylvain Lagrue, Eric Piette, Sébastien Tabary
Symmetry detection is a promising approach for reducing the search tree of games. In General Game Playing (GGP), where any game is compactly represented by a set of rules in the Game Description Language (GDL), the state-of-the-art methods for symmetry detection rely on a rule graph associated with the GDL description of the game. Though such rule-based symmetry detection methods can be applied to various tree search algorithms, they cover only a limited number of symmetries, namely those apparent in the GDL description. In this paper, we develop an alternative approach to symmetry detection in stochastic games that exploits constraint programming techniques. The minimax optimization problem in a GDL game is cast as a stochastic constraint satisfaction problem (SCSP), which can be viewed as a sequence of one-stage SCSPs. Minimax symmetries are inferred according to the microstructure complement of these one-stage constraint networks. Based on a theoretical analysis of this approach, we experimentally show on various games that the recent stochastic constraint solver MAC-UCB, coupled with constraint-based symmetry detection, significantly outperforms the standard Monte Carlo Tree Search algorithms coupled with rule-based symmetry detection. This constraint-driven approach is also validated by the excellent results obtained by our player during the last GGP competition.

A Partitioning Algorithm for Maximum Common Subgraph Problems
Ciaran McCreesh, Patrick Prosser, James Trimble
We introduce a new branch-and-bound algorithm for the maximum common subgraph and maximum common connected subgraph problems, based around vertex labelling and partitioning. Our method in some ways resembles a traditional constraint programming approach, but uses a novel compact domain store and supporting inference algorithms which dramatically reduce the memory and computation requirements during search, and allow better dual viewpoint ordering heuristics to be calculated cheaply. Experiments show a speedup of more than an order of magnitude over the state of the art, and demonstrate that we can operate on much larger graphs without running out of memory.
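
The bound that drives this kind of partitioning search can be sketched as follows, assuming the induced (not necessarily connected) variant. Unmatched vertices of each graph are grouped by their adjacency pattern to the already-matched vertices. Each label class can contribute at most min(|G class|, |H class|) further pairs. The function names and the adjacency-set representation are assumptions. The compact domain store and ordering heuristics from the abstract are omitted.

```python
def label_classes(adj, mapped, candidates):
    """Group candidate vertices by their adjacency pattern to the mapped vertices."""
    classes = {}
    for v in candidates:
        label = tuple(u in adj[v] for u in mapped)
        classes.setdefault(label, []).append(v)
    return classes

def partition_bound(g_adj, h_adj, mapping, g_left, h_left):
    """Upper bound on the size of any extension of `mapping` (list of (g, h) pairs)."""
    # Mapped vertices are listed in matching order, so labels are comparable across graphs.
    g_mapped = [g for g, _ in mapping]
    h_mapped = [h for _, h in mapping]
    g_classes = label_classes(g_adj, g_mapped, g_left)
    h_classes = label_classes(h_adj, h_mapped, h_left)
    return len(mapping) + sum(
        min(len(vs), len(h_classes.get(label, []))) for label, vs in g_classes.items()
    )
```

For example, matching a path 0-1-2 against a triangle after mapping 0 to a gives a bound of 2, which equals the true maximum common induced subgraph (a single edge). A branch whose bound cannot beat the incumbent would be pruned.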

Robust Quadratic Programming for Price Optimization
Akihiro Yabe, Shinji Ito, Ryohei Fujimaki
The goal of price optimization is to maximize total revenue by adjusting the prices of products, on the basis of predicted sales numbers that are functions of pricing strategies. Recent advances in demand modeling using machine learning raise a new challenge in price optimization, i.e., how to manage statistical errors in estimation. In this paper, we show that uncertainty in recently proposed prescriptive price optimization frameworks can be represented by a matrix normal distribution. For this particular uncertainty, we propose novel robust quadratic programming algorithms for conservative lower-bound maximization. We offer an asymptotic probabilistic guarantee of conservativeness of our formulation. Our experiments on both artificial and actual price data show that our robust price optimization allows users to determine the best risk-return trade-offs and to explore safe, profitable price strategies.

XOR-Sampling for Network Design with Correlated Stochastic Events
Xiaojian Wu, Yexiang Xue, Bart Selman, Carla P. Gomes
Many network optimization problems can be formulated as stochastic network design problems in which edges are present or absent stochastically. Furthermore, protective actions can guarantee that edges will remain present. We consider the problem of finding the optimal protection strategy under a budget limit in order to maximize some connectivity measure of the network. Previous approaches rely on the assumption that edges are independent. In this paper, we consider a more realistic setting where multiple edges are not independent, because natural disasters or regional events make the states of multiple edges stochastically correlated. We use Markov Random Fields to model the correlation and define a new stochastic network design framework. We provide a novel algorithm based on Sample Average Approximation (SAA) coupled with a Gibbs or XOR sampler. The experimental results on real road network data show that the policies produced by SAA with the XOR sampler have higher quality and lower variance compared to SAA with the Gibbs sampler.
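
The SAA idea can be sketched on a toy instance. A fixed set of edge-failure scenarios is drawn once, and then the protection plan with the best average s-t connectivity over those scenarios is chosen. This sketch deliberately swaps in independent Bernoulli edges and plain Monte Carlo sampling in place of the paper's Markov Random Field with Gibbs or XOR sampling. It also enumerates plans exhaustively, so it only suits tiny graphs. All names are hypothetical.

```python
import itertools
import random

def connected(nodes, edges, s, t):
    """Depth-first search reachability from s to t over undirected edges."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return t in seen

def saa_protect(nodes, edge_probs, s, t, budget, num_samples=500, seed=0):
    rng = random.Random(seed)
    edges = list(edge_probs)
    # Sample scenarios once (independent edges: a simplifying assumption, not an MRF).
    scenarios = [{e for e in edges if rng.random() < edge_probs[e]} for _ in range(num_samples)]
    best_plan, best_value = None, -1.0
    for plan in itertools.combinations(edges, budget):
        protected = set(plan)  # protected edges are present in every scenario
        value = sum(connected(nodes, scen | protected, s, t) for scen in scenarios) / num_samples
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value
```

In a toy graph where the direct s-t edge is unreliable and the detour through a third node is fairly reliable, protecting the direct edge guarantees connectivity in every sampled scenario, so SAA selects it.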

Robust Regression via Heuristic Hard Thresholding
Xuchao Zhang, Liang Zhao, Arnold Boedihardjo, Chang-Tien Lu
The presence of data noise and corruptions has recently drawn increasing attention to Robust Least Squares Regression (RLSR), which addresses the fundamental problem of learning reliable regression coefficients when response variables can be arbitrarily corrupted. Until now, several important challenges could not be handled concurrently: 1) an exact recovery guarantee for the regression coefficients; 2) the difficulty of estimating the corruption ratio parameter; and 3) scalability to massive datasets. This paper proposes a novel Robust Least squares regression algorithm via Heuristic Hard thresholding (RLHH) that concurrently addresses all the above challenges. Specifically, the algorithm alternately optimizes the regression coefficients and estimates the optimal uncorrupted set via heuristic hard thresholding, without a corruption ratio parameter, until it converges. We also prove that our algorithm benefits from strong guarantees analogous to those of state-of-the-art methods in terms of convergence rates and recovery guarantees. We provide empirical evidence that our new method is more effective than existing methods in recovering both the regression coefficients and the uncorrupted sets, with very competitive efficiency.
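
For intuition, the alternating scheme can be sketched for simple linear regression. The loop fits least squares on the current trusted set, then hard-thresholds by keeping the points with the smallest residuals. Unlike RLHH, this sketch assumes the size of the uncorrupted set (`keep`) is known, which is exactly the parameter the paper's heuristic avoids. The function names are hypothetical.

```python
def least_squares_1d(xs, ys):
    """Closed-form slope and intercept for simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def hard_threshold_regression(xs, ys, keep, iters=20):
    """Alternate a least-squares fit with keeping the `keep` smallest residuals.
    NOTE: `keep` is an assumed, fixed uncorrupted-set size; RLHH does not need it."""
    active = list(range(len(xs)))
    for _ in range(iters):
        slope, intercept = least_squares_1d([xs[i] for i in active], [ys[i] for i in active])
        residuals = [abs(ys[i] - (slope * xs[i] + intercept)) for i in range(len(xs))]
        new_active = sorted(range(len(xs)), key=lambda i: residuals[i])[:keep]
        if sorted(new_active) == sorted(active):
            break  # trusted set stopped changing
        active = new_active
    return slope, intercept, sorted(active)
```

With data on y = 2x + 1 and two responses corrupted by +50, the first fit is pulled toward the outliers. Their residuals are still the largest, though, so the second fit on the trusted set recovers the clean line.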

Restart and Random Walk in Local Search for Maximum Vertex Weight Cliques with Evaluations in Clustering Aggregation
Yi Fan, Nan Li, Chengqian Li, Zongjie Ma, Kaile Su, Longin Jan Latecki
The Maximum Vertex Weight Clique (MVWC) problem is NP-hard and also important in real-world applications. In this paper we propose to use restart and random walk strategies to improve local search for MVWC. If a solution is revisited in some particular situation, the search restarts. In addition, when the local search has no options other than dropping vertices, it performs a random walk. Experimental results show that our solver outperforms state-of-the-art solvers on DIMACS and finds a new best-known solution. It is also the only solver that is competitive with state-of-the-art methods on both BHOSLIB and large crafted graphs. Furthermore, we evaluated our solver in clustering aggregation. Experimental results on a number of real data sets demonstrate that our solver outperforms the state of the art for solving the derived MVWC problem and helps improve the final clustering results.
Thursday 24 16:30  18:00 MLTAML3  Transfer, Adaptation, MultiTask Learning 3

Modal Consistency based Pre-Trained Multi-Model Reuse
Yang Yang, De-Chuan Zhan, Xiang-Yu Guo, Yuan Jiang
Multi-Model Reuse is one of the prominent problems in the Learnware framework, and its main issue lies in obtaining the final prediction from the responses of multiple pre-trained models. Unlike multi-classifier ensembles, the Multi-Model Reuse configuration provides only pre-trained models rather than the whole training sets. This configuration is closer to real applications, where the reliability of each model cannot be evaluated properly. In this paper, to address the lack of reliability evaluation, we exploit the potential consistency of models across different modalities. Using the consistency of pre-trained models on different modalities, we propose a Pre-trained Multi-Model Reuse approach, PM2R, for multimodal data, which realizes the reusability of multiple models. PM2R can combine multiple pre-trained models efficiently without retraining, and consequently no training data needs to be stored. We describe the more realistic Multi-Model Reuse setting comprehensively in our paper, and point out the differences among this setting, classifier ensembles, and late fusion in multimodal learning. Experiments on synthetic and real-world datasets validate the effectiveness of PM2R when it is compared with state-of-the-art ensemble/multimodal learning methods under this more realistic setting.

Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network
Jufeng Yang, Dongyu She, Ming Sun
Visual sentiment analysis is attracting more and more attention with the increasing tendency to express emotions through visual content. Recent algorithms based on convolutional neural networks (CNNs) have considerably advanced emotion classification, which aims to distinguish among emotional categories and assigns a single dominant label to each image. However, the task is inherently ambiguous, since an image usually evokes multiple emotions and its annotation varies from person to person. In this work, we address the problem via label distribution learning (LDL) and develop a multi-task deep framework that jointly optimizes classification and distribution prediction. While the proposed method favors distribution datasets with annotations from different voters, majority voting is widely adopted to obtain the ground truth in this area, and few datasets provide multiple affective labels. Hence, we further exploit two weak forms of prior knowledge, expressed as similarity information between labels, to generate an emotional distribution for each category. Experiments conducted on both distribution datasets (Emotion6, Flickr_LDL, Twitter_LDL) and the largest single-emotion dataset (Flickr and Instagram) demonstrate that the proposed method outperforms state-of-the-art approaches.

Tensor Based Knowledge Transfer Across Skill Categories for Robot Control
Chenyang Zhao, Timothy Hospedales, Freek Stulp, Olivier Sigaud
Advances in hardware and learning for control are enabling robots to perform increasingly dextrous and dynamic control tasks. These skills typically require a prohibitive amount of exploration for reinforcement learning, and so are commonly achieved by imitation learning from manual demonstration. The costly, non-scalable nature of manual demonstration has motivated work on skill generalisation, e.g., through contextual policies and options. Despite good results, existing work along these lines is limited to generalising across variants of one skill, such as throwing an object to different locations. In this paper we go significantly further and investigate generalisation across qualitatively different classes of control skills. In particular, we introduce a class of neural network controllers that can realise four distinct skill classes: reaching, object throwing, casting, and ball-in-cup. By factorising the weights of the neural network, we are able to extract transferrable latent skills that enable dramatic acceleration of learning in cross-task transfer. With a suitable curriculum, this allows us to learn challenging dextrous control tasks like ball-in-cup from scratch with pure reinforcement learning.

Learning with Previously Unseen Features
Yuan Shi, Craig Knoblock
We study the problem of improving a machine learning model by identifying and using features that are not in the training set. This is applicable to machine learning systems deployed in an open environment. For example, a prediction model built on a set of sensors may be improved when it has access to new and relevant sensors at test time. To effectively use new features, we propose a novel approach that learns a model over both the original and new features, with the goal of making the joint distribution of features and predicted labels similar to that in the training set. Our approach can naturally leverage labels associated with these new features when they are accessible. We present an efficient optimization algorithm for learning the model parameters and empirically evaluate the approach on several regression and classification tasks. Experimental results show that our approach achieves an average improvement of 11.2% over baselines.

Exploiting High-Order Information in Heterogeneous Multi-Task Feature Learning
Yong Luo, Yonggang Wen, Dacheng Tao
Multi-task feature learning (MTFL) aims to improve the generalization performance of multiple related learning tasks by sharing features between them. It has been successfully applied to many pattern recognition and biometric prediction problems. Most current MTFL methods assume that different tasks exploit the same feature representation, and thus are not applicable to scenarios where data are drawn from heterogeneous domains. Existing heterogeneous transfer learning (including multi-task learning) approaches usually handle multiple heterogeneous domains by learning feature transformations across different domains, but they ignore the high-order statistics (correlation information) that can only be discovered by simultaneously exploring all domains. We therefore develop a tensor-based heterogeneous MTFL (TH-MTFL) framework to exploit such high-order information. Specifically, feature transformations of all domains are learned together and finally used to derive new representations. A connection between all domains is built by using the transformations to project the pre-learned predictive structures of different domains into a common subspace and minimizing their divergence in that subspace. By exploring the high-order information, the proposed TH-MTFL can obtain more reliable feature transformations than existing heterogeneous transfer learning approaches. Extensive experiments on both text categorization and social image annotation demonstrate the superiority of the proposed method.

Adaptive Group Sparse Multitask Learning via Trace Lasso
Sulin Liu, Sinno Jialin Pan
In multi-task learning (MTL), tasks are learned jointly so that information among related tasks is shared and utilized to help improve generalization for each individual task. A major challenge in MTL is how to selectively choose what to share among tasks. Ideally, only related tasks should share information with each other. In this paper, we propose a new MTL method that can adaptively group correlated tasks into clusters and share information among the correlated tasks only. Our method is based on the assumption that each task's parameter vector is a linear combination of those of the other tasks, and that a coefficient of this combination is active only if the two tasks are related. By introducing a trace Lasso penalty on these coefficients, our method is able to adaptively select the subset of coefficients corresponding to the tasks that are correlated with the given task. Our model removes the need to determine the task clustering structure in advance, as is required in the literature. Efficient optimization methods based on the alternating direction method of multipliers (ADMM) are developed to solve the problem. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our method in terms of clustering related tasks and generalization performance.
Thursday 24 16:30  18:00 MLEM  Ensemble Methods

Positive unlabeled learning via wrapper-based adaptive sampling
Pengyi Yang, Wei Liu, Jean Yang
Learning from positive and unlabeled data frequently occurs in applications where only a subset of positive instances is available while the rest of the data are unlabeled. In such scenarios, the goal is often to create a discriminant model that can accurately classify both positive and negative data by learning from labeled and unlabeled instances. In this study, we propose an adaptive sampling (AdaSampling) approach that utilises prediction probabilities from a model to iteratively update the training data. Starting with equal prior probabilities for all unlabeled data, our method "wraps" around a predictive model to iteratively update these probabilities to distinguish positive and negative instances in unlabeled data. Subsequently, one or more robust negative sets can be drawn from the unlabeled data, according to the likelihood of each instance being negative, to train a single classification model or an ensemble of models.
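
The wrapper loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: only negatives are resampled, the data are one-dimensional, and `centroid_model` is a hypothetical toy classifier standing in for whatever predictive model is wrapped.

```python
import random


def centroid_model(pos, neg):
    """Hypothetical toy 1-D classifier: P(positive | x) from the relative
    distance of x to the positive and negative class centroids."""
    c_pos = sum(pos) / len(pos)
    c_neg = sum(neg) / len(neg)

    def predict_pos(x):
        d_pos, d_neg = abs(x - c_pos), abs(x - c_neg)
        if d_pos + d_neg == 0:
            return 0.5
        return d_neg / (d_pos + d_neg)

    return predict_pos


def ada_sampling(positives, unlabeled, n_iter=5, seed=0):
    """Return the final P(negative) estimate for each unlabeled instance."""
    rng = random.Random(seed)
    # Equal prior: every unlabeled instance starts with the same P(negative).
    p_neg = [0.5] * len(unlabeled)
    for _ in range(n_iter):
        # Draw a negative set from U, weighting instances by their current P(negative).
        negatives = rng.choices(unlabeled, weights=p_neg, k=len(unlabeled))
        model = centroid_model(positives, negatives)
        # "Wrap" the model: its predictions become the new sampling probabilities.
        p_neg = [1.0 - model(x) for x in unlabeled]
    return p_neg


positives = [4.8, 5.0, 5.2, 5.1]
unlabeled = [5.05, 4.9, 0.1, -0.2, 0.0, 0.3, 0.2, -0.1]
print([round(p, 2) for p in ada_sampling(positives, unlabeled)])
```

Unlabeled points lying near the positives end up with low P(negative), so later negative sets are drawn mostly from the points far from them.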

Integrating Specialized Classifiers Based on Continuous Time Markov Chain
Zhizhong Li, Dahua Lin
Specialized classifiers, namely those dedicated to a subset of classes, are often adopted in real-world recognition systems. However, integrating such classifiers is nontrivial. Existing methods, e.g., weighted averaging, usually implicitly assume that all constituents of an ensemble cover the same set of classes. Such methods can produce misleading predictions when used to combine specialized classifiers. This work explores a novel approach. Instead of combining predictions from individual classifiers directly, it first decomposes the predictions into sets of pairwise preferences, treats them as transition channels between classes, then constructs a continuous-time Markov chain on this basis and uses the equilibrium distribution of this chain as the final prediction. This allows us to form a coherent picture over all specialized predictions. On large public datasets, the proposed method obtains considerable improvements over mainstream ensemble methods, especially when the classifier coverage is highly unbalanced.
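
A minimal sketch of the idea, assuming one plausible rate construction that may differ from the paper's: each classifier covering both classes i and j adds a transition i -> j at a rate equal to its probability for j. The equilibrium distribution is approximated by uniformizing the chain and running power iteration. `integrate_specialized` is a hypothetical name.

```python
def integrate_specialized(num_classes, predictions, iters=5000):
    """Combine specialized classifiers through a continuous-time Markov chain.

    predictions: one dict {class_index: probability} per classifier, covering
    only the classes that classifier specializes in.
    """
    # Assumed rate construction: every classifier that covers both i and j
    # adds a transition i -> j whose rate is its probability for class j.
    Q = [[0.0] * num_classes for _ in range(num_classes)]
    for pred in predictions:
        for i in pred:
            for j, p_j in pred.items():
                if i != j:
                    Q[i][j] += p_j
    exit_rates = [sum(row) for row in Q]
    for i in range(num_classes):
        Q[i][i] = -exit_rates[i]

    # Uniformization: P = I + Q / lam is a stochastic matrix sharing the chain's
    # equilibrium distribution; power iteration on P then approximates it.
    lam = 1.5 * max(exit_rates) or 1.0
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(num_classes)]
         for i in range(num_classes)]
    pi = [1.0 / num_classes] * num_classes
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(num_classes)) for j in range(num_classes)]
    total = sum(pi)
    return [x / total for x in pi]


# Classifier A only knows classes 0 and 1; classifier B only knows classes 1 and 2.
preds = [{0: 0.9, 1: 0.1}, {1: 0.8, 2: 0.2}]
print(integrate_specialized(3, preds))  # approximately [36/41, 4/41, 1/41]
```

With this construction, a single classifier covering every class yields its own prediction as the equilibrium, so the chain only changes the answer when coverage is partial.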

Unsupervised Learning of Deep Feature Representation for Clustering Egocentric Actions
Bharat Lal Bhatnagar, Suriya Singh, Chetan Arora, C.V. Jawahar
The popularity of wearable cameras in life logging, law enforcement, assistive vision and other similar applications is leading to an explosion in the generation of egocentric video content. First-person action recognition is an important aspect of the automatic analysis of such videos. Annotating such videos is hard, not only because of obvious scalability constraints, but also because of the privacy issues often associated with egocentric videos. This motivates the use of unsupervised methods for egocentric video analysis. In this work, we propose a robust and generic unsupervised approach for first-person action clustering. Unlike contemporary approaches, our technique is neither limited to any particular class of actions nor requires priors such as pre-training, fine-tuning, etc. We learn time-sequenced visual and flow features from an array of weak feature extractors based on convolutional and LSTM autoencoder networks. We demonstrate that clustering such features leads to the discovery of semantically meaningful actions present in the video. We validate our approach on four disparate public egocentric action datasets amounting to approximately 50 hours of video. We show that our approach surpasses the accuracy of supervised state-of-the-art methods without using the action labels.

Bayesian Aggregation of Categorical Distributions with Applications in Crowdsourcing
Alexandry Augustin, Matteo Venanzi, Nicholas R. Jennings, Alex Rogers
A key problem in crowdsourcing is the aggregation of judgments of proportions. For example, workers might be presented with a news article or an image, and be asked to identify the proportion of each topic, sentiment, object, or colour present in it. These varying judgments then need to be aggregated to form a consensus view of the document’s or image’s contents. Often, however, these judgments are skewed by workers who provide judgments randomly. Such spammers increase the cost of acquiring judgments and degrade the accuracy of the aggregation. For such cases, we provide a new Bayesian framework for aggregating these responses (expressed in the form of categorical distributions) that, for the first time, accounts for spammers. We elicit 796 judgments about proportions of objects and colours in images. Experimental results show that, when 60% of the workers are spammers, our framework achieves aggregation accuracy comparable to that of other state-of-the-art approaches when there are no spammers.

Deep Forest: Towards An Alternative to Deep Neural Networks
Zhi-Hua Zhou, Ji Feng
In this paper, we propose gcForest, a decision tree ensemble approach with performance highly competitive with deep neural networks in a broad range of tasks. In contrast to deep neural networks, which require great effort in hyperparameter tuning, gcForest is much easier to train; even when it is applied to different data across different domains in our experiments, excellent performance can be achieved with almost the same hyperparameter settings. The training process of gcForest is efficient, and users can control the training cost according to the computational resources available. The efficiency may be further enhanced because gcForest is naturally suited to parallel implementation. Furthermore, in contrast to deep neural networks, which require large-scale training data, gcForest can work well even when only small-scale training data are available.

Stacking With Auxiliary Features
Nazneen Fatema Rajani, Raymond J. Mooney
Ensembling methods are well known for improving prediction accuracy. However, they are limited in the sense that they cannot effectively discriminate among component models. In this paper, we propose stacking with auxiliary features, which learns to fuse additional relevant information from multiple component systems as well as input instances to improve performance. We use two types of auxiliary features: instance features and provenance features. The instance features enable the stacker to discriminate across input instances, and the provenance features enable the stacker to discriminate across component systems. When these are combined, our algorithm learns to rely on systems that not only agree on an output but also agree on the provenance of this output, in conjunction with the properties of the input instance. We demonstrate the success of our approach on three very different and challenging natural language and vision problems: Slot Filling, Entity Discovery and Linking, and ImageNet Object Detection. We obtain new state-of-the-art results on the first two tasks and significant improvements on the ImageNet task, thus verifying the power and generality of our approach.
Thursday 24 16:30  18:00 KRNMR  Non-Monotonic Reasoning

Semantics for Active Integrity Constraints Using Approximation Fixpoint Theory
Bart Bogaerts, Luís Cruz-Filipe
Active integrity constraints (AICs) constitute a formalism for associating with a database not just the constraints it should adhere to, but also how to fix the database in case one or more of these constraints are violated. The intuitions regarding which repairs are “good” given such a description are closely related to intuitions that live in various areas of nonmonotonic reasoning. In this paper, we apply approximation fixpoint theory, an algebraic framework that unifies semantics of nonmonotonic logics, to the field of AICs. This results in a new family of semantics for AICs, whose properties and relationships to existing semantics we study. We argue that the AFT-well-founded semantics has some desirable properties.

Safe Inductions: An Algebraic Study
Bart Bogaerts, Joost Vennekens, Marc Denecker
In many knowledge representation formalisms, a constructive semantics is defined based on sequential applications of rules or of a semantic operator. These constructions often share the property that rule applications must be delayed until it is safe to do so: until it is known that the condition that triggers the rule will continue to hold. This intuition occurs, for instance, in the well-founded semantics of logic programs and in autoepistemic logic. In this paper, we formally define the safety criterion algebraically. We study properties of so-called safe inductions and apply our theory to logic programming and autoepistemic logic. For the latter, we show that safe inductions manage to capture the intended meaning of a class of theories on which all classical constructive semantics fail.

A Study of Unrestricted Abstract Argumentation Frameworks
Ringo Baumann, Christof Spanring
Research in abstract argumentation typically pertains to finite argumentation frameworks (AFs). Actually or potentially infinite AFs frequently occur if they are used for the purpose of nonmonotonic entailment, so-called instantiation-based argumentation, or if they are involved as a modeling tool for dialogues, n-person games or action sequences. Apart from these practical cases, a profound analysis yields a better understanding of how the nonmonotonic theory of abstract argumentation works in general. In this paper we study a range of abstract properties, such as SCC-recursiveness, expressiveness and intertranslatability, for unrestricted AFs.

Streaming Multi-Context Systems
Minh Dao-Tran, Thomas Eiter
Multi-Context Systems (MCS) are a powerful framework to interlink heterogeneous knowledge bases under equilibrium semantics. Recent extensions of MCS to dynamic data settings either abstract from computing time or abandon a dynamic equilibrium semantics. We thus present streaming MCS, which have a run-based semantics that accounts for asynchronous, distributed execution and supports obtaining equilibria for contexts in cyclic exchange (avoiding infinite loops); moreover, they equip MCS with native stream reasoning features. Ad-hoc query answering is NP-complete, while prediction is PSPACE-complete in relevant settings (but undecidable in general); we also give tractability results for suitable restrictions.

A Unifying Framework for Probabilistic Belief Revision
Zhiqiang Zhuang, James Delgrande, Abhaya Nayak, Abdul Sattar
In this paper we provide a general, unifying framework for probabilistic belief revision. We first introduce a probabilistic logic called p-logic that is capable of representing and reasoning with basic probabilistic information. With p-logic as the background logic, we define a revision function called p-revision that resembles partial meet revision in the AGM framework. We provide a representation theorem for p-revision which shows that it can be characterised by the set of basic AGM revision postulates. P-revision represents an "all purpose" method for revising probabilistic information that can be used for, but is not limited to, the revision problems behind Bayesian conditionalisation, Jeffrey conditionalisation, and Lewis's imaging. Importantly, p-revision subsumes all three approaches, indicating that Bayesian conditionalisation, Jeffrey conditionalisation, and Lewis's imaging all obey the basic principles of AGM revision. In addition, our investigation sheds light on the corresponding operation of AGM expansion in the probabilistic setting.
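
For reference, Jeffrey conditionalisation, one of the revision problems named above, can be sketched on a finite set of worlds: given new probabilities q_k for the cells E_k of a partition, each world w in E_k receives q_k * P(w) / P(E_k). This is the textbook rule, not p-revision itself; the world names and numbers are hypothetical.

```python
from fractions import Fraction


def jeffrey_update(prior, partition, new_probs):
    """Jeffrey conditionalisation on a finite set of worlds.

    prior: dict world -> probability; partition: list of disjoint sets of worlds
    covering all worlds; new_probs: new probability for each cell (summing to 1).
    """
    posterior = {}
    for cell, q in zip(partition, new_probs):
        cell_mass = sum(prior[w] for w in cell)
        if cell_mass == 0:
            raise ValueError("each cell needs positive prior probability")
        for w in cell:
            # Rescale within the cell so that the cell as a whole gets probability q.
            posterior[w] = q * prior[w] / cell_mass
    return posterior


prior = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
cells = [{"a", "b"}, {"c"}]
print(jeffrey_update(prior, cells, [Fraction(3, 4), Fraction(1, 4)]))
# Bayesian conditionalisation on {"a", "b"} is the special case new_probs = [1, 0].
print(jeffrey_update(prior, cells, [Fraction(1), Fraction(0)]))
```

Exact fractions are used so that the relative proportions inside each cell are visibly preserved.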

Lazy-Grounding for Answer Set Programs with External Source Access
Thomas Eiter, Tobias Kaminski, Antonius Weinzierl
HEX-programs enrich the well-known Answer Set Programming (ASP) paradigm. In HEX, problems are solved using nonmonotonic logic programs with bidirectional access to external sources. ASP evaluation is traditionally based on grounding the input program first, but recent advances in lazy-grounding make the latter interesting for HEX as well, as the grounding bottleneck of ASP may be avoided. We explore this issue and present a new evaluation algorithm for HEX-programs based on lazy-grounding solving for ASP. Nonmonotonic dependencies and value invention (i.e., import of new constants) from external sources make an efficient solution nontrivial. However, illustrative benchmarks show a clear advantage of the new algorithm for grounding-intense programs, which opens a new perspective for making HEX more suitable for real-world application needs.
Thursday 24 16:30  18:00 AUTETH  AI & Autonomy: Ethics and Responsibility

Responsible Autonomy
Virginia Dignum
As intelligent systems increasingly make decisions that directly affect society, perhaps the most important upcoming research direction in AI is to rethink the ethical implications of their actions. Means are needed to integrate moral, societal and legal values with technological developments in AI, both during the design process and as part of the deliberation algorithms employed by these systems. In this paper, we describe leading ethics theories and propose alternative ways to ensure ethical behavior by artificial systems. Given that ethics are dependent on the socio-cultural context and are often only implicit in deliberation processes, methodologies are needed to elicit the values held by designers and stakeholders and to make these explicit, leading to better understanding of, and trust in, artificial autonomous systems.

Should Robots be Obedient?
Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell
Intuitively, obedience (following the order that a human gives) seems like a good property for a robot to have. But we humans are not perfect, and we may give orders that are not best aligned with our preferences. We show that when a human is not perfectly rational, a robot that tries to infer and act according to the human's underlying preferences can always perform better than a robot that simply follows the human's literal order. Thus, there is a tradeoff between the obedience of a robot and the value it can attain for its owner. We investigate how this tradeoff is impacted by the way the robot infers the human's preferences, showing that some methods err more on the side of obedience than others. We then analyze how performance degrades when the robot has a misspecified model of the features that the human cares about or of the level of rationality of the human. Finally, we study how robots can start detecting such model misspecification. Overall, our work suggests that there might be a middle ground in which robots intelligently decide when to obey human orders, but err on the side of obedience.

On Automating the Doctrine of Double Effect
Naveen Sundar Govindarajulu, Selmer Bringsjord
The doctrine of double effect (DDE) is a long-studied ethical principle that governs when actions that have both positive and negative effects are to be allowed. The goal in this paper is to automate DDE. We briefly present DDE, and use a first-order modal logic, the deontic cognitive event calculus, as our framework to formalize the doctrine. We present formalizations of increasingly stronger versions of the principle, including what is known as the doctrine of triple effect. We then use our framework to successfully simulate scenarios that have been used to test the presence of the principle in human subjects. Our framework can be used in two different modes. One can use it to build DDE-compliant autonomous systems from scratch, or one can use it to verify that a given AI system is DDE-compliant, by applying a DDE layer on an existing system or model. For the latter mode, the underlying AI system can be built using any architecture (planners, deep neural networks, Bayesian networks, knowledge-representation systems, or a hybrid); as long as the system exposes a few parameters in its model, such verification is possible. The role of the DDE layer here is akin to a (dynamic or static) software verifier that examines existing software modules. Finally, we sketch initial work on how one can apply our DDE layer to the STRIPS-style planning model and to a modified POMDP model. This is preliminary work to illustrate the feasibility of the second mode, and we hope that our initial sketches can be useful for other researchers in incorporating DDE in their own frameworks.

When will negotiation agents be able to represent us? The challenges and opportunities for autonomous negotiators
Tim Baarslag, Michael Kaisers, Enrico H. Gerding, Catholijn M. Jonker, Jonathan Gratch
Computers that negotiate on our behalf hold great promise for the future and will even become indispensable in emerging application domains such as the smart grid and the Internet of Things. Much research has thus been expended to create agents that are able to negotiate in an abundance of circumstances. However, up until now, truly autonomous negotiators have rarely been deployed in real-world applications. This paper sizes up current negotiating agents and explores a number of technological, societal and ethical challenges that autonomous negotiation systems have brought about. The questions we address are: in what sense are these systems autonomous, what has been holding back their further proliferation, and is their spread something we should encourage? We relate the automated negotiation research agenda to dimensions of autonomy and distill three major themes that we believe will propel autonomous negotiation forward: accurate representation, long-term perspective, and user trust. We argue these orthogonal research directions need to be aligned and advanced in unison to sustain tangible progress in the field.
Thursday 24 16:30  18:00 MLLT  Learning Theory

Understanding How Feature Structure Transfers in Transfer Learning
Tongliang Liu, Dacheng Tao, Qiang Yang
Transfer learning transfers knowledge across domains to improve learning performance. Since feature structures generally represent the common knowledge across different domains, they can be transferred successfully even though the labeling functions across domains differ arbitrarily. However, theoretical justification for this success has remained elusive. In this paper, motivated by self-taught learning, we regard a set of bases as a feature structure of a domain if the bases can (approximately) reconstruct any observation in this domain. We propose a general analysis scheme to theoretically justify that, if the source and target domains share similar feature structures, the source domain feature structure is transferable to the target domain, regardless of the change of the labeling functions across domains. The transferred structure is interpreted as a regularization matrix that benefits the learning process of the target domain task. We prove that such transfer enables the corresponding learning algorithms to be uniformly stable. Specifically, we illustrate the existence of feature structure transfer in two well-known transfer learning settings: domain adaptation and learning to learn.

Query-Driven Discovery of Anomalous Subgraphs in Attributed Graphs
Nannan Wu, Feng Chen, Jianxin Li, Jinpeng Huai, Bo Li
For a detection problem, a user often has some prior knowledge about the structure-specific subgraphs of interest, but few traditional approaches are capable of employing this knowledge. The main technical challenge is that few approaches can efficiently model the space of connected subgraphs that are isomorphic to a query graph. We present a novel, efficient approach for optimizing a generic nonlinear cost function subject to a query-specific structural constraint. Our approach enjoys strong theoretical guarantees on convergence to a nearly optimal solution and low time complexity. As a case study, we specialize the nonlinear function to several well-known graph scan statistics for anomalous subgraph discovery. Empirical evidence demonstrates that our method is superior to state-of-the-art methods in several real-world anomaly detection tasks.

Thresholding Bandits with Augmented UCB
Subhojyoti Mukherjee, Naveen Kolar Purushothama, Nandan Sudarsanam, Balaraman Ravindran
In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB is that it uses both mean and variance estimates to eliminate arms that have been sufficiently explored; to the best of our knowledge this is the first algorithm to employ such an approach for the considered TBP. Theoretically, we obtain an upper bound on the loss (probability of misclassification) incurred by AugUCB. Although UCBEV in the literature provides a better guarantee, it is important to emphasize that UCBEV requires access to the problem complexity (whose computation requires the arms' means and variances), and hence is not realistic in practice; this is in contrast to AugUCB, whose implementation does not require any such complexity inputs. We conduct extensive simulation experiments to validate the performance of AugUCB. Through our simulation work, we establish that AugUCB, owing to its use of variance estimates, performs significantly better than the state-of-the-art APT, CSAR and other non-variance-based algorithms.
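
The sketch below is not AugUCB. It is a generic fixed-budget loop that, like AugUCB as described above, uses mean and variance estimates to stop sampling arms that look sufficiently explored. The empirical-Bernstein-style radius, its constants, the round-robin schedule, Bernoulli rewards and `delta` are all assumptions made for illustration.

```python
import math
import random


def bernstein_radius(var, n, delta):
    # Empirical-Bernstein-style confidence radius for rewards in [0, 1];
    # the constants here are illustrative assumptions.
    log_term = math.log(3.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 3.0 * log_term / n


def threshold_bandit(means, tau, budget, delta=0.05, seed=0):
    """Return the arms whose estimated mean exceeds tau after a fixed budget of pulls."""
    k = len(means)
    if budget < k:
        raise ValueError("budget must allow at least one pull per arm")
    rng = random.Random(seed)
    counts, sums, sq_sums = [0] * k, [0.0] * k, [0.0] * k
    active = set(range(k))
    t = 0
    while t < budget and active:
        # Pull every still-active arm once (round robin).
        for arm in sorted(active):
            if t >= budget:
                break
            r = 1.0 if rng.random() < means[arm] else 0.0  # simulated Bernoulli reward
            counts[arm] += 1
            sums[arm] += r
            sq_sums[arm] += r * r
            t += 1
        # Stop pulling an arm once its mean, with a variance-aware margin,
        # is clearly on one side of tau.
        for arm in list(active):
            n = counts[arm]
            mu = sums[arm] / n
            var = max(sq_sums[arm] / n - mu * mu, 0.0)
            if abs(mu - tau) > bernstein_radius(var, n, delta):
                active.discard(arm)
    return {arm for arm in range(k) if sums[arm] / counts[arm] > tau}


print(threshold_bandit([0.1, 0.2, 0.8, 0.9], tau=0.5, budget=2000))
```

Low-variance arms get a tighter radius, so they are dropped earlier and the remaining budget goes to arms whose position relative to the threshold is still uncertain.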

No Learner Left Behind: On the Complexity of Teaching Multiple Learners Simultaneously
Xiaojin Zhu, Ji Liu, Manuel Lopes
We present a theoretical study of algorithmic teaching in the setting where the teacher must use the same training set to teach multiple learners. This problem is a theoretical abstraction of the real-world classroom setting in which the teacher delivers the same lecture to academically diverse students. We define a minimax teaching criterion to guarantee the performance of the worst learner in the class. We prove that the teaching dimension increases with class diversity in general. For the classes of conjugate Bayesian learners and linear regression learners, respectively, we exhibit the corresponding minimax teaching sets. We then propose a method to enhance teaching by partitioning the class into sections. We present cases where the optimal partition minimizes the overall teaching dimension while maintaining the guarantee on all learners. Interestingly, we show that personalized education (one learner per section) is not necessarily the optimal partition. Our results generalize algorithmic teaching to multiple learners and offer insight into how to teach large classes.

On the Complexity of Learning from Label Proportions
Lev Reyzin, Benjamin Fish
In the problem of learning with label proportions (also known as the problem of estimating class ratios), the training data is unlabeled, and only the proportions of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is useful in a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we resolve foundational questions regarding the computational complexity of learning in this setting. We formalize a simple version of the setting, and we compare the computational complexity of learning in this model to classical PAC learning. Perhaps surprisingly, we show that what can be learned efficiently in this model is a strict subset of what may be learned efficiently in PAC, under standard complexity assumptions. We give a characterization in terms of VC dimension, and we show that there are nontrivial problems in this model that can be efficiently learned. We also give an algorithm that demonstrates the feasibility of learning under well-behaved distributions.

Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization
Yue Yu, Longbo Huang
We consider the stochastic composition optimization problem proposed in \cite{wang2017stochastic}, which has applications ranging from estimation to statistical and machine learning. We propose the first ADMM-based algorithm for this problem, named com-SVR-ADMM, and show that com-SVR-ADMM converges linearly for strongly convex and Lipschitz smooth objectives, and has a convergence rate of $O(\log S/S)$, which improves upon the $O(S^{-4/9})$ rate in \cite{wang2016accelerating}, when the objective is convex and Lipschitz smooth. Moreover, com-SVR-ADMM possesses a rate of $O(1/\sqrt{S})$ when the objective is convex but without Lipschitz smoothness. We also conduct experiments and show that it outperforms existing algorithms.
Thursday 24 16:30  18:30 Competition Angry Birds
Thursday 24 16:30  18:30 JOUKR2  Journal Track: Knowledge Representation 2

Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks (Extended Abstract)
Stefano V. Albrecht, Subramanian Ramamoorthy
Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e., the probability distribution over process states) based on incomplete and uncertain observations. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF is evaluated in both synthetic processes and a simulated multi-robot warehouse, where it outperformed alternative filtering methods by exploiting passivity.
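
To make "belief state" concrete, the sketch below performs one exact filtering step (predict with the transition model, then correct with the observation likelihood) over an explicitly enumerated state space. It is the unfactored baseline, not PSBF, and the two-state process and its numbers are hypothetical.

```python
def filter_step(belief, transition, obs_lik):
    """One exact belief-filtering step over an enumerated (non-factored) state space.

    belief[s]: current P(state = s); transition[s][s2]: P(next = s2 | state = s);
    obs_lik[s2]: P(observation | next = s2).
    """
    n = len(belief)
    # Prediction: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction: weight by the observation likelihood and renormalise.
    unnorm = [predicted[s2] * obs_lik[s2] for s2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
print(filter_step(belief, transition, obs_lik=[0.7, 0.1]))  # approximately [0.895, 0.105]
```

The double loop costs O(n^2) per step in the number n of joint states, and n grows exponentially with the number of state variables, which is the usual motivation for factored belief representations such as the one PSBF maintains.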

Construction of System of Spheres-based Transitively Relational Partial Meet Multiple Contractions: An Impossibility Result (Extended Abstract)
Maurício Reis, Eduardo Fermé, Pavlos Peppas
In this paper we show that, contrary to the case of contractions by a single sentence, there is no system of spheres-based construction of multiple contractions that generates each and every transitively relational partial meet multiple contraction. Furthermore, we propose two system of spheres-based constructions of multiple contractions that generate (only) transitively relational partial meet multiple contractions.

Evaluating Epistemic Negation in Answer Set Programming (Extended Abstract)
Yi-Dong Shen, Thomas Eiter
Epistemic negation 'not' along with default negation 'neg' plays a key role in knowledge representation and nonmonotonic reasoning. However, existing approaches are unsatisfactory in that they suffer from the problem of unintended world views due to recursion through the epistemic modal operator K or M (K F and M F are shorthands for neg not F and not neg F, respectively). In this paper we present a general approach to epistemic negation that is free of unintended world views and thus offers a solution to the long-standing problem of epistemic specifications, which were introduced by Gelfond (1991) over two decades ago.

POPPONENT: Highly accurate, individually and socially efficient opponent preference model in bilateral multi-issue negotiations (Extended Abstract)
Farhad Zafari, Faria Nassiri-Mofakham
In automated bilateral multi-issue negotiations, two intelligent automated agents negotiate on behalf of their owners over many issues in order to reach an agreement. Modeling the opponent can substantially boost the performance of the agents and increase the quality of the negotiation outcome. State-of-the-art models accomplish this by making assumptions about the opponent that restrict their applicability in real scenarios. In this paper, a less restricted technique is proposed in which perceptron units (POPPONENT) are applied to model the preferences of the opponent. This model adopts a Multi Bipartite version of the Standard Gradient Descent search algorithm (MBGD) to find the best hypothesis, which is the best preference profile. To evaluate the accuracy and performance of the proposed opponent model, it is compared with the state-of-the-art models available in the Genius repository. The results in the devised setting confirm the higher accuracy of POPPONENT compared to the most accurate state-of-the-art model. Evaluating the model in real-world negotiation scenarios in the Genius framework also confirms its high accuracy relative to the state-of-the-art models in estimating the utility of offers. The findings indicate that the proposed model is individually and socially efficient. The proposed MBGD method could also be adopted in similar practical areas of Artificial Intelligence.

Relations Between Spatial Calculi About Directions and Orientations (Extended Abstract)
Reinhard Moratz, Till Mossakowski
A qualitative representation of space and/or time provides mechanisms which characterize the essential properties of objects or configurations. The advantages over quantitative representations can be: (1) a better match with human concepts related to natural language, and (2) better efficiency for reasoning. The two main trends in qualitative spatial constraint reasoning are topological reasoning about regions and reasoning about directions between points and straight lines and orientations of straight lines or configurations derived from points. In constraint-based reasoning about spatial configurations, typically a partial initial knowledge of a scene is represented in terms of qualitative constraints between spatial objects. Implicit knowledge about spatial relations is then derived by constraint propagation. In this work, we apply universal algebraic tools to binary qualitative calculi and demonstrate that two calculi expressing related features but on different levels of granularity can often be connected via homomorphisms. Full details and proofs can be found in the full version of our paper: Mossakowski, Till, and Reinhard Moratz. "Relations Between Spatial Calculi About Directions and Orientations." J. Artif. Intell. Res. (JAIR) 54 (2015): 277-308.

On Redundant Topological Constraints
Sanjiang Li, Zhiguo Long, Weiming Liu, Matt Duckham, Alan Both
Redundancy checking is an important task in AI subfields such as knowledge representation and constraint solving. This paper considers redundant topological constraints, defined in the region connection calculus RCC8. We say a constraint in a set C of RCC8 constraints is redundant if it is entailed by the rest of C. A prime subnetwork of C is a subset of C which contains no redundant constraints and has the same solution set as C. It is natural to ask how to compute such a prime subnetwork, and when it is unique. While this problem is in general intractable, we show that, if S is a subalgebra of RCC8 in which weak composition distributes over nonempty intersections, then C has a unique prime subnetwork, which can be obtained in cubic time by removing all redundant constraints simultaneously from C. As a byproduct, we show that any path-consistent network over such a distributive subalgebra is minimal.
Thursday 24 16:30  18:30 SISML  Sister Conference Track: Machine Learning

On Thompson Sampling and Asymptotic Optimality
Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value and (2) given a recoverability assumption regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
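
The core of Thompson sampling (sample an environment from the posterior, then act optimally for that sample) is easiest to see in its simplest special case, a Bernoulli multi-armed bandit with Beta posteriors. This swapped-in bandit setting is far narrower than the general, possibly non-Markovian environment classes discussed above; the arm means, step count and `thompson_bernoulli` name are hypothetical.

```python
import random


def thompson_bernoulli(true_means, steps, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta(1, 1) uniform prior on each arm's success probability
    beta = [1] * k
    pulls = [0] * k
    for _ in range(steps):
        # Sample one plausible environment from the posterior and act optimally for it.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0  # simulated environment
        # Conjugate posterior update for the pulled arm.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


print(thompson_bernoulli([0.2, 0.5, 0.8], steps=2000))
```

As the posteriors concentrate, samples for the worse arms rarely exceed those of the best arm, so exploration fades without an explicit schedule.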

Self-Adjusting Memory: How to Deal with Diverse Drift Types
Viktor Losing, Barbara Hammer, Heiko Wersing
Data mining in non-stationary data streams is particularly relevant in the context of the Internet of Things and Big Data. Its challenges arise from fundamentally different drift types violating assumptions of data independence or stationarity. Available methods often struggle with certain forms of drift or require unavailable a priori task knowledge. We propose the Self-Adjusting Memory (SAM) model for the k-Nearest Neighbor (kNN) algorithm. SAM-kNN can deal with heterogeneous concept drift, i.e., different drift types and rates. Its basic idea is to maintain dedicated models for current and former concepts, used according to the demands of the given situation. It can be robustly applied in practice without meta-parameter optimization. We conduct an extensive evaluation on various benchmarks, consisting of artificial streams with known drift characteristics and real-world datasets. Highly competitive results throughout all experiments underline the robustness of SAM-kNN as well as its capability to handle heterogeneous concept drift.

Learning and Applying Case Adaptation Rules for Classification: An Ensemble Approach
Vahid Jalali, David Leake, Najmeh Forouzandehmehr
The ability of case-based reasoning systems to solve novel problems depends on their capability to adapt past solutions to new circumstances. However, acquiring the knowledge required for case adaptation is a classic challenge for CBR. This motivates the use of machine learning methods to generate adaptation knowledge. A popular approach uses the case difference heuristic (CDH) to generate adaptation rules from pairs of cases in the case base, based on the premise that the observed differences in case solutions result from the differences in the problems they solve, and so can form the basis of rules to adapt cases with similar problem differences. Extensive research has successfully applied the CDH approach to adaptation rule learning for case-based regression (numerical prediction) tasks. However, classification tasks have been outside of its scope. The work presented in this paper addresses that gap by extending CDH-based learning of adaptation rules to apply to cases with categorical features and solutions. It presents the generalized case value heuristic to assess case and solution differences and applies it in an ensemble-based case-based classification method, ensembles of adaptations for classification (EAC), built on the authors' previous work on ensembles of adaptations for regression (EAR). Experimental results support the effectiveness of EAC.

Open-World Probabilistic Databases: An Abridged Report
Ismail Ilkan Ceylan, Adnan Darwiche, Guy Van den Broeck
Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic database semantics, which relaxes the probabilities of open facts to default intervals. For this open-world setting, we lift the existing data complexity dichotomy of probabilistic databases, and propose an efficient evaluation algorithm for unions of conjunctive queries. We also show that query evaluation can become harder for non-monotone queries.
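To make the open-world semantics concrete, here is a minimal sketch for a single monotone Boolean query, ∃x R(x), over one tuple-independent table. The names `known`, `domain`, and `open_prob` (the upper end of the default interval for open facts) are hypothetical. Because the query is monotone, its probability is lowest when every open fact gets probability 0 (the closed-world answer) and highest when every open fact gets the default upper bound. This illustrates the interval semantics only, not the paper's evaluation algorithm for unions of conjunctive queries.

```python
from math import prod


def exists_query_interval(known, domain, open_prob):
    """Bounds on P(exists x. R(x)) for a tuple-independent table (illustrative sketch).

    known:     dict mapping constants to the stored probability of R(c)
    domain:    all constants; those missing from `known` are open facts
    open_prob: hypothetical default upper bound for open facts
    """
    def prob_exists(p_open):
        # The query fails only if every tuple R(c) is absent (tuples are independent).
        p_all_absent = prod(1 - known.get(c, p_open) for c in domain)
        return 1 - p_all_absent

    # Closed world gives the lower bound; the most permissive open world gives the upper one.
    return prob_exists(0.0), prob_exists(open_prob)
```

For example, with R(a) stored at 0.5, open fact R(b), and a default bound of 0.2, the query probability lies in [0.5, 0.6].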

Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study
Wei Zhang, Suyog Gupta, Fei Wang
Deep learning with a large number of parameters requires distributed training, where model accuracy and runtime are two important factors to be considered. However, there has been no systematic study of the tradeoff between these two factors during the model training process. This paper presents Rudra, a parameter-server-based distributed computing framework tuned for training large-scale deep neural networks. Using variants of the asynchronous stochastic gradient descent algorithm, we study the impact of synchronization protocol, stale gradient updates, minibatch size, learning rates, and number of learners on runtime performance and model accuracy. We introduce a new learning-rate modulation strategy to counter the effect of stale gradients and propose a new synchronization protocol that can effectively bound the staleness in gradients, improve runtime performance and achieve good model accuracy. Our empirical investigation reveals a principled approach for distributed training of neural networks: the minibatch size per learner should be reduced as more learners are added to the system to preserve the model accuracy. We validate this approach using commonly used image classification benchmarks: CIFAR-10 and ImageNet.
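The abstract does not spell out the learning-rate modulation rule. One plausible form, assumed here purely for illustration, divides the base learning rate by the staleness of each incoming gradient; a second helper illustrates the stated guideline of shrinking the per-learner minibatch as learners are added, keeping their product roughly fixed. All function names are hypothetical and none of this is Rudra's actual code.

```python
def staleness_lr(base_lr, staleness):
    """Assumed rule: scale the step down in proportion to gradient staleness (>= 1)."""
    return base_lr / max(1, staleness)


def per_learner_batch(total_batch, num_learners):
    """Shrink each learner's minibatch as learners are added (product stays roughly fixed)."""
    return max(1, total_batch // num_learners)


def apply_stale_update(weights, grad, staleness, base_lr):
    """Parameter-server style SGD step using the staleness-modulated learning rate."""
    lr = staleness_lr(base_lr, staleness)
    return [w - lr * g for w, g in zip(weights, grad)]
```

Under this assumed rule, a gradient computed against a model two updates old moves the weights half as far as a fresh one.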

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling
Christopher De Sa, Kunle Olukotun, Christopher Ré
Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions. To speed up Gibbs sampling, there has recently been interest in parallelizing it by executing asynchronously. While empirical results suggest that many models can be efficiently sampled asynchronously, traditional Markov chain analysis does not apply to the asynchronous case, and thus asynchronous Gibbs sampling is poorly understood. In this paper, we derive a better understanding of the two main challenges of asynchronous Gibbs: bias and mixing time. We show experimentally that our theoretical results match practical outcomes.
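For reference, the sketch below is an ordinary sequential Gibbs sampler on a two-variable Ising-style model, p(x1, x2) ∝ exp(J·x1·x2) with x_i ∈ {-1, +1}. An asynchronous variant would let several workers run the inner update concurrently, possibly reading stale values of the other variable; the bias and mixing-time questions the paper analyzes are relative to this sequential baseline. The model, the parameter name `coupling` (for J), and the function name are assumptions for illustration.

```python
import math
import random


def gibbs_two_spin(coupling, num_sweeps, seed=0):
    """Sequential Gibbs sampler for p(x1, x2) ∝ exp(coupling * x1 * x2); returns est. P(x1 == x2)."""
    rng = random.Random(seed)
    x = [1, 1]
    agree = 0
    for _ in range(num_sweeps):
        for i in (0, 1):
            other = x[1 - i]
            # Exact conditional: P(x_i = +1 | other) = sigmoid(2 * coupling * other).
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * coupling * other))
            x[i] = 1 if rng.random() < p_plus else -1
        agree += x[0] == x[1]
    return agree / num_sweeps
```

For this model the exact value of P(x1 = x2) is 1 / (1 + e^(-2J)), so the estimate can be checked directly.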
Thursday 24 18:30  19:30 Special Session Business Meeting
Friday 25 08:30  10:00 Industry Day Startups
Friday 25 08:30  10:00 Competition Angry Birds
Friday 25 08:30  10:00 NLPIR  Information Retrieval

Dynamic Multi-View Hashing for Online Image Retrieval
Liang Xie, Jialie Shen, Jungong Han, Lei Zhu, Ling Shao
Advanced hashing techniques are essential for effective large-scale online image organization and retrieval, where image contents may change frequently. Traditional multi-view hashing methods are developed based on batch-based learning, which leads to very expensive updates. Meanwhile, existing online hashing methods mainly focus on single-view data and thus cannot achieve promising performance when searching real online images, which are multi-view data. Further, both types of hashing methods can only produce hash codes of fixed length. Consequently, they have limited capability to comprehensively characterize streaming image data in the real world. In this paper, we propose dynamic multi-view hashing (DMVH), which can adaptively augment hash codes according to dynamic changes of images. Meanwhile, DMVH leverages online learning to generate hash codes. It can increase the code length when the current code is not able to represent new images effectively. Moreover, to further improve overall performance, each view is assigned a weight, which can be efficiently updated during the online learning process. To avoid frequent updates of code length and view weights, an intelligent buffering scheme is also designed to preserve significant data and maintain the effectiveness of DMVH. Experimental results on two real-world image datasets demonstrate the superior performance of DMVH over several state-of-the-art hashing methods.

A Structural Representation Learning for Multi-relational Networks
Lin Liu, Xin Li, William K. Cheung, Chengcheng Xu
Most of the existing multi-relational network embedding methods, e.g., TransE, are formulated to preserve pairwise connectivity structures in the networks. Observing that the significant triangular and parallelogram connectivity structures found in many real multi-relational networks are often ignored, and that a hard constraint commonly adopted by most network embedding methods is inaccurate by design, we propose a novel representation learning model for multi-relational networks which can alleviate both fundamental limitations. Scalable learning algorithms are derived using stochastic gradient descent and negative sampling. Extensive experiments on real multi-relational network datasets of WordNet and Freebase demonstrate the efficacy of the proposed model when compared with state-of-the-art embedding methods.

How Unlabeled Web Videos Help Complex Event Detection?
Huan Liu, Qinghua Zheng, Minnan Luo, Dingwen Zhang, Xiaojun Chang, Cheng Deng
The lack of labeled exemplars is an important factor that makes the task of multimedia event detection (MED) complicated and challenging. Utilizing artificially picked and labeled external sources is an effective way to enhance the performance of MED. However, building such data usually requires professional human annotators, and the procedure is too time-consuming and costly to scale. In this paper, we propose a new robust dictionary learning framework for complex event detection, which is able to handle both labeled and easy-to-get unlabeled web videos by sharing the same dictionary. By employing an ℓq-norm based loss jointly with structured-sparsity-based regularization, our model shows strong robustness against the substantial number of noisy and outlier videos from open sources. We exploit an effective optimization algorithm to solve the proposed highly non-smooth and non-convex problem. Extensive experimental results on the standard TRECVID MEDTest 2013 and TRECVID MEDTest 2014 datasets demonstrate the effectiveness and superiority of the proposed framework for complex event detection.

Bilateral Multi-Perspective Matching for Natural Language Sentences
Zhiguo Wang, Wael Hamza, Radu Florian
Natural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or only apply single-granular (word-by-word or sentence-by-sentence) matching. In this work, we propose a bilateral multi-perspective matching (BiMPM) model. Given two sentences $P$ and $Q$, our model first encodes them with a BiLSTM encoder. Next, we match the two encoded sentences in two directions $P \rightarrow Q$ and $P \leftarrow Q$. In each matching direction, each time step of one sentence is matched against all time steps of the other sentence from multiple perspectives. Then, another BiLSTM layer is utilized to aggregate the matching results into a fixed-length matching vector. Finally, based on the matching vector, a decision is made through a fully connected layer. We evaluate our model on three tasks: paraphrase identification, natural language inference and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves state-of-the-art performance on all tasks.
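The abstract leaves open how a single "perspective" is computed. A common way to realize it, assumed here, is to re-weight both vectors elementwise with a per-perspective weight vector and take the cosine similarity, giving one score per perspective. The function names and the weight values are hypothetical; in a trained model the weights would be learned parameters.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def multi_perspective_match(v1, v2, perspectives):
    """One cosine score per perspective; each perspective re-weights the dimensions elementwise."""
    return [
        cosine([w * x for w, x in zip(wk, v1)], [w * y for w, y in zip(wk, v2)])
        for wk in perspectives
    ]
```

Identical vectors score 1.0 under every perspective, while different perspectives can disagree on partially similar vectors because they emphasize different dimensions.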

An Attention-based Regression Model for Grounding Textual Phrases in Images
Ko Endo, Masaki Aono, Eric Nichols, Kotarou Funakoshi
Grounding, or localizing, a textual phrase in an image is a challenging problem that is integral to visual language understanding. Previous approaches to this task typically make use of candidate region proposals, where end performance depends on that of the region proposal method and additional computational costs are incurred. In this paper, we treat grounding as a regression problem and propose a method to directly identify the region referred to by a textual phrase, eliminating the need for external candidate region prediction. Our approach uses deep neural networks to combine image and text representations and refines the target region with attention models over both image subregions and words in the textual phrase. Despite the challenging nature of this task and the sparsity of available data, in evaluation on the ReferIt dataset our proposed method achieves a new state-of-the-art accuracy of 37.26%, surpassing the previously reported best by over 5 percentage points. We find that combining image and text attention models and using an area-sensitive image attention loss function both contribute to substantial improvements.

RHash: Robust Hashing via ℓ∞-norm Distortion
Amirali Aghazadeh, Andrew Lan, Anshumali Shrivastava, Richard Baraniuk
Hashing is an important tool in large-scale machine learning. Unfortunately, current data-dependent hashing algorithms are not robust to small perturbations of the data points, which degrades the performance of nearest neighbor (NN) search. The culprit is the minimization of the ℓ2-norm, average distortion among pairs of points to find the hash function. Inspired by recent progress in robust optimization, we develop a novel hashing algorithm, dubbed RHash, that instead minimizes the ℓ∞-norm, worst-case distortion among pairs of points. We develop practical and efficient implementations of RHash that couple the alternating direction method of multipliers (ADMM) framework with column generation to scale well to large datasets. A range of experimental evaluations demonstrates the superiority of RHash over ten state-of-the-art binary hashing schemes. In particular, we show that RHash achieves the same retrieval performance as the state-of-the-art algorithms in terms of average precision while using up to 60% fewer bits.
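The difference between the two objectives can be seen on toy data. This sketch assumes the distortion of a pair is the absolute gap between its Euclidean distance and its scaled Hamming distance; averaging the squared gaps corresponds to the ℓ2-style objective, while taking the maximum corresponds to the ℓ∞-style, worst-case objective. It only evaluates a given code assignment and does not implement RHash's ADMM and column generation optimization; the function names and the `scale` parameter are hypothetical.

```python
import itertools
import math


def pairwise_distortions(points, codes, scale):
    """|Euclidean distance - scale * Hamming distance| for every pair of points."""
    out = []
    for (p, c), (q, d) in itertools.combinations(zip(points, codes), 2):
        euclid = math.dist(p, q)
        hamming = sum(a != b for a, b in zip(c, d))
        out.append(abs(euclid - scale * hamming))
    return out


def average_sq_distortion(ds):
    """l2-style objective: mean squared distortion over all pairs."""
    return sum(x * x for x in ds) / len(ds)


def worst_case_distortion(ds):
    """l-infinity-style objective: the single largest pairwise distortion."""
    return max(ds)
```

A code that is accurate for most pairs but badly wrong for one can still have a small average while its worst-case distortion stays large; the worst-case objective penalizes exactly that pair.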
Friday 25 08:30  10:00 MLNNV  Neural Networks and Vision

CFNN: Correlation Filter Neural Network for Visual Object Tracking
Yang Li, Zhan Xu, Jianke Zhu
Although convolutional neural networks (CNNs) have shown promising capacity in many computer vision tasks, applying them to visual tracking is still far from solved. Existing methods either employ a large external dataset for exhaustive pre-training or suffer from less satisfactory results in terms of accuracy and robustness. To track a single target in a wide range of videos, we present a novel Correlation Filter Neural Network architecture, as well as a complete visual tracking pipeline. The proposed approach is a special case of CNN whose initialization does not need any pre-training on an external dataset. The initialization of the network exploits cyclic sampling to achieve appealing discriminative capability, while the network updating scheme uses backpropagation to capture new appearance variations. The tracking pipeline integrates both aspects well by making them complementary to each other. We validate our tracker on the OTB-2013 benchmark, where it obtains promising results compared with most existing representative trackers.

WALKING WALKing walking: Action Recognition from Action Echoes
Qianli Ma, Lifeng Shen, Enhuan Chen, Shuai Tian, Jiabing Wang, Garrison Cottrell
Recognizing human actions represented by 3D trajectories of skeleton joints is a challenging machine learning task. In this paper, the 3D skeleton sequences are regarded as multivariate time series, and their dynamics and multiscale features are efficiently learned from action echo states. Specifically, the skeleton data from the limbs and trunk are first projected into five high-dimensional nonlinear spaces, which are randomly generated by five dynamic, training-free recurrent networks, i.e., the reservoirs of echo state networks (ESNs). In this way, the history of the time series is represented as nonlinear echo states of actions. We then use a single multiscale convolutional layer to extract multiscale features from the echo states, and maintain multiscale temporal invariance by a max-over-time pooling layer. We propose two multi-step fusion strategies to integrate the spatial information over the five parts of the human physical structure. Finally, we learn the label distribution using softmax. With one training-free recurrent layer and only one layer of convolution, our Convolutional Echo State Network (ConvESN) is a very efficient end-to-end model, and achieves state-of-the-art performance on four skeleton benchmark data sets.

Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions
Rui Zhang, Sheng Tang, Min Lin, Jintao Li, Shuicheng Yan
Most existing scene parsing methods suffer from the serious problems of both inconsistent parsing results and object boundary shift. To tackle these problems, we first propose an iterative Global-residual Refinement Network (GRN) that exploits global contextual information to predict the parsing residuals and iteratively smooth the inconsistent parsing labels. Furthermore, we propose a Local-boundary Refinement Network (LRN) to learn position-adaptive propagation coefficients so that local contextual information from neighbors can be optimally captured for refining object boundaries. Finally, we cascade the two proposed refinement networks after a fully residual convolutional neural network within a unified framework. Extensive experiments on the ADE20K and Cityscapes datasets demonstrate the effectiveness of the two refinement methods for refining scene parsing predictions.

Group-wise Deep Co-saliency Detection
Lina Wei, Shanshan Zhao, Omar El Farouk Bourahla, Xi Li, Fei Wu
In this paper, we propose an end-to-end group-wise deep co-saliency detection approach to address the co-salient object discovery problem based on the fully convolutional network (FCN) with group input and group output. The proposed approach captures the group-wise interaction information for group images by learning a semantics-aware image representation based on a convolutional neural network, which adaptively learns the group-wise features for co-saliency detection. Furthermore, the proposed approach discovers the collaborative and interactive relationships between group-wise feature representation and single-image individual feature representation, and models them in a collaborative learning framework. Finally, we set up a unified end-to-end deep learning scheme to jointly optimize the process of group-wise feature representation learning and the collaborative learning, leading to more reliable and robust co-saliency detection results. Experimental results demonstrate the effectiveness of our approach in comparison with state-of-the-art approaches.

A Sequence Labeling Convolutional Network and Its Application to Handwritten String Recognition
Qingqing Wang, Yue Lu
Handwritten string recognition has long struggled with connected patterns. Segmentation-free and over-segmentation frameworks are commonly applied to deal with this issue. In recent years, RNNs combined with CTC have dominated segmentation-free handwritten string recognition, while CNNs are only employed as single-character recognizers in the over-segmentation framework. The main challenges for CNNs to directly recognize handwritten strings are the appropriate processing of arbitrary input string length, which implies arbitrary input image size, and the reasonable design of the output layer. In this paper, we propose a sequence labeling convolutional network for the recognition of handwritten strings, in particular connected patterns. We design the structure of the network to predict how many characters are present in the input image and what exactly they are at every position. Spatial pyramid pooling (SPP) is utilized with a new implementation to handle arbitrary string length. Moreover, we propose a more flexible pooling strategy called FSPP to better adapt the network to the direct recognition of long strings. Experiments conducted on handwritten digit strings from two benchmark datasets and our own cellphone number dataset demonstrate the superiority of the proposed network.
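As background on how SPP turns inputs of arbitrary size into a fixed-length vector, here is a sketch of classic spatial pyramid max-pooling over a single-channel 2-D map with 1 x 1, 2 x 2, and 4 x 4 grids. The pyramid levels and the function name are assumptions; the paper's new SPP implementation and its FSPP variant are not described in the abstract and are not reproduced here.

```python
import math


def spatial_pyramid_max_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map (list of rows) into a fixed-length vector.

    Level n splits the map into an n x n grid of bins, so the output length is
    sum(n * n for n in levels) regardless of the input height and width.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for i in range(n):
            # Floor/ceil bin edges keep every bin non-empty, even when h < n.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                out.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out
```

A 5 x 7 map and a 1 x 3 map both yield 21 values with the default levels, which is what lets a fully connected output layer follow a convolutional stack fed with strings of different lengths.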

Learning to Read Irregular Text with Attention Mechanisms
Xiao Yang, C. Lee Giles, Dafang He, Zihan Zhou, Daniel Kifer
We present a robust end-to-end neural-based model to attentively recognize text in natural images. Particularly, we focus on accurately identifying irregular (perspectively distorted or curved) text, which has not been well addressed in the previous literature. Previous research on text reading often works with regular (horizontal and frontal) text and does not adequately generalize to processing text with perspective distortion or curving effects. Our work proposes to overcome this difficulty by introducing two learning components: (1) an auxiliary dense character detection task that helps to learn text-specific visual patterns, and (2) an alignment loss that provides guidance to the training of an attention model. We show with experiments that these two components are crucial for achieving fast convergence and high classification accuracy for irregular text recognition. Our model outperforms previous work on two irregular-text datasets, SVT-Perspective and CUTE80, and is also highly competitive on several regular-text datasets containing primarily horizontal and frontal text.
Friday 25 08:30  10:00 MLUL1  Unsupervised Learning 1

Radar: Residual Analysis for Anomaly Detection in Attributed Networks
Jundong Li, Harsh Dani, Xia Hu, Huan Liu
Attributed networks are pervasive in different domains, ranging from social networks and gene regulatory networks to financial transaction networks. This kind of rich network representation presents challenges for anomaly detection due to the heterogeneity of the two data representations. A vast majority of existing algorithms assume that certain properties of anomalies are given a priori. Since various types of anomalies coexist in real-world attributed networks, the assumption that a priori knowledge regarding anomalies is available does not hold. In this paper, we investigate the problem of anomaly detection in attributed networks from a residual analysis perspective, which has been shown to be effective in traditional anomaly detection problems. However, this is a non-trivial task in attributed networks, as interactions among instances complicate the residual modeling process. Methodologically, we propose a learning framework to characterize the residuals of attribute information and its coherence with network information for anomaly detection. By learning and analyzing the residuals, we detect anomalies whose behaviors are singularly different from the majority. Experiments on real datasets show the effectiveness and generality of the proposed framework.

Online Robust Low-Rank Tensor Learning
Ping Li, Jiashi Feng, Xiaojie Jin, Luming Zhang, Xianghua Xu, Shuicheng Yan
The rapid increase of multidimensional data (a.k.a. tensors) such as videos brings new challenges for low-rank data modeling approaches, such as dynamic data size, complex high-order relations, and multiplicity of low-rank structures. Resolving these challenges requires a new tensor analysis method that can perform tensor data analysis online, which, however, is still lacking. In this paper, we propose an Online Robust Low-rank Tensor Modeling (ORLTM) approach to address these challenges. ORLTM dynamically explores the high-order correlations across all tensor modes for low-rank structure modeling. To analyze mixture data from multiple subspaces, ORLTM introduces a new dictionary learning component. ORLTM processes data in a streaming fashion and thus requires a low memory cost that is independent of data size. This makes ORLTM well suited for processing large-scale tensor data. Empirical studies have validated the effectiveness of the proposed method on both synthetic data and one practical task, i.e., video background subtraction. In addition, we provide theoretical analysis of computational complexity and memory cost, rigorously demonstrating the efficiency of ORLTM.

From Ensemble Clustering to Multi-View Clustering
Zhiqiang Tao, Hongfu Liu, Sheng Li, Zhengming Ding, Yun Fu
Multi-View Clustering (MVC) aims to find the cluster structure shared by multiple views of a particular dataset. Existing MVC methods mainly integrate the raw data from different views, while ignoring the high-level information. Thus, their performance may degrade due to the conflict between heterogeneous features and the noise in each individual view. To overcome this problem, we propose a novel Multi-View Ensemble Clustering (MVEC) framework to solve MVC in an Ensemble Clustering (EC) way, which generates Basic Partitions (BPs) for each view individually and seeks a consensus partition among all the BPs. By this means, we naturally leverage the complementary information of multi-view data in the same partition space. Instead of directly fusing BPs, we employ the low-rank and sparse decomposition to explicitly consider the connection between different views and detect the noise in each view. Moreover, the spectral ensemble clustering task is also incorporated into our framework with a carefully designed constraint, making MVEC a unified optimization framework to achieve the final consensus partition. Experimental results on six real-world datasets show the efficacy of our approach compared with both MVC and EC methods.
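One standard way to place basic partitions in a shared space, assumed here for illustration, is the co-association matrix: entry (i, j) is the fraction of basic partitions that assign items i and j to the same cluster. The sketch builds that matrix only; MVEC's low-rank and sparse decomposition and its spectral ensemble clustering step are not shown, and the function name is hypothetical.

```python
def co_association(partitions):
    """Fraction of basic partitions that put each pair of items in the same cluster.

    partitions: list of label lists, one per basic partition, all over the same n items.
    """
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]
```

Because only co-membership is recorded, the cluster labels themselves need not agree across partitions, which is what makes partitions from different views comparable.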

Angle Principal Component Analysis
Qianqian Wang, Quanxue Gao, Xinbo Gao, Feiping Nie
Recently, many ℓ1-norm based PCA methods have been developed for dimensionality reduction, but they do not explicitly consider the reconstruction error. Moreover, they do not take into account the relationship between reconstruction error and variance of projected data. This reduces the robustness of the algorithms. To handle this problem, a novel formulation for PCA, namely angle PCA, is proposed. Angle PCA employs the ℓ2-norm to measure reconstruction error and variance of projected data, and maximizes the sum, over all data points, of the ratio between variance and reconstruction error. Angle PCA is not only robust to outliers but also retains PCA’s desirable properties such as rotational invariance. To solve Angle PCA, we propose an iterative algorithm which has a closed-form solution in each iteration. Extensive experiments on several face image databases illustrate that our method is overall superior to PCA and to other robust PCA algorithms, such as PCA-L1 greedy, PCA-L1 nongreedy and HQ-PCA.
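The objective described above can be evaluated directly. The sketch assumes an orthonormal projection basis W, given as a list of basis vectors, and sums ||W^T x||_2 / ||x - W W^T x||_2 over the data; for orthonormal W this ratio is the cotangent of the angle between x and the projection subspace, which plausibly motivates the method's name. It only scores a candidate W; the paper's iterative closed-form solver is not reproduced, and the function name is hypothetical.

```python
import math


def angle_pca_objective(points, basis):
    """Sum over points of ||W^T x||_2 / ||x - W W^T x||_2 for an orthonormal basis W."""
    total = 0.0
    for x in points:
        coords = [sum(b_k * x_k for b_k, x_k in zip(b, x)) for b in basis]  # W^T x
        recon = [sum(c * b[d] for c, b in zip(coords, basis)) for d in range(len(x))]  # W W^T x
        variance_term = math.sqrt(sum(c * c for c in coords))
        error_term = math.sqrt(sum((xd - rd) ** 2 for xd, rd in zip(x, recon)))
        total += variance_term / error_term  # assumes no point lies exactly in span(W)
    return total
```

For the single point (3, 4), projecting onto the first axis gives 3 / 4 = 0.75, while projecting onto the second axis gives 4 / 3, so the second direction scores higher under this objective.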

Locality Preserving Projections for Grassmann manifold
Boyue Wang, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin, Muhammad Ali, Haoran Chen
Learning on Grassmann manifolds has become popular in many computer vision tasks, owing to its strong capability to extract discriminative information from image sets and videos. However, such learning algorithms, particularly on high-dimensional Grassmann manifolds, usually involve significantly high computational cost, which seriously limits the applicability of learning on Grassmann manifolds in wider areas. In this research, we propose an unsupervised dimensionality reduction algorithm on Grassmann manifolds based on the Locality Preserving Projections (LPP) criterion. LPP is a commonly used dimensionality reduction algorithm for vector-valued data, aiming to preserve the local structure of data in the dimension-reduced space. The strategy is to construct a mapping from a higher-dimensional Grassmann manifold into a relatively low-dimensional one with more discriminative capability. The proposed method can be optimized as a basic eigenvalue problem. The performance of our proposed method is assessed on several classification and clustering tasks, and the experimental results show its clear advantages over other Grassmann-based algorithms.

Robust Asymmetric Bayesian Adaptive Matrix Factorization
Xin Guo, Boyuan Pan, Deng Cai, Xiaofei He
Low-rank matrix factorization (LRMF) has attracted much attention due to its wide range of applications in computer vision, such as image inpainting and video denoising. Most existing methods assume that the loss between an observed measurement matrix and its bilinear factorization follows a symmetric distribution, such as the Gaussian or gamma families. However, in real-world situations this assumption is often too idealized, because pictures under various illuminations and angles may suffer from multi-peak, asymmetric and irregular noise. To address these problems, this paper assumes that the loss follows a mixture of Asymmetric Laplace distributions and proposes a robust Asymmetric Laplace Adaptive Matrix Factorization model (ALAMF) under the Bayesian matrix factorization framework. The Laplace assumption makes our model more robust, and the asymmetric attribute makes it more flexible and adaptable to real-world noise. A variational method is then devised for model inference. We compare ALAMF with other state-of-the-art matrix factorization methods on both synthetic and real-world datasets. The experimental results demonstrate the effectiveness of our proposed approach.
Friday 25 08:30  10:00 KRLKR1  Logics for Knowledge Representation 1

Strong Inconsistency in Nonmonotonic Reasoning
Gerhard Brewka, Matthias Thimm, Markus Ulbricht
Minimal inconsistent subsets of knowledge bases play an important role in classical logics, most notably for repair and inconsistency measurement. It turns out that for nonmonotonic reasoning a stronger notion is needed. In this paper we develop such a notion, called strong inconsistency. We show that, in an arbitrary logic, monotonic or not, minimal strongly inconsistent subsets play the same role as minimal inconsistent subsets in classical reasoning. In particular, we show that the well-known classical duality between hitting sets of minimal inconsistent subsets and maximal consistent subsets generalizes to arbitrary logics if the strong notion of inconsistency is used. We investigate the complexity of various related reasoning problems and present a generic algorithm for computing minimal strongly inconsistent subsets of a knowledge base. We also demonstrate the potential of our new notion for applications, focusing on repair and inconsistency measurement.
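The classical duality the abstract refers to can be checked by brute force on a tiny propositional knowledge base: the complements of the maximal consistent subsets are exactly the minimal hitting sets of the minimal inconsistent subsets. The sketch below assumes formulas are Python predicates over truth assignments and uses exhaustive enumeration, so it only suits toy inputs; it illustrates the classical, monotonic case, not the paper's algorithm for strong inconsistency. All names are hypothetical.

```python
from itertools import combinations, product


def consistent(formulas, variables):
    """Brute-force satisfiability: some assignment makes every formula true."""
    return any(
        all(f(dict(zip(variables, values))) for f in formulas)
        for values in product([False, True], repeat=len(variables))
    )


def minimal_inconsistent_subsets(kb, variables):
    """kb maps names to formulas; enumerate subsets by increasing size."""
    names, found = list(kb), []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            s = set(subset)
            if any(m <= s for m in found):
                continue  # contains a smaller inconsistent subset, so not minimal
            if not consistent([kb[n] for n in subset], variables):
                found.append(s)
    return found


def maximal_consistent_subsets(kb, variables):
    """Enumerate subsets by decreasing size, skipping subsets of ones already found."""
    names, found = list(kb), []
    for size in range(len(names), -1, -1):
        for subset in combinations(names, size):
            s = set(subset)
            if any(s <= m for m in found):
                continue  # contained in a larger consistent subset, so not maximal
            if consistent([kb[n] for n in subset], variables):
                found.append(s)
    return found
```

For the base {p, not p, q}, the only minimal inconsistent subset is {p, not p}; the maximal consistent subsets are {p, q} and {not p, q}, whose complements {not p} and {p} are precisely its minimal hitting sets. The paper's point is that this correspondence breaks for nonmonotonic logics unless strong inconsistency is used.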

Strategically Knowing How
Raul Fervari, Andreas Herzig, Yanjun Li, Yanjing Wang
In this paper, we propose a single-agent logic of goal-directed knowing how, extending the standard epistemic logic of knowing that with a new knowing-how operator. The semantics of the new operator is based on the idea that knowing how to achieve phi means that there exists a (uniform) strategy such that the agent knows that it can make sure phi. We give an intuitive axiomatisation of our logic and prove the soundness, completeness, and decidability of the logic. The crucial axioms relating knowing that and knowing how illustrate our understanding of knowing how in this setting. This logic can be used in representing and reasoning about knowledge-how.

Conflict-driven ASP Solving with External Sources and Program Splits
Christoph Redl
Answer Set Programming (ASP) is a well-known problem solving approach based on nonmonotonic reasoning. HEX-programs extend ASP with external atoms for access to arbitrary external sources, which can also introduce constants that do not appear in the program (value invention). In order to determine the relevant constants during (pre-)grounding, external atoms must in general be evaluated under up to exponentially many possible inputs. While program splitting techniques allow for eliminating exhaustive pre-grounding, they prohibit effective conflict-driven solving. Thus, current techniques suffer from either a grounding or a solving bottleneck. In this work we introduce a new technique for conflict-driven learning over multiple program components. To this end, we identify reasons for inconsistency of program components w.r.t. input from predecessor components and propagate them back. Experiments show a significant, potentially exponential speedup.

Model Checking Multi-Agent Systems against LDLK Specifications
Jeremy Kong, Alessio Lomuscio
We define the logic LDLK, a formalism for specifying multi-agent systems. LDLK extends LDL with epistemic modalities, including common knowledge, for reasoning about the evolution of knowledge states of the agents in the system. We study the complexity of verifying a multi-agent system against LDLK specifications and show this to be in PSPACE. We give an algorithm for the practical verification of multi-agent systems specified in LDLK. We show that the model checking algorithm, based on alternating automata and NFAs, is amenable to symbolic implementation on OBDDs. We introduce MCMAS_LDLK, an extension of the open-source model checker MCMAS, implementing the algorithm, and discuss the experimental results obtained.

A Data-Driven Approach to Infer Knowledge Base Representation for Natural Language Relations
Kangqi Luo, Xusheng Luo, Xianyang Chen, Kenny Zhu
This paper studies the problem of discovering the structured knowledge representation of binary natural language relations. The representation, known as the schema, generalizes the traditional path of predicates to support more complex semantics. We present a search algorithm to generate schemas over a knowledge base, and propose a data-driven learning approach to discover the most suitable representations for a given relation. Evaluation results show that inferred schemas are able to represent precise semantics, and can be used to enrich manually crafted knowledge bases.

Characterising the Manipulability of Boolean Games
Paul Harrenstein, Paolo Turrini, Michael Wooldridge
The existence of (Nash) equilibria with undesirable properties is a well-known problem in game theory, which has motivated much research directed at the possibility of mechanisms for modifying games in order to eliminate undesirable equilibria, or induce desirable ones. Taxation schemes are a well-known mechanism for modifying games in this way. In the multi-agent systems community, taxation mechanisms for incentive engineering have been studied in the context of Boolean games with costs. These are games in which each player assigns truth values to a set of propositional variables she uniquely controls in pursuit of satisfying an individual propositional goal formula; different choices for the player are also associated with different costs. In such a game, each player prefers primarily to see the satisfaction of their goal, and secondarily, to minimise the cost of their choice, thereby giving rise to lexicographic preferences over goal satisfaction and costs. Within this setting, where taxes operate on costs only, however, it may well happen that the elimination or introduction of equilibria can only be achieved at the cost of simultaneously introducing less desirable equilibria or eliminating more attractive ones. Although this framework has been studied extensively, the problem of precisely characterising the equilibria that may be induced or eliminated has remained open. In this paper we close this problem, giving a complete characterisation of those mechanisms that can induce a set of outcomes of the game to be exactly the set of Nash equilibrium outcomes.
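To make the setting concrete, here is a minimal Python sketch, not taken from the paper, of a Boolean game with costs and a brute-force check for pure Nash equilibria. It assumes goals are given as Python predicates over a full assignment and costs as functions of a player's own choice; the names `control`, `goals`, `costs` and `nash_outcomes` are hypothetical. Lexicographic preferences are encoded by comparing `(goal satisfied, -cost)` tuples.

```python
from itertools import product


def assignments(variables):
    """Yield every truth assignment to the given variables as a dict."""
    for bits in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, bits))


def preference_key(player, outcome, control, goals, costs):
    """Lexicographic preference: goal satisfaction first, then lower cost."""
    own_choice = {v: outcome[v] for v in control[player]}
    return (goals[player](outcome), -costs[player](own_choice))


def is_nash(outcome, control, goals, costs):
    """True if no player gains by unilaterally changing her own variables."""
    for player, variables in control.items():
        current = preference_key(player, outcome, control, goals, costs)
        for deviation in assignments(variables):
            alternative = {**outcome, **deviation}
            if preference_key(player, alternative, control, goals, costs) > current:
                return False
    return True


def nash_outcomes(control, goals, costs):
    """Enumerate all outcomes and keep the pure Nash equilibria."""
    all_vars = [v for variables in control.values() for v in variables]
    return [o for o in assignments(all_vars) if is_nash(o, control, goals, costs)]


if __name__ == "__main__":
    # Toy game: player 1 controls p and wants (p or q); player 2 controls q
    # and wants q. Setting a variable to True costs 1, False costs 0.
    control = {1: ["p"], 2: ["q"]}
    goals = {1: lambda o: o["p"] or o["q"], 2: lambda o: o["q"]}
    costs = {1: lambda own: sum(own.values()), 2: lambda own: sum(own.values())}
    print(nash_outcomes(control, goals, costs))  # [{'p': False, 'q': True}]
```

A taxation scheme in this toy model would change only the `costs` functions. Because goal satisfaction is compared first, no tax on costs alone can make a player prefer an outcome that leaves her goal unsatisfied over one that satisfies it.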
Friday 25 08:30-10:00 MLLPR1 - Learning Preferences or Rankings 1

Privileged Matrix Factorization for Collaborative Filtering
Yali Du, Chang Xu, Dacheng Tao
Collaborative filtering plays a crucial role in reducing information overload in online consumption by suggesting products to customers that fulfil their potential interests. Observing that a user's review comments on purchases are often accompanied by ratings, recent works exploit the review texts in representing user or item factors and have achieved prominent performance. Although the effectiveness of reviews has been verified, one major defect of existing works is that reviews are used only in the learning of either user or item factors, without noticing that each review is associated with a pair of user and item concurrently. To better explore the value of review comments, this paper presents the privileged matrix factorization method, which utilizes reviews in the learning of both user and item factors. By mapping review texts into the privileged feature space, a learned privileged function compensates for the discrepancies between predicted ratings and ground-truth values rating-wise. Thus, by minimizing discrepancies and prediction errors, our method harnesses the information present in the review comments for the learning of both user and item factors. Experiments on five real datasets demonstrate the effectiveness of the proposed method.

Collaborative Rating Allocation
Yali Du, Chang Xu, Dacheng Tao
This paper studies the collaborative rating allocation problem, in which each user has a limited total amount of ratings to allocate across all items. These users are termed "energy limited". Different from existing methods, which treat each rating independently, we investigate the geometric properties of a user's rating vector and design a matrix completion method on the simplex. In this method, a user's rating vector is estimated by a combination of user profiles that serve as basis points on the simplex. Instead of the Euclidean metric, a nonlinear pullback distance measurement from the sphere is adopted, since it can depict the geometric constraints on each user's rating vector. The resulting objective function is then efficiently optimized by a Riemannian conjugate gradient method on the simplex. Experiments on real-world data sets demonstrate our model's competitiveness versus other collaborative rating prediction methods.
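One standard way to obtain a distance on the simplex pulled back from the sphere is to map each point p to its componentwise square root, which has unit Euclidean norm, and measure the great-circle distance arccos(sum_i sqrt(p_i q_i)). The sketch below assumes this construction and assumes a user's ratings are normalized to sum to one; the paper may scale or parametrize it differently, and the function names are hypothetical.

```python
import math


def to_simplex(ratings):
    """Normalize a non-negative rating vector so that it sums to one."""
    total = sum(ratings)
    return [r / total for r in ratings]


def sphere_pullback_distance(p, q):
    """Great-circle distance between sqrt(p) and sqrt(q) on the unit sphere.

    For p and q on the simplex, sqrt(p) has unit Euclidean norm, so the
    geodesic distance on the sphere is arccos of the inner product.
    """
    inner = sum(math.sqrt(a * b) for a, b in zip(p, q))
    inner = min(1.0, max(0.0, inner))  # clamp rounding error before acos
    return math.acos(inner)


def euclidean_distance(p, q):
    """Plain Euclidean distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    u = to_simplex([5, 3, 2])
    v = to_simplex([2, 3, 5])
    print(sphere_pullback_distance(u, v), euclidean_distance(u, v))
```

Under this map every pair of simplex vertices (a user who puts the whole budget on one item versus another) sits at the same distance pi/2, which reflects the curved geometry that a Euclidean metric ignores.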

Disguise Adversarial Networks for Click-Through Rate Prediction
Yue Deng, Yilin Shen, Hongxia Jin
We introduced an adversarial learning framework for improving CTR prediction in Ads recommendation. Our approach was motivated by observing the extremely low click-through rate and imbalanced label distribution in historical Ads impressions. We hence proposed Disguise-Adversarial-Networks (DAN) to improve the accuracy of supervised learning with limited positive-class information. In the context of CTR prediction, the rationale behind DAN can be intuitively understood as "non-clicked Ads makeup". DAN disguises the disliked Ads impressions (non-clicks) as interesting ones and encourages a discriminator to classify these disguised Ads as positive recommendations. On the adversarial side, the discriminator should be sober-minded: it is optimized to allocate these disguised Ads to their inherent classes according to an unsupervised information-theoretic assignment strategy. We applied DAN to two Ads datasets, covering both mobile and display Ads, for CTR prediction. The results showed that our DAN approach significantly outperformed other supervised learning methods and generative adversarial networks (GAN) in CTR prediction.

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He
Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need of feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.
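For reference, a factorization machine models second-order interactions as sum_{i<j} <v_i, v_j> x_i x_j, which can be computed in O(kn) time with the identity 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2]. The plain-Python sketch below (illustrative names, no deep component, no training) checks the fast form against the naive double loop. In DeepFM the FM part is combined with a deep network that reads the same embeddings; that part is omitted here.

```python
import math


def fm_pairwise(x, V):
    """Second-order FM term sum_{i<j} <v_i, v_j> x_i x_j in O(k n) time."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


def fm_pairwise_naive(x, V):
    """Same quantity by the direct O(k n^2) double loop, for checking."""
    n = len(x)
    return sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def fm_ctr(x, w0, w, V):
    """FM logit (bias + linear + pairwise) squashed to a click probability."""
    logit = w0 + sum(wi * xi for wi, xi in zip(w, x)) + fm_pairwise(x, V)
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    x = [1, 0, 1]                    # e.g. two active one-hot features
    V = [[1, 2], [3, 4], [5, 6]]     # hypothetical 2-dimensional embeddings
    print(fm_pairwise(x, V), fm_pairwise_naive(x, V))  # 17.0 17
```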

Diversifying Personalized Recommendation with User-session Context
Liang Hu, Longbing Cao, Guandong Xu, Jian Cao, Zhiping Gu, Shoujin Wang
Recommender systems (RS) have become an integral part of our daily life. However, most current RS often repeatedly recommend items to users with similar profiles. We argue that recommendation should be diversified by leveraging session contexts with personalized user profiles. For this, current session-based RS (SBRS) often assume a rigidly ordered sequence over data, which does not fit many real-world cases. Moreover, personalization is often omitted in current SBRS. Accordingly, a personalized SBRS over relaxedly ordered user-session contexts is more pragmatic. In doing so, deep-structured models tend to be too complex to serve online SBRS owing to the large number of users and items. Therefore, we design an efficient SBRS with shallow wide-in-wide-out networks, inspired by the successful experience in modern language modeling. The experiments on a real-world e-commerce dataset show the superiority of our model over the state-of-the-art methods.

Tensor Completion with Side Information: A Riemannian Manifold Approach
Zhou Tengfei, Hui Qian, Zebang Shen, Chao Zhang, Congfu Xu
By restricting the iterate to a nonlinear manifold, the recently proposed Riemannian optimization methods prove to be both efficient and effective in low-rank tensor completion problems. However, existing methods fail to exploit the easily accessible side information, due to their format mismatch. Consequently, there is still room for improvement. To fill the gap, in this paper, a novel Riemannian model is proposed to tightly integrate the original model and the side information by overcoming their inconsistency. For this model, an efficient Riemannian conjugate gradient descent solver is devised based on a new metric that captures the curvature of the objective. Numerical experiments suggest that our method is more accurate than the state of the art without compromising efficiency.
Friday 25 08:30-10:00 JOUNLP - Journal Track: Natural Language Processing

News Across Languages - Cross-Lingual Document Similarity and Event Tracking (Extended Abstract)
Jan Rupnik, Andrej Muhič, Gregor Leban, Blaž Fortuna, Marko Grobelnik
In today's world, we follow news which is distributed globally. Significant events are reported by different sources and in different languages. In this work, we address the problem of tracking events in a large multilingual stream. Within a recently developed system, Event Registry, we examine two aspects of this problem: how to compare articles in different languages and how to link collections of articles in different languages which refer to the same event. Building on previous work, we show there are methods which scale well and can compute a meaningful similarity between articles from languages with little or no direct overlap in the training data. Using this capability, we then propose an approach to link clusters of articles across languages which represent the same event.

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures (Extended Abstract)
Adrian Muscat, Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Barbara Plank
Automatic image description generation is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the known approaches based on how they conceptualise this problem and provide a review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image-text datasets and the evaluation measures that have been developed to assess the quality of machine-generated descriptions. Finally, we explore future directions in the area of automatic image description.

Text Rewriting Improves Semantic Role Labeling
Kristian Woodsend, Mirella Lapata
Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, and often demand linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.

Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (Extended Abstract)
Rodrigo Agerri, German Rigau
We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state-of-the-art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut in half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.
Friday 25 08:30-10:00 MLDL1 - Deep Learning 1

DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyungmin Kim, Min-Oh Heo, Seong-Ho Choi, Byoung-Tak Zhang
Question-answering (QA) on video contents is a significant challenge for achieving human-level intelligence as it involves both vision and language in real-world settings. Here we demonstrate the possibility of an AI agent performing video story QA by learning from a large amount of cartoon videos. We develop a video-story learning model, i.e. Deep Embedded Memory Networks (DEMN), to reconstruct stories from a joint scene-dialogue video stream using a latent embedding space of observed data. The video stories are stored in a long-term memory component. For a given question, an LSTM-based attention model uses the long-term memory to recall the best question-story-answer triplet by focusing on specific words containing key information. We trained the DEMN on a novel QA dataset of children’s cartoon video series, Pororo. The dataset contains 16,066 scene-dialogue pairs of 20.5-hour videos, 27,328 fine-grained sentences for scene description, and 8,913 story-related QA pairs. Our experimental results show that the DEMN outperforms other QA models. This is mainly due to 1) the reconstruction of video stories in a scene-dialogue combined form that utilizes the latent embedding and 2) attention. DEMN also achieved state-of-the-art results on the MovieQA benchmark.

Learning Multi-level Region Consistency with Dense Multi-label Networks for Semantic Segmentation
Tong Shen, Guosheng Lin, Chunhua Shen, Ian Reid
Semantic image segmentation is a fundamental task in image understanding. Per-pixel semantic labelling of an image benefits greatly from the ability to consider region consistency both locally and globally. However, many Fully Convolutional Network-based methods do not impose such consistency, which may give rise to noisy and implausible predictions. We address this issue by proposing a dense multi-label network module that is able to encourage region consistency at different levels. This simple but effective module can be easily integrated into any semantic segmentation system. With comprehensive experiments, we show that the dense multi-label module can successfully remove the implausible labels and clear the confusion so as to boost the performance of semantic segmentation systems.

Towards Understanding the Invertibility of Convolutional Neural Networks
Anna Gilbert, Yi Zhang, Kibok Lee, Yuting Zhang, Honglak Lee
Several recent works have empirically observed that Convolutional Neural Nets (CNNs) are (approximately) invertible. To understand this approximate invertibility phenomenon and how to leverage it more effectively, we focus on a theoretical explanation and develop a mathematical model of sparse signal recovery that is consistent with CNNs with random weights. We give an exact connection between a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs. We show empirically that several learned networks are consistent with our mathematical analysis and then demonstrate that with such a simple theoretical framework, we can obtain reasonable reconstruction results on real images. We also discuss gaps between our model assumptions and CNNs trained for classification in practical scenarios.

Tag Disentangled Generative Adversarial Network for Object Image Re-rendering
Chaoyue Wang, Chaohui Wang, Chang Xu, Dacheng Tao
In this paper, we propose a principled Tag Disentangled Generative Adversarial Network (TD-GAN) for re-rendering new images for the object of interest from a single image of it by specifying multiple scene properties (such as viewpoint, illumination, expression, etc.). The whole framework consists of a disentangling network, a generative network, a tag mapping net, and a discriminative network, which are trained jointly based on a given set of images that are completely/partially tagged (i.e., supervised/semi-supervised setting). Given an input image, the disentangling network extracts disentangled and interpretable representations, which are then used to generate images by the generative network. In order to boost the quality of disentangled representations, the tag mapping net is integrated to explore the consistency between the image and its tags. Furthermore, the discriminative network is introduced to implement the adversarial training strategy for generating more realistic images. Experiments on two challenging datasets demonstrate the state-of-the-art performance of the proposed framework in the problem of interest.

Image Matching via Loopy RNN
Donghao Luo, Bingbing Ni, Yichao Yan, Xiaokang Yang
Most existing matching algorithms are one-off algorithms, i.e., they usually measure the distance between the two image feature representation vectors only once. In contrast, the human visual system achieves this task, i.e., image matching, by recursively looking at specific/related parts of both images and then making the final judgement. Towards this end, we propose a novel loopy recurrent neural network (Loopy RNN), which is capable of aggregating relationship information of two input images in a progressive/iterative manner and outputting the consolidated matching score in the final iteration. A Loopy RNN has two distinctive features. First, built on conventional long short-term memory (LSTM) nodes, it links the output gate of the tail node to the input gate of the head node, thus bringing about the symmetry property required for matching. Second, a monotonous loss designed for the proposed network guarantees increasing confidence during the recursive matching process. Extensive experiments on several image matching benchmarks demonstrate the great potential of the proposed method.

Dual Inference for Machine Learning
Yingce Xia, Jiang Bian, Tao Qin, Nenghai Yu, TieYan Liu
Recent years have witnessed the rapid development of machine learning in solving artificial intelligence (AI) tasks in many domains, including translation, speech, image, etc. Within these domains, AI tasks are usually not independent. As a specific type of relationship, structural duality does exist between many pairs of AI tasks, such as translation from one language to another vs. its opposite direction, speech recognition vs. speech synthesis, image classification vs. image generation, etc. The importance of such duality has been highlighted by some recent studies, which revealed that it can boost the learning of two tasks in the dual form. However, there has been little investigation of how to leverage this invaluable relationship in the inference stage of AI tasks. In this paper, we propose a general framework of dual inference which can take advantage of both existing models from two dual tasks, without retraining, to conduct inference for one individual task. Empirical studies on three pairs of specific dual tasks, including machine translation, sentiment analysis, and image processing, have illustrated that dual inference can significantly improve the performance of each individual task.
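By Bayes' rule, P(y|x) = P(x|y)P(y)/P(x), so a dual model plus a marginal estimate gives a second opinion on the primal prediction. The sketch below assumes that this is how the two models are combined: each candidate y is scored by a weighted sum of log P(y|x) and log P(x|y) + log P(y), with log P(x) dropped because it does not depend on y. The weight `alpha`, the callback names and the toy probabilities are hypothetical and are not results from the paper.

```python
import math


def dual_inference(x, candidates, log_p_y_given_x, log_p_x_given_y, log_p_y, alpha=0.5):
    """Pick the candidate y that maximizes a blend of primal and dual evidence.

    primal: log P(y | x) from the primal model
    dual:   log P(x | y) + log P(y), i.e. Bayes' rule with log P(x) dropped,
            since it is the same for every candidate y
    """
    def score(y):
        primal = log_p_y_given_x(y, x)
        dual = log_p_x_given_y(x, y) + log_p_y(y)
        return alpha * primal + (1.0 - alpha) * dual

    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy sentiment example with made-up probabilities.
    primal = {"pos": math.log(0.55), "neg": math.log(0.45)}
    dual = {"pos": math.log(0.01), "neg": math.log(0.04)}
    prior = {"pos": math.log(0.5), "neg": math.log(0.5)}
    args = ("some review", ["pos", "neg"],
            lambda y, x: primal[y], lambda x, y: dual[y], lambda y: prior[y])
    print(dual_inference(*args, alpha=1.0))  # pos (primal model only)
    print(dual_inference(*args, alpha=0.5))  # neg (dual evidence tips the decision)
```

Setting `alpha=1.0` recovers ordinary primal inference, so the combination can only be as costly as one extra forward pass of the dual model per candidate; neither model is retrained.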
Friday 25 08:30-10:00 MLDMTS - Data Mining and Time Series

Cascade Dynamics Modeling with Attention-based Recurrent Neural Network
Yongqing Wang, Shenghua Liu, Huawei Shen, Jinhua Gao, Xueqi Cheng
The ability to model and predict cascades of resharing is crucial to understanding information propagation and to launching viral marketing campaigns. Conventional methods for cascade prediction heavily depend on the hypothesis of diffusion models, e.g., the independent cascade model and the linear threshold model. Recently, researchers have attempted to circumvent the problem of cascade prediction using sequential models (e.g., recurrent neural networks, namely RNNs) that do not require knowing the underlying diffusion model. Existing sequential models employ a chain structure to capture the memory effect. However, for cascade prediction, each cascade generally corresponds to a diffusion tree, causing cross-dependence in the cascade: one sharing behavior could be triggered by its non-immediate predecessor in the memory chain. In this paper, we propose an attention-based RNN to capture the cross-dependence in cascades. Furthermore, we introduce a coverage strategy to combat the misallocation of attention caused by the memorylessness of the traditional attention mechanism. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed models outperform state-of-the-art models at both cascade prediction and inferring diffusion trees.

What to Do Next: Modeling User Behaviors by Time-LSTM
Yu Zhu, Hao Li, Yikang Liao, Beidou Wang, Ziyu Guan, Haifeng Liu, Deng Cai
Recently, Recurrent Neural Network (RNN) solutions for recommender systems (RS) are becoming increasingly popular. The insight is that there exist some intrinsic patterns in the sequence of users' actions, and RNNs have been shown to perform excellently when modeling sequential data. In traditional tasks such as language modeling, RNN solutions usually only consider the sequential order of objects without the notion of interval. However, in RS, time intervals between users' actions are of significant importance in capturing the relations of users' actions, and traditional RNN architectures are not good at modeling them. In this paper, we propose a new LSTM variant, i.e. Time-LSTM, to model users' sequential actions. Time-LSTM equips LSTM with time gates to model time intervals. These time gates are specifically designed so that, compared to the traditional RNN solutions, Time-LSTM better captures both users' short-term and long-term interests, so as to improve the recommendation performance. Experimental results on two real-world datasets show the superiority of the recommendation method using Time-LSTM over the traditional methods.
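As a rough illustration of what a time gate adds, the scalar sketch below extends one LSTM step with an extra sigmoid gate that depends on the current input and on the interval `dt` since the previous action, and uses it to scale how much new content is written to the cell. This is a simplified, assumed form for exposition, not the exact Time-LSTM equations; the parameter names in `p` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def time_gated_lstm_step(x, dt, h_prev, c_prev, p):
    """One step of a scalar LSTM cell with an added time gate.

    x: input feature, dt: time since the previous action,
    p: dict of (hypothetical) scalar weights and biases.
    """
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate content
    # Assumed time gate: depends on the current input and the elapsed interval dt.
    t = sigmoid(p["wt"] * x + sigmoid(p["qt"] * dt) + p["bt"])
    c = f * c_prev + i * t * g  # the time gate scales how much new input is written
    h = o * math.tanh(c)
    return h, c, t


def run(sequence, p):
    """Feed (x, dt) pairs through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x, dt in sequence:
        h, c, _ = time_gated_lstm_step(x, dt, h, c, p)
    return h


if __name__ == "__main__":
    params = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo",
                               "bo", "wg", "ug", "bg", "wt", "bt")}
    params["qt"] = -1.0  # in this toy setting, longer gaps shrink the time gate
    print(run([(1.0, 0.0), (1.0, 5.0)], params))
```

A plain LSTM would produce the same state for the two actions regardless of whether they were seconds or days apart; here `dt` enters the update directly.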

App Download Forecasting: An Evolutionary Hierarchical Competition Approach
Yingzi Wang, Nicholas Jing Yuan, Yu Sun, Chuan Qin, Xing Xie
Product sales forecasting enables a comprehensive understanding of products' future development, making it of particular interest for companies to improve their business, for investors to measure the values of firms, and for users to capture the trends of a market. Recent studies show that the complex competition interactions among products directly influence products' future development. However, most existing approaches fail to model the evolutionary competition among products and lack the capability to organically reflect multi-level competition analysis in sales forecasting. To address these problems, we propose the Evolutionary Hierarchical Competition Model (EHCM), which effectively considers the time-evolving multi-level competition among products. The EHCM model systematically integrates hierarchical competition analysis with multi-scale time series forecasting. Extensive experiments using a real-world app download dataset show that EHCM outperforms state-of-the-art methods at various forecasting granularities.

Fast Change Point Detection on Dynamic Social Networks
Yu Wang, Aniket Chakrabarti, David Sivakoff, Srinivasan Parthasarathy
A number of real-world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks, with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model, where a network is defined at logical timestamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state of the art while improving quality on both synthetic and real-world networks.
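The snapshot model itself is easy to state in code: a dynamic network is a list of edge sets, one per logical timestamp. The sketch below is a naive baseline, not the three-step approach described above: it flags timestamp t as a change point when the Jaccard distance between the edge sets at t-1 and t exceeds a threshold. The function names and the threshold value are illustrative assumptions.

```python
def edge(u, v):
    """Undirected edge as a sorted tuple so (1, 2) and (2, 1) coincide."""
    return (u, v) if u <= v else (v, u)


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two edge sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def change_points(snapshots, threshold):
    """Timestamps t whose edge set differs from snapshot t-1 by more than threshold."""
    return [
        t
        for t in range(1, len(snapshots))
        if jaccard_distance(snapshots[t - 1], snapshots[t]) > threshold
    ]


if __name__ == "__main__":
    g0 = {edge(1, 2), edge(2, 3)}
    g1 = {edge(2, 1), edge(3, 2)}   # same graph as g0
    g2 = {edge(4, 5), edge(5, 6)}   # structure changes completely
    print(change_points([g0, g1, g2, g2], threshold=0.5))  # [2]
```

This baseline compares every consecutive pair of full edge sets, so its cost grows with the size of each snapshot; avoiding that kind of exhaustive comparison is where a faster method would gain its speedup.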

Link Prediction with Spatial and Temporal Consistency in Dynamic Networks
Wenchao Yu, Wei Cheng, Wei Wang, Charu C. Aggarwal, Haifeng Chen
Dynamic networks are ubiquitous. Link prediction in dynamic networks has attracted tremendous research interest. Many models have been developed to predict links that may emerge in the immediate future from the past evolution of the networks. There are two key factors: 1) a node is more likely to form a link in the near future with another node within its close proximity, rather than with a random node; 2) a dynamic network usually evolves smoothly. Existing approaches seldom unify these two factors to strive for spatial and temporal consistency in a dynamic network. To address this limitation, in this paper, we propose a link prediction model with spatial and temporal consistency (LIST) to predict links in a sequence of networks over time. LIST characterizes the network dynamics as a function of time, which integrates the spatial topology of the network at each timestamp and the temporal network evolution. Compared to existing approaches, LIST has two advantages: 1) LIST uses a generic model to express the network structure as a function of time, which makes it also suitable for a wide variety of temporal network analysis problems beyond the focus of this paper; 2) by retaining the spatial and temporal consistency, LIST yields better prediction performance. Extensive experiments on four real datasets demonstrate the effectiveness of the LIST model.

LMPP: A Large Margin Point Process Combining Reinforcement and Competition for Modeling Hashtag Popularity
Bidisha Samanta, Abir De, Abhijnan Chakraborty, Niloy Ganguly
Predicting the popularity dynamics of Twitter hashtags has a broad spectrum of applications. Existing works have mainly focused on modeling the popularity of individual tweets rather than the popularity of the underlying hashtags. Hence, they do not consider several realistic factors for hashtag popularity. In this paper, we propose Large Margin Point Process (LMPP), a probabilistic framework that integrates hashtag-tweet influence and hashtag-hashtag competitions, the two factors which play important roles in hashtag propagation. Furthermore, while considering the hashtag competitions, LMPP looks into the variations of popularity rankings of the competing hashtags across time. Extensive experiments on seven real datasets demonstrate that LMPP outperforms existing popularity prediction approaches by a significant margin. Going fu