### Friday 13, 13:30 - 14:30, Invited Talk (A1)

Chair: Peter Stone
• Language to Action: Towards Interactive Task Learning with Physical Agents
Joyce Chai
Invited Talk
### Friday 13, 14:30 - 15:30, Invited Talk (A1)

Chair: Francis Bach
• Building Machines that Learn and Think Like People
Josh Tenenbaum
Invited Talk

• Opening
### Monday 16, 09:00 - 09:45, Invited Talk (VICTORIA)

Chair: Jeff Rosenschein
• Learning World Models: the Next Step Towards AI
Yann LeCun
Invited Talk
### Monday 16, 10:15 - 11:15, SUR-KR - Survey Track: Knowledge Representation (VICTORIA)

Chair: Natasha Alechina
• #5410
Maintenance of Case Bases: Current Algorithms after Fifty Years
Jose M. Juarez, Susan Craw, J. Ricardo Lopez-Delgado, Manuel Campos
Survey Track: Knowledge Representation

Case-Based Reasoning (CBR) learns new knowledge from data and so can cope with changing environments. CBR is very different from model-based systems since it can learn incrementally as new data becomes available, storing new cases in its case-base. This means that it can benefit from readily available new data, but it also makes case-base maintenance (CBM) essential for managing the cases by deleting from and compacting the case-base. On the 50th anniversary of CNN (considered the first CBM algorithm), new CBM methods are being proposed to deal with the new requirements of Big Data scenarios. In this paper, we present an accessible historic perspective of CBM and we classify and analyse the most recent approaches to deal with these requirements.
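To make the starting point of this history concrete, here is a minimal sketch of a condensing pass, assuming that CNN in the abstract refers to Hart's condensed nearest neighbor rule. The function names `nearest_label` and `condense` and the toy cases are illustrative choices, not taken from the survey.

```python
import math


def nearest_label(store, point):
    """Return the label of the stored case closest to `point` (1-NN)."""
    best = min(store, key=lambda case: math.dist(case[0], point))
    return best[1]


def condense(cases):
    """Keep only the cases that the current store would misclassify."""
    store = [cases[0]]                # seed the store with the first case
    changed = True
    while changed:                    # repeat passes until nothing is added
        changed = False
        for features, label in cases:
            if (features, label) in store:
                continue              # already kept, nothing to do
            if nearest_label(store, features) != label:
                store.append((features, label))
                changed = True
    return store


# Hypothetical toy case base: two well-separated classes.
cases = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b"),
]
reduced = condense(cases)
```

On this toy data the store shrinks from six cases to two while every original case is still classified correctly by 1-NN over the reduced store; real case bases will usually compress less cleanly.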

• #5440
Evaluation Techniques and Systems for Answer Set Programming: a Survey
Martin Gebser, Nicola Leone, Marco Maratea, Simona Perri, Francesco Ricca, Torsten Schaub
Survey Track: Knowledge Representation

Answer set programming (ASP) is a prominent knowledge representation and reasoning paradigm that has found both industrial and scientific applications. The success of ASP is due to the combination of two factors: a rich modeling language and the availability of efficient ASP implementations. In this paper we trace the history of ASP systems, describing the key evaluation techniques and their implementation in actual tools.

• #5442
Recent Advances in Querying Probabilistic Knowledge Bases
Stefan Borgwardt, İsmail İlkan Ceylan, Thomas Lukasiewicz
Survey Track: Knowledge Representation

We give a survey on recent advances at the forefront of research on probabilistic knowledge bases for representing and querying large-scale automatically extracted data. We concentrate especially on increasing the semantic expressivity of formalisms for representing and querying probabilistic knowledge (i) by giving up the closed-world assumption, (ii) by allowing for commonsense knowledge (and in parallel giving up the tuple-independence assumption), and (iii) by giving up the closed-domain assumption, while preserving some computational properties of query answering in such formalisms.

• #5418
Ontology-Based Data Access: A Survey
Guohui Xiao, Diego Calvanese, Roman Kontchakov, Domenico Lembo, Antonella Poggi, Riccardo Rosati, Michael Zakharyaschev
Survey Track: Knowledge Representation

We present the framework of ontology-based data access, a semantic paradigm for providing convenient and user-friendly access to data repositories, which has been actively developed and studied in the past decade. Focusing on relational data sources, we discuss the main ingredients of ontology-based data access, key theoretical results, techniques, applications and future challenges.

### Monday 16, 10:15 - 11:15, ML-NN1 - Neural Networks (C7)

Chair: Nevin L. Zhang
• #4117
Deep Convolutional Neural Networks with Merge-and-Run Mappings
Liming Zhao, Mingjie Li, Depu Meng, Xi Li, Zhaoxiang Zhang, Yueting Zhuang, Zhuowen Tu, Jingdong Wang
Neural Networks

A deep residual network, built by stacking a sequence of residual blocks, is easy to train, because identity mappings skip residual branches and thus improve information flow. To further reduce the training difficulty, we present a simple network architecture, deep merge-and-run neural networks. The novelty lies in a modularized building block, merge-and-run block, which assembles residual branches in parallel through a merge-and-run mapping: average the inputs of these residual branches (Merge), and add the average to the output of each residual branch as the input of the subsequent residual branch (Run), respectively. We show that the merge-and-run mapping is a linear idempotent function in which the transformation matrix is idempotent, and thus improves information flow, making training easy. In comparison with residual networks, our networks enjoy compelling advantages: they contain much shorter paths and the width, i.e., the number of channels, is increased, and the time complexity remains unchanged. We evaluate the performance on the standard recognition tasks. Our approach demonstrates consistent improvements over ResNets with the comparable setup, and achieves competitive results (e.g., 3.06% testing error on CIFAR-10, 17.55% on CIFAR-100, 1.51% on SVHN).
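The idempotence claim can be checked numerically on a toy case. The sketch below assumes scalar branch inputs in place of feature maps, so the merge step reduces to a k x k averaging matrix; the helper names are hypothetical and the code is not the authors' implementation.

```python
def merge_and_run_matrix(k):
    """k x k matrix that replaces every branch input by the mean of all k inputs."""
    return [[1.0 / k] * k for _ in range(k)]


def matmul(a, b):
    """Plain square-matrix product, to avoid external dependencies."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def is_idempotent(m, tol=1e-12):
    """Check M * M == M up to floating-point tolerance."""
    mm = matmul(m, m)
    n = len(m)
    return all(abs(mm[i][j] - m[i][j]) < tol for i in range(n) for j in range(n))


def merge_and_run_step(inputs, branches):
    """One block: average the inputs (merge), add the mean to each branch output (run)."""
    avg = sum(inputs) / len(inputs)
    return [avg + branch(x) for branch, x in zip(branches, inputs)]


# Two parallel branches with made-up residual functions.
outputs = merge_and_run_step([1.0, 3.0], [lambda x: 0.1 * x, lambda x: -0.1 * x])
two_branch_ok = is_idempotent(merge_and_run_matrix(2))
```

Applying the averaging matrix twice gives the same result as applying it once, which is the property the abstract links to improved information flow.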

• #1269
Accelerating Convolutional Networks via Global & Dynamic Filter Pruning
Shaohui Lin, Rongrong Ji, Yuchao Li, Yongjian Wu, Feiyue Huang, Baochang Zhang
Neural Networks

Accelerating convolutional neural networks has recently received ever-increasing research focus. Among various approaches proposed in the literature, filter pruning has been regarded as a promising solution, owing to its significant speedup and memory reduction of both the network model and the intermediate feature maps. To this end, most approaches tend to prune filters in a layer-wise fixed manner, which can neither dynamically recover previously removed filters nor jointly optimize the pruned network across layers. In this paper, we propose a novel global & dynamic pruning (GDP) scheme to prune redundant filters for CNN acceleration. In particular, GDP first globally prunes the unsalient filters across all layers by proposing a global discriminative function based on prior knowledge of filters. Second, it dynamically updates the filter saliency all over the pruned sparse network, and then recovers the mistakenly pruned filters, followed by a retraining phase to improve the model accuracy. Specifically, we effectively solve the corresponding non-convex optimization problem of the proposed GDP via stochastic gradient descent with greedy alternative updating. Extensive experiments show that, compared to the state-of-the-art filter pruning methods, the proposed approach achieves superior performance in accelerating several cutting-edge CNNs on the ILSVRC 2012 benchmark.

• #196
Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices
Jie Zhang, Xiaolong Wang, Dawei Li, Yalin Wang
Neural Networks

Recurrent neural networks (RNNs) achieve cutting-edge performance on a variety of problems. However, due to their high computational and memory demands, deploying RNNs on resource-constrained mobile devices is a challenging task. To guarantee minimum accuracy loss with a higher compression rate, and driven by the mobile resource requirement, we introduce a novel model compression approach, DirNet, based on an optimized fast dictionary learning algorithm, which 1) dynamically mines the dictionary atoms of the projection dictionary matrix within each layer to adjust the compression rate, and 2) adaptively changes the sparsity of sparse codes across the hierarchical layers. Experimental results on a language model and an ASR model trained with a 1000h speech dataset demonstrate that our method significantly outperforms prior approaches. Evaluated on off-the-shelf mobile devices, we are able to reduce the size of the original model by eight times with real-time model inference and negligible accuracy loss.

• #924
Automatic Gating of Attributes in Deep Structure
Xiaoming Jin, Tao He, Cheng Wan, Lan Yi, Guiguang Ding, Dou Shen
Neural Networks

Deep structure has been widely applied in a large variety of fields for its excellence at representing data. Attributes are a unique type of data descriptions that have been successfully utilized in numerous tasks to enhance performance. However, introducing attributes into deep structure is complicated and challenging, because different layers in deep structure accommodate features of different abstraction levels, while different attributes may naturally represent the data at different abstraction levels. This demands adaptive and joint modeling of attributes and deep structure by carefully examining their relationship. Different from existing works that treat attributes straightforwardly as the same level without considering their abstraction levels, we can make better use of attributes in deep structure by properly connecting them. In this paper, we move forward along this new direction by proposing a deep structure named Attribute Gated Deep Belief Network (AG-DBN) that includes a tunable attribute-layer gating mechanism and automatically learns the best way of connecting attributes to appropriate hidden layers. Experimental results on a manually-labeled subset of ImageNet, a-Yahoo and a-Pascal data sets justify the superiority of AG-DBN against several baselines including a CNN model and other AG-DBN variants. Specifically, it outperforms the CNN model, VGG19, by significantly reducing the classification error from 26.70% to 13.56% on a-Pascal.

• #1830
Regularizing Deep Neural Networks with an Ensemble-based Decorrelation Method
Shuqin Gu, Yuexian Hou, Lipeng Zhang, Yazhou Zhang
Neural Networks

Although Deep Neural Networks (DNNs) have achieved excellent performance in many tasks, improving the generalization capacity of DNNs still remains a challenge. In this work, we propose a novel regularizer named Ensemble-based Decorrelation Method (EDM), which is motivated by the idea of ensemble learning to improve the generalization capacity of DNNs. EDM can be applied to hidden layers in fully connected neural networks or convolutional neural networks. We treat each hidden layer as an ensemble of several base learners by dividing all the hidden units into several non-overlapping groups, and each group is viewed as a base learner. EDM encourages DNNs to learn more diverse representations by minimizing the covariance between all base learners during the training step. Experimental results on MNIST and CIFAR datasets demonstrate that EDM can effectively reduce overfitting and improve the generalization capacity of DNNs.

### Monday 16, 10:15 - 11:15, MAS-AGT1 - Algorithmic Game Theory (C8)

Chair: Jörg Rothe
• #2010
Probabilistic Verification for Obviously Strategyproof Mechanisms
Diodato Ferraioli, Carmine Ventre
Algorithmic Game Theory

Obviously strategyproof (OSP) mechanisms maintain the incentive compatibility of agents that are not fully rational. They have been the object of a number of studies since their recent definition. A research agenda, initiated in [Ferraioli and Ventre, 2017], is to find a small set (possibly, the smallest) of conditions that allow implementing an OSP mechanism. To this aim, we define a model of probabilistic verification wherein agents are caught misbehaving with a certain probability, and show how OSP mechanisms can implement every social choice function at the cost of either imposing very large fines or verifying a linear number of agents.

• #3112
Payoff Control in the Iterated Prisoner's Dilemma
Dong Hao, Kai Li, Tao Zhou
Algorithmic Game Theory

The repeated game has long been the touchstone model for agents’ long-run relationships. Previous results suggest that it is particularly difficult for a repeated game player to exert autocratic control over the payoffs since they are jointly determined by all participants. This work discovers that the scale of a player’s capability to unilaterally influence the payoffs may have been much underestimated. Under the conventional iterated prisoner’s dilemma, we develop a general framework for controlling the feasible region where the players’ payoff pairs lie. A control strategy player is able to confine the payoff pairs in her objective region, as long as this region has feasible linear boundaries. With this framework, many well-known existing strategies can be categorized and various new strategies with nice properties can be further identified. We show that the control strategies perform well either in a tournament or against a human-like opponent.
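As background for the setting, a plain iterated prisoner's dilemma can be simulated in a few lines. This sketch assumes the commonly used payoff values (R, S, T, P) = (3, 0, 5, 1) and two classic strategies, tit-for-tat and always-defect. It does not implement the paper's control strategies, and all names are illustrative.

```python
# Payoffs as (row player, column player); "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}


def tit_for_tat(my_last, their_last):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if their_last is None else their_last


def always_defect(my_last, their_last):
    """Defect regardless of history."""
    return "D"


def play(strategy_x, strategy_y, rounds):
    """Return the average per-round payoff pair of two memory-one strategies."""
    x_last = y_last = None
    total_x = total_y = 0
    for _ in range(rounds):
        x_move = strategy_x(x_last, y_last)
        y_move = strategy_y(y_last, x_last)
        gain_x, gain_y = PAYOFF[(x_move, y_move)]
        total_x += gain_x
        total_y += gain_y
        x_last, y_last = x_move, y_move
    return total_x / rounds, total_y / rounds


pair_vs_defector = play(tit_for_tat, always_defect, 10)
pair_mutual = play(tit_for_tat, tit_for_tat, 10)
```

Each call yields one point in the plane of payoff pairs; the framework in the abstract is about which regions of that plane a single player can enforce.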

• #3600
Chen Hajaj, Yevgeniy Vorobeychik
Algorithmic Game Theory

• #3740
Tractable (Simple) Contests
Priel Levy, David Sarne, Yonatan Aumann
Algorithmic Game Theory

Much of the work on multi-agent contests is focused on determining the equilibrium behavior of contestants. This capability is essential for the principal for choosing the optimal parameters for the contest (e.g. prize amount). As it turns out, many contests exhibit not one, but many possible equilibria, hence precluding contest design optimization and prediction of contestants' behavior. In this paper we examine a variation of the classic contest that alleviates this problem by having contestants make their decisions sequentially rather than in parallel. We study this model in the setting of a simple contest, wherein contestants only choose whether or not to participate, while their performance level is exogenously set. We show that by switching to the revised mechanism the principal can not only force her most desired pure-strategy equilibrium to emerge, but also, at times, end up with an equilibrium offering a greater expected profit. Further, we show that in the modified contest the optimal prize can be effectively computed. The theoretical analysis is complemented by comprehensive experiments with people over Amazon Mechanical Turk. Here, we find that the modified mechanism offers great benefit for the principal, both in terms of an increased over-participation in the contest (compared to theoretical expectations) and increased average profit.

• #775
Computational Aspects of the Preference Cores of Supermodular Two-Scenario Cooperative Games
Daisuke Hatano, Yuichi Yoshida
Algorithmic Game Theory

In a cooperative game, the utility of a coalition of players is given by the characteristic function, and the goal is to find a stable value division of the total utility to the players. In real-world applications, however, multiple scenarios could exist, each of which determines a characteristic function, and which scenario is more important is unknown. To handle such situations, the notion of multi-scenario cooperative games and several solution concepts have been proposed. However, computing the value divisions in those solution concepts is intractable in general. To resolve this issue, we focus on supermodular two-scenario cooperative games in which the number of scenarios is two and the characteristic functions are supermodular and study the computational aspects of a major solution concept called the preference core. First, we show that we can compute the value division in the preference core of a supermodular two-scenario game in polynomial time. Then, we reveal the relations among preference cores with different parameters. Finally, we provide more efficient algorithms for deciding the non-emptiness of the preference core for several specific supermodular two-scenario cooperative games such as the airport game, multicast tree game, and a special case of the generalized induced subgraph game.

### Monday 16, 10:15 - 11:15, SGP-GPS - Game Playing and Search (K2)

Chair: Tristan Cazenave
• #3618
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
Patryk Chrabąszcz, Ilya Loshchilov, Frank Hutter
Game Playing and Search

Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep learning problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state-of-the-art can be advanced further by integrating the many advances made in the field of ES in the last decades. We also demonstrate that ES algorithms have very different performance characteristics than traditional RL algorithms: on some games, they learn to exploit the environment and perform much better while on others they can get stuck in suboptimal local minima. Combining their strengths and weaknesses with those of traditional RL algorithms is therefore likely to lead to new advances in the state-of-the-art for solving RL problems.
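For readers unfamiliar with the term, the following is a minimal textbook-style sketch of a (mu, lambda) evolution strategy with a fixed step size and log-rank recombination weights. It is an assumption about what "canonical ES" looks like in its simplest form, not necessarily the exact variant benchmarked in the paper; the toy objective `sphere`, the hyperparameter values, and all names are hypothetical.

```python
import math
import random


def canonical_es(fitness, theta, sigma=0.3, lam=20, mu=5, generations=200, seed=0):
    """Maximize `fitness` with a simple (mu, lambda)-ES and fixed step size sigma."""
    rng = random.Random(seed)
    # Log-rank recombination weights, normalized to sum to one.
    raw = [math.log(mu + 0.5) - math.log(j) for j in range(1, mu + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]
    for _ in range(generations):
        # Sample lambda Gaussian perturbations of the current parent.
        noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(lam)]
        # Rank perturbations by the fitness of the offspring they produce.
        ranked = sorted(
            noises,
            key=lambda eps: fitness([t + sigma * e for t, e in zip(theta, eps)]),
            reverse=True,
        )
        # Move the parent along the weighted mean of the mu best perturbations.
        theta = [
            t + sigma * sum(w * eps[d] for w, eps in zip(weights, ranked[:mu]))
            for d, t in enumerate(theta)
        ]
    return theta


target = [1.0, -2.0]


def sphere(x):
    """Toy objective: negative squared distance to `target` (higher is better)."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


start = [0.0, 0.0]
best = canonical_es(sphere, start)
```

In an Atari setting the parameter vector would hold policy network weights and the fitness would be an episode return, so each fitness evaluation is far more expensive than in this toy example.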

• #5465
(Journal track) Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling
Game Playing and Search

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community. In this paper we take a big picture look at how the ALE is being used by the research community. We focus on how diverse the evaluation methodologies in the ALE have become and we highlight some key concerns when evaluating agents in this platform. We use this discussion to present what we consider to be the best practices for future evaluations in the ALE. To further the progress in the field, we also introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions.
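The abstract does not spell out the mechanism, so here is a minimal sketch, assuming that sticky actions mean the environment re-executes the previously executed action with some probability instead of the newly requested one. The class `StickyActions`, the `step_fn` callback, and the default probability of 0.25 are assumptions for illustration and are not the ALE API.

```python
import random


class StickyActions:
    """Wrap a step function so that the previous action is sometimes repeated."""

    def __init__(self, step_fn, stickiness=0.25, seed=None):
        self.step_fn = step_fn          # hypothetical callable: action -> result
        self.stickiness = stickiness    # probability of repeating the last action
        self.rng = random.Random(seed)
        self.previous = None            # last action actually executed

    def step(self, action):
        # With probability `stickiness`, ignore the request and repeat.
        if self.previous is not None and self.rng.random() < self.stickiness:
            action = self.previous
        self.previous = action
        return self.step_fn(action)


# Record which actions actually reach the (stand-in) environment.
executed = []
env = StickyActions(executed.append, stickiness=0.25, seed=0)
for requested in [0, 1, 2, 3, 2, 1, 0]:
    env.step(requested)
```

Because the agent cannot know in advance which requests will be overridden, memorized open-loop action sequences stop being reliable, which is the kind of stochasticity the abstract argues for.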

• #5453
(Journal track) MCTS-Minimax Hybrids with State Evaluations
Hendrik Baier, Mark H. M. Winands
Game Playing and Search

Monte-Carlo Tree Search (MCTS) has been found to show weaker play than minimax-based search in some tactical game domains. In order to combine the tactical strength of minimax and the strategic strength of MCTS, MCTS-minimax hybrids have been proposed in prior work. This article continues this line of research for the case where heuristic state evaluation functions are available. Three different approaches are considered, employing minimax in the rollout phase of MCTS, as a replacement for the rollout phase, and as a node prior to bias move selection. The latter two approaches are newly proposed. Results show that the use of enhanced minimax for computing node priors results in the strongest MCTS-minimax hybrid in the three test domains of Othello, Breakthrough, and Catch the Lion. This hybrid also outperforms enhanced minimax as a standalone player in Breakthrough, demonstrating that at least in this domain, MCTS and minimax can be combined to an algorithm stronger than its parts.

• #3932
High-Fidelity Simulated Players for Interactive Narrative Planning
Pengcheng Wang, Jonathan Rowe, Wookhee Min, Bradford Mott, James Lester
Game Playing and Search

Interactive narrative planning offers significant potential for creating adaptive gameplay experiences. While data-driven techniques have been devised that utilize player interaction data to induce policies for interactive narrative planners, they require enormously large gameplay datasets. A promising approach to addressing this challenge is creating simulated players whose behaviors closely approximate those of human players. In this paper, we propose a novel approach to generating high-fidelity simulated players based on deep recurrent highway networks and deep convolutional networks. Empirical results demonstrate that the proposed models significantly outperform the prior state-of-the-art in generating high-fidelity simulated player models that accurately imitate human players’ narrative interactions. Using the high-fidelity simulated player models, we show the advantage of more exploratory reinforcement learning methods for deriving generalizable narrative adaptation policies.

• #1130
Knowledge-Guided Agent-Tactic-Aware Learning for StarCraft Micromanagement
Yue Hu, Juntao Li, Xi Li, Gang Pan, Mingliang Xu
Game Playing and Search

As an important and challenging problem in artificial intelligence (AI) game playing, StarCraft micromanagement involves a dynamically adversarial game playing process with complex multi-agent control within a large action space. In this paper, we propose a novel knowledge-guided agent-tactic-aware learning scheme, that is, opponent-guided tactic learning (OGTL), to cope with this micromanagement problem. In principle, the proposed scheme takes a two-stage cascaded learning strategy which is capable of not only transferring the human tactic knowledge from the human-made opponent agents to our AI agents but also improving the adversarial ability. With the power of reinforcement learning, such a knowledge-guided agent-tactic-aware scheme has the ability to guide the AI agents to achieve high winning-rate performances while accelerating the policy exploration process in a tactic-interpretable fashion. Experimental results demonstrate the effectiveness of the proposed scheme against the state-of-the-art approaches in several benchmark combat scenarios.

### Monday 16, 10:15 - 11:15, NLP-GEN - Natural Language Generation (T2)

Chair: Rui Yan
• #572
SentiGAN: Generating Sentimental Texts via Mixture Adversarial Networks
Ke Wang, Xiaojun Wan
Natural Language Generation

Generating texts of different sentiment labels is getting more and more attention in the area of natural language generation. Recently, Generative Adversarial Net (GAN) has shown promising results in text generation. However, the texts generated by GAN usually suffer from the problems of poor quality, lack of diversity and mode collapse. In this paper, we propose a novel framework - SentiGAN, which has multiple generators and one multi-class discriminator, to address the above problems. In our framework, multiple generators are trained simultaneously, aiming at generating texts of different sentiment labels without supervision. We propose a penalty based objective in the generators to force each of them to generate diversified examples of a specific sentiment label. Moreover, the use of multiple generators and one multi-class discriminator can make each generator focus on generating its own examples of a specific sentiment label accurately. Experimental results on four datasets demonstrate that our model consistently outperforms several state-of-the-art text generation methods in the sentiment accuracy and quality of generated texts.

• #886
Generating Thematic Chinese Poetry using Conditional Variational Autoencoders with Hybrid Decoders
Xiaopeng Yang, Xiaowen Lin, Shunda Suo, Ming Li
Natural Language Generation

Computer poetry generation is our first step towards computer writing. Writing must have a theme. The current approaches of using sequence-to-sequence models with attention often produce non-thematic poems. We present a novel conditional variational autoencoder with a hybrid decoder adding the deconvolutional neural networks to the general recurrent neural networks to fully learn topic information via latent variables. This approach significantly improves the relevance of the generated poems by representing each line of the poem not only in a context-sensitive manner but also in a holistic way that is highly related to the given keyword and the learned topic. A proposed augmented word2vec model further improves the rhythm and symmetry. Tests show that the generated poems by our approach are mostly satisfying with regulated rules and consistent themes, and 73.42% of them receive an Overall score no less than 3 (the highest score is 5).

• #1708
Chinese Poetry Generation with a Working Memory Model
Xiaoyuan Yi, Maosong Sun, Ruoyu Li, Zonghan Yang
Natural Language Generation

As an exquisite and concise literary form, poetry is a gem of human culture. Automatic poetry generation is an essential step towards computer creativity. In recent years, several neural models have been designed for this task. However, among lines of a whole poem, the coherence in meaning and topics still remains a big challenge. In this paper, inspired by the theoretical concept in cognitive psychology, we propose a novel Working Memory model for poetry generation. Different from previous methods, our model explicitly maintains topics and informative limited history in a neural memory. During the generation process, our model reads the most relevant parts from memory slots to generate the current line. After each line is generated, it writes the most salient parts of the previous line into memory slots. By dynamic manipulation of the memory, our model keeps a coherent information flow and learns to express each topic flexibly and naturally. We experiment on three different genres of Chinese poetry: quatrain, iambic and chinoiserie lyric. Both automatic and human evaluation results show that our model outperforms current state-of-the-art methods.

• #2499
Topic-to-Essay Generation with Neural Networks
Xiaocheng Feng, Ming Liu, Jiahao Liu, Bing Qin, Yibo Sun, Ting Liu
Natural Language Generation

We focus on essay generation, which is a challenging task that generates a paragraph-level text with multiple topics. Progress towards understanding different topics and expressing diversity in this task requires more powerful generators and richer training and evaluation resources. To address this, we develop a multi-topic aware long short-term memory (MTA-LSTM) network. In this model, we maintain a novel multi-topic coverage vector, which learns the weight of each topic and is sequentially updated during the decoding process. Afterwards this vector is fed to an attention model to guide the generator. Moreover, we automatically construct two paragraph-level Chinese essay corpora, 305,000 essay paragraphs and 55,000 question-and-answer pairs. Empirical results show that our approach obtains much better BLEU score compared to various baselines. Furthermore, human judgment shows that MTA-LSTM has the ability to generate essays that are not only coherent but also closely related to the input topics.

• #2846
Toward Diverse Text Generation with Inverse Reinforcement Learning
Zhan Shi, Xinchi Chen, Xipeng Qiu, Xuanjing Huang
Natural Language Generation

Text generation is a crucial task in NLP. Recently, several adversarial generative models have been proposed to alleviate the exposure bias problem in text generation. Though these models have had great success, they still suffer from the problems of reward sparsity and mode collapse. In order to address these two problems, in this paper, we employ inverse reinforcement learning (IRL) for text generation. Specifically, the IRL framework learns a reward function on training data, and then an optimal policy to maximize the expected total reward. Similar to the adversarial models, the reward and policy function in IRL are optimized alternately. Our method has two advantages: (1) the reward function can produce denser reward signals; (2) the generation policy, trained by "entropy regularized" policy gradient, encourages the generation of more diversified texts. Experimental results demonstrate that our proposed method can generate higher quality texts than the previous methods.

### Monday 16, 10:15 - 11:15, CV-UNS - Computer Vision and Unsupervised Learning (T1)

Chair: Mohamed Amer
• #1559
Co-attention CNNs for Unsupervised Object Co-segmentation
Kuang-Jui Hsu, Yen-Yu Lin, Yung-Yu Chuang
Computer Vision and Unsupervised Learning

Object co-segmentation aims to segment the common objects in images. This paper presents a CNN-based method that is unsupervised and end-to-end trainable to better solve this task. Our method is unsupervised in the sense that it does not require any training data in the form of object masks but merely a set of images jointly covering objects of a specific class. Our method comprises two collaborative CNN modules, a feature extractor and a co-attention map generator. The former module extracts the features of the estimated objects and backgrounds, and is derived based on the proposed co-attention loss, which minimizes inter-image object discrepancy while maximizing intra-image figure-ground separation. The latter module is learned to generate co-attention maps by which the estimated figure-ground segmentation can better fit the former module. Besides the co-attention loss, a mask loss is developed to retain the whole objects and remove noise. Experiments show that our method achieves superior results, even outperforming the state-of-the-art supervised methods.

• #970
Complementary Binary Quantization for Joint Multiple Indexing
Qiang Fu, Xu Han, Xianglong Liu, Jingkuan Song, Cheng Deng
Computer Vision and Unsupervised Learning

Building multiple hash tables has been proven a successful technique for indexing massive databases, which can guarantee a desired level of overall performance. However, existing hash-based multi-indexing methods suffer from heavy redundancy, lacking strong table complementarity and effective hash code learning. To address these problems, this paper proposes a complementary binary quantization (CBQ) method to jointly learn multiple hash tables. It exploits the power of incomplete binary coding based on prototypes to align the original space and the Hamming space, and further utilizes the nature of multi-indexing search to jointly reduce the quantization loss based on the prototype-based hash function. Our alternating optimization adaptively discovers the complementary prototype sets and the corresponding code sets of a varying size in an efficient way, which together robustly approximate the data relations. Our method can be naturally generalized to the product space for long hash codes. Extensive experiments carried out on two popular large-scale tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed CBQ method enjoys strong table complementarity and significantly outperforms the state-of-the-art, with relative performance gains of up to 57.76%.

• #931
Cascaded Low Rank and Sparse Representation on Grassmann Manifolds
Boyue Wang, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin
Computer Vision and Unsupervised Learning

Inspired by the success of low rank representation and sparse subspace clustering, researchers have attempted to simultaneously impose low rank and sparse constraints on the affinity matrix to improve performance. However, this is just a trade-off between these two constraints. In this paper, we propose a novel Cascaded Low Rank and Sparse Representation (CLRSR) method for subspace clustering, which seeks the sparse expression on the previously learned low rank latent representation. To make our proposed method suitable for multi-dimension or imageset data, we extend CLRSR onto Grassmann manifolds. An effective solution and its convergence analysis are also provided. The experimental results demonstrate that the proposed method is more robust than other state-of-the-art clustering methods on imageset data.

• #686
Unpaired Multi-Domain Image Generation via Regularized Conditional GANs
Xudong Mao, Qing Li
Computer Vision and Unsupervised Learning

In this paper, we study the problem of multi-domain image generation, the goal of which is to generate pairs of corresponding images from different domains. With the recent development in generative models, image generation has achieved great progress and has been applied to various computer vision tasks. However, multi-domain image generation may not achieve the desired performance due to the difficulty of learning the correspondence of different domain images, especially when the information of paired samples is not given. To tackle this problem, we propose Regularized Conditional GAN (RegCGAN), which is capable of learning to generate corresponding images in the absence of paired training data. RegCGAN is based on the conditional GAN, and we introduce two regularizers to guide the model to learn the corresponding semantics of different domains. We evaluate the proposed model on several tasks for which paired training data is not given, including the generation of edges and photos, the generation of faces with different attributes, etc. The experimental results show that our model can successfully generate corresponding images for all these tasks, while outperforming the baseline methods. We also introduce an approach of applying RegCGAN to unsupervised domain adaptation.

• #287
Self-Representative Manifold Concept Factorization with Adaptive Neighbors for Clustering
Sihan Ma, Lefei Zhang, Wenbin Hu, Yipeng Zhang, Jia Wu, Xuelong Li
Computer Vision and Unsupervised Learning

Matrix Factorization based methods, e.g., the Concept Factorization (CF) and Nonnegative Matrix Factorization (NMF), have been proved to be efficient and effective for data clustering tasks. In recent years, various graph extensions of CF and NMF have been proposed to explore intrinsic geometrical structure of data for the purpose of better clustering performance. However, many methods build the affinity matrix used in the manifold structure directly based on the input data. Therefore, the clustering results are highly sensitive to the input data. To further improve the clustering performance, we propose a novel manifold concept factorization model with adaptive neighbor structure to learn a better affinity matrix and clustering indicator matrix at the same time. Technically, the proposed model constructs the affinity matrix by assigning the adaptive and optimal neighbors to each point based on the local distance of the learned new representation of the original data with itself as a dictionary. Our experimental results present superior performance over the state-of-the-art alternatives on numerous datasets.

### Monday 16 10:15 - 11:15 UAI-GPI - Graphical Models, Probabilistic Inference (K11)

Chair: Manfred Jaeger
• #3791
Efficient Localized Inference for Large Graphical Models
Jinglin Chen, Jian Peng, Qiang Liu
Graphical Models, Probabilistic Inference

We propose a new localized inference algorithm for answering marginalization queries in large graphical models with the correlation decay property. Given a query variable and a large graphical model, we define a much smaller model in a local region around the query variable in the target model so that the marginal distribution of the query variable can be accurately approximated. We introduce two approximation error bounds based on Dobrushin's comparison theorem and apply our bounds to derive a greedy expansion algorithm that efficiently guides the selection of neighbor nodes for localized inference. We verify our theoretical bounds on various datasets and demonstrate that our localized inference algorithm can provide fast and accurate approximation for large graphical models.

• #1119
Parameterised Queries and Lifted Query Answering
Tanya Braun, Ralf Möller
Graphical Models, Probabilistic Inference

A standard approach for inference in probabilistic formalisms with first-order constructs is lifted variable elimination (LVE) for single queries. To handle multiple queries efficiently, the lifted junction tree algorithm (LJT) employs a first-order cluster representation of a model and LVE as a subroutine. Both algorithms answer conjunctive queries of propositional random variables, shattering the model on the query, which causes unnecessary groundings for conjunctive queries of interchangeable variables. This paper presents parameterised queries as a means to avoid groundings, applying the lifting idea to queries. Parameterised queries enable LVE and LJT to compute answers faster, while compactly representing queries and answers.

• #1097
Lifted Filtering via Exchangeable Decomposition
Stefan Lüdtke, Max Schröder, Sebastian Bader, Kristian Kersting, Thomas Kirste
Graphical Models, Probabilistic Inference

We present a model for exact recursive Bayesian filtering based on lifted multiset states. Combining multisets with lifting makes it possible to simultaneously exploit multiple strategies for reducing inference complexity when compared to list-based grounded state representations. The core idea is to borrow the concept of Maximally Parallel Multiset Rewriting Systems and to enhance it by concepts from Rao-Blackwellization and Lifted Inference, giving a representation of state distributions that enables efficient inference. In worlds where the random variables that define the system state are exchangeable -- where the identity of entities does not matter -- it automatically uses a representation that abstracts from ordering (achieving an exponential reduction in complexity) -- and it automatically adapts when observations or system dynamics destroy exchangeability by breaking symmetry.

• #3864
Efficient Symbolic Integration for Probabilistic Inference
Samuel Kolb, Martin Mladenov, Scott Sanner, Vaishak Belle, Kristian Kersting
Graphical Models, Probabilistic Inference

Weighted model integration (WMI) extends weighted model counting (WMC) to the integration of functions over mixed discrete-continuous probability spaces. It has shown tremendous promise for solving inference problems in graphical models and probabilistic programs. Yet, state-of-the-art tools for WMI are generally limited either by the range of amenable theories, or in terms of performance. To address both limitations, we propose the use of extended algebraic decision diagrams (XADDs) as a compilation language for WMI. Aside from tackling typical WMI problems, XADDs also enable partial WMI yielding parametrized solutions. To overcome the main roadblock of XADDs -- the computational cost of integration -- we formulate a novel and powerful exact symbolic dynamic programming (SDP) algorithm that seamlessly handles Boolean, integer-valued and real variables, and is able to effectively cache partial computations, unlike its predecessor. Our empirical results demonstrate that these contributions can lead to a significant computational reduction over existing probabilistic inference algorithms.
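As background for the abstract above, weighted model counting (WMC), which WMI extends, can be illustrated with a brute-force sketch. This is not the XADD-based method of the paper: the function name `wmc`, the dictionary-based weight table, and the example formula are all assumptions made for illustration, and enumeration like this is exponential in the number of variables.

```python
from itertools import product

def wmc(variables, formula, weight):
    """Brute-force weighted model count.

    variables: list of Boolean variable names
    formula:   callable taking {name: bool} and returning True for models
    weight:    {(name, value): weight of that literal}  (hypothetical layout)
    """
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            # The weight of a model is the product of its literal weights.
            w = 1.0
            for var, val in assignment.items():
                w *= weight[(var, val)]
            total += w
    return total

# Example: the weighted count of (a OR b) with independent literal weights.
weights = {("a", True): 0.3, ("a", False): 0.7,
           ("b", True): 0.6, ("b", False): 0.4}
result = wmc(["a", "b"], lambda m: m["a"] or m["b"], weights)
```

WMI replaces the sum over Boolean assignments with an integral over the continuous variables, which is where symbolic integration becomes the bottleneck.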

• #3464
Metadata-dependent Infinite Poisson Factorization for Efficiently Modelling Sparse and Large Matrices in Recommendation
Trong Dinh Thac Do, Longbing Cao
Graphical Models, Probabilistic Inference

Matrix Factorization (MF) is widely used in Recommender Systems (RSs) for estimating missing ratings in the rating matrix. MF faces major challenges in handling very sparse and large data. Poisson Factorization (PF), an MF variant, addresses these challenges with high efficiency by computing only on the non-missing elements. However, ignoring the missing elements in computation makes PF weak at, or incapable of, dealing with columns or rows with very few observations (corresponding to sparse items or users). In this work, Metadata-dependent Poisson Factorization (MPF) is proposed to address the user/item sparsity by integrating user/item metadata into PF. MPF adds the metadata-based observed entries to the factorized PF matrices. In addition, similar to MF, choosing a suitable number of latent components for PF is very expensive on very large datasets. Accordingly, we further extend MPF to Metadata-dependent Infinite Poisson Factorization (MIPF), which integrates a Bayesian Nonparametric (BNP) technique to automatically tune the number of latent components. Our empirical results show that, by integrating metadata, MPF/MIPF significantly outperform the state-of-the-art PF models for sparse and large datasets. MIPF also effectively estimates the number of latent components.

### Monday 16 10:15 - 11:15 ROB-ROB - Robotics (C2)

Chair: Danica Kragic
• #453
Interactive Robot Transition Repair With SMT
Jarrett Holtz, Arjun Guha, Joydeep Biswas
Robotics

Complex robot behaviors are often structured as state machines, where states encapsulate actions and a transition function switches between states. Since transitions depend on physical parameters, when the environment changes, a roboticist has to painstakingly readjust the parameters to work in the new environment. We present interactive SMT-based Robot Transition Repair (SRTR): instead of manually adjusting parameters, we ask the roboticist to identify a few instances where the robot is in a wrong state and what the right state should be. An automated analysis of the transition function 1) identifies adjustable parameters, 2) converts the transition function into a system of logical constraints, and 3) formulates the constraints and user-supplied corrections as a MaxSMT problem that yields new parameter values. We show that SRTR finds new parameters 1) quickly, 2) with few corrections, and 3) that the parameters generalize to new scenarios. We also show that an SRTR-corrected state machine can outperform a more complex, expert-tuned state machine.

• #1446
Learning Unmanned Aerial Vehicle Control for Autonomous Target Following
Siyi Li, Tianbo Liu, Chi Zhang, Dit-Yan Yeung, Shaojie Shen
Robotics

While deep reinforcement learning (RL) methods have achieved unprecedented successes in a range of challenging problems, their applicability has been mainly limited to simulation or game domains due to the high sample complexity of the trial-and-error learning process. However, real-world robotic applications often need a data-efficient learning process with safety-critical constraints. In this paper, we consider the challenging problem of learning unmanned aerial vehicle (UAV) control for tracking a moving target. To acquire a strategy that combines perception and control, we represent the policy by a convolutional neural network. We develop a hierarchical approach that combines a model-free policy gradient method with a conventional feedback proportional-integral-derivative (PID) controller to enable stable learning without catastrophic failure. The neural network is trained by a combination of supervised learning from raw images and reinforcement learning from games of self-play. We show that the proposed approach can learn a target following policy in a simulator efficiently and the learned behavior can be successfully transferred to the DJI quadrotor platform for real-world UAV control.
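The conventional PID feedback loop mentioned in the abstract above can be sketched as follows. This is a generic discrete-time PID controller, not the authors' implementation: the class name `PID`, the gains, and the time step are illustrative assumptions, and how the learned policy supplies setpoints to the controller is not specified here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PID:
    """Discrete-time PID controller (gains are hypothetical, untuned values)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term with a simple rectangle rule.
        self.integral += error * dt
        # No derivative contribution on the first call (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Two control steps with a shrinking tracking error.
pid = PID(kp=2.0, ki=0.5, kd=0.1)
u1 = pid.step(error=1.0, dt=0.1)
u2 = pid.step(error=0.5, dt=0.1)
```

In a hierarchical design of this kind, a low-level loop like this keeps the vehicle stable while a higher-level component decides what to track.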

• #3816
Online, Interactive User Guidance for High-dimensional, Constrained Motion Planning
Fahad Islam, Oren Salzman, Maxim Likhachev
Robotics

We consider the problem of planning a collision-free path for a high-dimensional robot. Specifically, we suggest a planning framework where a motion-planning algorithm can obtain guidance from a user. In contrast to existing approaches that try to speed up planning by incorporating experiences or demonstrations ahead of planning, we suggest seeking user guidance only when the planner identifies that it has ceased to make significant progress towards the goal. Guidance is provided in the form of an intermediate configuration q̂, which is used to bias the planner to go through q̂. We demonstrate our approach for the case where the planning algorithm is Multi-Heuristic A* (MHA*) and the robot is a 34-DOF humanoid. We show that our approach allows computing highly-constrained paths with little domain knowledge. Without our approach, solving such problems requires carefully-crafted domain-dependent heuristics.

• #5115
(Sister Conferences Best Papers Track) A Unifying View of Geometry, Semantics, and Data Association in SLAM
Nikolay Atanasov, Sean L. Bowman, Kostas Daniilidis, George J. Pappas
Robotics

Traditional approaches for simultaneous localization and mapping (SLAM) rely on geometric features such as points, lines, and planes to infer the environment structure. They make hard decisions about the (data) association between observed features and mapped landmarks to update the environment model. This paper makes two contributions to the state of the art in SLAM. First, it generalizes the purely geometric model by introducing semantically meaningful objects, represented as structured models of mid-level part features. Second, instead of making hard, potentially wrong associations between semantic features and objects, it shows that SLAM inference can be performed efficiently with probabilistic data association. The approach not only allows building meaningful maps (containing doors, chairs, cars, etc.) but also offers significant advantages in ambiguous environments.

• #912
Learning Transferable UAV for Forest Visual Perception
Lyujie Chen, Wufan Wang, Jihong Zhu
Robotics

In this paper, we propose a new pipeline for training a monocular UAV to fly a collision-free trajectory along a dense forest trail. As gathering high-precision images in the real world is expensive and the off-the-shelf dataset has some deficiencies, we collect a new dense forest trail dataset in a variety of simulated environments in Unreal Engine. Then we formulate visual perception of forests as a classification problem. A ResNet-18 model is trained to decide the moving direction frame by frame. To transfer the learned strategy to the real world, we construct a ResNet-18 adaptation model via multi-kernel maximum mean discrepancies to leverage the relevant labelled data and alleviate the discrepancy between simulated and real environments. Simulation and real-world flights with a variety of appearance and environment changes are both tested. The ResNet-18 adaptation and its variant model achieve the best result of 84.08% accuracy in reality.

### Monday 16 10:15 - 11:15 ML-KER - Kernel Methods (C3)

Chair: Xinwang Liu
• #1209
Fast Factorization-free Kernel Learning for Unlabeled Chunk Data Streams
Yi Wang, Nan Xue, Xin Fan, Jiebo Luo, Risheng Liu, Bin Chen, Haojie Li, Zhongxuan Luo
Kernel Methods

Data stream analysis aims at extracting discriminative information for classification from continuously incoming samples. It is extremely challenging to detect novel data while updating the model in an efficient and stable fashion, especially for chunk data. This paper proposes a fast factorization-free kernel learning method to unify novelty detection and incremental learning for unlabeled chunk data streams in one framework. The proposed method constructs a joint reproducing kernel Hilbert space from known class centers by solving a linear system in kernel space. Naturally, unlabeled data can be detected and classified among multiple classes by a single decision model. Projecting samples into the discriminative feature space turns out to be the product of two small-sized kernel matrices, without time-consuming factorizations such as QR decomposition or singular value decomposition. Moreover, the insertion of a novel class can be treated as the addition of a new orthogonal basis to the existing feature space, resulting in fast and stable updating schemes. Both theoretical analysis and experimental validation on real-world datasets demonstrate that the proposed methods learn chunk data streams with significantly lower computational costs than the state of the art, and with comparable or superior accuracy.

• #3107
A Property Testing Framework for the Theoretical Expressivity of Graph Kernels
Nils M. Kriege, Christopher Morris, Anja Rey, Christian Sohler
Kernel Methods

Graph kernels are applied heavily for the classification of structured data. However, their expressivity is assessed almost exclusively from experimental studies and there is no theoretical justification why one kernel is in general preferable over another. We introduce a theoretical framework for investigating the expressive power of graph kernels, which is inspired by concepts from the area of property testing. We introduce the notion of distinguishability of a graph property by a graph kernel. For several established graph kernels we show that they cannot distinguish essential graph properties. In order to overcome this, we consider a kernel based on k-disc frequencies. We show that this efficiently computable kernel can distinguish fundamental graph properties. Finally, we obtain learning guarantees for nearest neighbor classifiers in our framework.

• #3781
A Degeneracy Framework for Graph Similarity
Giannis Nikolentzos, Polykarpos Meladianos, Stratis Limnios, Michalis Vazirgiannis
Kernel Methods

The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Most existing methods for graph similarity focus either on local or on global properties of graphs. However, even if graphs seem very similar from a local or a global perspective, they may exhibit different structure at different scales. In this paper, we present a general framework for graph similarity which takes into account structure at multiple different scales. The proposed framework capitalizes on the well-known k-core decomposition of graphs in order to build a hierarchy of nested subgraphs. We apply the framework to derive variants of four graph kernels, namely graphlet kernel, shortest-path kernel, Weisfeiler-Lehman subtree kernel, and pyramid match graph kernel. The framework is not limited to graph kernels, but can be applied to any graph comparison algorithm. The proposed framework is evaluated on several benchmark datasets for graph classification. In most cases, the core-based kernels achieve significant improvements in terms of classification accuracy over the base kernels, while their time complexity remains very attractive.
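The k-core decomposition that the framework above builds on can be sketched with a simple peeling procedure: repeatedly remove a node of minimum remaining degree, and the k-core is the set of nodes whose core number is at least k. This is a minimal, quadratic-time illustration rather than the authors' code; the function names `core_numbers` and `k_core` and the edge-list input format are assumptions.

```python
from collections import defaultdict

def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    edges is an iterable of (u, v) pairs; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a node of minimum remaining degree.
        n = min(remaining, key=degree.__getitem__)
        # Core numbers never decrease along the peeling order.
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core

def k_core(edges, k):
    """Nodes of the k-core: all nodes with core number >= k."""
    return {n for n, c in core_numbers(edges).items() if c >= k}

# A triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
cores = core_numbers(edges)
```

Because the 2-core here is contained in the 1-core, the cores for increasing k form the kind of nested subgraph hierarchy the abstract refers to; a base kernel could then be evaluated on each level.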

• #575
Fast Cross-Validation
Yong Liu, Hailun Lin, Lizhong Ding, Weiping Wang, Shizhong Liao
Kernel Methods

Cross-validation (CV) is the most widely adopted approach for selecting the optimal model. However, the computation of CV has high complexity because the learner must be trained multiple times, making it impractical for large-scale model selection. In this paper, we present an approximate approach to CV based on the theoretical notion of the Bouligand influence function (BIF) and the Nyström method for kernel methods. We first establish the relationship between the theoretical notion of BIF and CV, and propose a method to approximate CV via the Taylor expansion of BIF. Then, we provide a novel computing method to calculate the BIF for a general distribution, and evaluate the BIF for the sample distribution. Finally, we use the Nyström method to accelerate the computation of the BIF matrix to obtain the final approximate CV criterion. The proposed approximate CV requires training only once and is suitable for a wide variety of kernel methods. Experimental results on many datasets show that our approximate CV has no statistical discrepancy with the original CV, but can significantly improve the efficiency.

• #2383
Beyond Similar and Dissimilar Relations: A Kernel Regression Formulation for Metric Learning
Pengfei Zhu, Ren Qi, Qinghua Hu, Qilong Wang, Changqing Zhang, Liu Yang
Kernel Methods

Most existing metric learning methods focus on learning a similarity or distance measure relying on similar and dissimilar relations between sample pairs. However, pairs of samples cannot be simply identified as similar or dissimilar in many real-world applications, e.g., multi-label learning, label distribution learning or tasks with continuous decision values. To this end, in this paper we propose a novel relation alignment metric learning (RAML) formulation to handle the metric learning problem in those scenarios. Since the relation of two samples can be measured by the degree of difference between their decision values, and motivated by the consistency of the sample relations in the feature space and decision space, our proposed RAML utilizes the sample relations in the decision space to guide the metric learning in the feature space. Specifically, our RAML method formulates metric learning as a kernel regression problem, which can be efficiently optimized by standard regression solvers. We carry out several experiments on single-label classification, multi-label classification, and label distribution learning tasks to demonstrate that our method achieves favorable performance against the state-of-the-art methods.

### Monday 16 11:25 - 12:40 EAR1 - EARLY CAREER 1 (VICTORIA)

Chair: Subbarao Kambhampati
• #5449
Towards Human-Engaged AI
Xiaojuan Ma
EARLY CAREER 1

Engagement, the key construct that describes the synergy between human (users) and technology (computing systems), is gaining increasing attention in academia and industry. Human-Engaged AI (HEAI) is an emerging research paradigm that aims to jointly advance the capability and capacity of human and AI technology. In this paper, we first review the key concepts in HEAI and its driving force from the integration of Artificial Intelligence (AI) and Human-Computer Interaction (HCI). Then we present an HEAI framework developed from our own work.

• #5492
Probabilistic Machine Learning: Models, Algorithms and a Programming Library
Jun Zhu
EARLY CAREER 1

Probabilistic machine learning provides a suite of powerful tools for modeling uncertainty, performing probabilistic inference, and making predictions or decisions in uncertain environments. In this paper, we present an overview of our recent work on probabilistic machine learning, including the theory of regularized Bayesian inference, Bayesian deep learning, scalable inference algorithms, a probabilistic programming library named ZhuSuan, and applications in representation learning as well as learning from crowds.

• #5480
Decision-Making Under Uncertainty in Multi-Agent and Multi-Robot Systems: Planning and Learning
Christopher Amato
EARLY CAREER 1

Multi-agent planning and learning methods are becoming increasingly important in today's interconnected world. Methods for real-world domains, such as robotics, must consider uncertainty and limited communication in order to generate high-quality, robust solutions. This paper discusses our work on developing principled models to represent these problems and planning and learning methods that can scale to realistic multi-agent and multi-robot tasks.

### Monday 16 11:25 - 12:50 KR-MAS1 - Knowledge Representation and Agents: Games, Decision, Social Choice (C7)

Chair: Takayuki Ito
• #1047
Ceteris paribus majority for social ranking
Adrian Haret, Hossein Khani, Stefano Moretti, Meltem Öztürk
Knowledge Representation and Agents: Games, Decision, Social Choice

We study the problem of finding a social ranking over individuals given a ranking over coalitions formed by them. We investigate the use of a ceteris paribus majority principle as a social ranking solution inspired from the classical axioms of social choice theory. Faced with a Condorcet-like paradox, we analyze the consequences of restricting the domain according to an adapted version of single-peakedness. We conclude with a discussion on different interpretations of incompleteness of the ranking over coalitions and its exploitation for defining new social rankings, providing a new rule as an example.

• #2116
An Efficient Algorithm To Compute Distance Between Lexicographic Preference Trees
Minyi Li, Borhan Kazimipour
Knowledge Representation and Agents: Games, Decision, Social Choice

Very often, we have to look into multiple agents' preferences, and compare or aggregate them. In this paper, we consider the well-known model, namely, lexicographic preference trees (LP-trees), for representing agents' preferences in combinatorial domains. We tackle the problem of calculating the dissimilarity/distance between agents' LP-trees. We propose an algorithm LpDis to compute the number of disagreed pairwise preferences between agents by traversing their LP-trees. The proposed algorithm is computationally efficient and allows agents to have different attribute importance structures and preference dependencies.

• #2829
Game Description Language and Dynamic Epistemic Logic Compared
Thorsten Engesser, Robert Mattmüller, Bernhard Nebel, Michael Thielscher
Knowledge Representation and Agents: Games, Decision, Social Choice

Several different frameworks have been proposed to model and reason about knowledge in dynamic multi-agent settings, among them the logic-programming-based game description language GDL-III, and dynamic epistemic logic (DEL), based on possible-worlds semantics. GDL-III and DEL have complementary strengths and weaknesses in terms of ease of modeling and simplicity of semantics. In this paper, we formally study the expressiveness of GDL-III vs. DEL. We clarify the commonalities and differences between those languages, demonstrate how to bridge the differences where possible, and identify large fragments of GDL-III and DEL that are equivalent in the sense that they can be used to encode games or planning tasks that admit the same legal action sequences. We prove the latter by providing compilations between those fragments of GDL-III and DEL.

• #3489
Goal-Based Collective Decisions: Axiomatics and Computational Complexity
Arianna Novaro, Umberto Grandi, Dominique Longin, Emiliano Lorini
Knowledge Representation and Agents: Games, Decision, Social Choice

We study agents expressing propositional goals over a set of binary issues to reach a collective decision. We adapt properties and rules from the literature on Social Choice Theory to our setting, providing an axiomatic characterisation of a majority rule for goal-based voting. We study the computational complexity of finding the outcome of our rules (i.e., winner determination), showing that it ranges from Nondeterministic Polynomial Time (NP) to Probabilistic Polynomial Time (PP).

• #3717
Accountable Approval Sorting
Khaled Belahcene, Yann Chevaleyre, Christophe Labreuche, Nicolas Maudet, Vincent Mousseau, Wassila Ouerdane
Knowledge Representation and Agents: Games, Decision, Social Choice

We consider decision situations in which a set of points of view (voters, criteria) are to sort a set of candidates into ordered categories (Good/Bad). Candidates are judged good when approved by a sufficient set of points of view; this corresponds to NonCompensatory Sorting. To be accountable, such approval sorting should provide guarantees about the decision process and decisions concerning specific candidates. We formalize accountability using a feasibility problem expressed as a Boolean satisfiability formulation. We illustrate different forms of accountability when a committee decides with approval sorting and study the information that should be disclosed by the committee.

• #5467
(Journal track) Impossibility in Belief Merging
Amilcar Mata Diaz, Ramon Pino Perez
Knowledge Representation and Agents: Games, Decision, Social Choice

With the aim of studying social properties of belief merging and having a better understanding of impossibility, we extend in three ways the framework of logic-based merging introduced by Konieczny and Pino Perez. First, at the level of representation of the information, we pass from belief bases to complex epistemic states. Second, the profiles are represented as functions from finite societies to the set of epistemic states (a sort of vectors) and not as multisets of epistemic states. Third, we extend the set of rational postulates in order to consider the epistemic versions of the classical postulates of social choice theory: standard domain, Pareto property, independence of irrelevant alternatives and absence of dictator. These epistemic versions of social postulates are given, essentially, in terms of the finite propositional logic. We state some representation theorems for these operators. These extensions and representation theorems allow us to establish an epistemic and very general version of Arrow's impossibility theorem. One of the interesting features of our result is that it holds for different representations of epistemic states; for instance conditionals, ordinal conditional functions and, of course, total preorders.

• #1251
A Savage-style Utility Theory for Belief Functions
Chunlai Zhou, Biao Qin, Xiaoyong Du
Knowledge Representation and Agents: Games, Decision, Social Choice

In this paper, we provide an axiomatic justification for decision making with belief functions by studying the belief-function counterpart of Savage's Theorem where the state space is finite and the consequence set is a continuum [l, M] (l<M). We propose six axioms for a preference relation over acts, and then show that this axiomatization admits a definition of qualitative belief functions comparing preferences over events that guarantees the existence of a belief function on the state space. The key axioms are uniformity and an analogue of the independence axiom. The uniformity axiom is used to ensure that all acts with the same maximal and minimal consequences must be equivalent. And our independence axiom shows the existence of a utility function and implies the uniqueness of the belief function on the state space. Moreover, we prove without the independence axiom the neutrality theorem that two acts are indifferent whenever they generate the same belief functions over consequences. At the end of the paper, we compare our approach with other related decision theories for belief functions.

### Monday 16 11:25 - 12:50 PS-SEA - Planning and Search (K2)

Chair: Felipe Meneguzzi
• #223
Analyzing Tie-Breaking Strategies for the A* Algorithm
Augusto B. Corrêa, André G. Pereira, Marcus Ritt
Planning and Search

For a given state space and admissible heuristic function h there is always a tie-breaking strategy for which A* expands the minimum number of states [Dechter and Pearl, 1985]. We say that these strategies have optimal expansion. Although such a strategy always exists, it may depend on the instance, and we currently do not know a tie-breaker that always guarantees optimal expansion. In this paper, we study tie-breaking strategies for A*. We analyze common strategies from the literature and prove that they do not have optimal expansion. We propose a novel tie-breaking strategy using cost adaptation that always has optimal expansion. We experimentally analyze the performance of A* using several tie-breaking strategies on domains from the IPC and zero-cost domains. Our best strategy solves significantly more instances than the standard method in the literature and more than the previous state-of-the-art strategy. Our analysis improves the understanding of how to develop effective tie-breaking strategies and our results also improve the state of the art of tie-breaking strategies for A*.
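To make the role of tie-breaking concrete, here is a minimal A* sketch in which states with equal f-values are ordered either by lower h or purely by insertion order. It illustrates a commonly used tie-breaker only, not the cost-adaptation strategy proposed in the paper; the function `astar`, the `tie_break` option names, and the grid example are assumptions made for illustration.

```python
import heapq
from itertools import count

def astar(start, goal, neighbors, h, tie_break="low_h"):
    """A* search returning (solution cost, number of expanded states).

    neighbors(s) yields (successor, edge_cost); h is assumed admissible.
    tie_break: "low_h" prefers smaller h among equal f; "fifo" uses
    insertion order only. Both option names are hypothetical.
    """
    counter = count()

    def key(g, s):
        hs = h(s)
        secondary = hs if tie_break == "low_h" else 0
        # The unique counter keeps heap comparisons away from the states.
        return (g + hs, secondary, next(counter))

    open_list = [(key(0, start), 0, start)]
    best_g = {start: 0}
    expansions = 0
    while open_list:
        _, g, s = heapq.heappop(open_list)
        if g > best_g.get(s, float("inf")):
            continue  # stale entry: a cheaper path to s was found later
        if s == goal:
            return g, expansions
        expansions += 1
        for t, c in neighbors(s):
            ng = g + c
            if ng < best_g.get(t, float("inf")):
                best_g[t] = ng
                heapq.heappush(open_list, (key(ng, t), ng, t))
    return None, expansions

# Empty 5x5 grid with unit costs and the Manhattan heuristic: every state
# has the same f-value, so the tie-breaker alone decides the expansion order.
def grid_neighbors(s):
    x, y = s
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < 5 and 0 <= ny < 5:
            yield (nx, ny), 1

def manhattan(s):
    return abs(4 - s[0]) + abs(4 - s[1])

cost_low, exp_low = astar((0, 0), (4, 4), grid_neighbors, manhattan, "low_h")
cost_fifo, exp_fifo = astar((0, 0), (4, 4), grid_neighbors, manhattan, "fifo")
```

Both runs return the same optimal cost, but the low-h tie-breaker expands fewer states on this grid because it commits to states closer to the goal.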

• #2009
Meta-Level Control of Anytime Algorithms with Online Performance Prediction
Justin Svegliato, Kyle Hollins Wray, Shlomo Zilberstein
Planning and Search

Anytime algorithms enable intelligent systems to trade computation time with solution quality. To exploit this crucial ability in real-time decision-making, the system must decide when to interrupt the anytime algorithm and act on the current solution. Existing meta-level control techniques, however, address this problem by relying on significant offline work that diminishes their practical utility and accuracy. We formally introduce an online performance prediction framework that enables meta-level control to adapt to each instance of a problem without any preprocessing. Using this framework, we then present a meta-level control technique and two stopping conditions. Finally, we show that our approach outperforms existing techniques that require substantial offline work. The result is efficient nonmyopic meta-level control that reduces the overhead and increases the benefits of using anytime algorithms in intelligent systems.

• #3028
Effect-Abstraction Based Relaxation for Linear Numeric Planning
Dongxu Li, Enrico Scala, Patrik Haslum, Sergiy Bogomolov
Planning and Search

This paper studies an effect-abstraction based relaxation for reasoning about linear numeric planning problems. The effect-abstraction decomposes non-constant linear numeric effects into actions with conditional effects over additive constant numeric effects. With little effort, on this compiled version, it is possible to use known subgoaling based relaxations and related heuristics. The combination of these two steps leads to a novel relaxation based heuristic. Theoretically, the relaxation is proved to be tighter than the previous interval based relaxation and to lead to safe-pruning heuristics. Empirically, a heuristic developed on this relaxation leads to substantial improvements for a class of problems that are currently out of the reach of state-of-the-art numeric planners.

• #3241
Unchaining the Power of Partial Delete Relaxation, Part II: Finding Plans with Red-Black State Space Search
Maximilian Fickert, Daniel Gnad, Joerg Hoffmann
Planning and Search

Red-black relaxation in classical planning allows interpolation between delete-relaxed and real planning. Yet the traditional use of relaxations to generate heuristics restricts relaxation usage to tractable fragments. How to actually tap into the red-black relaxation's interpolation power? Prior work has devised red-black state space search (RBS) for intractable red-black planning, and has explored two uses: proving unsolvability and generating seed plans for plan repair. Here, we explore the generation of plans directly through RBS. We design two enhancements to this end: (A) use a known tractable fragment where possible, use RBS for the intractable parts; (B) check RBS state transitions for realizability, spawn relaxation refinements where the check fails. We show the potential merits of both techniques on IPC benchmarks.

• #3388
Local Minima, Heavy Tails, and Search Effort for GBFS
Eldan Cohen, J. Christopher Beck
Planning and Search

Problem difficulty for greedy best first search (GBFS) is not entirely understood, though existing work points to deep local minima and poor correlation between the h-values and the distance to goal as factors that have a significant negative effect on the search effort. In this work, we show that there is a very strong exponential correlation between the depth of the single deepest local minimum encountered in a search and the overall search effort. Furthermore, we find that the distribution of local minima depth changes dramatically based on the constrainedness of problems, suggesting an explanation for the previously observed heavy-tailed behavior in GBFS. In combinatorial search, a similar result led to the use of randomized restarts to escape deep subtrees with no solution and corresponding significant speed-ups. We adapt this method and propose a randomized restarting GBFS variant that improves GBFS performance by escaping deep local minima, and does so even in the presence of other, randomization-based, search enhancements.

• #3552
LP Heuristics over Conjunctions: Compilation, Convergence, Nogood Learning
Marcel Steinmetz, Joerg Hoffmann
Planning and Search

Two strands of research in classical planning are LP heuristics and conjunctions to improve approximations. Combinations of the two have also been explored. Here, we focus on convergence properties, forcing the LP heuristic to equal the perfect heuristic h* in the limit. We show that, under reasonable assumptions, partial variable merges are strictly dominated by the compilation Pi^C of explicit conjunctions, and that both render the state equation heuristic equal to h* for a suitable set C of conjunctions. We show that consistent potential heuristics can be computed from a variant of Pi^C, and that such heuristics can represent h* for suitable C. As an application of these convergence properties, we consider sound nogood learning in state space search, via refining the set C. We design a suitable refinement method to this end. Experiments on IPC benchmarks show significant performance improvements in several domains.

• #4268
William Vega-Brown, Nicholas Roy
Planning and Search

We define an admissibility condition for abstractions expressed using angelic semantics and show that these conditions allow us to accelerate planning while preserving the ability to find the optimal motion plan. We then derive admissible abstractions for two motion planning domains with continuous state. We extract upper and lower bounds on the cost of concrete motion plans using local metric and topological properties of the problem domain. These bounds guide the search for a plan while maintaining performance guarantees. We show that abstraction can dramatically reduce the complexity of search relative to a direct motion planner. Using our abstractions, we find near-optimal motion plans in planning problems involving 10^13 states without using a separate task planner.

### Monday 16 11:25 - 12:50 NLP-CV1 - Language and Vision (T2)

Chair: Jiajun Zhang
• #66
Dilated Convolutional Network with Iterative Optimization for Continuous Sign Language Recognition
Junfu Pu, Wengang Zhou, Houqiang Li
Language and Vision

This paper presents a novel deep neural architecture with an iterative optimization strategy for real-world continuous sign language recognition. Generally, a continuous sign language recognition system consists of a visual input encoder for feature extraction and a sequence learning model to learn the correspondence between the input sequence and the output sentence-level labels. We use a 3D residual convolutional network (3D-ResNet) to extract visual features. After that, a stacked dilated convolutional network with Connectionist Temporal Classification (CTC) is applied for learning the mapping between the sequential features and the text sentence. The deep network is hard to train since the CTC loss has limited contribution to early CNN parameters. To alleviate this problem, we design an iterative optimization strategy to train our architecture. We generate pseudo-labels for video clips from the sequence learning model with CTC, and fine-tune the 3D-ResNet with the supervision of pseudo-labels for a better feature representation. We alternately optimize the feature extractor and the sequence learning model with iterative steps. Experimental results on RWTH-PHOENIX-Weather, a large real-world continuous sign language recognition benchmark, demonstrate the advantages and effectiveness of our proposed method.

• #651
Multi-modal Circulant Fusion for Video-to-Language and Backward
Aming Wu, Yahong Han
Language and Vision

Multi-modal fusion has become a widespread focus of modern artificial intelligence research, e.g., from visual content to languages and backward. Commonly used multi-modal fusion methods mainly include element-wise product, element-wise sum, or even simple concatenation between different types of features, which are somewhat straightforward but lack in-depth analysis. Recent studies have shown that fully exploiting interactions among elements of multi-modal features leads to a further performance gain. In this paper, we put forward a new approach to multi-modal fusion, namely Multi-modal Circulant Fusion (MCF). Particularly, after reshaping feature vectors into circulant matrices, we define two types of interaction operations between vectors and matrices. As each row of the circulant matrix shifts by one element, with the newly defined interaction operations, we explore almost all possible interactions between vectors of different modalities. Moreover, as only regular operations are involved and defined a priori, MCF avoids increasing parameters or computational costs for multi-modal fusion. We evaluate MCF with tasks of video captioning and temporal activity localization via language (TALL). Experiments on MSVD and MSRVTT show our method obtains the state-of-the-art for video captioning. For TALL, by plugging into MCF, we achieve a performance gain of roughly 4.2% on TACoS.
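
The circulant construction mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not spell out MCF's two interaction operations, so the matrix-vector product below is only an assumed stand-in, and the names `circulant`, `matvec`, `visual` and `textual` are hypothetical.

```python
def circulant(vec):
    """Build a circulant matrix: row i is vec cyclically shifted right by i."""
    n = len(vec)
    return [[vec[(j - i) % n] for j in range(n)] for i in range(n)]


def matvec(matrix, vec):
    """Plain matrix-vector product (an assumed interaction, not MCF's own)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


# Toy feature vectors standing in for the two modalities (illustrative values).
visual = [1.0, 2.0, 3.0]
textual = [0.5, -1.0, 2.0]

C = circulant(visual)        # each row shifts by one element
fused = matvec(C, textual)   # every visual[i] meets every textual[j] in some row
print(C)
print(fused)
```

Because each row is a shifted copy of the same vector, the matrix adds no learnable parameters, which is consistent with the abstract's remark that only regular, predefined operations are involved.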

• #702
Multi-modal Sentence Summarization with Modality Attention and Image Filtering
Haoran Li, Junnan Zhu, Tianshang Liu, Jiajun Zhang, Chengqing Zong
Language and Vision

In this paper, we introduce a multi-modal sentence summarization task that produces a short summary from a sentence-image pair. This task is more challenging than sentence summarization. It not only needs to effectively incorporate visual features into a standard text summarization framework, but also needs to avoid image noise. To this end, we propose a modality-based attention mechanism to pay different attention to image patches and text units, and we design image filters to selectively use visual information to enhance the semantics of the input sentence. We construct a multimodal sentence summarization dataset, and extensive experiments on this dataset demonstrate that our models significantly outperform conventional models which only employ text as input. Further analyses suggest that the sentence summarization task can benefit from visually grounded representations from a variety of aspects.

• #520
Cross-media Multi-level Alignment with Relation Attention Network
Jinwei Qi, Yuxin Peng, Yuxin Yuan
Language and Vision

With the rapid growth of multimedia data, such as image and text, it is a highly challenging problem to effectively correlate and retrieve the data of different media types. Naturally, when correlating an image with a textual description, people focus on not only the alignment between discriminative image regions and key words, but also the relations lying in the visual and textual context. Relation understanding is essential for cross-media correlation learning, which is ignored by prior cross-media retrieval works. To address the above issue, we propose the Cross-media Relation Attention Network (CRAN) with multi-level alignment. First, we propose a visual-language relation attention model to explore both fine-grained patches and their relations of different media types. We aim to not only exploit cross-media fine-grained local information, but also capture the intrinsic relation information, which can provide complementary hints for correlation learning. Second, we propose cross-media multi-level alignment to explore global, local and relation alignments across different media types, which can reinforce one another to learn more precise cross-media correlation. We conduct experiments on 2 cross-media datasets, and compare with 10 state-of-the-art methods to verify the effectiveness of the proposed approach.

• #3030
Multi-task Layout Analysis for Historical Handwritten Documents Using Fully Convolutional Networks
Yue Xu, Fei Yin, Zhaoxiang Zhang, Cheng-Lin Liu
Language and Vision

Layout analysis is a fundamental process in document image analysis and understanding. It consists of several sub-processes such as page segmentation, text line segmentation, baseline detection and so on. In this work, we propose a multi-task layout analysis method that uses a single FCN model to solve the above three problems simultaneously. The FCN is trained to segment the document image into different regions and detect the center line of each text line by classifying pixels into different categories. By supervised learning on document images with pixel-wise labels, the FCN can extract discriminative features and perform pixel-wise classification accurately. After pixel-wise classification, post-processing steps are taken to reduce noise, correct wrong segmentations and identify overlapping regions. Experimental results on the public dataset DIVA-HisDB containing challenging medieval manuscripts demonstrate the effectiveness and superiority of the proposed method.

• #486
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, Dacheng Tao
Language and Vision

Visual grounding aims to localize an object in an image referred to by a textual query phrase. Various visual grounding approaches have been proposed, and the problem can be modularized into a general framework: proposal generation, multi-modal feature representation, and proposal ranking. Of these three modules, most existing approaches focus on the latter two, with the importance of proposal generation generally neglected. In this paper, we rethink the problem of what properties make a good proposal generator. We introduce diversity and discrimination simultaneously when generating proposals, and in doing so propose the Diversified and Discriminative Proposal Networks (DDPN) model. Based on the proposals generated by DDPN, we propose a high performance baseline model for visual grounding and evaluate it on four benchmark datasets. Experimental results demonstrate that our model delivers significant improvements on all the tested datasets (e.g., 18.8% improvement on ReferItGame and 8.2% improvement on Flickr30k Entities over the existing state of the art, respectively).

• #2550
Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances
Thao Le Minh, Nobuyuki Shimizu, Takashi Miyazaki, Koichi Shinoda
Language and Vision

With the widespread use of intelligent systems, such as smart speakers, addressee recognition has become a concern in human-computer interaction, as more and more people expect such systems to understand complicated social scenes, including those outdoors, in cafeterias, and hospitals. Because previous studies typically focused only on pre-specified tasks with limited conversational situations such as controlling smart homes, we created a mock dataset called Addressee Recognition in Visual Scenes with Utterances (ARVSU) that contains a vast body of image variations in visual scenes with an annotated utterance and a corresponding addressee for each scenario. We also propose a multi-modal deep-learning-based model that takes different human cues, specifically eye gazes and transcripts of an utterance corpus, into account to predict the conversational addressee from a specific speaker's view in various real-life conversational scenarios. To the best of our knowledge, we are the first to introduce an end-to-end deep learning model that combines vision and transcripts of utterance for addressee recognition. As a result, our study suggests that future addressee recognition can reach the ability to understand human intention in many social situations previously unexplored, and our modality dataset is a first step in promoting research in this field.

### Monday 16 11:25 - 12:50 CV-REC1 - Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation (T1)

Chair: William K. Cheung
• #113
Yupei Wang, Xin Zhao, Yin Li, Xuecai Hu, Kaiqi Huang
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation

Shadow detection is an important and challenging problem in computer vision. Recently, single image shadow detection has achieved major progress with the development of deep convolutional networks. However, existing methods are still vulnerable to background clutter, and often fail to capture the global context of an input image. These global contextual and semantic cues are essential for accurately localizing the shadow regions. Moreover, rich spatial details are required to segment shadow regions with precise shape. To this end, this paper presents a novel model characterized by a deeply supervised parallel fusion (DSPF) network and a densely cascaded learning scheme. The DSPF network achieves a comprehensive fusion of global semantic cues and local spatial details by multiple stacked parallel fusion branches, which are learned in a deeply supervised manner. Moreover, the densely cascaded learning scheme is employed to refine the spatial details. Our method is evaluated on two widely used shadow detection benchmarks. Experimental results show that our method outperforms state-of-the-art methods by a large margin.

• #488
Hi-Fi: Hierarchical Feature Integration for Skeleton Detection
Kai Zhao, Wei Shen, Shanghua Gao, Dandan Li, Ming-Ming Cheng
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation

In natural images, the scales (thickness) of object skeletons may dramatically vary among objects and object parts. Thus, robust skeleton detection requires powerful multi-scale feature integration ability. To address this issue, we present a new convolutional neural network (CNN) architecture by introducing a novel hierarchical feature integration mechanism, named Hi-Fi, to address the object skeleton detection problem. The proposed CNN-based approach intrinsically captures high-level semantics from deeper layers, as well as low-level details from shallower layers. By hierarchically integrating different CNN feature levels with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) possesses the strong ability to capture both rich object context and high-resolution details. Experimental results show that our method significantly outperforms the state-of-the-art methods in terms of effectively fusing features from very different scales, as evidenced by a considerable performance improvement on several benchmarks.

• #660
R³Net: Recurrent Residual Refinement Network for Saliency Detection
Zijun Deng, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Jing Qin, Guoqiang Han, Pheng-Ann Heng
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation

Saliency detection is a fundamental yet challenging task in computer vision, aiming at highlighting the most visually distinctive objects in an image. We propose a novel recurrent residual refinement network (R³Net) equipped with residual refinement blocks (RRBs) to more accurately detect salient regions of an input image. Our RRBs learn the residual between the intermediate saliency prediction and the ground truth by alternately leveraging the low-level integrated features and the high-level integrated features of a fully convolutional network (FCN). While the low-level integrated features are capable of capturing more saliency details, the high-level integrated features can reduce non-salient regions in the intermediate prediction. Furthermore, the RRBs can obtain complementary saliency information of the intermediate prediction, and add the residual into the intermediate prediction to refine the saliency maps. We evaluate the proposed R³Net on five widely-used saliency detection benchmarks by comparing it with 16 state-of-the-art saliency detectors. Experimental results show that our network outperforms our competitors in all the benchmark datasets.

• #991
IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection
Qiangpeng Yang, Mengli Cheng, Wenmeng Zhou, Yan Chen, Minghui Qiu, Wei Lin
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation

Incidental scene text detection, especially for multi-oriented text regions, is one of the most challenging tasks in many computer vision applications. Different from the common object detection task, scene text often suffers from a large variance of aspect ratio, scale, and orientation. To solve this problem, we propose a novel end-to-end scene text detector IncepText from an instance-aware segmentation perspective. We design a novel Inception-Text module and introduce deformable PSROI pooling to deal with multi-oriented text detection. Extensive experiments on ICDAR2015, RCTW-17, and MSRA-TD500 datasets demonstrate our method's superiority in terms of both effectiveness and efficiency. Our proposed method achieves 1st place result on ICDAR2015 challenge and the state-of-the-art performance on other datasets. Moreover, we have released our implementation as an OCR product which is available for public access.

• #1837
Collaborative Learning for Weakly Supervised Object Detection
Jiajie Wang, Jiangchao Yao, Ya Zhang, Rui Zhang
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation

Weakly supervised object detection has recently received much attention, since it only requires image-level labels instead of the bounding-box labels consumed in strongly supervised learning. Nevertheless, the saving in labeling expense usually comes at the cost of model accuracy. In this paper, we propose a simple but effective weakly supervised collaborative learning framework to resolve this problem, which trains a weakly supervised learner and a strongly supervised learner jointly by enforcing partial feature sharing and prediction consistency. For object detection, taking a WSDDN-like architecture as the weakly supervised detector sub-network and a Faster-RCNN-like architecture as the strongly supervised detector sub-network, we propose an end-to-end Weakly Supervised Collaborative Detection Network. As there is no strong supervision available to train the Faster-RCNN-like sub-network, a new prediction consistency loss is defined to enforce consistency of predictions between the two sub-networks as well as within the Faster-RCNN-like sub-networks. At the same time, the two detectors are designed to partially share features to further guarantee the model consistency at perceptual level. Extensive experiments on PASCAL VOC 2007 and 2012 data sets have demonstrated the effectiveness of the proposed framework.

• #102
Deep Joint Semantic-Embedding Hashing
Ning Li, Chao Li, Cheng Deng, Xianglong Liu, Xinbo Gao
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation

Hashing has been widely deployed to large-scale image retrieval due to its low storage cost and fast query speed. Almost all deep hashing methods do not sufficiently discover semantic correlation from label information, which makes the learned hash codes less discriminative. In this paper, we propose a novel Deep Joint Semantic-Embedding Hashing (DSEH) approach that contains LabNet and ImgNet. Specifically, LabNet is explored to capture abundant semantic correlation between sample pairs and supervise ImgNet from semantic level and hash codes level, which is conducive to the generated hash codes being more discriminative and similarity-preserving. Extensive experiments on three benchmark datasets show that the proposed model outperforms the state-of-the-art methods.

• #154
Semantic Structure-based Unsupervised Deep Hashing
Erkun Yang, Cheng Deng, Tongliang Liu, Wei Liu, Dacheng Tao
Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation

Hashing is becoming increasingly popular for approximate nearest neighbor searching in massive databases due to its storage and search efficiency. Recent supervised hashing methods, which usually construct semantic similarity matrices to guide hash code learning using label information, have shown promising results. However, it is relatively difficult to capture and utilize the semantic relationships between points in unsupervised settings. To address this problem, we propose a novel unsupervised deep framework called Semantic Structure-based unsupervised Deep Hashing (SSDH). We first empirically study the deep feature statistics, and find that the distribution of the cosine distance for point pairs can be estimated by two half Gaussian distributions. Based on this observation, we construct the semantic structure by considering points with distances obviously smaller than the others as semantically similar and points with distances obviously larger than the others as semantically dissimilar. We then design a deep architecture and a pair-wise loss function to preserve this semantic structure in Hamming space. Extensive experiments show that SSDH significantly outperforms current state-of-the-art methods.

### Monday 16 11:25 - 12:50 ML-ONL - Online Learning (K11)

Chair: Arunesh Sinha
• #1969
Online Kernel Selection via Incremental Sketched Kernel Alignment
Xiao Zhang, Shizhong Liao
Online Learning

In contrast to offline kernel selection, online kernel selection must rise to the new challenges of passing the training set once, selecting optimal kernels and updating hypotheses at each round, enjoying a sublinear regret bound for online kernel learning, and requiring a constant maintenance time complexity at each round and an efficient overall time complexity integrated with online kernel learning. However, most existing online kernel selection approaches cannot meet the new challenges. To address this issue, we propose a novel online kernel selection approach via the incremental sketched kernel alignment criterion, which meets all the new challenges. We first define the incremental sketched kernel alignment (ISKA) criterion, which estimates the kernel alignment and can be computed incrementally and efficiently. When applying the proposed ISKA criterion to online kernel selection, we adopt the subclass coherence to maintain the hypothesis space, select the optimal kernel at each round using the median of the ISKA criterion estimates, and update the hypothesis following the online gradient descent method. We prove that the ISKA criterion is an unbiased estimate of the maximum mean discrepancy, enjoys the optimal logarithmic regret bound for online kernel learning, and has a constant maintenance time complexity at each round and a logarithmic overall time complexity integrated with online kernel learning. Empirical studies demonstrate that the proposed online kernel selection approach is computationally efficient while maintaining comparable accuracy for online kernel learning.

• #2354
Online Deep Learning: Learning Deep Neural Networks on the Fly
Doyen Sahoo, Quang Pham, Jing Lu, Steven C. H. Hoi
Online Learning

Deep Neural Networks (DNNs) are typically trained by backpropagation in a batch setting, requiring the entire training data to be made available prior to the learning task. This is not scalable for many real-world scenarios where new data arrives sequentially in a stream. We aim to address an open challenge of "Online Deep Learning" (ODL) for learning DNNs on the fly in an online setting. Unlike traditional online learning that often optimizes some convex objective function with respect to a shallow model (e.g., a linear/kernel-based hypothesis), ODL is more challenging as the optimization objective is non-convex, and a regular DNN with standard backpropagation does not work well in practice for online settings. We present a new ODL framework that attempts to tackle the challenges by learning DNN models which dynamically adapt depth from a sequence of training data in an online learning setting. Specifically, we propose a novel Hedge Backpropagation (HBP) method for updating the parameters of a DNN online effectively, and validate the efficacy on large data sets (both stationary and concept drifting scenarios).
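
The abstract names Hedge Backpropagation without detailing it. As background only, here is a minimal sketch of the classic Hedge multiplicative-weights update, which is assumed (not confirmed by the abstract) to be the building block; treating each weight as belonging to a predictor of a different depth is likewise an assumption, and `hedge_update`, `beta` and the loss values are illustrative.

```python
def hedge_update(weights, losses, beta=0.9):
    """Classic Hedge step: scale each weight by beta**loss, then renormalize."""
    scaled = [w * beta ** loss for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def combined_prediction(weights, predictions):
    """Weighted vote of the individual predictors."""
    return sum(w * p for w, p in zip(weights, predictions))


# Three hypothetical predictors (e.g. of different depths), uniform start.
weights = [1 / 3, 1 / 3, 1 / 3]

# Made-up per-round losses in [0, 1]; predictor 1 is consistently the best.
for losses in ([0.9, 0.2, 0.5], [0.8, 0.1, 0.6], [0.7, 0.3, 0.4]):
    weights = hedge_update(weights, losses)

print(weights)
print(combined_prediction(weights, [0.0, 1.0, 0.5]))
```

Predictors that accumulate less loss keep more weight, so the combination can shift toward whichever depth is currently performing well as the stream evolves.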

• #1499
Guanghui Wang, Dakuan Zhao, Lijun Zhang
Online Learning

To cope with non-stationary environments, recent advances in online optimization have introduced the notion of adaptive regret, which measures the performance of an online learner against different comparators within different time intervals. Previous studies have proposed various algorithms to yield low adaptive regret under different scenarios. However, all existing algorithms need to query the gradient of the loss function at least O(log t) times in every iteration t, which hinders their applications to broad domains, especially when the evaluation of gradients is expensive. To address this limitation, we propose a series of computationally efficient algorithms for minimizing the adaptive regret of general convex, strongly convex and exponentially concave functions respectively. The key idea is to replace each loss function with a carefully designed surrogate loss, which bounds the original loss function from below. We show that the proposed algorithms only query the gradient once per iteration, and attain the same theoretical guarantees as previous optimal algorithms. Empirical results demonstrate the efficiency and effectiveness of our methods.

• #1421
Efficient Adaptive Online Learning via Frequent Directions
Yuanyu Wan, Nan Wei, Lijun Zhang
Online Learning

By employing time-varying proximal functions, adaptive subgradient methods (ADAGRAD) have improved the regret bound and been widely used in online learning and optimization. However, ADAGRAD with full matrix proximal functions (ADA-FULL) cannot deal with large-scale problems due to the impractical time and space complexities, though it has better performance when gradients are correlated. In this paper, we propose ADA-FD, an efficient variant of ADA-FULL based on a deterministic matrix sketching technique called frequent directions. Following ADA-FULL, we incorporate our ADA-FD into both primal-dual subgradient method and composite mirror descent method to develop two efficient methods. By maintaining and manipulating low-rank matrices, at each iteration, the space complexity is reduced from $O(d^2)$ to $O(\tau d)$ and the time complexity is reduced from $O(d^3)$ to $O(\tau^2d)$, where $d$ is the dimensionality of the data and $\tau \ll d$ is the sketching size. Theoretical analysis reveals that the regret of our methods is close to that of ADA-FULL as long as the outer product matrix of gradients is approximately low-rank. Experimental results show that our ADA-FD is comparable to ADA-FULL and outperforms other state-of-the-art algorithms in online convex optimization as well as in training convolutional neural networks (CNN).

• #1511
Bandit Online Learning on Graphs via Adaptive Optimization
Peng Yang, Peilin Zhao, Xin Gao
Online Learning

Traditional online learning on graphs adapts graph Laplacian into ridge regression, which may not guarantee reasonable accuracy when the data are adversarially generated. To solve this issue, we exploit an adaptive optimization framework for online classification on graphs. The derived model can achieve a min-max regret under an adversarial mechanism of data generation. To take advantage of the informative labels, we propose an adaptive large-margin update rule, which enjoys a lower regret than the algorithms using error-driven update rules. However, this algorithm assumes that the full information label is provided for each node, which is violated in many practical applications where labeling is expensive and the oracle may only tell whether the prediction is correct or not. To address this issue, we propose a bandit online algorithm on graphs. It derives a per-instance confidence region of the prediction, from which the model can be learned adaptively to minimize the online regret. Experiments on benchmark graph datasets show that the proposed bandit algorithm outperforms state-of-the-art competitors, and sometimes even beats the algorithms using full information label feedback.

• #4598
Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications
Weiran Huang, Jungseul Ok, Liang Li, Wei Chen
Online Learning

We study the Combinatorial Pure Exploration problem with Continuous and Separable reward functions (CPE-CS) in the stochastic multi-armed bandit setting. In a CPE-CS instance, we are given several stochastic arms with unknown distributions, as well as a collection of possible decisions. Each decision has a reward according to the distributions of arms. The goal is to identify the decision with the maximum reward, using as few arm samples as possible. The problem generalizes the combinatorial pure exploration problem with linear rewards, which has attracted significant attention in recent years. In this paper, we propose an adaptive learning algorithm for the CPE-CS problem, and analyze its sample complexity. In particular, we introduce a new hardness measure called the consistent optimality hardness, and give both the upper and lower bounds of sample complexity. Moreover, we give examples to demonstrate that our solution has the capacity to deal with non-linear reward functions.

• #195
UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits
Fang Liu, Sinong Wang, Swapna Buccapatnam, Ness Shroff
Online Learning

In this work, we address the open problem of finding low-complexity near-optimal multi-armed bandit algorithms for sequential decision making problems. Existing bandit algorithms are either sub-optimal and computationally simple (e.g., UCB1) or optimal and computationally complex (e.g., kl-UCB). We propose a boosting approach to Upper Confidence Bound based algorithms for stochastic bandits, that we call UCBoost. Specifically, we propose two types of UCBoost algorithms. We show that UCBoost(D) enjoys O(1) complexity for each arm per round as well as regret guarantee that is 1/e-close to that of the kl-UCB algorithm. We propose an approximation-based UCBoost algorithm, UCBoost(epsilon), that enjoys a regret guarantee epsilon-close to that of kl-UCB as well as O(log(1/epsilon)) complexity for each arm per round. Hence, our algorithms provide practitioners a practical way to trade optimality with computational complexity. Finally, we present numerical results which show that UCBoost(epsilon) can achieve the same regret performance as the standard kl-UCB while incurring only 1% of the computational cost of kl-UCB.
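
The cost gap between UCB1 and kl-UCB that motivates this abstract can be seen in a minimal sketch of the two indices for Bernoulli rewards. This is not UCBoost itself: it uses the standard UCB1 formula and a simple bisection for kl-UCB with exploration budget log(t)/n, and the function names and the iteration count are illustrative assumptions.

```python
import math


def ucb1_index(mean, pulls, t):
    """UCB1: closed form, O(1) per arm."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped for safety."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, iterations=30):
    """kl-UCB: largest q >= mean with pulls * KL(mean, q) <= log(t), by bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iterations):  # each arm needs an iterative numerical solve
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


print(ucb1_index(0.5, 10, 100))
print(kl_ucb_index(0.5, 10, 100))
```

The kl-UCB index is tighter but requires a numerical search per arm per round, whereas UCB1 is a single expression; this is the trade-off between optimality and computational complexity that the abstract describes.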

### Monday 16, 11:25 - 12:50: MUL-SP - Security and Privacy (C2)

• #2541
GELU-Net: A Globally Encrypted, Locally Unencrypted Deep Neural Network for Privacy-Preserved Learning
Qiao Zhang, Cong Wang, Hongyi Wu, Chunsheng Xin, Tran V. Phuong
Security and Privacy

Privacy is a fundamental challenge for a variety of smart applications that depend on data aggregation and collaborative learning across different entities. In this paper, we propose a novel privacy-preserving architecture where clients can collaboratively train a deep model while preserving the privacy of each client’s data. Our main strategy is to carefully partition a deep neural network between two non-colluding parties. One party performs linear computations on encrypted data using a less complex homomorphic cryptosystem, while the other executes non-polynomial computations in plaintext but in a privacy-preserving manner. We analyze security and compare the communication and computation complexity with existing approaches. Our extensive experiments on different datasets demonstrate not only stable training without accuracy loss, but also a 14 to 35 times speedup compared to the state-of-the-art system.

• #3594
Adversarial Regression for Detecting Attacks in Cyber-Physical Systems
Amin Ghafouri, Yevgeniy Vorobeychik, Xenofon Koutsoukos
Security and Privacy

Attacks in cyber-physical systems (CPS) which manipulate sensor readings can cause enormous physical damage if undetected. Detection of attacks on sensors is crucial to mitigate this issue. We study supervised regression as a means to detect anomalous sensor readings, where each sensor's measurement is predicted as a function of other sensors. We show that several common learning approaches in this context are still vulnerable to stealthy attacks, which carefully modify readings of compromised sensors to cause desired damage while remaining undetected. Next, we model the interaction between the CPS defender and attacker as a Stackelberg game in which the defender chooses detection thresholds, while the attacker deploys a stealthy attack in response. We present a heuristic algorithm for finding an approximately optimal threshold for the defender in this game, and show that it increases system resilience to attacks without significantly increasing the false alarm rate.

• #85
Zheng Wang, Mang Ye, Fan Yang, Xiang Bai, Shin'ichi Satoh
Security and Privacy

Person re-identification (REID) is an important task in video surveillance and forensics applications. Most previous approaches rest on the key assumption that all person images have uniform and sufficiently high resolutions. In practice, varied low resolutions and scale mismatches are always present in open-world REID. We call this problem Scale-Adaptive Low Resolution Person Re-identification (SALR-REID). The most intuitive way to address it is to upscale the various low-resolution images (not only low, but also at different scales) to a uniform high resolution. SR-GAN is one of the most competitive deep networks for image super-resolution, but it is designed with a fixed upscaling factor. It is therefore not suitable for the SALR-REID task, which requires a network that not only synthesizes high-resolution images with different upscaling factors but also extracts discriminative image features for judging a person’s identity. (1) To promote scale-adaptive upscaling, we cascade multiple SR-GANs in series. (2) To strengthen image feature representation, we plug in a re-identification network. With a unified formulation, a Cascaded Super-Resolution GAN (CSR-GAN) framework is proposed. Extensive evaluations on two simulated datasets and one public dataset demonstrate the advantages of our method over related state-of-the-art methods.

• #1774
Optimal Cruiser-Drone Traffic Enforcement Under Energy Limitation
Ariel Rosenfeld, Oleg Maksimov, Sarit Kraus
Security and Privacy

Drones can assist in mitigating traffic accidents by deterring reckless drivers, leveraging their flexible mobility. In the real world, drones are fundamentally limited by their battery/fuel capacity and have to be replenished during long operations. In this paper, we propose a novel approach where police cruisers act as mobile replenishment providers in addition to their traffic enforcement duties. We propose a binary integer linear program for determining the optimal rendezvous cruiser-drone enforcement policy which guarantees that all drones are replenished on time and minimizes the likelihood of accidents. In an extensive empirical evaluation, we first show that human drivers are expected to react to traffic enforcement drones in a similar fashion to how they react to police cruisers using a first-of-its-kind human study in realistic simulated driving. Then, we show that our proposed approach significantly outperforms the common practice of constructing stationary replenishment installations using both synthetic and real world road networks.

• #4431
Chaowei Xiao, Bo Li, Jun-yan Zhu, Warren He, Mingyan Liu, Dawn Song
Security and Privacy

• #2048
Qi-Zhi Cai, Chang Liu, Dawn Song
Security and Privacy

Recently, deep learning has been applied to many security-sensitive applications, such as facial authentication. The existence of adversarial examples hinders such applications. The state-of-the-art result on defense shows that adversarial training can be applied to train a robust model on MNIST against adversarial examples, but it fails to achieve a high empirical worst-case accuracy on more complex tasks, such as CIFAR-10 and SVHN. In our work, we propose curriculum adversarial training (CAT) to resolve this issue. The basic idea is to develop a curriculum of adversarial examples generated by attacks with a wide range of strengths. With two techniques to mitigate the catastrophic forgetting and the generalization issues, we demonstrate that CAT can improve the prior art's empirical worst-case accuracy by a large margin of 25% on CIFAR-10 and 35% on SVHN. At the same time, the model's performance on non-adversarial inputs is comparable to the state-of-the-art models.

• #5107
(Sister Conferences Best Papers Track) Tamper-Proof Privacy Auditing for Artificial Intelligence Systems
Andrew Sutton, Reza Samavi
Security and Privacy

Privacy audit logs are used to capture the actions of participants in a data sharing environment in order for auditors to check compliance with privacy policies. However, collusion may occur between the auditors and participants to obfuscate actions that should be recorded in the audit logs. In this paper, we propose a Linked Data based method of utilizing blockchain technology to create tamper-proof audit logs that provide proof of log manipulation and non-repudiation.

### Monday 16, 11:25 - 12:50: ML-MMM1 - Multi-Instance, Multi-View, Multi-Label Learning (C3)

Chair: Xin Geng
• #1913
Deep Multi-View Concept Learning
Cai Xu, Ziyu Guan, Wei Zhao, Yunfei Niu, Quan Wang, Zhiheng Wang
Multi-Instance, Multi-View, Multi-Label Learning

Multi-view data is common in real-world datasets, where different views describe distinct perspectives. To better summarize the consistent and complementary information in multi-view data, researchers have proposed various multi-view representation learning algorithms, typically based on factorization models. However, most previous methods were focused on shallow factorization models which cannot capture the complex hierarchical information. Although a deep multi-view factorization model has been proposed recently, it fails to explicitly discern consistent and complementary information in multi-view data and does not consider conceptual labels. In this work we present a semi-supervised deep multi-view factorization method, named Deep Multi-view Concept Learning (DMCL). DMCL performs nonnegative factorization of the data hierarchically, and tries to capture semantic structures and explicitly model consistent and complementary information in multi-view data at the highest abstraction level. We develop a block coordinate descent algorithm for DMCL. Experiments conducted on image and document datasets show that DMCL performs well and outperforms baseline methods.

• #175
FISH-MML: Fisher-HSIC Multi-View Metric Learning
Changqing Zhang, Yeqinq Liu, Yue Liu, Qinghua Hu, Xinwang Liu, Pengfei Zhu
Multi-Instance, Multi-View, Multi-Label Learning

This work presents a simple yet effective model for multi-view metric learning, which aims to improve the classification of data with multiple views, e.g., multiple modalities or multiple types of features. The intrinsic correlation among views, which all describe the same set of instances, makes it both possible and necessary to learn the metrics of different views jointly. Accordingly, we propose a multi-view metric learning method based on Fisher discriminant analysis (FDA) and the Hilbert-Schmidt Independence Criterion (HSIC), termed Fisher-HSIC Multi-View Metric Learning (FISH-MML). In our approach, class separability is enforced in the spirit of FDA within each single view, while the consistency among different views is enhanced based on HSIC. Thus, both intra-view class separability and inter-view correlation are addressed in a unified framework. The learned metrics can improve multi-view classification, and experimental results on real-world datasets demonstrate the effectiveness of the proposed method.

• #1413
Adaptive Graph Guided Embedding for Multi-label Annotation
Lichen Wang, Zhengming Ding, Yun Fu
Multi-Instance, Multi-View, Multi-Label Learning

Multi-label annotation is challenging since a large amount of well-labeled training data is required to achieve promising performance. However, providing such data is expensive, while unlabeled data are widely available. To this end, we propose a novel Adaptive Graph Guided Embedding (AG2E) approach for multi-label annotation in a semi-supervised fashion, which uses limited labeled data together with large-scale unlabeled data to improve learning performance. Specifically, a multi-label propagation scheme and an effective embedding are jointly learned to seek a latent space where unlabeled instances tend to be assigned multiple labels well. Furthermore, a locality structure regularizer is designed to preserve the intrinsic structure and enhance the multi-label annotation. We evaluate our model in both conventional multi-label learning and zero-shot learning scenarios. Experimental results demonstrate that our approach outperforms the compared state-of-the-art methods.

• #2296
Label Enhancement for Label Distribution Learning
Ning Xu, An Tao, Xin Geng
Multi-Instance, Multi-View, Multi-Label Learning

Label distribution is more general than both single-label annotation and multi-label annotation. It covers a certain number of labels, representing the degree to which each label describes the instance. The learning process on instances labeled by label distributions is called label distribution learning (LDL). Unfortunately, many training sets contain only simple logical labels rather than label distributions, because label distributions are difficult to obtain directly. One way to solve this problem is to recover the label distributions from the logical labels in the training set by leveraging the topological information of the feature space and the correlation among the labels. This process of recovering label distributions from logical labels is defined as label enhancement (LE), which reinforces the supervision information in the training sets. This paper proposes a novel LE algorithm called Graph Laplacian Label Enhancement (GLLE). Experimental results on one artificial dataset and fourteen real-world datasets show clear advantages of GLLE over several existing LE algorithms.

• #2460
Doubly Aligned Incomplete Multi-view Clustering
Menglei Hu, Songcan Chen
Multi-Instance, Multi-View, Multi-Label Learning

Multi-view clustering has attracted more and more attention. To date, almost all previous studies assume that views are complete. In reality, however, each view often contains some missing instances. Such incompleteness makes it impossible to use traditional multi-view clustering methods directly. In this paper, we propose a Doubly Aligned Incomplete Multi-view Clustering algorithm (DAIMC) based on weighted semi-nonnegative matrix factorization (semi-NMF). Specifically, on the one hand, DAIMC uses the given instance alignment information to learn a common latent feature matrix for all the views. On the other hand, DAIMC establishes a consensus basis matrix with the help of L2,1-norm regularized regression to reduce the influence of missing instances. Consequently, compared with existing methods, besides inheriting the strength of semi-NMF with its ability to handle negative entries, DAIMC has two unique advantages: 1) it solves the incomplete view problem by introducing a respective weight matrix for each view, so that it adapts easily to the case with more than two views; 2) it reduces the influence of view incompleteness on clustering by enforcing the basis matrices of individual views to be aligned with the help of regression. Experiments on four real-world datasets demonstrate its advantages.

• #1522
Robust Auto-Weighted Multi-View Clustering
Pengzhen Ren, Yun Xiao, Pengfei Xu, Jun Guo, Xiaojiang Chen, Xin Wang, Dingyi Fang
Multi-Instance, Multi-View, Multi-Label Learning

Multi-view clustering has played a vital role in real-world applications. It aims to cluster the data points into different groups by exploring complementary information across multiple views. A major challenge of this problem is how to learn the explicit cluster structure with multiple views when there is considerable noise. To solve this challenging problem, we propose a novel Robust Auto-weighted Multi-view Clustering (RAMC), which aims to learn an optimal graph with exactly k connected components, where k is the number of clusters. The ℓ1-norm is employed to make the proposed algorithm robust, which we validate in the experiments. The new graph learned by the proposed model approximates the original graphs of each individual view but maintains an explicit cluster structure. With this optimal graph, we can obtain the clustering results immediately without any further post-processing. We conduct extensive experiments to confirm the superiority and robustness of the proposed algorithm.

• #1937
Towards Enabling Binary Decomposition for Partial Label Learning
Xuan Wu, Min-Ling Zhang
Multi-Instance, Multi-View, Multi-Label Learning

The task of partial label (PL) learning is to learn a multi-class classifier from training examples each associated with a set of candidate labels, among which only one corresponds to the ground-truth label. It is well known that, for inducing a multi-class predictive model, the most straightforward solution is binary decomposition, which works by either a one-vs-rest or a one-vs-one strategy. Nonetheless, since the ground-truth label of each PL training example is concealed in its candidate label set and thus not accessible to the learning algorithm, binary decomposition cannot be directly applied in the partial label learning scenario. In this paper, a novel approach is proposed to solve the partial label learning problem by adapting the popular one-vs-one decomposition strategy. Specifically, one binary classifier is derived for each pair of class labels, where PL training examples with distinct relevancy to the label pair are used to generate the corresponding binary training set. After that, one binary classifier is further derived for each class label by stacking over the predictions of existing binary classifiers to improve generalization. Experimental studies on both artificial and real-world PL data sets clearly validate the effectiveness of the proposed binary decomposition approach w.r.t. state-of-the-art partial label learning techniques.
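One plausible reading of "distinct relevancy to the label pair" can be sketched as follows: an example is a positive for the pair (a, b) when its candidate set contains a but not b, a negative when it contains b but not a, and is left out otherwise. This is an illustrative assumption, not the authors' implementation; the function name and the toy data are hypothetical.

```python
def pairwise_training_set(examples, label_a, label_b):
    """Build a binary training set for the label pair (label_a, label_b).

    `examples` is a list of (features, candidate_label_set) pairs.
    Examples whose candidate set contains both labels or neither label
    carry no usable signal for this pair and are skipped (assumption).
    """
    binary = []
    for features, candidates in examples:
        has_a = label_a in candidates
        has_b = label_b in candidates
        if has_a and not has_b:
            binary.append((features, +1))
        elif has_b and not has_a:
            binary.append((features, -1))
    return binary


# Toy partial-label data: each example lists its candidate labels only.
examples = [
    ([0.1, 0.9], {"cat", "dog"}),   # ambiguous for (cat, dog): skipped
    ([0.8, 0.2], {"dog"}),          # negative for (cat, dog)
    ([0.5, 0.5], {"cat", "bird"}),  # positive for (cat, dog)
    ([0.3, 0.7], {"bird", "dog"}),  # negative for (cat, dog)
]

cat_vs_dog = pairwise_training_set(examples, "cat", "dog")
```

Any off-the-shelf binary learner could then be trained on each such set, and the stacking step described in the abstract would sit on top of those pairwise predictions.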

### Monday 16, 11:25 - 13:05: MAS-AM1 - Auctions and Markets 1 (C8)

Chair: Vangelis Markakis
• #1566
On Fair Price Discrimination in Multi-Unit Markets
Michele Flammini, Manuel Mauro, Matteo Tonelli
Auctions and Markets 1

Discriminatory pricing policies, even if they can be perceived as unfair at first glance, are widespread. In fact, price differences for the same item among different national markets are common, as are forms of discrimination based on the time of purchase, as in ticket sales. In this work we propose a framework for capturing the setting of "fair" discriminatory pricing and study its application to multi-unit markets, in which many copies of the same item are on sale. Our model is able to incorporate the fundamental discrimination settings proposed in the literature by expressing individual buyers' constraints for assigning prices by means of a social relationship graph, which models the information that each buyer can acquire about the prices assigned to the other buyers. After pointing out the positive effects of fair price discrimination, we investigate the computational complexity of maximizing the social welfare and the revenue in these markets, providing hardness and approximation results under various assumptions on the buyers' valuations and on the social graph topology.

• #1619
Non-decreasing Payment Rules for Combinatorial Auctions
Vitor Bosshard, Ye Wang, Sven Seuken
Auctions and Markets 1

Combinatorial auctions are used to allocate resources in domains where bidders have complex preferences over bundles of goods. However, the behavior of bidders under different payment rules is not well understood, and there has been limited success in finding Bayes-Nash equilibria of such auctions due to the computational difficulties involved. In this paper, we introduce non-decreasing payment rules. Under such a rule, the payment of a bidder cannot decrease when he increases his bid, which is a natural and desirable property. VCG-nearest, the payment rule most commonly used in practice, violates this property and can thus be manipulated in surprising ways. In contrast, we show that many other payment rules are non-decreasing. We also show that a non-decreasing payment rule imposes a structure on the auction game that enables us to search for an approximate Bayes-Nash equilibrium much more efficiently than in the general case. Finally, we introduce the utility planes BNE algorithm, which exploits this structure and outperforms a state-of-the-art algorithm by multiple orders of magnitude.

• #2001
Ex-post IR Dynamic Auctions with Cost-per-Action Payments
Weiran Shen, Zihe Wang, Song Zuo
Auctions and Markets 1

Motivated by online ad auctions, we consider a repeated auction between one seller and many buyers, where each buyer only has an estimate of her value in each period until she actually receives the item in that period. The seller is allowed to conduct a dynamic auction but must guarantee ex-post individual rationality. In this paper, we use a structure that we call credit accounts to enable a general reduction from any incentive compatible and ex-ante individually rational dynamic auction to an approximately incentive compatible and ex-post individually rational dynamic auction with credit accounts. Our reduction obtains stronger individual rationality guarantees at the cost of weaker incentive compatibility. Surprisingly, our reduction works without any common knowledge assumption. Finally, as a complement to our reduction, we prove that there is no non-trivial auction that is exactly incentive compatible and ex-post individually rational under this setting.

• #2897
Double Auctions in Markets for Multiple Kinds of Goods
Erel Segal-Halevi, Avinatan Hassidim, Yonatan Aumann
Auctions and Markets 1

Motivated by applications such as stock exchanges and spectrum auctions, there is a growing interest in mechanisms for arranging trade in two-sided markets. However, existing mechanisms are either not truthful, do not guarantee an asymptotically-optimal gain-from-trade, rely on a prior on the traders' valuations, or operate in limited settings such as a single type of good. We extend the random-sampling technique used in earlier works to multi-good markets where traders have gross-substitute valuations. We show a prior-free, truthful and strongly-budget-balanced mechanism which guarantees near-optimal gain-from-trade when the market sizes of all goods grow to infinity at a similar rate.

• #3954
Bidding in Periodic Double Auctions Using Heuristics and Dynamic Monte Carlo Tree Search
Moinul Morshed Porag Chowdhury, Christopher Kiekintveld, Son Tran, William Yeoh
Auctions and Markets 1

In a Periodic Double Auction (PDA), there are multiple discrete trading periods for a single type of good. PDAs are commonly used in real-world energy markets to trade energy in specific time slots to balance demand on the power grid. Strategically, bidding in a PDA is complicated because the bidder must predict and plan for future auctions that may influence the bidding strategy for the current auction. We present a general bidding strategy for PDAs based on forecasting clearing prices and using Monte Carlo Tree Search (MCTS) to plan a bidding strategy across multiple time periods. In addition, we present a fast heuristic strategy that can be used either as a standalone method or as an initial set of bids to seed the MCTS policy. We evaluate our bidding strategies using a PDA simulator based on the wholesale market implemented in the Power Trading Agent Competition (PowerTAC). We demonstrate that our strategies outperform state-of-the-art bidding strategies designed for that competition.

• #4045
A Cloaking Mechanism to Mitigate Market Manipulation
Xintong Wang, Yevgeniy Vorobeychik, Michael P. Wellman
Auctions and Markets 1

We propose a cloaking mechanism to deter spoofing, a form of manipulation in financial markets. The mechanism works by symmetrically concealing a specified number of price levels from the inside of the order book. To study the effectiveness of cloaking, we simulate markets populated with background traders and an exploiter, who strategically spoofs to profit. The traders follow two representative bidding strategies: the non-spoofable zero intelligence and the manipulable heuristic belief learning. Through empirical game-theoretic analysis across parametrically different environments, we evaluate surplus accrued by traders, and characterize the conditions under which cloaking mitigates manipulation and benefits market welfare. We further design sophisticated spoofing strategies that probe to reveal cloaked information, and find that the effort and risk exceed the gains.

• #1297
Optimal Bidding Strategy for Brand Advertising
Takanori Maehara, Atsuhiro Narita, Jun Baba, Takayuki Kawabata
Auctions and Markets 1

Brand advertising is a type of advertising that aims at increasing the awareness of companies or products. This type of advertising is well studied in the economic, marketing, and psychological literature; however, there are no studies in the area of computational advertising because the effect of such advertising is difficult to observe. In this study, we consider a real-time bidding strategy for brand advertising. Here, our objective is to maximize the total number of users who remember the advertisement, averaged over time. For this objective, we first introduce a new objective function that captures the cognitive psychological properties of memory retention and can be optimized efficiently in the online setting (i.e., it is a monotone submodular function). Then, we propose an algorithm for the bid optimization problem with the proposed objective function under the second price mechanism by reducing the problem to the online knapsack constrained monotone submodular maximization problem. We evaluated the proposed objective function and the algorithm on real-world data collected from our system and a questionnaire survey. We observed that our objective function is reasonable in a real-world setting, and the proposed algorithm outperformed the baseline online algorithms.

• #1235
On the Complexity of Chore Division
Auctions and Markets 1

We study the proportional chore division problem, where a protocol wants to divide an undesirable object, called a chore, among n different players. This problem is the dual variant of the cake cutting problem, in which we want to allocate a desirable object. In this paper, we show that the chore division and cake cutting problems are closely related to each other and provide a tight lower bound for proportional chore division.

### Monday 16, 14:00 - 14:45: Invited Talk (VICTORIA)

Chair: Shlomo Zilberstein
• Interactive, Collaborative Robots: Challenges and Opportunities
Danica Kragic
Invited Talk
### Monday 16, 14:55 - 16:10: KR-NLP1 - KR and NLP (C7)

Chair: Olaf Hartig
• #434
A Deep Modular RNN Approach for Ethos Mining
Rory Duthie, Katarzyna Budzynska
KR and NLP

Automatically recognising and extracting the reasoning expressed in natural language text is extremely demanding, and only very recently has there been significant headway. While such argument mining focuses on logos (the content of what is said), evidence has demonstrated that using ethos (the character of the speaker) can sometimes be an even more powerful tool of influence. We study the UK parliamentary debates, which furnish a rich source of ethos with linguistic material signalling the ethotic relationships between politicians. We then develop a novel deep modular recurrent neural network (DMRNN) approach and employ proven methods from argument mining and sentiment analysis to create an ethos mining pipeline. Annotation of ethotic statements is reliable and its extraction is robust (macro-F1 = 0.83), while annotation of polarity is perfect and its extraction is solid (macro-F1 = 0.84). By exploring correspondences between ethos in political discourse and major events in the political landscape through ethos analytics, we uncover tantalising evidence that identifying expressions of positive and negative ethotic sentiment is a powerful instrument for understanding the dynamics of governments.

• #1328
Constructing Narrative Event Evolutionary Graph for Script Event Prediction
Zhongyang Li, Xiao Ding, Ting Liu
KR and NLP

Script event prediction requires a model to predict the subsequent event given an existing event context. Previous models based on event pairs or event chains cannot make full use of dense event connections, which may limit their capability of event prediction. To remedy this, we propose constructing an event graph to better utilize the event network information for script event prediction. In particular, we first extract narrative event chains from a large news corpus, and then construct a narrative event evolutionary graph (NEEG) based on the extracted chains. NEEG can be seen as a knowledge base that describes event evolutionary principles and patterns. To solve the inference problem on NEEG, we present a scaled graph neural network (SGNN) to model event interactions and learn better event representations. Instead of computing the representations on the whole graph, SGNN processes only the concerned nodes each time, which makes our model feasible for large-scale graphs. By comparing the similarity between input context event representations and candidate event representations, we can choose the most reasonable subsequent event. Experimental results on the widely used New York Times corpus demonstrate that our model significantly outperforms state-of-the-art baseline methods under the standard multiple choice narrative cloze evaluation.

• #1920
Learning Conceptual Space Representations of Interrelated Concepts
Zied Bouraoui, Steven Schockaert
KR and NLP

Several recently proposed methods aim to learn conceptual space representations from large text collections. These learned representations associate each object from a given domain of interest with a point in a high-dimensional Euclidean space, but they do not model the concepts from this domain, and thus cannot be used directly for categorization and related cognitive tasks. A natural solution is to represent concepts as Gaussians, learned from the representations of their instances, but this can only be done reliably if sufficiently many instances are given, which is often not the case. In this paper, we introduce a Bayesian model which addresses this problem by constructing informative priors from background knowledge about how the concepts of interest are interrelated. We show that this leads to substantially better predictions in a knowledge base completion task.

• #2867
Mitigating the Effect of Out-of-Vocabulary Entity Pairs in Matrix Factorization for KB Inference
Prachi Jain, Shikhar Murty, Mausam, Soumen Chakrabarti
KR and NLP

This paper analyzes the varied performance of Matrix Factorization (MF) on the related tasks of relation extraction and knowledge-base completion, which have been unified recently into a single framework of knowledge-base inference (KBI) [Toutanova et al., 2015]. We first propose a new evaluation protocol that makes comparisons between MF and Tensor Factorization (TF) models fair. We find that this results in a steep drop in MF performance. Our analysis attributes this to the high out-of-vocabulary (OOV) rate of entity pairs in test folds of commonly-used datasets. To alleviate this issue, we propose three extensions to MF. Our best model is a TF-augmented MF model. This hybrid model is robust and obtains strong results across various KBI datasets.

• #3114
Functional Partitioning of Ontologies for Natural Language Query Completion in Question Answering Systems
Jaydeep Sen, Ashish Mittal, Diptikalyan Saha, Karthik Sankaranarayanan
KR and NLP

Query completion systems are well studied in the context of information retrieval (IR) systems that handle keyword queries. However, Natural Language Interface to Databases (NLIDB) systems, which focus on syntactically correct and semantically complete queries to obtain high precision answers, require a fundamentally different approach to the query completion problem than IR systems. To the best of our knowledge, we are the first to focus on the problem of query completion for NLIDB systems. In particular, we introduce a novel concept of functional partitioning of an ontology and then design algorithms to intelligently use the components obtained from functional partitioning to extend a state-of-the-art NLIDB system to produce accurate and semantically meaningful query completions in the absence of query logs. We test the proposed query completion framework on multiple benchmark datasets and demonstrate the efficacy of our technique empirically.

• #4479
Joint Posterior Revision of NLP Annotations via Ontological Knowledge
Marco Rospocher, Francesco Corcoglioniti
KR and NLP

Different well-established NLP tasks contribute to eliciting the semantics of entities mentioned in natural language text, such as Named Entity Recognition and Classification (NERC) and Entity Linking (EL). However, combining the outcomes of these tasks may result in NLP annotations (such as a NERC organization linked by EL to a person) that are unlikely or contradictory when interpreted in the light of common world knowledge about the entities these annotations refer to. We thus propose a general probabilistic model that explicitly captures the relations between multiple NLP annotations for an entity mention, the ontological entity classes implied by those annotations, and the background ontological knowledge those classes may be consistent with. We use the model to estimate the posterior probability of NLP annotations given their confidences (prior probabilities) and the ontological knowledge, and consequently revise the best annotation choice performed by the NLP tools. In a concrete scenario with two state-of-the-art tools for NERC and EL, we experimentally show on three reference datasets that for these tasks, the joint annotation revision performed by the model consistently improves on the original results of the tools.

### Monday 16 14:55 - 16:10 MAS-SGN - Social Choice, Game Theory, and Networks (C8)

Chair: Leila Amgoud
• #2665
Opinion Diffusion and Campaigning on Society Graphs
Piotr Faliszewski, Rica Gonen, Martin Koutecký, Nimrod Talmon
Social Choice, Game Theory, and Networks

We study the effects of campaigning, where the society is partitioned into voter clusters and a diffusion process propagates opinions in a network connecting those clusters. Our model is very general and can incorporate many campaigning actions, various partitions of the society into voter clusters, and very general diffusion processes. Perhaps surprisingly, we show that computing the cheapest campaign for rigging a given election can usually be done efficiently, even with arbitrarily-many voters.

• #652
Biharmonic Distance Related Centrality for Edges in Weighted Networks
Yuhao Yi, Liren Shan, Huan Li, Zhongzhi Zhang
Social Choice, Game Theory, and Networks

The Kirchhoff index, defined as the sum of effective resistances over all pairs of nodes, is of primary significance in diverse contexts of complex networks. In this paper, we propose to use the rate at which the Kirchhoff index changes with respect to the change of resistance of an edge as a measure of importance for this edge in weighted networks. For an arbitrary edge, we explicitly determine the change of the Kirchhoff index and express it in terms of the biharmonic distance between its end nodes, and thus call this centrality biharmonic distance related centrality (BDRC). We show that BDRC has better discriminating power than commonly used metrics, such as edge betweenness and spanning edge centrality. We give an efficient algorithm that, with high probability, provides an approximation of biharmonic distance for all edges in time nearly linear in the number of edges. Experimental results validate the efficiency and accuracy of the presented algorithm.

• #1073
Reasoning about Consensus when Opinions Diffuse through Majority Dynamics
Vincenzo Auletta, Diodato Ferraioli, Gianluigi Greco
Social Choice, Game Theory, and Networks

Opinion diffusion is studied on social graphs where agents hold binary opinions and where social pressure leads them to conform to the opinion manifested by their neighbors. Within this setting, questions related to whether a minority/majority can spread the opinion it supports to all the other agents are considered. It is shown that, no matter the graph at hand, there always exists a group formed by half of the agents that can annihilate the opposite opinion. Instead, the influence power of minorities depends on certain features of the underlying graphs, which are NP-hard to identify. Deciding whether the two opinions can coexist in some stable configuration is NP-hard, too.
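As a rough illustration of the setting, the sketch below simulates one simple form of majority dynamics: in each synchronous round, an agent adopts the opinion held by a strict majority of its neighbors and keeps its own opinion on a tie. The synchronous schedule, the tie-breaking rule, and the names `majority_step` and `run_dynamics` are assumptions made for this example; the abstract does not specify them.

```python
def majority_step(neighbors, opinions):
    """One synchronous round: each agent adopts the strict majority opinion
    of its neighbors and keeps its current opinion on a tie (assumed rule)."""
    updated = {}
    for agent, nbrs in neighbors.items():
        ones = sum(opinions[n] for n in nbrs)
        zeros = len(nbrs) - ones
        if ones > zeros:
            updated[agent] = 1
        elif zeros > ones:
            updated[agent] = 0
        else:
            updated[agent] = opinions[agent]
    return updated


def run_dynamics(neighbors, opinions, max_rounds=100):
    """Iterate until a fixed point is reached or max_rounds is exhausted."""
    for _ in range(max_rounds):
        nxt = majority_step(neighbors, opinions)
        if nxt == opinions:
            break
        opinions = nxt
    return opinions


# Hypothetical 4-agent path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final = run_dynamics(path, {0: 1, 1: 1, 2: 0, 3: 0})
```

On this small path graph the starting configuration is already stable under the assumed rule, so both opinions coexist; on a triangle with two agents holding opinion 1, the third agent is converted in one round.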

• #3060
Path Evaluation and Centralities in Weighted Graphs - An Axiomatic Approach
Social Choice, Game Theory, and Networks

We study the problem of extending the classic centrality measures to weighted graphs. Unfortunately, in the existing extensions, paths in the graph are evaluated solely based on their weights, which is a restrictive and undesirable assumption for a variety of settings. Given this, we define a notion of the path evaluation function that assesses a path between two nodes by looking not only at the sum of edge weights, but also at the number of intermediaries. Using an axiomatic approach, we propose three classes of path evaluation functions. Building upon this analysis, we present the first systematic study of how classic centrality measures can be extended to weighted graphs while taking into account an arbitrary path evaluation function. As an application, we use the newly-defined measures to identify the most well-linked districts in a sample public transport network.

• #3063
Axiomatization of the PageRank Centrality
Tomasz Wąs, Oskar Skibski
Social Choice, Game Theory, and Networks

We propose an axiomatization of PageRank. Specifically, we introduce five simple axioms (Foreseeability, Outgoing Homogeneity, Monotonicity, Merging, and Dummy Node) and show that PageRank is the only centrality measure that satisfies all of them. Our axioms give new conceptual and theoretical underpinnings of PageRank and show how it differs from other centralities.
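For readers who want a concrete reference point, the sketch below computes the standard damped PageRank by power iteration. It is only an illustration of the measure being axiomatized: the damping value, the uniform treatment of dangling nodes, and the function name `pagerank` are assumptions for this example, and the paper may work with a different variant of the definition.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration for damped PageRank on a dict: node -> list of targets."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly (a common convention).
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical 3-node directed cycle: by symmetry every node gets the same score.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
ranks = pagerank(cycle)
```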

• #3252
Combining Opinion Pooling and Evidential Updating for Multi-Agent Consensus
Chanelle Lee, Jonathan Lawry, Alan Winfield
Social Choice, Game Theory, and Networks

The evidence available to a multi-agent system can take at least two distinct forms. There can be direct evidence from the environment resulting, for example, from sensor measurements or from running tests or experiments. In addition, agents also gain evidence from other individuals in the population with whom they are interacting. We, therefore, envisage an agent's beliefs as a probability distribution over a set of hypotheses of interest, which are updated either on the basis of direct evidence using Bayesian updating, or by taking account of the probabilities of other agents using opinion pooling. This paper investigates the relationship between these two processes in a multi-agent setting. We consider a possible Bayesian interpretation of probability pooling and then explore properties for pooling operators governing the extent to which direct evidence is diluted, preserved or amplified by the pooling process. We then use simulation experiments to show that pooling operators can provide a mechanism by which a limited amount of direct evidence can be efficiently propagated through a population of agents so that an appropriate consensus is reached. In particular, we explore the convergence properties of a parameterised family of operators with a range of evidence propagation strengths.
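The two update routes described above can be sketched in a few lines. The example pairs Bayesian updating on direct evidence with an equal-weight linear opinion pool, which is one standard pooling operator and not necessarily a member of the parameterised family studied in the paper. The function names and the numbers are hypothetical.

```python
def bayes_update(belief, likelihood):
    """Posterior over hypotheses: P(h | e) is proportional to P(e | h) * P(h)."""
    unnormalized = [p * l for p, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def linear_pool(beliefs):
    """Equal-weight linear opinion pool: the hypothesis-wise average."""
    n = len(beliefs)
    return [sum(col) / n for col in zip(*beliefs)]


# Two hypothetical agents over two hypotheses; only agent A receives evidence.
agent_a = bayes_update([0.5, 0.5], likelihood=[0.8, 0.2])
agent_b = [0.5, 0.5]
pooled = linear_pool([agent_a, agent_b])
```

In this toy case agent A moves to roughly 0.8 on the first hypothesis, and pooling with the uninformed agent B pulls the shared belief back to roughly 0.65, so a plain linear pool dilutes the direct evidence.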

### Monday 16 14:55 - 16:10 PS-ML - Planning and Learning (K2)

Chair: Florent Teichteil-Koenigsbuch
• #2142
On Q-learning Convergence for Non-Markov Decision Processes
Sultan Javed Majeed, Marcus Hutter
Planning and Learning

Temporal-difference (TD) learning is an attractive, computationally efficient framework for model-free reinforcement learning. Q-learning is one of the most widely used TD learning techniques that enables an agent to learn the optimal action-value function, i.e. Q-value function. Contrary to its widespread use, Q-learning has only been proven to converge on Markov Decision Processes (MDPs) and Q-uniform abstractions of finite-state MDPs. On the other hand, most real-world problems are inherently non-Markovian: the full true state of the environment is not revealed by recent observations. In this paper, we investigate the behavior of Q-learning when applied to non-MDP and non-ergodic domains which may have infinitely many underlying states. We prove that the convergence guarantee of Q-learning can be extended to a class of such non-MDP problems, in particular, to some non-stationary domains. We show that state-uniformity of the optimal Q-value function is a necessary and sufficient condition for Q-learning to converge even in the case of infinitely many internal states.
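For reference, the standard tabular Q-learning update that the abstract builds on can be written as below. This is the textbook rule, not the paper's analysis; in a non-Markovian setting the `state` key would stand for whatever the agent observes or abstracts from its history. The function name, the learning rate `alpha`, and the discount `gamma` are choices made for this sketch.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Hypothetical two-action problem; unseen entries default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
value = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
```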

• #2441
Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains
Planning and Learning

Model-based strategies for control are critical to obtain sample efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning, by simulating one-step experience to update the action-value function. This elegant planning strategy has been mostly explored in the tabular setting. The aim of this paper is to revisit sample-based planning, in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly. We introduce a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods in learning in continuous state problems, and that the performance gap grows when moving to stochastic domains, of increasing size.

• #519
Open Loop Execution of Tree-Search Algorithms
Erwan Lecarpentier, Guillaume Infantes, Charles Lesire, Emmanuel Rachelson
Planning and Learning

In the context of tree-search stochastic planning algorithms where a generative model is available, we consider on-line planning algorithms building trees in order to recommend an action. We investigate the question of avoiding re-planning in subsequent decision steps by directly using sub-trees as action recommender. Firstly, we propose a method for open loop control via a new algorithm taking the decision of re-planning or not at each time step based on an analysis of the statistics of the sub-tree. Secondly, we show that the probability of selecting a suboptimal action at any depth of the tree can be upper bounded and converges towards zero. Moreover, this upper bound decays in a logarithmic way between subsequent depths. This leads to a distinction between node-wise optimality and state-wise optimality. Finally, we empirically demonstrate that our method achieves a compromise between loss of performance and computational gain.

• #1354
Extracting Action Sequences from Texts Based on Deep Reinforcement Learning
Wenfeng Feng, Hankz Hankui Zhuo, Subbarao Kambhampati
Planning and Learning

Extracting action sequences from texts is challenging, as it requires commonsense inferences based on world knowledge. Although there has been work on extracting action scripts, instructions, navigation actions, etc., they require either the set of candidate actions be provided in advance, or action descriptions are restricted to a specific form, e.g., description templates. In this paper we aim to extract action sequences from texts in free natural language, i.e., without any restricted templates, provided the set of actions is unknown. We propose to extract action sequences from texts based on the deep reinforcement learning framework. Specifically, we view "selecting" or "eliminating" words from texts as "actions", and texts associated with actions as "states". We build Q-networks to learn policies of extracting actions and extract plans from the labeled texts. We demonstrate the effectiveness of our approach on several datasets with comparison to state-of-the-art approaches.

• #2152
Bayesian Active Edge Evaluation on Expensive Graphs
Sanjiban Choudhury, Siddhartha Srinivasa, Sebastian Scherer
Planning and Learning

We consider the problem of real-time motion planning that requires evaluating a minimal number of edges on a graph to quickly discover collision-free paths. Evaluating edges is expensive, both for robots with complex geometries like robot arms, and for robots sensing the world online like UAVs. Until now, this challenge has been addressed via laziness, i.e. deferring edge evaluation until absolutely necessary, with the hope that edges turn out to be valid. However, all edges are not alike in value - some have a lot of potentially good paths flowing through them, and some others encode the likelihood of neighbouring edges being valid. This leads to our key insight - instead of passive laziness, we can actively choose edges that reduce the uncertainty about the validity of paths. We show that this is equivalent to the Bayesian active learning paradigm of decision region determination (DRD). However, the DRD problem is not only combinatorially hard but also requires explicit enumeration of all possible worlds. We propose a novel framework that combines two DRD algorithms, DIRECT and BISECT, to overcome both issues. We show that our approach outperforms several state-of-the-art algorithms on a spectrum of planning problems for mobile robots, manipulators and autonomous helicopters.

• #4104
Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models
Buser Say, Scott Sanner
Planning and Learning

In this paper, we leverage the efficiency of Binarized Neural Networks (BNNs) to learn complex state transition models of planning domains with discretized factored state and action spaces. In order to directly exploit this transition structure for planning, we present two novel compilations of the learned factored planning problem with BNNs based on reductions to Boolean Satisfiability (FD-SAT-Plan) as well as Binary Linear Programming (FD-BLP-Plan). Experimentally, we show the effectiveness of learning complex transition models with BNNs, and test the runtime efficiency of both encodings on the learned factored planning problem. After this initial investigation, we present an incremental constraint generation algorithm based on generalized landmark constraints to improve the planning accuracy of our encodings. Finally, we show how to extend the best performing encoding (FD-BLP-Plan+) beyond goals to handle factored planning problems with rewards.

### Monday 16 14:55 - 16:10 NLP-EMB1 - Word Embeddings (T2)

Chair: Roberto Navigli
• #952
Complementary Learning of Word Embeddings
Yan Song, Shuming Shi
Word Embeddings

Continuous bag-of-words (CB) and skip-gram (SG) models are popular approaches to training word embeddings. Conventionally they are two stand-alone techniques used individually. However, with the same goal of building embeddings by leveraging surrounding words, they are in fact a pair of complementary tasks where the output of one model can be used as input of the other, and vice versa. In this paper, we propose complementary learning of word embeddings based on the CB and SG models. Specifically, one round of learning first integrates the predicted output of an SG model with existing context, then forms an enlarged context as input to the CB model. Final models are obtained through several rounds of parameter updating. Experimental results indicate that our approach can effectively improve the quality of initial embeddings, in terms of intrinsic and extrinsic evaluations.
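The complementarity mentioned above is visible in how the two models frame their training data: skip-gram predicts context words from a center word, while continuous bag-of-words predicts the center word from its context. The sketch below only builds these training examples from a token list; it does not reproduce the paper's complementary learning procedure, and the function names and window size are assumptions.

```python
def skipgram_pairs(tokens, window=1):
    """(center, context) pairs: the center word predicts each context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(tokens, window=1):
    """(context words, center) examples: the context predicts the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


# Hypothetical three-word sentence.
sentence = ["the", "cat", "sat"]
sg = skipgram_pairs(sentence)
cb = cbow_examples(sentence)
```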

• #1736
Approximating Word Ranking and Negative Sampling for Word Embedding
Guibing Guo, Shichang Ouyang, Fajie Yuan, Xingwei Wang
Word Embeddings

CBOW (Continuous Bag-Of-Words) is one of the most commonly used techniques to generate word embeddings in various NLP tasks. However, it fails to reach the optimal performance due to uniform involvements of positive words and a simple sampling distribution of negative words. To resolve these issues, we propose OptRank to optimize word ranking and approximate negative sampling for bettering word embedding. Specifically, we first formalize word embedding as a ranking problem. Then, we weigh the positive words by their ranks such that highly ranked words have more importance, and adopt a dynamic sampling strategy to select informative negative words. In addition, an approximation method is designed to efficiently compute word ranks. Empirical experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset with different sampling scales, especially when the sampled subset is small. The code and datasets can be obtained from https://github.com/ouououououou/OptRank.

• #2265
Joint Learning Embeddings for Chinese Words and their Components via Ladder Structured Networks
Yan Song, Shuming Shi, Jing Li
Word Embeddings

The components, such as characters and radicals, of a Chinese word are important sources to help in capturing semantic information of the word. In this paper, we propose a novel framework, namely, ladder structured networks (LSN), which contains three layers representing word, character and radical and learns their embeddings synchronously. LSN captures not only the relations among words, but also the relations among their component characters and radicals, as well as the relations across layers. Each layer in LSN is pluggable so that any particular type of unit (word, character, radical) can be removed and the LSN is thus adjusted for particular types of inputs. In evaluating our framework, we use word similarity as the intrinsic evaluation and part-of-speech tagging and document classification as extrinsic evaluations. Experimental results confirm the validity of our approach and show superiority of our approach over previous work.

• #4294
Lifelong Domain Word Embedding via Meta-Learning
Hu Xu, Bing Liu, Lei Shu, Philip S. Yu
Word Embeddings

Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when performing the new domain embedding, the system has seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from the past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word in many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced from such a process improve the performance of the downstream tasks.

• #4362
Biased Random Walk based Social Regularization for Word Embeddings
Ziqian Zeng, Xin Liu, Yangqiu Song
Word Embeddings

Nowadays, people publish a lot of natural language texts on social media. Socialized word embeddings (SWE) have been proposed to deal with two phenomena of language use: everyone has his/her own personal characteristics of language use and socially connected users are likely to use language in similar ways. We observe that the spread of language use is transitive. Namely, one user can affect his/her friends and the friends can also affect their friends. However, SWE modeled the transitivity implicitly. The social regularization in SWE only applies to one-hop neighbors and thus users outside the one-hop social circle will not be affected directly. In this work, we adopt random walk methods to generate paths on the social graph to model the transitivity explicitly. Each user on a path will be affected by his/her adjacent user(s) on the path. Moreover, according to the update mechanism of SWE, the fewer friends a user has, the fewer update opportunities he/she can get. Hence, we propose a biased random walk method to provide these users with more update opportunities. Experiments show that our random walk based social regularizations perform better on sentiment classification.

• #1154
Think Globally, Embed Locally --- Locally Linear Meta-embedding of Words
Danushka Bollegala, Kohei Hayashi, Ken-ichi Kawarabayashi
Word Embeddings

Distributed word embeddings have shown superior performances in numerous Natural Language Processing (NLP) tasks. However, their performances vary significantly across different tasks, implying that the word embeddings learnt by those methods capture complementary aspects of lexical semantics. Therefore, we believe that it is important to combine the existing word embeddings to produce more accurate and complete meta-embeddings of words. For this purpose, we propose an unsupervised locally linear meta-embedding learning method that takes pre-trained word embeddings as the input, and produces more accurate meta-embeddings. Unlike previously proposed meta-embedding learning methods that learn a global projection over all words in a vocabulary, our proposed method is sensitive to the differences in local neighbourhoods of the individual source word embeddings. Moreover, we show that vector concatenation, a previously proposed highly competitive baseline approach for integrating word embeddings, can be derived as a special case of the proposed method. Experimental results on semantic similarity, word analogy, relation classification, and short-text classification tasks show that our meta-embeddings significantly outperform prior methods in several benchmark datasets, establishing a new state of the art for meta-embeddings.

### Monday 16 14:55 - 16:10 CV-IR - Image Retrieval (T1)

Chair: Xiaochun Cao
• #1453
Progressive Generative Hashing for Image Retrieval
Yuqing Ma, Yue He, Fan Ding, Sheng Hu, Jun Li, Xianglong Liu
Image Retrieval

Recent years have witnessed the success of the emerging hashing techniques in large-scale image retrieval. Owing to the great learning capacity, deep hashing has become one of the most promising solutions, and achieved attractive performance in practice. However, without semantic label information, unsupervised deep hashing still remains an open question. In this paper, we propose a novel progressive generative hashing (PGH) framework to help learn a discriminative hashing network in an unsupervised way. Very different from existing studies, it first treats the hash codes as a kind of semantic condition for the similar image generation, and simultaneously feeds the original image and its codes into the generative adversarial networks (GANs). The real images together with the synthetic ones can further help train a discriminative hashing network based on a triplet loss. By iteratively inputting the learnt codes into the hash conditioned GANs, we can progressively enable the hashing network to discover the semantic relations. Extensive experiments on the widely-used image datasets demonstrate that PGH can significantly outperform state-of-the-art unsupervised hashing methods.

• #629
Redundancy-resistant Generative Hashing for Image Retrieval
Changying Du, Xingyu Xie, Changde Du, Hao Wang
Image Retrieval

By optimizing probability distributions over discrete latent codes, Stochastic Generative Hashing (SGH) bypasses the critical and intractable binary constraints on hash codes. While encouraging results were reported, SGH still suffers from the deficient usage of latent codes, i.e., there often exist many uninformative latent dimensions in the code space, a disadvantage inherited from its auto-encoding variational framework. Motivated by the fact that code redundancy is usually more severe when a more complex decoder network is used, in this paper, we propose a constrained deep generative architecture to simplify the decoder for data reconstruction. Specifically, our new framework forces the latent hashing codes to not only reconstruct data through the generative network but also retain minimal squared L2 difference to the last real-valued network hidden layer. Furthermore, during posterior inference, we propose to regularize the standard auto-encoding objective with an additional term that explicitly accounts for the negative redundancy degree of latent code dimensions. We interpret such modifications as Bayesian posterior regularization and design an adversarial strategy to optimize the generative, the variational, and the redundancy-resisting parameters. Empirical results show that our new method can significantly boost the quality of learned codes and achieve state-of-the-art performance for image retrieval.

• #514
Dual Adversarial Networks for Zero-shot Cross-media Retrieval
Jingze Chi, Yuxin Peng
Image Retrieval

Existing cross-media retrieval methods usually require that testing categories remain the same as training categories, which cannot support the retrieval of increasing new categories. Inspired by zero-shot learning, this paper proposes zero-shot cross-media retrieval for addressing the above problem, which aims to retrieve data of new categories across different media types. It is challenging that zero-shot cross-media retrieval has to handle not only the inconsistent semantics across new and known categories, but also the heterogeneous distributions across different media types. To address the above challenges, this paper proposes Dual Adversarial Networks for Zero-shot Cross-media Retrieval (DANZCR), which is the first approach to address zero-shot cross-media retrieval to the best of our knowledge. Our DANZCR approach consists of two GANs in a dual structure for common representation generation and original representation reconstruction respectively, which capture the underlying data structures as well as strengthen relations between input data and semantic space to generalize across seen and unseen categories. Our DANZCR approach exploits word embeddings to learn common representations in semantic space via an adversarial learning method, which preserves the inherent cross-media correlation and enhances the knowledge transfer to new categories. Experiments on three widely-used cross-media retrieval datasets show the effectiveness of our approach.

• #1340
Tag-based Weakly-supervised Hashing for Image Retrieval
Ziyu Guan, Fei Xie, Wanqing Zhao, Xiaopeng Wang, Long Chen, Wei Zhao, Jinye Peng
Image Retrieval

We are concerned with using user-tagged images to learn proper hashing functions for image retrieval. The benefits are two-fold: (1) we could obtain abundant training data for deep hashing models; (2) tagging data possesses richer semantic information which could help better characterize similarity relationships between images. However, tagging data suffers from noise, vagueness and incompleteness. Different from previous unsupervised or supervised hashing learning, we propose a novel weakly-supervised deep hashing framework which consists of two stages: weakly-supervised pre-training and supervised fine-tuning. The second stage is as usual. In the first stage, rather than performing supervision on tags, the framework introduces a semantic embedding vector (sem-vector) for each image and performs learning of hashing and sem-vectors jointly. By carefully designing the optimization problem, it can well leverage tagging information and image content for hashing learning. The framework is general and does not depend on specific deep hashing methods. Empirical results on real world datasets show that when it is integrated with state-of-the-art deep hashing methods, the performance increases by 8-10%.

• #2268
Learning Deep Unsupervised Binary Codes for Image Retrieval
Junjie Chen, William K. Cheung, Anran Wang
Image Retrieval

Hashing is an efficient approximate nearest neighbor search method and has been widely adopted for large-scale multimedia retrieval. While supervised learning is more popular for data-dependent hashing, deep unsupervised hashing methods have recently been developed to learn non-linear transformations for converting multimedia inputs to binary codes. Most existing deep unsupervised hashing methods make use of a quadratic constraint for minimizing the difference between the compact representations and the target binary codes, which inevitably causes severe information loss. In this paper, we propose a novel deep unsupervised method called DeepQuan for hashing. The DeepQuan model utilizes a deep autoencoder network, where the encoder is used to learn compact representations and the decoder is for manifold preservation. In contrast with the existing unsupervised methods, DeepQuan learns the binary codes by minimizing the quantization error through the product quantization technique. Furthermore, a weighted triplet loss is proposed to avoid trivial solutions and poor generalization. Extensive experimental results on standard datasets show that the proposed DeepQuan model outperforms the state-of-the-art unsupervised hashing methods for image retrieval tasks.

• #3101
Hierarchical Graph Structure Learning for Multi-View 3D Model Retrieval
Yuting Su, Wenhui Li, Anan Liu, Weizhi Nie
Image Retrieval

3D model retrieval has been widely utilized in numerous domains, such as computer-aided design, digital entertainment and virtual reality. Recently, many graph-based methods have been proposed to address this task by using multiple views of 3D models. However, these methods are always constrained by the many-to-many graph matching for similarity measure between pair-wise models. In this paper, we propose a hierarchical graph structure learning method (HGS) for 3D model retrieval. The proposed method can decompose the complicated multi-view graph-based similarity measure into multiple single-view graph-based similarity measures. In the bottom hierarchy, we present the method for single-view graph generation and further propose the novel method for similarity measure in single-view graph by leveraging both node-wise context and model-wise context. In the top hierarchy, we fuse the similarities in single-view graphs with respect to different viewpoints to get the multi-view similarity between pair-wise models. In this way, the proposed method can avoid the difficulty in definition and computation in the traditional high-order graph. Moreover, this method is unsupervised and is independent of large-scale 3D dataset for model learning. We conduct extensive evaluation on three popular and challenging datasets. The comparison demonstrates the superiority and effectiveness of the proposed method compared with the state of the art. In particular, this unsupervised method can achieve competitive performance against the most recent supervised deep learning methods.

### Monday 16 14:55 - 16:10 MUL-SE - AI and Software Engineering, Program Synthesis (C2)

Chair: Giuseppe de Giacomo
• #1569
Code Completion with Neural Attention and Pointer Networks
Jian Li, Yue Wang, Michael R. Lyu, Irwin King
AI and Software Engineering, Program Synthesis

Intelligent code completion has become an essential research task to accelerate modern software development. To facilitate effective code completion for dynamically-typed programming languages, we apply neural language models by learning from large codebases, and develop a tailored attention mechanism for code completion. However, standard neural language models even with attention mechanism cannot correctly predict the out-of-vocabulary (OoV) words that restrict the code completion performance. In this paper, inspired by the prevalence of locally repeated terms in program source code, and the recently proposed pointer copy mechanism, we propose a pointer mixture network for better predicting OoV words in code completion. Based on the context, the pointer mixture network learns to either generate a within-vocabulary word through an RNN component, or regenerate an OoV word from local context through a pointer component. Experiments on two benchmarked datasets demonstrate the effectiveness of our attention mechanism and pointer mixture network on the code completion task.

• #3177
Positive and Unlabeled Learning for Detecting Software Functional Clones with Adversarial Training
Hui-Hui Wei, Ming Li
AI and Software Engineering, Program Synthesis

Software clone detection is an important problem for software maintenance and evolution and it has attracted a lot of attention. However, existing approaches ignore the fact that people would label pairs of code fragments as clones only if they happen to discover the clones, while a huge number of undiscovered clone pairs and non-clone pairs are left unlabeled. In this paper, we argue that the clone detection task in the real world should be formalized as a Positive-Unlabeled (PU) learning problem, and address this problem by proposing a novel positive and unlabeled learning approach, namely CDPU, to effectively detect software functional clones, i.e., pieces of code with similar functionality but differing at both the syntactical and lexical levels, where adversarial training is employed to improve the robustness of the learned model to those non-clone pairs that look extremely similar but behave differently. Experiments on software clone detection benchmarks indicate that the proposed approach together with adversarial training outperforms the state-of-the-art approaches for software functional clone detection.

• #3097
Deontic Sensors
Julian Padget, Marina De Vos, Charlie Ann Page
AI and Software Engineering, Program Synthesis

Normative capabilities in multi-agent systems (MAS) can be represented within agents, separately as institutions, or a blend of the two. This paper addresses how to extend the principles of open MAS to the provision of normative reasoning capabilities, which are currently either embedded in existing MAS platforms - tightly coupled and inaccessible - or not present. We use a resource-oriented architecture (ROA) pattern, that we call deontic sensors, to make normative reasoning part of an open MAS architecture. The pattern specifies how to loosely couple MAS and normative frameworks, such that each is agnostic of the other, while augmenting the brute facts that an agent perceives with institutional facts, that capture each institution's interpretation of an agent's action. In consequence, a MAS without normative capabilities can acquire them, and an embedded normative framework can be de-coupled and opened to other MAS platforms. More importantly, the deontic sensor pattern allows normative reasoning to be published as services, opening routes to certification and re-use, creation of (formalized) trust and non-specialist access to "on demand" normative reasoning.

• #3178
Cutting the Software Building Efforts in Continuous Integration by Semi-Supervised Online AUC Optimization
Zheng Xie, Ming Li
AI and Software Engineering, Program Synthesis

Continuous Integration (CI) systems aim to provide quick feedback on the success of code changes by rebuilding the entire system whenever code changes are committed. However, building the entire software system is usually resource- and time-consuming. Thus, build outcome prediction is usually employed to distinguish the successful builds from the failed ones, cutting the building effort on those successful builds that do not result in any immediate action by the developer. Nevertheless, build outcome prediction in CI is challenging since the learner should be able to learn from a stream of build events with and without build outcome labels and provide an immediate prediction for the next build event. Also, the distribution of successful and failed builds is often highly imbalanced. Unfortunately, the existing methods fail to address these challenges well. In this paper, we address these challenges by proposing a semi-supervised online AUC optimization method for CI build outcome prediction. Experiments indicate that our method is able to cut the software building effort by effectively identifying the successful builds, and that it outperforms the existing methods, which address only part of these challenges.
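
For readers unfamiliar with the metric being optimized, the following sketch computes AUC as the fraction of correctly ordered (positive, negative) pairs, which is why it is insensitive to class imbalance. It illustrates only the metric, not the semi-supervised online method of the paper; the function name `auc` and the convention that label 1 marks the class of interest are assumptions.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Toy build-outcome scores (assumed values): three of four pairs are ordered correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```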

• #5133
(Sister Conferences Best Papers Track) Counterexample-Driven Genetic Programming: Stochastic Synthesis of Provably Correct Programs
Krzysztof Krawiec, Iwo Błądek, Jerry Swan, John H. Drake
AI and Software Engineering, Program Synthesis

Genetic programming is an effective technique for inductive synthesis of programs from tests, i.e. training examples of desired input-output behavior. Programs synthesized in this way are not guaranteed to generalize beyond the training set, which is unacceptable in many applications. We present Counterexample-Driven Genetic Programming (CDGP) that employs evolutionary search to synthesize provably correct programs from formal specifications. CDGP employs a Satisfiability Modulo Theories (SMT) solver to formally verify programs in the evaluation phase. A failed verification produces counterexamples that are in turn used to calculate fitness and thereby drive the search process. When compared with a range of approaches on a suite of state-of-the-art specification-based synthesis benchmarks, CDGP systematically outperforms them, typically synthesizing correct programs faster and using fewer tests.
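
The counterexample-driven loop described above can be sketched in a few lines under strong simplifications. This is not CDGP itself: a bounded exhaustive check stands in for the SMT solver, a fixed list of hypothetical candidates stands in for evolutionary search, and the target specification (the maximum of two integers) is chosen only for illustration.

```python
from itertools import product

def spec(x, y, out):
    """Formal property of max: out bounds both inputs and equals one of them."""
    return out >= x and out >= y and out in (x, y)

def verify(program, domain=range(-3, 4)):
    """Stand-in for an SMT check: return a counterexample (x, y), or None
    if the property holds on every input pair in the bounded domain."""
    for x, y in product(domain, repeat=2):
        if not spec(x, y, program(x, y)):
            return (x, y)
    return None

# Hypothetical candidate programs; a GP system would evolve these instead.
CANDIDATES = [
    ("x", lambda x, y: x),
    ("y", lambda x, y: y),
    ("x + y", lambda x, y: x + y),
    ("x if x > y else y", lambda x, y: x if x > y else y),
]

def fitness(program, tests):
    """Number of collected counterexample inputs on which the program meets the spec."""
    return sum(spec(x, y, program(x, y)) for x, y in tests)

def synthesize():
    """Pick the fittest candidate, verify it, and turn each failed
    verification into a new test. Terminates here because one candidate
    satisfies the spec and each counterexample demotes the candidate it refutes."""
    tests = []
    while True:
        name, program = max(CANDIDATES, key=lambda c: fitness(c[1], tests))
        counterexample = verify(program)
        if counterexample is None:
            return name, tests
        tests.append(counterexample)

name, tests = synthesize()
print(name, tests)
```

The test set starts empty and grows only with inputs that refuted an earlier candidate, which mirrors how counterexamples drive the fitness signal in the abstract.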

• #2224
Synthesizing Pattern Programs from Examples
Sunbeom So, Hakjoo Oh
AI and Software Engineering, Program Synthesis

We describe a programming-by-example system that automatically generates pattern programs from examples. Writing pattern programs, which produce various patterns of characters, is one of the most popular programming exercises for entry-level students. However, students often find it difficult to write correct solutions by themselves. In this paper, we present a method for synthesizing pattern programs from examples, allowing students to improve their programming skills efficiently. To that end, we first design a domain-specific language that supports a large class of pattern programs that students struggle with. Next, we develop a synthesis algorithm that efficiently finds a desired program by combining enumerative search, constraint solving, and program analysis. We implemented the algorithm in a tool and evaluated it on 40 exercises gathered from online forums. The experimental results and user study show that our tool can synthesize instructive solutions from 1–3 example patterns in 1.2 seconds on average.

### Monday 16, 14:55 - 16:10, UAI-GM - Graphical Models (C3)

Chair: Karthika Mohan
• #776
Learning with Adaptive Neighbors for Image Clustering
Yang Liu, Quanxue Gao, Zhaohua Yang, Shujian Wang
Graphical Models

Due to the importance and efficiency of learning complex structures hidden in data, graph-based methods have been widely studied and have been successful in unsupervised learning. Generally, most existing graph-based clustering methods require post-processing on the original data graph to extract the clustering indicators. However, there are two drawbacks with these methods: (1) the cluster structures are not explicit in the clustering results; (2) the final clustering performance is sensitive to the construction of the original data graph. To solve these problems, in this paper, a novel learning model is proposed to learn a graph based on the given data graph such that the newly obtained optimal graph is more suitable for the clustering task. We also propose an efficient algorithm to solve the model. Extensive experimental results illustrate that the proposed model outperforms other state-of-the-art clustering algorithms.

• #1636
Markov Random Neural Fields for Face Sketch Synthesis
Mingjin Zhang, Nannan Wang, Xinbo Gao, Yunsong Li
Graphical Models

Synthesizing face sketches with both common and specific information from photos has recently attracted considerable attention in digital entertainment. However, the existing approaches either make a strict similarity assumption on face sketches and photos, leading to the loss of some identity-specific information, or learn the direct mapping from face photos to sketches with a simple neural network, resulting in the lack of some common information. In this paper, we propose a novel face sketch synthesis method based on Markov random neural fields, which comprise two structures. In the first structure, we utilize the neural network to learn the non-linear photo-sketch relationship and obtain the identity-specific information of the test photo, such as glasses, hairpins and hairstyles. In the second structure, we choose the nearest neighbors of the test photo patch and the sketch pixel synthesized in the first structure from the training data, which ensures the common information of Miss or Mr Average. Experimental results on the Chinese University of Hong Kong face sketch database illustrate that our proposed framework can preserve the common structure and capture the characteristic features. Compared with the state-of-the-art methods, our method achieves better results in terms of both quantitative and qualitative experimental evaluations.

• #4164
Where Have You Been? Inferring Career Trajectory from Academic Social Network
Kan Wu, Jie Tang, Chenhui Zhang
Graphical Models

A person’s career trajectory is composed of her/his past work or educational affiliations (institutions) at different points in time. Knowing people’s, especially scholars’, career trajectories can help the government make more scientific strategies to allocate resources and attract talent and help companies make smart recruiting plans. It could also help individuals find appropriate co-researchers or job opportunities. The paper focuses on inferring career trajectories in the academic social network. For the roughly 1/3 of authors who have no affiliations in the dataset, we need to infer the missing affiliations for various years. Traditional affiliation/location inferring methods focus on inferring a stationary location (one and only) for a person. Nevertheless, people won’t stay at one place all their lives. We propose a Space-Time Factor Graph Model (STFGM) incorporating spatial and temporal correlations to fulfill the new and challenging task of inferring temporal locations. Experiments show our approach significantly outperforms baselines. Finally, as a case study, we develop several applications based on our approach which further demonstrate its effectiveness.

• #4359
Structured Inference for Recurrent Hidden Semi-markov Model
Hao Liu, Lirong He, Haoli Bai, Bo Dai, Kun Bai, Zenglin Xu
Graphical Models

Segmentation and labeling for high dimensional time series is an important yet challenging task in a number of applications, such as behavior understanding and medical diagnosis. Recent advances in modeling the nonlinear dynamics in such time series data have suggested incorporating recurrent neural networks into Hidden Markov Models. However, this incorporation has made the inference procedure much more complicated, often leading to intractable inference, especially for the discrete variables of segmentation and labeling. To achieve both flexibility and tractability in modeling nonlinear dynamics of discrete variables, we present a structured and stochastic sequential neural network (SSNN), which is composed of a generative network and an inference network. In detail, the generative network aims to not only capture the long-term dependencies but also model the uncertainty of the segmentation labels via semi-Markov models. More importantly, for efficient and accurate inference, the proposed bi-directional inference network reparameterizes the categorical segmentation with the Gumbel-Softmax approximation and resorts to the Stochastic Gradient Variational Bayes. We evaluate the proposed model in a number of tasks, including speech modeling, automatic segmentation and labeling in behavior understanding, and sequential multi-objects recognition. Experimental results have demonstrated that our proposed model can achieve significant improvement over the state-of-the-art methods.
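
The Gumbel-Softmax approximation mentioned in the abstract can be sketched as follows: add Gumbel noise to the category logits, divide by a temperature, and apply a softmax to obtain a relaxed one-hot sample. This is a generic illustration of the relaxation, not the SSNN inference network; the function name, the logits, and the temperature are assumptions.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)  # seeded only to make the toy run repeatable
sample = gumbel_softmax([1.0, 0.2, -0.5], temperature=0.5, rng=rng)
print(sample, sum(sample))
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it smoother; in a differentiable framework the same expression lets gradients flow through the sampled segmentation variable.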

• #1627
Patent Litigation Prediction: A Convolutional Tensor Factorization Approach
Qi Liu, Han Wu, Yuyang Ye, Hongke Zhao, Chuanren Liu, Dongfang Du
Graphical Models

Patent litigation is an expensive legal process faced by many companies. To reduce the cost of patent litigation, one effective approach is proactive management based on predictive analysis. However, automatic prediction of patent litigation is still an open problem due to the complexity of lawsuits. In this paper, we propose a data-driven framework, Convolutional Tensor Factorization (CTF), to identify the patents that may cause litigation between two companies. Specifically, CTF is a hybrid modeling approach, where the content features from the patents are represented by the Network embedding-combined Convolutional Neural Network (NCNN) and the lawsuit records of companies are summarized in a tensor. Then, CTF integrates NCNN and tensor factorization to systematically exploit both content information and collaborative information from large amounts of data. Finally, the risky patents are returned by a learning-to-rank strategy. Extensive experimental results on real-world data demonstrate the effectiveness of our framework.

• #4153
Building Sparse Deep Feedforward Networks using Tree Receptive Fields
Xiaopeng Li, Zhourong Chen, Nevin L. Zhang
Graphical Models

Sparse connectivity is an important factor behind the success of convolutional neural networks and recurrent neural networks. In this paper, we consider the problem of learning sparse connectivity for feedforward neural networks (FNNs). The key idea is that a unit should be connected to a small number of units at the next level below that are strongly correlated. We use Chow-Liu's algorithm to learn a tree-structured probabilistic model for the units at the current level, use the tree to identify subsets of units that are strongly correlated, and introduce a new unit with receptive field over the subsets. The procedure is repeated on the new units to build multiple layers of hidden units. The resulting model is called a TRF-net. Empirical results show that, when compared to dense FNNs, TRF-net achieves better or comparable classification performance with far fewer parameters and sparser structures. TRF-nets are also more interpretable.
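
The tree-learning step can be sketched as follows, assuming discrete unit activations: Chow-Liu's algorithm scores every pair of variables by empirical mutual information and keeps a maximum-weight spanning tree. The sketch covers only the tree structure, not the construction of receptive fields or the TRF-net itself; the function names and toy data are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def chow_liu_tree(columns):
    """Return the edges (i, j) of a maximum-weight spanning tree, where the
    weight of an edge is the mutual information between columns i and j."""
    k = len(columns)
    weighted = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(k), 2)),
        reverse=True,
    )
    parent = list(range(k))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    edges = []
    for _, i, j in weighted:  # Kruskal's algorithm, heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Toy data (assumed): columns 0 and 1 are identical, as are columns 2 and 3.
a = [0, 0, 1, 1, 0, 1, 0, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
print(chow_liu_tree([a, list(a), c, list(c)]))
```

Strongly correlated units end up adjacent in the tree, which is the property the abstract relies on when grouping units into receptive fields.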

### Monday 16, 14:55 - 16:25, ML-TAM1 - Transfer, Adaptation, Multi-Task Learning 1 (K11)

Chair: Yu-Feng Li
• #2483
MUSCAT: Multi-Scale Spatio-Temporal Learning with Application to Climate Modeling
Jianpeng Xu, Xi Liu, Tyler Wilson, Pang-Ning Tan, Pouyan Hatami, Lifeng Luo

In climate and environmental sciences, vast amounts of spatio-temporal data have been generated at varying spatial resolutions from satellite observations and computer models. Integrating such diverse sources of data has proven to be useful for building prediction models as the multi-scale data may capture different aspects of the Earth system. In this paper, we present a novel framework called MUSCAT for predictive modeling of multi-scale, spatio-temporal data. MUSCAT performs a joint decomposition of multiple tensors from different spatial scales, taking into account the relationships between the variables. The latent factors derived from the joint tensor decomposition are used to train the spatial and temporal prediction models at different scales for each location. The outputs from this ensemble of spatial and temporal models are aggregated to generate future predictions. An incremental learning algorithm is also proposed to handle the massive size of the tensors. Experimental results on real-world data from the United States Historical Climate Network (USHCN) showed that MUSCAT outperformed other competing methods in more than 70% of the locations.

• #2306
Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss
Qi Dou, Cheng Ouyang, Cheng Chen, Hao Chen, Pheng-Ann Heng

Convolutional networks (ConvNets) have achieved great successes in various challenging vision tasks. However, the performance of ConvNets degrades when they encounter domain shift. Domain adaptation is more significant, yet more challenging, in the field of biomedical image analysis, where cross-modality data have largely different distributions. Given that annotating medical data is especially expensive, supervised transfer learning approaches are not quite optimal. In this paper, we propose an unsupervised domain adaptation framework with adversarial learning for cross-modality biomedical image segmentation. Specifically, our model is based on a dilated fully convolutional network for pixel-wise prediction. Moreover, we build a plug-and-play domain adaptation module (DAM) to map the target input to features which are aligned with the source domain feature space. A domain critic module (DCM) is set up for discriminating the feature space of both domains. We optimize the DAM and DCM via an adversarial loss without using any target domain label. Our proposed method is validated by adapting a ConvNet trained with MRI images to unpaired CT data for cardiac structure segmentation, and achieves very promising results.

• #2818
Distance Metric Facilitated Transportation between Heterogeneous Domains
Han-Jia Ye, Xiang-Rong Sheng, De-Chuan Zhan, Peng He

Lacking training examples is one of the main obstacles to learning systems. Transfer learning aims to extract and utilize useful information from related datasets to assist the current task effectively. Most existing methods restrict task connections to the same feature sets, require aligned examples across domains, or cannot take full advantage of the limited label information. In this paper, we focus on transferring between heterogeneous domains, i.e., those with different feature spaces, and propose the Metric Transportation on HEterogeneous REpresentations (MapHere) approach. In particular, an asymmetric transformation map is first learned to compensate for the cross-domain feature difference based on the linkage relationship between objects; then the inner-domain discrepancy is further reduced with learned optimal transportation. Note that both source domain and cross-domain relationships are fully utilized in MapHere, which helps improve the target classification task. Experiments on a synthetic dataset validate the importance of the "metric facilitated" consideration, while results on real-world image and text classification also show the superiority of the proposed MapHere approach.

• #4096
Summarizing Source Code with Transferred API Knowledge
Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, Zhi Jin

Code summarization, which aims to generate succinct natural language descriptions of source code, is extremely useful for code search and code comprehension. It has played an important role in software maintenance and evolution. Previous approaches generate summaries by retrieving summaries from similar code snippets. However, these approaches rely heavily on whether similar code snippets can be retrieved and on how similar the snippets are, and they fail to capture the API knowledge in the source code, which carries vital information about the functionality of the source code. In this paper, we propose a novel approach, named TL-CodeSum, which successfully uses API knowledge learned in a different but related task for code summarization. Experiments on large-scale real-world industry Java projects indicate that our approach is effective and outperforms the state-of-the-art in code summarization.

• #1674
Multi-Task Clustering with Model Relation Learning
Xiaotong Zhang, Xianchao Zhang, Han Liu, Jiebo Luo

• #4385
Semi-Supervised Optimal Transport for Heterogeneous Domain Adaptation
Yuguang Yan, Wen Li, Hanrui Wu, Huaqing Min, Mingkui Tan, Qingyao Wu

Heterogeneous domain adaptation (HDA) aims to exploit knowledge from a heterogeneous source domain to improve the learning performance in a target domain. Since the feature spaces of the source and target domains are different, the transferring of knowledge is extremely difficult. In this paper, we propose a novel semi-supervised algorithm for HDA by exploiting the theory of optimal transport (OT), a powerful tool originally designed for aligning two different distributions. To match the samples between heterogeneous domains, we propose to preserve the semantic consistency between heterogeneous domains by incorporating label information into the entropic Gromov-Wasserstein discrepancy, which is a metric in OT for different metric spaces, resulting in a new semi-supervised scheme. Via the new scheme, the target and transported source samples with the same label are enforced to follow similar distributions. Lastly, based on the Kullback-Leibler metric, we develop an efficient algorithm to optimize the resultant problem. Comprehensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed method.

• #4027
Social Media based Simulation Models for Understanding Disease Dynamics
Ting Hua, Chandan K Reddy, Lei Zhang, Lijing Wang, Liang Zhao, Chang-Tien Lu, Naren Ramakrishnan

In this modern era, infectious diseases, such as H1N1, SARS, and Ebola, are spreading much faster than at any time in history. Efficient approaches are therefore desired to monitor and track the diffusion of these deadly epidemics. Traditional computational epidemiology models are able to capture disease spreading trends through contact networks, but are unable to provide timely updates via real-world data. In contrast, techniques focusing on emerging social media platforms can collect and monitor real-time disease data, but do not provide an understanding of the underlying dynamics of ailment propagation. To achieve efficient and accurate real-time disease prediction, the framework proposed in this paper combines the strengths of social media mining and computational epidemiology. Specifically, individual health status is first learned from users' online posts through Bayesian inference, disease parameters are then extracted for the computational models at the population level, and the outputs of the computational epidemiology model are fed back into social media data based models for further performance improvement. In various experiments, our proposed model outperforms current disease forecasting approaches with better accuracy and more stability.

### Monday 16, 14:55 - 16:35, EAR2 - Early Career 2 (VICTORIA)

Chair: Makoto Yokoo
• #5479
Towards Improving the Expressivity and Scalability of Distributed Constraint Optimization Problems
William Yeoh
Early Career 2

Constraints have long been studied in centralized systems and have proven to be practical and efficient for modeling and solving resource allocation and scheduling problems. Slightly more than a decade ago, researchers proposed the distributed constraint optimization problem (DCOP) formulation, which is well suited for modeling distributed multi-agent coordination problems. In this paper, we highlight some of our recent contributions that are aiming towards improved expressivity of the DCOP model as well as improved scalability of the accompanying algorithms.

• #5493
Formal Analysis of Deep Binarized Neural Networks
Nina Narodytska
Early Career 2

Understanding properties of deep neural networks is an important challenge in deep learning. Deep learning networks are among the most successful artificial intelligence technologies and are making an impact in a variety of practical applications. However, many concerns have been raised about the "magical" power of these networks. It is disturbing that we really lack an understanding of the decision making process behind this technology. Therefore, a natural question is whether we can trust decisions that neural networks make. One way to address this issue is to define properties that we want a neural network to satisfy. Verifying whether a neural network fulfills these properties sheds light on the properties of the function that it represents. In this work, we take the verification approach. Our goal is to design a framework for analysis of properties of neural networks. We start by defining a set of interesting properties to analyze. Then we focus on Binarized Neural Networks that can be represented and analyzed using well-developed means of Boolean Satisfiability and Integer Linear Programming. One of our main results is an exact representation of a binarized neural network as a Boolean formula. We also discuss how we can take advantage of the structure of neural networks in the search procedure.

• #5484
Emmanuel Hebrard
Early Career 2

The concept of local consistency – making global deductions from local infeasibility – is central to constraint programming. When reasoning about NP-complete constraints, however, since achieving a "complete" form of local consistency is often considered too hard, we need other tools to design and analyze propagation algorithms. In this paper, we argue that NP-complete constraints are an essential part of constraint programming, that designing dedicated methods has led to, and will bring, significant breakthroughs, and that we need to carefully investigate methods to deal with a necessarily incomplete inference. In particular, we advocate the use of fixed-parameter tractability and kernelization for this purpose.

### Monday 16, 16:40 - 17:55, EAR3 - Early Career 3 (VICTORIA)

Chair: Pierre Marquis
• #5485
Natural Language Understanding: Instructions for (Present and Future) Use
Roberto Navigli
Early Career 3

In this paper I look at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what do we expect a machine should be able to understand? and what are the key dimensions that require the attention of researchers to make this dream come true?

• #5491
Improving Data Management using Domain Knowledge
Magdalena Ortiz
Early Career 3

The development of tools and techniques for flexible and reliable data management is a long-standing challenge, ever more pressing in today’s data-rich world. We advocate using domain knowledge expressed in ontologies to tackle it, and summarize some research efforts to this aim that follow two directions. First, we consider the problem of ontology-mediated query answering (OMQA), where queries in a standard database query language are enriched with an ontology expressing background knowledge about the domain of interest, used to retrieve more complete answers when querying incomplete data. We discuss some of our contributions to OMQA, focusing on (i) expressive languages for OMQA, with emphasis on combining the open- and closed-world assumptions to reason about partially complete data; and (ii) OMQA algorithms based on rewriting techniques. The second direction we discuss proposes to use ontologies to manage evolving data. In particular, we use ontologies to model and reason about constraints on datasets, effects of operations that modify data, and the integrity of the data as it evolves.

• #5483
Artificial Argumentation for Humans
Serena Villata
Early Career 3

Recent years have seen an increasing interest in the topic of Artificial Intelligence (AI), the challenges it is facing, and the recent advances it has achieved, e.g., intelligent personal assistants. Unlike in the past, when research on AI was mainly confined to research labs, the topic is now attracting interest from a wider audience, including policy-makers, information technology companies, and philosophers. Alas, these advances have also raised a number of concerns about AI’s social, economic, and legal impact. Hence, the definition of design principles and automated methods to support transparent intelligent machine deliberation is highly desirable. Argumentation is important for handling conflicting beliefs, assumptions, opinions, goals, and many other mental attitudes. Argumentation pervades human intelligent behavior, and I believe that it is a mandatory element to conceive autonomous artificial machines that can exploit argumentation models and tools in the cognitive tasks they are required to carry out. Results in this area will help reduce the gap between humans and machines, towards a good AI hybrid society.

### Monday 16, 16:40 - 18:05, MUL-CAU - Causality (C7)

Chair: Tommie Meyer
• #286
Actual Causality in a Logical Setting
Alexander Bochman
Causality

We provide a definition of actual causation in the logical framework of the causal calculus, which is based on a causal version of the well-known NESS (or INUS) condition. We compare our definition with other, mainly counterfactual, approaches on standard examples. On the way, we explore general capabilities of the logical representation for structural equation models of causation and beyond.

• #1571
Causal Inference in Time Series via Supervised Learning
Yoichi Chikahara, Akinori Fujino
Causality

Causal inference in time series is an important problem in many fields. Traditional methods use regression models for this problem. The inference accuracies of these methods depend greatly on whether or not the model can be well fitted to the data, and therefore we are required to select an appropriate regression model, which is difficult in practice. This paper proposes a supervised learning framework that utilizes a classifier instead of regression models. We present a feature representation that employs the distance between the conditional distributions given past variable values and show experimentally that the feature representation provides sufficiently different feature vectors for time series with different causal relationships. Furthermore, we extend our framework to multivariate time series and present experimental results where our method outperformed the model-based methods and the supervised learning method for i.i.d. data.

• #2249
Counterfactual Resimulation for Causal Analysis of Rule-Based Models
Jonathan Laurent, Jean Yang, Walter Fontana
Causality

Models based on rules that express local and heterogeneous mechanisms of stochastic interactions between structured agents are an important tool for investigating the dynamical behavior of complex systems, especially in molecular biology. Given a simulated trace of events, the challenge is to construct a causal diagram that explains how a phenomenon of interest occurred. Counterfactual analysis can provide distinctive insights, but its standard definition is not applicable in rule-based models because they are not readily expressible in terms of structural equations. We provide a semantics of counterfactual statements that addresses this challenge by sampling counterfactual trajectories that are probabilistically as close to the factual trace as a given intervention permits them to be. We then show how counterfactual dependencies give rise to explanations in terms of relations of enablement and prevention between events.

• #2748
From the Periphery to the Core: Information Brokerage in an Evolving Network
Bo Yan, Yiping Liu, Jiamou Liu, Yijin Cai, Hongyi Su, Hong Zheng
Causality

Interpersonal ties are pivotal to individual efficacy, status and performance in an agent society. This paper explores three important and interrelated themes in social network theory: the center/periphery partition of the network; network dynamics; and social integration of newcomers. We tackle the question: How would a newcomer harness information brokerage to integrate into a dynamic network going from periphery to center? We model integration as the interplay between the newcomer and the dynamic network and capture information brokerage using a process of relationship building. We analyze theoretical guarantees for the newcomer to reach the center through tactics, proving that a winning tactic always exists for certain types of network dynamics. We then propose three tactics and show their superior performance over alternative methods on four real-world datasets and four network models. In general, our tactics place the newcomer at the center by adding very few new edges on dynamic networks with ~14000 nodes.

• #4010
Scalable Probabilistic Causal Structure Discovery
Dhanya Sridhar, Jay Pujara, Lise Getoor
Causality

Complex causal networks underlie many real-world problems, from the regulatory interactions between genes to the environmental patterns used to understand climate change. Computational methods seek to infer these causal networks using observational data and domain knowledge. In this paper, we identify three key requirements for inferring the structure of causal networks for scientific discovery: (1) robustness to noise in observed measurements; (2) scalability to handle hundreds of variables; and (3) flexibility to encode domain knowledge and other structural constraints. We first formalize the problem of joint probabilistic causal structure discovery. We develop an approach using probabilistic soft logic (PSL) that exploits multiple statistical tests, supports efficient optimization over hundreds of variables, and can easily incorporate structural constraints, including imperfect domain knowledge. We compare our method against multiple well-studied approaches on biological and synthetic datasets, showing improvements of up to 20% in F1-score over the best performing baseline in realistic settings.

• #4239
A Graphical Criterion for Effect Identification in Equivalence Classes of Causal Diagrams
Amin Jaber, Jiji Zhang, Elias Bareinboim
Causality

Computing the effects of interventions from observational data is an important task encountered in many data-driven sciences. The problem is addressed by identifying the post-interventional distribution with an expression that involves only quantities estimable from the pre-interventional distribution over observed variables, given some knowledge about the causal structure. In this work, we relax the requirement of having a fully specified causal structure and study the identifiability of effects with a singleton intervention (X), supposing that the structure is known only up to an equivalence class of causal diagrams, which is the output of standard structural learning algorithms (e.g., FCI). We derive a necessary and sufficient graphical criterion for the identifiability of the effect of X on all observed variables. We further establish a sufficient graphical criterion to identify the effect of X on a subset of the observed variables, and prove that it is strictly more powerful than the current state-of-the-art result on this problem.

• #4388
On the Conditional Logic of Simulation Models
Duligur Ibeling, Thomas Icard
Causality

We propose analyzing conditional reasoning by appeal to a notion of intervention on a simulation program, formalizing and subsuming a number of approaches to conditional thinking in the recent AI literature. Our main results include a series of axiomatizations, allowing comparison between this framework and existing frameworks (normality-ordering models, causal structural equation models), and a complexity result establishing NP-completeness of the satisfiability problem. Perhaps surprisingly, some of the basic logical principles common to all existing approaches are invalidated in our causal simulation approach. We suggest that this additional flexibility is important in modeling some intuitive examples.

### Monday 16 16:40 - 18:05 MAS-SOC - Computational Social Choice (C8)

Chair: Felix Brandt
• #1614
Preference Orders on Families of Sets - When Can Impossibility Results Be Avoided?
Jan Maly, Miroslaw Truszczynski, Stefan Woltran
Computational Social Choice

Lifting a preference order on elements of some universe to a preference order on subsets of this universe is often guided by postulated properties the lifted order should have. Well-known impossibility results pose severe limits on when such liftings exist if all non-empty subsets of the universe are to be ordered. The extent to which these negative results carry over to other families of sets is not known. In this paper, we consider families of sets that induce connected subgraphs in graphs. For such families, common in applications, we study whether lifted orders satisfying the well-studied axioms of dominance and (strict) independence exist for every or, in another setting, only for some underlying order on elements (strong and weak orderability). We characterize families that are strongly and weakly orderable under dominance and strict independence, and obtain a tight bound on the class of families that are strongly orderable under dominance and independence.

• #2061
Service Exchange Problem
Julien Lesca, Taiki Todo
Computational Social Choice

In this paper, we study the service exchange problem where each agent is willing to provide her service in order to receive in exchange the service of someone else. We assume that an agent's preference depends both on the service that she receives and on the person who receives her service. This framework is an extension of the housing market problem to preferences including a degree of externalities. We investigate the complexity of computing an individually rational and Pareto efficient allocation of services to agents for ordinal preferences, and the complexity of computing an allocation which maximizes either the utility sum or the utility of the least served agent for cardinal preferences.

• #2086
Computational Social Choice Meets Databases
Benny Kimelfeld, Phokion G. Kolaitis, Julia Stoyanovich
Computational Social Choice

We develop a novel framework that aims to create bridges between the computational social choice and the database management communities. This framework enriches the tasks currently supported in computational social choice with relational database context, thus making it possible to formulate sophisticated queries about voting rules, candidates, voters, issues, and positions. At the conceptual level, we give rigorous semantics to queries in this framework by introducing the notions of necessary answers and possible answers to queries. At the technical level, we embark on an investigation of the computational complexity of the necessary answers. In particular, we establish a number of results about the complexity of the necessary answers of conjunctive queries involving the plurality rule that contrast sharply with earlier results about the complexity of the necessary winners under the plurality rule.

• #2922
When Rigging a Tournament, Let Greediness Blind You
Sushmita Gupta, Sanjukta Roy, Saket Saurabh, Meirav Zehavi
Computational Social Choice

A knockout tournament is a standard format of competition, ubiquitous in sports, elections and decision making. Such a competition consists of several rounds. In each round, all players that have not yet been eliminated are paired up into matches. Losers are eliminated, and winners advance to the next round, until only one winner remains. Given that we can correctly predict the outcome of each potential match (modelled by a tournament D), a seeding of the tournament deterministically determines its winner. Having a favorite player v in mind, the Tournament Fixing Problem (TFP) asks whether there exists a seeding that makes v the winner. Aziz et al. [AAAI’14] showed that TFP is NP-hard. They initiated the study of the parameterized complexity of TFP with respect to the feedback arc set number k of D, and gave an XP-algorithm (which is highly inefficient). Recently, Ramanujan and Szeider [AAAI’17] showed that TFP admits an FPT algorithm, running in time $2^{O(k^2 \log k)} n^{O(1)}$. At the heart of this algorithm is a translation of TFP into an algebraic system of equations, solved in a black box fashion (by an ILP solver). We present a fresh, purely combinatorial greedy solution. We rely on new insights into TFP itself, which also result in the better running time bound of $2^{O(k \log k)} n^{O(1)}$. While our analysis is intricate, the algorithm itself is surprisingly simple.
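To make the problem statement concrete, the following is a minimal sketch of a knockout tournament and a brute-force check for TFP. It is not the greedy algorithm of the paper; it simply tries every seeding, which is only feasible for a handful of players. The names `knockout_winner`, `can_fix` and the 4-player matrix `BEATS` are illustrative assumptions, and the number of players is assumed to be a power of two.

```python
from itertools import permutations

def knockout_winner(seeding, beats):
    """Play a knockout tournament; seeding lists players in bracket order.

    beats[a][b] is True when a is predicted to beat b (the tournament D).
    The number of players is assumed to be a power of two.
    """
    players = list(seeding)
    while len(players) > 1:
        # Pair adjacent players; the predicted winner of each match advances.
        players = [a if beats[a][b] else b
                   for a, b in zip(players[0::2], players[1::2])]
    return players[0]

def can_fix(favorite, beats):
    """Brute-force TFP: return a winning seeding for favorite, or None."""
    for seeding in permutations(range(len(beats))):
        if knockout_winner(seeding, beats) == favorite:
            return seeding
    return None

# Hypothetical 4-player prediction: 0 beats 1; 1 beats 2 and 3;
# 2 beats 0 and 3; 3 beats 0.
BEATS = [
    [False, True,  False, False],
    [False, False, True,  True],
    [True,  False, False, True],
    [True,  False, False, False],
]
```

In this toy instance player 1 can be made the winner (for example with bracket order 1, 2, 0, 3), whereas player 0 cannot win under any seeding, because its only beatable opponent is player 1 and the other semifinal is then always won by player 2.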

• #3113
Winning a Tournament by Any Means Necessary
Sushmita Gupta, Sanjukta Roy, Saket Saurabh, Meirav Zehavi
Computational Social Choice

In a tournament, $n$ players enter the competition. In each round, they are paired up to compete against each other. Losers are eliminated, while winners proceed to the next round, until only one player (the winner) is left. Given a prediction of the outcome, for every pair of players, of a match between them (modeled by a digraph $D$), the competitive nature of a tournament makes it attractive for manipulators. In the Tournament Fixing (TF) problem, the goal is to decide if we can conduct the competition (by controlling how players are paired up) so that our favorite player $w$ wins. A common form of manipulation is to bribe players to alter the outcome of matches. Kim and Williams [IJCAI 2015] integrated such deceit into TF, and showed that the resulting problem is NP-hard when $\ell<(1-\epsilon)\log n$ alterations are possible (for any fixed $\epsilon>0$). For this problem, our contribution is fourfold. First, we present two operations that "obfuscate deceit": given one solution, they produce another solution. Second, we present a combinatorial result, stating that there is always a solution with all reversals incident to $w$ and "elite players". Third, we give a closed formula for the case where $D$ is a DAG. Finally, we present exact exponential-time and parameterized algorithms for the general case.

• #3606
A Structural Approach to Activity Selection
Eduard Eiben, Robert Ganian, Sebastian Ordyniak
Computational Social Choice

The general task of finding an assignment of agents to activities under certain stability and rationality constraints has led to the introduction of two prominent problems in the area of computational social choice: Group Activity Selection (GASP) and Stable Invitations (SIP). Here we introduce and study the Comprehensive Activity Selection Problem, which naturally generalizes both of these problems. In particular, we apply the parameterized complexity paradigm, which has already been successfully employed for SIP and GASP. While previous work has focused strongly on parameters such as solution size or number of activities, here we focus on parameters which capture the complexity of agent-to-agent interactions. Our results include a comprehensive complexity map for CAS under various restrictions on the number of activities in combination with restrictions on the complexity of agent interactions.

• #3513
Deep Learning for Multi-Facility Location Mechanism Design
Noah Golowich, Harikrishna Narasimhan, David C. Parkes
Computational Social Choice

Moulin [1980] characterizes the single-facility, deterministic strategy-proof mechanisms for social choice with single-peaked preferences as the set of generalized median rules. In contrast, we have only a limited understanding of multi-facility strategy-proof mechanisms, and recent work has shown negative worst case results for social cost. Our goal is to design strategy-proof, multi-facility mechanisms that minimize expected social cost. We first give a PAC learnability result for the class of multi-facility generalized median rules, and utilize neural networks to learn mechanisms from this class. Even in the absence of characterization results, we develop a computational procedure for learning almost strategy-proof mechanisms that are as good as or better than benchmarks from the literature, such as the best percentile and dictatorial rules.
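As a point of reference for the single-facility case, a generalized median rule can be sketched as the median of the agents' reported peaks together with fixed "phantom" points. The sketch below assumes the anonymous form with exactly n - 1 phantoms and uses distance from the true peak as a simple single-peaked cost; the function names and the grid-based check are illustrative and do not reproduce the multi-facility mechanisms learned in the paper.

```python
from itertools import product

def generalized_median(peaks, phantoms):
    """Median of the n reported peaks and n - 1 fixed phantom points."""
    if len(phantoms) != len(peaks) - 1:
        raise ValueError("expected exactly n - 1 phantom points")
    values = sorted(list(peaks) + list(phantoms))
    # 2n - 1 values in total, so the middle element is well defined.
    return values[len(values) // 2]

def is_strategy_proof_on_grid(phantoms, n, grid):
    """Exhaustively check that no agent gains by misreporting on a grid."""
    for peaks in product(grid, repeat=n):
        truthful = generalized_median(peaks, phantoms)
        for i, true_peak in enumerate(peaks):
            for lie in grid:
                report = peaks[:i] + (lie,) + peaks[i + 1:]
                outcome = generalized_median(report, phantoms)
                # Cost of an agent: distance of the facility from her true peak.
                if abs(outcome - true_peak) < abs(truthful - true_peak):
                    return False
    return True
```

With phantoms at 0.0 and 1.0 the rule reduces to the ordinary median of three peaks; moving both phantoms to 0.0 yields the minimum of the reports. The exhaustive check is only a sanity test on a small grid, not a proof.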

### Monday 16 16:40 - 18:05 ML-LT - Learning Theory (K2)

Chair: Yann Chevaleyre
• #366
Differential Equations for Modeling Asynchronous Algorithms
Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Learning Theory

Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analyses of ASGD take a discrete view and prove upper bounds on its convergence rate. However, the discrete view has intrinsic limitations: there is no characterization of the optimization path, and the proof techniques are induction-based and thus usually complicated. Inspired by the recent successful adoption of stochastic differential equations (SDE) in the theoretical analysis of SGD, in this paper we study the continuous approximation of ASGD using stochastic differential delay equations (SDDE). We introduce the approximation method and study the approximation error. We then conduct a theoretical analysis of the convergence rate of the ASGD algorithm based on the continuous approximation. Two methods can be used to analyze the convergence rates: moment estimation and energy function minimization. Moment estimation depends on the specific form of the loss function, while energy function minimization only leverages the convexity of the loss function and does not depend on its specific form. In addition to the convergence analysis, the continuous view also helps us derive better convergence rates. All of this clearly shows the advantage of taking the continuous view in gradient descent algorithms.
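For orientation, the continuous view can be written schematically as follows. The first two lines are the commonly used SDE approximation of plain SGD with step size $\eta$; the third line is only an assumed illustrative form of a delayed equation with a fixed delay $\tau$, and the symbols $g$, $\Sigma$ and $\tau$ are notation introduced here rather than taken from the abstract.

```latex
% Discrete SGD step with an unbiased stochastic gradient g
x_{k+1} = x_k - \eta\, g(x_k), \qquad \mathbb{E}[g(x)] = \nabla f(x)

% SDE approximation (time t \approx k\eta, gradient-noise covariance \Sigma)
dX_t = -\nabla f(X_t)\,dt + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,dW_t

% Schematic delayed variant for ASGD with delay \tau (an SDDE)
dX_t = -\nabla f(X_{t-\tau})\,dt + \sqrt{\eta}\,\Sigma(X_{t-\tau})^{1/2}\,dW_t
```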

• #2040
On the Convergence Properties of a K-step Averaging Stochastic Gradient Descent Algorithm for Nonconvex Optimization
Fan Zhou, Guojing Cong
Learning Theory

We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm which we call K-AVG  for solving large scale machine learning problems. We establish the convergence results of K-AVG for nonconvex objectives. Our analysis of K-AVG applies to many existing variants of synchronous SGD.  We explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent which is equivalent to K-AVG with $K=1$. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). Another advantage of K-AVG over ASGD is that it allows larger stepsizes and facilitates faster convergence. On a cluster of $128$ GPUs, K-AVG is faster than ASGD implementations and achieves better accuracies and faster convergence for training with the CIFAR-10 dataset.
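The averaging scheme described above can be sketched on a toy problem. The code below is a minimal illustration on a scalar parameter, assuming each learner restarts every round from the shared value, takes K local stochastic gradient steps, and the results are then averaged; the function names and the noisy quadratic objective are assumptions made for the example, not the authors' implementation.

```python
import random

def k_avg(grad, w0, num_learners, k, step, rounds, rng):
    """K-step averaging SGD on a scalar parameter.

    Each learner starts a round from the shared parameter, takes k local
    stochastic gradient steps, and the results are averaged synchronously.
    """
    w = w0
    for _ in range(rounds):
        local = []
        for _ in range(num_learners):
            v = w
            for _ in range(k):
                v -= step * grad(v, rng)
            local.append(v)
        w = sum(local) / num_learners
    return w

def noisy_quadratic_grad(w, rng):
    # Gradient of the toy objective 0.5 * (w - 3)^2 plus Gaussian noise.
    return (w - 3.0) + rng.gauss(0.0, 1.0)

w_final = k_avg(noisy_quadratic_grad, w0=0.0, num_learners=8, k=4,
                step=0.1, rounds=200, rng=random.Random(0))
```

Setting `k=1` recovers the traditional parallel SGD that the abstract uses as its point of comparison.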

• #547
Quantum Divide-and-Conquer Anchoring for Separable Non-negative Matrix Factorization
Yuxuan Du, Tongliang Liu, Yinan Li, Runyao Duan, Dacheng Tao
Learning Theory

It is NP-complete to find non-negative factors W and H with fixed rank r from a non-negative matrix X by minimizing $\|X - WH^\top\|^2$. Although the separability assumption (all data points are in the conical hull of the extreme rows) enables polynomial-time algorithms, the computational cost is not affordable for big data. This paper investigates how the power of quantum computation can be capitalized on to solve non-negative matrix factorization with the separability assumption (SNMF) by devising a quantum algorithm based on the divide-and-conquer anchoring (DCA) scheme [Zhou et al., 2013]. The design of quantum DCA (QDCA) is challenging. In the divide step, the random projections in DCA are carried out by a quantum algorithm for linear operations, which achieves an exponential speedup. We then devise a heuristic post-selection procedure which efficiently extracts the information about anchors stored in the quantum states. Under a plausible assumption, QDCA performs efficiently, achieves the quantum speedup, and is beneficial for high-dimensional problems.

• #807
A Generic Approach for Accelerating Stochastic Zeroth-Order Convex Optimization
Xiaotian Yu, Irwin King, Michael R. Lyu, Tianbao Yang
Learning Theory

In this paper, we propose a generic approach for accelerating the convergence of existing algorithms to solve the problem of stochastic zeroth-order convex optimization (SZCO). Standard techniques for accelerating the convergence of stochastic zeroth-order algorithms either explore multiple function evaluations (e.g., two-point evaluations) or exploit global conditions of the problem (e.g., smoothness and strong convexity). Nevertheless, these classic acceleration techniques necessarily restrict the applicability of newly developed algorithms. The key to our proposed generic approach is to explore a local growth condition (also called a local error bound condition) of the objective function in SZCO. The benefits of the proposed acceleration technique are: (i) it is applicable to both settings with one-point evaluation and two-point evaluations; (ii) it does not necessarily require strong convexity or smoothness of the objective function; (iii) it yields an improvement in convergence for a broad family of problems. Empirical studies in various settings demonstrate the effectiveness of the proposed acceleration approach.

• #938
De-biasing Covariance-Regularized Discriminant Analysis
Haoyi Xiong, Wei Cheng, Yanjie Fu, Wenqing Hu, Jiang Bian, Zhishan Guo
Learning Theory

Fisher's Linear Discriminant Analysis (FLD) is a well-known technique for linear classification, feature extraction and dimension reduction. The empirical FLD relies on two key estimations from the data -- the mean vector for each class and the (inverse) covariance matrix. To improve the accuracy of FLD under the High Dimension Low Sample Size (HDLSS) settings, Covariance-Regularized FLD (CRLD) has been proposed to use shrunken covariance estimators, such as Graphical Lasso, to strike a balance between biases and variances. Though CRLD could obtain better classification accuracy, it usually incurs bias and converges to the optimal result with a slower asymptotic rate. Inspired by the recent progress in de-biased Lasso, we propose a novel FLD classifier, DBLD, which improves classification accuracy of CRLD through de-biasing. Theoretical analysis shows that DBLD possesses better asymptotic properties than CRLD. We conduct experiments on both synthetic datasets and real application datasets to confirm the correctness of our theoretical analysis and demonstrate the superiority of DBLD over classical FLD, CRLD and other downstream competitors under HDLSS settings.

• #3236
Interactive Optimal Teaching with Unknown Learners
Francisco S. Melo, Carla Guerra, Manuel Lopes
Learning Theory

This paper introduces a new approach for machine teaching that partly addresses the (unavoidable) mismatch between what the teacher assumes about the learning process of the student and the actual process. We analyze several situations in which such mismatch takes place, including when the student's learning algorithm is known but the corresponding parameters are not, and when the learning algorithm itself is not known. Our analysis is focused on the case of a Bayesian Gaussian learner, and we show that, even in this simple case, the lack of knowledge regarding the student's learning process significantly deteriorates the performance of machine teaching: while perfect knowledge of the student ensures that the target is learned after a finite number of samples, lack of knowledge thereof implies that the student will only learn asymptotically (i.e., after an infinite number of samples). We introduce interactivity as a means to mitigate the impact of imperfect knowledge and show that, by using interactivity, we are able to recover finite learning time, in the best case, or significantly faster convergence, in the worst case. Finally, we discuss the extension of our analysis to a classification problem using linear discriminant analysis, and discuss the implications of our results in single- and multi-student settings.

• #4021
Generalization-Aware Structured Regression towards Balancing Bias and Variance
Martin Pavlovski, Fang Zhou, Nino Arsov, Ljupco Kocarev, Zoran Obradovic
Learning Theory

Attaining the proper balance between underfitting and overfitting is one of the central challenges in machine learning. It has been approached mostly by deriving bounds on generalization risks of learning algorithms. Such bounds are, however, rarely controllable. In this study, a novel bias-variance balancing objective function is introduced in order to improve generalization performance. By utilizing distance correlation, this objective function is able to indirectly control a stability-based upper bound on a model's expected true risk. In addition, the Generalization-Aware Collaborative Ensemble Regressor (GLACER) is developed, a model that bags a crowd of structured regression models, while allowing them to collaborate in a fashion that minimizes the proposed objective function. The experimental results on both synthetic and real-world data indicate that such an objective enhances the overall model's predictive performance. When compared against a broad range of both traditional and structured regression models GLACER was ~10-56% and ~49-99% more accurate for the task of predicting housing prices and hospital readmissions, respectively.

### Monday 16 16:40 - 18:05 NLP-DIA1 - Dialogue, Conversation Models (T2)

Chair: Xiaojuan Ma
• #2384
Get The Point of My Utterance! Learning Towards Effective Responses with Multi-Head Attention Mechanism
Chongyang Tao, Shen Gao, Mingyue Shang, Wei Wu, Dongyan Zhao, Rui Yan
Dialogue, Conversation Models

The attention mechanism has become a popular and widely used component in sequence-to-sequence models. However, previous research on neural generative dialogue systems always generates universal responses, and the attention distribution learned by the model always attends to the same semantic aspect. To solve this problem, in this paper we propose a novel Multi-Head Attention Mechanism (MHAM) for generative dialogue systems, which aims at capturing multiple semantic aspects of the user utterance. Further, a regularizer is formulated to force different attention heads to concentrate on certain aspects. The proposed mechanism leads to more informative, diverse, and relevant generated responses. Experimental results show that our proposed model outperforms several strong baselines.

• #3483
Learning to Converse with Noisy Data: Generation with Calibration
Mingyue Shang, Zhenxin Fu, Nanyun Peng, Yansong Feng, Dongyan Zhao, Rui Yan
Dialogue, Conversation Models

The availability of abundant conversational data on the Internet has brought prosperity to generation-based open-domain conversation systems. In the training of the generation models, existing methods generally treat all the training data equivalently. However, data crawled from websites may contain considerable noise. Blindly training with the noisy data could harm the performance of the final generation model. In this paper, we propose a generation with calibration framework that allows high-quality data to have more influence on the generation model and reduces the effect of noisy data. Specifically, for each instance in the training set, we employ a calibration network to produce a quality score; the score is then used for the weighted update of the generation model parameters. Experiments show that the calibrated model outperforms baseline methods on both automatic evaluation metrics and human annotations.

• #4533
Smarter Response with Proactive Suggestion: A New Generative Neural Conversation Paradigm
Rui Yan, Dongyan Zhao
Dialogue, Conversation Models

Conversational systems are becoming more and more promising and play an important role in human-computer communication. A conversational system is supposed to be intelligent enough to enable human-like interactions. The long-term goal of smart human-computer conversations is challenging and heavily driven by data. Thanks to the prosperity of Web 2.0, a large volume of conversational data has become available for building human-computer conversational systems. Given a human-issued message, namely a query, a traditional conversational system provides a response after proper training on how to respond like humans. In this paper, we propose a new paradigm for neural generative conversations: given the query, a smarter response with a suggestion is provided. We assume that the new conversation mode, which proactively introduces content as next utterances, keeps the user actively engaged. To address the task, we propose a novel integrated model to handle both the response generation and the suggestion generation. From the experimental results, we verify the effectiveness of the new neural generative conversation paradigm.

• #2739
Commonsense Knowledge Aware Conversation Generation with Graph Attention
Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, Xiaoyan Zhu
Dialogue, Conversation Models

Commonsense knowledge is vital to many natural language processing tasks. In this paper, we present a novel open-domain conversation generation model to demonstrate how large-scale commonsense knowledge can facilitate language understanding and generation. Given a user post, the model retrieves relevant knowledge graphs from a knowledge base and then encodes the graphs with a static graph attention mechanism, which augments the semantic information of the post and thus supports better understanding of the post. Then, during word generation, the model attentively reads the retrieved knowledge graphs and the knowledge triples within each graph to facilitate better generation through a dynamic graph attention mechanism. This is the first attempt that uses large-scale commonsense knowledge in conversation generation. Furthermore, unlike existing models that use knowledge triples (entities) separately and independently, our model treats each knowledge graph as a whole, which encodes more structured, connected semantic information in the graphs. Experiments show that the proposed model can generate more appropriate and informative responses than state-of-the-art baselines.

• #788
Submodularity-Inspired Data Selection for Goal-Oriented Chatbot Training Based on Sentence Embeddings
Dialogue, Conversation Models

Spoken language understanding (SLU) systems, such as goal-oriented chatbots or personal assistants, rely on an initial natural language understanding (NLU) module to determine the intent and to extract the relevant information from the user queries they take as input. SLU systems usually help users to solve problems in relatively narrow domains and require a large amount of in-domain training data. This leads to significant data availability issues that inhibit the development of successful systems. To alleviate this problem, we propose a technique of data selection in the low-data regime that enables us to train with fewer labeled sentences, thus smaller labelling costs. We propose a submodularity-inspired data ranking function, the ratio-penalty marginal gain, for selecting data points to label based only on the information extracted from the textual embedding space. We show that the distances in the embedding space are a viable source of information that can be used for data selection. Our method outperforms two known active learning techniques and enables cost-efficient training of the NLU unit. Moreover, our proposed selection technique does not need the model to be retrained in between the selection steps, making it time efficient as well.

• #4100
Learning Out-of-Vocabulary Words in Intelligent Personal Agents
Avik Ray, Yilin Shen, Hongxia Jin
Dialogue, Conversation Models

Semantic parsers play a vital role in intelligent agents by converting natural language instructions into an actionable logical form representation. However, after deployment, these parsers suffer from poor accuracy on encountering out-of-vocabulary (OOV) words, or from a significant accuracy drop on previously supported instructions after retraining. Achieving both goals simultaneously is non-trivial. In this paper, we propose novel neural-network-based parsers to learn OOV words: one incorporating a new hybrid paraphrase generation model, and one using an enhanced sequence-to-sequence model. Extensive experiments on both benchmark and custom datasets show that our new parsers achieve a significant accuracy gain on OOV words and phrases while maintaining accuracy on previously supported instructions.

• #3124
One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning
Xiaowei Tong, Zhenxin Fu, Mingyue Shang, Dongyan Zhao, Rui Yan
Dialogue, Conversation Models

Automatically evaluating the performance of open-domain dialogue systems is a challenging problem. Recent work on neural network-based metrics has shown promising opportunities for automatic dialogue evaluation. However, existing methods mainly focus on monolingual evaluation, in which the trained metric is not flexible enough to transfer across different languages. To address this issue, we propose an adversarial multi-task neural metric (ADVMT) for multi-lingual dialogue evaluation, with shared feature extraction across languages. We evaluate the proposed model in two different languages. Experiments show that the adversarial multi-task neural metric achieves a high correlation with human annotation, yielding better performance than monolingual metrics and various existing metrics.

### Monday 16 16:40 - 18:05 CV-CLA - Vision and Classification (T1)

Chair: Minnan Luo
• #628
Ensemble Soft-Margin Softmax Loss for Image Classification
Xiaobo Wang, Shifeng Zhang, Zhen Lei, Si Liu, Xiaojie Guo, Stan Z. Li
Vision and Classification

Softmax loss is arguably one of the most popular losses to train CNN models for image classification. However, recent works have exposed its limitation on feature discriminability. This paper casts a new viewpoint on the weakness of softmax loss. On the one hand, the CNN features learned using the softmax loss are often inadequately discriminative. We hence introduce a soft-margin softmax function to explicitly encourage the discrimination between different classes. On the other hand, the learned classifier of softmax loss is weak. We propose to assemble multiple such weak classifiers into a strong one, inspired by the recognition that the diversity among weak classifiers is critical to a good ensemble. To achieve the diversity, we adopt the Hilbert-Schmidt Independence Criterion (HSIC). Considering these two aspects in one framework, we design a novel loss, named Ensemble Soft-Margin Softmax (EM-Softmax). Extensive experiments on benchmark datasets are conducted to show the superiority of our design over the baseline softmax loss and several state-of-the-art alternatives.
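The abstract does not spell out the soft-margin softmax function. One common way to add a margin, assumed here purely for illustration, is to subtract a constant `margin` from the target-class logit before the usual cross-entropy, so the correct class has to win by at least that amount. The sketch below shows this single-classifier term only; the HSIC-based ensemble part is omitted and the function name is hypothetical.

```python
import math

def soft_margin_softmax_loss(logits, target, margin=0.0):
    """Cross-entropy where the target logit is reduced by `margin` first.

    With margin = 0 this is the ordinary softmax loss.
    """
    shifted = list(logits)
    shifted[target] -= margin
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(shifted)
    log_norm = top + math.log(sum(math.exp(z - top) for z in shifted))
    return log_norm - shifted[target]
```

For the same logits, a positive margin always gives a larger loss than the plain softmax loss, which keeps pushing the target logit further away from the others during training.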

• #1066
Wenhai Wang, Xiang Li, Tong Lu, Jian Yang
Vision and Classification

By analyzing the equivalence of modern networks, we find that both ResNet and DenseNet are essentially derived from the same "dense topology", and differ only in the form of connection: addition (dubbed "inner link") vs. concatenation (dubbed "outer link"). However, both forms of connection have their own strengths and weaknesses. To combine their advantages and avoid certain limitations on representation learning, we present a highly efficient and modularized Mixed Link Network (MixNet) which is equipped with flexible inner link and outer link modules. Consequently, ResNet, DenseNet and Dual Path Network (DPN) can each be regarded as a special case of MixNet. Furthermore, we demonstrate that MixNets can achieve superior parameter efficiency over the state-of-the-art architectures on many competitive datasets such as CIFAR-10/100, SVHN and ImageNet.

• #1655
Zero Shot Learning via Low-rank Embedded Semantic AutoEncoder
Yang Liu, Quanxue Gao, Jin Li, Jungong Han, Ling Shao
Vision and Classification

Zero-shot learning (ZSL) has been widely researched and has been successful in machine learning. Most existing ZSL methods aim to accurately recognize objects of unseen classes by learning a shared mapping from the feature space to a semantic space. However, such methods have not investigated in depth whether the mapping can precisely reconstruct the original visual feature. Motivated by the fact that the data have low intrinsic dimensionality (e.g., they lie in a low-dimensional subspace), in this paper we formulate a novel framework named Low-rank Embedded Semantic AutoEncoder (LESAE) to jointly seek a low-rank mapping that links visual features with their semantic representations. Following the encoder-decoder paradigm, the encoder part aims to learn a low-rank mapping from the visual feature to the semantic space, while the decoder part reconstructs the original data with the learned mapping. In addition, a non-greedy iterative algorithm is adopted to solve our model. Extensive experiments on six benchmark datasets demonstrate its superiority over several state-of-the-art algorithms.

• #2474
Energy-efficient Amortized Inference with Cascaded Deep Classifiers
Jiaqi Guan, Yang Liu, Qiang Liu, Jian Peng
Vision and Classification

Deep neural networks have been remarkably successful in various AI tasks but often incur high computation and energy costs for energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes the prediction accuracy and energy cost simultaneously, thus enabling an effective cost-accuracy trade-off at test time. In our framework, each data instance is pushed into a cascade of deep neural networks of increasing size, and a selection module is used to sequentially determine when a sufficiently accurate classifier can be used for this data instance. The cascade of neural networks and the selection module are jointly trained in an end-to-end fashion by the REINFORCE algorithm to optimize a trade-off between the computational cost and the predictive accuracy. Our method is able to simultaneously improve the accuracy and efficiency by learning to assign easy instances to fast yet sufficiently accurate classifiers to save computation and energy cost, while assigning harder instances to deeper and more powerful classifiers to ensure satisfactory accuracy. Moreover, we demonstrate our method's effectiveness with extensive experiments on CIFAR-10/100, ImageNet32x32 and the original ImageNet dataset.
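The control flow of such a cascade at test time can be sketched as below. Note the substitution: the paper trains a selection module with REINFORCE, whereas this sketch uses a fixed confidence threshold as a stand-in, and the two toy "models" and their costs are hypothetical placeholders rather than neural networks.

```python
def cascade_predict(x, classifiers, threshold=0.9):
    """Run classifiers from cheapest to most expensive; stop when confident.

    Each classifier is a (cost, predict) pair where predict(x) returns
    (label, confidence). Returns (label, total_cost, stage_index).
    """
    total_cost = 0.0
    for stage, (cost, predict) in enumerate(classifiers):
        total_cost += cost
        label, confidence = predict(x)
        is_last = stage == len(classifiers) - 1
        # Exit early once confident; the last stage always answers.
        if confidence >= threshold or is_last:
            return label, total_cost, stage
    raise ValueError("classifiers must not be empty")

def small_model(x):
    # Hypothetical cheap model: confident only far from the decision boundary.
    return (x > 0, 0.95 if abs(x) > 1.0 else 0.6)

def large_model(x):
    # Hypothetical expensive model: always confident.
    return (x > 0, 0.99)

CASCADE = [(1.0, small_model), (10.0, large_model)]
```

An easy input such as `2.0` is answered by the cheap model at cost 1.0, while an input near the boundary such as `0.3` falls through to the expensive model for a total cost of 11.0.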

• #3068
HCR-Net: A Hybrid of Classification and Regression Network for Object Pose Estimation
Zairan Wang, Weiming Li, Yueying Kao, Dongqing Zou, Qiang Wang, Minsu Ahn, Sunghoon Hong
Vision and Classification

Object pose estimation from a single image is a fundamental and challenging problem in computer vision and robotics. Generally, current methods treat pose estimation as a classification or a regression problem. However, regression-based methods usually suffer from imbalanced training data, while classification methods have difficulty discriminating nearby poses. In this paper, a hybrid CNN model, which we call HCR-Net and which integrates both a classification network and a regression network, is proposed to deal with these issues. Our model is inspired by the observation that regression methods achieve better accuracy on homogeneously distributed datasets, while classification methods are more effective for coarse quantization of the poses even if the dataset is not well balanced. The classification methods and the regression methods essentially complement each other. Thus we integrate both of them into a neural network in a hybrid fashion and train it end-to-end with two novel loss functions. As a result, our method surpasses the state-of-the-art methods, even with imbalanced training data and much less data augmentation. The experimental results on the challenging Pascal3D+ database demonstrate that our method outperforms the state of the art significantly, achieving improvements on the ACC and AVP metrics of up to 4% and 6%, respectively.

• #3181
Multi-scale and Discriminative Part Detectors Based Features for Multi-label Image Classification
Gong Cheng, Decheng Gao, Yang Liu, Junwei Han
Vision and Classification

Convolutional neural networks (CNNs) have shown their promise for image classification tasks. However, global CNN features still lack geometric invariance for addressing the problem of intra-class variations and so are not optimal for multi-label image classification. This paper proposes a new and effective framework built upon CNNs to learn Multi-scale and Discriminative Part Detectors (MsDPD)-based feature representations for multi-label image classification. Specifically, at each scale level, we (i) first present an entropy-rank based scheme to generate and select a set of discriminative part detectors (DPD), and then (ii) obtain a number of DPD-based convolutional feature maps, with each feature map representing the occurrence probability of a particular part detector, and learn DPD-based features by using a task-driven pooling scheme. The two steps are formulated into a unified framework by developing a new objective function, which jointly trains part detectors incrementally and integrates the learning of feature representations into the classification task. Finally, the multi-scale features are fused to produce the predictions. Experimental results on the PASCAL VOC 2007 and VOC 2012 datasets demonstrate that the proposed method achieves better accuracy than the existing state-of-the-art multi-label classification methods.

• #1484
Extracting Privileged Information from Untagged Corpora for Classifier Learning
Yazhou Yao, Jian Zhang, Fumin Shen, Wankou Yang, Xian-Sheng Hua, Zhenmin Tang
Vision and Classification

The performance of data-driven learning approaches is often unsatisfactory when the training data is inadequate either in quantity or quality. Manually labeled privileged information (PI), e.g., attributes, tags or properties, is usually incorporated to improve classifier learning. However, the process of manual labeling is time-consuming and labor-intensive. To address this issue, we propose to enhance classifier learning by extracting PI from untagged corpora, which can effectively eliminate the dependency on manually labeled data. In detail, we treat each selected PI as a subcategory and learn one classifier for each subcategory independently. The classifiers for all subcategories are then integrated to form a more powerful category classifier. In particular, we propose a new instance-level multi-instance learning (MIL) model to simultaneously select a subset of training images from each subcategory and learn the optimal classifiers based on the selected images. Extensive experiments demonstrate the superiority of our approach.

### Monday 16 16:40 - 18:05 ML-RL1 - Reinforcement Learning (K11)

Chair: B. Ravindran
• #368
Learning to Design Games: Strategic Environments in Reinforcement Learning
Haifeng Zhang, Jun Wang, Zhiming Zhou, Weinan Zhang, Yin Wen, Yong Yu, Wenxin Li
Reinforcement Learning

In typical reinforcement learning (RL), the environment is assumed given and the goal of the learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering that the environment is not given, but is controllable and learnable through its interaction with the agent at the same time. This extension is motivated by environment design scenarios in the real world, including game design, shopping space design and traffic signal design. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to that w.r.t. the agent, and derive a policy gradient solution to optimizing the parametrized environment. Furthermore, discontinuous environments are addressed by a proposed general generative framework. Our experiments on a Maze game design task show the effectiveness of the proposed algorithms in generating diverse and challenging Mazes against various agent settings.

• #977
Where to Prune: Using LSTM to Guide End-to-end Pruning
Jing Zhong, Guiguang Ding, Yuchen Guo, Jungong Han, Bin Wang
Reinforcement Learning

Recent years have witnessed the great success of convolutional neural networks (CNNs) in many related fields. However, their huge model size and computational complexity make it difficult to deploy CNNs in some scenarios, such as embedded systems with low computation power. To address this issue, many works have proposed pruning filters in CNNs to reduce computation. However, they mainly focus on identifying which filters are unimportant in a layer and then prune filters layer by layer or globally. In this paper, we argue that the pruning order is also very significant for model pruning. We propose a novel approach to figure out which layers should be pruned in each step. First, we utilize a long short-term memory (LSTM) to learn the hierarchical characteristics of a network and generate a pruning decision for each layer, which is the main difference from previous works. Next, a channel-based method is adopted to evaluate the importance of filters in a to-be-pruned layer, followed by an accelerated recovery step. Experimental results demonstrate that our approach is capable of reducing FLOPs by 70.1% for VGG and 47.5% for ResNet-56 with comparable accuracy. Also, the learning results seem to reveal the sensitivity of each network layer.

• #521
Cross-modal Bidirectional Translation via Reinforcement Learning
Jinwei Qi, Yuxin Peng
Reinforcement Learning

The inconsistent distribution and representation of image and text make it quite challenging to measure their similarity and construct correlation between them. Inspired by neural machine translation, which establishes a corresponding relationship between two entirely different languages, we attempt to treat images as a special kind of language to provide visual descriptions, so that translation can be conducted between bilingual pairs of image and text to effectively explore cross-modal correlation. Thus, we propose the Cross-modal Bidirectional Translation (CBT) approach, and further explore the utilization of reinforcement learning to improve the translation process. First, a cross-modal translation mechanism is proposed, where image and text are treated as bilingual pairs, and cross-modal correlation can be effectively captured in both feature spaces of image and text by bidirectional translation training. Second, cross-modal reinforcement learning is proposed to perform a bidirectional game between image and text, which is played as a round to promote the bidirectional translation process. Besides, both inter-modality and intra-modality reward signals can be extracted to provide complementary clues for boosting cross-modal correlation learning. Experiments are conducted to verify the performance of our proposed approach on cross-modal retrieval, compared with 11 state-of-the-art methods on 3 datasets.

• #2711
Self-Adaptive Double Bootstrapped DDPG
Zhuobin Zheng, Chun Yuan, Zhihui Lin, Yangyang Cheng, Hanghao Wu
Reinforcement Learning

The Deep Deterministic Policy Gradient (DDPG) algorithm has achieved state-of-the-art performance in high-dimensional continuous control tasks. However, due to the complexity and randomness of the environment, DDPG tends to suffer from inefficient exploration and unstable training. In this work, we propose Self-Adaptive Double Bootstrapped DDPG (SOUP), an algorithm that extends DDPG to a bootstrapped actor-critic architecture. SOUP improves the efficiency of exploration by using multiple actor heads to capture more potential actions and multiple critic heads to collaboratively evaluate more reasonable Q-values. The crux of the double bootstrapped architecture is to tackle the fluctuations in performance caused by multiple heads of spotty capacity varying throughout training. To alleviate the instability, a self-adaptive confidence mechanism is introduced to dynamically adjust the weights of bootstrapped heads and enhance the ensemble performance effectively and efficiently. We demonstrate that SOUP achieves faster learning by at least 45% while improving cumulative reward and stability substantially in comparison to vanilla DDPG on OpenAI Gym's MuJoCo environments.

• #3116
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning
Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan
Reinforcement Learning

Recently, a new multi-step temporal learning algorithm Q(σ) unifies n-step Tree-Backup (when σ = 0) and n-step Sarsa (when σ = 1) by introducing a sampling parameter σ. However, similar to other multi-step temporal-difference learning algorithms, Q(σ) requires considerable memory and computation time. The eligibility trace is an important mechanism to transform the off-line updates into efficient on-line ones which consume less memory and computation time. In this paper, we combine the original Q(σ) with eligibility traces and propose a new algorithm, called Qπ(σ,λ), where λ is the trace-decay parameter. This new algorithm unifies Sarsa(λ) (when σ = 1) and Qπ(λ) (when σ = 0). Furthermore, we give an upper error bound of the Qπ(σ,λ) policy evaluation algorithm. We prove that the Qπ(σ,λ) control algorithm converges to the optimal value function exponentially. We also empirically compare it with conventional temporal-difference learning methods. Results show that, with an intermediate value of σ, Qπ(σ,λ) creates a mixture of the existing algorithms which learns the optimal value significantly faster than the extreme ends (σ = 0 or 1).
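
How σ interpolates between the sampled (Sarsa) and expected (Tree-Backup) backups can be seen in the one-step Q(σ) target, sketched below in its standard textbook form. This is not the eligibility-trace algorithm Qπ(σ,λ) proposed in the paper, and the function and argument names are hypothetical.

```python
from typing import Sequence


def q_sigma_target(
    reward: float,
    gamma: float,
    sigma: float,
    q_next: Sequence[float],
    policy_next: Sequence[float],
    next_action: int,
) -> float:
    """One-step Q(sigma) backup target.

    q_next[a]      : current estimate Q(s', a) for each action a
    policy_next[a] : target-policy probability pi(a | s')
    next_action    : the action actually sampled in s'
    """
    # Sample-based term, as in Sarsa (sigma = 1).
    sampled = q_next[next_action]
    # Expectation under the target policy, as in Tree-Backup (sigma = 0).
    expected = sum(p * q for p, q in zip(policy_next, q_next))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)
```

An intermediate σ mixes the two backups, which is the regime the abstract reports as learning fastest.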

• #3171
Algorithms or Actions? A Study in Large-Scale Reinforcement Learning
Anderson Rocha Tavares, Sivasubramanian Anbalagan, Leandro Soriano Marcolino, Luiz Chaimowicz
Reinforcement Learning

Large state and action spaces are very challenging for reinforcement learning. However, in many domains there is a set of algorithms available, which estimate the best action given a state. Hence, agents can either directly learn a performance-maximizing mapping from states to actions, or from states to algorithms. We investigate several aspects of this dilemma, showing sufficient conditions for learning over algorithms to outperform learning over actions for a finite number of training iterations. We present synthetic experiments to further study such systems. Finally, we propose a function approximation approach, demonstrating the effectiveness of learning over algorithms in real-time strategy games.

• #2254
Multinomial Logit Bandit with Linear Utility Functions
Mingdong Ou, Nan Li, Shenghuo Zhu, Rong Jin
Reinforcement Learning

Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a K-cardinality subset from N candidate items, and receives a reward which is governed by a multinomial logit (MNL) choice model considering both item utility and the substitution property among items. The player's objective is to dynamically learn the parameters of the MNL model and maximize cumulative reward over a finite horizon T. This problem faces the exploration-exploitation dilemma, and its combinatorial nature makes it non-trivial. In recent years, some algorithms have been developed by exploiting specific characteristics of the MNL model, but all of them estimate the parameters of the MNL model separately and incur a regret bound which is not preferred for large candidate set size N. In this paper, we consider the linear utility MNL choice model, whose item utilities are represented as linear functions of d-dimensional item features, and propose an algorithm, titled LUMB, to exploit the underlying structure. It is proven that the proposed algorithm achieves regret which is free of candidate set size. Experiments show the superiority of the proposed algorithm.
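
A minimal sketch of the choice model underlying this setting may help: it computes MNL choice probabilities for an offered subset when each item's utility is a linear function of its features. The no-choice option with utility 0 is an assumption borrowed from the common MNL bandit formulation; the abstract does not state it. The names `mnl_choice_probabilities`, `theta` and `features` are hypothetical, and this is not the LUMB algorithm itself.

```python
import math
from typing import List, Sequence, Tuple


def mnl_choice_probabilities(
    theta: Sequence[float],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], float]:
    """Return (per-item choice probabilities, no-choice probability).

    theta    : d-dimensional parameter vector (what the player must learn)
    features : one d-dimensional feature vector per offered item
    """
    # Linear utility: u_i = theta . x_i
    utilities = [sum(t * f for t, f in zip(theta, x)) for x in features]
    weights = [math.exp(u) for u in utilities]
    # Assumption: the no-choice option has utility 0, hence weight exp(0) = 1.
    denominator = 1.0 + sum(weights)
    return [w / denominator for w in weights], 1.0 / denominator
```

Because the utilities are tied together through the shared vector `theta`, only d parameters need to be estimated rather than one per item, which is the structure the abstract says the algorithm exploits.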

### Monday 16 16:40 - 18:05 HAI-HCC - Human Computation and Crowdsourcing (C2)

Chair: Eugene Vorobeychik
• #386
On the Cost Complexity of Crowdsourcing
Yili Fang, Hailong Sun, Pengpeng Chen, Jinpeng Huai
Human Computation and Crowdsourcing

Existing efforts mainly use empirical analysis to evaluate the effectiveness of crowdsourcing methods, which is often unreliable across experimental settings. Consequently, it is of great importance to study theoretical methods. This work, for the first time, defines the cost complexity of crowdsourcing, and presents two theorems to compute the cost complexity. Our theorems provide a general theoretical method to model the trade-off between costs and quality, which can be used to evaluate and design crowdsourcing algorithms, and characterize the complexity of crowdsourcing problems. Moreover, following our theorems, we prove a set of corollaries that can obtain existing theoretical results for special cases. We have verified our work theoretically and empirically.

• #2229
A Novel Strategy for Active Task Assignment in Crowd Labeling
Zehong Hu, Jie Zhang
Human Computation and Crowdsourcing

Active learning strategies are often used in crowd labeling to improve task assignment. However, these strategies require prohibitive computation time yet still cannot improve the assignment to the utmost, because they simply evaluate each possible assignment and then greedily select the optimal one. In this paper, we first derive an efficient algorithm for assignment evaluation. Then, to overcome the uncertainty of labels, we develop a novel strategy that modulates the scope of the greedy task assignment with posterior uncertainty and keeps the evaluation optimistic. The experiments on two popular worker models and four MTurk datasets show that our strategy achieves the best performance and highest computation efficiency.

• #2274
Simultaneous Clustering and Ranking from Pairwise Comparisons
Jiyi Li, Yukino Baba, Hisashi Kashima
Human Computation and Crowdsourcing

When people make decisions about a number of ideas, designs, or other kinds of objects, a natural approach is to organize them into several groups of objects and to prioritize them according to some preference. The grouping task is referred to as clustering and the prioritizing task is called ranking. These tasks are often outsourced with the help of human judgments in the form of pairwise comparisons. Two objects are compared on whether they are similar in the clustering problem, while the object of higher priority is determined in the ranking problem. Our research question in this paper is whether the pairwise comparisons for clustering also help ranking (and vice versa). Instead of solving the two tasks separately, we propose a unified formulation to bridge the two types of pairwise comparisons. Our formulation simultaneously estimates the object embeddings and the preference criterion vector. The experiments using real datasets support our hypothesis; our approach can generate better neighbor and preference estimation results than the approaches that only focus on a single type of pairwise comparison.

• #3189
On the Efficiency of Data Collection for Crowdsourced Classification
Edoardo Manino, Long Tran-Thanh, Nicholas R. Jennings
Human Computation and Crowdsourcing

The quality of crowdsourced data is often highly variable. For this reason, it is common to collect redundant data and use statistical methods to aggregate it. Empirical studies show that the policies we use to collect such data have a strong impact on the accuracy of the system. However, there is little theoretical understanding of this phenomenon. In this paper we provide the first theoretical explanation of the accuracy gap between the most popular collection policies: the non-adaptive uniform allocation, and the adaptive uncertainty sampling and information gain maximisation. To do so, we propose a novel representation of the collection process in terms of random walks. Then, we use this tool to derive lower and upper bounds on the accuracy of the policies. With these bounds, we are able to quantify the advantage that the two adaptive policies have over the non-adaptive one for the first time.

• #4057
An Axiomatic View of the Parimutuel Consensus Mechanism
Rupert Freeman, David M. Pennock
Human Computation and Crowdsourcing

We consider an axiomatic view of the Parimutuel Consensus Mechanism defined by Eisenberg and Gale (1959). The parimutuel consensus mechanism can be interpreted as a parimutuel market for wagering with a proxy that bets optimally on behalf of the agents, depending on the bets of the other agents.  We show that the parimutuel consensus mechanism uniquely satisfies the desirable properties of Pareto optimality, individual rationality, budget balance, anonymity, sybilproofness and envy-freeness. While the parimutuel consensus mechanism does violate the key property of incentive compatibility, it is incentive compatible in the limit as the number of agents becomes large. Via simulations on real contest data, we show that violations of incentive compatibility are both rare and only minimally beneficial for the participants. This suggests that the parimutuel consensus mechanism is a reasonable mechanism for eliciting information in practice.

• #5143
(Sister Conferences Best Papers Track) Evaluating and Complementing Vision-to-Language Technology for People who are Blind with Conversational Crowdsourcing
Elliot Salisbury, Ece Kamar, Meredith Ringel Morris
Human Computation and Crowdsourcing

We study how real-time crowdsourcing can be used both for evaluating the value provided by existing automated approaches and for enabling workflows that provide scalable and useful alt text to blind users. We show that the shortcomings of existing AI image captioning systems frequently hinder a user's understanding of an image they cannot see to a degree that even clarifying conversations with sighted assistants cannot correct. Based on analysis of clarifying conversations collected from our studies, we design experiences that can effectively assist users in a scalable way without the need for real-time interaction. Our results provide lessons and guidelines that the designers of future AI captioning systems can use to improve labeling of social media imagery for blind users.

• #5128
(Sister Conferences Best Papers Track) Geolocating Images with Crowdsourcing and Diagramming
Rachel Kohler, John Purviance, Kurt Luther
Human Computation and Crowdsourcing

Many types of investigative work involve verifying the legitimacy of visual evidence by identifying the precise geographic location where a photo or video was taken. Professional geolocation is often a manual, time-consuming process that can involve searching large areas of satellite imagery for potential matches. In this paper, we explore how crowdsourcing can be used to support expert image geolocation. We adapt an expert diagramming technique to overcome spatial reasoning limitations of novice crowds so that they can support an expert's search. In an experiment (n=540), we found that diagrams work significantly better than ground-level photos and allow crowds to reduce a search area by half before any expert intervention. We also discuss hybrid approaches to complex image analysis combining crowds, experts, and computer vision.

### Monday 16 16:40 - 18:05 ML-CLU - Clustering (C3)

Chair: Zhao Kang
• #3980
A Local Algorithm for Product Return Prediction in E-Commerce
Yada Zhu, Jianbo Li, Jingrui He, Brian L. Quanz, Ajay A. Deshpande
Clustering

With the rapid growth of e-tail, the cost to handle returned online orders also increases significantly and has become a major challenge in the e-commerce industry. Accurate prediction of product returns allows e-tailers to prevent problematic transactions in advance. However, the limited existing work on modeling customer online shopping behaviors and predicting their return actions fails to integrate the rich information in the product purchase and return history (e.g., return history, purchase-no-return behavior, and customer/product similarity). Furthermore, the large-scale data sets involved in this problem, typically consisting of millions of customers and tens of thousands of products, also render existing methods inefficient and ineffective at predicting product returns. To address these problems, in this paper, we propose to use a weighted hybrid graph to represent the rich information in the product purchase and return history, in order to predict product returns. The proposed graph consists of both customer nodes and product nodes, undirected edges reflecting customer return history and customer/product similarity based on their attributes, as well as directed edges discriminating purchase-no-return and no-purchase actions. Based on this representation, we study a random-walk-based local algorithm for predicting product return propensity for each customer, whose computational complexity depends only on the size of the output cluster rather than the entire graph. Such a property makes the proposed local algorithm particularly suitable for processing large-scale data sets to predict product returns. To test the performance of the proposed techniques, we evaluate the graph model and algorithm on multiple e-commerce data sets, showing improved performance over state-of-the-art methods.

• #1810
Mixture of GANs for Clustering
Yang Yu, Wen-Ji Zhou
Clustering

For data clustering, the Gaussian mixture model (GMM) is a typical method that trains several Gaussian models to capture the data. Each Gaussian model then provides the distribution information of a cluster. For clustering of high-dimensional and complex data, models more flexible than Gaussian models are desired. Recently, generative adversarial networks (GANs) have shown effectiveness in capturing complex data distributions. Therefore, a GAN mixture model (GANMM) would be a promising alternative to GMM. However, we notice that the non-flexibility of the Gaussian model is essential in the expectation-maximization procedure for training GMM. GAN can have much higher flexibility, which disables the commonly employed expectation-maximization procedure, since the maximization cannot change the result of the expectation. In this paper, we propose to use the epsilon-expectation-maximization procedure for training GANMM. The experiments show that the proposed GANMM can have good performance on complex data as well as simple data.

• #1846
An Information Theory based Approach to Multisource Clustering
Pierre-Alexandre Murena, Jérémie Sublime, Basarab Matei, Antoine Cornuéjols
Clustering

Clustering is a compression task which consists in grouping similar objects into clusters. In real-life applications, the system may have access to several views of the same data and each view may be processed by a specific clustering algorithm: this framework is called multi-view clustering and can benefit from algorithms capable of exchanging information between the different views. In this paper, we consider this type of unsupervised ensemble learning as a compression problem and develop a theoretical framework based on algorithmic information theory suitable for multi-view clustering and collaborative clustering applications. Using this approach, we propose a new algorithm with a solid theoretical basis, and test it on several real and artificial data sets.

• #3443
High-Order Co-Clustering via Strictly Orthogonal and Symmetric L1-Norm Nonnegative Matrix Tri-Factorization
Kai Liu, Hua Wang
Clustering

Unlike traditional clustering methods that deal with one single type of data, High-Order Co-Clustering (HOCC) aims to cluster multiple types of data simultaneously by utilizing the inter- and/or intra-type relationships across different data types. In existing HOCC methods, data points routinely enter the objective functions with squared residual errors. As a result, outlying data samples can dominate the objective functions, which may lead to incorrect clustering results. Moreover, existing methods usually suffer from soft clustering, where the probabilities to different groups can be very close. In this paper, we propose an L1-norm symmetric nonnegative matrix tri-factorization method to solve the HOCC problem. Due to the orthogonal constraints and the symmetric L1-norm formulation in our new objective, the conventional auxiliary function approach no longer works. Thus we derive the solution algorithm using the alternating direction method of multipliers. Extensive experiments have been conducted on a real-world data set, in which promising empirical results, including less time consumption, a strictly orthogonal membership matrix, lower local minima, etc., have demonstrated the effectiveness of our proposed method.

• #5457
(Journal track) Rademacher Complexity Bounds for a Penalized Multi-class Semi-supervised Algorithm
Yury Maximov, Massih-Reza Amini, Zaid Harchaoui
Clustering

We propose Rademacher complexity bounds for multi-class classifiers trained with a two-step semi-supervised model. In the first step, the algorithm partitions the partially labeled data and then identifies dense clusters containing k predominant classes using the labeled training examples such that the proportion of their non-predominant classes is below a fixed threshold that stands for clustering consistency. In the second step, a classifier is trained by minimizing a margin empirical loss over the labeled training set and a penalization term measuring the inability of the learner to predict the k predominant classes of the identified clusters. The resulting data-dependent generalization error bound involves the margin distribution of the classifier, the stability of the clustering technique used in the first step and Rademacher complexity terms corresponding to partially labeled training data. Our theoretical results exhibit convergence rates extending those proposed in the literature for the binary case, and experimental results on different multi-class classification problems show empirical evidence that supports the theory.

• #1380
Self-weighted Multiple Kernel Learning for Graph-based Clustering and Semi-supervised Classification
Zhao Kang, Xiao Lu, Jinfeng Yi, Zenglin Xu
Clustering

Multiple kernel learning (MKL) methods are generally believed to perform better than single kernel methods. However, some empirical studies show that this is not always true: the combination of multiple kernels may yield even worse performance than using a single kernel. There are two possible reasons for the failure: (i) most existing MKL methods assume that the optimal kernel is a linear combination of base kernels, which may not hold true; and (ii) some kernel weights are inappropriately assigned due to noise and carelessly designed algorithms. In this paper, we propose a novel MKL framework by following two intuitive assumptions: (i) each kernel is a perturbation of the consensus kernel; and (ii) the kernel that is close to the consensus kernel should be assigned a large weight. Impressively, the proposed method can automatically assign an appropriate weight to each kernel without introducing additional parameters, as existing methods do. The proposed framework is integrated into a unified framework for graph-based clustering and semi-supervised classification. We have conducted experiments on multiple benchmark datasets and our empirical results verify the superiority of the proposed framework.

• #2045
Ranking Preserving Nonnegative Matrix Factorization
Jing Wang, Feng Tian, Weiwei Liu, Xiao Wang, Wenjie Zhang, Kenji Yamanishi
Clustering

Nonnegative matrix factorization (NMF), a well-known technique to find parts-based representations of nonnegative data, has been widely studied. In reality, ordinal relations often exist among data, such as data i being more related to j than to q. Such relative order is naturally available, and more importantly, it truly reflects the latent data structure. Preserving the ordinal relations enables us to find structured representations of data that are faithful to the relative order, so that the learned representations become more discriminative. However, current NMFs pay no attention to this. In this paper, we make the first attempt towards incorporating the ordinal relations and propose a novel ranking preserving nonnegative matrix factorization (RPNMF) approach, which enforces the learned representations to be ranked according to the relations. We derive iterative updating rules to solve RPNMF's objective function with convergence guaranteed. Experimental results with several datasets for clustering and classification have demonstrated that RPNMF achieves better performance than state-of-the-art methods, not only in terms of accuracy, but also in the interpretation of orderly data structure.

### Tuesday 17 08:30 - 09:45 EAR4 - Early Career 4 (VICTORIA)

Chair: Matthijs Spaan
• #5447
Improving Reinforcement Learning with Human Input
Matthew E. Taylor
Early Career 4

Reinforcement learning (RL) has had many successes when learning autonomously. This paper and accompanying talk consider how to make use of a non-technical human participant, when available. In particular, we consider the case where a human could 1) provide demonstrations of good behavior, 2) provide online evaluative feedback, or 3) define a curriculum of tasks for the agent to learn on. In all cases, our work has shown such information can be effectively leveraged. After giving a high-level overview of this work, we will highlight a set of open questions and suggest where future work could be usefully focused.

• #5490
Partakable Technology
Nardine Osman
Early Career 4

This paper proposes a shift in how technology is currently being developed by giving people, the users, control over their technology. We argue that users should have a say in the behaviour of the technologies that mediate their online interactions and control their private data. We propose 'partakable technologies', technologies where users can come together to discuss and agree on their features and functionalities. To achieve this, we base our proposal on a number of existing technologies in the fields of agreement technologies, natural language processing, normative systems, and formal verification. As an IJCAI early career spotlight paper, the paper provides an overview of the author's expertise in these different areas.

• #5496
Solving Games with Structured Strategy Spaces
Albert Xin Jiang
Early Career 4

### Tuesday 17 08:30 - 09:55 KR-QUE - Query Answering and Databases (C7)

Chair: Diego Calvanese
• #3135
Finite Model Reasoning in Hybrid Classes of Existential Rules
Georg Gottlob, Marco Manna, Andreas Pieris

Two paradigmatic restrictions that have been studied for ensuring the decidability of query answering under existential rules are guardedness and stickiness. With the aim of consolidating these restrictions, a flexible condition, called tameness, was proposed a few years ago, which relies on hybrid reasoning, i.e., a combination of forward and backward procedures. The complexity of query answering under this hybrid class of existential rules is by now well understood. However, the complexity of finite query answering, i.e., query answering under finite models, has remained an open problem. Closing this problem is the main goal of this work.

• #4439
Complexity of Approximate Query Answering under Inconsistency in Datalog+/-
Thomas Lukasiewicz, Enrico Malizia, Cristian Molinaro

Several semantics have been proposed to query inconsistent ontological knowledge bases, including the intersection of repairs and the intersection of closed repairs as two approximate inconsistency-tolerant semantics. In this paper, we analyze the complexity of conjunctive query answering under these two semantics for a wide range of Datalog+/- languages. We consider both the standard setting, where errors may only be in the database, and the generalized setting, where also the rules of a Datalog+/- knowledge base may be erroneous.

• #3199
Computing Approximate Query Answers over Inconsistent Knowledge Bases
Sergio Greco, Cristian Molinaro, Irina Trubitsyna

Consistent query answering is a principled approach for querying inconsistent knowledge bases. It relies on the notion of a "repair", that is, a maximal consistent subset of the facts in the knowledge base. One drawback of this approach is that entire facts are deleted to resolve inconsistency, even if they may still contain useful "reliable" information. To overcome this limitation, we propose a new notion of repair allowing values within facts to be updated for restoring consistency. This more fine-grained repair primitive allows us to preserve more information in the knowledge base. We also introduce the notion of a "universal repair", which is a compact representation of all repairs. Then, we show that consistent query answering in our framework is intractable (coNP-complete). In light of this result, we develop a polynomial time approximation algorithm for computing a sound (but possibly incomplete) set of consistent query answers.
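
To make the classical notion the abstract starts from concrete, here is a small sketch of subset repairs and consistent answers under a single key constraint. It illustrates only the standard fact-deletion repairs, not the value-update repairs or the universal repair proposed in the paper. The relation layout, the helper names `repairs` and `consistent_answers`, and the sample facts are assumptions made for illustration.

```python
from itertools import product

def repairs(facts):
    """Enumerate the maximal subsets of `facts` that satisfy the (assumed)
    key constraint 'the first attribute determines the second'."""
    groups = {}
    for key, value in facts:
        groups.setdefault(key, []).append((key, value))
    # A maximal consistent subset keeps exactly one fact per key value.
    return [set(choice) for choice in product(*groups.values())]

def consistent_answers(facts, query):
    """Answers that `query` returns in every repair."""
    answer_sets = [query(repair) for repair in repairs(facts)]
    return set.intersection(*answer_sets)

# Hypothetical relation lives_in(name, city) with key `name`;
# the two facts about alice violate the key.
facts = {("alice", "Rome"), ("alice", "Milan"), ("bob", "Paris")}

def names(db):
    return {name for name, _ in db}

def cities(db):
    return {city for _, city in db}

print(consistent_answers(facts, names))   # names present in every repair
print(consistent_answers(facts, cities))  # cities present in every repair
```

Enumerating repairs is exponential in the number of conflicting keys, which is why this brute-force approach is only suitable for tiny examples.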

• #1068
Mario Alviano

Propositional circumscription defines a preference relation over the models of a propositional theory, so that models being subset-minimal on the interpretation of a set of objective atoms are preferred. The complexity of several computational tasks increases by one level in the polynomial hierarchy due to such a preference relation; among them is query answering, which amounts to deciding whether there is an optimal model satisfying the query. A complete algorithm for query answering is obtained by searching for a model, not necessarily an optimal one, that satisfies the query and such that no model falsifying the query is more preferred. If the query or its complement is among the objective atoms, the algorithm has a simpler behavior, which is also described in the paper. Moreover, an incomplete algorithm is obtained by searching for a model satisfying both the query and an objective atom that is unit-implied by the theory extended with the complement of the query. A prototypical implementation is tested on instances from the 2nd International Competition on Computational Models of Argumentation (ICCMA'17).

• #3892
Compiling Model Representations for Querying Large ABoxes in Expressive DLs
Labinot Bajraktari, Magdalena Ortiz, Mantas Simkus

Answering ontology mediated queries (OMQs) has received much attention in the last decade, but the big gap between practicable algorithms for lightweight ontologies, that are supported by implemented reasoners, and purely theoretical algorithms for expressive ontologies that are not amenable to implementation, has only increased. Towards narrowing the gap, we propose an algorithm to compile a representation of sets of models for ALCHI ontologies, which is sufficient for answering any monotone OMQ. Rather than reasoning for specific ABoxes, or being fully data-independent, we use generic descriptions of families of ABoxes, given by what we call profiles. Our model compilation algorithm runs on TBoxes and sets of profiles, and supports the incremental addition of new profiles. To illustrate the potential of our approach for OMQ answering, we implement a rewriting into an extension of Datalog for OMQs comprising reachability queries, and provide some promising evaluation results.

• #3448
First-Order Rewritability of Frontier-Guarded Ontology-Mediated Queries
Pablo Barceló, Gerald Berger, Carsten Lutz, Andreas Pieris

We focus on ontology-mediated queries (OMQs) based on (frontier-)guarded existential rules and (unions of) conjunctive queries, and we investigate the problem of FO-rewritability, i.e., whether an OMQ can be rewritten as a first-order query. We adopt two different approaches. The first approach employs standard two-way alternating parity tree automata. Although it does not lead to a tight complexity bound, it provides a transparent solution based on widely known tools. The second approach relies on a sophisticated automata model, known as cost automata. This allows us to show that our problem is 2EXPTIME-complete. In both approaches, we provide semantic characterizations of FO-rewritability that are of independent interest.

• #3220
Consequence-based Reasoning for Description Logics with Disjunction, Inverse Roles, Number Restrictions, and Nominals
David Tena Cucala, Bernardo Cuenca Grau, Ian Horrocks

We present a consequence-based calculus for concept subsumption and classification in the description logic ALCHOIQ, which extends ALC with role hierarchies, inverse roles, number restrictions, and nominals. By using standard transformations, our calculus extends to SROIQ, which covers all of OWL 2 DL except for datatypes. A key feature of our calculus is its pay-as-you-go behaviour: unlike existing algorithms, our calculus is worst-case optimal for all the well-known proper fragments of ALCHOIQ, albeit not for the full logic.

### Tuesday 17 08:30 - 09:55 ML-MMM2 - Multi-Instance, Multi-Label, Multi-View Learning 2 (C8)

Chair: Pengfei Zhu
• #155
Grouping Attribute Recognition for Pedestrian with Joint Recurrent Learning
Xin Zhao, Liufang Sang, Guiguang Ding, Yuchen Guo, Xiaoming Jin
Multi-Instance, Multi-Label, Multi-View Learning 2

Pedestrian attribute recognition aims to predict the attribute labels of pedestrians from surveillance images, which is a very challenging task for computer vision due to poor imaging quality and small training datasets. It is observed that the semantic pedestrian attributes to be recognised tend to show semantic or visual spatial correlation. Attributes can be grouped by this correlation, yet previous works mostly ignore this phenomenon. Inspired by the strong capability of Recurrent Neural Networks (RNNs) for learning context correlations, this paper proposes an end-to-end Grouping Recurrent Learning (GRL) model that takes advantage of the intra-group mutual exclusion and inter-group correlation to improve the performance of pedestrian attribute recognition. Our GRL method starts with the detection of precise body regions via Body Region Proposal, followed by feature extraction from the detected regions. These features, along with the semantic groups, are fed into an RNN for recurrent grouping attribute recognition, where intra-group correlations can be learned. Extensive empirical evidence shows that our GRL model achieves state-of-the-art results on the standard PETA and RAP pedestrian attribute datasets.

• #1103
Multi-Label Co-Training
Yuying Xing, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Zili Zhang
Multi-Instance, Multi-Label, Multi-View Learning 2

Multi-label learning aims at assigning a set of appropriate labels to multi-label samples. Although it has been successfully applied in various domains in recent years, most multi-label learning methods require sufficient labeled training samples, because of the large number of possible label sets. Co-training, as an important branch of semi-supervised learning, can leverage unlabeled samples, along with scarce labeled ones, and can potentially help with the large labeled data requirement. However, it is a difficult challenge to combine multi-label learning with co-training. Two distinct issues are associated with the challenge: (i) how to solve the widely-witnessed class-imbalance problem in multi-label learning; and (ii) how to select samples with confidence, and communicate their predicted labels among classifiers for model refinement. To address these issues, we introduce an approach called Multi-Label Co-Training (MLCT). MLCT leverages information concerning the co-occurrence of pairwise labels to address the class-imbalance challenge; it introduces a predictive reliability measure to select samples, and applies label-wise filtering to confidently communicate labels of selected samples among co-training classifiers. MLCT performs favorably against related competitive multi-label learning methods on benchmark datasets and it is also robust to the input parameters.

• #2732
Deep Discrete Prototype Multilabel Learning
Xiaobo Shen, Weiwei Liu, Yong Luo, Yew-Soon Ong, Ivor W. Tsang
Multi-Instance, Multi-Label, Multi-View Learning 2

kNN embedding methods, such as the state-of-the-art LM-kNN, have shown impressive results in multi-label learning. Unfortunately, these approaches suffer from expensive computation and memory costs in large-scale settings. To fill this gap, this paper proposes a novel deep prototype compression method, DBPC, for fast multi-label prediction. DBPC compresses the database into a small set of short discrete prototypes, and uses the prototypes for prediction. The benefit of DBPC comes from two aspects: 1) the number of distance comparisons is reduced to the number of prototypes; 2) the distance computation cost is significantly decreased in the reduced space. We propose to jointly learn the deep latent subspace and discrete prototypes within one framework. Encoding and decoding neural networks are employed so that the deep discrete prototypes represent the instances and labels well. Extensive experiments on several large-scale datasets demonstrate that DBPC achieves several orders of magnitude lower storage and prediction complexity than state-of-the-art multi-label methods, while achieving competitive accuracy.

• #3183
Leveraging Latent Label Distributions for Partial Label Learning
Lei Feng, Bo An
Multi-Instance, Multi-Label, Multi-View Learning 2

In partial label learning, each training example is assigned a set of candidate labels, only one of which is the ground-truth label. Existing partial label learning frameworks either assume that each candidate label has equal confidence or consider the ground-truth label as a latent variable hidden in the indiscriminate candidate label set, while the different labeling confidence levels of the candidate labels are ignored. In this paper, we formalize the different labeling confidence levels as latent label distributions, and propose a novel unified framework that estimates the latent label distributions while simultaneously training the model. Specifically, we present a biconvex formulation with constrained local consistency and adopt an alternating method to solve this optimization problem. The alternating optimization facilitates the mutual adaptation of model training and constrained label propagation. Extensive experimental results on controlled UCI datasets as well as real-world datasets clearly show the effectiveness of the proposed approach.

• #3330
Robust Multi-view Learning via Half-quadratic Minimization
Yonghua Zhu, Xiaofeng Zhu, Wei Zheng
Multi-Instance, Multi-Label, Multi-View Learning 2

Although multi-view clustering is able to use more information than single-view clustering, existing multi-view clustering methods still have issues to be addressed, such as initialization sensitivity, the specification of the number of clusters, and the influence of outliers. In this paper, we propose a robust multi-view clustering method to address these issues. Specifically, we first propose a multi-view based sum-of-square error estimation to make the initialization easy and simple, and we use a sum-of-norm regularization to automatically learn the number of clusters according to the data distribution. We further employ robust estimators constructed by the half-quadratic theory to avoid the influence of outliers, yielding robust estimates of both the sum-of-square error and the number of clusters. Experimental results on both synthetic and real datasets demonstrate that our method outperforms the state-of-the-art methods.

• #1694
Localized Incomplete Multiple Kernel k-means
Xinzhong Zhu, Xinwang Liu, Miaomiao Li, En Zhu, Li Liu, Zhiping Cai, Jianping Yin, Wen Gao
Multi-Instance, Multi-Label, Multi-View Learning 2

The recently proposed multiple kernel k-means with incomplete kernels (MKKM-IK) optimally integrates a group of pre-specified incomplete kernel matrices to improve clustering performance. Though it demonstrates promising performance in various applications, we observe that it does not sufficiently consider the local structure among data and indiscriminately forces all pairwise sample similarity to equally align with their ideal similarity values. This could make the incomplete kernels less effectively imputed, and in turn adversely affect the clustering performance. In this paper, we propose a novel localized incomplete multiple kernel k-means (LI-MKKM) algorithm to address this issue. Different from existing MKKM-IK, LI-MKKM only requires the similarity of a sample to its k-nearest neighbors to align with their ideal similarity values. This helps the clustering algorithm to focus on closer sample pairs that shall stay together and avoids involving unreliable similarity evaluation for farther sample pairs. We carefully design a three-step iterative algorithm to solve the resultant optimization problem and theoretically prove its convergence. Comprehensive experiments on eight benchmark datasets demonstrate that our algorithm significantly outperforms the state-of-the-art comparable algorithms proposed in the recent literature, verifying the advantage of considering local structure.

• #2333
Label Embedding Based on Multi-Scale Locality Preservation
Cheng-Lun Peng, An Tao, Xin Geng
Multi-Instance, Multi-Label, Multi-View Learning 2

Label Distribution Learning (LDL) is well suited to situations that focus on the overall distribution of the whole series of labels. The numerical labels of LDL satisfy the integrity probability constraint. Due to LDL's special label domain, existing label embedding algorithms that focus on the embedding of binary labels are unfit for LDL. This paper proposes a specially designed approach, Multi-Scale Locality Preserving (MSLP), that achieves label embedding for LDL. Specifically, MSLP takes the locality information of data in both the label space and the feature space into account with different locality granularity. By assuming an explicit mapping from the features to the embedded labels, MSLP does not need an additional learning process after completing the embedding. Besides, MSLP is insensitive to the existence of data points violating the smoothness assumption, which is usually caused by noise. Experimental results demonstrate the effectiveness of MSLP in preserving the locality structure of label distributions in the embedding space and show its superiority over the state-of-the-art baseline methods.

### Tuesday 17 08:30 - 09:55 PS-UAI - Planning and Uncertainty in Ai: Markov Decision Processes (K2)

Chair: Chris Amato
• #828
Policy Optimization with Second-Order Advantage Information
Jiajin Li, Baoxiang Wang, Shengyu Zhang
Planning and Uncertainty in Ai: Markov Decision Processes

Policy optimization on high-dimensional continuous control tasks is difficult because of the large variance of policy gradient estimators. We present the action subspace dependent gradient (ASDG) estimator, which incorporates the Rao-Blackwell theorem (RB) and Control Variates (CV) into a unified framework to reduce the variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure of the action space based on second-order advantage information. POSA captures the quadratic information explicitly and efficiently by utilizing the wide & deep architecture. Empirical studies show that our proposed approach improves performance on high-dimensional synthetic settings and OpenAI Gym's MuJoCo continuous control tasks.

• #3283
Computational Approaches for Stochastic Shortest Path on Succinct MDPs
Krishnendu Chatterjee, Hongfei Fu, Amir Goharshady, Nastaran Okati
Planning and Uncertainty in Ai: Markov Decision Processes

We consider the stochastic shortest path (SSP) problem for succinct Markov decision processes (MDPs), where the MDP consists of a set of variables, and a set of nondeterministic rules that update the variables. First, we show that several examples from the AI literature can be modeled as succinct MDPs. Then we present computational approaches for upper and lower bounds for the SSP problem: (a) for computing upper bounds, our method is polynomial-time in the implicit description of the MDP; (b) for lower bounds, we present a polynomial-time (in the size of the implicit description) reduction to quadratic programming. Our approach is applicable even to infinite-state MDPs. Finally, we present experimental results to demonstrate the effectiveness of our approach on several classical examples from the AI literature.

• #3957
Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes
Shun Zhang, Edmund H. Durfee, Satinder Singh
Planning and Uncertainty in Ai: Markov Decision Processes

As it achieves a goal on behalf of its human user, an autonomous agent's actions may have side effects that change features of its environment in ways that negatively surprise its user. An agent that can be trusted to operate safely should thus only change features the user has explicitly permitted. We formalize this problem, and develop a planning algorithm that avoids potentially negative side effects given what the agent knows about (un)changeable features. Further, we formulate a provably minimax-regret querying strategy for the agent to selectively ask the user about features that it hasn't explicitly been told about. We empirically show how much faster it is than a more exhaustive approach and how much better its queries are than those found by the best known heuristic.

• #3587
Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs
Karel Horák, Branislav Bošanský, Krishnendu Chatterjee
Planning and Uncertainty in Ai: Markov Decision Processes

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.

• #1724
Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-Sum Objectives
Krishnendu Chatterjee, Adrián Elgyütt, Petr Novotný, Owen Rouillé
Planning and Uncertainty in Ai: Markov Decision Processes

Partially-observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework to model a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low probability events with extreme payoff can significantly affect the expectation, and thus the obtained policies are not necessarily risk averse. An alternate approach is to optimize the probability that the payoff is above a certain threshold, which yields risk-averse policies but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation while ensuring that the payoff is above a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.

• #1673
Dynamic Resource Routing using Real-Time Dynamic Programming
Sebastian Schmoll, Matthias Schubert
Planning and Uncertainty in Ai: Markov Decision Processes

Acquiring available resources in stochastic environments is becoming more and more important to future mobility. For instance, cities like Melbourne, Canberra and San Francisco install sensors that detect in real time whether a parking spot (resource) is available or not. In such environments, the current state of the resources may be fully observable, although the future development is stochastic. In order to reduce traffic, such cities want to fully exploit parking spots, such that the number of searching cars is minimized. Thus, we formulate a problem setting where the expected seek time for each driver is minimized. This problem can be modeled by a Markov Decision Process (MDP) and solved using standard algorithms. In this paper, we focus on the setting where pre-computation is not possible and search policies have to be computed on the fly. Our approach is based on state-of-the-art Real-Time Dynamic Programming (RTDP) approaches. However, standard RTDP approaches do not perform well on this specific problem setting, as shown in our experiments. We introduce adapted bounds and approximations that exploit the specific nature of the problem in order to improve the performance significantly.

• #3506
Planning and Learning with Stochastic Action Sets
Craig Boutilier, Alon Cohen, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans
Planning and Uncertainty in Ai: Markov Decision Processes

In many practical uses of reinforcement learning (RL) the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have been unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundations. We show that optimal policies and value functions in this model have a structure that admits a compact representation. From an RL perspective, we show that Q-learning with sampled action sets is sound. In model-based settings, we consider two important special cases: when individual actions are available with independent probabilities, and a sampling-based model for unknown distributions. We develop polynomial-time value and policy iteration methods for both cases, and provide a polynomial-time linear programming solution for the first case.
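
One way to read "Q-learning with sampled action sets" is sketched below: a tabular learner in which both the action choice and the bootstrap maximum are restricted to the action set realized at the current and next state. This is an illustrative interpretation under stated assumptions, not the authors' algorithm or analysis; the function `q_learning_sas`, the callbacks `step` and `sample_actions`, and the two-state toy problem are all hypothetical.

```python
import random

def q_learning_sas(n_states, n_actions, step, sample_actions,
                   episodes=500, horizon=20, alpha=0.1, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Tabular Q-learning where only a sampled subset of actions is
    available at each visited state (assumed setup)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        avail = sample_actions(s, rng)
        for _ in range(horizon):
            # Epsilon-greedy choice restricted to the realized action set.
            if rng.random() < epsilon:
                a = rng.choice(avail)
            else:
                a = max(avail, key=lambda b: Q[s][b])
            s_next, r = step(s, a, rng)
            avail_next = sample_actions(s_next, rng)
            # Bootstrap only over actions realized at the next state.
            target = r + gamma * max(Q[s_next][b] for b in avail_next)
            Q[s][a] += alpha * (target - Q[s][a])
            s, avail = s_next, avail_next
    return Q

# Hypothetical toy problem: the next state equals the chosen action,
# and action 1 pays reward 1 while action 0 pays nothing.
def step(s, a, rng):
    return a, float(a)

# Action 0 is always available; action 1 is available half of the time.
def sample_actions(s, rng):
    return [0, 1] if rng.random() < 0.5 else [0]

Q = q_learning_sas(n_states=2, n_actions=2, step=step,
                   sample_actions=sample_actions)
print(Q)
```

In this toy problem the learned values for action 1 should exceed those for action 0 in both states, since action 1 always yields the larger immediate reward and the continuation value does not depend on the current state.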

### Tuesday 17 08:30 - 09:55 NLP-DIA2 - Dialogue, Conversation Models (T2)

Chair: Lei Shu
• #781
Goal-Oriented Chatbot Dialog Management Bootstrapping with Transfer Learning
Vladimir Ilievski, Claudiu Musat, Andreea Hossman, Michael Baeriswyl
Dialogue, Conversation Models

Goal-Oriented (GO) Dialogue Systems, colloquially known as goal oriented chatbots, help users achieve a predefined goal (e.g. book a movie ticket) within a closed domain. A first step is to understand the user's goal by using natural language understanding techniques. Once the goal is known, the bot must manage a dialogue to achieve that goal, which is conducted with respect to a learnt policy. The success of the dialogue system depends on the quality of the policy, which is in turn reliant on the availability of high-quality training data for the policy learning method, for instance Deep Reinforcement Learning. Due to the domain specificity, the amount of available data is typically too low to allow the training of good dialogue policies. In this paper we introduce a transfer learning method to mitigate the effects of the low in-domain data availability. Our transfer learning based approach improves the bot's success rate by 20% in relative terms for distant domains and we more than double it for close domains, compared to the model without transfer learning. Moreover, the transfer learning chatbots learn the policy up to 5 to 10 times faster. Finally, as the transfer learning approach is complementary to additional processing such as warm-starting, we show that their joint application gives the best outcomes.

• #378
A Weakly Supervised Method for Topic Segmentation and Labeling in Goal-oriented Dialogues via Reinforcement Learning
Ryuichi Takanobu, Minlie Huang, Zhongzhou Zhao, Fenglin Li, Haiqing Chen, Xiaoyan Zhu, Liqiang Nie
Dialogue, Conversation Models

Topic structure analysis plays a pivotal role in dialogue understanding. We propose a reinforcement learning (RL) method for topic segmentation and labeling in goal-oriented dialogues, which aims to detect topic boundaries among dialogue utterances and assign topic labels to the utterances. We address three common issues in goal-oriented customer service dialogues: informality, local topic continuity, and global topic structure. We explore the task in a weakly supervised setting and formulate it as a sequential decision problem. The proposed method consists of a state representation network to address the informality issue, and a policy network with rewards to model local topic continuity and global topic structure. To train the two networks and offer a warm-start to the policy, we first use some keywords to annotate the data automatically. We then pre-train the networks on the noisy data. From then on, the method repeatedly refines the data labels using the current policy and learns better state representations on the refined data to obtain a better policy. Results demonstrate that this weakly supervised method obtains substantial improvements over state-of-the-art baselines.

• #2624
Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents
Wenhan Xiong, Xiaoxiao Guo, Mo Yu, Shiyu Chang, Bowen Zhou, William Yang Wang
Dialogue, Conversation Models

We investigate the task of learning to interpret natural language instructions by jointly reasoning with visual observations and language inputs. Unlike current methods which start with learning from demonstrations (LfD) and then use reinforcement learning (RL) to fine-tune the model parameters, we propose a novel policy optimization algorithm which can dynamically schedule demonstration learning and RL. The proposed training paradigm provides efficient exploration and generalization beyond existing methods. Compared to existing ensemble models, the best single model based on our proposed method decreases the execution error by 55% on a block-world environment. To further illustrate the exploration strategy of our RL algorithm, our paper includes systematic studies on the evolution of policy entropy during training.

• #955
Assigning Personality/Profile to a Chatting Machine for Coherent Conversation Generation
Qiao Qian, Minlie Huang, Haizhou Zhao, Jingfang Xu, Xiaoyan Zhu
Dialogue, Conversation Models

Endowing a chatbot with personality is challenging but significant to deliver more realistic and natural conversations. In this paper, we address the issue of generating responses that are coherent to a pre-specified personality or profile. We present a method that uses generic conversation data from social media (without speaker identities) to generate profile-coherent responses. The central idea is to detect whether a profile should be used when responding to a user post (by a profile detector), and if necessary, select a key-value pair from the profile to generate a response forward and backward (by a bidirectional decoder) so that a personality-coherent response can be generated. Furthermore, in order to train the bidirectional decoder with generic dialogue data, a position detector is designed to predict a word position from which decoding should start given a profile value. Manual and automatic evaluation shows that our model can deliver more coherent, natural, and diversified responses.

• #1045
Adaboost with Auto-Evaluation for Conversational Models
Juncen Li, Ping Luo, Ganbin Zhou, Fen Lin, Cheng Niu
Dialogue, Conversation Models

We propose a boosting method for conversational models to encourage them to generate more human-like dialogs. In our method, we consider existing conversational models as weak generators and apply Adaboost to update those models. However, conventional Adaboost cannot be directly applied to conversational models. This is because, for conversational models, conventional Adaboost cannot adaptively adjust the weights on the instances for subsequent learning: a simple comparison between the true output y (to an input x) and its corresponding predicted output y' cannot directly evaluate the learning performance on x. To address this issue, we develop Adaboost with Auto-Evaluation (called AwE). In AwE, an auto-evaluator is proposed to evaluate the predicted results, which makes Adaboost applicable to conversational models. Furthermore, we present a theoretical analysis showing that the training error drops exponentially fast only if a certain assumption on the proposed auto-evaluator holds. Finally, we empirically show that AwE visibly boosts the performance of existing single conversational models and also outperforms the other ensemble methods for conversational models.

• #1324
An Ensemble of Retrieval-Based and Generation-Based Human-Computer Conversation Systems
Yiping Song, Cheng-Te Li, Jian-Yun Nie, Ming Zhang, Dongyan Zhao, Rui Yan
Dialogue, Conversation Models

Human-computer conversation systems have attracted much attention in Natural Language Processing. Conversation systems can be roughly divided into two categories: retrieval-based and generation-based systems. Retrieval systems search for a user-issued utterance (namely a query) in a large conversational repository and return a reply that best matches the query. Generative approaches synthesize new replies. Both approaches have certain advantages but suffer from their own disadvantages. We propose a novel ensemble of retrieval-based and generation-based conversation systems. The retrieved candidates, in addition to the original query, are fed to a reply generator via a neural network, so that the model is aware of more information. The generated reply, together with the retrieved ones, then participates in a re-ranking process to find the final reply to output. Experimental results show that such an ensemble system outperforms each single module by a large margin.

• #1730
Reinforcing Coherence for Sequence to Sequence Model in Dialogue Generation
Hainan Zhang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng
Dialogue, Conversation Models

The sequence-to-sequence (Seq2Seq) approach has gained great attention in the field of single-turn dialogue generation. However, one serious problem is that most existing Seq2Seq based models tend to generate common responses lacking specific meanings. Our analysis shows that the underlying reason is that Seq2Seq is equivalent to optimizing Kullback–Leibler (KL) divergence, and thus does not penalize the case whose generated probability is high while the true probability is low. However, the true probability is unknown, which poses challenges for tackling this problem. Inspired by the fact that the coherence (i.e. similarity) between post and response is consistent with human evaluation, we hypothesize that the true probability of a response is proportional to the coherence degree. The coherence scores are then used as the reward function in a reinforcement learning framework to penalize the case whose generated probability is high while the true probability is low. Three different types of coherence models, including an unlearned similarity function, a pretrained semantic matching function, and an end-to-end dual learning architecture, are proposed in this paper. Experimental results on both a Chinese Weibo dataset and an English Subtitle dataset show that the proposed models produce more specific and meaningful responses, yielding better performance than Seq2Seq models in terms of both metric-based and human evaluations.

### Tuesday 17 08:30 - 09:55 CV-DEE - Deep Learning for Computer Vision (T1)

Chair: Pong C. Yuen
• #3145
Image-level to Pixel-wise Labeling: From Theory to Practice
Tiezhu Sun, Wei Zhang, Zhijie Wang, Lin Ma, Zequn Jie
Deep Learning for Computer Vision

Conventional convolutional neural networks (CNNs) have achieved great success in image semantic segmentation. Existing methods mainly focus on learning pixel-wise labels from an image directly. In this paper, we advocate tackling the pixel-wise segmentation problem by considering the image-level classification labels. Theoretically, we analyze and discuss the effects of image-level labels on pixel-wise segmentation from the perspective of information theory. In practice, an end-to-end segmentation model is built by fusing the image-level and pixel-wise labeling networks. A generative network is included to reconstruct the input image and further boost the segmentation model training with an auxiliary loss. Extensive experimental results on a benchmark dataset demonstrate the effectiveness of the proposed method, where good image-level labels can significantly improve the pixel-wise segmentation accuracy.

• #2729
Unifying and Merging Well-trained Deep Neural Networks for Inference Stage
Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen
Deep Learning for Computer Vision

We propose a novel method to merge convolutional neural-nets for the inference stage. Given two well-trained networks that may have different architectures that handle different tasks, our method aligns the layers of the original networks and merges them into a unified model by sharing the representative codes of weights. The shared weights are further re-trained to fine-tune the performance of the merged model. The proposed method effectively produces a compact model that may run original tasks simultaneously on resource-limited devices. As it preserves the general architectures and leverages the co-used weights of well-trained networks, a substantial training overhead can be reduced to shorten the system development time. Experimental results demonstrate a satisfactory performance and validate the effectiveness of the method.

• #1618
Refine or Represent: Residual Networks with Explicit Channel-wise Configuration
Yanyan Shen, Jinyang Gao
Deep Learning for Computer Vision

The successes of deep residual learning are mainly based on one key insight: instead of learning a completely new representation y = H(x), it is much easier to learn and optimize its residual mapping F(x) = H(x) - x, as F(x) could be generally closer to zero than the non-residual function H(x). In this paper, we further exploit this insight by explicitly configuring each feature channel with a fine-grained learning style. We define two types of channel-wise learning styles: Refine and Represent. A Refine channel is learnt via the residual function y_i = F_i(x) + x_i with a regularization term on the channel response ||F_i(x)||, aiming to refine the input feature channel x_i of the layer. A Represent channel directly learns a new representation y_i = H_i(x) without calculating the residual function with reference to x_i. We apply random channel-wise configuration to each residual learning block. Experimental results on the CIFAR10, CIFAR100 and ImageNet datasets demonstrate that our proposed method can substantially improve the performance of conventional residual networks including ResNet, ResNeXt and SENet.
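
The two channel styles can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the boolean mask `is_refine`, and the choice of an L2 norm for the channel-response penalty are assumptions made for illustration.

```python
import numpy as np

def refine_or_represent(x, f_out, is_refine):
    """Combine a block's learned output with its input, channel by channel.

    x         : (channels, ...) input feature map
    f_out     : (channels, ...) output of the block's learned transform
    is_refine : (channels,) boolean mask; True = Refine, False = Represent
    """
    # Broadcast the per-channel mask over the remaining (spatial) axes.
    mask = is_refine.reshape((-1,) + (1,) * (x.ndim - 1)).astype(x.dtype)
    # Refine channels add the skip connection; Represent channels do not.
    return f_out + mask * x

def refine_penalty(f_out, is_refine):
    """Sum of per-channel norms over Refine channels (L2 norm is an assumption)."""
    per_channel = np.sqrt((f_out.reshape(f_out.shape[0], -1) ** 2).sum(axis=1))
    return float(per_channel[is_refine].sum())
```

A random configuration, as the abstract describes, could be drawn with something like `np.random.default_rng(0).random(channels) < 0.5`; the 0.5 ratio is a placeholder, not a value from the paper.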

• #853
Human Motion Generation via Cross-Space Constrained Sampling
Zhongyue Huang, Jingwei Xu, Bingbing Ni
Deep Learning for Computer Vision

We aim to automatically generate a human motion sequence from a single input person image, with some specific action label. To this end, we propose a cross-space human motion video generation network which features two paths: a forward path that first samples/generates a sequence of low dimensional motion vectors based on Gaussian Process (GP), which is paired with the input person image to form a moving human figure sequence; and a backward path based on the predicted human images to re-extract the corresponding latent motion representations. Owing to the lack of supervision, the reconstructed latent motion representations are expected to be as close as possible to the GP sampled ones, thus yielding a cyclic objective function for cross-space (i.e., motion and appearance) mutual constrained generation. We further propose an alternative sampling/generation algorithm with respect to constraints from both spaces. Extensive experimental results show that the proposed framework successfully generates novel human motion sequences with reasonable visual quality.

• #241
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, Yi Yang
Deep Learning for Computer Vision

This paper proposes a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model after pruning. SFP has two advantages over previous works: (1) Larger model capacity. Updating previously pruned filters provides our approach with larger optimization space than fixing the filters to zero. Therefore, the network trained by our method has a larger model capacity to learn from the training data. (2) Less dependence on the pretrained model. Large capacity enables SFP to train from scratch and prune the model simultaneously. In contrast, previous filter pruning methods should be conducted on the basis of the pre-trained model to guarantee their performance. Empirically, SFP from scratch outperforms the previous filter pruning methods. Moreover, our approach has been demonstrated effective for many advanced CNN architectures. Notably, on ILSVRC-2012, SFP reduces FLOPs by more than 42% on ResNet-101 with even a 0.2% top-5 accuracy improvement, which has advanced the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/softfilter-pruning
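
The "soft" part of the idea, zeroing filters without freezing them, can be sketched as follows. This is an illustrative NumPy sketch rather than the released code: the helper name `soft_prune`, the `prune_ratio` parameter, and ranking filters by L2 norm are assumptions.

```python
import numpy as np

def soft_prune(weights, prune_ratio):
    """Return a copy of `weights` with the lowest-norm filters set to zero.

    weights     : (num_filters, in_channels, kh, kw) convolution kernel
    prune_ratio : fraction of filters to zero (hypothetical parameter)

    The zeroed filters stay ordinary trainable parameters, so the next
    training step is free to move them away from zero again.
    """
    norms = np.sqrt((weights.reshape(weights.shape[0], -1) ** 2).sum(axis=1))
    n_prune = int(weights.shape[0] * prune_ratio)
    pruned = weights.copy()
    if n_prune > 0:
        lowest = np.argsort(norms)[:n_prune]  # indices of the smallest filters
        pruned[lowest] = 0.0
    return pruned
```

In a training loop one would call `soft_prune` after each epoch and then keep applying gradient updates to every filter, including the zeroed ones. A hard-pruning scheme would instead hold those filters fixed at zero.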

• #627
Deep Propagation Based Image Matting
Yu Wang, Yi Niu, Peiyong Duan, Jianwei Lin, Yuanjie Zheng
Deep Learning for Computer Vision

In this paper, we propose a deep propagation based image matting framework by introducing deep learning into learning an alpha matte propagation principle. Our deep learning architecture is a concatenation of a deep feature extraction module, an affinity learning module and a matte propagation module. These three modules are all differentiable and can be optimized jointly via an end-to-end training process. Our framework results in a semantic-level pairwise similarity of pixels for propagation by learning deep image representations adapted to matte propagation. It combines the power of deep learning and matte propagation and can therefore surpass prior state-of-the-art matting techniques in terms of both accuracy and training complexity, as validated by our experimental results from 243K images created based on two benchmark matting databases.

• #1095
Progressive Blockwise Knowledge Distillation for Neural Network Acceleration
Hui Wang, Hanbin Zhao, Xi Li, Xu Tan
Deep Learning for Computer Vision

As an important and challenging problem in machine learning and computer vision, neural network acceleration essentially aims to enhance the computational efficiency without sacrificing the model accuracy too much. In this paper, we propose a progressive blockwise learning scheme for teacher-student model distillation at the subnetwork block level. The proposed scheme is able to distill the knowledge of the entire teacher network by locally extracting the knowledge of each block in terms of progressive blockwise function approximation. Furthermore, we propose a structure design criterion for the student subnetwork block, which is able to effectively preserve the original receptive field from the teacher network. Experimental results demonstrate the effectiveness of the proposed scheme against the state-of-the-art approaches.

### Tuesday 17 08:30 - 09:55 SIS-ML1 - Sister Conferences Best Papers: Machine Learning (K11)

Chair: Volker Tresp
• #5120
Emergent Tangled Program Graphs in Multi-Task Learning
Stephen Kelly, Malcolm Heywood
Sister Conferences Best Papers: Machine Learning

We propose a Genetic Programming (GP) framework to address high-dimensional Multi-Task Reinforcement Learning (MTRL) through emergent modularity. A bottom-up process is assumed in which multiple programs self-organize into collective decision-making entities, or teams, which then further develop into multi-team policy graphs, or Tangled Program Graphs (TPG). The framework learns to play three Atari video games simultaneously, producing a single control policy that matches or exceeds leading results from (game-specific) deep reinforcement learning in each game. More importantly, unlike the representation assumed for deep learning, TPG policies start simple and adaptively complexify through interaction with the task environment, resulting in agents that are exceedingly simple, operating in real-time without specialized hardware support such as GPUs.

• #5138
Make Evasion Harder: An Intelligent Android Malware Detection System
Shifu Hou, Yanfang Ye, Yangqiu Song, Melih Abdulhayoglu
Sister Conferences Best Papers: Machine Learning

To combat the evolving Android malware attacks, in this paper, instead of only using Application Programming Interface (API) calls, we further analyze the different relationships between them and create higher-level semantics which require more effort for attackers to evade detection. We represent the Android applications (apps), related APIs, and their rich relationships as a structured heterogeneous information network (HIN). Then we use a meta-path based approach to characterize the semantic relatedness of apps and APIs. We use each meta-path to formulate a similarity measure over Android apps, and aggregate different similarities using multi-kernel learning to make predictions. Promising experimental results based on real sample collections from Comodo Cloud Security Center demonstrate that our developed system HinDroid outperforms other alternative Android malware detection techniques.

• #5147
Time Series Chains: A Novel Tool for Time Series Data Mining
Yan Zhu, Makoto Imamura, Daniel Nikovski, Eamonn Keogh
Sister Conferences Best Papers: Machine Learning

Since their introduction over a decade ago, time series motifs have become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, time series chains are a temporally ordered set of subsequence patterns, such that each pattern is similar to the pattern that preceded it, but the first and last patterns are arbitrarily dissimilar. In the discrete space, this is similar to extracting the text chain “hit, hot, dot, dog” from a paragraph. The first and last words have nothing in common, yet they are connected by a chain of words with a small mutual difference. Time Series Chains can capture the evolution of systems, and help predict the future. As such, they potentially have implications for prognostics. In this work, we introduce a robust definition of time series chains, and a scalable algorithm that allows us to discover them in massive datasets.
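
The discrete analogy from the abstract can be checked directly. The sketch below uses Hamming distance as the notion of "difference" between words; that choice is an assumption for illustration and is not the distance measure used for real-valued subsequences.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))

chain = ["hit", "hot", "dot", "dog"]

# Distance between each word and the one that preceded it.
step_distances = [hamming(a, b) for a, b in zip(chain, chain[1:])]

# Distance between the first and the last word of the chain.
end_to_end = hamming(chain[0], chain[-1])

print(step_distances, end_to_end)  # prints [1, 1, 1] 3
```

Each link changes a single letter, yet the endpoints differ in every position, which is the drift a chain is meant to capture.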

• #5150
TensorCast: Forecasting Time-Evolving Networks with Contextual Information
Miguel Araújo, Pedro Ribeiro, Christos Faloutsos
Sister Conferences Best Papers: Machine Learning

Can we forecast future connections in a social network? Can we predict who will start using a given hashtag on Twitter, leveraging contextual information such as who follows or retweets whom to improve our predictions? In this paper we present an abridged report of TensorCast, an award-winning method for forecasting time-evolving networks that uses coupled tensors to incorporate multiple information sources. TensorCast is scalable (linearithmic in the number of connections), effective (more precise than competing methods) and general (applicable to any data source representable by a tensor). We also showcase our method when applied to forecast two large-scale heterogeneous real-world temporal networks, namely Twitter and DBLP.

• #5114
A Genetic Programming Approach to Designing Convolutional Neural Network Architectures
Masanori Suganuma, Shinichi Shirakawa, Tomoharu Nagao
Sister Conferences Best Papers: Machine Learning

We propose a method for designing convolutional neural network (CNN) architectures based on Cartesian genetic programming (CGP). In the proposed method, the architectures of CNNs are represented by directed acyclic graphs, in which each node represents highly-functional modules such as convolutional blocks and tensor operations, and each edge represents the connectivity of layers. The architecture is optimized to maximize the classification accuracy for a validation dataset by an evolutionary algorithm. We show that the proposed method can find competitive CNN architectures compared with state-of-the-art methods on the image classification task using CIFAR-10 and CIFAR-100 datasets.

• #5130
Distributing Frank-Wolfe via Map-Reduce
Armin Moharrer, Stratis Ioannidis
Sister Conferences Best Papers: Machine Learning

We identify structural properties under which a convex optimization over the simplex can be massively parallelized via map-reduce operations using the Frank-Wolfe (FW) algorithm. A broad class of problems, e.g., Convex Approximation, Experimental Designs, and Adaboost, can be tackled this way. We implement FW over Apache Spark, and solve problems with 20 million variables using 350 cores in 79 minutes; the same operation takes 165 hours when executed serially.
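
For readers unfamiliar with Frank-Wolfe, a serial sketch over the probability simplex follows. It is not the Spark implementation from the paper: the function name, the gradient callback `grad`, and the standard step size 2/(k+2) are assumptions for illustration. On the simplex the linear minimization step reduces to picking the coordinate with the smallest gradient entry, an argmin that can be computed with a reduce operation.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=200):
    """Minimize a smooth convex function over the probability simplex.

    grad : callable returning the gradient at x (user-supplied, hypothetical)
    x0   : starting point on the simplex
    """
    x = np.asarray(x0, dtype=float).copy()
    for k in range(n_iters):
        g = grad(x)
        i = int(np.argmin(g))        # best vertex e_i of the simplex
        gamma = 2.0 / (k + 2.0)      # classic diminishing step size
        x *= 1.0 - gamma             # x <- (1 - gamma) * x + gamma * e_i
        x[i] += gamma
    return x
```

Every iterate is a convex combination of simplex vertices, so no projection step is needed.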

• #5122
An Efficient Minibatch Acceptance Test for Metropolis-Hastings
Daniel Seita, Xinlei Pan, Haoyu Chen, John Canny
Sister Conferences Best Papers: Machine Learning

We present a novel Metropolis-Hastings method for large datasets that uses small expected-size mini-batches of data. Previous work on reducing the cost of Metropolis-Hastings tests yields only constant factor reductions versus using the full dataset for each sample. Here we present a method that can be tuned to provide arbitrarily small batch sizes, by adjusting either proposal step size or temperature. Our test uses the noise-tolerant Barker acceptance test with a novel additive correction variable. The resulting test has similar cost to a normal SGD update. Our experiments demonstrate several order-of-magnitude speedups over previous work.
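
The Barker acceptance test mentioned above can be sketched on its own, without the minibatch machinery or the correction variable from the paper. Here `delta` stands for the log acceptance ratio of a proposal. The function names are hypothetical, and the clipping constant only guards the logarithm.

```python
import math
import random

def barker_accept_prob(delta):
    """Barker acceptance probability 1 / (1 + exp(-delta)), computed stably."""
    if delta >= 0:
        return 1.0 / (1.0 + math.exp(-delta))
    e = math.exp(delta)
    return e / (1.0 + e)

def barker_accept(delta, rng):
    """Accept iff delta + X > 0, with X drawn from a standard logistic distribution."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    x_logistic = math.log(u) - math.log(1.0 - u)    # inverse-CDF sample
    return delta + x_logistic > 0.0
```

Writing the test as "noisy `delta` exceeds zero" is what makes it tolerant of noise: an estimate of `delta` from a minibatch already carries random error that can stand in for part of the logistic variable.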

### Tuesday 17 08:30 - 09:55 HAI-COG - Cognition (C2)

Chair: Jörg Cassens
• #1751
Brain-inspired Balanced Tuning for Spiking Neural Networks
Tielin Zhang, Yi Zeng, Dongcheng Zhao, Bo Xu
Cognition

Due to the nature of Spiking Neural Networks (SNNs), it is challenging to train them with biologically plausible learning principles. Multi-layered SNNs have non-differentiable neurons and temporally centric synapses, which make them nearly impossible to tune directly by back propagation. Here we propose an alternative biologically inspired balanced tuning approach to train SNNs. The approach draws on three main inspirations from the brain: firstly, the biological network will usually be trained towards the state where the temporal updates of variables (e.g. membrane potential) are in equilibrium; secondly, specific proportions of excitatory and inhibitory neurons usually contribute to stable representations; thirdly, short-term plasticity (STP) is a general principle to keep the input and output of synapses balanced towards a better learning convergence. With these inspirations, we train SNNs in three steps: first, the SNN model is trained with the three brain-inspired principles; then weakly supervised learning is used to tune the membrane potential in the final layer for network classification; finally, the learned information is consolidated from membrane potential into the weights of synapses by Spike-Timing Dependent Plasticity (STDP). The proposed approach is verified on the MNIST hand-written digit recognition dataset, and the performance (an accuracy of 98.64%) indicates that the idea of balancing states could indeed improve the learning ability of SNNs, which shows the power of the proposed brain-inspired approach for tuning biologically plausible SNNs.

• #2235
CSNN: An Augmented Spiking based Framework with Perceptron-Inception
Qi Xu, Yu Qi, Hang Yu, Jiangrong Shen, Huajin Tang, Gang Pan
Cognition

Spiking Neural Networks (SNNs) represent and transmit information in spikes, which is considered more biologically realistic and computationally powerful than the traditional Artificial Neural Networks. The spiking neurons encode useful temporal information and possess strong anti-noise properties. The feature extraction ability of typical SNNs is limited by shallow structures. This paper focuses on improving the feature extraction ability of SNNs by virtue of the powerful feature extraction ability of Convolutional Neural Networks (CNNs). CNNs can extract abstract features resorting to the structure of the convolutional feature maps. We propose a CNN-SNN (CSNN) model to combine the feature learning ability of CNNs with the cognition ability of SNNs. The CSNN model learns the encoded spatial temporal representations of images in an event-driven way. We evaluate the CSNN model on the handwritten digits images dataset MNIST and its variational databases. In the presented experimental results, the proposed CSNN model is evaluated regarding learning capabilities, encoding mechanisms, robustness to noisy stimuli and its classification performance. The results show that CSNN behaves well compared to other cognitive models with significantly fewer neurons and training samples. Our work brings more biological realism into modern image classification models, with the hope that these models can inform how the brain performs this high-level vision task.

• #2310
Jointly Learning Network Connections and Link Weights in Spiking Neural Networks
Yu Qi, Jiangrong Shen, Yueming Wang, Huajin Tang, Hang Yu, Zhaohui Wu, Gang Pan
Cognition

Spiking neural networks (SNNs) are considered to be biologically plausible and power-efficient on neuromorphic hardware. However, unlike the brain mechanisms, most existing SNN algorithms have fixed network topologies and connection relationships. This paper proposes a method to jointly learn network connections and link weights. The connection structures are optimized by the spike-timing-dependent plasticity (STDP) rule with timing information, and the link weights are optimized by a supervised algorithm. The connection structures and the weights are learned alternately until a termination condition is satisfied. Experiments are carried out using four benchmark datasets. Our approach outperforms classical learning methods such as STDP, Tempotron, SpikeProp, and a state-of-the-art supervised algorithm. In addition, the learned structures effectively reduce the number of connections by about 24%, thus improving the computational efficiency of the network.

• #4080
Replicating Active Appearance Model by Generator Network
Tian Han, Jiawen Wu, Ying Nian Wu
Cognition

A recent Cell paper [Chang and Tsao, 2017] reports an interesting discovery. For the face stimuli generated by a pre-trained active appearance model (AAM), the responses of neurons in the areas of the primate brain that are responsible for face recognition exhibit strong linear relationship with the shape variables and appearance variables of the AAM that generates the face stimuli. In this paper, we show that this behavior can be replicated by a deep generative model called the generator network, which assumes that the observed signals are generated by latent random variables via a top-down convolutional neural network. Specifically, we learn the generator network from the face images generated by a pre-trained AAM model using variational auto-encoder, and we show that the inferred latent variables of the learned generator network have strong linear relationship with the shape and appearance variables of the AAM model that generates the face images. Unlike the AAM model that has an explicit shape model where the shape variables generate the control points or landmarks, the generator network has no such shape model and shape variables. Yet the generator network can learn the shape knowledge in the sense that some of the latent variables of the learned generator network capture the shape variations in the face images generated by AAM.

• #4177
Similarity-Based Reasoning, Raven's Matrices, and General Intelligence
Can Serif Mekik, Ron Sun, David Yun Dai
Cognition

This paper presents a model tackling a variant of the Raven's Matrices family of human intelligence tests along with computational experiments. Raven's Matrices are thought to challenge human subjects' ability to generalize knowledge and deal with novel situations. We investigate how a generic ability to quickly and accurately generalize knowledge can be succinctly captured by a computational system. This work is distinct from other prominent attempts to deal with the task in terms of adopting a generalized similarity-based approach. Raven's Matrices appear to primarily require similarity-based or analogical reasoning over a set of varied visual stimuli. The similarity-based approach eliminates the need for structure mapping as emphasized in many existing analogical reasoning systems. Instead, it relies on feature-based processing with both relational and non-relational features. Preliminary experimental results suggest that our approach performs comparably to existing symbolic analogy-based models.

• #534
A Simple Convolutional Neural Network for Accurate P300 Detection and Character Spelling in Brain Computer Interface
Hongchang Shan, Yu Liu, Todor Stefanov
Cognition

A Brain Computer Interface (BCI) character speller allows humans to directly spell characters using eye-gazes, thereby building communication between the human brain and a computer. Convolutional Neural Networks (CNNs) have shown better performance than traditional machine learning methods for BCI signal recognition and its application to the character speller. However, current CNN architectures limit further accuracy improvements of signal detection and character spelling and also need high complexity to achieve competitive accuracy, thereby preventing the use of CNNs in portable BCIs. To address these issues, we propose a novel and simple CNN which effectively learns feature representations from both raw temporal information and raw spatial information. The complexity of the proposed CNN is significantly reduced compared with state-of-the-art CNNs for BCI signal detection. We perform experiments on three benchmark datasets and compare our results with the best results reported in previous research. The comparison shows that our proposed CNN can increase the signal detection accuracy by up to 15.61% and the character spelling accuracy by up to 19.35%.

• #27
Salient Object Detection by Lossless Feature Reflection
Pingping Zhang, Wei Liu, Huchuan Lu, Chunhua Shen
Cognition

Salient object detection, which aims to identify and locate the most salient pixels or regions in images, has been attracting more and more interest due to its various real-world applications. However, this vision task is quite challenging, especially under complex image scenes. Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection. Specifically, we design a symmetrical fully convolutional network (SFCN) to learn complementary saliency features under the guidance of lossless feature reflection. The location information of salient objects, together with contextual and semantic information, is jointly utilized to supervise the proposed network for more accurate saliency predictions. In addition, to overcome the blurry boundary problem, we propose a new structural loss function to learn clear object boundaries and spatially consistent saliency. The coarse prediction results are effectively refined by this structural information for performance improvements. Extensive experiments on seven saliency detection datasets demonstrate that our approach achieves consistently superior performance and outperforms the very recent state-of-the-art methods.

### Tuesday 17 08:30 - 09:55 MUL-CFR - Collaborative Filtering, Recommender Systems (C3)

Chair: Paola Velardi
• #470
Recurrent Collaborative Filtering for Unifying General and Sequential Recommender
Disheng Dong, Xiaolin Zheng, Ruixun Zhang, Yan Wang
Collaborative Filtering, Recommender Systems

General recommender and sequential recommender are two commonly applied modeling paradigms for recommendation tasks. General recommender focuses on modeling the general user preferences, ignoring the sequential patterns in user behaviors; whereas sequential recommender focuses on exploring the item-to-item sequential relations, failing to model the global user preferences. In addition, better recommendation performance has recently been achieved by adopting an approach to combine them. However, previous approaches are unable to solve both tasks in a unified way and cannot capture the whole historical sequential information. In this paper, we propose a recommendation model named Recurrent Collaborative Filtering (RCF), which unifies both paradigms within a single model. Specifically, we combine a recurrent neural network (the sequential recommender part) and a matrix factorization model (the general recommender part) in a multi-task learning framework, where we perform joint optimization with shared model parameters enforcing the two parts to regularize each other. Furthermore, we empirically demonstrate on MovieLens and Netflix datasets that our model outperforms the state-of-the-art methods across the tasks of both sequential and general recommender.

• #3015
Aspect-Level Deep Collaborative Filtering via Heterogeneous Information Networks
Xiaotian Han, Chuan Shi, Senzhang Wang, Philip S. Yu, Li Song
Collaborative Filtering, Recommender Systems

Latent factor models have been widely used for recommendation. Most existing latent factor models mainly utilize the rating information between users and items, although some recently extended models add some auxiliary information to learn a unified latent factor between users and items. The unified latent factor only represents the latent features of users and items from the aspect of purchase history. However, the latent features of users and items may stem from different aspects, e.g., the brand-aspect and category-aspect of items. In this paper, we propose a Neural network based Aspect-level Collaborative Filtering model (NeuACF) to exploit different aspect latent factors. Through modelling rich objects and relations in recommender system as a heterogeneous information network, NeuACF first extracts different aspect-level similarity matrices of users and items through different meta-paths and then feeds an elaborately designed deep neural network with these matrices to learn aspect-level latent factors. Finally, the aspect-level latent factors are effectively fused with an attention mechanism for the top-N recommendation. Extensive experiments on three real datasets show that NeuACF significantly outperforms both existing latent factor models and recent neural network models.

• #2281
Outer Product-based Neural Collaborative Filtering
Xiangnan He, Xiaoyu Du, Xiang Wang, Feng Tian, Jinhui Tang, Tat-Seng Chua
Collaborative Filtering, Recommender Systems

In this work, we contribute a new multi-layer neural network architecture named ONCF to perform collaborative filtering. The idea is to use an outer product to explicitly model the pairwise correlations between the dimensions of the embedding space. In contrast to existing neural recommender models that combine user embedding and item embedding via a simple concatenation or element-wise product, our proposal of using outer product above the embedding layer results in a two-dimensional interaction map that is more expressive and semantically plausible. Above the interaction map obtained by outer product, we propose to employ a convolutional neural network to learn high-order correlations among embedding dimensions. Extensive experiments on two public implicit feedback datasets demonstrate the effectiveness of our proposed ONCF framework, in particular, the positive effect of using outer product to model the correlations between embedding dimensions in the low level of multi-layer neural recommender model.
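
The interaction map is easy to see on a toy example. The sketch below uses made-up three-dimensional embeddings and a hypothetical helper name. It only shows the outer-product step, not the convolutional layers that follow it in the model.

```python
import numpy as np

def interaction_map(user_emb, item_emb):
    """Outer product of two k-dimensional embeddings, giving a k x k map.

    Entry (a, b) pairs dimension a of the user with dimension b of the item.
    """
    return np.outer(user_emb, item_emb)

user = np.array([1.0, 2.0, 3.0])    # toy user embedding (illustrative values)
item = np.array([0.5, -1.0, 2.0])   # toy item embedding (illustrative values)
E = interaction_map(user, item)
```

The diagonal of `E` equals the element-wise product `user * item`, so the map contains the element-wise product that the abstract mentions and adds every cross-dimension pair in the off-diagonal entries.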

• #2222
Adaptive Collaborative Similarity Learning for Unsupervised Multi-view Feature Selection
Xiao Dong, Lei Zhu, Xuemeng Song, Jingjing Li, Zhiyong Cheng
Collaborative Filtering, Recommender Systems

In this paper, we investigate the research problem of unsupervised multi-view feature selection. Conventional solutions first simply combine multiple pre-constructed view-specific similarity structures into a collaborative similarity structure, and then perform the subsequent feature selection. These two processes are separate and independent. The collaborative similarity structure remains fixed during feature selection. Further, the simple undirected view combination may adversely reduce the reliability of the ultimate similarity structure for feature selection, as the view-specific similarity structures generally involve noises and outlying entries. To alleviate these problems, we propose an adaptive collaborative similarity learning (ACSL) for multi-view feature selection. We propose to dynamically learn the collaborative similarity structure, and further integrate it with the ultimate feature selection into a unified framework. Moreover, a reasonable rank constraint is devised to adaptively learn an ideal collaborative similarity structure with proper similarity combination weights and desirable neighbor assignment, both of which could positively facilitate the feature selection. An effective solution guaranteed with the proved convergence is derived to iteratively tackle the formulated optimization problem. Experiments demonstrate the superiority of the proposed approach.

• #3266
NPE: Neural Personalized Embedding for Collaborative Filtering
ThaiBinh Nguyen, Atsuhiro Takasu
Collaborative Filtering, Recommender Systems

Matrix factorization is one of the most efficient approaches in recommender systems. However, such algorithms, which rely on the interactions between users and items, perform poorly for "cold-users" (users with little history of such interactions) and at capturing the relationships between closely related items. To address these problems, we propose a neural personalized embedding (NPE) model, which improves the recommendation performance for cold-users and can learn effective representations of items. It models a user's click to an item in two terms: the personal preference of the user for the item, and the relationships between this item and other items clicked by the user. We show that NPE outperforms competing methods for top-N recommendations, especially for cold-user recommendations. We also performed a qualitative analysis that shows the effectiveness of the representations learned by the model.

• #1888
Towards Better Representation Learning for Personalized News Recommendation: a Multi-Channel Deep Fusion Approach
Jianxun Lian, Fuzheng Zhang, Xing Xie, Guangzhong Sun
Collaborative Filtering, Recommender Systems

Millions of news articles emerge every day. How to provide personalized news recommendations has become a critical task for service providers. In the past few decades, latent factor models have been widely used for building recommender systems (RSs). With the remarkable success of deep learning techniques, especially in visual computing and natural language understanding, more and more researchers have been trying to leverage deep neural networks to learn latent representations for advanced RSs. Following mainstream deep learning-based RSs, we propose a novel deep fusion model (DFM), which aims to improve the representation learning abilities in deep RSs and can be used for both candidate retrieval and item re-ranking. There are two key components in our DFM approach, namely an inception module and an attention mechanism. The inception module improves the plain multi-layer network by leveraging various levels of interaction simultaneously, while the attention mechanism merges latent representations learnt from different channels in a customized fashion. We conduct extensive experiments on a commercial news reading dataset, and the results demonstrate that the proposed DFM is superior to several state-of-the-art models.

• #3708
Content-Aware Hierarchical Point-of-Interest Embedding Model for Successive POI Recommendation
Buru Chang, Yonggyu Park, Donghyeon Park, Seongsoon Kim, Jaewoo Kang
Collaborative Filtering, Recommender Systems

Recommending a point-of-interest (POI) a user will visit next based on temporal and spatial context information is an important task in mobile-based applications. Recently, several POI recommendation models based on conventional sequential-data modeling approaches have been proposed. However, such models focus on only a user's check-in sequence information and the physical distance between POIs. Furthermore, they do not utilize the characteristics of POIs or the relationships between POIs. To address this problem, we propose CAPE, the first content-aware POI embedding model which utilizes text content that provides information about the characteristics of a POI. CAPE consists of a check-in context layer and a text content layer. The check-in context layer captures the geographical influence of POIs from the check-in sequence of a user, while the text content layer captures the characteristics of POIs from the text content. To validate the efficacy of CAPE, we constructed a large-scale POI dataset. In the experimental evaluation, we show that the performance of the existing POI recommendation models can be significantly improved by simply applying CAPE to the models.

### Tuesday 17, 10:25 - 11:10, Invited Talk (VICTORIA)

Chair: Sarit Kraus
• The Moral Machine Experiment
Jean-Francois Bonnefon
Invited Talk
### Tuesday 17, 11:20 - 12:45, DEMOS1 - Demos Talks 1: Planning, Robotics, Vision (VICTORIA)

Chair: Paul Weng
• #5306
Data-Driven Inventory Management and Dynamic Pricing Competition on Online Marketplaces
Rainer Schlosser, Carsten Walther, Martin Boissier, Matthias Uflacker
Demos Talks 1: Planning, Robotics, Vision

Online markets are characterized by competition and limited demand information. In E-commerce, firms compete against each other using data-driven dynamic pricing and ordering strategies. To successfully manage both inventory levels as well as offer prices is a highly challenging task as (i) demand is uncertain, (ii) competitors strategically interact, and (iii) optimized pricing and ordering decisions are mutually dependent. Currently, retailers lack the possibility to test and evaluate their algorithms appropriately before releasing them into the real world. To study joint dynamic ordering and pricing competition on online marketplaces, we built an interactive simulation platform. To be both flexible and scalable, the platform has a microservice-based architecture and allows handling dozens of competing merchants and streams of consumers with configurable characteristics. Further, we deployed and compared different pricing and ordering strategies, from simple rule-based ones to highly sophisticated data-driven strategies which are based on state-of-the-art demand learning techniques and efficient dynamic optimization models.

• #5326
IBM Scenario Planning Advisor: Plan Recognition as AI Planning in Practice
Shirin Sohrabi, Michael Katz, Oktie Hassanzadeh, Octavian Udrea, Mark D. Feblowitz
Demos Talks 1: Planning, Robotics, Vision

We present the IBM Research Scenario Planning Advisor (SPA), a decision support system that allows users to generate diverse alternate scenarios of the future and enhance their ability to imagine the different possible outcomes, including unlikely but potentially impactful futures. The system includes tooling for experts to intuitively encode their domain knowledge, and uses AI Planning to reason about this knowledge and the current state of the world, including news and social media, when generating scenarios.

• #5328
Visualizations for an Explainable Planning Agent
Tathagata Chakraborti, Kshitij P. Fadnis, Kartik Talamadupula, Mishal Dholakia, Biplav Srivastava, Jeffrey O. Kephart, Rachel K. E. Bellamy
Demos Talks 1: Planning, Robotics, Vision

In this demonstration, we report on the visualization capabilities of an Explainable AI Planning (XAIP) agent that can support human-in-the-loop decision-making. Imposing transparency and explainability requirements on such agents is crucial for establishing human trust and common ground with an end-to-end automated planning system. Visualizing the agent's internal decision making processes is a crucial step towards achieving this. This may include externalizing the "brain" of the agent: starting from its sensory inputs, to progressively higher order decisions made by it in order to drive its planning components. We demonstrate these functionalities in the context of a smart assistant in the Cognitive Environments Laboratory at IBM's T.J. Watson Research Center.

• #5338
Near Real-Time Detection of Poachers from Drones in AirSim
Elizabeth Bondi, Ashish Kapoor, Debadeepta Dey, James Piavis, Shital Shah, Robert Hannaford, Arvind Iyer, Lucas Joppa, Milind Tambe
Demos Talks 1: Planning, Robotics, Vision

The unrelenting threat of poaching has led to increased development of new technologies to combat it. One such example is the use of thermal infrared cameras mounted on unmanned aerial vehicles (UAVs or drones) to spot poachers at night and report them to park rangers before they are able to harm any animals. However, monitoring the live video stream from these conservation UAVs all night is an arduous task. Therefore, we discuss SPOT (Systematic Poacher deTector), a novel application that augments conservation drones with the ability to automatically detect poachers and animals in near real time. SPOT illustrates the feasibility of building upon state-of-the-art AI techniques, such as Faster RCNN, to address the challenges of automatically detecting animals and poachers in infrared images. This paper reports (i) the design of SPOT, (ii) efficient processing techniques to ensure usability in the field, (iii) evaluation of SPOT based on historical videos and a real-world test run by the end-users, Air Shepherd, in the field, and (iv) the use of AirSim for live demonstration of SPOT. The promising results from a field test have led to a plan for larger-scale deployment in a national park in southern Africa. While SPOT is developed for conservation drones, its design and novel techniques have wider application for automated detection from UAV videos.

• #5339
A Virtual Environment with Multi-Robot Navigation, Analytics, and Decision Support for Critical Incident Investigation
David L. Smyth, James Fennell, Sai Abinesh, Nazli B. Karimi, Frank G. Glavin, Ihsan Ullah, Brett Drury, Michael G. Madden
Demos Talks 1: Planning, Robotics, Vision

Accidents and attacks that involve chemical, biological, radiological/nuclear or explosive (CBRNE) substances are rare, but can be of high consequence. Since the investigation of such events is not anybody's routine work, a range of AI techniques can reduce investigators' cognitive load and support decision-making, including: planning the assessment of the scene; ongoing evaluation and updating of risks; control of autonomous vehicles for collecting images and sensor data; reviewing images/videos for items of interest; identification of anomalies; and retrieval of relevant documentation. Because of the rare and high-risk nature of these events, realistic simulations can support the development and evaluation of AI-based tools. We have developed realistic models of CBRNE scenarios and implemented an initial set of tools.

• #5348
Generating Plans for Cooperative Connected UAVs
François Bodin, Tristan Charrier, Arthur Queffelec, François Schwarzentruber
Demos Talks 1: Planning, Robotics, Vision

We present a tool for graph coverage with a fleet of UAVs. The UAVs must achieve the coverage of an area under the constraint of staying connected with the base, where the mission supervisor starts the plan. With an OpenStreetMap interface, the user is able to choose a specific location on which the mission needs to be generated and observe the resulting plan being executed.

• #5347
Curly: An AI-based Curling Robot Successfully Competing in the Olympic Discipline of Curling
Dong-Ok Won, Byung-Do Kim, Ho-Jung Kim, Tae-San Eom, Klaus-Robert Müller, Seong-Whan Lee
Demos Talks 1: Planning, Robotics, Vision

Most artificial intelligence (AI) based learning systems act in virtual or laboratory environments. Here we demonstrate an AI-based curling robot system named "Curly" that competes on a real-world curling ice sheet. Curly encompasses (1) an AI-based curling strategy and simulation engine that takes the high "icy" uncertainty into account, (2) the thrower robot, enabled by autonomous driving with traction control, and (3) the skip robot, which recognizes the curling field and stone configuration using vision technology. Curly performed well both in classical game situations and when interacting with human opponents, namely the top-ranked Korean amateur high school curling team.

### Tuesday 17, 11:20 - 12:45, MAS-RA - Resource Allocation (C8)

Chair: Haris Aziz
• #617
Democratic Fair Allocation of Indivisible Goods
Erel Segal-Halevi, Warut Suksompong
Resource Allocation

We study the problem of fairly allocating indivisible goods to groups of agents. Agents in the same group share the same set of goods even though they may have different preferences. Previous work has focused on unanimous fairness, in which all agents in each group must agree that their group's share is fair. Under this strict requirement, fair allocations exist only for small groups. We introduce the concept of democratic fairness, which aims to satisfy a certain fraction of the agents in each group. This concept is better suited to large groups such as cities or countries. We present protocols for democratic fair allocation among two or more arbitrarily large groups of agents with monotonic, additive, or binary valuations. Our protocols approximate both envy-freeness and maximin-share fairness. As an example, for two groups of agents with additive valuations, our protocol yields an allocation that is envy-free up to one good and gives at least half of the maximin share to at least half of the agents in each group.

• #1170
Maximin Share Allocations on Cycles
Zbigniew Lonc, Miroslaw Truszczynski
Resource Allocation

The problem of fair division of indivisible goods is a fundamental problem of social choice. Recently, the problem was extended to the setting when goods form a graph and the goal is to allocate goods to agents so that each agent's bundle forms a connected subgraph. Researchers proved that, unlike in the original problem (which corresponds to the case of the complete graph in the extended setting), in the case of the goods-graph being a tree, allocations offering each agent a bundle of or exceeding her maximin share value always exist. Moreover, they can be found in polynomial time. We consider here the problem of maximin share allocations of goods on a cycle. Despite the simplicity of the graph, the problem turns out to be significantly harder than its tree version. We present cases when maximin share allocations of goods on cycles exist and provide results on allocations guaranteeing each agent a certain portion of her maximin share. We also study algorithms for computing maximin share allocations of goods on cycles.
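
To make the connectivity requirement concrete, the following brute-force sketch computes one agent's maximin share when the goods sit on a cycle and every bundle must be a contiguous arc. It is an illustration only and not one of the algorithms studied in the paper; the function name and example values are hypothetical, and valuations are assumed to be additive and non-negative.

```python
from itertools import combinations

def cycle_maximin_share(values, n_agents):
    """Best achievable minimum arc value when the cycle of goods is cut
    into n_agents contiguous arcs (additive, non-negative values)."""
    m = len(values)
    if n_agents == 1:
        return sum(values)  # the whole cycle is a single connected bundle
    if m < n_agents:
        return 0            # some bundle must be empty
    best = 0
    # A cut at position c is placed just before good c; choosing n_agents
    # distinct cuts splits the cycle into n_agents non-empty arcs.
    for cuts in combinations(range(m), n_agents):
        arc_sums = []
        for k in range(n_agents):
            start = cuts[k]
            end = cuts[(k + 1) % n_agents]
            length = (end - start) % m
            arc_sums.append(sum(values[(start + t) % m] for t in range(length)))
        best = max(best, min(arc_sums))
    return best

# Hypothetical values of four goods, listed in their order around the cycle.
values_on_cycle = [5, 4, 1, 3]
share = cycle_maximin_share(values_on_cycle, 2)
print(share)
```

For these values the connected maximin share is 5, whereas without the connectivity constraint the bundles {5, 1} and {4, 3} would give 6, so the graph structure can strictly lower the share.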

• #1476
Truthful Fair Division without Free Disposal
Xiaohui Bei, Guangda Huzhang, Warut Suksompong
Resource Allocation

We study the problem of fairly dividing a heterogeneous resource, commonly known as cake cutting and chore division, in the presence of strategic agents. While a number of results in this setting have been established in previous works, they rely crucially on the free disposal assumption, meaning that the mechanism is allowed to throw away part of the resource at no cost. In the present work, we remove this assumption and focus on mechanisms that always allocate the entire resource. We exhibit a truthful envy-free mechanism for cake cutting and chore division for two agents with piecewise uniform valuations, and we complement our result by showing that such a mechanism does not exist when certain additional assumptions are made. Moreover, we give truthful mechanisms for multiple agents with restricted classes of valuations.

• #2340
Dynamic Fair Division Problem with General Valuations
Bo Li, Wenyang Li, Yingkai Li
Resource Allocation

In this paper, we focus on how to dynamically allocate a divisible resource fairly among n players who arrive and depart over time. The players may have general heterogeneous valuations over the resource. It is known that exact envy-free and proportional allocations may not exist in the dynamic setting [Walsh, 2011]. Thus, we study to what extent we can guarantee fairness in the dynamic setting. We first design two algorithms which are O(log n)-proportional and O(n)-envy-free for the setting with general valuations, and by constructing adversarial instances such that all dynamic algorithms must be at least Omega(1)-proportional and Omega(n/log n)-envy-free, we show that the bounds are tight up to a logarithmic factor. Moreover, we introduce the setting where the players' valuations are uniform on the resource but with different demands, which generalizes the setting of [Friedman et al., 2015]. We prove an O(log n) upper bound and a tight lower bound for this case.

• #4519
Fair Division Under Cardinality Constraints
Arpita Biswas, Siddharth Barman
Resource Allocation

We consider the problem of fairly allocating indivisible goods among agents, under cardinality constraints and additive valuations. In this setting, we are given a partition of the entire set of goods---i.e., the goods are categorized---and a limit is specified on the number of goods that can be allocated from each category to any agent. The objective here is to find a fair allocation in which the subset of goods assigned to any agent satisfies the given cardinality constraints. This problem naturally captures a number of resource-allocation applications, and is a generalization of the well-studied unconstrained fair division problem. The two central notions of fairness, in the context of fair division of indivisible goods, are envy freeness up to one good (EF1) and the (approximate) maximin share guarantee (MMS). We show that the existence and algorithmic guarantees established for these solution concepts in the unconstrained setting can essentially be achieved under cardinality constraints. Furthermore, focusing on the case wherein all the agents have the same additive valuation, we establish that EF1 allocations exist even under matroid constraints.
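
The two conditions in this setting can be checked directly for a given allocation. The sketch below is a verifier only, not the paper's allocation algorithm; all names and the example instance are hypothetical, and valuations are assumed to be additive and non-negative.

```python
def bundle_value(valuation, bundle):
    """Additive value of a bundle of goods for one agent."""
    return sum(valuation[g] for g in bundle)

def is_ef1(valuations, allocation):
    """True if, for every pair (i, j), agent i stops envying agent j once
    some single good is removed from j's bundle."""
    n = len(allocation)
    for i in range(n):
        own = bundle_value(valuations[i], allocation[i])
        for j in range(n):
            if i == j or not allocation[j]:
                continue
            other = bundle_value(valuations[i], allocation[j])
            # With non-negative values, removing i's favourite good in j's
            # bundle is the most helpful single removal.
            best_removal = max(valuations[i][g] for g in allocation[j])
            if own < other - best_removal:
                return False
    return True

def respects_limits(allocation, category_of, limit):
    """True if no agent holds more goods of a category than its limit."""
    for bundle in allocation:
        counts = {}
        for g in bundle:
            counts[category_of[g]] = counts.get(category_of[g], 0) + 1
        if any(count > limit[c] for c, count in counts.items()):
            return False
    return True

# Hypothetical instance: 2 agents, 4 goods, 2 categories, at most one good
# per category per agent.
valuations = [[5, 4, 3, 1], [4, 5, 1, 3]]
category_of = ["a", "a", "b", "b"]
limit = {"a": 1, "b": 1}

good_allocation = [[0, 2], [1, 3]]
bad_allocation = [[0, 1, 2], [3]]

print(is_ef1(valuations, good_allocation), respects_limits(good_allocation, category_of, limit))
print(is_ef1(valuations, bad_allocation), respects_limits(bad_allocation, category_of, limit))
```

In the second allocation the first agent holds two goods of category "a", which violates the limit, and the second agent still envies the first after any single removal.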

• #3557
Redividing the Cake
Erel Segal-Halevi
Resource Allocation

A heterogeneous resource, such as a land-estate, is already divided among several agents in an unfair way. The challenge is to re-divide it among the agents in a way that balances fairness with ownership rights. We present re-division protocols that attain various combinations of fairness and ownership rights, in various settings differing in the geometric constraints on the allotments: (a) no geometric constraints; (b) connectivity --- the cake is a one-dimensional interval and each piece must be a contiguous interval; (c) rectangularity --- the cake is a two-dimensional rectangle and the pieces should be rectangles; (d) convexity --- the cake is a two-dimensional convex polygon and the pieces should be convex.

• #4014
Comparing Approximate Relaxations of Envy-Freeness
Georgios Amanatidis, Georgios Birmpas, Vangelis Markakis
Resource Allocation

In fair division problems with indivisible goods it is well known that one cannot have any guarantees for the classic fairness notions of envy-freeness and proportionality. As a result, several relaxations have been introduced, most of them in quite recent works. We focus on four such notions, namely envy-freeness up to one good (EF1), envy-freeness up to any good (EFX), maximin share fairness (MMS), and pairwise maximin share fairness (PMMS). Since obtaining these relaxations also turns out to be problematic in several scenarios, approximate versions of them have also been considered. In this work, we investigate further the connections between the four notions mentioned above and their approximate versions. We establish several tight or almost tight results concerning the approximation quality that any of these notions guarantees for the others, providing an almost complete picture of this landscape. Some of our findings reveal interesting and surprising consequences regarding the power of these notions, e.g., PMMS and EFX provide the same worst-case guarantee for MMS, despite PMMS being a strictly stronger notion than EFX. We believe such implications provide further insight on the quality of approximately fair solutions.
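
For readers unfamiliar with the maximin share, the sketch below computes it by exhaustive search: an agent's share is the largest value she can guarantee for her worst bundle when she herself splits the goods into as many bundles as there are agents. This is a definitional illustration only (exponential time, additive non-negative valuations assumed) and is not taken from the paper; the names and numbers are hypothetical.

```python
from itertools import product

def maximin_share(valuation, n_agents):
    """Maximum over all partitions into n_agents bundles of the value of
    the least valuable bundle (additive, non-negative valuation)."""
    best = 0
    # Each good is assigned to one of the n_agents bundles.
    for assignment in product(range(n_agents), repeat=len(valuation)):
        totals = [0] * n_agents
        for good, bundle in enumerate(assignment):
            totals[bundle] += valuation[good]
        best = max(best, min(totals))
    return best

def mms_ratio(valuation, bundle, n_agents):
    """Fraction of the agent's maximin share that a given bundle delivers."""
    share = maximin_share(valuation, n_agents)
    received = sum(valuation[g] for g in bundle)
    return received / share if share > 0 else float("inf")

# Hypothetical additive valuation of one agent over four goods.
valuation = [5, 4, 3, 1]
print(maximin_share(valuation, 2))
print(mms_ratio(valuation, [2, 3], 2))
```

An allocation is an approximate MMS allocation with factor alpha when this ratio is at least alpha for every agent, each measured with her own valuation.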

### Tuesday 17, 11:20 - 12:45, CV-MT - Motion and Tracking (T1)

Chair: Wei Feng
• #2357
Feature Integration with Adaptive Importance Maps for Visual Tracking
Aishi Li, Ming Yang, Wanqi Yang
Motion and Tracking

Discriminative correlation filters have recently achieved excellent performance for visual object tracking. The key to success is to make full use of dense sampling and specific properties of circulant matrices in the Fourier domain. However, previous studies don't take into consideration the importance and complementary information of different features, simply concatenating them. This paper investigates an effective method of feature integration for correlation filters, which jointly learns filters, as well as importance maps in each frame. These importance maps borrow the advantages of different features, aiming to achieve complementary traits and improve robustness. Moreover, for each feature, an importance map is shared by all of its channels to avoid overfitting. In addition, we introduce a regularization term for the importance maps and use the penalty factor to control the significance of features. Based on handcrafted and CNN features, we implement two trackers, which achieve a competitive performance compared with several state-of-the-art trackers.

• #1277
Learning Robust Gaussian Process Regression for Visual Tracking
Linyu Zheng, Ming Tang, Jinqiao Wang
Motion and Tracking

Recent developments of Correlation Filter based trackers (CF trackers) have attracted much attention because of their top performance. However, the boundary effect imposed by the basic periodic assumption in their fast optimization seriously degrades the performance of CF trackers. Although many recent works relax the boundary effect in CF trackers, they do so at the cost of no longer being able to use the kernel trick to improve accuracy further. In this paper, we propose a novel Gaussian Process Regression based tracker (GPRT) which is a conceptually natural tracking approach. Compared to all the existing CF trackers, the boundary effect is eliminated thoroughly and the kernel trick can be employed in our GPRT. In addition, we present two efficient and effective update methods for our GPRT. Experiments are performed on two public datasets: OTB-2013 and OTB-2015. Without bells and whistles, on these two datasets, our GPRT obtains 84.1% and 79.2% in mean overlap precision, respectively, outperforming all the existing trackers with hand-crafted features.

• #1372
Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamics
Yongyi Tang, Lin Ma, Wei Liu, Wei-Shi Zheng
Motion and Tracking

Human motion prediction aims at generating future frames of human motion based on an observed sequence of skeletons. Recent methods employ the latest hidden states of a recurrent neural network (RNN) to encode the historical skeletons, which can only address short-term prediction. In this work, we propose to model the motion context by summarizing the historical human motion with respect to the current prediction. A modified highway unit (MHU) is proposed for efficiently eliminating motionless joints and estimating the next pose given the motion context. Furthermore, we enhance the motion dynamics by minimizing the gram matrix loss for long-term motion prediction. Experimental results show that the proposed model forecasts future human movements well, yielding superior performance over related state-of-the-art approaches. Moreover, specifying the motion context with the activity labels enables our model to perform human motion transfer.

• #2490
Layered Optical Flow Estimation Using a Deep Neural Network with a Soft Mask
Xi Zhang, Di Ma, Xu Ouyang, Shanshan Jiang, Lin Gan, Gady Agam
Motion and Tracking

• #3066
Do not Lose the Details: Reinforced Representation Learning for High Performance Visual Tracking
Qiang Wang, Mengdan Zhang, Junliang Xing, Jin Gao, Weiming Hu, Steve Maybank
Motion and Tracking

This work presents a novel end-to-end trainable CNN model for high performance visual object tracking. It learns both low-level fine-grained representations and a high-level semantic embedding space in a mutual reinforced way, and a multi-task learning strategy is proposed to perform the correlation analysis on representations from both levels. In particular, a fully convolutional encoder-decoder network is designed to reconstruct the original visual features from the semantic projections to preserve all the geometric information. Moreover, the correlation filter layer working on the fine-grained representations leverages a global context constraint for accurate object appearance modeling. The correlation filter in this layer is updated online efficiently without network fine-tuning. Therefore, the proposed tracker benefits from two complementary effects: the adaptability of the fine-grained correlation analysis and the generalization capability of the semantic embedding. Extensive experimental evaluations on four popular benchmarks demonstrate its state-of-the-art performance.

• #474
Evaluating Brush Movements for Chinese Calligraphy: A Computer Vision Based Approach
Pengfei Xu, Lei Wang, Ziyu Guan, Xia Zheng, Xiaojiang Chen, Zhanyong Tang, Dingyi Fang, Xiaoqing Gong, Zheng Wang
Motion and Tracking

Chinese calligraphy is a popular, highly esteemed art form in the Chinese cultural sphere and worldwide. Ink brushes are the traditional writing tool for Chinese calligraphy and the subtle nuances of brush movements have a great impact on the aesthetics of the written characters. However, mastering the brush movement is a challenging task for many calligraphy learners as it requires many years’ practice and expert supervision. This paper presents a novel approach to help Chinese calligraphy learners to quantify the quality of brush movements without expert involvement. Our approach extracts the brush trajectories from a video stream; it then compares them with example templates of reputed calligraphers to produce a score for the writing quality. We achieve this by first developing a novel neural network to extract the spatial and temporal movement features from the video stream. We then employ methods developed in the computer vision and signal processing domains to track the brush movement trajectory and calculate the score. We conducted extensive experiments and user studies to evaluate our approach. Experimental results show that our approach is highly accurate in identifying brush movements, yielding an average accuracy of 90%, and the generated score has an error within 3% when compared to the one given by human experts.

• #596
Unsupervised Learning based Jump-Diffusion Process for Object Tracking in Video Surveillance
Xiaobai Liu, Donovan Lo, Chau Thuan
Motion and Tracking

This paper presents a principled way of dealing with occlusions in visual tracking, which is a long-standing issue in computer vision but remains largely unsolved. As the major innovation, we develop a learning-based jump-diffusion process to jointly track object locations and estimate their visibility statuses over time. In particular, our method employs a set of jump dynamics to change objects' visibility statuses and a set of diffusion dynamics to track objects in videos. Different from the traditional jump-diffusion process that stochastically generates dynamics, we utilize deep policy functions to determine the best dynamic at the present step and learn the optimal policies from raw videos using reinforcement learning methods. Our method is capable of tracking objects with severe occlusions in crowded scenes and thus recovers the complete trajectories of objects that undergo multiple interactions with others. We evaluate the proposed method on challenging video sequences and compare it to alternative methods. Significant improvements are obtained particularly for the videos including frequent interactions or occlusions.

### Tuesday 17, 11:20 - 12:45, ML-LGM - Learning Generative Models (K11)

Chair: Xi Peng
• #3582
MEGAN: Mixture of Experts of Generative Adversarial Networks for Multimodal Image Generation
David Keetae Park, Seungjoo Yoo, Hyojin Bahng, Jaegul Choo, Noseong Park
Learning Generative Models

Recently, generative adversarial networks (GANs) have shown promising performance in generating realistic images. However, they often struggle in learning complex underlying modalities in a given dataset, resulting in poor-quality generated images. To mitigate this problem, we present a novel approach called mixture of experts GAN (MEGAN), an ensemble approach of multiple generator networks. Each generator network in MEGAN specializes in generating images with a particular subset of modalities, e.g., an image class. Instead of incorporating a separate step of handcrafted clustering of multiple modalities, our proposed model is trained through an end-to-end learning of multiple generators via gating networks, which is responsible for choosing the appropriate generator network for a given condition. We adopt the categorical reparameterization trick for a categorical decision to be made in selecting a generator while maintaining the flow of the gradients. We demonstrate that individual generators learn different and salient subparts of the data and achieve a multiscale structural similarity (MS-SSIM) score of 0.2470 for CelebA and a competitive unsupervised inception score of 8.33 in CIFAR-10.
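
The abstract does not spell out the categorical reparameterization trick. The sketch below assumes it refers to the Gumbel-softmax relaxation, a common way to draw an approximately one-hot sample while keeping the operation differentiable in a deep learning framework. Plain Python is used here only to show the arithmetic; the gate logits, temperature, and function names are hypothetical and not taken from the paper.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_softmax(logits, temperature, rng):
    """Relaxed categorical sample: add Gumbel(0, 1) noise to each logit,
    divide by the temperature, and apply softmax."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)          # guard against log(0)
        gumbel = -math.log(-math.log(u))      # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / temperature)
    return softmax(noisy)

rng = random.Random(0)
gate_logits = [2.0, 0.5, -1.0]   # hypothetical gating scores for 3 generators
weights = gumbel_softmax(gate_logits, temperature=0.5, rng=rng)

# The generator with the largest relaxed weight is the one selected.
chosen_generator = max(range(len(weights)), key=lambda k: weights[k])
print(weights, chosen_generator)
```

Lower temperatures push the weights closer to a one-hot vector, at the price of noisier gradients when the same computation is done with automatic differentiation.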

• #2115
Geometric Enclosing Networks
Trung Le, Hung Vu, Tu Dinh Nguyen, Dinh Phung
Learning Generative Models

Training models to generate data has increasingly attracted research attention and become important in modern applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from the training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of its generated data.

• #2302
Generative Warfare Nets: Ensemble via Adversaries and Collaborators
Honglun Zhang, Liqiang Xiao, Wenqing Chen, Yongkun Wang, Yaohui Jin
Learning Generative Models

Generative Adversarial Nets are a powerful method for training generative models of complex data, where a Generator and a Discriminator confront each other and are optimized in a two-player minimax manner. In this paper, we propose the Generative Warfare Nets (GWN) that involve multiple generators and multiple discriminators from two sides to exploit the advantages of Ensemble Learning. We maintain the authorities for the generators and the discriminators to enhance inter-side interactions, and utilize the mechanisms of imitation and innovation to model intra-side interactions among the generators, where they can not only learn from but also compete with each other. Extensive experiments on three natural image datasets show that GWN can achieve state-of-the-art Inception scores and produce diverse high-quality synthetic results.

• #1046
Ming Hou, Brahim Chaib-draa, Chao Li, Qibin Zhao
Learning Generative Models

In this work, we consider the task of classifying binary positive-unlabeled (PU) data. The existing discriminative learning based PU models attempt to seek an optimal reweighting strategy for U data, so that a decent decision boundary can be found. However, given limited P data, the conventional PU models tend to suffer from overfitting when adapted to very flexible deep neural networks. In contrast, we are the first to introduce a totally new paradigm to attack the binary PU task, from the perspective of generative learning, by leveraging powerful generative adversarial networks (GANs). Our generative positive-unlabeled (GenPU) framework incorporates an array of discriminators and generators that are endowed with different roles in simultaneously producing positive and negative realistic samples. We provide theoretical analysis to justify that, at equilibrium, GenPU is capable of recovering both positive and negative data distributions. Moreover, we show GenPU is generalizable and closely related to semi-supervised classification. Given rather limited P data, experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed framework. With infinite realistic and diverse sample streams generated from GenPU, a very flexible classifier can then be trained using deep neural networks.

• #3631
Joint Generative Moment-Matching Network for Learning Structural Latent Code
Hongchang Gao, Heng Huang
Learning Generative Models

Generative Moment-Matching Network (GMMN) is a deep generative model, which employs maximum mean discrepancy as the objective to learn model parameters. However, this model can only generate samples, failing to infer the latent code from samples for downstream tasks. In this paper, we propose a novel Joint Generative Moment-Matching Network (JGMMN), which learns the structural latent code for unsupervised inference. Specifically, JGMMN has a generation network for the generation task and an inference network for the inference task. We first reformulate this model as a problem of matching two joint distributions. To solve this problem, we propose to use the Joint Maximum Mean Discrepancy (JMMD) as the objective to learn these two networks simultaneously. Furthermore, we propose a novel multi-modal regularization to enforce the consistency between the sample distribution and the inferred latent code distribution. Finally, extensive experiments on both synthetic and real-world datasets have verified the effectiveness and correctness of our proposed JGMMN.
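
As a rough illustration of the maximum mean discrepancy objective mentioned in this abstract, the sketch below computes the standard biased estimate of squared MMD between two samples with a Gaussian kernel. It is a textbook formulation, not the authors' JGMMN or JMMD code; the function names and the `bandwidth` parameter are assumptions made for this example.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    with each expectation replaced by an average over all sample pairs.
    """
    def mean_kernel(left, right):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in left for b in right)
        return total / (len(left) * len(right))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    near_zero = [(0.0,), (0.1,), (-0.1,)]
    near_three = [(3.0,), (3.1,), (2.9,)]
    print(mmd_squared(near_zero, near_zero))   # identical samples
    print(mmd_squared(near_zero, near_three))  # well-separated samples
```

A moment-matching generator would be trained to drive this quantity towards zero between generated and real samples; the joint variant in the abstract matches distributions over sample and latent code pairs instead.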

• #1659
Unsupervised Disentangled Representation Learning with Analogical Relations
Zejian Li, Yongchuan Tang, Yongxing He
Learning Generative Models

Learning the disentangled representation of interpretable generative factors of data is one of the foundations to allow artificial intelligence to think like people. In this paper, we propose the analogical training strategy for the unsupervised disentangled representation learning in generative models. The analogy is one of the typical cognitive processes, and our proposed strategy is based on the observation that sample pairs in which one is different from the other in one specific generative factor show the same analogical relation. Thus, the generator is trained to generate sample pairs from which a designed classifier can identify the underlying analogical relation. In addition, we propose a disentanglement metric called the subspace score, which is inspired by subspace learning methods and does not require supervised information. Experiments show that our proposed training strategy allows the generative models to find the disentangled factors, and that our methods can give competitive performances as compared with the state-of-the-art methods.

• #6
MIXGAN: Learning Concepts from Different Domains for Mixture Generation
Guang-Yuan Hao, Hong-Xing Yu, Wei-Shi Zheng
Learning Generative Models

In this work, we present an interesting attempt on mixture generation: absorbing different image concepts (e.g., content and style) from different domains and thus generating a new domain with learned concepts. In particular, we propose a mixture generative adversarial network (MIXGAN). MIXGAN learns concepts of content and style from two domains respectively, and thus can join them for mixture generation in a new domain, i.e., generating images with content from one domain and style from another. MIXGAN overcomes the limitation of current GAN-based models which either generate new images in the same domain as they observed in training stage, or require off-the-shelf content templates for transferring or translation. Extensive experimental results demonstrate the effectiveness of MIXGAN as compared to related state-of-the-art GAN-based models.

### Tuesday 17 11:20 - 13:00 KR-WEB1 - Knowledge Representation and the Web: Description Logics, Ontologies (C7)

Chair: Thomas Lukasiewicz
• #1609
From Conjunctive Queries to Instance Queries in Ontology-Mediated Querying
Cristina Feier, Carsten Lutz, Frank Wolter
Knowledge Representation and the Web: Description Logics, Ontologies

We consider ontology-mediated queries (OMQs) based on expressive description logics of the ALC family and (unions of) conjunctive queries, studying the rewritability into OMQs based on instance queries (IQs). Our results include exact characterizations of when such a rewriting is possible and tight complexity bounds for deciding rewritability. We also give a tight complexity bound for the related problem of deciding whether a given MMSNP sentence (in other words: the complement of a monadic disjunctive Datalog program) is equivalent to a constraint satisfaction problem.

• #2968
Reverse Engineering Queries in Ontology-Enriched Systems: The Case of Expressive Horn Description Logic Ontologies
Víctor Gutiérrez-Basulto, Jean Christoph Jung, Leif Sabellek
Knowledge Representation and the Web: Description Logics, Ontologies

We introduce the query-by-example (QBE) paradigm for query answering in the presence of ontologies. Intuitively, QBE permits non-expert users to explore the data by providing examples of the information they (do not) want, which the system then generalizes into a query. Formally, we study the following question: given a knowledge base and sets of positive and negative examples, is there a query that returns all positive but none of the negative examples?  We focus on description logic knowledge bases with ontologies formulated in Horn-ALCI and (unions of) conjunctive queries. Our main contributions are characterizations, algorithms and tight complexity bounds for QBE.

• #1946
Horn-Rewritability vs PTime Query Evaluation in Ontology-Mediated Querying
Andre Hernich, Carsten Lutz, Fabio Papacchini, Frank Wolter
Knowledge Representation and the Web: Description Logics, Ontologies

In ontology-mediated querying with an expressive description logic L, two desirable properties of a TBox T are (1) being able to replace T with a TBox formulated in the Horn-fragment of L without affecting the answers to conjunctive queries, and (2) that every conjunctive query can be evaluated in PTime w.r.t. T. We investigate in which cases (1) and (2) are equivalent, finding that the answer depends on whether the unique name assumption (UNA) is made, on the description logic under consideration, and on the nesting depth of quantifiers in the TBox. We also clarify the relationship between query evaluation with and without UNA and consider natural variations of property (1).

• #162
Fast Compliance Checking in an OWL2 Fragment
Piero A. Bonatti
Knowledge Representation and the Web: Description Logics, Ontologies

We illustrate a formalization of data usage policies in a fragment of OWL2.  It can be used to encode (i) a company's data protection policy, (ii) data subjects' consent to data processing, and (iii) part of the GDPR (the forthcoming European Data Protection Regulation).  Then a company's policy can be checked for compliance with data subjects' consent and with part of the GDPR by means of subsumption queries.  We provide a complete and tractable structural subsumption algorithm for compliance checking and prove the intractability of a natural generalization of the policy language.

• #184
On Concept Forgetting in Description Logics with Qualified Number Restrictions
Yizheng Zhao, Renate Schmidt
Knowledge Representation and the Web: Description Logics, Ontologies

This paper presents a practical method for computing solutions of concept forgetting in the description logic ALCOQ(neg,and,or), basic ALC extended with nominals, qualified number restrictions, role negation, role conjunction and role disjunction. The method is based on a non-trivial generalisation of Ackermann's Lemma, and attempts to compute either semantic solutions of concept forgetting or uniform interpolants in ALCOQ(neg,and,or). It is so far the only approach to concept forgetting in description logics with number restrictions plus nominals, as well as in description logics with ABoxes. Results of an evaluation with a prototypical implementation have shown that the method was successful in more than 90% of the test cases from a large corpus of biomedical ontologies. In only 13.2% of these cases the solutions were semantic solutions.

• #1763
Embracing Change by Abstraction Materialization Maintenance for Large ABoxes
Markus Brenner, Birte Glimm
Knowledge Representation and the Web: Description Logics, Ontologies

Abstraction Refinement is a recently introduced technique which allows for reducing materialization of an ontology with a large ABox to materialization of a smaller (compressed) 'abstraction' of this ontology. In this paper, we show how Abstraction Refinement can be adopted for incremental ABox materialization by combining it with the well-known DRed algorithm for materialization maintenance. Such a combination is non-trivial and to preserve soundness and completeness, already Horn ALCHI requires more complex abstractions. Nevertheless, we show that significant benefits can be obtained for synthetic and real-world ontologies.

• #3680
Inconsistency-Tolerant Ontology-Based Data Access Revisited: Taking Mappings into Account
Meghyn Bienvenu
Knowledge Representation and the Web: Description Logics, Ontologies

Inconsistency-tolerant query answering in the presence of ontologies has received considerable attention in recent years. However, existing work assumes that the data is expressed using the vocabulary of the ontology and is therefore not directly applicable to ontology-based data access (OBDA), where relational data is connected to the ontology via mappings. This motivates us to revisit existing results in the wider context of OBDA with mappings. After formalizing the problem, we perform a detailed analysis of the data complexity of inconsistency-tolerant OBDA for ontologies formulated in DL-Lite and other data-tractable description logics, considering three different semantics (AR, IAR, and brave), two notions of repairs (subset and symmetric difference), and two classes of global-as-view (GAV) mappings. We show that adding plain GAV mappings does not affect data complexity, but there is a jump in complexity if mappings with negated atoms are considered.

• #3609
Two Approaches to Ontology Aggregation Based on Axiom Weakening
Daniele Porello, Nicolas Troquard, Rafael Peñaloza, Roberto Confalonieri, Pietro Galliani, Oliver Kutz
Knowledge Representation and the Web: Description Logics, Ontologies

Axiom weakening is a novel technique that allows for fine-grained repair of inconsistent ontologies. In a multi-agent setting, integrating ontologies corresponding to multiple agents may lead to inconsistencies. Such inconsistencies can be resolved after the integrated ontology has been built, or their generation can be prevented during ontology generation. We implement and compare these two approaches. First, we study how to repair an inconsistent ontology resulting from a voting-based aggregation of views of heterogeneous agents. Second, we prevent the generation of inconsistencies by letting the agents engage in a turn-based rational protocol about the axioms to be added to the integrated ontology. We instantiate the two approaches using real-world ontologies and compare them by measuring the levels of satisfaction of the agents w.r.t. the ontology obtained by the two procedures.

### Tuesday 17 11:20 - 13:00 CSAT-SAT - Satisfiability (K2)

Chair: Sebastian Ordyniak
• #1063
Boosting MCSes Enumeration
Éric Grégoire, Yacine Izza, Jean-Marie Lagniez
Satisfiability

The enumeration of all Maximal Satisfiable Subsets (MSSes) or all Minimal Correction Subsets (MCSes) of an unsatisfiable CNF Boolean formula is a useful and sometimes necessary step for solving a variety of important A.I. issues. Although the number of different MCSes of a CNF Boolean formula is exponential in the worst case, it remains low in many practical situations; this makes the tentative enumeration possibly successful in these latter cases. In this paper, a technique is introduced that boosts the currently most efficient practical approaches to enumerate MCSes. It implements a model rotation paradigm that allows the set of MCSes to be computed in a heuristically efficient way.
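
To make the notion of a Minimal Correction Subset concrete, here is a minimal brute-force sketch that enumerates all MCSes of a tiny CNF formula. It only illustrates the definition and is not the model rotation technique of the paper; clauses are assumed to be lists of non-zero integers in the DIMACS style, and the function names are hypothetical.

```python
from itertools import combinations, product


def satisfiable(clauses, num_vars):
    """Brute-force SAT check: try every assignment of num_vars variables."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def enumerate_mcses(clauses, num_vars):
    """Return all MCSes as sets of clause indices.

    Candidate subsets are visited by increasing size, so any candidate that
    contains an already found MCS is not minimal and is skipped.
    """
    found = []
    for size in range(len(clauses) + 1):
        for subset in combinations(range(len(clauses)), size):
            candidate = set(subset)
            if any(mcs <= candidate for mcs in found):
                continue
            remaining = [c for i, c in enumerate(clauses) if i not in candidate]
            if satisfiable(remaining, num_vars):
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Clauses: (x1), (not x1), (x1 or x2), (not x2)
    formula = [[1], [-1], [1, 2], [-2]]
    print(enumerate_mcses(formula, num_vars=2))
```

The complement of each MCS is a Maximal Satisfiable Subset, which is why the abstract treats the two enumeration problems together. This exhaustive search is exponential in both the number of clauses and the number of variables, so it is only usable on toy inputs.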

• #1868
DMC: A Distributed Model Counter
Jean-Marie Lagniez, Pierre Marquis, Nicolas Szczepanski
Satisfiability

We present and evaluate DMC, a distributed model counter for propositional CNF formulae based on the state-of-the-art sequential model counter D4. DMC can take advantage of a (possibly large) number of sequential model counters running on (possibly heterogeneous) computing units spread over a network of computers. For ensuring an efficient workload distribution, the model counting task is shared between the model counters following a policy close to work stealing. The number and the sizes of the messages which are exchanged by the jobs are kept small. The results obtained show DMC as a much more efficient counter than D4, the distribution of the computation yielding large improvements for some benchmarks. DMC appears also as a serious challenger to the parallel model counter CountAntom and to the distributed model counter dCountAntom.

• #1947
On the Satisfiability Threshold of Random Community-Structured SAT
Dina Barak-Pelleg, Daniel Berend
Satisfiability

For both historical and practical reasons, the Boolean satisfiability problem (SAT) has become one of central importance in computer science. One type of instance arises when the clauses are chosen uniformly at random: random SAT. Here, a major problem, recently solved for sufficiently large clause length, is the satisfiability threshold conjecture. The value of this threshold is known exactly only for clause length 2, and there has been a lot of research concerning its value for arbitrary fixed clause length. In this paper, we endeavor to study the satisfiability threshold for random industrial SAT. There is as yet no generally accepted model of industrial SAT, and we confine ourselves to one of the more common features of industrial SAT: the set of variables consists of a number of disjoint communities, and clauses tend to consist of variables from the same community. Our main result is that the threshold of random community-structured SAT tends to be smaller than its counterpart for random SAT. Moreover, under some conditions, this threshold even vanishes.

• #3157
Conflict Directed Clause Learning for Maximum Weighted Clique Problem
Emmanuel Hebrard, George Katsirelos
Satisfiability

The maximum clique and minimum vertex cover problems are among Karp's 21 NP-complete problems, and have numerous applications: in combinatorial auctions, for computing phylogenetic trees, to predict the structure of proteins, to analyse social networks, and so forth. Currently, the best complete methods are branch & bound algorithms and rely largely on graph colouring to compute a bound. We introduce a new approach based on SAT and on the "Conflict-Driven Clause Learning" (CDCL) algorithm. We propose an efficient implementation of Babel's bound and pruning rule, as well as a novel dominance rule. Moreover, we show how to compute concise explanations for this inference. Our experimental results show that this approach is competitive and often outperforms the state of the art for finding cliques of maximum weight.

• #3567
Solving Exist-Random Quantified Stochastic Boolean Satisfiability via Clause Selection
Nian-Ze Lee, Yen-Shi Wang, Jie-Hong R. Jiang
Satisfiability

Stochastic Boolean satisfiability (SSAT) is an expressive language to formulate decision problems with randomness. Solving SSAT formulas has the same PSPACE-complete computational complexity as solving quantified Boolean formulas (QBFs). Despite its broad applications and profound theoretical values, SSAT has received relatively little attention compared to QBF. In this paper, we focus on exist-random quantified SSAT formulas, also known as E-MAJSAT, which is a special fragment of SSAT commonly applied in probabilistic conformant planning, posteriori hypothesis, and maximum expected utility. Based on clause selection, a recently proposed QBF technique, we propose an algorithm to solve E-MAJSAT. Moreover, our method can provide an approximate solution to E-MAJSAT with a lower bound when an exact answer is too expensive to compute. Experiments show that the proposed algorithm achieves significant performance gains and memory savings over the state-of-the-art SSAT solvers on a number of benchmark formulas, and provides useful lower bounds for cases where prior methods fail to compute exact answers.
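
For readers unfamiliar with E-MAJSAT, the following sketch solves a tiny instance by exhaustive enumeration: it picks the assignment to the existential variables that maximizes the probability, over the random variables, that the formula is satisfied. This is only an illustration of the problem statement under the usual definition, not the clause-selection algorithm of the paper; the clause encoding and function names are assumptions.

```python
from itertools import product


def clause_true(clause, assignment):
    """A clause is a list of non-zero ints; a negative int is a negated variable."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def e_majsat(clauses, exist_vars, random_probs):
    """Return (best existential assignment, satisfying probability).

    exist_vars   : list of existentially quantified variable ids
    random_probs : dict mapping each random variable id to Pr[var = True]
    """
    rand_vars = sorted(random_probs)
    best_choice, best_prob = None, -1.0
    for xs in product([False, True], repeat=len(exist_vars)):
        assignment = dict(zip(exist_vars, xs))
        prob = 0.0
        for ys in product([False, True], repeat=len(rand_vars)):
            assignment.update(zip(rand_vars, ys))
            if all(clause_true(c, assignment) for c in clauses):
                weight = 1.0
                for var, value in zip(rand_vars, ys):
                    weight *= random_probs[var] if value else 1.0 - random_probs[var]
                prob += weight
        if prob > best_prob:
            best_choice, best_prob = dict(zip(exist_vars, xs)), prob
    return best_choice, best_prob


if __name__ == "__main__":
    # (x1 or y2) and (not x1 or y3), with Pr[y2] = 0.3 and Pr[y3] = 0.8
    print(e_majsat([[1, 2], [-1, 3]], exist_vars=[1], random_probs={2: 0.3, 3: 0.8}))
```

Any single existential assignment gives a lower bound on the optimum, which is the kind of approximate answer the abstract refers to when exact solving is too expensive.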

• #3839
Divide and Conquer: Towards Faster Pseudo-Boolean Solving
Jan Elffers, Jakob Nordström
Satisfiability

The last 20 years have seen dramatic improvements in the performance of algorithms for Boolean satisfiability (so-called SAT solvers), and today conflict-driven clause learning (CDCL) solvers are routinely used in a wide range of application areas. One serious shortcoming of CDCL, however, is that the underlying method of reasoning is quite weak. A tantalizing solution is to instead use stronger pseudo-Boolean (PB) reasoning, but so far the promise of exponential gains in performance has failed to materialize: the increased theoretical strength seems hard to harness algorithmically, and in many applications CDCL-based methods are still superior. We propose a modified approach to pseudo-Boolean solving based on division instead of the saturation rule used in [Chai and Kuehlmann '05] and other PB solvers. In addition to resulting in a stronger conflict analysis, this also improves performance by keeping integer coefficient sizes down, and yields a very competitive solver as shown by the results in the Pseudo-Boolean Competitions 2015 and 2016.
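
The two rules contrasted in this abstract can be sketched on a single normalized pseudo-Boolean constraint of the form sum(a_i * l_i) >= A with positive integer coefficients. The code below follows the textbook cutting-planes definitions and is not taken from the authors' solver; the dictionary representation and function names are assumptions for illustration.

```python
from itertools import product


def saturate(coeffs, degree):
    """Saturation: cap every coefficient at the degree A."""
    return {lit: min(a, degree) for lit, a in coeffs.items()}, degree


def divide(coeffs, degree, d):
    """Division: divide coefficients and degree by d, rounding up."""
    def ceil_div(a):
        return -(-a // d)
    return {lit: ceil_div(a) for lit, a in coeffs.items()}, ceil_div(degree)


def satisfies(coeffs, degree, assignment):
    """Check a 0/1 assignment of the literals against one constraint."""
    return sum(a for lit, a in coeffs.items() if assignment[lit]) >= degree


def implied(premise, conclusion):
    """Brute-force check that every assignment satisfying premise satisfies conclusion."""
    (p_coeffs, p_degree), (c_coeffs, c_degree) = premise, conclusion
    lits = sorted(p_coeffs)
    for bits in product([0, 1], repeat=len(lits)):
        assignment = dict(zip(lits, bits))
        if satisfies(p_coeffs, p_degree, assignment) and not satisfies(c_coeffs, c_degree, assignment):
            return False
    return True


if __name__ == "__main__":
    print(saturate({"x": 5, "y": 2, "z": 1}, 3))   # 5x + 2y + z >= 3
    print(divide({"x": 4, "y": 2, "z": 2}, 3, 2))  # 4x + 2y + 2z >= 3, divided by 2
```

Both rules derive a constraint implied by the original one, which `implied` confirms by enumeration on small examples. Division tends to shrink coefficients, which matches the abstract's remark about keeping integer coefficient sizes down.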

• #3882
Seeking Practical CDCL Insights from Theoretical SAT Benchmarks
Jan Elffers, Jesús Giráldez-Cru, Stephan Gocht, Jakob Nordström, Laurent Simon
Satisfiability

Over the last decades Boolean satisfiability (SAT) solvers based on conflict-driven clause learning (CDCL) have developed to the point where they can handle formulas with millions of variables. Yet a deeper understanding of how these solvers can be so successful has remained elusive. In this work we shed light on CDCL performance by using theoretical benchmarks, which have the attractive features of being a) scalable, b) extremal with respect to different proof search parameters, and c) theoretically easy in the sense of having short proofs in the resolution proof system underlying CDCL. This allows for a systematic study of solver heuristics and how efficiently they search for proofs. We report results from extensive experiments on a wide range of benchmarks. Our findings include several examples where theory predicts and explains CDCL behaviour, but also raise a number of intriguing questions for further study.

• #5470
(Journal track) Complexity of n-Queens Completion
Ian P. Gent, Christopher Jefferson, Peter Nightingale
Satisfiability

The n-Queens problem is to place n chess queens on an n by n chessboard so that no two queens are on the same row, column or diagonal. The n-Queens Completion problem is a variant, dating to 1850, in which some queens are already placed and the solver is asked to place the rest, if possible. We show that n-Queens Completion is both NP-Complete and #P-Complete. A corollary is that any non-attacking arrangement of queens can be included as a part of a solution to a larger n-Queens problem. We introduce generators of random instances for n-Queens Completion and the closely related Blocked n-Queens and Excluded Diagonals Problem. We describe three solvers for these problems, and empirically analyse the hardness of randomly generated instances. For Blocked n-Queens and the Excluded Diagonals Problem, we show the existence of a phase transition associated with hard instances as has been seen in other NP-Complete problems, but a natural generator for n-Queens Completion did not generate consistently hard instances. The significance of this work is that the n-Queens problem has been very widely used as a benchmark in Artificial Intelligence, but conclusions on it are often disputable because of the simple complexity of the decision problem. Our results give alternative benchmarks which are hard theoretically and empirically, but for which solving techniques designed for n-Queens need minimal or no change.
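
The n-Queens Completion problem described above can be sketched as a plain backtracking search. This is a minimal illustration, not one of the three solvers evaluated in the paper; it assumes pre-placed queens are given as a hypothetical `placed` dictionary mapping a row index to a column index, with at most one queen per row.

```python
def attacks(r1, c1, r2, c2):
    """Two queens in different rows attack if they share a column or a diagonal."""
    return c1 == c2 or abs(r1 - r2) == abs(c1 - c2)


def complete_queens(n, placed):
    """Return a list cols with cols[row] = column of the queen, or None if impossible."""
    cols = [placed.get(row) for row in range(n)]

    # Reject instances whose pre-placed queens already attack each other.
    fixed = list(placed.items())
    for i, (r1, c1) in enumerate(fixed):
        for r2, c2 in fixed[i + 1:]:
            if attacks(r1, c1, r2, c2):
                return None

    def solve(row):
        if row == n:
            return True
        if row in placed:  # this row is already filled by the instance
            return solve(row + 1)
        for col in range(n):
            # Check against every queen currently on the board, including
            # pre-placed queens in later rows.
            if all(cols[r] is None or not attacks(r, cols[r], row, col)
                   for r in range(n) if r != row):
                cols[row] = col
                if solve(row + 1):
                    return True
                cols[row] = None
        return False

    return cols if solve(0) else None


if __name__ == "__main__":
    print(complete_queens(4, {0: 1}))
    print(complete_queens(4, {0: 0}))
```

With no pre-placed queens this reduces to ordinary n-Queens, for which solutions are easy to construct; the hardness result in the abstract concerns the completion variant.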

### Tuesday 17 11:20 - 13:00 NLP-SAA - Sentiment Analysis and Argument Mining (T2)

Chair: Serena Villata
• #900
Beyond Polarity: Interpretable Financial Sentiment Analysis with Hierarchical Query-driven Attention
Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, Qing He
Sentiment Analysis and Argument Mining

Sentiment analysis has played a significant role in financial applications in recent years. The informational and emotive aspects of news texts may affect the prices, volatilities, volume of trades, and even potential risks of financial subjects. Previous studies in this field mainly focused on identifying polarity (e.g., positive or negative). However, as financial decisions broadly require justifications, polarity alone cannot provide enough evidence for human decision-making processes. Hence an explainable solution is in urgent demand. In this paper, we present an interpretable neural net framework for financial sentiment analysis. First, we design a hierarchical model to learn the representation of a document from multiple granularities. In addition, we propose a query-driven attention mechanism to satisfy the unique characteristics of financial documents. With the domain-specific questions provided by the financial analysts, we can discover different spotlights for queries from different aspects. We conduct extensive experiments on a real-world dataset. The results demonstrate that our framework can learn a better representation of the document and unearth meaningful clues in response to different users' preferences. It also outperforms the state-of-the-art methods on sentiment prediction of financial documents.

• #1270
Text Emotion Distribution Learning via Multi-Task Convolutional Neural Network
Yuxiang Zhang, Jiamei Fu, Dongyu She, Ying Zhang, Senzhang Wang, Jufeng Yang
Sentiment Analysis and Argument Mining

Emotion analysis of on-line user generated textual content is important for natural language processing and social media analytics tasks. Most previous emotion analysis approaches focus on identifying users’ emotional states from text by classifying emotions into one of a finite set of categories, e.g., joy, surprise, anger and fear. However, emotion analysis is inherently ambiguous, since a single sentence can evoke multiple emotions with different intensities. To address this problem, we introduce emotion distribution learning and propose a multi-task convolutional neural network for text emotion analysis. The end-to-end framework optimizes the distribution prediction and classification tasks simultaneously, which is able to learn robust representations for the distribution dataset with annotations of different voters. While most work adopts the majority voting scheme for the ground truth labeling, we also propose a lexicon-based strategy to generate distributions from a single label, which provides prior information for the emotion classification. Experiments conducted on five public text datasets (i.e., SemEval, Fairy Tales, ISEAR, TEC, CBET) demonstrate that our proposed method performs favorably against the state-of-the-art approaches.

• #2377
Aspect Term Extraction with History Attention and Selective Transformation
Xin Li, Lidong Bing, Piji Li, Wai Lam, Zhimou Yang
Sentiment Analysis and Argument Mining

Aspect Term Extraction (ATE), a key sub-task in Aspect-Based Sentiment Analysis, aims to extract explicit aspect expressions from online user reviews. We present a new framework for tackling ATE. It can exploit two useful clues, namely opinion summary and aspect detection history. Opinion summary is distilled from the whole input sentence, conditioned on each current token for aspect prediction, and thus the tailor-made summary can help aspect prediction on this token. On the other hand, the aspect detection history information is distilled from the previous aspect predictions, and it can leverage the coordinate structure and tagging schema constraints to upgrade the aspect prediction. Experimental results over four benchmark datasets clearly demonstrate that our framework can outperform all state-of-the-art methods.

• #2831
A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification
Shuming Ma, Xu Sun, Junyang Lin, Xuancheng Ren
Sentiment Analysis and Argument Mining

Text summarization and sentiment classification both aim to capture the main ideas of the text but at different levels. Text summarization is to describe the text within a few sentences, while sentiment classification can be regarded as a special type of summarization which "summarizes" the text in an even more abstract fashion, i.e., a sentiment class. Based on this idea, we propose a hierarchical end-to-end model for joint learning of text summarization and sentiment classification, where the sentiment classification label is treated as the further "summarization" of the text summarization output. Hence, the sentiment classification layer is put upon the text summarization layer, and a hierarchical structure is derived. Experimental results on Amazon online reviews datasets show that our model achieves better performance than the strong baseline systems on both abstractive summarization and sentiment classification.

• #3168
Transition-based Adversarial Network for Cross-lingual Aspect Extraction
Wenya Wang, Sinno Jialin Pan
Sentiment Analysis and Argument Mining

In fine-grained opinion mining, the task of aspect extraction involves the identification of explicit product features in customer reviews. This task has been widely studied in some major languages, e.g., English, but has seldom been addressed in other languages due to the lack of annotated corpora. To address this, we develop a novel deep model to transfer knowledge from a source language with labeled training data to a target language without any annotations. Different from cross-lingual sentiment classification, aspect extraction across languages requires more fine-grained adaptation. To this end, we utilize a transition-based mechanism that reads one word at a time and forms a series of configurations that represent the status of the whole sentence. We represent each configuration as a continuous feature vector and align these representations from different languages into a shared space through an adversarial network. In addition, syntactic structures are also integrated into the deep model to achieve more syntactically-sensitive adaptations. The proposed method is end-to-end and achieves state-of-the-art performance on English, French and Spanish restaurant review datasets.

• #3276
Aspect Sentiment Classification with both Word-level and Clause-level Attention Networks
Jingjing Wang, Jie Li, Shoushan Li, Yangyang Kang, Min Zhang, Luo Si, Guodong Zhou
Sentiment Analysis and Argument Mining

Aspect sentiment classification, a challenging task in sentiment analysis, has been attracting more and more attention in recent years. In this paper, we highlight the need for incorporating the importance degrees of both words and clauses inside a sentence and propose a hierarchical network with both word-level and clause-level attention for aspect sentiment classification. Specifically, we first adopt sentence-level discourse segmentation to segment a sentence into several clauses. Then, we leverage multiple Bi-directional LSTM layers to encode all clauses and propose a word-level attention layer to capture the importance degrees of words in each clause. Finally, we leverage another Bi-directional LSTM layer to encode the outputs from the former layers and propose a clause-level attention layer to capture the importance degrees of all the clauses inside a sentence. Experimental results on the laptop and restaurant datasets from SemEval-2015 demonstrate the effectiveness of our proposed approach to aspect sentiment classification.

• #4342
Learning to Give Feedback: Modeling Attributes Affecting Argument Persuasiveness in Student Essays
Zixuan Ke, Winston Carlile, Nishant Gurrapadi, Vincent Ng
Sentiment Analysis and Argument Mining

Argument persuasiveness is one of the most important dimensions of argumentative essay quality, yet it is little studied in automated essay scoring research. Using a recently released corpus of essays that are simultaneously annotated with argument components, argument persuasiveness scores, and attributes of argument components that impact an argument’s persuasiveness, we design and train the first set of neural models that predict the persuasiveness of an argument and its attributes in a student essay, enabling useful feedback to be provided to students on why their arguments are (un)persuasive in addition to how persuasive they are.

• #5471
(Journal track) Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification
Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani
Sentiment Analysis and Argument Mining

Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a “target” domain when the only available training data belongs to a different “source” domain. In this extended abstract, we briefly describe our new DA method called Distributional Correspondence Indexing (DCI) for sentiment classification. DCI derives term representations in a vector space common to both domains where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. The experiments we have conducted show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification.

### Tuesday 17 11:20 - 13:00 ML-LPR - Learning Preferences or Rankings (C2)

Chair: Yukino Baba
• #2980
High-dimensional Similarity Learning via Dual-sparse Random Projection
Dezhong Yao, Peilin Zhao, Tuan-Anh Nguyen Pham, Gao Cong
Learning Preferences or Rankings

We investigate how to adopt dual random projection for high-dimensional similarity learning. For a high-dimensional similarity learning problem, projection is usually adopted to map high-dimensional features into a low-dimensional space, in order to reduce the computational cost. However, dimensionality reduction methods sometimes result in unstable performance due to the suboptimal solution in the original space. In this paper, we propose a dual random projection framework for similarity learning to recover the original optimal solution from the subspace optimal solution. Previous dual random projection methods usually make strong assumptions about the data, which need to be low rank or have a large margin. Those assumptions limit dual random projection applications in similarity learning. Thus, we adopt a dual-sparse regularized random projection method that introduces a sparse regularizer into the reduced dual problem. As the original dual solution is a sparse one, applying a sparse regularizer in the reduced space relaxes the low-rank assumption. Experimental results show that our method enjoys higher effectiveness and efficiency than state-of-the-art solutions.

• #2738
Modeling Contemporaneous Basket Sequences with Twin Networks for Next-Item Recommendation
Duc-Trong Le, Hady W. Lauw, Yuan Fang
Learning Preferences or Rankings

Our interactions with an application frequently leave a heterogeneous and contemporaneous trail of actions and adoptions (e.g., clicks, bookmarks, purchases). Given a sequence of a particular type (e.g., purchases), referred to as the target sequence, we seek to predict the next item expected to appear beyond this sequence. This task is known as next-item recommendation. We hypothesize two means for improvement. First, within each time step, a user may interact with multiple items (a basket), with potential latent associations among them. Second, predicting the next item in the target sequence may be helped by also learning from another supporting sequence (e.g., clicks). We develop three twin network structures modeling the generation of both target and support basket sequences. One based on "Siamese networks" facilitates full sharing of parameters between the two sequence types. The other two based on "fraternal networks" facilitate partial sharing of parameters. Experiments on real-world datasets show significant improvements upon baselines relying on one sequence type.

• #2323
Chuxu Zhang, Lu Yu, Xiangliang Zhang, Nitesh V. Chawla
Learning Preferences or Rankings

We study the problem of author-paper correlation inference in big scholarly data, which is to effectively infer potentially correlated works for researchers using historical records. Unlike supervised learning algorithms that predict the relevance score of an author-paper pair via time- and memory-consuming feature engineering, network embedding methods automatically learn node representations that can be further used to infer author-paper correlation. However, most current models suffer from two limitations: (1) they produce general-purpose embeddings that are independent of the specific task; (2) they are usually based on network structure but lack awareness of content semantics. To address these drawbacks, we propose a task-guided and semantic-aware ranking model. First, the historical interactions among all correlated author-paper pairs are formulated as a pairwise ranking loss. Next, the paper's semantic embedding encoded by a gated recurrent neural network, together with the author's latent feature, is used to score each author-paper pair in the ranking loss. Finally, a heterogeneous relations integrative learning module is designed to further augment the model. The evaluation results of extensive experiments on the well-known AMiner dataset demonstrate that the proposed model achieves significantly better performance than a number of baselines.

• #443
A Brand-level Ranking System with the Customized Attention-GRU Model
Yu Zhu, Junxiong Zhu, Jie Hou, Yongliang Li, Beidou Wang, Ziyu Guan, Deng Cai
Learning Preferences or Rankings

In e-commerce websites like Taobao, brand is playing a more important role in influencing users' click/purchase decisions, partly because users are now attaching more importance to the quality of products and brand is an indicator of quality. However, existing ranking systems are not specifically designed to satisfy this kind of demand. Some design tricks may partially alleviate this problem, but still cannot provide satisfactory results or may create additional interaction cost. In this paper, we design the first brand-level ranking system to address this problem. The key challenge of this system is how to sufficiently exploit users' rich behavior in e-commerce websites to rank the brands. In our solution, we first conduct feature engineering specifically tailored for the personalized brand ranking problem and then rank the brands by an adapted Attention-GRU model containing three important modifications. Note that our proposed modifications can also apply to many other machine learning models on various tasks. We conduct a series of experiments to evaluate the effectiveness of our proposed ranking model and test the response to the brand-level ranking system from real users on a large-scale e-commerce platform, i.e., Taobao.

• #1662
Attentional Image Retweet Modeling via Multi-Faceted Ranking Network Learning
Zhou Zhao, Lingtao Meng, Jun Xiao, Min Yang, Fei Wu, Deng Cai, Xiaofei He, Yueting Zhuang
Learning Preferences or Rankings

Retweet prediction is a challenging problem in social media sites (SMS). In this paper, we study the problem of image retweet prediction in social media, which predicts image-sharing behavior, i.e., whether a user reposts image tweets from their followees. Unlike previous studies, we learn a user preference ranking model from their past retweeted image tweets in SMS. We first propose a heterogeneous image retweet modeling network (IRM) that exploits users' past retweeted image tweets with associated contexts, their following relations in SMS and the preferences of their followees. We then develop a novel attentional multi-faceted ranking network learning framework with multi-modal neural networks for the proposed heterogeneous IRM network to learn the joint image tweet representations and user preference representations for the prediction task. Extensive experiments on a large-scale dataset from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.

• #2338
Generalization Bounds for Regularized Pairwise Learning
Yunwen Lei, Shao-Bo Lin, Ke Tang
Learning Preferences or Rankings

Pairwise learning refers to learning tasks with the associated loss functions depending on pairs of examples. Recently, pairwise learning has received increasing attention since it covers many machine learning schemes, e.g., metric learning, ranking and AUC maximization, in a unified framework. In this paper, we establish a unified generalization error bound for regularized pairwise learning without either Bernstein conditions or capacity assumptions. We apply this general result to typical learning tasks including distance metric learning and ranking, for each of which our discussion is able to improve the state-of-the-art results.
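
To make "loss functions depending on pairs of examples" concrete, the following is a minimal sketch of a regularized empirical pairwise objective for ranking. The linear scorer, the hinge surrogate, the L2 regularizer, and the name `pairwise_hinge_objective` are illustrative assumptions, not the specific setting analyzed in the paper.

```python
from itertools import combinations


def pairwise_hinge_objective(w, xs, ys, lam=0.1):
    """Regularized pairwise hinge loss for a linear scorer (illustrative only).

    w  : list of weights
    xs : list of feature vectors (lists of floats)
    ys : list of relevance labels; a higher label should receive a higher score
    lam: weight of the assumed L2 regularizer
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    total, n_pairs = 0.0, 0
    # The loss is evaluated on pairs of examples rather than single examples.
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if y1 == y2:
            continue  # tied labels carry no ranking information
        sign = 1.0 if y1 > y2 else -1.0
        total += max(0.0, 1.0 - sign * (score(x1) - score(x2)))
        n_pairs += 1

    empirical = total / n_pairs if n_pairs else 0.0
    return empirical + lam * sum(wi * wi for wi in w)
```

With `w = [1.0]`, the pair `([2.0], 1)` and `([0.0], 0)` is ranked with margin 2, so only the regularization term remains.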

• #933
Convolutional Neural Networks based Click-Through Rate Prediction with Multiple Feature Sequences
Patrick P. K. Chan, Xian Hu, Lili Zhao, Daniel S. Yeung, Dapeng Liu, Lei Xiao
Learning Preferences or Rankings

Convolutional Neural Networks (CNNs) have achieved satisfactory performance in click-through rate (CTR) prediction in recent studies. Since features used in CTR prediction have no meaningful sequence in nature, the features can be arranged in any order. As a CNN learns the local information of a sample, the feature sequence may influence its performance significantly. However, this problem has not been fully investigated. This paper first investigates whether and how the feature sequence affects the performance of the CNN-based CTR prediction method. As the data distribution of CTR prediction changes with time, the best current sequence may not be suitable for future data. Two multi-sequence models are proposed to learn the information provided by different sequences. The first model learns all sequences using a single feature learning module, while each sequence is learnt individually by a feature learning module in the second one. Moreover, a method of generating a set of embedding sequences which aims to consider the combined influence of all feature pairs on feature learning is also introduced. Experiments are conducted to demonstrate the effectiveness and stability of our proposed models in offline and online environments on both the benchmark Avazu dataset and a real commercial dataset.

• #2804
A Bayesian Latent Variable Model of User Preferences with Item Context
Learning Preferences or Rankings

Personalized recommendation has proven to be very promising in modeling the preference of users over items. However, most existing work in this context focuses primarily on modeling user-item interactions, which tend to be very sparse. We propose to further leverage the item-item relationships that may reflect various aspects of items that guide users' choices. Intuitively, items that occur within the same "context" (e.g., browsed in the same session, purchased in the same basket) are likely related in some latent aspect. Therefore, accounting for the item's context would complement the sparse user-item interactions by extending a user's preference to other items of similar aspects. To realize this intuition, we develop Collaborative Context Poisson Factorization (C2PF), a new Bayesian latent variable model that seamlessly integrates contextual relationships among items into a personalized recommendation approach. We further derive a scalable variational inference algorithm to fit C2PF to preference data. Empirical results on real-world datasets show evident performance improvements over strong factorization models.

### Tuesday 17, 11:20 - 13:00, MLA-NET - Machine Learning Applications: Networks (C3)

Chair: Chuan Shi
• #713
Efficient Attributed Network Embedding via Recursive Randomized Hashing
Wei Wu, Bin Li, Ling Chen, Chengqi Zhang
Machine Learning Applications: Networks

Attributed network embedding aims to learn a low-dimensional representation for each node of a network, considering both attributes and structure information of the node. However, the learning based methods usually involve substantial cost in time, which makes them impractical without the help of a powerful workhorse. In this paper, we propose a simple yet effective algorithm, named NetHash, to solve this problem only with moderate computing capacity. NetHash employs the randomized hashing technique to encode shallow trees, each of which is rooted at a node of the network. The main idea is to efficiently encode both attributes and structure information of each node by recursively sketching the corresponding rooted tree from bottom (i.e., the predefined highest-order neighboring nodes) to top (i.e., the root node), and particularly, to preserve as much information closer to the root node as possible. Our extensive experimental results show that the proposed algorithm, which does not need learning, runs significantly faster than the state-of-the-art learning-based network embedding methods while achieving competitive or even better performance in accuracy.

• #833
ANOMALOUS: A Joint Modeling Approach for Anomaly Detection on Attributed Networks
Zhen Peng, Minnan Luo, Jundong Li, Huan Liu, Qinghua Zheng
Machine Learning Applications: Networks

The key point of anomaly detection on attributed networks lies in the seamless integration of network structure information and attribute information. A vast majority of existing works are mainly based on the Homophily assumption that implies the nodal attribute similarity of connected nodes. Nonetheless, this assumption is untenable in practice as the existence of noisy and structurally irrelevant attributes may adversely affect the anomaly detection performance. Despite the fact that recent attempts perform subspace selection to address this issue, these algorithms treat subspace selection and anomaly detection as two separate steps which often leads to suboptimal solutions. In this paper, we investigate how to fuse attribute and network structure information more synergistically to avoid the adverse effects brought by noisy and structurally irrelevant attributes. Methodologically, we propose a novel joint framework to conduct attribute selection and anomaly detection as a whole based on CUR decomposition and residual analysis. By filtering out noisy and irrelevant node attributes, we perform anomaly detection with the remaining representative attributes. Experimental results on both synthetic and real-world datasets corroborate the effectiveness of the proposed framework.

• #1144
Galaxy Network Embedding: A Hierarchical Community Structure Preserving Approach
Lun Du, Zhicong Lu, Yun Wang, Guojie Song, Yiming Wang, Wei Chen
Machine Learning Applications: Networks

Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification and network visualization. The source code of GNE is available online.

• #1182
Power-law Distribution Aware Trust Prediction
Xiao Wang, Ziwei Zhang, Jing Wang, Peng Cui, Shiqiang Yang
Machine Learning Applications: Networks

Trust prediction, aiming to predict the trust relations between users in a social network, is key to helping users discover reliable information. Many trust prediction methods are proposed based on the low-rank assumption of a trust network. However, one typical property of the trust network is that the trust relations follow the power-law distribution, i.e., few users are trusted by many other users, while most tail users have few trustors. Due to these tail users, the fundamental low-rank assumption made by existing methods is seriously violated and becomes unrealistic. In this paper, we propose a simple yet effective method to address the problem of the violated low-rank assumption. Instead of discovering the low-rank component of the trust network alone, we learn a sparse component of the trust network to describe the tail users simultaneously. With both the learned low-rank and sparse components, the trust relations in the whole network can be better captured. Moreover, the transitive closure structure of the trust relations is also integrated into our model. We then derive an effective iterative algorithm to infer the parameters of our model, along with the proof of correctness. Extensive experimental results on real-world trust networks demonstrate the superior performance of our proposed method over the state of the art.

• #1956
Dynamic Network Embedding : An Extended Approach for Skip-gram based Network Embedding
Lun Du, Yun Wang, Guojie Song, Zhicong Lu, Junshan Wang
Machine Learning Applications: Networks

Network embedding, as an approach to learn low-dimensional representations of vertices, has proved extremely useful in many applications. Many state-of-the-art network embedding methods based on the Skip-gram framework are efficient and effective. However, these methods mainly focus on static network embedding and cannot naturally generalize to the dynamic environment. In this paper, we propose a stable dynamic embedding framework with high efficiency. It is an extension of the Skip-gram based network embedding methods, which can keep the optimality of the objective in the Skip-gram based methods in theory. Our model can not only generalize to new vertex representations, but also update the most affected original vertex representations as the network evolves. Multi-class classification on three real-world networks demonstrates that our model can update the vertex representations efficiently while matching the performance of retraining. In addition, the visualization results illustrate that our model is capable of avoiding drift of the embedding space.

• #4371
Feature Hashing for Network Representation Learning
Qixiang Wang, Shanfeng Wang, Maoguo Gong, Yue Wu
Machine Learning Applications: Networks

The goal of network representation learning is to embed nodes so as to encode the proximity structures of a graph into a continuous low-dimensional feature space. In this paper, we propose a novel algorithm called node2hash based on feature hashing for generating node embeddings. This approach follows the encoder-decoder framework. There are two main mapping functions in this framework. The first is an encoder to map each node into high-dimensional vectors. The second is a decoder to hash these vectors into a lower dimensional feature space. More specifically, we firstly derive a proximity measurement called expected distance as target which combines position distribution and co-occurrence statistics of nodes over random walks so as to build a proximity matrix, then introduce a set of T different hash functions into feature hashing to generate uniformly distributed vector representations of nodes from the proximity matrix. Compared with the existing state-of-the-art network representation learning approaches, node2hash shows a competitive performance on multi-class node classification and link prediction tasks on three real-world networks from various domains.
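
As background for the decoder step described above, here is a minimal sketch of generic signed feature hashing (the "hashing trick"), which maps a sparse high-dimensional vector into a fixed lower-dimensional one. It is not the node2hash algorithm: the helper names, the use of `hashlib.md5` as the hash, and the `seed` parameter (standing in for the choice among different hash functions) are all assumptions made for illustration.

```python
import hashlib


def _stable_hash(key, salt):
    """Deterministic 64-bit hash of a string key (md5 is used only for mixing)."""
    digest = hashlib.md5(f"{salt}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def feature_hash(sparse_vector, dim, seed=0):
    """Hash a sparse vector {feature_name: value} into a dense list of length dim."""
    hashed = [0.0] * dim
    for key, value in sparse_vector.items():
        # One hash picks the output coordinate, a second picks the sign,
        # so that colliding features tend to cancel rather than accumulate.
        index = _stable_hash(key, f"index-{seed}") % dim
        sign = 1.0 if _stable_hash(key, f"sign-{seed}") % 2 == 0 else -1.0
        hashed[index] += sign * value
    return hashed
```

Calling `feature_hash` with different `seed` values yields different hashed views of the same input, which is one simple way to emulate a family of hash functions.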

• #3065
Discrete Network Embedding
Xiaobo Shen, Shirui Pan, Weiwei Liu, Yew-Soon Ong, Quan-Sen Sun
Machine Learning Applications: Networks

Network embedding aims to seek low-dimensional vector representations for network nodes, by preserving the network structure. The network embedding is typically represented as continuous vectors, which imposes formidable challenges in storage and computation costs, particularly in large-scale applications. To address the issue, this paper proposes a novel discrete network embedding (DNE) for more compact representations. In particular, DNE learns short binary codes to represent each node. The Hamming similarity between two binary embeddings is then employed to approximate the ground-truth similarity. A novel discrete multi-class classifier is also developed to expedite classification. Moreover, we propose to jointly learn the discrete embedding and classifier within a unified framework to improve the compactness and discrimination of network embedding. Extensive experiments on node classification consistently demonstrate that DNE exhibits lower storage and computational complexity than state-of-the-art network embedding methods, while obtaining competitive classification results.
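
The storage and computation benefit of binary codes can be illustrated with a small sketch of Hamming similarity. The function names and the normalization to the range [0, 1] are assumptions for illustration; the abstract does not specify how DNE normalizes the similarity.

```python
def hamming_similarity(code_a, code_b):
    """Fraction of positions at which two equal-length binary codes agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a == b for a, b in zip(code_a, code_b)) / len(code_a)


def hamming_similarity_packed(code_a, code_b, n_bits):
    """Same quantity for codes packed into integers.

    XOR sets a bit exactly where the two codes disagree, so counting the
    set bits gives the Hamming distance in a handful of machine operations.
    """
    distance = bin(code_a ^ code_b).count("1")
    return (n_bits - distance) / n_bits
```

For example, the codes `1011` and `1110` disagree in two of four positions, so both functions return 0.5.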

• #2060
Sampling for Approximate Bipartite Network Projection
Nesreen Ahmed, Nick Duffield, Liangzhen Xia
Machine Learning Applications: Networks

Bipartite graphs manifest as a stream of edges that represent transactions, e.g., purchases by retail customers. Recommender systems employ neighborhood-based measures of node similarity, such as the pairwise number of common neighbors (CN) and related metrics. While the number of node pairs that share neighbors is potentially enormous, only a relatively small proportion of them have many common neighbors. This motivates finding a weighted sampling approach to preferentially sample these node pairs. This paper presents a new sampling algorithm that provides a fixed size unbiased estimate of the similarity matrix resulting from a bipartite edge stream projection. The algorithm has two components. First, it maintains a reservoir of sampled bipartite edges with sampling weights that favor selection of high similarity nodes. Second, arriving edges generate a stream of similarity updates, based on their adjacency with the current sample. These updates are aggregated in a second reservoir sample-based stream aggregator to yield the final unbiased estimate. Experiments on real world graphs show that a 10% sample at each stage yields estimates of high similarity edges with weighted relative errors of about 1%.
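
For reference, the quantity being estimated can be computed exactly on small inputs. The sketch below maintains exact common-neighbor counts from a stream of (user, item) edges; it is not the paper's sampling algorithm, and its memory use grows with the number of node pairs, which is the cost that sampling is meant to avoid. The function name and the edge representation are assumptions.

```python
from collections import defaultdict


def common_neighbor_counts(edge_stream):
    """Exact common-neighbor counts between users from (user, item) edges."""
    users_of_item = defaultdict(set)  # item -> users already seen with it
    counts = defaultdict(int)         # (user_a, user_b) -> shared items
    for user, item in edge_stream:
        if user in users_of_item[item]:
            continue  # ignore a repeated edge
        # Each arriving edge generates one similarity update per user
        # already adjacent to the same item.
        for other in users_of_item[item]:
            pair = (min(user, other), max(user, other))
            counts[pair] += 1
        users_of_item[item].add(user)
    return dict(counts)
```

On the stream `("a","x"), ("b","x"), ("a","y"), ("b","y"), ("c","y")`, users `a` and `b` share two items, while `c` shares one item with each of them.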

### Tuesday 17, 11:20 - 18:20, Competition (K14)

• Angry Birds Competition
Competition
• ### Tuesday 17, 14:00 - 14:45, Invited Talk (VICTORIA)

Chair: Jerome Lang
• Model-free, Model-based, and General Intelligence
Hector Geffner
Invited Talk
• ### Tuesday 17, 14:55 - 16:10, EAR5 - Early Career 5 (VICTORIA)

Chair: Qiang Yang
• #5482
Mining Streaming and Temporal Data: from Representation to Knowledge
Xiangliang Zhang
Early Career 5

In this big-data era, vast amounts of continuously arriving data can be found in various fields, such as sensor networks, network management, web and financial applications. To process such data, algorithms are usually challenged by its complex structure and high volume. Representation learning facilitates data operations by providing a condensed description of patterns underlying the data. Knowledge discovery based on the new representations will then be computationally efficient, and to a certain extent more effective due to the removal of noise and irrelevant information in the representation learning step. In this paper, we will briefly review state-of-the-art techniques for extracting representations and discovering knowledge from streaming and temporal data, and demonstrate their performance at addressing several real application problems.

• #5495
The power of convexity in deep learning
J. Zico Kolter
Early Career 5

• #5494
Towards Sample Efficient Reinforcement Learning
Yang Yu
Early Career 5

Reinforcement learning is a major tool to realize intelligent agents that can autonomously adapt to the environment. With deep models, reinforcement learning has shown great potential in complex tasks such as playing games from pixels. However, current reinforcement learning techniques still suffer from requiring a huge amount of interaction data, which could result in unbearable cost in real-world applications. In this article, we share our understanding of the problem, and discuss possible ways to alleviate the sample cost of reinforcement learning, from the aspects of exploration, optimization, environment modeling, experience transfer, and abstraction. We also discuss some challenges in real-world applications, with the hope of inspiring future research.

### Tuesday 17, 14:55 - 16:10, KR-PS - Knowledge Representation and Planning (C7)

Chair: David Toman
• #1756
Automata-Theoretic Foundations of FOND Planning for LTLf and LDLf Goals
Giuseppe De Giacomo, Sasha Rubin
Knowledge Representation and Planning

We study planning for LTLf and LDLf temporally extended goals in nondeterministic fully observable domains (FOND). We consider both strong and strong cyclic plans, and develop foundational automata-based techniques to deal with both cases. Using these techniques we provide the computational characterization of both problems, separating the complexity in the size of the domain specification from that in the size of the formula. Specifically we establish them to be EXPTIME-complete and 2EXPTIME-complete, respectively, for both problems. In doing so, we also show 2EXPTIME-hardness for strong cyclic plans, which was open.

• #3368
Features, Projections, and Representation Change for Generalized Planning
Blai Bonet, Hector Geffner
Knowledge Representation and Planning

Generalized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a mapping from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and objects change across instances. In this work, we extend the standard formulation of generalized planning to such domains. This is achieved by projecting the actions over the features, resulting in a common set of abstract actions which can be tested for soundness and completeness, and which can be used for generating general policies such as “if the gripper is empty, pick the clear block above x and place it on the table” that achieve the goal clear(x) in any Blocksworld instance. In this policy, “pick the clear block above x” is an abstract action that may represent the action Unstack(a, b) in one situation and the action Unstack(b, c) in another. Transformations are also introduced for computing such policies by means of fully observable non-deterministic (FOND) planners. The value of generalized representations for learning general policies is also discussed.

• #4265
Complexity of Scheduling Charging in the Smart Grid
Mathijs de Weerdt, Michael Albert, Vincent Conitzer, Koos van der Linden
Knowledge Representation and Planning

The problem of optimally scheduling the charging demand of electric vehicles within the constraints of the electricity infrastructure is called the charge scheduling problem. The models of the charging speed, horizon, and charging demand determine the computational complexity of the charge scheduling problem. We show that for about 20 variants the problem is either in P or weakly NP-hard and dynamic programs exist to compute optimal solutions. About 10 other variants of the problem are strongly NP-hard, presenting a potentially significant obstacle to their use in practical situations of scale. An experimental study establishes up to what parameter values the dynamic programs can determine optimal solutions in a couple of minutes.

• #1927
Small Undecidable Problems in Epistemic Planning
Sébastien Lê Cong, Sophie Pinchinat, François Schwarzentruber
Knowledge Representation and Planning

Epistemic planning extends classical planning with knowledge and is based on dynamic epistemic logic (DEL). The epistemic planning problem is undecidable in general. We exhibit a small undecidable subclass of epistemic planning over 2-agent S5 models with a fixed repertoire of one action, 6 propositions and a fixed goal. We furthermore consider a variant of the epistemic planning problem where the initial knowledge state is an automatic structure, hence possibly infinite. In that case, we show the epistemic planning problem with 1 public action and 2 propositions to be undecidable, while it is known to be decidable with public actions over finite models. Our results are obtained by reducing the reachability problem over small universal cellular automata. While our reductions yield a goal formula that displays the common knowledge operator, we show, for each of our considered epistemic problems, a reduction into an epistemic planning problem for a common-knowledge-operator-free goal formula by using 2 additional actions.

• #2715
Multi-agent Epistemic Planning with Common Knowledge
Qiang Liu, Yongmei Liu
Knowledge Representation and Planning

In the past decade, multi-agent epistemic planning has received much attention from both dynamic logic and planning communities. Common knowledge is an essential part of multi-agent modal logics, and plays an important role in coordination and interaction of multiple agents. However, existing implementations of multi-agent epistemic planning provide very limited support for common knowledge, basically static propositional common knowledge. Our work aims to extend an existing multi-agent epistemic planning framework based on higher-order belief change with the capability to deal with common knowledge. We propose a novel normal form for multi-agent KD45 logic with common knowledge. We propose satisfiability solving, revision and update algorithms for this normal form. Based on our algorithms, we implemented a multi-agent epistemic planner with common knowledge called MEPC. Our planner successfully generated solutions for several domains that demonstrate the typical usage of common knowledge.

• #1239
PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making
Fangkai Yang, Daoming Lyu, Bo Liu, Steven Gustafson
Knowledge Representation and Planning

Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework, PEORL, that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in dynamic environments with uncertainties. Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL.

### Tuesday 17, 14:55 - 16:10, MAS-GSC - Game Theory and Social Choice (C8)

Chair: Maria Polukarov
• #1741
Strategyproof and Fair Matching Mechanism for Union of Symmetric M-convex Constraints
Yuzhe Zhang, Kentaro Yahiro, Nathanaël Barrot, Makoto Yokoo
Game Theory and Social Choice

In this paper, we identify a new class of distributional constraints defined as a union of symmetric M-convex sets, which can represent a variety of real-life constraints in two-sided matching settings. Since M-convexity is not closed under union, a union of symmetric M-convex sets does not belong to this well-behaved class of constraints in general. Thus, developing a fair and strategyproof mechanism that can handle this class is challenging. We present a novel mechanism called Quota Reduction Deferred Acceptance (QRDA), which repeatedly applies the standard DA mechanism by sequentially reducing artificially introduced maximum quotas. We show that QRDA is fair and strategyproof when handling a union of symmetric M-convex sets. Furthermore, in comparison to a baseline mechanism called Artificial Cap Deferred Acceptance (ACDA), QRDA always obtains a weakly better matching for students and, experimentally, performs better in terms of nonwastefulness.
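
The building block that QRDA applies repeatedly is the standard Deferred Acceptance (DA) mechanism. Below is a minimal sketch of student-proposing DA with maximum quotas only; it does not implement QRDA's quota-reduction loop or the M-convex constraints, and the function name and data layout are assumptions made for illustration.

```python
def deferred_acceptance(student_prefs, school_prefs, quotas):
    """Student-proposing DA.

    student_prefs: {student: [schools, most preferred first]}
    school_prefs : {school: [students, most preferred first]}
    quotas       : {school: maximum number of students}
    Returns {school: [tentatively accepted students, best first]}.
    """
    rank = {school: {st: i for i, st in enumerate(prefs)}
            for school, prefs in school_prefs.items()}
    next_choice = {st: 0 for st in student_prefs}
    held = {school: [] for school in school_prefs}
    free = list(student_prefs)

    while free:
        student = free.pop()
        prefs = student_prefs[student]
        if next_choice[student] >= len(prefs):
            continue  # list exhausted: the student stays unmatched
        school = prefs[next_choice[student]]
        next_choice[student] += 1
        if student not in rank[school]:
            free.append(student)  # unacceptable to this school; try the next
            continue
        held[school].append(student)
        held[school].sort(key=rank[school].__getitem__)
        if len(held[school]) > quotas[school]:
            # Over quota: reject the least preferred tentatively held student.
            free.append(held[school].pop())
    return held
```

In the usage below, all three students prefer school A, which has one seat and ranks `s3` first, so the other two are placed at B.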

• #2693
Exact Algorithms and Complexity of Kidney Exchange
Mingyu Xiao, Xuanbei Wang
Game Theory and Social Choice

Kidney Exchange is an approach to donor kidney transplantation where patients with incompatible donors swap kidneys to receive a compatible kidney. Since it was first put forward in 1986, an increasing number of people have received a life-saving kidney as Kidney Exchange has grown in popularity. This growth is making the problem of optimally matching patients to donors more difficult to solve. The central problem is the NP-hard problem of finding the largest vertex-disjoint packing of cycles and chains in a graph that represents the compatibility between patients and donors, where, due to human resource limitations, we may have constraints on the maximum length of cycles and chains. This paper mainly contributes theoretical algorithms for this problem with and without length constraints (the restricted and free versions). We give: 1. A single-exponential exact algorithm based on subset convolution for the two versions; 2. An FPT algorithm for the free version with the parameter being the number of "vertex types" in the graph.

• #3408
Facility Reallocation on the Line
Bart de Keijzer, Dominik Wojtczak
Game Theory and Social Choice

We consider a multi-stage facility reallocation problem on the real line, where a facility is being moved between stages based on the locations reported by n agents. The aim of the reallocation mechanism is to minimize the social cost, i.e., the sum over the total distance between the facility and all agents at all stages, plus the cost incurred for moving the facility. We study this problem in both the offline and online settings. In the offline case the mechanism has full knowledge of the agent locations in all future stages, and in the online setting the mechanism does not know these future locations and must decide the location of the facility on a stage-per-stage basis. For both cases, we derive the optimal mechanism, where for the online setting we show that its competitive ratio is (n+2)/(n+1). As neither of these mechanisms turns out to be strategyproof, we propose another strategyproof mechanism which has a competitive ratio of (n+3)/(n+1) for odd n and (n+4)/n for even n, which we conjecture to be the best possible. We also consider a generalization with multiple facilities and weighted agents, for which we show that the optimum can be computed in polynomial time for a fixed number of facilities.
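
The social cost objective can be written down directly. The sketch below evaluates it for a given sequence of facility positions; it assumes that moving the facility costs exactly the distance moved and that the position in the first stage is free to choose, neither of which is stated in the abstract. The function name is hypothetical.

```python
def social_cost(facility_positions, agent_locations):
    """Social cost of a single facility moved along the real line.

    facility_positions[t]: position of the facility at stage t
    agent_locations[t]   : list of agent positions reported at stage t
    """
    if len(facility_positions) != len(agent_locations):
        raise ValueError("need one facility position per stage")
    # Total distance between the facility and all agents, over all stages.
    distance = sum(abs(f - a)
                   for f, agents in zip(facility_positions, agent_locations)
                   for a in agents)
    # Assumed movement cost: distance travelled between consecutive stages.
    movement = sum(abs(nxt - cur)
                   for cur, nxt in zip(facility_positions, facility_positions[1:]))
    return distance + movement
```

For two stages with agents at `[0, 1]` and then `[2, 3]`, placing the facility at 0 and then 2 costs 1 + 1 in distance plus 2 in movement.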

• #3413
Negotiation Strategies for Agents with Ordinal Preferences
Sefi Erlich, Noam Hazon, Sarit Kraus
Game Theory and Social Choice

Negotiation is a very common interaction between automated agents. Many common negotiation protocols work with cardinal utilities, even though ordinal preferences, which only rank the outcomes, are easier to elicit from humans. In this work we concentrate on negotiation with ordinal preferences over a finite set of outcomes. We study an intuitive protocol for bilateral negotiation, where the two parties make offers alternately. We analyze the negotiation protocol under different settings. First, we assume that each party has full information about the other party's preference order. We provide elegant strategies that specify a sub-game perfect equilibrium for the agents. We further show how the studied negotiation protocol almost completely implements a known bargaining rule. Finally, we analyze the no information setting. We study several solution concepts that are distribution-free, and analyze both the case where neither party knows the preference order of the other party, and the case where only one party is uninformed.

• #4120
Big City vs. the Great Outdoors: Voter Distribution and How It Affects Gerrymandering
Allan Borodin, Omer Lev, Nisarg Shah, Tyrone Strangway
Game Theory and Social Choice

Gerrymandering is the process by which parties manipulate boundaries of electoral districts in order to maximize the number of districts they can win. Demographic trends show an increasingly strong correlation between residence and party affiliation; one party's supporters congregate in cities, while the other's stay in more rural areas. We investigate both theoretically and empirically the effect of this trend on a party's ability to gerrymander in a two-party model ("urban party" and "rural party"). Along the way, we propose a definition of the gerrymandering power of a party, and an algorithmic approach for near-optimal gerrymandering in large instances. Our results suggest that beyond a fairly small concentration of the urban party's voters, the gerrymandering power of a party depends almost entirely on the level of concentration, and not on the party's share of the population. As partisan separation grows, the gerrymandering power of both parties converges so that each party can gerrymander to get only slightly more than what its voting share warrants, bringing about, ultimately, a more representative outcome. Moreover, there seems to be an asymmetry between the gerrymandering power of the parties, with the rural party being more capable of gerrymandering.

• #5137
(Sister Conferences Best Papers Track) Combinatorial Cost Sharing
Game Theory and Social Choice

We introduce a combinatorial variant of the cost sharing problem: several services can be provided to each player and each player values every combination of services differently. A publicly known cost function specifies the cost of providing every possible combination of services. A combinatorial cost sharing mechanism is a protocol that decides which services each player gets and at what price. We look for dominant strategy mechanisms that are (economically) efficient and cover the cost, ideally without overcharging (i.e., budget balanced). Note that unlike the standard cost sharing setting, combinatorial cost sharing is a multi-parameter domain. This makes designing dominant strategy mechanisms with good guarantees a challenging task. We present the Potential Mechanism -- a combination of the VCG mechanism and a well-known tool from the theory of cooperative games: Hart and Mas-Colell's potential function. The potential mechanism is a dominant strategy mechanism that always covers the incurred cost. When the cost function is subadditive the same mechanism is also approximately efficient. Our main technical contribution shows that when the cost function is submodular the potential mechanism is approximately budget balanced in three settings: supermodular valuations, symmetric cost function and general symmetric valuations, and two players with general valuations.

### Tuesday 17 14:55 - 16:10 CSAT-ML - Constraints, Satisfiability and Learning (K2)

Chair: Chen Gong
• #507
Descriptive Clustering: ILP and CP Formulations with Applications
Thi-Bich-Hanh Dao, Chia-Tung Kuo, S. S. Ravi, Christel Vrain, Ian Davidson
Constraints, Satisfiability and Learning

In many settings just finding a good clustering is insufficient and an explanation of the clustering is required. If the features used to perform the clustering are interpretable then methods such as conceptual clustering can be used. However, in many applications this is not the case, particularly for image, graph and other complex data. Here we explore the setting where a set of interpretable discrete tags for each instance is available. We formulate the descriptive clustering problem as a bi-objective optimization to simultaneously find compact clusters using the features and to describe them using the tags. We present our formulation in a declarative platform and show it can be integrated into a standard iterative algorithm to find all Pareto optimal solutions to the two objectives. Preliminary results demonstrate the utility of our approach on real data sets for images and electronic health care records and that it outperforms single objective and multi-view clustering baselines.

• #1228
Machine Learning and Constraint Programming for Relational-To-Ontology Schema Mapping
Diego De Uña, Nataliia Rümmele, Graeme Gange, Peter Schachte, Peter J. Stuckey
Constraints, Satisfiability and Learning

The problem of integrating heterogeneous data sources into an ontology is highly relevant in the database field. Several techniques exist to approach the problem, but side constraints on the data cannot be easily implemented and thus the results may be inconsistent. In this paper we improve previous work by Taheriyan et al. [2016a] using Machine Learning (ML) to take into account inconsistencies in the data (unmatchable attributes) and encode the problem as a variation of the Steiner Tree, for which we use work by De Uña et al. [2016] in Constraint Programming (CP). Combining ML and CP achieves state-of-the-art precision, recall and speed, and provides a more flexible framework for variations of the problem.

• #2569
Faster Training Algorithms for Structured Sparsity-Inducing Norm
Bin Gu, Xingwang Ju, Xiang Li, Guansheng Zheng
Constraints, Satisfiability and Learning

Structured-sparsity regularization is popular for sparse learning because of its flexibility in encoding feature structures. This paper considers a generalized version of structured-sparsity regularization (especially for the $l_1/l_{\infty}$ norm) with arbitrary group overlap. Due to the group overlap, it is time-consuming to solve the associated proximal operator. Although Mairal et al. [2010] have proposed a network-flow algorithm to solve the proximal operator, it is still time-consuming, especially in the high-dimensional setting. To address this challenge, in this paper, we have developed a more efficient solution for $l_1/l_{\infty}$ group lasso with arbitrary group overlap using an Inexact Proximal-Gradient method. In each iteration, our algorithm only needs to compute an inexact solution to the proximal sub-problem, which can be done efficiently. On the theoretical side, the proposed algorithm enjoys the same global convergence rate as the exact proximal methods. Experiments demonstrate that our algorithm is much more efficient than the network-flow algorithm, while retaining similar generalization performance.

• #3840
Learning SMT(LRA) Constraints using SMT Solvers
Samuel Kolb, Stefano Teso, Andrea Passerini, Luc De Raedt
Constraints, Satisfiability and Learning

We introduce the problem of learning SMT(LRA) constraints from data. SMT(LRA) extends propositional logic with (in)equalities between numerical variables. Many relevant formal verification problems can be cast as SMT(LRA) instances and SMT(LRA) has supported recent developments in optimization and counting for hybrid Boolean and numerical domains. We introduce SMT(LRA) learning, the task of learning SMT(LRA) formulas from examples of feasible and infeasible instances, and we contribute INCAL, an exact non-greedy algorithm for this setting. Our approach encodes the learning task itself as an SMT(LRA) satisfiability problem that can be solved directly by SMT solvers. INCAL is an incremental algorithm that achieves exact learning by looking only at a small subset of the data, leading to significant speed-ups. We empirically evaluate our approach on both synthetic instances and benchmark problems taken from the SMT-LIB benchmarks repository.

• #3899
Learning Optimal Decision Trees with SAT
Nina Narodytska, Alexey Ignatiev, Filipe Pereira, Joao Marques-Silva
Constraints, Satisfiability and Learning

Explanations of machine learning (ML) predictions are of fundamental importance in different settings. Moreover, explanations should be succinct, to enable easy understanding by humans. Decision trees represent an often used approach for developing explainable ML models, motivated by the natural mapping between decision tree paths and rules. Clearly, smaller trees correlate well with smaller rules, and so one challenge is to devise solutions for computing smallest size decision trees given training data. Although simple to formulate, the computation of smallest size decision trees turns out to be an extremely challenging computational problem, for which no practical solutions are known. This paper develops a SAT-based model for computing smallest-size decision trees given training data. In sharp contrast with past work, the proposed SAT model is shown to scale for publicly available datasets of practical interest.

• #2772
Neural Networks for Predicting Algorithm Runtime Distributions
Katharina Eggensperger, Marius Lindauer, Frank Hutter
Constraints, Satisfiability and Learning

Many state-of-the-art algorithms for solving hard combinatorial problems in artificial intelligence (AI) include elements of stochasticity that lead to high variations in runtime, even for a fixed problem instance. Knowledge about the resulting runtime distributions (RTDs) of algorithms on given problem instances can be exploited in various meta-algorithmic procedures, such as algorithm selection, portfolios, and randomized restarts. Previous work has shown that machine learning can be used to individually predict mean, median and variance of RTDs. To establish a new state-of-the-art in predicting RTDs, we demonstrate that the parameters of an RTD should be learned jointly and that neural networks can do this well by directly optimizing the likelihood of an RTD given runtime observations. In an empirical study involving five algorithms for SAT solving and AI planning, we show that neural networks predict the true RTDs of unseen instances better than previous methods, and can even do so when only few runtime observations are available per training instance.

### Tuesday 17 14:55 - 16:10 NLP-CLA - Sentence and Text Classification, Text Segmentation (T2)

Chair: Mausam
• #189
Differentiated Attentive Representation Learning for Sentence Classification
Qianrong Zhou, Xiaojie Wang, Xuan Dong
Sentence and Text Classification, Text Segmentation

Attention-based models have been shown to be effective in learning representations for sentence classification. They are typically equipped with a multi-hop attention mechanism. However, existing multi-hop models still suffer from the problem of paying too much attention to the most frequently noticed words, which might not be important for classifying the current sentence. There is also no explicit, effective way to shift the attention away from a wrong part of the sentence. In this paper, we alleviate this problem by proposing a differentiated attentive learning model. It is composed of two branches of attention subnets and an example discriminator. An explicit signal with the loss information of the first attention subnet is passed on to the second one to drive them to learn different attentive preferences. The example discriminator then selects the suitable attention subnet for sentence classification. Experimental results on real and synthetic datasets demonstrate the effectiveness of our model.

• #255
Jumper: Learning When to Make Classification Decision in Reading
Xianggen Liu, Lili Mou, Haotian Cui, Zhengdong Lu, Sen Song
Sentence and Text Classification, Text Segmentation

In earlier years, text classification was typically accomplished by feature-based classifiers; recently, neural networks, as powerful classifiers, have made it possible to work with raw input as the text stands. In this paper, we propose a novel framework, Jumper, inspired by the cognitive process of text reading, that models text classification as a sequential decision process. Basically, Jumper is a neural system that can scan a piece of text sequentially and make a classification decision at the time it chooses. Both the classification and when to make the classification are part of the decision process, which is controlled by the policy net and trained with reinforcement learning to maximize the overall classification accuracy. Experimental results show that a properly trained Jumper has the following properties: (1) It can make decisions whenever the evidence is enough, therefore reducing the total text reading by 30-40% and often finding the key rationale of prediction. (2) It can achieve classification accuracy better than or comparable to state-of-the-art models on several benchmark and industrial datasets.

• #555
SegBot: A Generic Neural Text Segmentation Model with Pointer Network
Jing Li, Aixin Sun, Shafiq Joty
Sentence and Text Classification, Text Segmentation

Text segmentation is a fundamental task in natural language processing that comes in two levels of granularity: (i) segmenting a document into a sequence of topical segments (topic segmentation), and (ii) segmenting a sentence into a sequence of elementary discourse units (EDU segmentation). Traditional solutions to the two tasks rely heavily on carefully designed features. The recently proposed neural models do not need manual feature engineering, but they either suffer from sparse boundary tags or cannot handle the issue of variable size output vocabulary well. We propose a generic end-to-end segmentation model called SegBot. SegBot uses a bidirectional recurrent neural network to encode the input text sequence. The model then uses another recurrent neural network together with a pointer network to select text boundaries in the input sequence. In this way, SegBot does not require hand-crafted features. More importantly, our model inherently handles the issue of variable size output vocabulary and the issue of sparse boundary tags. In our experiments, SegBot outperforms state-of-the-art models on both topic and EDU segmentation tasks.

• #4344
Translations as Additional Contexts for Sentence Classification
Reinald Kim Amplayo, Kyungjae Lee, Jinyoung Yeo, Seung-won Hwang
Sentence and Text Classification, Text Segmentation

In sentence classification tasks, additional contexts, such as the neighboring sentences, may improve the accuracy of the classifier. However, such contexts are domain-dependent and thus cannot be used for another classification task with an inappropriate domain. In contrast, we propose the use of translated sentences as domain-free context that is always available regardless of the domain. We find that naive feature expansion of translations gains only marginal improvements and may decrease the performance of the classifier, because possibly inaccurate translations produce noisy sentence vectors. To this end, we present multiple context fixing attachment (MCFA), a series of modules attached to multiple sentence vectors to fix the noise in the vectors using the other sentence vectors as context. We show that our method performs competitively compared to previous models, achieving the best classification performance on multiple data sets. We are the first to use translations as domain-free contexts for sentence classification.

• #769
Deep Text Classification Can be Fooled
Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi
Sentence and Text Classification, Text Segmentation

In this paper, we present an effective method to craft text adversarial samples, revealing one important yet underestimated fact: DNN-based text classifiers are also prone to adversarial sample attacks. Specifically, confronted with different adversarial scenarios, the text items that are important for classification are identified by computing the cost gradients of the input (white-box attack) or generating a series of occluded test samples (black-box attack). Based on these items, we design three perturbation strategies, namely insertion, modification, and removal, to generate adversarial samples. The experimental results show that the adversarial samples generated by our method can successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers. The adversarial samples can be perturbed to any desirable classes without compromising their utilities. At the same time, the introduced perturbation is difficult to perceive.

• #696
Multiway Attention Networks for Modeling Sentence Pairs
Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou
Sentence and Text Classification, Text Segmentation

Modeling sentence pairs plays a vital role in judging the relationship between two sentences, such as in paraphrase identification, natural language inference, and answer sentence selection. Previous work achieves very promising results using neural networks with attention mechanisms. In this paper, we propose multiway attention networks, which employ multiple attention functions to match sentence pairs under the matching-aggregation framework. Specifically, we design four attention functions to match words in corresponding sentences. Then, we aggregate the matching information from each function, and combine the information from all functions to obtain the final representation. Experimental results demonstrate that the proposed multiway attention networks improve the results on the Quora Question Pairs, SNLI, MultiNLI, and answer sentence selection task on the SQuAD dataset.

### Tuesday 17 14:55 - 16:10 SGP-ML - Heuristic Search and Learning (T1)

Chair: Frans Oliehoek
• #593
Distributed Self-Paced Learning in Alternating Direction Method of Multipliers
Xuchao Zhang, Liang Zhao, Zhiqian Chen, Chang-Tien Lu
Heuristic Search and Learning

Self-paced learning (SPL) mimics the cognitive process of humans, who generally learn from easy samples to hard ones. One key issue in SPL is that the training process required for each instance weight depends on the other samples and thus cannot easily be run in a distributed manner on a large-scale dataset. In this paper, we reformulate the self-paced learning problem into a distributed setting and propose a novel Distributed Self-Paced Learning method (DSPL) to handle large-scale datasets. Specifically, both the model and instance weights can be optimized in parallel for each batch based on a consensus alternating direction method of multipliers. We also prove the convergence of our algorithm under mild conditions. Extensive experiments on both synthetic and real datasets demonstrate that our approach is superior to existing methods.
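The easy-to-hard idea behind self-paced learning can be sketched with the classic hard-threshold weighting, in which a sample takes part in training only while its current loss is below an "age" parameter that grows over time. This is a generic illustration, not the distributed DSPL method of the paper; the names `spl_weights` and `lam` are hypothetical.

```python
def spl_weights(losses, lam):
    """Hard self-paced weights: 1 for easy samples (loss < lam), else 0.

    This is the standard binary SPL weighting, shown as an assumption-laden
    illustration; the paper's distributed ADMM formulation is not reproduced.
    """
    return [1 if loss < lam else 0 for loss in losses]


# Raising lam admits progressively harder samples into training.
sample_losses = [0.1, 0.8, 0.3, 2.5]
schedule = [spl_weights(sample_losses, lam) for lam in (0.2, 0.5, 1.0, 3.0)]
```

Because each weight depends on the loss under the current model, the weights and the model are usually updated in alternation, which is the coupling the abstract identifies as the obstacle to distributed training.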

• #616
Episodic Memory Deep Q-Networks
Zichuan Lin, Tianqi Zhao, Guangwen Yang, Lintao Zhang
Heuristic Search and Learning

Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interactions with the environments to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch onto good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method leads to better sample efficiency and is more likely to find a good policy. It only requires 1/5 of the interactions of DQN to achieve many state-of-the-art performances on Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms.

• #1919
Optimization based Layer-wise Magnitude-based Pruning for DNN Compression
Guiying Li, Chao Qian, Chunhui Jiang, Xiaofen Lu, Ke Tang
Heuristic Search and Learning

Layer-wise magnitude-based pruning (LMP) is a very popular method for deep neural network (DNN) compression. However, tuning the layer-specific thresholds is a difficult task, since the space of threshold candidates is exponentially large and the evaluation is very expensive. Previous methods mainly tune the thresholds by hand and require expertise. In this paper, we propose an automatic tuning approach based on optimization, named OLMP. The idea is to transform the threshold tuning problem into a constrained optimization problem (i.e., minimizing the size of the pruned model subject to a constraint on the accuracy loss), and then use powerful derivative-free optimization algorithms to solve it. To compress a trained DNN, OLMP is conducted within a new iterative pruning and adjusting pipeline. Empirical results show that OLMP can achieve the best pruning ratio on LeNet-style models (i.e., 114 times for LeNet-300-100 and 298 times for LeNet-5) compared with some state-of-the-art DNN pruning methods, and can reduce the size of an AlexNet-style network up to 82 times without accuracy loss.
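The pruning step that the layer-specific thresholds control can be sketched as follows. This covers only plain magnitude pruning with given thresholds; the derivative-free threshold search and the iterative adjusting pipeline of OLMP are not shown, and all names here (`prune_layer`, `prune_model`, `compression_ratio`) are hypothetical.

```python
def prune_layer(weights, threshold):
    """Zero every weight whose magnitude is below the layer's threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def prune_model(layers, thresholds):
    """Apply a separate magnitude threshold to each named layer."""
    return {
        name: prune_layer(weights, thresholds[name])
        for name, weights in layers.items()
    }


def compression_ratio(layers, pruned):
    """Original parameter count divided by the number of surviving weights."""
    total = sum(len(weights) for weights in layers.values())
    kept = sum(1 for weights in pruned.values() for w in weights if w != 0.0)
    return total / kept if kept else float("inf")
```

In the setting the abstract describes, an optimizer would search over the `thresholds` mapping to maximize a quantity like `compression_ratio` while keeping the accuracy loss of the pruned model within a bound.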

• #2419
Three-Head Neural Network Architecture for Monte Carlo Tree Search
Chao Gao, Martin Müller, Ryan Hayward
Heuristic Search and Learning

AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output is used for prior action probability and the state-value estimate is used for leaf node evaluation. We propose a three-head neural net architecture with policy, state- and action-value outputs, which could lead to more efficient MCTS since the neural leaf estimate can still be back-propagated in the tree with delayed node expansion and evaluation. To effectively train the newly introduced action-value head on the same game dataset as for two-head nets, we exploit the optimal relations between parent and children nodes for data augmentation and regularization. In our experiments for the game of Hex, the action-value head learning achieves similar error to the state-value prediction of a two-head architecture. The resulting neural net models are then combined with the same Policy Value MCTS (PV-MCTS) implementation. We show that, due to more efficient use of neural net evaluations, PV-MCTS with three-head neural nets consistently performs better than the two-head ones, significantly outplaying the state-of-the-art player MoHex-CNN.

• #3371
Master-Slave Curriculum Design for Reinforcement Learning
Yuechen Wu, Wei Zhang, Ke Song
Heuristic Search and Learning

Curriculum learning is often introduced as leverage to improve agent training for complex tasks, where the goal is to generate a sequence of easier subtasks for an agent to train on, such that final performance or learning speed is improved. However, conventional curricula are mainly designed for one agent with a fixed action space and a sequential simple-to-hard training manner. Instead, we present a novel curriculum learning strategy by introducing the concept of master-slave agents and enabling flexible action settings for agent training. Multiple agents, referred to as the master agent for the target task and slave agents for the subtasks, are trained concurrently within different action spaces by sharing a perception network with an asynchronous strategy. Extensive evaluation on the VizDoom platform demonstrates that the joint learning of the master agent and slave agents is mutually beneficial. Significant improvement is obtained over A3C in terms of learning speed and performance.

• #1958
Approximation Guarantees of Stochastic Greedy Algorithms for Subset Selection
Chao Qian, Yang Yu, Ke Tang
Heuristic Search and Learning

Subset selection is a fundamental problem in many areas, which aims to select the best subset of size at most $k$ from a universe. Greedy algorithms are widely used for subset selection, and have shown good approximation performances in deterministic situations. However, their behaviors are stochastic in many realistic situations (e.g., large-scale and noisy). For general stochastic greedy algorithms, bounded approximation guarantees were obtained only for subset selection with monotone submodular objective functions, while real-world applications often involve non-monotone or non-submodular objective functions and can be subject to a more general constraint than a size constraint. This work proves their approximation guarantees in these cases, and thus largely extends the applicability of stochastic greedy algorithms.
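For orientation, the deterministic greedy baseline for size-constrained subset selection can be sketched as below. It is the textbook procedure, not one of the stochastic variants analysed in the paper, and the coverage objective and all names (`greedy_select`, `coverage`, `SETS`) are illustrative assumptions.

```python
def greedy_select(universe, k, objective):
    """Pick up to k items, each time adding the item with the best objective.

    Stops early if no remaining item improves the objective, so the result
    has size at most k. Ties are broken by order in `universe`.
    """
    selected = []
    remaining = list(universe)
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda item: objective(selected + [item]))
        if objective(selected + [best]) <= objective(selected):
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Illustrative monotone submodular objective: number of elements covered.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(chosen):
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return len(covered)


picked = greedy_select(list(SETS), 2, coverage)
```

A stochastic greedy algorithm differs in that the item added in each round is only approximately or probabilistically the best one, for example because the objective is evaluated with noise or on a random sample of candidates.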

### Tuesday 17 14:55 - 16:10 ML-ROL - Reinforcement Learning and Online Learning (K11)

Chair: Fei Fang
• #3769
Exploration by Distributional Reinforcement Learning
Yunhao Tang, Shipra Agrawal
Reinforcement Learning and Online Learning

We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning. We show that our proposed framework conceptually unifies multiple previous methods in exploration. We also derive a practical algorithm that achieves efficient exploration on challenging control tasks.

• #734
Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces
Haifang Li, Yingce Xia, Wensheng Zhang
Reinforcement Learning and Online Learning

Policy evaluation with linear function approximation is an important problem in reinforcement learning. When facing high-dimensional feature spaces, such a problem becomes extremely hard in terms of computational efficiency and quality of approximation. We propose a new algorithm, LSTD(lambda)-RP, which leverages random projection techniques and takes eligibility traces into consideration to tackle the above two challenges. We carry out theoretical analysis of LSTD(lambda)-RP, and provide meaningful upper bounds of the estimation error, approximation error and total generalization error. These results demonstrate that LSTD(lambda)-RP can benefit from random projection and eligibility traces strategies, and LSTD(lambda)-RP can achieve better performance than prior LSTD-RP and LSTD(lambda) algorithms.

• #2402
Multi-modality Sensor Data Classification with Selective Attention
Xiang Zhang, Lina Yao, Chaoran Huang, Sen Wang, Mingkui Tan, Guodong Long, Can Wang
Reinforcement Learning and Online Learning

Multimodal wearable sensor data classification plays an important role in ubiquitous computing and has a wide range of applications in various scenarios from healthcare to entertainment. However, most of the existing work in this field employs domain-specific approaches and is thus ineffective in complex situations where multi-modality sensor data is collected. Moreover, the wearable sensor data is less informative than the conventional data such as texts or images. In this paper, to improve the adaptability of such classification methods across different application contexts, we turn this classification task into a game and apply a deep reinforcement learning scheme to dynamically deal with complex situations. We also introduce a selective attention mechanism into the reinforcement learning scheme to focus on the crucial dimensions of the data. This mechanism helps to capture extra information from the signal, and can thus significantly improve the discriminative power of the classifier. We carry out several experiments on three wearable sensor datasets, and demonstrate competitive performance of the proposed approach compared to several state-of-the-art baselines.

• #4471
Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, Ming Zhou
Reinforcement Learning and Online Learning

• #779
Preventing Disparate Treatment in Sequential Decision Making
Hoda Heidari, Andreas Krause
Reinforcement Learning and Online Learning

We study fairness in sequential decision making environments, where at each time step a learning algorithm receives data corresponding to a new individual (e.g. a new job application) and must make an irrevocable decision about him/her (e.g. whether to hire the applicant) based on observations made so far. In order to prevent cases of disparate treatment, our time-dependent notion of fairness requires algorithmic decisions to be consistent: if two individuals are similar in the feature space and arrive during the same time epoch, the algorithm must assign them to similar outcomes. We propose a general framework for post-processing predictions made by a black-box learning model, that guarantees the resulting sequence of outcomes is consistent. We show theoretically that imposing consistency will not significantly slow down learning. Our experiments on two real-world data sets illustrate and confirm this finding in practice.

• #1965
Ruida Zhou, Chao Gan, Jing Yang, Cong Shen
Reinforcement Learning and Online Learning

In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially, until a certain stopping condition is satisfied. Our objective is then to maximize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost incurred in examining the items, by deciding the ordered list of items, as well as when to stop examination. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the offline setting, we show that the Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal. For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm, and show that the cumulative regret scales in $O(\log T)$. We also provide a lower bound for all $\alpha$-consistent policies, which scales in $\Omega(\log T)$ and matches our upper bound. The performance of the CC-UCB algorithm is evaluated with both synthetic and real-world data.

### Tuesday 17 14:55 - 16:10 HAI-PUM - Personalization, User Modeling (C2)

Chair: Grzegorz J. Nalepa
• #3485
Algorithms for Fair Load Shedding in Developing Countries
Olabambo I. Oluwasuji, Obaid Malik, Jie Zhang, Sarvapali D. Ramchurn
Personalization, User Modeling

Due to the limited generation capacity of power stations, many developing countries frequently resort to disconnecting large parts of the power grid from supply, a process termed load shedding. During load shedding, many homes are left without electricity, causing them inconvenience and discomfort. In this paper, we present a number of optimization heuristics that focus on pairwise and groupwise fairness, such that households (i.e. agents) are fairly allocated electricity. We evaluate the heuristics against standard fairness metrics in terms of comfort delivered to homes, as well as the number of times they are disconnected from electricity supply. Thus, we establish new benchmarks for fair load shedding schemes.

• #607
Learning Sequential Correlation for User Generated Textual Content Popularity Prediction
Wen Wang, Wei Zhang, Jun Wang, Junchi Yan, Hongyuan Zha
Personalization, User Modeling

Popularity prediction of user generated textual content is critical for prioritizing information on the web, which alleviates heavy information overload for ordinary readers. Most previous studies model each content instance separately for prediction and thus overlook the sequential correlations between instances of a specific user. In this paper, we go deeper into this problem based on two observations for each user, i.e., sequential content correlation and sequential popularity correlation. We propose a novel deep sequential model called User Memory-augmented recurrent Attention Network (UMAN). This model encodes the two correlations by updating external user memories, which are further leveraged for target text representation learning and popularity prediction. The experimental results on several real-world datasets validate the benefits of considering these correlations and demonstrate that UMAN achieves the best performance among several strong competitors.

• #2399
Personality-Aware Personalized Emotion Recognition from Physiological Signals
Sicheng Zhao, Guiguang Ding, Jungong Han, Yue Gao
Personalization, User Modeling

Emotion recognition methodologies from physiological signals are increasingly becoming personalized, due to the subjective responses of different subjects to physical stimuli. Existing works mainly focused on modelling the involved physiological corpus of each subject, without considering the psychological factors. The latent correlation among different subjects has also been rarely examined. We propose to investigate the influence of personality on emotional behavior in a hypergraph learning framework. Assuming that each vertex is a compound tuple (subject, stimuli), multi-modal hypergraphs can be constructed based on the personality correlation among different subjects and on the physiological correlation among corresponding stimuli. To reveal the different importance of vertices, hyperedges, and modalities, we assign weights to each of them. The emotion relevance learned on the vertex-weighted multi-modal multi-task hypergraphs is employed for emotion recognition. We carry out extensive experiments on the ASCERTAIN dataset and the results demonstrate the superiority of the proposed method.

• #4075
Cross-Domain Depression Detection via Harvesting Social Media
Tiancheng Shen, Jia Jia, Guangyao Shen, Fuli Feng, Xiangnan He, Huanbo Luan, Jie Tang, Thanassis Tiropanis, Tat-Seng Chua, Wendy Hall
Personalization, User Modeling

Depression detection is a significant issue for human well-being. In previous studies, online detection has proven effective in Twitter, enabling proactive care for depressed users. Owing to cultural differences, replicating the method to other social media platforms, such as Chinese Weibo, however, might lead to poor performance because of insufficient available labeled (self-reported depression) data for model training. In this paper, we study an interesting but challenging problem of enhancing detection in a certain target domain (e.g. Weibo) with ample Twitter data as the source domain. We first systematically analyze the depression-related feature patterns across domains and summarize two major detection challenges, namely isomerism and divergency. We further propose a cross-domain Deep Neural Network model with Feature Adaptive Transformation & Combination strategy (DNN-FATC) that transfers the relevant information across heterogeneous domains. Experiments demonstrate improved performance compared to existing heterogeneous transfer methods or training directly in the target domain (over 3.4% improvement in F1), indicating the potential of our model to enable depression detection via social media for more countries with different cultural settings.

• #3052
Neural Framework for Joint Evolution Modeling of User Feedback and Social Links in Dynamic Social Networks
Peizhi Wu, Yi Tu, Xiaojie Yuan, Adam Jatowt, Zhenglu Yang
Personalization, User Modeling

Modeling the evolution of user feedback and social links in dynamic social networks is of considerable significance, because it is the basis of many applications, including recommendation systems and user behavior analyses. Most of the existing methods in this area model user behaviors separately and consider only certain aspects of this problem, such as dynamic preferences of users, dynamic attributes of items, evolutions of social networks, and their partial integration. This work proposes a comprehensive general neural framework with several optimal strategies to jointly model the evolution of user feedback and social links. The framework considers the dynamic user preferences, dynamic item attributes, and time-dependent social links in time evolving social networks. Experimental results conducted on two real-world datasets demonstrate that our proposed model performs remarkably better than state-of-the-art methods.

• #2280
LSTM Networks for Online Cross-Network Recommendations
Dilruk Perera, Roger Zimmermann
Personalization, User Modeling

Cross-network recommender systems use auxiliary information from multiple source networks to create holistic user profiles and improve recommendations in a target network. However, we find two major limitations in existing cross-network solutions that reduce overall recommender performance. Existing models (1) fail to capture complex non-linear relationships in user interactions, and (2) are designed for offline settings hence, not updated online with incoming interactions to capture the dynamics in the recommender environment. We propose a novel multi-layered Long Short-Term Memory (LSTM) network based online solution to mitigate these issues. The proposed model contains three main extensions to the standard LSTM: First, an attention gated mechanism to capture long-term user preference changes. Second, a higher order interaction layer to alleviate data sparsity. Third, time aware LSTM cell gates to capture irregular time intervals between user interactions. We illustrate our solution using auxiliary information from Twitter and Google Plus to improve recommendations on YouTube. Extensive experiments show that the proposed model consistently outperforms state-of-the-art in terms of accuracy, diversity and novelty.

### Tuesday 17 14:55 - 16:10 ML-DL - Deep Learning (C3)

Chair: Matthias Schubert
• #4089
Network Approximation using Tensor Sketching
Shiva Prasad Kasiviswanathan, Nina Narodytska, Hongxia Jin
Deep Learning

Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: Given a target network architecture, can we design a 'smaller' network architecture that 'approximates' the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever increasing storage and memory requirements of these networks pose a problem in resource constrained environments. In this work, we focus on deep convolutional neural network architectures, and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly, and has a classification accuracy that is comparable to the original network.

• #1075
Stochastic Fractional Hamiltonian Monte Carlo
Nanyang Ye, Zhanxing Zhu
Deep Learning

In this paper, we propose a novel stochastic fractional Hamiltonian Monte Carlo approach which generalizes the Hamiltonian Monte Carlo method within the framework of fractional calculus and Lévy diffusion. Due to the "large jumps" introduced by Lévy noise and momentum term, the proposed dynamics is capable of exploring the parameter space more efficiently and effectively. We have shown that the fractional Hamiltonian Monte Carlo could sample the multi-modal and high-dimensional target distribution more efficiently than the existing methods driven by Brownian diffusion. We further extend our method for optimizing deep neural networks. The experimental results show that the proposed stochastic fractional Hamiltonian Monte Carlo for training deep neural networks could converge faster than other popular optimization schemes and generalize better.

• #737
HST-LSTM: A Hierarchical Spatial-Temporal Long-Short Term Memory Network for Location Prediction
Dejiang Kong, Fei Wu
Deep Learning

The wide use of positioning technology has made mining the movements of people feasible and plenty of trajectory data have been accumulated. How to efficiently leverage these data for location prediction has become an increasingly popular research topic as it is fundamental to location-based services (LBS). The existing methods often focus either on long time (days or months) visit prediction (i.e., the recommendation of point of interest) or on real time location prediction (i.e., trajectory prediction). In this paper, we are interested in the location prediction problem in a weak real time condition and aim to predict users' movement in the next minutes or hours. We propose a Spatial-Temporal Long-Short Term Memory (ST-LSTM) model which naturally combines spatial-temporal influence into LSTM to mitigate the problem of data sparsity. Further, we employ a hierarchical extension of the proposed ST-LSTM (HST-LSTM) in an encoder-decoder manner which models the contextual historic visit information in order to boost the prediction performance. The proposed HST-LSTM is evaluated on a real world trajectory data set and the experimental results demonstrate the effectiveness of the proposed model.

• #3881
Spatio-Temporal Check-in Time Prediction with Recurrent Neural Network based Survival Analysis
Guolei Yang, Ying Cai, Chandan K Reddy
Deep Learning

We introduce a novel check-in time prediction problem. The goal is to predict the time a user will check-in to a given location. We formulate check-in prediction as a survival analysis problem and propose a Recurrent-Censored Regression (RCR) model. We address the key challenge of check-in data scarcity, which is due to the uneven distribution of check-ins among users/locations. Our idea is to enrich the check-in data with potential visitors, i.e., users who have not visited the location before but are likely to do so. RCR uses a recurrent neural network to learn latent representations from historical check-ins of both actual and potential visitors, which are then incorporated with censored regression to make predictions. Experiments show RCR outperforms state-of-the-art event time prediction techniques on real-world datasets.

• #4420
Learning to Recognize Transient Sound Events using Attentional Supervision
Szu-Yu Chou, Jyh-Shing Jang, Yi-Hsuan Yang
Deep Learning

Making sense of the surrounding context and ongoing events through not only the visual inputs but also acoustic cues is critical for various AI applications. This paper presents an attempt to learn a neural network model that recognizes more than 500 different sound events from the audio part of user generated videos (UGV). Aside from the large number of categories and the diverse recording conditions found in UGV, the task is challenging because a sound event may occur only for a short period of time in a video clip. Our model specifically tackles this issue by combining a main subnet that aggregates information from the entire clip to make clip-level predictions, and a supplementary subnet that examines each short segment of the clip for segment-level predictions. As the labeled data available for model training are typically on the clip level, the latter subnet learns to pay attention to segments selectively to facilitate attentional segment-level supervision. We call our model the M&mnet, for it leverages both “M”acro (clip-level) supervision and “m”icro (segment-level) supervision derived from the macro one. Our experiments show that M&mnet works remarkably well for recognizing sound events, establishing a new state-of-the-art for DCASE17 and AudioSet data sets. Qualitative analysis suggests that our model exhibits strong gains for short events. In addition, we show that the micro subnet is computationally light and we can use multiple micro subnets to better exploit information in different temporal scales.

• #1582
LC-RNN: A Deep Learning Model for Traffic Speed Prediction
Zhongjian Lv, Jiajie Xu, Kai Zheng, Hongzhi Yin, Pengpeng Zhao, Xiaofang Zhou
Deep Learning

Traffic speed prediction is known as an important but challenging problem. In this paper, we propose a novel model, called LC-RNN, to achieve more accurate traffic speed prediction than existing solutions. It takes advantage of both RNN and CNN models by a rational integration of them, so as to learn more meaningful time-series patterns that can adapt to the traffic dynamics of surrounding areas. Furthermore, since traffic evolution is restricted by the underlying road network, a network embedded convolution structure is proposed to capture topology aware features. The fusion with other information, including periodicity and context factors, is also considered to further improve accuracy. Extensive experiments on two real datasets demonstrate that our proposed LC-RNN outperforms six well-known existing methods.

### Tuesday 17 16:20 - 19:00 ANAC Competition (K13)

• ANAC Competition
ANAC Competition
### Tuesday 17 16:40 - 18:20 SPE-EC - Special Track: Evolution of the Contours of AI (VICTORIA)

Chair: Ronen Brafman
• #5202
Towards Consumer-Empowering Artificial Intelligence
Giuseppe Contissa, Francesca Lagioia, Marco Lippi, Hans-Wolfgang Micklitz, Przemyslaw Palka, Giovanni Sartor, Paolo Torroni
Special Track: Evolution of the Contours of AI

Artificial Intelligence and Law is undergoing a critical transformation. Traditionally focused on the development of expert systems and on a scholarly effort to develop theories and methods for knowledge representation and reasoning in the legal domain, this discipline is now adapting to a sudden change of scenery. No longer confined to the walls of academia, it has welcomed new actors, such as businesses and companies, who are willing to play a major role and seize new opportunities offered by the same transformational impact that recent AI breakthroughs are having on many other areas. As it happens, commercial interests create new opportunities but they also represent a potential threat to consumers, as the balance of power seems increasingly determined by the availability of data. We believe that while this transformation is still in progress, time is ripe for the next frontier of this field of study, where a new shift of balance may be enabled by tools and services that can be of service not only to businesses but also to consumers and, more generally, the civil society. We call that frontier consumer-empowering AI.

• #5203
Quantifying Algorithmic Improvements over Time
Lars Kotthoff, Alexandre Fréchette, Tomasz Michalak, Talal Rahwan, Holger H. Hoos, Kevin Leyton-Brown
Special Track: Evolution of the Contours of AI

Assessing the progress made in AI and contributions to the state of the art is of major concern to the community. Recently, Fréchette et al. [2016] advocated performing such analysis via the Shapley value, a concept from coalitional game theory. In this paper, we argue that while this general idea is sound, it unfairly penalizes older algorithms that advanced the state of the art when introduced, but were then outperformed by modern counterparts. Driven by this observation, we introduce the temporal Shapley value, a measure that addresses this problem while maintaining the desirable properties of the (classical) Shapley value. We use the temporal Shapley value to analyze the progress made in (i) the different versions of the Quicksort algorithm; (ii) the annual SAT competitions 2007–2014; (iii) an annual competition of Constraint Programming, namely the MiniZinc challenge 2014–2016. Our analysis reveals novel insights into the development made in these important areas of research over time.
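
For readers unfamiliar with the classical Shapley value that this abstract builds on, the following is a minimal sketch of it, not of the temporal variant (whose definition is not given here). The algorithm names, the `solved` instance sets, and the coverage-style `portfolio_value` function are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Classical Shapley value: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical data: which benchmark instances each algorithm solves.
solved = {"A": {1, 2}, "B": {2, 3}, "C": {3}}


def portfolio_value(coalition):
    """Value of a set of algorithms = number of instances solved by at least one."""
    covered = set()
    for p in coalition:
        covered |= solved[p]
    return len(covered)


phi = shapley_values(list(solved), portfolio_value)
```

In this toy setting each instance's unit of credit is split equally among the algorithms that solve it, so an algorithm that is the only one to solve an instance keeps that credit entirely. Enumerating all orderings is exponential in the number of players and is only practical for small examples.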

• #5204
The Facets of Artificial Intelligence: A Framework to Track the Evolution of AI
Fernando Martínez-Plumed, Bao Sheng Loe, Peter Flach, Seán Ó hÉigeartaigh, Karina Vold, José Hernández-Orallo
Special Track: Evolution of the Contours of AI

We present nine facets for the analysis of the past and future evolution of AI. Each facet has also a set of edges that can summarise different trends and contours in AI. With them, we first conduct a quantitative analysis using the information from two decades of AAAI/IJCAI conferences and around 50 years of documents from AI topics, an official database from the AAAI, illustrated by several plots. We then perform a qualitative analysis using the facets and edges, locating AI systems in the intelligence landscape and the discipline as a whole. This analytical framework provides a more structured and systematic way of looking at the shape and boundaries of AI.

• #5201
On a Scientific Discipline (Once) Named AI
Wolfgang Bibel
Special Track: Evolution of the Contours of AI

The paper envisions a scientific discipline of fundamental importance comparable to Physics or Biology, reminding that a discipline of such a contour was originally intended by the founders of Artificial Intelligence (AI). AI today, however, is far from such an encompassing discipline sharing the respective research interests with at least half a dozen of other disciplines. After the analysis of this situation and its background we discuss the consequences of this splintering by means of selected challenges. We deliberate thereby what could be done to alleviate the disadvantages resulting from the current state of affairs and to leverage AI's current prominence in the public attention to re-engage in the field's broader mission.

• #5206
Artificial Intelligence Conferences Closeness
Sébastien Konieczny, Emmanuel Lonca
Special Track: Evolution of the Contours of AI

We study the evolution of Artificial Intelligence conference closeness, using the coscinus tool. Coscinus computes the closeness between publication supports using the co-publication habits of authors: the more authors publish in two conferences, the closer these two conferences. In this paper we perform an analysis of the main Artificial Intelligence conferences based on principal components analysis and clustering performed on this closeness relation.

• #5205
Evolving AI from Research to Real Life – Some Challenges and Suggestions
Sandya Mannarswamy, Shourya Roy
Special Track: Evolution of the Contours of AI

Artificial Intelligence (AI) has come a long way from the stages of being just science fiction or an academic research curiosity to a point where it is poised to impact human life significantly. AI driven applications such as autonomous vehicles, medical diagnostics, conversational agents etc. are becoming a reality. In this position paper, we argue that there are certain challenges AI still needs to overcome in its evolution from Research to Real Life. We outline some of these challenges and our suggestions to address them. We provide pointers to similar issues and their resolutions in disciplines such as psychology and medicine from which the AI community can leverage the learning. More importantly, this paper is intended to focus the attention of the AI research community on translating AI research efforts into real world deployments.

### Tuesday 17 16:40 - 18:20 KR-MAS2 - Knowledge Representation and Agents: Verification, Model Checking (C7)

Chair: Dengji Zhao
• #2765
Model Checking Probabilistic Epistemic Logic for Probabilistic Multiagent Systems
Chen Fu, Andrea Turrini, Xiaowei Huang, Lei Song, Yuan Feng, Lijun Zhang
Knowledge Representation and Agents: Verification, Model Checking

In this work we study the model checking problem for probabilistic multiagent systems with respect to the probabilistic epistemic logic PETL, which can specify both temporal and epistemic properties. We show that under the realistic assumption of uniform schedulers, i.e., the choice of every agent depends only on its observation history, PETL model checking is undecidable. By restricting the class of schedulers to be memoryless schedulers, we show that the problem becomes decidable. More importantly, we design a novel algorithm which reduces the model checking problem into a mixed integer non-linear programming problem, which can then be solved by using an SMT solver. The algorithm has been implemented in an existing model checker and experiments are conducted on examples from the IPPC competitions.

• #3182
Alternating-time Temporal Logic on Finite Traces
Francesco Belardinelli, Alessio Lomuscio, Aniello Murano, Sasha Rubin
Knowledge Representation and Agents: Verification, Model Checking

We develop a logic-based technique to analyse finite interactions in multi-agent systems. We introduce a semantics for Alternating-time Temporal Logic (for both perfect and imperfect recall) and its branching-time fragments in which paths are finite instead of infinite. We study validities of these logics and present optimal algorithms for their model-checking problems in the perfect recall case.

• #3907
LTL Realizability via Safety and Reachability Games
Alberto Camacho, Christian Muise, Jorge A. Baier, Sheila A. McIlraith
Knowledge Representation and Agents: Verification, Model Checking

In this paper, we address the problem of LTL realizability and synthesis. State of the art techniques rely on so-called bounded synthesis methods, which reduce the problem to a safety game. Realizability is determined by solving synthesis in a dual game. We provide a unified view of duality, and introduce novel bounded realizability methods via reductions to reachability games. Further, we introduce algorithms, based on AI automated planning, to solve these safety and reachability games. This is the first complete approach to LTL realizability and synthesis via automated planning. Experiments illustrate that reductions to reachability games are an alternative to reductions to safety games, and show that planning can be a competitive approach to LTL realizability and synthesis.

• #2881
Symbolic Synthesis of Fault-Tolerance Ratios in Parameterised Multi-Agent Systems
Panagiotis Kouvaros, Alessio Lomuscio, Edoardo Pirovano
Knowledge Representation and Agents: Verification, Model Checking

We study the problem of determining the robustness of a multi-agent system of unbounded size against specifications expressed in a temporal-epistemic logic. We introduce a procedure to synthesise automatically the maximal ratio of faulty agents that may be present at runtime for a specification to be satisfied in a multi-agent system. We show the procedure to be sound and amenable to symbolic implementation. We present an implementation and report the experimental results obtained by running this on a number of protocols from swarm robotics.

• #2914
Synthesis of Controllable Nash Equilibria in Quantitative Objective Game
Shaull Almagor, Orna Kupferman, Giuseppe Perelli
Knowledge Representation and Agents: Verification, Model Checking

In Rational Synthesis, we consider a multi-agent system in which some of the agents are controllable and some are not. All agents have objectives, and the goal is to synthesize strategies for the controllable agents so that their objectives are satisfied, assuming rationality of the uncontrollable agents. Previous work on rational synthesis considers objectives in LTL, namely ones that describe on-going behaviors, and in Objective-LTL, which allows ranking of LTL formulas. In this paper, we extend rational synthesis to LTL[F] -- an extension of LTL by quality operators. The satisfaction value of an LTL[F] formula is a real value in [0,1], where the higher the value is, the higher is the quality in which the computation satisfies the specification. The extension significantly strengthens the framework of rational synthesis and enables a study of its game- and social-choice theoretic aspects. In particular, we study the price of stability and price of anarchy of the rational-synthesis game and use them to explain the cooperative and non-cooperative settings of rational synthesis. Our algorithms make use of strategy logic and decision procedures for it. Thus, we are able to handle the richer quantitative setting using existing tools. In particular, we show that the cooperative and non-cooperative versions of quantitative rational synthesis are 2EXPTIME-complete and in 3EXPTIME, respectively -- not harder than the complexity known for their Boolean analogues.

• #1925
Verifying Emergence of Bounded Time Properties in Probabilistic Swarm Systems
Alessio Lomuscio, Edoardo Pirovano
Knowledge Representation and Agents: Verification, Model Checking

We introduce a parameterised semantics for reasoning about swarms as unbounded collections of agents in a probabilistic setting. We develop a method for the formal identification of emergent properties, expressed in a fragment of the probabilistic logic PCTL. We introduce algorithms for solving the related decision problems and show their correctness. We present an implementation and evaluate its performance on an ant coverage algorithm.

• #3835
Reachability Analysis of Deep Neural Networks with Provable Guarantees
Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska
Knowledge Representation and Agents: Verification, Model Checking

Verifying correctness for deep neural networks (DNNs) is challenging. We study a generic reachability problem for feed-forward DNNs which, for a given set of inputs to the network and a Lipschitz-continuous function over its outputs, computes the lower and upper bounds on the function values. Because the network and the function are Lipschitz continuous, all values in the interval between the lower and upper bound are reachable. We show how to obtain the safety verification problem, the output range analysis problem and a robustness measure by instantiating the reachability problem. We present a novel algorithm based on adaptive nested optimisation to solve the reachability problem. The technique has been implemented and evaluated on a range of DNNs, demonstrating its efficiency, scalability and ability to handle a broader class of networks than state-of-the-art verification approaches.
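
The role of Lipschitz continuity in bounding function values can be illustrated with a much simpler scheme than the adaptive nested optimisation the abstract refers to. The sketch below assumes a one-dimensional input, a known Lipschitz constant, and a uniform grid; the function `lipschitz_range` and its parameters are hypothetical names, and `math.sin` merely stands in for a network output.

```python
import math


def lipschitz_range(f, lo, hi, lipschitz, n=1000):
    """Enclose the range of f on [lo, hi] from n + 1 evenly spaced samples.

    Every point of the interval lies within step / 2 of some sample, so f can
    differ from the nearest sampled value by at most lipschitz * step / 2.
    """
    step = (hi - lo) / n
    samples = [f(lo + i * step) for i in range(n + 1)]
    slack = lipschitz * step / 2
    return min(samples) - slack, max(samples) + slack


# sin has Lipschitz constant 1; its true range on [0, pi] is [0, 1].
lower, upper = lipschitz_range(math.sin, 0.0, math.pi, lipschitz=1.0)
```

The returned interval is guaranteed to contain the true minimum and maximum, and it tightens as `n` grows. A uniform grid scales poorly with input dimension, which is one reason adaptive methods are of interest.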

• #1183
Abstraction of Agents Executing Online and their Abilities in the Situation Calculus
Bita Banihashemi, Giuseppe De Giacomo, Yves Lespérance
Knowledge Representation and Agents: Verification, Model Checking

We develop a general framework for abstracting online behavior of an agent that may acquire new knowledge during execution (e.g., by sensing), in the situation calculus and ConGolog. We assume that we have both a high-level action theory and a low-level one that represent the agent's behavior at different levels of detail. In this setting, we define ability to perform a task/achieve a goal, and then show that under some reasonable assumptions, if the agent has a strategy by which she is able to achieve a goal at the high level, then we can refine it into a low-level strategy to do so.

### Tuesday 17 16:40 - 18:20 MAS-CCC - Cooperation, Coordination, Collaboration, Coalitions (C8)

Chair: Chen Hajaj
• #800
Fostering Cooperation in Structured Populations Through Local and Global Interference Strategies
The Anh Han, Simon Lynch, Long Tran-Thanh, Francisco C. Santos
Cooperation, Coordination, Collaboration, Coalitions

We study the situation of an exogenous decision-maker aiming to encourage a population of autonomous, self-regarding agents to follow a desired behaviour at a minimal cost. The primary goal is therefore to reach an efficient trade-off between pushing the agents to achieve the desired configuration while minimising the total investment. To this end, we test several interference paradigms resorting to simulations of agents facing a cooperative dilemma in a spatial arrangement. We systematically analyse and compare interference strategies rewarding local or global behavioural patterns. Our results show that taking into account the neighbourhood's local properties, such as its level of cooperativeness, can lead to a significant improvement regarding cost efficiency while guaranteeing high levels of cooperation. As such, we argue that local interference strategies are more efficient than global ones in fostering cooperation in a population of autonomous agents.

• #4411
Vocabulary Alignment for Collaborative Agents: a Study with Real-World Multilingual How-to Instructions
Paula Chocron, Paolo Pareti
Cooperation, Coordination, Collaboration, Coalitions

Collaboration between heterogeneous agents typically requires the ability to communicate meaningfully. This can be challenging in open environments where participants may use different languages. Previous work proposed a technique to infer alignments between different vocabularies that uses only information about the tasks being executed, without any external resource. Until now, this approach has only been evaluated with artificially created data. We adapt this technique to protocols written by humans in natural language, which we extract from instructional webpages. In doing so, we show how to take into account challenges that arise when working with natural language labels. The quality of the alignments obtained with our technique is evaluated in terms of their effectiveness in enabling successful collaborations, using a translation dictionary as a baseline. We show how our technique outperforms the dictionary when used to interact.

• #3123
Robust Norm Emergence by Revealing and Reasoning about Context: Socially Intelligent Agents for Enhancing Privacy
Nirav Ajmeri, Hui Guo, Pradeep K. Murukannaiah, Munindar P. Singh
Cooperation, Coordination, Collaboration, Coalitions

Norms describe the social architecture of a society and govern the interactions of its member agents. It may be appropriate for an agent to deviate from a norm; the deviation being indicative of a specialized norm applying under a specific context. Existing approaches for norm emergence assume simplified interactions wherein deviations are negatively sanctioned. We investigate via simulation the benefits of enriched interactions where deviating agents share selected elements of their contexts. We find that as a result (1) the norms are learned better with fewer sanctions, indicating improved social cohesion; and (2) the agents are better able to satisfy their individual goals. These results are robust under societies of varying sizes and characteristics reflecting pragmatic, considerate, and selfish agents.

• #5466
(Journal track) Incentive-Compatible Mechanisms for Norm Monitoring in Open Multi-Agent Systems
Natasha Alechina, Joseph Y. Halpern, Ian A. Kash, Brian Logan
Cooperation, Coordination, Collaboration, Coalitions

We consider the problem of detecting norm violations in open multi-agent systems (MAS). In this extended abstract, we outline the approach of [Alechina et al., 2018], and show how, using ideas from scrip systems, we can design mechanisms where the agents comprising the MAS are incentivised to monitor the actions of other agents for norm violations.

• #664
Explaining Multi-Criteria Decision Aiding Models with an Extended Shapley Value
Christophe Labreuche, Simon Fossier
Cooperation, Coordination, Collaboration, Coalitions

The capability to explain the result of aggregation models to decision makers is key to reinforcing user trust. In practice, Multi-Criteria Decision Aiding models are often organized in a hierarchical way, based on a tree of criteria. We present an explanation approach usable with any hierarchical multi-criteria model, based on an influence index of each attribute on the decision. A set of desirable axioms are defined. We show that there is a unique index fulfilling these axioms. This new index is an extension of the Shapley value on trees. An efficient rewriting of this index, drastically reducing the computation time, is obtained. Finally, the use of the new index is illustrated on an example.

• #5459
(Journal track) A COP Model for Graph-Constrained Coalition Formation
Filippo Bistaffa, Alessandro Farinelli
Cooperation, Coordination, Collaboration, Coalitions

We focus on Graph-Constrained Coalition Formation (GCCF), a widely studied subproblem of coalition formation where the set of valid coalitions is constrained by a graph. We propose COP-GCCF, a novel approach that models GCCF as a COP. We then solve such COP with a highly-parallel GPU implementation of Bucket Elimination, which is able to exploit the high constraint tightness of COP-GCCF. Results on realistic graphs, i.e., a crawl of the Twitter social graph, show that our approach outperforms state of the art algorithms (i.e., DyCE and IDP^G) by at least one order of magnitude, both in terms of runtime and memory.

• #5478
(Journal track) Constrained Coalition Formation on Valuation Structures: Formal Framework, Applications, and Islands of Tractability
Gianluigi Greco, Antonella Guzzo
Cooperation, Coordination, Collaboration, Coalitions

Coalition structure generation is considered in a setting where feasible coalition structures must satisfy constraints of two different kinds modeled in terms of a valuation structure, which consists of a set of pivotal agents that are pairwise incompatible, plus an interaction graph prescribing that a coalition C can form only if the subgraph induced over the nodes/agents in C is connected. It is shown that valuation structures can be used to model a number of relevant problems in real-world applications. Moreover, complexity issues arising with them are studied, by focusing in particular on identifying islands of tractability based on topological properties of the underlying interaction graph. Stability issues on valuation structures are studied too.
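
The two kinds of constraints described in this abstract can be sketched as a feasibility check. This is an illustration under stated assumptions, not the authors' formalism: "pairwise incompatible" is read here as "no coalition may contain two pivotal agents", the graph is given as an adjacency dictionary, and the names `is_connected`, `is_feasible`, `edges`, and `pivotal` are hypothetical.

```python
from collections import deque


def is_connected(coalition, edges):
    """Breadth-first search over the subgraph induced by `coalition`."""
    members = set(coalition)
    if not members:
        return False
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, ()):
            if neighbour in members and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == members


def is_feasible(structure, edges, pivotal):
    """Each coalition must be connected and hold at most one pivotal agent."""
    return all(
        is_connected(c, edges) and len(set(c) & pivotal) <= 1
        for c in structure
    )


# Hypothetical interaction graph: a path a - b - c - d, with a and d pivotal.
edges = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
pivotal = {"a", "d"}

ok = is_feasible([{"a", "b"}, {"c", "d"}], edges, pivotal)
two_pivotal = is_feasible([{"a", "b", "c", "d"}], edges, pivotal)
disconnected = is_feasible([{"a", "c"}, {"b", "d"}], edges, pivotal)
```

The check does not verify that the coalitions form a partition of the agents; a full implementation would add that test.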

### Tuesday 17, 16:40 - 18:20, SIS-PS - Sister Conferences Best Papers: Planning, Reinforcement Learning (K2)

Chair: Abdallah Saffidine
• #5110
Operator Counting Heuristics for Probabilistic Planning
Felipe Trevizan, Sylvie Thiébaux, Patrik Haslum
Sister Conferences Best Papers: Planning, Reinforcement Learning

For the past 25 years, heuristic search has been used to solve domain-independent probabilistic planning problems, but with heuristics that determinise the problem and ignore precious probabilistic information. In this paper, we present a generalization of the operator-counting family of heuristics to Stochastic Shortest Path problems (SSPs) that is able to represent the probability of the actions' outcomes. Our experiments show that the equivalent of the net change heuristic in this generalized framework obtains significant run time and coverage improvements over other state-of-the-art heuristics in different planners.

• #5111
Cost-Based Goal Recognition for the Path-Planning Domain
Peta Masters, Sebastian Sardina
Sister Conferences Best Papers: Planning, Reinforcement Learning

"Plan recognition as planning" uses an off-the-shelf planner to perform goal recognition. In this paper, we apply the technique to path-planning. We show that a simpler formula provides an identical result in all but one set of conditions and, further, that identical ranking of goals by probability can be achieved without using any observations other than the agent's start location and where she is "now".

• #5126
Inductive Certificates of Unsolvability for Domain-Independent Planning
Salomé Eriksson, Gabriele Röger, Malte Helmert
Sister Conferences Best Papers: Planning, Reinforcement Learning

If a planning system outputs a solution for a given problem, it is simple to verify that the solution is valid. However, if a planner claims that a task is unsolvable, we currently have no choice but to trust the planner blindly. We propose a sound and complete class of certificates of unsolvability which can be verified efficiently by an independent program. To highlight their practical use, we show how these certificates can be generated for a wide range of state-of-the-art planning techniques with only polynomial overhead for the planner.

• #5112
An Empirical Study of Branching Heuristics through the Lens of Global Learning Rate
Jia Liang, Hari Govind, Pascal Poupart, Krzysztof Czarnecki, Vijay Ganesh
Sister Conferences Best Papers: Planning, Reinforcement Learning

In this paper, we analyze a suite of 7 well-known branching heuristics proposed by the SAT community and show that the better heuristics tend to generate more learnt clauses per decision, a metric we define as the global learning rate (GLR). We propose GLR as a metric for the branching heuristic to optimize. We test our hypothesis by developing a new branching heuristic that maximizes GLR greedily. We show empirically that this heuristic achieves very high GLR and, interestingly, very low literal block distance (LBD) over the learnt clauses. In our experiments this greedy branching heuristic enables the solver to solve instances faster than VSIDS, when the branching time is taken out of the equation. This experiment is a good proof of concept that a branching heuristic maximizing GLR will lead to good solver performance modulo the computational overhead. Finally, we propose a new branching heuristic, called SGDB, that uses machine learning to cheaply approximate greedy maximization of GLR. We show experimentally that SGDB performs on par with the VSIDS branching heuristic.

• #5116
Search Progress and Potentially Expanded States in Greedy Best-First Search
Manuel Heusner, Thomas Keller, Malte Helmert
Sister Conferences Best Papers: Planning, Reinforcement Learning

A classical result in optimal search shows that A* with an admissible and consistent heuristic expands every state whose f-value is below the optimal solution cost and no state whose f-value is above the optimal solution cost. For satisficing search algorithms, a similarly clear understanding is currently lacking. We examine the search behavior of greedy best-first search (GBFS) in order to make progress towards such an understanding. We introduce the concept of high-water mark benches, which separate the search space into areas that are searched by a GBFS algorithm in sequence. High-water mark benches allow us to exactly determine the set of states that are expanded by at least one GBFS tie-breaking strategy and give us a clearer understanding of search progress.
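
For readers unfamiliar with the algorithm under study, a minimal Python sketch of greedy best-first search follows. It only illustrates plain GBFS with one particular tie-breaking rule (first-in, first-out among equal heuristic values); it does not compute the high-water mark benches introduced in the paper. The function name and the `neighbors` and `h` callables are assumptions chosen for this sketch.

```python
import heapq


def greedy_best_first_search(start, goal, neighbors, h):
    """Always expand the open state with the lowest heuristic value h.

    Returns (path, expanded), where `expanded` lists states in the order
    they were removed from the open list. Ties on h are broken by
    insertion order, which is just one of many possible strategies.
    """
    counter = 0  # insertion index, used only for tie-breaking
    open_list = [(h(start), counter, start)]
    parent = {start: None}
    expanded = []
    while open_list:
        _, _, state = heapq.heappop(open_list)
        expanded.append(state)
        if state == goal:
            # Reconstruct the path by following parent pointers back to start.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1], expanded
        for successor in neighbors(state):
            if successor not in parent:  # generate each state at most once
                parent[successor] = state
                counter += 1
                heapq.heappush(open_list, (h(successor), counter, successor))
    return None, expanded
```

Changing the tie-breaking key changes which states end up in `expanded`, which is the kind of variation the abstract characterizes.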

• #5142
Multi-Robot Motion Planning with Dynamics Guided by Multi-Agent Search
Duong Le, Erion Plaku
Sister Conferences Best Papers: Planning, Reinforcement Learning

This paper presents an effective multi-robot motion planner that enables each robot to reach its desired location while avoiding collisions with the other robots and the obstacles. The approach takes into account the differential constraints imposed by the underlying dynamics of each robot and generates dynamically-feasible motions that can be executed in the physical world. The crux of the approach is the sampling-based expansion of a motion tree in the continuous state space of all the robots guided by multi-agent search over a discrete abstraction. Experiments using vehicle models with nonlinear dynamics operating in complex environments show significant speedups over related work.

### Tuesday 17, 16:40 - 18:20, NLP-CV2 - Language and Vision: Image Captioning, Visual Question Answering (T2)

Chair: Zhou Cheng
• #509
Image Captioning with Visual-Semantic LSTM
Nannan Li, Zhenzhong Chen
Language and Vision: Image Captioning, Visual Question Answering

In this paper, a novel image captioning approach is proposed to describe the content of images. Inspired by the visual processing of our cognitive system, we propose a visual-semantic LSTM model to locate the attention objects with their low-level features in the visual cell, and then successively extract high-level semantic features in the semantic cell. In addition, a state perturbation term is introduced to the word sampling strategy in the REINFORCE based method to explore proper vocabularies in the training process. Experimental results on MS COCO and Flickr30K validate the effectiveness of our approach when compared to the state-of-the-art methods.

• #182
Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning
Yuzhao Mao, Chang Zhou, Xiaojie Wang, Ruifan Li
Language and Vision: Image Captioning, Visual Question Answering

Image captioning aims to generate textual descriptions for images. Most previous work generates a single-sentence description for each image. However, a picture is worth a thousand words. A single sentence can hardly give a complete view of an image, even when written by humans. In this paper, we propose a novel Topic-Oriented Multi-Sentence (TOMS) captioning model, which can generate multiple topic-oriented sentences to describe an image. Different from object instances or attributes, topics mined by the latent Dirichlet allocation reflect hidden thematic structures in reference sentences of an image. In our model, each topic is integrated into a caption generator with a Fusion Gate Unit (FGU) to guide the generation of a sentence towards a certain topic perspective. With multiple sentences from different topics, our TOMS provides a complete description of an image. Experimental results on both sentence and paragraph datasets demonstrate the effectiveness of our TOMS in terms of topical consistency and descriptive completeness.

• #3045
Multi-Level Policy and Reward Reinforcement Learning for Image Captioning
Anan Liu, Ning Xu, Hanwang Zhang, Weizhi Nie, Yuting Su, Yongdong Zhang
Language and Vision: Image Captioning, Visual Question Answering

Image captioning is one of the most challenging hallmarks of AI, due to its complexity in visual and natural language understanding. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and reward function that do not fit well the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) Multi-Level Policy Network that can adaptively fuse the word-level policy and the sentence-level policy for the word generation; and 2) Multi-Level Reward Function that collaboratively leverages both vision-language reward and language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework can achieve competitive performance with respect to different evaluation metrics.

• #2374
A Multi-task Learning Approach for Image Captioning
Wei Zhao, Benyou Wang, Jianbo Ye, Min Yang, Zhou Zhao, Ruotian Luo, Yu Qiao
Language and Vision: Image Captioning, Visual Question Answering

In this paper, we propose a Multi-task Learning Approach for Image Captioning (MLAIC), motivated by the fact that humans have no difficulty performing such a task because they possess capabilities of multiple domains. Specifically, MLAIC consists of three key components: (i) A multi-object classification model that learns rich category-aware image representations using a CNN image encoder; (ii) A syntax generation model that learns a better syntax-aware LSTM-based decoder; (iii) An image captioning model that generates image descriptions in text, sharing its CNN encoder and LSTM decoder with the object classification task and the syntax generation task, respectively. In particular, the image captioning model can benefit from the additional object categorization and syntax knowledge. To verify the effectiveness of our approach, we conduct extensive experiments on the MS-COCO dataset. The experimental results demonstrate that our model achieves impressive results compared to other strong competitors.

• #4561
Feature Enhancement in Attention for Visual Question Answering
Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang
Language and Vision: Image Captioning, Visual Question Answering

The attention mechanism has been an indispensable part of Visual Question Answering (VQA) models, due to the importance of its selective ability on image regions and/or question words. However, the attention mechanism in almost all VQA models takes as input the image visual and question textual features, which stem from different sources and between which there exists an essential semantic gap. In order to further improve the accuracy of correlation between region and question in attention, we focus on region representation and propose the idea of feature enhancement, which includes three aspects. (1) We propose to leverage region semantic representation, which is more consistent with the question representation. (2) We enrich the region representation using features from multiple hierarchies, and (3) we refine the semantic representation for richer information. With these three incremental feature enhancement mechanisms, we improve the region representation and achieve better attentive effect and VQA performance. We conduct extensive experiments on the largest VQA v2.0 benchmark dataset and achieve competitive results without additional training data, and prove the effectiveness of our proposed feature-enhanced attention by visual demonstrations.

• #2651
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen
Language and Vision: Image Captioning, Visual Question Answering

Recently, attention-based Visual Question Answering (VQA) has achieved great success by utilizing the question to selectively target different visual areas that are related to the answer. Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight. This conflicts with the attention mechanism because CNN features are naturally spatial and channel-wise. Also, visual attention is usually computed at the pixel level, which may cause region discontinuity problems. In this paper we propose a Cubic Visual Attention (CVA) model by successfully applying a novel channel and spatial attention on object regions to improve the VQA task. Specifically, instead of attending to pixels, we first take advantage of the object proposal networks to generate a set of object candidates and extract their associated conv features. Then, we utilize the question to guide channel attention and spatial attention calculation based on the conv-layer feature map. Finally, the attended visual features and the question are combined to infer the answer. We assess the performance of our proposed CVA on three public image QA datasets, including COCO-QA, VQA and Visual7W. Experimental results show that our proposed method significantly outperforms the state of the art.

• #1401
Show, Observe and Tell: Attribute-driven Attention Model for Image Captioning
Hui Chen, Guiguang Ding, Zijia Lin, Sicheng Zhao, Jungong Han
Language and Vision: Image Captioning, Visual Question Answering

Despite the fact that attribute-based approaches and attention-based approaches have been proven to be effective in image captioning, most attribute-based approaches simply predict attributes independently without taking the co-occurrence dependencies among attributes into account. Besides, most attention-based captioning models directly leverage the feature map extracted from CNN, in which many features may be redundant in relation to the image content. In this paper, we focus on training a good attribute-inference model via the recurrent neural network (RNN) for image captioning, where the co-occurrence dependencies among attributes can be maintained. The uniqueness of our inference model lies in the usage of an RNN with the visual attention mechanism to observe the image before generating captions. Additionally, it is noticed that compact and attribute-driven features will be more useful for the attention-based captioning model. To this end, we extract the context feature for each attribute, and guide the captioning model to adaptively attend to these context features. We verify the effectiveness and superiority of the proposed approach over the other captioning approaches by conducting extensive experiments and comparisons on the MS COCO image captioning dataset.

• #823
A Question Type Driven Framework to Diversify Visual Question Generation
Zhihao Fan, Zhongyu Wei, Piji Li, Yanyan Lan, Xuanjing Huang
Language and Vision: Image Captioning, Visual Question Answering

Visual question generation aims at asking questions about an image automatically. Existing research works on this topic usually generate a single question for each given image without considering the issue of diversity. In this paper, we propose a question type driven framework to produce multiple questions for a given image with different focuses. In our framework, each question is constructed following the guidance of a sampled question type in a sequence-to-sequence fashion. To diversify the generated questions, a novel conditional variational auto-encoder is introduced to generate multiple questions with a specific question type. Moreover, we design a strategy to conduct the question type distribution learning for each image to select the final questions. Experimental results on three benchmark datasets show that our framework outperforms the state-of-the-art approaches in terms of both relevance and diversity.

### Tuesday 17, 16:40 - 18:20, CV-BFG - Biometrics, Face and Gesture Recognition (T1)

Chair: Mayank Vatsa
• #997
Deep Attribute Guided Representation for Heterogeneous Face Recognition
Decheng Liu, Nannan Wang, Chunlei Peng, Jie Li, Xinbo Gao
Biometrics, Face and Gesture Recognition

Heterogeneous face recognition (HFR) is a challenging problem in face recognition, subject to large texture and spatial structure differences of face images. Different from conventional face recognition in homogeneous environments, there exist many face images taken from different sources (including different sensors or different mechanisms) in reality. Motivated by human cognitive mechanism, we naturally utilize the explicit invariant semantic information (face attributes) to help address the gap of different modalities. Existing related face recognition methods mostly regard attributes as the high level feature integrated with other engineering features enhancing recognition performance, ignoring the inherent relationship between face attributes and identities. In this paper, we propose a novel deep attribute guided representation based heterogeneous face recognition method (DAG-HFR) without labeling attributes manually. Deep convolutional networks are employed to directly map face images in heterogeneous scenarios to a compact common space where distances mean similarities of pairs. An attribute guided triplet loss (AGTL) is designed to train an end-to-end HFR network which could effectively eliminate defects of incorrectly detected attributes. Extensive experiments on multiple heterogeneous scenarios (composite sketches, resident ID cards) demonstrate that the proposed method achieves superior performances compared with state-of-the-art methods.

• #730
Harnessing Synthesized Abstraction Images to Improve Facial Attribute Recognition
Keke He, Yanwei Fu, Wuhao Zhang, Chengjie Wang, Yu-Gang Jiang, Feiyue Huang, Xiangyang Xue
Biometrics, Face and Gesture Recognition

Facial attribute recognition is an important and yet challenging research topic. Different from most previous approaches which predict attributes only based on the whole images, this paper leverages facial parts locations for better attribute prediction. A facial abstraction image which contains both local facial parts and facial texture information is introduced. This abstraction image is generated by a Generative Adversarial Network (GAN). Then we build a dual-path facial attribute recognition network to utilize features from the original face images and facial abstraction images. Empirically, the features of facial abstraction images are complementary to features of original face images. With the facial parts localized by the abstraction images, our method improves facial attributes recognition, especially the attributes located on small face regions. Extensive evaluations conducted on CelebA and LFWA benchmark datasets show that state-of-the-art performance is achieved.

• #358
Live Face Verification with Multiple Instantialized Local Homographic Parameterization
Chen Lin, Zhouyingcheng Liao, Peng Zhou, Jianguo Hu, Bingbing Ni
Biometrics, Face and Gesture Recognition

State-of-the-art live face verification methods can easily be attacked by a recorded facial expression sequence. This work directly addresses this issue by proposing a patch-wise motion parameterization based verification network infrastructure. This method directly explores the underlying subtle motion difference between the facial movements re-captured from a planar screen (e.g., a pad) and those from a real face; therefore interactive facial expression is no longer required. Furthermore, inspired by the fact that "a fake facial movement sequence MUST contain many patch-wise fake sequences", we embed our network into a multiple instance learning framework, which further enhances the recall rate of the proposed technique. Extensive experimental results on several face benchmarks demonstrate the superior performance of our method.

• #2339
Zhou Yin, Wei-Shi Zheng, Ancong Wu, Hong-Xing Yu, Hai Wan, Xiaowei Guo, Feiyue Huang, Jianhuang Lai
Biometrics, Face and Gesture Recognition

While attributes have been widely used for person re-identification (Re-ID), which aims at matching images of the same person across disjoint camera views, they are used either as extra features or for performing multi-task learning to assist the image-image matching task. However, how to find a set of person images according to a given attribute description, which is very practical in many surveillance applications, remains a rarely investigated cross-modality matching problem in person Re-ID. In this work, we present this challenge and leverage adversarial learning to formulate the attribute-image cross-modality person Re-ID model. By imposing a semantic consistency constraint across modalities as a regularization, the adversarial learning enables the generation of image-analogous concepts of query attributes for matching the corresponding images at both the global level and the semantic ID level. We conducted extensive experiments on three attribute datasets and demonstrated that the regularized adversarial modelling is so far the most effective method for the attribute-image cross-modality person Re-ID problem.

• #1600
Dual Conditional GANs for Face Aging and Rejuvenation
Jingkuan Song, Jingqiu Zhang, Lianli Gao, Xianglong Liu, Heng Tao Shen
Biometrics, Face and Gesture Recognition

Face aging and rejuvenation is to predict the face of a person at different ages. While tremendous progress has been made on this topic, two central problems remain largely unsolved: 1) the majority of prior works require sequential training data, which is very rare in real scenarios, and 2) how to simultaneously render an aging face and preserve personality. To tackle these issues, in this paper, we develop a novel dual conditional GAN (DCGAN) mechanism, which enables face aging and rejuvenation to be trained from multiple sets of unlabeled face images with different ages. In our architecture, the primal conditional GAN transforms a face image to other ages based on the age condition, while the dual conditional GAN learns to invert the task. Hence a loss function that accounts for the reconstruction error of images can preserve the personal identity, while the discriminators on the generated images learn the transition patterns (e.g., the shape and texture changes between age groups) and guide the generation of age-specific photo-realistic faces. Experimental results on two publicly available datasets demonstrate the appealing performance of the proposed framework in comparison with state-of-the-art methods.

• #159
DRPose3D: Depth Ranking in 3D Human Pose Estimation
Min Wang, Xipeng Chen, Wentao Liu, Chen Qian, Liang Lin, Lizhuang Ma
Biometrics, Face and Gesture Recognition

In this paper, we propose a two-stage depth ranking based method (DRPose3D) to tackle the problem of 3D human pose estimation. Instead of accurate 3D positions, the depth ranking can be identified intuitively by humans and learned more easily by a deep neural network by solving classification problems. Moreover, depth ranking contains rich 3D information. It prevents the 2D-to-3D pose regression in two-stage methods from being ill-posed. In our method, firstly, we design a Pairwise Ranking Convolutional Neural Network (PRCNN) to extract depth rankings of human joints from images. Secondly, a coarse-to-fine 3D Pose Network (DPNet) is proposed to estimate 3D poses from both depth rankings and 2D human joint locations. Additionally, to improve the generality of our model, we introduce a statistical method to augment depth rankings. Our approach outperforms the state-of-the-art methods in the Human3.6M benchmark for all three testing protocols, indicating that depth ranking is an essential geometric feature which can be learned to improve 3D pose estimation.

• #3601
Anonymizing k Facial Attributes via Adversarial Perturbations
Saheb Chhabra, Richa Singh, Mayank Vatsa, Gaurav Gupta
Biometrics, Face and Gesture Recognition

A face image not only provides details about the identity of a subject but also reveals several attributes such as gender, race, sexual orientation, and age. Advancements in machine learning algorithms and the popularity of sharing images on the World Wide Web, including social media websites, have increased the scope of data analytics and information profiling from photo collections. This poses a serious privacy threat for individuals who do not want to be profiled. This research presents a novel algorithm for anonymizing selective attributes which an individual does not want to share without affecting the visual quality of images. Using the proposed algorithm, a user can select single or multiple attributes to be suppressed while preserving identity information and visual content. The proposed adversarial perturbation based algorithm embeds imperceptible noise in an image such that the attribute prediction algorithm for the selected attribute yields an incorrect classification result, thereby preserving the information according to the user's choice. Experiments on three popular databases, i.e., MUCT, LFWcrop, and CelebA, show that the proposed algorithm not only anonymizes k attributes, but also preserves image quality and identity information.

• #37
3D-Aided Deep Pose-Invariant Face Recognition
Jian Zhao, Lin Xiong, Yu Cheng, Yi Cheng, Jianshu Li, Li Zhou, Yan Xu, Jayashree Karlekar, Sugiri Pranata, Shengmei Shen, Junliang Xing, Shuicheng Yan, Jiashi Feng
Biometrics, Face and Gesture Recognition

Learning from synthetic faces, though perhaps appealing for high data efficiency, may not bring satisfactory performance due to the distribution discrepancy between synthetic and real face images. To mitigate this gap, we propose a 3D-Aided Deep Pose-Invariant Face Recognition Model (3D-PIM), which automatically recovers realistic frontal faces from arbitrary poses through a 3D face model in a novel way. Specifically, 3D-PIM incorporates a simulator with the aid of a 3D Morphable Model (3D MM) to obtain shape and appearance priors for accelerating face normalization learning, requiring less training data. It further leverages a global-local Generative Adversarial Network (GAN) with multiple critical improvements as a refiner to enhance the realism of both global structures and local details of the face simulator’s output using unlabelled real data only, while preserving the identity information. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks clearly demonstrate the superiority of the proposed model over the state of the art.

### Tuesday 17, 16:40 - 18:20, SGP-SO - Heuristic Search and Optimization (K11)

Chair: Pavel Surynek
• #24
The FastMap Algorithm for Shortest Path Computations
Liron Cohen, Tansel Uras, Shiva Jahangiri, Aliyah Arunasalam, Sven Koenig, T. K. Satish Kumar
Heuristic Search and Optimization

We present a new preprocessing algorithm for embedding the nodes of a given edge-weighted undirected graph into a Euclidean space. The Euclidean distance between any two nodes in this space approximates the length of the shortest path between them in the given graph. Later, at runtime, a shortest path between any two nodes can be computed with an A* search using the Euclidean distances as heuristic. Our preprocessing algorithm, called FastMap, is inspired by the data-mining algorithm of the same name and runs in near-linear time. Hence, FastMap is orders of magnitude faster than competing approaches that produce a Euclidean embedding using Semidefinite Programming. FastMap also produces admissible and consistent heuristics and therefore guarantees the generation of shortest paths. Moreover, FastMap applies to general undirected graphs for which many traditional heuristics, such as the Manhattan Distance heuristic, are not well defined. Empirically, we demonstrate that A* search using the FastMap heuristic is competitive with A* search using other state-of-the-art heuristics, such as the Differential heuristic.

• #111
A Fast Local Search Algorithm for Minimum Weight Dominating Set Problem on Massive Graphs
Yiyuan Wang, Shaowei Cai, Jiejiang Chen, Minghao Yin
Heuristic Search and Optimization

The minimum weight dominating set (MWDS) problem is NP-hard and also important in many applications. Recent heuristic MWDS algorithms can hardly solve massive real-world graphs effectively. In this paper, we design a fast local search algorithm called FastMWDS for the MWDS problem, which aims to obtain a good solution on massive graphs within a short time. In this novel local search framework, we propose two ideas to make it effective. Firstly, we design a new fast construction procedure with four reduction rules to cut down the size of massive graphs. Secondly, we propose the three-valued two-level configuration checking strategy to improve local search, which is interestingly a variant of configuration checking (CC) with two levels and multiple values. Experimental results on a broad range of massive real-world graphs show that FastMWDS finds much better solutions than state-of-the-art MWDS algorithms.
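
As background, the objective that any MWDS solver optimizes can be stated in a few lines of Python. This sketch only verifies a candidate solution and reports its weight; it is not FastMWDS and includes neither the reduction rules nor configuration checking. The function name and the data layout (an edge list plus a `weights` dictionary whose keys define the vertex set) are assumptions made for illustration.

```python
def dominating_set_weight(candidate, edges, weights):
    """Return the total weight of `candidate` if it dominates the graph, else None.

    A set D dominates a graph when every vertex is in D or adjacent to a
    vertex in D. The vertex set is taken to be the keys of `weights`.
    """
    chosen = set(candidate)
    dominated = set(chosen)  # every chosen vertex dominates itself
    for u, v in edges:
        if u in chosen:
            dominated.add(v)
        if v in chosen:
            dominated.add(u)
    if dominated != set(weights):
        return None  # some vertex is left undominated (or an unknown vertex was chosen)
    return sum(weights[v] for v in chosen)
```

On a star graph whose centre has weight 5 and whose three leaves have weight 1 each, both the centre alone and the three leaves together are dominating sets, but the leaves are the cheaper choice (total weight 3 versus 5), which shows why the weighted problem differs from the unweighted one.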

• #179
Convergence Analysis of Gradient Descent for Eigenvector Computation
Zhiqiang Xu, Xin Cao, Xin Gao
Heuristic Search and Optimization

We present a novel, simple and systematic convergence analysis of gradient descent for eigenvector computation. As a popular, practical, and provable approach to numerous machine learning problems, gradient descent has found successful applications to eigenvector computation as well. However, surprisingly, it lacks a thorough theoretical analysis for the underlying geodesically non-convex problem. In this work, the convergence of the gradient descent solver for the leading eigenvector computation is shown to be at a global rate $O(\min\{(\lambda_1/\Delta_p)^2 \log(1/\epsilon),\ 1/\epsilon\})$, where $\Delta_p = \lambda_p - \lambda_{p+1} > 0$ represents the generalized positive eigengap and always exists without loss of generality, with $\lambda_i$ being the $i$-th largest eigenvalue of the given real symmetric matrix and $p$ being the multiplicity of $\lambda_1$. The rate is linear at $(\lambda_1/\Delta_p)^2 \log(1/\epsilon)$ if $(\lambda_1/\Delta_p)^2 = O(1)$, otherwise sub-linear at $O(1/\epsilon)$. We also show that the convergence depends only logarithmically, instead of quadratically, on the initial iterate. In particular, this is the first time that linear convergence in the case where the conventionally considered eigengap $\Delta_1 = \lambda_1 - \lambda_2 = 0$ but the generalized eigengap $\Delta_p$ satisfies $(\lambda_1/\Delta_p)^2 = O(1)$, as well as the logarithmic dependence on the initial iterate, has been established for the gradient descent solver. We are also the first to leverage for analysis the log principal angle between the iterate and the space of globally optimal solutions. Theoretical properties are verified in experiments.
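
To make the object of the analysis concrete, here is a minimal pure-Python sketch of gradient ascent on the Rayleigh quotient over the unit sphere (equivalently, gradient descent on its negative), with renormalization after every step. The exact solver, step-size rule, and retraction analyzed in the paper may differ; the function names, the fixed step size 0.1, and the iteration count are assumptions chosen for illustration.

```python
import math


def matvec(a, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def leading_eigenvector(a, x0, step=0.1, iters=500):
    """Approximate the leading eigenvector of a real symmetric matrix `a`.

    Each iteration moves along the Riemannian gradient of x^T A x on the
    unit sphere, A x - (x^T A x) x (constant factor absorbed into `step`),
    and then projects back onto the sphere by normalizing.
    """
    norm = math.sqrt(sum(v * v for v in x0))
    x = [v / norm for v in x0]
    for _ in range(iters):
        ax = matvec(a, x)
        rayleigh = sum(xi * axi for xi, axi in zip(x, ax))
        grad = [axi - rayleigh * xi for xi, axi in zip(x, ax)]
        y = [xi + step * gi for xi, gi in zip(x, grad)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    ax = matvec(a, x)
    return x, sum(xi * axi for xi, axi in zip(x, ax))
```

For the matrix [[2, 1], [1, 2]], whose eigenvalues are 3 and 1, starting from (1, 0) the iterate approaches (1, 1)/√2 and the returned Rayleigh quotient approaches 3.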

• #637
A Fast Algorithm for Optimally Finding Partially Disjoint Shortest Paths
Longkun Guo, Yunyun Deng, Kewen Liao, Qiang He, Timos Sellis, Zheshan Hu
Heuristic Search and Optimization

The classical disjoint shortest path problem has recently attracted renewed interest from researchers in the network planning and optimization community. However, the requirement of the shortest paths being completely vertex or edge disjoint might be too restrictive and demand much more resources in a network. Partially disjoint shortest paths, in which a bounded number of shared vertices or edges is allowed, balance the degree of disjointness against occupied network resources. In this paper, we consider the problem of finding k shortest paths which are edge disjoint but partially vertex disjoint. For a pair of distinct vertices in a network graph, the problem aims to optimally find k edge disjoint shortest paths among which at most a bounded number of vertices are shared by at least two paths. In particular, we present novel techniques for exactly solving the problem with a runtime that significantly improves the current best result. The proposed algorithm is also validated by computer experiments on both synthetic and real networks, which demonstrate its superior efficiency of up to three orders of magnitude faster than the state of the art.

• #1777
A General Approach to Running Time Analysis of Multi-objective Evolutionary Algorithms
Chao Bian, Chao Qian, Ke Tang
Heuristic Search and Optimization

Evolutionary algorithms (EAs) have been widely applied to solve multi-objective optimization problems. In contrast to great practical successes, their theoretical foundations are much less developed, even for the essential theoretical aspect, i.e., running time analysis. In this paper, we propose a general approach to estimating upper bounds on the expected running time of multi-objective EAs (MOEAs), and then apply it to diverse situations, including bi-objective and many-objective optimization as well as exact and approximate analysis. For some known asymptotic bounds, our analysis not only provides their leading constants, but also improves them asymptotically. Moreover, our results provide some theoretical justification for the good empirical performance of MOEAs in solving multi-objective combinatorial problems.

• #1812
An Exact Algorithm for Maximum k-Plexes in Massive Graphs
Jian Gao, Jiejiang Chen, Minghao Yin, Rong Chen, Yiyuan Wang
Heuristic Search and Optimization

The maximum k-plex, a generalization of maximum clique, is used to cope with a great number of real-world problems. The aim of this paper is to propose a novel exact k-plex algorithm that can deal with large-scale graphs with millions of vertices and edges. Specifically, we first propose several new graph reduction methods through a careful analysis of the structures of induced subgraphs. Afterwards, we present a preprocessing method to simplify initial graphs. Additionally, we present a branch-and-bound algorithm integrating the reduction methods as well as a new dynamic vertex selection mechanism. We perform intensive experiments to evaluate our algorithm, and show that the proposed strategies are effective and our algorithm outperforms state-of-the-art algorithms, especially for real-world massive graphs.
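
To make the k-plex notion concrete, the sketch below checks the standard definition (a vertex set S is a k-plex if every vertex in S is adjacent to at least |S| - k other vertices of S, so a 1-plex is a clique) and finds a maximum k-plex by exhaustive search. The function names are hypothetical, and the exhaustive search is exponential: it only illustrates the problem and is not the reduction-based branch-and-bound algorithm described in the abstract.

```python
from itertools import combinations


def is_k_plex(adj, vertices, k):
    """Return True if `vertices` induces a k-plex.

    adj maps each vertex to the set of its neighbours (no self-loops).
    """
    s = set(vertices)
    return all(len(adj[v] & s) >= len(s) - k for v in s)


def maximum_k_plex_bruteforce(adj, k):
    """Try vertex subsets from largest to smallest; tiny graphs only."""
    nodes = sorted(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if is_k_plex(adj, subset, k):
                return set(subset)
    return set()
```

On the path graph 1-2-3, the whole vertex set is a 2-plex but not a 1-plex, so the maximum 1-plex (clique) has two vertices.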

• #3209
Methods for off-line/on-line optimization under uncertainty
Allegra De Filippo, Michele Lombardi, Michela Milano
Heuristic Search and Optimization

In this work we present two general techniques to deal with multi-stage optimization problems under uncertainty, featuring off-line and on-line decisions. The methods are applicable when: 1) the uncertainty is exogenous; 2) there exists a heuristic for the on-line phase that can be modeled as a parametric convex optimization problem. The first technique replaces the on-line heuristics with an anticipatory solver, obtained through a systematic procedure. The second technique consists in making the off-line solver aware of the on-line heuristic, and capable of controlling its parameters so as to steer its behavior. We instantiate our approaches on two case studies: an energy management system with uncertain renewable generation and load demand, and a vehicle routing problem with uncertain travel times. We show how both techniques achieve high solution quality w.r.t. an oracle operating under perfect information, by obtaining different trade-offs in terms of computation time.

• #2812
Sequence Selection by Pareto Optimization
Chao Qian, Chao Feng, Ke Tang
Heuristic Search and Optimization

The problem of selecting a sequence of items from a universe that maximizes some given objective function arises in many real-world applications. In this paper, we propose an anytime randomized iterative approach, POSeqSel, which maximizes the given objective function and minimizes the sequence length simultaneously. We prove that for any previously studied objective function, POSeqSel can, within a reasonable running time, always reach or improve the best known approximation guarantee. Empirical results exhibit the superior performance of POSeqSel.

### Tuesday 17 16:40 - 18:20 ML-TS1 - Time Series and Data Streams (C2)

Chair: Kerstin Bach
• #10
Predicting Complex Activities from Ongoing Multivariate Time Series
Weihao Cheng, Sarah Erfani, Rui Zhang, Ramamohanarao Kotagiri
Time Series and Data Streams

The rapid development of sensor networks enables recognition of complex activities (CAs) using multivariate time series. However, CAs are usually performed over long periods of time, which causes slow recognition by models based on fully observed data. Therefore, predicting CAs at early stages becomes an important problem. In this paper, we propose Simultaneous Complex Activities Recognition and Action Sequence Discovering (SimRAD), an algorithm which predicts a CA over time by mining a sequence of multivariate actions from sensor data using a Deep Neural Network. SimRAD simultaneously learns two probabilistic models for inferring CAs and action sequences, where the estimations of the two models are conditionally dependent on each other. SimRAD continuously predicts the CA and the action sequence, so the predictions are mutually updated until the end of the CA. We conduct evaluations on a real-world CA dataset consisting of a rich amount of sensor data, and the results show that SimRAD outperforms state-of-the-art methods by an average of 7.2% in prediction accuracy with high confidence.

• #829
Fan Zhou, Qiang Gao, Goce Trajcevski, Kunpeng Zhang, Ting Zhong, Fengli Zhang
Time Series and Data Streams

Trajectory-User Linking (TUL) is an essential task in Geo-tagged social media (GTSM) applications, enabling personalized Point of Interest (POI) recommendation and activity identification. Existing works on mining mobility patterns often model trajectories using Markov Chains (MC) or recurrent neural networks (RNN) -- either assuming independence between non-adjacent locations or following a shallow generation process. However, most of them ignore the fact that human trajectories are often sparse, high-dimensional and may contain embedded hierarchical structures. We tackle the TUL problem with a semi-supervised learning framework, called TULVAE (TUL via Variational AutoEncoder), which learns the human mobility in a neural generative architecture with stochastic latent variables that span hidden states in RNN. TULVAE alleviates the data sparsity problem by leveraging large-scale unlabeled data and represents the hierarchical and structural semantics of trajectories with high-dimensional latent variables. Our experiments demonstrate that TULVAE improves efficiency and linking performance in real GTSM datasets, in comparison to existing methods.

• #879
Online Continuous-Time Tensor Factorization Based on Pairwise Interactive Point Processes
Hongteng Xu, Dixin Luo, Lawrence Carin
Time Series and Data Streams

A continuous-time tensor factorization method is developed for event sequences containing multiple "modalities." Each data element is a point in a tensor, whose dimensions are associated with the discrete alphabet of the modalities. Each tensor data element has an associated time of occurrence and a feature vector. We model such data based on pairwise interactive point processes, and the proposed framework connects pairwise tensor factorization with a feature-embedded point process. The model accounts for interactions within each modality, interactions across different modalities, and continuous-time dynamics of the interactions. Model learning is formulated as a convex optimization problem, based on online alternating direction method of multipliers. Compared to existing state-of-the-art methods, our approach captures the latent structure of the tensor and its evolution over time, obtaining superior results on real-world datasets.

• #1604
Periodic-CRN: A Convolutional Recurrent Model for Crowd Density Prediction with Recurring Periodic Patterns
Ali Zonoozi, Jung-jae Kim, Xiao-Li Li, Gao Cong
Time Series and Data Streams

Time-series forecasting in geo-spatial domains has important applications, including urban planning, traffic management and behavioral analysis. We observed recurring periodic patterns in some spatio-temporal data, which were not considered explicitly by previous non-linear works. To address this gap, we propose a novel Periodic-CRN (PCRN) method, which adapts convolutional recurrent network (CRN) to accurately capture spatial and temporal correlations, learns and incorporates explicit periodic representations, and can be optimized with multi-step ahead prediction. We show that PCRN consistently outperforms the state-of-the-art methods for crowd density prediction across two taxi datasets from Beijing and Singapore.

• #1648
Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting
Bing Yu, Haoteng Yin, Zhanxing Zhu
Time Series and Data Streams

Timely and accurate traffic forecasting is crucial for urban traffic control and guidance. Due to the high nonlinearity and complexity of traffic flow, traditional methods cannot satisfy the requirements of mid-and-long term prediction tasks and often neglect spatial and temporal dependencies. In this paper, we propose a novel deep learning framework, Spatio-Temporal Graph Convolutional Networks (STGCN), to tackle the time series prediction problem in the traffic domain. Instead of applying regular convolutional and recurrent units, we formulate the problem on graphs and build the model with complete convolutional structures, which enable much faster training speed with fewer parameters. Experiments show that our model STGCN effectively captures comprehensive spatio-temporal correlations through modeling multi-scale traffic networks and consistently outperforms state-of-the-art baselines on various real-world traffic datasets.

• #3197
NeuCast: Seasonal Neural Forecast of Power Grid Time Series
Pudi Chen, Shenghua Liu, Chuan Shi, Bryan Hooi, Bai Wang, Xueqi Cheng
Time Series and Data Streams

In the smart power grid, short-term load forecasting (STLF) is a crucial step in scheduling and planning for future load, so as to improve the reliability, cost, and emissions of the power grid. Different from traditional time series forecasting, STLF is a more challenging task, because of the complex demand of active and reactive power from numerous categories of electrical loads and the effects of the environment. Therefore, we propose NeuCast, a seasonal neural forecasting method, which dynamically models various loads as co-evolving time series in a hidden space, as well as extra weather conditions, in a neural network structure. NeuCast captures seasonality and patterns of the time series by integrating factor modeling and hidden state recognition. NeuCast can also detect anomalies and forecast under different temperature assumptions. Extensive experiments on 134 real-world datasets show the improvements of NeuCast over the state-of-the-art methods.

• #4106
Finding Frequent Entities in Continuous Data
Ferran Alet, Rohan Chitnis, Leslie P. Kaelbling, Tomas Lozano-Perez
Time Series and Data Streams

In many applications that involve processing high-dimensional data, it is important to identify a small set of entities that account for a significant fraction of detections. Rather than formalize this as a clustering problem, in which all detections must be grouped into hard or soft categories, we formalize it as an instance of the frequent items or heavy hitters problem, which finds groups of tightly clustered objects that have a high density in the feature space. We show that the heavy hitters formulation generates solutions that are more accurate and effective than the clustering formulation. In addition, we present a novel online algorithm for heavy hitters, called HAC, which addresses problems in continuous space, and demonstrate its effectiveness on real video and household domains.
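
For readers unfamiliar with the heavy hitters problem mentioned in this abstract, the sketch below shows the classic Misra-Gries summary for a discrete stream. It is not the HAC algorithm from the paper, which works in continuous space; it is included only to illustrate the frequent-items formulation, and the function name `misra_gries` is an illustrative choice.

```python
def misra_gries(stream, k):
    """Keep at most k - 1 counters over a stream of hashable items.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to remain in the returned dictionary; the stored counts
    are lower bounds on the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

With the stream "aabacadaae" and k = 3, the item "a" occurs 6 times out of 10 and is the only survivor.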

• #1805
Hierarchical Electricity Time Series Forecasting for Integrating Consumption Patterns Analysis and Aggregation Consistency
Yue Pang, Bo Yao, Xiangdong Zhou, Yong Zhang, Yiming Xu, Zijing Tan
Time Series and Data Streams

Electricity demand forecasting is a very important problem for energy supply and environmental protection. It can be formalized as a hierarchical time series forecasting problem with the aggregation constraints according to the geographical hierarchy, since the sum of the prediction results of the disaggregated time series should be equal to the prediction results of the aggregated ones. However, in most previous work, the aggregation consistency is ensured at the loss of forecast accuracy. In this paper, we propose a novel clustering-based hierarchical electricity time series forecasting approach. Instead of dealing with the geographical hierarchy directly, we explore electricity consumption patterns by clustering analysis and build a new consumption pattern based time series hierarchy. We then present a novel hierarchical forecasting method with consumption hierarchical aggregation constraints to improve the electricity demand predictions of the bottom level, followed by a "bottom-up" method to obtain forecasts of the geographical higher levels. In particular, we observe that in our consumption pattern based hierarchy the reconciliation error of the bottom level time series is correlated to its membership degree of the corresponding cluster (consumption pattern), and hence apply this correlation as the regularization term in our forecasting objective function. Extensive experiments on real-life datasets verify that our approach achieves the best prediction accuracy, compared with the state-of-the-art methods.

### Tuesday 17 16:40 - 18:20 MUL-WEB1 - AI and the Web, Networks 1 (C3)

Chair: David Pennock
• #525
A Comparative Study of Transactional and Semantic Approaches for Predicting Cascades on Twitter
Yunwei Zhao, Can Wang, Chi-Hung Chi, Kwok-Yan Lam, Sen Wang
AI and the Web, Networks 1

• #650
Improving Information Centrality of a Node in Complex Networks by Adding Edges
Liren Shan, Yuhao Yi, Zhongzhi Zhang
AI and the Web, Networks 1

The problem of increasing the centrality of a network node arises in many practical applications. In this paper, we study the optimization problem of maximizing the information centrality I_v of a given node v in a network with n nodes and m edges, by creating k new edges incident to v. Since I_v is the reciprocal of the sum of resistance distances R_v between v and all nodes, we alternatively consider the problem of minimizing R_v by adding k new edges linked to v. We show that the objective function is monotone and supermodular. We provide a simple greedy algorithm with an approximation factor (1 − 1/e) and O(n^3) running time. To speed up the computation, we also present an algorithm to compute (1 − 1/e − epsilon) approximate resistance distance R_v after iteratively adding k edges, the running time of which is Otilde(mk*epsilon^−2) for any epsilon > 0, where the Otilde(·) notation suppresses the poly(log n) factors. We experimentally demonstrate the effectiveness and efficiency of our proposed algorithms.
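
The greedy idea in this abstract can be sketched as follows: in each of k rounds, add the edge incident to v that most reduces the sum of resistance distances between v and all other nodes. This is a naive illustration for small, connected, unweighted graphs, not the authors' O(n^3) or near-linear implementations. It relies on the standard fact that the resistance distance between u and v equals the (u, u) entry of the inverse of the Laplacian with row and column v removed. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def resistance_sum(n, edges, v):
    """Sum of resistance distances between v and every other node.

    Nodes are 0..n-1; edges is an iterable of distinct (a, b) pairs of a
    connected graph. Uses the Laplacian with row/column v removed.
    """
    others = [u for u in range(n) if u != v]
    index = {u: i for i, u in enumerate(others)}
    L = [[0.0] * (n - 1) for _ in range(n - 1)]
    for a, b in edges:
        for x, y in ((a, b), (b, a)):
            if x != v:
                L[index[x]][index[x]] += 1.0
                if y != v:
                    L[index[x]][index[y]] -= 1.0
    total = 0.0
    for i in range(n - 1):
        unit = [0.0] * (n - 1)
        unit[i] = 1.0
        total += solve(L, unit)[i]  # i-th diagonal entry of the inverse
    return total


def greedy_add_edges(n, edges, v, k):
    """Greedily add up to k new edges incident to v to minimise the sum."""
    current = {tuple(sorted(e)) for e in edges}
    added = []
    for _ in range(k):
        candidates = [tuple(sorted((u, v))) for u in range(n)
                      if u != v and tuple(sorted((u, v))) not in current]
        if not candidates:
            break
        best = min(candidates,
                   key=lambda e: resistance_sum(n, current | {e}, v))
        current.add(best)
        added.append(best)
    return added, resistance_sum(n, current, v)
```

On the path 0-1-2-3 with v = 0, the sum starts at 1 + 2 + 3 = 6; closing the cycle with edge (0, 3) lowers it to 2.5, which beats adding (0, 2).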

• #1734
Scalable Multiplex Network Embedding
Hongming Zhang, Liwei Qiu, Lingling Yi, Yangqiu Song
AI and the Web, Networks 1

Network embedding has been proven to be helpful for many real-world problems. In this paper, we present a scalable multiplex network embedding model to represent information of multi-type relations into a unified embedding space. To combine information of different types of relations while maintaining their distinctive properties, for each node, we propose one high-dimensional common embedding and a lower-dimensional additional embedding for each type of relation. Then multiple relations can be learned jointly based on a unified network embedding model. We conduct experiments on two tasks: link prediction and node classification using six different multiplex networks. On both tasks, our model achieved better or comparable performance compared to current state-of-the-art models with less memory use.

• #1884
Adversarially Regularized Graph Autoencoder for Graph Embedding
Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Lina Yao, Chengqi Zhang
AI and the Web, Networks 1

Graph embedding is an effective method to represent graph data in a low dimensional space for graph analytics. Most existing embedding algorithms typically focus on preserving the topological structure or minimizing the reconstruction errors of graph data, but they have mostly ignored the data distribution of the latent codes from the graphs, which often results in inferior embedding in real-world graph data. In this paper, we propose a novel adversarial graph embedding framework for graph data. The framework encodes the topological structure and node content in a graph to a compact representation, on which a decoder is trained to reconstruct the graph structure. Furthermore, the latent representation is enforced to match a prior distribution via an adversarial training scheme. To learn a robust embedding, two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), are developed. Experimental studies on real-world graphs validate our design and demonstrate that our algorithms outperform baselines by a wide margin in link prediction, graph clustering, and graph visualization tasks.

• #3528
Discrete Interventions in Hawkes Processes with Applications in Invasive Species Management
Amrita Gupta, Mehrdad Farajtabar, Bistra Dilkina, Hongyuan Zha
AI and the Web, Networks 1

The spread of invasive species to new areas threatens the stability of ecosystems and causes major economic losses. We propose a novel approach to minimize the spread of an invasive species given a limited intervention budget. We first model invasive species spread using Hawkes processes, and then derive closed-form expressions for characterizing the effect of an intervention action on the invasion process. We use this to obtain an optimal intervention plan based on an integer programming formulation, and compare the optimal plan against several ecologically-motivated heuristic strategies used in practice. We present an empirical study of two variants of the invasive control problem: minimizing the final rate of invasions, and minimizing the number of invasions at the end of a given time horizon. The optimized intervention achieves nearly the same level of control that would be attained by completely eradicating the species, but at only 60-80% of the cost.

• #2600
Learning to Explain Ambiguous Headlines of Online News
Tianyu Liu, Wei Wei, Xiaojun Wan
AI and the Web, Networks 1

• #2703
Fact Checking via Evidence Patterns
Valeria Fionda, Giuseppe Pirrò
AI and the Web, Networks 1

We tackle fact checking using Knowledge Graphs (KGs) as a source of background knowledge. Our approach leverages the KG schema to generate candidate evidence patterns, that is, schema-level paths that capture the semantics of a target fact in alternative ways. Patterns verified in the data are used to both assemble semantic evidence for a fact and provide a numerical assessment of its truthfulness. We present efficient algorithms to generate and verify evidence patterns, and assemble evidence. We also provide a translation of the core of our algorithms into the SPARQL query language. Not only is our approach faster than the state of the art while offering comparable accuracy, but it can also use any SPARQL-enabled KG.

• #1650
Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search
Tao Zhuang, Wenwu Ou, Zhirong Wang
AI and the Web, Networks 1

In web search, mutual influences between documents have been studied from the perspective of search result diversification. However, the methods used in web search are not directly applicable to e-commerce search because of the differences between the two settings, and little research has been done on the mutual influences between items in e-commerce search. We propose a global optimization framework for mutual influence aware ranking in e-commerce search. Our framework directly optimizes the Gross Merchandise Volume (GMV) for ranking, and decomposes ranking into two tasks. The first task is mutual influence aware purchase probability estimation. We propose a global feature extension method to incorporate mutual influences into the features of an item. We also use a Recurrent Neural Network (RNN) to capture influences related to ranking orders in purchase probability estimation. The second task is to find the best ranking order based on the purchase probability estimations. We treat the second task as a sequence generation problem and solve it using the beam search algorithm. We performed an online A/B test on a large e-commerce search engine. The results show that our method brings a 5% increase in GMV for the search engine over a strong baseline.

### Wednesday 18 08:30 - 09:45 EAR6 - Early Career 6 (VICTORIA)

Chair: Craig Boutilier
• #5487
Statistical Quality Control for Human Computation and Crowdsourcing
Yukino Baba
Early Career 6

Human computation is a method for solving difficult problems by combining humans and computers. Quality control is a critical issue in human computation because it relies on a large number of participants (i.e., crowds) and there is an uncertainty about their reliability. A solution for this issue is to leverage the power of the "wisdom of crowds"; for example, we can aggregate the outputs of multiple participants or ask a participant to check the output of another participant to improve its quality. In this paper, we review several statistical approaches for controlling the quality of outputs from crowds.
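
The simplest form of the aggregation mentioned in this abstract is majority voting over redundant labels. The sketch below is a generic baseline rather than any specific method reviewed in the paper; the function name and the task identifiers in the example are made up for illustration.

```python
from collections import Counter


def majority_vote(labels_by_task):
    """Pick the most frequent label for each task.

    labels_by_task maps a task id to the list of labels submitted by
    different workers. Ties are broken in favour of the label that was
    seen first.
    """
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in labels_by_task.items()}
```

For example, three workers labelling "img1" as cat, cat and dog yield the aggregated label cat. Statistical approaches typically go further by weighting workers according to their estimated reliability.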

• #5489
Engineering Graph Features via Network Functional Blocks
Vincent W. Zheng
Early Career 6

Graphs are a prevalent data structure that enables many predictive tasks. How to engineer graph features is a fundamental question. Our concept is to go beyond nodes and edges, and explore richer structures (e.g., paths, subgraphs) for graph feature engineering. We call such richer structures network functional blocks, because each structure serves as a network building block but with some different functionality. We use semantic proximity search as an example application to share our recent work on exploiting different granularities of network functional blocks. We show that network functional blocks are effective, and they can be useful for a wide range of applications.

• #5497
Symbolic Compilation, Inference, and Decision-making with Deep-learned Models
Scott Sanner
Early Career 6

### Wednesday 18 08:30 - 09:45 MAS-AM2 - Auctions and Markets 2 (C8)

Chair: Aris Filos-Ratsikas
• #1168
Online Pricing for Revenue Maximization with Unknown Time Discounting Valuations
Weichao Mao, Zhenzhe Zheng, Fan Wu, Guihai Chen
Auctions and Markets 2

Online pricing mechanisms have been widely applied to resource allocation in multi-agent systems. However, most of the existing online pricing mechanisms assume buyers have fixed valuations over the time horizon, which cannot capture the dynamic nature of valuation in emerging applications. In this paper, we study the problem of revenue maximization in online auctions with unknown time discounting valuations, and model it as non-stationary multi-armed bandit optimization. We design an online pricing mechanism, namely Biased-UCB, based on unique features of the discounting valuations. We use competitive analysis to theoretically evaluate the performance guarantee of our pricing mechanism, and derive the competitive ratio. Numerical results show that our design achieves good performance in terms of revenue maximization on a real-world bidding dataset.

• #2088
The Promise and Perils of Myopia in Dynamic Pricing With Censored Information
Meenal Chhabra, Sanmay Das, Ilya Ryzhov
Auctions and Markets 2

A seller with unlimited inventory of a digital good interacts with potential buyers with i.i.d. valuations. The seller can adaptively quote prices to each buyer to maximize long-term profits, but does not know the valuation distribution exactly. Under a linear demand model, we consider two information settings: partially censored, where agents who buy reveal their true valuations after the purchase is completed, and completely censored, where agents never reveal their valuations. In the partially censored case, we prove that myopic pricing with a Pareto prior is Bayes optimal and has finite regret. In both settings, we evaluate the myopic strategy against more sophisticated look-aheads using three valuation distributions generated from real data on auctions of physical goods, keyword auctions, and user ratings, where the linear demand assumption is clearly violated. For some datasets, complete censoring actually helps, because the restricted data acts as a "regularizer" on the posterior, preventing it from being affected too much by outliers.

• #3130
Customer Sharing in Economic Networks with Costs
Bin Li, Dong Hao, Dengji Zhao, Tao Zhou
Auctions and Markets 2

In an economic market, sellers, infomediaries and customers constitute an economic network. Each seller has her own customer group and the seller's private customers are unobservable to other sellers. Therefore, a seller can only sell commodities among her own customers unless other sellers or infomediaries share her sale information to their customer groups. However, a seller is not incentivized to share others' sale information by default, which leads to inefficient resource allocation and limited revenue for the sale. To tackle this problem, we develop a novel mechanism called customer sharing mechanism (CSM) which incentivizes all sellers to share each other's sale information to their private customer groups. Furthermore, CSM also incentivizes all customers to truthfully participate in the sale. In the end, CSM not only allocates the commodities efficiently but also optimizes the seller's revenue.

• #3275
Budget-feasible Procurement Mechanisms in Two-sided Markets
Weiwei Wu, Xiang Liu, Minming Li
Auctions and Markets 2

This paper considers the mechanism design problem in two-sided markets where multiple strategic buyers come with budgets to procure as much value of items as possible from the strategic sellers. Each seller holds an item with public value and is allowed to bid its private cost. Buyers could claim their budgets, not necessarily the true ones. The goal is to seek budget-feasible mechanisms that ensure sellers receive sufficient payment and buyers' budgets are not exceeded. Our main contribution is a random mechanism that provides several desired theoretical guarantees, such as budget feasibility, truthfulness on both the sellers' side and the buyers' side, and a constant approximation to the optimal total procured value of buyers.

• #3604
Integrating Demand Response and Renewable Energy In Wholesale Market
Chaojie Li, Chen Liu, Xinghuo Yu, Ke Deng, Tingwen Huang, Liangchen Liu
Auctions and Markets 2

Demand response (DR) can provide a cost-effective approach for reducing peak loads, while renewable energy sources (RES) can result in an environmentally friendly solution for solving the problem of power shortage. The increasing integration of DR and renewable energy brings challenging issues for energy policy makers and electricity market regulators in the main power grid. In this paper, a new two-stage stochastic game model is introduced to operate the electricity market, where Stochastic Stackelberg-Cournot-Nash (SSCN) equilibrium is applied to characterize the optimal energy bidding strategy of the forward market and the optimal energy trading strategy of the spot market. To obtain an SSCN equilibrium, the sampling average approximation (SAA) technique is harnessed to address the stochastic game model in a distributed way. With this game model, the participation ratio of demand response can be significantly increased while the unreliability of the power system caused by renewable energy resources can be considerably reduced. The effectiveness of the proposed model is illustrated by extensive simulations.

• #2070
Equilibrium Behavior in Competing Dynamic Matching Markets
Zhuoshu Li, Neal Gupta, Sanmay Das, John P. Dickerson
Auctions and Markets 2

Rival markets like rideshare services, universities, and organ exchanges compete to attract participants, seeking to maximize their own utility at potential cost to overall social welfare.  Similarly, individual participants in such multi-market systems also seek to maximize their individual utility. If entry is costly, they should strategically enter only a subset of the available markets. All of this decision making---markets competitively adapting their matching strategies and participants arriving, choosing which market(s) to enter, and departing from the system---occurs dynamically over time. This paper provides the first analysis of equilibrium behavior in dynamic competing matching market systems---first from the points of view of individual participants when market policies are fixed, and then from the points of view of markets when agents are stochastic. When compared to single markets running social-welfare-maximizing matching policies, losses in overall social welfare in competitive systems manifest due to both market fragmentation and the use of non-optimal matching policies. We quantify such losses and provide policy recommendations to help alleviate them in fielded systems.

### Wednesday 18 08:30 - 09:55 KR-CSAT - Knowledge Representation, Constraints and Satisfiability (C7)

Chair: Francesco Ricca
• #382
Exploiting Justifications for Lazy Grounding of Answer Set Programs
Bart Bogaerts, Antonius Weinzierl
Knowledge Representation, Constraints and Satisfiability

Answer set programming (ASP) is an established knowledge representation formalism. Lazy grounding avoids the so-called grounding bottleneck of ASP by interleaving grounding and solving; this technique was recently extended to work with conflict-driven clause learning. Unfortunately, it often happens that such a lazy grounding ASP system, at the fixpoint of the evaluation, arrives at an assignment that contains literals that are true but unjustified. The system then is unable to determine the actual causes of the situation and falls back to chronological backtracking, potentially wasting an exponential amount of time. In this paper, we show how top-down query mechanisms can be used to analyze the situation, learn a new clause or nogood, and backjump further in the search tree. Contributions include a rephrasing of lazy grounding in terms of justifications and algorithms to construct relevant justifications without grounding. Initial experiments indicate that the newly developed techniques indeed allow for an exponential speed-up.

• #1227
Possibilistic ASP Base Revision by Certain Input
Laurent Garcia, Claire Lefèvre, Odile Papini, Igor Stéphan, Eric Würbel
Knowledge Representation, Constraints and Satisfiability

Belief base revision has been studied within the answer set programming framework. We go a step further by introducing uncertainty and studying belief base revision when beliefs are represented by possibilistic logic programs under possibilistic answer set semantics and revised by certain input. The paper proposes two approaches of rule-based revision operators and presents their semantic characterization in terms of possibilistic distribution. This semantic characterization allows for equivalently considering the evolution of syntactic logic programs and the evolution of their semantic content. It then studies the logical properties of the proposed operators and gives complexity results.

• #1719
Pseudo-Boolean Constraints from a Knowledge Representation Perspective
Daniel Le Berre, Pierre Marquis, Stefan Mengel, Romain Wallon
Knowledge Representation, Constraints and Satisfiability

We study pseudo-Boolean constraints (PBC) and their special case cardinality constraints (CARD) from the perspective of knowledge representation. To this end, the succinctness of PBC and CARD is compared to that of many standard propositional languages. Moreover, we determine which queries and transformations are feasible in polynomial time when knowledge is represented by PBC or CARD, and which are not (unconditionally or unless P = NP). In particular, the advantages and disadvantages compared to CNF are discussed.

• #3103
Novel Algorithms for Abstract Dialectical Frameworks based on Complexity Analysis of Subclasses and SAT Solving
Thomas Linsbichler, Marco Maratea, Andreas Niskanen, Johannes P. Wallner, Stefan Woltran
Knowledge Representation, Constraints and Satisfiability

Abstract dialectical frameworks (ADFs) constitute one of the most powerful formalisms in abstract argumentation. Their high computational complexity poses, however, certain challenges when designing efficient systems. In this paper, we tackle this issue by (i) analyzing the complexity of ADFs under structural restrictions, (ii) presenting novel algorithms which make use of these insights, and (iii) empirically evaluating a resulting implementation which relies on calls to SAT solvers.

• #3648
Stratified Negation in Limit Datalog Programs
Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, Ian Horrocks
Knowledge Representation, Constraints and Satisfiability

There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are typically undecidable. In prior work, the language of limit programs was proposed, which is sufficiently powerful to capture many analysis tasks and has decidable entailment problem. Rules in this language, however, do not allow for negation. In this paper, we study an extension of limit programs with stratified negation-as-failure. We show that the additional expressive power makes reasoning computationally more demanding, and provide tight data complexity bounds. We also identify a fragment with tractable data complexity and sufficient expressivity to capture many relevant tasks.

• #1833
Classification Transfer for Qualitative Reasoning Problems
Manuel Bodirsky, Peter Jonsson, Barnaby Martin, Antoine Mottet
Knowledge Representation, Constraints and Satisfiability

We study formalisms for temporal and spatial reasoning in the modern context of Constraint Satisfaction Problems (CSPs). We show how questions on the complexity of their subclasses can be solved using existing results via the powerful use of primitive positive (pp) interpretations and pp-homotopy. We demonstrate the methodology by giving a full complexity classification of all constraint languages that are first-order definable in Allen's Interval Algebra and contain the basic relations (s) and (f). In the case of the Rectangle Algebra we answer in the affirmative the old open question as to whether ORD-Horn is a maximally tractable subset among the (disjunctive, binary) relations. We then generalise our results for the Rectangle Algebra to the r-dimensional Block Algebra.

• #3240
Simpler and Faster Algorithm for Checking the Dynamic Consistency of Conditional Simple Temporal Networks
Luke Hunsberger, Roberto Posenato
Knowledge Representation, Constraints and Satisfiability

Recent work on Conditional Simple Temporal Networks (CSTNs) has focused on checking the dynamic consistency (DC) property assuming that execution strategies can react instantaneously to observations. Three alternative semantics (IR-DC, 0-DC, and π-DC) have been presented. The most practical DC-checking algorithm for CSTNs has only been analyzed with respect to the IR-DC semantics, while the 0-DC semantics was shown to have a serious flaw that the π-DC semantics fixed. Whether the IR-DC semantics had the same flaw and, if so, what the consequences would be for the DC-checking algorithm remained open questions. This paper (1) shows that the IR-DC semantics is also flawed; (2) shows that one of the constraint-propagation rules from the IR-DC-checking algorithm is not sound with respect to the IR-DC semantics; (3) presents a simpler algorithm, called the π-DC-checking algorithm; (4) proves that it is sound and complete with respect to the π-DC semantics; and (5) empirically evaluates the new algorithm.

### Wednesday 1808:30 - 09:55ML-TAM2 - Transfer, Adaptation, Multi-Task Learning 2 (K2)

Chair: Yuguang Yan
• #2043
Xiao Zhang, Wenzhong Li, Vu Nguyen, Fuzhen Zhuang, Hui Xiong, Sanglu Lu

Multi-label learning is widely applied in many real-world applications, such as image and gene annotation. While most of the existing multi-label learning models focus on the single-task learning problem, there are always some tasks that share some commonalities, which can help each other to improve learning performance if the knowledge in the similar tasks can be smartly shared. In this paper, we propose a LABel-sensitive TAsk Grouping framework, named LABTAG, based on a Bayesian nonparametric approach for multi-task multi-label classification. The proposed framework explores the label correlations to capture feature-label patterns, and clusters similar tasks into groups with shared knowledge, which are learned jointly to produce a strengthened multi-task multi-label model. We evaluate the model performance on three public multi-task multi-label data sets, and the results show that LABTAG outperforms the compared baselines by a significant margin.

• #3072
Cross-Domain 3D Model Retrieval via Visual Domain Adaption
Anan Liu, Shu Xiang, Wenhui Li, Weizhi Nie, Yuting Su

Recent advances in 3D capturing devices and 3D modeling software have led to extensive and diverse 3D datasets, which usually have different distributions. Cross-domain 3D model retrieval is becoming an important but challenging task. However, existing works mainly focus on 3D model retrieval in a closed dataset, which seriously constrains their implementation for real applications. To address this problem, we propose a novel cross-domain 3D model retrieval method by visual domain adaptation. This method can inherit the advantage of deep learning to learn multi-view visual features in a data-driven manner for 3D model representation. Moreover, it can reduce the domain divergence by exploiting both domain-shared and domain-specific features of different domains. Consequently, it can augment the discrimination of visual descriptors for cross-domain similarity measure. Extensive experiments on two popular datasets, under three designed cross-domain scenarios, demonstrate the superiority and effectiveness of the proposed method by comparing against the state-of-the-art methods. In particular, the proposed method can significantly outperform the most recent method for cross-domain 3D model retrieval and the champion of Shrec’16 Large-Scale 3D Shape Retrieval from ShapeNet Core55.

• #4085
Same Representation, Different Attentions: Shareable Sentence Representation Learning from Multiple Tasks
Renjie Zheng, Junkun Chen, Xipeng Qiu

Distributed representation plays an important role in deep learning based natural language processing. However, the representation of a sentence often varies in different tasks, which is usually learned from scratch and suffers from the limited amounts of training data. In this paper, we claim that a good sentence representation should be invariant and can benefit the various subsequent tasks. To achieve this purpose, we propose a new scheme of information sharing for multi-task learning. More specifically, all tasks share the same sentence representation and each task can select the task-specific information from the shared sentence representation with attention mechanisms. The query vector of each task's attention could be either static parameters or generated dynamically. We conduct extensive experiments on 16 different text classification tasks, which demonstrate the benefits of our architecture. The source code for this paper is available on GitHub.
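
The idea of one shared representation read through task-specific attention can be sketched as follows. This is an illustration under stated assumptions, not the authors' architecture: it assumes dot-product scoring between a static per-task query vector and each shared token state, followed by a softmax-weighted sum. The function names and the numbers are made up.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(shared_states: List[Vector], task_query: Vector) -> Vector:
    """Softmax-weighted sum of shared token states, scored against a task query."""
    scores = [
        sum(h * q for h, q in zip(state, task_query))
        for state in shared_states
    ]
    weights = softmax(scores)
    dim = len(shared_states[0])
    return [
        sum(w * state[d] for w, state in zip(weights, shared_states))
        for d in range(dim)
    ]


# One shared encoding of a three-token sentence (made-up numbers).
shared = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Two static task queries: each task reads the same states differently.
view_a = attend(shared, [2.0, 0.0])
view_b = attend(shared, [0.0, 2.0])
```

Both views are computed from the same `shared` states, and only the query differs. A dynamically generated query, as the abstract also mentions, would replace the fixed vectors with the output of another function of the input.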

• #2791
Online Heterogeneous Transfer Metric Learning
Yong Luo, Tongliang Liu, Yonggang Wen, Dacheng Tao

Distance metric learning (DML) has been demonstrated to be successful and essential in diverse applications. Transfer metric learning (TML) can help DML in the target domain with limited label information by utilizing information from some related source domains. The heterogeneous TML (HTML), where the feature representations vary from the source to the target domain, is general and challenging. However, current HTML approaches are usually conducted in a batch manner and cannot handle sequential data. This motivates the proposed online HTML (OHTML) method. In particular, the distance metric in the source domain is pre-trained using some existing DML algorithms. To enable knowledge transfer, we assume there are large amounts of unlabeled corresponding data that have representations in both the source and target domains. By enforcing the distances (between these unlabeled samples) in the target domain to agree with those in the source domain under the manifold regularization theme, we learn an improved target metric. We formulate the problem in the online setting so that the optimization is efficient and the model can be adapted to new coming data. Experiments in diverse applications demonstrate both effectiveness and efficiency of the proposed method.

• #3379
Predicting Activity and Location with Multi-task Context Aware Recurrent Neural Network
Dongliang Liao, Weiqing Liu, Yuan Zhong, Jing Li, Guowei Wang

Predicting users’ activity and location preferences is of great significance in location based services. Considering that users’ activity and location preferences interplay with each other, many scholars tried to figure out the relation between users’ activities and locations for improving prediction performance. However, most previous works enforce a rigid human-defined modeling strategy to capture these two factors, either activity purpose controlling location preference or spatial region determining activity preference. Unlike existing methods, we introduce spatial-activity topics as the latent factor capturing both users’ activity and location preferences. We propose Multi-task Context Aware Recurrent Neural Network to leverage the spatial activity topic for activity and location prediction. More specifically, a novel Context Aware Recurrent Unit is designed to integrate the sequential dependency and temporal regularity of spatial activity topics. Extensive experimental results based on real-world public datasets demonstrate that the proposed model significantly outperforms state-of-the-art approaches.

• #40
Improving Entity Recommendation with Search Log and Multi-Task Learning
Jizhou Huang, Wei Zhang, Yaming Sun, Haifeng Wang, Ting Liu

Entity recommendation, providing search users with an improved experience by assisting them in finding related entities for a given query, has become an indispensable feature of today's Web search engine. Existing studies typically only consider the query issued at the current time step while ignoring the in-session preceding queries. Thus, they typically fail to handle the ambiguous queries such as "apple" because the model could not understand which apple (company or fruit) is talked about. In this work, we believe that the in-session contexts convey valuable evidences that could facilitate the semantic modeling of queries, and take that into consideration for entity recommendation. Furthermore, in order to better model the semantics of queries, we learn the model in a multi-task learning setting where the query representation is shared across entity recommendation and context-aware ranking. We evaluate our approach using large-scale, real-world search logs of a widely used commercial Web search engine. The experimental results show that incorporating context information significantly improves entity recommendation, and learning the model in a multi-task learning setting could bring further improvements.

• #4231
Experienced Optimization with Reusable Directional Model for Hyper-Parameter Search
Yi-Qi Hu, Yang Yu, Zhi-Hua Zhou

Hyper-parameter selection is a crucial yet difficult issue in machine learning. For this problem, derivative-free optimization has been playing an irreplaceable role. However, derivative-free optimization commonly requires a lot of hyper-parameter samples, while each sample could have a high cost for hyper-parameter selection due to the costly evaluation of a learning model. To tackle this issue, in this paper, we propose an experienced optimization approach, i.e., learning how to optimize better from a set of historical optimization processes. From the historical optimization processes on previous datasets, a directional model is trained to predict the direction of the next good hyper-parameter. The directional model is then reused to guide the optimization in learning new datasets. We implement this mechanism within a state-of-the-art derivative-free optimization method SRacos, and conduct experiments on learning the hyper-parameters of heterogeneous ensembles and neural network architectures. Experimental results verify that the proposed approach can significantly improve the learning accuracy within a limited hyper-parameter sample budget.

### Wednesday 1808:30 - 09:55NLP-SEC - Sentence Embedding, Text Classification (T2)

Chair: Yangqiu Song
• #1285
Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling
Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Sen Wang, Chengqi Zhang
Sentence Embedding, Text Classification

Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies by soft probabilities between every two tokens, but they are not effective and efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, "reinforced self-attention (ReSA)", for the mutual benefit of each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called "reinforced sequence sampling (RSS)", selecting tokens in parallel and trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, "reinforced self-attention network (ReSAN)", solely based on ReSA.  It achieves state-of-the-art performance on both the Stanford Natural Language Inference (SNLI) and the Sentences Involving Compositional Knowledge (SICK) datasets.

• #1685
An Adaptive Hierarchical Compositional Model for Phrase Embedding
Bing Li, Xiaochun Yang, Bin Wang, Wei Wang, Wei Cui, Xianchao Zhang
Sentence Embedding, Text Classification

Phrase embedding aims at representing phrases in a vector space and it is important for the performance of many NLP tasks. Existing models only regard a phrase as either full-compositional or non-compositional, while ignoring the hybrid-compositionality that widely exists, especially in long phrases. This drawback prevents them from having a deeper insight into the semantic structure for long phrases and as a consequence, weakens the accuracy of the embeddings. In this paper, we present a novel method for jointly learning compositionality and phrase embedding by adaptively weighting different compositions using an implicit hierarchical structure. Our model has the ability of adaptively adjusting among different compositions without entailing too much model complexity and time cost. To the best of our knowledge, our work is the first effort that considers hybrid-compositionality in phrase embedding. The experimental evaluation demonstrates that our model outperforms state-of-the-art methods in both similarity tasks and analogy tasks.

• #1909
Transformable Convolutional Neural Network for Text Classification
Liqiang Xiao, Honglun Zhang, Wenqing Chen, Yongkun Wang, Yaohui Jin
Sentence Embedding, Text Classification

Convolutional neural networks (CNNs) have shown their promising performance for natural language processing tasks, which extract n-grams as features to represent the input. However, n-gram based CNNs are inherently limited to fixed geometric structure and cannot proactively adapt to the transformations of features. In this paper, we propose two modules to provide CNNs with the flexibility for complex features and the adaptability for transformation, namely, transformable convolution and transformable pooling. Our method fuses dynamic and static deviations to redistribute the sampling locations, which can capture both current and global transformations. Our modules can be easily integrated by other models to generate new transformable networks. We test proposed modules on two state-of-the-art models, and the results demonstrate that our modules can effectively adapt to the feature transformation in text classification.

• #2269
Instance Weighting with Applications to Cross-domain Text Classification via Trading off Sample Selection Bias and Variance
Rui Xia, Zhenchun Pan, Feng Xu
Sentence Embedding, Text Classification

Domain adaptation is an important problem in natural language processing (NLP) due to the distributional difference between the labeled source domain and the target domain. In this paper, we study the domain adaptation problem from the instance weighting perspective. By using density ratio as the instance weight, the traditional instance weighting approaches can potentially correct the sample selection bias in domain adaptation. However, researchers often failed to achieve good performance when applying instance weighting to domain adaptation in NLP and many negative results were reported in the literature. In this work, we conduct an in-depth study on the causes of the failure, and find that previous work only focused on reducing the sample selection bias, but ignored another important factor, sample selection variance, in domain adaptation. On this basis, we propose a new instance weighting framework by trading off two factors in instance weight learning. We evaluate our approach on two cross-domain text classification tasks and compare it with eight instance weighting methods. The results prove our approach's advantages in domain adaptation performance, optimization efficiency and parameter stability.
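
For readers unfamiliar with the traditional approach the abstract starts from, the sketch below shows density-ratio instance weighting only, not the bias-variance trade-off the paper proposes. It assumes both densities are known one-dimensional Gaussians, which is rarely true in practice (the ratio normally has to be estimated), and it uses a self-normalized weighted mean. All function names and numbers are hypothetical.

```python
import math
from typing import Callable, List


def gaussian_pdf(mean: float, std: float) -> Callable[[float], float]:
    """Return the density function of a 1-D Gaussian."""
    def pdf(x: float) -> float:
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return pdf


def density_ratio_weights(
    xs: List[float],
    source_pdf: Callable[[float], float],
    target_pdf: Callable[[float], float],
) -> List[float]:
    # Weight each source instance by p_target(x) / p_source(x).
    return [target_pdf(x) / source_pdf(x) for x in xs]


def weighted_mean_loss(losses: List[float], weights: List[float]) -> float:
    # Self-normalized importance-weighted average of per-instance losses.
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Source instances drawn around 0; the target domain is shifted to 1.
source_xs = [-1.0, 0.0, 1.0, 2.0]
weights = density_ratio_weights(source_xs, gaussian_pdf(0.0, 1.0), gaussian_pdf(1.0, 1.0))
# Made-up per-instance losses: the model errs on the two left-most points.
losses = [1.0, 1.0, 0.0, 0.0]
target_estimate = weighted_mean_loss(losses, weights)
```

Instances that look more like the target domain receive larger weights, so `target_estimate` differs from the unweighted source average of 0.5. With few samples, a handful of large weights can dominate such an estimate, which is one intuition for why variance matters alongside bias.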

• #1962
Inferring Temporal Knowledge for Near-Periodic Recurrent Events
Dinesh Raghu, Surag Nair, Mausam
Sentence Embedding, Text Classification

We define the novel problem of extracting and predicting occurrence dates for a class of recurrent events -- events that are held periodically as per a near-regular schedule (e.g., conferences, film festivals, sport championships). Knowledge-bases such as Freebase contain a large number of such recurring events, but they also miss substantial information regarding specific event instances and their occurrence dates. We develop a temporal extraction and inference engine to fill in the missing dates as well as to predict their future occurrences. Our engine performs joint inference over several knowledge sources -- (1) information about an event instance and its date extracted from text by our temporal extractor, (2) information about the typical schedule (e.g., "every second week of June") for a recurrent event extracted by our schedule extractor, and (3) known dates for other instances of the same event. The output of our system is a representation for the event schedule and an occurrence date for each event instance. We find that our system beats humans in predicting future occurrences of recurrent events by significant margins. We release our code and system output for further research.

• #2458
Time-evolving Text Classification with Deep Neural Networks
Yu He, Jianxin Li, Yangqiu Song, Mutian He, Hao Peng
Sentence Embedding, Text Classification

Traditional text classification algorithms are based on the assumption that data are independent and identically distributed. However, in most non-stationary scenarios, data may change smoothly due to long-term evolution and short-term fluctuation, which raises new challenges to traditional methods. In this paper, we present the first attempt to explore evolutionary neural network models for time-evolving text classification. We first introduce a simple way to extend arbitrary neural networks to evolutionary learning by using a temporal smoothness framework, and then propose a diachronic propagation framework to incorporate the historical impact into currently learned features through diachronic connections. Experiments on real-world news data demonstrate that our approaches greatly and consistently outperform traditional neural network models in both accuracy and stability.

• #2173
EZLearn: Exploiting Organic Supervision in Automated Data Annotation
Maxim Grechkin, Hoifung Poon, Bill Howe
Sentence Embedding, Text Classification

Many real-world applications require automated data annotation, such as identifying tissue origins based on gene expressions and classifying images into semantic categories. Annotation classes are often numerous and subject to changes over time, and annotating examples has become the major bottleneck for supervised learning methods. In science and other high-value domains, large repositories of data samples are often available, together with two sources of organic supervision: a lexicon for the annotation classes, and text descriptions that accompany some data samples. Distant supervision has emerged as a promising paradigm for exploiting such indirect supervision by automatically annotating examples where the text description contains a class mention in the lexicon. However, due to linguistic variations and ambiguities, such training data is inherently noisy, which limits the accuracy in this approach. In this paper, we introduce an auxiliary natural language processing system for the text modality, and incorporate co-training to reduce noise and augment signal in distant supervision. Without using any manually labeled data, our EZLearn system learned to accurately annotate data samples in functional genomics and scientific figure comprehension, substantially outperforming state-of-the-art supervised methods trained on tens of thousands of annotated examples.

### Wednesday 1808:30 - 09:55ROB-CV - Robotics and Vision (T1)

Chair: Mohan Sridharan
• #1147
CR-GAN: Learning Complete Representations for Multi-view Generation
Yu Tian, Xi Peng, Long Zhao, Shaoting Zhang, Dimitris N. Metaxas
Robotics and Vision

Generating multi-view images from a single-view input is an important yet challenging problem. It has broad applications in vision, graphics, and robotics. Our study indicates that the widely-used generative adversarial network (GAN) may learn "incomplete" representations due to the single-pathway framework: an encoder-decoder network followed by a discriminator network. We propose CR-GAN to address this problem. In addition to the single reconstruction path, we introduce a generation sideway to maintain the completeness of the learned embedding space. The two learning paths collaborate and compete in a parameter-sharing manner, yielding largely improved generality to "unseen" dataset. More importantly, the two-pathway framework makes it possible to combine both labeled and unlabeled data for self-supervised learning, which further enriches the embedding space for realistic generations. We evaluate our approach on a wide range of datasets. The results prove that CR-GAN significantly outperforms state-of-the-art methods, especially when generating from "unseen" inputs in wild conditions.

• #700
GraspNet: An Efficient Convolutional Neural Network for Real-time Grasp Detection for Low-powered Devices
Umar Asif, Jianbin Tang, Stefan Harrer
Robotics and Vision

Recent research on grasp detection has focused on improving accuracy through deep CNN models, but at the cost of large memory and computational resources. In this paper, we propose an efficient CNN architecture which produces high grasp detection accuracy in real-time while maintaining a compact model design. To achieve this, we introduce a CNN architecture termed GraspNet which has two main branches: i) An encoder branch which downsamples an input image using our novel Dilated Dense Fire (DDF) modules - squeeze and dilated convolutions with dense residual connections. ii) A decoder branch which upsamples the output of the encoder branch to the original image size using deconvolutions and fuse connections. We evaluated GraspNet for grasp detection using offline datasets and a real-world robotic grasping setup. In experiments, we show that GraspNet achieves superior grasp detection accuracy compared to the state-of-the-art computation-efficient CNN models with real-time inference speed on embedded GPU hardware (Nvidia Jetson TX1), making it suitable for low-powered devices.

• #3050
An Appearance-and-Structure Fusion Network for Object Viewpoint Estimation
Yueying Kao, Weiming Li, Zairan Wang, Dongqing Zou, Ran He, Qiang Wang, Minsu Ahn, Sunghoon Hong
Robotics and Vision

Automatic object viewpoint estimation from a single image is an important but challenging problem in the machine intelligence community. Although impressive performance has been achieved, current state-of-the-art methods still have difficulty dealing with the visual ambiguity and structure ambiguity in real-world images. To tackle these problems, this paper proposes a novel Appearance-and-Structure Fusion network, called ASFnet, which estimates viewpoint by fusing both appearance and structure information. The structure information is encoded by precise semantic keypoints and can help address the visual ambiguity. Meanwhile, distinguishable appearance features contribute to overcoming the structure ambiguity. Our ASFnet integrates an appearance path and a structure path into an end-to-end network and allows deep features to effectively share supervision from the two complementary aspects. A convolutional layer is learned to fuse the two path results adaptively. To balance the influence from the two supervision sources, a piecewise loss weight strategy is employed during training. Experimentally, our proposed network outperforms state-of-the-art methods on the public PASCAL 3D+ dataset, which verifies the effectiveness of our method and further corroborates the above proposition.

• #2836
Active Recurrence of Lighting Condition for Fine-Grained Change Detection
Qian Zhang, Wei Feng, Liang Wan, Fei-Peng Tian, Ping Tan
Robotics and Vision

This paper addresses active lighting recurrence (ALR), a new problem that actively relocalizes a light source to physically reproduce the lighting condition for the same scene from a single reference image. ALR is of great importance for fine-grained visual monitoring and change detection, because some phenomena or minute changes can only be clearly observed under particular lighting conditions. Hence, effective ALR should be able to navigate a light source online toward the target pose, which is challenging due to the complexity and diversity of real-world lighting & imaging processes. We propose to use simple parallel lighting as an analogy model and, based on the Lambertian law, to compose an instant navigation ball for this purpose. We theoretically prove the feasibility of this ALR strategy for realistic near point light sources and its invariance to the ambiguity of normal & lighting decomposition. Extensive quantitative experiments and challenging real-world tasks on fine-grained change monitoring of cultural heritages verify the effectiveness of our approach. We also validate its generality to non-Lambertian scenes.

• #2284
Implicit Non-linear Similarity Scoring for Recognizing Unseen Classes
Yuchen Guo, Guiguang Ding, Jungong Han, Sicheng Zhao, Bin Wang
Robotics and Vision

Recognizing unseen classes is an important task for real-world applications, due to: 1) it is common that some classes in reality have no labeled image exemplar for training; and 2) novel classes emerge rapidly. Recently, to address this task many zero-shot learning (ZSL) approaches have been proposed where explicit linear scores, like inner product score, are employed to measure the similarity between a class and an image. We argue that explicit linear scoring (ELS) seems too weak to capture complicated image-class correspondence. We propose a simple yet effective framework, called Implicit Non-linear Similarity Scoring (ICINESS). In particular, we train a scoring network which uses image and class features as input, fuses them by hidden layers, and outputs the similarity. Based on the universal approximation theorem, it can approximate the true similarity function between images and classes if a proper structure is used in an implicit non-linear way, which is more flexible and powerful. With ICINESS framework, we implement ZSL algorithms by shallow and deep networks, which yield consistently superior results.

• #2750
Virtual-to-Real: Learning to Control in Visual Semantic Segmentation
Zhang-Wei Hong, Yu-Ming Chen, Hsuan-Kung Yang, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Brian Hsi-Lin Ho, Chih-Chieh Tu, Tsu-Ching Hsiao, Hsin-Wei Hsiao, Sih-Pin Lai, Yueh-Chuan Chang, Chun-Yi Lee
Robotics and Vision

Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus, recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of the models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation for relating these two modules.  The perception module translates the perceived RGB image to semantic image segmentation.  The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated in an obstacle avoidance task and a target following task.  Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and demonstrates a faster learning curve than them.  We also present a detailed analysis for a variety of variant configurations, and validate the transferability of our modular architecture.

• #2196
3D-PhysNet: Learning the Intuitive Physics of Non-Rigid Object Deformations
Zhihua Wang, Stefano Rosa, Bo Yang, Sen Wang, Niki Trigoni, Andrew Markham
Robotics and Vision

The ability to interact and understand the environment is a fundamental prerequisite for a wide range of applications from robotics to augmented reality. In particular, predicting how deformable objects will react to applied forces in real time is a significant challenge. This is further confounded by the fact that shape information about encountered objects in the real world is often impaired by occlusions, noise and missing regions, e.g., a robot manipulating an object will only be able to observe a partial view of the entire solid. In this work we present a framework, 3D-PhysNet, which is able to predict how a three-dimensional solid will deform under an applied force using intuitive physics modelling. In particular, we propose a new method to encode the physical properties of the material and the applied force, enabling generalisation over materials. The key is to combine deep variational autoencoders with adversarial training, conditioned on the applied force and the material properties. We further propose a cascaded architecture that takes a single 2.5D depth view of the object and predicts its deformation. Training data is provided by a physics simulator. The network is fast enough to be used in real-time applications from partial views. Experimental results show the viability and the generalisation properties of the proposed architecture.

### Wednesday 18 08:30 - 09:55 ML-ACT - Active Learning (K11)

Chair: Sheng-Jun Huang
• #1487
Cost-Effective Active Learning for Hierarchical Multi-Label Classification
Yi-Fan Yan, Sheng-Jun Huang
Active Learning

Active learning reduces the labeling cost by actively querying labels for the most valuable data. It is particularly important for multi-label learning, where the annotation cost is rather high because each instance may have multiple labels simultaneously. In many multi-label tasks, the labels are organized into hierarchies from coarse to fine. The labels at different levels of the hierarchy contribute differently to the model training, and also have diverse annotation costs. In this paper, we propose a multi-label active learning approach to exploit the label hierarchies for cost-effective queries. By incorporating the potential contribution of ancestor and descendant labels, a novel criterion is proposed to estimate the informativeness of each candidate query. Further, a subset selection method is introduced to perform active batch selection by balancing the informativeness and cost of each instance-label pair. Experimental results validate the effectiveness of both the proposed criterion and the selection method.

• #2386
On Whom Should I Perform this Lab Test Next? An Active Feature Elicitation Approach
Sriraam Natarajan, Srijita Das, Nandini Ramanan, Gautam Kunapuli, Predrag Radivojac
Active Learning

We consider the problem of active feature elicitation in which, given a few examples with all the features (say the full EHR) and a few examples with some of the features (say demographics), the goal is to identify the set of examples on whom more information (say the lab tests) needs to be collected. The observation is that some sets of features may be more expensive, personal or cumbersome to collect. We propose an active learning approach which identifies examples that are dissimilar to the ones with the full set of data and acquires the complete set of features for these examples. Motivated by real clinical tasks, our extensive evaluation on three clinical tasks demonstrates the effectiveness of this approach.

• #4050
Hierarchical Active Learning with Group Proportion Feedback
Zhipeng Luo, Milos Hauskrecht
Active Learning

Learning of classification models in practice often relies on nontrivial human annotation effort in which humans assign class labels to data instances. As this process can be very time consuming and costly, finding effective ways to reduce the annotation cost becomes critical for building such models. In this work we solve this problem by exploring a new approach that actively learns classification models from groups, which are subpopulations of instances, and human feedback on the groups. Each group is labeled with a number in the [0,1] interval representing a human estimate of the proportion of instances with one of the class labels in this subpopulation. To form the groups to be annotated, we develop a hierarchical active learning framework that divides the whole population into smaller subpopulations, which allows us to gradually learn more refined models from the subpopulations and their class proportion labels. Our extensive experiments on numerous datasets show that our method is competitive and outperforms existing approaches for reducing the human annotation cost.

• #3901
Experimental Design under the Bradley-Terry Model
Yuan Guo, Peng Tian, Jayashree Kalpathy-Cramer, Susan Ostmo, J. Peter Campbell, Michael F. Chiang, Deniz Erdogmus, Jennifer Dy, Stratis Ioannidis
Active Learning

Labels generated by human experts via comparisons exhibit smaller variance compared to traditional sample labels. Collecting comparison labels is challenging over large datasets, as the number of comparisons grows quadratically with the dataset size. We study the following experimental design problem: given a budget of expert comparisons, and a set of existing sample labels, we determine the comparison labels to collect that lead to the highest classification improvement. We study several experimental design objectives motivated by the Bradley-Terry model. The resulting optimization problems amount to maximizing submodular functions. We experimentally evaluate the performance of these methods over synthetic and real-life datasets.
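
As a reader aid, the Bradley-Terry model named in this abstract can be sketched in a few lines. This is only an illustration of the standard model, not the authors' experimental design method; the function names and the log-scale score parametrization are assumptions made for the example.

```python
import math


def bt_win_probability(score_i, score_j):
    """Probability that item i is preferred over item j under Bradley-Terry.

    Scores are on a log scale (an assumption of this sketch), so the
    probability is exp(s_i) / (exp(s_i) + exp(s_j)).
    """
    return 1.0 / (1.0 + math.exp(score_j - score_i))


def bt_log_likelihood(scores, comparisons):
    """Log-likelihood of observed comparisons.

    comparisons is a list of (winner_index, loser_index) pairs.
    """
    return sum(
        math.log(bt_win_probability(scores[w], scores[l]))
        for w, l in comparisons
    )
```

Choosing which comparisons to collect under a budget, the problem the paper studies, would sit on top of a model like this; that selection step is not shown here.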

• #3268
Self-Supervised Deep Low-Rank Assignment Model for Prototype Selection
Xingxing Zhang, Zhenfeng Zhu, Yao Zhao, Deqiang Kong
Active Learning

Prototype selection is a promising technique for removing redundancy and irrelevance from large-scale data. Here, we consider it as a task assignment problem, which refers to assigning each element of a source set to one representative, i.e., prototype. However, due to the outliers and uncertain distribution on source, the selected prototypes are generally less representative and interesting. To alleviate this issue, we develop in this paper a Self-supervised Deep Low-rank Assignment model (SDLA). By dynamically integrating a low-rank assignment model with deep representation learning, our model effectively ensures the goodness-of-exemplar and goodness-of-discrimination of selected prototypes. Specifically, on the basis of a denoising autoencoder, dissimilarity metrics on source are continuously self-refined in embedding space with weak supervision from selected prototypes, thus preserving categorical similarity. Conversely, working on this metric space, similar samples tend to select the same prototypes by designing a low-rank assignment model. Experimental results on applications like text clustering and image classification (using prototypes) demonstrate our method is considerably superior to the state-of-the-art methods in prototype selection.

• #3317
New Balanced Active Learning Model and Optimization Algorithm
Xiaoqian Wang, Yijun Huang, Ji Liu, Heng Huang
Active Learning

It is common in machine learning applications that unlabeled data are abundant while acquiring labels is extremely difficult. In order to reduce the cost of training a model while maintaining the model quality, active learning provides a feasible solution. Instead of acquiring labels for random samples, active learning methods carefully select the data to be labeled so as to alleviate the impact from the redundancy or noise in the selected data and improve the trained model performance. In early stage experimental design, previous active learning methods adopted a data reconstruction framework, such that the selected data maintained high representative power. However, these models did not consider the data class structure, thus the selected samples could be predominated by the samples from major classes. Such a mechanism fails to include samples from the minor classes and thus tends to be less "representative". To solve this challenging problem, we propose a novel active learning model for the early stage of experimental design. We use an exclusive sparsity norm to enforce the selected samples to be (roughly) evenly distributed among different groups. We provide a new efficient optimization algorithm and theoretically prove the optimal convergence rate O(1/T^2). With a simple substitution, we reduce the computational load of each iteration from O(n^3) to O(n^2), which makes our algorithm more scalable than previous frameworks.

• #2131
Adversarial Active Learning for Sequences Labeling and Generation
Yue Deng, KaWai Chen, Yilin Shen, Hongxia Jin
Active Learning

We introduce an active learning framework for general sequence learning tasks including sequence labeling and generation. Most existing active learning algorithms mainly rely on an uncertainty measure derived from the probabilistic classifier for query sample selection. However, such approaches suffer from two shortcomings in the context of sequence learning: 1) the cold start problem and 2) the label sampling dilemma. To overcome these shortcomings, we propose a deep-learning-based active learning framework to directly identify query samples from the perspective of adversarial learning. Our approach intends to offer labeling priorities for sequences whose information content is least covered by existing labeled data. We verify our sequence-based active learning approach on two tasks including sequence labeling and sequence generation.

### Wednesday 18 08:30 - 09:55 SIS-WEB - Sister Conferences: Web, Recommendation, Retrieval (C2)

Chair: Jia Jia
• #5105
Modeling the Assimilation-Contrast Effects in Online Product Rating Systems: Debiasing and Recommendations
Xiaoying Zhang, Hong Xie, Junzhou Zhao, John C.S. Lui
Sister Conferences: Web, Recommendation, Retrieval

The unbiasedness of online product ratings, an important property to ensure that users’ ratings indeed reflect their true evaluations to products, is vital both in shaping consumer purchase decisions and providing reliable recommendations. Recent experimental studies showed that distortions from historical ratings would ruin the unbiasedness of subsequent ratings. How to “discover” the distortions from historical ratings in each single rating (or at the micro-level), and perform the “debiasing operations” in real rating systems are the main objectives of this work. Using 42 million real customer ratings, we first show that users either “assimilate” or “contrast” to historical ratings under different scenarios: users conform to historical ratings if historical ratings are not far from the product quality (assimilation), while users deviate from historical ratings if historical ratings are significantly different from the product quality (contrast). This phenomenon can be explained by the well-known psychological argument: the “Assimilate-Contrast” theory. However, none of the existing works on modeling historical ratings’ influence have taken this into account, and this motivates us to propose the Historical Influence Aware Latent Factor Model (HIALF), the first model for real rating systems to capture and mitigate historical distortions in each single rating. HIALF also allows us to study the influence patterns of historical ratings from a modeling perspective, and it perfectly matches the assimilation and contrast effects we previously observed. Also, HIALF achieves significant improvements in predicting subsequent ratings, and accurately predicts the relationships revealed in previous empirical measurements on real ratings. Finally, we show that HIALF can contribute to better recommendations by decoupling users’ real preference from distorted ratings, and reveal the intrinsic product quality for wiser consumer purchase decisions.

• #5113
Translation-based Recommendation: A Scalable Method for Modeling Sequential Behavior
Ruining He, Wang-Cheng Kang, Julian McAuley
Sister Conferences: Web, Recommendation, Retrieval

Modeling the complex interactions between users and items is at the core of designing successful recommender systems. One key task consists of predicting users’ personalized sequential behavior, where the challenge mainly lies in modeling ‘third-order’ interactions between a user, her previously visited item(s), and the next item to consume. In this paper, we propose a unified method, TransRec, to model such interactions for large-scale sequential prediction. Methodologically, we embed items into a ‘transition space’ where users are modeled as translation vectors operating on item sequences. Empirically, this approach outperforms the state-of-the-art on a wide spectrum of real-world datasets.

• #5148
Unbiased Learning-to-Rank with Biased Feedback
Thorsten Joachims, Adith Swaminathan, Tobias Schnabel
Sister Conferences: Web, Recommendation, Retrieval

Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user-centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data. Using this framework, we derive a propensity-weighted ranking SVM for discriminative learning from implicit feedback, where click models take the role of the propensity estimator. Beyond the theoretical support, we show empirically that the proposed learning method is highly effective in dealing with biases, that it is robust to noise and propensity model mis-specification, and that it scales efficiently. We also demonstrate the real-world applicability of our approach on an operational search engine, where it substantially improves retrieval performance.
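
The idea of propensity weighting mentioned in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each clicked document comes with a known, strictly positive probability of having been examined by the user; the function name and data layout are hypothetical and this is not the paper's ranking SVM.

```python
def ips_rank_loss(candidate_ranking, clicked_docs, examination_propensity):
    """Inverse-propensity-weighted rank loss for one logged query.

    candidate_ranking: document ids in the order a candidate ranker places them.
    clicked_docs: set of document ids that were clicked in the logged data.
    examination_propensity: dict mapping each clicked document id to the
        (assumed known, strictly positive) probability that the user examined
        it when it was originally presented.
    """
    loss = 0.0
    for position, doc in enumerate(candidate_ranking, start=1):
        if doc in clicked_docs:
            # A click on a rarely examined result counts for more,
            # compensating for position bias in the logged clicks.
            loss += position / examination_propensity[doc]
    return loss
```

For example, a click on a document that was examined with probability 0.5 and that the candidate ranker places second contributes 2 / 0.5 = 4 to the loss.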

• #5108
A Model of Distributed Query Computation in Client-Server Scenarios on the Semantic Web
Olaf Hartig, Ian Letter, Jorge Pérez
Sister Conferences: Web, Recommendation, Retrieval

This paper provides an overview of a model for capturing properties of client-server-based query computation setups. This model can be used to formally analyze different combinations of client and server capabilities, and compare them in terms of various fine-grain complexity measures. While the motivations and the focus of the presented work are related to querying the Semantic Web, the main concepts of the model are general enough to be applied in other contexts as well.

• #5109
Reducing Controversy by Connecting Opposing Views
Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, Michael Mathioudakis
Sister Conferences: Web, Recommendation, Retrieval

Controversial issues often split the population into groups with opposing views. When such issues emerge on social media, we often observe the creation of "echo chambers," i.e., situations where like-minded people reinforce each other’s opinion, but do not get exposed to the views of the opposing side. In this paper we study algorithmic techniques for bridging these chambers, and thus reduce controversy. Specifically, we represent discussions as graphs, and cast our objective as an edge-recommendation problem. The goal of the recommendation is to reduce the controversy score of the graph, measured by a recently-developed metric based on random walks. At the same time, we take into account the acceptance probability of the recommended edges, which represent the probability that the recommended edges materialize in the graph.

• #5127
Learning with Sparse and Biased Feedback for Personal Search
Michael Bendersky, Xuanhui Wang, Marc Najork, Donald Metzler
Sister Conferences: Web, Recommendation, Retrieval

Personal search, including email, on-device, and personal media search, has recently attracted considerable attention from the information retrieval community. In this paper, we provide an overview of challenges and opportunities of learning with implicit user feedback (e.g., click data) in personal search. Implicit user feedback provides a convenient source of supervision for ranking models in personal search. This feedback, however, has two major drawbacks: it is highly sparse and biased due to the personal nature of queries and documents. We demonstrate how these drawbacks can be overcome, and empirically demonstrate the benefits of learning with implicit feedback in the context of a large-scale email search engine.

• #5134
A Conversational Approach to Process-oriented Case-based Reasoning
Christian Zeyen, Gilbert Müller, Ralph Bergmann
Sister Conferences: Web, Recommendation, Retrieval

Process-oriented case-based reasoning (POCBR) supports workflow modeling by retrieving and adapting workflows that have proved useful in the past. Current approaches typically require users to specify detailed queries, which can be a demanding task. Conversational case-based reasoning (CCBR) particularly addresses this problem by proposing methods that incrementally elicit the relevant features of the target problem in an interactive dialog. However, no CCBR approaches exist that are applicable for workflow cases that go beyond attribute-value representations such as labeled graphs. This paper closes this gap and presents a conversational POCBR approach (C-POCBR) in which questions related to structural properties of the workflow cases are generated automatically. An evaluation with cooking workflows indicates that C-POCBR can reduce the communication effort for users during retrieval.

### Wednesday 18 08:30 - 09:55 ML-NN2 - Neural Networks (C3)

Chair: Nina Narodytska
• #1389
AAR-CNNs: Auto Adaptive Regularized Convolutional Neural Networks
Yao Lu, Guangming Lu, Yuanrong Xu, Bob Zhang
Neural Networks

In order to address the overfitting problem caused by the small or simple training datasets and the large model’s size in Convolutional Neural Networks (CNNs), a novel Auto Adaptive Regularization (AAR) method is proposed in this paper. The relevant networks can be called AAR-CNNs. AAR is the first method using the “abstraction extent” (predicted by AE net) and a tiny learnable module (SE net) to auto adaptively predict more accurate and individualized regularization information. The AAR module can be directly inserted into every stage of any popular networks and trained end to end to improve the networks’ flexibility. This method can not only regularize the network at both the forward and the backward processes in the training phase, but also regularize the network on a more refined level (channel or pixel level) depending on the abstraction extent’s form. Comparative experiments are performed on low resolution ImageNet, CIFAR and SVHN datasets. Experimental results show that the AAR-CNNs can achieve state-of-the-art performances on these datasets.

• #1638
A Unified Analysis of Stochastic Momentum Methods for Deep Learning
Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang
Neural Networks

Stochastic momentum methods have been widely adopted in training deep neural networks. However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method, and the stochastic momentum methods including two famous variants, i.e., the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive the convergence rates of the norm of gradient for the non-convex optimization problem, and analyze the generalization performance through the uniform stability approach. Particularly, the convergence analysis of the training objective exhibits that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve the generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning.
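
For readers unfamiliar with the three methods compared in this abstract, the sketch below writes out their textbook update rules on a scalar problem. It is not the paper's unified framework; the function names, the scalar setting, and the caller-supplied `grad` function (which may or may not add noise) are assumptions of this example.

```python
def run_sg(grad, x0, lr, steps):
    """Plain (stochastic) gradient: x <- x - lr * g(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def run_shb(grad, x0, lr, beta, steps):
    """Heavy-ball: x <- x - lr * g(x) + beta * (x - x_prev)."""
    x_prev, x = x0, x0
    for _ in range(steps):
        x_next = x - lr * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


def run_snag(grad, x0, lr, beta, steps):
    """Nesterov: y <- x - lr * g(x), then x <- y + beta * (y - y_prev)."""
    x, y_prev = x0, x0
    for _ in range(steps):
        y = x - lr * grad(x)
        x = y + beta * (y - y_prev)
        y_prev = y
    return x
```

Passing a noiseless gradient such as `lambda x: x` (the gradient of `0.5 * x**2`) makes all three routines deterministic, which is convenient for checking the update rules by hand.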

• #1745
DeepTravel: a Neural Network Based Travel Time Estimation Model with Auxiliary Supervision
Hanyuan Zhang, Hao Wu, Weiwei Sun, Baihua Zheng
Neural Networks

Estimating the travel time of a path is of great importance to smart urban mobility. Existing approaches are either based on estimating the time cost of each road segment, which cannot capture many complex cross-segment factors, or designed heuristically in a non-learning-based way, which fails to leverage the naturally abundant temporal labels of the data, i.e., the time stamp of each trajectory point. In this paper, we leverage new developments in deep neural networks and propose a novel auxiliary supervision model, namely DeepTravel, that can automatically and effectively extract different features, as well as make full use of the temporal labels of the trajectory data. We have conducted comprehensive experiments on real datasets to demonstrate that DeepTravel outperforms existing approaches.

• #4425
A Novel Data Representation for Effective Learning in Class Imbalanced Scenarios
Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu
Neural Networks

Class imbalance refers to the scenario where certain classes are highly under-represented compared to other classes in terms of the availability of training data. This situation hinders the applicability of conventional machine learning algorithms to most of the classification problems where class imbalance is prominent. Most existing methods addressing class imbalance either rely on sampling techniques or cost-sensitive learning methods; thus inheriting their shortcomings. In this paper, we introduce a novel approach that is different from sampling or cost-sensitive learning based techniques, to address the class imbalance problem, where two samples are simultaneously considered to train the classifier. Further, we propose a mechanism to use a single base classifier, instead of an ensemble of classifiers, to obtain the output label of the test sample using majority voting method. Experimental results on several benchmark datasets clearly indicate the usefulness of the proposed approach over the existing state-of-the-art techniques.

• #3225
CAGAN: Consistent Adversarial Training Enhanced GANs
Yao Ni, Dandan Song, Xi Zhang, Hao Wu, Lejian Liao
Neural Networks

Generative adversarial networks (GANs) have shown impressive results; however, the generator and the discriminator are optimized in finite parameter space, which means their performance still needs to be improved. In this paper, we propose a novel approach of adversarial training between one generator and an exponential number of critics which are sampled from the original discriminative neural network via dropout. As the discrepancy between the outputs of different sub-networks for the same sample can measure the consistency of these critics, we encourage the critics to be consistent on real samples and inconsistent on generated samples during training, while the generator is trained to generate consistent samples for different critics. Experimental results demonstrate that our method can obtain state-of-the-art Inception scores of 9.17 and 10.02 on supervised CIFAR-10 and unsupervised STL-10 image generation tasks, respectively, as well as achieve competitive semi-supervised classification results on several benchmarks. Importantly, we demonstrate that our method can maintain stability in training and alleviate mode collapse.

• #2361
Convolutional Memory Blocks for Depth Data Representation Learning
Keze Wang, Liang Lin, Chuangjie Ren, Wei Zhang, Wenxiu Sun
Neural Networks

Compared to natural RGB images, data captured by 3D / depth sensors (e.g., Microsoft Kinect) have different properties, e.g., less discriminable in appearance due to lacking color / texture information. Applying convolutional neural networks (CNNs) on these depth data would lead to unsatisfying learning efficiency, i.e., requiring large amounts of annotated training data for convergence. To address this issue, this paper proposes a novel memory network module, called Convolutional Memory Block (CMB), which empowers CNNs with the memory mechanism on handling depth data. Different from the existing memory networks that store long / short term dependency from sequential data, our proposed CMB focuses on modeling the representative dependency (correlation) among non-sequential samples. Specifically, our CMB consists of one internal memory (i.e., a set of feature maps) and three specific controllers, which enable a powerful yet efficient memory manipulation mechanism. In this way, the internal memory, being implicitly aggregated from all previous inputted samples, can learn to store and utilize representative features among the samples. Furthermore, we employ our CMB to develop a concise framework for predicting articulated pose from still depth images. Comprehensive evaluations on three public benchmarks demonstrate significant superiority (about 6%) of our framework over all the compared methods. More importantly, thanks to the enhanced learning efficiency, our framework can still achieve satisfying results using 50% less training data.

• #4025
Counterexample-Guided Data Augmentation
Tommaso Dreossi, Shromona Ghosh, Xiangyu Yue, Kurt Keutzer, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia
Neural Networks

We present a novel framework for augmenting data sets for machine learning based on counterexamples. Counterexamples are misclassified examples that have important properties for retraining and improving the model. Key components of our framework include a counterexample generator, which produces data items that are misclassified by the model, and error tables, a novel data structure that stores information pertaining to misclassifications. Error tables can be used to explain the model's vulnerabilities and are used to efficiently generate counterexamples for augmentation. We show the efficacy of the proposed framework by comparing it to classical augmentation techniques on a case study of object detection in autonomous driving based on deep neural networks.

### Wednesday 18 08:30 - 09:55 Open Session (K21)

Chair: Reinhard Lafrenz, Laure Le Bars
• A future European AI ecosystem and On-Demand platform
Open Session
### Wednesday 18 08:55 - 09:55 Industry Day (A4)

• Industry Day - Session 1a
Industry Day
### Wednesday 18 09:55 - 16:40 Competition (Registration Area)

Chair: Jochen Renz
• The Angry Birds Human vs Machine Challenge
Competition
### Wednesday 18 10:25 - 11:10 Invited Talk (VICTORIA)

Chair: Vincent Conitzer
• Maximizing the Social Good: Markets without Money
Nicole Immorlica
Invited Talk
### Wednesday 18 10:25 - 12:30 Open Session (K21)

Chair: Alibaba Group
• Alimama - Smart Advertising Workshop
Open Session
### Wednesday 18 10:25 - 12:45 Industry Day (A4)

• Industry Day - Session 1b
Industry Day
### Wednesday 18 11:20 - 12:45 SUR-ML - Survey Track: Machine Learning (VICTORIA)

Chair: Longbing Cao
• #5414
Robust Multi-view Representation: A Unified Perspective from Multi-view Learning to Domain Adaption
Zhengming Ding, Ming Shao, Yun Fu
Survey Track: Machine Learning

Multi-view data are extensively accessible nowadays thanks to various types of features, different view-points and sensors, which tend to facilitate better representation in many key applications. This survey covers the topic of robust multi-view data representation, centered around several major visual applications. First of all, we formulate a unified learning framework which is able to model most existing multi-view learning and domain adaptation in this line. Following this, we conduct a comprehensive discussion across these two problems by reviewing the algorithms along these two topics, including multi-view clustering, multi-view classification, zero-shot learning, and domain adaptation. We further present more practical challenges in multi-view data analysis. Finally, we discuss future research including incomplete, unbalanced, and large-scale multi-view learning. This would benefit the AI community from literature review to future direction.

• #5433
Systems AI: A Declarative Learning Based Programming Perspective
Parisa Kordjamshidi, Dan Roth, Kristian Kersting
Survey Track: Machine Learning

Data-driven approaches are becoming dominant problem-solving techniques in many areas of research and industry. Unfortunately, current technologies do not make such techniques easy to use for application experts who are not fluent in machine learning nor for machine learning experts who aim at testing ideas on real-world data and need to evaluate those as a part of an end-to-end system. We review key efforts made by various AI communities to provide languages for high-level abstractions over learning and reasoning techniques needed for designing complex AI systems. We classify the existing frameworks based on the type of techniques as well as the data and knowledge representations they use, provide a comparative study of the way they address the challenges of programming real-world applications, and highlight some shortcomings and future directions.

• #5437
Yanan Sui, Masrour Zoghi, Katja Hofmann, Yisong Yue
Survey Track: Machine Learning

The dueling bandits problem is an online learning framework where learning happens "on-the-fly" through preference feedback, i.e., from comparisons between a pair of actions. Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality of each pair of actions. The dueling bandits problem is well-suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form. In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem. As an emerging domain, the theories and algorithms of dueling bandits have been intensively studied during the past few years. We provide an overview of recent advancements, including algorithmic advances and applications. We discuss extensions to standard problem formulation and novel application areas, highlighting key open research questions in our discussion.
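
The feedback model described in this abstract, noisy binary outcomes of pairwise comparisons, can be simulated with a short sketch. The preference matrix, the uniform pair sampling, and the function names are assumptions for illustration only and do not correspond to any particular algorithm from the survey.

```python
import random


def duel(pref, i, j, rng):
    """Return True if arm i wins a noisy comparison against arm j.

    pref[i][j] is the (assumed) probability that arm i beats arm j.
    """
    return rng.random() < pref[i][j]


def empirical_win_rates(pref, n_rounds, rng):
    """Duel uniformly random pairs of arms and report each arm's win rate."""
    k = len(pref)
    wins = [0] * k
    plays = [0] * k
    for _ in range(n_rounds):
        i, j = rng.sample(range(k), 2)  # two distinct arms
        winner = i if duel(pref, i, j, rng) else j
        wins[winner] += 1
        plays[i] += 1
        plays[j] += 1
    return [w / p if p else 0.0 for w, p in zip(wins, plays)]
```

A learner in this setting only ever sees the outcome of `duel`, never a numeric reward for an individual arm, which is the key difference from the conventional bandit setting.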

• #5438
Boosting Combinatorial Problem Modeling with Machine Learning
Michele Lombardi, Michela Milano
Survey Track: Machine Learning

In the past few years, the area of Machine Learning (ML) has witnessed tremendous advancements, becoming a pervasive technology in a wide range of applications. One area that can significantly benefit from the use of ML is Combinatorial Optimization. The three pillars of constraint satisfaction and optimization problem solving, i.e., modeling, search, and optimization, can exploit ML techniques to boost their accuracy, efficiency and effectiveness. In this survey we focus on the modeling component, whose effectiveness is crucial for solving the problem. The modeling activity has been traditionally shaped by optimization and domain experts, interacting to provide realistic results. Machine Learning techniques can tremendously ease the process, and exploit the available data to either create models or refine expert-designed ones. In this survey we cover approaches that have been recently proposed to enhance the modeling process by learning either single constraints, objective functions, or the whole model. We highlight common themes to multiple approaches and draw connections with related fields of research.

### Wednesday 18 11:20 - 12:45 KR-MAS3 - Knowledge Representation and Agents: Arguing and Negotiating (C7)

Chair: Faria Nassiri-Mofakham
• #1752
Two Sides of the Same Coin: Belief Revision and Enforcing Arguments
Adrian Haret, Johannes P. Wallner, Stefan Woltran
Knowledge Representation and Agents: Arguing and Negotiating

We study a type of change on knowledge bases inspired by the dynamics of formal argumentation systems, where the goal is to enforce acceptance of certain arguments. We put forward that enforcing acceptance of arguments can be viewed as a member of the wider family of belief change operations, and that an axiomatic treatment of it is therefore desirable. In our case, laying down axioms enables a precise account of the close connection between enforcing arguments and belief revision. Our analysis of enforcing arguments proceeds by (i) axiomatizing it as an operation in propositional logic and providing a representation result in terms of rankings on sets of interpretations, (ii) showing that it stands in close relationship to belief revision, and (iii) using it as a gateway towards a principled treatment of enforcement in abstract argumentation.

• #504
A Study of Argumentative Characterisations of Preferred Subtheories
Marcello D'Agostino, Sanjay Modgil
Knowledge Representation and Agents: Arguing and Negotiating

Classical logic argumentation (Cl-Arg) under the stable semantics yields argumentative characterisations of non-monotonic inference in Preferred Subtheories. This paper studies these characterisations under both the standard approach to Cl-Arg, and a recent dialectical approach that is provably rational under resource bounds. Two key contributions are made. Firstly, the preferred extensions are shown to coincide with the stable extensions. This means that algorithms and proof theories for the admissible semantics can now be used to decide credulous inference in Preferred Subtheories. Secondly, we show that as compared with the standard approach, the grounded semantics applied to the dialectical approach more closely approximates sceptical inference in Preferred Subtheories.

• #2712
Probabilistic bipolar abstract argumentation frameworks: complexity results
Bettina Fazzinga, Sergio Flesca, Filippo Furfaro
Knowledge Representation and Agents: Arguing and Negotiating

Probabilistic Bipolar Abstract Argumentation Frameworks (prBAFs), combining the possibility of specifying supports between arguments with a probabilistic modeling of the uncertainty, are considered, and the complexity of the fundamental problem of computing extensions' probabilities is addressed. The most popular semantics of supports and extensions are considered, as well as different paradigms for defining the probabilistic encoding of the uncertainty. Interestingly, the presence of supports, which does not alter the complexity of verifying extensions in the deterministic case, is shown to introduce a new source of complexity in some probabilistic settings, for which tractable cases are also identified.

• #2977
An Empirical Study of Knowledge Tradeoffs in Case-Based Reasoning
Devi Ganesan, Sutanu Chakraborti
Knowledge Representation and Agents: Arguing and Negotiating

Case-Based Reasoning provides a framework for integrating domain knowledge with data in the form of four knowledge containers, namely Case base, Vocabulary, Similarity and Adaptation. It is a known fact in the Case-Based Reasoning community that knowledge can be interchanged between the containers. However, the explicit interplay between them, and how this interchange is affected by the knowledge richness of the underlying domain, is not yet fully understood. We attempt to bridge this gap by proposing footprint size reduction as a measure for quantifying knowledge tradeoffs between containers. The proposed measure is empirically evaluated on synthetic as well as real world datasets. From a practical standpoint, footprint size reduction provides a unified way of estimating the impact of a given piece of knowledge in any knowledge container, and can also suggest ways of characterizing the nature of domains ranging from ill-defined to well-defined ones. Our study also makes evident the need for maintenance approaches that go beyond case base and competence to include other containers and performance objectives.

• #4004
Relevance in Structured Argumentation
AnneMarie Borg, Christian Straßer
Knowledge Representation and Agents: Arguing and Negotiating

We study properties related to relevance in non-monotonic consequence relations obtained by systems of structured argumentation. Relevance desiderata concern the robustness of a consequence relation under the addition of irrelevant information. For an account of what (ir)relevance amounts to we use syntactic and semantic considerations. Syntactic criteria have been proposed in the domain of relevance logic and were recently used in argumentation theory under the names of non-interference and crash-resistance. The basic idea is that the conclusions of a given argumentative theory should be robust under adding information that shares no propositional variables with the original database. Some semantic relevance criteria are known from non-monotonic logic. For instance, cautious monotony states that if we obtain certain conclusions from an argumentation theory, we may expect to still obtain the same conclusions if we add some of them to the given database. In this paper we investigate properties of structured argumentation systems that warrant relevance desiderata.

• #1986
Argumentation-Based Recommendations: Fantastic Explanations and How to Find Them
Antonio Rago, Oana Cocarascu, Francesca Toni
Knowledge Representation and Agents: Arguing and Negotiating

A significant problem of recommender systems is their inability to explain recommendations, resulting in turn in ineffective feedback from users and the inability to adapt to users’ preferences. We propose a hybrid method for calculating predicted ratings, built upon an item/aspect-based graph with users’ partially given ratings, that can be naturally used to provide explanations for recommendations, extracted from user-tailored Tripolar Argumentation Frameworks (TFs). We show that our method can be understood as a gradual semantics for TFs, exhibiting a desirable, albeit weak, property of balance. We also show experimentally that our method is competitive in generating correct predictions, compared with state-of-the-art methods, and illustrate how users can interact with the generated explanations to improve quality of recommendations.

• #5462
(Journal track) On the Equivalence between Assumption-Based Argumentation and Logic Programming
Knowledge Representation and Agents: Arguing and Negotiating

In this work, we explain how Assumption-Based Argumentation (ABA) is subsumed by Logic Programming (LP). The translation from ABA to LP (with a few restrictions on the ABA framework) results in a normal logic program whose semantics coincide with the semantics of the underlying ABA framework. Although the precise technicalities are beyond the current extended abstract (these can be found in the associated full paper) we provide a number of examples to illustrate the general idea.

### Wednesday 18 11:20 - 12:45 CSAT-CSAT - Constraints and Satisfiability (K2)

Chair: Sophie Tourret
• #1908
A Fast Algorithm for Generalized Arc Consistency of the Alldifferent Constraint
Xizhe Zhang, Qian Li, Weixiong Zhang
Constraints and Satisfiability

The alldifferent constraint is an essential ingredient of most Constraint Satisfaction Problems (CSPs). It has been known that the generalized arc consistency (GAC) of alldifferent constraints can be reduced to the maximum matching problem in a value graph. The redundant edges, which do not appear in any maximum matching of the value graph, can and should be removed from the graph. The existing methods attempt to identify these redundant edges by computing the strongly connected components after finding a maximum matching for the graph. Here, we present a novel theorem for identification of the redundant edges. We show that some of the redundant edges can be immediately detected after finding a maximum matching. Based on this theoretical result, we present an efficient algorithm for processing alldifferent constraints. Experimental results on real problems show that our new algorithm significantly outperforms the state-of-the-art approaches.
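The reduction from GAC to matching mentioned in this abstract can be illustrated with a deliberately naive sketch. It is not the authors' algorithm and does not use strongly connected components: it simply keeps a value for a variable only if, after forcing that assignment, the remaining variables can still be matched to distinct values. The function names and the toy domains are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set


def max_matching(domains: Dict[str, Set[Hashable]]) -> Dict[str, Hashable]:
    """Maximum matching in the value graph, found with augmenting paths."""
    owner: Dict[Hashable, str] = {}  # value -> variable currently matched to it

    def try_assign(var: str, seen: Set[Hashable]) -> bool:
        for val in domains[var]:
            if val in seen:
                continue
            seen.add(val)
            # Take a free value, or re-route the variable that currently holds it.
            if val not in owner or try_assign(owner[val], seen):
                owner[val] = var
                return True
        return False

    for var in domains:
        try_assign(var, set())
    return {var: val for val, var in owner.items()}


def gac_alldifferent(domains: Dict[str, Set[Hashable]]) -> Dict[str, Set[Hashable]]:
    """Naive GAC: keep (var, val) only if a matching covering all variables uses it."""
    pruned = {}
    for var, dom in domains.items():
        keep = set()
        for val in dom:
            # Force var = val, then remove val from every other domain.
            rest = {v: d - {val} for v, d in domains.items() if v != var}
            if len(max_matching(rest)) == len(rest):
                keep.add(val)
        pruned[var] = keep
    return pruned


# Hypothetical toy instance: x1 and x2 compete for {1, 2}, so x3 must take 3.
domains = {"x1": {1, 2}, "x2": {1, 2}, "x3": {1, 2, 3}}
print(gac_alldifferent(domains))
```

This version recomputes a matching for every edge, which is far more work than the single matching plus graph analysis described in the abstract; it is meant only to make the notion of a redundant edge concrete.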

• #3777
Compact-MDD: Efficiently Filtering (s)MDD Constraints with Reversible Sparse Bit-sets
Hélène Verhaeghe, Christophe Lecoutre, Pierre Schaus
Constraints and Satisfiability

Multi-Valued Decision Diagrams (MDDs) are instrumental in modeling combinatorial problems with Constraint Programming. In this paper, we propose a related data structure called sMDD (semi-MDD) where the central layer of the diagrams is non-deterministic. We show that it is easy and efficient to transform any table (set of tuples) into an sMDD. We also introduce a new filtering algorithm, called Compact-MDD, which is based on bitwise operations, and can be applied to both MDDs and sMDDs. Our experimental results show the practical interest of our approach, both in terms of compression and filtering speed.

• #3873
Core-Guided Minimal Correction Set and Core Enumeration
Nina Narodytska, Nikolaj Bjørner, Maria-Cristina Marinescu, Mooly Sagiv
Constraints and Satisfiability

A set of constraints is unsatisfiable if there is no solution that satisfies these constraints. To analyse unsatisfiable problems, the user needs to understand where inconsistencies come from and how they can be repaired. Minimal unsatisfiable cores and correction sets are important subsets of constraints that enable such analysis. In this work, we propose a new algorithm for extracting minimal unsatisfiable cores and correction sets simultaneously. Building on top of the relaxation and strengthening framework, we introduce novel techniques for extracting these sets. Our new solver significantly outperforms several state of the art algorithms on common benchmarks when it comes to extracting correction sets and compares favorably on core extraction.

• #3860
A Reactive Strategy for High-Level Consistency During Search
Robert J. Woodward, Berthe Y. Choueiry, Christian Bessiere
Constraints and Satisfiability

Constraint propagation during backtrack search significantly improves the performance of solving a Constraint Satisfaction Problem. While Generalized Arc Consistency (GAC) is the most popular level of propagation, higher-level consistencies (HLC) are needed to solve difficult instances. Deciding to enforce an HLC instead of GAC remains the topic of active research. We propose a simple and effective strategy that reactively triggers an HLC by monitoring search performance: When search starts thrashing, we trigger an HLC, then conservatively revert to GAC. We detect thrashing by counting the number of backtracks at each level of the search tree and geometrically adjust the frequency of triggering an HLC based on its filtering effectiveness. We validate our approach on benchmark problems using Partition-One Arc-Consistency as an HLC. However, our strategy is generic and can be used with other higher-level consistency algorithms.

• #5125
(Sister Conferences Best Papers Track) Reduced Cost Fixing for Maximum Satisfiability
Fahiem Bacchus, Antti Hyttinen, Matti Järvisalo, Paul Saikko
Constraints and Satisfiability

Maximum satisfiability (MaxSAT) offers a competitive approach to solving NP-hard real-world optimization problems. While state-of-the-art MaxSAT solvers rely heavily on Boolean satisfiability (SAT) solvers, a recent trend, brought on by MaxSAT solvers implementing the so-called implicit hitting set (IHS) approach, is to integrate techniques from the realm of integer programming (IP) into the solving process. This allows for making use of additional IP solving techniques to further speed up MaxSAT solving. In this line of work, we investigate the integration of the technique of reduced cost fixing from the IP realm into IHS solvers, and empirically show that reduced cost fixing considerably speeds up a state-of-the-art MaxSAT solver implementing the IHS approach.

• #5119
(Sister Conferences Best Papers Track) Multi-Objective Optimization Through Pareto Minimal Correction Subsets
Miguel Terra-Neves, Inês Lynce, Vasco Manquinho
Constraints and Satisfiability

A Minimal Correction Subset (MCS) of an unsatisfiable constraint set is a minimal subset of constraints that, if removed, makes the constraint set satisfiable. MCSs enjoy a wide range of applications, such as finding approximate solutions to constrained optimization problems. However, existing work on applying MCS enumeration to optimization problems focuses on the single-objective case. In this work, Pareto Minimal Correction Subsets (Pareto-MCSs) are proposed for approximating the Pareto-optimal solution set of multi-objective constrained optimization problems. We formalize and prove an equivalence relationship between Pareto-optimal solutions and Pareto-MCSs. Moreover, Pareto-MCSs and MCSs can be connected in such a way that existing state-of-the-art MCS enumeration algorithms can be used to enumerate Pareto-MCSs. Finally, experimental results on the multi-objective virtual machine consolidation problem show that the Pareto-MCS approach is competitive with state-of-the-art algorithms.
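The definition of an MCS given at the start of this abstract can be made concrete with a brute-force sketch. The four Boolean constraints below are a hypothetical toy instance, and exhaustive enumeration is used only for clarity; it is not one of the MCS enumeration algorithms the abstract refers to, and it does not cover the Pareto-MCS extension.

```python
from itertools import combinations, product

# Hypothetical unsatisfiable constraint set over Booleans x and y.
constraints = {
    "c1": lambda x, y: x,
    "c2": lambda x, y: (not x) or y,  # x implies y
    "c3": lambda x, y: not y,
    "c4": lambda x, y: x or y,
}


def satisfiable(names):
    """True if some assignment to (x, y) satisfies every named constraint."""
    return any(
        all(constraints[n](x, y) for n in names)
        for x, y in product([False, True], repeat=2)
    )


def minimal_correction_subsets():
    """Enumerate MCSs by trying removal sets in order of increasing size."""
    names = sorted(constraints)
    found = []
    for size in range(len(names) + 1):
        for removed in combinations(names, size):
            removed_set = set(removed)
            # Skip supersets of an MCS already found: they are not minimal.
            if any(mcs <= removed_set for mcs in found):
                continue
            if satisfiable([n for n in names if n not in removed_set]):
                found.append(removed_set)
    return found


print(minimal_correction_subsets())
```

Because removal sets are visited smallest first, any correction set that does not contain a previously found MCS is itself minimal.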

• #5149
(Sister Conferences Best Papers Track) Dynamic Dependency Awareness for QBF
Constraints and Satisfiability

We provide the first proof complexity results for QBF dependency calculi. By showing that the reflexive resolution path dependency scheme admits exponentially shorter Q-resolution proofs on a known family of instances, we answer a question first posed by Slivovsky and Szeider (SAT 2014). Further, we introduce a new calculus in which a dependency scheme is applied dynamically. We demonstrate the further potential of this approach beyond that of the existing static system with an exponential separation.

### Wednesday 18 11:20 - 12:45 NLP-QA - Question Answering (T2)

Chair: Cynthia Matuszek
• #318
Quality Matters: Assessing cQA Pair Quality via Transductive Multi-View Learning
Xiaochi Wei, Heyan Huang, Liqiang Nie, Fuli Feng, Richang Hong, Tat-Seng Chua

Community-based question answering (cQA) sites have become important knowledge sharing platforms, as massive cQA pairs are archived, but the uneven quality of cQA pairs leaves information seekers unsatisfied. Various efforts have been dedicated to predicting the quality of cQA contents. Most of them concatenate different features into single vectors and then feed them into regression models. In fact, the quality of cQA pairs is influenced by different views, and the agreement among them is essential for quality assessment. Besides, the lack of labeled data significantly hinders the quality prediction performance. Toward this end, we present a transductive multi-view learning model. It is designed to find a latent common space by unifying and preserving information from various views, including question, answer, QA relevance, asker, and answerer. Additionally, rich information in the unlabeled test cQA pairs is utilized via transductive learning to enhance the representation ability of the common space. Extensive experiments on real-world datasets have validated the proposed model.

• #493
Curriculum Learning for Natural Answer Generation
Cao Liu, Shizhu He, Kang Liu, Jun Zhao

Because they provide responses in natural language, natural answers are favored in real-world Question Answering (QA) systems. Generative models learn to automatically generate natural answers from large-scale question answer pairs (QA-pairs). However, they suffer from the uncontrollable and uneven quality of QA-pairs crawled from the Internet. To address this problem, we propose a curriculum learning based framework for natural answer generation (CL-NAG), which is able to take full advantage of the valuable learning data from a noisy and uneven-quality corpus. Specifically, we employ two practical measures to automatically measure the quality (complexity) of QA-pairs. Based on the measurements, CL-NAG first utilizes simple and low-quality QA-pairs to learn a basic model, and then gradually learns to produce better answers with richer contents and more complete syntax based on more complex and higher-quality QA-pairs. In this way, all valuable information in the noisy and uneven-quality corpus can be fully exploited. Experiments demonstrate that CL-NAG outperforms state-of-the-art methods, increasing accuracy by 6.8% and 8.7% for simple and complex questions, respectively.

• #1507
Taiki Miyanishi, Jun-ichiro Hirayama, Atsunori Kanemura, Motoaki Kawanabe

We propose a physical-world question-answering (QA) method, where the system answers a text question about the physical world by searching a given sequence of sentences about daily-life episodes. To address various information needs in a physical world situation, the physical-world QA methods have to generate mixed-type responses (e.g. word sequence, word set, number, and time as well as a single word) according to the content of questions, after reading physical-world event stories. Most existing methods only provide words or choose answers from multiple candidates. In this paper, we use multiple decoders to generate a mixed-type answer encoding daily episodes with a memory architecture that can capture short- and long-term event dependencies. Results using house-activity stories show that the use of multiple decoders with memory components is effective for answering various physical-world QA questions.

• #1646
Towards Reading Comprehension for Long Documents
Yuanxing Zhang, Yangbin Zhang, Kaigui Bian, Xiaoming Li

• #4502
ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions
Soham Parikh, Ananya Sai, Preksha Nema, Mitesh Khapra

The task of Reading Comprehension with Multiple Choice Questions requires a human (or machine) to read a given {passage, question} pair and select one of the n given options. The current state-of-the-art model for this task first computes a question-aware representation for the passage and then selects the option which has the maximum similarity with this representation. However, when humans perform this task they do not just focus on option selection but use a combination of elimination and selection. Specifically, a human would first try to eliminate the most irrelevant option and then read the passage again in the light of this new information (and perhaps ignore portions corresponding to the eliminated option). This process could be repeated multiple times till the reader is finally ready to select the correct option. We propose ElimiNet, a neural network-based model which tries to mimic this process. Specifically, it has gates which decide whether an option can be eliminated given the {passage, question} pair and if so it tries to make the passage representation orthogonal to this eliminated option (akin to ignoring portions of the passage corresponding to the eliminated option). The model makes multiple rounds of partial elimination to refine the passage representation and finally uses a selection module to pick the best option. We evaluate our model on the recently released large scale RACE dataset and show that it outperforms the current state-of-the-art model on 7 out of the 13 question types in this dataset. Further, we show that taking an ensemble of our elimination-selection based method with a selection based method gives us an improvement of 3.1% over the best-reported performance on this dataset.

• #2379
Kaichun Yao, Libo Zhang, Tiejian Luo, Lili Tao, Yanjun Wu

We propose a novel neural network model that aims to generate diverse and human-like natural language questions. Our model not only directly captures the variability in possible questions by using a latent variable, but also generates certain types of questions by introducing an additional observed variable. We deploy our model in the generative adversarial network (GAN) framework and modify the discriminator so that it not only evaluates the question authenticity, but also predicts the question type. Our model is trained and evaluated on the question-answering dataset SQuAD, and the experimental results show that the proposed model is able to generate diverse and readable questions with the specified attribute.

• #2698
Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains
Yi Tay, Anh Tuan Luu, Siu Cheung Hui

Co-Attentions are highly effective attention mechanisms for text matching applications. Co-Attention enables the learning of pairwise attentions, i.e., learning to attend based on computing word-level affinity scores between two documents. However, text matching problems can exist in either symmetrical or asymmetrical domains. For example, paraphrase identification is a symmetrical task while question-answer matching and entailment classification are considered asymmetrical domains. In this paper, we argue that Co-Attention models in asymmetrical domains require different treatment as opposed to symmetrical domains, i.e., a concept of word-level directionality should be incorporated while learning word-level similarity scores. Hence, the standard inner product in real space commonly adopted in co-attention is not suitable. This paper leverages attractive properties of the complex vector space and proposes a co-attention mechanism based on the complex-valued inner product (Hermitian products). Unlike the real dot product, the dot product in complex space is asymmetric because the first item is conjugated. Aside from modeling and encoding directionality, our proposed approach also enhances the representation learning process. Extensive experiments on five text matching benchmark datasets demonstrate the effectiveness of our approach.
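The asymmetry claimed for the complex inner product can be checked numerically. The following is a minimal sketch using plain Python complex numbers; the helper names and the example vectors are assumptions, and it illustrates only the mathematical property, not the attention architecture proposed in the paper.

```python
def real_dot(u, v):
    """Ordinary dot product, symmetric in its arguments."""
    return sum(a * b for a, b in zip(u, v))


def hermitian_dot(u, v):
    """Complex inner product with the first argument conjugated."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


# Hypothetical word embeddings in a 2-dimensional complex space.
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 0 + 1j]

forward = hermitian_dot(u, v)
backward = hermitian_dot(v, u)

print(forward, backward)
# Swapping the arguments yields the complex conjugate, not the same score.
print(forward == backward.conjugate())
# The real dot product does not distinguish the two directions.
print(real_dot([1.0, 2.0], [3.0, 4.0]) == real_dot([3.0, 4.0], [1.0, 2.0]))
```

Since the two directions differ only in the sign of the imaginary part, the real part behaves like a symmetric similarity while the imaginary part carries the directional information.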

### Wednesday 18 11:20 - 12:45 CV-REC2 - Computer Vision: Recognition (T1)

Chair: Zhou Yu
• #105
Deep View-Aware Metric Learning for Person Re-Identification
Pu Chen, Xinyi Xu, Cheng Deng
Computer Vision: Recognition

Person re-identification remains a challenging issue due to the dramatic changes in visual appearance caused by the variations in camera views, human pose, and background clutter. In this paper, we propose a deep view-aware metric learning (DVAML) model, where image pairs with similar and dissimilar views are projected into different feature subspaces, which can discover the intrinsic relevance between image pairs from different aspects. Additionally, we employ multiple metrics to jointly learn feature subspaces on which the relevance between image pairs is explicitly captured, thus greatly improving the retrieval accuracy. Extensive experimental results on the CUHK01, CUHK03, and PRID2011 datasets demonstrate the superiority of our method compared with state-of-the-art approaches.

• #736
Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval
Xiawu Zheng, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, Feiyue Huang, Yanhua Yang
Computer Vision: Recognition

Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemes are typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, the deep features are coarsely extracted at image level rather than precisely at object level, and are thus disturbed by background clutter. On the other hand, training CNN features with a standard triplet loss is time consuming and incapable of learning discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that conquers these issues in a unified framework. Firstly, we introduce a novel centralized ranking loss (CRL), which achieves very efficient (1,000 times training speedup compared to the triplet loss) and discriminative feature learning by a "centralized" global pooling. Secondly, a weakly supervised attractive feature extraction is proposed, which segments object contours with top-down saliency. Consequently, the contours are integrated into the CNN response map to precisely extract features "within" the target object. Interestingly, we have discovered that the combination of CRL and weakly supervised learning can reinforce each other. We evaluate the performance of the proposed scheme on widely-used benchmarks including CUB200-2011 and CARS196. We have reported significant gains over the state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017] on CARS196, and 3.7% on CUB200-2011.

• #839
High Resolution Feature Recovering for Accelerating Urban Scene Parsing
Rui Zhang, Sheng Tang, Luoqi Liu, Yongdong Zhang, Jintao Li, Shuicheng Yan
Computer Vision: Recognition

Accuracy and speed are equally important in urban scene parsing. Most of the existing methods mainly focus on improving parsing accuracy, ignoring the problem of low inference speed due to large-sized input and high resolution feature maps. To tackle this issue, we propose a High Resolution Feature Recovering (HRFR) framework to accelerate a given parsing network. A Super-Resolution Recovering module is employed to recover features of large original-sized images from features of down-sampled input. Therefore, our framework can combine the advantages of (1) fast speed of networks with down-sampled input and (2) high accuracy of networks with large original-sized input. Additionally, we employ auxiliary intermediate supervision and boundary region re-weighting to facilitate the optimization of the network. Extensive experiments on the two challenging Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed HRFR framework, which can accelerate the scene parsing inference process by about 3.0x from 1/2 down-sampled input with negligible accuracy reduction.

• #1086
Semantic Locality-Aware Deformable Network for Clothing Segmentation
Wei Ji, Xi Li, Yueting Zhuang, Omar El Farouk Bourahla, Yixin Ji, Shihao Li, Jiabao Cui
Computer Vision: Recognition

Clothing segmentation is a challenging vision problem typically implemented within a fine-grained semantic segmentation framework. Different from conventional segmentation, clothing segmentation has some domain-specific properties such as texture richness, diverse appearance variations, non-rigid geometry deformations, and small sample learning. To deal with these points, we propose a semantic locality-aware segmentation model, which adaptively attaches an original clothing image with a semantically similar (e.g., appearance or pose) auxiliary exemplar by search. Through considering the interactions of the clothing image and its exemplar, more intrinsic knowledge about the locality manifold structures of clothing images is discovered to make the learning process of small sample problem more stable and tractable. Furthermore, we present a CNN model based on the deformable convolutions to extract the non-rigid geometry-aware features for clothing images. Experimental results demonstrate the effectiveness of the proposed model against the state-of-the-art approaches.

• #1116
MEnet: A Metric Expression Network for Salient Object Segmentation
Shulian Cai, Jiabin Huang, Delu Zeng, Xinghao Ding, John Paisley
Computer Vision: Recognition

Recent CNN-based saliency models have achieved excellent performance on public datasets, but most are sensitive to distortions from noise or compression. In this paper, we propose an end-to-end generic salient object segmentation model called Metric Expression Network (MEnet) to overcome this drawback. We construct a topological metric space where the implicit metric is determined by a deep network. In this latent space, we can group pixels within an observed image semantically into two regions, based on whether they are in a salient region or a non-salient region in the image. We carry out all feature extractions at the pixel level, which makes the output boundaries of the salient object finely-grained. Experimental results show that the proposed metric can generate robust salient maps that allow for object segmentation. By testing the method on several public benchmarks, we show that the performance of MEnet achieves excellent results. We also demonstrate that the proposed method outperforms previous CNN-based methods on distorted images.

• #1934
Cross-Modality Person Re-Identification with Generative Adversarial Training
Pingyang Dai, Rongrong Ji, Haibin Wang, Qiong Wu, Yuyu Huang
Computer Vision: Recognition

Person re-identification (Re-ID) is an important task in video surveillance which automatically searches and identifies people across different cameras. Despite the extensive Re-ID progress in RGB cameras, few works have studied the Re-ID between infrared and RGB images, which is essentially a cross-modality problem and widely encountered in real-world scenarios. The key challenge is twofold, i.e., the lack of discriminative information to re-identify the same person between RGB and infrared modalities, and the difficulty of learning a robust metric for such a large-scale cross-modality retrieval. In this paper, we tackle the above two challenges by proposing a novel cross-modality generative adversarial network (termed cmGAN). To handle the issue of insufficient discriminative information, we leverage the cutting-edge generative adversarial training to design our own discriminator to learn discriminative feature representation from different modalities. To handle the issue of large-scale cross-modality metric learning, we integrate both identification loss and cross-modality triplet loss, which minimize inter-class ambiguity while maximizing cross-modality similarity among instances. The entire cmGAN can be trained in an end-to-end manner by using a standard deep neural network framework. We have quantified the performance of our work on the newly-released SYSU RGB-IR Re-ID benchmark, and have reported superior performance, in terms of both Cumulative Match Characteristic curve (CMC) and Mean Average Precision (MAP), over the state-of-the-art works [Wu et al., 2017].

• #1707
SafeNet: Scale-normalization and Anchor-based Feature Extraction Network for Person Re-identification
Kun Yuan, Qian Zhang, Chang Huang, Shiming Xiang, Chunhong Pan
Computer Vision: Recognition

Person Re-identification (ReID) is a challenging retrieval task that requires matching a person's image across non-overlapping camera views. The quality of fulfilling this task is largely determined by the robustness of the features that are used to describe the person. In this paper, we show the advantage of jointly utilizing multi-scale abstract information to learn powerful features over full body and parts. A scale normalization module is proposed to balance different scales through residual-based integration. To exploit the information hidden in non-rigid body parts, we propose an anchor-based method to capture the local contents by stacking convolutions of kernels with various aspect ratios, which focus on different spatial distributions. Finally, a well-defined framework is constructed for simultaneously learning the representations of both full body and parts. Extensive experiments conducted on current challenging large-scale person ReID datasets, including Market1501, CUHK03 and DukeMTMC, demonstrate that our proposed method achieves state-of-the-art results.

### Wednesday 18 11:20 - 12:45 ML-FLS - Feature Selection, Learning Sparse Models (K11)

Chair: Zhangyang Wang
• #934
Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements
Canyi Lu, Jiashi Feng, Zhouchen Lin, Shuicheng Yan
Feature Selection, Learning Sparse Models

The recently proposed Tensor Nuclear Norm (TNN) [Lu et al., 2016; 2018a] is an interesting convex penalty induced by the tensor SVD [Kilmer and Martin, 2011]. It plays a role similar to that of the matrix nuclear norm, which is the convex surrogate of the matrix rank. Considering that the TNN based Tensor Robust PCA [Lu et al., 2018a] is an elegant extension of Robust PCA with a similar tight recovery bound, it is natural to solve other low rank tensor recovery problems extended from the matrix cases. However, the extensions and proofs are generally tedious. The general atomic norm provides a unified view of norms induced by low-complexity structures, e.g., the l1-norm and nuclear norm. The sharp estimates of the required number of generic measurements for exact recovery based on the atomic norm are known in the literature. In this work, with a careful choice of the atomic set, we prove that TNN is a special atomic norm. Then, by computing the Gaussian width of a certain cone, which is necessary for the sharp estimate, we achieve a simple bound for guaranteed low tubal rank tensor recovery from Gaussian measurements. Specifically, we show that by solving a TNN minimization problem, the underlying tensor of size n1×n2×n3 with tubal rank r can be exactly recovered when the given number of Gaussian measurements is O(r(n1+n2−r)n3). It is order optimal when compared with the degrees of freedom r(n1+n2−r)n3. Beyond the Gaussian mapping, we also give the recovery guarantee of tensor completion based on the uniform random mapping by TNN minimization. Numerical experiments verify our theoretical results.
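To give a sense of scale for the bound quoted in this abstract, the short sketch below evaluates the degrees-of-freedom expression r(n1+n2−r)n3 for one tensor shape. The function name and the chosen dimensions are assumptions for illustration, and the constant hidden in the O(·) notation is not modeled.

```python
def tubal_degrees_of_freedom(n1: int, n2: int, n3: int, r: int) -> int:
    """Evaluate r * (n1 + n2 - r) * n3, the expression quoted in the abstract."""
    return r * (n1 + n2 - r) * n3


# Hypothetical tensor shape and tubal rank.
n1, n2, n3, r = 100, 100, 20, 5

dof = tubal_degrees_of_freedom(n1, n2, n3, r)
total_entries = n1 * n2 * n3

print(dof)                  # degrees of freedom of the low tubal rank tensor
print(total_entries)        # number of entries in the full tensor
print(dof / total_entries)  # fraction of the ambient dimension
```

For this shape the degrees of freedom come to under a tenth of the number of entries, which is why a measurement count of that order is far smaller than observing the whole tensor.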

• #1004
Cuckoo Feature Hashing: Dynamic Weight Sharing for Sparse Analytics
Jinyang Gao, Beng Chin Ooi, Yanyan Shen, Wang-Chien Lee
Feature Selection, Learning Sparse Models

Feature hashing is widely used to process large-scale sparse features for learning predictive models. Collisions inherently happen in the hashing process and hurt model performance. In this paper, we develop a feature hashing scheme called Cuckoo Feature Hashing (CCFH), based on the principle behind Cuckoo hashing, a hashing scheme designed to resolve collisions. By providing multiple possible hash locations for each feature, CCFH prevents collisions between predictive features by dynamically hashing them into alternative locations during model training. Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH can achieve the same level of performance using only 15%-25% of the parameters required by conventional feature hashing.
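
The idea of giving each feature more than one candidate location can be illustrated with a small sketch. This is not the CCFH algorithm, whose details are not given in the abstract: it is a simplified greedy two-choice placement with no eviction and no training-time dynamics, and all function names, seeds, and sizes are assumptions made for illustration.

```python
import hashlib


def bucket(feature: str, seed: int, num_buckets: int) -> int:
    """Map a feature name to a bucket index with a seeded hash."""
    digest = hashlib.sha256(f"{seed}:{feature}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def single_hash_placement(features, num_buckets):
    """Conventional feature hashing: one fixed location per feature."""
    return {f: bucket(f, 0, num_buckets) for f in features}


def two_choice_placement(features, num_buckets):
    """Greedy placement with two candidate locations per feature."""
    occupied = set()
    placement = {}
    for f in features:
        first = bucket(f, 0, num_buckets)
        second = bucket(f, 1, num_buckets)
        if first not in occupied:
            chosen = first
        elif second not in occupied:
            chosen = second
        else:
            chosen = first  # both candidates taken: accept the collision
        occupied.add(chosen)
        placement[f] = chosen
    return placement


def count_collisions(placement):
    """Number of features that had to share an already-used bucket."""
    return len(placement) - len(set(placement.values()))


# Hypothetical feature names and table size.
features = [f"feature_{i}" for i in range(1000)]
single = single_hash_placement(features, num_buckets=2048)
double = two_choice_placement(features, num_buckets=2048)
print(count_collisions(single), count_collisions(double))
```

Because every feature's first candidate bucket is always occupied after that feature is placed, the two-choice placement never uses fewer distinct buckets than single hashing, so its collision count is never higher.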

• #3806
Stochastic Second-Order Method for Large-Scale Nonconvex Sparse Learning Models
Hongchang Gao, Heng Huang
Feature Selection, Learning Sparse Models

Sparse learning models have shown promising performance in high-dimensional machine learning applications. The main challenge of sparse learning models is how to optimize them efficiently. Most existing methods solve this problem by relaxing it to a convex problem, which incurs large estimation bias. Thus, the sparse learning model with a nonconvex constraint has attracted much attention due to its better performance, but it is difficult to optimize because of the non-convexity. In this paper, we propose a linearly convergent stochastic second-order method to optimize this nonconvex problem for large-scale datasets. The proposed method incorporates second-order information to improve the convergence speed. Theoretical analysis shows that our proposed method enjoys a linear convergence rate and is guaranteed to converge to the underlying true model parameter. Experimental results have verified the efficiency and correctness of our proposed method.

• #3529
Accelerated Difference of Convex functions Algorithm and its Application to Sparse Binary Logistic Regression
Duy Nhat Phan, Hoai Minh Le, Hoai An Le Thi
Feature Selection, Learning Sparse Models

In this work, we present a variant of DCA (Difference of Convex functions Algorithm) that aims to improve its convergence speed. The proposed algorithm, named Accelerated DCA (ADCA), incorporates Nesterov's acceleration technique into DCA. We first investigate ADCA for solving the standard DC program and rigorously study its convergence properties and convergence rate. Secondly, we develop ADCA for a special case of the standard DC program whose objective function is the sum of a differentiable function with L-Lipschitz gradient (possibly nonconvex) and a nonsmooth DC function. We exploit the special structure of the problem to propose an efficient DC decomposition for which the corresponding ADCA scheme is inexpensive. As an application, we consider the sparse binary logistic regression problem. Numerical experiments on several benchmark datasets illustrate the efficiency of our algorithm and its superiority over well-known methods.

• #1799
Unsupervised Deep Hashing via Binary Latent Factor Models for Large-scale Cross-modal Retrieval
Gengshen Wu, Zijia Lin, Jungong Han, Li Liu, Guiguang Ding, Baochang Zhang, Jialie Shen
Feature Selection, Learning Sparse Models

Despite its great success, matrix factorization based cross-modality hashing suffers from two problems: 1) there is no engagement between feature learning and binarization; and 2) most existing methods impose the relaxation strategy by discarding the discrete constraints when learning the hash function, which usually yields suboptimal solutions. In this paper, we propose a novel multimodal hashing framework, referred to as Unsupervised Deep Cross-Modal Hashing (UDCMH), for multimodal data search in a self-taught manner by integrating deep learning and matrix factorization with binary latent factor models. On one hand, our unsupervised deep learning framework enables the feature learning to be jointly optimized with the binarization. On the other hand, the hashing system based on the binary latent factor models can generate unified binary codes by solving a discrete-constrained objective function directly, with no need for a relaxation step. Moreover, novel Laplacian constraints are incorporated into the objective function, which make it possible to preserve not only the nearest neighbors that are commonly considered in the literature but also the farthest neighbors of data, even if the semantic labels are not available. Extensive experiments on multiple datasets highlight the superiority of the proposed framework over several state-of-the-art baselines.

• #1136
Improving Deep Neural Network Sparsity through Decorrelation Regularization
Xiaotian Zhu, Wengang Zhou, Houqiang Li
Feature Selection, Learning Sparse Models

Modern deep learning models usually suffer from high complexity in model size and computation when transplanted to resource-constrained platforms. To this end, many works are dedicated to compressing deep neural networks. Adding group LASSO regularization is one of the most effective model compression methods since it generates structured sparse networks. We investigate deep neural networks trained with a group LASSO constraint and observe that even with strong sparsity regularization imposed, there still exists substantial correlation among the convolution filters, which is undesirable for a compact neural network. We propose to suppress such correlation with a new kind of constraint called decorrelation regularization, which explicitly forces the network to learn a set of less correlated filters. The experiments on the CIFAR10/100 and ILSVRC2012 datasets show that when our decorrelation regularization is combined with group LASSO, the correlation between filters can be effectively weakened, which increases the sparsity of the resulting model and leads to better compression performance.
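
As a rough illustration of the two quantities this abstract discusses, the sketch below computes a standard group LASSO penalty (one group per flattened filter) and a simple filter-correlation diagnostic. The diagnostic (mean absolute Pearson correlation over filter pairs) is an assumption chosen for illustration; the abstract does not specify the form of the authors' decorrelation regularizer, and the example filters are hypothetical.

```python
import math


def group_lasso_penalty(filters, lam=1.0):
    """lam times the sum of the L2 norms of the filters (one group per filter)."""
    return lam * sum(math.sqrt(sum(w * w for w in f)) for f in filters)


def pearson(u, v):
    """Pearson correlation between two flattened filters of equal length."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    du = [x - mu_u for x in u]
    dv = [y - mu_v for y in v]
    denom = math.sqrt(sum(x * x for x in du) * sum(y * y for y in dv))
    if denom == 0.0:
        return 0.0  # a constant filter has no defined correlation; treat as 0
    return sum(x * y for x, y in zip(du, dv)) / denom


def mean_abs_correlation(filters):
    """Average absolute correlation over all distinct pairs of filters."""
    pairs = [(i, j) for i in range(len(filters)) for j in range(i + 1, len(filters))]
    if not pairs:
        return 0.0
    return sum(abs(pearson(filters[i], filters[j])) for i, j in pairs) / len(pairs)


# Hypothetical flattened filters: two are scaled copies, one is pruned to zero.
filters = [[3.0, 4.0, 0.0], [6.0, 8.0, 0.0], [0.0, 0.0, 0.0]]
print(group_lasso_penalty(filters))
print(mean_abs_correlation(filters))
```

In this toy example the first two filters are perfectly correlated even though the third has been driven to zero, which mirrors the observation that sparsity regularization alone does not remove redundancy among the surviving filters.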

• #2523
Efficient DNN Neuron Pruning by Minimizing Layer-wise Nonlinear Reconstruction Error
Chunhui Jiang, Guiying Li, Chao Qian, Ke Tang
Feature Selection, Learning Sparse Models

Deep neural networks (DNNs) have achieved great success, but their application to mobile devices is limited by their huge model size and low inference speed. Much effort has thus been devoted to pruning DNNs. Layer-wise neuron pruning methods, which minimize the reconstruction error of the linear response with a limited number of neurons in each single-layer pruning step, have shown their effectiveness. In this paper, we propose a new layer-wise neuron pruning approach that minimizes the reconstruction error of nonlinear units, which might be more reasonable since the error before and after activation can change significantly. An iterative optimization procedure combining greedy selection with gradient descent is proposed for single-layer pruning. Experimental results on benchmark DNN models show the superiority of the proposed approach. In particular, for VGGNet, the proposed approach can compress its disk space by 13.6× and bring a speedup of 3.7×; for AlexNet, it can achieve a compression rate of 4.1× and a speedup of 2.2×.

### Wednesday 18 11:20 - 12:45 MLA-BM - Biomedical Applications (C3)

Chair: Jonathan Rubin
• #704
Pairwise-Ranking based Collaborative Recurrent Neural Networks for Clinical Event Prediction
Zhi Qiao, Shiwan Zhao, Cao Xiao, Xiang Li, Yong Qin, Fei Wang
Biomedical Applications

Patient Electronic Health Records (EHR) data consist of sequences of patient visits over time. Sequential prediction of patients' future clinical events (e.g., diagnoses) from their historical EHR data is a core research task and motivates a series of predictive models, including deep learning models. Existing research mainly adopts a classification framework, which treats the observed and unobserved events as positive and negative classes. However, this assumption may not hold in real clinical settings, considering the high rate of missed diagnoses and human errors. In this paper, we propose to formulate the clinical event prediction problem as an event recommendation problem. An end-to-end pairwise-ranking based collaborative recurrent neural network (PacRNN) is proposed to solve it, which first embeds patient clinical contexts with an attention RNN, then uses Bayesian Personalized Ranking (BPR) regularized by disease co-occurrence to rank the probabilities of patient-specific diseases, and uses a point process to simultaneously predict the occurrence time of these diagnoses. Experimental results on two real-world EHR datasets demonstrate the robust performance, interpretability, and efficacy of PacRNN.
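
The pairwise-ranking idea can be sketched with the standard Bayesian Personalized Ranking loss, which asks only that an observed event score higher than an unobserved one rather than treating the unobserved event as a hard negative. This is a minimal sketch of the generic BPR loss; the attention RNN, the disease co-occurrence regularizer, and the point process from the abstract are omitted, and the scores below are hypothetical.

```python
import math


def bpr_loss(score_observed: float, score_unobserved: float) -> float:
    """Return -log(sigmoid(score_observed - score_unobserved)), computed stably."""
    d = score_observed - score_unobserved
    if d >= 0:
        return math.log1p(math.exp(-d))
    # For negative d, rewrite to avoid overflow in exp(-d).
    return -d + math.log1p(math.exp(d))


def mean_bpr_loss(score_pairs):
    """Average BPR loss over (observed, unobserved) score pairs."""
    return sum(bpr_loss(p, n) for p, n in score_pairs) / len(score_pairs)


# Hypothetical model scores for (observed diagnosis, unobserved diagnosis) pairs.
score_pairs = [(2.0, 0.5), (0.1, 0.3), (1.2, 1.2)]
print(mean_bpr_loss(score_pairs))
```

The loss is small when the observed event is ranked well above the unobserved one and grows roughly linearly when the ranking is reversed.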

• #1591
A Novel Neural Network Model based on Cerebral Hemispheric Asymmetry for EEG Emotion Recognition
Yang Li, Wenming Zheng, Zhen Cui, Tong Zhang, Yuan Zong
Biomedical Applications

In this paper, we propose a novel neural network model, called bi-hemispheres domain adversarial neural network (BiDANN), for EEG emotion recognition. BiDANN is motivated by neuroscience findings on the asymmetries between the left and right hemispheres of the emotional brain. The basic idea of BiDANN is to map the EEG feature data of the left and right hemispheres separately into discriminative feature spaces in which the data representations can be classified easily. To predict the class labels of testing data more precisely, we narrow the distribution shift between training and testing data by using one global and two local domain discriminators, which work adversarially against the classifier to encourage domain-invariant data representations to emerge. The classifier learned from labeled training data can then be applied naturally to unlabeled testing data. We conduct two experiments to verify the performance of our BiDANN model on the SEED database. The experimental results show that the proposed model achieves state-of-the-art performance.

• #2358
Predicting the Spatio-Temporal Evolution of Chronic Diseases in Population with Human Mobility Data
Yingzi Wang, Xiao Zhou, Anastasios Noulas, Cecilia Mascolo, Xing Xie, Enhong Chen
Biomedical Applications

Chronic diseases like cancer and diabetes are major threats to human life. Understanding the distribution and progression of chronic diseases of a population is important in assisting the allocation of medical resources as well as the design of policies in preemptive healthcare. Traditional methods to obtain large scale indicators on population health, e.g., surveys and statistical analysis, can be costly and time-consuming and often lead to a coarse spatio-temporal picture. In this paper, we leverage a dataset describing the human mobility patterns of citizens in a large metropolitan area. By viewing local human lifestyles we predict the evolution rate of several chronic diseases at the level of a city neighborhood. We apply the combination of a collaborative topic modeling (CTM) and a Gaussian mixture method (GMM) to tackle the data sparsity challenge and achieve robust predictions on health conditions simultaneously. Our method enables the analysis and prediction of disease rate evolution at fine spatio-temporal scales and demonstrates the potential of incorporating datasets from mobile web sources to improve population health monitoring. Evaluations using real-world check-in and chronic disease morbidity datasets in the city of London show that the proposed CTM+GMM model outperforms various baseline methods.

• #2613
Hybrid Approach of Relation Network and Localized Graph Convolutional Filtering for Breast Cancer Subtype Classification
Sungmin Rhee, Seokjun Seo, Sun Kim
Biomedical Applications

Network biology has been successfully used to help reveal complex mechanisms of disease, especially cancer. On the other hand, network biology requires in-depth knowledge to construct disease-specific networks, and our current knowledge is very limited even with the recent advances in human cancer biology. Deep learning has shown an ability to address problems like this. However, it has conventionally used grid-like structured data, so the application of deep learning technologies to human disease subtypes is yet to be explored. To overcome this issue, we propose a hybrid model that integrates two key components: 1) a graph convolution neural network (graph CNN) and 2) a relation network (RN). Experimental results on synthetic data and breast cancer data demonstrate that our proposed method performs better than existing methods.

• #4414
Joint Learning of Phenotypes and Diagnosis-Medication Correspondence via Hidden Interaction Tensor Factorization
Kejing Yin, William K. Cheung, Yang Liu, Benjamin C. M. Fung, Jonathan Poon
Biomedical Applications

Non-negative tensor factorization has been shown to be effective for discovering phenotypes from EHR data with minimal human supervision. In most cases, an interaction tensor of the elements in the EHR (e.g., diagnoses and medications) has to be established before the factorization can be applied. Such correspondence information, however, is often missing. While different heuristics can be used to estimate the missing correspondence, any errors introduced will in turn cause inaccuracy in the subsequent phenotype discovery task. This is especially true for patients diagnosed with multiple diseases (e.g., under critical care). To alleviate this limitation, we propose the hidden interaction tensor factorization (HITF), in which the diagnosis-medication correspondence and the underlying phenotypes are inferred simultaneously. We formulate it under a Poisson non-negative tensor factorization framework and learn the HITF model via maximum likelihood estimation. For performance evaluation, we applied HITF to the MIMIC III dataset. Our empirical results show that both the phenotypes and the correspondence inferred are clinically meaningful. In addition, the inferred HITF model outperforms a number of state-of-the-art methods for mortality prediction.

• #2398
Interpretable Drug Target Prediction Using Deep Neural Representation
Kyle Yingkai Gao, Achille Fokoue, Heng Luo, Arun Iyengar, Sanjoy Dey, Ping Zhang
Biomedical Applications

The identification of drug-target interactions (DTIs) is a key task in drug discovery, where drugs are chemical compounds and targets are proteins. Traditional DTI prediction methods are either time consuming (simulation-based methods) or heavily dependent on domain expertise (similarity-based and feature-based methods). In this work, we propose an end-to-end neural network model that predicts DTIs directly from low-level representations. In addition to making predictions, our model provides biological interpretation using a two-way attention mechanism. Instead of using simplified settings where a dataset is evaluated as a whole, we designed an evaluation dataset from BindingDB following more realistic settings where predictions of unobserved examples (proteins and drugs) have to be made. We experimentally compared our model with matrix factorization, similarity-based methods, and a previous deep learning approach. Overall, the results show that our model outperforms other approaches without requiring domain knowledge and feature engineering. In a case study, we illustrated the ability of our approach to provide biological insights to interpret the predictions.

• #843
Drug Similarity Integration Through Attentive Multi-view Graph Auto-Encoders
Tengfei Ma, Cao Xiao, Jiayu Zhou, Fei Wang
Biomedical Applications

Drug similarity has been studied to support downstream clinical tasks such as inferring novel properties of drugs (e.g. side effects, indications, interactions) from known properties. The growing availability of new types of drug features brings the opportunity of learning a more comprehensive and accurate drug similarity that represents the full spectrum of underlying drug relations. However, it is challenging to integrate this heterogeneous, noisy, nonlinearly related information to learn accurate similarity measures, especially when labels are scarce. Moreover, there is a trade-off between accuracy and interpretability. In this paper, we propose to learn accurate and interpretable similarity measures from multiple types of drug features. In particular, we model the integration using multi-view graph auto-encoders, and add an attentive mechanism to determine the weights for each view with respect to the corresponding tasks and features for better interpretability. Our model has a flexible design for both semi-supervised and unsupervised settings. Experimental results demonstrated significant improvement in predictive accuracy. Case studies also showed better model capacity (e.g. embedding node features) and interpretability.

### Wednesday 18 11:20 - 13:00 MAS-PS - Agents and Planning (C8)

Chair: Satoshi Kurihara
• #304
Managing Communication Costs under Temporal Uncertainty
Nikhil Bhargava, Christian Muise, Tiago Vaquero, Brian Williams
Agents and Planning

In multi-agent temporal planning, individual agents cannot know a priori when other agents will execute their actions and so treat those actions as uncertain. Only when others communicate the results of their actions is that uncertainty resolved. If a full communication protocol is specified ahead of time, then delay controllability can be used to assess the feasibility of the temporal plan. However, agents often have flexibility in choosing when to communicate the results of their action. In this paper, we address the question of how to choose communication protocols that guarantee the feasibility of the original temporal plan subject to some cost associated with that communication. To do so, we introduce a means of extracting delay controllability conflicts and show how we can use these conflicts to more efficiently guide our search. We then present three conflict-directed search algorithms and explore the theoretical and empirical trade-offs between the different approaches.

• #1701
Traffic Light Scheduling, Value of Time, and Incentives
Argyrios Deligkas, Erez Karpas, Ron Lavi, Rann Smorodinsky
Agents and Planning

We study the intersection signalling control problem for cars with heterogeneous valuations of time (VoT). We are interested in a control algorithm that has some desirable properties: (1) it induces cars to report their VoT truthfully, (2) it minimizes the value of time lost for cars waiting at the intersection, and (3) it is computationally efficient. We obtain three main results: (1) We describe a computationally efficient heuristic forward search approach to solve the static problem. Simulation results show that this method is significantly faster than the dynamic-programming approach to the static problem (which is itself polynomial time). We therefore believe that our algorithm can be commercially implemented. (2) We extend the solution of the static problem to the dynamic case. We couple our algorithm with a carefully designed payment scheme, which yields an incentive-compatible mechanism. In other words, it is in the best interest of each car to truthfully report its VoT. (3) We describe simulation results that compare the social welfare obtained by our scheduling algorithm, as measured by the total value of waiting time, to the social welfare obtained by other intersection signalling control methods.

• #2032
Multi-Agent Path Finding with Deadlines
Hang Ma, Glenn Wagner, Ariel Felner, Jiaoyang Li, T. K. Satish Kumar, Sven Koenig
Agents and Planning

We formalize Multi-Agent Path Finding with Deadlines (MAPF-DL). The objective is to maximize the number of agents that can reach their given goal vertices from their given start vertices within the deadline, without colliding with each other. We first show that MAPF-DL is NP-hard to solve optimally. We then present two classes of optimal algorithms, one based on a reduction of MAPF-DL to a flow problem and a subsequent compact integer linear programming formulation of the resulting reduced abstracted multi-commodity flow network and the other one based on novel combinatorial search algorithms. Our empirical results demonstrate that these MAPF-DL solvers scale well and each one dominates the other ones in different scenarios.

• #1905
Scalable Initial State Interdiction for Factored MDPs
Swetasudha Panda, Yevgeniy Vorobeychik
Agents and Planning

We propose a novel Stackelberg game model of MDP interdiction in which the defender modifies the initial state of the planner, who then responds by computing an optimal policy starting with that state. We first develop a novel approach for MDP interdiction in factored state space that allows the defender to modify the initial state. The resulting approach can be computationally expensive for large factored MDPs. To address this, we develop several interdiction algorithms that leverage variations of reinforcement learning using both linear and non-linear function approximation. Finally, we extend the interdiction framework to consider a Bayesian interdiction problem in which the interdictor is uncertain about some of the planner's initial state features. Extensive experiments demonstrate the effectiveness of our approaches.

• #3392
Solving Patrolling Problems in the Internet Environment
Tomas Brazdil, Antonin Kucera, Vojtech Rehak
Agents and Planning

We propose an algorithm for constructing efficient patrolling strategies in the Internet environment, where the protected targets are nodes connected to the network and the patrollers are software agents capable of detecting/preventing undesirable activities on the nodes. The algorithm is based on a novel compositional principle designed for a special class of strategies, and it can quickly construct (sub)optimal solutions even if the number of targets reaches hundreds of millions.

• #3627
Counterplanning using Goal Recognition and Landmarks
Alberto Pozanco, Yolanda E-Martín, Susana Fernández, Daniel Borrajo
Agents and Planning

In non-cooperative multi-agent systems, agents might want to prevent their opponents from achieving their goals. One way to solve this task is to use counterplanning to generate a plan that allows an agent to block others from reaching their goals. In this paper, we introduce a fully automated domain-independent approach for counterplanning. It combines goal recognition to infer an opponent's goal; landmark computation to identify subgoals that can be used to block the achievement of the opponent's goals; and classical automated planning to generate plans that prevent the opponent from achieving those goals. Experimental results in several domains show the benefits of our novel approach.

• #3903
A Decentralised Approach to Intersection Traffic Management
Huan Vu, Samir Aknine, Sarvapali D. Ramchurn
Agents and Planning

Traffic congestion has a significant impact on quality of life and the economy. This paper presents a decentralised traffic management mechanism for intersections using a distributed constraint optimisation approach (DCOP). Our solution outperforms the state of the art solution both for stable traffic conditions (about 60% reduced waiting time) and robustness to unpredictable events.

• #1289
Extended Increasing Cost Tree Search for Non-Unit Cost Domains
Thayne T. Walker, Nathan R. Sturtevant, Ariel Felner
Agents and Planning

Multi-agent pathfinding (MAPF) has applications in navigation, robotics, games and planning. Most work on search-based optimal algorithms for MAPF has focused on simple domains with unit cost actions and unit time steps. Although these constraints keep many aspects of the algorithms simple, they also severely limit the domains that can be used. In this paper we introduce a new definition of the MAPF problem for non-unit cost and non-unit time step domains along with new multiagent state successor generation schemes for these domains. Finally, we define an extended version of the increasing cost tree search algorithm (ICTS) for non-unit costs, with two new sub-optimal variants of ICTS: epsilon-ICTS and w-ICTS. Our experiments show that higher quality sub-optimal solutions are achievable in domains with finely discretized movement models in no more time than lower-quality, optimal solutions in domains with coarsely discretized movement models.

### Wednesday 18 11:20 - 13:00 ML-RLA - Reinforcement Learning and Applications (C2)

Chair: Matthew E. Taylor
• #172
Impression Allocation for Combating Fraud in E-commerce Via Deep Reinforcement Learning with Action Norm Penalty
Mengchen Zhao, Zhao Li, Bo An, Haifeng Lu, Yifan Yang, Chen Chu
Reinforcement Learning and Applications

Conducting fraud transactions has become popular among e-commerce sellers to make their products favorable to the platform and buyers, which decreases the utilization efficiency of buyer impressions and jeopardizes the business environment. Fraud detection techniques are necessary but not sufficient for the platform, since it is impossible to recognize all fraud transactions. In this paper, we focus on improving the platform's impression allocation mechanism to maximize its profit and reduce the sellers' fraudulent behaviors simultaneously. First, we learn a seller behavior model to predict the sellers' fraudulent behaviors from real-world data provided by one of the largest e-commerce companies in the world. Then, we formulate the platform's impression allocation problem as a continuous Markov Decision Process (MDP) with unbounded action space. In order to make the action executable in practice and facilitate learning, we propose a novel deep reinforcement learning algorithm, DDPG-ANP, that introduces an action norm penalty to the reward function. Experimental results show that our algorithm significantly outperforms existing baselines in terms of scalability and solution quality.

• #523
StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization
Xiangteng He, Yuxin Peng, Junjie Zhao
Reinforcement Learning and Applications

Fine-grained visual categorization (FGVC) is the discrimination of similar subcategories, whose main challenge is to localize the subtle visual distinctions between them. There are two pivotal problems: discovering which region is discriminative and representative, and determining how many discriminative regions are necessary to achieve the best performance. Existing methods generally solve these two problems by relying on prior knowledge or experimental validation, which severely restricts the usability and scalability of FGVC. To address the "which" and "how many" problems adaptively and intelligently, this paper proposes a stacked deep reinforcement learning approach (StackDRL). It adopts a two-stage learning architecture, which is driven by a semantic reward function. Two-stage learning localizes the object and its parts in sequence ("which"), and determines the number of discriminative regions adaptively ("how many"), which is quite appealing in FGVC. The semantic reward function drives StackDRL to fully learn the discriminative and conceptual visual information by jointly combining the attention-based reward and the category-based reward. Furthermore, unsupervised discriminative localization avoids the heavy labor cost of labeling, and greatly strengthens the usability and scalability of our StackDRL approach. Compared with ten state-of-the-art methods on the CUB-200-2011 dataset, our StackDRL approach achieves the best categorization accuracy.

• #2177
Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation
Guiliang Liu, Oliver Schulte
Reinforcement Learning and Applications

A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q function from 3M play-by-play events in the National Hockey League (NHL). The neural network representation integrates both continuous context signals and game history, using a possession-based LSTM. The learned Q-function is used to value players' actions under different game contexts. To assess a player's overall performance, we introduce a novel Game Impact Metric (GIM) that aggregates the values of the player's actions. Empirical evaluation shows that GIM is consistent throughout a play season, and correlates highly with standard success measures and future salary.

• #4005
Hashing over Predicted Future Frames for Informed Exploration of Deep Reinforcement Learning
Haiyan Yin, Jianda Chen, Sinno Jialin Pan
Reinforcement Learning and Applications

In deep reinforcement learning (RL) tasks, an efficient exploration mechanism should be able to encourage an agent to take actions that lead to less frequent states which may yield higher accumulative future return. However, both knowing about the future and evaluating the frequentness of states are non-trivial tasks, especially for deep RL domains, where a state is represented by high-dimensional image frames. In this paper, we propose a novel informed exploration framework for deep RL, where we build the capability for an RL agent to predict over the future transitions and evaluate the frequentness for the predicted future frames in a meaningful manner. To this end, we train a deep prediction model to predict future frames given a state-action pair, and a convolutional autoencoder model to hash over the seen frames. In addition, to utilize the counts derived from the seen frames to evaluate the frequentness for the predicted frames, we tackle the challenge of matching the predicted future frames and their corresponding seen frames at the latent feature level. In this way, we derive a reliable metric for evaluating the novelty of the future direction pointed by each action, and hence inform the agent to explore the least frequent one.

• #2661
A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization
Li Wang, Junlin Yao, Yunzhe Tao, Li Zhong, Wei Liu, Qiang Du
Reinforcement Learning and Applications

In this paper, we propose a deep learning approach to tackle the automatic summarization tasks by incorporating topic information into the convolutional sequence-to-sequence (ConvS2S) model and using self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. On the other hand, reinforcement training, like SCST, directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids the exposure bias during inference. We carry out the experimental evaluation with state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in the abstractive summarization.

• #2840
Beyond the Click-Through Rate: Web Link Selection with Multi-level Feedback
Kun Chen, Kechao Cai, Longbo Huang, John C.S. Lui
Reinforcement Learning and Applications

• #4272
Learning Environmental Calibration Actions for Policy Self-Evolution
Chao Zhang, Yang Yu, Zhi-Hua Zhou
Reinforcement Learning and Applications

Reinforcement learning in the physical world is often expensive, so simulators are commonly employed to train policies. Due to simulation error, policies trained in a simulator are hard to deploy directly in the physical world. Therefore, how to efficiently reuse these policies in the real environment is a key issue. To address this issue, this paper presents a policy self-evolution process: in the target environment, the agent first executes a few calibration actions to perceive the environment, and then reuses the previous policies according to its observation of the environment. In this way, the mission of policy learning in the target environment is reduced to the task of environment identification through executing the calibration actions, which needs far fewer samples than learning a policy from scratch. We propose the POSEC (POlicy Self-Evolution by Calibration) approach, which learns the most informative calibration actions for policy self-evolution. Taking three robotic arm control tasks as test beds, we show that the proposed method can learn a fine policy for a new arm with only a few (e.g. five) samples of the target environment.

• #5135
(Sister Conferences Best Papers Track) Importance Sampling for Fair Policy Selection
Shayan Doroudi, Philip S. Thomas, Emma Brunskill
Reinforcement Learning and Applications

We consider the problem of off-policy policy selection in reinforcement learning: using historical data generated from running one policy to compare two or more policies. We show that approaches based on importance sampling can be unfair---they can select the worse of two policies more often than not. We then give an example that shows importance sampling is systematically unfair in a practically relevant setting; namely, we show that it unreasonably favors shorter trajectory lengths. We then present sufficient conditions to theoretically guarantee fairness. Finally, we provide a practical importance sampling-based estimator to help mitigate the unfairness due to varying trajectory lengths.
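For readers unfamiliar with the setting, the following is a minimal sketch of ordinary trajectory-wise importance sampling for off-policy evaluation. It is the textbook estimator, not the fairness-aware estimator proposed in the paper, and the names `is_estimate`, `pi_eval`, and `pi_behavior` are hypothetical. Each trajectory's return is reweighted by the product of per-step probability ratios; because that product has one factor per step, the weights can behave very differently for short and long trajectories.

```python
from typing import Callable, List, Tuple

# One step of a trajectory: (state, action, reward).
Step = Tuple[int, int, float]
# A policy maps (state, action) to the probability of taking that action.
Policy = Callable[[int, int], float]


def is_estimate(trajectories: List[List[Step]],
                pi_eval: Policy,
                pi_behavior: Policy) -> float:
    """Ordinary importance sampling estimate of pi_eval's expected return.

    Assumes the trajectories were generated by pi_behavior and that
    pi_behavior assigns non-zero probability to every logged action.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0   # product of per-step probability ratios
        ret = 0.0      # undiscounted return of this trajectory
        for state, action, reward in trajectory:
            weight *= pi_eval(state, action) / pi_behavior(state, action)
            ret += reward
        total += weight * ret
    return total / len(trajectories)


# Illustrative data: a uniform behavior policy over two actions, and an
# evaluation policy that always takes action 1 (which pays reward 1).
def uniform(state: int, action: int) -> float:
    return 0.5


def always_one(state: int, action: int) -> float:
    return 1.0 if action == 1 else 0.0


logged = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
estimate = is_estimate(logged, always_one, uniform)
```

Policy selection would then compare such estimates across candidate policies; the abstract's point is that this comparison can pick the worse policy more often than not.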

### Wednesday 18, 11:20 - 13:00: CV-VID - Video: Events, Activities, Surveillance, Question Answering (T5)

Chair: Hongyuan Zhu
• #953
Watching a Small Portion could be as Good as Watching All: Towards Efficient Video Classification
Hehe Fan, Zhongwen Xu, Linchao Zhu, Chenggang Yan, Jianjun Ge, Yi Yang
Video: Events, Activities, Surveillance, Question Answering

We aim to significantly reduce the computational cost for classification of temporally untrimmed videos while retaining similar accuracy. Existing video classification methods sample frames with a predefined frequency over the entire video. In contrast, we propose an end-to-end deep reinforcement approach which enables an agent to classify videos by watching a very small portion of frames, as humans do. We make two main contributions. First, information is not equally distributed in video frames along time. An agent needs to watch more carefully when a clip is informative and skip the frames if they are redundant or irrelevant. The proposed approach enables the agent to adapt the sampling rate to video content and skip most of the frames without loss of information. Second, in order to reach a confident decision, the number of frames that should be watched by an agent varies greatly from one video to another. We incorporate an adaptive stop network to measure the confidence score and generate a timely trigger to stop the agent watching videos, which improves efficiency without loss of accuracy. Our approach reduces the computational cost significantly for the large-scale YouTube-8M dataset, while the accuracy remains the same.

• #86
Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking
Mang Ye, Zheng Wang, Xiangyuan Lan, Pong C. Yuen
Video: Events, Activities, Surveillance, Question Answering

Cross-modality person re-identification between the thermal and visible domains is extremely important for night-time surveillance applications. Existing works in this field mainly focus on learning sharable feature representations to handle the cross-modality discrepancies. However, besides the cross-modality discrepancy caused by different camera spectrums, visible thermal person re-identification also suffers from large cross-modality and intra-modality variations caused by different camera views and human poses. In this paper, we propose a dual-path network with a novel bi-directional dual-constrained top-ranking loss to learn discriminative feature representations. It is advantageous in two aspects: 1) it learns features end-to-end directly from the data without extra metric learning steps, 2) it simultaneously handles the cross-modality and intra-modality variations to ensure the discriminability of the learnt representations. Meanwhile, identity loss is further incorporated to model the identity-specific information to handle large intra-class variations. Extensive experiments on two datasets demonstrate superior performance compared to the state of the art.

• #332
PredCNN: Predictive Learning with Cascade Convolutions
Ziru Xu, Yunbo Wang, Mingsheng Long, Jianmin Wang
Video: Events, Activities, Surveillance, Question Answering

Predicting future frames in videos remains an unsolved and challenging problem. Mainstream recurrent models suffer from huge memory usage and computation cost, while convolutional models are unable to effectively capture the temporal dependencies between consecutive video frames. To tackle this problem, we introduce an entirely CNN-based architecture, PredCNN, that models the dependencies between the next frame and the sequential video inputs. Inspired by the core idea of recurrent models that previous states have more transition operations than future states, we design a cascade multiplicative unit (CMU) that provides relatively more operations for previous video frames. This newly proposed unit enables PredCNN to predict future spatiotemporal data without any recurrent chain structures, which eases gradient propagation and enables fully parallel optimization. We show that PredCNN outperforms the state-of-the-art recurrent models for video prediction on the standard Moving MNIST dataset and two challenging crowd flow prediction datasets, and achieves a faster training speed and lower memory footprint.

• #3663
Crowd Counting using Deep Recurrent Spatial-Aware Network
Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin
Video: Events, Activities, Surveillance, Question Answering

Crowd counting from unconstrained scene images is a crucial task in many real-world applications like urban surveillance and management, but it is greatly challenged by the camera’s perspective that causes huge appearance variations in people’s scales and rotations. Conventional methods address such challenges by resorting to fixed multi-scale architectures that are often unable to cover the largely varied scales while ignoring the rotation variations. In this paper, we propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network, which adaptively addresses the two issues in a learnable spatial transform module with a region-wise refinement process. Specifically, our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module iteratively conducting two components: i) a Spatial Transformer Network that dynamically locates an attentional region from the crowd density map and transforms it to the suitable scale and rotation for optimal crowd estimation; ii) a Local Refinement Network that refines the density map of the attended region with residual learning. Extensive experiments on four challenging benchmarks show the effectiveness of our approach. Specifically, compared with the existing best-performing methods, we achieve an improvement of 12% on the largest dataset, WorldExpo’10, and 22.8% on the most challenging dataset, UCF_CC_50.

• #3410
Video Captioning with Tube Features
Bin Zhao, Xuelong Li, Xiaoqiang Lu
Video: Events, Activities, Surveillance, Question Answering

Visual features play an important role in the video captioning task. Since video content is mainly composed of the activities of salient objects, the caption quality of current approaches is restricted because they focus on global frame features while paying less attention to the salient objects. To tackle this problem, in this paper, we design an object-aware feature for video captioning, denoted as tube feature. Firstly, Faster-RCNN is employed to extract object regions in frames, and a tube generation method is developed to connect the regions from different frames but belonging to the same object. After that, an encoder-decoder architecture is constructed for video caption generation. Specifically, the encoder is a bi-directional LSTM, which is utilized to capture the dynamic information of each tube. The decoder is a single LSTM extended with an attention model, which enables our approach to adaptively attend to the most correlated tubes when generating the caption. We evaluate our approach on two benchmark datasets: MSVD and Charades. The experimental results demonstrate the effectiveness of the tube feature in the video captioning task.

• #511
Visual Data Synthesis via GAN for Zero-Shot Video Classification
Chenrui Zhang, Yuxin Peng
Video: Events, Activities, Surveillance, Question Answering

Zero-Shot Learning (ZSL) in video classification is a promising research direction, which aims to tackle the challenge posed by the explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation via learning a projection between visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in the data distribution, and commonly suffer from the information degradation issue caused by the "heterogeneity gap". In this paper, we propose a visual data synthesis framework via GAN to address these problems. Specifically, both semantic knowledge and visual distribution are leveraged to synthesize video features of unseen categories, and ZSL can be turned into a typical supervised problem with the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in the joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome the information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can improve the zero-shot video classification performance significantly.

• #550
Zhou Zhao, Zhu Zhang, Shuwen Xiao, Zhou Yu, Jun Yu, Deng Cai, Fei Wu, Yueting Zhuang
Video: Events, Activities, Surveillance, Question Answering

• #1365
Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network
Zhou Zhao, Xinghua Jiang, Deng Cai, Jun Xiao, Xiaofei He, Shiliang Pu
Video: Events, Activities, Surveillance, Question Answering

Conversational video question answering is a challenging task in visual information retrieval, which generates an accurate answer from the referenced video contents according to the visual conversation context and the given question. However, the existing visual question answering methods mainly tackle the problem of single-turn video question answering, and may be ineffective when applied directly to multi-turn video question answering because they insufficiently model the sequential conversation context. In this paper, we study the problem of multi-turn video question answering from the viewpoint of multi-step hierarchical attention context network learning. We first propose the hierarchical attention context network for context-aware question understanding by modeling the hierarchically sequential conversation context structure. We then develop the multi-stream spatio-temporal attention network for learning the joint representation of the dynamic video contents and context-aware question embedding. We next devise the hierarchical attention context network learning method with a multi-step reasoning process for multi-turn video question answering. We construct two large-scale multi-turn video question answering datasets. The extensive experiments show the effectiveness of our method.

### Wednesday 18, 14:00 - 14:45: Invited Talk (VICTORIA)

Chair: Fredrik Heintz
• Intelligible Intelligence & Beneficial Intelligence
Max Tegmark
Invited Talk
### Wednesday 18, 14:55 - 16:10: SUR-NLCV - Survey Track: Natural Language Processing and Computer Vision (VICTORIA)

Chair: Yuhong Guo
• #5431
Five Years of Argument Mining: a Data-driven Analysis
Elena Cabrio, Serena Villata
Survey Track: Natural Language Processing and Computer Vision

Argument mining is the research area aiming at extracting natural language arguments and their relations from text, with the final goal of providing machine-processable structured data for computational models of argument. This research topic started to attract the attention of a small community of researchers around 2014, and it is nowadays counted as one of the most promising research areas in Artificial Intelligence in terms of growth of the community, funded projects, and involvement of companies. In this paper, we present the argument mining tasks, and we discuss the obtained results in the area from a data-driven perspective. An open discussion highlights the main weaknesses suffered by the existing work in the literature, and proposes open challenges to be faced in the future.

• #5441
"Chitty-Chitty-Chat Bot": Deep Learning for Conversational AI
Rui Yan
Survey Track: Natural Language Processing and Computer Vision

Conversational AI is of growing importance since it enables an easy interaction interface between humans and computers. Due to its promising potential and alluring commercial value as virtual assistants and/or social chatbots, major AI, NLP, and Search & Mining conferences are explicitly calling out for contributions from conversational studies. It is an active research area and of considerable interest. Building a conversational system with moderate intelligence is challenging, and requires abundant dialogue data and interdisciplinary techniques. Along with Web 2.0, the massive data available greatly facilitate data-driven methods such as deep learning for human-computer conversations. In general, conversational systems can be categorized into 1) task-oriented systems which aim to help users accomplish goals in vertical domains, and 2) social chat bots which can converse seamlessly and appropriately with humans, playing the role of a chat companion. In this paper, we focus on the survey of non-task-oriented chit-chat bots.

• #5445
Event Coreference Resolution: A Survey of Two Decades of Research
Jing Lu, Vincent Ng
Survey Track: Natural Language Processing and Computer Vision

Recent years have seen a gradual shift of focus from entity-based tasks to event-based tasks in information extraction research. Being a core event-based task, event coreference resolution is less studied but arguably more challenging than entity coreference resolution. This paper provides an overview of the major milestones made in event coreference research since its inception two decades ago.

• #5408
Affective Image Content Analysis: A Comprehensive Survey
Sicheng Zhao, Guiguang Ding, Qingming Huang, Tat-Seng Chua, Björn W. Schuller, Kurt Keutzer
Survey Track: Natural Language Processing and Computer Vision

Images can convey rich semantics and induce strong emotions in viewers. Recently, with the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this paper, we review the state-of-the-art methods comprehensively with respect to two main challenges -- affective gap and perception subjectivity. We begin with an introduction to the key emotion representation models that have been widely employed in AICA. Available existing datasets for performing evaluation are briefly described. We then summarize and compare the representative approaches on emotion feature extraction, personalized emotion prediction, and emotion distribution learning. Finally, we discuss some future research directions.

### Wednesday 18, 14:55 - 16:10: SIS-KR - Sister Conference Best Papers: Knowledge Representation and Reasoning (C7)

Chair: Carsten Lutz
• #5104
The Finite Model Theory of Bayesian Networks: Descriptive Complexity
Fabio Gagliardi Cozman, Denis Deratani Mauá
Sister Conference Best Papers: Knowledge Representation and Reasoning

We adapt the theory of descriptive complexity to Bayesian networks, to quantify the expressivity of specifications based on predicates and quantifiers. We show that Bayesian network specifications that employ first-order quantification capture the complexity class PP; by allowing quantification over predicates, the resulting Bayesian network specifications capture each class in the hierarchy PP^(NP^...^NP), a result that does not seem to have an equivalent in the literature.

• #5118
The Intricacies of Three-Valued Extensional Semantics for Higher-Order Logic Programs
Panos Rondogiannis, Ioanna Symeonidou
Sister Conference Best Papers: Knowledge Representation and Reasoning

In this paper we examine the problem of providing a purely extensional three-valued semantics for higher-order logic programs with negation. We demonstrate that a technique that was proposed by M. Bezem for providing extensional semantics to positive higher-order logic programs fails when applied to higher-order logic programs with negation. On the positive side, we demonstrate that for stratified higher-order logic programs, extensionality is indeed achieved by the technique. We analyze the reasons for the failure of extensionality in the general case, arguing that a three-valued setting cannot distinguish between certain predicates that appear to have a different behaviour inside a program context, but which happen to be identical as three-valued relations.

• #5123
Attributed Description Logics: Reasoning on Knowledge Graphs
Markus Krötzsch, Maximilian Marx, Ana Ozaki, Veronika Thost
Sister Conference Best Papers: Knowledge Representation and Reasoning

In modelling real-world knowledge, there often arises a need to represent and reason with meta-knowledge. To equip description logics (DLs) for dealing with such ontologies, we enrich DL concepts and roles with finite sets of attribute–value pairs, called annotations, and allow concept inclusions to express constraints on annotations. We investigate a range of DLs starting from the lightweight description logic EL, covering the prototypical ALCH, and extending to the very expressive SROIQ, the DL underlying OWL 2 DL.

• #5129
Finite Controllability of Conjunctive Query Answering with Existential Rules: Two Steps Forward
Giovanni Amendola, Nicola Leone, Marco Manna
Sister Conference Best Papers: Knowledge Representation and Reasoning

Reasoning with existential rules typically consists of checking whether a Boolean conjunctive query is satisfied by all models of a first-order sentence having the form of a conjunction of Datalog rules extended with existential quantifiers in rule-heads. To guarantee decidability, five basic decidable classes - linear, weakly-acyclic, guarded, sticky, and shy - have been singled out, together with several generalizations and combinations. For all basic classes, except shy, the important property of finite controllability has been proved, ensuring that a query is satisfied by all models of the sentence if, and only if, it is satisfied by all of its finite models. This paper takes two steps forward: (i) devise a general technique to facilitate the process of (dis)proving finite controllability of an arbitrary class of existential rules; and (ii) specialize the technique to complete the picture for the five mentioned classes, by showing that also shy is finitely controllable.

• #5117
Weighted Bipolar Argumentation Graphs: Axioms and Semantics
Leila Amgoud, Jonathan Ben-Naim
Sister Conference Best Papers: Knowledge Representation and Reasoning

The paper studies how arguments can be evaluated in weighted bipolar argumentation graphs (i.e., graphs whose arguments have basic weights and may be supported and attacked). It introduces principles that an evaluation method (or semantics) would satisfy, analyzes existing semantics with respect to them, and finally proposes a new semantics for the class of non-maximal acyclic graphs.

• #5151
Orchestrating a Network of Mereotopological Theories: An Abridged Report
C. Maria Keet, Oliver Kutz
Sister Conference Best Papers: Knowledge Representation and Reasoning

Parthood is used widely in ontologies across subject domains, specified in a multitude of mereological theories, and even more when combined with topology. To complicate the landscape, decidable languages put restrictions on the language features, so that only fragments of the mereo(topo)logical theories can be represented, even though those full features may be needed to check correctness during modelling. We address these issues by specifying a structured network of theories formulated in multiple logics that are glued together by the various linking constructs of the Distributed Ontology Language, DOL. For the KGEMT mereotopology and its five sub-theories, together with the DL-based OWL species and first- and second-order logic, this network in DOL orchestrates 28 ontologies.

### Wednesday 18, 14:55 - 16:10: MAS-ML - Agents and Learning (C8)

Chair: Ann Nowé
• #3918
Combinatorial Auctions via Machine Learning-based Preference Elicitation
Gianluca Brero, Benjamin Lubin, Sven Seuken
Agents and Learning

Combinatorial auctions (CAs) are used to allocate multiple items among bidders with complex valuations. Since the value space grows exponentially in the number of items, it is impossible for bidders to report their full value function even in medium-sized settings. Prior work has shown that current designs often fail to elicit the most relevant values of the bidders, thus leading to inefficiencies. We address this problem by introducing a machine learning-based elicitation algorithm to identify which values to query from the bidders. Based on this elicitation paradigm we design a new CA mechanism we call PVM, where payments are determined so that bidders’ incentives are aligned with allocative efficiency. We validate PVM experimentally in several spectrum auction domains, and we show that it achieves high allocative efficiency even when only a few values are elicited from the bidders.

• #2135
Recurrent Deep Multiagent Q-Learning for Autonomous Brokers in Smart Grid
Yaodong Yang, Jianye Hao, Mingyang Sun, Zan Wang, Changjie Fan, Goran Strbac
Agents and Learning

The broker mechanism is widely used to help interested parties derive long-term policies that reduce costs or increase profits in the smart grid. However, a broker is faced with a number of challenging problems such as balancing demand and supply from customers and competing with other coexisting brokers to maximize its profit. In this paper, we develop an effective pricing strategy for brokers in the local electricity retail market based on recurrent deep multiagent reinforcement learning and sequential clustering. We use real household electricity consumption data to simulate the retail market for evaluating our strategy. The experiments demonstrate the superior performance of the proposed pricing strategy and highlight the effectiveness of our reward shaping mechanism.

• #2151
What Game Are We Playing? End-to-end Learning in Normal and Extensive Form Games
Chun Kai Ling, Fei Fang, J. Zico Kolter
Agents and Learning

Although recent work in AI has made great progress in solving large, zero-sum, extensive-form games, the underlying assumption in most past work is that the parameters of the game itself are known to the agents. This paper deals with the relatively under-explored but equally important "inverse" setting, where the parameters of the underlying game are not known to all agents, but must be learned through observations. We propose a differentiable, end-to-end learning framework for addressing this task. In particular, we consider a regularized version of the game, equivalent to a particular form of quantal response equilibrium, and develop 1) a primal-dual Newton method for finding such equilibrium points in both normal and extensive form games; and 2) a backpropagation method that lets us analytically compute gradients of all relevant game parameters through the solution itself. This ultimately lets us learn the game by training in an end-to-end fashion, effectively by integrating a "differentiable game solver" into the loop of larger deep network architectures. We demonstrate the effectiveness of the learning method in several settings including poker and security game tasks.

• #3591
Balancing Two-Player Stochastic Games with Soft Q-Learning
Jordi Grau-Moya, Felix Leibfried, Haitham Bou-Ammar
Agents and Learning

Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning to stochastic games, where more than one agent interacts strategically. We contribute both theoretically and empirically. On the theory side, we show that games with soft Q-learning exhibit a unique value and generalise team games and zero-sum games far beyond these two extremes to cover a continuous spectrum of gaming behaviour. Experimentally, we show how tuning agents' constraints affects performance and demonstrate, through a neural network architecture, how to reliably balance games with high-dimensional representations.
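To give a feel for what a "tuneable" value means, here is a minimal single-agent sketch of the soft (log-sum-exp) value commonly used in soft Q-learning. It is an assumption-laden illustration rather than the paper's two-player operator: the function name `soft_value`, the prior distribution `prior`, and the single temperature-like parameter `beta` are all hypothetical. A large positive `beta` approaches the maximum of the action values, a large negative `beta` approaches the minimum (an adversary), and `beta = 0` averages the action values under the prior.

```python
import math
from typing import Sequence


def soft_value(q_values: Sequence[float],
               prior: Sequence[float],
               beta: float) -> float:
    """Soft value (1 / beta) * log(sum_a prior[a] * exp(beta * q[a])).

    Assumes prior is a probability distribution with positive entries.
    """
    if beta == 0.0:
        # Limit as beta -> 0: the expectation of q under the prior.
        return sum(p * q for p, q in zip(prior, q_values))
    scaled = [beta * q for q in q_values]
    shift = max(scaled)  # subtract the maximum for numerical stability
    log_sum = shift + math.log(
        sum(p * math.exp(s - shift) for p, s in zip(prior, scaled))
    )
    return log_sum / beta


q = [1.0, 2.0, 3.0]
uniform_prior = [1.0 / 3.0] * 3
near_max = soft_value(q, uniform_prior, 50.0)    # close to max(q)
near_min = soft_value(q, uniform_prior, -50.0)   # close to min(q)
average = soft_value(q, uniform_prior, 0.0)      # prior-weighted mean
```

Intermediate values of `beta` interpolate between these behaviours, which is one way to read the "continuous spectrum" described in the abstract.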

• #4151
Keeping in Touch with Collaborative UAVs: A Deep Reinforcement Learning Approach
Bo Yang, Min Liu
Agents and Learning

Effective collaborations among autonomous unmanned aerial vehicles (UAVs) rely on timely information sharing. However, the time-varying flight environment and the intermittent link connectivity pose great challenges to message delivery. In this paper, we leverage the deep reinforcement learning (DRL) technique to address the UAVs' optimal link discovery and selection problem in uncertain environments. As the multi-agent learning efficiency is constrained by the high-dimensional and continuous action spaces, we slice the whole action space into a number of tractable fractions to achieve efficient convergence of optimal policies in continuous domains. Moreover, for the nonstationarity issue that particularly challenges multi-agent DRL with local perceptions, we present a multi-agent mutual sampling method that jointly interacts the intra-agent and inter-agent state-action information to stabilize and expedite the training procedure. We evaluate the proposed algorithm on the UAVs' continuous network connection task. Results show that the associated UAVs can quickly select the optimal connected links, which facilitates the UAVs' teamwork significantly.

### Wednesday 18, 14:55 - 16:10: ROB-PS - Robotics and Planning (K2)

Chair: Vaishak Belle
• #3401
Fast Model Identification via Physics Engines for Data-Efficient Policy Search
Shaojun Zhu, Andrew Kimmel, Kostas E. Bekris, Abdeslam Boularias
Robotics and Planning

This paper presents a method for identifying mechanical parameters of robots or objects, such as their mass and friction coefficients. Key features are the use of off-the-shelf physics engines and the adaptation of a Bayesian optimization technique towards minimizing the number of real-world experiments needed for model-based reinforcement learning. The proposed framework reproduces in a physics engine experiments performed on a real robot and optimizes the model's mechanical parameters so as to match real-world trajectories. The optimized model is then used for learning a policy in simulation, before real-world deployment. It is well understood, however, that it is hard to exactly reproduce real trajectories in simulation. Moreover, a near-optimal policy can be frequently found with an imperfect model. Therefore, this work proposes a strategy for identifying a model that is just good enough to approximate the value of a locally optimal policy with a certain confidence, instead of wasting effort on identifying the most accurate model. Evaluations, performed both in simulation and on a real robotic manipulation task, indicate that the proposed strategy results in an overall time-efficient, integrated model identification and learning solution, which significantly improves the data-efficiency of existing policy search algorithms.

• #3824
Behavioral Cloning from Observation
Faraz Torabi, Garrett Warnell, Peter Stone
Robotics and Planning

Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations. While extending this paradigm to autonomous agents is a well-studied problem in general, there are two particular aspects that have largely been overlooked: (1) that the learning is done from observation only (i.e., without explicit action information), and (2) that the learning is typically done very quickly. In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), that aims to provide improved performance with respect to both of these aspects. First, we allow the agent to acquire experience in a self-supervised fashion. This experience is used to develop a model which is then utilized to learn a particular task by observing an expert perform that task without the knowledge of the specific actions taken. We experimentally compare BCO to imitation learning methods, including the state-of-the-art, generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available.

• #3837
Multi-modal Predicate Identification using Dynamically Learned Robot Controllers
Saeid Amiri, Suhua Wei, Shiqi Zhang, Jivko Sinapov, Jesse Thomason, Peter Stone
Robotics and Planning

Intelligent robots frequently need to explore the objects in their working environments. Modern sensors have enabled robots to learn object properties via perception of multiple modalities. However, object exploration in the real world poses a challenging trade-off between information gains and exploration action costs. The mixed observability Markov decision process (MOMDP) is a framework for planning under uncertainty, while accounting for both fully and partially observable components of the state. Robot perception frequently has to face such mixed observability. This work enables a robot equipped with an arm to dynamically construct query-oriented MOMDPs for multi-modal predicate identification (MPI) of objects. The robot's behavioral policy is learned from two datasets collected using real robots. Our approach enables a robot to explore object properties significantly faster while improving accuracy in comparison to existing methods that rely on hand-coded exploration strategies.

• #5472
(Journal track) Solving Multi-Agent Path Finding on Strongly Biconnected Digraphs
Adi Botea, Davide Bonusi, Pavel Surynek
Robotics and Planning

We present and evaluate diBOX, an algorithm for multi-agent path finding on strongly biconnected directed graphs. diBOX runs in polynomial time, computes suboptimal solutions and is complete for instances on strongly biconnected digraphs with at least two unoccupied positions. A detailed empirical analysis shows a good scalability for diBOX.

• #5136
(Sister Conferences Best Papers Track) Greedy Stone Tower Creations with a Robotic Arm
Martin Wermelinger, Fadri Furrer, Hironori Yoshida, Fabio Gramazio, Matthias Kohler, Roland Siegwart, Marco Hutter
Robotics and Planning

Predominantly, robotic construction is applied as prefabrication in structured indoor environments with standard building materials. Our work, on the other hand, focuses on utilizing irregular materials found on-site, such as rubble and rocks, for autonomous construction. We present a pipeline to detect arbitrarily placed objects in a scene and form a structure out of the detected objects. The next best stacking pose is selected using a searching method employing gradient descent with random initial orientations, exploiting a physics engine. This approach is validated in an experimental setup using a robotic manipulator by constructing balancing vertical stacks without mortars and adhesives. We show the results of eleven consecutive trials to form such towers autonomously using four rocks placed arbitrarily in front of the robot.

• #4216
Robot Task Interruption by Learning to Switch Among Multiple Models
Anahita Mohseni-Kabir, Manuela Veloso
Robotics and Planning

### Wednesday 18, 14:55 - 16:10, Panel (T2)

• The Future of AI in Europe
Fredrik Heintz
Panel
### Wednesday 18, 14:55 - 16:10, KR-NLP2 - Knowledge Graphs (T1)

Chair: Parisa Kordjamshidi
• #258
Translating Embeddings for Knowledge Graph Completion with Relation Attention Mechanism
Wei Qian, Cong Fu, Yu Zhu, Deng Cai, Xiaofei He
Knowledge Graphs

Knowledge graph embedding is an essential problem in knowledge extraction. Recently, translation-based embedding models (e.g., TransE) have received increasing attention. These methods try to interpret the relations among entities as translations from head entity to tail entity and achieve promising performance on knowledge graph completion. Previous researchers attempt to transform the entity embedding concerning the given relation for distinguishability. Also, they naturally think the relation-related transforming should reflect an attention mechanism, which means it should focus on only a part of the attributes. However, we found that previous methods fail to create an attention mechanism, and the reason is that they ignore the hierarchical routine of human cognition. When predicting whether a relation holds between two entities, people first check the category of entities, then they focus on fine-grained relation-related attributes to make the decision. In other words, the attention should take effect on entities filtered by the right category. In this paper, we propose a novel knowledge graph embedding method named TransAt to learn the translation-based embedding, relation-related categories of entities and relation-related attention simultaneously. Extensive experiments show that our approach outperforms state-of-the-art methods significantly on public datasets, and our method can learn the true attention varying among relations.
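As a rough illustration of the translation idea this abstract builds on, the sketch below scores a triple by the distance between `head + relation` and `tail`, in the spirit of TransE. It is not the TransAt model: the toy vectors, the entity and relation names, and the choice of the L2 norm are all assumptions made for the example.

```python
import math

def transe_score(head, relation, tail):
    """L2 distance ||head + relation - tail||; smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical 2-d embeddings, chosen by hand for illustration only.
paris = [1.0, 2.0]
capital_of = [0.5, -1.0]
france = [1.5, 1.0]
germany = [3.0, 3.0]

good = transe_score(paris, capital_of, france)   # translation lands exactly on the tail
bad = transe_score(paris, capital_of, germany)   # translation lands far from the tail
print(good, bad)
```

A lower score for the correct tail than for a corrupted one is what translation-based training objectives aim for.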

• #454
Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment
Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo
Knowledge Graphs

Multilingual knowledge graph (KG) embeddings provide latent semantic representations of entities and structured knowledge with cross-lingual inferences, which benefit various knowledge-driven cross-lingual NLP tasks. However, precisely learning such cross-lingual inferences is usually hindered by the low coverage of entity alignment in many KGs. Since many multilingual KGs also provide literal descriptions of entities, in this paper, we introduce an embedding-based approach which leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions. Our approach performs co-training of two embedding models, i.e. a multilingual KG embedding model and a multilingual literal description embedding model. The models are trained on a large Wikipedia-based trilingual dataset where most entity alignment is unknown to training. Experimental results show that the performance of the proposed approach on the entity alignment task improves at each iteration of co-training, and eventually reaches a stage at which it significantly surpasses previous approaches. We also show that our approach has promising abilities for zero-shot entity alignment, and cross-lingual KG completion.

• #698
Non-translational Alignment for Multi-relational Networks
Shengnan Li, Xin Li, Rui Ye, Mingzhong Wang, Haiping Su, Yingzi Ou
Knowledge Graphs

Most existing solutions for the alignment of multi-relational networks, such as multi-lingual knowledge bases, are "translation"-based: they build the network embedding via the trans-family of models, such as TransE. However, they cannot address triangular or other structural properties effectively. Thus, we propose a non-translational approach, which aims to utilize a probabilistic model to offer more robust solutions to the alignment task, by exploring the structural properties as well as leveraging anchors to project each network onto the same vector space during the process of learning the representation of individual networks. Extensive experiments on four multi-lingual knowledge graphs demonstrate the effectiveness and robustness of the proposed method over a set of state-of-the-art alignment methods.

• #1409
TreeNet: Learning Sentence Representations with Unconstrained Tree Structure
Zhou Cheng, Chun Yuan, Jiancheng Li, Haiqin Yang
Knowledge Graphs

Recursive neural network (RvNN) has been proved to be an effective and promising tool to learn sentence representations by explicitly exploiting the sentence structure. However, most existing work can only exploit simple tree structure, e.g., binary trees, or ignore the order of nodes, which yields suboptimal performance. In this paper, we propose a novel neural network, namely TreeNet, to capture sentences structurally over the raw unconstrained constituency trees, where the number of child nodes can be arbitrary. In TreeNet, each node learns from its left sibling and right child in a bottom-up left-to-right order, thus enabling the net to learn over any tree. Furthermore, multiple soft gates and a memory cell are employed in implementing the TreeNet to determine to what extent it should learn, remember and output, which proves to be a simple and efficient mechanism for semantic synthesis. Moreover, TreeNet significantly surpasses convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) with fewer parameters. It improves the classification accuracy by 2%-5% with 42% of the best CNN’s parameters or 94% of standard LSTM’s. Extensive experiments demonstrate TreeNet achieves the state-of-the-art performance on all four typical text classification tasks.

• #1735
Efficient Pruning of Large Knowledge Graphs
Stefano Faralli, Irene Finocchi, Simone Paolo Ponzetto, Paola Velardi
Knowledge Graphs

In this paper we present an efficient and highly accurate algorithm to prune noisy or over-ambiguous knowledge graphs given as input an extensional definition of a domain of interest, namely as a set of instances or concepts. Our method climbs the graph in a bottom-up fashion, iteratively layering the graph and pruning nodes and edges in each layer while not compromising the connectivity of the set of input nodes. Iterative layering and protection of pre-defined nodes allow the extraction of semantically coherent DAG structures from noisy or over-ambiguous cyclic graphs, without loss of information and without incurring computational bottlenecks, which are the main problem of state-of-the-art methods for cleaning large, i.e., Web-scale, knowledge graphs. We apply our algorithm to the tasks of pruning automatically acquired taxonomies using benchmarking data from a SemEval evaluation exercise, as well as the extraction of a domain-adapted taxonomy from the Wikipedia category hierarchy. The results show the superiority of our approach over state-of-the-art algorithms in terms of both output quality and computational efficiency.

• #5144
(Sister Conferences Best Papers Track) Completeness-aware Rule Learning from Knowledge Graphs
Thomas Pellissier Tanon, Daria Stepanova, Simon Razniewski, Paramita Mirza, Gerhard Weikum
Knowledge Graphs

Knowledge graphs (KGs) are huge collections of primarily encyclopedic facts that are widely used in entity recognition, structured search, question answering, and similar tasks. Rule mining is commonly applied to discover patterns in KGs. However, unlike in traditional association rule mining, KGs provide a setting with a high degree of incompleteness, which may result in the wrong estimation of the quality of mined rules, leading to erroneous beliefs such as "all artists have won an award". In this paper we propose to use (in-)completeness meta-information to better assess the quality of rules learned from incomplete KGs. We introduce completeness-aware scoring functions for relational association rules. Experimental evaluation both on real and synthetic datasets shows that the proposed rule ranking approaches have remarkably higher accuracy than the state-of-the-art methods in uncovering missing facts.

### Wednesday 18, 14:55 - 16:10, ML-INT - Interpretability (K11)

Chair: Xintao Wu
• #4208
Interpretable Adversarial Perturbation in Input Embedding Space for Text
Motoki Sato, Jun Suzuki, Hiroyuki Shindo, Yuji Matsumoto
Interpretability

Following great success in the image processing field, the idea of adversarial training has been applied to tasks in the natural language processing (NLP) field. One promising approach directly applies adversarial training developed in the image processing field to the input word embedding space instead of the discrete input space of texts. However, this approach abandons such interpretability as generating adversarial texts to significantly improve the performance of NLP tasks. This paper restores interpretability to such methods by restricting the directions of perturbations toward the existing words in the input embedding space. As a result, we can straightforwardly reconstruct each input with perturbations to an actual text by considering the perturbations to be the replacement of words in the sentence while maintaining or even improving the task performance.

• #676
Contextual Outlier Interpretation
Ninghao Liu, Donghwa Shin, Xia Hu
Interpretability

While outlier detection has been intensively studied in many applications, interpretation is becoming increasingly important to help people trust and evaluate the developed detection models by providing intrinsic reasons why the given outliers are identified. Interpreting the abnormality of outliers is a nontrivial task due to the distinct characteristics of different detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, the contexts in which outliers are located, as well as the relation between outliers and the contexts, are usually overlooked in existing interpretation frameworks. To tackle these issues, in this paper, we propose a Contextual Outlier INterpretation (COIN) framework to explain the abnormality of outliers spotted by detectors. The interpretability of an outlier is achieved through three aspects, i.e., outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework.

• #4026
A Symbolic Approach to Explaining Bayesian Network Classifiers
Andy Shih, Arthur Choi, Adnan Darwiche
Interpretability

We propose an approach for explaining Bayesian network classifiers, which is based on compiling such classifiers into decision functions that have a tractable and symbolic form. We introduce two types of explanations for why a classifier may have classified an instance positively or negatively and suggest algorithms for computing these explanations. The first type of explanation identifies a minimal set of the currently active features that is responsible for the current classification, while the second type of explanation identifies a minimal set of features whose current state (active or not) is sufficient for the classification. We consider in particular the compilation of Naive and Latent-Tree Bayesian network classifiers into Ordered Decision Diagrams (ODDs), providing a context for evaluating our proposal using case studies and experiments based on classifiers from the literature.

• #4223
Mixed Causal Structure Discovery with Application to Prescriptive Pricing
Wei Wenjuan, Feng Lu, Liu Chunchen
Interpretability

Prescriptive pricing is one of the most advanced pricing techniques, which derives the optimal price strategy to maximize the future profit/revenue by carrying out a two-stage process, demand modeling and price optimization. Demand modeling tries to reveal price-demand laws by discovering causal relationships among demands, prices, and objective factors, which is the foundation of price optimization. Existing methods use either regression or causal learning for uncovering the price-demand relations, but suffer from pain points in either accuracy/efficiency or mixed data type processing, while all of these are actual requirements in practical pricing scenarios. This paper proposes a novel demand modeling technique for practical usage. Concretely, we propose a new locally consistent information criterion named MIC, and derive MIC-based inference algorithms for an accurate recovery of causal structure on mixed factor space. Experiments on simulated/real datasets show the superiority of our new approach in both price-demand law recovery and demand forecasting, as well as promising performance in supporting optimal pricing.

• #5452
(Journal track) Learning Explanatory Rules from Noisy Data
Richard Evans, Edward Grefenstette
Interpretability

Artificial Neural Networks are powerful function approximators capable of modelling solutions to a wide variety of problems, both supervised and unsupervised. As their size and expressivity increase, so too does the variance of the model, yielding a nearly ubiquitous overfitting problem. Although mitigated by a variety of model regularisation methods, the common cure is to seek large amounts of training data (which is not necessarily easily obtained) that sufficiently approximates the data distribution of the domain we wish to test on. In contrast, logic programming methods such as Inductive Logic Programming offer an extremely data-efficient process by which models can be trained to reason on symbolic domains. However, these methods are unable to deal with the variety of domains neural networks can be applied to: they are not robust to noise in or mislabelling of inputs, and perhaps more importantly, cannot be applied to non-symbolic domains where the data is ambiguous, such as operating on raw pixels. In this paper, we propose a Differentiable Inductive Logic framework (∂ILP), which can not only solve tasks which traditional ILP systems are suited for, but shows a robustness to noise and error in the training data which ILP cannot cope with. Furthermore, as it is trained by backpropagation against a likelihood objective, it can be hybridised by connecting it with neural networks over ambiguous data in order to be applied to domains which ILP cannot address, while providing data efficiency and generalisation beyond what neural networks on their own can achieve.

• #5461
(Journal track) Visualisation and 'Diagnostic Classifiers' Reveal how Recurrent and Recursive Neural Networks Process Hierarchical Structure
Dieuwke Hupkes, Willem Zuidema
Interpretability

In this paper, we investigate how recurrent neural networks can learn and process languages with hierarchical, compositional semantics. To this end, we define the artificial task of processing nested arithmetic expressions, and study whether different types of neural networks can learn to compute their meaning. We find that simple recurrent networks cannot find a generalising solution to this task, but gated recurrent neural networks perform surprisingly well: networks learn to predict the outcome of the arithmetic expressions with high accuracy, although performance deteriorates somewhat with increasing length. We test multiple hypotheses on the information that is encoded and processed by the networks using a method called diagnostic classification. In this method, simple neural classifiers are used to test sequences of predictions about features of the hidden state representations at each time step. Our results indicate that the networks follow a strategy similar to our hypothesised ‘cumulative strategy’, which explains the high accuracy of the network on novel expressions, the generalisation to longer expressions than seen in training, and the mild deterioration with increasing length. This, in turn, shows that diagnostic classifiers can be a useful technique for opening up the black box of neural networks.

### Wednesday 18, 14:55 - 16:10, UAI-KR - Bayesian Networks (C2)

Chair: Mikko Koivisto
• #1935
Algorithms for the Nearest Assignment Problem
Sara Rouhani, Tahrima Rahman, Vibhav Gogate
Bayesian Networks

We consider the following nearest assignment problem (NAP): given a Bayesian network B and probability value q, find a configuration w of variables in B such that the difference between q and the probability of w is minimized. NAP is much harder than conventional inference problems such as finding the most probable explanation and is NP-hard even on independent Bayesian networks (IBNs), which are networks having no edges. Therefore, in order to solve NAP on IBNs, we show how to encode it as a two-way number partitioning problem. This encoding allows us to use greedy poly-time approximation algorithms from the number partitioning literature to yield an algorithm with guarantees for solving NAP on IBNs. We extend this basic algorithm from independent networks to arbitrary probabilistic graphical models by leveraging cutset conditioning and (Rao-Blackwellised) sampling algorithms. We derive approximation and complexity guarantees for our new algorithms and show experimentally that they are quite accurate in practice.
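To make the problem statement concrete, here is a minimal brute-force sketch of NAP on an independent Bayesian network with binary variables. It is not the number-partitioning encoding the abstract describes: it simply enumerates all assignments, so it runs in exponential time and is only usable on tiny inputs. The function name, the `marginals` representation and the example numbers are assumptions made for illustration.

```python
from itertools import product

def nearest_assignment(marginals, q):
    """Exhaustive NAP on an edgeless network of binary variables.

    marginals[i] is P(X_i = 1); variables are independent, so the
    probability of an assignment is a product of per-variable terms.
    Returns the assignment whose probability is closest to q.
    """
    best_w, best_p, best_gap = None, None, float("inf")
    for w in product((0, 1), repeat=len(marginals)):
        p = 1.0
        for x, m in zip(w, marginals):
            p *= m if x == 1 else 1.0 - m
        gap = abs(p - q)
        if gap < best_gap:  # strict: keep the first assignment on ties
            best_w, best_p, best_gap = w, p, gap
    return best_w, best_p

# Hypothetical three-variable network and target probability.
w, p = nearest_assignment([0.9, 0.6, 0.2], q=0.1)
print(w, p)
```

With n variables the loop visits 2^n assignments, which is the cost that polynomial-time approximation schemes are meant to avoid.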

• #2439
Estimation with Incomplete Data: The Linear Case
Karthika Mohan, Felix Thoemmes, Judea Pearl
Bayesian Networks

Traditional methods for handling incomplete data, including Multiple Imputation and Maximum Likelihood, require that the data be Missing At Random (MAR). In most cases, however, missingness in a variable depends on the underlying value of that variable. In this work, we devise model-based methods to consistently estimate mean, variance and covariance given data that are Missing Not At Random (MNAR). While previous work on MNAR data requires variables to be discrete, we extend the analysis to continuous variables drawn from Gaussian distributions. We demonstrate the merits of our techniques by comparing them empirically to state-of-the-art software packages.

• #2887
Stochastic Anytime Search for Bounding Marginal MAP
Radu Marinescu, Rina Dechter, Alexander Ihler
Bayesian Networks

The Marginal MAP inference task is known to be extremely hard particularly because the evaluation of each complete MAP assignment involves an exact likelihood computation (a combinatorial sum). For this reason, most recent state-of-the-art solvers that focus on computing anytime upper and lower bounds on the optimal value are limited to solving instances with tractable conditioned summation subproblems. In this paper, we develop new search-based bounding schemes for Marginal MAP that produce anytime upper and lower bounds without performing exact likelihood computations. The empirical evaluation demonstrates the effectiveness of our new methods against the current best-performing search-based bounds.

• #4071
On Robust Trimming of Bayesian Network Classifiers
YooJung Choi, Guy Van den Broeck
Bayesian Networks

This paper considers the problem of removing costly features from a Bayesian network classifier. We want the classifier to be robust to these changes, and maintain its classification behavior. To this end, we propose a closeness metric between Bayesian classifiers, called the expected classification agreement (ECA). Our corresponding trimming algorithm finds an optimal subset of features and a new classification threshold that maximize the expected agreement, subject to a budgetary constraint. It utilizes new theoretical insights to perform branch-and-bound search in the space of feature sets, while computing bounds on the ECA. Our experiments investigate both the runtime cost of trimming and its effect on the robustness and accuracy of the final classifier.

• #5458
(Journal track) Learning Continuous Time Bayesian Networks in Non-stationary Domains
Simone Villa, Fabio Stella
Bayesian Networks

Non-stationary continuous time Bayesian networks are introduced. They allow the parent set of each node in a continuous time Bayesian network to change over time. Structural learning of non-stationary continuous time Bayesian networks is developed under different knowledge settings. A macroeconomic dataset is used to assess the effectiveness of learning non-stationary continuous time Bayesian networks from real-world data.

• #2973
Extracting Job Title Hierarchy from Career Trajectories: A Bayesian Perspective
Huang Xu, Zhiwen Yu, Bin Guo, Mingfei Teng, Hui Xiong
Bayesian Networks

A job title usually implies the responsibility and the rank of a job position. While traditional job title analysis has focused on studying the responsibilities of job titles, this paper attempts to reveal the rank of job titles. Specifically, we propose to extract the job title hierarchy from employees' career trajectories. Along this line, we first quantify the Difficulty of Promotion (DOP) from one job title to another by a monotonic transformation of the length of tenure, based on the assumption that a longer tenure usually implies a greater difficulty of being promoted. Then, the difference between two job title ranks is defined as a mapping of the DOP observed from job transitions. A Gaussian Bayesian Network (GBN) is adopted to model the joint distribution of the job title ranks and the DOPs in a career trajectory. Furthermore, a stochastic algorithm is developed for inferring the posterior job title rank given a collection of DOPs in the GBN. Finally, experiments on more than 20 million job trajectories show that the job title hierarchy can be extracted precisely by the proposed method.

### Wednesday 18, 14:55 - 16:10, ML-SSL - Semi-Supervised Learning (C3)

Chair: Ming Li
• #1411
Tri-net for Semi-Supervised Deep Learning
Dong-Dong Chen, Wei Wang, Wei Gao, Zhi-Hua Zhou
Semi-Supervised Learning

Deep neural networks have witnessed great successes in various real applications, but they require a large amount of labeled data for training. In this paper, we propose tri-net, a deep neural network which is able to use massive unlabeled data to help learning with limited labeled data. We consider model initialization, diversity augmentation and pseudo-label editing simultaneously. In our work, we utilize output smearing to initialize modules, use fine-tuning on labeled data to augment diversity and eliminate unstable pseudo-labels to alleviate the influence of suspicious pseudo-labeled data. Experiments show that our method achieves the best performance in comparison with state-of-the-art semi-supervised deep learning methods. In particular, it achieves 8.30% error rate on CIFAR-10 by using only 4000 labeled examples.

• #2164
Adversarial Constraint Learning for Structured Prediction
Hongyu Ren, Russell Stewart, Jiaming Song, Volodymyr Kuleshov, Stefano Ermon
Semi-Supervised Learning

Constraint-based learning reduces the burden of collecting labels by having users specify general properties of structured outputs, such as constraints imposed by physical laws. We propose a novel framework for simultaneously learning these constraints and using them for supervision, bypassing the difficulty of using domain expertise to manually specify constraints. Learning requires a black-box simulator of structured outputs, which generates valid labels, but need not model their corresponding inputs or the input-label relationship. At training time, we constrain the model to produce outputs that cannot be distinguished from simulated labels by adversarial training. Providing our framework with a small number of labeled inputs gives rise to a new semi-supervised structured prediction model; we evaluate this model on multiple tasks (tracking, pose estimation and time series prediction) and find that it achieves high accuracy with only a small number of labeled inputs. In some cases, no labels are required at all.

• #851
Solving Separable Nonsmooth Problems Using Frank-Wolfe with Uniform Affine Approximations
Edward Cheung, Yuying Li
Semi-Supervised Learning

Frank-Wolfe methods (FW) have gained significant interest in the machine learning community due to their ability to efficiently solve large problems that admit a sparse structure (e.g. sparse vectors and low-rank matrices). However, the performance of the existing FW method hinges on the quality of the linear approximation. This typically restricts FW to smooth functions for which the approximation quality, indicated by a global curvature measure, is reasonably good. In this paper, we propose a modified FW algorithm amenable to nonsmooth functions, subject to a separability assumption, by optimizing for approximation quality over all affine functions, given a neighborhood of interest. We analyze theoretical properties of the proposed algorithm and demonstrate that it overcomes many issues associated with existing methods in the context of nonsmooth low-rank matrix estimation.

• #91
Teaching Semi-Supervised Classifier via Generalized Distillation
Chen Gong, Xiaojun Chang, Meng Fang, Jian Yang
Semi-Supervised Learning

Semi-Supervised Learning (SSL) is able to build a reliable classifier with very scarce labeled examples by properly utilizing the abundant unlabeled examples. However, existing SSL algorithms often yield unsatisfactory performance due to the lack of supervision information. To address this issue, this paper formulates SSL as a Generalized Distillation (GD) problem, which treats an existing SSL algorithm as a learner and introduces a teacher to guide the learner's training process. Specifically, the intelligent teacher holds the privileged knowledge that "explains" the training data but remains unknown to the learner, and the teacher should convey its rich knowledge to the imperfect learner through a specific teaching function. After that, the learner gains knowledge by "imitating" the output of the teaching function under an optimization framework. Therefore, the learner in our algorithm learns from both the teacher and the training data, so its output can be substantially distilled and enhanced. By deriving the Rademacher complexity and error bounds of the proposed algorithm, the usefulness of the introduced teacher is theoretically demonstrated. The superiority of our algorithm to the related state-of-the-art methods has also been empirically demonstrated by the experiments on different datasets with various sources of privileged knowledge.

• #1886
Semi-Supervised Optimal Margin Distribution Machines
Teng Zhang, Zhi-Hua Zhou
Semi-Supervised Learning

Semi-supervised support vector machines is an extension of standard support vector machines with unlabeled instances, and the goal is to find a label assignment of the unlabeled instances, so that the decision boundary has the maximal minimum margin on both the original labeled instances and unlabeled instances. Recent studies, however, disclosed that maximizing the minimum margin does not necessarily lead to better performance, and instead, it is crucial to optimize the margin distribution. In this paper, we propose a novel approach SODM (Semi-supervised Optimal margin Distribution Machine), which tries to assign the label to unlabeled instances and achieve optimal margin distribution simultaneously. Specifically, we characterize the margin distribution by the first- and second-order statistics, i.e., the margin mean and variance, and extend a stochastic mirror prox method to solve the resultant minimax problem. Extensive experiments on UCI data sets show that SODM is significantly better than the compared methods, which verifies the superiority of optimal margin distribution learning.
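The difference between the minimum margin and the margin distribution can be shown with a small sketch for a linear classifier. This is not the SODM optimization itself: the linear model, the use of population variance, and the toy data are assumptions chosen for illustration, and the paper's exact normalization may differ.

```python
def margin_stats(weights, bias, xs, ys):
    """Return (minimum margin, margin mean, margin variance).

    The margin of example (x, y) with y in {-1, +1} is y * (w . x + b).
    Variance here is the population variance (divide by n).
    """
    margins = [
        y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
        for x, y in zip(xs, ys)
    ]
    mean = sum(margins) / len(margins)
    var = sum((m - mean) ** 2 for m in margins) / len(margins)
    return min(margins), mean, var

# Hypothetical data: three points on a line, separated at the origin.
xs = [[1.0, 0.0], [3.0, 0.0], [-2.0, 0.0]]
ys = [1, 1, -1]
min_margin, mean_margin, var_margin = margin_stats([1.0, 0.0], 0.0, xs, ys)
print(min_margin, mean_margin, var_margin)
```

A large-margin SVM looks only at `min_margin`, whereas margin-distribution methods also take `mean_margin` and `var_margin` into account.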

• #2348
Semi-Supervised Multi-Modal Learning with Incomplete Modalities
Yang Yang, De-Chuan Zhan, Xiang-Rong Sheng, Yuan Jiang
Semi-Supervised Learning

In real-world applications, data often have multiple modalities. Researchers have proposed multi-modal learning approaches for integrating the information from different modalities. Most of the previous multi-modal methods assume that training examples have complete modalities. However, due to failures of data collection, self-deficiencies and various other reasons, multi-modal examples usually have incomplete feature representations in real applications. In this paper, the incomplete feature representation issues in multi-modal learning are named incomplete modalities, and we propose a semi-supervised multi-modal learning method aimed at this incomplete modal issue (SLIM). SLIM can utilize the extrinsic information from unlabeled data against the insufficiencies brought by the incomplete modal issues in a semi-supervised scenario. Besides, the proposed SLIM forms the problem into a unified framework which can be treated as a classifier or clustering learner, and integrates the intrinsic consistencies and extrinsic unlabeled information. As SLIM can extract the most discriminative predictors for each modality, experiments on 15 real-world multi-modal datasets validate the effectiveness of our method.

### Wednesday 18, 14:55 - 16:10, CV-CV1 - Computer Vision 1 (T5)

Chair: Zhou Zhao
• #512
Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification
Chenrui Zhang, Yuxin Peng
Computer Vision 1

Video representation learning is a vital problem for the classification task. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which explores inherent supervisory signals implied in massive data for feature learning via solving auxiliary tasks. However, existing methods in this regard suffer from two limitations when extended to video classification. First, they focus only on a single task, ignoring complementarity among different task-specific features and thus resulting in suboptimal video representation. Second, high computational and memory cost hinders their application in real-world scenarios. In this paper, we propose a graph-based distillation framework to address these problems: (1) We propose a logits graph and a representation graph to transfer knowledge from multiple self-supervised tasks, where the former distills classifier-level knowledge by solving a multi-distribution joint matching problem, and the latter distills internal feature knowledge from pairwise ensembled representations while tackling the challenge of heterogeneity among different features; (2) The proposal, which adopts a teacher-student framework, can reduce the redundancy of knowledge learned from teachers dramatically, leading to a lighter student model that solves the classification task more efficiently. Experimental results on 3 video datasets validate that our proposal not only helps learn better video representation but also compresses the model for faster inference.

• #1907
When Image Denoising Meets High-Level Vision Tasks: A Deep Learning Approach
Ding Liu, Bihan Wen, Xianming Liu, Zhangyang Wang, Thomas Huang
Computer Vision 1

Conventionally, image denoising and high-level vision tasks are handled separately in computer vision. In this paper, we cope with the two jointly and explore the mutual influence between them. First we propose a convolutional neural network for image denoising which achieves the state-of-the-art performance. Second we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and use the joint loss for updating only the denoising network via back-propagation. We demonstrate that on one hand, the proposed denoiser has the generality to overcome the performance degradation of different high-level vision tasks. On the other hand, with the guidance of high-level vision information, the denoising network can generate more visually appealing results. To the best of our knowledge, this is the first work investigating the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning.

• #560
Robust Face Sketch Synthesis via Generative Adversarial Fusion of Priors and Parametric Sigmoid
Shengchuan Zhang, Rongrong Ji, Jie Hu, Yue Gao, Chia-Wen Lin
Computer Vision 1

Despite extensive progress in face sketch synthesis, existing methods mostly work only under constrained conditions, such as fixed illumination, pose, background and ethnic origin, which are hard to control in real-world scenarios. The key issue lies in the difficulty of using data captured under fixed conditions to train a model that is robust to imaging variations. In this paper, we propose a novel generative adversarial network termed pGAN, which can generate face sketches efficiently using training data captured under fixed conditions while handling the aforementioned uncontrolled conditions. In pGAN, we embed key photo priors into the synthesis process and design a parametric sigmoid activation function to compensate for illumination variations. Through quantitative comparison with existing methods, we demonstrate that the proposed method works well on face photos in the wild.

• #510
Scanpath Prediction for Visual Attention using IOR-ROI LSTM
Zhenzhong Chen, Wanjie Sun
Computer Vision 1

Predicting the scanpath elicited by a given stimulus plays an important role in modeling visual attention and search. This paper presents a model that integrates a convolutional neural network and long short-term memory (LSTM) to generate realistic scanpaths. The core of the proposed model is a dual LSTM unit, consisting of an inhibition of return LSTM (IOR-LSTM) and a region of interest LSTM (ROI-LSTM), which captures IOR dynamics and gaze shift behavior simultaneously. IOR-LSTM simulates visual working memory to adaptively integrate and forget scene information. ROI-LSTM is responsible for predicting the next ROI given the inhibited image features. Experimental results indicate that the proposed architecture achieves superior performance in predicting scanpaths.
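
For readers unfamiliar with inhibition of return, the sketch below shows the classical hand-crafted version of the idea: repeatedly fixate the most salient location, then suppress saliency around it so that gaze moves on. This is not the learned IOR-LSTM/ROI-LSTM model of the paper; the Gaussian suppression, the parameters `sigma` and `strength`, and the assumption of a non-negative saliency grid are all illustrative.

```python
import math


def scanpath_with_ior(saliency, num_fixations, sigma=1.0, strength=1.0):
    """Greedy scanpath over a 2D saliency grid with inhibition of return.

    saliency: list of rows with non-negative values (assumed input format).
    Returns a list of (row, col) fixations in visiting order.
    """
    rows, cols = len(saliency), len(saliency[0])
    current = [row[:] for row in saliency]  # work on a copy
    path = []
    for _ in range(num_fixations):
        # Region of interest: the currently most salient cell.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda rc: current[rc[0]][rc[1]],
        )
        path.append((r, c))
        # Inhibition of return: damp saliency near the fixated cell.
        for i in range(rows):
            for j in range(cols):
                dist_sq = (i - r) ** 2 + (j - c) ** 2
                damp = strength * math.exp(-dist_sq / (2.0 * sigma ** 2))
                current[i][j] *= 1.0 - damp
    return path


saliency_map = [
    [0.9, 0.1, 0.0],
    [0.1, 0.2, 0.1],
    [0.0, 0.1, 0.8],
]
fixations = scanpath_with_ior(saliency_map, num_fixations=3, sigma=0.5)
```

With `strength=1.0` a fixated cell is fully suppressed, so the path never revisits it; a learned model such as the one described above can instead let the inhibition decay or adapt to scene content.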

• #1026
Learning to Write Stylized Chinese Characters by Reading a Handful of Examples
Danyang Sun, Tongzheng Ren, Chongxuan Li, Hang Su, Jun Zhu
Computer Vision 1

Automatically writing stylized characters is an attractive yet challenging task, especially for Chinese characters with complex shapes and structures. Most current methods are restricted to generating stylized characters already present in the training