tract: Generating photo-realistic images from a text description is a challenging problem in computer vision. Previous works have shown promising performance to generate synthetic images conditional on text by Generative Adversarial Networks (GANs). In this paper, we focus on the category-consistent and relativistic diverse constraints to optimize the diversity of synthetic images. Based on those constraints, a category-consistent and relativistic diverse conditional GAN (CRD-CGAN) is proposed to synthesize $K$ photo-realistic images simultaneously. We use the attention loss and diversity loss to improve the sensitivity of the GAN to word attention and noises. Then, we employ the relativistic conditional loss to estimate the probability of relatively real or fake for synthetic images, which can improve the performance of basic conditional loss. Finally, we introduce a category-consistent loss to alleviate the over-category issues between K synthetic images. We evaluate our approach using the Birds-200-2011, Oxford-102 flower and MSCOCO 2014 datasets, and the extensive experiments demonstrate superiority of the proposed method in comparison with state-of-the-art methods in terms of photorealistic and diversity of the generated synthetic images.
tract: Explainability of AI systems is critical for users to take informed actions and hold systems accountable. While "opening the opaque box" is important, understanding who opens the box can govern if the Human-AI interaction is effective. In this paper, we conduct a mixed-methods study of how two different groups of whos--people with and without a background in AI--perceive different types of AI explanations. These groups were chosen to look at how disparities in AI backgrounds can exacerbate the creator-consumer gap. We quantitatively share what the perceptions are along five dimensions: confidence, intelligence, understandability, second chance, and friendliness. Qualitatively, we highlight how the AI background influences each group's interpretations and elucidate why the differences might exist through the lenses of appropriation and cognitive heuristics. We find that (1) both groups had unwarranted faith in numbers, to different extents and for different reasons, (2) each group found explanatory values in different explanations that went beyond the usage we designed them for, and (3) each group had different requirements of what counts as humanlike explanations. Using our findings, we discuss potential negative consequences such as harmful manipulation of user trust and propose design interventions to mitigate them. By bringing conscious awareness to how and why AI backgrounds shape perceptions of potential creators and consumers in XAI, our work takes a formative step in advancing a pluralistic Human-centered Explainable AI discourse.
tract: Countless research works of deep neural networks (DNNs) in the task of credit card fraud detection have focused on improving the accuracy of point predictions and mitigating unwanted biases by building different network architectures or learning models. Quantifying uncertainty accompanied by point estimation is essential because it mitigates model unfairness and permits practitioners to develop trustworthy systems which abstain from suboptimal decisions due to low confidence. Explicitly, assessing uncertainties associated with DNNs predictions is critical in real-world card fraud detection settings for characteristic reasons, including (a) fraudsters constantly change their strategies, and accordingly, DNNs encounter observations that are not generated by the same process as the training distribution, (b) owing to the time-consuming process, very few transactions are timely checked by professional experts to update DNNs. Therefore, this study proposes three uncertainty quantification (UQ) techniques named Monte Carlo dropout, ensemble, and ensemble Monte Carlo dropout for card fraud detection applied on transaction data. Moreover, to evaluate the predictive uncertainty estimates, UQ confusion matrix and several performance metrics are utilized. Through experimental results, we show that the ensemble is more effective in capturing uncertainty corresponding to generated predictions. Additionally, we demonstrate that the proposed UQ methods provide extra insight to the point predictions, leading to elevate the fraud prevention process.
tract: The purpose of this paper is to examine the opportunities and barriers of Integrated Human-Machine Intelligence (IHMI) in civil engineering. Integrating artificial intelligence's high efficiency and repeatability with humans' adaptability in various contexts can advance timely and reliable decision-making during civil engineering projects and emergencies. Successful cases in other domains, such as biomedical science, healthcare, and transportation, showed the potential of IHMI in data-driven, knowledge-based decision-making in numerous civil engineering applications. However, whether the industry and academia are ready to embrace the era of IHMI and maximize its benefit to the industry is still questionable due to several knowledge gaps. This paper thus calls for future studies in exploring the value, method, and challenges of applying IHMI in civil engineering. Our systematic review of the literature and motivating cases has identified four knowledge gaps in achieving effective IHMI in civil engineering. First, it is unknown what types of tasks in the civil engineering domain can be assisted by AI and to what extent. Second, the interface between human and AI in civil engineering-related tasks need more precise and formal definition. Third, the barriers that impede collecting detailed behavioral data from humans and contextual environments deserve systematic classification and prototyping. Lastly, it is unknown what expected and unexpected impacts will IHMI have on the AEC industry and entrepreneurship. Analyzing these knowledge gaps led to a list of identified research questions. This paper will lay the foundation for identifying relevant studies to form a research roadmap to address the four knowledge gaps identified.
tract: Quantization is a technique for reducing deep neural networks (DNNs) training and inference times, which is crucial for training in resource constrained environments or time critical inference applications. State-of-the-art (SOTA) quantization approaches focus on post-training quantization, i.e. quantization of pre-trained DNNs for speeding up inference. Very little work on quantized training exists, which neither al-lows dynamic intra-epoch precision switches nor em-ploys an information theory based switching heuristic. Usually, existing approaches require full precision refinement afterwards and enforce a global word length across the whole DNN. This leads to suboptimal quantization mappings and resource usage. Recognizing these limits, we introduce MARViN, a new quantized training strategy using information theory-based intra-epoch precision switching, which decides on a per-layer basis which precision should be used in order to minimize quantization-induced information loss. Note that any quantization must leave enough precision such that future learning steps do not suffer from vanishing gradients. We achieve an average speedup of 1.86 compared to a float32 basis while limiting mean accuracy degradation on AlexNet/ResNet to only -0.075%.
tract: In this work, we propose an adversarial unsupervised domain adaptation (UDA) approach with the inherent conditional and label shifts, in which we aim to align the distributions w.r.t. both $p(x|y)$ and $p(y)$. Since the label is inaccessible in the target domain, the conventional adversarial UDA assumes $p(y)$ is invariant across domains, and relies on aligning $p(x)$ as an alternative to the $p(x|y)$ alignment. To address this, we provide a thorough theoretical and empirical analysis of the conventional adversarial UDA methods under both conditional and label shifts, and propose a novel and practical alternative optimization scheme for adversarial UDA. Specifically, we infer the marginal $p(y)$ and align $p(x|y)$ iteratively in the training, and precisely align the posterior $p(y|x)$ in testing. Our experimental results demonstrate its effectiveness on both classification and segmentation UDA, and partial UDA.
tract: There has been a growing interest in unsupervised domain adaptation (UDA) to alleviate the data scalability issue, while the existing works usually focus on classifying independently discrete labels. However, in many tasks (e.g., medical diagnosis), the labels are discrete and successively distributed. The UDA for ordinal classification requires inducing non-trivial ordinal distribution prior to the latent space. Target for this, the partially ordered set (poset) is defined for constraining the latent vector. Instead of the typically i.i.d. Gaussian latent prior, in this work, a recursively conditional Gaussian (RCG) set is proposed for ordered constraint modeling, which admits a tractable joint distribution prior. Furthermore, we are able to control the density of content vectors that violate the poset constraint by a simple "three-sigma rule". We explicitly disentangle the cross-domain images into a shared ordinal prior induced ordinal content space and two separate source/target ordinal-unrelated spaces, and the self-training is worked on the shared space exclusively for ordinal-aware domain alignment. Extensive experiments on UDA medical diagnoses and facial age estimation demonstrate its effectiveness.
tract: Automatic segmentation of anatomical structures is critical for many medical applications. However, the results are not always clinically acceptable and require tedious manual revision. Here, we present a novel concept called artificial intelligence assisted contour revision (AIACR) and demonstrate its feasibility. The proposed clinical workflow of AIACR is as follows given an initial contour that requires a clinicians revision, the clinician indicates where a large revision is needed, and a trained deep learning (DL) model takes this input to update the contour. This process repeats until a clinically acceptable contour is achieved. The DL model is designed to minimize the clinicians input at each iteration and to minimize the number of iterations needed to reach acceptance. In this proof-of-concept study, we demonstrated the concept on 2D axial images of three head-and-neck cancer datasets, with the clinicians input at each iteration being one mouse click on the desired location of the contour segment. The performance of the model is quantified with Dice Similarity Coefficient (DSC) and 95th percentile of Hausdorff Distance (HD95). The average DSC/HD95 (mm) of the auto-generated initial contours were 0.82/4.3, 0.73/5.6 and 0.67/11.4 for three datasets, which were improved to 0.91/2.1, 0.86/2.4 and 0.86/4.7 with three mouse clicks, respectively. Each DL-based contour update requires around 20 ms. We proposed a novel AIACR concept that uses DL models to assist clinicians in revising contours in an efficient and effective way, and we demonstrated its feasibility by using 2D axial CT images from three head-and-neck cancer datasets.
tract: We present the Regensburg Breast Shape Model (RBSM) - a 3D statistical shape model of the female breast built from 110 breast scans, and the first ever publicly available. Together with the model, a fully automated, pairwise surface registration pipeline used to establish correspondence among 3D breast scans is introduced. Our method is computationally efficient and requires only four landmarks to guide the registration process. In order to weaken the strong coupling between breast and thorax, we propose to minimize the variance outside the breast region as much as possible. To achieve this goal, a novel concept called breast probability masks (BPMs) is introduced. A BPM assigns probabilities to each point of a 3D breast scan, telling how likely it is that a particular point belongs to the breast area. During registration, we use BPMs to align the template to the target as accurately as possible inside the breast region and only roughly outside. This simple yet effective strategy significantly reduces the unwanted variance outside the breast region, leading to better statistical shape models in which breast shapes are quite well decoupled from the thorax. The RBSM is thus able to produce a variety of different breast shapes as independently as possible from the shape of the thorax. Our systematic experimental evaluation reveals a generalization ability of 0.17 mm and a specificity of 2.8 mm for the RBSM. Ultimately, our model is seen as a first step towards combining physically motivated deformable models of the breast and statistical approaches in order to enable more realistic surgical outcome simulation.
tract: Autonomous Underwater Vehicles (AUVs) are platforms used for research and exploration of marine environments. However, these types of vehicles face many challenges that hinder their widespread use in the industry. One of the main limitations is obtaining accurate position estimation, due to the lack of GPS signal underwater. This estimation is usually done with Kalman filters. However, new developments in the neuroscience field have shed light on the mechanisms by which mammals are able to obtain a reliable estimation of their current position based on external and internal motion cues. A new type of neuron, called Grid cells, has been shown to be part of path integration system in the brain. In this article, we show how grid cells can be used for obtaining a position estimation of underwater vehicles. The model of grid cells used requires only the linear velocities together with heading orientation and provides a reliable estimation of the vehicle's position. We provide simulation results for an AUV which show the feasibility of our proposed methodology.
tract: In the field of autonomous driving and robotics, point clouds are showing their excellent real-time performance as raw data from most of the mainstream 3D sensors. Therefore, point cloud neural networks have become a popular research direction in recent years. So far, however, there has been little discussion about the explainability of deep neural networks for point clouds. In this paper, we propose new explainability approaches for point cloud deep neural networks based on local surrogate model-based methods to show which components make the main contribution to the classification. Moreover, we propose a quantitative validation method for explainability methods of point clouds which enhances the persuasive power of explainability by dropping the most positive or negative contributing features and monitoring how the classification scores of specific categories change. To enable an intuitive explanation of misclassified instances, we display features with confounding contributions. Our new explainability approach provides a fairly accurate, more intuitive and widely applicable explanation for point cloud classification tasks. Our code is available at this https URL
tract: Artificial intelligence (AI) in healthcare is a potentially revolutionary tool to achieve improved healthcare outcomes while reducing overall health costs. While many exploratory results hit the headlines in recent years there are only few certified and even fewer clinically validated products available in the clinical setting. This is a clear indication of failing translation due to shortcomings of the current approach to AI in healthcare. In this work, we highlight the major areas, where we observe current challenges for translation in AI in healthcare, namely precision medicine, reproducible science, data issues and algorithms, causality, and product development. For each field, we outline possible solutions for these challenges. Our work will lead to improved translation of AI in healthcare products into the clinical setting
tract: 3D point cloud completion is very challenging because it heavily relies on the accurate understanding of the complex 3D shapes (e.g., high-curvature, concave/convex, and hollowed-out 3D shapes) and the unknown & diverse patterns of the partially available point clouds. In this paper, we propose a novel solution,i.e., Point-block Carving (PC), for completing the complex 3D point cloud completion. Given the partial point cloud as the guidance, we carve a3D block that contains the uniformly distributed 3D points, yielding the entire point cloud. To achieve PC, we propose a new network architecture, i.e., CarveNet. This network conducts the exclusive convolution on each point of the block, where the convolutional kernels are trained on the 3D shape data. CarveNet determines which point should be carved, for effectively recovering the details of the complete shapes. Furthermore, we propose a sensor-aware method for data augmentation,i.e., SensorAug, for training CarveNet on richer patterns of partial point clouds, thus enhancing the completion power of the network. The extensive evaluations on the ShapeNet and KITTI datasets demonstrate the generality of our approach on the partial point clouds with diverse patterns. On these datasets, CarveNet successfully outperforms the state-of-the-art methods.
tract: Ultrasound is the preferred choice for early screening of dense breast cancer. Clinically, doctors have to manually write the screening report which is time-consuming and laborious, and it is easy to miss and miswrite. Therefore, this paper proposes a method for efficiently generating personalized breast ultrasound screening preliminary reports by AI, especially for benign and normal cases which account for the majority. Doctors then make simple adjustments or corrections to quickly generate final reports. The proposed approach has been tested using a database of 1133 breast tumor instances. Experimental results indicate this pipeline improves doctors' work efficiency by up to 90%, which greatly reduces repetitive work.
tract: The computational vision community has recently paid attention to continual learning for blind image quality assessment (BIQA). The primary challenge is to combat catastrophic forgetting of previously-seen IQA datasets (i.e., tasks). In this paper, we present a simple yet effective continual learning method for BIQA with improved quality prediction accuracy, plasticity-stability trade-off, and task-order/length robustness. The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability, and learn task-specific normalization parameters for plasticity. We assign each new task a prediction head, and load the corresponding normalization parameters to produce a quality score. The final quality estimate is computed by feature fusion and adaptive weighting using hierarchical representations, without leveraging the test-time oracle. Extensive experiments on six IQA datasets demonstrate the advantages of the proposed method in comparison to previous training techniques for BIQA.
tract: Egocentric videos can bring a lot of information about how humans perceive the world and interact with the environment, which can be beneficial for the analysis of human behaviour. The research in egocentric video analysis is developing rapidly thanks to the increasing availability of wearable devices and the opportunities offered by new large-scale egocentric datasets. As computer vision techniques continue to develop at an increasing pace, the tasks related to the prediction of future are starting to evolve from the need of understanding the present. Predicting future human activities, trajectories and interactions with objects is crucial in applications such as human-robot interaction, assistive wearable technologies for both industrial and daily living scenarios, entertainment and virtual or augmented reality. This survey summarises the evolution of studies in the context of future prediction from egocentric vision making an overview of applications, devices, existing problems, commonly used datasets, models and input modalities. Our analysis highlights that methods for future prediction from egocentric vision can have a significant impact in a range of applications and that further research efforts should be devoted to the standardisation of tasks and the proposal of datasets considering real-world scenarios such as the ones with an industrial vocation.
tract: This paper presents a Simple and effective unsupervised adaptation method for Robust Object Detection (SimROD). To overcome the challenging issues of domain shift and pseudo-label noise, our method integrates a novel domain-centric augmentation method, a gradual self-labeling adaptation procedure, and a teacher-guided fine-tuning mechanism. Using our method, target domain samples can be leveraged to adapt object detection models without changing the model architecture or generating synthetic data. When applied to image corruptions and high-level cross-domain adaptation benchmarks, our method outperforms prior baselines on multiple domain adaptation benchmarks. SimROD achieves new state-of-the-art on standard real-to-synthetic and cross-camera setup benchmarks. On the image corruption benchmark, models adapted with our method achieved a relative robustness improvement of 15-25% AP50 on Pascal-C and 5-6% AP on COCO-C and Cityscapes-C. On the cross-domain benchmark, our method outperformed the best baseline performance by up to 8% AP50 on Comic dataset and up to 4% on Watercolor dataset.
tract: The pixelwise reconstruction error of deep autoencoders is often utilized for image novelty detection and localization under the assumption that pixels with high error indicate which parts of the input image are unfamiliar and therefore likely to be novel. This assumed correlation between pixels with high reconstruction error and novel regions of input images has not been verified and may limit the accuracy of these methods. In this paper we utilize saliency maps to evaluate whether this correlation exists. Saliency maps reveal directly how much a change in each input pixel would affect reconstruction loss, while each pixel's reconstruction error may be attributed to many input pixels when layers are fully connected. We compare saliency maps to reconstruction error maps via qualitative visualizations as well as quantitative correspondence between the top K elements of the maps for both novel and normal images. Our results indicate that reconstruction error maps do not closely correlate with the importance of pixels in the input images, making them insufficient for novelty localization.
tract: Knowledge built culturally across generations allows humans to learn far more than an individual could glean from their own experience in a lifetime. Cultural knowledge in turn rests on language: language is the richest record of what previous generations believed, valued, and practiced. The power and mechanisms of language as a means of cultural learning, however, are not well understood. We take a first step towards reverse-engineering cultural learning through language. We developed a suite of complex high-stakes tasks in the form of minimalist-style video games, which we deployed in an iterated learning paradigm. Game participants were limited to only two attempts (two lives) to beat each game and were allowed to write a message to a future participant who read the message before playing. Knowledge accumulated gradually across generations, allowing later generations to advance further in the games and perform more efficient actions. Multigenerational learning followed a strikingly similar trajectory to individuals learning alone with an unlimited number of lives. These results suggest that language provides a sufficient medium to express and accumulate the knowledge people acquire in these diverse tasks: the dynamics of the environment, valuable goals, dangerous risks, and strategies for success. The video game paradigm we pioneer here is thus a rich test bed for theories of cultural transmission and learning from language.
tract: Learning continuous control in high-dimensional sparse reward settings, such as robotic manipulation, is a challenging problem due to the number of samples often required to obtain accurate optimal value and policy estimates. While many deep reinforcement learning methods have aimed at improving sample efficiency through replay or improved exploration techniques, state of the art actor-critic and policy gradient methods still suffer from the hard exploration problem in sparse reward settings. Motivated by recent successes of value-based methods for approximating state-action values, like RBF-DQN, we explore the potential of value-based reinforcement learning for learning continuous robotic manipulation tasks in multi-task sparse reward settings. On robotic manipulation tasks, we empirically show RBF-DQN converges faster than current state of the art algorithms such as TD3, SAC, and PPO. We also perform ablation studies with RBF-DQN and have shown that some enhancement techniques for vanilla Deep Q learning such as Hindsight Experience Replay (HER) and Prioritized Experience Replay (PER) can also be applied to RBF-DQN. Our experimental analysis suggests that value-based approaches may be more sensitive to data augmentation and replay buffer sample techniques than policy-gradient methods, and that the benefits of these methods for robot manipulation are heavily dependent on the transition dynamics of generated subgoal states.