Depression is a common and serious mental disorder that causes persistent feelings of sadness or hopelessness in a person's daily life. With the rapid development of social media, people tend to express their thoughts and emotions on social platforms. Different social platforms present data in different formats, which makes large and diverse datasets available to researchers. In this study, we aim to detect users with depressive tendencies on Instagram. We create a depression dictionary for automatically collecting data from depressive and non-depressive users. For prediction, we construct a multimodal system that uses image, text, and behavior features to predict the aggregated depression score of each Instagram post. Considering the time interval between posts, we propose a two-stage detection mechanism for detecting depressive users. Experimental results demonstrate that our proposed methods achieve an F1-score of up to 0.835 for detecting depressive users. The system can therefore serve as an early depression detector, enabling timely treatment before the condition becomes severe.
Online question answering (QA) communities have become popular in recent years. In this paper, we focus on online health question answering (HQA) communities. An HQA community provides a platform for health consumers to inquire about health information. There are two ways to use such a platform. One is to post a question and wait for answers from authenticated doctors. The other is to search for relevant questions that already have answers. For the latter, health consumers may prefer an answer that a previous health consumer marked as accepted. However, a large proportion of questions have no accepted answer, which is inconvenient for people searching for relevant questions. To address this issue, we aim to select high-quality answers for questions that have no marked accepted answer. We propose a deep learning approach to achieve this goal. To train the model to predict answer quality, we first treat the accepted answer as the positive answer and propose a method to label negative answers. Next, we capture the semantic information of the question and the answer with a deep learning architecture. We then combine this information to predict the quality score of the answer. We collect data from one of the biggest Chinese HQA communities and divide the data into groups by medical department for detailed analysis. Finally, we conduct experiments to show the effectiveness of the categorization and the labeling method. The results show that our approach outperforms other studies, and we further analyze the differences among the results for different categories.
Given a set of existing products in the market and a set of customer preferences, we set the price of a product, selected from a pool of candidate products to be launched, so as to gain the most profit. A customer preference represents the customer's basic requirements. The dynamic skyline of a customer preference identifies the products that the customer may purchase. Each time the price of a candidate product is adjusted, the candidate needs to compete with all of the existing products to determine whether it can be one of the dynamic skyline products of some customer preferences. To compute in parallel, we use a Voronoi-diagram-based partitioning method to separate the set of existing products and the set of customer preferences into cells. A large number of combinations of these cells can be generated. For each price under consideration for a candidate product, we process all the combinations in parallel to determine whether the candidate product can be one of the dynamic skyline products of the customer preferences. We then integrate the results to decide the price of each candidate product that achieves the most profit. To further improve performance, we design two efficient pruning strategies that avoid computing all combinations. Experiments on real and synthetic datasets show that the pruning strategies are effective.
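The price check described above can be sketched sequentially in Python. This is only an illustration under stated assumptions, not the authors' parallel method: it uses the standard definition of dynamic dominance (per-dimension absolute distance to the preference point, smaller is better), treats price as the last dimension, and omits the Voronoi partitioning and the pruning strategies. The names `dyn_dominates`, `in_dynamic_skyline`, and `customers_per_price` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def dyn_dominates(a: Point, b: Point, pref: Point) -> bool:
    """True if `a` dynamically dominates `b` with respect to preference `pref`.

    Assumption: each product is compared by its per-dimension absolute
    distance to the preference point, and a smaller distance is better.
    """
    da = [abs(x - q) for x, q in zip(a, pref)]
    db = [abs(x - q) for x, q in zip(b, pref)]
    no_worse = all(x <= y for x, y in zip(da, db))
    strictly_better = any(x < y for x, y in zip(da, db))
    return no_worse and strictly_better


def in_dynamic_skyline(candidate: Point, existing: List[Point], pref: Point) -> bool:
    """The candidate is a dynamic skyline product if no existing product dominates it."""
    return not any(dyn_dominates(e, candidate, pref) for e in existing)


def customers_per_price(
    features: Tuple[float, ...],
    prices: List[float],
    existing: List[Point],
    preferences: List[Point],
) -> Dict[float, int]:
    """For each trial price, count the preferences whose dynamic skyline
    would contain the candidate (price is appended as the last dimension)."""
    result = {}
    for price in prices:
        candidate = tuple(features) + (price,)
        result[price] = sum(
            in_dynamic_skyline(candidate, existing, pref) for pref in preferences
        )
    return result


# Toy data: products and preferences are (quality, price) pairs.
existing_products = [(10, 5)]
customer_preferences = [(8, 4), (12, 6)]
counts = customers_per_price((11,), [3, 4, 6], existing_products, customer_preferences)
print(counts)
```

Each trial price requires a fresh comparison against every existing product for every preference, which is the cost that the partitioning and pruning in the abstract are meant to reduce.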
In the Industry 4.0 era, manufacturers compete to produce better products that are expected to satisfy a larger number of customers. We propose a recommendation system for upgrading products that takes user preferences into account. This approach is based on the dominating regions of dominant competitors. A dominating region represents an estimate of the number of potential customers. However, examining overlapping dominating regions in a high-dimensional space is NP-hard. We propose a novel method named TDRDFS, which constructs a Dominant Graph of Intersection skyline points (DGI) for modeling the dominating regions. Our experiments show that TDRDFS significantly reduces computation. With our approach, product vendors can easily determine a strategy for upgrading products.
A skyline query searches for the data points that are not dominated by any other point in the dataset. It is widely adopted in applications that require multi-criteria decision making. However, skyline query processing is considerably time-consuming for high-dimensional, large-scale datasets. Parallel computing techniques are therefore needed to address this challenge, among which MapReduce is one of the most popular frameworks for processing big data. Many efficient MapReduce skyline algorithms have been proposed in the literature, and most of their designs focus on partitioning and pruning the given dataset. However, there are still opportunities for further parallelism. In this study, we propose two parallel skyline processing algorithms using a novel LShape partitioning strategy and an effective Propagation Filtering method. These two algorithms are 2Phase LShape and 1Phase LShape, used for multiple reducers and a single reducer, respectively. Through extensive experiments, we verify that our algorithms outperform the state-of-the-art approaches, especially for high-dimensional, large-scale datasets.
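As a point of reference for the definition above, the following is a minimal sequential sketch of the dominance test and a naive skyline computation in Python. It assumes smaller values are better in every dimension; the names `dominates` and `skyline` are illustrative, and the LShape partitioning and Propagation Filtering methods are not shown.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
print(skyline(points))  # (3, 3) and (4, 4) are dominated by (2, 2)
```

The naive version performs a dominance test for every pair of points, which is why the cost grows quickly with dataset size and motivates partitioning and pruning.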
Acute Kidney Injury (AKI) is common among inpatients. Severe AKI increases all-cause mortality, especially in critically ill patients. Older patients are at higher risk of AKI because of declined renal function, increased comorbidities, aggressive medical treatments, and nephrotoxic drugs. Early prediction of AKI for older inpatients is therefore crucial. We use 80 different laboratory tests from electronic health records and two types of representations for each laboratory test; that is, we consider 160 (laboratory test, type) pairs one by one for the prediction. By proposing new similarity measures and employing K-nearest-neighbors classification, we are able to identify the most effective (laboratory test, type) pairs for the prediction. Furthermore, to determine how early and how accurately AKI can be predicted, and thus whether our method is clinically useful, we evaluate the prediction performance up to 5 days prior to the AKI event.
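The classification step can be illustrated with a small K-nearest-neighbors sketch in Python. The abstract does not specify the proposed similarity measures, so plain Euclidean distance is used here purely as a stand-in; the function names and the toy sequences are hypothetical and are not clinical data.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    """Stand-in distance between two equal-length value sequences
    (an assumption; not the similarity measure proposed in the study)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(
    train: List[Tuple[Series, int]],
    query: Series,
    k: int = 3,
    dist: Callable[[Series, Series], float] = euclidean,
) -> int:
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy sequences for a single hypothetical laboratory test; label 1 marks a later event.
train = [
    ([1.0, 1.1, 1.2], 0),
    ([0.9, 1.0, 1.0], 0),
    ([1.0, 1.0, 1.1], 0),
    ([1.0, 1.8, 2.9], 1),
    ([1.1, 2.0, 3.2], 1),
    ([0.9, 1.7, 3.0], 1),
]
print(knn_predict(train, [1.0, 1.9, 3.0]))
```

Because the distance function is a parameter, the same routine could be run once per (laboratory test, type) pair with a different measure to compare how well each pair predicts the outcome.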
In recent years, patients have usually undergone more accurate and detailed examinations because of rapid advances in medical technology. Many examination reports are not represented as numerical data but as text documents written by medical examiners, based on observations from instruments and biochemical tests. If this unstructured data can be organized into a structured report, it will help doctors understand a patient's status across the various examinations more efficiently. Moreover, further association analysis can be performed on the structured data to identify potential factors that affect a disease. In this paper, we apply the POS tagging results of natural language analysis to automatically extract keyword phrases from the pathology examination reports of renal diseases. A medical dictionary for the various examination items in an examination report is then established and used as the basic information for retrieving the terms that make up the structured form of the report. Moreover, a topical probability modeling method is applied to automatically discover candidate keyword phrases for the examination items from the reports. Finally, a system is implemented to generate the structured form for the various examination items in a report according to the constructed medical dictionary.
Depression, a common mental disorder, affects not only individuals but also families and society. In the early stage, most people with depression do not know that they are suffering from it. Some of them visit different medical departments to ask for help. However, their symptoms may not be relieved because they have not received a proper diagnosis. In this paper, we find discriminative features for establishing an early depression detection model by analyzing medical data. These features are composed of patients’ medical information, including diagnosed diseases and medical departments. We use a real-world electronic health records dataset from the Taiwan National Health Insurance Research Database for the analysis and focus on young people aged 10-24 years. The experimental results show that our model can detect a future diagnosis of depression from patients’ records up to 90 days in advance. Furthermore, even better performance can be achieved with a longer observation time.
The World Health Organization (WHO) predicts that depressive disorders will be widespread in the next 20 years. These disorders may affect a person’s general health and habits, such as sleeping and eating patterns, in addition to their interpersonal relationships. Early depression detection and prevention have therefore become an important issue. To address this critical issue, we recruited 1453 individuals who use Facebook frequently and collected their Facebook data. We then propose an automatic depression detection approach, named Deep Learning-based Depression Detection with Heterogeneous Data Sources (D3-HDS), to predict the depression label of an individual by analyzing his/her living environment, behavior, and posted content on social media. The proposed method employs Recurrent Neural Networks to compute the post representation of each individual.
Research on mining data collected from smartphones has received tremendous interest in the past few years. While significant research effort has been devoted to mining various kinds of smartphone data, such as GPS trajectories, app usage logs, and accelerometer readings, the mining of Wi-Fi SSID (Service Set IDentifier) logs has not been well explored. The SSID of a Wi-Fi access point is normally a human-readable string, typically chosen by the owner of the Wi-Fi network. Extracting and leveraging the information encoded in SSIDs is crucial for a better understanding of smartphone data. In this study, we investigate the problem of inferring the location type of a given SSID, i.e., associating an SSID with the type of location, such as a workplace or a shop, where the Wi-Fi access point is installed.
In this paper, to process skyline queries efficiently with the MapReduce framework, we propose two algorithms that prevent the bottleneck of centrally finding the global skyline from the local skylines. The proposed algorithms aim to reduce the number of dominance tests, which check whether a data point is dominated by another data point, and to perform the necessary dominance tests in parallel. The first algorithm uses grid-based and angle-based partitioning schemes to divide the data space into segments for finding the local skyline data points. Two sets of rules, one for each partitioning method, are designed to reduce the number of dominance tests needed among the local skyline data points to find the skyline data points. The second algorithm uses the skyline data points discovered from sample data points to filter out most non-skyline data points in the mappers.
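The idea behind the second algorithm, filtering with a skyline computed from a sample, can be sketched on a single machine in Python. This is only an illustration under assumptions: smaller values are taken to be better in every dimension, uniform random sampling is assumed because the sampling scheme is not described here, and no MapReduce machinery is shown. The names `sample_filter`, `dominates`, and `skyline` are hypothetical.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """Dominance test (assumption: smaller values are better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def sample_filter(points: List[Point], sample_size: int, seed: int = 0) -> List[Point]:
    """Drop every point dominated by the skyline of a random sample.

    The sample consists of real data points, so anything it dominates cannot
    be in the global skyline; the survivors still contain the whole skyline.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    filter_points = skyline(sample)
    return [p for p in points if not any(dominates(f, p) for f in filter_points)]


rng = random.Random(42)
data = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
survivors = sample_filter(data, sample_size=20, seed=1)
print(len(data), len(survivors))
```

Only the survivors need full pairwise dominance tests afterwards, and each point can be checked against the small filter set independently, which is what makes the step suitable for mappers.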
Human rationality, the ability to act so as to maximize the achievement of one's presumed goals (i.e., to make optimal choices), is the foundation of democracy. Research evidence has suggested that voters may not make decisions after exhaustively processing relevant information; instead, our decision-making capacity may be restricted by our own biases and the environment. In this paper, we investigate the extent to which humans in a democratic society can be rational when making decisions in a serious, complex situation: voting in a local political election. We believe examining human rationality in a political election is important, because a well-functioning democracy rests largely upon the rational choices of individual voters. Previous research has shown that explicit political attitudes predict voting intention and choices (i.e., actual votes) in democratic societies, indicating that people are able to reason comprehensively when making voting decisions. Other work, though, has demonstrated that attitudes of which we may not be aware, such as our implicit (e.g., subconscious) preferences, can predict voting choices, which may call into question how well democracy functions. In this study, we systematically examined predictors of voting intention and choices in the 2014 mayoral election in Taipei, Taiwan. Results indicate that explicit political party preferences had the largest impact on voting intention and choices. Moreover, implicit political party preferences interacted with explicit political party preferences in accounting for voting intention, and in turn predicted voting choices. Ethnic identity and the perceived voting intention of significant others were found to predict voting choices, but not voting intention. In sum, reassuringly for democracy, voters appeared to engage mainly explicit, controlled processes in making their decisions; however, the findings on ethnic identity and the perceived voting intention of significant others may suggest otherwise.
Nowadays, mobile devices have become a ubiquitous medium that supports many forms of functionality and is widely accepted by the general public. In this study, we investigate using Wi-Fi logs from a mobile device to discover user preferences. The core idea is twofold. First, every Wi-Fi access point has a network name, normally a human-readable string, called an SSID (Service Set Identifier). Since SSIDs often carry semantics, we can infer from them the place where the user stayed. Second, a Wi-Fi log is produced when the user is near a Wi-Fi access point. A high frequency of a consecutively observed SSID implies a long stay at a place. To the best of our knowledge, our work is the first attempt to understand users from Wi-Fi logs collected from mobile devices. However, Wi-Fi logs inherently contain various types of information and are noisy.
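The second idea, that a run of consecutive observations of the same SSID suggests a stay at one place, can be sketched in Python as a run-length grouping of the log. The record layout (timestamp in seconds, SSID), the function name `stay_runs`, and the sample SSIDs are assumptions made for illustration; real logs would also need the noise handling mentioned above.

```python
from itertools import groupby
from typing import List, Tuple

LogRecord = Tuple[int, str]  # (timestamp in seconds, SSID); an assumed layout


def stay_runs(log: List[LogRecord]) -> List[Tuple[str, int, int]]:
    """Group consecutive records with the same SSID.

    Returns one (ssid, observation_count, duration_seconds) tuple per run,
    where the duration is the time between the first and last record of the
    run. The log is assumed to be sorted by timestamp.
    """
    runs = []
    for ssid, group in groupby(log, key=lambda record: record[1]):
        records = list(group)
        duration = records[-1][0] - records[0][0]
        runs.append((ssid, len(records), duration))
    return runs


log = [
    (0, "CoffeeShop"),
    (60, "CoffeeShop"),
    (120, "CoffeeShop"),
    (500, "Office-5F"),
    (560, "Office-5F"),
    (900, "CoffeeShop"),
]
for run in stay_runs(log):
    print(run)
```

Runs with many observations or a long duration are candidates for places where the user actually stayed, while single-observation runs are more likely to be access points the user merely passed by.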