Computer Science And Technology - Research Publications

Permanent URI for this collection: https://kr.cup.edu.in/handle/32116/82

Search Results

Now showing 1 - 10 of 10
  • Item
    An Empirical Study on Detection of Android Adware Using Machine Learning Techniques
    (Springer, 2023-10-06T00:00:00) Farooq, Umar; Khurana, Surinder Singh; Singh, Parvinder; Kumar, Munish
    The Android operating system, without showing signs of diminishing, has experienced unprecedented popularity and continues to thrive with a significant user base. Its notable aspect for supporting third-party applications has revolutionized the digital landscape, allowing developers to generate revenue through advertising. Adware has emerged as a prominent monetization method for developers of both Adware and the applications that integrate it. However, as the utilization of Adware proliferates, it simultaneously escalates the risk of fraudulent activities associated with advertising approaches. The increasing prevalence of Adware introduces a pressing need for robust detection and mitigation strategies to address the potentially detrimental effects of fraudulent practices. In response, the proposed system focuses on analyzing and identifying alterations in network traffic acquired from Android devices. This research delves into an extensive exploration of machine and deep learning models, aiming to enhance the detection and mitigation of Adware. The exceptional capabilities of the LGBM model highlight the system's noteworthy performance in binary classification. However, in multiclass classification, the XGBM model emerges as the frontrunner, outperforming other models and showcasing superior effectiveness in distinguishing and classifying Adware and general Malware. These outcomes highlight the remarkable efficacy of the system in accurately classifying adware instances, regardless of the classification scenario. The findings not only validate the viability of the proposed system but also underscore the superior performance of specific machine learning models employed in the research. With further refinement and optimization, the system holds great promise in enhancing the security and integrity of the Android ecosystem. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    Feature Engineering and Ensemble Learning-Based Classification of VPN and Non-VPN-Based Network Traffic over Temporal Features
    (Springer, 2023-07-29T00:00:00) Abbas, Gazy; Farooq, Umar; Singh, Parvinder; Khurana, Surinder Singh; Singh, Paramjeet
    With the rapid advancement in technology, the constant emergence of new applications and services has resulted in a drastic increase in Internet traffic, making it increasingly challenging for network analysts to maintain network security and classify traffic, especially when encrypted or tunneled. To address this issue, the proposed strategy aims to distinguish between regular traffic and traffic tunneled through a virtual private network and characterize traffic from seven different applications. The proposed approach utilizes various ensemble machine learning techniques, which are efficient and accurate and consume minimal computational time for training and prediction compared to conventional machine and deep learning models. These models were applied for both the classification and characterization of network traffic, deriving efficient results. The extreme and light gradient boosting algorithms performed well in multiclass classification, while AdaBoost and Light GBM performed well in binary classification. However, when all the datasets were merged and categorized into two classes and various feature engineering methods were applied, the proposed system achieved an accuracy of more than 99%, with minimal error scores using light GBM with min-max scaling over stratified fivefold, thereby outperforming all existing approaches. This research highlights the efficiency and potential of the proposed model in detecting network traffic. © 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
  • Item
    Detection of content-based cybercrime in Roman Kashmiri using ensemble learning
    (Springer, 2023-09-25T00:00:00) Farooq, Umar; Singh, Parvinder; Khurana, Surinder Singh; Kumar, Munish
    The official language of Kashmir, Kashmiri language or Koshur, is spoken by more than 7 million people, yet its content-based cybercrime detection remains unexplored in theoretical and experimental research. Furthermore, the absence of programming libraries for sentiment analysis and a benchmark corpus has impeded advancements in this field. Challenges persist in working with diverse scripts of Kashmiri, including Perso-Arabic, Sharada, Devanagari, and Roman. Detecting cybercrime in this language is challenging due to its complex morphological nature, lack of resources, scarcity of annotated datasets, and varied linguistic characteristics, emphasizing the importance of overcoming these obstacles to develop effective detection systems. This paper attempts to detect content-based cybercrime in Roman Kashmiri script, extensively utilized on online platforms like social media, chat rooms, emails, etc., by the Kashmiri community. A well-balanced and meaningful dataset, the first of its kind in this context, is compiled, incorporating positive and negative comments, and three strategies were employed for analysis. The findings reveal that the Tf-Idf Vectorizer outperforms other tokenization methods (Count Vectorizer and Tf-Idf Transformer), bi-gram notation exhibits superior performance compared to uni-gram and tri-gram notations, and the XGBM proves to be the most effective in terms of evaluation metrics. Leveraging these strategies, Python applications were developed for text classification, successfully distinguishing cyberbullying (unsafe) from non-cyberbullying (safe) instances, with the XGBM exhibiting exceptional accuracy using the Tf-Idf Vectorizer with bi-gram, a Bag of Words, and lexical features.
    This pioneering research underscores the urgent need for content-based cybercrime detection advancements in the Kashmiri language, paving the way for effective detection systems to address language-specific challenges and promote a safer online environment for the Kashmiri community. Furthermore, this research opens new avenues for further advancements in detecting and preventing cybercrime in Kashmiri and potentially in other languages lacking robust cybercrime detection methodologies. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    Revolutionize AI Trading Bots with AutoML-Based Multi-timeframe Bitcoin Price Prediction
    (Springer, 2023-06-27T00:00:00) Khurana, Surinder Singh; Singh, Parvinder; Garg, Naresh Kumar
    Multi-timeframe analysis/prediction provides essential information to traders. It gives a broader perspective of market trends and is used to identify significant levels of support and resistance. This helps traders and trading bots in making trading decisions. The majority of current studies have focused on forecasting the closing price of daily candlesticks or high-frequency time frames, such as those of 1 min or 5 min. For artificially intelligent trading bots focusing on swing trading, price prediction related to other time frames is very significant. In this research, we present a study on developing a model to enable artificial intelligence-based trading bots to predict price components (open, high, low, and close prices) of the next 30-min, 1-h, and 4-h candlesticks of Bitcoin price. The study used two Auto-Machine Learning libraries, Tree-Based Pipeline Optimization Tool (TPOT) and AutoSklearn, to find the most suitable model for the task. The models are trained on historical price data of Bitcoin, and technical indicators are computed on these data. The performance of the trained models is evaluated in terms of R2 Score (Coefficient of Determination), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The results showed that TPOT outperformed the AutoSklearn library for all three time frames. It predicted all price components of the 30-min candlestick with an R2 Score of 0.999. © 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
  • Item
    OG-CAT: A Novel Algorithmic Trading Alternative to Investment in Crypto Market
    (Springer, 2023-03-28T00:00:00) Khurana, Surinder Singh; Singh, Parvinder; Garg, Naresh Kumar
    Cryptocurrencies have emerged as a good tool for investment/trading in the last decade. Investors have achieved promising gains with long-term investments made at a reasonably good price and time. However, investment in cryptocurrencies is also exposed to extremely high volatility. Due to this, the investment may suffer from a high drawdown as the price may fall. In this work, we proposed Optimized Greedy-Cost Averaging based Trading (OG-CAT), a novel trading framework, as an alternative to long-term investment in cryptocurrencies. The approach exploits the wavy structure of the price movement of cryptocurrencies, the high volatility of price, and the concept of cost averaging. Furthermore, the parameters of the approach are optimized with the simulated annealing algorithm. The approach is evaluated on two prominent cryptocurrencies: bitcoin and ethereum. During the evaluation, OG-CAT not only outperformed the buy-and-hold investment approach in terms of profit but also demonstrated a lower drawdown. The profit percentage in the case of trading BTC with OG-CAT is 1.63 times higher, and the max drawdown is 1.62 times lower, compared to the buy-and-hold strategy. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    CottonLeafNet: cotton plant leaf disease detection using deep neural networks
    (Springer, 2023-03-18T00:00:00) Singh, Paramjeet; Singh, Parvinder; Farooq, Umar; Khurana, Surinder Singh; Verma, Jitendra Kumar; Kumar, Munish
    India is a cover crop region where agricultural production sustains a substantial proportion of the populace and upon which the whole Indian economy is heavily reliant. As per research, it provides subsistence for around 70% of rural households. In terms of agricultural output and exports, India ranks second and ninth, respectively. However, it holds the first position globally in terms of cotton exports, thereby contributing substantially to the economy of the country. However, it has been documented that various crops, especially cotton plants, are severely harmed by various pests, extreme climatic variations, nutrient inadequacy and toxicity, and so on. Cotton plants suffer from a wide range of conditions, from bacterial diseases to nutritional deficiencies, that are hard for the human eye to recognize. However, most researchers have considered only a few types of cotton leaf diseases and excluded many. Keeping these constraints in consideration, this research seeks to aid the detection of these diseases by employing deep learning paradigms. The research begins with acquiring a near-balanced dataset with 22 leaf disease types including bacterial, fungal, viral, nutrient deficiency, etc., followed by data augmentation to boost the performance of the models. Many algorithms were tested; however, the CNN proved to be very efficient and productive. The proposed model, when evaluated on the test set, achieves an accuracy of 99.39% with a negligible error rate, thus outperforming all the existing approaches while consuming less computational time. The outcome shows that the proposed approach is efficient enough to be implemented in real-time detection systems to aid the precise detection of cotton leaf diseases and help farmers take appropriate actions. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    Recognition of offline handwritten Urdu characters using RNN and LSTM models
    (Springer, 2022-06-17T00:00:00) Misgar, Muzafar Mehraj; Mushtaq, Faisel; Khurana, Surinder Singh; Kumar, Munish
    Optical Character Recognition (OCR) helps to convert different types of scanned documents, such as images, into searchable and editable content. OCR is language dependent, and very limited research has been carried out in this field for Urdu and Urdu-like scripts (e.g., Farsi and Arabic), unlike other languages such as English, Hindi, etc. The lack of research work is attributed to a lack of publicly available benchmark databases and inherent complexities involved in these languages, such as their cursive nature and the change in the shape of a character depending upon its position in a ligature. Each character has 2–4 different shapes depending upon its position in the word: initial, medial, or final. In this article, we have proposed a methodology to automate the data collection process, collected a large handwritten dataset of 110,785 Urdu characters, and laid out a comparative analysis of two deep learning models, SimpleRNN and LSTM, to showcase the potential of RNN models for character recognition. Data was collected from 250 authors on A4 size sheets. Each sheet contains 132 shapes for Urdu characters and 10 numerals. As far as the authors know, this is the first time that such a large dataset has been proposed which contains all the possible shapes of Urdu characters, and numerals as well. Experimentation has been done for the numerals, full characters, and the whole dataset separately to lay out a comparative analysis of the classification capabilities of the RNN and LSTM models. Despite such inherent complexities in Urdu script, the RNN and LSTM models proved to be effective in achieving high accuracy rates. The accuracy achieved by the RNN for each category is 96.96% for numerals, 85.22% for full characters, and 73.62% for the whole data; the LSTM outperforms it, with a maximum accuracy for each category of 97.80% for numerals, 97.43% for full characters, and 91.30% for the whole data.
    Besides, the proposed dataset opens a new window for future research, showcasing the huge potential of this dataset for data analysis not only for the Urdu language but also for other languages like Arabic, Persian, etc., which use similar character sets. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    An efficient approach for copy-move image forgery detection using convolution neural network
    (Springer, 2022-02-17T00:00:00) Koul, Saboor; Kumar, Munish; Khurana, Surinder Singh; Mushtaq, Faisel; Kumar, Krishan
    Digital imaging has become elementary in this novel era of technology with unconventional image forging techniques and tools. Since we understand that digital image forgery is possible, a digital image cannot even be presented as a piece of evidence anywhere. Given this fact, we must dig deep into the issue to help alleviate such derelictions. Copy-move and splicing of images to create a forged one prevail in this era of digitalization. Copy-move involves copying one part of the image and pasting it to another part of the image, while the latter involves merging two images to significantly change the original image and create a new forged one. In this article, a novel approach using a convolutional neural network (CNN) has been proposed for automatic detection of copy-move forgery. For the experimental work, a benchmark dataset, namely MICC-F2000, is considered, which consists of 2000 images of which 1300 are original and 700 are forged. The experimental results show that the proposed model outperforms the other traditional methods for copy-move forgery detection. The results of copy-move forgery detection were highly promising, with an accuracy of 97.52%, which is 2.52% higher than the existing methods. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    Semi-supervised labeling: a proposed methodology for labeling the twitter datasets
    (Springer, 2022-01-28T00:00:00) Jan, Tabassum Gull; Khurana, Surinder Singh; Kumar, Munish
    Twitter has nowadays become a trending microblogging and social media platform for news and discussions. The dramatic growth of the platform has additionally set off a dramatic increase in spam on it. For supervised machine learning, one always finds a need to have a labeled dataset of Twitter. It is desirable to design a semi-supervised labeling technique for labeling newly prepared recent datasets. Preparing a labeled dataset requires a lot of human effort. This issue has motivated us to propose an efficient approach for preparing labeled datasets so that time can be saved and human errors can be avoided. Our proposed approach relies on features readily available in real time for better performance and wider applicability. This work aims at collecting the most recent tweets of a user using Twitter streaming and preparing a recent dataset of Twitter. Finally, a semi-supervised machine learning algorithm based on the self-training technique was designed for labeling the tweets. Semi-supervised support vector machine and semi-supervised decision tree classifiers were used as base classifiers in the self-training technique. Further, the authors have applied the K-means clustering algorithm to the tweets based on the tweet content. The principled novel approach is an ensemble of semi-supervised and unsupervised learning wherein it was found that semi-supervised algorithms are more accurate in prediction than unsupervised ones. To effectively assign the labels to the tweets, the authors have implemented the concept of voting in this novel approach, and the label predicted by the majority voting classifier is the actual label assigned to the tweet dataset. A maximum accuracy of 99.0% has been reported in this paper using a majority voting classifier for spam labeling. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    UrduDeepNet: offline handwritten Urdu character recognition using deep neural network
    (Springer Science and Business Media Deutschland GmbH, 2021-06-07T00:00:00) Mushtaq, Faisel; Misgar, Muzafar Mehraj; Kumar, Munish; Khurana, Surinder Singh
    Handwritten Urdu character recognition systems face several challenges, including writer-dependent variations and the non-availability of benchmark databases for cursive writing scripts. In this study, we propose a handwritten Urdu character dataset for the Nasta'liq writing style covering isolated and positional characters as well as numerals. We also propose a convolutional neural network (CNN) architecture for the recognition of handwritten Urdu characters and numerals. CNN is a novel technique for image recognition that does not need explicit feature engineering and extraction and produces efficient results as compared to standard handcrafted feature extraction approaches. The proposed system was trained on a training dataset of 74,285 samples and evaluated on a test dataset of 21,223 samples and achieved a recognition rate of 98.82% for 133 classes, outperforming the results of all state-of-the-art systems for the Urdu language. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.